Brief description

A website analyzer is a web/desktop/mobile app that samples a few pages of a website and gathers useful information.

Input

You are given a website address (its URL).

Output

A linked tag cloud of all the significant terms in the website content.

A dashboard containing the results of analysis of the website content.

Features – Essential

  1. Given a URL, find the home page of the website (sometimes you may be given a URL of an inner page)
  2. Find the links in the home page (and remove duplicates)
  3. Store the links
  4. For each link, get the web page, extract text and store it
  5. Parse the text (remove stop words and punctuation) and generate a list of unigrams and bigrams from it. We will refer to these as key terms.
  6. Create a tag cloud of the top 20 key terms.
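
The essential steps above can be sketched with the Python standard library alone. This is a minimal illustration under stated assumptions, not a reference implementation: the function names (`home_page`, `parse_page`, `key_terms`), the tiny stop-word list, and the choice to keep only same-site links are all assumptions made for the example. Bigrams are formed after stop words are removed, which is one possible reading of step 5.

```python
import re
from collections import Counter
from html.parser import HTMLParser
from urllib.parse import urldefrag, urljoin, urlsplit

# Tiny illustrative stop-word list (assumption); a real analyzer needs a fuller one.
STOP_WORDS = {"a", "an", "and", "the", "of", "to", "in", "is", "for", "on", "with"}


def home_page(url):
    """Step 1: reduce any URL on a site to the site's root."""
    parts = urlsplit(url)
    return f"{parts.scheme}://{parts.netloc}/"


class LinkAndTextParser(HTMLParser):
    """Collect href values and visible text from one HTML page."""

    def __init__(self):
        super().__init__()
        self.hrefs = []
        self.text_parts = []
        self._skip_depth = 0  # > 0 while inside <script> or <style>

    def handle_starttag(self, tag, attrs):
        if tag in ("script", "style"):
            self._skip_depth += 1
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.hrefs.append(href)

    def handle_endtag(self, tag):
        if tag in ("script", "style") and self._skip_depth:
            self._skip_depth -= 1

    def handle_data(self, data):
        if not self._skip_depth:
            self.text_parts.append(data)


def parse_page(html, base_url):
    """Steps 2 and 4: return (unique same-site links in page order, page text)."""
    parser = LinkAndTextParser()
    parser.feed(html)
    site = urlsplit(base_url).netloc
    links = []
    for href in parser.hrefs:
        # Resolve relative links and drop #fragments so duplicates collapse.
        absolute = urldefrag(urljoin(base_url, href)).url
        if urlsplit(absolute).netloc == site and absolute not in links:
            links.append(absolute)
    return links, " ".join(parser.text_parts)


def key_terms(text, top_n=20):
    """Steps 5 and 6: count unigrams and bigrams, return the top_n as (term, count)."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    kept = [w for w in words if w not in STOP_WORDS]
    bigrams = [f"{a} {b}" for a, b in zip(kept, kept[1:])]
    return Counter(kept + bigrams).most_common(top_n)
```

The HTML passed to `parse_page` could come from `urllib.request.urlopen` or any other HTTP client; fetching is left out so the sketch stays independent of the network. Storing the links and text (steps 3 and 4) is likewise left to whatever storage the app uses.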

Features – Desirable

  1. Customize the tag cloud with multiple font sizes: the more frequent the term, the larger the font.
  2. Make each key term a hyperlink. You can display terms with higher frequency in a bigger font than terms with lower frequency.
  3. Generate a JSON file with the information given below:
     1. Number of pages at the top level
     2. A list of hyperlinked page titles and, for each page, a list of 10 tags (from the page text)
  4. Display the statistics gathered in step 3.
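
One way to approach the frequency-scaled, hyperlinked tag cloud and the JSON file is sketched below. The brief does not say where a term's link should point or what the JSON fields are called, so `link_prefix`, the pixel range, and the field names `top_level_page_count`, `pages`, `title`, `url`, and `tags` are hypothetical choices made for the example.

```python
import json
from html import escape
from urllib.parse import quote


def tag_cloud_html(term_counts, min_px=12, max_px=36, link_prefix="/search?q="):
    """Render (term, count) pairs as links whose font size grows with count."""
    if not term_counts:
        return ""
    counts = [count for _, count in term_counts]
    low, high = min(counts), max(counts)
    anchors = []
    for term, count in term_counts:
        # Linear interpolation between min_px and max_px;
        # if every count is equal, all terms get min_px.
        scale = (count - low) / (high - low) if high > low else 0.0
        size = round(min_px + scale * (max_px - min_px))
        anchors.append(
            f'<a href="{link_prefix}{quote(term)}" '
            f'style="font-size:{size}px">{escape(term)}</a>'
        )
    return " ".join(anchors)


def site_report(pages):
    """Build the JSON text: page count plus title, URL and up to 10 tags per page.

    `pages` is assumed to be a list of dicts with "title", "url" and "tags" keys.
    """
    return json.dumps(
        {
            "top_level_page_count": len(pages),
            "pages": [
                {"title": p["title"], "url": p["url"], "tags": p["tags"][:10]}
                for p in pages
            ],
        },
        indent=2,
    )
```

Linear scaling is the simplest choice; if a few terms dominate, a logarithmic scale may keep the less frequent terms readable.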

Features – Nice to Have

  1. Include additional information in the JSON file:
     1. A list of contact addresses extracted from the site (if available)
     2. A list of job positions on the site (if available)
     3. A list of products or services the company offers (if available)
     4. Social media links (if available)
  2. Display the statistics gathered in Desirable step 3 and in step 1 above.
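
Two of the extra items lend themselves to simple pattern matching, sketched below. The sketch assumes that "contact addresses" means email addresses and that a social media link is any link whose host appears in a short hand-picked list; both are assumptions, and `EMAIL_RE` is a deliberately loose pattern rather than a full address validator. Job positions and products or services need site-specific heuristics and are not covered here.

```python
import re
from urllib.parse import urlsplit

# Loose email pattern (assumption): good enough for harvesting, not for validation.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")

# Hand-picked hosts treated as social media (assumption); extend as needed.
SOCIAL_HOSTS = {
    "facebook.com",
    "twitter.com",
    "x.com",
    "linkedin.com",
    "instagram.com",
    "youtube.com",
}


def extract_emails(text):
    """Return the distinct email addresses found in the page text, sorted."""
    return sorted(set(EMAIL_RE.findall(text)))


def social_links(links):
    """Return the links that point at a known social media host, without duplicates."""
    found = []
    for link in links:
        host = urlsplit(link).netloc.lower()
        if host.startswith("www."):
            host = host[4:]
        if host in SOCIAL_HOSTS and link not in found:
            found.append(link)
    return found
```

Note that `social_links` must be given all links found on a page, including off-site ones, since social media profiles live on other hosts.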
