I need a program that leverages the Keywords data from IBM's Watson Natural Language Understanding (NLU) AI.
Input will be the top 30 URLs that rank on Google for a specific keyword phrase. For example, "apple pie recipe."
Output number 1: A list of keyword phrases with a Keyword Relevance score of at least 0.3. These phrases will be sorted in descending order of frequency.
I am not a programmer, but I used IBM's text analysis sandbox (https://www.ibm.com/demos/live/natural-language-understanding/self-service/home) to run a list of URLs that rank on Google for the keyword "apple pie recipe."
I copied all of the Keywords data from the sandbox's Extraction tab into a spreadsheet, then ran it through a simple online word frequency calculator. That gave me a basic list of the phrases with the highest frequency.
I have attached two spreadsheets (keywords-watson and keyword-density) as a starting point for what I want to accomplish. The first has the Keywords, their Relevance scores, and the URLs they came from. The second has the results of the online word frequency calculator that I dropped all of the Keywords into.
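To make the Output 1 step concrete, here is a minimal sketch of the filter-and-count logic, assuming the Watson keywords have already been pulled into (phrase, relevance) pairs, one row per keyword per URL, mirroring the keywords-watson spreadsheet. The function name and input shape are my own illustration, not an existing tool:

```python
from collections import Counter

def output_1(keyword_rows, min_relevance=0.3):
    """Keep phrases scoring at least min_relevance, then sort by frequency."""
    kept = [phrase.lower() for phrase, relevance in keyword_rows
            if relevance >= min_relevance]
    counts = Counter(kept)
    # most_common() returns (phrase, frequency) pairs, highest frequency first
    return counts.most_common()

# Toy rows standing in for the spreadsheet data
rows = [("apple pie", 0.95), ("pie crust", 0.62), ("apple pie", 0.88),
        ("oven rack", 0.21), ("pie crust", 0.45)]
print(output_1(rows))  # -> [('apple pie', 2), ('pie crust', 2)]
```

Note that "oven rack" drops out because its 0.21 relevance is below the 0.3 cutoff; this replaces the manual spreadsheet-plus-word-frequency-calculator step.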
Output number 2: A list of the top 30 URLs for the keyword phrase, along with:
- Number of words on the web page
- Content score (0-100 scale, converted to a letter grade)
The content score will come from a basic algorithm that calculates the percentage of the phrases from Output 1 that appear on that particular web page. I have attached a report for the top 30 URLs that rank on Google for "apple pie recipe." The report comes from a working program that uses the same data from Watson's NLU. The goal is a simple algorithm that produces scores similar to those on that report.
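A minimal sketch of the Output 2 scoring idea follows. The letter-grade cutoffs (90/80/70/60) are placeholders I chose, not taken from the attached report, and would need tuning until the scores line up with it:

```python
def content_score(page_text, phrases):
    """Percentage of Output 1 phrases that appear somewhere in the page text."""
    if not phrases:
        return 0.0
    text = page_text.lower()
    hits = sum(1 for p in phrases if p.lower() in text)
    return 100.0 * hits / len(phrases)

def letter_grade(score):
    # Assumed cutoffs -- adjust to match the reference report
    for cutoff, grade in ((90, "A"), (80, "B"), (70, "C"), (60, "D")):
        if score >= cutoff:
            return grade
    return "F"

page = "This apple pie recipe uses a buttery pie crust and fresh apples."
phrases = ["apple pie", "pie crust", "lattice top", "oven rack"]
word_count = len(page.split())       # word count for the Output 2 report
score = content_score(page, phrases) # 2 of 4 phrases found -> 50.0
print(word_count, score, letter_grade(score))  # -> 12 50.0 F
```

In the real program, page_text would be the scraped body text of each ranking URL rather than a hard-coded string.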
Here are the customization docs for Watson's NLU: https://cloud.ibm.com/docs/natural-language-understanding?topic=natural-language-understanding-customizing
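For reference, fetching the Keywords data programmatically would go through the ibm-watson Python SDK; the call is shown in comments because it needs an API key, while the response-flattening step below is pure and testable. The sample response follows the keywords shape documented for the NLU analyze API; the helper name is my own:

```python
# Typical SDK call (requires IBM Cloud credentials):
#
#   from ibm_watson import NaturalLanguageUnderstandingV1
#   from ibm_cloud_sdk_core.authenticators import IAMAuthenticator
#   from ibm_watson.natural_language_understanding_v1 import Features, KeywordsOptions
#
#   nlu = NaturalLanguageUnderstandingV1(version="2022-04-07",
#                                        authenticator=IAMAuthenticator("YOUR_API_KEY"))
#   response = nlu.analyze(url=some_url,
#                          features=Features(keywords=KeywordsOptions(limit=50))).get_result()

def keyword_rows(response, url):
    """Flatten an NLU keywords response into (phrase, relevance, url) rows."""
    return [(kw["text"], kw["relevance"], url)
            for kw in response.get("keywords", [])]

# Sample response matching the documented NLU keywords output shape
sample = {"keywords": [{"text": "apple pie", "relevance": 0.95},
                       {"text": "pie crust", "relevance": 0.62}]}
print(keyword_rows(sample, "https://example.com/apple-pie"))
```

Running keyword_rows over all 30 URLs would produce exactly the rows the Output 1 filter needs.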