Completed

Overture keyword extractor

I am looking for a script that uses the Overture Search Term Suggestion Tool to compile a list of ALL keyword phrases that contain a given term as a substring.

Example:

Given the term "clip", the script should produce a text file [url removed] that lists all the keywords from the page

<[url removed]>

and from all the pages linked from that page, and so on.

The result list must be sorted by Count (the number of searches reported by Overture) and must have all duplicate entries removed.
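
The post does not describe how Overture's results are structured, so the following is only a sketch under stated assumptions. The final script must be in Perl or PHP; Python is used here purely to show the logic of the filter, dedupe and sort step. It assumes each scraped entry is a (keyword, count) pair, applies the substring requirement stated at the top, treats keywords that differ only in case or surrounding whitespace as duplicates, keeps the largest count when a keyword appears more than once, and sorts from highest to lowest count. The function name `dedupe_and_sort` and the sample rows are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

def dedupe_and_sort(entries: Iterable[Tuple[str, int]], term: str) -> List[Tuple[str, int]]:
    """Keep keywords containing `term`, drop duplicates, sort by count (highest first)."""
    best: Dict[str, int] = {}
    needle = term.strip().lower()
    for keyword, count in entries:
        # Assumption: duplicates are compared case-insensitively, ignoring outer whitespace.
        key = keyword.strip().lower()
        if needle not in key:
            continue  # not a substring match for the requested term
        # Assumption: if a keyword shows up with different counts, keep the largest one.
        if count > best.get(key, -1):
            best[key] = count
    # Sort by count descending; ties are broken alphabetically so the output file is reproducible.
    return sorted(best.items(), key=lambda kv: (-kv[1], kv[0]))

# Hypothetical sample rows, not real Overture data.
rows = [("paper clip", 1200), ("clip art", 5400), ("Paper Clip", 1200), ("camera", 900)]
for keyword, count in dedupe_and_sort(rows, "clip"):
    print(f"{count}\t{keyword}")
```

Keeping entries in a dictionary keyed by the normalized keyword means duplicates collapse as they arrive, so memory grows with the number of distinct keywords rather than with the number of pages crawled.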

Important: the script must crawl deeply. It should collect every entry relevant to the given term, not only the entries from pages one or two levels down. In other words, the script should stop gathering content only when there are no more links to follow. This is an example of a good page to stop at:

<[url removed]>

That page has no further links.
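
One way to satisfy the "stop only when there are no more links" rule is a breadth-first crawl with a set of already-seen URLs: every new link is queued once, and the crawl ends when the queue is empty. The sketch below assumes a hypothetical `fetch` callback that returns the keyword rows on a page and that page's outgoing links; the actual download and HTML parsing of Overture pages is not shown because the post does not describe the page format. The optional `max_pages` cap is also hypothetical and is off by default. Python only illustrates the approach; the deliverable itself must be Perl or PHP.

```python
from collections import deque
from typing import Callable, List, Optional, Tuple

Row = Tuple[str, int]
# Hypothetical fetcher: URL -> (keyword rows on the page, links found on the page).
Fetcher = Callable[[str], Tuple[List[Row], List[str]]]

def deep_crawl(start_url: str, fetch: Fetcher, max_pages: Optional[int] = None) -> List[Row]:
    """Breadth-first crawl that stops only when no unvisited links remain."""
    seen = {start_url}          # URLs already queued, so each page is fetched at most once
    queue = deque([start_url])  # frontier of pages still to fetch
    rows: List[Row] = []
    fetched = 0
    while queue:
        if max_pages is not None and fetched >= max_pages:
            break  # optional safety cap (hypothetical; not part of the original spec)
        url = queue.popleft()
        keywords, links = fetch(url)
        fetched += 1
        rows.extend(keywords)
        for link in links:
            if link not in seen:
                seen.add(link)
                queue.append(link)
    return rows

# Tiny in-memory "site" standing in for Overture pages; "c" is a leaf with no links.
site = {
    "start": ([("clip art", 5400)], ["a", "b"]),
    "a": ([("paper clip", 1200)], ["b", "start"]),
    "b": ([("clip on", 300)], ["c"]),
    "c": ([("hair clip", 800)], []),
}
print(deep_crawl("start", lambda url: site[url]))
```

Because the loop is iterative rather than recursive, crawl depth is not limited by the call stack, and the seen set keeps pages that link back to each other from being fetched repeatedly, which matters when a popular term spans tens of thousands of pages.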

Requirements:

The script must be written in Perl or PHP and must be runnable from the command line. It must be able to gather data on popular terms, which means it must be able to download and parse tens of thousands of pages; "clip" and "camera" are good examples of popular terms. I will not accept code that works for smaller terms such as "toronto condos" but fails on larger ones.

Hint:

Here is a script that partially does what I need:

<[url removed]>

## Deliverables

1) Complete and fully-functional working program(s) in executable form as well as complete source code of all work done.

2) Deliverables must be in ready-to-run condition, as follows (depending on the nature of the deliverables):

a) For web sites or other server-side deliverables intended to exist in only one place in the Buyer's environment: Deliverables must be installed by the Seller in ready-to-run condition in the Buyer's environment.

b) For all others including desktop software or software the buyer intends to distribute: A software installation package that will install the software in ready-to-run condition on the platform(s) specified in this bid request.

3) All deliverables will be considered "work made for hire" under U.S. Copyright law. Buyer will receive exclusive and complete copyrights to all work purchased. (No GPL, GNU, 3rd party components, etc. unless all copyright ramifications are explained AND AGREED TO by the buyer on the site per the coder's Seller Legal Agreement).

## Platform

Linux box with PHP and Perl installed.

Skills: Perl, PHP


About the employer:
(7 reviews) Canada

Project ID: #3332107

Awarded to:

kreutz

See private message.

$42.5 USD in 10 days
(10 reviews)
4.5

2 freelancers are bidding an average of $43 for this job

jas969

See private message.

$44.2 USD in 10 days
(2 reviews)
1.3