I am looking for a developer with the skills to build a web crawler and data-extraction spider. I am only interested in providers with experience in crawler development.
The spider needs to perform these functions:
1) Search for all sites in a particular country that match specific subject/search terms. This could be done by querying Google for a list of matching sites, which then becomes the list of sites to crawl regularly.
2) Crawl the sites on the list from step 1 and search them for specific types of items.
3) If an item type is found on a site, extract the data from its web pages as cleanly as possible. This process should strip out as much HTML, CSS and other markup as possible, leaving data that is relevant and free of distracting tags.
4) Write the extracted content to a table in a MySQL database.
All code must be PHP 5+ compatible and able to run either as a Windows executable or as a PHP script.
There will be further projects for a developer with the right skill set, experience and sheer ability to deliver high-quality systems at a reasonable price.