We are using the Chrome extension [login to view URL] to scrape a website. It uses two scraping sitemaps (in webscraper's terminology): one retrieves a list of all products from a web store, and the other visits the pages of new products and extracts the actual information we need. Both sitemaps run fine, but they are started manually, and the list of new product URLs is also set manually.
So I want a script that runs on a Windows machine (probably a laptop) and does this:
1) Run the first sitemap.
2) Read the results wherever they are stored, and send them via HTTP POST to our server.
3) Later, poll the server and get a list of URLs to visit - about a hundred each time.
4) Edit the webscraper's sitemap JSON file to include that list of URLs (if any).
5) Run webscraper with the new sitemap.
6) Read the results and send them to the server again via HTTP POST.
7) Repeat from point 3 until the next day, then repeat from point 1.
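To make steps 2 to 6 more concrete, here is a minimal Python sketch of the parts that do not depend on how the extension itself is launched. It is only an illustration under assumptions: the sitemap is assumed to be a JSON file that keeps its start URLs in a top-level `startUrl` list, the server is assumed to accept and return JSON, and the function names and URLs are hypothetical. Running the sitemap and reading its results (steps 1 and 5) are left out because they depend on the scraper setup.

```python
import json
import urllib.request
from pathlib import Path


def set_start_urls(sitemap_path, urls):
    """Step 4: write the new URL list into the sitemap JSON file.

    Assumes the sitemap stores its start URLs under the key "startUrl".
    If the list is empty ("if any"), the file is left untouched.
    """
    path = Path(sitemap_path)
    sitemap = json.loads(path.read_text(encoding="utf-8"))
    if urls:
        sitemap["startUrl"] = list(urls)
        path.write_text(json.dumps(sitemap, indent=2), encoding="utf-8")
    return sitemap


def build_post(server_url, rows):
    """Steps 2 and 6: build an HTTP POST request carrying the rows as JSON."""
    body = json.dumps(rows).encode("utf-8")
    return urllib.request.Request(
        server_url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def fetch_urls(server_url):
    """Step 3: poll the server; assumes it answers with a JSON list of URLs."""
    with urllib.request.urlopen(server_url, timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

A request built with `build_post` would be sent with `urllib.request.urlopen(request)`. Only the standard library is used, so nothing beyond a stock Python install on Windows would be needed for this part.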
It must also be possible to run the scripts manually, for testing.
If possible, the new scripts (or macros, or whatever is used) should run on a standard Windows machine. They have to be editable, report what is happening, and have a verbose mode for debugging.
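As an illustration of the "inform about what is happening" and "verbose mode" requirements, something along these lines would be enough, assuming the script is written in Python. The `-v`/`--verbose` flag and the logger name `scraper` are hypothetical choices, not part of any existing tool.

```python
import argparse
import logging


def configure_logging(argv=None):
    """Parse the command line and set up logging.

    Normal runs report progress at INFO level; -v/--verbose adds DEBUG detail.
    """
    parser = argparse.ArgumentParser(description="Scraping pipeline runner")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="show debug output")
    args = parser.parse_args(argv)
    level = logging.DEBUG if args.verbose else logging.INFO
    # force=True replaces any earlier logging setup (Python 3.8+)
    logging.basicConfig(level=level,
                        format="%(asctime)s %(levelname)s %(message)s",
                        force=True)
    return logging.getLogger("scraper")
```

Each step would then call `log.info(...)` for progress messages and `log.debug(...)` for details such as the URLs received or the size of the posted data.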
Hi, dear client. I am an expert data miner and site crawler. You can check my skills in my work history and reviews. I can do this perfectly. Thanks