I need a script for harvesting data from 99acres.com. The site uses AJAX for its search results.
The script needs to pull the search results from that site into an XML and images folder structure.
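To illustrate the kind of output I have in mind, here is a rough sketch in Python. The layout (one folder per listing holding a listing.xml and an images subfolder) and the field names are only my assumptions, not a fixed requirement, and the function name write_listing is made up for this example.

```python
import xml.etree.ElementTree as ET
from pathlib import Path


def write_listing(out_dir, listing):
    """Write one listing to <out_dir>/<id>/listing.xml and create an images folder.

    `listing` is a plain dict; its keys are hypothetical field names and are
    assumed to be valid XML tag names.
    """
    listing_dir = Path(out_dir) / str(listing["id"])
    # Downloaded photos for this listing would be saved into this folder.
    (listing_dir / "images").mkdir(parents=True, exist_ok=True)

    root = ET.Element("listing", id=str(listing["id"]))
    for field, value in listing.items():
        if field == "id":
            continue  # already stored as an attribute on the root element
        ET.SubElement(root, field).text = str(value)

    xml_path = listing_dir / "listing.xml"
    ET.ElementTree(root).write(str(xml_path), encoding="utf-8", xml_declaration=True)
    return xml_path
```

Calling write_listing("output", {"id": "A1", "title": "2 BHK flat", "price": 4500000}) would create output/A1/listing.xml and an empty output/A1/images folder.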
One more thing: the script needs to support incremental scraping. The first time the script is run, it should get all the data for the search criteria. On each later run, it should get only the results that were not scraped in the previous run, i.e. only new results should be retrieved.
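A minimal sketch of the incremental behaviour, assuming every listing carries a stable "id" field (that field name is my assumption). For simplicity it remembers the IDs from earlier runs in a small JSON file; a database could play the same role. The function names are made up for this example.

```python
import json
from pathlib import Path


def load_seen(path):
    """Return the set of listing IDs recorded by earlier runs (empty on first run)."""
    state_file = Path(path)
    if not state_file.exists():
        return set()
    return set(json.loads(state_file.read_text(encoding="utf-8")))


def save_seen(path, seen):
    """Persist the set of seen listing IDs for the next run."""
    Path(path).write_text(json.dumps(sorted(seen)), encoding="utf-8")


def filter_new(listings, seen):
    """Return only listings whose 'id' is not in `seen`, and record their IDs."""
    fresh = []
    for item in listings:
        if item["id"] not in seen:
            seen.add(item["id"])
            fresh.append(item)
    return fresh
```

On each run the script would call load_seen, pass the scraped results through filter_new, output only what comes back, and then call save_seen.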
One last request for the script (sorry, I am still trying to get the requirements together). There is another real-estate site, magicbricks.com. Is there any way you can scrape both 99acres.com and magicbricks.com, merge the results from both, and remove any duplicates? Maybe MongoDB could be used to store the data temporarily for the comparison, before outputting the final results in the XML/JSON and images structure. I think the DB could then also be used to filter the content for incremental scraping (just an idea).
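A rough sketch of the merge and duplicate removal, done in memory rather than in a database to keep it short. The two sites will not share listing IDs, so it assumes a duplicate is a listing with the same normalized title, locality and price; that matching rule and the field names are only my guess and would need tuning against real data.

```python
import re


def dedupe_key(listing):
    """Build a comparison key from normalized fields (hypothetical field names)."""
    def norm(text):
        # Lowercase and collapse punctuation/whitespace so minor
        # formatting differences between the sites do not matter.
        return re.sub(r"[^a-z0-9]+", " ", str(text).lower()).strip()

    return (norm(listing["title"]), norm(listing["locality"]), int(listing["price"]))


def merge_listings(*sources):
    """Merge listings from several sites, keeping the first copy of each duplicate."""
    merged = {}
    for source in sources:
        for listing in source:
            merged.setdefault(dedupe_key(listing), listing)
    return list(merged.values())
```

For example, merge_listings(results_99acres, results_magicbricks) would return one combined list in which a listing found on both sites appears once, with the 99acres copy kept because it comes first.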