I am looking into a general web scraping application that could potentially target a large section of a site such as [login to view URL], [login to view URL] or Newegg.com. The data I want to extract will vary from site to site and, of course, depends on how each site is structured and how its information can be searched; my data needs will hinge on that. The following is a general outline of three possible strategies that we might apply to any given site:
1) Enter a search term, extract the search results, and possibly recurse through them.
2) Begin at a particular base page and recurse through all links and sub-links from that starting page.
3) Some pages may require logging in with a username and password before we are entitled to do 1) or 2), or possibly a third kind of task: extracting specific pieces of data from specific pages.
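Strategies 1) and 2) both come down to fetching a page and following its links. As a rough illustration, here is a minimal Python sketch using only the standard library. It assumes the pages are UTF-8 HTML and that the site's search parameter is called `q` (a hypothetical default; every site names it differently). `crawl`, `search_url` and `LinkParser` are illustrative names, not an existing tool. The crawl is breadth-first, stays on the start URL's domain, and stops after `max_depth` link hops or `max_pages` pages.

```python
from collections import deque
from html.parser import HTMLParser
from urllib.parse import urlencode, urljoin, urlparse
from urllib.request import urlopen


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag on a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def fetch_html(url):
    """Download a page; assumes the site serves UTF-8."""
    with urlopen(url, timeout=10) as resp:
        return resp.read().decode("utf-8", errors="replace")


def search_url(base, term, param="q"):
    """Strategy 1: build a search-results URL ('q' is a hypothetical parameter name)."""
    return f"{base}?{urlencode({param: term})}"


def crawl(start_url, max_pages=50, max_depth=2, fetch=fetch_html):
    """Strategy 2: breadth-first crawl of same-domain links from start_url.

    Returns a dict mapping each visited URL to its raw HTML.
    """
    domain = urlparse(start_url).netloc
    queue = deque([(start_url, 0)])
    seen = {start_url}
    pages = {}
    while queue and len(pages) < max_pages:
        url, depth = queue.popleft()
        try:
            html = fetch(url)
        except OSError:
            continue  # skip pages that fail to load (URLError is an OSError)
        pages[url] = html
        if depth >= max_depth:
            continue  # do not follow links any deeper
        parser = LinkParser()
        parser.feed(html)
        for href in parser.links:
            # Resolve relative links and drop #fragments so one page is not queued twice.
            link = urljoin(url, href).split("#")[0]
            if urlparse(link).netloc == domain and link not in seen:
                seen.add(link)
                queue.append((link, depth + 1))
    return pages
```

For strategy 1), something like `crawl(search_url("https://www.example.com/search", "graphics card"), max_depth=1)` would fetch a results page plus the pages it links to. Strategy 3) would need the fetch step to carry session cookies after logging in, for example through `urllib.request.build_opener(urllib.request.HTTPCookieProcessor(http.cookiejar.CookieJar()))` or a `requests.Session`; how the login form is submitted depends entirely on the site.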
On top of all this, one key concern is the amount of storage space required to store the raw data, and whether to propose regular-expression methods to reduce it to the parts we are most likely to be interested in.
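One way to apply regular expressions to the storage problem is to run a small set of field patterns over each page as it is downloaded and store only the matches instead of the whole page. A minimal Python sketch, assuming two hypothetical fields (`title` and `price`); real patterns would have to be written against each site's actual markup:

```python
import re

# Hypothetical field patterns; each site needs its own, tuned to its markup.
PATTERNS = {
    "title": re.compile(r"<title>(.*?)</title>", re.IGNORECASE | re.DOTALL),
    "price": re.compile(r"\$\s?(\d[\d,]*\.\d{2})"),
}


def reduce_page(html, patterns=PATTERNS):
    """Keep only the captured text for each field instead of the full HTML."""
    return {field: [m.strip() for m in pat.findall(html)]
            for field, pat in patterns.items()}
```

The trade-off is that the reduced record is a small fraction of the page size, but anything the patterns did not capture is lost unless the raw HTML is also kept (for example compressed) for re-processing later. Regexes are also brittle when a site changes its markup; an HTML parser such as BeautifulSoup or lxml is usually more robust for structured fields.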
I suppose we could tackle each website as a separate project, since I expect applicants to have extensive experience scraping the websites mentioned, and I will disclose more sites as we progress.
For those applying for this project: please address the question of data reduction, namely whether it is easily performed, its pros and cons, and whether I could be given a user interface to tweak the data reduction method, e.g. an interface to specify keywords or phrases that would help pinpoint which data is extracted.
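A keyword interface does not have to be elaborate: the UI (a web form, or even a JSON file the user edits) only needs to produce a list of keywords or phrases, which the scraper compiles into one case-insensitive pattern and uses to keep matching lines. A minimal Python sketch, assuming a hypothetical `{"keywords": [...]}` config format and line-level filtering; `load_rules` and `filter_text` are illustrative names:

```python
import json
import re


def load_rules(config_text):
    """Compile a user-edited JSON rule set (hypothetical format) into one matcher."""
    rules = json.loads(config_text)
    keywords = rules.get("keywords", [])
    # Escape each keyword so phrases like "4K (HDR)" are matched literally.
    pattern = "|".join(re.escape(k) for k in keywords)
    return re.compile(pattern, re.IGNORECASE) if pattern else None


def filter_text(text, matcher):
    """Return only the lines of text that contain at least one keyword."""
    if matcher is None:
        return []  # no keywords configured, so nothing is selected
    return [line.strip() for line in text.splitlines() if matcher.search(line)]
```

With this split, changing what gets extracted only means editing the keyword list, not the scraper code. Matching whole lines is coarse; the same matcher could be applied to paragraphs or table rows instead.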
16 freelancers are bidding an average of $490 on this job
Hi, I have developed similar scraper/crawler and data/web automation projects. Please let me know if you are interested and I am available to start right away.
Hi, I have built a lot of web scrapers. Here are my past projects: [login to view URL] This project is a perfect fit for me. I hope to work with you. Best regards
Hi, I have developed a Chrome extension that might suit your needs. Please watch this video to see how it works: [login to view URL] Best regards, Youssef