I would like a programmer to crawl a website and extract data into an MS Access file.
The main site is [url removed, login to view]
All of the information is publicly available, but a significant portion of it is embedded in drop down boxes and secondary links.
Specifically, I would like a programmer to take a list of Newgrounds URLs (which I will provide) and extract a variety of information that is either directly or indirectly accessible from those URLs.
There are around 50 to 60 variables on each submission (URL) that I will need information on.
Furthermore, the randomly selected submissions will not have their data extracted in isolation. After the data for every submission on my list has been extracted, I would like the same information for all other submissions made by the same author. This means that the URLs I provide will not constitute the entire list of URLs ultimately used.
Lastly, it is important to note that not all submissions will be substantive. Some URLs will lead to deleted submissions and their “graveyards” rather than to a full entry, and these will not need full author-portfolio data extraction. Simpler information will be collected from such submissions and their associated sites.
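To illustrate the crawl order described above, here is a minimal sketch in Python. It is only one possible approach, not a required design. The two callables, `get_author` and `list_author_urls`, are hypothetical stand-ins for whatever page-parsing code the programmer writes; the assumption that a deleted ("graveyard") submission can be detected by the absence of an author is also mine.

```python
from collections import deque


def expand_seeds(seed_urls, get_author, list_author_urls):
    """Split the work into full-extraction URLs and graveyard URLs.

    get_author(url): hypothetical helper returning the author's name,
        or None when the URL leads to a deleted ("graveyard") submission.
    list_author_urls(author): hypothetical helper returning every
        submission URL in that author's portfolio.
    """
    queue = deque(seed_urls)   # provided URLs are processed first
    seen_urls = set()
    seen_authors = set()
    full, graveyard = [], []

    while queue:
        url = queue.popleft()
        if url in seen_urls:
            continue           # never extract the same submission twice
        seen_urls.add(url)

        author = get_author(url)
        if author is None:
            graveyard.append(url)   # simpler extraction, no portfolio crawl
            continue

        full.append(url)
        if author not in seen_authors:
            seen_authors.add(author)
            # Portfolio URLs go to the back of the queue, so they are
            # handled after all of the provided URLs.
            queue.extend(list_author_urls(author))

    return full, graveyard
```

Passing the helpers in as arguments keeps the ordering logic separate from the scraping itself, so it can be checked against a handful of hand-made URLs before the real crawl starts.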
If the collection of some variables proves especially cumbersome, I am willing to drop a few. Total collection, however, would be preferable.
Once an arrangement has been reached, I will provide the successful bidder with a list of the actual URLs, a sample of the data, more detailed instructions, and a complete variable list. The URLs will number around 700.
Please note that a format other than MS Access can be used, such as standard comma-separated values (CSV), as long as the data can be efficiently imported into Stata 7.
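As a rough sketch of what a CSV delivery might look like, the Python below writes one row per submission with a header of short, lowercase column names. The helper names, the sample labels, and the 32-character cap on column names are all assumptions for illustration; the actual names should follow the variable list and whatever Stata accepts on import.

```python
import csv
import re


def to_column_name(label, max_len=32):
    """Turn a human-readable variable label into a short identifier.

    max_len is an assumed limit; adjust it to the target Stata version.
    """
    name = re.sub(r"[^0-9a-z]+", "_", label.lower()).strip("_")
    if not name or name[0].isdigit():
        name = "v_" + name     # avoid empty names and leading digits
    return name[:max_len]


def write_rows(rows, labels, out):
    """Write rows (dicts keyed by label) to the open text file `out`.

    Missing values are written as empty fields. Returns the column names.
    """
    columns = [to_column_name(label) for label in labels]
    writer = csv.DictWriter(out, fieldnames=columns, lineterminator="\n")
    writer.writeheader()
    for row in rows:
        writer.writerow(
            {col: row.get(label, "") for col, label in zip(columns, labels)}
        )
    return columns
```

For example, calling `write_rows(rows, ["Submission URL", "Author name"], open("submissions.csv", "w", newline=""))` would produce a file with the header `submission_url,author_name`. The file name here is hypothetical.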
Please find below the full variable list, with locations and samples. The URLs haven't been prepared yet, but there will be about 700 of them.
See attached for variable list on "graveyard" URLs.
Wow! I like your style of making your requests concrete. It makes your needs clear to me. Ready to start immediately. I hope you'll choose me for this. Best regards, Alex.
Hi, the number of days might be reduced once all the URLs are available and if it is found that the fields to gather are almost the same across all the sites. Best Regards, Water...