I need to extract some information from a number of websites.
All the websites are public and any data extracted will be credited / attributed back to the source website.
Most of the websites have static data, so that should be straightforward.
However, some websites require interaction with the browser to reveal the data, for example clicking a button or opening a drop-down menu.
One site in particular loads data as the user scrolls down, but the data sits inside an iframe, so the user must scroll within the iframe, not the whole page, to reveal it.
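To illustrate the iframe case, here is a minimal Python sketch of the "scroll inside the iframe until nothing new loads" idea. It assumes a Selenium WebDriver object is passed in and that the iframe's own document is what scrolls (not an inner scrollable element). The function name, the `iframe_css` selector, and the `max_rounds` and `pause` values are illustrative assumptions, not details of any of the actual sites.

```python
import time


def scroll_iframe_to_end(driver, iframe_css, max_rounds=30, pause=1.0):
    """Scroll inside an iframe until its height stops growing.

    Returns the HTML of the iframe's document after scrolling.
    `driver` is assumed to be a Selenium WebDriver (hypothetical setup).
    """
    # "css selector" is the string value behind Selenium's By.CSS_SELECTOR.
    frame = driver.find_element("css selector", iframe_css)
    # From here on, scripts and page_source refer to the iframe's document.
    driver.switch_to.frame(frame)
    try:
        height_js = "return document.documentElement.scrollHeight"
        last_height = driver.execute_script(height_js)
        for _ in range(max_rounds):
            driver.execute_script(
                "window.scrollTo(0, document.documentElement.scrollHeight);"
            )
            time.sleep(pause)  # crude wait for lazily loaded rows to arrive
            new_height = driver.execute_script(height_js)
            if new_height == last_height:
                break  # nothing new was loaded, assume we reached the end
            last_height = new_height
        return driver.page_source
    finally:
        # Always return to the top-level page, even if something failed.
        driver.switch_to.default_content()
```

With a real browser this would typically be called after creating a driver (for example `webdriver.Firefox()`) and loading the page with `driver.get(...)`. A fixed pause is the simplest option. An explicit wait on a known element may be more reliable on slow sites.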
I need to be able to extract the data from these websites, preferably in CSV format. In certain cases the data also needs to be manipulated, e.g. removing a decimal point or a comma.
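As a rough illustration of the clean-up and CSV step, the sketch below uses only the Python standard library. The helper names, the column names, and the sample row are made up for the example. The `source` column is one possible way to carry the attribution back to the originating website.

```python
import csv
import io


def clean_value(raw, remove=","):
    """Trim whitespace and delete every character listed in `remove`."""
    cleaned = raw.strip()
    for ch in remove:
        cleaned = cleaned.replace(ch, "")
    return cleaned


def rows_to_csv(rows, header, out):
    """Write a header line followed by the data rows to a file-like object."""
    writer = csv.writer(out)
    writer.writerow(header)
    writer.writerows(rows)


# Hypothetical scraped row: name, raw price text, source URL for attribution.
scraped = [("Widget", " 1,234.50 ", "https://example.com")]

buffer = io.StringIO()
rows_to_csv(
    [(name, clean_value(price, remove=",."), src) for name, price, src in scraped],
    header=("name", "price", "source"),
    out=buffer,
)
csv_text = buffer.getvalue()
```

To write to disk instead of memory, pass a file opened with `open("out.csv", "w", newline="")` as `out`. The `newline=""` argument avoids blank lines between rows on Windows.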
I need this to run as a script on a Linux machine (probably Debian) and, if possible, also in a Windows environment.
This job is initially for three pages only from three different websites. If the results are OK, then I will proceed with the chosen freelancer with the rest of the pages. There are at least 30 pages in total, possibly more.
Also, I need the chosen person to spend a bit of time talking me through the code, so that I can understand what it's doing. I don't want black-box code without any explanation of what is going on. I have some CSS experience and I'm comfortable with basic scripts in Linux, but I'm not an expert.
Please reply with details of which technologies / languages you intend to use. Please do not just reply with "I can do this". I would like to understand what detailed knowledge you have in web scraping. If you can show me a previous web scraping project you've done, then even better.
Please feel free to ask any questions.
33 freelancers are bidding an average of $158 on this job
I can provide you with fast and effective Python + Scrapy spiders that will output to a CSV file just like you want. You'll get easy-to-read source code + instructions on how to install / use it.
Hello! I have read the job description with care. I intend to use Python with Selenium for the web scraping. I have some experience with web scraping in Python. Regards