Looking for a web crawler written in Python (2.7.x) that crawls a predefined list of URLs. The crawler should be confined to the input URL list; only the defined domains should be crawled. The script should prompt for a list of URLs. Output should be in XML format, one file per page URL, with each file named after the page URL. Each XML file should contain only the body text, with all HTML tags stripped. Required fields in the XML: page URL, URL domain, and date of crawling. Nice to have: the publishing date of the website (or the date it was last updated). It's a small pilot project.
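To make the requirements above concrete, here is a minimal sketch of one way such a crawler could be structured, using only the standard library. It is not a finished deliverable. The function names (`extract_body_text`, `url_to_filename`, `build_page_xml`, `crawl`, `main`), the XML element names, and the use of the HTTP `Last-Modified` header as a stand-in for the "last updated" date are all assumptions made for illustration; the posting does not specify any of them. The import fallbacks are there so the same code loads on Python 2.7 and Python 3.

```python
import datetime
import os
import re
import xml.etree.ElementTree as ET

try:  # Python 3
    from html.parser import HTMLParser
    from urllib.parse import urlparse
    from urllib.request import urlopen
except ImportError:  # Python 2.7
    from HTMLParser import HTMLParser
    from urlparse import urlparse
    from urllib2 import urlopen


class BodyTextExtractor(HTMLParser):
    """Collects text found inside <body>, skipping <script> and <style>."""
    SKIP = ("script", "style")

    def __init__(self):
        HTMLParser.__init__(self)
        self.in_body = False
        self.skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        if tag == "body":
            self.in_body = True
        elif tag in self.SKIP:
            self.skip_depth += 1

    def handle_endtag(self, tag):
        if tag == "body":
            self.in_body = False
        elif tag in self.SKIP and self.skip_depth:
            self.skip_depth -= 1

    def handle_data(self, data):
        if self.in_body and not self.skip_depth:
            self.chunks.append(data)


def extract_body_text(html):
    """Return the body text of an HTML string with whitespace collapsed."""
    parser = BodyTextExtractor()
    parser.feed(html)
    parser.close()
    return " ".join(" ".join(parser.chunks).split())


def url_to_filename(url):
    """Derive a filesystem-safe XML file name from a page URL."""
    return re.sub(r"[^A-Za-z0-9.-]+", "_", url).strip("_") + ".xml"


def build_page_xml(url, text, crawl_date, last_modified=None):
    """Build the <page> element; the element names are hypothetical."""
    root = ET.Element("page")
    ET.SubElement(root, "page_url").text = url
    ET.SubElement(root, "url_domain").text = urlparse(url).netloc
    ET.SubElement(root, "crawl_date").text = crawl_date
    if last_modified:  # nice-to-have field, only when the server sends it
        ET.SubElement(root, "last_modified").text = last_modified
    ET.SubElement(root, "body_text").text = text
    return root


def crawl(urls, out_dir="."):
    """Fetch only the listed URLs (no link following), one XML file each."""
    today = datetime.date.today().isoformat()
    for url in urls:
        response = urlopen(url)
        html = response.read().decode("utf-8", "replace")  # assumes UTF-8
        modified = response.info().get("Last-Modified")
        page = build_page_xml(url, extract_body_text(html), today, modified)
        path = os.path.join(out_dir, url_to_filename(url))
        ET.ElementTree(page).write(path, encoding="utf-8",
                                   xml_declaration=True)


def main():
    """Ask for a text file with one URL per line, then crawl it."""
    try:
        ask = raw_input  # Python 2.7
    except NameError:
        ask = input  # Python 3
    with open(ask("Path to URL list: ")) as handle:
        crawl([line.strip() for line in handle if line.strip()])
```

Calling `main()` prompts for the list file and writes the XML files to the current directory. Because `crawl` never follows links, it stays confined to the input list by construction. Two known gaps in this sketch: on Python 2.7 character references such as `&amp;` are dropped unless `handle_entityref` and `handle_charref` are also overridden, and the page encoding is assumed rather than detected.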
40 freelancers are bidding an average of €145 on this job
Hi, I have read your project details and can build this crawler in Python. I have experience with similar crawlers in Python. Let's discuss more in chat. Proposed Milestones: €90 EUR - Final Milestone
It looks like quite an easy project. Relevant Skills and Experience: I have experience with web scraping and about 2 years of experience with Python programming. Proposed Milestones: €33 EUR - 33EUR