Closed

C# web crawler

I would like a web crawler written in C#.

Requirements:

- given a specific list of websites, crawl the entire content of each site

- some strategy that lets the crawl scale to a decent level on a single machine; I would expect the crawler to use multithreading or asynchronous I/O to reach processing speeds of at least 5 pages/second
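A minimal sketch of the asynchronous-I/O option, written in Python as a language-neutral illustration (in C# the same shape maps onto `HttpClient` with `Task`-based async plus a `SemaphoreSlim` or `Channel<T>`). A fixed pool of worker tasks pulls URLs from a shared queue, so many downloads are in flight at once on a single thread. The `fetch` callback and the `crawl` function are hypothetical names; `fetch` is assumed to download a page and return the raw `href` values it contains, and injecting it keeps the loop testable without a network.

```python
import asyncio
from typing import Awaitable, Callable, Iterable, List, Set
from urllib.parse import urldefrag, urljoin, urlsplit

# Hypothetical interface: fetch(url) downloads a page and returns the raw
# href values found on it. Passing it in keeps the crawl loop testable.
Fetcher = Callable[[str], Awaitable[List[str]]]


async def crawl(seeds: Iterable[str], fetch: Fetcher, workers: int = 8) -> Set[str]:
    """Visit every page reachable from the seeds without leaving their hosts."""
    seeds = list(seeds)
    allowed_hosts = {urlsplit(u).netloc.lower() for u in seeds}
    queue: "asyncio.Queue[str]" = asyncio.Queue()
    seen: Set[str] = set()

    def enqueue(url: str) -> None:
        url = urldefrag(url).url  # "#section" links point at the same page
        if url not in seen and urlsplit(url).netloc.lower() in allowed_hosts:
            seen.add(url)  # mark when queued, so no URL is ever queued twice
            queue.put_nowait(url)

    for url in seeds:
        enqueue(url)

    async def worker() -> None:
        while True:
            url = await queue.get()
            try:
                for href in await fetch(url):
                    enqueue(urljoin(url, href))  # resolve relative links
            except Exception:
                pass  # a production crawler would log the failure and maybe retry
            finally:
                queue.task_done()

    tasks = [asyncio.create_task(worker()) for _ in range(workers)]
    await queue.join()  # returns once every queued URL has been processed
    for task in tasks:
        task.cancel()  # workers are now idle, blocked on queue.get()
    await asyncio.gather(*tasks, return_exceptions=True)
    return seen
```

When the network is the bottleneck, throughput is roughly `workers / average response time`; for example, 8 workers at 0.5 s per page gives about 16 pages/second, which leaves headroom for the 5 pages/second target. A real crawler would also add a per-host delay and honour `robots.txt` so that this concurrency does not overload any single site.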

Some other questions:

- do you have a strategy for handling dynamic pages, i.e. crawling a site where most of the content is hidden behind a form?

- what is your approach to making sure the crawler doesn't revisit the same pages? Do you check the URL, or the page content itself?

- what is your approach to handling site revisits?

- it would be nice to have the app decompress pages that are returned compressed (e.g. gzip).
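On avoiding revisits, a common answer is "both": normalize URLs so that different spellings of the same address map to one key, and fingerprint page bodies to catch the same content served under different URLs. The sketch below, in Python as a language-neutral illustration, assumes a particular set of normalization rules (lowercase scheme and host, drop default ports and fragments, sort query parameters); sorting parameters is an assumption that breaks on sites where parameter order matters. The names `normalize_url` and `SeenFilter` are hypothetical. An exact SHA-256 hash only catches byte-identical pages; near-duplicates need a separate technique such as SimHash.

```python
import hashlib
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def normalize_url(url: str) -> str:
    """Map equivalent spellings of a URL to one canonical key.

    Simplified rules (assumptions): userinfo is dropped, and IPv6 literal
    hosts are not handled.
    """
    parts = urlsplit(url)
    scheme = parts.scheme.lower()
    host = (parts.hostname or "").lower()
    port = parts.port
    # Drop default ports so http://x:80/ and http://x/ compare equal.
    if port and not ((scheme == "http" and port == 80) or (scheme == "https" and port == 443)):
        host = f"{host}:{port}"
    path = parts.path or "/"
    # Sort query parameters so ?b=2&a=1 and ?a=1&b=2 compare equal.
    query = urlencode(sorted(parse_qsl(parts.query, keep_blank_values=True)))
    # The fragment (#...) never reaches the server, so it is discarded.
    return urlunsplit((scheme, host, path, query, ""))


class SeenFilter:
    """Remembers normalized URLs and body hashes that were already processed."""

    def __init__(self) -> None:
        self._urls = set()
        self._hashes = set()

    def url_is_new(self, url: str) -> bool:
        key = normalize_url(url)
        if key in self._urls:
            return False
        self._urls.add(key)
        return True

    def content_is_new(self, body: bytes) -> bool:
        digest = hashlib.sha256(body).hexdigest()
        if digest in self._hashes:
            return False
        self._hashes.add(digest)
        return True
```

The URL check runs before a page is queued, so it saves a download. The content check runs after the download, so it cannot save bandwidth, but it stops mirrored or session-ID variants from being processed or stored twice.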
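For site revisits, one possible approach (an assumption, not something the posting specifies) is an adaptive interval per page: revisit pages that changed more often, and pages that did not change less often. Where servers support it, HTTP conditional requests (`If-Modified-Since` or `If-None-Match`, answered with `304 Not Modified`) avoid re-downloading unchanged pages entirely. The Python sketch below covers the fallback of comparing body hashes. `RevisitState`, `after_visit`, and the one-hour/30-day bounds are hypothetical choices. Pages that embed timestamps or ads will look "changed" on every visit unless the volatile parts are stripped before hashing.

```python
import hashlib
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class RevisitState:
    interval: float                  # seconds to wait before the next visit
    last_hash: Optional[str] = None  # SHA-256 of the body seen on the last visit


def after_visit(state: RevisitState, body: bytes,
                min_interval: float = 3600.0,
                max_interval: float = 30 * 86400.0) -> RevisitState:
    """Return the updated schedule for a page that was just fetched.

    A changed page gets its interval halved and an unchanged page gets it
    doubled, clamped to [min_interval, max_interval]. The first visit has
    nothing to compare against, so it keeps the current interval.
    """
    digest = hashlib.sha256(body).hexdigest()
    if state.last_hash is None:
        interval = state.interval
    elif digest != state.last_hash:
        interval = max(min_interval, state.interval / 2)
    else:
        interval = min(max_interval, state.interval * 2)
    return RevisitState(interval, digest)
```

A scheduler would store one `RevisitState` per URL and keep URLs in a priority queue ordered by `last_visit_time + interval`.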
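Compressed pages are signalled by the HTTP `Content-Encoding` response header. In .NET, setting `HttpClientHandler.AutomaticDecompression = DecompressionMethods.GZip | DecompressionMethods.Deflate` makes `HttpClient` decompress transparently. The Python sketch below shows what that does by hand. `decode_body` is a hypothetical name, and other codings such as Brotli (`br`) are assumed to be out of scope here, so they raise an error.

```python
import gzip
import zlib
from typing import Optional


def decode_body(body: bytes, content_encoding: Optional[str]) -> bytes:
    """Undo the Content-Encoding of an HTTP response body (gzip/deflate only)."""
    if not content_encoding:
        return body
    # Codings are listed in the order they were applied, so undo them in reverse.
    codings = [c.strip().lower() for c in content_encoding.split(",")]
    for coding in reversed(codings):
        if coding in ("gzip", "x-gzip"):
            body = gzip.decompress(body)
        elif coding == "deflate":
            try:
                body = zlib.decompress(body)  # zlib-wrapped, as the HTTP spec defines it
            except zlib.error:
                # Some servers send raw deflate without the zlib header.
                body = zlib.decompress(body, -zlib.MAX_WBITS)
        elif coding in ("identity", ""):
            continue
        else:
            raise ValueError(f"unsupported Content-Encoding: {coding}")
    return body
```

The raw-deflate fallback exists because servers historically disagreed about what `deflate` means; trying the zlib-wrapped form first and falling back covers both.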

I'm not looking for any add-on indexing or search functionality, just a high-quality crawler with clean, modular code that I can integrate into my other applications. The deliverables are the working source code plus a reasonably small amount of availability for questions (obviously, the better commented and cleaner the code, the less need there will be for questions).

I don't have a huge amount of time, so the winning bidder will likely be someone who has already built similar crawlers and has therefore already thought through the key issues.

Please bid the real price you'd like to charge (ignore the range I've selected below; it's clearly too low).

Skills: .NET


About the employer:
(0 reviews) Vienna, Austria

Project ID: #21663