Hi,
I'm very interested in developing your web crawler (it's the reason I joined Freelancer). I've built several crawlers of my own for various purposes, both in my role as a web application developer at Monash University, Australia, and out of personal interest.
Please don't hesitate to contact me if you have any questions about my experience, or about the approach I'm proposing for development of your web crawler.
Assumptions:
I would be undertaking this development using Perl. I have a development server I'd use whilst developing the crawler. After development is complete I'd assist you in migrating the code to the environment in which you'd like it to run.
Parts 1 and 2 of your proposal are easily achieved. Parts 3 and 4, however, will be more challenging.
Part 3:
Before beginning development, we should agree on the desired success rate for article classification. The crawlers I've created for search have indexed the contents of articles, so it would be a small step from there to determine the kinds of content appearing in the index.
Ideally we'd create a test set of articles that have already been categorized, against which the performance of my development could be measured.
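To illustrate how such a test set could be used, here is a minimal sketch in Python (chosen only for brevity; the delivered code would be in Perl as noted above). The function names, the sample articles and the keyword-based classifier are all hypothetical stand-ins, not the classifier I would deliver. The point is simply that the success rate becomes a single number we can agree on and re-measure.

```python
def classification_accuracy(test_set, classify):
    """Return the fraction of pre-categorized articles the classifier gets right.

    test_set is a list of (article_text, expected_category) pairs;
    classify is any function mapping article text to a category name.
    """
    if not test_set:
        raise ValueError("test set is empty")
    correct = sum(1 for text, expected in test_set if classify(text) == expected)
    return correct / len(test_set)


def keyword_classifier(text):
    # Hypothetical placeholder classifier, used only to exercise the measurement.
    words = text.lower().split()
    if "match" in words or "goal" in words:
        return "sport"
    return "other"


# Hypothetical hand-categorized test set.
test_set = [
    ("Late goal wins the match", "sport"),
    ("Shares fall on rate fears", "business"),
    ("Cup match ends in a draw", "sport"),
    ("New telescope images released", "other"),
]

accuracy = classification_accuracy(test_set, keyword_classifier)
print(f"Success rate: {accuracy:.0%}")
```

With a larger test set, the same measurement could be repeated after each refinement to check progress against the agreed rate.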
Part 4:
If the aim here is to find the hyperlinks between articles we're indexing, then this step will be quite simple. However, if it requires comparing the contents of articles to see if they're related, it becomes more complex. Here my approach would be similar to that used for categorization, except that the content would be rated according to how well it matched terms and phrases in other articles in the database.
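The two interpretations can be sketched roughly as follows, again in Python for brevity rather than the Perl I'd use for delivery. All names and sample data here are hypothetical. The relatedness score uses Jaccard overlap of single terms, which is an assumption on my part and only one of several possible measures; it does not yet handle phrases.

```python
import re
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkExtractor(HTMLParser):
    """Collect absolute URLs from the href attributes of <a> tags."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = set()

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    # Resolve relative links against the article's own URL.
                    self.links.add(urljoin(self.base_url, value))


def linked_articles(html, base_url, indexed_urls):
    """Interpretation 1: links from this article to other indexed articles."""
    parser = LinkExtractor(base_url)
    parser.feed(html)
    return parser.links & set(indexed_urls)


def terms(text):
    """Lower-cased set of alphanumeric terms in the text."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def relatedness(text_a, text_b):
    """Interpretation 2: Jaccard overlap of terms, from 0.0 to 1.0."""
    a, b = terms(text_a), terms(text_b)
    if not a or not b:
        return 0.0
    return len(a & b) / len(a | b)


# Hypothetical sample data.
indexed = {"http://example.com/a", "http://example.com/b"}
html = '<p><a href="/b">B</a> <a href="http://other.example/x">X</a></p>'
links = linked_articles(html, "http://example.com/a", indexed)
score = relatedness("solar panel prices fall", "prices of solar panel rise")
```

In practice the score would be compared against a threshold that we'd tune on the same kind of test set proposed for Part 3.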
Milestone:
I would require the milestone payment of 80% once I can demonstrate a script with all the functionality you have described. The remaining 20% would be for refining article categorization and the finding of related articles to your specific requirements.
Time taken for delivery of project:
My estimate of 14 days is elapsed time, so you would have the final product 2 weeks after the project begins. Please let me know if you'd prefer a faster turnaround, as I may be able to reprioritize other projects to accommodate a shorter timeframe.
Thank you for considering my proposal.
Kind regards,
Ed