All code must be written in Python within the Plone framework, and all database operations must use SQLAlchemy.
Create a Python application that walks web sites containing bibliographic data and gathers author names. The author names and their paper titles will be stored in a couple of DBMS tables. The application will keep track of how often it has crawled and extracted data from these web sites. When a web site changes (detected by date or diff), it will be crawled again, new information will be added to the database tables, and a timestamp will be noted in the table entries.
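As a rough illustration of the kind of schema bidders might propose, here is a minimal SQLAlchemy sketch of the tables described above. The table and column names (author, paper, crawl_log, page_hash, etc.) are assumptions for discussion, not a required design:

```python
from datetime import datetime, timezone

from sqlalchemy import Column, DateTime, ForeignKey, Integer, String, create_engine
from sqlalchemy.orm import declarative_base, relationship, sessionmaker

Base = declarative_base()

class Author(Base):
    # One row per researcher whose home page is tracked
    __tablename__ = "author"
    id = Column(Integer, primary_key=True)
    name = Column(String(200), nullable=False)
    home_page = Column(String(500))
    papers = relationship("Paper", back_populates="author")

class Paper(Base):
    # Paper titles pasted in by the researcher during "mapping"
    __tablename__ = "paper"
    id = Column(Integer, primary_key=True)
    title = Column(String(500), nullable=False)
    author_id = Column(Integer, ForeignKey("author.id"))
    added_at = Column(DateTime, default=lambda: datetime.now(timezone.utc))
    author = relationship("Author", back_populates="papers")

class CrawlLog(Base):
    # One row per crawl, so the app can track crawl frequency and page changes
    __tablename__ = "crawl_log"
    id = Column(Integer, primary_key=True)
    url = Column(String(500), nullable=False)
    crawled_at = Column(DateTime, default=lambda: datetime.now(timezone.utc))
    page_hash = Column(String(64))  # hash of the cached page, used to detect changes

engine = create_engine("sqlite:///:memory:")
Base.metadata.create_all(engine)
```

The in-memory SQLite engine is only for demonstration; the real deployment would point the connection string at whatever DBMS the project uses.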
This system will be used by a researcher to perform a continuous search. The researcher will keep track of other researchers' home pages, which usually contain listings of papers. The format of these listings varies, so you cannot definitively parse the information; the researcher therefore needs to perform the "mapping" manually. If you initially present the listing to the researcher in HTML format, the researcher can cut and paste the relevant paper titles into entry fields. You can then persist these titles in a "paper" table. You should also cache a copy of the listing web page for future crawls. On subsequent crawls, crawl the page and present the HTML diff text to the researcher; this will most likely contain the titles of new papers.
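The change-detection and diff step described above could be sketched with the standard library alone. This is only one possible approach (hash comparison plus a line-level unified diff); the function names are illustrative:

```python
import difflib
import hashlib

def page_changed(cached_html: str, fresh_html: str) -> bool:
    # Cheap change check: compare content hashes of the cached and fresh pages
    old = hashlib.sha256(cached_html.encode("utf-8")).hexdigest()
    new = hashlib.sha256(fresh_html.encode("utf-8")).hexdigest()
    return old != new

def added_lines(cached_html: str, fresh_html: str) -> list:
    # Lines added since the last crawl -- these are the candidates
    # to present to the researcher as likely new paper titles
    diff = difflib.unified_diff(
        cached_html.splitlines(), fresh_html.splitlines(), lineterm=""
    )
    return [
        line[1:]
        for line in diff
        if line.startswith("+") and not line.startswith("+++")
    ]
```

In the real application the added lines would be rendered as HTML next to entry fields, so the researcher can paste the relevant titles into the paper table.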
Deadline is 5 days. It's strict, so don't bid if you can't make it. Only serious coders will be considered.