We are implementing the Squish-E indexing engine to spider a number of websites. However, the spider it currently ships with does no pre-processing on the HTML before sending it to the indexing engine. We need someone to make the following modifications to the spider:

1. Read new configuration data that specifies textual replacements in the HTML document. For example, we would like to configure the following in the spider config doc:

#replace "" ""
This would replace the above HTML comment with the XML tag .

#insert " there " 50
This would insert the enclosed text at line 50.

#span class=main_article_title title
This would replace this: Are .NET Code Generators Worth It? with:

There will be other modifications we need as we find new sites that we want to index, but the above enhancements are needed immediately.
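To make the request more concrete, here is a rough sketch in Python of how the #replace and #insert directives might be parsed and applied to a page before it is handed to the indexer. It is only an illustration under stated assumptions: it is standalone and not tied to the spider's actual code or config format, the function names parse_directives and preprocess are hypothetical, line numbers are assumed to be 1-based, and the #span directive is left out because its intended output is not shown above.

```python
import shlex


def parse_directives(config_text):
    """Collect '#replace' and '#insert' directives from config text.

    Hypothetical helper: returns a list of (name, args) tuples and
    silently ignores lines it does not recognise.
    """
    directives = []
    for line in config_text.splitlines():
        line = line.strip()
        if not line.startswith("#"):
            continue
        # shlex.split keeps double-quoted arguments together.
        parts = shlex.split(line[1:])
        if not parts:
            continue
        name, args = parts[0], parts[1:]
        if name == "replace" and len(args) == 2 and args[0]:
            directives.append(("replace", (args[0], args[1])))
        elif name == "insert" and len(args) == 2 and args[1].isdigit():
            directives.append(("insert", (args[0], int(args[1]))))
    return directives


def preprocess(html, directives):
    """Apply the directives, in order, to one HTML document."""
    for name, args in directives:
        if name == "replace":
            old, new = args
            html = html.replace(old, new)
        elif name == "insert":
            text, line_no = args
            lines = html.split("\n")
            # Assumption: the inserted text becomes line `line_no` (1-based);
            # out-of-range numbers are clamped to the start or end.
            index = max(0, min(line_no - 1, len(lines)))
            lines.insert(index, text)
            html = "\n".join(lines)
    return html
```

For example, with made-up directives such as `#replace "<!-- start -->" "<article>"` and `#insert "<hr>" 2`, calling preprocess(page, parse_directives(config)) would swap the comment for the tag and then place the extra text on line 2 of the page.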
1) Updated spider script. 2) Approval for the project will be based on a successful implementation of each new modification and a test search against a website using Squish.
We are running on Windows 2000, but Squish is a cross-platform, open-source product.