I need a program that counts the occurrences of each word in a document and outputs a **ranked frequency table** listing every word with its frequency.
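As a rough illustration of the kind of table meant here, the following is a minimal Python sketch. Python is used only as an example, since no implementation language is specified. The function name `ranked_word_frequencies` and the rule for what counts as a word (runs of letters, digits, and apostrophes) are assumptions, not requirements.

```python
import re
from collections import Counter

def ranked_word_frequencies(text):
    """Return (word, count) pairs, most frequent first, ignoring case."""
    # Assumption: a "word" is a run of letters, digits, or apostrophes.
    words = re.findall(r"[a-z0-9']+", text.lower())
    counts = Counter(words)
    # Sort by descending count, then alphabetically so ties come out in a fixed order.
    return sorted(counts.items(), key=lambda item: (-item[1], item[0]))

for word, count in ranked_word_frequencies("The cat and the dog and THE bird"):
    print(f"{word}\t{count}")
```

For the sample sentence this lists `the` (3) first, then `and` (2), then the words that occur once in alphabetical order.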
Ideally the program should be able to read MS Word documents, PDF files, and text documents contained in a given folder, and produce a combined list and count for all the documents in that folder. If it can also ignore case and identify high-frequency strings within words, as well as high-frequency phrases of more than one word, so much the better.
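The folder-wide total and the multi-word phrase count could look roughly like the sketch below. It is only an illustration under stated assumptions: it handles plain `.txt` files only (reading Word and PDF files would need additional libraries that are not shown), the function names are hypothetical, and a "phrase" is taken to mean any run of `n` consecutive words, including runs that cross sentence boundaries.

```python
import re
from collections import Counter
from pathlib import Path

def phrase_counts(text, n=2):
    """Count every run of n consecutive words, ignoring case."""
    # Assumption: same word rule as a simple word count (letters, digits, apostrophes).
    words = re.findall(r"[a-z0-9']+", text.lower())
    # Slide a window of n words across the text; shorter texts yield no phrases.
    return Counter(" ".join(words[i:i + n]) for i in range(len(words) - n + 1))

def folder_phrase_counts(folder, n=2):
    """Sum phrase counts over every .txt file directly inside `folder`."""
    total = Counter()
    for path in sorted(Path(folder).glob("*.txt")):
        # Unreadable bytes are skipped rather than aborting the whole run.
        total += phrase_counts(path.read_text(encoding="utf-8", errors="ignore"), n)
    return total

print(phrase_counts("To be or not to be").most_common(1))  # [('to be', 2)]
```

Calling `phrase_counts` with `n=1` gives single-word counts, so the same approach covers both the word table and the phrase table.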
A **second phase** of the project would compare a list of possible abbreviations against the list of words and calculate the number of keystrokes that would have been saved had the abbreviations been used in place of the full words, strings, or phrases. The list of abbreviations will be compiled by hand, but it should be stored so that it can be compared repeatedly against multiple document analyses, i.e. without being re-entered each time.
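One possible reading of the keystroke calculation is: for each abbreviation, multiply the number of occurrences of the full text by the difference in length between the full text and the abbreviation. The Python sketch below assumes that reading, and assumes the hand-compiled list is kept in a two-column CSV file (`full text,abbreviation`) so it can be reused across analyses. The file layout and function names are hypothetical, and any extra keystrokes needed to trigger an abbreviation are not counted.

```python
import csv
import io

def load_abbreviations(csv_file):
    """Read 'full text,abbreviation' rows from an open CSV file into a dict."""
    return {
        row[0].strip().lower(): row[1].strip()
        for row in csv.reader(csv_file)
        if len(row) >= 2
    }

def keystrokes_saved(counts, abbreviations):
    """Return (savings per full text, overall total) for the given counts."""
    savings = {}
    for full, abbr in abbreviations.items():
        # Keystrokes saved = occurrences * (full length - abbreviation length).
        saved = counts.get(full, 0) * (len(full) - len(abbr))
        if saved > 0:
            savings[full] = saved
    return savings, sum(savings.values())

# Example data only; a real run would use counts from a document analysis.
counts = {"because": 10, "with": 4, "the": 50}
abbrevs = load_abbreviations(io.StringIO("because,bc\nwith,w\nnevertheless,nvl\n"))
per_entry, total = keystrokes_saved(counts, abbrevs)
print(per_entry, total)  # {'because': 50, 'with': 12} 62
```

Because the abbreviations are loaded from a file, the same list can be run against any number of analyses without retyping it.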
The system must run under Windows XP.
The **third phase** would see the software deployed on a web server so that users could upload their own folder of files, and receive the analysis automatically once generated.
I am happy to receive quotes for phase 1 alone, 1+2, or all 3.
1) Complete and fully-functional working program(s) in executable form as well as complete source code of all work done.
2) Deliverables must be in ready-to-run condition, as follows (depending on the nature of the deliverables):
a) For web sites or other server-side deliverables intended to only ever exist in one place in the Buyer's environment--Deliverables must be installed by the Seller in ready-to-run condition in the Buyer's environment.
b) For all others including desktop software or software the buyer intends to distribute: A software installation package that will install the software in ready-to-run condition on the platform(s) specified in this bid request.
3) All deliverables will be considered "work made for hire" under U.S. Copyright law. Buyer will receive exclusive and complete copyrights to all work purchased. (No GPL, GNU, 3rd party components, etc. unless all copyright ramifications are explained AND AGREED TO by the buyer on the site per the coder's Seller Legal Agreement).
Windows XP for phases 1 and 2. Open to suggestions for phase 3.