This project involves improving the speed of an inner loop in our system by using MMX/SSE/SSE2 instructions. The work is performed in a "sandbox"-style project that contains a reference implementation and a test harness that benchmarks the code and checks the results for correctness.
The operation in question works on grayscale images and computes a sum of differences over the whole image. Pixels are simple unsigned bytes.
A "naive" implementation of a portion of the loop is given in the project material. A bonus is possible if the speedup is significantly larger than required.
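For bidders who want a feel for the kind of loop involved, here is a minimal sketch. It is not the code from the project material. It assumes the operation is a sum of absolute differences between two equally sized 8-bit images, which may not match the actual loop. The function names `sad_scalar` and `sad_sse2` are hypothetical, and the code assumes a compiler that provides `<stdint.h>` and `<emmintrin.h>`.

```c
#include <stddef.h>
#include <stdint.h>
#include <stdlib.h>     /* abs */
#include <emmintrin.h>  /* SSE2 intrinsics */

/* Scalar reference (hypothetical): sum of |a[i] - b[i]| over n pixels. */
static uint64_t sad_scalar(const uint8_t *a, const uint8_t *b, size_t n)
{
    uint64_t sum = 0;
    size_t i;
    for (i = 0; i < n; ++i)
        sum += (uint64_t)abs((int)a[i] - (int)b[i]);
    return sum;
}

/* SSE2 sketch (hypothetical): PSADBW processes 16 pixels per instruction.
 * Unaligned loads are used so the buffers need no special alignment. */
static uint64_t sad_sse2(const uint8_t *a, const uint8_t *b, size_t n)
{
    __m128i acc = _mm_setzero_si128();
    uint64_t lanes[2];
    uint64_t sum;
    size_t i = 0;

    for (; i + 16 <= n; i += 16) {
        __m128i va = _mm_loadu_si128((const __m128i *)(a + i));
        __m128i vb = _mm_loadu_si128((const __m128i *)(b + i));
        /* _mm_sad_epu8 leaves one partial sum in each 64-bit lane. */
        acc = _mm_add_epi64(acc, _mm_sad_epu8(va, vb));
    }

    /* Combine the two 64-bit partial sums. */
    _mm_storeu_si128((__m128i *)lanes, acc);
    sum = lanes[0] + lanes[1];

    /* Handle the remaining pixels (fewer than 16) with scalar code. */
    for (; i < n; ++i)
        sum += (uint64_t)abs((int)a[i] - (int)b[i]);
    return sum;
}
```

Under these assumptions both functions return identical integer results, so a harness could compare them exactly. If the real operation accumulates in floating point or uses signed differences, the comparison would instead need the specified margin of error.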
1) Complete and fully-functional working program in executable form as well as complete source code of all work done.
2) Results of operation must be correct to a specified small margin of error. Timing results must indicate a [url removed, login to view] speedup or more over the included current MMX implementation.
3) Deliverables are: a) all source code needed, b) project files for the specified build environment, ready to compile and run, and c) binaries ready to run. All deliverables are to be contained in a zip (or similar) archive.
4) All deliverables will be considered "work made for hire" under U.S. Copyright law. Buyer will receive exclusive and complete copyrights to all work purchased. (No GPL, GNU, 3rd party components, etc. unless all copyright ramifications are explained AND AGREED TO by the buyer on the site per the coder's Seller Legal Agreement).
The resulting code should run on processors with MMX support under XP, compiled with Visual Studio [url removed, login to view]
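Since the code must still run on processors that only have MMX, an implementation that also uses SSE or SSE2 would likely need to pick a code path at run time. The following is a rough sketch of one way to do that by reading the CPUID feature bits (leaf 1, EDX bit 23 for MMX, bit 25 for SSE, bit 26 for SSE2). The helper names and the `choose_code_path` function are hypothetical, and the sketch assumes a compiler that provides `__cpuid` in `<intrin.h>` (MSVC) or `__get_cpuid` in `<cpuid.h>` (GCC/Clang).

```c
#if defined(_MSC_VER)
#include <intrin.h>     /* __cpuid */
#else
#include <cpuid.h>      /* __get_cpuid (GCC/Clang) */
#endif

/* Returns the EDX feature flags from CPUID leaf 1, or 0 if unavailable. */
static unsigned int cpu_feature_flags_edx(void)
{
#if defined(_MSC_VER)
    int regs[4] = {0, 0, 0, 0};
    __cpuid(regs, 1);
    return (unsigned int)regs[3];
#else
    unsigned int eax = 0, ebx = 0, ecx = 0, edx = 0;
    if (!__get_cpuid(1, &eax, &ebx, &ecx, &edx))
        return 0;
    return edx;
#endif
}

static int cpu_has_mmx(void)  { return (int)((cpu_feature_flags_edx() >> 23) & 1u); }
static int cpu_has_sse(void)  { return (int)((cpu_feature_flags_edx() >> 25) & 1u); }
static int cpu_has_sse2(void) { return (int)((cpu_feature_flags_edx() >> 26) & 1u); }

/* Hypothetical selector: names the fastest code path the CPU supports. */
static const char *choose_code_path(void)
{
    if (cpu_has_sse2()) return "sse2";
    if (cpu_has_sse())  return "sse";
    if (cpu_has_mmx())  return "mmx";
    return "scalar";
}
```

A real implementation would more likely store a function pointer to the selected kernel once at startup, so the check stays out of the inner loop.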