The purpose of this project is to create a program that will aid in finding patterns in binary data. This is for research into the possibilities of A.I. and better compression programs for specific files. All the program does is read a file's binary and look for patterns of bits that can be taken out. For instance, if it were looking at a 20-bit file:
A pattern can be seen if we look at the first two 0's: skip two bits and we have two more 0's, and this continues throughout the whole file. However, we don't only want patterns that run through the whole file, as sometimes these don't exist. For instance, in the same example, take the third 1: if you skip three bits you reach another 1, and if you skip three again there is a 1 again. After that the pattern ends. I need a program that searches through the file and finds every possible pattern. I realize this is a large task no matter how much data is being looked at, and I do realize the program will be slow. The second function will be described in the bigger description.
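The kind of search described above can be sketched as a brute-force scan. This is only a minimal sketch under my own assumptions: the bits are held in a Python string of "0" and "1" characters, and the names StridePattern, find_stride_patterns, max_chunk, max_skip and min_repeats are hypothetical. The sample bit strings are made up for illustration. It would be far too slow for gigabytes as written; it is only meant to pin down what "a pattern" means here.

```python
from typing import List, NamedTuple


class StridePattern(NamedTuple):
    start: int    # index of the first bit of the first occurrence
    chunk: str    # the repeated bits, e.g. "00"
    skip: int     # number of bits skipped between occurrences
    repeats: int  # how many times the chunk occurs in a row


def find_stride_patterns(bits: str, max_chunk: int = 4, max_skip: int = 8,
                         min_repeats: int = 3) -> List[StridePattern]:
    """Brute-force search for 'chunk, skip N bits, same chunk again' runs."""
    found = []
    n = len(bits)
    for k in range(1, max_chunk + 1):          # chunk length in bits
        for skip in range(1, max_skip + 1):    # gap between occurrences
            period = k + skip
            for start in range(n - k + 1):
                chunk = bits[start:start + k]
                # Do not report the tail of a run already counted from
                # an earlier start position.
                prev = start - period
                if prev >= 0 and bits[prev:prev + k] == chunk:
                    continue
                repeats = 1
                pos = start + period
                while pos + k <= n and bits[pos:pos + k] == chunk:
                    repeats += 1
                    pos += period
                if repeats >= min_repeats:
                    found.append(StridePattern(start, chunk, skip, repeats))
    return found
```

For the made-up input "1000100010000000", the result includes StridePattern(start=0, chunk='1', skip=3, repeats=3): a pattern that holds three times and then ends, which is the partial kind of pattern we also want reported.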
Once the program has scanned the file, all the possible ways to proceed will be listed. They should be ordered by how much data each will take away, and it should be noted whether or not multiple actions can be taken at once. It should also be stated how much will be taken away. When looking for these patterns we need a system that ensures we can get back all the data in decompression. So if we had our original 20-bit example
and we wanted to get rid of the first two 0's, the program will take note of the bits to be taken out (00), what the pattern is (so, every two bits), and then either how many times this pattern is repeated or that it should just follow the pattern until the end of the file.
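The note the program takes can be sketched as a small record plus a remove/restore pair. This is a rough sketch under my own assumptions: the names Removal, remove_pattern, restore_pattern and bits_removed are hypothetical, the bits are a Python string of "0" and "1", and the repeat count is always stored explicitly (the "follow the pattern until the end of the file" variant is left out to keep it short).

```python
from typing import NamedTuple


class Removal(NamedTuple):
    start: int    # where the first occurrence begins in the original
    chunk: str    # the bits taken out each time, e.g. "00"
    skip: int     # bits kept between one occurrence and the next
    repeats: int  # how many occurrences are taken out


def bits_removed(r: Removal) -> int:
    """Raw saving, before counting the cost of storing the record itself."""
    return len(r.chunk) * r.repeats


def remove_pattern(bits: str, r: Removal) -> str:
    """Take every occurrence of the chunk out; fail if the pattern breaks."""
    k = len(r.chunk)
    period = k + r.skip
    out = [bits[:r.start]]
    pos = r.start
    for _ in range(r.repeats):
        if bits[pos:pos + k] != r.chunk:
            raise ValueError(f"pattern does not hold at bit {pos}")
        out.append(bits[pos + k:pos + period])  # keep the skipped bits
        pos += period
    out.append(bits[pos:])
    return "".join(out)


def restore_pattern(compressed: str, r: Removal) -> str:
    """Inverse of remove_pattern: put the chunk back at every position."""
    out = [compressed[:r.start]]
    pos = r.start
    for _ in range(r.repeats):
        out.append(r.chunk)
        out.append(compressed[pos:pos + r.skip])
        pos += r.skip
    out.append(compressed[pos:])
    return "".join(out)
```

With the made-up input "00110011001100110011" and Removal(0, "00", 2, 5), remove_pattern gives "1111111111" and restore_pattern turns that back into the original 20 bits. Sorting candidate records by bits_removed is one simple way to order the options by how much each takes away.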
Once the program takes note, the 20-bit example will look like this:
This took away our every-three 1's option. If the program could apply both options, this would be optimal. We would be left with:
Once the program has selected its option or options, we can scan the file again, or call it quits there if we can't see any use in continuing the compression. This program has to be able to handle at least a couple of gigabytes at a time, if not an unlimited amount. Any ideas for speeding the process up are appreciated; however, I realize the monumental size of this task for particular files, and I can be patient as long as I know the results will be beneficial. When all options have been chosen, I want to be able to save what we did to an independent .exe that will only work for that particular file. This is so that when compression and decompression are at work, the smaller program's only objective is to compress and/or decompress. We don't need anything fancy like archiving. When the .exe is made, it should be possible to name it whatever is desired at the time. It would also be good if we could save the options that we chose, or edit the .exe with the main program, in case we decide to continue compression. This is planned to [url removed, login to view] selecting the options for the .exe we should be able to see how large the .exe will be.
There are many ways to find patterns, as anyone can tell you. It could be anything from every 53rd bit being part of a pattern to 5 megabytes being exactly the same, so we can take 4 of them out and just remember where they belong. You could say every 4th and 7th bit is part of the pattern. Every size and number of bits should be tested. I can't possibly list every way to find patterns, nor could you, but try your best and be creative. If there is a way to find a particular pattern but it takes a long time, we should still use it and sacrifice the speed. The main objective of this project is thoroughness over speed. We would need an ETA for every scan so we know how long it will take to go through the whole file.
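For the "identical megabytes" kind of pattern mentioned above, one possible approach is to hash fixed-size blocks and group the offsets that share a hash. This is a minimal sketch, not the required method: the function name find_duplicate_blocks and the block_size parameter are hypothetical, only blocks aligned to multiples of block_size are compared, and a thorough tool should still compare the actual bytes before taking anything out.

```python
import hashlib
from collections import defaultdict
from typing import List


def find_duplicate_blocks(data: bytes, block_size: int) -> List[List[int]]:
    """Return groups of offsets whose aligned blocks have the same SHA-256."""
    if block_size <= 0:
        raise ValueError("block_size must be positive")
    seen = defaultdict(list)  # digest -> offsets where that block occurs
    for offset in range(0, len(data) - block_size + 1, block_size):
        block = data[offset:offset + block_size]
        seen[hashlib.sha256(block).digest()].append(offset)
    # Only groups with more than one offset are candidates for removal.
    return [offsets for offsets in seen.values() if len(offsets) > 1]
```

For a group of five matching offsets, the first copy would be kept and the other four taken out, with their offsets remembered so they can be put back. Unaligned or overlapping repeats need a different search and are not covered by this sketch.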
As a last and maybe strange idea, we want the ability to add junk bits to the end of the file. So let's say we compressed our 20-bit example to the 7 bits 0100001, but we need the file to be, say, 14 bits. We should be able to either have random junk bits generated or purposely choose the junk bits to place at the end. The program should make note of where the junk bits begin and end, since remembering what they actually were would in some cases make the smaller .exe larger than it needs to be; because all the junk is at the end of the file's binary, this will help matters along.
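The junk-bit idea can be sketched as follows. The names pad_with_junk and strip_junk are hypothetical, and the bits are again assumed to be a Python string of "0" and "1". Since the junk always sits at the end, only the position where it begins has to be remembered; it ends where the file ends.

```python
import random
from typing import Optional, Tuple


def pad_with_junk(bits: str, target_len: int,
                  junk: Optional[str] = None) -> Tuple[str, int]:
    """Append junk bits up to target_len; return (padded bits, junk start)."""
    need = target_len - len(bits)
    if need < 0:
        raise ValueError("target length is shorter than the data")
    if junk is None:
        # Random junk: its content never needs to be stored.
        junk = "".join(random.choice("01") for _ in range(need))
    elif len(junk) != need or set(junk) - {"0", "1"}:
        raise ValueError(f"chosen junk must be exactly {need} bits of 0/1")
    return bits + junk, len(bits)


def strip_junk(padded: str, junk_start: int) -> str:
    """Drop the junk again; only the start position is required."""
    return padded[:junk_start]
```

For example, pad_with_junk("0100001", 14, junk="1111111") returns ("01000011111111", 7), and strip_junk("01000011111111", 7) gives back "0100001".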