NO PLACEHOLDER BIDS - ONLY REAL BIDS ADDRESSING THE ACTUAL DETAILS OF THIS PROJECT. We have a tremendous amount of microscope photography: close-up images of paper. We have some images which we know should be the standard (to create training data), and we want to compare all images against those standards. Some photos show papers with tiny black specks from black-and-white printing or plain paper, and some show the tiny yellow dots left by another paper source. All images are of printed papers from different sources. We want to sort all images into those which show signs of color-printer interaction and those which show black-and-white interaction. We also need to attempt to determine how many different papers were used; there are possibly 50 different types of paper, and we have THOUSANDS of fully labeled training photographs. We want to identify what we can from the known standards, then sort whatever is DIFFERENT in the remaining images into as many general categories as possible. We want a fully automated program that will run through thousands of images.
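As one illustrative sketch of the sorting step (not the required method), the yellow-dot versus black-speck distinction can be approximated with simple pixel thresholds; the threshold values below are placeholder assumptions that a real implementation would calibrate from the labeled standards:

```python
import numpy as np

# Assumed, uncalibrated values -- the labeled training standards would
# determine the real thresholds.
YELLOW_MIN_RATIO = 0.0005  # fraction of pixels that must look like yellow dots

def classify_patch(rgb):
    """Label an RGB patch (H x W x 3, uint8) as 'color' if it contains
    yellow-dot pixels above the ratio threshold, else 'black_and_white'."""
    r = rgb[..., 0].astype(int)
    g = rgb[..., 1].astype(int)
    b = rgb[..., 2].astype(int)
    # Yellow dots: red and green channels high, blue noticeably lower.
    yellow = (r > 180) & (g > 180) & (b < 140)
    if yellow.mean() >= YELLOW_MIN_RATIO:
        return "color"
    return "black_and_white"
```

A production version would likely replace these hand-set thresholds with a classifier trained on the labeled photographs, but the input/output shape would be the same: one label per image patch.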
As a feature enhancement to the application functionality already developed, we require the ability to deploy in the Amazon Web Services cloud. Specifically, we wish to launch the application on dedicated EC2 instances, and we would like to use AWS S3 as both the source and target destination for test data.
In addition, we need to ensure that the application is controllable via a web user interface that our operators can access remotely through a web browser. This may or may not already be completed as part of the project scope of work. Within this web interface, operators must be able to select one or more folders from a predefined S3 bucket to process, rather than dragging and dropping or selecting local filesystem files to upload, because our data will be located in S3. The S3 buckets should be defined in configuration/environment variables for the application. Setting them directly within code is acceptable only if it is clear where to configure them.
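A minimal sketch of the bucket configuration, assuming environment variables (the variable names here are hypothetical, not mandated):

```python
import os

# Hypothetical variable names -- the point is that bucket names live in
# the environment (with visible defaults), not buried in application code.
SOURCE_BUCKET = os.environ.get("APP_SOURCE_BUCKET", "example-source-bucket")
PROOF_BUCKET = os.environ.get("APP_PROOF_BUCKET", "example-proof-bucket")
```

An operator or deployment script would then set `APP_SOURCE_BUCKET` and `APP_PROOF_BUCKET` on the EC2 instance before launching the application.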
We should ensure that no stateful data needs to be stored on the instance, so that we can replace instances in a cloud-friendly manner if required. All data should be read from and written to S3, or written out to the web user interface. We do not require the capability to load-balance instances, but we do want to ensure that we can run multiple instances if we need to. We will keep track of processed test data separately.
For each folder that we select to process through the user interface, the software must be able to recurse into sub-folders, possibly many levels deep. Unfortunately, our data is not sorted in a predictable fashion, so the software must find all test data under the specified folder structure. We will have a number of top-level folders with any number of sub-folders containing either further sub-folders or test data. The test data selection process should allow the operator to specify either a root folder or sub-folders to process sequentially (or in parallel if possible).
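The recursion requirement above maps naturally onto S3, which has no real folders: listing every object under a prefix already covers arbitrarily deep nesting. A sketch, assuming boto3 and an assumed set of image extensions (the client would create the real client with `boto3.client("s3")`; it is injected here so the logic stands alone):

```python
# Assumed image extensions for "test data" -- adjust to the real formats.
IMAGE_SUFFIXES = (".jpg", ".jpeg", ".png", ".tif", ".tiff")

def is_test_image(key):
    """Treat any object whose key ends in a known image suffix as test data."""
    return key.lower().endswith(IMAGE_SUFFIXES)

def list_test_images(s3_client, bucket, prefix):
    """Return every image key under `prefix`, however deeply nested.

    S3 keys are flat; Prefix-based listing therefore finds all objects
    in all "sub-folders" without any explicit recursion.
    """
    keys = []
    paginator = s3_client.get_paginator("list_objects_v2")
    for page in paginator.paginate(Bucket=bucket, Prefix=prefix):
        for obj in page.get("Contents", []):
            if is_test_image(obj["Key"]):
                keys.append(obj["Key"])
    return keys
```

Because the result is a plain list of keys, the folders selected in the web interface can be processed sequentially or handed to a worker pool for parallel processing.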
The report data can be delivered via the web user interface, if it is not already. This will allow the operator to save just the report data to the local workstation. The report will need to capture relevant details, including the test data and the file object folder structure(s), in order to identify the data used for the report. You will have to use your best judgment as to how much of this data needs to be presented and what can be truncated. The test-proof data will need to be written to a separate, pre-defined S3 bucket for later archival and possible inspection.
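One way the proof-archival step could work, sketched with boto3's `put_object` (the `proof/` prefix and `.proof.json` suffix are illustrative assumptions, not requirements; the client is injected for the same reason as above):

```python
def archive_proof(s3_client, proof_bucket, source_key, payload):
    """Write test-proof data to the pre-defined proof bucket under a key
    that mirrors the source folder structure, so each archived proof can
    be traced back to the test data it came from.

    The "proof/" prefix and ".proof.json" suffix are assumed naming
    conventions, not requirements from the specification.
    """
    proof_key = "proof/" + source_key + ".proof.json"
    s3_client.put_object(Bucket=proof_bucket, Key=proof_key, Body=payload)
    return proof_key
```

Reusing the source key inside the proof key preserves the folder structure that the report must also capture.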
(see detail sheets and samples)