This work is for an adult movie review website. If you are not comfortable with that (or are under 18), please don't bid. Don't go to our site or click on any link in this posting.
This project is to complete and tune a page scrape, a data import and a little data manipulation. Each problem is described below and summarized in the numbered items.
Sending the UPC from our form to the scrape URL is no longer working (sending the title to another URL still works and can be used as a comparison). If we build the URL by hand the same way the PHP code does, with the UPC added at the end, it does present a page in the browser.
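As a rough illustration of the "UPC added at the end" behaviour described above, here is a minimal sketch. It is written in Python only to show the logic; the site itself uses PHP. The base URL and the function name `build_upc_url` are hypothetical, since the real scrape URL is not given in this posting.

```python
from urllib.parse import quote

# Hypothetical base URL: the real scrape URL is not given in this posting.
SCRAPE_BASE_URL = "https://example.com/search/"


def build_upc_url(upc, base_url=SCRAPE_BASE_URL):
    """Return base_url with the UPC appended at the end.

    Assumption: the UPC is digits only once surrounding whitespace
    from the form field has been trimmed.
    """
    cleaned = str(upc).strip()
    if not (cleaned.isascii() and cleaned.isdigit()):
        raise ValueError(f"UPC must contain digits only, got {upc!r}")
    # quote() is a no-op for plain digits but keeps the URL safe if the rule changes.
    return base_url + quote(cleaned)
```

Comparing the string this kind of function produces with the URL that works by hand in the browser may be a quick way to see where the form submission goes wrong.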
The product ID from this scrape is being imported without its first several letters. For example, UPC 657447006982 saves "WSENS785" as the title_id, while the full ID on the site is "DVDNEWSENS785". The problem is consistent: the first 5 characters are always dropped. Many files have already been imported incorrectly, so we need to reconstruct these IDs. They all start with "DVD", followed by the first two characters of the movie studio name (here "NEWSENSATION", so "NE"). Another example: "DVDGREEDY16" saves as "EEDY16". In our DB the company name is in one field and the title_id is in another, so add "DVD" and the first two characters of the company field to the front of the title_id field.
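The reconstruction rule above could be sketched roughly as follows. This is a Python illustration of the logic only (the site is PHP). The function name `reconstruct_title_id` is hypothetical, and so are the assumptions that the company value should be upper-cased with spaces removed and that any ID already starting with "DVD" is complete.

```python
def reconstruct_title_id(title_id, company):
    """Prepend "DVD" plus the first two characters of the company name.

    Assumptions (not confirmed by the posting): the company field may
    contain spaces or lower case, and an ID that already starts with
    "DVD" was imported correctly and must be left alone.
    """
    title_id = title_id.strip()
    if title_id.upper().startswith("DVD"):
        return title_id  # already a full ID, do not prefix twice
    company_key = company.replace(" ", "").upper()
    if len(company_key) < 2:
        raise ValueError(f"company name too short to rebuild ID: {company!r}")
    return "DVD" + company_key[:2] + title_id


# Example from the posting: the studio is "NEWSENSATION".
print(reconstruct_title_id("WSENS785", "NEWSENSATION"))
# The company value "GREEDY" below is a hypothetical stand-in for the DB field.
print(reconstruct_title_id("EEDY16", "GREEDY"))
```

The guard for IDs that already start with "DVD" makes the repair safe to run more than once over the same records.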
Also, for some 2-disc set titles, we need the parentheses stripped before import. For example, when the data from the scrape contains "(2-disc set)", the routine strips out the "2-disc set" but leaves the empty parentheses "()".
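One way to remove the note together with its parentheses is sketched below, in Python for illustration only (the site is PHP). The function name `strip_disc_set_note` and the exact pattern are assumptions. The pattern also removes a leftover "()" and leaves other parenthesized text untouched.

```python
import re

# Matches "(2-disc set)", "(3 disc set)" and a leftover empty "()".
# Assumption: the note always has the form "<number>-disc set".
DISC_SET_NOTE = re.compile(r"\(\s*(?:\d+\s*-?\s*disc\s+set)?\s*\)", re.IGNORECASE)


def strip_disc_set_note(title):
    """Remove the disc-set note, including its parentheses, from a title."""
    cleaned = DISC_SET_NOTE.sub("", title)
    # Collapse any double spaces left behind and trim the ends.
    return re.sub(r"\s{2,}", " ", cleaned).strip()
```

For example, `strip_disc_set_note("Example Title (2-disc set)")` and `strip_disc_set_note("Example Title ()")` would both give `"Example Title"`.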
We have a preview function that shows the scraped data before it is brought into the HTML fields for editing. Clicking on an item in the preview to bring its data into the fields is no longer working. One other feed still works, so you can use that as a comparison for what we are looking for.
1. Fix the function to send the UPC to URL for scrape.
2. Fix scrape to bring in full title_id.
3. Reconstruct title_id from "DVD" and the company name from DB.
4. Fix scrape routine to remove parentheses.
5. Fix field population from data preview.
Data Import & Scrape
We need an import fixed so that it recognizes a different delimited file format (attached). We can get this file as tab, comma, pipe or semicolon delimited. The problem we have is that the data is going into the wrong fields, so the bulk of the work is already done. The import is triggered by placing the file manually on our FTP server.
6. Fix import to map data to the right fields in database.
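For item 6, one possible approach is to detect the delimiter from the header line and then map values by column name instead of by position. The sketch below is in Python for illustration only (the site is PHP). It assumes the file has a header row, which the posting does not confirm, and the column names in the usage example are hypothetical.

```python
import csv
import io

# The four delimiters mentioned in the posting.
CANDIDATE_DELIMITERS = ["\t", ",", "|", ";"]


def detect_delimiter(header_line):
    """Pick the candidate delimiter that occurs most often in the header."""
    counts = {d: header_line.count(d) for d in CANDIDATE_DELIMITERS}
    best = max(counts, key=counts.get)
    if counts[best] == 0:
        raise ValueError("no known delimiter found in header line")
    return best


def read_rows(text):
    """Parse delimited text into dicts keyed by header name.

    Assumption: the first line is a header row naming each column.
    """
    lines = text.splitlines()
    if not lines:
        raise ValueError("import file is empty")
    delimiter = detect_delimiter(lines[0])
    reader = csv.DictReader(io.StringIO(text), delimiter=delimiter)
    return [dict(row) for row in reader]


# Hypothetical column names, for illustration only.
sample = "upc|title|company\n657447006982|Example Title|NEWSENSATION\n"
print(read_rows(sample)[0]["upc"])
```

Mapping by header name rather than column position means a change in column order or delimiter should not send data into the wrong database fields.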
Auto Search Function
Problem: the auto-search feature is triggered after we submit the data from a scraped search to the database. Once we submit the data for a scraped title, an auto-search starts for another title that is in the DB. The auto-search should only happen after we load a title from the DB, add data to that record and re-save it to the DB.
7. Stop the auto-search from running after data for a new record is submitted.
8. Another data import: change the URL where the data is retrieved (we have an old one up now).
In short, this should be straightforward, but the biggest task is just figuring out how it all works. If you are interested and need more info, just ask and we can provide the URLs for our interface as well as the URL for the scrape, etc.
***I am willing to pay more to automate #6. Right now we have to go to a web site interface, choose the date range and category of the information we wish to have, and save the file to our FTP server for import.