1) CRAWL FOR COMPANY SOCIAL HANDLES - Populate the database with social feeds and qualified media from a given company URL.
The social feeds or handles include:
LinkedIn, YouTube, Vimeo, Twitter, Facebook, Google+, Blog RSS
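A minimal sketch of the handle-discovery step in item 1, assuming the crawler already has each page's HTML as a string. It uses only Python's standard-library html.parser. The SOCIAL_DOMAINS table and the find_social_links name are hypothetical. Detecting the blog feed through a <link type="application/rss+xml"> tag is also an assumption about how sites advertise RSS.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Hypothetical domain -> network mapping; extend as needed.
SOCIAL_DOMAINS = {
    "linkedin.com": "LinkedIn",
    "youtube.com": "YouTube",
    "vimeo.com": "Vimeo",
    "twitter.com": "Twitter",
    "facebook.com": "Facebook",
    "plus.google.com": "Google+",
}


class SocialLinkParser(HTMLParser):
    """Collects outbound social links and RSS feed links from one HTML page."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.found = {}  # network name -> set of absolute URLs

    def _add(self, network, href):
        self.found.setdefault(network, set()).add(urljoin(self.base_url, href))

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        href = attrs.get("href")
        if not href:
            return
        # Assumption: a blog feed is advertised via <link type="application/rss+xml">.
        if tag == "link" and attrs.get("type") == "application/rss+xml":
            self._add("Blog RSS", href)
        elif tag == "a":
            host = urlparse(urljoin(self.base_url, href)).netloc.lower()
            if host.startswith("www."):
                host = host[4:]
            # Match the domain itself or any subdomain of it (e.g. m.facebook.com).
            for domain, network in SOCIAL_DOMAINS.items():
                if host == domain or host.endswith("." + domain):
                    self._add(network, href)
                    break


def find_social_links(html, base_url):
    """Return {network: {absolute URLs}} for the social links found in html."""
    parser = SocialLinkParser(base_url)
    parser.feed(html)
    return parser.found
```

Links are resolved against the page URL, so a relative feed path such as /blog/feed becomes an absolute URL before it is stored.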
2) CRAWL FOR EMBEDDED FILES - Detect media files hosted on, or embedded in, the company site.
The media types include:
A. Raw video files in all formats (.MP4, .MOV, etc.)
B. Embedded video (Vimeo, YouTube)
C. Embedded documents (SlideShare, Scribd)
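The three media categories in item 2 could be told apart roughly as sketched below. The sketch assumes raw videos are recognized by file extension and embedded players by the host of an <iframe> or <embed> src. The extension list, the EMBED_HOSTS table, and the find_media name are hypothetical. Plain <a> links to YouTube or Vimeo are deliberately not counted here, because those are usually channel handles (item 1) rather than embeds.

```python
import posixpath
from html.parser import HTMLParser
from urllib.parse import urljoin, urlparse

# Assumed, non-exhaustive list of raw video extensions (category A).
VIDEO_EXTENSIONS = {".mp4", ".mov", ".webm", ".avi", ".wmv", ".m4v", ".flv", ".mkv"}
# Assumed host patterns for embedded players (categories B and C).
EMBED_HOSTS = {
    "youtube.com": "embedded_video",
    "youtube-nocookie.com": "embedded_video",
    "vimeo.com": "embedded_video",
    "slideshare.net": "embedded_document",
    "scribd.com": "embedded_document",
}


def classify_media_url(url):
    """Return 'raw_video', 'embedded_video', 'embedded_document', or None."""
    parsed = urlparse(url)
    ext = posixpath.splitext(parsed.path)[1].lower()  # query string is ignored
    if ext in VIDEO_EXTENSIONS:
        return "raw_video"
    host = parsed.netloc.lower()
    for domain, kind in EMBED_HOSTS.items():
        if host == domain or host.endswith("." + domain):
            return kind
    return None


class MediaParser(HTMLParser):
    """Collects media URLs from <a>, <video>, <source>, <iframe>, and <embed> tags."""

    SRC_ATTRS = {"a": "href", "video": "src", "source": "src", "iframe": "src", "embed": "src"}

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.media = []  # (kind, absolute_url) pairs, in document order

    def handle_starttag(self, tag, attrs):
        attr = self.SRC_ATTRS.get(tag)
        value = dict(attrs).get(attr) if attr else None
        if not value:
            return
        url = urljoin(self.base_url, value)
        kind = classify_media_url(url)
        # Any tag may point at a raw file; only iframes/embeds count as players.
        if kind == "raw_video" or (kind and tag in ("iframe", "embed")):
            if (kind, url) not in self.media:
                self.media.append((kind, url))


def find_media(html, base_url):
    parser = MediaParser(base_url)
    parser.feed(html)
    return parser.media
```

Scheme-relative embed URLs (//www.youtube.com/embed/...) are resolved by urljoin, which inherits the page's scheme.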
3) CONTENT IMPORT FROM SOCIAL CHANNELS/APIs - For the embedded media platforms (YouTube, Vimeo, SlideShare, Scribd), use each platform's API together with the discovered handle to retrieve additional videos and documents from the company's social feed, checking them for duplicates against the site-crawl results.
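The duplicate check in item 3 is harder than comparing raw strings, because the same video can appear as an embed URL on the site and as a watch URL from the API. A minimal sketch, assuming both sources are reduced to a (platform, id) key before comparison: canonical_key and merge_new_items are hypothetical names, the URL patterns cover only common YouTube and Vimeo shapes, and the API call itself is left out. api_items stands in for whatever a platform API wrapper returns.

```python
import re
from urllib.parse import parse_qs, urlparse


def canonical_key(url):
    """Map different URL shapes for the same item to one hashable key (hypothetical rules)."""
    parsed = urlparse(url)
    host = parsed.netloc.lower()
    if host.startswith("www."):
        host = host[4:]
    path = parsed.path
    if host in ("youtube.com", "m.youtube.com", "youtube-nocookie.com"):
        if path == "/watch":
            vid = parse_qs(parsed.query).get("v", [None])[0]
        else:
            m = re.match(r"^/(?:embed|v|shorts)/([\w-]+)", path)
            vid = m.group(1) if m else None
        return ("youtube", vid) if vid else None
    if host == "youtu.be":
        vid = path.lstrip("/").split("/")[0]
        return ("youtube", vid) if vid else None
    if host in ("vimeo.com", "player.vimeo.com"):
        m = re.search(r"/(?:video/)?(\d+)(?:/|$)", path)
        return ("vimeo", m.group(1)) if m else None
    # Fallback for other platforms: lower-cased host + path, no trailing slash.
    return ("url", host + path.rstrip("/"))


def merge_new_items(crawled_urls, api_items):
    """Return API items (dicts with a "url" key) not already found by the site crawl."""
    seen = {canonical_key(u) for u in crawled_urls}
    new_items = []
    for item in api_items:
        key = canonical_key(item["url"])
        if key is None or key in seen:  # unidentifiable or already known
            continue
        seen.add(key)  # also drops duplicates within the API batch itself
        new_items.append(item)
    return new_items
```

Keying on the platform's own video ID, rather than on the URL string, is what lets an /embed/ URL from the crawl match a /watch?v= URL from the API.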
4) SCHEDULED CRON: UPDATE WEBSITE MEDIA - As part of an overnight scheduled cron job, update the database with any new media (including title, description, date created, and URL) found at the company source URL, checking for duplicates.
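For the nightly update in item 4, one option is to let the database enforce uniqueness so that rerunning the crawl never creates duplicate rows. The sketch below assumes SQLite through Python's standard sqlite3 module, with a UNIQUE constraint on the media URL and INSERT OR IGNORE. The table layout and the store_new_media name are hypothetical and not the existing beta crawler's schema.

```python
import sqlite3

# Hypothetical schema: one row per media item, keyed on its URL so the
# database itself rejects items already stored by an earlier nightly run.
SCHEMA = """
CREATE TABLE IF NOT EXISTS media (
    id           INTEGER PRIMARY KEY,
    company_url  TEXT NOT NULL,
    url          TEXT NOT NULL UNIQUE,
    title        TEXT,
    description  TEXT,
    date_created TEXT
)
"""


def store_new_media(conn, company_url, items):
    """Insert crawled items, skipping URLs already in the table.

    Returns the number of rows actually inserted on this run.
    """
    conn.execute(SCHEMA)
    before = conn.total_changes  # ignored (duplicate) inserts do not count
    conn.executemany(
        "INSERT OR IGNORE INTO media (company_url, url, title, description, date_created) "
        "VALUES (?, ?, ?, ?, ?)",
        [
            (company_url, i["url"], i.get("title"), i.get("description"), i.get("date_created"))
            for i in items
        ],
    )
    conn.commit()
    return conn.total_changes - before
```

A crontab entry such as 0 2 * * * python3 /path/to/nightly_update.py (hypothetical script path) would run a script built around this at 02:00 every night. Storing a canonical key like the one used for API imports, instead of the raw URL, would extend the same protection to URL variants.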
Scale: approximately 2,000 websites.
***A BETA VERSION OF THE CRAWLER HAS ALREADY BEEN DEVELOPED; THIS PROJECT IS A REFINEMENT OF THE EXISTING CRAWLER***