We need more "plugins" for our media platform. The plugins are basically "directory servers" for media content, e.g. a plugin for YouTube, a plugin for YouPorn, a plugin for MLB, and so on. What changes the most between plugins is the actual scraper.
I am open to all kinds of content: torrents, video sites of all kinds, sports, movies, music, and so on.
The media application has been developed in Mono/.NET, and the scraping is done in real time. A basic breakdown of the calls, for understanding:
Get_Root (returns the base-level folder structure of the site being scraped). An example (because it's easiest to explain) would be all the video categories on the YouPorn site. I have one object defined that is the same for folders and individual items, with a flag saying whether it is a folder, a sorted folder (the same folder, but with a URL that sorts it: LONGEST, etc.) or a real item. It includes a number of properties (title, description, and so on), as well as the URI needed to either "scrape" that folder or PLAY that item.
Get_Folder(URI or other data needed to scrape, start item, reccount). Returns further folders/categories, or individual items.
Get_PlayURL(URI needed to convert). In some cases the immediately playable URL is not available from scraping the folder (e.g. on YouPorn you need to go to the actual page to get the play URL, an MP4 download). So this function gets the page and returns the play URL. It can return multiple URLs with media information.
All these calls return items or additional folders. Items returned may or may not contain the direct media link(s); filling those in is the purpose of the Get_PlayURL function.
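To make the three calls above concrete, here is a rough sketch in Python (any language is fine, this is just for illustration). It is not our real class or interface, which we will provide. The names Entry, PlayUrl, Kind and ToyScraper are made up, the example.invalid URLs are placeholders, and treating the start item as a zero-based offset is an assumption.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, List


class Kind(Enum):
    """The flag on each entry: folder, sorted folder, or real item."""
    FOLDER = "folder"
    SORTED_FOLDER = "sorted_folder"  # same folder, but the URI applies a sort
    ITEM = "item"


@dataclass
class PlayUrl:
    url: str
    media_info: Dict[str, str] = field(default_factory=dict)  # hypothetical shape


@dataclass
class Entry:
    """One object type for folders and items alike (hypothetical fields)."""
    kind: Kind
    title: str
    uri: str  # what to scrape next (folder) or what to resolve/play (item)
    description: str = ""
    play_urls: List[PlayUrl] = field(default_factory=list)  # may be empty


class ToyScraper:
    """Serves hard-coded data instead of scraping a real site."""

    def __init__(self) -> None:
        self._clips = [
            Entry(Kind.ITEM, f"Clip {n}", f"http://example.invalid/watch/{n}")
            for n in range(1, 6)
        ]

    def get_root(self) -> List[Entry]:
        # Base-level folder structure of the "site".
        return [
            Entry(Kind.FOLDER, "All clips", "http://example.invalid/clips"),
            Entry(Kind.SORTED_FOLDER, "All clips (longest)",
                  "http://example.invalid/clips?sort=longest"),
        ]

    def get_folder(self, uri: str, start: int, reccount: int) -> List[Entry]:
        # Assumption: start is a zero-based offset, reccount a page size.
        return self._clips[start:start + reccount]

    def get_play_url(self, uri: str) -> List[PlayUrl]:
        # A real scraper would fetch the item page here; this one derives
        # the media links from the item id at the end of the URI.
        clip_id = uri.rsplit("/", 1)[-1]
        return [
            PlayUrl(f"http://example.invalid/media/{clip_id}.mp4", {"container": "mp4"}),
            PlayUrl(f"http://example.invalid/media/{clip_id}.3gp", {"container": "3gp"}),
        ]
```

A caller would typically walk get_root, page through get_folder, and only call get_play_url for items that came back without direct media links.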
The scrapers can be in any language, I don't care, so long as I can access them from within my .NET plugin. They can also be actual .NET code. Lua has a library interface, .NET has things like Html Agility Pack, and so on.
Data can be returned as a .NET object (we will provide this class and the scraper interface to you) or as XML representing the same object class.
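As a rough idea of what the XML option could look like, here is a small Python sketch that writes a list of entries to XML and reads it back. The element and attribute names (entries, entry, kind, title, description, uri) are invented for illustration only; the real layout follows the class we provide.

```python
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List


@dataclass
class Entry:
    kind: str  # "folder", "sorted_folder" or "item" (hypothetical values)
    title: str
    uri: str
    description: str = ""


def entries_to_xml(entries: List[Entry]) -> str:
    """Serialize entries; ElementTree escapes characters such as & for us."""
    root = ET.Element("entries")
    for e in entries:
        node = ET.SubElement(root, "entry", kind=e.kind)
        ET.SubElement(node, "title").text = e.title
        ET.SubElement(node, "description").text = e.description
        ET.SubElement(node, "uri").text = e.uri
    return ET.tostring(root, encoding="unicode")


def entries_from_xml(xml_text: str) -> List[Entry]:
    """Parse the XML produced by entries_to_xml back into Entry objects."""
    root = ET.fromstring(xml_text)
    return [
        Entry(
            kind=node.get("kind", ""),
            title=node.findtext("title", ""),
            uri=node.findtext("uri", ""),
            description=node.findtext("description", ""),
        )
        for node in root.findall("entry")
    ]
```

Building the XML through a library rather than by string concatenation matters here, because titles and URIs scraped from sites routinely contain characters like & and <.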
We are interested in all obtainable media information (title, duration, description, ratings, publish date, etc.) if available. We will collect the media ourselves for media processing; we just need the means of browsing/searching the data and getting to the content through links.
Scrapers may need to impersonate browsers (such as iPhone/iPad) to get actual listings of the content available. For example, we do not want SWF links; FLV is OK, and MP4, 3GP, etc. are always better. We would accept an SWF link if you could also provide the means to extract the FLV source from the SWF.
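A minimal Python sketch of both points, under stated assumptions: the User-Agent string is only an example of an iPhone-style value, the example.invalid URLs are placeholders, and ranking MP4 ahead of 3GP is my guess (the only firm rules are that FLV is acceptable, MP4/3GP are better, and bare SWF links are not wanted). The request is built but not sent.

```python
import posixpath
import urllib.request
from typing import List, Optional
from urllib.parse import urlparse

# Example iPhone-style User-Agent; any current mobile Safari string would do.
MOBILE_USER_AGENT = (
    "Mozilla/5.0 (iPhone; CPU iPhone OS 16_0 like Mac OS X) "
    "AppleWebKit/605.1.15 (KHTML, like Gecko) Version/16.0 "
    "Mobile/15E148 Safari/604.1"
)

# Most preferred first. SWF is left out on purpose, so it is never picked.
PREFERENCE = ["mp4", "3gp", "flv"]


def build_request(url: str) -> urllib.request.Request:
    """Prepare (but do not send) a request that presents itself as a mobile browser."""
    return urllib.request.Request(url, headers={"User-Agent": MOBILE_USER_AGENT})


def extension(url: str) -> str:
    """Lower-case file extension of the URL path, ignoring any query string."""
    path = urlparse(url).path
    return posixpath.splitext(path)[1].lstrip(".").lower()


def pick_best(urls: List[str]) -> Optional[str]:
    """Return the most preferred usable media URL, or None if there is none."""
    usable = [u for u in urls if extension(u) in PREFERENCE]
    if not usable:
        return None
    return min(usable, key=lambda u: PREFERENCE.index(extension(u)))
```

Passing the result of build_request to urllib.request.urlopen would perform the actual fetch; some sites may also check other headers or cookies, which this sketch does not cover.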
What sites are we open to? Almost anything and everything, if it has content.
Please bid by listing the sites you wish to provide scrapers for and the cost for each scraper listed. We understand that some scrapers will be easy and some harder.
Once bids are received, we will break this one project into multiple projects and award them to the parties we have selected.
Please request additional information if you have further questions or interest.
It seems no one actually reads through the project postings. I specifically asked bidders to provide a cost per plugin and the sample sites to which each cost applies.
In light of this, here are some example sites:
And many, many more.
PLEASE SEE ATTACHED PNG FILE for the UML of the object class for folders/items. Data can be returned via our .NET class (we will give you the class) or as an XML representation of the same.