1) Log the status code of the given URL and of every URL it redirects through. For example, if the given URL redirects to another URL with a 301 status code, I need both the 301 and the 200 of the page it redirects to.
2) List the URLs in the redirect chain, if there is one.
3) Get all the links on the given page, including ones hidden in onclick handlers on divs or set up by other methods.
4) List the rel attribute, anchor text and image URL for each link, where they exist.
5) Follow redirects triggered by meta redirects or [url removed, login to view], and list the URLs in the redirect.
6) We need to be able to run this from the command line on a Linux machine. I don't mind much which language is used, but we need to be able to call it from PHP. Previously we ran HtmlUnit through shell_exec in PHP and captured what was echoed to the command line. Continuing like this is fine.
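To illustrate requirements 1 and 2, here is a minimal sketch in Python of logging every hop in an HTTP redirect chain. It is an assumption about how this could be done with the standard library only, not the expected solution. The names `fetch_once` and `trace_redirects` are made up for this example, and the 10-hop limit is an arbitrary guard against redirect loops.

```python
import sys
import urllib.error
import urllib.request
from urllib.parse import urljoin


class _NoRedirect(urllib.request.HTTPRedirectHandler):
    """Stop urllib from following redirects so each hop can be logged."""

    def redirect_request(self, req, fp, code, msg, headers, newurl):
        return None  # with no new request, urllib raises HTTPError for the 3xx


def fetch_once(url, timeout=10):
    """Return (status_code, Location header or None) for a single request."""
    opener = urllib.request.build_opener(_NoRedirect)
    try:
        with opener.open(url, timeout=timeout) as resp:
            return resp.status, None
    except urllib.error.HTTPError as err:
        return err.code, err.headers.get("Location")


def trace_redirects(url, fetch=fetch_once, max_hops=10):
    """Follow HTTP redirects by hand and return [(url, status), ...]."""
    chain = []
    for _ in range(max_hops):
        status, location = fetch(url)
        chain.append((url, status))
        if status not in (301, 302, 303, 307, 308) or not location:
            break
        url = urljoin(url, location)  # the Location header may be relative
    return chain


# Command-line use: prints one "status url" line per hop.
if __name__ == "__main__" and len(sys.argv) > 1:
    for hop_url, hop_status in trace_redirects(sys.argv[1]):
        print(hop_status, hop_url)
```

Saved under a hypothetical name such as trace.py, this could be called from PHP with shell_exec in the same way as before, with the printed lines captured as the result. It only sees HTTP-level redirects; redirects done in the page itself need the page content to be inspected as well.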
We had some luck with HtmlUnit, but we do not have enough experience with it to meet all our requirements.
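For requirements 3 to 5, a rough sketch of what the link extraction could look like, again in Python with the standard library only. This is an assumption for illustration: it parses static HTML and does not run JavaScript, so links built by scripts at load time would still need a headless browser such as HtmlUnit. `LinkCollector`, `extract_links` and the `ONCLICK_URL` pattern are names invented here, and the onclick pattern is only a heuristic.

```python
import re
from html.parser import HTMLParser

# Heuristic only: pulls a quoted absolute URL or rooted path out of an onclick value.
ONCLICK_URL = re.compile(r"""['"]((?:https?:)?/[^'"]+)['"]""")
# Pulls the target out of a meta refresh value such as "0; url=/next".
REFRESH_URL = re.compile(r"url\s*=\s*['\"]?([^'\";]+)", re.I)


class LinkCollector(HTMLParser):
    def __init__(self):
        super().__init__()
        self.links = []           # one dict per link found
        self.meta_refresh = None  # target of <meta http-equiv="refresh">, if any
        self._open = None         # the <a> element currently being read

    def handle_starttag(self, tag, attrs):
        a = dict(attrs)
        if tag == "a" and a.get("href"):
            self._open = {"url": a["href"], "rel": a.get("rel"),
                          "text": "", "image": None, "source": "href"}
            self.links.append(self._open)
        elif tag == "img" and self._open is not None:
            self._open["image"] = a.get("src")  # image used as the link body
        elif tag == "meta" and (a.get("http-equiv") or "").lower() == "refresh":
            match = REFRESH_URL.search(a.get("content") or "")
            if match:
                self.meta_refresh = match.group(1).strip()
        # Any element may carry an onclick that navigates somewhere.
        for url in ONCLICK_URL.findall(a.get("onclick") or ""):
            self.links.append({"url": url, "rel": None, "text": "",
                               "image": None, "source": "onclick"})

    def handle_data(self, data):
        if self._open is not None:
            self._open["text"] += data  # accumulate anchor text

    def handle_endtag(self, tag):
        if tag == "a" and self._open is not None:
            self._open["text"] = self._open["text"].strip()
            self._open = None


def extract_links(html):
    """Return (links, meta_refresh_url) for one HTML document."""
    parser = LinkCollector()
    parser.feed(html)
    parser.close()
    return parser.links, parser.meta_refresh
```

If `meta_refresh` comes back with a URL, that URL can be appended to the redirect list and fetched in turn, which covers the meta redirect half of requirement 5.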