I want all the images from Wikimedia Commons that are over 900 KB.
There are tens of thousands of them, and my understanding is that code could be written to extract the file names, but this is not my area of expertise.
I would need the proper name of the image, the metadata associated with it, and the author's info (just the name is fine).
Realizing the amount of data involved, I thought it best to narrow the scope for now to Category:Paintings by painter. LINK: [url removed, login to view]:Paintings_by_painter
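As a rough sketch of how the file listing for that category could be pulled, here is a minimal Python example against the standard MediaWiki API endpoint on commons.wikimedia.org. The helper names (`build_category_query`, `category_files`) and the User-Agent string are my own assumptions, not anything from the brief:

```python
import json
import urllib.parse
import urllib.request

API = "https://commons.wikimedia.org/w/api.php"

def build_category_query(category, cmcontinue=None):
    """Build MediaWiki API params that list the files in a category."""
    params = {
        "action": "query",
        "list": "categorymembers",
        "cmtitle": category,
        "cmtype": "file",   # files only, not subcategories or gallery pages
        "cmlimit": "500",
        "format": "json",
    }
    if cmcontinue:
        params["cmcontinue"] = cmcontinue  # resume a paginated listing
    return params

def category_files(category):
    """Yield every file title in the category, following API continuation."""
    cmcontinue = None
    while True:
        url = API + "?" + urllib.parse.urlencode(
            build_category_query(category, cmcontinue))
        req = urllib.request.Request(
            url, headers={"User-Agent": "commons-sketch/0.1 (example)"})
        with urllib.request.urlopen(req) as resp:
            data = json.load(resp)
        for member in data["query"]["categorymembers"]:
            yield member["title"]
        if "continue" not in data:
            break
        cmcontinue = data["continue"]["cmcontinue"]
```

Note that "Paintings by painter" is a parent category of per-artist subcategories, so a real run would likely have to recurse into subcategories (`cmtype=subcat`) rather than list files directly.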
I have a snippet from [url removed, login to view] that talks a little about what might be needed:
"The Filefield Sources module allows entering a URL as the image source, instead of having to download the image to the local computer and then upload it manually into the filefield.
However, the part that requires manual work (which most users fail to do properly) still has to be handled manually: giving the image a description, entering metadata, giving proper credit to the creator, linking to the image source, evaluating the type of license (which CC license? GFDL?).
If you are able to write code, you could programmatically download images from WikiMedia Commons; there is an API. Some pointers are @[url removed, login to view]: Downloading Images from WikiMedia Commons. This module might give you ideas how to do it: Flickr Batch Import"
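Building on the API the quote mentions, the per-file step could look roughly like this: ask the `imageinfo` prop for URL, byte size, and extended metadata, then keep files over the 900 KB floor and pull out the requested fields. `iiprop=extmetadata` is a real imageinfo option, but the helper names below are my own, and a production version would need to strip wikitext/HTML from the `Artist` field:

```python
def build_imageinfo_query(titles):
    """API params asking for URL, byte size, and extended metadata
    (description, artist, license) for a batch of file titles."""
    return {
        "action": "query",
        "titles": "|".join(titles),  # the API accepts up to 50 per request
        "prop": "imageinfo",
        "iiprop": "url|size|extmetadata",
        "format": "json",
    }

def is_large_enough(imageinfo, min_bytes=900 * 1024):
    """Keep only files over the ~900 KB threshold from the brief."""
    return imageinfo.get("size", 0) >= min_bytes

def extract_record(title, imageinfo):
    """Reduce one imageinfo result to the fields requested:
    proper name, metadata, and the author's name."""
    meta = imageinfo.get("extmetadata", {})
    return {
        "title": title,
        "url": imageinfo.get("url"),
        "size_bytes": imageinfo.get("size"),
        # "Artist" can contain markup; real use would clean it up
        "author": meta.get("Artist", {}).get("value"),
        "license": meta.get("LicenseShortName", {}).get("value"),
    }
```

The records could then be written to CSV or JSON, which also answers the delivery question raised in the bid below.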
Another helpful link: [url removed, login to view]
I have a lot of previous experience with difficult web scraping. This is fairly easy; the only real question is how you want the end result delivered.