As this is a continuous assignment i am including the description of the test 1 and solution file for the same but i want the solution for test 2. Also included the data file for this test. This is a hadoop program basically.
Test 1 - Python
Use data set from the files movie ratings 1 million records ([login to view URL], [login to view URL], [login to view URL]). Please make Python/Mapreduce code (mapper and reducer) to answer the following research question:
"What are the most popular movies for different age groups?"
Data set [login to view URL] has an information about age groups
* 1: "Under 18"
* 18: "18-24"
* 25: "25-34"
* 35: "35-44"
* 45: "45-49"
* 50: "50-55"
* 56: "56+"
Your code should be able to provide a movie ID for the movie that has the highest number of ratings and that number for each age group. If you want, you can also provide the name of the movie as well. However, this is optional.
To achieve the first task, you can join [login to view URL] and [login to view URL] and get most popular movies IDs.
For the optional task, you can produce two mapreduce programs (that is, mapper1, reducer1, mapper2, reducer2). The first one will join [login to view URL] and [login to view URL] and get most popular movies IDs. The second one will join your result with [login to view URL] and output movie titles. If you go this way, you should provide me an instruction what mapper/reducer use first and what data to load in each of them.
Your submission will include three files: mapper, reducer and result output from Hadoop (part-00000 file). If you decide to go with the optional task, then you will submit more files and an instruction how to use them. Either way - you don't need to submit data files.
Hadoop Test 2 - Pig
Your test 2 is to finish the optional task the same as in test 1, i.e., provide a movie name for the movie that has the highest number of ratings and that number for each age group.
The only difference - now you have to use Pig and PigLatin. This task requires "normal" programming logic: load three data sets, join first and second, then join resulted set with the third one, group, aggregate, probably group again to find maximum.
You have to submit two files - PigLatin script and Hadoop/MapReduce output with results.
9 freelancere byder i gennemsnit $82 på dette job
i can wotk on this rightaway. but will need an elaboration on pig and piglatin. i have searched online but i am not sure. please confirm and i could start right away.
I am certified Hadoop Developer/ HBASe expert. I have been working on similar solutions in the past. I would be able to provide solution for the same in Java