As this is a continuing assignment, I am including the description of Test 1 and its solution file, but I want the solution for Test 2. I have also included the data file for this test. This is basically a Hadoop program.
Test 1 - Python
Use the movie ratings data set with 1 million records from the files ([login to view URL], [login to view URL], [login to view URL]). Please write Python/MapReduce code (a mapper and a reducer) to answer the following research question:
"What are the most popular movies for different age groups?"
The data set [login to view URL] contains information about age groups:
* 1: "Under 18"
* 18: "18-24"
* 25: "25-34"
* 35: "35-44"
* 45: "45-49"
* 50: "50-55"
* 56: "56+"
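The age codes above can be kept in a small lookup table so that results are printed with readable labels. This is only a sketch: the names `AGE_GROUPS` and `age_label` are made up for illustration, and the labels are copied from the list above as given.

```python
# Age codes as listed in the task description, mapped to their labels.
AGE_GROUPS = {
    1: "Under 18",
    18: "18-24",
    25: "25-34",
    35: "35-44",
    45: "45-49",
    50: "50-55",
    56: "56+",
}

def age_label(code):
    """Return the readable label for an age code given as int or string."""
    return AGE_GROUPS.get(int(code), "Unknown")
```

For example, `age_label("25")` gives `"25-34"`, which is handy when the mapper reads the code as text.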
Your code should provide, for each age group, the ID of the movie with the highest number of ratings, along with that number. If you want, you can also provide the name of the movie; this is optional.
To achieve the first task, you can join [login to view URL] and [login to view URL] and get the most popular movie IDs.
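One possible shape for this join is a reduce-side join on the user ID. The sketch below is not the posted solution. It assumes `::`-delimited records in which a users row has 5 fields (user ID first, age code third) and a ratings row has 4 fields (user ID first, movie ID second); the actual file layout is not given in this description. The function names are hypothetical, and `sorted()` stands in for the sort that Hadoop performs between the map and reduce phases.

```python
from collections import Counter, defaultdict
from itertools import groupby

def map_line(line):
    """Tag a record with its source so the reducer can join on user ID."""
    fields = line.strip().split("::")
    if len(fields) == 5:   # assumed users layout: user ID is field 0, age code is field 2
        return fields[0], "U", fields[2]
    if len(fields) == 4:   # assumed ratings layout: user ID is field 0, movie ID is field 1
        return fields[0], "R", fields[1]
    return None            # skip lines that match neither layout

def most_popular_by_age(records):
    """Join tagged records per user, count ratings per (age, movie), keep the top movie."""
    counts = defaultdict(Counter)
    for _user_id, group in groupby(sorted(records), key=lambda r: r[0]):
        age, movies = None, []
        for _, tag, value in group:
            if tag == "U":
                age = value
            else:
                movies.append(value)
        if age is not None:            # drop ratings from users with no user record
            counts[age].update(movies)
    # most_common(1) returns [(movie_id, count)] for the most-rated movie in that age group
    return {age: c.most_common(1)[0] for age, c in counts.items() if c}
```

Keeping the per-age counters in memory only works if a single reducer sees all users; with several reducers the partial counts would need one more aggregation pass.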
For the optional task, you can produce two MapReduce programs (that is, mapper1, reducer1, mapper2, reducer2). The first one will join [login to view URL] and [login to view URL] and get the most popular movie IDs. The second one will join your result with [login to view URL] and output movie titles. If you go this way, you should give me instructions on which mapper/reducer to use first and what data to load into each of them.
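The second stage can be sketched as a lookup of titles by movie ID. The version below swaps in a simpler technique than a second reduce-side join: it loads the movie titles into a dictionary and looks each ID up, which is reasonable only if the movies file fits in memory. It assumes a `::`-delimited movies record with the ID in the first field and the title in the second, and a tab-separated `age, movie ID, count` row as the output of the first job. Both formats and both function names are assumptions for illustration.

```python
def load_titles(movie_lines):
    """Build a movie ID -> title lookup from '::'-delimited movie records."""
    titles = {}
    for line in movie_lines:
        fields = line.strip().split("::")
        if len(fields) >= 2:
            titles[fields[0]] = fields[1]
    return titles

def attach_titles(stage1_lines, titles):
    """Add the title to each 'age<TAB>movie_id<TAB>count' row from the first job."""
    for line in stage1_lines:
        age, movie_id, count = line.rstrip("\n").split("\t")
        yield age, movie_id, titles.get(movie_id, "UNKNOWN"), int(count)
```

If the instructions for the grader matter, the run order here would be: first job on the users and ratings data, then this step on the first job's output together with the movies data.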
Your submission will include three files: the mapper, the reducer, and the result output from Hadoop (the part-00000 file). If you decide to do the optional task, you will submit more files plus instructions on how to use them. Either way, you don't need to submit data files.
Hadoop Test 2 - Pig
Your Test 2 is to complete the same optional task as in Test 1, i.e., for each age group, provide the name of the movie that has the highest number of ratings, along with that number.
The only difference is that now you have to use Pig and PigLatin. This task requires "normal" programming logic: load the three data sets, join the first and second, then join the resulting set with the third one, group, aggregate, and probably group again to find the maximum.
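The sequence of steps described above can be walked through in plain Python before translating it to PigLatin. This is not a Pig script, only a sketch of the same relational logic. The function name and the tuple shapes are assumptions, and the inputs are taken to be already parsed into `(user_id, age)`, `(user_id, movie_id)`, and `(movie_id, title)` pairs.

```python
from collections import Counter

def top_movie_per_age(users, ratings, movies):
    """Mirror the load / join / join / group / aggregate / group-again steps."""
    # Load: index users and movies by their join keys.
    age_of = dict(users)
    title_of = dict(movies)

    # Join ratings with users on user ID, then with movies on movie ID.
    joined = [
        (age_of[user_id], title_of[movie_id])
        for user_id, movie_id in ratings
        if user_id in age_of and movie_id in title_of
    ]

    # Group by (age, title) and aggregate: count the ratings in each group.
    counts = Counter(joined)

    # Group again by age and keep the title with the maximum count.
    best = {}
    for (age, title), n in counts.items():
        if age not in best or n > best[age][1]:
            best[age] = (title, n)
    return best
```

Ties are resolved by whichever title is seen first here; a Pig version would need its own explicit choice for that case.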
You have to submit two files: the PigLatin script and the Hadoop/MapReduce output with the results.
9 freelancers are bidding an average of $81 for this job
Dear Employer, I have extensive experience in MapReduce programming using Hadoop and Java. I can finish the work as per your requirements. Please let me know if you are interested.
Hello, I am good with the Hadoop ecosystem. I have gone through your problem statement and I can solve your second problem, Hadoop Test 2 - Pig. Let's chat to explore more.
Hi, I have good experience in Hadoop and MapReduce programming. I have 4+ years of experience in Hive, MapReduce, and Pig. Please give me the opportunity to start the work. Thanks, Akram