Implement the co-occurrence problem over common crawl dataset:
[login to view URL] (Links to an external site.), compare the performance of different optimisations such as pairs, stripes by changing number of clusters, dataset size, etc. If you implement this, you can get a fair score. If you further implement local aggregation, partitioner, combiner, you will get a good score. If you further use multiple datasets from AWS except the common crawl to do above analysis, you will get full mark.
3 freelancere byder i gennemsnit $278 på dette job
need to understand your particular problem better, but I am suggesting using mahout for this, but more details needed on infrastructure, data source, tech stack (what's available) etc, before I commit