Search online for a reasonably large dataset and download it. Define your own problem based on the dataset and solve it using your knowledge of the Apache Spark platform (PySpark). You may get ideas for defining your problem from research papers; if you do, include the reference.
Points are awarded according to the level of difficulty: the more difficult and creative your problem, the more points you receive.
Prepare a final report that includes 1) motivation, 2) design, and 3) relevant source code and screenshots. Also explain any difficulties you encountered and how you resolved them.