This test is a typical denormalization task of the kind performed frequently when loading data into BigQuery.
The test itself doesn’t require any interaction with BigQuery, because we find that writing the transformed
data to BigQuery is the easy part. The transformations in Google Dataflow are more complex, and they are
what we would like you to implement.
You’ll be given three gzip-compressed JSON files that we receive from the Spotify API: streams, tracks and
users. Your job is to develop two pipelines in Google Dataflow (one in Java and one in Python) that
denormalize these three files into one flat JSON output file.