Python/Kubernetes/Docker Custom Task

There are 4 tasks.

1. Extract and manipulate data

Using the lookup data in [login to view URL], you should extract information about each node's tags in the HTML trees. In particular, for each node in each HTML page, we need its tag, the tags of its left and right siblings, and the tag of its parent. The utility function load_single_warc_record will allow you to download the HTML, and the get_* functions should help you extract the relevant columns (but you will have to implement one of those functions yourself).
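As an illustration of the four columns requested above, here is a minimal sketch that uses only Python's standard html.parser. It does not use the supplied load_single_warc_record or get_* helpers, whose signatures are not shown here; the names TagCollector and extract_tag_rows are hypothetical, and the handling of void and unclosed tags is a simplifying assumption.

```python
from html.parser import HTMLParser

# Void elements never have a closing tag, so they are never pushed on the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img", "input",
             "link", "meta", "source", "track", "wbr"}


class TagCollector(HTMLParser):
    """Records tag, parent, left sibling and right sibling for every element."""

    def __init__(self):
        super().__init__()
        self.rows = []         # one dict per element, in document order
        self._stack = []       # indices into self.rows of the currently open elements
        self._last_child = {}  # parent index (None at top level) -> index of its latest child

    def handle_starttag(self, tag, attrs):
        parent_idx = self._stack[-1] if self._stack else None
        left_idx = self._last_child.get(parent_idx)
        if left_idx is not None:
            # The previous child of the same parent now has a right sibling.
            self.rows[left_idx]["right_sibling"] = tag
        self.rows.append({
            "tag": tag,
            "parent": self.rows[parent_idx]["tag"] if parent_idx is not None else None,
            "left_sibling": self.rows[left_idx]["tag"] if left_idx is not None else None,
            "right_sibling": None,
        })
        idx = len(self.rows) - 1
        self._last_child[parent_idx] = idx
        if tag not in VOID_TAGS:
            self._stack.append(idx)

    def handle_endtag(self, tag):
        # Close the nearest open element with this name; ignore stray end tags.
        for pos in range(len(self._stack) - 1, -1, -1):
            if self.rows[self._stack[pos]]["tag"] == tag:
                del self._stack[pos:]
                break


def extract_tag_rows(html):
    """Return one row per element with its tag, parent and sibling tags."""
    collector = TagCollector()
    collector.feed(html)
    collector.close()
    return collector.rows


if __name__ == "__main__":
    for row in extract_tag_rows("<html><body><p>a</p><br><div></div></body></html>"):
        print(row)
```

Missing neighbours are recorded as None, which maps naturally onto NULL when the rows are later written to SQLite.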

2. Store in a database

Record all this information in an SQLite3 database. As a minimum, you should create and populate these tables:

1. webpage, for storing data about the website / HTML: namely the URL, but also anything else you find important

2. tags, for storing the four extracted tag columns and anything else you find important
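One possible layout for these two tables is sketched below with Python's built-in sqlite3 module. Only the table names webpage and tags come from the brief; every column name, the fetched_at timestamp and the position column are assumptions made for illustration.

```python
import sqlite3

# Hypothetical schema: column names are illustrative, not prescribed by the task.
SCHEMA = """
CREATE TABLE IF NOT EXISTS webpage (
    id         INTEGER PRIMARY KEY,
    url        TEXT NOT NULL UNIQUE,           -- one row per URL
    fetched_at TEXT DEFAULT CURRENT_TIMESTAMP  -- assumed extra metadata
);
CREATE TABLE IF NOT EXISTS tags (
    id                INTEGER PRIMARY KEY,
    webpage_id        INTEGER NOT NULL REFERENCES webpage(id),
    position          INTEGER NOT NULL,        -- document order of the node
    tag               TEXT NOT NULL,
    parent_tag        TEXT,                    -- NULL for the root node
    left_sibling_tag  TEXT,                    -- NULL if there is no left sibling
    right_sibling_tag TEXT,                    -- NULL if there is no right sibling
    UNIQUE (webpage_id, position)
);
"""


def create_schema(conn):
    """Create both tables if they do not exist yet (safe to call repeatedly)."""
    conn.executescript(SCHEMA)
    conn.commit()


if __name__ == "__main__":
    connection = sqlite3.connect("tags.sqlite3")  # hypothetical file name
    create_schema(connection)
    connection.close()
```

Keeping the URL in its own table with a UNIQUE constraint means each page is described once and its tag rows refer to it by id.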

As part of your assessment, we ask that you supply the SQLite3 database file containing extracted data in the relevant tables.


The script used to upload the data to the database should be able to deal with new data that has been extracted by the script in part 1. The requirements are:

1. It should not upload duplicate data again.

2. If the tags of a URL change, it should not overwrite existing data.

3. New URLs and corresponding tags should be inserted if found.
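The three requirements above can be met in several ways. The sketch below takes the simplest reading: a URL that is already stored is skipped entirely, so its existing tags are never touched, and only unseen URLs are inserted. The function name upload_page, the schema and the row keys are all hypothetical.

```python
import sqlite3

# Hypothetical minimal schema; the UNIQUE constraint on url is a backstop
# against duplicates even if the explicit check below is bypassed.
SCHEMA = """
CREATE TABLE IF NOT EXISTS webpage (
    id  INTEGER PRIMARY KEY,
    url TEXT NOT NULL UNIQUE
);
CREATE TABLE IF NOT EXISTS tags (
    id                INTEGER PRIMARY KEY,
    webpage_id        INTEGER NOT NULL REFERENCES webpage(id),
    tag               TEXT NOT NULL,
    parent_tag        TEXT,
    left_sibling_tag  TEXT,
    right_sibling_tag TEXT
);
"""


def upload_page(conn, url, tag_rows):
    """Insert url and its tag rows only if url is new. Returns True if inserted."""
    with conn:  # commits on success, rolls back if an exception is raised
        known = conn.execute("SELECT 1 FROM webpage WHERE url = ?", (url,)).fetchone()
        if known:
            return False  # duplicate or changed tags: keep the stored data as is
        cur = conn.execute("INSERT INTO webpage (url) VALUES (?)", (url,))
        conn.executemany(
            "INSERT INTO tags (webpage_id, tag, parent_tag, left_sibling_tag,"
            " right_sibling_tag) VALUES (?, ?, ?, ?, ?)",
            [(cur.lastrowid, r["tag"], r["parent"], r["left_sibling"], r["right_sibling"])
             for r in tag_rows],
        )
    return True


if __name__ == "__main__":
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    row = {"tag": "p", "parent": "body", "left_sibling": None, "right_sibling": "div"}
    print(upload_page(db, "http://example.com/", [row]))  # first upload of this URL
    print(upload_page(db, "http://example.com/", [row]))  # same URL again is skipped
```

If changed pages should be kept as additional versions rather than ignored, a content hash or version column on webpage would be one alternative; the brief does not say which behaviour is expected.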

3. Dockerize

Please write a Dockerfile that can be used to run your code end-to-end. That is, it must perform steps 1) and 2) above. To test your solution, we will run your Dockerfile with multiple files like [login to view URL] to make sure duplicates and new data are being handled correctly.

Write an accompanying script containing the exact docker build and docker run commands for that Dockerfile.

4. CI/CD

4a. Docker container

Write a CI workflow to build and deploy the docker container from the Dockerfile in step 3. You can use Github Actions for this.

4b. Orchestration

The docker container should be run daily. We use Kubernetes for orchestration, and if you have experience with Kubernetes, please write a manifest that will run this docker container on a daily basis.
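A daily run is usually expressed as a Kubernetes CronJob. Since kubectl accepts JSON as well as YAML, the sketch below builds such a manifest as a Python dictionary and prints it as JSON. The image reference, the job name and the 02:00 schedule are placeholders, not values taken from the task.

```python
import json


def daily_cronjob_manifest(image, name="tag-extractor", schedule="0 2 * * *"):
    """Build a batch/v1 CronJob manifest that runs `image` once per day.

    The default name and schedule are illustrative assumptions.
    """
    return {
        "apiVersion": "batch/v1",
        "kind": "CronJob",
        "metadata": {"name": name},
        "spec": {
            "schedule": schedule,           # cron syntax: minute hour day month weekday
            "concurrencyPolicy": "Forbid",  # do not start a run while one is still active
            "jobTemplate": {
                "spec": {
                    "template": {
                        "spec": {
                            "restartPolicy": "OnFailure",
                            "containers": [{"name": name, "image": image}],
                        }
                    }
                }
            },
        },
    }


if __name__ == "__main__":
    # Placeholder image reference; replace with the image pushed by the CI workflow.
    manifest = daily_cronjob_manifest("registry.example.com/tag-extractor:latest")
    print(json.dumps(manifest, indent=2))
```

The printed JSON can be saved to a file and applied with kubectl apply -f. Because the SQLite file must survive between runs, a real manifest would probably also mount a persistent volume, which this sketch leaves out.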


You should make a GitHub repository containing the code you developed for this task, structuring it in a sensible way. If you choose not to commit the file containing your SQLite database, please send it to us as an attachment along with the link to your GitHub repo.

Good luck!

Supplied to you:

• [login to view URL]

• [login to view URL]

• [login to view URL] (this file)

Required by us:

• Data:

– SQLite3 database file produced by your code

• Code:

– Extraction / storage script(s)

– Dockerfile

– Script with the docker build and docker run commands

– Kubernetes manifest

– CI workflow



Skills: Kubernetes, Python, Docker, SQLite, GitHub


About the employer:
(0 reviews) Hyderabad, India

Project ID: #31636331

2 freelancers are bidding an average of ₹1050 per hour for this job


I have extensive knowledge and 12 years of experience in Python, Statistics and Probability, Machine Learning (unsupervised learning), Machine Learning (supervised learning), Natural Language Processing, Deep Learning, Artif…

₹1050 INR in 7 days
(0 reviews)

I have hands-on experience in Python, SQLite, Docker, Kubernetes, and GitHub. I am very much interested in this project. Please let us discuss it in detail in chat.

₹1050 INR in 7 days
(0 reviews)