Jupyter Notebook Contents
The various sections in the notebook should include code, code comments and appropriate
Markdown cells describing your chosen approach.
In detail, sections should include the following:
Introduction and Problem Definition
- Textual description providing an overview of the data
- A discussion on why this problem is a regression problem
- A detailed problem statement question as discussed in Lecture 3.
- Code to load the data into a suitable format to be used in the notebook
- A description of the statistical data types for each field in the file
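The loading and data-type steps above might look something like the following sketch. The file name is not given here, so a small inline sample with invented values stands in for the real file. The column names 'hr', 'workingday', 'hum', 'windspeed' and 'cnt' are assumptions, as is the statistical-type label attached to each field.

```python
import io

import pandas as pd

# Inline sample with invented values; replace io.StringIO(...) with the real file path.
csv_text = """dteday,hr,workingday,temp,atemp,hum,windspeed,casual,registered,cnt
2011-01-01,0,0,0.24,0.2879,0.81,0.0,3,13,16
2011-01-01,1,0,,0.2727,0.80,0.0,8,32,40
2011-01-03,8,1,0.22,,0.75,0.1642,5,120,125
"""
df = pd.read_csv(io.StringIO(csv_text), parse_dates=["dteday"])

# Hypothetical statistical data types, to be justified in a Markdown cell.
stat_types = {
    "dteday": "date",
    "hr": "discrete, cyclical",
    "workingday": "binary nominal",
    "temp": "continuous",
    "atemp": "continuous",
    "hum": "continuous",
    "windspeed": "continuous",
    "casual": "discrete count",
    "registered": "discrete count",
    "cnt": "discrete count",
}

# One row per field: storage dtype, statistical type, and number of missing values.
summary = pd.DataFrame({
    "pandas_dtype": df.dtypes.astype(str),
    "statistical_type": pd.Series(stat_types),
    "missing": df.isna().sum(),
})
print(summary)
```

Keeping the storage dtype next to the statistical type makes it easier to spot fields that pandas stores as numbers but that are really categorical.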
You should assume that exploratory data analysis has taken place and the following was found:
- Missing values are in the ‘temp’ and ‘atemp’ columns.
- The peak usage hours are: 7-9am and 4-7pm on working days, and 10am-4pm on
- At night (10pm-4am) the bike rentals are low.
- If the humidity or wind-speed is high, the number of rentals decreases.
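The peak and night windows listed above can be turned into simple flag functions. This is a minimal sketch under two assumptions: an hour value h stands for the interval from h:00 to just before (h+1):00, so "7-9am" covers hours 7 and 8; and the 10am-4pm window applies to non-working days, which is a guess because that line is cut off in the brief. The function names are hypothetical.

```python
def is_peak(hour: int, working_day: bool) -> bool:
    """Return True if the hour falls inside a peak-usage window."""
    if working_day:
        # 7-9am and 4-7pm on working days
        return 7 <= hour < 9 or 16 <= hour < 19
    # Assumption: 10am-4pm is the peak window on non-working days
    return 10 <= hour < 16


def is_night(hour: int) -> bool:
    """Return True for the low-usage night window, 10pm-4am (wraps past midnight)."""
    return hour >= 22 or hour < 4


# Usage: flag a few (hour, working_day) pairs
for hour, working_day in [(8, True), (12, True), (12, False), (23, True)]:
    print(hour, working_day, is_peak(hour, working_day), is_night(hour))
```

In a notebook these functions could be applied row by row to the hour and working-day columns to create the two new fields.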
Your data preparation steps should therefore include the following:
- Fill the missing values in the temperature columns automatically with values that
would most closely mirror the actual temperature.
- Create a new field that indicates whether it is a peak time or not
- Create a new field that indicates whether it is night time or not
- Remove all fields containing information about specific dates (‘yr’, ‘mnth’, ‘dteday’),
as well as ‘casual’ and ‘registered’, and any other variables that you deem irrelevant.
- A justification (and potential application) of whether you should use data binning or
- Suitable encoding of the data
- Code and justification for the selection and application of a suitable data split
- Selection of two different Regression models and justification of why they are suitable.
Only one of those models should be Tree-Based (e.g. Random Forest or Decision Tree).
- Application of those models as a baseline on the data
- Utilisation of manual or automatic hyperparameter optimization and justification of
your choices to create “optimized” versions of each regression model
- Selection of appropriate regression metrics and a written outline of why they are
suitable for this data
- A comparison of the baseline models to the “optimized” versions and an evaluation
of the results
- A conclusion and interpretation of the results and suggestion of potential
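The preparation, modelling and evaluation steps listed above could be prototyped along the following lines. This is only a sketch on synthetic data, not a prescribed solution. The column names 'hr', 'hum', 'windspeed' and 'cnt' are assumptions, and so are the specific choices: linear interpolation for the missing temperatures, a chronological split, Ridge and Random Forest as the two models, a small grid search with time-ordered folds, and MAE, RMSE and R2 as metrics. Each choice would still need its own justification in the notebook.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import Ridge
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import GridSearchCV, TimeSeriesSplit, train_test_split

# Synthetic hourly data standing in for the prepared bike-rental table.
rng = np.random.default_rng(0)
n = 600
hours = np.arange(n) % 24
df = pd.DataFrame({
    "hr": hours,
    "temp": 0.5 + 0.3 * np.sin(2 * np.pi * hours / 24) + rng.normal(0, 0.02, n),
    "hum": rng.uniform(0.2, 1.0, n),
    "windspeed": rng.uniform(0.0, 0.6, n),
})
df["cnt"] = (200 * df["temp"] - 80 * df["hum"] - 50 * df["windspeed"]
             + 60 * df["hr"].between(7, 9) + rng.normal(0, 10, n)).clip(lower=0).round()

# Knock out some temperatures, then fill them from neighbouring hours.
df.loc[rng.choice(n, size=30, replace=False), "temp"] = np.nan
df["temp"] = df["temp"].interpolate(method="linear", limit_direction="both")

# Chronological split: the last 20% of rows form the test set (no shuffling).
X, y = df.drop(columns="cnt"), df["cnt"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, shuffle=False)


def evaluate(model):
    """Score a fitted model on the held-out test set."""
    pred = model.predict(X_test)
    return {
        "MAE": mean_absolute_error(y_test, pred),
        "RMSE": mean_squared_error(y_test, pred) ** 0.5,
        "R2": r2_score(y_test, pred),
    }


# One linear model and one tree-based model, each with a small search grid.
candidates = {
    "ridge": (Ridge(), {"alpha": [0.1, 1.0, 10.0]}),
    "random_forest": (
        RandomForestRegressor(n_estimators=30, random_state=0),
        {"max_depth": [3, 6, None], "min_samples_leaf": [1, 5]},
    ),
}

results = {}
for name, (estimator, grid) in candidates.items():
    # Baseline: the estimator as configured above, without tuning.
    results[f"{name}_baseline"] = evaluate(estimator.fit(X_train, y_train))
    # "Optimized": grid search with time-ordered cross-validation folds.
    search = GridSearchCV(estimator, grid, cv=TimeSeriesSplit(n_splits=3),
                          scoring="neg_root_mean_squared_error")
    search.fit(X_train, y_train)
    results[f"{name}_optimized"] = evaluate(search.best_estimator_)

print(pd.DataFrame(results).T.round(2))
```

The resulting table puts the baseline and tuned version of each model side by side, which is one way to set up the comparison the brief asks for.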
Hello, I have delivered similar work using Jupyter Notebook, PySpark and pandas, so I believe I can deliver well on this assignment. Please reach out for further discussion. Thank you.