Practice: Coding Machine Learning
Jupyter Info
Reminder: the Jupyter Notebooks on this site are read-only, so you can’t interact with them. Click the button above to launch an interactive version of this notebook.
With Binder, you get a temporary Jupyter Notebook website that opens with this notebook. Any code you write will be lost when you close the tab. Make sure to download the notebook so you can save it for later!
With Colab, it will open Google Colaboratory. You can save the notebook there to your Google Drive. If you don’t save to your Drive, any code you write will be lost when you close the tab. You can find the data files for this notebook below:
You will need to run all the cells of the notebook to see the output. You can do this by hitting Shift-Enter on each cell or clicking the “Run All” button above.
In this notebook, we will practice trying to predict the weather. We won’t try to predict it in the sense you are familiar with, where meteorologists try to predict what the weather will be a week from now. Instead, we will do a simpler example where we look at various information about a day and try to predict the maximum temperature that day.
The data is stored in weather.csv and has the following columns.
STA: A code representing what station the measurements were taken from
YR: Which year this measurement was taken
MO: Which month this measurement was taken
DA: Which day this measurement was taken
MAX (our target): The maximum temperature that was reached that day
MIN: The minimum temperature that was reached that day
Since the target we want to predict is a number, this will be a regression task rather than a classification task. Almost all the code you will write will be the same as we saw in the lesson, except:
You will use a DecisionTreeRegressor from sklearn.tree instead of a DecisionTreeClassifier.
You will use the mean_squared_error function from the sklearn.metrics module instead of accuracy_score. It behaves similarly in the sense that it takes the true labels and the predicted labels, but is different in that it returns the error of the predictions instead. Formally, this is returning the mean-squared error between your predictions and the true values (find the difference for each example, square them, and average them). A higher MSE means the model did worse, while an MSE of 0 means there were no errors!
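To make the MSE description above concrete, here is a minimal sketch that computes it by hand and compares the result with mean_squared_error. The numbers are made up for illustration and are not taken from weather.csv.

```python
import numpy as np
from sklearn.metrics import mean_squared_error

# Made-up values for illustration only (not from weather.csv)
true_values = np.array([30.0, 25.0, 28.0, 22.0])
predictions = np.array([28.0, 25.0, 31.0, 22.0])

# By hand: difference for each example, squared, then averaged.
# NumPy applies each step to the whole array, so no loop is needed.
manual_mse = ((true_values - predictions) ** 2).mean()

# The same calculation using scikit-learn
sklearn_mse = mean_squared_error(true_values, predictions)

print(manual_mse)   # 3.25
print(sklearn_mse)  # 3.25
```

The differences are 2, 0, -3, and 0, so the squared differences are 4, 0, 9, and 0, and their average is 13 / 4 = 3.25.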
As a recommendation, you may use the following variable names for the parts of the problem:
data should store the DataFrame of all the data stored in weather.csv.
features should store the DataFrame of just the features.
labels should store the Series of labels.
model should store the DecisionTreeRegressor.
error should store the error of the trained model on the whole dataset.
We don’t specify each step so that you will refer back to your notes and the process you saw in the notebook earlier in the lesson. Use that as your guide for the steps to train the model (accounting for the differences we highlighted above). Remember to import all the necessary libraries!
As a hint for correctness on this task, your model should get 0 error on this dataset. We will discuss in Lesson 12 why getting 0 error might be a sign that something is actually wrong with our model, but for this lesson, we will consider that correct!
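As a rough illustration of the hint above, the sketch below trains a DecisionTreeRegressor on a tiny made-up table and measures its error on the same rows it was trained on. The column names and values here are hypothetical and have nothing to do with weather.csv; the point is only to show what a result of 0 looks like.

```python
import pandas as pd
from sklearn.metrics import mean_squared_error
from sklearn.tree import DecisionTreeRegressor

# Tiny made-up dataset; the column names and values are hypothetical
toy = pd.DataFrame({
    'hour': [6, 9, 12, 15, 18],
    'humidity': [80, 70, 55, 50, 65],
    'temp': [12.0, 16.5, 21.0, 23.5, 18.0],
})
toy_features = toy[['hour', 'humidity']]
toy_labels = toy['temp']

# Train a tree with default settings (no depth limit)
toy_model = DecisionTreeRegressor()
toy_model.fit(toy_features, toy_labels)

# Evaluate on the same rows the tree was trained on
toy_predictions = toy_model.predict(toy_features)
toy_error = mean_squared_error(toy_labels, toy_predictions)
print(toy_error)  # 0.0
```

Every row in this toy table has a distinct set of feature values, so the tree can give each row its own leaf and reproduce every label exactly.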
For these problems, you should not use any loops!
# Write your code here!