@@ -105,11 +105,11 @@ Now that you have an understanding of the math behind linear regression, let's c
From the previous lesson you have probably seen that the average price for different months looks like this:
<img alt="Average price by month" src="../2-Data/images/barchart.png" width="50%"/>
<img alt="Average price by month" src="/2-Regression/2-Data/images/barchart.png" width="50%"/>
This suggests that there should be some correlation, so we can try training a linear regression model to predict the relationship between `Month` and `Price`, or between `DayOfYear` and `Price`. Here is the scatter plot that shows the latter relationship:
<img alt="Scatter plot of Price vs. Day of Year" src="images/scatter-dayofyear.png" width="50%"/>
<img alt="Scatter plot of Price vs. Day of Year" src="/2-Regression/3-Linear/images/scatter-dayofyear.png" width="50%"/>
Let's see if there is a correlation using the `corr` function:
@@ -128,7 +128,7 @@ for i,var in enumerate(new_pumpkins['Variety'].unique()):
<img alt="Scatter plot of Price vs. Day of Year" src="images/pie-pumpkins-scatter.png" width="50%"/>
<img alt="Scatter plot of Price vs. Day of Year" src="/2-Regression/3-Linear/images/pie-pumpkins-scatter.png" width="50%"/>
If we now calculate the correlation between `Price` and `DayOfYear` using the `corr` function, we get something like `-0.27`, which suggests that training a predictive model makes sense.
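
As a rough illustration of that check, here is a minimal sketch using pandas' `Series.corr`. The data is synthetic and stands in for the lesson's pumpkin dataset: the `toy` DataFrame, its values, and the random seed are all assumptions made up for this example, so the printed number will not match the `-0.27` quoted above.

```python
import numpy as np
import pandas as pd

# Synthetic stand-in for the pumpkin data (an assumption, not the lesson's dataset):
# prices drift slightly downward over the season, with a lot of noise on top.
rng = np.random.default_rng(0)
day_of_year = rng.integers(240, 335, size=200)
price = 30 - 0.05 * day_of_year + rng.normal(0, 3, size=200)

toy = pd.DataFrame({"DayOfYear": day_of_year, "Price": price})

# Series.corr computes the Pearson correlation coefficient by default.
r = toy["DayOfYear"].corr(toy["Price"])
print(f"Correlation between DayOfYear and Price: {r:.2f}")
```

A value close to 0 means there is almost no linear relationship, while values closer to -1 or 1 indicate a stronger one. A negative sign means the price tends to fall as the day of the year increases.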
@@ -248,7 +248,7 @@ Using `PolynomialFeatures(2)` means that we will include all second-degree polyn
Pipelines can be used in the same manner as the original `LinearRegression` object, i.e. we can `fit` the pipeline, and then use `predict` to get the prediction results. Here is the graph showing test data, and the approximation curve:
Using Polynomial Regression, we can get a slightly lower MSE and a slightly higher coefficient of determination, but the improvement is not significant. We need to take other features into account!
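
The `fit`/`predict` workflow described above can be sketched as follows. This is only an illustration on synthetic data, not the lesson's pumpkin dataset: the curved toy relationship, the variable names, and the train/test split settings are assumptions chosen for this example. It compares a plain `LinearRegression` with a pipeline that adds `PolynomialFeatures(2)`.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_squared_error
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic, deliberately curved data (an assumption for illustration only).
rng = np.random.default_rng(0)
X = rng.uniform(240, 335, size=(300, 1))  # stand-in for DayOfYear
y = 0.004 * (X[:, 0] - 290) ** 2 + 20 + rng.normal(0, 1.5, size=300)  # stand-in for Price

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=0
)

# Baseline: a straight line.
linear = LinearRegression().fit(X_train, y_train)
mse_linear = mean_squared_error(y_test, linear.predict(X_test))

# Pipeline: expand the input to [1, x, x^2], then fit a linear model on those features.
pipeline = make_pipeline(PolynomialFeatures(2), LinearRegression())
pipeline.fit(X_train, y_train)      # same call as on a plain LinearRegression
pred = pipeline.predict(X_test)     # same call as on a plain LinearRegression

mse_poly = mean_squared_error(y_test, pred)
r2_poly = pipeline.score(X_test, y_test)  # coefficient of determination

print(f"Linear MSE:     {mse_linear:.2f}")
print(f"Polynomial MSE: {mse_poly:.2f}")
print(f"Polynomial R^2: {r2_poly:.2f}")
```

Because the toy data is curved by construction, the gap between the two models is larger here than what the lesson reports for the real pumpkin prices.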
@@ -266,7 +266,7 @@ In the ideal world, we want to be able to predict prices for different pumpkin v
Here you can see how average price depends on variety:
<img alt="Average price by variety" src="images/price-by-variety.png" width="50%"/>
<img alt="Average price by variety" src="/2-Regression/3-Linear/images/price-by-variety.png" width="50%"/>
To take variety into account, we first need to convert it to numeric form, or **encode** it. There are several ways we can do it: