Fix formatting

pull/562/head
Dmitri Soshnikov 4 years ago
parent 69c817b2a9
commit 18c38929a0

@@ -77,7 +77,7 @@ A good linear regression model will be one that has a high (nearer to 1 than 0)
In the code below, we will assume that we have cleaned up the data, and obtained a dataframe called `new_pumpkins`, similar to the following:
-| Month | DayOfYear | Variety | City | Package | Low Price | High Price | Price
+ID | Month | DayOfYear | Variety | City | Package | Low Price | High Price | Price
---|-------|-----------|---------|------|---------|-----------|------------|-------
70 | 9 | 267 | PIE TYPE | BALTIMORE | 1 1/9 bushel cartons | 15.0 | 15.0 | 13.636364
71 | 9 | 267 | PIE TYPE | BALTIMORE | 1 1/9 bushel cartons | 18.0 | 18.0 | 16.363636
@@ -97,11 +97,11 @@ Now that you have an understanding of the math behind linear regression, let's c
From the previous lesson, you have probably seen that the average price for different months looks like this:
-<img alt="Average price by month" src="../2-Data/images/barchart.png" width="30%"/>
+<img alt="Average price by month" src="../2-Data/images/barchart.png" width="50%"/>
This suggests that there should be some correlation, and we can try training a linear regression model to predict the relationship between `Month` and `Price`, or between `DayOfYear` and `Price`. Here is the scatter plot that shows the latter relationship:
-<img alt="Scatter plot of Price vs. Day of Year" src="images/scatter-dayofyear.png" width="30%" />
+<img alt="Scatter plot of Price vs. Day of Year" src="images/scatter-dayofyear.png" width="50%" />
It looks like there are different clusters of prices corresponding to different pumpkin varieties. To confirm this hypothesis, let's plot each pumpkin category using a different color. By passing the `ax` parameter to the `scatter` plotting function, we can plot all points on the same graph:
@@ -113,7 +113,7 @@ for i,var in enumerate(new_pumpkins['Variety'].unique()):
    ax = df.plot.scatter('DayOfYear','Price',ax=ax,c=colors[i],label=var)
```
-<img alt="Scatter plot of Price vs. Day of Year" src="images/scatter-dayofyear-color.png" width="30%" />
+<img alt="Scatter plot of Price vs. Day of Year" src="images/scatter-dayofyear-color.png" width="50%" />
Our investigation suggests that variety has more effect on the overall price than the actual selling date. So let us focus for the moment on only one pumpkin variety, and see what effect the date has:
@@ -122,7 +122,7 @@ Our investigation suggests that variety has more effect on the overall price tha
pie_pumpkins = new_pumpkins[new_pumpkins['Variety']=='PIE TYPE']
pie_pumpkins.plot.scatter('DayOfYear','Price')
```
-<img alt="Scatter plot of Price vs. Day of Year" src="images/pie-pumpkins-scatter.png" width="30%" />
+<img alt="Scatter plot of Price vs. Day of Year" src="images/pie-pumpkins-scatter.png" width="50%" />
If we now calculate the correlation between `Price` and `DayOfYear` using the `corr` function, we will get something like `-0.27`, which means that training a predictive model makes sense.
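By default, pandas' `corr` computes Pearson's correlation coefficient (`method='pearson'`). As a dependency-free sketch of what that number measures, the function below computes it from scratch. The `pearson` helper and the `day_of_year`/`price` values are hypothetical illustrations, not data taken from the pumpkin dataset.

```python
import math


def pearson(xs, ys):
    """Pearson correlation: covariance divided by the product of standard deviations."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Co-variation of the two series around their means.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Spread of each series (the 1/n factors cancel out, so they are omitted).
    sx = math.sqrt(sum((x - mean_x) ** 2 for x in xs))
    sy = math.sqrt(sum((y - mean_y) ** 2 for y in ys))
    return cov / (sx * sy)


# Made-up (DayOfYear, Price) pairs for illustration only.
day_of_year = [240, 250, 260, 270, 280, 290]
price = [18.0, 17.5, 16.0, 16.5, 14.0, 15.0]

r = pearson(day_of_year, price)
print(f"correlation = {r:.2f}")
```

A value near `-1` or `1` indicates a strong linear relationship, while a value near `0` indicates a weak one. The negative sign means prices tend to fall as the day of year grows.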
@@ -193,7 +193,7 @@ plt.scatter(X_test,y_test)
plt.plot(X_test,pred)
```
-<img alt="Linear regression" src="images/linear-results.png" width="40%" />
+<img alt="Linear regression" src="images/linear-results.png" width="50%" />
## Polynomial Regression
@@ -223,7 +223,7 @@ Using `PolynomialFeatures(2)` means that we will include all second-degree polyn
The pipeline can be used in the same manner as the original `LinearRegression` object, i.e. we can `fit` the pipeline and then use `predict` to get the prediction results. Here is the graph showing the test data and the approximation curve:
-<img alt="Polynomial regression" src="images/poly-results.png" width="40%" />
+<img alt="Polynomial regression" src="images/poly-results.png" width="50%" />
Using polynomial regression, we can get a slightly lower MSE and a higher coefficient of determination, but not significantly. We need to take other features into account!
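For a single input feature, a `PolynomialFeatures(2)` + `LinearRegression` pipeline amounts to an ordinary least-squares fit on the columns 1, x and x². The sketch below reproduces that idea without scikit-learn so the MSE and coefficient-of-determination comparison can be seen directly. It makes several assumptions: the helpers `fit_polynomial`, `predict`, `mse` and `r2` are hypothetical names, the data are made up, and, for brevity, both models are scored on their own training points rather than on a held-out test set as in the lesson.

```python
def fit_polynomial(xs, ys, degree):
    """Least-squares coefficients [c0, c1, ..., c_degree] for y ~ sum(c_k * x**k).

    Solves the normal equations (A^T A) c = A^T y, where the design matrix A
    has columns 1, x, x**2, ..., x**degree.
    """
    m = degree + 1
    # Augmented matrix: entry (i, j) of A^T A is sum(x**(i + j)),
    # and the last column holds entry i of A^T y, i.e. sum(y * x**i).
    aug = [
        [sum(x ** (i + j) for x in xs) for j in range(m)]
        + [sum(y * x ** i for x, y in zip(xs, ys))]
        for i in range(m)
    ]
    # Forward elimination with partial pivoting.
    for col in range(m):
        pivot = max(range(col, m), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, m):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, m + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution.
    coeffs = [0.0] * m
    for i in reversed(range(m)):
        rest = sum(aug[i][j] * coeffs[j] for j in range(i + 1, m))
        coeffs[i] = (aug[i][m] - rest) / aug[i][i]
    return coeffs


def predict(coeffs, x):
    return sum(c * x ** k for k, c in enumerate(coeffs))


def mse(ys, preds):
    return sum((y - p) ** 2 for y, p in zip(ys, preds)) / len(ys)


def r2(ys, preds):
    """Coefficient of determination: 1 - SS_res / SS_tot."""
    mean_y = sum(ys) / len(ys)
    ss_res = sum((y - p) ** 2 for y, p in zip(ys, preds))
    ss_tot = sum((y - mean_y) ** 2 for y in ys)
    return 1 - ss_res / ss_tot


# Made-up data; x stands in for a rescaled date (small values keep the
# plain normal equations numerically well-behaved), y for a price.
xs = [0, 1, 2, 3, 4, 5, 6, 7]
ys = [16.0, 15.2, 14.9, 14.1, 14.3, 14.0, 14.6, 15.1]

for degree in (1, 2):
    coeffs = fit_polynomial(xs, ys, degree)
    preds = [predict(coeffs, x) for x in xs]
    print(f"degree={degree}  MSE={mse(ys, preds):.3f}  R^2={r2(ys, preds):.3f}")
```

Because the degree-2 model contains the straight line as a special case, its training MSE can never be higher than the linear fit's. Whether that carries over to unseen data is what evaluating on a separate test set checks.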
@@ -237,7 +237,7 @@ In the ideal world, we want to be able to predict prices for different pumpkin v
Here you can see how the average price depends on variety:
-<img alt="Average price by variety" src="images/price-by-variety.png" width="40%" />
+<img alt="Average price by variety" src="images/price-by-variety.png" width="50%" />
To take variety into account, we first need to convert it to numeric form, or **encode** it. There are several ways we can do it:
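One common way to encode a categorical column such as `Variety` is one-hot encoding, which pandas provides as `pd.get_dummies`. The dependency-free sketch below shows what that transformation produces. The `one_hot` helper is a hypothetical name, and the sample list of varieties is made up for illustration; only `PIE TYPE` appears in the data shown earlier.

```python
def one_hot(values):
    """Encode categories as 0/1 vectors with exactly one 1 per row.

    Returns (categories, rows). Categories are sorted alphabetically so the
    column order is deterministic.
    """
    categories = sorted(set(values))
    position = {cat: i for i, cat in enumerate(categories)}
    rows = []
    for v in values:
        row = [0] * len(categories)
        row[position[v]] = 1  # mark the column for this row's category
        rows.append(row)
    return categories, rows


# Made-up sample of variety labels.
varieties = ["PIE TYPE", "MINIATURE", "PIE TYPE", "FAIRYTALE"]
cats, encoded = one_hot(varieties)
print(cats)
for v, row in zip(varieties, encoded):
    print(f"{v:<10} {row}")
```

Each variety becomes its own column, so a linear model can learn a separate price offset per variety without assuming any ordering between them.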
