diff --git a/2-Regression/3-Linear/README.md b/2-Regression/3-Linear/README.md
index bf8e5cc1..2a8ae959 100644
--- a/2-Regression/3-Linear/README.md
+++ b/2-Regression/3-Linear/README.md
@@ -77,7 +77,7 @@ A good linear regression model will be one that has a high (nearer to 1 than 0)
In the code below, we will assume that we have cleaned up the data, and obtained a dataframe called `new_pumpkins`, similar to the following:
- | Month | DayOfYear | Variety | City | Package | Low Price | High Price | Price
+ID | Month | DayOfYear | Variety | City | Package | Low Price | High Price | Price
---|-------|-----------|---------|------|---------|-----------|------------|-------
70 | 9 | 267 | PIE TYPE | BALTIMORE | 1 1/9 bushel cartons | 15.0 | 15.0 | 13.636364
71 | 9 | 267 | PIE TYPE | BALTIMORE | 1 1/9 bushel cartons | 18.0 | 18.0 | 16.363636
@@ -97,11 +97,11 @@ Now that you have an understanding of the math behind linear regression, let's c
From the previous lesson you have probably seen that the average price for different months looks like this:
-
+
This suggests that there should be some correlation, and we can try training a linear regression model to predict the relationship between `Month` and `Price`, or between `DayOfYear` and `Price`. Here is the scatter plot that shows the latter relationship:
-
+
It looks like there are different clusters of prices corresponding to different pumpkin varieties. To confirm this hypothesis, let's plot each pumpkin category using a different color. By passing the `ax` parameter to the `scatter` plotting function, we can plot all points on the same graph:
@@ -113,7 +113,7 @@ for i,var in enumerate(new_pumpkins['Variety'].unique()):
ax = df.plot.scatter('DayOfYear','Price',ax=ax,c=colors[i],label=var)
```
-
+
Our investigation suggests that variety has more effect on the overall price than the actual selling date. So let us focus for the moment on only one pumpkin variety, and see what effect the date has:
@@ -122,7 +122,7 @@ Our investigation suggests that variety has more effect on the overall price tha
pie_pumpkins = new_pumpkins[new_pumpkins['Variety']=='PIE TYPE']
pie_pumpkins.plot.scatter('DayOfYear','Price')
```
-
+
If we now calculate the correlation between `Price` and `DayOfYear` using the `corr` function, we will get something like `-0.27`. This is a weak negative correlation, but it suggests that training a predictive model makes sense.
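As a minimal sketch of that correlation check, the snippet below calls pandas' `Series.corr` on a small made-up dataframe. The name `toy` and its values are assumptions for illustration only, not the lesson's data, so the number it prints will differ from `-0.27`.

```python
import pandas as pd

# Hypothetical stand-in for pie_pumpkins; the values are made up for illustration
toy = pd.DataFrame({
    "DayOfYear": [250, 267, 274, 281, 295, 310],
    "Price": [18.0, 15.0, 16.4, 15.5, 13.6, 14.2],
})

# Series.corr computes the Pearson correlation coefficient by default
r = toy["Price"].corr(toy["DayOfYear"])
print(f"correlation: {r:.2f}")
```

A value close to 0 means little linear relationship, while values near -1 or 1 mean a strong one. The sign shows the direction of the trend.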
@@ -193,7 +193,7 @@ plt.scatter(X_test,y_test)
plt.plot(X_test,pred)
```
-
+
## Polynomial Regression
@@ -223,7 +223,7 @@ Using `PolynomialFeatures(2)` means that we will include all second-degree polyn
A pipeline can be used in the same manner as the original `LinearRegression` object, i.e. we can `fit` the pipeline, and then use `predict` to get the prediction results. Here is the graph showing the test data and the approximation curve:
-
+
Using polynomial regression, we can get a slightly lower MSE and a higher coefficient of determination, but not significantly. We need to take other features into account!
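The `fit` and `predict` usage described above can be sketched as follows. This is only an illustration on synthetic, noise-free quadratic data (an assumption, not the pumpkin dataset), and it builds the pipeline with scikit-learn's `make_pipeline`, which may or may not match how the lesson constructs it.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures

# Synthetic data (assumed for illustration): y = 0.5*x^2 - 2*x + 3, no noise
X = np.linspace(0, 10, 50).reshape(-1, 1)
y = 0.5 * X[:, 0] ** 2 - 2 * X[:, 0] + 3

# PolynomialFeatures(2) expands x into [1, x, x^2]; LinearRegression fits the weights
pipeline = make_pipeline(PolynomialFeatures(2), LinearRegression())
pipeline.fit(X, y)

pred = pipeline.predict(X)           # used exactly like a plain LinearRegression
mse = np.mean((pred - y) ** 2)       # mean squared error on the same data
score = pipeline.score(X, y)         # coefficient of determination (R^2)
print(f"MSE: {mse:.3g}, R^2: {score:.3f}")
```

Because the synthetic target is exactly quadratic, the fit here is essentially perfect. On real data such as the pumpkin prices, the improvement over a straight line is much smaller.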
@@ -237,7 +237,7 @@ In the ideal world, we want to be able to predict prices for different pumpkin v
Here you can see how average price depends on variety:
-
+
To take variety into account, we first need to convert it to numeric form, or **encode** it. There are several ways we can do it: