@@ -103,7 +103,14 @@ This suggests that there should be some correlation, and we can try training lin
<img alt="Scatter plot of Price vs. Day of Year" src="images/scatter-dayofyear.png" width="50%" />
It looks like there are different clusters of prices corresponding to different pumpkin varieties. To confirm this hypothesis, let's plot each pumpkin category using a different color. By passing an `ax` parameter to the `scatter` plotting function we can plot all points on the same graph:
Let's see if there is a correlation using the `corr` function:
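As a minimal sketch of that check, the snippet below calls pandas' `Series.corr` (Pearson correlation by default) on a small made-up table. The `new_pumpkins` name follows the lesson, but the column names `Month`, `DayOfYear`, and `Price` and all of the values are assumptions for illustration, so the printed numbers will not match the real dataset.

```python
import pandas as pd

# Hypothetical stand-in for the lesson's data; every value here is invented.
new_pumpkins = pd.DataFrame({
    "Month":     [8, 9, 9, 10, 10, 11, 11, 12],
    "DayOfYear": [230, 250, 265, 280, 295, 310, 325, 340],
    "Price":     [15.0, 18.0, 14.0, 17.0, 13.5, 16.0, 12.0, 14.5],
})

# Series.corr returns the Pearson correlation coefficient, a value in [-1, 1].
month_corr = new_pumpkins["Month"].corr(new_pumpkins["Price"])
day_corr = new_pumpkins["DayOfYear"].corr(new_pumpkins["Price"])

print(f"Month vs. Price:     {month_corr:.2f}")
print(f"DayOfYear vs. Price: {day_corr:.2f}")
```

A coefficient near 0 indicates a weak linear relationship, while values near -1 or 1 indicate a strong one.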
The correlation looks fairly weak, -0.15 for `Month` and -0.17 for `DayOfYear`, but there could be another important relationship. The scatter plot appears to show distinct clusters of prices corresponding to different pumpkin varieties. To confirm this hypothesis, let's plot each pumpkin category in a different color. By passing an `ax` parameter to the `scatter` plotting function, we can plot all points on the same graph:
```python
ax=None
@@ -115,7 +122,15 @@ for i,var in enumerate(new_pumpkins['Variety'].unique()):
<img alt="Scatter plot of Price vs. Day of Year" src="images/scatter-dayofyear-color.png" width="50%" />
Our investigation suggests that variety has more effect on the overall price than the actual selling date. So let us focus for the moment only on one pumpkin variety, and see what effect the date has on the price:
Our investigation suggests that variety has more effect on the overall price than the actual selling date. We can see this with a bar graph:
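One way to draw such a bar graph is sketched below on made-up data: group the rows by variety, average the price within each group, and plot the result. The `Variety` and `Price` column names and all of the values are assumptions for illustration, not the lesson's actual dataset.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend so the sketch runs without a display; omit in a notebook
import pandas as pd

# Hypothetical sample: two rows per variety, prices invented.
new_pumpkins = pd.DataFrame({
    "Variety": ["PIE TYPE", "PIE TYPE", "MINIATURE", "MINIATURE", "FAIRYTALE", "FAIRYTALE"],
    "Price":   [30.0, 34.0, 16.0, 18.0, 42.0, 46.0],
})

# Average price per variety; groupby sorts the variety names alphabetically.
mean_price = new_pumpkins.groupby("Variety")["Price"].mean()

# One bar per variety, with bar height equal to the mean price.
ax = mean_price.plot(kind="bar")
ax.set_ylabel("Mean price")
```

Large differences in bar height would support the claim that variety matters more than the selling date.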