Merge pull request #647 from microsoft/ml_for_beginners_review_3

Reviewing regression tools/data/linear learning units
pull/653/head
Carlotta Castelluccio 1 year ago committed by GitHub
commit 1c82556a31

@ -149,10 +149,11 @@ In a new code cell, load the diabetes dataset by calling `load_diabetes()`. The
✅ Think a bit about the relationship between the data and the regression target. Linear regression predicts relationships between feature X and target variable y. Can you find the [target](https://scikit-learn.org/stable/datasets/toy_dataset.html#diabetes-dataset) for the diabetes dataset in the documentation? What is this dataset demonstrating, given that target?
-2. Next, select a portion of this dataset to plot by arranging it into a new array using numpy's `newaxis` function. We are going to use linear regression to generate a line between values in this data, according to a pattern it determines.
+2. Next, select a portion of this dataset to plot by selecting the 3rd column of the dataset. You can do this by using the `:` operator to select all rows, and then selecting the 3rd column using the index (2). You can also reshape the data to be a 2D array - as required for plotting - by using `reshape(n_rows, n_columns)`. If one of the parameter is -1, the corresponding dimension is calculated automatically.
```python
-X = X[:, np.newaxis, 2]
+X = X[:, 2]
+X = X.reshape((-1,1))
```
✅ At any time, print out the data to check its shape.
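
Review note: the replaced and the new indexing forms in this hunk should yield the same single-column array, so the change affects readability rather than behavior. A minimal sketch to check this, using a small synthetic array as a hypothetical stand-in for the diabetes feature matrix (the names `old_style` and `new_style` are illustrative and not part of the lesson):

```python
import numpy as np

# Hypothetical stand-in for the diabetes feature matrix: 4 samples, 3 features.
X = np.arange(12, dtype=float).reshape(4, 3)

# Old form: take column index 2 and insert a new axis in a single indexing step.
old_style = X[:, np.newaxis, 2]

# New form: take column index 2 (a 1D array), then reshape it into one column.
# The -1 asks NumPy to infer the number of rows.
new_style = X[:, 2].reshape((-1, 1))

print(old_style.shape, new_style.shape)      # both are 2D, one column each
print(np.array_equal(old_style, new_style))  # element-wise comparison
```

The intermediate `X[:, 2]` is one-dimensional, which is why the extra `reshape` step is needed before the array can be used as a 2D feature matrix.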

File diff suppressed because one or more lines are too long

@ -73,11 +73,11 @@ Open the _notebook.ipynb_ file in Visual Studio Code and import the spreadsheet
There is missing data, but maybe it won't matter for the task at hand.
-1. To make your dataframe easier to work with, drop several of its columns, using `drop()`, keeping only the columns you need:
+1. To make your dataframe easier to work with, select only the columns you need, using the `loc` function which extracts from the original dataframe a group of rows (passed as first parameter) and columns (passed as second parameter). The expression `:` in the case below means "all rows".
```python
-new_columns = ['Package', 'Month', 'Low Price', 'High Price', 'Date']
-pumpkins = pumpkins.drop([c for c in pumpkins.columns if c not in new_columns], axis=1)
+columns_to_select = ['Package', 'Low Price', 'High Price', 'Date']
+pumpkins = pumpkins.loc[:, columns_to_select]
```
### Second, determine average price of pumpkin
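
Review note: switching from `drop()` to `.loc` also explains why `'Month'` leaves the list. The old list comprehension silently ignores names that are not in the dataframe, while `.loc` with a list of labels raises `KeyError` in recent pandas versions if any label is missing. The sketch below assumes, as the removal suggests, that the spreadsheet has no `Month` column at this point; the miniature dataframe and its values are hypothetical.

```python
import pandas as pd

# Hypothetical miniature of the pumpkin data; the real spreadsheet has more columns.
pumpkins = pd.DataFrame({
    "City Name": ["BALTIMORE", "BOSTON"],
    "Package": ["24 inch bins", "1 1/9 bushel cartons"],
    "Low Price": [270.0, 15.0],
    "High Price": [280.0, 18.0],
    "Date": ["4/29/17", "9/24/16"],
})

# Old form: drop every column not in the keep-list.
# 'Month' is absent from the frame (assumption) and is simply never matched.
new_columns = ["Package", "Month", "Low Price", "High Price", "Date"]
dropped = pumpkins.drop(
    [c for c in pumpkins.columns if c not in new_columns], axis=1
)

# New form: select all rows (:) and only the listed columns.
columns_to_select = ["Package", "Low Price", "High Price", "Date"]
selected = pumpkins.loc[:, columns_to_select]

print(dropped.equals(selected))

# Passing the old list to .loc fails because 'Month' does not exist.
try:
    pumpkins.loc[:, new_columns]
    missing_label_raises = False
except KeyError:
    missing_label_raises = True
print(missing_label_raises)
```

One further difference worth noting: `.loc` returns the columns in the order given in the list, whereas `drop()` preserves the original column order of the dataframe.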

@ -304,8 +304,8 @@
"source": [
"\n",
"# A set of new columns for a new dataframe. Filter out nonmatching columns\n",
-"new_columns = ['Package', 'Month', 'Low Price', 'High Price', 'Date']\n",
-"pumpkins = pumpkins.drop([c for c in pumpkins.columns if c not in new_columns], axis=1)\n",
+"columns_to_select = ['Package', 'Low Price', 'High Price', 'Date']\n",
+"pumpkins = pumpkins.loc[:, columns_to_select]\n",
"\n",
"# Get an average between low and high price for the base pumpkin price\n",
"price = (pumpkins['Low Price'] + pumpkins['High Price']) / 2\n",
@ -412,7 +412,7 @@
"name": "python",
"nbconvert_exporter": "python",
"pygments_lexer": "ipython3",
-"version": "3.8.9"
+"version": "3.11.1"
},
"metadata": {
"interpreter": {

@ -38,8 +38,8 @@
"source": [
"pumpkins = pumpkins[pumpkins['Package'].str.contains('bushel', case=True, regex=True)]\n",
"\n",
-"new_columns = ['Package', 'Variety', 'City Name', 'Month', 'Low Price', 'High Price', 'Date']\n",
-"pumpkins = pumpkins.drop([c for c in pumpkins.columns if c not in new_columns], axis=1)\n",
+"columns_to_select = ['Package', 'Variety', 'City Name', 'Low Price', 'High Price', 'Date']\n",
+"pumpkins = pumpkins.loc[:, columns_to_select]\n",
"\n",
"price = (pumpkins['Low Price'] + pumpkins['High Price']) / 2\n",
"\n",
