Merge pull request #647 from microsoft/ml_for_beginners_review_3

Reviewing regression tools/data/linear learning units
2 years ago · 1c82556a31
parent 9c925bf772 15ce95efe5
commit 1c82556a31
5 changed files with 557 additions and 79 deletions
--- a/2-Regression/1-Tools/README.md
+++ b/2-Regression/1-Tools/README.md
@@ -149,10 +149,11 @@ In a new code cell, load the diabetes dataset by calling `load_diabetes()`. The

    ✅ Think a bit about the relationship between the data and the regression target. Linear regression predicts relationships between feature X and target variable y. Can you find the [target](https://scikit-learn.org/stable/datasets/toy_dataset.html#diabetes-dataset) for the diabetes dataset in the documentation? What is this dataset demonstrating, given that target?

-2. Next, select a portion of this dataset to plot by arranging it into a new array using numpy's `newaxis` function. We are going to use linear regression to generate a line between values in this data, according to a pattern it determines.
+2. Next, select a portion of this dataset to plot by selecting the 3rd column of the dataset. You can do this by using the `:` operator to select all rows, and then selecting the 3rd column using the index (2). You can also reshape the data to be a 2D array - as required for plotting - by using `reshape(n_rows, n_columns)`. If one of the parameters is -1, the corresponding dimension is calculated automatically.

   ```python
-   X = X[:, np.newaxis, 2]
+   X = X[:, 2]
+   X = X.reshape((-1,1))
   ```

   ✅ At any time, print out the data to check its shape.
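Review note: the removed `np.newaxis` form and the added slice-then-`reshape` form should produce the same column vector, so this hunk reads as a change in explanation rather than behavior. The sketch below checks that on a small stand-in array; the array `X` here is hypothetical sample data, not the diabetes dataset, and the names `old_style` and `new_style` are illustrative only.

```python
import numpy as np

# Hypothetical stand-in for a feature matrix: 5 samples, 10 features.
X = np.arange(50, dtype=float).reshape(5, 10)

# Form removed by this commit: insert a new axis while picking column index 2.
old_style = X[:, np.newaxis, 2]

# Form added by this commit: pick column index 2 (1D), then reshape to one column.
# The -1 asks NumPy to infer the number of rows from the array's size.
new_style = X[:, 2].reshape((-1, 1))

print(old_style.shape, new_style.shape)
same = np.array_equal(old_style, new_style)
print(same)
```

Both expressions yield a 2D array with one column, which is the layout scikit-learn estimators expect for a single feature.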
--- a/2-Regression/1-Tools/solution/notebook.ipynb
+++ b/2-Regression/1-Tools/solution/notebook.ipynb
--- a/2-Regression/2-Data/README.md
+++ b/2-Regression/2-Data/README.md
@@ -73,11 +73,11 @@ Open the _notebook.ipynb_ file in Visual Studio Code and import the spreadsheet

    There is missing data, but maybe it won't matter for the task at hand.

-1. To make your dataframe easier to work with, drop several of its columns, using `drop()`, keeping only the columns you need:
+1. To make your dataframe easier to work with, select only the columns you need, using the `loc` indexer, which extracts from the original dataframe a group of rows (passed as the first parameter) and columns (passed as the second parameter). The expression `:` in the case below means "all rows".

    ```python
-    new_columns = ['Package', 'Month', 'Low Price', 'High Price', 'Date']
-    pumpkins = pumpkins.drop([c for c in pumpkins.columns if c not in new_columns], axis=1)
+    columns_to_select = ['Package', 'Low Price', 'High Price', 'Date']
+    pumpkins = pumpkins.loc[:, columns_to_select]
    ```

 ### Second, determine average price of pumpkin
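Review note: the `drop()` comprehension and the `.loc` selection are not fully interchangeable. The comprehension silently ignores a listed label that is absent from the dataframe, while `.loc` with a list of labels raises `KeyError` if any label is missing. That may be why `'Month'` no longer appears in the list, although this is an inference and not something the commit states. The sketch below uses a small hypothetical dataframe (the `Origin` column and all values are made up for illustration).

```python
import pandas as pd

# Hypothetical miniature of the pumpkin data; 'Origin' stands in for unwanted columns.
pumpkins = pd.DataFrame({
    "Package": ["1 1/9 bushel cartons", "1/2 bushel cartons"],
    "Low Price": [15.0, 18.0],
    "High Price": [15.0, 18.0],
    "Date": ["9/24/16", "9/24/16"],
    "Origin": ["MARYLAND", "DELAWARE"],
})

columns_to_select = ["Package", "Low Price", "High Price", "Date"]

# Old form: drop every column that is not in the keep-list.
dropped = pumpkins.drop(
    [c for c in pumpkins.columns if c not in columns_to_select], axis=1
)

# New form: select the wanted columns directly; ':' means all rows.
selected = pumpkins.loc[:, columns_to_select]
same_result = dropped.equals(selected)

# With a label that is not in the frame, the two forms diverge.
with_missing = columns_to_select + ["Month"]
tolerant = pumpkins.drop(
    [c for c in pumpkins.columns if c not in with_missing], axis=1
)
try:
    pumpkins.loc[:, with_missing]
    loc_raised = False
except KeyError:
    loc_raised = True

print(same_result, list(tolerant.columns), loc_raised)
```

Note also that `drop()` keeps the dataframe's original column order, whereas `.loc` returns columns in the order given in the list.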
--- a/2-Regression/2-Data/solution/notebook.ipynb
+++ b/2-Regression/2-Data/solution/notebook.ipynb
@@ -304,8 +304,8 @@
   "source": [
    "\n",
    "# A set of new columns for a new dataframe. Filter out nonmatching columns\n",
-    "new_columns = ['Package', 'Month', 'Low Price', 'High Price', 'Date']\n",
-    "pumpkins = pumpkins.drop([c for c in pumpkins.columns if c not in new_columns], axis=1)\n",
+    "columns_to_select = ['Package', 'Low Price', 'High Price', 'Date']\n",
+    "pumpkins = pumpkins.loc[:, columns_to_select]\n",
    "\n",
    "# Get an average between low and high price for the base pumpkin price\n",
    "price = (pumpkins['Low Price'] + pumpkins['High Price']) / 2\n",
@@ -412,7 +412,7 @@
   "name": "python",
   "nbconvert_exporter": "python",
   "pygments_lexer": "ipython3",
-   "version": "3.8.9"
+   "version": "3.11.1"
  },
  "metadata": {
   "interpreter": {
--- a/2-Regression/3-Linear/notebook.ipynb
+++ b/2-Regression/3-Linear/notebook.ipynb
@@ -38,8 +38,8 @@
   "source": [
    "pumpkins = pumpkins[pumpkins['Package'].str.contains('bushel', case=True, regex=True)]\n",
    "\n",
-    "new_columns = ['Package', 'Variety', 'City Name', 'Month', 'Low Price', 'High Price', 'Date']\n",
-    "pumpkins = pumpkins.drop([c for c in pumpkins.columns if c not in new_columns], axis=1)\n",
+    "columns_to_select = ['Package', 'Variety', 'City Name', 'Low Price', 'High Price', 'Date']\n",
+    "pumpkins = pumpkins.loc[:, columns_to_select]\n",
    "\n",
    "price = (pumpkins['Low Price'] + pumpkins['High Price']) / 2\n",
    "\n",