{
 "nbformat": 4,
 "nbformat_minor": 0,
 "metadata": {
  "colab": {
   "name": "lesson_12-R.ipynb",
   "provenance": [],
   "collapsed_sections": []
  },
  "kernelspec": {
   "name": "ir",
   "display_name": "R"
  },
  "language_info": {
   "name": "R"
  },
  "coopTranslator": {
   "original_hash": "fab50046ca413a38939d579f8432274f",
   "translation_date": "2025-09-06T15:38:31+00:00",
   "source_file": "4-Classification/3-Classifiers-2/solution/R/lesson_12-R.ipynb",
   "language_code": "en"
  }
 },
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "jsFutf_ygqSx"
   },
   "source": [
    "# Build a classification model: Delicious Asian and Indian Cuisines\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "HD54bEefgtNO"
   },
   "source": [
    "## Cuisine classifiers 2\n",
    "\n",
    "In this second classification lesson, we will explore `additional methods` for classifying categorical data. We will also discuss the implications of choosing one classifier over another.\n",
    "\n",
    "### [**Pre-lecture quiz**](https://gray-sand-07a10f403.1.azurestaticapps.net/quiz/23/)\n",
    "\n",
    "### **Prerequisite**\n",
    "\n",
    "We assume that you have completed the previous lessons since we will be building on concepts introduced earlier.\n",
    "\n",
    "For this lesson, the following packages will be required:\n",
    "\n",
    "-   `tidyverse`: The [tidyverse](https://www.tidyverse.org/) is a [set of R packages](https://www.tidyverse.org/packages) designed to make data science faster, easier, and more enjoyable!\n",
    "\n",
    "-   `tidymodels`: The [tidymodels](https://www.tidymodels.org/) framework is a [collection of packages](https://www.tidymodels.org/packages/) for modeling and machine learning.\n",
    "\n",
    "-   `themis`: The [themis package](https://themis.tidymodels.org/) provides additional recipe steps for handling imbalanced data.\n",
    "\n",
    "You can install them using the following command:\n",
    "\n",
    "`install.packages(c(\"tidyverse\", \"tidymodels\", \"kernlab\", \"themis\", \"ranger\", \"xgboost\", \"kknn\"))`\n",
    "\n",
    "Alternatively, the script below checks whether the required packages for this module are installed and installs any missing ones for you.\n"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "id": "vZ57IuUxgyQt"
   },
   "source": [
    "suppressWarnings(if (!require(\"pacman\"))install.packages(\"pacman\"))\n",
    "\n",
    "pacman::p_load(tidyverse, tidymodels, themis, kernlab, ranger, xgboost, kknn)"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "z22M-pj4g07x"
   },
   "source": [
    "## **1. A classification map**\n",
    "\n",
    "In our [previous lesson](https://github.com/microsoft/ML-For-Beginners/tree/main/4-Classification/2-Classifiers-1), we explored the question: how do we decide between different models? To a large extent, the choice depends on the characteristics of the data and the type of problem we aim to solve (e.g., classification or regression).\n",
    "\n",
    "Earlier, we learned about the various options available for classifying data using Microsoft's cheat sheet. Python's Machine Learning framework, Scikit-learn, provides a similar but more detailed cheat sheet that can help further refine your choice of estimators (another term for classifiers):\n",
    "\n",
    "<p >\n",
    "   <img src=\"../../images/map.png\"\n",
    "   width=\"700\"/>\n",
    "   <figcaption></figcaption>\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "u1i3xRIVg7vG"
   },
   "source": [
    "> Tip: [visit this map online](https://scikit-learn.org/stable/tutorial/machine_learning_map/) and click along the path to read documentation.\n",
    ">\n",
    "> The [Tidymodels reference site](https://www.tidymodels.org/find/parsnip/#models) also provides excellent documentation about different types of models.\n",
    "\n",
    "### **The plan** 🗺️\n",
    "\n",
    "This map is very useful once you have a solid understanding of your data, as you can 'navigate' its paths to make a decision:\n",
    "\n",
    "-   We have \\>50 samples\n",
    "\n",
    "-   We want to predict a category\n",
    "\n",
    "-   We have labeled data\n",
    "\n",
    "-   We have fewer than 100K samples\n",
    "\n",
    "-   ✨ We can choose a Linear SVC\n",
    "\n",
    "-   If that doesn't work, since we have numeric data\n",
    "\n",
    "    -   We can try a ✨ KNeighbors Classifier\n",
    "\n",
    "        -   If that doesn't work, try ✨ SVC and ✨ Ensemble Classifiers\n",
    "\n",
    "This is a great path to follow. Now, let's dive right into it using the [tidymodels](https://www.tidymodels.org/) modeling framework: a consistent and flexible collection of R packages designed to promote good statistical practices 😊.\n",
    "\n",
    "## 2. Split the data and handle imbalanced datasets.\n",
    "\n",
    "From our previous lessons, we learned that there were a set of common ingredients across our cuisines. Additionally, there was a significant imbalance in the distribution of cuisines.\n",
    "\n",
    "We'll address these issues by:\n",
    "\n",
    "-   Dropping the most common ingredients that cause confusion between distinct cuisines, using `dplyr::select()`.\n",
    "\n",
    "-   Using a `recipe` to preprocess the data and prepare it for modeling by applying an `over-sampling` algorithm.\n",
    "\n",
    "We already covered this in the previous lesson, so this should be a piece of cake 🥳!\n"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "id": "6tj_rN00hClA"
   },
   "source": [
    "# Load the core Tidyverse and Tidymodels packages\n",
    "library(tidyverse)\n",
    "library(tidymodels)\n",
    "\n",
    "# Load the original cuisines data\n",
    "df <- read_csv(file = \"https://raw.githubusercontent.com/microsoft/ML-For-Beginners/main/4-Classification/data/cuisines.csv\")\n",
    "\n",
    "# Drop id column, rice, garlic and ginger from our original data set\n",
    "df_select <- df %>% \n",
    "  select(-c(1, rice, garlic, ginger)) %>%\n",
    "  # Encode cuisine column as categorical\n",
    "  mutate(cuisine = factor(cuisine))\n",
    "\n",
    "\n",
    "# Create data split specification\n",
    "set.seed(2056)\n",
    "cuisines_split <- initial_split(data = df_select,\n",
    "                                strata = cuisine,\n",
    "                                prop = 0.7)\n",
    "\n",
    "# Extract the data in each split\n",
    "cuisines_train <- training(cuisines_split)\n",
    "cuisines_test <- testing(cuisines_split)\n",
    "\n",
    "# Display distribution of cuisines in the training set\n",
    "cuisines_train %>% \n",
    "  count(cuisine) %>% \n",
    "  arrange(desc(n))"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "zFin5yw3hHb1"
   },
   "source": [
    "### Addressing Imbalanced Data\n",
    "\n",
    "Imbalanced data can often negatively impact model performance. Many models work best when the number of observations is balanced, and they tend to struggle when faced with unbalanced data.\n",
    "\n",
    "There are primarily two approaches to handle imbalanced datasets:\n",
    "\n",
    "-   Adding observations to the minority class: `Over-sampling`, for example, using the SMOTE algorithm, which synthetically generates new examples for the minority class by leveraging the nearest neighbors of those cases.\n",
    "\n",
    "-   Removing observations from the majority class: `Under-sampling`\n",
    "\n",
    "In our previous lesson, we demonstrated how to handle imbalanced datasets using a `recipe`. A recipe can be thought of as a blueprint that outlines the steps to be applied to a dataset to prepare it for analysis. In this case, we aim to achieve an equal distribution of cuisines in our `training set`. Let’s dive in!\n"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "id": "cRzTnHolhLWd"
   },
   "source": [
    "# Load themis package for dealing with imbalanced data\n",
    "library(themis)\n",
    "\n",
    "# Create a recipe for preprocessing training data\n",
    "cuisines_recipe <- recipe(cuisine ~ ., data = cuisines_train) %>%\n",
    "  step_smote(cuisine) \n",
    "\n",
    "# Print recipe\n",
    "cuisines_recipe"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "KxOQ2ORhhO81"
   },
   "source": [
    "Now we are ready to train models 👩‍💻👨‍💻!\n",
    "\n",
    "## 3. Beyond multinomial regression models\n",
    "\n",
    "In our previous lesson, we explored multinomial regression models. Let's dive into some more flexible models for classification.\n",
    "\n",
    "### Support Vector Machines\n",
    "\n",
    "In classification tasks, `Support Vector Machines` is a machine learning technique that aims to find a *hyperplane* that \"optimally\" separates the classes. Here's a simple example:\n",
    "\n",
    "<p >\n",
    "   <img src=\"../../images/svm.png\"\n",
    "   width=\"300\"/>\n",
    "   <figcaption>https://commons.wikimedia.org/w/index.php?curid=22877598</figcaption>\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "C4Wsd0vZhXYu"
   },
   "source": [
    "H1~ does not separate the classes. H2~ does, but only with a small margin. H3~ separates them with the maximal margin.\n",
    "\n",
    "#### Linear Support Vector Classifier\n",
    "\n",
    "Support-Vector clustering (SVC) is part of the Support-Vector machines family of machine learning techniques. In SVC, the hyperplane is selected to correctly separate `most` of the training observations, but `may misclassify` a few observations. By allowing some points to fall on the wrong side, the SVM becomes more robust to outliers, which improves its ability to generalize to new data. The parameter that controls this tolerance is called `cost`, which has a default value of 1 (see `help(\"svm_poly\")`).\n",
    "\n",
    "Let's create a linear SVC by setting `degree = 1` in a polynomial SVM model.\n"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "id": "vJpp6nuChlBz"
   },
   "source": [
    "# Make a linear SVC specification\n",
    "svc_linear_spec <- svm_poly(degree = 1) %>% \n",
    "  set_engine(\"kernlab\") %>% \n",
    "  set_mode(\"classification\")\n",
    "\n",
    "# Bundle specification and recipe into a worklow\n",
    "svc_linear_wf <- workflow() %>% \n",
    "  add_recipe(cuisines_recipe) %>% \n",
    "  add_model(svc_linear_spec)\n",
    "\n",
    "# Print out workflow\n",
    "svc_linear_wf"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "rDs8cWNkhoqu"
   },
   "source": [
    "Now that we have encapsulated the preprocessing steps and model specification into a *workflow*, we can proceed to train the linear SVC and assess the results simultaneously. For performance metrics, let's define a metric set to evaluate: `accuracy`, `sensitivity`, `Positive Predicted Value`, and `F Measure`.\n",
    "\n",
    "> `augment()` will append column(s) containing predictions to the provided data.\n"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "id": "81wiqcwuhrnq"
   },
   "source": [
    "# Train a linear SVC model\n",
    "svc_linear_fit <- svc_linear_wf %>% \n",
    "  fit(data = cuisines_train)\n",
    "\n",
    "# Create a metric set\n",
    "eval_metrics <- metric_set(ppv, sens, accuracy, f_meas)\n",
    "\n",
    "\n",
    "# Make predictions and Evaluate model performance\n",
    "svc_linear_fit %>% \n",
    "  augment(new_data = cuisines_test) %>% \n",
    "  eval_metrics(truth = cuisine, estimate = .pred_class)"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "0UFQvHf-huo3"
   },
   "source": [
    "#### Support Vector Machine\n",
    "\n",
    "The support vector machine (SVM) is an advanced version of the support vector classifier designed to handle non-linear boundaries between classes. Essentially, SVMs utilize the *kernel trick* to expand the feature space, making it possible to model nonlinear relationships between classes. One widely used and highly versatile kernel function employed by SVMs is the *Radial basis function.* Let's explore how it performs with our data.\n"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "id": "-KX4S8mzhzmp"
   },
   "source": [
    "set.seed(2056)\n",
    "\n",
    "# Make an RBF SVM specification\n",
    "svm_rbf_spec <- svm_rbf() %>% \n",
    "  set_engine(\"kernlab\") %>% \n",
    "  set_mode(\"classification\")\n",
    "\n",
    "# Bundle specification and recipe into a worklow\n",
    "svm_rbf_wf <- workflow() %>% \n",
    "  add_recipe(cuisines_recipe) %>% \n",
    "  add_model(svm_rbf_spec)\n",
    "\n",
    "\n",
    "# Train an RBF model\n",
    "svm_rbf_fit <- svm_rbf_wf %>% \n",
    "  fit(data = cuisines_train)\n",
    "\n",
    "\n",
    "# Make predictions and Evaluate model performance\n",
    "svm_rbf_fit %>% \n",
    "  augment(new_data = cuisines_test) %>% \n",
    "  eval_metrics(truth = cuisine, estimate = .pred_class)"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "QBFSa7WSh4HQ"
   },
   "source": [
    "Much better 🤩!\n",
    "\n",
    "> ✅ Please see:\n",
    ">\n",
    "> -   [*Support Vector Machines*](https://bradleyboehmke.github.io/HOML/svm.html), Hands-on Machine Learning with R\n",
    ">\n",
    "> -   [*Support Vector Machines*](https://www.statlearning.com/), An Introduction to Statistical Learning with Applications in R\n",
    ">\n",
    "> for further reading.\n",
    "\n",
    "### Nearest Neighbor classifiers\n",
    "\n",
    "*K*-nearest neighbor (KNN) is an algorithm where each observation is predicted based on its *similarity* to other observations.\n",
    "\n",
    "Let's fit one to our data.\n"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "id": "k4BxxBcdh9Ka"
   },
   "source": [
    "# Make a KNN specification\n",
    "knn_spec <- nearest_neighbor() %>% \n",
    "  set_engine(\"kknn\") %>% \n",
    "  set_mode(\"classification\")\n",
    "\n",
    "# Bundle recipe and model specification into a workflow\n",
    "knn_wf <- workflow() %>% \n",
    "  add_recipe(cuisines_recipe) %>% \n",
    "  add_model(knn_spec)\n",
    "\n",
    "# Train a boosted tree model\n",
    "knn_wf_fit <- knn_wf %>% \n",
    "  fit(data = cuisines_train)\n",
    "\n",
    "\n",
    "# Make predictions and Evaluate model performance\n",
    "knn_wf_fit %>% \n",
    "  augment(new_data = cuisines_test) %>% \n",
    "  eval_metrics(truth = cuisine, estimate = .pred_class)"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "HaegQseriAcj"
   },
   "source": [
    "It seems that this model is not performing very well. Adjusting the model's arguments (see `help(\"nearest_neighbor\")`) might improve its performance. Make sure to give it a try.\n",
    "\n",
    "> ✅ Please refer to:\n",
    ">\n",
    "> -   [Hands-on Machine Learning with R](https://bradleyboehmke.github.io/HOML/)\n",
    ">\n",
    "> -   [An Introduction to Statistical Learning with Applications in R](https://www.statlearning.com/)\n",
    ">\n",
    "> to learn more about *K*-Nearest Neighbors classifiers.\n",
    "\n",
    "### Ensemble classifiers\n",
    "\n",
    "Ensemble algorithms work by combining multiple base estimators to create an optimal model, either by:\n",
    "\n",
    "`bagging`: using an *averaging function* on a collection of base models\n",
    "\n",
    "`boosting`: building a sequence of models that improve upon each other to enhance predictive performance.\n",
    "\n",
    "Let's begin by experimenting with a Random Forest model, which constructs a large collection of decision trees and then applies an averaging function to achieve a better overall model.\n"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "id": "49DPoVs6iK1M"
   },
   "source": [
    "# Make a random forest specification\n",
    "rf_spec <- rand_forest() %>% \n",
    "  set_engine(\"ranger\") %>% \n",
    "  set_mode(\"classification\")\n",
    "\n",
    "# Bundle recipe and model specification into a workflow\n",
    "rf_wf <- workflow() %>% \n",
    "  add_recipe(cuisines_recipe) %>% \n",
    "  add_model(rf_spec)\n",
    "\n",
    "# Train a random forest model\n",
    "rf_wf_fit <- rf_wf %>% \n",
    "  fit(data = cuisines_train)\n",
    "\n",
    "\n",
    "# Make predictions and Evaluate model performance\n",
    "rf_wf_fit %>% \n",
    "  augment(new_data = cuisines_test) %>% \n",
    "  eval_metrics(truth = cuisine, estimate = .pred_class)"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "RGVYwC_aiUWc"
   },
   "source": [
    "Good job 👏!\n",
    "\n",
    "Let's also experiment with a Boosted Tree model.\n",
    "\n",
    "Boosted Tree is an ensemble method that builds a sequence of decision trees, where each tree relies on the outcomes of the previous ones to gradually minimize errors. It emphasizes the weights of misclassified items and adjusts the next classifier to improve accuracy.\n",
    "\n",
    "There are various approaches to fitting this model (refer to `help(\"boost_tree\")`). In this example, we'll fit Boosted Trees using the `xgboost` engine.\n"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "id": "Py1YWo-micWs"
   },
   "source": [
    "# Make a boosted tree specification\n",
    "boost_spec <- boost_tree(trees = 200) %>% \n",
    "  set_engine(\"xgboost\") %>% \n",
    "  set_mode(\"classification\")\n",
    "\n",
    "# Bundle recipe and model specification into a workflow\n",
    "boost_wf <- workflow() %>% \n",
    "  add_recipe(cuisines_recipe) %>% \n",
    "  add_model(boost_spec)\n",
    "\n",
    "# Train a boosted tree model\n",
    "boost_wf_fit <- boost_wf %>% \n",
    "  fit(data = cuisines_train)\n",
    "\n",
    "\n",
    "# Make predictions and Evaluate model performance\n",
    "boost_wf_fit %>% \n",
    "  augment(new_data = cuisines_test) %>% \n",
    "  eval_metrics(truth = cuisine, estimate = .pred_class)"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "zNQnbuejigZM"
   },
   "source": [
    "> ✅ Please see:\n",
    ">\n",
    "> -   [Machine Learning for Social Scientists](https://cimentadaj.github.io/ml_socsci/tree-based-methods.html#random-forests)\n",
    ">\n",
    "> -   [Hands-on Machine Learning with R](https://bradleyboehmke.github.io/HOML/)\n",
    ">\n",
    "> -   [An Introduction to Statistical Learning with Applications in R](https://www.statlearning.com/)\n",
    ">\n",
    "> -   <https://algotech.netlify.app/blog/xgboost/> - Explores the AdaBoost model which is a good alternative to xgboost.\n",
    ">\n",
    "> to learn more about Ensemble classifiers.\n",
    "\n",
    "## 4. Extra - comparing multiple models\n",
    "\n",
    "We’ve worked with quite a few models in this lab 🙌. Creating workflows for different combinations of preprocessors and/or model specifications, and then calculating performance metrics for each one individually, can quickly become tedious and time-consuming.\n",
    "\n",
    "Let’s tackle this by building a function that fits a list of workflows on the training set and then returns the performance metrics based on the test set. To achieve this, we’ll use `map()` and `map_dfr()` from the [purrr](https://purrr.tidyverse.org/) package to apply functions to each element in a list.\n",
    "\n",
    "> [`map()`](https://purrr.tidyverse.org/reference/map.html) functions let you replace many for loops with code that is more concise and easier to read. The best resource for learning about [`map()`](https://purrr.tidyverse.org/reference/map.html) functions is the [iteration chapter](http://r4ds.had.co.nz/iteration.html) in R for data science.\n"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "id": "Qzb7LyZnimd2"
   },
   "source": [
    "set.seed(2056)\n",
    "\n",
    "# Create a metric set\n",
    "eval_metrics <- metric_set(ppv, sens, accuracy, f_meas)\n",
    "\n",
    "# Define a function that returns performance metrics\n",
    "compare_models <- function(workflow_list, train_set, test_set){\n",
    "  \n",
    "  suppressWarnings(\n",
    "    # Fit each model to the train_set\n",
    "    map(workflow_list, fit, data = train_set) %>% \n",
    "    # Make predictions on the test set\n",
    "      map_dfr(augment, new_data = test_set, .id = \"model\") %>%\n",
    "    # Select desired columns\n",
    "      select(model, cuisine, .pred_class) %>% \n",
    "    # Evaluate model performance\n",
    "      group_by(model) %>% \n",
    "      eval_metrics(truth = cuisine, estimate = .pred_class) %>% \n",
    "      ungroup()\n",
    "  )\n",
    "  \n",
    "} # End of function"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Fwa712sNisDA"
   },
   "source": []
  },
  {
   "cell_type": "code",
   "metadata": {
    "id": "3i4VJOi2iu-a"
   },
   "source": [
    "# Make a list of workflows\n",
    "workflow_list <- list(\n",
    "  \"svc\" = svc_linear_wf,\n",
    "  \"svm\" = svm_rbf_wf,\n",
    "  \"knn\" = knn_wf,\n",
    "  \"random_forest\" = rf_wf,\n",
    "  \"xgboost\" = boost_wf)\n",
    "\n",
    "# Call the function\n",
    "set.seed(2056)\n",
    "perf_metrics <- compare_models(workflow_list = workflow_list, train_set = cuisines_train, test_set = cuisines_test)\n",
    "\n",
    "# Print out performance metrics\n",
    "perf_metrics %>% \n",
    "  group_by(.metric) %>% \n",
    "  arrange(desc(.estimate)) %>% \n",
    "  slice_head(n=7)\n",
    "\n",
    "# Compare accuracy\n",
    "perf_metrics %>% \n",
    "  filter(.metric == \"accuracy\") %>% \n",
    "  arrange(desc(.estimate))\n"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "KuWK_lEli4nW"
   },
   "source": [
    "[**workflowset**](https://workflowsets.tidymodels.org/) package allows users to create and easily fit a large number of models but is primarily designed to work with resampling techniques such as `cross-validation`, an approach we have yet to cover.\n",
    "\n",
    "## **🚀Challenge**\n",
    "\n",
    "Each of these techniques has a variety of parameters that you can adjust, such as `cost` in SVMs, `neighbors` in KNN, and `mtry` (Randomly Selected Predictors) in Random Forest.\n",
    "\n",
    "Research the default parameters for each and consider what changing these parameters might mean for the quality of the model.\n",
    "\n",
    "To learn more about a specific model and its parameters, use: `help(\"model\")`, e.g., `help(\"rand_forest\")`.\n",
    "\n",
    "> In practice, we often *estimate* the *optimal values* for these parameters by training multiple models on a `simulated data set` and evaluating how well each model performs. This process is called **tuning**.\n",
    "\n",
    "### [**Post-lecture quiz**](https://gray-sand-07a10f403.1.azurestaticapps.net/quiz/24/)\n",
    "\n",
    "### **Review & Self Study**\n",
    "\n",
    "There’s a lot of technical terminology in these lessons, so take a moment to review [this list](https://docs.microsoft.com/dotnet/machine-learning/resources/glossary?WT.mc_id=academic-77952-leestott) of useful terms!\n",
    "\n",
    "#### THANK YOU TO:\n",
    "\n",
    "[`Allison Horst`](https://twitter.com/allison_horst/) for creating the wonderful illustrations that make R more approachable and engaging. You can find more of her work in her [gallery](https://www.google.com/url?q=https://github.com/allisonhorst/stats-illustrations&sa=D&source=editors&ust=1626380772530000&usg=AOvVaw3zcfyCizFQZpkSLzxiiQEM).\n",
    "\n",
    "[Cassie Breviu](https://www.twitter.com/cassieview) and [Jen Looper](https://www.twitter.com/jenlooper) for creating the original Python version of this module ♥️\n",
    "\n",
    "Happy Learning,\n",
    "\n",
    "[Eric](https://twitter.com/ericntay), Gold Microsoft Learn Student Ambassador.\n",
    "\n",
    "<p >\n",
    "   <img src=\"../../images/r_learners_sm.jpeg\"\n",
    "   width=\"569\"/>\n",
    "   <figcaption>Artwork by @allison_horst</figcaption>\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n---\n\n**Disclaimer**:  \nThis document has been translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we aim for accuracy, please note that automated translations may include errors or inaccuracies. The original document in its native language should be regarded as the authoritative source. For critical information, professional human translation is advised. We are not responsible for any misunderstandings or misinterpretations resulting from the use of this translation.\n"
   ]
  }
 ]
}