ML-For-Beginners/translations/zh/4-Classification/3-Classifiers-2/solution/R/lesson_12-R.ipynb

{
 "nbformat": 4,
 "nbformat_minor": 0,
 "metadata": {
  "colab": {
   "name": "lesson_12-R.ipynb",
   "provenance": [],
   "collapsed_sections": []
  },
  "kernelspec": {
   "name": "ir",
   "display_name": "R"
  },
  "language_info": {
   "name": "R"
  },
  "coopTranslator": {
   "original_hash": "fab50046ca413a38939d579f8432274f",
   "translation_date": "2025-09-03T20:31:31+00:00",
   "source_file": "4-Classification/3-Classifiers-2/solution/R/lesson_12-R.ipynb",
   "language_code": "zh"
  }
 },
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "jsFutf_ygqSx"
   },
   "source": [
    "# Build a classification model: Delicious Asian and Indian Cuisines\n"
   ]
  },
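  {
   "cell_type": "markdown",
   "metadata": {
    "id": "HD54bEefgtNO"
   },
   "source": [
    "## Cuisine classifiers 2\n",
    "\n",
    "In this second classification lesson, we will explore `more ways` to classify categorical data. We will also learn about the ramifications of choosing one classifier over another.\n",
    "\n",
    "### [**Pre-lecture quiz**](https://gray-sand-07a10f403.1.azurestaticapps.net/quiz/23/)\n",
    "\n",
    "### **Prerequisite**\n",
    "\n",
    "We assume that you have completed the previous lessons, since we will carry forward some of the concepts learned there.\n",
    "\n",
    "For this lesson, we'll need the following packages:\n",
    "\n",
    "-   `tidyverse`: The [tidyverse](https://www.tidyverse.org/) is a [collection of R packages](https://www.tidyverse.org/packages) designed to make data science faster, easier and more fun!\n",
    "\n",
    "-   `tidymodels`: The [tidymodels](https://www.tidymodels.org/) framework is a [collection of R packages](https://www.tidymodels.org/packages) for modelling and machine learning.\n",
    "\n",
    "-   `themis`: The [themis package](https://themis.tidymodels.org/) provides extra recipe steps for dealing with imbalanced data.\n",
    "\n",
    "You can install them, together with the modelling engines used later in this lesson (`kernlab`, `ranger`, `xgboost` and `kknn`), with:\n",
    "\n",
    "`install.packages(c(\"tidyverse\", \"tidymodels\", \"kernlab\", \"themis\", \"ranger\", \"xgboost\", \"kknn\"))`\n",
    "\n",
    "Alternatively, the script below checks whether you have the packages required to complete this module and installs any that are missing.\n"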
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "id": "vZ57IuUxgyQt"
   },
   "source": [
    "# Install pacman if it is missing, then use it to install (if needed) and load the packages\n",
    "suppressWarnings(if (!require(\"pacman\")) install.packages(\"pacman\"))\n",
    "\n",
    "pacman::p_load(tidyverse, tidymodels, themis, kernlab, ranger, xgboost, kknn)"
   ],
   "execution_count": null,
   "outputs": []
  },
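  {
   "cell_type": "markdown",
   "metadata": {
    "id": "z22M-pj4g07x"
   },
   "source": [
    "## **1. A classification map**\n",
    "\n",
    "In our [previous lesson](https://github.com/microsoft/ML-For-Beginners/tree/main/4-Classification/2-Classifiers-1), we tried to address the question: how do we choose between multiple models? To a great extent, it depends on the characteristics of the data and the type of problem we want to solve (for instance classification or regression).\n",
    "\n",
    "Previously, we learned about the various options you have when classifying data using Microsoft's cheat sheet. Scikit-learn, Python's machine learning framework, offers a similar but more granular cheat sheet that can further help narrow down your choice of estimators (another term for classifiers):\n",
    "\n",
    "<p >\n",
    "   <img src=\"../../images/map.png\"\n",
    "   width=\"700\"/>\n",
    "   <figcaption></figcaption>\n",
    "</p>\n"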
   ]
  },
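  {
   "cell_type": "markdown",
   "metadata": {
    "id": "u1i3xRIVg7vG"
   },
   "source": [
    "> Tip: [visit this map online](https://scikit-learn.org/stable/tutorial/machine_learning_map/) and click along the path to read the documentation.\n",
    ">\n",
    "> The [Tidymodels reference site](https://www.tidymodels.org/find/parsnip/#models) also provides excellent documentation about the different types of models.\n",
    "\n",
    "### **The plan** 🗺️\n",
    "\n",
    "This map is very helpful once you have a clear grasp of your data, as you can \"walk\" along its paths to a decision:\n",
    "\n",
    "-   We have more than 50 samples\n",
    "\n",
    "-   We want to predict a category\n",
    "\n",
    "-   We have labeled data\n",
    "\n",
    "-   We have fewer than 100K samples\n",
    "\n",
    "-   ✨ We can choose a Linear SVC\n",
    "\n",
    "-   If that doesn't work, since we have numeric data\n",
    "\n",
    "    -   We can try a ✨ KNeighbors Classifier\n",
    "\n",
    "        -   If that doesn't work, try ✨ SVC and ✨ Ensemble Classifiers\n",
    "\n",
    "This is a very helpful trail to follow. Now, let's get right into it using the [tidymodels](https://www.tidymodels.org/) modelling framework: a consistent and flexible collection of R packages developed to encourage good statistical practice 😊.\n",
    "\n",
    "## 2. Split the data and deal with an imbalanced data set\n",
    "\n",
    "From our previous lessons, we learned that there is a set of common ingredients shared across our cuisines. In addition, the number of observations per cuisine is quite unevenly distributed.\n",
    "\n",
    "We'll deal with these by:\n",
    "\n",
    "-   Dropping the most common ingredients that create confusion between distinct cuisines, using `dplyr::select()`.\n",
    "\n",
    "-   Using a `recipe` that preprocesses the data to get it ready for modelling by applying an `over-sampling` algorithm.\n",
    "\n",
    "We already looked at both in the previous lesson, so this should be a breeze 🥳!\n"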
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "id": "6tj_rN00hClA"
   },
   "source": [
    "# Load the core Tidyverse and Tidymodels packages\n",
    "library(tidyverse)\n",
    "library(tidymodels)\n",
    "\n",
    "# Load the original cuisines data\n",
    "df <- read_csv(file = \"https://raw.githubusercontent.com/microsoft/ML-For-Beginners/main/4-Classification/data/cuisines.csv\")\n",
    "\n",
    "# Drop id column, rice, garlic and ginger from our original data set\n",
    "df_select <- df %>% \n",
    "  select(-c(1, rice, garlic, ginger)) %>%\n",
    "  # Encode cuisine column as categorical\n",
    "  mutate(cuisine = factor(cuisine))\n",
    "\n",
    "\n",
    "# Create data split specification\n",
    "set.seed(2056)\n",
    "cuisines_split <- initial_split(data = df_select,\n",
    "                                strata = cuisine,\n",
    "                                prop = 0.7)\n",
    "\n",
    "# Extract the data in each split\n",
    "cuisines_train <- training(cuisines_split)\n",
    "cuisines_test <- testing(cuisines_split)\n",
    "\n",
    "# Display distribution of cuisines in the training set\n",
    "cuisines_train %>% \n",
    "  count(cuisine) %>% \n",
    "  arrange(desc(n))"
   ],
   "execution_count": null,
   "outputs": []
  },
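  {
   "cell_type": "markdown",
   "metadata": {
    "id": "zFin5yw3hHb1"
   },
   "source": [
    "### Deal with imbalanced data\n",
    "\n",
    "Imbalanced data often has a negative effect on model performance. Many models perform best when the number of observations in each class is roughly equal, and thus tend to struggle with imbalanced data.\n",
    "\n",
    "There are two main ways of dealing with imbalanced data sets:\n",
    "\n",
    "-   adding observations to the minority class: `over-sampling`, e.g. using the SMOTE algorithm, which synthesizes new examples of the minority class from the nearest neighbors of existing cases.\n",
    "\n",
    "-   removing observations from the majority class: `under-sampling`\n",
    "\n",
    "In our previous lesson, we demonstrated how to deal with imbalanced data sets using a `recipe`. A recipe can be thought of as a blueprint that describes which steps should be applied to a data set to get it ready for analysis. In our case, we want an equal distribution of cuisines in our `training set`. Let's get right into it.\n"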
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "id": "cRzTnHolhLWd"
   },
   "source": [
    "# Load themis package for dealing with imbalanced data\n",
    "library(themis)\n",
    "\n",
    "# Create a recipe for preprocessing training data\n",
    "cuisines_recipe <- recipe(cuisine ~ ., data = cuisines_train) %>%\n",
    "  step_smote(cuisine) \n",
    "\n",
    "# Print recipe\n",
    "cuisines_recipe"
   ],
   "execution_count": null,
   "outputs": []
  },
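  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "Printing the recipe only lists its steps; it does not show their effect. As a minimal sketch, assuming the `cuisines_recipe` defined above, `prep()` estimates the recipe on the training data and `bake(new_data = NULL)` returns the processed training set. Since `step_smote()` is only applied to the training data by default, this is where its effect is visible: each cuisine should end up with the same number of rows.\n"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "# Sketch: inspect the class counts after over-sampling\n",
    "cuisines_recipe %>% \n",
    "  # Estimate the recipe steps on the training data\n",
    "  prep() %>% \n",
    "  # Return the processed training data, including the synthetic rows\n",
    "  bake(new_data = NULL) %>% \n",
    "  # Count observations per cuisine\n",
    "  count(cuisine)"
   ],
   "execution_count": null,
   "outputs": []
  },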
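  {
   "cell_type": "markdown",
   "metadata": {
    "id": "KxOQ2ORhhO81"
   },
   "source": [
    "Now we are ready to train models 👩‍💻👨‍💻!\n",
    "\n",
    "## 3. Beyond multinomial regression models\n",
    "\n",
    "In our previous lesson, we looked at multinomial regression models. Let's explore some more flexible models for classification.\n",
    "\n",
    "### Support Vector Machines\n",
    "\n",
    "In the context of classification, `Support Vector Machines` is a machine learning technique that tries to find a *hyperplane* that \"best\" separates the classes. Let's look at a simple example:\n",
    "\n",
    "<p >\n",
    "   <img src=\"../../images/svm.png\"\n",
    "   width=\"300\"/>\n",
    "   <figcaption>https://commons.wikimedia.org/w/index.php?curid=22877598</figcaption>\n",
    "</p>\n"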
   ]
  },
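  {
   "cell_type": "markdown",
   "metadata": {
    "id": "C4Wsd0vZhXYu"
   },
   "source": [
    "H<sub>1</sub> does not separate the classes. H<sub>2</sub> does, but only with a small margin. H<sub>3</sub> separates them with the maximal margin.\n",
    "\n",
    "#### Linear Support Vector Classifier\n",
    "\n",
    "Support vector classification (SVC) is a member of the support vector machines (SVM) family of ML techniques. In SVC, the hyperplane is chosen to correctly separate `most` of the training observations, but it `may misclassify` a few of them. By allowing some points to be on the wrong side, the SVM becomes more robust to outliers and hence generalizes better to new data. The parameter that regulates how much of this violation is tolerated is called `cost`, which has a default value of 1 (see `help(\"svm_poly\")`).\n",
    "\n",
    "Let's create a linear SVC by setting `degree = 1` in a polynomial SVM model.\n"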
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "id": "vJpp6nuChlBz"
   },
   "source": [
    "# Make a linear SVC specification\n",
    "svc_linear_spec <- svm_poly(degree = 1) %>% \n",
    "  set_engine(\"kernlab\") %>% \n",
    "  set_mode(\"classification\")\n",
    "\n",
    "# Bundle specification and recipe into a workflow\n",
    "svc_linear_wf <- workflow() %>% \n",
    "  add_recipe(cuisines_recipe) %>% \n",
    "  add_model(svc_linear_spec)\n",
    "\n",
    "# Print out workflow\n",
    "svc_linear_wf"
   ],
   "execution_count": null,
   "outputs": []
  },
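  {
   "cell_type": "markdown",
   "metadata": {
    "id": "rDs8cWNkhoqu"
   },
   "source": [
    "Now that we have captured the preprocessing steps and model specification in a *workflow*, we can go ahead and train the linear SVC and evaluate the results while we are at it. For performance metrics, let's create a metric set that evaluates `accuracy`, `sensitivity`, `positive predictive value` and `F measure`.\n",
    "\n",
    "> `augment()` adds columns of predictions to the given data.\n"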
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "id": "81wiqcwuhrnq"
   },
   "source": [
    "# Train a linear SVC model\n",
    "svc_linear_fit <- svc_linear_wf %>% \n",
    "  fit(data = cuisines_train)\n",
    "\n",
    "# Create a metric set\n",
    "eval_metrics <- metric_set(ppv, sens, accuracy, f_meas)\n",
    "\n",
    "\n",
    "# Make predictions and Evaluate model performance\n",
    "svc_linear_fit %>% \n",
    "  augment(new_data = cuisines_test) %>% \n",
    "  eval_metrics(truth = cuisine, estimate = .pred_class)"
   ],
   "execution_count": null,
   "outputs": []
  },
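  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The metric set condenses performance into a single number per metric, which hides which cuisines are being confused with one another. As a minimal sketch, assuming the `svc_linear_fit` object fitted above, yardstick's `conf_mat()` cross-tabulates the predicted and the true cuisines on the test set:\n"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "# Sketch: confusion matrix of predicted vs. true cuisines on the test set\n",
    "svc_linear_fit %>% \n",
    "  augment(new_data = cuisines_test) %>% \n",
    "  conf_mat(truth = cuisine, estimate = .pred_class)"
   ],
   "execution_count": null,
   "outputs": []
  },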
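  {
   "cell_type": "markdown",
   "metadata": {
    "id": "0UFQvHf-huo3"
   },
   "source": [
    "#### Support Vector Machine\n",
    "\n",
    "The support vector machine (SVM) is an extension of the support vector classifier that accommodates a non-linear boundary between the classes. In essence, SVMs use the *kernel trick* to enlarge the feature space so that it can capture non-linear relationships between classes. One popular and extremely flexible kernel function used by SVMs is the *radial basis function*. Let's see how it performs on our data.\n"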
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "id": "-KX4S8mzhzmp"
   },
   "source": [
    "set.seed(2056)\n",
    "\n",
    "# Make an RBF SVM specification\n",
    "svm_rbf_spec <- svm_rbf() %>% \n",
    "  set_engine(\"kernlab\") %>% \n",
    "  set_mode(\"classification\")\n",
    "\n",
    "# Bundle specification and recipe into a workflow\n",
    "svm_rbf_wf <- workflow() %>% \n",
    "  add_recipe(cuisines_recipe) %>% \n",
    "  add_model(svm_rbf_spec)\n",
    "\n",
    "\n",
    "# Train an RBF model\n",
    "svm_rbf_fit <- svm_rbf_wf %>% \n",
    "  fit(data = cuisines_train)\n",
    "\n",
    "\n",
    "# Make predictions and Evaluate model performance\n",
    "svm_rbf_fit %>% \n",
    "  augment(new_data = cuisines_test) %>% \n",
    "  eval_metrics(truth = cuisine, estimate = .pred_class)"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "QBFSa7WSh4HQ"
   },
   "source": [
    "Awesome 🤩!\n",
    "\n",
    "> ✅ Please see:\n",
    ">\n",
    "> -   [*Support Vector Machines*](https://bradleyboehmke.github.io/HOML/svm.html), Hands-on Machine Learning with R\n",
    ">\n",
    "> -   [*Support Vector Machines*](https://www.statlearning.com/), An Introduction to Statistical Learning with Applications in R\n",
    ">\n",
    "> for further reading.\n",
    "\n",
    "### Nearest Neighbor classifiers\n",
    "\n",
    "*K*-nearest neighbor (KNN) is an algorithm in which each observation is predicted based on its *similarity* to other observations.\n",
    "\n",
    "Let's fit one to our data.\n"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "id": "k4BxxBcdh9Ka"
   },
   "source": [
    "# Make a KNN specification\n",
    "knn_spec <- nearest_neighbor() %>% \n",
    "  set_engine(\"kknn\") %>% \n",
    "  set_mode(\"classification\")\n",
    "\n",
    "# Bundle recipe and model specification into a workflow\n",
    "knn_wf <- workflow() %>% \n",
    "  add_recipe(cuisines_recipe) %>% \n",
    "  add_model(knn_spec)\n",
    "\n",
    "# Train a KNN model\n",
    "knn_wf_fit <- knn_wf %>% \n",
    "  fit(data = cuisines_train)\n",
    "\n",
    "\n",
    "# Make predictions and Evaluate model performance\n",
    "knn_wf_fit %>% \n",
    "  augment(new_data = cuisines_test) %>% \n",
    "  eval_metrics(truth = cuisine, estimate = .pred_class)"
   ],
   "execution_count": null,
   "outputs": []
  },
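  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The specification above calls `nearest_neighbor()` with no arguments, so the model runs on default settings. As a minimal sketch of how different settings could be tried, the cell below swaps a second specification into the same workflow with `update_model()` and evaluates it the same way. The names `knn_alt_spec` and `knn_alt_fit` and the values `neighbors = 15` and `weight_func = \"rectangular\"` are hypothetical choices made only to show the syntax; whether they help on this data is not something this sketch establishes.\n"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "# Sketch: a KNN specification with explicit (hypothetical) argument values\n",
    "knn_alt_spec <- nearest_neighbor(neighbors = 15, weight_func = \"rectangular\") %>% \n",
    "  set_engine(\"kknn\") %>% \n",
    "  set_mode(\"classification\")\n",
    "\n",
    "# Reuse the existing workflow, replacing only the model specification\n",
    "knn_alt_fit <- knn_wf %>% \n",
    "  update_model(knn_alt_spec) %>% \n",
    "  fit(data = cuisines_train)\n",
    "\n",
    "# Evaluate on the test set with the same metric set\n",
    "knn_alt_fit %>% \n",
    "  augment(new_data = cuisines_test) %>% \n",
    "  eval_metrics(truth = cuisine, estimate = .pred_class)"
   ],
   "execution_count": null,
   "outputs": []
  },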
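  {
   "cell_type": "markdown",
   "metadata": {
    "id": "HaegQseriAcj"
   },
   "source": [
    "It appears that the KNN model with default arguments is not performing that well. Changing the model's arguments (see `help(\"nearest_neighbor\")`) may improve its performance. Be sure to try it out.\n",
    "\n",
    "> ✅ Please see:\n",
    ">\n",
    "> -   [Hands-on Machine Learning with R](https://bradleyboehmke.github.io/HOML/)\n",
    ">\n",
    "> -   [An Introduction to Statistical Learning with Applications in R](https://www.statlearning.com/)\n",
    ">\n",
    "> to learn more about *K*-nearest neighbor classifiers.\n",
    "\n",
    "### Ensemble classifiers\n",
    "\n",
    "Ensemble algorithms work by combining multiple base estimators to produce an improved model, either by:\n",
    "\n",
    "`bagging`: applying an *averaging function* to a collection of base models\n",
    "\n",
    "`boosting`: building a sequence of models that build on one another to improve predictive performance.\n",
    "\n",
    "Let's start by trying out a random forest model, which builds a large collection of decision trees and then applies an averaging function to them to obtain a better overall model.\n"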
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "id": "49DPoVs6iK1M"
   },
   "source": [
    "# Make a random forest specification\n",
    "rf_spec <- rand_forest() %>% \n",
    "  set_engine(\"ranger\") %>% \n",
    "  set_mode(\"classification\")\n",
    "\n",
    "# Bundle recipe and model specification into a workflow\n",
    "rf_wf <- workflow() %>% \n",
    "  add_recipe(cuisines_recipe) %>% \n",
    "  add_model(rf_spec)\n",
    "\n",
    "# Train a random forest model\n",
    "rf_wf_fit <- rf_wf %>% \n",
    "  fit(data = cuisines_train)\n",
    "\n",
    "\n",
    "# Make predictions and Evaluate model performance\n",
    "rf_wf_fit %>% \n",
    "  augment(new_data = cuisines_test) %>% \n",
    "  eval_metrics(truth = cuisine, estimate = .pred_class)"
   ],
   "execution_count": null,
   "outputs": []
  },
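  {
   "cell_type": "markdown",
   "metadata": {
    "id": "RGVYwC_aiUWc"
   },
   "source": [
    "Good job 👏!\n",
    "\n",
    "Let's also experiment with a boosted tree model.\n",
    "\n",
    "A boosted tree is an ensemble method that builds a sequence of decision trees, where each tree depends on the results of the previous ones, in an attempt to reduce the error incrementally. It focuses on the weights of incorrectly classified items and adjusts the fit of the next classifier to correct them.\n",
    "\n",
    "There are different ways to fit this model (see `help(\"boost_tree\")`). In this example, we'll fit boosted trees via the `xgboost` engine.\n"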
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "id": "Py1YWo-micWs"
   },
   "source": [
    "# Make a boosted tree specification\n",
    "boost_spec <- boost_tree(trees = 200) %>% \n",
    "  set_engine(\"xgboost\") %>% \n",
    "  set_mode(\"classification\")\n",
    "\n",
    "# Bundle recipe and model specification into a workflow\n",
    "boost_wf <- workflow() %>% \n",
    "  add_recipe(cuisines_recipe) %>% \n",
    "  add_model(boost_spec)\n",
    "\n",
    "# Train a boosted tree model\n",
    "boost_wf_fit <- boost_wf %>% \n",
    "  fit(data = cuisines_train)\n",
    "\n",
    "\n",
    "# Make predictions and Evaluate model performance\n",
    "boost_wf_fit %>% \n",
    "  augment(new_data = cuisines_test) %>% \n",
    "  eval_metrics(truth = cuisine, estimate = .pred_class)"
   ],
   "execution_count": null,
   "outputs": []
  },
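  {
   "cell_type": "markdown",
   "metadata": {
    "id": "zNQnbuejigZM"
   },
   "source": [
    "> ✅ Please see:\n",
    ">\n",
    "> -   [Machine Learning for Social Scientists](https://cimentadaj.github.io/ml_socsci/tree-based-methods.html#random-forests)\n",
    ">\n",
    "> -   [Hands-on Machine Learning with R](https://bradleyboehmke.github.io/HOML/)\n",
    ">\n",
    "> -   [An Introduction to Statistical Learning with Applications in R](https://www.statlearning.com/)\n",
    ">\n",
    "> -   <https://algotech.netlify.app/blog/xgboost/> - Explores the AdaBoost model, which is a good alternative to xgboost.\n",
    ">\n",
    "> to learn more about ensemble classifiers.\n",
    "\n",
    "## 4. Extra - comparing multiple models\n",
    "\n",
    "We have fitted quite a number of models in this lab 🙌. Creating many workflows from different preprocessors and/or model specifications and then calculating the performance metrics one by one can become tedious.\n",
    "\n",
    "Let's see if we can address this by creating a function that fits a list of workflows on the training set and then returns the performance metrics based on the test set. We'll use `map()` and `map_dfr()` from the [purrr](https://purrr.tidyverse.org/) package to apply functions to each element in a list.\n",
    "\n",
    "> [`map()`](https://purrr.tidyverse.org/reference/map.html) functions allow you to replace many for loops with code that is both more succinct and easier to read. The best place to learn about the [`map()`](https://purrr.tidyverse.org/reference/map.html) functions is the [iteration chapter](http://r4ds.had.co.nz/iteration.html) in R for Data Science.\n"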
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "id": "Qzb7LyZnimd2"
   },
   "source": [
    "set.seed(2056)\n",
    "\n",
    "# Create a metric set\n",
    "eval_metrics <- metric_set(ppv, sens, accuracy, f_meas)\n",
    "\n",
    "# Define a function that returns performance metrics\n",
    "compare_models <- function(workflow_list, train_set, test_set){\n",
    "  \n",
    "  suppressWarnings(\n",
    "    # Fit each model to the train_set\n",
    "    map(workflow_list, fit, data = train_set) %>% \n",
    "    # Make predictions on the test set\n",
    "      map_dfr(augment, new_data = test_set, .id = \"model\") %>%\n",
    "    # Select desired columns\n",
    "      select(model, cuisine, .pred_class) %>% \n",
    "    # Evaluate model performance\n",
    "      group_by(model) %>% \n",
    "      eval_metrics(truth = cuisine, estimate = .pred_class) %>% \n",
    "      ungroup()\n",
    "  )\n",
    "  \n",
    "} # End of function"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "code",
   "metadata": {
    "id": "3i4VJOi2iu-a"
   },
   "source": [
    "# Make a list of workflows\n",
    "workflow_list <- list(\n",
    "  \"svc\" = svc_linear_wf,\n",
    "  \"svm\" = svm_rbf_wf,\n",
    "  \"knn\" = knn_wf,\n",
    "  \"random_forest\" = rf_wf,\n",
    "  \"xgboost\" = boost_wf)\n",
    "\n",
    "# Call the function\n",
    "set.seed(2056)\n",
    "perf_metrics <- compare_models(workflow_list = workflow_list, train_set = cuisines_train, test_set = cuisines_test)\n",
    "\n",
    "# Print out performance metrics\n",
    "perf_metrics %>% \n",
    "  group_by(.metric) %>% \n",
    "  arrange(desc(.estimate)) %>% \n",
    "  slice_head(n=7)\n",
    "\n",
    "# Compare accuracy\n",
    "perf_metrics %>% \n",
    "  filter(.metric == \"accuracy\") %>% \n",
    "  arrange(desc(.estimate))\n"
   ],
   "execution_count": null,
   "outputs": []
  },
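  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "A long table of metrics can be hard to scan. As an optional sketch, assuming the `perf_metrics` tibble created above with its `model`, `.metric` and `.estimate` columns, the same numbers can be drawn with ggplot2 (loaded with the tidyverse), using one panel per metric:\n"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {},
   "source": [
    "# Sketch: bar chart of each metric, one panel per metric\n",
    "perf_metrics %>% \n",
    "  ggplot(mapping = aes(x = .estimate, y = model)) +\n",
    "  geom_col() +\n",
    "  facet_wrap(vars(.metric)) +\n",
    "  labs(x = \"estimate\", y = \"model\")"
   ],
   "execution_count": null,
   "outputs": []
  },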
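  {
   "cell_type": "markdown",
   "metadata": {
    "id": "KuWK_lEli4nW"
   },
   "source": [
    "The [**workflowsets**](https://workflowsets.tidymodels.org/) package allows users to create and easily fit a large number of models, but it is mostly designed to work with resampling techniques such as `cross-validation`, an approach we have yet to cover.\n",
    "\n",
    "## **🚀Challenge**\n",
    "\n",
    "Each of these techniques has a large number of parameters that you can tweak, for instance `cost` in SVMs, `neighbors` in KNN, and `mtry` (the number of randomly selected predictors) in random forest.\n",
    "\n",
    "Research each model's default parameters and think about what tweaking these parameters would mean for the model's quality.\n",
    "\n",
    "To find out more about a particular model and its parameters, use `help(\"model\")`, e.g. `help(\"rand_forest\")`.\n",
    "\n",
    "> In practice, we usually *estimate* the *best values* for these parameters by training many models on a `simulated data set` and measuring how well all these models perform. This process is called **tuning**.\n",
    "\n",
    "### [**Post-lecture quiz**](https://gray-sand-07a10f403.1.azurestaticapps.net/quiz/24/)\n",
    "\n",
    "### **Review & Self Study**\n",
    "\n",
    "There's a lot of jargon in these lessons, so take a minute to review [this list](https://docs.microsoft.com/dotnet/machine-learning/resources/glossary?WT.mc_id=academic-77952-leestott) of useful terminology!\n",
    "\n",
    "#### THANK YOU TO:\n",
    "\n",
    "[`Allison Horst`](https://twitter.com/allison_horst/) for creating the amazing illustrations that make R more welcoming and engaging. Find more illustrations in her [gallery](https://github.com/allisonhorst/stats-illustrations).\n",
    "\n",
    "[Cassie Breviu](https://www.twitter.com/cassieview) and [Jen Looper](https://www.twitter.com/jenlooper) for creating the original Python version of this module ♥️\n",
    "\n",
    "Happy learning,\n",
    "\n",
    "[Eric](https://twitter.com/ericntay), Gold Microsoft Learn Student Ambassador.\n",
    "\n",
    "<p >\n",
    "   <img src=\"../../images/r_learners_sm.jpeg\"\n",
    "   width=\"569\"/>\n",
    "   <figcaption>Artwork by @allison_horst</figcaption>\n",
    "</p>\n"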
   ]
  },
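  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n---\n\n**Disclaimer**:  \nThis document was translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The document in its original language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.\n"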
   ]
  }
 ]
}