{
"nbformat": 4,
"nbformat_minor": 0,
"metadata": {
"colab": {
"name": "lesson_12-R.ipynb",
"provenance": [],
"collapsed_sections": []
},
"kernelspec": {
"name": "ir",
"display_name": "R"
},
"language_info": {
"name": "R"
},
"coopTranslator": {
"original_hash": "fab50046ca413a38939d579f8432274f",
"translation_date": "2025-09-03T20:33:25+00:00",
"source_file": "4-Classification/3-Classifiers-2/solution/R/lesson_12-R.ipynb",
"language_code": "tw"
}
},
"cells": [
{
"cell_type": "markdown",
"metadata": {
"id": "jsFutf_ygqSx"
},
"source": [
"# 建立分類模型:美味的亞洲和印度料理\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "HD54bEefgtNO"
},
"source": [
"## 美食分類器 2\n",
"\n",
"在這第二部分的分類課程中,我們將探索`更多方法`來分類類別型數據。我們還將了解選擇不同分類器所帶來的影響。\n",
"\n",
"### [**課前測驗**](https://gray-sand-07a10f403.1.azurestaticapps.net/quiz/23/)\n",
"\n",
"### **前置條件**\n",
"\n",
"我們假設您已完成之前的課程,因為我們將延續之前學到的一些概念。\n",
"\n",
"在本課程中,我們需要以下套件:\n",
"\n",
"- `tidyverse` [tidyverse](https://www.tidyverse.org/) 是一個[由 R 套件組成的集合](https://www.tidyverse.org/packages),旨在讓數據科學更快速、更簡單、更有趣!\n",
"\n",
"- `tidymodels` [tidymodels](https://www.tidymodels.org/) 框架是一個[套件集合](https://www.tidymodels.org/packages/),用於建模和機器學習。\n",
"\n",
"- `themis` [themis 套件](https://themis.tidymodels.org/) 提供額外的配方步驟,用於處理不平衡數據。\n",
"\n",
"您可以使用以下指令安裝它們:\n",
"\n",
"`install.packages(c(\"tidyverse\", \"tidymodels\", \"kernlab\", \"themis\", \"ranger\", \"xgboost\", \"kknn\"))`\n",
"\n",
"或者,以下腳本會檢查您是否已安裝完成此模組所需的套件,並在缺少時為您安裝。\n"
]
},
{
"cell_type": "code",
"metadata": {
"id": "vZ57IuUxgyQt"
},
"source": [
"suppressWarnings(if (!require(\"pacman\"))install.packages(\"pacman\"))\n",
"\n",
"pacman::p_load(tidyverse, tidymodels, themis, kernlab, ranger, xgboost, kknn)"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "z22M-pj4g07x"
},
"source": [
"## **1. 分類地圖**\n",
"\n",
"在我們的[上一課](https://github.com/microsoft/ML-For-Beginners/tree/main/4-Classification/2-Classifiers-1)中,我們試圖解答一個問題:如何在多個模型之間進行選擇?在很大程度上,這取決於數據的特性以及我們想要解決的問題類型(例如分類或回歸)。\n",
"\n",
"之前,我們學習了使用 Microsoft 的速查表進行數據分類的各種選項。Python 的機器學習框架 Scikit-learn 提供了一個類似但更細緻的速查表,可以進一步幫助縮小您的估算器(另一個分類器的術語)範圍:\n",
"\n",
"<p >\n",
" <img src=\"../../images/map.png\"\n",
" width=\"700\"/>\n",
" <figcaption></figcaption>\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "u1i3xRIVg7vG"
},
"source": [
"> 提示:[在線查看此地圖](https://scikit-learn.org/stable/tutorial/machine_learning_map/),並沿著路徑點擊以閱讀相關文檔。\n",
">\n",
"> [Tidymodels 參考網站](https://www.tidymodels.org/find/parsnip/#models)也提供了關於不同模型類型的出色文檔。\n",
"\n",
"### **計劃** 🗺️\n",
"\n",
"當你對數據有清晰的理解時,這張地圖非常有幫助,因為你可以沿著它的路徑“走”到一個決策:\n",
"\n",
"- 我們有超過 50 個樣本\n",
"\n",
"- 我們想要預測一個類別\n",
"\n",
"- 我們有標籤數據\n",
"\n",
"- 我們的樣本少於 100K\n",
"\n",
"- ✨ 我們可以選擇 Linear SVC\n",
"\n",
"- 如果這不起作用,因為我們有數值數據\n",
"\n",
" - 我們可以嘗試 ✨ KNeighbors Classifier\n",
"\n",
" - 如果這不起作用,嘗試 ✨ SVC 和 ✨ Ensemble Classifiers\n",
"\n",
"這是一條非常有幫助的路徑。現在,讓我們使用 [tidymodels](https://www.tidymodels.org/) 建模框架直接開始吧:這是一套一致且靈活的 R 套件集合,旨在鼓勵良好的統計實踐 😊。\n",
"\n",
"## 2. 分割數據並處理不平衡數據集\n",
"\n",
"在之前的課程中,我們了解到在不同的菜系中有一組常見的成分。此外,菜系的數量分佈也非常不均。\n",
"\n",
"我們將通過以下方式處理這些問題:\n",
"\n",
"- 使用 `dplyr::select()` 刪除那些在不同菜系之間造成混淆的最常見成分。\n",
"\n",
"- 使用一個 `recipe` 預處理數據,通過應用 `over-sampling` 算法使其準備好進行建模。\n",
"\n",
"我們在之前的課程中已經看過上述內容,所以這應該會非常輕鬆 🥳!\n"
]
},
{
"cell_type": "code",
"metadata": {
"id": "6tj_rN00hClA"
},
"source": [
"# Load the core Tidyverse and Tidymodels packages\n",
"library(tidyverse)\n",
"library(tidymodels)\n",
"\n",
"# Load the original cuisines data\n",
"df <- read_csv(file = \"https://raw.githubusercontent.com/microsoft/ML-For-Beginners/main/4-Classification/data/cuisines.csv\")\n",
"\n",
"# Drop id column, rice, garlic and ginger from our original data set\n",
"df_select <- df %>% \n",
" select(-c(1, rice, garlic, ginger)) %>%\n",
" # Encode cuisine column as categorical\n",
" mutate(cuisine = factor(cuisine))\n",
"\n",
"\n",
"# Create data split specification\n",
"set.seed(2056)\n",
"cuisines_split <- initial_split(data = df_select,\n",
" strata = cuisine,\n",
" prop = 0.7)\n",
"\n",
"# Extract the data in each split\n",
"cuisines_train <- training(cuisines_split)\n",
"cuisines_test <- testing(cuisines_split)\n",
"\n",
"# Display distribution of cuisines in the training set\n",
"cuisines_train %>% \n",
" count(cuisine) %>% \n",
" arrange(desc(n))"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "zFin5yw3hHb1"
},
"source": [
"### 處理不平衡數據\n",
"\n",
"不平衡數據通常會對模型性能產生負面影響。許多模型在觀測數量相等時表現最佳,因此在面對不平衡數據時往往會遇到困難。\n",
"\n",
"處理不平衡數據集主要有兩種方法:\n",
"\n",
"- 增加少數類別的觀測數量:`過採樣`,例如使用 SMOTE 演算法,該演算法通過少數類別案例的最近鄰居合成生成新的少數類別樣本。\n",
"\n",
"- 移除多數類別的觀測數量:`欠採樣`\n",
"\n",
"在之前的課程中,我們展示了如何使用 `recipe` 處理不平衡數據集。`recipe` 可以被視為一種藍圖,描述了應該對數據集應用哪些步驟以使其準備好進行數據分析。在我們的案例中,我們希望在 `訓練集` 中的菜系數量分佈是均等的。現在就讓我們開始吧!\n"
]
},
{
"cell_type": "code",
"metadata": {
"id": "cRzTnHolhLWd"
},
"source": [
"# Load themis package for dealing with imbalanced data\n",
"library(themis)\n",
"\n",
"# Create a recipe for preprocessing training data\n",
"cuisines_recipe <- recipe(cuisine ~ ., data = cuisines_train) %>%\n",
" step_smote(cuisine) \n",
"\n",
"# Print recipe\n",
"cuisines_recipe"
],
"execution_count": null,
"outputs": []
},
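{
"cell_type": "markdown",
"metadata": {},
"source": [
"One optional way to sanity-check the recipe is to `prep()` and `bake()` it. The short sketch below estimates the SMOTE step on the training data and counts the cuisines in the processed set; if the over-sampling worked, the classes should now be equally represented.\n"
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"# Optional sketch: estimate the recipe on the training data and return the\n",
"# processed (SMOTE-balanced) training set, then confirm the classes are now equal\n",
"cuisines_recipe %>% \n",
"  prep() %>% \n",
"  bake(new_data = NULL) %>% \n",
"  count(cuisine)"
],
"execution_count": null,
"outputs": []
},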
{
"cell_type": "markdown",
"metadata": {
"id": "KxOQ2ORhhO81"
},
"source": [
"現在我們準備好訓練模型了 👩‍💻👨‍💻!\n",
"\n",
"## 3. 超越多項式迴歸模型\n",
"\n",
"在上一節課中,我們探討了多項式迴歸模型。現在讓我們來看看一些更靈活的分類模型。\n",
"\n",
"### 支援向量機\n",
"\n",
"在分類的背景下,`支援向量機`是一種機器學習技術,試圖找到一個*超平面*來「最佳」地分隔不同的類別。我們來看一個簡單的例子:\n",
"\n",
"<p >\n",
" <img src=\"../../images/svm.png\"\n",
" width=\"300\"/>\n",
" <figcaption>https://commons.wikimedia.org/w/index.php?curid=22877598</figcaption>\n"
]
},
{
"cell_type": "markdown",
"metadata": {
"id": "C4Wsd0vZhXYu"
},
"source": [
"H1~ 不會分隔類別。H2~ 會分隔但僅有小的間距。H3~ 則以最大的間距分隔類別。\n",
"\n",
"#### 線性支持向量分類器\n",
"\n",
"支持向量聚類SVC是支持向量機SVM家族中的一種機器學習技術。在 SVC 中,超平面被選擇用來正確分隔`大部分`的訓練觀測值,但`可能會錯誤分類`一些觀測值。通過允許某些點位於錯誤的一側SVM 對異常值的抵抗力更強,因此對新數據的泛化能力更好。調節這種違規的參數被稱為`cost`,其默認值為 1請參閱 `help(\"svm_poly\")`)。\n",
"\n",
"讓我們通過在多項式 SVM 模型中設置 `degree = 1` 來創建一個線性 SVC。\n"
]
},
{
"cell_type": "code",
"metadata": {
"id": "vJpp6nuChlBz"
},
"source": [
"# Make a linear SVC specification\n",
"svc_linear_spec <- svm_poly(degree = 1) %>% \n",
" set_engine(\"kernlab\") %>% \n",
" set_mode(\"classification\")\n",
"\n",
"# Bundle specification and recipe into a worklow\n",
"svc_linear_wf <- workflow() %>% \n",
" add_recipe(cuisines_recipe) %>% \n",
" add_model(svc_linear_spec)\n",
"\n",
"# Print out workflow\n",
"svc_linear_wf"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "rDs8cWNkhoqu"
},
"source": [
"現在我們已經將預處理步驟和模型規範整合到*工作流程*中,接下來可以進行線性 SVC 的訓練,並在此過程中評估結果。關於性能指標,我們將建立一組指標來評估:`準確率`、`敏感度`、`正確預測值`以及`F 值`。\n",
"\n",
"> `augment()` 會將預測結果作為欄位新增到給定的數據中。\n"
]
},
{
"cell_type": "code",
"metadata": {
"id": "81wiqcwuhrnq"
},
"source": [
"# Train a linear SVC model\n",
"svc_linear_fit <- svc_linear_wf %>% \n",
" fit(data = cuisines_train)\n",
"\n",
"# Create a metric set\n",
"eval_metrics <- metric_set(ppv, sens, accuracy, f_meas)\n",
"\n",
"\n",
"# Make predictions and Evaluate model performance\n",
"svc_linear_fit %>% \n",
" augment(new_data = cuisines_test) %>% \n",
" eval_metrics(truth = cuisine, estimate = .pred_class)"
],
"execution_count": null,
"outputs": []
},
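{
"cell_type": "markdown",
"metadata": {},
"source": [
"The `cost` argument mentioned above defaults to 1. As a small, hedged sketch (the value 2 and the object name `svc_linear_cost2_wf` are arbitrary illustrative choices, not tuned recommendations), you could swap a higher-cost specification into the same workflow and compare the metrics:\n"
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"# Optional sketch: refit the linear SVC with a larger cost (i.e. a smaller\n",
"# tolerance for misclassified training points) and re-evaluate on the test set\n",
"svc_linear_cost2_wf <- svc_linear_wf %>% \n",
"  update_model(svm_poly(degree = 1, cost = 2) %>% \n",
"                 set_engine(\"kernlab\") %>% \n",
"                 set_mode(\"classification\"))\n",
"\n",
"svc_linear_cost2_wf %>% \n",
"  fit(data = cuisines_train) %>% \n",
"  augment(new_data = cuisines_test) %>% \n",
"  eval_metrics(truth = cuisine, estimate = .pred_class)"
],
"execution_count": null,
"outputs": []
},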
{
"cell_type": "markdown",
"metadata": {
"id": "0UFQvHf-huo3"
},
"source": [
"#### 支援向量機\n",
"\n",
"支援向量機SVM是支援向量分類器的延伸旨在適應類別之間的非線性邊界。本質上SVM 使用*核技巧*來擴展特徵空間以適應類別之間的非線性關係。SVM 使用的一種流行且非常靈活的核函數是*徑向基函數*。讓我們看看它在我們的數據上會有怎樣的表現。\n"
]
},
{
"cell_type": "code",
"metadata": {
"id": "-KX4S8mzhzmp"
},
"source": [
"set.seed(2056)\n",
"\n",
"# Make an RBF SVM specification\n",
"svm_rbf_spec <- svm_rbf() %>% \n",
" set_engine(\"kernlab\") %>% \n",
" set_mode(\"classification\")\n",
"\n",
"# Bundle specification and recipe into a worklow\n",
"svm_rbf_wf <- workflow() %>% \n",
" add_recipe(cuisines_recipe) %>% \n",
" add_model(svm_rbf_spec)\n",
"\n",
"\n",
"# Train an RBF model\n",
"svm_rbf_fit <- svm_rbf_wf %>% \n",
" fit(data = cuisines_train)\n",
"\n",
"\n",
"# Make predictions and Evaluate model performance\n",
"svm_rbf_fit %>% \n",
" augment(new_data = cuisines_test) %>% \n",
" eval_metrics(truth = cuisine, estimate = .pred_class)"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "QBFSa7WSh4HQ"
},
"source": [
"太棒了 🤩!\n",
"\n",
"> ✅ 請參閱:\n",
">\n",
"> - [*支持向量機*](https://bradleyboehmke.github.io/HOML/svm.html)《Hands-on Machine Learning with R》\n",
">\n",
"> - [*支持向量機*](https://www.statlearning.com/)《An Introduction to Statistical Learning with Applications in R》\n",
">\n",
"> 以獲取更多閱讀資料。\n",
"\n",
"### 最近鄰分類器\n",
"\n",
"*K*-最近鄰KNN是一種算法其中每個觀測值根據其與其他觀測值的*相似性*進行預測。\n",
"\n",
"讓我們將其應用到我們的數據中。\n"
]
},
{
"cell_type": "code",
"metadata": {
"id": "k4BxxBcdh9Ka"
},
"source": [
"# Make a KNN specification\n",
"knn_spec <- nearest_neighbor() %>% \n",
" set_engine(\"kknn\") %>% \n",
" set_mode(\"classification\")\n",
"\n",
"# Bundle recipe and model specification into a workflow\n",
"knn_wf <- workflow() %>% \n",
" add_recipe(cuisines_recipe) %>% \n",
" add_model(knn_spec)\n",
"\n",
"# Train a boosted tree model\n",
"knn_wf_fit <- knn_wf %>% \n",
" fit(data = cuisines_train)\n",
"\n",
"\n",
"# Make predictions and Evaluate model performance\n",
"knn_wf_fit %>% \n",
" augment(new_data = cuisines_test) %>% \n",
" eval_metrics(truth = cuisine, estimate = .pred_class)"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "HaegQseriAcj"
},
"source": [
"看起來這個模型的表現不是很好。可能透過更改模型的參數(請參考 `help(\"nearest_neighbor\")`)可以提升模型的表現。務必嘗試一下。\n",
"\n",
"> ✅ 請參考:\n",
">\n",
"> - [Hands-on Machine Learning with R](https://bradleyboehmke.github.io/HOML/)\n",
">\n",
"> - [An Introduction to Statistical Learning with Applications in R](https://www.statlearning.com/)\n",
">\n",
"> 來深入了解 *K*-最近鄰分類器。\n",
"\n",
"### 集成分類器\n",
"\n",
"集成算法透過結合多個基礎估算器來生成最佳模型,其方法包括:\n",
"\n",
"`bagging`:對一組基礎模型應用*平均函數*\n",
"\n",
"`boosting`:構建一系列模型,彼此之間相互改進以提升預測性能。\n",
"\n",
"我們先從嘗試隨機森林模型開始。隨機森林模型會構建大量的決策樹,然後應用平均函數以生成更好的整體模型。\n"
]
},
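{
"cell_type": "markdown",
"metadata": {},
"source": [
"Before we fit the random forest, here is the brief optional sketch promised above: the same KNN workflow with explicitly chosen arguments. The values (`neighbors = 10`, `weight_func = \"triangular\"`) are arbitrary starting points for experimentation, not tuned ones.\n"
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"# Optional sketch: swap in a KNN specification with explicit hyperparameters\n",
"# and re-evaluate (values chosen arbitrarily; see help(\"nearest_neighbor\"))\n",
"knn_tweaked_spec <- nearest_neighbor(neighbors = 10, weight_func = \"triangular\") %>% \n",
"  set_engine(\"kknn\") %>% \n",
"  set_mode(\"classification\")\n",
"\n",
"knn_wf %>% \n",
"  update_model(knn_tweaked_spec) %>% \n",
"  fit(data = cuisines_train) %>% \n",
"  augment(new_data = cuisines_test) %>% \n",
"  eval_metrics(truth = cuisine, estimate = .pred_class)"
],
"execution_count": null,
"outputs": []
},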
{
"cell_type": "code",
"metadata": {
"id": "49DPoVs6iK1M"
},
"source": [
"# Make a random forest specification\n",
"rf_spec <- rand_forest() %>% \n",
" set_engine(\"ranger\") %>% \n",
" set_mode(\"classification\")\n",
"\n",
"# Bundle recipe and model specification into a workflow\n",
"rf_wf <- workflow() %>% \n",
" add_recipe(cuisines_recipe) %>% \n",
" add_model(rf_spec)\n",
"\n",
"# Train a random forest model\n",
"rf_wf_fit <- rf_wf %>% \n",
" fit(data = cuisines_train)\n",
"\n",
"\n",
"# Make predictions and Evaluate model performance\n",
"rf_wf_fit %>% \n",
" augment(new_data = cuisines_test) %>% \n",
" eval_metrics(truth = cuisine, estimate = .pred_class)"
],
"execution_count": null,
"outputs": []
},
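{
"cell_type": "markdown",
"metadata": {},
"source": [
"The 🚀 challenge at the end of this lesson mentions arguments such as `mtry`. As an optional, hedged sketch (the values below are arbitrary illustrations, not tuned ones), the same random forest workflow can be refit with explicit values:\n"
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"# Optional sketch: refit the random forest with an explicit number of trees and\n",
"# number of randomly sampled predictors per split (see help(\"rand_forest\"))\n",
"rf_wf %>% \n",
"  update_model(rand_forest(mtry = 10, trees = 1000) %>% \n",
"                 set_engine(\"ranger\") %>% \n",
"                 set_mode(\"classification\")) %>% \n",
"  fit(data = cuisines_train) %>% \n",
"  augment(new_data = cuisines_test) %>% \n",
"  eval_metrics(truth = cuisine, estimate = .pred_class)"
],
"execution_count": null,
"outputs": []
},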
{
"cell_type": "markdown",
"metadata": {
"id": "RGVYwC_aiUWc"
},
"source": [
"做得好 👏!\n",
"\n",
"我們也來嘗試使用 Boosted Tree 模型。\n",
"\n",
"Boosted Tree 定義了一種集成方法,通過建立一系列連續的決策樹,每棵樹都依賴於前一棵樹的結果,試圖逐步減少錯誤。它專注於那些被錯誤分類項目的權重,並調整下一個分類器的擬合以進行修正。\n",
"\n",
"有多種方式可以擬合此模型(請參閱 `help(\"boost_tree\")`)。在這個例子中,我們將通過 `xgboost` 引擎來擬合 Boosted Tree。\n"
]
},
{
"cell_type": "code",
"metadata": {
"id": "Py1YWo-micWs"
},
"source": [
"# Make a boosted tree specification\n",
"boost_spec <- boost_tree(trees = 200) %>% \n",
" set_engine(\"xgboost\") %>% \n",
" set_mode(\"classification\")\n",
"\n",
"# Bundle recipe and model specification into a workflow\n",
"boost_wf <- workflow() %>% \n",
" add_recipe(cuisines_recipe) %>% \n",
" add_model(boost_spec)\n",
"\n",
"# Train a boosted tree model\n",
"boost_wf_fit <- boost_wf %>% \n",
" fit(data = cuisines_train)\n",
"\n",
"\n",
"# Make predictions and Evaluate model performance\n",
"boost_wf_fit %>% \n",
" augment(new_data = cuisines_test) %>% \n",
" eval_metrics(truth = cuisine, estimate = .pred_class)"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "zNQnbuejigZM"
},
"source": [
"> ✅ 請參閱:\n",
">\n",
"> - [社會科學家的機器學習](https://cimentadaj.github.io/ml_socsci/tree-based-methods.html#random-forests)\n",
">\n",
"> - [R 的實作機器學習](https://bradleyboehmke.github.io/HOML/)\n",
">\n",
"> - [統計學習入門R 的應用](https://www.statlearning.com/)\n",
">\n",
"> - <https://algotech.netlify.app/blog/xgboost/> - 探討 AdaBoost 模型,它是 xgboost 的一個良好替代方案。\n",
">\n",
"> 以了解更多關於集成分類器的內容。\n",
"\n",
"## 4. 額外部分 - 比較多個模型\n",
"\n",
"在這次實驗中,我們已經擬合了相當多的模型 🙌。如果需要從不同的預處理器和/或模型規範中建立大量工作流程,然後逐一計算性能指標,這可能會變得繁瑣或費力。\n",
"\n",
"讓我們看看是否可以通過創建一個函數來解決這個問題。該函數可以在訓練集上擬合一系列工作流程,然後根據測試集返回性能指標。我們將使用 [purrr](https://purrr.tidyverse.org/) 套件中的 `map()` 和 `map_dfr()` 來對列表中的每個元素應用函數。\n",
"\n",
"> [`map()`](https://purrr.tidyverse.org/reference/map.html) 函數允許您用更簡潔且更易讀的代碼替代許多 for 迴圈。學習 [`map()`](https://purrr.tidyverse.org/reference/map.html) 函數的最佳地方是 R for Data Science 中的 [迭代章節](http://r4ds.had.co.nz/iteration.html)。\n"
]
},
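{
"cell_type": "markdown",
"metadata": {},
"source": [
"As a tiny, hedged illustration of the idea (toy data invented for the example, unrelated to our cuisines), `map()` applies a function to every element of a list and returns a list, while `map_dfr()` row-binds data-frame results into a single tibble:\n"
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"# Tiny purrr illustration on toy data: map() returns a list of results,\n",
"# map_dfr() binds data frame results row-wise, with .id labelling each element\n",
"library(purrr)\n",
"\n",
"scores <- list(maths = c(70, 85, 90), art = c(60, 95))\n",
"\n",
"map(scores, mean)\n",
"\n",
"map_dfr(scores, ~ tibble::tibble(mean_score = mean(.x)), .id = \"subject\")"
],
"execution_count": null,
"outputs": []
},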
{
"cell_type": "code",
"metadata": {
"id": "Qzb7LyZnimd2"
},
"source": [
"set.seed(2056)\n",
"\n",
"# Create a metric set\n",
"eval_metrics <- metric_set(ppv, sens, accuracy, f_meas)\n",
"\n",
"# Define a function that returns performance metrics\n",
"compare_models <- function(workflow_list, train_set, test_set){\n",
" \n",
" suppressWarnings(\n",
" # Fit each model to the train_set\n",
" map(workflow_list, fit, data = train_set) %>% \n",
" # Make predictions on the test set\n",
" map_dfr(augment, new_data = test_set, .id = \"model\") %>%\n",
" # Select desired columns\n",
" select(model, cuisine, .pred_class) %>% \n",
" # Evaluate model performance\n",
" group_by(model) %>% \n",
" eval_metrics(truth = cuisine, estimate = .pred_class) %>% \n",
" ungroup()\n",
" )\n",
" \n",
"} # End of function"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "code",
"metadata": {
"id": "3i4VJOi2iu-a"
},
"source": [
"# Make a list of workflows\n",
"workflow_list <- list(\n",
" \"svc\" = svc_linear_wf,\n",
" \"svm\" = svm_rbf_wf,\n",
" \"knn\" = knn_wf,\n",
" \"random_forest\" = rf_wf,\n",
" \"xgboost\" = boost_wf)\n",
"\n",
"# Call the function\n",
"set.seed(2056)\n",
"perf_metrics <- compare_models(workflow_list = workflow_list, train_set = cuisines_train, test_set = cuisines_test)\n",
"\n",
"# Print out performance metrics\n",
"perf_metrics %>% \n",
" group_by(.metric) %>% \n",
" arrange(desc(.estimate)) %>% \n",
" slice_head(n=7)\n",
"\n",
"# Compare accuracy\n",
"perf_metrics %>% \n",
" filter(.metric == \"accuracy\") %>% \n",
" arrange(desc(.estimate))\n"
],
"execution_count": null,
"outputs": []
},
{
"cell_type": "markdown",
"metadata": {
"id": "KuWK_lEli4nW"
},
"source": [
"[**workflowset**](https://workflowsets.tidymodels.org/) 套件讓使用者能夠建立並輕鬆擬合大量模型,但主要設計是用於像 `交叉驗證` 這類的重抽樣技術,我們尚未涵蓋這部分。\n",
"\n",
"## **🚀挑戰**\n",
"\n",
"每種技術都有許多參數可以調整,例如 SVM 的 `cost`、KNN 的 `neighbors`、隨機森林的 `mtry`(隨機選擇的預測變數)。\n",
"\n",
"研究每種模型的預設參數,並思考調整這些參數對模型品質可能產生的影響。\n",
"\n",
"若想了解特定模型及其參數,可使用:`help(\"model\")`,例如 `help(\"rand_forest\")`\n",
"\n",
"> 在實際操作中,我們通常透過在 `模擬數據集` 上訓練多個模型並測量這些模型的表現來*估算*這些參數的*最佳值*。這個過程稱為 **調參**。\n",
"\n",
"### [**課後測驗**](https://gray-sand-07a10f403.1.azurestaticapps.net/quiz/24/)\n",
"\n",
"### **複習與自學**\n",
"\n",
"這些課程中有許多術語,因此花點時間查看[這份術語表](https://docs.microsoft.com/dotnet/machine-learning/resources/glossary?WT.mc_id=academic-77952-leestott),幫助你理解重要的概念!\n",
"\n",
"#### 特別感謝:\n",
"\n",
"[`Allison Horst`](https://twitter.com/allison_horst/) 創作了令人驚嘆的插圖,讓 R 更加親切且吸引人。可以在她的[畫廊](https://www.google.com/url?q=https://github.com/allisonhorst/stats-illustrations&sa=D&source=editors&ust=1626380772530000&usg=AOvVaw3zcfyCizFQZpkSLzxiiQEM)找到更多插圖。\n",
"\n",
"[Cassie Breviu](https://www.twitter.com/cassieview) 和 [Jen Looper](https://www.twitter.com/jenlooper) 創建了這個模組的原始 Python 版本 ♥️\n",
"\n",
"祝學習愉快,\n",
"\n",
"[Eric](https://twitter.com/ericntay)Gold Microsoft Learn 學生大使。\n",
"\n",
"<p >\n",
" <img src=\"../../images/r_learners_sm.jpeg\"\n",
" width=\"569\"/>\n",
" <figcaption>插圖由 @allison_horst 創作</figcaption>\n"
]
},
{
"cell_type": "markdown",
"metadata": {},
"source": [
"\n---\n\n**免責聲明** \n本文件使用 AI 翻譯服務 [Co-op Translator](https://github.com/Azure/co-op-translator) 進行翻譯。我們致力於提供準確的翻譯,但請注意,自動翻譯可能包含錯誤或不準確之處。應以原始語言的文件作為權威來源。對於關鍵資訊,建議尋求專業人工翻譯。我們對因使用此翻譯而產生的任何誤解或錯誤解讀概不負責。\n"
]
}
]
}