ML-For-Beginners/translations/th/4-Classification/3-Classifiers-2/solution/R/lesson_12-R.ipynb

{
 "nbformat": 4,
 "nbformat_minor": 0,
 "metadata": {
  "colab": {
   "name": "lesson_12-R.ipynb",
   "provenance": [],
   "collapsed_sections": []
  },
  "kernelspec": {
   "name": "ir",
   "display_name": "R"
  },
  "language_info": {
   "name": "R"
  },
  "coopTranslator": {
   "original_hash": "fab50046ca413a38939d579f8432274f",
   "translation_date": "2025-09-06T14:48:47+00:00",
   "source_file": "4-Classification/3-Classifiers-2/solution/R/lesson_12-R.ipynb",
   "language_code": "th"
  }
 },
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "jsFutf_ygqSx"
   },
   "source": []
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "HD54bEefgtNO"
   },
   "source": [
    "## Cuisine classifiers 2\n",
    "\n",
    "In this second classification lesson, we will explore `more ways` to classify categorical data. We will also learn about the consequences of choosing one classifier over another.\n",
    "\n",
    "### [**Pre-lecture quiz**](https://gray-sand-07a10f403.1.azurestaticapps.net/quiz/23/)\n",
    "\n",
    "### **Prerequisite**\n",
    "\n",
    "We assume that you have completed the previous lessons, since we will carry forward some of the concepts we learned there.\n",
    "\n",
    "For this lesson, we'll require the following packages:\n",
    "\n",
    "-   `tidyverse`: The [tidyverse](https://www.tidyverse.org/) is a [collection of R packages](https://www.tidyverse.org/packages) designed to make data science faster, easier and more fun!\n",
    "\n",
    "-   `tidymodels`: The [tidymodels](https://www.tidymodels.org/) framework is a [collection of packages](https://www.tidymodels.org/packages/) for modeling and machine learning.\n",
    "\n",
    "-   `themis`: The [themis package](https://themis.tidymodels.org/) provides extra recipe steps for dealing with unbalanced data.\n",
    "\n",
    "You can install them, together with the `kernlab`, `ranger`, `xgboost` and `kknn` engine packages used by the models later in this lesson, with:\n",
    "\n",
    "`install.packages(c(\"tidyverse\", \"tidymodels\", \"kernlab\", \"themis\", \"ranger\", \"xgboost\", \"kknn\"))`\n",
    "\n",
    "Alternatively, the script below checks whether you have the packages required to complete this module and installs any that are missing.\n"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "id": "vZ57IuUxgyQt"
   },
   "source": [
    "suppressWarnings(if (!require(\"pacman\"))install.packages(\"pacman\"))\n",
    "\n",
    "pacman::p_load(tidyverse, tidymodels, themis, kernlab, ranger, xgboost, kknn)"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "z22M-pj4g07x"
   },
   "source": [
    "## **1. A classification map**\n",
    "\n",
    "In our [previous lesson](https://github.com/microsoft/ML-For-Beginners/tree/main/4-Classification/2-Classifiers-1), we tried to answer the question: how do we choose between multiple models? To a great extent, the answer depends on the characteristics of the data and the type of problem we want to solve (for instance, classification or regression).\n",
    "\n",
    "Previously, we learned about the various options you have when classifying data using Microsoft's cheat sheet. Python's machine learning framework, Scikit-learn, offers a similar but more granular cheat sheet that can further help narrow down your estimators (another term for classifiers):\n",
    "\n",
    "<p >\n",
    "   <img src=\"../../images/map.png\"\n",
    "   width=\"700\"/>\n",
    "   <figcaption></figcaption>\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "u1i3xRIVg7vG"
   },
   "source": [
    "> Tip: [visit this map online](https://scikit-learn.org/stable/tutorial/machine_learning_map/) and click along the path to read the documentation.  \n",
    ">  \n",
    "> The [Tidymodels reference site](https://www.tidymodels.org/find/parsnip/#models) also provides excellent documentation about the different types of model.\n",
    "\n",
    "### **The plan** 🗺️\n",
    "\n",
    "This map is very helpful once you have a clear grasp of your data, as you can 'walk' along its paths to a decision:\n",
    "\n",
    "-   We have more than 50 samples\n",
    "\n",
    "-   We want to predict a category\n",
    "\n",
    "-   We have labeled data\n",
    "\n",
    "-   We have fewer than 100,000 samples\n",
    "\n",
    "-   ✨ We can choose a Linear SVC\n",
    "\n",
    "-   If that doesn't work, since we have numeric data\n",
    "\n",
    "    -   We can try a ✨ KNeighbors Classifier\n",
    "\n",
    "        -   If that doesn't work either, try ✨ SVC and ✨ Ensemble Classifiers\n",
    "\n",
    "This is a very helpful trail to follow. Now, let's get right into it using the [tidymodels](https://www.tidymodels.org/) modelling framework: a consistent and flexible collection of R packages developed to encourage good statistical practice 😊.\n",
    "\n",
    "## 2. Split the data and deal with an imbalanced data set\n",
    "\n",
    "From our previous lessons, we learnt that a set of ingredients is common across our cuisines. We also saw that the number of observations per cuisine is quite unequal.\n",
    "\n",
    "We'll deal with these by:\n",
    "\n",
    "-   Dropping the most common ingredients that create confusion between distinct cuisines, using `dplyr::select()`.\n",
    "\n",
    "-   Using a `recipe` that preprocesses the data to get it ready for modelling by applying an `over-sampling` algorithm.\n",
    "\n",
    "We already looked at both in the previous lesson, so this should be a breeze 🥳!\n"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "id": "6tj_rN00hClA"
   },
   "source": [
    "# Load the core Tidyverse and Tidymodels packages\n",
    "library(tidyverse)\n",
    "library(tidymodels)\n",
    "\n",
    "# Load the original cuisines data\n",
    "df <- read_csv(file = \"https://raw.githubusercontent.com/microsoft/ML-For-Beginners/main/4-Classification/data/cuisines.csv\")\n",
    "\n",
    "# Drop id column, rice, garlic and ginger from our original data set\n",
    "df_select <- df %>% \n",
    "  select(-c(1, rice, garlic, ginger)) %>%\n",
    "  # Encode cuisine column as categorical\n",
    "  mutate(cuisine = factor(cuisine))\n",
    "\n",
    "\n",
    "# Create data split specification\n",
    "set.seed(2056)\n",
    "cuisines_split <- initial_split(data = df_select,\n",
    "                                strata = cuisine,\n",
    "                                prop = 0.7)\n",
    "\n",
    "# Extract the data in each split\n",
    "cuisines_train <- training(cuisines_split)\n",
    "cuisines_test <- testing(cuisines_split)\n",
    "\n",
    "# Display distribution of cuisines in the training set\n",
    "cuisines_train %>% \n",
    "  count(cuisine) %>% \n",
    "  arrange(desc(n))"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "zFin5yw3hHb1"
   },
   "source": [
    "### Deal with imbalanced data\n",
    "\n",
    "Imbalanced data often has a negative effect on model performance. Many models perform best when each class has a similar number of observations and therefore tend to struggle with unbalanced data.\n",
    "\n",
    "There are two main ways of dealing with imbalanced data sets:\n",
    "\n",
    "-   adding observations to the minority class: `Over-sampling`, e.g. using the SMOTE algorithm, which synthetically generates new examples of the minority class using the nearest neighbors of these cases.\n",
    "\n",
    "-   removing observations from the majority class: `Under-sampling`\n",
    "\n",
    "In our previous lesson, we demonstrated how to deal with imbalanced data sets using a `recipe`. A recipe can be thought of as a blueprint that describes which steps should be applied to a data set to get it ready for data analysis. In our case, we want an equal distribution in the number of our cuisines for our `training set`. Let's get right into it!\n"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "id": "cRzTnHolhLWd"
   },
   "source": [
    "# Load themis package for dealing with imbalanced data\n",
    "library(themis)\n",
    "\n",
    "# Create a recipe for preprocessing training data\n",
    "cuisines_recipe <- recipe(cuisine ~ ., data = cuisines_train) %>%\n",
    "  step_smote(cuisine) \n",
    "\n",
    "# Print recipe\n",
    "cuisines_recipe"
   ],
   "execution_count": null,
   "outputs": []
  },
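  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To see what `step_smote()` does before any model is trained, a recipe can be prepped and baked on a small data set. The sketch below is illustrative only: the `toy` tibble and its column names are made up for this example and are not part of the lesson data. It assumes the default `step_smote()` settings, under which the minority class is over-sampled until it matches the majority class, so the two counts would be expected to come out equal.\n",
    "\n",
    "```r\n",
    "library(tidymodels)\n",
    "library(themis)\n",
    "\n",
    "set.seed(2056)\n",
    "\n",
    "# Hypothetical imbalanced toy data: 50 rows of class a, 10 rows of class b\n",
    "toy <- tibble(\n",
    "  x1 = rnorm(60),\n",
    "  x2 = rnorm(60),\n",
    "  class = factor(rep(c(\"a\", \"b\"), times = c(50, 10)))\n",
    ")\n",
    "\n",
    "# Class counts before over-sampling\n",
    "count(toy, class)\n",
    "\n",
    "# Estimate the recipe on the toy data, then return the processed training data\n",
    "toy_balanced <- recipe(class ~ ., data = toy) %>%\n",
    "  step_smote(class) %>%\n",
    "  prep() %>%\n",
    "  bake(new_data = NULL)\n",
    "\n",
    "# Class counts after over-sampling\n",
    "count(toy_balanced, class)\n",
    "```\n",
    "\n",
    "The same `prep()` and `bake(new_data = NULL)` pattern could be applied to `cuisines_recipe` to inspect the balanced training data.\n"
   ]
  },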
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "KxOQ2ORhhO81"
   },
   "source": [
    "Now we are ready to train models 👩‍💻👨‍💻!\n",
    "\n",
    "## 3. Beyond multinomial regression models\n",
    "\n",
    "In our previous lesson, we looked at multinomial regression models. Let's explore some more flexible models for classification.\n",
    "\n",
    "### Support Vector Machines\n",
    "\n",
    "In the context of classification, `Support Vector Machines` is a machine learning technique that tries to find a *hyperplane* that \"best\" separates the classes. Let's look at a simple example:\n",
    "\n",
    "<p >\n",
    "   <img src=\"../../images/svm.png\"\n",
    "   width=\"300\"/>\n",
    "   <figcaption>https://commons.wikimedia.org/w/index.php?curid=22877598</figcaption>\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "C4Wsd0vZhXYu"
   },
   "source": [
    "H1 does not separate the classes. H2 does, but only with a small margin. H3 separates them with the maximal margin.\n",
    "\n",
    "#### Linear Support Vector Classifier\n",
    "\n",
    "The Support Vector Classifier (SVC) is a member of the Support Vector Machines family of ML techniques. In SVC, the hyperplane is chosen to correctly separate `most` of the training observations, but it `may misclassify` a few of them. By allowing some points to be on the wrong side, the SVM becomes more robust to outliers and therefore generalizes better to new data. The parameter that regulates this violation is referred to as `cost`, which has a default value of 1 (see `help(\"svm_poly\")`).\n",
    "\n",
    "Let's create a linear SVC by setting `degree = 1` in a polynomial SVM model.\n"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "id": "vJpp6nuChlBz"
   },
   "source": [
    "# Make a linear SVC specification\n",
    "svc_linear_spec <- svm_poly(degree = 1) %>% \n",
    "  set_engine(\"kernlab\") %>% \n",
    "  set_mode(\"classification\")\n",
    "\n",
    "# Bundle specification and recipe into a workflow\n",
    "svc_linear_wf <- workflow() %>% \n",
    "  add_recipe(cuisines_recipe) %>% \n",
    "  add_model(svc_linear_spec)\n",
    "\n",
    "# Print out workflow\n",
    "svc_linear_wf"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "rDs8cWNkhoqu"
   },
   "source": [
    "Now that we have captured the preprocessing steps and the model specification in a *workflow*, we can go ahead and train the linear SVC and evaluate the results while we're at it. For performance metrics, let's create a metric set that will evaluate: `accuracy`, `sensitivity`, `Positive Predictive Value` and `F Measure`.\n",
    "\n",
    "> `augment()` will add column(s) for predictions to the given data.\n"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "id": "81wiqcwuhrnq"
   },
   "source": [
    "# Train a linear SVC model\n",
    "svc_linear_fit <- svc_linear_wf %>% \n",
    "  fit(data = cuisines_train)\n",
    "\n",
    "# Create a metric set\n",
    "eval_metrics <- metric_set(ppv, sens, accuracy, f_meas)\n",
    "\n",
    "\n",
    "# Make predictions and Evaluate model performance\n",
    "svc_linear_fit %>% \n",
    "  augment(new_data = cuisines_test) %>% \n",
    "  eval_metrics(truth = cuisine, estimate = .pred_class)"
   ],
   "execution_count": null,
   "outputs": []
  },
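  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "The `cost` parameter described above can also be set explicitly in `svm_poly()`. As a rough, self-contained sketch of how one might compare a few values, the example below uses R's built-in `iris` data instead of the cuisines data; the helper `fit_and_score()`, the split, and the candidate values `0.01`, `1` and `100` are all assumptions made for illustration, not part of the lesson.\n",
    "\n",
    "```r\n",
    "library(tidymodels)\n",
    "\n",
    "set.seed(2056)\n",
    "\n",
    "# Toy train/test split on the built-in iris data (illustration only)\n",
    "iris_split <- initial_split(iris, strata = Species, prop = 0.7)\n",
    "iris_train <- training(iris_split)\n",
    "iris_test <- testing(iris_split)\n",
    "\n",
    "# Hypothetical helper: fit a linear SVC with a given cost and\n",
    "# return its accuracy on the test set\n",
    "fit_and_score <- function(cost_value) {\n",
    "  svm_poly(degree = 1, cost = cost_value) %>%\n",
    "    set_engine(\"kernlab\") %>%\n",
    "    set_mode(\"classification\") %>%\n",
    "    fit(Species ~ ., data = iris_train) %>%\n",
    "    augment(new_data = iris_test) %>%\n",
    "    accuracy(truth = Species, estimate = .pred_class) %>%\n",
    "    mutate(cost = cost_value)\n",
    "}\n",
    "\n",
    "# One row of results per candidate cost value\n",
    "map_dfr(c(0.01, 1, 100), fit_and_score)\n",
    "```\n",
    "\n",
    "A smaller `cost` tolerates more points on the wrong side of the margin, while a larger one penalizes them more heavily. Comparing values on the test set like this is only a quick illustration; resampling would be the more careful way to choose one.\n"
   ]
  },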
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "0UFQvHf-huo3"
   },
   "source": [
    "#### Support Vector Machine\n",
    "\n",
    "The support vector machine (SVM) is an extension of the support vector classifier that accommodates a non-linear boundary between the classes. In essence, SVMs use the *kernel trick* to enlarge the feature space so that it can adapt to non-linear relationships between classes. One popular and extremely flexible kernel function used by SVMs is the *Radial basis function*. Let's see how it performs on our data.\n"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "id": "-KX4S8mzhzmp"
   },
   "source": [
    "set.seed(2056)\n",
    "\n",
    "# Make an RBF SVM specification\n",
    "svm_rbf_spec <- svm_rbf() %>% \n",
    "  set_engine(\"kernlab\") %>% \n",
    "  set_mode(\"classification\")\n",
    "\n",
    "# Bundle specification and recipe into a workflow\n",
    "svm_rbf_wf <- workflow() %>% \n",
    "  add_recipe(cuisines_recipe) %>% \n",
    "  add_model(svm_rbf_spec)\n",
    "\n",
    "\n",
    "# Train an RBF model\n",
    "svm_rbf_fit <- svm_rbf_wf %>% \n",
    "  fit(data = cuisines_train)\n",
    "\n",
    "\n",
    "# Make predictions and Evaluate model performance\n",
    "svm_rbf_fit %>% \n",
    "  augment(new_data = cuisines_test) %>% \n",
    "  eval_metrics(truth = cuisine, estimate = .pred_class)"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "QBFSa7WSh4HQ"
   },
   "source": [
    "Much better 🤩!\n",
    "\n",
    "> ✅ Please see:\n",
    ">\n",
    "> -   [*Support Vector Machines*](https://bradleyboehmke.github.io/HOML/svm.html), Hands-on Machine Learning with R\n",
    ">\n",
    "> -   [*Support Vector Machines*](https://www.statlearning.com/), An Introduction to Statistical Learning with Applications in R\n",
    ">\n",
    "> for further reading.\n",
    "\n",
    "### Nearest Neighbor classifiers\n",
    "\n",
    "*K*-nearest neighbor (KNN) is an algorithm in which each observation is predicted based on its *similarity* to other observations.\n",
    "\n",
    "Let's fit one to our data.\n"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "id": "k4BxxBcdh9Ka"
   },
   "source": [
    "# Make a KNN specification\n",
    "knn_spec <- nearest_neighbor() %>% \n",
    "  set_engine(\"kknn\") %>% \n",
    "  set_mode(\"classification\")\n",
    "\n",
    "# Bundle recipe and model specification into a workflow\n",
    "knn_wf <- workflow() %>% \n",
    "  add_recipe(cuisines_recipe) %>% \n",
    "  add_model(knn_spec)\n",
    "\n",
    "# Train a KNN model\n",
    "knn_wf_fit <- knn_wf %>% \n",
    "  fit(data = cuisines_train)\n",
    "\n",
    "\n",
    "# Make predictions and Evaluate model performance\n",
    "knn_wf_fit %>% \n",
    "  augment(new_data = cuisines_test) %>% \n",
    "  eval_metrics(truth = cuisine, estimate = .pred_class)"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "HaegQseriAcj"
   },
   "source": [
    "It appears that this model is not performing that well. Changing the model's arguments (see `help(\"nearest_neighbor\")`) may improve its performance. Be sure to try it out.\n",
    "\n",
    "> ✅ Please see:\n",
    ">\n",
    "> -   [Hands-on Machine Learning with R](https://bradleyboehmke.github.io/HOML/)\n",
    ">\n",
    "> -   [An Introduction to Statistical Learning with Applications in R](https://www.statlearning.com/)\n",
    ">\n",
    "> to learn more about *K*-Nearest Neighbors classifiers.\n",
    "\n",
    "### Ensemble classifiers\n",
    "\n",
    "Ensemble algorithms work by combining multiple base estimators to produce an optimal model, using either:\n",
    "\n",
    "`bagging`: applying an *averaging function* to a collection of base models\n",
    "\n",
    "`boosting`: building a sequence of models that build on one another to improve predictive performance.\n",
    "\n",
    "Let's start by trying out a Random Forest model, which builds a large collection of decision trees and then applies an averaging function to them for a better overall model.\n"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "id": "49DPoVs6iK1M"
   },
   "source": [
    "# Make a random forest specification\n",
    "rf_spec <- rand_forest() %>% \n",
    "  set_engine(\"ranger\") %>% \n",
    "  set_mode(\"classification\")\n",
    "\n",
    "# Bundle recipe and model specification into a workflow\n",
    "rf_wf <- workflow() %>% \n",
    "  add_recipe(cuisines_recipe) %>% \n",
    "  add_model(rf_spec)\n",
    "\n",
    "# Train a random forest model\n",
    "rf_wf_fit <- rf_wf %>% \n",
    "  fit(data = cuisines_train)\n",
    "\n",
    "\n",
    "# Make predictions and Evaluate model performance\n",
    "rf_wf_fit %>% \n",
    "  augment(new_data = cuisines_test) %>% \n",
    "  eval_metrics(truth = cuisine, estimate = .pred_class)"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "RGVYwC_aiUWc"
   },
   "source": [
    "Good job 👏!\n",
    "\n",
    "Let's also experiment with a Boosted Tree model.\n",
    "\n",
    "Boosted Tree is an ensemble method that creates a series of sequential decision trees, where each tree depends on the results of the previous trees in an attempt to incrementally reduce the error. It focuses on the weights of incorrectly classified items and adjusts the fit for the next classifier to correct them.\n",
    "\n",
    "There are different ways to fit this model (see `help(\"boost_tree\")`). In this example, we'll fit boosted trees via the `xgboost` engine.\n"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "id": "Py1YWo-micWs"
   },
   "source": [
    "# Make a boosted tree specification\n",
    "boost_spec <- boost_tree(trees = 200) %>% \n",
    "  set_engine(\"xgboost\") %>% \n",
    "  set_mode(\"classification\")\n",
    "\n",
    "# Bundle recipe and model specification into a workflow\n",
    "boost_wf <- workflow() %>% \n",
    "  add_recipe(cuisines_recipe) %>% \n",
    "  add_model(boost_spec)\n",
    "\n",
    "# Train a boosted tree model\n",
    "boost_wf_fit <- boost_wf %>% \n",
    "  fit(data = cuisines_train)\n",
    "\n",
    "\n",
    "# Make predictions and Evaluate model performance\n",
    "boost_wf_fit %>% \n",
    "  augment(new_data = cuisines_test) %>% \n",
    "  eval_metrics(truth = cuisine, estimate = .pred_class)"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "zNQnbuejigZM"
   },
   "source": [
    "> ✅ Please see:\n",
    ">\n",
    "> -   [Machine Learning for Social Scientists](https://cimentadaj.github.io/ml_socsci/tree-based-methods.html#random-forests)\n",
    ">\n",
    "> -   [Hands-on Machine Learning with R](https://bradleyboehmke.github.io/HOML/)\n",
    ">\n",
    "> -   [An Introduction to Statistical Learning with Applications in R](https://www.statlearning.com/)\n",
    ">\n",
    "> -   <https://algotech.netlify.app/blog/xgboost/> - Explores the AdaBoost model, which is a good alternative to xgboost.\n",
    ">\n",
    "> to learn more about Ensemble classifiers.\n",
    "\n",
    "## 4. Extra - comparing multiple models\n",
    "\n",
    "We have fitted quite a number of models in this lab 🙌. It can become tedious to create many workflows from different sets of preprocessors and/or model specifications and then calculate the performance metrics one by one.\n",
    "\n",
    "Let's see if we can address this by creating a function that fits a list of workflows on the training set and then returns the performance metrics based on the test set. We'll use `map()` and `map_dfr()` from the [purrr](https://purrr.tidyverse.org/) package to apply functions to each element of a list.\n",
    "\n",
    "> [`map()`](https://purrr.tidyverse.org/reference/map.html) functions allow you to replace many for loops with code that is both more succinct and easier to read. The best place to learn about the [`map()`](https://purrr.tidyverse.org/reference/map.html) functions is the [iteration chapter](http://r4ds.had.co.nz/iteration.html) in R for data science.\n"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "id": "Qzb7LyZnimd2"
   },
   "source": [
    "set.seed(2056)\n",
    "\n",
    "# Create a metric set\n",
    "eval_metrics <- metric_set(ppv, sens, accuracy, f_meas)\n",
    "\n",
    "# Define a function that returns performance metrics\n",
    "compare_models <- function(workflow_list, train_set, test_set){\n",
    "  \n",
    "  suppressWarnings(\n",
    "    # Fit each model to the train_set\n",
    "    map(workflow_list, fit, data = train_set) %>% \n",
    "    # Make predictions on the test set\n",
    "      map_dfr(augment, new_data = test_set, .id = \"model\") %>%\n",
    "    # Select desired columns\n",
    "      select(model, cuisine, .pred_class) %>% \n",
    "    # Evaluate model performance\n",
    "      group_by(model) %>% \n",
    "      eval_metrics(truth = cuisine, estimate = .pred_class) %>% \n",
    "      ungroup()\n",
    "  )\n",
    "  \n",
    "} # End of function"
   ],
   "execution_count": null,
   "outputs": []
  },
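  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "To make the roles of `map()` and `map_dfr()` in `compare_models()` more concrete, here is a small toy sketch. The list `number_list` and the column names are made up for illustration; it stands in for the named list of workflows. Note that `map_dfr()` relies on the dplyr package being installed, and that recent purrr releases mark it as superseded, although it remains available.\n",
    "\n",
    "```r\n",
    "library(purrr)\n",
    "library(tibble)\n",
    "\n",
    "# Hypothetical named list standing in for a list of workflows\n",
    "number_list <- list(small = 1:3, large = 10:12)\n",
    "\n",
    "# map() applies a function to each element and returns a list\n",
    "doubled <- map(number_list, function(x) x * 2)\n",
    "\n",
    "# map_dfr() builds one data frame per element and row-binds them;\n",
    "# .id stores each element's name in a new column\n",
    "summaries <- map_dfr(\n",
    "  number_list,\n",
    "  function(x) tibble(total = sum(x), average = mean(x)),\n",
    "  .id = \"name\"\n",
    ")\n",
    "\n",
    "doubled\n",
    "summaries\n",
    "```\n",
    "\n",
    "In `compare_models()`, the same `.id` mechanism is what produces the `model` column that the metrics are later grouped by.\n"
   ]
  },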
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Fwa712sNisDA"
   },
   "source": []
  },
  {
   "cell_type": "code",
   "metadata": {
    "id": "3i4VJOi2iu-a"
   },
   "source": [
    "# Make a list of workflows\n",
    "workflow_list <- list(\n",
    "  \"svc\" = svc_linear_wf,\n",
    "  \"svm\" = svm_rbf_wf,\n",
    "  \"knn\" = knn_wf,\n",
    "  \"random_forest\" = rf_wf,\n",
    "  \"xgboost\" = boost_wf)\n",
    "\n",
    "# Call the function\n",
    "set.seed(2056)\n",
    "perf_metrics <- compare_models(workflow_list = workflow_list, train_set = cuisines_train, test_set = cuisines_test)\n",
    "\n",
    "# Print out performance metrics\n",
    "perf_metrics %>% \n",
    "  group_by(.metric) %>% \n",
    "  arrange(desc(.estimate)) %>% \n",
    "  slice_head(n=7)\n",
    "\n",
    "# Compare accuracy\n",
    "perf_metrics %>% \n",
    "  filter(.metric == \"accuracy\") %>% \n",
    "  arrange(desc(.estimate))\n"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "KuWK_lEli4nW"
   },
   "source": [
    "The [**workflowsets**](https://workflowsets.tidymodels.org/) package allows users to create and easily fit a large number of models, but it is mostly designed to work with resampling techniques such as `cross-validation`, an approach we have yet to cover.\n",
    "\n",
    "## **🚀Challenge**\n",
    "\n",
    "Each of these techniques has a large number of parameters that you can tweak, for instance `cost` in SVMs, `neighbors` in KNN, and `mtry` (randomly selected predictors) in Random Forest.\n",
    "\n",
    "Research each model's default parameters and think about what tweaking these parameters would mean for the model's quality.\n",
    "\n",
    "To find out more about a particular model and its parameters, use `help(\"model\")`, e.g. `help(\"rand_forest\")`.\n",
    "\n",
    "> In practice, we usually *estimate* the *best values* for these by training many models on a `simulated data set` and measuring how well all these models perform. This process is called **tuning**.\n",
    "\n",
    "### [**Post-lecture quiz**](https://gray-sand-07a10f403.1.azurestaticapps.net/quiz/24/)\n",
    "\n",
    "### **Review & Self Study**\n",
    "\n",
    "There's a lot of jargon in these lessons, so take a minute to review [this list](https://docs.microsoft.com/dotnet/machine-learning/resources/glossary?WT.mc_id=academic-77952-leestott) of useful terminology!\n",
    "\n",
    "#### Thank you to:\n",
    "\n",
    "[`Allison Horst`](https://twitter.com/allison_horst/) for creating the amazing illustrations that make R more welcoming and engaging. Find more illustrations at her [gallery](https://www.google.com/url?q=https://github.com/allisonhorst/stats-illustrations&sa=D&source=editors&ust=1626380772530000&usg=AOvVaw3zcfyCizFQZpkSLzxiiQEM).\n",
    "\n",
    "[Cassie Breviu](https://www.twitter.com/cassieview) and [Jen Looper](https://www.twitter.com/jenlooper) for creating the original Python version of this module ♥️\n",
    "\n",
    "Happy Learning,\n",
    "\n",
    "[Eric](https://twitter.com/ericntay), Gold Microsoft Learn Student Ambassador.\n",
    "\n",
    "<p >\n",
    "   <img src=\"../../images/r_learners_sm.jpeg\"\n",
    "   width=\"569\"/>\n",
    "   <figcaption>Artwork by @allison_horst</figcaption>\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n---\n\n**Disclaimer**:  \nThis document has been translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The original document in its native language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.\n"
   ]
  }
 ]
}