{
 "nbformat": 4,
 "nbformat_minor": 0,
 "metadata": {
  "colab": {
   "name": "lesson_12-R.ipynb",
   "provenance": [],
   "collapsed_sections": []
  },
  "kernelspec": {
   "name": "ir",
   "display_name": "R"
  },
  "language_info": {
   "name": "R"
  },
  "coopTranslator": {
   "original_hash": "fab50046ca413a38939d579f8432274f",
   "translation_date": "2025-09-04T02:35:47+00:00",
   "source_file": "4-Classification/3-Classifiers-2/solution/R/lesson_12-R.ipynb",
   "language_code": "ko"
  }
 },
 "cells": [
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "jsFutf_ygqSx"
   },
   "source": [
    "# 분류 모델 구축: 맛있는 아시아 및 인도 요리\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "HD54bEefgtNO"
   },
   "source": [
    "## 요리 분류기 2\n",
    "\n",
    "이 두 번째 분류 수업에서는 범주형 데이터를 분류하는 `다양한 방법`을 탐구합니다. 또한, 한 분류기를 다른 분류기 대신 선택했을 때의 결과에 대해 배워볼 것입니다.\n",
    "\n",
    "### [**강의 전 퀴즈**](https://gray-sand-07a10f403.1.azurestaticapps.net/quiz/23/)\n",
    "\n",
    "### **사전 요구사항**\n",
    "\n",
    "이전 수업을 완료했다고 가정합니다. 이번 수업에서는 이전에 배운 개념을 이어서 사용할 것입니다.\n",
    "\n",
    "이번 수업을 위해 다음 패키지가 필요합니다:\n",
    "\n",
    "-   `tidyverse`: [tidyverse](https://www.tidyverse.org/)는 데이터 과학을 더 빠르고, 쉽고, 재미있게 만들어주는 [R 패키지 모음](https://www.tidyverse.org/packages)입니다.\n",
    "\n",
    "-   `tidymodels`: [tidymodels](https://www.tidymodels.org/) 프레임워크는 모델링과 머신러닝을 위한 [패키지 모음](https://www.tidymodels.org/packages/)입니다.\n",
    "\n",
    "-   `themis`: [themis 패키지](https://themis.tidymodels.org/)는 불균형 데이터 처리를 위한 추가 레시피 단계를 제공합니다.\n",
    "\n",
    "다음 명령어를 사용하여 패키지를 설치할 수 있습니다:\n",
    "\n",
    "`install.packages(c(\"tidyverse\", \"tidymodels\", \"kernlab\", \"themis\", \"ranger\", \"xgboost\", \"kknn\"))`\n",
    "\n",
    "또는 아래 스크립트를 사용하면 필요한 패키지가 설치되어 있는지 확인하고, 누락된 경우 자동으로 설치합니다.\n"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "id": "vZ57IuUxgyQt"
   },
   "source": [
    "suppressWarnings(if (!require(\"pacman\"))install.packages(\"pacman\"))\n",
    "\n",
    "pacman::p_load(tidyverse, tidymodels, themis, kernlab, ranger, xgboost, kknn)"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "z22M-pj4g07x"
   },
   "source": [
    "## **1. 분류 지도**\n",
    "\n",
    "[이전 강의](https://github.com/microsoft/ML-For-Beginners/tree/main/4-Classification/2-Classifiers-1)에서 우리는 \"여러 모델 중에서 어떻게 선택할 것인가?\"라는 질문에 대해 다뤄보았습니다. 이는 데이터의 특성과 우리가 해결하려는 문제 유형(예: 분류 또는 회귀)에 크게 좌우됩니다.\n",
    "\n",
    "이전에, 데이터를 분류할 때 사용할 수 있는 다양한 옵션에 대해 Microsoft의 치트 시트를 통해 배웠습니다. Python의 머신러닝 프레임워크인 Scikit-learn은 이와 유사하지만 더 세분화된 치트 시트를 제공하여 분류기(또는 추정기)를 선택하는 데 도움을 줄 수 있습니다:\n",
    "\n",
    "<p >\n",
    "   <img src=\"../../images/map.png\"\n",
    "   width=\"700\"/>\n",
    "   <figcaption></figcaption>\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "u1i3xRIVg7vG"
   },
   "source": [
    "> Tip: [이 온라인 지도를 방문하세요](https://scikit-learn.org/stable/tutorial/machine_learning_map/) 그리고 경로를 따라가며 문서를 읽어보세요.\n",
    ">\n",
    "> [Tidymodels 참조 사이트](https://www.tidymodels.org/find/parsnip/#models)에서도 다양한 모델 유형에 대한 훌륭한 문서를 제공합니다.\n",
    "\n",
    "### **계획** 🗺️\n",
    "\n",
    "이 지도는 데이터를 명확히 이해한 후에 매우 유용합니다. 경로를 따라가며 결정을 내릴 수 있습니다:\n",
    "\n",
    "-   샘플이 \\>50개 있습니다.\n",
    "\n",
    "-   카테고리를 예측하고 싶습니다.\n",
    "\n",
    "-   라벨이 있는 데이터가 있습니다.\n",
    "\n",
    "-   샘플이 100K개 미만입니다.\n",
    "\n",
    "-   ✨ Linear SVC를 선택할 수 있습니다.\n",
    "\n",
    "-   만약 작동하지 않는다면, 숫자 데이터가 있으므로\n",
    "\n",
    "    -   ✨ KNeighbors Classifier를 시도할 수 있습니다.\n",
    "\n",
    "        -   그래도 작동하지 않으면 ✨ SVC와 ✨ Ensemble Classifiers를 시도해보세요.\n",
    "\n",
    "이 경로는 따라가기 매우 유용합니다. 이제 [tidymodels](https://www.tidymodels.org/) 모델링 프레임워크를 사용해 바로 시작해봅시다. 이는 좋은 통계적 실천을 장려하기 위해 개발된 일관되고 유연한 R 패키지 모음입니다 😊.\n",
    "\n",
    "## 2. 데이터를 분리하고 불균형 데이터 세트를 처리하기.\n",
    "\n",
    "이전 수업에서 우리는 각 요리에서 공통적으로 사용되는 재료 세트를 배웠습니다. 또한, 요리의 수가 매우 불균등하게 분포되어 있다는 것도 알게 되었습니다.\n",
    "\n",
    "이를 다음과 같이 처리할 것입니다:\n",
    "\n",
    "-   서로 다른 요리 간 혼란을 초래하는 가장 일반적인 재료를 `dplyr::select()`를 사용해 제거합니다.\n",
    "\n",
    "-   데이터를 모델링에 적합하게 준비하기 위해 `recipe`를 사용하여 데이터를 전처리하고 `over-sampling` 알고리즘을 적용합니다.\n",
    "\n",
    "이전 수업에서 이미 위 내용을 다뤘으니 이번에는 쉽게 진행할 수 있을 것입니다 🥳!\n"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "id": "6tj_rN00hClA"
   },
   "source": [
    "# Load the core Tidyverse and Tidymodels packages\n",
    "library(tidyverse)\n",
    "library(tidymodels)\n",
    "\n",
    "# Load the original cuisines data\n",
    "df <- read_csv(file = \"https://raw.githubusercontent.com/microsoft/ML-For-Beginners/main/4-Classification/data/cuisines.csv\")\n",
    "\n",
    "# Drop id column, rice, garlic and ginger from our original data set\n",
    "df_select <- df %>% \n",
    "  select(-c(1, rice, garlic, ginger)) %>%\n",
    "  # Encode cuisine column as categorical\n",
    "  mutate(cuisine = factor(cuisine))\n",
    "\n",
    "\n",
    "# Create data split specification\n",
    "set.seed(2056)\n",
    "cuisines_split <- initial_split(data = df_select,\n",
    "                                strata = cuisine,\n",
    "                                prop = 0.7)\n",
    "\n",
    "# Extract the data in each split\n",
    "cuisines_train <- training(cuisines_split)\n",
    "cuisines_test <- testing(cuisines_split)\n",
    "\n",
    "# Display distribution of cuisines in the training set\n",
    "cuisines_train %>% \n",
    "  count(cuisine) %>% \n",
    "  arrange(desc(n))"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "zFin5yw3hHb1"
   },
   "source": [
    "### 불균형 데이터 처리하기\n",
    "\n",
    "불균형 데이터는 종종 모델 성능에 부정적인 영향을 미칩니다. 많은 모델은 관측값의 수가 동일할 때 가장 잘 작동하며, 따라서 불균형 데이터에서는 어려움을 겪는 경향이 있습니다.\n",
    "\n",
    "불균형 데이터 세트를 처리하는 주요 방법은 두 가지입니다:\n",
    "\n",
    "-   소수 클래스에 관측값을 추가하기: `오버샘플링` 예를 들어, SMOTE 알고리즘을 사용하여 소수 클래스의 새로운 예제를 이 사례들의 가장 가까운 이웃을 기반으로 합성적으로 생성합니다.\n",
    "\n",
    "-   다수 클래스에서 관측값을 제거하기: `언더샘플링`\n",
    "\n",
    "이전 강의에서는 `recipe`를 사용하여 불균형 데이터 세트를 처리하는 방법을 시연했습니다. `recipe`는 데이터 분석을 준비하기 위해 데이터 세트에 어떤 단계를 적용해야 하는지를 설명하는 청사진으로 생각할 수 있습니다. 우리의 경우, `training set`에서 요리의 수가 균등하게 분포되도록 하고 싶습니다. 바로 시작해봅시다.\n"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "id": "cRzTnHolhLWd"
   },
   "source": [
    "# Load themis package for dealing with imbalanced data\n",
    "library(themis)\n",
    "\n",
    "# Create a recipe for preprocessing training data\n",
    "cuisines_recipe <- recipe(cuisine ~ ., data = cuisines_train) %>%\n",
    "  step_smote(cuisine) \n",
    "\n",
    "# Print recipe\n",
    "cuisines_recipe"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "KxOQ2ORhhO81"
   },
   "source": [
    "이제 모델을 훈련할 준비가 되었습니다 👩‍💻👨‍💻!\n",
    "\n",
    "## 3. 다항 회귀 모델을 넘어서\n",
    "\n",
    "이전 강의에서는 다항 회귀 모델에 대해 살펴보았습니다. 이제 분류를 위한 더 유연한 모델들을 탐구해 봅시다.\n",
    "\n",
    "### 서포트 벡터 머신(Support Vector Machines)\n",
    "\n",
    "분류의 맥락에서, `서포트 벡터 머신(Support Vector Machines)`은 클래스들을 \"최적\"으로 분리하는 *초평면(hyperplane)*을 찾으려는 머신 러닝 기법입니다. 간단한 예를 살펴보겠습니다:\n",
    "\n",
    "<p >\n",
    "   <img src=\"../../images/svm.png\"\n",
    "   width=\"300\"/>\n",
    "   <figcaption>https://commons.wikimedia.org/w/index.php?curid=22877598</figcaption>\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "C4Wsd0vZhXYu"
   },
   "source": [
    "H1~는 클래스를 분리하지 않습니다. H2~는 분리하지만, 간격이 작습니다. H3~는 최대 간격으로 클래스를 분리합니다.\n",
    "\n",
    "#### 선형 서포트 벡터 분류기\n",
    "\n",
    "서포트 벡터 클러스터링(SVC)은 머신러닝 기법 중 서포트 벡터 머신(SVM) 계열에 속하는 하위 기술입니다. SVC에서는 초평면이 훈련 관측값의 `대부분`을 올바르게 분리하도록 선택되지만, 일부 관측값은 `잘못 분류될 수` 있습니다. 일부 점이 잘못된 쪽에 위치하도록 허용함으로써 SVM은 이상치에 대해 더 강건해지고 새로운 데이터에 대한 일반화 능력이 향상됩니다. 이러한 위반을 조정하는 매개변수를 `cost`라고 하며, 기본값은 1입니다 (`help(\"svm_poly\")`를 참조하세요).\n",
    "\n",
    "다항식 SVM 모델에서 `degree = 1`로 설정하여 선형 SVC를 만들어봅시다.\n"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "id": "vJpp6nuChlBz"
   },
   "source": [
    "# Make a linear SVC specification\n",
    "svc_linear_spec <- svm_poly(degree = 1) %>% \n",
    "  set_engine(\"kernlab\") %>% \n",
    "  set_mode(\"classification\")\n",
    "\n",
    "# Bundle specification and recipe into a worklow\n",
    "svc_linear_wf <- workflow() %>% \n",
    "  add_recipe(cuisines_recipe) %>% \n",
    "  add_model(svc_linear_spec)\n",
    "\n",
    "# Print out workflow\n",
    "svc_linear_wf"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "rDs8cWNkhoqu"
   },
   "source": [
    "이제 전처리 단계와 모델 사양을 *워크플로*에 담았으니, 선형 SVC를 학습시키고 결과를 평가해봅시다. 성능 지표로는 `정확도(accuracy)`, `민감도(sensitivity)`, `양성 예측 값(Positive Predicted Value)`, 그리고 `F 측정값(F Measure)`을 평가할 수 있는 지표 세트를 만들어 보겠습니다.\n",
    "\n",
    "> `augment()`는 주어진 데이터에 예측 결과를 담은 열(column)을 추가합니다.\n"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "id": "81wiqcwuhrnq"
   },
   "source": [
    "# Train a linear SVC model\n",
    "svc_linear_fit <- svc_linear_wf %>% \n",
    "  fit(data = cuisines_train)\n",
    "\n",
    "# Create a metric set\n",
    "eval_metrics <- metric_set(ppv, sens, accuracy, f_meas)\n",
    "\n",
    "\n",
    "# Make predictions and Evaluate model performance\n",
    "svc_linear_fit %>% \n",
    "  augment(new_data = cuisines_test) %>% \n",
    "  eval_metrics(truth = cuisine, estimate = .pred_class)"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "0UFQvHf-huo3"
   },
   "source": [
    "#### 서포트 벡터 머신\n",
    "\n",
    "서포트 벡터 머신(SVM)은 클래스 간의 비선형 경계를 처리하기 위해 서포트 벡터 분류기를 확장한 것입니다. 본질적으로, SVM은 *커널 트릭*을 사용하여 특징 공간을 확장함으로써 클래스 간의 비선형 관계에 적응합니다. SVM에서 사용되는 인기 있고 매우 유연한 커널 함수 중 하나는 *방사 기저 함수*입니다. 이제 이것이 우리의 데이터에서 어떻게 작동하는지 살펴보겠습니다.\n"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "id": "-KX4S8mzhzmp"
   },
   "source": [
    "set.seed(2056)\n",
    "\n",
    "# Make an RBF SVM specification\n",
    "svm_rbf_spec <- svm_rbf() %>% \n",
    "  set_engine(\"kernlab\") %>% \n",
    "  set_mode(\"classification\")\n",
    "\n",
    "# Bundle specification and recipe into a worklow\n",
    "svm_rbf_wf <- workflow() %>% \n",
    "  add_recipe(cuisines_recipe) %>% \n",
    "  add_model(svm_rbf_spec)\n",
    "\n",
    "\n",
    "# Train an RBF model\n",
    "svm_rbf_fit <- svm_rbf_wf %>% \n",
    "  fit(data = cuisines_train)\n",
    "\n",
    "\n",
    "# Make predictions and Evaluate model performance\n",
    "svm_rbf_fit %>% \n",
    "  augment(new_data = cuisines_test) %>% \n",
    "  eval_metrics(truth = cuisine, estimate = .pred_class)"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "QBFSa7WSh4HQ"
   },
   "source": [
    "훨씬 더 좋아요 🤩!\n",
    "\n",
    "> ✅ 참고하세요:\n",
    ">\n",
    "> -   [*Support Vector Machines*](https://bradleyboehmke.github.io/HOML/svm.html), Hands-on Machine Learning with R\n",
    ">\n",
    "> -   [*Support Vector Machines*](https://www.statlearning.com/), An Introduction to Statistical Learning with Applications in R\n",
    ">\n",
    "> 추가 학습을 위해.\n",
    "\n",
    "### 최근접 이웃 분류기\n",
    "\n",
    "*K*-최근접 이웃(KNN)은 각 관측값이 다른 관측값과의 *유사성*을 기반으로 예측되는 알고리즘입니다.\n",
    "\n",
    "우리 데이터에 이를 적용해 봅시다.\n"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "id": "k4BxxBcdh9Ka"
   },
   "source": [
    "# Make a KNN specification\n",
    "knn_spec <- nearest_neighbor() %>% \n",
    "  set_engine(\"kknn\") %>% \n",
    "  set_mode(\"classification\")\n",
    "\n",
    "# Bundle recipe and model specification into a workflow\n",
    "knn_wf <- workflow() %>% \n",
    "  add_recipe(cuisines_recipe) %>% \n",
    "  add_model(knn_spec)\n",
    "\n",
    "# Train a boosted tree model\n",
    "knn_wf_fit <- knn_wf %>% \n",
    "  fit(data = cuisines_train)\n",
    "\n",
    "\n",
    "# Make predictions and Evaluate model performance\n",
    "knn_wf_fit %>% \n",
    "  augment(new_data = cuisines_test) %>% \n",
    "  eval_metrics(truth = cuisine, estimate = .pred_class)"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "HaegQseriAcj"
   },
   "source": [
    "이 모델의 성능이 그다지 좋지 않은 것 같습니다. 아마도 `help(\"nearest_neighbor\")`를 참고하여 모델의 매개변수를 변경하면 성능이 향상될 수 있습니다. 꼭 시도해 보세요.\n",
    "\n",
    "> ✅ 참고 자료:\n",
    ">\n",
    "> -   [Hands-on Machine Learning with R](https://bradleyboehmke.github.io/HOML/)\n",
    ">\n",
    "> -   [An Introduction to Statistical Learning with Applications in R](https://www.statlearning.com/)\n",
    ">\n",
    "> 위 자료를 통해 *K*-Nearest Neighbors 분류기에 대해 더 알아보세요.\n",
    "\n",
    "### 앙상블 분류기\n",
    "\n",
    "앙상블 알고리즘은 여러 개의 기본 추정기를 결합하여 최적의 모델을 생성하는 방식으로 작동합니다. 방법은 다음과 같습니다:\n",
    "\n",
    "`bagging`: 기본 모델들의 집합에 *평균 함수*를 적용하는 방식\n",
    "\n",
    "`boosting`: 예측 성능을 개선하기 위해 서로를 기반으로 구축하는 일련의 모델을 생성하는 방식\n",
    "\n",
    "우선, 랜덤 포레스트(Random Forest) 모델을 시도해 봅시다. 이 모델은 다수의 결정 트리를 생성한 후, 평균 함수를 적용하여 더 나은 전체 모델을 만듭니다.\n"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "id": "49DPoVs6iK1M"
   },
   "source": [
    "# Make a random forest specification\n",
    "rf_spec <- rand_forest() %>% \n",
    "  set_engine(\"ranger\") %>% \n",
    "  set_mode(\"classification\")\n",
    "\n",
    "# Bundle recipe and model specification into a workflow\n",
    "rf_wf <- workflow() %>% \n",
    "  add_recipe(cuisines_recipe) %>% \n",
    "  add_model(rf_spec)\n",
    "\n",
    "# Train a random forest model\n",
    "rf_wf_fit <- rf_wf %>% \n",
    "  fit(data = cuisines_train)\n",
    "\n",
    "\n",
    "# Make predictions and Evaluate model performance\n",
    "rf_wf_fit %>% \n",
    "  augment(new_data = cuisines_test) %>% \n",
    "  eval_metrics(truth = cuisine, estimate = .pred_class)"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "RGVYwC_aiUWc"
   },
   "source": [
    "잘했어요 👏!\n",
    "\n",
    "Boosted Tree 모델도 실험해 봅시다.\n",
    "\n",
    "Boosted Tree는 일련의 순차적인 결정 트리를 생성하는 앙상블 방법을 정의합니다. 각 트리는 이전 트리의 결과에 따라 달라지며, 점진적으로 오류를 줄이려는 시도를 합니다. 이 방법은 잘못 분류된 항목의 가중치에 초점을 맞추고, 다음 분류기가 이를 수정하도록 적합성을 조정합니다.\n",
    "\n",
    "이 모델을 적합시키는 방법에는 여러 가지가 있습니다 (`help(\"boost_tree\")`를 참조하세요). 이 예제에서는 `xgboost` 엔진을 사용하여 Boosted Tree를 적합시킬 것입니다.\n"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "id": "Py1YWo-micWs"
   },
   "source": [
    "# Make a boosted tree specification\n",
    "boost_spec <- boost_tree(trees = 200) %>% \n",
    "  set_engine(\"xgboost\") %>% \n",
    "  set_mode(\"classification\")\n",
    "\n",
    "# Bundle recipe and model specification into a workflow\n",
    "boost_wf <- workflow() %>% \n",
    "  add_recipe(cuisines_recipe) %>% \n",
    "  add_model(boost_spec)\n",
    "\n",
    "# Train a boosted tree model\n",
    "boost_wf_fit <- boost_wf %>% \n",
    "  fit(data = cuisines_train)\n",
    "\n",
    "\n",
    "# Make predictions and Evaluate model performance\n",
    "boost_wf_fit %>% \n",
    "  augment(new_data = cuisines_test) %>% \n",
    "  eval_metrics(truth = cuisine, estimate = .pred_class)"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "zNQnbuejigZM"
   },
   "source": [
    "> ✅ 참고하세요:\n",
    ">\n",
    "> -   [사회과학자를 위한 머신러닝](https://cimentadaj.github.io/ml_socsci/tree-based-methods.html#random-forests)\n",
    ">\n",
    "> -   [Hands-on Machine Learning with R](https://bradleyboehmke.github.io/HOML/)\n",
    ">\n",
    "> -   [An Introduction to Statistical Learning with Applications in R](https://www.statlearning.com/)\n",
    ">\n",
    "> -   <https://algotech.netlify.app/blog/xgboost/> - AdaBoost 모델을 탐구하며, xgboost의 좋은 대안이 될 수 있습니다.\n",
    ">\n",
    "> 앙상블 분류기에 대해 더 알아보세요.\n",
    "\n",
    "## 4. 추가 - 여러 모델 비교하기\n",
    "\n",
    "이번 실습에서는 꽤 많은 모델을 적용해 보았습니다 🙌. 다양한 전처리기와/또는 모델 사양을 사용하여 여러 워크플로를 생성하고, 성능 지표를 하나씩 계산하는 작업은 번거롭거나 부담스러울 수 있습니다.\n",
    "\n",
    "이를 해결하기 위해, 훈련 세트에서 여러 워크플로를 적용하고 테스트 세트를 기반으로 성능 지표를 반환하는 함수를 만들어 보겠습니다. 이를 통해 [purrr](https://purrr.tidyverse.org/) 패키지의 `map()` 및 `map_dfr()`를 사용하여 리스트의 각 요소에 함수를 적용할 수 있습니다.\n",
    "\n",
    "> [`map()`](https://purrr.tidyverse.org/reference/map.html) 함수는 많은 for 루프를 더 간결하고 읽기 쉬운 코드로 대체할 수 있도록 도와줍니다. [`map()`](https://purrr.tidyverse.org/reference/map.html) 함수에 대해 배우기에 가장 좋은 장소는 R for Data Science의 [반복(iteration) 챕터](http://r4ds.had.co.nz/iteration.html)입니다.\n"
   ]
  },
  {
   "cell_type": "code",
   "metadata": {
    "id": "Qzb7LyZnimd2"
   },
   "source": [
    "set.seed(2056)\n",
    "\n",
    "# Create a metric set\n",
    "eval_metrics <- metric_set(ppv, sens, accuracy, f_meas)\n",
    "\n",
    "# Define a function that returns performance metrics\n",
    "compare_models <- function(workflow_list, train_set, test_set){\n",
    "  \n",
    "  suppressWarnings(\n",
    "    # Fit each model to the train_set\n",
    "    map(workflow_list, fit, data = train_set) %>% \n",
    "    # Make predictions on the test set\n",
    "      map_dfr(augment, new_data = test_set, .id = \"model\") %>%\n",
    "    # Select desired columns\n",
    "      select(model, cuisine, .pred_class) %>% \n",
    "    # Evaluate model performance\n",
    "      group_by(model) %>% \n",
    "      eval_metrics(truth = cuisine, estimate = .pred_class) %>% \n",
    "      ungroup()\n",
    "  )\n",
    "  \n",
    "} # End of function"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "Fwa712sNisDA"
   },
   "source": []
  },
  {
   "cell_type": "code",
   "metadata": {
    "id": "3i4VJOi2iu-a"
   },
   "source": [
    "# Make a list of workflows\n",
    "workflow_list <- list(\n",
    "  \"svc\" = svc_linear_wf,\n",
    "  \"svm\" = svm_rbf_wf,\n",
    "  \"knn\" = knn_wf,\n",
    "  \"random_forest\" = rf_wf,\n",
    "  \"xgboost\" = boost_wf)\n",
    "\n",
    "# Call the function\n",
    "set.seed(2056)\n",
    "perf_metrics <- compare_models(workflow_list = workflow_list, train_set = cuisines_train, test_set = cuisines_test)\n",
    "\n",
    "# Print out performance metrics\n",
    "perf_metrics %>% \n",
    "  group_by(.metric) %>% \n",
    "  arrange(desc(.estimate)) %>% \n",
    "  slice_head(n=7)\n",
    "\n",
    "# Compare accuracy\n",
    "perf_metrics %>% \n",
    "  filter(.metric == \"accuracy\") %>% \n",
    "  arrange(desc(.estimate))\n"
   ],
   "execution_count": null,
   "outputs": []
  },
  {
   "cell_type": "markdown",
   "metadata": {
    "id": "KuWK_lEli4nW"
   },
   "source": [
    "[**workflowset**](https://workflowsets.tidymodels.org/) 패키지는 사용자가 많은 모델을 생성하고 쉽게 적합시킬 수 있도록 해주며, 주로 `교차 검증`과 같은 재샘플링 기법과 함께 사용하도록 설계되었습니다. 이 접근법은 아직 다루지 않았습니다.\n",
    "\n",
    "## **🚀도전 과제**\n",
    "\n",
    "이 기법들 각각은 조정할 수 있는 많은 매개변수를 가지고 있습니다. 예를 들어, SVM의 `cost`, KNN의 `neighbors`, 랜덤 포레스트의 `mtry`(무작위로 선택된 예측 변수) 등이 있습니다.\n",
    "\n",
    "각 모델의 기본 매개변수를 조사하고, 이러한 매개변수를 조정하는 것이 모델의 품질에 어떤 영향을 미칠지 생각해 보세요.\n",
    "\n",
    "특정 모델과 그 매개변수에 대해 더 알아보려면 다음을 사용하세요: `help(\"model\")` 예: `help(\"rand_forest\")`\n",
    "\n",
    "> 실제로는, `모의 데이터 세트`에서 여러 모델을 훈련시키고 이 모델들이 얼마나 잘 수행하는지 측정하여 *최적의 값*을 *추정*하는 경우가 많습니다. 이 과정을 **튜닝**이라고 합니다.\n",
    "\n",
    "### [**강의 후 퀴즈**](https://gray-sand-07a10f403.1.azurestaticapps.net/quiz/24/)\n",
    "\n",
    "### **복습 및 자기 학습**\n",
    "\n",
    "이 강의들에는 전문 용어가 많이 등장하니, [이 목록](https://docs.microsoft.com/dotnet/machine-learning/resources/glossary?WT.mc_id=academic-77952-leestott)의 유용한 용어들을 검토하는 시간을 가져보세요!\n",
    "\n",
    "#### 감사의 말씀:\n",
    "\n",
    "[`Allison Horst`](https://twitter.com/allison_horst/)에게 R을 더 친근하고 매력적으로 만들어주는 멋진 삽화를 만들어 주신 것에 대해 감사드립니다. 더 많은 삽화는 그녀의 [갤러리](https://www.google.com/url?q=https://github.com/allisonhorst/stats-illustrations&sa=D&source=editors&ust=1626380772530000&usg=AOvVaw3zcfyCizFQZpkSLzxiiQEM)에서 확인할 수 있습니다.\n",
    "\n",
    "[Cassie Breviu](https://www.twitter.com/cassieview)와 [Jen Looper](https://www.twitter.com/jenlooper)에게 이 모듈의 원래 Python 버전을 만들어 주신 것에 대해 감사드립니다 ♥️\n",
    "\n",
    "즐거운 학습 되세요,\n",
    "\n",
    "[Eric](https://twitter.com/ericntay), Gold Microsoft Learn Student Ambassador.\n",
    "\n",
    "<p >\n",
    "   <img src=\"../../images/r_learners_sm.jpeg\"\n",
    "   width=\"569\"/>\n",
    "   <figcaption>@allison_horst의 작품</figcaption>\n"
   ]
  },
  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n---\n\n**면책 조항**:  \n이 문서는 AI 번역 서비스 [Co-op Translator](https://github.com/Azure/co-op-translator)를 사용하여 번역되었습니다. 정확성을 위해 최선을 다하고 있으나, 자동 번역에는 오류나 부정확성이 포함될 수 있습니다. 원본 문서의 원어 버전을 권위 있는 출처로 간주해야 합니다. 중요한 정보의 경우, 전문적인 인간 번역을 권장합니다. 이 번역 사용으로 인해 발생하는 오해나 잘못된 해석에 대해 책임을 지지 않습니다.\n"
   ]
  }
 ]
}