{ "nbformat": 4, "nbformat_minor": 2, "metadata": { "colab": { "name": "lesson_11-R.ipynb", "provenance": [], "collapsed_sections": [], "toc_visible": true }, "kernelspec": { "name": "ir", "display_name": "R" }, "language_info": { "name": "R" }, "coopTranslator": { "original_hash": "6ea6a5171b1b99b7b5a55f7469c048d2", "translation_date": "2025-09-03T20:24:29+00:00", "source_file": "4-Classification/2-Classifiers-1/solution/R/lesson_11-R.ipynb", "language_code": "zh" } }, "cells": [ { "cell_type": "markdown", "source": [ "# 构建分类模型:美味的亚洲和印度美食\n" ], "metadata": { "id": "zs2woWv_HoE8" } }, { "cell_type": "markdown", "source": [ "## 美食分类器 1\n", "\n", "在本课中,我们将探索多种分类器来*根据一组食材预测某种国家美食*。同时,我们将深入了解算法在分类任务中的一些应用方式。\n", "\n", "### [**课前测验**](https://gray-sand-07a10f403.1.azurestaticapps.net/quiz/21/)\n", "\n", "### **准备工作**\n", "\n", "本课基于我们[上一课](https://github.com/microsoft/ML-For-Beginners/blob/main/4-Classification/1-Introduction/solution/lesson_10-R.ipynb),其中我们:\n", "\n", "- 使用一个关于亚洲和印度各种美食的数据集进行了分类的简单介绍 😋。\n", "\n", "- 探索了一些 [dplyr 动词](https://dplyr.tidyverse.org/) 来准备和清理数据。\n", "\n", "- 使用 ggplot2 创建了漂亮的可视化图表。\n", "\n", "- 演示了如何通过使用 [recipes](https://recipes.tidymodels.org/articles/Simple_Example.html) 预处理数据来处理不平衡数据。\n", "\n", "- 演示了如何 `prep` 和 `bake` 我们的配方,以确认其能够正常工作。\n", "\n", "#### **前置条件**\n", "\n", "在本课中,我们需要以下包来清理、准备和可视化数据:\n", "\n", "- `tidyverse`: [tidyverse](https://www.tidyverse.org/) 是一个 [R 包集合](https://www.tidyverse.org/packages),旨在让数据科学更快、更简单、更有趣!\n", "\n", "- `tidymodels`: [tidymodels](https://www.tidymodels.org/) 框架是一个 [包集合](https://www.tidymodels.org/packages),用于建模和机器学习。\n", "\n", "- `themis`: [themis 包](https://themis.tidymodels.org/) 提供了额外的配方步骤,用于处理不平衡数据。\n", "\n", "- `nnet`: [nnet 包](https://cran.r-project.org/web/packages/nnet/nnet.pdf) 提供了用于估计具有单个隐藏层的前馈神经网络以及多项逻辑回归模型的函数。\n", "\n", "您可以通过以下方式安装这些包:\n" ], "metadata": { "id": "iDFOb3ebHwQC" } }, { "cell_type": "markdown", "source": [ "`install.packages(c(\"tidyverse\", \"tidymodels\", \"DataExplorer\", \"here\"))`\n", "\n", "或者,下面的脚本会检查您是否已安装完成本模块所需的包,并在缺少时为您安装。\n" ], "metadata": { "id": "4V85BGCjII7F" } }, { "cell_type": "code", "execution_count": 2, "source": [ "suppressWarnings(if (!require(\"pacman\"))install.packages(\"pacman\"))\r\n", "\r\n", "pacman::p_load(tidyverse, tidymodels, themis, here)" ], "outputs": [ { "output_type": "stream", "name": "stderr", "text": [ "Loading required package: pacman\n", "\n" ] } ], "metadata": { "colab": { "base_uri": "https://localhost:8080/" }, "id": "an5NPyyKIKNR", "outputId": "834d5e74-f4b8-49f9-8ab5-4c52ff2d7bc8" } }, { "cell_type": "markdown", "source": [ "## 1. 将数据分为训练集和测试集\n", "\n", "我们将从上一节课中选择几个步骤开始。\n", "\n", "### 使用 `dplyr::select()` 删除最常见的食材,这些食材容易在不同菜系之间造成混淆。\n", "\n", "谁不喜欢米饭、大蒜和姜呢!\n" ], "metadata": { "id": "0ax9GQLBINVv" } }, { "cell_type": "code", "execution_count": 3, "source": [ "# Load the original cuisines data\r\n", "df <- read_csv(file = \"https://raw.githubusercontent.com/microsoft/ML-For-Beginners/main/4-Classification/data/cuisines.csv\")\r\n", "\r\n", "# Drop id column, rice, garlic and ginger from our original data set\r\n", "df_select <- df %>% \r\n", " select(-c(1, rice, garlic, ginger)) %>%\r\n", " # Encode cuisine column as categorical\r\n", " mutate(cuisine = factor(cuisine))\r\n", "\r\n", "# Display new data set\r\n", "df_select %>% \r\n", " slice_head(n = 5)\r\n", "\r\n", "# Display distribution of cuisines\r\n", "df_select %>% \r\n", " count(cuisine) %>% \r\n", " arrange(desc(n))" ], "outputs": [ { "output_type": "stream", "name": "stderr", "text": [ "New names:\n", "* `` -> ...1\n", "\n", "\u001b[1m\u001b[1mRows: \u001b[1m\u001b[22m\u001b[34m\u001b[34m2448\u001b[34m\u001b[39m \u001b[1m\u001b[1mColumns: \u001b[1m\u001b[22m\u001b[34m\u001b[34m385\u001b[34m\u001b[39m\n", "\n", "\u001b[36m──\u001b[39m \u001b[1m\u001b[1mColumn specification\u001b[1m\u001b[22m \u001b[36m────────────────────────────────────────────────────────\u001b[39m\n", "\u001b[1mDelimiter:\u001b[22m \",\"\n", "\u001b[31mchr\u001b[39m (1): cuisine\n", "\u001b[32mdbl\u001b[39m (384): ...1, almond, angelica, anise, anise_seed, apple, apple_brandy, a...\n", "\n", "\n", "\u001b[36mℹ\u001b[39m Use \u001b[30m\u001b[47m\u001b[30m\u001b[47m`spec()`\u001b[47m\u001b[30m\u001b[49m\u001b[39m to retrieve the full column specification for this data.\n", "\u001b[36mℹ\u001b[39m Specify the column types or set \u001b[30m\u001b[47m\u001b[30m\u001b[47m`show_col_types = FALSE`\u001b[47m\u001b[30m\u001b[49m\u001b[39m to quiet this message.\n", "\n" ] }, { "output_type": "display_data", "data": { "text/plain": [ " cuisine almond angelica anise anise_seed apple apple_brandy apricot armagnac\n", "1 indian 0 0 0 0 0 0 0 0 \n", "2 indian 1 0 0 0 0 0 0 0 \n", "3 indian 0 0 0 0 0 0 0 0 \n", "4 indian 0 0 0 0 0 0 0 0 \n", "5 indian 0 0 0 0 0 0 0 0 \n", " artemisia ⋯ whiskey white_bread white_wine whole_grain_wheat_flour wine wood\n", "1 0 ⋯ 0 0 0 0 0 0 \n", "2 0 ⋯ 0 0 0 0 0 0 \n", "3 0 ⋯ 0 0 0 0 0 0 \n", "4 0 ⋯ 0 0 0 0 0 0 \n", "5 0 ⋯ 0 0 0 0 0 0 \n", " yam yeast yogurt zucchini\n", "1 0 0 0 0 \n", "2 0 0 0 0 \n", "3 0 0 0 0 \n", "4 0 0 0 0 \n", "5 0 0 1 0 " ], "text/markdown": [ "\n", "A tibble: 5 × 381\n", "\n", "| cuisine <fct> | almond <dbl> | angelica <dbl> | anise <dbl> | anise_seed <dbl> | apple <dbl> | apple_brandy <dbl> | apricot <dbl> | armagnac <dbl> | artemisia <dbl> | ⋯ ⋯ | whiskey <dbl> | white_bread <dbl> | white_wine <dbl> | whole_grain_wheat_flour <dbl> | wine <dbl> | wood <dbl> | yam <dbl> | yeast <dbl> | yogurt <dbl> | zucchini <dbl> |\n", "|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n", "| indian | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ⋯ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |\n", "| indian | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ⋯ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |\n", "| indian | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ⋯ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |\n", "| indian | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ⋯ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |\n", "| indian | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ⋯ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |\n", "\n" ], "text/latex": [ "A tibble: 5 × 381\n", "\\begin{tabular}{lllllllllllllllllllll}\n", " cuisine & almond & angelica & anise & anise\\_seed & apple & apple\\_brandy & apricot & armagnac & artemisia & ⋯ & whiskey & white\\_bread & white\\_wine & whole\\_grain\\_wheat\\_flour & wine & wood & yam & yeast & yogurt & zucchini\\\\\n", " & & & & & & & & & & ⋯ & & & & & & & & & & \\\\\n", "\\hline\n", "\t indian & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & ⋯ & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\\\\n", "\t indian & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & ⋯ & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\\\\n", "\t indian & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & ⋯ & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\\\\n", "\t indian & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & ⋯ & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\\\\n", "\t indian & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & ⋯ & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0\\\\\n", "\\end{tabular}\n" ], "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
A tibble: 5 × 381
cuisinealmondangelicaaniseanise_seedappleapple_brandyapricotarmagnacartemisiawhiskeywhite_breadwhite_winewhole_grain_wheat_flourwinewoodyamyeastyogurtzucchini
<fct><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl>
indian0000000000000000000
indian1000000000000000000
indian0000000000000000000
indian0000000000000000000
indian0000000000000000010
\n" ] }, "metadata": {} }, { "output_type": "display_data", "data": { "text/plain": [ " cuisine n \n", "1 korean 799\n", "2 indian 598\n", "3 chinese 442\n", "4 japanese 320\n", "5 thai 289" ], "text/markdown": [ "\n", "A tibble: 5 × 2\n", "\n", "| cuisine <fct> | n <int> |\n", "|---|---|\n", "| korean | 799 |\n", "| indian | 598 |\n", "| chinese | 442 |\n", "| japanese | 320 |\n", "| thai | 289 |\n", "\n" ], "text/latex": [ "A tibble: 5 × 2\n", "\\begin{tabular}{ll}\n", " cuisine & n\\\\\n", " & \\\\\n", "\\hline\n", "\t korean & 799\\\\\n", "\t indian & 598\\\\\n", "\t chinese & 442\\\\\n", "\t japanese & 320\\\\\n", "\t thai & 289\\\\\n", "\\end{tabular}\n" ], "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
A tibble: 5 × 2
cuisinen
<fct><int>
korean 799
indian 598
chinese 442
japanese320
thai 289
\n" ] }, "metadata": {} } ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 735 }, "id": "jhCrrH22IWVR", "outputId": "d444a85c-1d8b-485f-bc4f-8be2e8f8217c" } }, { "cell_type": "markdown", "source": [ "太棒了!现在是时候将数据分割为70%用于训练,30%用于测试。我们还将应用一种`分层`技术,在分割数据时`保持每种菜系的比例`在训练和验证数据集中。\n", "\n", "[rsample](https://rsample.tidymodels.org/)是Tidymodels中的一个包,它提供了高效的数据分割和重采样的基础设施:\n" ], "metadata": { "id": "AYTjVyajIdny" } }, { "cell_type": "code", "execution_count": 4, "source": [ "# Load the core Tidymodels packages into R session\r\n", "library(tidymodels)\r\n", "\r\n", "# Create split specification\r\n", "set.seed(2056)\r\n", "cuisines_split <- initial_split(data = df_select,\r\n", " strata = cuisine,\r\n", " prop = 0.7)\r\n", "\r\n", "# Extract the data in each split\r\n", "cuisines_train <- training(cuisines_split)\r\n", "cuisines_test <- testing(cuisines_split)\r\n", "\r\n", "# Print the number of cases in each split\r\n", "cat(\"Training cases: \", nrow(cuisines_train), \"\\n\",\r\n", " \"Test cases: \", nrow(cuisines_test), sep = \"\")\r\n", "\r\n", "# Display the first few rows of the training set\r\n", "cuisines_train %>% \r\n", " slice_head(n = 5)\r\n", "\r\n", "\r\n", "# Display distribution of cuisines in the training set\r\n", "cuisines_train %>% \r\n", " count(cuisine) %>% \r\n", " arrange(desc(n))" ], "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "Training cases: 1712\n", "Test cases: 736" ] }, { "output_type": "display_data", "data": { "text/plain": [ " cuisine almond angelica anise anise_seed apple apple_brandy apricot armagnac\n", "1 chinese 0 0 0 0 0 0 0 0 \n", "2 chinese 0 0 0 0 0 0 0 0 \n", "3 chinese 0 0 0 0 0 0 0 0 \n", "4 chinese 0 0 0 0 0 0 0 0 \n", "5 chinese 0 0 0 0 0 0 0 0 \n", " artemisia ⋯ whiskey white_bread white_wine whole_grain_wheat_flour wine wood\n", "1 0 ⋯ 0 0 0 0 1 0 \n", "2 0 ⋯ 0 0 0 0 1 0 \n", "3 0 ⋯ 0 0 0 0 0 0 \n", "4 0 ⋯ 0 0 0 0 0 0 \n", "5 0 ⋯ 0 0 0 0 0 0 \n", " yam yeast yogurt zucchini\n", "1 0 0 0 0 \n", "2 0 0 0 0 \n", "3 0 0 0 0 \n", "4 0 0 0 0 \n", "5 0 0 0 0 " ], "text/markdown": [ "\n", "A tibble: 5 × 381\n", "\n", "| cuisine <fct> | almond <dbl> | angelica <dbl> | anise <dbl> | anise_seed <dbl> | apple <dbl> | apple_brandy <dbl> | apricot <dbl> | armagnac <dbl> | artemisia <dbl> | ⋯ ⋯ | whiskey <dbl> | white_bread <dbl> | white_wine <dbl> | whole_grain_wheat_flour <dbl> | wine <dbl> | wood <dbl> | yam <dbl> | yeast <dbl> | yogurt <dbl> | zucchini <dbl> |\n", "|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n", "| chinese | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ⋯ | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |\n", "| chinese | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ⋯ | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |\n", "| chinese | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ⋯ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |\n", "| chinese | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ⋯ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |\n", "| chinese | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ⋯ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |\n", "\n" ], "text/latex": [ "A tibble: 5 × 381\n", "\\begin{tabular}{lllllllllllllllllllll}\n", " cuisine & almond & angelica & anise & anise\\_seed & apple & apple\\_brandy & apricot & armagnac & artemisia & ⋯ & whiskey & white\\_bread & white\\_wine & whole\\_grain\\_wheat\\_flour & wine & wood & yam & yeast & yogurt & zucchini\\\\\n", " & & & & & & & & & & ⋯ & & & & & & & & & & \\\\\n", "\\hline\n", "\t chinese & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & ⋯ & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0\\\\\n", "\t chinese & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & ⋯ & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0\\\\\n", "\t chinese & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & ⋯ & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\\\\n", "\t chinese & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & ⋯ & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\\\\n", "\t chinese & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & ⋯ & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\\\\n", "\\end{tabular}\n" ], "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
A tibble: 5 × 381
cuisinealmondangelicaaniseanise_seedappleapple_brandyapricotarmagnacartemisiawhiskeywhite_breadwhite_winewhole_grain_wheat_flourwinewoodyamyeastyogurtzucchini
<fct><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl><dbl>
chinese0000000000000100000
chinese0000000000000100000
chinese0000000000000000000
chinese0000000000000000000
chinese0000000000000000000
\n" ] }, "metadata": {} }, { "output_type": "display_data", "data": { "text/plain": [ " cuisine n \n", "1 korean 559\n", "2 indian 418\n", "3 chinese 309\n", "4 japanese 224\n", "5 thai 202" ], "text/markdown": [ "\n", "A tibble: 5 × 2\n", "\n", "| cuisine <fct> | n <int> |\n", "|---|---|\n", "| korean | 559 |\n", "| indian | 418 |\n", "| chinese | 309 |\n", "| japanese | 224 |\n", "| thai | 202 |\n", "\n" ], "text/latex": [ "A tibble: 5 × 2\n", "\\begin{tabular}{ll}\n", " cuisine & n\\\\\n", " & \\\\\n", "\\hline\n", "\t korean & 559\\\\\n", "\t indian & 418\\\\\n", "\t chinese & 309\\\\\n", "\t japanese & 224\\\\\n", "\t thai & 202\\\\\n", "\\end{tabular}\n" ], "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
A tibble: 5 × 2
cuisinen
<fct><int>
korean 559
indian 418
chinese 309
japanese224
thai 202
\n" ] }, "metadata": {} } ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 535 }, "id": "w5FWIkEiIjdN", "outputId": "2e195fd9-1a8f-4b91-9573-cce5582242df" } }, { "cell_type": "markdown", "source": [ "## 2. 处理数据不平衡\n", "\n", "正如你可能在原始数据集以及我们的训练集里注意到的,菜系的数量分布非常不均衡。韩餐的数量几乎是泰餐的*三倍*。数据不平衡通常会对模型性能产生负面影响。许多模型在观察数量相等时表现最佳,因此在处理不平衡数据时往往会遇到困难。\n", "\n", "处理数据不平衡主要有两种方法:\n", "\n", "- 为少数类别增加观察值:`过采样`,例如使用 SMOTE 算法,该算法通过少数类别的邻近样本合成生成新的样本。\n", "\n", "- 从多数类别中移除观察值:`欠采样`\n", "\n", "在之前的课程中,我们演示了如何使用 `recipe` 来处理数据不平衡问题。`recipe` 可以被看作是一个蓝图,描述了应该对数据集应用哪些步骤以使其准备好进行数据分析。在我们的案例中,我们希望在 `训练集` 中实现菜系数量的均衡分布。让我们直接开始吧。\n" ], "metadata": { "id": "daBi9qJNIwqW" } }, { "cell_type": "code", "execution_count": 5, "source": [ "# Load themis package for dealing with imbalanced data\r\n", "library(themis)\r\n", "\r\n", "# Create a recipe for preprocessing training data\r\n", "cuisines_recipe <- recipe(cuisine ~ ., data = cuisines_train) %>% \r\n", " step_smote(cuisine)\r\n", "\r\n", "# Print recipe\r\n", "cuisines_recipe" ], "outputs": [ { "output_type": "display_data", "data": { "text/plain": [ "Data Recipe\n", "\n", "Inputs:\n", "\n", " role #variables\n", " outcome 1\n", " predictor 380\n", "\n", "Operations:\n", "\n", "SMOTE based on cuisine" ] }, "metadata": {} } ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 200 }, "id": "Az6LFBGxI1X0", "outputId": "29d71d85-64b0-4e62-871e-bcd5398573b6" } }, { "cell_type": "markdown", "source": [ "您当然可以通过准备和烘焙来确认这份配方是否如预期般有效——所有标有“559”观察值的菜系标签。\n", "\n", "由于我们将使用这份配方作为建模的预处理器,`workflow()`将为我们完成所有的准备和烘焙工作,因此我们无需手动估算配方。\n", "\n", "现在我们准备开始训练模型了 👩‍💻👨‍💻!\n", "\n", "## 3. 选择您的分类器\n", "\n", "

\n", " \n", "

插画作者:@allison_horst
\n" ], "metadata": { "id": "NBL3PqIWJBBB" } }, { "cell_type": "markdown", "source": [ "现在我们需要决定使用哪种算法来完成任务 🤔。\n", "\n", "在 Tidymodels 中,[`parsnip package`](https://parsnip.tidymodels.org/index.html) 提供了一个一致的接口,用于跨不同引擎(包)处理模型。请参阅 parsnip 文档,探索[模型类型和引擎](https://www.tidymodels.org/find/parsnip/#models)及其对应的[模型参数](https://www.tidymodels.org/find/parsnip/#model-args)。乍一看,种类繁多令人眼花缭乱。例如,以下方法都包括分类技术:\n", "\n", "- C5.0规则分类模型\n", "\n", "- 灵活判别模型\n", "\n", "- 线性判别模型\n", "\n", "- 正则化判别模型\n", "\n", "- 逻辑回归模型\n", "\n", "- 多项式回归模型\n", "\n", "- 朴素贝叶斯模型\n", "\n", "- 支持向量机\n", "\n", "- 最近邻算法\n", "\n", "- 决策树\n", "\n", "- 集成方法\n", "\n", "- 神经网络\n", "\n", "这个列表还在继续!\n", "\n", "### **选择哪种分类器?**\n", "\n", "那么,应该选择哪种分类器呢?通常,通过尝试多个分类器并寻找效果较好的结果是一种测试方法。\n", "\n", "> AutoML 通过在云端运行这些比较,巧妙地解决了这个问题,让你可以选择最适合数据的算法。试试 [这里](https://docs.microsoft.com/learn/modules/automate-model-selection-with-azure-automl/?WT.mc_id=academic-77952-leestott)\n", "\n", "此外,分类器的选择取决于我们的问题。例如,当结果可以被分类为`多于两个类别`时,就像我们的情况一样,你必须使用`多分类算法`而不是`二分类算法`。\n", "\n", "### **更好的方法**\n", "\n", "比盲目猜测更好的方法是参考这个可下载的[机器学习速查表](https://docs.microsoft.com/azure/machine-learning/algorithm-cheat-sheet?WT.mc_id=academic-77952-leestott)。在这里,我们发现,对于我们的多分类问题,我们有一些选择:\n", "\n", "

\n", " \n", "

微软算法速查表的一部分,详细介绍了多分类选项
\n" ], "metadata": { "id": "a6DLAZ3vJZ14" } }, { "cell_type": "markdown", "source": [ "### **推理**\n", "\n", "让我们根据现有的约束条件来分析不同的方法:\n", "\n", "- **深度神经网络过于复杂**。考虑到我们数据集虽然干净但规模较小,并且我们是在本地通过笔记本运行训练,深度神经网络对于这个任务来说过于笨重。\n", "\n", "- **不使用二分类分类器**。我们不使用二分类分类器,因此排除了“一对多”的方法。\n", "\n", "- **决策树或逻辑回归可能适用**。决策树可能有效,或者可以使用多项式回归/多分类逻辑回归来处理多分类数据。\n", "\n", "- **多分类提升决策树解决的是不同的问题**。多分类提升决策树最适合非参数任务,例如用于构建排名的任务,因此对我们来说并不适用。\n", "\n", "此外,通常在尝试更复杂的机器学习模型(例如集成方法)之前,构建一个最简单的模型来了解数据情况是一个好主意。因此,在本课程中,我们将从一个`多项式回归`模型开始。\n", "\n", "> 逻辑回归是一种用于结果变量是分类(或名义)时的技术。对于二元逻辑回归,结果变量的数量是两个,而对于多项式逻辑回归,结果变量的数量超过两个。有关更多信息,请参阅[高级回归方法](https://bookdown.org/chua/ber642_advanced_regression/multinomial-logistic-regression.html)。\n", "\n", "## 4. 训练并评估一个多项式逻辑回归模型\n", "\n", "在Tidymodels中,`parsnip::multinom_reg()`定义了一个使用线性预测器通过多项式分布预测多分类数据的模型。有关使用此模型的不同方法/引擎,请参阅`?multinom_reg()`。\n", "\n", "在这个示例中,我们将通过默认的[nnet](https://cran.r-project.org/web/packages/nnet/nnet.pdf)引擎来拟合一个多项式回归模型。\n", "\n", "> 我随机选择了一个`penalty`值。实际上有更好的方法来选择这个值,例如通过`重采样`和`调参`模型,这些内容我们稍后会讨论。\n", ">\n", "> 如果您想了解更多关于如何调节模型超参数的信息,请参阅[Tidymodels: 入门](https://www.tidymodels.org/start/tuning/)。\n" ], "metadata": { "id": "gWMsVcbBJemu" } }, { "cell_type": "code", "execution_count": 6, "source": [ "# Create a multinomial regression model specification\r\n", "mr_spec <- multinom_reg(penalty = 1) %>% \r\n", " set_engine(\"nnet\", MaxNWts = 2086) %>% \r\n", " set_mode(\"classification\")\r\n", "\r\n", "# Print model specification\r\n", "mr_spec" ], "outputs": [ { "output_type": "display_data", "data": { "text/plain": [ "Multinomial Regression Model Specification (classification)\n", "\n", "Main Arguments:\n", " penalty = 1\n", "\n", "Engine-Specific Arguments:\n", " MaxNWts = 2086\n", "\n", "Computational engine: nnet \n" ] }, "metadata": {} } ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 166 }, "id": "Wq_fcyQiJvfG", "outputId": "c30449c7-3864-4be7-f810-72a003743e2d" } }, { "cell_type": "markdown", "source": [ "干得好 🥳!现在我们已经有了一个配方和一个模型规范,我们需要找到一种方法将它们打包到一个对象中,这个对象首先会对数据进行预处理,然后将模型拟合到预处理后的数据上,同时还允许进行潜在的后处理操作。在 Tidymodels 中,这个方便的对象叫做 [`workflow`](https://workflows.tidymodels.org/),它可以方便地保存你的建模组件!这在 *Python* 中我们称之为 *pipelines*。\n", "\n", "那么,让我们把所有内容打包到一个 workflow 中吧!📦\n" ], "metadata": { "id": "NlSbzDfgJ0zh" } }, { "cell_type": "code", "execution_count": 7, "source": [ "# Bundle recipe and model specification\r\n", "mr_wf <- workflow() %>% \r\n", " add_recipe(cuisines_recipe) %>% \r\n", " add_model(mr_spec)\r\n", "\r\n", "# Print out workflow\r\n", "mr_wf" ], "outputs": [ { "output_type": "display_data", "data": { "text/plain": [ "══ Workflow ════════════════════════════════════════════════════════════════════\n", "\u001b[3mPreprocessor:\u001b[23m Recipe\n", "\u001b[3mModel:\u001b[23m multinom_reg()\n", "\n", "── Preprocessor ────────────────────────────────────────────────────────────────\n", "1 Recipe Step\n", "\n", "• step_smote()\n", "\n", "── Model ───────────────────────────────────────────────────────────────────────\n", "Multinomial Regression Model Specification (classification)\n", "\n", "Main Arguments:\n", " penalty = 1\n", "\n", "Engine-Specific Arguments:\n", " MaxNWts = 2086\n", "\n", "Computational engine: nnet \n" ] }, "metadata": {} } ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 333 }, "id": "Sc1TfPA4Ke3_", "outputId": "82c70013-e431-4e7e-cef6-9fcf8aad4a6c" } }, { "cell_type": "markdown", "source": [ "工作流 👌👌!一个 **`workflow()`** 可以像模型一样进行拟合。所以,是时候训练一个模型了!\n" ], "metadata": { "id": "TNQ8i85aKf9L" } }, { "cell_type": "code", "execution_count": 8, "source": [ "# Train a multinomial regression model\n", "mr_fit <- fit(object = mr_wf, data = cuisines_train)\n", "\n", "mr_fit" ], "outputs": [ { "output_type": "display_data", "data": { "text/plain": [ "══ Workflow [trained] ══════════════════════════════════════════════════════════\n", "\u001b[3mPreprocessor:\u001b[23m Recipe\n", "\u001b[3mModel:\u001b[23m multinom_reg()\n", "\n", "── Preprocessor ────────────────────────────────────────────────────────────────\n", "1 Recipe Step\n", "\n", "• step_smote()\n", "\n", "── Model ───────────────────────────────────────────────────────────────────────\n", "Call:\n", "nnet::multinom(formula = ..y ~ ., data = data, decay = ~1, MaxNWts = ~2086, \n", " trace = FALSE)\n", "\n", "Coefficients:\n", " (Intercept) almond angelica anise anise_seed apple\n", "indian 0.19723325 0.2409661 0 -5.004955e-05 -0.1657635 -0.05769734\n", "japanese 0.13961959 -0.6262400 0 -1.169155e-04 -0.4893596 -0.08585717\n", "korean 0.22377347 -0.1833485 0 -5.560395e-05 -0.2489401 -0.15657804\n", "thai -0.04336577 -0.6106258 0 4.903828e-04 -0.5782866 0.63451105\n", " apple_brandy apricot armagnac artemisia artichoke asparagus\n", "indian 0 0.37042636 0 -0.09122797 0 -0.27181970\n", "japanese 0 0.28895643 0 -0.12651100 0 0.14054037\n", "korean 0 -0.07981259 0 0.55756709 0 -0.66979948\n", "thai 0 -0.33160904 0 -0.10725182 0 -0.02602152\n", " avocado bacon baked_potato balm banana barley\n", "indian -0.46624197 0.16008055 0 0 -0.2838796 0.2230625\n", "japanese 0.90341344 0.02932727 0 0 -0.4142787 2.0953906\n", "korean -0.06925382 -0.35804134 0 0 -0.2686963 -0.7233404\n", "thai -0.21473955 -0.75594439 0 0 0.6784880 -0.4363320\n", " bartlett_pear basil bay bean beech\n", "indian 0 -0.7128756 0.1011587 -0.8777275 -0.0004380795\n", "japanese 0 0.1288697 0.9425626 -0.2380748 0.3373437611\n", "korean 0 -0.2445193 -0.4744318 -0.8957870 -0.0048784496\n", "thai 0 1.5365848 0.1333256 0.2196970 -0.0113078024\n", " beef beef_broth beef_liver beer beet\n", "indian -0.7985278 0.2430186 -0.035598065 -0.002173738 0.01005813\n", "japanese 0.2241875 -0.3653020 -0.139551027 0.128905553 0.04923911\n", "korean 0.5366515 -0.6153237 0.213455197 -0.010828645 0.27325423\n", "thai 0.1570012 -0.9364154 -0.008032213 -0.035063746 -0.28279823\n", " bell_pepper bergamot berry bitter_orange black_bean\n", "indian 0.49074330 0 0.58947607 0.191256164 -0.1945233\n", "japanese 0.09074167 0 -0.25917977 -0.118915977 -0.3442400\n", "korean -0.57876763 0 -0.07874180 -0.007729435 -0.5220672\n", "thai 0.92554006 0 -0.07210196 -0.002983296 -0.4614426\n", " black_currant black_mustard_seed_oil black_pepper black_raspberry\n", "indian 0 0.38935801 -0.4453495 0\n", "japanese 0 -0.05452887 -0.5440869 0\n", "korean 0 -0.03929970 0.8025454 0\n", "thai 0 -0.21498372 -0.9854806 0\n", " black_sesame_seed black_tea blackberry blackberry_brandy\n", "indian -0.2759246 0.3079977 0.191256164 0\n", "japanese -0.6101687 -0.1671913 -0.118915977 0\n", "korean 1.5197674 -0.3036261 -0.007729435 0\n", "thai -0.1755656 -0.1487033 -0.002983296 0\n", " blue_cheese blueberry bone_oil bourbon_whiskey brandy\n", "indian 0 0.216164294 -0.2276744 0 0.22427587\n", "japanese 0 -0.119186087 0.3913019 0 -0.15595599\n", "korean 0 -0.007821986 0.2854487 0 -0.02562342\n", "thai 0 -0.004947048 -0.0253658 0 -0.05715244\n", "\n", "...\n", "and 308 more lines." ] }, "metadata": {} } ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 1000 }, "id": "GMbdfVmTKkJI", "outputId": "adf9ebdf-d69d-4a64-e9fd-e06e5322292e" } }, { "cell_type": "markdown", "source": [ "输出显示了模型在训练过程中学习到的系数。\n", "\n", "### 评估训练好的模型\n", "\n", "现在是时候通过在测试集上评估模型来看看它的表现了 📏!让我们从对测试集进行预测开始吧。\n" ], "metadata": { "id": "tt2BfOxrKmcJ" } }, { "cell_type": "code", "execution_count": 9, "source": [ "# Make predictions on the test set\n", "results <- cuisines_test %>% select(cuisine) %>% \n", " bind_cols(mr_fit %>% predict(new_data = cuisines_test))\n", "\n", "# Print out results\n", "results %>% \n", " slice_head(n = 5)" ], "outputs": [ { "output_type": "display_data", "data": { "text/plain": [ " cuisine .pred_class\n", "1 indian thai \n", "2 indian indian \n", "3 indian indian \n", "4 indian indian \n", "5 indian indian " ], "text/markdown": [ "\n", "A tibble: 5 × 2\n", "\n", "| cuisine <fct> | .pred_class <fct> |\n", "|---|---|\n", "| indian | thai |\n", "| indian | indian |\n", "| indian | indian |\n", "| indian | indian |\n", "| indian | indian |\n", "\n" ], "text/latex": [ "A tibble: 5 × 2\n", "\\begin{tabular}{ll}\n", " cuisine & .pred\\_class\\\\\n", " & \\\\\n", "\\hline\n", "\t indian & thai \\\\\n", "\t indian & indian\\\\\n", "\t indian & indian\\\\\n", "\t indian & indian\\\\\n", "\t indian & indian\\\\\n", "\\end{tabular}\n" ], "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
A tibble: 5 × 2
cuisine.pred_class
<fct><fct>
indianthai
indianindian
indianindian
indianindian
indianindian
\n" ] }, "metadata": {} } ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 248 }, "id": "CqtckvtsKqax", "outputId": "e57fe557-6a68-4217-fe82-173328c5436d" } }, { "cell_type": "markdown", "source": [ "干得好!在Tidymodels中,可以使用[yardstick](https://yardstick.tidymodels.org/)评估模型性能——这是一个通过性能指标来衡量模型效果的工具包。正如我们在逻辑回归课程中所做的那样,让我们从计算混淆矩阵开始。\n" ], "metadata": { "id": "8w5N6XsBKss7" } }, { "cell_type": "code", "execution_count": 10, "source": [ "# Confusion matrix for categorical data\n", "conf_mat(data = results, truth = cuisine, estimate = .pred_class)\n" ], "outputs": [ { "output_type": "display_data", "data": { "text/plain": [ " Truth\n", "Prediction chinese indian japanese korean thai\n", " chinese 83 1 8 15 10\n", " indian 4 163 1 2 6\n", " japanese 21 5 73 25 1\n", " korean 15 0 11 191 0\n", " thai 10 11 3 7 70" ] }, "metadata": {} } ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 133 }, "id": "YvODvsLkK0iG", "outputId": "bb69da84-1266-47ad-b174-d43b88ca2988" } }, { "cell_type": "markdown", "source": [ "当处理多个类别时,通常更直观的方式是将其可视化为热图,如下所示:\n" ], "metadata": { "id": "c0HfPL16Lr6U" } }, { "cell_type": "code", "execution_count": 11, "source": [ "update_geom_defaults(geom = \"tile\", new = list(color = \"black\", alpha = 0.7))\n", "# Visualize confusion matrix\n", "results %>% \n", " conf_mat(cuisine, .pred_class) %>% \n", " autoplot(type = \"heatmap\")" ], "outputs": [ { "output_type": "display_data", "data": { "text/plain": [ "plot without title" ], "image/png": "iVBORw0KGgoAAAANSUhEUgAAA0gAAANICAMAAADKOT/pAAADAFBMVEUAAAABAQECAgIDAwMEBAQFBQUGBgYHBwcICAgJCQkKCgoLCwsMDAwNDQ0ODg4PDw8QEBARERESEhITExMUFBQVFRUWFhYXFxcYGBgZGRkaGhobGxscHBwdHR0eHh4fHx8gICAhISEiIiIjIyMkJCQlJSUmJiYnJycoKCgpKSkqKiorKyssLCwtLS0uLi4vLy8wMDAxMTEyMjIzMzM0NDQ1NTU2NjY3Nzc4ODg5OTk6Ojo7Ozs8PDw9PT0+Pj4/Pz9AQEBBQUFCQkJDQ0NERERFRUVGRkZHR0dISEhJSUlKSkpLS0tMTExNTU1OTk5PT09QUFBRUVFSUlJTU1NUVFRVVVVWVlZXV1dYWFhZWVlaWlpbW1tcXFxdXV1eXl5fX19gYGBhYWFiYmJjY2NkZGRlZWVmZmZnZ2doaGhpaWlqampra2tsbGxtbW1ubm5vb29wcHBxcXFycnJzc3N0dHR1dXV2dnZ3d3d4eHh5eXl6enp7e3t8fHx9fX1+fn5/f3+AgICBgYGCgoKDg4OEhISFhYWGhoaHh4eIiIiJiYmKioqLi4uMjIyNjY2Ojo6Pj4+QkJCRkZGSkpKTk5OUlJSVlZWWlpaXl5eYmJiZmZmampqbm5ucnJydnZ2enp6fn5+goKChoaGioqKjo6OkpKSlpaWmpqanp6eoqKipqamqqqqrq6usrKytra2urq6vr6+wsLCxsbGysrKzs7O0tLS1tbW2tra3t7e4uLi5ubm6urq7u7u8vLy9vb2+vr6/v7/AwMDBwcHCwsLDw8PExMTFxcXGxsbHx8fIyMjJycnKysrLy8vMzMzNzc3Ozs7Pz8/Q0NDR0dHS0tLT09PU1NTV1dXW1tbX19fY2NjZ2dna2trb29vc3Nzd3d3e3t7f39/g4ODh4eHi4uLj4+Pk5OTl5eXm5ubn5+fo6Ojp6enq6urr6+vs7Ozt7e3u7u7v7+/w8PDx8fHy8vLz8/P09PT19fX29vb39/f4+Pj5+fn6+vr7+/v8/Pz9/f3+/v7////isF19AAAACXBIWXMAABJ0AAASdAHeZh94AAAgAElEQVR4nO3deWBU9b3//0+ibApWrbYuvYorXaxoaatWvVqpqG2HsCmLBAqoVXBDjCKbKMqOQUDFFVxKqyhVFLUqWKJsxg3Lz2IFGilLiEqptMX0hpzvnJkMCbx5/W5vz5k5Z+D5/OOc85nEz3w8Mw9mMjmo84gocC7qBRDtCQGJKISARBRCQCIKISARhRCQiEIISEQhBCSiEAISUQgBiSiEgEQUQkAiCiEgEYUQkIhCCEhEIQQkohACElEIAYkohIBEFEJAIgohIBGFEJCIQijHkLb+NUZVRb2Ahn26OeoVNCxWpyZWi/mbeGbnGNJl42PUmbNiVM+7nohRlz4ao/o/HqOuFc/sHEMasShGdfwsRt3+1qYYNXJ9jLq9MkZNE89sIMUkIMmAZAOSCkgyINmApAKSDEg2IKmAJAOSDUgqIMmAZAOSCkgyINmApAKSDEg2IKmAJAOSDUgqIMmAZAOSCkgyINmApAKSDEg2IKmAJAOSDUgqIMmAZAOSCkgyINmApAKSDEg2IKmAJAOSDUgqIMmAZAOSCkgyINmApAKSDEg2IKmAJAOSDUgqIMmAZAOSCkgyINmApAKSDEg2IKmAJAOSDUgqIMmAZAOSCkgyINmApAKSDEg2IKmAJAOSDUgqIMmAZAOSCkgyINmApAKSDEg2IKmAJAOSDUgqIMmAZAOSCkgyINmApAKSDEg2IKmAJAOSDUgqIMmAZAOSCkgyINmApAKSDEg2IKmAJAOSDUgqIMmAZAOSCkgyINmApAKSDEg2IKmAJAOSDUgqIMmAZAOSCkgyINmApAKSDEg2IKmAJAOSDUgqIMmAZAOSCkgyINmApAKSDEg2IKmAJAOSDUgqIMmAZAOSCkgyINmApAKSDEg2IKmAJAOSDUgqIMmAZAOSCkgyINmApAKSDEg2IKmAJAOSDUgqIMmAZAOSCkgyINmApAKSDEg2IKmAJAOSDUgqIMmAZAOSCkgyINmApAKSbI+EVLQktatJvB8FpNGtv9LoqMteTx7d/f2vND6h5M3oIS07xT0XxjyhQJpx+sGNj79pbfCJgkJ6o7V72t/f4FKdFSmkRa3dnNTBgnYHNPneY7GCVPvB1ggg3ezaTZraq+C8RYvGF7a64cbW7rLIIU1sdmR8IE1ynX4957qC9pFDGtvsiDSkywsn+D0eJaRxycWkIC1pcdzYSecUzIwTpP+0YJBOONJ/CTqncP6iI49YsGjRwqMOjhrSS03GTY0PpJNaVia3P92nImJIc5vcWZqG1LVFoInCgPRCkzF3pyF1bLa8snLdd1pGC+nTOy8uvvdLr+iVEZ2KF/hv7WoTC0f07zvf8zaP79Vl8CrPe+2qzsX3Vu8YZgHS8cf624sKF5RdO84/+plbEDGk8oWfxQjSt7/pb7vu80ngmYJBWvTa+jpIPz0ickhLFlSmIa1vVuSPR7lXI4V0w9jN6wdM94qu+fCfj3XZ5v+MVDRwi/dKl23eoPFfVD/es3pj+/e3b7xudmaYDUjD3C+fmz+6aee64Zsnfz3QdOF82BAjSFPckOV/nrFfv+AzBf6woQ7SWa3Wr18dLaRkaUiL3FB/MMdNjhLS6sTG5KbcK3ra8zYmKlKQ5nrepsQnqxKbkz8zdStblVjtedu9zDD5zyxpn+wPIUJadFsz5wp7pz5i+P1vH2i372gg7dT9+yfPz/WVwScKC1Lrlh0PdAddvyYOkJ5zd/mDN9ywKCG92b42tS9anHwvl/g4BSl9WJZINbv2ng4ls9Z7mWHye9/4cbL3QoR0T/MzRt91SWHqI4bJzh0+KdBsex6kZw/4yYzfXL7PzfGB1LKw20P3F7mL4gDpSTfVHyxzN0YJaVH77WlIS+ohpQ+XJjJv4zbNG9mhrH64uwJBeuPwE/0Xo66FTya3L44f2ragF5AatPGo7/ovRlcULo0NpLff87dd3ZwYQHrOTfIHZW54lJDWJCo876MXdgNpbWJl8usbvZotyd30wZlhFiA97VJwJrjMLL9wDwGpvrfddf7uCXdPbCCle8IFmi8kSEvcLf7gqfQLU1SQvEEjKtddd+9uIHlDS6pqXuzy+at9Pq7dPGRKZpgVSD383Wg3+PkbHkyTugVI9ZW7/v7uEXdXbCCtXOlv73fjYgBpQ4uf+4MhrixSSFvu6NJz2rbdQdo8ruslJSu82ll9Ova6+++ZYRYgvdH8mDeSuw7usRcLT/WPLnGTgVTfxq+02pjc9Xa/jwukdwsv9AfnFbwRA0iVlzZ5p7Jy7bHfDjTXHnGt3UB32qgJFxe2XbSo2H332pLzC77zRsSQ5pWW9nADSkvfiQOkTXe6Hz/wxOWFRcFnCgbp2QkTurorJ0xYvL6Paztu1OmuX6DpgkGaO2lSN9d/0qRlle8dfPTQO37QaA6QFo06qWmjlleWLVr0Zkmrps2O7fFqoNlCgNQ7fS2ZezAWkDY9+P39Gp8wZH3wiYJB6ll3Vu5dv3ZM6xZNT5kYaLaAkHrVLWZ6ZeWiC1s0Pe2ZQLPtIZDCjau/ZVz9rQKSDUgyIKmAZAOSDEgqINmAJAOSCkg2IMmApAKSDUgyIKmAZAOSDEgqINmAJAOSCkg2IMmApAKSDUgyIKmAZAOSDEgqINmAJAOSCkg2IMmApAKSDUgyIKmAZAOSDEgqINmAJAOSCkg2IMmApAKSDUgyIKmAZAOSDEgqINmAJAOSCkg2IMmApAKSDUgyIKmAZAOSDEgqINmAJAOSCkg2IMmApAKSDUgyIKmAZAOSDEgqINmAJAOSCkg2IMmApAKSDUgyIKmAZAOSDEgqINmAJAOSCkg2IMmApAKSDUgyIKmAZAOSDEgqINmAJAOSCkg2IMmApAKSDUgyIKmAZAOSDEgqINmAJAOSCkg2IMmApAKSDUgyIKmAZAOSDEgqINmAJAOSCkg2IMmApAKSDUgyIKmAZAOSDEgqINmAJAOSCkg2IMmApAKSDUgyIKmAZAOSDEgqINmAJAOSCkg2IMmApAKSDUgyIKmAZAOSDEgqINmAJAOSCkg2IMmApAKSDUgyIKmAZAOSDEgqINmAJAOSCkg2IMmApAKSDUgyIKmAZAOSDEgqINmAJAOSCkg2IMmApAKSDUgyIKmAZAOSDEiqmEDqOSJGnXJfjOpw+z0xqtOUGNU76rPRsMvEMzvHkEa+GqMuHBajznv8tRg1IOoFNGzgKzFqiHhm5xjSvZ/GqF/MjFHd3on6zWXD7ox6AQ27oypG3Sue2UCKSUCSAckGJBWQZECyAUkFJBmQbEBSAUkGJBuQVECSAckGJBWQZECyAUkFJBmQbEBSAUkGJBuQVECSAckGJBWQZECyAUkFJBmQbEBSAUkGJBuQVECSAckGJBWQZECyAUkFJBmQbEBSAUkGJBuQVECSAckGJBWQZECyAUkFJBmQbEBSAUkGJBuQVECSAckGJBWQZECyAUkFJBmQbEBSAUkGJBuQVECSAckGJBWQZECyAUkFJBmQbEBSAUkGJBuQVECSAckGJBWQZECyAUkFJBmQbEBSAUkGJBuQVECSAckGJBWQZECyAUkFJBmQbEBSAUkGJBuQVECSAckGJBWQZECyAUkFJBmQbEBSAUkGJBuQVECSAckGJBWQZECyAUkFJBmQbEBSAUkGJBuQVECSAckGJBWQZECyAUkFJBmQbEBSAUkGJBuQVECSAckGJBWQZECyAUkFJBmQbEBSAUkGJBuQVECSAckGJBWQZECyAUkFJBmQbEBSAUkGJBuQVECSAckGJBWQZECyAUkFJBmQbEBSAUkGJBuQVECSAckGJBWQZECyAUkFJBmQbEBSAUkGJBuQVECSAckGJBWQZECyhQTpraYHhzDLL/7zp/3YY9zg1MFDHQ7Z92sXz0ge3fTt5o2O6j0jWkjLTnHPhTFPGJDmt23e/OTSquAThQFpXcl/NW45bFPwiUKEVJN4f6fxpkTFrjdlGVLVmS5aSL0bH1wH6QeFF155puswc+b1BUf37HWi6xQppInNjowNpJcbtbx90jnuluAzhQEpsc9V0y9xJcEnChFS7Qdbd4W0601ZhjSp8bmRQhrWqPiyNKQS1y25/f43Z8z82qEPzJz58GEHRAnppSbjpsYG0o8O+ONnn1V9Z7+NgWcKAdJsd1ty+/Mzg78kZfGtXRLSv/29oUD6wwEll0YKadyomXWQftT0ofRND/e4zt+d7R6IEFL5ws/iA2nydH/bx/0p8EwhQOrSfF3wSVKF+9auNrFwRP++8z1v9aAuVy9Mv7WrGN6964gN3o4vZQ/SRSeujxZSsjpIh540c2aDH4tmnPDV/3TCkD5siA+kdOceGnyOECAdfW5VVWXwaapC/xmpaOAW75Uu22r7lW6rGpKGdGXptn+MKfEyX8oepIcK5n0aE0gzCs7t8/WC/S9KvQw9dNfw0/e5BkgNe9jdHnyS4JA2FfaadEzBQf0/iR+kuf5buk/+mNjoeUvSkLZ+6XmLO9RmvpT8xgVtkr0dNqQ/HdL307hAut8deuxVN15Y0Ma/qcS5Q274jyfcIyH9utlFsfjUrsId9b0Hnrqq8Gfxg7TY8zYnPi5rv93zPklDWj6kuLhboibzpeQ3lvdM9mHYkLoeviY2kB50zacndz9xtya3U6+/7LSCBJDqG7dPpw0hTBMc0l/cQWuSu8vcK7GDtCSlZX77Ws9bk4K0odPsam+pD2lJBtJuCg7pqYKHKyoquh1csS4GkGY2+6a/HeT61N3cPkUKSKmudIM+DWOeEH5GanGmv/2NuyvwTNmBtDxR6XllKUhlRTWe92j2IfVzdZ0fB0itDvO317orphQP948Gur5AqmtgQWkIs3wWCqQzjve3j7l7As+UHUjVPUq3rrs5BWllYsW/Fg5OVGUb0tsv+LU74IU34wCplytJbs8oHD+14Jv+p3ftUmMgJXvahfWJRQiQxrunk9su+7wVeKbsQPI+ur7z1e8k/uzfNKN7jylbB3bblGVI6aL9GWlonz5nu4v69Jkw86GWTdr3+6E7f+bMn7nju/c+veC4//QaoTAgzSst7eEGlJYGnyq4gcrjDipN9V7gqUKAtK71foPuLnKXB59pz7rWLmJIP657d3nVzJn3tv3KPocVJ/XM6H1046bfuGj6fzpnGJB6163rwcAzBYf0UeYt+GOBpwrjEqGP+3yt0XFj43WtXZC4+lvF1d8yrv62AUkFJBmQbEBSAUkGJBuQVECSAckGJBWQZECyAUkFJBmQbEBSAUkGJBuQVECSAckGJBWQZECyAUkFJBmQbEBSAUkGJBuQVECSAckGJBWQZECyAUkFJBmQbEBSAUkGJBuQVECSAckGJBWQZECyAUkFJBmQbEBSAUkGJBuQVECSAckGJBWQZECyAUkFJBmQbEBSAUkGJBuQVECSAckGJBWQZECyAUkFJBmQbEBSAUkGJBuQVECSAckGJBWQZECyAUkFJBmQbEBSAUkGJBuQVECSAckGJBWQZECyAUkFJBmQbEBSAUkGJBuQVECSAckGJBWQZECyAUkFJBmQbEBSAUkGJBuQVECSAckGJBWQZECyAUkFJBmQbEBSAUkGJBuQVECSAckGJBWQZECyAUkFJBmQbEBSAUkGJBuQVECSAckGJBWQZECyAUkFJBmQbEBSAUkGJBuQVECSAckGJBWQZECyAUkFJBmQbEBSAUkGJBuQVECSAckGJBWQZECyAUkFJBmQbEBSAUkGJBuQVECSAckGJBWQZECyAUkFJBmQbEBSAUkGJBuQVECSAckGJBWQZECyAUkFJBmQbEBSAUkGJBuQVECSAcl28S9jVKt2Mer4blGfjob9KOoFNOxnV8Soi8QzO8eQ7loVo7r/JUYNf/KNGDXkTzFq1PoYNU08s3MMafLaGNUz6rcJDbvtmWUxanhFjIrV+8z7xDMbSDEJSDIg2YCkApIMSDYgqYAkA5INSCogyYBkA5IKSDIg2YCkApIMSDYgqYAkA5INSCogyYBkA5IKSDIg2YCkApIMSDYgqYAkA5INSCogyYBkA5IKSDIg2YCkApIMSDYgqYAkA5INSCogyYBkA5IKSDIg2YCkApIMSDYgqYAkA5INSCogyYBkA5IKSDIg2YCkApIMSDYgqYAkA5INSCogyYBkA5IKSDIg2YCkApIMSDYgqYAkA5INSCogyYBkA5IKSDIg2YCkApIMSDYgqYAkA5INSCogyYBkA5IKSDIg2YCkApIs/yBt6n1EoUsFpFwHJFn+Qbp437a9+6UCUq4Dkiz/IH312WwBAtL/FpBk+QdpvyogRRWQZPkH6ezXgRRVQJLlH6S3f7gYSBEFJFn+QTrzv9x+R6cCUq4Dkiz/IJ3dNhOQch2QZPkHKfsBSQUkWT5C+uyFBx56+Qsg5T4gyfIP0vZBjfzLGvYfD6ScByRZ/kEa7zo+/OIL91/gHgVSrgOSLP8gfeuG9P6K7wEp1wFJln+QmsxP7+c1A1KuA5Is/yDt/3x6/2xzIOU6IMnyD9JZP672d9vanQukXAckWf5Bmldw1JWjbr/8iMJXgZTrgCTLP0jeb7/pf/z93XnZcgQkGZBkeQjJ89a/VV6ZNUZA0gFJlpeQshyQVECS5RmkVqO9VjsCUq4DkizPIJ1W6p22IyDlOiDJ8gxSTgKSCkiy/IPU5sP0/ulvASnXAUmWf5BceWr3P7c1BlKuA5Is3yC5+rhoNecBSZZvkN6/2xWl/uuQl434C5ByHZBk+QbJ8y74U7YAAel/C0iy/IPkbZyS3FTdtglIOQ9IsvyDtPIw/1OGCnfYaiDlOiDJ8g9Sh+Pf8ncfHt8JSLkOSLL8g3ToI+n9/S2AlOuAJMs/SM2eSO9/tV9MIc07t3nzk8ZW+Ie/P9k9GT2kkvSvC86OGtJrmV9cjF+2bNoPvtL4xMFLo4T0/Dn773/SmDUVFdenV3Vm9JCWneKeC2OefwvSjy6o8Xdf/ODMzC01ifdjBOnZfY8eNuYsd2PycHSzI+IA6ZeFd/n9OmpIbw5JdX7Br5ZNKmx1402nuCsihPTbfY8eOvosN6iiom/hWL+ZkUOa2OzIHEJ6ueDYASNH9Dm08OXMLbUfbI0RpNNbvLt2bcW391uz9rdNRk2KA6TuBwSfI10Yb+1eP7TDsmXfOLJs2bJFRx8cIaTTWrxdUbHmW/utqri4RaCJQoP0UpNxU3MIyXuljf9CfHJc/4bs+Lv9bbFbvrbsd2tjAelnRwafI10YkC458NVliwdO9A9/7sqigzRusr/t6d6ruPDweEAqX/hZTiF53mcf/H8N/4vF/lu7iuHdu47Y4FUnXh7cr+9SLzOuTSwc0b/vfM/bPL5Xl8GrPO+1qzoX31u9Y5gFSOnOPiS1iwWks1tVVa0NPk1VKJCeLCzJHC5tfVigqcL4sOHsQyoqzjyxomJlDCAlyzGkXfIhXVm67R9jSpKH1/3Ve7XDlszYKxq4xXulyzZv0Pgvqh/vWb2x/fvbN143OzNM/sP/XJfsy7Ah3eeGxQfSKcd0PsgdNOgvsYB0/qFvpPZvzH34gn3HRg3pHje0ouLklkUHuoOu/WhvgrTbvyHrQ9qatLC4Q21N4jnP2971lczYK5rreZsSn6xKbE7+LNWtbFVidfLrXmaY/IcXtEn2dsiQZjZrVxEfSMcU9pj5UEf30+AzBYf0ZOGg9MFU5w4vDTZXcEiPNDt/TUVFy8JL7r8n4S7YmyDt9m/I+pCWDyku7paoqUksS95w1azM2CtanHxbl/i4LJFqdu09HUpmrfcyw+T3rrg52c4XSQSGNGqfotVr4wPp/RX+trubG3im4JC6Nn49ffC7icPPL/hFtJBu36f9x8ndknJ/cLF7ai+CtNuSkDZ0ml3tLfUh+f9fzCt+nRl7RUtSkJYmquu+edO8kR3K6oe7Kyikfu7aT9bGCFK637hRgecIDGnp13/UYNTXzYgSUl93zZ/rR4+6EUB6v6yoxvMe9SE97XnVnV/LjDOQ1iZWJr9xo1ezJbmbPjgzzAqkqwvG7jiOBaTVq/3tQ25i4JkCQ3rE3eLvXip5xN/d5YZGCGlAwZj0wYoV/vYeN3ovgrR/g3b8DdkkpJWJFf9aODhRVZMYUFE9q+PfMuMMJG9oSVXNi10+f7XPx7Wbh0zJDLMB6VduZP0gDpA+KLzI37UtWBI9pKvdr/zd7wq/tyS56+qmRgfp8cwr0LLCdv7u3ILX9yJIXZO1anRG5w6nFLS5ugEkb0b3HlO2Duy2IfHiTZ37lXuZ8aYMpM3jul5SssKrndWnY6+7/54ZZgHSmmMPHDvOb8naOePGXeJ+OW7cm9FCqurnzp8w+gx3efCZAkP6uft9at/bnXz9ze0KTloSGaRVxxw4JnVBw6KK3u680SN/6PoEmS4MSPNKS3u4AaWl7+QAUrLZJ23wdyu/ObchpB2H74g5/g8FgvR+5oKyB9deWnc0LWJIG8efckDTU0uDTxQc0tmF6f3Swa2aNjuu5+uBJgsE6d3M4/RAxeo7Tm7RtPW4QI7CgNQ788zJDaSTnkrv72tdd8P2jxI7PnWLHlLYcfW3jKu/Vf8WpMavpfezm9TdsLDDqFog5SQgyfIP0hGXpna1XQ8PTgZI/7eAJMs/SLe67147atSAb7nBQMp1QJLlH6TacYf7P5EdMrwGSLkOSLL8g5Sk9Mmypau3Z4sRkHRAkuUjpG1vzfnU+x8g5T4gyfIQ0sQWzi3xhvwia5SApAKSLP8gPeDaT09CenTf8UDKdUCS5R+kk6/0tiUhebecCKRcByRZ/kFq+moa0u8aASnXAUmWf5C+9nwa0lMHACnXAUmWf5B+cs4/fUifn9QOSLkOSLL8g/T6Psdf5/r2PqDRm0DKdUCS5R8k77VT/Ssbfvj7bDkCkgxIsjyE5Hmb3ntvc9YYAUkHJFn+QToje/+JVSD9LwFJln+QvjEJSFEFJFn+QXruW7/9F5CiCUiy/IN09ndd4yOO9gNSrgOSLP8gnXle27qAlOuAJMs/SNkPSCogyfIO0rZlb24BUkQBSZZvkCa3cK5R/y/FNwIpuwFJlmeQnnEtbxh2lrtafCOQshuQZHkG6eyW/v8utm+jvwEpioAkyzNIzYf727dc1i5YBdL/X0CS5Rkkd7+/3eBeFt8JpKwGJFm+QXrQ3250LwEpioAkAxKQ/v2AJMs3SLcsSTbPlfo7IOU6IMnyDVLDgJTrgCTLM0i3NgxIuQ5IsjyDlJOApAKSDEg2IKmAJAOSDUgqIMmAZAOSCkgyINmApAKSDEg2IKmAJAOSDUgqIMmAZAOSCkgyINmApAKSDEg2IKmAJAOSDUgqIMmAZAOSCkgyINmApAKSDEg2IKmAJAOSDUgqIMmAZAOSCkgyINmApAKSDEg2IKmAJAOSDUgqIMmAZAOSCkgyINmApAKSDEg2IKmAJAOSDUgqIMmAZAOSCkgyINmApAKSDEg2IKmAJAOSDUgqIMmAZAOSCkgyINmApAKSDEg2IKmAJAOSDUgqIMmAZAOSCkgyINmApAKSDEg2IKmAJAOSDUgqIMmAZOvQM0Yd/4sYdWq7TjHqB5fGqJ/2iVEXimd2jiFNWR+jij+PUaOWboxRiWkxanTUj03DYvKKBCQVkGRAsgFJBSQZkGxAUgFJBiQbkFRAkgHJBiQVkGRAsgFJBSQZkGxAUgFJBiQbkFRAkgHJBiQVkGRAsgFJBSQZkGxAUgFJBiQbkFRAkgHJBiQVkGRAsgFJBSQZkGxAUgFJBiQbkFRAkgHJBiQVkGRAsgFJBSQZkGxAUgFJBiQbkFRAkgHJBiQVkGRAsgFJBSQZkGxAUgFJBiQbkFRAkgHJBiQVkGRAsgFJBSQZkGxAUgFJBiQbkFRAkgHJBiQVkGRAsgFJBSQZkGxAUgFJBiQbkFRAkgHJBiQVkGRAsgFJBSQZkGxAUgFJBiQbkFRAkgHJBiQVkGRAsgFJBSQZkGxAUgFJBiQbkFRAkgHJBiQVkGRAsgFJBSQZkGxAUgFJBiQbkFRAkgHJBiQVkGRAsgFJBSQZkGxAUgFJBiQbkFRAkgHJBiQVkGRAsgFJBSQZkGxAUgFJBiQbkFRAkgHJBiQVkGRAsgFJBSQZkGxAUgFJBiQbkFRAkgHJBiQVkGRAsgFJBSQZkGxAUgFJBiQbkFRAkgHJBiQVkGRAsgFJBSQZkGxAUgFJBiQbkFRAkgHJBiQVkGRAsgFJBSQZkGxAUgFJBiRbUEhvtHZP+/sbXKqzIodUduEBTdr8KoSJAkNa1No9s+tRFJBGHOWuSx1cf3zjxifcsPNtkUEK7XEKBVJN4p1oIY1tdkQa0uWFE/wejxrS2y2OmzD53IIngs8UFNK45Kl5ZpejKCB1a3xQGs2V7shLLj1s35sa3hYZpPAepz0C0twmd5amIXVtEWii0CB1bvbh559vOumY4DMFhPR8k9GT03zqj6KANKjRJT3TaA498K5p0ya0aNXwtsgghfc47RGQFr22vg7ST4+IBaSqZh393Wj3euCpAkJaPH9jHZ/6oygg3XrLtDSaMe4sf9y2YHz9bZFBCvFxCg1SzbCRNX8d36tzyYfe9sTv+k32No/v1WXwKs+rGN6964gNXm1i4Yj+fednBVKyOkhntVq/fnX0kJa54f5urpsaeKrgHzbU84kQUrI0mjvcef6gixtYf1tkkEJ8nEKDVFrypTfo1i1fPtz1b17RwFX/9AaN/6L68Z7V3pWl2/4xpsRL3rjFe6XLtuS3f74s2d+yAql1y44HuoOuXxMxpBfc3f5uiRsReKo9DdLU/Y7yB23cZTGAFOLjFBakJ/p/4a1OrPW86osXeEVPet6qxGbPq+1W5m390vMWd6j1iuZ63qbEJ8lvX9Am2dtZgdSysNtD9xe5iyKG9Iy7z9+9424KPNWeBmlawv33yNsuaOH6xgBSiI9TSJDGJv7geW+2r00O+v/GKyrzvLJEqtne8iHFxd0SNV7RYs/bnPcB240AABDoSURBVPg4+R2rpyRblxVIb7/nb7u6OdFCmucm+7vF7tbAU+1xkO4+r8C5b13qrowBpBAfp5Ag9RsxsKYO0lVPeEVLPG9pojr1tQ2dZlcnBzWpG9OQdlNYkNI94UZGC+ltN8zfzUn/gReoPQ7StGljS+5M/ow0LAaQQnycQoJUvrXPI94a/43bts7zU2bWJlYmv7LRKyuq8bxHcwVp5Up/e78bFy2kT1sk/N1wtzjwVHsgJL/v7jclBpBCfJxC+7BhRYd3vZKRX2y7r+c/Uma8oSVVNS92+XxlYsW/Fg5OVOUE0ruFF/qD8wreiBbS58VNln/++YZjvxN8pj0O0umHTp42bXDhORZX7iGF+DiF93ukx4u3VN3R89Lbkj/8pCBtHtf1kpIVnjeje48pWwd225RFSM9OmNDVXTlhwuL1fVzbcaNOd/0CTRcCpD98teXwMT9sNDf4TAEhzZ04sZu7auLEpQ2OooB0Q48ep7u2PXqMnHZFwQnFHZp/dWzD2yKDFN7jtEdca9czfYWdu3f92jGtWzQ9ZWKg2UK51m7ZRS2anvFcCBMFhFRcd2rua3AUBaSz6u69z7Rpfb7RqPlpd+58W1SQwnuc9ghIIcfV3zKu/lYByQYkGZBUQLIBSQYkFZBsQJIBSQUkG5BkQFIByQYkGZBUQLIBSQYkFZBsQJIBSQUkG5BkQFIByQYkGZBUQLIBSQYkFZBsQJIBSQUkG5BkQFIByQYkGZBUQLIBSQYkFZBsQJIBSQUkG5BkQFIByQYkGZBUQLIBSQYkFZBsQJIBSQUkG5BkQFIByQYkGZBUQLIBSQYkFZBsQJIBSQUkG5BkQFIByQYkGZBUQLIBSQYkFZBsQJIBSQUkG5BkQFIByQYkGZBUQLIBSQYkFZBsQJIBSQUkG5BkQFIByQYkGZBUQLIBSQYkFZBsQJIBSQUkG5BkQFIByQYkGZBUQLIBSQYkFZBsQJIBSQUkG5BkQFIByQYkGZBUQLIBSQYkFZBsQJIBSQUkG5BkQFIByQYkGZBUQLIBSQYkFZBsQJIBSQUkG5BkQFIByQYkGZBUQLIBSQYkFZBsQJIBSQUkG5BkQFIByQYkGZBUQLIBSQYkFZBsQJIBSQUkG5BkQFIByQYkGZBUQLIBSQYkFZBsQJIBSQUkG5BkQFIByQYkGZBUQLIBSQYkFZBsQJIBSQUkG5BkQFIByQYkGZBUQLIBSQYkFZBsQJIBSRUTSB2LY9SJfWJUm469Y1TLs2NUIurHpmEXimd2jiFNq4xRvaP+c79ht77zWYwatSlGXV8eo24Xz2wgxSQgyYBkA5IKSDIg2YCkApIMSDYgqYAkA5INSCogyYBkA5IKSDIg2YCkApIMSDYgqYAkA5INSCogyYBkA5IKSDIg2YCkApIMSDYgqYAkA5INSCogyYBkA5IKSDIg2YCkApIMSDYgqYAkA5INSCogyYBkA5IKSDIg2YCkApIMSDYgqYAkA5INSCogyYBkA5IKSDIg2YCkApIMSDYgqYAkA5INSCogyYBkA5IKSDIg2YCkApIMSDYgqYAkA5INSCogyYBkA5IKSDIg2YCkApIMSDYgqYAkA5INSCogyYBkA5IKSDIg2YCkApIMSDYgqYAkA5INSCogyYBkA5IKSDIg2YCkApIMSDYgqYAkA5INSCogyYBkA5IKSDIg2YCkApIMSDYgqYAkA5INSCogyYBkA5IKSDIg2YCkApIMSDYgqYAkA5INSCogyYBkA5IKSDIg2YCkApIMSDYgqYAkA5INSCogyYBkA5IKSDIg2YCkApIMSDYgqYAkA5INSCogyYBkA5IKSDIg2YCkApIMSDYgqYAkA5INSCogyYBkA5IKSDIg2YCkApIMSDYgqYAkA5INSCogyYBkA5IKSDIg2YCkApIMSLagkBa1dnNSBwvaHdDke49FCym5mGd2PYoW0pz/PrjJSZM+DT5RcEh/cnXNjBbSgsw6JpSXzzq7eeOT7oo1pKIlOw1rEu9nBdK4ZkekIS1pcdzYSecUzIwSkr+YZ3Y5ihbSrMKTx0443Q2OA6R1d6UqKng9WkiLh6Y6v2BW+Zz9j7p56GkFE+MKafnHBlLtB1uzAemFJmPuTkPq2Gx5ZeW677SMENLzTUZPTvOpP4oYUsuj13322cbjD40DpHSrDy8OPkkIb+0WHtqxvPyCpi+Vly894RtxhXTbiwaSLBikJQsq05DWNyvyx6Pcq9FBWjx/Yx2f+qNoIVXe8YS/6+HWxQbSZQd/FHySECB1PXB++bKm5/uHg9wT8YQ0pH2n672iV0Z0Kl7geRXDu3cdsSFrb+0q6yAtckP9wRw3OTpIyer5xAJSuk9P+0bwSUKC9Gbh2BBmCQ5pduHN5eVPuwH+8XQ3Ip6QvH7+K9I1H/7zsS7bvCtLt/1jTEkdpPXPJKvKBqTn3F3+4A03DEg7tWH5y50bzQw+T0iQOhz+lxBmCQ6p3aGLyssfcMP846fc1XGG9LTnbUxUeFu/9LzFHWrTkBa0SfZ2NiA96ab6g2XuRiDt1DPOHfWbEOYJB9KbhXeGMU1gSLMLb0xup7nb/MGz7vI4Q1rseZsTH3vLhxQXd0vUZP8VaZI/KHPDgbRTH/1qaseCgcHnCQfS5Y1XhzFNYEjdGi9Mbh90Q/3BU+6aOENakoK0odPsam9pBtJuCgnSEneLP3gq/cIEpJ0a5F4NPEcokCqP+EkY0wSG9NbXz/R3c1x/f3dP+oUp3pDKimo879HsQ9rQ4uf+YIgrA1J9fxz3O3/3azc5HpBecpPCmCYwpBnpl6Jl+5/n7wa4p2IKqf/Df89AWplY8a+FgxNV2YZUeWmTdyor1x777UBz7WmQPio8syq5+6V7Jh6Qhrvgv4z1CwrpGjcrte/Q+Pny8kX/dUKgybIIaW7nPhlI3ozuPaZsHdhtQ3YgzZ00qZvrP2nSssr3Dj566B0/aDQnQkhzJ07s5q6aOHFpg6NoIX12nfvhqImdCr5fFQ9I3d2aMKYJDCnhFqb28w48csCgk/edHldI/5eCQepVd9nU9MrKRRe2aHraM4FmCwipuG4x9zU4ihjSp5NObrb/t66uCD5TKJAuKAxjluCQ/ruw7uDpc/Zvcup9wSbbIyCFHFd/y7j6WwUkG5BkQFIByQYkGZBUQLIBSQYkFZBsQJIBSQUkG5BkQFIByQYkGZBUQLIBSQYkFZBsQJIBSQUkG5BkQFIByQYkGZBUQLIBSQYkFZBsQJIBSQUkG5BkQFIByQYkGZBUQLIBSQYkFZBsQJIBSQUkG5BkQFIByQYkGZBUQLIBSQYkFZBsQJIBSQUkG5BkQFIByQYkGZBUQLIBSQYkFZBsQJIBSQUkG5BkQFIByQYkGZBUQLIBSQYkFZBsQJIBSQUkG5BkQFIByQYkGZBUQLIBSQYkFZBsQJIBSQUkG5BkQFIByQYkGZBUQLIBSQYkFZBsQJIBSQUkG5BkQFIByQYkGZBUQLIBSQYkFZBsQJIBSQUkG5BkQFIByQYkGZBUQLIBSQYkFZBsQJIBSQUkG5BkQFIByQYkGZBUQLIBSQYkFZBsQJIBSQUkG5BkQFIByQYkGZBUQLIBSQYkFZBsQJIBSQUkG5BkQFIByQYkGZBUQLIBSQYkFZBsQJIBSQUkG5BkQFIByQYkGZBUQLIBSQYkFZBsQJIBSQUkG5BkQFIByQYkGZBUQLIBSQYkFZBsQJIBSQUkG5BkQFIByQYkGZBUQLIBSQYkVUwgzXg8Rg2MegENGzY16hU07NqoF9CwWD1O08QzO8eQiPbMgEQUQkAiCiEgEYUQkIhCCEhEIQQkohACElEIAYkohIBEFEJAIgohIBGFEJCIQghIRCGUl5CmTY56BQ2ad+emqJdQ34d3Lo16CfV9eeesqJfQoBl3ZnX6vISUaBf1Chp0R5uPo15Cfa+2eTzqJdS3tc3VUS+hQb2/n9XpgRQ0IKmAFPeApAKSDEg2IKmAJAMSUfwDElEIAYkohPIDUtGS1K4m8X7ECzFL2JSoyPmqYnAaTDWJd6Jegq7u6ZMpK+cvryDVfrA14oWYJSQh5XxVMTgNpthCWv6xgZSV85dXkGJYElLUS4hFsYV024u5efrEHNKnd15cfO+XXtErIzoVL/Bfk2sTC0f07zvf8zaP79Vl8CrPe+2qzsX3Vu8YZruGS1g9qMvVC9Nv7SqGd+86YoO340vZXkPmDqsTLw/u13epZxaQ69PjQ6oZNrLmr+N7dS750Nue+F2/yTvuNadnZ+eGtO90febpk1nH3vjW7oaxm9cPmO4VXfPhPx/rss0/A0UDt3ivdNnmDRr/RfXjPas3tn9/+8brZmeGWV9QgyXU9ivdVjUkDenK0m3/GFPi7Vhd1tdQd4c1iev+6r3aYYtZQK5Pjw+ptORLb9CtW758uOvfkutY9c8d95rTs7NL/fxXpPTTp/6k7XWQVic2JjflXtHTnrcx/ZQtmuu/n/pkVWJz8s1ut7JVidWet93LDLO+ogZL+KO/uCXpVW390vMWd6jNfCn7a6i7w5rEc8l//a6v7LqAnJ+eJKQn+n+RfMDWel71xQu8oie9+nvN6dnZpRSk9NOn/qTtdZDebF+b2hctTr5ZSXycehanD8sSqWbX3tOhZNZ6LzPM+ooaLqH9ds/7JA1p+ZDi4m6JmsyXsr+GujusSSxL3nDVrF0XkPPTU5MYm/hD5gHr/xuvKIl2x73m9OzsUgpS3f3uOGl7HaRF/nPVS/+0mIGUPlyayLxP2TRvZIey+mGWa7CE+f6TZk0K0oZOs6u9pf5TZUluIGXusCaRfI54V/x61wXk/PTUJPqNGFhTB+mqJ1LryNxrbs/OLvV7ccfTp/6k7XWQ1vifiX30wm4grU2sTH59o1ezJbmbPjgzzHoNlrA8Uen/qetDKiuq8bxHcwgpc4c1ieS7lurOr+26gJyfnppE+dY+jyQfsOQbt22d56fWkbnX3J6dXWoAqf6k7XWQvEEjKtddd+9uIHlDS6pqXuzy+at9Pq7dPGRKZpj1BTVYQnWP0q3rbk5BWplY8a+FgxNVOYOUucOaxICK6lkd/2YWkOvT43/YsKLDu17JyC+23dfzH+lPnOvuNbdnZ5f6P/z3zP3Wn7S9D9KWO7r0nLZtd5A2j+t6SckKr3ZWn4697v57Zpj1Gi7ho+s7X/1O4s/+TTO695iydWC3TbmClLnDDYkXb+rcr9wzC8j16Un9Hunx4i1Vd/S89LZ1db+6ydxrTs/OLs3t3GfHA7bjpO19kGg3NfgTNba/B93rAlLetf0j/zPtdECKS0DKuxZ2GFWbOQZSXAISUQgBiSiEgEQUQkAiCiEgEYUQkPK3X7pMp+32622Pzu169uqAlL+9PnXq1Gtd5+TWXNb9nv+4AimHASm/e92V7u7mKUDKcUDK7+ognXn28984w2vd2j8u+qp3QfLtXhuv7XFrLmze/JLsX8lLQMr36iCdd/I373mhHtKfilz5h17blq1HP3tjwS+iXeFeEpDyuzpIbd2c5HYHJK+f23Hjj74W4fL2noCU32UgNf6XZyE19a/J61UY4fL2noCU32UgHeFvd4V0tD/sx0OcizjL+V0G0tH+FkjRxVnO73aCdOpJ/vY0IEUQZzm/2wnSeYckfyja1CwJ6TL3P0DKaZzl/G4nSJPdmMp3f/ydJKQR7rangZTLOMv53U6Qqm84sknr5we08Ly/nNqoFZByGWeZKISARBRCQCIKISARhRCQiEIISEQhBCSiEAISUQgBiSiEgEQUQkAiCiEgEYXQ/wMhANIDIZLX1QAAAABJRU5ErkJggg==" }, "metadata": { "image/png": { "width": 420, "height": 420 } } } ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 436 }, "id": "HsAtwukyLsvt", "outputId": "3032a224-a2c8-4270-b4f2-7bb620317400" } }, { "cell_type": "markdown", "source": [ "混淆矩阵图中较深的方块表示案例数量较多,希望你能看到一条较深方块组成的对角线,表明预测标签与实际标签一致的情况。\n", "\n", "现在让我们计算混淆矩阵的汇总统计数据。\n" ], "metadata": { "id": "oOJC87dkLwPr" } }, { "cell_type": "code", "execution_count": 12, "source": [ "# Summary stats for confusion matrix\n", "conf_mat(data = results, truth = cuisine, estimate = .pred_class) %>% \n", "summary()" ], "outputs": [ { "output_type": "display_data", "data": { "text/plain": [ " .metric .estimator .estimate\n", "1 accuracy multiclass 0.7880435\n", "2 kap multiclass 0.7276583\n", "3 sens macro 0.7780927\n", "4 spec macro 0.9477598\n", "5 ppv macro 0.7585583\n", "6 npv macro 0.9460080\n", "7 mcc multiclass 0.7292724\n", "8 j_index macro 0.7258524\n", "9 bal_accuracy macro 0.8629262\n", "10 detection_prevalence macro 0.2000000\n", "11 precision macro 0.7585583\n", "12 recall macro 0.7780927\n", "13 f_meas macro 0.7641862" ], "text/markdown": [ "\n", "A tibble: 13 × 3\n", "\n", "| .metric <chr> | .estimator <chr> | .estimate <dbl> |\n", "|---|---|---|\n", "| accuracy | multiclass | 0.7880435 |\n", "| kap | multiclass | 0.7276583 |\n", "| sens | macro | 0.7780927 |\n", "| spec | macro | 0.9477598 |\n", "| ppv | macro | 0.7585583 |\n", "| npv | macro | 0.9460080 |\n", "| mcc | multiclass | 0.7292724 |\n", "| j_index | macro | 0.7258524 |\n", "| bal_accuracy | macro | 0.8629262 |\n", "| detection_prevalence | macro | 0.2000000 |\n", "| precision | macro | 0.7585583 |\n", "| recall | macro | 0.7780927 |\n", "| f_meas | macro | 0.7641862 |\n", "\n" ], "text/latex": [ "A tibble: 13 × 3\n", "\\begin{tabular}{lll}\n", " .metric & .estimator & .estimate\\\\\n", " & & \\\\\n", "\\hline\n", "\t accuracy & multiclass & 0.7880435\\\\\n", "\t kap & multiclass & 0.7276583\\\\\n", "\t sens & macro & 0.7780927\\\\\n", "\t spec & macro & 0.9477598\\\\\n", "\t ppv & macro & 0.7585583\\\\\n", "\t npv & macro & 0.9460080\\\\\n", "\t mcc & multiclass & 0.7292724\\\\\n", "\t j\\_index & macro & 0.7258524\\\\\n", "\t bal\\_accuracy & macro & 0.8629262\\\\\n", "\t detection\\_prevalence & macro & 0.2000000\\\\\n", "\t precision & macro & 0.7585583\\\\\n", "\t recall & macro & 0.7780927\\\\\n", "\t f\\_meas & macro & 0.7641862\\\\\n", "\\end{tabular}\n" ], "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
A tibble: 13 × 3
.metric.estimator.estimate
<chr><chr><dbl>
accuracy multiclass0.7880435
kap multiclass0.7276583
sens macro 0.7780927
spec macro 0.9477598
ppv macro 0.7585583
npv macro 0.9460080
mcc multiclass0.7292724
j_index macro 0.7258524
bal_accuracy macro 0.8629262
detection_prevalencemacro 0.2000000
precision macro 0.7585583
recall macro 0.7780927
f_meas macro 0.7641862
\n" ] }, "metadata": {} } ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 494 }, "id": "OYqetUyzL5Wz", "outputId": "6a84d65e-113d-4281-dfc1-16e8b70f37e6" } }, { "cell_type": "markdown", "source": [ "如果我们仅关注一些指标,比如准确率、敏感性、PPV,作为开始,我们的表现还不错 🥳!\n", "\n", "## 4. 深入探讨\n", "\n", "让我们问一个微妙的问题:选择某种菜系作为预测结果的标准是什么?\n", "\n", "实际上,统计机器学习算法,比如逻辑回归,是基于`概率`的;分类器真正预测的是一组可能结果的概率分布。然后,概率最高的类别会被选为给定观察数据中最可能的结果。\n", "\n", "让我们通过同时进行硬分类预测和概率预测来看看实际效果。\n" ], "metadata": { "id": "43t7vz8vMJtW" } }, { "cell_type": "code", "execution_count": 13, "source": [ "# Make hard class prediction and probabilities\n", "results_prob <- cuisines_test %>%\n", " select(cuisine) %>% \n", " bind_cols(mr_fit %>% predict(new_data = cuisines_test)) %>% \n", " bind_cols(mr_fit %>% predict(new_data = cuisines_test, type = \"prob\"))\n", "\n", "# Print out results\n", "results_prob %>% \n", " slice_head(n = 5)" ], "outputs": [ { "output_type": "display_data", "data": { "text/plain": [ " cuisine .pred_class .pred_chinese .pred_indian .pred_japanese .pred_korean\n", "1 indian thai 1.551259e-03 0.4587877 5.988039e-04 2.428503e-04\n", "2 indian indian 2.637133e-05 0.9999488 6.648651e-07 2.259993e-05\n", "3 indian indian 1.049433e-03 0.9909982 1.060937e-03 1.644947e-05\n", "4 indian indian 6.237482e-02 0.4763035 9.136702e-02 3.660913e-01\n", "5 indian indian 1.431745e-02 0.9418551 2.945239e-02 8.721782e-03\n", " .pred_thai \n", "1 5.388194e-01\n", "2 1.577948e-06\n", "3 6.874989e-03\n", "4 3.863391e-03\n", "5 5.653283e-03" ], "text/markdown": [ "\n", "A tibble: 5 × 7\n", "\n", "| cuisine <fct> | .pred_class <fct> | .pred_chinese <dbl> | .pred_indian <dbl> | .pred_japanese <dbl> | .pred_korean <dbl> | .pred_thai <dbl> |\n", "|---|---|---|---|---|---|---|\n", "| indian | thai | 1.551259e-03 | 0.4587877 | 5.988039e-04 | 2.428503e-04 | 5.388194e-01 |\n", "| indian | indian | 2.637133e-05 | 0.9999488 | 6.648651e-07 | 2.259993e-05 | 1.577948e-06 |\n", "| indian | indian | 1.049433e-03 | 0.9909982 | 1.060937e-03 | 1.644947e-05 | 6.874989e-03 |\n", "| indian | indian | 6.237482e-02 | 0.4763035 | 9.136702e-02 | 3.660913e-01 | 3.863391e-03 |\n", "| indian | indian | 1.431745e-02 | 0.9418551 | 2.945239e-02 | 8.721782e-03 | 5.653283e-03 |\n", "\n" ], "text/latex": [ "A tibble: 5 × 7\n", "\\begin{tabular}{lllllll}\n", " cuisine & .pred\\_class & .pred\\_chinese & .pred\\_indian & .pred\\_japanese & .pred\\_korean & .pred\\_thai\\\\\n", " & & & & & & \\\\\n", "\\hline\n", "\t indian & thai & 1.551259e-03 & 0.4587877 & 5.988039e-04 & 2.428503e-04 & 5.388194e-01\\\\\n", "\t indian & indian & 2.637133e-05 & 0.9999488 & 6.648651e-07 & 2.259993e-05 & 1.577948e-06\\\\\n", "\t indian & indian & 1.049433e-03 & 0.9909982 & 1.060937e-03 & 1.644947e-05 & 6.874989e-03\\\\\n", "\t indian & indian & 6.237482e-02 & 0.4763035 & 9.136702e-02 & 3.660913e-01 & 3.863391e-03\\\\\n", "\t indian & indian & 1.431745e-02 & 0.9418551 & 2.945239e-02 & 8.721782e-03 & 5.653283e-03\\\\\n", "\\end{tabular}\n" ], "text/html": [ "\n", "\n", "\n", "\t\n", "\t\n", "\n", "\n", "\t\n", "\t\n", "\t\n", "\t\n", "\t\n", "\n", "
A tibble: 5 × 7
cuisine.pred_class.pred_chinese.pred_indian.pred_japanese.pred_korean.pred_thai
<fct><fct><dbl><dbl><dbl><dbl><dbl>
indianthai 1.551259e-030.45878775.988039e-042.428503e-045.388194e-01
indianindian2.637133e-050.99994886.648651e-072.259993e-051.577948e-06
indianindian1.049433e-030.99099821.060937e-031.644947e-056.874989e-03
indianindian6.237482e-020.47630359.136702e-023.660913e-013.863391e-03
indianindian1.431745e-020.94185512.945239e-028.721782e-035.653283e-03
\n" ] }, "metadata": {} } ], "metadata": { "colab": { "base_uri": "https://localhost:8080/", "height": 248 }, "id": "xdKNs-ZPMTJL", "outputId": "68f6ac5a-725a-4eff-9ea6-481fef00e008" } }, { "cell_type": "markdown", "source": [ "为什么模型非常确定第一条观察是泰国菜?\n", "\n", "## **🚀挑战**\n", "\n", "在本课中,你使用清理后的数据构建了一个机器学习模型,可以根据一系列食材预测国家菜系。花点时间阅读 [Tidymodels 提供的多种选项](https://www.tidymodels.org/find/parsnip/#models) 来分类数据,以及 [其他方法](https://parsnip.tidymodels.org/articles/articles/Examples.html#multinom_reg-models) 来拟合多项式回归。\n", "\n", "#### 感谢:\n", "\n", "[`Allison Horst`](https://twitter.com/allison_horst/) 创作了令人惊叹的插图,使 R 更加友好和吸引人。可以在她的 [画廊](https://www.google.com/url?q=https://github.com/allisonhorst/stats-illustrations&sa=D&source=editors&ust=1626380772530000&usg=AOvVaw3zcfyCizFQZpkSLzxiiQEM) 中找到更多插图。\n", "\n", "[Cassie Breviu](https://www.twitter.com/cassieview) 和 [Jen Looper](https://www.twitter.com/jenlooper) 创作了本模块的原始 Python 版本 ♥️\n", "\n", "
\n", "本来想加点笑话,但我对食物的双关语一窍不通 😅。\n", "\n", "
\n", "\n", "祝学习愉快,\n", "\n", "[Eric](https://twitter.com/ericntay),微软金牌学习学生大使\n" ], "metadata": { "id": "2tWVHMeLMYdM" } }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n---\n\n**免责声明**: \n本文档使用AI翻译服务[Co-op Translator](https://github.com/Azure/co-op-translator)进行翻译。尽管我们努力确保翻译的准确性,但请注意,自动翻译可能包含错误或不准确之处。原始语言的文档应被视为权威来源。对于关键信息,建议使用专业人工翻译。我们不对因使用此翻译而产生的任何误解或误读承担责任。\n" ] } ] }