{ "nbformat": 4, "nbformat_minor": 0, "metadata": { "colab": { "name": "lesson_12-R.ipynb", "provenance": [], "collapsed_sections": [] }, "kernelspec": { "name": "ir", "display_name": "R" }, "language_info": { "name": "R" } }, "cells": [ { "cell_type": "markdown", "metadata": { "id": "jsFutf_ygqSx" }, "source": [ "# Build a classification model: Delicious Asian and Indian Cuisines" ] }, { "cell_type": "markdown", "metadata": { "id": "HD54bEefgtNO" }, "source": [ "## Cuisine classifiers 2\n", "\n", "In this second classification lesson, we will explore more ways to classify categorical data. We will also learn about the ramifications of choosing one classifier over another.\n", "\n", "### [**Pre-lecture quiz**](https://gray-sand-07a10f403.1.azurestaticapps.net/quiz/23/)\n", "\n", "### **Prerequisite**\n", "\n", "We assume that you have completed the previous lessons, since we will be carrying forward some concepts we learned before.\n", "\n", "For this lesson, we'll require the following packages:\n", "\n", "- `tidyverse`: The [tidyverse](https://www.tidyverse.org/) is a [collection of R packages](https://www.tidyverse.org/packages) designed to make data science faster, easier and more fun!\n", "\n", "- `tidymodels`: The [tidymodels](https://www.tidymodels.org/) framework is a [collection of packages](https://www.tidymodels.org/packages/) for modeling and machine learning.\n", "\n", "- `themis`: The [themis package](https://themis.tidymodels.org/) provides extra recipe steps for dealing with unbalanced data.\n", "\n", "You can install them with:\n", "\n", "`install.packages(c(\"tidyverse\", \"tidymodels\", \"kernlab\", \"themis\", \"ranger\", \"xgboost\", \"kknn\"))`\n", "\n", "Alternatively, the script below checks whether you have the packages required to complete this module and installs any that are missing." 
] }, { "cell_type": "code", "metadata": { "id": "vZ57IuUxgyQt" }, "source": [ "# Install pacman (a package management helper for R) if it is missing\n", "suppressWarnings(if (!require(\"pacman\")) install.packages(\"pacman\"))\n", "\n", "# Load the packages this lesson needs, installing any that are missing\n", "pacman::p_load(tidyverse, tidymodels, themis, kernlab, ranger, xgboost, kknn)" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "z22M-pj4g07x" }, "source": [ "Now, let's hit the ground running!\n", "\n", "## **1. A classification map**\n", "\n", "In our [previous lesson](https://github.com/microsoft/ML-For-Beginners/tree/main/4-Classification/2-Classifiers-1), we tried to address the question: how do we choose between multiple models? To a great extent, it depends on the characteristics of the data and the type of problem we want to solve (for instance, classification or regression).\n", "\n", "Previously, we used Microsoft's cheat sheet to learn about the options available when classifying data. Python's machine learning framework, Scikit-learn, offers a similar but more granular cheat sheet that can further help narrow down your estimators (Scikit-learn's term for models, including classifiers):\n", "\n", "
\n",
" \n",
"
\n",
" \n",
"
\n",
" \n",
"