{ "nbformat": 4, "nbformat_minor": 0, "metadata": { "colab": { "name": "lesson_12-R.ipynb", "provenance": [], "collapsed_sections": [] }, "kernelspec": { "name": "ir", "display_name": "R" }, "language_info": { "name": "R" }, "coopTranslator": { "original_hash": "fab50046ca413a38939d579f8432274f", "translation_date": "2025-09-06T14:47:49+00:00", "source_file": "4-Classification/3-Classifiers-2/solution/R/lesson_12-R.ipynb", "language_code": "sw" } }, "cells": [ { "cell_type": "markdown", "metadata": { "id": "jsFutf_ygqSx" }, "source": [] }, { "cell_type": "markdown", "metadata": { "id": "HD54bEefgtNO" }, "source": [ "## Cuisine classifiers 2\n", "\n", "In this second classification lesson, we will explore `more ways` to classify categorical data. We will also learn about the ramifications of choosing one classifier over another.\n", "\n", "### [**Pre-lecture quiz**](https://gray-sand-07a10f403.1.azurestaticapps.net/quiz/23/)\n", "\n", "### **Prerequisite**\n", "\n", "We assume that you have completed the previous lessons, since we will carry forward some of the concepts we learned there.\n", "\n", "For this lesson, we will need the following packages:\n", "\n", "- `tidyverse`: The [tidyverse](https://www.tidyverse.org/) is a [collection of R packages](https://www.tidyverse.org/packages) designed to make data science faster, easier, and more fun!\n", "\n", "- `tidymodels`: The [tidymodels](https://www.tidymodels.org/) framework is a [collection of packages](https://www.tidymodels.org/packages/) for modeling and machine learning.\n", "\n", "- `themis`: The [themis package](https://themis.tidymodels.org/) provides Extra Recipes Steps for Dealing with Unbalanced Data.\n", "\n", "You can install them with:\n", "\n", "`install.packages(c(\"tidyverse\", \"tidymodels\", \"kernlab\", \"themis\", \"ranger\", \"xgboost\", \"kknn\"))`\n", "\n", "Alternatively, the script below checks whether you have the packages required to complete 
this module and installs them for you if any are missing.\n" ] }, { "cell_type": "code", "metadata": { "id": "vZ57IuUxgyQt" }, "source": [ "suppressWarnings(if (!require(\"pacman\")) install.packages(\"pacman\"))\n", "\n", "pacman::p_load(tidyverse, tidymodels, themis, kernlab, ranger, xgboost, kknn)" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "z22M-pj4g07x" }, "source": [ "## **1. A classification map**\n", "\n", "In our [previous lesson](https://github.com/microsoft/ML-For-Beginners/tree/main/4-Classification/2-Classifiers-1), we tried to answer the question: how do we choose between multiple models? To a great extent, it depends on the characteristics of the data and the type of problem we want to solve (for instance, classification or regression?)\n", "\n", "Previously, we learned about the various options you have when classifying data using Microsoft's cheat sheet. Python's machine learning framework, Scikit-learn, offers a similar but more granular cheat sheet that can further help narrow down your estimators (another term for classifiers).\n" ] }, { "cell_type": "markdown", "metadata": { "id": "u1i3xRIVg7vG" }, "source": [ "> Tip: [visit this map online](https://scikit-learn.org/stable/tutorial/machine_learning_map/) and click along the path to read the documentation.\n", ">\n", "> The [Tidymodels reference site](https://www.tidymodels.org/find/parsnip/#models) also provides excellent documentation about the different types of models.\n", "\n", "### **The plan** 🗺️\n", "\n", "This map is very helpful once you have a clear grasp of your data, as you can 'walk' along its paths to a decision:\n", "\n", "- We have \\>50 samples\n", "\n", "- We want to predict a category\n", "\n", "- We have labeled data\n", "\n", "- We have fewer than 100K samples\n", "\n", "- ✨ We can choose a Linear SVC\n", "\n", "- If that doesn't work, since we have numeric data\n", "\n", "    - We can try a ✨ KNeighbors Classifier\n", "\n", "    - If that doesn't work, try ✨ SVC and ✨ Ensemble Classifiers\n", "\n", "This is a very helpful trail to follow. Now, let's get right into it using the [tidymodels](https://www.tidymodels.org/) modelling framework: a consistent and flexible collection of R packages developed to encourage good statistical practice 😊.\n", "\n", "## 2. Split the data and deal with an imbalanced data set.\n", "\n", "From our previous lessons, we learned that there was a set of common ingredients across our cuisines. 
Also, there was quite an unequal distribution in the number of cuisines.\n", "\n", "We'll deal with these by:\n", "\n", "- Dropping the most common ingredients that create confusion between distinct cuisines, using `dplyr::select()`.\n", "\n", "- Using a `recipe` that preprocesses the data to get it ready for modelling by applying an `over-sampling` algorithm.\n", "\n", "We already looked at the above in the previous lesson, so this should be a breeze 🥳!\n" ] }, { "cell_type": "code", "metadata": { "id": "6tj_rN00hClA" }, "source": [ "# Load the core Tidyverse and Tidymodels packages\n", "library(tidyverse)\n", "library(tidymodels)\n", "\n", "# Load the original cuisines data\n", "df <- read_csv(file = \"https://raw.githubusercontent.com/microsoft/ML-For-Beginners/main/4-Classification/data/cuisines.csv\")\n", "\n", "# Drop id column, rice, garlic and ginger from our original data set\n", "df_select <- df %>%\n", "  select(-c(1, rice, garlic, ginger)) %>%\n", "  # Encode cuisine column as categorical\n", "  mutate(cuisine = factor(cuisine))\n", "\n", "\n", "# Create data split specification\n", "set.seed(2056)\n", "cuisines_split <- initial_split(data = df_select,\n", "                                strata = cuisine,\n", "                                prop = 0.7)\n", "\n", "# Extract the data in each split\n", "cuisines_train <- training(cuisines_split)\n", "cuisines_test <- testing(cuisines_split)\n", "\n", "# Display distribution of cuisines in the training set\n", "cuisines_train %>%\n", "  count(cuisine) %>%\n", "  arrange(desc(n))" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "zFin5yw3hHb1" }, "source": [ "### Dealing with imbalanced data\n", "\n", "Imbalanced data often has negative effects on model performance. 
Many models perform best when the number of observations in each class is equal and, thus, tend to struggle with imbalanced data.\n", "\n", "There are two main ways of dealing with imbalanced data sets:\n", "\n", "- adding observations to the minority class: `Over-sampling`, e.g. using the SMOTE algorithm, which synthetically generates new examples of the minority class using the nearest neighbors of these cases.\n", "\n", "- removing observations from the majority class: `Under-sampling`\n", "\n", "In our previous lesson, we demonstrated how to deal with imbalanced data sets using a `recipe`. A recipe can be thought of as a blueprint that describes what steps should be applied to a data set in order to get it ready for data analysis. In our case, we want to have an equal distribution in the number of our cuisines for our `training set`. Let's get right into it.\n" ] }, { "cell_type": "code", "metadata": { "id": "cRzTnHolhLWd" }, "source": [ "# Load themis package for dealing with imbalanced data\n", "library(themis)\n", "\n", "# Create a recipe for preprocessing training data\n", "cuisines_recipe <- recipe(cuisine ~ ., data = cuisines_train) %>%\n", "  step_smote(cuisine)\n", "\n", "# Print recipe\n", "cuisines_recipe" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "KxOQ2ORhhO81" }, "source": [ "Now we are ready to train models 👩‍💻👨‍💻!\n", "\n", "## 3. Beyond multinomial regression models\n", "\n", "In our previous lesson, we looked at multinomial regression models. Let's explore some more flexible models for classification.\n", "\n", "### Support Vector Machines\n", "\n", "In the context of classification, `Support Vector Machines` is a machine learning technique that tries to find a *hyperplane* that \"best\" separates the classes. Let's look at a simple example:\n", "\n", "Image source: https://commons.wikimedia.org/w/index.php?curid=22877598\n" ] }, { "cell_type": "markdown", "metadata": { "id": "C4Wsd0vZhXYu" }, "source": [ "H1 does not separate the classes. H2 does, but only with a small margin. H3 separates them with the maximal margin.\n", "\n", "#### Linear Support Vector Classifier\n", "\n", "The Support Vector Classifier (SVC) is a member of the Support Vector Machines family of ML techniques. In SVC, the hyperplane is chosen to correctly separate `most` of the training observations, but `may misclassify` a few observations. By allowing some points to be on the wrong side, the SVM becomes more robust to outliers and hence generalizes better to new data. The parameter that regulates this violation is referred to as `cost`, which has a default value of 1 (see `help(\"svm_poly\")`).\n", "\n", "Let's create a linear SVC by setting `degree = 1` in a polynomial SVM model.\n" ] }, { "cell_type": "code", "metadata": { "id": "vJpp6nuChlBz" }, "source": [ "# Make a linear SVC specification\n", "svc_linear_spec <- svm_poly(degree = 1) %>%\n", "  set_engine(\"kernlab\") %>%\n", "  set_mode(\"classification\")\n", "\n", "# Bundle specification and recipe into a workflow\n", "svc_linear_wf <- workflow() %>%\n", "  add_recipe(cuisines_recipe) %>%\n", "  add_model(svc_linear_spec)\n", "\n", "# Print out workflow\n", "svc_linear_wf" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "rDs8cWNkhoqu" }, "source": [ "Now that we have captured the preprocessing steps and the model specification in a *workflow*, we can go ahead and train the linear SVC and evaluate the results while at it. 
For performance metrics, let's create a metric set that will evaluate: `accuracy`, `sensitivity`, `Positive Predictive Value` and `F Measure`.\n", "\n", "> `augment()` will add column(s) for predictions to the given data.\n" ] }, { "cell_type": "code", "metadata": { "id": "81wiqcwuhrnq" }, "source": [ "# Train a linear SVC model\n", "svc_linear_fit <- svc_linear_wf %>%\n", "  fit(data = cuisines_train)\n", "\n", "# Create a metric set\n", "eval_metrics <- metric_set(ppv, sens, accuracy, f_meas)\n", "\n", "\n", "# Make predictions and Evaluate model performance\n", "svc_linear_fit %>%\n", "  augment(new_data = cuisines_test) %>%\n", "  eval_metrics(truth = cuisine, estimate = .pred_class)" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "0UFQvHf-huo3" }, "source": [ "#### Support Vector Machine\n", "\n", "The support vector machine (SVM) is an extension of the support vector classifier that accommodates a non-linear boundary between the classes. In essence, SVMs use the *kernel trick* to enlarge the feature space so that it can adapt to non-linear relationships between classes. 
One popular and extremely flexible kernel function used by SVMs is the *Radial basis function.* Let's see how it performs on our data.\n" ] }, { "cell_type": "code", "metadata": { "id": "-KX4S8mzhzmp" }, "source": [ "set.seed(2056)\n", "\n", "# Make an RBF SVM specification\n", "svm_rbf_spec <- svm_rbf() %>%\n", "  set_engine(\"kernlab\") %>%\n", "  set_mode(\"classification\")\n", "\n", "# Bundle specification and recipe into a workflow\n", "svm_rbf_wf <- workflow() %>%\n", "  add_recipe(cuisines_recipe) %>%\n", "  add_model(svm_rbf_spec)\n", "\n", "\n", "# Train an RBF model\n", "svm_rbf_fit <- svm_rbf_wf %>%\n", "  fit(data = cuisines_train)\n", "\n", "\n", "# Make predictions and Evaluate model performance\n", "svm_rbf_fit %>%\n", "  augment(new_data = cuisines_test) %>%\n", "  eval_metrics(truth = cuisine, estimate = .pred_class)" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "QBFSa7WSh4HQ" }, "source": [ "Much better 🤩!\n", "\n", "> ✅ Please see:\n", ">\n", "> - [*Support Vector Machines*](https://bradleyboehmke.github.io/HOML/svm.html), Hands-on Machine Learning with R\n", ">\n", "> - [*Support Vector Machines*](https://www.statlearning.com/), An Introduction to Statistical Learning with Applications in R\n", ">\n", "> for further reading.\n", "\n", "### Nearest Neighbor classifiers\n", "\n", "*K*-nearest neighbor (KNN) is an algorithm in which each observation is predicted based on its *similarity* to other observations.\n", "\n", "Let's fit one to our data.\n" ] }, { "cell_type": "code", "metadata": { "id": "k4BxxBcdh9Ka" }, "source": [ "# Make a KNN specification\n", "knn_spec <- nearest_neighbor() %>%\n", "  set_engine(\"kknn\") %>%\n", "  set_mode(\"classification\")\n", "\n", "# Bundle recipe and model specification into a workflow\n", "knn_wf <- workflow() %>%\n", "  add_recipe(cuisines_recipe) %>%\n", "  add_model(knn_spec)\n", "\n", "# Train a KNN model\n", "knn_wf_fit 
<- knn_wf %>%\n", "  fit(data = cuisines_train)\n", "\n", "\n", "# Make predictions and Evaluate model performance\n", "knn_wf_fit %>%\n", "  augment(new_data = cuisines_test) %>%\n", "  eval_metrics(truth = cuisine, estimate = .pred_class)" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "HaegQseriAcj" }, "source": [ "It appears that this model is not performing that well. Changing the model's arguments (see `help(\"nearest_neighbor\")`) may improve its performance. Be sure to try it out.\n", "\n", "> ✅ Please see:\n", ">\n", "> - [Hands-on Machine Learning with R](https://bradleyboehmke.github.io/HOML/)\n", ">\n", "> - [An Introduction to Statistical Learning with Applications in R](https://www.statlearning.com/)\n", ">\n", "> to learn more about *K*-Nearest Neighbors classifiers.\n", "\n", "### Ensemble classifiers\n", "\n", "Ensemble algorithms work by combining multiple base estimators to produce an optimal model, either by:\n", "\n", "`bagging`: applying an *averaging function* to a collection of base models\n", "\n", "`boosting`: building a sequence of models that build on one another to improve predictive performance.\n", "\n", "Let's start by trying out a Random Forest model, which builds a large collection of decision trees and then applies an averaging function to them to obtain a better overall model.\n" ] }, { "cell_type": "code", "metadata": { "id": "49DPoVs6iK1M" }, "source": [ "# Make a random forest specification\n", "rf_spec <- rand_forest() %>%\n", "  set_engine(\"ranger\") %>%\n", "  set_mode(\"classification\")\n", "\n", "# Bundle recipe and model specification into a workflow\n", "rf_wf <- workflow() %>%\n", "  add_recipe(cuisines_recipe) %>%\n", "  add_model(rf_spec)\n", "\n", "# Train a random forest model\n", "rf_wf_fit <- rf_wf %>%\n", "  fit(data = cuisines_train)\n", "\n", "\n", "# Make predictions and Evaluate model performance\n", "rf_wf_fit %>%\n", "  
augment(new_data = cuisines_test) %>%\n", "  eval_metrics(truth = cuisine, estimate = .pred_class)" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "RGVYwC_aiUWc" }, "source": [ "Good job 👏!\n", "\n", "Let's also experiment with a Boosted Tree model.\n", "\n", "Boosted Tree defines an ensemble method that creates a series of sequential decision trees, where each tree depends on the results of the previous trees in an attempt to incrementally reduce the error. It focuses on the weights of incorrectly classified items and adjusts the fit of the next classifier to correct them.\n", "\n", "There are different ways to fit this model (see `help(\"boost_tree\")`). In this example, we'll fit Boosted trees via the `xgboost` engine.\n" ] }, { "cell_type": "code", "metadata": { "id": "Py1YWo-micWs" }, "source": [ "# Make a boosted tree specification\n", "boost_spec <- boost_tree(trees = 200) %>%\n", "  set_engine(\"xgboost\") %>%\n", "  set_mode(\"classification\")\n", "\n", "# Bundle recipe and model specification into a workflow\n", "boost_wf <- workflow() %>%\n", "  add_recipe(cuisines_recipe) %>%\n", "  add_model(boost_spec)\n", "\n", "# Train a boosted tree model\n", "boost_wf_fit <- boost_wf %>%\n", "  fit(data = cuisines_train)\n", "\n", "\n", "# Make predictions and Evaluate model performance\n", "boost_wf_fit %>%\n", "  augment(new_data = cuisines_test) %>%\n", "  eval_metrics(truth = cuisine, estimate = .pred_class)" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "zNQnbuejigZM" }, "source": [ "✅ Please see:\n", "\n", "- [Machine Learning for Social Scientists](https://cimentadaj.github.io/ml_socsci/tree-based-methods.html#random-forests)\n", "\n", "- [Hands-on Machine Learning with R](https://bradleyboehmke.github.io/HOML/)\n", "\n", "- [An Introduction to Statistical Learning with Applications in R](https://www.statlearning.com/)\n", "\n", "- - Explores the 
AdaBoost model, which is a good alternative to xgboost.\n", "\n", "to learn more about Ensemble classifiers.\n", "\n", "## 4. Extra - comparing multiple models\n", "\n", "We have fitted quite a number of models in this lab 🙌. It can become tedious or onerous to create many workflows from different sets of preprocessors and/or model specifications and then calculate the performance metrics one by one.\n", "\n", "Let's see if we can address this by creating a function that fits a list of workflows on the training set and then returns the performance metrics based on the test set. We'll use `map()` and `map_dfr()` from the [purrr](https://purrr.tidyverse.org/) package to apply functions to each element in a list.\n", "\n", "> [`map()`](https://purrr.tidyverse.org/reference/map.html) functions allow you to replace many for loops with code that is both more succinct and easier to read. The best place to learn about the [`map()`](https://purrr.tidyverse.org/reference/map.html) functions is the [iteration chapter](http://r4ds.had.co.nz/iteration.html) in R for data science.\n" ] }, { "cell_type": "code", "metadata": { "id": "Qzb7LyZnimd2" }, "source": [ "set.seed(2056)\n", "\n", "# Create a metric set\n", "eval_metrics <- metric_set(ppv, sens, accuracy, f_meas)\n", "\n", "# Define a function that returns performance metrics\n", "compare_models <- function(workflow_list, train_set, test_set){\n", "\n", "  suppressWarnings(\n", "    # Fit each model to the train_set\n", "    map(workflow_list, fit, data = train_set) %>%\n", "      # Make predictions on the test set\n", "      map_dfr(augment, new_data = test_set, .id = \"model\") %>%\n", "      # Select desired columns\n", "      select(model, cuisine, .pred_class) %>%\n", "      # Evaluate model performance\n", "      group_by(model) %>%\n", "      eval_metrics(truth = cuisine, estimate = .pred_class) %>%\n", "      ungroup()\n", "  )\n", "\n", "} # End of function" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "Fwa712sNisDA" }, "source": [] }, { "cell_type": "code", "metadata": { "id": "3i4VJOi2iu-a" }, "source": [ "# Make a list of workflows\n", "workflow_list <- list(\n", "  \"svc\" = svc_linear_wf,\n", "  \"svm\" = svm_rbf_wf,\n", "  \"knn\" = knn_wf,\n", "  \"random_forest\" = rf_wf,\n", "  \"xgboost\" = boost_wf)\n", "\n", "# Call the function\n", "set.seed(2056)\n", "perf_metrics <- compare_models(workflow_list = workflow_list, train_set = cuisines_train, test_set = cuisines_test)\n", "\n", "# Print out performance metrics\n", "perf_metrics %>%\n", "  group_by(.metric) %>%\n", "  arrange(desc(.estimate)) %>%\n", "  slice_head(n = 7)\n", "\n", "# Compare accuracy\n", "perf_metrics %>%\n", "  filter(.metric == \"accuracy\") %>%\n", "  arrange(desc(.estimate))\n" ], "execution_count": null, "outputs": [] }, { "cell_type": "markdown", "metadata": { "id": "KuWK_lEli4nW" }, "source": [ "The [**workflowsets**](https://workflowsets.tidymodels.org/) package allows users to create and easily fit a large number of models, but it is mostly designed to work with resampling techniques such as `cross-validation`, an approach we are yet to cover.\n", "\n", "## **🚀Challenge**\n", "\n", "Each of these techniques has a large number of parameters that you can tweak, for instance `cost` in SVMs, `neighbors` in KNN, `mtry` (Randomly Selected Predictors) in Random Forest.\n", "\n", "Research each one's default parameters and think about what tweaking these parameters would mean for the model's quality.\n", "\n", "To find out more about a particular model and its parameters, use: `help(\"model\")` e.g. `help(\"rand_forest\")`\n", "\n", "> In practice, we usually *estimate* the *best values* for these by training many models on a `simulated data set` and measuring how well all these models perform. 
This process is called **tuning**.\n", "\n", "### [**Post-lecture quiz**](https://gray-sand-07a10f403.1.azurestaticapps.net/quiz/24/)\n", "\n", "### **Review & Self Study**\n", "\n", "There's a lot of jargon in these lessons, so take a minute to review [this list](https://docs.microsoft.com/dotnet/machine-learning/resources/glossary?WT.mc_id=academic-77952-leestott) of useful terminology!\n", "\n", "#### THANK YOU TO:\n", "\n", "[`Allison Horst`](https://twitter.com/allison_horst/) for creating the amazing illustrations that make R more welcoming and engaging. Find more illustrations at her [gallery](https://www.google.com/url?q=https://github.com/allisonhorst/stats-illustrations&sa=D&source=editors&ust=1626380772530000&usg=AOvVaw3zcfyCizFQZpkSLzxiiQEM).\n", "\n", "[Cassie Breviu](https://www.twitter.com/cassieview) and [Jen Looper](https://www.twitter.com/jenlooper) for creating the original Python version of this module ♥️\n", "\n", "Happy Learning,\n", "\n", "[Eric](https://twitter.com/ericntay), Gold Microsoft Learn Student Ambassador.\n", "\n", "Artwork by @allison_horst\n" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ "\n---\n\n**Disclaimer**: \nThis document has been translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The original document in its native language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.\n" ] } ] }