diff --git a/4-Classification/2-Classifiers-1/solution/R/lesson_11.Rmd b/4-Classification/2-Classifiers-1/solution/R/lesson_11.Rmd
index 695ac7e55..2bc9b9988 100644
--- a/4-Classification/2-Classifiers-1/solution/R/lesson_11.Rmd
+++ b/4-Classification/2-Classifiers-1/solution/R/lesson_11.Rmd
@@ -18,7 +18,7 @@ In this lesson, we'll explore a variety of classifiers to *predict a given natio
 
 ### **Preparation**
 
-This lesson builds up on our [previous lesson](https://github.com/microsoft/ML-For-Beginners/blob/main/4-Classification/1-Introduction/solution/lesson_10-R.ipynb) where we:
+This lesson builds on our [previous lesson](https://github.com/microsoft/ML-For-Beginners/blob/main/4-Classification/1-Introduction/solution/R/lesson_10.html) where we:
 
 -   Made a gentle introduction to classifications using a dataset about all the brilliant cuisines of Asia and India 😋.
 
diff --git a/4-Classification/2-Classifiers-1/solution/R/lesson_11.html b/4-Classification/2-Classifiers-1/solution/R/lesson_11.html
new file mode 100644
index 000000000..20a9d3b90
--- /dev/null
+++ b/4-Classification/2-Classifiers-1/solution/R/lesson_11.html
@@ -0,0 +1,3560 @@
[knitted page chrome and rendered output omitted; the file's embedded R Markdown source follows]
---
title: 'Build a classification model: Delicious Asian and Indian Cuisines'
output:
  html_document:
    df_print: paged
    theme: flatly
    highlight: breezedark
    toc: yes
    toc_float: yes
    code_download: yes
---

## Cuisine classifiers 1

In this lesson, we'll explore a variety of classifiers to *predict a given national cuisine based on a group of ingredients.* While doing so, we'll learn more about some of the ways that algorithms can be leveraged for classification tasks.

### [**Pre-lecture quiz**](https://gray-sand-07a10f403.1.azurestaticapps.net/quiz/21/)

### **Preparation**

This lesson builds on our [previous lesson](https://github.com/microsoft/ML-For-Beginners/blob/main/4-Classification/1-Introduction/solution/R/lesson_10.html) where we:

-   Made a gentle introduction to classifications using a dataset about all the brilliant cuisines of Asia and India 😋.

-   Explored some [dplyr verbs](https://dplyr.tidyverse.org/) to prep and clean our data.

-   Made beautiful visualizations using ggplot2.

-   Demonstrated how to deal with imbalanced data by preprocessing it using [recipes](https://recipes.tidymodels.org/articles/Simple_Example.html).

-   Demonstrated how to `prep` and `bake` our recipe to confirm that it will work as expected.

#### **Prerequisite**

For this lesson, we'll require the following packages to clean, prep and visualize our data:

-   `tidyverse`: The [tidyverse](https://www.tidyverse.org/) is a [collection of R packages](https://www.tidyverse.org/packages) designed to make data science faster, easier and more fun!

-   `tidymodels`: The [tidymodels](https://www.tidymodels.org/) framework is a [collection of packages](https://www.tidymodels.org/packages/) for modeling and machine learning.

-   `DataExplorer`: The [DataExplorer package](https://cran.r-project.org/web/packages/DataExplorer/vignettes/dataexplorer-intro.html) is meant to simplify and automate the EDA process and report generation.

-   `themis`: The [themis package](https://themis.tidymodels.org/) provides Extra Recipes Steps for Dealing with Unbalanced Data.

-   `nnet`: The [nnet package](https://cran.r-project.org/web/packages/nnet/nnet.pdf) provides functions for estimating feed-forward neural networks with a single hidden layer, and for multinomial logistic regression models.

You can install them as:

`install.packages(c("tidyverse", "tidymodels", "DataExplorer", "themis", "nnet", "here"))`

Alternatively, the script below checks whether you have the packages required to complete this module and installs them for you in case they are missing.

```{r, message=F, warning=F}
suppressWarnings(if (!require("pacman"))install.packages("pacman"))

pacman::p_load(tidyverse, tidymodels, DataExplorer, themis, here)
```

Now, let's hit the ground running!

## 1. Split the data into training and test sets.

We'll start by picking a few steps from our previous lesson.

### Drop the most common ingredients that create confusion between distinct cuisines, using `dplyr::select()`.

Everyone loves rice, garlic and ginger!

```{r recap_drop}
# Load the original cuisines data
df <- read_csv(file = "https://raw.githubusercontent.com/microsoft/ML-For-Beginners/main/4-Classification/data/cuisines.csv")

# Drop id column, rice, garlic and ginger from our original data set
df_select <- df %>% 
  select(-c(1, rice, garlic, ginger)) %>%
  # Encode cuisine column as categorical
  mutate(cuisine = factor(cuisine))

# Display new data set
df_select %>% 
  slice_head(n = 5)

# Display distribution of cuisines
df_select %>% 
  count(cuisine) %>% 
  arrange(desc(n))
```

Perfect! Now, time to split the data such that 70% of the data goes to training and 30% goes to testing. We'll also apply a `stratification` technique when splitting the data to `maintain the proportion of each cuisine` in the training and validation datasets.

[rsample](https://rsample.tidymodels.org/), a package in Tidymodels, provides infrastructure for efficient data splitting and resampling:

```{r data_split}
# Load the core Tidymodels packages into R session
library(tidymodels)

# Create split specification
set.seed(2056)
cuisines_split <- initial_split(data = df_select,
                                strata = cuisine,
                                prop = 0.7)

# Extract the data in each split
cuisines_train <- training(cuisines_split)
cuisines_test <- testing(cuisines_split)

# Print the number of cases in each split
cat("Training cases: ", nrow(cuisines_train), "\n",
    "Test cases: ", nrow(cuisines_test), sep = "")

# Display the first few rows of the training set
cuisines_train %>% 
  slice_head(n = 5)


# Display distribution of cuisines in the training set
cuisines_train %>% 
  count(cuisine) %>% 
  arrange(desc(n))


```

## 2. Deal with imbalanced data

As you might have noticed in the original data set as well as in our training set, there is quite an unequal distribution in the number of cuisines. Korean cuisines appear *almost* 3 times as often as Thai cuisines. Imbalanced data often has negative effects on the model performance. Many models perform best when the number of observations is equal and, thus, tend to struggle with unbalanced data.

There are two main ways of dealing with imbalanced data sets:

-   adding observations to the minority class: `Over-sampling`, e.g. using a SMOTE algorithm, which synthetically generates new examples of the minority class using nearest neighbors of these cases.

-   removing observations from the majority class: `Under-sampling`

In our previous lesson, we demonstrated how to deal with imbalanced data sets using a `recipe`. A recipe can be thought of as a blueprint that describes what steps should be applied to a data set in order to get it ready for data analysis. In our case, we want to have an equal distribution in the number of our cuisines for our `training set`. Let's get right into it.

```{r recap_balance}
# Load themis package for dealing with imbalanced data
library(themis)

# Create a recipe for preprocessing training data
cuisines_recipe <- recipe(cuisine ~ ., data = cuisines_train) %>% 
  step_smote(cuisine)

# Print recipe
cuisines_recipe

```
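
As an aside, if you preferred under-sampling over SMOTE, themis also provides `step_downsample()`. A minimal sketch (an alternative, not used in this lesson):

```{r, eval=FALSE}
# Alternative: under-sample the majority classes so that every cuisine
# is only as frequent as the rarest one
cuisines_recipe_down <- recipe(cuisine ~ ., data = cuisines_train) %>% 
  step_downsample(cuisine)
```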

You can of course go ahead and confirm (using `prep()` + `bake()`) that the recipe will work as you expect, with all the cuisine labels having `559` observations.
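
A minimal sketch of that check (optional; `bake(new_data = NULL)` returns the processed training set):

```{r, eval=FALSE}
# Estimate the recipe on the training set, then bake it to confirm
# that every cuisine now has the same number of observations
cuisines_recipe %>% 
  prep() %>% 
  bake(new_data = NULL) %>% 
  count(cuisine)
```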

Since we'll be using this recipe as a preprocessor for modeling, a `workflow()` will do all the prep and bake for us, so we won't have to manually estimate the recipe.

Now we are ready to train a model 👩‍💻👨‍💻!

## 3. Choosing your classifier

![Artwork by \@allison_horst](../../images/parsnip.jpg){width="600"}

Now we have to decide which algorithm to use for the job 🤔.

In Tidymodels, the [`parsnip package`](https://parsnip.tidymodels.org/index.html) provides a consistent interface for working with models across different engines (packages). Please see the parsnip documentation to explore [model types & engines](https://www.tidymodels.org/find/parsnip/#models) and their corresponding [model arguments](https://www.tidymodels.org/find/parsnip/#model-args). The variety is quite bewildering at first sight. For instance, the following methods all include classification techniques:

-   C5.0 Rule-Based Classification Models

-   Flexible Discriminant Models

-   Linear Discriminant Models

-   Regularized Discriminant Models

-   Logistic Regression Models

-   Multinomial Regression Models

-   Naive Bayes Models

-   Support Vector Machines

-   Nearest Neighbors

-   Decision Trees

-   Ensemble methods

-   Neural Networks

The list goes on!

### **What classifier to go with?**

So, which classifier should you choose? Often, running through several and comparing the results is a reasonable way to test.

> AutoML solves this problem neatly by running these comparisons in the cloud, allowing you to choose the best algorithm for your data. Try it [here](https://docs.microsoft.com/learn/modules/automate-model-selection-with-azure-automl/?WT.mc_id=academic-77952-leestott).

Also, the choice of classifier depends on our problem. For instance, when the outcome can be categorized into `more than two classes`, like in our case, you must use a `multiclass classification algorithm` as opposed to `binary classification`.

### **A better approach**

A better way than wildly guessing, however, is to follow the ideas on this downloadable [ML Cheat sheet](https://docs.microsoft.com/azure/machine-learning/algorithm-cheat-sheet?WT.mc_id=academic-77952-leestott). Here, we discover that, for our multiclass problem, we have some choices:

![A section of Microsoft's Algorithm Cheat Sheet, detailing multiclass classification options](../../images/cheatsheet.png){width="500"}

### **Reasoning**

Let's see if we can reason our way through different approaches given the constraints we have:

-   **Deep Neural networks are too heavy**. Given our clean, but minimal dataset, and the fact that we are running training locally via notebooks, deep neural networks are too heavyweight for this task.

-   **No two-class classifier**. We do not use a two-class classifier, so that rules out one-vs-all.

-   **Decision tree or logistic regression could work**. A decision tree might work, or multinomial regression/multiclass logistic regression for multiclass data.

-   **Multiclass Boosted Decision Trees solve a different problem**. The multiclass boosted decision tree is most suitable for nonparametric tasks, e.g. tasks designed to build rankings, so it is not useful for us.

Also, normally, before embarking on more complex machine learning models, e.g. ensemble methods, it's a good idea to build the simplest possible model to get an idea of what is going on. So for this lesson, we'll start with a `multinomial logistic regression` model.

> Logistic regression is a technique used when the outcome variable is categorical (or nominal). For binary logistic regression the number of outcome categories is two, whereas for multinomial logistic regression it is more than two. See [Advanced Regression Methods](https://bookdown.org/chua/ber642_advanced_regression/multinomial-logistic-regression.html) for further reading.
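
For the mathematically inclined: the multinomial model estimates one linear predictor per class and turns them into probabilities with the softmax function (a standard formulation, stated here for orientation rather than taken from the lesson's code):

$$P(y = k \mid \mathbf{x}) = \frac{e^{\beta_{0k} + \boldsymbol{\beta}_k^\top \mathbf{x}}}{\sum_{j=1}^{K} e^{\beta_{0j} + \boldsymbol{\beta}_j^\top \mathbf{x}}}$$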

## 4. Train and evaluate a Multinomial logistic regression model.

In Tidymodels, `parsnip::multinom_reg()` defines a model that uses linear predictors to predict multiclass data using the multinomial distribution. See `?multinom_reg()` for the different ways/engines you can use to fit this model.

For this example, we'll fit a Multinomial regression model via the default [nnet](https://cran.r-project.org/web/packages/nnet/nnet.pdf) engine.

> I picked a value for `penalty` sort of randomly. There are better ways to choose this value, that is, by using `resampling` and `tuning` the model, which we'll discuss later.
>
> See [Tidymodels: Get Started](https://www.tidymodels.org/start/tuning/) in case you want to learn more on how to tune model hyperparameters.
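
To preview what that will look like, here is a sketch (not evaluated here; `tune()` ships with tidymodels) of the same specification with a tunable penalty:

```{r, eval=FALSE}
# Later on, instead of fixing penalty, we could flag it for tuning
tune_spec <- multinom_reg(penalty = tune()) %>%
  set_engine("nnet") %>%
  set_mode("classification")
```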

```{r multinorm_reg}
# Create a multinomial regression model specification
mr_spec <- multinom_reg(penalty = 1) %>% 
  set_engine("nnet", MaxNWts = 2086) %>% 
  set_mode("classification")

# Print model specification
mr_spec

```

Great job 🥳! Now that we have a recipe and a model specification, we need to find a way of bundling them together into an object that will first preprocess the data, then fit the model on the preprocessed data, and also allow for potential post-processing activities. In Tidymodels, this convenient object is called a [`workflow`](https://workflows.tidymodels.org/) and conveniently holds your modeling components! This is what we'd call *pipelines* in *Python*.

So let's bundle everything up into a workflow!📦

```{r workflow}
# Bundle recipe and model specification
mr_wf <- workflow() %>% 
  add_recipe(cuisines_recipe) %>% 
  add_model(mr_spec)

# Print out workflow
mr_wf

```

Workflows 👌👌! A **`workflow()`** can be fit in much the same way a model can. So, time to train a model!

```{r train}
# Train a multinomial regression model
mr_fit <- fit(object = mr_wf, data = cuisines_train)

mr_fit
```

The output shows the coefficients that the model learned during training.
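
If you would rather have those coefficients in a tidy tibble than a printout, one way (a sketch using `broom`, which tidymodels loads) is:

```{r, eval=FALSE}
# Pull the underlying nnet fit out of the workflow and tidy its coefficients
mr_fit %>% 
  extract_fit_parsnip() %>% 
  tidy() %>% 
  slice_head(n = 10)
```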

### Evaluate the Trained Model

It's time to see how the model performed 📏 by evaluating it on a test set! Let's begin by making predictions on the test set.

```{r test}
# Make predictions on the test set
results <- cuisines_test %>% select(cuisine) %>% 
  bind_cols(mr_fit %>% predict(new_data = cuisines_test))

# Print out results
results %>% 
  slice_head(n = 5)

```

Great job! In Tidymodels, evaluating model performance can be done using [yardstick](https://yardstick.tidymodels.org/) - a package used to measure the effectiveness of models using performance metrics. As we did in our logistic regression lesson, let's begin by computing a confusion matrix.

```{r conf_mat}
# Confusion matrix for categorical data
conf_mat(data = results, truth = cuisine, estimate = .pred_class)


```

When dealing with multiple classes, it's generally more intuitive to visualize this as a heat map, like this:

```{r conf_viz}
update_geom_defaults(geom = "tile", new = list(color = "black", alpha = 0.7))
# Visualize confusion matrix
results %>% 
  conf_mat(cuisine, .pred_class) %>% 
  autoplot(type = "heatmap")
```

The darker squares in the confusion matrix plot indicate high numbers of cases, and you can hopefully see a diagonal line of darker squares indicating cases where the predicted and actual label are the same.
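
If you prefer proportions over counts, `autoplot()` also supports a mosaic view; a quick sketch:

```{r, eval=FALSE}
# Same confusion matrix, drawn as a mosaic plot
results %>% 
  conf_mat(cuisine, .pred_class) %>% 
  autoplot(type = "mosaic")
```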

Let's now calculate summary statistics for the confusion matrix.

```{r conf_stats}
# Summary stats for confusion matrix
conf_mat(data = results, truth = cuisine, estimate = .pred_class) %>% summary()
```

If we narrow down to some metrics such as accuracy, sensitivity and ppv, we are not badly off for a start 🥳!
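
If you only want that handful of metrics rather than the full summary, a `yardstick::metric_set()` sketch:

```{r, eval=FALSE}
# Bundle just the metrics we care about into one callable metric set
eval_metrics <- metric_set(accuracy, sens, ppv)
eval_metrics(results, truth = cuisine, estimate = .pred_class)
```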

## 5. Digging Deeper

Let's ask one subtle question: what criterion is used to settle on a given type of cuisine as the predicted outcome?

Well, statistical machine learning algorithms, like logistic regression, are based on `probability`; so what actually gets predicted by a classifier is a probability distribution over a set of possible outcomes. The class with the highest probability is then chosen as the most likely outcome for the given observations.

Let's see this in action by making both hard class predictions and probabilities.

```{r pred_prob}
# Make hard class prediction and probabilities
results_prob <- cuisines_test %>%
  select(cuisine) %>% 
  bind_cols(mr_fit %>% predict(new_data = cuisines_test)) %>% 
  bind_cols(mr_fit %>% predict(new_data = cuisines_test, type = "prob"))

# Print out results
results_prob %>% 
  slice_head(n = 5)
  

```

Much better!

✅ Can you explain why the model is pretty sure that the first observation is Thai?
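
One way to poke at that question (a sketch; it reuses the `results_prob` tibble from above) is to look at how the probability mass is spread for that first case:

```{r, eval=FALSE}
# Sort the predicted class probabilities for the first test observation
results_prob %>% 
  slice_head(n = 1) %>% 
  select(where(is.numeric)) %>% 
  pivot_longer(everything(), names_to = "class", values_to = "prob") %>% 
  arrange(desc(prob))
```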

## **🚀Challenge**

In this lesson, you used your cleaned data to build a machine learning model that can predict a national cuisine based on a series of ingredients. Take some time to read through the [many options](https://www.tidymodels.org/find/parsnip/#models) Tidymodels provides to classify data and [other ways](https://parsnip.tidymodels.org/articles/articles/Examples.html#multinom_reg-models) to fit multinomial regression.

#### THANK YOU TO:

[`Allison Horst`](https://twitter.com/allison_horst/) for creating the amazing illustrations that make R more welcoming and engaging. Find more illustrations at her [gallery](https://www.google.com/url?q=https://github.com/allisonhorst/stats-illustrations&sa=D&source=editors&ust=1626380772530000&usg=AOvVaw3zcfyCizFQZpkSLzxiiQEM).

[Cassie Breviu](https://www.twitter.com/cassieview) and [Jen Looper](https://www.twitter.com/jenlooper) for creating the original Python version of this module ♥️

Happy Learning,

[Eric](https://twitter.com/ericntay), Gold Microsoft Learn Student Ambassador.

diff --git a/4-Classification/3-Classifiers-2/solution/R/lesson_12.html b/4-Classification/3-Classifiers-2/solution/R/lesson_12.html
new file mode 100644
index 000000000..363352e09
--- /dev/null
+++ b/4-Classification/3-Classifiers-2/solution/R/lesson_12.html
@@ -0,0 +1,3615 @@
[page chrome and rendered sections that duplicate the embedded R Markdown source at the end of this file omitted; the rendered sections below have no source copy and are kept, cleaned up]

```{r}
# Load themis package for dealing with imbalanced data
library(themis)

# Create a recipe for preprocessing training data
cuisines_recipe <- recipe(cuisine ~ ., data = cuisines_train) %>%
  step_smote(cuisine)

# Print recipe
cuisines_recipe
```

Now we are ready to train models 👩‍💻👨‍💻!

## 3. Beyond multinomial regression models

In our previous lesson, we looked at multinomial regression models. Let's explore some more flexible models for classification.

### Support Vector Machines

In the context of classification, `Support Vector Machines` is a machine learning technique that tries to find a *hyperplane* that "best" separates the classes. Let's look at a simple example:

*Image: by User:ZackWeinberg, derived from https://commons.wikimedia.org/w/index.php?curid=22877598*

H1 does not separate the classes. H2 does, but only with a small margin. H3 separates them with the maximal margin.

#### Linear Support Vector Classifier

Support-Vector clustering (SVC) is a child of the Support-Vector machines family of ML techniques. In SVC, the hyperplane is chosen to correctly separate `most` of the training observations, but `may misclassify` a few observations. By allowing some points to be on the wrong side, the SVM becomes more robust to outliers and hence generalizes better to new data. The parameter that regulates this violation is referred to as `cost`, which has a default value of 1 (see `help("svm_poly")`).

Let's create a linear SVC by setting `degree = 1` in a polynomial SVM model.

```{r}
# Make a linear SVC specification
svc_linear_spec <- svm_poly(degree = 1) %>% 
  set_engine("kernlab") %>% 
  set_mode("classification")

# Bundle specification and recipe into a workflow
svc_linear_wf <- workflow() %>% 
  add_recipe(cuisines_recipe) %>% 
  add_model(svc_linear_spec)

# Print out workflow
svc_linear_wf
```

Now that we have captured the preprocessing steps and model specification into a *workflow*, we can go ahead and train the linear SVC and evaluate results while at it. For performance metrics, let's create a metric set that will evaluate: `accuracy`, `sensitivity`, `Positive Predicted Value` and `F Measure`.

> `augment()` will add column(s) for predictions to the given data.

```{r}
# Train a linear SVC model
svc_linear_fit <- svc_linear_wf %>% 
  fit(data = cuisines_train)

# Create a metric set
eval_metrics <- metric_set(ppv, sens, accuracy, f_meas)

# Make predictions and evaluate model performance
svc_linear_fit %>% 
  augment(new_data = cuisines_test) %>% 
  eval_metrics(truth = cuisine, estimate = .pred_class)
```
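
If you want to experiment with the `cost` parameter mentioned above, a hedged sketch (a variant for exploration, not part of the lesson's flow; `cost = 2` is an arbitrary illustrative value):

```{r, eval=FALSE}
# A stricter margin: penalize margin violations more heavily
svc_strict_spec <- svm_poly(degree = 1, cost = 2) %>% 
  set_engine("kernlab") %>% 
  set_mode("classification")
```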

#### Support Vector Machine

The support vector machine (SVM) is an extension of the support vector classifier in order to accommodate a non-linear boundary between the classes. In essence, SVMs use the *kernel trick* to enlarge the feature space to adapt to nonlinear relationships between classes. One popular and extremely flexible kernel function used by SVMs is the *Radial basis function*. Let's see how it will perform on our data.

```{r}
set.seed(2056)

# Make an RBF SVM specification
svm_rbf_spec <- svm_rbf() %>% 
  set_engine("kernlab") %>% 
  set_mode("classification")

# Bundle specification and recipe into a workflow
svm_rbf_wf <- workflow() %>% 
  add_recipe(cuisines_recipe) %>% 
  add_model(svm_rbf_spec)

# Train an RBF model
svm_rbf_fit <- svm_rbf_wf %>% 
  fit(data = cuisines_train)

# Make predictions and evaluate model performance
svm_rbf_fit %>% 
  augment(new_data = cuisines_test) %>% 
  eval_metrics(truth = cuisine, estimate = .pred_class)
```

Much better 🤩!

> ✅ Please see `help("svm_rbf")` and the [kernlab](https://cran.r-project.org/package=kernlab) documentation for further reading.

### Nearest Neighbor classifiers

*K*-nearest neighbor (KNN) is an algorithm in which each observation is predicted based on its *similarity* to other observations.

Let's fit one to our data.

```{r}
# Make a KNN specification
knn_spec <- nearest_neighbor() %>% 
  set_engine("kknn") %>% 
  set_mode("classification")

# Bundle recipe and model specification into a workflow
knn_wf <- workflow() %>% 
  add_recipe(cuisines_recipe) %>% 
  add_model(knn_spec)

# Train a KNN model
knn_wf_fit <- knn_wf %>% 
  fit(data = cuisines_train)

# Make predictions and evaluate model performance
knn_wf_fit %>% 
  augment(new_data = cuisines_test) %>% 
  eval_metrics(truth = cuisine, estimate = .pred_class)
```

It appears that this model is not performing that well. Probably changing the model's arguments (see `help("nearest_neighbor")`) will improve model performance. Be sure to try it out.

> ✅ Please see `help("nearest_neighbor")` and the [kknn](https://cran.r-project.org/package=kknn) documentation to learn more about *K*-Nearest Neighbors classifiers.
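
For instance, a quick experiment could look like this sketch (`neighbors` and `weight_func` are standard `nearest_neighbor()` arguments; the values here are illustrative guesses, not tuned):

```{r, eval=FALSE}
# A sketch: more neighbors, with distance-weighted voting
knn_tweaked_spec <- nearest_neighbor(neighbors = 15, weight_func = "gaussian") %>% 
  set_engine("kknn") %>% 
  set_mode("classification")
```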


### Ensemble classifiers

Ensemble algorithms work by combining multiple base estimators to produce an optimal model either by:

-   `bagging`: applying an *averaging function* to a collection of base models

-   `boosting`: building a sequence of models that build on one another to improve predictive performance.

Let's start by trying out a Random Forest model, which builds a large collection of decision trees then applies an averaging function for a better overall model.

```{r}
# Make a random forest specification
rf_spec <- rand_forest() %>% 
  set_engine("ranger") %>% 
  set_mode("classification")

# Bundle recipe and model specification into a workflow
rf_wf <- workflow() %>% 
  add_recipe(cuisines_recipe) %>% 
  add_model(rf_spec)

# Train a random forest model
rf_wf_fit <- rf_wf %>% 
  fit(data = cuisines_train)

# Make predictions and evaluate model performance
rf_wf_fit %>% 
  augment(new_data = cuisines_test) %>% 
  eval_metrics(truth = cuisine, estimate = .pred_class)
```

Good job 👏!

Let's also experiment with a Boosted Tree model.

Boosted Tree defines an ensemble method that creates a series of sequential decision trees where each tree depends on the results of previous trees in an attempt to incrementally reduce the error. It focuses on the weights of incorrectly classified items and adjusts the fit for the next classifier to correct.

There are different ways to fit this model (see `help("boost_tree")`). In this example, we'll fit Boosted trees via the `xgboost` engine.

```{r}
# Make a boosted tree specification
boost_spec <- boost_tree(trees = 200) %>% 
  set_engine("xgboost") %>% 
  set_mode("classification")

# Bundle recipe and model specification into a workflow
boost_wf <- workflow() %>% 
  add_recipe(cuisines_recipe) %>% 
  add_model(boost_spec)

# Train a boosted tree model
boost_wf_fit <- boost_wf %>% 
  fit(data = cuisines_train)

# Make predictions and evaluate model performance
boost_wf_fit %>% 
  augment(new_data = cuisines_test) %>% 
  eval_metrics(truth = cuisine, estimate = .pred_class)
```

> ✅ Please see `help("rand_forest")` and `help("boost_tree")` to learn more about Ensemble classifiers.

## 4. Extra - comparing multiple models

We have fitted quite a number of models in this lab 🙌. It can become tedious or onerous to create a lot of workflows from different sets of preprocessors and/or model specifications and then calculate the performance metrics one by one.

Let's see if we can address this by creating a function that fits a list of workflows on the training set, then returns the performance metrics based on the test set. We'll get to use `map()` and `map_dfr()` from the [purrr](https://purrr.tidyverse.org/) package to apply functions to each element in a list.

> `map()` functions allow you to replace many for loops with code that is both more succinct and easier to read. The best place to learn about the `map()` functions is the [iteration chapter](https://r4ds.had.co.nz/iteration.html) in R for Data Science.

```{r}
set.seed(2056)

# Create a metric set
eval_metrics <- metric_set(ppv, sens, accuracy, f_meas)

# Define a function that returns performance metrics
compare_models <- function(workflow_list, train_set, test_set){
  
  suppressWarnings(
    # Fit each model to the train_set
    map(workflow_list, fit, data = train_set) %>% 
      # Make predictions on the test set
      map_dfr(augment, new_data = test_set, .id = "model") %>%
      # Select desired columns
      select(model, cuisine, .pred_class) %>% 
      # Evaluate model performance
      group_by(model) %>% 
      eval_metrics(truth = cuisine, estimate = .pred_class) %>% 
      ungroup()
  )
  
} # End of function
```

Let's call our function and compare the accuracy across the models.

```{r}
# Make a list of workflows
workflow_list <- list(
  "svc" = svc_linear_wf,
  "svm" = svm_rbf_wf,
  "knn" = knn_wf,
  "random_forest" = rf_wf,
  "xgboost" = boost_wf)

# Call the function
set.seed(2056)
perf_metrics <- compare_models(workflow_list = workflow_list, train_set = cuisines_train, test_set = cuisines_test)

# Print out performance metrics
perf_metrics %>% 
  group_by(.metric) %>% 
  arrange(desc(.estimate)) %>% 
  slice_head(n = 7)

# Compare accuracy
perf_metrics %>% 
  filter(.metric == "accuracy") %>% 
  arrange(desc(.estimate))
```

> The [workflowsets](https://workflowsets.tidymodels.org/) package allows users to create and easily fit a large number of models, but it is mostly designed to work with resampling techniques such as *cross-validation*, an approach we are yet to cover.

## **🚀Challenge**

Each of these techniques has a large number of parameters that you can tweak, for instance `cost` in SVMs, `neighbors` in KNN and `mtry` (Randomly Selected Predictors) in Random Forest.

Research each one's default parameters and think about what tweaking these parameters would mean for the model's quality.

To find out more about a particular model and its parameters, use: `help("model")`, e.g. `help("rand_forest")`.

> In practice, we usually *estimate* the *best values* for these by training many models on a *simulated data set* and measuring how well all these models perform. This process is called **tuning**.
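
As a taste of what's ahead, a sketch of tuning `mtry` (not run here; `tune()`, `tune_grid()` and `vfold_cv()` all ship with tidymodels, and the grid size of 5 is an arbitrary illustrative choice):

```{r, eval=FALSE}
# Flag mtry for tuning, then search for a good value over resamples
rf_tune_spec <- rand_forest(mtry = tune()) %>% 
  set_engine("ranger") %>% 
  set_mode("classification")

rf_tune_wf <- workflow() %>% 
  add_recipe(cuisines_recipe) %>% 
  add_model(rf_tune_spec)

rf_tune_results <- tune_grid(rf_tune_wf,
                             resamples = vfold_cv(cuisines_train, v = 5),
                             grid = 5)
```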


### **Post-lecture quiz**

### **Review & Self Study**

There's a lot of jargon in these lessons, so take a minute to review this list of useful terminology!

#### THANK YOU TO:

[`Allison Horst`](https://twitter.com/allison_horst/) for creating the amazing illustrations that make R more welcoming and engaging. Find more illustrations at her [gallery](https://www.google.com/url?q=https://github.com/allisonhorst/stats-illustrations&sa=D&source=editors&ust=1626380772530000&usg=AOvVaw3zcfyCizFQZpkSLzxiiQEM).

[Cassie Breviu](https://www.twitter.com/cassieview) and [Jen Looper](https://www.twitter.com/jenlooper) for creating the original Python version of this module ♥️

Happy Learning,

[Eric](https://twitter.com/ericntay), Gold Microsoft Learn Student Ambassador.

*Artwork by \@allison_horst*
---
title: 'Build a classification model: Delicious Asian and Indian Cuisines'
output:
  html_document:
    df_print: paged
    theme: flatly
    highlight: breezedark
    toc: yes
    toc_float: yes
    code_download: yes
---

## Cuisine classifiers 2

In this second classification lesson, we will explore `more ways` to classify categorical data. We will also learn about the ramifications of choosing one classifier over the other.

### [**Pre-lecture quiz**](https://gray-sand-07a10f403.1.azurestaticapps.net/quiz/23/)

### **Prerequisite**

We assume that you have completed the previous lessons since we will be carrying forward some concepts we learned before.

For this lesson, we'll require the following packages (plus the `kernlab`, `ranger`, `xgboost` and `kknn` model engines):

-   `tidyverse`: The [tidyverse](https://www.tidyverse.org/) is a [collection of R packages](https://www.tidyverse.org/packages) designed to make data science faster, easier and more fun!

-   `tidymodels`: The [tidymodels](https://www.tidymodels.org/) framework is a [collection of packages](https://www.tidymodels.org/packages/) for modeling and machine learning.

-   `themis`: The [themis package](https://themis.tidymodels.org/) provides Extra Recipes Steps for Dealing with Unbalanced Data.

You can install them as:

`install.packages(c("tidyverse", "tidymodels", "kernlab", "themis", "ranger", "xgboost", "kknn"))`

Alternatively, the script below checks whether you have the packages required to complete this module and installs them for you in case they are missing.

```{r, message=F, warning=F}
suppressWarnings(if (!require("pacman"))install.packages("pacman"))

pacman::p_load(tidyverse, tidymodels, themis, kernlab, ranger, xgboost, kknn)
```

Now, let's hit the ground running!

## **1. A classification map**

In our [previous lesson](https://github.com/microsoft/ML-For-Beginners/tree/main/4-Classification/2-Classifiers-1), we tried to address the question: how do we choose between multiple models? To a great extent, it depends on the characteristics of the data and the type of problem we want to solve (for instance, classification or regression?).

Previously, we learned about the various options you have when classifying data using Microsoft's cheat sheet. Python's Machine Learning framework, Scikit-learn, offers a similar but more granular cheat sheet that can further help narrow down your estimators (another term for classifiers):

![](../../images/map.png){width="650"}\

> Tip: [visit this map online](https://scikit-learn.org/stable/tutorial/machine_learning_map/) and click along the path to read documentation.
>
> The [Tidymodels reference site](https://www.tidymodels.org/find/parsnip/#models) also provides an excellent documentation about different types of model.

### **The plan** 🗺️

This map is very helpful once you have a clear grasp of your data, as you can 'walk' along its paths to a decision:

-   We have \>50 samples

-   We want to predict a category

-   We have labeled data

-   We have fewer than 100K samples

-   ✨ We can choose a Linear SVC

-   If that doesn't work, since we have numeric data

    -   We can try a ✨ KNeighbors Classifier

        -   If that doesn't work, try ✨ SVC and ✨ Ensemble Classifiers

This is a very helpful trail to follow. Now, let's get right into it using the [tidymodels](https://www.tidymodels.org/) modelling framework: a consistent and flexible collection of R packages developed to encourage good statistical practice 😊.

## 2. Split the data and deal with imbalanced data set.

From our previous lessons, we learnt that there was a set of common ingredients across our cuisines. Also, there was quite an unequal distribution in the number of observations per cuisine.

We'll deal with these by

-   Dropping the most common ingredients that create confusion between distinct cuisines, using `dplyr::select()`.

-   Using a `recipe` that preprocesses the data to get it ready for modelling by applying an `over-sampling` algorithm.

We already looked at the above in the previous lesson so this should be a breeze 🥳!

```{r clean_imbalance}
# Load the core Tidyverse and Tidymodels packages
library(tidyverse)
library(tidymodels)

# Load the original cuisines data
df <- read_csv(file = "https://raw.githubusercontent.com/microsoft/ML-For-Beginners/main/4-Classification/data/cuisines.csv")

# Drop id column, rice, garlic and ginger from our original data set
df_select <- df %>% 
  select(-c(1, rice, garlic, ginger)) %>%
  # Encode cuisine column as categorical
  mutate(cuisine = factor(cuisine))


# Create data split specification
set.seed(2056)
cuisines_split <- initial_split(data = df_select,
                                strata = cuisine,
                                prop = 0.7)

# Extract the data in each split
cuisines_train <- training(cuisines_split)
cuisines_test <- testing(cuisines_split)

# Display distribution of cuisines in the training set
cuisines_train %>% 
  count(cuisine) %>% 
  arrange(desc(n))


```

### Deal with imbalanced data

Imbalanced data often has negative effects on model performance. Many models perform best when the number of observations in each class is equal and, thus, tend to struggle with unbalanced data.

There are two main ways of dealing with imbalanced data sets:

-   adding observations to the minority class: `Over-sampling`, e.g. using the SMOTE algorithm, which synthetically generates new examples of the minority class using nearest neighbors of these cases.

-   removing observations from the majority class: `Under-sampling` (see the sketch below).
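
For completeness, here is a minimal, hedged sketch of the under-sampling route (not used in this lesson; it assumes the `cuisines_train` split created above). `themis::step_downsample()` trims the majority classes down instead of synthesizing new minority cases:

```{r downsample_sketch}
# Load themis for sampling-based recipe steps
library(themis)

# A recipe that under-samples majority cuisines to match the minority class
downsample_recipe <- recipe(cuisine ~ ., data = cuisines_train) %>% 
  step_downsample(cuisine)
```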

In our previous lesson, we demonstrated how to deal with imbalanced data sets using a `recipe`. A recipe can be thought of as a blueprint that describes what steps should be applied to a data set in order to get it ready for data analysis. In our case, we want to have an equal distribution in the number of our cuisines for our `training set`. Let's get right into it.

```{r recap_balance}
# Load themis package for dealing with imbalanced data
library(themis)

# Create a recipe for preprocessing training data
cuisines_recipe <- recipe(cuisine ~ ., data = cuisines_train) %>%
  step_smote(cuisine) 

# Print recipe
cuisines_recipe

```
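
As a quick sanity check, we can `prep()` and `bake()` this recipe (as we did in the previous lesson) to confirm that SMOTE evens out the class counts:

```{r check_smote}
# Estimate the recipe on the training data, then extract the processed data
cuisines_recipe %>% 
  prep() %>% 
  bake(new_data = NULL) %>% 
  count(cuisine)
```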

Now we are ready to train models 👩‍💻👨‍💻!

## 3. Beyond multinomial regression models

In our previous lesson, we looked at multinomial regression models. Let's explore some more flexible models for classification.

### Support Vector Machines.

In the context of classification, a `Support Vector Machine` is a machine learning technique that tries to find a *hyperplane* that "best" separates the classes. Let's look at a simple example:

![By User:ZackWeinberg:This file was derived from: <https://commons.wikimedia.org/w/index.php?curid=22877598>](../../images/svm.png){width="300"}

H~1~ does not separate the classes. H~2~ does, but only with a small margin. H~3~ separates them with the maximal margin.

#### Linear Support Vector Classifier

The support vector classifier (SVC) is a child of the support vector machines family of ML techniques. In SVC, the hyperplane is chosen to correctly separate `most` of the training observations, but `may misclassify` a few of them. By allowing some points to be on the wrong side, the SVM becomes more robust to outliers and hence generalizes better to new data. The parameter that regulates this violation is referred to as `cost`, which has a default value of 1 (see `help("svm_poly")`).
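
As an illustration only (the value `4` below is arbitrary, not a tuned choice), here is how you would override that default when building a specification:

```{r cost_sketch}
# A linear SVC that penalizes margin violations more heavily than the default
svm_poly(degree = 1, cost = 4) %>% 
  set_engine("kernlab") %>% 
  set_mode("classification")
```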

Let's create a linear SVC by setting `degree = 1` in a polynomial SVM model.

```{r svc_spec}
# Make a linear SVC specification
svc_linear_spec <- svm_poly(degree = 1) %>% 
  set_engine("kernlab") %>% 
  set_mode("classification")

# Bundle specification and recipe into a workflow
svc_linear_wf <- workflow() %>% 
  add_recipe(cuisines_recipe) %>% 
  add_model(svc_linear_spec)

# Print out workflow
svc_linear_wf
```

Now that we have captured the preprocessing steps and model specification into a *workflow*, we can go ahead and train the linear SVC and evaluate results while at it. For performance metrics, let's create a metric set that will evaluate: `accuracy`, `sensitivity`, `Positive Predictive Value` and `F Measure`.

> `augment()` will add column(s) for predictions to the given data.

```{r svc_train}
# Train a linear SVC model
svc_linear_fit <- svc_linear_wf %>% 
  fit(data = cuisines_train)

# Create a metric set
eval_metrics <- metric_set(ppv, sens, accuracy, f_meas)


# Make predictions and Evaluate model performance
svc_linear_fit %>% 
  augment(new_data = cuisines_test) %>% 
  eval_metrics(truth = cuisine, estimate = .pred_class)
  


```
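
Single-number metrics don't show *which* cuisines get confused with one another. yardstick also provides `conf_mat()`, so here's a short sketch (using the fit above) that cross-tabulates true and predicted cuisines:

```{r svc_conf_mat}
# Cross-tabulate true and predicted cuisines
svc_linear_fit %>% 
  augment(new_data = cuisines_test) %>% 
  conf_mat(truth = cuisine, estimate = .pred_class)
```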


#### Support Vector Machine

The support vector machine (SVM) is an extension of the support vector classifier in order to accommodate a non-linear boundary between the classes. In essence, SVMs use the *kernel trick* to enlarge the feature space to adapt to nonlinear relationships between classes. One popular and extremely flexible kernel function used by SVMs is the *Radial basis function.* Let's see how it will perform on our data.

```{r svm_rbf}
set.seed(2056)

# Make an RBF SVM specification
svm_rbf_spec <- svm_rbf() %>% 
  set_engine("kernlab") %>% 
  set_mode("classification")

# Bundle specification and recipe into a workflow
svm_rbf_wf <- workflow() %>% 
  add_recipe(cuisines_recipe) %>% 
  add_model(svm_rbf_spec)


# Train an RBF model
svm_rbf_fit <- svm_rbf_wf %>% 
  fit(data = cuisines_train)


# Make predictions and Evaluate model performance
svm_rbf_fit %>% 
  augment(new_data = cuisines_test) %>% 
  eval_metrics(truth = cuisine, estimate = .pred_class)
```

Much better 🤩!

> ✅ Please see:
>
> -   [*Support Vector Machines*](https://bradleyboehmke.github.io/HOML/svm.html), Hands-on Machine Learning with R
>
> -   [*Support Vector Machines*](https://www.statlearning.com/), An Introduction to Statistical Learning with Applications in R
>
> for further reading.

### Nearest Neighbor classifiers

*K*-nearest neighbor (KNN) is an algorithm in which each observation is predicted based on its *similarity* to other observations.

Let's fit one to our data.

```{r knn}
# Make a KNN specification
knn_spec <- nearest_neighbor() %>% 
  set_engine("kknn") %>% 
  set_mode("classification")

# Bundle recipe and model specification into a workflow
knn_wf <- workflow() %>% 
  add_recipe(cuisines_recipe) %>% 
  add_model(knn_spec)

# Train a KNN model
knn_wf_fit <- knn_wf %>% 
  fit(data = cuisines_train)


# Make predictions and Evaluate model performance
knn_wf_fit %>% 
  augment(new_data = cuisines_test) %>% 
  eval_metrics(truth = cuisine, estimate = .pred_class)
```

It appears that this model is not performing that well. Perhaps changing the model's arguments (see `help("nearest_neighbor")`) would improve model performance. Be sure to try it out.
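
For instance, here is a hedged sketch of a KNN specification with non-default arguments (the values are illustrative, not tuned; see `help("nearest_neighbor")` for the defaults):

```{r knn_args_sketch}
# A KNN spec with a wider neighborhood and triangular distance weighting
nearest_neighbor(neighbors = 11, weight_func = "triangular") %>% 
  set_engine("kknn") %>% 
  set_mode("classification")
```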

> ✅ Please see:
>
> -   [Hands-on Machine Learning with R](https://bradleyboehmke.github.io/HOML/)
>
> -   [An Introduction to Statistical Learning with Applications in R](https://www.statlearning.com/)
>
> to learn more about *K*-Nearest Neighbors classifiers.

### Ensemble classifiers

Ensemble algorithms work by combining multiple base estimators to produce an optimal model either by:

-   `bagging`: applying an *averaging function* to a collection of base models

-   `boosting`: building a sequence of models that build on one another to improve predictive performance.

Let's start by trying out a Random Forest model, which builds a large collection of decision trees, then applies an averaging function for a better overall model.

```{r rf}
# Make a random forest specification
rf_spec <- rand_forest() %>% 
  set_engine("ranger") %>% 
  set_mode("classification")

# Bundle recipe and model specification into a workflow
rf_wf <- workflow() %>% 
  add_recipe(cuisines_recipe) %>% 
  add_model(rf_spec)

# Train a random forest model
rf_wf_fit <- rf_wf %>% 
  fit(data = cuisines_train)


# Make predictions and Evaluate model performance
rf_wf_fit %>% 
  augment(new_data = cuisines_test) %>% 
  eval_metrics(truth = cuisine, estimate = .pred_class)
  

```

Good job 👏!

Let's also experiment with a Boosted Tree model.

A Boosted Tree is an ensemble method that creates a series of sequential decision trees, where each tree depends on the results of previous trees in an attempt to incrementally reduce the error. It focuses on the weights of incorrectly classified items and adjusts the fit of the next classifier to correct them.

There are different ways to fit this model (see `help("boost_tree")`). In this example, we'll fit Boosted trees via the `xgboost` engine.

```{r boosted_tree}
# Make a boosted tree specification
boost_spec <- boost_tree(trees = 200) %>% 
  set_engine("xgboost") %>% 
  set_mode("classification")

# Bundle recipe and model specification into a workflow
boost_wf <- workflow() %>% 
  add_recipe(cuisines_recipe) %>% 
  add_model(boost_spec)

# Train a boosted tree model
boost_wf_fit <- boost_wf %>% 
  fit(data = cuisines_train)


# Make predictions and Evaluate model performance
boost_wf_fit %>% 
  augment(new_data = cuisines_test) %>% 
  eval_metrics(truth = cuisine, estimate = .pred_class)
```

> ✅ Please see:
>
> -   [Machine Learning for Social Scientists](https://cimentadaj.github.io/ml_socsci/tree-based-methods.html#random-forests)
>
> -   [Hands-on Machine Learning with R](https://bradleyboehmke.github.io/HOML/)
>
> -   [An Introduction to Statistical Learning with Applications in R](https://www.statlearning.com/)
>
> -   <https://algotech.netlify.app/blog/xgboost/> - Explores the AdaBoost model which is a good alternative to xgboost.
>
> to learn more about Ensemble classifiers.

## 4. Extra - comparing multiple models

We have fitted quite a number of models in this lab 🙌. It can become tedious or onerous to create a lot of workflows from different sets of preprocessors and/or model specifications and then calculate the performance metrics one by one.

Let's see if we can address this by creating a function that fits a list of workflows on the training set then returns the performance metrics based on the test set. We'll get to use `map()` and `map_dfr()` from the [purrr](https://purrr.tidyverse.org/) package to apply functions to each element in a list.

> [`map()`](https://purrr.tidyverse.org/reference/map.html) functions allow you to replace many for loops with code that is both more succinct and easier to read. The best place to learn about the [`map()`](https://purrr.tidyverse.org/reference/map.html) functions is the [iteration chapter](http://r4ds.had.co.nz/iteration.html) in R for data science.

```{r compare_models}
set.seed(2056)

# Create a metric set
eval_metrics <- metric_set(ppv, sens, accuracy, f_meas)

# Define a function that returns performance metrics
compare_models <- function(workflow_list, train_set, test_set){
  
  suppressWarnings(
    # Fit each model to the train_set
    map(workflow_list, fit, data = train_set) %>% 
    # Make predictions on the test set
      map_dfr(augment, new_data = test_set, .id = "model") %>%
    # Select desired columns
      select(model, cuisine, .pred_class) %>% 
    # Evaluate model performance
      group_by(model) %>% 
      eval_metrics(truth = cuisine, estimate = .pred_class) %>% 
      ungroup()
  )
  
} # End of function


```

Let's call our function and compare the accuracy across the models.

```{r call_fn}
# Make a list of workflows
workflow_list <- list(
  "svc" = svc_linear_wf,
  "svm" = svm_rbf_wf,
  "knn" = knn_wf,
  "random_forest" = rf_wf,
  "xgboost" = boost_wf)

# Call the function
set.seed(2056)
perf_metrics <- compare_models(workflow_list = workflow_list, train_set = cuisines_train, test_set = cuisines_test)

# Print out performance metrics
perf_metrics %>% 
  group_by(.metric) %>% 
  arrange(desc(.estimate)) %>% 
  slice_head(n=7)

# Compare accuracy
perf_metrics %>% 
  filter(.metric == "accuracy") %>% 
  arrange(desc(.estimate))

```

The [**workflowsets**](https://workflowsets.tidymodels.org/) package allows users to create and easily fit a large number of models, but it is mostly designed to work with resampling techniques such as `cross-validation`, an approach we are yet to cover.
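
As a preview, the pattern looks like this (a hedged sketch; it assumes the `workflowsets` package is installed, and it uses cross-validation folds, which we cover in a later lesson):

```{r workflowsets_sketch, eval=FALSE}
library(workflowsets)

# Cross one recipe with several model specifications into a workflow set
wf_set <- workflow_set(
  preproc = list(recipe = cuisines_recipe),
  models = list(svm = svm_rbf_spec, rf = rf_spec, knn = knn_spec)
)

# Fit every workflow to cross-validation folds, then rank the results
wf_set %>% 
  workflow_map("fit_resamples", resamples = vfold_cv(cuisines_train, v = 5)) %>% 
  rank_results(rank_metric = "accuracy")
```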

## **🚀Challenge**

Each of these techniques has a large number of parameters that you can tweak: for instance, `cost` in SVMs, `neighbors` in KNN, and `mtry` (Randomly Selected Predictors) in Random Forest.

Research each one's default parameters and think about what tweaking these parameters would mean for the model's quality.

To find out more about a particular model and its parameters, use `help("model")`, e.g. `help("rand_forest")`.

> In practice, we usually *estimate* the *best values* for these by training many models on a `simulated data set` and measuring how well all these models perform. This process is called **tuning**.
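
As a teaser of what tuning looks like in tidymodels (a sketch only; actually running it requires machinery we haven't covered), you mark arguments with `tune()` instead of fixing their values:

```{r tune_sketch}
# Flag mtry and min_n as tunable rather than fixing their values
rand_forest(mtry = tune(), min_n = tune()) %>% 
  set_engine("ranger") %>% 
  set_mode("classification")
```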

### [**Post-lecture quiz**](https://gray-sand-07a10f403.1.azurestaticapps.net/quiz/24/)

### **Review & Self Study**

There's a lot of jargon in these lessons, so take a minute to review [this list](https://docs.microsoft.com/dotnet/machine-learning/resources/glossary?WT.mc_id=academic-77952-leestott) of useful terminology!

#### THANK YOU TO:

[`Allison Horst`](https://twitter.com/allison_horst/) for creating the amazing illustrations that make R more welcoming and engaging. Find more illustrations at her [gallery](https://www.google.com/url?q=https://github.com/allisonhorst/stats-illustrations&sa=D&source=editors&ust=1626380772530000&usg=AOvVaw3zcfyCizFQZpkSLzxiiQEM).

[Cassie Breviu](https://www.twitter.com/cassieview) and [Jen Looper](https://www.twitter.com/jenlooper) for creating the original Python version of this module ♥️

Happy Learning,

[Eric](https://twitter.com/ericntay), Gold Microsoft Learn Student Ambassador.

![Artwork by \@allison_horst](../../images/r_learners_sm.jpeg)

diff --git a/5-Clustering/1-Visualize/README.md b/5-Clustering/1-Visualize/README.md
index 480aaef3c..cbff4726c 100644
--- a/5-Clustering/1-Visualize/README.md
+++ b/5-Clustering/1-Visualize/README.md
@@ -99,7 +99,7 @@ There are over 100 clustering algorithms, and their use depends on the nature of
 
 Clustering as a technique is greatly aided by proper visualization, so let's get started by visualizing our music data. This exercise will help us decide which of the methods of clustering we should most effectively use for the nature of this data.
 
-1. Open the _notebook.ipynb_ file in this folder.
+1. Open the [_notebook.ipynb_](https://github.com/microsoft/ML-For-Beginners/blob/main/5-Clustering/1-Visualize/notebook.ipynb) file in this folder.
 
 1. Import the `Seaborn` package for good data visualization.
 
    ```python
    !pip install seaborn
    ```
 
-1. Append the song data from _nigerian-songs.csv_. Load up a dataframe with some data about the songs. Get ready to explore this data by importing the libraries and dumping out the data:
+1. Append the song data from [_nigerian-songs.csv_](https://github.com/microsoft/ML-For-Beginners/blob/main/5-Clustering/data/nigerian-songs.csv). Load up a dataframe with some data about the songs. Get ready to explore this data by importing the libraries and dumping out the data:
 
    ```python
    import matplotlib.pyplot as plt
diff --git a/5-Clustering/1-Visualize/solution/R/lesson_14.html b/5-Clustering/1-Visualize/solution/R/lesson_14.html
new file mode 100644
index 000000000..c6b791f3c
--- /dev/null
+++ b/5-Clustering/1-Visualize/solution/R/lesson_14.html
@@ -0,0 +1,5445 @@
---
title: 'Introduction to clustering: Clean, prep and visualize your data'
output:
  html_document:
    df_print: paged
    theme: flatly
    highlight: breezedark
    toc: yes
    toc_float: yes
    code_download: yes
---

## **Nigerian Music scraped from Spotify - an analysis**

Clustering is a type of [Unsupervised Learning](https://wikipedia.org/wiki/Unsupervised_learning) that presumes that a dataset is unlabelled or that its inputs are not matched with predefined outputs. It uses various algorithms to sort through unlabeled data and provide groupings according to patterns it discerns in the data.

[**Pre-lecture quiz**](https://gray-sand-07a10f403.1.azurestaticapps.net/quiz/27/)

### **Introduction**

[Clustering](https://link.springer.com/referenceworkentry/10.1007%2F978-0-387-30164-8_124) is very useful for data exploration. Let's see if it can help discover trends and patterns in the way Nigerian audiences consume music.

> ✅ Take a minute to think about the uses of clustering. In real life, clustering happens whenever you have a pile of laundry and need to sort out your family members' clothes 🧦👕👖🩲. In data science, clustering happens when trying to analyze a user's preferences, or determine the characteristics of any unlabeled dataset. Clustering, in a way, helps make sense of chaos, like a sock drawer.

In a professional setting, clustering can be used to determine things like market segmentation, determining what age groups buy what items, for example. Another use would be anomaly detection, perhaps to detect fraud from a dataset of credit card transactions. Or you might use clustering to determine tumors in a batch of medical scans.

✅ Think a minute about how you might have encountered clustering 'in the wild', in a banking, e-commerce, or business setting.

> 🎓 Interestingly, cluster analysis originated in the fields of Anthropology and Psychology in the 1930s. Can you imagine how it might have been used?

Alternately, you could use it for grouping search results - by shopping links, images, or reviews, for example. Clustering is useful when you have a large dataset that you want to reduce and on which you want to perform more granular analysis, so the technique can be used to learn about data before other models are constructed.

✅ Once your data is organized in clusters, you assign it a cluster Id, and this technique can be useful when preserving a dataset's privacy; you can instead refer to a data point by its cluster id, rather than by more revealing identifiable data. Can you think of other reasons why you'd refer to a cluster Id rather than other elements of the cluster to identify it?

### Getting started with clustering

> 🎓 How we create clusters has a lot to do with how we gather up the data points into groups. Let's unpack some vocabulary:
>
> 🎓 ['Transductive' vs. 'inductive'](https://wikipedia.org/wiki/Transduction_(machine_learning))
>
> Transductive inference is derived from observed training cases that map to specific test cases. Inductive inference is derived from training cases that map to general rules which are only then applied to test cases.
>
> An example: Imagine you have a dataset that is only partially labelled. Some things are 'records', some 'cds', and some are blank. Your job is to provide labels for the blanks. If you choose an inductive approach, you'd train a model looking for 'records' and 'cds', and apply those labels to your unlabeled data. This approach will have trouble classifying things that are actually 'cassettes'. A transductive approach, on the other hand, handles this unknown data more effectively as it works to group similar items together and then applies a label to a group. In this case, clusters might reflect 'round musical things' and 'square musical things'.
>
> 🎓 ['Non-flat' vs. 'flat' geometry](https://datascience.stackexchange.com/questions/52260/terminology-flat-geometry-in-the-context-of-clustering)
>
> Derived from mathematical terminology, non-flat vs. flat geometry refers to the measure of distances between points by either 'flat' ([Euclidean](https://wikipedia.org/wiki/Euclidean_geometry)) or 'non-flat' (non-Euclidean) geometrical methods.
>
> 'Flat' in this context refers to Euclidean geometry (parts of which are taught as 'plane' geometry), and non-flat refers to non-Euclidean geometry. What does geometry have to do with machine learning? Well, as two fields that are rooted in mathematics, there must be a common way to measure distances between points in clusters, and that can be done in a 'flat' or 'non-flat' way, depending on the nature of the data. [Euclidean distances](https://wikipedia.org/wiki/Euclidean_distance) are measured as the length of a line segment between two points. [Non-Euclidean distances](https://wikipedia.org/wiki/Non-Euclidean_geometry) are measured along a curve. If your data, visualized, seems to not exist on a plane, you might need to use a specialized algorithm to handle it.

![Infographic by Dasani Madipalli](../../images/flat-nonflat.png){width="500"}

> 🎓 ['Distances'](https://web.stanford.edu/class/cs345a/slides/12-clustering.pdf)
>
> Clusters are defined by their distance matrix, e.g. the distances between points. This distance can be measured a few ways. Euclidean clusters are defined by the average of the point values, and contain a 'centroid' or center point. Distances are thus measured by the distance to that centroid. Non-Euclidean distances refer to 'clustroids', the point closest to other points. Clustroids in turn can be defined in various ways.
>
> 🎓 ['Constrained'](https://wikipedia.org/wiki/Constrained_clustering)
>
> [Constrained Clustering](https://web.cs.ucdavis.edu/~davidson/Publications/ICDMTutorial.pdf) introduces 'semi-supervised' learning into this unsupervised method. The relationships between points are flagged as 'cannot link' or 'must-link' so some rules are forced on the dataset.
>
> An example: If an algorithm is set free on a batch of unlabelled or semi-labelled data, the clusters it produces may be of poor quality. In the example above, the clusters might group 'round music things' and 'square music things' and 'triangular things' and 'cookies'. If given some constraints, or rules to follow ("the item must be made of plastic", "the item needs to be able to produce music") this can help 'constrain' the algorithm to make better choices.
>
> 🎓 'Density'
>
> Data that is 'noisy' is considered to be 'dense'. The distances between points in each of its clusters may prove, on examination, to be more or less dense, or 'crowded' and thus this data needs to be analyzed with the appropriate clustering method. [This article](https://www.kdnuggets.com/2020/02/understanding-density-based-clustering.html) demonstrates the difference between using K-Means clustering vs. HDBSCAN algorithms to explore a noisy dataset with uneven cluster density.
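
Since several of these definitions hinge on how distance is measured, here is a tiny base-R illustration of how the choice of metric changes the answer:

```{r distance_sketch}
# Two points in 2D space
points <- rbind(a = c(0, 0), b = c(3, 4))

# 'Flat' Euclidean distance: straight-line length, sqrt(3^2 + 4^2) = 5
dist(points, method = "euclidean")

# Manhattan distance: measured along axis-aligned segments, 3 + 4 = 7
dist(points, method = "manhattan")
```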

Deepen your understanding of clustering techniques in this [Learn module](https://docs.microsoft.com/learn/modules/train-evaluate-cluster-models?WT.mc_id=academic-77952-leestott)

### **Clustering algorithms**

There are over 100 clustering algorithms, and their use depends on the nature of the data at hand. Let's discuss some of the major ones:

-   **Hierarchical clustering**. If an object is classified by its proximity to a nearby object, rather than to one farther away, clusters are formed based on their members' distance to and from other objects. Hierarchical clustering is characterized by repeatedly combining two clusters.

![Infographic by Dasani Madipalli](../../images/hierarchical.png){width="500"}

-   **Centroid clustering**. This popular algorithm requires the choice of 'k', or the number of clusters to form, after which the algorithm determines the center point of a cluster and gathers data around that point. [K-means clustering](https://wikipedia.org/wiki/K-means_clustering) is a popular version of centroid clustering which separates a data set into pre-defined K groups. The center is determined by the nearest mean, thus the name. The squared distance from the cluster center is minimized (see the short sketch after this list).

![Infographic by Dasani Madipalli](../../images/centroid.png){width="500"}

-   **Distribution-based clustering**. Based in statistical modeling, distribution-based clustering centers on determining the probability that a data point belongs to a cluster, and assigning it accordingly. Gaussian mixture methods belong to this type.

-   **Density-based clustering**. Data points are assigned to clusters based on their density, or their grouping around each other. Data points far from the group are considered outliers or noise. DBSCAN, Mean-shift and OPTICS belong to this type of clustering.

-   **Grid-based clustering**. For multi-dimensional datasets, a grid is created and the data is divided amongst the grid's cells, thereby creating clusters.
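
To make centroid clustering a little more concrete before we meet it properly in the next lesson, here is a minimal base-R sketch on the built-in `iris` data (illustrative only; this lesson focuses on visualization):

```{r kmeans_sketch}
# Group two scaled numeric features into k = 3 clusters
set.seed(2056)
clusters <- kmeans(scale(iris[, c("Petal.Length", "Petal.Width")]), centers = 3)

# How many points landed in each cluster, and where the centroids sit
clusters$size
clusters$centers
```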

The best way to learn about clustering is to try it for yourself, so that's what you'll do in this exercise.

We'll require some packages to complete this module. You can have them installed as: `install.packages(c('tidyverse', 'tidymodels', 'DataExplorer', 'summarytools', 'plotly', 'paletteer', 'corrplot', 'patchwork'))`

Alternatively, the script below checks whether you have the packages required to complete this module and installs them for you in case some are missing.

```{r}
suppressWarnings(if(!require("pacman")) install.packages("pacman"))

pacman::p_load('tidyverse', 'tidymodels', 'DataExplorer', 'summarytools', 'plotly', 'paletteer', 'corrplot', 'patchwork')
```

```{r setup}
knitr::opts_chunk$set(warning = F, message = F)

```

## Exercise - cluster your data

Clustering as a technique is greatly aided by proper visualization, so let's get started by visualizing our music data. This exercise will help us decide which of the methods of clustering we should most effectively use for the nature of this data.

Let's hit the ground running by importing the data.

```{r}
# Load the core tidyverse and make it available in your current R session
library(tidyverse)

# Import the data into a tibble
df <- read_csv(file = "https://raw.githubusercontent.com/microsoft/ML-For-Beginners/main/5-Clustering/data/nigerian-songs.csv")

# View the first 5 rows of the data set
df %>% 
  slice_head(n = 5)

```

Sometimes, we may want a little more information on our data. We can have a look at the data and its structure by using the [*glimpse()*](https://pillar.r-lib.org/reference/glimpse.html) function:

```{r}
# Glimpse into the data set
df %>% 
  glimpse()
```

Good job!💪

We can observe that `glimpse()` gives you the total number of rows (observations) and columns (variables), followed by the first few entries of each variable in a row after the variable name. In addition, the *data type* of the variable is given immediately after each variable's name inside `< >`.

`DataExplorer::introduce()` can summarize this information neatly:

```{r DataExplorer}
# Describe basic information for our data
df %>% 
  introduce()

# A visual display of the same
df %>% 
  plot_intro()

```

Awesome! We have just learnt that our data has no missing values.

While we are at it, we can explore common central tendency statistics (e.g [mean](https://en.wikipedia.org/wiki/Arithmetic_mean) and [median](https://en.wikipedia.org/wiki/Median)) and measures of dispersion (e.g [standard deviation](https://en.wikipedia.org/wiki/Standard_deviation)) using `summarytools::descr()`

```{r descr_stats}
# Describe common statistics
df %>% descr(stats = "common")

```

Let's look at the general values of the data. Note that popularity can be `0`, which indicates songs that have no ranking. We'll remove those shortly.

> 🤔 If we are working with clustering, an unsupervised method that does not require labeled data, why are we showing this data with labels? In the data exploration phase, they come in handy, but they are not necessary for the clustering algorithms to work.

### 1. Explore popular genres

Let's go ahead and find out the most popular genres 🎶 by counting the instances in which each genre appears.

```{r count_genres}
# Popular genres
top_genres <- df %>% 
  count(artist_top_genre, sort = TRUE) %>% 
# Encode to categorical and reorder according to count
  mutate(artist_top_genre = factor(artist_top_genre) %>% fct_inorder())

# Print the top genres
top_genres

```

That went well! They say a picture is worth a thousand rows of a data frame (actually nobody ever says that 😅). But you get the gist of it, right?

One way to visualize categorical data (character or factor variables) is using barplots. Let's make a barplot of the top 10 genres:

```{r bar_plot_genre}
# Change the default gray theme
theme_set(theme_light())

# Visualize popular genres
top_genres %>%
  slice(1:10) %>% 
  ggplot(mapping = aes(x = artist_top_genre, y = n,
                       fill = artist_top_genre)) +
  geom_col(alpha = 0.8) +
  paletteer::scale_fill_paletteer_d("rcartocolor::Vivid") +
  ggtitle("Top genres") +
  theme(plot.title = element_text(hjust = 0.5),
        # Rotates the X markers (so we can read them)
    axis.text.x = element_text(angle = 90))
```

Now it's way easier to identify that we have `missing` genres 🧐!

> A good visualisation will show you things that you did not expect, or raise new questions about the data - Hadley Wickham and Garrett Grolemund, [R For Data Science](https://r4ds.had.co.nz/introduction.html)

Note, when the top genre is described as `Missing`, that means that Spotify did not classify it, so let's get rid of it.

```{r remove_missing}
# Visualize popular genres
top_genres %>%
  filter(artist_top_genre != "Missing") %>% 
  slice(1:10) %>% 
  ggplot(mapping = aes(x = artist_top_genre, y = n,
                       fill = artist_top_genre)) +
  geom_col(alpha = 0.8) +
  paletteer::scale_fill_paletteer_d("rcartocolor::Vivid") +
  ggtitle("Top genres") +
  theme(plot.title = element_text(hjust = 0.5),
        # Rotates the X markers (so we can read them)
    axis.text.x = element_text(angle = 90))
```

From the little data exploration, we learn that the top three genres dominate this dataset. Let's concentrate on `afro dancehall`, `afropop`, and `nigerian pop`, and additionally filter the dataset to remove anything with a 0 popularity value (meaning it was not classified with a popularity in the dataset and can be considered noise for our purposes):

```{r new_dataset}
nigerian_songs <- df %>% 
  # Concentrate on top 3 genres
  filter(artist_top_genre %in% c("afro dancehall", "afropop","nigerian pop")) %>% 
  # Remove unclassified observations
  filter(popularity != 0)



# Visualize popular genres
nigerian_songs %>%
  count(artist_top_genre) %>%
  ggplot(mapping = aes(x = artist_top_genre, y = n,
                       fill = artist_top_genre)) +
  geom_col(alpha = 0.8) +
  paletteer::scale_fill_paletteer_d("ggsci::category10_d3") +
  ggtitle("Top genres") +
  theme(plot.title = element_text(hjust = 0.5))
```

Let's see whether there is any apparent linear relationship among the numerical variables in our data set. This relationship is quantified mathematically by the [correlation statistic](https://en.wikipedia.org/wiki/Correlation).

The correlation statistic is a value between -1 and 1 that indicates the strength of a relationship. Values above 0 indicate a *positive* correlation (high values of one variable tend to coincide with high values of the other), while values below 0 indicate a *negative* correlation (high values of one variable tend to coincide with low values of the other).

```{r correlation}
# Narrow down to numeric variables and find correlation
corr_mat <- nigerian_songs %>% 
  select(where(is.numeric)) %>% 
  cor()

# Visualize correlation matrix
corrplot(corr_mat, order = 'AOE', col = c('white', 'black'), bg = 'gold2')  
```

The data is not strongly correlated except between `energy` and `loudness`, which makes sense, given that loud music is usually pretty energetic. `Popularity` has a correspondence to `release date`, which also makes sense, as more recent songs are probably more popular. Length and energy seem to have a correlation too.

It will be interesting to see what a clustering algorithm can make of this data!

> 🎓 Note that correlation does not imply causation! We have proof of correlation but no proof of causation. An [amusing web site](https://tylervigen.com/spurious-correlations) has some visuals that emphasize this point.

### 2. Explore data distribution

Let's ask some more subtle questions. Are the genres significantly different in the perception of their danceability, based on their popularity? Let's examine our top three genres data distribution for popularity and danceability along a given x and y axis using [density plots](https://www.khanacademy.org/math/ap-statistics/density-curves-normal-distribution-ap/density-curves/v/density-curves).

```{r}
# Perform 2D kernel density estimation
density_estimate_2d <- nigerian_songs %>% 
  ggplot(mapping = aes(x = popularity, y = danceability, color = artist_top_genre)) +
  geom_density_2d(bins = 5, size = 1) +
  paletteer::scale_color_paletteer_d("RSkittleBrewer::wildberry") +
  xlim(-20, 80) +
  ylim(0, 1.2)

# Density plot based on the popularity
density_estimate_pop <- nigerian_songs %>% 
  ggplot(mapping = aes(x = popularity, fill = artist_top_genre, color = artist_top_genre)) +
  geom_density(size = 1, alpha = 0.5) +
  paletteer::scale_fill_paletteer_d("RSkittleBrewer::wildberry") +
  paletteer::scale_color_paletteer_d("RSkittleBrewer::wildberry") +
  theme(legend.position = "none")

# Density plot based on the danceability
density_estimate_dance <- nigerian_songs %>% 
  ggplot(mapping = aes(x = danceability, fill = artist_top_genre, color = artist_top_genre)) +
  geom_density(size = 1, alpha = 0.5) +
  paletteer::scale_fill_paletteer_d("RSkittleBrewer::wildberry") +
  paletteer::scale_color_paletteer_d("RSkittleBrewer::wildberry")


# Patch everything together
library(patchwork)
density_estimate_2d / (density_estimate_pop + density_estimate_dance)
```

We see that there are concentric circles that line up, regardless of genre. Could it be that Nigerian tastes converge at a certain level of danceability for this genre?

In general, the three genres align in terms of their popularity and danceability. Determining clusters in this loosely-aligned data will be a challenge. Let's see whether a scatter plot can support this.

```{r scatter_plot}
# A scatter plot of popularity and danceability
scatter_plot <- nigerian_songs %>% 
  ggplot(mapping = aes(x = popularity, y = danceability, color = artist_top_genre, shape = artist_top_genre)) +
  geom_point(size = 2, alpha = 0.8) +
  paletteer::scale_color_paletteer_d("futurevisions::mars")

# Add a touch of interactivity
ggplotly(scatter_plot)
```

A scatterplot of the same axes shows a similar pattern of convergence.

In general, for clustering, you can use scatterplots to show clusters of data, so mastering this type of visualization is very useful. In the next lesson, we will take this filtered data and use k-means clustering to discover groups in this data that seem to overlap in interesting ways.

## **🚀 Challenge**

In preparation for the next lesson, make a chart about the various clustering algorithms you might discover and use in a production environment. What kinds of problems is the clustering trying to address?

## [**Post-lecture quiz**](https://gray-sand-07a10f403.1.azurestaticapps.net/quiz/28/)

## **Review & Self Study**

Before you apply clustering algorithms, as we have learned, it's a good idea to understand the nature of your dataset. Read more on this topic [here](https://www.kdnuggets.com/2019/10/right-clustering-algorithm.html)

Deepen your understanding of clustering techniques:

-   [Train and Evaluate Clustering Models using Tidymodels and friends](https://rpubs.com/eR_ic/clustering)

-   Bradley Boehmke & Brandon Greenwell, [*Hands-On Machine Learning with R*](https://bradleyboehmke.github.io/HOML/)*.*

## **Assignment**

[Research other visualizations for clustering](https://github.com/microsoft/ML-For-Beginners/blob/main/5-Clustering/1-Visualize/assignment.md)

## THANK YOU TO:

[Jen Looper](https://www.twitter.com/jenlooper) for creating the original Python version of this module ♥️

[`Dasani Madipalli`](https://twitter.com/dasani_decoded) for creating the amazing illustrations that make machine learning concepts more interpretable and easier to understand.

Happy Learning,

[Eric](https://twitter.com/ericntay), Gold Microsoft Learn Student Ambassador.

diff --git a/5-Clustering/2-K-Means/README.md b/5-Clustering/2-K-Means/README.md
index 628ecbb16..18a08fdd6 100644
--- a/5-Clustering/2-K-Means/README.md
+++ b/5-Clustering/2-K-Means/README.md
@@ -1,9 +1,5 @@
 # K-Means clustering
 
-[![Andrew Ng explains Clustering](https://img.youtube.com/vi/hDmNF9JG3lo/0.jpg)](https://youtu.be/hDmNF9JG3lo "Andrew Ng explains Clustering")
-
-> 🎥 Click the image above for a video: Andrew Ng explains clustering
-
 ## [Pre-lecture quiz](https://gray-sand-07a10f403.1.azurestaticapps.net/quiz/29/)
 
 In this lesson, you will learn how to create clusters using Scikit-learn and the Nigerian music dataset you imported earlier. We will cover the basics of K-Means for Clustering. Keep in mind that, as you learned in the earlier lesson, there are many ways to work with clusters and the method you use depends on your data. We will try K-Means as it's the most common clustering technique. Let's get started!
@@ -36,7 +32,7 @@ One drawback of using K-Means includes the fact that you will need to establish
 
 ## Prerequisite
 
-You will work in this lesson's _notebook.ipynb_ file that includes the data import and preliminary cleaning you did in the last lesson.
+You will work in this lesson's [_notebook.ipynb_](https://github.com/microsoft/ML-For-Beginners/blob/main/5-Clustering/2-K-Means/notebook.ipynb) file that includes the data import and preliminary cleaning you did in the last lesson.
 
 ## Exercise - preparation
 
@@ -134,7 +130,7 @@ You see an array printed out with predicted clusters (0, 1,or 2) for each row of
 
 ## Silhouette score
 
-Look for a silhouette score closer to 1. This score varies from -1 to 1, and if the score is 1, the cluster is dense and well-separated from other clusters. A value near 0 represents overlapping clusters with samples very close to the decision boundary of the neighboring clusters.[source](https://dzone.com/articles/kmeans-silhouette-score-explained-with-python-exam).
+Look for a silhouette score closer to 1. This score varies from -1 to 1, and if the score is 1, the cluster is dense and well-separated from other clusters. A value near 0 represents overlapping clusters with samples very close to the decision boundary of the neighboring clusters. [(Source)](https://dzone.com/articles/kmeans-silhouette-score-explained-with-python-exam)
 
 Our score is **.53**, so right in the middle. This indicates that our data is not particularly well-suited to this type of clustering, but let's continue.
 
@@ -157,11 +153,11 @@ Our score is **.53**, so right in the middle. This indicates that our data is no
 
    > 🎓 range: These are the iterations of the clustering process
 
-   > 🎓 random_state: "Determines random number generation for centroid initialization."[source](https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html#sklearn.cluster.KMeans)
+   > 🎓 random_state: "Determines random number generation for centroid initialization." [Source](https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html#sklearn.cluster.KMeans)
 
-   > 🎓 WCSS: "within-cluster sums of squares" measures the squared average distance of all the points within a cluster to the cluster centroid.[source](https://medium.com/@ODSC/unsupervised-learning-evaluating-clusters-bd47eed175ce).
+   > 🎓 WCSS: "within-cluster sums of squares" measures the squared average distance of all the points within a cluster to the cluster centroid. [Source](https://medium.com/@ODSC/unsupervised-learning-evaluating-clusters-bd47eed175ce).
 
- > 🎓 Inertia: K-Means algorithms attempt to choose centroids to minimize 'inertia', "a measure of how internally coherent clusters are."[source](https://scikit-learn.org/stable/modules/clustering.html). The value is appended to the wcss variable on each iteration. + > 🎓 Inertia: K-Means algorithms attempt to choose centroids to minimize 'inertia', "a measure of how internally coherent clusters are." [Source](https://scikit-learn.org/stable/modules/clustering.html). The value is appended to the wcss variable on each iteration. > 🎓 k-means++: In [Scikit-learn](https://scikit-learn.org/stable/modules/clustering.html#k-means) you can use the 'k-means++' optimization, which "initializes the centroids to be (generally) distant from each other, leading to probably better results than random initialization. @@ -224,7 +220,7 @@ Previously, you surmised that, because you have targeted 3 song genres, you shou ## Variance -Variance is defined as "the average of the squared differences from the Mean" [source](https://www.mathsisfun.com/data/standard-deviation.html). In the context of this clustering problem, it refers to data that the numbers of our dataset tend to diverge a bit too much from the mean. +Variance is defined as "the average of the squared differences from the Mean" [(Source)](https://www.mathsisfun.com/data/standard-deviation.html). In the context of this clustering problem, it refers to data that the numbers of our dataset tend to diverge a bit too much from the mean. ✅ This is a great moment to think about all the ways you could correct this issue. Tweak the data a bit more? Use different columns? Use a different algorithm? Hint: Try [scaling your data](https://www.mygreatlearning.com/blog/learning-data-science-with-k-means-clustering/) to normalize it and test other columns. diff --git a/5-Clustering/2-K-Means/solution/R/lesson_15.Rmd b/5-Clustering/2-K-Means/solution/R/lesson_15.Rmd index 61f7869a8..901ec0155 100644 --- a/5-Clustering/2-K-Means/solution/R/lesson_15.Rmd +++ b/5-Clustering/2-K-Means/solution/R/lesson_15.Rmd @@ -206,7 +206,7 @@ Perfect, we have just partitioned our data set into a set of 3 groups. So, how g ### **Silhouette score** -[Silhouette analysis](https://en.wikipedia.org/wiki/Silhouette_(clustering)) can be used to study the separation distance between the resulting clusters. This score varies from -1 to 1, and if the score is near 1, the cluster is dense and well-separated from other clusters. A value near 0 represents overlapping clusters with samples very close to the decision boundary of the neighboring clusters.[source](https://dzone.com/articles/kmeans-silhouette-score-explained-with-python-exam). +[Silhouette analysis](https://en.wikipedia.org/wiki/Silhouette_(clustering)) can be used to study the separation distance between the resulting clusters. This score varies from -1 to 1, and if the score is near 1, the cluster is dense and well-separated from other clusters. A value near 0 represents overlapping clusters with samples very close to the decision boundary of the neighboring clusters. [(Source)](https://dzone.com/articles/kmeans-silhouette-score-explained-with-python-exam). The average silhouette method computes the average silhouette of observations for different values of *k*. A high average silhouette score indicates a good clustering. 
@@ -339,7 +339,7 @@ In Scikit-learn's documentation, you can see that a model like this one, with cl

## **Variance**

-Variance is defined as "the average of the squared differences from the Mean" [source](https://www.mathsisfun.com/data/standard-deviation.html). In the context of this clustering problem, it refers to data that the numbers of our dataset tend to diverge a bit too much from the mean.
+Variance is defined as "the average of the squared differences from the Mean" [(Source)](https://www.mathsisfun.com/data/standard-deviation.html). In the context of this clustering problem, it refers to data that the numbers of our dataset tend to diverge a bit too much from the mean.

✅ This is a great moment to think about all the ways you could correct this issue. Tweak the data a bit more? Use different columns? Use a different algorithm? Hint: Try [scaling your data](https://www.mygreatlearning.com/blog/learning-data-science-with-k-means-clustering/) to normalize it and test other columns.

diff --git a/5-Clustering/2-K-Means/solution/R/lesson_15.html b/5-Clustering/2-K-Means/solution/R/lesson_15.html
new file mode 100644
index 000000000..db00cf0fa
--- /dev/null
+++ b/5-Clustering/2-K-Means/solution/R/lesson_15.html
@@ -0,0 +1,5495 @@
---
title: 'K-Means Clustering using Tidymodels and friends'
output:
  html_document:
    #css: style_7.css
    df_print: paged
    theme: flatly
    highlight: breezedark
    toc: yes
    toc_float: yes
    code_download: yes
---

## Explore K-Means clustering using R and Tidy data principles.

### [**Pre-lecture quiz**](https://gray-sand-07a10f403.1.azurestaticapps.net/quiz/29/)

In this lesson, you will learn how to create clusters using the Tidymodels package and other packages in the R ecosystem (we'll call them friends 🧑‍🤝‍🧑), and the Nigerian music dataset you imported earlier. We will cover the basics of K-Means for Clustering. Keep in mind that, as you learned in the earlier lesson, there are many ways to work with clusters and the method you use depends on your data. We will try K-Means as it's the most common clustering technique. Let's get started!

Terms you will learn about:

-   Silhouette scoring

-   Elbow method

-   Inertia

-   Variance

### **Introduction**

[K-Means Clustering](https://wikipedia.org/wiki/K-means_clustering) is a method derived from the domain of signal processing. It is used to divide and partition groups of data into `k clusters` based on similarities in their features.

The clusters can be visualized as [Voronoi diagrams](https://wikipedia.org/wiki/Voronoi_diagram), which include a point (or 'seed') and its corresponding region.

![Infographic by Jen Looper](../../images/voronoi.png)

K-Means clustering has the following steps:

1.  The data scientist starts by specifying the desired number of clusters to be created.

2.  Next, the algorithm randomly selects K observations from the data set to serve as the initial centers for the clusters (i.e., centroids).

3.  Next, each of the remaining observations is assigned to its closest centroid.

4.  Next, the new means of each cluster is computed and the centroid is moved to the mean.

5.  Now that the centers have been recalculated, every observation is checked again to see if it might be closer to a different cluster. All the objects are reassigned again using the updated cluster means. The cluster assignment and centroid update steps are iteratively repeated until the cluster assignments stop changing (i.e., when convergence is achieved). Typically, the algorithm terminates when each new iteration results in negligible movement of centroids and the clusters become static.

<div>

> Note that due to randomization of the initial k observations used as the starting centroids, we can get slightly different results each time we apply the procedure. For this reason, most algorithms use several *random starts* and choose the iteration with the lowest WCSS. As such, it is strongly recommended to always run K-Means with several values of *nstart* to avoid an *undesirable local optimum.*

</div>
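To make that point concrete, here is a minimal sketch on a small synthetic data set (the toy data below is made up purely for illustration): `kmeans()` is built into base R, and its `tot.withinss` component reports the WCSS of the returned solution.

```{r nstart_sketch}
# Two noisy blobs of synthetic points
set.seed(123)
toy <- data.frame(
  x = c(rnorm(50, mean = 0), rnorm(50, mean = 4)),
  y = c(rnorm(50, mean = 0), rnorm(50, mean = 4))
)

# A single random start can land in a poor local optimum ...
kmeans(toy, centers = 4, nstart = 1)$tot.withinss

# ... while several random starts keep the solution with the lowest WCSS
kmeans(toy, centers = 4, nstart = 25)$tot.withinss
```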

This short animation using the [artwork](https://github.com/allisonhorst/stats-illustrations) of Allison Horst explains the clustering process:

![Artwork by \@allison_horst](../../images/kmeans.gif)

A fundamental question that arises in clustering is this: how do you know how many clusters to separate your data into? One drawback of using K-Means is that you will need to establish `k`, that is, the number of `centroids`. Fortunately, the `elbow method` helps to estimate a good starting value for `k`. You'll try it in a minute.

### **Prerequisite**

We'll pick up right from where we stopped in the [previous lesson](https://github.com/microsoft/ML-For-Beginners/blob/main/5-Clustering/1-Visualize/solution/R/lesson_14-R.ipynb), where we analysed the data set, made lots of visualizations and filtered the data set to observations of interest. Be sure to check it out!

We'll require some packages to complete this module. You can have them installed as: `install.packages(c('tidyverse', 'tidymodels', 'cluster', 'summarytools', 'plotly', 'paletteer', 'factoextra', 'patchwork'))`

Alternatively, the script below checks whether you have the packages required to complete this module and installs them for you in case some are missing.

```{r}
suppressWarnings(if(!require("pacman")) install.packages("pacman",repos = "http://cran.us.r-project.org"))

pacman::p_load('tidyverse', 'tidymodels', 'cluster', 'summarytools', 'plotly', 'paletteer', 'factoextra', 'patchwork')
```

Let's hit the ground running!

## 1. A dance with data: Narrow down to the 3 most popular music genres

This is a recap of what we did in the previous lesson. Let's slice and dice some data!

```{r message=F, warning=F}
# Load the core tidyverse and make it available in your current R session
library(tidyverse)

# Import the data into a tibble
df <- read_csv(file = "https://raw.githubusercontent.com/microsoft/ML-For-Beginners/main/5-Clustering/data/nigerian-songs.csv", show_col_types = FALSE)

# Narrow down to top 3 popular genres
nigerian_songs <- df %>% 
  # Concentrate on top 3 genres
  filter(artist_top_genre %in% c("afro dancehall", "afropop","nigerian pop")) %>% 
  # Remove unclassified observations
  filter(popularity != 0)



# Visualize popular genres using bar plots
theme_set(theme_light())
nigerian_songs %>%
  count(artist_top_genre) %>%
  ggplot(mapping = aes(x = artist_top_genre, y = n,
                       fill = artist_top_genre)) +
  geom_col(alpha = 0.8) +
  paletteer::scale_fill_paletteer_d("ggsci::category10_d3") +
  ggtitle("Top genres") +
  theme(plot.title = element_text(hjust = 0.5))


```

🤩 That went well!

## 2. More data exploration.

How clean is this data? Let's check for outliers using box plots. We will concentrate on numeric columns with fewer outliers (although you could clean out the outliers). Boxplots can show the range of the data and will help choose which columns to use. Note that boxplots do not show variance, an important element of good clusterable data. Please see [this discussion](https://stats.stackexchange.com/questions/91536/deduce-variance-from-boxplot) for further reading.

[Boxplots](https://en.wikipedia.org/wiki/Box_plot) are used to graphically depict the distribution of `numeric` data, so let's start by *selecting* all numeric columns alongside the popular music genres.

```{r select}
# Select top genre column and all other numeric columns
df_numeric <- nigerian_songs %>% 
  select(artist_top_genre, where(is.numeric)) 

# Display the data
df_numeric %>% 
  slice_head(n = 5)

```

See how the selection helper `where` makes this easy 💁? Explore such other functions [here](https://tidyselect.r-lib.org/).
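The other tidyselect helpers work the same way. As a quick sketch, using columns we already know are in this data:

```{r tidyselect_sketch}
# Keep the genre plus columns whose names start with "dance" or end in "ness"
df_numeric %>% 
  select(artist_top_genre, starts_with("dance"), ends_with("ness")) %>% 
  slice_head(n = 5)
```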

Since we'll be making a boxplot for each numeric feature and we want to avoid using loops, let's reformat our data into a *longer* format that will allow us to take advantage of `facets` - subplots that each display one subset of the data.

```{r pivot_longer}
# Pivot data from wide to long
df_numeric_long <- df_numeric %>% 
  pivot_longer(!artist_top_genre, names_to = "feature_names", values_to = "values") 

# Print out data
df_numeric_long %>% 
  slice_head(n = 15)
```

Much longer! Now time for some `ggplots`! So what `geom` will we use?

```{r}
# Make a box plot
df_numeric_long %>% 
  ggplot(mapping = aes(x = feature_names, y = values, fill = feature_names)) +
  geom_boxplot() +
  facet_wrap(~ feature_names, ncol = 4, scales = "free") +
  theme(legend.position = "none")
```

Easy-gg!

Now we can see this data is a little noisy: by observing each column as a boxplot, you can see outliers. You could go through the dataset and remove these outliers, but that would make the data pretty minimal.

For now, let's choose which columns we will use for our clustering exercise. Let's pick the numeric columns with similar ranges. We could encode the `artist_top_genre` as numeric but we'll drop it for now.

```{r select_columns}
# Select variables with similar ranges
df_numeric_select <- df_numeric %>% 
  select(popularity, danceability, acousticness, loudness, energy) 

# Normalize data
# df_numeric_select <- scale(df_numeric_select)
```

## 3. Computing k-means clustering in R

We can compute k-means in R with the built-in `kmeans` function; see `help("kmeans")`. The `kmeans()` function accepts a data frame with all numeric columns as its primary argument.

The first step when using k-means clustering is to specify the number of clusters (k) that will be generated in the final solution. We know there are 3 song genres that we carved out of the dataset, so let's try 3:

```{r kmeans}
set.seed(2056)
# Kmeans clustering for 3 clusters
kclust <- kmeans(
  df_numeric_select,
  # Specify the number of clusters
  centers = 3,
  # How many random initial configurations
  nstart = 25
)

# Display clustering object
kclust
```

The kmeans object contains several bits of information, which are well explained in `help("kmeans")`. For now, let's focus on a few. We see that the data has been grouped into 3 clusters of sizes 65, 110 and 111. The output also contains the cluster centers (means) for the 3 groups across the 5 variables.
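Since `tidymodels` loads the [broom](https://broom.tidymodels.org/) package, we can also extract these pieces as tidy tibbles; a quick sketch:

```{r broom_sketch}
# One row per cluster: the centers, size and within-cluster sum of squares
tidy(kclust)

# A single row of model-level statistics, e.g. tot.withinss
glance(kclust)
```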

The clustering vector is the cluster assignment for each observation. Let's use the `augment` function to add the cluster assignment to the original data set.

```{r augment}
# Add predicted cluster assignment to data set
augment(kclust, df_numeric_select) %>% 
  relocate(.cluster) %>% 
  slice_head(n = 10)
```

Perfect, we have just partitioned our data set into a set of 3 groups. So, how good is our clustering 🤷? Let's take a look at the `Silhouette score`.

### **Silhouette score**

[Silhouette analysis](https://en.wikipedia.org/wiki/Silhouette_(clustering)) can be used to study the separation distance between the resulting clusters. This score varies from -1 to 1, and if the score is near 1, the cluster is dense and well-separated from other clusters. A value near 0 represents overlapping clusters with samples very close to the decision boundary of the neighboring clusters. [(Source)](https://dzone.com/articles/kmeans-silhouette-score-explained-with-python-exam).
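For reference, the silhouette of a single observation *i* compares *a(i)*, its mean distance to the other points in its own cluster, with *b(i)*, its mean distance to the points in the nearest neighbouring cluster (this is the standard definition, not something specific to this lesson):

$$s(i) = \frac{b(i) - a(i)}{\max\{a(i),\, b(i)\}}$$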

The average silhouette method computes the average silhouette of observations for different values of *k*. A high average silhouette score indicates a good clustering.

We'll use the `silhouette` function in the `cluster` package to compute the average silhouette width.

> The silhouette can be calculated with any [distance](https://en.wikipedia.org/wiki/Distance "Distance") metric, such as the [Euclidean distance](https://en.wikipedia.org/wiki/Euclidean_distance "Euclidean distance") or the [Manhattan distance](https://en.wikipedia.org/wiki/Manhattan_distance "Manhattan distance") which we discussed in the [previous lesson](https://github.com/microsoft/ML-For-Beginners/blob/main/5-Clustering/1-Visualize/solution/R/lesson_14-R.ipynb).

```{r}
# Load cluster package
library(cluster)

# Compute average silhouette score
ss <- silhouette(kclust$cluster,
                 # Compute euclidean distance
                 dist = dist(df_numeric_select))
mean(ss[, 3])

```

Our score is **.549**, so right in the middle. This indicates that our data is not particularly well-suited to this type of clustering. Let's see whether we can confirm this hunch visually. The [factoextra package](https://rpkgs.datanovia.com/factoextra/index.html) provides functions (`fviz_cluster()`) to visualize clustering.

```{r fviz_cluster}
library(factoextra)

# Visualize clustering results
fviz_cluster(kclust, df_numeric_select)

```

The overlap in clusters indicates that our data is not particularly well-suited to this type of clustering but let's continue.

## 4. Determining optimal clusters

A fundamental question that often arises in K-Means clustering is this: without known class labels, how do you know how many clusters to separate your data into?

One way we can try to find out is to use a data sample to `create a series of clustering models` with an incrementing number of clusters (e.g. from 1 to 10), and evaluate clustering metrics such as the **Silhouette score**.
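As a sketch of that idea (reusing `df_numeric_select` and the `cluster` package from above; the exact numbers may vary slightly with the random starts), we could compute the average silhouette score for several candidate values of *k*:

```{r silhouette_by_k}
set.seed(2056)
# Average silhouette score for k = 2, ..., 6 (a silhouette needs at least 2 clusters)
map_dbl(2:6, function(k) {
  model <- kmeans(df_numeric_select, centers = k, nstart = 25)
  mean(silhouette(model$cluster, dist(df_numeric_select))[, 3])
})
```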

Let's determine the optimal number of clusters by computing the clustering algorithm for different values of *k* and evaluating the **Within Cluster Sum of Squares** (WCSS). The total within-cluster sum of squares (WCSS) measures the compactness of the clustering, and we want it to be as small as possible, with lower values meaning that the data points are closer together.
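Written out in standard notation, with $\mu_k$ denoting the centroid of cluster $C_k$:

$$WCSS = \sum_{k=1}^{K} \sum_{x_i \in C_k} \lVert x_i - \mu_k \rVert^2$$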

Let's explore the effect of different choices of `k`, from 1 to 10, on this clustering.

```{r}
# Create a series of clustering models
kclusts <- tibble(k = 1:10) %>% 
  # Perform kmeans clustering for 1,2,3 ... ,10 clusters
  mutate(model = map(k, ~ kmeans(df_numeric_select, centers = .x, nstart = 25)),
  # Collect clustering metrics, e.g. WCSS
         glanced = map(model, ~ glance(.x))) %>% 
  unnest(cols = glanced)
  

# View clustering results
kclusts
```

Now that we have the total within-cluster sum-of-squares (tot.withinss) for each clustering algorithm with center *k*, we use the [elbow method](https://en.wikipedia.org/wiki/Elbow_method_(clustering)) to find the optimal number of clusters. The method consists of plotting the WCSS as a function of the number of clusters, and picking the [elbow of the curve](https://en.wikipedia.org/wiki/Elbow_of_the_curve "Elbow of the curve") as the number of clusters to use.

```{r elbow_method}
set.seed(2056)
# Use elbow method to determine optimum number of clusters
kclusts %>% 
  ggplot(mapping = aes(x = k, y = tot.withinss)) +
  geom_line(linewidth = 1.2, alpha = 0.8, color = "#FF7F0EFF") +
  geom_point(size = 2, color = "#FF7F0EFF")
```

The plot shows a large reduction in WCSS (so greater *tightness*) as the number of clusters increases from one to two, and a further noticeable reduction from two to three clusters. After that, the reduction is less pronounced, resulting in an `elbow` 💪 in the chart at around three clusters. This is a good indication that there are two to three reasonably well-separated clusters of data points.

We can now go ahead and extract the clustering model where `k = 3`:

> `pull()`: used to extract a single column
>
> `pluck()`: used to index data structures such as lists

```{r extract_model}
# Extract k = 3 clustering
final_kmeans <- kclusts %>% 
  filter(k == 3) %>% 
  pull(model) %>% 
  pluck(1)


final_kmeans
```

Great! Let's go ahead and visualize the clusters obtained. Care for some interactivity using `plotly`?

```{r viz_clust}
# Add predicted cluster assignment to data set
results <-  augment(final_kmeans, df_numeric_select) %>% 
  bind_cols(df_numeric %>% select(artist_top_genre)) 

# Plot cluster assignments
clust_plt <- results %>% 
  ggplot(mapping = aes(x = popularity, y = danceability, color = .cluster, shape = artist_top_genre)) +
  geom_point(size = 2, alpha = 0.8) +
  paletteer::scale_color_paletteer_d("ggthemes::Tableau_10")

ggplotly(clust_plt)

```

Perhaps we would have expected that each cluster (represented by different colors) would have distinct genres (represented by different shapes).

Let's take a look at the model's accuracy.

```{r ordinal_encode}
# Assign genres to predefined integers
label_count <- results %>% 
  group_by(artist_top_genre) %>% 
  mutate(id = cur_group_id()) %>% 
  ungroup() %>% 
  summarise(correct_labels = sum(.cluster == id))


# Print results  
cat("Result:", label_count$correct_labels, "out of", nrow(results), "samples were correctly labeled.")

cat("\nAccuracy score:", label_count$correct_labels/nrow(results))

```

This model's accuracy is not bad, but not great. It may be that the data does not lend itself well to K-Means clustering: it is too imbalanced, too weakly correlated, and there is too much variance between the column values for it to cluster well. In fact, the clusters that form are probably heavily influenced or skewed by the three genre categories we defined above.

Nevertheless, that was quite a learning process!

In Scikit-learn's documentation, you can see that a model like this one, with clusters not very well demarcated, has a 'variance' problem:

![Infographic from Scikit-learn](../../images/problems.png)

## **Variance**

Variance is defined as "the average of the squared differences from the Mean" [(Source)](https://www.mathsisfun.com/data/standard-deviation.html). In the context of this clustering problem, it refers to the fact that the numbers in our dataset tend to diverge a bit too much from the mean.
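As a formula, that quoted definition is the population variance (note that R's `var()` computes the sample version, which divides by $n - 1$ instead):

$$\sigma^2 = \frac{1}{n} \sum_{i=1}^{n} (x_i - \bar{x})^2$$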

✅ This is a great moment to think about all the ways you could correct this issue. Tweak the data a bit more? Use different columns? Use a different algorithm? Hint: Try [scaling your data](https://www.mygreatlearning.com/blog/learning-data-science-with-k-means-clustering/) to normalize it and test other columns.

> Try this '[variance calculator](https://www.calculatorsoup.com/calculators/statistics/variance-calculator.php)' to understand the concept a bit more.

------------------------------------------------------------------------

## **🚀Challenge**

Spend some time with this notebook, tweaking parameters. Can you improve the accuracy of the model by cleaning the data more (removing outliers, for example)? You can use weights to give more weight to given data samples. What else can you do to create better clusters?

Hint: Try to scale your data. There's commented code in the notebook that adds standard scaling to make the data columns resemble each other more closely in terms of range. You'll find that while the silhouette score goes down, the 'kink' in the elbow graph smooths out. This is because leaving the data unscaled allows data with less variance to carry more weight. Read a bit more on this problem [here](https://stats.stackexchange.com/questions/21222/are-mean-normalization-and-feature-scaling-needed-for-k-means-clustering/21226#21226).
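A minimal sketch of that hint, mirroring the commented-out `scale()` line from section 2 (the exact scores you get may differ slightly):

```{r scaling_sketch}
# Standardize the selected columns, then re-run the clustering
df_scaled <- df_numeric_select %>% 
  mutate(across(everything(), ~ as.numeric(scale(.x))))

set.seed(2056)
kclust_scaled <- kmeans(df_scaled, centers = 3, nstart = 25)

# Compare this average silhouette score with the unscaled one (~0.549)
mean(silhouette(kclust_scaled$cluster, dist(df_scaled))[, 3])
```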

## [**Post-lecture quiz**](https://gray-sand-07a10f403.1.azurestaticapps.net/quiz/30/)

## **Review & Self Study**

-   Take a look at a K-Means Simulator [such as this one](https://user.ceng.metu.edu.tr/~akifakkus/courses/ceng574/k-means/). You can use this tool to visualize sample data points and determine its centroids. You can edit the data's randomness, the number of clusters and the number of centroids. Does this help you get an idea of how the data can be grouped?

-   Also, take a look at [this handout on K-Means](https://stanford.edu/~cpiech/cs221/handouts/kmeans.html) from Stanford.

Want to try out your newly acquired clustering skills on data sets that lend themselves well to K-Means clustering? Please see:

-   [Train and Evaluate Clustering Models](https://rpubs.com/eR_ic/clustering) using Tidymodels and friends

-   [K-means Cluster Analysis](https://uc-r.github.io/kmeans_clustering), UC Business Analytics R Programming Guide

-   [K-means clustering with tidy data principles](https://www.tidymodels.org/learn/statistics/k-means/)

## **Assignment**

[Try different clustering methods](https://github.com/microsoft/ML-For-Beginners/blob/main/5-Clustering/2-K-Means/assignment.md)

## THANK YOU TO:

[Jen Looper](https://www.twitter.com/jenlooper) for creating the original Python version of this module ♥️

[`Allison Horst`](https://twitter.com/allison_horst/) for creating the amazing illustrations that make R more welcoming and engaging. Find more illustrations at her [gallery](https://www.google.com/url?q=https://github.com/allisonhorst/stats-illustrations&sa=D&source=editors&ust=1626380772530000&usg=AOvVaw3zcfyCizFQZpkSLzxiiQEM).

Happy Learning,

[Eric](https://twitter.com/ericntay), Gold Microsoft Learn Student Ambassador.

![Artwork by \@allison_horst](../../images/r_learners_sm.jpeg)

#```{r include=FALSE}
#library(here)
#library(rmd2jupyter)
#rmd2jupyter("lesson_15.Rmd")
#```

diff --git a/6-NLP/1-Introduction-to-NLP/README.md b/6-NLP/1-Introduction-to-NLP/README.md
index 81adfbbba..430b683df 100644
--- a/6-NLP/1-Introduction-to-NLP/README.md
+++ b/6-NLP/1-Introduction-to-NLP/README.md
@@ -133,7 +133,7 @@ Let's create the bot next. We'll start by defining some phrases.

It was nice talking to you, goodbye!
```

- One possible solution to the task is [here](solution/bot.py)
+ One possible solution to the task is [here](https://github.com/microsoft/ML-For-Beginners/blob/main/6-NLP/1-Introduction-to-NLP/solution/bot.py)

✅ Stop and consider

diff --git a/6-NLP/2-Tasks/README.md b/6-NLP/2-Tasks/README.md
index c29033b48..036e30f7c 100644
--- a/6-NLP/2-Tasks/README.md
+++ b/6-NLP/2-Tasks/README.md
@@ -187,7 +187,7 @@ Hmm, that's not great. Can you tell me more about old hounddogs?

It was nice talking to you, goodbye!
```

-One possible solution to the task is [here](solution/bot.py)
+One possible solution to the task is [here](https://github.com/microsoft/ML-For-Beginners/blob/main/6-NLP/2-Tasks/solution/bot.py)

✅ Knowledge Check

diff --git a/6-NLP/3-Translation-Sentiment/README.md b/6-NLP/3-Translation-Sentiment/README.md
index 712ef7313..9ed7a186f 100644
--- a/6-NLP/3-Translation-Sentiment/README.md
+++ b/6-NLP/3-Translation-Sentiment/README.md
@@ -143,7 +143,7 @@ Your task is to determine, using sentiment polarity, if *Pride and Prejudice* ha

1. If the polarity is 1 or -1 store the sentence in an array or list of positive or negative messages
5. At the end, print out all the positive sentences and negative sentences (separately) and the number of each.

-Here is a sample [solution](solution/notebook.ipynb).
+Here is a sample [solution](https://github.com/microsoft/ML-For-Beginners/blob/main/6-NLP/3-Translation-Sentiment/solution/notebook.ipynb).

✅ Knowledge Check

diff --git a/6-NLP/5-Hotel-Reviews-2/README.md b/6-NLP/5-Hotel-Reviews-2/README.md
index 38d32907d..36c00bedd 100644
--- a/6-NLP/5-Hotel-Reviews-2/README.md
+++ b/6-NLP/5-Hotel-Reviews-2/README.md
@@ -202,7 +202,7 @@ Finally, and this is delightful (because it didn't take much processing at all),

| Family with older children | 26349 |
| With a pet | 1405 |

-You could argue that `Travellers with friends` is the same as `Group` more or less, and that would be fair to combine the two as above. The code for identifying the correct tags is [the Tags notebook](solution/1-notebook.ipynb).
+You could argue that `Travellers with friends` is the same as `Group` more or less, and that would be fair to combine the two as above. The code for identifying the correct tags is [the Tags notebook](https://github.com/microsoft/ML-For-Beginners/blob/main/6-NLP/5-Hotel-Reviews-2/solution/1-notebook.ipynb).

The final step is to create new columns for each of these tags. Then, for every review row, if the `Tag` column matches one of the new columns, add a 1, if not, add a 0. The end result will be a count of how many reviewers chose this hotel (in aggregate) for, say, business vs leisure, or to bring a pet to, and this is useful information when recommending a hotel.

@@ -347,13 +347,13 @@ print("Saving results to Hotel_Reviews_NLP.csv")

df.to_csv(r"../data/Hotel_Reviews_NLP.csv", index = False)
```

-You should run the entire code for [the analysis notebook](solution/3-notebook.ipynb) (after you've run [your filtering notebook](solution/1-notebook.ipynb) to generate the Hotel_Reviews_Filtered.csv file).
+You should run the entire code for [the analysis notebook](https://github.com/microsoft/ML-For-Beginners/blob/main/6-NLP/5-Hotel-Reviews-2/solution/3-notebook.ipynb) (after you've run [your filtering notebook](https://github.com/microsoft/ML-For-Beginners/blob/main/6-NLP/5-Hotel-Reviews-2/solution/1-notebook.ipynb) to generate the Hotel_Reviews_Filtered.csv file). To review, the steps are: -1. Original dataset file **Hotel_Reviews.csv** is explored in the previous lesson with [the explorer notebook](../4-Hotel-Reviews-1/solution/notebook.ipynb) -2. Hotel_Reviews.csv is filtered by [the filtering notebook](solution/1-notebook.ipynb) resulting in **Hotel_Reviews_Filtered.csv** -3. Hotel_Reviews_Filtered.csv is processed by [the sentiment analysis notebook](solution/3-notebook.ipynb) resulting in **Hotel_Reviews_NLP.csv** +1. Original dataset file **Hotel_Reviews.csv** is explored in the previous lesson with [the explorer notebook](https://github.com/microsoft/ML-For-Beginners/blob/main/6-NLP/4-Hotel-Reviews-1/solution/notebook.ipynb) +2. Hotel_Reviews.csv is filtered by [the filtering notebook](https://github.com/microsoft/ML-For-Beginners/blob/main/6-NLP/5-Hotel-Reviews-2/solution/1-notebook.ipynb) resulting in **Hotel_Reviews_Filtered.csv** +3. Hotel_Reviews_Filtered.csv is processed by [the sentiment analysis notebook](https://github.com/microsoft/ML-For-Beginners/blob/main/6-NLP/5-Hotel-Reviews-2/solution/3-notebook.ipynb) resulting in **Hotel_Reviews_NLP.csv** 4. Use Hotel_Reviews_NLP.csv in the NLP Challenge below ### Conclusion diff --git a/7-TimeSeries/1-Introduction/README.md b/7-TimeSeries/1-Introduction/README.md index 66af0a20b..742b89c8e 100644 --- a/7-TimeSeries/1-Introduction/README.md +++ b/7-TimeSeries/1-Introduction/README.md @@ -71,9 +71,9 @@ In the next lesson, you will build an ARIMA model using [Univariate Time Series] ✅ Identify the variable that changes over time in this dataset -## Time Series [data characteristics](https://online.stat.psu.edu/stat510/lesson/1/1.1) to consider +## Time Series data characteristics to consider -When looking at time series data, you might notice that it has certain characteristics that you need to take into account and mitigate to better understand its patterns. If you consider time series data as potentially providing a 'signal' that you want to analyze, these characteristics can be thought of as 'noise'. You often will need to reduce this 'noise' by offsetting some of these characteristics using some statistical techniques. +When looking at time series data, you might notice that it has [certain characteristics](https://online.stat.psu.edu/stat510/lesson/1/1.1) that you need to take into account and mitigate to better understand its patterns. If you consider time series data as potentially providing a 'signal' that you want to analyze, these characteristics can be thought of as 'noise'. You often will need to reduce this 'noise' by offsetting some of these characteristics using some statistical techniques. Here are some concepts you should know to be able to work with time series: diff --git a/7-TimeSeries/2-ARIMA/README.md b/7-TimeSeries/2-ARIMA/README.md index 8e2e775ee..6de07c0d4 100644 --- a/7-TimeSeries/2-ARIMA/README.md +++ b/7-TimeSeries/2-ARIMA/README.md @@ -34,7 +34,7 @@ Bottom line: ARIMA is used to make a model fit the special form of time series d ## Exercise - build an ARIMA model -Open the _/working_ folder in this lesson and find the _notebook.ipynb_ file. 
+Open the [_/working_](https://github.com/microsoft/ML-For-Beginners/tree/main/7-TimeSeries/2-ARIMA/working) folder in this lesson and find the [_notebook.ipynb_](https://github.com/microsoft/ML-For-Beginners/blob/main/7-TimeSeries/2-ARIMA/working/notebook.ipynb) file. 1. Run the notebook to load the `statsmodels` Python library; you will need this for ARIMA models. diff --git a/7-TimeSeries/3-SVR/README.md b/7-TimeSeries/3-SVR/README.md index cf5005583..8d4eea600 100644 --- a/7-TimeSeries/3-SVR/README.md +++ b/7-TimeSeries/3-SVR/README.md @@ -24,7 +24,7 @@ In the last lesson you learned about ARIMA, which is a very successful statistic The first few steps for data preparation are the same as that of the previous lesson on [ARIMA](https://github.com/microsoft/ML-For-Beginners/tree/main/7-TimeSeries/2-ARIMA). -Open the _/working_ folder in this lesson and find the _notebook.ipynb_ file.[^2] +Open the [_/working_](https://github.com/microsoft/ML-For-Beginners/tree/main/7-TimeSeries/3-SVR/working) folder in this lesson and find the [_notebook.ipynb_](https://github.com/microsoft/ML-For-Beginners/blob/main/7-TimeSeries/3-SVR/working/notebook.ipynb) file.[^2] 1. Run the notebook and import the necessary libraries: [^2] @@ -383,4 +383,4 @@ This lesson was to introduce the application of SVR for Time Series Forecasting. [^1]: The text, code and output in this section was contributed by [@AnirbanMukherjeeXD](https://github.com/AnirbanMukherjeeXD) -[^2]: The text, code and output in this section was taken from [ARIMA](https://github.com/microsoft/ML-For-Beginners/tree/main/7-TimeSeries/2-ARIMA) \ No newline at end of file +[^2]: The text, code and output in this section was taken from [ARIMA](https://github.com/microsoft/ML-For-Beginners/tree/main/7-TimeSeries/2-ARIMA) diff --git a/8-Reinforcement/1-QLearning/README.md b/8-Reinforcement/1-QLearning/README.md index c7d6263e3..9535b5019 100644 --- a/8-Reinforcement/1-QLearning/README.md +++ b/8-Reinforcement/1-QLearning/README.md @@ -17,9 +17,9 @@ By using reinforcement learning and a simulator (the game), you can learn how to In this lesson, we will be experimenting with some code in Python. You should be able to run the Jupyter Notebook code from this lesson, either on your computer or somewhere in the cloud. -You can open [the lesson notebook](notebook.ipynb) and walk through this lesson to build. +You can open [the lesson notebook](https://github.com/microsoft/ML-For-Beginners/blob/main/8-Reinforcement/1-QLearning/notebook.ipynb) and walk through this lesson to build. -> **Note:** If you are opening this code from the cloud, you also need to fetch the [`rlboard.py`](rlboard.py) file, which is used in the notebook code. Add it to the same directory as the notebook. +> **Note:** If you are opening this code from the cloud, you also need to fetch the [`rlboard.py`](https://github.com/microsoft/ML-For-Beginners/blob/main/8-Reinforcement/1-QLearning/rlboard.py) file, which is used in the notebook code. Add it to the same directory as the notebook. ## Introduction @@ -41,7 +41,7 @@ Each cell in this board can either be: * an **apple**, which represents something Peter would be glad to find in order to feed himself. * a **wolf**, which is dangerous and should be avoided. -There is a separate Python module, [`rlboard.py`](rlboard.py), which contains the code to work with this environment. 
Because this code is not important for understanding our concepts, we will import the module and use it to create the sample board (code block 1): +There is a separate Python module, [`rlboard.py`](https://github.com/microsoft/ML-For-Beginners/blob/main/8-Reinforcement/1-QLearning/rlboard.py), which contains the code to work with this environment. Because this code is not important for understanding our concepts, we will import the module and use it to create the sample board (code block 1): ```python from rlboard import * @@ -186,7 +186,7 @@ Suppose we are now at the state *s*, and we want to move to the next state *s'*. This gives the **Bellman formula** for calculating the value of the Q-Table at state *s*, given action *a*: - + Here γ is the so-called **discount factor** that determines to which extent you should prefer the current reward over the future reward and vice versa. @@ -316,4 +316,5 @@ Overall, it is important to remember that the success and quality of the learnin ## [Post-lecture quiz](https://gray-sand-07a10f403.1.azurestaticapps.net/quiz/46/) -## Assignment [A More Realistic World](assignment.md) +## Assignment +[A More Realistic World](assignment.md) diff --git a/8-Reinforcement/1-QLearning/images/bellman-equation.png b/8-Reinforcement/1-QLearning/images/bellman-equation.png new file mode 100644 index 000000000..60ff3c97b Binary files /dev/null and b/8-Reinforcement/1-QLearning/images/bellman-equation.png differ diff --git a/8-Reinforcement/2-Gym/README.md b/8-Reinforcement/2-Gym/README.md index de3eac373..b5e4237a6 100644 --- a/8-Reinforcement/2-Gym/README.md +++ b/8-Reinforcement/2-Gym/README.md @@ -1,7 +1,7 @@ # CartPole Skating The problem we have been solving in the previous lesson might seem like a toy problem, not really applicable for real life scenarios. This is not the case, because many real world problems also share this scenario - including playing Chess or Go. They are similar, because we also have a board with given rules and a **discrete state**. -https://white-water-09ec41f0f.azurestaticapps.net/ + ## [Pre-lecture quiz](https://gray-sand-07a10f403.1.azurestaticapps.net/quiz/47/) ## Introduction @@ -331,10 +331,11 @@ You should see something like this: ## [Post-lecture quiz](https://gray-sand-07a10f403.1.azurestaticapps.net/quiz/48/) -## Assignment: [Train a Mountain Car](assignment.md) +## Assignment +[Train a Mountain Car](assignment.md) ## Conclusion We have now learned how to train agents to achieve good results just by providing them a reward function that defines the desired state of the game, and by giving them an opportunity to intelligently explore the search space. We have successfully applied the Q-Learning algorithm in the cases of discrete and continuous environments, but with discrete actions. -It's important to also study situations where action state is also continuous, and when observation space is much more complex, such as the image from the Atari game screen. In those problems we often need to use more powerful machine learning techniques, such as neural networks, in order to achieve good results. Those more advanced topics are the subject of our forthcoming more advanced AI course. \ No newline at end of file +It's important to also study situations where action state is also continuous, and when observation space is much more complex, such as the image from the Atari game screen. In those problems we often need to use more powerful machine learning techniques, such as neural networks, in order to achieve good results. 
diff --git a/8-Reinforcement/2-Gym/README.md b/8-Reinforcement/2-Gym/README.md
index de3eac373..b5e4237a6 100644
--- a/8-Reinforcement/2-Gym/README.md
+++ b/8-Reinforcement/2-Gym/README.md
@@ -1,7 +1,7 @@
# CartPole Skating

The problem we have been solving in the previous lesson might seem like a toy problem, not really applicable for real life scenarios. This is not the case, because many real world problems also share this scenario - including playing Chess or Go. They are similar, because we also have a board with given rules and a **discrete state**.
-https://white-water-09ec41f0f.azurestaticapps.net/
+

## [Pre-lecture quiz](https://gray-sand-07a10f403.1.azurestaticapps.net/quiz/47/)

## Introduction
@@ -331,10 +331,11 @@ You should see something like this:

## [Post-lecture quiz](https://gray-sand-07a10f403.1.azurestaticapps.net/quiz/48/)

-## Assignment: [Train a Mountain Car](assignment.md)
+## Assignment
+[Train a Mountain Car](assignment.md)

## Conclusion

We have now learned how to train agents to achieve good results just by providing them a reward function that defines the desired state of the game, and by giving them an opportunity to intelligently explore the search space. We have successfully applied the Q-Learning algorithm in the cases of discrete and continuous environments, but with discrete actions.

-It's important to also study situations where action state is also continuous, and when observation space is much more complex, such as the image from the Atari game screen. In those problems we often need to use more powerful machine learning techniques, such as neural networks, in order to achieve good results. Those more advanced topics are the subject of our forthcoming more advanced AI course.
\ No newline at end of file
+It's important to also study situations where the action space is continuous, and where the observation space is much more complex, such as the image from the Atari game screen. In those problems we often need to use more powerful machine learning techniques, such as neural networks, in order to achieve good results. Those more advanced topics are the subject of our forthcoming, more advanced AI course.
diff --git a/9-Real-World/1-Applications/README.md b/9-Real-World/1-Applications/README.md
index 3cf3e16d2..36643172c 100644
--- a/9-Real-World/1-Applications/README.md
+++ b/9-Real-World/1-Applications/README.md
@@ -19,16 +19,14 @@ The finance sector offers many opportunities for machine learning. Many problems

We learned about [k-means clustering](../../5-Clustering/2-K-Means/README.md) earlier in the course, but how can it be used to solve problems related to credit card fraud? K-means clustering comes in handy during a credit card fraud detection technique called **outlier detection**. Outliers, or deviations in observations about a set of data, can tell us if a credit card is being used in a normal capacity or if something unusual is going on. As shown in the paper linked below, you can sort credit card data using a k-means clustering algorithm and assign each transaction to a cluster based on how much of an outlier it appears to be. Then, you can evaluate the riskiest clusters for fraudulent versus legitimate transactions.
-
-https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.680.1195&rep=rep1&type=pdf
+[Reference](https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.680.1195&rep=rep1&type=pdf)
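A minimal sketch of that outlier-detection idea with scikit-learn; the synthetic transactions, the choice of `n_clusters=8`, and the 99th-percentile cutoff are assumptions for illustration, not values from the paper:

```python
# Cluster transactions with k-means, then score each one by its distance
# to its cluster centroid; the farthest points are outlier candidates.
import numpy as np
from sklearn.cluster import KMeans

rng = np.random.default_rng(0)
transactions = rng.normal(size=(500, 2))   # stand-in for scaled transaction features

kmeans = KMeans(n_clusters=8, n_init=10, random_state=0).fit(transactions)
distances = np.linalg.norm(
    transactions - kmeans.cluster_centers_[kmeans.labels_], axis=1
)

threshold = np.percentile(distances, 99)   # flag the farthest 1%
suspicious = np.where(distances > threshold)[0]
print(f"{len(suspicious)} transactions flagged for review")
```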
### Wealth management

In wealth management, an individual or firm handles investments on behalf of their clients. Their job is to sustain and grow wealth in the long-term, so it is essential to choose investments that perform well. One way to evaluate how a particular investment performs is through statistical regression. [Linear regression](../../2-Regression/1-Tools/README.md) is a valuable tool for understanding how a fund performs relative to some benchmark. We can also deduce whether or not the results of the regression are statistically significant, or how much they would affect a client's investments. You could even further expand your analysis using multiple regression, where additional risk factors can be taken into account. For an example of how this would work for a specific fund, check out the paper below on evaluating fund performance using regression.
-
-http://www.brightwoodventures.com/evaluating-fund-performance-using-regression/
+[Reference](http://www.brightwoodventures.com/evaluating-fund-performance-using-regression/)

## 🎓 Education
@@ -37,14 +35,12 @@ The educational sector is also a very interesting area where ML can be applied. The

### Predicting student behavior

[Coursera](https://coursera.com), an online open course provider, has a great tech blog where they discuss many engineering decisions. In this case study, they plotted a regression line to try to explore any correlation between a low NPS (Net Promoter Score) rating and course retention or drop-off.
-
-https://medium.com/coursera-engineering/controlled-regression-quantifying-the-impact-of-course-quality-on-learner-retention-31f956bd592a
+[Reference](https://medium.com/coursera-engineering/controlled-regression-quantifying-the-impact-of-course-quality-on-learner-retention-31f956bd592a)

### Mitigating bias

[Grammarly](https://grammarly.com), a writing assistant that checks for spelling and grammar errors, uses sophisticated [natural language processing systems](../../6-NLP/README.md) throughout its products. They published an interesting case study in their tech blog about how they dealt with gender bias in machine learning, which you learned about in our [introductory fairness lesson](../../1-Introduction/3-fairness/README.md).
-
-https://www.grammarly.com/blog/engineering/mitigating-gender-bias-in-autocorrect/
+[Reference](https://www.grammarly.com/blog/engineering/mitigating-gender-bias-in-autocorrect/)

## 👜 Retail
@@ -53,14 +49,12 @@ The retail sector can definitely benefit from the use of ML, with everything fro

### Personalizing the customer journey

At Wayfair, a company that sells home goods like furniture, helping customers find the right products for their taste and needs is paramount. In this article, engineers from the company describe how they use ML and NLP to "surface the right results for customers". Notably, their Query Intent Engine has been built to use entity extraction, classifier training, asset and opinion extraction, and sentiment tagging on customer reviews. This is a classic use case of how NLP works in online retail.
-
-https://www.aboutwayfair.com/tech-innovation/how-we-use-machine-learning-and-natural-language-processing-to-empower-search
+[Reference](https://www.aboutwayfair.com/tech-innovation/how-we-use-machine-learning-and-natural-language-processing-to-empower-search)

### Inventory management

Innovative, nimble companies like [StitchFix](https://stitchfix.com), a box service that ships clothing to consumers, rely heavily on ML for recommendations and inventory management. Their styling teams work together with their merchandising teams, in fact: "one of our data scientists tinkered with a genetic algorithm and applied it to apparel to predict what would be a successful piece of clothing that doesn't exist today. We brought that to the merchandise team and now they can use that as a tool."
-
-https://www.zdnet.com/article/how-stitch-fix-uses-machine-learning-to-master-the-science-of-styling/
+[Reference](https://www.zdnet.com/article/how-stitch-fix-uses-machine-learning-to-master-the-science-of-styling/)

## 🏥 Health Care
@@ -69,20 +63,17 @@ The health care sector can leverage ML to optimize research tasks and also logis

### Managing clinical trials

Toxicity in clinical trials is a major concern to drug makers. How much toxicity is tolerable? In this study, analyzing various clinical trial methods led to the development of a new approach for predicting the odds of clinical trial outcomes. Specifically, they were able to use random forest to produce a [classifier](../../4-Classification/README.md) that is able to distinguish between groups of drugs.
-
-https://www.sciencedirect.com/science/article/pii/S2451945616302914
+[Reference](https://www.sciencedirect.com/science/article/pii/S2451945616302914)
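As a sketch of the classifier role described in that study (the features and the "drug group" label below are synthetic placeholders, not the paper's data):

```python
# Train a random forest to separate two groups from numeric features,
# mirroring the classifier idea in the clinical-trials paragraph above.
import numpy as np
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import train_test_split

rng = np.random.default_rng(42)
X = rng.normal(size=(200, 6))             # e.g. toxicity-related measurements
y = (X[:, 0] + X[:, 1] > 0).astype(int)   # synthetic group label

X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=42)
clf = RandomForestClassifier(n_estimators=100, random_state=42)
clf.fit(X_train, y_train)
print("held-out accuracy:", clf.score(X_test, y_test))
```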
### Hospital readmission management

Hospital care is costly, especially when patients have to be readmitted. This paper discusses a company that uses ML to predict readmission potential using [clustering](../../5-Clustering/README.md) algorithms. These clusters help analysts to "discover groups of readmissions that may share a common cause".
-
-https://healthmanagement.org/c/healthmanagement/issuearticle/hospital-readmissions-and-machine-learning
+[Reference](https://healthmanagement.org/c/healthmanagement/issuearticle/hospital-readmissions-and-machine-learning)

### Disease management

The recent pandemic has shone a bright light on the ways that machine learning can aid in stopping the spread of disease. In this article, you'll recognize the use of ARIMA, logistic curves, linear regression, and SARIMA. "This work is an attempt to calculate the rate of spread of this virus and thus to predict the deaths, recoveries, and confirmed cases, so that it may help us to prepare better and survive."
-
-https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7979218/
+[Reference](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7979218/)

## 🌲 Ecology and Green Tech
@@ -93,22 +84,19 @@ Nature and ecology consists of many sensitive systems where the interplay betwee

You learned about [Reinforcement Learning](../../8-Reinforcement/README.md) in previous lessons. It can be very useful when trying to predict patterns in nature. In particular, it can be used to track ecological problems like forest fires and the spread of invasive species. In Canada, a group of researchers used Reinforcement Learning to build forest wildfire dynamics models from satellite images. Using an innovative "spatially spreading process (SSP)", they envisioned a forest fire as "the agent at any cell in the landscape." "The set of actions the fire can take from a location at any point in time includes spreading north, south, east, or west or not spreading. This approach inverts the usual RL setup since the dynamics of the corresponding Markov Decision Process (MDP) is a known function for immediate wildfire spread." Read more about the classic algorithms used by this group at the link below.
-
-https://www.frontiersin.org/articles/10.3389/fict.2018.00006/full
+[Reference](https://www.frontiersin.org/articles/10.3389/fict.2018.00006/full)

### Motion sensing of animals

While deep learning has created a revolution in visually tracking animal movements (you can build your own [polar bear tracker](https://docs.microsoft.com/learn/modules/build-ml-model-with-azure-stream-analytics/?WT.mc_id=academic-77952-leestott) here), classic ML still has a place in this task. Sensors to track movements of farm animals and IoT make use of this type of visual processing, but more basic ML techniques are useful to preprocess data. For example, in this paper, sheep postures were monitored and analyzed using various classifier algorithms. You might recognize the ROC curve on page 335.
-
-https://druckhaus-hofmann.de/gallery/31-wj-feb-2020.pdf
+[Reference](https://druckhaus-hofmann.de/gallery/31-wj-feb-2020.pdf)
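If the ROC curve mentioned in the sheep-posture paragraph is unfamiliar, it can be computed in a few lines; the labels and scores here are made-up stand-ins for a posture classifier's output:

```python
# Compute an ROC curve (false positive rate vs. true positive rate across
# thresholds) and the area under it, using scikit-learn.
import numpy as np
from sklearn.metrics import roc_auc_score, roc_curve

y_true = np.array([0, 0, 1, 1, 0, 1, 1, 0, 1, 0])  # placeholder labels
y_score = np.array([0.1, 0.4, 0.35, 0.8, 0.2, 0.7, 0.9, 0.5, 0.6, 0.3])  # scores

fpr, tpr, thresholds = roc_curve(y_true, y_score)
print("AUC:", roc_auc_score(y_true, y_score))
for f, t, th in zip(fpr, tpr, thresholds):
    print(f"threshold={th:.2f}  FPR={f:.2f}  TPR={t:.2f}")
```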
### ⚡️ Energy Management

In our lessons on [time series forecasting](../../7-TimeSeries/README.md), we invoked the concept of smart parking meters to generate revenue for a town based on understanding supply and demand. This article discusses in detail how clustering, regression and time series forecasting combined to help predict future energy use in Ireland, based on smart metering.
-
-https://www-cdn.knime.com/sites/default/files/inline-images/knime_bigdata_energy_timeseries_whitepaper.pdf
+[Reference](https://www-cdn.knime.com/sites/default/files/inline-images/knime_bigdata_energy_timeseries_whitepaper.pdf)

## 💼 Insurance
@@ -117,8 +105,7 @@ The insurance sector is another sector that uses ML to construct and optimize vi

### Volatility Management

MetLife, a life insurance provider, is forthcoming with the way they analyze and mitigate volatility in their financial models. In this article you'll notice binary and ordinal classification visualizations. You'll also discover forecasting visualizations.
-
-https://investments.metlife.com/content/dam/metlifecom/us/investments/insights/research-topics/macro-strategy/pdf/MetLifeInvestmentManagement_MachineLearnedRanking_070920.pdf
+[Reference](https://investments.metlife.com/content/dam/metlifecom/us/investments/insights/research-topics/macro-strategy/pdf/MetLifeInvestmentManagement_MachineLearnedRanking_070920.pdf)

## 🎨 Arts, Culture, and Literature
@@ -127,8 +114,7 @@ In the arts, for example in journalism, there are many interesting problems. Det

### Fake news detection

Detecting fake news has become a game of cat and mouse in today's media. In this article, researchers suggest that a system combining several of the ML techniques we have studied can be tested and the best model deployed: "This system is based on natural language processing to extract features from the data and then these features are used for the training of machine learning classifiers such as Naive Bayes, Support Vector Machine (SVM), Random Forest (RF), Stochastic Gradient Descent (SGD), and Logistic Regression (LR)."
-
-https://www.irjet.net/archives/V7/i6/IRJET-V7I6688.pdf
+[Reference](https://www.irjet.net/archives/V7/i6/IRJET-V7I6688.pdf)

This article shows how combining different ML domains can produce interesting results that can help stop fake news from spreading and creating real damage; in this case, the impetus was the spread of rumors about COVID treatments that incited mob violence.
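The pipeline quoted in the fake-news passage maps naturally onto a few lines of scikit-learn. A hedged sketch with one of the named classifiers (logistic regression) and a two-document toy corpus; the real system trains and compares all five model families:

```python
# TF-IDF feature extraction feeding a classic classifier, as in the quote:
# NLP features -> Naive Bayes / SVM / RF / SGD / Logistic Regression.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

texts = ["officials confirm the report", "miracle cure doctors hate"]  # toy corpus
labels = [0, 1]                            # 0 = legitimate, 1 = fake

model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)
print(model.predict(["new miracle cure report"]))
```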
@@ -137,16 +123,14 @@
### Art museums

Museums are at the cusp of an AI revolution in which cataloging and digitizing collections and finding links between artifacts is becoming easier as technology advances. Projects such as [In Codice Ratio](https://www.sciencedirect.com/science/article/abs/pii/S0306457321001035#:~:text=1.,studies%20over%20large%20historical%20sources.) are helping unlock the mysteries of inaccessible collections such as the Vatican Archives. But the business aspect of museums benefits from ML models as well. For example, the Art Institute of Chicago built models to predict what audiences are interested in and when they will attend expositions. The goal is to create individualized and optimized visitor experiences each time the user visits the museum. "During fiscal 2017, the model predicted attendance and admissions within 1 percent of accuracy, says Andrew Simnick, senior vice president at the Art Institute."
-
-https://www.chicagobusiness.com/article/20180518/ISSUE01/180519840/art-institute-of-chicago-uses-data-to-make-exhibit-choices
+[Reference](https://www.chicagobusiness.com/article/20180518/ISSUE01/180519840/art-institute-of-chicago-uses-data-to-make-exhibit-choices)

## 🏷 Marketing

### Customer segmentation

The most effective marketing strategies target customers in different ways based on various groupings. In this article, the uses of clustering algorithms are discussed to support differentiated marketing. Differentiated marketing helps companies improve brand recognition, reach more customers, and make more money.
-
-https://ai.inqline.com/machine-learning-for-marketing-customer-segmentation/
+[Reference](https://ai.inqline.com/machine-learning-for-marketing-customer-segmentation/)

## 🚀 Challenge
diff --git a/README.md b/README.md
index 1a40ccbd4..4cc946f7b 100644
--- a/README.md
+++ b/README.md
@@ -100,11 +100,11 @@ By ensuring that the content aligns with projects, the process is made more enga
| 08 | North American pumpkin prices 🎃 | [Regression](2-Regression/README.md) | Build a logistic regression model | | |
| 09 | A Web App 🔌 | [Web App](3-Web-App/README.md) | Build a web app to use your trained model | [Python](3-Web-App/1-Web-App/README.md) | Jen |
| 10 | Introduction to classification | [Classification](4-Classification/README.md) | Clean, prep, and visualize your data; introduction to classification |