From 7609666fb2e52a1528832b3188ecb9d035f081ad Mon Sep 17 00:00:00 2001
From: Jen Looper
Date: Mon, 7 Jun 2021 18:11:05 -0400
Subject: [PATCH] classification 1

---
 1-Introduction/1-intro-to-ML/README.md | 2 +
 4-Classification/1-Introduction/README.md | 195 ++-
 .../1-Introduction/images/cuisine-dist.png | Bin 0 -> 4869 bytes
 .../1-Introduction/images/indian.png | Bin 0 -> 8174 bytes
 .../1-Introduction/images/japanese.png | Bin 0 -> 7755 bytes
 .../1-Introduction/images/korean.png | Bin 0 -> 9108 bytes
 .../1-Introduction/images/thai.png | Bin 0 -> 7844 bytes
 .../1-Introduction/notebook.ipynb | 28 +
 .../solution/data-prep-visual.ipynb | 1521 -----------------
 .../solution/intro-classification.ipynb | 563 ------
 .../1-Introduction/solution/notebook.ipynb | 164 +-
 .../README.md | 0
 .../assignment.md | 0
 .../notebook.ipynb} | 0
 .../2-Classifiers-1/solution/notebook.ipynb | 336 ++++
 .../translations/README.es.md | 0
 .../README.md | 0
 .../assignment.md | 0
 .../3-Classifiers-2/notebook.ipynb | 0
 .../3-Classifiers-2/solution/notebook.ipynb | 0
 .../3-Classifiers-2/translations/README.es.md | 0
 4-Classification/4-Applied/notebook.ipynb | 0
 .../4-Applied/solution/notebook.ipynb | 0
 4-Classification/README.md | 9 +-
 24 files changed, 653 insertions(+), 2165 deletions(-)
 create mode 100644 4-Classification/1-Introduction/images/cuisine-dist.png
 create mode 100644 4-Classification/1-Introduction/images/indian.png
 create mode 100644 4-Classification/1-Introduction/images/japanese.png
 create mode 100644 4-Classification/1-Introduction/images/korean.png
 create mode 100644 4-Classification/1-Introduction/images/thai.png
 delete mode 100644 4-Classification/1-Introduction/solution/data-prep-visual.ipynb
 delete mode 100644 4-Classification/1-Introduction/solution/intro-classification.ipynb
 rename 4-Classification/{2-Discriminative => 2-Classifiers-1}/README.md (100%)
 rename 4-Classification/{2-Discriminative => 2-Classifiers-1}/assignment.md (100%)
 rename 4-Classification/{2-Discriminative/translations/README.es.md => 2-Classifiers-1/notebook.ipynb} (100%)
 create mode 100644 4-Classification/2-Classifiers-1/solution/notebook.ipynb
 rename 4-Classification/{3-Generative => 2-Classifiers-1}/translations/README.es.md (100%)
 rename 4-Classification/{3-Generative => 3-Classifiers-2}/README.md (100%)
 rename 4-Classification/{3-Generative => 3-Classifiers-2}/assignment.md (100%)
 create mode 100644 4-Classification/3-Classifiers-2/notebook.ipynb
 create mode 100644 4-Classification/3-Classifiers-2/solution/notebook.ipynb
 create mode 100644 4-Classification/3-Classifiers-2/translations/README.es.md
 create mode 100644 4-Classification/4-Applied/notebook.ipynb
 create mode 100644 4-Classification/4-Applied/solution/notebook.ipynb

diff --git a/1-Introduction/1-intro-to-ML/README.md b/1-Introduction/1-intro-to-ML/README.md
index 8e431fdbb..1d7a03697 100644
--- a/1-Introduction/1-intro-to-ML/README.md
+++ b/1-Introduction/1-intro-to-ML/README.md
@@ -2,6 +2,8 @@

 [![ML, AI, Deep Learning - What's the difference?](https://img.youtube.com/vi/lTd9RSxS9ZE/0.jpg)](https://youtu.be/lTd9RSxS9ZE "ML, AI, Deep Learning - What's the difference?")

+python path: https://docs.microsoft.com/en-us/learn/paths/python-language/
+
 > 🎥 Click the image above for a video discussing the difference between Machine Learning, AI, and Deep Learning.

 ## [Pre-lecture quiz](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/1/)

 ### Introduction
diff --git a/4-Classification/1-Introduction/README.md b/4-Classification/1-Introduction/README.md
index bc0c7aa4a..446fecef4 100644
--- a/4-Classification/1-Introduction/README.md
+++ b/4-Classification/1-Introduction/README.md
@@ -1,32 +1,205 @@
 # Introduction to Classification

-In these four lessons, you will discover the 'meat and potatoes' of classic machine learning - Classification. No pun intended - we will walk through using various classification algorithms with a dataset all about the brilliant cuisines of Asia. Hope you're hungry!
+In these four lessons, you will discover the 'meat and potatoes' of classic machine learning - Classification. No pun intended - we will walk through using various classification algorithms with a dataset all about the brilliant cuisines of Asia and India. Hope you're hungry!

 Classification is a form of [supervised learning](https://wikipedia.org/wiki/Supervised_learning) that bears a lot in common with Regression techniques. If machine learning is all about assigning names to things via datasets, then classification generally falls into two groups: binary classification and multiclass classification.

-Remember, Linear Regression helped you predict relationships between variables and make accurate predictions on where a new datapoint would fall in relationship to that line. So, you could predict what price a pumpkin would be in September vs. December, for example. Logistic Regression helped you discover binary categories: at this price point, is this pumpkin orange or not-orange?
-
-Classification uses various algorithms to determine a data point's label or class. Let's work with this recipe data to see whether, by observing a group of ingredients, we can determine its cuisine of origin.
-
 [![Introduction to Classification](https://img.youtube.com/vi/eg8DJYwdMyg/0.jpg)](https://youtu.be/eg8DJYwdMyg "Introduction to Classification")

 > 🎥 Click the image above for a video: MIT's John Guttag introduces Classification

+Remember, Linear Regression helped you predict relationships between variables and make accurate predictions on where a new datapoint would fall in relationship to that line. So, you could predict what price a pumpkin would be in September vs. December, for example. Logistic Regression helped you discover binary categories: at this price point, is this pumpkin orange or not-orange?
+
+Classification uses various algorithms to determine a data point's label or class. Let's work with this recipe data to see whether, by observing a group of ingredients, we can determine its cuisine of origin.
 ## [Pre-lecture quiz](link-to-quiz-app)

 ### Introduction

-Before working to clean the data and prepare it for analysis, it's useful to understand several of the algorithms that you will use.
+Classification is one of the fundamental activities of the machine learning researcher and data scientist. From basic classification of a binary value ("is this email spam or not?") to complex image classification and segmentation using computer vision, it's always useful to be able to sort data into classes and ask questions of it. Or, to state the process in a more scientific way, your classification method creates a predictive model that enables you to map the relationship between input variables and output variables.
+
+Before starting the process of cleaning our data, visualizing it, and prepping it for our ML tasks, let's learn a bit about the various ways machine learning can be leveraged to classify data.
+
+Derived from [statistics](https://wikipedia.org/wiki/Statistical_classification), classification using classic machine learning uses features, such as 'smoker', 'weight', and 'age', to determine 'likelihood of developing X disease'. As a supervised learning technique similar to the Regression exercises you performed earlier, your data is labeled, and the ML algorithms use those labels to learn how to predict the class (or 'label') of each data point and assign it to a group or outcome.
+
+✅ Take a moment to imagine a dataset about recipes. What would a multiclass model be able to answer? What would a binary model be able to answer? What if you wanted to determine whether a given cuisine was likely to contain Fenugreek? What if you wanted to see if, given a present of a grocery bag full of star anise, artichokes, cauliflower, and horseradish, you could create a typical Indian dish?
+
+## Hello 'classifier'
+
+The question we want to ask of this recipe dataset is actually a **multiclass question**, as we have several potential national cuisines to work with. Given a batch of ingredients, which of these many classes will the data fit?
+
+Scikit-Learn offers several different algorithms to use to classify data, depending on the kind of problem you want to solve. In the next two lessons, you'll learn about several of these algorithms.
+
+## Clean and Balance Your Data
+
+The first task at hand before starting this project is to clean and **balance** your data to get better results. Start with the blank `notebook.ipynb` file in the root of this folder.
+
+The first thing to install is [imblearn](https://imbalanced-learn.org/stable/). This is a Scikit-Learn-compatible package that will allow you to better balance the data (you will learn more about this task in a minute).
+
+```bash
+pip install imblearn
+```
+
+Then, import the packages you need to import your data and visualize it. Import SMOTE from imblearn.
+
+```python
+import pandas as pd
+import matplotlib.pyplot as plt
+import matplotlib as mpl
+import numpy as np
+from imblearn.over_sampling import SMOTE
+```
+The next task will be to import the data:
+
+```python
+df = pd.read_csv('../data/recipes.csv')
+```
+
+Take a look at the first few rows of the data:
+
+```python
+df.head()
+```

-- Support-vector machines
-- Naive Bayes
-- Decision trees
-- K-nearest neighbor algorithm
+The first five rows look like this:

+| | Unnamed: 0 | cuisine | almond | angelica | anise | anise_seed | apple | apple_brandy | apricot | armagnac | ... | whiskey | white_bread | white_wine | whole_grain_wheat_flour | wine | wood | yam | yeast | yogurt | zucchini |
+| --- | ---------- | ------- | ------ | -------- | ----- | ---------- | ----- | ------------ | ------- | -------- | --- | ------- | ----------- | ---------- | ----------------------- | ---- | ---- | --- | ----- | ------ | -------- |
+| 0 | 65 | indian | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+| 1 | 66 | indian | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+| 2 | 67 | indian | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+| 3 | 68 | indian | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+| 4 | 69 | indian | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
-✅ Knowledge Check - use this moment to stretch students' knowledge with open questions
+Get info about this data:

+```python
+df.info()
+```
+```
+<class 'pandas.core.frame.DataFrame'>
+RangeIndex: 2448 entries, 0 to 2447
+Columns: 385 entries, Unnamed: 0 to zucchini
+dtypes: int64(384), object(1)
+memory usage: 7.2+ MB
+```
+## Learning about cuisines
+
+Now the work starts to become more interesting. Let's discover the distribution of data, per cuisine:
+
+```python
+df.cuisine.value_counts().plot.barh()
+```
+
+![cuisine data distribution](images/cuisine-dist.png)
+
+There are a finite number of cuisines, but the distribution of data is uneven. You can fix that! Before doing so, explore a little more. How much data exactly is available per cuisine?
+
+```python
+thai_df = df[(df.cuisine == "thai")]
+japanese_df = df[(df.cuisine == "japanese")]
+chinese_df = df[(df.cuisine == "chinese")]
+indian_df = df[(df.cuisine == "indian")]
+korean_df = df[(df.cuisine == "korean")]
+
+print(f'thai df: {thai_df.shape}')
+print(f'japanese df: {japanese_df.shape}')
+print(f'chinese df: {chinese_df.shape}')
+print(f'indian df: {indian_df.shape}')
+print(f'korean df: {korean_df.shape}')
+```
+thai df: (289, 385)
+japanese df: (320, 385)
+chinese df: (442, 385)
+indian df: (598, 385)
+korean df: (799, 385)
+
+## Discovering ingredients
+
+Now you can dig deeper into the data and learn what the typical ingredients are per cuisine. You should clean out recurrent data that creates confusion between cuisines, so let's learn about this problem.
+
+Create a function in Python to create an ingredient dataframe. This function starts by dropping the unhelpful columns, then sorts the ingredients by their count:
+
+```python
+def create_ingredient_df(df):
+    # Sum each ingredient column across all recipes of this cuisine
+    ingredient_df = df.T.drop(['cuisine','Unnamed: 0']).sum(axis=1).to_frame('value')
+    # Keep only the ingredients that are actually used
+    ingredient_df = ingredient_df[(ingredient_df.T != 0).any()]
+    ingredient_df = ingredient_df.sort_values(by='value', ascending=False, inplace=False)
+    return ingredient_df
+```
+Now you can use that function to get an idea of the top ten most popular ingredients by cuisine:
+
+```python
+thai_ingredient_df = create_ingredient_df(thai_df)
+thai_ingredient_df.head(10).plot.barh()
+```
+![thai](images/thai.png)
+
+```python
+japanese_ingredient_df = create_ingredient_df(japanese_df)
+japanese_ingredient_df.head(10).plot.barh()
+```
+![japanese](images/japanese.png)
+
+```python
+chinese_ingredient_df = create_ingredient_df(chinese_df)
+chinese_ingredient_df.head(10).plot.barh()
+```
+![chinese](images/chinese.png)
+
+```python
+indian_ingredient_df = create_ingredient_df(indian_df)
+indian_ingredient_df.head(10).plot.barh()
+```
+![indian](images/indian.png)
+
+```python
+korean_ingredient_df = create_ingredient_df(korean_df)
+korean_ingredient_df.head(10).plot.barh()
+```
+![korean](images/korean.png)
+
+Now, drop the most common ingredients that create confusion between distinct cuisines. Everyone loves rice, garlic and ginger!
+
+```python
+feature_df = df.drop(['cuisine','Unnamed: 0','rice','garlic','ginger'], axis=1)
+labels_df = df.cuisine
+feature_df.head()
+```
+## Balance the dataset
+
+Now that you have cleaned the data, use [SMOTE](https://imbalanced-learn.org/dev/references/generated/imblearn.over_sampling.SMOTE.html) - "Synthetic Minority Over-sampling Technique" - to balance it. This strategy generates new samples by interpolation.
+
+```python
+oversample = SMOTE()
+transformed_feature_df, transformed_label_df = oversample.fit_resample(feature_df, labels_df)
+# Recombine labels and features so the balanced data can be inspected and saved
+transformed_df = pd.concat([transformed_label_df, transformed_feature_df], axis=1, join='outer')
+```
+By balancing your data, you'll have better results when classifying it. Now you can check the number of labels per cuisine:
+
+```python
+print(f'new label count: {transformed_label_df.value_counts()}')
+print(f'old label count: {df.cuisine.value_counts()}')
+```
+new label count: korean 799
+chinese 799
+indian 799
+japanese 799
+thai 799
+Name: cuisine, dtype: int64
+old label count: korean 799
+indian 598
+chinese 442
+japanese 320
+thai 289
+Name: cuisine, dtype: int64
+
+The data is nice and clean, balanced, and very delicious! You can take one more look at the data using `transformed_df.head()` and `transformed_df.info()`. Save a copy of this data for use in future lessons:
+
+```python
+transformed_df.to_csv("../../data/cleaned_cuisine.csv")
+```
+This fresh CSV can now be found in the root data folder.
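As a rough illustration of what "generates new samples by interpolation" means in the balancing step, here is a minimal sketch in plain Python. It is not imblearn's implementation: the function name `smote_like_oversample`, the parameter `k`, and the toy data are assumptions made for this example only. For each under-represented class it picks an existing sample, picks one of its nearest same-class neighbours, and creates a new point somewhere on the line between them.

```python
import random
from collections import Counter


def smote_like_oversample(features, labels, k=3, seed=0):
    """Balance classes by interpolating between same-class neighbours.

    Hypothetical, simplified sketch of the SMOTE idea (not imblearn's code).
    """
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())  # every class is grown to the largest class size
    new_features = [list(row) for row in features]
    new_labels = list(labels)

    for label, count in counts.items():
        members = [row for row, lab in zip(features, labels) if lab == label]
        for _ in range(target - count):
            base_idx = rng.randrange(len(members))
            base = members[base_idx]
            others = [m for i, m in enumerate(members) if i != base_idx]
            if others:
                # Sort the remaining members by squared distance to the base sample
                others.sort(key=lambda m: sum((a - b) ** 2 for a, b in zip(base, m)))
                neighbour = rng.choice(others[:k])
            else:
                neighbour = base  # a single-member class can only be duplicated
            lam = rng.random()  # position along the segment base -> neighbour
            synthetic = [a + lam * (b - a) for a, b in zip(base, neighbour)]
            new_features.append(synthetic)
            new_labels.append(label)
    return new_features, new_labels


# Toy data: four 'korean' rows and two 'thai' rows
features = [[0.0, 0.0], [1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [5.0, 5.0], [6.0, 5.0]]
labels = ["korean"] * 4 + ["thai"] * 2
balanced_features, balanced_labels = smote_like_oversample(features, labels)
print(Counter(balanced_labels))
```

Because the synthetic rows are interpolated, their values can fall between the original 0/1 ingredient flags. This sketch does not say how imblearn treats such columns, so keep that in mind when inspecting the balanced data.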
 ## 🚀Challenge

 ## [Post-lecture quiz](link-to-quiz-app)
diff --git a/4-Classification/1-Introduction/images/cuisine-dist.png b/4-Classification/1-Introduction/images/cuisine-dist.png
new file mode 100644
index 0000000000000000000000000000000000000000..41ed93add1cdae27bb0ac9e0eab769eb8034a122
GIT binary patch
literal 4869
[binary image data omitted]

diff --git a/4-Classification/1-Introduction/images/indian.png b/4-Classification/1-Introduction/images/indian.png
new file mode 100644
index 0000000000000000000000000000000000000000..bd3f8b7da69b8104820de78ce12cbefabb911aac
GIT binary patch
literal 8174
[binary image data omitted]

diff --git a/4-Classification/1-Introduction/images/korean.png b/4-Classification/1-Introduction/images/korean.png
new file mode 100644
index 0000000000000000000000000000000000000000..18d77efd3ff1943572961c0a8d0fc852ee586b35
GIT binary patch
literal 9108
[binary image data omitted]
zd@0rJ{gqO*hA}!oTb(=N(Em(++3b0B*C1}>1my->tN6GQEi1!8047wP$LK0}Z;Uo< zV5uoO>JDyA-~Qg_dv5fDyr5~D4Qs9(DA-c=_L=Yw z;4xkpT|!t#FKcl1+`(m7pd}Fo_2@SSu5Q59$gEX;sc-SNzijy)Y=rZq{TWy~!6^Lz z<|P>ph1lY~*pl4?V^X|8md!D3$ESCCE8x|eB_;Y4&IcqtGz~$0dV6QNidi(3UIZuG z{X>a0cbUsJ6>gfu$MGNQ|NkiM)OICJ)^2h~^h@GDT%^Be3QXs7)M<%uocpBsUt=rc z)o#xVfT7^{US0=lh3!@cuEekncw^tGsdWy>@=2Y)!}cKL(yb5=Re z!U^px8ypWtz~^ zY!AW?Z5R);mlmjFTb<;VFmck3~9jRB(80EmvvQYM?{&zrZ5PzrPHr+^Kb+!Q+RE#SLktH!bd9nw%JVZEq0qvGxaY=5RHX0$h5; ze!&3Ye7MF=!^m26Nt-tUZ&xNC%h2mH_Gb$GS*qX7G9k5jiY4c#=+nnofhcm zA);x_kwdTo1zH)lv`s1D@y%GTsGtzCm)ZG7fxTUJx2v3@G_gZ`r%iV!^(V6L4s@nw zWOy4Ya!xau1SlUm26GELR{N1c^Ae;p&F@vi5sezv2~_rWn|$fiRMgmqEi@Uw!07W^ z{aoW~;04j!_j6hstcsK?rDF`7Chf-;iX?Z4d7QO{dqAZzym3oy)*s5~ASVz@!}_T^ zmc@PEadTd+ur6lvh7NJU)VbeV5xC?)4#Q*R)B;uE;Ewov)cl2B|A!MZIF$K!;C-VC zn+T^-bt`j06IYGDV=XNaEJ>~+dCRNwo>i@Xo#%!Ui35zQv>*X7V2Rbx0pBN(lAHKD z{%>@Fpr6~Co7|<>epo(n-eyIrAug~3CiVSg$p;i0gw~MYD*Fp;FM;9uD!k-)k_SBI z&qU5zJkmLqIf}ripatm_fR+o#2O4IiG4A7B8`wzb5#88j=K#}_Q4DC=avY$85J+?3 z_uHa*+Y6(2|F`F?m*`Ps-E@z~>Kjg5+wPB2w1{fxElWLZo<@!aaAg}a}zLW`oJuTW|n-ePku7~0Z#R+@C0`O zu#!NK#B2G|JIURV0?w$LtaSluxb%(kyA((7t^zokZhNo~fh(Hj>R2Ci8Nz@crnni4 zOQ}_&CoQzC2ZgdR{=P;!322w&1EX}4e&q<*4QYm_eG_44+ArkA_lA%q<*dRBcYkM+ zfbu~^sKI>_@tHuCm2|{=EH4Y%%~kQ1Y=k?DcbNMEg?bpUI%Gx@%ap(G8Yd|Z-%Tdp z*I|YjpK<+1un4Yy)F!HEyZpt0{==yL4>9qFQ)8*EtX2JByeOOWxc6LoMnKVG*x9>( zugLwNsjf9AfVP@Fud7WYs_ch(Bp)Z{`LGzEBc=jN>VZsAngDCwiTm1JV=Mw$gdC^O z!pZMK@5H+1u$7g78QpR7`p#C>?C}#58OHrnb~E35xWe54 z*iBh3K#UnR?clB+>9`i|D?z}1uiWu-l783NxvUUmbK#dE=puL?s_i)-3HTya2swOv zif^%<`uAUeug!($AI1C%HMr{tuZt8B>fe5~}HoK(Epq_U0_DPLOR5)^zAd2b8w37IF0=y4ClS zis0zeSLBH70$h{>glNoE{&Y zO=N!R>=haIvI$Am2@niYV%bd1_WnFtaiiJQQ!b(4>fe0;^NtU$0_E3dgT31SdCA`8 z?`@ZVF#vaBd6M93!nzTs$+fu?zz?eO&Vw<6t$}Hi2K&0#MhcjI4rSA~=|4kTxG&l?*!>x4UF(<(6g2K$5U9rF%hl!^`rW6giRk=gbi()*prhlQT@+*u}5 zF81ec%eFbW5%X#d2ySFY$46DI8#Rru*ew|~Il6CB7W4UK_MyUXX+?XK-4k!6P!RkPs3XI?~A`Oot^?^}MlKPBZ}1J*U5e?z+5u3tSv 
z%gp@J3x?vpx3i|N(=E{&=p}R{KWrBUS*83u7apX_+0dmV#K+YRI`M}I_GTMQAEhOj z#2iXI1IA3Qx)2s*HDS}z1?`f+)Dc`t;RKVL3Ye0Zd7NW~l>LX>(Z-xv=rwT4deUAO z2hX=+TW>x0N~6u`Xy+1_d&e7HEIv9##NtyD32z5*BRa*k?#<{u*dfI4sQa%{A5G*P zbq;x>Tbkod{U4!Fsa{5iNPT9n31-!5bt^xKqB7*PAw-u$)r;$PEPI@?fi~;vFtCKI3;p$%p$>E$|8a1 r$NiZ}8XA4O|NFZl{}+30Szg|%*J0v6cArSsu+fOKKFmW9`Xc%ST z9_18mU{qANk%q>P^Xg#{0U8xk zS4ANZ85n>$N+yq9zx0O5jA-cvFI>aOx6g8a=Ep6mAM(29o_cSg zo5s>`Qd0cFmk$p&h4QtZp7?mMz4@a@#>vVf>BCXD9=Wewu0`&L^2`#zyB~7w8I2+n z+1+j@{9CT?5s=V*dZcZ|IhX7q+E;cGzOlMkSa1w>2h*6iQHiIxBbw0o>y&xVz@!uk zc}lgmuEJ|qJF~{I;<96E=-|wCvouje5_ct^s*X;f_X`1eKor&sfvWOIBnw;DNc0J> z*kG9VoZ;(6$lIhY=@t2KquIUTR+#m&(>b-!Wymt7S(;?VZ3j8WJqFFYL9_%W2{Z7= z1cPKcY9y_sFHEOUX0$=wXFB-T`t!@U!J24)p8%`TY3E$ZenvFoHBpSmn~W2L9DO7v zW>8V~CH4W2m<(cuds>~`E4i{~c3#BB3UVrWH_Fejg)Nz}It^XlAG(HMKX;vw(xACI zhJIeXGBtEUM`yrs`r=NpolGLJ0)m+H4BARU%(-Gr1iuda7D0uX-zK)Vr^Bm-SMZp1 z)p~PoL!L|x{K@#F@cYXS>|T7$ZNI)yIi$Q5WbPQqUm$tr9hS;e(Y<)R$w^wKEOH>E~B|zbdX5)HN{EID1(`)BmjqWMX1sh0SaJ zTB}v+C7zn|s=%f2tz)pvC7Pb3ZGP1v-^gbMCw`nSn?OTonbZs2AOiTI{Mhlh`CET1 zL{Z5fE5(e@sdiEwlGZjBIGr!PJi1Z6Japcp3=54MWBzV?Vs+Qd`fBh1m_^ki*HtnP zM9N1;M|(%Sk@G%3RpC$5OKFUfg-xeLMPlld@ufoQ%f{>za_!fM?W84TaXp_r-zr!h zT!w2pbzzX`o>Tcj0qLu}{4SF`+C)U4-0S1@k~St!c+hczGl|Q!AHy0qRz^e{U!zh? 
z4p*HSbbk>cS6cTc-`Z5wiP}l8$`Enebk*Nq%5G{xZr!>iAg1|1wPH*?6X{2zNG87^ zNhDK^s=qi8jwH8`^r113s(Z}FUDZykD&&X?sRS-_r%SMp>w-+UCi&yhU}mNkGy6bl z#)(Aovs0}R%Y!j=T)PAq&MzMx^Xlp8d2Q5xP_O;^n^#!<;?i3FH$>B&{YPT|e526j z8vw5UK4d=3X&`ghc*9#rDlOjt?nZUP58o$M-Hpp25* z>94q5XgOi)lTWT=i@SW;7<+C;hS`{jAh9=^-7mtgHhY8@u?0#;IIF+KjL^Pg;F zIEx}Kk0q#ZNquLkYR&p?g{xA$AZbdY{( z2QhW&{SMr>RdW~NY#&m*RdWsNGRjX|Q~bd=p{aXu)D4HrdFpwQBRlZEN;2FV6B1ZO z1$klcXwog3pfL0ldJ(R8AZ-~EZsDW=ywrKx&MXRY3DvY{c4S*(ODGnMc6<-lgF9!g z$4XS26iT!+3+@(7ecYGx@R%az2RlR~jt36;R-m@IMd4?HXxfh22pDg=_u2oRivHpg zn~DpyvDAWpm?E7#YLG*Y-dVYOj`*EvzF){u3k<(1u)D3i1R#|twbn41MRm#}5~&m) zq~Bgmm6uat@B9XExbFNmD?5_9o$&8oXpTCdN0PA#6}BlEXp+|*U;)nbYfkYZz0_hc&O_x;*B zg+V?>ek5J4Y9Yh@$e#RGvIBR%WwSGo0WpWX18#iHTh+IzsXpzJ0&TUxSg#H<+z1*^ z;;zCK;0XkO(i!*=zc)mHTHV1kL&Wk` z!&Yb%4R|o2BMR%1s+gO&CfQOZ+4b(6sP%mwRhTP$-WHbvKa(0I{T{wzF%f{ni68q_ zBjxMU!w-L{sHgx11kUw1y4wQtFz$TIdU7#{d%XXrWE*PqsY7;^hCINHd@X9zZ2R7W zfxc2-Qlc`=Y|->UMNBN0JpMufweoDj5v#pgI&`7Hux4FB!^S^0zKP^%8;ixtYDyVe zvUt(b{Mq~h@OKa!*-c&XY9#VDv;>+cf^dM)2^v<-v&5C%;SLxnNyJM+0KwL_=Xh#m zoP2mK#kW1Ees8QIa)>wEPHT24(7(}47=sbuHiO!Qo=gb6PJDwnP9z>)9$Y)q_6Q8e z75y;L6e1Nph2oeph8V+yhYTx4v%S&}3|bycz(0v1Bng6~MDBewgK$`j)kA~gngEq7 z6C0KwBNt!4oWabJL4-j&Y*%%{W9k0pu?kZTb7{^4(12d8j8+8; zKRD`-nM@pH9J+Q~>X0CUfV)HR3msQgJn&M0gk6K8E`yTs^IPhdZC{VCF18 zyN$#1>k&dJY+muoCwjZhH(7QHelUaTbmhsW@O!wsDjpv{d|wti=(y2IxPkIJb-wjC zCetVnc|Q*^kemk8qx^hovZfK(3m8?!0iKA0{|2b1XWnLIZ(=J&H7g_zgYf#cun3%K zp6dAOxqFmOLYRJVq88FD-NF=H(uSPXffm$AVrAXvCs6XMPnM#!L3w)dF9Yx&@!fcF zSSvbhHBGR~E;}b$9JlA>H~Rpb&&APftG}#BzN@u_I_I%RFId>2Qg+3XLfcNI#(nxC z!1|jwe@5>CHu7zrp)mI~$6fYL%ouUnJ=^x%FxjlzqbN*^;-rQOM2v-MQRyRju1 z?f}2st@g1%kT&zYxl?}#^quQ>Mo_v#{GSBZ_eF0(H4PSxCES%cVr(5oBcv6Yk!JGd zoMw11MgZ!|`!2NjHgQEE+*DHydBv)^d~cBSzjS)Bn;#B0Am`S^b(0H+Y4#)#V+9R! 
zM|MX-r(s8xIp@75dc_ce_@R2lTnf>h_Y=I%d@)H2xi_C<^Z;kYck4&-OusGw6B;vg zk*1SoW>kMS=1kf!)427aTZSvhpI&dgG)=i~-t7`T$ns_EW^m&-K@8U2*f;yJtF)HV zfFC+QI+sz-JNt~ZS?nPsm73q7f=_~fp&vPa3Y?&-+jbTvZ>!h6P*6ld^6r+4t?0H) z9;M_d*cMP@gQj0EC@w28WRv3<4#*MOIsG+hHl36UTg@o%fZkqc7slayY!P>(1^1MK zqC$)PXvW{?iigFlfw}u`;5+qCm!g#~J%q10P9}xmaQ=4-o#Jpf+&^ONAHVn?)8((9 zC&?Wb{{hW~fty06!6S%m&uP=bn@S0@HqzS$*&-9=O=n+>_wdJoJ6)zo3FGr%O}m~% z$pa_SK>c-!l&H>;0G!Y{8Dkn53(){Sy4%*sq+&*m6GKY3S-nQdKrcGXo5^ufILQR_ z^e)qi($oc6PV(i}fXYrB&L!Uu%)w`j7Yt3d0JiX|tz0l_B#^DozTdT^$9_l@gE6^; z>?TcMV*J@+h?mKh*HglH`{0?Ph?mHbXL~2a|CJ=qzdiV zIIMc`UNEuGp#12R&&dOzklf%p7K}aPe~~8NdEdrjBRSN!`yV1NG-?Yo(wW>YL;*3_4srH8*;P+U5wbOS;mWp-j3ifXmyJ?J(>+V8 zUdW-jCPEUd$D%pr{$j0+nBZ3)6X<#{=2yy9uw33Sav4uJ8wP|ybPm_LaX44_j&RLc zbCMbSlciRjKHSZ!Is4t9myV7~TCG#ucbShu`H6-5OAdxzBc4KyC|>{Fd9`}c2RV|! zaAEW!NAehpjB+B_Gw5_|L?wgkyjvvCR}!Hmvcje5;#m5lO>o=Hx|hl9Pp!qm{;>i* zmcyxX?WxXr>5`7xGr0?Np{Mv$(sblVyKcg+**tu0hvS9j)rV?oB8XJfqhAo0r)!r( z12VvBdK?j_T#8~@k69hukK9D!ceN-C=zq*i)MqUvZa$7w*43d;-EBp59$|<98xZtV zw@qXIFP73^C^@xzx|Nr|nZs(HI9>XS%J?_HsiZ0*pO-uw0sY|9;^Ox2y(uJ3 z9m&@|5NY5C8i|Rq|JhFb-Jbd>k7#{iq8+I}xTxpaU#%!-!IXb%j(KyXQuyW6(#x$` zJ|s)3RlfOmkC4`7&zdJK9c_zBUARc8@0`pajZu5A(K^sTj7_{dD^JeedPISQ|7Z3iH_S)36Xe;5UglsQud^# z)wUqbd7!%CU*pcL|<1JakEYYUQo*B?;MEQq;f8a?F=+h6dyNK?0qxXZt ztTHFD8O=Gz#iGQidGv-Wn5zwGzy}adL#V>}#A>M(m2mCZ zDw0PpoNCdm$u1%-{myX5Dr&m2plZc#tXQCe6eQVe5Os7891{qR(KivYyBj6`zMitt zL<}Sn1sR=;LuzVLKfq(^Vg%R`IMU7IBmwyFNiW372%&ote28R^fP%4YZIOQDr+EH*wf}psVRP*2PmO3?&y(~Rmch{f zLlgZVO+2G`u@**2>dYFjl~KqFwj)*c%@C_i3q6_45ZgV{cBBTr3w-YIb?&Ed3Bnws zgu&0CqWtP9N(>$>*Q{Y#^=Z>oLNsDd8rb5mmhQ_MdEDKKY0j$&$^(s|PfPCG)F`m3 zH3baeiWCp9+x!fnB@oL~U4Hu6M)Gdt$b`OYf!Hp)bf zqEFt4eX|uQ1wVccXX{-ouHPi(XMbHl{>P2TUZV3snA!z zzo4u-p9l>auw~7lP4f~Pj3uXQ~AkG|ntdX3jci1lh!R9yyOq?T3SReatPXm68KvzszL zSBLtobSf4)m~0N^h1bGv5K#1z!Zc4_h!vG%#)xFxWuV#jizb7qOQQ$3aG}F6`Fzeb zh5@~Vp40KBjd3W>bcn%Ac$_RZC4GwhSR^k=o^6Qr1?RD&%=Z2!4j9yXvJnwm=d~U} zNuu5$rb44`x1%fR?X(gCm;a<-cBHUxlpVwV&;_U 
zJ0Ng+anRLlK|A|Sr{qZOq9>+QNYZs;#&dOw9%63CtS5b}=l0x3M#rNs!i-!JPn7)= zuOQkqF?JtPpIVg!S(^FtxBd@r=gzc$WN0A$PGRbH)E%6QamWZRJOjYq@h1A-4n2@r zIed_)km*D9fOY{-dN#n|%z@ir`PV5Xzssebvw$EWu`ZzQtQh_MZn(At!$OSJ3O z6N2lzs+EzOLZ5h~U;@0Z$EB{1w*!c?y5T&tae%C1j(E#F&x`deUZ55h0M&7Hl9QS# z{^Ofil_jf!fn!g1mGc>DA}t|q@GUuxk4Ja!#p~-K_g1*PHhDv?-mbcYUn}MkQjBX% z3Tl%BOck%N1`%h4!4~f>#4S)Cg!78$AOmeNMRw+H@MXuMxwQOFE6(`DSLKjSamGbr zEVuIK*o?p_vGL7F4UQjHL{t5;GRL{E5Zn$LsknbcN98$L-%K(sye1cdpvI)Cqz`Nl2LSb5|bb-YHs{;U=jnD?in@xnmD{d^_gKHtN0 zh9~aT_;7v7u)jRjKazAe&U~o7{M^j|sAzue{B5{|&CzfEM_C-O?@34cB>-zfq#7#c zQgRa`k4}pS4!xj{gNj;DDB*u;soCo5Uaa`+%>GDp|Ik(Tveo!Tt?{{Z(H7%-xsO*G zlY_xZyJh6?C{`J#Ri;^Q#`lT4i@M~&AQT0Mx-K_~%vgJJsfHT8I%v*5#ExSxGkiUI z`wvf@hm^t+Z2KeP>XfdGD`R)J1dC-+a_?Hz*cP01g)@t+#oTe3N-!79+i SrtNKJ2w-s(Q*s!4@xK5LmgwpL literal 0 HcmV?d00001 diff --git a/4-Classification/1-Introduction/notebook.ipynb b/4-Classification/1-Introduction/notebook.ipynb index e69de29bb..8fe41283d 100644 --- a/4-Classification/1-Introduction/notebook.ipynb +++ b/4-Classification/1-Introduction/notebook.ipynb @@ -0,0 +1,28 @@ +{ + "metadata": { + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": 3 + }, + "orig_nbformat": 2 + }, + "nbformat": 4, + "nbformat_minor": 2, + "cells": [ + { + "source": [ + "# Delicious Asian Recipes " + ], + "cell_type": "markdown", + "metadata": {} + } + ] +} \ No newline at end of file diff --git a/4-Classification/1-Introduction/solution/data-prep-visual.ipynb b/4-Classification/1-Introduction/solution/data-prep-visual.ipynb deleted file mode 100644 index a62ec2542..000000000 --- a/4-Classification/1-Introduction/solution/data-prep-visual.ipynb +++ /dev/null @@ -1,1521 +0,0 @@ -{ - "cells": [ - { - "cell_type": "code", - "execution_count": 1, - "metadata": {}, - "outputs": [], - "source": [ - "\r\n", - "import 
pandas as pd\r\n", - "import matplotlib.pyplot as plt\r\n", - "import matplotlib as mpl\r\n", - "import numpy as np\r\n", - "from imblearn.over_sampling import SMOTE" - ] - }, - { - "cell_type": "code", - "execution_count": 2, - "metadata": {}, - "outputs": [], - "source": [ - "df = pd.read_csv('.data/asian_indian_recipes.csv')" - ] - }, - { - "cell_type": "code", - "execution_count": 3, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
Unnamed: 0cuisinealmondangelicaaniseanise_seedappleapple_brandyapricotarmagnac...whiskeywhite_breadwhite_winewhole_grain_wheat_flourwinewoodyamyeastyogurtzucchini
065indian00000000...0000000000
166indian10000000...0000000000
267indian00000000...0000000000
368indian00000000...0000000000
469indian00000000...0000000010
\n", - "

5 rows × 385 columns

\n", - "
" - ], - "text/plain": [ - " Unnamed: 0 cuisine almond angelica anise anise_seed apple \\\n", - "0 65 indian 0 0 0 0 0 \n", - "1 66 indian 1 0 0 0 0 \n", - "2 67 indian 0 0 0 0 0 \n", - "3 68 indian 0 0 0 0 0 \n", - "4 69 indian 0 0 0 0 0 \n", - "\n", - " apple_brandy apricot armagnac ... whiskey white_bread white_wine \\\n", - "0 0 0 0 ... 0 0 0 \n", - "1 0 0 0 ... 0 0 0 \n", - "2 0 0 0 ... 0 0 0 \n", - "3 0 0 0 ... 0 0 0 \n", - "4 0 0 0 ... 0 0 0 \n", - "\n", - " whole_grain_wheat_flour wine wood yam yeast yogurt zucchini \n", - "0 0 0 0 0 0 0 0 \n", - "1 0 0 0 0 0 0 0 \n", - "2 0 0 0 0 0 0 0 \n", - "3 0 0 0 0 0 0 0 \n", - "4 0 0 0 0 0 1 0 \n", - "\n", - "[5 rows x 385 columns]" - ] - }, - "execution_count": 3, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df.head()" - ] - }, - { - "cell_type": "code", - "execution_count": 4, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "\n", - "RangeIndex: 2448 entries, 0 to 2447\n", - "Columns: 385 entries, Unnamed: 0 to zucchini\n", - "dtypes: int64(384), object(1)\n", - "memory usage: 7.2+ MB\n" - ] - } - ], - "source": [ - "df.info()" - ] - }, - { - "cell_type": "code", - "execution_count": 5, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "korean 799\n", - "indian 598\n", - "chinese 442\n", - "japanese 320\n", - "thai 289\n", - "Name: cuisine, dtype: int64" - ] - }, - "execution_count": 5, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "df.cuisine.value_counts()" - ] - }, - { - "cell_type": "code", - "execution_count": 6, - "metadata": {}, - "outputs": [], - "source": [ - "#df.keys().values" - ] - }, - { - "cell_type": "code", - "execution_count": 7, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "" - ] - }, - "execution_count": 7, - "metadata": {}, - "output_type": "execute_result" - }, - { - "data": { - "image/png": 
"iVBORw0KGgoAAAANSUhEUgAAAZEAAAD4CAYAAAAtrdtxAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAAR0ElEQVR4nO3de5CddX3H8fenAYIRSERSuqKyYCMWQQEDFeMFlXop1EvLdKS1QscaW2sVnVHjSL10qhMvtVSpTuOlOmqpBVEpdIrUVqx4gQ0EEm5SNQp4AazGOzrw7R/nSTmGTcj+ds+eJ+z7NbNznv09v+c5n7PZzWef5znnbKoKSZJa/Mq4A0iSdl2WiCSpmSUiSWpmiUiSmlkikqRmu407wHzab7/9anJyctwxJGmXsn79+tuqavl06xZUiUxOTjI1NTXuGJK0S0ny9e2t83SWJKmZJSJJamaJSJKaWSKSpGaWiCSpmSUiSWpmiUiSmlkikqRmlogkqdmCesX6xpu3MLnmgnHH0BzZvPaEcUeQFjyPRCRJzSwRSVIzS0SS1MwSkSQ1s0QkSc0sEUlSM0tEktSsFyWSZFmSF3XLxyU5f4bb/1WS40eTTpK0Pb0oEWAZ8KLWjavqtVX1H3MXR5K0M/pSImuBhyTZALwV2CvJOUmuS/KRJAFI8toklyXZlGTd0PgHkpw0vviStDD1pUTWAF+pqiOAVwBHAqcBhwIHA6u6eWdW1dFVdRhwH+DEe9pxktVJppJM3fGTLaPILkkLVl9KZFuXVtVNVXUnsAGY7MafmORLSTYCTwIefk87qqp1VbWyqlYuWrJ0ZIElaSHq6xsw3j60fAewW5I9gXcBK6vqxiSvB/YcRzhJ0kBfjkR+COx9D3O2FsZtSfYCvAYiSWPWiyORqvpukkuSbAJ+CnxnmjnfT/IeYCOwGbhsflNKkrbVixIBqKo/2M74i4eWTwdOn2bOqaNLJknanr6czpIk7YIsEUlSM0tEktTMEpEkNbNEJEnNevPsrPlw+AFLmVp7wrhjSNK9hkcikqRmlogkqZklIklqZolIkppZIpKkZpaIJKmZJSJJamaJSJKaWSKSpGaWiCSpmSUiSWpmiUiSmlkikqRmlogkqZklIklqZolIkppZIpKkZpaIJKmZJSJJamaJSJKaWSKSpGa7jTvAfNp48xYm11ww7hgak81rTxh3BOlexyMRSVIzS0SS1MwSkSQ1s0QkSc0sEUlSM0tEktRsp0okyedHHUSStOvZqRKpqseMOogkadezs0ciP0qyV5JPJ7k8ycYkz+zWTSa5LskHk1yV5JwkS7p1r01yWZJNSdYlSTf+mSRvTnJpki8neVw3vijJW7ttrkrywm58Islnk2zo9rV1/lOSfKHLdHaSvUbxRZIkTW8m10R+Bjy7qo4Cngj8zdZSAA4B1lXVI4AfAC/qxs+sqqOr6jDgPsCJQ/vbraqOAU4DXteNPR/YUlVHA0cDL0hyEPAHwIVVdQTwSGBDkv2A04Hju0xTwMtn8HgkSbM0k7c9CfCmJI8H7gQOAPbv1t1YVZd0yx8GXgK8DXhiklcCS4B9gauBf+3mndvdrgcmu+WnAI9IclL3+VJgBXAZ8P4kuwOfqKoNSZ4AHApc0nXZHsAX7hY6WQ2sBli0z/IZPFxJ0j2ZSYn8IbAceFRV/SLJZmDPbl1tM7eS7Am8C1hZVTcmef3QfIDbu9s7hnIE+IuqunDbO+/K6wTgQ0neCnwPuKiqTt5R6KpaB6wDWDyxYtuckqRZmMnprKXALV2BPBE4cGjdg5Mc2y2fDHyOuwrjtu5axUncswuBP+uOOEjy0CT3TXJgd9/vAd4HHAV8EViV5Ne7uUuSPHQGj0eSNEs7eyRSwEeAf00yBWwArhtafy1wSpJ/AG4A3l1VP0nyHmAjsJnBKal78l4Gp7Yu76633Ao8CzgOeEWSXwA/Ap5XVbcmORU4K8nibvvTgS/v5GOSJM1SqnZ8hifJ/YHLq+r
A7ayfBM7vLp732uKJFTVxyhnjjqEx8a3gpTZJ1lfVyunW7fB0VpIHMLhY/bZRBJMk7dp2eDqrqr4J7PA6Q1VtBnp/FCJJmnu+d5YkqZklIklqZolIkprN5MWGu7zDD1jKlM/QkaQ545GIJKmZJSJJamaJSJKaWSKSpGaWiCSpmSUiSWpmiUiSmlkikqRmlogkqZklIklqZolIkppZIpKkZpaIJKmZJSJJamaJSJKaWSKSpGaWiCSpmSUiSWpmiUiSmlkikqRmlogkqdlu4w4wnzbevIXJNReMO4bUbPPaE8YdQfolHolIkppZIpKkZpaIJKmZJSJJamaJSJKaWSKSpGaWiCSp2ZyWSJIPJDlpmvEHJDlnLu9LkjR+8/Jiw6r6JnC3cpEk7dpmdSSS5HlJrkpyZZIPdcOPT/L5JF/delSSZDLJpm751CTnJvn3JDckecvQ/p6S5AtJLk9ydpK9uvG1Sa7p7utt3djyJB9Lcln3sWo2j0WSNHPNRyJJHg68BlhVVbcl2Rd4OzABPBZ4GHAeMN1prCOAI4HbgeuTvBP4KXA6cHxV/TjJq4CXJzkTeDbwsKqqJMu6ffwd8LdV9bkkDwYuBH5jmpyrgdUAi/ZZ3vpwJUnTmM3prCcB51TVbQBV9b9JAD5RVXcC1yTZfzvbfrqqtgAkuQY4EFgGHApc0u1nD+ALwA+AnwHvTXIBcH63j+OBQ7u5APsk2buqfjh8R1W1DlgHsHhiRc3i8UqStjGbEgkw3X/Kt28zZzrDc+7ocgS4qKpOvtsdJccATwaeA7yYQYH9CnBsVf105tElSXNhNtdEPg38fpL7A3Sns2bji8CqJL/e7W9Jkod210WWVtW/AacxOBUG8CkGhUI3/wgkSfOq+Uikqq5O8kbg4iR3AFfMJkhV3ZrkVOCsJIu74dOBHwKfTLIng6OVl3XrXgL8fZKrGDyOzwJ/OpsMkqSZSdXCuUyweGJFTZxyxrhjSM38eyIahyTrq2rldOt8xbokqZklIklqZolIkppZIpKkZpaIJKnZvLwBY18cfsBSpnx2iyTNGY9EJEnNLBFJUjNLRJLUzBKRJDWzRCRJzSwRSVIzS0SS1MwSkSQ1s0QkSc0sEUlSM0tEktTMEpEkNbNEJEnNLBFJUjNLRJLUzBKRJDWzRCRJzSwRSVIzS0SS1MwSkSQ1s0QkSc12G3eA+bTx5i1Mrrlg3DEkzdDmtSeMO4K2wyMRSVIzS0SS1MwSkSQ1s0QkSc0sEUlSM0tEktRsZCWS5PMznH9ckvO75WckWTOaZJKkuTKy14lU1WNmse15wHlzGEeSNAKjPBL5UXd7XJLPJDknyXVJPpIk3bqndWOfA353aNtTk5zZLf9Oki8luSLJfyTZvxt/fZL3d/v+apKXjOqxSJKmN1/XRI4ETgMOBQ4GViXZE3gP8DvA44Bf2862nwMeXVVHAv8MvHJo3cOApwLHAK9LsvtI0kuSpjVfb3tyaVXdBJBkAzAJ/Aj4WlXd0I1/GFg9zbYPBD6aZALYA/ja0LoLqup24PYktwD7AzcNb5xk9db9Ltpn+Rw+JEnSfB2J3D60fAd3lVftxLbvBM6sqsOBFwJ77sR+/19VrauqlVW1ctGSpTNLLUnaoXE+xfc64KAkD+k+P3k785YCN3fLp4w8lSRpp42tRKrqZwxOM13QXVj/+namvh44O8l/A7fNUzxJ0k5I1c6cUbp3WDyxoiZOOWPcMSTNkG8FP15J1lfVyunW+Yp1SVIzS0SS1MwSkSQ1s0QkSc0sEUlSs/l6xXovHH7AUqZ8lockzRmPRCRJzSwRSVIzS0SS1MwSkSQ1s0QkSc0sEUlSM0tEktTMEpEkNbNEJEnNLBFJUjNLRJLUzBKRJDWzRCRJzSwRSVIzS0SS1MwSkSQ1s0QkSc0sEUlSM0tEktTMEpEkNbNEJEnNdht3gPm08eYtTK65YNwxJGlebV57wsj27ZGIJKmZJSJJamaJSJKaWSKSpGaWiCS
pmSUiSWpmiUiSms1piSSZTLJpLvcpSeqvXhyJJFlQL3qUpHuLkZVIkoOTXJHk6CRfTHJVko8nuV+3/jNJ3pTkYuClSR6V5OIk65NcmGSim/eCJJcluTLJx5Is6cY/kOQdST6f5KtJThrVY5EkTW8kJZLkEOBjwB8D7wNeVVWPADYCrxuauqyqngC8A3gncFJVPQp4P/DGbs65VXV0VT0SuBZ4/tD2E8BjgROBtdvJsjrJVJKpO36yZc4eoyRpNO+dtRz4JPB7wE0MiuLibt0HgbOH5n60uz0EOAy4KAnAIuBb3brDkvw1sAzYC7hwaPtPVNWdwDVJ9p8uTFWtA9YBLJ5YUbN6ZJKkXzKKEtkC3Ais4q6S2J4fd7cBrq6qY6eZ8wHgWVV1ZZJTgeOG1t0+tJyWsJKkdqM4nfVz4FnA84ATgO8leVy37o+Ai6fZ5npgeZJjAZLsnuTh3bq9gW8l2R34wxHklSQ1Gsmzoqrqx0lOBC4CzgXe2l0Q/yqD6yTbzv95d2H8HUmWdrnOAK4G/hL4EvB1BtdU9h5FZknSzKVq4VwmWDyxoiZOOWPcMSRpXs3274kkWV9VK6db14vXiUiSdk2WiCSpmSUiSWpmiUiSmlkikqRmC+qNDw8/YClTs3yWgiTpLh6JSJKaWSKSpGaWiCSpmSUiSWpmiUiSmlkikqRmlogkqZklIklqZolIkppZIpKkZgvqj1Il+SGDP8XbV/sBt407xA6Yb3bMNzvmm53Z5DuwqpZPt2JBvXcWcP32/jpXHySZMl87882O+WZnoebzdJYkqZklIklqttBKZN24A9wD882O+WbHfLOzIPMtqAvrkqS5tdCORCRJc8gSkSQ1WzAlkuRpSa5P8j9J1owpw/uT3JJk09DYvkkuSnJDd3u/oXWv7vJen+SpI872oCT/leTaJFcneWnP8u2Z5NIkV3b53tCnfEP3uSjJFUnO71u+JJuTbEyyIclUD/MtS3JOkuu678Nj+5IvySHd123rxw+SnNaXfN39vaz72diU5KzuZ2b0+arqXv8BLAK+AhwM7AFcCRw6hhyPB44CNg2NvQVY0y2vAd7cLR/a5VwMHNTlXzTCbBPAUd3y3sCXuwx9yRdgr255d+BLwKP7km8o58uBfwLO79O/b3efm4H9thnrU74PAn/SLe8BLOtTvqGci4BvAwf2JR9wAPA14D7d5/8CnDof+Ub+Be/DB3AscOHQ568GXj2mLJP8colcD0x0yxMMXhB5t4zAhcCx85jzk8Bv9TEfsAS4HPjNPuUDHgh8GngSd5VIn/Jt5u4l0ot8wD7df4LpY75tMj0FuKRP+RiUyI3AvgxeRH5+l3Pk+RbK6aytX+CtburG+mD/qvoWQHf7q9342DInmQSOZPDbfm/ydaeKNgC3ABdVVa/yAWcArwTuHBrrU74CPpVkfZLVPct3MHAr8I/d6cD3Jrlvj/INew5wVrfci3xVdTPwNuAbwLeALVX1qfnIt1BKJNOM9f25zWPJnGQv4GPAaVX1gx1NnWZspPmq6o6qOoLBb/zHJDlsB9PnNV+SE4Fbqmr9zm4yzdio/31XVdVRwNOBP0/y+B3Mne98uzE41fvuqjoS+DGD0y/bM66fjz2AZwBn39PUacZG+f13P+CZDE5NPQC4b5Ln7miTacaa8i2UErkJeNDQ5w8EvjmmLNv6TpIJgO72lm583jMn2Z1BgXykqs7tW76tqur7wGeAp/Uo3yrgGUk2A/8MPCnJh3uUj6r6Znd7C/Bx4Jge5bsJuKk7ugQ4h0Gp9CXfVk8HLq+q73Sf9yXf8cDXqurWqvoFcC7wmPnIt1BK5DJgRZKDut8kngOcN+ZMW50HnNItn8LgWsTW8eckWZzkIGAFcOmoQiQJ8D7g2qp6ew/zLU+yrFu+D4Mfmuv6kq+qXl1VD6yqSQbfX/9ZVc/tS74k902y99ZlBufLN/UlX1V9G7gxySHd0JOBa/qSb8jJ3HUqa2uOPuT7BvDoJEu
6n+UnA9fOS775uBDVhw/gtxk84+grwGvGlOEsBucrf8HgN4HnA/dncDH2hu5236H5r+nyXg88fcTZHsvgcPYqYEP38ds9yvcI4Iou3ybgtd14L/Jtk/U47rqw3ot8DK45XNl9XL31Z6Av+br7OwKY6v6NPwHcr2f5lgDfBZYOjfUp3xsY/GK1CfgQg2dejTyfb3siSWq2UE5nSZJGwBKRJDWzRCRJzSwRSVIzS0SS1MwSkSQ1s0QkSc3+DwlMP+/hPKDCAAAAAElFTkSuQmCC", - "text/plain": [ - "
" - ] - }, - "metadata": { - "needs_background": "light" - }, - "output_type": "display_data" - } - ], - "source": [ - "#display classes in bar graph\r\n", - "df.cuisine.value_counts().plot.barh()" - ] - }, - { - "cell_type": "code", - "execution_count": 8, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "thai df: (289, 385)\n", - "japanese df: (320, 385)\n", - "chinese df: (442, 385)\n", - "indian df: (598, 385)\n", - "korean df: (799, 385)\n" - ] - } - ], - "source": [ - "# ingrediant counts by class count\r\n", - "# filter to thai food, display ingredients graph\r\n", - "\r\n", - "thai_df = df[(df.cuisine == \"thai\")]\r\n", - "japanese_df = df[(df.cuisine == \"japanese\")]\r\n", - "chinese_df = df[(df.cuisine == \"chinese\")]\r\n", - "indian_df = df[(df.cuisine == \"indian\")]\r\n", - "korean_df = df[(df.cuisine == \"korean\")]\r\n", - "\r\n", - "print(f'thai df: {thai_df.shape}')\r\n", - "print(f'japanese df: {japanese_df.shape}')\r\n", - "print(f'chinese df: {chinese_df.shape}')\r\n", - "print(f'indian df: {indian_df.shape}')\r\n", - "print(f'korean df: {korean_df.shape}')" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "## What are the top ingredients by class" - ] - }, - { - "cell_type": "code", - "execution_count": 9, - "metadata": {}, - "outputs": [], - "source": [ - "def create_ingredient_df(df):\r\n", - " #transpose df, drop cuisine and unnamed rows, sum the row to get total for ingredient and add value header to new df\r\n", - " ingredient_df = df.T.drop(['cuisine','Unnamed: 0']).sum(axis=1).to_frame('value')\r\n", - " # drop ingredients that have a 0 sum\r\n", - " ingredient_df = ingredient_df[(ingredient_df.T != 0).any()]\r\n", - " # sort df\r\n", - " ingredient_df = ingredient_df.sort_values(by='value', ascending=False, inplace=False)\r\n", - " return ingredient_df\r\n" - ] - }, - { - "cell_type": "code", - "execution_count": 10, - "metadata": {}, - "outputs": [ - { - 
"data": { - "text/plain": [ - "" - ] - }, - "execution_count": 10, - "metadata": {}, - "output_type": "execute_result" - }, - { - "data": { - "image/png": "", - "text/plain": [ - "
" - ] - }, - "metadata": { - "needs_background": "light" - }, - "output_type": "display_data" - } - ], - "source": [ - "thai_ingredient_df = create_ingredient_df(thai_df)\r\n", - "thai_ingredient_df.head(10).plot.barh()" - ] - }, - { - "cell_type": "code", - "execution_count": 11, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "" - ] - }, - "execution_count": 11, - "metadata": {}, - "output_type": "execute_result" - }, - { - "data": { - "image/png": "", - "text/plain": [ - "
" - ] - }, - "metadata": { - "needs_background": "light" - }, - "output_type": "display_data" - } - ], - "source": [ - "japanese_ingredient_df = create_ingredient_df(japanese_df)\r\n", - "japanese_ingredient_df.head(10).plot.barh()" - ] - }, - { - "cell_type": "code", - "execution_count": 12, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "" - ] - }, - "execution_count": 12, - "metadata": {}, - "output_type": "execute_result" - }, - { - "data": { - "image/png": "", - "text/plain": [ - "
" - ] - }, - "metadata": { - "needs_background": "light" - }, - "output_type": "display_data" - } - ], - "source": [ - "chinese_ingredient_df = create_ingredient_df(chinese_df)\r\n", - "chinese_ingredient_df.head(10).plot.barh()" - ] - }, - { - "cell_type": "code", - "execution_count": 13, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "" - ] - }, - "execution_count": 13, - "metadata": {}, - "output_type": "execute_result" - }, - { - "data": { - "image/png": "", - "text/plain": [ - "
" - ] - }, - "metadata": { - "needs_background": "light" - }, - "output_type": "display_data" - } - ], - "source": [ - "indian_ingredient_df = create_ingredient_df(indian_df)\r\n", - "indian_ingredient_df.head(10).plot.barh()" - ] - }, - { - "cell_type": "code", - "execution_count": 14, - "metadata": {}, - "outputs": [ - { - "data": { - "text/plain": [ - "" - ] - }, - "execution_count": 14, - "metadata": {}, - "output_type": "execute_result" - }, - { - "data": { - "image/png": "iVBORw0KGgoAAAANSUhEUgAAAdIAAAD4CAYAAABYIGfSAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjMuMiwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy8vihELAAAACXBIWXMAAAsTAAALEwEAmpwYAAAijklEQVR4nO3de5xVdb3/8dcbHCEFKRU9COqYkSh32VioeYvUUstTIJzIvD3i6FHJ0oxSj3hOHTXNLhoqlULpLxPUvHDycggvJCkz4Dhc0jqCJy6pVOIFIYTP74/1ndiMe4BhzcyG2e/n48Fj1v6u71rrs7748M13rTV7KSIwMzOzbdOh3AWYmZntyBykZmZmOThIzczMcnCQmpmZ5eAgNTMzy2GnchdgbWvPPfeM6urqcpdhZrZDqa2tXRkR3Uutc5BWmOrqampqaspdhpnZDkXSy02t86VdMzOzHBykZmZmOThIzczMcvA9UjMze49169axdOlS1qxZU+5S2lTnzp3p1asXVVVVW72Ng7TC1C9bRfX46eUuw9qxJdecVO4SrAUsXbqUrl27Ul1djaRyl9MmIoK//OUvLF26lAMOOGCrt/OlXTMze481a9awxx57VEyIAkhijz32aPYs3EHaxiSdKumQreg3WdKIEu3HSHqodaozM9uokkK0wbacs4O07Z0KbDFIzcxsx+B7pI1IuhZ4OSImps8TgDfJ/tFxGtAJuC8irkzrrwDGAH8CVgK1EXG9pAOBHwHdgdXAl4DdgU8DR0u6HPgccBwwFtgZ+CNwekSsTuUMl/RlYG/gqxGxyUxU0q7AjUB/sr/LCRFxf4sPiplVvJZ+tqKl76V36dKFt956q0X3ubU8I32vu4BRRZ9PA14DegOHAYOAIZKOklQgC8PBwGeBQtF2k4ALI2IIcAkwMSKeBh4AvhYRgyLif4F7I2JoRAwEFgHnFO2jGjgaOAm4RVLnRrVeBvwmIoYCxwLXpXDdhKSxkmok1axfvar5I2JmZk1ykDYSEfOAvSTtI2kg8DdgAHA8MA+YC/QhC9Yjgfsj4p2IeBN4EEBSF+BwYKqk54BbgR5NHLKfpKck1ZPNbPsWrbs7IjZExB+Al9Jxix0PjE/HeBzoDOxX4pwmRUQhIgodd+nWrPEwMyuHr3/960ycOPEfnydMmMBVV13Fxz/+cQ499FD69+/P/fe/9wLc448/zsknn/yPzxdccAGTJ08GoLa2lqOPPpohQ4ZwwgknsGLFihap1UFa2jRgBNnM9C5AwNVpFjkoIj4UET9N7aV0AF4v6j8oIg5uou9k4IKI6A9cRRaGDaJR38afBXyu6Bj7RcSirT5LM7Pt1OjRo/nlL3/5j8933303Z511Fvfddx9z585l5syZXHzxxUQ0/t9iaevWrePCCy9k2rRp1NbWcvbZZ3PZZZe1SK0O0tLuAkaThek04BHg7DTTRFJPSXsBs4BTJHVO604CiIg3gMWSRqb+SrNbyO63di06VldghaQqshlpsZGSOqT7r
R8EXmi0/hHgQqXHzCQNboFzNzMru8GDB/Pqq6+yfPly6urq+MAHPkCPHj345je/yYABAxg+fDjLli3jlVde2ar9vfDCC8yfP59PfOITDBo0iG9961ssXbq0RWr1w0YlRMQCSV2BZRGxgizoDgZmp8x6C/hCRMyR9ABQB7wM1AANNyHHADenh4qqyMK5Lv38saRxZEF9BfBM2r6eTUP2BeAJsoeNzo2INY0ezf5P4PvA8ylMlwAnY2bWDowYMYJp06bx5z//mdGjR3PnnXfy2muvUVtbS1VVFdXV1e/5nc+ddtqJDRs2/ONzw/qIoG/fvsyePbvF63SQNiFdai3+/APgByW6Xh8REyTtAjwJfDf1XwycWGK/v2XTX3+5Of1p3O/MJup6nOx+KBHxDvCvWzwZM7Md0OjRo/nSl77EypUreeKJJ7j77rvZa6+9qKqqYubMmbz88nvfbLb//vuzcOFC1q5dy5o1a5gxYwZHHnkkBx10EK+99hqzZ89m2LBhrFu3jhdffJG+ffuWOHLzOEjzm5S+YKEzMCUi5pa7oM3p37MbNf4KNzNrpnJ89WPfvn1588036dmzJz169GDMmDGccsopFAoFBg0aRJ8+jZ+/hH333ZfTTjuNAQMG0Lt3bwYPzu547bzzzkybNo1x48axatUq3n33XS666KIWCVJt7Y1aax8KhUL4xd5mtiWLFi3i4IObekayfSt17pJqI6JQqr8fNjIzM8vBQWpmZpaDg9TMzEqqxFt/23LODlIzM3uPzp0785e//KWiwrThfaSdOzf+NtbN81O7Zmb2Hr169WLp0qW89tpr5S6lTXXu3JlevXo1axsHqZmZvUdVVRUHHHBAucvYIfjSrpmZWQ4OUjMzsxwcpGZmZjn4HmmFqV+2qsXfdG+2tcrxNXNmrc0zUjMzsxwcpGZmZjmUJUglXZReO9acbY6R9FBr1dQeSFoiac9y12FmVkmaFaTKtET4XgQ0K0jNzMy2R1sMRUnVkhZJmgjMBX4qab6kekmjUp8ukmZImpvaP5Pad5U0XVJd2maUpHHAPsBMSTNTv+MlzU7bT5XUJbWfKOn3kmYBn91CnUdLei79mSepa2r/mqQ5kp6XdFVTdaX2f09950uaJEmp/XFJ35P0ZBqLoZLulfQHSd8qquELkp5NNdwqqWMTtXaUNLloHL+S2g+U9LCkWklPSeqT2rtLuifVNkfSEal9D0mPpvO9FVATxxsrqUZSzfrVq7b0V25mZs2wtbPLg4CfAd8CegEDgeHAdZJ6AGuAf46IQ4Fjge+mEDoRWB4RAyOiH/BwRPwQWA4cGxHHpkuRlwPD0/Y1wFcldQZ+DJwCfAz4py3UeAlwfkQMSv3fkXQ80Bs4DBgEDJF0VKm60j5uioihqe19wMlF+/97RBwF3ALcD5wP9APOTIF2MDAKOCLVsB4Y00Stg4CeEdEvIvoDt6f2ScCFETEknc/E1P4D4HsRMRT4HPCT1H4lMCsiBgMPAPuVOlhETIqIQkQUOu7SrekRNDOzZtvaX395OSJ+J+l7wC8iYj3wiqQngKHAr4H/SiG1AegJ7A3UA9dLuhZ4KCKeKrHvjwKHAL9NE8CdgdlAH2BxRPwBQNIdwNjN1Phb4AZJdwL3RsTSFKTHA/NSny5kwfpUE3UdK+lSssvOuwMLgAfTugfSz3pgQUSsSHW9BOwLHAkMAeak83gf8GoTtb4EfFDSjcB04NE0Cz8cmJq2B+iUfg4HDilq3y3NuI8izdQjYrqkv21mfMzMrBVsbZC+nX6WvHRINvPqDgyJiHWSlgCdI+JFSUOATwFXS3o0Iv6j0bYCHouIf9mkURoEbPVrByLiGknT07F+J2l42vfVEXFr4/6N6wK+QzYDLETEnyRNAIpfAbA2/dxQtNzwead0rCkR8Y2tqPVvkgYCJ5DNbE8ju2/8eprNNtYBGBYR7zQ6B2jGGJmZWctr7oNDTwKj0j2+7mQzomeBbsCrKUSPBfYHkLQPs
Doi7gCuBw5N+3kT6JqWfwccIelDaZtdJH0Y+D1wgKQDU79NgrYxSQdGRH1EXEt2ebgP8AhwdtE9156S9mqirobQXJn6j2jm2MwARkjaKx1rd0n7N1HrnkCHiLgHuAI4NCLeABZLGpn6KIUtwKPABUXbD0qLT5IuH0v6JPCBZtZsZmY5Nfebje4DhgF1ZDOhSyPiz+ly6oOSaoDnyEIQoD/ZfdQNwDrgvNQ+Cfi1pBXpPumZwC8kNVzKvDzNZscC0yWtBGaR3ZNsykUpxNcDC4FfR8TadO9ydpq9vQV8AfhQ47oi4nVJPya7dLsEmNOcgYmIhZIuJ7tM2yHt93zg5RLdewK3a+MT0A2z2DHAzWk/VcBdZGM9DviRpOfJ/s6eBM4FriIbt7nAE8D/banO/j27UeNvlzEzazGqpJe2GhQKhaipqSl3GWZmOxRJtRFRKLXO32xkZmaWww73pfWSzgK+3Kj5txFxfjnq2RJJz7Dx6dsGp0dEfTnqMTOzlrXDBWlE3M7G37vc7kXER8pdg5mZtR5f2jUzM8vBQWpmZpaDg9TMzCwHB6mZmVkODlIzM7McHKRmZmY57HC//mL51C9bRfX46eUuw2wTS/y1lbYD84zUzMwsBwepmZlZDg7SMpD0VrlrMDOzluEgNTMzy8FB2kySdpU0XVKdpPmSRkn6uKR5kuol3SapU2q7r2i7T0i6t+jzdyXNlTQjvSQdSQdKelhSraSnJPVJ7adIeiYd438k7Z3aJ6TjPS7pJUnj2no8zMwqnYO0+U4ElkfEwIjoBzwMTAZGRUR/siehzwN+AxzcEJLAWWz8sv1dgbkRcSjZC7mvTO2TgAsjYghwCTAxtc8CPhoRg8le9n1pUT19gBOAw4ArJVU1LljSWEk1kmrWr16VewDMzGwjB2nz1QPDJV0r6WNANbA4Il5M66cAR0X2xvSfA1+Q9H5gGPDr1GcD8Mu0fAdwpKQuwOHAVEnPAbcCPVKfXsAjkuqBrwF9i+qZHhFrI2Il8Cqwd+OCI2JSRBQiotBxl265B8DMzDby75E2U0S8KGkI8CngauDRzXS/HXgQWANMjYh3m9ot2T9qXo+IQSXW3wjcEBEPSDoGmFC0bm3R8nr8d2pm1qY8I20mSfsAqyPiDuB6sllktaQPpS6nk12uJSKWA8uBy8ku/zboAIxIy58HZkXEG8BiSSPTcSRpYOrTDViWls9ojfMyM7Nt49lL8/UHrpO0AVhHdj+0G9kl2Z2AOcAtRf3vBLpHxMKitreBvpJqgVXAqNQ+BrhZ0uVAFdn90DqyGehUScuA3wEHtNK5mZlZMym7lWetRdJNwLyI+Gm5awEoFApRU1NT7jLMzHYokmojolBqnWekrSjNON8GLi53LWZm1jocpK0o/RqLmZm1Y37YyMzMLAcHqZmZWQ4OUjMzsxwcpGZmZjk4SM3MzHJwkJqZmeXgIDUzM8vBQWpmZpaDv5ChwtQvW0X1+OnlLsOsVS255qRyl2AVxDNSMzOzHByk2zFJ/51eCm5mZtspX9rdTkkScHJEbCh3LWZm1jTPSLcjkqolLZI0EZgLrJe0Z1r3RUnPS6qT9PPU1l3SPZLmpD9HlLN+M7NK5Bnp9ucg4KyI+DdJSwAk9QUuA46IiJWSdk99fwB8LyJmSdoPeAQ4uPEOJY0FxgJ03K17G5yCmVnlcJBuf16OiN81ajsOmBYRKwEi4q+pfThwSHYVGIDdJHWNiDeLN46IScAkgE49evtN7mZmLchBuv15u0SbgFIB2AEYFhHvtG5JZmbWFN8j3THMAE6TtAdA0aXdR4ELGjpJGtT2pZmZVTYH6Q4gIhYA3waekFQH3JBWjQMK6SGkhcC55arRzKxS+dLudiQilgD9ij5XFy1PAaY06r8SGNVG5ZmZWQkO0grTv2c3avz1aWZmLcaXds3MzHJwkJqZmeXgIDUzM8vBQWpmZpaDg9TMzCwHB6mZmVkODlIzM7McHKRmZmY5OEjNzMxyc
JCamZnl4K8IrDD1y1ZRPX56ucswK6sl/ppMa0GekZqZmeXgIDUzM8uhXQWppGpJ80u0Py6psA37O1PSTS1TnZmZtUftKkgNJPm+t5lZG2qPQbqTpCmSnpc0TdIuxSsl3SypRtICSVcVtQ+V9LSkOknPSuraaLuTJM2WtGepg0qaLOkWSU9JelHSyam9o6TrJM1JNf1raj9G0pOS7pO0MG3bIa17S9J3Jc2VNENS99R+oKSHJdWm4/QpOvYNkmYC15aobWw655r1q1flGlwzM9tUewzSg4BJETEAeAP4t0brL4uIAjAAOFrSAEk7A78EvhwRA4HhwDsNG0j6Z2A88KmIWLmZY1cDRwMnAbdI6gycA6yKiKHAUOBLkg5I/Q8DLgb6AwcCn03tuwJzI+JQ4AngytQ+CbgwIoYAlwATi479YWB4RFzcuKiImBQRhYgodNyl22bKNzOz5mqPlwH/FBG/Tct3AOMarT9N0liyc+8BHAIEsCIi5gBExBsAkgCOBQrA8Q3tm3F3RGwA/iDpJaAPcDwwQNKI1Kcb0Bv4O/BsRLyUjvUL4EhgGrCBLNgbzuFeSV2Aw4GpqS6ATkXHnhoR67dQn5mZtbD2GKTR1Oc0E7wEGBoRf5M0GegMqMR2DV4CPkg246vZhmOLbBb5SPEKScdsrtYS7R2A1yNiUBN93t5CbWZm1gra46Xd/SQNS8v/AswqWrcbWeCskrQ38MnU/ntgH0lDASR1LXpo52WyS64/k9R3C8ceKamDpAPJwvcF4BHgPElVad8flrRr6n+YpAPSvdFRRbV2ABpmsJ8HZqXZ8GJJI9N+JGng1g6KmZm1jvYYpIuAMyQ9D+wO3NywIiLqgHnAAuA24Lep/e9kQXajpDrgMbKZasN2LwBjyC6rHriZY79Adk/z18C5EbEG+AmwEJibfjXnVjZeCZgNXAPMBxYD96X2t4G+kmqB44D/SO1jgHNSjQuAzzRrZMzMrMUpoqmridYc6TLxQxExbSv7HwNcEhEnl1j3VkR0adECk0KhEDU1W7pCbWZmxSTVpgdV36M9zkjNzMzaTHt82KhVSboMGNmoeWpEnNmc/UTE48DjTaxrldmomZm1PAdpM0XEt4Fvl7sOMzPbPvjSrpmZWQ4OUjMzsxwcpGZmZjk4SM3MzHJwkJqZmeXgIDUzM8vBQWpmZpaDf4+0wtQvW0X1+OnlLsPMNmPJNSeVuwRrBs9IzczMcnCQmpmZ5eAg3cFJOlfSF9PyZEkjtrSNmZm1HN8j3cFFxC3lrsHMrJJV9IxU0q6SpkuqkzRf0ihJQyQ9IalW0iOSeqS+4yQtlPS8pLtS22GSnpY0L/08KLWfKelXkh6UtFjSBZK+mvr9TtLuqd+Bkh5Ox3pKUp/N1Lq/pBnp+DMk7ZfaJ0i6ZAvnOVZSjaSa9atXtdTwmZkZFR6kwInA8ogYGBH9gIeBG4ERETEEuI2Nb3oZDwyOiAHAuant98BRETEY+Hfgv4r23Q/4PHBY2sfq1G828MXUZxJwYTrWJcDEzdR6E/CzdPw7gR9u7UlGxKSIKEREoeMu3bZ2MzMz2wqVfmm3Hrhe0rXAQ8DfyALwMUkAHYEVqe/zwJ2SfgX8KrV1A6ZI6g0EUFW075kR8SbwpqRVwINFxxwgqQtwODA1HQug02ZqHQZ8Ni3/HPhOc0/WzMxaXkUHaUS8KGkI8CngauAxYEFEDCvR/STgKODTwBWS+gL/SRaY/yypmk1f1L22aHlD0ecNZOPeAXg9IgZta/nbuJ2ZmbWgir60K2kfskuudwDXAx8BuksaltZXSeorqQOwb0TMBC4F3g90IZuRLku7O7M5x46IN4DFkkamY0nSwM1s8jQwOi2PAWY153hmZtY6KnpGCvQHrpO0AVgHnAe8C/xQUjey8fk+8CJwR2oT8L2IeF3Sd8gu7X4V+M02HH8McLOky8kuC98F1DXRdxxwm6SvAa8BZ23D8ejfsxs1/tYUM7MWowhfIawkhUIha
mpqyl2GmdkORVJtRBRKravoS7tmZmZ5Vfql3e2OpMuAkY2ap0bEt0v1NzOz8nKQbmdSYDo0zcx2EL60a2ZmloOD1MzMLAcHqZmZWQ4OUjMzsxwcpGZmZjk4SM3MzHLwr79UmPplq6geP73cZZjZdmyJv0a0WTwjNTMzy8FBamZmloOD1MzMLAcHqZmZWQ4O0kTSrpKmS6qTNF/SKEkflzRPUr2k2yR1Sm33FW33CUn3NrHPjpImp/3VS/pKav+SpDnpWPdI2iW1T5Y0omj7t4qWL037qJN0TWo7UNLDkmolPSWpT2uNj5mZleYg3ehEYHlEDIyIfsDDwGRgVET0J3vC+TyyF3gfLKl72u4s4PYm9jkI6BkR/dI+GvrdGxFDI2IgsAg4Z3OFSfokcCrwkbTNd9KqScCFETEEuASY2MT2YyXVSKpZv3rV5g5lZmbN5CDdqB4YLulaSR8DqoHFEfFiWj8FOCqyN6H/HPiCpPcDw4BfN7HPl4APSrpR0onAG6m9X5pB1gNjgL5bqG04cHtErAaIiL9K6gIcDkyV9BxwK9Cj1MYRMSkiChFR6LhLty0cyszMmsO/R5pExIuShgCfAq4GHt1M99uBB4E1ZO8KfbeJff5N0kDgBOB84DTgbLKZ7qkRUSfpTOCYtMm7pH/cSBKwc2oXEI123wF4PSIGbfVJmplZi/OMNJG0D7A6Iu4Arieb7VVL+lDqcjrwBEBELAeWA5eThWJT+9wT6BAR9wBXAIemVV2BFZKqyGakDZYAQ9LyZ4CqtPwocHbRvdTdI+INYLGkkalNKbTNzKwNeUa6UX/gOkkbgHVk90O7kV063QmYA9xS1P9OoHtELNzMPnsCt0tq+AfLN9LPK4BngJfJLil3Te0/Bu6X9CwwA3gbICIeljQIqJH0d+C/gW+ShfDNki4nC927gLptO30zM9sWym75WXNJugmYFxE/LXctzVEoFKKmpqbcZZiZ7VAk1UZEodQ6z0i3gaRastnixeWuxczMystBug3Sr5tsQtIzQKdGzadHRH3bVGVmZuXgIG0hEfGRctdgZmZtz0/tmpmZ5eAgNTMzy8FBamZmloOD1MzMLAcHqZmZWQ4OUjMzsxwcpGZmZjn490grTP2yVVSPn17uMsysnVlyzUnlLqFsPCM1MzPLwUFqZmaWg4PUzMwsBwdpM0n6oqTnJdVJ+rmkUyQ9I2mepP+RtLekDpL+IKl72qaDpD9K2lNSd0n3SJqT/hyR+kyQdJukxyW9JGlcaq+WtEjSjyUtkPSopPeldQdKelhSraSnJPUp38iYmVUmB2kzSOoLXAYcFxEDgS8Ds4CPRsRgshdrXxoRG4A7yF68DTAcqIuIlcAPgO9FxFDgc8BPig7RBzgBOAy4UlJVau8N/Cgi+gKvp+0AJgEXprfRXAJMbKLusZJqJNWsX70q7zCYmVkRP7XbPMcB01IgEhF/ldQf+KWkHsDOwOLU9zbgfuD7wNnA7al9OHCIpIZ97iapa1qeHhFrgbWSXgX2Tu2LI+K5tFwLVEvqAhwOTC3aV+PXuJHqnEQWunTq0dtvcjcza0EO0uYR0DiIbgRuiIgHJB0DTACIiD9JekXSccBH2Dg77QAMi4h3NtlxFoZri5rWs/Hvp3H7+9J+Xo+IQbnOyMzMcvGl3eaZAZwmaQ8ASbsD3YBlaf0Zjfr/hOwS790RsT61PQpc0NBB0qBtKSQi3gAWSxqZ9iNJA7dlX2Zmtu0cpM0QEQuAbwNPSKoDbiCbgU6V9BSwstEmDwBd2HhZF2AcUEgPLC0Ezs1R0hjgnFTLAuAzOfZlZmbbQBG+ZdZaJBXIHiz6WLlradCpR+/occb3y12GmbUz7f2bjSTVRkSh1DrfI20lksYD57Hx3uh2oX/PbtS08//gzczaki/ttpKIuCYi9o+IWeWuxczMWo+D1MzMLAcHqZmZWQ4OUjMzsxwcpGZmZjk4SM3MzHJwkJqZmeXgIDUzM8vBQWpmZpaDg9TMz
CwHf0Vghalftorq8dPLXYaZtXPt/bt3i3lGamZmloODtMwkVUuan5aPkfRQWv50+uJ7MzPbjvnS7nYqIh4ge5+pmZltxzwjzUnSrpKmS6qTNF/SKElDJT2d2p6V1DXNPJ+SNDf9OXwL+z1T0k1peX9JM9LLwGdI2i+1T5b0w3SslySNaItzNjOzjTwjze9EYHlEnAQgqRswDxgVEXMk7Qa8A7wKfCIi1kjqDfwCKPmS2BJuAn4WEVMknQ38EDg1resBHAn0IZvBTmu8saSxwFiAjrt136aTNDOz0jwjza8eGC7pWkkfA/YDVkTEHICIeCMi3gWqgB9LqgemAoc04xjDgP+Xln9OFpwNfhURGyJiIbB3qY0jYlJEFCKi0HGXbs06OTMz2zzPSHOKiBclDQE+BVwNPApEia5fAV4BBpL9A2ZNnsMWLa8tWlaOfZqZ2TbwjDQnSfsAqyPiDuB64KPAPpKGpvVdJe0EdCObqW4ATgc6NuMwTwOj0/IYYFZL1W9mZvl4Rppff+A6SRuAdcB5ZDPDGyW9j+z+6HBgInCPpJHATODtZhxjHHCbpK8BrwFntWD9ZmaWgyJKXYW09qpTj97R44zvl7sMM2vn2ts3G0mqjYiSD4h6Rlph+vfsRk07+w/czKycfI/UzMwsBwepmZlZDg5SMzOzHBykZmZmOThIzczMcnCQmpmZ5eAgNTMzy8FBamZmloOD1MzMLAd/s1GFqV+2iurx08tdhplZm2rNryz0jNTMzCwHB6mZmVkODtIdhKQzJd2Uls+V9MVy12RmZr5HukNILwb/h4i4pVy1mJnZphykbUjSFcAY4E/ASqAWWAWMBXYG/gicHhGrJU0G/goMBuYC9UX7mQC8FRHXS/oQcAvQHVgPjIyI/22rczIzq3S+tNtGJBWAz5EF42eBhhfE3hsRQyNiILAIOKdosw8DwyPi4s3s+k7gR2n7w4EVJY49VlKNpJr1q1e1wNmYmVkDz0jbzpHA/RHxDoCkB1N7P0nfAt4PdAEeKdpmakSsb2qHkroCPSPiPoCIWFOqX0RMAiYBdOrRO3Keh5mZFfGMtO2oifbJwAUR0R+4CuhctO7tbdynmZm1EQdp25kFnCKps6QuQMNvB3cFVkiqIrt/utUi4g1gqaRTASR1krRLC9ZsZmZb4CBtIxExB3gAqAPuBWrIHjS6AngGeAz4/Tbs+nRgnKTngaeBf2qRgs3MbKsowrfM2oqkLhHxVpo1PgmMjYi5bVlDoVCImpqatjykmdkOT1JtRBRKrfPDRm1rkqRDyO6DTmnrEDUzs5bnIG1DEfH5ctdgZmYty/dIzczMcnCQmpmZ5eAgNTMzy8FP7VYYSW8CL5S7ju3InmTfe2wei8Y8Hht5LGD/iOheaoUfNqo8LzT1CHclklTj8ch4LDbl8djIY7F5vrRrZmaWg4PUzMwsBwdp5ZlU7gK2Mx6PjTwWm/J4bOSx2Aw/bGRmZpaDZ6RmZmY5OEjNzMxycJBWEEknSnpB0h8ljS93Pa1N0m2SXpU0v6htd0mPSfpD+vmBonXfSGPzgqQTylN165C0r6SZkhZJWiDpy6m9Usejs6RnJdWl8bgqtVfkeABI6ihpnqSH0ueKHYvmcpBWCEkdgR8BnwQOAf4lvYmmPZsMnNiobTwwIyJ6AzPSZ9JYjAb6pm0mpjFrL94FLo6Ig4GPAuenc67U8VgLHBcRA4FBwImSPkrljgfAl4FFRZ8reSyaxUFaOQ4D/hgRL0XE34G7gM+UuaZWFRFPAn9t1PwZYEpangKcWtR+V0SsjYjFwB/JxqxdiIgVDa/ti4g3yf6H2ZPKHY+IiLfSx6r0J6jQ8ZDUCzgJ+ElRc0WOxbZwkFaOnsCfij4vTW2VZu+IWAFZuAB7pfaKGR9J1cBg4BkqeDzSpczngFeBxyKiksfj+8ClwIaitkodi2ZzkFYOlWjz7z5tVBHjI6kLcA9wUUS8sbmuJdra1XhExPqIGAT0Ag6T1G8z3
dvteEg6GXg1Imq3dpMSbe1iLLaVg7RyLAX2LfrcC1heplrK6RVJPQDSz1dTe7sfH0lVZCF6Z0Tcm5ordjwaRMTrwONk9/sqcTyOAD4taQnZLZ/jJN1BZY7FNnGQVo45QG9JB0jamexhgQfKXFM5PACckZbPAO4vah8tqZOkA4DewLNlqK9VSBLwU2BRRNxQtKpSx6O7pPen5fcBw4HfU4HjERHfiIheEVFN9v+F30TEF6jAsdhWfvtLhYiIdyVdADwCdARui4gFZS6rVUn6BXAMsKekpcCVwDXA3ZLOAf4PGAkQEQsk3Q0sJHvC9fyIWF+WwlvHEcDpQH26LwjwTSp3PHoAU9LTph2AuyPiIUmzqczxKKVS/9toNn9FoJmZWQ6+tGtmZpaDg9TMzCwHB6mZmVkODlIzM7McHKRmZmY5OEjNzMxycJCamZnl8P8BPHn55hRQjNsAAAAASUVORK5CYII=", - "text/plain": [ - "
" - ] - }, - "metadata": { - "needs_background": "light" - }, - "output_type": "display_data" - } - ], - "source": [ - "korean_ingredient_df = create_ingredient_df(korean_df)\r\n", - "korean_ingredient_df.head(10).plot.barh()" - ] - }, - { - "cell_type": "code", - "execution_count": 15, - "metadata": {}, - "outputs": [], - "source": [ - "# TODO add categorical labels to food items - calculated columns to improve accuracy" - ] - }, - { - "cell_type": "code", - "execution_count": 17, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
almondangelicaaniseanise_seedappleapple_brandyapricotarmagnacartemisiaartichoke...whiskeywhite_breadwhite_winewhole_grain_wheat_flourwinewoodyamyeastyogurtzucchini
00000000000...0000000000
11000000000...0000000000
20000000000...0000000000
30000000000...0000000000
40000000000...0000000010
\n", - "

5 rows × 380 columns

\n", - "
" - ], - "text/plain": [ - " almond angelica anise anise_seed apple apple_brandy apricot \\\n", - "0 0 0 0 0 0 0 0 \n", - "1 1 0 0 0 0 0 0 \n", - "2 0 0 0 0 0 0 0 \n", - "3 0 0 0 0 0 0 0 \n", - "4 0 0 0 0 0 0 0 \n", - "\n", - " armagnac artemisia artichoke ... whiskey white_bread white_wine \\\n", - "0 0 0 0 ... 0 0 0 \n", - "1 0 0 0 ... 0 0 0 \n", - "2 0 0 0 ... 0 0 0 \n", - "3 0 0 0 ... 0 0 0 \n", - "4 0 0 0 ... 0 0 0 \n", - "\n", - " whole_grain_wheat_flour wine wood yam yeast yogurt zucchini \n", - "0 0 0 0 0 0 0 0 \n", - "1 0 0 0 0 0 0 0 \n", - "2 0 0 0 0 0 0 0 \n", - "3 0 0 0 0 0 0 0 \n", - "4 0 0 0 0 0 1 0 \n", - "\n", - "[5 rows x 380 columns]" - ] - }, - "execution_count": 17, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# set x and y features\r\n", - "# dropping common ingredients to improve accuracy\r\n", - "#feature_df= df.drop(['cuisine','Unnamed: 0'], axis=1)\r\n", - "feature_df= df.drop(['cuisine','Unnamed: 0','rice','garlic','ginger'], axis=1)\r\n", - "labels_df = df.cuisine #.unique()\r\n", - "feature_df.head()\r\n" - ] - }, - { - "cell_type": "code", - "execution_count": 27, - "metadata": {}, - "outputs": [], - "source": [ - "# balance data with SMOTE oversamplling to the highest class. Read more here: https://imbalanced-learn.org/dev/references/generated/imblearn.over_sampling.SMOTE.html\r\n", - "oversample = SMOTE()\r\n", - "transformed_feature_df, transformed_label_df = oversample.fit_resample(feature_df, labels_df)" - ] - }, - { - "cell_type": "code", - "execution_count": 28, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
almondangelicaaniseanise_seedappleapple_brandyapricotarmagnacartemisiaartichoke...whiskeywhite_breadwhite_winewhole_grain_wheat_flourwinewoodyamyeastyogurtzucchini
00000000000...0000000000
11000000000...0000000000
20000000000...0000000000
30000000000...0000000000
40000000000...0000000010
\n", - "

5 rows × 380 columns

\n", - "
" - ], - "text/plain": [ - " almond angelica anise anise_seed apple apple_brandy apricot \\\n", - "0 0 0 0 0 0 0 0 \n", - "1 1 0 0 0 0 0 0 \n", - "2 0 0 0 0 0 0 0 \n", - "3 0 0 0 0 0 0 0 \n", - "4 0 0 0 0 0 0 0 \n", - "\n", - " armagnac artemisia artichoke ... whiskey white_bread white_wine \\\n", - "0 0 0 0 ... 0 0 0 \n", - "1 0 0 0 ... 0 0 0 \n", - "2 0 0 0 ... 0 0 0 \n", - "3 0 0 0 ... 0 0 0 \n", - "4 0 0 0 ... 0 0 0 \n", - "\n", - " whole_grain_wheat_flour wine wood yam yeast yogurt zucchini \n", - "0 0 0 0 0 0 0 0 \n", - "1 0 0 0 0 0 0 0 \n", - "2 0 0 0 0 0 0 0 \n", - "3 0 0 0 0 0 0 0 \n", - "4 0 0 0 0 0 1 0 \n", - "\n", - "[5 rows x 380 columns]" - ] - }, - "execution_count": 28, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "transformed_feature_df.head()" - ] - }, - { - "cell_type": "code", - "execution_count": 29, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "new label count: chinese 799\n", - "korean 799\n", - "indian 799\n", - "thai 799\n", - "japanese 799\n", - "Name: cuisine, dtype: int64\n", - "old label count: korean 799\n", - "indian 598\n", - "chinese 442\n", - "japanese 320\n", - "thai 289\n", - "Name: cuisine, dtype: int64\n" - ] - } - ], - "source": [ - "print(f'new label count: {transformed_label_df.value_counts()}')\r\n", - "print(f'old label count: {df.cuisine.value_counts()}')" - ] - }, - { - "cell_type": "code", - "execution_count": 42, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
cuisinealmondangelicaaniseanise_seedappleapple_brandyapricotarmagnacartemisia...whiskeywhite_breadwhite_winewhole_grain_wheat_flourwinewoodyamyeastyogurtzucchini
0indian000000000...0000000000
1indian100000000...0000000000
2indian000000000...0000000000
3indian000000000...0000000000
4indian000000000...0000000010
\n", - "

5 rows × 381 columns

\n", - "
" - ], - "text/plain": [ - " cuisine almond angelica anise anise_seed apple apple_brandy apricot \\\n", - "0 indian 0 0 0 0 0 0 0 \n", - "1 indian 1 0 0 0 0 0 0 \n", - "2 indian 0 0 0 0 0 0 0 \n", - "3 indian 0 0 0 0 0 0 0 \n", - "4 indian 0 0 0 0 0 0 0 \n", - "\n", - " armagnac artemisia ... whiskey white_bread white_wine \\\n", - "0 0 0 ... 0 0 0 \n", - "1 0 0 ... 0 0 0 \n", - "2 0 0 ... 0 0 0 \n", - "3 0 0 ... 0 0 0 \n", - "4 0 0 ... 0 0 0 \n", - "\n", - " whole_grain_wheat_flour wine wood yam yeast yogurt zucchini \n", - "0 0 0 0 0 0 0 0 \n", - "1 0 0 0 0 0 0 0 \n", - "2 0 0 0 0 0 0 0 \n", - "3 0 0 0 0 0 0 0 \n", - "4 0 0 0 0 0 1 0 \n", - "\n", - "[5 rows x 381 columns]" - ] - }, - "execution_count": 42, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "# export transformed data to new csv for classification\r\n", - "transformed_df = pd.concat([transformed_label_df,transformed_feature_df],axis=1, join='outer')\r\n", - "transformed_df.head()" - ] - }, - { - "cell_type": "code", - "execution_count": 43, - "metadata": {}, - "outputs": [], - "source": [ - "transformed_feature_df.to_csv(\".data/features_dataset.csv\")\r\n", - "transformed_label_df.to_csv(\".data/labels_dataset.csv\")\r\n", - "transformed_df.to_csv(\".data/processed.csv\")" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Build classification model" - ] - }, - { - "cell_type": "code", - "execution_count": 23, - "metadata": {}, - "outputs": [], - "source": [ - "from sklearn.linear_model import LogisticRegression\r\n", - "from sklearn.model_selection import train_test_split, cross_val_score\r\n", - "from sklearn.metrics import accuracy_score,precision_score,confusion_matrix,classification_report, precision_recall_curve\r\n", - "from sklearn.svm import SVC" - ] - }, - { - "cell_type": "code", - "execution_count": 24, - "metadata": {}, - "outputs": [], - "source": [ - "X_train, X_test, y_train, y_test = train_test_split(transformed_feature_df, 
transformed_label_df, test_size=0.3)" - ] - }, - { - "cell_type": "code", - "execution_count": 25, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Accuracy is 0.7973311092577148\n" - ] - } - ], - "source": [ - "lr = LogisticRegression(multi_class='ovr',solver='lbfgs')\r\n", - "model = lr.fit(X_train, y_train)\r\n", - "\r\n", - "accuracy = model.score(X_test, y_test)\r\n", - "print (\"Accuracy is {}\".format(accuracy))" - ] - }, - { - "cell_type": "code", - "execution_count": 26, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "ingredients: Index(['bean', 'carrot', 'cayenne', 'pea', 'sake', 'scallion', 'sesame_oil',\n", - " 'shrimp', 'starch', 'vegetable_oil', 'vinegar'],\n", - " dtype='object')\n", - "cusine: chinese\n" - ] - } - ], - "source": [ - "# test an item\r\n", - "print(f'ingredients: {X_test.iloc[20][X_test.iloc[20]!=0].keys()}')\r\n", - "print(f'cusine: {y_test.iloc[20]}')" - ] - }, - { - "cell_type": "code", - "execution_count": 27, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
0
chinese0.848156
japanese0.110072
korean0.033688
thai0.005013
indian0.003072
\n", - "
" - ], - "text/plain": [ - " 0\n", - "chinese 0.848156\n", - "japanese 0.110072\n", - "korean 0.033688\n", - "thai 0.005013\n", - "indian 0.003072" - ] - }, - "execution_count": 27, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "#rehsape to 2d array and transpose\r\n", - "test= X_test.iloc[20].values.reshape(-1, 1).T\r\n", - "# predict with score\r\n", - "proba = model.predict_proba(test)\r\n", - "classes = model.classes_\r\n", - "# create df with classes and scores\r\n", - "resultdf = pd.DataFrame(data=proba, columns=classes)\r\n", - "\r\n", - "# create df to show results\r\n", - "topPrediction = resultdf.T.sort_values(by=[0], ascending = [False])\r\n", - "topPrediction.head()" - ] - }, - { - "cell_type": "code", - "execution_count": 28, - "metadata": {}, - "outputs": [], - "source": [ - "y_pred = model.predict(X_test)" - ] - }, - { - "cell_type": "code", - "execution_count": 29, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - " precision recall f1-score support\n", - "\n", - " chinese 0.78 0.75 0.76 243\n", - " indian 0.91 0.92 0.91 233\n", - " japanese 0.67 0.77 0.71 244\n", - " korean 0.84 0.78 0.81 241\n", - " thai 0.82 0.77 0.80 238\n", - "\n", - " accuracy 0.80 1199\n", - " macro avg 0.80 0.80 0.80 1199\n", - "weighted avg 0.80 0.80 0.80 1199\n", - "\n" - ] - } - ], - "source": [ - "print(classification_report(y_test,y_pred))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Try different classifiers" - ] - }, - { - "cell_type": "code", - "execution_count": 30, - "metadata": {}, - "outputs": [], - "source": [ - "\r\n", - "C = 10\r\n", - "# Create different classifiers.\r\n", - "classifiers = {\r\n", - " 'L1 logistic': LogisticRegression(C=C, penalty='l1',\r\n", - " solver='saga',\r\n", - " multi_class='multinomial',\r\n", - " max_iter=10000),\r\n", - " 'L2 logistic (Multinomial)': LogisticRegression(C=C, penalty='l2',\r\n", - " solver='saga',\r\n", - 
" multi_class='multinomial',\r\n", - " max_iter=10000),\r\n", - " 'L2 logistic (OvR)': LogisticRegression(C=C, penalty='l2',\r\n", - " solver='saga',\r\n", - " multi_class='ovr',\r\n", - " max_iter=10000),\r\n", - " 'Linear SVC': SVC(kernel='linear', C=C, probability=True,\r\n", - " random_state=0)\r\n", - "}\r\n" - ] - }, - { - "cell_type": "code", - "execution_count": 31, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Accuracy (train) for L1 logistic: 80.7% \n", - "Accuracy (train) for L2 logistic (Multinomial): 80.9% \n", - "Accuracy (train) for L2 logistic (OvR): 80.7% \n", - "Accuracy (train) for Linear SVC: 79.9% \n" - ] - } - ], - "source": [ - "n_classifiers = len(classifiers)\r\n", - "\r\n", - "for index, (name, classifier) in enumerate(classifiers.items()):\r\n", - " classifier.fit(X_train, y_train)\r\n", - "\r\n", - " y_pred = classifier.predict(X_test)\r\n", - " accuracy = accuracy_score(y_test, y_pred)\r\n", - " print(\"Accuracy (train) for %s: %0.1f%% \" % (name, accuracy * 100))\r\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "interpreter": { - "hash": "dd61f40108e2a19f4ef0d3ebbc6b6eea57ab3c4bc13b15fe6f390d3d86442534" - }, - "kernelspec": { - "display_name": "Python 3.8.5 64-bit ('onnxwine': conda)", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.8.5" - } - }, - "nbformat": 4, - "nbformat_minor": 4 -} \ No newline at end of file diff --git a/4-Classification/1-Introduction/solution/intro-classification.ipynb b/4-Classification/1-Introduction/solution/intro-classification.ipynb deleted file mode 100644 index 997e18258..000000000 --- 
a/4-Classification/1-Introduction/solution/intro-classification.ipynb +++ /dev/null @@ -1,563 +0,0 @@ -{ - "cells": [ - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Build Classification Model" - ] - }, - { - "cell_type": "code", - "execution_count": 58, - "metadata": {}, - "outputs": [], - "source": [ - "from sklearn.linear_model import LogisticRegression\r\n", - "from sklearn.model_selection import train_test_split, cross_val_score\r\n", - "from sklearn.metrics import accuracy_score,precision_score,confusion_matrix,classification_report, precision_recall_curve\r\n", - "from sklearn.svm import SVC\r\n", - "import pandas as pd\r\n", - "import numpy as np" - ] - }, - { - "cell_type": "code", - "execution_count": 48, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
almondangelicaaniseanise_seedappleapple_brandyapricotarmagnacartemisiaartichoke...whiskeywhite_breadwhite_winewhole_grain_wheat_flourwinewoodyamyeastyogurtzucchini
00000000000...0000000000
11000000000...0000000000
20000000000...0000000000
30000000000...0000000000
40000000000...0000000010
\n", - "

5 rows × 380 columns

\n", - "
" - ], - "text/plain": [ - " almond angelica anise anise_seed apple apple_brandy apricot \\\n", - "0 0 0 0 0 0 0 0 \n", - "1 1 0 0 0 0 0 0 \n", - "2 0 0 0 0 0 0 0 \n", - "3 0 0 0 0 0 0 0 \n", - "4 0 0 0 0 0 0 0 \n", - "\n", - " armagnac artemisia artichoke ... whiskey white_bread white_wine \\\n", - "0 0 0 0 ... 0 0 0 \n", - "1 0 0 0 ... 0 0 0 \n", - "2 0 0 0 ... 0 0 0 \n", - "3 0 0 0 ... 0 0 0 \n", - "4 0 0 0 ... 0 0 0 \n", - "\n", - " whole_grain_wheat_flour wine wood yam yeast yogurt zucchini \n", - "0 0 0 0 0 0 0 0 \n", - "1 0 0 0 0 0 0 0 \n", - "2 0 0 0 0 0 0 0 \n", - "3 0 0 0 0 0 0 0 \n", - "4 0 0 0 0 0 1 0 \n", - "\n", - "[5 rows x 380 columns]" - ] - }, - "execution_count": 48, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "transformed_feature_df = pd.read_csv(\".data/features_dataset.csv\")\r\n", - "transformed_feature_df= transformed_feature_df.drop(['Unnamed: 0'], axis=1)\r\n", - "transformed_feature_df.head()" - ] - }, - { - "cell_type": "code", - "execution_count": 49, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
cuisine
0indian
1indian
2indian
3indian
4indian
\n", - "
" - ], - "text/plain": [ - " cuisine\n", - "0 indian\n", - "1 indian\n", - "2 indian\n", - "3 indian\n", - "4 indian" - ] - }, - "execution_count": 49, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "transformed_label_df = pd.read_csv(\".data/labels_dataset.csv\")\r\n", - "transformed_label_df= transformed_label_df.drop(['Unnamed: 0'], axis=1)\r\n", - "transformed_label_df.head()" - ] - }, - { - "cell_type": "code", - "execution_count": 50, - "metadata": {}, - "outputs": [], - "source": [ - "X_train, X_test, y_train, y_test = train_test_split(transformed_feature_df, transformed_label_df, test_size=0.3)" - ] - }, - { - "cell_type": "code", - "execution_count": 59, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Accuracy is 0.8031693077564637\n" - ] - } - ], - "source": [ - "lr = LogisticRegression(multi_class='ovr',solver='lbfgs')\r\n", - "model = lr.fit(X_train, np.ravel(y_train))\r\n", - "\r\n", - "accuracy = model.score(X_test, y_test)\r\n", - "print (\"Accuracy is {}\".format(accuracy))" - ] - }, - { - "cell_type": "code", - "execution_count": 52, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "ingredients: Index(['corn'], dtype='object')\n", - "cusine: cuisine thai\n", - "Name: 3816, dtype: object\n" - ] - } - ], - "source": [ - "# test an item\r\n", - "print(f'ingredients: {X_test.iloc[20][X_test.iloc[20]!=0].keys()}')\r\n", - "print(f'cusine: {y_test.iloc[20]}')" - ] - }, - { - "cell_type": "code", - "execution_count": 53, - "metadata": {}, - "outputs": [ - { - "data": { - "text/html": [ - "
\n", - "\n", - "\n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - " \n", - "
0
thai0.475724
chinese0.201912
japanese0.152046
korean0.110980
indian0.059338
\n", - "
" - ], - "text/plain": [ - " 0\n", - "thai 0.475724\n", - "chinese 0.201912\n", - "japanese 0.152046\n", - "korean 0.110980\n", - "indian 0.059338" - ] - }, - "execution_count": 53, - "metadata": {}, - "output_type": "execute_result" - } - ], - "source": [ - "#rehsape to 2d array and transpose\r\n", - "test= X_test.iloc[20].values.reshape(-1, 1).T\r\n", - "# predict with score\r\n", - "proba = model.predict_proba(test)\r\n", - "classes = model.classes_\r\n", - "# create df with classes and scores\r\n", - "resultdf = pd.DataFrame(data=proba, columns=classes)\r\n", - "\r\n", - "# create df to show results\r\n", - "topPrediction = resultdf.T.sort_values(by=[0], ascending = [False])\r\n", - "topPrediction.head()" - ] - }, - { - "cell_type": "code", - "execution_count": 54, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - " precision recall f1-score support\n", - "\n", - " chinese 0.77 0.70 0.74 239\n", - " indian 0.88 0.88 0.88 240\n", - " japanese 0.76 0.79 0.77 227\n", - " korean 0.86 0.78 0.82 240\n", - " thai 0.75 0.86 0.80 253\n", - "\n", - " accuracy 0.80 1199\n", - " macro avg 0.81 0.80 0.80 1199\n", - "weighted avg 0.81 0.80 0.80 1199\n", - "\n" - ] - } - ], - "source": [ - "y_pred = model.predict(X_test)\r\n", - "print(classification_report(y_test,y_pred))" - ] - }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "# Try different classifiers" - ] - }, - { - "cell_type": "code", - "execution_count": 60, - "metadata": {}, - "outputs": [], - "source": [ - "\r\n", - "C = 10\r\n", - "# Create different classifiers.\r\n", - "classifiers = {\r\n", - " 'L1 logistic': LogisticRegression(C=C, penalty='l1',\r\n", - " solver='saga',\r\n", - " multi_class='multinomial',\r\n", - " max_iter=10000),\r\n", - " 'L2 logistic (Multinomial)': LogisticRegression(C=C, penalty='l2',\r\n", - " solver='saga',\r\n", - " multi_class='multinomial',\r\n", - " max_iter=10000),\r\n", - " 'L2 logistic (OvR)': 
LogisticRegression(C=C, penalty='l2',\r\n", - " solver='saga',\r\n", - " multi_class='ovr',\r\n", - " max_iter=10000),\r\n", - " 'Linear SVC': SVC(kernel='linear', C=C, probability=True,\r\n", - " random_state=0)\r\n", - "}\r\n" - ] - }, - { - "cell_type": "code", - "execution_count": 61, - "metadata": {}, - "outputs": [ - { - "name": "stdout", - "output_type": "stream", - "text": [ - "Accuracy (train) for L1 logistic: 79.9% \n", - "Accuracy (train) for L2 logistic (Multinomial): 79.7% \n", - "Accuracy (train) for L2 logistic (OvR): 79.8% \n", - "Accuracy (train) for Linear SVC: 77.9% \n" - ] - } - ], - "source": [ - "n_classifiers = len(classifiers)\r\n", - "\r\n", - "for index, (name, classifier) in enumerate(classifiers.items()):\r\n", - " classifier.fit(X_train, np.ravel(y_train))\r\n", - "\r\n", - " y_pred = classifier.predict(X_test)\r\n", - " accuracy = accuracy_score(y_test, y_pred)\r\n", - " print(\"Accuracy (train) for %s: %0.1f%% \" % (name, accuracy * 100))\r\n" - ] - }, - { - "cell_type": "code", - "execution_count": null, - "metadata": {}, - "outputs": [], - "source": [] - } - ], - "metadata": { - "interpreter": { - "hash": "dd61f40108e2a19f4ef0d3ebbc6b6eea57ab3c4bc13b15fe6f390d3d86442534" - }, - "kernelspec": { - "display_name": "Python 3.8.5 64-bit ('onnxwine': conda)", - "name": "python3" - }, - "language_info": { - "codemirror_mode": { - "name": "ipython", - "version": 3 - }, - "file_extension": ".py", - "mimetype": "text/x-python", - "name": "python", - "nbconvert_exporter": "python", - "pygments_lexer": "ipython3", - "version": "3.8.5" - } - }, - "nbformat": 4, - "nbformat_minor": 4 -} \ No newline at end of file diff --git a/4-Classification/1-Introduction/solution/notebook.ipynb b/4-Classification/1-Introduction/solution/notebook.ipynb index 0d08010ca..84f6b3826 100644 --- a/4-Classification/1-Introduction/solution/notebook.ipynb +++ b/4-Classification/1-Introduction/solution/notebook.ipynb @@ -11,12 +11,14 @@ "source": [ "Install Imblearn 
which will enable SMOTE. This is a Scikit-Learn package that helps handle imbalanced data when performing classification. (https://imbalanced-learn.org/stable/)" ], - "cell_type": "markdown", - "metadata": {} + "cell_type": "code", + "metadata": {}, + "execution_count": null, + "outputs": [] }, { "cell_type": "code", - "execution_count": 17, + "execution_count": 2, "metadata": {}, "outputs": [ { @@ -26,8 +28,8 @@ "Requirement already satisfied: imblearn in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (0.0)\n", "Requirement already satisfied: imbalanced-learn in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from imblearn) (0.8.0)\n", "Requirement already satisfied: numpy>=1.13.3 in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from imbalanced-learn->imblearn) (1.19.2)\n", - "Requirement already satisfied: joblib>=0.11 in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from imbalanced-learn->imblearn) (0.16.0)\n", "Requirement already satisfied: scipy>=0.19.1 in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from imbalanced-learn->imblearn) (1.4.1)\n", + "Requirement already satisfied: joblib>=0.11 in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from imbalanced-learn->imblearn) (0.16.0)\n", "Requirement already satisfied: scikit-learn>=0.24 in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from imbalanced-learn->imblearn) (0.24.2)\n", "Requirement already satisfied: threadpoolctl>=2.0.0 in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from scikit-learn>=0.24->imbalanced-learn->imblearn) (2.1.0)\n", "\u001b[33mWARNING: You are using pip version 20.2.3; however, version 21.1.2 is available.\n", @@ -42,11 +44,10 @@ }, { "cell_type": "code", - "execution_count": 18, + "execution_count": 3, "metadata": {}, 
"outputs": [], "source": [ - "\n", "import pandas as pd\n", "import matplotlib.pyplot as plt\n", "import matplotlib as mpl\n", @@ -56,7 +57,7 @@ }, { "cell_type": "code", - "execution_count": 19, + "execution_count": 4, "metadata": {}, "outputs": [], "source": [ @@ -72,7 +73,7 @@ }, { "cell_type": "code", - "execution_count": 20, + "execution_count": 5, "metadata": {}, "outputs": [ { @@ -105,7 +106,7 @@ "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
Unnamed: 0cuisinealmondangelicaaniseanise_seedappleapple_brandyapricotarmagnac...whiskeywhite_breadwhite_winewhole_grain_wheat_flourwinewoodyamyeastyogurtzucchini
065indian00000000...0000000000
166indian10000000...0000000000
267indian00000000...0000000000
368indian00000000...0000000000
469indian00000000...0000000010
\n

5 rows × 385 columns

\n
" }, "metadata": {}, - "execution_count": 20 + "execution_count": 5 } ], "source": [ @@ -114,7 +115,7 @@ }, { "cell_type": "code", - "execution_count": 21, + "execution_count": 6, "metadata": {}, "outputs": [ { @@ -131,7 +132,7 @@ }, { "cell_type": "code", - "execution_count": 22, + "execution_count": 7, "metadata": {}, "outputs": [ { @@ -147,7 +148,7 @@ ] }, "metadata": {}, - "execution_count": 22 + "execution_count": 7 } ], "source": [ @@ -163,24 +164,24 @@ }, { "cell_type": "code", - "execution_count": 23, + "execution_count": 8, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ - "" + "" ] }, "metadata": {}, - "execution_count": 23 + "execution_count": 8 }, { "output_type": "display_data", "data": { "text/plain": "
", - "image/svg+xml": "\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n", + "image/svg+xml": "\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n 
\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n", "image/png": "iVBORw0KGgoAAAANSUhEUgAAAZEAAAD4CAYAAAAtrdtxAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjAsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+17YcXAAASY0lEQVR4nO3df7TldV3v8eerGZkRRoeAiXtE5UgNIkUCjlwQIzAiC7NscdcSbcmsfkxl5SXX0juuyzK9d3UvlXnpplajma0kMtCUhluImNcr8msGBmb4pZaTQCFQOYom0fi+f+zPkd14hpnzOWefvYfzfKy113z35/vde7/22fvMa3++3733SVUhSVKPbxt3AEnSgcsSkSR1s0QkSd0sEUlSN0tEktRt+bgDLKYjjjiipqenxx1Dkg4oW7dufbiq1sy2bkmVyPT0NFu2bBl3DEk6oCT5u72tc3eWJKmbJSJJ6maJSJK6WSKSpG6WiCSpmyUiSepmiUiSulkikqRulogkqduS+sT69vt3Mb3xqnHH0ALZefG5444gLXnORCRJ3SwRSVI3S0SS1M0SkSR1s0QkSd0sEUlSN0tEktRtIkokyaFJXtuWz0yyeY6X/29Jzh5NOknS3kxEiQCHAq/tvXBVvbmqPraAeSRJ+2FSSuRi4DuTbAN+E1iV5Iokdye5NEkAkrw5yc1JdiTZNDT+viTnjTG/JC1Jk1IiG4G/qaoTgTcAJwEXAscDxwCnt+3eUVUvrKrvAZ4KvGxfV5xkQ5ItSbbs/tqu0aSXpCVqUkpkTzdV1X1V9Q1gGzDdxs9KcmOS7cBLgO/e1xVV1aaqWldV65YdvHp0iSVpCZrUL2B8dGh5N7A8yUrgXcC6qro3yVuAleMIJ0kamJSZyFeAp+1jm5nCeDjJKsBjIJI0ZhMxE6mqf0xyXZIdwL8AX5xlmy8leTewA3gAuHmRY0qS9jARJQJQVa/ay/gvDS1fBFw0yzbrR5dMkrQ3k7I7S5J0ALJEJEndLBFJUjdLRJLUzRKRJHWbmHdnLYYTjlrNlovPHXcMSXrScCYiSepmiUiSulkikqRulogkqZslIknqZolIkrpZIpKkbpaIJKmbJSJJ6maJSJK6WSKSpG6WiCSpmyUiSepmiUiSulkikqRulogkqZslIknqZolIkrpZIpKkbpaIJKmbJSJJ6rZ83AEW0/b7dzG98apxx9CY7Lz43HFHkJ50nIlIkrpZIpKkbpaIJKmbJSJJ6maJSJK6WSKSpG77VSJJPj3qIJKkA89+lUhVvWjUQSRJB579nYk8kmRVkmuT3JJke5Ifa+umk9yd5NIkdyW5IsnBbd2bk9ycZEeSTUnSxj+R5NeT3JTkM0m+r40vS/Kb7TK3J/m5Nj6V5JNJtrXrmtn+nCTXt0yXJ1k1ih+SJGl2czkm8nXgFVV1MnAW8FszpQA8F3hXVT0P+DLw2jb+jqp6YVV9D/BU4GVD17e8qk4BLgR+tY39NLCrql4IvBD42STPAV4FXF1VJwLPB7YlOQK4CDi7ZdoCvH4ud16SND9z+dqTAP8jyRnAN4CjgCPbunur6rq2/H7gdcDbgLOSvBE4GDgMuAP4i7bdh9q/W4HptnwO8L1JzmvnVwNrgZuB9yZ5CvDhqtqW5PuB44HrWpcdBFz/LaGTDcAGgGVPXzOHuytJ2pe5lMirgTXAC6rqsSQ7gZVtXe2xbSVZCbwLWFdV9yZ5y9D2AI+2f3cP5Qjwy1V19Z433srrXOB9Sd4O/DNwTVWd/0Shq2oTsAlgxdTaPXNKkuZhLruzVgMPtgI5Czh6aN2zk5zWll8FfIrHC+PhdqziPPbtauAX2oyDJMcmOSTJ0cAXq+rdwHuAk4EbgNOTfFfb9pAkx87h/kiS5ml/ZyIFXAr8RZLtDI4/3D20/h7gF5O8F
7gT+N2q+lqSdwM7gAcY7JLal/cw2LV1Szve8hDw48CZwBuSPAY8Arymqh5Ksh64LMmKdvmLgM/s532SJM1Tqp54D0+Sw4FbqurovayfBja3g+cTbcXU2pq64JJxx9CY+FXwUp8kW6tq3WzrnnB3VpJnMDhY/bZRBJMkHdiecHdWVf098ITHGapqJzDxsxBJ0sLzu7MkSd0sEUlSN0tEktRtLh82POCdcNRqtvgOHUlaMM5EJEndLBFJUjdLRJLUzRKRJHWzRCRJ3SwRSVI3S0SS1M0SkSR1s0QkSd0sEUlSN0tEktTNEpEkdbNEJEndLBFJUjdLRJLUzRKRJHWzRCRJ3SwRSVI3S0SS1M0SkSR1s0QkSd2WjzvAYtp+/y6mN1417hhSt50XnzvuCNK/40xEktTNEpEkdbNEJEndLBFJUjdLRJLUzRKRJHWzRCRJ3Ra0RJK8L8l5s4w/I8kVC3lbkqTxW5QPG1bV3wPfUi6SpAPbvGYiSV6T5PYktyX54zZ8RpJPJ/nbmVlJkukkO9ry+iQfSvJXST6b5DeGru+cJNcnuSXJ5UlWtfGLk9zZbuttbWxNkg8mubmdTp/PfZEkzV33TCTJdwMXAS+qqoeTHAa8HZgCXgwcB1wJzLYb60TgJOBR4J4kvwP8S7u+s6vqq0n+C/D6JO8EXgEcV1WV5NB2Hb8N/K+q+lSSZwNXA8+bJecGYAPAsqev6b27kqRZzGd31kuAy6vqYYCq+qckAB+uqm8AdyY5ci+XvbaqdgEkuRM4GjgUOB64rl3PQcD1wC7g68AfJNkMbG7XcTZwfNsW4OlJVlXVI8M3VFWbgE0AK6bW1jzuryRpD6M4JvLo0HL2Y5vdLUeAa6rq/D03TnIK8AMMjqv8EoMC+zbg1Kr6+kKEliTN3XyOiXwc+E9JDgdou7Pm4wbg9CTf1a7vkCTHtuMiq6vq/wC/Ajy/bf9R4JdnLpzkxHneviRpjrpnIlV1R5JfA/5vkt3ArfMJUlUPJVkPXJZkRRu+CPgK8JEkKxnMVl7f1r0OeGeS2xncj08CPz+fDJKkuUnV0jlMsGJqbU1dcMm4Y0jd/HsiGockW6tq3Wzr/MS6JKmbJSJJ6maJSJK6WSKSpG6WiCSp26J8AeOkOOGo1Wzx3S2StGCciUiSulkikqRulogkqZslIknqZolIkrpZIpKkbpaIJKmbJSJJ6maJSJK6WSKSpG6WiCSpmyUiSepmiUiSulkikqRulogkqZslIknqZolIkrpZIpKkbpaIJKmbJSJJ6maJSJK6LR93gMW0/f5dTG+8atwxJM3RzovPHXcE7YUzEUlSN0tEktTNEpEkdbNEJEndLBFJUjdLRJLUbWQlkuTTc9z+zCSb2/LLk2wcTTJJ0kIZ2edEqupF87jslcCVCxhHkjQCo5yJPNL+PTPJJ5JckeTuJJcmSVv30jZ2C/ATQ5ddn+QdbflHk9yY5NYkH0tyZBt/S5L3tuv+2ySvG9V9kSTNbrGOiZwEXAgcDxwDnJ5kJfBu4EeBFwD/YS+X/RRwalWdBPwp8MahdccBPwScAvxqkqeMJr4kaTaL9bUnN1XVfQBJtgHTwCPA56vqs238/cCGWS77TOADSaaAg4DPD627qqoeBR5N8iBwJHDf8IWTbJi53mVPX7OQ90mSlrzFmok8OrS8m7mV1+8A76iqE4CfA1bO5XqralNVrauqdcsOXj2Hm5Uk7cs43+J7NzCd5Dvb+fP3st1q4P62fMHIU0mS9tvYSqSqvs5gN9NV7cD6g3vZ9C3A5Um2Ag8vUjxJ0n5IVY07w6JZMbW2pi64ZNwxJM2RXwU/Xkm2VtW62db5iXVJUjdLRJLUzRKRJHWzRCRJ3SwRSVK3xfrE+kQ44ajVbPFdHpK0YJyJSJK6WSKSpG6WiCSpmyUiSepmiUiSulkikqRulogkqZslIknqZolIkrpZIpKkbpaIJKmbJSJJ6maJSJK6WSKSpG6WiCSpmyUiSepmiUiSu
lkikqRulogkqZslIknqZolIkrotH3eAxbT9/l1Mb7xq3DEkaVHtvPjckV23MxFJUjdLRJLUzRKRJHWzRCRJ3SwRSVI3S0SS1M0SkSR1W9ASSTKdZMdCXqckaXJNxEwkyZL60KMkPVmMrESSHJPk1iTfl+QPk2xv589q69cnuTLJx4Fr29gbktyc5PYkbx26rg8n2ZrkjiQbhsYfSfJrSW5LckOSI0d1fyRJ32okJZLkucAHgfXAKUBV1QnA+cAfJVnZNj0ZOK+qvj/JOcDatv2JwAuSnNG2+6mqegGwDnhdksPb+CHADVX1fOCTwM/OkmVDki1Jtuz+2q5R3F1JWrJGUSJrgI8Ar66q24AXA+8HqKq7gb8Djm3bXlNV/9SWz2mnW4FbgOMYlAoMiuM24AbgWUPj/wpsbstbgek9w1TVpqpaV1Xrlh28eqHuoySJ0XwB4y7gCwzK4859bPvVoeUA/7Oqfn94gyRnAmcDp1XV15J8ApiZyTxWVdWWd7PEvlBSksZtFDORfwVeAbwmyauA/we8GiDJscCzgXtmudzVwE8lWdW2PSrJdwCrgX9uBXIccOoIMkuSOozklXtVfTXJy4BrgP8OnJBkO/BvwPqqejTJnpf5aJLnAde3dY8APwn8FfDzSe5iUD43jCKzJGnu8vjeoCe/FVNra+qCS8YdQ5IW1Xz/nkiSrVW1brZ1E/E5EUnSgckSkSR1s0QkSd0sEUlSN0tEktRtSX0474SjVrNlnu9SkCQ9zpmIJKmbJSJJ6maJSJK6WSKSpG6WiCSpmyUiSepmiUiSulkikqRulogkqZslIknqtqT+KFWSrzD7n+adFEcAD487xBMw3/yYb37MNz/zyXd0Va2ZbcWS+u4s4J69/XWuSZBki/n6mW9+zDc/SzWfu7MkSd0sEUlSt6VWIpvGHWAfzDc/5psf883Pksy3pA6sS5IW1lKbiUiSFpAlIknqtmRKJMlLk9yT5HNJNo4pw3uTPJhkx9DYYUmuSfLZ9u+3t/Ek+d8t7+1JTl6EfM9K8tdJ7kxyR5L/PEkZk6xMclOS21q+t7bx5yS5seX4QJKD2viKdv5zbf30KPO121yW5NYkmycw284k25NsS7KljU3EY9tu89AkVyS5O8ldSU6blHxJntt+bjOnLye5cFLytdv8lfZ7sSPJZe33ZfTPv6p60p+AZcDfAMcABwG3AcePIccZwMnAjqGx3wA2tuWNwK+35R8B/hIIcCpw4yLkmwJObstPAz4DHD8pGdvtrGrLTwFubLf7Z8Ar2/jvAb/Qll8L/F5bfiXwgUX4Gb4e+BNgczs/Sdl2AkfsMTYRj227zT8CfqYtHwQcOkn5hnIuAx4Ajp6UfMBRwOeBpw4979YvxvNvUX7o4z4BpwFXD51/E/CmMWWZ5t+XyD3AVFueYvCBSIDfB86fbbtFzPoR4AcnMSNwMHAL8B8ZfAp3+Z6PNXA1cFpbXt62ywgzPRO4FngJsLn9BzIR2drt7ORbS2QiHltgdftPMJOYb49M5wDXTVI+BiVyL3BYez5tBn5oMZ5/S2V31swPeMZ9bWwSHFlV/9CWHwCObMtjzdymtycxeLU/MRnb7qJtwIPANQxmmF+qqn+bJcM387X1u4DDRxjvEuCNwDfa+cMnKBtAAR9NsjXJhjY2KY/tc4CHgD9suwPfk+SQCco37JXAZW15IvJV1f3A24AvAP/A4Pm0lUV4/i2VEjkg1OBlwdjfc51kFfBB4MKq+vLwunFnrKrdVXUig1f9pwDHjSvLsCQvAx6sqq3jzvIEXlxVJwM/DPxikjOGV475sV3OYFfv71bVScBXGewe+qZxP/cA2jGFlwOX77lunPnasZgfY1DGzwAOAV66GLe9VErkfuBZQ+ef2cYmwReTTAG0fx9s42PJnOQpDArk0qr60CRmBKiqLwF/zWCKfmiSme+BG87wzXxt/WrgH0cU6XTg5Ul2An/KYJfWb09INuCbr1apqgeBP2dQwpPy2
N4H3FdVN7bzVzAolUnJN+OHgVuq6ovt/KTkOxv4fFU9VFWPAR9i8Jwc+fNvqZTIzcDa9k6FgxhMR68cc6YZVwIXtOULGByHmBl/TXuXx6nArqFp80gkCfAHwF1V9fZJy5hkTZJD2/JTGRyvuYtBmZy3l3wzuc8DPt5eLS64qnpTVT2zqqYZPL8+XlWvnoRsAEkOSfK0mWUG+/V3MCGPbVU9ANyb5Llt6AeAOycl35DzeXxX1kyOScj3BeDUJAe33+OZn9/on3+LcSBqEk4M3i3xGQb70P/rmDJcxmB/5WMMXnn9NIP9kNcCnwU+BhzWtg3wzpZ3O7BuEfK9mMF0/HZgWzv9yKRkBL4XuLXl2wG8uY0fA9wEfI7BboYVbXxlO/+5tv6YRXqcz+Txd2dNRLaW47Z2umPmd2BSHtt2mycCW9rj+2Hg2ycs3yEMXq2vHhqbpHxvBe5uvxt/DKxYjOefX3siSeq2VHZnSZJGwBKRJHWzRCRJ3SwRSVI3S0SS1M0SkSR1s0QkSd3+PxNFbW14TY8fAAAAAElFTkSuQmCC\n" }, "metadata": { @@ -194,7 +195,7 @@ }, { "cell_type": "code", - "execution_count": 24, + "execution_count": 9, "metadata": {}, "outputs": [ { @@ -229,40 +230,40 @@ }, { "cell_type": "code", - "execution_count": 25, + "execution_count": 10, "metadata": {}, "outputs": [], "source": [ - "def create_ingredient_df(df):\r\n", - " #transpose df, drop cuisine and unnamed rows, sum the row to get total for ingredient and add value header to new df\r\n", - " ingredient_df = df.T.drop(['cuisine','Unnamed: 0']).sum(axis=1).to_frame('value')\r\n", - " # drop ingredients that have a 0 sum\r\n", - " ingredient_df = ingredient_df[(ingredient_df.T != 0).any()]\r\n", - " # sort df\r\n", - " ingredient_df = ingredient_df.sort_values(by='value', ascending=False, inplace=False)\r\n", - " return ingredient_df\r\n" + "def create_ingredient_df(df):\n", + " # transpose df, drop cuisine and unnamed rows, sum the row to get total for ingredient and add value header to new df\n", + " ingredient_df = df.T.drop(['cuisine','Unnamed: 0']).sum(axis=1).to_frame('value')\n", + " # drop ingredients that have a 0 sum\n", + " ingredient_df = ingredient_df[(ingredient_df.T != 0).any()]\n", + " # sort df\n", + " ingredient_df = ingredient_df.sort_values(by='value', ascending=False, inplace=False)\n", + " return ingredient_df\n" ] }, { "cell_type": "code", - "execution_count": 26, + "execution_count": 11, "metadata": {}, "outputs": [ { "output_type": 
"execute_result", "data": { "text/plain": [ - "" + "" ] }, "metadata": {}, - "execution_count": 26 + "execution_count": 11 }, { "output_type": "display_data", "data": { "text/plain": "
", - "image/svg+xml": "\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n", + "image/svg+xml": "\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n 
\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n", "image/png": "\n" }, "metadata": { @@ -277,24 +278,24 @@ }, { "cell_type": "code", - "execution_count": 27, + "execution_count": 12, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ - "" + "" ] }, "metadata": {}, - "execution_count": 27 + "execution_count": 12 }, { "output_type": "display_data", "data": { "text/plain": "
", - "image/svg+xml": "\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n", + "image/svg+xml": "\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n 
\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n", "image/png": "\n" }, "metadata": { @@ -309,24 +310,24 @@ }, { "cell_type": "code", - "execution_count": 28, + "execution_count": 13, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ - "" + "" ] }, "metadata": {}, - "execution_count": 28 + "execution_count": 13 }, { "output_type": "display_data", "data": { "text/plain": "
", - "image/svg+xml": "\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n", + "image/svg+xml": "\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n 
\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n", "image/png": "\n" }, "metadata": { @@ -341,24 +342,24 @@ }, { "cell_type": "code", - "execution_count": 29, + "execution_count": 14, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ - "" + "" ] }, "metadata": {}, - "execution_count": 29 + "execution_count": 14 }, { "output_type": "display_data", "data": { "text/plain": "
", - "image/svg+xml": "\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n", + "image/svg+xml": "\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n 
\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n", "image/png": "\n" }, "metadata": { @@ -373,24 +374,24 @@ }, { "cell_type": "code", - "execution_count": 30, + "execution_count": 15, "metadata": {}, "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ - "" + "" ] }, "metadata": {}, - "execution_count": 30 + "execution_count": 15 }, { "output_type": "display_data", "data": { "text/plain": "
", - "image/svg+xml": "\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n", + "image/svg+xml": "\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n 
\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n\n", "image/png": "\n" }, "metadata": { @@ -412,7 +413,7 @@ }, { "cell_type": "code", - "execution_count": 31, + "execution_count": 16, "metadata": {}, "outputs": [ { @@ -445,7 +446,7 @@ "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
almondangelicaaniseanise_seedappleapple_brandyapricotarmagnacartemisiaartichoke...whiskeywhite_breadwhite_winewhole_grain_wheat_flourwinewoodyamyeastyogurtzucchini
00000000000...0000000000
11000000000...0000000000
20000000000...0000000000
30000000000...0000000000
40000000000...0000000010
\n

5 rows × 380 columns

\n
" }, "metadata": {}, - "execution_count": 31 + "execution_count": 16 } ], "source": [ @@ -456,14 +457,14 @@ }, { "source": [ - "Balance data with SMOTE oversamplling to the highest class. Read more here: https://imbalanced-learn.org/dev/references/generated/imblearn.over_sampling.SMOTE.html" + "Balance data with SMOTE oversampling to the highest class. Read more here: https://imbalanced-learn.org/dev/references/generated/imblearn.over_sampling.SMOTE.html" ], "cell_type": "markdown", "metadata": {} }, { "cell_type": "code", - "execution_count": 32, + "execution_count": 17, "metadata": {}, "outputs": [], "source": [ @@ -473,14 +474,14 @@ }, { "cell_type": "code", - "execution_count": 33, + "execution_count": 19, "metadata": {}, "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ - "new label count: chinese 799\nkorean 799\njapanese 799\nthai 799\nindian 799\nName: cuisine, dtype: int64\nold label count: korean 799\nindian 598\nchinese 442\njapanese 320\nthai 289\nName: cuisine, dtype: int64\n" + "new label count: korean 799\nchinese 799\nindian 799\njapanese 799\nthai 799\nName: cuisine, dtype: int64\nold label count: korean 799\nindian 598\nchinese 442\njapanese 320\nthai 289\nName: cuisine, dtype: int64\n" ] } ], @@ -491,7 +492,7 @@ }, { "cell_type": "code", - "execution_count": 34, + "execution_count": 20, "metadata": {}, "outputs": [ { @@ -524,23 +525,16 @@ "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
almondangelicaaniseanise_seedappleapple_brandyapricotarmagnacartemisiaartichoke...whiskeywhite_breadwhite_winewhole_grain_wheat_flourwinewoodyamyeastyogurtzucchini
00000000000...0000000000
11000000000...0000000000
20000000000...0000000000
30000000000...0000000000
40000000000...0000000010
\n

5 rows × 380 columns

\n
" }, "metadata": {}, - "execution_count": 34 + "execution_count": 20 } ], "source": [ "transformed_feature_df.head()" ] }, - { - "source": [ - "todo: explain this concatenation?" - ], - "cell_type": "markdown", - "metadata": {} - }, { "cell_type": "code", - "execution_count": 35, + "execution_count": 21, "metadata": {}, "outputs": [ { @@ -568,7 +562,7 @@ "4 0 0 0 ... 0 0 0 \n", "... ... ... ... ... ... ... ... \n", "3990 0 0 0 ... 0 0 0 \n", - "3991 0 0 0 ... 0 0 1 \n", + "3991 0 0 0 ... 0 0 0 \n", "3992 0 0 0 ... 0 0 0 \n", "3993 0 0 0 ... 0 0 0 \n", "3994 0 0 0 ... 0 0 0 \n", @@ -588,10 +582,10 @@ "\n", "[3995 rows x 381 columns]" ], - "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
cuisinealmondangelicaaniseanise_seedappleapple_brandyapricotarmagnacartemisia...whiskeywhite_breadwhite_winewhole_grain_wheat_flourwinewoodyamyeastyogurtzucchini
0indian000000000...0000000000
1indian100000000...0000000000
2indian000000000...0000000000
3indian000000000...0000000000
4indian000000000...0000000010
..................................................................
3990thai000000000...0000000000
3991thai000000000...0010000000
3992thai000000000...0000000000
3993thai000000000...0000000000
3994thai000000000...0000000000
\n

3995 rows × 381 columns

\n
" + "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
cuisinealmondangelicaaniseanise_seedappleapple_brandyapricotarmagnacartemisia...whiskeywhite_breadwhite_winewhole_grain_wheat_flourwinewoodyamyeastyogurtzucchini
0indian000000000...0000000000
1indian100000000...0000000000
2indian000000000...0000000000
3indian000000000...0000000000
4indian000000000...0000000010
..................................................................
3990thai000000000...0000000000
3991thai000000000...0000000000
3992thai000000000...0000000000
3993thai000000000...0000000000
3994thai000000000...0000000000
\n

3995 rows × 381 columns

\n
" }, "metadata": {}, - "execution_count": 35 + "execution_count": 21 } ], "source": [ @@ -599,6 +593,46 @@ "transformed_df = pd.concat([transformed_label_df,transformed_feature_df],axis=1, join='outer')\n", "transformed_df" ] + }, + { + "cell_type": "code", + "execution_count": 22, + "metadata": {}, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "\nRangeIndex: 3995 entries, 0 to 3994\nColumns: 381 entries, cuisine to zucchini\ndtypes: int64(380), object(1)\nmemory usage: 11.6+ MB\n" + ] + } + ], + "source": [ + "transformed_df.info()" + ] + }, + { + "source": [ + "Save the file for future use" + ], + "cell_type": "markdown", + "metadata": {} + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": {}, + "outputs": [], + "source": [ + "transformed_df.to_csv(\"../../data/cleaned_cuisine.csv\")" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] } ], "metadata": { diff --git a/4-Classification/2-Discriminative/README.md b/4-Classification/2-Classifiers-1/README.md similarity index 100% rename from 4-Classification/2-Discriminative/README.md rename to 4-Classification/2-Classifiers-1/README.md diff --git a/4-Classification/2-Discriminative/assignment.md b/4-Classification/2-Classifiers-1/assignment.md similarity index 100% rename from 4-Classification/2-Discriminative/assignment.md rename to 4-Classification/2-Classifiers-1/assignment.md diff --git a/4-Classification/2-Discriminative/translations/README.es.md b/4-Classification/2-Classifiers-1/notebook.ipynb similarity index 100% rename from 4-Classification/2-Discriminative/translations/README.es.md rename to 4-Classification/2-Classifiers-1/notebook.ipynb diff --git a/4-Classification/2-Classifiers-1/solution/notebook.ipynb b/4-Classification/2-Classifiers-1/solution/notebook.ipynb new file mode 100644 index 000000000..aeba09710 --- /dev/null +++ b/4-Classification/2-Classifiers-1/solution/notebook.ipynb @@ 
-0,0 +1,336 @@ +{ + "cells": [ + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Build Classification Model" + ] + }, + { + "cell_type": "code", + "execution_count": 25, + "metadata": {}, + "outputs": [], + "source": [ + "from sklearn.linear_model import LogisticRegression\n", + "from sklearn.model_selection import train_test_split, cross_val_score\n", + "from sklearn.metrics import accuracy_score,precision_score,confusion_matrix,classification_report, precision_recall_curve\n", + "from sklearn.svm import SVC\n", + "import pandas as pd\n", + "import numpy as np" + ] + }, + { + "cell_type": "code", + "execution_count": 26, + "metadata": {}, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " Unnamed: 0 cuisine almond angelica anise anise_seed apple \\\n", + "0 0 indian 0 0 0 0 0 \n", + "1 1 indian 1 0 0 0 0 \n", + "2 2 indian 0 0 0 0 0 \n", + "3 3 indian 0 0 0 0 0 \n", + "4 4 indian 0 0 0 0 0 \n", + "\n", + " apple_brandy apricot armagnac ... whiskey white_bread white_wine \\\n", + "0 0 0 0 ... 0 0 0 \n", + "1 0 0 0 ... 0 0 0 \n", + "2 0 0 0 ... 0 0 0 \n", + "3 0 0 0 ... 0 0 0 \n", + "4 0 0 0 ... 0 0 0 \n", + "\n", + " whole_grain_wheat_flour wine wood yam yeast yogurt zucchini \n", + "0 0 0 0 0 0 0 0 \n", + "1 0 0 0 0 0 0 0 \n", + "2 0 0 0 0 0 0 0 \n", + "3 0 0 0 0 0 0 0 \n", + "4 0 0 0 0 0 1 0 \n", + "\n", + "[5 rows x 382 columns]" + ], + "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
Unnamed: 0cuisinealmondangelicaaniseanise_seedappleapple_brandyapricotarmagnac...whiskeywhite_breadwhite_winewhole_grain_wheat_flourwinewoodyamyeastyogurtzucchini
00indian00000000...0000000000
11indian10000000...0000000000
22indian00000000...0000000000
33indian00000000...0000000000
44indian00000000...0000000010
\n

5 rows × 382 columns

\n
" + }, + "metadata": {}, + "execution_count": 26 + } + ], + "source": [ + "recipes_df = pd.read_csv(\"../../data/cleaned_cuisine.csv\")\n", + "recipes_df.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 27, + "metadata": {}, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + "0 indian\n", + "1 indian\n", + "2 indian\n", + "3 indian\n", + "4 indian\n", + "Name: cuisine, dtype: object" + ] + }, + "metadata": {}, + "execution_count": 27 + } + ], + "source": [ + "recipes_label_df = recipes_df['cuisine']\n", + "recipes_label_df.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 28, + "metadata": {}, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " almond angelica anise anise_seed apple apple_brandy apricot \\\n", + "0 0 0 0 0 0 0 0 \n", + "1 1 0 0 0 0 0 0 \n", + "2 0 0 0 0 0 0 0 \n", + "3 0 0 0 0 0 0 0 \n", + "4 0 0 0 0 0 0 0 \n", + "\n", + " armagnac artemisia artichoke ... whiskey white_bread white_wine \\\n", + "0 0 0 0 ... 0 0 0 \n", + "1 0 0 0 ... 0 0 0 \n", + "2 0 0 0 ... 0 0 0 \n", + "3 0 0 0 ... 0 0 0 \n", + "4 0 0 0 ... 0 0 0 \n", + "\n", + " whole_grain_wheat_flour wine wood yam yeast yogurt zucchini \n", + "0 0 0 0 0 0 0 0 \n", + "1 0 0 0 0 0 0 0 \n", + "2 0 0 0 0 0 0 0 \n", + "3 0 0 0 0 0 0 0 \n", + "4 0 0 0 0 0 1 0 \n", + "\n", + "[5 rows x 380 columns]" + ], + "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
almondangelicaaniseanise_seedappleapple_brandyapricotarmagnacartemisiaartichoke...whiskeywhite_breadwhite_winewhole_grain_wheat_flourwinewoodyamyeastyogurtzucchini
00000000000...0000000000
11000000000...0000000000
20000000000...0000000000
30000000000...0000000000
40000000000...0000000010
\n

5 rows × 380 columns

\n
" + }, + "metadata": {}, + "execution_count": 28 + } + ], + "source": [ + "recipes_feature_df = recipes_df.drop(['Unnamed: 0', 'cuisine'], axis=1)\n", + "recipes_feature_df.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 29, + "metadata": {}, + "outputs": [], + "source": [ + "X_train, X_test, y_train, y_test = train_test_split(recipes_feature_df, recipes_label_df, test_size=0.3)" + ] + }, + { + "cell_type": "code", + "execution_count": 30, + "metadata": {}, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Accuracy is 0.810675562969141\n" + ] + } + ], + "source": [ + "lr = LogisticRegression(multi_class='ovr',solver='lbfgs')\n", + "model = lr.fit(X_train, np.ravel(y_train))\n", + "\n", + "accuracy = model.score(X_test, y_test)\n", + "print (\"Accuracy is {}\".format(accuracy))" + ] + }, + { + "cell_type": "code", + "execution_count": 31, + "metadata": {}, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "ingredients: Index(['bean', 'coriander', 'cumin', 'fenugreek', 'pepper', 'turmeric',\n 'vegetable_oil'],\n dtype='object')\ncusine: thai\n" + ] + } + ], + "source": [ + "# test an item\n", + "print(f'ingredients: {X_test.iloc[20][X_test.iloc[20]!=0].keys()}')\n", + "print(f'cuisine: {y_test.iloc[20]}')" + ] + }, + { + "cell_type": "code", + "execution_count": 32, + "metadata": {}, + "outputs": [ + { + "output_type": "execute_result", + "data": { + "text/plain": [ + " 0\n", + "indian 0.530435\n", + "thai 0.344293\n", + "japanese 0.108792\n", + "chinese 0.015001\n", + "korean 0.001480" + ], + "text/html": "
\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n
0
indian0.530435
thai0.344293
japanese0.108792
chinese0.015001
korean0.001480
\n
" + }, + "metadata": {}, + "execution_count": 32 + } + ], + "source": [ + "#rehsape to 2d array and transpose\r\n", + "test= X_test.iloc[20].values.reshape(-1, 1).T\r\n", + "# predict with score\r\n", + "proba = model.predict_proba(test)\r\n", + "classes = model.classes_\r\n", + "# create df with classes and scores\r\n", + "resultdf = pd.DataFrame(data=proba, columns=classes)\r\n", + "\r\n", + "# create df to show results\r\n", + "topPrediction = resultdf.T.sort_values(by=[0], ascending = [False])\r\n", + "topPrediction.head()" + ] + }, + { + "cell_type": "code", + "execution_count": 33, + "metadata": {}, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + " precision recall f1-score support\n\n chinese 0.75 0.67 0.70 231\n indian 0.91 0.90 0.90 255\n japanese 0.77 0.82 0.79 260\n korean 0.83 0.83 0.83 220\n thai 0.79 0.83 0.81 233\n\n accuracy 0.81 1199\n macro avg 0.81 0.81 0.81 1199\nweighted avg 0.81 0.81 0.81 1199\n\n" + ] + } + ], + "source": [ + "y_pred = model.predict(X_test)\r\n", + "print(classification_report(y_test,y_pred))" + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "# Try different classifiers" + ] + }, + { + "cell_type": "code", + "execution_count": 34, + "metadata": {}, + "outputs": [], + "source": [ + "\r\n", + "C = 10\r\n", + "# Create different classifiers.\r\n", + "classifiers = {\r\n", + " 'L1 logistic': LogisticRegression(C=C, penalty='l1',\r\n", + " solver='saga',\r\n", + " multi_class='multinomial',\r\n", + " max_iter=10000),\r\n", + " 'L2 logistic (Multinomial)': LogisticRegression(C=C, penalty='l2',\r\n", + " solver='saga',\r\n", + " multi_class='multinomial',\r\n", + " max_iter=10000),\r\n", + " 'L2 logistic (OvR)': LogisticRegression(C=C, penalty='l2',\r\n", + " solver='saga',\r\n", + " multi_class='ovr',\r\n", + " max_iter=10000),\r\n", + " 'Linear SVC': SVC(kernel='linear', C=C, probability=True,\r\n", + " random_state=0)\r\n", + "}\r\n" + ] + }, + { + "cell_type": "code", + 
"execution_count": 35, + "metadata": {}, + "outputs": [ + { + "output_type": "stream", + "name": "stdout", + "text": [ + "Accuracy (train) for L1 logistic: 79.4% \n", + "Accuracy (train) for L2 logistic (Multinomial): 79.2% \n", + "Accuracy (train) for L2 logistic (OvR): 80.2% \n", + "Accuracy (train) for Linear SVC: 79.1% \n" + ] + } + ], + "source": [ + "n_classifiers = len(classifiers)\r\n", + "\r\n", + "for index, (name, classifier) in enumerate(classifiers.items()):\r\n", + " classifier.fit(X_train, np.ravel(y_train))\r\n", + "\r\n", + " y_pred = classifier.predict(X_test)\r\n", + " accuracy = accuracy_score(y_test, y_pred)\r\n", + " print(\"Accuracy (train) for %s: %0.1f%% \" % (name, accuracy * 100))\r\n" + ] + }, + { + "cell_type": "code", + "execution_count": null, + "metadata": {}, + "outputs": [], + "source": [] + } + ], + "metadata": { + "interpreter": { + "hash": "dd61f40108e2a19f4ef0d3ebbc6b6eea57ab3c4bc13b15fe6f390d3d86442534" + }, + "kernelspec": { + "name": "python37364bit8d3b438fb5fc4430a93ac2cb74d693a7", + "display_name": "Python 3.7.0 64-bit ('3.7')" + }, + "language_info": { + "codemirror_mode": { + "name": "ipython", + "version": 3 + }, + "file_extension": ".py", + "mimetype": "text/x-python", + "name": "python", + "nbconvert_exporter": "python", + "pygments_lexer": "ipython3", + "version": "3.7.0" + }, + "metadata": { + "interpreter": { + "hash": "70b38d7a306a849643e446cd70466270a13445e5987dfa1344ef2b127438fa4d" + } + } + }, + "nbformat": 4, + "nbformat_minor": 4 +} \ No newline at end of file diff --git a/4-Classification/3-Generative/translations/README.es.md b/4-Classification/2-Classifiers-1/translations/README.es.md similarity index 100% rename from 4-Classification/3-Generative/translations/README.es.md rename to 4-Classification/2-Classifiers-1/translations/README.es.md diff --git a/4-Classification/3-Generative/README.md b/4-Classification/3-Classifiers-2/README.md similarity index 100% rename from 
4-Classification/3-Generative/README.md rename to 4-Classification/3-Classifiers-2/README.md diff --git a/4-Classification/3-Generative/assignment.md b/4-Classification/3-Classifiers-2/assignment.md similarity index 100% rename from 4-Classification/3-Generative/assignment.md rename to 4-Classification/3-Classifiers-2/assignment.md diff --git a/4-Classification/3-Classifiers-2/notebook.ipynb b/4-Classification/3-Classifiers-2/notebook.ipynb new file mode 100644 index 000000000..e69de29bb diff --git a/4-Classification/3-Classifiers-2/solution/notebook.ipynb b/4-Classification/3-Classifiers-2/solution/notebook.ipynb new file mode 100644 index 000000000..e69de29bb diff --git a/4-Classification/3-Classifiers-2/translations/README.es.md b/4-Classification/3-Classifiers-2/translations/README.es.md new file mode 100644 index 000000000..e69de29bb diff --git a/4-Classification/4-Applied/notebook.ipynb b/4-Classification/4-Applied/notebook.ipynb new file mode 100644 index 000000000..e69de29bb diff --git a/4-Classification/4-Applied/solution/notebook.ipynb b/4-Classification/4-Applied/solution/notebook.ipynb new file mode 100644 index 000000000..e69de29bb diff --git a/4-Classification/README.md b/4-Classification/README.md index a313fc477..dbf584363 100644 --- a/4-Classification/README.md +++ b/4-Classification/README.md @@ -1,20 +1,19 @@ # Getting Started with Classification ## Regional topic: Delicious Asian Recipes 🍜 -In Asia, food traditions are extremely diverse, and very delicious! Let's look at data about regional recipes to try to guess where they originated. +In Asia and India, food traditions are extremely diverse, and very delicious! Let's look at data about regional recipes to try to guess where they originated. 
![Thai food seller](./images/thai-food.jpg) > Photo by Lisheng Chang on Unsplash - ## What you will learn -In this section, you will build on the skills you learned in Lesson 1 (Regression) to learn about more classifiers you can use that will help you learn about your data. +In this section, you will build on the skills you learned in Lesson 1 (Regression) to learn about other classifiers you can use that will help you learn about your data. ## Lessons 1. [Introduction to Classification](1-Introduction/README.md) -2. [Build a Discriminative Model](2-Discriminative/README.md) -3. [Build a Generative Model](3-Generative/README.md) +2. [More Classifiers](2-Classifiers-1/README.md) +3. [Yet Other Classifiers](3-Classifiers-2/README.md) 4. [Applied ML: Build a Web App](4-Applied/README.md) ## Credits