diff --git a/.gitignore b/.gitignore
index a80a15e3..51f47a5a 100644
--- a/.gitignore
+++ b/.gitignore
@@ -33,6 +33,8 @@ bld/
# Visual Studio 2015/2017 cache/options directory
.vs/
+# Visual Studio Code cache/options directory
+.vscode/
# Uncomment if you have tasks that create the project's static files in wwwroot
#wwwroot/
diff --git a/1-Introduction/1-intro-to-ML/README.md b/1-Introduction/1-intro-to-ML/README.md
index e18a0036..db1c72b1 100644
--- a/1-Introduction/1-intro-to-ML/README.md
+++ b/1-Introduction/1-intro-to-ML/README.md
@@ -102,6 +102,8 @@ Sketch, on paper or using an online app like [Excalidraw](https://excalidraw.com
To learn more about how you can work with ML algorithms in the cloud, follow this [Learning Path](https://docs.microsoft.com/learn/paths/create-no-code-predictive-models-azure-machine-learning/?WT.mc_id=academic-15963-cxa).
+Take this [Learn module](https://docs.microsoft.com/learn/modules/introduction-to-machine-learning/?WT.mc_id=academic-15963-cxa) covering the basics of ML.
+
## Assignment
[Get up and running](assignment.md)
diff --git a/1-Introduction/1-intro-to-ML/translations/README.fr.md b/1-Introduction/1-intro-to-ML/translations/README.fr.md
new file mode 100644
index 00000000..d8915f58
--- /dev/null
+++ b/1-Introduction/1-intro-to-ML/translations/README.fr.md
@@ -0,0 +1,109 @@
+# Introduction au machine learning
+
+[![ML, AI, deep learning - What's the difference?](https://img.youtube.com/vi/lTd9RSxS9ZE/0.jpg)](https://youtu.be/lTd9RSxS9ZE "ML, AI, deep learning - What's the difference?")
+
+> 🎥 Cliquer sur l'image ci-dessus afin de regarder une vidéo expliquant la différence entre machine learning, AI et deep learning.
+
+## [Quiz préalable](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/1?loc=fr)
+
+### Introduction
+
+Bienvenue à ce cours sur le machine learning classique pour débutants ! Que vous soyez complètement nouveau sur ce sujet ou que vous soyez un professionnel du ML expérimenté cherchant à peaufiner vos connaissances, nous sommes heureux de vous avoir avec nous ! Nous voulons créer un tremplin chaleureux pour vos études en ML et serions ravis d'évaluer vos [retours d'expérience](https://github.com/microsoft/ML-For-Beginners/discussions), d'y répondre et d'en apprendre.
+
+[![Introduction to ML](https://img.youtube.com/vi/h0e2HAPTGF4/0.jpg)](https://youtu.be/h0e2HAPTGF4 "Introduction to ML")
+
+> 🎥 Cliquer sur l'image ci-dessus afin de regarder une vidéo : John Guttag du MIT introduit le machine learning
+
+### Débuter avec le machine learning
+
+Avant de commencer avec ce cours, vous aurez besoin d'un ordinateur configuré et prêt à faire tourner des notebooks (Jupyter) localement.
+
+- **Configurer votre ordinateur avec ces vidéos**. Apprendre comment configurer votre ordinateur avec cette [série de vidéos](https://www.youtube.com/playlist?list=PLlrxD0HtieHhS8VzuMCfQD4uJ9yne1mE6).
+- **Apprendre Python**. Il est aussi recommandé d'avoir une connaissance basique de [Python](https://docs.microsoft.com/learn/paths/python-language/?WT.mc_id=academic-15963-cxa), un langage de programmation utile pour les data scientists et que nous utilisons tout au long de ce cours.
+- **Apprendre Node.js et JavaScript**. Nous utilisons aussi JavaScript par moments dans ce cours afin de construire des applications web, vous aurez donc besoin de [node](https://nodejs.org) et [npm](https://www.npmjs.com/) installés, ainsi que de [Visual Studio Code](https://code.visualstudio.com/) pour développer en Python et JavaScript.
+- **Créer un compte GitHub**. Comme vous nous avez trouvés sur [GitHub](https://github.com), vous y avez sûrement un compte, mais sinon, créez-en un et répliquez ce cours afin de l'utiliser à votre gré. (N'oubliez pas de nous donner une étoile aussi 😊)
+- **Explorer Scikit-learn**. Familiarisez-vous avec [Scikit-learn](https://scikit-learn.org/stable/user_guide.html), un ensemble de bibliothèques ML que nous mentionnons dans nos leçons.
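+
+À titre d'illustration (ce croquis ne fait pas partie du cours et utilise des données fictives), voici le cycle classique de Scikit-learn : entraîner un modèle, puis prédire.
+
+```python
+from sklearn.linear_model import LogisticRegression
+
+# Quatre exemples (une seule caractéristique) et leurs étiquettes ; données fictives
+X = [[0], [1], [2], [3]]
+y = [0, 0, 1, 1]
+
+model = LogisticRegression()
+model.fit(X, y)                 # entraînement du modèle
+print(model.predict([[1.5]]))   # prédiction pour une nouvelle valeur
+```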
+
+### Qu'est-ce que le machine learning ?
+
+Le terme « machine learning » est l'un des termes les plus populaires et les plus utilisés ces derniers temps. Il y a une forte probabilité que vous l'ayez entendu au moins une fois si vous avez une appétence pour la technologie, indépendamment du domaine dans lequel vous travaillez. Le fonctionnement du machine learning, cependant, reste un mystère pour la plupart des personnes. Pour un débutant, le sujet peut parfois sembler accablant. Il est donc important de comprendre ce qu'est réellement le machine learning et de l'apprendre petit à petit, au travers d'exemples pratiques.
+
+
+
+> Google Trends montre la récente 'courbe de popularité' pour le mot 'machine learning'
+
+Nous vivons dans un univers rempli de mystères fascinants. De grands scientifiques comme Stephen Hawking, Albert Einstein et bien d'autres ont dévoué leur vie à la recherche d'informations utiles afin de dévoiler les mystères qui nous entourent. C'est la condition humaine de l'apprentissage : un enfant apprend de nouvelles choses et découvre la structure du monde année après année, jusqu'à ce qu'il devienne adulte.
+
+Le cerveau d'un enfant et ses sens perçoivent l'environnement qui l'entoure et apprennent graduellement des schémas cachés de la vie, qui l'aident à élaborer des règles logiques permettant d'identifier les schémas appris. Le processus d'apprentissage du cerveau humain est ce qui fait de l'homme la créature la plus sophistiquée du monde vivant. Apprendre continuellement en découvrant des schémas cachés, puis innover à partir de ces schémas, nous permet de nous améliorer tout au long de notre vie. Cette capacité d'apprentissage et d'évolution est liée au concept de [plasticité neuronale](https://www.simplypsychology.org/brain-plasticity.html) ; nous pouvons établir quelques parallèles entre le processus d'apprentissage du cerveau humain et le concept de machine learning.
+
+Le [cerveau humain](https://www.livescience.com/29365-human-brain.html) perçoit des choses du monde réel, assimile les informations perçues, prend des décisions rationnelles et entreprend certaines actions selon le contexte. C'est ce que l'on appelle se comporter intelligemment. Lorsque nous programmons une machine pour reproduire ce comportement intelligent, c'est ce que l'on appelle l'intelligence artificielle (IA).
+
+Bien que les termes puissent prêter à confusion, le machine learning (ML) est un important sous-ensemble de l'intelligence artificielle. **Le ML se réfère à l'utilisation d'algorithmes spécialisés afin de découvrir des informations utiles et de trouver des schémas cachés dans les données perçues pour corroborer un processus de décision rationnel**.
+
+
+
+> Un diagramme montrant les relations entre l'IA, le ML, le deep learning et la data science. Infographie de [Jen Looper](https://twitter.com/jenlooper), inspirée par [ce graphique](https://softwareengineering.stackexchange.com/questions/366996/distinction-between-ai-ml-neural-networks-deep-learning-and-data-mining)
+
+## Ce que vous allez apprendre dans ce cours
+
+Dans ce cours, nous allons nous concentrer sur les concepts clés du machine learning qu'un débutant se doit de connaître. Nous parlerons de ce que l'on appelle le « machine learning classique », en utilisant principalement Scikit-learn, une excellente bibliothèque que beaucoup d'étudiants utilisent afin d'apprendre les bases. Afin de comprendre les concepts plus larges de l'intelligence artificielle ou du deep learning, une solide connaissance du machine learning est indispensable, et c'est ce que nous aimerions fournir ici.
+
+Dans ce cours, vous allez apprendre :
+
+- Les concepts clés du machine learning
+- L'histoire du ML
+- ML et équité (fairness)
+- Les techniques de régression ML
+- Les techniques de classification ML
+- Les techniques de regroupement (clustering) ML
+- Les techniques du traitement automatique des langues (NLP) ML
+- Les techniques de prédictions à partir de séries chronologiques ML
+- L'apprentissage par renforcement
+- Les applications réelles du ML
+
+## Ce que nous ne couvrirons pas
+
+- Deep learning
+- Neural networks
+- IA
+
+Afin d'avoir la meilleure expérience d'apprentissage, nous éviterons les complexités des réseaux de neurones, du « deep learning » (la construction de modèles à plusieurs couches de réseaux de neurones) et de l'IA, que nous aborderons dans un cours différent. Nous offrirons aussi un futur cours de data science pour nous concentrer sur cet aspect de ce champ très large.
+
+## Pourquoi étudier le machine learning ?
+
+Le machine learning, d'un point de vue systémique, est défini comme la création de systèmes automatiques pouvant apprendre des schémas cachés à partir de données, afin d'aider à prendre des décisions intelligentes.
+
+Ce but est faiblement inspiré de la manière dont le cerveau humain apprend certaines choses depuis les données qu'il perçoit du monde extérieur.
+
+✅ Penser une minute aux raisons pour lesquelles une entreprise essaierait d'utiliser des stratégies de machine learning plutôt que de créer des règles codées en dur.
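+
+Pour nourrir cette réflexion, voici un petit croquis purement illustratif (les données et le seuil sont fictifs) comparant une règle codée en dur à un modèle qui apprend une règle similaire à partir d'exemples :
+
+```python
+from sklearn.tree import DecisionTreeClassifier
+
+# Règle codée en dur : un seuil choisi par un humain
+def regle_codee_en_dur(montant):
+    return "fraude" if montant > 1000 else "ok"
+
+# Approche ML : la règle est apprise à partir d'exemples étiquetés
+X = [[100], [250], [900], [1200], [5000], [8000]]
+y = ["ok", "ok", "ok", "fraude", "fraude", "fraude"]
+
+modele = DecisionTreeClassifier().fit(X, y)
+print(regle_codee_en_dur(1100))      # règle fixe
+print(modele.predict([[1100]])[0])   # règle apprise des données
+```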
+
+### Les applications du machine learning
+
+Les applications du machine learning sont maintenant pratiquement partout et sont aussi omniprésentes que les données qui circulent dans notre société (générées par nos smartphones, nos appareils connectés et d'autres systèmes). En prenant en considération l'immense potentiel des algorithmes de machine learning les plus récents, les chercheurs ont pu exploiter leurs capacités afin de résoudre des problèmes multidimensionnels et interdisciplinaires de la vie réelle, avec d'importants retours positifs.
+
+**Vous pouvez utiliser le machine learning de plusieurs manières** :
+
+- Afin de prédire la possibilité d'avoir une maladie à partir des données médicales d'un patient.
+- Afin de tirer parti des données météorologiques pour prédire les événements météorologiques.
+- Afin de comprendre le sentiment d'un texte.
+- Afin de détecter les fake news pour stopper la propagation de la propagande.
+
+La finance, l'économie, les sciences de la terre, l'exploration spatiale, le génie biomédical, les sciences cognitives et même les domaines des sciences humaines ont adapté le machine learning pour résoudre les problèmes ardus et lourds de traitement des données dans leur domaine respectif.
+
+Le machine learning automatise le processus de découverte de modèles en trouvant des informations significatives à partir de données réelles ou générées. Il s'est avéré très utile dans les applications commerciales, de santé et financières, entre autres.
+
+Dans un avenir proche, comprendre les bases du machine learning sera indispensable pour les personnes de tous les domaines en raison de son adoption généralisée.
+
+---
+## 🚀 Challenge
+
+Esquisser, sur papier ou à l'aide d'une application en ligne comme [Excalidraw](https://excalidraw.com/), votre compréhension des différences entre l'IA, le ML, le deep learning et la data science. Ajouter quelques idées de problèmes que chacune de ces techniques résout bien.
+
+## [Quiz de validation des connaissances](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/2?loc=fr)
+
+## Révision et auto-apprentissage
+
+Pour en savoir plus sur la façon dont vous pouvez utiliser les algorithmes de ML dans le cloud, suivez ce [Parcours d'apprentissage](https://docs.microsoft.com/learn/paths/create-no-code-predictive-models-azure-machine-learning/?WT.mc_id=academic-15963-cxa).
+
+## Devoir
+
+[Être opérationnel](assignment.fr.md)
diff --git a/1-Introduction/1-intro-to-ML/translations/README.id.md b/1-Introduction/1-intro-to-ML/translations/README.id.md
new file mode 100644
index 00000000..d0daadd8
--- /dev/null
+++ b/1-Introduction/1-intro-to-ML/translations/README.id.md
@@ -0,0 +1,107 @@
+# Pengantar Machine Learning
+
+[![ML, AI, deep learning - Apa perbedaannya?](https://img.youtube.com/vi/lTd9RSxS9ZE/0.jpg)](https://youtu.be/lTd9RSxS9ZE "ML, AI, deep learning - Apa perbedaannya?")
+
+> 🎥 Klik gambar di atas untuk menonton video yang mendiskusikan perbedaan antara Machine Learning, AI, dan Deep Learning.
+
+## [Quiz Pra-Pelajaran](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/1/)
+
+### Pengantar
+
+Selamat datang di pelajaran Machine Learning klasik untuk pemula! Baik kamu yang masih benar-benar baru, atau seorang praktisi ML berpengalaman yang ingin meningkatkan kemampuan kamu, kami senang kamu ikut bersama kami! Kami ingin membuat sebuah titik mulai yang ramah untuk pembelajaran ML kamu dan akan sangat senang untuk mengevaluasi, merespon, dan memasukkan [umpan balik](https://github.com/microsoft/ML-For-Beginners/discussions) kamu.
+
+[![Pengantar Machine Learning](https://img.youtube.com/vi/h0e2HAPTGF4/0.jpg)](https://youtu.be/h0e2HAPTGF4 "Pengantar Machine Learning")
+
+> 🎥 Klik gambar di atas untuk menonton video: John Guttag dari MIT yang memberikan pengantar Machine Learning.
+
+### Memulai Machine Learning
+
+Sebelum memulai kurikulum ini, kamu perlu memastikan komputer kamu sudah dipersiapkan untuk menjalankan *notebook* secara lokal.
+
+- **Konfigurasi komputer kamu dengan video ini**. Pelajari bagaimana menyiapkan komputer kamu dalam [video-video](https://www.youtube.com/playlist?list=PLlrxD0HtieHhS8VzuMCfQD4uJ9yne1mE6) ini.
+- **Belajar Python**. Disarankan juga untuk memiliki pemahaman dasar dari [Python](https://docs.microsoft.com/learn/paths/python-language/?WT.mc_id=academic-15963-cxa), sebuah bahasa pemrograman yang digunakan oleh data scientist yang juga akan kita gunakan dalam pelajaran ini.
+- **Belajar Node.js dan JavaScript**. Kita juga menggunakan JavaScript beberapa kali dalam pelajaran ini ketika membangun aplikasi web, jadi kamu perlu menginstal [node](https://nodejs.org) dan [npm](https://www.npmjs.com/), serta [Visual Studio Code](https://code.visualstudio.com/) yang tersedia untuk pengembangan Python dan JavaScript.
+- **Buat akun GitHub**. Karena kamu menemukan kami di [GitHub](https://github.com), kamu mungkin sudah punya akun, tapi jika belum, silakan buat akun baru kemudian *fork* kurikulum ini untuk kamu pergunakan sendiri. (Jangan ragu untuk memberikan kami bintang juga 😊)
+- **Jelajahi Scikit-learn**. Buat diri kamu familiar dengan [Scikit-learn](https://scikit-learn.org/stable/user_guide.html), seperangkat *library* ML yang kita acu dalam pelajaran-pelajaran ini.
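+
+Sebagai gambaran (sketsa ini bukan bagian resmi dari kurikulum dan memakai data buatan), berikut alur kerja klasik Scikit-learn: melatih sebuah model, lalu membuat prediksi.
+
+```python
+from sklearn.linear_model import LogisticRegression
+
+# Empat contoh (satu fitur) beserta labelnya; data ini hanya ilustrasi
+X = [[0], [1], [2], [3]]
+y = [0, 0, 1, 1]
+
+model = LogisticRegression()
+model.fit(X, y)                 # melatih model
+print(model.predict([[1.5]]))   # memprediksi contoh baru
+```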
+
+### Apa itu Machine Learning?
+
+Istilah 'Machine Learning' merupakan salah satu istilah yang paling populer dan paling sering digunakan saat ini. Ada kemungkinan kamu pernah mendengar istilah ini paling tidak sekali jika kamu familiar dengan teknologi, apa pun bidang pekerjaanmu. Namun, mekanisme Machine Learning sendiri merupakan sebuah misteri bagi sebagian besar orang. Karena itu, penting untuk memahami apa sebenarnya Machine Learning itu, dan mempelajarinya langkah demi langkah melalui contoh praktis.
+
+
+
+> Google Trends memperlihatkan 'kurva tren' dari istilah 'Machine Learning' belakangan ini.
+
+Kita hidup di sebuah alam semesta yang penuh dengan misteri yang menarik. Ilmuwan-ilmuwan besar seperti Stephen Hawking, Albert Einstein, dan banyak lagi telah mengabdikan hidup mereka untuk mencari informasi yang berarti yang mengungkap misteri dari dunia disekitar kita. Ini adalah kondisi belajar manusia: seorang anak manusia belajar hal-hal baru dan mengungkap struktur dari dunianya tahun demi tahun saat mereka tumbuh dewasa.
+
+Otak dan indera seorang anak memahami fakta-fakta di sekitarnya dan secara bertahap mempelajari pola-pola kehidupan yang tersembunyi yang membantu anak untuk menyusun aturan-aturan logis untuk mengidentifikasi pola-pola yang dipelajari. Proses pembelajaran otak manusia ini menjadikan manusia sebagai makhluk hidup paling canggih di dunia ini. Belajar terus menerus dengan menemukan pola-pola tersembunyi dan kemudian berinovasi pada pola-pola itu memungkinkan kita untuk terus menjadikan diri kita lebih baik sepanjang hidup. Kapasitas belajar dan kemampuan berkembang ini terkait dengan konsep yang disebut dengan *[brain plasticity](https://www.simplypsychology.org/brain-plasticity.html)*. Secara sempit, kita dapat menarik beberapa kesamaan motivasi antara proses pembelajaran otak manusia dan konsep Machine Learning.
+
+[Otak manusia](https://www.livescience.com/29365-human-brain.html) menerima banyak hal dari dunia nyata, memproses informasi yang diterima, membuat keputusan rasional, dan melakukan aksi-aksi tertentu berdasarkan keadaan. Inilah yang kita sebut dengan berperilaku cerdas. Ketika kita memprogram sebuah salinan dari proses perilaku cerdas ke sebuah mesin, ini dinamakan kecerdasan buatan atau Artificial Intelligence (AI).
+
+Meskipun istilah-istilahnya bisa membingungkan, Machine Learning (ML) adalah bagian penting dari Artificial Intelligence. **ML berkaitan dengan penggunaan algoritma-algoritma terspesialisasi untuk mengungkap informasi yang berarti dan mencari pola-pola tersembunyi dari data yang diterima untuk mendukung proses pembuatan keputusan rasional**.
+
+
+
+> Sebuah diagram yang memperlihatkan hubungan antara AI, ML, Deep Learning, dan Data Science. Infografis oleh [Jen Looper](https://twitter.com/jenlooper) terinspirasi dari [infografis ini](https://softwareengineering.stackexchange.com/questions/366996/distinction-between-ai-ml-neural-networks-deep-learning-and-data-mining)
+
+## Apa yang akan kamu pelajari
+
+Dalam kurikulum ini, kita hanya akan membahas konsep inti dari Machine Learning yang harus diketahui oleh seorang pemula. Kita membahas apa yang kami sebut sebagai 'Machine Learning klasik' utamanya menggunakan Scikit-learn, sebuah *library* luar biasa yang banyak digunakan para siswa untuk belajar dasarnya. Untuk memahami konsep Artificial Intelligence atau Deep Learning yang lebih luas, pengetahuan dasar yang kuat tentang Machine Learning sangat diperlukan, itulah yang ingin kami tawarkan di sini.
+
+Kamu akan belajar:
+
+- Konsep inti ML
+- Sejarah dari ML
+- Keadilan dan ML
+- Teknik regresi ML
+- Teknik klasifikasi ML
+- Teknik *clustering* ML
+- Teknik *natural language processing* ML
+- Teknik *time series forecasting* ML
+- *Reinforcement learning*
+- Penerapan nyata dari ML
+
+## Yang tidak akan kita bahas
+
+- *deep learning*
+- *neural networks*
+- AI
+
+Untuk membuat pengalaman belajar yang lebih baik, kita akan menghindari kerumitan dari *neural network*, *deep learning* - membangun *many-layered model* menggunakan *neural network* - dan AI, yang mana akan kita bahas dalam kurikulum yang berbeda. Kami juga akan menawarkan kurikulum *data science* yang berfokus pada aspek bidang tersebut.
+
+## Kenapa belajar Machine Learning?
+
+Machine Learning, dari perspektif sistem, didefinisikan sebagai pembuatan sistem otomatis yang dapat mempelajari pola-pola tersembunyi dari data untuk membantu membuat keputusan cerdas.
+
+Motivasi ini secara bebas terinspirasi dari bagaimana otak manusia mempelajari hal-hal tertentu berdasarkan data yang diterimanya dari dunia luar.
+
+✅ Pikirkan sejenak mengapa sebuah bisnis ingin mencoba menggunakan strategi Machine Learning dibandingkan membuat sebuah mesin berbasis aturan yang tertanam (*hard-coded*).
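+
+Sebagai bahan pertimbangan, berikut sketsa kecil yang murni ilustratif (data dan ambang batasnya buatan) yang membandingkan aturan *hard-coded* dengan model yang mempelajari aturan serupa dari contoh berlabel:
+
+```python
+from sklearn.tree import DecisionTreeClassifier
+
+# Aturan hard-coded: ambang batas yang dipilih manusia
+def aturan_hard_coded(jumlah):
+    return "penipuan" if jumlah > 1000 else "aman"
+
+# Pendekatan ML: aturan dipelajari dari contoh berlabel
+X = [[100], [250], [900], [1200], [5000], [8000]]
+y = ["aman", "aman", "aman", "penipuan", "penipuan", "penipuan"]
+
+model = DecisionTreeClassifier().fit(X, y)
+print(aturan_hard_coded(1100))      # aturan tetap
+print(model.predict([[1100]])[0])   # aturan yang dipelajari dari data
+```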
+
+### Penerapan Machine Learning
+
+Penerapan Machine Learning saat ini hampir ada di mana-mana, seperti data yang mengalir di sekitar kita, yang dihasilkan oleh ponsel pintar, perangkat yang terhubung, dan sistem lainnya. Mempertimbangkan potensi besar dari algoritma Machine Learning terkini, para peneliti telah mengeksplorasi kemampuan Machine Learning untuk memecahkan masalah kehidupan nyata multi-dimensi dan multi-disiplin dengan hasil positif yang luar biasa.
+
+**Kamu bisa menggunakan Machine Learning dalam banyak hal**:
+
+- Untuk memprediksi kemungkinan penyakit berdasarkan riwayat atau laporan medis pasien.
+- Untuk memanfaatkan data cuaca untuk memprediksi peristiwa cuaca.
+- Untuk memahami sentimen sebuah teks.
+- Untuk mendeteksi berita palsu untuk menghentikan penyebaran propaganda.
+
+Keuangan, ekonomi, geosains, eksplorasi ruang angkasa, teknik biomedis, ilmu kognitif, dan bahkan bidang humaniora telah mengadaptasi Machine Learning untuk memecahkan masalah sulit pemrosesan data di bidang mereka.
+
+Machine Learning mengotomatiskan proses penemuan pola dengan menemukan wawasan yang berarti dari dunia nyata atau dari data yang dihasilkan. Machine Learning terbukti sangat berharga dalam penerapannya di berbagai bidang, diantaranya adalah bidang bisnis, kesehatan, dan keuangan.
+
+Dalam waktu dekat, memahami dasar-dasar Machine Learning akan menjadi suatu keharusan bagi orang-orang dari bidang apa pun karena adopsinya yang luas.
+
+---
+## 🚀 Tantangan
+
+Buat sketsa di atas kertas atau menggunakan aplikasi seperti [Excalidraw](https://excalidraw.com/), mengenai pemahaman kamu tentang perbedaan antara AI, ML, Deep Learning, dan Data Science. Tambahkan beberapa ide masalah yang cocok diselesaikan masing-masing teknik.
+
+## [Quiz Pasca-Pelajaran](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/2/)
+
+## Ulasan & Belajar Mandiri
+
+Untuk mempelajari lebih lanjut tentang bagaimana kamu dapat menggunakan algoritma ML di cloud, ikuti [Jalur Belajar](https://docs.microsoft.com/learn/paths/create-no-code-predictive-models-azure-machine-learning/?WT.mc_id=academic-15963-cxa) ini.
+
+## Tugas
+
+[Persiapan](assignment.id.md)
diff --git a/1-Introduction/1-intro-to-ML/translations/README.ja.md b/1-Introduction/1-intro-to-ML/translations/README.ja.md
index aded0f7e..fd3c11d1 100644
--- a/1-Introduction/1-intro-to-ML/translations/README.ja.md
+++ b/1-Introduction/1-intro-to-ML/translations/README.ja.md
@@ -4,7 +4,7 @@
> 🎥 上の画像をクリックすると、機械学習、AI、深層学習の違いについて説明した動画が表示されます。
-## [Pre-lecture quiz](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/1/)
+## [Pre-lecture quiz](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/1?loc=ja)
### イントロダクション
@@ -94,12 +94,12 @@
## 🚀 Challenge
AI、ML、深層学習、データサイエンスの違いについて理解していることを、紙や[Excalidraw](https://excalidraw.com/)などのオンラインアプリを使ってスケッチしてください。また、それぞれの技術が得意とする問題のアイデアを加えてみてください。
-## [Post-lecture quiz](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/2/)
+## [Post-lecture quiz](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/2?loc=ja)
## 振り返りと自習
-クラウド上でMLアルゴリズムをどのように扱うことができるかについては、この[ラーニングパス](https://docs.microsoft.com/learn/paths/create-no-code-predictive-models-azure-machine-learning/?WT.mc_id=academic-15963-cxa)に従ってください。.
+クラウド上でMLアルゴリズムをどのように扱うことができるかについては、この[ラーニングパス](https://docs.microsoft.com/learn/paths/create-no-code-predictive-models-azure-machine-learning/?WT.mc_id=academic-15963-cxa)に従ってください。
## 課題
-[起動し、実行してください。](assignment.md)
+[稼働させる](assignment.ja.md)
diff --git a/1-Introduction/1-intro-to-ML/translations/README.zh-cn.md b/1-Introduction/1-intro-to-ML/translations/README.zh-cn.md
index 8693ff20..45ec79be 100644
--- a/1-Introduction/1-intro-to-ML/translations/README.zh-cn.md
+++ b/1-Introduction/1-intro-to-ML/translations/README.zh-cn.md
@@ -104,4 +104,4 @@
## 任务
-[启动并运行](../assignment.md)
+[启动并运行](assignment.zh-cn.md)
diff --git a/1-Introduction/1-intro-to-ML/translations/assignment.es.md b/1-Introduction/1-intro-to-ML/translations/assignment.es.md
new file mode 100644
index 00000000..5241ca96
--- /dev/null
+++ b/1-Introduction/1-intro-to-ML/translations/assignment.es.md
@@ -0,0 +1,9 @@
+# Levántate y corre
+
+## Instrucciones
+
+En esta tarea no calificada, debe repasar Python y hacer que su entorno esté en funcionamiento y sea capaz de ejecutar cuadernos.
+
+Tome esta [Ruta de aprendizaje de Python](https://docs.microsoft.com/learn/paths/python-language/?WT.mc_id=academic-15963-cxa), y luego configure su sistema con estos videos introductorios:
+
+https://www.youtube.com/playlist?list=PLlrxD0HtieHhS8VzuMCfQD4uJ9yne1mE6
diff --git a/1-Introduction/1-intro-to-ML/translations/assignment.fr.md b/1-Introduction/1-intro-to-ML/translations/assignment.fr.md
new file mode 100644
index 00000000..0d703d26
--- /dev/null
+++ b/1-Introduction/1-intro-to-ML/translations/assignment.fr.md
@@ -0,0 +1,10 @@
+# Être opérationnel
+
+## Instructions
+
+Dans ce devoir non noté, vous devez vous familiariser avec Python et rendre votre environnement opérationnel et capable d'exécuter des notebooks.
+
+Suivez ce [parcours d'apprentissage Python](https://docs.microsoft.com/learn/paths/python-language/?WT.mc_id=academic-15963-cxa), puis configurez votre système en parcourant ces vidéos introductives :
+
+https://www.youtube.com/playlist?list=PLlrxD0HtieHhS8VzuMCfQD4uJ9yne1mE6
diff --git a/1-Introduction/1-intro-to-ML/translations/assignment.id.md b/1-Introduction/1-intro-to-ML/translations/assignment.id.md
new file mode 100644
index 00000000..c6ba6e4a
--- /dev/null
+++ b/1-Introduction/1-intro-to-ML/translations/assignment.id.md
@@ -0,0 +1,9 @@
+# Persiapan
+
+## Instruksi
+
+Dalam tugas yang tidak dinilai ini, kamu akan mempelajari Python dan mempersiapkan *environment* kamu sehingga dapat digunakan untuk menjalankan *notebook*.
+
+Ambil [Jalur Belajar Python](https://docs.microsoft.com/learn/paths/python-language/?WT.mc_id=academic-15963-cxa) ini, kemudian persiapkan sistem kamu dengan menonton video-video pengantar ini:
+
+https://www.youtube.com/playlist?list=PLlrxD0HtieHhS8VzuMCfQD4uJ9yne1mE6
diff --git a/1-Introduction/1-intro-to-ML/translations/assignment.ja.md b/1-Introduction/1-intro-to-ML/translations/assignment.ja.md
new file mode 100644
index 00000000..9c86969c
--- /dev/null
+++ b/1-Introduction/1-intro-to-ML/translations/assignment.ja.md
@@ -0,0 +1,9 @@
+# 稼働させる
+
+## 指示
+
+この評価のない課題では、Pythonについて復習し、環境を稼働させてノートブックを実行できるようにする必要があります。
+
+この[Pythonラーニングパス](https://docs.microsoft.com/learn/paths/python-language/?WT.mc_id=academic-15963-cxa)を受講し、次の入門用ビデオに従ってシステムをセットアップしてください。
+
+https://www.youtube.com/playlist?list=PLlrxD0HtieHhS8VzuMCfQD4uJ9yne1mE6
diff --git a/1-Introduction/1-intro-to-ML/translations/assignment.zh-cn.md b/1-Introduction/1-intro-to-ML/translations/assignment.zh-cn.md
new file mode 100644
index 00000000..fd59f691
--- /dev/null
+++ b/1-Introduction/1-intro-to-ML/translations/assignment.zh-cn.md
@@ -0,0 +1,9 @@
+# 启动和运行
+
+## 说明
+
+在这个不评分的作业中,你应该温习一下 Python,将 Python 环境能够运行起来,并且可以运行 notebooks。
+
+学习这个 [Python 学习路径](https://docs.microsoft.com/learn/paths/python-language/?WT.mc_id=academic-15963-cxa),然后通过这些介绍性的视频将你的系统环境设置好:
+
+https://www.youtube.com/playlist?list=PLlrxD0HtieHhS8VzuMCfQD4uJ9yne1mE6
diff --git a/1-Introduction/2-history-of-ML/translations/README.fr.md b/1-Introduction/2-history-of-ML/translations/README.fr.md
new file mode 100644
index 00000000..9c59eb6f
--- /dev/null
+++ b/1-Introduction/2-history-of-ML/translations/README.fr.md
@@ -0,0 +1,117 @@
+# Histoire du Machine Learning (apprentissage automatique)
+
+
+> Sketchnote de [Tomomi Imura](https://www.twitter.com/girlie_mac)
+
+## [Quiz préalable](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/3?loc=fr)
+
+Dans cette leçon, nous allons parcourir les principales étapes de l'histoire du machine learning et de l'intelligence artificielle.
+
+L'histoire de l'intelligence artificielle (IA) en tant que domaine est étroitement liée à celle du machine learning, car les algorithmes et les avancées informatiques qui sous-tendent le ML ont alimenté le développement de l'IA. Bien que ces domaines, en tant que champs de recherche distincts, aient commencé à se cristalliser dans les années 1950, il est important de rappeler que d'importantes [découvertes algorithmiques, statistiques, mathématiques, informatiques et techniques](https://wikipedia.org/wiki/Timeline_of_machine_learning) ont précédé et chevauché cette époque. En fait, le monde réfléchit à ces questions depuis [des centaines d'années](https://fr.wikipedia.org/wiki/Histoire_de_l%27intelligence_artificielle) : cet article traite des fondements intellectuels historiques de l'idée d'une « machine qui pense ».
+
+## Découvertes notables
+
+- 1763, 1812 [théorème de Bayes](https://wikipedia.org/wiki/Bayes%27_theorem) et ses prédécesseurs. Ce théorème et ses applications sous-tendent l'inférence, décrivant la probabilité qu'un événement se produise sur la base de connaissances antérieures.
+- 1805 [Théorie des moindres carrés](https://wikipedia.org/wiki/Least_squares) par le mathématicien français Adrien-Marie Legendre. Cette théorie, que vous découvrirez dans notre unité Régression, aide à l'ajustement des données.
+- 1913 Les [Chaînes de Markov](https://wikipedia.org/wiki/Markov_chain), du nom du mathématicien russe Andrey Markov, sont utilisées pour décrire une séquence d'événements possibles basée sur un état antérieur.
+- 1957 [Perceptron](https://wikipedia.org/wiki/Perceptron) est un type de classificateur linéaire inventé par le psychologue américain Frank Rosenblatt qui sous-tend les progrès de l'apprentissage en profondeur.
+- 1967 [Nearest Neighbor](https://wikipedia.org/wiki/Nearest_neighbor) est un algorithme conçu à l'origine pour cartographier les itinéraires. Dans un contexte ML, il est utilisé pour détecter des modèles.
+- 1970 [Backpropagation](https://wikipedia.org/wiki/Backpropagation) est utilisé pour entraîner des [réseaux de neurones feedforward (propagation avant)](https://fr.wikipedia.org/wiki/R%C3%A9seau_de_neurones_%C3%A0_propagation_avant).
+- 1982 [Réseaux de neurones récurrents](https://wikipedia.org/wiki/Recurrent_neural_network) sont des réseaux de neurones artificiels dérivés de réseaux de neurones à réaction qui créent des graphes temporels.
+
+✅ Faites une petite recherche. Quelles autres dates sont marquantes dans l'histoire du ML et de l'IA ?
+
+## 1950 : Des machines qui pensent
+
+Alan Turing, une personne vraiment remarquable qui a été élue [par le public en 2019](https://wikipedia.org/wiki/Icons:_The_Greatest_Person_of_the_20th_Century) comme le plus grand scientifique du 20e siècle, est reconnu pour avoir aidé à jeter les bases du concept d'une « machine qui peut penser ». Il a lutté avec ses opposants et son propre besoin de preuves empiriques de sa théorie en créant le [Test de Turing](https://www.bbc.com/news/technology-18475646), que vous explorerez dans nos leçons de NLP (TALN en français).
+
+## 1956 : Projet de recherche d'été à Dartmouth
+
+« Le projet de recherche d'été de Dartmouth sur l'intelligence artificielle a été un événement fondateur pour l'intelligence artificielle en tant que domaine », et c'est ici que le terme « intelligence artificielle » a été inventé ([source](https://250.dartmouth.edu/highlights/artificial-intelligence-ai-coined-dartmouth)).
+
+> Chaque aspect de l'apprentissage ou toute autre caractéristique de l'intelligence peut en principe être décrit si précisément qu'une machine peut être conçue pour le simuler.
+
+Le chercheur principal, le professeur de mathématiques John McCarthy, espérait « procéder sur la base de la conjecture selon laquelle chaque aspect de l'apprentissage ou toute autre caractéristique de l'intelligence peut en principe être décrit avec une telle précision qu'une machine peut être conçue pour le simuler ». Les participants comprenaient une autre sommité du domaine, Marvin Minsky.
+
+L'atelier est crédité d'avoir initié et encouragé plusieurs discussions, notamment « l'essor des méthodes symboliques, des systèmes spécialisés sur des domaines limités (premiers systèmes experts) et des systèmes déductifs par rapport aux systèmes inductifs ». ([source](https://fr.wikipedia.org/wiki/Conf%C3%A9rence_de_Dartmouth)).
+
+## 1956 - 1974 : "Les années d'or"
+
+Des années 50 au milieu des années 70, l'optimisme était au rendez-vous, porté par l'espoir que l'IA puisse résoudre de nombreux problèmes. En 1967, Marvin Minsky a déclaré avec assurance que « Dans une génération... le problème de la création d'"intelligence artificielle" sera substantiellement résolu. » (Minsky, Marvin (1967), Computation: Finite and Infinite Machines, Englewood Cliffs, N.J.: Prentice-Hall)
+
+La recherche sur le Natural Language Processing (traitement du langage naturel) a prospéré, les techniques de recherche (search) ont été affinées et rendues plus puissantes, et le concept de « micro-mondes » a été créé, où des tâches simples étaient effectuées à l'aide d'instructions en langage simple.
+
+La recherche a été bien financée par les agences gouvernementales, des progrès ont été réalisés dans le calcul et les algorithmes, et des prototypes de machines intelligentes ont été construits. Certaines de ces machines incluent :
+
+* [Shakey le robot](https://fr.wikipedia.org/wiki/Shakey_le_robot), qui pouvait manœuvrer et décider comment effectuer des tâches « intelligemment ».
+
+ 
+ > Shakey en 1972
+
+* Eliza, l'un des premiers « chatbots », pouvait converser avec les gens et agir comme une « thérapeute » primitive. Vous en apprendrez plus sur Eliza dans les leçons de NLP.
+
+ 
+ > Une version d'Eliza, un chatbot
+
+* Le « monde des blocs » était un exemple de micro-monde où les blocs pouvaient être empilés et triés, et où des expériences visant à apprendre aux machines à prendre des décisions pouvaient être testées. Les avancées réalisées avec des bibliothèques telles que [SHRDLU](https://fr.wikipedia.org/wiki/SHRDLU) ont contribué à faire avancer le traitement du langage naturel.
+
+ [![Monde de blocs avec SHRDLU](https://img.youtube.com/vi/QAJz4YKUwqw/0.jpg)](https://www.youtube.com/watch?v=QAJz4YKUwqw "Monde de blocs avec SHRDLU")
+
+ > 🎥 Cliquez sur l'image ci-dessus pour une vidéo : Blocks world with SHRDLU
+
+## 1974 - 1980 : « l'hiver de l'IA »
+
+Au milieu des années 1970, il était devenu évident que la complexité de la fabrication de « machines intelligentes » avait été sous-estimée et que ses promesses, compte tenu de la puissance de calcul disponible, avaient été exagérées. Les financements se sont taris et la confiance dans le domaine s'est érodée. Parmi les problèmes qui ont entamé cette confiance, citons :
+
+- **Restrictions**. La puissance de calcul était trop limitée.
+- **Explosion combinatoire**. Le nombre de paramètres à entraîner augmentait de façon exponentielle à mesure que l'on en demandait davantage aux ordinateurs, sans évolution parallèle de la puissance et de la capacité de calcul.
+- **Pénurie de données**. Il y avait un manque de données qui a entravé le processus de test, de développement et de raffinement des algorithmes.
+- **Posions-nous les bonnes questions ?** Les questions mêmes qui étaient posées ont commencé à être remises en cause. Les chercheurs ont commencé à formuler des critiques sur leurs approches :
+ - Les tests de Turing ont été remis en question au moyen, entre autres, de la « théorie de la chambre chinoise » qui postulait que « la programmation d'un ordinateur numérique peut faire croire qu'il comprend le langage mais ne peut pas produire une compréhension réelle ». ([source](https://plato.stanford.edu/entries/chinese-room/))
+ - L'éthique de l'introduction d'intelligences artificielles telles que la "thérapeute" ELIZA dans la société a été remise en cause.
+
+Dans le même temps, diverses écoles de pensée sur l'IA ont commencé à se former. Une dichotomie a été établie entre les pratiques IA ["scruffy" et "neat"](https://wikipedia.org/wiki/Neats_and_scruffies). Les laboratoires _Scruffy_ peaufinaient leurs programmes pendant des heures jusqu'à ce qu'ils obtiennent les résultats souhaités. Les laboratoires _Neat_ "se concentraient sur la logique et la résolution formelle de problèmes". ELIZA et SHRDLU étaient des systèmes _scruffy_ bien connus. Dans les années 1980, alors qu'émergeait la demande de rendre les systèmes ML reproductibles, l'approche _neat_ a progressivement pris le devant de la scène car ses résultats sont plus explicables.
+
+## 1980 : Systèmes experts
+
+Au fur et à mesure que le domaine s'est développé, ses avantages pour les entreprises sont devenus plus clairs, particulièrement via les « systèmes experts » dans les années 1980. "Les systèmes experts ont été parmi les premières formes vraiment réussies de logiciels d'intelligence artificielle (IA)." ([source](https://fr.wikipedia.org/wiki/Syst%C3%A8me_expert)).
+
+Ce type de système est en fait _hybride_, composé en partie d'un moteur de règles définissant les exigences métier et d'un moteur d'inférence qui exploite le système de règles pour déduire de nouveaux faits.
+
+Cette époque a également vu une attention croissante accordée aux réseaux de neurones.
+
+## 1987 - 1993 : IA « Chill »
+
+La prolifération de matériel spécialisé pour les systèmes experts a eu l'effet malheureux de rendre le domaine trop spécialisé. L'essor des ordinateurs personnels a également concurrencé ces grands systèmes spécialisés et centralisés. La démocratisation de l'informatique avait commencé, et elle a finalement ouvert la voie à l'explosion moderne des mégadonnées.
+
+## 1993 - 2011
+
+Cette époque a vu naître une nouvelle ère pour le ML et l'IA, capable de résoudre certains des problèmes qui n'avaient pu être résolus plus tôt, faute de données et de puissance de calcul. La quantité de données a commencé à augmenter rapidement et à devenir plus largement disponible, pour le meilleur et pour le pire, en particulier avec l'avènement du smartphone vers 2007. La puissance de calcul a augmenté de façon exponentielle et les algorithmes ont évolué en parallèle. Le domaine a commencé à gagner en maturité, à mesure que l'effervescence des débuts se cristallisait en une véritable discipline.
+
+## À présent
+
+Aujourd'hui, le machine learning et l'IA touchent presque tous les aspects de notre vie. Cette ère nécessite une compréhension approfondie des risques et des effets potentiels de ces algorithmes sur les vies humaines. Comme l'a déclaré Brad Smith de Microsoft, « les technologies de l'information soulèvent des problèmes qui vont au cœur des protections fondamentales des droits de l'homme comme la vie privée et la liberté d'expression. Ces problèmes accroissent la responsabilité des entreprises technologiques qui créent ces produits. À notre avis, ils appellent également à une réglementation gouvernementale réfléchie et au développement de normes autour des utilisations acceptables" ([source](https://www.technologyreview.com/2019/12/18/102365/the-future-of-ais-impact-on-society/)).
+
+Reste à savoir ce que l'avenir nous réserve, mais il est important de comprendre ces systèmes informatiques ainsi que les logiciels et algorithmes qu'ils exécutent. Nous espérons que ce programme vous aidera à mieux les comprendre afin que vous puissiez décider par vous-même.
+
+[![L'histoire du Deep Learning](https://img.youtube.com/vi/mTtDfKgLm54/0.jpg)](https://www.youtube.com/watch?v=mTtDfKgLm54 "L'histoire du Deep Learning")
+> 🎥 Cliquez sur l'image ci-dessus pour une vidéo : Yann LeCun discute de l'histoire du deep learning dans cette conférence
+
+---
+## 🚀Challenge
+
+Plongez dans l'un de ces moments historiques et apprenez-en plus sur les personnes qui en sont à l'origine. Les personnalités sont fascinantes, et aucune découverte scientifique n'a jamais été faite dans un vide culturel. Que découvrez-vous ?
+
+## [Quiz de validation des connaissances](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/4?loc=fr)
+
+## Révision et auto-apprentissage
+
+Voici quelques éléments à regarder et à écouter :
+
+[Ce podcast où Amy Boyd discute de l'évolution de l'IA](http://runasradio.com/Shows/Show/739)
+
+[![L'histoire de l'IA par Amy Boyd](https://img.youtube.com/vi/EJt3_bFYKss/0.jpg)](https://www.youtube.com/watch?v=EJt3_bFYKss "L'histoire de l'IA par Amy Boyd")
+
+## Devoir
+
+[Créer une frise chronologique](assignment.fr.md)
diff --git a/1-Introduction/2-history-of-ML/translations/README.id.md b/1-Introduction/2-history-of-ML/translations/README.id.md
new file mode 100644
index 00000000..5e6a6f0f
--- /dev/null
+++ b/1-Introduction/2-history-of-ML/translations/README.id.md
@@ -0,0 +1,116 @@
+# Sejarah Machine Learning
+
+
+> Catatan sketsa oleh [Tomomi Imura](https://www.twitter.com/girlie_mac)
+
+## [Quiz Pra-Pelajaran](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/3/)
+
+Dalam pelajaran ini, kita akan membahas tonggak utama dalam sejarah Machine Learning dan Artificial Intelligence.
+
+Sejarah Artificial Intelligence (AI) sebagai bidang terkait erat dengan sejarah Machine Learning, karena algoritma dan kemajuan komputasi yang mendukung ML dimasukkan ke dalam pengembangan AI. Penting untuk diingat bahwa, meski bidang-bidang ini sebagai bidang penelitian yang berbeda mulai terbentuk pada 1950-an, [penemuan algoritmik, statistik, matematik, komputasi dan teknis](https://wikipedia.org/wiki/Timeline_of_machine_learning) yang penting sudah ada sebelumnya dan saling tumpang tindih di era ini. Faktanya, orang-orang telah memikirkan pertanyaan-pertanyaan ini selama [ratusan tahun](https://wikipedia.org/wiki/History_of_artificial_intelligence): artikel ini membahas dasar-dasar intelektual historis dari gagasan 'mesin yang berpikir'.
+
+## Penemuan penting
+
+- 1763, 1812 [Bayes Theorem](https://wikipedia.org/wiki/Bayes%27_theorem) dan para pendahulu. Teorema ini dan penerapannya mendasari inferensi, mendeskripsikan kemungkinan suatu peristiwa terjadi berdasarkan pengetahuan sebelumnya.
+- 1805 [Least Square Theory](https://wikipedia.org/wiki/Least_squares) oleh matematikawan Perancis Adrien-Marie Legendre. Teori ini, yang akan kamu pelajari di unit Regresi, membantu dalam *data fitting*.
+- 1913 [Markov Chains](https://wikipedia.org/wiki/Markov_chain) dinamai dengan nama matematikawan Rusia, Andrey Markov, digunakan untuk mendeskripsikan sebuah urutan dari kejadian-kejadian yang mungkin terjadi berdasarkan kondisi sebelumnya.
+- 1957 [Perceptron](https://wikipedia.org/wiki/Perceptron) adalah sebuah tipe dari *linear classifier* yang ditemukan oleh psikolog Amerika, Frank Rosenblatt, yang mendasari kemajuan dalam *Deep Learning*.
+- 1967 [Nearest Neighbor](https://wikipedia.org/wiki/Nearest_neighbor) adalah sebuah algoritma yang pada awalnya didesain untuk memetakan rute. Dalam konteks ML, ini digunakan untuk mendeteksi berbagai pola.
+- 1970 [Backpropagation](https://wikipedia.org/wiki/Backpropagation) digunakan untuk melatih [feedforward neural networks](https://wikipedia.org/wiki/Feedforward_neural_network).
+- 1982 [Recurrent Neural Networks](https://wikipedia.org/wiki/Recurrent_neural_network) adalah *artificial neural networks* yang berasal dari *feedforward neural networks* yang membuat grafik sementara.
+
+✅ Lakukan sebuah riset kecil. Tanggal berapa lagi yang merupakan tanggal penting dalam sejarah ML dan AI?
+
+## 1950: Mesin yang berpikir
+
+Alan Turing, seorang yang luar biasa yang terpilih oleh [publik di tahun 2019](https://wikipedia.org/wiki/Icons:_The_Greatest_Person_of_the_20th_Century) sebagai ilmuwan terhebat di abad ke-20, diberikan penghargaan karena membantu meletakkan fondasi dari konsep 'mesin yang bisa berpikir'. Dia berjuang menghadapi orang-orang yang menentangnya dan kebutuhannya sendiri akan bukti empiris dari konsep ini dengan membuat [Turing Test](https://www.bbc.com/news/technology-18475646), yang akan kamu jelajahi di pelajaran NLP kami.
+
+## 1956: Proyek Riset Musim Panas Dartmouth
+
+"Proyek Riset Musim Panas Dartmouth pada *artificial intelligence* merupakan sebuah acara penemuan untuk *artificial intelligence* sebagai sebuah bidang," dan dari sinilah istilah '*artificial intelligence*' diciptakan ([sumber](https://250.dartmouth.edu/highlights/artificial-intelligence-ai-coined-dartmouth)).
+
+> Setiap aspek pembelajaran atau fitur kecerdasan lainnya pada prinsipnya dapat dideskripsikan dengan sangat tepat sehingga sebuah mesin dapat dibuat untuk mensimulasikannya.
+
+Ketua peneliti, profesor matematika John McCarthy, berharap "untuk meneruskan dasar dari dugaan bahwa setiap aspek pembelajaran atau fitur kecerdasan lainnya pada prinsipnya dapat dideskripsikan dengan sangat tepat sehingga mesin dapat dibuat untuk mensimulasikannya." Marvin Minsky, seorang tokoh terkenal di bidang ini juga termasuk sebagai peserta penelitian.
+
+Workshop ini dipuji karena telah memprakarsai dan mendorong beberapa diskusi termasuk "munculnya metode simbolik, sistem yang berfokus pada domain terbatas (sistem pakar awal), dan sistem deduktif versus sistem induktif." ([sumber](https://wikipedia.org/wiki/Dartmouth_workshop)).
+
+## 1956 - 1974: "Tahun-tahun Emas"
+
+Dari tahun 1950-an hingga pertengahan 70-an, optimisme memuncak dengan harapan bahwa AI dapat memecahkan banyak masalah. Pada tahun 1967, Marvin Minsky dengan yakin menyatakan bahwa "Dalam satu generasi ... masalah menciptakan '*artificial intelligence*' akan terpecahkan secara substansial." (Minsky, Marvin (1967), Computation: Finite and Infinite Machines, Englewood Cliffs, N.J.: Prentice-Hall)
+
+Penelitian *natural language processing* berkembang, pencarian disempurnakan dan dibuat lebih *powerful*, dan konsep '*micro-worlds*' diciptakan, di mana tugas-tugas sederhana diselesaikan menggunakan instruksi bahasa sederhana.
+
+Penelitian didanai dengan baik oleh lembaga pemerintah, banyak kemajuan dibuat dalam komputasi dan algoritma, dan prototipe mesin cerdas dibangun. Beberapa mesin tersebut antara lain:
+
+* [Shakey the robot](https://wikipedia.org/wiki/Shakey_the_robot), yang bisa bermanuver dan memutuskan bagaimana melakukan tugas-tugas secara 'cerdas'.
+
+ 
+ > Shakey pada 1972
+
+* Eliza, sebuah 'chatterbot' awal, dapat mengobrol dengan orang-orang dan bertindak sebagai 'terapis' primitif. Kamu akan belajar lebih banyak tentang Eliza dalam pelajaran NLP.
+
+ 
+ > Sebuah versi dari Eliza, sebuah *chatbot*
+
+* "Blocks world" adalah contoh sebuah *micro-world* dimana balok dapat ditumpuk dan diurutkan, dan pengujian eksperimen mesin pengajaran untuk membuat keputusan dapat dilakukan. Kemajuan yang dibuat dengan *library-library* seperti [SHRDLU](https://wikipedia.org/wiki/SHRDLU) membantu mendorong kemajuan pemrosesan bahasa.
+
+ [![blocks world dengan SHRDLU](https://img.youtube.com/vi/QAJz4YKUwqw/0.jpg)](https://www.youtube.com/watch?v=QAJz4YKUwqw "blocks world dengan SHRDLU")
+
+ > 🎥 Klik gambar di atas untuk menonton video: Blocks world with SHRDLU
+
+## 1974 - 1980: "Musim Dingin AI"
+
+Pada pertengahan 1970-an, semakin jelas bahwa kompleksitas pembuatan 'mesin cerdas' telah diremehkan dan janjinya, mengingat kekuatan komputasi yang tersedia, telah dilebih-lebihkan. Pendanaan mengering dan kepercayaan pada bidang ini menurun. Beberapa masalah yang memengaruhi kepercayaan tersebut antara lain:
+
+- **Keterbatasan**. Kekuatan komputasi terlalu terbatas.
+- **Ledakan kombinatorial**. Jumlah parameter yang perlu dilatih bertambah secara eksponensial karena lebih banyak hal yang diminta dari komputer, tanpa evolusi paralel dari kekuatan dan kemampuan komputasi.
+- **Kekurangan data**. Adanya kekurangan data yang menghalangi proses pengujian, pengembangan, dan penyempurnaan algoritma.
+- **Apakah kita menanyakan pertanyaan yang tepat?** Pertanyaan-pertanyaan yang diajukan pun mulai dipertanyakan kembali. Para peneliti mulai melontarkan kritik tentang pendekatan mereka:
+ - Tes Turing mulai dipertanyakan, di antara ide-ide lain, dari 'teori ruang Cina' yang mengemukakan bahwa, "memprogram komputer digital mungkin membuatnya tampak memahami bahasa tetapi tidak dapat menghasilkan pemahaman yang sebenarnya." ([sumber](https://plato.stanford.edu/entries/chinese-room/))
+ - Tantangan etika ketika memperkenalkan kecerdasan buatan seperti si "terapis" ELIZA ke dalam masyarakat.
+
+Pada saat yang sama, berbagai aliran pemikiran AI mulai terbentuk. Sebuah dikotomi didirikan antara praktik ["scruffy" vs. "neat AI"](https://wikipedia.org/wiki/Neats_and_scruffies). Lab _Scruffy_ mengubah program selama berjam-jam sampai mendapat hasil yang diinginkan. Lab _Neat_ "berfokus pada logika dan penyelesaian masalah formal". ELIZA dan SHRDLU adalah sistem _scruffy_ yang terkenal. Pada tahun 1980-an, karena perkembangan permintaan untuk membuat sistem ML yang dapat direproduksi, pendekatan _neat_ secara bertahap menjadi yang terdepan karena hasilnya lebih dapat dijelaskan.
+
+## 1980-an: Sistem Pakar
+
+Seiring berkembangnya bidang ini, manfaatnya bagi bisnis menjadi lebih jelas, dan begitu pula dengan menjamurnya 'sistem pakar' pada tahun 1980-an. "Sistem pakar adalah salah satu bentuk perangkat lunak artificial intelligence (AI) pertama yang benar-benar sukses." ([sumber](https://wikipedia.org/wiki/Expert_system)).
+
+Tipe sistem ini sebenarnya adalah _hybrid_, sebagian terdiri dari mesin aturan yang mendefinisikan kebutuhan bisnis, dan mesin inferensi yang memanfaatkan sistem aturan untuk menyimpulkan fakta baru.
+
+Pada era ini juga terlihat adanya peningkatan perhatian pada jaringan saraf.
+
+## 1987 - 1993: AI 'Chill'
+
+Menjamurnya perangkat keras khusus untuk sistem pakar memiliki efek yang tidak menguntungkan karena menjadi terlalu terspesialisasi. Munculnya komputer pribadi juga bersaing dengan sistem yang besar, terspesialisasi, dan terpusat ini. Demokratisasi komputasi telah dimulai, dan pada akhirnya membuka jalan bagi ledakan modern dari *big data*.
+
+## 1993 - 2011
+
+Zaman ini memperlihatkan era baru bagi ML dan AI untuk dapat menyelesaikan beberapa masalah yang sebelumnya tidak terpecahkan karena kurangnya data dan daya komputasi. Jumlah data mulai meningkat dengan cepat dan tersedia secara luas, terlepas dari baik dan buruknya, terutama dengan munculnya *smartphone* sekitar tahun 2007. Daya komputasi berkembang secara eksponensial, dan algoritma juga berkembang saat itu. Bidang ini mulai mengalami kedewasaan karena hari-hari yang tidak beraturan di masa lalu mulai terbentuk menjadi disiplin yang sebenarnya.
+
+## Sekarang
+
+Saat ini, *machine learning* dan AI hampir ada di setiap bagian dari kehidupan kita. Era ini menuntut pemahaman yang cermat tentang risiko dan efek potensi dari berbagai algoritma yang ada pada kehidupan manusia. Seperti yang telah dinyatakan oleh Brad Smith dari Microsoft, "Teknologi informasi mengangkat isu-isu yang menjadi inti dari perlindungan hak asasi manusia yang mendasar seperti privasi dan kebebasan berekspresi. Masalah-masalah ini meningkatkan tanggung jawab bagi perusahaan teknologi yang menciptakan produk-produk ini. Dalam pandangan kami, mereka juga menyerukan peraturan pemerintah yang bijaksana dan untuk pengembangan norma-norma seputar penggunaan yang wajar" ([sumber](https://www.technologyreview.com/2019/12/18/102365/the-future-of-ais-impact-on-society/)).
+
+Kita masih belum tahu apa yang akan terjadi di masa depan, tetapi penting untuk memahami sistem komputer dan perangkat lunak serta algoritma yang dijalankannya. Kami berharap kurikulum ini akan membantu kamu untuk mendapatkan pemahaman yang lebih baik sehingga kamu dapat memutuskan sendiri.
+
+[![Sejarah Deep Learning](https://img.youtube.com/vi/mTtDfKgLm54/0.jpg)](https://www.youtube.com/watch?v=mTtDfKgLm54 "Sejarah Deep Learning")
+> 🎥 Klik gambar di atas untuk menonton video: Yann LeCun mendiskusikan sejarah Deep Learning dalam kuliah ini
+
+---
+## 🚀Tantangan
+
+Gali salah satu momen bersejarah ini dan pelajari lebih lanjut tentang orang-orang di baliknya. Ada karakter yang menarik, dan tidak ada penemuan ilmiah yang pernah dibuat dalam kekosongan budaya. Apa yang kamu temukan?
+
+## [Quiz Pasca-Pelajaran](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/4/)
+
+## Ulasan & Belajar Mandiri
+
+Berikut adalah item untuk ditonton dan didengarkan:
+
+[Podcast dimana Amy Boyd mendiskusikan evolusi dari AI](http://runasradio.com/Shows/Show/739)
+
+[![Sejarah AI oleh Amy Boyd](https://img.youtube.com/vi/EJt3_bFYKss/0.jpg)](https://www.youtube.com/watch?v=EJt3_bFYKss "Sejarah AI oleh Amy Boyd")
+
+## Tugas
+
+[Membuat sebuah *timeline*](assignment.id.md)
diff --git a/1-Introduction/2-history-of-ML/translations/README.ja.md b/1-Introduction/2-history-of-ML/translations/README.ja.md
index f9b4c045..5c17650c 100644
--- a/1-Introduction/2-history-of-ML/translations/README.ja.md
+++ b/1-Introduction/2-history-of-ML/translations/README.ja.md
@@ -3,7 +3,7 @@

> [Tomomi Imura](https://www.twitter.com/girlie_mac)によるスケッチ
-## [Pre-lecture quiz](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/3/)
+## [Pre-lecture quiz](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/3?loc=ja)
この授業では、機械学習と人工知能の歴史における主要な出来事を紹介します。
@@ -99,7 +99,7 @@
これらの歴史的瞬間の1つを掘り下げて、その背後にいる人々について学びましょう。魅力的な人々がいますし、文化的に空白の状態で科学的発見がなされたことはありません。どういったことが見つかるでしょうか?
-## [Post-lecture quiz](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/4/)
+## [Post-lecture quiz](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/4?loc=ja)
## 振り返りと自習
@@ -111,4 +111,4 @@
## 課題
-[時系列を制作してください](../assignment.md)
+[年表を作成する](./assignment.ja.md)
diff --git a/1-Introduction/2-history-of-ML/translations/README.zh-cn.md b/1-Introduction/2-history-of-ML/translations/README.zh-cn.md
index 51e66ecd..8ca7e690 100644
--- a/1-Introduction/2-history-of-ML/translations/README.zh-cn.md
+++ b/1-Introduction/2-history-of-ML/translations/README.zh-cn.md
@@ -113,4 +113,4 @@ Alan Turing,一个真正杰出的人,[在2019年被公众投票选出](https
## 任务
-[创建时间线](../assignment.md)
+[创建时间线](assignment.zh-cn.md)
diff --git a/1-Introduction/2-history-of-ML/translations/assignment.fr.md b/1-Introduction/2-history-of-ML/translations/assignment.fr.md
new file mode 100644
index 00000000..c562516e
--- /dev/null
+++ b/1-Introduction/2-history-of-ML/translations/assignment.fr.md
@@ -0,0 +1,11 @@
+# Créer une frise chronologique
+
+## Instructions
+
+Utiliser [ce repo](https://github.com/Digital-Humanities-Toolkit/timeline-builder), créer une frise chronologique de certains aspects de l'histoire des algorithmes, des mathématiques, des statistiques, de l'IA ou du machine learning, ou une combinaison de ceux-ci. Vous pouvez vous concentrer sur une personne, une idée ou une longue période d'innovations. Assurez-vous d'ajouter des éléments multimédias.
+
+## Rubrique
+
+| Critères | Exemplaire | Adéquate | A améliorer |
+| -------- | ---------------------------------------------------------------- | ------------------------------------ | ------------------------------------------------------------------ |
+| | Une chronologie déployée est présentée sous forme de page GitHub | Le code est incomplet et non déployé | La chronologie est incomplète, pas bien recherchée et pas déployée |
diff --git a/1-Introduction/2-history-of-ML/translations/assignment.id.md b/1-Introduction/2-history-of-ML/translations/assignment.id.md
new file mode 100644
index 00000000..0ee7c009
--- /dev/null
+++ b/1-Introduction/2-history-of-ML/translations/assignment.id.md
@@ -0,0 +1,11 @@
+# Membuat sebuah *timeline*
+
+## Instruksi
+
+Menggunakan [repo ini](https://github.com/Digital-Humanities-Toolkit/timeline-builder), buatlah sebuah *timeline* dari beberapa aspek sejarah algoritma, matematika, statistik, AI, atau ML, atau kombinasi dari semuanya. Kamu dapat fokus pada satu orang, satu ide, atau rentang waktu pemikiran yang panjang. Pastikan untuk menambahkan elemen multimedia.
+
+## Rubrik
+
+| Kriteria | Sangat Bagus | Cukup | Perlu Peningkatan |
+| -------- | ------------------------------------------------- | --------------------------------------- | ---------------------------------------------------------------- |
+| | *Timeline* yang dideploy disajikan sebagai halaman GitHub | Kode belum lengkap dan belum dideploy | *Timeline* belum lengkap, belum diriset dengan baik dan belum dideploy |
\ No newline at end of file
diff --git a/1-Introduction/2-history-of-ML/translations/assignment.ja.md b/1-Introduction/2-history-of-ML/translations/assignment.ja.md
new file mode 100644
index 00000000..f5f78799
--- /dev/null
+++ b/1-Introduction/2-history-of-ML/translations/assignment.ja.md
@@ -0,0 +1,11 @@
+# 年表を作成する
+
+## 指示
+
+[このリポジトリ](https://github.com/Digital-Humanities-Toolkit/timeline-builder) を使って、アルゴリズム・数学・統計学・人工知能・機械学習、またはこれらの組み合わせに対して、歴史のひとつの側面に関する年表を作成してください。焦点を当てるのは、ひとりの人物・ひとつのアイディア・長期間にわたる思想のいずれのものでも構いません。マルチメディアの要素を必ず加えるようにしてください。
+
+## 評価基準
+
+| 基準 | 模範的 | 十分 | 要改善 |
+| ---- | -------------------------------------- | ------------------------------------ | ------------------------------------------------------------ |
+| | GitHub page に年表がデプロイされている | コードが未完成でデプロイされていない | 年表が未完成で、十分に調査されておらず、デプロイされていない |
diff --git a/1-Introduction/2-history-of-ML/translations/assignment.zh-cn.md b/1-Introduction/2-history-of-ML/translations/assignment.zh-cn.md
new file mode 100644
index 00000000..adf3ee15
--- /dev/null
+++ b/1-Introduction/2-history-of-ML/translations/assignment.zh-cn.md
@@ -0,0 +1,11 @@
+# 建立一个时间轴
+
+## 说明
+
+使用这个 [仓库](https://github.com/Digital-Humanities-Toolkit/timeline-builder),创建一个关于算法、数学、统计学、人工智能、机器学习的某个方面或者可以综合多个以上学科来讲。你可以着重介绍某个人,某个想法,或者一个经久不衰的思想。请确保添加了多媒体元素在你的时间线中。
+
+## 评判标准
+
+| 标准 | 优秀 | 中规中矩 | 仍需努力 |
+| ------------ | ---------------------------------- | ---------------------- | ------------------------------------------ |
+| | 有一个用 GitHub page 展示的 timeline | 代码还不完整并且没有部署 | 时间线不完整,没有经过充分的研究,并且没有部署 |
diff --git a/1-Introduction/3-fairness/translations/README.id.md b/1-Introduction/3-fairness/translations/README.id.md
new file mode 100644
index 00000000..6f09a148
--- /dev/null
+++ b/1-Introduction/3-fairness/translations/README.id.md
@@ -0,0 +1,213 @@
+# Keadilan dalam Machine Learning
+
+
+> Catatan sketsa oleh [Tomomi Imura](https://www.twitter.com/girlie_mac)
+
+## [Quiz Pra-Pelajaran](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/5/)
+
+## Pengantar
+
+Dalam kurikulum ini, kamu akan mulai mengetahui bagaimana Machine Learning bisa memengaruhi kehidupan kita sehari-hari. Bahkan sekarang, sistem dan model terlibat dalam tugas pengambilan keputusan sehari-hari, seperti diagnosis kesehatan atau mendeteksi penipuan. Jadi, penting bahwa model-model ini bekerja dengan baik untuk memberikan hasil yang adil bagi semua orang.
+
+Bayangkan apa yang bisa terjadi ketika data yang kamu gunakan untuk membangun model tidak mencakup demografi tertentu, seperti ras, jenis kelamin, pandangan politik, atau agama, atau mewakili demografi tersebut secara tidak proporsional. Bagaimana jika keluaran model diinterpretasikan sehingga menguntungkan demografi tertentu? Apa konsekuensinya bagi aplikasi tersebut?
+
+Dalam pelajaran ini, kamu akan:
+
+- Meningkatkan kesadaran dari pentingnya keadilan dalam Machine Learning.
+- Mempelajari tentang berbagai kerugian terkait keadilan.
+- Mempelajari tentang mitigasi dan penilaian ketidakadilan.
+
+## Prasyarat
+
+Sebagai prasyarat, silakan ikuti jalur belajar "Prinsip AI yang Bertanggung Jawab" dan tonton video di bawah ini dengan topik:
+
+Pelajari lebih lanjut tentang AI yang Bertanggung Jawab dengan mengikuti [Jalur Belajar](https://docs.microsoft.com/learn/modules/responsible-ai-principles/?WT.mc_id=academic-15963-cxa) ini
+
+[](https://youtu.be/dnC8-uUZXSc "Pendekatan Microsoft untuk AI yang Bertanggung Jawab")
+
+> 🎥 Klik gambar diatas untuk menonton video: Pendekatan Microsoft untuk AI yang Bertanggung Jawab
+
+## Ketidakadilan dalam data dan algoritma
+
+> "Jika Anda menyiksa data cukup lama, data itu akan mengakui apa pun " - Ronald Coase
+
+Pernyataan ini terdengar ekstrem, tetapi memang benar bahwa data dapat dimanipulasi untuk mendukung kesimpulan apa pun. Manipulasi semacam itu terkadang bisa terjadi secara tidak sengaja. Sebagai manusia, kita semua memiliki bias, dan seringkali sulit untuk secara sadar mengetahui kapan kamu memperkenalkan bias dalam data.
+
+Menjamin keadilan dalam AI dan machine learning tetap menjadi tantangan sosioteknik yang kompleks. Artinya, hal itu tidak bisa ditangani baik dari perspektif sosial atau teknis semata.
+
+### Kerugian Terkait Keadilan
+
+Apa yang dimaksud dengan ketidakadilan? "Ketidakadilan" mencakup dampak negatif atau "bahaya" bagi sekelompok orang, seperti yang didefinisikan dalam hal ras, jenis kelamin, usia, atau status disabilitas.
+
+Kerugian utama yang terkait dengan keadilan dapat diklasifikasikan sebagai:
+
+- **Alokasi**, jika misalnya suatu jenis kelamin atau etnisitas lebih disukai daripada yang lain.
+- **Kualitas layanan**. Jika kamu melatih data untuk satu skenario tertentu tetapi kenyataannya jauh lebih kompleks, hasilnya adalah layanan yang berkinerja buruk.
+- **Stereotip**. Mengaitkan grup tertentu dengan atribut yang ditentukan sebelumnya.
+- **Fitnah**. Untuk mengkritik dan melabeli sesuatu atau seseorang secara tidak adil.
+- **Representasi yang kurang atau berlebihan**. Idenya adalah bahwa kelompok tertentu tidak terlihat dalam profesi tertentu, dan layanan atau fungsi apa pun yang terus mempromosikan hal itu turut menambah kerugian.
+
+Mari kita lihat contoh-contohnya.
+
+### Alokasi
+
+Bayangkan sebuah sistem untuk menyaring pengajuan pinjaman. Sistem cenderung memilih pria kulit putih sebagai kandidat yang lebih baik daripada kelompok lain. Akibatnya, pinjaman ditahan dari pemohon tertentu.
+
+Contoh lain adalah alat perekrutan eksperimental yang dikembangkan oleh perusahaan besar untuk menyaring kandidat. Alat tersebut secara sistematis mendiskriminasi satu gender dengan menggunakan model yang dilatih untuk lebih memilih kata-kata yang terkait dengan gender lain. Hal ini mengakibatkan kandidat yang resumenya berisi kata-kata seperti "tim rugby wanita" tidak masuk kualifikasi.
+
+✅ Lakukan sedikit riset untuk menemukan contoh dunia nyata dari sesuatu seperti ini
+
+### Kualitas Layanan
+
+Para peneliti menemukan bahwa beberapa pengklasifikasi gender komersial memiliki tingkat kesalahan yang lebih tinggi di sekitar gambar wanita dengan warna kulit lebih gelap dibandingkan dengan gambar pria dengan warna kulit lebih terang. [Referensi](https://www.media.mit.edu/publications/gender-shades-intersectional-accuracy-disparities-in-commercial-gender-classification/)
+
+Contoh terkenal lainnya adalah dispenser sabun tangan yang sepertinya tidak bisa mendeteksi orang dengan kulit gelap. [Referensi](https://gizmodo.com/why-cant-this-soap-dispenser-identify-dark-skin-1797931773)
+
+### Stereotip
+
+Pandangan gender stereotip ditemukan dalam terjemahan mesin. Ketika menerjemahkan "dia (laki-laki) adalah seorang perawat dan dia (perempuan) adalah seorang dokter" ke dalam bahasa Turki, masalah muncul. Turki adalah bahasa tanpa gender yang memiliki satu kata ganti, "o" untuk menyampaikan orang ketiga tunggal, tetapi menerjemahkan kalimat kembali dari Turki ke Inggris menghasilkan stereotip dan salah sebagai "dia (perempuan) adalah seorang perawat dan dia (laki-laki) adalah seorang dokter".
+
+### Fitnah
+
+Sebuah teknologi pelabelan gambar yang terkenal pernah salah memberi label pada gambar orang berkulit gelap sebagai gorila. Pelabelan yang salah ini berbahaya bukan hanya karena sistemnya membuat kesalahan, tetapi juga karena label yang diterapkan memiliki sejarah panjang yang sengaja digunakan untuk merendahkan orang kulit hitam.
+
+[](https://www.youtube.com/watch?v=QxuyfWoVV98 "Bukankah Aku Seorang Wanita?")
+> 🎥 Klik gambar diatas untuk sebuah video: AI, Bukankah Aku Seorang Wanita? - menunjukkan kerugian yang disebabkan oleh pencemaran nama baik yang menyinggung ras oleh AI
+
+### Representasi yang kurang atau berlebihan
+
+Hasil pencarian gambar yang condong ke hal tertentu (skewed) dapat menjadi contoh yang bagus dari bahaya ini. Saat menelusuri gambar profesi dengan persentase pria yang sama atau lebih tinggi daripada wanita, seperti teknik, atau CEO, perhatikan hasil yang lebih condong ke jenis kelamin tertentu.
+
+
+> Pencarian di Bing untuk 'CEO' ini menghasilkan hasil yang cukup inklusif
+
+Lima jenis bahaya utama ini tidak saling eksklusif, dan satu sistem dapat menunjukkan lebih dari satu jenis bahaya. Selain itu, setiap kasus bervariasi dalam tingkat keparahannya. Misalnya, memberi label yang tidak adil kepada seseorang sebagai penjahat adalah bahaya yang jauh lebih parah daripada memberi label yang salah pada gambar. Namun, penting untuk diingat bahwa bahkan kerugian yang relatif tidak parah dapat membuat orang merasa terasing atau diasingkan dan dampak kumulatifnya bisa sangat menekan.
+
+✅ **Diskusi**: Tinjau kembali beberapa contoh dan lihat apakah mereka menunjukkan bahaya yang berbeda.
+
+| | Alokasi | Kualitas Layanan | Stereotip | Fitnah | Representasi yang kurang atau berlebihan |
+| ----------------------- | :--------: | :----------------: | :----------: | :---------: | :----------------------------: |
+| Sistem perekrutan otomatis | x | x | x | | x |
+| Terjemahan mesin | | | | | |
+| Melabeli foto | | | | | |
+
+
+## Mendeteksi Ketidakadilan
+
+Ada banyak alasan mengapa sistem tertentu berperilaku tidak adil. Bias sosial, misalnya, mungkin tercermin dalam kumpulan data yang digunakan untuk melatih. Misalnya, ketidakadilan perekrutan mungkin telah diperburuk oleh ketergantungan yang berlebihan pada data historis. Dengan menggunakan pola dalam resume yang dikirimkan ke perusahaan selama periode 10 tahun, model tersebut menentukan bahwa pria lebih berkualitas karena mayoritas resume berasal dari pria, yang mencerminkan dominasi pria di masa lalu di industri teknologi.
+
+Data yang tidak memadai tentang sekelompok orang tertentu dapat menjadi alasan ketidakadilan. Misalnya, pengklasifikasi gambar memiliki tingkat kesalahan yang lebih tinggi untuk gambar orang berkulit gelap karena warna kulit yang lebih gelap kurang terwakili dalam data.
+
+Asumsi yang salah yang dibuat selama pengembangan juga dapat menyebabkan ketidakadilan. Misalnya, sistem analisis wajah yang dimaksudkan untuk memprediksi siapa yang akan melakukan kejahatan berdasarkan gambar wajah orang dapat menghasilkan asumsi yang merusak. Hal ini dapat menyebabkan kerugian besar bagi orang-orang yang salah diklasifikasikan.
+
+## Pahami model kamu dan bangun dalam keadilan
+
+Meskipun banyak aspek keadilan tidak tercakup dalam metrik keadilan kuantitatif, dan tidak mungkin menghilangkan bias sepenuhnya dari sistem untuk menjamin keadilan, Kamu tetap bertanggung jawab untuk mendeteksi dan mengurangi masalah keadilan sebanyak mungkin.
+
+Saat Kamu bekerja dengan model pembelajaran mesin, penting untuk memahami model Kamu dengan cara memastikan interpretasinya dan dengan menilai serta mengurangi ketidakadilan.
+
+Mari kita gunakan contoh pemilihan pinjaman untuk mengisolasi kasus dan mengetahui tingkat dampak setiap faktor pada prediksi.
+
+## Metode Penilaian
+
+1. **Identifikasi bahaya (dan manfaat)**. Langkah pertama adalah mengidentifikasi bahaya dan manfaat. Pikirkan tentang bagaimana tindakan dan keputusan dapat memengaruhi calon pelanggan dan bisnis itu sendiri.
+
+1. **Identifikasi kelompok yang terkena dampak**. Setelah Kamu memahami jenis kerugian atau manfaat apa yang dapat terjadi, identifikasi kelompok-kelompok yang mungkin terpengaruh. Apakah kelompok-kelompok ini ditentukan oleh jenis kelamin, etnis, atau kelompok sosial?
+
+1. **Tentukan metrik keadilan**. Terakhir, tentukan metrik sehingga Kamu memiliki sesuatu untuk diukur dalam pekerjaan Kamu untuk memperbaiki situasi.
+
+### Identifikasi bahaya (dan manfaat)
+
+Apa bahaya dan manfaat yang terkait dengan pinjaman? Pikirkan tentang skenario negatif palsu dan positif palsu:
+
+**False negatives** (ditolak, tapi Y=1) - dalam hal ini, pemohon yang akan mampu membayar kembali pinjaman ditolak. Ini adalah peristiwa yang merugikan karena sumber pinjaman ditahan dari pemohon yang memenuhi syarat.
+
+**False positives** (diterima, tapi Y=0) - dalam hal ini, pemohon memang mendapatkan pinjaman tetapi akhirnya wanprestasi. Akibatnya, kasus pemohon akan dikirim ke agen penagihan utang yang dapat mempengaruhi permohonan pinjaman mereka di masa depan.
+
+### Identifikasi kelompok yang terkena dampak
+
+Langkah selanjutnya adalah menentukan kelompok mana yang kemungkinan akan terpengaruh. Misalnya, dalam kasus permohonan kartu kredit, sebuah model mungkin menentukan bahwa perempuan harus menerima batas kredit yang jauh lebih rendah dibandingkan dengan pasangan mereka yang berbagi aset rumah tangga. Dengan demikian, seluruh demografi yang ditentukan berdasarkan jenis kelamin menjadi terpengaruh.
+
+### Tentukan metrik keadilan
+
+Kamu telah mengidentifikasi bahaya dan kelompok yang terpengaruh, dalam hal ini digambarkan berdasarkan jenis kelamin. Sekarang, gunakan faktor terukur (*quantified factors*) untuk memisahkan metriknya. Misalnya, dengan menggunakan data di bawah ini, Kamu dapat melihat bahwa wanita memiliki tingkat *false positive* terbesar dan pria memiliki yang terkecil, dan kebalikannya berlaku untuk *false negative*.
+
+✅ Dalam pelajaran selanjutnya tentang *Clustering*, Kamu akan melihat bagaimana membangun 'confusion matrix' ini dalam kode
+
+| | False positive rate | False negative rate | count |
+| ---------- | ------------------- | ------------------- | ----- |
+| Women | 0.37 | 0.27 | 54032 |
+| Men | 0.31 | 0.35 | 28620 |
+| Non-binary | 0.33 | 0.31 | 1266 |
+
+
+Tabel ini memberitahu kita beberapa hal. Pertama, kami mencatat bahwa ada sedikit orang non-biner dalam data. Datanya condong (*skewed*), jadi Kamu harus berhati-hati dalam menafsirkan angka-angka ini.
+
+Dalam hal ini, kita memiliki 3 grup dan 2 metrik. Ketika kita memikirkan tentang bagaimana sistem kita memengaruhi kelompok pelanggan dengan permohonan pinjaman mereka, ini mungkin cukup, tetapi ketika Kamu ingin menentukan jumlah grup yang lebih besar, Kamu mungkin ingin menyaringnya menjadi kumpulan ringkasan yang lebih kecil. Untuk melakukannya, Kamu dapat menambahkan lebih banyak metrik, seperti perbedaan terbesar atau rasio terkecil dari setiap *false negative* dan *false positive*.
+
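+Sebagai gambaran, berikut sketsa minimal untuk menghitung metrik per kelompok seperti pada tabel di atas dengan `MetricFrame` dari Fairlearn (dengan asumsi fairlearn >= 0.7; data di bawah ini hanyalah data mainan hipotetis):
+
+```python
+from fairlearn.metrics import MetricFrame, false_positive_rate, false_negative_rate
+
+# data mainan (hipotetis): label sebenarnya, prediksi model, dan fitur sensitif
+y_true = [1, 0, 1, 1, 0, 0, 1, 0]
+y_pred = [1, 0, 0, 1, 1, 0, 1, 1]
+jenis_kelamin = ["wanita", "wanita", "wanita", "pria", "pria", "pria", "non-biner", "non-biner"]
+
+mf = MetricFrame(
+    metrics={"false positive rate": false_positive_rate,
+             "false negative rate": false_negative_rate},
+    y_true=y_true,
+    y_pred=y_pred,
+    sensitive_features=jenis_kelamin,
+)
+print(mf.by_group)  # metrik per kelompok, seperti tabel di atas
+```
+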
+✅ Berhenti dan Pikirkan: Kelompok lain apa lagi yang mungkin terpengaruh dalam pengajuan pinjaman?
+
+## Mengurangi ketidakadilan
+
+Untuk mengurangi ketidakadilan, jelajahi model untuk menghasilkan berbagai model yang dimitigasi dan bandingkan pengorbanan yang dibuat antara akurasi dan keadilan untuk memilih model yang paling adil.
+
+Pelajaran pengantar ini tidak membahas secara mendalam mengenai detail mitigasi ketidakadilan algoritmik, seperti pendekatan pasca-pemrosesan dan pengurangan (*post-processing and reductions approach*), tetapi berikut adalah *tool* yang mungkin ingin Kamu coba.
+
+### Fairlearn
+
+[Fairlearn](https://fairlearn.github.io/) adalah sebuah *package* Python open-source yang memungkinkan Kamu untuk menilai keadilan sistem Kamu dan mengurangi ketidakadilan.
+
+*Tool* ini membantu Kamu menilai bagaimana prediksi model memengaruhi kelompok yang berbeda, memungkinkan Kamu untuk membandingkan beberapa model dengan menggunakan metrik keadilan dan kinerja, dan menyediakan serangkaian algoritma untuk mengurangi ketidakadilan dalam klasifikasi biner dan regresi.
+
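+Sebagai contoh, berikut sketsa minimal mitigasi dengan pendekatan *reductions* dari Fairlearn (hanya ilustrasi dengan data mainan hipotetis, bukan implementasi definitif):
+
+```python
+from fairlearn.reductions import ExponentiatedGradient, DemographicParity
+from sklearn.linear_model import LogisticRegression
+
+# data mainan (hipotetis): fitur, label, dan fitur sensitif
+X = [[20, 1], [25, 0], [30, 1], [35, 0], [40, 1], [45, 0]]
+y = [0, 1, 0, 1, 1, 1]
+jenis_kelamin = ["wanita", "pria", "wanita", "pria", "wanita", "pria"]
+
+# latih model yang dimitigasi dengan batasan demographic parity
+mitigator = ExponentiatedGradient(
+    estimator=LogisticRegression(),
+    constraints=DemographicParity(),
+)
+mitigator.fit(X, y, sensitive_features=jenis_kelamin)
+print(mitigator.predict([[28, 1]]))  # prediksi dari model yang dimitigasi
+```
+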
+- Pelajari bagaimana cara menggunakan komponen-komponen yang berbeda dengan mengunjungi [GitHub](https://github.com/fairlearn/fairlearn/) Fairlearn
+
+- Jelajahi [panduan pengguna](https://fairlearn.github.io/main/user_guide/index.html), [contoh-contoh](https://fairlearn.github.io/main/auto_examples/index.html)
+
+- Coba beberapa [sampel notebook](https://github.com/fairlearn/fairlearn/tree/master/notebooks).
+
+- Pelajari [bagaimana cara mengaktifkan penilaian keadilan](https://docs.microsoft.com/azure/machine-learning/how-to-machine-learning-fairness-aml?WT.mc_id=academic-15963-cxa) dari model machine learning di Azure Machine Learning.
+
+- Lihat [sampel notebook](https://github.com/Azure/MachineLearningNotebooks/tree/master/contrib/fairness) ini untuk skenario penilaian keadilan yang lebih banyak di Azure Machine Learning.
+
+---
+## 🚀 Tantangan
+
+Untuk mencegah kemunculan bias pada awalnya, kita harus:
+
+- memiliki keragaman latar belakang dan perspektif di antara orang-orang yang bekerja pada sistem
+- berinvestasi dalam dataset yang mencerminkan keragaman masyarakat kita
+- mengembangkan metode yang lebih baik untuk mendeteksi dan mengoreksi bias ketika itu terjadi
+
+Pikirkan tentang skenario kehidupan nyata di mana ketidakadilan terbukti dalam pembuatan dan penggunaan model. Apa lagi yang harus kita pertimbangkan?
+
+## [Quiz Pasca-Pelajaran](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/6/)
+## Ulasan & Belajar Mandiri
+
+Dalam pelajaran ini, Kamu telah mempelajari beberapa dasar konsep keadilan dan ketidakadilan dalam pembelajaran mesin.
+
+Tonton workshop ini untuk menyelami topik ini lebih dalam:
+
+- YouTube: Kerugian terkait keadilan dalam sistem AI: Contoh, penilaian, dan mitigasi oleh Hanna Wallach dan Miro Dudik [Kerugian terkait keadilan dalam sistem AI: Contoh, penilaian, dan mitigasi - YouTube](https://www.youtube.com/watch?v=1RptHwfkx_k)
+
+Kamu juga dapat membaca:
+
+- Pusat sumber daya RAI Microsoft: [Responsible AI Resources – Microsoft AI](https://www.microsoft.com/ai/responsible-ai-resources?activetab=pivot1%3aprimaryr4)
+
+- Grup riset FATE Microsoft: [FATE: Fairness, Accountability, Transparency, and Ethics in AI - Microsoft Research](https://www.microsoft.com/research/theme/fate/)
+
+Jelajahi *toolkit* Fairlearn
+
+[Fairlearn](https://fairlearn.org/)
+
+Baca mengenai *tools* Azure Machine Learning untuk memastikan keadilan
+
+- [Azure Machine Learning](https://docs.microsoft.com/azure/machine-learning/concept-fairness-ml?WT.mc_id=academic-15963-cxa)
+
+## Tugas
+
+[Jelajahi Fairlearn](assignment.id.md)
diff --git a/1-Introduction/3-fairness/translations/README.ja.md b/1-Introduction/3-fairness/translations/README.ja.md
index e8448359..9bb32639 100644
--- a/1-Introduction/3-fairness/translations/README.ja.md
+++ b/1-Introduction/3-fairness/translations/README.ja.md
@@ -3,7 +3,7 @@

> [Tomomi Imura](https://www.twitter.com/girlie_mac)によるスケッチ
-## [Pre-lecture quiz](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/5/)
+## [Pre-lecture quiz](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/5?loc=ja)
## イントロダクション
@@ -178,7 +178,7 @@ AIや機械学習における公平性の保証は、依然として複雑な社
モデルの構築や使用において、不公平が明らかになるような現実のシナリオを考えてみてください。他にどのようなことを考えるべきでしょうか?
-## [Post-lecture quiz](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/6/)
+## [Post-lecture quiz](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/6?loc=ja)
## Review & Self Study
このレッスンでは、機械学習における公平、不公平の概念の基礎を学びました。
@@ -201,4 +201,4 @@ Azure Machine Learningによる、公平性を確保するためのツールに
## 課題
-[Fairlearnを調査する](../assignment.md)
+[Fairlearnを調査する](./assignment.ja.md)
diff --git a/1-Introduction/3-fairness/translations/README.zh-cn.md b/1-Introduction/3-fairness/translations/README.zh-cn.md
index 02f41777..22204544 100644
--- a/1-Introduction/3-fairness/translations/README.zh-cn.md
+++ b/1-Introduction/3-fairness/translations/README.zh-cn.md
@@ -89,11 +89,11 @@
✅ **讨论**:重温一些例子,看看它们是否显示出不同的危害。
-| | 分配 | 服务质量 | 刻板印象 | 诋毁 | 代表性过高或过低 |
-| ----------------------- | :--------: | :----------------: | :----------: | :---------: | :----------------------------: |
-| 自动招聘系统 | x | x | x | | x |
-| 机器翻译 | | | | | |
-| 照片加标签 | | | | | |
+| | 分配 | 服务质量 | 刻板印象 | 诋毁 | 代表性过高或过低 |
+| ------------ | :---: | :------: | :------: | :---: | :--------------: |
+| 自动招聘系统 | x | x | x | | x |
+| 机器翻译 | | | | | |
+| 照片加标签 | | | | | |
## 检测不公平
@@ -138,14 +138,14 @@
✅ 在以后关于聚类的课程中,你将看到如何在代码中构建这个“混淆矩阵”
-| | 假阳性率 | 假阴性率 | 数量 |
-| ---------- | ------------------- | ------------------- | ----- |
-| 女性 | 0.37 | 0.27 | 54032 |
-| 男性 | 0.31 | 0.35 | 28620 |
-| 未列出性别 | 0.33 | 0.31 | 1266 |
+| | 假阳性率 | 假阴性率 | 数量 |
+| ---------- | -------- | -------- | ----- |
+| 女性 | 0.37 | 0.27 | 54032 |
+| 男性 | 0.31 | 0.35 | 28620 |
+| 未列出性别 | 0.33 | 0.31 | 1266 |
-这张桌子告诉我们几件事。首先,我们注意到数据中的未列出性别的人相对较少。数据是有偏差的,所以你需要小心解释这些数字。
+这个表格告诉我们几件事。首先,我们注意到数据中的未列出性别的人相对较少。数据是有偏差的,所以你需要小心解释这些数字。
在本例中,我们有3个组和2个度量。当我们考虑我们的系统如何影响贷款申请人的客户群时,这可能就足够了,但是当你想要定义更多的组时,你可能需要将其提取到更小的摘要集。为此,你可以添加更多的度量,例如每个假阴性和假阳性的最大差异或最小比率。
@@ -211,4 +211,4 @@
## 任务
-[探索Fairlearn](../assignment.md)
+[探索Fairlearn](assignment.zh-cn.md)
diff --git a/1-Introduction/3-fairness/translations/assignment.es.md b/1-Introduction/3-fairness/translations/assignment.es.md
new file mode 100644
index 00000000..cf83256e
--- /dev/null
+++ b/1-Introduction/3-fairness/translations/assignment.es.md
@@ -0,0 +1,11 @@
+# Explore Fairlearn
+
+## Instrucciones
+
+En esta lección, aprendió sobre Fairlearn, un "proyecto open-source impulsado por la comunidad para ayudar a los científicos de datos a mejorar la equidad de los sistemas de AI." Para esta tarea, explore uno de los [cuadernos](https://fairlearn.org/v0.6.2/auto_examples/index.html) de Fairlearn e informe sus hallazgos en un documento o presentación.
+
+## Rúbrica
+
+| Criterios | Ejemplar | Adecuado | Necesita mejorar |
+| -------- | --------- | -------- | ----------------- |
+| | Se presenta un documento o una presentación de PowerPoint que discute los sistemas de Fairlearn, el cuaderno que fue ejecutado y las conclusiones extraídas al ejecutarlo | Se presenta un documento sin conclusiones | No se presenta ningún documento |
diff --git a/1-Introduction/3-fairness/translations/assignment.id.md b/1-Introduction/3-fairness/translations/assignment.id.md
new file mode 100644
index 00000000..90389a14
--- /dev/null
+++ b/1-Introduction/3-fairness/translations/assignment.id.md
@@ -0,0 +1,11 @@
+# Jelajahi Fairlearn
+
+## Instruksi
+
+Dalam pelajaran ini kamu telah belajar mengenai Fairlearn, sebuah "proyek *open-source* berbasis komunitas untuk membantu para *data scientist* meningkatkan keadilan dari sistem AI." Untuk penugasan kali ini, jelajahi salah satu dari [notebook](https://fairlearn.org/v0.6.2/auto_examples/index.html) yang disediakan Fairlearn dan laporkan penemuanmu dalam sebuah paper atau presentasi.
+
+## Rubrik
+
+| Kriteria | Sangat Bagus | Cukup | Perlu Peningkatan |
+| -------- | --------- | -------- | ----------------- |
+| | Sebuah *paper* atau presentasi powerpoint yang membahas sistem Fairlearn, *notebook* yang dijalankan, dan kesimpulan yang diambil dari hasil menjalankannya | Sebuah paper yang dipresentasikan tanpa kesimpulan | Tidak ada paper yang dipresentasikan |
diff --git a/1-Introduction/3-fairness/translations/assignment.ja.md b/1-Introduction/3-fairness/translations/assignment.ja.md
new file mode 100644
index 00000000..dbf7b2b4
--- /dev/null
+++ b/1-Introduction/3-fairness/translations/assignment.ja.md
@@ -0,0 +1,11 @@
+# Fairlearnを調査する
+
+## 指示
+
+このレッスンでは、「データサイエンティストがAIシステムの公平性を向上させるための、オープンソースでコミュニティ主導のプロジェクト」であるFairlearnについて学習しました。この課題では、Fairlearnの [ノートブック](https://fairlearn.org/v0.6.2/auto_examples/index.html) のうちのひとつを調査し、わかったことをレポートやプレゼンテーションの形で報告してください。
+
+## 評価基準
+
+| 基準 | 模範的 | 十分 | 要改善 |
+| ---- | --------------------------------------------------------------------------------------------------------------------------------------- | ---------------------------------- | -------------------------- |
+| | Fairlearnのシステム・実行したノートブック・実行によって得られた結果が、レポートやパワーポイントのプレゼンテーションとして提示されている | 結論のないレポートが提示されている | レポートが提示されていない |
diff --git a/1-Introduction/3-fairness/translations/assignment.zh-cn.md b/1-Introduction/3-fairness/translations/assignment.zh-cn.md
new file mode 100644
index 00000000..a8124199
--- /dev/null
+++ b/1-Introduction/3-fairness/translations/assignment.zh-cn.md
@@ -0,0 +1,11 @@
+# 探索 Fairlearn
+
+## 说明
+
+在这节课中,你了解了 Fairlearn,一个“开源的,社区驱动的项目,旨在帮助数据科学家们提高人工智能系统的公平性”。在这项作业中,探索 Fairlearn [笔记本](https://fairlearn.org/v0.6.2/auto_examples/index.html)中的一个例子,之后你可以用论文或者 ppt 的形式叙述你学习后的发现。
+
+## 评判标准
+
+| 标准 | 优秀 | 中规中矩 | 仍需努力 |
+| -------- | --------- | -------- | ----------------- |
+| | 提交了一篇论文或者 ppt,讨论了 Fairlearn 系统、所运行的例子,以及运行后得出的结论 | 提交了一篇没有结论的论文 | 没有提交论文 |
diff --git a/1-Introduction/4-techniques-of-ML/README.md b/1-Introduction/4-techniques-of-ML/README.md
index ae9d2c44..70b96000 100644
--- a/1-Introduction/4-techniques-of-ML/README.md
+++ b/1-Introduction/4-techniques-of-ML/README.md
@@ -4,8 +4,9 @@ The process of building, using, and maintaining machine learning models and the
- Understand the processes underpinning machine learning at a high level.
- Explore base concepts such as 'models', 'predictions', and 'training data'.
-
+
## [Pre-lecture quiz](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/7/)
+
## Introduction
On a high level, the craft of creating machine learning (ML) processes is comprised of a number of steps:
@@ -39,14 +40,20 @@ To be able to answer your question with any kind of certainty, you need a good a
✅ After collecting and processing your data, take a moment to see if its shape will allow you to address your intended question. It may be that the data will not perform well in your given task, as we discover in our [Clustering](../../5-Clustering/1-Visualize/README.md) lessons!
-### Selecting your feature variable
+### Features and Target
+
+A [feature](https://www.datasciencecentral.com/profiles/blogs/an-introduction-to-variable-and-feature-selection) is a measurable property of your data. In many datasets it is expressed as a column heading like 'date', 'size' or 'color'. Your feature variable, usually represented as `X` in code, represents the input that will be used to train the model.
+
+A target is the thing you are trying to predict. The target, usually represented as `y` in code, represents the answer to the question you are trying to ask of your data: in December, what **color** pumpkins will be cheapest? In San Francisco, which neighborhoods will have the best real estate **price**? Sometimes the target is also referred to as the label attribute.
-A [feature](https://www.datasciencecentral.com/profiles/blogs/an-introduction-to-variable-and-feature-selection) is a measurable property of your data. In many datasets it is expressed as a column heading like 'date' 'size' or 'color'. Your feature variable, usually represented as `y` in code, represents the answer to the question you are trying to ask of your data: in December, what **color** pumpkins will be cheapest? in San Francisco, what neighborhoods will have the best real estate **price**?
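+
+As a rough sketch of how this might look in code (the tiny pumpkin DataFrame below is made up purely for illustration):
+
+```python
+import pandas as pd
+
+# a tiny, made-up pumpkin dataset
+pumpkins = pd.DataFrame({
+    "Month": [9, 9, 10, 10, 11],
+    "Year":  [2016, 2017, 2016, 2017, 2017],
+    "Price": [15.0, 13.5, 16.25, 14.0, 17.5],
+})
+
+X = pumpkins[["Month", "Year"]]  # features: the model's input
+y = pumpkins["Price"]            # target: the value to predict
+```
+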
+### Selecting your feature variable
🎓 **Feature Selection and Feature Extraction** How do you know which variable to choose when building a model? You'll probably go through a process of feature selection or feature extraction to choose the right variables for the most performant model. They're not the same thing, however: "Feature extraction creates new features from functions of the original features, whereas feature selection returns a subset of the features." ([source](https://wikipedia.org/wiki/Feature_selection))
+
### Visualize your data
-An important aspect of the data scientist's toolkit is the power to visualize data using several excellent libraries such as Seaborn or MatPlotLib. Representing your data visually might allow you to uncover hidden correlations that you can leverage. Your visualizations might also help you to uncover bias or unbalanced data (as we discover in [Classification](../../4-Classification/2-Classifiers-1/README.md)).
+An important aspect of the data scientist's toolkit is the power to visualize data using several excellent libraries such as Seaborn or MatPlotLib. Representing your data visually might allow you to uncover hidden correlations that you can leverage. Your visualizations might also help you to uncover bias or unbalanced data (as we discover in [Classification](../../4-Classification/2-Classifiers-1/README.md)).
+
### Split your dataset
Prior to training, you need to split your dataset into two or more parts of unequal size that still represent the data well.
@@ -61,10 +68,12 @@ Using your training data, your goal is to build a model, or a statistical repres
### Decide on a training method
-Depending on your question and the nature of your data, your will choose a method to train it. Stepping through [Scikit-learn's documentation](https://scikit-learn.org/stable/user_guide.html) - which we use in this course - you can explore many ways to train a model. Depending on your experience, you might have to try several different methods to build the best model. You are likely to go through a process whereby data scientists evaluate the performance of a model by feeding it unseen data, checking for accuracy, bias, and other quality-degrading issues, and selecting the most appropriate training method for the task at hand.
+Depending on your question and the nature of your data, you will choose a method to train it. Stepping through [Scikit-learn's documentation](https://scikit-learn.org/stable/user_guide.html) - which we use in this course - you can explore many ways to train a model. Depending on your experience, you might have to try several different methods to build the best model. You are likely to go through a process whereby data scientists evaluate the performance of a model by feeding it unseen data, checking for accuracy, bias, and other quality-degrading issues, and selecting the most appropriate training method for the task at hand.
+
### Train a model
-Armed with your training data, you are ready to 'fit' it to create a model. You will notice that in many ML libraries you will find the code 'model.fit' - it is at this time that you send in your data as an array of values (usually 'X') and a feature variable (usually 'y').
+Armed with your training data, you are ready to 'fit' it to create a model. You will notice that in many ML libraries you will find the code 'model.fit' - it is at this time that you send in your feature variable as an array of values (usually 'X') and a target variable (usually 'y').
+
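+As a minimal sketch, using Scikit-learn's `LinearRegression` purely as an example and assuming the `X` and `y` from the sketch above:
+
+```python
+from sklearn.linear_model import LinearRegression
+from sklearn.model_selection import train_test_split
+
+# hold out 20% of the data to test the model later
+X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
+
+model = LinearRegression()
+model.fit(X_train, y_train)  # fit the model to the training data
+```
+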
### Evaluate the model
Once the training process is complete (it can take many iterations, or 'epochs', to train a large model), you will be able to evaluate the model's quality by using test data to gauge its performance. This data is a subset of the original data that the model has not previously analyzed. You can print out a table of metrics about your model's quality.
diff --git a/1-Introduction/4-techniques-of-ML/translations/README.es.md b/1-Introduction/4-techniques-of-ML/translations/README.es.md
old mode 100644
new mode 100755
index e69de29b..81052021
--- a/1-Introduction/4-techniques-of-ML/translations/README.es.md
+++ b/1-Introduction/4-techniques-of-ML/translations/README.es.md
@@ -0,0 +1,107 @@
+# Técnicas de Machine Learning
+
+El proceso de creación, uso y mantenimiento de modelos de machine learning, y los datos que se utilizan, es un proceso muy diferente de muchos otros flujos de trabajo de desarrollo. En esta lección, desmitificaremos el proceso y describiremos las principales técnicas que necesita saber. Vas a:
+
+- Comprender los procesos que sustentan el machine learning a un alto nivel.
+- Explorar conceptos básicos como 'modelos', 'predicciones', y 'datos de entrenamiento'.
+
+
+## [Cuestionario previo a la conferencia](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/7/)
+## Introducción
+
+A un alto nivel, el arte de crear procesos de machine learning (ML) se compone de una serie de pasos:
+
+1. **Decidir sobre la pregunta**. La mayoría de los procesos de ML comienzan por hacer una pregunta que no puede ser respondida por un simple programa condicional o un motor basado en reglas. Esas preguntas a menudo giran en torno a predicciones basadas en una recopilación de datos.
+2. **Recopile y prepare datos**. Para poder responder a su pregunta, necesita datos. La calidad y, a veces, la cantidad de sus datos determinarán qué tan bien puede responder a su pregunta inicial. La visualización de datos es un aspecto importante de esta fase. Esta fase también incluye dividir los datos en un grupo de entrenamiento y uno de pruebas para construir un modelo.
+3. **Elige un método de entrenamiento**. Dependiendo de su pregunta y la naturaleza de sus datos, debe elegir cómo desea entrenar un modelo para reflejar mejor sus datos y hacer predicciones precisas contra ellos. Esta es la parte de su proceso de ML que requiere experiencia específica y, a menudo, una cantidad considerable de experimentación.
+4. **Entrena el modelo**. Usando sus datos de entrenamiento, usará varios algoritmos para entrenar un modelo para reconocer patrones en los datos. El modelo puede aprovechar las ponderaciones internas que se pueden ajustar para privilegiar ciertas partes de los datos sobre otras para construir un modelo mejor.
+5. **Evaluar el modelo**. Utiliza datos nunca antes vistos (sus datos de prueba) de su conjunto recopilado para ver cómo se está desempeñando el modelo.
+6. **Ajuste de parámetros**. Según el rendimiento de su modelo, puede rehacer el proceso utilizando diferentes parámetros, o variables, que controlan el comportamiento de los algoritmos utilizados para entrenar el modelo.
+7. **Predecir**. Utilice nuevas entradas para probar la precisión de su modelo.
+
+## Qué pregunta hacer
+
+Las computadoras son particularmente hábiles para descubrir patrones ocultos en los datos. Esta capacidad es muy útil para los investigadores que tienen preguntas sobre un dominio determinado que no pueden responderse fácilmente mediante la creación de un motor de reglas basado en condicionales. Dada una tarea actuarial, por ejemplo, un científico de datos podría construir reglas artesanales sobre la mortalidad de los fumadores frente a los no fumadores.
+
+Sin embargo, cuando se incorporan muchas otras variables a la ecuación, un modelo de ML podría resultar más eficiente para predecir las tasas de mortalidad futuras en función de los antecedentes de salud. Un ejemplo más alegre podría ser hacer predicciones meteorológicas para el mes de abril en una ubicación determinada que incluya latitud, longitud, cambio climático, proximidad al océano, patrones de la corriente en chorro, y más.
+
+✅ Esta [presentación de diapositivas](https://www2.cisl.ucar.edu/sites/default/files/0900%20June%2024%20Haupt_0.pdf) sobre modelos meteorológicos ofrece una perspectiva histórica del uso de ML en el análisis meteorológico.
+
+## Tareas previas a la construcción
+
+Antes de comenzar a construir su modelo, hay varias tareas que debe completar. Para probar su pregunta y formar una hipótesis basada en las predicciones de su modelo, debe identificar y configurar varios elementos.
+
+### Datos
+
+Para poder responder su pregunta con cualquier tipo de certeza, necesita una buena cantidad de datos del tipo correcto.
+Hay dos cosas que debe hacer en este punto:
+
+- **Recolectar datos**. Teniendo en cuenta la lección anterior sobre la equidad en el análisis de datos, recopile sus datos con cuidado. Tenga en cuenta la fuente de estos datos, cualquier sesgo inherente que pueda tener y documente su origen.
+- **Preparar datos**. Hay varios pasos en el proceso de preparación de datos. Podría necesitar recopilar datos y normalizarlos si provienen de diversas fuentes. Puede mejorar la calidad y cantidad de los datos mediante varios métodos, como convertir strings en números (como hacemos en [Clustering](../../../5-Clustering/1-Visualize/README.md)). También puede generar nuevos datos, basados en los originales (como hacemos en [Clasificación](../../../4-Classification/1-Introduction/README.md)). Puede limpiar y editar los datos (como lo haremos antes de la lección [Web App](../../../3-Web-App/README.md)). Por último, es posible que también deba aleatorizarlos y mezclarlos, según sus técnicas de entrenamiento.
+
+✅ Después de recopilar y procesar sus datos, tómese un momento para ver si su forma le permitirá responder a su pregunta. ¡Puede ser que los datos no funcionen bien en su tarea dada, como descubriremos en nuestras lecciones de [Clustering](../../../5-Clustering/1-Visualize/README.md)!
+
+### Seleccionando su variable característica
+
+Una [característica](https://www.datasciencecentral.com/profiles/blogs/an-introduction-to-variable-and-feature-selection) es una propiedad medible de sus datos. En muchos conjuntos de datos, se expresa como un encabezado de columna como 'fecha', 'tamaño' o 'color'. Su variable característica, generalmente representada como `X` en el código, representa la entrada que se utilizará para entrenar el modelo. El objetivo (*target*), generalmente representado como `y` en el código, representa la respuesta a la pregunta que está tratando de hacer a sus datos: en diciembre, ¿qué calabazas de **color** serán las más baratas?, en San Francisco, ¿qué vecindarios tendrán el mejor **precio** de bienes raíces?
+
+🎓 **Selección y extracción de características** ¿Cómo sabe qué variable elegir al construir un modelo? Probablemente pasará por un proceso de selección o extracción de características para elegir las variables correctas para un mayor rendimiento del modelo. Sin embargo, no son lo mismo: "La extracción de características crea nuevas características a partir de funciones de las características originales, mientras que la selección de características devuelve un subconjunto de las características." ([fuente](https://wikipedia.org/wiki/Feature_selection))
+
+### Visualiza tus datos
+
+Un aspecto importante del conjunto de herramientas del científico de datos es el poder de visualizar datos utilizando varias bibliotecas excelentes como Seaborn o MatPlotLib. Representar sus datos visualmente puede permitirle descubrir correlaciones ocultas que puede aprovechar. Sus visualizaciones también pueden ayudarlo a descubrir sesgos o datos desequilibrados (como descubrimos en [Clasificación](../../../4-Classification/2-Classifiers-1/README.md)).
+
+### Divide tu conjunto de datos
+
+Antes del entrenamiento, debe dividir su conjunto de datos en dos o más partes de tamaño desigual que aún representen bien los datos.
+
+- **Entrenamiento**. Esta parte del conjunto de datos se ajusta a su modelo para entrenarlo. Este conjunto constituye la mayor parte del conjunto de datos original.
+- **Pruebas**. Un conjunto de datos de pruebas es un grupo independiente de datos, a menudo recopilado a partir de los datos originales, que se utiliza para confirmar el rendimiento del modelo construido.
+- **Validación**. Un conjunto de validación es un pequeño grupo independiente de ejemplos que se usa para ajustar los hiperparámetros o la arquitectura del modelo para mejorarlo. Dependiendo del tamaño de su conjunto de datos y de la pregunta que se está haciendo, es posible que no necesite crear este tercer conjunto (como notamos en [Pronóstico de series de tiempo](../../../7-TimeSeries/1-Introduction/README.md)).
+
+## Construye un modelo
+
+Usando sus datos de entrenamiento, su objetivo es construir un modelo, o una representación estadística de sus datos, usando varios algoritmos para **entrenarlo**. El entrenamiento de un modelo lo expone a los datos y le permite hacer suposiciones sobre los patrones percibidos que descubre, valida y rechaza.
+
+### Decide un método de entrenamiento
+
+Dependiendo de su pregunta y la naturaleza de sus datos, elegirá un método para entrenarlos. Pasando por la [documentación de Scikit-learn](https://scikit-learn.org/stable/user_guide.html) - que usamos en este curso - puede explorar muchas formas de entrenar un modelo. Dependiendo de su experiencia, es posible que deba probar varios métodos diferentes para construir el mejor modelo. Es probable que pase por un proceso en el que los científicos de datos evalúan el rendimiento de un modelo alimentándolo con datos no vistos anteriormente, verificando la precisión, el sesgo y otros problemas que degradan la calidad, y seleccionando el método de entrenamiento más apropiado para la tarea en cuestión.
+### Entrena un modelo
+
+Armado con sus datos de entrenamiento, está listo para 'ajustarlos' (fit) y crear un modelo. Notará que en muchas bibliotecas de ML encontrará el código 'model.fit' - es en este momento cuando envía su variable característica como una matriz de valores (generalmente 'X') y una variable objetivo (generalmente 'y').
+### Evaluar el modelo
+
+Una vez que se completa el proceso de entrenamiento (puede tomar muchas iteraciones, o 'épocas', entrenar un modelo de gran tamaño), podrá evaluar la calidad del modelo utilizando datos de prueba para medir su rendimiento. Estos datos son un subconjunto de los datos originales que el modelo no ha analizado previamente. Puede imprimir una tabla de métricas sobre la calidad de su modelo.
+
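+Como referencia, un boceto mínimo que usa un conjunto de datos de ejemplo incorporado en Scikit-learn (solo a modo de ilustración, no es el código de la lección):
+
+```python
+from sklearn.datasets import load_iris
+from sklearn.linear_model import LogisticRegression
+from sklearn.metrics import classification_report
+from sklearn.model_selection import train_test_split
+
+X, y = load_iris(return_X_y=True)  # conjunto de datos de ejemplo
+X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2)
+
+model = LogisticRegression(max_iter=1000).fit(X_train, y_train)
+y_pred = model.predict(X_test)                # predicciones sobre datos no vistos
+print(classification_report(y_test, y_pred))  # tabla de métricas de calidad
+```
+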
+🎓 **Ajuste del modelo (Model fitting)**
+
+En el contexto del machine learning, el ajuste del modelo se refiere a la precisión de la función subyacente del modelo cuando intenta analizar datos con los que no está familiarizado.
+
+🎓 **Ajuste insuficiente (Underfitting)** y **sobreajuste (overfitting)** son problemas comunes que degradan la calidad del modelo, ya que el modelo no se ajusta suficientemente bien, o se ajusta demasiado bien. Esto hace que el modelo haga predicciones demasiado alineadas o muy poco alineadas con sus datos de entrenamiento. Un modelo sobreajustado (overfitting) predice demasiado bien los datos de entrenamiento porque ha aprendido demasiado bien los detalles y el ruido de los datos. Un modelo insuficientemente ajustado (underfitting) no es preciso, ya que no puede analizar con precisión ni sus datos de entrenamiento ni los datos que aún no ha 'visto'.
+
+
+> Infografía de [Jen Looper](https://twitter.com/jenlooper)
+
+## Ajuste de parámetros
+
+Una vez que haya completado su entrenamiento inicial, observe la calidad del modelo y considere mejorarlo ajustando sus 'hiperparámetros'. Lea más sobre el proceso [en la documentación](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-tune-hyperparameters?WT.mc_id=academic-15963-cxa).
+
+## Predicción
+
+Este es el momento en el que puede usar datos completamente nuevos para probar la precisión de su modelo. En una configuración de ML aplicada, donde está creando activos web para usar el modelo en producción, este proceso puede implicar la recopilación de la entrada del usuario (presionar un botón, por ejemplo) para establecer una variable y enviarla al modelo para la inferencia, o evaluación.
+
+En estas lecciones, descubrirá cómo utilizar estos pasos para preparar, construir, probar, evaluar, y predecir - todos los gestos de un científico de datos y más, a medida que avanza en su viaje para convertirse en un ingeniero de machine learning 'full stack'.
+
+---
+
+## 🚀Desafío
+
+Dibuje un diagrama de flujo que refleje los pasos de un practicante de ML. ¿Dónde te ves ahora mismo en el proceso? ¿Dónde predices que encontrarás dificultades? ¿Qué te parece fácil?
+
+## [Cuestionario posterior a la conferencia](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/8/)
+
+## Revisión & Autoestudio
+
+Busque en línea entrevistas con científicos de datos que analicen su trabajo diario. Aquí está [uno](https://www.youtube.com/watch?v=Z3IjgbbCEfs).
+
+## Asignación
+
+[Entrevistar a un científico de datos](assignment.md)
\ No newline at end of file
diff --git a/1-Introduction/4-techniques-of-ML/translations/README.id.md b/1-Introduction/4-techniques-of-ML/translations/README.id.md
new file mode 100644
index 00000000..6fee4f14
--- /dev/null
+++ b/1-Introduction/4-techniques-of-ML/translations/README.id.md
@@ -0,0 +1,105 @@
+# Teknik-teknik Machine Learning
+
+Proses membangun, menggunakan, dan memelihara model machine learning dan data yang digunakan adalah proses yang sangat berbeda dari banyak alur kerja pengembangan lainnya. Dalam pelajaran ini, kita akan mengungkap prosesnya dan menguraikan teknik utama yang perlu Kamu ketahui. Kamu akan:
+
+- Memahami gambaran dari proses yang mendasari machine learning.
+- Menjelajahi konsep dasar seperti '*models*', '*predictions*', dan '*training data*'.
+
+## [Quiz Pra-Pelajaran](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/7/)
+## Pengantar
+
+Gambaran membuat proses machine learning (ML) terdiri dari sejumlah langkah:
+
+1. **Menentukan pertanyaan**. Sebagian besar proses ML dimulai dengan mengajukan pertanyaan yang tidak dapat dijawab oleh program kondisional sederhana atau mesin berbasis aturan (*rules-based engine*). Pertanyaan-pertanyaan ini sering berkisar seputar prediksi berdasarkan kumpulan data.
+2. **Mengumpulkan dan menyiapkan data**. Untuk dapat menjawab pertanyaanmu, Kamu memerlukan data. Bagaimana kualitas dan terkadang kuantitas data kamu akan menentukan seberapa baik kamu dapat menjawab pertanyaan awal kamu. Memvisualisasikan data merupakan aspek penting dari fase ini. Fase ini juga mencakup pemisahan data menjadi kelompok *training* dan *testing* untuk membangun model.
+3. **Memilih metode training**. Tergantung dari pertanyaan dan sifat datamu, Kamu perlu memilih bagaimana kamu ingin men-training sebuah model untuk mencerminkan data kamu dengan baik dan membuat prediksi yang akurat terhadapnya. Ini adalah bagian dari proses ML yang membutuhkan keahlian khusus dan seringkali perlu banyak eksperimen.
+4. **Melatih model**. Dengan menggunakan data *training*, kamu akan menggunakan berbagai algoritma untuk melatih model guna mengenali pola dalam data. Modelnya mungkin bisa memanfaatkan *internal weight* yang dapat disesuaikan untuk memberi hak istimewa pada bagian tertentu dari data dibandingkan bagian lainnya untuk membangun model yang lebih baik.
+5. **Mengevaluasi model**. Gunakan data yang belum pernah dilihat sebelumnya (data *testing*) untuk melihat bagaimana kinerja model.
+6. **Parameter tuning**. Berdasarkan kinerja modelmu, Kamu dapat mengulang prosesnya menggunakan parameter atau variabel yang berbeda, yang mengontrol perilaku algoritma yang digunakan untuk melatih model.
+7. **Prediksi**. Gunakan input baru untuk menguji keakuratan model kamu.
+
+## Pertanyaan apa yang harus ditanyakan?
+
+Komputer sangat ahli dalam menemukan pola tersembunyi dalam data. Hal ini sangat membantu peneliti yang memiliki pertanyaan tentang domain tertentu yang tidak dapat dijawab dengan mudah dari hanya membuat mesin berbasis aturan kondisional (*conditionally-based rules engine*). Untuk tugas aktuaria misalnya, seorang data scientist mungkin dapat membuat aturan secara manual seputar mortalitas perokok vs non-perokok.
+
+Namun, ketika banyak variabel lain dimasukkan ke dalam persamaan, model ML mungkin terbukti lebih efisien untuk memprediksi tingkat mortalitas di masa depan berdasarkan riwayat kesehatan masa lalu. Contoh yang lebih menyenangkan mungkin membuat prediksi cuaca untuk bulan April di lokasi tertentu berdasarkan data yang mencakup garis lintang, garis bujur, perubahan iklim, kedekatan dengan laut, pola aliran udara (Jet Stream), dan banyak lagi.
+
+✅ [Slide deck](https://www2.cisl.ucar.edu/sites/default/files/0900%20June%2024%20Haupt_0.pdf) ini menawarkan perspektif historis pada model cuaca dengan menggunakan ML dalam analisis cuaca.
+
+## Tugas Pra-Pembuatan
+
+Sebelum mulai membangun model kamu, ada beberapa tugas yang harus kamu selesaikan. Untuk menguji pertanyaan kamu dan membentuk hipotesis berdasarkan prediksi model, Kamu perlu mengidentifikasi dan mengonfigurasi beberapa elemen.
+
+### Data
+
+Untuk dapat menjawab pertanyaan kamu dengan kepastian, Kamu memerlukan sejumlah besar data dengan jenis yang tepat. Ada dua hal yang perlu kamu lakukan pada saat ini:
+
+- **Mengumpulkan data**. Ingat pelajaran sebelumnya tentang keadilan dalam analisis data, kumpulkan data kamu dengan hati-hati. Waspadai sumber datanya, bias bawaan apa pun yang mungkin dimiliki, dan dokumentasikan asalnya.
+- **Menyiapkan data**. Ada beberapa langkah dalam proses persiapan data. Kamu mungkin perlu menyusun data dan melakukan normalisasi jika berasal dari berbagai sumber. Kamu dapat meningkatkan kualitas dan kuantitas data melalui berbagai metode seperti mengonversi string menjadi angka (seperti yang kita lakukan di [Clustering](../../../5-Clustering/1-Visualize/translations/README.id.md)). Kamu mungkin juga bisa membuat data baru berdasarkan data yang asli (seperti yang kita lakukan di [Classification](../../../4-Classification/1-Introduction/translations/README.id.md)). Kamu bisa membersihkan dan mengubah data (seperti yang kita lakukan sebelum pelajaran [Web App](../../../3-Web-App/translations/README.id.md)). Terakhir, Kamu mungkin juga perlu mengacaknya dan mengubah urutannya, tergantung pada teknik *training* kamu.
+
+✅ Setelah mengumpulkan dan memproses data kamu, luangkan waktu sejenak untuk melihat apakah bentuknya memungkinkan kamu untuk menjawab pertanyaan yang kamu maksudkan. Mungkin data tidak akan berkinerja baik dalam tugas yang kamu berikan, seperti yang kita temukan dalam pelajaran [Clustering](../../../5-Clustering/1-Visualize/translations/README.id.md).
+
+### Memilih variabel fiturmu
+
+Sebuah [fitur](https://www.datasciencecentral.com/profiles/blogs/an-introduction-to-variable-and-feature-selection) adalah sebuah properti yang dapat diukur dalam data kamu. Dalam banyak dataset, properti dinyatakan sebagai sebuah heading kolom seperti 'date', 'size', atau 'color'. Variabel fitur kamu, biasanya direpresentasikan sebagai `X` dalam kode, mewakili input yang akan digunakan untuk melatih model. Target, biasanya direpresentasikan sebagai `y` dalam kode, mewakili jawaban atas pertanyaan yang kamu coba tanyakan tentang data kamu: pada bulan Desember, labu dengan **warna** apa yang akan paling murah? di San Francisco, lingkungan mana yang menawarkan **harga** real estate terbaik?
+
+🎓 **Feature Selection dan Feature Extraction** Bagaimana kamu tahu variabel mana yang harus dipilih saat membangun model? Kamu mungkin akan melalui proses pemilihan fitur (*Feature Selection*) atau ekstraksi fitur (*Feature Extraction*) untuk memilih variabel yang tepat untuk membuat model yang berkinerja paling baik. Namun, keduanya tidak sama: "Ekstraksi fitur membuat fitur baru dari fungsi fitur asli, sedangkan pemilihan fitur mengembalikan subset fitur." ([sumber](https://wikipedia.org/wiki/Feature_selection))
+### Visualisasikan datamu
+
+Aspek penting dari toolkit data scientist adalah kemampuan untuk memvisualisasikan data menggunakan beberapa *library* seperti Seaborn atau MatPlotLib. Merepresentasikan data kamu secara visual memungkinkan kamu mengungkap korelasi tersembunyi yang dapat kamu manfaatkan. Visualisasimu mungkin juga membantu kamu mengungkap data yang bias atau tidak seimbang (seperti yang kita temukan dalam [Classification](../../../4-Classification/2-Classifiers-1/translations/README.id.md)).
+### Membagi dataset
+
+Sebelum memulai *training*, Kamu perlu membagi dataset menjadi dua atau lebih bagian dengan ukuran yang tidak sama tapi masih mewakili data dengan baik.
+
+- **Training**. Bagian dataset ini digunakan untuk men-training model kamu. Bagian dataset ini merupakan mayoritas dari dataset asli.
+- **Testing**. Sebuah dataset tes adalah kelompok data independen, seringkali dikumpulkan dari data yang asli yang akan digunakan untuk mengkonfirmasi kinerja dari model yang dibuat.
+- **Validating**. Dataset validasi adalah kumpulan contoh mandiri yang lebih kecil yang kamu gunakan untuk menyetel hyperparameter atau arsitektur model untuk meningkatkan model. Tergantung dari ukuran data dan pertanyaan yang kamu ajukan, Kamu mungkin tidak perlu membuat dataset ketiga ini (seperti yang kita catat dalam [Time Series Forecasting](../../../7-TimeSeries/1-Introduction/translations/README.id.md)).
+
+## Membuat sebuah model
+
+Dengan menggunakan data *training*, tujuan kamu adalah membuat model atau representasi statistik data kamu menggunakan berbagai algoritma untuk **melatihnya**. Melatih model berarti mengeksposnya dengan data dan mengizinkannya membuat asumsi tentang pola yang ditemukan, divalidasi, dan diterima atau ditolak.
+
+### Tentukan metode training
+
+Tergantung dari pertanyaan dan sifat datamu, Kamu akan memilih metode untuk melatihnya. Buka dokumentasi [Scikit-learn](https://scikit-learn.org/stable/user_guide.html) yang kita gunakan dalam pelajaran ini, kamu bisa menjelajahi banyak cara untuk melatih sebuah model. Tergantung dari pengalamanmu, kamu mungkin perlu mencoba beberapa metode yang berbeda untuk membuat model yang terbaik. Kemungkinan kamu akan melalui proses di mana data scientist mengevaluasi kinerja model dengan memasukkan data yang belum pernah dilihat, memeriksa akurasi, bias, dan masalah penurunan kualitas lainnya, dan memilih metode training yang paling tepat untuk tugas yang ada.
+### Melatih sebuah model
+
+Berbekal data *training*, Kamu siap untuk menggunakannya untuk membuat model. Kamu akan melihat di banyak *library* ML mengenai kode 'model.fit' - pada saat inilah kamu mengirimkan variabel fitur kamu sebagai *array* nilai (biasanya 'X') dan variabel target (biasanya 'y').
+### Mengevaluasi model
+
+Setelah proses *training* selesai (ini mungkin membutuhkan banyak iterasi, atau 'epoch', untuk melatih model besar), Kamu akan dapat mengevaluasi kualitas model dengan menggunakan data tes untuk mengukur kinerjanya. Data ini merupakan subset dari data asli yang modelnya belum pernah dianalisis sebelumnya. Kamu dapat mencetak tabel metrik tentang kualitas model kamu.
+
+🎓 **Model fitting**
+
+Dalam konteks machine learning, *model fitting* mengacu pada keakuratan dari fungsi yang mendasari model saat mencoba menganalisis data yang tidak familiar.
+
+🎓 **Underfitting** dan **overfitting** adalah masalah umum yang menurunkan kualitas model, karena model tidak cukup akurat atau terlalu akurat. Hal ini menyebabkan model membuat prediksi yang terlalu selaras atau tidak cukup selaras dengan data trainingnya. Model overfit memprediksi data *training* terlalu baik karena telah mempelajari detail dan noise data dengan terlalu baik. Model underfit tidak akurat karena tidak dapat menganalisis data *training* atau data yang belum pernah dilihat sebelumnya secara akurat.
+
+
+> Infografis oleh [Jen Looper](https://twitter.com/jenlooper)
+
+## Parameter tuning
+
+Setelah *training* awal selesai, amati kualitas model dan pertimbangkan untuk meningkatkannya dengan mengubah 'hyperparameter' nya. Baca lebih lanjut tentang prosesnya [di dalam dokumentasi](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-tune-hyperparameters?WT.mc_id=academic-15963-cxa).
+
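+Sebagai gambaran, berikut sketsa minimal penyetelan *hyperparameter* dengan `GridSearchCV` dari Scikit-learn (hanya ilustrasi dengan dataset contoh bawaan, bukan kode pelajaran ini):
+
+```python
+from sklearn.datasets import load_iris
+from sklearn.linear_model import LogisticRegression
+from sklearn.model_selection import GridSearchCV
+
+X, y = load_iris(return_X_y=True)  # dataset contoh bawaan scikit-learn
+
+param_grid = {"C": [0.01, 0.1, 1, 10]}  # kandidat nilai hyperparameter
+search = GridSearchCV(LogisticRegression(max_iter=1000), param_grid, cv=5)
+search.fit(X, y)            # coba semua kombinasi dengan cross-validation
+print(search.best_params_)  # kombinasi terbaik yang ditemukan
+```
+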
+## Prediksi
+
+Ini adalah saat di mana Kamu dapat menggunakan data yang sama sekali baru untuk menguji akurasi model kamu. Dalam setelan ML 'terapan', di mana kamu membangun aset web untuk menggunakan modelnya dalam produksi, proses ini mungkin melibatkan pengumpulan input pengguna (misalnya menekan tombol) untuk menyetel variabel dan mengirimkannya ke model untuk inferensi, atau evaluasi.
+
+Dalam pelajaran ini, Kamu akan menemukan cara untuk menggunakan langkah-langkah ini untuk mempersiapkan, membangun, menguji, mengevaluasi, dan memprediksi - semua gestur data scientist dan banyak lagi, seiring kemajuanmu dalam perjalanan menjadi 'full stack' ML engineer.
+
+---
+
+## 🚀Tantangan
+
+Gambarlah sebuah flow chart yang mencerminkan langkah-langkah seorang praktisi ML. Di mana kamu melihat diri kamu saat ini dalam prosesnya? Di mana kamu memprediksi kamu akan menemukan kesulitan? Apa yang tampak mudah bagi kamu?
+
+## [Quiz Pasca-Pelajaran](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/8/)
+
+## Ulasan & Belajar Mandiri
+
+Cari di Internet mengenai wawancara dengan data scientist yang mendiskusikan pekerjaan sehari-hari mereka. Ini [salah satunya](https://www.youtube.com/watch?v=Z3IjgbbCEfs).
+
+## Tugas
+
+[Wawancara dengan data scientist](assignment.id.md)
diff --git a/1-Introduction/4-techniques-of-ML/translations/README.ja.md b/1-Introduction/4-techniques-of-ML/translations/README.ja.md
new file mode 100644
index 00000000..0e0a1bca
--- /dev/null
+++ b/1-Introduction/4-techniques-of-ML/translations/README.ja.md
@@ -0,0 +1,110 @@
+# 機械学習の手法
+
+機械学習モデルやそのモデルが使用するデータを構築・使用・管理するプロセスは、他の多くの開発ワークフローとは全く異なるものです。このレッスンでは、このプロセスを明快にして、知っておくべき主な手法の概要をまとめます。あなたは、
+
+- 機械学習を支えるプロセスを高い水準で理解します。
+- 「モデル」「予測」「訓練データ」などの基本的な概念を調べます。
+
+## [講義前の小テスト](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/7?loc=ja)
+
+## 導入
+
+大まかに言うと、機械学習 (Machine Learning: ML) プロセスを作成する技術はいくつかのステップで構成されています。
+
+1. **質問を決める**。ほとんどの機械学習プロセスは、単純な条件のプログラムやルールベースのエンジンでは答えられないような質問をすることから始まります。このような質問は、データの集合を使った予測を中心にされることが多いです。
+2. **データを集めて準備する**。質問に答えるためにはデータが必要です。データの質と、ときには量が、最初の質問にどれだけうまく答えられるかを決めます。データの可視化がこのフェーズの重要な側面です。モデルを構築するためにデータを訓練グループとテストグループに分けることもこのフェーズに含みます。
+3. **学習方法を選ぶ**。質問の内容やデータの性質に応じて、データを最も良く反映して正確に予測できるモデルを、どのように学習するかを選ぶ必要があります。これは機械学習プロセスの中でも、特定の専門知識と、多くの場合はかなりの試行回数が必要になる部分です。
+4. **モデルを学習する**。データのパターンを認識するモデルを学習するために、訓練データと様々なアルゴリズムを使います。モデルはより良いモデルを構築するために、データの特定の部分を優先するように調整できる内部の重みを活用するかもしれません。
+5. **モデルを評価する**。モデルがどのように動作しているかを確認するために、集めたデータの中からまだ見たことのないもの(テストデータ)を使います。
+6. **パラメータチューニング**。モデルの性能によっては、モデルを学習するために使われる、各アルゴリズムの挙動を制御するパラメータや変数を変更してプロセスをやり直すこともできます。
+7. **予測する**。モデルの精度をテストするために新しい入力を使います。
+
+## どのような質問をすれば良いか
+
+コンピュータはデータの中に隠れているパターンを見つけることがとても得意です。この有用性は、条件ベースのルールエンジンを作っても簡単には答えられないような、特定の領域に関する質問を持っている研究者にとって非常に役立ちます。たとえば、ある保険数理の問題があったとして、データサイエンティストは喫煙者と非喫煙者の死亡率に関する法則を自分の手だけでも作れるかもしれません。
+
+しかし、他にも多くの変数が方程式に含まれる場合、過去の健康状態から将来の死亡率を予測する機械学習モデルの方が効率的かもしれません。もっと明るいテーマの例としては、緯度、経度、気候変動、海への近さ、ジェット気流のパターンなどのデータに基づいて、特定の場所における4月の天気を予測することができます。
+
+✅ 気象モデルに関するこの [スライド](https://www2.cisl.ucar.edu/sites/default/files/0900%20June%2024%20Haupt_0.pdf) は、気象解析に機械学習を使う際の歴史的な考え方を示しています。
+
+## 構築前のタスク
+
+モデルの構築を始める前に、いくつかのタスクを完了させる必要があります。質問をテストしたりモデルの予測に基づいた仮説を立てたりするためには、いくつかの要素を特定して設定する必要があります。
+
+### データ
+
+質問に確実に答えるためには、適切な種類のデータが大量に必要になります。ここではやるべきことが2つあります。
+
+- **データを集める**。データ解析における公平性に関する前回の講義を思い出しながら、慎重にデータを集めてください。特定のバイアスを持っているかもしれないデータのソースに注意し、それを記録しておいてください。
+- **データを準備する**。データを準備するプロセスにはいくつかのステップがあります。異なるソースからデータを集めた場合、照合と正規化が必要になるかもしれません。([クラスタリング](../../../5-Clustering/1-Visualize/README.md) で行っているように、)文字列を数値に変換するなどの様々な方法でデータの質と量を向上させることができます。([分類](../../../4-Classification/1-Introduction/README.md) で行っているように、)元のデータから新しいデータを生成することもできます。([Webアプリ](../../../3-Web-App/README.md) の講義の前に行うように、)データをクリーニングしたり編集したりすることができます。最後に、学習の手法によっては、ランダムにしたりシャッフルしたりする必要もあるかもしれません。
+
+✅ データを集めて処理した後は、その形で意図した質問に対応できるかどうかを確認してみましょう。[クラスタリング](../../../5-Clustering/1-Visualize/README.md) の講義でわかるように、データは与えられたタスクに対して上手く機能しないかもしれません!
+
+### 特徴量の選択
+
+[特徴](https://www.datasciencecentral.com/profiles/blogs/an-introduction-to-variable-and-feature-selection) とは、測定可能なデータの特性のことです。多くのデータセットでは、「日付」「大きさ」「色」などの列の見出しとして表されています。コード上では `X` で表されることが多い特徴量は、モデルを学習するために使われる入力を意味します。一方、通常コード上で `y` と表されるターゲットは、データに関する次のような質問への回答を意味します。「12月はどんな **色** のカボチャが一番安いか?」「サンフランシスコでは、どの地域が最も不動産の **価格** が高いか?」
+
+🎓 **特徴選択と特徴抽出** モデルを構築する際にどの変数を選ぶべきかは、どうすればわかるでしょうか?最も性能の高いモデルのためには、適した変数を選択する特徴選択や特徴抽出のプロセスをたどることになるでしょう。しかし、これらは同じものではありません。「特徴抽出は元の特徴の機能から新しい特徴を作成するのに対し、特徴選択は特徴の一部を返すものです。」 ([出典](https://wikipedia.org/wiki/Feature_selection))
+
+### データを可視化する
+
+データサイエンティストの道具に関する重要な側面は、Seaborn や MatPlotLib などの優れたライブラリを使ってデータを可視化する力です。データを視覚的に表現することで、隠れた相関関係を見つけて活用できるかもしれません。また、([分類](../../../4-Classification/2-Classifiers-1/README.md) でわかるように、)視覚化することで、バイアスやバランシングされていないデータを見つけられるかもしれません。
+
+### データセットを分割する
+
+学習の前にデータセットを2つ以上に分割して、それぞれがデータを表すのに十分かつ不均等な大きさにする必要があります。
+
+- **学習**。データセットのこの部分は、モデルを学習するために適合させます。これは元のデータセットの大部分を占めます。
+- **テスト**。テストデータセットとは、構築したモデルの性能を確認するために使用する独立したデータグループのことで、多くの場合は元のデータから集められます。
+- **検証**。検証セットとは、さらに小さくて独立したサンプルの集合のことで、モデルを改善するためにハイパーパラメータや構造を調整する際に使用されます。([時系列予測](../../../7-TimeSeries/1-Introduction/README.md) に記載しているように、)データの大きさや質問の内容によっては、この3つ目のセットを作る必要はありません。
+
+## Building a model
+
+Using your training data, your goal is to build a model, or a statistical representation of your data, through **training** with various algorithms. Training a model exposes it to data and lets it form hypotheses about patterns it discovers, validates, and accepts or rejects.
+
+### Decide on a training method
+
+Depending on your question and the nature of your data, you will choose a method to train it. Stepping through [Scikit-learn's documentation](https://scikit-learn.org/stable/user_guide.html), which we use in this course, you can explore many ways to train a model. Depending on your experience, you might have to try several different methods to build the best model. You are likely to go through a process whereby data scientists evaluate the performance of a model by feeding it unseen data, checking for accuracy, bias, and other quality-degrading issues, and selecting the most appropriate training method for the task at hand.
+
+### Train a model
+
+Armed with your training data, you are ready to 'fit' it to create a model. You will notice that many ML libraries contain the code 'model.fit'; this is the moment when you pass in your features as an array of values (usually 'X') and your target variable (usually 'y').
+
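+A minimal sketch of that call, using Scikit-learn's linear regression as an illustrative estimator:
+
+```python
+from sklearn.datasets import load_diabetes
+from sklearn.linear_model import LinearRegression
+
+X, y = load_diabetes(return_X_y=True)  # X: features, y: target
+
+model = LinearRegression()
+model.fit(X, y)          # the 'fit' step described above
+print(model.coef_[:3])   # a few of the learned weights
+```
+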
+### Evaluate the model
+
+Once the training process is complete (it can take many iterations, or 'epochs', to train a large model), you can evaluate the model's quality by using test data to gauge its performance. This data is a subset of the original data that the model has not previously analyzed. You can print out a table of metrics about your model's quality.
+
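+One way to print such metrics for a regression model, under the same illustrative setup as the sketches above:
+
+```python
+from sklearn.datasets import load_diabetes
+from sklearn.linear_model import LinearRegression
+from sklearn.metrics import mean_squared_error, r2_score
+from sklearn.model_selection import train_test_split
+
+X, y = load_diabetes(return_X_y=True)
+X_train, X_test, y_train, y_test = train_test_split(X, y, random_state=0)
+
+model = LinearRegression().fit(X_train, y_train)
+y_pred = model.predict(X_test)  # predictions on unseen test data
+
+print("MSE:", mean_squared_error(y_test, y_pred))
+print("R^2:", r2_score(y_test, y_pred))
+```
+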
+🎓 **Model fitting**
+
+In the context of machine learning, model fitting refers to the accuracy of the model's underlying function as it attempts to analyze data with which it is not familiar.
+
+🎓 **Underfitting** and **overfitting** are common problems that degrade the quality of a model, as the model fits either not well enough or too well. This causes the model to make predictions either too closely aligned or too loosely aligned with its training data. An overfit model predicts training data too well because it has learned the data's details and noise too well. An underfit model is not accurate as it can neither accurately analyze its training data nor data it has not yet 'seen'.
+
+
+> Infographic by [Jen Looper](https://twitter.com/jenlooper)
+
+## Parameter tuning
+
+Once your initial training is complete, observe the quality of the model and consider improving it by tweaking its 'hyperparameters'. Read more about this process [in the documentation](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-tune-hyperparameters?WT.mc_id=academic-15963-cxa).
+
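+As an illustration of hyperparameter tuning, Scikit-learn's grid search re-trains a model for each combination of parameters; the `Ridge` estimator and its `alpha` grid below are invented for the example:
+
+```python
+from sklearn.datasets import load_diabetes
+from sklearn.linear_model import Ridge
+from sklearn.model_selection import GridSearchCV
+
+X, y = load_diabetes(return_X_y=True)
+
+# Try a few values of the regularization strength 'alpha' with 5-fold CV
+search = GridSearchCV(Ridge(), {"alpha": [0.01, 0.1, 1.0, 10.0]}, cv=5)
+search.fit(X, y)
+
+print(search.best_params_, search.best_score_)
+```
+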
+## Prediction
+
+This is the moment when you can use completely new data to test your model's accuracy. In an 'applied' ML setting, where you are building web assets to use a model in production, this process might involve gathering user input (a button press, for example) to set a variable and send it to the model for inference or evaluation.
+
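+Continuing the sketch above, passing a brand-new input to a trained model might look like this; the all-zeros feature vector stands in for hypothetical user-supplied values:
+
+```python
+import numpy as np
+from sklearn.datasets import load_diabetes
+from sklearn.linear_model import LinearRegression
+
+X, y = load_diabetes(return_X_y=True)
+model = LinearRegression().fit(X, y)
+
+new_input = np.zeros((1, X.shape[1]))  # hypothetical user-supplied values
+print(model.predict(new_input))        # the model's inference for that input
+```
+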
+In these lessons, you will discover how to use these steps to prepare, build, test, evaluate, and predict (all the gestures of a data scientist) as you progress on your journey to become a 'full stack' ML engineer.
+
+---
+
+## 🚀Challenge
+
+Draw a flow chart reflecting the steps of an ML practitioner. Where do you see yourself right now in this process? Where do you predict you will find difficulty? What seems easy to you?
+
+## [Post-lecture quiz](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/8?loc=ja)
+
+## Review & Self Study
+
+Search online for interviews with data scientists who discuss their daily work. Here is [one](https://www.youtube.com/watch?v=Z3IjgbbCEfs).
+
+## Assignment
+
+[Interview a data scientist](assignment.ja.md)
diff --git a/1-Introduction/4-techniques-of-ML/translations/README.zh-cn.md b/1-Introduction/4-techniques-of-ML/translations/README.zh-cn.md
index d01d5bbf..318876bd 100644
--- a/1-Introduction/4-techniques-of-ML/translations/README.zh-cn.md
+++ b/1-Introduction/4-techniques-of-ML/translations/README.zh-cn.md
@@ -54,7 +54,7 @@
- **Training**. This part of the dataset is fit to your model to train it. This set constitutes the majority of the original dataset.
- **Testing**. A test dataset is an independent group of data, often gathered from the original data, that you use to confirm the performance of the built model.
-- **Validation**. A validation set is a smaller independent group of examples that you use to tune the model's hyperparameters or architecture to improve the model. Depending on your data's size and the question you are asking, you might not need to build this third set (as we note in [Time Series Forecasting](../../7-TimeSeries/1-Introduction/README.md)).
+- **Validation**. A validation set is a smaller independent group of examples that you use to tune the model's hyperparameters or architecture to improve the model. Depending on your data's size and the question you are asking, you might not need to build this third set (as we note in [Time Series Forecasting](../../../7-TimeSeries/1-Introduction/README.md)).
## Building a model
@@ -72,7 +72,7 @@
Once the training process is complete (training a large model can take many iterations, or 'epochs'), you will be able to evaluate the model's quality by using test data to gauge its performance. This data is a subset of the original data that the model has not previously analyzed. You can print out a table of metrics about your model's quality.
-🎓 **Model fitting **
+🎓 **Model fitting**
In the context of machine learning, model fitting refers to the accuracy of the model's underlying function as it attempts to analyze unfamiliar data.
@@ -105,4 +105,4 @@
## Assignment
-[Interview a data scientist](../assignment.md)
+[Interview a data scientist](assignment.zh-cn.md)
diff --git a/1-Introduction/4-techniques-of-ML/translations/assignment.id.md b/1-Introduction/4-techniques-of-ML/translations/assignment.id.md
new file mode 100644
index 00000000..9f7b23be
--- /dev/null
+++ b/1-Introduction/4-techniques-of-ML/translations/assignment.id.md
@@ -0,0 +1,11 @@
+# Interview a data scientist
+
+## Instructions
+
+In your company, in a user group, or among your friends or fellow students, talk to someone who works professionally as a data scientist. Write a short paper (500 words) about their daily occupations. Are they specialists, or do they work 'full stack'?
+
+## Rubric
+
+| Criteria | Exemplary | Adequate | Needs Improvement |
+| -------- | ------------------------------------------------------------------------------------ | ------------------------------------------------------------------ | --------------------- |
+| | An essay of the correct length, with attributed sources, is presented as a .doc file | The essay is poorly attributed or shorter than the required length | No essay is presented |
diff --git a/1-Introduction/4-techniques-of-ML/translations/assignment.ja.md b/1-Introduction/4-techniques-of-ML/translations/assignment.ja.md
new file mode 100644
index 00000000..b3690e77
--- /dev/null
+++ b/1-Introduction/4-techniques-of-ML/translations/assignment.ja.md
@@ -0,0 +1,11 @@
+# Interview a data scientist
+
+## Instructions
+
+In your company, a user group, or among friends or fellow students, talk to someone who works professionally as a data scientist. Write a short report (500 words) about their daily work. Are they specialists, or do they work 'full stack'?
+
+## Rubric
+
+| Criteria | Exemplary | Adequate | Needs Improvement |
+| -------- | ---------------------------------------------------------------------- | -------------------------------------------------------------- | -------------------------- |
+| | A report of the appropriate length, with attributed sources, is presented as a .doc file | The report lacks attributed sources or is shorter than the required length | No report is presented |
diff --git a/1-Introduction/4-techniques-of-ML/translations/assignment.zh-cn.md b/1-Introduction/4-techniques-of-ML/translations/assignment.zh-cn.md
new file mode 100644
index 00000000..ba28b554
--- /dev/null
+++ b/1-Introduction/4-techniques-of-ML/translations/assignment.zh-cn.md
@@ -0,0 +1,11 @@
+# Interview a data scientist
+
+## Instructions
+
+In your company, in your community, or among your friends and classmates, find someone who works professionally as a data scientist and talk with them. Write a short essay (around 500 words) about their daily work. Are they specialists, or do they work 'full stack'?
+
+## Rubric
+
+| Criteria | Exemplary | Adequate | Needs Improvement |
+| -------- | ------------------------------------------------------------------------------------ | ------------------------------------------------------------------ | --------------------- |
+| | A Word document of the required length that clearly describes the profession is submitted | The submitted document describes the profession unclearly or does not meet the length requirement | Nothing is submitted |
diff --git a/1-Introduction/translations/README.fr.md b/1-Introduction/translations/README.fr.md
new file mode 100644
index 00000000..462dea70
--- /dev/null
+++ b/1-Introduction/translations/README.fr.md
@@ -0,0 +1,22 @@
+# Introduction to machine learning
+
+In this section of the curriculum, you will be introduced to the base concepts underlying the field of machine learning, what it is, and you will learn about its history and the techniques researchers use to work with it. Let's explore this new world of ML together!
+
+
+> Photo by Bill Oxford on Unsplash
+
+### Lessons
+
+1. [Introduction to machine learning](../1-intro-to-ML/translations/README.fr.md)
+1. [The history of machine learning and AI](../2-history-of-ML/translations/README.fr.md)
+1. [Fairness and machine learning](../3-fairness/translations/README.fr.md)
+1. [Techniques of machine learning](../4-techniques-of-ML/translations/README.fr.md)
+
+### Credits
+
+"Introduction to machine learning" was written with ♥️ by a team of folks including [Muhammad Sakib Khan Inan](https://twitter.com/Sakibinan), [Ornella Altunyan](https://twitter.com/ornelladotcom) and [Jen Looper](https://twitter.com/jenlooper)
+
+"The history of machine learning" was written with ♥️ by [Jen Looper](https://twitter.com/jenlooper) and [Amy Boyd](https://twitter.com/AmyKateNicho)
+
+"Fairness and machine learning" was written with ♥️ by [Tomomi Imura](https://twitter.com/girliemac)
+
+"Techniques of machine learning" was written with ♥️ by [Jen Looper](https://twitter.com/jenlooper) and [Chris Noring](https://twitter.com/softchris)
diff --git a/1-Introduction/translations/README.id.md b/1-Introduction/translations/README.id.md
new file mode 100644
index 00000000..0e6cc557
--- /dev/null
+++ b/1-Introduction/translations/README.id.md
@@ -0,0 +1,23 @@
+# Introduction to Machine Learning
+
+In this section of the curriculum, you will be introduced to the concepts that underlie the field of Machine Learning, what it is, and you will learn about its history as well as the techniques researchers use to work with it. Let's explore this new world of Machine Learning together!
+
+
+> Photo by Bill Oxford on Unsplash
+
+### Lessons
+
+1. [Introduction to Machine Learning](../1-intro-to-ML/translations/README.id.md)
+1. [The history of Machine Learning and AI](../2-history-of-ML/translations/README.id.md)
+1. [Fairness and Machine Learning](../3-fairness/translations/README.id.md)
+1. [Techniques of Machine Learning](../4-techniques-of-ML/translations/README.id.md)
+
+### Credits
+
+"Introduction to Machine Learning" was written with ♥️ by a team including [Muhammad Sakib Khan Inan](https://twitter.com/Sakibinan), [Ornella Altunyan](https://twitter.com/ornelladotcom) and [Jen Looper](https://twitter.com/jenlooper)
+
+"The history of Machine Learning and AI" was written with ♥️ by [Jen Looper](https://twitter.com/jenlooper) and [Amy Boyd](https://twitter.com/AmyKateNicho)
+
+"Fairness and Machine Learning" was written with ♥️ by [Tomomi Imura](https://twitter.com/girliemac)
+
+"Techniques of Machine Learning" was written with ♥️ by [Jen Looper](https://twitter.com/jenlooper) and [Chris Noring](https://twitter.com/softchris)
diff --git a/1-Introduction/translations/README.zh-cn.md b/1-Introduction/translations/README.zh-cn.md
new file mode 100644
index 00000000..f1ad8e1e
--- /dev/null
+++ b/1-Introduction/translations/README.zh-cn.md
@@ -0,0 +1,22 @@
+# Introduction to machine learning
+
+In this chapter of the curriculum, you will be introduced to the basic concepts behind the field of machine learning, what it is, and you will learn about its history and the researchers and techniques that have contributed to it. Let's start exploring the new world of machine learning together!
+
+
+> Image by Bill Oxford, from Unsplash
+
+### Lessons
+
+1. [Introduction to machine learning](../1-intro-to-ML/translations/README.zh-cn.md)
+1. [The history of machine learning](../2-history-of-ML/translations/README.zh-cn.md)
+1. [Fairness in machine learning](../3-fairness/translations/README.zh-cn.md)
+1. [Techniques of machine learning](../4-techniques-of-ML/translations/README.zh-cn.md)
+
+### Credits
+
+"Introduction to machine learning" was written with ♥️ by [Muhammad Sakib Khan Inan](https://twitter.com/Sakibinan), [Ornella Altunyan](https://twitter.com/ornelladotcom) and [Jen Looper](https://twitter.com/jenlooper)
+
+"The history of machine learning and AI" was written with ♥️ by [Jen Looper](https://twitter.com/jenlooper) and [Amy Boyd](https://twitter.com/AmyKateNicho)
+
+"Fairness and machine learning" was written with ♥️ by [Tomomi Imura](https://twitter.com/girliemac)
+
+"Techniques of machine learning" was written with ♥️ by [Jen Looper](https://twitter.com/jenlooper) and [Chris Noring](https://twitter.com/softchris)
diff --git a/2-Regression/1-Tools/README.md b/2-Regression/1-Tools/README.md
index e36c34fe..69986fba 100644
--- a/2-Regression/1-Tools/README.md
+++ b/2-Regression/1-Tools/README.md
@@ -5,6 +5,9 @@
> Sketchnote by [Tomomi Imura](https://www.twitter.com/girlie_mac)
## [Pre-lecture quiz](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/9/)
+
+> ### [This lesson is available in R!](./solution/lesson_1-R.ipynb)
+
## Introduction
In these four lessons, you will discover how to build regression models. We will discuss what these are for shortly. But before you do anything, make sure you have the right tools in place to start the process!
@@ -95,7 +98,7 @@ For this task we will import some libraries:
- **matplotlib**. It's a useful [graphing tool](https://matplotlib.org/) and we will use it to create a line plot.
- **numpy**. [numpy](https://numpy.org/doc/stable/user/whatisnumpy.html) is a useful library for handling numeric data in Python.
-- **sklearn**. This is the Scikit-learn library.
+- **sklearn**. This is the [Scikit-learn](https://scikit-learn.org/stable/user_guide.html) library.
Import some libraries to help with your tasks.
@@ -180,6 +183,9 @@ In a new code cell, load the diabetes dataset by calling `load_diabetes()`. The
```python
plt.scatter(X_test, y_test, color='black')
plt.plot(X_test, y_pred, color='blue', linewidth=3)
+ plt.xlabel('Scaled BMIs')
+ plt.ylabel('Disease Progression')
+ plt.title('A Graph Plot Showing Diabetes Progression Against BMI')
plt.show()
```
diff --git a/2-Regression/1-Tools/images/encouRage.jpg b/2-Regression/1-Tools/images/encouRage.jpg
new file mode 100644
index 00000000..e1d08fc2
Binary files /dev/null and b/2-Regression/1-Tools/images/encouRage.jpg differ
diff --git a/2-Regression/1-Tools/images/scatterplot.png b/2-Regression/1-Tools/images/scatterplot.png
index ba9f1610..446529a5 100644
Binary files a/2-Regression/1-Tools/images/scatterplot.png and b/2-Regression/1-Tools/images/scatterplot.png differ
diff --git a/2-Regression/1-Tools/solution/lesson_1-R.ipynb b/2-Regression/1-Tools/solution/lesson_1-R.ipynb
new file mode 100644
index 00000000..cba24081
--- /dev/null
+++ b/2-Regression/1-Tools/solution/lesson_1-R.ipynb
@@ -0,0 +1,436 @@
+{
+ "nbformat": 4,
+ "nbformat_minor": 0,
+ "metadata": {
+ "colab": {
+ "name": "lesson_1-R.ipynb",
+ "provenance": [],
+ "collapsed_sections": [],
+ "toc_visible": true
+ },
+ "kernelspec": {
+ "name": "ir",
+ "display_name": "R"
+ },
+ "language_info": {
+ "name": "R"
+ }
+ },
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "YJUHCXqK57yz"
+ },
+ "source": [
+        "# Build a regression model: Get started with R and Tidymodels for regression models"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "LWNNzfqd6feZ"
+ },
+ "source": [
+ "## Introduction to Regression - Lesson 1\n",
+ "\n",
+ "#### Putting it into perspective\n",
+ "\n",
+ "✅ There are many types of regression methods, and which one you pick depends on the answer you're looking for. If you want to predict the probable height for a person of a given age, you'd use `linear regression`, as you're seeking a **numeric value**. If you're interested in discovering whether a type of cuisine should be considered vegan or not, you're looking for a **category assignment** so you would use `logistic regression`. You'll learn more about logistic regression later. Think a bit about some questions you can ask of data, and which of these methods would be more appropriate.\n",
+ "\n",
+ "In this section, you will work with a [small dataset about diabetes](https://www4.stat.ncsu.edu/~boos/var.select/diabetes.html). Imagine that you wanted to test a treatment for diabetic patients. Machine Learning models might help you determine which patients would respond better to the treatment, based on combinations of variables. Even a very basic regression model, when visualized, might show information about variables that would help you organize your theoretical clinical trials.\n",
+ "\n",
+ "That said, let's get started on this task!\n",
+ "\n",
+ " Artwork by @allison_horst"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "FIo2YhO26wI9"
+ },
+ "source": [
+ "## 1. Loading up our tool set\n",
+ "\n",
+ "For this task, we'll require the following packages:\n",
+ "\n",
+        "- `tidyverse`: The [tidyverse](https://www.tidyverse.org/) is a [collection of R packages](https://www.tidyverse.org/packages) designed to make data science faster, easier and more fun!\n",
+ "\n",
+ "- `tidymodels`: The [tidymodels](https://www.tidymodels.org/) framework is a [collection of packages](https://www.tidymodels.org/packages/) for modeling and machine learning.\n",
+ "\n",
+ "You can have them installed as:\n",
+ "\n",
+ "`install.packages(c(\"tidyverse\", \"tidymodels\"))`\n",
+ "\n",
+ "The script below checks whether you have the packages required to complete this module and installs them for you in case some are missing."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "cIA9fz9v7Dss",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "2df7073b-86b2-4b32-cb86-0da605a0dc11"
+ },
+ "source": [
+ "if (!require(\"pacman\")) install.packages(\"pacman\")\n",
+ "pacman::p_load(tidyverse, tidymodels)"
+ ],
+ "execution_count": 2,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "text": [
+ "Loading required package: pacman\n",
+ "\n"
+ ],
+ "name": "stderr"
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "gpO_P_6f9WUG"
+ },
+ "source": [
+        "Now, let's load these awesome packages and make them available in our current R session. (This is purely illustrative; `pacman::p_load()` has already done that for you.)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "NLMycgG-9ezO"
+ },
+ "source": [
+ "# load the core Tidyverse packages\n",
+ "library(tidyverse)\n",
+ "\n",
+ "# load the core Tidymodels packages\n",
+ "library(tidymodels)\n"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "KM6iXLH996Cl"
+ },
+ "source": [
+ "## 2. The diabetes dataset\n",
+ "\n",
+        "In this exercise, we'll put our regression skills on display by making predictions on a diabetes dataset. The [diabetes dataset](https://www4.stat.ncsu.edu/~boos/var.select/diabetes.rwrite1.txt) includes `442 samples` of data around diabetes, with 10 predictor feature variables, `age`, `sex`, `body mass index`, `average blood pressure`, and `six blood serum measurements` as well as an outcome variable `y`: a quantitative measure of disease progression one year after baseline.\n",
+ "\n",
+ "|Number of observations|442|\n",
+ "|----------------------|:---|\n",
+ "|Number of predictors|First 10 columns are numeric predictive|\n",
+ "|Outcome/Target|Column 11 is a quantitative measure of disease progression one year after baseline|\n",
+ "|Predictor Information|- age in years\n",
+ "||- sex\n",
+ "||- bmi body mass index\n",
+ "||- bp average blood pressure\n",
+ "||- s1 tc, total serum cholesterol\n",
+ "||- s2 ldl, low-density lipoproteins\n",
+ "||- s3 hdl, high-density lipoproteins\n",
+ "||- s4 tch, total cholesterol / HDL\n",
+ "||- s5 ltg, possibly log of serum triglycerides level\n",
+ "||- s6 glu, blood sugar level|\n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "> 🎓 Remember, this is supervised learning, and we need a named 'y' target.\n",
+ "\n",
+ "Before you can manipulate data with R, you need to import the data into R's memory, or build a connection to the data that R can use to access the data remotely.\n",
+ "\n",
+ "> The [readr](https://readr.tidyverse.org/) package, which is part of the Tidyverse, provides a fast and friendly way to read rectangular data into R.\n",
+ "\n",
+        "Now, let's load the diabetes dataset provided in this source URL: <https://www4.stat.ncsu.edu/~boos/var.select/diabetes.rwrite1.txt>\n",
+ "\n",
+        "Also, we'll perform a sanity check on our data using `glimpse()` and display the first 5 rows using `slice()`.\n",
+ "\n",
+ "Before going any further, let's also introduce something you will encounter often in R code 🥁🥁: the pipe operator `%>%`\n",
+ "\n",
+ "The pipe operator (`%>%`) performs operations in logical sequence by passing an object forward into a function or call expression. You can think of the pipe operator as saying \"and then\" in your code."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "Z1geAMhM-bSP"
+ },
+ "source": [
+ "# Import the data set\n",
+ "diabetes <- read_table2(file = \"https://www4.stat.ncsu.edu/~boos/var.select/diabetes.rwrite1.txt\")\n",
+ "\n",
+ "\n",
+ "# Get a glimpse and dimensions of the data\n",
+ "glimpse(diabetes)\n",
+ "\n",
+ "\n",
+ "# Select the first 5 rows of the data\n",
+ "diabetes %>% \n",
+ " slice(1:5)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "UwjVT1Hz-c3Z"
+ },
+ "source": [
+ "`glimpse()` shows us that this data has 442 rows and 11 columns with all the columns being of data type `double` \n",
+ "\n",
+ " \n",
+ "\n",
+ "\n",
+ "\n",
+ "> glimpse() and slice() are functions in [`dplyr`](https://dplyr.tidyverse.org/). Dplyr, part of the Tidyverse, is a grammar of data manipulation that provides a consistent set of verbs that help you solve the most common data manipulation challenges\n",
+ "\n",
+ " \n",
+ "\n",
+ "Now that we have the data, let's narrow down to one feature (`bmi`) to target for this exercise. This will require us to select the desired columns. So, how do we do this?\n",
+ "\n",
+ "[`dplyr::select()`](https://dplyr.tidyverse.org/reference/select.html) allows us to *select* (and optionally rename) columns in a data frame."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "RDY1oAKI-m80"
+ },
+ "source": [
+ "# Select predictor feature `bmi` and outcome `y`\n",
+ "diabetes_select <- diabetes %>% \n",
+ " select(c(bmi, y))\n",
+ "\n",
+ "# Print the first 5 rows\n",
+ "diabetes_select %>% \n",
+        "  slice(1:5)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "SDk668xK-tc3"
+ },
+ "source": [
+ "## 3. Training and Testing data\n",
+ "\n",
+        "It's common practice in supervised learning to *split* the data into two subsets: a (typically larger) set with which to train the model, and a smaller \"hold-back\" set with which to see how the model performed.\n",
+ "\n",
+ "Now that we have data ready, we can see if a machine can help determine a logical split between the numbers in this dataset. We can use the [rsample](https://tidymodels.github.io/rsample/) package, which is part of the Tidymodels framework, to create an object that contains the information on *how* to split the data, and then two more rsample functions to extract the created training and testing sets:\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "EqtHx129-1h-"
+ },
+ "source": [
+ "set.seed(2056)\n",
+        "# Split 67% of the data for training and the rest for testing\n",
+ "diabetes_split <- diabetes_select %>% \n",
+ " initial_split(prop = 0.67)\n",
+ "\n",
+ "# Extract the resulting train and test sets\n",
+ "diabetes_train <- training(diabetes_split)\n",
+ "diabetes_test <- testing(diabetes_split)\n",
+ "\n",
+ "# Print the first 3 rows of the training set\n",
+ "diabetes_train %>% \n",
+        "  slice(1:3)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "sBOS-XhB-6v7"
+ },
+ "source": [
+ "## 4. Train a linear regression model with Tidymodels\n",
+ "\n",
+ "Now we are ready to train our model!\n",
+ "\n",
+ "In Tidymodels, you specify models using `parsnip()` by specifying three concepts:\n",
+ "\n",
+ "- Model **type** differentiates models such as linear regression, logistic regression, decision tree models, and so forth.\n",
+ "\n",
+ "- Model **mode** includes common options like regression and classification; some model types support either of these while some only have one mode.\n",
+ "\n",
+ "- Model **engine** is the computational tool which will be used to fit the model. Often these are R packages, such as **`\"lm\"`** or **`\"ranger\"`**\n",
+ "\n",
+ "This modeling information is captured in a model specification, so let's build one!"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "20OwEw20--t3"
+ },
+ "source": [
+ "# Build a linear model specification\n",
+ "lm_spec <- \n",
+ " # Type\n",
+ " linear_reg() %>% \n",
+ " # Engine\n",
+ " set_engine(\"lm\") %>% \n",
+ " # Mode\n",
+ " set_mode(\"regression\")\n",
+ "\n",
+ "\n",
+ "# Print the model specification\n",
+ "lm_spec"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "_oDHs89k_CJj"
+ },
+ "source": [
+ "After a model has been *specified*, the model can be `estimated` or `trained` using the [`fit()`](https://parsnip.tidymodels.org/reference/fit.html) function, typically using a formula and some data.\n",
+ "\n",
+        "`y ~ .` means we'll fit `y` as the predicted quantity/target, explained by all the predictors/features, i.e., `.` (in this case, we only have one predictor: `bmi`)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "YlsHqd-q_GJQ"
+ },
+ "source": [
+ "# Build a linear model specification\n",
+ "lm_spec <- linear_reg() %>% \n",
+ " set_engine(\"lm\") %>%\n",
+ " set_mode(\"regression\")\n",
+ "\n",
+ "\n",
+ "# Train a linear regression model\n",
+ "lm_mod <- lm_spec %>% \n",
+ " fit(y ~ ., data = diabetes_train)\n",
+ "\n",
+ "# Print the model\n",
+ "lm_mod"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "kGZ22RQj_Olu"
+ },
+ "source": [
+ "From the model output, we can see the coefficients learned during training. They represent the coefficients of the line of best fit that gives us the lowest overall error between the actual and predicted variable.\n",
+ " \n",
+ "\n",
+ "## 5. Make predictions on the test set\n",
+ "\n",
+ "Now that we've trained a model, we can use it to predict the disease progression y for the test dataset using [parsnip::predict()](https://parsnip.tidymodels.org/reference/predict.model_fit.html). This will be used to draw the line between data groups."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "nXHbY7M2_aao"
+ },
+ "source": [
+ "# Make predictions for the test set\n",
+ "predictions <- lm_mod %>% \n",
+ " predict(new_data = diabetes_test)\n",
+ "\n",
+ "# Print out some of the predictions\n",
+ "predictions %>% \n",
+ " slice(1:5)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "R_JstwUY_bIs"
+ },
+ "source": [
+ "Woohoo! 💃🕺 We just trained a model and used it to make predictions!\n",
+ "\n",
+ "When making predictions, the tidymodels convention is to always produce a tibble/data frame of results with standardized column names. This makes it easy to combine the original data and the predictions in a usable format for subsequent operations such as plotting.\n",
+ "\n",
+        "`dplyr::bind_cols()` efficiently binds multiple data frames by column."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "RybsMJR7_iI8"
+ },
+ "source": [
+ "# Combine the predictions and the original test set\n",
+ "results <- diabetes_test %>% \n",
+ " bind_cols(predictions)\n",
+ "\n",
+ "\n",
+ "results %>% \n",
+ " slice(1:5)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "XJbYbMZW_n_s"
+ },
+ "source": [
+ "## 6. Plot modelling results\n",
+ "\n",
+        "Now, it's time to see this visually 📈. We'll create a scatter plot of all the `y` and `bmi` values of the test set, then use the predictions to draw a line in the most appropriate place, between the model's data groupings.\n",
+ "\n",
+ "R has several systems for making graphs, but `ggplot2` is one of the most elegant and most versatile. This allows you to compose graphs by **combining independent components**."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "R9tYp3VW_sTn"
+ },
+ "source": [
+ "# Set a theme for the plot\n",
+ "theme_set(theme_light())\n",
+ "# Create a scatter plot\n",
+ "results %>% \n",
+ " ggplot(aes(x = bmi)) +\n",
+ " # Add a scatter plot\n",
+ " geom_point(aes(y = y), size = 1.6) +\n",
+ " # Add a line plot\n",
+ " geom_line(aes(y = .pred), color = \"blue\", size = 1.5)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "zrPtHIxx_tNI"
+ },
+ "source": [
+ "> ✅ Think a bit about what's going on here. A straight line is running through many small dots of data, but what is it doing exactly? Can you see how you should be able to use this line to predict where a new, unseen data point should fit in relationship to the plot's y axis? Try to put into words the practical use of this model.\n",
+ "\n",
+ "Congratulations, you built your first linear regression model, created a prediction with it, and displayed it in a plot!\n"
+ ]
+ }
+ ]
+}
diff --git a/2-Regression/1-Tools/solution/lesson_1.Rmd b/2-Regression/1-Tools/solution/lesson_1.Rmd
new file mode 100644
index 00000000..d6a0c0ea
--- /dev/null
+++ b/2-Regression/1-Tools/solution/lesson_1.Rmd
@@ -0,0 +1,250 @@
+---
+title: 'Build a regression model: Get started with R and Tidymodels for regression models'
+output:
+ html_document:
+ df_print: paged
+ theme: flatly
+ highlight: breezedark
+ toc: yes
+ toc_float: yes
+ code_download: yes
+---
+
+## Introduction to Regression - Lesson 1
+
+#### Putting it into perspective
+
+✅ There are many types of regression methods, and which one you pick depends on the answer you're looking for. If you want to predict the probable height for a person of a given age, you'd use `linear regression`, as you're seeking a **numeric value**. If you're interested in discovering whether a type of cuisine should be considered vegan or not, you're looking for a **category assignment** so you would use `logistic regression`. You'll learn more about logistic regression later. Think a bit about some questions you can ask of data, and which of these methods would be more appropriate.
+
+In this section, you will work with a [small dataset about diabetes](https://www4.stat.ncsu.edu/~boos/var.select/diabetes.html). Imagine that you wanted to test a treatment for diabetic patients. Machine Learning models might help you determine which patients would respond better to the treatment, based on combinations of variables. Even a very basic regression model, when visualized, might show information about variables that would help you organize your theoretical clinical trials.
+
+That said, let's get started on this task!
+
+{width="630"}
+
+## 1. Loading up our tool set
+
+For this task, we'll require the following packages:
+
+- `tidyverse`: The [tidyverse](https://www.tidyverse.org/) is a [collection of R packages](https://www.tidyverse.org/packages) designed to make data science faster, easier and more fun!
+
+- `tidymodels`: The [tidymodels](https://www.tidymodels.org/) framework is a [collection of packages](https://www.tidymodels.org/packages/) for modeling and machine learning.
+
+You can have them installed as:
+
+`install.packages(c("tidyverse", "tidymodels"))`
+
+The script below checks whether you have the packages required to complete this module and installs them for you in case they are missing.
+
+```{r, message=F, warning=F}
+if (!require("pacman")) install.packages("pacman")
+pacman::p_load(tidyverse, tidymodels)
+```
+
+Now, let's load these awesome packages and make them available in our current R session. (This is for mere illustration, `pacman::p_load()` already did that for you)
+
+```{r load_tidy_verse_models, message=F, warning=F}
+# load the core Tidyverse packages
+library(tidyverse)
+
+# load the core Tidymodels packages
+library(tidymodels)
+
+
+```
+
+## 2. The diabetes dataset
+
+In this exercise, we'll put our regression skills on display by making predictions on a diabetes dataset. The [diabetes dataset](https://www4.stat.ncsu.edu/~boos/var.select/diabetes.rwrite1.txt) includes `442 samples` of data around diabetes, with 10 predictor feature variables, `age`, `sex`, `body mass index`, `average blood pressure`, and `six blood serum measurements` as well as an outcome variable `y`: a quantitative measure of disease progression one year after baseline.
+
++----------------------------+------------------------------------------------------------------------------------+
+| **Number of observations** | **442** |
++============================+====================================================================================+
+| **Number of predictors** | First 10 columns are numeric predictive values |
++----------------------------+------------------------------------------------------------------------------------+
+| **Outcome/Target** | Column 11 is a quantitative measure of disease progression one year after baseline |
++----------------------------+------------------------------------------------------------------------------------+
+| **Predictor Information** | - age age in years |
+| | - sex |
+| | - bmi body mass index |
+| | - bp average blood pressure |
+| | - s1 tc, total serum cholesterol |
+| | - s2 ldl, low-density lipoproteins |
+| | - s3 hdl, high-density lipoproteins |
+| | - s4 tch, total cholesterol / HDL |
+| | - s5 ltg, possibly log of serum triglycerides level |
+| | - s6 glu, blood sugar level |
++----------------------------+------------------------------------------------------------------------------------+
+
+> 🎓 Remember, this is supervised learning, and we need a named 'y' target.
+
+Before you can manipulate data with R, you need to import the data into R's memory, or build a connection to the data that R can use to access the data remotely.\
+
+> The [readr](https://readr.tidyverse.org/) package, which is part of the Tidyverse, provides a fast and friendly way to read rectangular data into R.
+
+Now, let's load the diabetes dataset provided in this source URL: <https://www4.stat.ncsu.edu/~boos/var.select/diabetes.rwrite1.txt>
+
+Also, we'll perform a sanity check on our data using `glimpse()` and display the first 5 rows using `slice()`.
+
+Before going any further, let's introduce something you will encounter quite often in R code: the pipe operator `%>%`
+
+The pipe operator (`%>%`) performs operations in logical sequence by passing an object forward into a function or call expression. You can think of the pipe operator as saying "and then" in your code.\
+
+```{r load_dataset, message=F, warning=F}
+# Import the data set
+diabetes <- read_table2(file = "https://www4.stat.ncsu.edu/~boos/var.select/diabetes.rwrite1.txt")
+
+
+# Get a glimpse and dimensions of the data
+glimpse(diabetes)
+
+
+# Select the first 5 rows of the data
+diabetes %>%
+ slice(1:5)
+
+```
+
+`glimpse()` shows us that this data has 442 rows and 11 columns with all the columns being of data type `double`
+
+> glimpse() and slice() are functions in [`dplyr`](https://dplyr.tidyverse.org/). Dplyr, part of the Tidyverse, is a grammar of data manipulation that provides a consistent set of verbs that help you solve the most common data manipulation challenges
+
+Now that we have the data, let's narrow down to one feature (`bmi`) to target for this exercise. This will require us to select the desired columns. So, how do we do this?
+
+[`dplyr::select()`](https://dplyr.tidyverse.org/reference/select.html) allows us to *select* (and optionally rename) columns in a data frame.
+
+```{r select, message=F, warning=F}
+# Select predictor feature `bmi` and outcome `y`
+diabetes_select <- diabetes %>%
+ select(c(bmi, y))
+
+# Print the first 5 rows
+diabetes_select %>%
+ slice(1:5)
+```
+
+## 3. Training and Testing data
+
+It's common practice in supervised learning to *split* the data into two subsets: a (typically larger) set with which to train the model, and a smaller "hold-back" set with which to see how the model performed.
+
+Now that we have data ready, we can see if a machine can help determine a logical split between the numbers in this dataset. We can use the [rsample](https://tidymodels.github.io/rsample/) package, which is part of the Tidymodels framework, to create an object that contains the information on *how* to split the data, and then two more rsample functions to extract the created training and testing sets:
+
+```{r split, message=F, warning=F}
+set.seed(2056)
+# Split 67% of the data for training and the rest for testing
+diabetes_split <- diabetes_select %>%
+ initial_split(prop = 0.67)
+
+# Extract the resulting train and test sets
+diabetes_train <- training(diabetes_split)
+diabetes_test <- testing(diabetes_split)
+
+# Print the first 3 rows of the training set
+diabetes_train %>%
+ slice(1:3)
+
+```
+
+## 4. Train a linear regression model with Tidymodels
+
+Now we are ready to train our model!
+
+In Tidymodels, you specify models using `parsnip()` by specifying three concepts:
+
+- Model **type** differentiates models such as linear regression, logistic regression, decision tree models, and so forth.
+
+- Model **mode** includes common options like regression and classification; some model types support either of these while some only have one mode.
+
+- Model **engine** is the computational tool which will be used to fit the model. Often these are R packages, such as **`"lm"`** or **`"ranger"`**
+
+This modeling information is captured in a model specification, so let's build one!
+
+```{r lm_model_spec, message=F, warning=F}
+# Build a linear model specification
+lm_spec <-
+ # Type
+ linear_reg() %>%
+ # Engine
+ set_engine("lm") %>%
+ # Mode
+ set_mode("regression")
+
+
+# Print the model specification
+lm_spec
+
+```
+
+After a model has been *specified*, the model can be `estimated` or `trained` using the [`fit()`](https://parsnip.tidymodels.org/reference/fit.html) function, typically using a formula and some data.
+
+`y ~ .` means we'll fit `y` as the predicted quantity/target, explained by all the predictors/features, i.e., `.` (in this case, we only have one predictor: `bmi`)
+
+```{r train, message=F, warning=F}
+# Build a linear model specification
+lm_spec <- linear_reg() %>%
+ set_engine("lm") %>%
+ set_mode("regression")
+
+
+# Train a linear regression model
+lm_mod <- lm_spec %>%
+ fit(y ~ ., data = diabetes_train)
+
+# Print the model
+lm_mod
+```
+
+From the model output, we can see the coefficients learned during training. They represent the coefficients of the line of best fit that gives us the lowest overall error between the actual and predicted variable.
+
+## 5. Make predictions on the test set
+
+Now that we've trained a model, we can use it to predict the disease progression y for the test dataset using [parsnip::predict()](https://parsnip.tidymodels.org/reference/predict.model_fit.html). This will be used to draw the line between data groups.
+
+```{r test, message=F, warning=F}
+# Make predictions for the test set
+predictions <- lm_mod %>%
+ predict(new_data = diabetes_test)
+
+# Print out some of the predictions
+predictions %>%
+ slice(1:5)
+```
+
+Woohoo! 💃🕺 We just trained a model and used it to make predictions!
+
+When making predictions, the tidymodels convention is to always produce a tibble/data frame of results with standardized column names. This makes it easy to combine the original data and the predictions in a usable format for subsequent operations such as plotting.
+
+`dplyr::bind_cols()` efficiently binds multiple data frames by column.
+
+```{r test_pred, message=F, warning=F}
+# Combine the predictions and the original test set
+results <- diabetes_test %>%
+ bind_cols(predictions)
+
+
+results %>%
+ slice(1:5)
+```
+
+## 6. Plot modelling results
+
+Now, it's time to see this visually 📈. We'll create a scatter plot of all the `y` and `bmi` values of the test set, then use the predictions to draw a line in the most appropriate place, between the model's data groupings.
+
+R has several systems for making graphs, but `ggplot2` is one of the most elegant and most versatile. This allows you to compose graphs by **combining independent components**.
+
+```{r plot_pred, message=F, warning=F}
+# Set a theme for the plot
+theme_set(theme_light())
+# Create a scatter plot
+results %>%
+ ggplot(aes(x = bmi)) +
+ # Add a scatter plot
+ geom_point(aes(y = y), size = 1.6) +
+ # Add a line plot
+ geom_line(aes(y = .pred), color = "blue", size = 1.5)
+
+```
+
+> ✅ Think a bit about what's going on here. A straight line is running through many small dots of data, but what is it doing exactly? Can you see how you should be able to use this line to predict where a new, unseen data point should fit in relationship to the plot's y axis? Try to put into words the practical use of this model.
+
+Congratulations, you built your first linear regression model, created a prediction with it, and displayed it in a plot!
diff --git a/2-Regression/1-Tools/solution/notebook.ipynb b/2-Regression/1-Tools/solution/notebook.ipynb
index e7d80492..b7017624 100644
--- a/2-Regression/1-Tools/solution/notebook.ipynb
+++ b/2-Regression/1-Tools/solution/notebook.ipynb
@@ -182,13 +182,6 @@
"plt.plot(X_test, y_pred, color='blue', linewidth=3)\n",
"plt.show()"
]
- },
- {
- "cell_type": "code",
- "execution_count": null,
- "metadata": {},
- "outputs": [],
- "source": []
}
]
-}
\ No newline at end of file
+}
diff --git a/2-Regression/1-Tools/translations/README.it.md b/2-Regression/1-Tools/translations/README.it.md
new file mode 100644
index 00000000..48c61d34
--- /dev/null
+++ b/2-Regression/1-Tools/translations/README.it.md
@@ -0,0 +1,211 @@
+# Get started with Python and Scikit-learn for regression models
+
+
+
+> Sketchnote by [Tomomi Imura](https://www.twitter.com/girlie_mac)
+
+## [Pre-lecture quiz](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/9/)
+
+## Introduction
+
+In these four lessons, you will discover how to build regression models. We will discuss what these are for shortly. But before you do anything, make sure you have the right tools in place to start the process!
+
+In this lesson, you will learn how to:
+
+- Configure your computer for local machine learning tasks.
+- Work with Jupyter notebooks.
+- Use Scikit-learn, including installation.
+- Explore linear regression with a hands-on exercise.
+
+## Installations and configurations
+
+[](https://youtu.be/7EXd4_ttIuw "Using Python with Visual Studio Code")
+
+> 🎥 Click the image above for a video: using Python within VS Code.
+
+1. **Install Python**. Make sure [Python](https://www.python.org/downloads/) is installed on your computer. You will use Python for many data science and machine learning tasks. Most computer systems already include a Python installation. There are also useful [Python Coding Packs](https://code.visualstudio.com/learn/educators/installers?WT.mc_id=academic-15963-cxa) available to ease the setup for some users.
+
+   Some usages of Python, however, require one version of the software, whereas others require a different one. For this reason, it is useful to work within a [virtual environment](https://docs.python.org/3/library/venv.html).
+
+2. **Install Visual Studio Code**. Make sure Visual Studio Code is installed on your computer. Follow these instructions to [install Visual Studio Code](https://code.visualstudio.com/) for the basic installation. You are going to use Python in Visual Studio Code in this course, so it is worth brushing up on how to [configure Visual Studio Code](https://docs.microsoft.com/learn/modules/python-install-vscode?WT.mc_id=academic-15963-cxa) for Python development.
+
+   > Get comfortable with Python by working through this collection of [Learn modules](https://docs.microsoft.com/users/jenlooper-2911/collections/mp1pagggd5qrq7?WT.mc_id=academic-15963-cxa)
+
+3. **Install Scikit-learn**, by following [these instructions](https://scikit-learn.org/stable/install.html). Since you need to ensure that you use Python 3, it is recommended that you use a virtual environment. Note, if you are installing this library on an M1 Mac, there are special instructions on the page linked above.
+
+1. **Install Jupyter Notebook**. You will need to [install the Jupyter package](https://pypi.org/project/jupyter/).
+
+## Your ML authoring environment
+
+You are going to use **notebooks** to develop your Python code and create machine learning models. This type of file is a common tool for data scientists, and it can be identified by its suffix or extension `.ipynb`.
+
+Notebooks are an interactive environment that allow the developer to both write code and add notes and documentation around the code, which is quite helpful for experimental or research-oriented projects.
+
+### Exercise - work with a notebook
+
+In this folder, you will find the file _notebook.ipynb_.
+
+1. Open _notebook.ipynb_ in Visual Studio Code.
+
+   A Jupyter server will start with Python 3+. You will find areas of the notebook that can be `run`, pieces of code. You can run a code block by selecting the icon that looks like a play button.
+
+1. Select the `md` icon and add a bit of markdown, with the following text **# Welcome to your notebook**.
+
+   Next, add some Python code.
+
+1. Type **print('hello notebook')** in the code area.
+1. Select the arrow to run the code.
+
+   You should see the printed statement:
+
+   ```output
+   hello notebook
+   ```
+
+
+
+You can interleave your code with comments to self-document the notebook.
+
+✅ Think for a minute how different a web developer's working environment is from that of a data scientist.
+
+## Up and running with Scikit-learn
+
+Now that Python is set up in your local environment, and you are comfortable with Jupyter notebooks, let's get equally comfortable with Scikit-learn (pronounce the `sci` as in `science`). Scikit-learn provides an [extensive API](https://scikit-learn.org/stable/modules/classes.html#api-ref) to help you perform ML tasks.
+
+According to their [website](https://scikit-learn.org/stable/getting_started.html), "Scikit-learn is an open source machine learning library that supports supervised and unsupervised learning. It also provides various tools for model fitting, data preprocessing, model selection and evaluation, and many other utilities."
+
+In this course, you will use Scikit-learn and other tools to build machine learning models to perform what we call 'traditional machine learning' tasks. We have deliberately avoided neural networks and deep learning, as they are better covered in our forthcoming 'AI for Beginners' curriculum.
+
+Scikit-learn makes it straightforward to build models and evaluate them for use. It is primarily focused on using numeric data and contains several ready-made datasets for use as learning tools. It also includes pre-built models for students to try. Let's explore the process of loading prepackaged data and using a built-in estimator to create a first ML model with Scikit-learn using some basic data.
+
+## Exercise - your first Scikit-learn notebook
+
+> This tutorial was inspired by the [linear regression example](https://scikit-learn.org/stable/auto_examples/linear_model/plot_ols.html#sphx-glr-auto-examples-linear-model-plot-ols-py) on Scikit-learn's website.
+
+In the _notebook.ipynb_ file associated with this lesson, clear out all the cells by pressing the 'trash can' icon.
+
+In this section, you will work with a small dataset about diabetes that is built into Scikit-learn for learning purposes. Imagine that you wanted to test a treatment for diabetic patients. Machine Learning models might help you determine which patients would respond better to the treatment, based on combinations of variables. Even a very basic regression model, when visualized, might show information about variables that would help you organize your theoretical clinical trials.
+
+✅ There are many types of regression methods, and which one you pick depends on the answer you're looking for. If you want to predict the probable height of a person of a given age, you would use linear regression, as you are seeking a **numeric value**. If you are interested in discovering whether a type of cuisine should be considered vegan or not, you are looking for a **category assignment**, so you would use logistic regression. You'll learn more about logistic regression later. Think a bit about some questions you can ask of data, and which of these methods would be more appropriate.
+
+Let's get started on this task.
+
+### Import libraries
+
+For this task we will import some libraries:
+
+- **matplotlib**. It's a useful [graphing tool](https://matplotlib.org/) and we will use it to create a line plot.
+- **numpy**. [numpy](https://numpy.org/doc/stable/user/whatisnumpy.html) is a useful library for handling numeric data in Python.
+- **sklearn**. This is the Scikit-learn library.
+
+Import some libraries to help with your tasks.
+
+1. Add the imports with the following code:
+
+ ```python
+ import matplotlib.pyplot as plt
+ import numpy as np
+ from sklearn import datasets, linear_model, model_selection
+ ```
+
+   Above you are importing `matplotlib` and `numpy`, and from `sklearn` you are importing `datasets`, `linear_model` and `model_selection`. `model_selection` is used for splitting data into training and test sets.
+
+### The diabetes dataset
+
+The built-in [diabetes dataset](https://scikit-learn.org/stable/datasets/toy_dataset.html#diabetes-dataset) includes 442 samples of data around diabetes, with 10 feature variables, some of which include:
+
+- age: age in years
+- bmi: body mass index
+- bp: average blood pressure
+- s1 tc: T-Cells (a type of white blood cell)
+
+✅ This dataset includes the concept of 'sex' as a feature variable important to research around diabetes. Many medical datasets include this type of binary classification. Think a bit about how categorizations such as this might exclude certain parts of a population from treatments.
+
+Now, load up the X and y data.
+
+> 🎓 Remember, this is supervised learning, and we need a named 'y' target.
+
+In a new code cell, load the diabetes dataset by calling `load_diabetes()`. The input `return_X_y=True` signals that `X` will be a data matrix and `y` will be the regression target.
+
+1. Add some print commands to show the shape of the data matrix and its first element:
+
+ ```python
+ X, y = datasets.load_diabetes(return_X_y=True)
+ print(X.shape)
+ print(X[0])
+ ```
+
+   What you get back as a response is a tuple. What you are doing is assigning the first two values of the tuple to `X` and `y` respectively. Learn more [about tuples](https://wikipedia.org/wiki/Tuple).
+
+   You can see that this data has 442 items shaped in arrays of 10 elements:
+
+ ```text
+ (442, 10)
+ [ 0.03807591 0.05068012 0.06169621 0.02187235 -0.0442235 -0.03482076
+ -0.04340085 -0.00259226 0.01990842 -0.01764613]
+ ```
+
+   ✅ Think a bit about the relationship between the data and the regression target. Linear regression predicts relationships between feature X and target variable y. Can you find the [target](https://scikit-learn.org/stable/datasets/toy_dataset.html#diabetes-dataset) for the diabetes dataset in the documentation? What is this dataset demonstrating, given that target?
+
+2. Next, select a portion of this dataset to plot by arranging it into a new array using numpy's `newaxis` function. We are going to use linear regression to generate a line between the values in this data, according to a pattern it determines.
+
+ ```python
+ X = X[:, np.newaxis, 2]
+ ```
+
+   ✅ At any time, print out the data to check its shape.
+
+3. Now that you have data ready to be plotted, you can see if a machine can help determine a logical split between the numbers in this dataset. To do this, you need to split both the data (X) and the target (y) into test and training sets. Scikit-learn has a straightforward way to do this; you can split your test data at a given point.
+
+ ```python
+ X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y, test_size=0.33)
+ ```
+
+4. Now you are ready to train your model! Load the linear regression model and train it with your X and y training sets using `model.fit()`:
+
+ ```python
+ model = linear_model.LinearRegression()
+ model.fit(X_train, y_train)
+ ```
+
+   ✅ `model.fit()` is a function you'll see in many ML libraries such as TensorFlow
+
+5. Then, create a prediction using test data, using the function `predict()`. This will be used to draw the line between data groups.
+
+ ```python
+ y_pred = model.predict(X_test)
+ ```
+
+6. Now it's time to show the data in a plot. Matplotlib is a very useful tool for this task. Create a scatterplot of all the X and y test data, and use the prediction to draw a line in the most appropriate place, between the model's data groupings.
+
+ ```python
+ plt.scatter(X_test, y_test, color='black')
+ plt.plot(X_test, y_pred, color='blue', linewidth=3)
+ plt.show()
+ ```
+
+ 
+
+   ✅ Think a bit about what's going on here. A straight line is running through many small dots of data, but what is it doing exactly? Can you see how you should be able to use this line to predict where a new, unseen data point should fit in relation to the plot's y axis? Try to put into words the practical use of this model.
+
+Congratulations, you built your first linear regression model, created a prediction with it, and displayed it in a plot!
+
+---
+
+## 🚀Challenge
+
+Plot a different variable from this dataset. Hint: edit this line: `X = X[:, np.newaxis, 2]`. Given this dataset's target, what are you able to discover about the progression of diabetes as a disease?
+
+## [Post-lecture quiz](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/10/)
+
+## Review & Self Study
+
+In this tutorial, you worked with simple linear regression, rather than univariate or multiple linear regression. Read a little about the differences between these methods, or take a look at [this video](https://www.coursera.org/lecture/quantifying-relationships-regression-models/linear-vs-nonlinear-categorical-variables-ai2Ef)
+
+Read more about the concept of regression and think about what kinds of questions can be answered by this technique. Take this [tutorial](https://docs.microsoft.com/learn/modules/train-evaluate-regression-models?WT.mc_id=academic-15963-cxa) to deepen your understanding.
+
+## Assignment
+
+[A different dataset](assignment.it.md)
+
diff --git a/2-Regression/1-Tools/translations/README.ja.md b/2-Regression/1-Tools/translations/README.ja.md
index 0bebf16d..0ecc0e82 100644
--- a/2-Regression/1-Tools/translations/README.ja.md
+++ b/2-Regression/1-Tools/translations/README.ja.md
@@ -4,7 +4,7 @@
> Sketchnote by [Tomomi Imura](https://www.twitter.com/girlie_mac)
-## [Pre-lecture quiz](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/9/)
+## [Pre-lecture quiz](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/9?loc=ja)
## Introduction
@@ -205,7 +205,7 @@ s1 tc: T-Cells (a type of white blood cell)
## 🚀Challenge
Select and plot a different variable from this dataset. Hint: edit the line `X = X[:, np.newaxis, 2]`. Given this dataset's target, what discoveries can you make about the progression of diabetes as a disease?
-## [Post-lecture quiz](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/10/)
+## [Post-lecture quiz](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/10?loc=ja)
## Review & Self Study
@@ -215,7 +215,7 @@ s1 tc: T-Cells (a type of white blood cell)
## Assignment
-[A different dataset](assignment.md)
+[A different dataset](./assignment.ja.md)
diff --git a/2-Regression/1-Tools/translations/README.zh-cn.md b/2-Regression/1-Tools/translations/README.zh-cn.md
new file mode 100644
index 00000000..bb7f37c1
--- /dev/null
+++ b/2-Regression/1-Tools/translations/README.zh-cn.md
@@ -0,0 +1,205 @@
+# 开始使用Python和Scikit学习回归模型
+
+
+
+> 作者[Tomomi Imura](https://www.twitter.com/girlie_mac)
+
+## [课前测](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/9/)
+## 介绍
+
+在这四节课中,你将了解如何构建回归模型。我们将很快讨论这些是什么。但在你做任何事情之前,请确保你有合适的工具来开始这个过程!
+
+在本课中,你将学习如何:
+
+- 为本地机器学习任务配置你的计算机。
+- 使用Jupyter notebooks。
+- 使用Scikit-learn,包括安装。
+- 通过动手练习探索线性回归。
+
+## 安装和配置
+
+[](https://youtu.be/7EXd4_ttIuw "在 Visual Studio Code中使用 Python")
+
+> 🎥 单击上图观看视频:在VS Code中使用Python。
+
+1. **安装 Python**。确保你的计算机上安装了[Python](https://www.python.org/downloads/)。你将在许多数据科学和机器学习任务中使用 Python。大多数计算机系统已经安装了Python。也有一些有用的[Python编码包](https://code.visualstudio.com/learn/educations/installers?WT.mc_id=academic-15963-cxa)可用于简化某些用户的设置。
+
+ 然而,Python的某些用法需要一个版本的软件,而其他用法则需要另一个不同的版本。 因此,在[虚拟环境](https://docs.python.org/3/library/venv.html)中工作很有用。
+
+2. **安装 Visual Studio Code**。确保你的计算机上安装了Visual Studio Code。按照这些说明[安装 Visual Studio Code](https://code.visualstudio.com/)进行基本安装。在本课程中,你将在Visual Studio Code中使用Python,因此你可能想复习如何[配置 Visual Studio Code](https://docs.microsoft.com/learn/modules/python-install-vscode?WT.mc_id=academic-15963-cxa)用于Python开发。
+
+ > 通过学习这一系列的 [学习模块](https://docs.microsoft.com/users/jenlooper-2911/collections/mp1pagggd5qrq7?WT.mc_id=academic-15963-cxa)熟悉Python
+
+3. **按照[这些说明]安装Scikit learn**(https://scikit-learn.org/stable/install.html )。由于你需要确保使用Python3,因此建议你使用虚拟环境。注意,如果你是在M1 Mac上安装这个库,在上面链接的页面上有特别的说明。
+
+4. **安装Jupyter Notebook**。你需要[安装Jupyter包](https://pypi.org/project/jupyter/)。
+
+## 你的ML工作环境
+
+你将使用**notebooks**开发Python代码并创建机器学习模型。这种类型的文件是数据科学家的常用工具,可以通过后缀或扩展名`.ipynb`来识别它们。
+
+Notebooks是一个交互式环境,允许开发人员编写代码并添加注释并围绕代码编写文档,这对于实验或面向研究的项目非常有帮助。
+
+### 练习 - 使用notebook
+
+1. 在Visual Studio Code中打开_notebook.ipynb_。
+
+ Jupyter服务器将以python3+启动。你会发现notebook可以“运行”的区域、代码块。你可以通过选择看起来像播放按钮的图标来运行代码块。
+
+2. 选择`md`图标并添加一点markdown,输入文字 **# Welcome to your notebook**。
+
+ 接下来,添加一些Python代码。
+
+1. 在代码块中输入**print("hello notebook")**。
+
+2. 选择箭头运行代码。
+
+ 你应该看到打印的语句:
+
+ ```output
+ hello notebook
+ ```
+
+
+
+你可以为你的代码添加注释,以便notebook可以自描述。
+
+✅ 想一想web开发人员的工作环境与数据科学家的工作环境有多大的不同。
+
+## 启动并运行Scikit-learn
+
+现在Python已在你的本地环境中设置好,并且你对Jupyter notebook感到满意,让我们同样熟悉Scikit-learn(在“science”中发音为“sci”)。 Scikit-learn提供了[大量的API](https://scikit-learn.org/stable/modules/classes.html#api-ref)来帮助你执行ML任务。
+
+根据他们的[网站](https://scikit-learn.org/stable/getting_started.html),“Scikit-learn是一个开源机器学习库,支持有监督和无监督学习。它还提供了各种模型拟合工具、数据预处理、模型选择和评估以及许多其他实用程序。”
+
+在本课程中,你将使用Scikit-learn和其他工具来构建机器学习模型,以执行我们所谓的“传统机器学习”任务。我们特意避免了神经网络和深度学习,因为它们在我们即将推出的“面向初学者的人工智能”课程中得到了更好的介绍。
+
+Scikit-learn使构建模型和评估它们的使用变得简单。它主要侧重于使用数字数据,并包含几个现成的数据集用作学习工具。它还包括供学生尝试的预建模型。让我们探索加载预先打包的数据和使用内置的estimator first ML模型和Scikit-learn以及一些基本数据的过程。
+
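+As a quick, hypothetical sketch of that claim (not part of the lesson), you can list the bundled dataset loaders straight from the `datasets` module:
+
+```python
+# Print the names of the loader functions that ship with Scikit-learn;
+# most `load_*` names correspond to bundled toy datasets (e.g. load_diabetes).
+from sklearn import datasets
+
+print([name for name in dir(datasets) if name.startswith("load_")])
+```
+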
+## Exercise - your first Scikit-learn notebook
+
+> This tutorial was inspired by the [linear regression example](https://scikit-learn.org/stable/auto_examples/linear_model/plot_ols.html#sphx-glr-auto-examples-linear-model-plot-ols-py) on Scikit-learn's web site.
+
+In the _notebook.ipynb_ file associated with this lesson, clear out all the cells by pressing the 'trash can' icon.
+
+In this section, you will work with a small dataset about diabetes that is built into Scikit-learn for learning purposes. Imagine that you wanted to test a treatment for diabetic patients. Machine learning models might help you determine which patients would respond better to the treatment, based on combinations of variables. Even a very basic regression model, when visualized, might show information about variables that would help you organize your theoretical clinical trials.
+
+✅ There are many types of regression methods, and which one you pick depends on the answer you're looking for. If you want to predict the probable height of a person of a given age, you'd use linear regression, as you're seeking a **numeric value**. If you're interested in discovering whether a type of cuisine should be considered vegan or not, you're looking for a **category assignment**, so you would use logistic regression. You'll learn more about logistic regression later. Think a bit about some questions you can ask of data, and which of these methods would be more appropriate.
+
+Let's get started on this task.
+
+### Import libraries
+
+For this task we will import some libraries:
+
+- **matplotlib**. It's a useful [graphing tool](https://matplotlib.org/) and we will use it to create a line plot.
+- **numpy**. [numpy](https://numpy.org/doc/stable/user/whatisnumpy.html) is a useful library for handling numeric data in Python.
+- **sklearn**. This is the Scikit-learn library.
+
+Import some libraries to help with your tasks.
+
+1. Add imports by typing the following code:
+
+    ```python
+    import matplotlib.pyplot as plt
+    import numpy as np
+    from sklearn import datasets, linear_model, model_selection
+    ```
+
+    Above you are importing `matplotlib` and `numpy`, and you are importing `datasets`, `linear_model` and `model_selection` from `sklearn`. `model_selection` is used for splitting data into training and test sets.
+
+### The diabetes dataset
+
+The built-in [diabetes dataset](https://scikit-learn.org/stable/datasets/toy_dataset.html#diabetes-dataset) includes 442 samples of data around diabetes, with 10 feature variables, some of which include:
+
+age: age in years
+bmi: body mass index
+bp: average blood pressure
+s1 tc: T-Cells (a type of white blood cells)
+
+✅ This dataset includes the concept of 'sex' as a feature variable important to research around diabetes. Many medical datasets include this type of binary classification. Think a bit about how categorizations such as this might exclude certain parts of a population from treatments.
+
+Now, load up the X and y data.
+
+> 🎓 Remember, this is supervised learning, and we need a named 'y' target.
+
+In a new code cell, load the diabetes dataset by calling `load_diabetes()`. The input `return_X_y=True` signals that `X` will be a data matrix and `y` will be the regression target.
+
+1. Add some print commands to show the shape of the data matrix and its first element:
+
+    ```python
+    X, y = datasets.load_diabetes(return_X_y=True)
+    print(X.shape)
+    print(X[0])
+    ```
+
+    What you are getting back as a response is a tuple. What you are doing is assigning the two first values of the tuple to `X` and `y` respectively. Learn more [about tuples](https://wikipedia.org/wiki/Tuple).
+
+    You can see that this data has 442 items shaped in arrays of 10 elements:
+
+    ```text
+    (442, 10)
+    [ 0.03807591  0.05068012  0.06169621  0.02187235 -0.0442235  -0.03482076
+     -0.04340085 -0.00259226  0.01990842 -0.01764613]
+    ```
+
+    ✅ Think a bit about the relationship between the data and the regression target. Linear regression predicts relationships between feature X and target variable y. Can you find the [target](https://scikit-learn.org/stable/datasets/toy_dataset.html#diabetes-dataset) for the diabetes dataset in the documentation? What is this dataset demonstrating, given that target?
+
+2. Next, select a portion of this dataset to plot by arranging it into a new array using numpy's `newaxis` function. We are going to use linear regression to generate a line between values in this data, according to a pattern it determines.
+
+    ```python
+    X = X[:, np.newaxis, 2]
+    ```
+
+    ✅ At any time, print out the data to check its shape.
+
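+    For example (an optional sketch; column index 2 of the diabetes data is the body mass index feature), a quick shape check makes the reshape visible:
+
+    ```python
+    # The slice above kept a single feature column, and np.newaxis kept the
+    # result 2-D: (442, 10) became (442, 1), the shape estimators expect for X.
+    print(X.shape)  # (442, 1)
+    ```
+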
+3. Now that you have data ready to be plotted, you can see whether a machine can help determine a logical split between the numbers in this dataset. To do this, you need to split both the data (X) and the target (y) into test and training sets. Scikit-learn has a straightforward way to do this; you can split your test data at a given point.
+
+    ```python
+    X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y, test_size=0.33)
+    ```
+
+4. Now you are ready to train your model! Load the linear regression model and train it with your X and y training sets using `model.fit()`:
+
+    ```python
+    model = linear_model.LinearRegression()
+    model.fit(X_train, y_train)
+    ```
+
+    ✅ `model.fit()` is a function you'll see in many ML libraries such as TensorFlow
+
+5. Then, create a prediction using test data, using the function `predict()`. This will be used to draw the line between the data groups
+
+    ```python
+    y_pred = model.predict(X_test)
+    ```
+
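+    As an optional aside (a sketch, not part of the lesson): scikit-learn regressors such as `LinearRegression` also expose a `score()` method that returns the coefficient of determination (R²) on held-out data, giving a quick numeric sense of fit before you plot anything.
+
+    ```python
+    # R² on the test set: 1.0 is a perfect fit; values near 0 mean the model
+    # barely improves on always predicting the mean of y.
+    print(model.score(X_test, y_test))
+    ```
+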
+6. Now it's time to show the data in a plot. Matplotlib is a very useful tool for this task. Create a scatterplot of all the X and y test data, and use the prediction to draw a line in the most appropriate place, between the model's data groupings.
+
+    ```python
+    plt.scatter(X_test, y_test, color='black')
+    plt.plot(X_test, y_pred, color='blue', linewidth=3)
+    plt.show()
+    ```
+
+
+
+    ✅ Think a bit about what's going on here. A straight line is running through many small dots of data, but what is it doing exactly? Can you see how you should be able to use this line to predict where a new, unseen data point should fit in relationship to the plot's y axis? Try to put into words the practical use of this model.
+
+Congratulations, you built your first linear regression model, created a prediction with it, and displayed it in a plot!
+
+---
+## 🚀Challenge
+
+Plot a different variable from this dataset. Hint: edit this line: `X = X[:, np.newaxis, 2]`. Given this dataset's target, what are you able to discover about the progression of diabetes as a disease?
+## [Post-lecture quiz](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/10/)
+
+## Review & Self Study
+
+In this tutorial, you worked with simple linear regression, rather than univariate or multiple linear regression. Read a little about the differences between these methods, or take a look at [this video](https://www.coursera.org/lecture/quantifying-relationships-regression-models/linear-vs-nonlinear-categorical-variables-ai2Ef)
+
+Read more about the concept of regression and think about what kinds of questions can be answered by this technique. Take this [tutorial](https://docs.microsoft.com/learn/modules/train-evaluate-regression-models?WT.mc_id=academic-15963-cxa) to deepen your understanding.
+
+## Assignment
+
+[A different dataset](./assignment.zh-cn.md)
diff --git a/2-Regression/1-Tools/translations/assignment.it.md b/2-Regression/1-Tools/translations/assignment.it.md
new file mode 100644
index 00000000..51fa1663
--- /dev/null
+++ b/2-Regression/1-Tools/translations/assignment.it.md
@@ -0,0 +1,13 @@
+# Regression with Scikit-learn
+
+## Instructions
+
+Take a look at the [Linnerud dataset](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_linnerud.html#sklearn.datasets.load_linnerud) in Scikit-learn. This dataset has multiple [targets](https://scikit-learn.org/stable/datasets/toy_dataset.html#linnerrud-dataset): "It consists of three exercise (data) and three physiological (target) variables collected from twenty middle-aged men in a fitness club".
+
+In your own words, describe how to create a Regression model that would plot the relationship between the waistline and how many situps are accomplished. Do the same for the other datapoints in this dataset.
+
+## Rubric
+
+| Criteria | Exemplary | Adequate | Needs Improvement |
+| ------------------------------ | ----------------------------------- | ----------------------------- | -------------------------- |
+| Submit a descriptive paragraph | A well-written paragraph is submitted | A few sentences are submitted | No description is supplied |
diff --git a/2-Regression/1-Tools/translations/assignment.ja.md b/2-Regression/1-Tools/translations/assignment.ja.md
new file mode 100644
index 00000000..6f7d9ef0
--- /dev/null
+++ b/2-Regression/1-Tools/translations/assignment.ja.md
@@ -0,0 +1,13 @@
+# Regression with Scikit-learn
+
+## Instructions
+
+Take a look at the [Linnerud dataset](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_linnerud.html#sklearn.datasets.load_linnerud) in Scikit-learn. This dataset has multiple [targets](https://scikit-learn.org/stable/datasets/toy_dataset.html#linnerrud-dataset): it consists of three exercise variables (data) and three physiological variables (target) collected from twenty middle-aged men in a fitness club.
+
+In your own words, describe how to create a regression model that would plot the relationship between the waistline and how many situps are accomplished. Do the same for the other datapoints in this dataset.
+
+## Rubric
+
+| Criteria | Exemplary | Adequate | Needs Improvement |
+| ------------------------------ | ----------------------------------- | ----------------------------- | -------------------------- |
+| Submit a descriptive paragraph | A well-written paragraph is submitted | A few sentences are submitted | No description is supplied |
diff --git a/2-Regression/1-Tools/translations/assignment.zh-cn.md b/2-Regression/1-Tools/translations/assignment.zh-cn.md
new file mode 100644
index 00000000..c296c8ca
--- /dev/null
+++ b/2-Regression/1-Tools/translations/assignment.zh-cn.md
@@ -0,0 +1,14 @@
+# Regression with Scikit-learn
+
+## Instructions
+
+First, take a look at the [Linnerud dataset](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_linnerud.html#sklearn.datasets.load_linnerud) in Scikit-learn.
+This dataset has multiple [targets](https://scikit-learn.org/stable/datasets/toy_dataset.html#linnerrud-dataset): it consists of three exercise variables (training data) and three physiological variables (targets), collected from twenty middle-aged men in a fitness club.
+
+Then, in your own words, describe how to create a regression model that would plot the relationship between the waistline and how many situps are accomplished. Do the same for the other datapoints in this dataset and explore the relationships they contain.
+
+## Rubric
+
+| Criteria | Exemplary | Adequate | Needs Improvement |
+| ------------------------------ | ----------------------------------- | ----------------------------- | -------------------------- |
+| Submit a paragraph describing the relationships in the dataset | The relationships in the dataset are described well | Only a few of the relationships are described | Nothing is submitted |
diff --git a/2-Regression/2-Data/README.md b/2-Regression/2-Data/README.md
index 2c7f23ad..56493188 100644
--- a/2-Regression/2-Data/README.md
+++ b/2-Regression/2-Data/README.md
@@ -1,10 +1,13 @@
# Build a regression model using Scikit-learn: prepare and visualize data
-> 
-> Infographic by [Dasani Madipalli](https://twitter.com/dasani_decoded)
+
+
+Infographic by [Dasani Madipalli](https://twitter.com/dasani_decoded)
## [Pre-lecture quiz](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/11/)
+> ### [This lesson is available in R!](./solution/lesson_2-R.ipynb)
+
## Introduction
Now that you are set up with the tools you need to start tackling machine learning model building with Scikit-learn, you are ready to start asking questions of your data. As you work with data and apply ML solutions, it's very important to understand how to ask the right question to properly unlock the potentials of your dataset.
diff --git a/2-Regression/2-Data/images/dplyr_wrangling.png b/2-Regression/2-Data/images/dplyr_wrangling.png
new file mode 100644
index 00000000..06c50bb3
Binary files /dev/null and b/2-Regression/2-Data/images/dplyr_wrangling.png differ
diff --git a/2-Regression/2-Data/images/unruly_data.jpg b/2-Regression/2-Data/images/unruly_data.jpg
new file mode 100644
index 00000000..54943ca9
Binary files /dev/null and b/2-Regression/2-Data/images/unruly_data.jpg differ
diff --git a/2-Regression/2-Data/solution/lesson_2-R.ipynb b/2-Regression/2-Data/solution/lesson_2-R.ipynb
new file mode 100644
index 00000000..adb3a503
--- /dev/null
+++ b/2-Regression/2-Data/solution/lesson_2-R.ipynb
@@ -0,0 +1,644 @@
+{
+ "nbformat": 4,
+ "nbformat_minor": 0,
+ "metadata": {
+ "colab": {
+ "name": "lesson_2-R.ipynb",
+ "provenance": [],
+ "collapsed_sections": [],
+ "toc_visible": true
+ },
+ "kernelspec": {
+ "name": "ir",
+ "display_name": "R"
+ },
+ "language_info": {
+ "name": "R"
+ }
+ },
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "Pg5aexcOPqAZ"
+ },
+ "source": [
+ "# Build a regression model: prepare and visualize data\n",
+ "\n",
+ "## **Linear Regression for Pumpkins - Lesson 2**\n",
+ "#### Introduction\n",
+ "\n",
+        "Now that you are set up with the tools you need to start tackling machine learning model building with Tidymodels and the Tidyverse, you are ready to start asking questions of your data. As you work with data and apply ML solutions, it's very important to understand how to ask the right question to properly unlock the potential of your dataset.\n",
+ "\n",
+ "In this lesson, you will learn:\n",
+ "\n",
+ "- How to prepare your data for model-building.\n",
+ "\n",
+ "- How to use `ggplot2` for data visualization.\n",
+ "\n",
+ "The question you need answered will determine what type of ML algorithms you will leverage. And the quality of the answer you get back will be heavily dependent on the nature of your data.\n",
+ "\n",
+ "Let's see this by working through a practical exercise.\n",
+ "\n",
+ " Artwork by \\@allison_horst"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "dc5WhyVdXAjR"
+ },
+ "source": [
+ "## 1. Importing pumpkins data and summoning the Tidyverse\n",
+ "\n",
+ "We'll require the following packages to slice and dice this lesson:\n",
+ "\n",
+        "- `tidyverse`: The [tidyverse](https://www.tidyverse.org/) is a [collection of R packages](https://www.tidyverse.org/packages) designed to make data science faster, easier and more fun!\n",
+ "\n",
+ "You can have them installed as:\n",
+ "\n",
+ "`install.packages(c(\"tidyverse\"))`\n",
+ "\n",
+ "The script below checks whether you have the packages required to complete this module and installs them for you in case some are missing."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "GqPYUZgfXOBt"
+ },
+ "source": [
+ "if (!require(\"pacman\")) install.packages(\"pacman\")\n",
+ "pacman::p_load(tidyverse)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "kvjDTPDSXRr2"
+ },
+ "source": [
+ "Now, let's fire up some packages and load the [data](https://github.com/microsoft/ML-For-Beginners/blob/main/2-Regression/data/US-pumpkins.csv) provided for this lesson!"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "VMri-t2zXqgD"
+ },
+ "source": [
+ "# Load the core Tidyverse packages\n",
+ "library(tidyverse)\n",
+ "\n",
+ "# Import the pumpkins data\n",
+ "pumpkins <- read_csv(file = \"https://raw.githubusercontent.com/microsoft/ML-For-Beginners/main/2-Regression/data/US-pumpkins.csv\")\n",
+ "\n",
+ "\n",
+ "# Get a glimpse and dimensions of the data\n",
+ "glimpse(pumpkins)\n",
+ "\n",
+ "\n",
+ "# Print the first 50 rows of the data set\n",
+ "pumpkins %>% \n",
+ " slice_head(n =50)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "REWcIv9yX29v"
+ },
+ "source": [
+        "A quick `glimpse()` immediately shows that there are blanks and a mix of strings (`chr`) and numeric data (`dbl`). The `Date` column is of type character and there's also a strange column called `Package` where the data is a mix between `sacks`, `bins` and other values. The data, in fact, is a bit of a mess 😤.\n",
+ "\n",
+ "In fact, it is not very common to be gifted a dataset that is completely ready to use to create a ML model out of the box. But worry not, in this lesson, you will learn how to prepare a raw dataset using standard R libraries 🧑🔧. You will also learn various techniques to visualize the data.📈📊\n",
+ " \n",
+ "\n",
+ "> A refresher: The pipe operator (`%>%`) performs operations in logical sequence by passing an object forward into a function or call expression. You can think of the pipe operator as saying \"and then\" in your code.\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "Zxfb3AM5YbUe"
+ },
+ "source": [
+ "## 2. Check for missing data\n",
+ "\n",
+        "One of the most common issues data scientists need to deal with is incomplete or missing data. R represents missing or unknown values with a special sentinel value: `NA` (Not Available).\n",
+ "\n",
+ "So how would we know that the data frame contains missing values?\n",
+ " \n",
+        "- One straightforward way would be to use the base R function `anyNA`, which returns a logical value: `TRUE` or `FALSE`"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "G--DQutAYltj"
+ },
+ "source": [
+ "pumpkins %>% \n",
+ " anyNA()"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "mU-7-SB6YokF"
+ },
+ "source": [
+ "Great, there seems to be some missing data! That's a good place to start.\n",
+ "\n",
+ "- Another way would be to use the function `is.na()` that indicates which individual column elements are missing with a logical `TRUE`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "W-DxDOR4YxSW"
+ },
+ "source": [
+ "pumpkins %>% \n",
+ " is.na() %>% \n",
+ " head(n = 7)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "xUWxipKYY0o7"
+ },
+ "source": [
+        "Okay, that got the job done, but with a large data frame like this one, it would be inefficient and practically impossible to review all of the rows and columns individually😴.\n",
+ "\n",
+ "- A more intuitive way would be to calculate the sum of the missing values for each column:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "ZRBWV6P9ZArL"
+ },
+ "source": [
+ "pumpkins %>% \n",
+ " is.na() %>% \n",
+ " colSums()"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "9gv-crB6ZD1Y"
+ },
+ "source": [
+ "Much better! There is missing data, but maybe it won't matter for the task at hand. Let's see what further analysis brings forth.\n",
+ "\n",
+        "> Along with the awesome sets of packages and functions, R has very good documentation. For instance, use `help(colSums)` or `?colSums` to find out more about the function."
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "o4jLY5-VZO2C"
+ },
+ "source": [
+ "## 3. Dplyr: A Grammar of Data Manipulation\n",
+ "\n",
+ " Artwork by \\@allison_horst"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "i5o33MQBZWWw"
+ },
+ "source": [
+ "[`dplyr`](https://dplyr.tidyverse.org/), a package in the Tidyverse, is a grammar of data manipulation that provides a consistent set of verbs that help you solve the most common data manipulation challenges. In this section, we'll explore some of dplyr's verbs!\n",
+ " \n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "x3VGMAGBZiUr"
+ },
+ "source": [
+ "#### dplyr::select()\n",
+ "\n",
+ "`select()` is a function in the package `dplyr` which helps you pick columns to keep or exclude.\n",
+ "\n",
+ "To make your data frame easier to work with, drop several of its columns, using `select()`, keeping only the columns you need.\n",
+ "\n",
+ "For instance, in this exercise, our analysis will involve the columns `Package`, `Low Price`, `High Price` and `Date`. Let's select these columns."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "F_FgxQnVZnM0"
+ },
+ "source": [
+ "# Select desired columns\n",
+ "pumpkins <- pumpkins %>% \n",
+ " select(Package, `Low Price`, `High Price`, Date)\n",
+ "\n",
+ "\n",
+ "# Print data set\n",
+ "pumpkins %>% \n",
+ " slice_head(n = 5)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "2KKo0Ed9Z1VB"
+ },
+ "source": [
+ "#### dplyr::mutate()\n",
+ "\n",
+ "`mutate()` is a function in the package `dplyr` which helps you create or modify columns, while keeping the existing columns.\n",
+ "\n",
+ "The general structure of mutate is:\n",
+ "\n",
+ "`data %>% mutate(new_column_name = what_it_contains)`\n",
+ "\n",
+ "Let's take `mutate` out for a spin using the `Date` column by doing the following operations:\n",
+ "\n",
+ "1. Convert the dates (currently of type character) to a month format (these are US dates, so the format is `MM/DD/YYYY`).\n",
+ "\n",
+ "2. Extract the month from the dates to a new column.\n",
+ "\n",
+ "In R, the package [lubridate](https://lubridate.tidyverse.org/) makes it easier to work with Date-time data. So, let's use `dplyr::mutate()`, `lubridate::mdy()`, `lubridate::month()` and see how to achieve the above objectives. We can drop the Date column since we won't be needing it again in subsequent operations."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "5joszIVSZ6xe"
+ },
+ "source": [
+ "# Load lubridate\n",
+ "library(lubridate)\n",
+ "\n",
+ "pumpkins <- pumpkins %>% \n",
+ " # Convert the Date column to a date object\n",
+ " mutate(Date = mdy(Date)) %>% \n",
+ " # Extract month from Date\n",
+ " mutate(Month = month(Date)) %>% \n",
+ " # Drop Date column\n",
+ " select(-Date)\n",
+ "\n",
+ "# View the first few rows\n",
+ "pumpkins %>% \n",
+ " slice_head(n = 7)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "nIgLjNMCZ-6Y"
+ },
+ "source": [
+ "Woohoo! 🤩\n",
+ "\n",
+        "Next, let's create a new column `Price`, which represents the average price of a pumpkin, by taking the average of the `Low Price` and `High Price` columns.\n",
+ " "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "Zo0BsqqtaJw2"
+ },
+ "source": [
+ "# Create a new column Price\n",
+ "pumpkins <- pumpkins %>% \n",
+ " mutate(Price = (`Low Price` + `High Price`)/2)\n",
+ "\n",
+ "# View the first few rows of the data\n",
+ "pumpkins %>% \n",
+ " slice_head(n = 5)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "p77WZr-9aQAR"
+ },
+ "source": [
+ "Yeees!💪\n",
+ "\n",
+ "\"But wait!\", you'll say after skimming through the whole data set with `View(pumpkins)`, \"There's something odd here!\"🤔\n",
+ "\n",
+ "If you look at the `Package` column, pumpkins are sold in many different configurations. Some are sold in `1 1/9 bushel` measures, and some in `1/2 bushel` measures, some per pumpkin, some per pound, and some in big boxes with varying widths.\n",
+ "\n",
+ "Let's verify this:"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "XISGfh0IaUy6"
+ },
+ "source": [
+ "# Verify the distinct observations in Package column\n",
+ "pumpkins %>% \n",
+ " distinct(Package)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "7sMjiVujaZxY"
+ },
+ "source": [
+ "Amazing!👏\n",
+ "\n",
+ "Pumpkins seem to be very hard to weigh consistently, so let's filter them by selecting only pumpkins with the string *bushel* in the `Package` column and put this in a new data frame `new_pumpkins`.\n",
+ " "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "L8Qfcs92ageF"
+ },
+ "source": [
+ "#### dplyr::filter() and stringr::str_detect()\n",
+ "\n",
+ "[`dplyr::filter()`](https://dplyr.tidyverse.org/reference/filter.html): creates a subset of the data only containing **rows** that satisfy your conditions, in this case, pumpkins with the string *bushel* in the `Package` column.\n",
+ "\n",
+ "[stringr::str_detect()](https://stringr.tidyverse.org/reference/str_detect.html): detects the presence or absence of a pattern in a string.\n",
+ "\n",
+ "The [`stringr`](https://github.com/tidyverse/stringr) package provides simple functions for common string operations."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "hy_SGYREampd"
+ },
+ "source": [
+ "# Retain only pumpkins with \"bushel\"\n",
+ "new_pumpkins <- pumpkins %>% \n",
+ " filter(str_detect(Package, \"bushel\"))\n",
+ "\n",
+ "# Get the dimensions of the new data\n",
+ "dim(new_pumpkins)\n",
+ "\n",
+ "# View a few rows of the new data\n",
+ "new_pumpkins %>% \n",
+ " slice_head(n = 5)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "VrDwF031avlR"
+ },
+ "source": [
+ "You can see that we have narrowed down to 415 or so rows of data containing pumpkins by the bushel.🤩\n",
+ " "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "mLpw2jH4a0tx"
+ },
+ "source": [
+ "#### dplyr::case_when()\n",
+ "\n",
+ "**But wait! There's one more thing to do**\n",
+ "\n",
+ "Did you notice that the bushel amount varies per row? You need to normalize the pricing so that you show the pricing per bushel, not per 1 1/9 or 1/2 bushel. Time to do some math to standardize it.\n",
+ "\n",
+        "We'll use the function [`case_when()`](https://dplyr.tidyverse.org/reference/case_when.html) to *mutate* the Price column depending on some conditions. `case_when` allows you to vectorise multiple `if_else()` statements.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "P68kLVQmbM6I"
+ },
+ "source": [
+ "# Convert the price if the Package contains fractional bushel values\n",
+ "new_pumpkins <- new_pumpkins %>% \n",
+ " mutate(Price = case_when(\n",
+ " str_detect(Package, \"1 1/9\") ~ Price/(1 + 1/9),\n",
+ " str_detect(Package, \"1/2\") ~ Price/(1/2),\n",
+ " TRUE ~ Price))\n",
+ "\n",
+ "# View the first few rows of the data\n",
+ "new_pumpkins %>% \n",
+ " slice_head(n = 30)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "pS2GNPagbSdb"
+ },
+ "source": [
+        "Now, we can analyze the pricing per unit based on their bushel measurement. All this study of bushels of pumpkins, however, goes to show how very **important** it is to **understand the nature of your data**!\n",
+ "\n",
+        "> ✅ According to [The Spruce Eats](https://www.thespruceeats.com/how-much-is-a-bushel-1389308), a bushel's weight depends on the type of produce, as it's a volume measurement. \"A bushel of tomatoes, for example, is supposed to weigh 56 pounds... Leaves and greens take up more space with less weight, so a bushel of spinach is only 20 pounds.\" It's all pretty complicated! Let's not bother with making a bushel-to-pound conversion, and instead price by the bushel.\n",
+ ">\n",
+ "> ✅ Did you notice that pumpkins sold by the half-bushel are very expensive? Can you figure out why? Hint: little pumpkins are way pricier than big ones, probably because there are so many more of them per bushel, given the unused space taken by one big hollow pie pumpkin.\n",
+ " \n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "qql1SowfbdnP"
+ },
+ "source": [
+        "Now lastly, for the sheer sake of adventure 💁♀️, let's also move the Month column to the first position, i.e. *before* the `Package` column.\n",
+ "\n",
+ "`dplyr::relocate()` is used to change column positions."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "JJ1x6kw8bixF"
+ },
+ "source": [
+ "# Create a new data frame new_pumpkins\n",
+ "new_pumpkins <- new_pumpkins %>% \n",
+ " relocate(Month, .before = Package)\n",
+ "\n",
+ "new_pumpkins %>% \n",
+ " slice_head(n = 7)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "y8TJ0Za_bn5Y"
+ },
+ "source": [
+ "Good job!👌 You now have a clean, tidy dataset on which you can build your new regression model!\n",
+ " "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "mYSH6-EtbvNa"
+ },
+ "source": [
+ "## 4. Data visualization with ggplot2\n",
+ "\n",
+ "{width=\"600\"}\n",
+ "\n",
+ "There is a *wise* saying that goes like this:\n",
+ "\n",
+ "> \"The simple graph has brought more information to the data analyst's mind than any other device.\" --- John Tukey\n",
+ "\n",
+ "Part of the data scientist's role is to demonstrate the quality and nature of the data they are working with. To do this, they often create interesting visualizations, or plots, graphs, and charts, showing different aspects of data. In this way, they are able to visually show relationships and gaps that are otherwise hard to uncover.\n",
+ "\n",
+ "Visualizations can also help determine the machine learning technique most appropriate for the data. A scatterplot that seems to follow a line, for example, indicates that the data is a good candidate for a linear regression exercise.\n",
+ "\n",
+        "R offers several systems for making graphs, but [`ggplot2`](https://ggplot2.tidyverse.org/index.html) is one of the most elegant and most versatile. `ggplot2` allows you to compose graphs by **combining independent components**.\n",
+ "\n",
+ "Let's start with a simple scatter plot for the Price and Month columns.\n",
+ "\n",
+        "So in this case, we'll start with [`ggplot()`](https://ggplot2.tidyverse.org/reference/ggplot.html), supply a dataset and aesthetic mapping (with [`aes()`](https://ggplot2.tidyverse.org/reference/aes.html)), then add layers (like [`geom_point()`](https://ggplot2.tidyverse.org/reference/geom_point.html)) for scatter plots.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "g2YjnGeOcLo4"
+ },
+ "source": [
+ "# Set a theme for the plots\n",
+ "theme_set(theme_light())\n",
+ "\n",
+ "# Create a scatter plot\n",
+ "p <- ggplot(data = new_pumpkins, aes(x = Price, y = Month))\n",
+ "p + geom_point()"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "Ml7SDCLQcPvE"
+ },
+ "source": [
+ "Is this a useful plot 🤷? Does anything about it surprise you?\n",
+ "\n",
+        "It's not particularly useful, as all it does is display your data as a spread of points in a given month.\n",
+ " "
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "jMakvJZIcVkh"
+ },
+ "source": [
+ "### **How do we make it useful?**\n",
+ "\n",
+        "To get charts to display useful data, you usually need to group the data somehow. For instance, in our case, finding the average price of pumpkins for each month would provide more insight into the underlying patterns in our data. This leads us to one more **dplyr** flyby:\n",
+ "\n",
+ "#### `dplyr::group_by() %>% summarize()`\n",
+ "\n",
+ "Grouped aggregation in R can be easily computed using\n",
+ "\n",
+ "`dplyr::group_by() %>% summarize()`\n",
+ "\n",
+ "- `dplyr::group_by()` changes the unit of analysis from the complete dataset to individual groups such as per month.\n",
+ "\n",
+ "- `dplyr::summarize()` creates a new data frame with one column for each grouping variable and one column for each of the summary statistics that you have specified.\n",
+ "\n",
+ "For example, we can use the `dplyr::group_by() %>% summarize()` to group the pumpkins into groups based on the **Month** columns and then find the **mean price** for each month."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "6kVSUa2Bcilf"
+ },
+ "source": [
+ "# Find the average price of pumpkins per month\n",
+ "new_pumpkins %>%\n",
+ " group_by(Month) %>% \n",
+ " summarise(mean_price = mean(Price))"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "Kds48GUBcj3W"
+ },
+ "source": [
+ "Succinct!✨\n",
+ "\n",
+ "Categorical features such as months are better represented using a bar plot 📊. The layers responsible for bar charts are `geom_bar()` and `geom_col()`. Consult `?geom_bar` to find out more.\n",
+ "\n",
+ "Let's whip up one!"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "VNbU1S3BcrxO"
+ },
+ "source": [
+ "# Find the average price of pumpkins per month then plot a bar chart\n",
+ "new_pumpkins %>%\n",
+ " group_by(Month) %>% \n",
+ " summarise(mean_price = mean(Price)) %>% \n",
+ " ggplot(aes(x = Month, y = mean_price)) +\n",
+ " geom_col(fill = \"midnightblue\", alpha = 0.7) +\n",
+ " ylab(\"Pumpkin Price\")"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "zDm0VOzzcuzR"
+ },
+ "source": [
+ "🤩🤩This is a more useful data visualization! It seems to indicate that the highest price for pumpkins occurs in September and October. Does that meet your expectation? Why or why not?\n",
+ "\n",
+ "Congratulations on finishing the second lesson 👏! You prepared your data for model building, then uncovered more insights using visualizations!"
+ ]
+ }
+ ]
+}
diff --git a/2-Regression/2-Data/solution/lesson_2.Rmd b/2-Regression/2-Data/solution/lesson_2.Rmd
new file mode 100644
index 00000000..7f72b1a8
--- /dev/null
+++ b/2-Regression/2-Data/solution/lesson_2.Rmd
@@ -0,0 +1,345 @@
+---
+title: 'Build a regression model: prepare and visualize data'
+output:
+ html_document:
+ df_print: paged
+ theme: flatly
+ highlight: breezedark
+ toc: yes
+ toc_float: yes
+ code_download: yes
+---
+
+## **Linear Regression for Pumpkins - Lesson 2**
+
+#### Introduction
+
+Now that you are set up with the tools you need to start tackling machine learning model building with Tidymodels and the Tidyverse, you are ready to start asking questions of your data. As you work with data and apply ML solutions, it's very important to understand how to ask the right question to properly unlock the potential of your dataset.
+
+In this lesson, you will learn:
+
+- How to prepare your data for model-building.
+
+- How to use `ggplot2` for data visualization.
+
+The question you need answered will determine what type of ML algorithms you will leverage. And the quality of the answer you get back will be heavily dependent on the nature of your data.
+
+Let's see this by working through a practical exercise.
+
+{width="700"}
+
+## 1. Importing pumpkins data and summoning the Tidyverse
+
+We'll require the following packages to slice and dice this lesson:
+
+- `tidyverse`: The [tidyverse](https://www.tidyverse.org/) is a [collection of R packages](https://www.tidyverse.org/packages) designed to make data science faster, easier and more fun!
+
+You can have them installed as:
+
+`install.packages(c("tidyverse"))`
+
+The script below checks whether you have the packages required to complete this module and installs them for you in case they are missing.
+
+```{r, message=F, warning=F}
+if (!require("pacman")) install.packages("pacman")
+pacman::p_load(tidyverse)
+```
+
+Now, let's fire up some packages and load the [data](https://github.com/microsoft/ML-For-Beginners/blob/main/2-Regression/data/US-pumpkins.csv) provided for this lesson!
+
+```{r load_tidy_verse_models, message=F, warning=F}
+# Load the core Tidyverse packages
+library(tidyverse)
+
+# Import the pumpkins data
+pumpkins <- read_csv(file = "https://raw.githubusercontent.com/microsoft/ML-For-Beginners/main/2-Regression/data/US-pumpkins.csv")
+
+
+# Get a glimpse and dimensions of the data
+glimpse(pumpkins)
+
+
+# Print the first 50 rows of the data set
+pumpkins %>%
+ slice_head(n =50)
+
+```
+
+A quick `glimpse()` immediately shows that there are blanks and a mix of strings (`chr`) and numeric data (`dbl`). The `Date` column is of type character and there's also a strange column called `Package` where the data is a mix between `sacks`, `bins` and other values. The data, in fact, is a bit of a mess 😤.
+
+In fact, it is not very common to be gifted a dataset that is completely ready to use to create a ML model out of the box. But worry not, in this lesson, you will learn how to prepare a raw dataset using standard R libraries 🧑🔧. You will also learn various techniques to visualize the data.📈📊
+
+
+
+> A refresher: The pipe operator (`%>%`) performs operations in logical sequence by passing an object forward into a function or call expression. You can think of the pipe operator as saying "and then" in your code.
+
+
+## 2. Check for missing data
+
+One of the most common issues data scientists need to deal with is incomplete or missing data. R represents missing or unknown values with a special sentinel value: `NA` (Not Available).
+
+So how would we know that the data frame contains missing values?
+
+- One straightforward way would be to use the base R function `anyNA`, which returns a logical value: `TRUE` or `FALSE`
+
+```{r anyNA, message=F, warning=F}
+pumpkins %>%
+ anyNA()
+```
+
+Great, there seems to be some missing data! That's a good place to start.
+
+- Another way would be to use the function `is.na()` that indicates which individual column elements are missing with a logical `TRUE`.
+
+```{r is_na, message=F, warning=F}
+pumpkins %>%
+ is.na() %>%
+ head(n = 7)
+```
+
+Okay, that got the job done, but with a large data frame like this one, it would be inefficient and practically impossible to review all of the rows and columns individually😴.
+
+- A more intuitive way would be to calculate the sum of the missing values for each column:
+
+```{r colSum_NA, message=F, warning=F}
+pumpkins %>%
+ is.na() %>%
+ colSums()
+```
+
+Much better! There is missing data, but maybe it won't matter for the task at hand. Let's see what further analysis brings forth.
+
+> Along with the awesome sets of packages and functions, R has very good documentation. For instance, use `help(colSums)` or `?colSums` to find out more about the function.
+
+## 3. Dplyr: A Grammar of Data Manipulation
+
+{width="569"}
+
+[`dplyr`](https://dplyr.tidyverse.org/), a package in the Tidyverse, is a grammar of data manipulation that provides a consistent set of verbs that help you solve the most common data manipulation challenges. In this section, we'll explore some of dplyr's verbs!
+
+#### dplyr::select()
+
+`select()` is a function in the package `dplyr` which helps you pick columns to keep or exclude.
+
+To make your data frame easier to work with, drop several of its columns, using `select()`, keeping only the columns you need.
+
+For instance, in this exercise, our analysis will involve the columns `Package`, `Low Price`, `High Price` and `Date`. Let's select these columns.
+
+```{r select, message=F, warning=F}
+# Select desired columns
+pumpkins <- pumpkins %>%
+ select(Package, `Low Price`, `High Price`, Date)
+
+
+# Print data set
+pumpkins %>%
+ slice_head(n = 5)
+```
+
+#### dplyr::mutate()
+
+`mutate()` is a function in the package `dplyr` which helps you create or modify columns, while keeping the existing columns.
+
+The general structure of mutate is:
+
+`data %>% mutate(new_column_name = what_it_contains)`
+
+Let's take `mutate` out for a spin using the `Date` column by doing the following operations:
+
+1. Convert the dates (currently of type character) to a month format (these are US dates, so the format is `MM/DD/YYYY`).
+
+2. Extract the month from the dates to a new column.
+
+In R, the package [lubridate](https://lubridate.tidyverse.org/) makes it easier to work with Date-time data. So, let's use `dplyr::mutate()`, `lubridate::mdy()`, `lubridate::month()` and see how to achieve the above objectives. We can drop the Date column since we won't be needing it again in subsequent operations.
+
+```{r mut_date, message=F, warning=F}
+# Load lubridate
+library(lubridate)
+
+pumpkins <- pumpkins %>%
+ # Convert the Date column to a date object
+ mutate(Date = mdy(Date)) %>%
+ # Extract month from Date
+ mutate(Month = month(Date)) %>%
+ # Drop Date column
+ select(-Date)
+
+# View the first few rows
+pumpkins %>%
+ slice_head(n = 7)
+```
+
+Woohoo! 🤩
+
+Next, let's create a new column `Price`, which represents the average price of a pumpkin, by taking the average of the `Low Price` and `High Price` columns.
+
+```{r price, message=F, warning=F}
+# Create a new column Price
+pumpkins <- pumpkins %>%
+ mutate(Price = (`Low Price` + `High Price`)/2)
+
+# View the first few rows of the data
+pumpkins %>%
+ slice_head(n = 5)
+```
+
+Yeees!💪
+
+"But wait!", you'll say after skimming through the whole data set with `View(pumpkins)`, "There's something odd here!"🤔
+
+If you look at the `Package` column, pumpkins are sold in many different configurations. Some are sold in `1 1/9 bushel` measures, and some in `1/2 bushel` measures, some per pumpkin, some per pound, and some in big boxes with varying widths.
+
+Let's verify this:
+
+```{r Package, message=F, warning=F}
+# Verify the distinct observations in Package column
+pumpkins %>%
+ distinct(Package)
+
+```
+
+Amazing!👏
+
+Pumpkins seem to be very hard to weigh consistently, so let's filter them by selecting only pumpkins with the string *bushel* in the `Package` column and put this in a new data frame `new_pumpkins`.
+
+#### dplyr::filter() and stringr::str_detect()
+
+[`dplyr::filter()`](https://dplyr.tidyverse.org/reference/filter.html): creates a subset of the data only containing **rows** that satisfy your conditions, in this case, pumpkins with the string *bushel* in the `Package` column.
+
+[stringr::str_detect()](https://stringr.tidyverse.org/reference/str_detect.html): detects the presence or absence of a pattern in a string.
+
+The [`stringr`](https://github.com/tidyverse/stringr) package provides simple functions for common string operations.
+
+```{r filter, message=F, warning=F}
+# Retain only pumpkins with "bushel"
+new_pumpkins <- pumpkins %>%
+ filter(str_detect(Package, "bushel"))
+
+# Get the dimensions of the new data
+dim(new_pumpkins)
+
+# View a few rows of the new data
+new_pumpkins %>%
+ slice_head(n = 5)
+```
+
+You can see that we have narrowed down to 415 or so rows of data containing pumpkins by the bushel.🤩
+
+#### dplyr::case_when()
+
+**But wait! There's one more thing to do**
+
+Did you notice that the bushel amount varies per row? You need to normalize the pricing so that you show the pricing per bushel, not per 1 1/9 or 1/2 bushel. Time to do some math to standardize it.
+
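+For example, a quoted price of $15 for a `1 1/9` bushel package works out to 15 / (1 + 1/9) = $13.50 per bushel, while $15 for a `1/2` bushel package is 15 / 0.5 = $30 per bushel.
+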
+We'll use the function [`case_when()`](https://dplyr.tidyverse.org/reference/case_when.html) to *mutate* the Price column depending on some conditions. `case_when` allows you to vectorise multiple `if_else()` statements.
+
+```{r normalize_price, message=F, warning=F}
+# Convert the price if the Package contains fractional bushel values
+new_pumpkins <- new_pumpkins %>%
+ mutate(Price = case_when(
+ str_detect(Package, "1 1/9") ~ Price/(1 + 1/9),
+ str_detect(Package, "1/2") ~ Price/(1/2),
+ TRUE ~ Price))
+
+# View the first few rows of the data
+new_pumpkins %>%
+ slice_head(n = 30)
+```
+
+Now, we can analyze the pricing per unit based on their bushel measurement. All this study of bushels of pumpkins, however, goes to show how very **important** it is to **understand the nature of your data**!
+
+> ✅ According to [The Spruce Eats](https://www.thespruceeats.com/how-much-is-a-bushel-1389308), a bushel's weight depends on the type of produce, as it's a volume measurement. "A bushel of tomatoes, for example, is supposed to weigh 56 pounds... Leaves and greens take up more space with less weight, so a bushel of spinach is only 20 pounds." It's all pretty complicated! Let's not bother with making a bushel-to-pound conversion, and instead price by the bushel.
+>
+> ✅ Did you notice that pumpkins sold by the half-bushel are very expensive? Can you figure out why? Hint: little pumpkins are way pricier than big ones, probably because there are so many more of them per bushel, given the unused space taken by one big hollow pie pumpkin.
+
+Now lastly, for the sheer sake of adventure 💁♀️, let's also move the Month column to the first position, i.e. *before* the `Package` column.
+
+`dplyr::relocate()` is used to change column positions.
+
+```{r new_pumpkins, message=F, warning=F}
+# Create a new data frame new_pumpkins
+new_pumpkins <- new_pumpkins %>%
+ relocate(Month, .before = Package)
+
+new_pumpkins %>%
+ slice_head(n = 7)
+
+```
+
+Good job!👌 You now have a clean, tidy dataset on which you can build your new regression model!
+
+## 4. Data visualization with ggplot2
+
+{width="600"}
+
+There is a *wise* saying that goes like this:
+
+> "The simple graph has brought more information to the data analyst's mind than any other device." --- John Tukey
+
+Part of the data scientist's role is to demonstrate the quality and nature of the data they are working with. To do this, they often create interesting visualizations, or plots, graphs, and charts, showing different aspects of data. In this way, they are able to visually show relationships and gaps that are otherwise hard to uncover.
+
+Visualizations can also help determine the machine learning technique most appropriate for the data. A scatterplot that seems to follow a line, for example, indicates that the data is a good candidate for a linear regression exercise.
+
+R offers several systems for making graphs, but [`ggplot2`](https://ggplot2.tidyverse.org/index.html) is one of the most elegant and most versatile. `ggplot2` allows you to compose graphs by **combining independent components**.
+
+Let's start with a simple scatter plot for the Price and Month columns.
+
+So in this case, we'll start with [`ggplot()`](https://ggplot2.tidyverse.org/reference/ggplot.html), supply a dataset and aesthetic mapping (with [`aes()`](https://ggplot2.tidyverse.org/reference/aes.html)), then add layers (like [`geom_point()`](https://ggplot2.tidyverse.org/reference/geom_point.html)) for scatter plots.
+
+```{r scatter_plt, message=F, warning=F}
+# Set a theme for the plots
+theme_set(theme_light())
+
+# Create a scatter plot
+p <- ggplot(data = new_pumpkins, aes(x = Price, y = Month))
+p + geom_point()
+```
+
+Is this a useful plot 🤷? Does anything about it surprise you?
+
+It's not particularly useful, as all it does is display your data as a spread of points in a given month.
+
+### **How do we make it useful?**
+
+To get charts to display useful data, you usually need to group the data somehow. For instance, in our case, finding the average price of pumpkins for each month would provide more insight into the underlying patterns in our data. This leads us to one more **dplyr** flyby:
+
+#### `dplyr::group_by() %>% summarize()`
+
+Grouped aggregation in R can be easily computed using
+
+`dplyr::group_by() %>% summarize()`
+
+- `dplyr::group_by()` changes the unit of analysis from the complete dataset to individual groups such as per month.
+
+- `dplyr::summarize()` creates a new data frame with one column for each grouping variable and one column for each of the summary statistics that you have specified.
+
+For example, we can use the `dplyr::group_by() %>% summarize()` to group the pumpkins into groups based on the **Month** columns and then find the **mean price** for each month.
+
+```{r grp_sumry, message=F, warning=F}
+# Find the average price of pumpkins per month
+new_pumpkins %>%
+ group_by(Month) %>%
+ summarise(mean_price = mean(Price))
+```
+
+Succinct!✨
+
+Categorical features such as months are better represented using a bar plot 📊. The layers responsible for bar charts are `geom_bar()` and `geom_col()`. Consult `?geom_bar` to find out more.
+
+Let's whip up one!
+
+```{r bar_plt, message=F, warning=F}
+# Find the average price of pumpkins per month then plot a bar chart
+new_pumpkins %>%
+ group_by(Month) %>%
+ summarise(mean_price = mean(Price)) %>%
+ ggplot(aes(x = Month, y = mean_price)) +
+ geom_col(fill = "midnightblue", alpha = 0.7) +
+ ylab("Pumpkin Price")
+```
+
+🤩🤩This is a more useful data visualization! It seems to indicate that the highest price for pumpkins occurs in September and October. Does that meet your expectation? Why or why not?
+
+Congratulations on finishing the second lesson 👏! You prepared your data for model building, then uncovered more insights using visualizations!
diff --git a/2-Regression/2-Data/translations/README.it.md b/2-Regression/2-Data/translations/README.it.md
new file mode 100644
index 00000000..7b78ac52
--- /dev/null
+++ b/2-Regression/2-Data/translations/README.it.md
@@ -0,0 +1,201 @@
+# Build a regression model using Scikit-learn: prepare and visualize data
+
+> 
+> Infographic by [Dasani Madipalli](https://twitter.com/dasani_decoded)
+
+## [Pre-lecture quiz](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/11/)
+
+## Introduction
+
+Now that you are set up with the tools you need to start tackling machine learning model building with Scikit-learn, you are ready to start asking questions of your data. As you work with data and apply ML solutions, it's very important to understand how to ask the right question to properly unlock the potential of your dataset.
+
+In this lesson, you will learn:
+
+- How to prepare your data for model building.
+- How to use Matplotlib for data visualization.
+
+## Asking the right question of your data
+
+The question you need answered will determine what type of ML algorithms you will leverage. And the quality of the answer you get back will be heavily dependent on the nature of your data.
+
+Take a look at the [data](../../data/US-pumpkins.csv) provided for this lesson. You can open this .csv file in VS Code. A quick skim immediately shows that there are blanks and a mix of strings and numeric data. There's also a strange column called "Package" where the data is a mix between "sacks", "bins" and other values. The data, in fact, is a bit of a mess.
+
+In fact, it is not very common to be gifted a dataset that is completely ready to use to create an ML model out of the box. In this lesson, you will learn how to prepare a raw dataset using standard Python libraries. You will also learn various techniques to visualize the data.
+
+## Case study: 'the pumpkin market'
+
+In this folder you will find a .csv file in the root `data` folder called [US-pumpkins.csv](../../data/US-pumpkins.csv) which includes 1757 lines of data about the market for pumpkins, sorted into groupings by city. This is raw data extracted from the [Specialty Crops Terminal Markets Standard Reports](https://www.marketnews.usda.gov/mnp/fv-report-config-step1?type=termPrice) distributed by the United States Department of Agriculture.
+
+### Preparing data
+
+This data is in the public domain. It can be downloaded in many separate files, per city, from the USDA web site. To avoid too many separate files, we have concatenated all the city data into one spreadsheet, so the data has already been _prepared_ a bit. Next, let's take a closer look at the data.
+
+### The pumpkin data - early conclusions
+
+What do you notice about this data? You already saw that there is a mix of strings, numbers, blanks and strange values that you need to make sense of.
+
+What question can you ask of this data, using a Regression technique? What about "Predict the price of a pumpkin for sale during a given month"? Looking again at the data, there are some changes you need to make to create the data structure necessary for the task.
+
+## Exercise - analyze the pumpkin data
+
+Let's use [Pandas](https://pandas.pydata.org/) (the name stands for `Python Data Analysis`), a tool very useful for shaping data, to analyze and prepare this pumpkin data.
+
+### First, check for missing dates
+
+You will first need to take steps to check for missing dates:
+
+1. Convert the dates to a month format (these are US dates, so the format is `MM/DD/YYYY`).
+2. Extract the month to a new column.
+
+Open the _notebook.ipynb_ file in Visual Studio Code and import the spreadsheet into a new Pandas dataframe.
+
+1. Use the `head()` function to view the first five rows.
+
+    ```python
+    import pandas as pd
+    pumpkins = pd.read_csv('../data/US-pumpkins.csv')
+    pumpkins.head()
+    ```
+
+    ✅ What function would you use to view the last five rows?
+
+1. Check if there is missing data in the current dataframe:
+
+    ```python
+    pumpkins.isnull().sum()
+    ```
+
+    There is missing data, but maybe it won't matter for the task at hand.
+
+1. To make your dataframe easier to work with, drop several of its columns, using `drop()`, keeping only the columns you need:
+
+    ```python
+    new_columns = ['Package', 'Month', 'Low Price', 'High Price', 'Date']
+    pumpkins = pumpkins.drop([c for c in pumpkins.columns if c not in new_columns], axis=1)
+    ```
+
+### Second, determine average price of pumpkin
+
+Think about how to determine the average price of a pumpkin in a given month. What columns would you pick for this task? Hint: you'll need 3 columns.
+
+Solution: take the average of the `Low Price` and `High Price` columns to populate the new Price column, and convert the Date column to only show the month. Fortunately, according to the check above, there is no missing data for dates or prices.
+
+1. To calculate the average, add the following code:
+
+    ```python
+    price = (pumpkins['Low Price'] + pumpkins['High Price']) / 2
+
+    month = pd.DatetimeIndex(pumpkins['Date']).month
+
+    ```
+
+    ✅ Feel free to print any data you'd like to check using `print(month)`.
+
+2. Now, copy your converted data into a fresh Pandas dataframe:
+
+    ```python
+    new_pumpkins = pd.DataFrame({'Month': month, 'Package': pumpkins['Package'], 'Low Price': pumpkins['Low Price'],'High Price': pumpkins['High Price'], 'Price': price})
+    ```
+
+    Printing out your dataframe will show you a clean, tidy dataset on which you can build your new regression model.
+
+### But wait! There's something odd here.
+
+If you look at the `Package` column, pumpkins are sold in many different configurations. Some are sold in '1 1/9 bushel' measures, some in '1/2 bushel' measures, some per pumpkin, some per pound, and some in big boxes with varying widths.
+
+> Pumpkins seem very hard to weigh consistently
+
+Digging into the original data, it's interesting that anything with `Unit of Sale` equalling 'EACH' or 'PER BIN' also has the `Package` type per inch, per bin, or 'each'. Pumpkins seem to be very hard to weigh consistently, so let's filter them by selecting only pumpkins with the string 'bushel' in their `Package` column.
+
+1. Add a filter at the top of the file, under the initial .csv import:
+
+    ```python
+    pumpkins = pumpkins[pumpkins['Package'].str.contains('bushel', case=True, regex=True)]
+    ```
+
+    If you print the data now, you can see that you are only getting the 415 or so rows of data containing pumpkins by the bushel.
+
+### But wait! There's one more thing to do.
+
+Did you notice that the bushel amount varies per row? You need to normalize the pricing so that you show the pricing per bushel, so do some math to standardize it.
+
+1. Add these lines after the block creating the new_pumpkins dataframe:
+
+    ```python
+    new_pumpkins.loc[new_pumpkins['Package'].str.contains('1 1/9'), 'Price'] = price/(1 + 1/9)
+
+    new_pumpkins.loc[new_pumpkins['Package'].str.contains('1/2'), 'Price'] = price/(1/2)
+    ```
+
+✅ According to [The Spruce Eats](https://www.thespruceeats.com/how-much-is-a-bushel-1389308), a bushel's weight depends on the type of produce, as it's a volume measurement. "A bushel of tomatoes, for example, is supposed to weigh 56 pounds... Leaves and greens take up more space with less weight, so a bushel of spinach is only 20 pounds." It's all pretty complicated! Let's not bother with making a bushel-to-pound conversion, and instead price by the bushel. All this study of bushels of pumpkins, however, goes to show how very important it is to understand the nature of your data!
+
+Now, you can analyze the pricing per unit based on their bushel measurement. If you print out the data one more time, you can see how it's standardized.
+
+✅ Did you notice that pumpkins sold by the half-bushel are very expensive? Can you figure out why? Hint: little pumpkins are way pricier than big ones, probably because there are so many more of them per bushel, given the unused space taken by one big hollow pie pumpkin.
+
+## Visualization Strategies
+
+Part of the data scientist's role is to demonstrate the quality and nature of the data they are working with. To do this, they often create interesting visualizations, or plots, graphs, and charts, showing different aspects of data. In this way, they are able to visually show relationships and gaps that are otherwise hard to uncover.
+
+Visualizations can also help determine the machine learning technique most appropriate for the data. A scatterplot that seems to follow a line, for example, indicates that the data is a good candidate for a linear regression exercise.
+
+One data visualization library that works well in Jupyter notebooks is [Matplotlib](https://matplotlib.org/) (which you also saw in the previous lesson).
+
+> Get more experience with data visualization in [these tutorials](https://docs.microsoft.com/learn/modules/explore-analyze-data-with-python?WT.mc_id=academic-15963-cxa).
+
+## Exercise - experiment with Matplotlib
+
+Try to create some basic plots to display the new dataframe you just created. What would a basic line plot show?
+
+1. Import Matplotlib at the top of the file, under the Pandas import:
+
+    ```python
+    import matplotlib.pyplot as plt
+    ```
+
+1. Rerun the entire notebook to refresh.
+1. At the bottom of the notebook, add a cell to plot the data as a scatterplot:
+
+    ```python
+    price = new_pumpkins.Price
+    month = new_pumpkins.Month
+    plt.scatter(price, month)
+    plt.show()
+    ```
+
+
+
+    Is this a useful plot? Does anything about it surprise you?
+
+    It's not particularly useful, as all it does is display your data as a spread of points in a given month.
+
+### Make it useful
+
+To get charts to display useful data, you usually need to group the data somehow. Let's try creating a plot showing the distribution of the data, where the x axis shows the months.
+
+1. Add a cell to create a grouped bar chart:
+
+    ```python
+    new_pumpkins.groupby(['Month'])['Price'].mean().plot(kind='bar')
+    plt.ylabel("Pumpkin Price")
+    ```
+
+
+
+    This is a more useful data visualization! It seems to indicate that the highest price for pumpkins occurs in September and October. Does that meet your expectation? Why or why not?
+
+---
+
+## 🚀 Challenge
+
+Explore the different types of visualization that Matplotlib offers. Which types are most appropriate for regression problems?
+
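+As one possible starting point (a sketch; `new_pumpkins` and its `Price` column come from the exercise above), a histogram shows how the per-bushel prices are distributed:
+
+```python
+import matplotlib.pyplot as plt
+
+# Distribution of the normalized pumpkin prices
+plt.hist(new_pumpkins['Price'], bins=20)
+plt.xlabel('Price per bushel')
+plt.ylabel('Count')
+plt.show()
+```
+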
+## [Post-lecture quiz](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/12/)
+
+## Review & Self Study
+
+Take a look at the many ways to visualize data. Make a list of the various libraries available and note which are best for given types of tasks, for example 2D visualizations vs. 3D visualizations. What do you discover?
+
+## Assignment
+
+[Exploring visualization](assignment.it.md)
diff --git a/2-Regression/2-Data/translations/README.ja.md b/2-Regression/2-Data/translations/README.ja.md
new file mode 100644
index 00000000..8a9f4d55
--- /dev/null
+++ b/2-Regression/2-Data/translations/README.ja.md
@@ -0,0 +1,206 @@
+# Build a regression model using Scikit-learn: prepare and visualize data
+
+> ![Data preparation and visualization infographic](../images/data-visualization.png)
+>
+> Infographic by [Dasani Madipalli](https://twitter.com/dasani_decoded)
+
+## [Pre-lecture quiz](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/11?loc=ja)
+
+## Introduction
+
+Now that you have the tools you need to start tackling machine learning model-building with Scikit-learn, you are ready to start asking questions of your data. As you work with data and apply ML solutions, it's very important to understand how to ask the right question to properly unlock the potential of your dataset.
+
+In this lesson, you will learn:
+
+- How to prepare your data for model-building.
+- How to use Matplotlib for data visualization.
+
+## Asking the right question of your data
+
+The question you need answered will determine what type of ML algorithms you will leverage. And the quality of the answer you get back will be heavily dependent on the nature of your data.
+
+Take a look at the [data](../../data/US-pumpkins.csv) provided for this lesson. You can open this .csv file in VS Code. A quick skim immediately shows that there are blanks, and a mix of strings and numeric data. There's also a strange column called "Package" where the data is a mix of values in different units such as "sacks" and "bins". The data, in fact, is a bit of a mess.
+
+In fact, it is not very common to be gifted a dataset that is completely ready to use to create an ML model out of the box. In this lesson, you will learn how to prepare a raw dataset using standard Python libraries. You will also learn various techniques to visualize the data.
+
+## Case study: "the pumpkin market"
+
+In the `data` folder at the root you will find a .csv file called [US-pumpkins.csv](../../data/US-pumpkins.csv) which includes 1757 lines of data about the market for pumpkins, grouped by city. This is raw data extracted from the [Specialty Crops Terminal Markets Standard Reports](https://www.marketnews.usda.gov/mnp/fv-report-config-step1?type=termPrice) distributed by the United States Department of Agriculture.
+
+### Preparing the data
+
+This data is in the public domain. It can be downloaded from the USDA website in many separate files, per city. To avoid too many separate files, we have concatenated all the city data into one spreadsheet. Next, let's take a closer look at the data.
+
+### The pumpkin data - early conclusions
+
+What do you notice about this data? You probably noticed that there is a mix of strings, numbers, blanks and strange values that you need to make sense of.
+
+What question can you ask of this data using a regression technique? What about "Predict the price of a pumpkin for sale during a given month"? Looking again at the data, there are some changes you need to make to create the data structure necessary for the task.
+
+## Exercise - analyze the pumpkin data
+
+Let's use [Pandas](https://pandas.pydata.org/) (the name stands for "Python Data Analysis"), a tool that is very useful for shaping data, to analyze and prepare this pumpkin data.
+
+### First, check for missing dates
+
+There are a few steps to check for missing dates:
+
+1. Convert the dates to a month format (these are US dates, so the format is `MM/DD/YYYY`).
+2. Extract the month to a new column.
+
+Open the _notebook.ipynb_ file in Visual Studio Code and import the spreadsheet into a new Pandas dataframe.
+
+1. Use the `head()` function to view the first five rows.
+
+ ```python
+ import pandas as pd
+ pumpkins = pd.read_csv('../data/US-pumpkins.csv')
+ pumpkins.head()
+ ```
+
+    ✅ What function would you use to view the last five rows?
+
+
+2. Check whether there is any missing data in the current dataframe:
+
+ ```python
+ pumpkins.isnull().sum()
+ ```
+
+    There is missing data, but maybe it won't matter for the task at hand.
+
+
+3. To make your dataframe easier to work with, use the `drop()` function to drop several of its columns, keeping only the columns you need:
+
+ ```python
+ new_columns = ['Package', 'Month', 'Low Price', 'High Price', 'Date']
+ pumpkins = pumpkins.drop([c for c in pumpkins.columns if c not in new_columns], axis=1)
+ ```
+
+### Second, determine the average price of pumpkins
+
+Think about how to determine the average price of a pumpkin in a given month. What columns would you pick for this task? Hint: you'll need 3 columns.
+
+Solution: take the average of the `Low Price` and `High Price` columns to populate the new Price column, and convert the Date column to only show the month. Fortunately, according to the check above, there is no missing data for dates or prices.
+
+1. To calculate the average, add the following code:
+
+ ```python
+ price = (pumpkins['Low Price'] + pumpkins['High Price']) / 2
+
+ month = pd.DatetimeIndex(pumpkins['Date']).month
+
+ ```
+
+    ✅ Feel free to print any data you'd like to check using `print(month)`.
+
+
+2. Now, copy your converted data into a fresh Pandas dataframe:
+
+ ```python
+ new_pumpkins = pd.DataFrame({'Month': month, 'Package': pumpkins['Package'], 'Low Price': pumpkins['Low Price'],'High Price': pumpkins['High Price'], 'Price': price})
+ ```
+
+    Printing out your dataframe will show you a clean, tidy dataset on which you can build your new regression model.
+
+### But wait! There's something odd here
+
+If you look at the `Package` column, pumpkins are sold in many different configurations. Some are sold in "1 1/9 bushel" measures, some in "1/2 bushel" measures, some per pumpkin, some per pound, and some in big boxes with varying widths.
+
+
+> Pumpkins seem very hard to weigh consistently
+
+Digging into the original data, it's interesting that anything with `Unit of Sale` equalling "EACH" or "PER BIN" also has the `Package` type per inch, per bin, or "each". Pumpkins seem to be very hard to weigh consistently, so let's filter them by selecting only pumpkins with the string "bushel" in their `Package` column.
+
+1. Add a filter at the top of the file:
+
+ ```python
+ pumpkins = pumpkins[pumpkins['Package'].str.contains('bushel', case=True, regex=True)]
+ ```
+
+    If you print the data now, you can see that you are only getting the 415 or so rows of data containing pumpkins by the bushel.
+
+### But wait! There's one more thing to do
+
+Did you notice that the bushel amount varies per row? You need to normalize the pricing so that you show the pricing per bushel, so do some math to standardize it. A quick sanity check on that arithmetic follows below.
+
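+With hypothetical prices (not values from the dataset): a 1 1/9-bushel carton priced at $10 works out to 10 ÷ (1 + 1/9) = $9.00 per bushel, while a 1/2-bushel basket at $10 works out to 10 ÷ 0.5 = $20.00 per bushel:
+
+```python
+# Hypothetical $10 price, just to sanity-check the per-bushel math
+carton = 10 / (1 + 1/9)  # 1 1/9-bushel carton -> 9.0 per bushel
+basket = 10 / (1/2)      # 1/2-bushel basket  -> 20.0 per bushel
+print(carton, basket)
+```
+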
+1. Add these lines after the block creating the new_pumpkins dataframe:
+
+ ```python
+ new_pumpkins.loc[new_pumpkins['Package'].str.contains('1 1/9'), 'Price'] = price/(1 + 1/9)
+
+ new_pumpkins.loc[new_pumpkins['Package'].str.contains('1/2'), 'Price'] = price/(1/2)
+ ```
+
+✅ According to [The Spruce Eats](https://www.thespruceeats.com/how-much-is-a-bushel-1389308), a bushel's weight depends on the type of produce, as it's a volume measurement. "A bushel of tomatoes, for example, is supposed to weigh 56 pounds... Leaves and greens take up more space with less weight, so a bushel of spinach is only 20 pounds." It's all pretty complicated! Let's not bother with making a bushel-to-pound conversion, and instead price by the bushel. All this study of bushels of pumpkins, however, goes to show how very important it is to understand the nature of your data!
+
+Now, you can analyze the pricing per unit based on their bushel measurement. If you print out the data one more time, you can see how it's standardized.
+
+✅ Did you notice that pumpkins sold by the half-bushel are very expensive? Can you figure out why? Little pumpkins are way pricier than big ones, probably because a big hollow pumpkin contains a lot of unused space per unit of volume.
+
+## Visualization strategies
+
+Part of the data scientist's role is to demonstrate the quality and nature of the data they are working with. To do this, they often create interesting visualizations (plots, graphs, and charts) showing different aspects of the data. In this way, they are able to visually show relationships and gaps that are otherwise hard to uncover.
+
+Visualizations can also help determine the machine learning technique most appropriate for the data. A scatterplot that seems to follow a line, for example, indicates that linear regression is a good candidate technique to apply.
+
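+One quick way to back up what a scatterplot suggests is to compute the correlation coefficient. A rough sketch, assuming the `new_pumpkins` dataframe built above is in scope:
+
+```python
+# Values near -1 or 1 suggest a linear relationship; near 0, not so much
+print(new_pumpkins['Month'].corr(new_pumpkins['Price']))
+```
+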
+One data visualization library that works well in Jupyter notebooks is [Matplotlib](https://matplotlib.org/) (which you also saw in the previous lesson).
+
+> Get more experience with data visualization in [these tutorials](https://docs.microsoft.com/learn/modules/explore-analyze-data-with-python?WT.mc_id=academic-15963-cxa).
+
+## Exercise - experiment with Matplotlib
+
+Try to create some basic plots to display the new dataframe you just created. What would a basic line plot show?
+
+1. Import Matplotlib at the top of the file, under the Pandas import:
+
+ ```python
+ import matplotlib.pyplot as plt
+ ```
+
+1. Rerun the entire notebook to refresh.
+2. At the bottom of the notebook, add a cell to plot the data:
+
+ ```python
+ price = new_pumpkins.Price
+ month = new_pumpkins.Month
+ plt.scatter(price, month)
+ plt.show()
+ ```
+
+    ![A scatterplot showing price to month relationship](../images/scatterplot.png)
+
+    Is this a useful plot? Does anything about it surprise you?
+
+    It's not particularly useful, as all it does is display your data as a spread of points in a given month.
+
+### Make it useful
+
+To get charts to display useful data, you usually need to group the data somehow. Let's try creating a plot where the x axis shows the months and the data demonstrates the distribution of data.
+
+1. Add a cell to create a grouped bar chart:
+
+ ```python
+ new_pumpkins.groupby(['Month'])['Price'].mean().plot(kind='bar')
+ plt.ylabel("Pumpkin Price")
+ ```
+
+    ![A bar chart showing price to month relationship](../images/barchart.png)
+
+    This is a more useful data visualization! It seems to indicate that the highest price for pumpkins occurs in September and October. Does that meet your expectations? In what ways does it, and in what ways doesn't it?
+
+---
+
+## 🚀Challenge
+
+Explore the different types of visualization that Matplotlib offers. Which types are most appropriate for regression problems?
+
+## [Post-lecture quiz](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/12?loc=ja)
+
+## Review & Self Study
+
+Take a look at the many ways to visualize data. Make a list of the various libraries available and note which are best for given types of tasks, for example 2D visualizations vs. 3D visualizations. What do you discover?
+
+## Assignment
+
+[Exploring visualization](./assignment.ja.md)
diff --git a/2-Regression/2-Data/translations/README.zh-cn.md b/2-Regression/2-Data/translations/README.zh-cn.md
new file mode 100644
index 00000000..bc273ab1
--- /dev/null
+++ b/2-Regression/2-Data/translations/README.zh-cn.md
@@ -0,0 +1,202 @@
+# Build a regression model using Scikit-learn: prepare and visualize data
+
+> ![Data preparation and visualization infographic](../images/data-visualization.png)
+>
+> Infographic by [Dasani Madipalli](https://twitter.com/dasani_decoded)
+
+## [Pre-lecture quiz](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/11/)
+
+## Introduction
+
+Now that you have the tools you need to start tackling machine learning model-building with Scikit-learn, you are ready to start asking questions of your data. As you work with data and apply ML solutions, it's very important to understand how to ask the right question to properly unlock the potential of your dataset.
+
+In this lesson, you will learn:
+
+- How to prepare your data for model-building.
+- How to use Matplotlib for data visualization.
+
+## Asking the right question of your data
+
+The question you need answered will determine what type of ML algorithms you will leverage. And the quality of the answer you get back will be heavily dependent on the nature of your data.
+
+Take a look at the [data](../data/US-pumpkins.csv) provided for this lesson. You can open this .csv file in VS Code. A quick skim immediately shows that there are blanks, and a mix of strings and numeric data. There's also a strange column called "Package" where the data is a mix of "sacks", "bins" and other values. The data, in fact, is a bit of a mess.
+
+In fact, it is not very common to be gifted a dataset that is completely ready to use to create an ML model out of the box. In this lesson, you will learn how to prepare a raw dataset using standard Python libraries. You will also learn various techniques to visualize the data.
+
+## Case study: "the pumpkin market"
+
+In the `data` folder you will find a .csv file called [US-pumpkins.csv](../data/US-pumpkins.csv) which includes 1757 lines of data about the market for pumpkins, sorted into groupings by city. This is raw data extracted from the [Specialty Crops Terminal Markets Standard Reports](https://www.marketnews.usda.gov/mnp/fv-report-config-step1?type=termPrice) distributed by the United States Department of Agriculture.
+
+### Preparing the data
+
+This data is in the public domain. It can be downloaded from the USDA website in many separate files, per city. To avoid too many separate files, we have concatenated all the city data into one spreadsheet, so the data has already been prepared a bit. Next, let's take a closer look at the data.
+
+### The pumpkin data - early conclusions
+
+What do you notice about this data? You already saw that there is a mix of strings, numbers, blanks and strange values that you need to make sense of.
+
+What question can you ask of this data using a regression technique? What about "Predict the price of a pumpkin for sale during a given month"? Looking again at the data, there are some changes you need to make to create the data structure necessary for the task.
+
+## Exercise - analyze the pumpkin data
+
+Let's use [Pandas](https://pandas.pydata.org/) (the name stands for "Python Data Analysis"), a very useful tool, to analyze and prepare this pumpkin data.
+
+### First, check for missing dates
+
+You will first need to take steps to check for missing dates:
+
+1. Convert the dates to a month format (these are US dates, so the format is `MM/DD/YYYY`).
+
+2. Extract the month to a new column.
+
+Open the _notebook.ipynb_ file in Visual Studio Code and import the spreadsheet into a new Pandas dataframe.
+
+1. Use the `head()` function to view the first five rows.
+
+ ```python
+ import pandas as pd
+ pumpkins = pd.read_csv('../../data/US-pumpkins.csv')
+ pumpkins.head()
+ ```
+
+    ✅ What function would you use to view the last five rows?
+
+2. Check whether there is any missing data in the current dataframe:
+
+ ```python
+ pumpkins.isnull().sum()
+ ```
+
+    There is missing data, but maybe it won't matter for the task at hand.
+
+3. To make your dataframe easier to work with, use `drop()` to drop several of its columns, keeping only the columns you need:
+
+ ```python
+ new_columns = ['Package', 'Month', 'Low Price', 'High Price', 'Date']
+ pumpkins = pumpkins.drop([c for c in pumpkins.columns if c not in new_columns], axis=1)
+ ```
+
+### Second, determine the average price of pumpkins
+
+Think about how to determine the average price of a pumpkin in a given month. What columns would you pick for this task? Hint: you'll need 3 columns.
+
+Solution: take the average of the `Low Price` and `High Price` columns to populate the new Price column, and convert the Date column to only show the month. Fortunately, according to the check above, there is no missing data for dates or prices. The `pd.DatetimeIndex` trick used below is illustrated right after this paragraph.
+
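+If `pd.DatetimeIndex` is new to you, here is a tiny, self-contained illustration (with made-up dates) of how it pulls the month number out of US-style date strings:
+
+```python
+import pandas as pd
+
+dates = pd.Series(['4/29/2017', '5/6/2017', '9/24/2016'])
+print(pd.DatetimeIndex(dates).month)  # [4, 5, 9]
+```
+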
+1. To calculate the average, add the following code:
+
+ ```python
+ price = (pumpkins['Low Price'] + pumpkins['High Price']) / 2
+
+ month = pd.DatetimeIndex(pumpkins['Date']).month
+
+ ```
+
+    ✅ Feel free to print any data you'd like to check using `print(month)`.
+
+2. Now, copy your converted data into a fresh Pandas dataframe:
+
+ ```python
+ new_pumpkins = pd.DataFrame({'Month': month, 'Package': pumpkins['Package'], 'Low Price': pumpkins['Low Price'],'High Price': pumpkins['High Price'], 'Price': price})
+ ```
+
+    Printing out your dataframe will show you a clean, tidy dataset on which you can build your new regression model.
+
+### But wait! There's something odd here
+
+If you look at the `Package` column, pumpkins are sold in many different configurations. Some are sold in "1 1/9 bushel" measures, some in "1/2 bushel" measures, some per pumpkin, some per pound, and some in big boxes with varying widths.
+
+> Pumpkins seem very hard to weigh consistently
+
+Digging into the original data, it's interesting that anything with `Unit of Sale` equalling "EACH" or "PER BIN" also has the `Package` type per inch, per bin, or "each". Pumpkins seem to be very hard to weigh consistently, so let's filter them by selecting only pumpkins with the string "bushel" in their `Package` column.
+
+1. Add a filter at the top of the file, under the initial .csv import:
+
+ ```python
+ pumpkins = pumpkins[pumpkins['Package'].str.contains('bushel', case=True, regex=True)]
+ ```
+
+    If you print the data now, you can see that you are only getting the 415 or so rows of data containing pumpkins by the bushel.
+
+### But wait! There's one more thing to do
+
+Did you notice that the bushel amount varies per row? You need to normalize the pricing so that you show the pricing per bushel, so do some math to standardize it.
+
+1. Add these lines after the block creating the new_pumpkins dataframe:
+
+ ```python
+ new_pumpkins.loc[new_pumpkins['Package'].str.contains('1 1/9'), 'Price'] = price/(1 + 1/9)
+
+ new_pumpkins.loc[new_pumpkins['Package'].str.contains('1/2'), 'Price'] = price/(1/2)
+ ```
+
+✅ According to [The Spruce Eats](https://www.thespruceeats.com/how-much-is-a-bushel-1389308), a bushel's weight depends on the type of produce, as it's a volume measurement. "A bushel of tomatoes, for example, is supposed to weigh 56 pounds... Leaves and greens take up more space with less weight, so a bushel of spinach is only 20 pounds." It's all pretty complicated! Let's not bother with making a bushel-to-pound conversion, and instead price by the bushel. All this study of bushels of pumpkins, however, goes to show how very important it is to understand the nature of your data!
+
+Now, you can analyze the pricing per unit based on their bushel measurement. If you print out the data one more time, you can see how it's standardized.
+
+✅ Did you notice that pumpkins sold by the half-bushel are very expensive? Can you figure out why? Hint: little pumpkins are way pricier than big ones, probably because there are so many more of them per bushel, given the unused space taken by one big hollow pie pumpkin.
+
+## Visualization strategies
+
+Part of the data scientist's role is to demonstrate the quality and nature of the data they are working with. To do this, they often create interesting visualizations (plots, graphs, and charts) showing different aspects of the data. In this way, they are able to visually show relationships and gaps that are otherwise hard to uncover.
+
+Visualizations can also help determine the machine learning technique most appropriate for the data. A scatterplot that seems to follow a line, for example, indicates that the data is a good candidate for a linear regression exercise.
+
+One data visualization library that works well in Jupyter notebooks is [Matplotlib](https://matplotlib.org/) (which you also saw in the previous lesson).
+
+> Get more experience with data visualization in [these tutorials](https://docs.microsoft.com/learn/modules/explore-analyze-data-with-python?WT.mc_id=academic-15963-cxa).
+
+## Exercise - experiment with Matplotlib
+
+Try to create some basic plots to display the new dataframe you just created. What would a basic line plot show?
+
+1. Import Matplotlib at the top of the file:
+
+ ```python
+ import matplotlib.pyplot as plt
+ ```
+
+2. Rerun the entire notebook to refresh.
+
+3. At the bottom of the notebook, add a cell to plot the data:
+
+ ```python
+ price = new_pumpkins.Price
+ month = new_pumpkins.Month
+ plt.scatter(price, month)
+ plt.show()
+ ```
+
+    ![A scatterplot showing price to month relationship](../images/scatterplot.png)
+
+    Is this a useful plot? Does anything about it surprise you?
+
+    It's not particularly useful, as all it does is display your data as a spread of points in a given month.
+
+### Make it useful
+
+To get charts to display useful data, you usually need to group the data somehow. Let's try creating a plot where the x axis shows the months and the data demonstrates the distribution of data.
+
+1. Add a cell to create a grouped bar chart:
+
+ ```python
+ new_pumpkins.groupby(['Month'])['Price'].mean().plot(kind='bar')
+ plt.ylabel("Pumpkin Price")
+ ```
+
+    ![A bar chart showing price to month relationship](../images/barchart.png)
+
+    This is a more useful data visualization! It seems to indicate that the highest price for pumpkins occurs in September and October. Does that meet your expectation? Why or why not?
+
+---
+
+## 🚀Challenge
+
+Explore the different types of visualization that Matplotlib offers. Which types are most appropriate for regression problems?
+
+## [Post-lecture quiz](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/12/)
+
+## Review & Self Study
+
+Take a look at the many ways to visualize data. Make a list of the various libraries available and note which are best for given types of tasks, for example 2D visualizations vs. 3D visualizations. What do you discover? A starter sketch for that comparison follows below.
+
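+Here is a minimal seaborn sketch of the same month-vs-price view, assuming seaborn is installed and the `new_pumpkins` dataframe from this lesson is in scope:
+
+```python
+import seaborn as sns
+
+# Bar heights default to the mean of Price per Month
+sns.barplot(x='Month', y='Price', data=new_pumpkins)
+```
+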
+## Assignment
+
+[Exploring visualization](../assignment.md)
diff --git a/2-Regression/2-Data/translations/assignment.it.md b/2-Regression/2-Data/translations/assignment.it.md
new file mode 100644
index 00000000..14527fca
--- /dev/null
+++ b/2-Regression/2-Data/translations/assignment.it.md
@@ -0,0 +1,9 @@
+# Exploring visualizations
+
+There are several different libraries available for data visualization. Create some visualizations using the pumpkin data from this lesson with matplotlib and seaborn in a sample notebook. Which libraries are easier to work with?
+
+## Rubric
+
+| Criteria | Exemplary | Adequate | Needs Improvement |
+| -------- | --------- | -------- | ----------------- |
+| | A notebook is submitted with two explorations/visualizations | A notebook is submitted with one exploration/visualization | A notebook is not submitted |
diff --git a/2-Regression/2-Data/translations/assignment.ja.md b/2-Regression/2-Data/translations/assignment.ja.md
new file mode 100644
index 00000000..09f344d6
--- /dev/null
+++ b/2-Regression/2-Data/translations/assignment.ja.md
@@ -0,0 +1,9 @@
+# Exploring visualizations
+
+There are several different libraries available for data visualization. Create some visualizations using the pumpkin data from this lesson with matplotlib and seaborn in a sample notebook. Which libraries are easier to work with?
+
+## Rubric
+
+| Criteria | Exemplary | Adequate | Needs Improvement |
+| -------- | --------- | -------- | ----------------- |
+| | A notebook is submitted with two explorations/visualizations | A notebook is submitted with one exploration/visualization | A notebook is not submitted |
diff --git a/2-Regression/2-Data/translations/assignment.zh-cn.md b/2-Regression/2-Data/translations/assignment.zh-cn.md
new file mode 100644
index 00000000..e9c0f1c2
--- /dev/null
+++ b/2-Regression/2-Data/translations/assignment.zh-cn.md
@@ -0,0 +1,9 @@
+# Exploring visualizations
+
+There are several different libraries available for data visualization. Create some visualizations using the pumpkin data from this lesson with matplotlib and seaborn in a sample notebook. Which libraries are easier to work with?
+
+## Rubric
+
+| Criteria | Exemplary | Adequate | Needs Improvement |
+| -------- | --------- | -------- | ----------------- |
+| | A notebook is submitted with two explorations/visualizations | A notebook is submitted with one exploration/visualization | A notebook is not submitted |
diff --git a/2-Regression/3-Linear/images/janitor.jpg b/2-Regression/3-Linear/images/janitor.jpg
new file mode 100644
index 00000000..93e6f011
Binary files /dev/null and b/2-Regression/3-Linear/images/janitor.jpg differ
diff --git a/2-Regression/3-Linear/images/recipes.png b/2-Regression/3-Linear/images/recipes.png
new file mode 100644
index 00000000..7fd24b06
Binary files /dev/null and b/2-Regression/3-Linear/images/recipes.png differ
diff --git a/2-Regression/3-Linear/solution/lesson_3-R.ipynb b/2-Regression/3-Linear/solution/lesson_3-R.ipynb
new file mode 100644
index 00000000..0cc040bb
--- /dev/null
+++ b/2-Regression/3-Linear/solution/lesson_3-R.ipynb
@@ -0,0 +1,1082 @@
+{
+ "nbformat": 4,
+ "nbformat_minor": 2,
+ "metadata": {
+ "colab": {
+ "name": "lesson_3-R.ipynb",
+ "provenance": [],
+ "collapsed_sections": [],
+ "toc_visible": true
+ },
+ "kernelspec": {
+ "name": "ir",
+ "display_name": "R"
+ },
+ "language_info": {
+ "name": "R"
+ }
+ },
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "source": [
+ "# Build a regression model: linear and polynomial regression models"
+ ],
+ "metadata": {
+ "id": "EgQw8osnsUV-"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "## Linear and Polynomial Regression for Pumpkin Pricing - Lesson 3\r\n",
+ "![Linear vs polynomial regression infographic](../images/linear-polynomial.png)\r\n",
+ "\r\n",
+ "> Infographic by Dasani Madipalli\r\n",
+ "\r\n",
+ "\r\n",
+ "\r\n",
+ "\r\n",
+ "#### Introduction\r\n",
+ "\r\n",
+ "So far you have explored what regression is with sample data gathered from the pumpkin pricing dataset that we will use throughout this lesson. You have also visualized it using `ggplot2`.💪\r\n",
+ "\r\n",
+ "Now you are ready to dive deeper into regression for ML. In this lesson, you will learn more about two types of regression: *basic linear regression* and *polynomial regression*, along with some of the math underlying these techniques.\r\n",
+ "\r\n",
+ "> Throughout this curriculum, we assume minimal knowledge of math, and seek to make it accessible for students coming from other fields, so watch for notes, 🧮 callouts, diagrams, and other learning tools to aid in comprehension.\r\n",
+ "\r\n",
+ "#### Preparation\r\n",
+ "\r\n",
+ "As a reminder, you are loading this data so as to ask questions of it.\r\n",
+ "\r\n",
+ "- When is the best time to buy pumpkins?\r\n",
+ "\r\n",
+ "- What price can I expect of a case of miniature pumpkins?\r\n",
+ "\r\n",
+ "- Should I buy them in half-bushel baskets or by the 1 1/9 bushel box? Let's keep digging into this data.\r\n",
+ "\r\n",
+ "In the previous lesson, you created a `tibble` (a modern reimagining of the data frame) and populated it with part of the original dataset, standardizing the pricing by the bushel. By doing that, however, you were only able to gather about 400 data points and only for the fall months. Maybe we can get a little more detail about the nature of the data by cleaning it more? We'll see... 🕵️♀️\r\n",
+ "\r\n",
+ "For this task, we'll require the following packages:\r\n",
+ "\r\n",
+ "- `tidyverse`: The [tidyverse](https://www.tidyverse.org/) is a [collection of R packages](https://www.tidyverse.org/packages) designed to make data science faster, easier and more fun!\r\n",
+ "\r\n",
+ "- `tidymodels`: The [tidymodels](https://www.tidymodels.org/) framework is a [collection of packages](https://www.tidymodels.org/packages/) for modeling and machine learning.\r\n",
+ "\r\n",
+ "- `janitor`: The [janitor package](https://github.com/sfirke/janitor) provides simple little tools for examining and cleaning dirty data.\r\n",
+ "\r\n",
+ "- `corrplot`: The [corrplot package](https://cran.r-project.org/web/packages/corrplot/vignettes/corrplot-intro.html) provides a visual exploratory tool for correlation matrices that supports automatic variable reordering to help detect hidden patterns among variables.\r\n",
+ "\r\n",
+ "You can install them as:\r\n",
+ "\r\n",
+ "`install.packages(c(\"tidyverse\", \"tidymodels\", \"janitor\", \"corrplot\"))`\r\n",
+ "\r\n",
+ "The script below checks whether you have the packages required to complete this module and installs them for you in case they are missing."
+ ],
+ "metadata": {
+ "id": "WqQPS1OAsg3H"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "suppressWarnings(if (!require(\"pacman\")) install.packages(\"pacman\"))\r\n",
+ "\r\n",
+ "pacman::p_load(tidyverse, tidymodels, janitor, corrplot)"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "tA4C2WN3skCf",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "c06cd805-5534-4edc-f72b-d0d1dab96ac0"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "We'll later load these awesome packages and make them available in our current R session. (This is for mere illustration, `pacman::p_load()` already did that for you)\r\n",
+ "\r\n",
+ "## 1. A linear regression line\r\n",
+ "\r\n",
+ "As you learned in Lesson 1, the goal of a linear regression exercise is to be able to plot a *line* *of* *best fit* to:\r\n",
+ "\r\n",
+ "- **Show variable relationships**. Show the relationship between variables\r\n",
+ "\r\n",
+ "- **Make predictions**. Make accurate predictions on where a new data point would fall in relationship to that line.\r\n",
+ "\r\n",
+ "To draw this type of line, we use a statistical technique called **Least-Squares Regression**. The term `least-squares` means that the distances from each data point to the regression line are squared and then added up. Ideally, that final sum is as small as possible, because we want a low number of errors, or `least-squares`. As such, the line of best fit is the line that gives us the lowest value for the sum of the squared errors - hence the name *least squares regression*.\r\n",
+ "\r\n",
+ "We do so since we want to model a line that has the least cumulative distance from all of our data points. We also square the terms before adding them since we are concerned with their magnitude rather than their direction.\r\n",
+ "\r\n",
+ "> **🧮 Show me the math**\r\n",
+ ">\r\n",
+ "> This line, called the *line of best fit* can be expressed by [an equation](https://en.wikipedia.org/wiki/Simple_linear_regression):\r\n",
+ ">\r\n",
+ "> Y = a + bX\r\n",
+ ">\r\n",
+ "> `X` is the '`explanatory variable` or `predictor`'. `Y` is the '`dependent variable` or `outcome`'. The slope of the line is `b` and `a` is the y-intercept, which refers to the value of `Y` when `X = 0`.\r\n",
+ ">\r\n",
+ "\r\n",
+ "> ![Slope calculation](../images/slope.png)\r\n",
+ ">\r\n",
+ "> Infographic by Jen Looper\r\n",
+ ">\r\n",
+ "> First, calculate the slope `b`.\r\n",
+ ">\r\n",
+ "> In other words, and referring to our pumpkin data's original question: \"predict the price of a pumpkin per bushel by month\", `X` would refer to the price and `Y` would refer to the month of sale.\r\n",
+ ">\r\n",
+ "> ![Calculating the value of Y](../images/calculation.png)\r\n",
+ ">\r\n",
+ "> Infographic by Jen Looper\r\n",
+ "> \r\n",
+ "> Calculate the value of Y. If you're paying around \\$4, it must be April!\r\n",
+ ">\r\n",
+ "> The math that calculates the line must demonstrate the slope of the line, which is also dependent on the intercept, or where `Y` is situated when `X = 0`.\r\n",
+ ">\r\n",
+ "> You can observe the method of calculation for these values on the [Math is Fun](https://www.mathsisfun.com/data/least-squares-regression.html) web site. Also visit [this Least-squares calculator](https://www.mathsisfun.com/data/least-squares-calculator.html) to watch how the numbers' values impact the line.\r\n",
+ "\r\n",
+ "Not so scary, right? 🤓\r\n",
+ "\r\n",
+ "#### Correlation\r\n",
+ "\r\n",
+ "One more term to understand is the **Correlation Coefficient** between given X and Y variables. Using a scatterplot, you can quickly visualize this coefficient. A plot with datapoints scattered in a neat line has high correlation, but a plot with datapoints scattered everywhere between X and Y has a low correlation.\r\n",
+ "\r\n",
+ "A good linear regression model will be one that has a high (nearer to 1 than 0) Correlation Coefficient using the Least-Squares Regression method with a line of regression.\r\n",
+ "\r\n"
+ ],
+ "metadata": {
+ "id": "cdX5FRpvsoP5"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "## **2. A dance with data: creating a data frame that will be used for modelling**\r\n",
+ "\r\n",
+ "![Artwork by @allison_horst](../images/janitor.jpg)\r\n",
+ ""
+ ],
+ "metadata": {
+ "id": "WdUKXk7Bs8-V"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Load up required libraries and dataset. Convert the data to a data frame containing a subset of the data:\n",
+ "\n",
+ "- Only get pumpkins priced by the bushel\n",
+ "\n",
+ "- Convert the date to a month\n",
+ "\n",
+ "- Calculate the price to be an average of high and low prices\n",
+ "\n",
+ "- Convert the price to reflect the pricing by bushel quantity\n",
+ "\n",
+ "> We covered these steps in the [previous lesson](https://github.com/microsoft/ML-For-Beginners/blob/main/2-Regression/2-Data/solution/lesson_2-R.ipynb)."
+ ],
+ "metadata": {
+ "id": "fMCtu2G2s-p8"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Load the core Tidyverse packages\r\n",
+ "library(tidyverse)\r\n",
+ "library(lubridate)\r\n",
+ "\r\n",
+ "# Import the pumpkins data\r\n",
+ "pumpkins <- read_csv(file = \"https://raw.githubusercontent.com/microsoft/ML-For-Beginners/main/2-Regression/data/US-pumpkins.csv\")\r\n",
+ "\r\n",
+ "\r\n",
+ "# Get a glimpse and dimensions of the data\r\n",
+ "glimpse(pumpkins)\r\n",
+ "\r\n",
+ "\r\n",
+ "# Print the first 5 rows of the data set\r\n",
+ "pumpkins %>% \r\n",
+ " slice_head(n = 5)"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "ryMVZEEPtERn"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "In the spirit of sheer adventure, let's explore the [`janitor package`](https://github.com/sfirke/janitor) that provides simple functions for examining and cleaning dirty data. For instance, let's take a look at the column names for our data:"
+ ],
+ "metadata": {
+ "id": "xcNxM70EtJjb"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Return column names\r\n",
+ "pumpkins %>% \r\n",
+ " names()"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "5XtpaIigtPfW"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "🤔 We can do better. Let's make these column names `friendR` by converting them to the [snake_case](https://en.wikipedia.org/wiki/Snake_case) convention using `janitor::clean_names`. To find out more about this function: `?clean_names`"
+ ],
+ "metadata": {
+ "id": "IbIqrMINtSHe"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Clean names to the snake_case convention\r\n",
+ "pumpkins <- pumpkins %>% \r\n",
+ " clean_names(case = \"snake\")\r\n",
+ "\r\n",
+ "# Return column names\r\n",
+ "pumpkins %>% \r\n",
+ " names()"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "a2uYvclYtWvX"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Much tidyR 🧹! Now, a dance with the data using `dplyr` as in the previous lesson! 💃\n"
+ ],
+ "metadata": {
+ "id": "HfhnuzDDtaDd"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Select desired columns\r\n",
+ "pumpkins <- pumpkins %>% \r\n",
+ " select(variety, city_name, package, low_price, high_price, date)\r\n",
+ "\r\n",
+ "\r\n",
+ "\r\n",
+ "# Extract the month from the dates to a new column\r\n",
+ "pumpkins <- pumpkins %>%\r\n",
+ " mutate(date = mdy(date),\r\n",
+ " month = month(date)) %>% \r\n",
+ " select(-date)\r\n",
+ "\r\n",
+ "\r\n",
+ "\r\n",
+ "# Create a new column for average Price\r\n",
+ "pumpkins <- pumpkins %>% \r\n",
+ " mutate(price = (low_price + high_price)/2)\r\n",
+ "\r\n",
+ "\r\n",
+ "# Retain only pumpkins with the string \"bushel\"\r\n",
+ "new_pumpkins <- pumpkins %>% \r\n",
+ " filter(str_detect(string = package, pattern = \"bushel\"))\r\n",
+ "\r\n",
+ "\r\n",
+ "# Normalize the pricing so that you show the pricing per bushel, not per 1 1/9 or 1/2 bushel\r\n",
+ "new_pumpkins <- new_pumpkins %>% \r\n",
+ " mutate(price = case_when(\r\n",
+ "    str_detect(package, \"1 1/9\") ~ price/(1 + 1/9),\r\n",
+ " str_detect(package, \"1/2\") ~ price*2,\r\n",
+ " TRUE ~ price))\r\n",
+ "\r\n",
+ "# Relocate column positions\r\n",
+ "new_pumpkins <- new_pumpkins %>% \r\n",
+ " relocate(month, .before = variety)\r\n",
+ "\r\n",
+ "\r\n",
+ "# Display the first 5 rows\r\n",
+ "new_pumpkins %>% \r\n",
+ " slice_head(n = 5)"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "X0wU3gQvtd9f"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Good job!👌 You now have a clean, tidy data set on which you can build your new regression model!\n",
+ "\n",
+ "Mind a scatter plot?\n"
+ ],
+ "metadata": {
+ "id": "UpaIwaxqth82"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Set theme\r\n",
+ "theme_set(theme_light())\r\n",
+ "\r\n",
+ "# Make a scatter plot of month and price\r\n",
+ "new_pumpkins %>% \r\n",
+ " ggplot(mapping = aes(x = month, y = price)) +\r\n",
+ " geom_point(size = 1.6)\r\n"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "DXgU-j37tl5K"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "A scatter plot reminds us that we only have month data from August through December. We probably need more data to be able to draw conclusions in a linear fashion.\n",
+ "\n",
+ "Let's take a look at our modelling data again:"
+ ],
+ "metadata": {
+ "id": "Ve64wVbwtobI"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Display first 5 rows\r\n",
+ "new_pumpkins %>% \r\n",
+ " slice_head(n = 5)"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "HFQX2ng1tuSJ"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "What if we wanted to predict the `price` of a pumpkin based on the `city` or `package` columns which are of type character? Or even more simply, how could we find the correlation (which requires both of its inputs to be numeric) between, say, `package` and `price`? 🤷🤷\n",
+ "\n",
+ "Machine learning models work best with numeric features rather than text values, so you generally need to convert categorical features into numeric representations.\n",
+ "\n",
+ "This means that we have to find a way to reformat our predictors to make them easier for a model to use effectively, a process known as `feature engineering`."
+ ],
+ "metadata": {
+ "id": "7hsHoxsStyjJ"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "## 3. Preprocessing data for modelling with recipes 👩🍳👨🍳\n",
+ "\n",
+ "Activities that reformat predictor values to make them easier for a model to use effectively are termed `feature engineering`.\n",
+ "\n",
+ "Different models have different preprocessing requirements. For instance, least squares requires `encoding categorical variables` such as month, variety and city_name. This simply involves `translating` a column with `categorical values` into one or more `numeric columns` that take the place of the original.\n",
+ "\n",
+ "For example, suppose your data includes the following categorical feature:\n",
+ "\n",
+ "| city |\n",
+ "|:-------:|\n",
+ "| Denver |\n",
+ "| Nairobi |\n",
+ "| Tokyo |\n",
+ "\n",
+ "You can apply *ordinal encoding* to substitute a unique integer value for each category, like this:\n",
+ "\n",
+ "| city |\n",
+ "|:----:|\n",
+ "| 0 |\n",
+ "| 1 |\n",
+ "| 2 |\n",
+ "\n",
+ "And that's what we'll do to our data!\n",
+ "\n",
+ "In this section, we'll explore another amazing Tidymodels package: [recipes](https://tidymodels.github.io/recipes/) - which is designed to help you preprocess your data **before** training your model. At its core, a recipe is an object that defines what steps should be applied to a data set in order to get it ready for modelling.\n",
+ "\n",
+ "Now, let's create a recipe that prepares our data for modelling by substituting a unique integer for all the observations in the predictor columns:"
+ ],
+ "metadata": {
+ "id": "AD5kQbcvt3Xl"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Specify a recipe\r\n",
+ "pumpkins_recipe <- recipe(price ~ ., data = new_pumpkins) %>% \r\n",
+ " step_integer(all_predictors(), zero_based = TRUE)\r\n",
+ "\r\n",
+ "\r\n",
+ "# Print out the recipe\r\n",
+ "pumpkins_recipe"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "BNaFKXfRt9TU"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Awesome! 👏 We just created our first recipe that specifies an outcome (price) and its corresponding predictors and that all the predictor columns should be encoded into a set of integers 🙌! Let's quickly break it down:\r\n",
+ "\r\n",
+ "- The call to `recipe()` with a formula tells the recipe the *roles* of the variables using `new_pumpkins` data as the reference. For instance the `price` column has been assigned an `outcome` role while the rest of the columns have been assigned a `predictor` role.\r\n",
+ "\r\n",
+ "- `step_integer(all_predictors(), zero_based = TRUE)` specifies that all the predictors should be converted into a set of integers with the numbering starting at 0.\r\n",
+ "\r\n",
+ "We are sure you may be having thoughts such as: \"This is so cool!! But what if I needed to confirm that the recipes are doing exactly what I expect them to do? 🤔\"\r\n",
+ "\r\n",
+ "That's an awesome thought! You see, once your recipe is defined, you can estimate the parameters required to actually preprocess the data, and then extract the processed data. You don't typically need to do this when you use Tidymodels (we'll see the normal convention in just a minute-\\> `workflows`) but it can come in handy when you want to do some kind of sanity check for confirming that recipes are doing what you expect.\r\n",
+ "\r\n",
+ "For that, you'll need two more verbs: `prep()` and `bake()` and as always, our little R friends by [`Allison Horst`](https://github.com/allisonhorst/stats-illustrations) help you in understanding this better!\r\n",
+ "\r\n",
+ "![Artwork by @allison_horst](../images/recipes.png)\r\n",
+ ""
+ ],
+ "metadata": {
+ "id": "KEiO0v7kuC9O"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "[`prep()`](https://recipes.tidymodels.org/reference/prep.html): estimates the required parameters from a training set that can be later applied to other data sets. For instance, for a given predictor column, which observation will be assigned integer 0, 1, 2, etc.\n",
+ "\n",
+ "[`bake()`](https://recipes.tidymodels.org/reference/bake.html): takes a prepped recipe and applies the operations to any data set.\n",
+ "\n",
+ "That said, let's prep and bake our recipes to really confirm that under the hood, the predictor columns will first be encoded before a model is fit."
+ ],
+ "metadata": {
+ "id": "Q1xtzebuuTCP"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Prep the recipe\r\n",
+ "pumpkins_prep <- prep(pumpkins_recipe)\r\n",
+ "\r\n",
+ "# Bake the recipe to extract a preprocessed new_pumpkins data\r\n",
+ "baked_pumpkins <- bake(pumpkins_prep, new_data = NULL)\r\n",
+ "\r\n",
+ "# Print out the baked data set\r\n",
+ "baked_pumpkins %>% \r\n",
+ " slice_head(n = 10)"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "FGBbJbP_uUUn"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Woo-hoo!🥳 The processed data `baked_pumpkins` has all its predictors encoded, confirming that the preprocessing steps defined in our recipe will work as expected. This makes it harder for you to read but much more intelligible for Tidymodels! Take some time to find out what observation has been mapped to a corresponding integer.\n",
+ "\n",
+ "It is also worth mentioning that `baked_pumpkins` is a data frame that we can perform computations on.\n",
+ "\n",
+ "For instance, let's try to find a good correlation between two points of your data to potentially build a good predictive model. We'll use the function `cor()` to do this. Type `?cor()` to find out more about the function."
+ ],
+ "metadata": {
+ "id": "1dvP0LBUueAW"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Find the correlation between the city_name and the price\r\n",
+ "cor(baked_pumpkins$city_name, baked_pumpkins$price)\r\n",
+ "\r\n",
+ "# Find the correlation between the package and the price\r\n",
+ "cor(baked_pumpkins$package, baked_pumpkins$price)\r\n"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "3bQzXCjFuiSV"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "As it turns out, there's only weak correlation between the City and Price. However, there's somewhat better correlation between the Package and its Price. That makes sense, right? Normally, the bigger the produce box, the higher the price.\n",
+ "\n",
+ "While we are at it, let's also try and visualize a correlation matrix of all the columns using the `corrplot` package."
+ ],
+ "metadata": {
+ "id": "BToPWbgjuoZw"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Load the corrplot package\r\n",
+ "library(corrplot)\r\n",
+ "\r\n",
+ "# Obtain correlation matrix\r\n",
+ "corr_mat <- cor(baked_pumpkins %>% \r\n",
+ " # Drop columns that are not really informative\r\n",
+ " select(-c(low_price, high_price)))\r\n",
+ "\r\n",
+ "# Make a correlation plot between the variables\r\n",
+ "corrplot(corr_mat, method = \"shade\", shade.col = NA, tl.col = \"black\", tl.srt = 45, addCoef.col = \"black\", cl.pos = \"n\", order = \"original\")"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "ZwAL3ksmutVR"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "🤩🤩 Much better.\r\n",
+ "\r\n",
+ "A good question to now ask of this data will be: '`What price can I expect of a given pumpkin package?`' Let's get right into it!\r\n",
+ "\r\n",
+ "> Note: When you **`bake()`** the prepped recipe **`pumpkins_prep`** with **`new_data = NULL`**, you extract the processed (i.e. encoded) training data. If you had another data set for example a test set and would want to see how a recipe would pre-process it, you would simply bake **`pumpkins_prep`** with **`new_data = test_set`**\r\n",
+ "\r\n",
+ "## 4. Build a linear regression model\r\n",
+ "\r\n",
+ "> Infographic by Dasani Madipalli\r\n",
+ ""
+ ],
+ "metadata": {
+ "id": "YqXjLuWavNxW"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Now that we have built a recipe and confirmed that the data will be pre-processed appropriately, let's build a regression model to answer the question: `What price can I expect of a given pumpkin package?`\n",
+ "\n",
+ "#### Train a linear regression model using the training set\n",
+ "\n",
+ "As you may have already figured out, the column *price* is the `outcome` variable while the *package* column is the `predictor` variable.\n",
+ "\n",
+ "To do this, we'll first split the data such that 80% goes into training and 20% into test set, then define a recipe that will encode the predictor column into a set of integers, then build a model specification. We won't prep and bake our recipe since we already know it will preprocess the data as expected."
+ ],
+ "metadata": {
+ "id": "Pq0bSzCevW-h"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "set.seed(2056)\r\n",
+ "# Split the data into training and test sets\r\n",
+ "pumpkins_split <- new_pumpkins %>% \r\n",
+ " initial_split(prop = 0.8)\r\n",
+ "\r\n",
+ "\r\n",
+ "# Extract training and test data\r\n",
+ "pumpkins_train <- training(pumpkins_split)\r\n",
+ "pumpkins_test <- testing(pumpkins_split)\r\n",
+ "\r\n",
+ "\r\n",
+ "\r\n",
+ "# Create a recipe for preprocessing the data\r\n",
+ "lm_pumpkins_recipe <- recipe(price ~ package, data = pumpkins_train) %>% \r\n",
+ " step_integer(all_predictors(), zero_based = TRUE)\r\n",
+ "\r\n",
+ "\r\n",
+ "\r\n",
+ "# Create a linear model specification\r\n",
+ "lm_spec <- linear_reg() %>% \r\n",
+ " set_engine(\"lm\") %>% \r\n",
+ " set_mode(\"regression\")"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "CyoEh_wuvcLv"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Good job! Now that we have a recipe and a model specification, we need to find a way of bundling them together into an object that will first preprocess the data (prep+bake behind the scenes), fit the model on the preprocessed data and also allow for potential post-processing activities. How's that for your peace of mind!🤩\n",
+ "\n",
+ "In Tidymodels, this convenient object is called a [`workflow`](https://workflows.tidymodels.org/) and conveniently holds your modeling components! This is what we'd call *pipelines* in *Python*.\n",
+ "\n",
+ "So let's bundle everything up into a workflow!📦"
+ ],
+ "metadata": {
+ "id": "G3zF_3DqviFJ"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Hold modelling components in a workflow\r\n",
+ "lm_wf <- workflow() %>% \r\n",
+ " add_recipe(lm_pumpkins_recipe) %>% \r\n",
+ " add_model(lm_spec)\r\n",
+ "\r\n",
+ "# Print out the workflow\r\n",
+ "lm_wf"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "T3olroU3v-WX"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "\n",
+ "👌 Into the bargain, a workflow can be fit/trained in much the same way a model can."
+ ],
+ "metadata": {
+ "id": "zd1A5tgOwEPX"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Train the model\r\n",
+ "lm_wf_fit <- lm_wf %>% \r\n",
+ " fit(data = pumpkins_train)\r\n",
+ "\r\n",
+ "# Print the model coefficients learned \r\n",
+ "lm_wf_fit"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "NhJagFumwFHf"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "From the model output, we can see the coefficients learned during training. They represent the coefficients of the line of best fit that gives us the lowest overall error between the actual and predicted variable.\n",
+ "\n",
+ "\n",
+ "#### Evaluate model performance using the test set\n",
+ "\n",
+ "It's time to see how the model performed 📏! How do we do this?\n",
+ "\n",
+ "Now that we've trained the model, we can use it to make predictions for the test_set using `parsnip::predict()`. Then we can compare these predictions to the actual label values to evaluate how well (or not!) the model is working.\n",
+ "\n",
+ "Let's start with making predictions for the test set then bind the columns to the test set."
+ ],
+ "metadata": {
+ "id": "_4QkGtBTwItF"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Make predictions for the test set\r\n",
+ "predictions <- lm_wf_fit %>% \r\n",
+ " predict(new_data = pumpkins_test)\r\n",
+ "\r\n",
+ "\r\n",
+ "# Bind predictions to the test set\r\n",
+ "lm_results <- pumpkins_test %>% \r\n",
+ " select(c(package, price)) %>% \r\n",
+ " bind_cols(predictions)\r\n",
+ "\r\n",
+ "\r\n",
+ "# Print the first ten rows of the tibble\r\n",
+ "lm_results %>% \r\n",
+ " slice_head(n = 10)"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "UFZzTG0gwTs9"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "\n",
+ "Yes, you have just trained a model and used it to make predictions! 🔮 Is it any good? Let's evaluate the model's performance!\n",
+ "\n",
+ "In Tidymodels, we do this using `yardstick::metrics()`! For linear regression, let's focus on the following metrics:\n",
+ "\n",
+ "- `Root Mean Square Error (RMSE)`: The square root of the [MSE](https://en.wikipedia.org/wiki/Mean_squared_error). This yields an absolute metric in the same unit as the label (in this case, the price of a pumpkin). The smaller the value, the better the model (in a simplistic sense, it represents the average price by which the predictions are wrong!)\n",
+ "\n",
+ "- `Coefficient of Determination (usually known as R-squared or R2)`: A relative metric in which the higher the value, the better the fit of the model. In essence, this metric represents how much of the variance between predicted and actual label values the model is able to explain."
+ ],
+ "metadata": {
+ "id": "0A5MjzM7wW9M"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Evaluate performance of linear regression\r\n",
+ "metrics(data = lm_results,\r\n",
+ " truth = price,\r\n",
+ " estimate = .pred)"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "reJ0UIhQwcEH"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "There goes the model performance. Let's see if we can get a better indication by visualizing a scatter plot of the package and price then use the predictions made to overlay a line of best fit.\n",
+ "\n",
+ "This means we'll have to prep and bake the test set in order to encode the package column then bind this to the predictions made by our model."
+ ],
+ "metadata": {
+ "id": "fdgjzjkBwfWt"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Encode package column\r\n",
+ "package_encode <- lm_pumpkins_recipe %>% \r\n",
+ " prep() %>% \r\n",
+ " bake(new_data = pumpkins_test) %>% \r\n",
+ " select(package)\r\n",
+ "\r\n",
+ "\r\n",
+ "# Bind encoded package column to the results\r\n",
+ "lm_results <- lm_results %>% \r\n",
+ " bind_cols(package_encode %>% \r\n",
+ " rename(package_integer = package)) %>% \r\n",
+ " relocate(package_integer, .after = package)\r\n",
+ "\r\n",
+ "\r\n",
+ "# Print new results data frame\r\n",
+ "lm_results %>% \r\n",
+ " slice_head(n = 5)\r\n",
+ "\r\n",
+ "\r\n",
+ "# Make a scatter plot\r\n",
+ "lm_results %>% \r\n",
+ " ggplot(mapping = aes(x = package_integer, y = price)) +\r\n",
+ " geom_point(size = 1.6) +\r\n",
+ " # Overlay a line of best fit\r\n",
+ " geom_line(aes(y = .pred), color = \"orange\", size = 1.2) +\r\n",
+ " xlab(\"package\")\r\n",
+ " \r\n"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "R0nw719lwkHE"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "As you can see, the linear regression model does not generalize the relationship between a package and its corresponding price very well.\r\n",
+ "\r\n",
+ "🎃 Congratulations, you just created a model that can help predict the price of a few varieties of pumpkins. Your holiday pumpkin patch will be beautiful. But you can probably create a better model!\r\n",
+ "\r\n",
+ "## 5. Build a polynomial regression model\r\n",
+ "\r\n",
+ "> Infographic by Dasani Madipalli\r\n",
+ ""
+ ],
+ "metadata": {
+ "id": "HOCqJXLTwtWI"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Sometimes our data may not have a linear relationship, but we still want to predict an outcome. Polynomial regression can help us make predictions for more complex non-linear relationships.\n",
+ "\n",
+ "Take for instance the relationship between the package and price for our pumpkins data set. While sometimes there's a linear relationship between variables - the bigger the pumpkin in volume, the higher the price - sometimes these relationships can't be plotted as a plane or straight line.\n",
+ "\n",
+ "> ✅ Here are [some more examples](https://online.stat.psu.edu/stat501/lesson/9/9.8) of data that could use polynomial regression\n",
+ ">\n",
+ "> Take another look at the relationship between Variety and Price in the previous plot. Does this scatterplot seem like it should necessarily be analyzed by a straight line? Perhaps not. In this case, you can try polynomial regression.\n",
+ ">\n",
+ "> ✅ Polynomials are mathematical expressions that might consist of one or more variables and coefficients\n",
+ "\n",
+ "#### Train a polynomial regression model using the training set\n",
+ "\n",
+ "Polynomial regression creates a *curved line* to better fit nonlinear data.\n",
+ "\n",
+ "Let's see whether a polynomial model will perform better in making predictions. We'll follow a somewhat similar procedure as we did before:\n",
+ "\n",
+ "- Create a recipe that specifies the preprocessing steps that should be carried out on our data to get it ready for modelling i.e: encoding predictors and computing polynomials of degree *n*\n",
+ "\n",
+ "- Build a model specification\n",
+ "\n",
+ "- Bundle the recipe and model specification into a workflow\n",
+ "\n",
+ "- Create a model by fitting the workflow\n",
+ "\n",
+ "- Evaluate how well the model performs on the test data\n",
+ "\n",
+ "Let's get right into it!\n"
+ ],
+ "metadata": {
+ "id": "VcEIpRV9wzYr"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Specify a recipe\r\n",
+ "poly_pumpkins_recipe <-\r\n",
+ " recipe(price ~ package, data = pumpkins_train) %>%\r\n",
+ " step_integer(all_predictors(), zero_based = TRUE) %>% \r\n",
+ " step_poly(all_predictors(), degree = 4)\r\n",
+ "\r\n",
+ "\r\n",
+ "# Create a model specification\r\n",
+ "poly_spec <- linear_reg() %>% \r\n",
+ " set_engine(\"lm\") %>% \r\n",
+ " set_mode(\"regression\")\r\n",
+ "\r\n",
+ "\r\n",
+ "# Bundle recipe and model spec into a workflow\r\n",
+ "poly_wf <- workflow() %>% \r\n",
+ " add_recipe(poly_pumpkins_recipe) %>% \r\n",
+ " add_model(poly_spec)\r\n",
+ "\r\n",
+ "\r\n",
+ "# Create a model\r\n",
+ "poly_wf_fit <- poly_wf %>% \r\n",
+ " fit(data = pumpkins_train)\r\n",
+ "\r\n",
+ "\r\n",
+ "# Print learned model coefficients\r\n",
+ "poly_wf_fit\r\n",
+ "\r\n",
+ " "
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "63n_YyRXw3CC"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "#### Evaluate model performance\n",
+ "\n",
+ "👏👏 You've built a polynomial model - let's make predictions on the test set!"
+ ],
+ "metadata": {
+ "id": "-LHZtztSxDP0"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Make price predictions on test data\r\n",
+ "poly_results <- poly_wf_fit %>% predict(new_data = pumpkins_test) %>% \r\n",
+ " bind_cols(pumpkins_test %>% select(c(package, price))) %>% \r\n",
+ " relocate(.pred, .after = last_col())\r\n",
+ "\r\n",
+ "\r\n",
+ "# Print the results\r\n",
+ "poly_results %>% \r\n",
+ " slice_head(n = 10)"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "YUFpQ_dKxJGx"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Woo-hoo, let's evaluate how the model performed on the test_set using `yardstick::metrics()`."
+ ],
+ "metadata": {
+ "id": "qxdyj86bxNGZ"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "metrics(data = poly_results, truth = price, estimate = .pred)"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "8AW5ltkBxXDm"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "🤩🤩 Much better performance.\n",
+ "\n",
+ "The `rmse` decreased from about 7 to about 3, an indication of a reduced error between the actual price and the predicted price. You can *loosely* interpret this as meaning that on average, incorrect predictions are wrong by around \\$3. The `rsq` increased from about 0.4 to 0.8.\n",
+ "\n",
+ "All these metrics indicate that the polynomial model performs way better than the linear model. Good job!\n",
+ "\n",
+ "Let's see if we can visualize this!"
+ ],
+ "metadata": {
+ "id": "6gLHNZDwxYaS"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Bind encoded package column to the results\r\n",
+ "poly_results <- poly_results %>% \r\n",
+ " bind_cols(package_encode %>% \r\n",
+ " rename(package_integer = package)) %>% \r\n",
+ " relocate(package_integer, .after = package)\r\n",
+ "\r\n",
+ "\r\n",
+ "# Print new results data frame\r\n",
+ "poly_results %>% \r\n",
+ " slice_head(n = 5)\r\n",
+ "\r\n",
+ "\r\n",
+ "# Make a scatter plot\r\n",
+ "poly_results %>% \r\n",
+ " ggplot(mapping = aes(x = package_integer, y = price)) +\r\n",
+ " geom_point(size = 1.6) +\r\n",
+ " # Overlay a line of best fit\r\n",
+ " geom_line(aes(y = .pred), color = \"midnightblue\", size = 1.2) +\r\n",
+ " xlab(\"package\")\r\n"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "A83U16frxdF1"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "You can see a curved line that fits your data better! 🤩\n",
+ "\n",
+ "You can make this smoother by passing a polynomial formula to `geom_smooth` like this:"
+ ],
+ "metadata": {
+ "id": "4U-7aHOVxlGU"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Make a scatter plot\r\n",
+ "poly_results %>% \r\n",
+ " ggplot(mapping = aes(x = package_integer, y = price)) +\r\n",
+ " geom_point(size = 1.6) +\r\n",
+ " # Overlay a line of best fit\r\n",
+ " geom_smooth(method = lm, formula = y ~ poly(x, degree = 4), color = \"midnightblue\", size = 1.2, se = FALSE) +\r\n",
+ " xlab(\"package\")"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "5vzNT0Uexm-w"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Much more like a smooth curve! 🤩\n",
+ "\n",
+ "Here's how you would make a new prediction:"
+ ],
+ "metadata": {
+ "id": "v9u-wwyLxq4G"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Make a hypothetical data frame\r\n",
+ "hypo_tibble <- tibble(package = \"bushel baskets\")\r\n",
+ "\r\n",
+ "# Make predictions using linear model\r\n",
+ "lm_pred <- lm_wf_fit %>% predict(new_data = hypo_tibble)\r\n",
+ "\r\n",
+ "# Make predictions using polynomial model\r\n",
+ "poly_pred <- poly_wf_fit %>% predict(new_data = hypo_tibble)\r\n",
+ "\r\n",
+ "# Return predictions in a list\r\n",
+ "list(\"linear model prediction\" = lm_pred, \r\n",
+ " \"polynomial model prediction\" = poly_pred)\r\n"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "jRPSyfQGxuQv"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "The `polynomial model` prediction does make sense, given the scatter plots of `price` and `package`! And, if this is a better model than the previous one, looking at the same data, you need to budget for these more expensive pumpkins!\n",
+ "\n",
+ "🏆 Well done! You created two regression models in one lesson. In the final section on regression, you will learn about logistic regression to determine categories.\n",
+ "\n",
+ "## **🚀Challenge**\n",
+ "\n",
+ "Test several different variables in this notebook to see how correlation corresponds to model accuracy.\n",
+ "\n",
+ "## [**Post-lecture quiz**](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/14/)\n",
+ "\n",
+ "## **Review & Self Study**\n",
+ "\n",
+ "In this lesson we learned about Linear Regression. There are other important types of Regression. Read about Stepwise, Ridge, Lasso and Elasticnet techniques. A good course to study to learn more is the [Stanford Statistical Learning course](https://online.stanford.edu/courses/sohs-ystatslearning-statistical-learning)\n",
+ "\n",
+ "If you want to learn more about how to use the amazing Tidymodels framework, please check out the following resources:\n",
+ "\n",
+ "- Tidymodels website: [Get started with Tidymodels](https://www.tidymodels.org/start/)\n",
+ "\n",
+ "- Max Kuhn and Julia Silge, [*Tidy Modeling with R*](https://www.tmwr.org/)*.*\n",
+ "\n",
+ "###### **THANK YOU TO:**\n",
+ "\n",
+ "[Allison Horst](https://twitter.com/allison_horst?lang=en) for creating the amazing illustrations that make R more welcoming and engaging. Find more illustrations at her [gallery](https://www.google.com/url?q=https://github.com/allisonhorst/stats-illustrations&sa=D&source=editors&ust=1626380772530000&usg=AOvVaw3zcfyCizFQZpkSLzxiiQEM).\n"
+ ],
+ "metadata": {
+ "id": "8zOLOWqMxzk5"
+ }
+ }
+ ]
+}
\ No newline at end of file
diff --git a/2-Regression/3-Linear/solution/lesson_3.Rmd b/2-Regression/3-Linear/solution/lesson_3.Rmd
new file mode 100644
index 00000000..a1b81780
--- /dev/null
+++ b/2-Regression/3-Linear/solution/lesson_3.Rmd
@@ -0,0 +1,679 @@
+---
+title: 'Build a regression model: linear and polynomial regression models'
+output:
+ html_document:
+ df_print: paged
+ theme: flatly
+ highlight: breezedark
+ toc: yes
+ toc_float: yes
+ code_download: yes
+---
+
+## Linear and Polynomial Regression for Pumpkin Pricing - Lesson 3
+
+{width="800"}
+
+#### Introduction
+
+So far you have explored what regression is with sample data gathered from the pumpkin pricing dataset that we will use throughout this lesson. You have also visualized it using `ggplot2`.💪
+
+Now you are ready to dive deeper into regression for ML. In this lesson, you will learn more about two types of regression: *basic linear regression* and *polynomial regression*, along with some of the math underlying these techniques.
+
+> Throughout this curriculum, we assume minimal knowledge of math, and seek to make it accessible for students coming from other fields, so watch for notes, 🧮 callouts, diagrams, and other learning tools to aid in comprehension.
+
+#### Preparation
+
+As a reminder, you are loading this data so as to ask questions of it.
+
+- When is the best time to buy pumpkins?
+
+- What price can I expect of a case of miniature pumpkins?
+
+- Should I buy them in half-bushel baskets or by the 1 1/9 bushel box? Let's keep digging into this data.
+
+In the previous lesson, you created a `tibble` (a modern reimagining of the data frame) and populated it with part of the original dataset, standardizing the pricing by the bushel. By doing that, however, you were only able to gather about 400 data points and only for the fall months. Maybe we can get a little more detail about the nature of the data by cleaning it more? We'll see... 🕵️♀️
+
+For this task, we'll require the following packages:
+
+- `tidyverse`: The [tidyverse](https://www.tidyverse.org/) is a [collection of R packages](https://www.tidyverse.org/packages) designed to make data science faster, easier and more fun!
+
+- `tidymodels`: The [tidymodels](https://www.tidymodels.org/) framework is a [collection of packages](https://www.tidymodels.org/packages/) for modeling and machine learning.
+
+- `janitor`: The [janitor package](https://github.com/sfirke/janitor) provides simple little tools for examining and cleaning dirty data.
+
+- `corrplot`: The [corrplot package](https://cran.r-project.org/web/packages/corrplot/vignettes/corrplot-intro.html) provides a visual exploratory tool for correlation matrices that supports automatic variable reordering to help detect hidden patterns among variables.
+
+You can install them as:
+
+`install.packages(c("tidyverse", "tidymodels", "janitor", "corrplot"))`
+
+The script below checks whether you have the packages required to complete this module and installs them for you in case they are missing.
+
+```{r, message=F, warning=F}
+suppressWarnings(if (!require("pacman")) install.packages("pacman"))
+
+pacman::p_load(tidyverse, tidymodels, janitor, corrplot)
+```
+
+We'll later load these awesome packages and make them available in our current R session. (This is for mere illustration, `pacman::p_load()` already did that for you)
+
+## 1. A linear regression line
+
+As you learned in Lesson 1, the goal of a linear regression exercise is to be able to plot a *line* *of* *best fit* to:
+
+- **Show variable relationships**. Show the relationship between variables
+
+- **Make predictions**. Make accurate predictions on where a new data point would fall in relationship to that line.
+
+To draw this type of line, we use a statistical technique called **Least-Squares Regression**. The term `least-squares` means that the distances from each data point to the regression line are squared and then added up. Ideally, that final sum is as small as possible, because we want a low number of errors, or `least-squares`. As such, the line of best fit is the line that gives us the lowest value for the sum of the squared errors - hence the name *least squares regression*.
+
+We do so since we want to model a line that has the least cumulative distance from all of our data points. We also square the terms before adding them since we are concerned with their magnitude rather than their direction.
+
+> **🧮 Show me the math**
+>
+> This line, called the *line of best fit* can be expressed by [an equation](https://en.wikipedia.org/wiki/Simple_linear_regression):
+>
+> Y = a + bX
+>
+> `X` is the '`explanatory variable` or `predictor`'. `Y` is the '`dependent variable` or `outcome`'. The slope of the line is `b` and `a` is the y-intercept, which refers to the value of `Y` when `X = 0`.
+>
+> *(infographic by [Jen Looper](https://twitter.com/jenlooper): calculating the slope `b`)*
+>
+> First, calculate the slope `b`.
+>
+> In other words, and referring to our pumpkin data's original question: "predict the price of a pumpkin per bushel by month", `X` would refer to the price and `Y` would refer to the month of sale.
+>
+> 
+>
+> Calculate the value of Y. If you're paying around \$4, it must be April!
+>
+> The math that calculates the line must demonstrate the slope of the line, which is also dependent on the intercept, or where `Y` is situated when `X = 0`.
+>
+> You can observe the method of calculation for these values on the [Math is Fun](https://www.mathsisfun.com/data/least-squares-regression.html) web site. Also visit [this Least-squares calculator](https://www.mathsisfun.com/data/least-squares-calculator.html) to watch how the numbers' values impact the line.
+
+Not so scary, right? 🤓
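+
+If you'd like to see the slope and intercept formulas in action, here's a minimal sketch in base R using a few made-up (x, y) points (the numbers are hypothetical, purely for illustration):
+
+```{r least_squares_sketch}
+# Hypothetical data points, just to illustrate the least-squares formulas
+x <- c(2, 3, 5, 7, 9)
+y <- c(4, 5, 7, 10, 15)
+
+# Slope: b = sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
+b <- sum((x - mean(x)) * (y - mean(y))) / sum((x - mean(x))^2)
+
+# Intercept: a = mean(y) - b * mean(x)
+a <- mean(y) - b * mean(x)
+
+c(intercept = a, slope = b)
+
+# The same line, fit by R's built-in linear model function as a check
+coef(lm(y ~ x))
+```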
+
+#### Correlation
+
+One more term to understand is the **Correlation Coefficient** between given X and Y variables. Using a scatterplot, you can quickly visualize this coefficient. A plot with datapoints scattered in a neat line has high correlation, but a plot with datapoints scattered everywhere between X and Y has low correlation.
+
+A good linear regression model will be one with a high Correlation Coefficient (nearer to 1 than to 0), obtained using the Least-Squares Regression method with a line of regression.
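+
+To see this in code, here's a small sketch with simulated (hypothetical) data, comparing a neat, line-like cloud of points with a fully scattered one:
+
+```{r correlation_sketch}
+set.seed(123)
+x <- 1:100
+
+# Points that hug a straight line: correlation close to 1
+y_neat <- 2 * x + rnorm(100, sd = 5)
+cor(x, y_neat)
+
+# Points scattered everywhere: correlation close to 0
+y_noisy <- rnorm(100, sd = 50)
+cor(x, y_noisy)
+```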
+
+## 2. A dance with data: creating a data frame that will be used for modelling
+
+
+Load up required libraries and dataset. Convert the data to a data frame containing a subset of the data:
+
+- Only get pumpkins priced by the bushel
+
+- Convert the date to a month
+
+- Calculate the price to be an average of high and low prices
+
+- Convert the price to reflect the pricing by bushel quantity
+
+> We covered these steps in the [previous lesson](https://github.com/microsoft/ML-For-Beginners/blob/main/2-Regression/2-Data/solution/lesson_2-R.ipynb).
+
+```{r load_tidy_verse_models, message=F, warning=F}
+# Load the core Tidyverse packages
+library(tidyverse)
+library(lubridate)
+
+# Import the pumpkins data
+pumpkins <- read_csv(file = "https://raw.githubusercontent.com/microsoft/ML-For-Beginners/main/2-Regression/data/US-pumpkins.csv")
+
+
+# Get a glimpse and dimensions of the data
+glimpse(pumpkins)
+
+
+# Print the first 5 rows of the data set
+pumpkins %>%
+ slice_head(n = 5)
+
+
+```
+
+In the spirit of sheer adventure, let's explore the [`janitor` package](https://github.com/sfirke/janitor), which provides simple functions for examining and cleaning dirty data. For instance, let's take a look at the column names for our data:
+
+```{r col_names}
+# Return column names
+pumpkins %>%
+ names()
+
+```
+
+🤔 We can do better. Let's make these column names `friendR` by converting them to the [snake_case](https://en.wikipedia.org/wiki/Snake_case) convention using `janitor::clean_names`. To find out more about this function: `?clean_names`
+
+```{r friendR}
+# Clean names to the snake_case convention
+pumpkins <- pumpkins %>%
+ clean_names(case = "snake")
+
+# Return column names
+pumpkins %>%
+ names()
+
+```
+
+Much tidyR 🧹! Now, a dance with the data using `dplyr` as in the previous lesson! 💃
+
+```{r prep_data, message=F, warning=F}
+# Select desired columns
+pumpkins <- pumpkins %>%
+ select(variety, city_name, package, low_price, high_price, date)
+
+
+
+# Extract the month from the dates to a new column
+pumpkins <- pumpkins %>%
+ mutate(date = mdy(date),
+ month = month(date)) %>%
+ select(-date)
+
+
+
+# Create a new column for average Price
+pumpkins <- pumpkins %>%
+ mutate(price = (low_price + high_price)/2)
+
+
+# Retain only pumpkins with the string "bushel"
+new_pumpkins <- pumpkins %>%
+ filter(str_detect(string = package, pattern = "bushel"))
+
+
+# Normalize the pricing so that you show the pricing per bushel, not per 1 1/9 or 1/2 bushel
+new_pumpkins <- new_pumpkins %>%
+ mutate(price = case_when(
+ str_detect(package, "1 1/9") ~ price/(1 + 1/9),
+ str_detect(package, "1/2") ~ price*2,
+ TRUE ~ price))
+
+# Relocate column positions
+new_pumpkins <- new_pumpkins %>%
+ relocate(month, .before = variety)
+
+
+# Display the first 5 rows
+new_pumpkins %>%
+ slice_head(n = 5)
+```
+
+Good job!👌 You now have a clean, tidy data set on which you can build your new regression model!
+
+Mind a scatter plot?
+
+```{r scatter_price_month}
+# Set theme
+theme_set(theme_light())
+
+# Make a scatter plot of month and price
+new_pumpkins %>%
+ ggplot(mapping = aes(x = month, y = price)) +
+ geom_point(size = 1.6)
+
+```
+
+A scatter plot reminds us that we only have month data from August through December. We probably need more data to be able to draw conclusions in a linear fashion.
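+
+You can confirm this quickly by counting how many observations each month has:
+
+```{r month_check}
+# Count observations per month
+new_pumpkins %>%
+  count(month)
+```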
+
+Let's take a look at our modelling data again:
+
+```{r modelling data}
+# Display first 5 rows
+new_pumpkins %>%
+ slice_head(n = 5)
+
+```
+
+What if we wanted to predict the `price` of a pumpkin based on the `city_name` or `package` columns, which are of type character? Or, even more simply, how could we find the correlation (which requires both of its inputs to be numeric) between, say, `package` and `price`? 🤷🤷
+
+Machine learning models work best with numeric features rather than text values, so you generally need to convert categorical features into numeric representations.
+
+This means that we have to find a way to reformat our predictors to make them easier for a model to use effectively, a process known as `feature engineering`.
+
+## 3. Preprocessing data for modelling with recipes 👩🍳👨🍳
+
+As noted above, reformatting predictor values to make them easier for a model to use effectively is termed `feature engineering`.
+
+Different models have different preprocessing requirements. For instance, least squares requires `encoding categorical variables` such as month, variety and city_name. This simply involves `translating` a column with `categorical values` into one or more `numeric columns` that take the place of the original.
+
+For example, suppose your data includes the following categorical feature:
+
+| city |
+|:-------:|
+| Denver |
+| Nairobi |
+| Tokyo |
+
+You can apply *ordinal encoding* to substitute a unique integer value for each category, like this:
+
+| city |
+|:----:|
+| 0 |
+| 1 |
+| 2 |
+
+And that's what we'll do to our data!
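+
+To make the idea concrete before we apply it to the pumpkins, here's a minimal sketch of ordinal encoding on a toy tibble mirroring the table above (the data and the base-R approach are purely illustrative; below we'll do the same with recipes):
+
+```{r ordinal_encoding_sketch}
+# A hypothetical one-column tibble, as in the table above
+toy <- tibble(city = c("Denver", "Nairobi", "Tokyo"))
+
+# Substitute a unique zero-based integer for each category
+toy %>%
+  mutate(city = as.integer(factor(city)) - 1L)
+```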
+
+In this section, we'll explore another amazing Tidymodels package: [recipes](https://tidymodels.github.io/recipes/) - which is designed to help you preprocess your data **before** training your model. At its core, a recipe is an object that defines what steps should be applied to a data set in order to get it ready for modelling.
+
+Now, let's create a recipe that prepares our data for modelling by substituting a unique integer for all the observations in the predictor columns:
+
+```{r pumpkins_recipe}
+# Specify a recipe
+pumpkins_recipe <- recipe(price ~ ., data = new_pumpkins) %>%
+ step_integer(all_predictors(), zero_based = TRUE)
+
+
+# Print out the recipe
+pumpkins_recipe
+
+```
+
+Awesome! 👏 We just created our first recipe that specifies an outcome (price) and its corresponding predictors, and that all the predictor columns should be encoded into a set of integers 🙌! Let's quickly break it down:
+
+- The call to `recipe()` with a formula tells the recipe the *roles* of the variables, using the `new_pumpkins` data as the reference. For instance, the `price` column has been assigned an `outcome` role while the rest of the columns have been assigned a `predictor` role.
+
+- `step_integer(all_predictors(), zero_based = TRUE)` specifies that all the predictors should be converted into a set of integers with the numbering starting at 0.
+
+At this point you may be thinking: "This is so cool!! But what if I needed to confirm that the recipes are doing exactly what I expect them to do? 🤔"
+
+That's an awesome thought! You see, once your recipe is defined, you can estimate the parameters required to actually preprocess the data, and then extract the processed data. You don't typically need to do this when you use Tidymodels (we'll see the normal convention, `workflows`, in just a minute), but it can come in handy when you want to do a sanity check to confirm that recipes are doing what you expect.
+
+For that, you'll need two more verbs: `prep()` and `bake()`. And, as always, our little R friends by [`Allison Horst`](https://github.com/allisonhorst/stats-illustrations) help you understand this better!
+
+*(illustration of `prep()` and `bake()` by [Allison Horst](https://github.com/allisonhorst/stats-illustrations))*
+
+[`prep()`](https://recipes.tidymodels.org/reference/prep.html): estimates the required parameters from a training set that can later be applied to other data sets. For instance, for a given predictor column, which observation will be assigned integer 0, 1, 2, etc.
+
+[`bake()`](https://recipes.tidymodels.org/reference/bake.html): takes a prepped recipe and applies the operations to any data set.
+
+That said, let's prep and bake our recipe to confirm that, under the hood, the predictor columns will be encoded before a model is fit.
+
+```{r prep_bake}
+# Prep the recipe
+pumpkins_prep <- prep(pumpkins_recipe)
+
+# Bake the recipe to extract a preprocessed new_pumpkins data
+baked_pumpkins <- bake(pumpkins_prep, new_data = NULL)
+
+# Print out the baked data set
+baked_pumpkins %>%
+ slice_head(n = 10)
+```
+
+Woo-hoo!🥳 The processed data `baked_pumpkins` has all its predictors encoded, confirming that the preprocessing steps defined in our recipe will work as expected. The encoding makes the data harder for you to read but much more intelligible for Tidymodels! Take some time to find out which observation has been mapped to each integer.
+
+It is also worth mentioning that `baked_pumpkins` is a data frame that we can perform computations on.
+
+For instance, let's try to find a good correlation between two columns of your data, with the goal of potentially building a good predictive model. We'll use the function `cor()` to do this. Type `?cor()` to find out more about the function.
+
+```{r corr}
+# Find the correlation between the city_name and the price
+cor(baked_pumpkins$city_name, baked_pumpkins$price)
+
+# Find the correlation between the package and the price
+cor(baked_pumpkins$package, baked_pumpkins$price)
+
+```
+
+As it turns out, there's only a weak correlation between the City and the Price. However, there's a somewhat better correlation between the Package and its Price. That makes sense, right? Normally, the bigger the produce box, the higher the price.
+
+While we are at it, let's also try and visualize a correlation matrix of all the columns using the `corrplot` package.
+
+```{r corrplot}
+# Load the corrplot package
+library(corrplot)
+
+# Obtain correlation matrix
+corr_mat <- cor(baked_pumpkins %>%
+ # Drop columns that are not really informative
+ select(-c(low_price, high_price)))
+
+# Make a correlation plot between the variables
+corrplot(corr_mat, method = "shade", shade.col = NA, tl.col = "black", tl.srt = 45, addCoef.col = "black", cl.pos = "n", order = "original")
+
+```
+
+🤩🤩 Much better.
+
+A good question to now ask of this data would be: '`What price can I expect for a given pumpkin package?`' Let's get right into it!
+
+> Note: When you **`bake()`** the prepped recipe **`pumpkins_prep`** with **`new_data = NULL`**, you extract the processed (i.e. encoded) training data. If you had another data set for example a test set and would want to see how a recipe would pre-process it, you would simply bake **`pumpkins_prep`** with **`new_data = test_set`**
+
+## 4. Build a linear regression model
+
+
+Now that we have built a recipe and confirmed that the data will be pre-processed appropriately, let's build a regression model to answer the question: `What price can I expect for a given pumpkin package?`
+
+#### Train a linear regression model using the training set
+
+As you may have already figured out, the column *price* is the `outcome` variable while the *package* column is the `predictor` variable.
+
+To do this, we'll first split the data such that 80% goes into a training set and 20% into a test set, then define a recipe that will encode the predictor column into a set of integers, then build a model specification. We won't prep and bake our recipe since we already know it will preprocess the data as expected.
+
+```{r lm_rec_spec}
+set.seed(2056)
+# Split the data into training and test sets
+pumpkins_split <- new_pumpkins %>%
+ initial_split(prop = 0.8)
+
+
+# Extract training and test data
+pumpkins_train <- training(pumpkins_split)
+pumpkins_test <- testing(pumpkins_split)
+
+
+
+# Create a recipe for preprocessing the data
+lm_pumpkins_recipe <- recipe(price ~ package, data = pumpkins_train) %>%
+ step_integer(all_predictors(), zero_based = TRUE)
+
+
+
+# Create a linear model specification
+lm_spec <- linear_reg() %>%
+ set_engine("lm") %>%
+ set_mode("regression")
+
+
+```
+
+Good job! Now that we have a recipe and a model specification, we need to find a way of bundling them together into an object that will first preprocess the data (prep+bake behind the scenes), fit the model on the preprocessed data and also allow for potential post-processing activities. How's that for your peace of mind!🤩
+
+In Tidymodels, this convenient object is called a [`workflow`](https://workflows.tidymodels.org/) and conveniently holds your modeling components! This is what we'd call *pipelines* in *Python*.
+
+So let's bundle everything up into a workflow!📦
+
+```{r lm_workflow}
+# Hold modelling components in a workflow
+lm_wf <- workflow() %>%
+ add_recipe(lm_pumpkins_recipe) %>%
+ add_model(lm_spec)
+
+# Print out the workflow
+lm_wf
+
+```
+
+👌 Into the bargain, a workflow can be fit/trained in much the same way a model can.
+
+```{r lm_wf_fit}
+# Train the model
+lm_wf_fit <- lm_wf %>%
+ fit(data = pumpkins_train)
+
+# Print the model coefficients learned
+lm_wf_fit
+
+```
+
+From the model output, we can see the coefficients learned during training. They represent the coefficients of the line of best fit that gives us the lowest overall error between the actual and predicted values.
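+
+If you'd like those coefficients as a tidy table rather than printed output, one option is to pull out the underlying parsnip fit and tidy it. This is just a sketch; on older versions of the workflows package you may need `pull_workflow_fit()` instead of `extract_fit_parsnip()`:
+
+```{r tidy_coefficients}
+# Extract the learned coefficients as a tibble (one row per term)
+lm_wf_fit %>%
+  extract_fit_parsnip() %>%
+  tidy()
+```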
+
+#### Evaluate model performance using the test set
+
+It's time to see how the model performed 📏! How do we do this?
+
+Now that we've trained the model, we can use it to make predictions for the test set using `parsnip::predict()`. Then we can compare these predictions to the actual label values to evaluate how well (or not!) the model is working.
+
+Let's start by making predictions for the test set, then bind the columns to the test set.
+
+```{r lm_pred}
+# Make predictions for the test set
+predictions <- lm_wf_fit %>%
+ predict(new_data = pumpkins_test)
+
+
+# Bind predictions to the test set
+lm_results <- pumpkins_test %>%
+ select(c(package, price)) %>%
+ bind_cols(predictions)
+
+
+# Print the first ten rows of the tibble
+lm_results %>%
+ slice_head(n = 10)
+```
+
+Yes, you have just trained a model and used it to make predictions! 🔮 Is it any good? Let's evaluate the model's performance!
+
+In Tidymodels, we do this using `yardstick::metrics()`! For linear regression, let's focus on the following metrics:
+
+- `Root Mean Square Error (RMSE)`: The square root of the [MSE](https://en.wikipedia.org/wiki/Mean_squared_error). This yields an absolute metric in the same unit as the label (in this case, the price of a pumpkin). The smaller the value, the better the model (in a simplistic sense, it represents the average price by which the predictions are wrong!)
+
+- `Coefficient of Determination (usually known as R-squared or R2)`: A relative metric in which the higher the value, the better the fit of the model. In essence, this metric represents how much of the variance in the actual label values the model is able to explain.
+
+```{r lm_yardstick}
+# Evaluate performance of linear regression
+metrics(data = lm_results,
+ truth = price,
+ estimate = .pred)
+
+
+```
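+
+If you want to connect these numbers back to the formulas above, here's a hand-rolled sketch of the same two metrics (for illustration only; note that yardstick's `rsq` is the squared correlation between the truth and the estimate):
+
+```{r manual_metrics}
+# Compute RMSE and R-squared by hand from the predictions
+lm_results %>%
+  summarise(
+    rmse = sqrt(mean((price - .pred)^2)), # root mean square error
+    rsq  = cor(price, .pred)^2            # squared correlation
+  )
+```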
+
+There goes the model performance. Let's see if we can get a better indication by visualizing a scatter plot of package and price, then using the predictions to overlay a line of best fit.
+
+This means we'll have to prep and bake the test set in order to encode the package column, then bind this to the predictions made by our model.
+
+```{r lm_plot}
+# Encode package column
+package_encode <- lm_pumpkins_recipe %>%
+ prep() %>%
+ bake(new_data = pumpkins_test) %>%
+ select(package)
+
+
+# Bind encoded package column to the results
+lm_results <- lm_results %>%
+ bind_cols(package_encode %>%
+ rename(package_integer = package)) %>%
+ relocate(package_integer, .after = package)
+
+
+# Print new results data frame
+lm_results %>%
+ slice_head(n = 5)
+
+
+# Make a scatter plot
+lm_results %>%
+ ggplot(mapping = aes(x = package_integer, y = price)) +
+ geom_point(size = 1.6) +
+ # Overlay a line of best fit
+ geom_line(aes(y = .pred), color = "orange", size = 1.2) +
+ xlab("package")
+
+
+
+```
+
+As you can see, the linear regression model does not generalize the relationship between a package and its corresponding price very well.
+
+🎃 Congratulations, you just created a model that can help predict the price of a few varieties of pumpkins. Your holiday pumpkin patch will be beautiful. But you can probably create a better model!
+
+## 5. Build a polynomial regression model
+
+
+Sometimes our data may not have a linear relationship, but we still want to predict an outcome. Polynomial regression can help us make predictions for more complex non-linear relationships.
+
+Take for instance the relationship between the package and price for our pumpkins data set. While sometimes there's a linear relationship between variables - the bigger the pumpkin in volume, the higher the price - sometimes these relationships can't be plotted as a plane or straight line.
+
+> ✅ Here are [some more examples](https://online.stat.psu.edu/stat501/lesson/9/9.8) of data that could use polynomial regression
+>
+> Take another look at the relationship between Variety and Price in the previous plot. Does this scatterplot seem like it should necessarily be analyzed by a straight line? Perhaps not. In this case, you can try polynomial regression.
+>
+> ✅ Polynomials are mathematical expressions that might consist of one or more variables and coefficients
+
+#### Train a polynomial regression model using the training set
+
+Polynomial regression creates a *curved line* to better fit nonlinear data.
+
+Let's see whether a polynomial model will perform better in making predictions. We'll follow a procedure somewhat similar to the one we followed before:
+
+- Create a recipe that specifies the preprocessing steps that should be carried out on our data to get it ready for modelling, i.e. encoding predictors and computing polynomials of degree *n*
+
+- Build a model specification
+
+- Bundle the recipe and model specification into a workflow
+
+- Create a model by fitting the workflow
+
+- Evaluate how well the model performs on the test data
+
+Let's get right into it!
+
+```{r polynomial_reg}
+# Specify a recipe
+poly_pumpkins_recipe <-
+ recipe(price ~ package, data = pumpkins_train) %>%
+ step_integer(all_predictors(), zero_based = TRUE) %>%
+ step_poly(all_predictors(), degree = 4)
+
+
+# Create a model specification
+poly_spec <- linear_reg() %>%
+ set_engine("lm") %>%
+ set_mode("regression")
+
+
+# Bundle recipe and model spec into a workflow
+poly_wf <- workflow() %>%
+ add_recipe(poly_pumpkins_recipe) %>%
+ add_model(poly_spec)
+
+
+# Create a model
+poly_wf_fit <- poly_wf %>%
+ fit(data = pumpkins_train)
+
+
+# Print learned model coefficients
+poly_wf_fit
+
+
+
+```
+
+#### Evaluate model performance
+
+👏👏 You've built a polynomial model; now let's make predictions on the test set!
+
+```{r poly_predict}
+# Make price predictions on test data
+poly_results <- poly_wf_fit %>% predict(new_data = pumpkins_test) %>%
+ bind_cols(pumpkins_test %>% select(c(package, price))) %>%
+ relocate(.pred, .after = last_col())
+
+
+# Print the results
+poly_results %>%
+ slice_head(n = 10)
+```
+
+Woo-hoo, let's evaluate how the model performed on the test set using `yardstick::metrics()`.
+
+```{r poly_eval}
+metrics(data = poly_results, truth = price, estimate = .pred)
+```
+
+🤩🤩 Much better performance.
+
+The `rmse` decreased from about 7 to about 3, an indication of reduced error between the actual price and the predicted price. You can *loosely* interpret this as meaning that, on average, incorrect predictions are wrong by around \$3. The `rsq` increased from about 0.4 to 0.8.
+
+All these metrics indicate that the polynomial model performs way better than the linear model. Good job!
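+
+If you'd like to see the two models side by side, here's a small sketch that stacks the two metric tables (assuming the `lm_results` and `poly_results` tibbles from the chunks above):
+
+```{r compare_metrics}
+# Stack the metrics of both models into one table for comparison
+bind_rows(
+  metrics(data = lm_results, truth = price, estimate = .pred) %>%
+    mutate(model = "linear"),
+  metrics(data = poly_results, truth = price, estimate = .pred) %>%
+    mutate(model = "polynomial")
+)
+```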
+
+Let's see if we can visualize this!
+
+```{r poly_viz}
+# Bind encoded package column to the results
+poly_results <- poly_results %>%
+ bind_cols(package_encode %>%
+ rename(package_integer = package)) %>%
+ relocate(package_integer, .after = package)
+
+
+# Print new results data frame
+poly_results %>%
+ slice_head(n = 5)
+
+
+# Make a scatter plot
+poly_results %>%
+ ggplot(mapping = aes(x = package_integer, y = price)) +
+ geom_point(size = 1.6) +
+ # Overlay a line of best fit
+ geom_line(aes(y = .pred), color = "midnightblue", size = 1.2) +
+ xlab("package")
+
+
+
+```
+
+You can see a curved line that fits your data better! 🤩
+
+You can make this smoother by passing a polynomial formula to `geom_smooth`, like this:
+
+```{r smooth curve}
+# Make a scatter plot
+poly_results %>%
+ ggplot(mapping = aes(x = package_integer, y = price)) +
+ geom_point(size = 1.6) +
+ # Overlay a line of best fit
+ geom_smooth(method = lm, formula = y ~ poly(x, degree = 4), color = "midnightblue", size = 1.2, se = FALSE) +
+ xlab("package")
+
+
+
+
+```
+
+That's much more like a smooth curve! 🤩
+
+Here's how you would make a new prediction:
+
+```{r predict}
+# Make a hypothetical data frame
+hypo_tibble <- tibble(package = "bushel baskets")
+
+# Make predictions using linear model
+lm_pred <- lm_wf_fit %>% predict(new_data = hypo_tibble)
+
+# Make predictions using polynomial model
+poly_pred <- poly_wf_fit %>% predict(new_data = hypo_tibble)
+
+# Return predictions in a list
+list("linear model prediction" = lm_pred,
+ "polynomial model prediction" = poly_pred)
+
+
+```
+
+The `polynomial model` prediction does make sense, given the scatter plots of `price` and `package`! And, if this is a better model than the previous one, looking at the same data, you need to budget for these more expensive pumpkins!
+
+🏆 Well done! You created two regression models in one lesson. In the final section on regression, you will learn about logistic regression to determine categories.
+
+## **🚀Challenge**
+
+Test several different variables in this notebook to see how correlation corresponds to model accuracy.
+
+## [**Post-lecture quiz**](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/14/)
+
+## **Review & Self Study**
+
+In this lesson we learned about Linear Regression. There are other important types of Regression. Read about Stepwise, Ridge, Lasso and Elasticnet techniques. A good course to study to learn more is the [Stanford Statistical Learning course](https://online.stanford.edu/courses/sohs-ystatslearning-statistical-learning)
+
+If you want to learn more about how to use the amazing Tidymodels framework, please check out the following resources:
+
+- Tidymodels website: [Get started with Tidymodels](https://www.tidymodels.org/start/)
+
+- Max Kuhn and Julia Silge, [*Tidy Modeling with R*](https://www.tmwr.org/)*.*
+
+###### **THANK YOU TO:**
+
+[Allison Horst](https://twitter.com/allison_horst?lang=en) for creating the amazing illustrations that make R more welcoming and engaging. Find more illustrations at her [gallery](https://www.google.com/url?q=https://github.com/allisonhorst/stats-illustrations&sa=D&source=editors&ust=1626380772530000&usg=AOvVaw3zcfyCizFQZpkSLzxiiQEM).
diff --git a/2-Regression/3-Linear/translations/README.it.md b/2-Regression/3-Linear/translations/README.it.md
new file mode 100644
index 00000000..1aafa601
--- /dev/null
+++ b/2-Regression/3-Linear/translations/README.it.md
@@ -0,0 +1,339 @@
+# Costruire un modello di regressione usando Scikit-learn: regressione in due modi
+
+
+> Infografica di [Dasani Madipalli](https://twitter.com/dasani_decoded)
+
+## [Quiz Pre-Lezione](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/13/)
+
+### Introduzione
+
+Finora si è esplorato cos'è la regressione con dati di esempio raccolti dall'insieme di dati relativo ai prezzi della zucca, che verrà usato in questa lezione. Lo si è anche visualizzato usando Matplotlib.
+
+Ora si è pronti per approfondire la regressione per machine learning. In questa lezione si imparerà di più su due tipi di regressione: _regressione lineare di base_ e _regressione polinomiale_, insieme ad alcuni dei calcoli alla base di queste tecniche.
+
+> In questo programma di studi, si assume una conoscenza minima della matematica, e si cerca di renderla accessibile agli studenti provenienti da altri campi, quindi si faccia attenzione a note, 🧮 didascalie, diagrammi e altri strumenti di apprendimento che aiutano la comprensione.
+
+### Prerequisito
+
+Si dovrebbe ormai avere familiarità con la struttura dei dati della zucca che si sta esaminando. La si può trovare precaricata e prepulita nel file _notebook.ipynb_ di questa lezione. Nel file, il prezzo della zucca viene visualizzato per bushel (staio) in un nuovo dataframe. Assicurarsi di poter eseguire questi notebook nei kernel in Visual Studio Code.
+
+### Preparazione
+
+Come promemoria, si stanno caricando questi dati in modo da porre domande su di essi.
+
+- Qual è il momento migliore per comprare le zucche?
+- Che prezzo ci si può aspettare da una cassa di zucche in miniatura?
+- Si devono acquistare in cestini da mezzo bushel o a scatola da 1 1/9 bushel? Si continua a scavare in questi dati.
+
+Nella lezione precedente, è stato creato un dataframe Pandas e lo si è popolato con parte dell'insieme di dati originale, standardizzando il prezzo per bushel. In questo modo, tuttavia, si sono potuti raccogliere solo circa 400 punti dati e solo per i mesi autunnali.
+
+Si dia un'occhiata ai dati precaricati nel notebook di accompagnamento di questa lezione. I dati sono precaricati e viene tracciato un grafico a dispersione iniziale per mostrare i dati mensili. Forse si può ottenere qualche dettaglio in più sulla natura dei dati pulendoli ulteriormente.
+
+## Una linea di regressione lineare
+
+Come si è appreso nella lezione 1, l'obiettivo di un esercizio di regressione lineare è essere in grado di tracciare una linea per:
+
+- **Mostrare le relazioni tra variabili**.
+- **Fare previsioni**. Fare previsioni accurate su dove cadrebbe un nuovo punto dati in relazione a quella linea.
+
+È tipico della **Regressione dei Minimi Quadrati** disegnare questo tipo di linea. Il termine "minimi quadrati" significa che tutti i punti dati che circondano la linea di regressione sono elevati al quadrato e quindi sommati. Idealmente, quella somma finale è la più piccola possibile, perché si vuole un basso numero di errori, o `minimi quadrati`.
+
+Lo si fa perché si vuole modellare una linea che abbia la distanza cumulativa minima da tutti i punti dati. Si esegue anche il quadrato dei termini prima di aggiungerli poiché interessa la grandezza piuttosto che la direzione.
+
+> **🧮 Mostrami la matematica**
+>
+> Questa linea, chiamata _linea di miglior adattamento_ , può essere espressa da [un'equazione](https://en.wikipedia.org/wiki/Simple_linear_regression):
+>
+> ```
+> Y = a + bX
+> ```
+>
+> `X` è la "variabile esplicativa". `Y` è la "variabile dipendente". La pendenza della linea è `b` e `a` è l'intercetta di y, che si riferisce al valore di `Y` quando `X = 0`.
+>
+> 
+>
+> Prima, calcolare la pendenza `b`. Infografica di [Jen Looper](https://twitter.com/jenlooper)
+>
+> In altre parole, facendo riferimento alla domanda originale per i dati sulle zucche: "prevedere il prezzo di una zucca per bushel per mese", `X` si riferisce al prezzo e `Y` si riferisce al mese di vendita.
+>
+> 
+>
+> Si calcola il valore di Y. Se si sta pagando circa $4, deve essere aprile! Infografica di [Jen Looper](https://twitter.com/jenlooper)
+>
+> La matematica che calcola la linea deve dimostrare la pendenza della linea, che dipende anche dall'intercetta, o dove `Y` si trova quando `X = 0`.
+>
+> Si può osservare il metodo di calcolo per questi valori sul sito web [Math is Fun](https://www.mathsisfun.com/data/least-squares-regression.html) . Si visiti anche [questo calcolatore dei minimi quadrati](https://www.mathsisfun.com/data/least-squares-calculator.html) per vedere come i valori dei numeri influiscono sulla linea.
+
+## Correlazione
+
+Un altro termine da comprendere è il **Coefficiente di Correlazione** tra determinate variabili X e Y. Utilizzando un grafico a dispersione, è possibile visualizzare rapidamente questo coefficiente. Un grafico con punti dati sparsi in una linea ordinata ha un'alta correlazione, ma un grafico con punti dati sparsi ovunque tra X e Y ha una bassa correlazione.
+
+Un buon modello di regressione lineare sarà quello che ha un Coefficiente di Correlazione alto (più vicino a 1 rispetto a 0) utilizzando il Metodo di Regressione dei Minimi Quadrati con una linea di regressione.
+
+✅ Eseguire il notebook che accompagna questa lezione e guardare il grafico a dispersione City to Price. I dati che associano la città al prezzo per le vendite di zucca sembrano avere una correlazione alta o bassa, secondo la propria interpretazione visiva del grafico a dispersione?
+
+
+## Preparare i dati per la regressione
+
+Ora che si ha una comprensione della matematica alla base di questo esercizio, si crea un modello di regressione per vedere se si può prevedere quale pacchetto di zucche avrà i migliori prezzi per zucca. Qualcuno che acquista zucche per una festa con tema un campo di zucche potrebbe desiderare che queste informazioni siano in grado di ottimizzare i propri acquisti di pacchetti di zucca per il campo.
+
+Dal momento che si utilizzerà Scikit-learn, non c'è motivo di farlo a mano (anche se si potrebbe!). Nel blocco di elaborazione dati principale del notebook della lezione, aggiungere una libreria da Scikit-learn per convertire automaticamente tutti i dati di tipo stringa in numeri:
+
+```python
+from sklearn.preprocessing import LabelEncoder
+
+new_pumpkins.iloc[:, 0:-1] = new_pumpkins.iloc[:, 0:-1].apply(LabelEncoder().fit_transform)
+```
+
+Se si guarda ora il dataframe new_pumpkins, si vede che tutte le stringhe ora sono numeriche. Questo rende più difficile la lettura per un umano ma molto più comprensibile per Scikit-learn!
+Ora si possono prendere decisioni più consapevoli (non solo basate sull'osservazione di un grafico a dispersione) sui dati più adatti alla regressione.
+
+Si provi a trovare una buona correlazione tra due punti nei propri dati per costruire potenzialmente un buon modello predittivo. A quanto pare, c'è solo una debole correlazione tra la città e il prezzo:
+
+```python
+print(new_pumpkins['City'].corr(new_pumpkins['Price']))
+0.32363971816089226
+```
+
+Tuttavia, c'è una correlazione leggermente migliore tra il pacchetto e il suo prezzo. Ha senso, vero? Normalmente, più grande è la scatola dei prodotti, maggiore è il prezzo.
+
+```python
+print(new_pumpkins['Package'].corr(new_pumpkins['Price']))
+0.6061712937226021
+```
+
+Una buona domanda da porre a questi dati sarà: "Che prezzo posso aspettarmi da un determinato pacchetto di zucca?"
+
+Si costruisce questo modello di regressione
+
+## Costruire un modello lineare
+
+Prima di costruire il modello, si esegue un altro riordino dei dati. Si eliminano tutti i dati nulli e si controlla ancora una volta che aspetto hanno i dati.
+
+```python
+new_pumpkins.dropna(inplace=True)
+new_pumpkins.info()
+```
+
+Quindi, si crea un nuovo dataframe da questo set minimo e lo si stampa:
+
+```python
+new_columns = ['Package', 'Price']
+lin_pumpkins = new_pumpkins.drop([c for c in new_pumpkins.columns if c not in new_columns], axis='columns')
+
+lin_pumpkins
+```
+
+```output
+ Package Price
+70 0 13.636364
+71 0 16.363636
+72 0 16.363636
+73 0 15.454545
+74 0 13.636364
+... ... ...
+1738 2 30.000000
+1739 2 28.750000
+1740 2 25.750000
+1741 2 24.000000
+1742 2 24.000000
+415 rows × 2 columns
+```
+
+1. Ora si possono assegnare i dati delle coordinate X e y:
+
+ ```python
+ X = lin_pumpkins.values[:, :1]
+ y = lin_pumpkins.values[:, 1:2]
+ ```
+
+Cosa sta succedendo qui? Si sta usando [la notazione slice Python](https://stackoverflow.com/questions/509211/understanding-slice-notation/509295#509295) per creare array per popolare `X` e `y`.
+
+2. Successivamente, si avvia le routine di creazione del modello di regressione:
+
+ ```python
+ from sklearn.linear_model import LinearRegression
+ from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error
+ from sklearn.model_selection import train_test_split
+
+ X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
+ lin_reg = LinearRegression()
+ lin_reg.fit(X_train,y_train)
+
+ pred = lin_reg.predict(X_test)
+
+ accuracy_score = lin_reg.score(X_train,y_train)
+ print('Model Accuracy: ', accuracy_score)
+ ```
+
+ Poiché la correlazione non è particolarmente buona, il modello prodotto non è molto accurato.
+
+ ```output
+ Model Accuracy: 0.3315342327998987
+ ```
+
+3. Si può visualizzare la linea tracciata nel processo:
+
+ ```python
+ plt.scatter(X_test, y_test, color='black')
+ plt.plot(X_test, pred, color='blue', linewidth=3)
+
+ plt.xlabel('Package')
+ plt.ylabel('Price')
+
+ plt.show()
+ ```
+
+ 
+
+4. Si testa il modello contro una varietà ipotetica:
+
+ ```python
+ lin_reg.predict( np.array([ [2.75] ]) )
+ ```
+
+ Il prezzo restituito per questa varietà mitologica è:
+
+ ```output
+ array([[33.15655975]])
+ ```
+
+Quel numero ha senso, se la logica della linea di regressione è vera.
+
+🎃 Congratulazioni, si è appena creato un modello che può aiutare a prevedere il prezzo di alcune varietà di zucche. La zucca per le festività sarà bellissima. Ma probabilmente si può creare un modello migliore!
+
+## Regressione polinomiale
+
+Un altro tipo di regressione lineare è la regressione polinomiale. Mentre a volte c'è una relazione lineare tra le variabili - più grande è il volume della zucca, più alto è il prezzo - a volte queste relazioni non possono essere tracciate come un piano o una linea retta.
+
+✅ Ecco [alcuni altri esempi](https://online.stat.psu.edu/stat501/lesson/9/9.8) di dati che potrebbero utilizzare la regressione polinomiale
+
+Si dia un'altra occhiata alla relazione tra Varietà e Prezzo nel tracciato precedente. Questo grafico a dispersione deve essere necessariamente analizzato da una linea retta? Forse no. In questo caso, si può provare la regressione polinomiale.
+
+✅ I polinomi sono espressioni matematiche che possono essere costituite da una o più variabili e coefficienti
+
+La regressione polinomiale crea una linea curva per adattare meglio i dati non lineari.
+
+1. Viene ricreato un dataframe popolato con un segmento dei dati della zucca originale:
+
+ ```python
+ new_columns = ['Variety', 'Package', 'City', 'Month', 'Price']
+ poly_pumpkins = new_pumpkins.drop([c for c in new_pumpkins.columns if c not in new_columns], axis='columns')
+
+ poly_pumpkins
+ ```
+
+Un buon modo per visualizzare le correlazioni tra i dati nei dataframe è visualizzarli in un grafico "coolwarm":
+
+2. Si usa il metodo `Background_gradient()` con `coolwarm` come valore dell'argomento:
+
+ ```python
+ corr = poly_pumpkins.corr()
+ corr.style.background_gradient(cmap='coolwarm')
+ ```
+
+ Questo codice crea una mappa di calore:
+ 
+
+Guardando questo grafico, si può visualizzare la buona correlazione tra Pacchetto e Prezzo. Quindi si dovrebbe essere in grado di creare un modello un po' migliore dell'ultimo.
+
+### Creare una pipeline
+
+Scikit-learn include un'API utile per la creazione di modelli di regressione polinomiale: l'[API](https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.make_pipeline.html?highlight=pipeline#sklearn.pipeline.make_pipeline) `make_pipeline`. Viene creata una 'pipeline' che è una catena di stimatori. In questo caso, la pipeline include caratteristiche polinomiali o previsioni che formano un percorso non lineare.
+
+1. Si costruiscono le colonne X e y:
+
+ ```python
+ X=poly_pumpkins.iloc[:,3:4].values
+ y=poly_pumpkins.iloc[:,4:5].values
+ ```
+
+2. Si crea la pipeline chiamando il metodo `make_pipeline()` :
+
+ ```python
+ from sklearn.preprocessing import PolynomialFeatures
+ from sklearn.pipeline import make_pipeline
+
+ pipeline = make_pipeline(PolynomialFeatures(4), LinearRegression())
+
+ X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
+
+ pipeline.fit(np.array(X_train), y_train)
+
+ y_pred=pipeline.predict(X_test)
+ ```
+
+### Creare una sequenza
+
+A questo punto, è necessario creare un nuovo dataframe con dati _ordinati_ in modo che la pipeline possa creare una sequenza.
+
+Si aggiunge il seguente codice:
+
+```python
+df = pd.DataFrame({'x': X_test[:,0], 'y': y_pred[:,0]})
+df.sort_values(by='x',inplace = True)
+points = pd.DataFrame(df).to_numpy()
+
+plt.plot(points[:, 0], points[:, 1],color="blue", linewidth=3)
+plt.xlabel('Package')
+plt.ylabel('Price')
+plt.scatter(X,y, color="black")
+plt.show()
+```
+
+Si è creato un nuovo dataframe chiamando `pd.DataFrame`. Quindi si sono ordinati i valori chiamando `sort_values()`. Alla fine si è creato un grafico polinomiale:
+
+
+
+Si può vedere una linea curva che si adatta meglio ai dati.
+
+Si verifica la precisione del modello:
+
+```python
+accuracy_score = pipeline.score(X_train,y_train)
+print('Model Accuracy: ', accuracy_score)
+```
+
+E voilà!
+
+```output
+Model Accuracy: 0.8537946517073784
+```
+
+Ecco, meglio! Si prova a prevedere un prezzo:
+
+### Fare una previsione
+
+È possibile inserire un nuovo valore e ottenere una previsione?
+
+Si chiami `predict()` per fare una previsione:
+
+```python
+pipeline.predict( np.array([ [2.75] ]) )
+```
+
+Viene data questa previsione:
+
+```output
+array([[46.34509342]])
+```
+
+Ha senso, visto il tracciato! Se questo è un modello migliore del precedente, guardando gli stessi dati, si deve preventivare queste zucche più costose!
+
+Ben fatto! Sono stati creati due modelli di regressione in una lezione. Nella sezione finale sulla regressione, si imparerà a conoscere la regressione logistica per determinare le categorie.
+
+---
+
+## 🚀 Sfida
+
+Testare diverse variabili in questo notebook per vedere come la correlazione corrisponde all'accuratezza del modello.
+
+## [Quiz post-lezione](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/14/)
+
+## Revisione e Auto Apprendimento
+
+In questa lezione si è appreso della regressione lineare. Esistono altri tipi importanti di regressione. Leggere le tecniche Stepwise, Ridge, Lasso ed Elasticnet. Un buon corso per studiare per saperne di più è il [corso Stanford Statistical Learning](https://online.stanford.edu/courses/sohs-ystatslearning-statistical-learning)
+
+## Compito
+
+[Costruire un modello](assignment.it.md)
diff --git a/2-Regression/3-Linear/translations/README.ja.md b/2-Regression/3-Linear/translations/README.ja.md
new file mode 100644
index 00000000..cc3c3b7b
--- /dev/null
+++ b/2-Regression/3-Linear/translations/README.ja.md
@@ -0,0 +1,334 @@
+# Scikit-learnを用いた回帰モデルの構築: 回帰を行う2つの方法
+
+
+> [Dasani Madipalli](https://twitter.com/dasani_decoded) によるインフォグラフィック
+## [講義前のクイズ](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/13/)
+### イントロダクション
+
+これまで、このレッスンで使用するカボチャの価格データセットから集めたサンプルデータを使って、回帰とは何かを探ってきました。また、Matplotlibを使って可視化を行いました。
+
+これで、MLにおける回帰をより深く理解する準備が整いました。このレッスンでは、2種類の回帰について詳しく説明します。基本的な線形回帰 (_basic linear regression_)と多項式回帰 (_polynomial regression_)の2種類の回帰について、その基礎となる数学を学びます。
+
+> このカリキュラムでは、最低限の数学の知識を前提とし、他の分野の学生にも理解できるようにしていますので、理解を助けるためのメモ、🧮吹き出し、図などの学習ツールをご覧ください。
+
+### 事前確認
+
+ここでは、パンプキンデータの構造について説明しています。このレッスンの_notebook.ipynb_ファイルには、事前に読み込まれ、整形されたデータが入っています。このファイルでは、カボチャの価格がブッシェル単位で新しいデータフレームに表示されています。 これらのノートブックを、Visual Studio Codeのカーネルで実行できることを確認してください。
+
+### 準備
+
+忘れてはならないのは、データを読み込んだら問いかけを行うことです。
+
+- カボチャを買うのに最適な時期はいつですか?
+- ミニカボチャ1ケースの価格はどのくらいでしょうか?
+- 半ブッシェルのバスケットで買うべきか、1 1/9ブッシェルの箱で買うべきか。
+
+データを掘り下げていきましょう。
+
+前回のレッスンでは、Pandasのデータフレームを作成し、元のデータセットの一部を入力して、ブッシェル単位の価格を標準化しました。しかし、この方法では、約400のデータポイントしか集めることができず、しかもそれは秋の期間のものでした。
+
+このレッスンに付属するノートブックで、あらかじめ読み込んでおいたデータを見てみましょう。データが事前に読み込まれ、月毎のデータが散布図として表示されています。データをもっと綺麗にすることで、データの性質をもう少し知ることができるかもしれません。
+
+## 線形回帰
+
+レッスン1で学んだように、線形回帰の演習では、以下のような線を描けるようになることが目標です。
+
+- **変数間の関係を示す。**
+- **予測を行う。** 新しいデータポイントが、その線のどこに位置するかを正確に予測することができる。
+
+このような線を描くことは、**最小二乗回帰 (Least-Squares Regression)** の典型的な例です。「最小二乗」という言葉は、回帰線を囲むすべてのデータポイントとの距離が二乗され、その後加算されることを意味しています。理想的には、最終的な合計ができるだけ小さくなるようにします。これはエラーの数、つまり「最小二乗」の値を小さくするためです。
+
+これは、すべてのデータポイントからの累積距離が最小となる直線をモデル化したいためです。また、方向ではなく大きさに注目しているので、足す前に項を二乗します。
+
+> **🧮 Show me the math**
+>
+> この線は、_line of best fit_ と呼ばれ、[方程式](https://en.wikipedia.org/wiki/Simple_linear_regression) で表すことができます。
+>
+> ```
+> Y = a + bX
+> ```
+>
+> `X`は「説明変数」です。`Y`は「目的変数」です。`a`は切片で`b`は直線の傾きを表します。`X=0`のとき、`Y`の値は切片`a`となります。
+>
+>
+>
+> はじめに、傾き`b`を計算してみます。[Jen Looper](https://twitter.com/jenlooper) によるインフォグラフィック。
+>
+> カボチャのデータに関する最初の質問である、「月毎のブッシェル単位でのカボチャの価格を予測してください」で言い換えてみると、`X`は価格を、`Y`は販売された月を表しています。
+>
+>
+>
+> Yの値を計算してみましょう。$4前後払っているなら、4月に違いありません![Jen Looper](https://twitter.com/jenlooper) によるインフォグラフィック。
+>
+> 直線を計算する数学は、直線の傾きを示す必要がありますが、これは切片、つまり「X = 0」のときに「Y」がどこに位置するかにも依存します。
+>
+> これらの値の計算方法は、[Math is Fun](https://www.mathsisfun.com/data/least-squares-regression.html) というサイトで見ることができます。また、[this Least-squares calculator](https://www.mathsisfun.com/data/least-squares-calculator.html) では、値が線にどのような影響を与えるかを見ることができます。
+
+## 相関関係
+
+もう一つの理解すべき用語は、与えられたXとYの変数間の**相関係数 (Correlation Coefficient)** です。散布図を使えば、この係数をすぐに可視化することができます。データポイントがきれいな直線上に散らばっているプロットは、高い相関を持っていますが、データポイントがXとYの間のあらゆる場所に散らばっているプロットは、低い相関を持っています。
+
+良い線形回帰モデルとは、最小二乗法によって求めた回帰線が高い相関係数 (0よりも1に近い)を持つものです。
+
+✅ このレッスンのノートを開いて、「都市と価格」の散布図を見てみましょう。散布図の視覚的な解釈によると、カボチャの販売に関する「都市」と「価格」の関連データは、相関性が高いように見えますか、それとも低いように見えますか?
+
+## 回帰に用いるデータの準備
+
+この演習の背景にある数学を理解したので、回帰モデルを作成して、どのパッケージのカボチャの価格が最も高いかを予測できるかどうかを確認してください。休日のパンプキンパッチ用にパンプキンを購入する人は、パッチ用のパンプキンパッケージの購入を最適化するために、この情報を必要とするかもしれません。
+
+ここではScikit-learnを使用するので、手作業で行う必要はありません。レッスンノートのメインのデータ処理ブロックに、Scikit-learnのライブラリを追加して、すべての文字列データを自動的に数字に変換します。
+
+```python
+from sklearn.preprocessing import LabelEncoder
+
+new_pumpkins.iloc[:, 0:-1] = new_pumpkins.iloc[:, 0:-1].apply(LabelEncoder().fit_transform)
+```
+
+new_pumpkinsデータフレームを見ると、すべての文字列が数値になっているのがわかります。これにより、人が読むのは難しくなりましたが、Scikit-learnにとってはとても分かりやすくなりました。
+これで、回帰に最も適したデータについて、(散布図を見ただけではなく)より高度な判断ができるようになりました。
+
+良い予測モデルを構築するために、データの2点間に良い相関関係を見つけようとします。その結果、「都市」と「価格」の間には弱い相関関係しかないことがわかりました。
+
+```python
+print(new_pumpkins['City'].corr(new_pumpkins['Price']))
+0.32363971816089226
+```
+
+しかし、パッケージと価格の間にはもう少し強い相関関係があります。これは理にかなっていると思いますか?通常、箱が大きければ大きいほど、価格は高くなります。
+
+```python
+print(new_pumpkins['Package'].corr(new_pumpkins['Price']))
+0.6061712937226021
+```
+
+このデータに対する良い質問は、次のようになります。「あるカボチャのパッケージの価格はどのくらいになるか?」
+
+この回帰モデルを構築してみましょう!
+
+## 線形モデルの構築
+
+モデルを構築する前に、もう一度データの整理をしてみましょう。NULLデータを削除し、データがどのように見えるかをもう一度確認します。
+
+```python
+new_pumpkins.dropna(inplace=True)
+new_pumpkins.info()
+```
+
+そして、この最小セットから新しいデータフレームを作成し、それを出力します。
+
+```python
+new_columns = ['Package', 'Price']
+lin_pumpkins = new_pumpkins.drop([c for c in new_pumpkins.columns if c not in new_columns], axis='columns')
+
+lin_pumpkins
+```
+
+```output
+ Package Price
+70 0 13.636364
+71 0 16.363636
+72 0 16.363636
+73 0 15.454545
+74 0 13.636364
+... ... ...
+1738 2 30.000000
+1739 2 28.750000
+1740 2 25.750000
+1741 2 24.000000
+1742 2 24.000000
+415 rows × 2 columns
+```
+
+1. これで、XとYの座標データを割り当てることができます。
+
+ ```python
+ X = lin_pumpkins.values[:, :1]
+ y = lin_pumpkins.values[:, 1:2]
+ ```
+✅ ここでは何をしていますか? Pythonの[スライス記法](https://stackoverflow.com/questions/509211/understanding-slice-notation/509295#509295) を使って、`X`と`y`の配列を作成しています。
+
+2. 次に、回帰モデル構築のためのルーチンを開始します。
+
+ ```python
+ from sklearn.linear_model import LinearRegression
+ from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error
+ from sklearn.model_selection import train_test_split
+
+ X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
+ lin_reg = LinearRegression()
+ lin_reg.fit(X_train,y_train)
+
+ pred = lin_reg.predict(X_test)
+
+ accuracy_score = lin_reg.score(X_train,y_train)
+ print('Model Accuracy: ', accuracy_score)
+ ```
+
+ 相関関係があまり良くないので、生成されたモデルもあまり正確ではありません。
+
+ ```output
+ Model Accuracy: 0.3315342327998987
+ ```
+
+3. 今回の過程で描かれた線を可視化します。
+
+ ```python
+ plt.scatter(X_test, y_test, color='black')
+ plt.plot(X_test, pred, color='blue', linewidth=3)
+
+ plt.xlabel('Package')
+ plt.ylabel('Price')
+
+ plt.show()
+ ```
+ 
+
+4. 架空の値に対してモデルをテストする。
+
+ ```python
+ lin_reg.predict( np.array([ [2.75] ]) )
+ ```
+
+ この架空の値に対して、以下の価格が返されます。
+
+ ```output
+ array([[33.15655975]])
+ ```
+
+回帰の線が正しく引かれていれば、その数字は理にかなっています。
+
+🎃 おめでとうございます!数種類のカボチャの価格を予測するモデルを作成しました。休日のパンプキンパッチは美しいものになるでしょう。でも、もっと良いモデルを作れるかもしれません。
+
+## 多項式回帰
+
+線形回帰のもう一つのタイプは、多項式回帰です。時には変数の間に直線的な関係 (カボチャの量が多いほど、価格は高くなる)があることもありますが、これらの関係は、平面や直線としてプロットできないこともあります。
+
+✅ 多項式回帰を使うことができる、[いくつかの例](https://online.stat.psu.edu/stat501/lesson/9/9.8) を示します。
+
+先ほどの散布図の「品種」と「価格」の関係をもう一度見てみましょう。この散布図は、必ずしも直線で分析しなければならないように見えますか?そうではないかもしれません。このような場合は、多項式回帰を試してみましょう。
+
+✅ 多項式とは、1つ以上の変数と係数で構成される数学的表現である。
+
+多項式回帰では、非線形データをよりよく適合させるために曲線を作成します。
+
+1. 元のカボチャのデータの一部を入力したデータフレームを作成してみましょう。
+
+ ```python
+ new_columns = ['Variety', 'Package', 'City', 'Month', 'Price']
+ poly_pumpkins = new_pumpkins.drop([c for c in new_pumpkins.columns if c not in new_columns], axis='columns')
+
+ poly_pumpkins
+ ```
+
+データフレーム内のデータ間の相関関係を視覚化するには、「coolwarm」チャートで表示するのが良いでしょう。
+
+2. `Background_gradient()` メソッドの引数に `coolwarm` を指定して使用します。
+
+ ```python
+ corr = poly_pumpkins.corr()
+ corr.style.background_gradient(cmap='coolwarm')
+ ```
+
+ このコードはヒートマップを作成します。
+ 
+
+このチャートを見ると、「パッケージ」と「価格」の間に正の相関関係があることが視覚化されています。つまり、前回のモデルよりも多少良いモデルを作ることができるはずです。
+
+### パイプラインの作成
+
+Scikit-learnには、多項式回帰モデルを構築するための便利なAPIである`make_pipeline` [API](https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.make_pipeline.html?highlight=pipeline#sklearn.pipeline.make_pipeline) が用意されています。「パイプライン」は推定量の連鎖で作成されます。今回の場合、パイプラインには多項式の特徴量、非線形の経路を形成する予測値が含まれます。
+
+1. X列とy列を作ります。
+
+ ```python
+ X=poly_pumpkins.iloc[:,3:4].values
+ y=poly_pumpkins.iloc[:,4:5].values
+ ```
+
+2. `make_pipeline()` メソッドを呼び出してパイプラインを作成します。
+
+ ```python
+ from sklearn.preprocessing import PolynomialFeatures
+ from sklearn.pipeline import make_pipeline
+
+ pipeline = make_pipeline(PolynomialFeatures(4), LinearRegression())
+
+ X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
+
+ pipeline.fit(np.array(X_train), y_train)
+
+ y_pred=pipeline.predict(X_test)
+ ```
+
+### 系列の作成
+
+この時点で、パイプラインが系列を作成できるように、ソートされたデータで新しいデータフレームを作成する必要があります。
+
+以下のコードを追加します。
+
+ ```python
+ df = pd.DataFrame({'x': X_test[:,0], 'y': y_pred[:,0]})
+ df.sort_values(by='x',inplace = True)
+ points = pd.DataFrame(df).to_numpy()
+
+ plt.plot(points[:, 0], points[:, 1],color="blue", linewidth=3)
+ plt.xlabel('Package')
+ plt.ylabel('Price')
+ plt.scatter(X,y, color="black")
+ plt.show()
+ ```
+
+`pd.DataFrame` を呼び出して新しいデータフレームを作成しました。次に`sort_values()` を呼び出して値をソートしました。最後に多項式のプロットを作成しました。
+
+
+
+よりデータにフィットした曲線を確認することができます。
+
+モデルの精度を確認してみましょう。
+
+ ```python
+ accuracy_score = pipeline.score(X_train,y_train)
+ print('Model Accuracy: ', accuracy_score)
+ ```
+
+ これで完成です!
+
+ ```output
+ Model Accuracy: 0.8537946517073784
+ ```
+
+いい感じです!価格を予測してみましょう。
+
+### 予測の実行
+
+新しい値を入力し、予測値を取得できますか?
+
+`predict()` メソッドを呼び出して、予測を行います。
+
+ ```python
+ pipeline.predict( np.array([ [2.75] ]) )
+ ```
+ 以下の予測結果が得られます。
+
+ ```output
+ array([[46.34509342]])
+ ```
+
+プロットを見てみると、納得できそうです!そして、同じデータを見て、これが前のモデルよりも良いモデルであれば、より高価なカボチャのために予算を組む必要があります。
+
+🏆 お疲れ様でした!1つのレッスンで2つの回帰モデルを作成しました。回帰に関する最後のセクションでは、カテゴリーを決定するためのロジスティック回帰について学びます。
+
+---
+## 🚀チャレンジ
+
+このノートブックでいくつかの異なる変数をテストし、相関関係がモデルの精度にどのように影響するかを確認してみてください。
+
+## [講義後クイズ](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/14/)
+
+## レビュー & 自主学習
+
+このレッスンでは、線形回帰について学びました。回帰には他にも重要な種類があります。Stepwise、Ridge、Lasso、Elasticnetなどのテクニックをご覧ください。より詳しく学ぶには、[Stanford Statistical Learning course](https://online.stanford.edu/courses/sohs-ystatslearning-statistical-learning) が良いでしょう。
+
+## 課題
+
+[モデル構築](./assignment.ja.md)
diff --git a/2-Regression/3-Linear/translations/assignment.it.md b/2-Regression/3-Linear/translations/assignment.it.md
new file mode 100644
index 00000000..e5aaaa77
--- /dev/null
+++ b/2-Regression/3-Linear/translations/assignment.it.md
@@ -0,0 +1,11 @@
+# Creare un Modello di Regressione
+
+## Istruzioni
+
+In questa lezione è stato mostrato come costruire un modello utilizzando sia la Regressione Lineare che Polinomiale. Usando questa conoscenza, trovare un insieme di dati o utilizzare uno degli insiemi integrati di Scikit-Learn per costruire un modello nuovo. Spiegare nel proprio notebook perché si è scelto una determinata tecnica e dimostrare la precisione del modello. Se non è accurato, spiegare perché.
+
+## Rubrica
+
+| Criteri | Ottimo | Adeguato | Necessita miglioramento |
+| -------- | ------------------------------------------------------------ | -------------------------- | ------------------------------- |
+| | presenta un notebook completo con una soluzione ben documentata | La soluzione è incompleta | La soluzione è difettosa o contiene bug |
diff --git a/2-Regression/3-Linear/translations/assignment.ja.md b/2-Regression/3-Linear/translations/assignment.ja.md
new file mode 100644
index 00000000..d0f8a4c5
--- /dev/null
+++ b/2-Regression/3-Linear/translations/assignment.ja.md
@@ -0,0 +1,11 @@
+# 回帰モデルの作成
+
+## 課題の指示
+
+このレッスンでは、線形回帰と多項式回帰の両方を使ってモデルを構築する方法を紹介しました。この知識をもとに、自分でデータセットを探すか、Scikit-learnのビルトインセットの1つを使用して、新しいモデルを構築してください。手法を選んだ理由をノートブックに書き、モデルの精度を示してください。精度が十分でない場合は、その理由も説明してください。
+
+## ルーブリック
+
+| 指標 | 模範的 | 適切 | 要改善 |
+| -------- | ------------------------------------------------------------ | -------------------------- | ------------------------------- |
+| | ドキュメント化されたソリューションを含む完全なノートブックを提示する。 | 解決策が不完全である。 | 解決策に欠陥またはバグがある。 |
diff --git a/2-Regression/3-Linear/translations/assignment.zh-cn.md b/2-Regression/3-Linear/translations/assignment.zh-cn.md
new file mode 100644
index 00000000..e9c476c3
--- /dev/null
+++ b/2-Regression/3-Linear/translations/assignment.zh-cn.md
@@ -0,0 +1,12 @@
+# 创建自己的回归模型
+
+## 说明
+
+在这节课中你学到了如何用线性回归和多项式回归建立一个模型。利用这些知识,找到一个你感兴趣的数据集或者是 Scikit-learn 内置的数据集来建立一个全新的模型。用你的 notebook 来解释为什么用了这种技术来对这个数据集进行建模,并且证明出你的模型的准确度。如果它没你想象中准确,请思考一下并解释一下原因。
+
+## 评判标准
+
+| 标准 | 优秀 | 中规中矩 | 仍需努力 |
+| -------- | ------------------------------------------------------------ | -------------------------- | ------------------------------- |
+| | 提交了一个完整的 notebook 工程文件,其中包含了解集,并且可读性良好 | 不完整的解集 | 解集是有缺陷或者有错误的 |
+
diff --git a/2-Regression/4-Logistic/README.md b/2-Regression/4-Logistic/README.md
index a4488c11..a2a938a7 100644
--- a/2-Regression/4-Logistic/README.md
+++ b/2-Regression/4-Logistic/README.md
@@ -140,7 +140,7 @@ Now that we have an idea of the relationship between the binary categories of co
> **🧮 Show Me The Math**
>
-> Remember how linear regression often used ordinary least squares to arrive at a value? Logistic regression relies on the concept of 'maximum likelihood' using [sigmoid functions](https://wikipedia.org/wiki/Sigmoid_function). A 'Sigmoid Function' on a plot looks like an 'S' shape. It takes a value and maps it to somewhere between 0 and 1. Its curve is also called a 'logistic curve'. Its formula looks like thus:
+> Remember how linear regression often used ordinary least squares to arrive at a value? Logistic regression relies on the concept of 'maximum likelihood' using [sigmoid functions](https://wikipedia.org/wiki/Sigmoid_function). A 'Sigmoid Function' on a plot looks like an 'S' shape. It takes a value and maps it to somewhere between 0 and 1. Its curve is also called a 'logistic curve'. Its formula looks like this:
>
> 
>
@@ -206,7 +206,7 @@ While you can get a scoreboard report [terms](https://scikit-learn.org/stable/mo
> 🎓 A '[confusion matrix](https://wikipedia.org/wiki/Confusion_matrix)' (or 'error matrix') is a table that expresses your model's true vs. false positives and negatives, thus gauging the accuracy of predictions.
-1. To use a confusion metrics, call `confusin_matrix()`:
+1. To use a confusion matrix, call `confusion_matrix()`:
```python
from sklearn.metrics import confusion_matrix
@@ -220,26 +220,35 @@ While you can get a scoreboard report [terms](https://scikit-learn.org/stable/mo
[ 33, 0]])
```
-What's going on here? Let's say our model is asked to classify items between two binary categories, category 'pumpkin' and category 'not-a-pumpkin'.
+In Scikit-learn's confusion matrices, rows (axis 0) are actual labels and columns (axis 1) are predicted labels.
-- If your model predicts something as a pumpkin and it belongs to category 'pumpkin' in reality we call it a true positive, shown by the top left number.
-- If your model predicts something as not a pumpkin and it belongs to category 'pumpkin' in reality we call it a false positive, shown by the top right number.
-- If your model predicts something as a pumpkin and it belongs to category 'not-a-pumpkin' in reality we call it a false negative, shown by the bottom left number.
-- If your model predicts something as not a pumpkin and it belongs to category 'not-a-pumpkin' in reality we call it a true negative, shown by the bottom right number.
+| |0|1|
+|:-:|:-:|:-:|
+|0|TN|FP|
+|1|FN|TP|
-
+What's going on here? Let's say our model is asked to classify pumpkins between two binary categories, category 'orange' and category 'not-orange'.
-> Infographic by [Jen Looper](https://twitter.com/jenlooper)
+- If your model predicts a pumpkin as not orange and it belongs to category 'not-orange' in reality we call it a true negative, shown by the top left number.
+- If your model predicts a pumpkin as orange and it belongs to category 'not-orange' in reality we call it a false negative, shown by the bottom left number.
+- If your model predicts a pumpkin as not orange and it belongs to category 'orange' in reality we call it a false positive, shown by the top right number.
+- If your model predicts a pumpkin as orange and it belongs to category 'orange' in reality we call it a true positive, shown by the bottom right number.
As you might have guessed it's preferable to have a larger number of true positives and true negatives and a lower number of false positives and false negatives, which implies that the model performs better.
-✅ Q: According to the confusion matrix, how did the model do? A: Not too bad; there are a good number of true positives but also several false negatives.
+How does the confusion matrix relate to precision and recall? Remember, the classification report printed above showed precision (0.83) and recall (0.98).
+
+Precision = tp / (tp + fp) = 162 / (162 + 33) = 0.8307692307692308
+
+Recall = tp / (tp + fn) = 162 / (162 + 4) = 0.9759036144578314
+
+✅ Q: According to the confusion matrix, how did the model do? A: Not too bad; there are a good number of true negatives but also several false negatives.
Let's revisit the terms we saw earlier with the help of the confusion matrix's mapping of TP/TN and FP/FN:
-🎓 Precision: TP/(TP + FN) The fraction of relevant instances among the retrieved instances (e.g. which labels were well-labeled)
+🎓 Precision: TP/(TP + FP) The fraction of relevant instances among the retrieved instances (e.g. which labels were well-labeled)
-🎓 Recall: TP/(TP + FP) The fraction of relevant instances that were retrieved, whether well-labeled or not
+🎓 Recall: TP/(TP + FN) The fraction of relevant instances that were retrieved, whether well-labeled or not
🎓 f1-score: (2 * precision * recall)/(precision + recall) A weighted average of the precision and recall, with best being 1 and worst being 0
@@ -252,6 +261,7 @@ Let's revisit the terms we saw earlier with the help of the confusion matrix's m
🎓 Weighted Avg: The calculation of the mean metrics for each label, taking label imbalance into account by weighting them by their support (the number of true instances for each label).
✅ Can you think which metric you should watch if you want your model to reduce the number of false negatives?
+
## Visualize the ROC curve of this model
This is not a bad model; its accuracy is in the 80% range so ideally you could use it to predict the color of a pumpkin given a set of variables.
@@ -284,7 +294,8 @@ In future lessons on classifications, you will learn how to iterate to improve y
---
## 🚀Challenge
-There's a lot more to unpack regarding logistic regression! But the best way to learn is to experiment. Find a dataset that lends itself to this type of analysis and build a model with it. What do you learn? tip: try [Kaggle](https://kaggle.com) for interesting datasets.
+There's a lot more to unpack regarding logistic regression! But the best way to learn is to experiment. Find a dataset that lends itself to this type of analysis and build a model with it. What did you learn? Tip: try [Kaggle](https://www.kaggle.com/search?q=logistic+regression+datasets) for interesting datasets.
+
## [Post-lecture quiz](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/16/)
## Review & Self Study
diff --git a/2-Regression/4-Logistic/translations/README.id.md b/2-Regression/4-Logistic/translations/README.id.md
index ac5a3a98..213ef4aa 100644
--- a/2-Regression/4-Logistic/translations/README.id.md
+++ b/2-Regression/4-Logistic/translations/README.id.md
@@ -230,10 +230,6 @@ Apa yang sedang terjadi di sini? Mari kita asumsi dulu bahwa model kita ditanyak
- Kalau modelmu memprediksi sesuatu sebagai sebuah labu tetapi sebenarnya bukan sebuah labu, itu disebut negatif palsu yang diindikasi angka di pojok kiri bawah.
- Kalau modelmu memprediksi sesuatu sebagai bukan sebuah labu dan memang benar sesuatu itu bukan sebuah labu, itu disebut negatif benar yang diindikasi angka di pojok kanan bawah.
-
-
-> Infografik oleh [Jen Looper](https://twitter.com/jenlooper)
-
Sebagaimana kamu mungkin sudah pikirkan, lebih baik dapat banyak positif benar dan negatif benar dan sedikit positif palsu dan negatif palsu. Implikasinya adalah performa modelnya bagus.
✅ Pertanyaan: Berdasarkan matriks kebingungan, modelnya baik tidak? Jawaban: Tidak buruk; ada banyak positif benar dan sedikit negatif palsu.
@@ -245,9 +241,9 @@ Mari kita lihat kembali istilah-istilah yang kita lihat tadi dengan bantuan matr
> NB: Negatif benar
> NP: Negatif palsu
-🎓 Presisi: PB/(PB + NP) Rasio titik data relevan antara semua titik data (seperti data mana yang benar dilabelkannya)
+🎓 Presisi: PB/(PB + PP) Rasio titik data relevan antara semua titik data (seperti data mana yang benar dilabelkannya)
-🎓 *Recall*: PB/(PB + PP) Rasio titk data relevan yang digunakan, maupun labelnya benar atau tidak.
+🎓 *Recall*: PB/(PB + NP) Rasio titik data relevan yang digunakan, maupun labelnya benar atau tidak.
🎓 *f1-score*: (2 * Presisi * *Recall*)/(Presisi + *Recall*) Sebuah rata-rata tertimbang antara presisi dan *recall*. 1 itu baik dan 0 itu buruk.
diff --git a/2-Regression/4-Logistic/translations/README.it.md b/2-Regression/4-Logistic/translations/README.it.md
new file mode 100644
index 00000000..e9940203
--- /dev/null
+++ b/2-Regression/4-Logistic/translations/README.it.md
@@ -0,0 +1,295 @@
+# Regressione logistica per prevedere le categorie
+
+
+> Infografica di [Dasani Madipalli](https://twitter.com/dasani_decoded)
+
+## [Quiz Pre-Lezione](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/15/)
+
+## Introduzione
+
+In questa lezione finale sulla Regressione, una delle tecniche _classiche_ di base di machine learning, si darà un'occhiata alla Regressione Logistica. Si dovrebbe utilizzare questa tecnica per scoprire modelli per prevedere le categorie binarie. Questa caramella è al cioccolato o no? Questa malattia è contagiosa o no? Questo cliente sceglierà questo prodotto o no?
+
+In questa lezione, si imparerà:
+
+- Una nuova libreria per la visualizzazione dei dati
+- Tecniche per la regressione logistica
+
+✅ Con questo [modulo di apprendimento](https://docs.microsoft.com/learn/modules/train-evaluate-classification-models?WT.mc_id=academic-15963-cxa) si potrà approfondire la comprensione del lavoro con questo tipo di regressione
+## Prerequisito
+
+Avendo lavorato con i dati della zucca, ora si ha abbastanza familiarità con essi per rendersi conto che esiste una categoria binaria con cui è possibile lavorare: `Color` (Colore).
+
+Si costruisce un modello di regressione logistica per prevedere, date alcune variabili, di _che colore sarà probabilmente una data zucca_ (arancione 🎃 o bianca 👻).
+
+> Perché si parla di classificazione binaria in un gruppo di lezioni sulla regressione? Solo per comodità linguistica, poiché la regressione logistica è in [realtà un metodo di classificazione](https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression), anche se lineare. Si scopriranno altri modi per classificare i dati nel prossimo gruppo di lezioni.
+
+## Definire la domanda
+
+Per i nostri scopi, la domanda verrà espressa in forma binaria: 'Arancio' o 'Non Arancio'. C'è anche una categoria "striped" (a strisce) nell'insieme di dati, ma ci sono pochi casi, quindi non verrà presa in considerazione. Comunque scompare una volta rimossi i valori null dall'insieme di dati.
+
+> 🎃 Fatto divertente, a volte le zucche bianche vengono chiamate zucche "fantasma". Non sono molto facili da intagliare, quindi non sono così popolari come quelle arancioni, ma hanno un bell'aspetto!
+
+## Informazioni sulla regressione logistica
+
+La regressione logistica differisce dalla regressione lineare, che si è appresa in precedenza, in alcuni importanti modi.
+
+### Classificazione Binaria
+
+La regressione logistica non offre le stesse caratteristiche della regressione lineare. La prima offre una previsione su una categoria binaria ("arancione o non arancione") mentre la seconda è in grado di prevedere valori continui, ad esempio data l'origine di una zucca e il momento del raccolto, di _quanto aumenterà il suo prezzo_.
+
+
+> Infografica di [Dasani Madipalli](https://twitter.com/dasani_decoded)
+### Altre classificazioni
+
+Esistono altri tipi di regressione logistica, inclusi multinomiale e ordinale:
+
+- **Multinomiale**, che implica avere più di una categoria: "arancione, bianco e a strisce".
+- **Ordinale**, che coinvolge categorie ordinate, utile se si volessero ordinare i risultati in modo logico, come le zucche che sono ordinate per un numero finito di dimensioni (mini,sm,med,lg,xl,xxl).
+
+
+> Infografica di [Dasani Madipalli](https://twitter.com/dasani_decoded)
+
+### È ancora lineare
+
+Anche se questo tipo di Regressione riguarda le "previsioni di categoria", funziona ancora meglio quando esiste una chiara relazione lineare tra la variabile dipendente (colore) e le altre variabili indipendenti (il resto dell'insieme di dati, come il nome della città e le dimensioni). È bene avere un'idea se c'è qualche linearità che divide queste variabili o meno.
+
+### Le variabili NON devono essere correlate
+
+Si ricorda come la regressione lineare ha funzionato meglio con più variabili correlate? La regressione logistica è l'opposto: le variabili non devono essere allineate. Funziona per questi dati che hanno correlazioni alquanto deboli.
+
+### Servono molti dati puliti
+
+La regressione logistica fornirà risultati più accurati se si utilizzano più dati; quindi si tenga a mente che, essendo l'insieme di dati sulla zucca piccolo, non è ottimale per questo compito.
+
+✅ Si pensi ai tipi di dati che si prestano bene alla regressione logistica
+
+## Esercizio: riordinare i dati
+
+Innanzitutto, si puliscono un po' i dati, eliminando i valori null e selezionando solo alcune delle colonne:
+
+1. Aggiungere il seguente codice:
+
+ ```python
+ from sklearn.preprocessing import LabelEncoder
+
+ new_columns = ['Color','Origin','Item Size','Variety','City Name','Package']
+
+ new_pumpkins = pumpkins.drop([c for c in pumpkins.columns if c not in new_columns], axis=1)
+
+ new_pumpkins.dropna(inplace=True)
+
+ new_pumpkins = new_pumpkins.apply(LabelEncoder().fit_transform)
+ ```
+
+ Si può sempre dare un'occhiata al nuovo dataframe:
+
+ ```python
+    new_pumpkins.info()
+ ```
+
+### Visualizzazione - griglia affiancata
+
+A questo punto si è caricato di nuovo il [notebook iniziale](../notebook.ipynb) con i dati della zucca e lo si è pulito in modo da preservare un insieme di dati contenente alcune variabili, incluso `Color`. Si visualizza il dataframe nel notebook utilizzando una libreria diversa: [Seaborn](https://seaborn.pydata.org/index.html), che è costruita su Matplotlib, usata in precedenza.
+
+Seaborn offre alcuni modi accurati per visualizzare i dati. Ad esempio, si possono confrontare le distribuzioni dei dati per ogni punto in una griglia affiancata.
+
+1. Si crea una griglia di questo tipo istanziando `PairGrid`, usando i dati della zucca `new_pumpkins`, poi chiamando `map()`:
+
+ ```python
+ import seaborn as sns
+
+ g = sns.PairGrid(new_pumpkins)
+ g.map(sns.scatterplot)
+ ```
+
+ 
+
+ Osservando i dati fianco a fianco, si può vedere come i dati di Color si riferiscono alle altre colonne.
+
+ ✅ Data questa griglia del grafico a dispersione, quali sono alcune esplorazioni interessanti che si possono immaginare?
+
+### Usare un grafico a sciame
+
+Poiché Color è una categoria binaria (arancione o no), viene chiamata "dati categoriali" e richiede "un [approccio più specializzato](https://seaborn.pydata.org/tutorial/categorical.html?highlight=bar) alla visualizzazione". Esistono altri modi per visualizzare la relazione di questa categoria con altre variabili.
+
+È possibile visualizzare le variabili fianco a fianco con i grafici di Seaborn.
+
+1. Si provi un grafico a "sciame" per mostrare la distribuzione dei valori:
+
+ ```python
+ sns.swarmplot(x="Color", y="Item Size", data=new_pumpkins)
+ ```
+
+ 
+
+### Grafico violino
+
+Un grafico di tipo "violino" è utile in quanto è possibile visualizzare facilmente il modo in cui sono distribuiti i dati nelle due categorie. I grafici di tipo violino non funzionano così bene con insiemi di dati più piccoli, poiché la distribuzione viene visualizzata in modo più "liscio".
+
+1. Chiamare `catplot()` passando i parametri `x="Color"` e `kind="violin"`:
+
+ ```python
+ sns.catplot(x="Color", y="Item Size",
+ kind="violin", data=new_pumpkins)
+ ```
+
+ 
+
+ ✅ Provare a creare questo grafico e altri grafici Seaborn, utilizzando altre variabili.
+
+Ora che si ha un'idea della relazione tra le categorie binarie di colore e il gruppo più ampio di dimensioni, si esplora la regressione logistica per determinare il probabile colore di una data zucca.
+
+> **🧮 Mostrami la matematica**
+>
+> Si ricorda come la regressione lineare usava spesso i minimi quadrati ordinari per arrivare a un valore? La regressione logistica si basa sul concetto di "massima verosimiglianza" utilizzando [le funzioni sigmoidi](https://wikipedia.org/wiki/Sigmoid_function). Una "Funzione Sigmoide" su un grafico ha l'aspetto di una forma a "S". Prende un valore e lo mappa da qualche parte tra 0 e 1. La sua curva è anche chiamata "curva logistica". La sua formula si presenta così:
+>
+> 
+>
+> dove il punto medio del sigmoide si trova nel punto 0 di x, L è il valore massimo della curva e k è la pendenza della curva. Se l'esito della funzione è maggiore di 0,5, all'etichetta in questione verrà assegnata la classe '1' della scelta binaria. In caso contrario, sarà classificata come '0'.
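+
+Per fissare le idee, ecco uno schizzo minimale in Python della funzione descritta sopra (i nomi `sigmoide`, `L`, `k` e `x0` sono scelti qui solo a scopo illustrativo):
+
+```python
+import numpy as np
+
+def sigmoide(x, L=1.0, k=1.0, x0=0.0):
+    # Forma generale della funzione logistica: L / (1 + e^(-k(x - x0)))
+    return L / (1 + np.exp(-k * (x - x0)))
+
+x = np.array([-4.0, -1.0, 0.0, 1.0, 4.0])
+p = sigmoide(x)
+print(np.round(p, 3))         # [0.018 0.269 0.5   0.731 0.982]
+print((p > 0.5).astype(int))  # [0 0 0 1 1]: sopra 0,5 -> classe '1'
+```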
+
+## Costruire il modello
+
+Costruire un modello per trovare queste classificazioni binarie è sorprendentemente semplice in Scikit-learn.
+
+1. Si selezionano le variabili da utilizzare nel modello di classificazione e si dividono gli insiemi di training e test chiamando `train_test_split()`:
+
+ ```python
+ from sklearn.model_selection import train_test_split
+
+ Selected_features = ['Origin','Item Size','Variety','City Name','Package']
+
+ X = new_pumpkins[Selected_features]
+ y = new_pumpkins['Color']
+
+ X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
+
+ ```
+
+1. Ora si può addestrare il modello, chiamando `fit()` con i dati di addestramento e stamparne il risultato:
+
+ ```python
+ from sklearn.model_selection import train_test_split
+ from sklearn.metrics import accuracy_score, classification_report
+ from sklearn.linear_model import LogisticRegression
+
+ model = LogisticRegression()
+ model.fit(X_train, y_train)
+ predictions = model.predict(X_test)
+
+ print(classification_report(y_test, predictions))
+ print('Predicted labels: ', predictions)
+ print('Accuracy: ', accuracy_score(y_test, predictions))
+ ```
+
+ Si dia un'occhiata al tabellone segnapunti del modello. Non è male, considerando che si hanno solo circa 1000 righe di dati:
+
+ ```output
+ precision recall f1-score support
+
+ 0 0.85 0.95 0.90 166
+ 1 0.38 0.15 0.22 33
+
+ accuracy 0.82 199
+ macro avg 0.62 0.55 0.56 199
+ weighted avg 0.77 0.82 0.78 199
+
+ Predicted labels: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
+ 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
+ 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1
+ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1
+ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
+ 0 0 0 1 0 1 0 0 1 0 0 0 1 0]
+ ```
+
+## Migliore comprensione tramite una matrice di confusione
+
+Sebbene si possano ottenere [i termini](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.classification_report.html?highlight=classification_report#sklearn.metrics.classification_report) del rapporto dei punteggi stampando gli elementi di cui sopra, si potrebbe essere in grado di comprendere più facilmente il modello utilizzando una [matrice di confusione](https://scikit-learn.org/stable/modules/model_evaluation.html#confusion-matrix) che aiuti a capire come lo stesso sta funzionando.
+
+> 🎓 Una '[matrice di confusione](https://it.wikipedia.org/wiki/Matrice_di_confusione)' (o 'matrice di errore') è una tabella che esprime i veri contro i falsi positivi e negativi del modello, misurando così l'accuratezza delle previsioni.
+
+1. Per utilizzare una matrice di confusione, si chiama `confusion_matrix()`:
+
+ ```python
+ from sklearn.metrics import confusion_matrix
+ confusion_matrix(y_test, predictions)
+ ```
+
+ Si dia un'occhiata alla matrice di confusione del modello:
+
+ ```output
+ array([[162, 4],
+ [ 33, 0]])
+ ```
+
+Cosa sta succedendo qui? In Scikit-learn, nelle matrici di confusione le righe (asse 0) sono le etichette reali e le colonne (asse 1) sono le etichette previste. Si supponga che al modello venga chiesto di classificare le zucche tra due categorie binarie, la categoria 'arancione' e la categoria 'non arancione'.
+
+- Se il modello prevede una zucca come non arancione e questa appartiene in realtà alla categoria 'non arancione', si parla di vero negativo, mostrato dal numero in alto a sinistra.
+- Se il modello prevede una zucca come arancione ma questa appartiene in realtà alla categoria 'non arancione', si parla di falso positivo, mostrato dal numero in alto a destra.
+- Se il modello prevede una zucca come non arancione ma questa appartiene in realtà alla categoria 'arancione', si parla di falso negativo, mostrato dal numero in basso a sinistra.
+- Se il modello prevede una zucca come arancione e questa appartiene in realtà alla categoria 'arancione', si parla di vero positivo, mostrato dal numero in basso a destra.
+
+Come si sarà intuito, è preferibile avere un numero maggiore di veri positivi e veri negativi e un numero inferiore di falsi positivi e falsi negativi, il che implica che il modello funziona meglio.
+
+✅ Domanda: Secondo la matrice di confusione, come si è comportato il modello? Risposta: Non male; ci sono un buon numero di veri negativi ma anche diversi falsi negativi.
+
+I termini visti in precedenza vengono rivisitati con l'aiuto della mappatura della matrice di confusione di TP/TN e FP/FN:
+
+🎓 Precisione: TP/(TP + FP) La frazione di istanze rilevanti tra le istanze recuperate (ad es. quali etichette erano ben etichettate)
+
+🎓 Richiamo: TP/(TP + FN) La frazione di istanze rilevanti che sono state recuperate, ben etichettate o meno
+
+🎓 f1-score: (2 * precisione * richiamo)/(precisione + richiamo) Una media ponderata della precisione e del richiamo, dove il migliore è 1 e il peggiore è 0
+
+🎓 Supporto: il numero di occorrenze di ciascuna etichetta recuperata
+
+🎓 Accuratezza: (TP + TN)/(TP + TN + FP + FN) La percentuale di etichette prevista accuratamente per un campione.
+
+🎓 Macro Media: il calcolo delle metriche medie non ponderate per ciascuna etichetta, senza tener conto dello squilibrio dell'etichetta.
+
+🎓 Media ponderata: il calcolo delle metriche medie per ogni etichetta, tenendo conto dello squilibrio dell'etichetta pesandole in base al loro supporto (il numero di istanze vere per ciascuna etichetta).
+
+✅ Si riesce a pensare a quale metrica si dovrebbe guardare se si vuole che il modello riduca il numero di falsi negativi?
+
+## Visualizzare la curva ROC di questo modello
+
+Questo non è un cattivo modello; la sua accuratezza è intorno all'80%, quindi idealmente lo si potrebbe usare per prevedere il colore di una zucca dato un insieme di variabili.
+
+Si crea un'altra visualizzazione per vedere il cosiddetto punteggio 'ROC':
+
+```python
+from sklearn.metrics import roc_curve, roc_auc_score
+
+y_scores = model.predict_proba(X_test)
+# calculate ROC curve
+fpr, tpr, thresholds = roc_curve(y_test, y_scores[:,1])
+sns.lineplot(x=[0, 1], y=[0, 1])
+sns.lineplot(x=fpr, y=tpr)
+```
+Usando di nuovo Seaborn, si traccia la [Caratteristica Operativa del Ricevitore](https://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html?highlight=roc), ovvero la curva ROC del modello. Le curve ROC vengono spesso utilizzate per visualizzare l'output di un classificatore in termini di veri e falsi positivi: "presentano in genere il tasso di veri positivi sull'asse Y e il tasso di falsi positivi sull'asse X". Contano quindi la ripidità della curva e lo spazio tra la diagonale e la curva: si vuole una curva che salga rapidamente e superi la diagonale. In questo caso, all'inizio ci sono alcuni falsi positivi, ma poi la linea sale e supera correttamente la diagonale:
+
+
+
+Infine, si usa l'[`API roc_auc_score`](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_auc_score.html?highlight=roc_auc#sklearn.metrics.roc_auc_score) di Scikit-learn per calcolare l'effettiva "Area sotto la curva" (AUC):
+
+```python
+auc = roc_auc_score(y_test,y_scores[:,1])
+print(auc)
+```
+Il risultato è `0.6976998904709748`. Dato che l'AUC varia da 0 a 1, si desidera un punteggio elevato, poiché un modello corretto al 100% nelle sue previsioni avrà un AUC di 1; in questo caso, il modello è _abbastanza buono_.
+
+Nelle lezioni future sulle classificazioni si imparerà come eseguire l'iterazione per migliorare i punteggi del modello. Ma per ora, congratulazioni! Si sono completate queste lezioni di regressione!
+
+---
+## 🚀 Sfida
+
+C'è molto altro da svelare riguardo alla regressione logistica! Ma il modo migliore per imparare è sperimentare. Trovare un insieme di dati che si presti a questo tipo di analisi e costruire un modello con esso. Cosa si è appreso? suggerimento: provare [Kaggle](https://kaggle.com) per ottenere insiemi di dati interessanti.
+
+## [Quiz post-lezione](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/16/)
+
+## Revisione e Auto Apprendimento
+
+Leggere le prime pagine di [questo articolo da Stanford](https://web.stanford.edu/~jurafsky/slp3/5.pdf) su alcuni usi pratici della regressione logistica. Si pensi alle attività più adatte per l'uno o l'altro tipo di attività di regressione studiate fino a questo punto. Cosa funzionerebbe meglio?
+
+## Compito
+
+[Ritentare questa regressione](assignment.it.md)
diff --git a/2-Regression/4-Logistic/translations/README.ja.md b/2-Regression/4-Logistic/translations/README.ja.md
new file mode 100644
index 00000000..3e384824
--- /dev/null
+++ b/2-Regression/4-Logistic/translations/README.ja.md
@@ -0,0 +1,310 @@
+# カテゴリ予測のためのロジスティック回帰
+
+
+> [Dasani Madipalli](https://twitter.com/dasani_decoded) によるインフォグラフィック
+## [講義前のクイズ](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/15/)
+
+## イントロダクション
+
+回帰の最後のレッスンでは、古典的な機械学習手法の一つである、「ロジスティック回帰」を見ていきます。この手法は、2値のカテゴリを予測するためのパターンを発見するために使います。例えば、「このお菓子は、チョコレートかどうか?」、「この病気は伝染するかどうか?」、「この顧客は、この商品を選ぶかどうか?」などです。
+
+このレッスンでは以下の内容を扱います。
+
+- データを可視化するための新しいライブラリ
+- ロジスティック回帰について
+
+✅ この[モジュール](https://docs.microsoft.com/learn/modules/train-evaluate-classification-models?WT.mc_id=academic-15963-cxa) では、今回のタイプのような回帰について理解を深めることができます。
+
+## 前提条件
+
+カボチャのデータを触ったことで、データの扱いにかなり慣れてきました。その際にバイナリカテゴリが一つあることに気づきました。「`Color`」です。
+
+いくつかの変数が与えられたときに、あるカボチャがどのような色になる可能性が高いか (オレンジ🎃または白👻)を予測するロジスティック回帰モデルを構築してみましょう。
+
+> なぜ、回帰についてのレッスンで二値分類の話をしているのでしょうか?ロジスティック回帰は、線形ベースのものではありますが、[実際には分類法](https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression) であるため、言語的な便宜上です。次のレッスングループでは、データを分類する他の方法について学びます。
+
+## 質問の定義
+
+ここでは、「Orange」か「Not Orange」かの二値で表現しています。データセットには「striped」というカテゴリーもありますが、ほとんど例がないので、ここでは使いません。データセットからnull値を削除すると、このカテゴリーは消えてしまいます。
+
+> 🎃 面白いことに、白いカボチャを「お化けカボチャ」と呼ぶことがあります。彫るのが簡単ではないので、オレンジ色のカボチャほど人気はありませんが、見た目がクールですよね!
+
+## ロジスティック回帰について
+
+ロジスティック回帰は、前回学んだ線形回帰とは、いくつかの重要な点で異なります。
+
+### 2値分類
+
+ロジスティック回帰は、線形回帰とは異なる特徴を持っています。ロジスティック回帰は、二値のカテゴリー(「オレンジ色かオレンジ色でないか」)についての予測を行うのに対し、線形回帰は連続的な値を予測します。例えば、カボチャの産地と収穫時期が与えられれば、その価格がどれだけ上昇するかを予測することができます。
+
+
+> [Dasani Madipalli](https://twitter.com/dasani_decoded) によるインフォグラフィック
+### その他の分類
+
+ロジスティック回帰には他にもMultinomialやOrdinalなどの種類があります。
+
+- **Multinomial**: これは2つ以上のカテゴリーを持つ場合です。 (オレンジ、白、ストライプ)
+- **Ordinal**: これは、順序付けられたカテゴリを含むもので、有限の数のサイズ(mini、sm、med、lg、xl、xxl)で並べられたカボチャのように、結果を論理的に並べたい場合に便利です。
+
+
+> [Dasani Madipalli](https://twitter.com/dasani_decoded) によるインフォグラフィック
+
+### 線形について
+
+このタイプの回帰は、「カテゴリーの予測」が目的ですが、従属変数(色)と他の独立変数(都市名やサイズなどのデータセットの残りの部分)の間に明確な線形関係がある場合に最も効果的です。これらの変数を分ける線形性があるかどうかを把握するのは良いことです。
+
+### 変数が相関している必要はない
+
+線形回帰は、相関性の高い変数ほどよく働くことを覚えていますか?ロジスティック回帰は、そうとは限りません。相関関係がやや弱いこのデータには有効ですね。
+
+### 大量のきれいなデータが必要です
+
+一般的にロジスティック回帰は、より多くのデータを使用すれば、より正確な結果が得られます。私たちの小さなデータセットは、このタスクには最適ではありませんので、その点に注意してください。
+
+✅ ロジスティック回帰に適したデータの種類を考えてみてください。
+
+## エクササイズ - データの整形
+
+まず、NULL値を削除したり、一部の列だけを選択したりして、データを少し綺麗にします。
+
+1. 以下のコードを追加:
+
+ ```python
+ from sklearn.preprocessing import LabelEncoder
+
+ new_columns = ['Color','Origin','Item Size','Variety','City Name','Package']
+
+ new_pumpkins = pumpkins.drop([c for c in pumpkins.columns if c not in new_columns], axis=1)
+
+ new_pumpkins.dropna(inplace=True)
+
+ new_pumpkins = new_pumpkins.apply(LabelEncoder().fit_transform)
+ ```
+
+ 新しいデータフレームはいつでも確認することができます。
+
+ ```python
+    new_pumpkins.info()
+ ```
+
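+ちなみに、`LabelEncoder` はラベルをアルファベット順に数値へ変換します。挙動を確認するための仮の例です(ラベル名は説明用です):
+
+```python
+from sklearn.preprocessing import LabelEncoder
+
+enc = LabelEncoder().fit(["ORANGE", "STRIPED", "WHITE"])
+for label, code in zip(enc.classes_, enc.transform(enc.classes_)):
+    print(label, code)  # ORANGE 0 / STRIPED 1 / WHITE 2
+```
+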
+### 可視化 - グリッド状に並べる
+
+ここまでで、[スターターノートブック](../notebook.ipynb) にパンプキンデータを再度読み込み、`Color`を含むいくつかの変数を保持するように整形しました。今度は別のライブラリ、[Seaborn](https://seaborn.pydata.org/index.html) を使って、ノートブック内のデータフレームを可視化してみましょう。このライブラリは、今まで使っていた`Matplotlib`をベースにしています。
+
+Seabornには、データを可視化するためのいくつかの優れた方法があります。例えば、各データの分布を横並びのグリッドで比較することができます。
+
+1. かぼちゃのデータ`new_pumpkins`を使って、`PairGrid`をインスタンス化し、`map()`メソッドを呼び出して、以下のようなグリッドを作成します。
+
+ ```python
+ import seaborn as sns
+
+ g = sns.PairGrid(new_pumpkins)
+ g.map(sns.scatterplot)
+ ```
+
+ 
+
+ データを並べて観察することで、Colorのデータが他の列とどのように関連しているのかを知ることができます。
+
+ ✅ この散布図をもとに、どのような面白い試みが考えられるでしょうか?
+
+### swarm plot
+
+Colorは2つのカテゴリー(Orange or Not)であるため、「カテゴリカルデータ」と呼ばれ、「可視化にはより[専門的なアプローチ](https://seaborn.pydata.org/tutorial/categorical.html?highlight=bar) 」が必要となります。このカテゴリと他の変数との関係を可視化する方法は他にもあります。
+
+Seabornプロットでは、変数を並べて表示することができます。
+
+1. 値の分布を示す、'swarm' plotを試してみます。
+
+ ```python
+ sns.swarmplot(x="Color", y="Item Size", data=new_pumpkins)
+ ```
+
+ 
+
+### Violin plot
+
+'violin' タイプのプロットは、2つのカテゴリーのデータがどのように分布しているかを簡単に視覚化できるので便利です。Violin plotは、分布がより「滑らか」に表示されるため、データセットが小さい場合はあまりうまくいきません。
+
+1. パラメータとして`x=Color`、`kind="violin"` をセットし、 `catplot()`メソッドを呼びます。
+
+ ```python
+ sns.catplot(x="Color", y="Item Size",
+ kind="violin", data=new_pumpkins)
+ ```
+
+ 
+
+ ✅ 他の変数を使って、このプロットや他のSeabornのプロットを作成してみてください。
+
+さて、`Color`の二値カテゴリと、より大きなサイズのグループとの関係がわかったところで、ロジスティック回帰を使って、あるカボチャの色について調べてみましょう。
+
+> **🧮 数学の確認**
+>
+> 線形回帰では、通常の最小二乗法を用いて値を求めることが多かったことを覚えていますか?ロジスティック回帰は、[シグモイド関数](https://wikipedia.org/wiki/Sigmoid_function) を使った「最尤」の概念に依存しています。シグモイド関数は、プロット上では「S」字のように見えます。その曲線は「ロジスティック曲線」とも呼ばれます。数式は次のようになります。
+>
+> 
+>
+> ここで、シグモイドの中点はx=0の点、Lは曲線の最大値、kは曲線の急峻さを表します。この関数の結果が0.5以上であれば、そのラベルは二値選択のクラス「1」になります。そうでない場合は、「0」に分類されます。
+
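+参考までに、上記のしきい値処理を Python で書いた最小限のスケッチです(`logistic` や `L`、`k` といった名前は説明用の仮のものです):
+
+```python
+import numpy as np
+
+def logistic(x, L=1.0, k=1.0, x0=0.0):
+    # 一般形のロジスティック関数: L / (1 + e^(-k(x - x0)))
+    return L / (1 + np.exp(-k * (x - x0)))
+
+x = np.array([-4.0, 0.0, 4.0])
+p = logistic(x)
+print(np.round(p, 3))         # [0.018 0.5   0.982]
+print((p > 0.5).astype(int))  # [0 0 1] : 0.5 を超えればクラス '1'
+```
+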
+## モデルの構築
+
+これらの二値分類を行うためのモデルの構築は、Scikit-learnでは驚くほど簡単にできます。
+
+1. 分類モデルで使用したい変数を選択し、`train_test_split()`メソッドでトレーニングセットとテストセットを分割します。
+
+ ```python
+ from sklearn.model_selection import train_test_split
+
+ Selected_features = ['Origin','Item Size','Variety','City Name','Package']
+
+ X = new_pumpkins[Selected_features]
+ y = new_pumpkins['Color']
+
+ X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
+
+ ```
+
+2. これで、学習データを使って`fit()`メソッドを呼び出し、モデルを訓練し、その結果を出力することができます。
+
+ ```python
+ from sklearn.model_selection import train_test_split
+ from sklearn.metrics import accuracy_score, classification_report
+ from sklearn.linear_model import LogisticRegression
+
+ model = LogisticRegression()
+ model.fit(X_train, y_train)
+ predictions = model.predict(X_test)
+
+ print(classification_report(y_test, predictions))
+ print('Predicted labels: ', predictions)
+ print('Accuracy: ', accuracy_score(y_test, predictions))
+ ```
+
+ モデルのスコアボードを見てみましょう。1000行程度のデータしかないことを考えると、悪くないと思います。
+
+ ```output
+ precision recall f1-score support
+
+ 0 0.85 0.95 0.90 166
+ 1 0.38 0.15 0.22 33
+
+ accuracy 0.82 199
+ macro avg 0.62 0.55 0.56 199
+ weighted avg 0.77 0.82 0.78 199
+
+ Predicted labels: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0
+ 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
+ 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1
+ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 1 1
+ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0
+ 0 0 0 1 0 1 0 0 1 0 0 0 1 0]
+ ```
+
+## 混同行列による理解度の向上
+
+
+上記の項目を出力することで[スコアボードレポート](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.classification_report.html?highlight=classification_report#sklearn.metrics.classification_report) を得ることができますが、[混同行列](https://scikit-learn.org/stable/modules/model_evaluation.html#confusion-matrix) を使うことで、より簡単にモデルを理解することができるかもしれません。
+
+
+> 🎓 [混同行列](https://wikipedia.org/wiki/Confusion_matrix) とは、モデルの真の陽性と陰性を表す表で、予測の正確さを測ることができます。
+
+1. `confusion_matrix()`メソッドを呼んで、混同行列を作成します。
+
+ ```python
+ from sklearn.metrics import confusion_matrix
+ confusion_matrix(y_test, predictions)
+ ```
+
+   作成したモデルの混同行列を見てみてください。
+
+ ```output
+ array([[162, 4],
+ [ 33, 0]])
+ ```
+
+Scikit-learnでは、混同行列の行 (axis=0)が実際のラベル、列 (axis=1)が予測ラベルとなります。
+
+| |0|1|
+|:-:|:-:|:-:|
+|0|TN|FP|
+|1|FN|TP|
+
+ここで何が起こっているのか?例えば、カボチャを「オレンジ色」と「オレンジ色でない」という2つのカテゴリーに分類するように求められたとしましょう。
+
+- モデルではオレンジ色ではないと予測されたカボチャが、実際には「オレンジ色ではない」というカテゴリーに属していた場合、「true negative」と呼ばれ、左上の数字で示されます。
+- モデルがオレンジ色と予測したカボチャが、実際には「オレンジ色ではない」カテゴリーに属していた場合、「false positive」と呼ばれ、右上の数字で示されます。
+- モデルがオレンジ色ではないと予測したカボチャが、実際には「オレンジ色」カテゴリーに属していた場合、「false negative」と呼ばれ、左下の数字で示されます。
+- モデルがカボチャをオレンジ色と予測し、それが実際にカテゴリ「オレンジ」に属する場合、「true positive」と呼ばれ、右下の数字で示されます。
+
+お気づきの通り、true positiveとtrue negativeの数が多く、false positiveとfalse negativeの数が少ないことが好ましく、これはモデルの性能が高いことを意味します。
+
+混同行列は、precisionとrecallにどのように関係するのでしょうか?上記の分類レポートでは、ラベル0に対するprecision(0.83)とrecall(0.98)が示されていました。ここでもラベル0を陽性クラスとみなして計算すると、次のようになります。
+
+Precision = tp / (tp + fp) = 162 / (162 + 33) = 0.8307692307692308
+
+Recall = tp / (tp + fn) = 162 / (162 + 4) = 0.9759036144578314
+
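+この対応を自分の手で確かめたい場合の最小限のスケッチです(`y_true` / `y_pred` は上の行列を再現するためだけの仮のラベルです):
+
+```python
+import numpy as np
+from sklearn.metrics import confusion_matrix
+
+# 上の行列 [[162, 4], [33, 0]] を再現する仮のラベル
+y_true = np.array([0] * 166 + [1] * 33)
+y_pred = np.array([0] * 162 + [1] * 4 + [0] * 33)
+
+# 2値分類では ravel() で tn, fp, fn, tp の順に取り出せます
+tn, fp, fn, tp = confusion_matrix(y_true, y_pred).ravel()
+
+# ラベル0を「陽性」とみなした場合の precision / recall(上の計算と一致)
+print(tn / (tn + fn))  # 0.8307... (162 / 195)
+print(tn / (tn + fp))  # 0.9759... (162 / 166)
+```
+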
+✅ Q: 混同行列によると、モデルの出来はどうでしたか? A: 悪くありません。true negativeがかなりの数ありますが、false negativeもいくつかあります。
+
+先ほどの用語を、混同行列のTP/TNとFP/FNのマッピングを参考にして再確認してみましょう。
+
+🎓 Precision: TP/(TP + FP) 探索されたインスタンスのうち、関連性のあるインスタンスの割合(どのラベルがよくラベル付けされていたかなど)。
+
+🎓 Recall: TP/(TP + FN) ラベリングされているかどうかに関わらず、探索された関連インスタンスの割合です。
+
+🎓 f1-score: (2 * precision * recall)/(precision + recall) precisionとrecallの加重平均で、最高が1、最低が0となる。
+
+🎓 Support: 取得した各ラベルの出現回数です。
+
+🎓 Accuracy: (TP + TN)/(TP + TN + FP + FN) サンプルに対して正確に予測されたラベルの割合です。
+
+🎓 Macro Avg: 各ラベルの非加重平均指標の計算で、ラベルの不均衡を考慮せずに算出される。
+
+🎓 Weighted Avg: 各ラベルのサポート数(各ラベルの真のインスタンス数)で重み付けすることにより、ラベルの不均衡を考慮して、各ラベルの平均指標を算出する。
+
+✅ 自分のモデルでfalse negativeの数を減らしたい場合、どの指標に注目すべきか考えられますか?
+
+## モデルのROC曲線を可視化する
+
+これは悪いモデルではありません。精度は80%の範囲で、理想的には、一連の変数が与えられたときにカボチャの色を予測するのに使うことができます。
+
+いわゆる「ROC」スコアを見るために、もう一つの可視化を行ってみましょう。
+
+```python
+from sklearn.metrics import roc_curve, roc_auc_score
+
+y_scores = model.predict_proba(X_test)
+# calculate ROC curve
+fpr, tpr, thresholds = roc_curve(y_test, y_scores[:,1])
+sns.lineplot(x=[0, 1], y=[0, 1])
+sns.lineplot(x=fpr, y=tpr)
+```
+Seaborn を再度使用して、モデルの [受信者操作特性 (Receiving Operating Characteristic)](https://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html?highlight=roc)、つまりROCをプロットします。ROC曲線は、分類器の出力をtrue positiveとfalse positiveの観点から把握するためによく使われ、通常はtrue positive rateをY軸に、false positive rateをX軸にとります。したがって、曲線の急峻さと、対角線と曲線の間のスペースが重要です。できるだけ早く立ち上がって対角線を超えていく曲線が理想的です。今回のケースでは、最初にいくつかfalse positiveが出ますが、その後はラインがきちんと上に伸びて対角線を超えていきます。
+
+
+
+最後に、Scikit-learnの[`roc_auc_score` API](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_auc_score.html?highlight=roc_auc#sklearn.metrics.roc_auc_score) を使って、実際の「Area Under the Curve」(AUC)を計算します。
+
+```python
+auc = roc_auc_score(y_test,y_scores[:,1])
+print(auc)
+```
+結果は`0.6976998904709748`となりました。AUCは0から1の範囲なので、高いスコアが望ましいです。予測が100%正しいモデルはAUCが1になるからです。今回のケースでは、このモデルは_まあまあ良い_と言えます。
+
+今後の分類のレッスンでは、モデルのスコアを向上させるための反復処理の方法を学びます。一旦おめでとうございます。あなたはこの回帰のレッスンを完了しました。
+
+---
+## 🚀チャレンジ
+
+ロジスティック回帰については、まだまだ解き明かすべきことがたくさんあります。しかし、学ぶための最良の方法は、実験することです。この種の分析に適したデータセットを見つけて、それを使ってモデルを構築してみましょう。ヒント:面白いデータセットを探すために[Kaggle](https://www.kaggle.com/search?q=logistic+regression+datasets) を試してみてください。
+
+## [講義後クイズ](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/16/)
+
+## レビュー & 自主学習
+
+ロジスティック回帰の実用的な使い方について、[Stanfordからのこの論文](https://web.stanford.edu/~jurafsky/slp3/5.pdf) の最初の数ページを読んでみてください。これまで学んできた回帰タスクのうち、どちらか一方のタイプに適したタスクについて考えてみてください。何が一番うまくいくでしょうか?
+
+## 課題
+
+[回帰に再挑戦する](./assignment.ja.md)
diff --git a/2-Regression/4-Logistic/translations/README.zh-cn.md b/2-Regression/4-Logistic/translations/README.zh-cn.md
index 52453de5..5de169fb 100644
--- a/2-Regression/4-Logistic/translations/README.zh-cn.md
+++ b/2-Regression/4-Logistic/translations/README.zh-cn.md
@@ -120,7 +120,7 @@ Seaborn提供了一些巧妙的方法来可视化你的数据。例如,你可
sns.swarmplot(x="Color", y="Item Size", data=new_pumpkins)
```
- 
+ 
### 小提琴图
@@ -133,7 +133,7 @@ Seaborn提供了一些巧妙的方法来可视化你的数据。例如,你可
kind="violin", data=new_pumpkins)
```
- 
+ 
✅ 尝试使用其他变量创建此图和其他Seaborn图。
@@ -228,19 +228,15 @@ Seaborn提供了一些巧妙的方法来可视化你的数据。例如,你可
- 如果你的模型将某物预测为南瓜并且它实际上属于“非南瓜”类别,我们将其称为假阴性,由左下角的数字显示。
- 如果你的模型预测某物不是南瓜,并且它实际上属于“非南瓜”类别,我们将其称为真阴性,如右下角的数字所示。
-
-
-> 作者[Jen Looper](https://twitter.com/jenlooper)
-
正如你可能已经猜到的那样,最好有更多的真阳性和真阴性以及较少的假阳性和假阴性,这意味着模型性能更好。
✅ Q:根据混淆矩阵,模型怎么样? A:还不错;有很多真阳性,但也有一些假阴性。
让我们借助混淆矩阵对TP/TN和FP/FN的映射,重新审视一下我们之前看到的术语:
-🎓 准确率:TP/(TP+FN)检索实例中相关实例的分数(例如,哪些标签标记得很好)
+🎓 精确率:TP/(TP + FP) 检索实例中相关实例的分数(例如,哪些标签标记得很好)
-🎓 召回率: TP/(TP + FP) 检索到的相关实例的比例,无论是否标记良好
+🎓 召回率: TP/(TP + FN) 检索到的相关实例的比例,无论是否标记良好
🎓 F1分数: (2 * 准确率 * 召回率)/(准确率 + 召回率) 准确率和召回率的加权平均值,最好为1,最差为0
diff --git a/2-Regression/4-Logistic/translations/assignment.it.md b/2-Regression/4-Logistic/translations/assignment.it.md
new file mode 100644
index 00000000..7b9b2016
--- /dev/null
+++ b/2-Regression/4-Logistic/translations/assignment.it.md
@@ -0,0 +1,10 @@
+# Riprovare un po' di Regressione
+
+## Istruzioni
+
+Nella lezione è stato usato un sottoinsieme dei dati della zucca. Ora si torna ai dati originali e si prova a usarli tutti, puliti e standardizzati, per costruire un modello di regressione logistica.
+## Rubrica
+
+| Criteri | Ottimo | Adeguato | Necessita miglioramento |
+| -------- | ----------------------------------------------------------------------- | ------------------------------------------------------------ | ----------------------------------------------------------- |
+| | Un notebook viene presentato con un modello ben spiegato con buone prestazioni | Un notebook viene presentato con un modello dalle prestazioni minime | Un notebook viene presentato con un modello con scarse o nessuna prestazione |
diff --git a/2-Regression/4-Logistic/translations/assignment.ja.md b/2-Regression/4-Logistic/translations/assignment.ja.md
new file mode 100644
index 00000000..6c838173
--- /dev/null
+++ b/2-Regression/4-Logistic/translations/assignment.ja.md
@@ -0,0 +1,11 @@
+# 回帰に再挑戦する
+
+## 課題の指示
+
+レッスンでは、カボチャのデータのサブセットを使用しました。今度は、元のデータに戻って、ロジスティック回帰モデルを構築するために、整形して標準化したデータをすべて使ってみましょう。
+
+## ルーブリック
+
+| 指標 | 模範的 | 適切 | 要改善 |
+| -------- | ----------------------------------------------------------------------- | ------------------------------------------------------------ | ----------------------------------------------------------- |
+| | 説明がわかりやすく、性能の良いモデルが含まれたノートブック| 最小限の性能しか発揮できないモデルが含まれたノートブック | 性能の劣るモデルや、何もないモデルが含まれたノートブック |
diff --git a/2-Regression/4-Logistic/translations/assignment.zh-cn.md b/2-Regression/4-Logistic/translations/assignment.zh-cn.md
new file mode 100644
index 00000000..8dc55af3
--- /dev/null
+++ b/2-Regression/4-Logistic/translations/assignment.zh-cn.md
@@ -0,0 +1,11 @@
+# 再探回归模型
+
+## 说明
+
+在这节课中,你使用了 pumpkin 数据集的子集。现在,让我们回到原始数据,并尝试使用所有数据。经过了数据清理和标准化,建立一个逻辑回归模型。
+
+## 评判标准
+
+| 标准 | 优秀 | 中规中矩 | 仍需努力 |
+| -------- | ----------------------------------------------------------------------- | ------------------------------------------------------------ | ----------------------------------------------------------- |
+| | 用notebook呈现了一个解释性和性能良好的模型 | 用notebook呈现了一个性能一般的模型 | 用notebook呈现了一个性能差的模型或根本没有模型 |
diff --git a/2-Regression/translations/README.fr.md b/2-Regression/translations/README.fr.md
new file mode 100644
index 00000000..1b252f3f
--- /dev/null
+++ b/2-Regression/translations/README.fr.md
@@ -0,0 +1,33 @@
+# Modèles de régression pour le machine learning
+## Sujet régional : Modèles de régression des prix des citrouilles en Amérique du Nord 🎃
+
+En Amérique du Nord, les citrouilles sont souvent sculptées en visages effrayants pour Halloween. Découvrons-en plus sur ces légumes fascinants!
+
+
+> Photo de Beth Teutschmann sur Unsplash
+
+## Ce que vous apprendrez
+
+Les leçons de cette section couvrent les types de régression dans le contexte du machine learning. Les modèles de régression peuvent aider à déterminer la _relation_ entre les variables. Ce type de modèle peut prédire des valeurs telles que la longueur, la température ou l'âge, découvrant ainsi les relations entre les variables lors de l'analyse des points de données.
+
+Dans cette série de leçons, vous découvrirez la différence entre la régression linéaire et la régression logistique, et quand vous devriez utiliser l'une ou l'autre.
+
+Dans ce groupe de leçons, vous serez préparé afin de commencer les tâches de machine learning, y compris la configuration de Visual Studio Code pour gérer les blocs-notes, l'environnement commun pour les scientifiques des données. Vous découvrirez Scikit-learn, une bibliothèque pour le machine learning, et vous construirez vos premiers modèles, en vous concentrant sur les modèles de régression dans ce chapitre.
+
+> Il existe des outils low-code utiles qui peuvent vous aider à apprendre à travailler avec des modèles de régression. Essayez [Azure ML pour cette tâche](https://docs.microsoft.com/learn/modules/create-regression-model-azure-machine-learning-designer/?WT.mc_id=academic-15963-cxa)
+
+### Cours
+
+1. [Outils du métier](1-Tools/translations/README.fr.md)
+2. [Gestion des données](2-Data/translations/README.fr.md)
+3. [Régression linéaire et polynomiale](3-Linear/translations/README.fr.md)
+4. [Régression logistique](4-Logistic/translations/README.fr.md)
+
+---
+### Crédits
+
+"ML avec régression" a été écrit avec ♥️ par [Jen Looper](https://twitter.com/jenlooper)
+
+♥️ Les contributeurs du quiz incluent : [Muhammad Sakib Khan Inan](https://twitter.com/Sakibinan) et [Ornella Altunyan](https://twitter.com/ornelladotcom)
+
+L'ensemble de données sur la citrouille est suggéré par [ce projet sur Kaggle](https://www.kaggle.com/usda/a-year-of-pumpkin-prices) et ses données proviennent des [Rapports standard des marchés terminaux des cultures spécialisées](https://www.marketnews.usda.gov/mnp/fv-report-config-step1?type=termPrice) distribué par le département américain de l'Agriculture. Nous avons ajouté quelques points autour de la couleur en fonction de la variété pour normaliser la distribution. Ces données sont dans le domaine public.
diff --git a/2-Regression/translations/README.it.md b/2-Regression/translations/README.it.md
new file mode 100644
index 00000000..c6e957f9
--- /dev/null
+++ b/2-Regression/translations/README.it.md
@@ -0,0 +1,34 @@
+# Modelli di regressione per machine learning
+
+## Argomento regionale: modelli di Regressione per i prezzi della zucca in Nord America 🎃
+
+In Nord America, le zucche sono spesso intagliate in facce spaventose per Halloween. Si scoprirà di più su queste affascinanti verdure!
+
+
+> Foto di Beth Teutschmann su Unsplash
+
+## Cosa si imparerà
+
+Le lezioni in questa sezione riguardano i tipi di regressione nel contesto di machine learning. I modelli di regressione possono aiutare a determinare la _relazione_ tra le variabili. Questo tipo di modello può prevedere valori come lunghezza, temperatura o età, scoprendo così le relazioni tra le variabili mentre analizza i punti dati.
+
+In questa serie di lezioni si scoprirà la differenza tra regressione lineare e regressione logistica e quando si dovrebbe usare l'una o l'altra.
+
+In questo gruppo di lezioni si imposterà una configurazione per iniziare le attività di machine learning, inclusa la configurazione di Visual Studio Code per gestire i notebook, l'ambiente comune per i data scientist. Si scoprirà Scikit-learn, una libreria per machine learning, e si creeranno i primi modelli, concentrandosi in questo capitolo sui modelli di Regressione.
+
+> Esistono utili strumenti a basso codice che possono aiutare a imparare a lavorare con i modelli di regressione. Si provi [Azure Machine Learning per questa attività](https://docs.microsoft.com/learn/modules/create-regression-model-azure-machine-learning-designer/?WT.mc_id=academic-15963-cxa)
+
+### Lezioni
+
+1. [Gli Attrezzi Necessari](../1-Tools/translations/README.it.md)
+2. [Gestione dati](../2-Data/translations/README.it.md)
+3. [Regressione lineare e polinomiale](../3-Linear/translations/README.it.md)
+4. [Regressione logistica](../4-Logistic/translations/README.it.md)
+
+---
+### Crediti
+
+"ML con regressione" scritto con ♥️ da [Jen Looper](https://twitter.com/jenlooper)
+
+♥️ I collaboratori del quiz includono: [Muhammad Sakib Khan Inan](https://twitter.com/Sakibinan) e [Ornella Altunyan](https://twitter.com/ornelladotcom)
+
+L'insieme di dati relativi alla zucca è suggerito da [questo progetto su](https://www.kaggle.com/usda/a-year-of-pumpkin-prices) Kaggle e i suoi dati provengono dai [Rapporti Standard sui Mercati Terminali delle Colture Speciali](https://www.marketnews.usda.gov/mnp/fv-report-config-step1?type=termPrice) distribuiti dal Dipartimento dell'Agricoltura degli Stati Uniti. Sono stati aggiunti alcuni punti intorno al colore in base alla varietà per normalizzare la distribuzione. Questi dati sono di pubblico dominio.
diff --git a/2-Regression/translations/README.zh-cn.md b/2-Regression/translations/README.zh-cn.md
new file mode 100644
index 00000000..f7c511e6
--- /dev/null
+++ b/2-Regression/translations/README.zh-cn.md
@@ -0,0 +1,34 @@
+# 机器学习中的回归模型
+## 本节主题: 北美南瓜价格的回归模型 🎃
+
+在北美,南瓜经常在万圣节被刻上吓人的鬼脸。让我们来深入研究一下这种奇妙的蔬菜
+
+
+> Foto oleh Beth Teutschmann di Unsplash
+
+## 你会学到什么
+
+这节的课程包括机器学习领域中的多种回归模型。回归模型可以明确多种变量间的_关系_。这种模型可以用来预测类似长度、温度和年龄之类的值, 通过分析数据点来揭示变量之间的关系。
+
+在本节的一系列课程中,你会学到线性回归和逻辑回归之间的区别,并且你将知道对于特定问题如何在这两种模型中进行选择
+
+在这组课程中,你会准备好包括为管理笔记而设置VS Code、配置数据科学家常用的环境等机器学习的初始任务。你会开始上手Scikit-learn学习项目(一个机器学习的百科),并且你会以回归模型为主构建起你的第一种机器学习模型
+
+> 这里有一些代码难度较低但很有用的工具可以帮助你学习使用回归模型。 试一下 [Azure ML for this task](https://docs.microsoft.com/learn/modules/create-regression-model-azure-machine-learning-designer/?WT.mc_id=academic-15963-cxa)
+
+
+### Lessons
+
+1. [交易的工具](../1-Tools/translations/README.zh-cn.md)
+2. [管理数据](../2-Data/translations/README.zh-cn.md)
+3. [线性和多项式回归](../3-Linear/translations/README.zh-cn.md)
+4. [逻辑回归](../4-Logistic/translations/README.zh-cn.md)
+
+---
+### Credits
+
+"机器学习中的回归" 由[Jen Looper](https://twitter.com/jenlooper)♥️ 撰写
+
+♥️ 测试的贡献者: [Muhammad Sakib Khan Inan](https://twitter.com/Sakibinan) 和 [Ornella Altunyan](https://twitter.com/ornelladotcom)
+
+南瓜数据集受此启发 [this project on Kaggle](https://www.kaggle.com/usda/a-year-of-pumpkin-prices) 并且其数据源自 [Specialty Crops Terminal Markets Standard Reports](https://www.marketnews.usda.gov/mnp/fv-report-config-step1?type=termPrice) 由美国农业部上传分享。我们根据种类添加了围绕颜色的一些数据点。这些数据处在公共的域名上。
diff --git a/3-Web-App/1-Web-App/README.md b/3-Web-App/1-Web-App/README.md
index 6150aece..2e409e78 100644
--- a/3-Web-App/1-Web-App/README.md
+++ b/3-Web-App/1-Web-App/README.md
@@ -1,6 +1,6 @@
# Build a Web App to use a ML Model
-In this lesson, you will train an ML model on a data set that's out of this world: _UFO sightings over the past century_, sourced from [NUFORC's database](https://www.nuforc.org).
+In this lesson, you will train an ML model on a data set that's out of this world: _UFO sightings over the past century_, sourced from NUFORC's database.
You will learn:
@@ -165,7 +165,7 @@ Now you can build a Flask app to call your model and return similar results, but
web-app/
static/
css/
- templates/
+ templates/
notebook.ipynb
ufo-model.pkl
```
@@ -187,7 +187,7 @@ Now you can build a Flask app to call your model and return similar results, but
cd web-app
```
-1. In your terminal type `pip install`, to install the libraries listed in _reuirements.txt_:
+1. In your terminal type `pip install`, to install the libraries listed in _requirements.txt_:
```bash
pip install -r requirements.txt
@@ -199,7 +199,7 @@ Now you can build a Flask app to call your model and return similar results, but
2. Create **index.html** in _templates_ directory.
3. Create **styles.css** in _static/css_ directory.
-1. Build out the _styles.css__ file with a few styles:
+1. Build out the _styles.css_ file with a few styles:
```css
body {
diff --git a/3-Web-App/1-Web-App/translations/README.it.md b/3-Web-App/1-Web-App/translations/README.it.md
new file mode 100644
index 00000000..82454852
--- /dev/null
+++ b/3-Web-App/1-Web-App/translations/README.it.md
@@ -0,0 +1,347 @@
+# Creare un'app web per utilizzare un modello ML
+
+In questa lezione, si addestrerà un modello ML su un insieme di dati fuori dal mondo: _avvistamenti di UFO nel secolo scorso_, provenienti dal [database di NUFORC](https://www.nuforc.org).
+
+Si imparerà:
+
+- Come serializzare/deserializzare un modello addestrato
+- Come usare quel modello in un'app Flask
+
+Si continuerà a utilizzare il notebook per pulire i dati e addestrare il modello, ma si può fare un ulteriore passo avanti nel processo esplorando l'utilizzo del modello direttamente in un'app web.
+
+Per fare ciò, è necessario creare un'app Web utilizzando Flask.
+
+## [Quiz Pre-Lezione](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/17/)
+
+## Costruire un'app
+
+Esistono diversi modi per creare app Web per utilizzare modelli di machine learning. L'architettura web può influenzare il modo in cui il modello viene addestrato. Si immagini di lavorare in un'azienda nella quale il gruppo di data science ha addestrato un modello che va utilizzato in un'app.
+
+### Considerazioni
+
+Ci sono molte domande da porsi:
+
+- **È un'app web o un'app su dispositivo mobile?** Se si sta creando un'app su dispositivo mobile o si deve usare il modello in un contesto IoT, ci si può avvalere [di TensorFlow Lite](https://www.tensorflow.org/lite/) e usare il modello in un'app Android o iOS.
+- **Dove risiederà il modello?** È utilizzato in cloud o in locale?
+- **Supporto offline**. L'app deve funzionare offline?
+- **Quale tecnologia è stata utilizzata per addestrare il modello?** La tecnologia scelta può influenzare gli strumenti che è necessario utilizzare.
+ - **Utilizzare** TensorFlow. Se si sta addestrando un modello utilizzando TensorFlow, ad esempio, tale ecosistema offre la possibilità di convertire un modello TensorFlow per l'utilizzo in un'app Web utilizzando [TensorFlow.js](https://www.tensorflow.org/js/).
+  - **Utilizzare PyTorch**. Se si sta costruendo un modello utilizzando una libreria come [PyTorch](https://pytorch.org/), si ha la possibilità di esportarlo in formato [ONNX](https://onnx.ai/) (Open Neural Network Exchange) per l'utilizzo in app Web JavaScript che possono sfruttare il [motore di esecuzione Onnx](https://www.onnxruntime.ai/). Questa opzione verrà esplorata in una lezione futura per un modello addestrato con Scikit-learn.
+  - **Utilizzo di Lobe.ai o Azure Custom Vision**. Se si sta usando un sistema ML SaaS (Software as a Service) come [Lobe.ai](https://lobe.ai/) o [Azure Custom Vision](https://azure.microsoft.com/services/cognitive-services/custom-vision-service/?WT.mc_id=academic-15963-cxa) per addestrare un modello, questo tipo di software fornisce modi per esportare il modello per molte piattaforme, inclusa la creazione di un'API su misura da interrogare nel cloud dalla propria applicazione online.
+
+Si ha anche l'opportunità di creare un'intera app Web Flask in grado di addestrare il modello stesso in un browser Web. Questo può essere fatto anche usando TensorFlow.js in un contesto JavaScript.
+
+Per questo scopo, poiché si è lavorato con i notebook basati su Python, verranno esplorati i passaggi necessari per esportare un modello addestrato da tale notebook in un formato leggibile da un'app Web creata in Python.
+
+## Strumenti
+
+Per questa attività sono necessari due strumenti: Flask e Pickle, entrambi eseguiti su Python.
+
+✅ Cos'è [Flask](https://palletsprojects.com/p/flask/)? Definito come un "micro-framework" dai suoi creatori, Flask fornisce le funzionalità di base dei framework web utilizzando Python e un motore di template per creare pagine web. Si dia un'occhiata a [questo modulo di apprendimento](https://docs.microsoft.com/learn/modules/python-flask-build-ai-web-app?WT.mc_id=academic-15963-cxa) per esercitarsi a sviluppare con Flask.
+
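+A titolo indicativo, un'app Flask minima (un semplice schizzo, non il codice di questa lezione) ha questa forma:
+
+```python
+from flask import Flask
+
+app = Flask(__name__)
+
+@app.route("/")
+def home():
+    # Una sola rotta che restituisce del testo
+    return "Ciao da Flask!"
+
+if __name__ == "__main__":
+    app.run(debug=True)
+```
+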
+✅ Cos'è [Pickle](https://docs.python.org/3/library/pickle.html)? Pickle 🥒 è un modulo Python che serializza e de-serializza la struttura di un oggetto Python. Quando si utilizza pickle in un modello, si serializza o si appiattisce la sua struttura per l'uso sul web. Cautela: pickle non è intrinsecamente sicuro, quindi si faccia attenzione se viene chiesto di de-serializzare un file. Un file creato con pickle ha il suffisso `.pkl`.
+
+## Esercizio: pulire i dati
+
+In questa lezione verranno utilizzati i dati di 80.000 avvistamenti UFO, raccolti dal Centro Nazionale per gli Avvistamenti di UFO [NUFORC](https://nuforc.org) (The National UFO Reporting Center). Questi dati hanno alcune descrizioni interessanti di avvistamenti UFO, ad esempio:
+
+- **Descrizione di esempio lunga**. "Un uomo emerge da un raggio di luce che di notte brilla su un campo erboso e corre verso il parcheggio della Texas Instruments".
+- **Descrizione di esempio breve**. "le luci ci hanno inseguito".
+
+Il foglio di calcolo [ufos.csv](../data/ufos.csv) include colonne su città (`city`), stato (`state`) e nazione (`country`) in cui è avvenuto l'avvistamento, la forma (`shape`) dell'oggetto e la sua latitudine (`latitude`) e longitudine (`longitude`).
+
+Nel [notebook](../notebook.ipynb) vuoto incluso in questa lezione:
+
+1. Importare `pandas`, `matplotlib` e `numpy` come fatto nelle lezioni precedenti e importare il foglio di calcolo ufos.csv. Si può dare un'occhiata a un insieme di dati campione:
+
+ ```python
+ import pandas as pd
+ import numpy as np
+
+ ufos = pd.read_csv('../data/ufos.csv')
+ ufos.head()
+ ```
+
+1. Convertire i dati ufos in un piccolo dataframe con nuove intestazioni. Controllare i valori univoci nel campo `Country`.
+
+ ```python
+ ufos = pd.DataFrame({'Seconds': ufos['duration (seconds)'], 'Country': ufos['country'],'Latitude': ufos['latitude'],'Longitude': ufos['longitude']})
+
+ ufos.Country.unique()
+ ```
+
+1. Ora si può ridurre la quantità di dati da gestire eliminando i valori nulli e importando solo gli avvistamenti tra 1 e 60 secondi:
+
+ ```python
+ ufos.dropna(inplace=True)
+
+ ufos = ufos[(ufos['Seconds'] >= 1) & (ufos['Seconds'] <= 60)]
+
+ ufos.info()
+ ```
+
+1. Importare la libreria `LabelEncoder` di Scikit-learn per convertire i valori di testo per le nazioni in un numero:
+
+   ✅ LabelEncoder codifica i dati in ordine alfabetico (si veda anche lo schizzo alla fine di questo esercizio)
+
+ ```python
+ from sklearn.preprocessing import LabelEncoder
+
+ ufos['Country'] = LabelEncoder().fit_transform(ufos['Country'])
+
+ ufos.head()
+ ```
+
+ I dati dovrebbero assomigliare a questo:
+
+ ```output
+ Seconds Country Latitude Longitude
+ 2 20.0 3 53.200000 -2.916667
+ 3 20.0 4 28.978333 -96.645833
+ 14 30.0 4 35.823889 -80.253611
+ 23 60.0 4 45.582778 -122.352222
+ 24 3.0 3 51.783333 -0.783333
+ ```
+
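+Per verificare a quale numero corrisponde ciascuna nazione, ecco uno schizzo indicativo (i valori di `Country` usati qui sono quelli rimasti dopo la pulizia dei dati di cui sopra):
+
+```python
+from sklearn.preprocessing import LabelEncoder
+
+encoder = LabelEncoder().fit(['au', 'ca', 'de', 'gb', 'us'])
+for paese, codice in zip(encoder.classes_, encoder.transform(encoder.classes_)):
+    print(paese, codice)  # au 0, ca 1, de 2, gb 3, us 4
+```
+
+Il codice `3` corrisponde quindi a `gb` (Regno Unito), come si vedrà più avanti con la previsione del modello.
+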
+## Esercizio: costruire il proprio modello
+
+Ora ci si può preparare ad addestrare un modello dividendo i dati tra gruppo di addestramento e gruppo di test.
+
+1. Selezionare le tre caratteristiche su cui allenare il modello come vettore X, mentre il vettore y sarà `Country`. Si deve essere in grado di inserire secondi (`Seconds`), latitudine (`Latitude`) e longitudine (`Longitude`) e ottenere in cambio un ID nazione.
+
+ ```python
+ from sklearn.model_selection import train_test_split
+
+ Selected_features = ['Seconds','Latitude','Longitude']
+
+ X = ufos[Selected_features]
+ y = ufos['Country']
+
+ X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
+ ```
+
+1. Addestrare il modello usando la regressione logistica:
+
+ ```python
+ from sklearn.metrics import accuracy_score, classification_report
+ from sklearn.linear_model import LogisticRegression
+ model = LogisticRegression()
+ model.fit(X_train, y_train)
+ predictions = model.predict(X_test)
+
+ print(classification_report(y_test, predictions))
+ print('Predicted labels: ', predictions)
+ print('Accuracy: ', accuracy_score(y_test, predictions))
+ ```
+
+L'accuratezza non è male **(circa il 95%)**: non sorprende, dato che `Country` e `Latitude`/`Longitude` sono correlati.
+
+Il modello creato non è molto rivoluzionario in quanto si dovrebbe essere in grado di dedurre una nazione (`Country`) dalla sua latitudine e longitudine (`Latitude` e `Longitude`), ma è un buon esercizio provare ad allenare dai dati grezzi che sono stati puliti ed esportati, e quindi utilizzare questo modello in una app web.
+
+## Esercizio: usare pickle con il modello
+
+Ora è il momento di utilizzare _pickle_ con il modello! Lo si può fare in poche righe di codice. Una volta _serializzato con pickle_, caricare il modello e testarlo rispetto a un array di dati di esempio contenente valori per secondi, latitudine e longitudine:
+
+```python
+import pickle
+model_filename = 'ufo-model.pkl'
+pickle.dump(model, open(model_filename,'wb'))
+
+model = pickle.load(open('ufo-model.pkl','rb'))
+print(model.predict([[50,44,-12]]))
+```
+
+Il modello restituisce **"3"**, che è il codice nazione per il Regno Unito. Fantastico! 👽
+
+## Esercizio: creare un'app Flask
+
+Ora si può creare un'app Flask per chiamare il modello e restituire risultati simili, ma in un modo visivamente più gradevole.
+
+1. Iniziare creando una cartella chiamata **web-app** a livello del file _notebook.ipynb_ dove risiede il file _ufo-model.pkl_.
+
+1. In quella cartella creare altre tre cartelle: **static**, con una cartella **css** al suo interno e **templates**. Ora si dovrebbero avere i seguenti file e directory:
+
+ ```output
+ web-app/
+ static/
+ css/
+ templates/
+ notebook.ipynb
+ ufo-model.pkl
+ ```
+
+ ✅ Fare riferimento alla cartella della soluzione per una visualizzazione dell'app finita.
+
+1. Il primo file da creare nella cartella _web-app_ è il file **requirements.txt**. Come _package.json_ in un'app JavaScript, questo file elenca le dipendenze richieste dall'app. In **requirements.txt** aggiungere le righe:
+
+ ```text
+ scikit-learn
+ pandas
+ numpy
+ flask
+ ```
+
+1. Ora, per far funzionare questo file, portarsi nella cartella _web-app_:
+
+ ```bash
+ cd web-app
+ ```
+
+1. Aprire una finestra di terminale dove risiede requirements.txt e digitare `pip install`, per installare le librerie elencate in _requirements.txt_:
+
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+1. Ora si è pronti per creare altri tre file per completare l'app:
+
+ 1. Creare **app.py** nella directory radice.
+ 2. Creare **index.html** nella directory _templates_.
+   3. Creare **styles.css** nella directory _static/css_.
+
+1. Inserire nel file _styles.css_ alcuni stili:
+
+ ```css
+ body {
+ width: 100%;
+ height: 100%;
+ font-family: 'Helvetica';
+ background: black;
+ color: #fff;
+ text-align: center;
+ letter-spacing: 1.4px;
+ font-size: 30px;
+ }
+
+ input {
+ min-width: 150px;
+ }
+
+ .grid {
+ width: 300px;
+ border: 1px solid #2d2d2d;
+ display: grid;
+ justify-content: center;
+ margin: 20px auto;
+ }
+
+ .box {
+ color: #fff;
+ background: #2d2d2d;
+ padding: 12px;
+ display: inline-block;
+ }
+ ```
+
+1. Quindi, creare il file _index.html_ :
+
+ ```html
+    <!DOCTYPE html>
+    <html>
+    <head>
+      <meta charset="UTF-8">
+      <title>🛸 UFO Appearance Prediction! 👽</title>
+      <link rel="stylesheet" href="{{ url_for('static', filename='css/styles.css') }}">
+    </head>
+
+    <body>
+      <div class="grid">
+
+        <div class="box">
+
+          <p>According to the number of seconds, latitude and longitude, which country is likely to have reported seeing a UFO?</p>
+
+          <form action="{{ url_for('predict') }}" method="post">
+            <input type="number" name="seconds" placeholder="Seconds" required="required" min="0" max="60" />
+            <input type="text" name="latitude" placeholder="Latitude" required="required" />
+            <input type="text" name="longitude" placeholder="Longitude" required="required" />
+            <button type="submit" class="btn">Predict country where the UFO is seen</button>
+          </form>
+
+          <p>{{ prediction_text }}</p>
+
+        </div>
+
+      </div>
+
+    </body>
+    </html>
+ ```
+
+ Dare un'occhiata al template di questo file. Notare la sintassi con le parentesi graffe attorno alle variabili che verranno fornite dall'app, come il testo di previsione: `{{}}`. C'è anche un modulo che invia una previsione alla rotta `/predict`.
+
+ Infine, si è pronti per creare il file python che guida il consumo del modello e la visualizzazione delle previsioni:
+
+1. In `app.py` aggiungere:
+
+ ```python
+ import numpy as np
+ from flask import Flask, request, render_template
+ import pickle
+
+ app = Flask(__name__)
+
+ model = pickle.load(open("../ufo-model.pkl", "rb"))
+
+
+ @app.route("/")
+ def home():
+ return render_template("index.html")
+
+
+ @app.route("/predict", methods=["POST"])
+ def predict():
+
+ int_features = [int(x) for x in request.form.values()]
+ final_features = [np.array(int_features)]
+ prediction = model.predict(final_features)
+
+ output = prediction[0]
+
+ countries = ["Australia", "Canada", "Germany", "UK", "US"]
+
+ return render_template(
+ "index.html", prediction_text="Likely country: {}".format(countries[output])
+ )
+
+
+ if __name__ == "__main__":
+ app.run(debug=True)
+ ```
+
+ > 💡 Suggerimento: quando si aggiunge [`debug=True`](https://www.askpython.com/python-modules/flask/flask-debug-mode) durante l'esecuzione dell'app web utilizzando Flask, qualsiasi modifica apportata all'applicazione verrà recepita immediatamente senza la necessità di riavviare il server. Attenzione! Non abilitare questa modalità in un'app di produzione.
+
+Se si esegue `python app.py` o `python3 app.py`, il server web si avvia localmente e si può compilare un breve modulo per ottenere una risposta alla domanda scottante su dove sono stati avvistati gli UFO!
+
+Prima di farlo, dare un'occhiata alle parti di `app.py`:
+
+1. Innanzitutto, le dipendenze vengono caricate e l'app si avvia.
+1. Poi il modello viene importato.
+1. Infine index.html viene visualizzato sulla rotta home.
+
+Sulla rotta `/predict` , accadono diverse cose quando il modulo viene inviato:
+
+1. Le variabili del modulo vengono raccolte e convertite in un array numpy. Vengono quindi inviate al modello e viene restituita una previsione.
+2. Le nazioni che si vogliono visualizzare vengono nuovamente esposte come testo leggibile ricavato dal loro codice paese previsto e tale valore viene inviato a index.html per essere visualizzato nel template della pagina web.
+
+Usare un modello in questo modo, con Flask e un modello serializzato, è relativamente semplice. La cosa più difficile è capire che forma hanno i dati da inviare al modello per ottenere una previsione: tutto dipende da come è stato addestrato il modello. Questo modello richiede tre punti dati in ingresso per restituire una previsione.
+
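+Per provare la rotta `/predict` anche senza passare dal browser, ecco uno schizzo con la libreria `requests` (si assume che il server sia in esecuzione in locale sulla porta predefinita 5000 e che i nomi dei campi del modulo siano `seconds`, `latitude` e `longitude`):
+
+```python
+import requests
+
+risposta = requests.post(
+    "http://127.0.0.1:5000/predict",
+    data={"seconds": 50, "latitude": 44, "longitude": -12},
+)
+print(risposta.status_code)  # 200 se la previsione è andata a buon fine
+```
+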
+In un ambiente professionale, si può vedere quanto sia necessaria una buona comunicazione tra le persone che addestrano il modello e coloro che lo consumano in un'app web o su dispositivo mobile. In questo caso, si ricoprono entrambi i ruoli!
+
+---
+
+## 🚀 Sfida
+
+Invece di lavorare su un notebook e importare il modello nell'app Flask, si può addestrare il modello direttamente nell'app Flask! Provare a convertire il codice Python nel notebook, magari dopo che i dati sono stati puliti, per addestrare il modello dall'interno dell'app su un percorso chiamato `/train`. Quali sono i pro e i contro nel seguire questo metodo?
+
+## [Quiz post-lezione](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/18/)
+
+## Revisione e Auto Apprendimento
+
+Esistono molti modi per creare un'app web che utilizzi modelli ML. Elencare dei modi in cui si potrebbe utilizzare JavaScript o Python per creare un'app web che sfrutti il machine learning. Considerare l'architettura: il modello dovrebbe rimanere nell'app o risiedere nel cloud? In quest'ultimo caso, come accedervi? Disegnare un modello architetturale per una soluzione web ML applicata.
+
+## Compito
+
+[Provare un modello diverso](assignment.it.md)
+
+
diff --git a/3-Web-App/1-Web-App/translations/README.zh-cn.md b/3-Web-App/1-Web-App/translations/README.zh-cn.md
new file mode 100644
index 00000000..9110f2b8
--- /dev/null
+++ b/3-Web-App/1-Web-App/translations/README.zh-cn.md
@@ -0,0 +1,347 @@
+# 构建使用ML模型的Web应用程序
+
+在本课中,你将在一个数据集上训练一个ML模型,这个数据集来自世界各地:过去一个世纪的UFO目击事件,来源于[NUFORC的数据库](https://www.nuforc.org)。
+
+你将学会:
+
+- 如何“pickle”一个训练有素的模型
+- 如何在Flask应用程序中使用该模型
+
+我们将继续使用notebook来清理数据和训练我们的模型,但你可以进一步探索在web应用程序中使用模型。
+
+为此,你需要使用Flask构建一个web应用程序。
+
+## [课前测](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/17/)
+
+## 构建应用程序
+
+有多种方法可以构建Web应用程序以使用机器学习模型。你的web架构可能会影响你的模型训练方式。想象一下,你在一家企业工作,其中数据科学小组已经训练了他们希望你在应用程序中使用的模型。
+
+### 注意事项
+
+你需要问很多问题:
+
+- **它是web应用程序还是移动应用程序?**如果你正在构建移动应用程序或需要在物联网环境中使用模型,你可以使用[TensorFlow Lite](https://www.tensorflow.org/lite/)并在Android或iOS应用程序中使用该模型。
+- **模型放在哪里?**在云端还是本地?
+- **离线支持**。该应用程序是否必须离线工作?
+- **使用什么技术来训练模型?**所选的技术可能会影响你需要使用的工具。
+  - **使用TensorFlow**。例如,如果你正在使用TensorFlow训练模型,则该生态系统提供了使用[TensorFlow.js](https://www.tensorflow.org/js/)转换TensorFlow模型以便在Web应用程序中使用的能力。
+  - **使用PyTorch**。如果你使用[PyTorch](https://pytorch.org/)等库构建模型,则可以选择将其导出为[ONNX](https://onnx.ai/)(开放神经网络交换)格式,供可以使用[Onnx Runtime](https://www.onnxruntime.ai/)的JavaScript Web应用程序使用。此选项将在以后针对用Scikit-learn训练的模型的课程中进行探讨。
+  - **使用Lobe.ai或Azure自定义视觉**。如果你使用ML SaaS(软件即服务)系统,例如[Lobe.ai](https://lobe.ai/)或[Azure Custom Vision](https://azure.microsoft.com/services/cognitive-services/custom-vision-service/?WT.mc_id=academic-15963-cxa)来训练模型,这种类型的软件提供了为多种平台导出模型的方法,包括构建一个定制API,供在线应用程序在云端查询。
+
+你还有机会构建一个完整的Flask Web应用程序,该应用程序本身就能在Web浏览器中训练模型。这也可以在JavaScript上下文中使用TensorFlow.js来完成。
+
+出于我们的目的,既然我们一直在使用基于Python的notebook,那么就让我们探讨一下将经过训练的模型从notebook导出为Python构建的web应用程序可读的格式所需要采取的步骤。
+
+## 工具
+
+对于此任务,你需要两个工具:Flask和Pickle,它们都在Python上运行。
+
+✅ 什么是 [Flask](https://palletsprojects.com/p/flask/)? Flask被其创建者定义为“微框架”,它提供了使用Python和模板引擎构建网页的Web框架的基本功能。看看[本学习单元](https://docs.microsoft.com/learn/modules/python-flask-build-ai-web-app?WT.mc_id=academic-15963-cxa)练习使用Flask构建应用程序。
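+
+一个最小的Flask应用草图(仅作演示,与本课的应用无关):
+
+```python
+from flask import Flask
+
+app = Flask(__name__)
+
+
+@app.route("/")
+def hello():
+    # 访问根路由时返回一段文本
+    return "Hello, Flask!"
+
+
+if __name__ == "__main__":
+    app.run()
+```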
+
+✅ 什么是[Pickle](https://docs.python.org/3/library/pickle.html)?Pickle🥒是一个Python模块,用于序列化和反序列化Python对象结构。当你“pickle”一个模型时,你将其结构序列化或展平,以便在Web上使用。小心:pickle本质上并不安全,所以在被提示“un-pickle”一个文件时要小心。pickle生成的文件以`.pkl`为后缀。
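+
+下面是一个最小的示例(对象与文件名均为假设),演示pickle的序列化和反序列化:
+
+```python
+import pickle
+
+data = {"seconds": 20, "country": "uk"}     # 任意Python对象(示例)
+
+with open("example.pkl", "wb") as f:        # 序列化(“pickle”)到文件
+    pickle.dump(data, f)
+
+with open("example.pkl", "rb") as f:        # 反序列化(“un-pickle”)
+    restored = pickle.load(f)
+
+print(restored)
+```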
+
+## 练习 - 清理你的数据
+
+在本课中,你将使用由 [NUFORC](https://nuforc.org)(国家 UFO 报告中心)收集的80,000次UFO目击数据。这些数据对UFO目击事件有一些有趣的描述,例如:
+
+- **详细描述**。"一名男子从夜间照射在草地上的光束中出现,他朝德克萨斯仪器公司的停车场跑去"。
+- **简短描述**。 “灯光追着我们”。
+
+[ufos.csv](./data/ufos.csv)电子表格包括有关目击事件发生的`city`、`state`和`country`、对象的`shape`及其`latitude`和`longitude`的列。
+
+在包含在本课中的空白[notebook](notebook.ipynb)中:
+
+1. 像在之前的课程中一样导入`pandas`、`matplotlib`和`numpy`,然后导入ufos电子表格。你可以查看一个示例数据集:
+
+ ```python
+ import pandas as pd
+ import numpy as np
+
+ ufos = pd.read_csv('../data/ufos.csv')
+ ufos.head()
+ ```
+
+2. 将ufos数据转换为带有新标题的小dataframe。检查`country`字段中的唯一值。
+
+ ```python
+ ufos = pd.DataFrame({'Seconds': ufos['duration (seconds)'], 'Country': ufos['country'],'Latitude': ufos['latitude'],'Longitude': ufos['longitude']})
+
+ ufos.Country.unique()
+ ```
+
+3. 现在,你可以通过删除任何空值并仅导入1-60秒之间的目击数据来减少我们需要处理的数据量:
+
+ ```python
+ ufos.dropna(inplace=True)
+
+ ufos = ufos[(ufos['Seconds'] >= 1) & (ufos['Seconds'] <= 60)]
+
+ ufos.info()
+ ```
+
+4. 导入Scikit-learn的`LabelEncoder`库,将国家的文本值转换为数字:
+
+ ✅ LabelEncoder按字母顺序编码数据
+
+ ```python
+ from sklearn.preprocessing import LabelEncoder
+
+ ufos['Country'] = LabelEncoder().fit_transform(ufos['Country'])
+
+ ufos.head()
+ ```
+
+ 你的数据应如下所示:
+
+ ```output
+ Seconds Country Latitude Longitude
+ 2 20.0 3 53.200000 -2.916667
+ 3 20.0 4 28.978333 -96.645833
+ 14 30.0 4 35.823889 -80.253611
+ 23 60.0 4 45.582778 -122.352222
+ 24 3.0 3 51.783333 -0.783333
+ ```
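+
+    如果想知道每个数字编码对应哪个国家,可以保留编码器实例再查看其类别。下面是一个等价的写法草图(这里的 `le` 是一个假设的变量名,可在本步骤中代替上面的链式调用):
+
+    ```python
+    from sklearn.preprocessing import LabelEncoder
+
+    le = LabelEncoder()
+    ufos['Country'] = le.fit_transform(ufos['Country'])
+    print(list(le.classes_))  # 按字母顺序排列的原始国家代码,索引即编码值
+    ```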
+
+## 练习 - 建立你的模型
+
+现在,你可以通过将数据划分为训练和测试组来准备训练模型。
+
+1. 选择要训练的三个特征作为X向量,y向量将是`Country`。你希望能够输入`Seconds`、`Latitude`和`Longitude`,并获得返回的国家/地区ID。
+
+ ```python
+ from sklearn.model_selection import train_test_split
+
+ Selected_features = ['Seconds','Latitude','Longitude']
+
+ X = ufos[Selected_features]
+ y = ufos['Country']
+
+ X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
+ ```
+
+2. 使用逻辑回归训练模型:
+
+ ```python
+ from sklearn.metrics import accuracy_score, classification_report
+ from sklearn.linear_model import LogisticRegression
+ model = LogisticRegression()
+ model.fit(X_train, y_train)
+ predictions = model.predict(X_test)
+
+ print(classification_report(y_test, predictions))
+ print('Predicted labels: ', predictions)
+ print('Accuracy: ', accuracy_score(y_test, predictions))
+ ```
+
+准确率还不错**(大约 95%)**,不出所料,因为`Country`和`Latitude/Longitude`相关。
+
+你创建的模型并不是非常具有革命性,因为你应该能够从其`Latitude`和`Longitude`推断出`Country`,但是,尝试从清理、导出的原始数据进行训练,然后在web应用程序中使用此模型是一个很好的练习。
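+
+可以用一个简单的检查来验证这种相关性(假设 `ufos` 即上文清理后的dataframe):
+
+```python
+# 每个国家编码对应的平均纬度/经度差异明显,
+# 这解释了为什么模型能从经纬度较好地预测国家
+print(ufos.groupby('Country')[['Latitude', 'Longitude']].mean())
+```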
+
+## 练习 - “pickle”你的模型
+
+现在,是时候 _pickle_ 你的模型了!你可以在几行代码中做到这一点。一旦模型被 _pickle_,就加载它,并针对一个包含秒、纬度和经度值的示例数据数组对其进行测试:
+
+```python
+import pickle
+model_filename = 'ufo-model.pkl'
+pickle.dump(model, open(model_filename,'wb'))
+
+model = pickle.load(open('ufo-model.pkl','rb'))
+print(model.predict([[50,44,-12]]))
+```
+
+该模型返回**'3'**,这是英国的国家代码。👽
+
+## 练习 - 构建Flask应用程序
+
+现在你可以构建一个Flask应用程序来调用你的模型并返回类似的结果,但以一种更美观的方式。
+
+1. 首先在你的 _ufo-model.pkl_ 文件所在的_notebook.ipynb_文件旁边创建一个名为**web-app**的文件夹。
+
+2. 在该文件夹中再创建三个文件夹:**static**(其中包含文件夹**css**)和**templates**。你现在应该拥有以下文件和目录:
+
+ ```output
+ web-app/
+ static/
+ css/
+ templates/
+ notebook.ipynb
+ ufo-model.pkl
+ ```
+
+ ✅ 请参阅解决方案文件夹以查看已完成的应用程序
+
+3. 在_web-app_文件夹中创建的第一个文件是**requirements.txt**文件。与JavaScript应用程序中的_package.json_一样,此文件列出了应用程序所需的依赖项。在**requirements.txt**中添加以下几行:
+
+ ```text
+ scikit-learn
+ pandas
+ numpy
+ flask
+ ```
+
+4. 现在,进入web-app文件夹:
+
+ ```bash
+ cd web-app
+ ```
+
+5. 在你的终端中输入`pip install`,以安装 _requirements.txt_ 中列出的库:
+
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+6. 现在,你已准备好创建另外三个文件来完成应用程序:
+
+ 1. 在根目录中创建**app.py**
+ 2. 在_templates_目录中创建**index.html**。
+ 3. 在_static/css_目录中创建**styles.css**。
+
+7. 使用一些样式构建_styles.css_文件:
+
+ ```css
+ body {
+ width: 100%;
+ height: 100%;
+ font-family: 'Helvetica';
+ background: black;
+ color: #fff;
+ text-align: center;
+ letter-spacing: 1.4px;
+ font-size: 30px;
+ }
+
+ input {
+ min-width: 150px;
+ }
+
+ .grid {
+ width: 300px;
+ border: 1px solid #2d2d2d;
+ display: grid;
+ justify-content: center;
+ margin: 20px auto;
+ }
+
+ .box {
+ color: #fff;
+ background: #2d2d2d;
+ padding: 12px;
+ display: inline-block;
+ }
+ ```
+
+8. 接下来,构建_index.html_文件:
+
+ ```html
+    <!DOCTYPE html>
+    <html>
+    <head>
+      <meta charset="UTF-8">
+      <title>🛸 UFO Appearance Prediction! 👽</title>
+      <link rel="stylesheet" href="{{ url_for('static', filename='css/styles.css') }}">
+    </head>
+
+    <body>
+      <div class="grid">
+
+        <div class="box">
+
+          <p>According to the number of seconds, latitude and longitude, which country is likely to have reported seeing a UFO?</p>
+
+          <form action="{{ url_for('predict')}}" method="post">
+            <input type="number" name="seconds" placeholder="Seconds" required="required" />
+            <input type="text" name="latitude" placeholder="Latitude" required="required" />
+            <input type="text" name="longitude" placeholder="Longitude" required="required" />
+            <button type="submit" class="btn">Predict country where the UFO is seen</button>
+          </form>
+
+          <p>{{ prediction_text }}</p>
+
+        </div>
+
+      </div>
+
+    </body>
+    </html>
+ ```
+
+ 看看这个文件中的模板。请注意应用程序将提供的变量周围的“mustache”语法,例如预测文本:`{{}}`。还有一个表单可以将预测发布到`/predict`路由。
+
+   最后,你已准备好构建使用模型并显示预测的Python文件:
+
+9. 在`app.py`中添加:
+
+ ```python
+ import numpy as np
+ from flask import Flask, request, render_template
+ import pickle
+
+ app = Flask(__name__)
+
+ model = pickle.load(open("../ufo-model.pkl", "rb"))
+
+
+ @app.route("/")
+ def home():
+ return render_template("index.html")
+
+
+ @app.route("/predict", methods=["POST"])
+ def predict():
+
+ int_features = [int(x) for x in request.form.values()]
+ final_features = [np.array(int_features)]
+ prediction = model.predict(final_features)
+
+ output = prediction[0]
+
+ countries = ["Australia", "Canada", "Germany", "UK", "US"]
+
+ return render_template(
+ "index.html", prediction_text="Likely country: {}".format(countries[output])
+ )
+
+
+ if __name__ == "__main__":
+ app.run(debug=True)
+ ```
+
+   > 💡 提示:当你使用Flask运行Web应用程序时添加[`debug=True`](https://www.askpython.com/python-modules/flask/flask-debug-mode),你对应用程序所做的任何更改都将立即生效,无需重新启动服务器。注意!不要在生产应用程序中启用此模式。
+
+如果你运行`python app.py`或`python3 app.py`,你的Web服务器就会在本地启动,你可以填写一个简短的表单,来解答你迫切想知道的问题:UFO究竟在哪里被目击过!
+
+在此之前,先看一下`app.py`的实现:
+
+1. 首先,加载依赖项并启动应用程序。
+2. 然后,导入模型。
+3. 然后,在home路由上渲染index.html。
+
+在`/predict`路由上,当表单被发布时会发生几件事情:
+
+1. 收集表单变量并转换为numpy数组。然后将它们发送到模型并返回预测。
+2. 我们希望显示的国家/地区根据其预测的国家/地区代码重新呈现为可读文本,并将该值发送回index.html以在模板中呈现。
+
+以这种方式配合Flask和pickle后的模型来使用模型,是相对简单的。最困难的是弄清楚必须向模型发送什么形状的数据才能得到预测。这完全取决于模型是如何训练的:这个模型需要输入三个数据点,才能得到一个预测。
+
+在一个专业的环境中,你可以看到训练模型的人和在Web或移动应用程序中使用模型的人之间的良好沟通是多么的必要。在我们的情况下,只有一个人,你!
+
+---
+
+## 🚀 挑战:
+
+你可以在Flask应用程序中训练模型,而不是在notebook中工作再将模型导入Flask应用!尝试转换notebook中的Python代码(可能在清理数据之后),在应用程序内的一个名为`/train`的路由上训练模型。采用这种方法的利弊是什么?
+
+## [课后测](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/18/)
+
+## 复习与自学
+
+有很多方法可以构建一个Web应用程序来使用ML模型。列出可以使用JavaScript或Python构建Web应用程序以利用机器学习的方法。考虑架构:模型应该留在应用程序中还是存在于云中?如果是后者,你将如何访问它?为应用的ML Web解决方案绘制架构模型。
+
+## 任务
+
+[尝试不同的模型](../assignment.md)
+
+
diff --git a/3-Web-App/1-Web-App/translations/assignment.it.md b/3-Web-App/1-Web-App/translations/assignment.it.md
new file mode 100644
index 00000000..7bc7ffd9
--- /dev/null
+++ b/3-Web-App/1-Web-App/translations/assignment.it.md
@@ -0,0 +1,11 @@
+# Provare un modello diverso
+
+## Istruzioni
+
+Ora che si è creata un'app web utilizzando un modello di regressione addestrato, usare uno dei modelli di una lezione precedente sulla regressione per rifare questa app web. Si può mantenere lo stile o progettarla in modo diverso per riflettere i dati della zucca. Fare attenzione a modificare gli input in modo che riflettano il metodo di addestramento del proprio modello.
+
+## Rubrica
+
+| Criteri | Ottimo | Adeguato | Necessita miglioramento |
+| -------------------------- | --------------------------------------------------------- | --------------------------------------------------------- | -------------------------------------- |
+| | L'app web funziona come previsto e viene distribuita nel cloud | L'app web contiene difetti o mostra risultati imprevisti | L'app web non funziona correttamente |
diff --git a/3-Web-App/1-Web-App/translations/assignment.zh-cn.md b/3-Web-App/1-Web-App/translations/assignment.zh-cn.md
new file mode 100644
index 00000000..016dfa52
--- /dev/null
+++ b/3-Web-App/1-Web-App/translations/assignment.zh-cn.md
@@ -0,0 +1,12 @@
+# 尝试不同的模型
+
+## 说明
+
+现在你已经构建好了一个使用训练好的回归模型的web应用程序,请使用之前回归课程中的某个模型来重做这个web应用。你可以保留原有的样式,也可以采用不同的设计来展示pumpkin数据。注意修改输入,使其反映你的模型的训练方式。
+
+
+## 评判标准
+
+| 标准 | 优秀 | 合格 | 仍需努力 |
+| -------------------------- | --------------------------------------------------------- | --------------------------------------------------------- | -------------------------------------- |
+| | web应用程序按预期运行,并已部署到云端 | web应用程序包含缺陷或展示出意料之外的结果 | web应用程序无法正常运行 |
diff --git a/3-Web-App/translations/README.it.md b/3-Web-App/translations/README.it.md
new file mode 100644
index 00000000..d376b8ec
--- /dev/null
+++ b/3-Web-App/translations/README.it.md
@@ -0,0 +1,22 @@
+# Creare un'app web per utilizzare il modello ML
+
+In questa sezione del programma di studi, verrà presentato un argomento ML applicato: come salvare il modello di Scikit-learn come file che può essere utilizzato per fare previsioni all'interno di un'applicazione web. Una volta salvato il modello, si imparerà come utilizzarlo in un'app web sviluppata con Flask. Per prima cosa si creerà un modello utilizzando alcuni dati che riguardano gli avvistamenti di UFO! Quindi, si creerà un'app web che consentirà di inserire un numero di secondi con un valore di latitudine e longitudine per prevedere quale paese ha riferito di aver visto un UFO.
+
+
+
+Foto di Michael Herren su Unsplash
+
+
+## Lezioni
+
+1. [Costruire un'app web](../1-Web-App/translations/README.it.md)
+
+## Crediti
+
+"Costruire un'app web" è stato scritto con ♥️ da [Jen Looper](https://twitter.com/jenlooper).
+
+♥️ I quiz sono stati scritti da Rohan Raj.
+
+L'insieme di dati proviene da [Kaggle](https://www.kaggle.com/NUFORC/ufo-sightings).
+
+L'architettura dell'app web è stata suggerita in parte da [questo articolo](https://towardsdatascience.com/how-to-easily-deploy-machine-learning-models-using-flask-b95af8fe34d4) e da [questo](https://github.com/abhinavsagar/machine-learning-deployment) repository di Abhinav Sagar.
\ No newline at end of file
diff --git a/4-Classification/1-Introduction/README.md b/4-Classification/1-Introduction/README.md
index 4490131c..4a337376 100644
--- a/4-Classification/1-Introduction/README.md
+++ b/4-Classification/1-Introduction/README.md
@@ -163,7 +163,7 @@ Now you can dig deeper into the data and learn what are the typical ingredients
def create_ingredient_df(df):
ingredient_df = df.T.drop(['cuisine','Unnamed: 0']).sum(axis=1).to_frame('value')
ingredient_df = ingredient_df[(ingredient_df.T != 0).any()]
- ingredient_df = ingredient_df.sort_values(by='value', ascending=False
+ ingredient_df = ingredient_df.sort_values(by='value', ascending=False,
inplace=False)
return ingredient_df
```
@@ -275,7 +275,7 @@ Now that you have cleaned the data, use [SMOTE](https://imbalanced-learn.org/dev
```python
transformed_df.head()
transformed_df.info()
- transformed_df.to_csv("../data/cleaned_cuisine.csv")
+ transformed_df.to_csv("../data/cleaned_cuisines.csv")
```
This fresh CSV can now be found in the root data folder.
diff --git a/4-Classification/1-Introduction/solution/notebook.ipynb b/4-Classification/1-Introduction/solution/notebook.ipynb
index c5b8c629..5abb9693 100644
--- a/4-Classification/1-Introduction/solution/notebook.ipynb
+++ b/4-Classification/1-Introduction/solution/notebook.ipynb
@@ -622,7 +622,7 @@
"metadata": {},
"outputs": [],
"source": [
- "transformed_df.to_csv(\"../../data/cleaned_cuisine.csv\")"
+ "transformed_df.to_csv(\"../../data/cleaned_cuisines.csv\")"
]
},
{
diff --git a/4-Classification/1-Introduction/translations/README.it.md b/4-Classification/1-Introduction/translations/README.it.md
new file mode 100644
index 00000000..fabfec5e
--- /dev/null
+++ b/4-Classification/1-Introduction/translations/README.it.md
@@ -0,0 +1,297 @@
+# Introduzione alla classificazione
+
+In queste quattro lezioni si esplorerà un focus fondamentale del machine learning classico: _la classificazione_. Verrà analizzato l'utilizzo di vari algoritmi di classificazione con un insieme di dati su tutte le brillanti cucine dell'Asia e dell'India. Si spera siate affamati!
+
+
+
+> In queste lezioni si celebrano le cucine panasiatiche! Immagine di [Jen Looper](https://twitter.com/jenlooper)
+
+La classificazione è una forma di [apprendimento supervisionato](https://it.wikipedia.org/wiki/Apprendimento_supervisionato) che ha molto in comune con le tecniche di regressione. Se il machine learning riguarda la previsione di valori o nomi di cose utilizzando insiemi di dati, la classificazione generalmente rientra in due gruppi: _classificazione binaria_ e _classificazione multiclasse_.
+
+[](https://youtu.be/eg8DJYwdMyg "Introduzione alla classificazione")
+
+> 🎥 Fare clic sull'immagine sopra per un video: John Guttag del MIT introduce la classificazione
+
+Ricordare:
+
+- La **regressione lineare** ha aiutato a prevedere le relazioni tra le variabili e a fare previsioni accurate su dove un nuovo punto dati si sarebbe posizionato in relazione a quella linea. Quindi, si potrebbe prevedere _quale prezzo avrebbe una zucca a settembre rispetto a dicembre_, ad esempio.
+- La **regressione logistica** ha aiutato a scoprire le "categorie binarie": a questo prezzo, _questa zucca è arancione o non arancione_?
+
+La classificazione utilizza vari algoritmi per determinare altri modi per definire l'etichetta o la classe di un punto dati. Si lavorerà con questi dati di cucina per vedere se, osservando un gruppo di ingredienti, è possibile determinarne la cucina di origine.
+
+## [Quiz Pre-Lezione](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/19/)
+
+### Introduzione
+
+La classificazione è una delle attività fondamentali del ricercatore di machine learning e data scientist. Dalla classificazione basica di un valore binario ("questa email è spam o no?"), alla complessa classificazione e segmentazione di immagini utilizzando la visione artificiale, è sempre utile essere in grado di ordinare i dati in classi e porre domande su di essi.
+
+Per definire il processo in modo più scientifico, il metodo di classificazione crea un modello predittivo che consente di mappare la relazione tra le variabili di input e le variabili di output.
+
+
+
+> Problemi binari e multiclasse per la gestione di algoritmi di classificazione. Infografica di [Jen Looper](https://twitter.com/jenlooper)
+
+Prima di iniziare il processo di pulizia dei dati, visualizzazione e preparazione per le attività di machine learning, si apprenderà qualcosa circa i vari modi in cui machine learning può essere sfruttato per classificare i dati.
+
+Derivata dalla [statistica](https://it.wikipedia.org/wiki/Classificazione_statistica), la classificazione che utilizza machine learning classico utilizza caratteristiche come l'`essere fumatore`, il `peso` e l'`età` per determinare _la probabilità di sviluppare la malattia X._ Essendo una tecnica di apprendimento supervisionata simile agli esercizi di regressione eseguiti in precedenza, i dati vengono etichettati e gli algoritmi ML utilizzano tali etichette per classificare e prevedere le classi (o "caratteristiche") di un insieme di dati e assegnarle a un gruppo o risultato.
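+
+Per fissare l'idea, uno schizzo con dati inventati a puro scopo illustrativo:
+
+```python
+from sklearn.linear_model import LogisticRegression
+
+# dati inventati: [fumatore (0/1), peso, età] -> malattia X (0/1)
+X = [[1, 80, 55], [0, 62, 30], [1, 95, 60], [0, 70, 25]]
+y = [1, 0, 1, 0]
+
+clf = LogisticRegression().fit(X, y)
+print(clf.predict([[1, 85, 50]]))  # classe prevista per un nuovo caso
+```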
+
+✅ Si prenda un momento per immaginare un insieme di dati sulle cucine. A cosa potrebbe rispondere un modello multiclasse? A cosa potrebbe rispondere un modello binario? E se si volesse determinare se una determinata cucina potrebbe utilizzare il fieno greco? E se si volesse vedere se, regalando una busta della spesa piena di anice stellato, carciofi, cavolfiori e rafano, si possa creare un piatto tipico indiano?
+
+[](https://youtu.be/GuTeDbaNoEU " Cestini misteriosi pazzeschi")
+
+> 🎥 Fare clic sull'immagine sopra per un video. L'intera premessa dello spettacolo 'Chopped' è il 'cesto misterioso' dove gli chef devono preparare un piatto con una scelta casuale di ingredienti. Sicuramente un modello ML avrebbe aiutato!
+
+## Ciao 'classificatore'
+
+La domanda che si vuole porre a questo insieme di dati sulla cucina è in realtà una **domanda multiclasse**, poiché ci sono diverse potenziali cucine nazionali con cui lavorare. Dato un lotto di ingredienti, in quale di queste molte classi si identificheranno i dati?
+
+Scikit-learn offre diversi algoritmi da utilizzare per classificare i dati, a seconda del tipo di problema che si desidera risolvere. Nelle prossime due lezioni si impareranno a conoscere molti di questi algoritmi.
+
+## Esercizio: pulire e bilanciare i dati
+
+Il primo compito, prima di iniziare questo progetto, sarà pulire e **bilanciare** i dati per ottenere risultati migliori. Si inizia con il file vuoto _notebook.ipynb_ nella radice di questa cartella.
+
+La prima cosa da installare è [imblearn](https://imbalanced-learn.org/stable/). Questo è un pacchetto di Scikit-learn che consentirà di bilanciare meglio i dati (si imparerà di più su questa attività tra un minuto).
+
+1. Per installare `imblearn`, eseguire `pip install`, in questo modo:
+
+ ```python
+ pip install imblearn
+ ```
+
+1. Importare i pacchetti necessari per caricare i dati e visualizzarli, importare anche `SMOTE` da `imblearn`.
+
+ ```python
+ import pandas as pd
+ import matplotlib.pyplot as plt
+ import matplotlib as mpl
+ import numpy as np
+ from imblearn.over_sampling import SMOTE
+ ```
+
+ Ora si è pronti per la successiva importazione dei dati.
+
+1. Il prossimo compito sarà quello di importare i dati:
+
+ ```python
+ df = pd.read_csv('../data/cuisines.csv')
+ ```
+
+   Usando `read_csv()` si leggerà il contenuto del file csv _cuisines.csv_ e lo si posizionerà nella variabile `df`.
+
+1. Controllare la forma dei dati:
+
+ ```python
+ df.head()
+ ```
+
+ Le prime cinque righe hanno questo aspetto:
+
+ ```output
+ | | Unnamed: 0 | cuisine | almond | angelica | anise | anise_seed | apple | apple_brandy | apricot | armagnac | ... | whiskey | white_bread | white_wine | whole_grain_wheat_flour | wine | wood | yam | yeast | yogurt | zucchini |
+ | --- | ---------- | ------- | ------ | -------- | ----- | ---------- | ----- | ------------ | ------- | -------- | --- | ------- | ----------- | ---------- | ----------------------- | ---- | ---- | --- | ----- | ------ | -------- |
+ | 0 | 65 | indian | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+ | 1 | 66 | indian | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+ | 2 | 67 | indian | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+ | 3 | 68 | indian | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+ | 4 | 69 | indian | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
+ ```
+
+1. Si possono ottenere informazioni su questi dati chiamando `info()`:
+
+ ```python
+ df.info()
+ ```
+
+ Il risultato assomiglia a:
+
+ ```output
+    <class 'pandas.core.frame.DataFrame'>
+ RangeIndex: 2448 entries, 0 to 2447
+ Columns: 385 entries, Unnamed: 0 to zucchini
+ dtypes: int64(384), object(1)
+ memory usage: 7.2+ MB
+ ```
+
+## Esercizio - conoscere le cucine
+
+Ora il lavoro inizia a diventare più interessante. Si scoprirà la distribuzione dei dati per cucina:
+
+1. Tracciare i dati come barre chiamando `barh()`:
+
+ ```python
+ df.cuisine.value_counts().plot.barh()
+ ```
+
+ 
+
+ Esiste un numero finito di cucine, ma la distribuzione dei dati non è uniforme. Si può sistemare! Prima di farlo, occorre esplorare un po' di più.
+
+1. Si deve scoprire quanti dati sono disponibili per cucina e stamparli:
+
+ ```python
+ thai_df = df[(df.cuisine == "thai")]
+ japanese_df = df[(df.cuisine == "japanese")]
+ chinese_df = df[(df.cuisine == "chinese")]
+ indian_df = df[(df.cuisine == "indian")]
+ korean_df = df[(df.cuisine == "korean")]
+
+ print(f'thai df: {thai_df.shape}')
+ print(f'japanese df: {japanese_df.shape}')
+ print(f'chinese df: {chinese_df.shape}')
+ print(f'indian df: {indian_df.shape}')
+ print(f'korean df: {korean_df.shape}')
+ ```
+
+ il risultato si presenta così:
+
+ ```output
+ thai df: (289, 385)
+ japanese df: (320, 385)
+ chinese df: (442, 385)
+ indian df: (598, 385)
+ korean df: (799, 385)
+ ```
+
+## Alla scoperta degli ingredienti
+
+Ora si possono approfondire i dati e scoprire quali sono gli ingredienti tipici per cucina. Si dovrebbero ripulire i dati ricorrenti che creano confusione tra le cucine, quindi si affronterà questo problema.
+
+1. Creare una funzione `create_ingredient_df()` in Python per creare un dataframe di ingredienti. Questa funzione inizierà eliminando una colonna non utile e ordinando gli ingredienti in base al loro conteggio:
+
+ ```python
+    def create_ingredient_df(df):
+        ingredient_df = df.T.drop(['cuisine','Unnamed: 0']).sum(axis=1).to_frame('value')
+        ingredient_df = ingredient_df[(ingredient_df.T != 0).any()]
+        ingredient_df = ingredient_df.sort_values(by='value', ascending=False,
+                                                  inplace=False)
+        return ingredient_df
+ ```
+
+ Ora si può usare questa funzione per farsi un'idea dei primi dieci ingredienti più popolari per cucina.
+
+1. Chiamare `create_ingredient_df()` e tracciare il grafico chiamando `barh()`:
+
+ ```python
+ thai_ingredient_df = create_ingredient_df(thai_df)
+ thai_ingredient_df.head(10).plot.barh()
+ ```
+
+ 
+
+1. Fare lo stesso per i dati giapponesi:
+
+ ```python
+ japanese_ingredient_df = create_ingredient_df(japanese_df)
+ japanese_ingredient_df.head(10).plot.barh()
+ ```
+
+ 
+
+1. Ora per gli ingredienti cinesi:
+
+ ```python
+ chinese_ingredient_df = create_ingredient_df(chinese_df)
+ chinese_ingredient_df.head(10).plot.barh()
+ ```
+
+ 
+
+1. Tracciare gli ingredienti indiani:
+
+ ```python
+ indian_ingredient_df = create_ingredient_df(indian_df)
+ indian_ingredient_df.head(10).plot.barh()
+ ```
+
+ 
+
+1. Infine, tracciare gli ingredienti coreani:
+
+ ```python
+ korean_ingredient_df = create_ingredient_df(korean_df)
+ korean_ingredient_df.head(10).plot.barh()
+ ```
+
+ 
+
+1. Ora, eliminare gli ingredienti più comuni che creano confusione tra le diverse cucine, chiamando `drop()`:
+
+ Tutti amano il riso, l'aglio e lo zenzero!
+
+ ```python
+ feature_df= df.drop(['cuisine','Unnamed: 0','rice','garlic','ginger'], axis=1)
+ labels_df = df.cuisine #.unique()
+ feature_df.head()
+ ```
+
+## Bilanciare l'insieme di dati
+
+Ora che i dati sono puliti, si usa [SMOTE](https://imbalanced-learn.org/dev/references/generated/imblearn.over_sampling.SMOTE.html) - "Synthetic Minority Over-sampling Technique" (tecnica di sovracampionamento sintetico della minoranza) - per bilanciarli.
+
+1. Chiamare `fit_resample()`: questa strategia genera nuovi campioni per interpolazione.
+
+ ```python
+ oversample = SMOTE()
+ transformed_feature_df, transformed_label_df = oversample.fit_resample(feature_df, labels_df)
+ ```
+
+ Bilanciando i dati, si otterranno risultati migliori quando si classificano. Si pensi a una classificazione binaria. Se la maggior parte dei dati è una classe, un modello ML prevederà quella classe più frequentemente, solo perché ci sono più dati per essa. Il bilanciamento dei dati prende tutti i dati distorti e aiuta a rimuovere questo squilibrio.
+
+1. Ora si può controllare il numero di etichette per ingrediente:
+
+ ```python
+ print(f'new label count: {transformed_label_df.value_counts()}')
+ print(f'old label count: {df.cuisine.value_counts()}')
+ ```
+
+ il risultato si presenta così:
+
+ ```output
+ new label count: korean 799
+ chinese 799
+ indian 799
+ japanese 799
+ thai 799
+ Name: cuisine, dtype: int64
+ old label count: korean 799
+ indian 598
+ chinese 442
+ japanese 320
+ thai 289
+ Name: cuisine, dtype: int64
+ ```
+
+ I dati sono belli e puliti, equilibrati e molto deliziosi!
+
+1. L'ultimo passaggio consiste nel salvare i dati bilanciati, incluse etichette e caratteristiche, in un nuovo dataframe che può essere esportato in un file:
+
+ ```python
+ transformed_df = pd.concat([transformed_label_df,transformed_feature_df],axis=1, join='outer')
+ ```
+
+1. Si può dare un'altra occhiata ai dati usando `transformed_df.head()` e `transformed_df.info()`. Salvare una copia di questi dati per utilizzarli nelle lezioni future:
+
+ ```python
+ transformed_df.head()
+ transformed_df.info()
+    transformed_df.to_csv("../data/cleaned_cuisines.csv")
+ ```
+
+ Questo nuovo CSV può ora essere trovato nella cartella data in radice.
+
+---
+
+## 🚀 Sfida
+
+Questo programma di studi contiene diversi insiemi di dati interessanti. Esaminare le cartelle `data` e vedere se contengono insiemi di dati appropriati per la classificazione binaria o multiclasse. Quali domande si potrebbero porre a questo insieme di dati?
+
+## [Quiz post-lezione](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/20/)
+
+## Revisione e Auto Apprendimento
+
+Esplorare l'API di SMOTE. Per quali casi d'uso è meglio usarla? Quali problemi risolve?
+
+## Compito
+
+[Esplorare i metodi di classificazione](assignment.it.md)
diff --git a/4-Classification/1-Introduction/translations/README.tr.md b/4-Classification/1-Introduction/translations/README.tr.md
new file mode 100644
index 00000000..a2b1d92f
--- /dev/null
+++ b/4-Classification/1-Introduction/translations/README.tr.md
@@ -0,0 +1,298 @@
+# Sınıflandırmaya giriş
+
+Bu dört derste klasik makine öğreniminin temel bir odağı olan _sınıflandırma_ konusunu keşfedeceksiniz. Asya ve Hindistan'ın nefis mutfağının tamamı üzerine hazırlanmış bir veri setiyle çeşitli sınıflandırma algoritmalarını kullanmanın üzerinden geçeceğiz. Umarız açsınızdır!
+
+
+
+> Bu derslerde Pan-Asya mutfağını kutlayın! Fotoğraf [Jen Looper](https://twitter.com/jenlooper) tarafından çekilmiştir.
+
+Sınıflandırma, regresyon yöntemleriyle birçok ortak özelliği olan bir [gözetimli öğrenme](https://wikipedia.org/wiki/Supervised_learning) biçimidir. Eğer makine öğrenimi tamamen veri setleri kullanarak değerleri veya nesnelere verilecek isimleri öngörmekse, sınıflandırma genellikle iki gruba ayrılır: _ikili sınıflandırma_ ve _çok sınıflı sınıflandırma_.
+
+[](https://youtu.be/eg8DJYwdMyg "Introduction to classification")
+
+> :movie_camera: Video için yukarıdaki fotoğrafa tıklayın: MIT's John Guttag introduces classification (MIT'den John Guttag sınıflandırmayı tanıtıyor)
+
+Hatırlayın:
+
+- **Doğrusal regresyon** değişkenler arasındaki ilişkileri öngörmenize ve o doğruya ilişkili olarak yeni bir veri noktasının nereye düşeceğine dair doğru öngörülerde bulunmanıza yardımcı oluyordu. Yani, _bir balkabağının fiyatının aralık ayına göre eylül ayında ne kadar olabileceğini_ öngörebilirsiniz örneğin.
+- **Lojistik regresyon** "ikili kategoriler"i keşfetmenizi sağlamıştı: bu fiyat noktasında, _bu balkabağı turuncu mudur, turuncu-değil midir?_
+
+Sınıflandırma, bir veri noktasının etiketini veya sınıfını belirlemek için farklı yollar belirlemek üzere çeşitli algoritmalar kullanır. Bir grup malzemeyi gözlemleyerek kökeninin hangi mutfak olduğunu belirleyip belirleyemeyeceğimizi görmek için bu mutfak verisiyle çalışalım.
+
+## [Ders öncesi kısa sınavı](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/19/?loc=tr)
+
+### Giriş
+
+Sınıflandırma, makine öğrenimi araştırmacısının ve veri bilimcisinin temel işlerinden biridir. İkili bir değerin temel sınıflandırmasından ("Bu e-posta gereksiz (spam) midir yoksa değil midir?") bilgisayarla görüden yararlanarak karmaşık görüntü sınıflandırma ve bölütlemeye kadar, veriyi sınıf sınıf sıralayabilmek ve soru sorabilmek daima faydalıdır.
+
+Süreci daha bilimsel bir yolla ifade etmek gerekirse, sınıflandırma yönteminiz, girdi bilinmeyenlerinin arasındaki ilişkiyi çıktı bilinmeyenlerine eşlemenizi sağlayan öngörücü bir model oluşturur.
+
+
+
+> Sınıflandırma algoritmalarının başa çıkması gereken ikili ve çok sınıflı problemler. Bilgilendirme grafiği [Jen Looper](https://twitter.com/jenlooper) tarafından hazırlanmıştır.
+
+Verimizi temizleme, görselleştirme ve makine öğrenimi görevleri için hazırlama süreçlerine başlamadan önce, veriyi sınıflandırmak için makine öğreniminden yararlanılabilecek çeşitli yolları biraz öğrenelim.
+
+[İstatistikten](https://wikipedia.org/wiki/Statistical_classification) türetilmiş olarak, klasik makine öğrenimi kullanarak sınıflandırma, _X hastalığının gelişmesi ihtimalini_ belirlemek için `smoker`, `weight`, ve `age` gibi öznitelikler kullanır. Daha önce yaptığınız regresyon alıştırmalarına benzeyen bir gözetimli öğrenme yöntemi olarak, veriniz etiketlenir ve makine öğrenimi algoritmaları o etiketleri, sınıflandırmak ve veri setinin sınıflarını (veya 'özniteliklerini') öngörmek ve onları bir gruba veya bir sonuca atamak için kullanır.
+
+:white_check_mark: Mutfaklarla ilgili bir veri setini biraz düşünün. Çok sınıflı bir model neyi cevaplayabilir? İkili bir model neyi cevaplayabilir? Farz edelim ki verilen bir mutfağın çemen kullanmasının muhtemel olup olmadığını belirlemek istiyorsunuz. Farzedelim ki yıldız anason, enginar, karnabahar ve bayır turpu ile dolu bir alışveriş poşetinden tipik bir Hint yemeği yapıp yapamayacağınızı görmek istiyorsunuz.
+
+[](https://youtu.be/GuTeDbaNoEU "Crazy mystery baskets")
+
+> :movie_camera: Video için yukarıdaki fotoğrafa tıklayın. Aşçıların rastgele malzeme seçeneklerinden yemek yaptığı 'Chopped' programının tüm olayı 'gizem sepetleri'dir. Kuşkusuz, bir makine öğrenimi modeli onlara yardımcı olurdu!
+
+## Merhaba 'sınıflandırıcı'
+
+Bu mutfak veri setiyle ilgili sormak istediğimiz soru aslında bir **çok sınıflı soru**dur çünkü elimizde farklı potansiyel ulusal mutfaklar var. Verilen bir grup malzeme için, veri bu sınıflardan hangisine uyacak?
+
+Scikit-learn, veriyi sınıflandırmak için kullanmak üzere, çözmek istediğiniz problem çeşidine bağlı olarak, çeşitli farklı algoritmalar sunar. Önümüzdeki iki derste, bu algoritmalardan birkaçını öğreneceksiniz.
+
+## Alıştırma - verinizi temizleyip dengeleyin
+
+Bu projeye başlamadan önce elinizdeki ilk görev, daha iyi sonuçlar almak için verinizi temizlemek ve **dengelemek**. Bu klasörün kökündeki boş _notebook.ipynb_ dosyasıyla başlayın.
+
+Kurmanız gereken ilk şey [imblearn](https://imbalanced-learn.org/stable/). Bu, veriyi daha iyi dengelemenizi sağlayacak bir Scikit-learn paketidir. (Bu görev hakkında birazdan daha fazla bilgi göreceksiniz.)
+
+1. `imblearn` kurun, `pip install` çalıştırın, şu şekilde:
+
+ ```python
+ pip install imblearn
+ ```
+
+1. Verinizi almak ve görselleştirmek için ihtiyaç duyacağınız paketleri alın (import edin), ayrıca `imblearn` paketinden `SMOTE` alın.
+
+ ```python
+ import pandas as pd
+ import matplotlib.pyplot as plt
+ import matplotlib as mpl
+ import numpy as np
+ from imblearn.over_sampling import SMOTE
+ ```
+
+   Şimdi veriyi almaya hazırsınız.
+
+1. Sonraki görev veriyi almak olacak:
+
+ ```python
+ df = pd.read_csv('../data/cuisines.csv')
+ ```
+
+   `read_csv()` kullanmak _cuisines.csv_ csv dosyasının içeriğini okuyacak ve `df` değişkenine yerleştirecek.
+
+1. Verinin şeklini kontrol edin:
+
+ ```python
+ df.head()
+ ```
+
+ İlk beş satır şöyle görünüyor:
+
+ ```output
+ | | Unnamed: 0 | cuisine | almond | angelica | anise | anise_seed | apple | apple_brandy | apricot | armagnac | ... | whiskey | white_bread | white_wine | whole_grain_wheat_flour | wine | wood | yam | yeast | yogurt | zucchini |
+ | --- | ---------- | ------- | ------ | -------- | ----- | ---------- | ----- | ------------ | ------- | -------- | --- | ------- | ----------- | ---------- | ----------------------- | ---- | ---- | --- | ----- | ------ | -------- |
+ | 0 | 65 | indian | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+ | 1 | 66 | indian | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+ | 2 | 67 | indian | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+ | 3 | 68 | indian | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+ | 4 | 69 | indian | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
+ ```
+
+1. `info()` fonksiyonunu çağırarak bu veri hakkında bilgi edinin:
+
+ ```python
+ df.info()
+ ```
+
+ Çıktınız şuna benzer:
+
+ ```output
+    <class 'pandas.core.frame.DataFrame'>
+ RangeIndex: 2448 entries, 0 to 2447
+ Columns: 385 entries, Unnamed: 0 to zucchini
+ dtypes: int64(384), object(1)
+ memory usage: 7.2+ MB
+ ```
+
+## Alıştırma - mutfaklar hakkında bilgi edinmek
+
+Şimdi, işimiz daha da ilginçleşmeye başlıyor. Mutfak mutfak verinin dağılımını keşfedelim:
+
+1. `barh()` fonksiyonunu çağırarak veriyi sütunlarla çizdirin:
+
+ ```python
+ df.cuisine.value_counts().plot.barh()
+ ```
+
+ 
+
+ Sonlu sayıda mutfak var, ancak verinin dağılımı düzensiz. Bunu düzeltebilirsiniz! Bunu yapmadan önce, biraz daha keşfedelim.
+
+1. Her mutfak için ne kadar verinin mevcut olduğunu bulun ve yazdırın:
+
+ ```python
+ thai_df = df[(df.cuisine == "thai")]
+ japanese_df = df[(df.cuisine == "japanese")]
+ chinese_df = df[(df.cuisine == "chinese")]
+ indian_df = df[(df.cuisine == "indian")]
+ korean_df = df[(df.cuisine == "korean")]
+
+ print(f'thai df: {thai_df.shape}')
+ print(f'japanese df: {japanese_df.shape}')
+ print(f'chinese df: {chinese_df.shape}')
+ print(f'indian df: {indian_df.shape}')
+ print(f'korean df: {korean_df.shape}')
+ ```
+
+ çıktı şöyle görünür:
+
+ ```output
+ thai df: (289, 385)
+ japanese df: (320, 385)
+ chinese df: (442, 385)
+ indian df: (598, 385)
+ korean df: (799, 385)
+ ```
+
+## Malzemeleri keşfetme
+
+Şimdi veriyi daha derinlemesine inceleyebilirsiniz ve her mutfak için tipik malzemelerin neler olduğunu öğrenebilirsiniz. Mutfaklar arasında karışıklık yaratan tekrar eden veriyi temizlemelisiniz, dolayısıyla şimdi bu problemle ilgili bilgi edinelim.
+
+1. Python'da, malzeme veri iskeleti yaratmak için `create_ingredient_df()` diye bir fonksiyon oluşturun. Bu fonksiyon, yardımcı olmayan bir sütunu temizleyerek ve sayılarına göre malzemeleri sıralayarak başlar:
+
+ ```python
+    def create_ingredient_df(df):
+        ingredient_df = df.T.drop(['cuisine','Unnamed: 0']).sum(axis=1).to_frame('value')
+        ingredient_df = ingredient_df[(ingredient_df.T != 0).any()]
+        ingredient_df = ingredient_df.sort_values(by='value', ascending=False,
+                                                  inplace=False)
+        return ingredient_df
+ ```
+
+   Şimdi bu fonksiyonu, her mutfağın en yaygın ilk on malzemesi hakkında fikir edinmek için kullanabilirsiniz.
+
+1. `create_ingredient_df()` fonksiyonunu çağırın ve `barh()` fonksiyonunu çağırarak çizdirin:
+
+ ```python
+ thai_ingredient_df = create_ingredient_df(thai_df)
+ thai_ingredient_df.head(10).plot.barh()
+ ```
+
+ 
+
+1. Japon verisi için de aynısını yapın:
+
+ ```python
+ japanese_ingredient_df = create_ingredient_df(japanese_df)
+ japanese_ingredient_df.head(10).plot.barh()
+ ```
+
+ 
+
+1. Şimdi Çin malzemeleri için yapın:
+
+ ```python
+ chinese_ingredient_df = create_ingredient_df(chinese_df)
+ chinese_ingredient_df.head(10).plot.barh()
+ ```
+
+ 
+
+1. Hint malzemelerini çizdirin:
+
+ ```python
+ indian_ingredient_df = create_ingredient_df(indian_df)
+ indian_ingredient_df.head(10).plot.barh()
+ ```
+
+ 
+
+1. Son olarak, Kore malzemelerini çizdirin:
+
+ ```python
+ korean_ingredient_df = create_ingredient_df(korean_df)
+ korean_ingredient_df.head(10).plot.barh()
+ ```
+
+ 
+
+1. Şimdi, `drop()` fonksiyonunu çağırarak, farklı mutfaklar arasında karışıklığa sebep olan en çok ortaklık taşıyan malzemeleri temizleyelim:
+
+ Herkes pirinci, sarımsağı ve zencefili seviyor!
+
+ ```python
+ feature_df= df.drop(['cuisine','Unnamed: 0','rice','garlic','ginger'], axis=1)
+ labels_df = df.cuisine #.unique()
+ feature_df.head()
+ ```
+
+## Veri setini dengeleyin
+
+Veriyi temizlediniz, şimdi [SMOTE](https://imbalanced-learn.org/dev/references/generated/imblearn.over_sampling.SMOTE.html) - "Synthetic Minority Over-sampling Technique" ("Sentetik Azınlık Aşırı-Örnekleme/Örneklem-Artırma Tekniği") kullanarak dengeleyelim.
+
+1. `fit_resample()` fonksiyonunu çağırın, bu strateji ara değerlemeyle yeni örnekler üretir.
+
+ ```python
+ oversample = SMOTE()
+ transformed_feature_df, transformed_label_df = oversample.fit_resample(feature_df, labels_df)
+ ```
+
+ Verinizi dengeleyerek, sınıflandırırken daha iyi sonuçlar alabileceksiniz. Bir ikili sınıflandırma düşünün. Eğer verimizin çoğu tek bir sınıfsa, bir makine öğrenimi modeli, sırf onun için daha fazla veri olduğundan o sınıfı daha sık tahmin edecektir. Veriyi dengelemek herhangi eğri veriyi alır ve bu dengesizliğin ortadan kaldırılmasına yardımcı olur.
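+
+    Küçük ve varsayımsal bir örnek: etiketler dengesizse, her zaman çoğunluk sınıfını tahmin eden saf bir "model" bile yüksek doğruluk elde eder:
+
+    ```python
+    from collections import Counter
+
+    labels = ["korean"] * 799 + ["thai"] * 289   # dengesiz etiketler (örnek)
+    majority = Counter(labels).most_common(1)[0][0]
+    accuracy = labels.count(majority) / len(labels)
+    print(majority, round(accuracy, 2))          # korean 0.73
+    ```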
+
+1. Şimdi, her bir malzeme için etiket sayısını kontrol edebilirsiniz:
+
+ ```python
+ print(f'new label count: {transformed_label_df.value_counts()}')
+ print(f'old label count: {df.cuisine.value_counts()}')
+ ```
+
+ Çıktınız şöyle görünür:
+
+ ```output
+ new label count: korean 799
+ chinese 799
+ indian 799
+ japanese 799
+ thai 799
+ Name: cuisine, dtype: int64
+ old label count: korean 799
+ indian 598
+ chinese 442
+ japanese 320
+ thai 289
+ Name: cuisine, dtype: int64
+ ```
+
+ Veri şimdi tertemiz, dengeli ve çok lezzetli!
+
+1. Son adım, dengelenmiş verinizi, etiket ve özniteliklerle beraber, yeni bir dosyaya gönderilebilecek yeni bir veri iskeletine kaydetmek:
+
+ ```python
+ transformed_df = pd.concat([transformed_label_df,transformed_feature_df],axis=1, join='outer')
+ ```
+
+1. `transformed_df.head()` ve `transformed_df.info()` fonksiyonlarını kullanarak verinize bir kez daha göz atabilirsiniz. Gelecek derslerde kullanabilmek için bu verinin bir kopyasını kaydedin:
+
+ ```python
+ transformed_df.head()
+ transformed_df.info()
+ transformed_df.to_csv("../../data/cleaned_cuisines.csv")
+
+ ```
+
+ Bu yeni CSV şimdi kök data (veri) klasöründe görülebilir.
+
+---
+
+## :rocket: Meydan okuma
+
+Bu öğretim programı farklı ilgi çekici veri setleri içermekte. `data` klasörlerini inceleyin ve ikili veya çok sınıflı sınıflandırma için uygun olabilecek veri setleri bulunduran var mı, bakın. Bu veri seti için hangi soruları sorabilirdiniz?
+
+## [Ders sonrası kısa sınavı](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/20/?loc=tr)
+
+## Gözden Geçirme & Kendi Kendine Çalışma
+
+SMOTE'nin API'ını keşfedin. En iyi hangi durumlar için kullanılıyor? Hangi problemleri çözüyor?
+
+## Ödev
+
+[Sınıflandırma yöntemlerini keşfedin](assignment.tr.md)
diff --git a/4-Classification/1-Introduction/translations/README.zh-cn.md b/4-Classification/1-Introduction/translations/README.zh-cn.md
new file mode 100644
index 00000000..ae8d123b
--- /dev/null
+++ b/4-Classification/1-Introduction/translations/README.zh-cn.md
@@ -0,0 +1,291 @@
+# 对分类方法的介绍
+
+在这四节课程中,你将会学习机器学习中的一个基本重点 - _分类_。我们会在一个关于亚洲和印度美食的精彩数据集上尝试使用多种分类算法。希望你现在有点饿了!
+
+
+
+>在学习的课程中赞叹泛亚地区的美食吧! 图片由 [Jen Looper](https://twitter.com/jenlooper)提供
+
+分类算法是[监督学习](https://wikipedia.org/wiki/Supervised_learning)的一种。它与回归算法在很多方面都有相同之处。如果说机器学习的目标是使用数据集来预测数值或事物的名称,那么分类算法通常可以分为两类:_二元分类_ 和 _多元分类_。
+
+[](https://youtu.be/eg8DJYwdMyg "对分类算法的介绍")
+
+> 🎥 点击上方给的图片可以跳转到一个视频-MIT的John对分类算法的介绍
+
+请记住:
+
+- **线性回归** 帮助你预测变量之间的关系并对一个新的数据点会落在哪条线上做出精确的预测。因此,你可以预测 _南瓜在九月的价格和十月的价格_。
+- **逻辑回归** 帮助你发现“二元范畴”:即在当前这个价格, _这个南瓜是不是橙色_?
+
+分类方法采用多种算法来确定其他可以用来确定一个数据点的标签或类别的方法。让我们来研究一下这个数据集,看看我们能否通过观察菜肴的原料来确定它的源头。
+
+## [课程前的小问题](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/19/)
+
+分类是机器学习研究者和数据科学家使用的一种基本方法。从基本的二元分类(这是不是一份垃圾邮件?)到复杂的图片分类和使用计算机视觉的分割技术,它都是将数据分类并提出相关问题的有效工具。
+
+
+
+> 需要分类算法解决的二元分类和多元分类问题的对比. 信息图由[Jen Looper](https://twitter.com/jenlooper)提供
+
+在开始清洗数据、数据可视化和调整数据以适应机器学习的任务前,让我们来了解一下多种可用来数据分类的机器学习方法。
+
+派生自[统计数学](https://wikipedia.org/wiki/Statistical_classification),分类算法使用经典的机器学习的一些特征,比如通过'吸烟者'、'体重'和'年龄'来推断 _罹患某种疾病的可能性_。作为一个与你刚刚实践过的回归算法很相似的监督学习算法,你的数据是被标记过的并且算法通过采集这些标签来进行分类和预测并进行输出。
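+
+一个假设性的小示例(数据为虚构,仅作说明):
+
+```python
+from sklearn.linear_model import LogisticRegression
+
+# 虚构数据:[是否吸烟(0/1), 体重, 年龄] -> 是否罹患疾病X(0/1)
+X = [[1, 80, 55], [0, 62, 30], [1, 95, 60], [0, 70, 25]]
+y = [1, 0, 1, 0]
+
+clf = LogisticRegression().fit(X, y)
+print(clf.predict_proba([[1, 85, 50]]))  # 新样本属于各类别的概率
+```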
+
+✅ 花一点时间来想象一下一个关于菜肴的数据集。一个多元分类的模型应该能回答什么问题?一个二元分类的模型又应该能回答什么?如果你想确定一个给定的菜肴是否会用到葫芦巴(一种植物,种子用来调味)该怎么做?如果你想知道给你一个装满了八角茴香、花椰菜和辣根的购物袋你能否做出一道代表性的印度菜又该怎么做?
+
+[](https://youtu.be/GuTeDbaNoEU "疯狂的神秘篮子")
+
+> 🎥 点击图像观看视频。整个'Chopped'节目的前提都是建立在神秘的篮子上,在这个节目中厨师必须利用随机给定的食材做菜。可见一个机器学习模型能起到不小的作用
+
+## 初见-分类器
+
+我们关于这个菜肴数据集想要提出的问题其实是一个 **多元问题**,因为我们有很多潜在的具有代表性的菜肴。给定一系列食材数据,数据能够符合这些类别中的哪一类?
+
+Scikit-learn项目提供多种对数据进行分类的算法,你需要根据问题的具体类型来进行选择。在下两节课程中你会学到这些算法中的几个。
+
+## 练习 - 清洗并平衡你的数据
+
+在你开始进行这个项目前的第一个上手任务,就是清洗和**平衡**你的数据以得到更好的结果。从这个文件夹根目录中的空白文件 _notebook.ipynb_ 开始。
+
+第一个需要安装的东西是 [imblearn](https://imbalanced-learn.org/stable/)。这是一个Scikit-learn的包,它可以让你更好地平衡数据(关于这个任务你很快就会学到更多)。
+
+1. 安装 `imblearn`, 运行命令 `pip install`:
+
+ ```python
+ pip install imblearn
+ ```
+
+1. 为了导入和可视化数据你需要导入下面的这些包, 你还需要从`imblearn`导入`SMOTE`
+
+ ```python
+ import pandas as pd
+ import matplotlib.pyplot as plt
+ import matplotlib as mpl
+ import numpy as np
+ from imblearn.over_sampling import SMOTE
+ ```
+
+ 现在你已经准备好导入数据了。
+
+1. 下一项任务是导入数据:
+
+ ```python
+ df = pd.read_csv('../data/cuisines.csv')
+ ```
+
+   使用函数 `read_csv()` 会读取csv文件 _cuisines.csv_ 的内容,并将其放置在变量 `df` 中。
+
+1. 检查数据的形状是否正确:
+
+ ```python
+ df.head()
+ ```
+
+ 前五行输出应该是这样的:
+
+ ```output
+ | | Unnamed: 0 | cuisine | almond | angelica | anise | anise_seed | apple | apple_brandy | apricot | armagnac | ... | whiskey | white_bread | white_wine | whole_grain_wheat_flour | wine | wood | yam | yeast | yogurt | zucchini |
+ | --- | ---------- | ------- | ------ | -------- | ----- | ---------- | ----- | ------------ | ------- | -------- | --- | ------- | ----------- | ---------- | ----------------------- | ---- | ---- | --- | ----- | ------ | -------- |
+ | 0 | 65 | indian | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+ | 1 | 66 | indian | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+ | 2 | 67 | indian | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+ | 3 | 68 | indian | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+ | 4 | 69 | indian | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
+ ```
+
+1. 调用函数 `info()` 可以获得有关这个数据集的信息:
+
+ ```python
+ df.info()
+ ```
+
+   输出应该是这样的:
+
+ ```output
+    <class 'pandas.core.frame.DataFrame'>
+ RangeIndex: 2448 entries, 0 to 2447
+ Columns: 385 entries, Unnamed: 0 to zucchini
+ dtypes: int64(384), object(1)
+ memory usage: 7.2+ MB
+ ```
+
+## 练习 - 了解这些菜肴
+
+现在任务变得更有趣了。让我们来探索数据在各个菜肴间的分布:
+
+1. 调用函数 `barh()`可以绘制出数据的条形图:
+
+ ```python
+ df.cuisine.value_counts().plot.barh()
+ ```
+
+ 
+
+   这里的菜肴数量是有限的,但数据的分布并不均匀。你可以修正这一点!在此之前,再稍微探索一下。
+
+1. 找出每个菜肴有多少可用的数据,并将其打印出来:
+
+ ```python
+ thai_df = df[(df.cuisine == "thai")]
+ japanese_df = df[(df.cuisine == "japanese")]
+ chinese_df = df[(df.cuisine == "chinese")]
+ indian_df = df[(df.cuisine == "indian")]
+ korean_df = df[(df.cuisine == "korean")]
+
+ print(f'thai df: {thai_df.shape}')
+ print(f'japanese df: {japanese_df.shape}')
+ print(f'chinese df: {chinese_df.shape}')
+ print(f'indian df: {indian_df.shape}')
+ print(f'korean df: {korean_df.shape}')
+ ```
+
+ 输出应该是这样的 :
+
+ ```output
+ thai df: (289, 385)
+ japanese df: (320, 385)
+ chinese df: (442, 385)
+ indian df: (598, 385)
+ korean df: (799, 385)
+ ```
+## 探索有关食材的内容
+
+现在你可以在数据中探索的更深一点并了解每道菜肴的代表性食材。你需要将反复出现的、容易造成混淆的数据清理出去,那么让我们来学习解决这个问题。
+
+1. 在Python中创建一个函数 `create_ingredient_df()` 来创建食材的数据帧。这个函数会先去掉数据中无用的列,再按数量对食材进行排序:
+
+ ```python
+    def create_ingredient_df(df):
+        ingredient_df = df.T.drop(['cuisine','Unnamed: 0']).sum(axis=1).to_frame('value')
+        ingredient_df = ingredient_df[(ingredient_df.T != 0).any()]
+        ingredient_df = ingredient_df.sort_values(by='value', ascending=False,
+                                                  inplace=False)
+        return ingredient_df
+ ```
+   现在你可以使用这个函数来了解每种菜肴最受欢迎的10种食材。
+
+1. 调用函数 `create_ingredient_df()` 然后通过函数`barh()`来绘制图像:
+
+ ```python
+ thai_ingredient_df = create_ingredient_df(thai_df)
+ thai_ingredient_df.head(10).plot.barh()
+ ```
+
+ 
+
+1. 对日本的数据进行相同的操作:
+
+ ```python
+ japanese_ingredient_df = create_ingredient_df(japanese_df)
+ japanese_ingredient_df.head(10).plot.barh()
+ ```
+
+ 
+
+1. 现在处理中国的数据:
+
+ ```python
+ chinese_ingredient_df = create_ingredient_df(chinese_df)
+ chinese_ingredient_df.head(10).plot.barh()
+ ```
+
+ 
+
+1. 绘制印度食材的数据:
+
+ ```python
+ indian_ingredient_df = create_ingredient_df(indian_df)
+ indian_ingredient_df.head(10).plot.barh()
+ ```
+
+ 
+
+1. 最后,绘制韩国的食材的数据:
+
+ ```python
+ korean_ingredient_df = create_ingredient_df(korean_df)
+ korean_ingredient_df.head(10).plot.barh()
+ ```
+
+ 
+
+1. 现在,去除在不同的菜肴间最普遍的容易造成混乱的食材,调用函数 `drop()`:
+
+   大家都喜欢米饭、大蒜和生姜!
+
+ ```python
+ feature_df= df.drop(['cuisine','Unnamed: 0','rice','garlic','ginger'], axis=1)
+ labels_df = df.cuisine #.unique()
+ feature_df.head()
+ ```
+
+## 平衡数据集
+
+现在你已经清理过数据集了, 使用 [SMOTE](https://imbalanced-learn.org/dev/references/generated/imblearn.over_sampling.SMOTE.html) - "Synthetic Minority Over-sampling Technique" - 来平衡数据集。
+
+1. 调用函数 `fit_resample()`:此策略通过插值来生成新的样本。
+
+ ```python
+ oversample = SMOTE()
+ transformed_feature_df, transformed_label_df = oversample.fit_resample(feature_df, labels_df)
+ ```
+
+   通过平衡数据,你在分类时能够得到更好的结果。想想一个二元分类问题:如果你的数据集中大部分数据都属于某一个类别,机器学习模型就会仅仅因为该类别的数据更多,而更频繁地预测该类别。平衡数据能够纠正倾斜的数据,帮助消除这种不平衡。
+
+1. 现在你可以查看每个食材的标签数量:
+
+ ```python
+ print(f'new label count: {transformed_label_df.value_counts()}')
+ print(f'old label count: {df.cuisine.value_counts()}')
+ ```
+
+ 输出应该是这样的 :
+
+ ```output
+ new label count: korean 799
+ chinese 799
+ indian 799
+ japanese 799
+ thai 799
+ Name: cuisine, dtype: int64
+ old label count: korean 799
+ indian 598
+ chinese 442
+ japanese 320
+ thai 289
+ Name: cuisine, dtype: int64
+ ```
+
+   现在这个数据集不仅干净、平衡,而且还很“美味”!
+
+1. 最后一步是保存你处理过后的平衡的数据(包括标签和特征),将其保存为一个可以被输出到文件中的数据帧。
+
+ ```python
+ transformed_df = pd.concat([transformed_label_df,transformed_feature_df],axis=1, join='outer')
+ ```
+
+1. 你可以通过调用函数 `transformed_df.head()` 和 `transformed_df.info()`再检查一下你的数据。 接下来要将数据保存以供在未来的课程中使用:
+
+ ```python
+ transformed_df.head()
+ transformed_df.info()
+ transformed_df.to_csv("../data/cleaned_cuisines.csv")
+ ```
+
+ 这个全新的CSV文件可以在数据根目录中被找到。
+
+---
+
+## 🚀小练习
+
+本项目的全部课程含有很多有趣的数据集。 探索一下 `data`文件夹,看看这里面有没有适合二元分类、多元分类算法的数据集,再想一下你对这些数据集有没有什么想问的问题。
+
+## [课后练习](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/20/)
+
+## 回顾 & 自学
+
+探索一下 SMOTE的API文档。思考一下它最适合于什么样的情况、它能够解决什么样的问题。
+
+## 课后作业
+
+[探索一下分类方法](../assignment.md)
diff --git a/4-Classification/1-Introduction/translations/assignment.it.md b/4-Classification/1-Introduction/translations/assignment.it.md
new file mode 100644
index 00000000..12834017
--- /dev/null
+++ b/4-Classification/1-Introduction/translations/assignment.it.md
@@ -0,0 +1,11 @@
+# Esplorare i metodi di classificazione
+
+## Istruzioni
+
+Nella [documentazione](https://scikit-learn.org/stable/supervised_learning.html) di Scikit-learn si troverà un ampio elenco di modi per classificare i dati. Fare una piccola caccia al tesoro in questi documenti: l'obiettivo è cercare metodi di classificazione e abbinare un insieme di dati in questo programma di studi, una domanda che si può porre e una tecnica di classificazione. Creare un foglio di calcolo o una tabella in un file .doc e spiegare come funzionerebbe l'insieme di dati con l'algoritmo di classificazione.
+
+## Rubrica
+
+| Criteri | Ottimo | Adeguato | Necessita miglioramento |
+| -------- | ----------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| | viene presentato un documento che riporta una panoramica di 5 algoritmi insieme a una tecnica di classificazione. La panoramica è ben spiegata e dettagliata. | viene presentato un documento che riporta una panoramica di 3 algoritmi insieme a una tecnica di classificazione. La panoramica è ben spiegata e dettagliata. | viene presentato un documento che riporta una panoramica di meno di tre algoritmi insieme a una tecnica di classificazione e la panoramica non è né ben spiegata né dettagliata. |
diff --git a/4-Classification/1-Introduction/translations/assignment.tr.md b/4-Classification/1-Introduction/translations/assignment.tr.md
new file mode 100644
index 00000000..99dfe5c2
--- /dev/null
+++ b/4-Classification/1-Introduction/translations/assignment.tr.md
@@ -0,0 +1,11 @@
+# Sınıflandırma yöntemlerini keşfedin
+
+## Yönergeler
+
+[Scikit-learn dokümentasyonunda](https://scikit-learn.org/stable/supervised_learning.html) veriyi sınıflandırma yöntemlerini içeren büyük bir liste göreceksiniz. Bu dokümanlar arasında ufak bir çöpçü avı yapın: Hedefiniz, sınıflandırma yöntemleri aramak ve bu eğitim programındaki bir veri seti, sorabileceğiniz bir soru ve bir sınıflandırma yöntemi eşleştirmek. Bir .doc dosyasında elektronik çizelge veya tablo hazırlayın ve veri setinin sınıflandırma algoritmasıyla nasıl çalışacağını açıklayın.
+
+## Rubrik
+
+| Ölçüt | Örnek Alınacak Nitelikte | Yeterli | Geliştirme Gerekli |
+| -------- | ----------------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------------------------------------------------------------------------------------------------------------------- |
+| | Bir sınıflandırma yönteminin yanısıra 5 algoritmayı inceleyen bir doküman sunulmuş. İnceleme iyi açıklanmış ve detaylı. | Bir sınıflandırma yönteminin yanısıra 3 algoritmayı inceleyen bir doküman sunulmuş. İnceleme iyi açıklanmış ve detaylı. | Bir sınıflandırma yönteminin yanısıra 3'ten az algoritmayı inceleyen bir doküman sunulmuş ve inceleme iyi açıklanmış veya detaylı değil. |
diff --git a/4-Classification/2-Classifiers-1/README.md b/4-Classification/2-Classifiers-1/README.md
index 15800922..c6b35219 100644
--- a/4-Classification/2-Classifiers-1/README.md
+++ b/4-Classification/2-Classifiers-1/README.md
@@ -15,21 +15,20 @@ Assuming you completed [Lesson 1](../1-Introduction/README.md), make sure that a
```python
import pandas as pd
- cuisines_df = pd.read_csv("../../data/cleaned_cuisine.csv")
+ cuisines_df = pd.read_csv("../../data/cleaned_cuisines.csv")
cuisines_df.head()
```
The data looks like this:
- ```output
- | | Unnamed: 0 | cuisine | almond | angelica | anise | anise_seed | apple | apple_brandy | apricot | armagnac | ... | whiskey | white_bread | white_wine | whole_grain_wheat_flour | wine | wood | yam | yeast | yogurt | zucchini |
- | --- | ---------- | ------- | ------ | -------- | ----- | ---------- | ----- | ------------ | ------- | -------- | --- | ------- | ----------- | ---------- | ----------------------- | ---- | ---- | --- | ----- | ------ | -------- |
- | 0 | 0 | indian | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
- | 1 | 1 | indian | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
- | 2 | 2 | indian | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
- | 3 | 3 | indian | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
- | 4 | 4 | indian | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
- ```
+| | Unnamed: 0 | cuisine | almond | angelica | anise | anise_seed | apple | apple_brandy | apricot | armagnac | ... | whiskey | white_bread | white_wine | whole_grain_wheat_flour | wine | wood | yam | yeast | yogurt | zucchini |
+| --- | ---------- | ------- | ------ | -------- | ----- | ---------- | ----- | ------------ | ------- | -------- | --- | ------- | ----------- | ---------- | ----------------------- | ---- | ---- | --- | ----- | ------ | -------- |
+| 0 | 0 | indian | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+| 1 | 1 | indian | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+| 2 | 2 | indian | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+| 3 | 3 | indian | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+| 4 | 4 | indian | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
+
1. Now, import several more libraries:
@@ -68,13 +67,13 @@ Assuming you completed [Lesson 1](../1-Introduction/README.md), make sure that a
Your features look like this:
- | almond | angelica | anise | anise_seed | apple | apple_brandy | apricot | armagnac | artemisia | artichoke | ... | whiskey | white_bread | white_wine | whole_grain_wheat_flour | wine | wood | yam | yeast | yogurt | zucchini | |
- | -----: | -------: | ----: | ---------: | ----: | -----------: | ------: | -------: | --------: | --------: | ---: | ------: | ----------: | ---------: | ----------------------: | ---: | ---: | ---: | ----: | -----: | -------: | --- |
- | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
- | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
- | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
- | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
- | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
+| | almond | angelica | anise | anise_seed | apple | apple_brandy | apricot | armagnac | artemisia | artichoke | ... | whiskey | white_bread | white_wine | whole_grain_wheat_flour | wine | wood | yam | yeast | yogurt | zucchini |
+| -----: | -------: | ----: | ---------: | ----: | -----------: | ------: | -------: | --------: | --------: | ---: | ------: | ----------: | ---------: | ----------------------: | ---: | ---: | ---: | ----: | -----: | -------: | -----: |
+| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+| 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+| 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+| 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+| 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 | 0 |
Now you are ready to train your model!
@@ -200,13 +199,13 @@ Since you are using the multiclass case, you need to choose what _scheme_ to use
The result is printed - Indian cuisine is its best guess, with good probability:
- | | 0 | | | | | | | | | | | | | | | | | | | | |
- | -------: | -------: | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
- | indian | 0.715851 | | | | | | | | | | | | | | | | | | | | |
- | chinese | 0.229475 | | | | | | | | | | | | | | | | | | | | |
- | japanese | 0.029763 | | | | | | | | | | | | | | | | | | | | |
- | korean | 0.017277 | | | | | | | | | | | | | | | | | | | | |
- | thai | 0.007634 | | | | | | | | | | | | | | | | | | | | |
+ | | 0 |
+ | -------: | -------: |
+ | indian | 0.715851 |
+ | chinese | 0.229475 |
+ | japanese | 0.029763 |
+ | korean | 0.017277 |
+ | thai | 0.007634 |
✅ Can you explain why the model is pretty sure this is an Indian cuisine?
@@ -217,22 +216,23 @@ Since you are using the multiclass case, you need to choose what _scheme_ to use
print(classification_report(y_test,y_pred))
```
- | precision | recall | f1-score | support | | | | | | | | | | | | | | | | | | |
- | ------------ | ------ | -------- | ------- | ---- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- | --- |
- | chinese | 0.73 | 0.71 | 0.72 | 229 | | | | | | | | | | | | | | | | | |
- | indian | 0.91 | 0.93 | 0.92 | 254 | | | | | | | | | | | | | | | | | |
- | japanese | 0.70 | 0.75 | 0.72 | 220 | | | | | | | | | | | | | | | | | |
- | korean | 0.86 | 0.76 | 0.81 | 242 | | | | | | | | | | | | | | | | | |
- | thai | 0.79 | 0.85 | 0.82 | 254 | | | | | | | | | | | | | | | | | |
- | accuracy | 0.80 | 1199 | | | | | | | | | | | | | | | | | | | |
- | macro avg | 0.80 | 0.80 | 0.80 | 1199 | | | | | | | | | | | | | | | | | |
- | weighted avg | 0.80 | 0.80 | 0.80 | 1199 | | | | | | | | | | | | | | | | | |
+ | | precision | recall | f1-score | support |
+ | ------------ | ------ | -------- | ------- | ---- |
+ | chinese | 0.73 | 0.71 | 0.72 | 229 |
+ | indian | 0.91 | 0.93 | 0.92 | 254 |
+ | japanese | 0.70 | 0.75 | 0.72 | 220 |
+ | korean | 0.86 | 0.76 | 0.81 | 242 |
+ | thai | 0.79 | 0.85 | 0.82 | 254 |
+ | accuracy | | | 0.80 | 1199 |
+ | macro avg | 0.80 | 0.80 | 0.80 | 1199 |
+ | weighted avg | 0.80 | 0.80 | 0.80 | 1199 |
## 🚀Challenge
In this lesson, you used your cleaned data to build a machine learning model that can predict a national cuisine based on a series of ingredients. Take some time to read through the many options Scikit-learn provides to classify data. Dig deeper into the concept of 'solver' to understand what goes on behind the scenes.
## [Post-lecture quiz](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/22/)
+
## Review & Self Study
Dig a little more into the math behind logistic regression in [this lesson](https://people.eecs.berkeley.edu/~russell/classes/cs194/f11/lectures/CS194%20Fall%202011%20Lecture%2006.pdf)
diff --git a/4-Classification/2-Classifiers-1/solution/notebook.ipynb b/4-Classification/2-Classifiers-1/solution/notebook.ipynb
index a819dbe5..770ac85c 100644
--- a/4-Classification/2-Classifiers-1/solution/notebook.ipynb
+++ b/4-Classification/2-Classifiers-1/solution/notebook.ipynb
@@ -47,7 +47,7 @@
],
"source": [
"import pandas as pd\n",
- "cuisines_df = pd.read_csv(\"../../data/cleaned_cuisine.csv\")\n",
+ "cuisines_df = pd.read_csv(\"../../data/cleaned_cuisines.csv\")\n",
"cuisines_df.head()"
]
},
diff --git a/4-Classification/2-Classifiers-1/translations/README.it.md b/4-Classification/2-Classifiers-1/translations/README.it.md
new file mode 100644
index 00000000..674a59c2
--- /dev/null
+++ b/4-Classification/2-Classifiers-1/translations/README.it.md
@@ -0,0 +1,241 @@
+# Cuisine classifiers 1
+
+In this lesson, you will use the dataset you saved from the last lesson, full of balanced, clean data all about cuisines.
+
+You will use this dataset with a variety of classifiers to _predict a given national cuisine based on a group of ingredients_. While doing so, you'll learn more about some of the ways that algorithms can be leveraged for classification tasks.
+
+## [Pre-lecture quiz](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/21/)
+# Preparation
+
+Assuming you completed [Lesson 1](../1-Introduction/README.md), make sure that a _cleaned_cuisines.csv_ file exists in the root `/data` folder for these four lessons.
+
+## Exercise - predict a national cuisine
+
+1. Working with this lesson's _notebook.ipynb_ file in the root folder, import that file along with the Pandas library:
+
+    ```python
+    import pandas as pd
+    cuisines_df = pd.read_csv("../../data/cleaned_cuisines.csv")
+    cuisines_df.head()
+    ```
+
+    The data looks like this:
+
+ ```output
+ | | Unnamed: 0 | cuisine | almond | angelica | anise | anise_seed | apple | apple_brandy | apricot | armagnac | ... | whiskey | white_bread | white_wine | whole_grain_wheat_flour | wine | wood | yam | yeast | yogurt | zucchini |
+ | --- | ---------- | ------- | ------ | -------- | ----- | ---------- | ----- | ------------ | ------- | -------- | --- | ------- | ----------- | ---------- | ----------------------- | ---- | ---- | --- | ----- | ------ | -------- |
+ | 0 | 0 | indian | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+ | 1 | 1 | indian | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+ | 2 | 2 | indian | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+ | 3 | 3 | indian | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+ | 4 | 4 | indian | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
+ ```
+
+1. Now, import several more libraries:
+
+ ```python
+ from sklearn.linear_model import LogisticRegression
+ from sklearn.model_selection import train_test_split, cross_val_score
+ from sklearn.metrics import accuracy_score,precision_score,confusion_matrix,classification_report, precision_recall_curve
+ from sklearn.svm import SVC
+ import numpy as np
+ ```
+
+1. Divide the X and y coordinates into two dataframes for training. `cuisine` can be the labels dataframe:
+
+ ```python
+ cuisines_label_df = cuisines_df['cuisine']
+ cuisines_label_df.head()
+ ```
+
+    It will look like this:
+
+ ```output
+ 0 indian
+ 1 indian
+ 2 indian
+ 3 indian
+ 4 indian
+ Name: cuisine, dtype: object
+ ```
+
+1. Drop the `Unnamed: 0` column and the `cuisine` column by calling `drop()`. Save the rest of the data as trainable features:
+
+ ```python
+ cuisines_feature_df = cuisines_df.drop(['Unnamed: 0', 'cuisine'], axis=1)
+ cuisines_feature_df.head()
+ ```
+
+    Your features look like this:
+
+   | | almond | angelica | anise | anise_seed | apple | apple_brandy | apricot | armagnac | artemisia | artichoke | ... | whiskey | white_bread | white_wine | whole_grain_wheat_flour | wine | wood | yam | yeast | yogurt | zucchini |
+   | ---: | -----: | -------: | ----: | ---------: | ----: | -----------: | ------: | -------: | --------: | --------: | ---: | ------: | ----------: | ---------: | ----------------------: | ---: | ---: | ---: | ----: | -----: | -------: |
+   | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+   | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+   | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+   | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+   | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
+
+Now you are ready to train your model!
+
+## Choosing your classifier
+
+Now that your data is clean and ready for training, you have to decide which algorithm to use for the job.
+
+Scikit-learn groups classification under Supervised Learning, and in that category you will find many ways to classify. [The variety](https://scikit-learn.org/stable/supervised_learning.html) is quite bewildering at first sight. The following methods all include classification techniques:
+
+- Linear Models
+- Support Vector Machines
+- Stochastic Gradient Descent
+- Nearest Neighbors
+- Gaussian Processes
+- Decision Trees
+- Ensemble methods (voting classifier)
+- Multiclass and multioutput algorithms (multiclass and multilabel classification, multiclass-multioutput classification)
+
+> You can also use [neural networks to classify data](https://scikit-learn.org/stable/modules/neural_networks_supervised.html#classification), but that is outside the scope of this lesson.
+
+### What classifier to go with?
+
+So, which classifier should you choose? Often, running through several and looking for a good result is one way to test. Scikit-learn offers a [side-by-side comparison](https://scikit-learn.org/stable/auto_examples/classification/plot_classifier_comparison.html) on a created dataset, comparing KNeighbors, SVC two ways, GaussianProcessClassifier, DecisionTreeClassifier, RandomForestClassifier, MLPClassifier, AdaBoostClassifier, GaussianNB and QuadraticDiscriminantAnalysis, showing the results visualized:
+
+
+> Plots generated from Scikit-learn's documentation
+
+> AutoML solves this problem neatly by running these comparisons in the cloud, allowing you to choose the best algorithm for your data. Try it [here](https://docs.microsoft.com/learn/modules/automate-model-selection-with-azure-automl/?WT.mc_id=academic-15963-cxa)
+
+### A better approach
+
+A better way than wildly guessing, however, is to follow the ideas on this downloadable [ML Cheat sheet](https://docs.microsoft.com/azure/machine-learning/algorithm-cheat-sheet?WT.mc_id=academic-15963-cxa). There, we discover that, for our multiclass problem, we have some choices:
+
+
+> A section of Microsoft's Algorithm Cheat Sheet, detailing multiclass classification options
+
+✅ Download this cheat sheet, print it out, and hang it on your wall!
+
+### Reasoning
+
+Let's see if we can reason our way through different approaches given the constraints we have:
+
+- **Neural networks are too heavy**. Given our clean but minimal dataset, and the fact that we are running training locally via notebooks, neural networks are too heavyweight for this task.
+- **No two-class classifier**. We do not use a two-class classifier, so that rules out one-vs-all.
+- **A decision tree or logistic regression could work**. A decision tree might work, as might logistic regression for multiclass data.
+- **Multiclass Boosted Decision Trees solve a different problem**. The multiclass boosted decision tree is most suitable for nonparametric tasks, e.g. tasks designed to build rankings, so it is not useful for us.
+
+### Using Scikit-learn
+
+We will be using Scikit-learn to analyze our data. However, there are many ways to use logistic regression in Scikit-learn. Take a look at the [parameters to pass](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html?highlight=logistic%20regressio#sklearn.linear_model.LogisticRegression).
+
+Essentially there are two important parameters - `multi_class` and `solver` - that we need to specify when we ask Scikit-learn to perform a logistic regression. The `multi_class` value applies a certain behavior. The value of the solver is what algorithm to use. Not all solvers can be paired with all `multi_class` values.
+
+According to the docs, in the multiclass case, the training algorithm:
+
+- **Uses the one-vs-rest (OvR) scheme**, if the `multi_class` option is set to `ovr`
+- **Uses the cross-entropy loss**, if the `multi_class` option is set to `multinomial`. (Currently the `multinomial` option is supported only by the 'lbfgs', 'sag', 'saga' and 'newton-cg' solvers.)
+
+> 🎓 The 'scheme' here can either be 'ovr' (one-vs-rest) or 'multinomial'. Since logistic regression is really designed to support binary classification, these schemes allow it to better handle multiclass classification tasks. [source](https://machinelearningmastery.com/one-vs-rest-and-one-vs-one-for-multi-class-classification/)
+
+> 🎓 The 'solver' is defined as "the algorithm to use in the optimization problem". [source](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html?highlight=logistic%20regressio#sklearn.linear_model.LogisticRegression).
+
+Scikit-learn offers this table to explain how solvers handle the different challenges presented by different kinds of data structures:
+
+
+
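+As a rough illustration of how these two parameters interact, here is a minimal sketch, assuming the `cuisines_feature_df` and `cuisines_label_df` dataframes built above, that fits the same data under both schemes and compares accuracy (the exact numbers will vary from run to run):
+
+```python
+from sklearn.linear_model import LogisticRegression
+from sklearn.model_selection import train_test_split
+import numpy as np
+
+X_train, X_test, y_train, y_test = train_test_split(
+    cuisines_feature_df, cuisines_label_df, test_size=0.3)
+
+# 'ovr' trains one binary classifier per cuisine; liblinear supports it.
+ovr = LogisticRegression(multi_class='ovr', solver='liblinear')
+
+# 'multinomial' optimizes a single cross-entropy loss over all classes;
+# it requires one of the lbfgs/sag/saga/newton-cg solvers.
+mn = LogisticRegression(multi_class='multinomial', solver='lbfgs', max_iter=1000)
+
+for name, clf in [('ovr + liblinear', ovr), ('multinomial + lbfgs', mn)]:
+    clf.fit(X_train, np.ravel(y_train))
+    print(name, clf.score(X_test, y_test))
+```
+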
+## Exercise - split the data
+
+We can focus on logistic regression for our first training trial since you recently learned about it in a previous lesson.
+Split your data into training and testing groups by calling `train_test_split()`:
+
+```python
+X_train, X_test, y_train, y_test = train_test_split(cuisines_feature_df, cuisines_label_df, test_size=0.3)
+```
+
+## Exercise - apply logistic regression
+
+Since you are using the multiclass case, you need to choose what _scheme_ to use and what _solver_ to set. Use LogisticRegression with a multiclass setting and the **liblinear** solver to train.
+
+1. Create a logistic regression with multi_class set to `ovr` and the solver set to `liblinear`:
+
+ ```python
+ lr = LogisticRegression(multi_class='ovr',solver='liblinear')
+ model = lr.fit(X_train, np.ravel(y_train))
+
+ accuracy = model.score(X_test, y_test)
+ print ("Accuracy is {}".format(accuracy))
+ ```
+
+    ✅ Try a different solver like `lbfgs`, which is often set as the default
+
+    > Note, use Pandas' [`ravel`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.ravel.html) function to flatten your data when needed.
+
+    The accuracy is good at over **80%**!
+
+1. You can see this model in action by testing one row of data (#50):
+
+ ```python
+ print(f'ingredients: {X_test.iloc[50][X_test.iloc[50]!=0].keys()}')
+ print(f'cuisine: {y_test.iloc[50]}')
+ ```
+
+    The result is printed:
+
+ ```output
+ ingredients: Index(['cilantro', 'onion', 'pea', 'potato', 'tomato', 'vegetable_oil'], dtype='object')
+ cuisine: indian
+ ```
+
+    ✅ Try a different row number and check the results
+
+1. Digging deeper, you can check the accuracy of this prediction:
+
+ ```python
+ test= X_test.iloc[50].values.reshape(-1, 1).T
+ proba = model.predict_proba(test)
+ classes = model.classes_
+ resultdf = pd.DataFrame(data=proba, columns=classes)
+
+ topPrediction = resultdf.T.sort_values(by=[0], ascending = [False])
+ topPrediction.head()
+ ```
+
+    The result is printed - Indian cuisine is its best guess, with good probability:
+
+    |          |        0 |
+    | -------: | -------: |
+    | indian   | 0.715851 |
+    | chinese  | 0.229475 |
+    | japanese | 0.029763 |
+    | korean   | 0.017277 |
+    | thai     | 0.007634 |
+
+    ✅ Can you explain why the model is pretty sure this is an Indian cuisine?
+
+1. Get more detail by printing a classification report, as you did in the regression lessons:
+
+ ```python
+ y_pred = model.predict(X_test)
+ print(classification_report(y_test,y_pred))
+ ```
+
+    |              | precision | recall | f1-score | support |
+    | ------------ | --------- | ------ | -------- | ------- |
+    | chinese      | 0.73      | 0.71   | 0.72     | 229     |
+    | indian       | 0.91      | 0.93   | 0.92     | 254     |
+    | japanese     | 0.70      | 0.75   | 0.72     | 220     |
+    | korean       | 0.86      | 0.76   | 0.81     | 242     |
+    | thai         | 0.79      | 0.85   | 0.82     | 254     |
+    | accuracy     |           |        | 0.80     | 1199    |
+    | macro avg    | 0.80      | 0.80   | 0.80     | 1199    |
+    | weighted avg | 0.80      | 0.80   | 0.80     | 1199    |
+
+## 🚀 Challenge
+
+In this lesson, you used your cleaned data to build a machine learning model that can predict a national cuisine based on a series of ingredients. Take some time to read through the many options Scikit-learn provides to classify data. Dig deeper into the concept of 'solver' to understand what goes on behind the scenes.
+
+## [Post-lecture quiz](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/22/)
+## Review & Self Study
+
+Dig a little more into the math behind logistic regression in [this lesson](https://people.eecs.berkeley.edu/~russell/classes/cs194/f11/lectures/CS194%20Fall%202011%20Lecture%2006.pdf)
+## Assignment
+
+[Study the solvers](assignment.it.md)
diff --git a/4-Classification/2-Classifiers-1/translations/README.tr.md b/4-Classification/2-Classifiers-1/translations/README.tr.md
new file mode 100644
index 00000000..f02bd759
--- /dev/null
+++ b/4-Classification/2-Classifiers-1/translations/README.tr.md
@@ -0,0 +1,241 @@
+# Cuisine classifiers 1
+
+In this lesson, you will use the dataset you saved from the last lesson, full of balanced, clean data all about cuisines.
+
+You will use this dataset with a variety of classifiers to _predict a given national cuisine based on a group of ingredients_. While doing so, you'll learn more about some of the ways that algorithms can be leveraged for classification tasks.
+
+## [Pre-lecture quiz](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/21/?loc=tr)
+# Preparation
+
+Assuming you completed [Lesson 1](../../1-Introduction/README.md), make sure that a _cleaned_cuisines.csv_ file exists in the root `/data` folder for these four lessons.
+
+## Exercise - predict a national cuisine
+
+1. Working in this lesson's _notebook.ipynb_ file, import that file along with the Pandas library:
+
+ ```python
+ import pandas as pd
+ cuisines_df = pd.read_csv("../data/cleaned_cuisines.csv")
+ cuisines_df.head()
+ ```
+
+    The data looks like this:
+
+| | Unnamed: 0 | cuisine | almond | angelica | anise | anise_seed | apple | apple_brandy | apricot | armagnac | ... | whiskey | white_bread | white_wine | whole_grain_wheat_flour | wine | wood | yam | yeast | yogurt | zucchini |
+| --- | ---------- | ------- | ------ | -------- | ----- | ---------- | ----- | ------------ | ------- | -------- | --- | ------- | ----------- | ---------- | ----------------------- | ---- | ---- | --- | ----- | ------ | -------- |
+| 0 | 0 | indian | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+| 1 | 1 | indian | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+| 2 | 2 | indian | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+| 3 | 3 | indian | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+| 4 | 4 | indian | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
+
+
+1. Now, import several more libraries:
+
+ ```python
+ from sklearn.linear_model import LogisticRegression
+ from sklearn.model_selection import train_test_split, cross_val_score
+ from sklearn.metrics import accuracy_score,precision_score,confusion_matrix,classification_report, precision_recall_curve
+ from sklearn.svm import SVC
+ import numpy as np
+ ```
+
+1. Divide the X and y coordinates into two dataframes for training. `cuisine` can be the labels dataframe:
+
+ ```python
+ cuisines_label_df = cuisines_df['cuisine']
+ cuisines_label_df.head()
+ ```
+
+    It will look like this:
+
+ ```output
+ 0 indian
+ 1 indian
+ 2 indian
+ 3 indian
+ 4 indian
+ Name: cuisine, dtype: object
+ ```
+
+1. Drop the `Unnamed: 0` and `cuisine` columns by calling `drop()`. Save the remaining data as trainable features:
+
+ ```python
+ cuisines_feature_df = cuisines_df.drop(['Unnamed: 0', 'cuisine'], axis=1)
+ cuisines_feature_df.head()
+ ```
+
+    Your features look like this:
+
+| | almond | angelica | anise | anise_seed | apple | apple_brandy | apricot | armagnac | artemisia | artichoke | ... | whiskey | white_bread | white_wine | whole_grain_wheat_flour | wine | wood | yam | yeast | yogurt | zucchini |
+| ---: | -----: | -------: | ----: | ---------: | ----: | -----------: | ------: | -------: | --------: | --------: | ---: | ------: | ----------: | ---------: | ----------------------: | ---: | ---: | ---: | ----: | -----: | -------: |
+| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+| 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+| 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+| 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+| 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
+
+Now you are ready to train your model!
+
+## Choosing your classifier
+
+Your data is clean and ready for training; now you have to decide which algorithm to use for the job.
+
+Scikit-learn groups classification under Supervised Learning, and in that category you will find many ways to classify. [The variety](https://scikit-learn.org/stable/supervised_learning.html) is quite bewildering at first sight. The following methods all include classification techniques:
+
+- Linear Models
+- Support Vector Machines
+- Stochastic Gradient Descent
+- Nearest Neighbors
+- Gaussian Processes
+- Decision Trees
+- Ensemble methods (voting classifier)
+- Multiclass and multioutput algorithms (multiclass and multilabel classification, multiclass-multioutput classification)
+
+> You can also use [neural networks to classify data](https://scikit-learn.org/stable/modules/neural_networks_supervised.html#classification), but that is outside the scope of this lesson.
+
+### What classifier to go with?
+
+So, which classifier should you choose? Often, running through several and looking for a good result is one way to test. Scikit-learn offers a [side-by-side comparison](https://scikit-learn.org/stable/auto_examples/classification/plot_classifier_comparison.html) on a created dataset, comparing KNeighbors, SVC two ways, GaussianProcessClassifier, DecisionTreeClassifier, RandomForestClassifier, MLPClassifier, AdaBoostClassifier, GaussianNB and QuadraticDiscriminantAnalysis, showing the results visualized:
+
+
+> Plots generated from Scikit-learn's documentation
+
+> AutoML solves this problem neatly by running these comparisons in the cloud, allowing you to choose the best algorithm for your data. Try it [here](https://docs.microsoft.com/learn/modules/automate-model-selection-with-azure-automl/?WT.mc_id=academic-15963-cxa).
+
+### A better approach
+
+A better way than guessing like that, however, is to follow the ideas on this downloadable [ML Cheat sheet](https://docs.microsoft.com/azure/machine-learning/algorithm-cheat-sheet?WT.mc_id=academic-15963-cxa). There, we discover that, for our multiclass problem, we have some choices:
+
+
+> A section of Microsoft's Algorithm Cheat Sheet, detailing multiclass classification options
+
+:white_check_mark: Download this cheat sheet, print it out, and hang it on your wall!
+
+### Reasoning
+
+Let's see if we can reason our way through different approaches given the constraints we have:
+
+- **Neural networks are too heavy**. Given our clean but minimal dataset, and the fact that we are running training locally via notebooks, neural networks are too heavyweight for this task.
+- **No two-class classifier**. We do not use a two-class classifier, so that rules out one-vs-all.
+- **A decision tree or logistic regression could work**. A decision tree might work, as might logistic regression for multiclass data.
+- **Multiclass Boosted Decision Trees solve a different problem**. The multiclass boosted decision tree is most suitable for nonparametric tasks, e.g. tasks designed to build rankings, so it is not useful for us.
+
+### Using Scikit-learn
+
+We will be using Scikit-learn to analyze our data. However, there are many ways to use logistic regression in Scikit-learn. Take a look at the [parameters to pass](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html?highlight=logistic%20regressio#sklearn.linear_model.LogisticRegression).
+
+Essentially there are two important parameters - `multi_class` and `solver` - that we need to specify when we ask Scikit-learn to perform a logistic regression. The `multi_class` value applies a certain behavior. The value of the solver is what algorithm to use. Not all solvers can be paired with all `multi_class` values.
+
+According to the docs, in the multiclass case, the training algorithm:
+
+- **Uses the one-vs-rest (OvR) scheme**, if the `multi_class` option is set to `ovr`
+- **Uses the cross-entropy loss**, if the `multi_class` option is set to `multinomial`. (Currently the `multinomial` option is supported only by the 'lbfgs', 'sag', 'saga' and 'newton-cg' solvers.)
+
+> :mortar_board: The 'scheme' here can either be 'ovr' (one-vs-rest) or 'multinomial'. Since logistic regression is really designed to support binary classification, these schemes allow it to better handle multiclass classification tasks. [source](https://machinelearningmastery.com/one-vs-rest-and-one-vs-one-for-multi-class-classification/)
+
+> :mortar_board: The 'solver' is defined as "the algorithm to use in the optimization problem". [source](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html?highlight=logistic%20regressio#sklearn.linear_model.LogisticRegression)
+
+Scikit-learn offers this table to explain how solvers handle the different challenges presented by different kinds of data structures:
+
+
+
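+Not every pairing is legal, and Scikit-learn will tell you so at fit time. The following is a small sketch on toy, made-up data (the `X` and `y` arrays here are placeholders, not part of the lesson) showing one compatible and one incompatible combination:
+
+```python
+import numpy as np
+from sklearn.linear_model import LogisticRegression
+
+X = np.random.rand(20, 4)               # toy features
+y = np.array(['a', 'b', 'c', 'd'] * 5)  # toy labels, four classes
+
+# liblinear pairs with 'ovr'...
+LogisticRegression(multi_class='ovr', solver='liblinear').fit(X, y)
+
+# ...but not with 'multinomial'; scikit-learn rejects the combination.
+try:
+    LogisticRegression(multi_class='multinomial', solver='liblinear').fit(X, y)
+except ValueError as err:
+    print(err)
+```
+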
+## Exercise - split the data
+
+We can focus on logistic regression for our first training trial since you just learned about it in the previous lesson.
+Split your data into training and testing groups by calling `train_test_split()`:
+
+```python
+X_train, X_test, y_train, y_test = train_test_split(cuisines_feature_df, cuisines_label_df, test_size=0.3)
+```
+
+## Exercise - apply logistic regression
+
+Since you are using the multiclass case, you need to choose what _scheme_ to use and what _solver_ to set. Use LogisticRegression with a multiclass setting and the **liblinear** solver to train.
+
+1. Create a logistic regression with multi_class set to `ovr` and the solver set to `liblinear`:
+
+ ```python
+ lr = LogisticRegression(multi_class='ovr',solver='liblinear')
+ model = lr.fit(X_train, np.ravel(y_train))
+
+ accuracy = model.score(X_test, y_test)
+ print ("Accuracy is {}".format(accuracy))
+ ```
+
+    :white_check_mark: Try a different solver like `lbfgs`, which is often set as the default.
+
+    > Note, use Pandas' [`ravel`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.ravel.html) function to flatten your data when needed.
+
+    The accuracy is good at over **80%**!
+
+1. You can see this model in action by testing one row of data (#50):
+
+ ```python
+ print(f'ingredients: {X_test.iloc[50][X_test.iloc[50]!=0].keys()}')
+ print(f'cuisine: {y_test.iloc[50]}')
+ ```
+
+    The result is printed:
+
+ ```output
+ ingredients: Index(['cilantro', 'onion', 'pea', 'potato', 'tomato', 'vegetable_oil'], dtype='object')
+ cuisine: indian
+ ```
+
+    :white_check_mark: Try a different row number and check the results
+
+1. Digging deeper, you can check the accuracy of this prediction:
+
+ ```python
+ test= X_test.iloc[50].values.reshape(-1, 1).T
+ proba = model.predict_proba(test)
+ classes = model.classes_
+ resultdf = pd.DataFrame(data=proba, columns=classes)
+
+ topPrediction = resultdf.T.sort_values(by=[0], ascending = [False])
+ topPrediction.head()
+ ```
+
+    The result is printed - Indian cuisine is its best guess, with good probability:
+
+ | | 0 |
+ | -------: | -------: |
+ | indian | 0.715851 |
+ | chinese | 0.229475 |
+ | japanese | 0.029763 |
+ | korean | 0.017277 |
+ | thai | 0.007634 |
+
+    :white_check_mark: Can you explain why the model is pretty sure this is an Indian cuisine?
+
+1. Get more detail by printing a classification report, as you did in the regression lessons:
+
+ ```python
+ y_pred = model.predict(X_test)
+ print(classification_report(y_test,y_pred))
+ ```
+
+ | | precision | recall | f1-score | support |
+ | ------------ | ------ | -------- | ------- | ---- |
+ | chinese | 0.73 | 0.71 | 0.72 | 229 |
+ | indian | 0.91 | 0.93 | 0.92 | 254 |
+ | japanese | 0.70 | 0.75 | 0.72 | 220 |
+ | korean | 0.86 | 0.76 | 0.81 | 242 |
+ | thai | 0.79 | 0.85 | 0.82 | 254 |
+    | accuracy     |        |        | 0.80     | 1199    |
+ | macro avg | 0.80 | 0.80 | 0.80 | 1199 |
+ | weighted avg | 0.80 | 0.80 | 0.80 | 1199 |
+
+## :rocket: Challenge
+
+In this lesson, you used your cleaned data to build a machine learning model that can predict a national cuisine based on a series of ingredients. Take some time to read through the many options Scikit-learn provides to classify data. Dig deeper into the concept of 'solver' to understand what goes on behind the scenes.
+
+## [Post-lecture quiz](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/22/?loc=tr)
+
+## Review & Self Study
+
+Dig a little more into the math behind logistic regression in [this lesson](https://people.eecs.berkeley.edu/~russell/classes/cs194/f11/lectures/CS194%20Fall%202011%20Lecture%2006.pdf)
+## Assignment
+
+[Study the solvers](assignment.tr.md)
diff --git a/4-Classification/2-Classifiers-1/translations/README.zh-cn.md b/4-Classification/2-Classifiers-1/translations/README.zh-cn.md
new file mode 100644
index 00000000..83aa4fc4
--- /dev/null
+++ b/4-Classification/2-Classifiers-1/translations/README.zh-cn.md
@@ -0,0 +1,242 @@
+# Cuisine classifiers 1
+
+In this lesson, you will use the balanced, cleaned cuisine data you saved from the previous lesson.
+
+You will use this dataset with a variety of classifiers to _predict a given national cuisine based on a group of ingredients_. While doing so, you'll learn more about some of the ways algorithms can be leveraged for classification tasks.
+
+## [Pre-lecture quiz](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/21/)
+# Preparation
+
+Assuming you completed [Lesson 1](../../1-Introduction/translations/README.zh-cn.md), make sure that a _cleaned_cuisines.csv_ file exists in the root `/data` folder for these four lessons.
+
+## Exercise - predict a national cuisine
+
+1. Working in this lesson's _notebook.ipynb_ file, import Pandas and read in the data file:
+
+    ```python
+    import pandas as pd
+    cuisines_df = pd.read_csv("../../data/cleaned_cuisines.csv")
+    cuisines_df.head()
+    ```
+
+    The data looks like this:
+
+ ```output
+ | | Unnamed: 0 | cuisine | almond | angelica | anise | anise_seed | apple | apple_brandy | apricot | armagnac | ... | whiskey | white_bread | white_wine | whole_grain_wheat_flour | wine | wood | yam | yeast | yogurt | zucchini |
+ | --- | ---------- | ------- | ------ | -------- | ----- | ---------- | ----- | ------------ | ------- | -------- | --- | ------- | ----------- | ---------- | ----------------------- | ---- | ---- | --- | ----- | ------ | -------- |
+ | 0 | 0 | indian | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+ | 1 | 1 | indian | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+ | 2 | 2 | indian | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+ | 3 | 3 | indian | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+ | 4 | 4 | indian | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
+ ```
+
+1. Now, import several more libraries:
+
+ ```python
+ from sklearn.linear_model import LogisticRegression
+ from sklearn.model_selection import train_test_split, cross_val_score
+ from sklearn.metrics import accuracy_score,precision_score,confusion_matrix,classification_report, precision_recall_curve
+ from sklearn.svm import SVC
+ import numpy as np
+ ```
+
+1. Next, divide the data into an X (features) dataframe and a y (labels) dataframe for training. First, save the `cuisine` column on its own as the labels dataframe:
+
+ ```python
+ cuisines_label_df = cuisines_df['cuisine']
+ cuisines_label_df.head()
+ ```
+
+    The output looks like this:
+
+ ```output
+ 0 indian
+ 1 indian
+ 2 indian
+ 3 indian
+ 4 indian
+ Name: cuisine, dtype: object
+ ```
+
+1. Call `drop()` to remove the `Unnamed: 0` and `cuisine` columns, and save the remaining data as trainable features:
+
+ ```python
+ cuisines_feature_df = cuisines_df.drop(['Unnamed: 0', 'cuisine'], axis=1)
+ cuisines_feature_df.head()
+ ```
+
+    Your features look like this:
+
+ | | almond | angelica | anise | anise_seed | apple | apple_brandy | apricot | armagnac | artemisia | artichoke | ... | whiskey | white_bread | white_wine | whole_grain_wheat_flour | wine | wood | yam | yeast | yogurt | zucchini |
+ | -----: | -------: | ----: | ---------: | ----: | -----------: | ------: | -------: | --------: | --------: | ---: | ------: | ----------: | ---------: | ----------------------: | ---: | ---: | ---: | ----: | -----: | -------: | --- |
+ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+ | 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+ | 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+ | 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+ | 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
+
+Now you are ready to train your model!
+
+## Choosing your classifier
+
+Your data is clean and ready for training; now you have to decide which algorithm to use for the job.
+
+Scikit-learn groups classification under Supervised Learning, and in that category you will find many ways to classify. [The variety](https://scikit-learn.org/stable/supervised_learning.html) is quite bewildering at first sight. The following methods can all be used for classification:
+
+- Linear Models
+- Support Vector Machines
+- Stochastic Gradient Descent
+- Nearest Neighbors
+- Gaussian Processes
+- Decision Trees
+- Ensemble methods (voting classifier)
+- Multiclass and multioutput algorithms (multiclass and multilabel classification, multiclass-multioutput classification)
+
+> You can also use [neural networks to classify data](https://scikit-learn.org/stable/modules/neural_networks_supervised.html#classification), but that is outside the scope of this lesson.
+
+### How to choose a classifier?
+
+So, which classifier should you choose? Often, running through several and comparing their results is one way to test. Scikit-learn offers a [side-by-side comparison](https://scikit-learn.org/stable/auto_examples/classification/plot_classifier_comparison.html) of many algorithms (including KNeighbors, SVC two ways, GaussianProcessClassifier, DecisionTreeClassifier, RandomForestClassifier, MLPClassifier, AdaBoostClassifier, GaussianNB and QuadraticDiscriminantAnalysis), with the results shown visually:
+
+
+> Plots generated from Scikit-learn's documentation
+
+> AutoML solves this problem neatly by running these comparisons in the cloud, helping you choose the best algorithm for your data. Click [here](https://docs.microsoft.com/learn/modules/automate-model-selection-with-azure-automl/?WT.mc_id=academic-15963-cxa) to learn more.
+
+### A better approach
+
+Rather than guessing blindly, you can download this [ML Cheat sheet](https://docs.microsoft.com/azure/machine-learning/algorithm-cheat-sheet?WT.mc_id=academic-15963-cxa), which compares the algorithms and helps us choose more effectively. For the multiclass classification task in this lesson, it offers the following choices:
+
+
+> A section of Microsoft's Algorithm Cheat Sheet, detailing multiclass classification options
+
+✅ Download this cheat sheet, print it out, and hang it on your wall!
+
+### Reasoning
+
+Let's see if we can reason our way through different approaches given the constraints we have:
+
+- **Neural networks are too heavy**. Our data is clean but minimal, and we are running training locally via notebooks; neural networks are too heavyweight for this task.
+- **No two-class classifier**. We cannot use a two-class classifier, so that rules out one-vs-all.
+- **A decision tree or logistic regression could work**. A decision tree should work, and logistic regression can also handle multiclass data.
+- **Multiclass Boosted Decision Trees solve a different problem**. The multiclass boosted decision tree is most suitable for nonparametric tasks, e.g. tasks designed to build rankings, so it is not useful for us here.
+
+### Using Scikit-learn
+
+We will be using Scikit-learn to analyze our data. However, there are many ways to use logistic regression in Scikit-learn. Take a look at the [parameters to pass](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html?highlight=logistic%20regressio#sklearn.linear_model.LogisticRegression).
+
+Essentially there are two important parameters - `multi_class` and `solver` - that we need to specify when we ask Scikit-learn to perform a logistic regression. `multi_class` selects the classification scheme, while `solver` selects the optimization algorithm. Note that not all solvers can be paired with all `multi_class` values.
+
+According to the docs, in the multiclass case, the training algorithm:
+
+- **Uses the one-vs-rest (OvR) scheme**, if the `multi_class` option is set to `ovr`
+- **Uses the cross-entropy loss**, if the `multi_class` option is set to `multinomial`. (Currently the `multinomial` option is supported only by the 'lbfgs', 'sag', 'saga' and 'newton-cg' solvers.)
+
+> 🎓 The 'scheme' here can either be 'ovr' (one-vs-rest) or 'multinomial'. Since logistic regression is really designed to support binary classification, these schemes allow it to better handle multiclass classification tasks. [source](https://machinelearningmastery.com/one-vs-rest-and-one-vs-one-for-multi-class-classification/)
+
+> 🎓 The 'solver' is defined as "the algorithm to use in the optimization problem". [source](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html?highlight=logistic%20regressio#sklearn.linear_model.LogisticRegression).
+
+Scikit-learn offers this table to explain how solvers handle the different challenges presented by different kinds of data structures:
+
+
+
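+Since the solver only changes how the optimization is carried out, compatible solvers should converge to very similar models. The following is a minimal sketch on a synthetic dataset (generated with `make_classification`, not the cuisine data) that illustrates this:
+
+```python
+from sklearn.datasets import make_classification
+from sklearn.linear_model import LogisticRegression
+
+# Toy multiclass data standing in for the cuisine features
+X, y = make_classification(n_samples=500, n_features=20, n_informative=10,
+                           n_classes=5, random_state=0)
+
+# Each solver is a different optimization routine for the same model,
+# so their training accuracies should land close together.
+for solver in ['lbfgs', 'newton-cg', 'sag', 'saga']:
+    clf = LogisticRegression(multi_class='multinomial', solver=solver,
+                             max_iter=5000)
+    clf.fit(X, y)
+    print(solver, round(clf.score(X, y), 3))
+```
+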
+## Exercise - split the data
+
+Since you just learned about logistic regression in the previous lesson, let's use it to walk through training your first model. Start by splitting your data into training and testing groups by calling `train_test_split()`:
+
+```python
+X_train, X_test, y_train, y_test = train_test_split(cuisines_feature_df, cuisines_label_df, test_size=0.3)
+```
+
+## Exercise - apply logistic regression
+
+Next, you need to decide which _scheme_ and which _solver_ to use for this multiclass case. Here we use LogisticRegression with a multiclass setting and the **liblinear** solver to train the model.
+
+1. Create a logistic regression with multi_class set to `ovr` and the solver set to `liblinear`:
+
+ ```python
+ lr = LogisticRegression(multi_class='ovr',solver='liblinear')
+ model = lr.fit(X_train, np.ravel(y_train))
+
+ accuracy = model.score(X_test, y_test)
+ print ("Accuracy is {}".format(accuracy))
+ ```
+
+    ✅ Also try a different solver like `lbfgs`, which is set as the default
+
+    > Note, use Pandas' [`ravel`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.ravel.html) method to flatten your data when needed
+
+    After running it, you can see the accuracy is over **80%**!
+
+1. You can see this model in action by testing one row of data (row #50):
+
+ ```python
+ print(f'ingredients: {X_test.iloc[50][X_test.iloc[50]!=0].keys()}')
+ print(f'cuisine: {y_test.iloc[50]}')
+ ```
+
+    The output is:
+
+ ```output
+ ingredients: Index(['cilantro', 'onion', 'pea', 'potato', 'tomato', 'vegetable_oil'], dtype='object')
+ cuisine: indian
+ ```
+
+    ✅ Try a different row index and check the results
+
+1. Digging a bit deeper, you can check the accuracy of this prediction:
+
+ ```python
+ test= X_test.iloc[50].values.reshape(-1, 1).T
+ proba = model.predict_proba(test)
+ classes = model.classes_
+ resultdf = pd.DataFrame(data=proba, columns=classes)
+
+ topPrediction = resultdf.T.sort_values(by=[0], ascending = [False])
+ topPrediction.head()
+ ```
+
+    The output follows - Indian cuisine is its best guess, with good probability:
+
+ | | 0 |
+ | -------: | -------: |
+ | indian | 0.715851 |
+ | chinese | 0.229475 |
+ | japanese | 0.029763 |
+ | korean | 0.017277 |
+ | thai | 0.007634 |
+
+    ✅ Can you explain why the model is so sure this is an Indian cuisine?
+
+1. As you did in the regression lessons, get more detail by printing a classification report:
+
+ ```python
+ y_pred = model.predict(X_test)
+ print(classification_report(y_test,y_pred))
+ ```
+
+    |              | precision | recall | f1-score | support |
+    | ------------ | --------- | ------ | -------- | ------- |
+    | chinese      | 0.73      | 0.71   | 0.72     | 229     |
+    | indian       | 0.91      | 0.93   | 0.92     | 254     |
+    | japanese     | 0.70      | 0.75   | 0.72     | 220     |
+    | korean       | 0.86      | 0.76   | 0.81     | 242     |
+    | thai         | 0.79      | 0.85   | 0.82     | 254     |
+    | accuracy     |           |        | 0.80     | 1199    |
+    | macro avg    | 0.80      | 0.80   | 0.80     | 1199    |
+    | weighted avg | 0.80      | 0.80   | 0.80     | 1199    |
+
+## 🚀 Challenge
+
+In this lesson, you used your cleaned data to build a machine learning model that can predict a national cuisine based on a series of ingredients. Take some time to read through the many options Scikit-learn provides to classify data, and dig deeper into the concept of 'solver' to understand what goes on behind the scenes.
+
+## [Post-lecture quiz](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/22/)
+## Review & Self Study
+
+[This lesson](https://people.eecs.berkeley.edu/~russell/classes/cs194/f11/lectures/CS194%20Fall%202011%20Lecture%2006.pdf) digs deeper into the math behind logistic regression
+
+## Assignment
+
+[Study the solvers](assignment.md)
diff --git a/4-Classification/2-Classifiers-1/translations/assignment.it.md b/4-Classification/2-Classifiers-1/translations/assignment.it.md
new file mode 100644
index 00000000..80d1c5e1
--- /dev/null
+++ b/4-Classification/2-Classifiers-1/translations/assignment.it.md
@@ -0,0 +1,10 @@
+# Study the solvers
+## Instructions
+
+In this lesson you learned about the various solvers that pair algorithms with a machine learning process to create an accurate model. Walk through the solvers listed in the lesson and pick two. In your own words, compare and contrast these two solvers. What kind of problem do they address? How do they work with various data structures? Why would you pick one over the other?
+
+## Rubric
+
+| Criteria | Exemplary | Adequate | Needs Improvement |
+| -------- | ---------------------------------------------------------------------------------------------- | ------------------------------------------------ | ---------------------------- |
+|          | A .doc file is presented with two paragraphs, one on each solver, comparing them thoughtfully. | A .doc file is presented with only one paragraph | The assignment is incomplete |
diff --git a/4-Classification/2-Classifiers-1/translations/assignment.tr.md b/4-Classification/2-Classifiers-1/translations/assignment.tr.md
new file mode 100644
index 00000000..10d4c64f
--- /dev/null
+++ b/4-Classification/2-Classifiers-1/translations/assignment.tr.md
@@ -0,0 +1,9 @@
+# Study the solvers
+## Instructions
+
+In this lesson you learned about the various solvers that pair algorithms with a machine learning process to create an accurate model. Walk through the solvers listed in the lesson and pick two. In your own words, compare and contrast these two solvers. What kind of problem do they address? How do they work with various data structures? Why would you pick one over the other?
+## Rubric
+
+| Criteria | Exemplary | Adequate | Needs Improvement |
+| -------- | -------------------------------------------------------------------------------------------------------------------- | ------------------------------------------------ | ---------------------------- |
+|          | A .doc file is presented with two paragraphs, one on each solver, comparing them thoughtfully | A .doc file is presented with only one paragraph | The assignment is incomplete |
diff --git a/4-Classification/3-Classifiers-2/README.md b/4-Classification/3-Classifiers-2/README.md
index dd25926e..9720c763 100644
--- a/4-Classification/3-Classifiers-2/README.md
+++ b/4-Classification/3-Classifiers-2/README.md
@@ -6,7 +6,7 @@ In this second classification lesson, you will explore more ways to classify num
### Prerequisite
-We assume that you have completed the previous lessons and have a cleaned dataset in your `data` folder called _cleaned_cuisine.csv_ in the root of this 4-lesson folder.
+We assume that you have completed the previous lessons and have a cleaned dataset in your `data` folder called _cleaned_cuisines.csv_ in the root of this 4-lesson folder.
### Preparation
diff --git a/4-Classification/3-Classifiers-2/notebook.ipynb b/4-Classification/3-Classifiers-2/notebook.ipynb
index f4dec474..4659a7b6 100644
--- a/4-Classification/3-Classifiers-2/notebook.ipynb
+++ b/4-Classification/3-Classifiers-2/notebook.ipynb
@@ -47,7 +47,7 @@
],
"source": [
"import pandas as pd\n",
- "cuisines_df = pd.read_csv(\"../data/cleaned_cuisine.csv\")\n",
+ "cuisines_df = pd.read_csv(\"../data/cleaned_cuisines.csv\")\n",
"cuisines_df.head()"
]
},
diff --git a/4-Classification/3-Classifiers-2/solution/notebook.ipynb b/4-Classification/3-Classifiers-2/solution/notebook.ipynb
index d953c603..a089b21f 100644
--- a/4-Classification/3-Classifiers-2/solution/notebook.ipynb
+++ b/4-Classification/3-Classifiers-2/solution/notebook.ipynb
@@ -47,7 +47,7 @@
],
"source": [
"import pandas as pd\n",
- "cuisines_df = pd.read_csv(\"../../data/cleaned_cuisine.csv\")\n",
+ "cuisines_df = pd.read_csv(\"../../data/cleaned_cuisines.csv\")\n",
"cuisines_df.head()"
]
},
diff --git a/4-Classification/3-Classifiers-2/translations/README.it.md b/4-Classification/3-Classifiers-2/translations/README.it.md
new file mode 100644
index 00000000..4a3a431f
--- /dev/null
+++ b/4-Classification/3-Classifiers-2/translations/README.it.md
@@ -0,0 +1,235 @@
+# Cuisine classifiers 2
+
+In this second classification lesson, you will explore more ways to classify numeric data. You will also learn about the ramifications of choosing one classifier over another.
+
+## [Pre-lecture quiz](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/23/)
+
+### Prerequisite
+
+We assume that you have completed the previous lessons and have a cleaned dataset in your `data` folder called _cleaned_cuisines.csv_ in the root of this 4-lesson folder.
+
+### Preparation
+
+We have loaded your _notebook.ipynb_ file with the cleaned dataset and divided it into X and y dataframes, ready for the model building process.
+
+## A classification map
+
+Previously, you learned about the various options you have when classifying data using Microsoft's cheat sheet. Scikit-learn offers a similar, but more granular cheat sheet that can further help narrow down your estimators (another term for classifiers):
+
+
+> Tip: [visit this map online](https://scikit-learn.org/stable/tutorial/machine_learning_map/) and click along the path to read the documentation.
+
+### The plan
+
+This map is very helpful once you have a clear grasp of your data, as you can 'walk' along its paths to a decision:
+
+- We have >50 samples
+- We want to predict a category
+- We have labeled data
+- We have fewer than 100K samples
+- ✨ We can choose a Linear SVC
+- If that doesn't work, since we have numeric data
+  - We can try a ✨ KNeighbors Classifier
+  - If that doesn't work, try ✨ SVC and ✨ Ensemble Classifiers
+
+This is a very helpful trail to follow.
+
+## Exercise - split the data
+
+Following this path, we should start by importing some libraries to use.
+
+1. Import the needed libraries:
+
+ ```python
+ from sklearn.neighbors import KNeighborsClassifier
+ from sklearn.linear_model import LogisticRegression
+ from sklearn.svm import SVC
+ from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
+ from sklearn.model_selection import train_test_split, cross_val_score
+ from sklearn.metrics import accuracy_score,precision_score,confusion_matrix,classification_report, precision_recall_curve
+ import numpy as np
+ ```
+
+1. Split your training and test data:
+
+ ```python
+ X_train, X_test, y_train, y_test = train_test_split(cuisines_feature_df, cuisines_label_df, test_size=0.3)
+ ```
+
+## Linear SVC classifier
+
+Support-Vector clustering (SVC) is a child of the Support-Vector Machines family of ML techniques (learn more about these below). In this method, you can choose a 'kernel' to decide how to cluster the labels. The 'C' parameter refers to 'regularization', which regulates the influence of parameters. The kernel can be one of [several](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html#sklearn.svm.SVC); here we set it to 'linear' to ensure that we leverage linear SVC. Probability defaults to 'false'; here we set it to 'true' to gather probability estimates. We set the random state to '0' to shuffle the data to get probabilities.
+
+### Exercise - apply a linear SVC
+
+Start by creating an array of classifiers. You will add to this array progressively as we test.
+
+1. Start with a Linear SVC:
+
+ ```python
+ C = 10
+ # Create different classifiers.
+ classifiers = {
+ 'Linear SVC': SVC(kernel='linear', C=C, probability=True,random_state=0)
+ }
+ ```
+
+2. Train your model using the Linear SVC and print out a report:
+
+ ```python
+ n_classifiers = len(classifiers)
+
+ for index, (name, classifier) in enumerate(classifiers.items()):
+ classifier.fit(X_train, np.ravel(y_train))
+
+ y_pred = classifier.predict(X_test)
+ accuracy = accuracy_score(y_test, y_pred)
+ print("Accuracy (train) for %s: %0.1f%% " % (name, accuracy * 100))
+ print(classification_report(y_test,y_pred))
+ ```
+
+   The result is pretty good:
+
+ ```output
+ Accuracy (train) for Linear SVC: 78.6%
+ precision recall f1-score support
+
+ chinese 0.71 0.67 0.69 242
+ indian 0.88 0.86 0.87 234
+ japanese 0.79 0.74 0.76 254
+ korean 0.85 0.81 0.83 242
+ thai 0.71 0.86 0.78 227
+
+ accuracy 0.79 1199
+ macro avg 0.79 0.79 0.79 1199
+ weighted avg 0.79 0.79 0.79 1199
+ ```
+
+## K-Neighbors classifier
+
+K-Neighbors is part of the 'neighbors' family of ML methods, which can be used for both supervised and unsupervised learning. In this method, a predefined number of points is created and data are gathered around these points such that generalized labels can be predicted for the data.
+
+### Exercise - apply the K-Neighbors classifier
+
+The previous classifier was good, and worked well with the data, but maybe we can get better accuracy. Try a K-Neighbors classifier.
+
+1. Add a line to your classifier array (add a comma after the Linear SVC item):
+
+ ```python
+ 'KNN classifier': KNeighborsClassifier(C),
+ ```
+
+   The result is a little worse:
+
+ ```output
+ Accuracy (train) for KNN classifier: 73.8%
+ precision recall f1-score support
+
+ chinese 0.64 0.67 0.66 242
+ indian 0.86 0.78 0.82 234
+ japanese 0.66 0.83 0.74 254
+ korean 0.94 0.58 0.72 242
+ thai 0.71 0.82 0.76 227
+
+ accuracy 0.74 1199
+ macro avg 0.76 0.74 0.74 1199
+ weighted avg 0.76 0.74 0.74 1199
+ ```
+
+   ✅ Learn about [K-Neighbors](https://scikit-learn.org/stable/modules/neighbors.html#neighbors)
+
+## Support Vector Classifier
+
+Support-Vector classifiers are part of the [Support-Vector Machine](https://it.wikipedia.org/wiki/Macchine_a_vettori_di_supporto) family of ML methods that are used for classification and regression tasks. SVMs "map training examples to points in space" to maximize the distance between two categories. Subsequent data is mapped into this space so their category can be predicted.
+
+### Exercise - apply a Support Vector Classifier
+
+Let's try for a little better accuracy with a Support Vector Classifier.
+
+1. Add a comma after the K-Neighbors item, and then add this line:
+
+ ```python
+ 'SVC': SVC(),
+ ```
+
+   The result is quite good!
+
+ ```output
+ Accuracy (train) for SVC: 83.2%
+ precision recall f1-score support
+
+ chinese 0.79 0.74 0.76 242
+ indian 0.88 0.90 0.89 234
+ japanese 0.87 0.81 0.84 254
+ korean 0.91 0.82 0.86 242
+ thai 0.74 0.90 0.81 227
+
+ accuracy 0.83 1199
+ macro avg 0.84 0.83 0.83 1199
+ weighted avg 0.84 0.83 0.83 1199
+ ```
+
+   ✅ Learn about [Support-Vectors](https://scikit-learn.org/stable/modules/svm.html#svm)
+
+## Ensemble Classifiers
+
+Let's follow the path to the very end, even though the previous test was quite good. Let's try some Ensemble Classifiers, specifically Random Forest and AdaBoost:
+
+```python
+'RFST': RandomForestClassifier(n_estimators=100),
+ 'ADA': AdaBoostClassifier(n_estimators=100)
+```
+
+The result is very good, especially for Random Forest:
+
+```output
+Accuracy (train) for RFST: 84.5%
+ precision recall f1-score support
+
+ chinese 0.80 0.77 0.78 242
+ indian 0.89 0.92 0.90 234
+ japanese 0.86 0.84 0.85 254
+ korean 0.88 0.83 0.85 242
+ thai 0.80 0.87 0.83 227
+
+ accuracy 0.84 1199
+ macro avg 0.85 0.85 0.84 1199
+weighted avg 0.85 0.84 0.84 1199
+
+Accuracy (train) for ADA: 72.4%
+ precision recall f1-score support
+
+ chinese 0.64 0.49 0.56 242
+ indian 0.91 0.83 0.87 234
+ japanese 0.68 0.69 0.69 254
+ korean 0.73 0.79 0.76 242
+ thai 0.67 0.83 0.74 227
+
+ accuracy 0.72 1199
+ macro avg 0.73 0.73 0.72 1199
+weighted avg 0.73 0.72 0.72 1199
+```
+
+✅ Learn about [Ensemble Classifiers](https://scikit-learn.org/stable/modules/ensemble.html)
+
+This method of Machine Learning "combines the predictions of several base estimators" to improve the model's quality. In our example, we used Random Forests and AdaBoost.
+
+- [Random Forest](https://scikit-learn.org/stable/modules/ensemble.html#forest), an averaging method, builds a 'forest' of 'decision trees' infused with randomness to avoid overfitting. The n_estimators parameter is set to the number of trees; a sketch of its effect follows this list.
+
+- [AdaBoost](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.AdaBoostClassifier.html) fits a classifier to a dataset and then fits copies of that classifier to the same dataset. It focuses on the weights of incorrectly classified items and adjusts the fit for the next classifier to correct them.
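+
+To get a feel for what n_estimators does, here is a minimal hedged sketch that cross-validates a Random Forest at a few forest sizes; it uses scikit-learn's small built-in iris dataset as a stand-in rather than the cuisine data:
+
+```python
+from sklearn.datasets import load_iris
+from sklearn.ensemble import RandomForestClassifier
+from sklearn.model_selection import cross_val_score
+
+X, y = load_iris(return_X_y=True)  # small stand-in dataset
+
+# More trees usually stabilize the forest's predictions, at the cost of
+# training time; the gains tend to flatten out after a point.
+for n in [10, 50, 100, 200]:
+    forest = RandomForestClassifier(n_estimators=n, random_state=0)
+    scores = cross_val_score(forest, X, y, cv=5)
+    print(n, round(scores.mean(), 3))
+```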
+
+---
+
+## 🚀 Challenge
+
+Each of these techniques has a large number of parameters that you can tweak. Research each one's default parameters and think about what tweaking these parameters would mean for the model's quality.
+
+## [Post-lecture quiz](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/24/)
+
+## Review & Self Study
+
+There's a lot of jargon in these lessons, so take a minute to review [this list](https://docs.microsoft.com/dotnet/machine-learning/resources/glossary?WT.mc_id=academic-15963-cxa) of useful terminology!
+
+## Assignment
+
+[Parameter play](assignment.it.md)
diff --git a/4-Classification/3-Classifiers-2/translations/README.tr.md b/4-Classification/3-Classifiers-2/translations/README.tr.md
new file mode 100644
index 00000000..24ea3dfd
--- /dev/null
+++ b/4-Classification/3-Classifiers-2/translations/README.tr.md
@@ -0,0 +1,235 @@
+# Cuisine classifiers 2
+
+In this second classification lesson, you will explore more ways to classify numeric data. You will also learn about the ramifications of choosing one classifier over another.
+
+## [Pre-lecture quiz](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/23/?loc=tr)
+
+### Prerequisite
+
+We assume that you have completed the previous lessons and have a cleaned dataset called _cleaned_cuisines.csv_ in your `data` folder at the root of this 4-lesson folder.
+
+### Preparation
+
+We have loaded your _notebook.ipynb_ file with the cleaned dataset and divided it into X and y dataframes, ready for the model building process.
+
+## A classification map
+
+Previously, you learned about the various options you have when classifying data using Microsoft's cheat sheet. Scikit-learn offers a similar, but more granular cheat sheet that can further help narrow down your estimators (classifiers):
+
+
+> Tip: [visit this map online](https://scikit-learn.org/stable/tutorial/machine_learning_map/) and click along the path to read the documentation.
+
+### The plan
+
+This map is very helpful once you have a clear grasp of your data, as you can 'walk' along its paths to a decision:
+
+- We have >50 samples
+- We want to predict a category
+- We have labeled data
+- We have fewer than 100K samples
+- :sparkles: We can choose a Linear SVC
+- If that doesn't work, since we have numeric data
+  - We can try a :sparkles: KNeighbors Classifier
+  - If that doesn't work, try :sparkles: SVC and :sparkles: Ensemble Classifiers
+
+This is a very helpful trail to follow.
+
+## Exercise - split the data
+
+Following this path, we should start by importing some libraries to use.
+
+1. Import the needed libraries:
+
+ ```python
+ from sklearn.neighbors import KNeighborsClassifier
+ from sklearn.linear_model import LogisticRegression
+ from sklearn.svm import SVC
+ from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
+ from sklearn.model_selection import train_test_split, cross_val_score
+ from sklearn.metrics import accuracy_score,precision_score,confusion_matrix,classification_report, precision_recall_curve
+ import numpy as np
+ ```
+
+1. Split your training and test data:
+
+ ```python
+ X_train, X_test, y_train, y_test = train_test_split(cuisines_feature_df, cuisines_label_df, test_size=0.3)
+ ```
+
+## Linear SVC classifier
+
+Support-Vector clustering (SVC) is a child of the Support-Vector Machines family of ML techniques (learn more about these below). In this method, you can choose a 'kernel' to decide how to cluster the labels. The 'C' parameter refers to 'regularization', which regulates the influence of parameters. The kernel can be one of [several](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html#sklearn.svm.SVC); here we set it to 'linear' to ensure that we leverage linear SVC. Probability defaults to 'false'; here we set it to 'true' to gather probability estimates. We set the random state to '0' to shuffle the data to get probabilities.
+
+### Exercise - apply a linear SVC
+
+Start by creating an array of classifiers. You will add to this array progressively as we test.
+
+1. Start with a Linear SVC:
+
+ ```python
+ C = 10
+ # Create different classifiers.
+ classifiers = {
+ 'Linear SVC': SVC(kernel='linear', C=C, probability=True,random_state=0)
+ }
+ ```
+
+2. Train your model using the Linear SVC and print out a report:
+
+ ```python
+ n_classifiers = len(classifiers)
+
+ for index, (name, classifier) in enumerate(classifiers.items()):
+ classifier.fit(X_train, np.ravel(y_train))
+
+ y_pred = classifier.predict(X_test)
+ accuracy = accuracy_score(y_test, y_pred)
+ print("Accuracy (train) for %s: %0.1f%% " % (name, accuracy * 100))
+ print(classification_report(y_test,y_pred))
+ ```
+
+    The result is pretty good:
+
+ ```output
+ Accuracy (train) for Linear SVC: 78.6%
+ precision recall f1-score support
+
+ chinese 0.71 0.67 0.69 242
+ indian 0.88 0.86 0.87 234
+ japanese 0.79 0.74 0.76 254
+ korean 0.85 0.81 0.83 242
+ thai 0.71 0.86 0.78 227
+
+ accuracy 0.79 1199
+ macro avg 0.79 0.79 0.79 1199
+ weighted avg 0.79 0.79 0.79 1199
+ ```
+
+## K-Neighbors classifier
+
+K-Neighbors is part of the "neighbors" family of ML methods, which can be used for both supervised and unsupervised learning. In this method, a predefined number of points is created and data are gathered around these points such that generalized labels can be predicted for the data.
+
+### Exercise - apply the K-Neighbors classifier
+
+The previous classifier was good, and worked well with the data, but maybe we can get better accuracy. Try a K-Neighbors classifier.
+
+1. Add a line to your classifier array (add a comma after the Linear SVC item):
+
+ ```python
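+    # note: C (=10 from above) lands in the first positional slot, so it sets n_neighbors=10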
+ 'KNN classifier': KNeighborsClassifier(C),
+ ```
+
+    The result is a little worse:
+
+ ```output
+ Accuracy (train) for KNN classifier: 73.8%
+ precision recall f1-score support
+
+ chinese 0.64 0.67 0.66 242
+ indian 0.86 0.78 0.82 234
+ japanese 0.66 0.83 0.74 254
+ korean 0.94 0.58 0.72 242
+ thai 0.71 0.82 0.76 227
+
+ accuracy 0.74 1199
+ macro avg 0.76 0.74 0.74 1199
+ weighted avg 0.76 0.74 0.74 1199
+ ```
+
+    :white_check_mark: Learn about [K-Neighbors](https://scikit-learn.org/stable/modules/neighbors.html#neighbors)
+
+## Support Vector Classifier
+
+Support-Vector classifiers are part of the [Support-Vector Machine](https://wikipedia.org/wiki/Support-vector_machine) family of ML methods that are used for classification and regression tasks. SVMs "map training examples to points in space" to maximize the distance between two categories. Subsequent data is mapped into this space so its category can be predicted.
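+
+The plain `SVC()` used in the next exercise swaps the 'linear' kernel for scikit-learn's default 'rbf' kernel. As a minimal sketch of why the kernel choice matters, here is a comparison on scikit-learn's toy moons data (not this lesson's dataset):
+
+```python
+from sklearn.datasets import make_moons
+from sklearn.model_selection import cross_val_score
+from sklearn.svm import SVC
+
+# two interleaved half-circles: no straight line separates them well
+X, y = make_moons(noise=0.2, random_state=0)
+for kernel in ('linear', 'rbf'):
+    score = cross_val_score(SVC(kernel=kernel), X, y, cv=5).mean()
+    print(kernel, round(score, 2))  # the curved rbf boundary should score higher
+```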
+
+### Exercise - apply a Support Vector Classifier
+
+Let's try for a little better accuracy with a Support Vector Classifier.
+
+1. Add a comma after the K-Neighbors item, and then add this line:
+
+ ```python
+ 'SVC': SVC(),
+ ```
+
+    The result is quite good!
+
+ ```output
+ Accuracy (train) for SVC: 83.2%
+ precision recall f1-score support
+
+ chinese 0.79 0.74 0.76 242
+ indian 0.88 0.90 0.89 234
+ japanese 0.87 0.81 0.84 254
+ korean 0.91 0.82 0.86 242
+ thai 0.74 0.90 0.81 227
+
+ accuracy 0.83 1199
+ macro avg 0.84 0.83 0.83 1199
+ weighted avg 0.84 0.83 0.83 1199
+ ```
+
+    :white_check_mark: Learn about [Support-Vectors](https://scikit-learn.org/stable/modules/svm.html#svm)
+
+## Ensemble Classifiers
+
+Let's follow the path to the very end, even though the previous test was quite good. Let's try some Ensemble Classifiers, specifically Random Forest and AdaBoost:
+
+```python
+'RFST': RandomForestClassifier(n_estimators=100),
+'ADA': AdaBoostClassifier(n_estimators=100)
+```
+
+The result is very good, especially for Random Forest:
+
+```output
+Accuracy (train) for RFST: 84.5%
+ precision recall f1-score support
+
+ chinese 0.80 0.77 0.78 242
+ indian 0.89 0.92 0.90 234
+ japanese 0.86 0.84 0.85 254
+ korean 0.88 0.83 0.85 242
+ thai 0.80 0.87 0.83 227
+
+ accuracy 0.84 1199
+ macro avg 0.85 0.85 0.84 1199
+weighted avg 0.85 0.84 0.84 1199
+
+Accuracy (train) for ADA: 72.4%
+ precision recall f1-score support
+
+ chinese 0.64 0.49 0.56 242
+ indian 0.91 0.83 0.87 234
+ japanese 0.68 0.69 0.69 254
+ korean 0.73 0.79 0.76 242
+ thai 0.67 0.83 0.74 227
+
+ accuracy 0.72 1199
+ macro avg 0.73 0.73 0.72 1199
+weighted avg 0.73 0.72 0.72 1199
+```
+
+:white_check_mark: Learn about [Ensemble Classifiers](https://scikit-learn.org/stable/modules/ensemble.html)
+
+This method of Machine Learning "combines the predictions of several base estimators" to improve the model's quality. In our example, we used Random Forest and AdaBoost.
+
+- [Random Forest](https://scikit-learn.org/stable/modules/ensemble.html#forest), an averaging method, builds a 'forest' of 'decision trees' infused with randomness to avoid overfitting. The n_estimators parameter is set to the number of trees.
+
+- [AdaBoost](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.AdaBoostClassifier.html) fits a classifier to a dataset and then fits copies of that classifier to the same dataset. It focuses on the weights of incorrectly classified items and adjusts the fit for the next classifier to correct them.
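+
+To see the averaging idea at work, here is a minimal sketch (on scikit-learn's toy wine data, not this lesson's dataset) of how accuracy tends to change as the forest grows:
+
+```python
+from sklearn.datasets import load_wine
+from sklearn.ensemble import RandomForestClassifier
+from sklearn.model_selection import cross_val_score
+
+# cross-validate forests of increasing size
+X, y = load_wine(return_X_y=True)
+for n in (1, 10, 100):
+    forest = RandomForestClassifier(n_estimators=n, random_state=0)
+    print(n, 'trees:', round(cross_val_score(forest, X, y, cv=5).mean(), 2))
+```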
+
+---
+
+## :rocket: Challenge
+
+Each of these techniques has a large number of parameters that you can tweak. Research each one's default parameters, and think about what tweaking these parameters would mean for the model's quality.
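+
+A quick way to start is to print each classifier's defaults with `get_params()`; a small sketch:
+
+```python
+from sklearn.neighbors import KNeighborsClassifier
+from sklearn.svm import SVC
+
+# list every tunable parameter and its default value
+for clf in (SVC(), KNeighborsClassifier()):
+    print(type(clf).__name__, clf.get_params())
+```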
+
+## [Post-lecture quiz](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/24/?loc=tr)
+
+## Review & Self Study
+
+There's a lot of jargon in these lessons, so take a minute to review [this list](https://docs.microsoft.com/dotnet/machine-learning/resources/glossary?WT.mc_id=academic-15963-cxa) of useful terminology!
+
+## Assignment
+
+[Parameter play](assignment.tr.md)
\ No newline at end of file
diff --git a/4-Classification/3-Classifiers-2/translations/assignment.it.md b/4-Classification/3-Classifiers-2/translations/assignment.it.md
new file mode 100644
index 00000000..472cdb11
--- /dev/null
+++ b/4-Classification/3-Classifiers-2/translations/assignment.it.md
@@ -0,0 +1,11 @@
+# Parameter play
+
+## Instructions
+
+There are many parameters that are set by default when working with these classifiers. Intellisense in VS Code can help you dig into them. Adopt one of the ML classification techniques in this lesson and retrain models, tweaking various parameter values. Build a notebook explaining why some changes help the model's quality while others degrade it. Be detailed in your answer.
+
+## Rubric
+
+| Criteria | Exemplary | Adequate | Needs Improvement |
+| -------- | ------------------------------------------------------------------------------------------------------------------------ | ----------------------------------------------------- | ----------------------------- |
+|          | A notebook is presented with a classifier fully built up, its parameters tweaked, and the changes explained in textboxes | A notebook is partially presented or poorly explained | A notebook is buggy or flawed |
diff --git a/4-Classification/3-Classifiers-2/translations/assignment.tr.md b/4-Classification/3-Classifiers-2/translations/assignment.tr.md
new file mode 100644
index 00000000..fbc74092
--- /dev/null
+++ b/4-Classification/3-Classifiers-2/translations/assignment.tr.md
@@ -0,0 +1,11 @@
+# Parameter Play
+
+## Instructions
+
+There are many parameters that are set by default when working with these classifiers. Intellisense in VS Code can help you dig into them. Pick one of the ML classification techniques in this lesson and retrain models, tweaking various parameter values. Build a notebook explaining why some changes help the model's quality while others degrade it. Be detailed in your answer.
+
+## Rubric
+
+| Criteria | Exemplary | Adequate | Needs Improvement |
+| -------- | ------------------------------------------------------------------------------------------------------------------ | ------------------------------------------------ | ------------------------------- |
+|          | A notebook is presented with a classifier fully built up, its parameters tweaked, and the changes explained in textboxes | A notebook is partially presented or poorly explained | A notebook is buggy or flawed |
\ No newline at end of file
diff --git a/4-Classification/4-Applied/README.md b/4-Classification/4-Applied/README.md
index 773271a1..66597fbf 100644
--- a/4-Classification/4-Applied/README.md
+++ b/4-Classification/4-Applied/README.md
@@ -40,7 +40,7 @@ First, train a classification model using the cleaned cuisines dataset we used.
1. Then, work with your data in the same way you did in previous lessons, by reading a CSV file using `read_csv()`:
```python
- data = pd.read_csv('../data/cleaned_cuisine.csv')
+ data = pd.read_csv('../data/cleaned_cuisines.csv')
data.head()
```
@@ -312,7 +312,7 @@ In this code, there are several things happening:
## Test your application
-Open a terminal session in Visual Studio Code in the folder where your index.html file resides. Ensure that you have `[http-server](https://www.npmjs.com/package/http-server)` installed globally, and type `http-server` at the prompt. A localhost should open and you can view your web app. Check what cuisine is recommended based on various ingredients:
+Open a terminal session in Visual Studio Code in the folder where your index.html file resides. Ensure that you have [http-server](https://www.npmjs.com/package/http-server) installed globally, and type `http-server` at the prompt. A localhost should open and you can view your web app. Check what cuisine is recommended based on various ingredients:

diff --git a/4-Classification/4-Applied/solution/notebook.ipynb b/4-Classification/4-Applied/solution/notebook.ipynb
index 5ed9da52..b388d2ca 100644
--- a/4-Classification/4-Applied/solution/notebook.ipynb
+++ b/4-Classification/4-Applied/solution/notebook.ipynb
@@ -115,7 +115,7 @@
}
],
"source": [
- "data = pd.read_csv('../../data/cleaned_cuisine.csv')\n",
+ "data = pd.read_csv('../../data/cleaned_cuisines.csv')\n",
"data.head()"
]
},
diff --git a/4-Classification/4-Applied/translations/README.it.md b/4-Classification/4-Applied/translations/README.it.md
new file mode 100644
index 00000000..4512aaba
--- /dev/null
+++ b/4-Classification/4-Applied/translations/README.it.md
@@ -0,0 +1,336 @@
+# Build a Cuisine Recommender Web App
+
+In this lesson, you will build a classification model using some of the techniques you have learned in previous lessons and the delicious cuisine dataset used throughout this series. In addition, you will build a small web app to use a saved model, leveraging Onnx's web runtime.
+
+One of the most useful practical uses of machine learning is building recommendation systems, and you can take the first step in that direction today!
+
+[](https://youtu.be/giIXNoiqO_U "Recommendation Systems Introduction")
+
+> 🎥 Click the image above for a video: Andrew Ng introduces recommendation system design
+
+## [Pre-lecture quiz](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/25/)
+
+In this lesson you will learn:
+
+- How to build a model and save it as an Onnx model
+- How to use Netron to inspect the model
+- How to use your model in a web app for inference
+
+## Build your model
+
+Building applied ML systems is an important part of leveraging these technologies for your business systems. You can use models within your web applications (and thus use them in an offline context if needed) by using Onnx.
+
+In a [previous lesson](../../../3-Web-App/1-Web-App/translations/README.it.md), you built a regression model about UFO sightings, serialized it, and used it in a Flask app. While this architecture is very useful to know, it is a full-stack Python app, and your requirements may include the use of a JavaScript application.
+
+In this lesson, you can build a basic JavaScript-based system for inference. First, however, you need to train a model and convert it for use with Onnx.
+
+## Exercise - train a classification model
+
+First, train a classification model using the cleaned cuisines dataset we used.
+
+1. Start by importing some useful libraries:
+
+ ```python
+ !pip install skl2onnx
+ import pandas as pd
+ ```
+
+    You need '[skl2onnx](https://onnx.ai/sklearn-onnx/)' to help convert your Scikit-learn model to Onnx format.
+
+1. Then, work with your data in the same way you did in previous lessons, by reading a CSV file using `read_csv()`:
+
+ ```python
+    data = pd.read_csv('../data/cleaned_cuisines.csv')
+ data.head()
+ ```
+
+1. Remove the first two unnecessary columns and save the remaining data as 'X':
+
+ ```python
+ X = data.iloc[:,2:]
+ X.head()
+ ```
+
+1. Save the labels as 'y':
+
+ ```python
+ y = data[['cuisine']]
+ y.head()
+
+ ```
+
+### Commence the training routine
+
+We will use the 'SVC' library, which has good accuracy.
+
+1. Import the appropriate libraries from Scikit-learn:
+
+ ```python
+ from sklearn.model_selection import train_test_split
+ from sklearn.svm import SVC
+ from sklearn.model_selection import cross_val_score
+ from sklearn.metrics import accuracy_score,precision_score,confusion_matrix,classification_report
+ ```
+
+1. Separate the training and test sets:
+
+ ```python
+ X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3)
+ ```
+
+1. Build an SVC classification model as you did in the previous lesson:
+
+ ```python
+ model = SVC(kernel='linear', C=10, probability=True,random_state=0)
+ model.fit(X_train,y_train.values.ravel())
+ ```
+
+1. Now, test your model by calling `predict()`:
+
+ ```python
+ y_pred = model.predict(X_test)
+ ```
+
+1. Print out a classification report to check the model's quality:
+
+ ```python
+ print(classification_report(y_test,y_pred))
+ ```
+
+    As we saw before, the accuracy is good:
+
+ ```output
+ precision recall f1-score support
+
+ chinese 0.72 0.69 0.70 257
+ indian 0.91 0.87 0.89 243
+ japanese 0.79 0.77 0.78 239
+ korean 0.83 0.79 0.81 236
+ thai 0.72 0.84 0.78 224
+
+ accuracy 0.79 1199
+ macro avg 0.79 0.79 0.79 1199
+ weighted avg 0.79 0.79 0.79 1199
+ ```
+
+### Convert your model to Onnx
+
+Make sure to do the conversion with the proper Tensor number. This dataset has 380 ingredients listed, so you need to note that number in `FloatTensorType`:
+
+1. Convert using a tensor number of 380.
+
+ ```python
+ from skl2onnx import convert_sklearn
+ from skl2onnx.common.data_types import FloatTensorType
+
+ initial_type = [('float_input', FloatTensorType([None, 380]))]
+ options = {id(model): {'nocl': True, 'zipmap': False}}
+ ```
+
+1. Create the onx and store it as a file **model.onnx**:
+
+ ```python
+ onx = convert_sklearn(model, initial_types=initial_type, options=options)
+ with open("./model.onnx", "wb") as f:
+ f.write(onx.SerializeToString())
+ ```
+
+    > Note, you can pass in [options](https://onnx.ai/sklearn-onnx/parameterized.html) in your conversion script. In this case, we passed in 'nocl' set to True and 'zipmap' set to False. Since this is a classification model, you have the option to remove ZipMap, which produces a list of dictionaries (not necessary). `nocl` refers to class information being included in the model. Reduce your model's size by setting `nocl` to 'True'.
+
+Running the entire notebook will now create an Onnx model and save it to this folder.
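+
+Before wiring the model into a web app, you can also sanity-check the converted graph in code (a small sketch, assuming the `onnx` package is installed):
+
+```python
+import onnx
+
+# confirm the input/output names the web app will need to reference
+model_onnx = onnx.load("./model.onnx")
+print([i.name for i in model_onnx.graph.input])   # expect ['float_input']
+print([o.name for o in model_onnx.graph.output])
+```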
+
+## View your model
+
+Onnx models are not very visible in Visual Studio Code, but there's a very good free software that many researchers use to visualize the model, to ensure that it is properly built. Download [Netron](https://github.com/lutzroeder/Netron) and open your model.onnx file. You can see your simple model visualized, with its 380 inputs and classifier listed:
+
+
+
+Netron is a helpful tool to view your models.
+
+Now you are ready to use this neat model in a web app. Let's build an app that will come in handy when you look in your refrigerator and try to figure out which combination of your leftover ingredients you can use to cook a given cuisine, as determined by your model.
+
+## Build a recommender web application
+
+You can use your model directly in a web app. This architecture also allows you to run it locally and even offline if needed. Start by creating an `index.html` file in the same folder where you stored your `model.onnx` file.
+
+1. In this file _index.html_, add the following markup:
+
+    ```html
+    <!-- the HTML tags were stripped from this diff; this is the expected minimal skeleton -->
+    <!DOCTYPE html>
+    <html>
+        <head>
+            <title>Cuisine Matcher</title>
+        </head>
+        <body>
+            ...
+        </body>
+    </html>
+    ```
+
+1. Now, working within the `body` tag, add a little markup to show a list of checkboxes reflecting some ingredients:
+
+    ```html
+    <!-- the checkbox markup was stripped from this diff; each ingredient follows the
+         pattern sketched here, with value set to the ingredient's column index -->
+    <h1>Check your refrigerator. What can you create?</h1>
+    <div id="wrapper">
+        <div class="boxCont">
+            <input type="checkbox" value="4" class="checkbox">
+            <label>apple</label>
+        </div>
+        <!-- ...one block like the above per ingredient... -->
+    </div>
+    <div style="padding-top:10px">
+        <button onClick="startInference()">What kind of cuisine can you make?</button>
+    </div>
+    ```
+
+    Notice that each checkbox is given a value. This reflects the index where the ingredient is found in the dataset. Apple, for example, in this alphabetic list, occupies the fifth column, so its value is '4' since we start counting at 0. You can consult the [ingredients spreadsheet](../../data/ingredient_indexes.csv) to discover a given ingredient's index.
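+
+    If you'd rather compute an index than look one up, a small sketch (assuming the cleaned CSV used earlier in this lesson) does the job:
+
+    ```python
+    import pandas as pd
+
+    # drop the first two columns, exactly like the X frame used for training
+    data = pd.read_csv('../data/cleaned_cuisines.csv')
+    X = data.iloc[:, 2:]
+    print(X.columns.get_loc('apple'))  # -> 4, the value for the apple checkbox
+    ```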
+
+    Continuing your work in the index.html file, add a script block where the model will be called after the final closing `</div>`.
+
+1. First, import the [Onnx Runtime](https://www.onnxruntime.ai/):
+
+    ```html
+    <!-- the script tag was stripped from this diff; a typical CDN import of the Onnx Runtime web bundle -->
+    <script src="https://cdn.jsdelivr.net/npm/onnxruntime-web/dist/ort.min.js"></script>
+    ```
+
+    > Onnx Runtime is used to enable running your Onnx models across a wide range of hardware platforms, including optimizations and an API to use.
+
+1. Once the Runtime is in place, you can call it:
+
+    ```javascript
+    // the original script was stripped from this diff; this sketch is reconstructed
+    // from the numbered description below and assumes the `ort` global loaded above
+    const ingredients = Array(380).fill(0);
+    const checks = [...document.querySelectorAll('.checkbox')];
+
+    function init() {
+        // called at startup: mirror each checkbox into the ingredients array
+        checks.forEach(check =>
+            check.addEventListener('change', () => {
+                ingredients[check.value] = check.checked ? 1 : 0;
+            }));
+    }
+
+    function testCheckboxes() {
+        // is at least one checkbox checked?
+        return checks.some(check => check.checked);
+    }
+
+    async function startInference() {
+        if (!testCheckboxes()) return alert('Please check at least one ingredient.');
+        // set up an async load of the model
+        const session = await ort.InferenceSession.create('./model.onnx');
+        // create a tensor structure matching the 1 x 380 input used in training
+        const input = new ort.Tensor('float32', new Float32Array(ingredients), [1, 380]);
+        // the feed name must be the 'float_input' created when training the model
+        const results = await session.run({ float_input: input });
+        // output names vary by converted model; verify them in Netron
+        alert('You can enjoy ' + results.label.data[0] + ' cuisine today!');
+    }
+
+    init();
+    ```
+
+In this code, there are several things happening:
+
+1. You created an array of 380 possible values (1 or 0) to be set and sent to the model for inference, depending on whether an ingredient checkbox is checked.
+2. You created an array of checkboxes and a way to determine whether they were checked in an `init` function that is called when the application starts. When a checkbox is checked, the `ingredients` array is altered to reflect the chosen ingredient.
+3. You created a `testCheckboxes` function that checks whether any checkbox was checked.
+4. You use that function when the button is pressed and, if any checkbox is checked, you start inference.
+5. The inference routine includes:
+    1. Setting up an asynchronous load of the model
+    2. Creating a Tensor structure to send to the model
+    3. Creating 'feeds' that reflect the `float_input` input that you created when training your model (you can use Netron to verify that name)
+    4. Sending these 'feeds' to the model and waiting for a response
+
+## Test your application
+
+Open a terminal session in Visual Studio Code in the folder where your index.html file resides. Ensure that you have [http-server](https://www.npmjs.com/package/http-server) installed globally, and type `http-server` at the prompt. A localhost should open in your browser and you can view your web app. Check what cuisine is recommended based on various ingredients:
+
+
+
+Congratulations, you've created a 'recommendation' web app with a few fields. Take some time to build out this system!
+
+## 🚀 Challenge
+
+Your web app is very minimal, so continue to build it out using ingredients and their indexes from the [ingredient_indexes](../../data/ingredient_indexes.csv) data. What flavor combinations work to create a given national dish?
+
+## [Post-lecture quiz](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/26/)
+
+## Review & Self Study
+
+While this lesson just touched on the utility of creating a recommendation system for food ingredients, this area of ML applications is very rich in examples. Read more about how these systems are built:
+
+- https://www.sciencedirect.com/topics/computer-science/recommendation-engine
+- https://www.technologyreview.com/2014/08/25/171547/the-ultimate-challenge-for-recommendation-engines/
+- https://www.technologyreview.com/2015/03/23/168831/everything-is-a-recommendation/
+
+## Assignment
+
+[Build a new recommender](assignment.it.md)
diff --git a/4-Classification/4-Applied/translations/README.tr.md b/4-Classification/4-Applied/translations/README.tr.md
new file mode 100644
index 00000000..651fc85b
--- /dev/null
+++ b/4-Classification/4-Applied/translations/README.tr.md
@@ -0,0 +1,336 @@
+# Build a Cuisine Recommender Web App
+
+In this lesson, you will build a classification model using some of the techniques you have learned in previous lessons, along with the delicious cuisine dataset used throughout this series. In addition, you will build a small web app that uses a saved model, leveraging Onnx's web runtime.
+
+One of the most useful practical uses of machine learning is building recommendation systems, and you can take the first step in that direction today!
+
+[](https://youtu.be/giIXNoiqO_U "Recommendation Systems Introduction")
+
+> :movie_camera: Click the image above for a video: Andrew Ng introduces recommendation system design
+
+## [Pre-lecture quiz](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/25/?loc=tr)
+
+In this lesson you will learn:
+
+- How to build a model and save it as an Onnx model
+- How to use Netron to inspect the model
+- How to use your model in a web app for inference
+
+## Build your model
+
+Building applied ML systems is an important part of leveraging these technologies for your own business systems. You can use models within your web applications (and thus use them in an offline context if needed) by using Onnx.
+
+In a [previous lesson](../../../3-Web-App/1-Web-App/README.md), you built a Regression model about UFO sightings, "pickled" it, and used it in a Flask app. While this architecture is very useful to know, it is a full-stack Python app, and your requirements may include the use of a JavaScript application.
+
+In this lesson, you can build a basic JavaScript-based system for inference. First, however, you need to train a model and convert it for use with Onnx.
+
+## Exercise - train a classification model
+
+First, train a classification model using the cleaned cuisines dataset we used.
+
+1. Start by importing some useful libraries:
+
+ ```python
+ !pip install skl2onnx
+ import pandas as pd
+ ```
+
+    You need '[skl2onnx](https://onnx.ai/sklearn-onnx/)' to help convert your Scikit-learn model to Onnx format.
+
+1. Then, work with your data in the same way you did in previous lessons, by reading a CSV file using `read_csv()`:
+
+ ```python
+ data = pd.read_csv('../data/cleaned_cuisines.csv')
+ data.head()
+ ```
+
+1. Remove the first two unnecessary columns and save the remaining data as 'X':
+
+ ```python
+ X = data.iloc[:,2:]
+ X.head()
+ ```
+
+1. Save the labels as 'y':
+
+ ```python
+ y = data[['cuisine']]
+ y.head()
+
+ ```
+
+### Commence the training routine
+
+We will use the 'SVC' library, which has good accuracy.
+
+1. Import the appropriate libraries from Scikit-learn:
+
+ ```python
+ from sklearn.model_selection import train_test_split
+ from sklearn.svm import SVC
+ from sklearn.model_selection import cross_val_score
+ from sklearn.metrics import accuracy_score,precision_score,confusion_matrix,classification_report
+ ```
+
+1. Separate the training and test sets:
+
+ ```python
+ X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3)
+ ```
+
+1. Build an SVC classification model as you did in the previous lesson:
+
+ ```python
+ model = SVC(kernel='linear', C=10, probability=True,random_state=0)
+ model.fit(X_train,y_train.values.ravel())
+ ```
+
+1. Now, test your model by calling `predict()`:
+
+ ```python
+ y_pred = model.predict(X_test)
+ ```
+
+1. Print out a classification report to check the model's quality:
+
+ ```python
+ print(classification_report(y_test,y_pred))
+ ```
+
+    As we saw before, the accuracy is good:
+
+ ```output
+ precision recall f1-score support
+
+ chinese 0.72 0.69 0.70 257
+ indian 0.91 0.87 0.89 243
+ japanese 0.79 0.77 0.78 239
+ korean 0.83 0.79 0.81 236
+ thai 0.72 0.84 0.78 224
+
+ accuracy 0.79 1199
+ macro avg 0.79 0.79 0.79 1199
+ weighted avg 0.79 0.79 0.79 1199
+ ```
+
+### Convert your model to Onnx
+
+Make sure to do the conversion with the proper Tensor number. This dataset has 380 ingredients listed, so you need to note that number in `FloatTensorType`:
+
+1. Convert using a tensor number of 380.
+
+ ```python
+ from skl2onnx import convert_sklearn
+ from skl2onnx.common.data_types import FloatTensorType
+
+ initial_type = [('float_input', FloatTensorType([None, 380]))]
+ options = {id(model): {'nocl': True, 'zipmap': False}}
+ ```
+
+1. Create the onx and store it as a file called **model.onnx**:
+
+ ```python
+ onx = convert_sklearn(model, initial_types=initial_type, options=options)
+ with open("./model.onnx", "wb") as f:
+ f.write(onx.SerializeToString())
+ ```
+
+    > Note, you can pass [options](https://onnx.ai/sklearn-onnx/parameterized.html) into your conversion script. Here we passed 'nocl' as True and 'zipmap' as False. Since this is a classification model, you have the option to remove ZipMap, which produces a list of dictionaries (not necessary). `nocl` refers to class information being included in the model; shrink your model's size by setting `nocl` to 'True'.
+
+Running the entire notebook will now create an Onnx model and save it to this folder.
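+
+You can also verify the converted graph programmatically (a small sketch, assuming the `onnx` package is installed):
+
+```python
+import onnx
+
+# the input name printed here is the one the web app's 'feeds' must use
+model_onnx = onnx.load("./model.onnx")
+print([i.name for i in model_onnx.graph.input])   # expect ['float_input']
+```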
+
+## View your model
+
+Onnx models are not very visible in Visual Studio Code, but there's a very good free software that many researchers use to visualize the model, to ensure that it is properly built. Download [Netron](https://github.com/lutzroeder/Netron) and open your model.onnx file. You can see your simple model visualized, with its 380 inputs and classifier listed:
+
+
+
+Netron is a helpful tool to view your models.
+
+Now you are ready to use this neat model in a web app. Let's build an app that will come in handy when you look in your refrigerator and try to figure out which combination of your leftover ingredients you can use to cook a given cuisine, as determined by your model.
+
+## Build a recommender web application
+
+You can use your model directly in a web app. This architecture also allows you to run it locally and even offline if needed. Start by creating an `index.html` file in the same folder where you stored your `model.onnx` file.
+
+1. In this file _index.html_, add the following markup:
+
+    ```html
+    <!-- the HTML tags were stripped from this diff; this is the expected minimal skeleton -->
+    <!DOCTYPE html>
+    <html>
+        <head>
+            <title>Cuisine Matcher</title>
+        </head>
+        <body>
+            ...
+        </body>
+    </html>
+    ```
+
+1. Now, working within the `body` tags, add a little markup to show a list of checkboxes reflecting some ingredients:
+
+    ```html
+    <!-- the checkbox markup was stripped from this diff; each ingredient follows the
+         pattern sketched here, with value set to the ingredient's column index -->
+    <h1>Check your refrigerator. What can you create?</h1>
+    <div id="wrapper">
+        <div class="boxCont">
+            <input type="checkbox" value="4" class="checkbox">
+            <label>apple</label>
+        </div>
+        <!-- ...one block like the above per ingredient... -->
+    </div>
+    <div style="padding-top:10px">
+        <button onClick="startInference()">What kind of cuisine can you make?</button>
+    </div>
+    ```
+
+    Notice that each checkbox is given a value reflecting the index where the ingredient is found according to the dataset. Apple, for example, occupies the fifth column of this alphabetic list, so its value is '4' since we start counting at 0. You can consult the [ingredients spreadsheet](../../data/ingredient_indexes.csv) to discover a given ingredient's index.
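+
+    To compute such an index directly, here is a small sketch (assuming the cleaned CSV used earlier in this lesson):
+
+    ```python
+    import pandas as pd
+
+    # the feature columns start after the first two, as in the X frame above
+    X = pd.read_csv('../data/cleaned_cuisines.csv').iloc[:, 2:]
+    print(X.columns.get_loc('apple'))  # -> 4, matching the apple checkbox value
+    ```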
+
+    Continuing your work in the index.html file, add a script block where your model will be called after the final closing `</div>`.
+
+1. First, import the [Onnx Runtime](https://www.onnxruntime.ai/):
+
+    ```html
+    <!-- the script tag was stripped from this diff; a typical CDN import of the Onnx Runtime web bundle -->
+    <script src="https://cdn.jsdelivr.net/npm/onnxruntime-web/dist/ort.min.js"></script>
+    ```
+
+    > Onnx Runtime is used to enable running your Onnx models across a wide range of hardware platforms, including optimizations and an API to use.
+
+1. Once the Runtime is in place, you can call it:
+
+    ```javascript
+    // the original script was stripped from this diff; this sketch is reconstructed
+    // from the numbered description below and assumes the `ort` global loaded above
+    const ingredients = Array(380).fill(0);
+    const checks = [...document.querySelectorAll('.checkbox')];
+
+    function init() {
+        // called at startup: mirror each checkbox into the ingredients array
+        checks.forEach(check =>
+            check.addEventListener('change', () => {
+                ingredients[check.value] = check.checked ? 1 : 0;
+            }));
+    }
+
+    function testCheckboxes() {
+        // is at least one checkbox checked?
+        return checks.some(check => check.checked);
+    }
+
+    async function startInference() {
+        if (!testCheckboxes()) return alert('Please check at least one ingredient.');
+        // set up an async load of the model
+        const session = await ort.InferenceSession.create('./model.onnx');
+        // create a tensor structure matching the 1 x 380 input used in training
+        const input = new ort.Tensor('float32', new Float32Array(ingredients), [1, 380]);
+        // the feed name must be the 'float_input' created when training the model
+        const results = await session.run({ float_input: input });
+        // output names vary by converted model; verify them in Netron
+        alert('You can enjoy ' + results.label.data[0] + ' cuisine today!');
+    }
+
+    init();
+    ```
+
+In this code, several things are happening:
+
+1. You created an array of 380 possible values (1 or 0) to be set and sent to the model for inference, depending on whether an ingredient checkbox is checked.
+2. You created an array of checkboxes and a way to determine whether they are checked in an `init` function called when the application starts. When a checkbox is checked, the `ingredients` array is altered to reflect the chosen ingredient.
+3. You created a `testCheckboxes` function that checks whether any checkbox is checked.
+4. You use that function when the button is pressed and, if any checkbox is checked, you start inference.
+5. The inference routine includes:
+    1. Setting up an asynchronous load of the model
+    2. Creating a Tensor structure to send to the model
+    3. Creating 'feeds' that reflect the `float_input` input you created when training your model (you can use Netron to verify that name)
+    4. Sending these 'feeds' to the model and waiting for a response
+
+## Test your application
+
+Open a terminal session in Visual Studio Code in the folder where your index.html file resides. Ensure that you have [http-server](https://www.npmjs.com/package/http-server) installed globally, and type `http-server` at the prompt. A localhost should open and you can view your web app. Check what cuisine is recommended based on various ingredients:
+
+
+
+Congratulations, you've created a 'recommendation' web app with a few fields. Take some time to build out this system!
+
+## :rocket: Challenge
+
+Your web app is very minimal, so continue to build it out using ingredients and their indexes from the [ingredient_indexes](../../data/ingredient_indexes.csv) data. What flavor combinations work to create a given national dish?
+
+## [Post-lecture quiz](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/26/?loc=tr)
+
+## Review & Self Study
+
+While this lesson just touched on the utility of creating a recommendation system for food ingredients, this area of ML applications is very rich in examples. Read more about how these systems are built:
+
+- https://www.sciencedirect.com/topics/computer-science/recommendation-engine
+- https://www.technologyreview.com/2014/08/25/171547/the-ultimate-challenge-for-recommendation-engines/
+- https://www.technologyreview.com/2015/03/23/168831/everything-is-a-recommendation/
+
+## Assignment
+
+[Build a new recommender](assignment.tr.md)
diff --git a/4-Classification/4-Applied/translations/assignment.it.md b/4-Classification/4-Applied/translations/assignment.it.md
new file mode 100644
index 00000000..cc926c72
--- /dev/null
+++ b/4-Classification/4-Applied/translations/assignment.it.md
@@ -0,0 +1,11 @@
+# Build a recommender
+
+## Instructions
+
+Given the exercises in this lesson, you now know how to build a JavaScript-based web app using Onnx Runtime and a converted Onnx model. Experiment with building a new recommender using data from these lessons or sourced elsewhere (give credit, please). You might create a pet recommender given various personality attributes, or a music genre recommender based on a person's mood. Be creative!
+
+## Rubric
+
+| Criteria | Exemplary | Adequate | Needs Improvement |
+| -------- | ------------------------------------------------------------------------ | ------------------------------------- | --------------------------------- |
+|          | A web app and a notebook are presented, both well documented and running | One of those two is missing or flawed | Both are either missing or flawed |
diff --git a/4-Classification/4-Applied/translations/assignment.tr.md b/4-Classification/4-Applied/translations/assignment.tr.md
new file mode 100644
index 00000000..f561bf48
--- /dev/null
+++ b/4-Classification/4-Applied/translations/assignment.tr.md
@@ -0,0 +1,11 @@
+# Build a recommender
+
+## Instructions
+
+Given the exercises in this lesson, you now know how to build a JavaScript-based web app using Onnx Runtime and a converted Onnx model. Experiment with building a new recommender using data from these lessons or sourced elsewhere (please give credit). You might create a pet recommender given various personality attributes, or a music genre recommender based on a person's mood. Be creative!
+
+## Rubric
+
+| Criteria | Exemplary | Adequate | Needs Improvement |
+| -------- | ------------------------------------------------------------------------ | ------------------------------------- | --------------------------------- |
+|          | A web app and a notebook are presented, both well documented and running | One of the two is missing or flawed | Both are either missing or flawed |
\ No newline at end of file
diff --git a/4-Classification/README.md b/4-Classification/README.md
index f6133aa1..73d83beb 100644
--- a/4-Classification/README.md
+++ b/4-Classification/README.md
@@ -8,7 +8,7 @@ In Asia and India, food traditions are extremely diverse, and very delicious! Le
## What you will learn
-In this section, you will build on the skills you learned in the first part of this curriculum all about regressionn to learn about other classifiers you can use that will help you learn about your data.
+In this section, you will build on the skills you learned in the first part of this curriculum all about regression to learn about other classifiers you can use that will help you learn about your data.
> There are useful low-code tools that can help you learn about working with classification models. Try [Azure ML for this task](https://docs.microsoft.com/learn/modules/create-classification-model-azure-machine-learning-designer/?WT.mc_id=academic-15963-cxa)
diff --git a/4-Classification/data/cleaned_cuisine.csv b/4-Classification/data/cleaned_cuisines.csv
similarity index 100%
rename from 4-Classification/data/cleaned_cuisine.csv
rename to 4-Classification/data/cleaned_cuisines.csv
diff --git a/4-Classification/translations/README.it.md b/4-Classification/translations/README.it.md
new file mode 100644
index 00000000..fbaa4720
--- /dev/null
+++ b/4-Classification/translations/README.it.md
@@ -0,0 +1,26 @@
+# Getting started with classification
+
+## Regional topic: Delicious Asian and Indian Cuisines 🍜
+
+In Asia and India, food traditions are extremely diverse, and very delicious! Let's look at data about regional cuisines to try to understand their ingredients.
+
+
+> Photo by Lisheng Chang on Unsplash
+
+## What you will learn
+
+In this section, you will build on the skills you learned in the first part of this curriculum, which was all about regression, to learn about other classifiers you can use that will help you learn about your data.
+
+> There are useful low-code tools that can help you learn about working with classification models. Try [Azure ML for this task](https://docs.microsoft.com/learn/modules/create-classification-model-azure-machine-learning-designer/?WT.mc_id=academic-15963-cxa)
+
+## Lessons
+
+1. [Introduction to classification](../1-Introduction/translations/README.it.md)
+2. [More classifiers](../2-Classifiers-1/translations/README.it.md)
+3. [Yet other classifiers](../3-Classifiers-2/translations/README.it.md)
+4. [Applied ML: build a web app](../4-Applied/translations/README.it.md)
+
+## Credits
+
+"Getting started with classification" was written with ♥️ by [Cassie Breviu](https://www.twitter.com/cassieview) and [Jen Looper](https://www.twitter.com/jenlooper)
+
+The delicious cuisines dataset was sourced from [Kaggle](https://www.kaggle.com/hoandan/asian-and-indian-cuisines)
diff --git a/4-Classification/translations/README.tr.md b/4-Classification/translations/README.tr.md
new file mode 100644
index 00000000..9514dd0a
--- /dev/null
+++ b/4-Classification/translations/README.tr.md
@@ -0,0 +1,25 @@
+# Getting started with classification
+
+## Regional topic: Delicious Asian and Indian Cuisines :ramen:
+
+In Asia and India, food traditions are extremely diverse, and very delicious! Let's look at data about regional cuisines to try to understand their ingredients.
+
+
+> Photo by Lisheng Chang on Unsplash
+
+## What you will learn
+
+In this section, you will build on the skills you learned in the first part of this curriculum, which was all about regression, and learn about other classifiers you can use that will help you get to know your data.
+
+> There are useful low-code tools that can help you learn about working with classification models. Try [Azure ML for this task](https://docs.microsoft.com/learn/modules/create-classification-model-azure-machine-learning-designer/?WT.mc_id=academic-15963-cxa).
+
+## Lessons
+
+1. [Introduction to classification](../1-Introduction/translations/README.tr.md)
+2. [More classifiers](../2-Classifiers-1/translations/README.tr.md)
+3. [Yet other classifiers](../3-Classifiers-2/translations/README.tr.md)
+4. [Applied ML: build a web app](../4-Applied/translations/README.tr.md)
+
+## Credits
+
+"Getting started with classification" was written with :hearts: by [Cassie Breviu](https://www.twitter.com/cassieview) and [Jen Looper](https://www.twitter.com/jenlooper)
+
+The delicious cuisines dataset was sourced from [Kaggle](https://www.kaggle.com/hoandan/asian-and-indian-cuisines).
diff --git a/5-Clustering/1-Visualize/translations/README.it.md b/5-Clustering/1-Visualize/translations/README.it.md
new file mode 100644
index 00000000..104507c9
--- /dev/null
+++ b/5-Clustering/1-Visualize/translations/README.it.md
@@ -0,0 +1,332 @@
+# Introduction to clustering
+
+Clustering is a type of [Unsupervised Learning](https://wikipedia.org/wiki/Unsupervised_learning) that presumes that a dataset is unlabeled, or that its inputs are not matched with predefined outputs. It uses various algorithms to sort through unlabeled data and provide groupings according to patterns it discerns in the data.
+
+[](https://youtu.be/ty2advRiWJM "No One Like You by PSquare")
+
+> 🎥 Click the image above for a video. While you're studying machine learning with clustering, enjoy some Nigerian Dance Hall tracks - this is a highly rated song from 2014 by PSquare.
+
+## [Pre-lecture quiz](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/27/)
+
+### Introduction
+
+[Clustering](https://link.springer.com/referenceworkentry/10.1007%2F978-0-387-30164-8_124) is very useful for data exploration. Let's see if it can help discover trends and patterns in the way Nigerian audiences consume music.
+
+✅ Take a minute to think about the uses of clustering. In real life, clustering happens whenever you have a pile of laundry and need to sort out your family members' clothes 🧦👕👖🩲. In data science, clustering happens when trying to analyze a user's preferences, or determine the characteristics of any unlabeled dataset. Clustering, in a way, helps make sense of chaos, like a sock drawer.
+
+[](https://youtu.be/esmzYhuFnds "Introduction to Clustering")
+
+> 🎥 Click the image above for a video: MIT's John Guttag introduces clustering
+
+In a professional setting, clustering can be used to determine things like market segmentation, determining what age groups buy what items, for example. Another use would be anomaly detection, perhaps to detect fraud from a dataset of credit card transactions. Or you might use clustering to determine tumors in a batch of medical scans.
+
+✅ Think a minute about how you might have encountered clustering 'in the wild', in a banking, e-commerce, or business setting.
+
+> 🎓 Interestingly, cluster analysis originated in the fields of anthropology and psychology in the 1930s. Can you imagine how it might have been used?
+
+Alternately, you could use it for grouping search results - by shopping links, images, or reviews, for example. Clustering is useful when you have a large dataset that you want to reduce and on which you want to perform more granular analysis, so the technique can be used to learn about data before other models are constructed.
+
+✅ Once your data is organized in clusters, you assign it a cluster Id, and this technique can be useful when preserving a dataset's privacy; you can instead refer to a data point by its cluster Id, rather than by more revealing identifiable data. Can you think of other reasons why you'd refer to a cluster Id rather than other elements of the cluster to identify it?
+
+Deepen your understanding of clustering techniques in this [Learn module](https://docs.microsoft.com/learn/modules/train-evaluate-cluster-models?WT.mc_id=academic-15963-cxa)
+
+## Getting started with clustering
+
+[Scikit-learn offers a large array](https://scikit-learn.org/stable/modules/clustering.html) of methods to perform clustering. The type you choose will depend on your use case. According to the documentation, each method has various benefits. Here is a simplified table of the methods supported by Scikit-learn and their appropriate use cases:
+
+| Method name                  | Use case                                                                |
+| :--------------------------- | :---------------------------------------------------------------------- |
+| K-Means                      | general purpose, inductive                                               |
+| Affinity propagation         | many, uneven clusters, inductive                                         |
+| Mean-shift                   | many, uneven clusters, inductive                                         |
+| Spectral clustering          | few, even clusters, transductive                                         |
+| Ward hierarchical clustering | many, constrained clusters, transductive                                 |
+| Agglomerative clustering     | many, constrained, non-Euclidean distances, transductive                 |
+| DBSCAN                       | non-flat geometry, uneven clusters, transductive                         |
+| OPTICS                       | non-flat geometry, uneven clusters with variable density, transductive   |
+| Gaussian mixtures            | flat geometry, inductive                                                 |
+| BIRCH                        | large dataset with outliers, inductive                                   |
+
+> 🎓 How we create clusters has a lot to do with how we gather up the data points into groups. Let's unpack some vocabulary:
+>
+> 🎓 ['Transductive' vs. 'inductive'](https://wikipedia.org/wiki/Transduction_(machine_learning))
+>
+> Transductive inference is derived from observed training cases that map to specific test cases. Inductive inference is derived from training cases that map to general rules which are only then applied to test cases.
+>
+> An example: Imagine you have a dataset that is only partially labelled. Some things are 'records', some 'cds', and some are blank. Your job is to provide labels for the blanks. If you choose an inductive approach, you'd train a model looking for 'records' and 'cds' and apply those labels to your unlabeled data. This approach will have trouble classifying things that are actually 'cassettes'. A transductive approach, on the other hand, handles this unknown data more effectively as it works to group similar items together and then applies a label to a group. In this case, clusters might reflect 'round musical things' and 'square musical things'.
+>
+> 🎓 ['Non-flat' vs. 'flat' geometry](https://datascience.stackexchange.com/questions/52260/terminology-flat-geometry-in-the-context-of-clustering)
+>
+> Derived from mathematical terminology, non-flat vs. flat geometry refers to the measure of distances between points by either 'flat' ([Euclidean](https://wikipedia.org/wiki/Euclidean_geometry)) or 'non-flat' (non-Euclidean) geometrical methods.
+>
+> 'Flat' in this context refers to Euclidean geometry (parts of which are taught as 'plane' geometry), and non-flat refers to non-Euclidean geometry. What does geometry have to do with machine learning? Well, as two fields that are rooted in mathematics, there must be a common way to measure distances between points in clusters, and that can be done in a 'flat' or 'non-flat' way, depending on the nature of the data. [Euclidean distances](https://wikipedia.org/wiki/Euclidean_distance) are measured as the length of a line segment between two points. [Non-Euclidean distances](https://wikipedia.org/wiki/Non-Euclidean_geometry) are measured along a curve. If your data, visualized, seems to not exist on a plane, you might need to use a specialized algorithm to handle it.
+>
+
+> Infographic by [Dasani Madipalli](https://twitter.com/dasani_decoded)
+>
+> 🎓 ['Distances'](https://web.stanford.edu/class/cs345a/slides/12-clustering.pdf)
+>
+> Clusters are defined by their distance matrix, e.g. the distances between points. This distance can be measured a few ways. Euclidean clusters are defined by the average of the point values, and contain a 'centroid' or center point. Distances are thus measured by the distance to that centroid. Non-Euclidean distances refer to 'clustroids', the point closest to other points. Clustroids in turn can be defined in various ways.
+>
+> 🎓 ['Constrained'](https://wikipedia.org/wiki/Constrained_clustering)
+>
+> [Constrained Clustering](https://web.cs.ucdavis.edu/~davidson/Publications/ICDMTutorial.pdf) introduces 'semi-supervised' learning into this unsupervised method. The relationships between points are flagged as 'cannot link' or 'must-link', so some rules are forced on the dataset.
+>
+> An example: If an algorithm is set free on a batch of unlabelled or semi-labelled data, the clusters it produces may be of poor quality. In the example above, the clusters might group 'round music things' and 'square music things' and 'triangular things' and 'cookies'. If given some constraints, or rules to follow ("the item must be made of plastic", "the item needs to be able to produce music"), this can help 'constrain' the algorithm to make better choices.
+>
+> 🎓 'Density'
+>
+> Data that is 'noisy' is considered to be 'dense'. The distances between points in each of its clusters may prove, on examination, to be more or less dense, or 'crowded', and thus this data needs to be analyzed with the appropriate clustering method. [This article](https://www.kdnuggets.com/2020/02/understanding-density-based-clustering.html) demonstrates the difference between using K-Means clustering vs. HDBSCAN algorithms to explore a noisy dataset with uneven cluster density.
+
+## Clustering algorithms
+
+There are over 100 clustering algorithms, and their use depends on the nature of the data at hand. Let's discuss some of the major ones:
+
+- **Hierarchical clustering**. If an object is classified by its proximity to a nearby object, rather than to one farther away, clusters are formed based on their members' distance to and from other objects. Scikit-learn's agglomerative clustering is hierarchical.
+
+   
+   > Infographic by [Dasani Madipalli](https://twitter.com/dasani_decoded)
+
+- **Centroid clustering**. This popular algorithm requires the choice of 'k', or the number of clusters to form, after which the algorithm determines the center point of a cluster and gathers data around that point (see the sketch after this list). [K-means clustering](https://wikipedia.org/wiki/K-means_clustering) is a popular version of centroid clustering. The center is determined by the nearest mean, thus the name. The squared distance from the cluster is minimized.
+
+   
+   > Infographic by [Dasani Madipalli](https://twitter.com/dasani_decoded)
+
+- **Distribution-based clustering**. Based on statistical modeling, distribution-based clustering centers on determining the probability that a data point belongs to a cluster, and assigning it accordingly. Gaussian mixture methods belong to this type.
+
+- **Density-based clustering**. Data points are assigned to clusters based on their density, or their grouping around each other. Data points far from the group are considered outliers or noise. DBSCAN, Mean-shift and OPTICS belong to this type of clustering.
+
+- **Grid-based clustering**. For multi-dimensional datasets, a grid is created and the data is divided amongst the grid's cells, thereby creating clusters.
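+
+As a taste of the centroid approach before the next lesson, here is a minimal k-means sketch on a handful of made-up 2-D points (not this lesson's dataset):
+
+```python
+import numpy as np
+from sklearn.cluster import KMeans
+
+# six points that visibly fall into two groups
+X = np.array([[1, 2], [1, 4], [1, 0], [10, 2], [10, 4], [10, 0]])
+kmeans = KMeans(n_clusters=2, random_state=0).fit(X)
+print(kmeans.labels_)           # cluster id assigned to each point
+print(kmeans.cluster_centers_)  # the two centroids
+```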
+
+## Exercise - cluster your data
+
+Clustering as a technique is greatly aided by proper visualization, so let's get started by visualizing our music data. This exercise will help us decide which of the methods of clustering we should most effectively use for the nature of this data.
+
+1. Open the _notebook.ipynb_ file in this folder.
+
+1. Import the `Seaborn` package for good data visualization.
+
+ ```python
+ !pip install seaborn
+ ```
+
+1. Append the song data from _nigerian-songs.csv_. Load up a dataframe with some data about the songs. Get ready to explore this data by importing the libraries and dumping out the data:
+
+ ```python
+ import matplotlib.pyplot as plt
+ import pandas as pd
+
+ df = pd.read_csv("../data/nigerian-songs.csv")
+ df.head()
+ ```
+
+    Check the first few lines of data:
+
+ | | name | album | artist | artist_top_genre | release_date | length | popularity | danceability | acousticness | energy | instrumentalness | liveness | loudness | speechiness | tempo | time_signature |
+ | --- | ------------------------ | ---------------------------- | ------------------- | ---------------- | ------------ | ------ | ---------- | ------------ | ------------ | ------ | ---------------- | -------- | -------- | ----------- | ------- | -------------- |
+ | 0 | Sparky | Mandy & The Jungle | Cruel Santino | alternative r&b | 2019 | 144000 | 48 | 0.666 | 0.851 | 0.42 | 0.534 | 0.11 | -6.699 | 0.0829 | 133.015 | 5 |
+ | 1 | shuga rush | EVERYTHING YOU HEARD IS TRUE | Odunsi (The Engine) | afropop | 2020 | 89488 | 30 | 0.71 | 0.0822 | 0.683 | 0.000169 | 0.101 | -5.64 | 0.36 | 129.993 | 3 |
+ | 2 | LITT! | LITT! | AYLØ | indie r&b | 2018 | 207758 | 40 | 0.836 | 0.272 | 0.564 | 0.000537 | 0.11 | -7.127 | 0.0424 | 130.005 | 4 |
+ | 3 | Confident / Feeling Cool | Enjoy Your Life | Lady Donli | nigerian pop | 2019 | 175135 | 14 | 0.894 | 0.798 | 0.611 | 0.000187 | 0.0964 | -4.961 | 0.113 | 111.087 | 4 |
+ | 4 | wanted you | rare. | Odunsi (The Engine) | afropop | 2018 | 152049 | 25 | 0.702 | 0.116 | 0.833 | 0.91 | 0.348 | -6.044 | 0.0447 | 105.115 | 4 |
+
+1. Get some information about the dataframe, calling `info()`:
+
+ ```python
+ df.info()
+ ```
+
+    The output looks like this:
+
+ ```output
+
+ RangeIndex: 530 entries, 0 to 529
+ Data columns (total 16 columns):
+ # Column Non-Null Count Dtype
+ --- ------ -------------- -----
+ 0 name 530 non-null object
+ 1 album 530 non-null object
+ 2 artist 530 non-null object
+ 3 artist_top_genre 530 non-null object
+ 4 release_date 530 non-null int64
+ 5 length 530 non-null int64
+ 6 popularity 530 non-null int64
+ 7 danceability 530 non-null float64
+ 8 acousticness 530 non-null float64
+ 9 energy 530 non-null float64
+ 10 instrumentalness 530 non-null float64
+ 11 liveness 530 non-null float64
+ 12 loudness 530 non-null float64
+ 13 speechiness 530 non-null float64
+ 14 tempo 530 non-null float64
+ 15 time_signature 530 non-null int64
+ dtypes: float64(8), int64(4), object(4)
+ memory usage: 66.4+ KB
+ ```
+
+1. Double-check for null values, by calling `isnull()` and verifying the sum to be 0:
+
+ ```python
+ df.isnull().sum()
+ ```
+
+    Looking good!
+
+ ```output
+ name 0
+ album 0
+ artist 0
+ artist_top_genre 0
+ release_date 0
+ length 0
+ popularity 0
+ danceability 0
+ acousticness 0
+ energy 0
+ instrumentalness 0
+ liveness 0
+ loudness 0
+ speechiness 0
+ tempo 0
+ time_signature 0
+ dtype: int64
+ ```
+
+1. Describe the data:
+
+ ```python
+ df.describe()
+ ```
+
+    |       | release_date | length      | popularity | danceability | acousticness | energy   | instrumentalness | liveness | loudness  | speechiness | tempo      | time_signature |
+    | ----- | ------------ | ----------- | ---------- | ------------ | ------------ | -------- | ---------------- | -------- | --------- | ----------- | ---------- | -------------- |
+    | count | 530          | 530         | 530        | 530          | 530          | 530      | 530              | 530      | 530       | 530         | 530        | 530            |
+    | mean  | 2015.390566  | 222298.1698 | 17.507547  | 0.741619     | 0.265412     | 0.760623 | 0.016305         | 0.147308 | -4.953011 | 0.130748    | 116.487864 | 3.986792       |
+    | std   | 3.131688     | 39696.82226 | 18.992212  | 0.117522     | 0.208342     | 0.148533 | 0.090321         | 0.123588 | 2.464186  | 0.092939    | 23.518601  | 0.333701       |
+    | min   | 1998         | 89488       | 0          | 0.255        | 0.000665     | 0.111    | 0                | 0.0283   | -19.362   | 0.0278      | 61.695     | 3              |
+    | 25%   | 2014         | 199305      | 0          | 0.681        | 0.089525     | 0.669    | 0                | 0.07565  | -6.29875  | 0.0591      | 102.96125  | 4              |
+    | 50%   | 2016         | 218509      | 13         | 0.761        | 0.2205       | 0.7845   | 0.000004         | 0.1035   | -4.5585   | 0.09795     | 112.7145   | 4              |
+    | 75%   | 2017         | 242098.5    | 31         | 0.8295       | 0.403        | 0.87575  | 0.000234         | 0.164    | -3.331    | 0.177       | 125.03925  | 4              |
+    | max   | 2020         | 511738      | 73         | 0.966        | 0.954        | 0.995    | 0.91             | 0.811    | 0.582     | 0.514       | 206.007    | 5              |
+
+> 🤔 If we're working with clustering, an unsupervised method that does not require labeled data, why are we showing this data with labels? In the data exploration phase, they come in handy, but they are not necessary for the clustering algorithms to work. You could just as well remove the column headers and refer to the data by column number.
+
+Look at the general values of the data. Note that popularity can be '0', which shows songs that have no ranking. Let's remove those shortly.
+
+1. Use a barplot to find out the most popular genres:
+
+ ```python
+ import seaborn as sns
+
+ top = df['artist_top_genre'].value_counts()
+ plt.figure(figsize=(10,7))
+ sns.barplot(x=top[:5].index,y=top[:5].values)
+ plt.xticks(rotation=45)
+ plt.title('Top genres',color = 'blue')
+ ```
+
+ 
+
+✅ If you'd like to see more top values, change the top `[:5]` to a bigger value, or remove it to see all.
+
+Note, when the top genre is described as 'Missing', that means that Spotify did not classify it, so let's get rid of it.
+
+1. Get rid of missing data by filtering it out
+
+ ```python
+ df = df[df['artist_top_genre'] != 'Missing']
+ top = df['artist_top_genre'].value_counts()
+ plt.figure(figsize=(10,7))
+ sns.barplot(x=top.index,y=top.values)
+ plt.xticks(rotation=45)
+ plt.title('Top genres',color = 'blue')
+ ```
+
+ Ora ricontrollare i generi:
+
+ 
+
+1. Di gran lunga, i primi tre generi dominano questo insieme di dati. Si pone l'attenzione su `afro dancehall`, `afropop` e `nigerian pop`, filtrando inoltre l'insieme di dati per rimuovere qualsiasi cosa con un valore di popolarità 0 (il che significa che non è stato classificato con una popolarità nell'insieme di dati e può essere considerato rumore per gli scopi attuali):
+
+ ```python
+ df = df[(df['artist_top_genre'] == 'afro dancehall') | (df['artist_top_genre'] == 'afropop') | (df['artist_top_genre'] == 'nigerian pop')]
+ df = df[(df['popularity'] > 0)]
+ top = df['artist_top_genre'].value_counts()
+ plt.figure(figsize=(10,7))
+ sns.barplot(x=top.index,y=top.values)
+ plt.xticks(rotation=45)
+ plt.title('Top genres',color = 'blue')
+ ```
+
+1. Fare un test rapido per vedere se i dati sono correlati in modo particolarmente forte:
+
+ ```python
+ corrmat = df.corr()
+ f, ax = plt.subplots(figsize=(12, 9))
+ sns.heatmap(corrmat, vmax=.8, square=True)
+ ```
+
+ 
+
+ L'unica forte correlazione è tra `energy` e `loudness` (volume), il che non è troppo sorprendente, dato che la musica ad alto volume di solito è piuttosto energica. Altrimenti, le correlazioni sono relativamente deboli. Sarà interessante vedere cosa può fare un algoritmo di clustering con questi dati.
+
+ > 🎓 Notare che la correlazione non implica la causalità! Ci sono prove di correlazione ma nessuna prova di causalità. Un [sito web divertente](https://tylervigen.com/spurious-correlations) ha alcune immagini che enfatizzano questo punto.
+
+C'è qualche convergenza in questo insieme di dati intorno alla popolarità e alla ballabilità percepite di una canzone? Una FacetGrid mostra che ci sono cerchi concentrici che si allineano, indipendentemente dal genere. Potrebbe essere che i gusti nigeriani convergano ad un certo livello di ballabilità per questo genere?
+
+✅ Provare diversi punti dati (energy, loudness, speechiness) e più generi musicali o generi diversi. Cosa si può scoprire? Dare un'occhiata alla tabella con `df.describe()` per vedere la distribuzione generale dei punti dati.
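+
+Ad esempio, uno schizzo minimo (si assume il dataframe `df` filtrato in precedenza) per confrontare una coppia diversa di caratteristiche:
+
+```python
+import seaborn as sns
+
+# Grafico a dispersione di energy contro loudness, colorato per genere
+sns.scatterplot(data=df, x="energy", y="loudness", hue="artist_top_genre")
+```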
+
+### Esercizio - distribuzione dei dati
+
+Questi tre generi sono significativamente differenti nella percezione della loro ballabilità, in base alla loro popolarità?
+
+1. Esaminare la distribuzione dei dati sui tre principali generi per la popolarità e la ballabilità lungo un dato asse x e y.
+
+ ```python
+ sns.set_theme(style="ticks")
+
+ g = sns.jointplot(
+ data=df,
+ x="popularity", y="danceability", hue="artist_top_genre",
+ kind="kde",
+ )
+ ```
+
+ Si possono scoprire cerchi concentrici attorno a un punto di convergenza generale, che mostra la distribuzione dei punti.
+
+ > 🎓 Si noti che questo esempio utilizza un grafico KDE (Kernel Density Estimate) che rappresenta i dati utilizzando una curva di densità di probabilità continua. Questo consente di interpretare i dati quando si lavora con più distribuzioni.
+
+ In generale, i tre generi si allineano liberamente in termini di popolarità e ballabilità. Determinare i cluster in questi dati vagamente allineati sarà una sfida:
+
+ 
+
+1. Crea un grafico a dispersione:
+
+ ```python
+ sns.FacetGrid(df, hue="artist_top_genre", size=5) \
+ .map(plt.scatter, "popularity", "danceability") \
+ .add_legend()
+ ```
+
+ Un grafico a dispersione sugli stessi assi mostra un modello di convergenza simile.
+
+ 
+
+In generale, per il clustering è possibile utilizzare i grafici a dispersione per mostrare i cluster di dati, quindi è molto utile padroneggiare questo tipo di visualizzazione. Nella prossima lezione, si prenderanno questi dati filtrati e si utilizzerà il clustering k-means per scoprire gruppi in questi dati che si sovrappongono in modi interessanti.
+
+---
+
+## 🚀 Sfida
+
+In preparazione per la lezione successiva, creare un grafico sui vari algoritmi di clustering che si potrebbero scoprire e utilizzare in un ambiente di produzione. Che tipo di problemi sta cercando di affrontare il clustering?
+
+## [Quiz post-lezione](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/28/)
+
+## Revisione e Auto Apprendimento
+
+Prima di applicare gli algoritmi di clustering, come si è appreso, è una buona idea comprendere la natura del proprio insieme di dati. Leggere di più su questo argomento [qui](https://www.kdnuggets.com/2019/10/right-clustering-algorithm.html)
+
+[Questo utile articolo](https://www.freecodecamp.org/news/8-clustering-algorithms-in-machine-learning-that-all-data-scientists-should-know/) illustra i diversi modi in cui si comportano i vari algoritmi di clustering, date diverse forme di dati.
+
+## Compito
+
+[Ricercare altre visualizzazioni per il clustering](assignment.it.md)
diff --git a/5-Clustering/1-Visualize/translations/README.zh-cn.md b/5-Clustering/1-Visualize/translations/README.zh-cn.md
new file mode 100644
index 00000000..5f41ffa7
--- /dev/null
+++ b/5-Clustering/1-Visualize/translations/README.zh-cn.md
@@ -0,0 +1,336 @@
+# 介绍聚类
+
+聚类是一种无监督学习,它假定数据集未标记或其输入与预定义的输出不匹配。它使用各种算法对未标记的数据进行排序,并根据它在数据中识别的模式提供分组。
+
+[](https://youtu.be/ty2advRiWJM "No One Like You by PSquare")
+
+> 🎥 点击上面的图片观看视频。在通过聚类学习机器学习的同时,欣赏一首尼日利亚 Dancehall 歌曲:这是 PSquare 2014 年广受好评的歌曲。
+
+## [课前测验](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/27/)
+### 介绍
+
+[聚类](https://link.springer.com/referenceworkentry/10.1007%2F978-0-387-30164-8_124)对于数据探索非常有用。让我们看看它是否有助于发现尼日利亚观众消费音乐的趋势和模式。
+
+✅ 花一点时间思考聚类的用途。在现实生活中,每当你需要整理一堆衣服、把家人的衣物分门别类时,就是在做聚类🧦👕👖🩲。在数据科学中,聚类用于分析用户的偏好,或确定任何未标记数据集的特征。在某种程度上,聚类有助于理解混乱的状态,就像一个乱糟糟的袜子抽屉。
+
+[](https://youtu.be/esmzYhuFnds "Introduction to Clustering")
+
+> 🎥单击上图观看视频:麻省理工学院的 John Guttag 介绍聚类
+
+在专业环境中,聚类可用于确定诸如市场细分之类的事情,例如确定哪些年龄组购买哪些商品。另一个用途是异常检测,可能是从信用卡交易数据集中检测欺诈。或者您可以使用聚类来确定一批医学扫描中的肿瘤。
+
+✅ 想一想您是如何在银行、电子商务或商业环境中“意外”遇到聚类的。
+
+> 🎓有趣的是,聚类分析起源于 1930 年代的人类学和心理学领域。你能想象它是如何被使用的吗?
+
+或者,您可以使用它对搜索结果进行分组 - 例如,通过购物链接、图片或评论。当您有一个大型数据集想要减少并且想要对其执行更细粒度的分析时,聚类非常有用,因此该技术可用于在构建其他模型之前了解数据。
+
+✅ 一旦你的数据被组织成聚类,你就可以为它分配一个聚类 ID。这个技术在保护数据集隐私时很有用:你可以通过聚类 ID 来引用数据点,而不是通过更容易识别的数据。你还能想到哪些用聚类 ID 而非聚类中其他元素来识别数据点的理由?
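+
+举个小例子(仅作示意,`feature_a`、`feature_b` 是虚构的特征列),展示如何为每个数据点分配聚类 ID:
+
+```python
+from sklearn.cluster import KMeans
+import pandas as pd
+
+# 之后即可用 cluster_id 来引用数据点,而不必使用原始的可识别信息
+samples = pd.DataFrame({"feature_a": [1, 2, 9, 10], "feature_b": [1, 1, 8, 9]})
+samples["cluster_id"] = KMeans(n_clusters=2, random_state=0).fit_predict(samples)
+print(samples)
+```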
+
+在此[学习模块中](https://docs.microsoft.com/learn/modules/train-evaluate-cluster-models?WT.mc_id=academic-15963-cxa)加深您对聚类技术的理解
+
+## 聚类入门
+
+[Scikit-learn](https://scikit-learn.org/stable/modules/clustering.html) 提供了大量的方法来执行聚类。您选择的类型将取决于您的用例。根据文档,每种方法都有不同的好处。以下是 Scikit-learn 支持的方法及其适用场景的简化表:
+
+| 方法名称 | 用例 |
+| ---------------------------- | -------------------------------------------------- |
+| K-Means | 通用目的,归纳的 |
+| Affinity propagation | 许多,不均匀的聚类,归纳的 |
+| Mean-shift | 许多,不均匀的聚类,归纳的 |
+| Spectral clustering | 少数,均匀的聚类,转导的 |
+| Ward hierarchical clustering | 许多,受约束的聚类,转导的 |
+| Agglomerative clustering | 许多,受约束的,非欧几里得距离,转导的 |
+| DBSCAN | 非平面几何,不均匀聚类,转导的 |
+| OPTICS | 非平面几何,具有可变密度的不均匀聚类,转导的 |
+| Gaussian mixtures | 平面几何,归纳的 |
+| BIRCH | 具有异常值的大型数据集,归纳的 |
+
+> 🎓我们如何创建聚类与我们如何将数据点收集到组中有很大关系。让我们分析一些词汇:
+>
+> 🎓 [“转导”与“归纳”](https://wikipedia.org/wiki/Transduction_(machine_learning))
+>
+> 转导推理源自观察到的映射到特定测试用例的训练用例。归纳推理源自映射到一般规则的训练案例,然后才应用于测试案例。
+>
+> 示例:假设您有一个仅部分标记的数据集。有些是"唱片",有些是"CD",有些是空白的。您的工作是为空白项提供标签。如果您选择归纳方法,您将训练一个寻找"唱片"和"CD"的模型,并将这些标签应用于未标记的数据。这种方法将难以对实际上是"盒式磁带"的东西进行分类。另一方面,转导方法可以更有效地处理这些未知数据,因为它会将相似的项目组合在一起,然后为整个组应用标签。在这种情况下,聚类可能反映"圆形的音乐物品"和"方形的音乐物品"。
+>
+> 🎓 [“非平面”与“平面”几何](https://datascience.stackexchange.com/questions/52260/terminology-flat-geometry-in-the-context-of-clustering)
+>
+> 源自数学术语,非平面与平面几何是指通过“平面”([欧几里德](https://wikipedia.org/wiki/Euclidean_geometry))或“非平面”(非欧几里得)几何方法测量点之间的距离。
+>
+> 在此上下文中,"平面"是指欧几里得几何(其中一部分被教导为"平面"几何),而非平面是指非欧几里得几何。几何与机器学习有什么关系?好吧,作为植根于数学的两个领域,必须有一种通用的方法来测量聚类中点之间的距离,并且可以用"平坦"(flat)或"非平坦"(non-flat)的方式完成,具体取决于数据的性质。[欧几里得距离](https://wikipedia.org/wiki/Euclidean_distance)测量的是两点之间线段的长度。[非欧距离](https://wikipedia.org/wiki/Non-Euclidean_geometry)是沿曲线测量的。如果您的可视化数据似乎不存在于平面上,您可能需要使用专门的算法来处理它。
+>
+> 
+> [Dasani Madipalli ](https://twitter.com/dasani_decoded)作图
+>
+> 🎓 ['距离'](https://web.stanford.edu/class/cs345a/slides/12-clustering.pdf)
+>
+> 聚类由它们的距离矩阵定义,例如点之间的距离。这个距离可以通过几种方式来测量。欧几里得聚类由点值的平均值定义,并包含“质心”或中心点。因此,距离是通过到该质心的距离来测量的。非欧式距离指的是“聚类中心”,即离其他点最近的点。聚类中心又可以用各种方式定义。
+>
+> 🎓 ['约束'](https://wikipedia.org/wiki/Constrained_clustering)
+>
+> [约束聚类](https://web.cs.ucdavis.edu/~davidson/Publications/ICDMTutorial.pdf)将“半监督”学习引入到这种无监督方法中。点之间的关系被标记为“无法链接”或“必须链接”,因此对数据集强加了一些规则。
+>
+> 一个例子:如果一个算法在一批未标记或半标记的数据上不受约束,它产生的聚类质量可能很差。在上面的示例中,聚类可能将“圆形音乐事物”和“方形音乐事物”以及“三角形事物”和“饼干”分组。如果给出一些约束或要遵循的规则(“物品必须由塑料制成”、“物品需要能够产生音乐”),这可以帮助“约束”算法做出更好的选择。
+>
+> 🎓 '密度'
+>
+> “嘈杂”的数据被认为是“密集的”。在检查时,每个聚类中的点之间的距离可能或多或少地密集或“拥挤”,因此需要使用适当的聚类方法分析这些数据。[本文](https://www.kdnuggets.com/2020/02/understanding-density-based-clustering.html)演示了使用 K-Means 聚类与 HDBSCAN 算法探索具有不均匀聚类密度的嘈杂数据集之间的区别。
+
+## 聚类算法
+
+有超过 100 种聚类算法,它们的使用取决于手头数据的性质。让我们讨论一些主要的:
+
+- **层次聚类**。如果一个对象是根据其与附近对象的接近程度而不是较远对象来分类的,则聚类是根据其成员与其他对象之间的距离来形成的。Scikit-learn 的凝聚聚类是分层的。
+
+ 
+
+ > [Dasani Madipalli ](https://twitter.com/dasani_decoded)作图
+
+- **质心聚类**。这种流行的算法需要选择“k”或要形成的聚类数量,然后算法确定聚类的中心点并围绕该点收集数据。[K-means 聚类](https://wikipedia.org/wiki/K-means_clustering)是质心聚类的流行版本。中心由最近的平均值确定,因此叫做质心。与聚类的平方距离被最小化。
+
+ 
+
+ > [Dasani Madipalli](https://twitter.com/dasani_decoded)作图
+
+- **基于分布的聚类**。基于统计建模,基于分布的聚类中心确定一个数据点属于一个聚类的概率,并相应地分配它。高斯混合方法属于这种类型。
+
+- **基于密度的聚类**。数据点根据它们的密度或它们彼此的分组分配给聚类。远离该组的数据点被视为异常值或噪声。DBSCAN、Mean-shift 和 OPTICS 属于此类聚类。
+
+- **基于网格的聚类**。对于多维数据集,创建一个网格并将数据划分到网格的单元格中,从而创建聚类。
+
+
+
+## 练习 - 对你的数据进行聚类
+
+适当的可视化对聚类这种技术有很大帮助,所以让我们从可视化我们的音乐数据开始。这个练习将帮助我们根据这些数据的性质,决定最适合使用哪种聚类方法。
+
+1. 打开此文件夹中的*notebook.ipynb*文件。
+
+1. 导入`Seaborn`包以获得良好的数据可视化。
+
+ ```python
+ !pip install seaborn
+ ```
+
+1. 附加来自*nigerian-songs.csv*的歌曲数据。加载包含有关歌曲的一些数据的数据帧。准备好通过导入库和转储数据来探索这些数据:
+
+ ```python
+ import matplotlib.pyplot as plt
+ import pandas as pd
+
+ df = pd.read_csv("../data/nigerian-songs.csv")
+ df.head()
+ ```
+
+ 检查前几行数据:
+
+ | | name | album | artist | artist_top_genre | release_date | length | popularity | danceability | acousticness | energy | instrumentalness | liveness | loudness | speechiness | tempo | time_signature |
+ | --- | ------------------------ | ---------------------------- | ------------------- | ---------------- | ------------ | ------ | ---------- | ------------ | ------------ | ------ | ---------------- | -------- | -------- | ----------- | ------- | -------------- |
+ | 0 | Sparky | Mandy & The Jungle | Cruel Santino | alternative r&b | 2019 | 144000 | 48 | 0.666 | 0.851 | 0.42 | 0.534 | 0.11 | -6.699 | 0.0829 | 133.015 | 5 |
+ | 1 | shuga rush | EVERYTHING YOU HEARD IS TRUE | Odunsi (The Engine) | afropop | 2020 | 89488 | 30 | 0.71 | 0.0822 | 0.683 | 0.000169 | 0.101 | -5.64 | 0.36 | 129.993 | 3 |
+ | 2 | LITT! | LITT! | AYLØ | indie r&b | 2018 | 207758 | 40 | 0.836 | 0.272 | 0.564 | 0.000537 | 0.11 | -7.127 | 0.0424 | 130.005 | 4 |
+ | 3 | Confident / Feeling Cool | Enjoy Your Life | Lady Donli | nigerian pop | 2019 | 175135 | 14 | 0.894 | 0.798 | 0.611 | 0.000187 | 0.0964 | -4.961 | 0.113 | 111.087 | 4 |
+ | 4 | wanted you | rare. | Odunsi (The Engine) | afropop | 2018 | 152049 | 25 | 0.702 | 0.116 | 0.833 | 0.91 | 0.348 | -6.044 | 0.0447 | 105.115 | 4 |
+
+1. 获取有关数据帧的一些信息,调用`info()`:
+
+ ```python
+ df.info()
+ ```
+
+ 输出看起来像这样:
+
+ ```output
+
+ RangeIndex: 530 entries, 0 to 529
+ Data columns (total 16 columns):
+ # Column Non-Null Count Dtype
+ --- ------ -------------- -----
+ 0 name 530 non-null object
+ 1 album 530 non-null object
+ 2 artist 530 non-null object
+ 3 artist_top_genre 530 non-null object
+ 4 release_date 530 non-null int64
+ 5 length 530 non-null int64
+ 6 popularity 530 non-null int64
+ 7 danceability 530 non-null float64
+ 8 acousticness 530 non-null float64
+ 9 energy 530 non-null float64
+ 10 instrumentalness 530 non-null float64
+ 11 liveness 530 non-null float64
+ 12 loudness 530 non-null float64
+ 13 speechiness 530 non-null float64
+ 14 tempo 530 non-null float64
+ 15 time_signature 530 non-null int64
+ dtypes: float64(8), int64(4), object(4)
+ memory usage: 66.4+ KB
+ ```
+
+1. 通过调用`isnull()`和验证总和为 0 来仔细检查空值:
+
+ ```python
+ df.isnull().sum()
+ ```
+
+ 看起来不错:
+
+ ```output
+ name 0
+ album 0
+ artist 0
+ artist_top_genre 0
+ release_date 0
+ length 0
+ popularity 0
+ danceability 0
+ acousticness 0
+ energy 0
+ instrumentalness 0
+ liveness 0
+ loudness 0
+ speechiness 0
+ tempo 0
+ time_signature 0
+ dtype: int64
+ ```
+
+1. 描述数据:
+
+ ```python
+ df.describe()
+ ```
+
+ | | release_date | length | popularity | danceability | acousticness | energy | instrumentalness | liveness | loudness | speechiness | tempo | time_signature |
+ | ----- | ------------ | ----------- | ---------- | ------------ | ------------ | -------- | ---------------- | -------- | --------- | ----------- | ---------- | -------------- |
+ | count | 530 | 530 | 530 | 530 | 530 | 530 | 530 | 530 | 530 | 530 | 530 | 530 |
+ | mean | 2015.390566 | 222298.1698 | 17.507547 | 0.741619 | 0.265412 | 0.760623 | 0.016305 | 0.147308 | -4.953011 | 0.130748 | 116.487864 | 3.986792 |
+ | std | 3.131688 | 39696.82226 | 18.992212 | 0.117522 | 0.208342 | 0.148533 | 0.090321 | 0.123588 | 2.464186 | 0.092939 | 23.518601 | 0.333701 |
+ | min | 1998 | 89488 | 0 | 0.255 | 0.000665 | 0.111 | 0 | 0.0283 | -19.362 | 0.0278 | 61.695 | 3 |
+ | 25% | 2014 | 199305 | 0 | 0.681 | 0.089525 | 0.669 | 0 | 0.07565 | -6.29875 | 0.0591 | 102.96125 | 4 |
+ | 50% | 2016 | 218509 | 13 | 0.761 | 0.2205 | 0.7845 | 0.000004 | 0.1035 | -4.5585 | 0.09795 | 112.7145 | 4 |
+ | 75% | 2017 | 242098.5 | 31 | 0.8295 | 0.403 | 0.87575 | 0.000234 | 0.164 | -3.331 | 0.177 | 125.03925 | 4 |
+ | max | 2020 | 511738 | 73 | 0.966 | 0.954 | 0.995 | 0.91 | 0.811 | 0.582 | 0.514 | 206.007 | 5 |
+
+> 🤔如果我们正在使用聚类,一种不需要标记数据的无监督方法,为什么我们用标签显示这些数据?在数据探索阶段,它们派上用场,但它们不是聚类算法工作所必需的。您也可以删除列标题并按列号引用数据。
+
+查看数据的总体值。请注意,流行度可以是"0",这表示没有排名的歌曲。我们很快就会删除它们。
+
+1. 使用条形图找出最受欢迎的类型:
+
+ ```python
+ import seaborn as sns
+
+ top = df['artist_top_genre'].value_counts()
+ plt.figure(figsize=(10,7))
+ sns.barplot(x=top[:5].index,y=top[:5].values)
+ plt.xticks(rotation=45)
+ plt.title('Top genres',color = 'blue')
+ ```
+
+ 
+
+✅ 如果您想查看更多排名靠前的值,请将 `[:5]` 改为更大的值,或将其删除以查看全部。
+
+请注意,当顶级流派被标记为"Missing"时,这意味着 Spotify 没有对其进行分类,所以我们要把这类数据去掉。
+
+1. 通过过滤将缺失数据去除:
+
+ ```python
+ df = df[df['artist_top_genre'] != 'Missing']
+ top = df['artist_top_genre'].value_counts()
+ plt.figure(figsize=(10,7))
+ sns.barplot(x=top.index,y=top.values)
+ plt.xticks(rotation=45)
+ plt.title('Top genres',color = 'blue')
+ ```
+
+ 现在重新检查流派:
+
+ 
+
+1. 到目前为止,前三大流派主导了这个数据集。让我们专注于 `afro dancehall`、`afropop` 和 `nigerian pop`,另外过滤数据集以删除流行度值为 0 的所有内容(这意味着它在数据集中没有流行度排名,就我们的目的而言可以视为噪声):
+
+ ```python
+ df = df[(df['artist_top_genre'] == 'afro dancehall') | (df['artist_top_genre'] == 'afropop') | (df['artist_top_genre'] == 'nigerian pop')]
+ df = df[(df['popularity'] > 0)]
+ top = df['artist_top_genre'].value_counts()
+ plt.figure(figsize=(10,7))
+ sns.barplot(x=top.index,y=top.values)
+ plt.xticks(rotation=45)
+ plt.title('Top genres',color = 'blue')
+ ```
+
+1. 做一个快速测试,看看数据是否以任何特别强的方式相关:
+
+ ```python
+ corrmat = df.corr()
+ f, ax = plt.subplots(figsize=(12, 9))
+ sns.heatmap(corrmat, vmax=.8, square=True)
+ ```
+
+ 
+
+ 唯一的强相关性是 `energy` 和 `loudness` 之间的相关性,这并不太令人意外,因为响亮的音乐通常颇具活力。除此之外,各项相关性都相对较弱。看看聚类算法能对这些数据做些什么,将会很有趣。
+
+ > 🎓 请注意,相关性并不意味着因果关系!我们有相关性的证据,但没有因果关系的证据。一个[有趣的网站](https://tylervigen.com/spurious-correlations)上有一些强调这一点的图表。
+
+这个数据集是否围绕歌曲的流行度和可舞性有任何收敛?FacetGrid 显示无论流派如何,都有同心圆排列。对于这种类型,尼日利亚人的口味是否会在某种程度的可舞性上趋于一致?
+
+✅ 尝试不同的数据点(energy、loudness、speechiness)和更多或不同的音乐流派。你能发现什么?查看 `df.describe()` 表格以了解数据点的总体分布。
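+
+例如,一个最小的示意(假设 `df` 是上文过滤后的数据帧),换一对特征来观察:
+
+```python
+import seaborn as sns
+
+# 以 energy 和 loudness 为坐标轴,按流派着色的散点图
+sns.scatterplot(data=df, x="energy", y="loudness", hue="artist_top_genre")
+```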
+
+### 练习 - 数据分布
+
+这三种流派是否因其受欢迎程度而对其可舞性的看法有显着差异?
+
+1. 检查我们沿给定 x 和 y 轴的流行度和可舞性的前三种类型数据分布。
+
+ ```python
+ sns.set_theme(style="ticks")
+
+ g = sns.jointplot(
+ data=df,
+ x="popularity", y="danceability", hue="artist_top_genre",
+ kind="kde",
+ )
+ ```
+
+ 您可以发现围绕一般收敛点的同心圆,显示点的分布。
+
+ > 🎓请注意,此示例使用 KDE(核密度估计)图,该图使用连续概率密度曲线表示数据。这允许我们在处理多个分布时解释数据。
+
+ 总的来说,这三种流派在流行度和可舞性方面松散地对齐。在这种松散对齐的数据中确定聚类将是一个挑战:
+
+ 
+
+1. 创建散点图:
+
+ ```python
+ sns.FacetGrid(df, hue="artist_top_genre", size=5) \
+ .map(plt.scatter, "popularity", "danceability") \
+ .add_legend()
+ ```
+
+ 相同轴的散点图显示了类似的收敛模式
+
+ 
+
+一般来说,对于聚类,你可以使用散点图来展示数据的聚类,所以掌握这种类型的可视化是非常有用的。在下一课中,我们将使用过滤后的数据并使用 k-means 聚类来发现这些数据中以有趣方式重叠的组。
+
+---
+
+## 🚀挑战
+
+为下一课做准备,制作一张图表,说明您可能会在生产环境中发现和使用的各种聚类算法。
+
+聚类试图解决什么样的问题?
+
+## [课后测验](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/28/)
+
+## 复习与自学
+
+在应用聚类算法之前,正如我们所了解的,了解数据集的性质是一个好主意。请[在此处](https://www.kdnuggets.com/2019/10/right-clustering-algorithm.html)阅读有关此主题的更多信息。
+
+[这篇有用的文章](https://www.freecodecamp.org/news/8-clustering-algorithms-in-machine-learning-that-all-data-scientists-should-know/)将引导您了解各种聚类算法在面对不同形状的数据时的不同行为方式。
+
+## 作业
+
+[研究用于聚类的其他可视化](./assignment.zh-cn.md)
diff --git a/5-Clustering/1-Visualize/translations/assignment.it.md b/5-Clustering/1-Visualize/translations/assignment.it.md
new file mode 100644
index 00000000..dad3d708
--- /dev/null
+++ b/5-Clustering/1-Visualize/translations/assignment.it.md
@@ -0,0 +1,11 @@
+# Ricercare altre visualizzazioni per il clustering
+
+## Istruzioni
+
+In questa lezione, si è lavorato con alcune tecniche di visualizzazione per capire come tracciare i propri dati in preparazione per il clustering. I grafici a dispersione, in particolare, sono utili per trovare gruppi di oggetti. Ricercare modi diversi e librerie diverse per creare grafici a dispersione e documentare il proprio lavoro in un notebook. Si possono utilizzare i dati di questa lezione, di altre lezioni o dei dati che si sono procurati in autonomia (per favore citare la fonte, comunque, nel proprio notebook). Tracciare alcuni dati usando i grafici a dispersione e spiegare cosa si scopre.
+
+## Rubrica
+
+| Criteri | Ottimo | Adeguato | Necessita miglioramento |
+| -------- | -------------------------------------------------------------- | ---------------------------------------------------------------------------------------- | ----------------------------------- |
+| | Viene presentato un notebook con cinque grafici a dispersione ben documentati | Un notebook viene presentato con meno di cinque grafici a dispersione ed è meno ben documentato | Viene presentato un notebook incompleto |
diff --git a/5-Clustering/1-Visualize/translations/assignment.zh-cn.md b/5-Clustering/1-Visualize/translations/assignment.zh-cn.md
new file mode 100644
index 00000000..48f5ea24
--- /dev/null
+++ b/5-Clustering/1-Visualize/translations/assignment.zh-cn.md
@@ -0,0 +1,13 @@
+# 研究用于聚类的其他可视化
+
+## 说明
+
+在本节课中,您使用了一些可视化技术来掌握绘制数据图,为聚类数据做准备。散点图在寻找一组对象时尤其有用。研究不同的方法和不同的库来创建散点图,并在notebook上记录你的工作。你可以使用这节课的数据,其他课的数据,或者你自己的数据(但是,请把它的来源记在你的notebook上)。用散点图绘制一些数据,并解释你的发现。
+
+## 评判规则
+
+
+| 评判标准 | 优秀 | 中规中矩 | 仍需努力 |
+| -------- | -------------------------------- | ----------------------------------------------- | -------------------- |
+| | notebook上有五个详细文档的散点图 | notebook上的散点图少于5个,而且文档写得不太详细 | 一个不完整的notebook |
+
diff --git a/5-Clustering/2-K-Means/README.md b/5-Clustering/2-K-Means/README.md
index 153932e6..d85654ae 100644
--- a/5-Clustering/2-K-Means/README.md
+++ b/5-Clustering/2-K-Means/README.md
@@ -224,7 +224,7 @@ Previously, you surmised that, because you have targeted 3 song genres, you shou
## Variance
-Variance is defined as "the average of the squared differences from the Mean."[source](https://www.mathsisfun.com/data/standard-deviation.html) In the context of this clustering problem, it refers to data that the numbers of our dataset tend to diverge a bit too much from the mean.
+Variance is defined as "the average of the squared differences from the Mean" [source](https://www.mathsisfun.com/data/standard-deviation.html). In the context of this clustering problem, it refers to data that the numbers of our dataset tend to diverge a bit too much from the mean.
✅ This is a great moment to think about all the ways you could correct this issue. Tweak the data a bit more? Use different columns? Use a different algorithm? Hint: Try [scaling your data](https://www.mygreatlearning.com/blog/learning-data-science-with-k-means-clustering/) to normalize it and test other columns.
@@ -242,7 +242,7 @@ Hint: Try to scale your data. There's commented code in the notebook that adds s
## Review & Self Study
-Take a look at Stanford's K-Means Simulator [here](https://stanford.edu/class/engr108/visualizations/kmeans/kmeans.html). You can use this tool to visualize sample data points and determine its centroids. With fresh data, click 'update' to see how long it takes to find convergence. You can edit the data's randomness, numbers of clusters and numbers of centroids. Does this help you get an idea of how the data can be grouped?
+Take a look at a K-Means Simulator [such as this one](https://user.ceng.metu.edu.tr/~akifakkus/courses/ceng574/k-means/). You can use this tool to visualize sample data points and determine its centroids. You can edit the data's randomness, numbers of clusters and numbers of centroids. Does this help you get an idea of how the data can be grouped?
Also, take a look at [this handout on k-means](https://stanford.edu/~cpiech/cs221/handouts/kmeans.html) from Stanford.
diff --git a/5-Clustering/2-K-Means/translations/README.it.md b/5-Clustering/2-K-Means/translations/README.it.md
new file mode 100644
index 00000000..ce1dc9d2
--- /dev/null
+++ b/5-Clustering/2-K-Means/translations/README.it.md
@@ -0,0 +1,251 @@
+# Clustering K-Means
+
+[](https://youtu.be/hDmNF9JG3lo " Andrew Ng spiega Clustering")
+
+> 🎥 Fare clic sull'immagine sopra per un video: Andrew Ng spiega il clustering
+
+## [Quiz Pre-Lezione](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/29/)
+
+In questa lezione si imparerà come creare cluster utilizzando Scikit-learn e l'insieme di dati di musica nigeriana importato in precedenza. Si tratteranno le basi di K-Means per Clustering. Si tenga presente che, come appreso nella lezione precedente, ci sono molti modi per lavorare con i cluster e il metodo usato dipende dai propri dati. Si proverà K-Means poiché è la tecnica di clustering più comune. Si inizia!
+
+Termini che si imparerà a conoscere:
+
+- Silhouette scoring (punteggio silhouette)
+- Elbow method (metodo del gomito)
+- Inerzia
+- Varianza
+
+## Introduzione
+
+[K-Means Clustering](https://wikipedia.org/wiki/K-means_clustering) è un metodo derivato dal campo dell'elaborazione del segnale. Viene utilizzato per dividere e partizionare gruppi di dati in "k" cluster utilizzando una serie di osservazioni. Ogni osservazione serve a raggruppare un punto dati attorno alla "media" più vicina, ovvero il punto centrale di un cluster.
+
+I cluster possono essere visualizzati come [diagrammi di Voronoi](https://wikipedia.org/wiki/Voronoi_diagram), che includono un punto (o 'seme') e la sua regione corrispondente.
+
+
+
+> Infografica di [Jen Looper](https://twitter.com/jenlooper)
+
+Il processo di clustering K-Means [viene eseguito in tre fasi](https://scikit-learn.org/stable/modules/clustering.html#k-means):
+
+1. L'algoritmo seleziona il numero k di punti centrali campionando dall'insieme di dati. Dopo questo, esegue un ciclo:
+ 1. Assegna ogni campione al centroide più vicino.
+ 2. Crea nuovi centroidi prendendo il valore medio di tutti i campioni assegnati ai centroidi precedenti.
+ 3. Quindi, calcola la differenza tra il nuovo e il vecchio centroide e ripete finché i centroidi non sono stabilizzati.
+
+Uno svantaggio dell'utilizzo di K-Means include il fatto che sarà necessario stabilire 'k', ovvero il numero di centroidi. Fortunatamente il "metodo del gomito" aiuta a stimare un buon valore iniziale per "k". Si proverà in un minuto.
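+
+Per rendere più concreto il ciclo appena descritto, ecco uno schizzo minimale, puramente illustrativo e con dati fittizi, di K-Means in puro NumPy:
+
+```python
+import numpy as np
+
+rng = np.random.default_rng(0)
+
+# Dati fittizi: 6 punti in 2D; k=2 centroidi iniziali campionati dai dati
+points = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [8.0, 8.0], [1.0, 0.6], [9.0, 11.0]])
+centroids = points[rng.choice(len(points), size=2, replace=False)]
+
+for _ in range(10):
+    # 1. Assegnare ogni campione al centroide più vicino
+    distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
+    labels = distances.argmin(axis=1)
+    # 2. Creare nuovi centroidi con la media dei campioni assegnati
+    new_centroids = np.array([points[labels == k].mean(axis=0) for k in range(2)])
+    # 3. Fermarsi quando i centroidi si stabilizzano
+    if np.allclose(new_centroids, centroids):
+        break
+    centroids = new_centroids
+
+print(centroids)
+```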
+
+## Prerequisito
+
+Si lavorerà nel file _notebook.ipynb_ di questa lezione che include l'importazione dei dati e la pulizia preliminare fatta nell'ultima lezione.
+
+## Esercizio - preparazione
+
+Iniziare dando un'altra occhiata ai dati delle canzoni.
+
+1. Creare un diagramma a scatola e baffi (boxplot), chiamando `boxplot()` per ogni colonna:
+
+ ```python
+ plt.figure(figsize=(20,20), dpi=200)
+
+ plt.subplot(4,3,1)
+ sns.boxplot(x = 'popularity', data = df)
+
+ plt.subplot(4,3,2)
+ sns.boxplot(x = 'acousticness', data = df)
+
+ plt.subplot(4,3,3)
+ sns.boxplot(x = 'energy', data = df)
+
+ plt.subplot(4,3,4)
+ sns.boxplot(x = 'instrumentalness', data = df)
+
+ plt.subplot(4,3,5)
+ sns.boxplot(x = 'liveness', data = df)
+
+ plt.subplot(4,3,6)
+ sns.boxplot(x = 'loudness', data = df)
+
+ plt.subplot(4,3,7)
+ sns.boxplot(x = 'speechiness', data = df)
+
+ plt.subplot(4,3,8)
+ sns.boxplot(x = 'tempo', data = df)
+
+ plt.subplot(4,3,9)
+ sns.boxplot(x = 'time_signature', data = df)
+
+ plt.subplot(4,3,10)
+ sns.boxplot(x = 'danceability', data = df)
+
+ plt.subplot(4,3,11)
+ sns.boxplot(x = 'length', data = df)
+
+ plt.subplot(4,3,12)
+ sns.boxplot(x = 'release_date', data = df)
+ ```
+
+ Questi dati sono un po' rumorosi: osservando ogni colonna come un boxplot, si possono vedere i valori anomali.
+
+ 
+
+Si potrebbe esaminare l'insieme di dati e rimuovere questi valori anomali, ma ciò renderebbe i dati piuttosto minimi.
+
+1. Per ora, si scelgono le colonne da utilizzare per questo esercizio di clustering. Scegliere quelle con intervalli simili e codificare la colonna `artist_top_genre` come dati numerici:
+
+ ```python
+ from sklearn.preprocessing import LabelEncoder
+ le = LabelEncoder()
+
+ X = df.loc[:, ('artist_top_genre','popularity','danceability','acousticness','loudness','energy')]
+
+ y = df['artist_top_genre']
+
+ X['artist_top_genre'] = le.fit_transform(X['artist_top_genre'])
+
+ y = le.transform(y)
+ ```
+
+1. Ora si deve decidere quanti cluster usare come obiettivo. È noto che ci sono 3 generi di canzoni ricavati dall'insieme di dati, quindi se ne provano 3:
+
+ ```python
+ from sklearn.cluster import KMeans
+
+ nclusters = 3
+ seed = 0
+
+ km = KMeans(n_clusters=nclusters, random_state=seed)
+ km.fit(X)
+
+ # Predict the cluster for each data point
+
+ y_cluster_kmeans = km.predict(X)
+ y_cluster_kmeans
+ ```
+
+Viene visualizzato un array con i cluster previsti (0, 1 o 2) per ogni riga del dataframe di dati.
+
+1. Usare questo array per calcolare un "punteggio silhouette":
+
+ ```python
+ from sklearn import metrics
+ score = metrics.silhouette_score(X, y_cluster_kmeans)
+ score
+ ```
+
+## Punteggio Silhouette
+
+Si vuole ottenere un punteggio silhouette più vicino a 1. Questo punteggio varia da -1 a 1 e, se il punteggio è 1, il cluster è denso e ben separato dagli altri cluster. Un valore vicino a 0 rappresenta cluster sovrapposti con campioni molto vicini al limite di decisione dei clusters vicini [fonte](https://dzone.com/articles/kmeans-silhouette-score-explained-with-python-exam).
+
+Il punteggio è **.53**, quindi proprio nel mezzo. Ciò indica che i dati non sono particolarmente adatti a questo tipo di clustering, ma si prosegue.
+
+### Esercizio: costruire il proprio modello
+
+1. Importare `KMeans` e avviare il processo di clustering.
+
+ ```python
+ from sklearn.cluster import KMeans
+ wcss = []
+
+ for i in range(1, 11):
+ kmeans = KMeans(n_clusters = i, init = 'k-means++', random_state = 42)
+ kmeans.fit(X)
+ wcss.append(kmeans.inertia_)
+
+ ```
+
+ Ci sono alcune parti qui che meritano una spiegazione.
+
+ > 🎓 range: queste sono le iterazioni del processo di clustering
+
+ > 🎓 random_state: "Determina la generazione di numeri casuali per l'inizializzazione del centroide."[fonte](https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html#sklearn.cluster.KMeans)
+
+ > 🎓 WCSS: "somma dei quadrati all'interno del cluster" misura la distanza media al quadrato di tutti i punti all'interno di un cluster rispetto al cluster centroid [fonte](https://medium.com/@ODSC/unsupervised-learning-evaluating-clusters-bd47eed175ce).
+
+ > 🎓 Inerzia: gli algoritmi K-Means tentano di scegliere i centroidi per ridurre al minimo l'"inerzia", "una misura di quanto siano coerenti i cluster" [fonte](https://scikit-learn.org/stable/modules/clustering.html). Il valore viene aggiunto alla variabile wcss ad ogni iterazione.
+
+ > 🎓 k-means++: in [Scikit-learn](https://scikit-learn.org/stable/modules/clustering.html#k-means) si può utilizzare l'ottimizzazione 'k-means++', che "inizializza i centroidi in modo che siano (generalmente) distanti l'uno dall'altro, portando probabilmente a risultati migliori rispetto all'inizializzazione casuale".
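+
+ Per avere un riscontro pratico della differenza, uno schizzo (ipotetico, riutilizzando la matrice `X` preparata sopra) che confronta l'inerzia finale delle due inizializzazioni:
+
+ ```python
+ from sklearn.cluster import KMeans
+
+ # Confrontare l'inerzia ottenuta con 'k-means++' e con l'inizializzazione casuale
+ for init in ('k-means++', 'random'):
+     km = KMeans(n_clusters=3, init=init, n_init=10, random_state=42).fit(X)
+     print(init, km.inertia_)
+ ```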
+
+### Metodo del gomito
+
+In precedenza, si era supposto che, poiché sono stati presi di mira 3 generi di canzoni, si dovrebbero scegliere 3 cluster. È questo il caso?
+
+1. Usare il "metodo del gomito" per assicurarsene.
+
+ ```python
+ plt.figure(figsize=(10,5))
+ sns.lineplot(x=list(range(1, 11)), y=wcss, marker='o', color='red')
+ plt.title('Elbow')
+ plt.xlabel('Number of clusters')
+ plt.ylabel('WCSS')
+ plt.show()
+ ```
+
+ Usare la variabile `wcss` creata nel passaggio precedente per creare un grafico che mostra dove si trova la "piegatura" nel gomito, che indica il numero ottimale di cluster. Forse **sono** 3!
+
+ 
+
+## Esercizio - visualizzare i cluster
+
+1. Riprovare il processo, questa volta impostando tre cluster e visualizzare i cluster come grafico a dispersione:
+
+ ```python
+ from sklearn.cluster import KMeans
+ kmeans = KMeans(n_clusters = 3)
+ kmeans.fit(X)
+ labels = kmeans.predict(X)
+ plt.scatter(df['popularity'],df['danceability'],c = labels)
+ plt.xlabel('popularity')
+ plt.ylabel('danceability')
+ plt.show()
+ ```
+
+1. Verificare la precisione del modello:
+
+ ```python
+ labels = kmeans.labels_
+
+ correct_labels = sum(y == labels)
+
+ print("Result: %d out of %d samples were correctly labeled." % (correct_labels, y.size))
+
+ print('Accuracy score: {0:0.2f}'. format(correct_labels/float(y.size)))
+ ```
+
+ La precisione di questo modello non è molto buona e la forma dei cluster fornisce un indizio sul perché.
+
+ 
+
+ Questi dati sono troppo sbilanciati, troppo poco correlati e c'è troppa varianza tra i valori della colonna per raggruppare bene. In effetti, i cluster che si formano sono probabilmente fortemente influenzati o distorti dalle tre categorie di genere definite sopra. È stato un processo di apprendimento!
+
+ Nella documentazione di Scikit-learn, si può vedere che un modello come questo, con cluster non molto ben delimitati, ha un problema di "varianza":
+
+ 
+ > Infografica da Scikit-learn
+
+## Varianza
+
+La varianza è definita come "la media delle differenze al quadrato dalla media" [fonte](https://www.mathsisfun.com/data/standard-deviation.html). Nel contesto di questo problema di clustering, si riferisce al fatto che i numeri dell'insieme di dati tendono a divergere un po' troppo dalla media.
+
+✅ Questo è un ottimo momento per pensare a tutti i modi in cui si potrebbe correggere questo problema. Modificare un po' di più i dati? Utilizzare colonne diverse? Utilizzare un algoritmo diverso? Suggerimento: provare a [ridimensionare i dati](https://www.mygreatlearning.com/blog/learning-data-science-with-k-means-clustering/) per normalizzarli e testare altre colonne.
+
+> Provare questo "[calcolatore della varianza](https://www.calculatorsoup.com/calculators/statistics/variance-calculator.php)" per capire un po’ di più il concetto.
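+
+Uno schizzo minimale per provare il suggerimento sul ridimensionamento (si assume la matrice `X` definita sopra; `StandardScaler` porta ogni colonna a media 0 e varianza 1):
+
+```python
+from sklearn.preprocessing import StandardScaler
+from sklearn.cluster import KMeans
+from sklearn import metrics
+
+# Ridimensionare le colonne, poi rieseguire clustering e punteggio silhouette
+X_scaled = StandardScaler().fit_transform(X)
+kmeans = KMeans(n_clusters=3, random_state=42).fit(X_scaled)
+print(metrics.silhouette_score(X_scaled, kmeans.labels_))
+```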
+
+---
+
+## 🚀 Sfida
+
+Trascorrere un po' di tempo con questo notebook, modificando i parametri. È possibile migliorare l'accuratezza del modello pulendo maggiormente i dati (rimuovendo gli outlier, ad esempio)? È possibile utilizzare i pesi per dare più importanza ad alcuni campioni di dati. Cos'altro si può fare per creare cluster migliori?
+
+Suggerimento: provare a ridimensionare i dati. C'è un codice commentato nel notebook che aggiunge il ridimensionamento standard per rendere le colonne di dati più simili tra loro in termini di intervallo. Si scoprirà che mentre il punteggio della silhouette diminuisce, il "kink" nel grafico del gomito si attenua. Questo perché lasciare i dati non scalati consente ai dati con meno varianza di avere più peso. Leggere un po' di più su questo problema [qui](https://stats.stackexchange.com/questions/21222/are-mean-normalization-and-feature-scaling-needed-for-k-means-clustering/21226#21226).
+
+## [Quiz post-lezione](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/30/)
+
+## Revisione e Auto Apprendimento
+
+Dare un'occhiata a un simulatore di K-Means [tipo questo](https://user.ceng.metu.edu.tr/~akifakkus/courses/ceng574/k-means/). È possibile utilizzare questo strumento per visualizzare i punti dati di esempio e determinarne i centroidi. Questo aiuta a farsi un'idea di come i dati possono essere raggruppati?
+
+Inoltre, dare un'occhiata a [questa dispensa sui k-means](https://stanford.edu/~cpiech/cs221/handouts/kmeans.html) di Stanford.
+
+## Compito
+
+[Provare diversi metodi di clustering](assignment.it.md)
diff --git a/5-Clustering/2-K-Means/translations/README.zh-cn.md b/5-Clustering/2-K-Means/translations/README.zh-cn.md
new file mode 100644
index 00000000..32a33015
--- /dev/null
+++ b/5-Clustering/2-K-Means/translations/README.zh-cn.md
@@ -0,0 +1,253 @@
+# K-Means 聚类
+
+[](https://youtu.be/hDmNF9JG3lo "Andrew Ng explains Clustering")
+
+> 🎥 单击上图观看视频:Andrew Ng 解释聚类
+
+## [课前测验](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/29/)
+
+在本课中,您将学习如何使用 Scikit-learn 和您之前导入的尼日利亚音乐数据集创建聚类。我们将介绍 K-Means 聚类 的基础知识。请记住,正如您在上一课中学到的,使用聚类的方法有很多种,您使用的方法取决于您的数据。我们将尝试 K-Means,因为它是最常见的聚类技术。让我们开始吧!
+
+您将了解的术语:
+
+- 轮廓打分
+- 手肘方法
+- 惯性
+- 方差
+
+## 介绍
+
+[K-Means Clustering](https://wikipedia.org/wiki/K-means_clustering) 是一种源自信号处理领域的方法。它基于一系列观察,把数据划分并分组为 "k" 个聚类。每次观察都会把给定的数据点归入离它最近的"均值"(即聚类的中心点)所在的组。
+
+聚类可以可视化为[Voronoi 图](https://wikipedia.org/wiki/Voronoi_diagram),其中包括一个点(或“种子”)及其相应的区域。
+
+
+
+> [Jen Looper](https://twitter.com/jenlooper)作图
+
+K-Means 聚类过程[分三步执行](https://scikit-learn.org/stable/modules/clustering.html#k-means):
+
+1. 该算法通过从数据集中采样来选择 k 个中心点。在此之后,它循环:
+ 1. 它将每个样本分配到最近的质心。
+ 2. 它通过取分配给先前质心的所有样本的平均值来创建新质心。
+ 3. 然后,它计算新旧质心之间的差异并重复直到质心稳定。
+
+使用 K-Means 的一个缺点是您需要确定"k",即质心的数量。幸运的是,"肘部法则"有助于估计"k"的良好起始值。马上就来试一下。
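+
+为了把上面描述的循环写得更具体,这里给出一个仅作示意的纯 NumPy 版本(使用虚构数据):
+
+```python
+import numpy as np
+
+rng = np.random.default_rng(0)
+
+# 虚构数据:6 个二维点;k=2,初始质心从数据中采样
+points = np.array([[1.0, 2.0], [1.5, 1.8], [5.0, 8.0], [8.0, 8.0], [1.0, 0.6], [9.0, 11.0]])
+centroids = points[rng.choice(len(points), size=2, replace=False)]
+
+for _ in range(10):
+    # 1. 将每个样本分配到最近的质心
+    distances = np.linalg.norm(points[:, None, :] - centroids[None, :, :], axis=2)
+    labels = distances.argmin(axis=1)
+    # 2. 用分配到每个质心的样本均值生成新质心
+    new_centroids = np.array([points[labels == k].mean(axis=0) for k in range(2)])
+    # 3. 质心稳定后停止
+    if np.allclose(new_centroids, centroids):
+        break
+    centroids = new_centroids
+
+print(centroids)
+```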
+
+## 前置条件
+
+您将使用本课的*notebook.ipynb*文件,其中包含您在上一课中所做的数据导入和初步清理。
+
+## 练习 - 准备
+
+首先再看看歌曲数据。
+
+1. 创建一个箱线图,`boxplot()`为每一列调用:
+
+ ```python
+ plt.figure(figsize=(20,20), dpi=200)
+
+ plt.subplot(4,3,1)
+ sns.boxplot(x = 'popularity', data = df)
+
+ plt.subplot(4,3,2)
+ sns.boxplot(x = 'acousticness', data = df)
+
+ plt.subplot(4,3,3)
+ sns.boxplot(x = 'energy', data = df)
+
+ plt.subplot(4,3,4)
+ sns.boxplot(x = 'instrumentalness', data = df)
+
+ plt.subplot(4,3,5)
+ sns.boxplot(x = 'liveness', data = df)
+
+ plt.subplot(4,3,6)
+ sns.boxplot(x = 'loudness', data = df)
+
+ plt.subplot(4,3,7)
+ sns.boxplot(x = 'speechiness', data = df)
+
+ plt.subplot(4,3,8)
+ sns.boxplot(x = 'tempo', data = df)
+
+ plt.subplot(4,3,9)
+ sns.boxplot(x = 'time_signature', data = df)
+
+ plt.subplot(4,3,10)
+ sns.boxplot(x = 'danceability', data = df)
+
+ plt.subplot(4,3,11)
+ sns.boxplot(x = 'length', data = df)
+
+ plt.subplot(4,3,12)
+ sns.boxplot(x = 'release_date', data = df)
+ ```
+
+ 这个数据有点嘈杂:把每一列绘制成箱线图来观察,你可以看到异常值。
+
+ 
+
+您可以浏览数据集并删除这些异常值,但这会使数据非常少。
+
+1. 现在,选择您将用于聚类练习的列。选择具有相似范围的那些并将`artist_top_genre`列编码为数字类型的数据:
+
+ ```python
+ from sklearn.preprocessing import LabelEncoder
+ le = LabelEncoder()
+
+ X = df.loc[:, ('artist_top_genre','popularity','danceability','acousticness','loudness','energy')]
+
+ y = df['artist_top_genre']
+
+ X['artist_top_genre'] = le.fit_transform(X['artist_top_genre'])
+
+ y = le.transform(y)
+ ```
+
+1. 现在您需要选择目标聚类的数量。您知道我们从数据集中挖掘出了 3 种歌曲流派,所以让我们尝试 3 个聚类:
+
+ ```python
+ from sklearn.cluster import KMeans
+
+ nclusters = 3
+ seed = 0
+
+ km = KMeans(n_clusters=nclusters, random_state=seed)
+ km.fit(X)
+
+ # Predict the cluster for each data point
+
+ y_cluster_kmeans = km.predict(X)
+ y_cluster_kmeans
+ ```
+
+您会看到打印出的数组,其中包含数据帧每一行的预测聚类(0、1 或 2)。
+
+1. 使用此数组计算“轮廓分数”:
+
+ ```python
+ from sklearn import metrics
+ score = metrics.silhouette_score(X, y_cluster_kmeans)
+ score
+ ```
+
+## 轮廓分数
+
+寻找接近 1 的轮廓分数。该分数从 -1 到 1 不等,如果分数为 1,则该聚类密集且与其他聚类分离良好。接近 0 的值表示重叠聚类,样本非常接近相邻聚类的决策边界。[来源](https://dzone.com/articles/kmeans-silhouette-score-explained-with-python-exam)。
+
+我们的分数是**0.53**,所以正好在中间。这表明我们的数据不是特别适合这种类型的聚类,但让我们继续。
+
+### 练习 - 建立模型
+
+1. 导入`KMeans`并启动聚类过程。
+
+ ```python
+ from sklearn.cluster import KMeans
+ wcss = []
+
+ for i in range(1, 11):
+ kmeans = KMeans(n_clusters = i, init = 'k-means++', random_state = 42)
+ kmeans.fit(X)
+ wcss.append(kmeans.inertia_)
+
+ ```
+
+ 这里有几个部分需要解释。
+
+ > 🎓 range:这些是聚类过程的迭代
+
+ > 🎓random_state:“确定质心初始化的随机数生成。” [来源](https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html#sklearn.cluster.KMeans)
+
+ > 🎓WCSS:“聚类内平方和”测量聚类内所有点到聚类质心的平方平均距离。[来源](https://medium.com/@ODSC/unsupervised-learning-evaluating-clusters-bd47eed175ce)。
+
+ > 🎓Inertia:K-Means 算法尝试选择质心以最小化“惯性”,“惯性是衡量内部相干程度的一种方法”。[来源](https://scikit-learn.org/stable/modules/clustering.html)。该值在每次迭代时附加到 wcss 变量。
+
+ > 🎓k-means++:在 [Scikit-learn](https://scikit-learn.org/stable/modules/clustering.html#k-means) 中,您可以使用"k-means++"优化,它"将质心初始化为(通常)彼此远离,从而可能获得比随机初始化更好的结果"。
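+
+ 想直观感受两种初始化的差别,可以用一个小示意(假设沿用上文准备好的矩阵 `X`)比较两者最终的惯性:
+
+ ```python
+ from sklearn.cluster import KMeans
+
+ # 分别用 'k-means++' 与随机初始化,比较得到的惯性
+ for init in ('k-means++', 'random'):
+     km = KMeans(n_clusters=3, init=init, n_init=10, random_state=42).fit(X)
+     print(init, km.inertia_)
+ ```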
+
+### 手肘方法
+
+之前,您推测:因为您瞄准了 3 个歌曲流派,所以应该选择 3 个聚类。但真的是这样吗?
+
+1. 使用手肘方法来确认。
+
+ ```python
+ plt.figure(figsize=(10,5))
+ sns.lineplot(x=list(range(1, 11)), y=wcss, marker='o', color='red')
+ plt.title('Elbow')
+ plt.xlabel('Number of clusters')
+ plt.ylabel('WCSS')
+ plt.show()
+ ```
+
+ 使用您在上一步中构建的 `wcss` 变量创建一个图表,显示肘部"弯曲"的位置,这表示最佳聚类数。也许**就是** 3!
+
+ 
+
+## 练习 - 显示聚类
+
+1. 再次尝试该过程,这次设置三个聚类,并将聚类显示为散点图:
+
+ ```python
+ from sklearn.cluster import KMeans
+ kmeans = KMeans(n_clusters = 3)
+ kmeans.fit(X)
+ labels = kmeans.predict(X)
+ plt.scatter(df['popularity'],df['danceability'],c = labels)
+ plt.xlabel('popularity')
+ plt.ylabel('danceability')
+ plt.show()
+ ```
+
+1. 检查模型的准确性:
+
+ ```python
+ labels = kmeans.labels_
+
+ correct_labels = sum(y == labels)
+
+ print("Result: %d out of %d samples were correctly labeled." % (correct_labels, y.size))
+
+ print('Accuracy score: {0:0.2f}'. format(correct_labels/float(y.size)))
+ ```
+
+ 这个模型的准确性不是很好,聚类的形状给了你一个提示。
+
+ 
+
+ 这些数据太不平衡,相关性太低,列值之间的差异太大,无法很好地聚类。事实上,形成的聚类可能受到我们上面定义的三个类型类别的严重影响或扭曲。那是一个学习的过程!
+
+ 在 Scikit-learn 的文档中,你可以看到像这样的模型,聚类划分不是很好,有一个“方差”问题:
+
+ 
+
+ > 图来自 Scikit-learn
+
+## 方差
+
+方差被定义为"来自均值的平方差的平均值" [来源](https://www.mathsisfun.com/data/standard-deviation.html)。在这个聚类问题的上下文中,它指的是数据集中的数值往往偏离均值过多。
+
+✅ 这是思考各种纠正此问题办法的好时机。稍微调整一下数据?使用不同的列?使用不同的算法?提示:尝试[缩放数据](https://www.mygreatlearning.com/blog/learning-data-science-with-k-means-clustering/)以对其进行标准化,并测试其他列。
+
+> 试试这个"[方差计算器](https://www.calculatorsoup.com/calculators/statistics/variance-calculator.php)"来更好地理解这个概念。
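+
+下面是一个最小的示意,展示如何按上面的提示缩放数据(假设沿用前文定义的 `X`;`StandardScaler` 会把每一列变为均值 0、方差 1):
+
+```python
+from sklearn.preprocessing import StandardScaler
+from sklearn.cluster import KMeans
+from sklearn import metrics
+
+# 先缩放各列,再重新聚类并计算轮廓分数
+X_scaled = StandardScaler().fit_transform(X)
+kmeans = KMeans(n_clusters=3, random_state=42).fit(X_scaled)
+print(metrics.silhouette_score(X_scaled, kmeans.labels_))
+```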
+
+---
+
+## 🚀挑战
+
+花一些时间在这个笔记本上,调整参数。您能否通过更多地清理数据(例如,去除异常值)来提高模型的准确性?您可以使用权重为给定的数据样本赋予更多权重。你还能做些什么来创建更好的聚类?
+
+提示:尝试缩放您的数据。笔记本中有注释掉的代码,它添加了标准缩放,使数据列在范围上更加相似。您会发现,虽然轮廓分数下降了,但肘部图中的"拐点"变得更平滑。这是因为不缩放数据会让方差较小的数据承载更多的权重。请在[这里](https://stats.stackexchange.com/questions/21222/are-mean-normalization-and-feature-scaling-needed-for-k-means-clustering/21226#21226)阅读更多关于这个问题的信息。
+
+## [课后测验](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/30/)
+
+## 复习与自学
+
+看看[像这样](https://user.ceng.metu.edu.tr/~akifakkus/courses/ceng574/k-means/)的 K-Means 模拟器。您可以使用此工具来可视化样本数据点并确定其质心。您可以编辑数据的随机性、聚类数和质心数。这是否有助于您了解如何对数据进行分组?
+
+另外,看看斯坦福大学的[k-means 讲义](https://stanford.edu/~cpiech/cs221/handouts/kmeans.html)。
+
+## 作业
+
+[尝试不同的聚类方法](./assignment.zh-cn.md)
+
diff --git a/5-Clustering/2-K-Means/translations/assignment.it.md b/5-Clustering/2-K-Means/translations/assignment.it.md
new file mode 100644
index 00000000..59fc79de
--- /dev/null
+++ b/5-Clustering/2-K-Means/translations/assignment.it.md
@@ -0,0 +1,10 @@
+# Provare diversi metodi di clustering
+
+## Istruzioni
+
+In questa lezione si è imparato a conoscere il clustering K-Means. A volte K-Means non è appropriato per i propri dati. Creare un notebook usando i dati da queste lezioni o da qualche altra parte (accreditare la fonte) e mostrare un metodo di clustering diverso NON usando K-Means. Che cosa si è imparato?
+## Rubrica
+
+| Criteri | Ottimo | Adeguato | Necessita miglioramento |
+| -------- | --------------------------------------------------------------- | -------------------------------------------------------------------- | ---------------------------- |
+| | Viene presentato un notebook con un modello di clustering ben documentato | Un notebook è presentato senza una buona documentazione e/o incompleto | E' stato inviato un lavoro incompleto |
diff --git a/5-Clustering/2-K-Means/translations/assignment.zh-cn.md b/5-Clustering/2-K-Means/translations/assignment.zh-cn.md
new file mode 100644
index 00000000..c21058d3
--- /dev/null
+++ b/5-Clustering/2-K-Means/translations/assignment.zh-cn.md
@@ -0,0 +1,12 @@
+# 尝试不同的聚类方法
+
+
+## 说明
+
+在本课中,您学习了 K-Means 聚类。有时 K-Means 不适合您的数据。使用来自这些课程或其他地方的数据(归功于您的来源)创建notebook,并展示不使用 K-Means 的不同聚类方法。你学到了什么?
+## 评判规则
+
+| 评判标准 | 优秀 | 中规中矩 | 仍需努力 |
+| -------- | --------------------------------------------------------------- | -------------------------------------------------------------------- | ---------------------------- |
+| | 一个具有良好文档记录的聚类模型的notebook | 一个没有详细文档或不完整的notebook| 提交了一个不完整的工作 |
+
diff --git a/5-Clustering/translations/README.it.md b/5-Clustering/translations/README.it.md
new file mode 100644
index 00000000..4a056e72
--- /dev/null
+++ b/5-Clustering/translations/README.it.md
@@ -0,0 +1,29 @@
+# Modelli di clustering per machine learning
+
+Il clustering è un'attività di machine learning che cerca di trovare oggetti che si assomigliano per raggrupparli in gruppi chiamati cluster. Ciò che differenzia il clustering da altri approcci in machine learning è che le cose accadono automaticamente, infatti, è giusto dire che è l'opposto dell'apprendimento supervisionato.
+
+## Tema regionale: modelli di clustering per il gusto musicale di un pubblico nigeriano 🎧
+
+Il pubblico eterogeneo della Nigeria ha gusti musicali diversi. Usando i dati recuperati da Spotify (ispirato da [questo articolo](https://towardsdatascience.com/country-wise-visual-analysis-of-music-taste-using-spotify-api-seaborn-in-python-77f5b749b421)), si dà un'occhiata a un po' di musica popolare in Nigeria. Questo insieme di dati include dati sul punteggio di "danzabilità", acustica, volume, "speechiness" (un numero compreso tra zero e uno che indica la probabilità che un particolare file audio sia parlato - n.d.t.), popolarità ed energia di varie canzoni. Sarà interessante scoprire modelli in questi dati!
+
+
+
+Foto di Marcela Laskoski su Unsplash
+
+In questa serie di lezioni si scopriranno nuovi modi per analizzare i dati utilizzando tecniche di clustering. Il clustering è particolarmente utile quando l'insieme di dati non ha etichette. Se ha etichette, le tecniche di classificazione come quelle apprese nelle lezioni precedenti potrebbero essere più utili. Ma nei casi in cui si sta cercando di raggruppare dati senza etichetta, il clustering è un ottimo modo per scoprire i modelli.
+
+> Esistono utili strumenti a basso codice che possono aiutare a imparare a lavorare con i modelli di clustering. Si provi [Azure ML per questa attività](https://docs.microsoft.com/learn/modules/create-clustering-model-azure-machine-learning-designer/?WT.mc_id=academic-15963-cxa)
+
+## Lezioni
+
+
+1. [Introduzione al clustering](../1-Visualize/translations/README.it.md)
+2. [K-Means clustering](../2-K-Means/translations/README.it.md)
+
+## Crediti
+
+Queste lezioni sono state scritte con 🎶 da [Jen Looper](https://www.twitter.com/jenlooper) con utili recensioni di [Rishit Dagli](https://twitter.com/rishit_dagli) e [Muhammad Sakib Khan Inan](https://twitter.com/Sakibinan).
+
+L'insieme di dati [Nigerian Songs](https://www.kaggle.com/sootersaalu/nigerian-songs-spotify) è stato prelevato da Kaggle, a sua volta recuperato da Spotify.
+
+Esempi utili di K-Means che hanno aiutato nella creazione di questa lezione includono questa [esplorazione dell'iride](https://www.kaggle.com/bburns/iris-exploration-pca-k-means-and-gmm-clustering), questo [notebook introduttivo](https://www.kaggle.com/prashant111/k-means-clustering-with-python) e questo [ipotetico esempio di ONG](https://www.kaggle.com/ankandash/pca-k-means-clustering-hierarchical-clustering).
\ No newline at end of file
diff --git a/5-Clustering/translations/README.zh-cn.md b/5-Clustering/translations/README.zh-cn.md
new file mode 100644
index 00000000..7f05082b
--- /dev/null
+++ b/5-Clustering/translations/README.zh-cn.md
@@ -0,0 +1,27 @@
+# 机器学习中的聚类模型
+
+聚类(clustering)是一项机器学习任务,用于寻找类似对象并将他们分成不同的组(这些组称做“聚类”(cluster))。聚类与其它机器学习方法的不同之处在于聚类是自动进行的。事实上,我们可以说它是监督学习的对立面。
+
+## 本节主题: 尼日利亚观众音乐品味的聚类模型🎧
+
+尼日利亚多样化的观众有着多样化的音乐品味。使用从Spotify上抓取的数据(受到[本文](https://towardsdatascience.com/country-wise-visual-analysis-of-music-taste-using-spotify-api-seaborn-in-python-77f5b749b421)的启发),让我们看看尼日利亚流行的一些音乐。这个数据集包括关于各种歌曲的舞蹈性、声学、响度、言语、流行度和活力的分数。从这些数据中发现一些模式(pattern)会是很有趣的事情!
+
+
+
+Marcela Laskoski在Unsplash上的照片
+
+在本系列课程中,您将发现使用聚类技术分析数据的新方法。当数据集缺少标签的时候,聚类特别有用。如果它有标签,那么分类技术(比如您在前面的课程中所学的那些)可能会更有用。但是如果要对未标记的数据进行分组,聚类是发现模式的好方法。
+
+> 这里有一些有用的低代码工具可以帮助您了解如何使用聚类模型。尝试 [Azure ML for this task](https://docs.microsoft.com/learn/modules/create-clustering-model-azure-machine-learning-designer/?WT.mc_id=academic-15963-cxa)
+## 课程安排
+
+1. [介绍聚类](../1-Visualize/translations/README.zh-cn.md)
+2. [K-Means聚类](../2-K-Means/translations/README.zh-cn.md)
+## 致谢
+
+这些课程由 [Jen Looper](https://www.twitter.com/jenlooper) 伴着 🎶 撰写,并由 [Rishit Dagli](https://twitter.com/rishit_dagli) 和 [Muhammad Sakib Khan Inan](https://twitter.com/Sakibinan) 进行了有益的评审。
+
+[尼日利亚歌曲数据集](https://www.kaggle.com/sootersaalu/nigerian-songs-spotify) 来自Kaggle抓取的Spotify数据。
+
+一些帮助创造了这节课程的K-Means例子包括:[虹膜探索(iris exploration)](https://www.kaggle.com/bburns/iris-exploration-pca-k-means-and-gmm-clustering),[介绍性的笔记(introductory notebook)](https://www.kaggle.com/prashant111/k-means-clustering-with-python),和 [假设非政府组织的例子(hypothetical NGO example)](https://www.kaggle.com/ankandash/pca-k-means-clustering-hierarchical-clustering)。
+
diff --git a/6-NLP/1-Introduction-to-NLP/README.md b/6-NLP/1-Introduction-to-NLP/README.md
index 0d47a1d7..51235856 100644
--- a/6-NLP/1-Introduction-to-NLP/README.md
+++ b/6-NLP/1-Introduction-to-NLP/README.md
@@ -17,7 +17,7 @@ You will learn about:
## Computational linguistics
-Computational linguistics is an area of research and development over many decades that studies how computers can work with, and even understand, translate, and communicate with languages. natural language processing (NLP) is a related field focused on how computers can process 'natural', or human, languages.
+Computational linguistics is an area of research and development over many decades that studies how computers can work with, and even understand, translate, and communicate with languages. Natural language processing (NLP) is a related field focused on how computers can process 'natural', or human, languages.
### Example - phone dictation
@@ -69,7 +69,7 @@ The idea for this came from a party game called *The Imitation Game* where an in
### Developing Eliza
-In the 1960's an MIT scientist called *Joseph Weizenbaum* developed [*Eliza*](https:/wikipedia.org/wiki/ELIZA), a computer 'therapist' that would ask the human questions and give the appearance of understanding their answers. However, while Eliza could parse a sentence and identify certain grammatical constructs and keywords so as to give a reasonable answer, it could not be said to *understand* the sentence. If Eliza was presented with a sentence following the format "**I am** sad" it might rearrange and substitute words in the sentence to form the response "How long have **you been** sad".
+In the 1960's an MIT scientist called *Joseph Weizenbaum* developed [*Eliza*](https://wikipedia.org/wiki/ELIZA), a computer 'therapist' that would ask the human questions and give the appearance of understanding their answers. However, while Eliza could parse a sentence and identify certain grammatical constructs and keywords so as to give a reasonable answer, it could not be said to *understand* the sentence. If Eliza was presented with a sentence following the format "**I am** sad" it might rearrange and substitute words in the sentence to form the response "How long have **you been** sad".
This gave the impression that Eliza understood the statement and was asking a follow-on question, whereas in reality, it was changing the tense and adding some words. If Eliza could not identify a keyword that it had a response for, it would instead give a random response that could be applicable to many different statements. Eliza could be easily tricked, for instance if a user wrote "**You are** a bicycle" it might respond with "How long have **I been** a bicycle?", instead of a more reasoned response.
@@ -81,7 +81,7 @@ This gave the impression that Eliza understood the statement and was asking a fo
## Exercise - coding a basic conversational bot
-A conversational bot, like Eliza, is a program that elicits user input and seems to understand and respond intelligently. Unlike Eliza, our bot will not have several rules giving it the appearance of having an intelligent conversation. Instead, out bot will have one ability only, to keep the conversation going with random responses that might work in almost any trivial conversation.
+A conversational bot, like Eliza, is a program that elicits user input and seems to understand and respond intelligently. Unlike Eliza, our bot will not have several rules giving it the appearance of having an intelligent conversation. Instead, our bot will have one ability only, to keep the conversation going with random responses that might work in almost any trivial conversation.
### The plan
diff --git a/6-NLP/1-Introduction-to-NLP/translations/README.zh-cn.md b/6-NLP/1-Introduction-to-NLP/translations/README.zh-cn.md
new file mode 100644
index 00000000..e9df88a3
--- /dev/null
+++ b/6-NLP/1-Introduction-to-NLP/translations/README.zh-cn.md
@@ -0,0 +1,162 @@
+# 自然语言处理介绍
+这节课讲解了 *自然语言处理* 的简要历史和重要概念,*自然语言处理*是计算语言学的一个子领域。
+
+## [课前测验](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/31/)
+
+## 介绍
+众所周知,自然语言处理 (Natural Language Processing, NLP) 是机器学习在生产软件中应用最广泛的领域之一。
+
+✅ 你能想到哪些你日常生活中使用的软件可能嵌入了自然语言处理技术呢?或者,你经常使用的文字处理程序或移动应用程序中是否嵌入了自然语言处理技术呢?
+
+你将会学习到:
+
+- **什么是「语言」**。语言的发展历程,以及相关研究的主要领域。
+- **定义和概念**。你还将了解关于计算机文本处理的概念。包括解析 (parsing)、语法 (grammar) 以及识别名词与动词。这节课中有一些编程任务;还有一些重要概念将在以后的课程中被引入,届时你也会练习通过编程实现其它概念。
+
+## 计算语言学
+
+计算语言学 (Computational Linguistics) 是一个经过几十年研究和发展的领域,它研究如何让计算机能使用、理解、翻译语言并使用语言交流。自然语言处理 (NLP) 是计算语言学中一个专注于计算机如何处理「自然的」(或者说,人类的)语言的相关领域。
+
+### 举例:电话听写
+
+如果你曾经在手机上使用语音输入替代键盘输入,或者使用过虚拟语音助手,那么你的语音将被转录(或者叫*解析*)为文本形式后进行处理。被检测到的关键字最后将被处理成手机或语音助手可以理解并可以依此做出行为的格式。
+
+
+> 真正意义上的语言理解很难!图源:[Jen Looper](https://twitter.com/jenlooper)
+
+### 这项技术是如何实现的?
+
+我们之所以能完成这样的任务,是因为有人编写了一个计算机程序来实现它。几十年前,一些科幻作家预测,人们将来很可能可以与他们的电脑对话,而电脑总是能准确地理解人类的意思。可惜的是,事实证明这个问题比我们想象的更难解决。虽然今天这个问题已经取得初步进展,但在理解句子含义时,要实现"完美"的自然语言处理仍然存在重大挑战。理解幽默或检测讽刺等情绪,对计算机来说尤其困难。
+
+现在,你可能会想起课堂上老师讲解的语法。在某些国家/地区,语法和语言学知识是学生的专题课内容。但在另一些国家/地区,不管是从小学习的第一语言(学习阅读和写作),还是之后学习的第二语言中,语法及语言学知识都是作为语言的一部分教学的。所以,如果你不能很好地区分名词与动词或者区分副词与形容词,请不要担心!
+
+你还为难以区分*一般现在时*与*现在进行时*而烦恼吗?没关系的,即使是对以这门语言为母语的人在内的大多数人来说,区分它们都很有挑战性。但是,计算机非常善于应用标准的规则,你将学会编写可以像人一样"解析"句子的代码。稍后你将面对的更大挑战是理解句子的*语义*和*情绪*。
+
+## 前提
+
+本节教程的主要先决条件是能够阅读和理解本节教程的语言。本节中没有数学问题或方程需要解决。虽然原作者用英文写了这教程,但它也被翻译成其他语言,所以你可能在阅读翻译内容。这节课的示例中涉及到很多语言种类(以比较不同语言的不同语法规则)。这些是*未*翻译的,但对它们的解释是翻译过的,所以你应该能理解它在讲什么。
+
+编程任务中,你将会使用 Python 语言,示例使用的是 Python 3.8 版本。
+
+在本节中你将需要并使用如下技能:
+
+- **Python 3**。你需要能够理解并使用 Python 3. 本课将会使用输入、循环、文件读取、数组功能。
+- **Visual Studio Code + 扩展**. 我们将使用 Visual Studio Code 及其 Python 扩展。你也可以使用你喜欢的 Python IDE。
+- **TextBlob**. [TextBlob](https://github.com/sloria/TextBlob)是一个精简的 Python 文本处理库。请按照 TextBlob 网站上的说明,在您的系统上安装它(也需要安装语料库,安装代码如下所示):
+ ```bash
+ pip install -U textblob
+ python -m textblob.download_corpora
+ ```
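+
+ 安装完成后,可以用下面的小例子(仅作示意)快速验证 TextBlob 是否正常工作:
+
+ ```python
+ from textblob import TextBlob
+
+ blob = TextBlob("TextBlob makes text processing simple.")
+ print(blob.tags)       # 词性标注结果
+ print(blob.sentiment)  # 情感:polarity(极性)与 subjectivity(主观性)
+ ```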
+
+> 💡 提示:你可以在 VS Code 环境中直接运行 Python。 点击[docs](https://code.visualstudio.com/docs/languages/python?WT.mc_id=academic-15963-cxa)查看更多信息。
+
+## 与机器对话
+
+试图让计算机理解人类语言的尝试最早可以追溯到几十年前。*Alan Turing* 是最早研究自然语言处理问题的科学家之一。
+
+### 图灵测试
+
+当图灵在 1950 年代研究*人工智能*时,他想出了这个思维实验:让人类和计算机通过打字的方式来交谈,其中人类并不知道对方是人类还是计算机。
+
+如果经过一定时间的交谈,人类无法确定对方是否是计算机,那么是否可以认为计算机正在“思考”?
+
+### 灵感 - “模仿游戏”
+
+这个想法来自一个名为 *模仿游戏* 的派对游戏,其中一名审讯者独自一人在一个房间里,负责确定在另一个房间里的两人的性别(男性或女性)。审讯者可以传递笔记,并且需要想出能够揭示神秘人性别的问题。当然,另一个房间的玩家也可以通过回答问题的方式来欺骗审讯者,例如用看似真诚的方式误导或迷惑审讯者。
+
+### Eliza 的研发
+
+在 1960 年代的麻省理工学院,一位名叫 *Joseph Weizenbaum* 的科学家开发了[*Eliza*](https://wikipedia.org/wiki/ELIZA)。Eliza 是一位计算机"治疗师",它可以向人类提出问题并让人类觉得它能理解人类的回答。然而,虽然 Eliza 可以解析句子并识别某些语法结构和关键字以给出合理的答案,但不能说它*理解*了句子。如果 Eliza 看到的句子格式为"**I am** sad"(**我很** 难过),它可能会重新排列并替换句子中的单词,回答 "How long have **you been** sad"(**你已经** 难过 多久了)。
+
+看起来像是 Eliza 理解了这句话,还在询问关于这句话的问题,而实际上,它只是在改变时态和添加词语。如果 Eliza 没有在回答中发现它知道如何响应的词汇,它会给出一个随机响应,该响应可以适用于许多不同的语句。 Eliza 很容易被欺骗,例如,如果用户写了 "**You are** a bicycle"(**你是** 个 自行车),它可能会回复 "How long have **I been** a bicycle?"(**我已经是** 一个 自行车 多久了?),而不是更合理的回答。
+
+[](https://youtu.be/RMK9AphfLco "跟 Eliza 聊天")
+
+> 🎥 点击上方的图片查看关于 Eliza 原型的视频
+
+> 旁注:如果你拥有 ACM 账户,你可以阅读 1966 年发表的 [Eliza](https://cacm.acm.org/magazines/1966/1/13317-elizaa-computer-program-for-the-study-of-natural-language-communication-between-man-and-machine/abstract) 的原始介绍。或者,在[维基百科](https://wikipedia.org/wiki/ELIZA)上阅读有关 Eliza 的信息。
+
+## 练习 - 编程实现一个基础的对话机器人
+
+像 Eliza 一样的对话机器人是一个看起来可以智能地理解和响应用户输入的程序。与 Eliza 不同的是,我们的机器人不会用规则让它看起来像是在进行智能对话。我们的对话机器人将只有一种能力:它只会通过基本上可以糊弄所有普通对话的句子来随机回答,使得谈话能够继续进行。
+
+### 计划
+
+搭建聊天机器人的步骤
+
+1. 打印用户与机器人交互的使用说明
+2. 开启循环
+ 1. 获取用户输入
+ 2. 如果用户要求退出,就退出
+ 3. 处理用户输入并选择一个回答(在这个例子中,从回答列表中随机选择一个回答)
+ 4. 打印回答
+3. 重复步骤2
+
+### 构建聊天机器人
+
+接下来让我们建一个聊天机器人。我们将从定义一些短语开始。
+
+1. 使用以下随机的回复(`random_responses`)在 Python 中自己创建此机器人:
+
+ ```python
+ random_responses = ["That is quite interesting, please tell me more.",
+ "I see. Do go on.",
+ "Why do you say that?",
+ "Funny weather we've been having, isn't it?",
+ "Let's change the subject.",
+ "Did you catch the game last night?"]
+ ```
+
+ 程序运行看起来应该是这样:(用户输入位于以 `>` 开头的行上)
+
+ ```output
+ Hello, I am Marvin, the simple robot.
+ You can end this conversation at any time by typing 'bye'
+ After typing each answer, press 'enter'
+ How are you today?
+ > I am good thanks
+ That is quite interesting, please tell me more.
+ > today I went for a walk
+ Did you catch the game last night?
+ > I did, but my team lost
+ Funny weather we've been having, isn't it?
+ > yes but I hope next week is better
+ Let's change the subject.
+ > ok, lets talk about music
+ Why do you say that?
+ > because I like music!
+ Why do you say that?
+ > bye
+ It was nice talking to you, goodbye!
+ ```
+
+ 示例程序在[这里](../solution/bot.py)。这只是一种可能的解决方案;下方思考题之后还给出了主循环的一个最小示意。
+
+ ✅ 停下来,思考一下
+
+ 1. 你认为这些随机响应能够“欺骗”人类,使人类认为机器人实际上理解了他们的意思吗?
+ 2. 机器人需要哪些功能才能更有效的回应?
+ 3. 如果机器人真的可以“理解”一个句子的意思,它是否也需要“记住”前面句子的意思?
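+
+作为参考,上面"计划"一节中的主循环可以写成这样一个最小示意(只是一种可能的写法,假设 `random_responses` 已如前定义):
+
+```python
+import random
+
+print("Hello, I am Marvin, the simple robot.")
+print("You can end this conversation at any time by typing 'bye'")
+print("After typing each answer, press 'enter'")
+print("How are you today?")
+
+while True:
+    user_input = input("> ")
+    if user_input.lower() == "bye":
+        break
+    # 不解析输入,只随机选择一个回答来维持对话
+    print(random.choice(random_responses))
+
+print("It was nice talking to you, goodbye!")
+```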
+
+---
+## 🚀挑战
+
+在上面的「停下来,思考一下」板块中选择一个问题,尝试编程实现它们,或使用伪代码在纸上编写解决方案。
+
+在下一课中,您将了解解析自然语言和机器学习的许多其他方法。
+
+## [课后测验](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/32/)
+
+## 复习与自学
+
+看看下面的参考资料作为进一步的参考阅读。
+
+### 参考
+
+1. Schubert, Lenhart, "Computational Linguistics", *The Stanford Encyclopedia of Philosophy* (Spring 2020 Edition), Edward N. Zalta (ed.), URL = <https://plato.stanford.edu/archives/spr2020/entries/computational-linguistics/>.
+2. Princeton University "About WordNet." [WordNet](https://wordnet.princeton.edu/). Princeton University. 2010.
+
+## 任务
+
+[查找一个机器人](../assignment.md)
diff --git a/6-NLP/3-Translation-Sentiment/README.md b/6-NLP/3-Translation-Sentiment/README.md
index bcd6cdd1..0c6b568b 100644
--- a/6-NLP/3-Translation-Sentiment/README.md
+++ b/6-NLP/3-Translation-Sentiment/README.md
@@ -143,7 +143,7 @@ Your task is to determine, using sentiment polarity, if *Pride and Prejudice* ha
1. If the polarity is 1 or -1 store the sentence in an array or list of positive or negative messages
5. At the end, print out all the positive sentences and negative sentences (separately) and the number of each.
-Here is a sample [solution](solutions/notebook.ipynb).
+Here is a sample [solution](solution/notebook.ipynb).
✅ Knowledge Check
diff --git a/6-NLP/5-Hotel-Reviews-2/README.md b/6-NLP/5-Hotel-Reviews-2/README.md
index 7d8a4d03..12b9a15a 100644
--- a/6-NLP/5-Hotel-Reviews-2/README.md
+++ b/6-NLP/5-Hotel-Reviews-2/README.md
@@ -347,13 +347,13 @@ print("Saving results to Hotel_Reviews_NLP.csv")
df.to_csv(r"../data/Hotel_Reviews_NLP.csv", index = False)
```
-You should run the entire code for [the analysis notebook](solution/notebook-sentiment-analysis.ipynb) (after you've run [your filtering notebook](solution/notebook-filtering.ipynb) to generate the Hotel_Reviews_Filtered.csv file).
+You should run the entire code for [the analysis notebook](solution/3-notebook.ipynb) (after you've run [your filtering notebook](solution/1-notebook.ipynb) to generate the Hotel_Reviews_Filtered.csv file).
To review, the steps are:
-1. Original dataset file **Hotel_Reviews.csv** is explored in the previous lesson with [the explorer notebook](../4-Hotel-Reviews-1/solution/notebook-explorer.ipynb)
-2. Hotel_Reviews.csv is filtered by [the filtering notebook](solution/notebook-filtering.ipynb) resulting in **Hotel_Reviews_Filtered.csv**
-3. Hotel_Reviews_Filtered.csv is processed by [the sentiment analysis notebook](solution/notebook-sentiment-analysis.ipynb) resulting in **Hotel_Reviews_NLP.csv**
+1. Original dataset file **Hotel_Reviews.csv** is explored in the previous lesson with [the explorer notebook](../4-Hotel-Reviews-1/solution/notebook.ipynb)
+2. Hotel_Reviews.csv is filtered by [the filtering notebook](solution/1-notebook.ipynb) resulting in **Hotel_Reviews_Filtered.csv**
+3. Hotel_Reviews_Filtered.csv is processed by [the sentiment analysis notebook](solution/3-notebook.ipynb) resulting in **Hotel_Reviews_NLP.csv**
4. Use Hotel_Reviews_NLP.csv in the NLP Challenge below
### Conclusion
diff --git a/6-NLP/translations/README.zh-cn.md b/6-NLP/translations/README.zh-cn.md
new file mode 100644
index 00000000..db08bd08
--- /dev/null
+++ b/6-NLP/translations/README.zh-cn.md
@@ -0,0 +1,24 @@
+# Getting started with natural language processing
+
+Natural language processing (NLP) is a subfield of artificial intelligence concerned with teaching machines to understand and process human language, and with applying it to tasks such as spell checking or machine translation.
+
+## Regional topic: European literature and romantic hotels of Europe ❤️
+
+In this section of the curriculum, you will be introduced to one of the most widespread uses of machine learning: natural language processing (NLP). Derived from computational linguistics, this category of artificial intelligence communicates with humans via voice or text, building a bridge between people and machines.
+
+In these lessons, we'll learn the basics of NLP by building small conversational bots, to see how machine learning helps make these bots more and more 'intelligent'. You'll travel back in time to 1813, chatting with Elizabeth Bennett and Mr. Darcy from Jane Austen's classic novel **Pride and Prejudice**, published that year. Then, you'll further your knowledge by learning about sentiment analysis via hotel reviews in Europe.
+
+
+> Photo by Elaine Howlin on Unsplash
+
+## Lessons
+
+1. [Introduction to natural language processing](../1-Introduction-to-NLP/README.md)
+2. [Common NLP tasks and techniques](../2-Tasks/README.md)
+3. [Translation and sentiment analysis with machine learning](../3-Translation-Sentiment/README.md)
+4. [Preparing your data](../4-Hotel-Reviews-1/README.md)
+5. [NLTK for sentiment analysis](../5-Hotel-Reviews-2/README.md)
+
+## Credits
+
+These natural language processing lessons were written with ☕ by [Stephen Howell](https://twitter.com/Howell_MSFT)
\ No newline at end of file
diff --git a/7-TimeSeries/2-ARIMA/README.md b/7-TimeSeries/2-ARIMA/README.md
index d54a781b..97d0c49a 100644
--- a/7-TimeSeries/2-ARIMA/README.md
+++ b/7-TimeSeries/2-ARIMA/README.md
@@ -295,7 +295,7 @@ Walk-forward validation is the gold standard of time series model evaluation and
eval_df.head()
```
- ```output
+ Output
    |     | timestamp           | h   | prediction | actual   |
    | --- | ------------------- | --- | ---------- | -------- |
    | 0   | 2014-12-30 00:00:00 | t+1 | 3,008.74   | 3,023.00 |
@@ -303,7 +303,7 @@ Walk-forward validation is the gold standard of time series model evaluation and
    | 2   | 2014-12-30 02:00:00 | t+1 | 2,900.17   | 2,899.00 |
    | 3   | 2014-12-30 03:00:00 | t+1 | 2,917.69   | 2,886.00 |
    | 4   | 2014-12-30 04:00:00 | t+1 | 2,946.99   | 2,963.00 |
- ```
+
Observe the hourly data's prediction, compared to the actual load. How accurate is this?
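
One way to quantify that accuracy (a sketch; the lesson's notebook may compute it differently) is the mean absolute percentage error (MAPE) between the `prediction` and `actual` columns:

```python
import numpy as np

def mape(actual, predicted):
    """Mean absolute percentage error, expressed as a percentage."""
    actual = np.asarray(actual, dtype=float)
    predicted = np.asarray(predicted, dtype=float)
    return np.mean(np.abs((actual - predicted) / actual)) * 100

# e.g. mape(eval_df['actual'], eval_df['prediction'])
```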
diff --git a/8-Reinforcement/1-QLearning/README.md b/8-Reinforcement/1-QLearning/README.md
index bfa07ffe..6301c46e 100644
--- a/8-Reinforcement/1-QLearning/README.md
+++ b/8-Reinforcement/1-QLearning/README.md
@@ -229,8 +229,7 @@ We are now ready to implement the learning algorithm. Before we do that, we also
We add a few `eps` to the original vector in order to avoid division by 0 in the initial case, when all components of the vector are identical.
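
As a point of reference, such a normalization helper might look like the sketch below (the lesson's own `probs` definition may differ slightly):

```python
import numpy as np

def probs(v, eps=1e-4):
    # Shift values so the minimum becomes eps, then normalize to sum to 1.
    # eps keeps the division well-defined when all components are equal
    # (e.g. an all-zero row of the Q-Table at the start of training).
    v = v - v.min() + eps
    return v / v.sum()
```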
Run the learning algorithm through 5000 experiments, also called **epochs**: (code block 8)
-
- ```python
+```python
for epoch in range(5000):
# Pick initial point
@@ -255,11 +254,11 @@ Run them learning algorithm through 5000 experiments, also called **epochs**: (c
ai = action_idx[a]
Q[x,y,ai] = (1 - alpha) * Q[x,y,ai] + alpha * (r + gamma * Q[x+dpos[0], y+dpos[1]].max())
n+=1
- ```
+```
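
In formula form, the update inside the loop is the standard Q-learning rule, with learning rate `alpha` and discount factor `gamma`:

$$Q(s,a) \leftarrow (1-\alpha)\,Q(s,a) + \alpha\left(r + \gamma \max_{a'} Q(s',a')\right)$$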
- After executing this algorithm, the Q-Table should be updated with values that define the attractiveness of different actions at each step. We can try to visualize the Q-Table by plotting a vector at each cell that will point in the desired direction of movement. For simplicity, we draw a small circle instead of an arrow head.
+After executing this algorithm, the Q-Table should be updated with values that define the attractiveness of different actions at each step. We can try to visualize the Q-Table by plotting a vector at each cell that will point in the desired direction of movement. For simplicity, we draw a small circle instead of an arrow head.
-
+
## Checking the policy
diff --git a/8-Reinforcement/translations/README.zh-cn.md b/8-Reinforcement/translations/README.zh-cn.md
new file mode 100644
index 00000000..25a53cfc
--- /dev/null
+++ b/8-Reinforcement/translations/README.zh-cn.md
@@ -0,0 +1,53 @@
+# Introduction to reinforcement learning
+
+Reinforcement learning (RL) is seen as one of the basic machine learning paradigms, alongside supervised learning and unsupervised learning. RL is all about "policies": it should deliver the right policy, or learn from the mistakes of a wrong one.
+
+Imagine you have a simulated environment, such as the stock market. What happens if you impose a given regulation on that market? Does the regulation (the policy) have a positive or a negative effect? If the effect is negative, we need to learn from this _negative reinforcement_ and change course. If the effect is positive, we need to build further on that _positive reinforcement_.
+
+
+
+> Peter and his friends need to escape the hungry wolf! Image by [Jen Looper](https://twitter.com/jenlooper)
+
+## Regional topic: Peter and the Wolf (Russia)
+
+[Peter and the Wolf](https://en.wikipedia.org/wiki/Peter_and_the_Wolf) is a musical fairy tale by the Russian composer [Sergei Prokofiev](https://en.wikipedia.org/wiki/Sergei_Prokofiev). It tells the story of Peter, who bravely goes out of his house to chase the wolf in the middle of the forest. In this section, we will train machine learning algorithms that help Peter:
+
+- **Explore** the surrounding area and build an optimal navigation map
+- **Learn** how to use a skateboard and balance on it, in order to move around faster
+
+[](https://www.youtube.com/watch?v=Fmi5zHg4QSM)
+
+> 🎥 Click the image above to listen to Peter and the Wolf by Prokofiev
+
+## Reinforcement learning
+
+In the previous sections, you have seen two examples of machine learning problems:
+
+- **Supervised**, where we have a labeled dataset that suggests solutions to the problem we want to solve. [Classification](../../4-Classification/README.md) and [regression](../../2-Regression/README.md) are supervised learning tasks.
+- **Unsupervised**, where we do not have a labeled training dataset. The main example of unsupervised learning is [clustering](../../5-Clustering/README.md).
+
+In this section, we will introduce a new class of machine learning problem that does not require labeled training data - for example, these two problem types:
+
+- **[Semi-supervised learning](https://wikipedia.org/wiki/Semi-supervised_learning)**, where we have a lot of unlabeled data that can be used to pre-train the model.
+- **[Reinforcement learning](https://wikipedia.org/wiki/Reinforcement_learning)**, in which the machine learns an optimal policy by performing experiments in some simulated environment.
+
+### Example - computer game
+
+Suppose we want to teach a computer to play a game, such as chess or [Super Mario](https://wikipedia.org/wiki/Super_Mario). For the computer to learn to play, we need it to predict which "move" to make in each game "state". While this may look like a classification problem, it is not, because we do not have a dataset of states paired with the corresponding moves. We only have limited data, such as records from chess matches or from players playing Super Mario, and that data is unlikely to cover enough of the possible states.
+
+Instead of relying on large amounts of existing game data, **reinforcement learning** is based on the idea of *making the computer play* many times and observing the results. Thus, to apply reinforcement learning, we need two things:
+
+- **An environment** and a **simulator** that allow us to play the game many times. The simulator should define all the game rules as well as the possible states and actions.
+
+- **A reward function**, which tells us how well we did on each move (or in each game).
+
+The main difference between reinforcement learning and other types of machine learning is that in RL we typically do not know whether we have won or lost until the game is over. Thus, we cannot say whether a single move on its own is "good" - we only receive a reward at the end of the game. Our goal is to design algorithms that let us train a model under these uncertain conditions. We will learn about one RL algorithm called **Q-learning**.
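+
+Because the reward only arrives once the game is over, training is usually organized around whole episodes. Below is a minimal sketch of that idea; `env` here is a hypothetical simulator exposing `reset`/`step` in the spirit of the Gym environments used in lesson 2, not a specific library API:
+
+```python
+def run_episode(env, policy):
+    """Play one full game and return the total reward collected."""
+    state = env.reset()                          # start a new game
+    total_reward, done = 0.0, False
+    while not done:
+        action = policy(state)                   # choose a move for the current state
+        state, reward, done = env.step(action)   # the simulator applies the game rules
+        total_reward += reward                   # often nonzero only at the end of the game
+    return total_reward
+```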
+
+## Lessons
+
+1. [Introduction to reinforcement learning and Q-Learning](../1-QLearning/README.md)
+2. [Gym simulation environments](../2-Gym/README.md)
+
+## Credits
+
+"Introduction to Reinforcement Learning" was written with ♥️ by [Dmitry Soshnikov](http://soshnikov.com)
\ No newline at end of file
diff --git a/9-Real-World/translations/README.zh-cn.md b/9-Real-World/translations/README.zh-cn.md
new file mode 100644
index 00000000..89cae2cb
--- /dev/null
+++ b/9-Real-World/translations/README.zh-cn.md
@@ -0,0 +1,14 @@
+# Postscript: Classic machine learning in the real world
+
+In this section of the curriculum, you will be introduced to some real-world applications of classical machine learning. We scoured the internet for whitepapers and articles about applications that have used these techniques, avoiding neural networks, deep learning and AI as far as possible. Let's explore how ML is used in business systems, ecological applications, finance, arts and culture, and more.
+
+
+
+> Photo by Alexis Fauvet on Unsplash
+
+## Lessons
+
+1. [Real-world applications of ML](../1-Applications/README.md)
+
+## Credits
+
+"Real-World Applications" was written by a team including [Jen Looper](https://twitter.com/jenlooper) and [Ornella Altunyan](https://twitter.com/ornelladotcom).
diff --git a/README.md b/README.md
index 34c09b57..64ab0a82 100644
--- a/README.md
+++ b/README.md
@@ -20,19 +20,22 @@ Travel with us around the world as we apply these classic techniques to data fro
**🎨 Thanks as well to our illustrators** Tomomi Imura, Dasani Madipalli, and Jen Looper
- **🙏 Special thanks 🙏 to our Microsoft Student Ambassador authors, reviewers and content contributors**, notably Rishit Dagli, Muhammad Sakib Khan Inan, Rohan Raj, Alexandru Petrescu, Abhishek Jaiswal, Nawrin Tabassum, Ioan Samuila, and Snigdha Agarwal
+**🙏 Special thanks 🙏 to our Microsoft Student Ambassador authors, reviewers and content contributors**, notably Rishit Dagli, Muhammad Sakib Khan Inan, Rohan Raj, Alexandru Petrescu, Abhishek Jaiswal, Nawrin Tabassum, Ioan Samuila, and Snigdha Agarwal
+
+**🤩 Extra gratitude to Microsoft Student Ambassador Eric Wanjau for our R lessons!**
---
+
# Getting Started
**Students**, to use this curriculum, fork the entire repo to your own GitHub account and complete the exercises on your own or with a group:
-- Start with a pre-lecture quiz
-- Read the lecture and complete the activities, pausing and reflecting at each knowledge check.
-- Try to create the projects by comprehending the lessons rather than running the solution code; however that code is available in the `/solution` folders in each project-oriented lesson.
-- Take the post-lecture quiz
-- Complete the challenge
-- Complete the assignment
+- Start with a pre-lecture quiz.
+- Read the lecture and complete the activities, pausing and reflecting at each knowledge check.
+- Try to create the projects by comprehending the lessons rather than running the solution code; however, that code is available in the `/solution` folders in each project-oriented lesson.
+- Take the post-lecture quiz.
+- Complete the challenge.
+- Complete the assignment.
- After completing a lesson group, visit the [Discussion board](https://github.com/microsoft/ML-For-Beginners/discussions) and "learn out loud" by filling out the appropriate PAT rubric. A 'PAT' is a Progress Assessment Tool that is a rubric you fill out to further your learning. You can also react to other PATs so we can learn together.
> For further study, we recommend following these [Microsoft Learn](https://docs.microsoft.com/en-us/users/jenlooper-2911/collections/k7o7tg1gp306q4?WT.mc_id=academic-15963-cxa) modules and learning paths.
@@ -48,6 +51,7 @@ Travel with us around the world as we apply these classic techniques to data fro
> 🎥 Click the image above for a video about the project and the folks who created it!
---
+
## Pedagogy
We have chosen two pedagogical tenets while building this curriculum: ensuring that it is hands-on **project-based** and that it includes **frequent quizzes**. In addition, this curriculum has a common **theme** to give it cohesion.
@@ -55,6 +59,7 @@ We have chosen two pedagogical tenets while building this curriculum: ensuring t
By ensuring that the content aligns with projects, the process is made more engaging for students and retention of concepts will be augmented. In addition, a low-stakes quiz before a class sets the intention of the student towards learning a topic, while a second quiz after class ensures further retention. This curriculum was designed to be flexible and fun and can be taken in whole or in part. The projects start small and become increasingly complex by the end of the 12 week cycle. This curriculum also includes a postscript on real-world applications of ML, which can be used as extra credit or as a basis for discussion.
> Find our [Code of Conduct](CODE_OF_CONDUCT.md), [Contributing](CONTRIBUTING.md), and [Translation](TRANSLATIONS.md) guidelines. We welcome your constructive feedback!
+
## Each lesson includes:
- optional sketchnote
@@ -70,45 +75,45 @@ By ensuring that the content aligns with projects, the process is made more enga
> **A note about quizzes**: All quizzes are contained [in this app](https://jolly-sea-0a877260f.azurestaticapps.net), for 50 total quizzes of three questions each. They are linked from within the lessons, but the quiz app can be run locally; follow the instructions in the `quiz-app` folder.
-
-| Lesson Number | Topic | Lesson Grouping | Learning Objectives | Linked Lesson | Author |
-| :-----------: | :--------------------------------------------------------: | :-------------------------------------------------: | ------------------------------------------------------------------------------------------------------------------------------- | :---------------------------------------------------: | :------------: |
-| 01 | Introduction to machine learning | [Introduction](1-Introduction/README.md) | Learn the basic concepts behind machine learning | [lesson](1-Introduction/1-intro-to-ML/README.md) | Muhammad |
-| 02 | The History of machine learning | [Introduction](1-Introduction/README.md) | Learn the history underlying this field | [lesson](1-Introduction/2-history-of-ML/README.md) | Jen and Amy |
-| 03 | Fairness and machine learning | [Introduction](1-Introduction/README.md) | What are the important philosophical issues around fairness that students should consider when building and applying ML models? | [lesson](1-Introduction/3-fairness/README.md) | Tomomi |
-| 04 | Techniques for machine learning | [Introduction](1-Introduction/README.md) | What techniques do ML researchers use to build ML models? | [lesson](1-Introduction/4-techniques-of-ML/README.md) | Chris and Jen |
-| 05 | Introduction to regression | [Regression](2-Regression/README.md) | Get started with Python and Scikit-learn for regression models | [lesson](2-Regression/1-Tools/README.md) | Jen |
-| 06 | North American pumpkin prices 🎃 | [Regression](2-Regression/README.md) | Visualize and clean data in preparation for ML | [lesson](2-Regression/2-Data/README.md) | Jen |
-| 07 | North American pumpkin prices 🎃 | [Regression](2-Regression/README.md) | Build linear and polynomial regression models | [lesson](2-Regression/3-Linear/README.md) | Jen |
-| 08 | North American pumpkin prices 🎃 | [Regression](2-Regression/README.md) | Build a logistic regression model | [lesson](2-Regression/4-Logistic/README.md) | Jen |
-| 09 | A Web App 🔌 | [Web App](3-Web-App/README.md) | Build a web app to use your trained model | [lesson](3-Web-App/1-Web-App/README.md) | Jen |
-| 10 | Introduction to classification | [Classification](4-Classification/README.md) | Clean, prep, and visualize your data; introduction to classification | [lesson](4-Classification/1-Introduction/README.md) | Jen and Cassie |
-| 11 | Delicious Asian and Indian cuisines 🍜 | [Classification](4-Classification/README.md) | Introduction to classifiers | [lesson](4-Classification/2-Classifiers-1/README.md) | Jen and Cassie |
-| 12 | Delicious Asian and Indian cuisines 🍜 | [Classification](4-Classification/README.md) | More classifiers | [lesson](4-Classification/3-Classifiers-2/README.md) | Jen and Cassie |
-| 13 | Delicious Asian and Indian cuisines 🍜 | [Classification](4-Classification/README.md) | Build a recommender web app using your model | [lesson](4-Classification/4-Applied/README.md) | Jen |
-| 14 | Introduction to clustering | [Clustering](5-Clustering/README.md) | Clean, prep, and visualize your data; Introduction to clustering | [lesson](5-Clustering/1-Visualize/README.md) | Jen |
-| 15 | Exploring Nigerian Musical Tastes 🎧 | [Clustering](5-Clustering/README.md) | Explore the K-Means clustering method | [lesson](5-Clustering/2-K-Means/README.md) | Jen |
-| 16 | Introduction to natural language processing ☕️ | [Natural language processing](6-NLP/README.md) | Learn the basics about NLP by building a simple bot | [lesson](6-NLP/1-Introduction-to-NLP/README.md) | Stephen |
-| 17 | Common NLP Tasks ☕️ | [Natural language processing](6-NLP/README.md) | Deepen your NLP knowledge by understanding common tasks required when dealing with language structures | [lesson](6-NLP/2-Tasks/README.md) | Stephen |
-| 18 | Translation and sentiment analysis ♥️ | [Natural language processing](6-NLP/README.md) | Translation and sentiment analysis with Jane Austen | [lesson](6-NLP/3-Translation-Sentiment/README.md) | Stephen |
-| 19 | Romantic hotels of Europe ♥️ | [Natural language processing](6-NLP/README.md) | Sentiment analysis with hotel reviews, 1 | [lesson](6-NLP/4-Hotel-Reviews-1/README.md) | Stephen |
-| 20 | Romantic hotels of Europe ♥️ | [Natural language processing](6-NLP/README.md) | Sentiment analysis with hotel reviews 2 | [lesson](6-NLP/5-Hotel-Reviews-2/README.md) | Stephen |
-| 21 | Introduction to time series forecasting | [Time series](7-TimeSeries/README.md) | Introduction to time series forecasting | [lesson](7-TimeSeries/1-Introduction/README.md) | Francesca |
+| Lesson Number | Topic | Lesson Grouping | Learning Objectives | Linked Lesson | Author |
+| :-----------: | :------------------------------------------------------------: | :-------------------------------------------------: | ------------------------------------------------------------------------------------------------------------------------------- | :---------------------------------------------------: | :------------: |
+| 01 | Introduction to machine learning | [Introduction](1-Introduction/README.md) | Learn the basic concepts behind machine learning | [lesson](1-Introduction/1-intro-to-ML/README.md) | Muhammad |
+| 02 | The History of machine learning | [Introduction](1-Introduction/README.md) | Learn the history underlying this field | [lesson](1-Introduction/2-history-of-ML/README.md) | Jen and Amy |
+| 03 | Fairness and machine learning | [Introduction](1-Introduction/README.md) | What are the important philosophical issues around fairness that students should consider when building and applying ML models? | [lesson](1-Introduction/3-fairness/README.md) | Tomomi |
+| 04 | Techniques for machine learning | [Introduction](1-Introduction/README.md) | What techniques do ML researchers use to build ML models? | [lesson](1-Introduction/4-techniques-of-ML/README.md) | Chris and Jen |
+| 05 | Introduction to regression | [Regression](2-Regression/README.md) | Get started with Python and Scikit-learn for regression models | [lesson](2-Regression/1-Tools/README.md) | Jen |
+| 06 | North American pumpkin prices 🎃 | [Regression](2-Regression/README.md) | Visualize and clean data in preparation for ML | [lesson](2-Regression/2-Data/README.md) | Jen |
+| 07 | North American pumpkin prices 🎃 | [Regression](2-Regression/README.md) | Build linear and polynomial regression models | [lesson](2-Regression/3-Linear/README.md) | Jen |
+| 08 | North American pumpkin prices 🎃 | [Regression](2-Regression/README.md) | Build a logistic regression model | [lesson](2-Regression/4-Logistic/README.md) | Jen |
+| 09 | A Web App 🔌 | [Web App](3-Web-App/README.md) | Build a web app to use your trained model | [lesson](3-Web-App/1-Web-App/README.md) | Jen |
+| 10 | Introduction to classification | [Classification](4-Classification/README.md) | Clean, prep, and visualize your data; introduction to classification | [lesson](4-Classification/1-Introduction/README.md) | Jen and Cassie |
+| 11 | Delicious Asian and Indian cuisines 🍜 | [Classification](4-Classification/README.md) | Introduction to classifiers | [lesson](4-Classification/2-Classifiers-1/README.md) | Jen and Cassie |
+| 12 | Delicious Asian and Indian cuisines 🍜 | [Classification](4-Classification/README.md) | More classifiers | [lesson](4-Classification/3-Classifiers-2/README.md) | Jen and Cassie |
+| 13 | Delicious Asian and Indian cuisines 🍜 | [Classification](4-Classification/README.md) | Build a recommender web app using your model | [lesson](4-Classification/4-Applied/README.md) | Jen |
+| 14 | Introduction to clustering | [Clustering](5-Clustering/README.md) | Clean, prep, and visualize your data; Introduction to clustering | [lesson](5-Clustering/1-Visualize/README.md) | Jen |
+| 15 | Exploring Nigerian Musical Tastes 🎧 | [Clustering](5-Clustering/README.md) | Explore the K-Means clustering method | [lesson](5-Clustering/2-K-Means/README.md) | Jen |
+| 16 | Introduction to natural language processing ☕️ | [Natural language processing](6-NLP/README.md) | Learn the basics about NLP by building a simple bot | [lesson](6-NLP/1-Introduction-to-NLP/README.md) | Stephen |
+| 17 | Common NLP Tasks ☕️ | [Natural language processing](6-NLP/README.md) | Deepen your NLP knowledge by understanding common tasks required when dealing with language structures | [lesson](6-NLP/2-Tasks/README.md) | Stephen |
+| 18 | Translation and sentiment analysis ♥️ | [Natural language processing](6-NLP/README.md) | Translation and sentiment analysis with Jane Austen | [lesson](6-NLP/3-Translation-Sentiment/README.md) | Stephen |
+| 19 | Romantic hotels of Europe ♥️ | [Natural language processing](6-NLP/README.md) | Sentiment analysis with hotel reviews 1 | [lesson](6-NLP/4-Hotel-Reviews-1/README.md) | Stephen |
+| 20 | Romantic hotels of Europe ♥️ | [Natural language processing](6-NLP/README.md) | Sentiment analysis with hotel reviews 2 | [lesson](6-NLP/5-Hotel-Reviews-2/README.md) | Stephen |
+| 21 | Introduction to time series forecasting | [Time series](7-TimeSeries/README.md) | Introduction to time series forecasting | [lesson](7-TimeSeries/1-Introduction/README.md) | Francesca |
| 22 | ⚡️ World Power Usage ⚡️ - time series forecasting with ARIMA | [Time series](7-TimeSeries/README.md) | Time series forecasting with ARIMA | [lesson](7-TimeSeries/2-ARIMA/README.md) | Francesca |
-| 23 | Introduction to reinforcement learning | [Reinforcement learning](8-Reinforcement/README.md) | Introduction to reinforcement learning with Q-Learning | [lesson](8-Reinforcement/1-QLearning/README.md) | Dmitry |
-| 24 | Help Peter avoid the wolf! 🐺 | [Reinforcement learning](8-Reinforcement/README.md) | Reinforcement learning Gym | [lesson](8-Reinforcement/2-Gym/README.md) | Dmitry |
-| Postscript | Real-World ML scenarios and applications | [ML in the Wild](9-Real-World/README.md) | Interesting and revealing real-world applications of classical ML | [lesson](9-Real-World/1-Applications/README.md) | Team |
+| 23 | Introduction to reinforcement learning | [Reinforcement learning](8-Reinforcement/README.md) | Introduction to reinforcement learning with Q-Learning | [lesson](8-Reinforcement/1-QLearning/README.md) | Dmitry |
+| 24 | Help Peter avoid the wolf! 🐺 | [Reinforcement learning](8-Reinforcement/README.md) | Reinforcement learning Gym | [lesson](8-Reinforcement/2-Gym/README.md) | Dmitry |
+| Postscript | Real-World ML scenarios and applications | [ML in the Wild](9-Real-World/README.md) | Interesting and revealing real-world applications of classical ML | [lesson](9-Real-World/1-Applications/README.md) | Team |
+
## Offline access
You can run this documentation offline by using [Docsify](https://docsify.js.org/#/). Fork this repo, [install Docsify](https://docsify.js.org/#/quickstart) on your local machine, and then in the root folder of this repo, type `docsify serve`. The website will be served on port 3000 on your localhost: `localhost:3000`.
## PDFs
-Find a pdf of the curriculum with links [here](pdf/readme.pdf)
+Find a pdf of the curriculum with links [here](pdf/readme.pdf).
## Help Wanted!
-Would you like to contribute a translation? Please read our [translation guidelines](TRANSLATIONS.md) and add input [here](https://github.com/microsoft/ML-For-Beginners/issues/71)
+Would you like to contribute a translation? Please read our [translation guidelines](TRANSLATIONS.md) and add input [here](https://github.com/microsoft/ML-For-Beginners/issues/71).
## Other Curricula
@@ -116,4 +121,3 @@ Our team produces other curricula! Check out:
- [Web Dev for Beginners](https://aka.ms/webdev-beginners)
- [IoT for Beginners](https://aka.ms/iot-beginners)
-
diff --git a/package.json b/package.json
index b64c6bf1..3b5c347c 100644
--- a/package.json
+++ b/package.json
@@ -4,8 +4,8 @@
"description": "Machine Learning for Beginners - A Curriculum",
"main": "index.js",
"scripts": {
- "convert": "node_modules/.bin/docsify-to-pdf"
- },
+ "convert": "node_modules/.bin/docsify-to-pdf"
+ },
"repository": {
"type": "git",
"url": "git+https://github.com/microsoft/ML-For-Beginners.git"
diff --git a/quiz-app/README.md b/quiz-app/README.md
index 042d53ca..83b30d1d 100644
--- a/quiz-app/README.md
+++ b/quiz-app/README.md
@@ -1,6 +1,6 @@
# Quizzes
-These quizzes are the pre- and post-lecture quizzes for the web development for ml curriculum at https://aka.ms/ml-beginners
+These quizzes are the pre- and post-lecture quizzes for the ML curriculum at https://aka.ms/ml-beginners
## Project setup
diff --git a/quiz-app/package-lock.json b/quiz-app/package-lock.json
index e9aebee3..8f51a0ba 100644
--- a/quiz-app/package-lock.json
+++ b/quiz-app/package-lock.json
@@ -1087,16 +1087,6 @@
"postcss": "^7.0.0"
}
},
- "@kazupon/vue-i18n-loader": {
- "version": "0.5.0",
- "resolved": "https://registry.npmjs.org/@kazupon/vue-i18n-loader/-/vue-i18n-loader-0.5.0.tgz",
- "integrity": "sha512-Tp2mXKemf9/RBhI9CW14JjR9oKjL2KH7tV6S0eKEjIBuQBAOFNuPJu3ouacmz9hgoXbNp+nusw3MVQmxZWFR9g==",
- "dev": true,
- "requires": {
- "js-yaml": "^3.13.1",
- "json5": "^2.1.1"
- }
- },
"@mrmlnc/readdir-enhanced": {
"version": "2.2.1",
"resolved": "https://registry.npmjs.org/@mrmlnc/readdir-enhanced/-/readdir-enhanced-2.2.1.tgz",
@@ -1720,6 +1710,16 @@
"integrity": "sha512-nQyp0o1/mNdbTO1PO6kHkwSrmgZ0MT/jCCpNiwbUjGoRN4dlBhqJtoQuCnEOKzgTVwg0ZWiCoQy6SxMebQVh8A==",
"dev": true
},
+ "ansi-styles": {
+ "version": "4.3.0",
+ "resolved": "https://registry.npmjs.org/ansi-styles/-/ansi-styles-4.3.0.tgz",
+ "integrity": "sha512-zbB9rCJAT1rbjiVDb2hqKFHNYLxgtk8NURxZ3IZwD3F6NtxbXZQCnnSi1Lkx+IDohdPlFp222wVALIheZJQSEg==",
+ "dev": true,
+ "optional": true,
+ "requires": {
+ "color-convert": "^2.0.1"
+ }
+ },
"cacache": {
"version": "13.0.1",
"resolved": "https://registry.npmjs.org/cacache/-/cacache-13.0.1.tgz",
@@ -1746,6 +1746,53 @@
"unique-filename": "^1.1.1"
}
},
+ "chalk": {
+ "version": "4.1.1",
+ "resolved": "https://registry.npmjs.org/chalk/-/chalk-4.1.1.tgz",
+ "integrity": "sha512-diHzdDKxcU+bAsUboHLPEDQiw0qEe0qd7SYUn3HgcFlWgbDcfLGswOHYeGrHKzG9z6UYf01d9VFMfZxPM1xZSg==",
+ "dev": true,
+ "optional": true,
+ "requires": {
+ "ansi-styles": "^4.1.0",
+ "supports-color": "^7.1.0"
+ }
+ },
+ "color-convert": {
+ "version": "2.0.1",
+ "resolved": "https://registry.npmjs.org/color-convert/-/color-convert-2.0.1.tgz",
+ "integrity": "sha512-RRECPsj7iu/xb5oKYcsFHSppFNnsj/52OVTRKb4zP5onXwVF3zVmmToNcOfGC+CRDpfK/U584fMg38ZHCaElKQ==",
+ "dev": true,
+ "optional": true,
+ "requires": {
+ "color-name": "~1.1.4"
+ }
+ },
+ "color-name": {
+ "version": "1.1.4",
+ "resolved": "https://registry.npmjs.org/color-name/-/color-name-1.1.4.tgz",
+ "integrity": "sha512-dOy+3AuW3a2wNbZHIuMZpTcgjGuLU/uBL/ubcZF9OXbDo8ff4O8yVp5Bf0efS8uEoYo5q4Fx7dY9OgQGXgAsQA==",
+ "dev": true,
+ "optional": true
+ },
+ "has-flag": {
+ "version": "4.0.0",
+ "resolved": "https://registry.npmjs.org/has-flag/-/has-flag-4.0.0.tgz",
+ "integrity": "sha512-EykJT/Q1KjTWctppgIAgfSO0tKVuZUjhgMr17kqTumMl6Afv3EISleU7qZUzoXDFTAHTDC4NOoG/ZxU3EvlMPQ==",
+ "dev": true,
+ "optional": true
+ },
+ "loader-utils": {
+ "version": "2.0.0",
+ "resolved": "https://registry.npmjs.org/loader-utils/-/loader-utils-2.0.0.tgz",
+ "integrity": "sha512-rP4F0h2RaWSvPEkD7BLDFQnvSf+nK+wr3ESUjNTyAGobqrijmW92zc+SO6d4p4B1wh7+B/Jg1mkQe5NYUEHtHQ==",
+ "dev": true,
+ "optional": true,
+ "requires": {
+ "big.js": "^5.2.2",
+ "emojis-list": "^3.0.0",
+ "json5": "^2.1.2"
+ }
+ },
"source-map": {
"version": "0.6.1",
"resolved": "https://registry.npmjs.org/source-map/-/source-map-0.6.1.tgz",
@@ -1762,6 +1809,16 @@
"minipass": "^3.1.1"
}
},
+ "supports-color": {
+ "version": "7.2.0",
+ "resolved": "https://registry.npmjs.org/supports-color/-/supports-color-7.2.0.tgz",
+ "integrity": "sha512-qpCAvRl9stuOHveKsn7HncJRvv501qIacKzQlO/+Lwxc9+0q2wLyv4Dfvt80/DPn2pqOBsJdDiogXGR9+OvwRw==",
+ "dev": true,
+ "optional": true,
+ "requires": {
+ "has-flag": "^4.0.0"
+ }
+ },
"terser-webpack-plugin": {
"version": "2.3.8",
"resolved": "https://registry.npmjs.org/terser-webpack-plugin/-/terser-webpack-plugin-2.3.8.tgz",
@@ -1778,6 +1835,18 @@
"terser": "^4.6.12",
"webpack-sources": "^1.4.3"
}
+ },
+ "vue-loader-v16": {
+ "version": "npm:vue-loader@16.3.0",
+ "resolved": "https://registry.npmjs.org/vue-loader/-/vue-loader-16.3.0.tgz",
+ "integrity": "sha512-UDgni/tUVSdwHuQo+vuBmEgamWx88SuSlEb5fgdvHrlJSPB9qMBRF6W7bfPWSqDns425Gt1wxAUif+f+h/rWjg==",
+ "dev": true,
+ "optional": true,
+ "requires": {
+ "chalk": "^4.1.0",
+ "hash-sum": "^2.0.0",
+ "loader-utils": "^2.0.0"
+ }
}
}
},
@@ -10953,87 +11022,6 @@
}
}
},
- "vue-loader-v16": {
- "version": "npm:vue-loader@16.1.2",
- "resolved": "https://registry.npmjs.org/vue-loader/-/vue-loader-16.1.2.tgz",
- "integrity": "sha512-8QTxh+Fd+HB6fiL52iEVLKqE9N1JSlMXLR92Ijm6g8PZrwIxckgpqjPDWRP5TWxdiPaHR+alUWsnu1ShQOwt+Q==",
- "dev": true,
- "optional": true,
- "requires": {
- "chalk": "^4.1.0",
- "hash-sum": "^2.0.0",
- "loader-utils": "^2.0.0"
- },
- "dependencies": {
- "ansi-styles": {
- "version": "4.3.0",
- "resolved": "https://registry.npmjs.org/ansi-styles/-/ansi-styles-4.3.0.tgz",
- "integrity": "sha512-zbB9rCJAT1rbjiVDb2hqKFHNYLxgtk8NURxZ3IZwD3F6NtxbXZQCnnSi1Lkx+IDohdPlFp222wVALIheZJQSEg==",
- "dev": true,
- "optional": true,
- "requires": {
- "color-convert": "^2.0.1"
- }
- },
- "chalk": {
- "version": "4.1.0",
- "resolved": "https://registry.npmjs.org/chalk/-/chalk-4.1.0.tgz",
- "integrity": "sha512-qwx12AxXe2Q5xQ43Ac//I6v5aXTipYrSESdOgzrN+9XjgEpyjpKuvSGaN4qE93f7TQTlerQQ8S+EQ0EyDoVL1A==",
- "dev": true,
- "optional": true,
- "requires": {
- "ansi-styles": "^4.1.0",
- "supports-color": "^7.1.0"
- }
- },
- "color-convert": {
- "version": "2.0.1",
- "resolved": "https://registry.npmjs.org/color-convert/-/color-convert-2.0.1.tgz",
- "integrity": "sha512-RRECPsj7iu/xb5oKYcsFHSppFNnsj/52OVTRKb4zP5onXwVF3zVmmToNcOfGC+CRDpfK/U584fMg38ZHCaElKQ==",
- "dev": true,
- "optional": true,
- "requires": {
- "color-name": "~1.1.4"
- }
- },
- "color-name": {
- "version": "1.1.4",
- "resolved": "https://registry.npmjs.org/color-name/-/color-name-1.1.4.tgz",
- "integrity": "sha512-dOy+3AuW3a2wNbZHIuMZpTcgjGuLU/uBL/ubcZF9OXbDo8ff4O8yVp5Bf0efS8uEoYo5q4Fx7dY9OgQGXgAsQA==",
- "dev": true,
- "optional": true
- },
- "has-flag": {
- "version": "4.0.0",
- "resolved": "https://registry.npmjs.org/has-flag/-/has-flag-4.0.0.tgz",
- "integrity": "sha512-EykJT/Q1KjTWctppgIAgfSO0tKVuZUjhgMr17kqTumMl6Afv3EISleU7qZUzoXDFTAHTDC4NOoG/ZxU3EvlMPQ==",
- "dev": true,
- "optional": true
- },
- "loader-utils": {
- "version": "2.0.0",
- "resolved": "https://registry.npmjs.org/loader-utils/-/loader-utils-2.0.0.tgz",
- "integrity": "sha512-rP4F0h2RaWSvPEkD7BLDFQnvSf+nK+wr3ESUjNTyAGobqrijmW92zc+SO6d4p4B1wh7+B/Jg1mkQe5NYUEHtHQ==",
- "dev": true,
- "optional": true,
- "requires": {
- "big.js": "^5.2.2",
- "emojis-list": "^3.0.0",
- "json5": "^2.1.2"
- }
- },
- "supports-color": {
- "version": "7.2.0",
- "resolved": "https://registry.npmjs.org/supports-color/-/supports-color-7.2.0.tgz",
- "integrity": "sha512-qpCAvRl9stuOHveKsn7HncJRvv501qIacKzQlO/+Lwxc9+0q2wLyv4Dfvt80/DPn2pqOBsJdDiogXGR9+OvwRw==",
- "dev": true,
- "optional": true,
- "requires": {
- "has-flag": "^4.0.0"
- }
- }
- }
- },
"vue-router": {
"version": "3.4.9",
"resolved": "https://registry.npmjs.org/vue-router/-/vue-router-3.4.9.tgz",
diff --git a/quiz-app/src/App.vue b/quiz-app/src/App.vue
index 78482d49..ef95dbed 100644
--- a/quiz-app/src/App.vue
+++ b/quiz-app/src/App.vue
@@ -6,6 +6,8 @@
diff --git a/quiz-app/src/assets/translations/en.json b/quiz-app/src/assets/translations/en.json
index ae358aef..337b0867 100644
--- a/quiz-app/src/assets/translations/en.json
+++ b/quiz-app/src/assets/translations/en.json
@@ -1,2815 +1,2815 @@
[
- {
- "title": "Machine Learning for Beginners: Quizzes",
- "complete": "Congratulations, you completed the quiz!",
- "error": "Sorry, try again",
- "quizzes": [
- {
- "id": 1,
- "title": "Introduction to Machine Learning: Pre-Lecture Quiz",
- "quiz": [
- {
- "questionText": "Applications of machine learning are all around us",
- "answerOptions": [
- {
- "answerText": "True",
- "isCorrect": "true"
- },
- {
- "answerText": "False",
- "isCorrect": "false"
- }
- ]
- },
- {
- "questionText": "What is the technical difference between classical ML and deep learning?",
- "answerOptions": [
- {
- "answerText": "classical ML was invented first",
- "isCorrect": "false"
- },
- {
- "answerText": "the use of neural networks",
- "isCorrect": "true"
- },
- {
- "answerText": "deep learning is used in robots",
- "isCorrect": "false"
- }
- ]
- },
- {
- "questionText": "Why might a business want to use ML strategies?",
- "answerOptions": [
- {
- "answerText": "to automate the solving of multi-dimensional problems",
- "isCorrect": "false"
- },
- {
- "answerText": "to customize a shopping experience based on the type of customer",
- "isCorrect": "false"
- },
- {
- "answerText": "both of the above",
- "isCorrect": "true"
- }
- ]
- }
- ]
- },
- {
- "id": 2,
- "title": "Introduction to Machine Learning: Post-Lecture Quiz",
- "quiz": [
- {
- "questionText": "Machine learning algorithms are meant to simulate",
- "answerOptions": [
- {
- "answerText": "intelligent machines",
- "isCorrect": "false"
- },
- {
- "answerText": "the human brain",
- "isCorrect": "true"
- },
- {
- "answerText": "orangutans",
- "isCorrect": "false"
- }
- ]
- },
- {
- "questionText": "What is an example of a classical ML technique?",
- "answerOptions": [
- {
- "answerText": "natural language processing",
- "isCorrect": "true"
- },
- {
- "answerText": "deep learning",
- "isCorrect": "false"
- },
- {
- "answerText": "Neural Networks",
- "isCorrect": "false"
- }
- ]
- },
- {
- "questionText": "Why should everyone learn the basics of ML?",
- "answerOptions": [
- {
- "answerText": "learning ML is fun and accessible to everyone",
- "isCorrect": "false"
- },
- {
- "answerText": "ML strategies are being used in many industries and domains",
- "isCorrect": "false"
- },
- {
- "answerText": "both of the above",
- "isCorrect": "true"
- }
- ]
- }
- ]
- },
- {
- "id": 3,
- "title": "History of Machine Learning: Pre-Lecture Quiz",
- "quiz": [
- {
- "questionText": "Approximately when was the term 'artificial intelligence' coined?",
- "answerOptions": [
- {
- "answerText": "1980s",
- "isCorrect": "false"
- },
- {
- "answerText": "1950s",
- "isCorrect": "true"
- },
- {
- "answerText": "1930s",
- "isCorrect": "false"
- }
- ]
- },
- {
- "questionText": "Who was one of the early pioneers of machine learning?",
- "answerOptions": [
- {
- "answerText": "Alan Turing",
- "isCorrect": "true"
- },
- {
- "answerText": "Bill Gates",
- "isCorrect": "false"
- },
- {
- "answerText": "Shakey the robot",
- "isCorrect": "false"
- }
- ]
- },
- {
- "questionText": "What is one of the reasons that advancement in AI slowed in the 1970s?",
- "answerOptions": [
- {
- "answerText": "Limited compute power",
- "isCorrect": "true"
- },
- {
- "answerText": "Not enough skilled engineers",
- "isCorrect": "false"
- },
- {
- "answerText": "Conflicts between countries",
- "isCorrect": "false"
- }
- ]
- }
- ]
- },
- {
- "id": 4,
- "title": "History of Machine Learning: Post-Lecture Quiz",
- "quiz": [
- {
- "questionText": "What's an example of a 'scruffy' AI system?",
- "answerOptions": [
- {
- "answerText": "ELIZA",
- "isCorrect": "true"
- },
- {
- "answerText": "HACKML",
- "isCorrect": "false"
- },
- {
- "answerText": "SSYSTEM",
- "isCorrect": "false"
- }
- ]
- },
- {
- "questionText": "What is an example of a technology that was developed during 'The Golden Years'?",
- "answerOptions": [
- {
- "answerText": "Blocks world",
- "isCorrect": "true"
- },
- {
- "answerText": "Jibo",
- "isCorrect": "false"
- },
- {
- "answerText": "Robot dogs",
- "isCorrect": "false"
- }
- ]
- },
- {
- "questionText": "Which event was foundational in the creation and expansion of the field of artificial intelligence?",
- "answerOptions": [
- {
- "answerText": "Turing Test",
- "isCorrect": "false"
- },
- {
- "answerText": "Dartmouth Summer Research Project",
- "isCorrect": "true"
- },
- {
- "answerText": "AI Winter",
- "isCorrect": "false"
- }
- ]
- }
- ]
- },
- {
- "id": 5,
- "title": "Fairness and Machine Learning: Pre-Lecture Quiz",
- "quiz": [
- {
- "questionText": "Unfairness in Machine Learning can happen",
- "answerOptions": [
- {
- "answerText": "intentionally",
- "isCorrect": "false"
- },
- {
- "answerText": "unintentionally",
- "isCorrect": "false"
- },
- {
- "answerText": "both of the above",
- "isCorrect": "true"
- }
- ]
- },
- {
- "questionText": "The term 'unfairness' in ML connotes:",
- "answerOptions": [
- {
- "answerText": "harms for a group of people",
- "isCorrect": "true"
- },
- {
- "answerText": "harm to one person",
- "isCorrect": "false"
- },
- {
- "answerText": "harms for the majority of people",
- "isCorrect": "false"
- }
- ]
- },
- {
- "questionText": "The five main types of harms include",
- "answerOptions": [
- {
- "answerText": "allocation, quality of service, stereotyping, denigration, and over- or under- representation",
- "isCorrect": "true"
- },
- {
- "answerText": "elocation, quality of service, stereotyping, denigration, and over- or under- representation ",
- "isCorrect": "false"
- },
- {
- "answerText": "allocation, quality of service, stereophonics, denigration, and over- or under- representation ",
- "isCorrect": "false"
- }
- ]
- }
- ]
- },
- {
- "id": 6,
- "title": "Fairness and Machine Learning: Post-Lecture Quiz",
- "quiz": [
- {
- "questionText": "Unfairness in a model can be caused by",
- "answerOptions": [
- {
- "answerText": "overrreliance on historical data",
- "isCorrect": "true"
- },
- {
- "answerText": "underreliance on historical data",
- "isCorrect": "false"
- },
- {
- "answerText": "too closely aligning to historical data",
- "isCorrect": "false"
- }
- ]
- },
- {
- "questionText": "To mitigate unfairness, you can",
- "answerOptions": [
- {
- "answerText": "identify harms and affected groups",
- "isCorrect": "false"
- },
- {
- "answerText": "define fairness metrics",
- "isCorrect": "false"
- },
- {
- "answerText": "both the above",
- "isCorrect": "true"
- }
- ]
- },
- {
- "questionText": "Fairlearn is a package that can",
- "answerOptions": [
- {
- "answerText": "compare multiple models by using fairness and performance metrics",
- "isCorrect": "true"
- },
- {
- "answerText": "choose the best model for your needs",
- "isCorrect": "false"
- },
- {
- "answerText": "help you decide what is fair and what is not",
- "isCorrect": "false"
- }
- ]
- }
- ]
- },
- {
- "id": 7,
- "title": "Tools and Techniques: Pre-Lecture Quiz",
- "quiz": [
- {
- "questionText": "When building a model, you should:",
- "answerOptions": [
- {
- "answerText": "prepare your data, then train your model",
- "isCorrect": "true"
- },
- {
- "answerText": "choose a training method, then prepare your data",
- "isCorrect": "false"
- },
- {
- "answerText": "tune parameters, then train your model",
- "isCorrect": "false"
- }
- ]
- },
- {
- "questionText": "Your data's ___ will impact the quality of your ML model",
- "answerOptions": [
- {
- "answerText": "quantity",
- "isCorrect": "false"
- },
- {
- "answerText": "shape",
- "isCorrect": "false"
- },
- {
- "answerText": "both of the above",
- "isCorrect": "true"
- }
- ]
- },
- {
- "questionText": "A feature variable is:",
- "answerOptions": [
- {
- "answerText": "a quality of your data",
- "isCorrect": "false"
- },
- {
- "answerText": "a measurable property of your data",
- "isCorrect": "true"
- },
- {
- "answerText": "a row of your data",
- "isCorrect": "false"
- }
- ]
- }
- ]
- },
- {
- "id": 8,
- "title": "Tools and Techniques: Post-Lecture Quiz",
- "quiz": [
- {
- "questionText": "You should visualize your data because",
- "answerOptions": [
- {
- "answerText": "you can discover outliers",
- "isCorrect": "false"
- },
- {
- "answerText": "you can discover potential cause for bias",
- "isCorrect": "true"
- },
- {
- "answerText": "both of these",
- "isCorrect": "true"
- }
- ]
- },
- {
- "questionText": "Split your data into:",
- "answerOptions": [
- {
- "answerText": "training and turing sets",
- "isCorrect": "false"
- },
- {
- "answerText": "training and test sets",
- "isCorrect": "true"
- },
- {
- "answerText": "validation and evaluation sets",
- "isCorrect": "false"
- }
- ]
- },
- {
- "questionText": "A common command to start the training process in various ML libraries is:",
- "answerOptions": [
- {
- "answerText": "model.travel",
- "isCorrect": "false"
- },
- {
- "answerText": "model.train",
- "isCorrect": "false"
- },
- {
- "answerText": "model.fit",
- "isCorrect": "true"
- }
- ]
- }
- ]
- },
- {
- "id": 9,
- "title": "Introduction to Regression: Pre-Lecture Quiz",
- "quiz": [
- {
- "questionText": "Which of these variables is a numeric variable?",
- "answerOptions": [
- {
- "answerText": "Height",
- "isCorrect": "true"
- },
- {
- "answerText": "Gender",
- "isCorrect": "false"
- },
- {
- "answerText": "Hair Color",
- "isCorrect": "false"
- }
- ]
- },
- {
- "questionText": "Which of these variables is a categorical variable?",
- "answerOptions": [
- {
- "answerText": "Heart Rate",
- "isCorrect": "false"
- },
- {
- "answerText": "Blood Type",
- "isCorrect": "true"
- },
- {
- "answerText": "Weight",
- "isCorrect": "false"
- }
- ]
- },
- {
- "questionText": "Which of these problems is a Regression analysis-based problem?",
- "answerOptions": [
- {
- "answerText": "Predicting the final exam marks of a student",
- "isCorrect": "true"
- },
- {
- "answerText": "Predicting the blood type of a person",
- "isCorrect": "false"
- },
- {
- "answerText": "Predicting whether an email is spam or not",
- "isCorrect": "false"
- }
- ]
- }
- ]
- },
- {
- "id": 10,
- "title": "Introduction to Regression: Post-Lecture Quiz",
- "quiz": [
- {
- "questionText": "If your Machine Learning model's training accuracy is 95 % and the testing accuracy is 30 %, then what type of condition it is called?",
- "answerOptions": [
- {
- "answerText": "Overfitting",
- "isCorrect": "true"
- },
- {
- "answerText": "Underfitting",
- "isCorrect": "false"
- },
- {
- "answerText": "Double Fitting",
- "isCorrect": "false"
- }
- ]
- },
- {
- "questionText": "The process of identifying significant features from a set of features is called:",
- "answerOptions": [
- {
- "answerText": "Feature Extraction",
- "isCorrect": "false"
- },
- {
- "answerText": "Feature Dimensionality Reduction",
- "isCorrect": "false"
- },
- {
- "answerText": "Feature Selection",
- "isCorrect": "true"
- }
- ]
- },
- {
- "questionText": "The process of splitting a dataset into a certain ratio of training and testing dataset using Scikit Learn's 'train_test_split()' method/function is called:",
- "answerOptions": [
- {
- "answerText": "Cross-Validation",
- "isCorrect": "false"
- },
- {
- "answerText": "Hold-Out Validation",
- "isCorrect": "true"
- },
- {
- "answerText": "Leave one out Validation",
- "isCorrect": "false"
- }
- ]
- }
- ]
- },
- {
- "id": 11,
- "title": "Prepare and Visualize Data for Regression: Pre-Lecture Quiz",
- "quiz": [
- {
- "questionText": "Which of these Python modules is used to plot the visualization of data?",
- "answerOptions": [
- {
- "answerText": "Numpy",
- "isCorrect": "false"
- },
- {
- "answerText": "Scikit-learn",
- "isCorrect": "false"
- },
- {
- "answerText": "Matplotlib",
- "isCorrect": "true"
- }
- ]
- },
- {
- "questionText": "If you want to understand the spread or the other characteristics of data points of your dataset, then perform:",
- "answerOptions": [
- {
- "answerText": "Data Visualization",
- "isCorrect": "true"
- },
- {
- "answerText": "Data Preprocessing",
- "isCorrect": "false"
- },
- {
- "answerText": "Train Test Split",
- "isCorrect": "false"
- }
- ]
- },
- {
- "questionText": "Which of these is a part of the Data Visualization step in a Machine Learning project?",
- "answerOptions": [
- {
- "answerText": "Incorporating a certain Machine Learning algorithm",
- "isCorrect": "false"
- },
- {
- "answerText": "Creating a pictorial representation of data using different plotting methods",
- "isCorrect": "true"
- },
- {
- "answerText": "Normalizing the values of a dataset",
- "isCorrect": "false"
- }
- ]
- }
- ]
- },
- {
- "id": 12,
- "title": "Prepare and Visualize Data for Regression: Post-Lecture Quiz",
- "quiz": [
- {
- "questionText": "Which of these code snippets is correct based on this lesson, if you want to check for the presence of missing values in your dataset? Suppose the dataset is stored in a variable named 'dataset' which is a Pandas DataFrame object.",
- "answerOptions": [
- {
- "answerText": "dataset.isnull().sum()",
- "isCorrect": "true"
- },
- {
- "answerText": "findMissing(dataset)",
- "isCorrect": "false"
- },
- {
- "answerText": "sum(null(dataset))",
- "isCorrect": "false"
- }
- ]
- },
- {
- "questionText": "Which of these plotting methods is useful when you would like to understand the spread of different groups of datapoints from your dataset?",
- "answerOptions": [
- {
- "answerText": "Scatter Plot",
- "isCorrect": "false"
- },
- {
- "answerText": "Line Plot",
- "isCorrect": "false"
- },
- {
- "answerText": "Bar Plot",
- "isCorrect": "true"
- }
- ]
- },
- {
- "questionText": "What can Data Visualization NOT tell you?",
- "answerOptions": [
- {
- "answerText": "Relationships among datapoints",
- "isCorrect": "false"
- },
- {
- "answerText": "The source from where the dataset is collected",
- "isCorrect": "true"
- },
- {
- "answerText": "Finding the presence of outliers in the dataset",
- "isCorrect": "false"
- }
- ]
- }
- ]
- },
- {
- "id": 13,
- "title": "Linear and Polynomial Regression: Pre-Lecture Quiz",
- "quiz": [
- {
- "questionText": "Matplotlib is a ",
- "answerOptions": [
- {
- "answerText": "drawing library",
- "isCorrect": "false"
- },
- {
- "answerText": "data visualization library",
- "isCorrect": "true"
- },
- {
- "answerText": "lending library",
- "isCorrect": "false"
- }
- ]
- },
- {
- "questionText": "Linear Regression uses the following to plot relationships between variables",
- "answerOptions": [
- {
- "answerText": "a straight line",
- "isCorrect": "true"
- },
- {
- "answerText": "a circle",
- "isCorrect": "false"
- },
- {
- "answerText": "a curve",
- "isCorrect": "false"
- }
- ]
- },
- {
- "questionText": "A good Linear Regression model has a ___ Correlation Coefficient",
- "answerOptions": [
- {
- "answerText": "low",
- "isCorrect": "false"
- },
- {
- "answerText": "high",
- "isCorrect": "true"
- },
- {
- "answerText": "flat",
- "isCorrect": "false"
- }
- ]
- }
- ]
- },
- {
- "id": 14,
- "title": "Linear and Polynomial Regression: Post-Lecture Quiz",
- "quiz": [
- {
- "questionText": "If your data is nonlinear, try a ___ type of Regression",
- "answerOptions": [
- {
- "answerText": "linear",
- "isCorrect": "false"
- },
- {
- "answerText": "spherical",
- "isCorrect": "false"
- },
- {
- "answerText": "polynomial",
- "isCorrect": "true"
- }
- ]
- },
- {
- "questionText": "These are all types of Regression methods",
- "answerOptions": [
- {
- "answerText": "Falsestep, Ridge, Lasso and Elasticnet",
- "isCorrect": "false"
- },
- {
- "answerText": "Stepwise, Ridge, Lasso and Elasticnet",
- "isCorrect": "true"
- },
- {
- "answerText": "Stepwise, Ridge, Lariat and Elasticnet",
- "isCorrect": "false"
- }
- ]
- },
- {
- "questionText": "Least-Squares Regression means that all the datapoints surrounding the regression line are:",
- "answerOptions": [
- {
- "answerText": "squared and then subtracted",
- "isCorrect": "false"
- },
- {
- "answerText": "multiplied",
- "isCorrect": "false"
- },
- {
- "answerText": "squared and then added up",
- "isCorrect": "true"
- }
- ]
- }
- ]
- },
- {
- "id": 15,
- "title": "Logistic Regression: Pre-Lecture Quiz",
- "quiz": [
- {
- "questionText": "Use Logistic Regression to predict",
- "answerOptions": [
- {
- "answerText": "whether an apple is ripe or not",
- "isCorrect": "true"
- },
- {
- "answerText": "how many tickets can be sold in a month",
- "isCorrect": "false"
- },
- {
- "answerText": "what color the sky will turn tomorrow at 6 PM",
- "isCorrect": "false"
- }
- ]
- },
- {
- "questionText": "Types of Logistic Regression include",
- "answerOptions": [
- {
- "answerText": "multinomial and cardinal",
- "isCorrect": "false"
- },
- {
- "answerText": "multinomial and ordinal",
- "isCorrect": "true"
- },
- {
- "answerText": "principal and ordinal",
- "isCorrect": "false"
- }
- ]
- },
- {
- "questionText": "Your data has weak correlations. The best type of Regression to use is:",
- "answerOptions": [
- {
- "answerText": "Logistic",
- "isCorrect": "true"
- },
- {
- "answerText": "Linear",
- "isCorrect": "false"
- },
- {
- "answerText": "Cardinal",
- "isCorrect": "false"
- }
- ]
- }
- ]
- },
- {
- "id": 16,
- "title": "Logistic Regression: Post-Lecture Quiz",
- "quiz": [
- {
- "questionText": "Seaborn is a type of",
- "answerOptions": [
- {
- "answerText": "data visualization library",
- "isCorrect": "true"
- },
- {
- "answerText": "mapping library",
- "isCorrect": "false"
- },
- {
- "answerText": "mathematical library",
- "isCorrect": "false"
- }
- ]
- },
- {
- "questionText": "A confusion matrix is also known as a:",
- "answerOptions": [
- {
- "answerText": "error matrix",
- "isCorrect": "true"
- },
- {
- "answerText": "truth matrix",
- "isCorrect": "false"
- },
- {
- "answerText": "accuracy matrix",
- "isCorrect": "false"
- }
- ]
- },
- {
- "questionText": "A good model will have:",
- "answerOptions": [
- {
- "answerText": "a large number of false positives and true negatives in its confusion matrix",
- "isCorrect": "false"
- },
- {
- "answerText": "a large number of true positives and true negatives in its confusion matrix",
- "isCorrect": "true"
- },
- {
- "answerText": "a large number of true positives and false negatives in its confusion matrix",
- "isCorrect": "false"
- }
- ]
- }
- ]
- },
- {
- "id": 17,
- "title": "Build a Web App: Pre-Lecture Quiz",
- "quiz": [
- {
- "questionText": "What does ONNX stand for?",
- "answerOptions": [
- {
- "answerText": "Over Neural Network Exchange",
- "isCorrect": "false"
- },
- {
- "answerText": "Open Neural Network Exchange",
- "isCorrect": "true"
- },
- {
- "answerText": "Output Neural Network Exchange",
- "isCorrect": "false"
- }
- ]
- },
- {
- "questionText": "How is Flask defined by its creators?",
- "answerOptions": [
- {
- "answerText": "mini-framework",
- "isCorrect": "false"
- },
- {
- "answerText": "large-framework",
- "isCorrect": "false"
- },
- {
- "answerText": "micro-framework",
- "isCorrect": "true"
- }
- ]
- },
- {
- "questionText": "What does the Pickle module of Python do",
- "answerOptions": [
- {
- "answerText": "Serializes a Python Object",
- "isCorrect": "false"
- },
- {
- "answerText": "De-serializes a Python Object",
- "isCorrect": "false"
- },
- {
- "answerText": "Serializes and De-serializes a Python Object",
- "isCorrect": "true"
- }
- ]
- }
- ]
- },
- {
- "id": 18,
- "title": "Build a Web App: Post-Lecture Quiz",
- "quiz": [
- {
- "questionText": "What are the tools we can use to host a pre-trained model on the web using Python?",
- "answerOptions": [
- {
- "answerText": "Flask",
- "isCorrect": "true"
- },
- {
- "answerText": "TensorFlow.js",
- "isCorrect": "false"
- },
- {
- "answerText": "onnx.js",
- "isCorrect": "false"
- }
- ]
- },
- {
- "questionText": "What does SaaS stand for?",
- "answerOptions": [
- {
- "answerText": "System as a Service",
- "isCorrect": "false"
- },
- {
- "answerText": "Software as a Service",
- "isCorrect": "true"
- },
- {
- "answerText": "Security as a Service",
- "isCorrect": "false"
- }
- ]
- },
- {
- "questionText": "What does Scikit-learn's LabelEncoder library do?",
- "answerOptions": [
- {
- "answerText": "Encodes data alphabetically",
- "isCorrect": "true"
- },
- {
- "answerText": "Encodes data numerically",
- "isCorrect": "false"
- },
- {
- "answerText": "Encodes data serially",
- "isCorrect": "false"
- }
- ]
- }
- ]
- },
- {
- "id": 19,
- "title": "Classification 1: Pre-Lecture Quiz",
- "quiz": [
- {
- "questionText": "Classification is a form of supervised learning that has a lot in common with",
- "answerOptions": [
- {
- "answerText": "Time Series",
- "isCorrect": "false"
- },
- {
- "answerText": "Regression techniques",
- "isCorrect": "true"
- },
- {
- "answerText": "NLP",
- "isCorrect": "false"
- }
- ]
- },
- {
- "questionText": "What question can classification help answer?",
- "answerOptions": [
- {
- "answerText": "Is this email spam or not?",
- "isCorrect": "true"
- },
- {
- "answerText": "Can pigs fly?",
- "isCorrect": "false"
- },
- {
- "answerText": "What is the meaning of life?",
- "isCorrect": "false"
- }
- ]
- },
- {
- "questionText": "What is the first step to using Classification techniques?",
- "answerOptions": [
- {
- "answerText": "creating classes of a dataset",
- "isCorrect": "false"
- },
- {
- "answerText": "cleaning and balancing your data",
- "isCorrect": "true"
- },
- {
- "answerText": "assigning a data point to a group or outcome",
- "isCorrect": "false"
- }
- ]
- }
- ]
- },
- {
- "id": 20,
- "title": "Classification 1: Post-Lecture Quiz",
- "quiz": [
- {
- "questionText": "What is a multiclass question?",
- "answerOptions": [
- {
- "answerText": "the task of classifying data points into multiple classes",
- "isCorrect": "true"
- },
- {
- "answerText": "the task of classifying data points into one of several classes",
- "isCorrect": "true"
- },
- {
- "answerText": "the task of cleaning data points in multiple ways",
- "isCorrect": "false"
- }
- ]
- },
- {
- "questionText": "It's important to clean out recurrent or unhelpful data to help your classifiers solve your problem.",
- "answerOptions": [
- {
- "answerText": "true",
- "isCorrect": "true"
- },
- {
- "answerText": "false",
- "isCorrect": "false"
- }
- ]
- },
- {
- "questionText": "What's the best reason to balance your data?",
- "answerOptions": [
- {
- "answerText": "Imbalanced data looks bad in visualizations",
- "isCorrect": "false"
- },
- {
- "answerText": "Balancing your data yields better results because an ML model won't skew towards one class",
- "isCorrect": "true"
- },
- {
- "answerText": "Balancing your data gives you more data points",
- "isCorrect": "false"
- }
- ]
- }
- ]
- },
- {
- "id": 21,
- "title": "Classification 2: Pre-Lecture Quiz",
- "quiz": [
- {
- "questionText": "Balanced, clean data yields the best classification results",
- "answerOptions": [
- {
- "answerText": "true",
- "isCorrect": "true"
- },
- {
- "answerText": "false",
- "isCorrect": "false"
- }
- ]
- },
- {
- "questionText": "How do you choose the right classifier?",
- "answerOptions": [
- {
- "answerText": "Understand which classifiers work best for which scenarios",
- "isCorrect": "false"
- },
- {
- "answerText": "Educated guess and check",
- "isCorrect": "false"
- },
- {
- "answerText": "Both of the above",
- "isCorrect": "true"
- }
- ]
- },
- {
- "questionText": "Classification is a type of",
- "answerOptions": [
- {
- "answerText": "NLP",
- "isCorrect": "false"
- },
- {
- "answerText": "Supervised Learning",
- "isCorrect": "true"
- },
- {
- "answerText": "Programming language",
- "isCorrect": "false"
- }
- ]
- }
- ]
- },
- {
- "id": 22,
- "title": "Classification 2: Post-Lecture Quiz",
- "quiz": [
- {
- "questionText": "What is a 'solver'?",
- "answerOptions": [
- {
- "answerText": "the person who double-checks your work",
- "isCorrect": "false"
- },
- {
- "answerText": "the algorithm to use in the optimization problem",
- "isCorrect": "true"
- },
- {
- "answerText": "a machine learning technique",
- "isCorrect": "false"
- }
- ]
- },
- {
- "questionText": "Which classifier did we use in this lesson?",
- "answerOptions": [
- {
- "answerText": "Logistic Regression",
- "isCorrect": "true"
- },
- {
- "answerText": "Decision Trees",
- "isCorrect": "false"
- },
- {
- "answerText": "One-vs-All Multiclass",
- "isCorrect": "false"
- }
- ]
- },
- {
- "questionText": "How do you know if the classification algorithm is working as expected?",
- "answerOptions": [
- {
- "answerText": "By checking the accuracy of its predictions",
- "isCorrect": "true"
- },
- {
- "answerText": "By checking it against other algorithms",
- "isCorrect": "false"
- },
- {
- "answerText": "By looking at historical data for how good this algorithm is at solving similar problems",
- "isCorrect": "false"
- }
- ]
- }
- ]
- },
- {
- "id": 23,
- "title": "Classification 3: Pre-Lecture Quiz",
- "quiz": [
- {
- "questionText": "A good initial classifier to try is:",
- "answerOptions": [
- {
- "answerText": "Linear SVC",
- "isCorrect": "true"
- },
- {
- "answerText": "K-Means",
- "isCorrect": "false"
- },
- {
- "answerText": "Logical SVC",
- "isCorrect": "false"
- }
- ]
- },
- {
- "questionText": "Regularization controls:",
- "answerOptions": [
- {
- "answerText": "the influence of parameters",
- "isCorrect": "true"
- },
- {
- "answerText": "the influence of training speed",
- "isCorrect": "false"
- },
- {
- "answerText": "the influence of outliers",
- "isCorrect": "false"
- }
- ]
- },
- {
- "questionText": "K-Neighbors classifier can be used for:",
- "answerOptions": [
- {
- "answerText": "supervised learning",
- "isCorrect": "false"
- },
- {
- "answerText": "unsupervised learning",
- "isCorrect": "false"
- },
- {
- "answerText": "both of these",
- "isCorrect": "true"
- }
- ]
- }
- ]
- },
- {
- "id": 24,
- "title": "Classification 3: Post-Lecture Quiz",
- "quiz": [
- {
- "questionText": "Support-Vector classifiers can be used for",
- "answerOptions": [
- {
- "answerText": "classification",
- "isCorrect": "false"
- },
- {
- "answerText": "regression",
- "isCorrect": "false"
- },
- {
- "answerText": "both of these",
- "isCorrect": "true"
- }
- ]
- },
- {
- "questionText": "Random Forest is a ___ type of classifier",
- "answerOptions": [
- {
- "answerText": "Ensemble",
- "isCorrect": "true"
- },
- {
- "answerText": "Dissemble",
- "isCorrect": "false"
- },
- {
- "answerText": "Assemble",
- "isCorrect": "false"
- }
- ]
- },
- {
- "questionText": "Adaboost is known for:",
- "answerOptions": [
- {
- "answerText": "focusing on the weights of incorrectly classified items",
- "isCorrect": "true"
- },
- {
- "answerText": "focusing on outliers",
- "isCorrect": "false"
- },
- {
- "answerText": "focusing on incorrect data",
- "isCorrect": "false"
- }
- ]
- }
- ]
- },
- {
- "id": 25,
- "title": "Classification 4: Pre-Lecture Quiz",
- "quiz": [
- {
- "questionText": "Recommendation systems might be used for",
- "answerOptions": [
- {
- "answerText": "Recommending a good restaurant",
- "isCorrect": "false"
- },
- {
- "answerText": "Recommending fashions to try",
- "isCorrect": "false"
- },
- {
- "answerText": "Both of these",
- "isCorrect": "true"
- }
- ]
- },
- {
- "questionText": "Embedding a model in a web app helps it to be offline-capable",
- "answerOptions": [
- {
- "answerText": "true",
- "isCorrect": "true"
- },
- {
- "answerText": "false",
- "isCorrect": "false"
- }
- ]
- },
- {
- "questionText": "Onnx Runtime can be used for",
- "answerOptions": [
- {
- "answerText": "Running models in a web app",
- "isCorrect": "true"
- },
- {
- "answerText": "Training models",
- "isCorrect": "false"
- },
- {
- "answerText": "Hyperparameter tuning",
- "isCorrect": "false"
- }
- ]
- }
- ]
- },
- {
- "id": 26,
- "title": "Classification 4: Post-Lecture Quiz",
- "quiz": [
- {
- "questionText": "Netron app helps you:",
- "answerOptions": [
- {
- "answerText": "Visualize data",
- "isCorrect": "false"
- },
- {
- "answerText": "Visualize your model's structure",
- "isCorrect": "true"
- },
- {
- "answerText": "Test your web app",
- "isCorrect": "false"
- }
- ]
- },
- {
- "questionText": "Convert your Scikit-learn model for use with Onnx using:",
- "answerOptions": [
- {
- "answerText": "sklearn-app",
- "isCorrect": "false"
- },
- {
- "answerText": "sklearn-web",
- "isCorrect": "false"
- },
- {
- "answerText": "sklearn-onnx",
- "isCorrect": "true"
- }
- ]
- },
- {
- "questionText": "Using your model in a web app is called:",
- "answerOptions": [
- {
- "answerText": "inference",
- "isCorrect": "true"
- },
- {
- "answerText": "interference",
- "isCorrect": "false"
- },
- {
- "answerText": "insurance",
- "isCorrect": "false"
- }
- ]
- }
- ]
- },
- {
- "id": 27,
- "title": "Introduction to Clustering: Pre-Lecture Quiz",
- "quiz": [
- {
- "questionText": "A real-life example of clustering would be",
- "answerOptions": [
- {
- "answerText": "Setting the dinner table",
- "isCorrect": "false"
- },
- {
- "answerText": "Sorting the laundry",
- "isCorrect": "true"
- },
- {
- "answerText": "Grocery shopping",
- "isCorrect": "false"
- }
- ]
- },
- {
- "questionText": "Clustering techniques can be used in these industries",
- "answerOptions": [
- {
- "answerText": "banking",
- "isCorrect": "false"
- },
- {
- "answerText": "e-commerce",
- "isCorrect": "false"
- },
- {
- "answerText": "both of these",
- "isCorrect": "false"
- }
- ]
- },
- {
- "questionText": "Clustering is a type of:",
- "answerOptions": [
- {
- "answerText": "supervised learning",
- "isCorrect": "false"
- },
- {
- "answerText": "unsupervised learning",
- "isCorrect": "true"
- },
- {
- "answerText": "reinforcement learning",
- "isCorrect": "false"
- }
- ]
- }
- ]
- },
- {
- "id": 28,
- "title": "Introduction to Clustering: Post-Lecture Quiz",
- "quiz": [
- {
- "questionText": "Euclidean geometry is arranged along",
- "answerOptions": [
- {
- "answerText": "planes",
- "isCorrect": "true"
- },
- {
- "answerText": "curves",
- "isCorrect": "false"
- },
- {
- "answerText": "spheres",
- "isCorrect": "false"
- }
- ]
- },
- {
- "questionText": "The density of your clustering data is related to its",
- "answerOptions": [
- {
- "answerText": "noise",
- "isCorrect": "true"
- },
- {
- "answerText": "depth",
- "isCorrect": "false"
- },
- {
- "answerText": "validity",
- "isCorrect": "false"
- }
- ]
- },
- {
- "questionText": "The best-known clustering algorithm is",
- "answerOptions": [
- {
- "answerText": "k-means",
- "isCorrect": "true"
- },
- {
- "answerText": "k-middle",
- "isCorrect": "false"
- },
- {
- "answerText": "k-mart",
- "isCorrect": "false"
- }
- ]
- }
- ]
- },
- {
- "id": 29,
- "title": "K-Means Clustering: Pre-Lecture Quiz",
- "quiz": [
- {
- "questionText": "K-Means is derived from:",
- "answerOptions": [
- {
- "answerText": "electrical engineering",
- "isCorrect": "false"
- },
- {
- "answerText": "signal processing",
- "isCorrect": "true"
- },
- {
- "answerText": "computational linguistics",
- "isCorrect": "false"
- }
- ]
- },
- {
- "questionText": "A good Silhouette score means:",
- "answerOptions": [
- {
- "answerText": "clusters are well-separated and well-defined",
- "isCorrect": "true"
- },
- {
- "answerText": "there are few clusters",
- "isCorrect": "false"
- },
- {
- "answerText": "there are many clusters",
- "isCorrect": "false"
- }
- ]
- },
- {
- "questionText": "Variance is:",
- "answerOptions": [
- {
- "answerText": "the average of the squared differences from the mean",
- "isCorrect": "false"
- },
- {
- "answerText": "a problem for clustering if it becomes too high",
- "isCorrect": "false"
- },
- {
- "answerText": "both of these",
- "isCorrect": "true"
- }
- ]
- }
- ]
- },
- {
- "id": 30,
- "title": "K-Means Clustering: Post-Lecture Quiz",
- "quiz": [
- {
- "questionText": "A Voronoi diagram shows:",
- "answerOptions": [
- {
- "answerText": "a cluster's variance",
- "isCorrect": "false"
- },
- {
- "answerText": "a cluster's seed and its region",
- "isCorrect": "true"
- },
- {
- "answerText": "a cluster's inertia",
- "isCorrect": "false"
- }
- ]
- },
- {
- "questionText": "Inertia is",
- "answerOptions": [
- {
- "answerText": "a measure of how internally coherent clusters are",
- "isCorrect": "true"
- },
- {
- "answerText": "a measure of how much clusters move",
- "isCorrect": "false"
- },
- {
- "answerText": "a measure of cluster quality",
- "isCorrect": "false"
- }
- ]
- },
- {
- "questionText": "Using K-Means, you must first determine the value of 'k'",
- "answerOptions": [
- {
- "answerText": "true",
- "isCorrect": "true"
- },
- {
- "answerText": "false",
- "isCorrect": "false"
- }
- ]
- }
- ]
- },
- {
- "id": 31,
- "title": "Intro to NLP: Pre-Lecture Quiz",
- "quiz": [
- {
- "questionText": "What does NLP stand for in these lessons?",
- "answerOptions": [
- {
- "answerText": "Neural Language Processing",
- "isCorrect": "false"
- },
- {
- "answerText": "natural language processing",
- "isCorrect": "true"
- },
- {
- "answerText": "Natural Linguistic Processing",
- "isCorrect": "false"
- }
- ]
- },
- {
- "questionText": "Eliza was an early bot that acted as a computer",
- "answerOptions": [
- {
- "answerText": "therapist",
- "isCorrect": "true"
- },
- {
- "answerText": "doctor",
- "isCorrect": "false"
- },
- {
- "answerText": "nurse",
- "isCorrect": "false"
- }
- ]
- },
- {
- "questionText": "Alan Turing's 'Turing Test' tried to determine if a computer was",
- "answerOptions": [
- {
- "answerText": "indistinguishable from a human",
- "isCorrect": "false"
- },
- {
- "answerText": "thinking",
- "isCorrect": "false"
- },
- {
- "answerText": "both of the above",
- "isCorrect": "true"
- }
- ]
- }
- ]
- },
- {
- "id": 32,
- "title": "Intro to NLP: Post-Lecture Quiz",
- "quiz": [
- {
- "questionText": "Joseph Weizenbaum invented the bot",
- "answerOptions": [
- {
- "answerText": "Elisha",
- "isCorrect": "false"
- },
- {
- "answerText": "Eliza",
- "isCorrect": "true"
- },
- {
- "answerText": "Eloise",
- "isCorrect": "false"
- }
- ]
- },
- {
- "questionText": "A conversational bot gives output based on",
- "answerOptions": [
- {
- "answerText": "Randomly choosing predefined choices",
- "isCorrect": "false"
- },
- {
- "answerText": "Analyzing the input and using machine intelligence",
- "isCorrect": "false"
- },
- {
- "answerText": "Both of these",
- "isCorrect": "true"
- }
- ]
- },
- {
- "questionText": "How would you make the bot more effective?",
- "answerOptions": [
- {
- "answerText": "By asking it more questions.",
- "isCorrect": "false"
- },
- {
- "answerText": "By feeding it more data and training it accordingly",
- "isCorrect": "true"
- },
- {
- "answerText": "The bot is dumb, it cannot learn :(",
- "isCorrect": "false"
- }
- ]
- }
- ]
- },
- {
- "id": 33,
- "title": "NLP Tasks: Pre-Lecture Quiz",
- "quiz": [
- {
- "questionText": "Tokenization",
- "answerOptions": [
- {
- "answerText": "Splits text by means of punctuation",
- "isCorrect": "false"
- },
- {
- "answerText": "Splits text into separate tokens (words)",
- "isCorrect": "true"
- },
- {
- "answerText": "Splits text into phrases",
- "isCorrect": "false"
- }
- ]
- },
- {
- "questionText": "Embeddings",
- "answerOptions": [
- {
- "answerText": "converts text data numerically so words can cluster",
- "isCorrect": "true"
- },
- {
- "answerText": "embeds words into phrases",
- "isCorrect": "false"
- },
- {
- "answerText": "embeds sentences into paragraphs",
- "isCorrect": "false"
- }
- ]
- },
- {
- "questionText": "Parts-of-Speech Tagging",
- "answerOptions": [
- {
- "answerText": "divides sentences by their parts of speech",
- "isCorrect": "false"
- },
- {
- "answerText": "takes tokenized words and tags them by their part of speech",
- "isCorrect": "true"
- },
- {
- "answerText": "diagrams sentences",
- "isCorrect": "false"
- }
- ]
- }
- ]
- },
- {
- "id": 34,
- "title": "NLP Tasks: Post-Lecture Quiz",
- "quiz": [
- {
- "questionText": "Build a dictionary of how often words reoccur using:",
- "answerOptions": [
- {
- "answerText": "Word and Phrase Dictionary",
- "isCorrect": "false"
- },
- {
- "answerText": "Word and Phrase Frequencies",
- "isCorrect": "true"
- },
- {
- "answerText": "Word and Phrase Library",
- "isCorrect": "false"
- }
- ]
- },
- {
- "questionText": "N-grams refer to",
- "answerOptions": [
- {
- "answerText": "A text can be split into sequences of words of a set length",
- "isCorrect": "true"
- },
- {
- "answerText": "A word can be split into sequences of characters of a set length",
- "isCorrect": "false"
- },
- {
- "answerText": "A text can be split into paragraphs of a set length",
- "isCorrect": "false"
- }
- ]
- },
- {
- "questionText": "Sentiment analysis",
- "answerOptions": [
- {
- "answerText": "analyzes a phrase for positivity or negativity",
- "isCorrect": "true"
- },
- {
- "answerText": "analyzes a phrase for sentimentality",
- "isCorrect": "false"
- },
- {
- "answerText": "analyzes a phrase for sadness",
- "isCorrect": "false"
- }
- ]
- }
- ]
- },
- {
- "id": 35,
- "title": "NLP and Translation: Pre-Lecture Quiz",
- "quiz": [
- {
- "questionText": "Naive translation",
- "answerOptions": [
- {
- "answerText": "translates words only",
- "isCorrect": "true"
- },
- {
- "answerText": "translates sentence structure",
- "isCorrect": "false"
- },
- {
- "answerText": "translates sentiment",
- "isCorrect": "false"
- }
- ]
- },
- {
- "questionText": "A *corpus* of texts refers to",
- "answerOptions": [
- {
- "answerText": "A small number of texts",
- "isCorrect": "false"
- },
- {
- "answerText": "A large number of texts",
- "isCorrect": "true"
- },
- {
- "answerText": "One standard text",
- "isCorrect": "false"
- }
- ]
- },
- {
- "questionText": "If a ML model has enough human translations to build a model on, it can",
- "answerOptions": [
- {
- "answerText": "abbreviate translations",
- "isCorrect": "false"
- },
- {
- "answerText": "standardize translations",
- "isCorrect": "false"
- },
- {
- "answerText": "improve the accuracy of translations",
- "isCorrect": "true"
- }
- ]
- }
- ]
- },
- {
- "id": 36,
- "title": "NLP and Translation: Post-Lecture Quiz",
- "quiz": [
- {
- "questionText": "Underlying TextBlob's translation library is:",
- "answerOptions": [
- {
- "answerText": "Google Translate",
- "isCorrect": "true"
- },
- {
- "answerText": "Bing",
- "isCorrect": "false"
- },
- {
- "answerText": "A custom ML model",
- "isCorrect": "false"
- }
- ]
- },
- {
- "questionText": "To use `blob.translate` you need:",
- "answerOptions": [
- {
- "answerText": "an internet connection",
- "isCorrect": "true"
- },
- {
- "answerText": "a dictionary",
- "isCorrect": "false"
- },
- {
- "answerText": "JavaScript",
- "isCorrect": "false"
- }
- ]
- },
- {
- "questionText": "To determine sentiment, an ML approach would be to:",
- "answerOptions": [
- {
- "answerText": "apply Regression techniques to manually generated opinions and scores and look for patterns",
- "isCorrect": "false"
- },
- {
- "answerText": "apply NLP techniques to manually generated opinions and scores and look for patterns",
- "isCorrect": "true"
- },
- {
- "answerText": "apply Clustering techniques to manually generated opinions and scores and look for patterns",
- "isCorrect": "false"
- }
- ]
- }
- ]
- },
- {
- "id": 37,
- "title": "NLP 4: Pre-Lecture Quiz",
- "quiz": [
- {
- "questionText": "What information can we get from text that was written or spoken by a human?",
- "answerOptions": [
- {
- "answerText": "patterns and frequencies",
- "isCorrect": "false"
- },
- {
- "answerText": "sentiment and meaning",
- "isCorrect": "false"
- },
- {
- "answerText": "both of the above",
- "isCorrect": "true"
- }
- ]
- },
- {
- "questionText": "What is sentiment analysis?",
- "answerOptions": [
- {
- "answerText": "a study of whether a family heirloom has sentimental value",
- "isCorrect": "false"
- },
- {
- "answerText": "a method of systematically identifying, extracting, quantifying, and studying affective states and subjective information",
- "isCorrect": "true"
- },
- {
- "answerText": "the ability to tell whether someone is sad or happy",
- "isCorrect": "false"
- }
- ]
- },
- {
- "questionText": "What question could be answered using a dataset of hotel reviews, Python, and sentiment analysis?",
- "answerOptions": [
- {
- "answerText": "What are the most frequently used words and phrases in reviews?",
- "isCorrect": "true"
- },
- {
- "answerText": "Which resort has the best pool?",
- "isCorrect": "false"
- },
- {
- "answerText": "Is there valet parking at this hotel?",
- "isCorrect": "false"
- }
- ]
- }
- ]
- },
- {
- "id": 38,
- "title": "NLP 4: Post-Lecture Quiz",
- "quiz": [
- {
- "questionText": "What is the essence of NLP?",
- "answerOptions": [
- {
- "answerText": "categorizing human language into happy or sad",
- "isCorrect": "false"
- },
- {
- "answerText": "interpreting meaning or sentiment without having to have a human do it",
- "isCorrect": "true"
- },
- {
- "answerText": "finding outliers in sentiment and examining them",
- "isCorrect": "false"
- }
- ]
- },
- {
- "questionText": "What are some things you might look for while cleaning data?",
- "answerOptions": [
- {
- "answerText": "characters in other languages",
- "isCorrect": "false"
- },
- {
- "answerText": "blank rows or columns",
- "isCorrect": "false"
- },
- {
- "answerText": "both of the above",
- "isCorrect": "true"
- }
- ]
- },
- {
- "questionText": "It is important to understand your data and its foibles before performing operations on it.",
- "answerOptions": [
- {
- "answerText": "true",
- "isCorrect": "true"
- },
- {
- "answerText": "false",
- "isCorrect": "false"
- }
- ]
- }
- ]
- },
- {
- "id": 39,
- "title": "NLP 5: Pre-Lecture Quiz",
- "quiz": [
- {
- "questionText": "Why is it important to clean data before analyzing it?",
- "answerOptions": [
- {
- "answerText": "Some columns might have missing or incorrect data",
- "isCorrect": "false"
- },
- {
- "answerText": "Messy data can lead to false conclusions about the dataset",
- "isCorrect": "false"
- },
- {
- "answerText": "Both of the above",
- "isCorrect": "true"
- }
- ]
- },
- {
- "questionText": "What is one example of a strategy for cleaning data?",
- "answerOptions": [
- {
- "answerText": "removing columns/rows that aren't useful for answering a specific question",
- "isCorrect": "true"
- },
- {
- "answerText": "getting rid of verified values that don't fit your hypothesis",
- "isCorrect": "false"
- },
- {
- "answerText": "moving the outliers to a separate table and running the calculations for that table to see if they match",
- "isCorrect": "false"
- }
- ]
- },
- {
- "questionText": "It can be useful to categorize data using a Tag column.",
- "answerOptions": [
- {
- "answerText": "true",
- "isCorrect": "true"
- },
- {
- "answerText": "false",
- "isCorrect": "false"
- }
- ]
- }
- ]
- },
- {
- "id": 40,
- "title": "NLP 5: Post-Lecture Quiz",
- "quiz": [
- {
- "questionText": "What is the goal of the dataset?",
- "answerOptions": [
- {
- "answerText": "to see how many negative and positive reviews there are for hotels across the world",
- "isCorrect": "false"
- },
- {
- "answerText": "to add sentiment and columns that will help you choose the best hotel",
- "isCorrect": "true"
- },
- {
- "answerText": "to analyze why people leave specific reviews",
- "isCorrect": "false"
- }
- ]
- },
- {
- "questionText": "What are stop words?",
- "answerOptions": [
- {
- "answerText": "common English words that do not change the sentiment of a sentence",
- "isCorrect": "false"
- },
- {
- "answerText": "words that you can remove to speed up sentiment analysis",
- "isCorrect": "false"
- },
- {
- "answerText": "both of the above",
- "isCorrect": "true"
- }
- ]
- },
- {
- "questionText": "To test the sentiment analysis, make sure it matches the reviewer's score for the same review.",
- "answerOptions": [
- {
- "answerText": "true",
- "isCorrect": "true"
- },
- {
- "answerText": "false",
- "isCorrect": "false"
- }
- ]
- }
- ]
- },
- {
- "id": 41,
- "title": "Intro to Time Series: Pre-Lecture Quiz",
- "quiz": [
- {
- "questionText": "Time Series Forecasting is useful in",
- "answerOptions": [
- {
- "answerText": "determining future costs",
- "isCorrect": "false"
- },
- {
- "answerText": "predicting future pricing",
- "isCorrect": "false"
- },
- {
- "answerText": "both the above",
- "isCorrect": "true"
- }
- ]
- },
- {
- "questionText": "A time series is a sequence taken at:",
- "answerOptions": [
- {
- "answerText": "successive equally spaced points in space",
- "isCorrect": "false"
- },
- {
- "answerText": "successive equally spaced points in time",
- "isCorrect": "true"
- },
- {
- "answerText": "successive equally spaced points in space and time",
- "isCorrect": "false"
- }
- ]
- },
- {
- "questionText": "Time series can be used in:",
- "answerOptions": [
- {
- "answerText": "earthquake prediction",
- "isCorrect": "true"
- },
- {
- "answerText": "computer vision",
- "isCorrect": "false"
- },
- {
- "answerText": "color analysis",
- "isCorrect": "false"
- }
- ]
- }
- ]
- },
- {
- "id": 42,
- "title": "Intro to Time Series: Post-Lecture Quiz",
- "quiz": [
- {
- "questionText": "Time series trends are",
- "answerOptions": [
- {
- "answerText": "Measurable increases and decreases over time",
- "isCorrect": "true"
- },
- {
- "answerText": "Quantifying decreases over time",
- "isCorrect": "false"
- },
- {
- "answerText": "Gaps between increases and decreases over time",
- "isCorrect": "false"
- }
- ]
- },
- {
- "questionText": "Outliers are",
- "answerOptions": [
- {
- "answerText": "points close to standard data variance",
- "isCorrect": "false"
- },
- {
- "answerText": "points far away from standard data variance",
- "isCorrect": "true"
- },
- {
- "answerText": "points within standard data variance",
- "isCorrect": "false"
- }
- ]
- },
- {
- "questionText": "Time Series Forecasting is most useful for",
- "answerOptions": [
- {
- "answerText": "Econometrics",
- "isCorrect": "true"
- },
- {
- "answerText": "History",
- "isCorrect": "false"
- },
- {
- "answerText": "Libraries",
- "isCorrect": "false"
- }
- ]
- }
- ]
- },
- {
- "id": 43,
- "title": "Time Series ARIMA: Pre-Lecture Quiz",
- "quiz": [
- {
- "questionText": "ARIMA stands for",
- "answerOptions": [
- {
- "answerText": "AutoRegressive Integral Moving Average",
- "isCorrect": "false"
- },
- {
- "answerText": "AutoRegressive Integrated Moving Action",
- "isCorrect": "false"
- },
- {
- "answerText": "AutoRegressive Integrated Moving Average",
- "isCorrect": "true"
- }
- ]
- },
- {
- "questionText": "Stationarity refers to",
- "answerOptions": [
- {
- "answerText": "data whose attributes does not change when shifted in time",
- "isCorrect": "false"
- },
- {
- "answerText": "data whose distribution does not change when shifted in time",
- "isCorrect": "true"
- },
- {
- "answerText": "data whose distribution changes when shifted in time",
- "isCorrect": "false"
- }
- ]
- },
- {
- "questionText": "Differencing",
- "answerOptions": [
- {
- "answerText": "stabilizes trend and seasonality",
- "isCorrect": "false"
- },
- {
- "answerText": "exacerbates trend and seasonality",
- "isCorrect": "false"
- },
- {
- "answerText": "eliminates trend and seasonality",
- "isCorrect": "true"
- }
- ]
- }
- ]
- },
- {
- "id": 44,
- "title": "Time Series ARIMA: Post-Lecture Quiz",
- "quiz": [
- {
- "questionText": "ARIMA is used to make a model fit the special form of time series data",
- "answerOptions": [
- {
- "answerText": "as flat as possible",
- "isCorrect": "false"
- },
- {
- "answerText": "as closely as possible",
- "isCorrect": "true"
- },
- {
- "answerText": "via scatterplots",
- "isCorrect": "false"
- }
- ]
- },
- {
- "questionText": "Use SARIMAX to",
- "answerOptions": [
- {
- "answerText": "manage seasonal ARIMA models",
- "isCorrect": "true"
- },
- {
- "answerText": "manage special ARIMA models",
- "isCorrect": "false"
- },
- {
- "answerText": "manage statistical ARIMA models",
- "isCorrect": "false"
- }
- ]
- },
- {
- "questionText": "'Walk-Forward' validation involves",
- "answerOptions": [
- {
- "answerText": "re-evaluating a model progressively as it is validated",
- "isCorrect": "false"
- },
- {
- "answerText": "re-training a model progressively as it is validated",
- "isCorrect": "true"
- },
- {
- "answerText": "re-configuring a model progressively as it is validated",
- "isCorrect": "false"
- }
- ]
- }
- ]
- },
- {
- "id": 45,
- "title": "Reinforcement 1: Pre-Lecture Quiz",
- "quiz": [
- {
- "questionText": "What is reinforcement learning?",
- "answerOptions": [
- {
- "answerText": "teaching someone something over and over again until they understand",
- "isCorrect": "false"
- },
- {
- "answerText": "a learning technique that deciphers the optimal behavior of an agent in some environment by running many experiments",
- "isCorrect": "true"
- },
- {
- "answerText": "understanding how to run multiple experiments at once",
- "isCorrect": "false"
- }
- ]
- },
- {
- "questionText": "What is a policy?",
- "answerOptions": [
- {
- "answerText": "a function that returns the action at any given state",
- "isCorrect": "true"
- },
- {
- "answerText": "a document that tells you whether or not you can return an item",
- "isCorrect": "false"
- },
- {
- "answerText": "a function that is used for a random purpose",
- "isCorrect": "false"
- }
- ]
- },
- {
- "questionText": "A reward function returns a score for each state of an environment.",
- "answerOptions": [
- {
- "answerText": "true",
- "isCorrect": "true"
- },
- {
- "answerText": "false",
- "isCorrect": "false"
- }
- ]
- }
- ]
- },
- {
- "id": 46,
- "title": "Reinforcement 1: Post-Lecture Quiz",
- "quiz": [
- {
- "questionText": "What is Q-Learning?",
- "answerOptions": [
- {
- "answerText": "a mechanism for recording the 'goodness' of each state",
- "isCorrect": "false"
- },
- {
- "answerText": "an algorithm where the policy is defined by a Q-Table",
- "isCorrect": "false"
- },
- {
- "answerText": "both of the above",
- "isCorrect": "true"
- }
- ]
- },
- {
- "questionText": "For what values does a Q-Table correspond to the random walk policy?",
- "answerOptions": [
- {
- "answerText": "all equal values",
- "isCorrect": "true"
- },
- {
- "answerText": "-0.25",
- "isCorrect": "false"
- },
- {
- "answerText": "all different values",
- "isCorrect": "false"
- }
- ]
- },
- {
- "questionText": "It was better to use exploration than exploitation during the learning process in our lesson.",
- "answerOptions": [
- {
- "answerText": "true",
- "isCorrect": "false"
- },
- {
- "answerText": "false",
- "isCorrect": "true"
- }
- ]
- }
- ]
- },
- {
- "id": 47,
- "title": "Reinforcement 2: Pre-Lecture Quiz",
- "quiz": [
- {
- "questionText": "Chess and Go are games with continuous states.",
- "answerOptions": [
- {
- "answerText": "true",
- "isCorrect": "false"
- },
- {
- "answerText": "false",
- "isCorrect": "true"
- }
- ]
- },
- {
- "questionText": "What is the CartPole problem?",
- "answerOptions": [
- {
- "answerText": "a process for eliminating outliers",
- "isCorrect": "false"
- },
- {
- "answerText": "a method for optimizing your shopping cart",
- "isCorrect": "false"
- },
- {
- "answerText": "a simplified version of balancing",
- "isCorrect": "true"
- }
- ]
- },
- {
- "questionText": "What tool can we use to play out different scenarios of potential states in a game?",
- "answerOptions": [
- {
- "answerText": "guess and check",
- "isCorrect": "false"
- },
- {
- "answerText": "simulation environments",
- "isCorrect": "true"
- },
- {
- "answerText": "state transition testing",
- "isCorrect": "false"
- }
- ]
- }
- ]
- },
- {
- "id": 48,
- "title": "Reinforcement 2: Post-Lecture Quiz",
- "quiz": [
- {
- "questionText": "Where do we define all possible actions in an environment?",
- "answerOptions": [
- {
- "answerText": "methods",
- "isCorrect": "false"
- },
- {
- "answerText": "action space",
- "isCorrect": "true"
- },
- {
- "answerText": "action list",
- "isCorrect": "false"
- }
- ]
- },
- {
- "questionText": "What pair did we use as the dictionary key-value?",
- "answerOptions": [
- {
- "answerText": "(state, action) as the key, Q-Table entry as the value",
- "isCorrect": "true"
- },
- {
- "answerText": "state as the key, action as the value",
- "isCorrect": "false"
- },
- {
- "answerText": "the value of the qvalues function as the key, action as the value",
- "isCorrect": "false"
- }
- ]
- },
- {
- "questionText": "What are the hyperparameters we used during Q-Learning?",
- "answerOptions": [
- {
- "answerText": "q-table value, current reward, random action",
- "isCorrect": "false"
- },
- {
- "answerText": "learning rate, discount factor, exploration/exploitation factor",
- "isCorrect": "true"
- },
- {
- "answerText": "cumulative rewards, learning rate, exploration factor",
- "isCorrect": "false"
- }
- ]
- }
- ]
- },
- {
- "id": 49,
- "title": "Real World Applications: Pre-Lecture Quiz",
- "quiz": [
- {
- "questionText": "What's an example of an ML application in the Finance industry?",
- "answerOptions": [
- {
- "answerText": "Personalizing the customer journey using NLP",
- "isCorrect": "false"
- },
- {
- "answerText": "Wealth management using linear regression",
- "isCorrect": "true"
- },
- {
- "answerText": "Energy management using Time Series",
- "isCorrect": "false"
- }
- ]
- },
- {
- "questionText": "What ML technique can hospitals use to manage readmission?",
- "answerOptions": [
- {
- "answerText": "Clustering",
- "isCorrect": "true"
- },
- {
- "answerText": "Time Series",
- "isCorrect": "false"
- },
- {
- "answerText": "NLP",
- "isCorrect": "false"
- }
- ]
- },
- {
- "questionText": "What is an example of using Time Series for energy management?",
- "answerOptions": [
- {
- "answerText": "Motion sensing animals",
- "isCorrect": "false"
- },
- {
- "answerText": "Smart parking meters",
- "isCorrect": "true"
- },
- {
- "answerText": "Tracking forest fires",
- "isCorrect": "false"
- }
- ]
- }
- ]
- },
- {
- "id": 50,
- "title": "Real World Applications: Post-Lecture Quiz",
- "quiz": [
- {
- "questionText": "Which ML technique can be used to detect credit card fraud?",
- "answerOptions": [
- {
- "answerText": "Regression",
- "isCorrect": "false"
- },
- {
- "answerText": "Clustering",
- "isCorrect": "true"
- },
- {
- "answerText": "NLP",
- "isCorrect": "false"
- }
- ]
- },
- {
- "questionText": "Which ML technique is exemplified in forest management?",
- "answerOptions": [
- {
- "answerText": "Reinforcement Learning",
- "isCorrect": "true"
- },
- {
- "answerText": "Time Series",
- "isCorrect": "false"
- },
- {
- "answerText": "NLP",
- "isCorrect": "false"
- }
- ]
- },
- {
- "questionText": "What's an example of an ML application in the Health Care industry?",
- "answerOptions": [
- {
- "answerText": "Predicting student behavior using regression",
- "isCorrect": "false"
- },
- {
- "answerText": "Managing clinical trials using classifiers",
- "isCorrect": "true"
- },
- {
- "answerText": "Motion sensing of animals using classifiers",
- "isCorrect": "false"
- }
- ]
- }
- ]
- }
- ]
- }
+ {
+ "title": "Machine Learning for Beginners: Quizzes",
+ "complete": "Congratulations, you completed the quiz!",
+ "error": "Sorry, try again",
+ "quizzes": [
+ {
+ "id": 1,
+ "title": "Introduction to Machine Learning: Pre-Lecture Quiz",
+ "quiz": [
+ {
+ "questionText": "Applications of machine learning are all around us",
+ "answerOptions": [
+ {
+ "answerText": "True",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "False",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "What is the technical difference between classical ML and deep learning?",
+ "answerOptions": [
+ {
+ "answerText": "classical ML was invented first",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "the use of neural networks",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "deep learning is used in robots",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Why might a business want to use ML strategies?",
+ "answerOptions": [
+ {
+ "answerText": "to automate the solving of multi-dimensional problems",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "to customize a shopping experience based on the type of customer",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "both of the above",
+ "isCorrect": "true"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 2,
+ "title": "Introduction to Machine Learning: Post-Lecture Quiz",
+ "quiz": [
+ {
+ "questionText": "Machine learning algorithms are meant to simulate",
+ "answerOptions": [
+ {
+ "answerText": "intelligent machines",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "the human brain",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "orangutans",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "What is an example of a classical ML technique?",
+ "answerOptions": [
+ {
+ "answerText": "natural language processing",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "deep learning",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Neural Networks",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Why should everyone learn the basics of ML?",
+ "answerOptions": [
+ {
+ "answerText": "learning ML is fun and accessible to everyone",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "ML strategies are being used in many industries and domains",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "both of the above",
+ "isCorrect": "true"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 3,
+ "title": "History of Machine Learning: Pre-Lecture Quiz",
+ "quiz": [
+ {
+ "questionText": "Approximately when was the term 'artificial intelligence' coined?",
+ "answerOptions": [
+ {
+ "answerText": "1980s",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "1950s",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "1930s",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Who was one of the early pioneers of machine learning?",
+ "answerOptions": [
+ {
+ "answerText": "Alan Turing",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Bill Gates",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Shakey the robot",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "What is one of the reasons that advancement in AI slowed in the 1970s?",
+ "answerOptions": [
+ {
+ "answerText": "Limited compute power",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Not enough skilled engineers",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Conflicts between countries",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 4,
+ "title": "History of Machine Learning: Post-Lecture Quiz",
+ "quiz": [
+ {
+ "questionText": "What's an example of a 'scruffy' AI system?",
+ "answerOptions": [
+ {
+ "answerText": "ELIZA",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "HACKML",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "SSYSTEM",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "What is an example of a technology that was developed during 'The Golden Years'?",
+ "answerOptions": [
+ {
+ "answerText": "Blocks world",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Jibo",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Robot dogs",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Which event was foundational in the creation and expansion of the field of artificial intelligence?",
+ "answerOptions": [
+ {
+ "answerText": "Turing Test",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Dartmouth Summer Research Project",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "AI Winter",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 5,
+ "title": "Fairness and Machine Learning: Pre-Lecture Quiz",
+ "quiz": [
+ {
+ "questionText": "Unfairness in Machine Learning can happen",
+ "answerOptions": [
+ {
+ "answerText": "intentionally",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "unintentionally",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "both of the above",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "The term 'unfairness' in ML connotes:",
+ "answerOptions": [
+ {
+ "answerText": "harms for a group of people",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "harm to one person",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "harms for the majority of people",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "The five main types of harms include",
+ "answerOptions": [
+ {
+ "answerText": "allocation, quality of service, stereotyping, denigration, and over- or under- representation",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "elocation, quality of service, stereotyping, denigration, and over- or under- representation ",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "allocation, quality of service, stereophonics, denigration, and over- or under- representation ",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 6,
+ "title": "Fairness and Machine Learning: Post-Lecture Quiz",
+ "quiz": [
+ {
+ "questionText": "Unfairness in a model can be caused by",
+ "answerOptions": [
+ {
+ "answerText": "over reliance on historical data",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "under reliance on historical data",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "too closely aligning to historical data",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "To mitigate unfairness, you can",
+ "answerOptions": [
+ {
+ "answerText": "identify harms and affected groups",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "define fairness metrics",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "both the above",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "Fairlearn is a package that can",
+ "answerOptions": [
+ {
+ "answerText": "compare multiple models by using fairness and performance metrics",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "choose the best model for your needs",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "help you decide what is fair and what is not",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 7,
+ "title": "Tools and Techniques: Pre-Lecture Quiz",
+ "quiz": [
+ {
+ "questionText": "When building a model, you should:",
+ "answerOptions": [
+ {
+ "answerText": "prepare your data, then train your model",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "choose a training method, then prepare your data",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "tune parameters, then train your model",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Your data's ___ will impact the quality of your ML model",
+ "answerOptions": [
+ {
+ "answerText": "quantity",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "shape",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "both of the above",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "A feature variable is:",
+ "answerOptions": [
+ {
+ "answerText": "a quality of your data",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "a measurable property of your data",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "a row of your data",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 8,
+ "title": "Tools and Techniques: Post-Lecture Quiz",
+ "quiz": [
+ {
+ "questionText": "You should visualize your data because",
+ "answerOptions": [
+ {
+ "answerText": "you can discover outliers",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "you can discover potential cause for bias",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "both of these",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "Split your data into:",
+ "answerOptions": [
+ {
+ "answerText": "training and turing sets",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "training and test sets",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "validation and evaluation sets",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "A common command to start the training process in various ML libraries is:",
+ "answerOptions": [
+ {
+ "answerText": "model.travel",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "model.train",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "model.fit",
+ "isCorrect": "true"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 9,
+ "title": "Introduction to Regression: Pre-Lecture Quiz",
+ "quiz": [
+ {
+ "questionText": "Which of these variables is a numeric variable?",
+ "answerOptions": [
+ {
+ "answerText": "Height",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Gender",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Hair Color",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Which of these variables is a categorical variable?",
+ "answerOptions": [
+ {
+ "answerText": "Heart Rate",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Blood Type",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Weight",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Which of these problems is a Regression analysis-based problem?",
+ "answerOptions": [
+ {
+ "answerText": "Predicting the final exam marks of a student",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Predicting the blood type of a person",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Predicting whether an email is spam or not",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 10,
+ "title": "Introduction to Regression: Post-Lecture Quiz",
+ "quiz": [
+ {
+ "questionText": "If your Machine Learning model's training accuracy is 95 % and the testing accuracy is 30 %, then what type of condition it is called?",
+ "answerOptions": [
+ {
+ "answerText": "Overfitting",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Underfitting",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Double Fitting",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "The process of identifying significant features from a set of features is called:",
+ "answerOptions": [
+ {
+ "answerText": "Feature Extraction",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Feature Dimensionality Reduction",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Feature Selection",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "The process of splitting a dataset into a certain ratio of training and testing dataset using Scikit Learn's 'train_test_split()' method/function is called:",
+ "answerOptions": [
+ {
+ "answerText": "Cross-Validation",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Hold-Out Validation",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Leave one out Validation",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 11,
+ "title": "Prepare and Visualize Data for Regression: Pre-Lecture Quiz",
+ "quiz": [
+ {
+ "questionText": "Which of these Python modules is used to plot the visualization of data?",
+ "answerOptions": [
+ {
+ "answerText": "Numpy",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Scikit-learn",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Matplotlib",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "If you want to understand the spread or the other characteristics of data points of your dataset, then perform:",
+ "answerOptions": [
+ {
+ "answerText": "Data Visualization",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Data Preprocessing",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Train Test Split",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Which of these is a part of the Data Visualization step in a Machine Learning project?",
+ "answerOptions": [
+ {
+ "answerText": "Incorporating a certain Machine Learning algorithm",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Creating a pictorial representation of data using different plotting methods",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Normalizing the values of a dataset",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 12,
+ "title": "Prepare and Visualize Data for Regression: Post-Lecture Quiz",
+ "quiz": [
+ {
+ "questionText": "Which of these code snippets is correct based on this lesson, if you want to check for the presence of missing values in your dataset? Suppose the dataset is stored in a variable named 'dataset' which is a Pandas DataFrame object.",
+ "answerOptions": [
+ {
+ "answerText": "dataset.isnull().sum()",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "findMissing(dataset)",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "sum(null(dataset))",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Which of these plotting methods is useful when you would like to understand the spread of different groups of datapoints from your dataset?",
+ "answerOptions": [
+ {
+ "answerText": "Scatter Plot",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Line Plot",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Bar Plot",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "What can Data Visualization NOT tell you?",
+ "answerOptions": [
+ {
+ "answerText": "Relationships among datapoints",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "The source from where the dataset is collected",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Finding the presence of outliers in the dataset",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 13,
+ "title": "Linear and Polynomial Regression: Pre-Lecture Quiz",
+ "quiz": [
+ {
+ "questionText": "Matplotlib is a ",
+ "answerOptions": [
+ {
+ "answerText": "drawing library",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "data visualization library",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "lending library",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Linear Regression uses the following to plot relationships between variables",
+ "answerOptions": [
+ {
+ "answerText": "a straight line",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "a circle",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "a curve",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "A good Linear Regression model has a ___ Correlation Coefficient",
+ "answerOptions": [
+ {
+ "answerText": "low",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "high",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "flat",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 14,
+ "title": "Linear and Polynomial Regression: Post-Lecture Quiz",
+ "quiz": [
+ {
+ "questionText": "If your data is nonlinear, try a ___ type of Regression",
+ "answerOptions": [
+ {
+ "answerText": "linear",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "spherical",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "polynomial",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "These are all types of Regression methods",
+ "answerOptions": [
+ {
+ "answerText": "Falsestep, Ridge, Lasso and Elasticnet",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Stepwise, Ridge, Lasso and Elasticnet",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Stepwise, Ridge, Lariat and Elasticnet",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Least-Squares Regression means that all the datapoints surrounding the regression line are:",
+ "answerOptions": [
+ {
+ "answerText": "squared and then subtracted",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "multiplied",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "squared and then added up",
+ "isCorrect": "true"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 15,
+ "title": "Logistic Regression: Pre-Lecture Quiz",
+ "quiz": [
+ {
+ "questionText": "Use Logistic Regression to predict",
+ "answerOptions": [
+ {
+ "answerText": "whether an apple is ripe or not",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "how many tickets can be sold in a month",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "what color the sky will turn tomorrow at 6 PM",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Types of Logistic Regression include",
+ "answerOptions": [
+ {
+ "answerText": "multinomial and cardinal",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "multinomial and ordinal",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "principal and ordinal",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Your data has weak correlations. The best type of Regression to use is:",
+ "answerOptions": [
+ {
+ "answerText": "Logistic",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Linear",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Cardinal",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 16,
+ "title": "Logistic Regression: Post-Lecture Quiz",
+ "quiz": [
+ {
+ "questionText": "Seaborn is a type of",
+ "answerOptions": [
+ {
+ "answerText": "data visualization library",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "mapping library",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "mathematical library",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "A confusion matrix is also known as a:",
+ "answerOptions": [
+ {
+ "answerText": "error matrix",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "truth matrix",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "accuracy matrix",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "A good model will have:",
+ "answerOptions": [
+ {
+ "answerText": "a large number of false positives and true negatives in its confusion matrix",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "a large number of true positives and true negatives in its confusion matrix",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "a large number of true positives and false negatives in its confusion matrix",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 17,
+ "title": "Build a Web App: Pre-Lecture Quiz",
+ "quiz": [
+ {
+ "questionText": "What does ONNX stand for?",
+ "answerOptions": [
+ {
+ "answerText": "Over Neural Network Exchange",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Open Neural Network Exchange",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Output Neural Network Exchange",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "How is Flask defined by its creators?",
+ "answerOptions": [
+ {
+ "answerText": "mini-framework",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "large-framework",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "micro-framework",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "What does the Pickle module of Python do",
+ "answerOptions": [
+ {
+ "answerText": "Serializes a Python Object",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "De-serializes a Python Object",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Serializes and De-serializes a Python Object",
+ "isCorrect": "true"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 18,
+ "title": "Build a Web App: Post-Lecture Quiz",
+ "quiz": [
+ {
+ "questionText": "What are the tools we can use to host a pre-trained model on the web using Python?",
+ "answerOptions": [
+ {
+ "answerText": "Flask",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "TensorFlow.js",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "onnx.js",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "What does SaaS stand for?",
+ "answerOptions": [
+ {
+ "answerText": "System as a Service",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Software as a Service",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Security as a Service",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "What does Scikit-learn's LabelEncoder library do?",
+ "answerOptions": [
+ {
+ "answerText": "Encodes data alphabetically",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Encodes data numerically",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Encodes data serially",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 19,
+ "title": "Classification 1: Pre-Lecture Quiz",
+ "quiz": [
+ {
+ "questionText": "Classification is a form of supervised learning that has a lot in common with",
+ "answerOptions": [
+ {
+ "answerText": "Time Series",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Regression techniques",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "NLP",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "What question can classification help answer?",
+ "answerOptions": [
+ {
+ "answerText": "Is this email spam or not?",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Can pigs fly?",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "What is the meaning of life?",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "What is the first step to using Classification techniques?",
+ "answerOptions": [
+ {
+ "answerText": "creating classes of a dataset",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "cleaning and balancing your data",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "assigning a data point to a group or outcome",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 20,
+ "title": "Classification 1: Post-Lecture Quiz",
+ "quiz": [
+ {
+ "questionText": "What is a multiclass question?",
+ "answerOptions": [
+ {
+ "answerText": "the task of classifying data points into multiple classes",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "the task of classifying data points into one of several classes",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "the task of cleaning data points in multiple ways",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "It's important to clean out recurrent or unhelpful data to help your classifiers solve your problem.",
+ "answerOptions": [
+ {
+ "answerText": "true",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "false",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "What's the best reason to balance your data?",
+ "answerOptions": [
+ {
+ "answerText": "Imbalanced data looks bad in visualizations",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Balancing your data yields better results because an ML model won't skew towards one class",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Balancing your data gives you more data points",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 21,
+ "title": "Classification 2: Pre-Lecture Quiz",
+ "quiz": [
+ {
+ "questionText": "Balanced, clean data yields the best classification results",
+ "answerOptions": [
+ {
+ "answerText": "true",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "false",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "How do you choose the right classifier?",
+ "answerOptions": [
+ {
+ "answerText": "Understand which classifiers work best for which scenarios",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Educated guess and check",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Both of the above",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "Classification is a type of",
+ "answerOptions": [
+ {
+ "answerText": "NLP",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Supervised Learning",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Programming language",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 22,
+ "title": "Classification 2: Post-Lecture Quiz",
+ "quiz": [
+ {
+ "questionText": "What is a 'solver'?",
+ "answerOptions": [
+ {
+ "answerText": "the person who double-checks your work",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "the algorithm to use in the optimization problem",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "a machine learning technique",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Which classifier did we use in this lesson?",
+ "answerOptions": [
+ {
+ "answerText": "Logistic Regression",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Decision Trees",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "One-vs-All Multiclass",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "How do you know if the classification algorithm is working as expected?",
+ "answerOptions": [
+ {
+ "answerText": "By checking the accuracy of its predictions",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "By checking it against other algorithms",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "By looking at historical data for how good this algorithm is at solving similar problems",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 23,
+ "title": "Classification 3: Pre-Lecture Quiz",
+ "quiz": [
+ {
+ "questionText": "A good initial classifier to try is:",
+ "answerOptions": [
+ {
+ "answerText": "Linear SVC",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "K-Means",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Logical SVC",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Regularization controls:",
+ "answerOptions": [
+ {
+ "answerText": "the influence of parameters",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "the influence of training speed",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "the influence of outliers",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "K-Neighbors classifier can be used for:",
+ "answerOptions": [
+ {
+ "answerText": "supervised learning",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "unsupervised learning",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "both of these",
+ "isCorrect": "true"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 24,
+ "title": "Classification 3: Post-Lecture Quiz",
+ "quiz": [
+ {
+ "questionText": "Support-Vector classifiers can be used for",
+ "answerOptions": [
+ {
+ "answerText": "classification",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "regression",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "both of these",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "Random Forest is a ___ type of classifier",
+ "answerOptions": [
+ {
+ "answerText": "Ensemble",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Dissemble",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Assemble",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Adaboost is known for:",
+ "answerOptions": [
+ {
+ "answerText": "focusing on the weights of incorrectly classified items",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "focusing on outliers",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "focusing on incorrect data",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 25,
+ "title": "Classification 4: Pre-Lecture Quiz",
+ "quiz": [
+ {
+ "questionText": "Recommendation systems might be used for",
+ "answerOptions": [
+ {
+ "answerText": "Recommending a good restaurant",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Recommending fashions to try",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Both of these",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "Embedding a model in a web app helps it to be offline-capable",
+ "answerOptions": [
+ {
+ "answerText": "true",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "false",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Onnx Runtime can be used for",
+ "answerOptions": [
+ {
+ "answerText": "Running models in a web app",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Training models",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Hyperparameter tuning",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 26,
+ "title": "Classification 4: Post-Lecture Quiz",
+ "quiz": [
+ {
+ "questionText": "Netron app helps you:",
+ "answerOptions": [
+ {
+ "answerText": "Visualize data",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Visualize your model's structure",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Test your web app",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Convert your Scikit-learn model for use with Onnx using:",
+ "answerOptions": [
+ {
+ "answerText": "sklearn-app",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "sklearn-web",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "sklearn-onnx",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "Using your model in a web app is called:",
+ "answerOptions": [
+ {
+ "answerText": "inference",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "interference",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "insurance",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 27,
+ "title": "Introduction to Clustering: Pre-Lecture Quiz",
+ "quiz": [
+ {
+ "questionText": "A real-life example of clustering would be",
+ "answerOptions": [
+ {
+ "answerText": "Setting the dinner table",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Sorting the laundry",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Grocery shopping",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Clustering techniques can be used in these industries",
+ "answerOptions": [
+ {
+ "answerText": "banking",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "e-commerce",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "both of these",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "Clustering is a type of:",
+ "answerOptions": [
+ {
+ "answerText": "supervised learning",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "unsupervised learning",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "reinforcement learning",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 28,
+ "title": "Introduction to Clustering: Post-Lecture Quiz",
+ "quiz": [
+ {
+ "questionText": "Euclidean geometry is arranged along",
+ "answerOptions": [
+ {
+ "answerText": "planes",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "curves",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "spheres",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "The density of your clustering data is related to its",
+ "answerOptions": [
+ {
+ "answerText": "noise",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "depth",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "validity",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "The best-known clustering algorithm is",
+ "answerOptions": [
+ {
+ "answerText": "k-means",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "k-middle",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "k-mart",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 29,
+ "title": "K-Means Clustering: Pre-Lecture Quiz",
+ "quiz": [
+ {
+ "questionText": "K-Means is derived from:",
+ "answerOptions": [
+ {
+ "answerText": "electrical engineering",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "signal processing",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "computational linguistics",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "A good Silhouette score means:",
+ "answerOptions": [
+ {
+ "answerText": "clusters are well-separated and well-defined",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "there are few clusters",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "there are many clusters",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Variance is:",
+ "answerOptions": [
+ {
+ "answerText": "the average of the squared differences from the mean",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "a problem for clustering if it becomes too high",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "both of these",
+ "isCorrect": "true"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 30,
+ "title": "K-Means Clustering: Post-Lecture Quiz",
+ "quiz": [
+ {
+ "questionText": "A Voronoi diagram shows:",
+ "answerOptions": [
+ {
+ "answerText": "a cluster's variance",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "a cluster's seed and its region",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "a cluster's inertia",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Inertia is",
+ "answerOptions": [
+ {
+ "answerText": "a measure of how internally coherent clusters are",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "a measure of how much clusters move",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "a measure of cluster quality",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+          "questionText": "When using K-Means, you must first determine the value of 'k'",
+ "answerOptions": [
+ {
+ "answerText": "true",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "false",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 31,
+ "title": "Intro to NLP: Pre-Lecture Quiz",
+ "quiz": [
+ {
+ "questionText": "What does NLP stand for in these lessons?",
+ "answerOptions": [
+ {
+ "answerText": "Neural Language Processing",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "natural language processing",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Natural Linguistic Processing",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Eliza was an early bot that acted as a computer",
+ "answerOptions": [
+ {
+ "answerText": "therapist",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "doctor",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "nurse",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Alan Turing's 'Turing Test' tried to determine if a computer was",
+ "answerOptions": [
+ {
+ "answerText": "indistinguishable from a human",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "thinking",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "both of the above",
+ "isCorrect": "true"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 32,
+ "title": "Intro to NLP: Post-Lecture Quiz",
+ "quiz": [
+ {
+ "questionText": "Joseph Weizenbaum invented the bot",
+ "answerOptions": [
+ {
+ "answerText": "Elisha",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Eliza",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Eloise",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "A conversational bot gives output based on",
+ "answerOptions": [
+ {
+            "answerText": "Randomly choosing predefined responses",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Analyzing the input and using machine intelligence",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Both of these",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "How would you make the bot more effective?",
+ "answerOptions": [
+ {
+ "answerText": "By asking it more questions.",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "By feeding it more data and training it accordingly",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "The bot is dumb, it cannot learn :(",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 33,
+ "title": "NLP Tasks: Pre-Lecture Quiz",
+ "quiz": [
+ {
+ "questionText": "Tokenization",
+ "answerOptions": [
+ {
+ "answerText": "Splits text by means of punctuation",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Splits text into separate tokens (words)",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Splits text into phrases",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Embeddings",
+ "answerOptions": [
+ {
+            "answerText": "convert text data into numbers so that words can cluster",
+ "isCorrect": "true"
+ },
+ {
+            "answerText": "embed words into phrases",
+ "isCorrect": "false"
+ },
+ {
+            "answerText": "embed sentences into paragraphs",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Parts-of-Speech Tagging",
+ "answerOptions": [
+ {
+ "answerText": "divides sentences by their parts of speech",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "takes tokenized words and tags them by their part of speech",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "diagrams sentences",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 34,
+ "title": "NLP Tasks: Post-Lecture Quiz",
+ "quiz": [
+ {
+ "questionText": "Build a dictionary of how often words reoccur using:",
+ "answerOptions": [
+ {
+ "answerText": "Word and Phrase Dictionary",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Word and Phrase Frequencies",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Word and Phrase Library",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "N-grams refer to",
+ "answerOptions": [
+ {
+ "answerText": "A text can be split into sequences of words of a set length",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "A word can be split into sequences of characters of a set length",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "A text can be split into paragraphs of a set length",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Sentiment analysis",
+ "answerOptions": [
+ {
+ "answerText": "analyzes a phrase for positivity or negativity",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "analyzes a phrase for sentimentality",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "analyzes a phrase for sadness",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 35,
+ "title": "NLP and Translation: Pre-Lecture Quiz",
+ "quiz": [
+ {
+ "questionText": "Naive translation",
+ "answerOptions": [
+ {
+ "answerText": "translates words only",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "translates sentence structure",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "translates sentiment",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "A *corpus* of texts refers to",
+ "answerOptions": [
+ {
+ "answerText": "A small number of texts",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "A large number of texts",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "One standard text",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+          "questionText": "If an ML model has enough human translations to build on, it can",
+ "answerOptions": [
+ {
+ "answerText": "abbreviate translations",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "standardize translations",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "improve the accuracy of translations",
+ "isCorrect": "true"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 36,
+ "title": "NLP and Translation: Post-Lecture Quiz",
+ "quiz": [
+ {
+ "questionText": "Underlying TextBlob's translation library is:",
+ "answerOptions": [
+ {
+ "answerText": "Google Translate",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Bing",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "A custom ML model",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "To use `blob.translate` you need:",
+ "answerOptions": [
+ {
+ "answerText": "an internet connection",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "a dictionary",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "JavaScript",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "To determine sentiment, an ML approach would be to:",
+ "answerOptions": [
+ {
+ "answerText": "apply Regression techniques to manually generated opinions and scores and look for patterns",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "apply NLP techniques to manually generated opinions and scores and look for patterns",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "apply Clustering techniques to manually generated opinions and scores and look for patterns",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 37,
+ "title": "NLP 4: Pre-Lecture Quiz",
+ "quiz": [
+ {
+ "questionText": "What information can we get from text that was written or spoken by a human?",
+ "answerOptions": [
+ {
+ "answerText": "patterns and frequencies",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "sentiment and meaning",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "both of the above",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "What is sentiment analysis?",
+ "answerOptions": [
+ {
+ "answerText": "a study of whether a family heirloom has sentimental value",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "a method of systematically identifying, extracting, quantifying, and studying affective states and subjective information",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "the ability to tell whether someone is sad or happy",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "What question could be answered using a dataset of hotel reviews, Python, and sentiment analysis?",
+ "answerOptions": [
+ {
+ "answerText": "What are the most frequently used words and phrases in reviews?",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Which resort has the best pool?",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Is there valet parking at this hotel?",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 38,
+ "title": "NLP 4: Post-Lecture Quiz",
+ "quiz": [
+ {
+ "questionText": "What is the essence of NLP?",
+ "answerOptions": [
+ {
+ "answerText": "categorizing human language into happy or sad",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "interpreting meaning or sentiment without having to have a human do it",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "finding outliers in sentiment and examining them",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "What are some things you might look for while cleaning data?",
+ "answerOptions": [
+ {
+ "answerText": "characters in other languages",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "blank rows or columns",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "both of the above",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "It is important to understand your data and its foibles before performing operations on it.",
+ "answerOptions": [
+ {
+ "answerText": "true",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "false",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 39,
+ "title": "NLP 5: Pre-Lecture Quiz",
+ "quiz": [
+ {
+ "questionText": "Why is it important to clean data before analyzing it?",
+ "answerOptions": [
+ {
+ "answerText": "Some columns might have missing or incorrect data",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Messy data can lead to false conclusions about the dataset",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Both of the above",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "What is one example of a strategy for cleaning data?",
+ "answerOptions": [
+ {
+ "answerText": "removing columns/rows that aren't useful for answering a specific question",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "getting rid of verified values that don't fit your hypothesis",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "moving the outliers to a separate table and running the calculations for that table to see if they match",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "It can be useful to categorize data using a Tag column.",
+ "answerOptions": [
+ {
+ "answerText": "true",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "false",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 40,
+ "title": "NLP 5: Post-Lecture Quiz",
+ "quiz": [
+ {
+ "questionText": "What is the goal of the dataset?",
+ "answerOptions": [
+ {
+ "answerText": "to see how many negative and positive reviews there are for hotels across the world",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "to add sentiment and columns that will help you choose the best hotel",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "to analyze why people leave specific reviews",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "What are stop words?",
+ "answerOptions": [
+ {
+ "answerText": "common English words that do not change the sentiment of a sentence",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "words that you can remove to speed up sentiment analysis",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "both of the above",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "To test the sentiment analysis, make sure it matches the reviewer's score for the same review.",
+ "answerOptions": [
+ {
+ "answerText": "true",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "false",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 41,
+ "title": "Intro to Time Series: Pre-Lecture Quiz",
+ "quiz": [
+ {
+ "questionText": "Time Series Forecasting is useful in",
+ "answerOptions": [
+ {
+ "answerText": "determining future costs",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "predicting future pricing",
+ "isCorrect": "false"
+ },
+ {
+            "answerText": "both of the above",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "A time series is a sequence taken at:",
+ "answerOptions": [
+ {
+ "answerText": "successive equally spaced points in space",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "successive equally spaced points in time",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "successive equally spaced points in space and time",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Time series can be used in:",
+ "answerOptions": [
+ {
+ "answerText": "earthquake prediction",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "computer vision",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "color analysis",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 42,
+ "title": "Intro to Time Series: Post-Lecture Quiz",
+ "quiz": [
+ {
+ "questionText": "Time series trends are",
+ "answerOptions": [
+ {
+ "answerText": "Measurable increases and decreases over time",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Quantifying decreases over time",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Gaps between increases and decreases over time",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Outliers are",
+ "answerOptions": [
+ {
+ "answerText": "points close to standard data variance",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "points far away from standard data variance",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "points within standard data variance",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Time Series Forecasting is most useful for",
+ "answerOptions": [
+ {
+ "answerText": "Econometrics",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "History",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Libraries",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 43,
+ "title": "Time Series ARIMA: Pre-Lecture Quiz",
+ "quiz": [
+ {
+ "questionText": "ARIMA stands for",
+ "answerOptions": [
+ {
+ "answerText": "AutoRegressive Integral Moving Average",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "AutoRegressive Integrated Moving Action",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "AutoRegressive Integrated Moving Average",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "Stationarity refers to",
+ "answerOptions": [
+ {
+            "answerText": "data whose attributes do not change when shifted in time",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "data whose distribution does not change when shifted in time",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "data whose distribution changes when shifted in time",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Differencing",
+ "answerOptions": [
+ {
+ "answerText": "stabilizes trend and seasonality",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "exacerbates trend and seasonality",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "eliminates trend and seasonality",
+ "isCorrect": "true"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 44,
+ "title": "Time Series ARIMA: Post-Lecture Quiz",
+ "quiz": [
+ {
+ "questionText": "ARIMA is used to make a model fit the special form of time series data",
+ "answerOptions": [
+ {
+ "answerText": "as flat as possible",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "as closely as possible",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "via scatterplots",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Use SARIMAX to",
+ "answerOptions": [
+ {
+ "answerText": "manage seasonal ARIMA models",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "manage special ARIMA models",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "manage statistical ARIMA models",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "'Walk-Forward' validation involves",
+ "answerOptions": [
+ {
+ "answerText": "re-evaluating a model progressively as it is validated",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "re-training a model progressively as it is validated",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "re-configuring a model progressively as it is validated",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 45,
+ "title": "Reinforcement 1: Pre-Lecture Quiz",
+ "quiz": [
+ {
+ "questionText": "What is reinforcement learning?",
+ "answerOptions": [
+ {
+ "answerText": "teaching someone something over and over again until they understand",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "a learning technique that deciphers the optimal behavior of an agent in some environment by running many experiments",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "understanding how to run multiple experiments at once",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "What is a policy?",
+ "answerOptions": [
+ {
+ "answerText": "a function that returns the action at any given state",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "a document that tells you whether or not you can return an item",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "a function that is used for a random purpose",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "A reward function returns a score for each state of an environment.",
+ "answerOptions": [
+ {
+ "answerText": "true",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "false",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 46,
+ "title": "Reinforcement 1: Post-Lecture Quiz",
+ "quiz": [
+ {
+ "questionText": "What is Q-Learning?",
+ "answerOptions": [
+ {
+ "answerText": "a mechanism for recording the 'goodness' of each state",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "an algorithm where the policy is defined by a Q-Table",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "both of the above",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "For what values does a Q-Table correspond to the random walk policy?",
+ "answerOptions": [
+ {
+ "answerText": "all equal values",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "-0.25",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "all different values",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "It was better to use exploration than exploitation during the learning process in our lesson.",
+ "answerOptions": [
+ {
+ "answerText": "true",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "false",
+ "isCorrect": "true"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 47,
+ "title": "Reinforcement 2: Pre-Lecture Quiz",
+ "quiz": [
+ {
+ "questionText": "Chess and Go are games with continuous states.",
+ "answerOptions": [
+ {
+ "answerText": "true",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "false",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "What is the CartPole problem?",
+ "answerOptions": [
+ {
+ "answerText": "a process for eliminating outliers",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "a method for optimizing your shopping cart",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "a simplified version of balancing",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "What tool can we use to play out different scenarios of potential states in a game?",
+ "answerOptions": [
+ {
+ "answerText": "guess and check",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "simulation environments",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "state transition testing",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 48,
+ "title": "Reinforcement 2: Post-Lecture Quiz",
+ "quiz": [
+ {
+ "questionText": "Where do we define all possible actions in an environment?",
+ "answerOptions": [
+ {
+ "answerText": "methods",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "action space",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "action list",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+          "questionText": "What did we use as the dictionary key-value pair?",
+ "answerOptions": [
+ {
+ "answerText": "(state, action) as the key, Q-Table entry as the value",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "state as the key, action as the value",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "the value of the qvalues function as the key, action as the value",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "What are the hyperparameters we used during Q-Learning?",
+ "answerOptions": [
+ {
+ "answerText": "q-table value, current reward, random action",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "learning rate, discount factor, exploration/exploitation factor",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "cumulative rewards, learning rate, exploration factor",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 49,
+ "title": "Real World Applications: Pre-Lecture Quiz",
+ "quiz": [
+ {
+ "questionText": "What's an example of an ML application in the Finance industry?",
+ "answerOptions": [
+ {
+ "answerText": "Personalizing the customer journey using NLP",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Wealth management using linear regression",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Energy management using Time Series",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "What ML technique can hospitals use to manage readmission?",
+ "answerOptions": [
+ {
+ "answerText": "Clustering",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Time Series",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "NLP",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "What is an example of using Time Series for energy management?",
+ "answerOptions": [
+ {
+ "answerText": "Motion sensing animals",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Smart parking meters",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Tracking forest fires",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 50,
+ "title": "Real World Applications: Post-Lecture Quiz",
+ "quiz": [
+ {
+ "questionText": "Which ML technique can be used to detect credit card fraud?",
+ "answerOptions": [
+ {
+ "answerText": "Regression",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Clustering",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "NLP",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Which ML technique is exemplified in forest management?",
+ "answerOptions": [
+ {
+ "answerText": "Reinforcement Learning",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Time Series",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "NLP",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "What's an example of an ML application in the Health Care industry?",
+ "answerOptions": [
+ {
+ "answerText": "Predicting student behavior using regression",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Managing clinical trials using classifiers",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Motion sensing of animals using classifiers",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ }
+ ]
+ }
]
diff --git a/quiz-app/src/assets/translations/fr.json b/quiz-app/src/assets/translations/fr.json
new file mode 100644
index 00000000..9b946ab5
--- /dev/null
+++ b/quiz-app/src/assets/translations/fr.json
@@ -0,0 +1,2811 @@
+[
+ {
+ "title": "Machine Learning pour les Débutants: Quiz",
+ "complete": "Félicitations, vous avez terminé le quiz!",
+ "error": "Désolé, essayez à nouveau",
+ "quizzes": [
+ {
+ "id": 1,
+ "title": "Introduction au machine learning: Quiz préalable",
+ "quiz": [
+ {
+            "questionText": "Les applications du machine learning sont tout autour de nous",
+ "answerOptions": [
+ {
+ "answerText": "Vrai",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Faux",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+            "questionText": "Quelle est la différence technique entre le ML classique et le deep learning?",
+ "answerOptions": [
+ {
+                "answerText": "Le ML classique a été inventé en premier",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "L'utilisation de réseaux de neurones",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Le deep learning est utilisé dans les robots",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Pourquoi une entreprise pourrait-elle vouloir utiliser des stratégies ML?",
+ "answerOptions": [
+ {
+ "answerText": "Pour automatiser la résolution de problèmes multidimensionnels",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Pour personnaliser une expérience de magasinage basée sur le type de client",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Les deux",
+ "isCorrect": "true"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 2,
+ "title": "Introduction au machine learning: Quiz de validation des connaissances",
+ "quiz": [
+ {
+ "questionText": "Les algorithmes de machine learning sont destinés à simuler",
+ "answerOptions": [
+ {
+ "answerText": "Des machines intelligentes",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Le cerveau humain",
+ "isCorrect": "true"
+ },
+ {
+                "answerText": "Des orangs-outans",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Qu'est-ce qu'un exemple de technique classique de ML?",
+ "answerOptions": [
+ {
+                "answerText": "Le traitement du langage naturel",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Le deep learning",
+ "isCorrect": "false"
+ },
+ {
+                "answerText": "Des réseaux de neurones",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Pourquoi tout le monde devrait-il apprendre les bases du ML?",
+ "answerOptions": [
+ {
+                "answerText": "L'apprentissage du ML est amusant et accessible à tout le monde",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Les stratégies ML sont utilisées dans de nombreuses industries et domaines",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Les deux",
+ "isCorrect": "true"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 3,
+ "title": "Historique du machine learning: Quiz préalable",
+ "quiz": [
+ {
+ "questionText": "Quand approximativement le terme 'intelligence artificielle' a-t-il été inventé ?",
+ "answerOptions": [
+ {
+                "answerText": "années 1980",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "années 1950",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "années 1930",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Qui était l'un des premiers pionniers du machine learning?",
+ "answerOptions": [
+ {
+ "answerText": "Alan Turing",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Bill Gates",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Shakey the Robot",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+            "questionText": "Quelle est l'une des raisons pour lesquelles l'avancement de l'IA a ralenti dans les années 1970?",
+ "answerOptions": [
+ {
+ "answerText": "Puissance de calcul limitée",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Pas assez d'ingénieurs qualifiés",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Conflits entre pays",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 4,
+ "title": "Historique du machine learning: Quiz de validation des connaissances",
+ "quiz": [
+ {
+            "questionText": "Qu'est-ce qu'un exemple de système d'IA \"Scruffy\" ?",
+ "answerOptions": [
+ {
+ "answerText": "ELIZA",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "HACKML",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "SSYSTEM",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+            "questionText": "Quel est un exemple de technologie développée pendant les « années d'or » ?",
+ "answerOptions": [
+ {
+ "answerText": "Blocks World",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Jibo",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Robot Dogs",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+            "questionText": "Quel événement a été fondateur pour la création et l'expansion du domaine de l'intelligence artificielle?",
+ "answerOptions": [
+ {
+ "answerText": "Turing Test",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Projet de recherche d'été de Dartmouth",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "AI Winter",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 5,
+ "title": "L'équité et le machine learning: Quiz préalable",
+ "quiz": [
+ {
+ "questionText": "L'injustice dans le machine learning peut arriver",
+ "answerOptions": [
+ {
+ "answerText": "Intentionnellement",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Involontairement",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Les deux",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+            "questionText": "Le terme \"injustice\" en ML connote :",
+ "answerOptions": [
+ {
+                "answerText": "Préjudices pour un groupe de personnes",
+ "isCorrect": "true"
+ },
+ {
+                "answerText": "Préjudice pour une personne",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Préjudices pour la majorité des gens",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Les cinq principaux types de préjudices incluent",
+ "answerOptions": [
+ {
+ "answerText": "Allocation, qualité de service, stéréotypage, dénigration et sous-représentation",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Elocation, qualité de service, stéréotypage, dénigration et sous-représentation",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Allocation, qualité de service, stéréophonie, dénigration et sous-représentation",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 6,
+ "title": "L'équité et le machine learning: Quiz de validation des connaissances",
+ "quiz": [
+ {
+ "questionText": "L'injustice dans un modèle peut être causée par",
+ "answerOptions": [
+ {
+                "answerText": "Dépendance excessive aux données historiques",
+ "isCorrect": "true"
+ },
+ {
+                "answerText": "Sous-dépendance aux données historiques",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Alignement trop étroit sur les données historiques",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Pour atténuer l'injustice, vous pouvez",
+ "answerOptions": [
+ {
+ "answerText": "Identifier les préjudices et les groupes affectés",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Définir les métriques d'équité",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Les deux",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "Fairlearn est un paquet qui peut",
+ "answerOptions": [
+ {
+ "answerText": "Comparer plusieurs modèles en utilisant des métriques d'équité et de performance",
+ "isCorrect": "true"
+ },
+ {
+                "answerText": "Choisir le meilleur modèle pour vos besoins",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Vous aider à décider de ce qui est juste et ce qui ne l'est pas",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 7,
+ "title": "Outils et techniques: Quiz préalable",
+ "quiz": [
+ {
+ "questionText": "Lors de la construction d'un modèle, vous devriez:",
+ "answerOptions": [
+ {
+ "answerText": "Préparez vos données, puis formez votre modèle",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Choisissez une méthode de formation, puis préparez vos données",
+ "isCorrect": "false"
+ },
+ {
+                "answerText": "Réglez les paramètres, puis formez votre modèle",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Vos données ___ vont avoir une incidence sur la qualité de votre modèle ML",
+ "answerOptions": [
+ {
+ "answerText": "Quantité",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Forme",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Les deux",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+            "questionText": "Une variable de fonctionnalité est :",
+ "answerOptions": [
+ {
+                "answerText": "Une qualité de vos données",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Une propriété mesurable de vos données",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Une ligne de vos données",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 8,
+ "title": "Outils et techniques: Quiz de validation des connaissances",
+ "quiz": [
+ {
+ "questionText": "Vous devez visualiser vos données car",
+ "answerOptions": [
+ {
+ "answerText": "Vous pouvez découvrir des valeurs aberrantes",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Vous pouvez découvrir une cause potentielle de biais",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Les deux",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+            "questionText": "Divisez vos données en :",
+ "answerOptions": [
+ {
+                "answerText": "Ensembles d'entraînement et de Turing",
+ "isCorrect": "false"
+ },
+ {
+                "answerText": "Ensembles d'entraînement et de test",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Ensembles de validation et d'évaluation",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+            "questionText": "Une commande courante pour démarrer le processus de formation dans diverses bibliothèques ML est :",
+ "answerOptions": [
+ {
+ "answerText": "Model.travel",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Model.train",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Model.fit",
+ "isCorrect": "true"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 9,
+ "title": "Introduction à la régression: Quiz préalable",
+ "quiz": [
+ {
+ "questionText": "Laquelle de ces variables est une variable numérique?",
+ "answerOptions": [
+ {
+                "answerText": "Taille",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Genre",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Couleur des cheveux",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Laquelle de ces variables est une variable catégorique?",
+ "answerOptions": [
+ {
+                "answerText": "Rythme cardiaque",
+ "isCorrect": "false"
+ },
+ {
+                "answerText": "Groupe sanguin",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Poids",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Lequel de ces problèmes est un problème basé sur l'analyse de régression?",
+ "answerOptions": [
+ {
+                "answerText": "Prédire les notes d'examen final d'un étudiant",
+ "isCorrect": "true"
+ },
+ {
+                "answerText": "Prédire le groupe sanguin d'une personne",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Prédire si un email est spam ou non",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 10,
+ "title": "Introduction à la régression: Quiz de validation des connaissances",
+ "quiz": [
+ {
+            "questionText": "Si la précision de formation de votre modèle de machine learning est de 95% et que la précision de test est de 30%, comment appelle-t-on cette situation ?",
+ "answerOptions": [
+ {
+ "answerText": "Surapprentissage",
+ "isCorrect": "true"
+ },
+ {
+                "answerText": "Sous-apprentissage",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Double ajustement",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Le processus d'identification des fonctionnalités significatives d'un ensemble de fonctionnalités est appelé:",
+ "answerOptions": [
+ {
+ "answerText": "Extraction de fonctionnalités",
+ "isCorrect": "false"
+ },
+ {
+                "answerText": "Réduction de la dimensionnalité des fonctionnalités",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Sélection de fonctionnalités",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+            "questionText": "Le processus de division d'un ensemble de données, selon un certain ratio, en ensembles d'entraînement et de test à l'aide de la méthode/fonction 'train_test_split()' de Scikit-learn est appelé :",
+ "answerOptions": [
+ {
+ "answerText": "Validation croisée",
+ "isCorrect": "false"
+ },
+ {
+                "answerText": "Validation hold-out",
+ "isCorrect": "true"
+ },
+ {
+                "answerText": "Validation \"leave-one-out\"",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 11,
+ "title": "Préparer et visualiser des données pour la régression: Quiz préalable",
+ "quiz": [
+ {
+ "questionText": "Lequel de ces modules Python est utilisé pour tracer la visualisation des données?",
+ "answerOptions": [
+ {
+ "answerText": "Numpy",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Scikit-learn",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Matplotlib",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+            "questionText": "Si vous souhaitez comprendre la dispersion ou d'autres caractéristiques des points de données de votre ensemble de données, effectuez :",
+ "answerOptions": [
+ {
+ "answerText": "Une visualisation des données",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Un pré-traitement des données",
+ "isCorrect": "false"
+ },
+ {
+                "answerText": "Un Train Test Split",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Lequel d'entre eux fait partie de l'étape de visualisation des données dans un projet de machine learning?",
+ "answerOptions": [
+ {
+                "answerText": "Intégrer un algorithme de machine learning",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Créer une représentation picturale des données à l'aide de différentes méthodes de tracé",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Normaliser les valeurs d'un jeu de données",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 12,
+ "title": "Préparer et visualiser des données pour la régression: Quiz de validation des connaissances",
+ "quiz": [
+ {
+            "questionText": "Lequel de ces extraits de code est correct d'après cette leçon, si vous souhaitez vérifier la présence de valeurs manquantes dans votre ensemble de données ? Supposons que l'ensemble de données soit stocké dans une variable nommée \"dataset\", qui est un objet Pandas DataFrame.",
+ "answerOptions": [
+ {
+ "answerText": "dataset.isnull().sum()",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "findMissing(dataset)",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "sum(null(dataset))",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+            "questionText": "Laquelle de ces méthodes de traçage est utile lorsque vous souhaitez comprendre la dispersion de différents groupes de points de données de votre jeu de données?",
+ "answerOptions": [
+ {
+                "answerText": "Nuage de points",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Graphique linéaire",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Graphique à barres",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+            "questionText": "Que ne peut pas vous dire la visualisation des données?",
+ "answerOptions": [
+ {
+                "answerText": "Relations entre les points de données",
+ "isCorrect": "false"
+ },
+ {
+                "answerText": "La source d'où le jeu de données a été collecté",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Trouver la présence de valeurs aberrantes dans l'ensemble de données",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 13,
+ "title": "Régression linéaire et polynomiale: Quiz préalable",
+ "quiz": [
+ {
+ "questionText": "Matplotlib est une",
+ "answerOptions": [
+ {
+ "answerText": "Bibliothèque de dessin",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Bibliothèque de visualisation de données",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Bibliothèque de prêt",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "La régression linéaire utilise ce qui suit pour tracer des relations entre variables",
+ "answerOptions": [
+ {
+ "answerText": "Une ligne droite",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Un cercle",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Une courbe",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Un bon modèle de régression linéaire a un coefficient de corrélation ___",
+ "answerOptions": [
+ {
+ "answerText": "Bas",
+ "isCorrect": "false"
+ },
+ {
+                "answerText": "Élevé",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Plat",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 14,
+ "title": "Régression linéaire et polynomiale: Quiz de validation des connaissances",
+ "quiz": [
+ {
+ "questionText": "Si vos données sont non linéaires, essayez un type ___ de régression",
+ "answerOptions": [
+ {
+ "answerText": "linéaire",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "sphérique",
+ "isCorrect": "false"
+ },
+ {
+                "answerText": "polynomial",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+            "questionText": "Ce sont tous des types de méthodes de régression",
+ "answerOptions": [
+ {
+ "answerText": "Falsestep, Ridge, Lasso et Elasticnet",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Stepwise, Ridge, Lasso et Elasticnet",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Stepwise, Ridge, Lariat et Elasticnet",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+            "questionText": "La régression des moindres carrés signifie que tous les points de données entourant la ligne de régression sont :",
+ "answerOptions": [
+ {
+                "answerText": "mis au carré puis soustraits",
+ "isCorrect": "false"
+ },
+ {
+                "answerText": "multipliés",
+ "isCorrect": "false"
+ },
+ {
+                "answerText": "mis au carré puis additionnés",
+ "isCorrect": "true"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 15,
+ "title": "Régression logistique: Quiz préalable",
+ "quiz": [
+ {
+            "questionText": "Utilisez la régression logistique pour prédire",
+ "answerOptions": [
+ {
+ "answerText": "Si une pomme est mûre ou non",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Combien de billets peuvent être vendus dans un mois",
+ "isCorrect": "false"
+ },
+ {
+                "answerText": "De quelle couleur sera le ciel demain à 18 heures",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Les types de régression logistique incluent",
+ "answerOptions": [
+ {
+ "answerText": "multinomial et cardinal",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "multinomial et ordinal",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Principal et ordinal",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Vos données ont des corrélations faibles. Le meilleur type de régression à utiliser est:",
+ "answerOptions": [
+ {
+ "answerText": "Logistique",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "linéaire",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "cardinale",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 16,
+ "title": "Régression logistique: Quiz de validation des connaissances",
+ "quiz": [
+ {
+ "questionText": "Seaborn est un type de",
+ "answerOptions": [
+ {
+ "answerText": "Bibliothèque de visualisation de données",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Bibliothèque de mappage",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Bibliothèque mathématique",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Une matrice de confusion est également connue sous le nom de:",
+ "answerOptions": [
+ {
+ "answerText": "Matrice d'erreur",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Matrice de vérité",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Matrice de précision",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Un bon modèle aura:",
+ "answerOptions": [
+ {
+ "answerText": "Un grand nombre de faux positifs et de vrais négatifs dans sa matrice de confusion",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Un grand nombre de vrais positifs et vrais négatifs dans sa matrice de confusion",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Un grand nombre de vrais positifs et de faux négatifs dans sa matrice de confusion",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+      },
+      {
+ "id": 17,
+ "title": "Construire une application Web: Quiz préalable",
+ "quiz": [
+ {
+            "questionText": "Que signifie ONNX ?",
+ "answerOptions": [
+ {
+ "answerText": "Over Neural Network Exchange",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Open Neural Network Exchange",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Output Neural Network Exchange",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Comment Flask est-il défini par ses créateurs?",
+ "answerOptions": [
+ {
+ "answerText": "mini-framework",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "grand-framework",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "micro-framework",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+            "questionText": "Que fait le module pickle de Python ?",
+ "answerOptions": [
+ {
+                "answerText": "Sérialiser un objet Python",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Dé-sérialiser un objet Python",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Sérialiser et Dé-sérialiser un objet Python",
+ "isCorrect": "true"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 18,
+ "title": "Construire une application Web: Quiz de validation des connaissances",
+ "quiz": [
+ {
+ "questionText": "Quels sont les outils que nous pouvons utiliser pour héberger un modèle pré-formé sur le Web à l'aide de Python?",
+ "answerOptions": [
+ {
+ "answerText": "Flask",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Tensorflow.js",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "onnx.JS",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+            "questionText": "Que signifie SaaS ?",
+ "answerOptions": [
+ {
+ "answerText": "Système en tant que service",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Logiciel en tant que service",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Sécurité en tant que service",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+            "questionText": "Que fait la bibliothèque LabelEncoder de Scikit-learn ?",
+ "answerOptions": [
+ {
+ "answerText": "Encode les données par ordre alphabétique",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Encode les données numériquement",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Encode des données en série",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 19,
+ "title": "Classification 1: Quiz préalable",
+ "quiz": [
+ {
+ "questionText": "La classification est une forme d'apprentissage supervisé qui a beaucoup en commun avec",
+ "answerOptions": [
+ {
+ "answerText": "Série chronologique",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Techniques de régression",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "NLP",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+            "questionText": "À quelle question la classification peut-elle aider à répondre ?",
+ "answerOptions": [
+ {
+                "answerText": "Est-ce que ce courrier électronique est un spam ou pas ?",
+ "isCorrect": "true"
+ },
+ {
+                "answerText": "Les cochons peuvent-ils voler ?",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Quel est le sens de la vie?",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Quelle est la première étape pour utiliser des techniques de classification?",
+ "answerOptions": [
+ {
+                "answerText": "Création de classes à partir d'un jeu de données",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Nettoyer et équilibrer vos données",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Affectation d'un point de données à un groupe ou à un résultat",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 20,
+ "title": "Classification 1: Quiz de validation des connaissances",
+ "quiz": [
+ {
+ "questionText": "Qu'est-ce qu'une question multiclasse?",
+ "answerOptions": [
+ {
+ "answerText": "La tâche de classer les points de données dans plusieurs classes",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "La tâche de classifier les points de données dans l'une des plusieurs classes",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "La tâche de nettoyer les points de données de plusieurs manières",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Il est important de nettoyer des données récurrentes ou inutiles pour aider vos classificateurs à résoudre votre problème.",
+ "answerOptions": [
+ {
+ "answerText": "Vrai",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Faux",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Quelle est la meilleure raison d'équilibrer vos données?",
+ "answerOptions": [
+ {
+ "answerText": "Les données déséquilibrées ont l'air mauvais dans les visualisations",
+ "isCorrect": "false"
+ },
+ {
+                "answerText": "L'équilibrage de vos données donne de meilleurs résultats, car le modèle ML ne penche pas en faveur d'une classe",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "L'équilibrage de vos données vous donne plus de points de données",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 21,
+ "title": "Classification 2: Quiz préalable",
+ "quiz": [
+ {
+            "questionText": "Les données équilibrées et propres produisent les meilleurs résultats de classification",
+ "answerOptions": [
+ {
+ "answerText": "Vrai",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Faux",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Comment choisissez-vous le bon classificateur?",
+ "answerOptions": [
+ {
+                "answerText": "Comprendre quels classificateurs fonctionnent le mieux selon les scénarios",
+ "isCorrect": "false"
+ },
+ {
+                "answerText": "Supposition éclairée et vérification",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Les deux",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "La classification est un type de",
+ "answerOptions": [
+ {
+ "answerText": "NLP",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Apprentissage supervisé",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Langage de programmation",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 22,
+ "title": "Classification 2: Quiz de validation des connaissances",
+ "quiz": [
+ {
+ "questionText": "Qu'est-ce qu'un \"solveur\" ?",
+ "answerOptions": [
+ {
+ "answerText": "La personne qui vérifie votre travail",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "L'algorithme à utiliser dans le problème d'optimisation",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Une technique de machine learning",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Quel classificateur avons-nous utilisé dans cette leçon?",
+ "answerOptions": [
+ {
+ "answerText": "Régression logistique",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Arbres de décision",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Multiclasse un-contre-tous",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Comment savez-vous si l'algorithme de classification fonctionne comme prévu?",
+ "answerOptions": [
+ {
+ "answerText": "En vérifiant la précision de ses prévisions",
+ "isCorrect": "true"
+ },
+ {
+                "answerText": "En le comparant à d'autres algorithmes",
+ "isCorrect": "false"
+ },
+ {
+                "answerText": "En examinant des données historiques sur la capacité de cet algorithme à résoudre des problèmes similaires",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 23,
+ "title": "Classification 3: Quiz préalable",
+ "quiz": [
+ {
+ "questionText": "Un bon classificateur initial à essayer est:",
+ "answerOptions": [
+ {
+ "answerText": "SVC linéaire",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "K-Means",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "SVC logique",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+            "questionText": "La régularisation contrôle :",
+ "answerOptions": [
+ {
+ "answerText": "L'influence des paramètres",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "L'influence de la vitesse de formation",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "L'influence des valeurs aberrantes",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Le classificateur K-voisins peut être utilisé pour:",
+ "answerOptions": [
+ {
+ "answerText": "Apprentissage supervisé",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "L'apprentissage non supervisé",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Les deux",
+ "isCorrect": "true"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 24,
+ "title": "Classification 3: Quiz de validation des connaissances",
+ "quiz": [
+ {
+            "questionText": "Les classificateurs à vecteurs de support peuvent être utilisés pour",
+ "answerOptions": [
+ {
+ "answerText": "La classification",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "La régression",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Les deux",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+            "questionText": "La forêt aléatoire est un type de classificateur ___",
+ "answerOptions": [
+ {
+ "answerText": "Ensembliste",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Disensembliste",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Assembliste",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Adaboost est connu pour:",
+ "answerOptions": [
+ {
+ "answerText": "Se concentrer sur les poids des éléments incorrectement classifiés",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Se concentrer sur des valeurs aberrantes",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Se concentrer sur des données incorrectes",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 25,
+ "title": "Classification 4: Quiz préalable",
+ "quiz": [
+ {
+ "questionText": "Les systèmes de recommandation peuvent être utilisés pour",
+ "answerOptions": [
+ {
+ "answerText": "Recommander un bon restaurant",
+ "isCorrect": "false"
+ },
+ {
+                "answerText": "Recommander des tendances de mode à essayer",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Les deux",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "L'intégration d'un modèle dans une application Web l'aide à être compatible hors ligne",
+ "answerOptions": [
+ {
+ "answerText": "Vrai",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Faux",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Onnx Runtime peut être utilisé pour",
+ "answerOptions": [
+ {
+ "answerText": "Exécution de modèles dans une application Web",
+ "isCorrect": "true"
+ },
+ {
+                "answerText": "Entraînement de modèles",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Réglage des hyperparamètres",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 26,
+ "title": "Classification 4: Quiz de validation des connaissances",
+ "quiz": [
+ {
+            "questionText": "L'application Netron vous aide à :",
+ "answerOptions": [
+ {
+ "answerText": "Visualiser les données",
+ "isCorrect": "false"
+ },
+ {
+                "answerText": "Visualiser la structure de votre modèle",
+ "isCorrect": "true"
+ },
+ {
+                "answerText": "Tester votre application Web",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+            "questionText": "Convertissez votre modèle Scikit-learn pour une utilisation avec Onnx en utilisant :",
+ "answerOptions": [
+ {
+ "answerText": "Sklearn-app",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Sklearn-web",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Sklearn-onnX",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "L'utilisation de votre modèle dans une application Web s'appelle:",
+ "answerOptions": [
+ {
+ "answerText": "Inférence",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Interférence",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Assurance",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 27,
+ "title": "Introduction au Clustering (regroupement): Quiz préalable",
+ "quiz": [
+ {
+ "questionText": "Un exemple de vie réel de regroupement serait",
+ "answerOptions": [
+ {
+ "answerText": "Définir la table du dîner",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Tri du linge",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Shopping de l'épicerie",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Les techniques de clustering peuvent être utilisées dans ces industries",
+ "answerOptions": [
+ {
+ "answerText": "Banking",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "e-commerce",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Les deux",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "La Clustering est un type :",
+ "answerOptions": [
+ {
+ "answerText": "D'apprentissage supervisé",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "D'apprentissage non supervisé",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "D'apprentissage de renforcement",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 28,
+ "title": "Introduction au Clustering (regroupement): Quiz de validation des connaissances",
+ "quiz": [
+ {
+ "questionText": "La géométrie Euclidienne est disposée le long",
+ "answerOptions": [
+ {
+ "answerText": "De plans",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "De courbes",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "De sphères",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "La densité de vos données de clustering est liée à son / sa",
+ "answerOptions": [
+ {
+ "answerText": "Bruit",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Profondeur",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Validité",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "L'algorithme de regroupement le plus connu est",
+ "answerOptions": [
+ {
+ "answerText": "k-means",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "K-middle",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "K-mart",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 29,
+ "title": "K-Means Clustering: Quiz préalable",
+ "quiz": [
+ {
+ "questionText": "K-Means est dérivé de:",
+ "answerOptions": [
+ {
+ "answerText": "Génie électrique",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Traitement du signal",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Linguistics informatiques",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Un bon score de silhouette signifie:",
+ "answerOptions": [
+ {
+ "answerText": "Les grappes sont bien séparées et bien définies",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Il y a peu de grappes",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Il y a beaucoup de clusters",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "La variance est:",
+ "answerOptions": [
+ {
+ "answerText": "La moyenne des différences carrées de la moyenne",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Un problème de regroupement s'il devient trop élevé",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Les deux",
+ "isCorrect": "true"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 30,
+ "title": "K-Means Clustering: Quiz de validation des connaissances",
+ "quiz": [
+ {
+ "questionText": "Un diagramme de Voronoi montre:",
+ "answerOptions": [
+ {
+ "answerText": "Une variance d'une cluster",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "La graine d'une grappe et sa région",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "L'inertie d'une cluster",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "L'inertie est",
+ "answerOptions": [
+ {
+ "answerText": "Une mesure de la manière dont les clusters cohérents internes sont",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Une mesure de la quantité de grappes déplacées",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Une mesure de la qualité des grappes",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "en utilisant k-moyen, vous devez d'abord déterminer la valeur de 'k'",
+ "answerOptions": [
+ {
+ "answerText": "Vrai",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Faux",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 31,
+ "title": "Intro aux NLP: Quiz préalable",
+ "quiz": [
+ {
+ "questionText": "Que signifie NLP pour ces leçons?",
+ "answerOptions": [
+ {
+ "answerText": "Neural Language Processing (Traitement des langues neurales)",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Natural Language Processing (Traitement des langues naturelles)",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Natural Linguistic Processing (Traitement linguistique naturel)",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Eliza était un bot précoce qui a agi comme un ordinateur",
+ "answerOptions": [
+ {
+ "answerText": "Thérapeute",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Docteur",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Infirmière",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Le test Turing d'Alan Turing a essayé de déterminer si un ordinateur était",
+ "answerOptions": [
+ {
+ "answerText": "Indiscernable d'un humain",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Pensif",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Les deux",
+ "isCorrect": "true"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 32,
+ "title": "Intro aux NLP: Quiz de validation des connaissances",
+ "quiz": [
+ {
+ "questionText": "Joseph Weizenbaum a inventé le bot",
+ "answerOptions": [
+ {
+ "answerText": "Elisha",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Eliza",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Eloise",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Un bot conversationnel donne une sortie basée sur",
+ "answerOptions": [
+ {
+ "answerText": "Un choix de choix prédéfinis au hasard",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Analyse de l'entrée et de l'utilisation de l'intelligence de la machine",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Les deux",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "Comment feriez-vous pour que le bot soit plus efficace?",
+ "answerOptions": [
+ {
+ "answerText": "En le demandant plus de questions.",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "En lui fournissant plus de données et en le formant en conséquence",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Le bot est stupide, il ne peut pas apprendre :(",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 33,
+ "title": "Tâches NLP: Quiz préalable",
+ "quiz": [
+ {
+ "questionText": "La tokenization",
+ "answerOptions": [
+ {
+ "answerText": "Divise le texte au moyen de la ponctuation",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Divise le texte en jetons séparés (mots)",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Divise le texte en phrases",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "L'Embeddings",
+ "answerOptions": [
+ {
+ "answerText": "Convertit numériquement les données de texte afin que les mots puissent se classer",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Intégre des mots en phrases",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Intégre des phrases dans les paragraphes",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Le balisage des parties du discours (Parts-of-Speech Tagging)",
+ "answerOptions": [
+ {
+ "answerText": "Divise les phrases en fonction de leurs parties du discours",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "prend les mots tokenisés et les marque selon leur partie du discours",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "schématise des phrases",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 34,
+ "title": "Tâches NLP: Quiz de validation des connaissances",
+ "quiz": [
+ {
+ "questionText": "Construisez un dictionnaire de la fréquence à laquelle les mots se reproduisent en utilisant:",
+ "answerOptions": [
+ {
+ "answerText": "Dictionnaire de mots et d'expressions",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Fréquences de mots et de phrases",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Bibliothèque de mots et de phrases",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "N-grams fait référence à",
+ "answerOptions": [
+ {
+ "answerText": "Un texte pouvant être divisé en séquences de mots d'une longueur définie",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Un mot pouvant être divisé en séquences de caractères d'une longueur de jeu",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Un texte pouvant être divisé en paragraphes d'une longueur définie",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Analyse du sentiment",
+ "answerOptions": [
+ {
+ "answerText": "analyse une phrase pour la positivité ou la négativité",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "analyse une phrase pour sentimentalité",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "analyse une phrase pour la tristesse",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 35,
+ "title": "NLP et traduction: Quiz préalable",
+ "quiz": [
+ {
+ "questionText": "La traduction naïve",
+ "answerOptions": [
+ {
+ "answerText": "Traduit uniquement les mots",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Traduit la structure de la phrase",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Traduit le sentiment",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Un *corpus* de textes fait référence à",
+ "answerOptions": [
+ {
+ "answerText": "Un petit nombre de textes",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Un grand nombre de textes",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Un texte standard",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Si un modèle ML a suffisamment de traductions humaines pour construire un modèle, il peut",
+ "answerOptions": [
+ {
+ "answerText": "Abréger des traductions",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Normaliser les traductions",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Améliorer la précision des traductions",
+ "isCorrect": "true"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 36,
+ "title": "NLP et traduction: Quiz de validation des connaissances",
+ "quiz": [
+ {
+ "questionText": "La bibliothèque de traduction de texte sous-jacente est:",
+ "answerOptions": [
+ {
+ "answerText": "Google Translate",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Bing",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Un modèle ML personnalisé",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Pour utiliser `blob.translate` vous avez besoin de:",
+ "answerOptions": [
+ {
+ "answerText": "Une connexion Internet",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Un dictionnaire",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "JavaScript",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Pour déterminer un sentiment, une approche ML serait d':",
+ "answerOptions": [
+ {
+ "answerText": "Appliquer des techniques de régression pour générer manuellement des opinions et des scores et rechercher des modèles",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Appliquer des techniques de PNL pour générer manuellement des opinions et des scores et rechercher des modèles",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Appliquer des techniques de regroupement pour des opinions et des scores générés manuellement et rechercher des modèles",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 37,
+ "title": "NLP 4: Quiz préalable",
+ "quiz": [
+ {
+ "questionText": "Quelles informations pouvons-nous obtenir du texte écrit ou parlé par un humain?",
+ "answerOptions": [
+ {
+ "answerText": "motifs et fréquences",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Sentiment et signification",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Les deux",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "Qu'est-ce que l'analyse du sentiment?",
+ "answerOptions": [
+ {
+ "answerText": "Une étude sur la question de savoir si un héritage de famille a une valeur sentimentale",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Une méthode d'identification systématique, d'extraction, de quantification et d'étude des états affectifs et des informations subjectives",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "La capacité de savoir si quelqu'un est triste ou heureux",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Quelle question pourrait être répondue à l'aide d'un jeu de données de critiques hôteliers, de python et d'analyse de sentiment?",
+ "answerOptions": [
+ {
+ "answerText": "Quels sont les mots et expressions les plus fréquemment utilisés dans les critiques?",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Quel hôtel a la meilleure piscine?",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Y a-t-il un service de voiturier dans cet hôtel?",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 38,
+ "title": "NLP 4: Quiz de validation des connaissances",
+ "quiz": [
+ {
+ "questionText": "Quelle est l'essence de la NLP?",
+ "answerOptions": [
+ {
+ "answerText": "catégoriser la langue humaine en joyeuse ou triste",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Interprétation de sens ou de sentiment sans avoir un humain pour le faire",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Trouver des valeurs aberrantes dans le sentiment et les examiner",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Quelles sont certaines choses que vous pourriez rechercher lors du nettoyage des données?",
+ "answerOptions": [
+ {
+ "answerText": "Personnages dans d'autres langues",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Lignes vierges ou colonnes",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Les deux",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "Il est important de comprendre votre donnée et ses faiblesses avant d'effectuer des opérations à ce sujet.",
+ "answerOptions": [
+ {
+ "answerText": "Vrai",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Faux",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 39,
+ "title": "NLP 5: Quiz préalable",
+ "quiz": [
+ {
+ "questionText": "Pourquoi est-il important de nettoyer les données avant de l'analyser?",
+ "answerOptions": [
+ {
+ "answerText": "Certaines colonnes pourraient avoir des données manquantes ou incorrectes",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Les données en désordre peuvent conduire à de fausses conclusions sur le jeu de données",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Les deux",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "Quel est un exemple d'une stratégie de nettoyage des données?",
+ "answerOptions": [
+ {
+ "answerText": "Supprimer des colonnes / rangées qui ne sont pas utiles pour répondre à une question spécifique",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Se débarrasser des valeurs vérifiées qui ne correspondent pas à votre hypothèse",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Déplacement des valeurs aberrantes vers une table séparée et exécutant les calculs de cette table pour voir s'ils correspondent",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Il peut être utile de classer les données à l'aide d'une colonne Tag.",
+ "answerOptions": [
+ {
+ "answerText": "Vrai",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Faux",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 40,
+ "title": "NLP 5: Quiz de validation des connaissances",
+ "quiz": [
+ {
+ "questionText": "Quel est l'objectif de l'ensemble de données?",
+ "answerOptions": [
+ {
+ "answerText": "Voir combien de critiques négatives et positives il y a pour les hôtels à travers le monde",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Ajouter du sentiment et des colonnes qui vous aideront à choisir le meilleur hôtel",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Analyser pourquoi les gens laissent des critiques spécifiques",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Quels sont les mots d'arrêt?",
+ "answerOptions": [
+ {
+ "answerText": "Mots anglais communs qui ne changent pas le sentiment d'une phrase",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Mots que vous pouvez supprimer pour accélérer l'analyse du sentiment",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Les deux",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "Pour tester l'analyse du sentiment, assurez-vous qu'il correspond au score du critique pour le même examen.",
+ "answerOptions": [
+ {
+ "answerText": "Vrai",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Faux",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 41,
+ "title": "Introduction aux Séries chronologiques (Time Series) : Quiz préalable",
+ "quiz": [
+ {
+ "questionText": "La prévision de série chronologique est utile pour",
+ "answerOptions": [
+ {
+ "answerText": "Déterminer les coûts futurs",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Prédire les prix futurs",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Les deux à la fois",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "Une série chronologique est une séquence prise à:",
+ "answerOptions": [
+ {
+ "answerText": "points successifs également espacés dans l'espace",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "points successifs également espacés dans le temps",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "points successifs également espacés dans l'espace et le temps",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "La série chronologique peut être utilisée dans les cas de:",
+ "answerOptions": [
+ {
+ "answerText": "Prévision de tremblement de terre",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Vision informatique",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Analyse des couleurs",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 42,
+ "title": "Introduction aux séries chronologiques : Quiz de validation des connaissances",
+ "quiz": [
+ {
+ "questionText": "Les tendances de série chronologique sont",
+ "answerOptions": [
+ {
+ "answerText": "des augmentations et des diminutions mesurables au fil du temps",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "La quantification des diminutions au fil du temps",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Des lacunes entre augmentations et diminution au fil du temps",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Les valeurs aberrantes sont des",
+ "answerOptions": [
+ {
+ "answerText": "Points proches de la variance de données standard",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Points loin de la variance de données standard",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Points dans la variance des données standard",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "La prévision de séries chronologiques est utile pour",
+ "answerOptions": [
+ {
+ "answerText": "L'économétrie",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "L'histoire",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Les bibliothèques",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 43,
+ "title": "Les séries chronologiques ARIMA: Quiz préalable",
+ "quiz": [
+ {
+ "questionText": "ARIMA signifie",
+ "answerOptions": [
+ {
+ "answerText": "AutoRegressive Integral Moving Average",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "AutoRegressive Integrated Moving Action",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "AutoRegressive Integrated Moving Average",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "La stationnarité fait référence à",
+ "answerOptions": [
+ {
+ "answerText": "Les données dont les attributs ne changent pas lors de la décalage",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Les données dont la distribution ne change pas lors de la décalage de temps",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Les données dont la distribution change lors de la décalage",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "La différenciation",
+ "answerOptions": [
+ {
+ "answerText": "Stabilise la tendance et la saisonnalité",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Exacerbe la tendance et la saisonnalité",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Élimine la tendance et la saisonnalité",
+ "isCorrect": "true"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 44,
+ "title": "Les séries chronologiques ARIMA: Quiz de validation des connaissances",
+ "quiz": [
+ {
+ "questionText": "Arima est utilisé pour créer un modèle adapté à la forme spéciale des données de la série chronologique",
+ "answerOptions": [
+ {
+ "answerText": "aussi plat que possible",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "aussi étroitement que possible",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "via ScatterPlots",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Utilisez Sarimax pour",
+ "answerOptions": [
+ {
+ "answerText": "Gérer les modèles d'ARIMA saisonniers",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Gérer des modèles spéciaux Arima",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Gérer les modèles statistiques ARIMA",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": " La validation « Walk-Forward » implique de",
+ "answerOptions": [
+ {
+ "answerText": "Réévaluer un modèle progressivement tel qu'il est validé",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Re-entraîner un modèle progressivement tel qu'il est validé",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Re-configurer un modèle progressivement tel qu'il est validé",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ }, {
+ "id": 45,
+ "title": "Renforcement 1: Quiz préalable",
+ "quiz": [
+ {
+ "questionText": "Qu'est-ce que l'apprentissage du renforcement?",
+ "answerOptions": [
+ {
+ "answerText": "Enseigner à quelqu'un quelque chose encore et encore jusqu'à ce qu'ils comprennent",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Une technique d'apprentissage qui déchiffre le comportement optimal d'un agent dans certains environnements en exécutant de nombreuses expériences",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Comprendre comment exécuter plusieurs expériences à la fois",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Qu'est-ce qu'une politique?",
+ "answerOptions": [
+ {
+ "answerText": "une fonction qui renvoie l'action à tout état donné",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Un document qui vous dit si vous pouvez renvoyer ou non un article",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Une fonction utilisée à des fins aléatoires",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Une fonction de récompense renvoie un score pour chaque état d'environnement.",
+ "answerOptions": [
+ {
+ "answerText": "Vrai",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Faux",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ }, {
+ "id": 46,
+ "title": "Renforcement 1: Quiz de validation des connaissances",
+ "quiz": [
+ {
+ "questionText": "Qu'est-ce que le Q-Learning?",
+ "answerOptions": [
+ {
+ "answerText": "Un mécanisme d'enregistrement de la \"bonté\" de chaque État",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Un algorithme où la politique est définie par une table Q",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Les deux",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "Pour quelles valeurs une Q-Table correspond à la stratégie de marche aléatoire?",
+ "answerOptions": [
+ {
+ "answerText": "toutes les valeurs égales",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "-0,25",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "toutes les valeurs différentes",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Il valait mieux utiliser l'exploration que l'exploitation pendant le processus d'apprentissage de notre leçon.",
+ "answerOptions": [
+ {
+ "answerText": "Vrai",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Faux",
+ "isCorrect": "true"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 47,
+ "title": "Renforcement 2: Quiz préalable",
+ "quiz": [
+ {
+ "questionText": "Les échecs et le go sont des jeux avec des états continus",
+ "answerOptions": [
+ {
+ "answerText": "Vrai",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Faux",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "Quel est le problème CartPole ?",
+ "answerOptions": [
+ {
+ "answerText": "Un processus d'élimination des valeurs aberrantes",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Une méthode d'optimisation de votre panier",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Une version simplifiée d'équilibrage",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "Quel outil pouvons-nous utiliser pour jouer à différents scénarios d'états potentiels dans un jeu?",
+ "answerOptions": [
+ {
+ "answerText": "Devinez et chèque",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Environnements de simulation",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Test de transition de l'état",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 48,
+ "title": "Renforcement 2: Quiz de validation des connaissances",
+ "quiz": [
+ {
+ "questionText": "Où définissons-nous toutes les actions possibles dans un environnement?",
+ "answerOptions": [
+ {
+ "answerText": "Méthodes",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "espace d'action",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Liste d'action",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Quelle paire avons-nous utilisée comme valeur de la clé de dictionnaire?",
+ "answerOptions": [
+ {
+ "answerText": "(état, action) comme clé, l'entrée Q-Table comme valeur",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "L'état comme clé, action en tant que valeur",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "La valeur de la fonction QValues est la clé, l'action en tant que valeur",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Quels sont les hyperparamètres que nous avons utilisés pendant le Q-Learning?",
+ "answerOptions": [
+ {
+ "answerText": "Valeur de la table Q, récompense actuelle, action aléatoire",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Taux d'apprentissage, facteur de réduction, facteur d'exploration / d'exploitation",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Récompenses cumulatives, taux d'apprentissage, facteur d'exploration",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 49,
+ "title": "Applications du monde réel: Quiz préalable",
+ "quiz": [
+ {
+ "questionText": "Quel est un exemple d'application ML dans l'industrie des finances?",
+ "answerOptions": [
+ {
+ "answerText": "Personnaliser le voyage client à l'aide de NLP",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Gestion de la richesse à l'aide de la régression linéaire",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Gestion de l'énergie à l'aide de séries chronologiques",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Quelle technique ML peut utiliser les hôpitaux pour gérer la réadmission?",
+ "answerOptions": [
+ {
+ "answerText": "Le Clustering (Regroupement)",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Les séries chronologiques",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Le NLP",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Quel est un exemple d'utilisation des séries chronologiques pour la gestion de l'énergie?",
+ "answerOptions": [
+ {
+ "answerText": "Animaux de détection de mouvement",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Parkings intelligents",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Suivi des incendies de forêt",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 50,
+ "title": "Applications du monde réel: Quiz de validation des connaissances",
+ "quiz": [
+ {
+ "questionText": "Quelle technique ML peut être utilisée pour détecter la fraude par carte de crédit?",
+ "answerOptions": [
+ {
+ "answerText": "régression",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Clustering",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "NLP",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Quelle technique ML est illustrée dans la gestion forestière?",
+ "answerOptions": [
+ {
+ "answerText": "Apprentissage du renforcement",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Série chronologique",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "NLP",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Quel est un exemple d'application ML dans l'industrie des soins de santé?",
+ "answerOptions": [
+ {
+ "answerText": "Prédire le comportement des étudiants en utilisant la régression",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Gestion des essais cliniques à l'aide de classificateurs",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Sensation de mouvement des animaux utilisant des classificateurs",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ }
+ ]
+}]
\ No newline at end of file
diff --git a/quiz-app/src/assets/translations/index.js b/quiz-app/src/assets/translations/index.js
index e4abf6eb..f93b84e1 100644
--- a/quiz-app/src/assets/translations/index.js
+++ b/quiz-app/src/assets/translations/index.js
@@ -1,12 +1,16 @@
// index.js
import en from './en.json';
import tr from './tr.json';
+import fr from './fr.json';
+import ja from './ja.json';
//export const defaultLocale = 'en';
const messages = {
en: en[0],
tr: tr[0],
+ fr: fr[0],
+ ja: ja[0]
};
export default messages;
diff --git a/quiz-app/src/assets/translations/ja.json b/quiz-app/src/assets/translations/ja.json
new file mode 100644
index 00000000..4696347f
--- /dev/null
+++ b/quiz-app/src/assets/translations/ja.json
@@ -0,0 +1,2815 @@
+[
+ {
+ "title": "初心者のための機械学習: 小テスト",
+ "complete": "おめでとうございます、小テストを完了しました!",
+ "error": "すみません、もう一度試してみてください。",
+ "quizzes": [
+ {
+ "id": 1,
+ "title": "機械学習への導入: 講義前の小テスト",
+ "quiz": [
+ {
+ "questionText": "機械学習の応用は身近にある",
+ "answerOptions": [
+ {
+ "answerText": "正しい",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "正しくない",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "古典的機械学習と深層学習の技術的な違いは何でしょうか?",
+ "answerOptions": [
+ {
+ "answerText": "古典的機械学習のほうが先に発明された",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "ニューラルネットワークを使用するかどうか",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "深層学習はロボットに使用されている",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "なぜ企業は機械学習の戦略を使いたいと思うのでしょうか?",
+ "answerOptions": [
+ {
+ "answerText": "多元的な問題の解決を自動化するため",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "顧客の種類に応じてショッピング体験をカスタマイズするため",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "上の両方",
+ "isCorrect": "true"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 2,
+ "title": "機械学習への導入: 講義後の小テスト",
+ "quiz": [
+ {
+ "questionText": "機械学習のアルゴリズムがシミュレートするのは",
+ "answerOptions": [
+ {
+ "answerText": "賢いマシン",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "人間の脳",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "オランウータン",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "古典的機械学習の手法の例は何でしょうか?",
+ "answerOptions": [
+ {
+ "answerText": "自然言語処理",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "深層学習",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "ニューラルネットワーク",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "なぜ全員が機械学習の基礎を学ぶべきなのでしょうか?",
+ "answerOptions": [
+ {
+ "answerText": "機械学習を学ぶのは誰でも始めやすくて楽しいから",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "機械学習の戦略は多くの産業や領域で使用されているから",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "上の両方",
+ "isCorrect": "true"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 3,
+ "title": "機械学習の歴史: 講義前の小テスト",
+ "quiz": [
+ {
+ "questionText": "「人工知能」という言葉が生まれたのはいつ頃でしょうか?",
+ "answerOptions": [
+ {
+ "answerText": "1980年代",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "1950年代",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "1930年代",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "機械学習における先駆者のうちのひとりは誰でしょうか?",
+ "answerOptions": [
+ {
+ "answerText": "アラン・チューリング",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "ビル・ゲイツ",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "シェーキー",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "1970年代にAIの進歩が鈍化した理由のひとつは何でしょうか?",
+ "answerOptions": [
+ {
+ "answerText": "計算能力の限界",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "熟練したエンジニアの不足",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "国家間の紛争",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 4,
+ "title": "機械学習の歴史: 講義後の小テスト",
+ "quiz": [
+ {
+ "questionText": "「みすぼらしい」AIシステムの例は何でしょうか?",
+ "answerOptions": [
+ {
+ "answerText": "ELIZA",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "HACKML",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "SSYSTEM",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "「黄金期」に開発された技術の例は何でしょうか?",
+ "answerOptions": [
+ {
+ "answerText": "Blocks world",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Jibo",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "ロボット犬",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "人工知能の分野において誕生と発展の礎になった出来事はどれでしょうか?",
+ "answerOptions": [
+ {
+ "answerText": "チューリングテスト",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "ダートマス夏期研究会",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "AIの冬",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 5,
+ "title": "公平性と機械学習: 講義前の小テスト",
+ "quiz": [
+ {
+ "questionText": "機械学習において不公平性が起こりうるのは",
+ "answerOptions": [
+ {
+ "answerText": "故意で",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "過失で",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "上の両方で",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "機械学習において「不公平性」が意味するのは",
+ "answerOptions": [
+ {
+ "answerText": "あるグループの人々に対する弊害",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "ひとりの人に対する弊害",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "大多数の人々に対する弊害",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "5つの主な弊害は",
+ "answerOptions": [
+ {
+ "answerText": "割り当て・サービスの質・偏見・誹謗中傷・表現の過不足",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "移住・サービスの質・偏見・誹謗中傷・表現の過不足",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "割り当て・サービスの質・ステレオ・誹謗中傷・表現の過不足",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 6,
+ "title": "公平性と機械学習: 講義後の小テスト",
+ "quiz": [
+ {
+ "questionText": "モデルに不公平性が発生する原因のひとつは",
+ "answerOptions": [
+ {
+ "answerText": "過去のデータに対する依存度が高すぎること",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "過去のデータに対する依存度が低すぎること",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "過去のデータとの整合性が高すぎること",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "不公平性を緩和するためにできることは",
+ "answerOptions": [
+ {
+ "answerText": "弊害とそれを受けるグループの特定",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "公平性に関する指標の定義",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "上の両方",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "Fairlearnパッケージができることは",
+ "answerOptions": [
+ {
+ "answerText": "複数のモデル間で公平性とパフォーマンスの指標を比較",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "ニーズに応じた最適なモデルの選択",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "何が公平で、何がそうでないかの判断を助けること",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 7,
+ "title": "ツールと手法: 講義前の小テスト",
+ "quiz": [
+ {
+ "questionText": "モデルを構築する際にすべきなのは",
+ "answerOptions": [
+ {
+ "answerText": "データを準備してから学習すること",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "学習方法を選んでからデータを準備すること",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "パラメータを調整してから学習すること",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "データの〇〇が機械学習モデルの質に影響を与える",
+ "answerOptions": [
+ {
+ "answerText": "量",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "形",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "上の両方",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "特徴量とは",
+ "answerOptions": [
+ {
+ "answerText": "データの質",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "データの測定可能な特性",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "データの列",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 8,
+ "title": "ツールと手法: 講義後の小テスト",
+ "quiz": [
+ {
+ "questionText": "データを可視化すべき理由は",
+ "answerOptions": [
+ {
+ "answerText": "外れ値を発見できるから",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "バイアスの原因を発見できるから",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "上の両方",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "分割するデータの種類は",
+ "answerOptions": [
+ {
+ "answerText": "訓練データとチューリングデータ",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "訓練データとテストデータ",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "検証データと評価データ",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "様々な機械学習ライブラリで学習プロセスを開始する一般的なコマンドは",
+ "answerOptions": [
+ {
+ "answerText": "model.travel",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "model.train",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "model.fit",
+ "isCorrect": "true"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 9,
+ "title": "回帰への導入: 講義前の小テスト",
+ "quiz": [
+ {
+ "questionText": "次の変数のうち、数値の変数はどれでしょうか?",
+ "answerOptions": [
+ {
+ "answerText": "身長",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "性別",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "髪の色",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "次の変数のうち、カテゴリーの変数はどれでしょうか?",
+ "answerOptions": [
+ {
+ "answerText": "心拍数",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "血液型",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "体重",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "次の問題のうち、回帰分析に基づく問題はどれでしょうか?",
+ "answerOptions": [
+ {
+ "answerText": "学生の期末試験の点数を予測する",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "ある人物の血液型を予測する",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "メールがスパムかどうかを判定する",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 10,
+ "title": "回帰への導入: 講義後の小テスト",
+ "quiz": [
+ {
+ "questionText": "機械学習モデルの学習精度が95%でテスト精度が30%の場合、どんな状態であると呼ばれるでしょうか?",
+ "answerOptions": [
+ {
+ "answerText": "過学習",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "未学習",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "二重学習",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "特徴量の中から重要なものを特定するプロセスの名前は",
+ "answerOptions": [
+ {
+ "answerText": "特徴量抽出",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "特徴量の次元削減",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "特徴量選択",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "Scikit Learn の 'train_test_split()' メソッド/関数を使用して、データセットを一定の割合で訓練データセットとテストデータセットに分割する処理の名前は",
+ "answerOptions": [
+ {
+ "answerText": "交差検証",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "ホールドアウト検証",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "ひとつ抜き検証",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 11,
+ "title": "回帰のためにデータを準備して可視化する: 講義前の小テスト",
+ "quiz": [
+ {
+ "questionText": "次のPythonモジュールのうち、データを可視化するために使用されるものはどれでしょうか?",
+ "answerOptions": [
+ {
+ "answerText": "Numpy",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Scikit-learn",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Matplotlib",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "データセットの広がり方やその他の特性を理解するために実行するのは",
+ "answerOptions": [
+ {
+ "answerText": "データの可視化",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "データの前処理",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "訓練データとテストデータの分割",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "次のうち、機械学習プロジェクトにおいてデータ可視化ステップの一部であるものはどれでしょうか?",
+ "answerOptions": [
+ {
+ "answerText": "特定の機械学習アルゴリズムを取り入れる",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "様々なプロット方法を使ってデータの図解表現を作成する",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "データセットの値を正規化する",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 12,
+ "title": "回帰のためにデータを準備して可視化する: 講義後の小テスト",
+ "quiz": [
+ {
+ "questionText": "次のコードスニペットのうち、データセットに欠損値が含まれているかどうかを確認するものとして、このレッスンにおいて正しいのはどれでしょうか?なお、データセットはPandasのDataFrameオブジェクトである 'dataset' という変数に格納されているものとします。",
+ "answerOptions": [
+ {
+ "answerText": "dataset.isnull().sum()",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "findMissing(dataset)",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "sum(null(dataset))",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "次のプロット方法のうち、データセットの異なるデータグループの広がり方を理解するために有効なものはどれでしょうか?",
+ "answerOptions": [
+ {
+ "answerText": "散布図",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "折れ線グラフ",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "棒グラフ",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "データ可視化が教えてくれないことは何でしょうか?",
+ "answerOptions": [
+ {
+ "answerText": "データポイント間の関係",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "データセットの収集源",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "データセットに外れ値が含まれているかどうか",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 13,
+ "title": "線形および多項式回帰: 講義前の小テスト",
+ "quiz": [
+ {
+ "questionText": "Matplotlibは",
+ "answerOptions": [
+ {
+ "answerText": "描画ライブラリ",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "データ可視化ライブラリ",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "図書館",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "線形回帰が変数間の関係をプロットする方法は",
+ "answerOptions": [
+ {
+ "answerText": "直線",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "円J",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "曲線",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "優れた線形回帰モデルは〇〇相関係数を持つ",
+ "answerOptions": [
+ {
+ "answerText": "低い",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "高い",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "平坦な",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 14,
+ "title": "線形および多項式回帰: 講義後の小テスト",
+ "quiz": [
+ {
+ "questionText": "データが線形でない場合、〇〇回帰を試すと良い",
+ "answerOptions": [
+ {
+ "answerText": "線形",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "球面",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "多項式",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "すべて回帰法の種類なのは",
+ "answerOptions": [
+ {
+ "answerText": "フォルスステップ・リッジ・ラッソ・エラスティックネット",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "ステップワイズ・リッジ・ラッソ・エラスティックネット",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "ステップワイズ・リッジ・ラリアット・エラスティックネット",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "最小二乗回帰は回帰直線のまわりのすべてのデータポイントが",
+ "answerOptions": [
+ {
+ "answerText": "二乗してから減算されている",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "乗算されている",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "二乗してから加算されている",
+ "isCorrect": "true"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 15,
+ "title": "ロジスティック回帰: 講義前の小テスト",
+ "quiz": [
+ {
+ "questionText": "ロジスティック回帰が予測するのは",
+ "answerOptions": [
+ {
+ "answerText": "りんごが熟しているかどうか",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "チケットが月にいくつ売れるか",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "明日の午後6時に空が何色になるか",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "ロジスティック回帰の種類に含まれるのは",
+ "answerOptions": [
+ {
+ "answerText": "多項と基本",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "多項と順序",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "主要と順序",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "あなたのデータには弱い相関があります。最適な回帰の種類は",
+ "answerOptions": [
+ {
+ "answerText": "ロジスティック",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "線形",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "基本",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 16,
+ "title": "ロジスティック回帰: 講義後の小テスト",
+ "quiz": [
+ {
+ "questionText": "Seabornは",
+ "answerOptions": [
+ {
+ "answerText": "データ可視化ライブラリ",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "地図ライブラリ",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "数学ライブラリ",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "混同行列の別名は",
+ "answerOptions": [
+ {
+ "answerText": "誤差行列",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "真理行列",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "精度行列",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "良いモデルは",
+ "answerOptions": [
+ {
+ "answerText": "混同行列に多くの偽陽性と真陰性を含む",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "混同行列に多くの真陽性と真陰性を含む",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "混同行列に多くの真陽性と偽陰性を含む",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 17,
+ "title": "Webアプリを構築する: 講義前の小テスト",
+ "quiz": [
+ {
+ "questionText": "ONNXは何の略でしょうか?",
+ "answerOptions": [
+ {
+ "answerText": "Over Neural Network Exchange",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Open Neural Network Exchange",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Output Neural Network Exchange",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Flaskは作成者にどのように定義されているでしょうか?",
+ "answerOptions": [
+ {
+ "answerText": "ミニフレームワーク",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "ラージフレームワーク",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "マイクロフレームワーク",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "PythonのPickleモジュールが行うのは",
+ "answerOptions": [
+ {
+ "answerText": "パイソンオブジェクトのシリアライズ",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Pythonオブジェクトのデシリアライズ",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Pythonオブジェクトのシリアライズとデシリアライズ",
+ "isCorrect": "true"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 18,
+ "title": "Webアプリを構築する: 講義後の小テスト",
+ "quiz": [
+ {
+ "questionText": "事前学習済みモデルをWeb上にホスティングするために使えるPythonのツールは何でしょうか?",
+ "answerOptions": [
+ {
+ "answerText": "Flask",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "TensorFlow.js",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "onnx.js",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "SaaSは何の略でしょうか?",
+ "answerOptions": [
+ {
+ "answerText": "System as a Service",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Software as a Service",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Security as a Service",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Scikit-learn の LabelEncoder ライブラリが行うことは何でしょうか?",
+ "answerOptions": [
+ {
+ "answerText": "データをアルファベットにエンコードすること",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "データを数値にエンコードすること",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "データをシリアルにエンコードすること",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 19,
+ "title": "分類 1: 講義前の小テスト",
+ "quiz": [
+ {
+ "questionText": "分類は教師あり学習の一種であり、多くの共通点を持つのは",
+ "answerOptions": [
+ {
+ "answerText": "時系列",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "回帰手法",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "自然言語処理",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "分類はどのような疑問に答えられるでしょうか?",
+ "answerOptions": [
+ {
+ "answerText": "このメールはスパムでしょうか?",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "豚は飛べるでしょうか?",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "人生の意味とは何でしょうか?",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "分類手法を使うための最初のステップは何でしょうか?",
+ "answerOptions": [
+ {
+ "answerText": "データのクラスを作成すること",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "データのクリーニングとバランシング",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "データポイントをグループまたは結果に割り当てること",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 20,
+ "title": "分類 1: 講義後の小テスト",
+ "quiz": [
+ {
+ "questionText": "多クラス問題とは何でしょうか?",
+ "answerOptions": [
+ {
+ "answerText": "データポイントを複数のクラスに分類するタスク",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "データポイントを複数のクラスのどれかに分類するタスク",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "データポイントを複数の方法でクリーニングするタスク",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "分類器が問題を解決するためには、何度も現れるデータや役に立たないデータを除くことが重要である",
+ "answerOptions": [
+ {
+ "answerText": "正しい",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "正しくない",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "データをバランシングする一番の理由は何でしょうか?",
+ "answerOptions": [
+ {
+ "answerText": "不均衡データは可視化すると見栄えが悪いから",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "データをバランシングすると機械学習モデルがひとつのクラスに偏らず、良い結果が得られるから",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "データをバランシングするとより多くのデータポイントが得られるから",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 21,
+ "title": "分類 2: 講義前の小テスト",
+ "quiz": [
+ {
+ "questionText": "バランシングおよびクリーニングされたデータが最も良い分類結果につながる",
+ "answerOptions": [
+ {
+ "answerText": "正しい",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "正しくない",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "正しい分類器はどのように選ぶでしょうか?",
+ "answerOptions": [
+ {
+ "answerText": "どの分類器がどの場面に最適かを理解する",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "経験に基づく推測と確認",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "上の両方",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "分類は一種の",
+ "answerOptions": [
+ {
+ "answerText": "自然言語処理",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "教師あり学習",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "プログラミング言語",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 22,
+ "title": "分類 2: 講義後の小テスト",
+ "quiz": [
+ {
+ "questionText": "「ソルバ」とは何でしょうか?",
+ "answerOptions": [
+ {
+ "answerText": "自分の仕事をダブルチェックしてくれる人",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "最適化問題で使用するアルゴリズム",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "機械学習の手法",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "レッスンで使用した分類器はどれでしょうか?",
+ "answerOptions": [
+ {
+ "answerText": "ロジスティック回帰",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "決定木",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "一対他の多クラス",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "分類アルゴリズムが期待通りに動作しているかどうかは、どのようにして知ることができるでしょうか?",
+ "answerOptions": [
+ {
+ "answerText": "予測の精度を確認する",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "他のアルゴリズムと比較する",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "似た問題を解決した際にどれだけ優れていたかを過去のデータから確認する",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 23,
+ "title": "分類 3: 講義前の小テスト",
+ "quiz": [
+ {
+ "questionText": "最初に試すのに適した分類器は",
+ "answerOptions": [
+ {
+ "answerText": "線形サポートベクター分類器",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "K-Means",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "論理サポートベクター分類器",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "正則化がコントロールするのは",
+ "answerOptions": [
+ {
+ "answerText": "パラメータの影響",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "学習スピードの影響",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "外れ値の影響",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "k近傍分類器が使えるのは",
+ "answerOptions": [
+ {
+ "answerText": "教師あり学習",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "教師なし学習",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "上の両方",
+ "isCorrect": "true"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 24,
+ "title": "分類 3: 講義後の小テスト",
+ "quiz": [
+ {
+ "questionText": "サポートベクター分類器が使えるのは",
+ "answerOptions": [
+ {
+ "answerText": "分類",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "回帰",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "上の両方",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "ランダムフォレストは〇〇な分類器の一種である",
+ "answerOptions": [
+ {
+ "answerText": "アンサンブル",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "ディセンブル",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "アセンブル",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "アダブーストは次のように知られている",
+ "answerOptions": [
+ {
+ "answerText": "誤って分類された要素の重みに着目する",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "外れ値に着目する",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "誤ったデータに着目する",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 25,
+ "title": "分類 4: 講義前の小テスト",
+ "quiz": [
+ {
+ "questionText": "推薦システムが使えるのは",
+ "answerOptions": [
+ {
+ "answerText": "良いレストランの推薦",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "試すべきファッションの推薦",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "上の両方",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "Webアプリにモデルを埋め込むことでオフライン対応が可能になる",
+ "answerOptions": [
+ {
+ "answerText": "正しい",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "正しくない",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Onnx Runtime が使えるのは",
+ "answerOptions": [
+ {
+ "answerText": "Webアプリの中でモデルを実行する",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "モデルを学習する",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "ハイパーパラメータのチューニング",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 26,
+ "title": "分類 4: 講義後の小テスト",
+ "quiz": [
+ {
+ "questionText": "Netronアプリが役立つのは",
+ "answerOptions": [
+ {
+ "answerText": "データの可視化",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "モデル構造の可視化",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Webアプリのテスト",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Scikit-learn モデルをOnnxで扱えるようにするために使うのは",
+ "answerOptions": [
+ {
+ "answerText": "sklearn-app",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "sklearn-web",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "sklearn-onnx",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "Webアプリでモデルを使うことは、通称",
+ "answerOptions": [
+ {
+ "answerText": "推論",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "干渉",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "保険",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 27,
+ "title": "クラスタリングへの導入: 講義前の小テスト",
+ "quiz": [
+ {
+ "questionText": "クラスタリングの実例は",
+ "answerOptions": [
+ {
+ "answerText": "食卓の準備",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "洗濯物の分類",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "食料品の買い物",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "クラスタリングの手法が使える産業は",
+ "answerOptions": [
+ {
+ "answerText": "銀行",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "電子商取引",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "上の両方",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "クラスタリングは一種の",
+ "answerOptions": [
+ {
+ "answerText": "教師あり学習",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "教師なし学習",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "強化学習",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 28,
+ "title": "クラスタリングへの導入: 講義後の小テスト",
+ "quiz": [
+ {
+ "questionText": "ユークリッド幾何学が配置されるのは",
+ "answerOptions": [
+ {
+ "answerText": "平面",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "曲面",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "球面",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "クラスタリングデータの密度が関係するのは",
+ "answerOptions": [
+ {
+ "answerText": "ノイズ",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "深さ",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "妥当性",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "最も有名なクラスタリングアルゴリズムは",
+ "answerOptions": [
+ {
+ "answerText": "k-means",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "k-middle",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "k-mart",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 29,
+ "title": "K-Means法: 講義前の小テスト",
+ "quiz": [
+ {
+ "questionText": "K-Means の派生元は",
+ "answerOptions": [
+ {
+ "answerText": "電気工学",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "信号処理",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "計算言語学",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "良いシルエットスコアとは",
+ "answerOptions": [
+ {
+ "answerText": "クラスタがよく分離されていて、よく定義されている",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "クラスタの数が少ない",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "クラスタの数が多い",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "分散とは",
+ "answerOptions": [
+ {
+ "answerText": "平均との差を二乗した値の平均",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "クラスタリングにおいて高くなりすぎると問題になるもの",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "上の両方",
+ "isCorrect": "true"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 30,
+ "title": "K-Means法: 講義後の小テスト",
+ "quiz": [
+ {
+ "questionText": "ボロノイ図が表すのは",
+ "answerOptions": [
+ {
+ "answerText": "クラスタの分散",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "クラスタのシードとその領域",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "クラスタの慣性",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "慣性とは",
+ "answerOptions": [
+ {
+ "answerText": "どれだけ内部にまとまっているクラスタかを表す指標",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "クラスタがどれだけ動くかを表す指標",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "クラスタの質を表す指標",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "K-Means を使う際は、最初に 'k' の値を決める必要がある",
+ "answerOptions": [
+ {
+ "answerText": "正しい",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "正しくない",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 31,
+ "title": "自然言語処理への導入: 講義前の小テスト",
+ "quiz": [
+ {
+ "questionText": "レッスンにおいてNLPは何の略でしょうか?",
+ "answerOptions": [
+ {
+ "answerText": "Neural Language Processing",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "natural language processing",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Natural Linguistic Processing",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "初期のボットであるElizaが演じていたのは",
+ "answerOptions": [
+ {
+ "answerText": "療法士",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "医者",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "看護師",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "アラン・チューリングの「チューリングテスト」が判定しようとしていたのは、コンピュータが",
+ "answerOptions": [
+ {
+ "answerText": "人間と見分けがつかないかどうか",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "思考しているかどうか",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "上の両方",
+ "isCorrect": "true"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 32,
+ "title": "自然言語処理への導入: 講義後の小テスト",
+ "quiz": [
+ {
+ "questionText": "ジョセフ・ワイゼンバウムが発明したボットは",
+ "answerOptions": [
+ {
+ "answerText": "Elisha",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Eliza",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Eloise",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "会話型のボットが出力する方法は",
+ "answerOptions": [
+ {
+ "answerText": "あらかじめ決められた選択肢からランダムに選ぶ",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "入力を分析して機械知能を使う",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "上の両方",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "ボットをより効果的にするにはどうすれば良いでしょうか?",
+ "answerOptions": [
+ {
+ "answerText": "ボットに対してより多くの質問をする",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "ボットにより多くのデータを与えて、それに応じた学習をさせる",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "ボットは頭が悪いので学習できない :(",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 33,
+ "title": "自然言語処理のタスク: 講義前の小テスト",
+ "quiz": [
+ {
+ "questionText": "トークン化は",
+ "answerOptions": [
+ {
+ "answerText": "文章を句読点で分割する",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "文章をトークン(単語)に分割する",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "文章をフレーズに分割する",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "埋め込みは",
+ "answerOptions": [
+ {
+ "answerText": "単語をクラスタ化するために文章を数値に変換する",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "単語をフレーズに埋め込む",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "文を段落に埋め込む",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "品詞タグ付けは",
+ "answerOptions": [
+ {
+ "answerText": "文を品詞で分ける",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "トークン化された単語を品詞でタグ付けする",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "文を図で表す",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 34,
+ "title": "自然言語処理のタスク: 講義後の小テスト",
+ "quiz": [
+ {
+ "questionText": "単語の出現頻度に関する辞書を作る際に使うのは",
+ "answerOptions": [
+ {
+ "answerText": "単語とフレーズの辞書",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "単語とフレーズの出現頻度",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "単語とフレーズのライブラリ",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "N-grams とは",
+ "answerOptions": [
+ {
+ "answerText": "一定の長さの単語列に分割できる文章",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "一定の長さの文字列に分割できる単語",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "一定の長さの段落に分割できる文章",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "感情分析は",
+ "answerOptions": [
+ {
+ "answerText": "フレーズがポジティブかネガティブかを分析する",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "フレーズが感傷的かどうかを分析する",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "フレーズの悲しさを分析する",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 35,
+ "title": "自然言語処理と翻訳: 講義前の小テスト",
+ "quiz": [
+ {
+ "questionText": "単純な翻訳は",
+ "answerOptions": [
+ {
+ "answerText": "単語のみを翻訳する",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "文章構造を翻訳する",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "感情を翻訳する",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "文章の「コーパス」とは",
+ "answerOptions": [
+ {
+ "answerText": "少量の文章",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "大量の文章",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "ひとつの標準的な文章",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "もしモデルを構築するのに十分な人間の翻訳があれば、機械学習モデルは",
+ "answerOptions": [
+ {
+ "answerText": "翻訳を省略できる",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "翻訳を標準化できる",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "翻訳の精度を高められる",
+ "isCorrect": "true"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 36,
+ "title": "自然言語処理と翻訳: 講義後の小テスト",
+ "quiz": [
+ {
+ "questionText": "TextBlob の翻訳ライブラリの基盤は",
+ "answerOptions": [
+ {
+ "answerText": "Google翻訳",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "Bing",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "独自の機械学習モデル",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "`blob.translate` を使用するために必要なのは",
+ "answerOptions": [
+ {
+ "answerText": "インターネット接続",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "辞書",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "JavaScript",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "感情を判定するために機械学習のアプローチで行うのは",
+ "answerOptions": [
+ {
+ "answerText": "手で作成した意見やスコアに回帰の手法を適用して、パターンを探す",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "手で作成した意見やスコアに自然言語処理の手法を適用して、パターンを探す",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "手で作成した意見やスコアにクラスタリングの手法を適用して、パターンを探す",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 37,
+ "title": "自然言語処理 4: 講義前の小テスト",
+ "quiz": [
+ {
+ "questionText": "人間が書いたり話した文章から得られる情報は何でしょうか?",
+ "answerOptions": [
+ {
+ "answerText": "パターンと頻度",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "感情と意味",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "上の両方",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "感情分析とは何でしょうか?",
+ "answerOptions": [
+ {
+ "answerText": "家宝に感傷的な価値があるかどうかの研究",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "感情の状態や主観的な情報を体系的に識別・抽出・定量化・研究する方法",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "ある人物が悲しいのか楽しいのかを見分ける能力",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "ホテルのレビューのデータセット・Python・感情分析を使うことで答えられる質問は何でしょうか?",
+ "answerOptions": [
+ {
+ "answerText": "レビューで最もよく使われる単語やフレーズは何でしょうか?",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "どのリゾートに最も良いプールがあるでしょうか?",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "このホテルにはバレーパーキングがあるでしょうか?",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 38,
+ "title": "自然言語処理 4: 講義後の小テスト",
+ "quiz": [
+ {
+ "questionText": "自然言語処理の本質とは何でしょうか?",
+ "answerOptions": [
+ {
+ "answerText": "人間の言葉を楽しいものと悲しいものに分類すること",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "人の手を借りずに意味や感情を読み取ること",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "感情の異常を見つけて調べること",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "データをクリーニングする際に気を付けたほうが良いことは何でしょうか?",
+ "answerOptions": [
+ {
+ "answerText": "他の言語の文字",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "空の行や列",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "上の両方",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "データを操作する前にデータとその弱点を理解するのが重要である",
+ "answerOptions": [
+ {
+ "answerText": "正しい",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "正しくない",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 39,
+ "title": "自然言語処理 5: 講義前の小テスト",
+ "quiz": [
+ {
+ "questionText": "分析する前にデータをクリーニングすることが重要なのはなぜでしょうか?",
+ "answerOptions": [
+ {
+ "answerText": "データが欠損していたり不正だったりする列があるかもしれないから",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "汚いデータはデータセットに関する誤った結論につながる可能性があるから",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "上の両方",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "データクリーニング戦略の例は何でしょうか?",
+ "answerOptions": [
+ {
+ "answerText": "特定の質問に答えるために有用でない列や行の削除",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "仮説に合わない検証値の排除",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "外れ値を別の表に移し、その表で計算を行って、一致するかどうかを確認",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Tag列を使ってデータを分類すると便利なことがある",
+ "answerOptions": [
+ {
+ "answerText": "正しい",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "正しくない",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 40,
+ "title": "自然言語処理 5: 講義後の小テスト",
+ "quiz": [
+ {
+ "questionText": "データセットの目的は何でしょうか?",
+ "answerOptions": [
+ {
+ "answerText": "世界中のホテルに対する否定的および肯定的なレビューがいくつあるかを確認すること",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "最も良いホテルを選ぶために役立つ意見と列を追加すること",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "人々がなぜ特定のレビューを残すのかを分析すること",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "ストップワードとは何でしょうか?",
+ "answerOptions": [
+ {
+ "answerText": "文章の印象を変えない一般的な単語",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "感情分析を高速化するために除去できる単語",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "上の両方",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "感情分析をテストするには、同じレビューに対するレビュアーのスコアが一致していることを確認する",
+ "answerOptions": [
+ {
+ "answerText": "正しい",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "正しくない",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 41,
+ "title": "時系列への導入: 講義前の小テスト",
+ "quiz": [
+ {
+ "questionText": "時系列予測が役立つのは",
+ "answerOptions": [
+ {
+ "answerText": "将来のコストを決めるとき",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "将来の価格を予測するとき",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "上の両方",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "時系列とは、次のような列である",
+ "answerOptions": [
+ {
+ "answerText": "空間的に連続した等間隔の点",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "時間的に連続した等間隔の点",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "時間的および空間的に連続した等間隔の点",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "時系列が使用できるのは",
+ "answerOptions": [
+ {
+ "answerText": "地震予測",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "コンピュータビジョン",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "色解析",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 42,
+ "title": "時系列への導入: 講義後の小テスト",
+ "quiz": [
+ {
+ "questionText": "時系列のトレンドとは",
+ "answerOptions": [
+ {
+ "answerText": "時間の経過に伴う測定可能な増加と減少",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "時間の経過に伴う減少の定量化",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "時間の経過に伴う増加と減少の差",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "外れ値とは",
+ "answerOptions": [
+ {
+ "answerText": "標準的なデータの分散に近い点",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "標準的なデータの分散から離れた点",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "標準的なデータの分散の中にある点",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "時系列予測が最も有効なのは",
+ "answerOptions": [
+ {
+ "answerText": "計量経済学",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "歴史",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "図書館",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 43,
+ "title": "時系列ARIMA: 講義前の小テスト",
+ "quiz": [
+ {
+ "questionText": "ARIMAは次の略である",
+ "answerOptions": [
+ {
+ "answerText": "AutoRegressive Integral Moving Average",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "AutoRegressive Integrated Moving Action",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "AutoRegressive Integrated Moving Average",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "定常性とは",
+ "answerOptions": [
+ {
+ "answerText": "時間をずらしても属性が変わらないデータ",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "時間をずらしても分布が変わらないデータ",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "時間をずらすと分布が変わるデータ",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "差分変換は",
+ "answerOptions": [
+ {
+ "answerText": "トレンドや季節性を安定させる",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "トレンドや季節性を悪化させる",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "トレンドや季節性を排除する",
+ "isCorrect": "true"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 44,
+ "title": "時系列ARIMA: 講義後の小テスト",
+ "quiz": [
+ {
+ "questionText": "ARIMAは、時系列データの特殊な形に対してモデルを次のように適合させるために使われる",
+ "answerOptions": [
+ {
+ "answerText": "できるだけ平坦に",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "できるだけ近く",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "散布図によって",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "SARIMAXを使うのは",
+ "answerOptions": [
+ {
+ "answerText": "季節性ARIMAモデルを管理するため",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "特別なARIMAモデルを管理するため",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "統計的なARIMAモデルを管理するため",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "「ウォークフォワード」検証では",
+ "answerOptions": [
+ {
+ "answerText": "モデルを検証しながら段階的に再評価する",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "モデルを検証しながら段階的に再学習する",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "モデルを検証しながら段階的に再構成する",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 45,
+ "title": "強化 1: 講義前の小テスト",
+ "quiz": [
+ {
+ "questionText": "強化学習とは何でしょうか?",
+ "answerOptions": [
+ {
+ "answerText": "理解するまで何度も教えること",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "何回も試行することで、ある環境におけるエージェントの最適な行動を解読する学習手法",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "複数の試行を一度に行う方法を理解すること",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "方策とは何でしょうか?",
+ "answerOptions": [
+ {
+ "answerText": "任意の状態で行動を返す関数",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "返品できるかどうかを示す書類",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "ランダムな目的で使用される関数",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "報酬関数はある環境における各状態に対するスコアを返す",
+ "answerOptions": [
+ {
+ "answerText": "正しい",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "正しくない",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 46,
+ "title": "強化 1: 講義後の小テスト",
+ "quiz": [
+ {
+ "questionText": "Q学習とは何でしょうか?",
+ "answerOptions": [
+ {
+ "answerText": "各状態の「良さ」を記録する仕組み",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "Qテーブルによって方策が定義されているアルゴリズム",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "上の両方",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "ランダムウォークに対応するQテーブルの値は何でしょうか?",
+ "answerOptions": [
+ {
+ "answerText": "すべて同じ値",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "-0.25",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "すべて違う値",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "レッスンの学習プロセスでは搾取よりも探索を行ったほうが良かった",
+ "answerOptions": [
+ {
+ "answerText": "正しい",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "正しくない",
+ "isCorrect": "true"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 47,
+ "title": "強化 2: 講義前の小テスト",
+ "quiz": [
+ {
+ "questionText": "チェスや囲碁は連続した状態を持つゲームである",
+ "answerOptions": [
+ {
+ "answerText": "正しい",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "正しくない",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "カートポール問題とは何でしょうか?",
+ "answerOptions": [
+ {
+ "answerText": "外れ値を排除するプロセス",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "買い物かごを最適化する方法",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "バランシングの簡易版",
+ "isCorrect": "true"
+ }
+ ]
+ },
+ {
+ "questionText": "ゲームの中で起こりうる状態における様々なシナリオを行うために使えるツールは何でしょうか?",
+ "answerOptions": [
+ {
+ "answerText": "推測と確認",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "シミュレーション環境",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "状態遷移テスト",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 48,
+ "title": "強化 2: 講義後の小テスト",
+ "quiz": [
+ {
+ "questionText": "ある環境で起こりうるすべての状態を定義する場所はどこでしょうか?",
+ "answerOptions": [
+ {
+ "answerText": "メソッド",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "アクションスペース",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "アクションリスト",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "辞書のキーバリューとして使ったペアは何でしょうか?",
+ "answerOptions": [
+ {
+ "answerText": "キーに(state, action)、バリューにQテーブルのエントリ",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "キーにstate、バリューにaction",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "キーにqvalues関数の値、バリューにaction",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "Q学習で使用したハイパーパラメータは何でしょうか?",
+ "answerOptions": [
+ {
+ "answerText": "Qテーブルの値・現在の報酬・ランダムなアクション",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "学習率・割引率・探索/搾取率",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "累積報酬・学習率・探索率",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 49,
+ "title": "実世界への応用: 講義前の小テスト",
+ "quiz": [
+ {
+ "questionText": "金融業界における機械学習の応用例は何でしょうか?",
+ "answerOptions": [
+ {
+ "answerText": "自然言語処理を使ったカスタマージャーニーのパーソナライズ",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "線形回帰を使った健康管理",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "時系列を使ったエネルギー管理",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "再入院を管理するために病院で使える機械学習の手法は何でしょうか?",
+ "answerOptions": [
+ {
+ "answerText": "クラスタリング",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "時系列",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "自然言語処理",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "エネルギー管理に時系列を使用する例は何でしょうか?",
+ "answerOptions": [
+ {
+ "answerText": "動物のモーションセンシング",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "スマートパーキングメーター",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "森林火災の追跡",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ },
+ {
+ "id": 50,
+ "title": "実世界への応用: 講義後の小テスト",
+ "quiz": [
+ {
+ "questionText": "クレジットカードの不正利用を検出するために使用できる機械学習の手法はどれでしょうか?",
+ "answerOptions": [
+ {
+ "answerText": "回帰",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "クラスタリング",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "自然言語処理",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "森林管理で例示されている機械学習の手法はどれでしょうか?",
+ "answerOptions": [
+ {
+ "answerText": "強化学習",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "時系列",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "自然言語処理",
+ "isCorrect": "false"
+ }
+ ]
+ },
+ {
+ "questionText": "ヘルスケア業界における機械学習の応用例は何でしょうか?",
+ "answerOptions": [
+ {
+ "answerText": "回帰を使った学生の行動予測",
+ "isCorrect": "false"
+ },
+ {
+ "answerText": "分類器を使った臨床試験の管理",
+ "isCorrect": "true"
+ },
+ {
+ "answerText": "分類器を使った動物のモーションセンシング",
+ "isCorrect": "false"
+ }
+ ]
+ }
+ ]
+ }
+ ]
+ }
+]
diff --git a/quiz-app/src/assets/translations/tr.json b/quiz-app/src/assets/translations/tr.json
index 050bbd2a..aa479e28 100644
--- a/quiz-app/src/assets/translations/tr.json
+++ b/quiz-app/src/assets/translations/tr.json
@@ -412,7 +412,7 @@
},
{
"answerText": "önyargı için potansiyel bir sebebi keşfedebilirsiniz",
- "isCorrect": "true"
+ "isCorrect": "false"
},
{
"answerText": "bunların her ikisi",
@@ -1092,7 +1092,7 @@
"answerOptions": [
{
"answerText": "veri noktalarını birden çok sınıfa sınıflandırma görevi",
- "isCorrect": "true"
+ "isCorrect": "false"
},
{
"answerText": "veri noktalarını birkaç sınıftan birine sınıflandırma görevi",
diff --git a/translations/README.ja.md b/translations/README.ja.md
new file mode 100644
index 00000000..39ab3fae
--- /dev/null
+++ b/translations/README.ja.md
@@ -0,0 +1,147 @@
+[](https://github.com/microsoft/ML-For-Beginners/blob/master/LICENSE)
+[](https://GitHub.com/microsoft/ML-For-Beginners/graphs/contributors/)
+[](https://GitHub.com/microsoft/ML-For-Beginners/issues/)
+[](https://GitHub.com/microsoft/ML-For-Beginners/pulls/)
+[](http://makeapullrequest.com)
+
+[](https://GitHub.com/microsoft/ML-For-Beginners/watchers/)
+[](https://GitHub.com/microsoft/ML-For-Beginners/network/)
+[](https://GitHub.com/microsoft/ML-For-Beginners/stargazers/)
+
+# 初心者のための機械学習 - カリキュラム
+
+> 🌍 世界の文化に触れながら機械学習を探求する旅 🌍
+
+マイクロソフトの Azure Cloud Advocates では、12週間、24レッスンの**機械学習**に関するカリキュラムを提供しています。このカリキュラムでは、今後公開する予定の「初心者のためのAI」で扱う深層学習を避け、主に Scikit-learn ライブラリを使用した**古典的機械学習**と呼ばれるものについて学びます。同様に公開予定の「初心者のためのデータサイエンス」と合わせてご活用ください!
+
+世界各地のデータに古典的な手法を適用しながら、一緒に世界を旅してみましょう。各レッスンには、レッスン前後の小テストや、レッスンを完了するための指示・解答・課題などが含まれています。新しいスキルを「定着」させるものとして実証されているプロジェクトベースの教育法によって、構築しながら学ぶことができます。
+
+**✍️ 著者の皆様に心から感謝いたします** Jen Looper さん、Stephen Howell さん、Francesca Lazzeri さん、Tomomi Imura さん、Cassie Breviu さん、Dmitry Soshnikov さん、Chris Noring さん、Ornella Altunyan さん、そして Amy Boyd さん
+
+**🎨 イラストレーターの皆様にも心から感謝いたします** Tomomi Imura さん、Dasani Madipalli さん、そして Jen Looper さん
+
+**🙏 Microsoft Student Ambassador の著者・査読者・コンテンツ提供者の皆様に特に感謝いたします 🙏** 特に、Rishit Dagli さん、Muhammad Sakib Khan Inan さん、Rohan Raj さん、Alexandru Petrescu さん、Abhishek Jaiswal さん、Nawrin Tabassum さん、Ioan Samuila さん、そして Snigdha Agarwal さん
+
+---
+
+# はじめに
+
+**学生の皆さん**、このカリキュラムを利用するには、自分のGitHubアカウントにリポジトリ全体をフォークして、一人もしくはグループで演習を完了させてください。
+
+- 講義前の小テストから始めてください。
+- 知識を確認するたびに立ち止まったり振り返ったりしながら、講義を読んで各アクティビティを完了させてください。
+- 解答のコードをただ実行するのではなく、レッスンを理解してプロジェクトを作成するようにしてください。なお、解答のコードは、プロジェクトに紐づく各レッスンの `/solution` フォルダにあります。
+- 講義後の小テストを受けてください。
+- チャレンジを完了させてください。
+- 課題を完了させてください。
+- レッスングループの完了後は [Discussionボード](https://github.com/microsoft/ML-For-Beginners/discussions) にアクセスし、適切なPAT表に記入することで「声に出して学習」してください。"PAT" とは Progress Assessment Tool(進捗評価ツール)の略で、学習を促進するために記入する表のことです。他のPATにリアクションすることもできるので、共に学ぶことが可能です。
+
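+フォークしたリポジトリをローカルに取得する手順の一例です(`<your-account>` は仮のアカウント名です):
+
+```bash
+# フォークした自分のリポジトリをクローンする(<your-account> は仮のプレースホルダ)
+git clone https://github.com/<your-account>/ML-For-Beginners.git
+cd ML-For-Beginners
+```
+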
+> さらに学習を進める場合は、[Microsoft Learn](https://docs.microsoft.com/users/jenlooper-2911/collections/k7o7tg1gp306q4?WT.mc_id=academic-15963-cxa) のラーニングパスに従うことをお勧めします。
+
+**先生方**、このカリキュラムをどのように使用するか、[いくつかの提案](../for-teachers.md) があります。
+
+---
+
+## チームの紹介
+
+[](https://youtu.be/Tj1XWrDSYJU "プロモーションビデオ")
+
+> 🎥 上の画像をクリックすると、このプロジェクトと、プロジェクトを作った人たちについてのビデオを観ることができます!
+
+---
+
+## 教育法
+
+このカリキュラムを構築するにあたり、私たちは2つの教育方針を選びました。**プロジェクトベース**の体験と、**頻繁な小テスト**を含むことです。さらにこのカリキュラムには、まとまりを持たせるための共通の**テーマ**があります。
+
+内容とプロジェクトとの整合性を保つことで、学生にとって学習プロセスがより魅力的になり、概念の定着度が高まります。さらに、授業前の軽い小テストは学生の学習意欲を高め、授業後の2回目の小テストはより一層の定着につながります。このカリキュラムは柔軟かつ楽しいものになるようデザインされており、すべて、もしくは一部を受講することが可能です。プロジェクトは小さなものから始まり、12週間の間に少しずつ複雑なものになっていきます。また、このカリキュラムには機械学習の実世界への応用に関するあとがきも含んでおり、追加の単位あるいは議論の題材として使用できます。
+
+> [行動規範](../CODE_OF_CONDUCT.md)、[貢献](../CONTRIBUTING.md)、[翻訳](../TRANSLATIONS.md) のガイドラインをご覧ください。建設的なご意見をお待ちしております!
+
+## 各レッスンの内容
+
+- オプションのスケッチノート
+- オプションの補足ビデオ
+- 講義前の小テスト
+- 文章によるレッスン
+- プロジェクトベースのレッスンを行うため、プロジェクトの構築方法に関する段階的なガイド
+- 知識の確認
+- チャレンジ
+- 副読本
+- 課題
+- 講義後の小テスト
+
+> **小テストに関する注意**: すべての小テストは [このアプリ](https://jolly-sea-0a877260f.azurestaticapps.net) に含まれており、各3問からなる50個の小テストがあります。これらはレッスン内からリンクされていますが、アプリをローカルで実行することもできます。`quiz-app` フォルダ内の指示に従ってください。
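+
+たとえば、小テストアプリをローカルで動かす手順の一例です(一般的な Vue CLI 構成を想定した仮の例であり、実際の手順は `quiz-app` フォルダ内の指示を優先してください):
+
+```bash
+# quiz-app フォルダで依存関係をインストールし、開発サーバーを起動する
+cd quiz-app
+npm install
+npm run serve
+```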
+
+| レッスン番号 | トピック | レッスングループ | 学習の目的 | 関連するレッスン | 著者 |
+| :----------: | :------------------------------------------: | :----------------------------------------------------: | ------------------------------------------------------------------------------------------ | :---------------------------------------------------------------------: | :------------: |
+| 01 | 機械学習への導入 | [導入](../1-Introduction/translations/README.ja.md) | 機械学習の基本的な概念を学ぶ | [レッスン](../1-Introduction/1-intro-to-ML/translations/README.ja.md) | Muhammad |
+| 02 | 機械学習の歴史 | [導入](../1-Introduction/translations/README.ja.md) | この分野の背景にある歴史を学ぶ | [レッスン](../1-Introduction/2-history-of-ML/translations/README.ja.md) | Jen and Amy |
+| 03 | 公平性と機械学習 | [導入](../1-Introduction/translations/README.ja.md) | 機械学習モデルを構築・適用する際に学生が考慮すべき、公平性に関する重要な哲学的問題は何か? | [レッスン](../1-Introduction/3-fairness/translations/README.ja.md) | Tomomi |
+| 04 | 機械学習の手法 | [導入](../1-Introduction/translations/README.ja.md) | 機械学習の研究者はどのような手法でモデルを構築しているか? | [レッスン](../1-Introduction/4-techniques-of-ML/README.md) | Chris and Jen |
+| 05 | 回帰への導入 | [回帰](../2-Regression/README.md) | 回帰モデルをPythonと Scikit-learn で始める | [レッスン](../2-Regression/1-Tools/translations/README.ja.md) | Jen |
+| 06 | 北米のカボチャの価格 🎃 | [回帰](../2-Regression/README.md) | 機械学習に向けてデータを可視化してクリーニングする | [レッスン](../2-Regression/2-Data/translations/README.ja.md) | Jen |
+| 07 | 北米のカボチャの価格 🎃 | [回帰](../2-Regression/README.md) | 線形および多項式回帰モデルを構築する | [レッスン](../2-Regression/3-Linear/README.md) | Jen |
+| 08 | 北米のカボチャの価格 🎃 | [回帰](../2-Regression/README.md) | ロジスティック回帰モデルを構築する | [レッスン](../2-Regression/4-Logistic/README.md) | Jen |
+| 09 | Webアプリ 🔌 | [Web アプリ](../3-Web-App/README.md) | 学習したモデルを使用するWebアプリを構築する | [レッスン](../3-Web-App/1-Web-App/README.md) | Jen |
+| 10 | 分類への導入 | [分類](../4-Classification/README.md) | データをクリーニング・前処理・可視化する。分類への導入 | [レッスン](../4-Classification/1-Introduction/README.md) | Jen and Cassie |
+| 11 | 美味しいアジア料理とインド料理 🍜 | [分類](../4-Classification/README.md) | 分類器への導入 | [レッスン](../4-Classification/2-Classifiers-1/README.md) | Jen and Cassie |
+| 12 | 美味しいアジア料理とインド料理 🍜 | [分類](../4-Classification/README.md) | その他の分類器 | [レッスン](../4-Classification/3-Classifiers-2/README.md) | Jen and Cassie |
+| 13 | 美味しいアジア料理とインド料理 🍜 | [分類](../4-Classification/README.md) | モデルを使用して推薦Webアプリを構築する | [レッスン](../4-Classification/4-Applied/README.md) | Jen |
+| 14 | クラスタリングへの導入 | [クラスタリング](../5-Clustering/README.md) | データをクリーニング・前処理・可視化する。クラスタリングへの導入 | [レッスン](../5-Clustering/1-Visualize/README.md) | Jen |
+| 15 | ナイジェリアの音楽的嗜好を探る 🎧 | [クラスタリング](../5-Clustering/README.md) | K-Means法を探る | [レッスン](../5-Clustering/2-K-Means/README.md) | Jen |
+| 16 | 自然言語処理への導入 ☕️ | [自然言語処理](../6-NLP/README.md) | 単純なボットを構築して自然言語処理の基礎を学ぶ | [レッスン](../6-NLP/1-Introduction-to-NLP/README.md) | Stephen |
+| 17 | 自然言語処理の一般的なタスク ☕️ | [自然言語処理](../6-NLP/README.md) | 言語構造を扱う際に必要となる一般的なタスクを理解することで、自然言語処理の知識を深める | [レッスン](../6-NLP/2-Tasks/README.md) | Stephen |
+| 18 | 翻訳と感情分析 ♥️ | [自然言語処理](../6-NLP/README.md) | ジェーン・オースティンの翻訳と感情分析 | [レッスン](../6-NLP/3-Translation-Sentiment/README.md) | Stephen |
+| 19 | ヨーロッパのロマンチックなホテル ♥️ | [自然言語処理](../6-NLP/README.md) | ホテルのレビューの感情分析 1 | [レッスン](../6-NLP/4-Hotel-Reviews-1/README.md) | Stephen |
+| 20 | ヨーロッパのロマンチックなホテル ♥️ | [自然言語処理](../6-NLP/README.md) | ホテルのレビューの感情分析 2 | [レッスン](../6-NLP/5-Hotel-Reviews-2/README.md) | Stephen |
+| 21 | 時系列予測への導入 | [時系列](../7-TimeSeries/README.md) | 時系列予測への導入 | [レッスン](../7-TimeSeries/1-Introduction/README.md) | Francesca |
+| 22 | ⚡️ 世界の電力使用量 ⚡️ - ARIMAによる時系列予測 | [時系列](../7-TimeSeries/README.md) | ARIMAによる時系列予測 | [レッスン](../7-TimeSeries/2-ARIMA/README.md) | Francesca |
+| 23 | 強化学習への導入 | [強化学習](../8-Reinforcement/README.md) | Q学習を使った強化学習への導入 | [レッスン](../8-Reinforcement/1-QLearning/README.md) | Dmitry |
+| 24 | ピーターが狼を避けるのを手伝ってください! 🐺 | [強化学習](../8-Reinforcement/README.md) | 強化学習ジム | [レッスン](../8-Reinforcement/2-Gym/README.md) | Dmitry |
+| あとがき | 実世界の機械学習シナリオと応用 | [実世界のML](../9-Real-World/README.md) | 興味深くて意義のある、古典的機械学習の実世界での応用 | [レッスン](../9-Real-World/1-Applications/README.md) | チーム |
+
+## オフラインアクセス
+
+[Docsify](https://docsify.js.org/#/) を使うと、このドキュメントをオフラインで実行できます。このリポジトリをフォークして、ローカルマシンに [Docsify をインストール](https://docsify.js.org/#/quickstart) し、このリポジトリのルートフォルダで `docsify serve` と入力してください。ローカルホストの3000番ポート、つまり `localhost:3000` でWebサイトが起動します。
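+
+たとえば、npm が利用できる環境では次のように実行できます(セットアップの一例です):
+
+```bash
+# Docsify の CLI をグローバルにインストールする(npm が必要)
+npm i -g docsify-cli
+# リポジトリのルートで実行すると localhost:3000 で配信される
+docsify serve
+```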
+
+## PDF
+
+カリキュラムのPDFへのリンクは [こちら](../pdf/readme.pdf)。
+
+## ヘルプ募集!
+
+翻訳をしてみませんか?[翻訳ガイドライン](../TRANSLATIONS.md) をご覧の上、[こちら](https://github.com/microsoft/ML-For-Beginners/issues/71) でお知らせください。
+
+## その他のカリキュラム
+
+私たちはその他のカリキュラムも提供しています!ぜひチェックしてみてください。
+
+- [初心者のためのWeb開発](https://aka.ms/webdev-beginners)
+- [初心者のためのIoT](https://aka.ms/iot-beginners)
diff --git a/translations/README.tr.md b/translations/README.tr.md
new file mode 100644
index 00000000..f4dab353
--- /dev/null
+++ b/translations/README.tr.md
@@ -0,0 +1,119 @@
+[](https://github.com/microsoft/ML-For-Beginners/blob/master/LICENSE)
+[](https://GitHub.com/microsoft/ML-For-Beginners/graphs/contributors/)
+[](https://GitHub.com/microsoft/ML-For-Beginners/issues/)
+[](https://GitHub.com/microsoft/ML-For-Beginners/pulls/)
+[](http://makeapullrequest.com)
+
+[](https://GitHub.com/microsoft/ML-For-Beginners/watchers/)
+[](https://GitHub.com/microsoft/ML-For-Beginners/network/)
+[](https://GitHub.com/microsoft/ML-For-Beginners/stargazers/)
+
+# Yeni Başlayanlar için Makine Öğrenimi - Bir Eğitim Programı
+
+> :earth_africa: Dünya kültürleri sayesinde Makine Öğrenimini keşfederken dünyayı gezin :earth_africa:
+
+Microsoft'taki Azure Cloud Destekleyicileri tamamen **Makine Öğrenimi** hakkında olan 12 hafta ve 24 derslik eğitim programını sunmaktan memnuniyet duyar. Bu eğitim programında, kütüphane olarak temelde Scikit-learn kullanarak ve yakında çıkacak olan 'Yeni Başlayanlar için Yapay Zeka' dersinde anlatılan derin öğrenmeden uzak durarak, zaman zaman adlandırıldığı şekliyle, **klasik makine öğrenimi**ni öğreneceksiniz. Bu dersleri yakında çıkacak olan 'Yeni Başlayanlar için Veri Bilimi' eğitim programımızla da birleştirin!
+
+Biz bu klasik teknikleri dünyanın birçok alanından verilere uygularken bizimle dünyayı gezin. Her bir ders, ders başı ve ders sonu kısa sınavlarını, dersi tamamlamak için yazılı yönergeleri, bir çözümü, bir ödevi ve daha fazlasını içerir. Yeni becerilerin 'yerleşmesi' için kanıtlanmış bir yol olan proje temelli pedagojimiz, yaparken öğrenmenizi sağlar.
+
+**:writing_hand: Yazarlarımıza yürekten teşekkürler** Jen Looper, Stephen Howell, Francesca Lazzeri, Tomomi Imura, Cassie Breviu, Dmitry Soshnikov, Chris Noring, Ornella Altunyan ve Amy Boyd
+
+**:art: Çizerlerimize de teşekkürler** Tomomi Imura, Dasani Madipalli ve Jen Looper
+
+**:pray: Microsoft Student Ambassador yazarlarımıza, eleştirmenlerimize ve içeriğe katkıda bulunanlara özel teşekkürler :pray:** özellikle Rishit Dagli, Muhammad Sakib Khan Inan, Rohan Raj, Alexandru Petrescu, Abhishek Jaiswal, Nawrin Tabassum, Ioan Samuila ve Snigdha Agarwal
+
+---
+# Başlarken
+
+**Öğrenciler**, bu eğitim programını kullanmak için, tüm yazılım havuzunu kendi GitHub hesabınıza çatallayın ve alıştırmaları kendiniz veya bir grup ile tamamlayın:
+
+- Bir ders öncesi kısa sınavı ile başlayın
+- Her bilgi kontrolünde durup derinlemesine düşünerek dersi okuyun ve etkinlikleri tamamlayın.
+- Çözüm kodunu çalıştırmaktansa dersleri kavrayarak projeleri yapmaya çalışın; yine de o çözüm kodu her proje yönelimli derste `/solution` klasörlerinde mevcut.
+- Ders sonrası kısa sınavını çözün
+- Meydan okumayı tamamlayın
+- Ödevi tamamlayın
+- Bir ders grubunu tamamladıktan sonra, [Tartışma Panosu](https://github.com/microsoft/ML-For-Beginners/discussions)'nu ziyaret edin ve uygun PAT yönergesini doldurarak "sesli öğrenin" (Yani, tamamen öğrenmeden önce öğrenme süreciniz üzerine derin düşünerek içgözlem ve geridönütlerle kendinizde farkındalık oluşturun.). 'PAT', bir Progress Assessment Tool'dur (Süreç Değerlendirme Aracı), öğrenmenizi daha ileriye taşımak için doldurduğunuz bir yönergedir. Diğer PAT'lere de karşılık verebilirsiniz, böylece beraber öğrenebiliriz.
+
+> İleri çalışma için, bu [Microsoft Learn](https://docs.microsoft.com/en-us/users/jenlooper-2911/collections/k7o7tg1gp306q4?WT.mc_id=academic-15963-cxa) modüllerini ve öğrenme rotalarını takip etmenizi tavsiye ediyoruz.
+
+**Öğretmenler**, bu eğitim programının nasıl kullanılacağı hakkında [bazı öneriler ekledik](../for-teachers.md).
+
+---
+
+## Takımla Tanışın
+
+[](https://youtu.be/Tj1XWrDSYJU "Promo video")
+
+> :movie_camera: Proje ve projeyi yaratanlar hakkındaki video için yukarıdaki fotoğrafa tıklayın!
+
+---
+## Pedagoji
+
+Bu eğitim programını oluştururken iki pedagojik ilke seçtik: uygulamalı **proje temelli** olduğundan ve **sık kısa sınavlar** içerdiğinden emin olmak. Ayrıca, bu eğitim programında tutarlılık sağlaması için genel bir **tema** var.
+
+İçeriğin projelerle uyumlu olduğuna emin olarak, süreç öğrenciler için daha ilgi çekici hale getirilmiştir ve kavramların akılda kalıcılığı artacaktır. Ayrıca, dersten önce yapılan düşük riskli bir kısa sınav öğrencinin konuyu öğrenme niyetini güçlendirirken, dersten sonra yapılan ikinci bir kısa sınav da akılda kalıcılığı artırır. Bu eğitim programı esnek ve eğlenceli olacak şekilde hazırlanmıştır ve tümüyle veya kısmen işlenebilir. Projeler kolay başlar ve 12 haftalık zamanın sonuna doğru karmaşıklıkları gittikçe artar. Bu eğitim programı, Makine Öğreniminin gerçek hayattaki uygulamaları üzerine, ek puan veya tartışma için bir temel olarak kullanılabilecek bir ek yazı da içermektedir.
+
+> [Davranış Kuralları](../CODE_OF_CONDUCT.md)'mızı, [Katkıda Bulunma](../CONTRIBUTING.md) ve [Çeviri](../TRANSLATIONS.md) kılavuz ilkelerimizi inceleyin. Yapıcı geridönütlerinizi memnuniyetle karşılıyoruz!
+## Her bir ders şunları içermektedir:
+
+- isteğe bağlı eskiz notu
+- isteğe bağlı ek video
+- ders öncesi ısınma kısa sınavı
+- yazılı ders
+- proje temelli dersler için, projenin nasıl yapılacağına dair adım adım kılavuz
+- bilgi kontrolleri
+- bir meydan okuma
+- ek okuma
+- ödev
+- ders sonrası kısa sınavı
+
+> **Kısa sınavlar hakkında bir not**: Her biri üç sorudan oluşan ve toplamda 50 tane olan tüm kısa sınavlar [bu uygulamada](https://jolly-sea-0a877260f.azurestaticapps.net) bulunmaktadır. Derslerin içinden de bağlantı yoluyla ulaşılabilirler ancak kısa sınav uygulaması yerelde çalıştırılabilir; `quiz-app` klasöründeki yönergeleri takip edin.
+
+
+| Ders Numarası | Konu | Ders Gruplandırması | Öğrenme Hedefleri | Ders | Yazar |
+| :-----------: | :--------------------------------------------------------: | :-------------------------------------------------: | ------------------------------------------------------------------------------------------------------------------------------- | :---------------------------------------------------: | :------------: |
+| 01 | Makine Öğrenimi Giriş | [Giriş](../1-Introduction/README.md) | Makine öğreniminin temel kavramlarını öğrenmek | [ders](../1-Introduction/1-intro-to-ML/README.md) | Muhammad |
+| 02 | Makine Öğrenimi Tarihi | [Giriş](../1-Introduction/README.md) | Bu alanın altında yatan tarihi öğrenmek | [ders](../1-Introduction/2-history-of-ML/README.md) | Jen ve Amy |
+| 03 | Eşitlik ve Makine Öğrenimi | [Giriş](../1-Introduction/README.md) | Öğrencilerin ML modelleri yaparken ve uygularken düşünmeleri gereken eşitlik hakkındaki önemli felsefi sorunlar nelerdir? | [ders](../1-Introduction/3-fairness/README.md) | Tomomi |
+| 04 | Makine Öğrenimi için Yöntemler | [Giriş](../1-Introduction/README.md) | ML araştırmacıları ML modelleri üretmek için hangi yöntemleri kullanırlar? | [ders](../1-Introduction/4-techniques-of-ML/README.md) | Chris ve Jen |
+| 05 | Regresyona Giriş | [Regresyon](../2-Regression/README.md) | Regresyon modelleri için Python ve Scikit-learn'e başlamak | [ders](../2-Regression/1-Tools/README.md) | Jen |
+| 06 | Kuzey Amerika balkabağı fiyatları :jack_o_lantern: | [Regresyon](../2-Regression/README.md) | ML hazırlığı için verileri görselleştirmek ve temizlemek | [ders](../2-Regression/2-Data/README.md) | Jen |
+| 07 | Kuzey Amerika balkabağı fiyatları :jack_o_lantern: | [Regresyon](../2-Regression/README.md) | Doğrusal ve polinom regresyon modelleri yapmak | [ders](../2-Regression/3-Linear/README.md) | Jen |
+| 08 | Kuzey Amerika balkabağı fiyatları :jack_o_lantern: | [Regresyon](../2-Regression/README.md) | Lojistik bir regresyon modeli yapmak | [ders](../2-Regression/4-Logistic/README.md) | Jen |
+| 09 | Bir Web Uygulaması :electric_plug: | [Web Uygulaması](../3-Web-App/README.md) | Eğittiğiniz modeli kullanmak için bir web uygulaması yapmak | [ders](../3-Web-App/1-Web-App/README.md) | Jen |
+| 10 | Sınıflandırmaya Giriş | [Sınıflandırma](../4-Classification/README.md) | Verilerinizi temizlemek, hazırlamak ve görselleştirmek; sınıflandırmaya giriş | [ders](../4-Classification/1-Introduction/README.md) | Jen ve Cassie |
+| 11 | Leziz Asya ve Hint mutfağı :ramen: | [Sınıflandırma](../4-Classification/README.md) | Sınıflandırıcılara giriş | [ders](../4-Classification/2-Classifiers-1/README.md) | Jen ve Cassie |
+| 12 | Leziz Asya ve Hint mutfağı :ramen: | [Sınıflandırma](../4-Classification/README.md) | Daha fazla sınıflandırıcı | [ders](../4-Classification/3-Classifiers-2/README.md) | Jen ve Cassie |
+| 13 | Leziz Asya ve Hint mutfağı :ramen: | [Sınıflandırma](../4-Classification/README.md) | Modelinizi kullanarak tavsiyede bulunan bir web uygulaması yapmak | [ders](../4-Classification/4-Applied/README.md) | Jen |
+| 14 | Kümelemeye Giriş | [Kümeleme](../5-Clustering/README.md) | Verilerinizi temizlemek, hazırlamak ve görselleştirmek; kümelemeye giriş | [ders](../5-Clustering/1-Visualize/README.md) | Jen |
+| 15 | Nijerya'nın Müzik Zevklerini Keşfetme :headphones: | [Kümeleme](../5-Clustering/README.md) | K merkezli kümeleme yöntemini keşfetmek | [ders](../5-Clustering/2-K-Means/README.md) | Jen |
+| 16 | Doğal Dil İşlemeye Giriş :coffee: | [Doğal Dil İşleme](../6-NLP/README.md) | Basit bir bot yaratarak NLP temellerini öğrenmek | [ders](../6-NLP/1-Introduction-to-NLP/README.md) | Stephen |
+| 17 | Yaygın NLP Görevleri :coffee: | [Doğal Dil İşleme](../6-NLP/README.md) | Dil yapılarıyla uğraşırken gereken yaygın görevleri anlayarak NLP bilginizi derinleştirmek | [ders](../6-NLP/2-Tasks/README.md) | Stephen |
+| 18 | Çeviri ve Duygu Analizi :hearts: | [Doğal Dil İşleme](../6-NLP/README.md) | Jane Austen ile çeviri ve duygu analizi | [ders](../6-NLP/3-Translation-Sentiment/README.md) | Stephen |
+| 19 | Avrupa'nın Romantik Otelleri :hearts: | [Doğal Dil İşleme](../6-NLP/README.md) | Otel değerlendirmeleriyle duygu analizi, 1 | [ders](../6-NLP/4-Hotel-Reviews-1/README.md) | Stephen |
+| 20 | Avrupa'nın Romantik Otelleri :hearts: | [Doğal Dil İşleme](../6-NLP/README.md) | Otel değerlendirmeleriyle duygu analizi 2 | [ders](../6-NLP/5-Hotel-Reviews-2/README.md) | Stephen |
+| 21 | Zaman Serisi Tahminine Giriş | [Zaman Serisi](../7-TimeSeries/README.md) | Zaman serisi tahminine giriş | [ders](../7-TimeSeries/1-Introduction/README.md) | Francesca |
+| 22 | :zap: Dünya Güç Kullanımı :zap: - ARIMA ile Zaman Serisi Tahmini | [Zaman Serisi](../7-TimeSeries/README.md) | ARIMA ile zaman serisi tahmini | [ders](../7-TimeSeries/2-ARIMA/README.md) | Francesca |
+| 23 | Pekiştirmeli Öğrenmeye Giriş | [Pekiştirmeli Öğrenme](../8-Reinforcement/README.md) | Q-Learning ile pekiştirmeli öğrenmeye giriş | [ders](../8-Reinforcement/1-QLearning/README.md) | Dmitry |
+| 24 | Peter'ın Kurttan Uzak Durmasına Yardım Edin! :wolf: | [Pekiştirmeli Öğrenme](../8-Reinforcement/README.md) | Pekiştirmeli öğrenme spor salonu | [ders](../8-Reinforcement/2-Gym/README.md) | Dmitry |
+| Ek Yazı | Gerçek Hayattan ML Senaryoları ve Uygulamaları | [Vahşi Doğada ML](../9-Real-World/README.md) | Klasik makine öğreniminin ilginç ve açıklayıcı gerçek hayat uygulamaları | [ders](../9-Real-World/1-Applications/README.md) | Takım |
+## Çevrimdışı erişim
+
+Bu dokümantasyonu [Docsify](https://docsify.js.org/#/) kullanarak çevrimdışı çalıştırabilirsiniz. Bu yazılım havuzunu çatallayın, yerel makinenizde [Docsify'ı kurun](https://docsify.js.org/#/quickstart) ve sonra bu yazılım havuzunun kök dizininde `docsify serve` yazın. İnternet sitesi, 3000 portunda `localhost:3000` yerel ana makinenizde sunulacaktır.
+
+## PDF'ler
+
+Eğitim programının bağlantılarla PDF'sine [buradan](../pdf/readme.pdf) ulaşabilirsiniz.
+
+## Yardım İsteniyor!
+
+Bir çeviri katkısında bulunmak ister misiniz? Lütfen [çeviri kılavuz ilkelerimizi](../TRANSLATIONS.md) okuyun ve [buraya](https://github.com/microsoft/ML-For-Beginners/issues/71) girdiyi ekleyin.
+
+## Diğer Eğitim Programları
+
+Takımımız başka eğitim programları üretiyor! İnceleyin:
+
+- [Yeni Başlayanlar için Web Geliştirme](https://aka.ms/webdev-beginners)
+- [Yeni Başlayanlar için Nesnelerin İnterneti](https://aka.ms/iot-beginners)
+
diff --git a/translations/README.zh-cn.md b/translations/README.zh-cn.md
new file mode 100644
index 00000000..dfe0760c
--- /dev/null
+++ b/translations/README.zh-cn.md
@@ -0,0 +1,119 @@
+[](https://github.com/microsoft/ML-For-Beginners/blob/master/LICENSE)
+[](https://GitHub.com/microsoft/ML-For-Beginners/graphs/contributors/)
+[](https://GitHub.com/microsoft/ML-For-Beginners/issues/)
+[](https://GitHub.com/microsoft/ML-For-Beginners/pulls/)
+[](http://makeapullrequest.com)
+
+[](https://GitHub.com/microsoft/ML-For-Beginners/watchers/)
+[](https://GitHub.com/microsoft/ML-For-Beginners/network/)
+[](https://GitHub.com/microsoft/ML-For-Beginners/stargazers/)
+
+# 针对初学者的机器学习课程
+
+> 🌍 环游世界,并通过世界文化来探索机器学习 🌍
+
+微软 Azure Cloud 的倡导者们很高兴可以提供这套十二周、二十四节课的关于**机器学习**的课程。在这套课程中,你将学习关于**经典机器学习**的内容,主要将使用 Scikit-learn 这一库。关于深度学习的内容将会尽量避免 —— 它会被我们即将推出的 "AI for Beginners (针对初学者的 AI 教程)" 所涵盖。你也可以把这些课和我们即将推出的 "Data Science for Beginners (针对初学者的数据科学教程)" 相结合!
+
+通过把这些经典的技术应用在来自世界各地的数据,我们将 “环游世界”。每一节课都包括了课前和课后测验、课程内容的文字讲义说明、示例代码、作业等。通过这种基于项目的教学方法,你将在构建中学习,这样可以把技能学得更牢靠。
+
+**✍️ 衷心感谢作者们** Jen Looper, Stephen Howell, Francesca Lazzeri, Tomomi Imura, Cassie Breviu, Dmitry Soshnikov, Chris Noring, Ornella Altunyan 以及 Amy Boyd
+
+**🎨 同时也要感谢我们的插画师** Tomomi Imura, Dasani Madipalli 以及 Jen Looper
+
+**🙏 特别感谢 🙏 我们的微软学生大使作者们、内容贡献者和内容复核者们**,Rishit Dagli, Muhammad Sakib Khan Inan, Rohan Raj, Alexandru Petrescu, Abhishek Jaiswal, Nawrin Tabassum, Ioan Samuila 和 Snigdha Agarwal 等
+
+---
+# 准备开始
+
+**对于学生们**,为了更好的使用这套课程,把整个仓库 fork 到你自己的 Github 账户中,并自行(或和一个小组一起)完成以下练习:
+
+- 从课前测验开始
+- 阅读课程内容,完成所有的活动,在每次知识检查 (knowledge check) 时暂停并思考
+- 我们建议你基于理解来创建项目(而不是仅仅跑一遍示例代码)。示例代码的位置在每一个项目的 `/solution` 文件夹中。
+- 进行课后测验
+- 完成课程挑战
+- 完成作业
+- 完成一组课程后,访问[讨论版](https://github.com/microsoft/ML-For-Beginners/discussions),通过填写相应的 PAT 量规来深化自己的学习成果。PAT 即 Progress Assessment Tool(进度评估工具),是一份帮助你深化学习的量规。你也可以回应其它的 PAT,这样我们可以一起学习。
+
+> 如果希望进一步学习,我们推荐跟随 [Microsoft Learn](https://docs.microsoft.com/en-us/users/jenlooper-2911/collections/k7o7tg1gp306q4?WT.mc_id=academic-15963-cxa) 的模块和学习路径。
+
+**对于老师们**,我们对于如何使用这套教程[提供了一些建议](../for-teachers.md)。
+
+---
+
+## 项目团队
+
+[](https://youtu.be/Tj1XWrDSYJU "宣传视频")
+
+> 🎥 点击上方的图片,来观看一个关于这个项目和它的创造者们的视频!
+
+---
+## 教学方式
+
+此课程基于两个教学原则:学生应该上手进行**项目实践**,并完成**频繁的测验**。此外,为了使整个课程更具有整体性,这些课程有一个共同的**主题**。
+
+通过确保课程内容与项目强相关,我们让学习过程对学生更具吸引力,概念的学习也被深化了。难度较低的课前测验可以吸引学生学习课程,而课后的第二次测验也进一步重复了课堂中的概念。该课程被设计得灵活有趣,可以一次性全部学习,或者分开来一部分一部分学习。这些项目由浅入深,从第一周的小项目开始,在第十二周结束时变得较为复杂。本课程还包括一个关于机器学习实际应用的后记,可用作额外学分或进一步讨论的基础。
+
+> 在这里,你可以找到我们的[行为守则](../CODE_OF_CONDUCT.md),[对项目作出贡献](../CONTRIBUTING.md)以及[翻译](../TRANSLATIONS.md)指南。我们欢迎各位提出有建设性的反馈!
+
+## 每一节课都包含:
+
+- 可选的手绘笔记 (sketchnote)
+- 可选的补充视频
+- 课前热身测验
+- 文字课程
+- 对于基于项目的课程,包含构建项目的分步指南
+- 知识检查 (knowledge checks)
+- 一个挑战
+- 补充阅读
+- 作业
+- 课后测验
+
+> **关于测验**:所有的测验都在[这个应用里](https://jolly-sea-0a877260f.azurestaticapps.net),总共 50 个测验,每个测验三个问题。它们的链接在每节课中,而且这个测验应用可以在本地运行。请参考 `quiz-app` 文件夹中的指南。
+
+
+| 课程编号 | 主题 | 课程组 | 学习目标 | 课程链接 | 作者 |
+| :-----------: | :--------------------------------------------------------: | :-------------------------------------------------: | ------------------------------------------------------------------------------------------------------------------------------- | :---------------------------------------------------: | :------------: |
+| 01 | 机器学习简介 | [简介](../1-Introduction/README.md) | 了解机器学习背后的基本概念 | [课程](../1-Introduction/1-intro-to-ML/README.md) | Muhammad |
+| 02 | 机器学习的历史 | [简介](../1-Introduction/README.md) | 了解该领域的历史 | [课程](../1-Introduction/2-history-of-ML/README.md) | Jen 和 Amy |
+| 03 | 机器学习与公平 | [简介](../1-Introduction/README.md) | 在构建和应用机器学习模型时,我们应该考虑哪些有关公平的重要哲学问题? | [课程](../1-Introduction/3-fairness/README.md) | Tomomi |
+| 04 | 机器学习的技术工具 | [简介](../1-Introduction/README.md) | 机器学习研究者使用哪些技术来构建机器学习模型? | [课程](../1-Introduction/4-techniques-of-ML/README.md) | Chris 和 Jen |
+| 05 | 回归简介 | [回归](../2-Regression/README.md) | 开始使用 Python 和 Scikit-learn 构建回归模型 | [课程](../2-Regression/1-Tools/README.md) | Jen |
+| 06 | 北美南瓜价格 🎃 | [回归](../2-Regression/README.md) | 可视化、进行数据清理,为机器学习做准备 | [课程](../2-Regression/2-Data/README.md) | Jen |
+| 07 | 北美南瓜价格 🎃 | [回归](../2-Regression/README.md) | 建立线性和多项式回归模型 | [课程](../2-Regression/3-Linear/README.md) | Jen |
+| 08 | 北美南瓜价格 🎃 | [回归](../2-Regression/README.md) | 构建逻辑回归模型 | [课程](../2-Regression/4-Logistic/README.md) | Jen |
+| 09 | 一个网页应用 🔌 | [网页应用](../3-Web-App/README.md) | 构建一个 Web 应用程序以使用经过训练的模型 | [课程](../3-Web-App/1-Web-App/README.md) | Jen |
+| 10 | 分类简介 | [分类](../4-Classification/README.md) | 清理、准备和可视化数据; 分类简介 | [课程](../4-Classification/1-Introduction/README.md) | Jen 和 Cassie |
+| 11 | 美味的亚洲和印度美食 🍜 | [分类](../4-Classification/README.md) | 分类器简介 | [课程](../4-Classification/2-Classifiers-1/README.md) | Jen 和 Cassie |
+| 12 | 美味的亚洲和印度美食 🍜 | [分类](../4-Classification/README.md) | 关于分类器的更多内容 | [课程](../4-Classification/3-Classifiers-2/README.md) | Jen 和 Cassie |
+| 13 | 美味的亚洲和印度美食 🍜 | [分类](../4-Classification/README.md) | 使用您的模型构建一个可以「推荐」的 Web 应用 | [课程](../4-Classification/4-Applied/README.md) | Jen |
+| 14 | 聚类简介 | [聚类](../5-Clustering/README.md) | 清理、准备和可视化数据; 聚类简介 | [课程](../5-Clustering/1-Visualize/README.md) | Jen |
+| 15 | 探索尼日利亚人的音乐品味 🎧 | [聚类](../5-Clustering/README.md) | 探索 K-Means 聚类方法 | [课程](../5-Clustering/2-K-Means/README.md) | Jen |
+| 16 | 自然语言处理 (NLP) 简介 ☕️ | [自然语言处理](../6-NLP/README.md) | 通过构建一个简单的 bot (机器人) 来了解 NLP 的基础知识 | [课程](../6-NLP/1-Introduction-to-NLP/README.md) | Stephen |
+| 17 | 常见的 NLP 任务 ☕️ | [自然语言处理](../6-NLP/README.md) | 通过理解处理语言结构时所需的常见任务来加深对于自然语言处理 (NLP) 的理解 | [课程](../6-NLP/2-Tasks/README.md) | Stephen |
+| 18 | 翻译和情感分析 ♥️ | [自然语言处理](../6-NLP/README.md) | 对简·奥斯汀的文本进行翻译和情感分析 | [课程](../6-NLP/3-Translation-Sentiment/README.md) | Stephen |
+| 19 | 欧洲的浪漫酒店 ♥️ | [自然语言处理](../6-NLP/README.md) | 对于酒店评价进行情感分析(上) | [课程](../6-NLP/4-Hotel-Reviews-1/README.md) | Stephen |
+| 20 | 欧洲的浪漫酒店 ♥️ | [自然语言处理](../6-NLP/README.md) | 对于酒店评价进行情感分析(下) | [课程](../6-NLP/5-Hotel-Reviews-2/README.md) | Stephen |
+| 21 | 时间序列预测简介 | [时间序列](../7-TimeSeries/README.md) | 时间序列预测简介 | [课程](../7-TimeSeries/1-Introduction/README.md) | Francesca |
+| 22 | ⚡️ 世界用电量 ⚡️ - 使用 ARIMA 进行时间序列预测 | [时间序列](../7-TimeSeries/README.md) | 使用 ARIMA 进行时间序列预测 | [课程](../7-TimeSeries/2-ARIMA/README.md) | Francesca |
+| 23 | 强化学习简介 | [强化学习](../8-Reinforcement/README.md) | Q-Learning 强化学习简介 | [课程](../8-Reinforcement/1-QLearning/README.md) | Dmitry |
+| 24 | 帮助 Peter 避开狼!🐺 | [强化学习](../8-Reinforcement/README.md) | 强化学习练习 | [课程](../8-Reinforcement/2-Gym/README.md) | Dmitry |
+| 后记 | 现实世界中的机器学习场景和应用 | [自然场景下的机器学习](../9-Real-World/README.md) | 探索有趣的经典机器学习方法,了解现实世界中机器学习的应用 | [课程](../9-Real-World/1-Applications/README.md) | 团队 |
+## 离线访问
+
+您可以使用 [Docsify](https://docsify.js.org/#/) 离线运行此文档。 Fork 这个仓库,并在你的本地机器上[安装 Docsify](https://docsify.js.org/#/quickstart),并在这个仓库的根文件夹中运行 `docsify serve`。你可以通过 localhost 的 3000 端口访问此文档:`localhost:3000`。
+## PDF 文档
+
+点击[这里](../pdf/readme.pdf)获取带有链接的课程 PDF 文档。
+
+## 需要你的帮助!
+
+想贡献一份翻译吗?请阅读我们的[翻译指南](../TRANSLATIONS.md)并在[此处](https://github.com/microsoft/ML-For-Beginners/issues/71)添加你的意见。
+
+## 其他课程
+
+我们的团队还制作了其他课程!可以看一下:
+
+- [针对初学者的 Web 开发课程](https://aka.ms/webdev-beginners)
+- [针对初学者的物联网课程](https://aka.ms/iot-beginners)
+