diff --git a/translated_images/zh-CN/peter.779730f9ba3a8a8d.webp b/translated_images/zh-CN/peter.779730f9ba3a8a8d.webp
new file mode 100644
index 000000000..a6a517271
Binary files /dev/null and b/translated_images/zh-CN/peter.779730f9ba3a8a8d.webp differ
diff --git a/translated_images/zh-CN/pie-pumpkins-scatter.d14f9804a53f927e.webp b/translated_images/zh-CN/pie-pumpkins-scatter.d14f9804a53f927e.webp
new file mode 100644
index 000000000..c606bc6ee
Binary files /dev/null and b/translated_images/zh-CN/pie-pumpkins-scatter.d14f9804a53f927e.webp differ
diff --git a/translated_images/zh-CN/pinch.1b035ec9ba7e0d40.webp b/translated_images/zh-CN/pinch.1b035ec9ba7e0d40.webp
new file mode 100644
index 000000000..4008abf91
Binary files /dev/null and b/translated_images/zh-CN/pinch.1b035ec9ba7e0d40.webp differ
diff --git a/translated_images/zh-CN/poly-results.ee587348f0f1f60b.webp b/translated_images/zh-CN/poly-results.ee587348f0f1f60b.webp
new file mode 100644
index 000000000..6743c245c
Binary files /dev/null and b/translated_images/zh-CN/poly-results.ee587348f0f1f60b.webp differ
diff --git a/translated_images/zh-CN/polynomial.8fce4663e7283dfb.webp b/translated_images/zh-CN/polynomial.8fce4663e7283dfb.webp
new file mode 100644
index 000000000..27b68bead
Binary files /dev/null and b/translated_images/zh-CN/polynomial.8fce4663e7283dfb.webp differ
diff --git a/translated_images/zh-CN/popular.9c48d84b3386705f.webp b/translated_images/zh-CN/popular.9c48d84b3386705f.webp
new file mode 100644
index 000000000..11d668712
Binary files /dev/null and b/translated_images/zh-CN/popular.9c48d84b3386705f.webp differ
diff --git a/translated_images/zh-CN/price-by-variety.744a2f9925d9bcb4.webp b/translated_images/zh-CN/price-by-variety.744a2f9925d9bcb4.webp
new file mode 100644
index 000000000..d001a164a
Binary files /dev/null and b/translated_images/zh-CN/price-by-variety.744a2f9925d9bcb4.webp differ
diff --git a/translated_images/zh-CN/problems.f7fb539ccd80608e.webp b/translated_images/zh-CN/problems.f7fb539ccd80608e.webp
new file mode 100644
index 000000000..29cb57923
Binary files /dev/null and b/translated_images/zh-CN/problems.f7fb539ccd80608e.webp differ
diff --git a/translated_images/zh-CN/pumpkin-classifier.562771f104ad5436.webp b/translated_images/zh-CN/pumpkin-classifier.562771f104ad5436.webp
new file mode 100644
index 000000000..57d269d4d
Binary files /dev/null and b/translated_images/zh-CN/pumpkin-classifier.562771f104ad5436.webp differ
diff --git a/translated_images/zh-CN/pumpkins_catplot_1.c55c409b71fea2ec.webp b/translated_images/zh-CN/pumpkins_catplot_1.c55c409b71fea2ec.webp
new file mode 100644
index 000000000..58594e5da
Binary files /dev/null and b/translated_images/zh-CN/pumpkins_catplot_1.c55c409b71fea2ec.webp differ
diff --git a/translated_images/zh-CN/pumpkins_catplot_2.87a354447880b388.webp b/translated_images/zh-CN/pumpkins_catplot_2.87a354447880b388.webp
new file mode 100644
index 000000000..6166b0de4
Binary files /dev/null and b/translated_images/zh-CN/pumpkins_catplot_2.87a354447880b388.webp differ
diff --git a/translated_images/zh-CN/r_learners_sm.cd14eb3581a9f28d.webp b/translated_images/zh-CN/r_learners_sm.cd14eb3581a9f28d.webp
new file mode 100644
index 000000000..074baeefb
Binary files /dev/null and b/translated_images/zh-CN/r_learners_sm.cd14eb3581a9f28d.webp differ
diff --git a/translated_images/zh-CN/r_learners_sm.e25fa9c205b3a3f9.webp b/translated_images/zh-CN/r_learners_sm.e25fa9c205b3a3f9.webp
new file mode 100644
index 000000000..1ae28b35e
Binary files /dev/null and b/translated_images/zh-CN/r_learners_sm.e25fa9c205b3a3f9.webp differ
diff --git a/translated_images/zh-CN/r_learners_sm.e4a71b113ffbedfe.webp b/translated_images/zh-CN/r_learners_sm.e4a71b113ffbedfe.webp
new file mode 100644
index 000000000..11d0b46df
Binary files /dev/null and b/translated_images/zh-CN/r_learners_sm.e4a71b113ffbedfe.webp differ
diff --git a/translated_images/zh-CN/r_learners_sm.f9199f76f1e2e493.webp b/translated_images/zh-CN/r_learners_sm.f9199f76f1e2e493.webp
new file mode 100644
index 000000000..e2741202b
Binary files /dev/null and b/translated_images/zh-CN/r_learners_sm.f9199f76f1e2e493.webp differ
diff --git a/translated_images/zh-CN/recipes.186acfa8ed2e8f00.webp b/translated_images/zh-CN/recipes.186acfa8ed2e8f00.webp
new file mode 100644
index 000000000..7dd735830
Binary files /dev/null and b/translated_images/zh-CN/recipes.186acfa8ed2e8f00.webp differ
diff --git a/translated_images/zh-CN/recipes.9ad10d8a4056bf89.webp b/translated_images/zh-CN/recipes.9ad10d8a4056bf89.webp
new file mode 100644
index 000000000..2d2bb9141
Binary files /dev/null and b/translated_images/zh-CN/recipes.9ad10d8a4056bf89.webp differ
diff --git a/translated_images/zh-CN/scaled.91897dfbaa26ca4a.webp b/translated_images/zh-CN/scaled.91897dfbaa26ca4a.webp
new file mode 100644
index 000000000..04f2bec62
Binary files /dev/null and b/translated_images/zh-CN/scaled.91897dfbaa26ca4a.webp differ
diff --git a/translated_images/zh-CN/scaled.e35258ca5cd3d43f.webp b/translated_images/zh-CN/scaled.e35258ca5cd3d43f.webp
new file mode 100644
index 000000000..ac538319f
Binary files /dev/null and b/translated_images/zh-CN/scaled.e35258ca5cd3d43f.webp differ
diff --git a/translated_images/zh-CN/scatter-dayofyear-color.65790faefbb9d54f.webp b/translated_images/zh-CN/scatter-dayofyear-color.65790faefbb9d54f.webp
new file mode 100644
index 000000000..cd40af301
Binary files /dev/null and b/translated_images/zh-CN/scatter-dayofyear-color.65790faefbb9d54f.webp differ
diff --git a/translated_images/zh-CN/scatter-dayofyear.bc171c189c9fd553.webp b/translated_images/zh-CN/scatter-dayofyear.bc171c189c9fd553.webp
new file mode 100644
index 000000000..aec045037
Binary files /dev/null and b/translated_images/zh-CN/scatter-dayofyear.bc171c189c9fd553.webp differ
diff --git a/translated_images/zh-CN/scatterplot.ad8b356bcbb33be6.webp b/translated_images/zh-CN/scatterplot.ad8b356bcbb33be6.webp
new file mode 100644
index 000000000..9703a7283
Binary files /dev/null and b/translated_images/zh-CN/scatterplot.ad8b356bcbb33be6.webp differ
diff --git a/translated_images/zh-CN/scatterplot.b6868f44cbd2051c.webp b/translated_images/zh-CN/scatterplot.b6868f44cbd2051c.webp
new file mode 100644
index 000000000..bf0ce25bc
Binary files /dev/null and b/translated_images/zh-CN/scatterplot.b6868f44cbd2051c.webp differ
diff --git a/translated_images/zh-CN/shakey.4dc17819c447c05b.webp b/translated_images/zh-CN/shakey.4dc17819c447c05b.webp
new file mode 100644
index 000000000..2421233b5
Binary files /dev/null and b/translated_images/zh-CN/shakey.4dc17819c447c05b.webp differ
diff --git a/translated_images/zh-CN/sigmoid.8b7ba9d095c789cf.webp b/translated_images/zh-CN/sigmoid.8b7ba9d095c789cf.webp
new file mode 100644
index 000000000..f513a38cc
Binary files /dev/null and b/translated_images/zh-CN/sigmoid.8b7ba9d095c789cf.webp differ
diff --git a/translated_images/zh-CN/slope.f3c9d5910ddbfcf9.webp b/translated_images/zh-CN/slope.f3c9d5910ddbfcf9.webp
new file mode 100644
index 000000000..5320ffab0
Binary files /dev/null and b/translated_images/zh-CN/slope.f3c9d5910ddbfcf9.webp differ
diff --git a/translated_images/zh-CN/solvers.5fc648618529e627.webp b/translated_images/zh-CN/solvers.5fc648618529e627.webp
new file mode 100644
index 000000000..fdeb344f0
Binary files /dev/null and b/translated_images/zh-CN/solvers.5fc648618529e627.webp differ
diff --git a/translated_images/zh-CN/svm.621ae7b516d678e0.webp b/translated_images/zh-CN/svm.621ae7b516d678e0.webp
new file mode 100644
index 000000000..b154a7464
Binary files /dev/null and b/translated_images/zh-CN/svm.621ae7b516d678e0.webp differ
diff --git a/translated_images/zh-CN/swarm.56d253ae80a2c0f5.webp b/translated_images/zh-CN/swarm.56d253ae80a2c0f5.webp
new file mode 100644
index 000000000..c475c950a
Binary files /dev/null and b/translated_images/zh-CN/swarm.56d253ae80a2c0f5.webp differ
diff --git a/translated_images/zh-CN/swarm_2.efeacfca536c2b57.webp b/translated_images/zh-CN/swarm_2.efeacfca536c2b57.webp
new file mode 100644
index 000000000..87fcb275b
Binary files /dev/null and b/translated_images/zh-CN/swarm_2.efeacfca536c2b57.webp differ
diff --git a/translated_images/zh-CN/test-data-predict.8afc47ee7e52874f.webp b/translated_images/zh-CN/test-data-predict.8afc47ee7e52874f.webp
new file mode 100644
index 000000000..16f3e5e89
Binary files /dev/null and b/translated_images/zh-CN/test-data-predict.8afc47ee7e52874f.webp differ
diff --git a/translated_images/zh-CN/thai-food.c47a7a7f9f05c218.webp b/translated_images/zh-CN/thai-food.c47a7a7f9f05c218.webp
new file mode 100644
index 000000000..1eda3cea3
Binary files /dev/null and b/translated_images/zh-CN/thai-food.c47a7a7f9f05c218.webp differ
diff --git a/translated_images/zh-CN/thai.0269dbab2e78bd38.webp b/translated_images/zh-CN/thai.0269dbab2e78bd38.webp
new file mode 100644
index 000000000..6d43e1013
Binary files /dev/null and b/translated_images/zh-CN/thai.0269dbab2e78bd38.webp differ
diff --git a/translated_images/zh-CN/tokenization.1641a160c66cd2d9.webp b/translated_images/zh-CN/tokenization.1641a160c66cd2d9.webp
new file mode 100644
index 000000000..c3ebfb414
Binary files /dev/null and b/translated_images/zh-CN/tokenization.1641a160c66cd2d9.webp differ
diff --git a/translated_images/zh-CN/train-data-predict.3c4ef4e78553104f.webp b/translated_images/zh-CN/train-data-predict.3c4ef4e78553104f.webp
new file mode 100644
index 000000000..721bc0756
Binary files /dev/null and b/translated_images/zh-CN/train-data-predict.3c4ef4e78553104f.webp differ
diff --git a/translated_images/zh-CN/train-test.8928d14e5b91fc94.webp b/translated_images/zh-CN/train-test.8928d14e5b91fc94.webp
new file mode 100644
index 000000000..2f0315772
Binary files /dev/null and b/translated_images/zh-CN/train-test.8928d14e5b91fc94.webp differ
diff --git a/translated_images/zh-CN/train-test.ead0cecbfc341921.webp b/translated_images/zh-CN/train-test.ead0cecbfc341921.webp
new file mode 100644
index 000000000..2f0315772
Binary files /dev/null and b/translated_images/zh-CN/train-test.ead0cecbfc341921.webp differ
diff --git a/translated_images/zh-CN/train_progress_raw.2adfdf2daea09c59.webp b/translated_images/zh-CN/train_progress_raw.2adfdf2daea09c59.webp
new file mode 100644
index 000000000..9cb93e6e5
Binary files /dev/null and b/translated_images/zh-CN/train_progress_raw.2adfdf2daea09c59.webp differ
diff --git a/translated_images/zh-CN/train_progress_runav.c71694a8fa9ab359.webp b/translated_images/zh-CN/train_progress_runav.c71694a8fa9ab359.webp
new file mode 100644
index 000000000..46fb46ea5
Binary files /dev/null and b/translated_images/zh-CN/train_progress_runav.c71694a8fa9ab359.webp differ
diff --git a/translated_images/zh-CN/turntable.f2b86b13c53302dc.webp b/translated_images/zh-CN/turntable.f2b86b13c53302dc.webp
new file mode 100644
index 000000000..27c674f5b
Binary files /dev/null and b/translated_images/zh-CN/turntable.f2b86b13c53302dc.webp differ
diff --git a/translated_images/zh-CN/ufo.9e787f5161da9d4d.webp b/translated_images/zh-CN/ufo.9e787f5161da9d4d.webp
new file mode 100644
index 000000000..39f4c519f
Binary files /dev/null and b/translated_images/zh-CN/ufo.9e787f5161da9d4d.webp differ
diff --git a/translated_images/zh-CN/unruly_data.0eedc7ced92d2d91.webp b/translated_images/zh-CN/unruly_data.0eedc7ced92d2d91.webp
new file mode 100644
index 000000000..877c8cb92
Binary files /dev/null and b/translated_images/zh-CN/unruly_data.0eedc7ced92d2d91.webp differ
diff --git a/translated_images/zh-CN/violin.ffceb68923177011.webp b/translated_images/zh-CN/violin.ffceb68923177011.webp
new file mode 100644
index 000000000..54f301792
Binary files /dev/null and b/translated_images/zh-CN/violin.ffceb68923177011.webp differ
diff --git a/translated_images/zh-CN/voronoi.1dc1613fb0439b95.webp b/translated_images/zh-CN/voronoi.1dc1613fb0439b95.webp
new file mode 100644
index 000000000..4a7d11723
Binary files /dev/null and b/translated_images/zh-CN/voronoi.1dc1613fb0439b95.webp differ
diff --git a/translated_images/zh-CN/web-app.4c76450cabe20036.webp b/translated_images/zh-CN/web-app.4c76450cabe20036.webp
new file mode 100644
index 000000000..6878dad85
Binary files /dev/null and b/translated_images/zh-CN/web-app.4c76450cabe20036.webp differ
diff --git a/translated_images/zh-CN/wolf.a56d3d4070ca0c79.webp b/translated_images/zh-CN/wolf.a56d3d4070ca0c79.webp
new file mode 100644
index 000000000..a9f12f4bd
Binary files /dev/null and b/translated_images/zh-CN/wolf.a56d3d4070ca0c79.webp differ
diff --git a/translations/fa/.co-op-translator.json b/translations/fa/.co-op-translator.json
new file mode 100644
index 000000000..fd435c73e
--- /dev/null
+++ b/translations/fa/.co-op-translator.json
@@ -0,0 +1,596 @@
+{
+ "1-Introduction/1-intro-to-ML/README.md": {
+ "original_hash": "69389392fa6346e0dfa30f664b7b6fec",
+ "translation_date": "2025-09-04T22:40:32+00:00",
+ "source_file": "1-Introduction/1-intro-to-ML/README.md",
+ "language_code": "fa"
+ },
+ "1-Introduction/1-intro-to-ML/assignment.md": {
+ "original_hash": "4c4698044bb8af52cfb6388a4ee0e53b",
+ "translation_date": "2025-09-03T23:38:44+00:00",
+ "source_file": "1-Introduction/1-intro-to-ML/assignment.md",
+ "language_code": "fa"
+ },
+ "1-Introduction/2-history-of-ML/README.md": {
+ "original_hash": "6a05fec147e734c3e6bfa54505648e2b",
+ "translation_date": "2025-09-04T22:41:05+00:00",
+ "source_file": "1-Introduction/2-history-of-ML/README.md",
+ "language_code": "fa"
+ },
+ "1-Introduction/2-history-of-ML/assignment.md": {
+ "original_hash": "eb6e4d5afd1b21a57d2b9e6d0aac3969",
+ "translation_date": "2025-09-03T23:42:42+00:00",
+ "source_file": "1-Introduction/2-history-of-ML/assignment.md",
+ "language_code": "fa"
+ },
+ "1-Introduction/3-fairness/README.md": {
+ "original_hash": "9a6b702d1437c0467e3c5c28d763dac2",
+ "translation_date": "2025-09-04T22:39:13+00:00",
+ "source_file": "1-Introduction/3-fairness/README.md",
+ "language_code": "fa"
+ },
+ "1-Introduction/3-fairness/assignment.md": {
+ "original_hash": "dbda60e7b1fe5f18974e7858eff0004e",
+ "translation_date": "2025-09-03T23:31:31+00:00",
+ "source_file": "1-Introduction/3-fairness/assignment.md",
+ "language_code": "fa"
+ },
+ "1-Introduction/4-techniques-of-ML/README.md": {
+ "original_hash": "9d91f3af3758fdd4569fb410575995ef",
+ "translation_date": "2025-09-04T22:40:01+00:00",
+ "source_file": "1-Introduction/4-techniques-of-ML/README.md",
+ "language_code": "fa"
+ },
+ "1-Introduction/4-techniques-of-ML/assignment.md": {
+ "original_hash": "70d65aeddc06170bc1aed5b27805f930",
+ "translation_date": "2025-09-03T23:35:27+00:00",
+ "source_file": "1-Introduction/4-techniques-of-ML/assignment.md",
+ "language_code": "fa"
+ },
+ "1-Introduction/README.md": {
+ "original_hash": "cf8ecc83f28e5b98051d2179eca08e08",
+ "translation_date": "2025-09-03T23:26:18+00:00",
+ "source_file": "1-Introduction/README.md",
+ "language_code": "fa"
+ },
+ "2-Regression/1-Tools/README.md": {
+ "original_hash": "fa81d226c71d5af7a2cade31c1c92b88",
+ "translation_date": "2025-09-04T22:33:17+00:00",
+ "source_file": "2-Regression/1-Tools/README.md",
+ "language_code": "fa"
+ },
+ "2-Regression/1-Tools/assignment.md": {
+ "original_hash": "74a5cf83e4ebc302afbcbc4f418afd0a",
+ "translation_date": "2025-09-03T22:36:30+00:00",
+ "source_file": "2-Regression/1-Tools/assignment.md",
+ "language_code": "fa"
+ },
+ "2-Regression/1-Tools/solution/Julia/README.md": {
+ "original_hash": "a39c15d63f3b2795ee2284a82b986b93",
+ "translation_date": "2025-09-03T22:36:53+00:00",
+ "source_file": "2-Regression/1-Tools/solution/Julia/README.md",
+ "language_code": "fa"
+ },
+ "2-Regression/2-Data/README.md": {
+ "original_hash": "7c077988328ebfe33b24d07945f16eca",
+ "translation_date": "2025-09-04T22:33:56+00:00",
+ "source_file": "2-Regression/2-Data/README.md",
+ "language_code": "fa"
+ },
+ "2-Regression/2-Data/assignment.md": {
+ "original_hash": "4485a1ed4dd1b5647365e3d87456515d",
+ "translation_date": "2025-09-03T22:40:35+00:00",
+ "source_file": "2-Regression/2-Data/assignment.md",
+ "language_code": "fa"
+ },
+ "2-Regression/2-Data/solution/Julia/README.md": {
+ "original_hash": "a39c15d63f3b2795ee2284a82b986b93",
+ "translation_date": "2025-09-03T22:40:56+00:00",
+ "source_file": "2-Regression/2-Data/solution/Julia/README.md",
+ "language_code": "fa"
+ },
+ "2-Regression/3-Linear/README.md": {
+ "original_hash": "40e64f004f3cb50aa1d8661672d3cd92",
+ "translation_date": "2025-09-04T22:31:20+00:00",
+ "source_file": "2-Regression/3-Linear/README.md",
+ "language_code": "fa"
+ },
+ "2-Regression/3-Linear/assignment.md": {
+ "original_hash": "cc471fa89c293bc735dd3a9a0fb79b1b",
+ "translation_date": "2025-09-03T22:22:20+00:00",
+ "source_file": "2-Regression/3-Linear/assignment.md",
+ "language_code": "fa"
+ },
+ "2-Regression/3-Linear/solution/Julia/README.md": {
+ "original_hash": "a39c15d63f3b2795ee2284a82b986b93",
+ "translation_date": "2025-09-03T22:22:42+00:00",
+ "source_file": "2-Regression/3-Linear/solution/Julia/README.md",
+ "language_code": "fa"
+ },
+ "2-Regression/4-Logistic/README.md": {
+ "original_hash": "abf86d845c84330bce205a46b382ec88",
+ "translation_date": "2025-09-04T22:32:20+00:00",
+ "source_file": "2-Regression/4-Logistic/README.md",
+ "language_code": "fa"
+ },
+ "2-Regression/4-Logistic/assignment.md": {
+ "original_hash": "8af40209a41494068c1f42b14c0b450d",
+ "translation_date": "2025-09-03T22:31:27+00:00",
+ "source_file": "2-Regression/4-Logistic/assignment.md",
+ "language_code": "fa"
+ },
+ "2-Regression/4-Logistic/solution/Julia/README.md": {
+ "original_hash": "a39c15d63f3b2795ee2284a82b986b93",
+ "translation_date": "2025-09-03T22:31:48+00:00",
+ "source_file": "2-Regression/4-Logistic/solution/Julia/README.md",
+ "language_code": "fa"
+ },
+ "2-Regression/README.md": {
+ "original_hash": "508582278dbb8edd2a8a80ac96ef416c",
+ "translation_date": "2025-09-03T22:15:39+00:00",
+ "source_file": "2-Regression/README.md",
+ "language_code": "fa"
+ },
+ "3-Web-App/1-Web-App/README.md": {
+ "original_hash": "e0b75f73e4a90d45181dc5581fe2ef5c",
+ "translation_date": "2025-09-04T22:41:40+00:00",
+ "source_file": "3-Web-App/1-Web-App/README.md",
+ "language_code": "fa"
+ },
+ "3-Web-App/1-Web-App/assignment.md": {
+ "original_hash": "a8e8ae10be335cbc745b75ee552317ff",
+ "translation_date": "2025-09-03T23:47:27+00:00",
+ "source_file": "3-Web-App/1-Web-App/assignment.md",
+ "language_code": "fa"
+ },
+ "3-Web-App/README.md": {
+ "original_hash": "9836ff53cfef716ddfd70e06c5f43436",
+ "translation_date": "2025-09-03T23:43:29+00:00",
+ "source_file": "3-Web-App/README.md",
+ "language_code": "fa"
+ },
+ "4-Classification/1-Introduction/README.md": {
+ "original_hash": "aaf391d922bd6de5efba871d514c6d47",
+ "translation_date": "2025-09-04T22:43:49+00:00",
+ "source_file": "4-Classification/1-Introduction/README.md",
+ "language_code": "fa"
+ },
+ "4-Classification/1-Introduction/assignment.md": {
+ "original_hash": "b2a01912beb24cfb0007f83594dba801",
+ "translation_date": "2025-09-04T00:04:16+00:00",
+ "source_file": "4-Classification/1-Introduction/assignment.md",
+ "language_code": "fa"
+ },
+ "4-Classification/1-Introduction/solution/Julia/README.md": {
+ "original_hash": "a39c15d63f3b2795ee2284a82b986b93",
+ "translation_date": "2025-09-04T00:04:41+00:00",
+ "source_file": "4-Classification/1-Introduction/solution/Julia/README.md",
+ "language_code": "fa"
+ },
+ "4-Classification/2-Classifiers-1/README.md": {
+ "original_hash": "1a6e9e46b34a2e559fbbfc1f95397c7b",
+ "translation_date": "2025-09-04T22:42:21+00:00",
+ "source_file": "4-Classification/2-Classifiers-1/README.md",
+ "language_code": "fa"
+ },
+ "4-Classification/2-Classifiers-1/assignment.md": {
+ "original_hash": "de6025f96841498b0577e9d1aee18d1f",
+ "translation_date": "2025-09-03T23:54:16+00:00",
+ "source_file": "4-Classification/2-Classifiers-1/assignment.md",
+ "language_code": "fa"
+ },
+ "4-Classification/2-Classifiers-1/solution/Julia/README.md": {
+ "original_hash": "a39c15d63f3b2795ee2284a82b986b93",
+ "translation_date": "2025-09-03T23:54:38+00:00",
+ "source_file": "4-Classification/2-Classifiers-1/solution/Julia/README.md",
+ "language_code": "fa"
+ },
+ "4-Classification/3-Classifiers-2/README.md": {
+ "original_hash": "49047911108adc49d605cddfb455749c",
+ "translation_date": "2025-09-04T22:43:29+00:00",
+ "source_file": "4-Classification/3-Classifiers-2/README.md",
+ "language_code": "fa"
+ },
+ "4-Classification/3-Classifiers-2/assignment.md": {
+ "original_hash": "58dfdaf79fb73f7d34b22bdbacf57329",
+ "translation_date": "2025-09-04T00:00:21+00:00",
+ "source_file": "4-Classification/3-Classifiers-2/assignment.md",
+ "language_code": "fa"
+ },
+ "4-Classification/3-Classifiers-2/solution/Julia/README.md": {
+ "original_hash": "a39c15d63f3b2795ee2284a82b986b93",
+ "translation_date": "2025-09-04T00:00:56+00:00",
+ "source_file": "4-Classification/3-Classifiers-2/solution/Julia/README.md",
+ "language_code": "fa"
+ },
+ "4-Classification/4-Applied/README.md": {
+ "original_hash": "61bdec27ed2da8b098cd9065405d9bb0",
+ "translation_date": "2025-09-04T22:43:03+00:00",
+ "source_file": "4-Classification/4-Applied/README.md",
+ "language_code": "fa"
+ },
+ "4-Classification/4-Applied/assignment.md": {
+ "original_hash": "799ed651e2af0a7cad17c6268db11578",
+ "translation_date": "2025-09-03T23:57:38+00:00",
+ "source_file": "4-Classification/4-Applied/assignment.md",
+ "language_code": "fa"
+ },
+ "4-Classification/README.md": {
+ "original_hash": "74e809ffd1e613a1058bbc3e9600859e",
+ "translation_date": "2025-09-03T23:49:20+00:00",
+ "source_file": "4-Classification/README.md",
+ "language_code": "fa"
+ },
+ "5-Clustering/1-Visualize/README.md": {
+ "original_hash": "730225ea274c9174fe688b21d421539d",
+ "translation_date": "2025-09-04T22:36:14+00:00",
+ "source_file": "5-Clustering/1-Visualize/README.md",
+ "language_code": "fa"
+ },
+ "5-Clustering/1-Visualize/assignment.md": {
+ "original_hash": "589fa015a5e7d9e67bd629f7d47b53de",
+ "translation_date": "2025-09-03T23:09:26+00:00",
+ "source_file": "5-Clustering/1-Visualize/assignment.md",
+ "language_code": "fa"
+ },
+ "5-Clustering/1-Visualize/solution/Julia/README.md": {
+ "original_hash": "a39c15d63f3b2795ee2284a82b986b93",
+ "translation_date": "2025-09-03T23:09:51+00:00",
+ "source_file": "5-Clustering/1-Visualize/solution/Julia/README.md",
+ "language_code": "fa"
+ },
+ "5-Clustering/2-K-Means/README.md": {
+ "original_hash": "7cdd17338d9bbd7e2171c2cd462eb081",
+ "translation_date": "2025-09-04T22:37:17+00:00",
+ "source_file": "5-Clustering/2-K-Means/README.md",
+ "language_code": "fa"
+ },
+ "5-Clustering/2-K-Means/assignment.md": {
+ "original_hash": "b8e17eff34ad1680eba2a5d3cf9ffc41",
+ "translation_date": "2025-09-03T23:13:02+00:00",
+ "source_file": "5-Clustering/2-K-Means/assignment.md",
+ "language_code": "fa"
+ },
+ "5-Clustering/2-K-Means/solution/Julia/README.md": {
+ "original_hash": "a39c15d63f3b2795ee2284a82b986b93",
+ "translation_date": "2025-09-03T23:13:23+00:00",
+ "source_file": "5-Clustering/2-K-Means/solution/Julia/README.md",
+ "language_code": "fa"
+ },
+ "5-Clustering/README.md": {
+ "original_hash": "b28a3a4911584062772c537b653ebbc7",
+ "translation_date": "2025-09-03T22:56:19+00:00",
+ "source_file": "5-Clustering/README.md",
+ "language_code": "fa"
+ },
+ "6-NLP/1-Introduction-to-NLP/README.md": {
+ "original_hash": "1c2ec40cf55c98a028a359c27ef7e45a",
+ "translation_date": "2025-09-04T22:48:19+00:00",
+ "source_file": "6-NLP/1-Introduction-to-NLP/README.md",
+ "language_code": "fa"
+ },
+ "6-NLP/1-Introduction-to-NLP/assignment.md": {
+ "original_hash": "1d7583e8046dacbb0c056d5ba0a71b16",
+ "translation_date": "2025-09-04T00:49:22+00:00",
+ "source_file": "6-NLP/1-Introduction-to-NLP/assignment.md",
+ "language_code": "fa"
+ },
+ "6-NLP/2-Tasks/README.md": {
+ "original_hash": "5f3cb462e3122e1afe7ab0050ccf2bd3",
+ "translation_date": "2025-09-04T22:46:26+00:00",
+ "source_file": "6-NLP/2-Tasks/README.md",
+ "language_code": "fa"
+ },
+ "6-NLP/2-Tasks/assignment.md": {
+ "original_hash": "2efc4c2aba5ed06c780c05539c492ae3",
+ "translation_date": "2025-09-04T00:37:31+00:00",
+ "source_file": "6-NLP/2-Tasks/assignment.md",
+ "language_code": "fa"
+ },
+ "6-NLP/3-Translation-Sentiment/README.md": {
+ "original_hash": "be03c8182982b87ced155e4e9d1438e8",
+ "translation_date": "2025-09-04T22:48:48+00:00",
+ "source_file": "6-NLP/3-Translation-Sentiment/README.md",
+ "language_code": "fa"
+ },
+ "6-NLP/3-Translation-Sentiment/assignment.md": {
+ "original_hash": "9d2a734deb904caff310d1a999c6bd7a",
+ "translation_date": "2025-09-04T00:54:08+00:00",
+ "source_file": "6-NLP/3-Translation-Sentiment/assignment.md",
+ "language_code": "fa"
+ },
+ "6-NLP/3-Translation-Sentiment/solution/Julia/README.md": {
+ "original_hash": "a39c15d63f3b2795ee2284a82b986b93",
+ "translation_date": "2025-09-04T00:54:52+00:00",
+ "source_file": "6-NLP/3-Translation-Sentiment/solution/Julia/README.md",
+ "language_code": "fa"
+ },
+ "6-NLP/3-Translation-Sentiment/solution/R/README.md": {
+ "original_hash": "81db6ff2cf6e62fbe2340b094bb9509e",
+ "translation_date": "2025-09-04T00:54:33+00:00",
+ "source_file": "6-NLP/3-Translation-Sentiment/solution/R/README.md",
+ "language_code": "fa"
+ },
+ "6-NLP/4-Hotel-Reviews-1/README.md": {
+ "original_hash": "8d32dadeda93c6fb5c43619854882ab1",
+ "translation_date": "2025-09-04T22:46:59+00:00",
+ "source_file": "6-NLP/4-Hotel-Reviews-1/README.md",
+ "language_code": "fa"
+ },
+ "6-NLP/4-Hotel-Reviews-1/assignment.md": {
+ "original_hash": "bf39bceb833cd628f224941dca8041df",
+ "translation_date": "2025-09-04T00:44:37+00:00",
+ "source_file": "6-NLP/4-Hotel-Reviews-1/assignment.md",
+ "language_code": "fa"
+ },
+ "6-NLP/4-Hotel-Reviews-1/solution/Julia/README.md": {
+ "original_hash": "a39c15d63f3b2795ee2284a82b986b93",
+ "translation_date": "2025-09-04T00:45:17+00:00",
+ "source_file": "6-NLP/4-Hotel-Reviews-1/solution/Julia/README.md",
+ "language_code": "fa"
+ },
+ "6-NLP/4-Hotel-Reviews-1/solution/R/README.md": {
+ "original_hash": "81db6ff2cf6e62fbe2340b094bb9509e",
+ "translation_date": "2025-09-04T00:44:59+00:00",
+ "source_file": "6-NLP/4-Hotel-Reviews-1/solution/R/README.md",
+ "language_code": "fa"
+ },
+ "6-NLP/5-Hotel-Reviews-2/README.md": {
+ "original_hash": "2c742993fe95d5bcbb2846eda3d442a1",
+ "translation_date": "2025-09-04T22:49:36+00:00",
+ "source_file": "6-NLP/5-Hotel-Reviews-2/README.md",
+ "language_code": "fa"
+ },
+ "6-NLP/5-Hotel-Reviews-2/assignment.md": {
+ "original_hash": "daf144daa552da6a7d442aff6f3e77d8",
+ "translation_date": "2025-09-04T00:59:59+00:00",
+ "source_file": "6-NLP/5-Hotel-Reviews-2/assignment.md",
+ "language_code": "fa"
+ },
+ "6-NLP/5-Hotel-Reviews-2/solution/Julia/README.md": {
+ "original_hash": "a39c15d63f3b2795ee2284a82b986b93",
+ "translation_date": "2025-09-04T01:00:41+00:00",
+ "source_file": "6-NLP/5-Hotel-Reviews-2/solution/Julia/README.md",
+ "language_code": "fa"
+ },
+ "6-NLP/5-Hotel-Reviews-2/solution/R/README.md": {
+ "original_hash": "81db6ff2cf6e62fbe2340b094bb9509e",
+ "translation_date": "2025-09-04T01:00:23+00:00",
+ "source_file": "6-NLP/5-Hotel-Reviews-2/solution/R/README.md",
+ "language_code": "fa"
+ },
+ "6-NLP/README.md": {
+ "original_hash": "1eb379dc2d0c9940b320732d16083778",
+ "translation_date": "2025-09-04T00:33:30+00:00",
+ "source_file": "6-NLP/README.md",
+ "language_code": "fa"
+ },
+ "6-NLP/data/README.md": {
+ "original_hash": "ee0670655c89e4719319764afb113624",
+ "translation_date": "2025-09-04T00:45:39+00:00",
+ "source_file": "6-NLP/data/README.md",
+ "language_code": "fa"
+ },
+ "7-TimeSeries/1-Introduction/README.md": {
+ "original_hash": "662b509c39eee205687726636d0a8455",
+ "translation_date": "2025-09-04T22:35:08+00:00",
+ "source_file": "7-TimeSeries/1-Introduction/README.md",
+ "language_code": "fa"
+ },
+ "7-TimeSeries/1-Introduction/assignment.md": {
+ "original_hash": "d1781b0b92568ea1d119d0a198b576b4",
+ "translation_date": "2025-09-03T22:51:03+00:00",
+ "source_file": "7-TimeSeries/1-Introduction/assignment.md",
+ "language_code": "fa"
+ },
+ "7-TimeSeries/1-Introduction/solution/Julia/README.md": {
+ "original_hash": "a39c15d63f3b2795ee2284a82b986b93",
+ "translation_date": "2025-09-03T22:51:45+00:00",
+ "source_file": "7-TimeSeries/1-Introduction/solution/Julia/README.md",
+ "language_code": "fa"
+ },
+ "7-TimeSeries/1-Introduction/solution/R/README.md": {
+ "original_hash": "81db6ff2cf6e62fbe2340b094bb9509e",
+ "translation_date": "2025-09-03T22:51:27+00:00",
+ "source_file": "7-TimeSeries/1-Introduction/solution/R/README.md",
+ "language_code": "fa"
+ },
+ "7-TimeSeries/2-ARIMA/README.md": {
+ "original_hash": "917dbf890db71a322f306050cb284749",
+ "translation_date": "2025-09-04T22:34:29+00:00",
+ "source_file": "7-TimeSeries/2-ARIMA/README.md",
+ "language_code": "fa"
+ },
+ "7-TimeSeries/2-ARIMA/assignment.md": {
+ "original_hash": "1c814013e10866dfd92cdb32caaae3ac",
+ "translation_date": "2025-09-03T22:46:36+00:00",
+ "source_file": "7-TimeSeries/2-ARIMA/assignment.md",
+ "language_code": "fa"
+ },
+ "7-TimeSeries/2-ARIMA/solution/Julia/README.md": {
+ "original_hash": "a39c15d63f3b2795ee2284a82b986b93",
+ "translation_date": "2025-09-03T22:47:16+00:00",
+ "source_file": "7-TimeSeries/2-ARIMA/solution/Julia/README.md",
+ "language_code": "fa"
+ },
+ "7-TimeSeries/2-ARIMA/solution/R/README.md": {
+ "original_hash": "81db6ff2cf6e62fbe2340b094bb9509e",
+ "translation_date": "2025-09-03T22:46:58+00:00",
+ "source_file": "7-TimeSeries/2-ARIMA/solution/R/README.md",
+ "language_code": "fa"
+ },
+ "7-TimeSeries/3-SVR/README.md": {
+ "original_hash": "482bccabe1df958496ea71a3667995cd",
+ "translation_date": "2025-09-04T22:35:42+00:00",
+ "source_file": "7-TimeSeries/3-SVR/README.md",
+ "language_code": "fa"
+ },
+ "7-TimeSeries/3-SVR/assignment.md": {
+ "original_hash": "94aa2fc6154252ae30a3f3740299707a",
+ "translation_date": "2025-09-03T22:55:17+00:00",
+ "source_file": "7-TimeSeries/3-SVR/assignment.md",
+ "language_code": "fa"
+ },
+ "7-TimeSeries/README.md": {
+ "original_hash": "61342603bad8acadbc6b2e4e3aab3f66",
+ "translation_date": "2025-09-03T22:41:38+00:00",
+ "source_file": "7-TimeSeries/README.md",
+ "language_code": "fa"
+ },
+ "8-Reinforcement/1-QLearning/README.md": {
+ "original_hash": "911efd5e595089000cb3c16fce1beab8",
+ "translation_date": "2025-09-04T22:45:00+00:00",
+ "source_file": "8-Reinforcement/1-QLearning/README.md",
+ "language_code": "fa"
+ },
+ "8-Reinforcement/1-QLearning/assignment.md": {
+ "original_hash": "68394b2102d3503882e5e914bd0ff5c1",
+ "translation_date": "2025-09-04T00:24:37+00:00",
+ "source_file": "8-Reinforcement/1-QLearning/assignment.md",
+ "language_code": "fa"
+ },
+ "8-Reinforcement/1-QLearning/solution/Julia/README.md": {
+ "original_hash": "a39c15d63f3b2795ee2284a82b986b93",
+ "translation_date": "2025-09-04T00:25:39+00:00",
+ "source_file": "8-Reinforcement/1-QLearning/solution/Julia/README.md",
+ "language_code": "fa"
+ },
+ "8-Reinforcement/1-QLearning/solution/R/README.md": {
+ "original_hash": "81db6ff2cf6e62fbe2340b094bb9509e",
+ "translation_date": "2025-09-04T00:25:22+00:00",
+ "source_file": "8-Reinforcement/1-QLearning/solution/R/README.md",
+ "language_code": "fa"
+ },
+ "8-Reinforcement/2-Gym/README.md": {
+ "original_hash": "107d5bb29da8a562e7ae72262d251a75",
+ "translation_date": "2025-09-04T22:45:37+00:00",
+ "source_file": "8-Reinforcement/2-Gym/README.md",
+ "language_code": "fa"
+ },
+ "8-Reinforcement/2-Gym/assignment.md": {
+ "original_hash": "1f2b7441745eb52e25745423b247016b",
+ "translation_date": "2025-09-04T00:31:48+00:00",
+ "source_file": "8-Reinforcement/2-Gym/assignment.md",
+ "language_code": "fa"
+ },
+ "8-Reinforcement/2-Gym/solution/Julia/README.md": {
+ "original_hash": "a39c15d63f3b2795ee2284a82b986b93",
+ "translation_date": "2025-09-04T00:32:44+00:00",
+ "source_file": "8-Reinforcement/2-Gym/solution/Julia/README.md",
+ "language_code": "fa"
+ },
+ "8-Reinforcement/2-Gym/solution/R/README.md": {
+ "original_hash": "81db6ff2cf6e62fbe2340b094bb9509e",
+ "translation_date": "2025-09-04T00:32:26+00:00",
+ "source_file": "8-Reinforcement/2-Gym/solution/R/README.md",
+ "language_code": "fa"
+ },
+ "8-Reinforcement/README.md": {
+ "original_hash": "20ca019012b1725de956681d036d8b18",
+ "translation_date": "2025-09-04T00:14:30+00:00",
+ "source_file": "8-Reinforcement/README.md",
+ "language_code": "fa"
+ },
+ "9-Real-World/1-Applications/README.md": {
+ "original_hash": "83320d6b6994909e35d830cebf214039",
+ "translation_date": "2025-09-04T22:37:44+00:00",
+ "source_file": "9-Real-World/1-Applications/README.md",
+ "language_code": "fa"
+ },
+ "9-Real-World/1-Applications/assignment.md": {
+ "original_hash": "fdebfcd0a3f12c9e2b436ded1aa79885",
+ "translation_date": "2025-09-03T23:19:47+00:00",
+ "source_file": "9-Real-World/1-Applications/assignment.md",
+ "language_code": "fa"
+ },
+ "9-Real-World/2-Debugging-ML-Models/README.md": {
+ "original_hash": "df2b538e8fbb3e91cf0419ae2f858675",
+ "translation_date": "2025-09-04T22:38:26+00:00",
+ "source_file": "9-Real-World/2-Debugging-ML-Models/README.md",
+ "language_code": "fa"
+ },
+ "9-Real-World/2-Debugging-ML-Models/assignment.md": {
+ "original_hash": "91c6a180ef08e20cc15acfd2d6d6e164",
+ "translation_date": "2025-09-03T23:25:34+00:00",
+ "source_file": "9-Real-World/2-Debugging-ML-Models/assignment.md",
+ "language_code": "fa"
+ },
+ "9-Real-World/README.md": {
+ "original_hash": "5e069a0ac02a9606a69946c2b3c574a9",
+ "translation_date": "2025-09-03T23:14:55+00:00",
+ "source_file": "9-Real-World/README.md",
+ "language_code": "fa"
+ },
+ "AGENTS.md": {
+ "original_hash": "93fdaa0fd38836e50c4793e2f2f25e8b",
+ "translation_date": "2025-10-03T10:58:53+00:00",
+ "source_file": "AGENTS.md",
+ "language_code": "fa"
+ },
+ "CODE_OF_CONDUCT.md": {
+ "original_hash": "c06b12caf3c901eb3156e3dd5b0aea56",
+ "translation_date": "2025-09-03T22:14:43+00:00",
+ "source_file": "CODE_OF_CONDUCT.md",
+ "language_code": "fa"
+ },
+ "CONTRIBUTING.md": {
+ "original_hash": "977ec5266dfd78ad1ce2bd8d46fccbda",
+ "translation_date": "2025-09-03T22:12:02+00:00",
+ "source_file": "CONTRIBUTING.md",
+ "language_code": "fa"
+ },
+ "README.md": {
+ "original_hash": "da2ceed62f16a0820259556e3a873c95",
+ "translation_date": "2026-01-29T17:41:26+00:00",
+ "source_file": "README.md",
+ "language_code": "fa"
+ },
+ "SECURITY.md": {
+ "original_hash": "5e1b8da31aae9cca3d53ad243fa3365a",
+ "translation_date": "2025-09-03T22:13:05+00:00",
+ "source_file": "SECURITY.md",
+ "language_code": "fa"
+ },
+ "SUPPORT.md": {
+ "original_hash": "09623d7343ff1c26ff4f198c1b2d3176",
+ "translation_date": "2025-10-03T11:39:50+00:00",
+ "source_file": "SUPPORT.md",
+ "language_code": "fa"
+ },
+ "TROUBLESHOOTING.md": {
+ "original_hash": "134d8759f0e2ab886e9aa4f62362c201",
+ "translation_date": "2025-10-03T12:37:28+00:00",
+ "source_file": "TROUBLESHOOTING.md",
+ "language_code": "fa"
+ },
+ "docs/_sidebar.md": {
+ "original_hash": "68dd06c685f6ce840e0acfa313352e7c",
+ "translation_date": "2025-09-03T23:14:07+00:00",
+ "source_file": "docs/_sidebar.md",
+ "language_code": "fa"
+ },
+ "for-teachers.md": {
+ "original_hash": "b37de02054fa6c0438ede6fabe1fdfb8",
+ "translation_date": "2025-09-03T22:14:05+00:00",
+ "source_file": "for-teachers.md",
+ "language_code": "fa"
+ },
+ "quiz-app/README.md": {
+ "original_hash": "6d130dffca5db70d7e615f926cb1ad4c",
+ "translation_date": "2025-09-03T23:48:23+00:00",
+ "source_file": "quiz-app/README.md",
+ "language_code": "fa"
+ },
+ "sketchnotes/LICENSE.md": {
+ "original_hash": "fba3b94d88bfb9b81369b869a1e9a20f",
+ "translation_date": "2025-09-04T00:11:51+00:00",
+ "source_file": "sketchnotes/LICENSE.md",
+ "language_code": "fa"
+ },
+ "sketchnotes/README.md": {
+ "original_hash": "a88d5918c1b9da69a40d917a0840c497",
+ "translation_date": "2025-09-04T00:05:08+00:00",
+ "source_file": "sketchnotes/README.md",
+ "language_code": "fa"
+ }
+}
\ No newline at end of file
diff --git a/translations/ur/.co-op-translator.json b/translations/ur/.co-op-translator.json
new file mode 100644
index 000000000..e08f95145
--- /dev/null
+++ b/translations/ur/.co-op-translator.json
@@ -0,0 +1,596 @@
+{
+ "1-Introduction/1-intro-to-ML/README.md": {
+ "original_hash": "69389392fa6346e0dfa30f664b7b6fec",
+ "translation_date": "2025-09-06T08:53:11+00:00",
+ "source_file": "1-Introduction/1-intro-to-ML/README.md",
+ "language_code": "ur"
+ },
+ "1-Introduction/1-intro-to-ML/assignment.md": {
+ "original_hash": "4c4698044bb8af52cfb6388a4ee0e53b",
+ "translation_date": "2025-08-29T13:44:16+00:00",
+ "source_file": "1-Introduction/1-intro-to-ML/assignment.md",
+ "language_code": "ur"
+ },
+ "1-Introduction/2-history-of-ML/README.md": {
+ "original_hash": "6a05fec147e734c3e6bfa54505648e2b",
+ "translation_date": "2025-09-06T08:53:42+00:00",
+ "source_file": "1-Introduction/2-history-of-ML/README.md",
+ "language_code": "ur"
+ },
+ "1-Introduction/2-history-of-ML/assignment.md": {
+ "original_hash": "eb6e4d5afd1b21a57d2b9e6d0aac3969",
+ "translation_date": "2025-08-29T13:47:20+00:00",
+ "source_file": "1-Introduction/2-history-of-ML/assignment.md",
+ "language_code": "ur"
+ },
+ "1-Introduction/3-fairness/README.md": {
+ "original_hash": "9a6b702d1437c0467e3c5c28d763dac2",
+ "translation_date": "2025-09-06T08:51:44+00:00",
+ "source_file": "1-Introduction/3-fairness/README.md",
+ "language_code": "ur"
+ },
+ "1-Introduction/3-fairness/assignment.md": {
+ "original_hash": "dbda60e7b1fe5f18974e7858eff0004e",
+ "translation_date": "2025-08-29T13:38:50+00:00",
+ "source_file": "1-Introduction/3-fairness/assignment.md",
+ "language_code": "ur"
+ },
+ "1-Introduction/4-techniques-of-ML/README.md": {
+ "original_hash": "9d91f3af3758fdd4569fb410575995ef",
+ "translation_date": "2025-09-06T08:52:31+00:00",
+ "source_file": "1-Introduction/4-techniques-of-ML/README.md",
+ "language_code": "ur"
+ },
+ "1-Introduction/4-techniques-of-ML/assignment.md": {
+ "original_hash": "70d65aeddc06170bc1aed5b27805f930",
+ "translation_date": "2025-08-29T13:42:01+00:00",
+ "source_file": "1-Introduction/4-techniques-of-ML/assignment.md",
+ "language_code": "ur"
+ },
+ "1-Introduction/README.md": {
+ "original_hash": "cf8ecc83f28e5b98051d2179eca08e08",
+ "translation_date": "2025-08-29T13:35:37+00:00",
+ "source_file": "1-Introduction/README.md",
+ "language_code": "ur"
+ },
+ "2-Regression/1-Tools/README.md": {
+ "original_hash": "fa81d226c71d5af7a2cade31c1c92b88",
+ "translation_date": "2025-09-06T08:45:16+00:00",
+ "source_file": "2-Regression/1-Tools/README.md",
+ "language_code": "ur"
+ },
+ "2-Regression/1-Tools/assignment.md": {
+ "original_hash": "74a5cf83e4ebc302afbcbc4f418afd0a",
+ "translation_date": "2025-08-29T13:05:05+00:00",
+ "source_file": "2-Regression/1-Tools/assignment.md",
+ "language_code": "ur"
+ },
+ "2-Regression/1-Tools/solution/Julia/README.md": {
+ "original_hash": "a39c15d63f3b2795ee2284a82b986b93",
+ "translation_date": "2025-08-29T13:05:27+00:00",
+ "source_file": "2-Regression/1-Tools/solution/Julia/README.md",
+ "language_code": "ur"
+ },
+ "2-Regression/2-Data/README.md": {
+ "original_hash": "7c077988328ebfe33b24d07945f16eca",
+ "translation_date": "2025-09-06T08:46:01+00:00",
+ "source_file": "2-Regression/2-Data/README.md",
+ "language_code": "ur"
+ },
+ "2-Regression/2-Data/assignment.md": {
+ "original_hash": "4485a1ed4dd1b5647365e3d87456515d",
+ "translation_date": "2025-08-29T13:08:12+00:00",
+ "source_file": "2-Regression/2-Data/assignment.md",
+ "language_code": "ur"
+ },
+ "2-Regression/2-Data/solution/Julia/README.md": {
+ "original_hash": "a39c15d63f3b2795ee2284a82b986b93",
+ "translation_date": "2025-08-29T13:08:29+00:00",
+ "source_file": "2-Regression/2-Data/solution/Julia/README.md",
+ "language_code": "ur"
+ },
+ "2-Regression/3-Linear/README.md": {
+ "original_hash": "40e64f004f3cb50aa1d8661672d3cd92",
+ "translation_date": "2025-09-06T08:43:17+00:00",
+ "source_file": "2-Regression/3-Linear/README.md",
+ "language_code": "ur"
+ },
+ "2-Regression/3-Linear/assignment.md": {
+ "original_hash": "cc471fa89c293bc735dd3a9a0fb79b1b",
+ "translation_date": "2025-08-29T12:56:05+00:00",
+ "source_file": "2-Regression/3-Linear/assignment.md",
+ "language_code": "ur"
+ },
+ "2-Regression/3-Linear/solution/Julia/README.md": {
+ "original_hash": "a39c15d63f3b2795ee2284a82b986b93",
+ "translation_date": "2025-08-29T12:56:23+00:00",
+ "source_file": "2-Regression/3-Linear/solution/Julia/README.md",
+ "language_code": "ur"
+ },
+ "2-Regression/4-Logistic/README.md": {
+ "original_hash": "abf86d845c84330bce205a46b382ec88",
+ "translation_date": "2025-09-06T08:44:16+00:00",
+ "source_file": "2-Regression/4-Logistic/README.md",
+ "language_code": "ur"
+ },
+ "2-Regression/4-Logistic/assignment.md": {
+ "original_hash": "8af40209a41494068c1f42b14c0b450d",
+ "translation_date": "2025-08-29T13:01:20+00:00",
+ "source_file": "2-Regression/4-Logistic/assignment.md",
+ "language_code": "ur"
+ },
+ "2-Regression/4-Logistic/solution/Julia/README.md": {
+ "original_hash": "a39c15d63f3b2795ee2284a82b986b93",
+ "translation_date": "2025-08-29T13:01:38+00:00",
+ "source_file": "2-Regression/4-Logistic/solution/Julia/README.md",
+ "language_code": "ur"
+ },
+ "2-Regression/README.md": {
+ "original_hash": "508582278dbb8edd2a8a80ac96ef416c",
+ "translation_date": "2025-08-29T12:51:42+00:00",
+ "source_file": "2-Regression/README.md",
+ "language_code": "ur"
+ },
+ "3-Web-App/1-Web-App/README.md": {
+ "original_hash": "e0b75f73e4a90d45181dc5581fe2ef5c",
+ "translation_date": "2025-09-06T08:54:15+00:00",
+ "source_file": "3-Web-App/1-Web-App/README.md",
+ "language_code": "ur"
+ },
+ "3-Web-App/1-Web-App/assignment.md": {
+ "original_hash": "a8e8ae10be335cbc745b75ee552317ff",
+ "translation_date": "2025-08-29T13:50:30+00:00",
+ "source_file": "3-Web-App/1-Web-App/assignment.md",
+ "language_code": "ur"
+ },
+ "3-Web-App/README.md": {
+ "original_hash": "9836ff53cfef716ddfd70e06c5f43436",
+ "translation_date": "2025-08-29T13:47:43+00:00",
+ "source_file": "3-Web-App/README.md",
+ "language_code": "ur"
+ },
+ "4-Classification/1-Introduction/README.md": {
+ "original_hash": "aaf391d922bd6de5efba871d514c6d47",
+ "translation_date": "2025-09-06T08:56:19+00:00",
+ "source_file": "4-Classification/1-Introduction/README.md",
+ "language_code": "ur"
+ },
+ "4-Classification/1-Introduction/assignment.md": {
+ "original_hash": "b2a01912beb24cfb0007f83594dba801",
+ "translation_date": "2025-08-29T14:01:43+00:00",
+ "source_file": "4-Classification/1-Introduction/assignment.md",
+ "language_code": "ur"
+ },
+ "4-Classification/1-Introduction/solution/Julia/README.md": {
+ "original_hash": "a39c15d63f3b2795ee2284a82b986b93",
+ "translation_date": "2025-08-29T14:02:06+00:00",
+ "source_file": "4-Classification/1-Introduction/solution/Julia/README.md",
+ "language_code": "ur"
+ },
+ "4-Classification/2-Classifiers-1/README.md": {
+ "original_hash": "1a6e9e46b34a2e559fbbfc1f95397c7b",
+ "translation_date": "2025-09-06T08:54:49+00:00",
+ "source_file": "4-Classification/2-Classifiers-1/README.md",
+ "language_code": "ur"
+ },
+ "4-Classification/2-Classifiers-1/assignment.md": {
+ "original_hash": "de6025f96841498b0577e9d1aee18d1f",
+ "translation_date": "2025-08-29T13:55:24+00:00",
+ "source_file": "4-Classification/2-Classifiers-1/assignment.md",
+ "language_code": "ur"
+ },
+ "4-Classification/2-Classifiers-1/solution/Julia/README.md": {
+ "original_hash": "a39c15d63f3b2795ee2284a82b986b93",
+ "translation_date": "2025-08-29T13:55:44+00:00",
+ "source_file": "4-Classification/2-Classifiers-1/solution/Julia/README.md",
+ "language_code": "ur"
+ },
+ "4-Classification/3-Classifiers-2/README.md": {
+ "original_hash": "49047911108adc49d605cddfb455749c",
+ "translation_date": "2025-09-06T08:55:57+00:00",
+ "source_file": "4-Classification/3-Classifiers-2/README.md",
+ "language_code": "ur"
+ },
+ "4-Classification/3-Classifiers-2/assignment.md": {
+ "original_hash": "58dfdaf79fb73f7d34b22bdbacf57329",
+ "translation_date": "2025-08-29T13:59:19+00:00",
+ "source_file": "4-Classification/3-Classifiers-2/assignment.md",
+ "language_code": "ur"
+ },
+ "4-Classification/3-Classifiers-2/solution/Julia/README.md": {
+ "original_hash": "a39c15d63f3b2795ee2284a82b986b93",
+ "translation_date": "2025-08-29T13:59:38+00:00",
+ "source_file": "4-Classification/3-Classifiers-2/solution/Julia/README.md",
+ "language_code": "ur"
+ },
+ "4-Classification/4-Applied/README.md": {
+ "original_hash": "61bdec27ed2da8b098cd9065405d9bb0",
+ "translation_date": "2025-09-06T08:55:31+00:00",
+ "source_file": "4-Classification/4-Applied/README.md",
+ "language_code": "ur"
+ },
+ "4-Classification/4-Applied/assignment.md": {
+ "original_hash": "799ed651e2af0a7cad17c6268db11578",
+ "translation_date": "2025-08-29T13:57:35+00:00",
+ "source_file": "4-Classification/4-Applied/assignment.md",
+ "language_code": "ur"
+ },
+ "4-Classification/README.md": {
+ "original_hash": "74e809ffd1e613a1058bbc3e9600859e",
+ "translation_date": "2025-08-29T13:51:45+00:00",
+ "source_file": "4-Classification/README.md",
+ "language_code": "ur"
+ },
+ "5-Clustering/1-Visualize/README.md": {
+ "original_hash": "730225ea274c9174fe688b21d421539d",
+ "translation_date": "2025-09-06T08:48:36+00:00",
+ "source_file": "5-Clustering/1-Visualize/README.md",
+ "language_code": "ur"
+ },
+ "5-Clustering/1-Visualize/assignment.md": {
+ "original_hash": "589fa015a5e7d9e67bd629f7d47b53de",
+ "translation_date": "2025-08-29T13:25:29+00:00",
+ "source_file": "5-Clustering/1-Visualize/assignment.md",
+ "language_code": "ur"
+ },
+ "5-Clustering/1-Visualize/solution/Julia/README.md": {
+ "original_hash": "a39c15d63f3b2795ee2284a82b986b93",
+ "translation_date": "2025-08-29T13:25:50+00:00",
+ "source_file": "5-Clustering/1-Visualize/solution/Julia/README.md",
+ "language_code": "ur"
+ },
+ "5-Clustering/2-K-Means/README.md": {
+ "original_hash": "7cdd17338d9bbd7e2171c2cd462eb081",
+ "translation_date": "2025-09-06T08:49:43+00:00",
+ "source_file": "5-Clustering/2-K-Means/README.md",
+ "language_code": "ur"
+ },
+ "5-Clustering/2-K-Means/assignment.md": {
+ "original_hash": "b8e17eff34ad1680eba2a5d3cf9ffc41",
+ "translation_date": "2025-08-29T13:27:54+00:00",
+ "source_file": "5-Clustering/2-K-Means/assignment.md",
+ "language_code": "ur"
+ },
+ "5-Clustering/2-K-Means/solution/Julia/README.md": {
+ "original_hash": "a39c15d63f3b2795ee2284a82b986b93",
+ "translation_date": "2025-08-29T13:28:12+00:00",
+ "source_file": "5-Clustering/2-K-Means/solution/Julia/README.md",
+ "language_code": "ur"
+ },
+ "5-Clustering/README.md": {
+ "original_hash": "b28a3a4911584062772c537b653ebbc7",
+ "translation_date": "2025-08-29T13:18:41+00:00",
+ "source_file": "5-Clustering/README.md",
+ "language_code": "ur"
+ },
+ "6-NLP/1-Introduction-to-NLP/README.md": {
+ "original_hash": "1c2ec40cf55c98a028a359c27ef7e45a",
+ "translation_date": "2025-09-06T09:01:56+00:00",
+ "source_file": "6-NLP/1-Introduction-to-NLP/README.md",
+ "language_code": "ur"
+ },
+ "6-NLP/1-Introduction-to-NLP/assignment.md": {
+ "original_hash": "1d7583e8046dacbb0c056d5ba0a71b16",
+ "translation_date": "2025-08-29T14:32:08+00:00",
+ "source_file": "6-NLP/1-Introduction-to-NLP/assignment.md",
+ "language_code": "ur"
+ },
+ "6-NLP/2-Tasks/README.md": {
+ "original_hash": "5f3cb462e3122e1afe7ab0050ccf2bd3",
+ "translation_date": "2025-09-06T08:58:58+00:00",
+ "source_file": "6-NLP/2-Tasks/README.md",
+ "language_code": "ur"
+ },
+ "6-NLP/2-Tasks/assignment.md": {
+ "original_hash": "2efc4c2aba5ed06c780c05539c492ae3",
+ "translation_date": "2025-08-29T14:22:27+00:00",
+ "source_file": "6-NLP/2-Tasks/assignment.md",
+ "language_code": "ur"
+ },
+ "6-NLP/3-Translation-Sentiment/README.md": {
+ "original_hash": "be03c8182982b87ced155e4e9d1438e8",
+ "translation_date": "2025-09-06T09:02:30+00:00",
+ "source_file": "6-NLP/3-Translation-Sentiment/README.md",
+ "language_code": "ur"
+ },
+ "6-NLP/3-Translation-Sentiment/assignment.md": {
+ "original_hash": "9d2a734deb904caff310d1a999c6bd7a",
+ "translation_date": "2025-08-29T14:35:10+00:00",
+ "source_file": "6-NLP/3-Translation-Sentiment/assignment.md",
+ "language_code": "ur"
+ },
+ "6-NLP/3-Translation-Sentiment/solution/Julia/README.md": {
+ "original_hash": "a39c15d63f3b2795ee2284a82b986b93",
+ "translation_date": "2025-08-29T14:35:43+00:00",
+ "source_file": "6-NLP/3-Translation-Sentiment/solution/Julia/README.md",
+ "language_code": "ur"
+ },
+ "6-NLP/3-Translation-Sentiment/solution/R/README.md": {
+ "original_hash": "81db6ff2cf6e62fbe2340b094bb9509e",
+ "translation_date": "2025-08-29T14:35:30+00:00",
+ "source_file": "6-NLP/3-Translation-Sentiment/solution/R/README.md",
+ "language_code": "ur"
+ },
+ "6-NLP/4-Hotel-Reviews-1/README.md": {
+ "original_hash": "8d32dadeda93c6fb5c43619854882ab1",
+ "translation_date": "2025-09-06T09:00:38+00:00",
+ "source_file": "6-NLP/4-Hotel-Reviews-1/README.md",
+ "language_code": "ur"
+ },
+ "6-NLP/4-Hotel-Reviews-1/assignment.md": {
+ "original_hash": "bf39bceb833cd628f224941dca8041df",
+ "translation_date": "2025-08-29T14:29:12+00:00",
+ "source_file": "6-NLP/4-Hotel-Reviews-1/assignment.md",
+ "language_code": "ur"
+ },
+ "6-NLP/4-Hotel-Reviews-1/solution/Julia/README.md": {
+ "original_hash": "a39c15d63f3b2795ee2284a82b986b93",
+ "translation_date": "2025-08-29T14:29:40+00:00",
+ "source_file": "6-NLP/4-Hotel-Reviews-1/solution/Julia/README.md",
+ "language_code": "ur"
+ },
+ "6-NLP/4-Hotel-Reviews-1/solution/R/README.md": {
+ "original_hash": "81db6ff2cf6e62fbe2340b094bb9509e",
+ "translation_date": "2025-08-29T14:29:27+00:00",
+ "source_file": "6-NLP/4-Hotel-Reviews-1/solution/R/README.md",
+ "language_code": "ur"
+ },
+ "6-NLP/5-Hotel-Reviews-2/README.md": {
+ "original_hash": "2c742993fe95d5bcbb2846eda3d442a1",
+ "translation_date": "2025-09-06T09:03:36+00:00",
+ "source_file": "6-NLP/5-Hotel-Reviews-2/README.md",
+ "language_code": "ur"
+ },
+ "6-NLP/5-Hotel-Reviews-2/assignment.md": {
+ "original_hash": "daf144daa552da6a7d442aff6f3e77d8",
+ "translation_date": "2025-08-29T14:38:51+00:00",
+ "source_file": "6-NLP/5-Hotel-Reviews-2/assignment.md",
+ "language_code": "ur"
+ },
+ "6-NLP/5-Hotel-Reviews-2/solution/Julia/README.md": {
+ "original_hash": "a39c15d63f3b2795ee2284a82b986b93",
+ "translation_date": "2025-08-29T14:39:24+00:00",
+ "source_file": "6-NLP/5-Hotel-Reviews-2/solution/Julia/README.md",
+ "language_code": "ur"
+ },
+ "6-NLP/5-Hotel-Reviews-2/solution/R/README.md": {
+ "original_hash": "81db6ff2cf6e62fbe2340b094bb9509e",
+ "translation_date": "2025-08-29T14:39:09+00:00",
+ "source_file": "6-NLP/5-Hotel-Reviews-2/solution/R/README.md",
+ "language_code": "ur"
+ },
+ "6-NLP/README.md": {
+ "original_hash": "1eb379dc2d0c9940b320732d16083778",
+ "translation_date": "2025-08-29T14:19:12+00:00",
+ "source_file": "6-NLP/README.md",
+ "language_code": "ur"
+ },
+ "6-NLP/data/README.md": {
+ "original_hash": "ee0670655c89e4719319764afb113624",
+ "translation_date": "2025-08-29T14:29:53+00:00",
+ "source_file": "6-NLP/data/README.md",
+ "language_code": "ur"
+ },
+ "7-TimeSeries/1-Introduction/README.md": {
+ "original_hash": "662b509c39eee205687726636d0a8455",
+ "translation_date": "2025-09-06T08:47:30+00:00",
+ "source_file": "7-TimeSeries/1-Introduction/README.md",
+ "language_code": "ur"
+ },
+ "7-TimeSeries/1-Introduction/assignment.md": {
+ "original_hash": "d1781b0b92568ea1d119d0a198b576b4",
+ "translation_date": "2025-08-29T13:15:05+00:00",
+ "source_file": "7-TimeSeries/1-Introduction/assignment.md",
+ "language_code": "ur"
+ },
+ "7-TimeSeries/1-Introduction/solution/Julia/README.md": {
+ "original_hash": "a39c15d63f3b2795ee2284a82b986b93",
+ "translation_date": "2025-08-29T13:15:48+00:00",
+ "source_file": "7-TimeSeries/1-Introduction/solution/Julia/README.md",
+ "language_code": "ur"
+ },
+ "7-TimeSeries/1-Introduction/solution/R/README.md": {
+ "original_hash": "81db6ff2cf6e62fbe2340b094bb9509e",
+ "translation_date": "2025-08-29T13:15:32+00:00",
+ "source_file": "7-TimeSeries/1-Introduction/solution/R/README.md",
+ "language_code": "ur"
+ },
+ "7-TimeSeries/2-ARIMA/README.md": {
+ "original_hash": "917dbf890db71a322f306050cb284749",
+ "translation_date": "2025-09-06T08:46:38+00:00",
+ "source_file": "7-TimeSeries/2-ARIMA/README.md",
+ "language_code": "ur"
+ },
+ "7-TimeSeries/2-ARIMA/assignment.md": {
+ "original_hash": "1c814013e10866dfd92cdb32caaae3ac",
+ "translation_date": "2025-08-29T13:11:54+00:00",
+ "source_file": "7-TimeSeries/2-ARIMA/assignment.md",
+ "language_code": "ur"
+ },
+ "7-TimeSeries/2-ARIMA/solution/Julia/README.md": {
+ "original_hash": "a39c15d63f3b2795ee2284a82b986b93",
+ "translation_date": "2025-08-29T13:12:28+00:00",
+ "source_file": "7-TimeSeries/2-ARIMA/solution/Julia/README.md",
+ "language_code": "ur"
+ },
+ "7-TimeSeries/2-ARIMA/solution/R/README.md": {
+ "original_hash": "81db6ff2cf6e62fbe2340b094bb9509e",
+ "translation_date": "2025-08-29T13:12:14+00:00",
+ "source_file": "7-TimeSeries/2-ARIMA/solution/R/README.md",
+ "language_code": "ur"
+ },
+ "7-TimeSeries/3-SVR/README.md": {
+ "original_hash": "482bccabe1df958496ea71a3667995cd",
+ "translation_date": "2025-09-06T08:48:05+00:00",
+ "source_file": "7-TimeSeries/3-SVR/README.md",
+ "language_code": "ur"
+ },
+ "7-TimeSeries/3-SVR/assignment.md": {
+ "original_hash": "94aa2fc6154252ae30a3f3740299707a",
+ "translation_date": "2025-08-29T13:18:11+00:00",
+ "source_file": "7-TimeSeries/3-SVR/assignment.md",
+ "language_code": "ur"
+ },
+ "7-TimeSeries/README.md": {
+ "original_hash": "61342603bad8acadbc6b2e4e3aab3f66",
+ "translation_date": "2025-08-29T13:08:50+00:00",
+ "source_file": "7-TimeSeries/README.md",
+ "language_code": "ur"
+ },
+ "8-Reinforcement/1-QLearning/README.md": {
+ "original_hash": "911efd5e595089000cb3c16fce1beab8",
+ "translation_date": "2025-09-06T08:57:34+00:00",
+ "source_file": "8-Reinforcement/1-QLearning/README.md",
+ "language_code": "ur"
+ },
+ "8-Reinforcement/1-QLearning/assignment.md": {
+ "original_hash": "68394b2102d3503882e5e914bd0ff5c1",
+ "translation_date": "2025-08-29T14:14:05+00:00",
+ "source_file": "8-Reinforcement/1-QLearning/assignment.md",
+ "language_code": "ur"
+ },
+ "8-Reinforcement/1-QLearning/solution/Julia/README.md": {
+ "original_hash": "a39c15d63f3b2795ee2284a82b986b93",
+ "translation_date": "2025-08-29T14:15:11+00:00",
+ "source_file": "8-Reinforcement/1-QLearning/solution/Julia/README.md",
+ "language_code": "ur"
+ },
+ "8-Reinforcement/1-QLearning/solution/R/README.md": {
+ "original_hash": "81db6ff2cf6e62fbe2340b094bb9509e",
+ "translation_date": "2025-08-29T14:14:57+00:00",
+ "source_file": "8-Reinforcement/1-QLearning/solution/R/README.md",
+ "language_code": "ur"
+ },
+ "8-Reinforcement/2-Gym/README.md": {
+ "original_hash": "107d5bb29da8a562e7ae72262d251a75",
+ "translation_date": "2025-09-06T08:58:16+00:00",
+ "source_file": "8-Reinforcement/2-Gym/README.md",
+ "language_code": "ur"
+ },
+ "8-Reinforcement/2-Gym/assignment.md": {
+ "original_hash": "1f2b7441745eb52e25745423b247016b",
+ "translation_date": "2025-08-29T14:18:07+00:00",
+ "source_file": "8-Reinforcement/2-Gym/assignment.md",
+ "language_code": "ur"
+ },
+ "8-Reinforcement/2-Gym/solution/Julia/README.md": {
+ "original_hash": "a39c15d63f3b2795ee2284a82b986b93",
+ "translation_date": "2025-08-29T14:18:51+00:00",
+ "source_file": "8-Reinforcement/2-Gym/solution/Julia/README.md",
+ "language_code": "ur"
+ },
+ "8-Reinforcement/2-Gym/solution/R/README.md": {
+ "original_hash": "81db6ff2cf6e62fbe2340b094bb9509e",
+ "translation_date": "2025-08-29T14:18:37+00:00",
+ "source_file": "8-Reinforcement/2-Gym/solution/R/README.md",
+ "language_code": "ur"
+ },
+ "8-Reinforcement/README.md": {
+ "original_hash": "20ca019012b1725de956681d036d8b18",
+ "translation_date": "2025-08-29T14:07:54+00:00",
+ "source_file": "8-Reinforcement/README.md",
+ "language_code": "ur"
+ },
+ "9-Real-World/1-Applications/README.md": {
+ "original_hash": "83320d6b6994909e35d830cebf214039",
+ "translation_date": "2025-09-06T08:50:10+00:00",
+ "source_file": "9-Real-World/1-Applications/README.md",
+ "language_code": "ur"
+ },
+ "9-Real-World/1-Applications/assignment.md": {
+ "original_hash": "fdebfcd0a3f12c9e2b436ded1aa79885",
+ "translation_date": "2025-08-29T13:32:06+00:00",
+ "source_file": "9-Real-World/1-Applications/assignment.md",
+ "language_code": "ur"
+ },
+ "9-Real-World/2-Debugging-ML-Models/README.md": {
+ "original_hash": "df2b538e8fbb3e91cf0419ae2f858675",
+ "translation_date": "2025-09-06T08:50:55+00:00",
+ "source_file": "9-Real-World/2-Debugging-ML-Models/README.md",
+ "language_code": "ur"
+ },
+ "9-Real-World/2-Debugging-ML-Models/assignment.md": {
+ "original_hash": "91c6a180ef08e20cc15acfd2d6d6e164",
+ "translation_date": "2025-08-29T13:35:10+00:00",
+ "source_file": "9-Real-World/2-Debugging-ML-Models/assignment.md",
+ "language_code": "ur"
+ },
+ "9-Real-World/README.md": {
+ "original_hash": "5e069a0ac02a9606a69946c2b3c574a9",
+ "translation_date": "2025-08-29T13:29:08+00:00",
+ "source_file": "9-Real-World/README.md",
+ "language_code": "ur"
+ },
+ "AGENTS.md": {
+ "original_hash": "93fdaa0fd38836e50c4793e2f2f25e8b",
+ "translation_date": "2025-10-03T10:59:25+00:00",
+ "source_file": "AGENTS.md",
+ "language_code": "ur"
+ },
+ "CODE_OF_CONDUCT.md": {
+ "original_hash": "c06b12caf3c901eb3156e3dd5b0aea56",
+ "translation_date": "2025-08-29T12:51:12+00:00",
+ "source_file": "CODE_OF_CONDUCT.md",
+ "language_code": "ur"
+ },
+ "CONTRIBUTING.md": {
+ "original_hash": "977ec5266dfd78ad1ce2bd8d46fccbda",
+ "translation_date": "2025-08-29T12:49:28+00:00",
+ "source_file": "CONTRIBUTING.md",
+ "language_code": "ur"
+ },
+ "README.md": {
+ "original_hash": "da2ceed62f16a0820259556e3a873c95",
+ "translation_date": "2026-01-29T17:42:58+00:00",
+ "source_file": "README.md",
+ "language_code": "ur"
+ },
+ "SECURITY.md": {
+ "original_hash": "5e1b8da31aae9cca3d53ad243fa3365a",
+ "translation_date": "2025-08-29T12:50:02+00:00",
+ "source_file": "SECURITY.md",
+ "language_code": "ur"
+ },
+ "SUPPORT.md": {
+ "original_hash": "09623d7343ff1c26ff4f198c1b2d3176",
+ "translation_date": "2025-10-03T11:40:56+00:00",
+ "source_file": "SUPPORT.md",
+ "language_code": "ur"
+ },
+ "TROUBLESHOOTING.md": {
+ "original_hash": "134d8759f0e2ab886e9aa4f62362c201",
+ "translation_date": "2025-10-03T12:37:51+00:00",
+ "source_file": "TROUBLESHOOTING.md",
+ "language_code": "ur"
+ },
+ "docs/_sidebar.md": {
+ "original_hash": "68dd06c685f6ce840e0acfa313352e7c",
+ "translation_date": "2025-08-29T13:28:33+00:00",
+ "source_file": "docs/_sidebar.md",
+ "language_code": "ur"
+ },
+ "for-teachers.md": {
+ "original_hash": "b37de02054fa6c0438ede6fabe1fdfb8",
+ "translation_date": "2025-08-29T12:50:43+00:00",
+ "source_file": "for-teachers.md",
+ "language_code": "ur"
+ },
+ "quiz-app/README.md": {
+ "original_hash": "6d130dffca5db70d7e615f926cb1ad4c",
+ "translation_date": "2025-08-29T13:51:00+00:00",
+ "source_file": "quiz-app/README.md",
+ "language_code": "ur"
+ },
+ "sketchnotes/LICENSE.md": {
+ "original_hash": "fba3b94d88bfb9b81369b869a1e9a20f",
+ "translation_date": "2025-08-29T14:04:44+00:00",
+ "source_file": "sketchnotes/LICENSE.md",
+ "language_code": "ur"
+ },
+ "sketchnotes/README.md": {
+ "original_hash": "a88d5918c1b9da69a40d917a0840c497",
+ "translation_date": "2025-08-29T14:02:20+00:00",
+ "source_file": "sketchnotes/README.md",
+ "language_code": "ur"
+ }
+}
\ No newline at end of file
diff --git a/translations/zh-CN/.co-op-translator.json b/translations/zh-CN/.co-op-translator.json
new file mode 100644
index 000000000..02dd2ff9c
--- /dev/null
+++ b/translations/zh-CN/.co-op-translator.json
@@ -0,0 +1,596 @@
+{
+ "1-Introduction/1-intro-to-ML/README.md": {
+ "original_hash": "69389392fa6346e0dfa30f664b7b6fec",
+ "translation_date": "2025-09-05T09:05:11+00:00",
+ "source_file": "1-Introduction/1-intro-to-ML/README.md",
+ "language_code": "zh-CN"
+ },
+ "1-Introduction/1-intro-to-ML/assignment.md": {
+ "original_hash": "4c4698044bb8af52cfb6388a4ee0e53b",
+ "translation_date": "2025-09-03T17:48:58+00:00",
+ "source_file": "1-Introduction/1-intro-to-ML/assignment.md",
+ "language_code": "zh-CN"
+ },
+ "1-Introduction/2-history-of-ML/README.md": {
+ "original_hash": "6a05fec147e734c3e6bfa54505648e2b",
+ "translation_date": "2025-09-05T09:05:36+00:00",
+ "source_file": "1-Introduction/2-history-of-ML/README.md",
+ "language_code": "zh-CN"
+ },
+ "1-Introduction/2-history-of-ML/assignment.md": {
+ "original_hash": "eb6e4d5afd1b21a57d2b9e6d0aac3969",
+ "translation_date": "2025-09-03T17:52:58+00:00",
+ "source_file": "1-Introduction/2-history-of-ML/assignment.md",
+ "language_code": "zh-CN"
+ },
+ "1-Introduction/3-fairness/README.md": {
+ "original_hash": "9a6b702d1437c0467e3c5c28d763dac2",
+ "translation_date": "2025-09-05T09:04:06+00:00",
+ "source_file": "1-Introduction/3-fairness/README.md",
+ "language_code": "zh-CN"
+ },
+ "1-Introduction/3-fairness/assignment.md": {
+ "original_hash": "dbda60e7b1fe5f18974e7858eff0004e",
+ "translation_date": "2025-09-03T17:41:11+00:00",
+ "source_file": "1-Introduction/3-fairness/assignment.md",
+ "language_code": "zh-CN"
+ },
+ "1-Introduction/4-techniques-of-ML/README.md": {
+ "original_hash": "9d91f3af3758fdd4569fb410575995ef",
+ "translation_date": "2025-09-05T09:04:44+00:00",
+ "source_file": "1-Introduction/4-techniques-of-ML/README.md",
+ "language_code": "zh-CN"
+ },
+ "1-Introduction/4-techniques-of-ML/assignment.md": {
+ "original_hash": "70d65aeddc06170bc1aed5b27805f930",
+ "translation_date": "2025-09-03T17:45:24+00:00",
+ "source_file": "1-Introduction/4-techniques-of-ML/assignment.md",
+ "language_code": "zh-CN"
+ },
+ "1-Introduction/README.md": {
+ "original_hash": "cf8ecc83f28e5b98051d2179eca08e08",
+ "translation_date": "2025-09-03T17:33:58+00:00",
+ "source_file": "1-Introduction/README.md",
+ "language_code": "zh-CN"
+ },
+ "2-Regression/1-Tools/README.md": {
+ "original_hash": "fa81d226c71d5af7a2cade31c1c92b88",
+ "translation_date": "2025-09-05T08:58:09+00:00",
+ "source_file": "2-Regression/1-Tools/README.md",
+ "language_code": "zh-CN"
+ },
+ "2-Regression/1-Tools/assignment.md": {
+ "original_hash": "74a5cf83e4ebc302afbcbc4f418afd0a",
+ "translation_date": "2025-09-03T16:40:46+00:00",
+ "source_file": "2-Regression/1-Tools/assignment.md",
+ "language_code": "zh-CN"
+ },
+ "2-Regression/1-Tools/solution/Julia/README.md": {
+ "original_hash": "a39c15d63f3b2795ee2284a82b986b93",
+ "translation_date": "2025-09-03T16:41:15+00:00",
+ "source_file": "2-Regression/1-Tools/solution/Julia/README.md",
+ "language_code": "zh-CN"
+ },
+ "2-Regression/2-Data/README.md": {
+ "original_hash": "7c077988328ebfe33b24d07945f16eca",
+ "translation_date": "2025-09-05T08:58:47+00:00",
+ "source_file": "2-Regression/2-Data/README.md",
+ "language_code": "zh-CN"
+ },
+ "2-Regression/2-Data/assignment.md": {
+ "original_hash": "4485a1ed4dd1b5647365e3d87456515d",
+ "translation_date": "2025-09-03T16:45:08+00:00",
+ "source_file": "2-Regression/2-Data/assignment.md",
+ "language_code": "zh-CN"
+ },
+ "2-Regression/2-Data/solution/Julia/README.md": {
+ "original_hash": "a39c15d63f3b2795ee2284a82b986b93",
+ "translation_date": "2025-09-03T16:45:35+00:00",
+ "source_file": "2-Regression/2-Data/solution/Julia/README.md",
+ "language_code": "zh-CN"
+ },
+ "2-Regression/3-Linear/README.md": {
+ "original_hash": "40e64f004f3cb50aa1d8661672d3cd92",
+ "translation_date": "2025-09-05T08:55:34+00:00",
+ "source_file": "2-Regression/3-Linear/README.md",
+ "language_code": "zh-CN"
+ },
+ "2-Regression/3-Linear/assignment.md": {
+ "original_hash": "cc471fa89c293bc735dd3a9a0fb79b1b",
+ "translation_date": "2025-09-03T16:25:06+00:00",
+ "source_file": "2-Regression/3-Linear/assignment.md",
+ "language_code": "zh-CN"
+ },
+ "2-Regression/3-Linear/solution/Julia/README.md": {
+ "original_hash": "a39c15d63f3b2795ee2284a82b986b93",
+ "translation_date": "2025-09-03T16:25:33+00:00",
+ "source_file": "2-Regression/3-Linear/solution/Julia/README.md",
+ "language_code": "zh-CN"
+ },
+ "2-Regression/4-Logistic/README.md": {
+ "original_hash": "abf86d845c84330bce205a46b382ec88",
+ "translation_date": "2025-09-05T08:57:14+00:00",
+ "source_file": "2-Regression/4-Logistic/README.md",
+ "language_code": "zh-CN"
+ },
+ "2-Regression/4-Logistic/assignment.md": {
+ "original_hash": "8af40209a41494068c1f42b14c0b450d",
+ "translation_date": "2025-09-03T16:35:04+00:00",
+ "source_file": "2-Regression/4-Logistic/assignment.md",
+ "language_code": "zh-CN"
+ },
+ "2-Regression/4-Logistic/solution/Julia/README.md": {
+ "original_hash": "a39c15d63f3b2795ee2284a82b986b93",
+ "translation_date": "2025-09-03T16:35:32+00:00",
+ "source_file": "2-Regression/4-Logistic/solution/Julia/README.md",
+ "language_code": "zh-CN"
+ },
+ "2-Regression/README.md": {
+ "original_hash": "508582278dbb8edd2a8a80ac96ef416c",
+ "translation_date": "2025-09-03T16:17:29+00:00",
+ "source_file": "2-Regression/README.md",
+ "language_code": "zh-CN"
+ },
+ "3-Web-App/1-Web-App/README.md": {
+ "original_hash": "e0b75f73e4a90d45181dc5581fe2ef5c",
+ "translation_date": "2025-09-05T09:06:07+00:00",
+ "source_file": "3-Web-App/1-Web-App/README.md",
+ "language_code": "zh-CN"
+ },
+ "3-Web-App/1-Web-App/assignment.md": {
+ "original_hash": "a8e8ae10be335cbc745b75ee552317ff",
+ "translation_date": "2025-09-03T17:57:46+00:00",
+ "source_file": "3-Web-App/1-Web-App/assignment.md",
+ "language_code": "zh-CN"
+ },
+ "3-Web-App/README.md": {
+ "original_hash": "9836ff53cfef716ddfd70e06c5f43436",
+ "translation_date": "2025-09-03T17:53:39+00:00",
+ "source_file": "3-Web-App/README.md",
+ "language_code": "zh-CN"
+ },
+ "4-Classification/1-Introduction/README.md": {
+ "original_hash": "aaf391d922bd6de5efba871d514c6d47",
+ "translation_date": "2025-09-05T09:08:05+00:00",
+ "source_file": "4-Classification/1-Introduction/README.md",
+ "language_code": "zh-CN"
+ },
+ "4-Classification/1-Introduction/assignment.md": {
+ "original_hash": "b2a01912beb24cfb0007f83594dba801",
+ "translation_date": "2025-09-03T18:15:53+00:00",
+ "source_file": "4-Classification/1-Introduction/assignment.md",
+ "language_code": "zh-CN"
+ },
+ "4-Classification/1-Introduction/solution/Julia/README.md": {
+ "original_hash": "a39c15d63f3b2795ee2284a82b986b93",
+ "translation_date": "2025-09-03T18:16:22+00:00",
+ "source_file": "4-Classification/1-Introduction/solution/Julia/README.md",
+ "language_code": "zh-CN"
+ },
+ "4-Classification/2-Classifiers-1/README.md": {
+ "original_hash": "1a6e9e46b34a2e559fbbfc1f95397c7b",
+ "translation_date": "2025-09-05T09:06:37+00:00",
+ "source_file": "4-Classification/2-Classifiers-1/README.md",
+ "language_code": "zh-CN"
+ },
+ "4-Classification/2-Classifiers-1/assignment.md": {
+ "original_hash": "de6025f96841498b0577e9d1aee18d1f",
+ "translation_date": "2025-09-03T18:05:14+00:00",
+ "source_file": "4-Classification/2-Classifiers-1/assignment.md",
+ "language_code": "zh-CN"
+ },
+ "4-Classification/2-Classifiers-1/solution/Julia/README.md": {
+ "original_hash": "a39c15d63f3b2795ee2284a82b986b93",
+ "translation_date": "2025-09-03T18:05:41+00:00",
+ "source_file": "4-Classification/2-Classifiers-1/solution/Julia/README.md",
+ "language_code": "zh-CN"
+ },
+ "4-Classification/3-Classifiers-2/README.md": {
+ "original_hash": "49047911108adc49d605cddfb455749c",
+ "translation_date": "2025-09-05T09:07:46+00:00",
+ "source_file": "4-Classification/3-Classifiers-2/README.md",
+ "language_code": "zh-CN"
+ },
+ "4-Classification/3-Classifiers-2/assignment.md": {
+ "original_hash": "58dfdaf79fb73f7d34b22bdbacf57329",
+ "translation_date": "2025-09-03T18:11:55+00:00",
+ "source_file": "4-Classification/3-Classifiers-2/assignment.md",
+ "language_code": "zh-CN"
+ },
+ "4-Classification/3-Classifiers-2/solution/Julia/README.md": {
+ "original_hash": "a39c15d63f3b2795ee2284a82b986b93",
+ "translation_date": "2025-09-03T18:12:23+00:00",
+ "source_file": "4-Classification/3-Classifiers-2/solution/Julia/README.md",
+ "language_code": "zh-CN"
+ },
+ "4-Classification/4-Applied/README.md": {
+ "original_hash": "61bdec27ed2da8b098cd9065405d9bb0",
+ "translation_date": "2025-09-05T09:07:17+00:00",
+ "source_file": "4-Classification/4-Applied/README.md",
+ "language_code": "zh-CN"
+ },
+ "4-Classification/4-Applied/assignment.md": {
+ "original_hash": "799ed651e2af0a7cad17c6268db11578",
+ "translation_date": "2025-09-03T18:09:03+00:00",
+ "source_file": "4-Classification/4-Applied/assignment.md",
+ "language_code": "zh-CN"
+ },
+ "4-Classification/README.md": {
+ "original_hash": "74e809ffd1e613a1058bbc3e9600859e",
+ "translation_date": "2025-09-03T17:59:47+00:00",
+ "source_file": "4-Classification/README.md",
+ "language_code": "zh-CN"
+ },
+ "5-Clustering/1-Visualize/README.md": {
+ "original_hash": "730225ea274c9174fe688b21d421539d",
+ "translation_date": "2025-09-05T09:00:51+00:00",
+ "source_file": "5-Clustering/1-Visualize/README.md",
+ "language_code": "zh-CN"
+ },
+ "5-Clustering/1-Visualize/assignment.md": {
+ "original_hash": "589fa015a5e7d9e67bd629f7d47b53de",
+ "translation_date": "2025-09-03T17:16:31+00:00",
+ "source_file": "5-Clustering/1-Visualize/assignment.md",
+ "language_code": "zh-CN"
+ },
+ "5-Clustering/1-Visualize/solution/Julia/README.md": {
+ "original_hash": "a39c15d63f3b2795ee2284a82b986b93",
+ "translation_date": "2025-09-03T17:17:00+00:00",
+ "source_file": "5-Clustering/1-Visualize/solution/Julia/README.md",
+ "language_code": "zh-CN"
+ },
+ "5-Clustering/2-K-Means/README.md": {
+ "original_hash": "7cdd17338d9bbd7e2171c2cd462eb081",
+ "translation_date": "2025-09-05T09:02:01+00:00",
+ "source_file": "5-Clustering/2-K-Means/README.md",
+ "language_code": "zh-CN"
+ },
+ "5-Clustering/2-K-Means/assignment.md": {
+ "original_hash": "b8e17eff34ad1680eba2a5d3cf9ffc41",
+ "translation_date": "2025-09-03T17:20:07+00:00",
+ "source_file": "5-Clustering/2-K-Means/assignment.md",
+ "language_code": "zh-CN"
+ },
+ "5-Clustering/2-K-Means/solution/Julia/README.md": {
+ "original_hash": "a39c15d63f3b2795ee2284a82b986b93",
+ "translation_date": "2025-09-03T17:20:32+00:00",
+ "source_file": "5-Clustering/2-K-Means/solution/Julia/README.md",
+ "language_code": "zh-CN"
+ },
+ "5-Clustering/README.md": {
+ "original_hash": "b28a3a4911584062772c537b653ebbc7",
+ "translation_date": "2025-09-03T17:02:10+00:00",
+ "source_file": "5-Clustering/README.md",
+ "language_code": "zh-CN"
+ },
+ "6-NLP/1-Introduction-to-NLP/README.md": {
+ "original_hash": "1c2ec40cf55c98a028a359c27ef7e45a",
+ "translation_date": "2025-09-05T09:11:59+00:00",
+ "source_file": "6-NLP/1-Introduction-to-NLP/README.md",
+ "language_code": "zh-CN"
+ },
+ "6-NLP/1-Introduction-to-NLP/assignment.md": {
+ "original_hash": "1d7583e8046dacbb0c056d5ba0a71b16",
+ "translation_date": "2025-09-03T19:02:50+00:00",
+ "source_file": "6-NLP/1-Introduction-to-NLP/assignment.md",
+ "language_code": "zh-CN"
+ },
+ "6-NLP/2-Tasks/README.md": {
+ "original_hash": "5f3cb462e3122e1afe7ab0050ccf2bd3",
+ "translation_date": "2025-09-05T09:10:13+00:00",
+ "source_file": "6-NLP/2-Tasks/README.md",
+ "language_code": "zh-CN"
+ },
+ "6-NLP/2-Tasks/assignment.md": {
+ "original_hash": "2efc4c2aba5ed06c780c05539c492ae3",
+ "translation_date": "2025-09-03T18:50:07+00:00",
+ "source_file": "6-NLP/2-Tasks/assignment.md",
+ "language_code": "zh-CN"
+ },
+ "6-NLP/3-Translation-Sentiment/README.md": {
+ "original_hash": "be03c8182982b87ced155e4e9d1438e8",
+ "translation_date": "2025-09-05T09:12:24+00:00",
+ "source_file": "6-NLP/3-Translation-Sentiment/README.md",
+ "language_code": "zh-CN"
+ },
+ "6-NLP/3-Translation-Sentiment/assignment.md": {
+ "original_hash": "9d2a734deb904caff310d1a999c6bd7a",
+ "translation_date": "2025-09-03T19:08:10+00:00",
+ "source_file": "6-NLP/3-Translation-Sentiment/assignment.md",
+ "language_code": "zh-CN"
+ },
+ "6-NLP/3-Translation-Sentiment/solution/Julia/README.md": {
+ "original_hash": "a39c15d63f3b2795ee2284a82b986b93",
+ "translation_date": "2025-09-03T19:08:59+00:00",
+ "source_file": "6-NLP/3-Translation-Sentiment/solution/Julia/README.md",
+ "language_code": "zh-CN"
+ },
+ "6-NLP/3-Translation-Sentiment/solution/R/README.md": {
+ "original_hash": "81db6ff2cf6e62fbe2340b094bb9509e",
+ "translation_date": "2025-09-03T19:08:36+00:00",
+ "source_file": "6-NLP/3-Translation-Sentiment/solution/R/README.md",
+ "language_code": "zh-CN"
+ },
+ "6-NLP/4-Hotel-Reviews-1/README.md": {
+ "original_hash": "8d32dadeda93c6fb5c43619854882ab1",
+ "translation_date": "2025-09-05T09:10:44+00:00",
+ "source_file": "6-NLP/4-Hotel-Reviews-1/README.md",
+ "language_code": "zh-CN"
+ },
+ "6-NLP/4-Hotel-Reviews-1/assignment.md": {
+ "original_hash": "bf39bceb833cd628f224941dca8041df",
+ "translation_date": "2025-09-03T18:57:44+00:00",
+ "source_file": "6-NLP/4-Hotel-Reviews-1/assignment.md",
+ "language_code": "zh-CN"
+ },
+ "6-NLP/4-Hotel-Reviews-1/solution/Julia/README.md": {
+ "original_hash": "a39c15d63f3b2795ee2284a82b986b93",
+ "translation_date": "2025-09-03T18:58:30+00:00",
+ "source_file": "6-NLP/4-Hotel-Reviews-1/solution/Julia/README.md",
+ "language_code": "zh-CN"
+ },
+ "6-NLP/4-Hotel-Reviews-1/solution/R/README.md": {
+ "original_hash": "81db6ff2cf6e62fbe2340b094bb9509e",
+ "translation_date": "2025-09-03T18:58:09+00:00",
+ "source_file": "6-NLP/4-Hotel-Reviews-1/solution/R/README.md",
+ "language_code": "zh-CN"
+ },
+ "6-NLP/5-Hotel-Reviews-2/README.md": {
+ "original_hash": "2c742993fe95d5bcbb2846eda3d442a1",
+ "translation_date": "2025-09-05T09:13:02+00:00",
+ "source_file": "6-NLP/5-Hotel-Reviews-2/README.md",
+ "language_code": "zh-CN"
+ },
+ "6-NLP/5-Hotel-Reviews-2/assignment.md": {
+ "original_hash": "daf144daa552da6a7d442aff6f3e77d8",
+ "translation_date": "2025-09-03T19:14:24+00:00",
+ "source_file": "6-NLP/5-Hotel-Reviews-2/assignment.md",
+ "language_code": "zh-CN"
+ },
+ "6-NLP/5-Hotel-Reviews-2/solution/Julia/README.md": {
+ "original_hash": "a39c15d63f3b2795ee2284a82b986b93",
+ "translation_date": "2025-09-03T19:15:08+00:00",
+ "source_file": "6-NLP/5-Hotel-Reviews-2/solution/Julia/README.md",
+ "language_code": "zh-CN"
+ },
+ "6-NLP/5-Hotel-Reviews-2/solution/R/README.md": {
+ "original_hash": "81db6ff2cf6e62fbe2340b094bb9509e",
+ "translation_date": "2025-09-03T19:14:49+00:00",
+ "source_file": "6-NLP/5-Hotel-Reviews-2/solution/R/README.md",
+ "language_code": "zh-CN"
+ },
+ "6-NLP/README.md": {
+ "original_hash": "1eb379dc2d0c9940b320732d16083778",
+ "translation_date": "2025-09-03T18:45:55+00:00",
+ "source_file": "6-NLP/README.md",
+ "language_code": "zh-CN"
+ },
+ "6-NLP/data/README.md": {
+ "original_hash": "ee0670655c89e4719319764afb113624",
+ "translation_date": "2025-09-03T18:58:52+00:00",
+ "source_file": "6-NLP/data/README.md",
+ "language_code": "zh-CN"
+ },
+ "7-TimeSeries/1-Introduction/README.md": {
+ "original_hash": "662b509c39eee205687726636d0a8455",
+ "translation_date": "2025-09-05T08:59:54+00:00",
+ "source_file": "7-TimeSeries/1-Introduction/README.md",
+ "language_code": "zh-CN"
+ },
+ "7-TimeSeries/1-Introduction/assignment.md": {
+ "original_hash": "d1781b0b92568ea1d119d0a198b576b4",
+ "translation_date": "2025-09-03T16:56:37+00:00",
+ "source_file": "7-TimeSeries/1-Introduction/assignment.md",
+ "language_code": "zh-CN"
+ },
+ "7-TimeSeries/1-Introduction/solution/Julia/README.md": {
+ "original_hash": "a39c15d63f3b2795ee2284a82b986b93",
+ "translation_date": "2025-09-03T16:57:26+00:00",
+ "source_file": "7-TimeSeries/1-Introduction/solution/Julia/README.md",
+ "language_code": "zh-CN"
+ },
+ "7-TimeSeries/1-Introduction/solution/R/README.md": {
+ "original_hash": "81db6ff2cf6e62fbe2340b094bb9509e",
+ "translation_date": "2025-09-03T16:57:06+00:00",
+ "source_file": "7-TimeSeries/1-Introduction/solution/R/README.md",
+ "language_code": "zh-CN"
+ },
+ "7-TimeSeries/2-ARIMA/README.md": {
+ "original_hash": "917dbf890db71a322f306050cb284749",
+ "translation_date": "2025-09-05T08:59:15+00:00",
+ "source_file": "7-TimeSeries/2-ARIMA/README.md",
+ "language_code": "zh-CN"
+ },
+ "7-TimeSeries/2-ARIMA/assignment.md": {
+ "original_hash": "1c814013e10866dfd92cdb32caaae3ac",
+ "translation_date": "2025-09-03T16:51:48+00:00",
+ "source_file": "7-TimeSeries/2-ARIMA/assignment.md",
+ "language_code": "zh-CN"
+ },
+ "7-TimeSeries/2-ARIMA/solution/Julia/README.md": {
+ "original_hash": "a39c15d63f3b2795ee2284a82b986b93",
+ "translation_date": "2025-09-03T16:52:35+00:00",
+ "source_file": "7-TimeSeries/2-ARIMA/solution/Julia/README.md",
+ "language_code": "zh-CN"
+ },
+ "7-TimeSeries/2-ARIMA/solution/R/README.md": {
+ "original_hash": "81db6ff2cf6e62fbe2340b094bb9509e",
+ "translation_date": "2025-09-03T16:52:15+00:00",
+ "source_file": "7-TimeSeries/2-ARIMA/solution/R/README.md",
+ "language_code": "zh-CN"
+ },
+ "7-TimeSeries/3-SVR/README.md": {
+ "original_hash": "482bccabe1df958496ea71a3667995cd",
+ "translation_date": "2025-09-05T09:00:24+00:00",
+ "source_file": "7-TimeSeries/3-SVR/README.md",
+ "language_code": "zh-CN"
+ },
+ "7-TimeSeries/3-SVR/assignment.md": {
+ "original_hash": "94aa2fc6154252ae30a3f3740299707a",
+ "translation_date": "2025-09-03T17:01:12+00:00",
+ "source_file": "7-TimeSeries/3-SVR/assignment.md",
+ "language_code": "zh-CN"
+ },
+ "7-TimeSeries/README.md": {
+ "original_hash": "61342603bad8acadbc6b2e4e3aab3f66",
+ "translation_date": "2025-09-03T16:46:16+00:00",
+ "source_file": "7-TimeSeries/README.md",
+ "language_code": "zh-CN"
+ },
+ "8-Reinforcement/1-QLearning/README.md": {
+ "original_hash": "911efd5e595089000cb3c16fce1beab8",
+ "translation_date": "2025-09-05T09:09:02+00:00",
+ "source_file": "8-Reinforcement/1-QLearning/README.md",
+ "language_code": "zh-CN"
+ },
+ "8-Reinforcement/1-QLearning/assignment.md": {
+ "original_hash": "68394b2102d3503882e5e914bd0ff5c1",
+ "translation_date": "2025-09-03T18:37:16+00:00",
+ "source_file": "8-Reinforcement/1-QLearning/assignment.md",
+ "language_code": "zh-CN"
+ },
+ "8-Reinforcement/1-QLearning/solution/Julia/README.md": {
+ "original_hash": "a39c15d63f3b2795ee2284a82b986b93",
+ "translation_date": "2025-09-03T18:38:43+00:00",
+ "source_file": "8-Reinforcement/1-QLearning/solution/Julia/README.md",
+ "language_code": "zh-CN"
+ },
+ "8-Reinforcement/1-QLearning/solution/R/README.md": {
+ "original_hash": "81db6ff2cf6e62fbe2340b094bb9509e",
+ "translation_date": "2025-09-03T18:38:21+00:00",
+ "source_file": "8-Reinforcement/1-QLearning/solution/R/README.md",
+ "language_code": "zh-CN"
+ },
+ "8-Reinforcement/2-Gym/README.md": {
+ "original_hash": "107d5bb29da8a562e7ae72262d251a75",
+ "translation_date": "2025-09-05T09:09:36+00:00",
+ "source_file": "8-Reinforcement/2-Gym/README.md",
+ "language_code": "zh-CN"
+ },
+ "8-Reinforcement/2-Gym/assignment.md": {
+ "original_hash": "1f2b7441745eb52e25745423b247016b",
+ "translation_date": "2025-09-03T18:44:12+00:00",
+ "source_file": "8-Reinforcement/2-Gym/assignment.md",
+ "language_code": "zh-CN"
+ },
+ "8-Reinforcement/2-Gym/solution/Julia/README.md": {
+ "original_hash": "a39c15d63f3b2795ee2284a82b986b93",
+ "translation_date": "2025-09-03T18:45:14+00:00",
+ "source_file": "8-Reinforcement/2-Gym/solution/Julia/README.md",
+ "language_code": "zh-CN"
+ },
+ "8-Reinforcement/2-Gym/solution/R/README.md": {
+ "original_hash": "81db6ff2cf6e62fbe2340b094bb9509e",
+ "translation_date": "2025-09-03T18:44:53+00:00",
+ "source_file": "8-Reinforcement/2-Gym/solution/R/README.md",
+ "language_code": "zh-CN"
+ },
+ "8-Reinforcement/README.md": {
+ "original_hash": "20ca019012b1725de956681d036d8b18",
+ "translation_date": "2025-09-03T18:26:49+00:00",
+ "source_file": "8-Reinforcement/README.md",
+ "language_code": "zh-CN"
+ },
+ "9-Real-World/1-Applications/README.md": {
+ "original_hash": "83320d6b6994909e35d830cebf214039",
+ "translation_date": "2025-09-05T09:02:25+00:00",
+ "source_file": "9-Real-World/1-Applications/README.md",
+ "language_code": "zh-CN"
+ },
+ "9-Real-World/1-Applications/assignment.md": {
+ "original_hash": "fdebfcd0a3f12c9e2b436ded1aa79885",
+ "translation_date": "2025-09-03T17:27:15+00:00",
+ "source_file": "9-Real-World/1-Applications/assignment.md",
+ "language_code": "zh-CN"
+ },
+ "9-Real-World/2-Debugging-ML-Models/README.md": {
+ "original_hash": "df2b538e8fbb3e91cf0419ae2f858675",
+ "translation_date": "2025-09-05T09:03:02+00:00",
+ "source_file": "9-Real-World/2-Debugging-ML-Models/README.md",
+ "language_code": "zh-CN"
+ },
+ "9-Real-World/2-Debugging-ML-Models/assignment.md": {
+ "original_hash": "91c6a180ef08e20cc15acfd2d6d6e164",
+ "translation_date": "2025-09-03T17:33:11+00:00",
+ "source_file": "9-Real-World/2-Debugging-ML-Models/assignment.md",
+ "language_code": "zh-CN"
+ },
+ "9-Real-World/README.md": {
+ "original_hash": "5e069a0ac02a9606a69946c2b3c574a9",
+ "translation_date": "2025-09-03T17:22:09+00:00",
+ "source_file": "9-Real-World/README.md",
+ "language_code": "zh-CN"
+ },
+ "AGENTS.md": {
+ "original_hash": "93fdaa0fd38836e50c4793e2f2f25e8b",
+ "translation_date": "2025-10-03T10:59:57+00:00",
+ "source_file": "AGENTS.md",
+ "language_code": "zh-CN"
+ },
+ "CODE_OF_CONDUCT.md": {
+ "original_hash": "c06b12caf3c901eb3156e3dd5b0aea56",
+ "translation_date": "2025-09-03T16:16:28+00:00",
+ "source_file": "CODE_OF_CONDUCT.md",
+ "language_code": "zh-CN"
+ },
+ "CONTRIBUTING.md": {
+ "original_hash": "977ec5266dfd78ad1ce2bd8d46fccbda",
+ "translation_date": "2025-09-03T16:13:40+00:00",
+ "source_file": "CONTRIBUTING.md",
+ "language_code": "zh-CN"
+ },
+ "README.md": {
+ "original_hash": "da2ceed62f16a0820259556e3a873c95",
+ "translation_date": "2026-01-29T17:44:38+00:00",
+ "source_file": "README.md",
+ "language_code": "zh-CN"
+ },
+ "SECURITY.md": {
+ "original_hash": "5e1b8da31aae9cca3d53ad243fa3365a",
+ "translation_date": "2025-09-03T16:14:39+00:00",
+ "source_file": "SECURITY.md",
+ "language_code": "zh-CN"
+ },
+ "SUPPORT.md": {
+ "original_hash": "09623d7343ff1c26ff4f198c1b2d3176",
+ "translation_date": "2025-10-03T11:41:54+00:00",
+ "source_file": "SUPPORT.md",
+ "language_code": "zh-CN"
+ },
+ "TROUBLESHOOTING.md": {
+ "original_hash": "134d8759f0e2ab886e9aa4f62362c201",
+ "translation_date": "2025-10-03T12:38:25+00:00",
+ "source_file": "TROUBLESHOOTING.md",
+ "language_code": "zh-CN"
+ },
+ "docs/_sidebar.md": {
+ "original_hash": "68dd06c685f6ce840e0acfa313352e7c",
+ "translation_date": "2025-09-03T17:21:14+00:00",
+ "source_file": "docs/_sidebar.md",
+ "language_code": "zh-CN"
+ },
+ "for-teachers.md": {
+ "original_hash": "b37de02054fa6c0438ede6fabe1fdfb8",
+ "translation_date": "2025-09-03T16:15:43+00:00",
+ "source_file": "for-teachers.md",
+ "language_code": "zh-CN"
+ },
+ "quiz-app/README.md": {
+ "original_hash": "6d130dffca5db70d7e615f926cb1ad4c",
+ "translation_date": "2025-09-03T17:58:43+00:00",
+ "source_file": "quiz-app/README.md",
+ "language_code": "zh-CN"
+ },
+ "sketchnotes/LICENSE.md": {
+ "original_hash": "fba3b94d88bfb9b81369b869a1e9a20f",
+ "translation_date": "2025-09-03T18:22:05+00:00",
+ "source_file": "sketchnotes/LICENSE.md",
+ "language_code": "zh-CN"
+ },
+ "sketchnotes/README.md": {
+ "original_hash": "a88d5918c1b9da69a40d917a0840c497",
+ "translation_date": "2025-09-03T18:16:47+00:00",
+ "source_file": "sketchnotes/README.md",
+ "language_code": "zh-CN"
+ }
+}
\ No newline at end of file
diff --git a/translations/zh-CN/1-Introduction/1-intro-to-ML/README.md b/translations/zh-CN/1-Introduction/1-intro-to-ML/README.md
new file mode 100644
index 000000000..6c89ff76f
--- /dev/null
+++ b/translations/zh-CN/1-Introduction/1-intro-to-ML/README.md
@@ -0,0 +1,150 @@
+# 机器学习简介
+
+## [课前测验](https://ff-quizzes.netlify.app/en/ml/)
+
+---
+
+[](https://youtu.be/6mSx_KJxcHI "初学者的机器学习 - 机器学习入门")
+
+> 🎥 点击上方图片观看本课相关的短视频。
+
+欢迎来到这门面向初学者的经典机器学习课程!无论你是完全的新手,还是希望温习某些领域的资深机器学习从业者,我们都很高兴你能加入!我们希望为你的机器学习之旅提供一个友好的起点,并欢迎你提供[反馈](https://github.com/microsoft/ML-For-Beginners/discussions),我们会评估、回应并采纳你的建议。
+
+[](https://youtu.be/h0e2HAPTGF4 "机器学习简介")
+
+> 🎥 点击上方图片观看视频:麻省理工学院的 John Guttag 介绍机器学习
+
+---
+## 开始学习机器学习
+
+在开始学习本课程之前,你需要确保你的电脑已经设置好并可以本地运行笔记本。
+
+- **通过以下视频配置你的电脑**。使用以下链接学习[如何安装 Python](https://youtu.be/CXZYvNRIAKM)以及[设置文本编辑器](https://youtu.be/EU8eayHWoZg)进行开发。
+- **学习 Python**。建议你对[Python](https://docs.microsoft.com/learn/paths/python-language/?WT.mc_id=academic-77952-leestott)有基本的了解,这是一种对数据科学家非常有用的编程语言,我们将在课程中使用它。
+- **学习 Node.js 和 JavaScript**。我们在课程中会使用 JavaScript 构建一些网页应用,因此你需要安装 [node](https://nodejs.org) 和 [npm](https://www.npmjs.com/),以及为 Python 和 JavaScript 开发准备好 [Visual Studio Code](https://code.visualstudio.com/)。
+- **创建 GitHub 账户**。既然你在 [GitHub](https://github.com) 找到了我们,你可能已经有一个账户了,但如果没有,请创建一个账户,然后 fork 本课程以供自己使用。(也可以给我们点个星星 😊)
+- **探索 Scikit-learn**。熟悉 [Scikit-learn](https://scikit-learn.org/stable/user_guide.html),这是我们在课程中参考的一组机器学习库。
+
+---
+## 什么是机器学习?
+
+“机器学习”是当今最流行和最常用的术语之一。如果你对技术有一定的了解,无论你从事哪个领域,都有很大可能至少听过一次这个术语。然而,机器学习的运作机制对大多数人来说仍然是一个谜。对于机器学习初学者来说,这个主题有时可能会让人感到不知所措。因此,了解机器学习的真正含义,并通过实际例子一步步学习它是非常重要的。
+
+---
+## 热度曲线
+
+
+
+> Google Trends 显示了“机器学习”这一术语的近期热度曲线
+
+---
+## 神秘的宇宙
+
+我们生活在一个充满迷人奥秘的宇宙中。像斯蒂芬·霍金、阿尔伯特·爱因斯坦等伟大的科学家们,毕生致力于寻找有意义的信息,以揭示我们周围世界的奥秘。这是人类学习的本质:一个孩子通过逐年成长,学习新事物并揭示其世界的结构。
+
+---
+## 孩子的大脑
+
+孩子的大脑和感官感知周围环境中的事实,并逐渐学习生活中隐藏的模式,这些模式帮助孩子制定逻辑规则以识别已学到的模式。人类大脑的学习过程使人类成为这个世界上最复杂的生物。通过发现隐藏模式并在此基础上不断创新,我们能够在一生中不断提升自己。这种学习和进化的能力与一个叫做[脑可塑性](https://www.simplypsychology.org/brain-plasticity.html)的概念有关。从表面上看,人类大脑的学习过程与机器学习的概念之间存在一些富有启发性的相似之处。
+
+---
+## 人类大脑
+
+[人类大脑](https://www.livescience.com/29365-human-brain.html)从现实世界中感知事物,处理感知到的信息,做出理性决策,并根据情况采取某些行动。这就是我们所说的智能行为。当我们将智能行为过程的模拟编程到机器中时,这就被称为人工智能(AI)。
+
+---
+## 一些术语
+
+尽管这些术语容易混淆,但机器学习(ML)是人工智能的一个重要子集。**机器学习关注的是使用专门的算法从感知到的数据中发现有意义的信息和隐藏模式,以支持理性决策过程**。
+
+---
+## AI、ML、深度学习
+
+
+
+> 一张展示 AI、ML、深度学习和数据科学之间关系的图表。信息图由 [Jen Looper](https://twitter.com/jenlooper) 制作,灵感来源于[这张图](https://softwareengineering.stackexchange.com/questions/366996/distinction-between-ai-ml-neural-networks-deep-learning-and-data-mining)
+
+---
+## 涵盖的概念
+
+在本课程中,我们将仅涵盖机器学习的核心概念,这些是初学者必须了解的内容。我们主要使用 Scikit-learn 来讲解我们称之为“经典机器学习”的内容,它是一个许多学生用来学习基础知识的优秀库。要理解人工智能或深度学习的更广泛概念,扎实的机器学习基础知识是不可或缺的,因此我们希望在这里提供这些知识。
+
+---
+## 在本课程中你将学习:
+
+- 机器学习的核心概念
+- 机器学习的历史
+- 机器学习与公平性
+- 回归机器学习技术
+- 分类机器学习技术
+- 聚类机器学习技术
+- 自然语言处理机器学习技术
+- 时间序列预测机器学习技术
+- 强化学习
+- 机器学习的实际应用
+
+---
+## 我们不会涵盖的内容
+
+- 深度学习
+- 神经网络
+- 人工智能
+
+为了提供更好的学习体验,我们将避免涉及神经网络的复杂性、“深度学习”(使用神经网络构建多层模型)以及人工智能,这些内容将在另一门课程中讨论。我们还将提供即将推出的数据科学课程,以专注于这一更广泛领域的相关内容。
+
+---
+## 为什么学习机器学习?
+
+从系统的角度来看,机器学习被定义为创建能够从数据中学习隐藏模式以帮助做出智能决策的自动化系统。
+
+这种动机在一定程度上受到人类大脑如何根据外界感知的数据学习某些事物的启发。
+
+✅ 思考一下,为什么企业会选择使用机器学习策略,而不是创建一个基于硬编码规则的引擎?
+
+---
+## 机器学习的应用
+
+机器学习的应用几乎无处不在,就像我们社会中流动的数据一样,这些数据由智能手机、连接设备和其他系统生成。考虑到最先进的机器学习算法的巨大潜力,研究人员一直在探索其解决多维度和多学科现实问题的能力,并取得了非常积极的成果。
+
+---
+## 应用机器学习的例子
+
+**机器学习有许多用途**:
+
+- 根据患者的病史或报告预测疾病的可能性。
+- 利用天气数据预测天气事件。
+- 理解文本的情感。
+- 检测虚假新闻以阻止宣传的传播。
+
+金融、经济、地球科学、太空探索、生物医学工程、认知科学,甚至人文学科都已经采用机器学习来解决各自领域中繁重的数据处理问题。
+
+---
+## 结论
+
+机器学习通过从现实世界或生成的数据中发现有意义的洞察,使模式发现的过程实现自动化。它已在商业、健康和金融等领域证明了自身的巨大价值。
+
+在不久的将来,由于机器学习的广泛应用,了解机器学习的基础知识将成为任何领域人士的必备技能。
+
+---
+# 🚀 挑战
+
+用纸或在线应用(如 [Excalidraw](https://excalidraw.com/))绘制你对 AI、ML、深度学习和数据科学之间差异的理解。添加一些关于每种技术擅长解决的问题的想法。
+
+# [课后测验](https://ff-quizzes.netlify.app/en/ml/)
+
+---
+# 复习与自学
+
+要了解如何在云端使用机器学习算法,请参考此[学习路径](https://docs.microsoft.com/learn/paths/create-no-code-predictive-models-azure-machine-learning/?WT.mc_id=academic-77952-leestott)。
+
+学习机器学习基础知识,请参考此[学习路径](https://docs.microsoft.com/learn/modules/introduction-to-machine-learning/?WT.mc_id=academic-77952-leestott)。
+
+---
+# 作业
+
+[开始学习](assignment.md)
+
+---
+
+**免责声明**:
+本文档使用AI翻译服务[Co-op Translator](https://github.com/Azure/co-op-translator)进行翻译。尽管我们努力确保准确性,但请注意,自动翻译可能包含错误或不准确之处。应以原始语言的文档作为权威来源。对于关键信息,建议使用专业人工翻译。因使用本翻译而导致的任何误解或误读,我们概不负责。
\ No newline at end of file
diff --git a/translations/zh-CN/1-Introduction/1-intro-to-ML/assignment.md b/translations/zh-CN/1-Introduction/1-intro-to-ML/assignment.md
new file mode 100644
index 000000000..f3c95cb63
--- /dev/null
+++ b/translations/zh-CN/1-Introduction/1-intro-to-ML/assignment.md
@@ -0,0 +1,14 @@
+# 快速开始
+
+## 说明
+
+在这个非评分的任务中,你需要复习 Python 并设置好你的环境,以便能够运行笔记本。
+
+请学习这个 [Python 学习路径](https://docs.microsoft.com/learn/paths/python-language/?WT.mc_id=academic-77952-leestott),然后通过以下入门视频设置你的系统:
+
+https://www.youtube.com/playlist?list=PLlrxD0HtieHhS8VzuMCfQD4uJ9yne1mE6
+
+---
+
+**免责声明**:
+本文档使用AI翻译服务 [Co-op Translator](https://github.com/Azure/co-op-translator) 进行翻译。尽管我们努力确保翻译的准确性,但请注意,自动翻译可能包含错误或不准确之处。原始语言的文档应被视为权威来源。对于关键信息,建议使用专业人工翻译。我们不对因使用此翻译而产生的任何误解或误读承担责任。
\ No newline at end of file
diff --git a/translations/zh-CN/1-Introduction/2-history-of-ML/README.md b/translations/zh-CN/1-Introduction/2-history-of-ML/README.md
new file mode 100644
index 000000000..566741eb0
--- /dev/null
+++ b/translations/zh-CN/1-Introduction/2-history-of-ML/README.md
@@ -0,0 +1,155 @@
+# 机器学习的历史
+
+
+> 草图由 [Tomomi Imura](https://www.twitter.com/girlie_mac) 绘制
+
+## [课前测验](https://ff-quizzes.netlify.app/en/ml/)
+
+---
+
+[](https://youtu.be/N6wxM4wZ7V0 "机器学习初学者 - 机器学习的历史")
+
+> 🎥 点击上方图片观看本课的简短视频。
+
+在本课中,我们将回顾机器学习和人工智能历史上的重要里程碑。
+
+人工智能(AI)作为一个领域的历史与机器学习的历史密不可分,因为支撑机器学习的算法和计算进步也推动了人工智能的发展。需要注意的是,尽管这些领域作为独立的研究方向在20世纪50年代开始成型,但重要的[算法、统计、数学、计算和技术发现](https://wikipedia.org/wiki/Timeline_of_machine_learning)早在这一时期之前就已经出现并有所交集。事实上,人们已经思考这些问题[数百年](https://wikipedia.org/wiki/History_of_artificial_intelligence):这篇文章探讨了“会思考的机器”这一理念的历史性思想基础。
+
+---
+## 重要发现
+
+- 1763年, 1812年 [贝叶斯定理](https://wikipedia.org/wiki/Bayes%27_theorem)及其前身。这一定理及其应用奠定了推断的基础,描述了基于先验知识事件发生的概率。
+- 1805年 [最小二乘法](https://wikipedia.org/wiki/Least_squares),由法国数学家Adrien-Marie Legendre提出。这一理论(你将在回归单元中学习)有助于数据拟合。
+- 1913年 [马尔可夫链](https://wikipedia.org/wiki/Markov_chain),以俄罗斯数学家Andrey Markov命名,用于描述基于前一状态的一系列可能事件。
+- 1957年 [感知机](https://wikipedia.org/wiki/Perceptron),一种由美国心理学家Frank Rosenblatt发明的线性分类器,是深度学习进步的基础。
+
+---
+
+- 1967年 [最近邻算法](https://wikipedia.org/wiki/Nearest_neighbor),最初设计用于路径规划。在机器学习中,它被用来检测模式。
+- 1970年 [反向传播算法](https://wikipedia.org/wiki/Backpropagation),用于训练[前馈神经网络](https://wikipedia.org/wiki/Feedforward_neural_network)。
+- 1982年 [循环神经网络](https://wikipedia.org/wiki/Recurrent_neural_network),从前馈神经网络衍生而来,用于创建时间序列图。
+
+✅ 做一些研究。还有哪些年份在机器学习和人工智能的历史上具有重要意义?
+
+---
+## 1950年:会思考的机器
+
+艾伦·图灵(Alan Turing)是一位真正杰出的人物,他在[2019年被公众评选](https://wikipedia.org/wiki/Icons:_The_Greatest_Person_of_the_20th_Century)为20世纪最伟大的科学家。他被认为奠定了“会思考的机器”这一概念的基础。他通过创建[图灵测试](https://www.bbc.com/news/technology-18475646)来应对质疑者以及自己对这一概念的实证需求,你将在自然语言处理课程中进一步探索这一测试。
+
+---
+## 1956年:达特茅斯夏季研究项目
+
+“达特茅斯夏季人工智能研究项目是人工智能领域的一个开创性事件”,在这里,“人工智能”这一术语首次被提出([来源](https://250.dartmouth.edu/highlights/artificial-intelligence-ai-coined-dartmouth))。
+
+> 学习的每一个方面或智能的任何其他特征原则上都可以被如此精确地描述,以至于可以制造出模拟它的机器。
+
+---
+
+该项目的首席研究员、数学教授John McCarthy希望“在这样的假设基础上开展研究:学习的每一个方面或智能的任何其他特征原则上都可以被如此精确地描述,以至于可以制造出模拟它的机器。”参与者中还包括该领域的另一位杰出人物Marvin Minsky。
+
+该研讨会被认为启动并推动了多项讨论,包括“符号方法的兴起、专注于有限领域的系统(早期专家系统)以及演绎系统与归纳系统的对比”([来源](https://wikipedia.org/wiki/Dartmouth_workshop))。
+
+---
+## 1956 - 1974年:“黄金时代”
+
+从20世纪50年代到70年代中期,人们对人工智能能够解决许多问题充满乐观。1967年,Marvin Minsky自信地表示:“在一代人的时间内……创造‘人工智能’的问题将基本解决。”(Minsky, Marvin (1967), Computation: Finite and Infinite Machines, Englewood Cliffs, N.J.: Prentice-Hall)
+
+自然语言处理研究蓬勃发展,搜索技术得到了改进并变得更强大,“微观世界”的概念被提出,在这种环境下,简单任务可以通过简单的语言指令完成。
+
+---
+
+政府机构为研究提供了充足的资金,计算和算法取得了进展,智能机器的原型被制造出来。这些机器包括:
+
+* [Shakey机器人](https://wikipedia.org/wiki/Shakey_the_robot),它能够“智能地”移动并决定如何执行任务。
+
+ 
+ > 1972年的Shakey
+
+---
+
+* Eliza,一个早期的“聊天机器人”,能够与人对话并充当一个原始的“治疗师”。你将在自然语言处理课程中进一步了解Eliza。
+
+ 
+ > Eliza的一个版本,一个聊天机器人
+
+---
+
+* “积木世界”是一个微观世界的例子,在这里积木可以被堆叠和排序,并可以测试教机器做决策的实验。借助诸如[SHRDLU](https://wikipedia.org/wiki/SHRDLU)之类的库所取得的进展推动了语言处理的发展。
+
+ [](https://www.youtube.com/watch?v=QAJz4YKUwqw "SHRDLU的积木世界")
+
+ > 🎥 点击上方图片观看视频:SHRDLU的积木世界
+
+---
+## 1974 - 1980年:“人工智能寒冬”
+
+到70年代中期,制造“智能机器”的复杂性被低估的事实变得显而易见,而其承诺在当时的计算能力下被过度夸大。资金枯竭,领域信心减弱。影响信心的一些问题包括:
+---
+- **局限性**。计算能力过于有限。
+- **组合爆炸**。随着对计算机要求的增加,需要训练的参数数量呈指数增长,而计算能力和性能却没有相应提升。
+- **数据匮乏**。数据的匮乏阻碍了算法的测试、开发和优化过程。
+- **我们是否在问正确的问题?**。研究者开始质疑他们提出的问题:
+ - 图灵测试因“中文房间理论”等观点受到质疑,该理论认为,“编程数字计算机可能使其看似理解语言,但无法产生真正的理解。”([来源](https://plato.stanford.edu/entries/chinese-room/))
+ - 将人工智能(如“治疗师”ELIZA)引入社会的伦理问题受到挑战。
+
+---
+
+与此同时,各种人工智能学派开始形成。“[粗放派](https://wikipedia.org/wiki/Neats_and_scruffies)”与“精确派”实践之间的二分法逐渐确立。_粗放派_实验室通过不断调整程序以获得所需结果,而_精确派_实验室则“专注于逻辑和形式化问题解决”。ELIZA和SHRDLU是著名的_粗放派_系统。到了80年代,随着对机器学习系统可重复性的需求增加,_精确派_方法逐渐占据主导地位,因为其结果更具可解释性。
+
+---
+## 1980年代 专家系统
+
+随着领域的发展,其对商业的益处变得更加明显,1980年代“专家系统”的普及也随之而来。“专家系统是最早真正成功的人工智能(AI)软件形式之一。”([来源](https://wikipedia.org/wiki/Expert_system))
+
+这种系统实际上是_混合型_的,部分由定义业务需求的规则引擎组成,部分由利用规则系统推导新事实的推理引擎组成。
+
+这一时期还出现了对神经网络的日益关注。
+
+---
+## 1987 - 1993年:人工智能“冷却期”
+
+专用专家系统硬件的普及不幸导致其过于专用化。个人计算机的兴起也与这些大型、专用、集中化的系统形成了竞争。计算的民主化开始了,并最终为现代大数据的爆发铺平了道路。
+
+---
+## 1993 - 2011年
+
+这一时期见证了机器学习和人工智能能够解决早期因数据和计算能力不足而导致的问题。数据量开始迅速增加并变得更易获取,无论是好是坏,尤其是在2007年左右智能手机的出现之后。计算能力呈指数级增长,算法也随之演进。随着过去自由发展的日子逐渐凝聚成一个真正的学科,这一领域开始走向成熟。
+
+---
+## 现在
+
+如今,机器学习和人工智能几乎触及我们生活的每一个部分。这一时代需要我们仔细理解这些算法对人类生活的风险和潜在影响。正如微软的Brad Smith所说:“信息技术提出了一些问题,这些问题触及了隐私和言论自由等基本人权保护的核心。这些问题加重了创造这些产品的科技公司的责任。在我们看来,这也需要深思熟虑的政府监管以及围绕可接受用途的规范发展。”([来源](https://www.technologyreview.com/2019/12/18/102365/the-future-of-ais-impact-on-society/))
+
+---
+
+未来会如何发展仍未可知,但理解这些计算机系统及其运行的软件和算法是非常重要的。我们希望这门课程能帮助你更好地理解这些内容,从而让你自己做出判断。
+
+[](https://www.youtube.com/watch?v=mTtDfKgLm54 "深度学习的历史")
+> 🎥 点击上方图片观看视频:Yann LeCun在这次讲座中讨论了深度学习的历史
+
+---
+## 🚀挑战
+
+深入研究这些历史时刻中的一个,了解背后的人物。这些人物非常有趣,没有任何科学发现是在文化真空中产生的。你发现了什么?
+
+## [课后测验](https://ff-quizzes.netlify.app/en/ml/)
+
+---
+## 复习与自学
+
+以下是一些可以观看和收听的内容:
+
+[这期Amy Boyd讨论人工智能演变的播客](http://runasradio.com/Shows/Show/739)
+
+[](https://www.youtube.com/watch?v=EJt3_bFYKss "Amy Boyd讲述人工智能的历史")
+
+---
+
+## 作业
+
+[创建时间轴](assignment.md)
+
+---
+
+**免责声明**:
+本文档使用AI翻译服务 [Co-op Translator](https://github.com/Azure/co-op-translator) 进行翻译。尽管我们努力确保翻译的准确性,但请注意,自动翻译可能包含错误或不准确之处。原始语言的文档应被视为权威来源。对于关键信息,建议使用专业人工翻译。我们不对因使用此翻译而产生的任何误解或误读承担责任。
\ No newline at end of file
diff --git a/translations/zh-CN/1-Introduction/2-history-of-ML/assignment.md b/translations/zh-CN/1-Introduction/2-history-of-ML/assignment.md
new file mode 100644
index 000000000..86b208eca
--- /dev/null
+++ b/translations/zh-CN/1-Introduction/2-history-of-ML/assignment.md
@@ -0,0 +1,16 @@
+# 创建时间轴
+
+## 说明
+
+使用[这个仓库](https://github.com/Digital-Humanities-Toolkit/timeline-builder),创建一个关于算法、数学、统计学、人工智能或机器学习历史某一方面的时间轴,或者结合这些主题。你可以专注于一个人、一个想法,或者一个长时间跨度的思想发展。确保添加多媒体元素。
+
+## 评分标准
+
+| 标准 | 卓越表现 | 合格表现 | 需要改进 |
+| -------- | ------------------------------------------------- | --------------------------------------- | ---------------------------------------------------------------- |
+| | 时间轴已部署为一个GitHub页面 | 代码不完整且未部署 | 时间轴不完整,研究不充分且未部署 |
+
+---
+
+**免责声明**:
+本文档使用AI翻译服务[Co-op Translator](https://github.com/Azure/co-op-translator)进行翻译。尽管我们努力确保翻译的准确性,但请注意,自动翻译可能包含错误或不准确之处。原始语言的文档应被视为权威来源。对于关键信息,建议使用专业人工翻译。我们不对因使用此翻译而产生的任何误解或误读承担责任。
\ No newline at end of file
diff --git a/translations/zh-CN/1-Introduction/3-fairness/README.md b/translations/zh-CN/1-Introduction/3-fairness/README.md
new file mode 100644
index 000000000..d1c37b637
--- /dev/null
+++ b/translations/zh-CN/1-Introduction/3-fairness/README.md
@@ -0,0 +1,161 @@
+# 构建负责任的人工智能的机器学习解决方案
+
+
+> 由 [Tomomi Imura](https://www.twitter.com/girlie_mac) 绘制的概要图
+
+## [课前测验](https://ff-quizzes.netlify.app/en/ml/)
+
+## 简介
+
+在本课程中,您将开始了解机器学习能够以及正在如何影响我们的日常生活。即使是现在,系统和模型已经参与了日常决策任务,例如医疗诊断、贷款审批或欺诈检测。因此,确保这些模型能够提供值得信赖的结果非常重要。与任何软件应用程序一样,人工智能系统也可能达不到预期或产生不理想的结果。这就是为什么理解和解释人工智能模型的行为至关重要。
+
+想象一下,当您用于构建这些模型的数据缺少某些人口群体(例如按种族、性别、政治观点、宗教划分的群体),或者某些群体在数据中占比过高时,会发生什么?如果模型的输出被解读为偏向某些群体,又会有什么后果?此外,当模型产生不良结果并对人们造成伤害时会发生什么?谁应该对人工智能系统的行为负责?这些是我们将在本课程中探讨的一些问题。
+
+在本课中,您将:
+
+- 提高对机器学习公平性及相关危害重要性的认识。
+- 熟悉探索异常值和特殊场景以确保可靠性和安全性的实践。
+- 了解设计包容性系统以赋能所有人的必要性。
+- 探讨保护数据和个人隐私与安全的重要性。
+- 认识到采用透明化方法解释人工智能模型行为的重要性。
+- 意识到责任感对于建立人工智能系统信任的重要性。
+
+## 前提条件
+
+作为前提条件,请完成“负责任人工智能原则”学习路径并观看以下视频:
+
+通过以下 [学习路径](https://docs.microsoft.com/learn/modules/responsible-ai-principles/?WT.mc_id=academic-77952-leestott) 了解更多关于负责任人工智能的信息。
+
+[](https://youtu.be/dnC8-uUZXSc "微软的负责任人工智能方法")
+
+> 🎥 点击上方图片观看视频:微软的负责任人工智能方法
+
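+## Fairness
+
+AI systems should treat everyone fairly and avoid affecting similar groups of people in different ways. For example, when AI systems provide guidance on medical treatment, loan applications, or employment, they should make the same recommendations to everyone with similar symptoms, financial circumstances, or professional qualifications. Each of us carries inherited biases that affect our decisions and actions, and these biases can show up in the data we use to train AI systems. This can happen unintentionally; it is often difficult to recognize when you are introducing bias into data.
+
+**"Unfairness"** encompasses negative impacts, or "harms", for a group of people, such as groups defined by race, gender, age, or disability status. The main fairness-related harms can be classified as:
+
+- **Allocation**: for example, when one gender or ethnicity is favored over another.
+- **Quality of service**: if you train a model on data from one specific scenario while reality is much more complex, the service performs poorly. For instance, a hand soap dispenser that could not sense people with dark skin. [Reference](https://gizmodo.com/why-cant-this-soap-dispenser-identify-dark-skin-1797931773)
+- **Denigration**: unfairly criticizing or labeling something or someone. For example, an image labeling technology infamously mislabeled images of dark-skinned people as gorillas.
+- **Over- or under-representation**: a certain group is not seen in a certain profession, and any service or function that keeps promoting this contributes to harm.
+- **Stereotyping**: associating a given group with pre-assigned attributes. For example, a language translation system between English and Turkish may produce inaccuracies because of words with stereotypical associations to gender.
+
+
+> Translation to Turkish
+
+
+> Translation back to English
+
+When designing and testing AI systems, we need to ensure that AI is fair and is not programmed to make biased or discriminatory decisions, which human beings are also prohibited from making. Guaranteeing fairness in AI and machine learning remains a complex sociotechnical challenge.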
+
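+### Reliability and safety
+
+To build trust, AI systems need to be reliable, safe, and consistent under both normal and unexpected conditions. It is important to know how AI systems behave in a variety of situations, especially on outliers. When building AI solutions, a substantial amount of attention must go to handling the wide variety of circumstances the solution might encounter. For example, a self-driving car needs to put people's safety first. The AI powering the car therefore needs to consider all the scenarios the car could come across, such as night, thunderstorms or blizzards, children running across the street, pets, and road construction. How reliably and safely an AI system handles a wide range of conditions reflects how much the data scientist or AI developer anticipated during the design and testing of the system.
+
+> [🎥 Click here for a video:](https://www.microsoft.com/videoplayer/embed/RE4vvIl)
+
+### Inclusiveness
+
+AI systems should be designed to engage and empower everyone. When designing and implementing AI systems, data scientists and AI developers need to identify and address potential barriers that could unintentionally exclude people. For example, there are 1 billion people with disabilities around the world. With the advancement of AI, they can access a wide range of information and opportunities more easily. Addressing these barriers creates opportunities to innovate and to develop AI products with better experiences that benefit everyone.
+
+> [🎥 Click here for a video: inclusiveness in AI](https://www.microsoft.com/videoplayer/embed/RE4vl9v)
+
+### Security and privacy
+
+AI systems should be safe and respect people's privacy. People have less trust in systems that put their privacy, information, or lives at risk. When training machine learning models, we rely on data to produce the best results. In doing so, the origin and integrity of the data must be considered. For example, was the data user-submitted or publicly available? Next, while working with the data, it is crucial to develop AI systems that can protect confidential information and resist attacks. As AI becomes more prevalent, protecting privacy and securing important personal and business information is becoming both more critical and more complex. Privacy and data security require especially close attention for AI, because access to data is essential for AI systems to make accurate and informed predictions and decisions about people.
+
+> [🎥 Click here for a video: security in AI](https://www.microsoft.com/videoplayer/embed/RE4voJF)
+
+- As an industry we have made significant advancements in privacy and security, fueled significantly by regulations like the GDPR (General Data Protection Regulation).
+- With AI systems, however, we must acknowledge the tension between privacy and the need for more personal data to make systems more personal and effective.
+- Just as with the birth of internet-connected computers, we are seeing a huge uptick in the number of security issues related to AI.
+- At the same time, we have seen AI being used to improve security. For example, most modern anti-virus scanners are driven by AI heuristics today.
+- We need to ensure that our data science processes blend harmoniously with the latest privacy and security practices.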
+
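+### Transparency
+
+AI systems should be understandable. A crucial part of transparency is explaining the behavior of AI systems and their components. Improving the understanding of AI systems requires that stakeholders comprehend how and why they function, so that they can identify potential performance issues, safety and privacy concerns, biases, exclusionary practices, or unintended outcomes. We also believe that those who use AI systems should be honest and forthcoming about when, why, and how they choose to deploy them, as well as about the limitations of the systems they use. For example, if a bank uses an AI system to support its consumer lending decisions, it is important to examine the outcomes and understand which data influences the system's recommendations. Governments are starting to regulate AI across industries, so data scientists and organizations must be able to explain whether an AI system meets regulatory requirements, especially when there is an undesirable outcome.
+
+> [🎥 Click here for a video: transparency in AI](https://www.microsoft.com/videoplayer/embed/RE4voJF)
+
+- Because AI systems are so complex, it is hard to understand how they work and to interpret their results.
+- This lack of understanding affects the way these systems are managed, operationalized, and documented.
+- More importantly, this lack of understanding affects the decisions made using the results these systems produce.
+
+### Accountability
+
+The people who design and deploy AI systems must be accountable for how their systems operate. Accountability is particularly crucial for sensitive-use technologies such as facial recognition. Recently, there has been a growing demand for facial recognition technology, especially from law enforcement organizations that see its potential in uses like finding missing children. However, these technologies could be used by a government to put citizens' fundamental freedoms at risk, for example by enabling continuous surveillance of specific individuals. Data scientists and organizations therefore need to be responsible for how their AI systems affect individuals and society.
+
+[](https://www.youtube.com/watch?v=Wldt8P5V6D0 "Microsoft's Approach to Responsible AI")
+
+> 🎥 Click the image above for a video: warnings of mass surveillance through facial recognition
+
+Ultimately, one of the biggest questions for our generation, as the first generation bringing AI to society, is how to ensure that computers remain accountable to people, and how to ensure that the people who design computers remain accountable to everyone else.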
+
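+## Impact assessment
+
+Before training a machine learning model, it is important to conduct an impact assessment to understand the purpose of the AI system, its intended use, where it will be deployed, and who will interact with it. These assessments help reviewers or testers know which factors to take into consideration when identifying potential risks and expected consequences.
+
+The following are areas of focus when conducting an impact assessment:
+
+* **Adverse impact on individuals**: being aware of any restrictions or requirements, unsupported uses, or known limitations that hinder the system's performance is vital to ensure that the system is not used in a way that could harm individuals.
+* **Data requirements**: understanding how and where the system will use data enables reviewers to explore any data requirements you need to be mindful of (for example, GDPR or HIPAA data regulations). In addition, examine whether the source and quantity of data are sufficient for training.
+* **Summary of impact**: gather a list of potential harms that could arise from using the system. Throughout the ML lifecycle, review whether the identified issues have been mitigated or addressed.
+* **Applicable goals for each of the six core principles**: assess whether the goals of each principle are met and whether there are any gaps.
+
+## Debugging with responsible AI
+
+Similar to debugging a software application, debugging an AI system is a necessary process of identifying and resolving issues in the system. Many factors can cause a model not to perform as expected or responsibly. Most traditional model performance metrics are quantitative aggregates of a model's performance, which are not sufficient for analyzing how a model violates responsible AI principles. Furthermore, a machine learning model is a black box, which makes it difficult to understand what drives its outcomes or to provide an explanation when it makes a mistake. Later in this course, we will learn how to use the Responsible AI dashboard to help debug AI systems. The dashboard gives data scientists and AI developers a holistic tool for:
+
+* **Error analysis**: identifying the error distribution of the model, which can affect the system's fairness or reliability.
+* **Model overview**: discovering disparities in the model's performance across data cohorts.
+* **Data analysis**: understanding the data distribution and identifying potential bias in the data that could lead to fairness, inclusiveness, and reliability issues.
+* **Model interpretability**: understanding what affects or drives the model's predictions. This helps explain the model's behavior, which is important for transparency and accountability.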
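+
+As a rough illustration of the "model overview" idea, the sketch below compares an aggregate accuracy with per-cohort accuracy in plain Python. It does not use the Responsible AI dashboard; the helper name `accuracy_by_cohort`, the cohort names, and all the numbers are made up for this example.
+
+```python
+from collections import defaultdict
+
+
+def accuracy_by_cohort(y_true, y_pred, cohorts):
+    """Return {cohort: accuracy} for parallel lists of labels, predictions, and cohort names."""
+    correct = defaultdict(int)
+    total = defaultdict(int)
+    for truth, pred, cohort in zip(y_true, y_pred, cohorts):
+        total[cohort] += 1
+        correct[cohort] += int(truth == pred)
+    return {cohort: correct[cohort] / total[cohort] for cohort in total}
+
+
+# Hypothetical labels, predictions, and cohort membership (illustration only).
+y_true = [1, 0, 1, 1, 0, 1, 0, 0]
+y_pred = [1, 0, 1, 1, 0, 0, 1, 0]
+cohorts = ["A", "A", "A", "A", "B", "B", "B", "B"]
+
+overall = sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)
+per_cohort = accuracy_by_cohort(y_true, y_pred, cohorts)
+
+print("overall accuracy:", overall)
+print("accuracy per cohort:", per_cohort)
+```
+
+A single aggregate number can hide the fact that one cohort is served much worse than another, which is why the per-cohort breakdown is worth inspecting.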
+
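+## 🚀 Challenge
+
+To prevent harms from being introduced in the first place, we should:
+
+- have a diversity of backgrounds and perspectives among the people working on systems
+- invest in datasets that reflect the diversity of our society
+- develop better methods throughout the machine learning lifecycle for detecting and correcting responsible AI issues when they occur
+
+Think about real-life scenarios where a model's untrustworthiness is evident in how it is built and used. What else should we consider?
+
+## [Post-lecture quiz](https://ff-quizzes.netlify.app/en/ml/)
+
+## Review & Self Study
+
+In this lesson, you have learned some basics of the concepts of fairness and unfairness in machine learning.
+Watch this workshop to dive deeper into the topics:
+
+- In pursuit of responsible AI: Bringing principles to practice, by Besmira Nushi, Mehrnoosh Sameki, and Amit Sharma
+
+[](https://www.youtube.com/watch?v=tGgJCrA-MZU "RAI Toolbox: An open-source framework for building responsible AI")
+
+> 🎥 Click the image above for a video: RAI Toolbox: An open-source framework for building responsible AI, by Besmira Nushi, Mehrnoosh Sameki, and Amit Sharma
+
+Also, read:
+
+- Microsoft's RAI resource center: [Responsible AI Resources – Microsoft AI](https://www.microsoft.com/ai/responsible-ai-resources?activetab=pivot1%3aprimaryr4)
+
+- Microsoft's FATE research group: [FATE: Fairness, Accountability, Transparency, and Ethics in AI - Microsoft Research](https://www.microsoft.com/research/theme/fate/)
+
+RAI Toolbox:
+
+- [Responsible AI Toolbox GitHub repository](https://github.com/microsoft/responsible-ai-toolbox)
+
+Read about Azure Machine Learning's tools for ensuring fairness:
+
+- [Azure Machine Learning](https://docs.microsoft.com/azure/machine-learning/concept-fairness-ml?WT.mc_id=academic-77952-leestott)
+
+## Assignment
+
+[Explore the RAI Toolbox](assignment.md)
+
+---
+
+**Disclaimer**:
+This document has been translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The original document in its native language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.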
\ No newline at end of file
diff --git a/translations/zh-CN/1-Introduction/3-fairness/assignment.md b/translations/zh-CN/1-Introduction/3-fairness/assignment.md
new file mode 100644
index 000000000..47bffface
--- /dev/null
+++ b/translations/zh-CN/1-Introduction/3-fairness/assignment.md
@@ -0,0 +1,16 @@
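+# Explore the Responsible AI Toolbox
+
+## Instructions
+
+In this lesson you learned about the Responsible AI Toolbox, an "open-source, community-driven project to help data scientists to analyze and improve AI systems." For this assignment, explore one of the RAI Toolbox [notebooks](https://github.com/microsoft/responsible-ai-toolbox/blob/main/notebooks/responsibleaidashboard/getting-started.ipynb) and report your findings in a paper or presentation.
+
+## Rubric
+
+| Criteria | Exemplary | Adequate | Needs Improvement |
+| -------- | --------- | -------- | ----------------- |
+|          | A paper or PowerPoint presentation is presented discussing Fairlearn's systems, the notebook that was run, and the conclusions drawn from running it | A paper is presented without conclusions | No paper is presented |
+
+---
+
+**Disclaimer**:
+This document has been translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The original document in its native language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.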
\ No newline at end of file
diff --git a/translations/zh-CN/1-Introduction/4-techniques-of-ML/README.md b/translations/zh-CN/1-Introduction/4-techniques-of-ML/README.md
new file mode 100644
index 000000000..181e5bf20
--- /dev/null
+++ b/translations/zh-CN/1-Introduction/4-techniques-of-ML/README.md
@@ -0,0 +1,123 @@
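+# Techniques of Machine Learning
+
+The process of building, using, and maintaining machine learning models and the data they use is very different from many other development workflows. In this lesson, we will demystify the process and outline the main techniques you need to know. You will:
+
+- Understand the processes underpinning machine learning at a high level.
+- Explore base concepts such as "models", "predictions", and "training data".
+
+## [Pre-lecture quiz](https://ff-quizzes.netlify.app/en/ml/)
+
+[](https://youtu.be/4NGM0U2ZSHU "ML for beginners - Techniques of Machine Learning")
+
+> 🎥 Click the image above for a short video working through this lesson.
+
+## Introduction
+
+At a high level, the craft of creating machine learning (ML) processes consists of the following steps:
+
+1. **Decide on the question**. Most ML processes start by asking a question that cannot be answered by a simple conditional program or rules-based engine. These questions often revolve around predictions based on a collection of data.
+2. **Collect and prepare data**. To answer your question, you need data. The quality and, sometimes, the quantity of your data determine how well you can answer your initial question. Visualizing data is an important part of this phase. This phase also includes splitting the data into training and testing groups for building the model.
+3. **Choose a training method**. Depending on your question and the nature of your data, you need to choose how to train a model so that it best reflects your data and makes accurate predictions against it. This is the part of the ML process that requires specific expertise and, often, a considerable amount of experimentation.
+4. **Train the model**. Using your training data, you use various algorithms to train a model to recognize patterns in the data. The model might use internal weights that can be adjusted to privilege certain parts of the data over others in order to build a better model.
+5. **Evaluate the model**. You use data the model has never seen before (your testing data) to check how it performs.
+6. **Parameter tuning**. Based on the model's performance, you can redo the training with different parameters, or variables, that control the behavior of the algorithms used to train the model.
+7. **Predict**. Use new inputs to test the accuracy of your model.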
+
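+## What question to ask
+
+Computers are particularly skilled at discovering hidden patterns in data. This ability is very helpful for researchers whose questions cannot easily be answered with a conditional rules engine. Given an actuarial task, for example, a data scientist might be able to construct handcrafted rules around the mortality of smokers versus non-smokers.
+
+When many other variables are brought into the equation, however, an ML model might prove more efficient at predicting future mortality rates based on past health history. A more cheerful example might be predicting the weather for the month of April in a given location, based on data that includes latitude, longitude, climate change, proximity to the ocean, patterns of the jet stream, and more.
+
+✅ This [slide deck](https://www2.cisl.ucar.edu/sites/default/files/2021-10/0900%20June%2024%20Haupt_0.pdf) offers a historical perspective on using ML in weather analysis.
+
+## Pre-building tasks
+
+Before you start building your model, there are several tasks to complete. To test your question and form a hypothesis based on a model's predictions, you need to identify and configure several elements.
+
+### Data
+
+To answer your question with any kind of certainty, you need a good amount of data of the right type. There are two things you need to do at this point:
+
+- **Collect data**. Keeping in mind the previous lesson on fairness in data analysis, collect your data with care. Be aware of where the data comes from and any inherent biases it might have, and document its origin.
+- **Prepare data**. The data preparation process has several steps. You might need to collate and normalize data if it comes from diverse sources. You can improve the data's quality and quantity through various methods, such as converting strings to numbers (as we do in [Clustering](../../5-Clustering/1-Visualize/README.md)). You might also generate new data based on the original (as we do in [Classification](../../4-Classification/1-Introduction/README.md)). You can clean and edit the data (as we do before the [Web App](../../3-Web-App/README.md) lesson). Finally, depending on your training technique, you might also need to randomize and shuffle it.
+
+✅ After collecting and processing your data, take a moment to check whether its shape will let you address your intended question. It may turn out that the data does not perform well for your given task, as we discover in our [Clustering](../../5-Clustering/1-Visualize/README.md) lessons!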
+
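+### Features and target
+
+A [feature](https://www.datasciencecentral.com/profiles/blogs/an-introduction-to-variable-and-feature-selection) is a measurable property of your data. In many datasets it is expressed as a column heading such as "date", "size", or "color". Your feature variables, usually represented as `X` in code, are the input variables used to train the model.
+
+A target is the thing you are trying to predict. The target, usually represented as `y` in code, is the answer to the question you are asking of your data: in December, which **color** of pumpkin will be cheapest? In San Francisco, which neighborhoods will have the best real estate **price**? The target is sometimes also referred to as the label attribute.
+
+### Selecting your feature variables
+
+🎓 **Feature selection and feature extraction** How do you know which variables to choose when building a model? You will probably go through a process of feature selection or feature extraction to choose the right variables for the most performant model. They are not the same thing, however: "Feature extraction creates new features from functions of the original features, whereas feature selection returns a subset of the features." ([source](https://wikipedia.org/wiki/Feature_selection))
+
+### Visualize your data
+
+An important part of the data scientist's toolkit is the ability to visualize data using excellent libraries such as Seaborn or Matplotlib. Representing your data visually might allow you to uncover hidden correlations that you can leverage. Visualizations might also help you uncover bias or unbalanced data (as we discover in [Classification](../../4-Classification/2-Classifiers-1/README.md)).
+
+### Split your dataset
+
+Prior to training, you need to split your dataset into two or more parts of unequal size that each still represent the data well.
+
+- **Training**. This part of the dataset is fit to your model to train it. It makes up the majority of the original dataset.
+- **Testing**. A test dataset is an independent group of data, often taken from the original data, that you use to confirm the performance of the built model.
+- **Validating**. A validation set is a smaller independent group of examples that you use to tune the model's hyperparameters, or architecture, to improve the model. Depending on the size of your data and the question you are asking, you might not need to build this third set (as we note in [Time Series Forecasting](../../7-TimeSeries/1-Introduction/README.md)).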
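+
+The split described above can be sketched in plain Python as follows. This is only an illustration under stated assumptions: the helper name `split_dataset`, the fractions, and the seed are hypothetical choices, and in practice you would more likely use a library helper such as Scikit-learn's `train_test_split`.
+
+```python
+import random
+
+
+def split_dataset(rows, test_fraction=0.2, validation_fraction=0.0, seed=0):
+    """Shuffle rows, then cut them into train, validation, and test lists."""
+    rng = random.Random(seed)          # fixed seed so the split is repeatable
+    shuffled = list(rows)              # copy, so the caller's data is untouched
+    rng.shuffle(shuffled)
+    n_test = round(len(shuffled) * test_fraction)
+    n_val = round(len(shuffled) * validation_fraction)
+    test = shuffled[:n_test]
+    validation = shuffled[n_test:n_test + n_val]
+    train = shuffled[n_test + n_val:]  # the remaining majority is for training
+    return train, validation, test
+
+
+rows = list(range(100))  # stand-in for 100 data records
+train, validation, test = split_dataset(rows, test_fraction=0.2, validation_fraction=0.1)
+print(len(train), len(validation), len(test))
+```
+
+Shuffling before cutting is what keeps each part representative; slicing an ordered dataset without shuffling could put all of one kind of record into the test set.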
+
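+## Building a model
+
+Using your training data, your goal is to build a model, a statistical representation of your data, by using various algorithms to **train** it. Training a model exposes it to data and allows it to make assumptions about the patterns it discovers, validate them, and accept or reject them.
+
+### Decide on a training method
+
+Depending on your question and the nature of your data, you will choose a method to train the model. By stepping through [Scikit-learn's documentation](https://scikit-learn.org/stable/user_guide.html), which we use in this course, you can explore many ways to train a model. Depending on your experience, you might have to try several different methods to build the best model. Data scientists typically evaluate a model's performance by feeding it unseen data, checking for accuracy, bias, and other quality-degrading issues, and then selecting the most appropriate training method for the task at hand.
+
+### Train a model
+
+Armed with your training data, you are ready to "fit" it to create a model. You will notice that many ML libraries contain the code "model.fit". This is where you pass in your feature variables as an array of values (usually `X`) and a target variable (usually `y`).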
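+
+To show the shape of the `fit`/`predict` convention without depending on any particular library, here is a deliberately trivial sketch. The class `MeanRegressor` is a hypothetical toy written for this example, not a Scikit-learn class; it "learns" only the mean of the training targets.
+
+```python
+class MeanRegressor:
+    """Toy estimator: always predicts the mean of the targets seen in fit()."""
+
+    def fit(self, X, y):
+        # X is a list of feature rows, y is the list of matching targets.
+        self.mean_ = sum(y) / len(y)
+        return self
+
+    def predict(self, X):
+        # One prediction per input row, ignoring the features entirely.
+        return [self.mean_ for _ in X]
+
+
+X_train = [[1.0], [2.0], [3.0]]   # made-up feature rows
+y_train = [10.0, 20.0, 30.0]      # made-up targets
+
+model = MeanRegressor().fit(X_train, y_train)
+predictions = model.predict([[4.0], [5.0]])
+print(predictions)
+```
+
+Real estimators replace the body of `fit` with an actual learning algorithm, but the calling pattern (features `X` and target `y` in, a fitted model out) stays the same.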
+
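+### Evaluate the model
+
+Once the training process is complete (training a large model can take many iterations, or "epochs"), you can evaluate the model's quality by using test data to gauge its performance. This data is a subset of the original data that the model has not previously analyzed. You can print out a table of metrics about your model's quality.
+
+🎓 **Model fitting**
+
+In the context of machine learning, model fitting refers to the accuracy of the model's underlying function as it attempts to analyze data it has not seen before.
+
+🎓 **Underfitting** and **overfitting** are common problems that degrade the quality of a model: the model fits either not well enough or too well. An overfit model predicts its training data too well because it has learned the data's details and noise too closely, so it generalizes poorly to new data. An underfit model is inaccurate because it can accurately analyze neither its training data nor data it has not yet seen.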
+
+
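+> Infographic by [Jen Looper](https://twitter.com/jenlooper)
+
+## Parameter tuning
+
+Once your initial training is complete, observe the quality of the model and consider improving it by tweaking its "hyperparameters". Read more about the process [in the documentation](https://docs.microsoft.com/en-us/azure/machine-learning/how-to-tune-hyperparameters?WT.mc_id=academic-77952-leestott).
+
+## Prediction
+
+This is the moment when you can use completely new data to test your model's accuracy. In an "applied" ML setting, where you are building web assets to use the model in production, this process might involve gathering user input (a button press, for example) to set a variable and send it to the model for inference, or evaluation.
+
+In these lessons, you will discover how to use these steps to prepare, build, test, evaluate, and predict. These are the everyday gestures of a data scientist, and more, as you progress in your journey to become a "full stack" ML engineer.
+
+---
+
+## 🚀Challenge
+
+Draw a flow chart reflecting the steps of an ML practitioner. Where do you see yourself right now in the process? Where do you predict you will find difficulty? What seems easy to you?
+
+## [Post-lecture quiz](https://ff-quizzes.netlify.app/en/ml/)
+
+## Review & Self Study
+
+Search online for interviews with data scientists who discuss their daily work. Here is [one](https://www.youtube.com/watch?v=Z3IjgbbCEfs).
+
+## Assignment
+
+[Interview a data scientist](assignment.md)
+
+---
+
+**Disclaimer**:
+This document has been translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The original document in its native language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.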
\ No newline at end of file
diff --git a/translations/zh-CN/1-Introduction/4-techniques-of-ML/assignment.md b/translations/zh-CN/1-Introduction/4-techniques-of-ML/assignment.md
new file mode 100644
index 000000000..6a42463f7
--- /dev/null
+++ b/translations/zh-CN/1-Introduction/4-techniques-of-ML/assignment.md
@@ -0,0 +1,16 @@
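+# Interview a data scientist
+
+## Instructions
+
+In your company, in a user group, or among your friends or fellow students, talk to someone who works professionally as a data scientist. Write a short paper (500 words) about their daily occupations. Are they specialists, or do they work "full stack"?
+
+## Rubric
+
+| Criteria | Exemplary | Adequate | Needs Improvement |
+| -------- | ----------------------------------------------------------------------- | ------------------------------------------------------------- | -------------------- |
+|          | An essay of the correct length, with attributed sources, is presented as a .doc file | The essay is poorly attributed or shorter than the required length | No essay is presented |
+
+---
+
+**Disclaimer**:
+This document has been translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The original document in its native language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.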
\ No newline at end of file
diff --git a/translations/zh-CN/1-Introduction/README.md b/translations/zh-CN/1-Introduction/README.md
new file mode 100644
index 000000000..f56e93302
--- /dev/null
+++ b/translations/zh-CN/1-Introduction/README.md
@@ -0,0 +1,28 @@
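+# Introduction to machine learning
+
+In this section of the curriculum, you will be introduced to the base concepts underlying the field of machine learning and what it is, and you will learn about its history and the techniques researchers use to work with it. Let's explore this new world of ML together!
+
+
+> Photo by Bill Oxford on Unsplash
+
+### Lessons
+
+1. [Introduction to machine learning](1-intro-to-ML/README.md)
+1. [The history of machine learning and AI](2-history-of-ML/README.md)
+1. [Fairness and machine learning](3-fairness/README.md)
+1. [Techniques of machine learning](4-techniques-of-ML/README.md)
+
+### Credits
+
+"Introduction to Machine Learning" was written with love by a team of folks including [Muhammad Sakib Khan Inan](https://twitter.com/Sakibinan), [Ornella Altunyan](https://twitter.com/ornelladotcom), and [Jen Looper](https://twitter.com/jenlooper).
+
+"The History of Machine Learning" was written with love by [Jen Looper](https://twitter.com/jenlooper) and [Amy Boyd](https://twitter.com/AmyKateNicho).
+
+"Fairness and Machine Learning" was written with love by [Tomomi Imura](https://twitter.com/girliemac).
+
+"Techniques of Machine Learning" was written with love by [Jen Looper](https://twitter.com/jenlooper) and [Chris Noring](https://twitter.com/softchris).
+
+---
+
+**Disclaimer**:
+This document has been translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The original document in its native language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.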
\ No newline at end of file
diff --git a/translations/zh-CN/2-Regression/1-Tools/README.md b/translations/zh-CN/2-Regression/1-Tools/README.md
new file mode 100644
index 000000000..c076da040
--- /dev/null
+++ b/translations/zh-CN/2-Regression/1-Tools/README.md
@@ -0,0 +1,230 @@
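+# Get started with Python and Scikit-learn for regression models
+
+
+
+> Sketchnote by [Tomomi Imura](https://www.twitter.com/girlie_mac)
+
+## [Pre-lecture quiz](https://ff-quizzes.netlify.app/en/ml/)
+
+> ### [This lesson is available in R!](../../../../2-Regression/1-Tools/solution/R/lesson_1.html)
+
+## Introduction
+
+In these four lessons, you will discover how to build regression models. We will discuss what these are for shortly. But before you do anything, make sure you have the right tools in place to start the process!
+
+In this lesson, you will learn how to:
+
+- Configure your computer for local machine learning tasks.
+- Work with Jupyter notebooks.
+- Install and use Scikit-learn.
+- Explore linear regression with a hands-on exercise.
+
+## Installations and configurations
+
+[](https://youtu.be/-DfeD2k2Kj0 "ML for beginners - Set up your tools to build machine learning models")
+
+> 🎥 Click the image above for a short video on configuring your computer for ML.
+
+1. **Install Python**. Ensure that [Python](https://www.python.org/downloads/) is installed on your computer. You will use Python for many data science and machine learning tasks. Most computer systems already include a Python installation. There are also useful [Python Coding Packs](https://code.visualstudio.com/learn/educators/installers?WT.mc_id=academic-77952-leestott) that ease the setup for some users.
+
+   Some uses of Python, however, require one version of the software, whereas others require a different version. For this reason, it is useful to work within a [virtual environment](https://docs.python.org/3/library/venv.html).
+
+2. **Install Visual Studio Code**. Make sure you have Visual Studio Code installed on your computer. Follow these instructions to [install Visual Studio Code](https://code.visualstudio.com/). You are going to use Python in Visual Studio Code in this course, so you might want to brush up on how to [configure Visual Studio Code](https://docs.microsoft.com/learn/modules/python-install-vscode?WT.mc_id=academic-77952-leestott) for Python development.
+
+   > Get comfortable with Python by working through this collection of [Learn modules](https://docs.microsoft.com/users/jenlooper-2911/collections/mp1pagggd5qrq7?WT.mc_id=academic-77952-leestott).
+   >
+   > [](https://youtu.be/yyQM70vi7V8 "Setup Python with Visual Studio Code")
+   >
+   > 🎥 Click the image above for a video: using Python within VS Code.
+
+3. **Install Scikit-learn** by following [these instructions](https://scikit-learn.org/stable/install.html). Since you need to ensure that you use Python 3, it is recommended that you use a virtual environment. If you are installing this library on an M1 Mac, see the special instructions on the page linked above.
+
+4. **Install Jupyter Notebook**. You will need to [install the Jupyter package](https://pypi.org/project/jupyter/).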
+
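+## Your ML authoring environment
+
+You are going to use **notebooks** to develop your Python code and create machine learning models. This type of file is a common tool for data scientists, and it can be identified by its suffix or extension `.ipynb`.
+
+Notebooks are an interactive environment that allows the developer both to write code and to add notes and documentation around the code, which is quite helpful for experimental or research-oriented projects.
+
+[](https://youtu.be/7E-jC8FLA2E "ML for beginners - Set up Jupyter Notebooks to start building regression models")
+
+> 🎥 Click the image above for a short video working through this exercise.
+
+### Exercise - work with a notebook
+
+In this folder, you will find the file _notebook.ipynb_.
+
+1. Open _notebook.ipynb_ in Visual Studio Code.
+
+    A Jupyter server will start with Python 3+. You will find areas of the notebook that can be run, called code cells. You can run a code cell by selecting the icon that looks like a play button.
+
+2. Select the `md` icon and add a bit of markdown with the following text: **# Welcome to your notebook**.
+
+    Next, add some Python code.
+
+3. Type **print('hello notebook')** in the code cell.
+4. Select the arrow to run the code.
+
+    You should see the printed statement: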
+
+ ```output
+ hello notebook
+ ```
+
+
+
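+You can interleave your code with comments to self-document the notebook.
+
+✅ Think for a minute about how different a web developer's working environment is from that of a data scientist.
+
+## Up and running with Scikit-learn
+
+Now that Python is set up in your local environment and you are comfortable with Jupyter notebooks, let's get equally comfortable with Scikit-learn (pronounce the `sci` as in `science`). Scikit-learn provides an [extensive API](https://scikit-learn.org/stable/modules/classes.html#api-ref) to help you perform ML tasks.
+
+According to their [website](https://scikit-learn.org/stable/getting_started.html), "Scikit-learn is an open source machine learning library that supports supervised and unsupervised learning. It also provides various tools for model fitting, data preprocessing, model selection and evaluation, and many other utilities."
+
+In this course, you will use Scikit-learn and other tools to build machine learning models for what we call "traditional machine learning" tasks. We have deliberately avoided neural networks and deep learning, as they are covered in detail in our forthcoming "AI for Beginners" curriculum.
+
+Scikit-learn makes it straightforward to build models and evaluate them for use. It focuses primarily on numeric data and contains several ready-made datasets for use as learning tools. It also includes pre-built models for students to try. Let's explore the process of loading prepackaged data and using a built-in estimator to build a first ML model.
+
+## Exercise - your first Scikit-learn notebook
+
+> This tutorial was inspired by the [linear regression example](https://scikit-learn.org/stable/auto_examples/linear_model/plot_ols.html#sphx-glr-auto-examples-linear-model-plot-ols-py) on Scikit-learn's web site.
+
+[](https://youtu.be/2xkXL5EUpS0 "ML for beginners - Your First Linear Regression Project in Python")
+
+> 🎥 Click the image above for a short video working through this exercise.
+
+In the _notebook.ipynb_ file associated with this lesson, clear out all the cells by pressing the "trash can" icon.
+
+In this section, you will work with a small dataset about diabetes that is built into Scikit-learn for learning purposes. Imagine that you wanted to test a treatment for diabetic patients. Machine learning models might help you determine which patients would respond better to the treatment, based on combinations of variables. Even a very basic regression model, when visualized, might show information about variables that would help you organize your theoretical clinical trials.
+
+✅ There are many types of regression methods, and which one you pick depends on the answer you are looking for. If you want to predict the probable height for a person of a given age, you would use linear regression, as you are seeking a **numeric value**. If you are interested in discovering whether a type of cuisine should be considered vegan or not, you are looking for a **category assignment**, so you would use logistic regression. You will learn more about logistic regression later. Think a bit about some questions you can ask of data, and which of these methods would be more appropriate.
+
+Let's get started on this task.
+
+### Import libraries
+
+For this task we will import some libraries:
+
+- **matplotlib**. It is a useful [graphing tool](https://matplotlib.org/), and we will use it to create a line plot.
+- **numpy**. [numpy](https://numpy.org/doc/stable/user/whatisnumpy.html) is a useful library for handling numeric data in Python.
+- **sklearn**. This is the [Scikit-learn](https://scikit-learn.org/stable/user_guide.html) library.
+
+Import some libraries to help with your tasks.
+
+1. Add imports by typing the following code: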
+
+ ```python
+ import matplotlib.pyplot as plt
+ import numpy as np
+ from sklearn import datasets, linear_model, model_selection
+ ```
+
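+   The code above imports `matplotlib` and `numpy`, and imports `datasets`, `linear_model`, and `model_selection` from `sklearn`. `model_selection` is used for splitting data into training and test sets.
+
+### The diabetes dataset
+
+The built-in [diabetes dataset](https://scikit-learn.org/stable/datasets/toy_dataset.html#diabetes-dataset) includes 442 samples of data about diabetes, with 10 feature variables, some of which include:
+
+- age: age in years
+- bmi: body mass index
+- bp: average blood pressure
+- s1 tc: total serum cholesterol
+
+✅ This dataset includes "sex" as a feature variable, which is important to research on diabetes. Many medical datasets include this type of binary classification. Think a bit about how categorizations like this might exclude certain parts of a population from treatments.
+
+Now, load up the X and y data.
+
+> 🎓 Remember, this is supervised learning, and we need a named "y" target.
+
+In a new code cell, load the diabetes dataset by calling `load_diabetes()`. The input `return_X_y=True` signals that `X` will be a data matrix and `y` will be the regression target.
+
+1. Add some print commands to show the shape of the data matrix and its first element: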
+
+ ```python
+ X, y = datasets.load_diabetes(return_X_y=True)
+ print(X.shape)
+ print(X[0])
+ ```
+
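+   What you get back as a response is a tuple. You assign the two values of the tuple to `X` and `y` respectively. Learn more [about tuples](https://wikipedia.org/wiki/Tuple).
+
+   You can see that this data has 442 items, each shaped as an array of 10 elements: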
+
+ ```text
+ (442, 10)
+ [ 0.03807591 0.05068012 0.06169621 0.02187235 -0.0442235 -0.03482076
+ -0.04340085 -0.00259226 0.01990842 -0.01764613]
+ ```
+
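+   ✅ Think a bit about the relationship between the data and the regression target. Linear regression predicts the relationship between feature X and target variable y. Can you find the [target](https://scikit-learn.org/stable/datasets/toy_dataset.html#diabetes-dataset) for the diabetes dataset in the documentation? What is this dataset demonstrating, given that target?
+
+2. Next, select a portion of this dataset to plot by selecting its 3rd column. You can do this by using the `:` operator to select all rows and then selecting the 3rd column with the index (2). You can also reshape the data into a 2D array, which Scikit-learn estimators expect for `X`, by using `reshape(n_rows, n_columns)`. If one of the parameters is -1, the corresponding dimension is calculated automatically.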
+
+ ```python
+ X = X[:, 2]
+ X = X.reshape((-1,1))
+ ```
+
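+   ✅ At any time, print out the data to check its shape.
+
+3. Now that you have data ready to be plotted, you can see whether a machine can help determine a logical relationship in this dataset. To do this, you need to split both the data (X) and the target (y) into test and training sets. Scikit-learn has a straightforward way to do this: `train_test_split` shuffles the data and holds out the fraction given by `test_size` (here, 33%) as the test set.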
+
+ ```python
+ X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y, test_size=0.33)
+ ```
+
+4. Now you are ready to train your model! Load up the linear regression model and train it with your X and y training sets using `model.fit()`:
+
+ ```python
+ model = linear_model.LinearRegression()
+ model.fit(X_train, y_train)
+ ```
+
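+    ✅ `model.fit()` is a function you will see in many ML libraries such as TensorFlow.
+
+5. Then, create predictions from the test data using the function `predict()`. These predictions will be used to draw the regression line through the data.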
+
+ ```python
+ y_pred = model.predict(X_test)
+ ```
+
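+6. Now it is time to show the data in a plot. Matplotlib is a very useful tool for this task. Create a scatterplot of all the X and y test data, and use the predictions to draw the line of best fit through the data.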
+
+ ```python
+ plt.scatter(X_test, y_test, color='black')
+ plt.plot(X_test, y_pred, color='blue', linewidth=3)
+ plt.xlabel('Scaled BMIs')
+ plt.ylabel('Disease Progression')
+ plt.title('A Graph Plot Showing Diabetes Progression Against BMI')
+ plt.show()
+ ```
+
+ 
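+✅ Think a bit about what is going on here. A straight line is running through many small dots of data, but what is it doing exactly? Can you see how you should be able to use this line to predict where a new, unseen data point should fall on the plot's y axis? Try to put into words the practical use of this model.
+
+Congratulations, you built your first linear regression model, created a prediction with it, and displayed it in a plot!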
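+
+To make concrete what the fitted line represents, here is a minimal sketch of ordinary least squares for a single feature, written in plain Python rather than Scikit-learn. The function name `fit_line` and the toy numbers are illustrative assumptions and are not part of the lesson's notebook; `LinearRegression` solves the same least-squares problem more generally.
+
+```python
+def fit_line(xs, ys):
+    """Return (slope, intercept) of the least-squares line through the points."""
+    n = len(xs)
+    mean_x = sum(xs) / n
+    mean_y = sum(ys) / n
+    # Covariance-like and variance-like sums (the 1/n factors cancel out).
+    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
+    sxx = sum((x - mean_x) ** 2 for x in xs)
+    slope = sxy / sxx
+    intercept = mean_y - slope * mean_x
+    return slope, intercept
+
+
+# Made-up points that lie exactly on y = 2x + 1.
+xs = [0.0, 1.0, 2.0, 3.0]
+ys = [1.0, 3.0, 5.0, 7.0]
+
+slope, intercept = fit_line(xs, ys)
+new_y = slope * 4.0 + intercept  # predict y for a new, unseen x = 4
+print(slope, intercept, new_y)
+```
+
+Predicting for a new point is just reading the line at that x value, which is the practical use the question above asks you to put into words.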
+
+---
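+## 🚀Challenge
+
+Plot a different variable from this dataset. Hint: edit this line: `X = X[:,2]`. Given this dataset's target, what are you able to discover about the progression of diabetes as a disease?
+
+## [Post-lecture quiz](https://ff-quizzes.netlify.app/en/ml/)
+
+## Review & Self Study
+
+In this tutorial, you worked with simple linear regression, rather than univariate or multiple linear regression. Read a little about the differences between these methods, or take a look at [this video](https://www.coursera.org/lecture/quantifying-relationships-regression-models/linear-vs-nonlinear-categorical-variables-ai2Ef).
+
+Read more about the concept of regression and think about what kinds of questions can be answered by this technique. Take this [tutorial](https://docs.microsoft.com/learn/modules/train-evaluate-regression-models?WT.mc_id=academic-77952-leestott) to deepen your understanding.
+
+## Assignment
+
+[A different dataset](assignment.md)
+
+---
+
+**Disclaimer**:
+This document has been translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The original document in its native language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.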
\ No newline at end of file
diff --git a/translations/zh-CN/2-Regression/1-Tools/assignment.md b/translations/zh-CN/2-Regression/1-Tools/assignment.md
new file mode 100644
index 000000000..50a58c181
--- /dev/null
+++ b/translations/zh-CN/2-Regression/1-Tools/assignment.md
@@ -0,0 +1,18 @@
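+# Regression with Scikit-learn
+
+## Instructions
+
+Take a look at the [Linnerud dataset](https://scikit-learn.org/stable/modules/generated/sklearn.datasets.load_linnerud.html#sklearn.datasets.load_linnerud) in Scikit-learn. This dataset has multiple [targets](https://scikit-learn.org/stable/datasets/toy_dataset.html#linnerrud-dataset): "It consists of three exercise (data) and three physiological (target) variables collected from twenty middle-aged men in a fitness club."
+
+In your own words, describe how to create a regression model that would plot the relationship between waistline and the number of sit-ups accomplished. Do the same for the other data points in this dataset.
+
+## Rubric
+
+| Criteria                      | Exemplary                         | Adequate                     | Needs Improvement         |
+| ----------------------------- | --------------------------------- | ---------------------------- | ------------------------- |
+| Submit a descriptive paragraph | A well-written paragraph is submitted | A few sentences are submitted | No description is supplied |
+
+---
+
+**Disclaimer**:
+This document has been translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The original document in its native language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.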
\ No newline at end of file
diff --git a/translations/zh-CN/2-Regression/1-Tools/notebook.ipynb b/translations/zh-CN/2-Regression/1-Tools/notebook.ipynb
new file mode 100644
index 000000000..e69de29bb
diff --git a/translations/zh-CN/2-Regression/1-Tools/solution/Julia/README.md b/translations/zh-CN/2-Regression/1-Tools/solution/Julia/README.md
new file mode 100644
index 000000000..779236745
--- /dev/null
+++ b/translations/zh-CN/2-Regression/1-Tools/solution/Julia/README.md
@@ -0,0 +1,6 @@
+
+
+---
+
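+**Disclaimer**:
+This document has been translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The original document in its native language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.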
\ No newline at end of file
diff --git a/translations/zh-CN/2-Regression/1-Tools/solution/R/lesson_1-R.ipynb b/translations/zh-CN/2-Regression/1-Tools/solution/R/lesson_1-R.ipynb
new file mode 100644
index 000000000..f7e1ee54c
--- /dev/null
+++ b/translations/zh-CN/2-Regression/1-Tools/solution/R/lesson_1-R.ipynb
@@ -0,0 +1,447 @@
+{
+ "nbformat": 4,
+ "nbformat_minor": 2,
+ "metadata": {
+ "colab": {
+ "name": "lesson_1-R.ipynb",
+ "provenance": [],
+ "collapsed_sections": [],
+ "toc_visible": true
+ },
+ "kernelspec": {
+ "name": "ir",
+ "display_name": "R"
+ },
+ "language_info": {
+ "name": "R"
+ },
+ "coopTranslator": {
+ "original_hash": "c18d3bd0bd8ae3878597e89dcd1fa5c1",
+ "translation_date": "2025-09-03T19:43:03+00:00",
+ "source_file": "2-Regression/1-Tools/solution/R/lesson_1-R.ipynb",
+ "language_code": "zh"
+ }
+ },
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "source": [],
+ "metadata": {
+ "id": "YJUHCXqK57yz"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
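+    "## Introduction to Regression - Lesson 1\n",
+    "\n",
+    "#### Putting it into perspective\n",
+    "\n",
+    "✅ There are many types of regression methods, and which one you pick depends on the answer you are looking for. If you want to predict the probable height for a person of a given age, you would use `linear regression`, as you are seeking a **numeric value**. If you are interested in discovering whether a type of cuisine should be considered vegan or not, you are looking for a **category assignment**, so you would use `logistic regression`. You will learn more about logistic regression later. Think a bit about some questions you can ask of data, and which of these methods would be more appropriate.\n",
+    "\n",
+    "In this section, you will work with a [small dataset about diabetes](https://www4.stat.ncsu.edu/~boos/var.select/diabetes.html). Imagine that you wanted to test a treatment for diabetic patients. Machine learning models might help you determine which patients would respond better to the treatment, based on combinations of variables. Even a very basic regression model, when visualized, might show information about variables that would help you organize your theoretical clinical trials.\n",
+    "\n",
+    "Without further ado, let's get started on this task!\n",
+    "\n",
+    "Artwork by @allison_horst\n",
+    "\n",
+    "\n"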
+ ],
+ "metadata": {
+ "id": "LWNNzfqd6feZ"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
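+    "## 1. Loading up our tool set\n",
+    "\n",
+    "For this task, we will need the following packages:\n",
+    "\n",
+    "- `tidyverse`: the [tidyverse](https://www.tidyverse.org/) is a [collection of R packages](https://www.tidyverse.org/packages) designed to make data science faster, easier, and more fun!\n",
+    "\n",
+    "- `tidymodels`: the [tidymodels](https://www.tidymodels.org/) framework is a [collection of packages](https://www.tidymodels.org/packages/) for modeling and machine learning.\n",
+    "\n",
+    "You can install them with:\n",
+    "\n",
+    "`install.packages(c(\"tidyverse\", \"tidymodels\"))`\n",
+    "\n",
+    "The script below checks whether you have the packages required to complete this module and installs any that are missing.\n"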
+ ],
+ "metadata": {
+ "id": "FIo2YhO26wI9"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "source": [
+ "suppressWarnings(if(!require(\"pacman\")) install.packages(\"pacman\"))\n",
+ "pacman::p_load(tidyverse, tidymodels)"
+ ],
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stderr",
+ "text": [
+ "Loading required package: pacman\n",
+ "\n"
+ ]
+ }
+ ],
+ "metadata": {
+ "id": "cIA9fz9v7Dss",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "2df7073b-86b2-4b32-cb86-0da605a0dc11"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+    "Now, let's load these packages and make them available in our current R session. (This is for illustration only; `pacman::p_load()` has already done this for you.)\n"
+ ],
+ "metadata": {
+ "id": "gpO_P_6f9WUG"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# load the core Tidyverse packages\r\n",
+ "library(tidyverse)\r\n",
+ "\r\n",
+ "# load the core Tidymodels packages\r\n",
+ "library(tidymodels)\r\n"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "NLMycgG-9ezO"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
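+    "## 2. The diabetes dataset\n",
+    "\n",
+    "In this exercise, we will put our regression skills on display by making predictions on a diabetes dataset. The [diabetes dataset](https://www4.stat.ncsu.edu/~boos/var.select/diabetes.rwrite1.txt) includes `442 samples` of data about diabetes, with 10 predictor feature variables (`age`, `sex`, `body mass index`, `average blood pressure`, and `six blood serum measurements`) as well as an outcome variable `y`: a quantitative measure of disease progression one year after baseline.\n",
+    "\n",
+    "|Number of observations|442|\n",
+    "|----------------------|:---|\n",
+    "|Number of predictors|The first 10 columns are numeric predictors|\n",
+    "|Outcome/Target|Column 11 is a quantitative measure of disease progression one year after baseline|\n",
+    "|Predictor information|- age in years\n",
+    "||- sex\n",
+    "||- bmi body mass index\n",
+    "||- bp average blood pressure\n",
+    "||- s1 tc, total serum cholesterol\n",
+    "||- s2 ldl, low-density lipoproteins\n",
+    "||- s3 hdl, high-density lipoproteins\n",
+    "||- s4 tch, total cholesterol / HDL\n",
+    "||- s5 ltg, possibly log of serum triglycerides level\n",
+    "||- s6 glu, blood sugar level|\n",
+    "\n",
+    "> 🎓 Remember, this is supervised learning, and we need a named 'y' target.\n",
+    "\n",
+    "Before you can manipulate data with R, you need to import the data into R's memory, or build a connection to the data that R can use to access it remotely.\n",
+    "\n",
+    "> The [readr](https://readr.tidyverse.org/) package, which is part of the Tidyverse, provides a fast and friendly way to read rectangular data into R.\n",
+    "\n",
+    "Now, let's load the diabetes dataset provided at the source URL.\n",
+    "\n",
+    "We will also run a basic check on the data using `glimpse()` and display the first 5 rows using `slice()`.\n",
+    "\n",
+    "Before going any further, let's introduce something you will encounter often in R code 🥁🥁: the pipe operator `%>%`\n",
+    "\n",
+    "The pipe operator (`%>%`) performs operations in logical sequence by passing an object forward into a function or call expression. You can think of the pipe operator as saying \"and then\" in your code.\n"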
+ ],
+ "metadata": {
+ "id": "KM6iXLH996Cl"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Import the data set\r\n",
+ "diabetes <- read_table2(file = \"https://www4.stat.ncsu.edu/~boos/var.select/diabetes.rwrite1.txt\")\r\n",
+ "\r\n",
+ "\r\n",
+ "# Get a glimpse and dimensions of the data\r\n",
+ "glimpse(diabetes)\r\n",
+ "\r\n",
+ "\r\n",
+ "# Select the first 5 rows of the data\r\n",
+ "diabetes %>% \r\n",
+ " slice(1:5)"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "Z1geAMhM-bSP"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
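+    "`glimpse()` shows us that this data has 442 rows and 11 columns, with all the columns being of data type `double`.\n",
+    "\n",
+    "   \n",
+    "\n",
+    "> `glimpse()` and `slice()` are functions in [`dplyr`](https://dplyr.tidyverse.org/). Dplyr, part of the Tidyverse, is a grammar of data manipulation that provides a consistent set of verbs to help you solve the most common data manipulation challenges.\n",
+    "\n",
+    "   \n",
+    "\n",
+    "Now that we have the data, let's narrow down to one feature (`bmi`) to use as the predictor for this exercise. This requires us to select the desired columns. So, how do we do this?\n",
+    "\n",
+    "[`dplyr::select()`](https://dplyr.tidyverse.org/reference/select.html) allows us to *select* (and optionally rename) columns in a data frame.\n"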
+ ],
+ "metadata": {
+ "id": "UwjVT1Hz-c3Z"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Select predictor feature `bmi` and outcome `y`\r\n",
+ "diabetes_select <- diabetes %>% \r\n",
+ " select(c(bmi, y))\r\n",
+ "\r\n",
+    "# Print the first 10 rows\r\n",
+ "diabetes_select %>% \r\n",
+ " slice(1:10)"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "RDY1oAKI-m80"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
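+    "## 3. Training and testing data\n",
+    "\n",
+    "It is common practice in supervised learning to *split* the data into two subsets: a (typically larger) set used to train the model, and a smaller \"hold-back\" set used to see how the model performed.\n",
+    "\n",
+    "Now that the data is ready, we can split it into these two sets. We can use the [rsample](https://tidymodels.github.io/rsample/) package, which is part of the Tidymodels framework, to create an object that contains the information on how to split the data, and then two more rsample functions to extract the resulting training and testing sets:\n"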
+ ],
+ "metadata": {
+ "id": "SDk668xK-tc3"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "set.seed(2056)\r\n",
+    "# Split 67% of the data for training and the rest for testing\r\n",
+ "diabetes_split <- diabetes_select %>% \r\n",
+ " initial_split(prop = 0.67)\r\n",
+ "\r\n",
+ "# Extract the resulting train and test sets\r\n",
+ "diabetes_train <- training(diabetes_split)\r\n",
+ "diabetes_test <- testing(diabetes_split)\r\n",
+ "\r\n",
+    "# Print the first 10 rows of the training set\r\n",
+ "diabetes_train %>% \r\n",
+ " slice(1:10)"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "EqtHx129-1h-"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
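+    "## 4. Train a linear regression model with Tidymodels\n",
+    "\n",
+    "Now we are ready to train our model!\n",
+    "\n",
+    "In Tidymodels, you specify models using the `parsnip` package, in terms of three concepts:\n",
+    "\n",
+    "- Model **type** differentiates models such as linear regression, logistic regression, decision tree models, and so forth.\n",
+    "\n",
+    "- Model **mode** includes common options like regression and classification; some model types support either of these while some only have one mode.\n",
+    "\n",
+    "- Model **engine** is the computational tool that will be used to fit the model. Often these are R packages, such as **`\"lm\"`** or **`\"ranger\"`**.\n",
+    "\n",
+    "This modeling information is captured in a model specification, so let's build one!\n"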
+ ],
+ "metadata": {
+ "id": "sBOS-XhB-6v7"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Build a linear model specification\r\n",
+ "lm_spec <- \r\n",
+ " # Type\r\n",
+ " linear_reg() %>% \r\n",
+ " # Engine\r\n",
+ " set_engine(\"lm\") %>% \r\n",
+ " # Mode\r\n",
+ " set_mode(\"regression\")\r\n",
+ "\r\n",
+ "\r\n",
+ "# Print the model specification\r\n",
+ "lm_spec"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "20OwEw20--t3"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
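+    "After a model has been *specified*, it can be `estimated` or `trained` using the [`fit()`](https://parsnip.tidymodels.org/reference/fit.html) function, typically using a formula and some data.\n",
+    "\n",
+    "`y ~ .` means we will fit `y` as the predicted quantity/target, explained by all the predictors/features, i.e. `.` (in this case, we only have one predictor: `bmi`).\n"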
+ ],
+ "metadata": {
+ "id": "_oDHs89k_CJj"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Build a linear model specification\r\n",
+ "lm_spec <- linear_reg() %>% \r\n",
+ " set_engine(\"lm\") %>%\r\n",
+ " set_mode(\"regression\")\r\n",
+ "\r\n",
+ "\r\n",
+ "# Train a linear regression model\r\n",
+ "lm_mod <- lm_spec %>% \r\n",
+ " fit(y ~ ., data = diabetes_train)\r\n",
+ "\r\n",
+ "# Print the model\r\n",
+ "lm_mod"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "YlsHqd-q_GJQ"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
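+    "From the model output, we can see the coefficients learned during training. They are the coefficients of the line of best fit, which gives the lowest overall error between the actual and predicted values.\n",
+    "   \n",
+    "\n",
+    "## 5. Make predictions on the test set\n",
+    "\n",
+    "Now that we have trained a model, we can use it to predict the disease progression y for the test dataset using [parsnip::predict()](https://parsnip.tidymodels.org/reference/predict.model_fit.html). The predictions will be used to draw the line of best fit through the data.\n"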
+ ],
+ "metadata": {
+ "id": "kGZ22RQj_Olu"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Make predictions for the test set\r\n",
+ "predictions <- lm_mod %>% \r\n",
+ " predict(new_data = diabetes_test)\r\n",
+ "\r\n",
+ "# Print out some of the predictions\r\n",
+ "predictions %>% \r\n",
+ " slice(1:5)"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "nXHbY7M2_aao"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
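+    "Woohoo! 💃🕺 We just trained a model and used it to make predictions!\n",
+    "\n",
+    "When making predictions, the tidymodels convention is to always produce a tibble/data frame of results with standardized column names. This makes it easy to combine the original data and the predictions in a usable format for subsequent operations such as plotting.\n",
+    "\n",
+    "`dplyr::bind_cols()` efficiently binds multiple data frames by column.\n"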
+ ],
+ "metadata": {
+ "id": "R_JstwUY_bIs"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Combine the predictions and the original test set\r\n",
+ "results <- diabetes_test %>% \r\n",
+ " bind_cols(predictions)\r\n",
+ "\r\n",
+ "\r\n",
+ "results %>% \r\n",
+ " slice(1:5)"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "RybsMJR7_iI8"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
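+    "## 6. Plot the modelling results\n",
+    "\n",
+    "Now it is time to see this visually 📈. We will create a scatter plot of all the `y` and `bmi` values of the test set, then use the predictions to draw the line of best fit through the data.\n",
+    "\n",
+    "R has several systems for making graphs, but `ggplot2` is one of the most elegant and most versatile. It allows you to compose graphs by **combining independent components**.\n"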
+ ],
+ "metadata": {
+ "id": "XJbYbMZW_n_s"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Set a theme for the plot\r\n",
+ "theme_set(theme_light())\r\n",
+ "# Create a scatter plot\r\n",
+ "results %>% \r\n",
+ " ggplot(aes(x = bmi)) +\r\n",
+ " # Add a scatter plot\r\n",
+ " geom_point(aes(y = y), size = 1.6) +\r\n",
+ " # Add a line plot\r\n",
+ " geom_line(aes(y = .pred), color = \"blue\", size = 1.5)"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "R9tYp3VW_sTn"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
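+ "✅ Think a bit about what's going on here. A straight line is running through many small dots of data, but what is it doing exactly? Can you see how you could use this line to predict where a new, unseen data point should fall in relation to the plot's y axis? Try to put the practical use of this model into words.\n",
+ "\n",
+ "Congratulations! You built your first linear regression model, made predictions with it, and displayed the results in a plot!\n"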
+ ],
+ "metadata": {
+ "id": "zrPtHIxx_tNI"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
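+ "\n---\n\n**Disclaimer**: \nThis document was translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The original document in its native language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.\n"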
+ ]
+ }
+ ]
+}
\ No newline at end of file
diff --git a/translations/zh-CN/2-Regression/1-Tools/solution/notebook.ipynb b/translations/zh-CN/2-Regression/1-Tools/solution/notebook.ipynb
new file mode 100644
index 000000000..def6e582b
--- /dev/null
+++ b/translations/zh-CN/2-Regression/1-Tools/solution/notebook.ipynb
@@ -0,0 +1,677 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Linear Regression for the Diabetes dataset - Lesson 1\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Import the required libraries\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import matplotlib.pyplot as plt\n",
+ "import numpy as np\n",
+ "from sklearn import datasets, linear_model, model_selection\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Load the diabetes dataset, split into the `X` features and the `y` target\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "(442, 10)\n",
+ "[ 0.03807591 0.05068012 0.06169621 0.02187239 -0.0442235 -0.03482076\n",
+ " -0.04340085 -0.00259226 0.01990749 -0.01764613]\n"
+ ]
+ }
+ ],
+ "source": [
+ "X, y = datasets.load_diabetes(return_X_y=True)\n",
+ "print(X.shape)\n",
+ "print(X[0])"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Select just one feature to use for this exercise\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "(442,)\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Selecting the 3rd feature\n",
+ "X = X[:, 2]\n",
+ "print(X.shape)\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "(442, 1)\n",
+ "[[ 0.06169621]\n",
+ " [-0.05147406]\n",
+ " [ 0.04445121]\n",
+ " [-0.01159501]\n",
+ " [-0.03638469]\n",
+ " [-0.04069594]\n",
+ " [-0.04716281]\n",
+ " [-0.00189471]\n",
+ " [ 0.06169621]\n",
+ " [ 0.03906215]\n",
+ " [-0.08380842]\n",
+ " [ 0.01750591]\n",
+ " [-0.02884001]\n",
+ " [-0.00189471]\n",
+ " [-0.02560657]\n",
+ " [-0.01806189]\n",
+ " [ 0.04229559]\n",
+ " [ 0.01211685]\n",
+ " [-0.0105172 ]\n",
+ " [-0.01806189]\n",
+ " [-0.05686312]\n",
+ " [-0.02237314]\n",
+ " [-0.00405033]\n",
+ " [ 0.06061839]\n",
+ " [ 0.03582872]\n",
+ " [-0.01267283]\n",
+ " [-0.07734155]\n",
+ " [ 0.05954058]\n",
+ " [-0.02129532]\n",
+ " [-0.00620595]\n",
+ " [ 0.04445121]\n",
+ " [-0.06548562]\n",
+ " [ 0.12528712]\n",
+ " [-0.05039625]\n",
+ " [-0.06332999]\n",
+ " [-0.03099563]\n",
+ " [ 0.02289497]\n",
+ " [ 0.01103904]\n",
+ " [ 0.07139652]\n",
+ " [ 0.01427248]\n",
+ " [-0.00836158]\n",
+ " [-0.06764124]\n",
+ " [-0.0105172 ]\n",
+ " [-0.02345095]\n",
+ " [ 0.06816308]\n",
+ " [-0.03530688]\n",
+ " [-0.01159501]\n",
+ " [-0.0730303 ]\n",
+ " [-0.04177375]\n",
+ " [ 0.01427248]\n",
+ " [-0.00728377]\n",
+ " [ 0.0164281 ]\n",
+ " [-0.00943939]\n",
+ " [-0.01590626]\n",
+ " [ 0.0250506 ]\n",
+ " [-0.04931844]\n",
+ " [ 0.04121778]\n",
+ " [-0.06332999]\n",
+ " [-0.06440781]\n",
+ " [-0.02560657]\n",
+ " [-0.00405033]\n",
+ " [ 0.00457217]\n",
+ " [-0.00728377]\n",
+ " [-0.0374625 ]\n",
+ " [-0.02560657]\n",
+ " [-0.02452876]\n",
+ " [-0.01806189]\n",
+ " [-0.01482845]\n",
+ " [-0.02991782]\n",
+ " [-0.046085 ]\n",
+ " [-0.06979687]\n",
+ " [ 0.03367309]\n",
+ " [-0.00405033]\n",
+ " [-0.02021751]\n",
+ " [ 0.00241654]\n",
+ " [-0.03099563]\n",
+ " [ 0.02828403]\n",
+ " [-0.03638469]\n",
+ " [-0.05794093]\n",
+ " [-0.0374625 ]\n",
+ " [ 0.01211685]\n",
+ " [-0.02237314]\n",
+ " [-0.03530688]\n",
+ " [ 0.00996123]\n",
+ " [-0.03961813]\n",
+ " [ 0.07139652]\n",
+ " [-0.07518593]\n",
+ " [-0.00620595]\n",
+ " [-0.04069594]\n",
+ " [-0.04824063]\n",
+ " [-0.02560657]\n",
+ " [ 0.0519959 ]\n",
+ " [ 0.00457217]\n",
+ " [-0.06440781]\n",
+ " [-0.01698407]\n",
+ " [-0.05794093]\n",
+ " [ 0.00996123]\n",
+ " [ 0.08864151]\n",
+ " [-0.00512814]\n",
+ " [-0.06440781]\n",
+ " [ 0.01750591]\n",
+ " [-0.04500719]\n",
+ " [ 0.02828403]\n",
+ " [ 0.04121778]\n",
+ " [ 0.06492964]\n",
+ " [-0.03207344]\n",
+ " [-0.07626374]\n",
+ " [ 0.04984027]\n",
+ " [ 0.04552903]\n",
+ " [-0.00943939]\n",
+ " [-0.03207344]\n",
+ " [ 0.00457217]\n",
+ " [ 0.02073935]\n",
+ " [ 0.01427248]\n",
+ " [ 0.11019775]\n",
+ " [ 0.00133873]\n",
+ " [ 0.05846277]\n",
+ " [-0.02129532]\n",
+ " [-0.0105172 ]\n",
+ " [-0.04716281]\n",
+ " [ 0.00457217]\n",
+ " [ 0.01750591]\n",
+ " [ 0.08109682]\n",
+ " [ 0.0347509 ]\n",
+ " [ 0.02397278]\n",
+ " [-0.00836158]\n",
+ " [-0.06117437]\n",
+ " [-0.00189471]\n",
+ " [-0.06225218]\n",
+ " [ 0.0164281 ]\n",
+ " [ 0.09618619]\n",
+ " [-0.06979687]\n",
+ " [-0.02129532]\n",
+ " [-0.05362969]\n",
+ " [ 0.0433734 ]\n",
+ " [ 0.05630715]\n",
+ " [-0.0816528 ]\n",
+ " [ 0.04984027]\n",
+ " [ 0.11127556]\n",
+ " [ 0.06169621]\n",
+ " [ 0.01427248]\n",
+ " [ 0.04768465]\n",
+ " [ 0.01211685]\n",
+ " [ 0.00564998]\n",
+ " [ 0.04660684]\n",
+ " [ 0.12852056]\n",
+ " [ 0.05954058]\n",
+ " [ 0.09295276]\n",
+ " [ 0.01535029]\n",
+ " [-0.00512814]\n",
+ " [ 0.0703187 ]\n",
+ " [-0.00405033]\n",
+ " [-0.00081689]\n",
+ " [-0.04392938]\n",
+ " [ 0.02073935]\n",
+ " [ 0.06061839]\n",
+ " [-0.0105172 ]\n",
+ " [-0.03315126]\n",
+ " [-0.06548562]\n",
+ " [ 0.0433734 ]\n",
+ " [-0.06225218]\n",
+ " [ 0.06385183]\n",
+ " [ 0.03043966]\n",
+ " [ 0.07247433]\n",
+ " [-0.0191397 ]\n",
+ " [-0.06656343]\n",
+ " [-0.06009656]\n",
+ " [ 0.06924089]\n",
+ " [ 0.05954058]\n",
+ " [-0.02668438]\n",
+ " [-0.02021751]\n",
+ " [-0.046085 ]\n",
+ " [ 0.07139652]\n",
+ " [-0.07949718]\n",
+ " [ 0.00996123]\n",
+ " [-0.03854032]\n",
+ " [ 0.01966154]\n",
+ " [ 0.02720622]\n",
+ " [-0.00836158]\n",
+ " [-0.01590626]\n",
+ " [ 0.00457217]\n",
+ " [-0.04285156]\n",
+ " [ 0.00564998]\n",
+ " [-0.03530688]\n",
+ " [ 0.02397278]\n",
+ " [-0.01806189]\n",
+ " [ 0.04229559]\n",
+ " [-0.0547075 ]\n",
+ " [-0.00297252]\n",
+ " [-0.06656343]\n",
+ " [-0.01267283]\n",
+ " [-0.04177375]\n",
+ " [-0.03099563]\n",
+ " [-0.00512814]\n",
+ " [-0.05901875]\n",
+ " [ 0.0250506 ]\n",
+ " [-0.046085 ]\n",
+ " [ 0.00349435]\n",
+ " [ 0.05415152]\n",
+ " [-0.04500719]\n",
+ " [-0.05794093]\n",
+ " [-0.05578531]\n",
+ " [ 0.00133873]\n",
+ " [ 0.03043966]\n",
+ " [ 0.00672779]\n",
+ " [ 0.04660684]\n",
+ " [ 0.02612841]\n",
+ " [ 0.04552903]\n",
+ " [ 0.04013997]\n",
+ " [-0.01806189]\n",
+ " [ 0.01427248]\n",
+ " [ 0.03690653]\n",
+ " [ 0.00349435]\n",
+ " [-0.07087468]\n",
+ " [-0.03315126]\n",
+ " [ 0.09403057]\n",
+ " [ 0.03582872]\n",
+ " [ 0.03151747]\n",
+ " [-0.06548562]\n",
+ " [-0.04177375]\n",
+ " [-0.03961813]\n",
+ " [-0.03854032]\n",
+ " [-0.02560657]\n",
+ " [-0.02345095]\n",
+ " [-0.06656343]\n",
+ " [ 0.03259528]\n",
+ " [-0.046085 ]\n",
+ " [-0.02991782]\n",
+ " [-0.01267283]\n",
+ " [-0.01590626]\n",
+ " [ 0.07139652]\n",
+ " [-0.03099563]\n",
+ " [ 0.00026092]\n",
+ " [ 0.03690653]\n",
+ " [ 0.03906215]\n",
+ " [-0.01482845]\n",
+ " [ 0.00672779]\n",
+ " [-0.06871905]\n",
+ " [-0.00943939]\n",
+ " [ 0.01966154]\n",
+ " [ 0.07462995]\n",
+ " [-0.00836158]\n",
+ " [-0.02345095]\n",
+ " [-0.046085 ]\n",
+ " [ 0.05415152]\n",
+ " [-0.03530688]\n",
+ " [-0.03207344]\n",
+ " [-0.0816528 ]\n",
+ " [ 0.04768465]\n",
+ " [ 0.06061839]\n",
+ " [ 0.05630715]\n",
+ " [ 0.09834182]\n",
+ " [ 0.05954058]\n",
+ " [ 0.03367309]\n",
+ " [ 0.05630715]\n",
+ " [-0.06548562]\n",
+ " [ 0.16085492]\n",
+ " [-0.05578531]\n",
+ " [-0.02452876]\n",
+ " [-0.03638469]\n",
+ " [-0.00836158]\n",
+ " [-0.04177375]\n",
+ " [ 0.12744274]\n",
+ " [-0.07734155]\n",
+ " [ 0.02828403]\n",
+ " [-0.02560657]\n",
+ " [-0.06225218]\n",
+ " [-0.00081689]\n",
+ " [ 0.08864151]\n",
+ " [-0.03207344]\n",
+ " [ 0.03043966]\n",
+ " [ 0.00888341]\n",
+ " [ 0.00672779]\n",
+ " [-0.02021751]\n",
+ " [-0.02452876]\n",
+ " [-0.01159501]\n",
+ " [ 0.02612841]\n",
+ " [-0.05901875]\n",
+ " [-0.03638469]\n",
+ " [-0.02452876]\n",
+ " [ 0.01858372]\n",
+ " [-0.0902753 ]\n",
+ " [-0.00512814]\n",
+ " [-0.05255187]\n",
+ " [-0.02237314]\n",
+ " [-0.02021751]\n",
+ " [-0.0547075 ]\n",
+ " [-0.00620595]\n",
+ " [-0.01698407]\n",
+ " [ 0.05522933]\n",
+ " [ 0.07678558]\n",
+ " [ 0.01858372]\n",
+ " [-0.02237314]\n",
+ " [ 0.09295276]\n",
+ " [-0.03099563]\n",
+ " [ 0.03906215]\n",
+ " [-0.06117437]\n",
+ " [-0.00836158]\n",
+ " [-0.0374625 ]\n",
+ " [-0.01375064]\n",
+ " [ 0.07355214]\n",
+ " [-0.02452876]\n",
+ " [ 0.03367309]\n",
+ " [ 0.0347509 ]\n",
+ " [-0.03854032]\n",
+ " [-0.03961813]\n",
+ " [-0.00189471]\n",
+ " [-0.03099563]\n",
+ " [-0.046085 ]\n",
+ " [ 0.00133873]\n",
+ " [ 0.06492964]\n",
+ " [ 0.04013997]\n",
+ " [-0.02345095]\n",
+ " [ 0.05307371]\n",
+ " [ 0.04013997]\n",
+ " [-0.02021751]\n",
+ " [ 0.01427248]\n",
+ " [-0.03422907]\n",
+ " [ 0.00672779]\n",
+ " [ 0.00457217]\n",
+ " [ 0.03043966]\n",
+ " [ 0.0519959 ]\n",
+ " [ 0.06169621]\n",
+ " [-0.00728377]\n",
+ " [ 0.00564998]\n",
+ " [ 0.05415152]\n",
+ " [-0.00836158]\n",
+ " [ 0.114509 ]\n",
+ " [ 0.06708527]\n",
+ " [-0.05578531]\n",
+ " [ 0.03043966]\n",
+ " [-0.02560657]\n",
+ " [ 0.10480869]\n",
+ " [-0.00620595]\n",
+ " [-0.04716281]\n",
+ " [-0.04824063]\n",
+ " [ 0.08540807]\n",
+ " [-0.01267283]\n",
+ " [-0.03315126]\n",
+ " [-0.00728377]\n",
+ " [-0.01375064]\n",
+ " [ 0.05954058]\n",
+ " [ 0.02181716]\n",
+ " [ 0.01858372]\n",
+ " [-0.01159501]\n",
+ " [-0.00297252]\n",
+ " [ 0.01750591]\n",
+ " [-0.02991782]\n",
+ " [-0.02021751]\n",
+ " [-0.05794093]\n",
+ " [ 0.06061839]\n",
+ " [-0.04069594]\n",
+ " [-0.07195249]\n",
+ " [-0.05578531]\n",
+ " [ 0.04552903]\n",
+ " [-0.00943939]\n",
+ " [-0.03315126]\n",
+ " [ 0.04984027]\n",
+ " [-0.08488624]\n",
+ " [ 0.00564998]\n",
+ " [ 0.02073935]\n",
+ " [-0.00728377]\n",
+ " [ 0.10480869]\n",
+ " [-0.02452876]\n",
+ " [-0.00620595]\n",
+ " [-0.03854032]\n",
+ " [ 0.13714305]\n",
+ " [ 0.17055523]\n",
+ " [ 0.00241654]\n",
+ " [ 0.03798434]\n",
+ " [-0.05794093]\n",
+ " [-0.00943939]\n",
+ " [-0.02345095]\n",
+ " [-0.0105172 ]\n",
+ " [-0.03422907]\n",
+ " [-0.00297252]\n",
+ " [ 0.06816308]\n",
+ " [ 0.00996123]\n",
+ " [ 0.00241654]\n",
+ " [-0.03854032]\n",
+ " [ 0.02612841]\n",
+ " [-0.08919748]\n",
+ " [ 0.06061839]\n",
+ " [-0.02884001]\n",
+ " [-0.02991782]\n",
+ " [-0.0191397 ]\n",
+ " [-0.04069594]\n",
+ " [ 0.01535029]\n",
+ " [-0.02452876]\n",
+ " [ 0.00133873]\n",
+ " [ 0.06924089]\n",
+ " [-0.06979687]\n",
+ " [-0.02991782]\n",
+ " [-0.046085 ]\n",
+ " [ 0.01858372]\n",
+ " [ 0.00133873]\n",
+ " [-0.03099563]\n",
+ " [-0.00405033]\n",
+ " [ 0.01535029]\n",
+ " [ 0.02289497]\n",
+ " [ 0.04552903]\n",
+ " [-0.04500719]\n",
+ " [-0.03315126]\n",
+ " [ 0.097264 ]\n",
+ " [ 0.05415152]\n",
+ " [ 0.12313149]\n",
+ " [-0.08057499]\n",
+ " [ 0.09295276]\n",
+ " [-0.05039625]\n",
+ " [-0.01159501]\n",
+ " [-0.0277622 ]\n",
+ " [ 0.05846277]\n",
+ " [ 0.08540807]\n",
+ " [-0.00081689]\n",
+ " [ 0.00672779]\n",
+ " [ 0.00888341]\n",
+ " [ 0.08001901]\n",
+ " [ 0.07139652]\n",
+ " [-0.02452876]\n",
+ " [-0.0547075 ]\n",
+ " [-0.03638469]\n",
+ " [ 0.0164281 ]\n",
+ " [ 0.07786339]\n",
+ " [-0.03961813]\n",
+ " [ 0.01103904]\n",
+ " [-0.04069594]\n",
+ " [-0.03422907]\n",
+ " [ 0.00564998]\n",
+ " [ 0.08864151]\n",
+ " [-0.03315126]\n",
+ " [-0.05686312]\n",
+ " [-0.03099563]\n",
+ " [ 0.05522933]\n",
+ " [-0.06009656]\n",
+ " [ 0.00133873]\n",
+ " [-0.02345095]\n",
+ " [-0.07410811]\n",
+ " [ 0.01966154]\n",
+ " [-0.01590626]\n",
+ " [-0.01590626]\n",
+ " [ 0.03906215]\n",
+ " [-0.0730303 ]]\n"
+ ]
+ }
+ ],
+ "source": [
+ "#Reshaping to get a 2D array\n",
+ "X = X.reshape(-1, 1)\n",
+ "print(X.shape)\n",
+ "print(X)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Split both `X` and `y` into training and test sets\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "X_train, X_test, y_train, y_test = model_selection.train_test_split(X, y, test_size=0.33)\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Select the model and fit it with the training data\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "LinearRegression() In a Jupyter environment, please rerun this cell to show the HTML representation or trust the notebook. On GitHub, the HTML representation is unable to render, please try loading this page with nbviewer.org. "
+ ],
+ "text/plain": [
+ "LinearRegression()"
+ ]
+ },
+ "execution_count": 6,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "model = linear_model.LinearRegression()\n",
+ "model.fit(X_train, y_train)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Use the test data to predict a line\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "y_pred = model.predict(X_test)\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Show the results in a plot\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png": "",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "plt.scatter(X_test, y_test, color='black')\n",
+ "plt.plot(X_test, y_pred, color='blue', linewidth=3)\n",
+ "plt.show()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
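+ "\n---\n\n**Disclaimer**: \nThis document was translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The original document in its native language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.\n"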
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.11.1"
+ },
+ "metadata": {
+ "interpreter": {
+ "hash": "70b38d7a306a849643e446cd70466270a13445e5987dfa1344ef2b127438fa4d"
+ }
+ },
+ "orig_nbformat": 2,
+ "coopTranslator": {
+ "original_hash": "16ff1a974f6e4348e869e4a7d366b86a",
+ "translation_date": "2025-09-03T19:39:45+00:00",
+ "source_file": "2-Regression/1-Tools/solution/notebook.ipynb",
+ "language_code": "zh"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
\ No newline at end of file
diff --git a/translations/zh-CN/2-Regression/2-Data/README.md b/translations/zh-CN/2-Regression/2-Data/README.md
new file mode 100644
index 000000000..1b56a35dd
--- /dev/null
+++ b/translations/zh-CN/2-Regression/2-Data/README.md
@@ -0,0 +1,217 @@
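+# Build a regression model using Scikit-learn: prepare and visualize data
+
+
+
+Infographic by [Dasani Madipalli](https://twitter.com/dasani_decoded)
+
+## [Pre-lecture quiz](https://ff-quizzes.netlify.app/en/ml/)
+
+> ### [This lesson is available in R!](../../../../2-Regression/2-Data/solution/R/lesson_2.html)
+
+## Introduction
+
+Now that you are set up to start building machine learning models with Scikit-learn, you are ready to start asking questions of your data. As you work with data and apply ML solutions, it's very important to understand how to ask the right question to properly unlock the potential of your dataset.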
+
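+In this lesson, you will learn:
+
+- How to prepare your data for model-building.
+- How to use Matplotlib for data visualization.
+
+## Asking the right question of your data
+
+The question you need answered will determine what type of ML algorithm you will use. And the quality of the answer you get back will depend heavily on the nature of your data.
+
+Take a look at the [data](https://github.com/microsoft/ML-For-Beginners/blob/main/2-Regression/data/US-pumpkins.csv) provided for this lesson. You can open this .csv file in VS Code. A quick skim shows that there are blanks and a mix of string and numeric data. There's also a strange column called 'Package' whose values are a mix of 'sacks', 'bins' and others. The data, in fact, is a bit of a mess.
+
+[ML for beginners - How to Analyze and Clean a Dataset](https://youtu.be/5qGjczWTrDQ "ML for beginners - How to Analyze and Clean a Dataset")
+
+> 🎥 Click the link above for a short video on preparing the data for this lesson.
+
+In fact, it is not very common to be handed a dataset that is completely ready to use for creating an ML model out of the box. In this lesson, you will learn how to prepare a raw dataset using standard Python libraries. You will also learn various techniques to visualize the data.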
+
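+## Case study: 'the pumpkin market'
+
+In the root `data` folder you will find a .csv file called [US-pumpkins.csv](https://github.com/microsoft/ML-For-Beginners/blob/main/2-Regression/data/US-pumpkins.csv), which includes 1757 lines of data about the market for pumpkins, sorted into groupings by city. This is raw data extracted from the [Specialty Crops Terminal Markets Standard Reports](https://www.marketnews.usda.gov/mnp/fv-report-config-step1?type=termPrice) distributed by the United States Department of Agriculture.
+
+### Preparing data
+
+This data is in the public domain. It can be downloaded in many separate files, per city, from the USDA web site. To avoid too many separate files, we have concatenated all the city data into one spreadsheet, so we have already _prepared_ the data a bit. Next, let's take a closer look at the data.
+
+### The pumpkin data - early conclusions
+
+What do you notice about this data? You may have already seen that there is a mix of strings, numbers, blanks and strange values that you need to make sense of.
+
+What question can you ask of this data using a regression technique? What about "predict the price of a pumpkin for sale during a given month"? Looking again at the data, there are some changes you need to make to create the data structure necessary for the task.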
+
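+## Exercise - analyze the pumpkin data
+
+Let's use [Pandas](https://pandas.pydata.org/) (the name stands for `Python Data Analysis`), a tool very useful for shaping data, to analyze and prepare this pumpkin data.
+
+### First, check for missing dates
+
+You will first need to take steps to check for missing dates:
+
+1. Convert the dates to a month format (these are US dates, so the format is `MM/DD/YYYY`).
+2. Extract the month to a new column.
+
+Open the _notebook.ipynb_ file in Visual Studio Code and import the spreadsheet into a new Pandas dataframe.
+
+1. Use the `head()` function to view the first five rows.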
+
+ ```python
+ import pandas as pd
+ pumpkins = pd.read_csv('../data/US-pumpkins.csv')
+ pumpkins.head()
+ ```
+
+ ✅ What function would you use to view the last five rows?
+
+1. Check if there is missing data in the current dataframe:
+
+ ```python
+ pumpkins.isnull().sum()
+ ```
+
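+ There is missing data, but maybe it won't matter for the task at hand.
+
+1. To make your dataframe easier to work with, select only the columns you need using `loc`, which extracts from the original dataframe a group of rows (passed as the first parameter) and columns (passed as the second parameter). The expression `:` in the case below means "all rows".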
+
+ ```python
+ columns_to_select = ['Package', 'Low Price', 'High Price', 'Date']
+ pumpkins = pumpkins.loc[:, columns_to_select]
+ ```
+
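+### Second, determine the average price of pumpkins
+
+Think about how to determine the average price of a pumpkin in a given month. What columns would you pick for this task? Hint: you'll need 3 columns.
+
+Solution: take the average of the `Low Price` and `High Price` columns to populate the new Price column, and convert the Date column to only show the month. Fortunately, according to the check above, there is no missing data for dates or prices.
+
+1. To calculate the average, add the following code: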
+
+ ```python
+ price = (pumpkins['Low Price'] + pumpkins['High Price']) / 2
+
+ month = pd.DatetimeIndex(pumpkins['Date']).month
+
+ ```
+
+ ✅ Feel free to print any data you'd like to check, for example with `print(month)`.
+
+2. Now, copy your converted data into a fresh Pandas dataframe:
+
+ ```python
+ new_pumpkins = pd.DataFrame({'Month': month, 'Package': pumpkins['Package'], 'Low Price': pumpkins['Low Price'],'High Price': pumpkins['High Price'], 'Price': price})
+ ```
+
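+ Printing out your dataframe will show you a clean, tidy dataset on which you can build your new regression model.
+
+### But wait! There's something odd here
+
+If you look at the `Package` column, pumpkins are sold in many different configurations. Some are sold in '1 1/9 bushel' measures, some in '1/2 bushel' measures, some per pumpkin, some per pound, and some in big boxes with varying widths.
+
+> Pumpkins seem very hard to weigh consistently
+
+Digging into the original data, anything with `Unit of Sale` equal to 'EACH' or 'PER BIN' also has a `Package` type given per inch, per bin, or 'each'. Pumpkins seem to be very hard to weigh consistently, so let's filter them by selecting only pumpkins with the string 'bushel' in their `Package` column.
+
+1. Add a filter at the top of the file, under the initial .csv import: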
+
+ ```python
+ pumpkins = pumpkins[pumpkins['Package'].str.contains('bushel', case=True, regex=True)]
+ ```
+
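+ If you print the data now, you can see that only the 415 or so rows of data containing pumpkins sold by the bushel remain.
+
+### But wait! There's one more thing to do
+
+Did you notice that the bushel amount varies per row? You need to normalize the pricing so that it shows the price per bushel, so do some math to standardize it.
+
+1. Add these lines after the block creating the new_pumpkins dataframe: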
+
+ ```python
+ new_pumpkins.loc[new_pumpkins['Package'].str.contains('1 1/9'), 'Price'] = price/(1 + 1/9)
+
+ new_pumpkins.loc[new_pumpkins['Package'].str.contains('1/2'), 'Price'] = price/(1/2)
+ ```
+
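+✅ According to [The Spruce Eats](https://www.thespruceeats.com/how-much-is-a-bushel-1389308), a bushel's weight depends on the type of produce, as it's a volume measurement. "A bushel of tomatoes, for example, is supposed to weigh 56 pounds... Leaves and greens take up more space with less weight, so a bushel of spinach is only 20 pounds." It's all pretty complicated! Let's not bother with making a bushel-to-pound conversion, and instead price by the bushel. All this study of bushels of pumpkins, however, goes to show how very important it is to understand the nature of your data!
+
+Now, you can analyze the pricing per unit based on the bushel measurement. If you print out the data one more time, you can see that it's standardized.
+
+✅ Did you notice that pumpkins sold by the half-bushel are very expensive? Can you figure out why? Hint: little pumpkins are way pricier than big ones, probably because there are many more of them per bushel, while big hollow pumpkins leave more unused space.
+
+## Visualization Strategies
+
+Part of the data scientist's role is to demonstrate the quality and nature of the data they are working with. To do this, they often create interesting visualizations, such as plots, graphs, and charts, showing different aspects of the data. In this way, they are able to visually show relationships and gaps that are otherwise hard to uncover.
+
+[ML for beginners - How to Visualize Data with Matplotlib](https://youtu.be/SbUkxH6IJo0 "ML for beginners - How to Visualize Data with Matplotlib")
+
+> 🎥 Click the link above for a short video on visualizing the data for this lesson.
+
+Visualizations can also help determine the machine learning technique most appropriate for the data. A scatterplot that seems to follow a line, for example, indicates that the data is a good candidate for a linear regression exercise.
+
+One data visualization library that works well in Jupyter notebooks is [Matplotlib](https://matplotlib.org/) (which you also saw in the previous lesson).
+
+> Get more experience with data visualization in [these tutorials](https://docs.microsoft.com/learn/modules/explore-analyze-data-with-python?WT.mc_id=academic-77952-leestott).
+
+## Exercise - experiment with Matplotlib
+
+Try to create some basic plots to display the new dataframe you just created. What would a basic line plot show?
+
+1. Import Matplotlib at the top of the file, under the Pandas import: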
+
+ ```python
+ import matplotlib.pyplot as plt
+ ```
+
+1. Rerun the entire notebook to refresh.
+1. At the bottom of the notebook, add a cell to plot the data as a scatter plot:
+
+ ```python
+ price = new_pumpkins.Price
+ month = new_pumpkins.Month
+ plt.scatter(price, month)
+ plt.show()
+ ```
+
+ 
+
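+ Is this a useful plot? Does anything about it surprise you?
+
+ It's not particularly useful, as all it does is display your data as a spread of points in a given month.
+
+### Make it useful
+
+To get charts to display useful data, you usually need to group the data somehow. Let's try creating a plot where the x axis shows the months and the bar heights show the average price for each month.
+
+1. Add a cell to create a grouped bar chart: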
+
+ ```python
+ new_pumpkins.groupby(['Month'])['Price'].mean().plot(kind='bar')
+ plt.ylabel("Pumpkin Price")
+ ```
+
+ 
+
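+ This is a more useful data visualization! It seems to indicate that the highest price for pumpkins occurs in September and October. Does that meet your expectation? Why or why not?
+
+---
+
+## 🚀Challenge
+
+Explore the different types of visualization that Matplotlib offers. Which types are most appropriate for regression problems?
+
+## [Post-lecture quiz](https://ff-quizzes.netlify.app/en/ml/)
+
+## Review & Self Study
+
+Take a look at the many ways to visualize data. Make a list of the various libraries available and note which are best for given types of tasks, for example 2D visualizations vs. 3D visualizations. What do you discover?
+
+## Assignment
+
+[Exploring Visualizations](assignment.md)
+
+---
+
+**Disclaimer**:
+This document was translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The original document in its native language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.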
\ No newline at end of file
diff --git a/translations/zh-CN/2-Regression/2-Data/assignment.md b/translations/zh-CN/2-Regression/2-Data/assignment.md
new file mode 100644
index 000000000..aa863f272
--- /dev/null
+++ b/translations/zh-CN/2-Regression/2-Data/assignment.md
@@ -0,0 +1,14 @@
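+# Exploring Visualizations
+
+There are several different libraries available for data visualization. Using the pumpkin data from this lesson, create some visualizations with matplotlib and seaborn in a sample notebook. Which libraries are easier to work with?
+
+## Rubric
+
+| Criteria | Exemplary | Adequate | Needs Improvement |
+| -------- | --------- | -------- | ----------------- |
+| | A notebook is submitted with two explorations/visualizations | A notebook is submitted with one exploration/visualization | A notebook is not submitted |
+
+---
+
+**Disclaimer**:
+This document was translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The original document in its native language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.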
\ No newline at end of file
diff --git a/translations/zh-CN/2-Regression/2-Data/notebook.ipynb b/translations/zh-CN/2-Regression/2-Data/notebook.ipynb
new file mode 100644
index 000000000..4746353e3
--- /dev/null
+++ b/translations/zh-CN/2-Regression/2-Data/notebook.ipynb
@@ -0,0 +1,46 @@
+{
+ "metadata": {
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.8.3-final"
+ },
+ "orig_nbformat": 2,
+ "kernelspec": {
+ "name": "python3",
+ "display_name": "Python 3",
+ "language": "python"
+ },
+ "coopTranslator": {
+ "original_hash": "1b2ab303ac6c604a34c6ca7a49077fc7",
+ "translation_date": "2025-09-03T19:44:44+00:00",
+ "source_file": "2-Regression/2-Data/notebook.ipynb",
+ "language_code": "zh"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2,
+ "cells": [
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
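+ "\n---\n\n**Disclaimer**: \nThis document was translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The original document in its native language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.\n"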
+ ]
+ }
+ ]
+}
\ No newline at end of file
diff --git a/translations/zh-CN/2-Regression/2-Data/solution/Julia/README.md b/translations/zh-CN/2-Regression/2-Data/solution/Julia/README.md
new file mode 100644
index 000000000..f30fc4eeb
--- /dev/null
+++ b/translations/zh-CN/2-Regression/2-Data/solution/Julia/README.md
@@ -0,0 +1,6 @@
+
+
+---
+
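+**Disclaimer**:
+This document was translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The original document in its native language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.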
\ No newline at end of file
diff --git a/translations/zh-CN/2-Regression/2-Data/solution/R/lesson_2-R.ipynb b/translations/zh-CN/2-Regression/2-Data/solution/R/lesson_2-R.ipynb
new file mode 100644
index 000000000..9cb696e33
--- /dev/null
+++ b/translations/zh-CN/2-Regression/2-Data/solution/R/lesson_2-R.ipynb
@@ -0,0 +1,670 @@
+{
+ "nbformat": 4,
+ "nbformat_minor": 2,
+ "metadata": {
+ "colab": {
+ "name": "lesson_2-R.ipynb",
+ "provenance": [],
+ "collapsed_sections": [],
+ "toc_visible": true
+ },
+ "kernelspec": {
+ "name": "ir",
+ "display_name": "R"
+ },
+ "language_info": {
+ "name": "R"
+ },
+ "coopTranslator": {
+ "original_hash": "f3c335f9940cfd76528b3ef918b9b342",
+ "translation_date": "2025-09-03T19:50:05+00:00",
+ "source_file": "2-Regression/2-Data/solution/R/lesson_2-R.ipynb",
+ "language_code": "zh"
+ }
+ },
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "source": [
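+ "# Build a regression model: prepare and visualize data\n",
+ "\n",
+ "## **Linear Regression for Pumpkins - Lesson 2**\n",
+ "#### Introduction\n",
+ "\n",
+ "Now that you are set up to start building machine learning models with Tidymodels and the Tidyverse, you are ready to start asking questions of your data. As you work with data and apply ML solutions, it's very important to understand how to ask the right question to properly unlock the potential of your dataset.\n",
+ "\n",
+ "In this lesson, you will learn:\n",
+ "\n",
+ "- How to prepare your data for model-building.\n",
+ "\n",
+ "- How to use `ggplot2` for data visualization.\n",
+ "\n",
+ "The question you need answered will determine what type of ML algorithm you will use. And the quality of the answer you get back will depend heavily on the nature of your data.\n",
+ "\n",
+ "Let's see this by working through a practical exercise.\n",
+ "\n",
+ "\n",
+ " \n",
+ " Artwork by @allison_horst \n",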
+ "\n",
+ "\n",
+ "\n"
+ ],
+ "metadata": {
+ "id": "Pg5aexcOPqAZ"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
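+ "## 1. Importing pumpkins data and summoning the Tidyverse\n",
+ "\n",
+ "We'll need the following packages for the analysis and processing in this lesson:\n",
+ "\n",
+ "- `tidyverse`: The [tidyverse](https://www.tidyverse.org/) is a [collection of R packages](https://www.tidyverse.org/packages) designed to make data science faster, easier and more fun!\n",
+ "\n",
+ "You can install them with:\n",
+ "\n",
+ "`install.packages(c(\"tidyverse\"))`\n",
+ "\n",
+ "The script below checks whether you have the packages required to complete this module and installs them for you if any are missing.\n"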
+ ],
+ "metadata": {
+ "id": "dc5WhyVdXAjR"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "suppressWarnings(if(!require(\"pacman\")) install.packages(\"pacman\"))\n",
+ "pacman::p_load(tidyverse)"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "GqPYUZgfXOBt"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Now, let's fire up some packages and load the [data](https://github.com/microsoft/ML-For-Beginners/blob/main/2-Regression/data/US-pumpkins.csv) provided for this lesson!\n"
+ ],
+ "metadata": {
+ "id": "kvjDTPDSXRr2"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Load the core Tidyverse packages\n",
+ "library(tidyverse)\n",
+ "\n",
+ "# Import the pumpkins data\n",
+ "pumpkins <- read_csv(file = \"https://raw.githubusercontent.com/microsoft/ML-For-Beginners/main/2-Regression/data/US-pumpkins.csv\")\n",
+ "\n",
+ "\n",
+ "# Get a glimpse and dimensions of the data\n",
+ "glimpse(pumpkins)\n",
+ "\n",
+ "\n",
+ "# Print the first 50 rows of the data set\n",
+ "pumpkins %>% \n",
+ " slice_head(n =50)"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "VMri-t2zXqgD"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
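+ "A quick `glimpse()` immediately shows that there are blanks and a mix of strings (`chr`) and numeric data (`dbl`). The `Date` is of type character, and there's also a strange column called `Package` where the data is a mix of `sacks`, `bins` and other values. The data, in fact, is a bit of a mess 😤.\n",
+ "\n",
+ "In fact, it is not very common to be handed a dataset that is completely ready to use for creating an ML model out of the box. But worry not, in this lesson you will learn how to prepare a raw dataset using standard R libraries 🧑🔧. You will also learn various techniques to visualize the data. 📈📊\n",
+ " \n",
+ "\n",
+ "> A refresher: The pipe operator (`%>%`) performs operations in logical sequence by passing an object forward into a function or call expression. You can think of the pipe operator as saying \"and then\" in your code.\n"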
+ ],
+ "metadata": {
+ "id": "REWcIv9yX29v"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
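+ "## 2. Check for missing data\n",
+ "\n",
+ "One of the most common issues data scientists need to deal with is incomplete or missing data. R represents missing or unknown values with a special sentinel value: `NA` (Not Available).\n",
+ "\n",
+ "So how would we know that the data frame contains missing values?\n",
+ " \n",
+ "- One straightforward way is to use the base R function `anyNA`, which returns the logical object `TRUE` or `FALSE`.\n"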
+ ],
+ "metadata": {
+ "id": "Zxfb3AM5YbUe"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "pumpkins %>% \n",
+ " anyNA()"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "G--DQutAYltj"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
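+ "Great, there seems to be some missing data! That's a good place to start.\n",
+ "\n",
+ "- Another way is to use the function `is.na()`, which indicates which individual column elements are missing with a logical `TRUE`.\n"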
+ ],
+ "metadata": {
+ "id": "mU-7-SB6YokF"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "pumpkins %>% \n",
+ " is.na() %>% \n",
+ " head(n = 7)"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "W-DxDOR4YxSW"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
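+ "With a data frame this large, reviewing all of the rows and columns individually would be inefficient and practically impossible 😴.\n",
+ "\n",
+ "- A more intuitive way is to calculate the sum of the missing values for each column:\n"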
+ ],
+ "metadata": {
+ "id": "xUWxipKYY0o7"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "pumpkins %>% \n",
+ " is.na() %>% \n",
+ " colSums()"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "ZRBWV6P9ZArL"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
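+ "Much better! There is missing data, but maybe it won't matter for the task at hand. Let's see what further analysis brings forth.\n",
+ "\n",
+ "> Along with its powerful collection of packages and functions, R has very good documentation. For instance, use `help(colSums)` or `?colSums` to find out more about the function.\n"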
+ ],
+ "metadata": {
+ "id": "9gv-crB6ZD1Y"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "## 3. Dplyr: A Grammar of Data Manipulation\n",
+ "\n",
+ "\n",
+ " \n",
+ " Artwork by @allison_horst \n",
+ "\n",
+ "\n",
+ "\n"
+ ],
+ "metadata": {
+ "id": "o4jLY5-VZO2C"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
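+ "[`dplyr`](https://dplyr.tidyverse.org/), a package in the Tidyverse, is a grammar of data manipulation that provides a consistent set of verbs to help you solve the most common data manipulation challenges. In this section, we'll explore some of dplyr's verbs! \n",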
+ " \n"
+ ],
+ "metadata": {
+ "id": "i5o33MQBZWWw"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
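+ "#### dplyr::select()\n",
+ "\n",
+ "`select()` is a function in the package `dplyr` which helps you pick columns to keep or exclude.\n",
+ "\n",
+ "To make your data frame easier to work with, drop several of its columns using `select()`, keeping only the columns you need.\n",
+ "\n",
+ "For instance, in this exercise, our analysis will involve the columns `Package`, `Low Price`, `High Price` and `Date`. Let's select these columns.\n"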
+ ],
+ "metadata": {
+ "id": "x3VGMAGBZiUr"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Select desired columns\n",
+ "pumpkins <- pumpkins %>% \n",
+ " select(Package, `Low Price`, `High Price`, Date)\n",
+ "\n",
+ "\n",
+ "# Print data set\n",
+ "pumpkins %>% \n",
+ " slice_head(n = 5)"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "F_FgxQnVZnM0"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
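+ "#### dplyr::mutate()\n",
+ "\n",
+ "`mutate()` is a function in the package `dplyr` which helps you create or modify columns while keeping the existing columns.\n",
+ "\n",
+ "The general structure of `mutate` is:\n",
+ "\n",
+ "`data %>% mutate(new_column_name = what_it_contains)`\n",
+ "\n",
+ "Let's take `mutate` out for a spin using the `Date` column by doing the following operations:\n",
+ "\n",
+ "1. Convert the dates (currently of type character) to a month format (these are US dates, so the format is `MM/DD/YYYY`).\n",
+ "\n",
+ "2. Extract the month from the dates to a new column.\n",
+ "\n",
+ "In R, the package [lubridate](https://lubridate.tidyverse.org/) makes it easier to work with date-time data. So, let's use `dplyr::mutate()`, `lubridate::mdy()` and `lubridate::month()` to achieve the above objectives. We can drop the `Date` column since we won't need it in subsequent operations.\n"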
+ ],
+ "metadata": {
+ "id": "2KKo0Ed9Z1VB"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Load lubridate\n",
+ "library(lubridate)\n",
+ "\n",
+ "pumpkins <- pumpkins %>% \n",
+ " # Convert the Date column to a date object\n",
+ " mutate(Date = mdy(Date)) %>% \n",
+ " # Extract month from Date\n",
+ " mutate(Month = month(Date)) %>% \n",
+ " # Drop Date column\n",
+ " select(-Date)\n",
+ "\n",
+ "# View the first few rows\n",
+ "pumpkins %>% \n",
+ " slice_head(n = 7)"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "5joszIVSZ6xe"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
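+ "Woohoo! 🤩\n",
+ "\n",
+ "Next, let's create a new column `Price`, which represents the average price of a pumpkin. We'll take the average of the `Low Price` and `High Price` columns to populate the new Price column.\n",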
+ " \n"
+ ],
+ "metadata": {
+ "id": "nIgLjNMCZ-6Y"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Create a new column Price\n",
+ "pumpkins <- pumpkins %>% \n",
+ " mutate(Price = (`Low Price` + `High Price`)/2)\n",
+ "\n",
+ "# View the first few rows of the data\n",
+ "pumpkins %>% \n",
+ " slice_head(n = 5)"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "Zo0BsqqtaJw2"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
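+ "Yeees! 💪\n",
+ "\n",
+ "\"But wait!\", you'll say after skimming through the whole data set with `View(pumpkins)`, \"There's something odd here!\" 🤔\n",
+ "\n",
+ "If you look at the `Package` column, pumpkins are sold in many different configurations. Some are sold in `1 1/9 bushel` measures, some in `1/2 bushel` measures, some per pumpkin, some per pound, and some in big boxes with varying widths.\n",
+ "\n",
+ "Let's verify this:\n"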
+ ],
+ "metadata": {
+ "id": "p77WZr-9aQAR"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Verify the distinct observations in Package column\n",
+ "pumpkins %>% \n",
+ " distinct(Package)"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "XISGfh0IaUy6"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
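+ "Fantastic! 👏\n",
+ "\n",
+ "Pumpkins seem to be very hard to weigh consistently, so let's filter them by selecting only the pumpkins with the string *bushel* in the `Package` column, and put the result in a new data frame `new_pumpkins`.\n"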
+ ],
+ "metadata": {
+ "id": "7sMjiVujaZxY"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "#### dplyr::filter() and stringr::str_detect()\n",
+ "\n",
+ "[`dplyr::filter()`](https://dplyr.tidyverse.org/reference/filter.html): creates a subset of the data containing only the **rows** that satisfy your conditions, in this case pumpkins with the string *bushel* in the `Package` column.\n",
+ "\n",
+ "[stringr::str_detect()](https://stringr.tidyverse.org/reference/str_detect.html): detects the presence or absence of a pattern in a string.\n",
+ "\n",
+ "The [`stringr`](https://github.com/tidyverse/stringr) package provides simple functions for common string operations.\n"
+ ],
+ "metadata": {
+ "id": "L8Qfcs92ageF"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Retain only pumpkins with \"bushel\"\n",
+ "new_pumpkins <- pumpkins %>% \n",
+ " filter(str_detect(Package, \"bushel\"))\n",
+ "\n",
+ "# Get the dimensions of the new data\n",
+ "dim(new_pumpkins)\n",
+ "\n",
+ "# View a few rows of the new data\n",
+ "new_pumpkins %>% \n",
+ " slice_head(n = 5)"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "hy_SGYREampd"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "You can see that we have narrowed the data down to 415 or so rows containing pumpkins sold by the bushel. 🤩 \n"
+ ],
+ "metadata": {
+ "id": "VrDwF031avlR"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
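+ "#### dplyr::case_when()\n",
+ "\n",
+ "**But wait! There's one more thing to do**\n",
+ "\n",
+ "Did you notice that the bushel amount varies per row? You need to normalize the pricing so that you show the price per bushel, not per 1 1/9 or 1/2 bushel. Time to do some math to standardize it.\n",
+ "\n",
+ "We'll use the function [`case_when()`](https://dplyr.tidyverse.org/reference/case_when.html) to *mutate* the Price column depending on some conditions. `case_when` allows you to vectorise multiple `if_else()` statements.\n"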
+ ],
+ "metadata": {
+ "id": "mLpw2jH4a0tx"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Convert the price if the Package contains fractional bushel values\n",
+ "new_pumpkins <- new_pumpkins %>% \n",
+ " mutate(Price = case_when(\n",
+ " str_detect(Package, \"1 1/9\") ~ Price/(1 + 1/9),\n",
+ " str_detect(Package, \"1/2\") ~ Price/(1/2),\n",
+ " TRUE ~ Price))\n",
+ "\n",
+ "# View the first few rows of the data\n",
+ "new_pumpkins %>% \n",
+ " slice_head(n = 30)"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "P68kLVQmbM6I"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
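+ "Now we can analyze the pricing per unit based on the bushel measurement. All this study of bushels of pumpkins, however, goes to show how very `important` it is to `understand the nature of your data`!\n",
+ "\n",
+ "> ✅ According to [The Spruce Eats](https://www.thespruceeats.com/how-much-is-a-bushel-1389308), a bushel's weight depends on the type of produce, as it's a volume measurement. “A bushel of tomatoes, for example, is supposed to weigh 56 pounds... Leaves and greens take up more space with less weight, so a bushel of spinach is only 20 pounds.” It's all pretty complicated! Let's not bother with making a bushel-to-pound conversion, and instead price by the bushel.\n",
+ "\n",
+ "> ✅ Did you notice that pumpkins sold by the half-bushel are very expensive? Can you figure out why? Hint: little pumpkins are way pricier than big ones, probably because there are so many more of them per bushel, given the unused space taken by one big hollow pie pumpkin.\n"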
+ ],
+ "metadata": {
+ "id": "pS2GNPagbSdb"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
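+ "Now lastly, for the sheer sake of adventure 💁‍♀️, let's also move the `Month` column to the first position, i.e. before the `Package` column.\n",
+ "\n",
+ "`dplyr::relocate()` is used to change column positions.\n"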
+ ],
+ "metadata": {
+ "id": "qql1SowfbdnP"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Move the Month column so that it comes before Package\n",
+ "new_pumpkins <- new_pumpkins %>% \n",
+ " relocate(Month, .before = Package)\n",
+ "\n",
+ "new_pumpkins %>% \n",
+ " slice_head(n = 7)"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "JJ1x6kw8bixF"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Good job! 👌 You now have a clean, tidy data set on which you can build your new regression model! \n"
+ ],
+ "metadata": {
+ "id": "y8TJ0Za_bn5Y"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
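+ "## 4. Data visualization with ggplot2\n",
+ "\n",
+ "Infographic by Dasani Madipalli\n",
+ "\n",
+ "There is a *wise* saying that goes like this:\n",
+ "\n",
+ "> “The simple graph has brought more information to the data analyst's mind than any other device.” --- John Tukey\n",
+ "\n",
+ "Part of the data scientist's role is to demonstrate the quality and nature of the data they are working with. To do this, they often create interesting visualizations, such as plots, graphs, and charts, showing different aspects of the data. In this way, they are able to visually show relationships and gaps that are otherwise hard to uncover.\n",
+ "\n",
+ "Visualizations can also help determine the machine learning technique most appropriate for the data. A scatter plot that seems to follow a line, for example, indicates that the data is a good candidate for a linear regression exercise.\n",
+ "\n",
+ "R offers several systems for making graphs, but [`ggplot2`](https://ggplot2.tidyverse.org/index.html) is one of the most elegant and most versatile. `ggplot2` allows you to compose graphs by **combining independent components**.\n",
+ "\n",
+ "Let's start with a simple scatter plot for the Price and Month columns.\n",
+ "\n",
+ "In this case, we'll start with [`ggplot()`](https://ggplot2.tidyverse.org/reference/ggplot.html), supply a dataset and an aesthetic mapping (with [`aes()`](https://ggplot2.tidyverse.org/reference/aes.html)), then add layers (such as [`geom_point()`](https://ggplot2.tidyverse.org/reference/geom_point.html) for scatter plots).\n"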
+ ],
+ "metadata": {
+ "id": "mYSH6-EtbvNa"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Set a theme for the plots\n",
+ "theme_set(theme_light())\n",
+ "\n",
+ "# Create a scatter plot\n",
+ "p <- ggplot(data = new_pumpkins, aes(x = Price, y = Month))\n",
+ "p + geom_point()"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "g2YjnGeOcLo4"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
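+ "Is this a useful plot 🤷? Does anything about it surprise you?\n",
+ "\n",
+ "It's not particularly useful, as all it does is display your data as a spread of points for each month.\n",
+ " \n"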
+ ],
+ "metadata": {
+ "id": "Ml7SDCLQcPvE"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
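+ "### **How do we make it useful?**\n",
+ "\n",
+ "To get charts to display useful data, you usually need to group the data somehow. For instance, in our case, finding the average price of pumpkins for each month would provide more insight into the underlying patterns in our data. This leads us to one more **dplyr** feature:\n",
+ "\n",
+ "#### `dplyr::group_by() %>% summarize()`\n",
+ "\n",
+ "Grouped aggregation in R can be easily computed using\n",
+ "\n",
+ "`dplyr::group_by() %>% summarize()`\n",
+ "\n",
+ "- `dplyr::group_by()` changes the unit of analysis from the complete dataset to individual groups, such as per month.\n",
+ "\n",
+ "- `dplyr::summarize()` creates a new data frame with one column for each grouping variable and one column for each of the summary statistics that you have specified.\n",
+ "\n",
+ "For example, we can use `dplyr::group_by() %>% summarize()` to group the pumpkins by the **Month** column and then find the **mean price** for each month.\n"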
+ ],
+ "metadata": {
+ "id": "jMakvJZIcVkh"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Find the average price of pumpkins per month\r\n",
+ "new_pumpkins %>%\r\n",
+ " group_by(Month) %>% \r\n",
+ " summarise(mean_price = mean(Price))"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "6kVSUa2Bcilf"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
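+ "Succinct! ✨\n",
+ "\n",
+ "Categorical features such as months are better represented using a bar plot 📊. The layers responsible for bar charts are `geom_bar()` and `geom_col()`. Consult `?geom_bar` to find out more.\n",
+ "\n",
+ "Let's whip one up!\n"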
+ ],
+ "metadata": {
+ "id": "Kds48GUBcj3W"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Find the average price of pumpkins per month then plot a bar chart\r\n",
+ "new_pumpkins %>%\r\n",
+ " group_by(Month) %>% \r\n",
+ " summarise(mean_price = mean(Price)) %>% \r\n",
+ " ggplot(aes(x = Month, y = mean_price)) +\r\n",
+ " geom_col(fill = \"midnightblue\", alpha = 0.7) +\r\n",
+ " ylab(\"Pumpkin Price\")"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "VNbU1S3BcrxO"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
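+ "🤩🤩 This is a more useful data visualization! It seems to indicate that the highest price for pumpkins occurs in September and October. Does that meet your expectation? Why or why not?\n",
+ "\n",
+ "Congratulations on finishing the second lesson 👏! You prepared your data for model building, then uncovered more insights using visualizations!\n"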
+ ],
+ "metadata": {
+ "id": "zDm0VOzzcuzR"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
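+ "\n---\n\n**Disclaimer**: \nThis document has been translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The original document in its native language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.\n"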
+ ]
+ }
+ ]
+}
\ No newline at end of file
diff --git a/translations/zh-CN/2-Regression/2-Data/solution/notebook.ipynb b/translations/zh-CN/2-Regression/2-Data/solution/notebook.ipynb
new file mode 100644
index 000000000..3e189f692
--- /dev/null
+++ b/translations/zh-CN/2-Regression/2-Data/solution/notebook.ipynb
@@ -0,0 +1,437 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " City Name Type Package Variety Sub Variety Grade \\\n",
+ "70 BALTIMORE NaN 1 1/9 bushel cartons PIE TYPE NaN NaN \n",
+ "71 BALTIMORE NaN 1 1/9 bushel cartons PIE TYPE NaN NaN \n",
+ "72 BALTIMORE NaN 1 1/9 bushel cartons PIE TYPE NaN NaN \n",
+ "73 BALTIMORE NaN 1 1/9 bushel cartons PIE TYPE NaN NaN \n",
+ "74 BALTIMORE NaN 1 1/9 bushel cartons PIE TYPE NaN NaN \n",
+ "\n",
+ " Date Low Price High Price Mostly Low ... Unit of Sale Quality \\\n",
+ "70 9/24/16 15.0 15.0 15.0 ... NaN NaN \n",
+ "71 9/24/16 18.0 18.0 18.0 ... NaN NaN \n",
+ "72 10/1/16 18.0 18.0 18.0 ... NaN NaN \n",
+ "73 10/1/16 17.0 17.0 17.0 ... NaN NaN \n",
+ "74 10/8/16 15.0 15.0 15.0 ... NaN NaN \n",
+ "\n",
+ " Condition Appearance Storage Crop Repack Trans Mode Unnamed: 24 \\\n",
+ "70 NaN NaN NaN NaN N NaN NaN \n",
+ "71 NaN NaN NaN NaN N NaN NaN \n",
+ "72 NaN NaN NaN NaN N NaN NaN \n",
+ "73 NaN NaN NaN NaN N NaN NaN \n",
+ "74 NaN NaN NaN NaN N NaN NaN \n",
+ "\n",
+ " Unnamed: 25 \n",
+ "70 NaN \n",
+ "71 NaN \n",
+ "72 NaN \n",
+ "73 NaN \n",
+ "74 NaN \n",
+ "\n",
+ "[5 rows x 26 columns]"
+ ]
+ },
+ "execution_count": 2,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "import pandas as pd\n",
+ "import matplotlib.pyplot as plt\n",
+ "pumpkins = pd.read_csv('../../data/US-pumpkins.csv')\n",
+ "\n",
+ "pumpkins = pumpkins[pumpkins['Package'].str.contains('bushel', case=True, regex=True)]\n",
+ "\n",
+ "pumpkins.head()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "City Name 0\n",
+ "Type 406\n",
+ "Package 0\n",
+ "Variety 0\n",
+ "Sub Variety 167\n",
+ "Grade 415\n",
+ "Date 0\n",
+ "Low Price 0\n",
+ "High Price 0\n",
+ "Mostly Low 24\n",
+ "Mostly High 24\n",
+ "Origin 0\n",
+ "Origin District 396\n",
+ "Item Size 114\n",
+ "Color 145\n",
+ "Environment 415\n",
+ "Unit of Sale 404\n",
+ "Quality 415\n",
+ "Condition 415\n",
+ "Appearance 415\n",
+ "Storage 415\n",
+ "Crop 415\n",
+ "Repack 0\n",
+ "Trans Mode 415\n",
+ "Unnamed: 24 415\n",
+ "Unnamed: 25 391\n",
+ "dtype: int64"
+ ]
+ },
+ "execution_count": 3,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "pumpkins.isnull().sum()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ " Month Package Low Price High Price Price\n",
+ "70 9 1 1/9 bushel cartons 15.00 15.0 13.50\n",
+ "71 9 1 1/9 bushel cartons 18.00 18.0 16.20\n",
+ "72 10 1 1/9 bushel cartons 18.00 18.0 16.20\n",
+ "73 10 1 1/9 bushel cartons 17.00 17.0 15.30\n",
+ "74 10 1 1/9 bushel cartons 15.00 15.0 13.50\n",
+ "... ... ... ... ... ...\n",
+ "1738 9 1/2 bushel cartons 15.00 15.0 30.00\n",
+ "1739 9 1/2 bushel cartons 13.75 15.0 28.75\n",
+ "1740 9 1/2 bushel cartons 10.75 15.0 25.75\n",
+ "1741 9 1/2 bushel cartons 12.00 12.0 24.00\n",
+ "1742 9 1/2 bushel cartons 12.00 12.0 24.00\n",
+ "\n",
+ "[415 rows x 5 columns]\n"
+ ]
+ }
+ ],
+ "source": [
+ "\n",
+ "# A set of new columns for a new dataframe. Filter out nonmatching columns\n",
+ "columns_to_select = ['Package', 'Low Price', 'High Price', 'Date']\n",
+ "pumpkins = pumpkins.loc[:, columns_to_select]\n",
+ "\n",
+ "# Get an average between low and high price for the base pumpkin price\n",
+ "price = (pumpkins['Low Price'] + pumpkins['High Price']) / 2\n",
+ "\n",
+ "# Convert the date to its month only\n",
+ "month = pd.DatetimeIndex(pumpkins['Date']).month\n",
+ "\n",
+ "# Create a new dataframe with this basic data\n",
+ "new_pumpkins = pd.DataFrame({'Month': month, 'Package': pumpkins['Package'], 'Low Price': pumpkins['Low Price'],'High Price': pumpkins['High Price'], 'Price': price})\n",
+ "\n",
+ "# Convert the price if the Package contains fractional bushel values\n",
+ "new_pumpkins.loc[new_pumpkins['Package'].str.contains('1 1/9'), 'Price'] = price/(1 + 1/9)\n",
+ "\n",
+ "new_pumpkins.loc[new_pumpkins['Package'].str.contains('1/2'), 'Price'] = price/(1/2)\n",
+ "\n",
+ "print(new_pumpkins)\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png": "",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "\n",
+ "price = new_pumpkins.Price\n",
+ "month = new_pumpkins.Month\n",
+ "plt.scatter(price, month)\n",
+ "plt.show()\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "Text(0, 0.5, 'Pumpkin Price')"
+ ]
+ },
+ "execution_count": 6,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png": "",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "\n",
+ "new_pumpkins.groupby(['Month'])['Price'].mean().plot(kind='bar')\n",
+ "plt.ylabel(\"Pumpkin Price\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
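+ "\n---\n\n**Disclaimer**: \nThis document has been translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The original document in its native language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.\n"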
+ ]
+ }
+ ],
+ "metadata": {
+ "interpreter": {
+ "hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6"
+ },
+ "kernelspec": {
+ "display_name": "Python 3.7.0 64-bit ('3.7')",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.11.1"
+ },
+ "metadata": {
+ "interpreter": {
+ "hash": "70b38d7a306a849643e446cd70466270a13445e5987dfa1344ef2b127438fa4d"
+ }
+ },
+ "orig_nbformat": 2,
+ "coopTranslator": {
+ "original_hash": "95726f0b8283628d5356a4f8eb8b4b76",
+ "translation_date": "2025-09-03T19:45:01+00:00",
+ "source_file": "2-Regression/2-Data/solution/notebook.ipynb",
+ "language_code": "zh"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
\ No newline at end of file
diff --git a/translations/zh-CN/2-Regression/3-Linear/README.md b/translations/zh-CN/2-Regression/3-Linear/README.md
new file mode 100644
index 000000000..607e1e518
--- /dev/null
+++ b/translations/zh-CN/2-Regression/3-Linear/README.md
@@ -0,0 +1,373 @@
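+# Build a regression model using Scikit-learn: regression four ways
+
+
+> Infographic by [Dasani Madipalli](https://twitter.com/dasani_decoded)
+## [Pre-lecture quiz](https://ff-quizzes.netlify.app/en/ml/)
+
+> ### [This lesson is available in R!](../../../../2-Regression/3-Linear/solution/R/lesson_3.html)
+### Introduction
+
+So far you have explored what regression is using sample data from the pumpkin pricing dataset, and you have visualized it with Matplotlib.
+
+Now you are ready to dive deeper into regression for ML. While visualization helps you make sense of data, the real power of machine learning comes from _training models_. A model is trained on historical data to capture data dependencies automatically, which lets it predict outcomes for new data it has not seen before.
+
+In this lesson, you will learn more about two types of regression: _basic linear regression_ and _polynomial regression_, along with some of the math underlying these techniques. These models will let us predict pumpkin prices from different input data.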
+
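+[ML for beginners - Understanding Linear Regression](https://youtu.be/CRxFT8oTDMg "ML for beginners - Understanding Linear Regression")
+
+> 🎥 Click the link above for a short video overview of linear regression.
+
+> Throughout this curriculum, we assume minimal knowledge of math and try to make the material accessible for students coming from other fields, so watch for notes, 🧮 callouts, diagrams, and other learning tools to aid comprehension.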
+
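+### Prerequisite
+
+You should be familiar by now with the structure of the pumpkin data that we are examining. You can find it preloaded and pre-cleaned in this lesson's _notebook.ipynb_ file. In the file, the pumpkin price is displayed per bushel in a new data frame. Make sure you can run these notebooks in kernels in Visual Studio Code.
+
+### Preparation
+
+As a reminder, you are loading this data so as to ask questions of it:
+
+- When is the best time to buy pumpkins?
+- What price can I expect for a case of miniature pumpkins?
+- Should I buy them in half-bushel baskets or in 1 1/9 bushel boxes?
+
+Let's keep digging into this data.
+
+In the previous lesson, you created a Pandas data frame and populated it with part of the original dataset, standardizing the pricing by the bushel. By doing that, however, you were only able to gather about 400 data points, and only for the fall months.
+
+Take a look at the data preloaded in this lesson's accompanying notebook. The data is preloaded and an initial scatter plot is charted to show month data. Maybe we can get a little more detail about the nature of the data by cleaning it further.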
+
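+## A linear regression line
+
+As you learned in Lesson 1, the goal of a linear regression exercise is to be able to plot a line to:
+
+- **Show variable relationships**. Show the relationship between variables
+- **Make predictions**. Make accurate predictions about where a new data point would fall in relation to that line
+
+It is typical of **Least-Squares Regression** to draw this type of line. The term “least-squares” means that the error of each data point (its vertical distance from the regression line) is squared and the squares are then added up. Ideally, that final sum is as small as possible, because we want a low amount of error, or `least-squares`.
+
+We do so because we want to model a line that has the least cumulative distance from all of our data points. We square the terms before adding them because we are concerned with the magnitude of each error rather than its direction.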
+
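+> **🧮 Show me the math**
+>
+> This line, called the _line of best fit_, can be expressed by [an equation](https://en.wikipedia.org/wiki/Simple_linear_regression):
+>
+> ```
+> Y = a + bX
+> ```
+>
+> `X` is the “explanatory variable”. `Y` is the “dependent variable”. The slope of the line is `b`, and `a` is the y-intercept, which refers to the value of `Y` when `X = 0`.
+>
+>
+>
+> First, calculate the slope `b`. Infographic by [Jen Looper](https://twitter.com/jenlooper)
+>
+> In other words, and referring to our pumpkin data's original question, “predict the price of a pumpkin per bushel by month”, `X` would refer to the price and `Y` would refer to the month of sale. Note that the models trained later in this lesson use the opposite orientation: the date is the input `X` and the price is the predicted value `y`.
+>
+>
+>
+> Calculate the value of `Y`. If you're paying around $4, it must be April! Infographic by [Jen Looper](https://twitter.com/jenlooper)
+>
+> The math that calculates the line must demonstrate the slope of the line, which is also dependent on the intercept, or where `Y` is situated when `X = 0`.
+>
+> You can observe the method of calculation for these values on the [Math is Fun](https://www.mathsisfun.com/data/least-squares-regression.html) web site. Also visit [this least-squares calculator](https://www.mathsisfun.com/data/least-squares-calculator.html) to watch how the numbers' values impact the line.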
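+
+As a sketch of the calculation described above, the slope and intercept of the least-squares line for $n$ data points $(X_i, Y_i)$ are usually written as follows. The symbols $n$, $\bar{X}$ and $\bar{Y}$ (the number of points and the means of `X` and `Y`) are notation introduced here for illustration and do not appear in the lesson itself.
+
+$$
+b = \frac{\sum_{i=1}^{n} (X_i - \bar{X})(Y_i - \bar{Y})}{\sum_{i=1}^{n} (X_i - \bar{X})^2}, \qquad a = \bar{Y} - b\,\bar{X}
+$$
+
+For example, for the three made-up points $(1, 2)$, $(2, 4)$ and $(3, 5)$ we have $\bar{X} = 2$ and $\bar{Y} = 11/3$. The numerator is $3$ and the denominator is $2$, so $b = 3/2$ and $a = 11/3 - 3 = 2/3$.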
+
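+## Correlation
+
+One more term to understand is the **Correlation Coefficient** between given X and Y variables. Using a scatter plot, you can quickly visualize this coefficient. A plot with data points scattered in a neat line has high correlation, while a plot with data points scattered everywhere between X and Y has low correlation.
+
+A good linear regression model will be one whose Correlation Coefficient, obtained using the Least-Squares Regression method with a line of regression, is high in absolute value (nearer to 1 than to 0).
+
+✅ Run the notebook accompanying this lesson and look at the Month to Price scatter plot. Does the data associating Month to Price for pumpkin sales seem to have high or low correlation, according to your visual interpretation of the scatter plot? Does that change if you use a more fine-grained measure instead of `Month`, e.g. *day of the year* (i.e. the number of days since the beginning of the year)?
+
+In the code below, we will assume that we have cleaned up the data, and obtained a data frame called `new_pumpkins`, similar to the following: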
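+
+For reference, the correlation coefficient in question is commonly the Pearson coefficient $r$. The sketch below writes $\bar{X}$ and $\bar{Y}$ for the means of the two variables, a notation the lesson itself does not introduce:
+
+$$
+r = \frac{\sum_{i} (X_i - \bar{X})(Y_i - \bar{Y})}{\sqrt{\sum_{i} (X_i - \bar{X})^2}\,\sqrt{\sum_{i} (Y_i - \bar{Y})^2}}
+$$
+
+It always lies between $-1$ and $1$: values near $\pm 1$ mean the points lie close to a straight line, and values near $0$ mean there is little linear relationship.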
+
+ID | Month | DayOfYear | Variety | City | Package | Low Price | High Price | Price
+---|-------|-----------|---------|------|---------|-----------|------------|-------
+70 | 9 | 267 | PIE TYPE | BALTIMORE | 1 1/9 bushel cartons | 15.0 | 15.0 | 13.636364
+71 | 9 | 267 | PIE TYPE | BALTIMORE | 1 1/9 bushel cartons | 18.0 | 18.0 | 16.363636
+72 | 10 | 274 | PIE TYPE | BALTIMORE | 1 1/9 bushel cartons | 18.0 | 18.0 | 16.363636
+73 | 10 | 274 | PIE TYPE | BALTIMORE | 1 1/9 bushel cartons | 17.0 | 17.0 | 15.454545
+74 | 10 | 281 | PIE TYPE | BALTIMORE | 1 1/9 bushel cartons | 15.0 | 15.0 | 13.636364
+
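+> The code to clean the data is available in [`notebook.ipynb`](../../../../2-Regression/3-Linear/notebook.ipynb). We have performed the same cleaning steps as in the previous lesson, and have calculated the `DayOfYear` column using the following expression: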
+
+```python
+from datetime import datetime  # pandas is assumed to be imported as pd
+day_of_year = pd.to_datetime(pumpkins['Date']).apply(lambda dt: (dt-datetime(dt.year,1,1)).days)
+```
+
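+Now that you have an understanding of the math behind linear regression, let's create a regression model to see if we can predict which package of pumpkins will have the best pumpkin prices. Someone buying pumpkins for a holiday pumpkin patch might want this information to be able to optimize their purchases of pumpkin packages for the patch.
+
+## Looking for correlation
+
+[ML for beginners - Looking for Correlation: The Key to Linear Regression](https://youtu.be/uoRq-lW2eQo "ML for beginners - Looking for Correlation: The Key to Linear Regression")
+
+> 🎥 Click the link above for a short video overview of correlation.
+
+From the previous lesson you have probably seen that the average price for different months looks like this:
+
+
+
+This suggests that there should be some correlation, and we can try training a linear regression model to predict the relationship between `Month` and `Price`, or between `DayOfYear` and `Price`. Here is the scatter plot that shows the latter relationship:
+
+
+
+Let's see if there is a correlation using the `corr` function: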
+
+```python
+print(new_pumpkins['Month'].corr(new_pumpkins['Price']))
+print(new_pumpkins['DayOfYear'].corr(new_pumpkins['Price']))
+```
+
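+It looks like the correlation is pretty small, -0.15 by `Month` and -0.17 by `DayOfYear`, but there could be another important relationship. It looks like there are different clusters of prices corresponding to different pumpkin varieties. To confirm this hypothesis, let's plot each pumpkin category using a different color. By passing an `ax` parameter to the `scatter` plotting function we can plot all points on the same graph: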
+
+```python
+ax = None
+colors = ['red', 'blue', 'green', 'yellow']
+for i, var in enumerate(new_pumpkins['Variety'].unique()):
+    df = new_pumpkins[new_pumpkins['Variety'] == var]
+    ax = df.plot.scatter('DayOfYear', 'Price', ax=ax, c=colors[i], label=var)
+```
+
+
+
+Our investigation suggests that variety has more effect on the overall price than the actual selling date. We can see this with a bar graph:
+
+```python
+new_pumpkins.groupby('Variety')['Price'].mean().plot(kind='bar')
+```
+
+
+
+Let us focus for the moment on only one pumpkin variety, the 'pie type', and see what effect the date has on the price:
+
+```python
+pie_pumpkins = new_pumpkins[new_pumpkins['Variety']=='PIE TYPE']
+pie_pumpkins.plot.scatter('DayOfYear','Price')
+```
+
+
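+If we now calculate the correlation between `Price` and `DayOfYear` using the `corr` function, we will get something like `-0.27`, which means that training a predictive model makes sense.
+
+> Before training a linear regression model, it is important to make sure that our data is clean. Linear regression does not work well with missing values, so it makes sense to get rid of all empty cells: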
+
+```python
+pie_pumpkins = pie_pumpkins.dropna()  # reassign rather than modify a filtered slice in place
+pie_pumpkins.info()
+```
+
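+Another approach would be to fill those empty values with mean values from the corresponding column.
+
+## Simple linear regression
+
+[ML for beginners - Linear and Polynomial Regression using Scikit-learn](https://youtu.be/e4c_UP2fSjg "ML for beginners - Linear and Polynomial Regression using Scikit-learn")
+
+> 🎥 Click the link above for a short video overview of linear and polynomial regression.
+
+To train our linear regression model, we will use the **Scikit-learn** library.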
+
+```python
+import numpy as np
+import matplotlib.pyplot as plt
+from sklearn.linear_model import LinearRegression
+from sklearn.metrics import mean_squared_error
+from sklearn.model_selection import train_test_split
+```
+
+We start by separating the input values (features) and the expected output (label) into separate variables, `X` and `y`:
+
+```python
+X = pie_pumpkins['DayOfYear'].to_numpy().reshape(-1,1)
+y = pie_pumpkins['Price']
+```
+
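+> Note that we had to perform `reshape` on the input data in order for the linear regression package to understand it correctly. Linear regression expects a 2D array as an input, where each row of the array corresponds to a vector of input features. In our case, since we have only one input, we need an array with shape N×1, where N is the dataset size.
+
+Then, we need to split the data into train and test datasets, so that we can validate our model after training: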
+
+```python
+X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
+```
+
+Finally, training the actual linear regression model takes only two lines of code. We define the `LinearRegression` object, and fit it to our data using the `fit` method:
+
+```python
+lin_reg = LinearRegression()
+lin_reg.fit(X_train,y_train)
+```
+
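+After fitting, the `LinearRegression` object contains all the coefficients of the regression, which can be accessed using the `.coef_` property. In our case, there is just one coefficient, which should be around `-0.017`. It means that prices seem to drop a bit with time, but not too much, around 2 cents per day. We can also access the intersection point of the regression line with the Y-axis using `lin_reg.intercept_`; it will be around `21` in our case, indicating the price at the beginning of the year.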
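+
+Plugging the approximate values quoted above into the line equation gives a rough feel for the model. This is only a sketch with rounded coefficients, so the numbers will differ slightly from what your fitted model reports:
+
+$$
+\text{Price} \approx 21 - 0.017 \times \text{DayOfYear}, \qquad \text{e.g. } 21 - 0.017 \times 267 \approx 16.5
+$$
+
+Day 267 is taken from the sample rows shown earlier (late September).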
+
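+To see how accurate our model is, we can predict prices on a test dataset, and then measure how close our predictions are to the expected values. This can be done using the mean squared error (MSE), which is the mean of all squared differences between expected and predicted values. The code below reports its square root, the root mean squared error (RMSE), which is expressed in the same units as the price.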
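+
+Written out, with $y_i$ the expected value, $\hat{y}_i$ the prediction and $n$ the number of test samples (symbols chosen here for illustration), the mean squared error and its square root (which is what `np.sqrt` computes in the code below) are:
+
+$$
+\mathrm{MSE} = \frac{1}{n} \sum_{i=1}^{n} (y_i - \hat{y}_i)^2, \qquad \mathrm{RMSE} = \sqrt{\mathrm{MSE}}
+$$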
+
+```python
+pred = lin_reg.predict(X_test)
+
+# Root mean squared error (RMSE): the square root of MSE, in price units
+rmse = np.sqrt(mean_squared_error(y_test,pred))
+print(f'Mean error: {rmse:3.3} ({rmse/np.mean(pred)*100:3.3}%)')
+```
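+Our error seems to be around 2 points, which is about 17%. Not too good. Another indicator of model quality is the **coefficient of determination**, which can be obtained like this: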
+
+```python
+score = lin_reg.score(X_train,y_train)
+print('Model determination: ', score)
+```
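+If the value is 0, it means that the model does not take the input data into account, and acts as the *worst linear predictor*, which is simply the mean value of the result. A value of 1 means that we can perfectly predict all expected outputs. In our case, the coefficient is around 0.06, which is quite low.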
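+
+For linear regression, the value returned by `score` is the coefficient of determination $R^2$. A sketch of its definition, where $y_i$ are the expected values, $\hat{y}_i$ the predictions and $\bar{y}$ the mean of the expected values (notation assumed here, not taken from the lesson):
+
+$$
+R^2 = 1 - \frac{\sum_{i} (y_i - \hat{y}_i)^2}{\sum_{i} (y_i - \bar{y})^2}
+$$
+
+Predicting $\bar{y}$ for every sample makes the two sums equal, which gives $R^2 = 0$; perfect predictions make the numerator zero, which gives $R^2 = 1$.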
+
+We can also plot the test data together with the regression line to better see how regression works in our case:
+
+```python
+plt.scatter(X_test,y_test)
+plt.plot(X_test,pred)
+```
+
+
+
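+## Polynomial regression
+
+Another type of linear regression is polynomial regression. While sometimes there is a linear relationship between variables (the bigger the pumpkin in volume, the higher the price), sometimes these relationships cannot be plotted as a plane or a straight line.
+
+✅ Here are [some more examples](https://online.stat.psu.edu/stat501/lesson/9/9.8) of data that could use polynomial regression.
+
+Take another look at the relationship between date and price. Does this scatter plot seem like it should necessarily be analyzed by a straight line? Can't prices fluctuate? In this case, you can try polynomial regression.
+
+✅ Polynomials are mathematical expressions that might consist of one or more variables and coefficients.
+
+Polynomial regression creates a curved line to better fit nonlinear data. In our case, if we include a squared `DayOfYear` variable in the input data, we should be able to fit our data with a parabolic curve, which will have a minimum at a certain point within the year.
+
+Scikit-learn includes a helpful [pipeline API](https://scikit-learn.org/stable/modules/generated/sklearn.pipeline.make_pipeline.html?highlight=pipeline#sklearn.pipeline.make_pipeline) to combine different steps of data processing together. A **pipeline** is a chain of **estimators**. In our case, we will create a pipeline that first adds polynomial features to our model, and then trains the regression: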
+
+```python
+from sklearn.preprocessing import PolynomialFeatures
+from sklearn.pipeline import make_pipeline
+
+pipeline = make_pipeline(PolynomialFeatures(2), LinearRegression())
+
+pipeline.fit(X_train,y_train)
+```
+
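+Using `PolynomialFeatures(2)` means that we will include all polynomial terms of degree up to 2 of the input data. In our case, this adds just `DayOfYear`<sup>2</sup>, but given two input variables X and Y, it would add X<sup>2</sup>, XY and Y<sup>2</sup>. We may also use higher-degree polynomials if we want.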
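+
+With a single input, the model fitted by the pipeline can be sketched as a quadratic in $x = \text{DayOfYear}$. The coefficient names $a$, $b$ and $c$ are illustrative and are not taken from the lesson code:
+
+$$
+\text{Price} \approx a + b\,x + c\,x^2
+$$
+
+When $c > 0$, this parabola has its minimum at $x = -\dfrac{b}{2c}$.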
+
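+Pipelines can be used in the same manner as the original `LinearRegression` object, i.e. we can `fit` the pipeline, and then use `predict` to get the prediction results. Here is the graph showing the test data and the approximation curve: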
+
+
+
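+Using polynomial regression, we can get a slightly lower MSE and a slightly higher coefficient of determination, but not significantly. We need to take other features into account!
+
+> You can see that the minimal pumpkin prices are observed somewhere around Halloween. How can you explain this?
+
+🎃 Congratulations, you just created a model that can help predict the price of pie pumpkins. You could probably repeat the same procedure for all pumpkin types, but that would be tedious. Let's now learn how to take pumpkin variety into account in our model!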
+
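+## Categorical features
+
+In the ideal world, we want to be able to predict prices for different pumpkin varieties using the same model. However, the `Variety` column is somewhat different from columns like `Month`, because it contains non-numeric values. Such columns are called **categorical**.
+
+[ML for beginners - Categorical Feature Predictions with Linear Regression](https://youtu.be/DYGliioIAE0 "ML for beginners - Categorical Feature Predictions with Linear Regression")
+
+> 🎥 Click the link above for a short video overview of using categorical features.
+
+Here you can see how the average price depends on variety:
+
+
+
+To take variety into account, we first need to convert it to numeric form, or **encode** it. There are several ways we can do it:
+
+* Simple **numeric encoding** will build a table of the different varieties, and then replace each variety name by its index in that table. This is not the best idea for linear regression, because linear regression takes the actual numeric value of the index and adds it to the result, multiplied by some coefficient. In our case, the relationship between the index number and the price is clearly non-linear, even if we make sure that the indices are ordered in some specific way.
+* **One-hot encoding** will replace the `Variety` column by 4 different columns, one for each variety. Each column will contain `1` if the corresponding row is of the given variety, and `0` otherwise. This means that there will be four coefficients in linear regression, one for each pumpkin variety, responsible for the “starting price” (or rather “additional price”) for that particular variety.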
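+
+Under one-hot encoding, the linear model can be sketched as follows, where $x_k$ is the 0/1 column for variety $k$ and $b_k$ its coefficient (illustrative symbols, not names from the lesson):
+
+$$
+\text{Price} \approx a + \sum_{k=1}^{4} b_k\, x_k, \qquad x_k \in \{0, 1\}, \quad \sum_{k=1}^{4} x_k = 1
+$$
+
+Because exactly one $x_k$ equals 1 in each row, the prediction for a row of variety $k$ is simply $a + b_k$.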
+
+The code below shows how we can one-hot encode a variety:
+
+```python
+pd.get_dummies(new_pumpkins['Variety'])
+```
+
+ ID | FAIRYTALE | MINIATURE | MIXED HEIRLOOM VARIETIES | PIE TYPE
+----|-----------|-----------|--------------------------|----------
+70 | 0 | 0 | 0 | 1
+71 | 0 | 0 | 0 | 1
+... | ... | ... | ... | ...
+1738 | 0 | 1 | 0 | 0
+1739 | 0 | 1 | 0 | 0
+1740 | 0 | 1 | 0 | 0
+1741 | 0 | 1 | 0 | 0
+1742 | 0 | 1 | 0 | 0
+
+To train linear regression using the one-hot encoded variety as input, we just need to initialize the `X` and `y` data correctly:
+
+```python
+X = pd.get_dummies(new_pumpkins['Variety'])
+y = new_pumpkins['Price']
+```
+
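+The rest of the code is the same as what we used above to train linear regression. If you try it, you will see that the mean squared error is about the same, but we get a much higher coefficient of determination (~77%). To get even more accurate predictions, we can take more categorical features into account, as well as numeric features, such as `Month` or `DayOfYear`. To get one large array of features, we can use `join`: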
+
+```python
+X = pd.get_dummies(new_pumpkins['Variety']) \
+ .join(new_pumpkins['Month']) \
+ .join(pd.get_dummies(new_pumpkins['City'])) \
+ .join(pd.get_dummies(new_pumpkins['Package']))
+y = new_pumpkins['Price']
+```
+
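+Here we also take into account `City` and `Package` type, which gives us MSE 2.84 (10%), and determination 0.94!
+
+## Putting it all together
+
+To make the best model, we can use the combined (one-hot encoded categorical + numeric) data from the above example together with polynomial regression. Here is the complete code for your convenience: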
+
+```python
+# set up training data
+X = pd.get_dummies(new_pumpkins['Variety']) \
+ .join(new_pumpkins['Month']) \
+ .join(pd.get_dummies(new_pumpkins['City'])) \
+ .join(pd.get_dummies(new_pumpkins['Package']))
+y = new_pumpkins['Price']
+
+# make train-test split
+X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
+
+# setup and train the pipeline
+pipeline = make_pipeline(PolynomialFeatures(2), LinearRegression())
+pipeline.fit(X_train,y_train)
+
+# predict results for test data
+pred = pipeline.predict(X_test)
+
+# calculate RMSE (square root of MSE) and determination
+mse = np.sqrt(mean_squared_error(y_test,pred))
+print(f'Mean error: {mse:3.3} ({mse/np.mean(pred)*100:3.3}%)')
+
+# note: this R^2 score is computed on the training data
+score = pipeline.score(X_train,y_train)
+print('Model determination: ', score)
+```
+
+这应该能让我们获得接近 97% 的决定系数,以及约 2.23 的误差(约 8% 的预测误差)。注意代码中对 MSE 取了平方根,因此下表中的误差实际上是均方根误差(RMSE)。
+
+| 模型 | 误差(RMSE) | 决定系数 |
+|-------|-----|---------------|
+| `DayOfYear` 线性 | 2.77 (17.2%) | 0.07 |
+| `DayOfYear` 多项式 | 2.73 (17.0%) | 0.08 |
+| `Variety` 线性 | 5.24 (19.7%) | 0.77 |
+| 所有特征线性 | 2.84 (10.5%) | 0.94 |
+| 所有特征多项式 | 2.23 (8.25%) | 0.97 |
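+
+下面用纯 Python 简要示意上文代码所计算的两个指标:误差(对均方误差取平方根)和决定系数。示例数据以及函数名 `rmse`、`r_squared` 均为假设,仅用于说明计算方式,并非 scikit-learn 的实现:
+
+```python
+import math
+
+def rmse(y_true, y_pred):
+    """Root mean squared error: square root of the mean squared residual."""
+    n = len(y_true)
+    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n)
+
+def r_squared(y_true, y_pred):
+    """Coefficient of determination: 1 - SS_res / SS_tot."""
+    mean_true = sum(y_true) / len(y_true)
+    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
+    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
+    return 1 - ss_res / ss_tot
+
+# Hypothetical prices and predictions, for illustration only
+y_true = [10.0, 20.0, 30.0, 40.0]
+y_pred = [12.0, 18.0, 33.0, 37.0]
+
+error = rmse(y_true, y_pred)
+mean_pred = sum(y_pred) / len(y_pred)
+print(f"RMSE: {error:.3f} ({error / mean_pred * 100:.1f}%)")
+print(f"R^2: {r_squared(y_true, y_pred):.3f}")
+```
+
+括号中的百分比与课程代码一致,是误差相对于平均预测值的比例。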
+
+🏆 做得好!你在一节课中创建了四个回归模型,并将模型质量提升至 97%。在回归的最后一部分中,你将学习如何使用逻辑回归来确定类别。
+
+---
+
+## 🚀挑战
+
+在此笔记本中测试几个不同的变量,观察相关性如何影响模型准确性。
+
+## [课后测验](https://ff-quizzes.netlify.app/en/ml/)
+
+## 复习与自学
+
+在本课中我们学习了线性回归。还有其他重要的回归类型。阅读关于逐步回归、岭回归、套索回归和弹性网络技术的内容。一个不错的学习课程是 [斯坦福统计学习课程](https://online.stanford.edu/courses/sohs-ystatslearning-statistical-learning)。
+
+## 作业
+
+[构建一个模型](assignment.md)
+
+---
+
+**免责声明**:
+本文档使用AI翻译服务[Co-op Translator](https://github.com/Azure/co-op-translator)进行翻译。尽管我们努力确保准确性,但请注意,自动翻译可能包含错误或不准确之处。应以原始语言的文档作为权威来源。对于关键信息,建议使用专业人工翻译。因使用本翻译而引起的任何误解或误读,我们概不负责。
\ No newline at end of file
diff --git a/translations/zh-CN/2-Regression/3-Linear/assignment.md b/translations/zh-CN/2-Regression/3-Linear/assignment.md
new file mode 100644
index 000000000..f97ad3678
--- /dev/null
+++ b/translations/zh-CN/2-Regression/3-Linear/assignment.md
@@ -0,0 +1,16 @@
+# 创建回归模型
+
+## 说明
+
+在本课中,你学习了如何使用线性回归和多项式回归来构建模型。利用这些知识,找到一个数据集或使用 Scikit-learn 内置的数据集来构建一个新的模型。在你的笔记本中解释你选择该技术的原因,并展示你的模型的准确性。如果模型不够准确,请解释原因。
+
+## 评分标准
+
+| 标准 | 卓越表现 | 合格表现 | 需要改进 |
+| -------- | ------------------------------------------------------------ | -------------------------- | ------------------------------- |
+| | 提供一个完整的笔记本,并包含详细记录的解决方案 | 解决方案不完整 | 解决方案存在缺陷或错误 |
+
+---
+
+**免责声明**:
+本文档使用AI翻译服务 [Co-op Translator](https://github.com/Azure/co-op-translator) 进行翻译。尽管我们努力确保翻译的准确性,但请注意,自动翻译可能包含错误或不准确之处。原始语言的文档应被视为权威来源。对于关键信息,建议使用专业人工翻译。我们不对因使用此翻译而产生的任何误解或误读承担责任。
\ No newline at end of file
diff --git a/translations/zh-CN/2-Regression/3-Linear/notebook.ipynb b/translations/zh-CN/2-Regression/3-Linear/notebook.ipynb
new file mode 100644
index 000000000..d90c9df81
--- /dev/null
+++ b/translations/zh-CN/2-Regression/3-Linear/notebook.ipynb
@@ -0,0 +1,128 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## 南瓜定价\n",
+ "\n",
+ "加载所需的库和数据集。将数据转换为一个包含数据子集的数据框:\n",
+ "\n",
+ "- 仅获取按蒲式耳定价的南瓜\n",
+ "- 将日期转换为月份\n",
+ "- 将价格计算为最高价和最低价的平均值\n",
+ "- 将价格换算为以每蒲式耳为单位的价格\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import pandas as pd\n",
+ "import matplotlib.pyplot as plt\n",
+ "import numpy as np\n",
+ "from datetime import datetime\n",
+ "\n",
+ "pumpkins = pd.read_csv('../data/US-pumpkins.csv')\n",
+ "\n",
+ "pumpkins.head()\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "pumpkins = pumpkins[pumpkins['Package'].str.contains('bushel', case=True, regex=True)]\n",
+ "\n",
+ "columns_to_select = ['Package', 'Variety', 'City Name', 'Low Price', 'High Price', 'Date']\n",
+ "pumpkins = pumpkins.loc[:, columns_to_select]\n",
+ "\n",
+ "price = (pumpkins['Low Price'] + pumpkins['High Price']) / 2\n",
+ "\n",
+ "month = pd.DatetimeIndex(pumpkins['Date']).month\n",
+ "day_of_year = pd.to_datetime(pumpkins['Date']).apply(lambda dt: (dt-datetime(dt.year,1,1)).days)\n",
+ "\n",
+ "new_pumpkins = pd.DataFrame(\n",
+ " {'Month': month, \n",
+ " 'DayOfYear' : day_of_year, \n",
+ " 'Variety': pumpkins['Variety'], \n",
+ " 'City': pumpkins['City Name'], \n",
+ " 'Package': pumpkins['Package'], \n",
+ " 'Low Price': pumpkins['Low Price'],\n",
+ " 'High Price': pumpkins['High Price'], \n",
+ " 'Price': price})\n",
+ "\n",
+ "new_pumpkins.loc[new_pumpkins['Package'].str.contains('1 1/9'), 'Price'] = price/1.1\n",
+ "new_pumpkins.loc[new_pumpkins['Package'].str.contains('1/2'), 'Price'] = price*2\n",
+ "\n",
+ "new_pumpkins.head()\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "一个基本的散点图提醒我们,我们只有从八月到十二月的月度数据。我们可能需要更多数据才能以线性方式得出结论。\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import matplotlib.pyplot as plt\n",
+ "plt.scatter('Month','Price',data=new_pumpkins)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "\n",
+ "plt.scatter('DayOfYear','Price',data=new_pumpkins)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "\n---\n\n**免责声明**: \n本文档使用AI翻译服务[Co-op Translator](https://github.com/Azure/co-op-translator)进行翻译。尽管我们努力确保准确性,但请注意,自动翻译可能包含错误或不准确之处。应以原始语言的文档作为权威来源。对于关键信息,建议使用专业人工翻译。因使用本翻译而导致的任何误解或误读,我们概不负责。\n"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.8.3-final"
+ },
+ "orig_nbformat": 2,
+ "coopTranslator": {
+ "original_hash": "b032d371c75279373507f003439a577e",
+ "translation_date": "2025-09-03T19:16:27+00:00",
+ "source_file": "2-Regression/3-Linear/notebook.ipynb",
+ "language_code": "zh"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
\ No newline at end of file
diff --git a/translations/zh-CN/2-Regression/3-Linear/solution/Julia/README.md b/translations/zh-CN/2-Regression/3-Linear/solution/Julia/README.md
new file mode 100644
index 000000000..acfece51f
--- /dev/null
+++ b/translations/zh-CN/2-Regression/3-Linear/solution/Julia/README.md
@@ -0,0 +1,6 @@
+
+
+---
+
+**免责声明**:
+本文档使用AI翻译服务 [Co-op Translator](https://github.com/Azure/co-op-translator) 进行翻译。尽管我们努力确保翻译的准确性,但请注意,自动翻译可能包含错误或不准确之处。原始语言的文档应被视为权威来源。对于重要信息,建议使用专业人工翻译。我们不对因使用此翻译而产生的任何误解或误读承担责任。
\ No newline at end of file
diff --git a/translations/zh-CN/2-Regression/3-Linear/solution/R/lesson_3-R.ipynb b/translations/zh-CN/2-Regression/3-Linear/solution/R/lesson_3-R.ipynb
new file mode 100644
index 000000000..57f669a57
--- /dev/null
+++ b/translations/zh-CN/2-Regression/3-Linear/solution/R/lesson_3-R.ipynb
@@ -0,0 +1,1088 @@
+{
+ "nbformat": 4,
+ "nbformat_minor": 2,
+ "metadata": {
+ "colab": {
+ "name": "lesson_3-R.ipynb",
+ "provenance": [],
+ "collapsed_sections": [],
+ "toc_visible": true
+ },
+ "kernelspec": {
+ "name": "ir",
+ "display_name": "R"
+ },
+ "language_info": {
+ "name": "R"
+ },
+ "coopTranslator": {
+ "original_hash": "5015d65d61ba75a223bfc56c273aa174",
+ "translation_date": "2025-09-03T19:26:15+00:00",
+ "source_file": "2-Regression/3-Linear/solution/R/lesson_3-R.ipynb",
+ "language_code": "zh"
+ }
+ },
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "source": [
+ "# 构建回归模型:线性回归和多项式回归模型\n"
+ ],
+ "metadata": {
+ "id": "EgQw8osnsUV-"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "## 南瓜定价的线性回归和多项式回归 - 第三课\n",
+ "\n",
+ " \n",
+ " 信息图作者:Dasani Madipalli \n",
+ "\n",
+ "\n",
+ "\n",
+ "\n",
+ "#### 介绍\n",
+ "\n",
+ "到目前为止,你已经通过南瓜定价数据集的样本数据了解了什么是回归,并将在整个课程中使用该数据集。你还使用了 `ggplot2` 进行了可视化。💪\n",
+ "\n",
+ "现在你已经准备好深入学习机器学习中的回归。在本课中,你将进一步了解两种回归类型:*基本线性回归* 和 *多项式回归*,以及这些技术背后的一些数学原理。\n",
+ "\n",
+ "> 在整个课程中,我们假设学生的数学知识较少,并努力使内容对来自其他领域的学生更易理解,因此请注意笔记、🧮 数学提示、图表以及其他学习工具,这些都将帮助你更好地理解。\n",
+ "\n",
+ "#### 准备工作\n",
+ "\n",
+ "提醒一下,你正在加载这些数据以便对其进行分析。\n",
+ "\n",
+ "- 什么时候是购买南瓜的最佳时间?\n",
+ "\n",
+ "- 一箱迷你南瓜的价格大概是多少?\n",
+ "\n",
+ "- 我应该选择半蒲式耳篮子还是 1 1/9 蒲式耳箱来购买?让我们继续深入挖掘这些数据。\n",
+ "\n",
+ "在上一课中,你创建了一个 `tibble`(数据框的一种现代化形式),并用原始数据集的一部分填充它,同时将价格标准化为以蒲式耳为单位。然而,通过这种方式,你只能收集到大约 400 个数据点,并且仅限于秋季月份。也许通过进一步清理数据,我们可以获得更多细节?我们拭目以待... 🕵️♀️\n",
+ "\n",
+ "完成此任务需要以下包:\n",
+ "\n",
+ "- `tidyverse`: [tidyverse](https://www.tidyverse.org/) 是一个 [R 包集合](https://www.tidyverse.org/packages),旨在让数据科学更快、更简单、更有趣!\n",
+ "\n",
+ "- `tidymodels`: [tidymodels](https://www.tidymodels.org/) 框架是一个 [包集合](https://www.tidymodels.org/packages),用于建模和机器学习。\n",
+ "\n",
+ "- `janitor`: [janitor 包](https://github.com/sfirke/janitor) 提供了一些简单的小工具,用于检查和清理脏数据。\n",
+ "\n",
+ "- `corrplot`: [corrplot 包](https://cran.r-project.org/web/packages/corrplot/vignettes/corrplot-intro.html) 提供了一个可视化的相关矩阵探索工具,支持自动变量重新排序,以帮助发现变量之间隐藏的模式。\n",
+ "\n",
+ "你可以通过以下命令安装这些包:\n",
+ "\n",
+ "`install.packages(c(\"tidyverse\", \"tidymodels\", \"janitor\", \"corrplot\"))`\n",
+ "\n",
+ "下面的脚本会检查你是否安装了完成本模块所需的包,并在缺少时为你安装它们。\n"
+ ],
+ "metadata": {
+ "id": "WqQPS1OAsg3H"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "suppressWarnings(if (!require(\"pacman\")) install.packages(\"pacman\"))\n",
+ "\n",
+ "pacman::p_load(tidyverse, tidymodels, janitor, corrplot)"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "tA4C2WN3skCf",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "outputId": "c06cd805-5534-4edc-f72b-d0d1dab96ac0"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "我们稍后会加载这些很棒的包,使它们在当前的 R 会话中可用。(这只是为了说明,`pacman::p_load()` 已经帮你完成了这一步)\n",
+ "\n",
+ "## 1. 线性回归线\n",
+ "\n",
+ "正如你在第一课中学到的,线性回归的目标是能够绘制一条*最佳拟合线*,以便:\n",
+ "\n",
+ "- **展示变量关系**。展示变量之间的关系\n",
+ "\n",
+ "- **进行预测**。准确预测新数据点在这条线上的位置\n",
+ "\n",
+ "为了绘制这种类型的线,我们使用一种统计技术,称为**最小二乘回归**。`最小二乘`的意思是:把回归线周围所有数据点的误差平方后相加。理想情况下,这个总和应该尽可能小,因为我们希望误差较小,也就是`最小二乘`。因此,最佳拟合线就是使误差平方和最小的那条线,这也是*最小二乘回归*名称的由来。\n",
+ "\n",
+ "我们这样做是因为我们希望拟合一条与所有数据点的累计距离最小的线。在加总之前,我们会对误差进行平方,因为我们关心的是误差的大小,而不是方向。\n",
+ "\n",
+ "> **🧮 数学公式**\n",
+ ">\n",
+ "> 这条线,称为*最佳拟合线*,可以用[一个公式](https://en.wikipedia.org/wiki/Simple_linear_regression)表示:\n",
+ ">\n",
+ "> Y = a + bX\n",
+ ">\n",
+ "> `X` 是`解释变量`或`预测变量`,`Y` 是`因变量`或`结果变量`。线的斜率是 `b`,而 `a` 是 y 截距,指的是当 `X = 0` 时 `Y` 的值。\n",
+ ">\n",
+ "\n",
+ "> \n",
+ " 信息图由 Jen Looper 制作\n",
+ ">\n",
+ "> 首先,计算斜率 `b`。\n",
+ ">\n",
+ "> 换句话说,参考我们的南瓜数据的原始问题:“按月份预测每蒲式耳南瓜的价格”,`X` 表示价格,`Y` 表示销售月份。\n",
+ ">\n",
+ "> \n",
+ " 信息图由 Jen Looper 制作\n",
+ "> \n",
+ "> 计算 Y 的值。如果你支付大约 \\$4,那一定是四月!\n",
+ ">\n",
+ "> 计算这条线的数学公式必须展示线的斜率,这也取决于截距,即当 `X = 0` 时 `Y` 的位置。\n",
+ ">\n",
+ "> 你可以在 [Math is Fun](https://www.mathsisfun.com/data/least-squares-regression.html) 网站上观察这些值的计算方法。也可以访问[这个最小二乘计算器](https://www.mathsisfun.com/data/least-squares-calculator.html),看看数值如何影响这条线。\n",
+ "\n",
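+ "作为补充,下面给出简单线性回归中斜率与截距的标准最小二乘解。这只是一个示意,其中的下标 $i$ 以及均值记号 $\\bar{X}$、$\\bar{Y}$ 是为说明而引入的假设记号:\n",
+ "\n",
+ "$$b = \\frac{\\sum_i (X_i - \\bar{X})(Y_i - \\bar{Y})}{\\sum_i (X_i - \\bar{X})^2}, \\qquad a = \\bar{Y} - b\\bar{X}$$\n",
+ "\n",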
+ "是不是没那么可怕?🤓\n",
+ "\n",
+ "#### 相关性\n",
+ "\n",
+ "还有一个需要理解的术语是给定 X 和 Y 变量之间的**相关系数**。使用散点图,你可以快速可视化这个系数。数据点整齐排列成一条线的图表具有高相关性,而数据点在 X 和 Y 之间随意分布的图表则相关性较低。\n",
+ "\n",
+ "一个好的线性回归模型应该是使用最小二乘回归方法和回归线时,相关系数较高(接近 1 而不是 0)的模型。\n"
+ ],
+ "metadata": {
+ "id": "cdX5FRpvsoP5"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "## **2. 与数据共舞:创建用于建模的数据框**\n",
+ "\n",
+ "插画作者:@allison_horst\n",
+ "\n",
+ "\n",
+ "\n"
+ ],
+ "metadata": {
+ "id": "WdUKXk7Bs8-V"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "加载所需的库和数据集。将数据转换为包含数据子集的数据框:\n",
+ "\n",
+ "- 仅获取按蒲式耳定价的南瓜数据\n",
+ "\n",
+ "- 将日期转换为月份\n",
+ "\n",
+ "- 将价格计算为最高价和最低价的平均值\n",
+ "\n",
+ "- 将价格换算为以每蒲式耳为单位的价格\n",
+ "\n",
+ "> 我们在[上一课](https://github.com/microsoft/ML-For-Beginners/blob/main/2-Regression/2-Data/solution/lesson_2-R.ipynb)中已经涵盖了这些步骤。\n"
+ ],
+ "metadata": {
+ "id": "fMCtu2G2s-p8"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Load the core Tidyverse packages\n",
+ "library(tidyverse)\n",
+ "library(lubridate)\n",
+ "\n",
+ "# Import the pumpkins data\n",
+ "pumpkins <- read_csv(file = \"https://raw.githubusercontent.com/microsoft/ML-For-Beginners/main/2-Regression/data/US-pumpkins.csv\")\n",
+ "\n",
+ "\n",
+ "# Get a glimpse and dimensions of the data\n",
+ "glimpse(pumpkins)\n",
+ "\n",
+ "\n",
+ "# Print the first 5 rows of the data set\n",
+ "pumpkins %>% \n",
+ " slice_head(n = 5)"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "ryMVZEEPtERn"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "本着探索的精神,让我们来试试 [`janitor` 包](https://github.com/sfirke/janitor),它提供了一些简单的函数来检查和清理脏数据。例如,先看看数据的列名:\n"
+ ],
+ "metadata": {
+ "id": "xcNxM70EtJjb"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Return column names\n",
+ "pumpkins %>% \n",
+ " names()"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "5XtpaIigtPfW"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "🤔 我们可以做得更好。让我们通过使用 `janitor::clean_names` 将这些列名转换为 [snake_case](https://en.wikipedia.org/wiki/Snake_case) 约定来使它们成为 `friendR`。要了解有关此函数的更多信息:`?clean_names`\n"
+ ],
+ "metadata": {
+ "id": "IbIqrMINtSHe"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Clean names to the snake_case convention\n",
+ "pumpkins <- pumpkins %>% \n",
+ " clean_names(case = \"snake\")\n",
+ "\n",
+ "# Return column names\n",
+ "pumpkins %>% \n",
+ " names()"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "a2uYvclYtWvX"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "非常整洁 🧹!现在,像上一节课一样,用 `dplyr` 来与数据共舞吧!💃\n"
+ ],
+ "metadata": {
+ "id": "HfhnuzDDtaDd"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Select desired columns\n",
+ "pumpkins <- pumpkins %>% \n",
+ " select(variety, city_name, package, low_price, high_price, date)\n",
+ "\n",
+ "\n",
+ "\n",
+ "# Extract the month from the dates to a new column\n",
+ "pumpkins <- pumpkins %>%\n",
+ " mutate(date = mdy(date),\n",
+ " month = month(date)) %>% \n",
+ " select(-date)\n",
+ "\n",
+ "\n",
+ "\n",
+ "# Create a new column for average Price\n",
+ "pumpkins <- pumpkins %>% \n",
+ " mutate(price = (low_price + high_price)/2)\n",
+ "\n",
+ "\n",
+ "# Retain only pumpkins with the string \"bushel\"\n",
+ "new_pumpkins <- pumpkins %>% \n",
+ " filter(str_detect(string = package, pattern = \"bushel\"))\n",
+ "\n",
+ "\n",
+ "# Normalize the pricing so that you show the pricing per bushel, not per 1 1/9 or 1/2 bushel\n",
+ "new_pumpkins <- new_pumpkins %>% \n",
+ " mutate(price = case_when(\n",
+ " str_detect(package, \"1 1/9\") ~ price/(1.1),\n",
+ " str_detect(package, \"1/2\") ~ price*2,\n",
+ " TRUE ~ price))\n",
+ "\n",
+ "# Relocate column positions\n",
+ "new_pumpkins <- new_pumpkins %>% \n",
+ " relocate(month, .before = variety)\n",
+ "\n",
+ "\n",
+ "# Display the first 5 rows\n",
+ "new_pumpkins %>% \n",
+ " slice_head(n = 5)"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "X0wU3gQvtd9f"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "干得好!👌 你现在拥有一个干净整洁的数据集,可以用来构建新的回归模型!\n",
+ "\n",
+ "画个散点图怎么样?\n"
+ ],
+ "metadata": {
+ "id": "UpaIwaxqth82"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Set theme\n",
+ "theme_set(theme_light())\n",
+ "\n",
+ "# Make a scatter plot of month and price\n",
+ "new_pumpkins %>% \n",
+ " ggplot(mapping = aes(x = month, y = price)) +\n",
+ " geom_point(size = 1.6)\n"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "DXgU-j37tl5K"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "散点图提醒我们,我们只有从八月到十二月的月度数据。我们可能需要更多的数据才能以线性方式得出结论。\n",
+ "\n",
+ "让我们再看看我们的建模数据:\n"
+ ],
+ "metadata": {
+ "id": "Ve64wVbwtobI"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Display first 5 rows\n",
+ "new_pumpkins %>% \n",
+ " slice_head(n = 5)"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "HFQX2ng1tuSJ"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "如果我们想根据`city`或`package`列(它们是字符类型)来预测南瓜的`price`,该怎么办?或者更简单地说,我们如何找到`package`和`price`之间的相关性(这要求两个输入都为数值类型)呢?🤷🤷\n",
+ "\n",
+ "机器学习模型在处理数值特征时效果最佳,而不是文本值,因此通常需要将分类特征转换为数值表示。\n",
+ "\n",
+ "这意味着我们需要找到一种方法来重新格式化我们的预测变量,使其更容易被模型有效利用,这个过程被称为`特征工程`。\n"
+ ],
+ "metadata": {
+ "id": "7hsHoxsStyjJ"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "## 3. 为建模预处理数据,使用 recipes 👩🍳👨🍳\n",
+ "\n",
+ "将预测变量重新格式化以便模型更有效使用的活动被称为`特征工程`。\n",
+ "\n",
+ "不同的模型对数据预处理有不同的要求。例如,最小二乘法需要对`分类变量`(如月份、品种和城市名称)进行`编码`。这通常涉及将包含`分类值`的列`转换`为一个或多个`数值列`,以替代原始列。\n",
+ "\n",
+ "例如,假设你的数据包含以下分类特征:\n",
+ "\n",
+ "| city |\n",
+ "|:-------:|\n",
+ "| Denver |\n",
+ "| Nairobi |\n",
+ "| Tokyo |\n",
+ "\n",
+ "你可以应用*序数编码*,为每个类别替换一个唯一的整数值,如下所示:\n",
+ "\n",
+ "| city |\n",
+ "|:----:|\n",
+ "| 0 |\n",
+ "| 1 |\n",
+ "| 2 |\n",
+ "\n",
+ "这就是我们将对数据进行的操作!\n",
+ "\n",
+ "在本节中,我们将探索另一个令人惊叹的 Tidymodels 包:[recipes](https://tidymodels.github.io/recipes/) - 它专为在训练模型**之前**帮助你预处理数据而设计。本质上,recipe 是一个对象,用于定义对数据集应用哪些步骤,以使其为建模做好准备。\n",
+ "\n",
+ "现在,让我们创建一个 recipe,把预测变量列中的每个取值替换为对应的唯一整数,从而为建模准备数据:\n"
+ ],
+ "metadata": {
+ "id": "AD5kQbcvt3Xl"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Specify a recipe\n",
+ "pumpkins_recipe <- recipe(price ~ ., data = new_pumpkins) %>% \n",
+ " step_integer(all_predictors(), zero_based = TRUE)\n",
+ "\n",
+ "\n",
+ "# Print out the recipe\n",
+ "pumpkins_recipe"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "BNaFKXfRt9TU"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "太棒了!👏 我们刚刚创建了第一个配方,它指定了一个结果(价格)及其对应的预测变量,并且所有预测变量列都被编码为一组整数 🙌!让我们快速分解一下:\n",
+ "\n",
+ "- 调用 `recipe()` 并使用公式告诉配方变量的*角色*,以 `new_pumpkins` 数据作为参考。例如,`price` 列被分配了 `outcome` 角色,而其余列被分配了 `predictor` 角色。\n",
+ "\n",
+ "- `step_integer(all_predictors(), zero_based = TRUE)` 指定所有预测变量都应转换为一组整数,编号从 0 开始。\n",
+ "\n",
+ "我们相信你可能会有这样的想法:“这太酷了!!但如果我需要确认这些配方确实按照我的预期在工作怎么办?🤔”\n",
+ "\n",
+ "这是一个很棒的想法!你看,一旦定义了配方,你可以估算实际预处理数据所需的参数,然后提取处理后的数据。通常在使用 Tidymodels 时不需要这样做(我们稍后会看到常规方法 -> `workflows`),但当你想进行某种合理性检查以确认配方是否按预期工作时,这会非常有用。\n",
+ "\n",
+ "为此,你需要两个额外的动词:`prep()` 和 `bake()`。一如既往,我们的小 R 朋友由 [`Allison Horst`](https://github.com/allisonhorst/stats-illustrations) 创作,帮助你更好地理解这一点!\n",
+ "\n",
+ "插图作者:@allison_horst\n"
+ ],
+ "metadata": {
+ "id": "KEiO0v7kuC9O"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "[`prep()`](https://recipes.tidymodels.org/reference/prep.html):从训练集估算所需参数,这些参数可以稍后应用于其他数据集。例如,对于给定的预测变量列,哪个观测值会被分配为整数 0、1、2 等。\n",
+ "\n",
+ "[`bake()`](https://recipes.tidymodels.org/reference/bake.html):使用已准备好的配方并将操作应用于任何数据集。\n",
+ "\n",
+ "话虽如此,让我们准备并应用配方,真正确认在底层,预测变量列会先被编码,然后再拟合模型。\n"
+ ],
+ "metadata": {
+ "id": "Q1xtzebuuTCP"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Prep the recipe\n",
+ "pumpkins_prep <- prep(pumpkins_recipe)\n",
+ "\n",
+ "# Bake the recipe to extract a preprocessed new_pumpkins data\n",
+ "baked_pumpkins <- bake(pumpkins_prep, new_data = NULL)\n",
+ "\n",
+ "# Print out the baked data set\n",
+ "baked_pumpkins %>% \n",
+ " slice_head(n = 10)"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "FGBbJbP_uUUn"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "哇哦!🥳 处理后的数据 `baked_pumpkins` 的所有预测变量都已编码,这确认了我们定义的预处理步骤(作为我们的配方)确实可以如预期般工作。这虽然让数据更难阅读,但对 Tidymodels 来说却更加易于理解!花点时间找出哪些观测值已被映射到对应的整数。\n",
+ "\n",
+ "另外值得一提的是,`baked_pumpkins` 是一个数据框,我们可以在其上进行计算。\n",
+ "\n",
+ "例如,我们可以尝试在数据中的两个点之间找到一个良好的相关性,以便可能构建一个优秀的预测模型。我们将使用函数 `cor()` 来完成此操作。输入 `?cor()` 以了解更多关于该函数的信息。\n"
+ ],
+ "metadata": {
+ "id": "1dvP0LBUueAW"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Find the correlation between the city_name and the price\n",
+ "cor(baked_pumpkins$city_name, baked_pumpkins$price)\n",
+ "\n",
+ "# Find the correlation between the package and the price\n",
+ "cor(baked_pumpkins$package, baked_pumpkins$price)\n"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "3bQzXCjFuiSV"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "事实证明,城市和价格之间的相关性较弱。然而,包装(package)和价格之间的相关性稍强一些。这很合理,对吧?通常来说,装农产品的箱子越大,价格越高。\n",
+ "\n",
+ "既然如此,我们也可以尝试使用 `corrplot` 包来可视化所有列的相关性矩阵。\n"
+ ],
+ "metadata": {
+ "id": "BToPWbgjuoZw"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Load the corrplot package\n",
+ "library(corrplot)\n",
+ "\n",
+ "# Obtain correlation matrix\n",
+ "corr_mat <- cor(baked_pumpkins %>% \n",
+ " # Drop columns that are not really informative\n",
+ " select(-c(low_price, high_price)))\n",
+ "\n",
+ "# Make a correlation plot between the variables\n",
+ "corrplot(corr_mat, method = \"shade\", shade.col = NA, tl.col = \"black\", tl.srt = 45, addCoef.col = \"black\", cl.pos = \"n\", order = \"original\")"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "ZwAL3ksmutVR"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "🤩🤩 好得多。\n",
+ "\n",
+ "现在,针对这份数据可以提出一个好问题:“给定一种南瓜包装,我可以预期它的价格是多少?”让我们直接开始吧!\n",
+ "\n",
+ "> 注意:当你使用 **`new_data = NULL`** 对预处理过的配方 **`pumpkins_prep`** 进行 **`bake()`** 时,你会提取处理过的(即编码后的)训练数据。如果你有另一个数据集,例如测试集,并希望查看配方如何对其进行预处理,你只需使用 **`new_data = test_set`** 对 **`pumpkins_prep`** 进行 bake。\n",
+ "\n",
+ "## 4. 构建线性回归模型\n",
+ "\n",
+ "信息图作者:Dasani Madipalli\n",
+ "\n",
+ "\n",
+ "\n"
+ ],
+ "metadata": {
+ "id": "YqXjLuWavNxW"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "现在我们已经构建了一个配方,并确认数据将被适当预处理,接下来让我们构建一个回归模型来回答这个问题:`我可以预期某个南瓜包装的价格是多少?`\n",
+ "\n",
+ "#### 使用训练集训练线性回归模型\n",
+ "\n",
+ "正如你可能已经猜到的,*price* 列是 `结果` 变量,而 *package* 列是 `预测` 变量。\n",
+ "\n",
+ "为此,我们首先将数据分割为训练集(占 80%)和测试集(占 20%),然后定义一个配方,将预测变量列编码为一组整数,接着构建一个模型规范。这里不再对配方执行 `prep()` 和 `bake()`,因为我们已经知道它会按预期预处理数据。\n"
+ ],
+ "metadata": {
+ "id": "Pq0bSzCevW-h"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "set.seed(2056)\n",
+ "# Split the data into training and test sets\n",
+ "pumpkins_split <- new_pumpkins %>% \n",
+ " initial_split(prop = 0.8)\n",
+ "\n",
+ "\n",
+ "# Extract training and test data\n",
+ "pumpkins_train <- training(pumpkins_split)\n",
+ "pumpkins_test <- testing(pumpkins_split)\n",
+ "\n",
+ "\n",
+ "\n",
+ "# Create a recipe for preprocessing the data\n",
+ "lm_pumpkins_recipe <- recipe(price ~ package, data = pumpkins_train) %>% \n",
+ " step_integer(all_predictors(), zero_based = TRUE)\n",
+ "\n",
+ "\n",
+ "\n",
+ "# Create a linear model specification\n",
+ "lm_spec <- linear_reg() %>% \n",
+ " set_engine(\"lm\") %>% \n",
+ " set_mode(\"regression\")"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "CyoEh_wuvcLv"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "干得好!现在我们已经有了一个配方和模型规范,我们需要找到一种方法将它们打包成一个对象,该对象将首先对数据进行预处理(幕后完成prep+bake),然后在预处理后的数据上拟合模型,同时还支持潜在的后处理活动。这是不是让你更安心了!🤩\n",
+ "\n",
+ "在Tidymodels中,这个方便的对象叫做[`workflow`](https://workflows.tidymodels.org/),它可以方便地包含你的建模组件!这在*Python*中我们称之为*管道*。\n",
+ "\n",
+ "那么,让我们把所有东西打包到一个workflow中吧!📦\n"
+ ],
+ "metadata": {
+ "id": "G3zF_3DqviFJ"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Hold modelling components in a workflow\n",
+ "lm_wf <- workflow() %>% \n",
+ " add_recipe(lm_pumpkins_recipe) %>% \n",
+ " add_model(lm_spec)\n",
+ "\n",
+ "# Print out the workflow\n",
+ "lm_wf"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "T3olroU3v-WX"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "顺便提一下,工作流也可以像模型一样进行拟合(训练)。\n"
+ ],
+ "metadata": {
+ "id": "zd1A5tgOwEPX"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Train the model\n",
+ "lm_wf_fit <- lm_wf %>% \n",
+ " fit(data = pumpkins_train)\n",
+ "\n",
+ "# Print the model coefficients learned \n",
+ "lm_wf_fit"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "NhJagFumwFHf"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "从模型输出中,我们可以看到训练过程中学习到的系数。它们是最佳拟合线的系数,这条线使实际值与预测值之间的总体误差最小。\n",
+ "\n",
+ "#### 使用测试集评估模型性能\n",
+ "\n",
+ "是时候看看模型的表现了 📏!我们该怎么做呢?\n",
+ "\n",
+ "现在我们已经训练了模型,可以使用 `parsnip::predict()` 对测试集进行预测。然后,我们可以将这些预测值与实际标签值进行比较,以评估模型的效果(好或不好)。\n",
+ "\n",
+ "让我们从对测试集进行预测开始,然后将预测结果与测试集绑定在一起。\n"
+ ],
+ "metadata": {
+ "id": "_4QkGtBTwItF"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Make predictions for the test set\n",
+ "predictions <- lm_wf_fit %>% \n",
+ " predict(new_data = pumpkins_test)\n",
+ "\n",
+ "\n",
+ "# Bind predictions to the test set\n",
+ "lm_results <- pumpkins_test %>% \n",
+ " select(c(package, price)) %>% \n",
+ " bind_cols(predictions)\n",
+ "\n",
+ "\n",
+ "# Print the first ten rows of the tibble\n",
+ "lm_results %>% \n",
+ " slice_head(n = 10)"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "UFZzTG0gwTs9"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "是的,你刚刚训练了一个模型并用它进行了预测!🔮 它表现如何呢?让我们来评估模型的性能吧!\n",
+ "\n",
+ "在Tidymodels中,我们使用 `yardstick::metrics()` 来完成这一任务!对于线性回归,我们重点关注以下指标:\n",
+ "\n",
+ "- `均方根误差 (RMSE)`:即[均方误差 (MSE)](https://en.wikipedia.org/wiki/Mean_squared_error)的平方根。它提供了一个绝对指标,单位与标签一致(在这个例子中是南瓜的价格)。值越小,模型越好(简单来说,它表示预测值平均偏差的价格范围)。\n",
+ "\n",
+ "- `决定系数(通常称为R平方或R2)`:一个相对指标,值越高,模型拟合效果越好。实际上,这个指标表示模型能够解释预测值与实际标签值之间方差的程度。\n"
+ ],
+ "metadata": {
+ "id": "0A5MjzM7wW9M"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Evaluate performance of linear regression\n",
+ "metrics(data = lm_results,\n",
+ " truth = price,\n",
+ " estimate = .pred)"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "reJ0UIhQwcEH"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "这就是模型目前的表现。让我们绘制包装与价格的散点图,再用预测结果叠加一条最佳拟合线,看看能否得到更直观的认识。\n",
+ "\n",
+ "这意味着我们需要对测试集执行 prep 和 bake,以便对 package 列进行编码,然后将其与模型生成的预测结果绑定在一起。\n"
+ ],
+ "metadata": {
+ "id": "fdgjzjkBwfWt"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Encode package column\n",
+ "package_encode <- lm_pumpkins_recipe %>% \n",
+ " prep() %>% \n",
+ " bake(new_data = pumpkins_test) %>% \n",
+ " select(package)\n",
+ "\n",
+ "\n",
+ "# Bind encoded package column to the results\n",
+ "lm_results <- lm_results %>% \n",
+ " bind_cols(package_encode %>% \n",
+ " rename(package_integer = package)) %>% \n",
+ " relocate(package_integer, .after = package)\n",
+ "\n",
+ "\n",
+ "# Print new results data frame\n",
+ "lm_results %>% \n",
+ " slice_head(n = 5)\n",
+ "\n",
+ "\n",
+ "# Make a scatter plot\n",
+ "lm_results %>% \n",
+ " ggplot(mapping = aes(x = package_integer, y = price)) +\n",
+ " geom_point(size = 1.6) +\n",
+ " # Overlay a line of best fit\n",
+ " geom_line(aes(y = .pred), color = \"orange\", size = 1.2) +\n",
+ " xlab(\"package\")\n",
+ " \n"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "R0nw719lwkHE"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "很棒!正如你所看到的,线性回归模型并不能很好地概括包装与其对应价格之间的关系。\n",
+ "\n",
+ "🎃 恭喜你,你刚刚创建了一个可以帮助预测几种南瓜价格的模型。你的节日南瓜田会非常漂亮。但你可能可以创建一个更好的模型!\n",
+ "\n",
+ "## 5. 构建一个多项式回归模型\n",
+ "\n",
+ "信息图作者:Dasani Madipalli\n",
+ "\n",
+ "\n",
+ "\n"
+ ],
+ "metadata": {
+ "id": "HOCqJXLTwtWI"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "有时候,我们的数据可能并不存在线性关系,但我们仍然希望预测结果。这时,多项式回归可以帮助我们对更复杂的非线性关系进行预测。\n",
+ "\n",
+ "以我们的南瓜数据集中的包装和价格关系为例。虽然有时变量之间存在线性关系——比如南瓜的体积越大,价格越高——但有时这些关系无法用一个平面或直线来表示。\n",
+ "\n",
+ "> ✅ 这里有[更多使用多项式回归的数据示例](https://online.stat.psu.edu/stat501/lesson/9/9.8)\n",
+ ">\n",
+ "> 再次看看之前图中品种与价格的关系。这个散点图看起来是否一定应该用一条直线来分析?可能并不是。在这种情况下,你可以尝试使用多项式回归。\n",
+ ">\n",
+ "> ✅ 多项式是可能包含一个或多个变量和系数的数学表达式\n",
+ "\n",
+ "#### 使用训练集训练一个多项式回归模型\n",
+ "\n",
+ "多项式回归会创建一条*曲线*,以更好地拟合非线性数据。\n",
+ "\n",
+ "让我们看看多项式模型是否能在预测中表现得更好。我们将遵循与之前类似的步骤:\n",
+ "\n",
+ "- 创建一个配方,指定对数据进行建模前需要执行的预处理步骤,例如:对预测变量进行编码并计算次数为 *n* 的多项式\n",
+ "\n",
+ "- 构建一个模型规范\n",
+ "\n",
+ "- 将配方和模型规范打包到一个工作流中\n",
+ "\n",
+ "- 通过拟合工作流来创建模型\n",
+ "\n",
+ "- 评估模型在测试数据上的表现\n",
+ "\n",
+ "让我们开始吧!\n"
+ ],
+ "metadata": {
+ "id": "VcEIpRV9wzYr"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Specify a recipe\r\n",
+ "poly_pumpkins_recipe <-\r\n",
+ " recipe(price ~ package, data = pumpkins_train) %>%\r\n",
+ " step_integer(all_predictors(), zero_based = TRUE) %>% \r\n",
+ " step_poly(all_predictors(), degree = 4)\r\n",
+ "\r\n",
+ "\r\n",
+ "# Create a model specification\r\n",
+ "poly_spec <- linear_reg() %>% \r\n",
+ " set_engine(\"lm\") %>% \r\n",
+ " set_mode(\"regression\")\r\n",
+ "\r\n",
+ "\r\n",
+ "# Bundle recipe and model spec into a workflow\r\n",
+ "poly_wf <- workflow() %>% \r\n",
+ " add_recipe(poly_pumpkins_recipe) %>% \r\n",
+ " add_model(poly_spec)\r\n",
+ "\r\n",
+ "\r\n",
+ "# Create a model\r\n",
+ "poly_wf_fit <- poly_wf %>% \r\n",
+ " fit(data = pumpkins_train)\r\n",
+ "\r\n",
+ "\r\n",
+ "# Print learned model coefficients\r\n",
+ "poly_wf_fit\r\n",
+ "\r\n",
+ " "
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "63n_YyRXw3CC"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "#### 评估模型性能\n",
+ "\n",
+ "👏👏你已经构建了一个多项式模型,现在让我们在测试集上进行预测吧!\n"
+ ],
+ "metadata": {
+ "id": "-LHZtztSxDP0"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Make price predictions on test data\r\n",
+ "poly_results <- poly_wf_fit %>% predict(new_data = pumpkins_test) %>% \r\n",
+ " bind_cols(pumpkins_test %>% select(c(package, price))) %>% \r\n",
+ " relocate(.pred, .after = last_col())\r\n",
+ "\r\n",
+ "\r\n",
+ "# Print the results\r\n",
+ "poly_results %>% \r\n",
+ " slice_head(n = 10)"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "YUFpQ_dKxJGx"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Woo-hoo,让我们使用 `yardstick::metrics()` 来评估模型在 test_set 上的表现。\n"
+ ],
+ "metadata": {
+ "id": "qxdyj86bxNGZ"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "metrics(data = poly_results, truth = price, estimate = .pred)"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "8AW5ltkBxXDm"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "🤩🤩 表现更出色。\n",
+ "\n",
+ "`rmse` 从大约 7 降至大约 3,这表明实际价格与预测价格之间的误差减少了。你可以*粗略地*理解为平均而言,错误预测的误差大约为 \\$3。`rsq` 从大约 0.4 增加到 0.8。\n",
+ "\n",
+ "所有这些指标都表明多项式模型的表现远优于线性模型。干得好!\n",
+ "\n",
+ "让我们看看是否可以将其可视化!\n"
+ ],
+ "metadata": {
+ "id": "6gLHNZDwxYaS"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Bind encoded package column to the results\r\n",
+ "poly_results <- poly_results %>% \r\n",
+ " bind_cols(package_encode %>% \r\n",
+ " rename(package_integer = package)) %>% \r\n",
+ " relocate(package_integer, .after = package)\r\n",
+ "\r\n",
+ "\r\n",
+ "# Print new results data frame\r\n",
+ "poly_results %>% \r\n",
+ " slice_head(n = 5)\r\n",
+ "\r\n",
+ "\r\n",
+ "# Make a scatter plot\r\n",
+ "poly_results %>% \r\n",
+ " ggplot(mapping = aes(x = package_integer, y = price)) +\r\n",
+ " geom_point(size = 1.6) +\r\n",
+ " # Overlay a line of best fit\r\n",
+ " geom_line(aes(y = .pred), color = \"midnightblue\", size = 1.2) +\r\n",
+ " xlab(\"package\")\r\n"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "A83U16frxdF1"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "你可以看到一条更贴合数据的曲线!🤩\n",
+ "\n",
+ "你可以向 `geom_smooth` 传入一个多项式公式,让这条曲线更加平滑,如下所示:\n"
+ ],
+ "metadata": {
+ "id": "4U-7aHOVxlGU"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Make a scatter plot\r\n",
+ "poly_results %>% \r\n",
+ " ggplot(mapping = aes(x = package_integer, y = price)) +\r\n",
+ " geom_point(size = 1.6) +\r\n",
+ " # Overlay a line of best fit\r\n",
+ " geom_smooth(method = lm, formula = y ~ poly(x, degree = 4), color = \"midnightblue\", size = 1.2, se = FALSE) +\r\n",
+ " xlab(\"package\")"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "5vzNT0Uexm-w"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "就像一条平滑的曲线!🤩\n",
+ "\n",
+ "以下是如何进行新的预测:\n"
+ ],
+ "metadata": {
+ "id": "v9u-wwyLxq4G"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Make a hypothetical data frame\r\n",
+ "hypo_tibble <- tibble(package = \"bushel baskets\")\r\n",
+ "\r\n",
+ "# Make predictions using linear model\r\n",
+ "lm_pred <- lm_wf_fit %>% predict(new_data = hypo_tibble)\r\n",
+ "\r\n",
+ "# Make predictions using polynomial model\r\n",
+ "poly_pred <- poly_wf_fit %>% predict(new_data = hypo_tibble)\r\n",
+ "\r\n",
+ "# Return predictions in a list\r\n",
+ "list(\"linear model prediction\" = lm_pred, \r\n",
+ " \"polynomial model prediction\" = poly_pred)\r\n"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "jRPSyfQGxuQv"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
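+ "The `polynomial model` prediction does make sense, given the scatter plots of `price` and `package`! And, if this is a better model than the previous one, looking at the same data, you need to budget for these more expensive pumpkins!\n",
+ "\n",
+ "🏆 Well done! You created two regression models in one lesson. In the final section on regression, you will learn about logistic regression to determine categories.\n",
+ "\n",
+ "## **🚀Challenge**\n",
+ "\n",
+ "Test several different variables in this notebook to see how correlation corresponds to model accuracy.\n",
+ "\n",
+ "## [**Post-lecture quiz**](https://gray-sand-07a10f403.1.azurestaticapps.net/quiz/14/)\n",
+ "\n",
+ "## **Review & Self Study**\n",
+ "\n",
+ "In this lesson we learned about linear regression. There are other important types of regression. Read about the Stepwise, Ridge, Lasso and Elastic Net techniques. A good course to study to learn more is the [Stanford Statistical Learning course](https://online.stanford.edu/courses/sohs-ystatslearning-statistical-learning).\n",
+ "\n",
+ "If you want to learn more about how to use the amazing Tidymodels framework, please check out the following resources:\n",
+ "\n",
+ "- Tidymodels website: [Get started with Tidymodels](https://www.tidymodels.org/start/)\n",
+ "\n",
+ "- Max Kuhn and Julia Silge, [*Tidy Modeling with R*](https://www.tmwr.org/)*.*\n",
+ "\n",
+ "###### **THANK YOU TO:**\n",
+ "\n",
+ "[Allison Horst](https://twitter.com/allison_horst?lang=en) for creating the amazing illustrations that make R more welcoming and engaging. Find more illustrations at her [gallery](https://www.google.com/url?q=https://github.com/allisonhorst/stats-illustrations&sa=D&source=editors&ust=1626380772530000&usg=AOvVaw3zcfyCizFQZpkSLzxiiQEM).\n"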
+ ],
+ "metadata": {
+ "id": "8zOLOWqMxzk5"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
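+ "\n---\n\n**Disclaimer**: \nThis document has been translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The original document in its native language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.\n"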
+ ]
+ }
+ ]
+}
\ No newline at end of file
diff --git a/translations/zh-CN/2-Regression/3-Linear/solution/notebook.ipynb b/translations/zh-CN/2-Regression/3-Linear/solution/notebook.ipynb
new file mode 100644
index 000000000..5577a3185
--- /dev/null
+++ b/translations/zh-CN/2-Regression/3-Linear/solution/notebook.ipynb
@@ -0,0 +1,1113 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
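+ "## Linear and Polynomial Regression for Pumpkin Pricing - Lesson 3\n",
+ "\n",
+ "Load up the required libraries and dataset. Convert the data to a dataframe containing a subset of the data:\n",
+ "\n",
+ "- Only get pumpkins priced by the bushel\n",
+ "- Convert the date to a month\n",
+ "- Calculate the price as the average of the high and low prices\n",
+ "- Convert the price to reflect the pricing by bushel quantity\n"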
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 167,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " City Name Type Package Variety Sub Variety Grade Date \\\n",
+ "0 BALTIMORE NaN 24 inch bins NaN NaN NaN 4/29/17 \n",
+ "1 BALTIMORE NaN 24 inch bins NaN NaN NaN 5/6/17 \n",
+ "2 BALTIMORE NaN 24 inch bins HOWDEN TYPE NaN NaN 9/24/16 \n",
+ "3 BALTIMORE NaN 24 inch bins HOWDEN TYPE NaN NaN 9/24/16 \n",
+ "4 BALTIMORE NaN 24 inch bins HOWDEN TYPE NaN NaN 11/5/16 \n",
+ "\n",
+ " Low Price High Price Mostly Low ... Unit of Sale Quality Condition \\\n",
+ "0 270.0 280.0 270.0 ... NaN NaN NaN \n",
+ "1 270.0 280.0 270.0 ... NaN NaN NaN \n",
+ "2 160.0 160.0 160.0 ... NaN NaN NaN \n",
+ "3 160.0 160.0 160.0 ... NaN NaN NaN \n",
+ "4 90.0 100.0 90.0 ... NaN NaN NaN \n",
+ "\n",
+ " Appearance Storage Crop Repack Trans Mode Unnamed: 24 Unnamed: 25 \n",
+ "0 NaN NaN NaN E NaN NaN NaN \n",
+ "1 NaN NaN NaN E NaN NaN NaN \n",
+ "2 NaN NaN NaN N NaN NaN NaN \n",
+ "3 NaN NaN NaN N NaN NaN NaN \n",
+ "4 NaN NaN NaN N NaN NaN NaN \n",
+ "\n",
+ "[5 rows x 26 columns]"
+ ]
+ },
+ "execution_count": 167,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "import pandas as pd\n",
+ "import matplotlib.pyplot as plt\n",
+ "import numpy as np\n",
+ "from datetime import datetime\n",
+ "\n",
+ "pumpkins = pd.read_csv('../../data/US-pumpkins.csv')\n",
+ "pumpkins.head()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 168,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " Month DayOfYear Variety City Package Low Price \\\n",
+ "70 9 267 PIE TYPE BALTIMORE 1 1/9 bushel cartons 15.0 \n",
+ "71 9 267 PIE TYPE BALTIMORE 1 1/9 bushel cartons 18.0 \n",
+ "72 10 274 PIE TYPE BALTIMORE 1 1/9 bushel cartons 18.0 \n",
+ "73 10 274 PIE TYPE BALTIMORE 1 1/9 bushel cartons 17.0 \n",
+ "74 10 281 PIE TYPE BALTIMORE 1 1/9 bushel cartons 15.0 \n",
+ "\n",
+ " High Price Price \n",
+ "70 15.0 13.636364 \n",
+ "71 18.0 16.363636 \n",
+ "72 18.0 16.363636 \n",
+ "73 17.0 15.454545 \n",
+ "74 15.0 13.636364 "
+ ]
+ },
+ "execution_count": 168,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "pumpkins = pumpkins[pumpkins['Package'].str.contains('bushel', case=True, regex=True)]\n",
+ "\n",
+ "new_columns = ['Package', 'Variety', 'City Name', 'Month', 'Low Price', 'High Price', 'Date']\n",
+ "pumpkins = pumpkins.drop([c for c in pumpkins.columns if c not in new_columns], axis=1)\n",
+ "\n",
+ "price = (pumpkins['Low Price'] + pumpkins['High Price']) / 2\n",
+ "\n",
+ "month = pd.DatetimeIndex(pumpkins['Date']).month\n",
+ "day_of_year = pd.to_datetime(pumpkins['Date']).apply(lambda dt: (dt-datetime(dt.year,1,1)).days)\n",
+ "\n",
+ "new_pumpkins = pd.DataFrame(\n",
+ " {'Month': month, \n",
+ " 'DayOfYear' : day_of_year, \n",
+ " 'Variety': pumpkins['Variety'], \n",
+ " 'City': pumpkins['City Name'], \n",
+ " 'Package': pumpkins['Package'], \n",
+ " 'Low Price': pumpkins['Low Price'],\n",
+ " 'High Price': pumpkins['High Price'], \n",
+ " 'Price': price})\n",
+ "\n",
+ "new_pumpkins.loc[new_pumpkins['Package'].str.contains('1 1/9'), 'Price'] = price/1.1\n",
+ "new_pumpkins.loc[new_pumpkins['Package'].str.contains('1/2'), 'Price'] = price*2\n",
+ "\n",
+ "new_pumpkins.head()\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
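+ "The scatter plot reminds us that we only have monthly data from August through December. We probably need more data to be able to draw conclusions in a linear fashion.\n"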
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 169,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 169,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png": "",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "new_pumpkins.plot.scatter('Month','Price')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 170,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 170,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png": "",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "new_pumpkins.plot.scatter('DayOfYear','Price')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 171,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "-0.14878293554077535\n",
+ "-0.16673322492745407\n"
+ ]
+ }
+ ],
+ "source": [
+ "print(new_pumpkins['Month'].corr(new_pumpkins['Price']))\n",
+ "print(new_pumpkins['DayOfYear'].corr(new_pumpkins['Price']))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
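+ "It looks like the correlation is pretty small, but there is some other, more important relationship, because the price points in the plot above seem to form several distinct clusters. Let's make a plot that shows the different pumpkin varieties:\n"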
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 172,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png": "",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "ax=None\n",
+ "colors = ['red','blue','green','yellow']\n",
+ "for i,var in enumerate(new_pumpkins['Variety'].unique()):\n",
+ " ax = new_pumpkins[new_pumpkins['Variety']==var].plot.scatter('DayOfYear','Price',ax=ax,c=colors[i],label=var)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 173,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 173,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png": "",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "new_pumpkins.groupby('Variety')['Price'].mean().plot(kind='bar')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 174,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "-0.2669192282197318\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 174,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png": "",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "pie_pumpkins = new_pumpkins[new_pumpkins['Variety']=='PIE TYPE']\n",
+ "print(pie_pumpkins['DayOfYear'].corr(pie_pumpkins['Price']))\n",
+ "pie_pumpkins.plot.scatter('DayOfYear','Price')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Linear Regression\n",
+ "\n",
+ "We will use Scikit Learn to train a linear regression model:\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 175,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from sklearn.linear_model import LinearRegression\n",
+ "from sklearn.metrics import r2_score, mean_squared_error, mean_absolute_error\n",
+ "from sklearn.model_selection import train_test_split"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 176,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Mean error: 2.77 (17.2%)\n"
+ ]
+ }
+ ],
+ "source": [
+ "X = pie_pumpkins['DayOfYear'].to_numpy().reshape(-1,1)\n",
+ "y = pie_pumpkins['Price']\n",
+ "\n",
+ "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)\n",
+ "lin_reg = LinearRegression()\n",
+ "lin_reg.fit(X_train,y_train)\n",
+ "\n",
+ "pred = lin_reg.predict(X_test)\n",
+ "\n",
+ "mse = np.sqrt(mean_squared_error(y_test,pred))\n",
+ "print(f'Mean error: {mse:3.3} ({mse/np.mean(pred)*100:3.3}%)')\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 177,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "[]"
+ ]
+ },
+ "execution_count": 177,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png": "",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "plt.scatter(X_test,y_test)\n",
+ "plt.plot(X_test,pred)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "The slope of the line can be determined from the linear regression coefficients:\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 178,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(array([-0.01751876]), 21.133734359909326)"
+ ]
+ },
+ "execution_count": 178,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "lin_reg.coef_, lin_reg.intercept_"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "We can use the trained model to predict the price:\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 179,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "array([16.64893156])"
+ ]
+ },
+ "execution_count": 179,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Pumpkin price on programmer's day\n",
+ "\n",
+ "lin_reg.predict([[256]])"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
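+ "### Polynomial Regression\n",
+ "\n",
+ "Sometimes the relationship between features and results is inherently non-linear. For example, pumpkin prices might be high in winter (months=1,2), then drop over summer (months=5-7), and then rise again. Linear regression is unable to capture this relationship accurately.\n",
+ "\n",
+ "In this case, we may consider adding extra features. A simple way is to use polynomials of the input features, which results in **polynomial regression**. In Scikit Learn, we can automatically pre-compute polynomial features using pipelines:\n"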
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 180,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Mean error: 2.73 (17.0%)\n",
+ "Model determination: 0.07639977655280217\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ "[]"
+ ]
+ },
+ "execution_count": 180,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png": "",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "from sklearn.preprocessing import PolynomialFeatures\n",
+ "from sklearn.pipeline import make_pipeline\n",
+ "\n",
+ "pipeline = make_pipeline(PolynomialFeatures(2), LinearRegression())\n",
+ "\n",
+ "pipeline.fit(X_train,y_train)\n",
+ "\n",
+ "pred = pipeline.predict(X_test)\n",
+ "\n",
+ "mse = np.sqrt(mean_squared_error(y_test,pred))\n",
+ "print(f'Mean error: {mse:3.3} ({mse/np.mean(pred)*100:3.3}%)')\n",
+ "\n",
+ "score = pipeline.score(X_train,y_train)\n",
+ "print('Model determination: ', score)\n",
+ "\n",
+ "plt.scatter(X_test,y_test)\n",
+ "plt.plot(sorted(X_test),pipeline.predict(sorted(X_test)))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
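+ "### Encoding varieties\n",
+ "\n",
+ "In an ideal world, we want to be able to predict prices for different pumpkin varieties using the same model. To take variety into account, we first need to convert it to numeric form, or **encode** it. There are several ways to do this:\n",
+ "\n",
+ "* Simple numeric encoding, which builds a table of the different varieties and then replaces each variety name with its index in that table. This is not the best choice for linear regression, because linear regression takes the numeric value of the index into account, and that value is unlikely to correlate numerically with the price.\n",
+ "* One-hot encoding, which replaces the `Variety` column with 4 different columns, one for each variety. A column contains 1 if the row is of that variety and 0 otherwise.\n",
+ "\n",
+ "The code below shows how we can one-hot encode the variety:\n"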
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 181,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " FAIRYTALE MINIATURE MIXED HEIRLOOM VARIETIES PIE TYPE\n",
+ "70 0 0 0 1\n",
+ "71 0 0 0 1\n",
+ "72 0 0 0 1\n",
+ "73 0 0 0 1\n",
+ "74 0 0 0 1\n",
+ "... ... ... ... ...\n",
+ "1738 0 1 0 0\n",
+ "1739 0 1 0 0\n",
+ "1740 0 1 0 0\n",
+ "1741 0 1 0 0\n",
+ "1742 0 1 0 0\n",
+ "\n",
+ "[415 rows x 4 columns]"
+ ]
+ },
+ "execution_count": 181,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "pd.get_dummies(new_pumpkins['Variety'])"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "### Linear Regression on Variety\n",
+ "\n",
+ "We will now use the same code as above, but instead of `DayOfYear` we will use our one-hot-encoded variety as input:\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 182,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "X = pd.get_dummies(new_pumpkins['Variety'])\n",
+ "y = new_pumpkins['Price']"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 183,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Mean error: 5.24 (19.7%)\n",
+ "Model determination: 0.774085281105197\n"
+ ]
+ }
+ ],
+ "source": [
+ "def run_linear_regression(X,y):\n",
+ " X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)\n",
+ " lin_reg = LinearRegression()\n",
+ " lin_reg.fit(X_train,y_train)\n",
+ "\n",
+ " pred = lin_reg.predict(X_test)\n",
+ "\n",
+ " mse = np.sqrt(mean_squared_error(y_test,pred))\n",
+ " print(f'Mean error: {mse:3.3} ({mse/np.mean(pred)*100:3.3}%)')\n",
+ "\n",
+ " score = lin_reg.score(X_train,y_train)\n",
+ " print('Model determination: ', score)\n",
+ "\n",
+ "run_linear_regression(X,y)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "We can also try using other features in the same manner, and combine them with numerical features such as `Month` or `DayOfYear`:\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 184,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Mean error: 2.84 (10.5%)\n",
+ "Model determination: 0.9401096672643048\n"
+ ]
+ }
+ ],
+ "source": [
+ "X = pd.get_dummies(new_pumpkins['Variety']) \\\n",
+ " .join(new_pumpkins['Month']) \\\n",
+ " .join(pd.get_dummies(new_pumpkins['City'])) \\\n",
+ " .join(pd.get_dummies(new_pumpkins['Package']))\n",
+ "y = new_pumpkins['Price']\n",
+ "\n",
+ "run_linear_regression(X,y)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
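+ "### Polynomial Regression\n",
+ "\n",
+ "Polynomial regression can also be used with one-hot-encoded categorical features. The code to train polynomial regression is essentially the same as the code we saw above.\n"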
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 185,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Mean error: 2.23 (8.25%)\n",
+ "Model determination: 0.9652870784724543\n"
+ ]
+ }
+ ],
+ "source": [
+ "from sklearn.preprocessing import PolynomialFeatures\n",
+ "from sklearn.pipeline import make_pipeline\n",
+ "\n",
+ "pipeline = make_pipeline(PolynomialFeatures(2), LinearRegression())\n",
+ "\n",
+ "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)\n",
+ "\n",
+ "pipeline.fit(X_train,y_train)\n",
+ "\n",
+ "pred = pipeline.predict(X_test)\n",
+ "\n",
+ "mse = np.sqrt(mean_squared_error(y_test,pred))\n",
+ "print(f'Mean error: {mse:3.3} ({mse/np.mean(pred)*100:3.3}%)')\n",
+ "\n",
+ "score = pipeline.score(X_train,y_train)\n",
+ "print('Model determination: ', score)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
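+ "\n---\n\n**Disclaimer**: \nThis document has been translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The original document in its native language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.\n"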
+ ]
+ }
+ ],
+ "metadata": {
+ "interpreter": {
+ "hash": "86193a1ab0ba47eac1c69c1756090baa3b420b3eea7d4aafab8b85f8b312f0c5"
+ },
+ "kernelspec": {
+ "display_name": "Python 3.7.0 64-bit ('3.7')",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.9.5"
+ },
+ "metadata": {
+ "interpreter": {
+ "hash": "70b38d7a306a849643e446cd70466270a13445e5987dfa1344ef2b127438fa4d"
+ }
+ },
+ "orig_nbformat": 2,
+ "coopTranslator": {
+ "original_hash": "d77bd89ae7e79780c68c58bab91f13f8",
+ "translation_date": "2025-09-03T19:18:23+00:00",
+ "source_file": "2-Regression/3-Linear/solution/notebook.ipynb",
+ "language_code": "zh"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
\ No newline at end of file
diff --git a/translations/zh-CN/2-Regression/4-Logistic/README.md b/translations/zh-CN/2-Regression/4-Logistic/README.md
new file mode 100644
index 000000000..5fdcd816a
--- /dev/null
+++ b/translations/zh-CN/2-Regression/4-Logistic/README.md
@@ -0,0 +1,414 @@
+# Logistic regression to predict categories
+
+
+
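+## [Pre-lecture quiz](https://ff-quizzes.netlify.app/en/ml/)
+
+> ### [This lesson is also available in R!](../../../../2-Regression/4-Logistic/solution/R/lesson_4.html)
+
+## Introduction
+
+In this lesson, we will study logistic regression, one of the classic machine learning techniques. You can use it to discover patterns that predict binary categories. Is this candy chocolate or not? Is this disease contagious or not? Will this customer choose this product or not?
+
+In this lesson, you will learn:
+
+- A new library for data visualization
+- Techniques for logistic regression
+
+✅ Deepen your understanding of this type of regression in this [Learn module](https://docs.microsoft.com/learn/modules/train-evaluate-classification-models?WT.mc_id=academic-77952-leestott).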
+
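+## Prerequisite
+
+Having worked with the pumpkin data, we are now familiar enough with it to realize that there is one binary category we can work with: `Color`.
+
+Let's build a logistic regression model to predict, given some variables, _what color a given pumpkin is likely to be_ (orange 🎃 or white 👻).
+
+> Why are we talking about binary classification in a lesson group about regression? Only for linguistic convenience, as logistic regression is [really a classification method](https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression), albeit a linear-based one. You will learn about other ways to classify data in the next lesson group.
+
+## Define the question
+
+For our purposes, we will express this as a binary: 'White' or 'Not White'. There is also a 'striped' category in our dataset, but it has few instances, so we will not use it. It disappears once we remove null values from the dataset anyway.
+
+> 🎃 Fun fact: we sometimes call white pumpkins 'ghost' pumpkins. They aren't very easy to carve, so they aren't as popular as the orange ones, but they are cool looking! So we could also reformulate our question as: 'Ghost' or 'Not Ghost'. 👻
+
+## About logistic regression
+
+Logistic regression differs from linear regression, which you learned about previously, in a few important ways.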
+
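+[ML for beginners - Understanding logistic regression for classification](https://youtu.be/KpeCT6nEpBY)
+
+> 🎥 Follow the link above for a short video overview of logistic regression.
+
+### Binary classification
+
+Logistic regression does not offer the same features as linear regression. The former predicts a binary category ("white or not white"), whereas the latter predicts continuous values, for example, given the origin of a pumpkin and the time of harvest, _how much its price will rise_.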
+
+
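+> Infographic by [Dasani Madipalli](https://twitter.com/dasani_decoded)
+
+### Other classifications
+
+There are other types of logistic regression, including multinomial and ordinal:
+
+- **Multinomial**, which involves more than two categories, such as "orange, white, and striped".
+- **Ordinal**, which involves ordered categories, useful when the outcomes have a logical order, such as pumpkins ordered by a finite number of sizes (mini, sm, med, lg, xl, xxl).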
+
+
+
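+### Variables DO NOT have to correlate
+
+Remember how linear regression worked better with more strongly correlated variables? Logistic regression is the opposite: the variables do not have to be correlated. That suits this data, whose correlations are fairly weak.
+
+### You need a lot of clean data
+
+Logistic regression gives more accurate results when you use more data; our dataset is small, so it is not ideal for this task.
+
+[ML for beginners - Data analysis and preparation for logistic regression](https://youtu.be/B2X4H9vcXTs)
+
+> 🎥 Follow the link above for a short video overview of preparing data for logistic regression.
+
+✅ Think about the types of data that would lend themselves well to logistic regression.
+
+## Exercise - tidy the data
+
+First, clean the data a bit, dropping null values and selecting only some of the columns:
+
+1. Add the following code: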
+
+ ```python
+
+ columns_to_select = ['City Name','Package','Variety', 'Origin','Item Size', 'Color']
+ pumpkins = full_pumpkins.loc[:, columns_to_select]
+
+ pumpkins.dropna(inplace=True)
+ ```
+
+ You can always take a peek at your new dataframe:
+
+ ```python
+ pumpkins.info()
+ ```
+
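+### Visualization - categorical plot
+
+By now you have loaded the [starter notebook](../../../../2-Regression/4-Logistic/notebook.ipynb) with the pumpkin data and cleaned it to keep a few variables, including `Color`. Let's visualize the dataframe in the notebook using a different library: [Seaborn](https://seaborn.pydata.org/index.html), which is built on the Matplotlib we used earlier.
+
+Seaborn offers some neat ways to visualize your data. For example, you can compare the distributions of the data for each `Variety` and `Color` in a categorical plot.
+
+1. Create such a plot with the `catplot` function, using the pumpkin data `pumpkins` and specifying a color mapping for each pumpkin category (orange or white):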
+
+ ```python
+ import seaborn as sns
+
+ palette = {
+ 'ORANGE': 'orange',
+ 'WHITE': 'wheat',
+ }
+
+ sns.catplot(
+ data=pumpkins, y="Variety", hue="Color", kind="count",
+ palette=palette,
+ )
+ ```
+
+ 
+
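+ By observing the data, you can see how the `Color` data relates to `Variety`.
+
+ ✅ Given this categorical plot, what are some interesting explorations you can envision?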
+
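+### Data pre-processing: feature and label encoding
+
+All columns of our pumpkins dataset contain string values. Working with categorical data is intuitive for humans but not for machines. Machine learning algorithms work well with numbers. That is why encoding is a very important step in the data pre-processing phase: it lets us turn categorical data into numerical data without losing any information. Good encoding leads to building a good model.
+
+For feature encoding there are two main types of encoders:
+
+1. **Ordinal encoder**: it suits ordinal variables, which are categorical variables whose data follows a logical ordering, like the `Item Size` column in our dataset. It creates a mapping such that each category is represented by a number, which is the order of the category in the column.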
+
+ ```python
+ from sklearn.preprocessing import OrdinalEncoder
+
+ item_size_categories = [['sml', 'med', 'med-lge', 'lge', 'xlge', 'jbo', 'exjbo']]
+ ordinal_features = ['Item Size']
+ ordinal_encoder = OrdinalEncoder(categories=item_size_categories)
+ ```
+
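+2. **Categorical encoder**: it suits nominal variables, which are categorical variables whose data does not follow a logical ordering, like all the features other than `Item Size` in our dataset. It is a one-hot encoding, which means that each category is represented by a binary column: the encoded variable is equal to 1 if the pumpkin belongs to that category and 0 otherwise.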
+
+ ```python
+ from sklearn.preprocessing import OneHotEncoder
+
+ categorical_features = ['City Name', 'Package', 'Variety', 'Origin']
+ categorical_encoder = OneHotEncoder(sparse_output=False)
+ ```
+
+Then, `ColumnTransformer` is used to combine multiple encoders into a single step and apply them to the appropriate columns.
+
+```python
+ from sklearn.compose import ColumnTransformer
+
+ ct = ColumnTransformer(transformers=[
+ ('ord', ordinal_encoder, ordinal_features),
+ ('cat', categorical_encoder, categorical_features)
+ ])
+
+ ct.set_output(transform='pandas')
+ encoded_features = ct.fit_transform(pumpkins)
+```
+
+On the other hand, to encode the label, we use the scikit-learn `LabelEncoder` class, a utility class that normalizes labels so that they contain only values between 0 and n_classes-1 (here, 0 and 1).
+
+```python
+ from sklearn.preprocessing import LabelEncoder
+
+ label_encoder = LabelEncoder()
+ encoded_label = label_encoder.fit_transform(pumpkins['Color'])
+```
+
+Once we have encoded the features and the label, we can merge them into a new dataframe `encoded_pumpkins`.
+
+```python
+ encoded_pumpkins = encoded_features.assign(Color=encoded_label)
+```
+
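+✅ What are the advantages of using an ordinal encoder for the `Item Size` column?
+
+### Analyse relationships between variables
+
+Now that we have pre-processed our data, we can analyse the relationships between the features and the label to get an idea of how well the model will be able to predict the label given the features.
+
+The best way to perform this kind of analysis is to plot the data. We will again use the Seaborn `catplot` function to visualize the relationships between `Item Size`, `Variety` and `Color` in a categorical plot. To plot the data better, we will use the encoded `Item Size` column and the unencoded `Variety` column.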
+
+```python
+ palette = {
+ 'ORANGE': 'orange',
+ 'WHITE': 'wheat',
+ }
+ pumpkins['Item Size'] = encoded_pumpkins['ord__Item Size']
+
+ g = sns.catplot(
+ data=pumpkins,
+ x="Item Size", y="Color", row='Variety',
+ kind="box", orient="h",
+ sharex=False, margin_titles=True,
+ height=1.8, aspect=4, palette=palette,
+ )
+ g.set(xlabel="Item Size", ylabel="").set(xlim=(0,6))
+ g.set_titles(row_template="{row_name}")
+```
+
+
+
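+### Use a swarm plot
+
+Since `Color` is a binary category (White or Not), it needs "[a specialized approach](https://seaborn.pydata.org/tutorial/categorical.html?highlight=bar) to visualization". There are other ways to visualize the relationship of this category with other variables.
+
+You can visualize variables side by side with Seaborn plots.
+
+1. Try a 'swarm' plot to show the distribution of values: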
+
+ ```python
+ palette = {
+ 0: 'orange',
+ 1: 'wheat'
+ }
+ sns.swarmplot(x="Color", y="ord__Item Size", data=encoded_pumpkins, palette=palette)
+ ```
+
+ 
+
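+**Watch out**: the code above might generate a warning, since Seaborn fails to represent that many data points in a swarm plot. A possible solution is to decrease the size of the markers with the `size` parameter. However, be aware that this affects the readability of the plot.
+
+> **🧮 The math**
+>
+> Logistic regression relies on the concept of 'maximum likelihood' using [sigmoid functions](https://wikipedia.org/wiki/Sigmoid_function). On a plot, a sigmoid function looks like an 'S' shape. It takes a value and maps it to somewhere between 0 and 1. Its curve is also called a 'logistic curve'. Its formula looks like this:
+>
+> $$f(x) = \frac{L}{1 + e^{-k(x - x_0)}}$$
+>
+> where the sigmoid's midpoint $x_0$ sits at x's 0 point, $L$ is the curve's maximum value, and $k$ is the curve's steepness. If the outcome of the function is more than 0.5, the label in question is assigned the class '1' of the binary choice; otherwise, it is classified as '0'.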
+
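As a rough illustration of the curve described above, the following sketch evaluates the general logistic function with Python's standard `math` module and applies the 0.5 threshold. The function and parameter names (`logistic`, `classify`, `x0`) are assumptions chosen for this example and are not part of the lesson's code.

```python
import math


def logistic(x, L=1.0, k=1.0, x0=0.0):
    """General logistic curve: L / (1 + e^(-k * (x - x0)))."""
    return L / (1.0 + math.exp(-k * (x - x0)))


def classify(x, threshold=0.5):
    """Assign class 1 when the sigmoid output exceeds the threshold, else class 0."""
    return 1 if logistic(x) > threshold else 0


for value in (-4.0, -1.0, 0.0, 1.0, 4.0):
    print(f'x={value:+.1f}  sigmoid={logistic(value):.3f}  class={classify(value)}')
```

With the default parameters the curve passes through 0.5 at `x = 0`, so inputs above zero land in class 1 and inputs at or below zero land in class 0.
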
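+## Build your model
+
+Building a model for binary classification is surprisingly straightforward in Scikit-learn.
+
+[ML for beginners - Logistic regression for classification of data](https://youtu.be/MmZS2otPrQ8)
+
+> 🎥 Follow the link above for a short video overview of building a logistic regression model.
+
+1. Select the variables you want to use in your classification model and split the training and test sets by calling `train_test_split()`: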
+
+ ```python
+ from sklearn.model_selection import train_test_split
+
+ X = encoded_pumpkins[encoded_pumpkins.columns.difference(['Color'])]
+ y = encoded_pumpkins['Color']
+
+ X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
+
+ ```
+
+2. Now you can train your model by calling `fit()` with your training data, and print out its result:
+
+ ```python
+ from sklearn.metrics import f1_score, classification_report
+ from sklearn.linear_model import LogisticRegression
+
+ model = LogisticRegression()
+ model.fit(X_train, y_train)
+ predictions = model.predict(X_test)
+
+ print(classification_report(y_test, predictions))
+ print('Predicted labels: ', predictions)
+ print('F1-score: ', f1_score(y_test, predictions))
+ ```
+
+ Take a look at your model's scoreboard. It's not bad, considering you have only about 1000 rows of data:
+
+ ```output
+ precision recall f1-score support
+
+ 0 0.94 0.98 0.96 166
+ 1 0.85 0.67 0.75 33
+
+ accuracy 0.92 199
+ macro avg 0.89 0.82 0.85 199
+ weighted avg 0.92 0.92 0.92 199
+
+ Predicted labels: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0
+ 0 0 0 0 0 1 0 1 0 0 1 0 0 0 0 0 1 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0
+ 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 1 0
+ 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 1 1 0
+ 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1
+ 0 0 0 1 0 0 0 0 0 0 0 0 1 1]
+ F1-score: 0.7457627118644068
+ ```
+
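+## Better comprehension via a confusion matrix
+
+While you can get a scoreboard report [terms](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.classification_report.html?highlight=classification_report#sklearn.metrics.classification_report) by printing out the items above, you might be able to understand your model more easily by using a [confusion matrix](https://scikit-learn.org/stable/modules/model_evaluation.html#confusion-matrix) to see how it is performing.
+
+> 🎓 A '[confusion matrix](https://wikipedia.org/wiki/Confusion_matrix)' (or 'error matrix') is a table that expresses your model's true vs. false positives and negatives, thus gauging the accuracy of predictions.
+
+1. To use a confusion matrix, call `confusion_matrix()`: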
+
+ ```python
+ from sklearn.metrics import confusion_matrix
+ confusion_matrix(y_test, predictions)
+ ```
+
+ Take a look at your model's confusion matrix:
+
+ ```output
+ array([[162, 4],
+ [ 11, 22]])
+ ```
+
+In Scikit-learn, the rows (axis 0) of a confusion matrix are the actual labels and the columns (axis 1) are the predicted labels.
+
+| | 0 | 1 |
+| :---: | :---: | :---: |
+| 0 | TN | FP |
+| 1 | FN | TP |
+
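The layout in the table can be reproduced by hand. The sketch below counts the four cells for a small made-up set of labels, following the same convention of rows as actual labels and columns as predicted labels. The `y_true` and `y_pred` lists are invented for illustration and are not the lesson's data, and `confusion_counts` is a hypothetical helper rather than a scikit-learn function.

```python
def confusion_counts(y_true, y_pred):
    """Return [[TN, FP], [FN, TP]] for binary 0/1 labels (rows = actual, columns = predicted)."""
    matrix = [[0, 0], [0, 0]]
    for actual, predicted in zip(y_true, y_pred):
        matrix[actual][predicted] += 1
    return matrix


# Invented example labels: 1 = white, 0 = not white.
y_true = [0, 0, 0, 1, 1, 1, 0, 1]
y_pred = [0, 1, 0, 1, 0, 1, 0, 1]

matrix = confusion_counts(y_true, y_pred)
for row in matrix:
    print(row)
```
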
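+What's going on here? Let's say our model is asked to classify pumpkins between two binary categories, 'white' and 'not-white'.
+
+- If your model predicts a pumpkin as not white and it belongs to the category 'not-white' in reality, we call it a true negative, shown by the top left number.
+- If your model predicts a pumpkin as white and it belongs to the category 'not-white' in reality, we call it a false positive, shown by the top right number.
+- If your model predicts a pumpkin as not white and it belongs to the category 'white' in reality, we call it a false negative, shown by the bottom left number.
+- If your model predicts a pumpkin as white and it belongs to the category 'white' in reality, we call it a true positive, shown by the bottom right number.
+
+As you might have guessed, it is preferable to have a larger number of true positives and true negatives and a lower number of false positives and false negatives, which implies that the model performs better.
+
+How does the confusion matrix relate to precision and recall? Remember, the classification report printed above showed a precision of 0.85 and a recall of 0.67 for the 'white' class (label 1).
+
+Precision = tp / (tp + fp) = 22 / (22 + 4) = 0.8461538461538461
+
+Recall = tp / (tp + fn) = 22 / (22 + 11) = 0.6666666666666666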
+
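The same arithmetic can be checked in a few lines of Python, assuming the confusion matrix printed above with rows as actual labels and columns as predicted labels. The variable names are illustrative.

```python
# Confusion matrix from the lesson: rows = actual labels, columns = predicted labels.
matrix = [[162, 4],
          [11, 22]]
(tn, fp), (fn, tp) = matrix

precision = tp / (tp + fp)  # 22 / (22 + 4)
recall = tp / (tp + fn)     # 22 / (22 + 11)

print(f'Precision: {precision:.2f}')
print(f'Recall: {recall:.2f}')
```
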
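+✅ Q: According to the confusion matrix, how did the model do?
+A: Not bad; most predictions are true negatives or true positives, but there are also a number of false negatives.
+
+Let's revisit the terms we saw earlier with the help of the confusion matrix's mapping of TP/TN and FP/FN:
+
+🎓 Precision: TP/(TP + FP)
+The fraction of relevant instances among the retrieved instances (e.g. which labels were well-labeled).
+
+🎓 Recall: TP/(TP + FN)
+The fraction of relevant instances that were retrieved (of the pumpkins that really are white, how many the model found).
+
+🎓 f1-score: (2 * precision * recall)/(precision + recall)
+The harmonic mean of precision and recall, with best being 1 and worst being 0.
+
+🎓 Support:
+The number of true occurrences of each label in the data.
+
+🎓 Accuracy: (TP + TN)/(TP + TN + FP + FN)
+The percentage of labels predicted accurately for a sample.
+
+🎓 Macro Avg:
+The unweighted mean of the metric over the labels, not taking label imbalance into account.
+
+🎓 Weighted Avg:
+The mean of the metric over the labels, weighted by their support (the number of true instances for each label).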
+
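To see how these definitions fit together, here is a minimal sketch that recomputes the per-class and averaged figures of the classification report from the confusion matrix shown earlier. It assumes rows are actual labels and columns are predicted labels; `per_class` is a hypothetical helper written for this example, not a scikit-learn function.

```python
# Confusion matrix from the lesson: rows = actual labels, columns = predicted labels.
matrix = [[162, 4],
          [11, 22]]


def per_class(matrix, label):
    """Precision, recall, f1 and support for one label, treating it as the positive class."""
    true_pos = matrix[label][label]
    predicted = sum(row[label] for row in matrix)  # column total: times the label was predicted
    support = sum(matrix[label])                   # row total: true occurrences of the label
    precision = true_pos / predicted
    recall = true_pos / support
    f1 = 2 * precision * recall / (precision + recall)
    return {'precision': precision, 'recall': recall, 'f1': f1, 'support': support}


report = {label: per_class(matrix, label) for label in (0, 1)}
total = sum(metrics['support'] for metrics in report.values())

accuracy = sum(matrix[i][i] for i in range(2)) / total
macro_f1 = sum(metrics['f1'] for metrics in report.values()) / len(report)
weighted_f1 = sum(metrics['f1'] * metrics['support'] for metrics in report.values()) / total

for label, metrics in report.items():
    print(label, {name: round(value, 2) for name, value in metrics.items()})
print('accuracy', round(accuracy, 2))
print('macro f1', round(macro_f1, 2))
print('weighted f1', round(weighted_f1, 2))
```

Because class 0 has far more support than class 1, the weighted average sits closer to the class 0 score than the macro average does.
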
+✅ Can you think of which metric you should watch if you want your model to reduce the number of false negatives?
+
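+## Visualize the ROC curve of this model
+
+[ML for beginners - Analyzing logistic regression performance with ROC curves](https://youtu.be/GApO575jTA0)
+
+> 🎥 Follow the link above for a short video overview of ROC curves
+
+Let's do one more visualization to see the so-called 'ROC' curve: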
+
+```python
+from sklearn.metrics import roc_curve, roc_auc_score
+import matplotlib
+import matplotlib.pyplot as plt
+%matplotlib inline
+
+y_scores = model.predict_proba(X_test)
+fpr, tpr, thresholds = roc_curve(y_test, y_scores[:,1])
+
+fig = plt.figure(figsize=(6, 6))
+plt.plot([0, 1], [0, 1], 'k--')
+plt.plot(fpr, tpr)
+plt.xlabel('False Positive Rate')
+plt.ylabel('True Positive Rate')
+plt.title('ROC Curve')
+plt.show()
+```
+
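+Using Matplotlib, plot the model's [Receiver Operating Characteristic](https://scikit-learn.org/stable/auto_examples/model_selection/plot_roc.html?highlight=roc) or ROC. ROC curves are often used to get a view of the output of a classifier in terms of its true vs. false positives. "ROC curves typically feature true positive rate on the Y axis, and false positive rate on the X axis." Thus, the steepness of the curve and the space between the midpoint line and the curve matter: you want a curve that quickly heads up and over the line. In our case, there are false positives to start with, and then the line heads up and over properly: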
+
+
+
+Finally, use Scikit-learn's [`roc_auc_score` API](https://scikit-learn.org/stable/modules/generated/sklearn.metrics.roc_auc_score.html?highlight=roc_auc#sklearn.metrics.roc_auc_score) to compute the actual 'Area Under the Curve' (AUC):
+
+```python
+auc = roc_auc_score(y_test,y_scores[:,1])
+print(auc)
+```
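+The result is `0.9749908725812341`. Given that the AUC ranges from 0 to 1, you want a big score, since a model that is 100% correct in its predictions will have an AUC of 1; in this case, the model is _pretty good_.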
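+
+One common way to read this number, given here as a standard interpretation rather than something computed in this lesson: the AUC is the probability that the model scores a randomly chosen positive example higher than a randomly chosen negative one (ignoring ties):
+
+$$\mathrm{AUC} = P\big(s(x^{+}) > s(x^{-})\big)$$
+
+Here $s(\cdot)$ is the predicted probability passed to `roc_auc_score`, $x^{+}$ is a random white pumpkin and $x^{-}$ a random non-white one, assuming white is the positive class as in the confusion-matrix discussion.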
+
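+In future lessons on classification, you will learn how to iterate to improve your model's scores. But for now, congratulations! You've completed these regression lessons!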
+
+---
+
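+## 🚀Challenge
+
+There's a lot more to unpack regarding logistic regression! But the best way to learn is to experiment. Find a dataset that lends itself to this type of analysis and build a model with it. What do you learn? Tip: try [Kaggle](https://www.kaggle.com/search?q=logistic+regression+datasets) for interesting datasets.
+
+## [Post-lecture quiz](https://ff-quizzes.netlify.app/en/ml/)
+
+## Review & Self Study
+
+Read the first few pages of [this paper from Stanford](https://web.stanford.edu/~jurafsky/slp3/5.pdf) on some practical uses for logistic regression. Think about which tasks are better suited to each of the regression types we have studied so far. Which approach would work best?
+
+## Assignment
+
+[Retrying some regression](assignment.md)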
+
+---
+
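+**Disclaimer**:
+This document has been translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The document in its original language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.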
\ No newline at end of file
diff --git a/translations/zh-CN/2-Regression/4-Logistic/assignment.md b/translations/zh-CN/2-Regression/4-Logistic/assignment.md
new file mode 100644
index 000000000..88dafe245
--- /dev/null
+++ b/translations/zh-CN/2-Regression/4-Logistic/assignment.md
@@ -0,0 +1,16 @@
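+# Retrying some regression
+
+## Instructions
+
+In the lesson, you used a subset of the pumpkin data. Now, go back to the original data and try to use all of it, cleaned and standardized, to build a Logistic Regression model.
+
+## Rubric
+
+| Criteria | Exemplary | Adequate | Needs Improvement |
+| -------- | ----------------------------------------------------------------------- | ------------------------------------------------------------ | ----------------------------------------------------------- |
+| | A notebook is presented with a well-explained and well-performing model | A notebook is presented with a model that performs minimally | A notebook is presented with a sub-performing model or none |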
+
+---
+
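+**Disclaimer**:
+This document has been translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The document in its original language should be considered the authoritative source. For important information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.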
\ No newline at end of file
diff --git a/translations/zh-CN/2-Regression/4-Logistic/notebook.ipynb b/translations/zh-CN/2-Regression/4-Logistic/notebook.ipynb
new file mode 100644
index 000000000..e63e7e00d
--- /dev/null
+++ b/translations/zh-CN/2-Regression/4-Logistic/notebook.ipynb
@@ -0,0 +1,269 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
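+ "## Pumpkin Varieties and Color\n",
+ "\n",
+ "Load up the required libraries and dataset. Convert the data to a dataframe containing a subset of the data.\n",
+ "\n",
+ "Let's look at the relationship between color and variety.\n"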
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " City Name Type Package Variety Sub Variety Grade Date \\\n",
+ "0 BALTIMORE NaN 24 inch bins NaN NaN NaN 4/29/17 \n",
+ "1 BALTIMORE NaN 24 inch bins NaN NaN NaN 5/6/17 \n",
+ "2 BALTIMORE NaN 24 inch bins HOWDEN TYPE NaN NaN 9/24/16 \n",
+ "3 BALTIMORE NaN 24 inch bins HOWDEN TYPE NaN NaN 9/24/16 \n",
+ "4 BALTIMORE NaN 24 inch bins HOWDEN TYPE NaN NaN 11/5/16 \n",
+ "\n",
+ " Low Price High Price Mostly Low ... Unit of Sale Quality Condition \\\n",
+ "0 270.0 280.0 270.0 ... NaN NaN NaN \n",
+ "1 270.0 280.0 270.0 ... NaN NaN NaN \n",
+ "2 160.0 160.0 160.0 ... NaN NaN NaN \n",
+ "3 160.0 160.0 160.0 ... NaN NaN NaN \n",
+ "4 90.0 100.0 90.0 ... NaN NaN NaN \n",
+ "\n",
+ " Appearance Storage Crop Repack Trans Mode Unnamed: 24 Unnamed: 25 \n",
+ "0 NaN NaN NaN E NaN NaN NaN \n",
+ "1 NaN NaN NaN E NaN NaN NaN \n",
+ "2 NaN NaN NaN N NaN NaN NaN \n",
+ "3 NaN NaN NaN N NaN NaN NaN \n",
+ "4 NaN NaN NaN N NaN NaN NaN \n",
+ "\n",
+ "[5 rows x 26 columns]"
+ ]
+ },
+ "execution_count": 1,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "import pandas as pd\n",
+ "import numpy as np\n",
+ "\n",
+ "full_pumpkins = pd.read_csv('../data/US-pumpkins.csv')\n",
+ "\n",
+ "full_pumpkins.head()\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
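+ "\n---\n\n**Disclaimer**: \nThis document has been translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The document in its original language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.\n"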
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.11.1"
+ },
+ "metadata": {
+ "interpreter": {
+ "hash": "70b38d7a306a849643e446cd70466270a13445e5987dfa1344ef2b127438fa4d"
+ }
+ },
+ "orig_nbformat": 2,
+ "coopTranslator": {
+ "original_hash": "dee08c2b49057b0de8b6752c4dbca368",
+ "translation_date": "2025-09-03T19:30:06+00:00",
+ "source_file": "2-Regression/4-Logistic/notebook.ipynb",
+ "language_code": "zh"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
\ No newline at end of file
diff --git a/translations/zh-CN/2-Regression/4-Logistic/solution/Julia/README.md b/translations/zh-CN/2-Regression/4-Logistic/solution/Julia/README.md
new file mode 100644
index 000000000..f30fc4eeb
--- /dev/null
+++ b/translations/zh-CN/2-Regression/4-Logistic/solution/Julia/README.md
@@ -0,0 +1,6 @@
+
+
+---
+
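+**Disclaimer**:
+This document has been translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The document in its original language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.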
\ No newline at end of file
diff --git a/translations/zh-CN/2-Regression/4-Logistic/solution/R/lesson_4-R.ipynb b/translations/zh-CN/2-Regression/4-Logistic/solution/R/lesson_4-R.ipynb
new file mode 100644
index 000000000..6a61a9f92
--- /dev/null
+++ b/translations/zh-CN/2-Regression/4-Logistic/solution/R/lesson_4-R.ipynb
@@ -0,0 +1,685 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
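+ "## Build a logistic regression model - Lesson 4\n",
+ "\n",
+ "\n",
+ "\n",
+ "#### **[Pre-lecture quiz](https://gray-sand-07a10f403.1.azurestaticapps.net/quiz/15/)**\n",
+ "\n",
+ "#### Introduction\n",
+ "\n",
+ "In this final lesson on regression, we will learn a classic machine learning technique: logistic regression. You can use it to discover patterns that predict binary categories. Is this candy chocolate or not? Is this disease contagious or not? Will this customer choose this product or not?\n",
+ "\n",
+ "In this lesson, you will learn:\n",
+ "\n",
+ "- Techniques for logistic regression\n",
+ "\n",
+ "✅ Deepen your understanding of working with this type of regression in this [Learn module](https://learn.microsoft.com/training/modules/introduction-classification-models/?WT.mc_id=academic-77952-leestott).\n",
+ "\n",
+ "## Prerequisite\n",
+ "\n",
+ "Having worked with the pumpkin data, we are now familiar enough with it to realize that there is one binary category we can work with: `Color`.\n",
+ "\n",
+ "Let's build a logistic regression model to predict, given some variables, *what color a given pumpkin is likely to be* (orange 🎃 or white 👻).\n",
+ "\n",
+ "> Why are we talking about binary classification in a lesson grouping about regression? Only for linguistic convenience, as logistic regression is [really a classification method](https://scikit-learn.org/stable/modules/linear_model.html#logistic-regression), albeit a linear-based one. Learn about other ways to classify data in the next lesson group.\n",
+ "\n",
+ "For this lesson, we'll require the following packages:\n",
+ "\n",
+ "- `tidyverse`: The [tidyverse](https://www.tidyverse.org/) is a [collection of R packages](https://www.tidyverse.org/packages) designed to make data science faster, easier and more fun!\n",
+ "\n",
+ "- `tidymodels`: The [tidymodels](https://www.tidymodels.org/) framework is a [collection of packages](https://www.tidymodels.org/packages) for modeling and machine learning.\n",
+ "\n",
+ "- `janitor`: The [janitor package](https://github.com/sfirke/janitor) provides simple little tools for examining and cleaning dirty data.\n",
+ "\n",
+ "- `ggbeeswarm`: The [ggbeeswarm package](https://github.com/eclarke/ggbeeswarm) provides methods to create beeswarm-style plots using ggplot2.\n",
+ "\n",
+ "You can have them installed as:\n",
+ "\n",
+ "`install.packages(c(\"tidyverse\", \"tidymodels\", \"janitor\", \"ggbeeswarm\"))`\n",
+ "\n",
+ "Alternatively, the script below checks whether you have the packages required to complete this module and installs them for you in case they are missing.\n"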
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "vscode": {
+ "languageId": "r"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "suppressWarnings(if (!require(\"pacman\"))install.packages(\"pacman\"))\n",
+ "\n",
+ "pacman::p_load(tidyverse, tidymodels, janitor, ggbeeswarm)\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
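+ "## **Define the question**\n",
+ "\n",
+ "For our purposes, we will express this as a binary: 'White' or 'Not White'. There is also a 'striped' category in our dataset, but it has few instances, so we will not use it. It disappears anyway once we remove null values from the dataset.\n",
+ "\n",
+ "> 🎃 Fun fact: we sometimes call white pumpkins 'ghost' pumpkins. They aren't very easy to carve, so they aren't as popular as the orange ones, but they are cool looking! So we could also reformulate our question as: 'Ghost' or 'Not Ghost'. 👻\n",
+ "\n",
+ "## **About logistic regression**\n",
+ "\n",
+ "Logistic regression differs from linear regression, which you learned about previously, in a few important ways.\n",
+ "\n",
+ "#### **Binary classification**\n",
+ "\n",
+ "Logistic regression does not offer the same features as linear regression. The former offers a prediction about a `binary category` (\"orange or not orange\"), whereas the latter is capable of predicting `continuous values`, for example, given the origin of a pumpkin and the time of harvest, *how much its price will rise*.\n",
+ "\n",
+ "\n",
+ "\n",
+ "### Other classifications\n",
+ "\n",
+ "There are other types of logistic regression, including multinomial and ordinal:\n",
+ "\n",
+ "- **Multinomial**, which involves more than two categories, for example \"Orange, White, and Striped\".\n",
+ "\n",
+ "- **Ordinal**, which involves ordered categories, useful if we want to order our outcomes logically, like our pumpkins that are ordered by a finite number of sizes (mini, sm, med, lg, xl, xxl).\n",
+ "\n",
+ "\n",
+ "\n",
+ "#### **Variables DO NOT have to correlate**\n",
+ "\n",
+ "Remember how linear regression worked better with more correlated variables? Logistic regression is the opposite: the variables don't have to align. That works for this data, which has somewhat weak correlations.\n",
+ "\n",
+ "#### **You need a lot of clean data**\n",
+ "\n",
+ "Logistic regression gives more accurate results if you use more data; our small dataset is not optimal for this task, so keep that in mind.\n",
+ "\n",
+ "✅ Think about the types of data that would lend themselves well to logistic regression\n",
+ "\n",
+ "## Exercise - tidy the data\n",
+ "\n",
+ "First, clean the data a bit, dropping null values and selecting only some of the columns:\n",
+ "\n",
+ "1. Add the following code:\n"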
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "vscode": {
+ "languageId": "r"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "# Load the core tidyverse packages\n",
+ "library(tidyverse)\n",
+ "\n",
+ "# Import the data and clean column names\n",
+ "pumpkins <- read_csv(file = \"https://raw.githubusercontent.com/microsoft/ML-For-Beginners/main/2-Regression/data/US-pumpkins.csv\") %>% \n",
+ " clean_names()\n",
+ "\n",
+ "# Select desired columns\n",
+ "pumpkins_select <- pumpkins %>% \n",
+ " select(c(city_name, package, variety, origin, item_size, color)) \n",
+ "\n",
+ "# Drop rows containing missing values and encode color as factor (category)\n",
+ "pumpkins_select <- pumpkins_select %>% \n",
+ " drop_na() %>% \n",
+ " mutate(color = factor(color))\n",
+ "\n",
+ "# View the first few rows\n",
+ "pumpkins_select %>% \n",
+ " slice_head(n = 5)\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "You can always take a quick look at the new dataframe with the [*glimpse()*](https://pillar.r-lib.org/reference/glimpse.html) function, as below:\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "vscode": {
+ "languageId": "r"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "pumpkins_select %>% \n",
+ " glimpse()\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Let's confirm that we are actually dealing with a binary classification problem:\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "vscode": {
+ "languageId": "r"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "# Subset distinct observations in outcome column\n",
+ "pumpkins_select %>% \n",
+ " distinct(color)\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
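+ "### Visualization - categorical plot\n",
+ "By now you have loaded up the pumpkin data once again and cleaned it so as to preserve a dataset containing a few variables, including Color. Let's visualize the dataframe in the notebook using the ggplot library.\n",
+ "\n",
+ "The ggplot library offers some neat ways to visualize your data. For example, you can compare distributions of the data for each Variety and Color in a categorical plot.\n",
+ "\n",
+ "1. Create such a plot by using the `geom_bar` function, using our pumpkin data, and specifying a color mapping for each pumpkin category (orange or white):\n"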
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "vscode": {
+ "languageId": "r"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "# Specify colors for each value of the hue variable\n",
+ "palette <- c(ORANGE = \"orange\", WHITE = \"wheat\")\n",
+ "\n",
+ "# Create the bar plot\n",
+ "ggplot(pumpkins_select, aes(y = variety, fill = color)) +\n",
+ " geom_bar(position = \"dodge\") +\n",
+ " scale_fill_manual(values = palette) +\n",
+ " labs(y = \"Variety\", fill = \"Color\") +\n",
+ " theme_minimal()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
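+ "By observing the data, you can see how the Color data relates to Variety.\n",
+ "\n",
+ "✅ Given this categorical plot, what are some interesting explorations you can envision?\n"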
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
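+ "### Data pre-processing: feature encoding\n",
+ "\n",
+ "Our pumpkins dataset contains string values for all its columns. Working with categorical data is intuitive for humans but not for machines. Machine learning algorithms work well with numbers. That's why encoding is a very important step in the data pre-processing phase: it lets us turn categorical data into numerical data without losing any information. Good encoding leads to building a good model.\n",
+ "\n",
+ "For feature encoding there are two main types of encoders:\n",
+ "\n",
+ "1. **Ordinal encoder**: it suits ordinal variables, which are categorical variables whose data follows a logical ordering, like the `item_size` column in our dataset. It creates a mapping such that each category is represented by a number, which is the order of the category in the column.\n",
+ "\n",
+ "2. **Categorical encoder**: it suits nominal variables, which are categorical variables whose data does not follow a logical ordering, like all the features other than `item_size` in our dataset. It is a one-hot encoding, which means that each category is represented by a binary column: the encoded variable is equal to 1 if the pumpkin belongs to that category and 0 otherwise.\n",
+ "\n",
+ "Tidymodels provides yet another neat package: [recipes](https://recipes.tidymodels.org/), a package for preprocessing data. We'll define a `recipe` that specifies that all predictor columns should be encoded into a set of integers, `prep` it to estimate the quantities and statistics needed by any operations, and finally `bake` it to apply the computations to new data.\n",
+ "\n",
+ "> Normally, recipes is used as a preprocessor for modelling, where it defines what steps should be applied to a dataset in order to get it ready for modelling. In that case it is **highly recommended** that you use a `workflow()` instead of manually estimating a recipe using prep and bake. We'll see all this in just a moment.\n",
+ ">\n",
+ "> However, for now we are using recipes + prep + bake to specify what steps should be applied to a dataset in order to get it ready for data analysis, and then extract the preprocessed data with the steps applied.\n"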
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "vscode": {
+ "languageId": "r"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "# Preprocess and extract data to allow some data analysis\n",
+ "baked_pumpkins <- recipe(color ~ ., data = pumpkins_select) %>%\n",
+ " # Define ordering for item_size column\n",
+ " step_mutate(item_size = ordered(item_size, levels = c('sml', 'med', 'med-lge', 'lge', 'xlge', 'jbo', 'exjbo'))) %>%\n",
+ " # Convert factors to numbers using the order defined above (Ordinal encoding)\n",
+ " step_integer(item_size, zero_based = F) %>%\n",
+ " # Encode all other predictors using one hot encoding\n",
+ " step_dummy(all_nominal(), -all_outcomes(), one_hot = TRUE) %>%\n",
+ " prep(training = pumpkins_select) %>%\n",
+ " bake(new_data = NULL)\n",
+ "\n",
+ "# Display the first few rows of preprocessed data\n",
+ "baked_pumpkins %>% \n",
+ " slice_head(n = 5)\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
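+ "✅ What are the advantages of using an ordinal encoder for the Item Size column?\n",
+ "\n",
+ "### Analyse relationships between variables\n",
+ "\n",
+ "Now that we have pre-processed our data, we can analyse the relationships between the features and the label to get an idea of how well the model will be able to predict the label given the features. The best way to perform this kind of analysis is plotting the data. \n",
+ "We'll use ggplot again, this time with the `geom_boxplot` function, to visualize the relationships between Item Size, Variety and Color in a categorical plot. To better plot the data we'll use the encoded Item Size column and the unencoded Variety column.\n"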
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "vscode": {
+ "languageId": "r"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "# Define the color palette\n",
+ "palette <- c(ORANGE = \"orange\", WHITE = \"wheat\")\n",
+ "\n",
+ "# We need the encoded Item Size column to use it as the x-axis values in the plot\n",
+ "pumpkins_select_plot<-pumpkins_select\n",
+ "pumpkins_select_plot$item_size <- baked_pumpkins$item_size\n",
+ "\n",
+ "# Create the grouped box plot\n",
+ "ggplot(pumpkins_select_plot, aes(x = `item_size`, y = color, fill = color)) +\n",
+ " geom_boxplot() +\n",
+ " facet_grid(variety ~ ., scales = \"free_x\") +\n",
+ " scale_fill_manual(values = palette) +\n",
+ " labs(x = \"Item Size\", y = \"\") +\n",
+ " theme_minimal() +\n",
+ " theme(strip.text = element_text(size = 12)) +\n",
+ " theme(axis.text.x = element_text(size = 10)) +\n",
+ " theme(axis.title.x = element_text(size = 12)) +\n",
+ " theme(axis.title.y = element_blank()) +\n",
+ " theme(legend.position = \"bottom\") +\n",
+ " guides(fill = guide_legend(title = \"Color\")) +\n",
+ " theme(panel.spacing = unit(0.5, \"lines\"))+\n",
+ " theme(strip.text.y = element_text(size = 4, hjust = 0)) \n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
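+ "#### Use a swarm plot\n",
+ "\n",
+ "Since Color is a binary category (White or Not), it needs 'a [specialized approach](https://github.com/rstudio/cheatsheets/blob/main/data-visualization.pdf) to visualization'.\n",
+ "\n",
+ "Try a `swarm plot` to show the distribution of color with respect to item_size.\n",
+ "\n",
+ "We'll use the [ggbeeswarm package](https://github.com/eclarke/ggbeeswarm), which provides methods to create beeswarm-style plots using ggplot2. Beeswarm plots are a way of plotting points that would ordinarily overlap so that they fall next to each other instead.\n"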
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "vscode": {
+ "languageId": "r"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "# Create beeswarm plots of color and item_size\n",
+ "baked_pumpkins %>% \n",
+ " mutate(color = factor(color)) %>% \n",
+ " ggplot(mapping = aes(x = color, y = item_size, color = color)) +\n",
+ " geom_quasirandom() +\n",
+ " scale_color_brewer(palette = \"Dark2\", direction = -1) +\n",
+ " theme(legend.position = \"none\")\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
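+ "Now that we have an idea of the relationship between the binary categories of color and the larger group of sizes, let's explore logistic regression to determine a given pumpkin's likely color.\n",
+ "\n",
+ "## Build your model\n",
+ "\n",
+ "Select the variables you want to use in your classification model and split the data into training and test sets. [rsample](https://rsample.tidymodels.org/), a package in Tidymodels, provides infrastructure for efficient data splitting and resampling:\n"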
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "vscode": {
+ "languageId": "r"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "# Split data into 80% for training and 20% for testing\n",
+ "set.seed(2056)\n",
+ "pumpkins_split <- pumpkins_select %>% \n",
+ " initial_split(prop = 0.8)\n",
+ "\n",
+ "# Extract the data in each split\n",
+ "pumpkins_train <- training(pumpkins_split)\n",
+ "pumpkins_test <- testing(pumpkins_split)\n",
+ "\n",
+ "# Print out the first 5 rows of the training set\n",
+ "pumpkins_train %>% \n",
+ " slice_head(n = 5)\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
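+ "🙌 We are now ready to train a model by fitting the training features to the training label (color).\n",
+ "\n",
+ "We'll begin by creating a recipe that specifies the preprocessing steps that should be carried out on our data to get it ready for modelling, i.e. encoding categorical variables into a set of integers. Just like `baked_pumpkins`, we create a `pumpkins_recipe` but do not `prep` and `bake` it, since those steps will be bundled into a workflow, which you will see in just a few steps from now.\n",
+ "\n",
+ "There are quite a number of ways to specify a logistic regression model in Tidymodels. See `?logistic_reg()`. For now, we'll specify a logistic regression model via the default `stats::glm()` engine.\n"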
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "vscode": {
+ "languageId": "r"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "# Create a recipe that specifies preprocessing steps for modelling\n",
+ "pumpkins_recipe <- recipe(color ~ ., data = pumpkins_train) %>% \n",
+ " step_mutate(item_size = ordered(item_size, levels = c('sml', 'med', 'med-lge', 'lge', 'xlge', 'jbo', 'exjbo'))) %>%\n",
+ " step_integer(item_size, zero_based = F) %>% \n",
+ " step_dummy(all_nominal(), -all_outcomes(), one_hot = TRUE)\n",
+ "\n",
+ "# Create a logistic model specification\n",
+ "log_reg <- logistic_reg() %>% \n",
+ " set_engine(\"glm\") %>% \n",
+ " set_mode(\"classification\")\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
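+ "Now that we have a recipe and a model specification, we need a way of bundling them together into one object that will first preprocess the data (prep + bake behind the scenes), fit the model on the preprocessed data, and also allow for potential post-processing activities.\n",
+ "\n",
+ "In Tidymodels, this convenient object is called a [`workflow`](https://workflows.tidymodels.org/), and it holds your modeling components.\n"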
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "vscode": {
+ "languageId": "r"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "# Bundle modelling components in a workflow\n",
+ "log_reg_wf <- workflow() %>% \n",
+ " add_recipe(pumpkins_recipe) %>% \n",
+ " add_model(log_reg)\n",
+ "\n",
+ "# Print out the workflow\n",
+ "log_reg_wf\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
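+ "After a workflow has been *specified*, a model can be `trained` using the [`fit()`](https://tidymodels.github.io/parsnip/reference/fit.html) function. The workflow will estimate the recipe and preprocess the data before training, so we won't have to do that manually using prep and bake.\n"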
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "vscode": {
+ "languageId": "r"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "# Train the model\n",
+ "wf_fit <- log_reg_wf %>% \n",
+ " fit(data = pumpkins_train)\n",
+ "\n",
+ "# Print the trained workflow\n",
+ "wf_fit\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
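+ "The printout of the trained workflow shows the coefficients learned during training.\n",
+ "\n",
+ "Now that we've trained the model on the training data, we can make predictions on the test data using [parsnip::predict()](https://parsnip.tidymodels.org/reference/predict.model_fit.html). Let's start by using the model to predict labels for our test set and the probabilities for each label. When the predicted probability of `WHITE` is more than 0.5, the predicted class is `WHITE`; otherwise it is `ORANGE`.\n"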
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "vscode": {
+ "languageId": "r"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "# Make predictions for color and corresponding probabilities\n",
+ "results <- pumpkins_test %>% select(color) %>% \n",
+ " bind_cols(wf_fit %>% \n",
+ " predict(new_data = pumpkins_test)) %>%\n",
+ " bind_cols(wf_fit %>%\n",
+ " predict(new_data = pumpkins_test, type = \"prob\"))\n",
+ "\n",
+ "# Compare predictions\n",
+ "results %>% \n",
+ " slice_head(n = 10)\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
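+ "Very nice! This provides some more insight into how logistic regression works.\n",
+ "\n",
+ "### Better comprehension via a confusion matrix\n",
+ "\n",
+ "Comparing each prediction with its corresponding \"ground truth\" actual value isn't a very efficient way to determine how well the model is predicting. Fortunately, Tidymodels has a few more tricks up its sleeve: [`yardstick`](https://yardstick.tidymodels.org/), a package used to measure the effectiveness of models using performance metrics.\n",
+ "\n",
+ "One performance metric associated with classification problems is the [`confusion matrix`](https://wikipedia.org/wiki/Confusion_matrix). A confusion matrix describes how well a classification model performs. It tabulates how many examples in each class were correctly classified by a model. In our case, it will show you how many orange pumpkins were classified as orange and how many white pumpkins were classified as white; the confusion matrix also shows you how many were classified into the **wrong** categories.\n",
+ "\n",
+ "The [**`conf_mat()`**](https://tidymodels.github.io/yardstick/reference/conf_mat.html) function from yardstick calculates this cross-tabulation of observed and predicted classes.\n"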
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "vscode": {
+ "languageId": "r"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "# Confusion matrix for prediction results\n",
+ "conf_mat(data = results, truth = color, estimate = .pred_class)\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
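+ "Let's interpret the confusion matrix. Our model is asked to classify pumpkins between two binary categories, category `white` and category `not-white`.\n",
+ "\n",
+ "- If your model predicts a pumpkin as white and it belongs to category 'white' in reality, we call it a `true positive`, shown by the top left number.\n",
+ "\n",
+ "- If your model predicts a pumpkin as not white and it belongs to category 'white' in reality, we call it a `false negative`, shown by the bottom left number.\n",
+ "\n",
+ "- If your model predicts a pumpkin as white and it belongs to category 'not-white' in reality, we call it a `false positive`, shown by the top right number.\n",
+ "\n",
+ "- If your model predicts a pumpkin as not white and it belongs to category 'not-white' in reality, we call it a `true negative`, shown by the bottom right number.\n",
+ "\n",
+ "| Predicted (rows) vs. Truth (columns) | WHITE | ORANGE |\n",
+ "|:------------------------------------:|:-----:|:------:|\n",
+ "| **WHITE** | TP | FP |\n",
+ "| **ORANGE** | FN | TN |\n",
+ "\n",
+ "As you might have guessed, it's preferable to have a larger number of true positives and true negatives and a lower number of false positives and false negatives, which implies that the model performs better.\n",
+ "\n",
+ "The confusion matrix is helpful since it gives rise to other metrics that can help us better evaluate the performance of a classification model. Let's go through some of them:\n",
+ "\n",
+ "🎓 Precision: `TP/(TP + FP)`, defined as the proportion of predicted positives that are actually positive. Also called [positive predictive value](https://en.wikipedia.org/wiki/Positive_predictive_value \"Positive predictive value\").\n",
+ "\n",
+ "🎓 Recall: `TP/(TP + FN)`, defined as the proportion of actual positives that are correctly predicted as positive. Also known as `sensitivity`.\n",
+ "\n",
+ "🎓 Specificity: `TN/(TN + FP)`, defined as the proportion of actual negatives that are correctly predicted as negative.\n",
+ "\n",
+ "🎓 Accuracy: `(TP + TN)/(TP + TN + FP + FN)`, the percentage of labels predicted accurately for a sample.\n",
+ "\n",
+ "🎓 F Measure: the harmonic mean of precision and recall, with best being 1 and worst being 0.\n",
+ "\n",
+ "Let's calculate these metrics!\n"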
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "vscode": {
+ "languageId": "r"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "# Combine metric functions and calculate them all at once\n",
+ "eval_metrics <- metric_set(ppv, recall, spec, f_meas, accuracy)\n",
+ "eval_metrics(data = results, truth = color, estimate = .pred_class)\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Visualize the ROC curve of this model\n",
+ "\n",
+ "Let's do one more visualization to see the so-called [`ROC curve`](https://en.wikipedia.org/wiki/Receiver_operating_characteristic):\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "vscode": {
+ "languageId": "r"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "# Make a roc_curve\n",
+ "results %>% \n",
+ " roc_curve(color, .pred_ORANGE) %>% \n",
+ " autoplot()\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
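+ "ROC curves are often used to get a view of the output of a classifier in terms of its true vs. false positives. ROC curves typically feature `True Positive Rate`/Sensitivity on the Y axis, and `False Positive Rate`/1-Specificity on the X axis. Thus, the steepness of the curve and the space between the diagonal line and the curve matter: you want a curve that quickly heads up and over the line. In our case, there are false positives to start with, and then the line heads up and over properly.\n",
+ "\n",
+ "Finally, let's use `yardstick::roc_auc()` to calculate the actual Area Under the Curve (AUC). One way of interpreting AUC is as the probability that the model ranks a random positive example more highly than a random negative example.\n"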
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "vscode": {
+ "languageId": "r"
+ }
+ },
+ "outputs": [],
+ "source": [
+ "# Calculate area under curve\n",
+ "results %>% \n",
+ " roc_auc(color, .pred_ORANGE)\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
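+ "The result is around `0.975`. Given that the AUC ranges from 0 to 1, you want a big score, since a model that is 100% correct in its predictions will have an AUC of 1; in this case, the model is *pretty good*.\n",
+ "\n",
+ "In future lessons on classification, you will learn how to improve your model's scores (such as dealing with imbalanced data in this case).\n",
+ "\n",
+ "## 🚀Challenge\n",
+ "\n",
+ "There's a lot more to unpack regarding logistic regression! But the best way to learn is to experiment. Find a dataset that lends itself to this type of analysis and build a model with it. What do you learn? Tip: try [Kaggle](https://www.kaggle.com/search?q=logistic+regression+datasets) for interesting datasets.\n",
+ "\n",
+ "## Review & Self Study\n",
+ "\n",
+ "Read the first few pages of [this paper from Stanford](https://web.stanford.edu/~jurafsky/slp3/5.pdf) on some practical uses for logistic regression. Think about which tasks are better suited to each of the regression types we have studied so far. Which approach would work best?\n"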
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
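+ "\n---\n\n**Disclaimer**: \nThis document has been translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The document in its original language should be considered the authoritative source. For important information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.\n"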
+ ]
+ }
+ ],
+ "metadata": {
+ "anaconda-cloud": "",
+ "kernelspec": {
+ "display_name": "R",
+ "language": "R",
+ "name": "ir"
+ },
+ "language_info": {
+ "codemirror_mode": "r",
+ "file_extension": ".r",
+ "mimetype": "text/x-r-source",
+ "name": "R",
+ "pygments_lexer": "r",
+ "version": "3.4.1"
+ },
+ "coopTranslator": {
+ "original_hash": "feaf125f481a89c468fa115bf2aed580",
+ "translation_date": "2025-09-03T19:36:22+00:00",
+ "source_file": "2-Regression/4-Logistic/solution/R/lesson_4-R.ipynb",
+ "language_code": "zh"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 1
+}
\ No newline at end of file
diff --git a/translations/zh-CN/2-Regression/4-Logistic/solution/notebook.ipynb b/translations/zh-CN/2-Regression/4-Logistic/solution/notebook.ipynb
new file mode 100644
index 000000000..eb3605031
--- /dev/null
+++ b/translations/zh-CN/2-Regression/4-Logistic/solution/notebook.ipynb
@@ -0,0 +1,1259 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
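+ "## Logistic Regression - Lesson 4\n",
+ "\n",
+ "Load up the required libraries and dataset. Convert the data to a dataframe containing a subset of the data:\n"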
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 63,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " City Name Type Package Variety Sub Variety Grade Date \n",
+ "0 BALTIMORE NaN 24 inch bins NaN NaN NaN 4/29/17 \\\n",
+ "1 BALTIMORE NaN 24 inch bins NaN NaN NaN 5/6/17 \n",
+ "2 BALTIMORE NaN 24 inch bins HOWDEN TYPE NaN NaN 9/24/16 \n",
+ "3 BALTIMORE NaN 24 inch bins HOWDEN TYPE NaN NaN 9/24/16 \n",
+ "4 BALTIMORE NaN 24 inch bins HOWDEN TYPE NaN NaN 11/5/16 \n",
+ "\n",
+ " Low Price High Price Mostly Low ... Unit of Sale Quality Condition \n",
+ "0 270.0 280.0 270.0 ... NaN NaN NaN \\\n",
+ "1 270.0 280.0 270.0 ... NaN NaN NaN \n",
+ "2 160.0 160.0 160.0 ... NaN NaN NaN \n",
+ "3 160.0 160.0 160.0 ... NaN NaN NaN \n",
+ "4 90.0 100.0 90.0 ... NaN NaN NaN \n",
+ "\n",
+ " Appearance Storage Crop Repack Trans Mode Unnamed: 24 Unnamed: 25 \n",
+ "0 NaN NaN NaN E NaN NaN NaN \n",
+ "1 NaN NaN NaN E NaN NaN NaN \n",
+ "2 NaN NaN NaN N NaN NaN NaN \n",
+ "3 NaN NaN NaN N NaN NaN NaN \n",
+ "4 NaN NaN NaN N NaN NaN NaN \n",
+ "\n",
+ "[5 rows x 26 columns]"
+ ]
+ },
+ "execution_count": 63,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "import pandas as pd\n",
+ "import numpy as np\n",
+ "\n",
+ "full_pumpkins = pd.read_csv('../../data/US-pumpkins.csv')\n",
+ "\n",
+ "full_pumpkins.head()\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 64,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ " City Name Package Variety Origin Item Size Color\n",
+ "2 BALTIMORE 24 inch bins HOWDEN TYPE DELAWARE med ORANGE\n",
+ "3 BALTIMORE 24 inch bins HOWDEN TYPE VIRGINIA med ORANGE\n",
+ "4 BALTIMORE 24 inch bins HOWDEN TYPE MARYLAND lge ORANGE\n",
+ "5 BALTIMORE 24 inch bins HOWDEN TYPE MARYLAND lge ORANGE\n",
+ "6 BALTIMORE 36 inch bins HOWDEN TYPE MARYLAND med ORANGE"
+ ]
+ },
+ "execution_count": 64,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Select the columns we want to use\n",
+ "columns_to_select = ['City Name','Package','Variety', 'Origin','Item Size', 'Color']\n",
+ "pumpkins = full_pumpkins.loc[:, columns_to_select]\n",
+ "\n",
+ "# Drop rows with missing values\n",
+ "pumpkins.dropna(inplace=True)\n",
+ "\n",
+ "pumpkins.head()"
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Let's take a look at our data!\n",
+ "\n",
+ "We visualize it with Seaborn.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 65,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 65,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png": "",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "import seaborn as sns\n",
+ "# Specify colors for each value of the hue variable\n",
+ "palette = {\n",
+ " 'ORANGE': 'orange',\n",
+ " 'WHITE': 'wheat',\n",
+ "}\n",
+ "# Draw a count plot showing how many pumpkins of each variety are orange or white\n",
+ "sns.catplot(\n",
+ " data=pumpkins, y=\"Variety\", hue=\"Color\", kind=\"count\",\n",
+ " palette=palette, \n",
+ ")"
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Data preprocessing\n",
+ "\n",
+ "Let's encode the features and the label so that we can plot the data more easily and train a model.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 66,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "array(['med', 'lge', 'sml', 'xlge', 'med-lge', 'jbo', 'exjbo'],\n",
+ " dtype=object)"
+ ]
+ },
+ "execution_count": 66,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Let's look at the different values of the 'Item Size' column\n",
+ "pumpkins['Item Size'].unique()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 67,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from sklearn.preprocessing import OrdinalEncoder\n",
+ "# Encode the 'Item Size' column using ordinal encoding\n",
+ "item_size_categories = [['sml', 'med', 'med-lge', 'lge', 'xlge', 'jbo', 'exjbo']]\n",
+ "ordinal_features = ['Item Size']\n",
+ "ordinal_encoder = OrdinalEncoder(categories=item_size_categories)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 68,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from sklearn.preprocessing import OneHotEncoder\n",
+ "# Encode all the other features using one-hot encoding\n",
+ "categorical_features = ['City Name', 'Package', 'Variety', 'Origin']\n",
+ "categorical_encoder = OneHotEncoder(sparse_output=False)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 69,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\n",
+ "
\n",
+ " \n",
+ " \n",
+ " \n",
+ " ord__Item Size \n",
+ " cat__City Name_ATLANTA \n",
+ " cat__City Name_BALTIMORE \n",
+ " cat__City Name_BOSTON \n",
+ " cat__City Name_CHICAGO \n",
+ " cat__City Name_COLUMBIA \n",
+ " cat__City Name_DALLAS \n",
+ " cat__City Name_DETROIT \n",
+ " cat__City Name_LOS ANGELES \n",
+ " cat__City Name_MIAMI \n",
+ " ... \n",
+ " cat__Origin_MICHIGAN \n",
+ " cat__Origin_NEW JERSEY \n",
+ " cat__Origin_NEW YORK \n",
+ " cat__Origin_NORTH CAROLINA \n",
+ " cat__Origin_OHIO \n",
+ " cat__Origin_PENNSYLVANIA \n",
+ " cat__Origin_TENNESSEE \n",
+ " cat__Origin_TEXAS \n",
+ " cat__Origin_VERMONT \n",
+ " cat__Origin_VIRGINIA \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " 2 \n",
+ " 1.0 \n",
+ " 0.0 \n",
+ " 1.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " ... \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " \n",
+ " \n",
+ " 3 \n",
+ " 1.0 \n",
+ " 0.0 \n",
+ " 1.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " ... \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 1.0 \n",
+ " \n",
+ " \n",
+ " 4 \n",
+ " 3.0 \n",
+ " 0.0 \n",
+ " 1.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " ... \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " \n",
+ " \n",
+ " 5 \n",
+ " 3.0 \n",
+ " 0.0 \n",
+ " 1.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " ... \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " \n",
+ " \n",
+ " 6 \n",
+ " 1.0 \n",
+ " 0.0 \n",
+ " 1.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " ... \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " \n",
+ " \n",
+ "
\n",
+ "
5 rows × 48 columns
\n",
+ "
"
+ ],
+ "text/plain": [
+ " ord__Item Size cat__City Name_ATLANTA cat__City Name_BALTIMORE \n",
+ "2 1.0 0.0 1.0 \\\n",
+ "3 1.0 0.0 1.0 \n",
+ "4 3.0 0.0 1.0 \n",
+ "5 3.0 0.0 1.0 \n",
+ "6 1.0 0.0 1.0 \n",
+ "\n",
+ " cat__City Name_BOSTON cat__City Name_CHICAGO cat__City Name_COLUMBIA \n",
+ "2 0.0 0.0 0.0 \\\n",
+ "3 0.0 0.0 0.0 \n",
+ "4 0.0 0.0 0.0 \n",
+ "5 0.0 0.0 0.0 \n",
+ "6 0.0 0.0 0.0 \n",
+ "\n",
+ " cat__City Name_DALLAS cat__City Name_DETROIT cat__City Name_LOS ANGELES \n",
+ "2 0.0 0.0 0.0 \\\n",
+ "3 0.0 0.0 0.0 \n",
+ "4 0.0 0.0 0.0 \n",
+ "5 0.0 0.0 0.0 \n",
+ "6 0.0 0.0 0.0 \n",
+ "\n",
+ " cat__City Name_MIAMI ... cat__Origin_MICHIGAN cat__Origin_NEW JERSEY \n",
+ "2 0.0 ... 0.0 0.0 \\\n",
+ "3 0.0 ... 0.0 0.0 \n",
+ "4 0.0 ... 0.0 0.0 \n",
+ "5 0.0 ... 0.0 0.0 \n",
+ "6 0.0 ... 0.0 0.0 \n",
+ "\n",
+ " cat__Origin_NEW YORK cat__Origin_NORTH CAROLINA cat__Origin_OHIO \n",
+ "2 0.0 0.0 0.0 \\\n",
+ "3 0.0 0.0 0.0 \n",
+ "4 0.0 0.0 0.0 \n",
+ "5 0.0 0.0 0.0 \n",
+ "6 0.0 0.0 0.0 \n",
+ "\n",
+ " cat__Origin_PENNSYLVANIA cat__Origin_TENNESSEE cat__Origin_TEXAS \n",
+ "2 0.0 0.0 0.0 \\\n",
+ "3 0.0 0.0 0.0 \n",
+ "4 0.0 0.0 0.0 \n",
+ "5 0.0 0.0 0.0 \n",
+ "6 0.0 0.0 0.0 \n",
+ "\n",
+ " cat__Origin_VERMONT cat__Origin_VIRGINIA \n",
+ "2 0.0 0.0 \n",
+ "3 0.0 1.0 \n",
+ "4 0.0 0.0 \n",
+ "5 0.0 0.0 \n",
+ "6 0.0 0.0 \n",
+ "\n",
+ "[5 rows x 48 columns]"
+ ]
+ },
+ "execution_count": 69,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "from sklearn.compose import ColumnTransformer\n",
+ "ct = ColumnTransformer(transformers=[\n",
+ " ('ord', ordinal_encoder, ordinal_features),\n",
+ " ('cat', categorical_encoder, categorical_features)\n",
+ " ])\n",
+ "# Get the encoded features as a pandas DataFrame\n",
+ "ct.set_output(transform='pandas')\n",
+ "encoded_features = ct.fit_transform(pumpkins)\n",
+ "encoded_features.head()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 70,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "\n",
+ "\n",
+ "
\n",
+ " \n",
+ " \n",
+ " \n",
+ " ord__Item Size \n",
+ " cat__City Name_ATLANTA \n",
+ " cat__City Name_BALTIMORE \n",
+ " cat__City Name_BOSTON \n",
+ " cat__City Name_CHICAGO \n",
+ " cat__City Name_COLUMBIA \n",
+ " cat__City Name_DALLAS \n",
+ " cat__City Name_DETROIT \n",
+ " cat__City Name_LOS ANGELES \n",
+ " cat__City Name_MIAMI \n",
+ " ... \n",
+ " cat__Origin_NEW JERSEY \n",
+ " cat__Origin_NEW YORK \n",
+ " cat__Origin_NORTH CAROLINA \n",
+ " cat__Origin_OHIO \n",
+ " cat__Origin_PENNSYLVANIA \n",
+ " cat__Origin_TENNESSEE \n",
+ " cat__Origin_TEXAS \n",
+ " cat__Origin_VERMONT \n",
+ " cat__Origin_VIRGINIA \n",
+ " Color \n",
+ " \n",
+ " \n",
+ " \n",
+ " \n",
+ " 2 \n",
+ " 1.0 \n",
+ " 0.0 \n",
+ " 1.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " ... \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0 \n",
+ " \n",
+ " \n",
+ " 3 \n",
+ " 1.0 \n",
+ " 0.0 \n",
+ " 1.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " ... \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 1.0 \n",
+ " 0 \n",
+ " \n",
+ " \n",
+ " 4 \n",
+ " 3.0 \n",
+ " 0.0 \n",
+ " 1.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " ... \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0 \n",
+ " \n",
+ " \n",
+ " 5 \n",
+ " 3.0 \n",
+ " 0.0 \n",
+ " 1.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " ... \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0 \n",
+ " \n",
+ " \n",
+ " 6 \n",
+ " 1.0 \n",
+ " 0.0 \n",
+ " 1.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " ... \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0.0 \n",
+ " 0 \n",
+ " \n",
+ " \n",
+ "
\n",
+ "
5 rows × 49 columns
\n",
+ "
"
+ ],
+ "text/plain": [
+ " ord__Item Size cat__City Name_ATLANTA cat__City Name_BALTIMORE \n",
+ "2 1.0 0.0 1.0 \\\n",
+ "3 1.0 0.0 1.0 \n",
+ "4 3.0 0.0 1.0 \n",
+ "5 3.0 0.0 1.0 \n",
+ "6 1.0 0.0 1.0 \n",
+ "\n",
+ " cat__City Name_BOSTON cat__City Name_CHICAGO cat__City Name_COLUMBIA \n",
+ "2 0.0 0.0 0.0 \\\n",
+ "3 0.0 0.0 0.0 \n",
+ "4 0.0 0.0 0.0 \n",
+ "5 0.0 0.0 0.0 \n",
+ "6 0.0 0.0 0.0 \n",
+ "\n",
+ " cat__City Name_DALLAS cat__City Name_DETROIT cat__City Name_LOS ANGELES \n",
+ "2 0.0 0.0 0.0 \\\n",
+ "3 0.0 0.0 0.0 \n",
+ "4 0.0 0.0 0.0 \n",
+ "5 0.0 0.0 0.0 \n",
+ "6 0.0 0.0 0.0 \n",
+ "\n",
+ " cat__City Name_MIAMI ... cat__Origin_NEW JERSEY cat__Origin_NEW YORK \n",
+ "2 0.0 ... 0.0 0.0 \\\n",
+ "3 0.0 ... 0.0 0.0 \n",
+ "4 0.0 ... 0.0 0.0 \n",
+ "5 0.0 ... 0.0 0.0 \n",
+ "6 0.0 ... 0.0 0.0 \n",
+ "\n",
+ " cat__Origin_NORTH CAROLINA cat__Origin_OHIO cat__Origin_PENNSYLVANIA \n",
+ "2 0.0 0.0 0.0 \\\n",
+ "3 0.0 0.0 0.0 \n",
+ "4 0.0 0.0 0.0 \n",
+ "5 0.0 0.0 0.0 \n",
+ "6 0.0 0.0 0.0 \n",
+ "\n",
+ " cat__Origin_TENNESSEE cat__Origin_TEXAS cat__Origin_VERMONT \n",
+ "2 0.0 0.0 0.0 \\\n",
+ "3 0.0 0.0 0.0 \n",
+ "4 0.0 0.0 0.0 \n",
+ "5 0.0 0.0 0.0 \n",
+ "6 0.0 0.0 0.0 \n",
+ "\n",
+ " cat__Origin_VIRGINIA Color \n",
+ "2 0.0 0 \n",
+ "3 1.0 0 \n",
+ "4 0.0 0 \n",
+ "5 0.0 0 \n",
+ "6 0.0 0 \n",
+ "\n",
+ "[5 rows x 49 columns]"
+ ]
+ },
+ "execution_count": 70,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "from sklearn.preprocessing import LabelEncoder\n",
+ "# Encode the 'Color' column using label encoding\n",
+ "label_encoder = LabelEncoder()\n",
+ "encoded_label = label_encoder.fit_transform(pumpkins['Color'])\n",
+ "encoded_pumpkins = encoded_features.assign(Color=encoded_label)\n",
+ "encoded_pumpkins.head()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 71,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "['ORANGE', 'WHITE']"
+ ]
+ },
+ "execution_count": 71,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Let's look at the mapping between the encoded values and the original values\n",
+ "list(label_encoder.inverse_transform([0, 1]))"
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Analyzing the relationship between the features and the label\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 81,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 81,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png": "",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "palette = {\n",
+ " 'ORANGE': 'orange',\n",
+ " 'WHITE': 'wheat',\n",
+ "}\n",
+ "# We need the encoded Item Size column to use it as the x-axis values in the plot\n",
+ "pumpkins['Item Size'] = encoded_pumpkins['ord__Item Size']\n",
+ "\n",
+ "g = sns.catplot(\n",
+ " data=pumpkins,\n",
+ " x=\"Item Size\", y=\"Color\", row='Variety',\n",
+ " kind=\"box\", orient=\"h\",\n",
+ " sharex=False, margin_titles=True,\n",
+ " height=1.8, aspect=4, palette=palette,\n",
+ ")\n",
+ "# Defining axis labels \n",
+ "g.set(xlabel=\"Item Size\", ylabel=\"\").set(xlim=(0,6))\n",
+ "g.set_titles(row_template=\"{row_name}\")\n"
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import warnings\n",
+ "warnings.filterwarnings(action='ignore', category=UserWarning, module='seaborn')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 37,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 37,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png": "iVBORw0KGgoAAAANSUhEUgAAAioAAAGwCAYAAACHJU4LAAAAOXRFWHRTb2Z0d2FyZQBNYXRwbG90bGliIHZlcnNpb24zLjcuMSwgaHR0cHM6Ly9tYXRwbG90bGliLm9yZy/bCgiHAAAACXBIWXMAAA9hAAAPYQGoP6dpAAB9+0lEQVR4nO3deXQc1Z33/3dV9aatZcnaF1tesTEYL3gLdmzAxDAOkEDCPgES8jwTiJMZMkxgzu8MhFmAMUlIgkOWYSDJQIAwLE54MAEvbLHBbGFzAjZeZFuLV+1qqZffH1dSd6m7Zcu2UBs+r3N0wP3tunVv1e263666V7JisVgMERERkQxkD3cFRERERNJRoiIiIiIZS4mKiIiIZCwlKiIiIpKxlKiIiIhIxlKiIiIiIhlLiYqIiIhkLM9wV+BoRKNRdu/eTV5eHpZlDXd1RERE5DDEYjFaWlqoqKjAtge+Z3JcJyq7d++murp6uKshIiIiR6C2tpaqqqoB33NcJyp5eXmAaWgwGBzm2oiIiMjhaG5uprq6um8cH8hxnaj0Pu4JBoNKVERERI4zhzNtQ5NpRUREJGMpUREREZGMpURFREREMpYSFREREclYSlREREQkYylRERERkYylREVEREQylhIVERERyVhKVERERCRjKVERERGRjDXsv0J/165dfPe73+Xpp5+mvb2d8ePHc99993HqqacOd9U+2SIhqP1fOPg25E2E0ZeAJzseb3wR6laBNwg1l0N2wh+Nat0G238L4Tao/DwUzY3HultNrHULFMyA6i+C7TWxWBR2r4I9L0KgDMZcAf6R8W0Pvge1j5r/r/4SjJgSj4X2wbYHoKMOihdAxdlg9eTZ0W6ofRwOvAG542D0peDNjW+7dwPs+gN4ckwstyYea99pyu1uhvKzoWRBPBZuh+0PQ8tfYcRUqL4QHH9PW2JQ/xw0rAF/sWlLoGSwZ0FERA7BisViseHa+YEDB5g+fTqnn3463/jGNyguLubDDz9k3LhxjBs37pDbNzc3k5+fT1NTk/7Wz2B07oXVi6DpvfhrOTWweB1kj4JXroGP/jses31w2sNQ/QXY9hCs/1uIhePxCdfBrLuhZTM8twg6dsVjBdPhzNXg5MAL55vkp5c3HxY9DcXzYNOd8OYN7npO+0848QaTaKw9G7qb4rHyJfDZlRBph9VnmiSlV1YlnLkWghPgtWXwwd3xmOWBeb+Gmkth55Pw0kUQ7YrHx14Nc+6F9h2mLW3b4rH8KaZcXwG89GXY+UQ85smBhX+A0kUpD7mIiMQNZvwe1kTlxhtv5OWXX+bFF188ou2VqByhjd+ED1ckvz7qyzDmKnh+aXLMPxKWboKV4yDckhw/cx1sWg67n0qOTb7BJEKvXZccy58CC38Pvx9v7rgksmw4dzM8f647qep16gqTSGxanhyrWGr2u3pRcsyTB+dtgacmmzs1/S18CrbeDzt+lxybcB2MnAUbrkqO5Y6Fcz+M3+kREZGUBjN+D+sVdeXKlZx66ql8+ctfpqSkhOnTp/PLX/4y7ftDoRDNzc2uHzkCiXcCXK8/CbueTB0L7YMP70mdpIB5jFT3dPr97UxTbtN7sOW/k5MUMK9tvjd1knKocuuehh3/mzoWbjFtSZWk9JW7Mk3s8fT7bP0IDr6TOiYiIkdkWBOVjz76iHvuuYcJEybwzDPP8I1vfINvfetb/OpXv0r5/ttuu438/Py+n+rq6o+5xp8QvfMs+rP95iftdtkDxAJgedPH0u0TwMlKH/McYp/pyrW84AkMsO0hyrV9g4/BwMdPREQGbVgTlWg0yowZM/iP//gPpk+fzv/5P/+Hr3/96/zsZz9L+f6bbrqJpqamvp/a2tqPucafEDWXp3n9MvOTSk4NTPymmQSbxDJljroo9bajL0u/z+IFMP7rJgHoz/abWPGC5Nihyh11EdRcYerWX6DMtCWnJvW2NQOUW3N5+ljBDMiflDom
IiJHZFgTlfLyck488UTXa5MnT2bHjh0p3+/3+wkGg64fOQJT/hkqPu9+rXgBTLvDrOCZdkd8pQ6YgX3+I+YOxYJHwV8Uj9k+mHkXFJwCM38II+e4y62+ECZdD6MvhonLcCUOeRNg3v0QKIbPPGAmpPby5JjXAsXmPXkTEwq1TKJRc4kpu/pL7n2OnGPqUjAVZv7IfQfEXwTzf2faMv937sTL8sC023uOwe3JCVLFUnPsqs6Fyf/knouSU2PqKyIix9SwTqa97LLLqK2tdU2m/Yd/+AdeeeUV/vSnPx1ye02mPUr73+hZnnyCWXmTqKPOLL/1BqH8HHASBvtIJ+x+2ixPLv9c8rLcxpegdbO5w1Aw1R1r2QJ7XoKscihb7B7su5tNuQAV55h994pFTX066qB4PuT1WxV24O348uSSfglGZyPU/dEkPxXnuO/eRLrMSqTuJlOfrHL3tnvWx5cnF85wx1q3QePzpv1lZ4E97Kv9RUSOC8fNqp+NGzfymc98hu9973tcdNFFvPrqq3z961/nF7/4BZdfnub2egIlKiIiIsef42bVz6xZs3j88cf57W9/y0knncS//uu/ctdddx1WkiIiIiKffMN6R+Vo6Y6KiIjI8ee4uaMiIiIiMhAlKiIiIpKxlKiIiIhIxlKiIiIiIhlLiYqIiIhkLCUqIiIikrGUqIiIiEjGUqIiIiIiGUuJioiIiGQsJSoiIiKSsZSoiIiISMZSoiIiIiIZS4mKiIiIZCwlKiIiIpKxlKiIiIhIxlKiIiIiIhlLiYqIiIhkLCUqIiIikrGUqIiIiEjGUqIiIiIiGUuJioiIiGQsJSoiIiKSsZSoiIiISMZSoiIiIiIZS4mKiIiIZCwlKiIiIpKxlKiIiIhIxlKiIiIiIhlLiYqIiIhkLCUqIiIikrGUqIiIiEjGUqIiIiIiGUuJioiIiGQsJSoiIiKSsZSoiIiISMZSoiIiIiIZS4mKiIiIZCwlKiIiIpKxlKiIiIhIxlKiIiIiIhlLiYqIiIhkLCUqIiIikrGUqIiIiEjGUqIiIiIiGUuJioiIiGQsz3Du/JZbbuF73/ue67UTTjiBv/zlL8NUo34e9ANd8X9f3AHbH4Gm9yB/Coy+CJxAPN6wFuqeBX8h1FwOWeXxWMsW2P4QRDqh+gtQODMe626GbQ9C2zYYORsqzwO759TEorDrKdj7MmRVwZjLwVcQ3/bAn6H2MbAcGH0xBE+IxzobYev/QGgPlJ4OZWeBZZlYJAS1/wsH34a8iTD6EvBkx7dtfBHqVoE3aNqSXRWPtW6D7b+FcBtUfh6K5ia0pdXEWrdAwQyo/iLY3nhbdq+CPS9CoAzGXAH+kfFtD74HtY+a/6/+EoyYEo+F9sG2B6CjDooXQMXZYPXk2dFuqH0cDrwBueNg9KXgzY1vu3cD7PoDeHJMLLcmHmvfacrtbobys6FkQTwWboftD0PLX2HEVKi+EBx/T1tiUP8cNKwBf7FpS6Akvm3zX822sQhUXwAFp8RjXQdg6wPQsROKPmOOYV9bwrBrJex7FXJqoOYycw567X8dap8w9Rh9KeSNi8c66kxbQvuh/CxzzntFOo9d3930X7D9Z/H4uO/AnDv5pIt07CXSsRfL8eLJqcTyxI9ftLuNSNtuYrEonuxSbP+Ivlgs2k24dTexcDu2fwROdilWz/mOxWJEOhqJdu7H8gRMuY4vXm6omXB7HRYWTk4Fti/er2OREOHWXcQiIZxAEXZWEVbP5zsWixBpqyfa1YztzcHJqcSynXhbOvcRad+DZXtwciuxPVkJbWk3bYmGcbJLcQLx600sGibSuotouB3bl4+TU+ZqS7SjkUjnfizHjye3yt2WrhYibXUAODnl2L68hLZ09bSlEydQiJ1VktCWaE9bmrA92Ti5lVh2fOiKdB4g0t5g2pJTge2NX8ei4Q4irbt62lKME4hfb2LRCJG23US7W7F9wZ62OPG2dPaebz+e3Eqs3s8+EO1qNceI
GJ7scmx//DMai3QRbttFLNyJ7S/oOd8JbWlvIBo6iOXJxpNbgdV7fQSioYOE2xuwsHFyK7C9OfFyw52m3Eg3TlYRTlZRv7bUEe1uwfbm4eSUu8738c6KxWKx4dr5LbfcwqOPPspzzz3X95rH46GoqGiAreKam5vJz8+nqamJYDB46A0G40Hr0O/JmwBnroOsMvjTFWaA7uVkwYLHzIC65T549etm0Oo1+Z9g+h1mcF5zJnQ2xGMj58IZfwTLA+v+BhrXxWP+kXD6H6FwBrxzK7xzc0KFLDj1JzDxOmhYB8+fC+HWeLjqfJj/KHQdhNWLzKDVK6cGFq+D7FHwyjXw0X/HY7YPTnvYDFLbHoL1fwuxcDw+4TqYdTe0bIbnFkHHrnisYDqcuRqcHHjhfJP89PLmw6KnoXgebLoT3rzBfXyn/SeceINJNNaeDd1N8Vj5EvjsSoi0w+ozTZLSK6sSzlwLwQnw2jL44O6EQ+SBeb+Gmkth55Pw0kUQTUhGx14Nc+6F9h2mLW3b4rH8KaZcXwG89GXY+UQ85smBhX+A0kXwwU/htW8CCR+tk2+Bk2+G/W/C2rNM4tWrZKE5DrEIrDkL9m2IxwIlcMYak7S9dSO8f0dCW2yY/UsY91XY/Qy8+EWIdMTjoy+Bzzxg+tZzC6Hlw3jsaPpuKp6RcNHegd9znIrFYnTteZNI2+74i5aNv+RUnOwSwi21dO19m8Tz7ckfh69wMtGuFjrrN5gvBj1s/wj8ZXMBi1DDq0Q7E/qC7SVQNhfbn0/3gQ/oPviBqy7ekSfhDdYQ6dhLqGGj67w42aX4SmZCNExn3Xpi3S3x6nqy8JfNw/Jk0bX3bSKttQml2vhKZuDJKSPcuouuPW+525I3Gl/RyUS72wjVrScW6YyX6wsSKJsLtodQw0aiHXsSivXgL52DEyigu2kL3fs3udtSMBnviHFEOg8QanjFJOm9m2YV4y+dBdEInfUbiHXFP/uWE8BfPhfbm0vXvncJN29LKNXCVzwNT24l4bZ6uhrfAKLxY5Rbja9oKrFwB6H69cTC8c+L5c0jUD4XbC9dja8TaU+4JlsO/tJZOFlFdDdvo3vfu+62jJiIt2Ai0VCTOd/R7nhbAoX4S+cAMUL1G4iGDiYcIx+B8nnYvjy69m8i3LTFVa6vaCqevFFE2hsJNb5mvuz1tiWnAl/xdIiEzPkOt8Wr68nBXz4POyGZzjSDGb+HPVF54okneOutt45o+yFLVH4/HVoOs05jr4KKpWbg6i+rAs5+E1aONt9G+1uy0QzOiYlIr5NvMYNf/8EboHAWzLsfnpqSHLO9cN5WeO6z0PpRcnzu/bBvI3y4Ijk26ssw5ip4fmlyzD8Slm6CleMg3JIcP3MdbFoOu59Kjk2+wSRCr12XHMufAgt/D78f7/oQAmYgPnezSbgSk6pep64wicSm5cmxiqVmv6sXJcc8eXDeFnhqsjth6LXwKdh6P+z4XXJswnUwchZsuCo5ljsWFr8AK8e4LlR9lr4H66+C/RuTY9OXmztU79ySHCtZZOLPzEqOOQE4bzusmg4du5Pj839nzslH9yfHjqbvpnLZsF1KhlS4bXfPgOdmOX78FQvo3Lkmue8C/or5dO/f5E5EenhHTATbSRq8AWxfPr7iaXTuej5FbSwCVWf0DLLtSVFf0SnmW3nL9qSYk1OOJ7fKJDhJO/USqFpEZ+1a95eQ3raUzaO7aQvRjsakmCd/LJYnO2nwBjP4+0tnmWOUQqDqDEING11JVS/vyJOIhdsJNyVfx+ysErz54wjVr08u1PIQqDqdzl3rUn4O/aWzCLfu7Lu742pL3mhs/wi69v45RbHZ+Mvm9bQlua8HKhfStectoglJVV9bCidDNJKUeALYgZF4CycT2v1SirbY5hjtfpFYQrLby1cyg0h7I5HWnUkxJ7cKf/G05DIzxGDG72F99APw4YcfUlFRQSAQYN68edx2222MGjUq5XtDoRChUPxkNTc3D02l
DjdJAXMbPtWgBGbg+PCe9Bf67b9NnaSA+bbuyUkd278Rtv4mdSzaDZt/kTpJ6S13X4oLFZg7DImPlRKF9pm2pEpSwDxGqns6/T5zxqSONb0HW+5NeaEnFoXN96ZOUnrLbUu+IAOmLjljU8fCLaYtqZKU3nJ3rkwTezx1QgDmmG/+Rfr+sPXXqZMUMI+uEu+GJGpcZx69pBLphM33pK9T7ePpz8vR9N1PkUhbQ8rXY5GQSQhS9V0g0ro7ZZICEG6v73vM0F+0q4lwioGnZ6+EW7anTFIAIu31REPJAyWYdiQ+ZnDvtNvcmUiRpACE2+pSJil95SY8bnHVtruFcMuO1PsEwi07UiYpptx6Ymk+E9GORsKe1PskFibcsi1t3w63N7jvliTus70+ZUJgim0n3LqDVEkKQLh1Z8okBXrakuauZLRzH+HWXSljxKLmGKWpU6StnkjiXazEWJo2Ho+GdTLtnDlzuP/++1m1ahX33HMPW7duZcGCBbS0pO64t912G/n5+X0/1dXVH3ONU3D8YPvTx9MlGwBOtplbkoo9ULmW2TbtPgeI2f74PItB7ZOB9+kEwEpzEXQC6fd5qHIHastA5VpeGOi256HaYvsGH4ND1DcHSPNIcaByLcc8jkm7z4H62EBtOcT59g5Q7qeJlf4yaVkDfNezbNKdb8uyByw37XXhkPt00pd7iH0esi0DlGsdYVsYYB6FmWNxpG0Z6PjZA5Q7wPEDLAZoy0DtHPAYWQPWd+ByD3G+PyGGtSXnnHMOX/7yl5k6dSpLlizh//2//8fBgwd55JFHUr7/pptuoqmpqe+ntrY25fuO2ui/O/z31lxuflIJTjLzRXyFyTHLhjFfMfNGBltu+edg3NVmvkV/nlwY/3dmIutgy625zPykklMDE79pJsEmsUyZoy5Kve3oy9Lvs3gBjP+6e2JnL9tvYsULkmOHKnfURVBzBSkHikCZaUtOTeptawYod6DjVzDDHHtPbnLM8phzVr5k8OVWnQ9jr0x94fEVwIRrITh58OUequ9OSNN3P2U8uZUpX7e8uXiCNfHJ4v23y6vCyS5NGXNyKtOWa2cV48mrJmXftRw8+aOxfPmpy82txElTrie3AicnTVs8WTjBmrSJvze3CienPGXMk1uZtlzbX4g3OCp137VsvHmjsP2p+5iTU5H2GDk55XjzqlLGcPw4wTFYntTJ/UDHfqDjZ/ny8eSPTpM4WHjyqrGzilNuO9AxcrJLe853CrYXb7AGy5vimtJTridNuelePx5lVMo1YsQIJk6cyObNm1PG/X4/wWDQ9TMkTrvn8N5Xthim3gplZ5g5JYkdOLsKTnsIPFkw/xH3IxXbD7N+biZ7nrrCTDhNVHO5GXzGXg3jrsF1wcqfYiZQZleZ+SaJ37S9QbNPX76ZRJn4uMWyzZyNqnNhyj9Dxefd+yxeANPuMCt4pt3hvvgGykwbPAFY8Cj4EyY72z6YeZdZ1TLzhzByjrvc6gth0vVmRdLEZe625E0wc20Cxaa+iXcGPDnmtUCxeU/exIRCLZNo1Fxiyq7+knufI+eYuhRMhZk/ct9R8BeZeRuegPlvYuJleWDa7T3H4PbkBKliqTl2VeeaydCJF9+cGlNfX76ZeJy4UsfJgrn3mXM2+xfmHCa2Zdw15lxPuLYnuUpQMM30kbzxps8k3gHxFfS0Jduc98SVWZYDJ91s+ubUW01fTXSkfTfdHaHyNEnqJ4CTVWTmlCS03XIC+ItnYNkO/pKZ7s+LZeMrmortzcU38iQsn/s65eRU4gnW4ORW4+S6ByjLm2e29WThKz7F3ccsD/6SGVi2F3/xdKx+d+88+WPxZJfizR+PnVXiitn+QryFk3ECBXgLJrnaguPHVzITu68tiXfgbLyFU7D9QXyFU1yrmQCc7DKz39wKk7QltsWTg6/4FCzHbyZ9JvYxy8FXPL0ndgpWv7t3nmCNGYTzx+JkuxMk2z/C1MUXxFs4BdcwZvvw97TFVzKz
X+Jl4S2YZI5B4aSkBMnMexmPJ7sUT/44V8zyZJljbnvxl8xwf0m0bHzFp5hzVjQVy5vn2rb3PHuCNUnJiuUL4ht5ErY3B1/RVPf5tr34S2b29LEZWK4vcxaeERNM3yyYiB1wL0CxA+b1T4phnUzbX2trK6NGjeKWW27hW9/61iHfP6SrfgBeuBJ2/tr8f9YY+OJHsPdVaH4fgidC0Wz3+9t3Qv0as8SzfIn74hVuh91Pm2f+FWe7l+XGYtD4vJkYWjjLvSwXzFLXvRvMAFJ6RnyJMZilrrtXmeXM5ee4l+VGw1D/rFmmXLLQvSwXYP8bPcuTTzArbxJ11Jnlt96gKTdhmSGRTtOWcJu5uxNwXxRpfAlaN5s7DAVT3bGWLbDnJbP8tWyx+4PZ3WzKBag4xz3Yx6KmPh11UDzfvSwX4MDb8eXJJf0SjM5GqPujSX4qznHfvYl0mZVI3U2mPln9vjXuWR9fnlzY7y5V6zZz3gIlZul3wpJJulvN3JBo2JzvxME+FjPLmttroWiee0k5QNP78eXJJQvd5zu0z5xvJ2DakjhYRbuh7hmzPLnsDHfiAse27yauivuETqLtLxruINqxD8vxYmcVu27lx6IRIh2NEIvgZJW4luWapa77iIU7sP0jXMtywSx1jYYOYHmysAMj+5ayglnqGunYA5Zlyk3oY7FYlGjHXmKREHZgpGtZLkA01ES0qxnLm+taYgxmqWukYy/YHpzskgHaUuxalgsQ6dxPrLsN25+P3S8Ji3a3maXWTsC1XNqU202k3cyncLKLXfNlzNLmvcQindiBQteyXHOMmomGmrC82a4lxuYYhXqOkdNzjJyEcqNE2hshGsbJKnItKTdtOUCsZ3my7XffpYp2txPt3Ifl+HvakniMwj3HKNZzjNKd7wLXknLTlpae5cnpzndj2rZEO/YQi3RjZ410LSkHiIQOEOtqxfLl4vjTzDXMIMfNqp9//Md/5Nxzz2X06NHs3r2bm2++mbfeeov333+f4uLUt9ASDXmiIiIiIsfccbPqZ+fOnVx66aXs27eP4uJi5s+fz4YNGw4rSREREZFPvmFNVB56KM2SSxEREREybDKtiIiISCIlKiIiIpKxlKiIiIhIxlKiIiIiIhlLiYqIiIhkLCUqIiIikrGUqIiIiEjGUqIiIiIiGUuJioiIiGQsJSoiIiKSsZSoiIiISMZSoiIiIiIZS4mKiIiIZCwlKiIiIpKxlKiIiIhIxlKiIiIiIhlLiYqIiIhkLCUqIiIikrGUqIiIiEjGUqIiIiIiGUuJioiIiGQsJSoiIiKSsZSoiIiISMZSoiIiIiIZS4mKiIiIZCwlKiIiIpKxlKiIiIhIxlKiIiIiIhlLiYqIiIhkLCUqIiIikrGUqIiIiEjGUqIiIiIiGUuJioiIiGQsJSoiIiKSsZSoiIiISMZSoiIiIiIZS4mKiIiIZCwlKiIiIpKxlKiIiIhIxlKiIiIiIhlLiYqIiIhkLCUqIiIikrGUqIiIiEjGUqIiIiIiGUuJioiIiGQsz3BXINHtt9/OTTfdxLe//W3uuuuu4a4OPGi5/31xB2x/BJreg/wpMPoicALxeMNaqHsW/IVQczlklcdjLVtg+0MQ6YTqL0DhzHisuxm2PQht22DkbKg8D+yeUxOLwq6nYO/LkFUFYy4HX0F82wN/htrHwHJg9MUQPCEe62yErf8DoT1QejqUnQVWT5siIaj9Xzj4NuRNhNGXgCc7vm3ji1C3CrxB05bsqnisdRts/y2E26Dy81A0N6EtrSbWugUKZkD1F8H2xtuyexXseRECZTDmCvCPjG978D2ofdT8f/WXYMSUeCy0D7Y9AB11ULwAKs4GqyfPjnZD7eNw4A3IHQejLwVvbnzbvRtg1x/Ak2NiuTXxWPtOU253M5SfDSUL4rFwO2x/GFr+CiOmQvWF4Ph72hKD+uegYQ34i01bAiXxbZv/araNRaD6Aig4JR7rOgBbH4COnVD0GXMM+9oS
hl0rYd+rkFMDNZeZc9Br/+tQ+4Spx+hLIW9cPNZRZ9oS2g/lZ5lz3ivSeez67qolwL54nHK4bDcyeLFYjEhHI9HO/VieAJ6cSizH1xePhpoJt9dhYeHkVGD74v06FgkRbt1FLBLCCRRhZxVh9Xy+Y7EIkbZ6ol3N2N4cnJxKLNvp2zbSuY9I+x4s24OTW4ntyYrvs7udSNtuYtEwTnYpTiB+vYlFw0RadxENt2P78nFyyrB6+m4sFiPa0Uikcz+W48eTW+VuS1cLkbY6AJyccmxfXkJbunra0okTKMTOKkloS7SnLU3Ynmyc3EosOz50RToPEGlvMG3JqcD2xq9j0XAHkdZdPW0pxgnErzexaIRI226i3a3YvmBPW5x4Wzr3EunY29OWSqzezz4Q7Wo1x4gYnuxybH/8MxqLdBFu20Us3IntL8DJLnW3pb2BaOgglicbT24FVu/1EYiGDhJub8DCxsmtwPbmpOw3nzZWLBaLDXclADZu3MhFF11EMBjk9NNPP6xEpbm5mfz8fJqamggGg4d8/6D0T1JSyZsAZ66DrDL40xVmgO7lZMGCx8yAuuU+ePXrZtDqNfmfYPodZnBecyZ0NsRjI+fCGX8EywPr/gYa18Vj/pFw+h+hcAa8cyu8c3NChSw49Scw8TpoWAfPnwvh1ni46nyY/yh0HYTVi8yg1SunBhavg+xR8Mo18NF/x2O2D0572AxS2x6C9X8LsXA8PuE6mHU3tGyG5xZBx654rGA6nLkanBx44XyT/PTy5sOip6F4Hmy6E968wX18p/0nnHiDSTTWng3dTfFY+RL47EqItMPqM02S0iurEs5cC8EJ8Noy+ODuhEPkgXm/hppLYeeT8NJFEO2Kx8deDXPuhfYdpi1t2+Kx/CmmXF8BvPRl2PlEPObJgYV/gNJF8MFP4bVvAgkfrZNvgZNvhv1vwtqzTOLVq2ShOQ6xCKw5C/ZtiMcCJXDGGpO0vXUjvH9HQltsmP1LGPdV2P0MvPhFiHTE46Mvgc88YPrWcwuh5cN47Gj6bjqXZcSl5LgRi0YINbxKtDOhL9heAmVzsf35dB/4gO6DH7i28Y48CW+whkjHXkING13nxckuxVcyE6JhOuvWE+tu6YtZniz8ZfOwPFl07X2bSGttQqk2vpIZeHLKCLfuomvPWyT2XU/eaHxFJxPtbiNUt55YpDNeri9IoGwu2B5CDRuJduxJKNaDv3QOTqCA7qYtdO/f5G5LwWS8I8YR6TxAqOEVk6T3bppVjL90FkQjdNZvINYV/+xbTgB/+Vxsby5d+94l3LwtoVQLX/E0PLmVhNvq6Wp8A4jGj1FuNb6iqcTCHYTq1xMLxz8vljePQPlcsL10Nb5OpD3hmmw5+Etn4WQV0d28je5977rbMmIi3oKJRENNdNZvMF+eetsSKMRfOgeIEarfQDR0MOEY+QiUz8P25dG1fxPhpi2ucn1FU/HkjeKTaDDjd0YkKq2trcyYMYOf/vSn/Nu//RvTpk0b3kTlcJKUXmOvgoqlZuDqL6sCzn4TVo4230b7W7LRDM6JiUivk28xg1//wRugcBbMux+empIcs71w3lZ47rPQ+lFyfO79sG8jfLgiOTbqyzDmKnh+aXLMPxKWboKV4yDckhw/cx1sWg67n0qOTb7BJEKvXZccy58CC38Pvx9v7rgksmw4d7NJuBKTql6nrjCJxKblybGKpWa/qxclxzx5cN4WeGqyO2HotfAp2Ho/7PhdcmzCdTByFmy4KjmWOxYWvwArx7guVH2Wvgfrr4L9G5Nj05ebO1Tv3JIcK1lk4s/MSo45AThvO6yaDh0p7mrM/505Jx/dnxw7mr6bihKVQUk1eAPYvnx8xdPo3PV8iq0sAlVn9Ayy7UlRX9Ep5lt5y/akmJNTjie3yiQ4STv1EqhaRGftWveXkB7+snl0N20h2tGYFPPkj8XyZCcN3mAGf3/pLDp3rknRFkxbGja6kqpe3pEnEQu3E25Kvo7ZWSV4
88cRql+fXKjlIVB1Op271qX8HPpLZxFu3dl3d8fVlrzR2P4RdO39c4pis/GXzetpS3JfD1QupGvPW0QTkqq+thROhmgkKfEEsAMj8RZOJrT7pRRtscmqXuy6M/VJMZjxOyMe/Vx33XUsXbqUxYsX82//9m9p3xcKhQiFQn3/bm5u/jiqN7DaJ1IPSmAGjg/vSX+h3/7b1EkKmG/rnjS3/fZvhK2/SR2LdsPmX6ROUnrL3ZfiQgXmDkPiY6VEoX2mLamSFDCPkeqeTr/PnDGpY03vwZZ7k5MUMK9tvjd1ktJbblvyBRkwdckZmzoWbjFtSZWk9Ja7c2Wa2OOpEwIwx3zzL9L3h62/SZ2kgHl0lXg3JFHjOvPoJZVIp2lLujrVPp7+vBxN35WjFmmrT/l6tKuJcOvONFvFCLfuSJmkAETa64mGkgdKs78G12MG9067zZ2JFEkKQLitLmWS0lduwuMWV227Wwi37Ei9TyDcsiNlkmLKrSeW5jMR7Wgk7Em9T2Jhwi3b0vbtcHuD+25J4j7b64lFQiljsXB7T1tSJ+Th1p0pkxToaUuau5LRzn1EWtN8fmNRIh178ORWpo5/Sgx7ovLQQw/xxhtvsHFjmgt4gttuu43vfe97H0OtBsHxg+1PH0+XbAA42WZuSaoObA9UrmW2HajcdGx/fJ7FoPZ5iHKdAFheiKX4kDuB9Ps8VLnpLkaHKtfygieQOnaofToB87grmqYt9gDfbg55XixSXuicQPrHK5YD9gBtGbCPDVDfo+m7cvSsAdYyWE760AAxLCd9uZY94D4ta4DhoHfbVF8oLLtvnkraOqVjD9BO2yEWOdK2DHT8bMw6klRtGeD49dQpfXCgmI1FLE2KYx2iL2jNy7AegdraWr797W/zwAMPEAgMcCHucdNNN9HU1NT3U1tbe8htjszIQ7+lV83l5ieV4CQzX8RXmByzbBjzFTNvZLDlln8Oxl1t5lv058mFCX9nJrIOttyay8xPKjk1MPGbZhJsEsuUOeqi1NuOviz9PosXwPivuyd29rL9Jla8IDl2qHJHXQQ1V5i69RcoM23JqUm9bc0A5Q50/ApmmGPvyU2OWR5zzsqXDL7cqvNh3FWpL1i+Aph4LQQnD77cI+27ckyk+5ZsZxXjyasmZd+1HDzB0Vi+/JTbOrmVOGnK9eRW4OSkjlmeLJxgTdrE35tbhZNTnjLmya1MW67tL8QbHJW671o23rxR2P7UfczJqUh7jJyccrx5VSljOH6c4BishAnC7m0r05c7wPGzfPl4gqPTJCQWnrxq7KzilNsOdIyc7NKe852C7cXJKkkd+xQZ1kTl9ddfp7GxkRkzZuDxePB4PDz//PP8+Mc/xuPxEIm4v2H6/X6CwaDrZ0hctvfw3le2GKbeCmVnmDkliR04uwpOewg8WTD/EfcjFdsPs35uJnueusJMOE1UczlMuNZM7Bx3Da4LVv4UM4Eyu8rMN3ESPozeoNmnL99Mokx83GLZZs5G1bkw5Z+h4vPufRYvgGl3mBU80+6Ir9QBM7DPf8TcoVjwKPiLEtrig5l3mVUtM38II+e4y62+ECZdb1YkTVzmbkveBDPXJlBs6pv4Dd6TY14LFJv35E1MKNQyiUbNJabs6i+59zlyjqlLwVSY+SP3HQV/kZm34QmY/yYmXpYHpt3ecwxuT06QKpaaY1d1rpkMnXjxzakx9fXlm4nHiSt1nCyYe585Z7N/Yc5hYlvGXWPO9YRre5KrBAXTTB/JG2/6TOIdEF9BT1uyzXlPXJllOXDSzaZvTr3V9NVER9p300o9KEh6Tm41Tq57gLK8efiKpmJ7svAVn+LuY5YHf8kMLNuLv3g6Vr+7jZ78sXiyS/Hmj8fuN7jZ/kK8hZNxAgV4Cybh+hw6fnwlM7FtB3/JzH534Gy8hVOw/UF8hVOw/SPcbcguM/vNrcATrHG3xZODr/gULMePr3i6u49ZDr7i6T2xU7D6rW7xBGvw5FbiyR+Lk+1O
kGz/CFMXXxBv4RRcw5jtw9/TFl/JzH6Jl4W3YJI5BoWTkhIkM+9lPJ7sUjz541wxy5NljrntxV8yw/0l0bLxFZ9izlnRVCxvnmvb3vPsCdYkJSuWL4hv5EnY3hx8RVPd59v24i+ZOfBdnE+JYZ1M29LSwvbt7jkGV199NZMmTeK73/0uJ5100oDbD+mqH4AHg0DC89PLYrD3VWh+H4InQtFs9/vbd0L9GrPEs3yJe7APt8Pup80z/4qz3ctyYzFofN5MDC2c5V6WC2ap694NZgApPSO+xBjMUtfdq8xy5vJz3Mtyo2Gof9YsUy5Z6F6WC7D/jZ7lySeYlTeJOurM8ltv0JSbOJkr0mnaEm4zd3cC/TL+xpegdbO5w1Aw1R1r2QJ7XjLLX8sWuz+Y3c2mXICKc9yDfSxq6tNRB8Xz3ctyAQ68HV+eXNIvwehshLo/muSn4hz33ZtIl1mJ1N1k6pPV71vjnvXx5cmF/e5StW4z5y1QYpZ+JyyZpLvVzA2Jhs35ThzsYzGzrLm9FormuZeUAzS9H1+eXLLQfb5D+8z5dgKmLYmDVbQb6p4xy5PLznAnLnBs+27ihHNNoj0q0a5WoqEDWJ4s7MDIvqWsYJa6Rjr2gGXhZJW4luXGYlGiHXuJRULYgZGuZbkA0VAT0a5mLG+ua4kxQCzcSaRjL9genOwS16ObWDRCpKMRYhGcrGLXslyASOd+Yt1t2P58bJ/7uhvtbjNLrZ2Aa7m0KbebSLtZFeRkF7vmy5ilzXuJRTqxA4VJy3KjXc1EQ01Y3mzXEmNzjEI9x8jpOUZOQrlRIu2NEA3jZBVh9XscHOk8QKxnebLtd9+lina3E+3ch+X4e9qSeIzCPcco1nOM4tdHs7R5H7FwB7a/wLWk3LSlpWd5crrz3ZiyLZ80x92qn0SLFi0a/lU/IiIiMmQGM35rlo6IiIhkrGFf9dPfunXrhrsKIiIikiGO6I7KwYMH+a//+i9uuukm9u/fD8Abb7zBrl27DrGliIiIyOEb9B2Vt99+m8WLF5Ofn8+2bdv4+te/TmFhIY899hg7duzg17/+9VDUU0RERD6FBn1H5frrr+eqq67iww8/dP3uk7/5m7/hhRdeOKaVExERkU+3QScqGzdu5P/+3/+b9HplZSX19al/JbSIiIjIkRh0ouL3+1P+jZ0PPviA4uLUv5VPRERE5EgMOlE577zzuPXWW+nuNn/wybIsduzYwXe/+10uvPDCY15BERER+fQadKLy/e9/n9bWVkpKSujo6GDhwoWMHz+evLw8/v3f/30o6igiIiKfUoNe9ZOfn8+zzz7LSy+9xNtvv01rayszZsxg8eLFh95YREREZBAGnajs2LGD0tJS5s+fz/z58/tej8Vi1NbWMmrUqGNaQREREfn0GvSjn5qaGmbMmMGWLVtcrzc2NjJmzJg0W4mIiIgM3hH9ZtrJkycze/ZsVq9e7Xo9w/6+oYiIiBznBp2oWJbFT3/6U/6//+//Y+nSpfz4xz92xURERESOlUHPUem9a/IP//APTJo0iUsvvZR33nmHf/mXfznmlRMREZFPt6P668nnnHMOf/rTnzjvvPN49dVXj1WdRERERIAjSFQWLlyIz+fr+/eJJ57IK6+8wgUXXKA5KiIiIocpEon0/fLUTyKfz4dtH9FUWBcrdhxnF83NzeTn59PU1EQwGBzu6oiIiBxSLBajvr6egwcPDndVhpRt24wZM8Z1c6PXYMbvw7qj0tzc3FdQqr/zk0gJg4iISHq9SUpJSQnZ2dmfyIUo0WiU3bt3U1dXx6hRo46qjYeVqBQUFFBXV0dJSQkjRoxIucNYLIZlWUQikSOujIiIyCdZJBLpS1JGjhw53NUZUsXFxezevZtwOIzX6z3icg4rUVmzZg2FhYUArF279oh3JiIi8mnWOyclOzt7mGsy9Hof+UQikaFPVBYuXJjy/0VERGTwPomPe/o7Vm087Om4e/fuZfv27a7X3nvvPa6++mou
uugiHnzwwWNSIREREZFeh52oLFu2zPVbaBsbG1mwYAEbN24kFApx1VVX8Zvf/GZIKikiIiKfToedqGzYsIHzzjuv79+//vWvKSws5K233uLJJ5/kP/7jP1ixYsWQVFJERESOzi233MK0adOGuxqDdtiJSn19PTU1NX3/XrNmDRdccAEej5nmct555/Hhhx8e8wqKiIiIGYeXLVvG2LFj8fv9VFdXc+655yb9geBPmsNOVILBoOuX07z66qvMmTOn79+WZREKhY5p5URERAS2bdvGzJkzWbNmDcuXL+edd95h1apVnH766Vx33XUfWz2G4zfpHnaiMnfuXH784x8TjUZ59NFHaWlp4YwzzuiLf/DBB1RXVw9JJUVERD7Nrr32WizL4tVXX+XCCy9k4sSJTJkyheuvv54NGzYAsGPHDs4//3xyc3MJBoNcdNFFNDQ0pC0zGo1y6623UlVVhd/vZ9q0aaxataovvm3bNizL4uGHH2bhwoUEAgEeeOCBIW9rf4edqPzrv/4rK1euJCsri4svvph/+qd/oqCgoC/+0EMPaemyiIjIMbZ//35WrVrFddddR05OTlJ8xIgRRKNRzj//fPbv38/zzz/Ps88+y0cffcTFF1+cttwf/ehHfP/73+fOO+/k7bffZsmSJSmncdx44418+9vfZtOmTSxZsuSYt+9QDvuPEk6dOpVNmzbx8ssvU1ZW5nrsA3DJJZdw4oknHvMKioiIfJpt3ryZWCzGpEmT0r5n9erVvPPOO2zdurXv6cavf/1rpkyZwsaNG5k1a1bSNnfeeSff/e53ueSSSwC44447WLt2LXfddZdrcczf//3fc8EFFxzjVh2+Qf315KKiIs4///yUsaVLlx6TComIiEjc4fzt4E2bNlFdXe2agnHiiScyYsQINm3alJSoNDc3s3v3bk477TTX66eddhp//vOfXa+deuqpR1H7o3f0f39ZREREhsyECROwLIu//OUvw7L/VI+bPk5KVERERDJYYWEhS5YsYcWKFbS1tSXFDx48yOTJk6mtraW2trbv9ffff5+DBw+mnJYRDAapqKjg5Zdfdr3+8ssvZ9w0jkE9+hEREZGP34oVKzjttNOYPXs2t956K1OnTiUcDvPss89yzz338P7773PyySdz+eWXc9dddxEOh7n22mtZuHBh2kc3N9xwAzfffDPjxo1j2rRp3Hfffbz11lvDsrJnIEpUREREMtzYsWN54403+Pd//3e+853vUFdXR3FxMTNnzuSee+7BsiyefPJJli1bxmc/+1ls2+bss8/mJz/5Sdoyv/Wtb9HU1MR3vvMdGhsbOfHEE1m5ciUTJkz4GFt2aFbscGbppNDY2EhjYyPRaNT1+tSpU49JxQ5Hc3Mz+fn5NDU1EQwGP7b9ioiIHInOzk62bt3KmDFjCAQCw12dITVQWwczfg/6jsrrr7/OlVdeyaZNm/pmIluWRSwWw7IsIpHIYIsUERERSWnQicpXv/pVJk6cyL333ktpaSmWZQ1FvUREREQGn6h89NFH/O///i/jx48fivqIiIiI9Bn08uQzzzwz6ZfBiIiIiAyFQd9R+a//+i+uvPJK3n33XU466SS8Xq8rft555x2zyomIiMin26ATlfXr1/Pyyy/z9NNPJ8U0mVZERESOpUE/+lm2bBlXXHEFdXV1RKNR14+SFBERETmWBp2o7Nu3j3/4h3+gtLR0KOojIiIi0mfQicoFF1zA2rVrh6IuIiIiIi6DnqMyceJEbrrpJl566SVOPvnkpMm03/rWt45Z5UREROTT7YhW/eTm5vL888/z/PPPu2KWZSlRERER+ThEI7DnReiog6xyKF4AtjPku12xYgXLly+nvr6eU045hZ/85CfMnj17yPY36ERl69atx2zn99xzD/fccw/btm0DYMqUKfzLv/wL55xzzjHbx1F5sN9v3b24A7Y/Ak3vQf4UGH0ROAl/v6BhLdQ9C/5CqLncdJxeLVtg+0MQ6YTqL0Dh
[remaining base64-encoded PNG data omitted]",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "# Suppressing warning message claiming that a portion of points cannot be placed into the plot due to the high number of data points\n",
+ "import warnings\n",
+ "warnings.filterwarnings(action='ignore', category=UserWarning, module='seaborn')\n",
+ "\n",
+ "palette = {\n",
+ " 0: 'orange',\n",
+ " 1: 'wheat'\n",
+ "}\n",
+ "sns.swarmplot(x=\"Color\", y=\"ord__Item Size\", hue=\"Color\", data=encoded_pumpkins, palette=palette)"
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
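+ "**Note**: Ignoring warnings is not a best practice and should be avoided whenever possible. Warnings often contain useful information that helps us improve our code and fix problems.  \n",
+ "We ignore this specific warning to keep the plot readable. Plotting every data point with a smaller marker size, while keeping the palette colors consistent, produces an unclear visualization.\n"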
+ ]
+ },
+ {
+ "attachments": {},
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Build your model\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 74,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from sklearn.model_selection import train_test_split\n",
+ "# X is the encoded features\n",
+ "X = encoded_pumpkins[encoded_pumpkins.columns.difference(['Color'])]\n",
+ "# y is the encoded label\n",
+ "y = encoded_pumpkins['Color']\n",
+ "\n",
+ "# Split the data into training and test sets\n",
+ "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 75,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ " precision recall f1-score support\n",
+ "\n",
+ " 0 0.94 0.98 0.96 166\n",
+ " 1 0.85 0.67 0.75 33\n",
+ "\n",
+ " accuracy 0.92 199\n",
+ " macro avg 0.89 0.82 0.85 199\n",
+ "weighted avg 0.92 0.92 0.92 199\n",
+ "\n",
+ "Predicted labels: [0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 0\n",
+ " 0 0 0 0 0 1 0 1 0 0 1 0 0 0 0 0 1 0 1 0 1 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0\n",
+ " 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 1 0 0 0 0 0 0 0 1 0\n",
+ " 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1 0 0 0 0 0 0 0 0 1 0 0 0 1 1 0\n",
+ " 0 0 0 0 1 0 0 0 0 0 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 1\n",
+ " 0 0 0 1 0 0 0 0 0 0 0 0 1 1]\n",
+ "F1-score: 0.7457627118644068\n"
+ ]
+ }
+ ],
+ "source": [
+ "from sklearn.metrics import f1_score, classification_report \n",
+ "from sklearn.linear_model import LogisticRegression\n",
+ "\n",
+ "# Train a logistic regression model on the pumpkin dataset\n",
+ "model = LogisticRegression()\n",
+ "model.fit(X_train, y_train)\n",
+ "predictions = model.predict(X_test)\n",
+ "\n",
+ "# Evaluate the model and print the results\n",
+ "print(classification_report(y_test, predictions))\n",
+ "print('Predicted labels: ', predictions)\n",
+ "print('F1-score: ', f1_score(y_test, predictions))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 76,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "array([[162, 4],\n",
+ " [ 11, 22]])"
+ ]
+ },
+ "execution_count": 76,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "from sklearn.metrics import confusion_matrix\n",
+ "confusion_matrix(y_test, predictions)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 77,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png": "",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "from sklearn.metrics import roc_curve, roc_auc_score\n",
+ "import matplotlib\n",
+ "import matplotlib.pyplot as plt\n",
+ "%matplotlib inline\n",
+ "\n",
+ "y_scores = model.predict_proba(X_test)\n",
+ "# calculate ROC curve\n",
+ "fpr, tpr, thresholds = roc_curve(y_test, y_scores[:,1])\n",
+ "\n",
+ "# plot ROC curve\n",
+ "fig = plt.figure(figsize=(6, 6))\n",
+ "# Plot the diagonal 50% line\n",
+ "plt.plot([0, 1], [0, 1], 'k--')\n",
+ "# Plot the FPR and TPR achieved by our model\n",
+ "plt.plot(fpr, tpr)\n",
+ "plt.xlabel('False Positive Rate')\n",
+ "plt.ylabel('True Positive Rate')\n",
+ "plt.title('ROC Curve')\n",
+ "plt.show()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 78,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "0.9749908725812341\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Calculate AUC score\n",
+ "auc = roc_auc_score(y_test,y_scores[:,1])\n",
+ "print(auc)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
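+ "\n---\n\n**Disclaimer**:  \nThis document was translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). Although we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The document in its original language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.\n"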
+ ]
+ }
+ ],
+ "metadata": {
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.8.16"
+ },
+ "metadata": {
+ "interpreter": {
+ "hash": "70b38d7a306a849643e446cd70466270a13445e5987dfa1344ef2b127438fa4d"
+ }
+ },
+ "orig_nbformat": 2,
+ "vscode": {
+ "interpreter": {
+ "hash": "949777d72b0d2535278d3dc13498b2535136f6dfe0678499012e853ee9abcab1"
+ }
+ },
+ "coopTranslator": {
+ "original_hash": "ef50cc584e0b79412610cc7da15e1f86",
+ "translation_date": "2025-09-03T19:31:02+00:00",
+ "source_file": "2-Regression/4-Logistic/solution/notebook.ipynb",
+ "language_code": "zh"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
\ No newline at end of file
diff --git a/translations/zh-CN/2-Regression/README.md b/translations/zh-CN/2-Regression/README.md
new file mode 100644
index 000000000..a2df49779
--- /dev/null
+++ b/translations/zh-CN/2-Regression/README.md
@@ -0,0 +1,45 @@
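+# Regression models for machine learning
+## Regional topic: Regression models for pumpkin prices in North America 🎃
+
+In North America, pumpkins are often carved into scary faces for Halloween. Let's discover more about these fascinating vegetables!
+
+
+> Photo by Beth Teutschmann on Unsplash
+
+## What you will learn
+
+[](https://youtu.be/5QnJtDad4iQ "Regression introduction video - Click to watch!")
+> 🎥 Click the image above for a quick introduction video to this lesson
+
+The lessons in this section cover the types of regression used in machine learning. Regression models can help determine the _relationship_ between variables. This type of model can predict values such as length, temperature, or age, and so uncovers relationships between variables as it analyzes data points.
+
+In this series of lessons, you will discover the differences between linear and logistic regression, and when you should prefer one over the other.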
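
As a rough illustration of that difference, the sketch below contrasts the two prediction rules in plain Python. The weights, bias, and inputs are hand-picked, hypothetical values, not parameters fitted to the pumpkin data, and the function names are made up for this example.

```python
import math


def linear_predict(x, weight, bias):
    """Linear regression: the output is an unbounded continuous value."""
    return weight * x + bias


def logistic_predict(x, weight, bias, threshold=0.5):
    """Logistic regression: squash the linear score into (0, 1), then threshold it."""
    probability = 1.0 / (1.0 + math.exp(-(weight * x + bias)))
    return probability, int(probability >= threshold)


# Hypothetical parameters chosen by hand for illustration only.
price = linear_predict(x=10.0, weight=1.5, bias=2.0)
probability, label = logistic_predict(x=0.0, weight=2.0, bias=0.0)

print(price)               # a continuous value, e.g. a price
print(probability, label)  # a probability and the class it maps to
```

A linear model returns a number on an open-ended scale, which suits targets such as price. A logistic model returns a probability that is then thresholded into a class label, which suits yes/no targets such as a pumpkin's color category.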
+
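+[](https://youtu.be/XA3OaoW86R8 "ML for beginners - Introduction to regression models")
+
+> 🎥 Click the image above for a short video introducing regression models.
+
+In this group of lessons, you will get set up to begin machine learning tasks, including configuring Visual Studio Code to manage notebooks, the common environment for data scientists. You will discover Scikit-learn, a library for machine learning, and you will build your first models, focusing on regression models in this chapter.
+
+> There are useful low-code tools that can help you learn about working with regression models. Try [Azure ML for this task](https://docs.microsoft.com/learn/modules/create-regression-model-azure-machine-learning-designer/?WT.mc_id=academic-77952-leestott)
+
+### Lessons
+
+1. [Tools of the trade](1-Tools/README.md)
+2. [Managing data](2-Data/README.md)
+3. [Linear and polynomial regression](3-Linear/README.md)
+4. [Logistic regression](4-Logistic/README.md)
+
+---
+### Credits
+
+"ML with regression" was written with ♥️ by [Jen Looper](https://twitter.com/jenlooper)
+
+♥️ Quiz contributors include: [Muhammad Sakib Khan Inan](https://twitter.com/Sakibinan) and [Ornella Altunyan](https://twitter.com/ornelladotcom)
+
+The pumpkin dataset is suggested by [this project on Kaggle](https://www.kaggle.com/usda/a-year-of-pumpkin-prices), and its data is sourced from the [Specialty Crops Terminal Markets Standard Reports](https://www.marketnews.usda.gov/mnp/fv-report-config-step1?type=termPrice) distributed by the United States Department of Agriculture. We have added some points around color, based on variety, to normalize the distribution. This data is in the public domain.
+
+---
+
+**Disclaimer**:  
+This document was translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). Although we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The document in its original language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.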
\ No newline at end of file
diff --git a/translations/zh-CN/3-Web-App/1-Web-App/README.md b/translations/zh-CN/3-Web-App/1-Web-App/README.md
new file mode 100644
index 000000000..e37631e25
--- /dev/null
+++ b/translations/zh-CN/3-Web-App/1-Web-App/README.md
@@ -0,0 +1,350 @@
+# 构建一个使用机器学习模型的网页应用
+
+在本课中,你将使用一个非常特别的数据集来训练一个机器学习模型:_过去一个世纪的UFO目击事件_,数据来源于NUFORC的数据库。
+
+你将学习:
+
+- 如何对训练好的模型进行“pickle”处理
+- 如何在Flask应用中使用该模型
+
+我们将继续使用notebook来清理数据并训练模型,但你可以更进一步,尝试在“真实世界”中使用模型,也就是在一个网页应用中。
+
+为此,你需要使用Flask构建一个网页应用。
+
+## [课前小测验](https://ff-quizzes.netlify.app/en/ml/)
+
+## 构建一个应用
+
+有多种方法可以构建网页应用来使用机器学习模型。你的网页架构可能会影响模型的训练方式。想象一下,你正在一个企业中工作,数据科学团队已经训练了一个模型,他们希望你在应用中使用它。
+
+### 需要考虑的问题
+
+你需要问自己许多问题:
+
+- **这是一个网页应用还是一个移动应用?** 如果你正在构建一个移动应用,或者需要在物联网环境中使用模型,你可以使用 [TensorFlow Lite](https://www.tensorflow.org/lite/) 并在Android或iOS应用中使用该模型。
+- **模型将存储在哪里?** 是在云端还是本地?
+- **是否需要离线支持?** 应用是否需要在离线状态下运行?
+- **训练模型使用了什么技术?** 所选技术可能会影响你需要使用的工具。
+ - **使用TensorFlow。** 如果你使用TensorFlow训练模型,该生态系统提供了将TensorFlow模型转换为网页应用中使用的能力,例如通过 [TensorFlow.js](https://www.tensorflow.org/js/)。
+ - **使用PyTorch。** 如果你使用 [PyTorch](https://pytorch.org/) 等库构建模型,你可以选择将其导出为 [ONNX](https://onnx.ai/)(开放神经网络交换)格式,用于支持JavaScript网页应用的 [Onnx Runtime](https://www.onnxruntime.ai/)。在未来的课程中,我们将探索如何将Scikit-learn训练的模型导出为ONNX格式。
+ - **使用Lobe.ai或Azure Custom Vision。** 如果你使用 [Lobe.ai](https://lobe.ai/) 或 [Azure Custom Vision](https://azure.microsoft.com/services/cognitive-services/custom-vision-service/?WT.mc_id=academic-77952-leestott) 等机器学习SaaS(软件即服务)系统来训练模型,这类软件提供了多平台导出模型的方法,包括构建一个定制的API,通过云端供在线应用查询。
+
+你还可以选择构建一个完整的Flask网页应用,该应用能够在网页浏览器中自行训练模型。这也可以通过JavaScript环境中的TensorFlow.js实现。
+
+对于我们的目的,由于我们一直在使用基于Python的notebook,让我们来探索将训练好的模型从notebook导出为Python构建的网页应用可读取的格式所需的步骤。
+
+## 工具
+
+完成此任务,你需要两个工具:Flask和Pickle,它们都运行在Python上。
+
+✅ 什么是 [Flask](https://palletsprojects.com/p/flask/)?Flask被其创建者定义为一个“微框架”,它使用Python和模板引擎来构建网页,提供了网页框架的基本功能。可以参考 [这个学习模块](https://docs.microsoft.com/learn/modules/python-flask-build-ai-web-app?WT.mc_id=academic-77952-leestott) 来练习使用Flask构建应用。
+
+✅ 什么是 [Pickle](https://docs.python.org/3/library/pickle.html)?Pickle 🥒 是一个Python模块,用于序列化和反序列化Python对象结构。当你对模型进行“pickle”处理时,你会将其结构序列化或扁平化,以便在网页上使用。需要注意的是:Pickle本身并不安全,因此在被提示“un-pickle”文件时要小心。Pickle文件的后缀为`.pkl`。
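
As a minimal sketch of the round trip described above, the snippet below serializes a plain Python object to bytes and restores it. The `toy_model` dictionary is a hypothetical stand-in for a trained model, not something defined in this lesson; the `pickle.dumps` and `pickle.loads` calls are part of the standard library.

```python
import pickle

# Hypothetical stand-in for a trained model: any picklable Python object works
toy_model = {"coefficients": [0.5, -1.2, 3.0], "classes": ["au", "ca", "de", "gb", "us"]}

# Serialize to bytes (the same bytes that would be written into a .pkl file)
payload = pickle.dumps(toy_model)

# Deserialize; only do this with data you created or fully trust,
# because unpickling can execute arbitrary code
restored = pickle.loads(payload)

print(type(payload).__name__, restored == toy_model)
```

The restored object is an equal copy rather than the same object, which is why a pickled model can be loaded in a different process such as a web server.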
+
+## 练习 - 清理数据
+
+在本课中,你将使用来自 [NUFORC](https://nuforc.org)(国家UFO报告中心)的80,000条UFO目击数据。这些数据中包含一些有趣的UFO目击描述,例如:
+
+- **长描述示例。** “一个人从夜晚草地上的一道光束中出现,跑向德州仪器的停车场。”
+- **短描述示例。** “灯光追逐我们。”
+
+[ufos.csv](../../../../3-Web-App/1-Web-App/data/ufos.csv) 表格包含关于目击发生的 `city`(城市)、`state`(州)和 `country`(国家),物体的 `shape`(形状),以及其 `latitude`(纬度)和 `longitude`(经度)的列。
+
+在本课提供的空白 [notebook](../../../../3-Web-App/1-Web-App/notebook.ipynb) 中:
+
+1. Import `pandas` and `numpy`, as you did in previous lessons, and load the ufos spreadsheet. You can then take a look at a sample of the data:
+
+ ```python
+ import pandas as pd
+ import numpy as np
+
+ ufos = pd.read_csv('./data/ufos.csv')
+ ufos.head()
+ ```
+
+1. 将ufos数据转换为一个小型数据框,并重新命名列标题。检查 `Country` 字段中的唯一值。
+
+ ```python
+ ufos = pd.DataFrame({'Seconds': ufos['duration (seconds)'], 'Country': ufos['country'],'Latitude': ufos['latitude'],'Longitude': ufos['longitude']})
+
+ ufos.Country.unique()
+ ```
+
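+1. Now you can reduce the amount of data to work with by dropping any rows with null values and keeping only sightings that lasted between 1 and 60 seconds: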
+
+ ```python
+ ufos.dropna(inplace=True)
+
+ ufos = ufos[(ufos['Seconds'] >= 1) & (ufos['Seconds'] <= 60)]
+
+ ufos.info()
+ ```
+
+1. Import Scikit-learn's `LabelEncoder` class to convert the text values for countries into numbers:
+
+ ✅ LabelEncoder 按字母顺序对数据进行编码
+
+ ```python
+ from sklearn.preprocessing import LabelEncoder
+
+ ufos['Country'] = LabelEncoder().fit_transform(ufos['Country'])
+
+ ufos.head()
+ ```
+
+ 你的数据应如下所示:
+
+ ```output
+ Seconds Country Latitude Longitude
+ 2 20.0 3 53.200000 -2.916667
+ 3 20.0 4 28.978333 -96.645833
+ 14 30.0 4 35.823889 -80.253611
+ 23 60.0 4 45.582778 -122.352222
+ 24 3.0 3 51.783333 -0.783333
+ ```
+
+## 练习 - 构建模型
+
+现在你可以准备通过将数据分为训练组和测试组来训练模型。
+
+1. 选择三个特征作为你的X向量,y向量将是 `Country`。你希望能够输入 `Seconds`、`Latitude` 和 `Longitude`,并返回一个国家ID。
+
+ ```python
+ from sklearn.model_selection import train_test_split
+
+ Selected_features = ['Seconds','Latitude','Longitude']
+
+ X = ufos[Selected_features]
+ y = ufos['Country']
+
+ X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)
+ ```
+
+1. 使用逻辑回归训练模型:
+
+ ```python
+ from sklearn.metrics import accuracy_score, classification_report
+ from sklearn.linear_model import LogisticRegression
+ model = LogisticRegression()
+ model.fit(X_train, y_train)
+ predictions = model.predict(X_test)
+
+ print(classification_report(y_test, predictions))
+ print('Predicted labels: ', predictions)
+ print('Accuracy: ', accuracy_score(y_test, predictions))
+ ```
+
+模型的准确率还不错 **(大约95%)**,这并不奇怪,因为 `Country` 和 `Latitude/Longitude` 是相关的。
+
+你创建的模型并不是非常具有革命性,因为你应该能够从 `Latitude` 和 `Longitude` 推断出 `Country`,但这是一个很好的练习,可以尝试从清理过的原始数据中训练模型,导出模型,然后在网页应用中使用它。
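
An accuracy figure is easier to judge next to a simple baseline. The sketch below takes the per-class test-set counts (the `support` column) from the classification report in this lesson's solution notebook and computes the accuracy of always predicting the most common class. The variable names are illustrative only.

```python
# Test-set support per encoded class, read from the classification report
# in the lesson's solution notebook (0 au, 1 ca, 2 de, 3 gb, 4 us)
support = {0: 41, 1: 250, 2: 8, 3: 131, 4: 4743}

total = sum(support.values())
majority_class = max(support, key=support.get)

# Accuracy of a "model" that always answers with the most common class
baseline_accuracy = support[majority_class] / total

print(f"test rows: {total}")
print(f"always predicting class {majority_class}: {baseline_accuracy:.3f}")
```

Because most sightings in the data are from the US, this baseline already lands close to 92%, so the per-class precision and recall in the report say more about the model than the overall accuracy does.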
+
+## 练习 - 对模型进行“pickle”处理
+
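+Now it's time to _pickle_ your trained model. You can do that in a few lines of code. Once it is pickled, load the pickled model back and test it against a sample data array containing values for seconds, latitude, and longitude: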
+
+```python
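+import pickle
+
+model_filename = 'ufo-model.pkl'
+
+# Serialize the trained model to disk; the with-block closes the file handle
+with open(model_filename, 'wb') as f:
+    pickle.dump(model, f)
+
+# Load it back and predict for 50 seconds at latitude 44, longitude -12
+with open(model_filename, 'rb') as f:
+    model = pickle.load(f)
+print(model.predict([[50, 44, -12]]))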
+```
+
+模型返回了 **'3'**,这是英国的国家代码。太神奇了!👽
+
+## 练习 - 构建一个Flask应用
+
+现在你可以构建一个Flask应用来调用你的模型,并以更直观的方式返回类似的结果。
+
+1. 首先,在 _notebook.ipynb_ 文件旁边创建一个名为 **web-app** 的文件夹,其中存放你的 _ufo-model.pkl_ 文件。
+
+1. 在该文件夹中再创建三个文件夹:**static**(其中包含一个名为 **css** 的文件夹)和 **templates**。你现在应该有以下文件和目录:
+
+ ```output
+ web-app/
+   static/
+     css/
+   templates/
+ notebook.ipynb
+ ufo-model.pkl
+ ```
+
+ ✅ 参考解决方案文件夹以查看完成的应用
+
+1. 在 _web-app_ 文件夹中创建第一个文件 **requirements.txt**。像JavaScript应用中的 _package.json_ 一样,此文件列出了应用所需的依赖项。在 **requirements.txt** 中添加以下内容:
+
+ ```text
+ scikit-learn
+ pandas
+ numpy
+ flask
+ ```
+
+1. Now navigate to the _web-app_ folder so that the dependencies can be installed from there:
+
+ ```bash
+ cd web-app
+ ```
+
+1. 在终端中输入 `pip install`,以安装 _requirements.txt_ 中列出的库:
+
+ ```bash
+ pip install -r requirements.txt
+ ```
+
+1. 现在,你可以创建另外三个文件来完成应用:
+
+ 1. 在根目录创建 **app.py**。
+ 2. 在 _templates_ 目录中创建 **index.html**。
+ 3. 在 _static/css_ 目录中创建 **styles.css**。
+
+1. 在 _styles.css_ 文件中添加一些样式:
+
+ ```css
+ body {
+ width: 100%;
+ height: 100%;
+ font-family: 'Helvetica';
+ background: black;
+ color: #fff;
+ text-align: center;
+ letter-spacing: 1.4px;
+ font-size: 30px;
+ }
+
+ input {
+ min-width: 150px;
+ }
+
+ .grid {
+ width: 300px;
+ border: 1px solid #2d2d2d;
+ display: grid;
+ justify-content: center;
+ margin: 20px auto;
+ }
+
+ .box {
+ color: #fff;
+ background: #2d2d2d;
+ padding: 12px;
+ display: inline-block;
+ }
+ ```
+
+1. 接下来,构建 _index.html_ 文件:
+
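+ ```html
+ <!DOCTYPE html>
+ <html>
+   <head>
+     <meta charset="UTF-8">
+     <title>🛸 UFO Appearance Prediction! 👽</title>
+     <link rel="stylesheet" href="{{ url_for('static', filename='css/styles.css') }}">
+   </head>
+
+   <body>
+     <div class="grid">
+       <div class="box">
+         <p>According to the number of seconds, latitude and longitude, which country is likely to have reported seeing a UFO?</p>
+
+         <form action="{{ url_for('predict') }}" method="post">
+           <input type="number" name="seconds" placeholder="Seconds" required="required" min="0" max="60" />
+           <input type="text" name="latitude" placeholder="Latitude" required="required" />
+           <input type="text" name="longitude" placeholder="Longitude" required="required" />
+           <button type="submit" class="btn">Predict country where the UFO is seen</button>
+         </form>
+
+         <p>{{ prediction_text }}</p>
+       </div>
+     </div>
+   </body>
+ </html>
+ ```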
+
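+ Take a look at the templating in this file. Notice the 'mustache' syntax, `{{ }}`, around variables that the app supplies, such as the prediction text. There is also a form that posts the user's input values to the `/predict` route.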
+
+ 最后,你已经准备好构建驱动模型使用和预测显示的Python文件:
+
+1. 在 `app.py` 中添加:
+
+ ```python
+ import numpy as np
+ from flask import Flask, request, render_template
+ import pickle
+
+ app = Flask(__name__)
+
+ # Load the pickled model once at startup; the with-block closes the file
+ with open("./ufo-model.pkl", "rb") as f:
+     model = pickle.load(f)
+
+
+ @app.route("/")
+ def home():
+     return render_template("index.html")
+
+
+ @app.route("/predict", methods=["POST"])
+ def predict():
+     # Latitude and longitude are decimals, so parse the form values as floats
+     features = [float(x) for x in request.form.values()]
+     final_features = [np.array(features)]
+     prediction = model.predict(final_features)
+
+     output = prediction[0]
+
+     # Index positions match the LabelEncoder codes: au, ca, de, gb, us
+     countries = ["Australia", "Canada", "Germany", "UK", "US"]
+
+     return render_template(
+         "index.html", prediction_text="Likely country: {}".format(countries[output])
+     )
+
+
+ if __name__ == "__main__":
+     app.run(debug=True)
+ ```
+
+ > 💡 提示:当你在使用Flask运行网页应用时添加 [`debug=True`](https://www.askpython.com/python-modules/flask/flask-debug-mode),任何对应用的更改都会立即反映出来,而无需重启服务器。但要小心!不要在生产应用中启用此模式。
+
+如果你运行 `python app.py` 或 `python3 app.py`,你的网页服务器会在本地启动,你可以填写一个简短的表单,获取关于UFO目击地点的答案!
+
+在此之前,先看看 `app.py` 的各个部分:
+
+1. 首先,加载依赖项并启动应用。
+1. 然后,导入模型。
+1. 接着,在主页路由上渲染 index.html。
+
+在 `/predict` 路由上,当表单被提交时,会发生以下几件事:
+
+1. 表单变量被收集并转换为一个numpy数组。然后将其发送到模型,并返回一个预测结果。
+2. 我们希望显示的国家代码被重新渲染为可读的文本值,并将该值发送回 index.html,在模板中渲染。
+
+通过Flask和Pickle模型以这种方式使用模型是相对简单的。最难的部分是理解必须发送到模型的数据形状,以获得预测结果。这完全取决于模型的训练方式。这个模型需要输入三个数据点才能获得预测。
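
The point about data shape can be sketched in a few lines. The hypothetical helper below, `parse_features`, is not part of the lesson's `app.py`; it only illustrates turning three submitted form strings into the single-row, two-dimensional input that a Scikit-learn estimator's `predict()` expects.

```python
def parse_features(form_values):
    """Turn submitted form strings into one row of [seconds, latitude, longitude]."""
    values = [float(v) for v in form_values]
    if len(values) != 3:
        raise ValueError(f"expected 3 values, got {len(values)}")
    # Estimators expect a 2D input: a list of rows, even for a single sample
    return [values]

row = parse_features(["50", "44", "-12"])
print(row)
```

Checking the number of values before calling the model makes a mismatch between the form and the training features fail early with a readable message.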
+
+在专业环境中,你可以看到训练模型的团队和在网页或移动应用中使用模型的团队之间良好沟通的重要性。在我们的案例中,只有一个人,那就是你!
+
+---
+
+## 🚀 挑战
+
+与其在notebook中工作并将模型导入Flask应用,你可以直接在Flask应用中训练模型!尝试将notebook中的Python代码转换为在应用中的 `train` 路由上训练模型。尝试这种方法的优缺点是什么?
+
+## [课后小测验](https://ff-quizzes.netlify.app/en/ml/)
+
+## 复习与自学
+
+构建一个使用机器学习模型的网页应用有很多方法。列出你可以使用JavaScript或Python构建网页应用以利用机器学习的方法。考虑架构:模型应该保留在应用中还是存储在云端?如果是后者,你将如何访问它?绘制一个应用机器学习网页解决方案的架构模型。
+
+## 作业
+
+[尝试一个不同的模型](assignment.md)
+
+---
+
+**免责声明**:
+本文档使用AI翻译服务[Co-op Translator](https://github.com/Azure/co-op-translator)进行翻译。尽管我们努力确保准确性,但请注意,自动翻译可能包含错误或不准确之处。应以原始语言的文档作为权威来源。对于关键信息,建议使用专业人工翻译。对于因使用本翻译而引起的任何误解或误读,我们概不负责。
\ No newline at end of file
diff --git a/translations/zh-CN/3-Web-App/1-Web-App/assignment.md b/translations/zh-CN/3-Web-App/1-Web-App/assignment.md
new file mode 100644
index 000000000..1a2226af9
--- /dev/null
+++ b/translations/zh-CN/3-Web-App/1-Web-App/assignment.md
@@ -0,0 +1,16 @@
+# 尝试不同的模型
+
+## 说明
+
+现在您已经使用训练好的回归模型构建了一个网页应用,请使用之前回归课程中的一个模型重新制作这个网页应用。您可以保持原有的风格,也可以设计不同的样式以体现南瓜数据。请注意更改输入以匹配您模型的训练方法。
+
+## 评分标准
+
+| 标准 | 卓越表现 | 合格表现 | 需要改进 |
+| -------------------------- | ------------------------------------------------------- | ------------------------------------------------------- | ------------------------------------- |
+| | 网页应用运行正常并成功部署到云端 | 网页应用存在缺陷或表现出意外结果 | 网页应用无法正常运行 |
+
+---
+
+**免责声明**:
+本文档使用AI翻译服务[Co-op Translator](https://github.com/Azure/co-op-translator)进行翻译。尽管我们努力确保准确性,但请注意,自动翻译可能包含错误或不准确之处。应以原始语言的文档作为权威来源。对于关键信息,建议使用专业人工翻译。因使用本翻译而导致的任何误解或误读,我们概不负责。
\ No newline at end of file
diff --git a/translations/zh-CN/3-Web-App/1-Web-App/notebook.ipynb b/translations/zh-CN/3-Web-App/1-Web-App/notebook.ipynb
new file mode 100644
index 000000000..e69de29bb
diff --git a/translations/zh-CN/3-Web-App/1-Web-App/solution/notebook.ipynb b/translations/zh-CN/3-Web-App/1-Web-App/solution/notebook.ipynb
new file mode 100644
index 000000000..7ea590937
--- /dev/null
+++ b/translations/zh-CN/3-Web-App/1-Web-App/solution/notebook.ipynb
@@ -0,0 +1,267 @@
+{
+ "metadata": {
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.7.0"
+ },
+ "orig_nbformat": 2,
+ "kernelspec": {
+ "name": "python37364bit8d3b438fb5fc4430a93ac2cb74d693a7",
+ "display_name": "Python 3.7.0 64-bit ('3.7')"
+ },
+ "metadata": {
+ "interpreter": {
+ "hash": "70b38d7a306a849643e446cd70466270a13445e5987dfa1344ef2b127438fa4d"
+ }
+ },
+ "coopTranslator": {
+ "original_hash": "5fa2e8f4584c78250ca9729b46562ceb",
+ "translation_date": "2025-09-03T20:19:40+00:00",
+ "source_file": "3-Web-App/1-Web-App/solution/notebook.ipynb",
+ "language_code": "zh"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2,
+ "cells": [
+ {
+ "source": [],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 23,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ " datetime city state country shape \\\n",
+ "0 10/10/1949 20:30 san marcos tx us cylinder \n",
+ "1 10/10/1949 21:00 lackland afb tx NaN light \n",
+ "2 10/10/1955 17:00 chester (uk/england) NaN gb circle \n",
+ "3 10/10/1956 21:00 edna tx us circle \n",
+ "4 10/10/1960 20:00 kaneohe hi us light \n",
+ "\n",
+ " duration (seconds) duration (hours/min) \\\n",
+ "0 2700.0 45 minutes \n",
+ "1 7200.0 1-2 hrs \n",
+ "2 20.0 20 seconds \n",
+ "3 20.0 1/2 hour \n",
+ "4 900.0 15 minutes \n",
+ "\n",
+ " comments date posted latitude \\\n",
+ "0 This event took place in early fall around 194... 4/27/2004 29.883056 \n",
+ "1 1949 Lackland AFB, TX. Lights racing acros... 12/16/2005 29.384210 \n",
+ "2 Green/Orange circular disc over Chester, En... 1/21/2008 53.200000 \n",
+ "3 My older brother and twin sister were leaving ... 1/17/2004 28.978333 \n",
+ "4 AS a Marine 1st Lt. flying an FJ4B fighter/att... 1/22/2004 21.418056 \n",
+ "\n",
+ " longitude \n",
+ "0 -97.941111 \n",
+ "1 -98.581082 \n",
+ "2 -2.916667 \n",
+ "3 -96.645833 \n",
+ "4 -157.803611 "
+ ]
+ },
+ "metadata": {},
+ "execution_count": 23
+ }
+ ],
+ "source": [
+ "import pandas as pd\n",
+ "import numpy as np\n",
+ "\n",
+ "ufos = pd.read_csv('../data/ufos.csv')\n",
+ "ufos.head()\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 24,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "array(['us', nan, 'gb', 'ca', 'au', 'de'], dtype=object)"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 24
+ }
+ ],
+ "source": [
+ "\n",
+ "ufos = pd.DataFrame({'Seconds': ufos['duration (seconds)'], 'Country': ufos['country'],'Latitude': ufos['latitude'],'Longitude': ufos['longitude']})\n",
+ "\n",
+ "ufos.Country.unique()\n",
+ "\n",
+ "# 0 au, 1 ca, 2 de, 3 gb, 4 us"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 25,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "<class 'pandas.core.frame.DataFrame'>\nInt64Index: 25863 entries, 2 to 80330\nData columns (total 4 columns):\n # Column Non-Null Count Dtype \n--- ------ -------------- ----- \n 0 Seconds 25863 non-null float64\n 1 Country 25863 non-null object \n 2 Latitude 25863 non-null float64\n 3 Longitude 25863 non-null float64\ndtypes: float64(3), object(1)\nmemory usage: 1010.3+ KB\n"
+ ]
+ }
+ ],
+ "source": [
+ "ufos.dropna(inplace=True)\n",
+ "\n",
+ "ufos = ufos[(ufos['Seconds'] >= 1) & (ufos['Seconds'] <= 60)]\n",
+ "\n",
+ "ufos.info()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 26,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ " Seconds Country Latitude Longitude\n",
+ "2 20.0 3 53.200000 -2.916667\n",
+ "3 20.0 4 28.978333 -96.645833\n",
+ "14 30.0 4 35.823889 -80.253611\n",
+ "23 60.0 4 45.582778 -122.352222\n",
+ "24 3.0 3 51.783333 -0.783333"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 26
+ }
+ ],
+ "source": [
+ "from sklearn.preprocessing import LabelEncoder\n",
+ "\n",
+ "ufos['Country'] = LabelEncoder().fit_transform(ufos['Country'])\n",
+ "\n",
+ "ufos.head()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 27,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from sklearn.model_selection import train_test_split\n",
+ "\n",
+ "Selected_features = ['Seconds','Latitude','Longitude']\n",
+ "\n",
+ "X = ufos[Selected_features]\n",
+ "y = ufos['Country']\n",
+ "\n",
+ "\n",
+ "X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.2, random_state=0)\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 28,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stderr",
+ "text": [
+ "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/sklearn/linear_model/logistic.py:432: FutureWarning: Default solver will be changed to 'lbfgs' in 0.22. Specify a solver to silence this warning.\n",
+ " FutureWarning)\n",
+ "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/sklearn/linear_model/logistic.py:469: FutureWarning: Default multi_class will be changed to 'auto' in 0.22. Specify the multi_class option to silence this warning.\n",
+ " \"this warning.\", FutureWarning)\n",
+ " precision recall f1-score support\n",
+ "\n",
+ " 0 1.00 1.00 1.00 41\n",
+ " 1 1.00 0.02 0.05 250\n",
+ " 2 0.00 0.00 0.00 8\n",
+ " 3 0.94 1.00 0.97 131\n",
+ " 4 0.95 1.00 0.97 4743\n",
+ "\n",
+ " accuracy 0.95 5173\n",
+ " macro avg 0.78 0.60 0.60 5173\n",
+ "weighted avg 0.95 0.95 0.93 5173\n",
+ "\n",
+ "Predicted labels: [4 4 4 ... 3 4 4]\n",
+ "Accuracy: 0.9512855209742895\n",
+ "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/sklearn/metrics/classification.py:1437: UndefinedMetricWarning: Precision and F-score are ill-defined and being set to 0.0 in labels with no predicted samples.\n",
+ " 'precision', 'predicted', average, warn_for)\n"
+ ]
+ }
+ ],
+ "source": [
+ "from sklearn.model_selection import train_test_split\n",
+ "from sklearn.metrics import accuracy_score, classification_report \n",
+ "from sklearn.linear_model import LogisticRegression\n",
+ "model = LogisticRegression()\n",
+ "model.fit(X_train, y_train)\n",
+ "predictions = model.predict(X_test)\n",
+ "\n",
+ "print(classification_report(y_test, predictions))\n",
+ "print('Predicted labels: ', predictions)\n",
+ "print('Accuracy: ', accuracy_score(y_test, predictions))\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 29,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "[3]\n"
+ ]
+ }
+ ],
+ "source": [
+ "import pickle\n",
+ "model_filename = 'ufo-model.pkl'\n",
+ "pickle.dump(model, open(model_filename,'wb'))\n",
+ "\n",
+ "model = pickle.load(open('ufo-model.pkl','rb'))\n",
+ "print(model.predict([[50,44,-12]]))\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "\n---\n\n**免责声明**: \n本文档使用AI翻译服务 [Co-op Translator](https://github.com/Azure/co-op-translator) 进行翻译。尽管我们努力确保翻译的准确性,但请注意,自动翻译可能包含错误或不准确之处。原始语言的文档应被视为权威来源。对于关键信息,建议使用专业人工翻译。我们不对因使用此翻译而产生的任何误解或误读承担责任。\n"
+ ]
+ }
+ ]
+}
\ No newline at end of file
diff --git a/translations/zh-CN/3-Web-App/README.md b/translations/zh-CN/3-Web-App/README.md
new file mode 100644
index 000000000..09dc6f96b
--- /dev/null
+++ b/translations/zh-CN/3-Web-App/README.md
@@ -0,0 +1,26 @@
+# 构建一个使用您的机器学习模型的网页应用
+
+在本课程的这一部分,您将学习一个应用型的机器学习主题:如何将您的 Scikit-learn 模型保存为一个文件,以便在网页应用中进行预测。一旦模型保存完成,您将学习如何在使用 Flask 构建的网页应用中使用它。您将首先使用一些关于 UFO 目击事件的数据创建一个模型!然后,您将构建一个网页应用,允许用户输入持续时间(秒数)、纬度和经度值,以预测哪个国家报告了看到 UFO。
+
+
+
+照片由 Michael Herren 提供,来自 Unsplash
+
+## 课程
+
+1. [构建一个网页应用](1-Web-App/README.md)
+
+## 致谢
+
+“构建一个网页应用”由 [Jen Looper](https://twitter.com/jenlooper) 倾情撰写。
+
+♥️ 测验由 Rohan Raj 编写。
+
+数据集来源于 [Kaggle](https://www.kaggle.com/NUFORC/ufo-sightings)。
+
+网页应用架构部分参考了 [这篇文章](https://towardsdatascience.com/how-to-easily-deploy-machine-learning-models-using-flask-b95af8fe34d4) 和 [这个仓库](https://github.com/abhinavsagar/machine-learning-deployment),作者为 Abhinav Sagar。
+
+---
+
+**免责声明**:
+本文档使用AI翻译服务[Co-op Translator](https://github.com/Azure/co-op-translator)进行翻译。尽管我们努力确保翻译的准确性,但请注意,自动翻译可能包含错误或不准确之处。原始语言的文档应被视为权威来源。对于关键信息,建议使用专业人工翻译。我们对因使用此翻译而产生的任何误解或误读不承担责任。
\ No newline at end of file
diff --git a/translations/zh-CN/4-Classification/1-Introduction/README.md b/translations/zh-CN/4-Classification/1-Introduction/README.md
new file mode 100644
index 000000000..62bc1e4f3
--- /dev/null
+++ b/translations/zh-CN/4-Classification/1-Introduction/README.md
@@ -0,0 +1,304 @@
+# 分类简介
+
+在这四节课中,你将探索经典机器学习的一个核心主题——_分类_。我们将使用一个关于亚洲和印度各种美食的数据集,逐步学习如何使用不同的分类算法。希望你已经准备好大快朵颐了!
+
+
+
+> 在这些课程中,庆祝泛亚洲美食吧!图片由 [Jen Looper](https://twitter.com/jenlooper) 提供
+
+分类是一种[监督学习](https://wikipedia.org/wiki/Supervised_learning)方法,与回归技术有许多相似之处。如果说机器学习的核心是通过数据集预测值或名称,那么分类通常分为两类:_二元分类_和_多类分类_。
+
+[](https://youtu.be/eg8DJYwdMyg "分类简介")
+
+> 🎥 点击上方图片观看视频:MIT 的 John Guttag 介绍分类
+
+请记住:
+
+- **线性回归** 帮助你预测变量之间的关系,并准确预测新数据点在这条线上的位置。例如,你可以预测_南瓜在九月和十二月的价格_。
+- **逻辑回归** 帮助你发现“二元类别”:在这个价格点上,_这个南瓜是橙色还是非橙色_?
+
+分类使用各种算法来确定数据点的标签或类别。让我们通过这个美食数据集来看看,是否可以通过观察一组食材来确定它的美食来源。
+
+## [课前测验](https://ff-quizzes.netlify.app/en/ml/)
+
+> ### [本课程也提供 R 版本!](../../../../4-Classification/1-Introduction/solution/R/lesson_10.html)
+
+### 简介
+
+分类是机器学习研究者和数据科学家的基本活动之一。从简单的二元值分类(“这封邮件是垃圾邮件吗?”)到使用计算机视觉进行复杂的图像分类和分割,能够将数据分类并提出问题总是很有用的。
+
+用更科学的方式表述,你的分类方法会创建一个预测模型,使你能够将输入变量与输出变量之间的关系映射出来。
+
+
+
+> 分类算法处理二元问题和多类问题的对比。信息图由 [Jen Looper](https://twitter.com/jenlooper) 提供
+
+在开始清理数据、可视化数据并为机器学习任务做准备之前,让我们先了解一下机器学习分类数据的各种方式。
+
+分类源自[统计学](https://wikipedia.org/wiki/Statistical_classification),使用经典机器学习进行分类时,会利用特征(如 `smoker`、`weight` 和 `age`)来确定_患某种疾病的可能性_。作为一种类似于之前回归练习的监督学习技术,你的数据是带标签的,机器学习算法使用这些标签来分类和预测数据集的类别(或“特征”),并将其分配到某个组或结果中。
+
+✅ 花点时间想象一个关于美食的数据集。一个多类模型可以回答什么问题?一个二元模型可以回答什么问题?如果你想确定某种美食是否可能使用葫芦巴呢?如果你想知道,给你一袋装满八角、洋蓟、花椰菜和辣根的杂货,你是否可以做出一道典型的印度菜呢?
+
+[](https://youtu.be/GuTeDbaNoEU "疯狂的神秘篮子")
+
+> 🎥 点击上方图片观看视频。节目《Chopped》的核心是“神秘篮子”,厨师们必须用随机选择的食材制作一道菜。机器学习模型肯定能帮上忙!
+
+## 你好,“分类器”
+
+我们想要从这个美食数据集中提出的问题实际上是一个**多类问题**,因为我们有多个潜在的国家美食类别可供选择。给定一组食材,这些数据会属于哪一类?
+
+Scikit-learn 提供了多种算法来分类数据,具体取决于你想解决的问题类型。在接下来的两节课中,你将学习其中几种算法。
+
+## 练习 - 清理并平衡数据
+
+在开始这个项目之前,第一项任务是清理并**平衡**数据,以获得更好的结果。从本文件夹根目录中的空白 _notebook.ipynb_ 文件开始。
+
+首先需要安装 [imblearn](https://imbalanced-learn.org/stable/)。这是一个 Scikit-learn 的扩展包,可以帮助你更好地平衡数据(稍后你会了解更多关于这个任务的内容)。
+
+1. 安装 `imblearn`,运行以下命令:
+
+ ```python
+ pip install imblearn
+ ```
+
+1. 导入所需的包以导入数据并进行可视化,同时从 `imblearn` 中导入 `SMOTE`。
+
+ ```python
+ import pandas as pd
+ import matplotlib.pyplot as plt
+ import matplotlib as mpl
+ import numpy as np
+ from imblearn.over_sampling import SMOTE
+ ```
+
+ 现在你已经准备好导入数据了。
+
+1. 接下来导入数据:
+
+ ```python
+ df = pd.read_csv('../data/cuisines.csv')
+ ```
+
+ `read_csv()` reads the contents of the _cuisines.csv_ file into the variable `df`.
+
+1. Take a look at the first few rows of the data:
+
+ ```python
+ df.head()
+ ```
+
+ 前五行数据如下所示:
+
+ ```output
+ | | Unnamed: 0 | cuisine | almond | angelica | anise | anise_seed | apple | apple_brandy | apricot | armagnac | ... | whiskey | white_bread | white_wine | whole_grain_wheat_flour | wine | wood | yam | yeast | yogurt | zucchini |
+ | --- | ---------- | ------- | ------ | -------- | ----- | ---------- | ----- | ------------ | ------- | -------- | --- | ------- | ----------- | ---------- | ----------------------- | ---- | ---- | --- | ----- | ------ | -------- |
+ | 0 | 65 | indian | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+ | 1 | 66 | indian | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+ | 2 | 67 | indian | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+ | 3 | 68 | indian | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+ | 4 | 69 | indian | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
+ ```
+
+1. 调用 `info()` 获取数据的信息:
+
+ ```python
+ df.info()
+ ```
+
+ 输出类似于:
+
+ ```output
+ <class 'pandas.core.frame.DataFrame'>
+ RangeIndex: 2448 entries, 0 to 2447
+ Columns: 385 entries, Unnamed: 0 to zucchini
+ dtypes: int64(384), object(1)
+ memory usage: 7.2+ MB
+ ```
+
+## 练习 - 了解美食
+
+现在工作开始变得有趣了。让我们发现每种美食的数据分布情况。
+
+1. 调用 `barh()` 将数据绘制为条形图:
+
+ ```python
+ df.cuisine.value_counts().plot.barh()
+ ```
+
+ 
+
+ 美食的种类是有限的,但数据分布不均匀。你可以解决这个问题!在此之前,先多探索一下。
+
+1. 找出每种美食的数据量并打印出来:
+
+ ```python
+ thai_df = df[(df.cuisine == "thai")]
+ japanese_df = df[(df.cuisine == "japanese")]
+ chinese_df = df[(df.cuisine == "chinese")]
+ indian_df = df[(df.cuisine == "indian")]
+ korean_df = df[(df.cuisine == "korean")]
+
+ print(f'thai df: {thai_df.shape}')
+ print(f'japanese df: {japanese_df.shape}')
+ print(f'chinese df: {chinese_df.shape}')
+ print(f'indian df: {indian_df.shape}')
+ print(f'korean df: {korean_df.shape}')
+ ```
+
+ 输出如下所示:
+
+ ```output
+ thai df: (289, 385)
+ japanese df: (320, 385)
+ chinese df: (442, 385)
+ indian df: (598, 385)
+ korean df: (799, 385)
+ ```
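
To put the imbalance into numbers, the sketch below reuses the row counts printed above and reports each cuisine's share of the dataset, plus how many rows it is short of the largest class. Only the counts come from the lesson; the variable names are illustrative.

```python
# Row counts per cuisine, taken from the shapes printed above
counts = {"thai": 289, "japanese": 320, "chinese": 442, "indian": 598, "korean": 799}

total = sum(counts.values())
largest = max(counts.values())

for cuisine, n in sorted(counts.items(), key=lambda kv: kv[1]):
    # Share of the dataset, and rows missing relative to the largest class
    print(f"{cuisine:<9} {n:>4} rows  {n / total:6.1%}  short by {largest - n}")

print(f"total: {total}")
```

The total matches the 2448 entries reported by `df.info()`, and the gap column shows how many extra samples each smaller class would need to reach the size of the largest one.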
+
+## 探索食材
+
+现在你可以更深入地挖掘数据,了解每种美食的典型食材。你需要清理那些在不同美食之间造成混淆的重复数据,因此让我们了解这个问题。
+
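+1. Create a Python function, `create_ingredient_df()`, that builds an ingredient dataframe. It drops the unhelpful columns, sums the occurrences of each ingredient, and sorts the ingredients by count: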
+
+ ```python
+ def create_ingredient_df(df):
+     ingredient_df = df.T.drop(['cuisine','Unnamed: 0']).sum(axis=1).to_frame('value')
+     ingredient_df = ingredient_df[(ingredient_df.T != 0).any()]
+     ingredient_df = ingredient_df.sort_values(by='value', ascending=False,
+                                               inplace=False)
+     return ingredient_df
+ ```
+
+ 现在你可以使用这个函数来了解每种美食中最受欢迎的前十种食材。
+
+1. Call `create_ingredient_df()` on the Thai data and plot the top ten ingredients by calling `barh()`:
+
+ ```python
+ thai_ingredient_df = create_ingredient_df(thai_df)
+ thai_ingredient_df.head(10).plot.barh()
+ ```
+
+ 
+
+1. 对日本美食数据做同样的操作:
+
+ ```python
+ japanese_ingredient_df = create_ingredient_df(japanese_df)
+ japanese_ingredient_df.head(10).plot.barh()
+ ```
+
+ 
+
+1. 接下来是中国美食的食材:
+
+ ```python
+ chinese_ingredient_df = create_ingredient_df(chinese_df)
+ chinese_ingredient_df.head(10).plot.barh()
+ ```
+
+ 
+
+1. 绘制印度美食的食材:
+
+ ```python
+ indian_ingredient_df = create_ingredient_df(indian_df)
+ indian_ingredient_df.head(10).plot.barh()
+ ```
+
+ 
+
+1. 最后,绘制韩国美食的食材:
+
+ ```python
+ korean_ingredient_df = create_ingredient_df(korean_df)
+ korean_ingredient_df.head(10).plot.barh()
+ ```
+
+ 
+
+1. 现在,通过调用 `drop()` 删除那些在不同美食之间造成混淆的最常见食材:
+
+ 每个人都喜欢米饭、大蒜和生姜!
+
+ ```python
+ feature_df= df.drop(['cuisine','Unnamed: 0','rice','garlic','ginger'], axis=1)
+ labels_df = df.cuisine #.unique()
+ feature_df.head()
+ ```
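
The counting performed by `create_ingredient_df()` can also be sketched without pandas, which may make the logic easier to follow. The three recipes below are a hypothetical miniature of the dataset, invented for this illustration; each ingredient column holds 1 when the recipe uses it and 0 otherwise.

```python
from collections import Counter

# Hypothetical miniature of the dataset: one dict per recipe, 1 = ingredient used
recipes = [
    {"cuisine": "thai", "rice": 1, "garlic": 1, "basil": 1, "yam": 0},
    {"cuisine": "thai", "rice": 1, "garlic": 0, "basil": 1, "yam": 0},
    {"cuisine": "thai", "rice": 1, "garlic": 1, "basil": 0, "yam": 0},
]

totals = Counter()
for recipe in recipes:
    for name, used in recipe.items():
        if name != "cuisine":  # skip the label column
            totals[name] += used

# Drop ingredients that never appear, then list them by count, highest first
ranked = [(name, n) for name, n in totals.most_common() if n != 0]
print(ranked)
```

An ingredient such as rice that tops the list for every cuisine tells a classifier little about which cuisine a recipe belongs to, which is the reasoning behind dropping the most common ingredients in the last step above.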
+
+## 平衡数据集
+
+现在你已经清理了数据,使用 [SMOTE](https://imbalanced-learn.org/dev/references/generated/imblearn.over_sampling.SMOTE.html)(“合成少数类过采样技术”)来平衡数据。
+
+1. 调用 `fit_resample()`,这种策略通过插值生成新样本。
+
+ ```python
+ oversample = SMOTE()
+ transformed_feature_df, transformed_label_df = oversample.fit_resample(feature_df, labels_df)
+ ```
+
+ 通过平衡数据,你在分类时会获得更好的结果。想象一个二元分类问题。如果你的大部分数据属于一个类别,机器学习模型会更频繁地预测这个类别,仅仅因为它的数据更多。平衡数据可以消除这种不平衡。
+
+1. Now you can check the number of labels per cuisine, before and after resampling:
+
+ ```python
+ print(f'new label count: {transformed_label_df.value_counts()}')
+ print(f'old label count: {df.cuisine.value_counts()}')
+ ```
+
+ 输出如下所示:
+
+ ```output
+ new label count: korean 799
+ chinese 799
+ indian 799
+ japanese 799
+ thai 799
+ Name: cuisine, dtype: int64
+ old label count: korean 799
+ indian 598
+ chinese 442
+ japanese 320
+ thai 289
+ Name: cuisine, dtype: int64
+ ```
+
+ 数据现在干净、平衡,而且非常诱人!
+
+1. 最后一步是将平衡后的数据(包括标签和特征)保存到一个新的数据框中,并导出到文件中:
+
+ ```python
+ transformed_df = pd.concat([transformed_label_df,transformed_feature_df],axis=1, join='outer')
+ ```
+
+1. 你可以通过调用 `transformed_df.head()` 和 `transformed_df.info()` 再次查看数据。保存一份数据副本以供后续课程使用:
+
+ ```python
+ transformed_df.head()
+ transformed_df.info()
+ transformed_df.to_csv("../data/cleaned_cuisines.csv")
+ ```
+
+ 这个新的 CSV 文件现在可以在根数据文件夹中找到。
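
To make "generating new samples by interpolation" more concrete, here is a simplified sketch of the core SMOTE idea: a synthetic point is placed somewhere on the line segment between a minority-class sample and one of its same-class neighbors. The function `synthesize` and both feature vectors are hypothetical; the real `SMOTE` class additionally searches for nearest neighbors (five by default) and repeats this step until the classes are balanced.

```python
import random

def synthesize(sample, neighbor, rng):
    """Create one synthetic point on the segment between two same-class points."""
    gap = rng.random()  # interpolation factor in [0, 1)
    return [s + gap * (n - s) for s, n in zip(sample, neighbor)]

rng = random.Random(0)      # seeded so the sketch is repeatable
sample = [1.0, 0.0, 2.0]    # hypothetical minority-class feature vector
neighbor = [3.0, 1.0, 2.0]  # hypothetical same-class neighbor

new_point = synthesize(sample, neighbor, rng)
print(new_point)
```

Each coordinate of the new point lies between the two originals, and a feature on which both points agree (the third one here) is left unchanged.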
+
+---
+
+## 🚀挑战
+
+本课程包含多个有趣的数据集。浏览 `data` 文件夹,看看是否有适合二元或多类分类的数据集?你会对这个数据集提出什么问题?
+
+## [课后测验](https://ff-quizzes.netlify.app/en/ml/)
+
+## 复习与自学
+
+探索 SMOTE 的 API。它最适合哪些用例?它解决了哪些问题?
+
+## 作业
+
+[探索分类方法](assignment.md)
+
+---
+
+**免责声明**:
+本文档使用AI翻译服务[Co-op Translator](https://github.com/Azure/co-op-translator)进行翻译。尽管我们努力确保准确性,但请注意,自动翻译可能包含错误或不准确之处。应以原始语言的文档作为权威来源。对于关键信息,建议使用专业人工翻译。因使用本翻译而导致的任何误解或误读,我们概不负责。
\ No newline at end of file
diff --git a/translations/zh-CN/4-Classification/1-Introduction/assignment.md b/translations/zh-CN/4-Classification/1-Introduction/assignment.md
new file mode 100644
index 000000000..9d2600722
--- /dev/null
+++ b/translations/zh-CN/4-Classification/1-Introduction/assignment.md
@@ -0,0 +1,16 @@
+# 探索分类方法
+
+## 说明
+
+在 [Scikit-learn 文档](https://scikit-learn.org/stable/supervised_learning.html) 中,你会发现大量用于分类数据的方法。请在这些文档中进行一次小型寻宝活动:你的目标是寻找分类方法,并将其与本课程中的一个数据集、一种可以提出的问题以及一种分类技术进行匹配。创建一个电子表格或 .doc 文件中的表格,并解释该数据集如何与分类算法配合使用。
+
+## 评分标准
+
+| 标准 | 卓越 | 合格 | 需要改进 |
+| -------- | ----------------------------------------------------------------------------------------------------------------------------- | ----------------------------------------------------------------------------------------------------------------------------- | --------------------------------------------------------------------------------------------------------------------------------------------------- |
+| | 提交的文档概述了 5 种算法及其分类技术。概述解释清晰且详细。 | 提交的文档概述了 3 种算法及其分类技术。概述解释清晰且详细。 | 提交的文档概述了少于 3 种算法及其分类技术,且概述既不清晰也不详细。 |
+
+---
+
+**免责声明**:
+本文档使用AI翻译服务[Co-op Translator](https://github.com/Azure/co-op-translator)进行翻译。尽管我们努力确保翻译的准确性,但请注意,自动翻译可能包含错误或不准确之处。应以原始语言的文档作为权威来源。对于关键信息,建议使用专业人工翻译。我们对因使用此翻译而产生的任何误解或误读不承担责任。
\ No newline at end of file
diff --git a/translations/zh-CN/4-Classification/1-Introduction/notebook.ipynb b/translations/zh-CN/4-Classification/1-Introduction/notebook.ipynb
new file mode 100644
index 000000000..ca6f5befb
--- /dev/null
+++ b/translations/zh-CN/4-Classification/1-Introduction/notebook.ipynb
@@ -0,0 +1,39 @@
+{
+ "metadata": {
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": 3
+ },
+ "orig_nbformat": 2,
+ "coopTranslator": {
+ "original_hash": "d544ef384b7ba73757d830a72372a7f2",
+ "translation_date": "2025-09-03T20:33:39+00:00",
+ "source_file": "4-Classification/1-Introduction/notebook.ipynb",
+ "language_code": "zh"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2,
+ "cells": [
+ {
+ "source": [],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "\n---\n\n**免责声明**: \n本文档使用AI翻译服务 [Co-op Translator](https://github.com/Azure/co-op-translator) 进行翻译。尽管我们努力确保翻译的准确性,但请注意,自动翻译可能包含错误或不准确之处。原始语言的文档应被视为权威来源。对于关键信息,建议使用专业人工翻译。我们不对因使用此翻译而产生的任何误解或误读承担责任。\n"
+ ]
+ }
+ ]
+}
\ No newline at end of file
diff --git a/translations/zh-CN/4-Classification/1-Introduction/solution/Julia/README.md b/translations/zh-CN/4-Classification/1-Introduction/solution/Julia/README.md
new file mode 100644
index 000000000..b411dd85f
--- /dev/null
+++ b/translations/zh-CN/4-Classification/1-Introduction/solution/Julia/README.md
@@ -0,0 +1,6 @@
+
+
+---
+
+**免责声明**:
+本文档使用AI翻译服务 [Co-op Translator](https://github.com/Azure/co-op-translator) 进行翻译。尽管我们努力确保翻译的准确性,但请注意,自动翻译可能包含错误或不准确之处。应以原始语言的文档作为权威来源。对于重要信息,建议使用专业人工翻译。我们不对因使用此翻译而产生的任何误解或误读承担责任。
\ No newline at end of file
diff --git a/translations/zh-CN/4-Classification/1-Introduction/solution/R/lesson_10-R.ipynb b/translations/zh-CN/4-Classification/1-Introduction/solution/R/lesson_10-R.ipynb
new file mode 100644
index 000000000..37faa8081
--- /dev/null
+++ b/translations/zh-CN/4-Classification/1-Introduction/solution/R/lesson_10-R.ipynb
@@ -0,0 +1,722 @@
+{
+ "nbformat": 4,
+ "nbformat_minor": 2,
+ "metadata": {
+ "colab": {
+ "name": "lesson_10-R.ipynb",
+ "provenance": [],
+ "collapsed_sections": []
+ },
+ "kernelspec": {
+ "name": "ir",
+ "display_name": "R"
+ },
+ "language_info": {
+ "name": "R"
+ },
+ "coopTranslator": {
+ "original_hash": "2621e24705e8100893c9bf84e0fc8aef",
+ "translation_date": "2025-09-03T20:40:23+00:00",
+ "source_file": "4-Classification/1-Introduction/solution/R/lesson_10-R.ipynb",
+ "language_code": "zh"
+ }
+ },
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "source": [
+ "# 构建分类模型:美味的亚洲和印度美食\n"
+ ],
+ "metadata": {
+ "id": "ItETB4tSFprR"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "## 分类简介:清理、准备和可视化数据\n",
+ "\n",
+ "在这四节课中,您将探索经典机器学习的一个核心主题——*分类*。我们将使用一个关于亚洲和印度美食的数据集,逐步学习各种分类算法的应用。希望您已经准备好大快朵颐!\n",
+ "\n",
+ "\n",
+ "Celebrate pan-Asian cuisines in these lessons! Image by Jen Looper\n",
+ "\n",
+ "分类是一种[监督学习](https://wikipedia.org/wiki/Supervised_learning)形式,与回归技术有许多相似之处。在分类中,您训练一个模型来预测某个项目属于哪个`类别`。如果机器学习的核心是通过数据集预测事物的值或名称,那么分类通常分为两类:*二元分类*和*多类分类*。\n",
+ "\n",
+ "请记住:\n",
+ "\n",
+ "- **线性回归**帮助您预测变量之间的关系,并准确预测新数据点在该关系线上的位置。例如,您可以预测数值,比如*南瓜在九月和十二月的价格*。\n",
+ "\n",
+ "- **逻辑回归**帮助您发现“二元类别”:在这个价格点,*这个南瓜是橙色还是非橙色*?\n",
+ "\n",
+ "分类使用各种算法来确定数据点的标签或类别。让我们使用这个美食数据集,看看通过观察一组食材,是否可以确定其美食的来源。\n",
+ "\n",
+ "### [**课前测验**](https://gray-sand-07a10f403.1.azurestaticapps.net/quiz/19/)\n",
+ "\n",
+ "### **简介**\n",
+ "\n",
+ "分类是机器学习研究人员和数据科学家的基本活动之一。从简单的二元值分类(“这封邮件是垃圾邮件还是不是?”),到使用计算机视觉进行复杂的图像分类和分割,能够将数据分类并提出问题总是非常有用。\n",
+ "\n",
+ "用更科学的方式来说,您的分类方法会创建一个预测模型,使您能够将输入变量与输出变量之间的关系进行映射。\n",
+ "\n",
+ "Classification algorithms handle binary problems versus multiclass problems. Infographic by Jen Looper\n",
+ "\n",
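+    "Before starting the process of cleaning our data, visualizing it, and prepping it for our ML tasks, let's learn a bit about the various ways machine learning can be used to classify data.\n",
+    "\n",
+    "Derived from [statistics](https://wikipedia.org/wiki/Statistical_classification), classification using classic machine learning uses features, such as `smoker`, `weight`, and `age`, to determine the *likelihood of developing X disease*. As a supervised learning technique similar to the regression exercises you performed earlier, your data is labeled, and the ML algorithms use those labels to classify and predict classes (or 'features') of a dataset and assign them to a group or outcome.\n",
+    "\n",
+    "✅ Take a moment to imagine a dataset about cuisines. What would a multiclass model be able to answer? What would a binary model be able to answer? What if you wanted to determine whether a given cuisine was likely to use fenugreek? What if you wanted to see whether, given a grocery bag full of star anise, artichokes, cauliflower, and horseradish, you could create a typical Indian dish?\n",
+    "\n",
+    "### **Hello 'classifier'**\n",
+    "\n",
+    "The question we want to ask of this cuisine dataset is actually a **multiclass question**, as we have several potential national cuisines to work with. Given a batch of ingredients, which of these classes will the data fit?\n",
+    "\n",
+    "Tidymodels offers several different algorithms to classify data, depending on the kind of problem you want to solve. In the next two lessons, you'll learn about several of these algorithms.\n",
+    "\n",
+    "#### **Prerequisite**\n",
+    "\n",
+    "For this lesson, we'll require the following packages to clean, prep, and visualize our data:\n",
+    "\n",
+    "- `tidyverse`: The [tidyverse](https://www.tidyverse.org/) is a [collection of R packages](https://www.tidyverse.org/packages) designed to make data science faster, easier, and more fun!\n",
+    "\n",
+    "- `tidymodels`: The [tidymodels](https://www.tidymodels.org/) framework is a [collection of packages](https://www.tidymodels.org/packages/) for modeling and machine learning.\n",
+    "\n",
+    "- `DataExplorer`: The [DataExplorer package](https://cran.r-project.org/web/packages/DataExplorer/vignettes/dataexplorer-intro.html) is meant to simplify and automate the exploratory data analysis process and report generation.\n",
+    "\n",
+    "- `themis`: The [themis package](https://themis.tidymodels.org/) provides extra recipe steps for dealing with unbalanced data.\n",
+    "\n",
+    "You can install them with:\n",
+    "\n",
+    "`install.packages(c(\"tidyverse\", \"tidymodels\", \"DataExplorer\", \"themis\", \"here\"))`\n",
+    "\n",
+    "Alternatively, the script below checks whether you have the packages required to complete this module and installs them for you if any are missing.\n"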
+ ],
+ "metadata": {
+ "id": "ri5bQxZ-Fz_0"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "suppressWarnings(if (!require(\"pacman\"))install.packages(\"pacman\"))\r\n",
+ "\r\n",
+ "pacman::p_load(tidyverse, tidymodels, DataExplorer, themis, here)"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "KIPxa4elGAPI"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
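+    "Later cells still call `library()` on some of these packages to show where the functions come from. This is purely for illustration: `pacman::p_load()` has already loaded them into the current R session.\n"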
+ ],
+ "metadata": {
+ "id": "YkKAxOJvGD4C"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
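+    "## Exercise - Clean and balance your data\n",
+    "\n",
+    "The first task at hand, before starting this project, is to clean and **balance** your data to get better results.\n",
+    "\n",
+    "Let's meet the data! 🕵️\n"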
+ ],
+ "metadata": {
+ "id": "PFkQDlk0GN5O"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Import data\r\n",
+ "df <- read_csv(file = \"https://raw.githubusercontent.com/microsoft/ML-For-Beginners/main/4-Classification/data/cuisines.csv\")\r\n",
+ "\r\n",
+ "# View the first 5 rows\r\n",
+ "df %>% \r\n",
+ " slice_head(n = 5)\r\n"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "Qccw7okxGT0S"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+    "Interesting! From the looks of it, the first column is a kind of `id` column. Let's get a little more information about the data.\n"
+ ],
+ "metadata": {
+ "id": "XrWnlgSrGVmR"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Basic information about the data\r\n",
+ "df %>%\r\n",
+ " introduce()\r\n",
+ "\r\n",
+ "# Visualize basic information above\r\n",
+ "df %>% \r\n",
+ " plot_intro(ggtheme = theme_light())"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "4UcGmxRxGieA"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
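+    "From the output, we can immediately see that we have `2448` rows, `385` columns, and no missing values. We also have one discrete column, *cuisine*.\n",
+    "\n",
+    "## Exercise - Learning about cuisines\n",
+    "\n",
+    "Now the work starts to become more interesting. Let's explore the distribution of data per cuisine.\n"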
+ ],
+ "metadata": {
+ "id": "AaPubl__GmH5"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Count observations per cuisine\r\n",
+ "df %>% \r\n",
+ " count(cuisine) %>% \r\n",
+ " arrange(n)\r\n",
+ "\r\n",
+ "# Plot the distribution\r\n",
+ "theme_set(theme_light())\r\n",
+ "df %>% \r\n",
+ " count(cuisine) %>% \r\n",
+ " ggplot(mapping = aes(x = n, y = reorder(cuisine, -n))) +\r\n",
+ " geom_col(fill = \"midnightblue\", alpha = 0.7) +\r\n",
+ " ylab(\"cuisine\")"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "FRsBVy5eGrrv"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
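+    "There are a finite number of cuisines, but the data is unevenly distributed across them. You can fix that! Before doing so, explore a little more.\n",
+    "\n",
+    "Next, let's assign each cuisine to its own tibble and find out how much data (rows and columns) is available per cuisine.\n",
+    "\n",
+    "> A [tibble](https://tibble.tidyverse.org/) is a modern data frame.\n",
+    "\n",
+    "Artwork by @allison_horst\n"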
+ ],
+ "metadata": {
+ "id": "vVvyDb1kG2in"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Create individual tibble for the cuisines\r\n",
+ "thai_df <- df %>% \r\n",
+ " filter(cuisine == \"thai\")\r\n",
+ "japanese_df <- df %>% \r\n",
+ " filter(cuisine == \"japanese\")\r\n",
+ "chinese_df <- df %>% \r\n",
+ " filter(cuisine == \"chinese\")\r\n",
+ "indian_df <- df %>% \r\n",
+ " filter(cuisine == \"indian\")\r\n",
+ "korean_df <- df %>% \r\n",
+ " filter(cuisine == \"korean\")\r\n",
+ "\r\n",
+ "\r\n",
+ "# Find out how much data is available per cuisine\r\n",
+ "cat(\" thai df:\", dim(thai_df), \"\\n\",\r\n",
+ " \"japanese df:\", dim(japanese_df), \"\\n\",\r\n",
+ " \"chinese_df:\", dim(chinese_df), \"\\n\",\r\n",
+ " \"indian_df:\", dim(indian_df), \"\\n\",\r\n",
+ " \"korean_df:\", dim(korean_df))"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "0TvXUxD3G8Bk"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
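+    "## **Exercise - Discovering top ingredients by cuisine using dplyr**\n",
+    "\n",
+    "Now you can dig deeper into the data and learn what the typical ingredients per cuisine are. You should clean out recurrent data that creates confusion between cuisines, so let's learn about this problem.\n",
+    "\n",
+    "Create a function in R named `create_ingredient()` that returns an ingredient data frame. The function starts by dropping an unhelpful column and then sorts the ingredients by their count.\n",
+    "\n",
+    "The basic structure of a function in R is:\n",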
+ "\n",
+ "`myFunction <- function(arglist){`\n",
+ "\n",
+ "**`...`**\n",
+ "\n",
+ "**`return`**`(value)`\n",
+ "\n",
+ "`}`\n",
+ "\n",
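+    "A tidy introduction to R functions can be found [here](https://skirmer.github.io/presentations/functions_with_r.html#1).\n",
+    "\n",
+    "Let's get right into it! We'll make use of the [dplyr verbs](https://dplyr.tidyverse.org/), plus one helper from tidyr, that we learned in previous lessons. As a recap:\n",
+    "\n",
+    "- `dplyr::select()`: helps you pick which **columns** to keep or exclude.\n",
+    "\n",
+    "- `tidyr::pivot_longer()`: helps you \"lengthen\" data, increasing the number of rows and decreasing the number of columns.\n",
+    "\n",
+    "- `dplyr::group_by()` and `dplyr::summarise()`: help you compute summary statistics for different groups and put them in a nice table.\n",
+    "\n",
+    "- `dplyr::filter()`: creates a subset of the data containing only the rows that satisfy your conditions.\n",
+    "\n",
+    "- `dplyr::mutate()`: helps you create or modify columns.\n",
+    "\n",
+    "Check out this [*art*](https://allisonhorst.shinyapps.io/dplyr-learnr/#section-welcome)-filled learnr tutorial by Allison Horst, which introduces some useful data wrangling functions in dplyr (part of the Tidyverse).\n"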
+ ],
+ "metadata": {
+ "id": "K3RF5bSCHC76"
+ }
+ },
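+  {
+   "cell_type": "markdown",
+   "source": [
+    "As a small, hedged illustration of the verbs recapped above (not part of the original lesson), the sketch below applies them to a made-up tibble of three recipes. The object `toy_df` and its ingredient columns are hypothetical stand-ins for the cuisines data.\n"
+   ],
+   "metadata": {}
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "source": [
+    "library(dplyr)\n",
+    "library(tidyr)\n",
+    "\n",
+    "# Hypothetical toy data: one row per recipe, 0/1 flags per ingredient\n",
+    "toy_df <- tibble(\n",
+    "  cuisine = c(\"thai\", \"thai\", \"korean\"),\n",
+    "  basil   = c(1, 1, 0),\n",
+    "  kimchi  = c(0, 0, 1),\n",
+    "  yam     = c(0, 0, 0)\n",
+    ")\n",
+    "\n",
+    "toy_counts <- toy_df %>%\n",
+    "  # Lengthen: one row per (recipe, ingredient) pair\n",
+    "  pivot_longer(!cuisine, names_to = \"ingredients\", values_to = \"count\") %>%\n",
+    "  # Total uses of each ingredient across all recipes\n",
+    "  group_by(ingredients) %>%\n",
+    "  summarise(n_instances = sum(count)) %>%\n",
+    "  # Drop ingredients that never appear, then sort by popularity\n",
+    "  filter(n_instances != 0) %>%\n",
+    "  arrange(desc(n_instances))\n",
+    "\n",
+    "toy_counts"
+   ],
+   "outputs": [],
+   "metadata": {}
+  },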
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+    "# Create a function that returns the top ingredients by class\r\n",
+ "\r\n",
+ "create_ingredient <- function(df){\r\n",
+ " \r\n",
+    "    # Drop the id column, which is the first column\r\n",
+ " ingredient_df = df %>% select(-1) %>% \r\n",
+    "    # Pivot data to a long format\r\n",
+ " pivot_longer(!cuisine, names_to = \"ingredients\", values_to = \"count\") %>% \r\n",
+    "    # Total how often each ingredient is used\r\n",
+ " group_by(ingredients) %>% \r\n",
+ " summarise(n_instances = sum(count)) %>% \r\n",
+ " filter(n_instances != 0) %>% \r\n",
+ " # Arrange by descending order\r\n",
+ " arrange(desc(n_instances)) %>% \r\n",
+ " mutate(ingredients = factor(ingredients) %>% fct_inorder())\r\n",
+ " \r\n",
+ " \r\n",
+ " return(ingredient_df)\r\n",
+ "} # End of function"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "uB_0JR82HTPa"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+    "Now we can use the function to get an idea of the ten most popular ingredients by cuisine. Let's take it out for a spin with `thai_df`.\n"
+ ],
+ "metadata": {
+ "id": "h9794WF8HWmc"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Call create_ingredient and display popular ingredients\r\n",
+ "thai_ingredient_df <- create_ingredient(df = thai_df)\r\n",
+ "\r\n",
+ "thai_ingredient_df %>% \r\n",
+ " slice_head(n = 10)"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "agQ-1HrcHaEA"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+    "In the previous section, we used `geom_col()`. Now let's see how `geom_bar()` can also be used to create bar charts. Use `?geom_bar` for further reading.\n"
+ ],
+ "metadata": {
+ "id": "kHu9ffGjHdcX"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+    "# Make a bar chart of popular Thai ingredients\r\n",
+ "thai_ingredient_df %>% \r\n",
+ " slice_head(n = 10) %>% \r\n",
+ " ggplot(aes(x = n_instances, y = ingredients)) +\r\n",
+ " geom_bar(stat = \"identity\", width = 0.5, fill = \"steelblue\") +\r\n",
+ " xlab(\"\") + ylab(\"\")"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "fb3Bx_3DHj6e"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+    "Let's do the same for the Japanese data.\n"
+ ],
+ "metadata": {
+ "id": "RHP_xgdkHnvM"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Get popular ingredients for Japanese cuisines and make bar chart\r\n",
+ "create_ingredient(df = japanese_df) %>% \r\n",
+ " slice_head(n = 10) %>%\r\n",
+ " ggplot(aes(x = n_instances, y = ingredients)) +\r\n",
+ " geom_bar(stat = \"identity\", width = 0.5, fill = \"darkorange\", alpha = 0.8) +\r\n",
+ " xlab(\"\") + ylab(\"\")\r\n"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "019v8F0XHrRU"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+    "What about the Chinese cuisines?\n"
+ ],
+ "metadata": {
+ "id": "iIGM7vO8Hu3v"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Get popular ingredients for Chinese cuisines and make bar chart\r\n",
+ "create_ingredient(df = chinese_df) %>% \r\n",
+ " slice_head(n = 10) %>%\r\n",
+ " ggplot(aes(x = n_instances, y = ingredients)) +\r\n",
+ " geom_bar(stat = \"identity\", width = 0.5, fill = \"cyan4\", alpha = 0.8) +\r\n",
+ " xlab(\"\") + ylab(\"\")"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "lHd9_gd2HyzU"
+ }
+ },
+ {
+ "cell_type": "markdown",
+   "source": [
+    "Now let's look at the Indian cuisines.\n"
+   ],
+ "metadata": {
+ "id": "ir8qyQbNH1c7"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Get popular ingredients for Indian cuisines and make bar chart\r\n",
+ "create_ingredient(df = indian_df) %>% \r\n",
+ " slice_head(n = 10) %>%\r\n",
+ " ggplot(aes(x = n_instances, y = ingredients)) +\r\n",
+ " geom_bar(stat = \"identity\", width = 0.5, fill = \"#041E42FF\", alpha = 0.8) +\r\n",
+ " xlab(\"\") + ylab(\"\")"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "ApukQtKjH5FO"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+    "Finally, plot the Korean ingredients.\n"
+ ],
+ "metadata": {
+ "id": "qv30cwY1H-FM"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Get popular ingredients for Korean cuisines and make bar chart\r\n",
+ "create_ingredient(df = korean_df) %>% \r\n",
+ " slice_head(n = 10) %>%\r\n",
+ " ggplot(aes(x = n_instances, y = ingredients)) +\r\n",
+ " geom_bar(stat = \"identity\", width = 0.5, fill = \"#852419FF\", alpha = 0.8) +\r\n",
+ " xlab(\"\") + ylab(\"\")"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "lumgk9cHIBie"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
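+    "From the data visualizations, we can now use `dplyr::select()` to drop the most common ingredients that create confusion between distinct cuisines.\n",
+    "\n",
+    "Everyone loves rice, garlic, and ginger!\n"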
+ ],
+ "metadata": {
+ "id": "iO4veMXuIEta"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Drop id column, rice, garlic and ginger from our original data set\r\n",
+ "df_select <- df %>% \r\n",
+ " select(-c(1, rice, garlic, ginger))\r\n",
+ "\r\n",
+ "# Display new data set\r\n",
+ "df_select %>% \r\n",
+ " slice_head(n = 5)"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "iHJPiG6rIUcK"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
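+    "## Preprocessing data using recipes 👩‍🍳👨‍🍳 - Dealing with imbalanced data ⚖️\n",
+    "\n",
+    "Artwork by @allison_horst\n",
+    "\n",
+    "Given that this lesson is about cuisines, we have to put `recipes` into context.\n",
+    "\n",
+    "Tidymodels provides yet another neat package: `recipes`, a package for preprocessing data.\n"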
+ ],
+ "metadata": {
+ "id": "kkFd-JxdIaL6"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+    "Let's take a look at the distribution of our cuisines again.\n"
+ ],
+ "metadata": {
+ "id": "6l2ubtTPJAhY"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Distribution of cuisines\r\n",
+ "old_label_count <- df_select %>% \r\n",
+ " count(cuisine) %>% \r\n",
+ " arrange(desc(n))\r\n",
+ "\r\n",
+ "old_label_count"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "1e-E9cb7JDVi"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
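+    "As you can see, the number of recipes per cuisine is quite unequal. There are almost three times as many Korean recipes as Thai recipes. Imbalanced data often has negative effects on model performance. Think about a binary classification: if most of your data belongs to one class, an ML model is going to predict that class more frequently, just because there is more data for it. Balancing the data corrects this skew. Many models perform best when the number of observations per class is equal and therefore tend to struggle with unbalanced data.\n",
+    "\n",
+    "There are two main ways of dealing with imbalanced datasets:\n",
+    "\n",
+    "- adding observations to the minority class: `over-sampling`, e.g. using the SMOTE algorithm\n",
+    "\n",
+    "- removing observations from the majority class: `under-sampling`\n",
+    "\n",
+    "Let's now demonstrate how to deal with imbalanced datasets using a `recipe`. A recipe can be thought of as a blueprint that describes which steps should be applied to a dataset to get it ready for data analysis.\n"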
+ ],
+ "metadata": {
+ "id": "soAw6826JKx9"
+ }
+ },
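+  {
+   "cell_type": "markdown",
+   "source": [
+    "Before turning to the recipe, here is a minimal base R sketch of the two strategies listed above, under stated assumptions: the `labels` vector is made up (8 'korean' and 3 'thai' entries), and observations are resampled purely at random. Note that random over-sampling only duplicates existing observations, whereas SMOTE synthesizes new ones, so this illustrates the idea rather than what the lesson's recipe does.\n"
+   ],
+   "metadata": {}
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "source": [
+    "set.seed(42)\n",
+    "\n",
+    "# Hypothetical imbalanced labels: 8 majority vs. 3 minority observations\n",
+    "labels <- c(rep(\"korean\", 8), rep(\"thai\", 3))\n",
+    "counts <- table(labels)\n",
+    "\n",
+    "minority_idx <- which(labels == names(which.min(counts)))\n",
+    "majority_idx <- which(labels == names(which.max(counts)))\n",
+    "\n",
+    "# Over-sampling: resample minority rows (with replacement) until classes match\n",
+    "extra <- sample(minority_idx, size = max(counts) - min(counts), replace = TRUE)\n",
+    "oversampled <- labels[c(seq_along(labels), extra)]\n",
+    "table(oversampled)\n",
+    "\n",
+    "# Under-sampling: keep only as many majority rows as there are minority rows\n",
+    "keep <- sample(majority_idx, size = min(counts))\n",
+    "undersampled <- labels[c(keep, minority_idx)]\n",
+    "table(undersampled)"
+   ],
+   "outputs": [],
+   "metadata": {}
+  },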
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Load themis package for dealing with imbalanced data\r\n",
+ "library(themis)\r\n",
+ "\r\n",
+ "# Create a recipe for preprocessing data\r\n",
+ "cuisines_recipe <- recipe(cuisine ~ ., data = df_select) %>% \r\n",
+ " step_smote(cuisine)\r\n",
+ "\r\n",
+ "cuisines_recipe"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "HS41brUIJVJy"
+ }
+ },
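+  {
+   "cell_type": "markdown",
+   "source": [
+    "To give a rough intuition for what SMOTE does (a simplified sketch, not the themis implementation): a synthetic observation is placed at a random position on the line segment between a minority-class observation and one of its nearest minority-class neighbours. The matrix `minority` below is hypothetical, and only the single nearest neighbour is used, whereas SMOTE normally chooses among several.\n"
+   ],
+   "metadata": {}
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "source": [
+    "set.seed(123)\n",
+    "\n",
+    "# Hypothetical minority-class observations with two numeric features\n",
+    "minority <- matrix(\n",
+    "  c(1.0, 2.0,\n",
+    "    1.5, 1.8,\n",
+    "    3.0, 3.5),\n",
+    "  ncol = 2, byrow = TRUE\n",
+    ")\n",
+    "\n",
+    "# Pick one observation and find its nearest minority-class neighbour\n",
+    "i <- 1\n",
+    "distances <- sqrt(rowSums(sweep(minority, 2, minority[i, ])^2))\n",
+    "distances[i] <- Inf  # ignore the point itself\n",
+    "neighbour <- which.min(distances)\n",
+    "\n",
+    "# The synthetic point lies somewhere on the segment between the two\n",
+    "gap <- runif(1)  # random position in [0, 1]\n",
+    "synthetic <- minority[i, ] + gap * (minority[neighbour, ] - minority[i, ])\n",
+    "synthetic"
+   ],
+   "outputs": [],
+   "metadata": {}
+  },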
+ {
+ "cell_type": "markdown",
+ "source": [
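+    "Let's break down our preprocessing steps.\n",
+    "\n",
+    "- The call to `recipe()` with a formula tells the recipe the *roles* of the variables, using the `df_select` data as the reference. For instance, the `cuisine` column has been assigned an `outcome` role, while the rest of the columns have been assigned a `predictor` role.\n",
+    "\n",
+    "- [`step_smote(cuisine)`](https://themis.tidymodels.org/reference/step_smote.html) creates a *specification* of a recipe step that synthetically generates new examples of the minority class using the nearest neighbors of these cases.\n",
+    "\n",
+    "Now, if we want to see the preprocessed data, we have to [**`prep()`**](https://recipes.tidymodels.org/reference/prep.html) and [**`bake()`**](https://recipes.tidymodels.org/reference/bake.html) our recipe.\n",
+    "\n",
+    "`prep()`: estimates the required parameters from a training set so that they can later be applied to other data sets.\n",
+    "\n",
+    "`bake()`: takes a prepped recipe and applies its operations to any data set.\n"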
+ ],
+ "metadata": {
+ "id": "Yb-7t7XcJaC8"
+ }
+ },
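+  {
+   "cell_type": "markdown",
+   "source": [
+    "A minimal sketch of the `prep()` / `bake()` split described above, using made-up `train` and `new` data frames and `step_normalize()` in place of `step_smote()`, so that the estimated parameters (a mean and a standard deviation) are easy to inspect. This is an illustration under those assumptions, not part of the lesson's pipeline.\n"
+   ],
+   "metadata": {}
+  },
+  {
+   "cell_type": "code",
+   "execution_count": null,
+   "source": [
+    "library(recipes)\n",
+    "\n",
+    "# Hypothetical training data and unseen data\n",
+    "train <- data.frame(y = factor(c(\"a\", \"b\", \"a\", \"b\")), x = c(1, 2, 3, 4))\n",
+    "new   <- data.frame(y = factor(c(\"a\", \"b\")), x = c(10, 20))\n",
+    "\n",
+    "# Specification only: nothing is computed yet\n",
+    "rec <- recipe(y ~ ., data = train) %>%\n",
+    "  step_normalize(all_numeric_predictors())\n",
+    "\n",
+    "# prep() estimates the mean and sd of x from the training data\n",
+    "rec_prepped <- prep(rec)\n",
+    "\n",
+    "# bake(new_data = NULL) returns the processed training data\n",
+    "baked_train <- bake(rec_prepped, new_data = NULL)\n",
+    "\n",
+    "# bake() on other data reuses the mean and sd estimated from train\n",
+    "baked_new <- bake(rec_prepped, new_data = new)\n",
+    "baked_new"
+   ],
+   "outputs": [],
+   "metadata": {}
+  },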
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Prep and bake the recipe\r\n",
+ "preprocessed_df <- cuisines_recipe %>% \r\n",
+ " prep() %>% \r\n",
+ " bake(new_data = NULL) %>% \r\n",
+ " relocate(cuisine)\r\n",
+ "\r\n",
+ "# Display data\r\n",
+ "preprocessed_df %>% \r\n",
+ " slice_head(n = 5)\r\n",
+ "\r\n",
+ "# Quick summary stats\r\n",
+ "preprocessed_df %>% \r\n",
+ " introduce()"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "9QhSgdpxJl44"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+    "Let's now check the distribution of our cuisines and compare it with the imbalanced data.\n"
+ ],
+ "metadata": {
+ "id": "dmidELh_LdV7"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Distribution of cuisines\r\n",
+ "new_label_count <- preprocessed_df %>% \r\n",
+ " count(cuisine) %>% \r\n",
+ " arrange(desc(n))\r\n",
+ "\r\n",
+ "list(new_label_count = new_label_count,\r\n",
+ " old_label_count = old_label_count)"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "aSh23klBLwDz"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
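+    "Yum! The data is nice and clean, balanced, and very delicious 😋!\n",
+    "\n",
+    "> Normally, a recipe is used as a preprocessor for modeling, where it defines which steps should be applied to a data set to get it ready for modeling. In that case, a `workflow()` is typically used (as we have already seen in previous lessons) instead of manually estimating a recipe.\n",
+    ">\n",
+    "> As such, you don't typically need to **`prep()`** and **`bake()`** recipes when you use tidymodels, but they are helpful functions to have in your toolkit for confirming that recipes are doing what you expect, as in our case.\n",
+    ">\n",
+    "> When you **`bake()`** a prepped recipe with **`new_data = NULL`**, you get back the data that you provided when defining the recipe, after it has undergone the preprocessing steps.\n",
+    "\n",
+    "Let's now save a copy of this data for use in future lessons:\n"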
+ ],
+ "metadata": {
+ "id": "HEu80HZ8L7ae"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Save preprocessed data\r\n",
+ "write_csv(preprocessed_df, \"../../../data/cleaned_cuisines_R.csv\")"
+ ],
+ "outputs": [],
+ "metadata": {
+ "id": "cBmCbIgrMOI6"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+    "This fresh CSV can now be found in the root data folder.\n",
+    "\n",
+    "**🚀Challenge**\n",
+    "\n",
+    "This curriculum contains several interesting datasets. Dig through the `data` folders and see whether any contain datasets that would be appropriate for binary or multiclass classification. What questions would you ask of such a dataset?\n",
+    "\n",
+    "## [**Post-lecture quiz**](https://gray-sand-07a10f403.1.azurestaticapps.net/quiz/20/)\n",
+    "\n",
+    "## **Review & Self Study**\n",
+    "\n",
+    "- Check out the [themis package](https://github.com/tidymodels/themis). What other techniques could we use to deal with imbalanced data?\n",
+    "\n",
+    "- Tidy models [reference website](https://www.tidymodels.org/start/).\n",
+    "\n",
+    "- H. Wickham and G. Grolemund, [*R for Data Science: Visualize, Model, Transform, Tidy, and Import Data*](https://r4ds.had.co.nz/).\n",
+    "\n",
+    "#### THANK YOU TO:\n",
+    "\n",
+    "[`Allison Horst`](https://twitter.com/allison_horst/) for creating the amazing illustrations that make R more welcoming and engaging. Find more illustrations in her [gallery](https://github.com/allisonhorst/stats-illustrations).\n",
+    "\n",
+    "[Cassie Breviu](https://www.twitter.com/cassieview) and [Jen Looper](https://www.twitter.com/jenlooper) for creating the original Python version of this module ♥️\n",
+    "\n",
+    "Artwork by @allison_horst\n"
+ ],
+ "metadata": {
+ "id": "WQs5621pMGwf"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
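+    "\n---\n\n**Disclaimer**:  \nThis document has been translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The original document in its native language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.\n"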
+ ]
+ }
+ ]
+}
\ No newline at end of file
diff --git a/translations/zh-CN/4-Classification/1-Introduction/solution/notebook.ipynb b/translations/zh-CN/4-Classification/1-Introduction/solution/notebook.ipynb
new file mode 100644
index 000000000..0a9db550a
--- /dev/null
+++ b/translations/zh-CN/4-Classification/1-Introduction/solution/notebook.ipynb
@@ -0,0 +1,706 @@
+{
+ "cells": [
+ {
+ "source": [
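+    "# Delicious Asian and Indian Cuisines\n",
+    "\n",
+    "## Introduction\n",
+    "Asian and Indian cuisines are known for their rich flavors and varied ingredients. From spicy curries to delicate steamed dim sum, these dishes suit a wide range of tastes.\n",
+    "\n",
+    "## Common ingredients\n",
+    "Here are some ingredients commonly found in Asian and Indian dishes:\n",
+    "- Rice: a staple of many dishes.\n",
+    "- Spices: such as turmeric, cumin, coriander, and chili.\n",
+    "- Legumes: such as lentils and chickpeas.\n",
+    "- Vegetables: such as eggplant, spinach, and cauliflower.\n",
+    "- Sauces: such as soy sauce, fish sauce, and coconut milk.\n",
+    "\n",
+    "## Classic dishes\n",
+    "### Asian dishes\n",
+    "- **Sushi**: a Japanese dish of vinegared rice, often served with raw fish.\n",
+    "- **Chow mein**: a Chinese stir-fried noodle dish, usually served with vegetables and meat.\n",
+    "- **Vietnamese spring rolls**: fresh vegetables and meat wrapped in rice paper, a healthy option.\n",
+    "\n",
+    "### Indian dishes\n",
+    "- **Butter chicken**: a rich, creamy chicken curry.\n",
+    "- **Naan**: a soft flatbread baked in an oven.\n",
+    "- **Dal**: a traditional dish made from lentils.\n",
+    "\n",
+    "## Cooking tips\n",
+    "- Use fresh ingredients for the best flavor.\n",
+    "- Use spices in moderation.\n",
+    "- Simmer curries slowly to release the full flavor of the spices.\n",
+    "\n",
+    "## Health benefits\n",
+    "Asian and Indian dishes are often rich in vegetables and legumes, providing plenty of fiber and nutrients. In addition, many dishes use healthy cooking methods such as steaming and stewing.\n",
+    "\n",
+    "## Summary\n",
+    "Asian and Indian cuisines alike attract food lovers around the world with their distinctive flavors and variety. Trying these dishes is both a treat for the palate and a cultural experience.\n"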
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "source": [
+    "Install imblearn, which will enable SMOTE. This is a scikit-learn compatible package that helps handle imbalanced data when performing classification. (https://imbalanced-learn.org/stable/)\n"
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Requirement already satisfied: imblearn in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (0.0)\n",
+ "Requirement already satisfied: imbalanced-learn in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from imblearn) (0.8.0)\n",
+ "Requirement already satisfied: numpy>=1.13.3 in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from imbalanced-learn->imblearn) (1.19.2)\n",
+ "Requirement already satisfied: scipy>=0.19.1 in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from imbalanced-learn->imblearn) (1.4.1)\n",
+ "Requirement already satisfied: scikit-learn>=0.24 in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from imbalanced-learn->imblearn) (0.24.2)\n",
+ "Requirement already satisfied: joblib>=0.11 in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from imbalanced-learn->imblearn) (0.16.0)\n",
+ "Requirement already satisfied: threadpoolctl>=2.0.0 in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from scikit-learn>=0.24->imbalanced-learn->imblearn) (2.1.0)\n",
+ "\u001b[33mWARNING: You are using pip version 20.2.3; however, version 21.1.2 is available.\n",
+ "You should consider upgrading via the '/Library/Frameworks/Python.framework/Versions/3.7/bin/python3.7 -m pip install --upgrade pip' command.\u001b[0m\n",
+ "Note: you may need to restart the kernel to use updated packages.\n"
+ ]
+ }
+ ],
+ "source": [
+ "pip install imblearn"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import pandas as pd\n",
+ "import matplotlib.pyplot as plt\n",
+ "import matplotlib as mpl\n",
+ "import numpy as np\n",
+ "from imblearn.over_sampling import SMOTE"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "df = pd.read_csv('../../data/cuisines.csv')"
+ ]
+ },
+ {
+ "source": [
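+    "This dataset has 385 columns: an index column, the `cuisine` label, and one 0/1 column per ingredient indicating whether a recipe uses it.\n"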
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ " Unnamed: 0 cuisine almond angelica anise anise_seed apple \\\n",
+ "0 65 indian 0 0 0 0 0 \n",
+ "1 66 indian 1 0 0 0 0 \n",
+ "2 67 indian 0 0 0 0 0 \n",
+ "3 68 indian 0 0 0 0 0 \n",
+ "4 69 indian 0 0 0 0 0 \n",
+ "\n",
+ " apple_brandy apricot armagnac ... whiskey white_bread white_wine \\\n",
+ "0 0 0 0 ... 0 0 0 \n",
+ "1 0 0 0 ... 0 0 0 \n",
+ "2 0 0 0 ... 0 0 0 \n",
+ "3 0 0 0 ... 0 0 0 \n",
+ "4 0 0 0 ... 0 0 0 \n",
+ "\n",
+ " whole_grain_wheat_flour wine wood yam yeast yogurt zucchini \n",
+ "0 0 0 0 0 0 0 0 \n",
+ "1 0 0 0 0 0 0 0 \n",
+ "2 0 0 0 0 0 0 0 \n",
+ "3 0 0 0 0 0 0 0 \n",
+ "4 0 0 0 0 0 1 0 \n",
+ "\n",
+ "[5 rows x 385 columns]"
+ ],
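+       "text/html": "<div>\n<table border=\"1\" class=\"dataframe\">\n<thead>\n<tr><th></th><th>Unnamed: 0</th><th>cuisine</th><th>almond</th><th>angelica</th><th>anise</th><th>anise_seed</th><th>apple</th><th>apple_brandy</th><th>apricot</th><th>armagnac</th><th>...</th><th>whiskey</th><th>white_bread</th><th>white_wine</th><th>whole_grain_wheat_flour</th><th>wine</th><th>wood</th><th>yam</th><th>yeast</th><th>yogurt</th><th>zucchini</th></tr>\n</thead>\n<tbody>\n<tr><th>0</th><td>65</td><td>indian</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>...</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>\n<tr><th>1</th><td>66</td><td>indian</td><td>1</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>...</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>\n<tr><th>2</th><td>67</td><td>indian</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>...</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>\n<tr><th>3</th><td>68</td><td>indian</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>...</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td></tr>\n<tr><th>4</th><td>69</td><td>indian</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>...</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>1</td><td>0</td></tr>\n</tbody>\n</table>\n<p>5 rows × 385 columns</p>\n</div>"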
+ },
+ "metadata": {},
+ "execution_count": 4
+ }
+ ],
+ "source": [
+ "df.head()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+      "<class 'pandas.core.frame.DataFrame'>\nRangeIndex: 2448 entries, 0 to 2447\nColumns: 385 entries, Unnamed: 0 to zucchini\ndtypes: int64(384), object(1)\nmemory usage: 7.2+ MB\n"
+ ]
+ }
+ ],
+ "source": [
+ "df.info()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "korean 799\n",
+ "indian 598\n",
+ "chinese 442\n",
+ "japanese 320\n",
+ "thai 289\n",
+ "Name: cuisine, dtype: int64"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 6
+ }
+ ],
+ "source": [
+ "df.cuisine.value_counts()"
+ ]
+ },
+ {
+ "source": [
+    "Show the cuisines in a bar graph\n"
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "execution_count": 7
+ },
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": "",
+ "image/png": "iVBORw0KGgoAAAANSUhEUgAAAZEAAAD4CAYAAAAtrdtxAAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjAsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+17YcXAAASY0lEQVR4nO3df7TldV3v8eerGZkRRoeAiXtE5UgNIkUCjlwQIzAiC7NscdcSbcmsfkxl5SXX0juuyzK9d3UvlXnpplajma0kMtCUhluImNcr8msGBmb4pZaTQCFQOYom0fi+f+zPkd14hpnzOWefvYfzfKy113z35/vde7/22fvMa3++3733SVUhSVKPbxt3AEnSgcsSkSR1s0QkSd0sEUlSN0tEktRt+bgDLKYjjjiipqenxx1Dkg4oW7dufbiq1sy2bkmVyPT0NFu2bBl3DEk6oCT5u72tc3eWJKmbJSJJ6maJSJK6WSKSpG6WiCSpmyUiSepmiUiSulkikqRulogkqduS+sT69vt3Mb3xqnHH0ALZefG5444gLXnORCRJ3SwRSVI3S0SS1M0SkSR1s0QkSd0sEUlSN0tEktRtIkokyaFJXtuWz0yyeY6X/29Jzh5NOknS3kxEiQCHAq/tvXBVvbmqPraAeSRJ+2FSSuRi4DuTbAN+E1iV5Iokdye5NEkAkrw5yc1JdiTZNDT+viTnjTG/JC1Jk1IiG4G/qaoTgTcAJwEXAscDxwCnt+3eUVUvrKrvAZ4KvGxfV5xkQ5ItSbbs/tqu0aSXpCVqUkpkTzdV1X1V9Q1gGzDdxs9KcmOS7cBLgO/e1xVV1aaqWldV65YdvHp0iSVpCZrUL2B8dGh5N7A8yUrgXcC6qro3yVuAleMIJ0kamJSZyFeAp+1jm5nCeDjJKsBjIJI0ZhMxE6mqf0xyXZIdwL8AX5xlmy8leTewA3gAuHmRY0qS9jARJQJQVa/ay/gvDS1fBFw0yzbrR5dMkrQ3k7I7S5J0ALJEJEndLBFJUjdLRJLUzRKRJHWbmHdnLYYTjlrNlovPHXcMSXrScCYiSepmiUiSulkikqRulogkqZslIknqZolIkrpZIpKkbpaIJKmbJSJJ6maJSJK6WSKSpG6WiCSpmyUiSepmiUiSulkikqRulogkqZslIknqZolIkrpZIpKkbpaIJKmbJSJJ6rZ83AEW0/b7dzG98apxx9CY7Lz43HFHkJ50nIlIkrpZIpKkbpaIJKmbJSJJ6maJSJK6WSKSpG77VSJJPj3qIJKkA89+lUhVvWjUQSRJB579nYk8kmRVkmuT3JJke5Ifa+umk9yd5NIkdyW5IsnBbd2bk9ycZEeSTUnSxj+R5NeT3JTkM0m+r40vS/Kb7TK3J/m5Nj6V5JNJtrXrmtn+nCTXt0yXJ1k1ih+SJGl2czkm8nXgFVV1MnAW8FszpQA8F3hXVT0P+DLw2jb+jqp6YVV9D/BU4GVD17e8qk4BLgR+tY39NLCrql4IvBD42STPAV4FXF1VJwLPB7YlOQK4CDi7ZdoCvH4ud16SND9z+dqTAP8jyRnAN4CjgCPbunur6rq2/H7gdcDbgLOSvBE4GDgMuAP4i7bdh9q/W4HptnwO8L1JzmvnVwNrgZuB9yZ5CvDhqtqW5PuB44HrWpcdBFz/LaGTDcAGgGVPXzOHuytJ2pe5lMirgTXAC6rqsSQ7gZVtXe2xbSVZCbwLWFdV9yZ5y9D2AI+2f3cP5Qjwy1V19Z433srrXOB9Sd4O/DNwTVWd/0Shq2oTsAlgxdTaPXNKkuZhLruzVgMPtgI5Czh6aN2zk5zWll8FfIrHC+PhdqziPPbtauAX2oyDJMcmOSTJ0cAXq+rdwHuAk4EbgNOTfFfb9pAkx87h/kiS5ml/ZyIFXAr8RZLtDI4/3D20/h7gF5O8F7gT+N2q+lqSdwM7gAcY7JLal/cw2LV1Szve8hDw48CZwBuS
PAY8Arymqh5Ksh64LMmKdvmLgM/s532SJM1Tqp54D0+Sw4FbqurovayfBja3g+cTbcXU2pq64JJxx9CY+FXwUp8kW6tq3WzrnnB3VpJnMDhY/bZRBJMkHdiecHdWVf098ITHGapqJzDxsxBJ0sLzu7MkSd0sEUlSN0tEktRtLh82POCdcNRqtvgOHUlaMM5EJEndLBFJUjdLRJLUzRKRJHWzRCRJ3SwRSVI3S0SS1M0SkSR1s0QkSd0sEUlSN0tEktTNEpEkdbNEJEndLBFJUjdLRJLUzRKRJHWzRCRJ3SwRSVI3S0SS1M0SkSR1s0QkSd2WjzvAYtp+/y6mN1417hhSt50XnzvuCNK/40xEktTNEpEkdbNEJEndLBFJUjdLRJLUzRKRJHWzRCRJ3Ra0RJK8L8l5s4w/I8kVC3lbkqTxW5QPG1bV3wPfUi6SpAPbvGYiSV6T5PYktyX54zZ8RpJPJ/nbmVlJkukkO9ry+iQfSvJXST6b5DeGru+cJNcnuSXJ5UlWtfGLk9zZbuttbWxNkg8mubmdTp/PfZEkzV33TCTJdwMXAS+qqoeTHAa8HZgCXgwcB1wJzLYb60TgJOBR4J4kvwP8S7u+s6vqq0n+C/D6JO8EXgEcV1WV5NB2Hb8N/K+q+lSSZwNXA8+bJecGYAPAsqev6b27kqRZzGd31kuAy6vqYYCq+qckAB+uqm8AdyY5ci+XvbaqdgEkuRM4GjgUOB64rl3PQcD1wC7g68AfJNkMbG7XcTZwfNsW4OlJVlXVI8M3VFWbgE0AK6bW1jzuryRpD6M4JvLo0HL2Y5vdLUeAa6rq/D03TnIK8AMMjqv8EoMC+zbg1Kr6+kKEliTN3XyOiXwc+E9JDgdou7Pm4wbg9CTf1a7vkCTHtuMiq6vq/wC/Ajy/bf9R4JdnLpzkxHneviRpjrpnIlV1R5JfA/5vkt3ArfMJUlUPJVkPXJZkRRu+CPgK8JEkKxnMVl7f1r0OeGeS2xncj08CPz+fDJKkuUnV0jlMsGJqbU1dcMm4Y0jd/HsiGockW6tq3Wzr/MS6JKmbJSJJ6maJSJK6WSKSpG6WiCSp26J8AeOkOOGo1Wzx3S2StGCciUiSulkikqRulogkqZslIknqZolIkrpZIpKkbpaIJKmbJSJJ6maJSJK6WSKSpG6WiCSpmyUiSepmiUiSulkikqRulogkqZslIknqZolIkrpZIpKkbpaIJKmbJSJJ6maJSJK6LR93gMW0/f5dTG+8atwxJM3RzovPHXcE7YUzEUlSN0tEktTNEpEkdbNEJEndLBFJUjdLRJLUbWQlkuTTc9z+zCSb2/LLk2wcTTJJ0kIZ2edEqupF87jslcCVCxhHkjQCo5yJPNL+PTPJJ5JckeTuJJcmSVv30jZ2C/ATQ5ddn+QdbflHk9yY5NYkH0tyZBt/S5L3tuv+2ySvG9V9kSTNbrGOiZwEXAgcDxwDnJ5kJfBu4EeBFwD/YS+X/RRwalWdBPwp8MahdccBPwScAvxqkqeMJr4kaTaL9bUnN1XVfQBJtgHTwCPA56vqs238/cCGWS77TOADSaaAg4DPD627qqoeBR5N8iBwJHDf8IWTbJi53mVPX7OQ90mSlrzFmok8OrS8m7mV1+8A76iqE4CfA1bO5XqralNVrauqdcsOXj2Hm5Uk7cs43+J7NzCd5Dvb+fP3st1q4P62fMHIU0mS9tvYSqSqvs5gN9NV7cD6g3vZ9C3A5Um2Ag8vUjxJ0n5IVY07w6JZMbW2pi64ZNwxJM2RXwU/Xkm2VtW62db5iXVJUjdLRJLUzRKRJHWzRCRJ3SwRSVK3xfrE+kQ44ajVbPFdHpK0YJyJSJK6WSKSpG6WiCSpmyUiSepmiUiSulkikqRulogkqZslIknqZolIkrpZIpKkbpaIJKmbJSJJ6maJSJK6WSKSpG6WiCSpmyUiSepmiUiSulkikqRulogkqZslIknqZolIkrotH3eAxbT9/l1Mb7xq3DEk
aVHtvPjckV23MxFJUjdLRJLUzRKRJHWzRCRJ3SwRSVI3S0SS1M0SkSR1W9ASSTKdZMdCXqckaXJNxEwkyZL60KMkPVmMrESSHJPk1iTfl+QPk2xv589q69cnuTLJx4Fr29gbktyc5PYkbx26rg8n2ZrkjiQbhsYfSfJrSW5LckOSI0d1fyRJ32okJZLkucAHgfXAKUBV1QnA+cAfJVnZNj0ZOK+qvj/JOcDatv2JwAuSnNG2+6mqegGwDnhdksPb+CHADVX1fOCTwM/OkmVDki1Jtuz+2q5R3F1JWrJGUSJrgI8Ar66q24AXA+8HqKq7gb8Djm3bXlNV/9SWz2mnW4FbgOMYlAoMiuM24AbgWUPj/wpsbstbgek9w1TVpqpaV1Xrlh28eqHuoySJ0XwB4y7gCwzK4859bPvVoeUA/7Oqfn94gyRnAmcDp1XV15J8ApiZyTxWVdWWd7PEvlBSksZtFDORfwVeAbwmyauA/we8GiDJscCzgXtmudzVwE8lWdW2PSrJdwCrgX9uBXIccOoIMkuSOozklXtVfTXJy4BrgP8OnJBkO/BvwPqqejTJnpf5aJLnAde3dY8APwn8FfDzSe5iUD43jCKzJGnu8vjeoCe/FVNra+qCS8YdQ5IW1Xz/nkiSrVW1brZ1E/E5EUnSgckSkSR1s0QkSd0sEUlSN0tEktRtSX0474SjVrNlnu9SkCQ9zpmIJKmbJSJJ6maJSJK6WSKSpG6WiCSpmyUiSepmiUiSulkikqRulogkqZslIknqtqT+KFWSrzD7n+adFEcAD487xBMw3/yYb37MNz/zyXd0Va2ZbcWS+u4s4J69/XWuSZBki/n6mW9+zDc/SzWfu7MkSd0sEUlSt6VWIpvGHWAfzDc/5psf883Pksy3pA6sS5IW1lKbiUiSFpAlIknqtmRKJMlLk9yT5HNJNo4pw3uTPJhkx9DYYUmuSfLZ9u+3t/Ek+d8t7+1JTl6EfM9K8tdJ7kxyR5L/PEkZk6xMclOS21q+t7bx5yS5seX4QJKD2viKdv5zbf30KPO121yW5NYkmycw284k25NsS7KljU3EY9tu89AkVyS5O8ldSU6blHxJntt+bjOnLye5cFLytdv8lfZ7sSPJZe33ZfTPv6p60p+AZcDfAMcABwG3AcePIccZwMnAjqGx3wA2tuWNwK+35R8B/hIIcCpw4yLkmwJObstPAz4DHD8pGdvtrGrLTwFubLf7Z8Ar2/jvAb/Qll8L/F5bfiXwgUX4Gb4e+BNgczs/Sdl2AkfsMTYRj227zT8CfqYtHwQcOkn5hnIuAx4Ajp6UfMBRwOeBpw4979YvxvNvUX7o4z4BpwFXD51/E/CmMWWZ5t+XyD3AVFueYvCBSIDfB86fbbtFzPoR4AcnMSNwMHAL8B8ZfAp3+Z6PNXA1cFpbXt62ywgzPRO4FngJsLn9BzIR2drt7ORbS2QiHltgdftPMJOYb49M5wDXTVI+BiVyL3BYez5tBn5oMZ5/S2V31swPeMZ9bWwSHFlV/9CWHwCObMtjzdymtycxeLU/MRnb7qJtwIPANQxmmF+qqn+bJcM387X1u4DDRxjvEuCNwDfa+cMnKBtAAR9NsjXJhjY2KY/tc4CHgD9suwPfk+SQCco37JXAZW15IvJV1f3A24AvAP/A4Pm0lUV4/i2VEjkg1OBlwdjfc51kFfBB4MKq+vLwunFnrKrdVXUig1f9pwDHjSvLsCQvAx6sqq3jzvIEXlxVJwM/DPxikjOGV475sV3OYFfv71bVScBXGewe+qZxP/cA2jGFlwOX77lunPnasZgfY1DGzwAOAV66GLe9VErkfuBZQ+ef2cYmwReTTAG0fx9s42PJnOQpDArk0qr60CRmBKiqLwF/zWCKfmiSme+BG87wzXxt/WrgH0cU6XTg5Ul2An/KYJfWb09INuCbr1apqgeBP2dQwpPy2N4H3FdVN7bzVzAolUnJN+OHgVuq6ovt/KTkOxv4fFU9VFWP
AR9i8Jwc+fNvqZTIzcDa9k6FgxhMR68cc6YZVwIXtOULGByHmBl/TXuXx6nArqFp80gkCfAHwF1V9fZJy5hkTZJD2/JTGRyvuYtBmZy3l3wzuc8DPt5eLS64qnpTVT2zqqYZPL8+XlWvnoRsAEkOSfK0mWUG+/V3MCGPbVU9ANyb5Llt6AeAOycl35DzeXxX1kyOScj3BeDUJAe33+OZn9/on3+LcSBqEk4M3i3xGQb70P/rmDJcxmB/5WMMXnn9NIP9kNcCnwU+BhzWtg3wzpZ3O7BuEfK9mMF0/HZgWzv9yKRkBL4XuLXl2wG8uY0fA9wEfI7BboYVbXxlO/+5tv6YRXqcz+Txd2dNRLaW47Z2umPmd2BSHtt2mycCW9rj+2Hg2ycs3yEMXq2vHhqbpHxvBe5uvxt/DKxYjOefX3siSeq2VHZnSZJGwBKRJHWzRCRJ3SwRSVI3S0SS1M0SkSR1s0QkSd3+PxNFbW14TY8fAAAAAElFTkSuQmCC\n"
+ },
+ "metadata": {
+ "needs_background": "light"
+ }
+ }
+ ],
+ "source": [
+ "df.cuisine.value_counts().plot.barh()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "thai df: (289, 385)\njapanese df: (320, 385)\nchinese df: (442, 385)\nindian df: (598, 385)\nkorean df: (799, 385)\n"
+ ]
+ }
+ ],
+ "source": [
+ "\n",
+ "thai_df = df[(df.cuisine == \"thai\")]\n",
+ "japanese_df = df[(df.cuisine == \"japanese\")]\n",
+ "chinese_df = df[(df.cuisine == \"chinese\")]\n",
+ "indian_df = df[(df.cuisine == \"indian\")]\n",
+ "korean_df = df[(df.cuisine == \"korean\")]\n",
+ "\n",
+ "print(f'thai df: {thai_df.shape}')\n",
+ "print(f'japanese df: {japanese_df.shape}')\n",
+ "print(f'chinese df: {chinese_df.shape}')\n",
+ "print(f'indian df: {indian_df.shape}')\n",
+ "print(f'korean df: {korean_df.shape}')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def create_ingredient_df(df):\n",
+ " # transpose df, drop cuisine and unnamed rows, sum the row to get total for ingredient and add value header to new df\n",
+ " ingredient_df = df.T.drop(['cuisine','Unnamed: 0']).sum(axis=1).to_frame('value')\n",
+ " # drop ingredients that have a 0 sum\n",
+ " ingredient_df = ingredient_df[(ingredient_df.T != 0).any()]\n",
+ " # sort df\n",
+ " ingredient_df = ingredient_df.sort_values(by='value', ascending=False, inplace=False)\n",
+ " return ingredient_df\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "execution_count": 10
+ },
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": "",
+ "image/svg+xml": "",
+ "image/png": "\n"
+ },
+ "metadata": {
+ "needs_background": "light"
+ }
+ }
+ ],
+ "source": [
+ "thai_ingredient_df = create_ingredient_df(thai_df)\r\n",
+ "thai_ingredient_df.head(10).plot.barh()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "execution_count": 11
+ },
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": "",
+ "image/svg+xml": "",
+ "image/png": "\n"
+ },
+ "metadata": {
+ "needs_background": "light"
+ }
+ }
+ ],
+ "source": [
+ "japanese_ingredient_df = create_ingredient_df(japanese_df)\r\n",
+ "japanese_ingredient_df.head(10).plot.barh()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "execution_count": 12
+ },
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": "",
+ "image/svg+xml": "",
+ "image/png": "\n"
+ },
+ "metadata": {
+ "needs_background": "light"
+ }
+ }
+ ],
+ "source": [
+ "chinese_ingredient_df = create_ingredient_df(chinese_df)\r\n",
+ "chinese_ingredient_df.head(10).plot.barh()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 13,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "execution_count": 13
+ },
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": "",
+ "image/svg+xml": "",
+ "image/png": "\n"
+ },
+ "metadata": {
+ "needs_background": "light"
+ }
+ }
+ ],
+ "source": [
+ "indian_ingredient_df = create_ingredient_df(indian_df)\r\n",
+ "indian_ingredient_df.head(10).plot.barh()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 14,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "execution_count": 14
+ },
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": "",
+ "image/svg+xml": "",
+ "image/png": "\n"
+ },
+ "metadata": {
+ "needs_background": "light"
+ }
+ }
+ ],
+ "source": [
+ "korean_ingredient_df = create_ingredient_df(korean_df)\r\n",
+ "korean_ingredient_df.head(10).plot.barh()"
+ ]
+ },
+ {
+ "source": [],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 15,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ " almond angelica anise anise_seed apple apple_brandy apricot \\\n",
+ "0 0 0 0 0 0 0 0 \n",
+ "1 1 0 0 0 0 0 0 \n",
+ "2 0 0 0 0 0 0 0 \n",
+ "3 0 0 0 0 0 0 0 \n",
+ "4 0 0 0 0 0 0 0 \n",
+ "\n",
+ " armagnac artemisia artichoke ... whiskey white_bread white_wine \\\n",
+ "0 0 0 0 ... 0 0 0 \n",
+ "1 0 0 0 ... 0 0 0 \n",
+ "2 0 0 0 ... 0 0 0 \n",
+ "3 0 0 0 ... 0 0 0 \n",
+ "4 0 0 0 ... 0 0 0 \n",
+ "\n",
+ " whole_grain_wheat_flour wine wood yam yeast yogurt zucchini \n",
+ "0 0 0 0 0 0 0 0 \n",
+ "1 0 0 0 0 0 0 0 \n",
+ "2 0 0 0 0 0 0 0 \n",
+ "3 0 0 0 0 0 0 0 \n",
+ "4 0 0 0 0 0 1 0 \n",
+ "\n",
+ "[5 rows x 380 columns]"
+ ],
+ "text/html": ""
+ },
+ "metadata": {},
+ "execution_count": 15
+ }
+ ],
+ "source": [
+ "feature_df= df.drop(['cuisine','Unnamed: 0','rice','garlic','ginger'], axis=1)\n",
+ "labels_df = df.cuisine #.unique()\n",
+ "feature_df.head()\n"
+ ]
+ },
+ {
+ "source": [
+ "Balance the data with SMOTE oversampling, bringing every class up to the size of the largest one. Read more here: https://imbalanced-learn.org/dev/references/generated/imblearn.over_sampling.SMOTE.html\n"
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 16,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "oversample = SMOTE()\n",
+ "transformed_feature_df, transformed_label_df = oversample.fit_resample(feature_df, labels_df)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 17,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "new label count: korean 799\nchinese 799\njapanese 799\nindian 799\nthai 799\nName: cuisine, dtype: int64\nold label count: korean 799\nindian 598\nchinese 442\njapanese 320\nthai 289\nName: cuisine, dtype: int64\n"
+ ]
+ }
+ ],
+ "source": [
+ "print(f'new label count: {transformed_label_df.value_counts()}')\r\n",
+ "print(f'old label count: {df.cuisine.value_counts()}')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 18,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ " almond angelica anise anise_seed apple apple_brandy apricot \\\n",
+ "0 0 0 0 0 0 0 0 \n",
+ "1 1 0 0 0 0 0 0 \n",
+ "2 0 0 0 0 0 0 0 \n",
+ "3 0 0 0 0 0 0 0 \n",
+ "4 0 0 0 0 0 0 0 \n",
+ "\n",
+ " armagnac artemisia artichoke ... whiskey white_bread white_wine \\\n",
+ "0 0 0 0 ... 0 0 0 \n",
+ "1 0 0 0 ... 0 0 0 \n",
+ "2 0 0 0 ... 0 0 0 \n",
+ "3 0 0 0 ... 0 0 0 \n",
+ "4 0 0 0 ... 0 0 0 \n",
+ "\n",
+ " whole_grain_wheat_flour wine wood yam yeast yogurt zucchini \n",
+ "0 0 0 0 0 0 0 0 \n",
+ "1 0 0 0 0 0 0 0 \n",
+ "2 0 0 0 0 0 0 0 \n",
+ "3 0 0 0 0 0 0 0 \n",
+ "4 0 0 0 0 0 1 0 \n",
+ "\n",
+ "[5 rows x 380 columns]"
+ ],
+ "text/html": ""
+ },
+ "metadata": {},
+ "execution_count": 18
+ }
+ ],
+ "source": [
+ "transformed_feature_df.head()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 19,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ " cuisine almond angelica anise anise_seed apple apple_brandy \\\n",
+ "0 indian 0 0 0 0 0 0 \n",
+ "1 indian 1 0 0 0 0 0 \n",
+ "2 indian 0 0 0 0 0 0 \n",
+ "3 indian 0 0 0 0 0 0 \n",
+ "4 indian 0 0 0 0 0 0 \n",
+ "... ... ... ... ... ... ... ... \n",
+ "3990 thai 0 0 0 0 0 0 \n",
+ "3991 thai 0 0 0 0 0 0 \n",
+ "3992 thai 0 0 0 0 0 0 \n",
+ "3993 thai 0 0 0 0 0 0 \n",
+ "3994 thai 0 0 0 0 0 0 \n",
+ "\n",
+ " apricot armagnac artemisia ... whiskey white_bread white_wine \\\n",
+ "0 0 0 0 ... 0 0 0 \n",
+ "1 0 0 0 ... 0 0 0 \n",
+ "2 0 0 0 ... 0 0 0 \n",
+ "3 0 0 0 ... 0 0 0 \n",
+ "4 0 0 0 ... 0 0 0 \n",
+ "... ... ... ... ... ... ... ... \n",
+ "3990 0 0 0 ... 0 0 0 \n",
+ "3991 0 0 0 ... 0 0 0 \n",
+ "3992 0 0 0 ... 0 0 0 \n",
+ "3993 0 0 0 ... 0 0 0 \n",
+ "3994 0 0 0 ... 0 0 0 \n",
+ "\n",
+ " whole_grain_wheat_flour wine wood yam yeast yogurt zucchini \n",
+ "0 0 0 0 0 0 0 0 \n",
+ "1 0 0 0 0 0 0 0 \n",
+ "2 0 0 0 0 0 0 0 \n",
+ "3 0 0 0 0 0 0 0 \n",
+ "4 0 0 0 0 0 1 0 \n",
+ "... ... ... ... ... ... ... ... \n",
+ "3990 0 0 0 0 0 0 0 \n",
+ "3991 0 0 0 0 0 0 0 \n",
+ "3992 0 0 0 0 0 0 0 \n",
+ "3993 0 0 0 0 0 0 0 \n",
+ "3994 0 0 0 0 0 0 0 \n",
+ "\n",
+ "[3995 rows x 381 columns]"
+ ],
+ "text/html": ""
+ },
+ "metadata": {},
+ "execution_count": 19
+ }
+ ],
+ "source": [
+ "# export transformed data to new df for classification\n",
+ "transformed_df = pd.concat([transformed_label_df,transformed_feature_df],axis=1, join='outer')\n",
+ "transformed_df"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 20,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "\nRangeIndex: 3995 entries, 0 to 3994\nColumns: 381 entries, cuisine to zucchini\ndtypes: int64(380), object(1)\nmemory usage: 11.6+ MB\n"
+ ]
+ }
+ ],
+ "source": [
+ "transformed_df.info()"
+ ]
+ },
+ {
+ "source": [
+ "Save the file for future use\n"
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 21,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "transformed_df.to_csv(\"../../data/cleaned_cuisines.csv\")"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
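+ "\n---\n\n**Disclaimer**: \nThis document has been translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The original document in its native language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.\n"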
+ ]
+ }
+ ],
+ "metadata": {
+ "interpreter": {
+ "hash": "70b38d7a306a849643e446cd70466270a13445e5987dfa1344ef2b127438fa4d"
+ },
+ "kernelspec": {
+ "name": "python3",
+ "display_name": "Python 3.7.0 64-bit ('3.7')"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.7.0"
+ },
+ "metadata": {
+ "interpreter": {
+ "hash": "70b38d7a306a849643e446cd70466270a13445e5987dfa1344ef2b127438fa4d"
+ }
+ },
+ "coopTranslator": {
+ "original_hash": "1da12ed6d238756959b8de9cac2a35a2",
+ "translation_date": "2025-09-03T20:34:56+00:00",
+ "source_file": "4-Classification/1-Introduction/solution/notebook.ipynb",
+ "language_code": "zh"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
\ No newline at end of file
diff --git a/translations/zh-CN/4-Classification/2-Classifiers-1/README.md b/translations/zh-CN/4-Classification/2-Classifiers-1/README.md
new file mode 100644
index 000000000..10d610013
--- /dev/null
+++ b/translations/zh-CN/4-Classification/2-Classifiers-1/README.md
@@ -0,0 +1,244 @@
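+# Cuisine classifiers 1
+
+In this lesson, you will use the dataset you saved in the previous lesson, which contains balanced, clean data about cuisines.
+
+You will use this dataset with a variety of classifiers to _predict a national cuisine from a group of ingredients_. Along the way, you will learn more about how algorithms can be used for classification tasks.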
+
+## [Pre-lecture quiz](https://ff-quizzes.netlify.app/en/ml/)
+
+# Preparation
+
+Assuming you completed [Lesson 1](../1-Introduction/README.md), make sure that a file named _cleaned_cuisines.csv_ exists in the root `/data` folder for use in these four lessons.
+
+## Exercise - predict a national cuisine
+
+1. Working in this lesson's _notebook.ipynb_ file, import the data file along with the Pandas library:
+
+ ```python
+ import pandas as pd
+ cuisines_df = pd.read_csv("../data/cleaned_cuisines.csv")
+ cuisines_df.head()
+ ```
+
+ The data looks like this:
+
+| | Unnamed: 0 | cuisine | almond | angelica | anise | anise_seed | apple | apple_brandy | apricot | armagnac | ... | whiskey | white_bread | white_wine | whole_grain_wheat_flour | wine | wood | yam | yeast | yogurt | zucchini |
+| --- | ---------- | ------- | ------ | -------- | ----- | ---------- | ----- | ------------ | ------- | -------- | --- | ------- | ----------- | ---------- | ----------------------- | ---- | ---- | --- | ----- | ------ | -------- |
+| 0 | 0 | indian | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+| 1 | 1 | indian | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+| 2 | 2 | indian | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+| 3 | 3 | indian | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+| 4 | 4 | indian | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
+
+
+1. Now, import several more libraries:
+
+ ```python
+ from sklearn.linear_model import LogisticRegression
+ from sklearn.model_selection import train_test_split, cross_val_score
+ from sklearn.metrics import accuracy_score,precision_score,confusion_matrix,classification_report, precision_recall_curve
+ from sklearn.svm import SVC
+ import numpy as np
+ ```
+
+1. Divide the X and y coordinates into two dataframes for training. `cuisine` can be the labels dataframe:
+
+ ```python
+ cuisines_label_df = cuisines_df['cuisine']
+ cuisines_label_df.head()
+ ```
+
+ It will look like this:
+
+ ```output
+ 0 indian
+ 1 indian
+ 2 indian
+ 3 indian
+ 4 indian
+ Name: cuisine, dtype: object
+ ```
+
+1. Drop the `Unnamed: 0` column and the `cuisine` column by calling `drop()`, and save the rest of the data as trainable features:
+
+ ```python
+ cuisines_feature_df = cuisines_df.drop(['Unnamed: 0', 'cuisine'], axis=1)
+ cuisines_feature_df.head()
+ ```
+
+ Your features look like this:
+
+| | almond | angelica | anise | anise_seed | apple | apple_brandy | apricot | armagnac | artemisia | artichoke | ... | whiskey | white_bread | white_wine | whole_grain_wheat_flour | wine | wood | yam | yeast | yogurt | zucchini |
+| ---: | -----: | -------: | ----: | ---------: | ----: | -----------: | ------: | -------: | --------: | --------: | ---: | ------: | ----------: | ---------: | ----------------------: | ---: | ---: | ---: | ----: | -----: | -------: |
+| 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+| 1 | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+| 2 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+| 3 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |
+| 4 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ... | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |
+
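+Now you are ready to train your model!
+
+## Choosing your classifier
+
+Now that your data is clean and ready for training, you have to decide which algorithm to use for the job.
+
+Scikit-learn groups classification under Supervised Learning, and in that category you will find many ways to classify. [The variety](https://scikit-learn.org/stable/supervised_learning.html) can be bewildering at first sight. The following methods all include classification techniques: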
+
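+- Linear models
+- Support Vector Machines
+- Stochastic Gradient Descent
+- Nearest Neighbors
+- Gaussian Processes
+- Decision Trees
+- Ensemble methods (voting classifier)
+- Multiclass and multioutput algorithms (multiclass and multilabel classification, multiclass-multioutput classification)
+
+> You can also use [neural networks to classify data](https://scikit-learn.org/stable/modules/neural_networks_supervised.html#classification), but that is outside the scope of this lesson.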
+
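+### Which classifier to go with?
+
+So, which classifier should you choose? Often, running through several and looking for a good result is a reasonable way to test. Scikit-learn offers a [side-by-side comparison](https://scikit-learn.org/stable/auto_examples/classification/plot_classifier_comparison.html) on a created dataset, comparing KNeighbors, SVC two ways, GaussianProcessClassifier, DecisionTreeClassifier, RandomForestClassifier, MLPClassifier, AdaBoostClassifier, GaussianNB and QuadraticDiscriminantAnalysis, and shows the results visually:
+
+
+> Plots from Scikit-learn's documentation
+
+> AutoML solves this problem neatly by running these comparisons in the cloud, helping you choose the best algorithm for your data. Try it [here](https://docs.microsoft.com/learn/modules/automate-model-selection-with-azure-automl/?WT.mc_id=academic-77952-leestott)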
+
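+### A better approach
+
+A better way than guessing blindly is to follow the ideas in this downloadable [ML Cheat Sheet](https://docs.microsoft.com/azure/machine-learning/algorithm-cheat-sheet?WT.mc_id=academic-77952-leestott). Here, we discover that, for our multiclass problem, we have some choices:
+
+
+> A section of Microsoft's Algorithm Cheat Sheet, detailing multiclass classification options
+
+✅ Download this cheat sheet, print it out, and hang it on your wall!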
+
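+### Reasoning
+
+Let's see if we can reason our way through different approaches given the constraints we have:
+
+- **Neural networks are too heavy**. Given our clean but small dataset, and the fact that we are running training locally via notebooks, neural networks are too heavyweight for this task.
+- **No two-class classifier**. We do not use a two-class classifier, so that rules out one-vs-all.
+- **Decision tree or logistic regression could work**. A decision tree might work, as might logistic regression for multiclass data.
+- **Multiclass Boosted Decision Trees solve a different problem**. The multiclass boosted decision tree is most suitable for nonparametric tasks, e.g. tasks designed to build rankings, so it is not useful for us.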
+
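+### Using Scikit-learn
+
+We will be using Scikit-learn to analyze our data. However, there are many ways to use logistic regression in Scikit-learn. Take a look at the [parameters to pass](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html?highlight=logistic%20regressio#sklearn.linear_model.LogisticRegression).
+
+Essentially there are two important parameters, `multi_class` and `solver`, that we need to specify when we ask Scikit-learn to perform a logistic regression. The `multi_class` value selects how multiple classes are handled. The `solver` value determines which optimization algorithm is used. Not all solvers can be paired with all `multi_class` values.
+
+According to the docs, in the multiclass case, the training algorithm:
+
+- **Uses the one-vs-rest (OvR) scheme**, if the `multi_class` option is set to `ovr`
+- **Uses the cross-entropy loss**, if the `multi_class` option is set to `multinomial`. (Currently the `multinomial` option is supported only by the ‘lbfgs’, ‘sag’, ‘saga’ and ‘newton-cg’ solvers.)
+
+> 🎓 The 'scheme' here can either be 'ovr' (one-vs-rest) or 'multinomial'. Since logistic regression is really designed to support binary classification, these schemes allow it to better handle multiclass classification tasks. [source](https://machinelearningmastery.com/one-vs-rest-and-one-vs-one-for-multi-class-classification/)
+
+> 🎓 The 'solver' is defined as "the algorithm to use in the optimization problem". [source](https://scikit-learn.org/stable/modules/generated/sklearn.linear_model.LogisticRegression.html?highlight=logistic%20regressio#sklearn.linear_model.LogisticRegression).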
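
To make the difference between the two schemes concrete, here is a minimal sketch in plain Python. It assumes we already have one decision score per cuisine for a single recipe; the scores and the helper names `ovr_probabilities` and `multinomial_probabilities` are made up for illustration and are not part of Scikit-learn. One-vs-rest squashes each score with its own sigmoid and then renormalizes, while the multinomial scheme applies a single softmax across all classes.

```python
import math

def sigmoid(z):
    """Logistic function used by each binary one-vs-rest model."""
    return 1.0 / (1.0 + math.exp(-z))

def ovr_probabilities(scores):
    """One-vs-rest: squash each class score independently, then renormalize."""
    raw = {label: sigmoid(z) for label, z in scores.items()}
    total = sum(raw.values())
    return {label: p / total for label, p in raw.items()}

def multinomial_probabilities(scores):
    """Multinomial: one softmax over all class scores."""
    peak = max(scores.values())  # subtract the max for numerical stability
    exps = {label: math.exp(z - peak) for label, z in scores.items()}
    total = sum(exps.values())
    return {label: e / total for label, e in exps.items()}

# Hypothetical decision scores for one recipe (illustrative numbers only).
scores = {"chinese": 0.4, "indian": 2.1, "japanese": -1.0, "korean": -1.5, "thai": -2.2}

ovr = ovr_probabilities(scores)
multinomial = multinomial_probabilities(scores)
for label in scores:
    print(f"{label:>8}  ovr={ovr[label]:.3f}  multinomial={multinomial[label]:.3f}")
```

Both schemes rank the classes in the same order here, but the softmax tends to concentrate more probability on the top class because the scores compete directly with one another.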
+
+Scikit-learn offers this table to explain how solvers handle the challenges presented by different kinds of data structures:
+
+
+
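+## Exercise - split the data
+
+We can focus on logistic regression for our first training trial, since you recently learned about it in a previous lesson.
+Split your data into training and testing groups by calling `train_test_split()`: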
+
+```python
+X_train, X_test, y_train, y_test = train_test_split(cuisines_feature_df, cuisines_label_df, test_size=0.3)
+```
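
As a rough illustration of what a 70/30 split does, the sketch below shuffles row indices and holds out a share of them for testing. It is a simplified stand-in written for this explanation, not Scikit-learn's implementation: `simple_train_test_split` is a hypothetical helper, the data is synthetic, and rounding the test share up is an assumption chosen to mirror the row counts seen later in this lesson.

```python
import math
import random

def simple_train_test_split(features, labels, test_size=0.3, seed=0):
    """Shuffle row indices and hold out a `test_size` fraction as the test set."""
    indices = list(range(len(features)))
    random.Random(seed).shuffle(indices)            # reproducible shuffle
    n_test = math.ceil(test_size * len(indices))    # round the test share up
    test_idx, train_idx = indices[:n_test], indices[n_test:]

    def pick(data, idx):
        return [data[i] for i in idx]

    return (pick(features, train_idx), pick(features, test_idx),
            pick(labels, train_idx), pick(labels, test_idx))

# Synthetic stand-in with the same row count as the balanced dataset (3995 rows).
features = [[i] for i in range(3995)]
labels = [i % 5 for i in range(3995)]

X_train, X_test, y_train, y_test = simple_train_test_split(features, labels)
print(len(X_train), len(X_test))
```

With 3995 rows and `test_size=0.3`, this holds out 1199 rows, the same total that appears in the support column of the classification report at the end of this lesson.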
+
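+## Exercise - apply logistic regression
+
+Since you are using the multiclass case, you need to choose which _scheme_ to use and which _solver_ to set. Use LogisticRegression with a multiclass setting and the **liblinear** solver to train.
+
+1. Create a logistic regression with multi_class set to `ovr` and the solver set to `liblinear`: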
+
+ ```python
+ lr = LogisticRegression(multi_class='ovr',solver='liblinear')
+ model = lr.fit(X_train, np.ravel(y_train))
+
+ accuracy = model.score(X_test, y_test)
+ print ("Accuracy is {}".format(accuracy))
+ ```
+
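+ ✅ Try a different solver such as `lbfgs`, which is often set as the default
+
+ > Note: the code above flattens the labels with NumPy's `np.ravel`; Pandas also offers a [`ravel`](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.Series.ravel.html) function for flattening data when needed.
+
+ The accuracy is good at over **80%**!
+
+1. You can see this model in action by testing one row of data (#50):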
+
+ ```python
+ print(f'ingredients: {X_test.iloc[50][X_test.iloc[50]!=0].keys()}')
+ print(f'cuisine: {y_test.iloc[50]}')
+ ```
+
+ The result is printed:
+
+ ```output
+ ingredients: Index(['cilantro', 'onion', 'pea', 'potato', 'tomato', 'vegetable_oil'], dtype='object')
+ cuisine: indian
+ ```
+
+ ✅ Try a different row number and check the results
+
+1. Digging deeper, you can check the accuracy of this prediction:
+
+ ```python
+ test= X_test.iloc[50].values.reshape(-1, 1).T
+ proba = model.predict_proba(test)
+ classes = model.classes_
+ resultdf = pd.DataFrame(data=proba, columns=classes)
+
+ topPrediction = resultdf.T.sort_values(by=[0], ascending = [False])
+ topPrediction.head()
+ ```
+
+ The result is printed - Indian cuisine is the model's best guess, with good probability:
+
+ | | 0 |
+ | -------: | -------: |
+ | indian | 0.715851 |
+ | chinese | 0.229475 |
+ | japanese | 0.029763 |
+ | korean | 0.017277 |
+ | thai | 0.007634 |
+
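+ ✅ Can you explain why the model is pretty sure this is an Indian cuisine?
+
+1. Get more detail by printing a classification report, as you did in the regression lessons: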
+
+ ```python
+ y_pred = model.predict(X_test)
+ print(classification_report(y_test,y_pred))
+ ```
+
+ | | precision | recall | f1-score | support |
+ | ------------ | --------- | ------ | -------- | ------- |
+ | chinese | 0.73 | 0.71 | 0.72 | 229 |
+ | indian | 0.91 | 0.93 | 0.92 | 254 |
+ | japanese | 0.70 | 0.75 | 0.72 | 220 |
+ | korean | 0.86 | 0.76 | 0.81 | 242 |
+ | thai | 0.79 | 0.85 | 0.82 | 254 |
+ | accuracy | | | 0.80 | 1199 |
+ | macro avg | 0.80 | 0.80 | 0.80 | 1199 |
+ | weighted avg | 0.80 | 0.80 | 0.80 | 1199 |
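
The f1-score column in the report can be reproduced from the precision and recall columns. The short sketch below copies those two columns from the table above and recomputes each F1 as a harmonic mean; `f1_score_from` is a helper name invented for this example.

```python
# Precision and recall per class, copied from the classification report above.
report = {
    "chinese": (0.73, 0.71),
    "indian": (0.91, 0.93),
    "japanese": (0.70, 0.75),
    "korean": (0.86, 0.76),
    "thai": (0.79, 0.85),
}

def f1_score_from(precision, recall):
    """F1 is the harmonic mean of precision and recall."""
    return 2 * precision * recall / (precision + recall)

f1_by_class = {label: f1_score_from(p, r) for label, (p, r) in report.items()}

# The macro average weights every class equally, regardless of its support.
macro_f1 = sum(f1_by_class.values()) / len(f1_by_class)

for label, f1 in f1_by_class.items():
    print(f"{label:>8}  f1={f1:.2f}")
print(f"macro avg f1={macro_f1:.2f}")
```

Rounded to two decimals, the recomputed values agree with the f1-score column and the macro average in the report, which is a quick way to check that you are reading the table correctly.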
+
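+## 🚀Challenge
+
+In this lesson, you used your cleaned data to build a machine learning model that can predict a national cuisine based on a series of ingredients. Take some time to read through the many options Scikit-learn provides to classify data. Dig deeper into the concept of 'solver' to understand what goes on behind the scenes.
+
+## [Post-lecture quiz](https://ff-quizzes.netlify.app/en/ml/)
+
+## Review & Self Study
+
+Dig a little more into the math behind logistic regression in [this lesson](https://people.eecs.berkeley.edu/~russell/classes/cs194/f11/lectures/CS194%20Fall%202011%20Lecture%2006.pdf)
+
+## Assignment
+
+[Study the solvers](assignment.md)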
+
+---
+
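+**Disclaimer**:
+This document has been translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The original document in its native language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.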
\ No newline at end of file
diff --git a/translations/zh-CN/4-Classification/2-Classifiers-1/assignment.md b/translations/zh-CN/4-Classification/2-Classifiers-1/assignment.md
new file mode 100644
index 000000000..686cb40d3
--- /dev/null
+++ b/translations/zh-CN/4-Classification/2-Classifiers-1/assignment.md
@@ -0,0 +1,15 @@
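+# Study the solvers
+## Instructions
+
+In this lesson you learned about the various solvers that pair algorithms with a machine learning process to create an accurate model. Walk through the solvers listed in the lesson and pick two. In your own words, compare and contrast these two solvers. What kind of problem do they address? How do they work with various data structures? Why would you pick one over the other?
+
+## Rubric
+
+| Criteria | Exemplary | Adequate | Needs Improvement |
+| -------- | ---------------------------------------------------------------------------------------------- | ---------------------------------------------- | ---------------------------- |
+| | A .doc file is presented with two paragraphs, one on each solver, comparing them thoughtfully. | A .doc file is presented with only one paragraph | The assignment is incomplete |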
+---
+
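+**Disclaimer**:
+This document has been translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The original document in its native language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.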
\ No newline at end of file
diff --git a/translations/zh-CN/4-Classification/2-Classifiers-1/notebook.ipynb b/translations/zh-CN/4-Classification/2-Classifiers-1/notebook.ipynb
new file mode 100644
index 000000000..bbf8c89b7
--- /dev/null
+++ b/translations/zh-CN/4-Classification/2-Classifiers-1/notebook.ipynb
@@ -0,0 +1,41 @@
+{
+ "metadata": {
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": 3
+ },
+ "orig_nbformat": 2,
+ "coopTranslator": {
+ "original_hash": "68829b06b4dcd512d3327849191f4d7f",
+ "translation_date": "2025-09-03T20:20:02+00:00",
+ "source_file": "4-Classification/2-Classifiers-1/notebook.ipynb",
+ "language_code": "zh"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2,
+ "cells": [
+ {
+ "source": [
+ "# Build classification models\n"
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
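+ "\n---\n\n**Disclaimer**: \nThis document has been translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The original document in its native language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.\n"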
+ ]
+ }
+ ]
+}
\ No newline at end of file
diff --git a/translations/zh-CN/4-Classification/2-Classifiers-1/solution/Julia/README.md b/translations/zh-CN/4-Classification/2-Classifiers-1/solution/Julia/README.md
new file mode 100644
index 000000000..779236745
--- /dev/null
+++ b/translations/zh-CN/4-Classification/2-Classifiers-1/solution/Julia/README.md
@@ -0,0 +1,6 @@
+
+
+---
+
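+**Disclaimer**:
+This document has been translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The original document in its native language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.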
\ No newline at end of file
diff --git a/translations/zh-CN/4-Classification/2-Classifiers-1/solution/R/lesson_11-R.ipynb b/translations/zh-CN/4-Classification/2-Classifiers-1/solution/R/lesson_11-R.ipynb
new file mode 100644
index 000000000..29cfa2fc6
--- /dev/null
+++ b/translations/zh-CN/4-Classification/2-Classifiers-1/solution/R/lesson_11-R.ipynb
@@ -0,0 +1,1298 @@
+{
+ "nbformat": 4,
+ "nbformat_minor": 2,
+ "metadata": {
+ "colab": {
+ "name": "lesson_11-R.ipynb",
+ "provenance": [],
+ "collapsed_sections": [],
+ "toc_visible": true
+ },
+ "kernelspec": {
+ "name": "ir",
+ "display_name": "R"
+ },
+ "language_info": {
+ "name": "R"
+ },
+ "coopTranslator": {
+ "original_hash": "6ea6a5171b1b99b7b5a55f7469c048d2",
+ "translation_date": "2025-09-03T20:24:29+00:00",
+ "source_file": "4-Classification/2-Classifiers-1/solution/R/lesson_11-R.ipynb",
+ "language_code": "zh"
+ }
+ },
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "source": [
+ "# Build a classification model: Delicious Asian and Indian cuisines\n"
+ ],
+ "metadata": {
+ "id": "zs2woWv_HoE8"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
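+ "## Cuisine classifiers 1\n",
+ "\n",
+ "In this lesson, we'll explore a variety of classifiers to *predict a national cuisine from a group of ingredients*. Along the way, we'll learn more about some of the ways algorithms can be applied to classification tasks.\n",
+ "\n",
+ "### [**Pre-lecture quiz**](https://gray-sand-07a10f403.1.azurestaticapps.net/quiz/21/)\n",
+ "\n",
+ "### **Preparation**\n",
+ "\n",
+ "This lesson builds on our [previous lesson](https://github.com/microsoft/ML-For-Beginners/blob/main/4-Classification/1-Introduction/solution/lesson_10-R.ipynb), where we:\n",
+ "\n",
+ "- Made a gentle introduction to classification using a dataset about the cuisines of Asia and India 😋.\n",
+ "\n",
+ "- Explored some [dplyr verbs](https://dplyr.tidyverse.org/) to prepare and clean our data.\n",
+ "\n",
+ "- Made beautiful visualizations using ggplot2.\n",
+ "\n",
+ "- Demonstrated how to deal with imbalanced data by preprocessing it using [recipes](https://recipes.tidymodels.org/articles/Simple_Example.html).\n",
+ "\n",
+ "- Demonstrated how to `prep` and `bake` our recipe to confirm that it works as intended.\n",
+ "\n",
+ "#### **Prerequisite**\n",
+ "\n",
+ "For this lesson, we'll need the following packages to clean, prepare and visualize our data:\n",
+ "\n",
+ "- `tidyverse`: The [tidyverse](https://www.tidyverse.org/) is a [collection of R packages](https://www.tidyverse.org/packages) designed to make data science faster, easier and more fun!\n",
+ "\n",
+ "- `tidymodels`: The [tidymodels](https://www.tidymodels.org/) framework is a [collection of packages](https://www.tidymodels.org/packages) for modeling and machine learning.\n",
+ "\n",
+ "- `themis`: The [themis package](https://themis.tidymodels.org/) provides extra recipe steps for dealing with unbalanced data.\n",
+ "\n",
+ "- `nnet`: The [nnet package](https://cran.r-project.org/web/packages/nnet/nnet.pdf) provides functions for estimating feed-forward neural networks with a single hidden layer, and for multinomial logistic regression models.\n",
+ "\n",
+ "You can install them as follows:\n"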
+ ],
+ "metadata": {
+ "id": "iDFOb3ebHwQC"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
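+ "`install.packages(c(\"tidyverse\", \"tidymodels\", \"themis\", \"nnet\", \"here\"))`\n",
+ "\n",
+ "Alternatively, the script below checks whether you have the packages required to complete this module and installs any that are missing.\n"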
+ ],
+ "metadata": {
+ "id": "4V85BGCjII7F"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "source": [
+ "suppressWarnings(if (!require(\"pacman\"))install.packages(\"pacman\"))\r\n",
+ "\r\n",
+ "pacman::p_load(tidyverse, tidymodels, themis, here)"
+ ],
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stderr",
+ "text": [
+ "Loading required package: pacman\n",
+ "\n"
+ ]
+ }
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "an5NPyyKIKNR",
+ "outputId": "834d5e74-f4b8-49f9-8ab5-4c52ff2d7bc8"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
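+ "## 1. Split the data into training and test sets\n",
+ "\n",
+ "We'll start by picking a few steps from our previous lesson.\n",
+ "\n",
+ "### Use `dplyr::select()` to drop the most common ingredients, which create confusion between distinct cuisines\n",
+ "\n",
+ "Everyone loves rice, garlic and ginger!\n"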
+ ],
+ "metadata": {
+ "id": "0ax9GQLBINVv"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "source": [
+ "# Load the original cuisines data\r\n",
+ "df <- read_csv(file = \"https://raw.githubusercontent.com/microsoft/ML-For-Beginners/main/4-Classification/data/cuisines.csv\")\r\n",
+ "\r\n",
+ "# Drop id column, rice, garlic and ginger from our original data set\r\n",
+ "df_select <- df %>% \r\n",
+ " select(-c(1, rice, garlic, ginger)) %>%\r\n",
+ " # Encode cuisine column as categorical\r\n",
+ " mutate(cuisine = factor(cuisine))\r\n",
+ "\r\n",
+ "# Display new data set\r\n",
+ "df_select %>% \r\n",
+ " slice_head(n = 5)\r\n",
+ "\r\n",
+ "# Display distribution of cuisines\r\n",
+ "df_select %>% \r\n",
+ " count(cuisine) %>% \r\n",
+ " arrange(desc(n))"
+ ],
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stderr",
+ "text": [
+ "New names:\n",
+ "* `` -> ...1\n",
+ "\n",
+ "\u001b[1m\u001b[1mRows: \u001b[1m\u001b[22m\u001b[34m\u001b[34m2448\u001b[34m\u001b[39m \u001b[1m\u001b[1mColumns: \u001b[1m\u001b[22m\u001b[34m\u001b[34m385\u001b[34m\u001b[39m\n",
+ "\n",
+ "\u001b[36m──\u001b[39m \u001b[1m\u001b[1mColumn specification\u001b[1m\u001b[22m \u001b[36m────────────────────────────────────────────────────────\u001b[39m\n",
+ "\u001b[1mDelimiter:\u001b[22m \",\"\n",
+ "\u001b[31mchr\u001b[39m (1): cuisine\n",
+ "\u001b[32mdbl\u001b[39m (384): ...1, almond, angelica, anise, anise_seed, apple, apple_brandy, a...\n",
+ "\n",
+ "\n",
+ "\u001b[36mℹ\u001b[39m Use \u001b[30m\u001b[47m\u001b[30m\u001b[47m`spec()`\u001b[47m\u001b[30m\u001b[49m\u001b[39m to retrieve the full column specification for this data.\n",
+ "\u001b[36mℹ\u001b[39m Specify the column types or set \u001b[30m\u001b[47m\u001b[30m\u001b[47m`show_col_types = FALSE`\u001b[47m\u001b[30m\u001b[49m\u001b[39m to quiet this message.\n",
+ "\n"
+ ]
+ },
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": [
+ " cuisine almond angelica anise anise_seed apple apple_brandy apricot armagnac\n",
+ "1 indian 0 0 0 0 0 0 0 0 \n",
+ "2 indian 1 0 0 0 0 0 0 0 \n",
+ "3 indian 0 0 0 0 0 0 0 0 \n",
+ "4 indian 0 0 0 0 0 0 0 0 \n",
+ "5 indian 0 0 0 0 0 0 0 0 \n",
+ " artemisia ⋯ whiskey white_bread white_wine whole_grain_wheat_flour wine wood\n",
+ "1 0 ⋯ 0 0 0 0 0 0 \n",
+ "2 0 ⋯ 0 0 0 0 0 0 \n",
+ "3 0 ⋯ 0 0 0 0 0 0 \n",
+ "4 0 ⋯ 0 0 0 0 0 0 \n",
+ "5 0 ⋯ 0 0 0 0 0 0 \n",
+ " yam yeast yogurt zucchini\n",
+ "1 0 0 0 0 \n",
+ "2 0 0 0 0 \n",
+ "3 0 0 0 0 \n",
+ "4 0 0 0 0 \n",
+ "5 0 0 1 0 "
+ ],
+ "text/markdown": [
+ "\n",
+ "A tibble: 5 × 381\n",
+ "\n",
+ "| cuisine <fct> | almond <dbl> | angelica <dbl> | anise <dbl> | anise_seed <dbl> | apple <dbl> | apple_brandy <dbl> | apricot <dbl> | armagnac <dbl> | artemisia <dbl> | ⋯ ⋯ | whiskey <dbl> | white_bread <dbl> | white_wine <dbl> | whole_grain_wheat_flour <dbl> | wine <dbl> | wood <dbl> | yam <dbl> | yeast <dbl> | yogurt <dbl> | zucchini <dbl> |\n",
+ "|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n",
+ "| indian | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ⋯ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |\n",
+ "| indian | 1 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ⋯ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |\n",
+ "| indian | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ⋯ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |\n",
+ "| indian | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ⋯ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |\n",
+ "| indian | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ⋯ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 1 | 0 |\n",
+ "\n"
+ ],
+ "text/latex": [
+ "A tibble: 5 × 381\n",
+ "\\begin{tabular}{lllllllllllllllllllll}\n",
+ " cuisine & almond & angelica & anise & anise\\_seed & apple & apple\\_brandy & apricot & armagnac & artemisia & ⋯ & whiskey & white\\_bread & white\\_wine & whole\\_grain\\_wheat\\_flour & wine & wood & yam & yeast & yogurt & zucchini\\\\\n",
+ " & & & & & & & & & & ⋯ & & & & & & & & & & \\\\\n",
+ "\\hline\n",
+ "\t indian & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & ⋯ & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\\\\n",
+ "\t indian & 1 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & ⋯ & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\\\\n",
+ "\t indian & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & ⋯ & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\\\\n",
+ "\t indian & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & ⋯ & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\\\\n",
+ "\t indian & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & ⋯ & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 1 & 0\\\\\n",
+ "\\end{tabular}\n"
+ ],
+ "text/html": [
+ "\n",
+ "A tibble: 5 × 381 \n",
+ "\n",
+ "\tcuisine almond angelica anise anise_seed apple apple_brandy apricot armagnac artemisia ⋯ whiskey white_bread white_wine whole_grain_wheat_flour wine wood yam yeast yogurt zucchini \n",
+ "\t<fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> ⋯ <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> \n",
+ " \n",
+ "\n",
+ "\tindian 0 0 0 0 0 0 0 0 0 ⋯ 0 0 0 0 0 0 0 0 0 0 \n",
+ "\tindian 1 0 0 0 0 0 0 0 0 ⋯ 0 0 0 0 0 0 0 0 0 0 \n",
+ "\tindian 0 0 0 0 0 0 0 0 0 ⋯ 0 0 0 0 0 0 0 0 0 0 \n",
+ "\tindian 0 0 0 0 0 0 0 0 0 ⋯ 0 0 0 0 0 0 0 0 0 0 \n",
+ "\tindian 0 0 0 0 0 0 0 0 0 ⋯ 0 0 0 0 0 0 0 0 1 0 \n",
+ " \n",
+ "\n"
+ ]
+ },
+ "metadata": {}
+ },
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": [
+ " cuisine n \n",
+ "1 korean 799\n",
+ "2 indian 598\n",
+ "3 chinese 442\n",
+ "4 japanese 320\n",
+ "5 thai 289"
+ ],
+ "text/markdown": [
+ "\n",
+ "A tibble: 5 × 2\n",
+ "\n",
+ "| cuisine <fct> | n <int> |\n",
+ "|---|---|\n",
+ "| korean | 799 |\n",
+ "| indian | 598 |\n",
+ "| chinese | 442 |\n",
+ "| japanese | 320 |\n",
+ "| thai | 289 |\n",
+ "\n"
+ ],
+ "text/latex": [
+ "A tibble: 5 × 2\n",
+ "\\begin{tabular}{ll}\n",
+ " cuisine & n\\\\\n",
+ " & \\\\\n",
+ "\\hline\n",
+ "\t korean & 799\\\\\n",
+ "\t indian & 598\\\\\n",
+ "\t chinese & 442\\\\\n",
+ "\t japanese & 320\\\\\n",
+ "\t thai & 289\\\\\n",
+ "\\end{tabular}\n"
+ ],
+ "text/html": [
+ "\n",
+ "A tibble: 5 × 2 \n",
+ "\n",
+ "\tcuisine n \n",
+ "\t<fct> <int> \n",
+ " \n",
+ "\n",
+ "\tkorean 799 \n",
+ "\tindian 598 \n",
+ "\tchinese 442 \n",
+ "\tjapanese 320 \n",
+ "\tthai 289 \n",
+ " \n",
+ "\n"
+ ]
+ },
+ "metadata": {}
+ }
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 735
+ },
+ "id": "jhCrrH22IWVR",
+ "outputId": "d444a85c-1d8b-485f-bc4f-8be2e8f8217c"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
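+ "Perfect! Now it's time to split the data so that 70% goes to training and 30% goes to testing. We'll also apply a `stratification` technique when splitting the data to `maintain the proportion of each cuisine` in the training and test sets.\n",
+ "\n",
+ "[rsample](https://rsample.tidymodels.org/), a package in Tidymodels, provides infrastructure for efficient data splitting and resampling:\n"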
+ ],
+ "metadata": {
+ "id": "AYTjVyajIdny"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "source": [
+ "# Load the core Tidymodels packages into R session\r\n",
+ "library(tidymodels)\r\n",
+ "\r\n",
+ "# Create split specification\r\n",
+ "set.seed(2056)\r\n",
+ "cuisines_split <- initial_split(data = df_select,\r\n",
+ " strata = cuisine,\r\n",
+ " prop = 0.7)\r\n",
+ "\r\n",
+ "# Extract the data in each split\r\n",
+ "cuisines_train <- training(cuisines_split)\r\n",
+ "cuisines_test <- testing(cuisines_split)\r\n",
+ "\r\n",
+ "# Print the number of cases in each split\r\n",
+ "cat(\"Training cases: \", nrow(cuisines_train), \"\\n\",\r\n",
+ " \"Test cases: \", nrow(cuisines_test), sep = \"\")\r\n",
+ "\r\n",
+ "# Display the first few rows of the training set\r\n",
+ "cuisines_train %>% \r\n",
+ " slice_head(n = 5)\r\n",
+ "\r\n",
+ "\r\n",
+ "# Display distribution of cuisines in the training set\r\n",
+ "cuisines_train %>% \r\n",
+ " count(cuisine) %>% \r\n",
+ " arrange(desc(n))"
+ ],
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Training cases: 1712\n",
+ "Test cases: 736"
+ ]
+ },
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": [
+ " cuisine almond angelica anise anise_seed apple apple_brandy apricot armagnac\n",
+ "1 chinese 0 0 0 0 0 0 0 0 \n",
+ "2 chinese 0 0 0 0 0 0 0 0 \n",
+ "3 chinese 0 0 0 0 0 0 0 0 \n",
+ "4 chinese 0 0 0 0 0 0 0 0 \n",
+ "5 chinese 0 0 0 0 0 0 0 0 \n",
+ " artemisia ⋯ whiskey white_bread white_wine whole_grain_wheat_flour wine wood\n",
+ "1 0 ⋯ 0 0 0 0 1 0 \n",
+ "2 0 ⋯ 0 0 0 0 1 0 \n",
+ "3 0 ⋯ 0 0 0 0 0 0 \n",
+ "4 0 ⋯ 0 0 0 0 0 0 \n",
+ "5 0 ⋯ 0 0 0 0 0 0 \n",
+ " yam yeast yogurt zucchini\n",
+ "1 0 0 0 0 \n",
+ "2 0 0 0 0 \n",
+ "3 0 0 0 0 \n",
+ "4 0 0 0 0 \n",
+ "5 0 0 0 0 "
+ ],
+ "text/markdown": [
+ "\n",
+ "A tibble: 5 × 381\n",
+ "\n",
+ "| cuisine <fct> | almond <dbl> | angelica <dbl> | anise <dbl> | anise_seed <dbl> | apple <dbl> | apple_brandy <dbl> | apricot <dbl> | armagnac <dbl> | artemisia <dbl> | ⋯ ⋯ | whiskey <dbl> | white_bread <dbl> | white_wine <dbl> | whole_grain_wheat_flour <dbl> | wine <dbl> | wood <dbl> | yam <dbl> | yeast <dbl> | yogurt <dbl> | zucchini <dbl> |\n",
+ "|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|---|\n",
+ "| chinese | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ⋯ | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |\n",
+ "| chinese | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ⋯ | 0 | 0 | 0 | 0 | 1 | 0 | 0 | 0 | 0 | 0 |\n",
+ "| chinese | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ⋯ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |\n",
+ "| chinese | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ⋯ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |\n",
+ "| chinese | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | ⋯ | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 | 0 |\n",
+ "\n"
+ ],
+ "text/latex": [
+ "A tibble: 5 × 381\n",
+ "\\begin{tabular}{lllllllllllllllllllll}\n",
+ " cuisine & almond & angelica & anise & anise\\_seed & apple & apple\\_brandy & apricot & armagnac & artemisia & ⋯ & whiskey & white\\_bread & white\\_wine & whole\\_grain\\_wheat\\_flour & wine & wood & yam & yeast & yogurt & zucchini\\\\\n",
+ " & & & & & & & & & & ⋯ & & & & & & & & & & \\\\\n",
+ "\\hline\n",
+ "\t chinese & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & ⋯ & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0\\\\\n",
+ "\t chinese & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & ⋯ & 0 & 0 & 0 & 0 & 1 & 0 & 0 & 0 & 0 & 0\\\\\n",
+ "\t chinese & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & ⋯ & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\\\\n",
+ "\t chinese & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & ⋯ & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\\\\n",
+ "\t chinese & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & ⋯ & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0 & 0\\\\\n",
+ "\\end{tabular}\n"
+ ],
+ "text/html": [
+ "\n",
+ "A tibble: 5 × 381 \n",
+ "\n",
+ "\tcuisine almond angelica anise anise_seed apple apple_brandy apricot armagnac artemisia ⋯ whiskey white_bread white_wine whole_grain_wheat_flour wine wood yam yeast yogurt zucchini \n",
+ "\t<fct> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> ⋯ <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> <dbl> \n",
+ " \n",
+ "\n",
+ "\tchinese 0 0 0 0 0 0 0 0 0 ⋯ 0 0 0 0 1 0 0 0 0 0 \n",
+ "\tchinese 0 0 0 0 0 0 0 0 0 ⋯ 0 0 0 0 1 0 0 0 0 0 \n",
+ "\tchinese 0 0 0 0 0 0 0 0 0 ⋯ 0 0 0 0 0 0 0 0 0 0 \n",
+ "\tchinese 0 0 0 0 0 0 0 0 0 ⋯ 0 0 0 0 0 0 0 0 0 0 \n",
+ "\tchinese 0 0 0 0 0 0 0 0 0 ⋯ 0 0 0 0 0 0 0 0 0 0 \n",
+ " \n",
+ "\n"
+ ]
+ },
+ "metadata": {}
+ },
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": [
+ " cuisine n \n",
+ "1 korean 559\n",
+ "2 indian 418\n",
+ "3 chinese 309\n",
+ "4 japanese 224\n",
+ "5 thai 202"
+ ],
+ "text/markdown": [
+ "\n",
+ "A tibble: 5 × 2\n",
+ "\n",
+ "| cuisine <fct> | n <int> |\n",
+ "|---|---|\n",
+ "| korean | 559 |\n",
+ "| indian | 418 |\n",
+ "| chinese | 309 |\n",
+ "| japanese | 224 |\n",
+ "| thai | 202 |\n",
+ "\n"
+ ],
+ "text/latex": [
+ "A tibble: 5 × 2\n",
+ "\\begin{tabular}{ll}\n",
+ " cuisine & n\\\\\n",
+ " & \\\\\n",
+ "\\hline\n",
+ "\t korean & 559\\\\\n",
+ "\t indian & 418\\\\\n",
+ "\t chinese & 309\\\\\n",
+ "\t japanese & 224\\\\\n",
+ "\t thai & 202\\\\\n",
+ "\\end{tabular}\n"
+ ],
+ "text/html": [
+ "\n",
+ "A tibble: 5 × 2 \n",
+ "\n",
+ "\tcuisine n \n",
+ "\t<fct> <int> \n",
+ " \n",
+ "\n",
+ "\tkorean 559 \n",
+ "\tindian 418 \n",
+ "\tchinese 309 \n",
+ "\tjapanese 224 \n",
+ "\tthai 202 \n",
+ " \n",
+ "\n"
+ ]
+ },
+ "metadata": {}
+ }
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 535
+ },
+ "id": "w5FWIkEiIjdN",
+ "outputId": "2e195fd9-1a8f-4b91-9573-cce5582242df"
+ }
+ },
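+ {
+ "cell_type": "markdown",
+ "source": [
+ "To see what stratification does, the split above can be sketched in base R. This is only an illustration under stated assumptions, not how `rsample::initial_split()` is implemented: the toy labels simply reuse the class counts printed earlier, and the names `cuisine_labels`, `idx_by_class` and `split_props` are made up for this sketch. Sampling 70% of the rows inside each class keeps the class proportions nearly identical in both partitions.\n"
+ ],
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Toy labels that reuse the class counts printed earlier (illustration only)\n",
+ "set.seed(2056)\n",
+ "counts <- c(korean = 799, indian = 598, chinese = 442, japanese = 320, thai = 289)\n",
+ "cuisine_labels <- factor(rep(names(counts), times = counts))\n",
+ "\n",
+ "# Stratify: sample 70% of the row indices inside each class, then pool them\n",
+ "idx_by_class <- split(seq_along(cuisine_labels), cuisine_labels)\n",
+ "train_idx <- unlist(lapply(idx_by_class,\n",
+ "                           function(i) sample(i, size = floor(0.7 * length(i)))),\n",
+ "                    use.names = FALSE)\n",
+ "test_idx <- setdiff(seq_along(cuisine_labels), train_idx)\n",
+ "\n",
+ "# Class proportions should be nearly identical in the two partitions\n",
+ "split_props <- rbind(train = prop.table(table(cuisine_labels[train_idx])),\n",
+ "                     test = prop.table(table(cuisine_labels[test_idx])))\n",
+ "round(split_props, 3)"
+ ],
+ "outputs": [],
+ "metadata": {}
+ },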
+ {
+ "cell_type": "markdown",
+ "source": [
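+ "## 2. Deal with imbalanced data\n",
+ "\n",
+ "As you might have noticed in the original data set as well as in our training set, the number of cuisines is quite unevenly distributed. Korean cuisines are *almost* three times as numerous as Thai cuisines. Imbalanced data often has negative effects on model performance. Many models perform best when the number of observations is equal across classes and thus tend to struggle with unbalanced data.\n",
+ "\n",
+ "There are two main ways of dealing with imbalanced data sets:\n",
+ "\n",
+ "- adding observations to the minority class: `over-sampling`, e.g. using the SMOTE algorithm, which synthetically generates new examples of the minority class from the nearest neighbors of existing minority cases.\n",
+ "\n",
+ "- removing observations from the majority class: `under-sampling`\n",
+ "\n",
+ "In our previous lesson, we demonstrated how to deal with imbalanced data sets using a `recipe`. A recipe can be thought of as a blueprint that describes which steps should be applied to a data set to get it ready for data analysis. In our case, we want an equal distribution in the number of cuisines in our `training set`. Let's get right into it.\n"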
+ ],
+ "metadata": {
+ "id": "daBi9qJNIwqW"
+ }
+ },
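+ {
+ "cell_type": "markdown",
+ "source": [
+ "The over-sampling idea described above can be sketched in a few lines of base R. This is a simplified illustration of SMOTE-style interpolation, not the code that `themis::step_smote()` runs: the `minority` matrix, the helper `smote_one()` and the choice `k = 2` are all hypothetical. Each synthetic point is placed somewhere on the line segment between a minority observation and one of its nearest minority neighbors.\n"
+ ],
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "set.seed(2056)\n",
+ "\n",
+ "# Hypothetical minority-class points described by two numeric features\n",
+ "minority <- matrix(c(1.0, 1.0,\n",
+ "                     1.2, 0.8,\n",
+ "                     0.9, 1.3,\n",
+ "                     1.4, 1.1), ncol = 2, byrow = TRUE)\n",
+ "\n",
+ "# Create one synthetic point (smote_one is a hypothetical helper, not a themis function)\n",
+ "smote_one <- function(x, k = 2) {\n",
+ "  i <- sample(nrow(x), 1)                     # pick a random minority point\n",
+ "  d <- sqrt(rowSums(sweep(x, 2, x[i, ])^2))   # distances from it to every point\n",
+ "  nn <- order(d)[2:(k + 1)]                   # its k nearest neighbours, skipping itself\n",
+ "  j <- nn[sample(length(nn), 1)]              # choose one neighbour at random\n",
+ "  x[i, ] + runif(1) * (x[j, ] - x[i, ])       # interpolate between the two points\n",
+ "}\n",
+ "\n",
+ "# Generate three synthetic minority observations, one per row\n",
+ "synthetic <- t(replicate(3, smote_one(minority)))\n",
+ "synthetic"
+ ],
+ "outputs": [],
+ "metadata": {}
+ },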
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "source": [
+ "# Load themis package for dealing with imbalanced data\r\n",
+ "library(themis)\r\n",
+ "\r\n",
+ "# Create a recipe for preprocessing training data\r\n",
+ "cuisines_recipe <- recipe(cuisine ~ ., data = cuisines_train) %>% \r\n",
+ " step_smote(cuisine)\r\n",
+ "\r\n",
+ "# Print recipe\r\n",
+ "cuisines_recipe"
+ ],
+ "outputs": [
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": [
+ "Data Recipe\n",
+ "\n",
+ "Inputs:\n",
+ "\n",
+ " role #variables\n",
+ " outcome 1\n",
+ " predictor 380\n",
+ "\n",
+ "Operations:\n",
+ "\n",
+ "SMOTE based on cuisine"
+ ]
+ },
+ "metadata": {}
+ }
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 200
+ },
+ "id": "Az6LFBGxI1X0",
+ "outputId": "29d71d85-64b0-4e62-871e-bcd5398573b6"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
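+ "You could of course confirm, using `prep()` and `bake()`, that the recipe works as expected: every cuisine label should end up with `559` observations, matching the largest class in the training set.\n",
+ "\n",
+ "Since we'll be using this recipe as a preprocessor for modeling, a `workflow()` will do all the prep and bake for us, so we won't have to estimate the recipe manually.\n",
+ "\n",
+ "Now we are ready to train a model 👩💻👨💻!\n",
+ "\n",
+ "## 3. Choosing your classifier\n",
+ "\n",
+ "Artwork by @allison_horst\n"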
+ ],
+ "metadata": {
+ "id": "NBL3PqIWJBBB"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
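+ "Now we have to decide which algorithm to use for the job 🤔.\n",
+ "\n",
+ "In Tidymodels, the [`parsnip` package](https://parsnip.tidymodels.org/index.html) provides a consistent interface for working with models across different engines (packages). See the parsnip documentation to explore [model types and engines](https://www.tidymodels.org/find/parsnip/#models) and their corresponding [model arguments](https://www.tidymodels.org/find/parsnip/#model-args). The variety is quite bewildering at first sight. For instance, the following methods all include classification techniques:\n",
+ "\n",
+ "- C5.0 Rule-Based Classification Models\n",
+ "\n",
+ "- Flexible Discriminant Models\n",
+ "\n",
+ "- Linear Discriminant Models\n",
+ "\n",
+ "- Regularized Discriminant Models\n",
+ "\n",
+ "- Logistic Regression Models\n",
+ "\n",
+ "- Multinomial Regression Models\n",
+ "\n",
+ "- Naive Bayes Models\n",
+ "\n",
+ "- Support Vector Machines\n",
+ "\n",
+ "- Nearest Neighbors\n",
+ "\n",
+ "- Decision Trees\n",
+ "\n",
+ "- Ensemble methods\n",
+ "\n",
+ "- Neural Networks\n",
+ "\n",
+ "The list goes on!\n",
+ "\n",
+ "### **Which classifier to go with?**\n",
+ "\n",
+ "So, which classifier should you choose? Often, running through several classifiers and looking for a good result is a reasonable way to test.\n",
+ "\n",
+ "> AutoML solves this problem neatly by running these comparisons in the cloud, allowing you to choose the best algorithm for your data. Try it [here](https://docs.microsoft.com/learn/modules/automate-model-selection-with-azure-automl/?WT.mc_id=academic-77952-leestott)\n",
+ "\n",
+ "The choice of classifier also depends on our problem. For instance, when the outcome can be categorized into `more than two classes`, as in our case, you must use a `multiclass classification algorithm` as opposed to a `binary classification` one.\n",
+ "\n",
+ "### **A better approach**\n",
+ "\n",
+ "A better way than guessing wildly is to follow the ideas on this downloadable [ML Cheat sheet](https://docs.microsoft.com/azure/machine-learning/algorithm-cheat-sheet?WT.mc_id=academic-77952-leestott). Here, we discover that, for our multiclass problem, we have some choices:\n",
+ "\n",
+ "A section of Microsoft's Algorithm Cheat Sheet, detailing multiclass classification options\n"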
+ ],
+ "metadata": {
+ "id": "a6DLAZ3vJZ14"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
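+ "### **Reasoning**\n",
+ "\n",
+ "Let's see if we can reason our way through different approaches given the constraints we have:\n",
+ "\n",
+ "- **Deep neural networks are too heavy**. Given our clean but minimal dataset, and the fact that we are running training locally via notebooks, deep neural networks are too heavyweight for this task.\n",
+ "\n",
+ "- **No two-class classifier**. We do not use a two-class classifier, so that rules out one-vs-all.\n",
+ "\n",
+ "- **Decision tree or logistic regression could work**. A decision tree might work, or multinomial regression (multiclass logistic regression) for multiclass data.\n",
+ "\n",
+ "- **Multiclass boosted decision trees solve a different problem**. The multiclass boosted decision tree is most suitable for nonparametric tasks, e.g. tasks designed to build rankings, so it is not useful for us.\n",
+ "\n",
+ "Also, before embarking on more complex machine learning models such as ensemble methods, it's normally a good idea to build the simplest possible model to get an idea of what is going on. So for this lesson, we'll start with a `multinomial regression` model.\n",
+ "\n",
+ "> Logistic regression is a technique used when the outcome variable is categorical (or nominal). For binary logistic regression the outcome variable has two categories, whereas for multinomial logistic regression it has more than two. See [Advanced Regression Methods](https://bookdown.org/chua/ber642_advanced_regression/multinomial-logistic-regression.html) for further reading.\n",
+ "\n",
+ "## 4. Train and evaluate a multinomial logistic regression model\n",
+ "\n",
+ "In Tidymodels, `parsnip::multinom_reg()` defines a model that uses linear predictors to predict multiclass data using the multinomial distribution. See `?multinom_reg()` for the different ways/engines you can use to fit this model.\n",
+ "\n",
+ "For this example, we'll fit a multinomial regression model via the default [nnet](https://cran.r-project.org/web/packages/nnet/nnet.pdf) engine.\n",
+ "\n",
+ "> I picked a value for `penalty` more or less at random. There are better ways to choose this value, namely `resampling` and `tuning` the model, which we'll discuss later.\n",
+ ">\n",
+ "> See [Tidymodels: Get Started](https://www.tidymodels.org/start/tuning/) if you want to learn more about tuning model hyperparameters.\n"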
+ ],
+ "metadata": {
+ "id": "gWMsVcbBJemu"
+ }
+ },
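+ {
+ "cell_type": "markdown",
+ "source": [
+ "As a rough sketch of how linear predictors turn into multiclass probabilities, the cell below applies the softmax function to one set of linear predictors. The numbers in `eta` are invented for illustration and do not come from any fitted model; fixing `chinese` at 0 assumes the usual convention of treating the first factor level as the reference class. The helper name `softmax` is also just a label chosen for this sketch.\n"
+ ],
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Hypothetical linear predictors (log-odds relative to the reference class)\n",
+ "# for a single recipe; the reference class is fixed at 0\n",
+ "eta <- c(chinese = 0, indian = 1.2, japanese = -0.4, korean = 0.3, thai = 2.0)\n",
+ "\n",
+ "# Softmax: exponentiate and normalise so the probabilities sum to 1\n",
+ "softmax <- function(z) {\n",
+ "  z <- z - max(z)   # subtract the maximum for numerical stability\n",
+ "  exp(z) / sum(exp(z))\n",
+ "}\n",
+ "\n",
+ "class_probs <- softmax(eta)\n",
+ "round(class_probs, 3)"
+ ],
+ "outputs": [],
+ "metadata": {}
+ },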
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "source": [
+ "# Create a multinomial regression model specification\r\n",
+ "mr_spec <- multinom_reg(penalty = 1) %>% \r\n",
+ " set_engine(\"nnet\", MaxNWts = 2086) %>% \r\n",
+ " set_mode(\"classification\")\r\n",
+ "\r\n",
+ "# Print model specification\r\n",
+ "mr_spec"
+ ],
+ "outputs": [
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": [
+ "Multinomial Regression Model Specification (classification)\n",
+ "\n",
+ "Main Arguments:\n",
+ " penalty = 1\n",
+ "\n",
+ "Engine-Specific Arguments:\n",
+ " MaxNWts = 2086\n",
+ "\n",
+ "Computational engine: nnet \n"
+ ]
+ },
+ "metadata": {}
+ }
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 166
+ },
+ "id": "Wq_fcyQiJvfG",
+ "outputId": "c30449c7-3864-4be7-f810-72a003743e2d"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
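+ "Great job 🥳! Now that we have a recipe and a model specification, we need a way of bundling them together into an object that will first preprocess the data, then fit the model on the preprocessed data, and also allow for potential post-processing activities. In Tidymodels, this convenient object is called a [`workflow`](https://workflows.tidymodels.org/) and it conveniently holds your modeling components! This is what we'd call *pipelines* in *Python*.\n",
+ "\n",
+ "So let's bundle everything up into a workflow! 📦\n"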
+ ],
+ "metadata": {
+ "id": "NlSbzDfgJ0zh"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "source": [
+ "# Bundle recipe and model specification\r\n",
+ "mr_wf <- workflow() %>% \r\n",
+ " add_recipe(cuisines_recipe) %>% \r\n",
+ " add_model(mr_spec)\r\n",
+ "\r\n",
+ "# Print out workflow\r\n",
+ "mr_wf"
+ ],
+ "outputs": [
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": [
+ "══ Workflow ════════════════════════════════════════════════════════════════════\n",
+ "\u001b[3mPreprocessor:\u001b[23m Recipe\n",
+ "\u001b[3mModel:\u001b[23m multinom_reg()\n",
+ "\n",
+ "── Preprocessor ────────────────────────────────────────────────────────────────\n",
+ "1 Recipe Step\n",
+ "\n",
+ "• step_smote()\n",
+ "\n",
+ "── Model ───────────────────────────────────────────────────────────────────────\n",
+ "Multinomial Regression Model Specification (classification)\n",
+ "\n",
+ "Main Arguments:\n",
+ " penalty = 1\n",
+ "\n",
+ "Engine-Specific Arguments:\n",
+ " MaxNWts = 2086\n",
+ "\n",
+ "Computational engine: nnet \n"
+ ]
+ },
+ "metadata": {}
+ }
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 333
+ },
+ "id": "Sc1TfPA4Ke3_",
+ "outputId": "82c70013-e431-4e7e-cef6-9fcf8aad4a6c"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Workflows 👌👌! A **`workflow()`** can be fit in much the same way a model can. So, time to train a model!\n"
+ ],
+ "metadata": {
+ "id": "TNQ8i85aKf9L"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "source": [
+ "# Train a multinomial regression model\n",
+ "mr_fit <- fit(object = mr_wf, data = cuisines_train)\n",
+ "\n",
+ "mr_fit"
+ ],
+ "outputs": [
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": [
+ "══ Workflow [trained] ══════════════════════════════════════════════════════════\n",
+ "\u001b[3mPreprocessor:\u001b[23m Recipe\n",
+ "\u001b[3mModel:\u001b[23m multinom_reg()\n",
+ "\n",
+ "── Preprocessor ────────────────────────────────────────────────────────────────\n",
+ "1 Recipe Step\n",
+ "\n",
+ "• step_smote()\n",
+ "\n",
+ "── Model ───────────────────────────────────────────────────────────────────────\n",
+ "Call:\n",
+ "nnet::multinom(formula = ..y ~ ., data = data, decay = ~1, MaxNWts = ~2086, \n",
+ " trace = FALSE)\n",
+ "\n",
+ "Coefficients:\n",
+ " (Intercept) almond angelica anise anise_seed apple\n",
+ "indian 0.19723325 0.2409661 0 -5.004955e-05 -0.1657635 -0.05769734\n",
+ "japanese 0.13961959 -0.6262400 0 -1.169155e-04 -0.4893596 -0.08585717\n",
+ "korean 0.22377347 -0.1833485 0 -5.560395e-05 -0.2489401 -0.15657804\n",
+ "thai -0.04336577 -0.6106258 0 4.903828e-04 -0.5782866 0.63451105\n",
+ " apple_brandy apricot armagnac artemisia artichoke asparagus\n",
+ "indian 0 0.37042636 0 -0.09122797 0 -0.27181970\n",
+ "japanese 0 0.28895643 0 -0.12651100 0 0.14054037\n",
+ "korean 0 -0.07981259 0 0.55756709 0 -0.66979948\n",
+ "thai 0 -0.33160904 0 -0.10725182 0 -0.02602152\n",
+ " avocado bacon baked_potato balm banana barley\n",
+ "indian -0.46624197 0.16008055 0 0 -0.2838796 0.2230625\n",
+ "japanese 0.90341344 0.02932727 0 0 -0.4142787 2.0953906\n",
+ "korean -0.06925382 -0.35804134 0 0 -0.2686963 -0.7233404\n",
+ "thai -0.21473955 -0.75594439 0 0 0.6784880 -0.4363320\n",
+ " bartlett_pear basil bay bean beech\n",
+ "indian 0 -0.7128756 0.1011587 -0.8777275 -0.0004380795\n",
+ "japanese 0 0.1288697 0.9425626 -0.2380748 0.3373437611\n",
+ "korean 0 -0.2445193 -0.4744318 -0.8957870 -0.0048784496\n",
+ "thai 0 1.5365848 0.1333256 0.2196970 -0.0113078024\n",
+ " beef beef_broth beef_liver beer beet\n",
+ "indian -0.7985278 0.2430186 -0.035598065 -0.002173738 0.01005813\n",
+ "japanese 0.2241875 -0.3653020 -0.139551027 0.128905553 0.04923911\n",
+ "korean 0.5366515 -0.6153237 0.213455197 -0.010828645 0.27325423\n",
+ "thai 0.1570012 -0.9364154 -0.008032213 -0.035063746 -0.28279823\n",
+ " bell_pepper bergamot berry bitter_orange black_bean\n",
+ "indian 0.49074330 0 0.58947607 0.191256164 -0.1945233\n",
+ "japanese 0.09074167 0 -0.25917977 -0.118915977 -0.3442400\n",
+ "korean -0.57876763 0 -0.07874180 -0.007729435 -0.5220672\n",
+ "thai 0.92554006 0 -0.07210196 -0.002983296 -0.4614426\n",
+ " black_currant black_mustard_seed_oil black_pepper black_raspberry\n",
+ "indian 0 0.38935801 -0.4453495 0\n",
+ "japanese 0 -0.05452887 -0.5440869 0\n",
+ "korean 0 -0.03929970 0.8025454 0\n",
+ "thai 0 -0.21498372 -0.9854806 0\n",
+ " black_sesame_seed black_tea blackberry blackberry_brandy\n",
+ "indian -0.2759246 0.3079977 0.191256164 0\n",
+ "japanese -0.6101687 -0.1671913 -0.118915977 0\n",
+ "korean 1.5197674 -0.3036261 -0.007729435 0\n",
+ "thai -0.1755656 -0.1487033 -0.002983296 0\n",
+ " blue_cheese blueberry bone_oil bourbon_whiskey brandy\n",
+ "indian 0 0.216164294 -0.2276744 0 0.22427587\n",
+ "japanese 0 -0.119186087 0.3913019 0 -0.15595599\n",
+ "korean 0 -0.007821986 0.2854487 0 -0.02562342\n",
+ "thai 0 -0.004947048 -0.0253658 0 -0.05715244\n",
+ "\n",
+ "...\n",
+ "and 308 more lines."
+ ]
+ },
+ "metadata": {}
+ }
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 1000
+ },
+ "id": "GMbdfVmTKkJI",
+ "outputId": "adf9ebdf-d69d-4a64-e9fd-e06e5322292e"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
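+ "The output shows the coefficients that the model learned during training.\n",
+ "\n",
+ "### Evaluate the trained model\n",
+ "\n",
+ "It's time to see how the model performed 📏 by evaluating it on the test set! Let's begin by making predictions on the test set.\n"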
+ ],
+ "metadata": {
+ "id": "tt2BfOxrKmcJ"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "source": [
+ "# Make predictions on the test set\n",
+ "results <- cuisines_test %>% select(cuisine) %>% \n",
+ " bind_cols(mr_fit %>% predict(new_data = cuisines_test))\n",
+ "\n",
+ "# Print out results\n",
+ "results %>% \n",
+ " slice_head(n = 5)"
+ ],
+ "outputs": [
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": [
+ " cuisine .pred_class\n",
+ "1 indian thai \n",
+ "2 indian indian \n",
+ "3 indian indian \n",
+ "4 indian indian \n",
+ "5 indian indian "
+ ],
+ "text/markdown": [
+ "\n",
+ "A tibble: 5 × 2\n",
+ "\n",
+ "| cuisine <fct> | .pred_class <fct> |\n",
+ "|---|---|\n",
+ "| indian | thai |\n",
+ "| indian | indian |\n",
+ "| indian | indian |\n",
+ "| indian | indian |\n",
+ "| indian | indian |\n",
+ "\n"
+ ],
+ "text/latex": [
+ "A tibble: 5 × 2\n",
+ "\\begin{tabular}{ll}\n",
+ " cuisine & .pred\\_class\\\\\n",
+ " & \\\\\n",
+ "\\hline\n",
+ "\t indian & thai \\\\\n",
+ "\t indian & indian\\\\\n",
+ "\t indian & indian\\\\\n",
+ "\t indian & indian\\\\\n",
+ "\t indian & indian\\\\\n",
+ "\\end{tabular}\n"
+ ],
+ "text/html": [
+ "\n",
+ "A tibble: 5 × 2 \n",
+ "\n",
+ "\tcuisine .pred_class \n",
+ "\t<fct> <fct> \n",
+ " \n",
+ "\n",
+ "\tindian thai \n",
+ "\tindian indian \n",
+ "\tindian indian \n",
+ "\tindian indian \n",
+ "\tindian indian \n",
+ " \n",
+ "\n"
+ ]
+ },
+ "metadata": {}
+ }
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 248
+ },
+ "id": "CqtckvtsKqax",
+ "outputId": "e57fe557-6a68-4217-fe82-173328c5436d"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
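+ "Great job! In Tidymodels, model performance can be evaluated using [yardstick](https://yardstick.tidymodels.org/), a package for measuring the effectiveness of models with performance metrics. As we did in our logistic regression lesson, let's begin by computing a confusion matrix.\n"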
+ ],
+ "metadata": {
+ "id": "8w5N6XsBKss7"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "source": [
+ "# Confusion matrix for categorical data\n",
+ "conf_mat(data = results, truth = cuisine, estimate = .pred_class)\n"
+ ],
+ "outputs": [
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": [
+ " Truth\n",
+ "Prediction chinese indian japanese korean thai\n",
+ " chinese 83 1 8 15 10\n",
+ " indian 4 163 1 2 6\n",
+ " japanese 21 5 73 25 1\n",
+ " korean 15 0 11 191 0\n",
+ " thai 10 11 3 7 70"
+ ]
+ },
+ "metadata": {}
+ }
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 133
+ },
+ "id": "YvODvsLkK0iG",
+ "outputId": "bb69da84-1266-47ad-b174-d43b88ca2988"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "When dealing with multiple classes, it's generally more intuitive to visualize the confusion matrix as a heat map, like this:\n"
+ ],
+ "metadata": {
+ "id": "c0HfPL16Lr6U"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "source": [
+ "update_geom_defaults(geom = \"tile\", new = list(color = \"black\", alpha = 0.7))\n",
+ "# Visualize confusion matrix\n",
+ "results %>% \n",
+ " conf_mat(cuisine, .pred_class) %>% \n",
+ " autoplot(type = \"heatmap\")"
+ ],
+ "outputs": [
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": [
+ "plot without title"
+ ],
+ "image/png": ""
+ },
+ "metadata": {
+ "image/png": {
+ "width": 420,
+ "height": 420
+ }
+ }
+ }
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 436
+ },
+ "id": "HsAtwukyLsvt",
+ "outputId": "3032a224-a2c8-4270-b4f2-7bb620317400"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
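+ "The darker squares in the confusion matrix plot indicate high numbers of cases, and you can hopefully see a diagonal line of darker squares indicating cases where the predicted and actual labels are the same.\n",
+ "\n",
+ "Let's now calculate summary statistics for the confusion matrix.\n"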
+ ],
+ "metadata": {
+ "id": "oOJC87dkLwPr"
+ }
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "source": [
+ "# Summary stats for confusion matrix\n",
+ "conf_mat(data = results, truth = cuisine, estimate = .pred_class) %>% \n",
+ "summary()"
+ ],
+ "outputs": [
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": [
+ " .metric .estimator .estimate\n",
+ "1 accuracy multiclass 0.7880435\n",
+ "2 kap multiclass 0.7276583\n",
+ "3 sens macro 0.7780927\n",
+ "4 spec macro 0.9477598\n",
+ "5 ppv macro 0.7585583\n",
+ "6 npv macro 0.9460080\n",
+ "7 mcc multiclass 0.7292724\n",
+ "8 j_index macro 0.7258524\n",
+ "9 bal_accuracy macro 0.8629262\n",
+ "10 detection_prevalence macro 0.2000000\n",
+ "11 precision macro 0.7585583\n",
+ "12 recall macro 0.7780927\n",
+ "13 f_meas macro 0.7641862"
+ ],
+ "text/markdown": [
+ "\n",
+ "A tibble: 13 × 3\n",
+ "\n",
+ "| .metric <chr> | .estimator <chr> | .estimate <dbl> |\n",
+ "|---|---|---|\n",
+ "| accuracy | multiclass | 0.7880435 |\n",
+ "| kap | multiclass | 0.7276583 |\n",
+ "| sens | macro | 0.7780927 |\n",
+ "| spec | macro | 0.9477598 |\n",
+ "| ppv | macro | 0.7585583 |\n",
+ "| npv | macro | 0.9460080 |\n",
+ "| mcc | multiclass | 0.7292724 |\n",
+ "| j_index | macro | 0.7258524 |\n",
+ "| bal_accuracy | macro | 0.8629262 |\n",
+ "| detection_prevalence | macro | 0.2000000 |\n",
+ "| precision | macro | 0.7585583 |\n",
+ "| recall | macro | 0.7780927 |\n",
+ "| f_meas | macro | 0.7641862 |\n",
+ "\n"
+ ],
+ "text/latex": [
+ "A tibble: 13 × 3\n",
+ "\\begin{tabular}{lll}\n",
+ " .metric & .estimator & .estimate\\\\\n",
+ " & & \\\\\n",
+ "\\hline\n",
+ "\t accuracy & multiclass & 0.7880435\\\\\n",
+ "\t kap & multiclass & 0.7276583\\\\\n",
+ "\t sens & macro & 0.7780927\\\\\n",
+ "\t spec & macro & 0.9477598\\\\\n",
+ "\t ppv & macro & 0.7585583\\\\\n",
+ "\t npv & macro & 0.9460080\\\\\n",
+ "\t mcc & multiclass & 0.7292724\\\\\n",
+ "\t j\\_index & macro & 0.7258524\\\\\n",
+ "\t bal\\_accuracy & macro & 0.8629262\\\\\n",
+ "\t detection\\_prevalence & macro & 0.2000000\\\\\n",
+ "\t precision & macro & 0.7585583\\\\\n",
+ "\t recall & macro & 0.7780927\\\\\n",
+ "\t f\\_meas & macro & 0.7641862\\\\\n",
+ "\\end{tabular}\n"
+ ],
+ "text/html": [
+ "\n",
+ "A tibble: 13 × 3 \n",
+ "\n",
+ "\t.metric .estimator .estimate \n",
+ "\t<chr> <chr> <dbl> \n",
+ " \n",
+ "\n",
+ "\taccuracy multiclass 0.7880435 \n",
+ "\tkap multiclass 0.7276583 \n",
+ "\tsens macro 0.7780927 \n",
+ "\tspec macro 0.9477598 \n",
+ "\tppv macro 0.7585583 \n",
+ "\tnpv macro 0.9460080 \n",
+ "\tmcc multiclass 0.7292724 \n",
+ "\tj_index macro 0.7258524 \n",
+ "\tbal_accuracy macro 0.8629262 \n",
+ "\tdetection_prevalence macro 0.2000000 \n",
+ "\tprecision macro 0.7585583 \n",
+ "\trecall macro 0.7780927 \n",
+ "\tf_meas macro 0.7641862 \n",
+ " \n",
+ "\n"
+ ]
+ },
+ "metadata": {}
+ }
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 494
+ },
+ "id": "OYqetUyzL5Wz",
+ "outputId": "6a84d65e-113d-4281-dfc1-16e8b70f37e6"
+ }
+ },
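+ {
+ "cell_type": "markdown",
+ "source": [
+ "Several of the summary statistics above can be reproduced by hand from the confusion matrix printed earlier, which helps show what each metric measures. The cell below is a base R sketch, not the yardstick implementation: the counts are copied from that matrix (rows are predictions, columns are the truth), and the object names `cm`, `recall`, `precision`, `p_e` and `kap` are chosen for this illustration only.\n"
+ ],
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Confusion matrix copied from the output above (rows = Prediction, columns = Truth)\n",
+ "classes <- c(\"chinese\", \"indian\", \"japanese\", \"korean\", \"thai\")\n",
+ "cm <- matrix(c(83,   1,  8,  15, 10,\n",
+ "                4, 163,  1,   2,  6,\n",
+ "               21,   5, 73,  25,  1,\n",
+ "               15,   0, 11, 191,  0,\n",
+ "               10,  11,  3,   7, 70),\n",
+ "             nrow = 5, byrow = TRUE,\n",
+ "             dimnames = list(Prediction = classes, Truth = classes))\n",
+ "\n",
+ "# Accuracy: share of all cases that sit on the diagonal\n",
+ "accuracy <- sum(diag(cm)) / sum(cm)\n",
+ "\n",
+ "# Per-class recall (sensitivity) and precision (PPV); macro = unweighted mean over classes\n",
+ "recall <- diag(cm) / colSums(cm)\n",
+ "precision <- diag(cm) / rowSums(cm)\n",
+ "\n",
+ "# Cohen's kappa: accuracy corrected for the agreement expected by chance\n",
+ "p_e <- sum(rowSums(cm) * colSums(cm)) / sum(cm)^2\n",
+ "kap <- (accuracy - p_e) / (1 - p_e)\n",
+ "\n",
+ "c(accuracy = accuracy, kap = kap,\n",
+ "  recall_macro = mean(recall), precision_macro = mean(precision))"
+ ],
+ "outputs": [],
+ "metadata": {}
+ },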
+ {
+ "cell_type": "markdown",
+ "source": [
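+ "If we narrow down to some metrics such as accuracy, sensitivity and PPV, we are not badly off for a start 🥳!\n",
+ "\n",
+ "## 5. Digging deeper\n",
+ "\n",
+ "Let's ask one subtle question: what criterion is used to settle on a given type of cuisine as the predicted outcome?\n",
+ "\n",
+ "Well, statistical machine learning algorithms such as logistic regression are based on `probability`; what a classifier actually predicts is a probability distribution over a set of possible outcomes. The class with the highest probability is then chosen as the most likely outcome for the given observation.\n",
+ "\n",
+ "Let's see this in action by making both hard class predictions and probability predictions.\n"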
+ ],
+ "metadata": {
+ "id": "43t7vz8vMJtW"
+ }
+ },
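+ {
+ "cell_type": "markdown",
+ "source": [
+ "The rule described above, choosing the class with the highest probability, can be illustrated in base R. The probabilities in this sketch are copied from the first test row of this notebook's prediction output; the names `row1_probs`, `hard_class` and `margin` are made up for the example. The margin between the two most probable classes is included as one possible way to see how close the call was.\n"
+ ],
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Class probabilities for the first test row (copied from this notebook's output)\n",
+ "row1_probs <- c(chinese = 1.551259e-03, indian = 0.4587877, japanese = 5.988039e-04,\n",
+ "                korean = 2.428503e-04, thai = 5.388194e-01)\n",
+ "\n",
+ "# Hard class prediction: the class with the highest probability\n",
+ "hard_class <- names(row1_probs)[which.max(row1_probs)]\n",
+ "hard_class\n",
+ "\n",
+ "# Margin between the two most probable classes; a small margin means a close call\n",
+ "top_two <- sort(row1_probs, decreasing = TRUE)[1:2]\n",
+ "margin <- unname(top_two[1] - top_two[2])\n",
+ "round(margin, 3)"
+ ],
+ "outputs": [],
+ "metadata": {}
+ },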
+ {
+ "cell_type": "code",
+ "execution_count": 13,
+ "source": [
+ "# Make hard class prediction and probabilities\n",
+ "results_prob <- cuisines_test %>%\n",
+ " select(cuisine) %>% \n",
+ " bind_cols(mr_fit %>% predict(new_data = cuisines_test)) %>% \n",
+ " bind_cols(mr_fit %>% predict(new_data = cuisines_test, type = \"prob\"))\n",
+ "\n",
+ "# Print out results\n",
+ "results_prob %>% \n",
+ " slice_head(n = 5)"
+ ],
+ "outputs": [
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": [
+ " cuisine .pred_class .pred_chinese .pred_indian .pred_japanese .pred_korean\n",
+ "1 indian thai 1.551259e-03 0.4587877 5.988039e-04 2.428503e-04\n",
+ "2 indian indian 2.637133e-05 0.9999488 6.648651e-07 2.259993e-05\n",
+ "3 indian indian 1.049433e-03 0.9909982 1.060937e-03 1.644947e-05\n",
+ "4 indian indian 6.237482e-02 0.4763035 9.136702e-02 3.660913e-01\n",
+ "5 indian indian 1.431745e-02 0.9418551 2.945239e-02 8.721782e-03\n",
+ " .pred_thai \n",
+ "1 5.388194e-01\n",
+ "2 1.577948e-06\n",
+ "3 6.874989e-03\n",
+ "4 3.863391e-03\n",
+ "5 5.653283e-03"
+ ],
+ "text/markdown": [
+ "\n",
+ "A tibble: 5 × 7\n",
+ "\n",
+ "| cuisine <fct> | .pred_class <fct> | .pred_chinese <dbl> | .pred_indian <dbl> | .pred_japanese <dbl> | .pred_korean <dbl> | .pred_thai <dbl> |\n",
+ "|---|---|---|---|---|---|---|\n",
+ "| indian | thai | 1.551259e-03 | 0.4587877 | 5.988039e-04 | 2.428503e-04 | 5.388194e-01 |\n",
+ "| indian | indian | 2.637133e-05 | 0.9999488 | 6.648651e-07 | 2.259993e-05 | 1.577948e-06 |\n",
+ "| indian | indian | 1.049433e-03 | 0.9909982 | 1.060937e-03 | 1.644947e-05 | 6.874989e-03 |\n",
+ "| indian | indian | 6.237482e-02 | 0.4763035 | 9.136702e-02 | 3.660913e-01 | 3.863391e-03 |\n",
+ "| indian | indian | 1.431745e-02 | 0.9418551 | 2.945239e-02 | 8.721782e-03 | 5.653283e-03 |\n",
+ "\n"
+ ],
+ "text/latex": [
+ "A tibble: 5 × 7\n",
+ "\\begin{tabular}{lllllll}\n",
+ " cuisine & .pred\\_class & .pred\\_chinese & .pred\\_indian & .pred\\_japanese & .pred\\_korean & .pred\\_thai\\\\\n",
+ " & & & & & & \\\\\n",
+ "\\hline\n",
+ "\t indian & thai & 1.551259e-03 & 0.4587877 & 5.988039e-04 & 2.428503e-04 & 5.388194e-01\\\\\n",
+ "\t indian & indian & 2.637133e-05 & 0.9999488 & 6.648651e-07 & 2.259993e-05 & 1.577948e-06\\\\\n",
+ "\t indian & indian & 1.049433e-03 & 0.9909982 & 1.060937e-03 & 1.644947e-05 & 6.874989e-03\\\\\n",
+ "\t indian & indian & 6.237482e-02 & 0.4763035 & 9.136702e-02 & 3.660913e-01 & 3.863391e-03\\\\\n",
+ "\t indian & indian & 1.431745e-02 & 0.9418551 & 2.945239e-02 & 8.721782e-03 & 5.653283e-03\\\\\n",
+ "\\end{tabular}\n"
+ ]
+ },
+ "metadata": {}
+ }
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 248
+ },
+ "id": "xdKNs-ZPMTJL",
+ "outputId": "68f6ac5a-725a-4eff-9ea6-481fef00e008"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
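+ "Why does the model predict Thai for the first observation even though its true cuisine is Indian? Note that it is not very confident: the predicted probabilities are about 0.54 for Thai and 0.46 for Indian.\n",
+ "\n",
+ "## **🚀Challenge**\n",
+ "\n",
+ "In this lesson, you used your cleaned data to build a machine learning model that can predict a national cuisine from a set of ingredients. Take some time to read through the [many options Tidymodels provides](https://www.tidymodels.org/find/parsnip/#models) for classifying data and the [other ways](https://parsnip.tidymodels.org/articles/articles/Examples.html#multinom_reg-models) of fitting a multinomial regression.\n",
+ "\n",
+ "#### Thank you to:\n",
+ "\n",
+ "[`Allison Horst`](https://twitter.com/allison_horst/) for creating the amazing illustrations that make R more welcoming and engaging. Find more illustrations in her [gallery](https://www.google.com/url?q=https://github.com/allisonhorst/stats-illustrations&sa=D&source=editors&ust=1626380772530000&usg=AOvVaw3zcfyCizFQZpkSLzxiiQEM).\n",
+ "\n",
+ "[Cassie Breviu](https://www.twitter.com/cassieview) and [Jen Looper](https://www.twitter.com/jenlooper) for creating the original Python version of this module ♥️\n",
+ "\n",
+ " \n",
+ "I would have thrown in some jokes, but I am hopeless at food puns 😅.\n",
+ "\n",
+ " \n",
+ "\n",
+ "Happy learning,\n",
+ "\n",
+ "[Eric](https://twitter.com/ericntay), Gold Microsoft Learn Student Ambassador\n"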
+ ],
+ "metadata": {
+ "id": "2tWVHMeLMYdM"
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
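+ "\n---\n\n**Disclaimer**: \nThis document was translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The document in its original language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.\n"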
+ ]
+ }
+ ]
+}
\ No newline at end of file
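
Reviewer note on section 4 of the notebook above: the hard class prediction is simply the class with the largest predicted probability. The following is a minimal Python sketch of that rule, using the first row of the printed `results_prob` output. The helper names `hard_class` and `margin` are hypothetical and are not part of tidymodels.

```python
# Probabilities for the first test observation, copied from the printed
# results_prob output in the notebook (true cuisine: indian).
row1 = {
    "chinese": 1.551259e-03,
    "indian": 0.4587877,
    "japanese": 5.988039e-04,
    "korean": 2.428503e-04,
    "thai": 5.388194e-01,
}


def hard_class(probs):
    """Return the label with the highest predicted probability (hypothetical helper)."""
    return max(probs, key=probs.get)


def margin(probs):
    """Gap between the two largest probabilities; a small gap means low confidence."""
    top, runner_up = sorted(probs.values(), reverse=True)[:2]
    return top - runner_up


print(hard_class(row1))        # label with the largest probability
print(round(margin(row1), 4))  # gap to the runner-up class
```

A small margin like the one in this row suggests the misclassification is a close call rather than a confident error.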
diff --git a/translations/zh-CN/4-Classification/2-Classifiers-1/solution/notebook.ipynb b/translations/zh-CN/4-Classification/2-Classifiers-1/solution/notebook.ipynb
new file mode 100644
index 000000000..943f3d608
--- /dev/null
+++ b/translations/zh-CN/4-Classification/2-Classifiers-1/solution/notebook.ipynb
@@ -0,0 +1,281 @@
+{
+ "cells": [
+ {
+ "source": [
+ "# Build classification models\n"
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ " Unnamed: 0 cuisine almond angelica anise anise_seed apple \\\n",
+ "0 0 indian 0 0 0 0 0 \n",
+ "1 1 indian 1 0 0 0 0 \n",
+ "2 2 indian 0 0 0 0 0 \n",
+ "3 3 indian 0 0 0 0 0 \n",
+ "4 4 indian 0 0 0 0 0 \n",
+ "\n",
+ " apple_brandy apricot armagnac ... whiskey white_bread white_wine \\\n",
+ "0 0 0 0 ... 0 0 0 \n",
+ "1 0 0 0 ... 0 0 0 \n",
+ "2 0 0 0 ... 0 0 0 \n",
+ "3 0 0 0 ... 0 0 0 \n",
+ "4 0 0 0 ... 0 0 0 \n",
+ "\n",
+ " whole_grain_wheat_flour wine wood yam yeast yogurt zucchini \n",
+ "0 0 0 0 0 0 0 0 \n",
+ "1 0 0 0 0 0 0 0 \n",
+ "2 0 0 0 0 0 0 0 \n",
+ "3 0 0 0 0 0 0 0 \n",
+ "4 0 0 0 0 0 1 0 \n",
+ "\n",
+ "[5 rows x 382 columns]"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 1
+ }
+ ],
+ "source": [
+ "import pandas as pd\n",
+ "cuisines_df = pd.read_csv(\"../../data/cleaned_cuisines.csv\")\n",
+ "cuisines_df.head()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from sklearn.linear_model import LogisticRegression\n",
+ "from sklearn.model_selection import train_test_split, cross_val_score\n",
+ "from sklearn.metrics import accuracy_score,precision_score,confusion_matrix,classification_report, precision_recall_curve\n",
+ "from sklearn.svm import SVC\n",
+ "import numpy as np"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "0 indian\n",
+ "1 indian\n",
+ "2 indian\n",
+ "3 indian\n",
+ "4 indian\n",
+ "Name: cuisine, dtype: object"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 3
+ }
+ ],
+ "source": [
+ "cuisines_label_df = cuisines_df['cuisine']\n",
+ "cuisines_label_df.head()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ " almond angelica anise anise_seed apple apple_brandy apricot \\\n",
+ "0 0 0 0 0 0 0 0 \n",
+ "1 1 0 0 0 0 0 0 \n",
+ "2 0 0 0 0 0 0 0 \n",
+ "3 0 0 0 0 0 0 0 \n",
+ "4 0 0 0 0 0 0 0 \n",
+ "\n",
+ " armagnac artemisia artichoke ... whiskey white_bread white_wine \\\n",
+ "0 0 0 0 ... 0 0 0 \n",
+ "1 0 0 0 ... 0 0 0 \n",
+ "2 0 0 0 ... 0 0 0 \n",
+ "3 0 0 0 ... 0 0 0 \n",
+ "4 0 0 0 ... 0 0 0 \n",
+ "\n",
+ " whole_grain_wheat_flour wine wood yam yeast yogurt zucchini \n",
+ "0 0 0 0 0 0 0 0 \n",
+ "1 0 0 0 0 0 0 0 \n",
+ "2 0 0 0 0 0 0 0 \n",
+ "3 0 0 0 0 0 0 0 \n",
+ "4 0 0 0 0 0 1 0 \n",
+ "\n",
+ "[5 rows x 380 columns]"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 4
+ }
+ ],
+ "source": [
+ "cuisines_feature_df = cuisines_df.drop(['Unnamed: 0', 'cuisine'], axis=1)\n",
+ "cuisines_feature_df.head()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "X_train, X_test, y_train, y_test = train_test_split(cuisines_feature_df, cuisines_label_df, test_size=0.3)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Accuracy is 0.8181818181818182\n"
+ ]
+ }
+ ],
+ "source": [
+ "lr = LogisticRegression(multi_class='ovr',solver='liblinear')\n",
+ "model = lr.fit(X_train, np.ravel(y_train))\n",
+ "\n",
+ "accuracy = model.score(X_test, y_test)\n",
+ "print (\"Accuracy is {}\".format(accuracy))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "ingredients: Index(['artemisia', 'black_pepper', 'mushroom', 'shiitake', 'soy_sauce',\n 'vegetable_oil'],\n dtype='object')\ncuisine: korean\n"
+ ]
+ }
+ ],
+ "source": [
+ "# test an item\n",
+ "print(f'ingredients: {X_test.iloc[50][X_test.iloc[50]!=0].keys()}')\n",
+ "print(f'cuisine: {y_test.iloc[50]}')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ " 0\n",
+ "korean 0.392231\n",
+ "chinese 0.372872\n",
+ "japanese 0.218825\n",
+ "thai 0.013427\n",
+ "indian 0.002645"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 8
+ }
+ ],
+ "source": [
+ "# reshape to a 2D array and transpose\n",
+ "test= X_test.iloc[50].values.reshape(-1, 1).T\n",
+ "# predict with score\n",
+ "proba = model.predict_proba(test)\n",
+ "classes = model.classes_\n",
+ "# create df with classes and scores\n",
+ "resultdf = pd.DataFrame(data=proba, columns=classes)\n",
+ "\n",
+ "# create df to show results\n",
+ "topPrediction = resultdf.T.sort_values(by=[0], ascending = [False])\n",
+ "topPrediction.head()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ " precision recall f1-score support\n\n chinese 0.75 0.73 0.74 223\n indian 0.93 0.88 0.90 255\n japanese 0.78 0.78 0.78 253\n korean 0.87 0.86 0.86 236\n thai 0.76 0.84 0.80 232\n\n accuracy 0.82 1199\n macro avg 0.82 0.82 0.82 1199\nweighted avg 0.82 0.82 0.82 1199\n\n"
+ ]
+ }
+ ],
+ "source": [
+ "y_pred = model.predict(X_test)\n",
+ "print(classification_report(y_test,y_pred))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
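+ "\n---\n\n**Disclaimer**: \nThis document was translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The document in its original language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.\n"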
+ ]
+ }
+ ],
+ "metadata": {
+ "interpreter": {
+ "hash": "70b38d7a306a849643e446cd70466270a13445e5987dfa1344ef2b127438fa4d"
+ },
+ "kernelspec": {
+ "name": "python3",
+ "display_name": "Python 3.7.0 64-bit ('3.7')"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.7.0"
+ },
+ "metadata": {
+ "interpreter": {
+ "hash": "70b38d7a306a849643e446cd70466270a13445e5987dfa1344ef2b127438fa4d"
+ }
+ },
+ "coopTranslator": {
+ "original_hash": "9408506dd864f2b6e334c62f80c0cfcc",
+ "translation_date": "2025-09-03T20:20:25+00:00",
+ "source_file": "4-Classification/2-Classifiers-1/solution/notebook.ipynb",
+ "language_code": "zh"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
\ No newline at end of file
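
Reviewer note on the `classification_report` output in the notebook above: the `f1-score` column is the harmonic mean of the `precision` and `recall` columns. The sketch below recomputes it from the printed values. Because those inputs are already rounded to two decimals, agreement with the reported F1 is only approximate. The function name `f1_score_from` is hypothetical and is not a scikit-learn API.

```python
def f1_score_from(precision, recall):
    """Harmonic mean of precision and recall; 0.0 when both are zero."""
    if precision + recall == 0:
        return 0.0
    return 2 * precision * recall / (precision + recall)


# (precision, recall, reported f1) per class, copied from the printed report.
report = {
    "chinese": (0.75, 0.73, 0.74),
    "indian": (0.93, 0.88, 0.90),
    "japanese": (0.78, 0.78, 0.78),
    "korean": (0.87, 0.86, 0.86),
    "thai": (0.76, 0.84, 0.80),
}

for cuisine, (p, r, reported) in report.items():
    print(f"{cuisine:9s} f1={f1_score_from(p, r):.2f} (reported {reported:.2f})")
```

The harmonic mean is pulled toward the smaller of the two inputs, so a class with high precision but low recall still gets a modest F1.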
diff --git a/translations/zh-CN/4-Classification/3-Classifiers-2/README.md b/translations/zh-CN/4-Classification/3-Classifiers-2/README.md
new file mode 100644
index 000000000..fd95c847e
--- /dev/null
+++ b/translations/zh-CN/4-Classification/3-Classifiers-2/README.md
@@ -0,0 +1,240 @@
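+# Cuisine classifiers 2
+
+In this second classification lesson, you will explore more ways to classify numeric data. You will also learn about the consequences of choosing one classifier over another.
+
+## [Pre-lecture quiz](https://ff-quizzes.netlify.app/en/ml/)
+
+### Prerequisite
+
+We assume that you have completed the previous lessons and have a cleaned dataset named _cleaned_cuisines.csv_ in the `data` folder at the root of this four-lesson folder.
+
+### Preparation
+
+We have loaded the cleaned dataset into your _notebook.ipynb_ file and split it into X and y dataframes, ready for model building.
+
+## A classification map
+
+Previously, you learned about the various options for classifying data using Microsoft's cheat sheet. Scikit-learn offers a similar but more granular cheat sheet that can help you narrow down your choice of estimator (scikit-learn's general term for a model, classifiers included):
+
+
+> Tip: [visit this map online](https://scikit-learn.org/stable/tutorial/machine_learning_map/) and click along the path to read the documentation.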
+
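+### The plan
+
+This map is very helpful once you have a clear grasp of your data, because you can walk along its paths to a decision:
+
+- We have more than 50 samples
+- We want to predict a category
+- We have labeled data
+- We have fewer than 100K samples
+- ✨ We can choose a Linear SVC
+- If that doesn't work, since we have numeric data
+    - We can try a ✨ KNeighbors Classifier
+    - If that doesn't work, try ✨ SVC and ✨ Ensemble Classifiers
+
+This is a very helpful trail to follow.
+
+## Exercise - split the data
+
+Following this path, we should start by importing the libraries we need.
+
+1. Import the needed libraries: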
+
+ ```python
+ from sklearn.neighbors import KNeighborsClassifier
+ from sklearn.linear_model import LogisticRegression
+ from sklearn.svm import SVC
+ from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier
+ from sklearn.model_selection import train_test_split, cross_val_score
+ from sklearn.metrics import accuracy_score,precision_score,confusion_matrix,classification_report, precision_recall_curve
+ import numpy as np
+ ```
+
+1. Split your training and test data:
+
+ ```python
+ X_train, X_test, y_train, y_test = train_test_split(cuisines_feature_df, cuisines_label_df, test_size=0.3)
+ ```
+
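+## Linear SVC classifier
+
+Support-vector classification (SVC) belongs to the support-vector machine (SVM) family of ML techniques (learn more about these below). In this method, you choose a "kernel" that determines how the boundary between the labels is drawn. The `C` parameter controls regularization: the smaller its value, the stronger the regularization. The kernel can be one of [several options](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVC.html#sklearn.svm.SVC); here we set it to `linear` so that we get a linear SVC. `probability` defaults to `False`; here we set it to `True` to obtain probability estimates. We set `random_state` to `0` so that the data shuffling used to compute those probability estimates is reproducible.
+
+### Exercise - apply a linear SVC
+
+Start by creating a dictionary of classifiers. You will add to it progressively as we test.
+
+1. Start with a Linear SVC: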
+
+ ```python
+ C = 10
+ # Create different classifiers.
+ classifiers = {
+ 'Linear SVC': SVC(kernel='linear', C=C, probability=True,random_state=0)
+ }
+ ```
+
+2. Train your model using the Linear SVC and print out a report:
+
+ ```python
+ n_classifiers = len(classifiers)
+
+ for index, (name, classifier) in enumerate(classifiers.items()):
+ classifier.fit(X_train, np.ravel(y_train))
+
+ y_pred = classifier.predict(X_test)
+ accuracy = accuracy_score(y_test, y_pred)
+ print("Accuracy (train) for %s: %0.1f%% " % (name, accuracy * 100))
+ print(classification_report(y_test,y_pred))
+ ```
+
+    The result is pretty good (the printed label says `train`, but the accuracy is computed on the test set):
+
+ ```output
+ Accuracy (train) for Linear SVC: 78.6%
+ precision recall f1-score support
+
+ chinese 0.71 0.67 0.69 242
+ indian 0.88 0.86 0.87 234
+ japanese 0.79 0.74 0.76 254
+ korean 0.85 0.81 0.83 242
+ thai 0.71 0.86 0.78 227
+
+ accuracy 0.79 1199
+ macro avg 0.79 0.79 0.79 1199
+ weighted avg 0.79 0.79 0.79 1199
+ ```
+
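+## K-Neighbors classifier
+
+K-Neighbors is part of the "neighbors" family of ML methods, which can be used for both supervised and unsupervised learning. For classification, a new sample is assigned the label that is most common among its k nearest training samples.
+
+### Exercise - apply the K-Neighbors classifier
+
+The previous classifier was good and worked well with the data, but maybe we can get better accuracy. Try a K-Neighbors classifier.
+
+1. Add an entry to your classifiers dictionary (add a comma after the Linear SVC entry). Here `C` (10) is reused as the number of neighbors: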
+
+ ```python
+ 'KNN classifier': KNeighborsClassifier(C),
+ ```
+
+    The result is a little worse:
+
+ ```output
+ Accuracy (train) for KNN classifier: 73.8%
+ precision recall f1-score support
+
+ chinese 0.64 0.67 0.66 242
+ indian 0.86 0.78 0.82 234
+ japanese 0.66 0.83 0.74 254
+ korean 0.94 0.58 0.72 242
+ thai 0.71 0.82 0.76 227
+
+ accuracy 0.74 1199
+ macro avg 0.76 0.74 0.74 1199
+ weighted avg 0.76 0.74 0.74 1199
+ ```
+
+    ✅ Learn about [K-Neighbors](https://scikit-learn.org/stable/modules/neighbors.html#neighbors)
+
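+## Support Vector Classifier
+
+Support-vector classifiers are part of the [support-vector machine](https://wikipedia.org/wiki/Support-vector_machine) family of ML methods, which are used for classification and regression tasks. SVMs "map training examples to points in space" so as to maximize the distance between two categories. Subsequent data are mapped into this space so that their category can be predicted.
+
+### Exercise - apply a Support Vector Classifier
+
+Let's try for better accuracy with a Support Vector Classifier.
+
+1. Add a comma after the K-Neighbors entry, and then add this line: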
+
+ ```python
+ 'SVC': SVC(),
+ ```
+
+    The result is quite good!
+
+ ```output
+ Accuracy (train) for SVC: 83.2%
+ precision recall f1-score support
+
+ chinese 0.79 0.74 0.76 242
+ indian 0.88 0.90 0.89 234
+ japanese 0.87 0.81 0.84 254
+ korean 0.91 0.82 0.86 242
+ thai 0.74 0.90 0.81 227
+
+ accuracy 0.83 1199
+ macro avg 0.84 0.83 0.83 1199
+ weighted avg 0.84 0.83 0.83 1199
+ ```
+
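+    ✅ Learn about [Support-Vectors](https://scikit-learn.org/stable/modules/svm.html#svm)
+
+## Ensemble classifiers
+
+Let's follow the path to the very end, even though the previous test was quite good. Try some ensemble classifiers, specifically Random Forest and AdaBoost: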
+
+```python
+ 'RFST': RandomForestClassifier(n_estimators=100),
+ 'ADA': AdaBoostClassifier(n_estimators=100)
+```
+
+The result is very good, especially for Random Forest:
+
+```output
+Accuracy (train) for RFST: 84.5%
+ precision recall f1-score support
+
+ chinese 0.80 0.77 0.78 242
+ indian 0.89 0.92 0.90 234
+ japanese 0.86 0.84 0.85 254
+ korean 0.88 0.83 0.85 242
+ thai 0.80 0.87 0.83 227
+
+ accuracy 0.84 1199
+ macro avg 0.85 0.85 0.84 1199
+weighted avg 0.85 0.84 0.84 1199
+
+Accuracy (train) for ADA: 72.4%
+ precision recall f1-score support
+
+ chinese 0.64 0.49 0.56 242
+ indian 0.91 0.83 0.87 234
+ japanese 0.68 0.69 0.69 254
+ korean 0.73 0.79 0.76 242
+ thai 0.67 0.83 0.74 227
+
+ accuracy 0.72 1199
+ macro avg 0.73 0.73 0.72 1199
+weighted avg 0.73 0.72 0.72 1199
+```
+
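+✅ Learn about [Ensemble Classifiers](https://scikit-learn.org/stable/modules/ensemble.html)
+
+This method of machine learning "combines the predictions of several base estimators" to improve the model's quality. In our example, we used Random Forest and AdaBoost.
+
+- [Random Forest](https://scikit-learn.org/stable/modules/ensemble.html#forest), an averaging method, builds a "forest" of decision trees infused with randomness to avoid overfitting. The `n_estimators` parameter sets the number of trees.
+
+- [AdaBoost](https://scikit-learn.org/stable/modules/generated/sklearn.ensemble.AdaBoostClassifier.html) fits a classifier to the dataset and then fits copies of that classifier to the same dataset. It focuses on the weights of incorrectly classified items and adjusts the fit of the next classifier to correct them.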
+
+---
+
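+## 🚀Challenge
+
+Each of these techniques has a large number of parameters that you can tweak. Research each one's default parameters and think about what tweaking them would mean for the model's quality.
+
+## [Post-lecture quiz](https://ff-quizzes.netlify.app/en/ml/)
+
+## Review & Self Study
+
+There is a lot of jargon in these lessons, so take a minute to review [this glossary](https://docs.microsoft.com/dotnet/machine-learning/resources/glossary?WT.mc_id=academic-77952-leestott) of useful terminology!
+
+## Assignment
+
+[Parameter play](assignment.md)
+
+---
+
+**Disclaimer**:
+This document was translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The document in its original language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.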
\ No newline at end of file
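
Reviewer note on the K-Neighbors section of the README above: the voting rule behind a k-nearest-neighbors classifier can be sketched in a few lines of plain Python. This is an illustrative sketch under stated assumptions, not scikit-learn's implementation. The toy ingredient data and the names `hamming` and `knn_predict` are made up for illustration. Hamming distance is used because the features are 0/1 flags; for such vectors it orders neighbors the same way Euclidean distance does.

```python
from collections import Counter


def hamming(a, b):
    """Number of positions where two equal-length 0/1 vectors differ."""
    return sum(x != y for x, y in zip(a, b))


def knn_predict(train_X, train_y, sample, k=3):
    """Label `sample` by majority vote among its k nearest training rows."""
    # Sort training indices by distance to the sample; ties keep input order.
    nearest = sorted(range(len(train_X)), key=lambda i: hamming(train_X[i], sample))[:k]
    votes = Counter(train_y[i] for i in nearest)
    return votes.most_common(1)[0][0]


# Toy data, made up for illustration.
# Columns: soy_sauce, cumin, coconut_milk, sesame_oil
train_X = [
    [1, 0, 0, 1],  # korean
    [1, 0, 0, 1],  # korean
    [0, 1, 0, 0],  # indian
    [0, 1, 1, 0],  # indian
    [0, 0, 1, 0],  # thai
]
train_y = ["korean", "korean", "indian", "indian", "thai"]

print(knn_predict(train_X, train_y, [1, 0, 0, 0], k=3))
```

Changing `k` changes how many neighbors vote, which is the role played by the first argument of `KNeighborsClassifier` in the lesson's code.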
diff --git a/translations/zh-CN/4-Classification/3-Classifiers-2/assignment.md b/translations/zh-CN/4-Classification/3-Classifiers-2/assignment.md
new file mode 100644
index 000000000..1b6bd7806
--- /dev/null
+++ b/translations/zh-CN/4-Classification/3-Classifiers-2/assignment.md
@@ -0,0 +1,16 @@
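+# Parameter play
+
+## Instructions
+
+Many parameters are set by default when you work with these classifiers. IntelliSense in VS Code can help you dig into them. Pick one of the ML classification techniques from this lesson and retrain the model while tweaking various parameter values. Build a notebook explaining why some changes improve model quality while others degrade it. Be detailed in your answer.
+
+## Rubric
+
+| Criteria | Exemplary | Adequate | Needs Improvement |
+| -------- | --------- | -------- | ----------------- |
+|          | A notebook is presented with a fully built classifier, with its parameters tweaked and the changes explained in text boxes | A notebook is partially presented or poorly explained | A notebook is buggy or flawed |
+
+---
+
+**Disclaimer**:
+This document was translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The document in its original language should be considered the authoritative source. For important information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.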
\ No newline at end of file
diff --git a/translations/zh-CN/4-Classification/3-Classifiers-2/notebook.ipynb b/translations/zh-CN/4-Classification/3-Classifiers-2/notebook.ipynb
new file mode 100644
index 000000000..cb14fc7aa
--- /dev/null
+++ b/translations/zh-CN/4-Classification/3-Classifiers-2/notebook.ipynb
@@ -0,0 +1,165 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Build classification models\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ " Unnamed: 0 cuisine almond angelica anise anise_seed apple \\\n",
+ "0 0 indian 0 0 0 0 0 \n",
+ "1 1 indian 1 0 0 0 0 \n",
+ "2 2 indian 0 0 0 0 0 \n",
+ "3 3 indian 0 0 0 0 0 \n",
+ "4 4 indian 0 0 0 0 0 \n",
+ "\n",
+ " apple_brandy apricot armagnac ... whiskey white_bread white_wine \\\n",
+ "0 0 0 0 ... 0 0 0 \n",
+ "1 0 0 0 ... 0 0 0 \n",
+ "2 0 0 0 ... 0 0 0 \n",
+ "3 0 0 0 ... 0 0 0 \n",
+ "4 0 0 0 ... 0 0 0 \n",
+ "\n",
+ " whole_grain_wheat_flour wine wood yam yeast yogurt zucchini \n",
+ "0 0 0 0 0 0 0 0 \n",
+ "1 0 0 0 0 0 0 0 \n",
+ "2 0 0 0 0 0 0 0 \n",
+ "3 0 0 0 0 0 0 0 \n",
+ "4 0 0 0 0 0 1 0 \n",
+ "\n",
+ "[5 rows x 382 columns]"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 9
+ }
+ ],
+ "source": [
+ "import pandas as pd\n",
+ "cuisines_df = pd.read_csv(\"../data/cleaned_cuisines.csv\")\n",
+ "cuisines_df.head()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "0 indian\n",
+ "1 indian\n",
+ "2 indian\n",
+ "3 indian\n",
+ "4 indian\n",
+ "Name: cuisine, dtype: object"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 10
+ }
+ ],
+ "source": [
+ "cuisines_label_df = cuisines_df['cuisine']\n",
+ "cuisines_label_df.head()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ " almond angelica anise anise_seed apple apple_brandy apricot \\\n",
+ "0 0 0 0 0 0 0 0 \n",
+ "1 1 0 0 0 0 0 0 \n",
+ "2 0 0 0 0 0 0 0 \n",
+ "3 0 0 0 0 0 0 0 \n",
+ "4 0 0 0 0 0 0 0 \n",
+ "\n",
+ " armagnac artemisia artichoke ... whiskey white_bread white_wine \\\n",
+ "0 0 0 0 ... 0 0 0 \n",
+ "1 0 0 0 ... 0 0 0 \n",
+ "2 0 0 0 ... 0 0 0 \n",
+ "3 0 0 0 ... 0 0 0 \n",
+ "4 0 0 0 ... 0 0 0 \n",
+ "\n",
+ " whole_grain_wheat_flour wine wood yam yeast yogurt zucchini \n",
+ "0 0 0 0 0 0 0 0 \n",
+ "1 0 0 0 0 0 0 0 \n",
+ "2 0 0 0 0 0 0 0 \n",
+ "3 0 0 0 0 0 0 0 \n",
+ "4 0 0 0 0 0 1 0 \n",
+ "\n",
+ "[5 rows x 380 columns]"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 11
+ }
+ ],
+ "source": [
+ "cuisines_feature_df = cuisines_df.drop(['Unnamed: 0', 'cuisine'], axis=1)\n",
+ "cuisines_feature_df.head()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
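+ "\n---\n\n**Disclaimer**: \nThis document was translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The document in its original language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.\n"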
+ ]
+ }
+ ],
+ "metadata": {
+ "interpreter": {
+ "hash": "70b38d7a306a849643e446cd70466270a13445e5987dfa1344ef2b127438fa4d"
+ },
+ "kernelspec": {
+ "name": "python3",
+ "display_name": "Python 3.7.0 64-bit ('3.7')"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.7.0"
+ },
+ "metadata": {
+ "interpreter": {
+ "hash": "70b38d7a306a849643e446cd70466270a13445e5987dfa1344ef2b127438fa4d"
+ }
+ },
+ "coopTranslator": {
+ "original_hash": "15a83277036572e0773229b5f21c1e12",
+ "translation_date": "2025-09-03T20:27:20+00:00",
+ "source_file": "4-Classification/3-Classifiers-2/notebook.ipynb",
+ "language_code": "zh"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
\ No newline at end of file
diff --git a/translations/zh-CN/4-Classification/3-Classifiers-2/solution/Julia/README.md b/translations/zh-CN/4-Classification/3-Classifiers-2/solution/Julia/README.md
new file mode 100644
index 000000000..f30fc4eeb
--- /dev/null
+++ b/translations/zh-CN/4-Classification/3-Classifiers-2/solution/Julia/README.md
@@ -0,0 +1,6 @@
+
+
+---
+
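+**Disclaimer**:
+This document was translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The document in its original language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.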
\ No newline at end of file
diff --git a/translations/zh-CN/4-Classification/3-Classifiers-2/solution/R/lesson_12-R.ipynb b/translations/zh-CN/4-Classification/3-Classifiers-2/solution/R/lesson_12-R.ipynb
new file mode 100644
index 000000000..97a1c66b1
--- /dev/null
+++ b/translations/zh-CN/4-Classification/3-Classifiers-2/solution/R/lesson_12-R.ipynb
@@ -0,0 +1,650 @@
+{
+ "nbformat": 4,
+ "nbformat_minor": 0,
+ "metadata": {
+ "colab": {
+ "name": "lesson_12-R.ipynb",
+ "provenance": [],
+ "collapsed_sections": []
+ },
+ "kernelspec": {
+ "name": "ir",
+ "display_name": "R"
+ },
+ "language_info": {
+ "name": "R"
+ },
+ "coopTranslator": {
+ "original_hash": "fab50046ca413a38939d579f8432274f",
+ "translation_date": "2025-09-03T20:31:31+00:00",
+ "source_file": "4-Classification/3-Classifiers-2/solution/R/lesson_12-R.ipynb",
+ "language_code": "zh"
+ }
+ },
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "jsFutf_ygqSx"
+ },
+ "source": [
+ "# Build a classification model: delicious Asian and Indian cuisines\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "HD54bEefgtNO"
+ },
+ "source": [
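+ "## Cuisine classifiers 2\n",
+ "\n",
+ "In this second classification lesson, we will explore `more ways` to classify categorical data. We will also learn about the consequences of choosing one classifier over another.\n",
+ "\n",
+ "### [**Pre-lecture quiz**](https://gray-sand-07a10f403.1.azurestaticapps.net/quiz/23/)\n",
+ "\n",
+ "### **Prerequisite**\n",
+ "\n",
+ "We assume that you have completed the previous lessons, since we will carry forward some of the concepts learned there.\n",
+ "\n",
+ "For this lesson, we will need the following packages:\n",
+ "\n",
+ "- `tidyverse`: The [tidyverse](https://www.tidyverse.org/) is a [collection of R packages](https://www.tidyverse.org/packages) designed to make data science faster, easier, and more fun!\n",
+ "\n",
+ "- `tidymodels`: The [tidymodels](https://www.tidymodels.org/) framework is a [collection of R packages](https://www.tidymodels.org/packages) for modeling and machine learning.\n",
+ "\n",
+ "- `themis`: The [themis package](https://themis.tidymodels.org/) provides extra recipe steps for dealing with unbalanced data.\n",
+ "\n",
+ "You can install them with:\n",
+ "\n",
+ "`install.packages(c(\"tidyverse\", \"tidymodels\", \"kernlab\", \"themis\", \"ranger\", \"xgboost\", \"kknn\"))`\n",
+ "\n",
+ "Alternatively, the script below checks whether you have the packages required to complete this module and installs any that are missing.\n"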
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "vZ57IuUxgyQt"
+ },
+ "source": [
+ "suppressWarnings(if (!require(\"pacman\"))install.packages(\"pacman\"))\n",
+ "\n",
+ "pacman::p_load(tidyverse, tidymodels, themis, kernlab, ranger, xgboost, kknn)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "z22M-pj4g07x"
+ },
+ "source": [
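+ "## **1. A classification map**\n",
+ "\n",
+ "In our [previous lesson](https://github.com/microsoft/ML-For-Beginners/tree/main/4-Classification/2-Classifiers-1), we tried to address the question: how do we choose between multiple models? To a great extent, it depends on the characteristics of the data and the type of problem we want to solve (for instance, classification or regression).\n",
+ "\n",
+ "Previously, we learned about the various options for classifying data using Microsoft's cheat sheet. Scikit-learn, a Python machine learning framework, offers a similar but more granular cheat sheet that can further help narrow down your choice of estimator (scikit-learn's general term for a model, classifiers included):\n",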
+ "\n",
+ "\n",
+ " \n",
+ " \n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "u1i3xRIVg7vG"
+ },
+ "source": [
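+ "> Tip: [visit this map online](https://scikit-learn.org/stable/tutorial/machine_learning_map/) and click along the path to read the documentation.\n",
+ ">\n",
+ "> The [Tidymodels reference site](https://www.tidymodels.org/find/parsnip/#models) also provides excellent documentation about different types of models.\n",
+ "\n",
+ "### **The plan** 🗺️\n",
+ "\n",
+ "This map is very helpful once you have a clear grasp of your data, because you can walk along its paths to a decision:\n",
+ "\n",
+ "- We have more than 50 samples\n",
+ "\n",
+ "- We want to predict a category\n",
+ "\n",
+ "- We have labeled data\n",
+ "\n",
+ "- We have fewer than 100K samples\n",
+ "\n",
+ "- ✨ We can choose a Linear SVC\n",
+ "\n",
+ "- If that doesn't work, since we have numeric data\n",
+ "\n",
+ "    - We can try a ✨ KNeighbors Classifier\n",
+ "\n",
+ "    - If that doesn't work, try ✨ SVC and ✨ Ensemble Classifiers\n",
+ "\n",
+ "This is a very helpful trail to follow. Now, let's get right into it using the [tidymodels](https://www.tidymodels.org/) modelling framework: a consistent and flexible collection of R packages developed to encourage good statistical practice 😊.\n",
+ "\n",
+ "## 2. Split the data and deal with an imbalanced data set\n",
+ "\n",
+ "From our previous lessons, we learned that there is a set of ingredients common to all our cuisines. The number of observations per cuisine is also quite unequal.\n",
+ "\n",
+ "We will deal with these issues by:\n",
+ "\n",
+ "- Dropping the most common ingredients that create confusion between distinct cuisines, using `dplyr::select()`.\n",
+ "\n",
+ "- Using a `recipe` that preprocesses the data to get it ready for modelling by applying an `over-sampling` algorithm.\n",
+ "\n",
+ "We already looked at both of these in the previous lesson, so this should be a breeze 🥳!\n"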
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "6tj_rN00hClA"
+ },
+ "source": [
+ "# Load the core Tidyverse and Tidymodels packages\n",
+ "library(tidyverse)\n",
+ "library(tidymodels)\n",
+ "\n",
+ "# Load the original cuisines data\n",
+ "df <- read_csv(file = \"https://raw.githubusercontent.com/microsoft/ML-For-Beginners/main/4-Classification/data/cuisines.csv\")\n",
+ "\n",
+ "# Drop id column, rice, garlic and ginger from our original data set\n",
+ "df_select <- df %>% \n",
+ " select(-c(1, rice, garlic, ginger)) %>%\n",
+ " # Encode cuisine column as categorical\n",
+ " mutate(cuisine = factor(cuisine))\n",
+ "\n",
+ "\n",
+ "# Create data split specification\n",
+ "set.seed(2056)\n",
+ "cuisines_split <- initial_split(data = df_select,\n",
+ " strata = cuisine,\n",
+ " prop = 0.7)\n",
+ "\n",
+ "# Extract the data in each split\n",
+ "cuisines_train <- training(cuisines_split)\n",
+ "cuisines_test <- testing(cuisines_split)\n",
+ "\n",
+ "# Display distribution of cuisines in the training set\n",
+ "cuisines_train %>% \n",
+ " count(cuisine) %>% \n",
+ " arrange(desc(n))"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "zFin5yw3hHb1"
+ },
+ "source": [
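+ "### Dealing with imbalanced data\n",
+ "\n",
+ "Imbalanced data often has a negative effect on model performance. Many models perform best when the number of observations per class is equal, and so tend to struggle with imbalanced data.\n",
+ "\n",
+ "There are two main ways of dealing with imbalanced data sets:\n",
+ "\n",
+ "- Adding observations to the minority class: `over-sampling`, e.g. using the SMOTE algorithm, which synthetically generates new examples of the minority class from the nearest neighbors of existing cases.\n",
+ "\n",
+ "- Removing observations from the majority class: `under-sampling`\n",
+ "\n",
+ "In our previous lesson, we demonstrated how to deal with imbalanced data sets using a `recipe`. A `recipe` can be thought of as a blueprint that describes which steps should be applied to a data set to get it ready for data analysis. In our case, we want an equal distribution in the number of cuisines in our `training set`. Let's get right into it.\n"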
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "cRzTnHolhLWd"
+ },
+ "source": [
+ "# Load themis package for dealing with imbalanced data\n",
+ "library(themis)\n",
+ "\n",
+ "# Create a recipe for preprocessing training data\n",
+ "cuisines_recipe <- recipe(cuisine ~ ., data = cuisines_train) %>%\n",
+ " step_smote(cuisine) \n",
+ "\n",
+ "# Print recipe\n",
+ "cuisines_recipe"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "KxOQ2ORhhO81"
+ },
+ "source": [
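+ "Now we are ready to train models 👩💻👨💻!\n",
+ "\n",
+ "## 3. Beyond multinomial regression models\n",
+ "\n",
+ "In our previous lesson, we looked at multinomial regression models. Now let's explore some more flexible models for classification.\n",
+ "\n",
+ "### Support Vector Machines\n",
+ "\n",
+ "In the context of classification, `Support Vector Machines` is a machine learning technique that tries to find a *hyperplane* that 'best' separates the classes. Let's look at a simple example:\n",
+ "\n",
+ "Figure: candidate separating hyperplanes H1, H2 and H3 (source: https://commons.wikimedia.org/w/index.php?curid=22877598)\n"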
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "C4Wsd0vZhXYu"
+ },
+ "source": [
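+ "H1 does not separate the classes. H2 does, but only with a small margin. H3 separates them with the maximal margin.\n",
+ "\n",
+ "#### Linear Support Vector Classifier\n",
+ "\n",
+ "The Support Vector Classifier (SVC) is a member of the Support Vector Machines (SVM) family of machine learning techniques. In SVC, the hyperplane is chosen to correctly separate `most` of the training observations, but it `may misclassify` a few of them. By allowing some points to be on the wrong side, the SVM becomes more robust to outliers and therefore generalizes better to new data. The parameter that regulates this violation is referred to as `cost`, which has a default value of 1 (see `help(\"svm_poly\")`).\n",
+ "\n",
+ "Let's create a linear SVC by setting `degree = 1` in a polynomial SVM model.\n"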
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "vJpp6nuChlBz"
+ },
+ "source": [
+ "# Make a linear SVC specification\n",
+ "svc_linear_spec <- svm_poly(degree = 1) %>% \n",
+ " set_engine(\"kernlab\") %>% \n",
+ " set_mode(\"classification\")\n",
+ "\n",
+ "# Bundle specification and recipe into a workflow\n",
+ "svc_linear_wf <- workflow() %>% \n",
+ " add_recipe(cuisines_recipe) %>% \n",
+ " add_model(svc_linear_spec)\n",
+ "\n",
+ "# Print out workflow\n",
+ "svc_linear_wf"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "rDs8cWNkhoqu"
+ },
+ "source": [
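+ "Now that we have captured the preprocessing steps and the model specification in a *workflow*, we can go ahead and train the linear SVC and evaluate the results while at it. For performance metrics, let's create a metric set that evaluates `accuracy`, `sensitivity`, `positive predictive value` and `F measure`.\n",
+ "\n",
+ "> `augment()` adds column(s) of predictions to the given data.\n"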
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "81wiqcwuhrnq"
+ },
+ "source": [
+ "# Train a linear SVC model\n",
+ "svc_linear_fit <- svc_linear_wf %>% \n",
+ " fit(data = cuisines_train)\n",
+ "\n",
+ "# Create a metric set\n",
+ "eval_metrics <- metric_set(ppv, sens, accuracy, f_meas)\n",
+ "\n",
+ "\n",
+ "# Make predictions and Evaluate model performance\n",
+ "svc_linear_fit %>% \n",
+ " augment(new_data = cuisines_test) %>% \n",
+ " eval_metrics(truth = cuisine, estimate = .pred_class)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "0UFQvHf-huo3"
+ },
+ "source": [
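+ "#### Support Vector Machine\n",
+ "\n",
+ "The support vector machine (SVM) is an extension of the support vector classifier that accommodates a non-linear boundary between the classes. In essence, SVMs use the *kernel trick* to enlarge the feature space so that it can adapt to non-linear relationships between classes. One popular and extremely flexible kernel function used by SVMs is the *radial basis function*. Let's see how it performs on our data.\n"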
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "-KX4S8mzhzmp"
+ },
+ "source": [
+ "set.seed(2056)\n",
+ "\n",
+ "# Make an RBF SVM specification\n",
+ "svm_rbf_spec <- svm_rbf() %>% \n",
+ " set_engine(\"kernlab\") %>% \n",
+ " set_mode(\"classification\")\n",
+ "\n",
+ "# Bundle specification and recipe into a workflow\n",
+ "svm_rbf_wf <- workflow() %>% \n",
+ " add_recipe(cuisines_recipe) %>% \n",
+ " add_model(svm_rbf_spec)\n",
+ "\n",
+ "\n",
+ "# Train an RBF model\n",
+ "svm_rbf_fit <- svm_rbf_wf %>% \n",
+ " fit(data = cuisines_train)\n",
+ "\n",
+ "\n",
+ "# Make predictions and Evaluate model performance\n",
+ "svm_rbf_fit %>% \n",
+ " augment(new_data = cuisines_test) %>% \n",
+ " eval_metrics(truth = cuisine, estimate = .pred_class)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "QBFSa7WSh4HQ"
+ },
+ "source": [
+ "Great 🤩!\n",
+ "\n",
+ "> ✅ Please see:\n",
+ ">\n",
+ "> - [*Support Vector Machines*](https://bradleyboehmke.github.io/HOML/svm.html), Hands-on Machine Learning with R\n",
+ ">\n",
+ "> - [*Support Vector Machines*](https://www.statlearning.com/), An Introduction to Statistical Learning with Applications in R\n",
+ ">\n",
+ "> for further reading.\n",
+ "\n",
+ "### Nearest Neighbor classifiers\n",
+ "\n",
+ "*K*-nearest neighbor (KNN) is an algorithm in which each observation is predicted based on its *similarity* to other observations.\n",
+ "\n",
+ "Let's fit one to our data.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "k4BxxBcdh9Ka"
+ },
+ "source": [
+ "# Make a KNN specification\n",
+ "knn_spec <- nearest_neighbor() %>% \n",
+ " set_engine(\"kknn\") %>% \n",
+ " set_mode(\"classification\")\n",
+ "\n",
+ "# Bundle recipe and model specification into a workflow\n",
+ "knn_wf <- workflow() %>% \n",
+ " add_recipe(cuisines_recipe) %>% \n",
+ " add_model(knn_spec)\n",
+ "\n",
+ "# Train a KNN model\n",
+ "knn_wf_fit <- knn_wf %>% \n",
+ " fit(data = cuisines_train)\n",
+ "\n",
+ "\n",
+ "# Make predictions and Evaluate model performance\n",
+ "knn_wf_fit %>% \n",
+ " augment(new_data = cuisines_test) %>% \n",
+ " eval_metrics(truth = cuisine, estimate = .pred_class)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "HaegQseriAcj"
+ },
+ "source": [
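+ "It appears that this model is not performing that well. Changing the model's arguments (see `help(\"nearest_neighbor\")`) may improve its performance. Be sure to try it out.\n",
+ "\n",
+ "> ✅ Please see:\n",
+ ">\n",
+ "> - [Hands-on Machine Learning with R](https://bradleyboehmke.github.io/HOML/)\n",
+ ">\n",
+ "> - [An Introduction to Statistical Learning with Applications in R](https://www.statlearning.com/)\n",
+ ">\n",
+ "> to learn more about *K*-nearest neighbor classifiers.\n",
+ "\n",
+ "### Ensemble classifiers\n",
+ "\n",
+ "Ensemble algorithms work by combining multiple base estimators to produce an optimal model, either by:\n",
+ "\n",
+ "`bagging`: applying an *averaging function* to a collection of base models\n",
+ "\n",
+ "`boosting`: building a sequence of models that build on one another to improve predictive performance.\n",
+ "\n",
+ "Let's start by trying out a Random Forest model, which builds a large collection of decision trees and then applies an averaging function to produce a better overall model.\n"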
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "49DPoVs6iK1M"
+ },
+ "source": [
+ "# Make a random forest specification\n",
+ "rf_spec <- rand_forest() %>% \n",
+ " set_engine(\"ranger\") %>% \n",
+ " set_mode(\"classification\")\n",
+ "\n",
+ "# Bundle recipe and model specification into a workflow\n",
+ "rf_wf <- workflow() %>% \n",
+ " add_recipe(cuisines_recipe) %>% \n",
+ " add_model(rf_spec)\n",
+ "\n",
+ "# Train a random forest model\n",
+ "rf_wf_fit <- rf_wf %>% \n",
+ " fit(data = cuisines_train)\n",
+ "\n",
+ "\n",
+ "# Make predictions and Evaluate model performance\n",
+ "rf_wf_fit %>% \n",
+ " augment(new_data = cuisines_test) %>% \n",
+ " eval_metrics(truth = cuisine, estimate = .pred_class)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "RGVYwC_aiUWc"
+ },
+ "source": [
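+ "Good job 👏!\n",
+ "\n",
+ "Let's also experiment with a Boosted Tree model.\n",
+ "\n",
+ "A Boosted Tree is an ensemble method that creates a series of sequential decision trees, where each tree depends on the results of the previous trees, in an attempt to incrementally reduce the error. It focuses on the weights of incorrectly classified items and adjusts the fit of the next classifier to correct them.\n",
+ "\n",
+ "There are different ways to fit this model (see `help(\"boost_tree\")`). In this example, we'll fit boosted trees via the `xgboost` engine.\n"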
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "Py1YWo-micWs"
+ },
+ "source": [
+ "# Make a boosted tree specification\n",
+ "boost_spec <- boost_tree(trees = 200) %>% \n",
+ " set_engine(\"xgboost\") %>% \n",
+ " set_mode(\"classification\")\n",
+ "\n",
+ "# Bundle recipe and model specification into a workflow\n",
+ "boost_wf <- workflow() %>% \n",
+ " add_recipe(cuisines_recipe) %>% \n",
+ " add_model(boost_spec)\n",
+ "\n",
+ "# Train a boosted tree model\n",
+ "boost_wf_fit <- boost_wf %>% \n",
+ " fit(data = cuisines_train)\n",
+ "\n",
+ "\n",
+ "# Make predictions and Evaluate model performance\n",
+ "boost_wf_fit %>% \n",
+ " augment(new_data = cuisines_test) %>% \n",
+ " eval_metrics(truth = cuisine, estimate = .pred_class)"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "zNQnbuejigZM"
+ },
+ "source": [
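+ "> ✅ Please see:\n",
+ ">\n",
+ "> - [Machine Learning for Social Scientists](https://cimentadaj.github.io/ml_socsci/tree-based-methods.html#random-forests)\n",
+ ">\n",
+ "> - [Hands-on Machine Learning with R](https://bradleyboehmke.github.io/HOML/)\n",
+ ">\n",
+ "> - [An Introduction to Statistical Learning with Applications in R](https://www.statlearning.com/)\n",
+ ">\n",
+ "> - The AdaBoost model, a good alternative to xgboost, is also worth exploring.\n",
+ ">\n",
+ "> to learn more about ensemble classifiers.\n",
+ "\n",
+ "## 4. Extra - comparing multiple models\n",
+ "\n",
+ "We have fitted quite a number of models in this lab 🙌. It can become tedious to create many workflows from different sets of preprocessors and/or model specifications and then calculate the performance metrics one by one.\n",
+ "\n",
+ "Let's see if we can address this by creating a function that fits a list of workflows on the training set and then returns the performance metrics based on the test set. We'll use `map()` and `map_dfr()` from the [purrr](https://purrr.tidyverse.org/) package to apply functions to each element of a list.\n",
+ "\n",
+ "> [`map()`](https://purrr.tidyverse.org/reference/map.html) functions let you replace many for loops with code that is both more succinct and easier to read. The best place to learn about the [`map()`](https://purrr.tidyverse.org/reference/map.html) functions is the [iteration chapter](http://r4ds.had.co.nz/iteration.html) of R for Data Science.\n"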
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "Qzb7LyZnimd2"
+ },
+ "source": [
+ "set.seed(2056)\n",
+ "\n",
+ "# Create a metric set\n",
+ "eval_metrics <- metric_set(ppv, sens, accuracy, f_meas)\n",
+ "\n",
+ "# Define a function that returns performance metrics\n",
+ "compare_models <- function(workflow_list, train_set, test_set){\n",
+ " \n",
+ " suppressWarnings(\n",
+ " # Fit each model to the train_set\n",
+ " map(workflow_list, fit, data = train_set) %>% \n",
+ " # Make predictions on the test set\n",
+ " map_dfr(augment, new_data = test_set, .id = \"model\") %>%\n",
+ " # Select desired columns\n",
+ " select(model, cuisine, .pred_class) %>% \n",
+ " # Evaluate model performance\n",
+ " group_by(model) %>% \n",
+ " eval_metrics(truth = cuisine, estimate = .pred_class) %>% \n",
+ " ungroup()\n",
+ " )\n",
+ " \n",
+ "} # End of function"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "Fwa712sNisDA"
+ },
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "3i4VJOi2iu-a"
+ },
+ "source": [
+ "# Make a list of workflows\n",
+ "workflow_list <- list(\n",
+ " \"svc\" = svc_linear_wf,\n",
+ " \"svm\" = svm_rbf_wf,\n",
+ " \"knn\" = knn_wf,\n",
+ " \"random_forest\" = rf_wf,\n",
+ " \"xgboost\" = boost_wf)\n",
+ "\n",
+ "# Call the function\n",
+ "set.seed(2056)\n",
+ "perf_metrics <- compare_models(workflow_list = workflow_list, train_set = cuisines_train, test_set = cuisines_test)\n",
+ "\n",
+ "# Print out performance metrics\n",
+ "perf_metrics %>% \n",
+ " group_by(.metric) %>% \n",
+ " arrange(desc(.estimate)) %>% \n",
+ " slice_head(n=7)\n",
+ "\n",
+ "# Compare accuracy\n",
+ "perf_metrics %>% \n",
+ " filter(.metric == \"accuracy\") %>% \n",
+ " arrange(desc(.estimate))\n"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "KuWK_lEli4nW"
+ },
+ "source": [
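+ "The [**workflowset**](https://workflowsets.tidymodels.org/) package lets users create and easily fit a large number of models, but it is mostly designed to work with resampling techniques such as `cross-validation`, an approach we are yet to cover.\n",
+ "\n",
+ "## **🚀Challenge**\n",
+ "\n",
+ "Each of these techniques has a large number of parameters that you can tweak, for instance `cost` in SVMs, `neighbors` in KNN and `mtry` (randomly selected predictors) in Random Forest.\n",
+ "\n",
+ "Research each model's default parameters and think about what tweaking these parameters would mean for the model's quality.\n",
+ "\n",
+ "To find out more about a particular model and its parameters, use `help(\"model\")`, e.g. `help(\"rand_forest\")`\n",
+ "\n",
+ "> In practice, we usually *estimate* the *best values* for these parameters by training many models on a `simulated data set` and measuring how well all these models perform. This process is called **tuning**.\n",
+ "\n",
+ "### [**Post-lecture quiz**](https://gray-sand-07a10f403.1.azurestaticapps.net/quiz/24/)\n",
+ "\n",
+ "### **Review & Self Study**\n",
+ "\n",
+ "There's a lot of jargon in these lessons, so take a minute to review [this list](https://docs.microsoft.com/dotnet/machine-learning/resources/glossary?WT.mc_id=academic-77952-leestott) of useful terminology!\n",
+ "\n",
+ "#### Thank you to:\n",
+ "\n",
+ "[`Allison Horst`](https://twitter.com/allison_horst/) for creating the amazing illustrations that make R more welcoming and engaging. Find more illustrations in her [gallery](https://www.google.com/url?q=https://github.com/allisonhorst/stats-illustrations&sa=D&source=editors&ust=1626380772530000&usg=AOvVaw3zcfyCizFQZpkSLzxiiQEM).\n",
+ "\n",
+ "[Cassie Breviu](https://www.twitter.com/cassieview) and [Jen Looper](https://www.twitter.com/jenlooper) for creating the original Python version of this module ♥️\n",
+ "\n",
+ "Happy learning,\n",
+ "\n",
+ "[Eric](https://twitter.com/ericntay), Gold Microsoft Learn Student Ambassador.\n",
+ "\n",
+ "Illustration by @allison_horst\n"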
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "\n---\n\n**免责声明**: \n本文档使用AI翻译服务[Co-op Translator](https://github.com/Azure/co-op-translator)进行翻译。尽管我们努力确保翻译的准确性,但请注意,自动翻译可能包含错误或不准确之处。原始语言的文档应被视为权威来源。对于关键信息,建议使用专业人工翻译。我们不对因使用此翻译而产生的任何误解或误读承担责任。\n"
+ ]
+ }
+ ]
+}
\ No newline at end of file
diff --git a/translations/zh-CN/4-Classification/3-Classifiers-2/solution/notebook.ipynb b/translations/zh-CN/4-Classification/3-Classifiers-2/solution/notebook.ipynb
new file mode 100644
index 000000000..59bf6f5f0
--- /dev/null
+++ b/translations/zh-CN/4-Classification/3-Classifiers-2/solution/notebook.ipynb
@@ -0,0 +1,304 @@
+{
+ "cells": [
+ {
+ "source": [
+ "# Build More Classification Models\n"
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ " Unnamed: 0 cuisine almond angelica anise anise_seed apple \\\n",
+ "0 0 indian 0 0 0 0 0 \n",
+ "1 1 indian 1 0 0 0 0 \n",
+ "2 2 indian 0 0 0 0 0 \n",
+ "3 3 indian 0 0 0 0 0 \n",
+ "4 4 indian 0 0 0 0 0 \n",
+ "\n",
+ " apple_brandy apricot armagnac ... whiskey white_bread white_wine \\\n",
+ "0 0 0 0 ... 0 0 0 \n",
+ "1 0 0 0 ... 0 0 0 \n",
+ "2 0 0 0 ... 0 0 0 \n",
+ "3 0 0 0 ... 0 0 0 \n",
+ "4 0 0 0 ... 0 0 0 \n",
+ "\n",
+ " whole_grain_wheat_flour wine wood yam yeast yogurt zucchini \n",
+ "0 0 0 0 0 0 0 0 \n",
+ "1 0 0 0 0 0 0 0 \n",
+ "2 0 0 0 0 0 0 0 \n",
+ "3 0 0 0 0 0 0 0 \n",
+ "4 0 0 0 0 0 1 0 \n",
+ "\n",
+ "[5 rows x 382 columns]"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 1
+ }
+ ],
+ "source": [
+ "import pandas as pd\n",
+ "cuisines_df = pd.read_csv(\"../../data/cleaned_cuisines.csv\")\n",
+ "cuisines_df.head()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "0 indian\n",
+ "1 indian\n",
+ "2 indian\n",
+ "3 indian\n",
+ "4 indian\n",
+ "Name: cuisine, dtype: object"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 2
+ }
+ ],
+ "source": [
+ "cuisines_label_df = cuisines_df['cuisine']\n",
+ "cuisines_label_df.head()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ " almond angelica anise anise_seed apple apple_brandy apricot \\\n",
+ "0 0 0 0 0 0 0 0 \n",
+ "1 1 0 0 0 0 0 0 \n",
+ "2 0 0 0 0 0 0 0 \n",
+ "3 0 0 0 0 0 0 0 \n",
+ "4 0 0 0 0 0 0 0 \n",
+ "\n",
+ " armagnac artemisia artichoke ... whiskey white_bread white_wine \\\n",
+ "0 0 0 0 ... 0 0 0 \n",
+ "1 0 0 0 ... 0 0 0 \n",
+ "2 0 0 0 ... 0 0 0 \n",
+ "3 0 0 0 ... 0 0 0 \n",
+ "4 0 0 0 ... 0 0 0 \n",
+ "\n",
+ " whole_grain_wheat_flour wine wood yam yeast yogurt zucchini \n",
+ "0 0 0 0 0 0 0 0 \n",
+ "1 0 0 0 0 0 0 0 \n",
+ "2 0 0 0 0 0 0 0 \n",
+ "3 0 0 0 0 0 0 0 \n",
+ "4 0 0 0 0 0 1 0 \n",
+ "\n",
+ "[5 rows x 380 columns]"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 3
+ }
+ ],
+ "source": [
+ "cuisines_feature_df = cuisines_df.drop(['Unnamed: 0', 'cuisine'], axis=1)\n",
+ "cuisines_feature_df.head()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "# Try Different Classifiers\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from sklearn.neighbors import KNeighborsClassifier\n",
+ "from sklearn.linear_model import LogisticRegression\n",
+ "from sklearn.svm import SVC\n",
+ "from sklearn.ensemble import RandomForestClassifier, AdaBoostClassifier\n",
+ "from sklearn.model_selection import train_test_split, cross_val_score\n",
+ "from sklearn.metrics import accuracy_score,precision_score,confusion_matrix,classification_report, precision_recall_curve\n",
+ "import numpy as np"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "X_train, X_test, y_train, y_test = train_test_split(cuisines_feature_df, cuisines_label_df, test_size=0.3)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "\n",
+ "C = 10\n",
+ "# Create different classifiers.\n",
+ "classifiers = {\n",
+ " 'Linear SVC': SVC(kernel='linear', C=C, probability=True,random_state=0),\n",
+ " 'KNN classifier': KNeighborsClassifier(C),\n",
+ " 'SVC': SVC(),\n",
+ " 'RFST': RandomForestClassifier(n_estimators=100),\n",
+ " 'ADA': AdaBoostClassifier(n_estimators=100)\n",
+ " \n",
+ "}\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Accuracy (train) for Linear SVC: 76.4% \n",
+ " precision recall f1-score support\n",
+ "\n",
+ " chinese 0.64 0.66 0.65 242\n",
+ " indian 0.91 0.86 0.89 236\n",
+ " japanese 0.72 0.73 0.73 245\n",
+ " korean 0.83 0.75 0.79 234\n",
+ " thai 0.75 0.82 0.78 242\n",
+ "\n",
+ " accuracy 0.76 1199\n",
+ " macro avg 0.77 0.76 0.77 1199\n",
+ "weighted avg 0.77 0.76 0.77 1199\n",
+ "\n",
+ "Accuracy (train) for KNN classifier: 70.7% \n",
+ " precision recall f1-score support\n",
+ "\n",
+ " chinese 0.65 0.63 0.64 242\n",
+ " indian 0.84 0.81 0.82 236\n",
+ " japanese 0.60 0.81 0.69 245\n",
+ " korean 0.89 0.53 0.67 234\n",
+ " thai 0.69 0.75 0.72 242\n",
+ "\n",
+ " accuracy 0.71 1199\n",
+ " macro avg 0.73 0.71 0.71 1199\n",
+ "weighted avg 0.73 0.71 0.71 1199\n",
+ "\n",
+ "Accuracy (train) for SVC: 80.1% \n",
+ " precision recall f1-score support\n",
+ "\n",
+ " chinese 0.71 0.69 0.70 242\n",
+ " indian 0.92 0.92 0.92 236\n",
+ " japanese 0.77 0.78 0.77 245\n",
+ " korean 0.87 0.77 0.82 234\n",
+ " thai 0.75 0.86 0.80 242\n",
+ "\n",
+ " accuracy 0.80 1199\n",
+ " macro avg 0.80 0.80 0.80 1199\n",
+ "weighted avg 0.80 0.80 0.80 1199\n",
+ "\n",
+ "Accuracy (train) for RFST: 82.8% \n",
+ " precision recall f1-score support\n",
+ "\n",
+ " chinese 0.80 0.75 0.77 242\n",
+ " indian 0.90 0.91 0.90 236\n",
+ " japanese 0.82 0.78 0.80 245\n",
+ " korean 0.85 0.82 0.83 234\n",
+ " thai 0.78 0.89 0.83 242\n",
+ "\n",
+ " accuracy 0.83 1199\n",
+ " macro avg 0.83 0.83 0.83 1199\n",
+ "weighted avg 0.83 0.83 0.83 1199\n",
+ "\n",
+ "Accuracy (train) for ADA: 71.1% \n",
+ " precision recall f1-score support\n",
+ "\n",
+ " chinese 0.60 0.57 0.58 242\n",
+ " indian 0.87 0.84 0.86 236\n",
+ " japanese 0.71 0.60 0.65 245\n",
+ " korean 0.68 0.78 0.72 234\n",
+ " thai 0.70 0.78 0.74 242\n",
+ "\n",
+ " accuracy 0.71 1199\n",
+ " macro avg 0.71 0.71 0.71 1199\n",
+ "weighted avg 0.71 0.71 0.71 1199\n",
+ "\n"
+ ]
+ }
+ ],
+ "source": [
+ "n_classifiers = len(classifiers)\n",
+ "\n",
+ "for index, (name, classifier) in enumerate(classifiers.items()):\n",
+ " classifier.fit(X_train, np.ravel(y_train))\n",
+ "\n",
+ " y_pred = classifier.predict(X_test)\n",
+ " accuracy = accuracy_score(y_test, y_pred)\n",
+ " print(\"Accuracy (train) for %s: %0.1f%% \" % (name, accuracy * 100))\n",
+ " print(classification_report(y_test,y_pred))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "\n---\n\n**免责声明**: \n本文档使用AI翻译服务[Co-op Translator](https://github.com/Azure/co-op-translator)进行翻译。尽管我们努力确保翻译的准确性,但请注意,自动翻译可能包含错误或不准确之处。原始语言的文档应被视为权威来源。对于关键信息,建议使用专业人工翻译。我们对因使用此翻译而产生的任何误解或误读不承担责任。\n"
+ ]
+ }
+ ],
+ "metadata": {
+ "interpreter": {
+ "hash": "70b38d7a306a849643e446cd70466270a13445e5987dfa1344ef2b127438fa4d"
+ },
+ "kernelspec": {
+ "name": "python3",
+ "display_name": "Python 3.7.0 64-bit ('3.7')"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.7.0"
+ },
+ "metadata": {
+ "interpreter": {
+ "hash": "70b38d7a306a849643e446cd70466270a13445e5987dfa1344ef2b127438fa4d"
+ }
+ },
+ "coopTranslator": {
+ "original_hash": "7ea2b714669c823a596d986ba2d5739f",
+ "translation_date": "2025-09-03T20:27:44+00:00",
+ "source_file": "4-Classification/3-Classifiers-2/solution/notebook.ipynb",
+ "language_code": "zh"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 4
+}
\ No newline at end of file
diff --git a/translations/zh-CN/4-Classification/4-Applied/README.md b/translations/zh-CN/4-Classification/4-Applied/README.md
new file mode 100644
index 000000000..16e114d3b
--- /dev/null
+++ b/translations/zh-CN/4-Classification/4-Applied/README.md
@@ -0,0 +1,320 @@
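+# Build a Cuisine Recommender Web App
+
+In this lesson, you will build a classification model using some of the techniques you learned in previous lessons and the cuisine dataset used throughout this series. In addition, you will build a small web app that uses the saved model through Onnx's web runtime.
+
+One of the most useful practical applications of machine learning is building recommendation systems, and you can take the first step in that direction today!
+
+[Applied ML video](https://youtu.be/17wdM9AHMfg "Applied ML")
+
+> 🎥 Follow the link above for a video: Jen Looper builds a web app using classified cuisine data
+
+## [Pre-lecture quiz](https://ff-quizzes.netlify.app/en/ml/)
+
+In this lesson you will learn:
+
+- How to build a model and save it as an Onnx model
+- How to use Netron to inspect the model
+- How to use your model in a web app for inference
+
+## Build your model
+
+Building applied ML systems is an important part of bringing these technologies into your business systems. By using Onnx, you can use models within your web applications (and therefore offline, if needed).
+
+In a [previous lesson](../../3-Web-App/1-Web-App/README.md), you built a regression model about UFO sightings, "pickled" it, and used it in a Flask app. While this architecture is very useful to know, it is a full-stack Python app, and your requirements may include a JavaScript application.
+
+In this lesson, you can build a basic JavaScript-based system for inference. First, however, you need to train a model and convert it to the Onnx format.
+
+## Exercise - train a classification model
+
+First, train a classification model using the cleaned cuisines dataset from earlier lessons.
+
+1. Start by importing useful libraries: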
+
+ ```python
+ !pip install skl2onnx
+ import pandas as pd
+ ```
+
+ You need '[skl2onnx](https://onnx.ai/sklearn-onnx/)' to help convert your Scikit-learn model into Onnx format.
+
+1. Then, work with your data as you did in previous lessons, by reading the CSV file with `read_csv()`:
+
+ ```python
+ data = pd.read_csv('../data/cleaned_cuisines.csv')
+ data.head()
+ ```
+
+1. Remove the first two unnecessary columns and save the remaining data as 'X':
+
+ ```python
+ X = data.iloc[:,2:]
+ X.head()
+ ```
+
+1. Save the labels as 'y':
+
+ ```python
+ y = data[['cuisine']]
+ y.head()
+
+ ```
+
+### Commence the training routine
+
+We will use the 'SVC' library, which has good accuracy.
+
+1. Import the appropriate libraries from Scikit-learn:
+
+ ```python
+ from sklearn.model_selection import train_test_split
+ from sklearn.svm import SVC
+ from sklearn.model_selection import cross_val_score
+ from sklearn.metrics import accuracy_score,precision_score,confusion_matrix,classification_report
+ ```
+
+1. Separate the training and test sets:
+
+ ```python
+ X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3)
+ ```
+
+1. Build an SVC classification model as you did in the previous lesson:
+
+ ```python
+ model = SVC(kernel='linear', C=10, probability=True,random_state=0)
+ model.fit(X_train,y_train.values.ravel())
+ ```
+
+1. Now, test your model by calling `predict()`:
+
+ ```python
+ y_pred = model.predict(X_test)
+ ```
+
+1. Print out a classification report to check the model's quality:
+
+ ```python
+ print(classification_report(y_test,y_pred))
+ ```
+
+ As we saw before, the accuracy is good:
+
+ ```output
+ precision recall f1-score support
+
+ chinese 0.72 0.69 0.70 257
+ indian 0.91 0.87 0.89 243
+ japanese 0.79 0.77 0.78 239
+ korean 0.83 0.79 0.81 236
+ thai 0.72 0.84 0.78 224
+
+ accuracy 0.79 1199
+ macro avg 0.79 0.79 0.79 1199
+ weighted avg 0.79 0.79 0.79 1199
+ ```
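To make the columns of the classification report easier to read, here is a minimal sketch in plain Python of how precision, recall and F1 can be computed per class. The helper name `per_class_report` and the toy labels are hypothetical and only illustrate the arithmetic; this is not Scikit-learn's own implementation.

```python
from collections import Counter


def per_class_report(y_true, y_pred):
    """Return {label: (precision, recall, f1, support)} from raw predictions."""
    true_pos, pred_count, true_count = Counter(), Counter(), Counter()
    for truth, pred in zip(y_true, y_pred):
        true_count[truth] += 1   # support: how often the class really occurs
        pred_count[pred] += 1    # how often the class was predicted
        if truth == pred:
            true_pos[truth] += 1
    report = {}
    for label in sorted(true_count):
        precision = true_pos[label] / pred_count[label] if pred_count[label] else 0.0
        recall = true_pos[label] / true_count[label]
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = (precision, recall, f1, true_count[label])
    return report


# Hypothetical toy labels, not taken from the cuisines data
y_true = ["thai", "thai", "indian", "indian", "korean"]
y_pred = ["thai", "indian", "indian", "indian", "korean"]

for label, (p, r, f1, n) in per_class_report(y_true, y_pred).items():
    print(f"{label:>8}  precision={p:.2f}  recall={r:.2f}  f1={f1:.2f}  support={n}")
```

Precision answers "of everything predicted as this cuisine, how much was right?", while recall answers "of everything that really is this cuisine, how much was found?".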
+
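+### Convert your model to Onnx
+
+Make sure the conversion uses the proper tensor size. This dataset lists 380 ingredients, so you need to state that number in `FloatTensorType`:
+
+1. Convert using a tensor size of 380.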
+
+ ```python
+ from skl2onnx import convert_sklearn
+ from skl2onnx.common.data_types import FloatTensorType
+
+ initial_type = [('float_input', FloatTensorType([None, 380]))]
+ options = {id(model): {'nocl': True, 'zipmap': False}}
+ ```
+
+1. Create the onx and store it as a file **model.onnx**:
+
+ ```python
+ onx = convert_sklearn(model, initial_types=initial_type, options=options)
+ with open("./model.onnx", "wb") as f:
+ f.write(onx.SerializeToString())
+ ```
+
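+ > Note that you can pass [options](https://onnx.ai/sklearn-onnx/parameterized.html) to your conversion script. In this case, we set 'nocl' to True and 'zipmap' to False. Since this is a classification model, you have the option to remove ZipMap, which produces a list of dictionaries (not necessary). `nocl` refers to class information being included in the model. Reduce your model's size by setting `nocl` to 'True'.
+
+Running the entire notebook will now build an Onnx model and save it to this folder.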
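The width passed to `FloatTensorType` has to match the number of feature columns left after the first two columns are dropped. As a hedged sketch, the helper below (the name `count_feature_columns` and the three-ingredient sample header are hypothetical) derives that number from a CSV header using only the standard library, so it can be compared with the hard-coded 380.

```python
import csv
import io


def count_feature_columns(csv_text: str, n_leading_non_features: int = 2) -> int:
    """Count ingredient columns in a CSV header, skipping the leading non-feature columns."""
    header = next(csv.reader(io.StringIO(csv_text)))
    return len(header) - n_leading_non_features


# Hypothetical miniature file shaped like cleaned_cuisines.csv:
# an index column, the cuisine label, then one column per ingredient.
sample = "Unnamed: 0,cuisine,almond,angelica,anise\n0,indian,0,0,0\n"

n_features = count_feature_columns(sample)
print(n_features)  # the width to use in FloatTensorType([None, n_features])
```

In the notebook itself, `X.shape[1]` gives the same number directly once `X` has been created.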
+
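+## View your model
+
+Onnx models are not easy to inspect in Visual Studio Code, but there is a very good free tool that many researchers use to visualize a model and confirm that it was built properly. Download [Netron](https://github.com/lutzroeder/Netron) and open your model.onnx file. You can see your simple model visualized, with its 380 inputs and the classifier listed:
+
+
+
+Netron is a helpful tool for viewing your models.
+
+Now you are ready to use this neat model in a web app. Let's build an app that will come in handy when you look into your refrigerator and try to work out which combination of leftover ingredients you can use to cook a given cuisine.
+
+## Build a recommender web application
+
+You can use your model directly in a web app. This architecture also lets you run it locally and even offline if needed. Start by creating an `index.html` file in the same folder where you stored your `model.onnx` file.
+
+1. In this file _index.html_, add the following markup: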
+
+ ```html
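+ <!DOCTYPE html>
+ <html>
+     <head>
+         <title>Cuisine Recommender</title>
+     </head>
+     <body>
+         ...
+     </body>
+ </html>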
+ ```
+
+1. Now, working within the `body` tags, add a little markup to show a list of checkboxes for some ingredients:
+
+ ```html
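+ <h1>Check your refrigerator. What can you create?</h1>
+ <div>
+     <input type="checkbox" value="4" id="apple">
+     <label for="apple">apple</label>
+     ...
+ </div>
+ <button onclick="startInference()">What kind of cuisine can you make?</button>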
+ ```
+
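+ Notice that each checkbox is given a value. This reflects the index at which the ingredient is found in the dataset. Apple, for example, occupies the fifth column in this alphabetical list, so its value is '4' since we start counting at 0. You can consult the [ingredients spreadsheet](../../../../4-Classification/data/ingredient_indexes.csv) to find a given ingredient's index.
+
+ Continuing your work in the index.html file, add a script block, in which the model is called, after the final closing `</div>`.
+
+1. First, import the [Onnx Runtime](https://www.onnxruntime.ai/):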
+
+ ```html
+ <script src="https://cdn.jsdelivr.net/npm/onnxruntime-web/dist/ort.min.js"></script>
+ ```
+
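+ > Onnx Runtime is used to run your Onnx models across a wide range of hardware platforms, and includes optimizations and an API to use.
+
+1. Once the Runtime is in place, you can call it: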
+
+ ```html
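+ <script>
+     // 380 values (1 or 0), one per ingredient, sent to the model for inference
+     const ingredients = Array(380).fill(0);
+     const checks = [...document.querySelectorAll('input[type="checkbox"]')];
+
+     // On startup, update the array whenever a checkbox is toggled
+     function init() {
+         checks.forEach(check => {
+             check.addEventListener('change', () => {
+                 ingredients[check.value] = check.checked ? 1 : 0;
+             });
+         });
+     }
+
+     // True when at least one checkbox is checked
+     function testCheckboxes() {
+         return checks.some(check => check.checked);
+     }
+
+     async function startInference() {
+         if (!testCheckboxes()) {
+             alert('Please select at least one ingredient.');
+             return;
+         }
+         try {
+             // Load the model asynchronously
+             const session = await ort.InferenceSession.create('./model.onnx');
+             // Build the tensor and the feeds for the float_input input
+             const input = new ort.Tensor('float32', Float32Array.from(ingredients), [1, 380]);
+             const feeds = { float_input: input };
+             // Send the feeds to the model and wait for the response
+             const results = await session.run(feeds);
+             // 'label' is the usual skl2onnx classifier output name; confirm it in Netron
+             alert('You can enjoy ' + results.label.data[0] + ' cuisine today!');
+         } catch (e) {
+             console.error('Failed to run the ONNX model', e);
+         }
+     }
+
+     init();
+ </script>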
+ ```
+
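+In this code, several things are happening:
+
+1. You created an array of 380 possible values (1 or 0) to be set and sent to the model for inference, depending on whether an ingredient checkbox is checked.
+2. You created an array of checkboxes and an `init` function, called when the application starts, that determines whether they are checked. When a checkbox is checked, the `ingredients` array is updated to reflect the chosen ingredient.
+3. You created a `testCheckboxes` function that checks whether any checkbox is checked.
+4. You use the `startInference` function when the button is pressed and, if any checkbox is checked, you start inference.
+5. The inference routine includes:
+ 1. Setting up an asynchronous load of the model
+ 2. Creating a Tensor structure to send to the model
+ 3. Creating 'feeds' that reflect the `float_input` input you created when training your model (you can use Netron to verify that name)
+ 4. Sending these 'feeds' to the model and waiting for a response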
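The same flow can be sketched in Python, which can be handy for sanity-checking `model.onnx` before wiring up the page. This is a sketch under stated assumptions rather than part of the lesson: the helper names `make_input` and `predict_cuisine` are hypothetical, `onnxruntime` and `numpy` are assumed to be installed separately, and the first model output is assumed to be the predicted label.

```python
from typing import Iterable, List

N_FEATURES = 380  # number of ingredient columns stated in the lesson


def make_input(active_indexes: Iterable[int], n_features: int = N_FEATURES) -> List[List[float]]:
    """Build a 1 x n_features row with 1.0 at each selected ingredient index."""
    row = [0.0] * n_features
    for idx in active_indexes:
        if not 0 <= idx < n_features:
            raise IndexError(f"ingredient index {idx} is outside 0..{n_features - 1}")
        row[idx] = 1.0
    return [row]


def predict_cuisine(model_path: str, active_indexes: Iterable[int]):
    """Run the saved model once on the selected ingredients."""
    # Imported lazily so make_input() works without these optional packages.
    import numpy as np
    import onnxruntime as ort

    session = ort.InferenceSession(model_path, providers=["CPUExecutionProvider"])
    # The feed name must match the input name used at conversion time.
    feeds = {"float_input": np.array(make_input(active_indexes), dtype=np.float32)}
    outputs = session.run(None, feeds)
    return outputs[0][0]  # assumption: the first output holds the predicted label


if __name__ == "__main__":
    row = make_input([4])  # index 4 is apple, as noted for the checkbox values
    print(len(row[0]), row[0][4])
```

Calling `predict_cuisine("./model.onnx", [4])` would then mirror what the browser does when only the apple checkbox is ticked.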
+
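+## Test your application
+
+Open a terminal session in Visual Studio Code in the folder where your index.html file resides. Ensure that you have [http-server](https://www.npmjs.com/package/http-server) installed globally, and type `http-server` at the prompt. A localhost should open and you can view your web app. Check which cuisine is recommended for various ingredients:
+
+
+
+Congratulations, you have created a 'recommendation' web app with a few fields. Take some time to build out this system!
+
+## 🚀Challenge
+
+Your web app is very minimal, so continue to build it out using the ingredients and their indexes from the [ingredient_indexes](../../../../4-Classification/data/ingredient_indexes.csv) data. Which flavor combinations work to create a given national dish?
+
+## [Post-lecture quiz](https://ff-quizzes.netlify.app/en/ml/)
+
+## Review & Self Study
+
+While this lesson only touched on the utility of creating a recommendation system for food ingredients, this area of ML applications is very rich in examples. Read more about how these systems are built: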
+
+- https://www.sciencedirect.com/topics/computer-science/recommendation-engine
+- https://www.technologyreview.com/2014/08/25/171547/the-ultimate-challenge-for-recommendation-engines/
+- https://www.technologyreview.com/2015/03/23/168831/everything-is-a-recommendation/
+
+## Assignment
+
+[Build a new recommender](assignment.md)
+
+---
+
+**免责声明**:
+本文档使用AI翻译服务[Co-op Translator](https://github.com/Azure/co-op-translator)进行翻译。尽管我们努力确保准确性,但请注意,自动翻译可能包含错误或不准确之处。应以原始语言的文档作为权威来源。对于关键信息,建议使用专业人工翻译。对于因使用本翻译而引起的任何误解或误读,我们概不负责。
\ No newline at end of file
diff --git a/translations/zh-CN/4-Classification/4-Applied/assignment.md b/translations/zh-CN/4-Classification/4-Applied/assignment.md
new file mode 100644
index 000000000..4598f7bf1
--- /dev/null
+++ b/translations/zh-CN/4-Classification/4-Applied/assignment.md
@@ -0,0 +1,16 @@
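+# Build a recommender
+
+## Instructions
+
+Given the exercises in this lesson, you now know how to build a JavaScript-based web app using Onnx Runtime and a converted Onnx model. Experiment with building a new recommender using data from these lessons or sourced elsewhere (give credit, please). You might create a pet recommender based on various personality attributes, or a music genre recommender based on a person's mood. Be creative!
+
+## Rubric
+
+| Criteria | Exemplary | Adequate | Needs Improvement |
+| -------- | --------------------------------------------------------------------- | ------------------------------------- | --------------------------------- |
+| | A web app and a notebook are presented, both well documented and running | One of the two is missing or flawed | Both are missing or flawed |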
+
+---
+
+**免责声明**:
+本文档使用AI翻译服务[Co-op Translator](https://github.com/Azure/co-op-translator)进行翻译。尽管我们努力确保翻译的准确性,但请注意,自动翻译可能包含错误或不准确之处。原始语言的文档应被视为权威来源。对于重要信息,建议使用专业人工翻译。我们不对因使用此翻译而产生的任何误解或误读承担责任。
\ No newline at end of file
diff --git a/translations/zh-CN/4-Classification/4-Applied/notebook.ipynb b/translations/zh-CN/4-Classification/4-Applied/notebook.ipynb
new file mode 100644
index 000000000..8180ade04
--- /dev/null
+++ b/translations/zh-CN/4-Classification/4-Applied/notebook.ipynb
@@ -0,0 +1,41 @@
+{
+ "metadata": {
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": 3
+ },
+ "orig_nbformat": 4,
+ "coopTranslator": {
+ "original_hash": "2f3e0d9e9ac5c301558fb8bf733ac0cb",
+ "translation_date": "2025-09-03T20:26:38+00:00",
+ "source_file": "4-Classification/4-Applied/notebook.ipynb",
+ "language_code": "zh"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2,
+ "cells": [
+ {
+ "source": [
+ "# Build a Cuisine Recommender\n"
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "\n---\n\n**免责声明**: \n本文档使用AI翻译服务[Co-op Translator](https://github.com/Azure/co-op-translator)进行翻译。尽管我们努力确保翻译的准确性,但请注意,自动翻译可能包含错误或不准确之处。原始语言的文档应被视为权威来源。对于关键信息,建议使用专业人工翻译。我们不对因使用此翻译而产生的任何误解或误读承担责任。\n"
+ ]
+ }
+ ]
+}
\ No newline at end of file
diff --git a/translations/zh-CN/4-Classification/4-Applied/solution/notebook.ipynb b/translations/zh-CN/4-Classification/4-Applied/solution/notebook.ipynb
new file mode 100644
index 000000000..d59351614
--- /dev/null
+++ b/translations/zh-CN/4-Classification/4-Applied/solution/notebook.ipynb
@@ -0,0 +1,292 @@
+{
+ "metadata": {
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.7.0"
+ },
+ "orig_nbformat": 2,
+ "kernelspec": {
+ "name": "python3",
+ "display_name": "Python 3.7.0 64-bit ('3.7')"
+ },
+ "metadata": {
+ "interpreter": {
+ "hash": "70b38d7a306a849643e446cd70466270a13445e5987dfa1344ef2b127438fa4d"
+ }
+ },
+ "interpreter": {
+ "hash": "70b38d7a306a849643e446cd70466270a13445e5987dfa1344ef2b127438fa4d"
+ },
+ "coopTranslator": {
+ "original_hash": "49325d6dd12a3628fc64fa7ccb1a80ff",
+ "translation_date": "2025-09-03T20:26:59+00:00",
+ "source_file": "4-Classification/4-Applied/solution/notebook.ipynb",
+ "language_code": "zh"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2,
+ "cells": [
+ {
+ "source": [
+ "# Build a Cuisine Recommender\n"
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 58,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Requirement already satisfied: skl2onnx in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (1.8.0)\n",
+ "Requirement already satisfied: protobuf in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from skl2onnx) (3.8.0)\n",
+ "Requirement already satisfied: numpy>=1.15 in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from skl2onnx) (1.19.2)\n",
+ "Requirement already satisfied: onnx>=1.2.1 in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from skl2onnx) (1.9.0)\n",
+ "Requirement already satisfied: six in /Users/jenlooper/Library/Python/3.7/lib/python/site-packages (from skl2onnx) (1.12.0)\n",
+ "Requirement already satisfied: onnxconverter-common<1.9,>=1.6.1 in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from skl2onnx) (1.8.1)\n",
+ "Requirement already satisfied: scikit-learn>=0.19 in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from skl2onnx) (0.24.2)\n",
+ "Requirement already satisfied: scipy>=1.0 in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from skl2onnx) (1.4.1)\n",
+ "Requirement already satisfied: setuptools in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from protobuf->skl2onnx) (45.1.0)\n",
+ "Requirement already satisfied: typing-extensions>=3.6.2.1 in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from onnx>=1.2.1->skl2onnx) (3.10.0.0)\n",
+ "Requirement already satisfied: threadpoolctl>=2.0.0 in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from scikit-learn>=0.19->skl2onnx) (2.1.0)\n",
+ "Requirement already satisfied: joblib>=0.11 in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from scikit-learn>=0.19->skl2onnx) (0.16.0)\n",
+ "\u001b[33mWARNING: You are using pip version 20.2.3; however, version 21.1.2 is available.\n",
+ "You should consider upgrading via the '/Library/Frameworks/Python.framework/Versions/3.7/bin/python3.7 -m pip install --upgrade pip' command.\u001b[0m\n",
+ "Note: you may need to restart the kernel to use updated packages.\n"
+ ]
+ }
+ ],
+ "source": [
+ "!pip install skl2onnx"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 59,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import pandas as pd \n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 60,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ " Unnamed: 0 cuisine almond angelica anise anise_seed apple \\\n",
+ "0 0 indian 0 0 0 0 0 \n",
+ "1 1 indian 1 0 0 0 0 \n",
+ "2 2 indian 0 0 0 0 0 \n",
+ "3 3 indian 0 0 0 0 0 \n",
+ "4 4 indian 0 0 0 0 0 \n",
+ "\n",
+ " apple_brandy apricot armagnac ... whiskey white_bread white_wine \\\n",
+ "0 0 0 0 ... 0 0 0 \n",
+ "1 0 0 0 ... 0 0 0 \n",
+ "2 0 0 0 ... 0 0 0 \n",
+ "3 0 0 0 ... 0 0 0 \n",
+ "4 0 0 0 ... 0 0 0 \n",
+ "\n",
+ " whole_grain_wheat_flour wine wood yam yeast yogurt zucchini \n",
+ "0 0 0 0 0 0 0 0 \n",
+ "1 0 0 0 0 0 0 0 \n",
+ "2 0 0 0 0 0 0 0 \n",
+ "3 0 0 0 0 0 0 0 \n",
+ "4 0 0 0 0 0 1 0 \n",
+ "\n",
+ "[5 rows x 382 columns]"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 60
+ }
+ ],
+ "source": [
+ "data = pd.read_csv('../../data/cleaned_cuisines.csv')\n",
+ "data.head()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 61,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ " almond angelica anise anise_seed apple apple_brandy apricot \\\n",
+ "0 0 0 0 0 0 0 0 \n",
+ "1 1 0 0 0 0 0 0 \n",
+ "2 0 0 0 0 0 0 0 \n",
+ "3 0 0 0 0 0 0 0 \n",
+ "4 0 0 0 0 0 0 0 \n",
+ "\n",
+ " armagnac artemisia artichoke ... whiskey white_bread white_wine \\\n",
+ "0 0 0 0 ... 0 0 0 \n",
+ "1 0 0 0 ... 0 0 0 \n",
+ "2 0 0 0 ... 0 0 0 \n",
+ "3 0 0 0 ... 0 0 0 \n",
+ "4 0 0 0 ... 0 0 0 \n",
+ "\n",
+ " whole_grain_wheat_flour wine wood yam yeast yogurt zucchini \n",
+ "0 0 0 0 0 0 0 0 \n",
+ "1 0 0 0 0 0 0 0 \n",
+ "2 0 0 0 0 0 0 0 \n",
+ "3 0 0 0 0 0 0 0 \n",
+ "4 0 0 0 0 0 1 0 \n",
+ "\n",
+ "[5 rows x 380 columns]"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 61
+ }
+ ],
+ "source": [
+ "X = data.iloc[:,2:]\n",
+ "X.head()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 62,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ " cuisine\n",
+ "0 indian\n",
+ "1 indian\n",
+ "2 indian\n",
+ "3 indian\n",
+ "4 indian"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 62
+ }
+ ],
+ "source": [
+ "y = data[['cuisine']]\n",
+ "y.head()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 63,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from sklearn.model_selection import train_test_split\n",
+ "from sklearn.svm import SVC\n",
+ "from sklearn.model_selection import cross_val_score\n",
+ "from sklearn.metrics import accuracy_score,precision_score,confusion_matrix,classification_report"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 64,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "X_train, X_test, y_train, y_test = train_test_split(X,y,test_size=0.3)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 65,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "SVC(C=10, kernel='linear', probability=True, random_state=0)"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 65
+ }
+ ],
+ "source": [
+ "model = SVC(kernel='linear', C=10, probability=True,random_state=0)\n",
+ "model.fit(X_train,y_train.values.ravel())\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 66,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "y_pred = model.predict(X_test)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 67,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ " precision recall f1-score support\n\n chinese 0.72 0.70 0.71 236\n indian 0.91 0.88 0.89 243\n japanese 0.80 0.75 0.77 240\n korean 0.80 0.81 0.81 230\n thai 0.76 0.85 0.80 250\n\n accuracy 0.80 1199\n macro avg 0.80 0.80 0.80 1199\nweighted avg 0.80 0.80 0.80 1199\n\n"
+ ]
+ }
+ ],
+ "source": [
+ "print(classification_report(y_test,y_pred))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 68,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from skl2onnx import convert_sklearn\n",
+ "from skl2onnx.common.data_types import FloatTensorType\n",
+ "\n",
+ "initial_type = [('float_input', FloatTensorType([None, 380]))]\n",
+ "options = {id(model): {'nocl': True, 'zipmap': False}}\n",
+ "onx = convert_sklearn(model, initial_types=initial_type, options=options)\n",
+ "with open(\"./model.onnx\", \"wb\") as f:\n",
+ " f.write(onx.SerializeToString())\n",
+ "\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
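+ "\n---\n\n**Disclaimer**: \nThis document has been translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The original document in its native language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.\n"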
+ ]
+ }
+ ]
+}
\ No newline at end of file
diff --git a/translations/zh-CN/4-Classification/README.md b/translations/zh-CN/4-Classification/README.md
new file mode 100644
index 000000000..62bdfc90f
--- /dev/null
+++ b/translations/zh-CN/4-Classification/README.md
@@ -0,0 +1,32 @@
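+# Getting started with classification
+
+## Regional topic: Delicious Asian and Indian Cuisines 🍜
+
+In Asia and India, food traditions are extremely diverse, and very delicious! Let's look at data about regional cuisines to try to understand their ingredients.
+
+
+> Photo by Lisheng Chang on Unsplash
+
+## What you will learn
+
+In this section, you will build on your earlier study of regression and learn about other classifiers that you can use to better understand the data.
+
+> There are useful low-code tools that can help you learn about working with classification models. Try [Azure ML for this task](https://docs.microsoft.com/learn/modules/create-classification-model-azure-machine-learning-designer/?WT.mc_id=academic-77952-leestott)
+
+## Lessons
+
+1. [Introduction to classification](1-Introduction/README.md)
+2. [More classifiers](2-Classifiers-1/README.md)
+3. [Yet other classifiers](3-Classifiers-2/README.md)
+4. [Applied ML: build a web app](4-Applied/README.md)
+
+## Credits
+
+"Getting started with classification" was written with care by [Cassie Breviu](https://www.twitter.com/cassiebreviu) and [Jen Looper](https://www.twitter.com/jenlooper).
+
+The delicious cuisines dataset was sourced from [Kaggle](https://www.kaggle.com/hoandan/asian-and-indian-cuisines).
+
+---
+
+**Disclaimer**:
+This document has been translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The original document in its native language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.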
\ No newline at end of file
diff --git a/translations/zh-CN/5-Clustering/1-Visualize/README.md b/translations/zh-CN/5-Clustering/1-Visualize/README.md
new file mode 100644
index 000000000..61ddc776b
--- /dev/null
+++ b/translations/zh-CN/5-Clustering/1-Visualize/README.md
@@ -0,0 +1,338 @@
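+# Introduction to clustering
+
+Clustering is a type of [Unsupervised Learning](https://wikipedia.org/wiki/Unsupervised_learning) that presumes that a dataset is unlabelled or that its inputs are not matched with predefined outputs. It uses various algorithms to sort through unlabelled data and provide groupings according to patterns it discerns in the data.
+
+[](https://youtu.be/ty2advRiWJM "No One Like You by PSquare")
+
+> 🎥 Click the image above for a video. While you study machine learning with clustering, enjoy some Nigerian Dance Hall tracks; this is a highly rated song from 2014 by PSquare.
+
+## [Pre-lecture quiz](https://ff-quizzes.netlify.app/en/ml/)
+
+### Introduction
+
+[Clustering](https://link.springer.com/referenceworkentry/10.1007%2F978-0-387-30164-8_124) is very useful for data exploration. Let's see if it can help discover trends and patterns in the way Nigerian audiences consume music.
+
+✅ Take a minute to think about the uses of clustering. In real life, clustering happens whenever you have a pile of laundry and need to sort out your family members' clothes 🧦👕👖🩲. In data science, clustering happens when trying to analyze a user's preferences, or to determine the characteristics of any unlabelled dataset. Clustering, in a way, helps make sense of chaos, like a sock drawer.
+
+[](https://youtu.be/esmzYhuFnds "Introduction to Clustering")
+
+> 🎥 Click the image above for a video: MIT's John Guttag introduces clustering
+
+In a professional setting, clustering can be used for market segmentation, for example determining which age groups buy which items. Another use is anomaly detection, perhaps to detect fraud in a dataset of credit card transactions. Or you might use clustering to identify tumors in a batch of medical scans.
+
+✅ Think for a minute about where you might have encountered clustering in a banking, e-commerce, or business setting.
+
+> 🎓 Interestingly, cluster analysis originated in the fields of Anthropology and Psychology in the 1930s. Can you imagine how it might have been used?
+
+Alternatively, you could use it to group search results, by shopping links, images, or reviews, for example. Clustering is useful when you have a large dataset that you want to reduce and analyze at a finer grain, so the technique can be used to learn about data before other models are constructed.
+
+✅ Once your data is organized in clusters, you can assign it a cluster ID. This technique is useful when preserving a dataset's privacy; you can refer to a data point by its cluster ID rather than by more revealing identifiable data. Can you think of other reasons why you'd refer to a cluster ID rather than other elements of the cluster to identify it?
+
+Deepen your understanding of clustering techniques in this [Learn module](https://docs.microsoft.com/learn/modules/train-evaluate-cluster-models?WT.mc_id=academic-77952-leestott).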
+
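+## Getting started with clustering
+
+[Scikit-learn offers a large array](https://scikit-learn.org/stable/modules/clustering.html) of methods to perform clustering. The type you choose will depend on your use case. According to the documentation, each method has its own strengths. Here is a simplified table of the methods supported by Scikit-learn and their appropriate use cases:
+
+| Method name                  | Use case                                                               |
+| :--------------------------- | :--------------------------------------------------------------------- |
+| K-Means                      | general purpose, inductive                                             |
+| Affinity propagation         | many, uneven clusters, inductive                                       |
+| Mean-shift                   | many, uneven clusters, inductive                                       |
+| Spectral clustering          | few, even clusters, transductive                                       |
+| Ward hierarchical clustering | many, constrained clusters, transductive                               |
+| Agglomerative clustering     | many, constrained, non-Euclidean distances, transductive               |
+| DBSCAN                       | non-flat geometry, uneven clusters, transductive                       |
+| OPTICS                       | non-flat geometry, uneven clusters with variable density, transductive |
+| Gaussian mixtures            | flat geometry, inductive                                               |
+| BIRCH                        | large dataset with outliers, inductive                                 |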
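+> 🎓 How we create clusters has a lot to do with how we gather the data points into groups. Let's unpack some vocabulary:
+>
+> 🎓 ['Transductive' vs. 'inductive'](https://wikipedia.org/wiki/Transduction_(machine_learning))
+>
+> Transductive inference is derived from observed training cases that map to specific test cases. Inductive inference is derived from training cases that map to general rules, which are only then applied to test cases.
+>
+> An example: imagine you have a dataset that is only partially labelled. Some items are 'records', some are 'CDs', and some are blank. Your job is to provide labels for the blanks. If you choose an inductive approach, you'd train a model looking for 'records' and 'CDs', and apply those labels to your unlabelled data. This approach will have trouble classifying things that are actually 'cassettes'. A transductive approach, on the other hand, handles this unknown data more effectively because it groups similar items together and then applies a label to the whole group. In this case, clusters might reflect 'round musical things' and 'square musical things'.
+>
+> 🎓 ['Non-flat' vs. 'flat' geometry](https://datascience.stackexchange.com/questions/52260/terminology-flat-geometry-in-the-context-of-clustering)
+>
+> Derived from mathematical terminology, non-flat vs. flat geometry refers to measuring the distances between points by either 'flat' ([Euclidean](https://wikipedia.org/wiki/Euclidean_geometry)) or 'non-flat' (non-Euclidean) geometrical methods.
+>
+> 'Flat' in this context refers to Euclidean geometry (parts of which are taught as 'plane' geometry), and non-flat refers to non-Euclidean geometry. What does geometry have to do with machine learning? As two fields rooted in mathematics, there must be a common way to measure distances between points in clusters, and that can be done in a 'flat' or 'non-flat' way, depending on the nature of the data. [Euclidean distances](https://wikipedia.org/wiki/Euclidean_distance) are measured as the length of a line segment between two points. [Non-Euclidean distances](https://wikipedia.org/wiki/Non-Euclidean_geometry) are measured along a curve. If your data, when visualized, does not seem to lie on a plane, you might need a specialized algorithm to handle it.
+>
+
+> Infographic by [Dasani Madipalli](https://twitter.com/dasani_decoded)
+>
+> 🎓 ['Distances'](https://web.stanford.edu/class/cs345a/slides/12-clustering.pdf)
+>
+> Clusters are defined by their distance matrix, that is, the distances between points. This distance can be measured in a few ways. Euclidean clusters are defined by the average of the point values, and contain a 'centroid' or center point. Distances are thus measured as the distance to that centroid. Non-Euclidean distances refer to 'clustroids', the point closest to the other points. Clustroids in turn can be defined in various ways.
+>
+> 🎓 ['Constrained'](https://wikipedia.org/wiki/Constrained_clustering)
+>
+> [Constrained clustering](https://web.cs.ucdavis.edu/~davidson/Publications/ICDMTutorial.pdf) introduces 'semi-supervised' learning into this unsupervised method. The relationships between points are flagged as 'cannot link' or 'must link', so some rules are imposed on the dataset.
+>
+> An example: if an algorithm is set free on a batch of unlabelled or semi-labelled data, the clusters it produces may be of poor quality. In the example above, the clusters might group 'round musical things', 'square musical things', 'triangular things', and 'cookies'. If given some constraints, or rules to follow ('the item must be made of plastic', 'the item needs to be able to produce music'), this can help 'constrain' the algorithm to make better choices.
+>
+> 🎓 'Density'
+>
+> Data that is 'noisy' is considered to be 'dense'. The distances between points in each of its clusters may prove, on examination, to be more or less dense, or 'crowded', so this data needs to be analyzed with the appropriate clustering method. [This article](https://www.kdnuggets.com/2020/02/understanding-density-based-clustering.html) demonstrates the difference between using K-Means clustering and HDBSCAN algorithms to explore a noisy dataset with uneven cluster density.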
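The 'transductive' vs. 'inductive' distinction from the vocabulary above can be seen directly in scikit-learn's estimator interfaces. The following is a minimal sketch on made-up synthetic points (the data, variable names, and parameter values are assumptions for illustration, not part of the lesson): K-Means learns centroids that can label unseen points, while agglomerative clustering only labels the points it was fitted on and offers no `predict` method.

```python
import numpy as np
from sklearn.cluster import AgglomerativeClustering, KMeans

# Two well-separated synthetic groups (made-up data for illustration only)
rng = np.random.default_rng(0)
seen = np.vstack([
    rng.normal(loc=0.0, scale=0.3, size=(20, 2)),
    rng.normal(loc=5.0, scale=0.3, size=(20, 2)),
])
unseen = np.array([[0.1, -0.2], [4.9, 5.1]])

# Inductive: K-Means learns centroids, a general rule that can label new points
kmeans = KMeans(n_clusters=2, n_init=10, random_state=0).fit(seen)
new_labels = kmeans.predict(unseen)

# Transductive: agglomerative clustering only labels the points it was fitted on
agglo = AgglomerativeClustering(n_clusters=2)
seen_labels = agglo.fit_predict(seen)

print("labels for unseen points:", new_labels)
print("KMeans has predict:", hasattr(kmeans, "predict"))
print("AgglomerativeClustering has predict:", hasattr(agglo, "predict"))
```

To label new data with a transductive method, you would typically refit on the combined data or train a separate classifier on the cluster labels.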
+
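+## Clustering algorithms
+
+There are over 100 clustering algorithms, and their use depends on the nature of the data at hand. Let's discuss some of the major ones:
+
+- **Hierarchical clustering**. If an object is classified by its proximity to a nearby object, rather than to one farther away, clusters are formed based on their members' distance to and from other objects. Scikit-learn's agglomerative clustering is hierarchical.
+
+   
+   > Infographic by [Dasani Madipalli](https://twitter.com/dasani_decoded)
+
+- **Centroid clustering**. This popular algorithm requires the choice of 'k', the number of clusters to form, after which the algorithm determines the center point of a cluster and gathers data around that point. [K-means clustering](https://wikipedia.org/wiki/K-means_clustering) is a popular version of centroid clustering. The center is determined by the nearest mean, hence the name. The squared distance from the cluster center is minimized.
+
+   
+   > Infographic by [Dasani Madipalli](https://twitter.com/dasani_decoded)
+
+- **Distribution-based clustering**. Based in statistical modeling, distribution-based clustering centers on determining the probability that a data point belongs to a cluster, and assigning it accordingly. Gaussian mixture methods belong to this type.
+
+- **Density-based clustering**. Data points are assigned to clusters based on their density, or their grouping around each other. Data points far from the group are considered outliers or noise. DBSCAN, Mean-shift and OPTICS belong to this type of clustering.
+
+- **Grid-based clustering**. For multi-dimensional datasets, a grid is created and the data is divided amongst the grid's cells, thereby creating clusters.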
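To make the families listed above concrete, here is a rough sketch that runs one scikit-learn estimator per family on the same synthetic blobs. The dataset, the `eps` and `min_samples` values, and the dictionary keys are assumptions chosen for illustration; grid-based clustering is omitted because scikit-learn has no dedicated estimator for it.

```python
from sklearn.cluster import DBSCAN, AgglomerativeClustering, KMeans
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Three compact, well-separated synthetic blobs (illustrative data only)
X, _ = make_blobs(
    n_samples=150,
    centers=[[-5, -5], [0, 0], [5, 5]],
    cluster_std=0.5,
    random_state=0,
)

labels = {
    "hierarchical": AgglomerativeClustering(n_clusters=3).fit_predict(X),
    "centroid": KMeans(n_clusters=3, n_init=10, random_state=0).fit_predict(X),
    "distribution": GaussianMixture(n_components=3, random_state=0).fit_predict(X),
    "density": DBSCAN(eps=1.0, min_samples=5).fit_predict(X),
}

cluster_counts = {}
for family, assigned in labels.items():
    # DBSCAN marks noise points with the label -1, so exclude it from the count
    cluster_counts[family] = len(set(assigned) - {-1})
    print(f"{family}: {cluster_counts[family]} clusters")
```

Note that DBSCAN is the only one here that is not told how many clusters to find; it infers them from density, which is why its `eps` setting matters so much on real data.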
+
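+## Exercise - cluster your data
+
+Clustering as a technique is greatly aided by proper visualization, so let's get started by visualizing our music data. This exercise will help us decide which clustering method suits the nature of this data best.
+
+1. Open the [_notebook.ipynb_](https://github.com/microsoft/ML-For-Beginners/blob/main/5-Clustering/1-Visualize/notebook.ipynb) file in this folder.
+
+1. Import the `Seaborn` package for good data visualization.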
+
+ ```python
+ !pip install seaborn
+ ```
+
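+1. Append the song data from [_nigerian-songs.csv_](https://github.com/microsoft/ML-For-Beginners/blob/main/5-Clustering/data/nigerian-songs.csv). Load a dataframe with some data about the songs. Get ready to explore this data by importing the libraries and printing the first rows: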
+
+ ```python
+ import matplotlib.pyplot as plt
+ import pandas as pd
+
+ df = pd.read_csv("../data/nigerian-songs.csv")
+ df.head()
+ ```
+
+    Check the first few rows of data:
+
+    |     | name                     | album                        | artist              | artist_top_genre | release_date | length | popularity | danceability | acousticness | energy | instrumentalness | liveness | loudness | speechiness | tempo   | time_signature |
+    | --- | ------------------------ | ---------------------------- | ------------------- | ---------------- | ------------ | ------ | ---------- | ------------ | ------------ | ------ | ---------------- | -------- | -------- | ----------- | ------- | -------------- |
+    | 0   | Sparky                   | Mandy & The Jungle           | Cruel Santino       | alternative r&b  | 2019         | 144000 | 48         | 0.666        | 0.851        | 0.42   | 0.534            | 0.11     | -6.699   | 0.0829      | 133.015 | 5              |
+    | 1   | shuga rush               | EVERYTHING YOU HEARD IS TRUE | Odunsi (The Engine) | afropop          | 2020         | 89488  | 30         | 0.71         | 0.0822       | 0.683  | 0.000169         | 0.101    | -5.64    | 0.36        | 129.993 | 3              |
+    | 2   | LITT!                    | LITT!                        | AYLØ                | indie r&b        | 2018         | 207758 | 40         | 0.836        | 0.272        | 0.564  | 0.000537         | 0.11     | -7.127   | 0.0424      | 130.005 | 4              |
+    | 3   | Confident / Feeling Cool | Enjoy Your Life              | Lady Donli          | nigerian pop     | 2019         | 175135 | 14         | 0.894        | 0.798        | 0.611  | 0.000187         | 0.0964   | -4.961   | 0.113       | 111.087 | 4              |
+    | 4   | wanted you               | rare.                        | Odunsi (The Engine) | afropop          | 2018         | 152049 | 25         | 0.702        | 0.116        | 0.833  | 0.91             | 0.348    | -6.044   | 0.0447      | 105.115 | 4              |
+
+1. Get some information about the dataframe by calling `info()`:
+
+ ```python
+ df.info()
+ ```
+
+    The output looks like this:
+
+    ```output
+    <class 'pandas.core.frame.DataFrame'>
+ RangeIndex: 530 entries, 0 to 529
+ Data columns (total 16 columns):
+ # Column Non-Null Count Dtype
+ --- ------ -------------- -----
+ 0 name 530 non-null object
+ 1 album 530 non-null object
+ 2 artist 530 non-null object
+ 3 artist_top_genre 530 non-null object
+ 4 release_date 530 non-null int64
+ 5 length 530 non-null int64
+ 6 popularity 530 non-null int64
+ 7 danceability 530 non-null float64
+ 8 acousticness 530 non-null float64
+ 9 energy 530 non-null float64
+ 10 instrumentalness 530 non-null float64
+ 11 liveness 530 non-null float64
+ 12 loudness 530 non-null float64
+ 13 speechiness 530 non-null float64
+ 14 tempo 530 non-null float64
+ 15 time_signature 530 non-null int64
+ dtypes: float64(8), int64(4), object(4)
+ memory usage: 66.4+ KB
+ ```
+
+1. Double-check for null values by calling `isnull()` and verifying that every sum is 0:
+
+ ```python
+ df.isnull().sum()
+ ```
+
+    Looking good:
+
+ ```output
+ name 0
+ album 0
+ artist 0
+ artist_top_genre 0
+ release_date 0
+ length 0
+ popularity 0
+ danceability 0
+ acousticness 0
+ energy 0
+ instrumentalness 0
+ liveness 0
+ loudness 0
+ speechiness 0
+ tempo 0
+ time_signature 0
+ dtype: int64
+ ```
+
+1. Describe the data:
+
+ ```python
+ df.describe()
+ ```
+
+ | | release_date | length | popularity | danceability | acousticness | energy | instrumentalness | liveness | loudness | speechiness | tempo | time_signature |
+ | ----- | ------------ | ----------- | ---------- | ------------ | ------------ | -------- | ---------------- | -------- | --------- | ----------- | ---------- | -------------- |
+ | count | 530 | 530 | 530 | 530 | 530 | 530 | 530 | 530 | 530 | 530 | 530 | 530 |
+ | mean | 2015.390566 | 222298.1698 | 17.507547 | 0.741619 | 0.265412 | 0.760623 | 0.016305 | 0.147308 | -4.953011 | 0.130748 | 116.487864 | 3.986792 |
+ | std | 3.131688 | 39696.82226 | 18.992212 | 0.117522 | 0.208342 | 0.148533 | 0.090321 | 0.123588 | 2.464186 | 0.092939 | 23.518601 | 0.333701 |
+ | min | 1998 | 89488 | 0 | 0.255 | 0.000665 | 0.111 | 0 | 0.0283 | -19.362 | 0.0278 | 61.695 | 3 |
+ | 25% | 2014 | 199305 | 0 | 0.681 | 0.089525 | 0.669 | 0 | 0.07565 | -6.29875 | 0.0591 | 102.96125 | 4 |
+ | 50% | 2016 | 218509 | 13 | 0.761 | 0.2205 | 0.7845 | 0.000004 | 0.1035 | -4.5585 | 0.09795 | 112.7145 | 4 |
+ | 75% | 2017 | 242098.5 | 31 | 0.8295 | 0.403 | 0.87575 | 0.000234 | 0.164 | -3.331 | 0.177 | 125.03925 | 4 |
+ | max | 2020 | 511738 | 73 | 0.966 | 0.954 | 0.995 | 0.91 | 0.811 | 0.582 | 0.514 | 206.007 | 5 |
+
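+> 🤔 If we are working with clustering, an unsupervised method that does not require labelled data, why are we showing this data with labels? In the data exploration phase they come in handy, but they are not necessary for the clustering algorithms to work. You could just as well remove the column headers and refer to the data by column number.
+
+Look at the general values of the data. Note that popularity can be '0', which indicates songs that have no ranking. We will remove those shortly.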
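Before removing the unranked songs, it can help to check how many rows are affected. The sketch below uses a tiny hypothetical stand-in frame (the four rows are invented for illustration); in the lesson, `df` would instead be the dataframe loaded from `nigerian-songs.csv`.

```python
import pandas as pd

# Hypothetical stand-in rows; in the lesson, df comes from nigerian-songs.csv
df = pd.DataFrame({
    "name": ["song a", "song b", "song c", "song d"],
    "popularity": [48, 0, 30, 0],
})

# Count the rows whose popularity is exactly 0 (songs with no ranking)
unranked = int((df["popularity"] == 0).sum())
share = unranked / len(df)
print(f"{unranked} of {len(df)} songs ({share:.0%}) have popularity 0")

# Keep only ranked songs, mirroring the filter applied later in the lesson
ranked = df[df["popularity"] > 0]
print(ranked)
```

If the share of zero-popularity rows turns out to be large, dropping them noticeably shrinks the dataset, which is worth keeping in mind when interpreting the clusters later.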
+
+1. Use a bar plot to find the most popular genres:
+
+ ```python
+ import seaborn as sns
+
+ top = df['artist_top_genre'].value_counts()
+ plt.figure(figsize=(10,7))
+ sns.barplot(x=top[:5].index,y=top[:5].values)
+ plt.xticks(rotation=45)
+ plt.title('Top genres',color = 'blue')
+ ```
+
+ 
+
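+✅ If you'd like to see more top values, change the `[:5]` slice to a bigger value, or remove it to see them all.
+
+Note that when the top genre is described as 'Missing', Spotify did not classify it, so let's get rid of it.
+
+1. Get rid of missing data by filtering it out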
+
+ ```python
+ df = df[df['artist_top_genre'] != 'Missing']
+ top = df['artist_top_genre'].value_counts()
+ plt.figure(figsize=(10,7))
+ sns.barplot(x=top.index,y=top.values)
+ plt.xticks(rotation=45)
+ plt.title('Top genres',color = 'blue')
+ ```
+
+    Now recheck the genres:
+
+ 
+
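+1. By far, the top three genres dominate this dataset. Let's concentrate on `afro dancehall`, `afropop`, and `nigerian pop`, and additionally filter the dataset to remove anything with a popularity of 0 (meaning it was not given a popularity score in the dataset and can be considered noise for our purposes):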
+
+ ```python
+ df = df[(df['artist_top_genre'] == 'afro dancehall') | (df['artist_top_genre'] == 'afropop') | (df['artist_top_genre'] == 'nigerian pop')]
+ df = df[(df['popularity'] > 0)]
+ top = df['artist_top_genre'].value_counts()
+ plt.figure(figsize=(10,7))
+ sns.barplot(x=top.index,y=top.values)
+ plt.xticks(rotation=45)
+ plt.title('Top genres',color = 'blue')
+ ```
+
+1. Do a quick test to see whether the data correlates in any particularly strong way:
+
+ ```python
+ corrmat = df.corr(numeric_only=True)
+ f, ax = plt.subplots(figsize=(12, 9))
+ sns.heatmap(corrmat, vmax=.8, square=True)
+ ```
+
+ 
+
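+    The only strong correlation is between `energy` and `loudness`, which is not too surprising, given that loud music is usually pretty energetic. Otherwise, the correlations are relatively weak. It will be interesting to see what a clustering algorithm can make of this data.
+
+    > 🎓 Note that correlation does not imply causation! We have proof of correlation but no proof of causation. An [amusing web site](https://tylervigen.com/spurious-correlations) has some visuals that emphasize this point.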
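A heatmap shows strong correlations visually; the same information can also be read off the correlation matrix numerically. The following is a small sketch on synthetic data (the `demo` frame and its columns are made up, with `loudness` deliberately constructed to track `energy`); with the lesson's data you would start from the `corrmat` computed above instead.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
energy = rng.uniform(0, 1, 200)
demo = pd.DataFrame({
    "energy": energy,
    "loudness": -10 + 8 * energy + rng.normal(0, 0.5, 200),  # built to track energy
    "tempo": rng.uniform(60, 200, 200),  # generated independently of the others
})

corrmat = demo.corr(numeric_only=True)

# Keep only the upper triangle so each pair of columns appears exactly once
mask = np.triu(np.ones(corrmat.shape, dtype=bool), k=1)
upper = corrmat.where(mask)

# Flatten to (column, column) pairs and rank them by absolute correlation
pairs = upper.stack().dropna().sort_values(key=abs, ascending=False)
print(pairs.head(3))
```

Ranking by absolute value matters because a strong negative correlation is just as informative as a strong positive one.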
+
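+Is there any convergence in this dataset around a song's perceived popularity and danceability? A FacetGrid shows that there are concentric circles that line up, regardless of genre. Could it be that Nigerian tastes converge at a certain level of danceability for this genre?
+
+✅ Try different data points (energy, loudness, speechiness) and more or different musical genres. What can you discover? Take a look at the `df.describe()` table to see the general spread of the data points.
+
+### Exercise - data distribution
+
+Are these three genres significantly different in the perception of their danceability, based on their popularity?
+
+1. Examine the data distribution of our top three genres for popularity and danceability along a given x and y axis.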
+
+ ```python
+ sns.set_theme(style="ticks")
+
+ g = sns.jointplot(
+ data=df,
+ x="popularity", y="danceability", hue="artist_top_genre",
+ kind="kde",
+ )
+ ```
+
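+    You can discover concentric circles around a general point of convergence, showing the distribution of points.
+
+    > 🎓 Note that this example uses a KDE (Kernel Density Estimate) graph that represents the data using a continuous probability density curve. This allows us to interpret data when working with multiple distributions.
+
+    In general, the three genres align loosely in terms of their popularity and danceability. Determining clusters in this loosely-aligned data will be a challenge: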
+
+ 
+
+1. Create a scatter plot:
+
+ ```python
+ sns.FacetGrid(df, hue="artist_top_genre", height=5) \
+ .map(plt.scatter, "popularity", "danceability") \
+ .add_legend()
+ ```
+
+    A scatter plot of the same axes shows a similar pattern of convergence
+
+ 
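The KDE plot used in this exercise rests on a simple idea: place a small smooth kernel on every observation and sum them into a continuous density. As a rough one-dimensional sketch, assuming SciPy is installed and using made-up danceability-like values (neither is part of the lesson), `scipy.stats.gaussian_kde` can be used to see where such a density peaks:

```python
import numpy as np
from scipy.stats import gaussian_kde

rng = np.random.default_rng(0)
# Made-up danceability-like values centered around 0.75 (illustrative only)
samples = rng.normal(loc=0.75, scale=0.1, size=300)

# Fit a Gaussian kernel density estimate to the samples
kde = gaussian_kde(samples)

# Evaluate the smooth density on an evenly spaced grid
grid = np.linspace(0.0, 1.5, 301)
density = kde(grid)

# The grid point with the highest density is the estimated point of convergence
peak = grid[np.argmax(density)]

# A probability density should integrate to roughly 1 over its support
area = float(np.sum(density) * (grid[1] - grid[0]))
print(f"peak near {peak:.2f}, area under curve {area:.2f}")
```

The concentric contours in the joint plot are the two-dimensional analogue: each ring traces points of equal estimated density around such a peak.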
+
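+In general, for clustering, you can use scatter plots to show clusters of data, so mastering this type of visualization is very useful. In the next lesson, we will take this filtered data and use k-means clustering to discover groups in it that overlap in interesting ways.
+
+---
+
+## 🚀Challenge
+
+In preparation for the next lesson, make a chart of the various clustering algorithms you might discover and use in a production environment. What kinds of problems is clustering trying to address?
+
+## [Post-lecture quiz](https://ff-quizzes.netlify.app/en/ml/)
+
+## Review & Self Study
+
+Before you apply clustering algorithms, as we have learned, it's a good idea to understand the nature of your dataset. Read more on this topic [here](https://www.kdnuggets.com/2019/10/right-clustering-algorithm.html)
+
+[This helpful article](https://www.freecodecamp.org/news/8-clustering-algorithms-in-machine-learning-that-all-data-scientists-should-know/) walks you through how different clustering algorithms behave on data of different shapes.
+
+## Assignment
+
+[Research other visualizations for clustering](assignment.md)
+
+---
+
+**Disclaimer**:
+This document has been translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The original document in its native language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.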
\ No newline at end of file
diff --git a/translations/zh-CN/5-Clustering/1-Visualize/assignment.md b/translations/zh-CN/5-Clustering/1-Visualize/assignment.md
new file mode 100644
index 000000000..8500d5c5f
--- /dev/null
+++ b/translations/zh-CN/5-Clustering/1-Visualize/assignment.md
@@ -0,0 +1,16 @@
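+# Research other visualizations for clustering
+
+## Instructions
+
+In this lesson, you have worked with some visualization techniques to get a grasp on plotting your data in preparation for clustering it. Scatter plots, in particular, are useful for finding groups of objects. Research different ways and different libraries to create scatter plots and document your work in a notebook. You can use the data from this lesson, other lessons, or data you source yourself (please credit its source, however, in your notebook). Plot some data using scatter plots and explain what you discover.
+
+## Rubric
+
+| Criteria | Exemplary                                                       | Adequate                                                                                  | Needs Improvement                   |
+| -------- | -------------------------------------------------------------- | ---------------------------------------------------------------------------------------- | ----------------------------------- |
+|          | A notebook is presented with five well-documented scatter plots | A notebook is presented with fewer than five scatter plots and it is less well documented | An incomplete notebook is presented |
+
+---
+
+**Disclaimer**:
+This document has been translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The original document in its native language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.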
\ No newline at end of file
diff --git a/translations/zh-CN/5-Clustering/1-Visualize/notebook.ipynb b/translations/zh-CN/5-Clustering/1-Visualize/notebook.ipynb
new file mode 100644
index 000000000..0b6a5b659
--- /dev/null
+++ b/translations/zh-CN/5-Clustering/1-Visualize/notebook.ipynb
@@ -0,0 +1,50 @@
+{
+ "metadata": {
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.8.3"
+ },
+ "orig_nbformat": 2,
+ "kernelspec": {
+ "name": "python383jvsc74a57bd0e134e05457d34029b6460cd73bbf1ed73f339b5b6d98c95be70b69eba114fe95",
+ "display_name": "Python 3.8.3 64-bit (conda)"
+ },
+ "coopTranslator": {
+ "original_hash": "40e0707e96b3e1899a912776006264f9",
+ "translation_date": "2025-09-03T20:01:48+00:00",
+ "source_file": "5-Clustering/1-Visualize/notebook.ipynb",
+ "language_code": "zh"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2,
+ "cells": [
+ {
+ "source": [],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
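+ "\n---\n\n**Disclaimer**: \nThis document has been translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The original document in its native language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.\n"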
+ ]
+ }
+ ]
+}
\ No newline at end of file
diff --git a/translations/zh-CN/5-Clustering/1-Visualize/solution/Julia/README.md b/translations/zh-CN/5-Clustering/1-Visualize/solution/Julia/README.md
new file mode 100644
index 000000000..779236745
--- /dev/null
+++ b/translations/zh-CN/5-Clustering/1-Visualize/solution/Julia/README.md
@@ -0,0 +1,6 @@
+
+
+---
+
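+**Disclaimer**:
+This document has been translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The original document in its native language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.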
\ No newline at end of file
diff --git a/translations/zh-CN/5-Clustering/1-Visualize/solution/R/lesson_14-R.ipynb b/translations/zh-CN/5-Clustering/1-Visualize/solution/R/lesson_14-R.ipynb
new file mode 100644
index 000000000..5a0a7b638
--- /dev/null
+++ b/translations/zh-CN/5-Clustering/1-Visualize/solution/R/lesson_14-R.ipynb
@@ -0,0 +1,493 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "source": [
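+ "## **Analysis of Nigerian music scraped from Spotify**\n",
+ "\n",
+ "Clustering is a type of [Unsupervised Learning](https://wikipedia.org/wiki/Unsupervised_learning) that presumes that a dataset is unlabelled or that its inputs are not matched with predefined outputs. It uses various algorithms to sort through unlabelled data and provide groupings according to patterns it discerns in the data.\n",
+ "\n",
+ "[**Pre-lecture quiz**](https://gray-sand-07a10f403.1.azurestaticapps.net/quiz/27/)\n",
+ "\n",
+ "### **Introduction**\n",
+ "\n",
+ "[Clustering](https://link.springer.com/referenceworkentry/10.1007%2F978-0-387-30164-8_124) is very useful for data exploration. Let's see if it can help discover trends and patterns in the way Nigerian audiences consume music.\n",
+ "\n",
+ "> ✅ Take a minute to think about the uses of clustering. In real life, clustering happens whenever you have a pile of laundry and need to sort out your family members' clothes 🧦👕👖🩲. In data science, clustering happens when trying to analyze a user's preferences, or to determine the characteristics of any unlabelled dataset. Clustering, in a way, helps make sense of chaos, like a sock drawer.\n",
+ "\n",
+ "In a professional setting, clustering can be used for market segmentation, for example determining which age groups buy which items. Another use is anomaly detection, perhaps to detect fraud in a dataset of credit card transactions. Or you might use clustering to identify tumors in a batch of medical scans.\n",
+ "\n",
+ "✅ Think for a minute about where you might have encountered clustering in a banking, e-commerce, or business setting.\n",
+ "\n",
+ "> 🎓 Interestingly, cluster analysis originated in the fields of Anthropology and Psychology in the 1930s. Can you imagine how it might have been used?\n",
+ "\n",
+ "Alternatively, you could use it to group search results, by shopping links, images, or reviews, for example. Clustering is useful when you have a large dataset that you want to reduce and analyze at a finer grain, so the technique can be used to learn about data before other models are constructed.\n",
+ "\n",
+ "✅ Once your data is organized in clusters, you can assign it a cluster ID. This technique is useful when preserving a dataset's privacy; you can refer to a data point by its cluster ID rather than by more revealing identifiable data. Can you think of other reasons why you'd refer to a cluster ID rather than other elements of the cluster to identify it?\n",
+ "\n",
+ "### Getting started with clustering\n",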
+ "\n",
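+ "> 🎓 How we create clusters has a lot to do with how we gather the data points into groups. Let's unpack some vocabulary:\n",
+ ">\n",
+ "> 🎓 ['Transductive' vs. 'inductive'](https://wikipedia.org/wiki/Transduction_(machine_learning))\n",
+ ">\n",
+ "> Transductive inference is derived from observed training cases that map to specific test cases. Inductive inference is derived from training cases that map to general rules, which are only then applied to test cases.\n",
+ ">\n",
+ "> An example: imagine you have a dataset that is only partially labelled. Some items are 'records', some are 'CDs', and some are blank. Your job is to provide labels for the blanks. If you choose an inductive approach, you'd train a model looking for 'records' and 'CDs', and apply those labels to your unlabelled data. This approach will have trouble classifying things that are actually 'cassettes'. A transductive approach, on the other hand, handles this unknown data more effectively because it groups similar items together and then applies a label to the whole group. In this case, clusters might reflect 'round musical things' and 'square musical things'.\n",
+ ">\n",
+ "> 🎓 ['Non-flat' vs. 'flat' geometry](https://datascience.stackexchange.com/questions/52260/terminology-flat-geometry-in-the-context-of-clustering)\n",
+ ">\n",
+ "> Derived from mathematical terminology, non-flat vs. flat geometry refers to measuring the distances between points by either 'flat' ([Euclidean](https://wikipedia.org/wiki/Euclidean_geometry)) or 'non-flat' (non-Euclidean) geometrical methods.\n",
+ ">\n",
+ "> 'Flat' in this context refers to Euclidean geometry (parts of which are taught as 'plane' geometry), and 'non-flat' refers to non-Euclidean geometry. What does geometry have to do with machine learning? As two fields rooted in mathematics, there must be a common way to measure distances between points in clusters, and that can be done in a 'flat' or 'non-flat' way, depending on the nature of the data. [Euclidean distances](https://wikipedia.org/wiki/Euclidean_distance) are measured as the length of a line segment between two points. [Non-Euclidean distances](https://wikipedia.org/wiki/Non-Euclidean_geometry) are measured along a curve. If your data, when visualized, does not seem to lie on a plane, you might need a specialized algorithm to handle it.\n",
+ "\n",
+ "Infographic by Dasani Madipalli\n",
+ "\n",
+ "> 🎓 ['Distances'](https://web.stanford.edu/class/cs345a/slides/12-clustering.pdf)\n",
+ ">\n",
+ "> Clusters are defined by their distance matrix, that is, the distances between points. This distance can be measured in a few ways. Euclidean clusters are defined by the average of the point values, and contain a 'centroid' or center point. Distances are thus measured as the distance to that centroid. Non-Euclidean distances refer to 'clustroids', the point closest to the other points. Clustroids in turn can be defined in various ways.\n",
+ ">\n",
+ "> 🎓 ['Constrained'](https://wikipedia.org/wiki/Constrained_clustering)\n",
+ ">\n",
+ "> [Constrained clustering](https://web.cs.ucdavis.edu/~davidson/Publications/ICDMTutorial.pdf) introduces 'semi-supervised' learning into this unsupervised method. The relationships between points are flagged as 'cannot link' or 'must link', so some rules are imposed on the dataset.\n",
+ ">\n",
+ "> An example: if an algorithm is set free on a batch of unlabelled or semi-labelled data, the clusters it produces may be of poor quality. In the example above, the clusters might group 'round musical things', 'square musical things', 'triangular things', and 'cookies'. If given some constraints, or rules to follow ('the item must be made of plastic', 'the item needs to be able to produce music'), this can help 'constrain' the algorithm to make better choices.\n",
+ ">\n",
+ "> 🎓 'Density'\n",
+ ">\n",
+ "> Data that is 'noisy' is considered to be 'dense'. The distances between points in each of its clusters may prove, on examination, to be more or less dense, so this data needs to be analyzed with the appropriate clustering method. [This article](https://www.kdnuggets.com/2020/02/understanding-density-based-clustering.html) demonstrates the difference between using K-Means clustering and HDBSCAN algorithms to explore a noisy dataset with uneven cluster density.\n",
+ "\n",
+ "Deepen your understanding of clustering techniques in this [Learn module](https://docs.microsoft.com/learn/modules/train-evaluate-cluster-models?WT.mc_id=academic-77952-leestott).\n",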
+ "\n",
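+ "### **Clustering algorithms**\n",
+ "\n",
+ "There are over 100 clustering algorithms, and their use depends on the nature of the data at hand. Let's discuss some of the major ones:\n",
+ "\n",
+ "- **Hierarchical clustering**. If an object is classified by its proximity to a nearby object, rather than to one farther away, clusters are formed based on their members' distance to and from other objects. Hierarchical clustering is characterized by repeatedly combining two clusters.\n",
+ "\n",
+ "Infographic by Dasani Madipalli\n",
+ "\n",
+ "- **Centroid clustering**. This popular algorithm requires the choice of 'k', the number of clusters to form, after which the algorithm determines the center point of a cluster and gathers data around that point. [K-means clustering](https://wikipedia.org/wiki/K-means_clustering) is a popular version of centroid clustering which separates a dataset into K pre-defined groups. The center is determined by the nearest mean, hence the name. The squared distance from the cluster center is minimized.\n",
+ "\n",
+ "Infographic by Dasani Madipalli\n",
+ "\n",
+ "- **Distribution-based clustering**. Based in statistical modeling, distribution-based clustering centers on determining the probability that a data point belongs to a cluster, and assigning it accordingly. Gaussian mixture methods belong to this type.\n",
+ "\n",
+ "- **Density-based clustering**. Data points are assigned to clusters based on their density, or their grouping around each other. Data points far from the group are considered outliers or noise. DBSCAN, Mean-shift and OPTICS belong to this type of clustering.\n",
+ "\n",
+ "- **Grid-based clustering**. For multi-dimensional datasets, a grid is created and the data is divided amongst the grid's cells, thereby creating clusters.\n",
+ "\n",
+ "The best way to learn about clustering is to try it for yourself, so that's what you'll do in this exercise.\n",
+ "\n",
+ "We'll require some packages to complete this module. You can install them with: `install.packages(c('tidyverse', 'tidymodels', 'DataExplorer', 'summarytools', 'plotly', 'paletteer', 'corrplot', 'patchwork'))`\n",
+ "\n",
+ "Alternatively, the script below checks whether you have the packages required to complete this module and installs them for you in case some are missing.\n"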
+ ],
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "suppressWarnings(if(!require(\"pacman\")) install.packages(\"pacman\"))\r\n",
+ "\r\n",
+ "pacman::p_load('tidyverse', 'tidymodels', 'DataExplorer', 'summarytools', 'plotly', 'paletteer', 'corrplot', 'patchwork')\r\n"
+ ],
+ "outputs": [],
+ "metadata": {}
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
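+ "## Exercise - cluster your data\n",
+ "\n",
+ "Clustering as a technique is greatly aided by proper visualization, so let's get started by visualizing our music data. This exercise will help us decide which clustering method suits the nature of this data best.\n",
+ "\n",
+ "Let's hit the ground running by importing the data.\n"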
+ ],
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Load the core tidyverse and make it available in your current R session\r\n",
+ "library(tidyverse)\r\n",
+ "\r\n",
+ "# Import the data into a tibble\r\n",
+ "df <- read_csv(file = \"https://raw.githubusercontent.com/microsoft/ML-For-Beginners/main/5-Clustering/data/nigerian-songs.csv\")\r\n",
+ "\r\n",
+ "# View the first 5 rows of the data set\r\n",
+ "df %>% \r\n",
+ " slice_head(n = 5)\r\n"
+ ],
+ "outputs": [],
+ "metadata": {}
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Sometimes, we may want a little more information about our data. We can look at the `data` and `its structure` by using the [*glimpse()*](https://pillar.r-lib.org/reference/glimpse.html) function:\n"
+ ],
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Glimpse into the data set\r\n",
+ "df %>% \r\n",
+ " glimpse()\r\n"
+ ],
+ "outputs": [],
+ "metadata": {}
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
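+ "Good job!💪\n",
+ "\n",
+ "We can see that `glimpse()` gives the total number of rows (observations) and columns (variables), followed by the first few entries of each variable in a row after the variable name. In addition, the *data type* of each variable is given immediately after its name, inside `< >`.\n",
+ "\n",
+ "`DataExplorer::introduce()` can summarize this information neatly:\n"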
+ ],
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Describe basic information for our data\r\n",
+ "df %>% \r\n",
+ " introduce()\r\n",
+ "\r\n",
+ "# A visual display of the same\r\n",
+ "df %>% \r\n",
+ " plot_intro()\r\n"
+ ],
+ "outputs": [],
+ "metadata": {}
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
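+ "Awesome! We have just learned that our data has no missing values.\n",
+ "\n",
+ "While we are at it, we can use `summarytools::descr()` to explore common central tendency statistics (e.g. [mean](https://en.wikipedia.org/wiki/Arithmetic_mean) and [median](https://en.wikipedia.org/wiki/Median)) and measures of dispersion (e.g. [standard deviation](https://en.wikipedia.org/wiki/Standard_deviation)).\n"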
+ ],
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Describe common statistics\r\n",
+ "df %>% \r\n",
+ " descr(stats = \"common\")\r\n"
+ ],
+ "outputs": [],
+ "metadata": {}
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
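+ "Let's look at the general values of the data. Note that popularity can be `0`, which indicates songs that have no ranking. We will remove those shortly.\n",
+ "\n",
+ "> 🤔 If we are working with clustering, an unsupervised method that does not require labeled data, why are we showing this data with labels? In the data exploration phase the labels come in handy, but they are not necessary for the clustering algorithms to work.\n",
+ "\n",
+ "### 1. Explore popular genres\n",
+ "\n",
+ "Let's go ahead and find the most popular genres 🎶 by counting how many times each one appears.\n"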
+ ],
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Popular genres\r\n",
+ "top_genres <- df %>% \r\n",
+ " count(artist_top_genre, sort = TRUE) %>% \r\n",
+ "# Encode to categorical and reorder according to count\r\n",
+ " mutate(artist_top_genre = factor(artist_top_genre) %>% fct_inorder())\r\n",
+ "\r\n",
+ "# Print the top genres\r\n",
+ "top_genres\r\n"
+ ],
+ "outputs": [],
+ "metadata": {}
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
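+ "That went well! They say a picture is worth a thousand rows of a data frame (actually, nobody ever says that 😅). But you get the gist, right?\n",
+ "\n",
+ "One way to visualize categorical data (character or factor variables) is with a bar plot. Let's make a bar plot of the top 10 genres:\n"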
+ ],
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Change the default gray theme\r\n",
+ "theme_set(theme_light())\r\n",
+ "\r\n",
+ "# Visualize popular genres\r\n",
+ "top_genres %>%\r\n",
+ " slice(1:10) %>% \r\n",
+ " ggplot(mapping = aes(x = artist_top_genre, y = n,\r\n",
+ " fill = artist_top_genre)) +\r\n",
+ " geom_col(alpha = 0.8) +\r\n",
+ " paletteer::scale_fill_paletteer_d(\"rcartocolor::Vivid\") +\r\n",
+ " ggtitle(\"Top genres\") +\r\n",
+ " theme(plot.title = element_text(hjust = 0.5),\r\n",
+ " # Rotates the X markers (so we can read them)\r\n",
+ " axis.text.x = element_text(angle = 90))\r\n"
+ ],
+ "outputs": [],
+ "metadata": {}
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
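+ "Now it is much easier to see that we have `Missing` genres 🧐!\n",
+ "\n",
+ "> A good visualisation will show you things that you did not expect, or raise new questions about the data - Hadley Wickham and Garrett Grolemund, [R For Data Science](https://r4ds.had.co.nz/introduction.html)\n",
+ "\n",
+ "Note that when the top genre is described as `Missing`, it means Spotify did not classify it, so let's get rid of it.\n"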
+ ],
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Visualize popular genres\r\n",
+ "top_genres %>%\r\n",
+ " filter(artist_top_genre != \"Missing\") %>% \r\n",
+ " slice(1:10) %>% \r\n",
+ " ggplot(mapping = aes(x = artist_top_genre, y = n,\r\n",
+ " fill = artist_top_genre)) +\r\n",
+ " geom_col(alpha = 0.8) +\r\n",
+ " paletteer::scale_fill_paletteer_d(\"rcartocolor::Vivid\") +\r\n",
+ " ggtitle(\"Top genres\") +\r\n",
+ " theme(plot.title = element_text(hjust = 0.5),\r\n",
+ " # Rotates the X markers (so we can read them)\r\n",
+ " axis.text.x = element_text(angle = 90))\r\n"
+ ],
+ "outputs": [],
+ "metadata": {}
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
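+ "From this initial data exploration, we learn that the top three genres dominate this dataset. Let's concentrate on `afro dancehall`, `afropop`, and `nigerian pop`, and additionally filter the dataset to remove anything with a popularity value of 0 (meaning it was not classified with a popularity in the dataset and can be treated as noise for our purposes):\n"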
+ ],
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "nigerian_songs <- df %>% \r\n",
+ " # Concentrate on top 3 genres\r\n",
+ " filter(artist_top_genre %in% c(\"afro dancehall\", \"afropop\",\"nigerian pop\")) %>% \r\n",
+ " # Remove unclassified observations\r\n",
+ " filter(popularity != 0)\r\n",
+ "\r\n",
+ "\r\n",
+ "\r\n",
+ "# Visualize popular genres\r\n",
+ "nigerian_songs %>%\r\n",
+ " count(artist_top_genre) %>%\r\n",
+ " ggplot(mapping = aes(x = artist_top_genre, y = n,\r\n",
+ " fill = artist_top_genre)) +\r\n",
+ " geom_col(alpha = 0.8) +\r\n",
+ " paletteer::scale_fill_paletteer_d(\"ggsci::category10_d3\") +\r\n",
+ " ggtitle(\"Top genres\") +\r\n",
+ " theme(plot.title = element_text(hjust = 0.5))\r\n"
+ ],
+ "outputs": [],
+ "metadata": {}
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
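+ "Let's see whether there is any apparent linear relationship among the numerical variables in our data set. This relationship is quantified mathematically by the [correlation statistic](https://en.wikipedia.org/wiki/Correlation).\n",
+ "\n",
+ "The correlation statistic is a value between -1 and 1 that indicates the strength of a relationship. Values above 0 indicate a *positive* correlation (high values of one variable tend to coincide with high values of the other), while values below 0 indicate a *negative* correlation (high values of one variable tend to coincide with low values of the other).\n"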
+ ],
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Narrow down to numeric variables and find correlation\r\n",
+ "corr_mat <- nigerian_songs %>% \r\n",
+ " select(where(is.numeric)) %>% \r\n",
+ " cor()\r\n",
+ "\r\n",
+ "# Visualize correlation matrix\r\n",
+ "corrplot(corr_mat, order = 'AOE', col = c('white', 'black'), bg = 'gold2') \r\n"
+ ],
+ "outputs": [],
+ "metadata": {}
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
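+ "The data is not strongly correlated, except between `energy` and `loudness`, which makes sense given that loud music is usually pretty energetic. `Popularity` also corresponds to `release date`, which makes sense too, as more recent songs are probably more popular. Length and energy seem to be correlated as well.\n",
+ "\n",
+ "It will be interesting to see what a clustering algorithm can make of this data!\n",
+ "\n",
+ "> 🎓 Note that correlation does not imply causation! We have evidence of correlation but no proof of causation. An [amusing web site](https://tylervigen.com/spurious-correlations) has some visuals that emphasize this point.\n",
+ "\n",
+ "### 2. Explore data distribution\n",
+ "\n",
+ "Let's ask some more subtle questions. Are the genres significantly different in the perception of their danceability, based on their popularity? Let's examine the data distribution of our top three genres for popularity and danceability along a given x and y axis using [density plots](https://www.khanacademy.org/math/ap-statistics/density-curves-normal-distribution-ap/density-curves/v/density-curves).\n"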
+ ],
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# Perform 2D kernel density estimation\r\n",
+ "density_estimate_2d <- nigerian_songs %>% \r\n",
+ " ggplot(mapping = aes(x = popularity, y = danceability, color = artist_top_genre)) +\r\n",
+ " geom_density_2d(bins = 5, size = 1) +\r\n",
+ " paletteer::scale_color_paletteer_d(\"RSkittleBrewer::wildberry\") +\r\n",
+ " xlim(-20, 80) +\r\n",
+ " ylim(0, 1.2)\r\n",
+ "\r\n",
+ "# Density plot based on the popularity\r\n",
+ "density_estimate_pop <- nigerian_songs %>% \r\n",
+ " ggplot(mapping = aes(x = popularity, fill = artist_top_genre, color = artist_top_genre)) +\r\n",
+ " geom_density(size = 1, alpha = 0.5) +\r\n",
+ " paletteer::scale_fill_paletteer_d(\"RSkittleBrewer::wildberry\") +\r\n",
+ " paletteer::scale_color_paletteer_d(\"RSkittleBrewer::wildberry\") +\r\n",
+ " theme(legend.position = \"none\")\r\n",
+ "\r\n",
+ "# Density plot based on the danceability\r\n",
+ "density_estimate_dance <- nigerian_songs %>% \r\n",
+ " ggplot(mapping = aes(x = danceability, fill = artist_top_genre, color = artist_top_genre)) +\r\n",
+ " geom_density(size = 1, alpha = 0.5) +\r\n",
+ " paletteer::scale_fill_paletteer_d(\"RSkittleBrewer::wildberry\") +\r\n",
+ " paletteer::scale_color_paletteer_d(\"RSkittleBrewer::wildberry\")\r\n",
+ "\r\n",
+ "\r\n",
+ "# Patch everything together\r\n",
+ "library(patchwork)\r\n",
+ "density_estimate_2d / (density_estimate_pop + density_estimate_dance)\r\n"
+ ],
+ "outputs": [],
+ "metadata": {}
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
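+ "We see concentric circles that line up, regardless of genre. Could it be that Nigerian tastes converge at a certain level of danceability for this genre?\n",
+ "\n",
+ "In general, the three genres align in terms of their popularity and danceability. Determining clusters in this loosely aligned data will be a challenge. Let's see whether a scatter plot can support this.\n"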
+ ],
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [
+ "# A scatter plot of popularity and danceability\r\n",
+ "scatter_plot <- nigerian_songs %>% \r\n",
+ " ggplot(mapping = aes(x = popularity, y = danceability, color = artist_top_genre, shape = artist_top_genre)) +\r\n",
+ " geom_point(size = 2, alpha = 0.8) +\r\n",
+ " paletteer::scale_color_paletteer_d(\"futurevisions::mars\")\r\n",
+ "\r\n",
+ "# Add a touch of interactivity\r\n",
+ "ggplotly(scatter_plot)\r\n"
+ ],
+ "outputs": [],
+ "metadata": {}
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
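+ "A scatter plot of the same axes shows a similar pattern of convergence.\n",
+ "\n",
+ "In general, for clustering, you can use scatter plots to show clusters of data, so mastering this type of visualization is very useful. In the next lesson, we will take this filtered data and use k-means clustering to discover groups in this data that overlap in interesting ways.\n",
+ "\n",
+ "## **🚀 Challenge**\n",
+ "\n",
+ "In preparation for the next lesson, make a chart of the various clustering algorithms you might discover and use in a production environment. What kinds of problems is clustering trying to address?\n",
+ "\n",
+ "## [**Post-lecture quiz**](https://gray-sand-07a10f403.1.azurestaticapps.net/quiz/28/)\n",
+ "\n",
+ "## **Review & Self Study**\n",
+ "\n",
+ "Before you apply clustering algorithms, as we have learned, it is a good idea to understand the nature of your dataset. Read more on this topic [here](https://www.kdnuggets.com/2019/10/right-clustering-algorithm.html).\n",
+ "\n",
+ "Deepen your understanding of clustering techniques:\n",
+ "\n",
+ "- [Train and Evaluate Clustering Models using Tidymodels and friends](https://rpubs.com/eR_ic/clustering)\n",
+ "\n",
+ "- Bradley Boehmke & Brandon Greenwell, [*Hands-On Machine Learning with R*](https://bradleyboehmke.github.io/HOML/)*.*\n",
+ "\n",
+ "## **Assignment**\n",
+ "\n",
+ "[Research other visualizations for clustering](https://github.com/microsoft/ML-For-Beginners/blob/main/5-Clustering/1-Visualize/assignment.md)\n",
+ "\n",
+ "## THANK YOU TO:\n",
+ "\n",
+ "[Jen Looper](https://www.twitter.com/jenlooper) for creating the original Python version of this module ♥️\n",
+ "\n",
+ "[`Dasani Madipalli`](https://twitter.com/dasani_decoded) for creating the amazing illustrations that make machine learning concepts more interpretable and easier to understand.\n",
+ "\n",
+ "Happy Learning,\n",
+ "\n",
+ "[Eric](https://twitter.com/ericntay), Gold Microsoft Learn Student Ambassador.\n"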
+ ],
+ "metadata": {}
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
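+ "\n---\n\n**Disclaimer**: \nThis document has been translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The document in its original language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.\n"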
+ ]
+ }
+ ],
+ "metadata": {
+ "anaconda-cloud": "",
+ "kernelspec": {
+ "display_name": "R",
+ "language": "R",
+ "name": "ir"
+ },
+ "language_info": {
+ "codemirror_mode": "r",
+ "file_extension": ".r",
+ "mimetype": "text/x-r-source",
+ "name": "R",
+ "pygments_lexer": "r",
+ "version": "3.4.1"
+ },
+ "coopTranslator": {
+ "original_hash": "99c36449cad3708a435f6798cfa39972",
+ "translation_date": "2025-09-03T20:07:40+00:00",
+ "source_file": "5-Clustering/1-Visualize/solution/R/lesson_14-R.ipynb",
+ "language_code": "zh"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 1
+}
\ No newline at end of file
diff --git a/translations/zh-CN/5-Clustering/1-Visualize/solution/notebook.ipynb b/translations/zh-CN/5-Clustering/1-Visualize/solution/notebook.ipynb
new file mode 100644
index 000000000..ed5ece885
--- /dev/null
+++ b/translations/zh-CN/5-Clustering/1-Visualize/solution/notebook.ipynb
@@ -0,0 +1,853 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Defaulting to user installation because normal site-packages is not writeable\n",
+ "Requirement already satisfied: seaborn in /Users/jenniferlooper/Library/Python/3.8/lib/python/site-packages (0.11.2)\n",
+ "Requirement already satisfied: matplotlib>=2.2 in /Users/jenniferlooper/Library/Python/3.8/lib/python/site-packages (from seaborn) (3.5.0)\n",
+ "Requirement already satisfied: numpy>=1.15 in /Users/jenniferlooper/Library/Python/3.8/lib/python/site-packages (from seaborn) (1.21.4)\n",
+ "Requirement already satisfied: pandas>=0.23 in /Users/jenniferlooper/Library/Python/3.8/lib/python/site-packages (from seaborn) (1.3.4)\n",
+ "Requirement already satisfied: scipy>=1.0 in /Users/jenniferlooper/Library/Python/3.8/lib/python/site-packages (from seaborn) (1.7.2)\n",
+ "Requirement already satisfied: fonttools>=4.22.0 in /Users/jenniferlooper/Library/Python/3.8/lib/python/site-packages (from matplotlib>=2.2->seaborn) (4.28.1)\n",
+ "Requirement already satisfied: pyparsing>=2.2.1 in /Users/jenniferlooper/Library/Python/3.8/lib/python/site-packages (from matplotlib>=2.2->seaborn) (2.4.7)\n",
+ "Requirement already satisfied: kiwisolver>=1.0.1 in /Users/jenniferlooper/Library/Python/3.8/lib/python/site-packages (from matplotlib>=2.2->seaborn) (1.3.2)\n",
+ "Requirement already satisfied: pillow>=6.2.0 in /Users/jenniferlooper/Library/Python/3.8/lib/python/site-packages (from matplotlib>=2.2->seaborn) (8.4.0)\n",
+ "Requirement already satisfied: cycler>=0.10 in /Users/jenniferlooper/Library/Python/3.8/lib/python/site-packages (from matplotlib>=2.2->seaborn) (0.11.0)\n",
+ "Requirement already satisfied: packaging>=20.0 in /Users/jenniferlooper/Library/Python/3.8/lib/python/site-packages (from matplotlib>=2.2->seaborn) (21.2)\n",
+ "Requirement already satisfied: setuptools-scm>=4 in /Users/jenniferlooper/Library/Python/3.8/lib/python/site-packages (from matplotlib>=2.2->seaborn) (6.3.2)\n",
+ "Requirement already satisfied: python-dateutil>=2.7 in /Users/jenniferlooper/Library/Python/3.8/lib/python/site-packages (from matplotlib>=2.2->seaborn) (2.8.2)\n",
+ "Requirement already satisfied: pytz>=2017.3 in /Users/jenniferlooper/Library/Python/3.8/lib/python/site-packages (from pandas>=0.23->seaborn) (2021.3)\n",
+ "Requirement already satisfied: six>=1.5 in /Users/jenniferlooper/Library/Python/3.8/lib/python/site-packages (from python-dateutil>=2.7->matplotlib>=2.2->seaborn) (1.16.0)\n",
+ "Requirement already satisfied: tomli>=1.0.0 in /Users/jenniferlooper/Library/Python/3.8/lib/python/site-packages (from setuptools-scm>=4->matplotlib>=2.2->seaborn) (1.2.2)\n",
+ "Requirement already satisfied: setuptools in /Users/jenniferlooper/Library/Python/3.8/lib/python/site-packages (from setuptools-scm>=4->matplotlib>=2.2->seaborn) (59.1.1)\n",
+ "Note: you may need to restart the kernel to use updated packages.\n"
+ ]
+ }
+ ],
+ "source": [
+ "!pip install seaborn"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import matplotlib.pyplot as plt\n",
+ "import pandas as pd"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<div>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ " <thead>\n",
+ " <tr><th></th><th>name</th><th>album</th><th>artist</th><th>artist_top_genre</th><th>release_date</th><th>length</th><th>popularity</th><th>danceability</th><th>acousticness</th><th>energy</th><th>instrumentalness</th><th>liveness</th><th>loudness</th><th>speechiness</th><th>tempo</th><th>time_signature</th></tr>\n",
+ " </thead>\n",
+ " <tbody>\n",
+ " <tr><th>0</th><td>Sparky</td><td>Mandy &amp; The Jungle</td><td>Cruel Santino</td><td>alternative r&amp;b</td><td>2019</td><td>144000</td><td>48</td><td>0.666</td><td>0.8510</td><td>0.420</td><td>0.534000</td><td>0.1100</td><td>-6.699</td><td>0.0829</td><td>133.015</td><td>5</td></tr>\n",
+ " <tr><th>1</th><td>shuga rush</td><td>EVERYTHING YOU HEARD IS TRUE</td><td>Odunsi (The Engine)</td><td>afropop</td><td>2020</td><td>89488</td><td>30</td><td>0.710</td><td>0.0822</td><td>0.683</td><td>0.000169</td><td>0.1010</td><td>-5.640</td><td>0.3600</td><td>129.993</td><td>3</td></tr>\n",
+ " <tr><th>2</th><td>LITT!</td><td>LITT!</td><td>AYLØ</td><td>indie r&amp;b</td><td>2018</td><td>207758</td><td>40</td><td>0.836</td><td>0.2720</td><td>0.564</td><td>0.000537</td><td>0.1100</td><td>-7.127</td><td>0.0424</td><td>130.005</td><td>4</td></tr>\n",
+ " <tr><th>3</th><td>Confident / Feeling Cool</td><td>Enjoy Your Life</td><td>Lady Donli</td><td>nigerian pop</td><td>2019</td><td>175135</td><td>14</td><td>0.894</td><td>0.7980</td><td>0.611</td><td>0.000187</td><td>0.0964</td><td>-4.961</td><td>0.1130</td><td>111.087</td><td>4</td></tr>\n",
+ " <tr><th>4</th><td>wanted you</td><td>rare.</td><td>Odunsi (The Engine)</td><td>afropop</td><td>2018</td><td>152049</td><td>25</td><td>0.702</td><td>0.1160</td><td>0.833</td><td>0.910000</td><td>0.3480</td><td>-6.044</td><td>0.0447</td><td>105.115</td><td>4</td></tr>\n",
+ " </tbody>\n",
+ "</table>\n",
+ "</div>"
+ ],
+ "text/plain": [
+ " name album \\\n",
+ "0 Sparky Mandy & The Jungle \n",
+ "1 shuga rush EVERYTHING YOU HEARD IS TRUE \n",
+ "2 LITT! LITT! \n",
+ "3 Confident / Feeling Cool Enjoy Your Life \n",
+ "4 wanted you rare. \n",
+ "\n",
+ " artist artist_top_genre release_date length popularity \\\n",
+ "0 Cruel Santino alternative r&b 2019 144000 48 \n",
+ "1 Odunsi (The Engine) afropop 2020 89488 30 \n",
+ "2 AYLØ indie r&b 2018 207758 40 \n",
+ "3 Lady Donli nigerian pop 2019 175135 14 \n",
+ "4 Odunsi (The Engine) afropop 2018 152049 25 \n",
+ "\n",
+ " danceability acousticness energy instrumentalness liveness loudness \\\n",
+ "0 0.666 0.8510 0.420 0.534000 0.1100 -6.699 \n",
+ "1 0.710 0.0822 0.683 0.000169 0.1010 -5.640 \n",
+ "2 0.836 0.2720 0.564 0.000537 0.1100 -7.127 \n",
+ "3 0.894 0.7980 0.611 0.000187 0.0964 -4.961 \n",
+ "4 0.702 0.1160 0.833 0.910000 0.3480 -6.044 \n",
+ "\n",
+ " speechiness tempo time_signature \n",
+ "0 0.0829 133.015 5 \n",
+ "1 0.3600 129.993 3 \n",
+ "2 0.0424 130.005 4 \n",
+ "3 0.1130 111.087 4 \n",
+ "4 0.0447 105.115 4 "
+ ]
+ },
+ "execution_count": 4,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "df = pd.read_csv(\"../../data/nigerian-songs.csv\")\n",
+ "df.head()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Get information about the dataframe\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "<class 'pandas.core.frame.DataFrame'>\n",
+ "RangeIndex: 530 entries, 0 to 529\n",
+ "Data columns (total 16 columns):\n",
+ " # Column Non-Null Count Dtype \n",
+ "--- ------ -------------- ----- \n",
+ " 0 name 530 non-null object \n",
+ " 1 album 530 non-null object \n",
+ " 2 artist 530 non-null object \n",
+ " 3 artist_top_genre 530 non-null object \n",
+ " 4 release_date 530 non-null int64 \n",
+ " 5 length 530 non-null int64 \n",
+ " 6 popularity 530 non-null int64 \n",
+ " 7 danceability 530 non-null float64\n",
+ " 8 acousticness 530 non-null float64\n",
+ " 9 energy 530 non-null float64\n",
+ " 10 instrumentalness 530 non-null float64\n",
+ " 11 liveness 530 non-null float64\n",
+ " 12 loudness 530 non-null float64\n",
+ " 13 speechiness 530 non-null float64\n",
+ " 14 tempo 530 non-null float64\n",
+ " 15 time_signature 530 non-null int64 \n",
+ "dtypes: float64(8), int64(4), object(4)\n",
+ "memory usage: 66.4+ KB\n"
+ ]
+ }
+ ],
+ "source": [
+ "df.info()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "name 0\n",
+ "album 0\n",
+ "artist 0\n",
+ "artist_top_genre 0\n",
+ "release_date 0\n",
+ "length 0\n",
+ "popularity 0\n",
+ "danceability 0\n",
+ "acousticness 0\n",
+ "energy 0\n",
+ "instrumentalness 0\n",
+ "liveness 0\n",
+ "loudness 0\n",
+ "speechiness 0\n",
+ "tempo 0\n",
+ "time_signature 0\n",
+ "dtype: int64"
+ ]
+ },
+ "execution_count": 6,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "df.isnull().sum()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Look at the general values of the data. Note that popularity can be '0', and many rows have that value.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<div>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ " <thead>\n",
+ " <tr><th></th><th>release_date</th><th>length</th><th>popularity</th><th>danceability</th><th>acousticness</th><th>energy</th><th>instrumentalness</th><th>liveness</th><th>loudness</th><th>speechiness</th><th>tempo</th><th>time_signature</th></tr>\n",
+ " </thead>\n",
+ " <tbody>\n",
+ " <tr><th>count</th><td>530.000000</td><td>530.000000</td><td>530.000000</td><td>530.000000</td><td>530.000000</td><td>530.000000</td><td>530.000000</td><td>530.000000</td><td>530.000000</td><td>530.000000</td><td>530.000000</td><td>530.000000</td></tr>\n",
+ " <tr><th>mean</th><td>2015.390566</td><td>222298.169811</td><td>17.507547</td><td>0.741619</td><td>0.265412</td><td>0.760623</td><td>0.016305</td><td>0.147308</td><td>-4.953011</td><td>0.130748</td><td>116.487864</td><td>3.986792</td></tr>\n",
+ " <tr><th>std</th><td>3.131688</td><td>39696.822259</td><td>18.992212</td><td>0.117522</td><td>0.208342</td><td>0.148533</td><td>0.090321</td><td>0.123588</td><td>2.464186</td><td>0.092939</td><td>23.518601</td><td>0.333701</td></tr>\n",
+ " <tr><th>min</th><td>1998.000000</td><td>89488.000000</td><td>0.000000</td><td>0.255000</td><td>0.000665</td><td>0.111000</td><td>0.000000</td><td>0.028300</td><td>-19.362000</td><td>0.027800</td><td>61.695000</td><td>3.000000</td></tr>\n",
+ " <tr><th>25%</th><td>2014.000000</td><td>199305.000000</td><td>0.000000</td><td>0.681000</td><td>0.089525</td><td>0.669000</td><td>0.000000</td><td>0.075650</td><td>-6.298750</td><td>0.059100</td><td>102.961250</td><td>4.000000</td></tr>\n",
+ " <tr><th>50%</th><td>2016.000000</td><td>218509.000000</td><td>13.000000</td><td>0.761000</td><td>0.220500</td><td>0.784500</td><td>0.000004</td><td>0.103500</td><td>-4.558500</td><td>0.097950</td><td>112.714500</td><td>4.000000</td></tr>\n",
+ " <tr><th>75%</th><td>2017.000000</td><td>242098.500000</td><td>31.000000</td><td>0.829500</td><td>0.403000</td><td>0.875750</td><td>0.000234</td><td>0.164000</td><td>-3.331000</td><td>0.177000</td><td>125.039250</td><td>4.000000</td></tr>\n",
+ " <tr><th>max</th><td>2020.000000</td><td>511738.000000</td><td>73.000000</td><td>0.966000</td><td>0.954000</td><td>0.995000</td><td>0.910000</td><td>0.811000</td><td>0.582000</td><td>0.514000</td><td>206.007000</td><td>5.000000</td></tr>\n",
+ " </tbody>\n",
+ "</table>\n",
+ "</div>"
+ ],
+ "text/plain": [
+ " release_date length popularity danceability acousticness \\\n",
+ "count 530.000000 530.000000 530.000000 530.000000 530.000000 \n",
+ "mean 2015.390566 222298.169811 17.507547 0.741619 0.265412 \n",
+ "std 3.131688 39696.822259 18.992212 0.117522 0.208342 \n",
+ "min 1998.000000 89488.000000 0.000000 0.255000 0.000665 \n",
+ "25% 2014.000000 199305.000000 0.000000 0.681000 0.089525 \n",
+ "50% 2016.000000 218509.000000 13.000000 0.761000 0.220500 \n",
+ "75% 2017.000000 242098.500000 31.000000 0.829500 0.403000 \n",
+ "max 2020.000000 511738.000000 73.000000 0.966000 0.954000 \n",
+ "\n",
+ " energy instrumentalness liveness loudness speechiness \\\n",
+ "count 530.000000 530.000000 530.000000 530.000000 530.000000 \n",
+ "mean 0.760623 0.016305 0.147308 -4.953011 0.130748 \n",
+ "std 0.148533 0.090321 0.123588 2.464186 0.092939 \n",
+ "min 0.111000 0.000000 0.028300 -19.362000 0.027800 \n",
+ "25% 0.669000 0.000000 0.075650 -6.298750 0.059100 \n",
+ "50% 0.784500 0.000004 0.103500 -4.558500 0.097950 \n",
+ "75% 0.875750 0.000234 0.164000 -3.331000 0.177000 \n",
+ "max 0.995000 0.910000 0.811000 0.582000 0.514000 \n",
+ "\n",
+ " tempo time_signature \n",
+ "count 530.000000 530.000000 \n",
+ "mean 116.487864 3.986792 \n",
+ "std 23.518601 0.333701 \n",
+ "min 61.695000 3.000000 \n",
+ "25% 102.961250 4.000000 \n",
+ "50% 112.714500 4.000000 \n",
+ "75% 125.039250 4.000000 \n",
+ "max 206.007000 5.000000 "
+ ]
+ },
+ "execution_count": 7,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "df.describe()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Let's examine the genres. Quite a few are listed as 'Missing', which means they are not categorized with a genre in the dataset.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "Text(0.5, 1.0, 'Top genres')"
+ ]
+ },
+ "execution_count": 8,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png": "",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "import seaborn as sns\n",
+ "\n",
+ "top = df['artist_top_genre'].value_counts()\n",
+ "plt.figure(figsize=(10,7))\n",
+ "sns.barplot(x=top[:5].index,y=top[:5].values)\n",
+ "plt.xticks(rotation=45)\n",
+ "plt.title('Top genres',color = 'blue')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Remove the 'Missing' genre, since those songs are not classified in Spotify\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "Text(0.5, 1.0, 'Top genres')"
+ ]
+ },
+ "execution_count": 9,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png": "",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "df = df[df['artist_top_genre'] != 'Missing']\n",
+ "top = df['artist_top_genre'].value_counts()\n",
+ "plt.figure(figsize=(10,7))\n",
+ "sns.barplot(x=top.index,y=top.values)\n",
+ "plt.xticks(rotation=45)\n",
+ "plt.title('Top genres',color = 'blue')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "Text(0.5, 1.0, 'Top genres')"
+ ]
+ },
+ "execution_count": 10,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png": "",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "df = df[(df['artist_top_genre'] == 'afro dancehall') | (df['artist_top_genre'] == 'afropop') | (df['artist_top_genre'] == 'nigerian pop')]\n",
+ "df = df[(df['popularity'] > 0)]\n",
+ "top = df['artist_top_genre'].value_counts()\n",
+ "plt.figure(figsize=(10,7))\n",
+ "sns.barplot(x=top.index,y=top.values)\n",
+ "plt.xticks(rotation=45)\n",
+ "plt.title('Top genres',color = 'blue')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
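+ "The data is not strongly correlated, except between energy and loudness, which makes sense. Popularity corresponds to release date, which also makes sense, as more recent songs are probably more popular. Length and energy seem to be correlated: perhaps shorter songs are more energetic?\n"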
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png": "",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "corrmat = df.select_dtypes(include='number').corr()\n",
+ "f, ax = plt.subplots(figsize=(12, 9))\n",
+ "sns.heatmap(corrmat, vmax=.8, square=True);"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png": "",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "sns.set_theme(style=\"ticks\")\n",
+ "\n",
+ "# Show the joint distribution using kernel density estimation\n",
+ "g = sns.jointplot(\n",
+ " data=df,\n",
+ " x=\"popularity\", y=\"danceability\", hue=\"artist_top_genre\",\n",
+ " kind=\"kde\",\n",
+ ")"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
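+ "In general, the three genres align in terms of their popularity and danceability. A scatter plot of the same axes shows a similar pattern of convergence. Try a scatter plot to check the distribution of data per genre.\n"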
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 13,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stderr",
+ "output_type": "stream",
+ "text": [
+ "/Users/jenniferlooper/Library/Python/3.8/lib/python/site-packages/seaborn/axisgrid.py:337: UserWarning: The `size` parameter has been renamed to `height`; please update your code.\n",
+ " warnings.warn(msg, UserWarning)\n"
+ ]
+ },
+ {
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "execution_count": 13,
+ "metadata": {},
+ "output_type": "execute_result"
+ },
+ {
+ "data": {
+ "image/png": "",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "sns.FacetGrid(df, hue=\"artist_top_genre\", height=5) \\\n",
+ " .map(plt.scatter, \"popularity\", \"danceability\") \\\n",
+ " .add_legend()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
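+ "\n---\n\n**Disclaimer**: \nThis document has been translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The document in its original language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.\n"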
+ ]
+ }
+ ],
+ "metadata": {
+ "interpreter": {
+ "hash": "31f2aee4e71d21fbe5cf8b01ff0e069b9275f58929596ceb00d14d90e3e16cd6"
+ },
+ "kernelspec": {
+ "display_name": "Python 3.7.0 64-bit ('3.7')",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.8.9"
+ },
+ "metadata": {
+ "interpreter": {
+ "hash": "70b38d7a306a849643e446cd70466270a13445e5987dfa1344ef2b127438fa4d"
+ }
+ },
+ "orig_nbformat": 2,
+ "coopTranslator": {
+ "original_hash": "c61deff2839902ac8cb4ed411eb10fee",
+ "translation_date": "2025-09-03T20:02:50+00:00",
+ "source_file": "5-Clustering/1-Visualize/solution/notebook.ipynb",
+ "language_code": "zh"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
\ No newline at end of file
diff --git a/translations/zh-CN/5-Clustering/2-K-Means/README.md b/translations/zh-CN/5-Clustering/2-K-Means/README.md
new file mode 100644
index 000000000..f7de2e097
--- /dev/null
+++ b/translations/zh-CN/5-Clustering/2-K-Means/README.md
@@ -0,0 +1,252 @@
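+# K-Means clustering
+
+## [Pre-lecture quiz](https://ff-quizzes.netlify.app/en/ml/)
+
+In this lesson, you will learn how to create clusters using Scikit-learn and the Nigerian music dataset you imported earlier. We will cover the basics of K-Means for clustering. Keep in mind that, as you learned in the earlier lesson, there are many ways to work with clusters, and the method you use depends on your data. We will try K-Means because it is the most common clustering technique. Let's get started!
+
+Terms you will learn about:
+
+- Silhouette scoring
+- Elbow method
+- Inertia
+- Variance
+
+## Introduction
+
+[K-Means Clustering](https://wikipedia.org/wiki/K-means_clustering) is a method derived from the domain of signal processing. It is used to divide and partition groups of data into 'k' clusters using a series of observations. Each observation works to group a given data point with its nearest 'mean', that is, the center point of a cluster.
+
+The clusters can be visualized as [Voronoi diagrams](https://wikipedia.org/wiki/Voronoi_diagram), which include a point (or 'seed') and its corresponding region.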
+
+
+
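+> Infographic by [Jen Looper](https://twitter.com/jenlooper)
+
+The K-Means clustering process [executes in a three-step process](https://scikit-learn.org/stable/modules/clustering.html#k-means):
+
+1. The algorithm selects k center points by sampling from the dataset. After this, it loops:
+    1. It assigns each sample to the nearest centroid.
+    2. It creates new centroids by taking the mean value of all of the samples assigned to the previous centroids.
+    3. Then, it calculates the difference between the new and old centroids and repeats until the centroids are stabilized.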
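+
+The loop above can be written compactly. As a hedged sketch, with notation that is assumed here and not taken from the lesson ($x_i$ for a sample, $\mu_j$ for the centroid of cluster $S_j$), each pass performs an assignment step followed by an update step:
+
+$$
+c_i = \arg\min_{j \in \{1,\dots,k\}} \lVert x_i - \mu_j \rVert^2,
+\qquad
+\mu_j^{\text{new}} = \frac{1}{|S_j|} \sum_{i \in S_j} x_i,
+\quad S_j = \{\, i : c_i = j \,\}
+$$
+
+The loop stops once the centroids barely move, for example when $\sum_j \lVert \mu_j^{\text{new}} - \mu_j \rVert^2$ falls below a small tolerance. Scikit-learn exposes a threshold of this kind as the `tol` parameter of `KMeans`.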
+
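+One drawback of using K-Means is that you need to establish 'k', that is, the number of centroids. Fortunately, the 'elbow method' helps to estimate a good starting value for 'k'. You will try it in a minute.
+
+## Prerequisite
+
+You will work in this lesson's [_notebook.ipynb_](https://github.com/microsoft/ML-For-Beginners/blob/main/5-Clustering/2-K-Means/notebook.ipynb) file, which includes the data import and preliminary cleaning you did in the last lesson.
+
+## Exercise - preparation
+
+Start by taking another look at the songs data.
+
+1. Create a boxplot, calling `boxplot()` for each column: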
+
+ ```python
+ plt.figure(figsize=(20,20), dpi=200)
+
+ plt.subplot(4,3,1)
+ sns.boxplot(x = 'popularity', data = df)
+
+ plt.subplot(4,3,2)
+ sns.boxplot(x = 'acousticness', data = df)
+
+ plt.subplot(4,3,3)
+ sns.boxplot(x = 'energy', data = df)
+
+ plt.subplot(4,3,4)
+ sns.boxplot(x = 'instrumentalness', data = df)
+
+ plt.subplot(4,3,5)
+ sns.boxplot(x = 'liveness', data = df)
+
+ plt.subplot(4,3,6)
+ sns.boxplot(x = 'loudness', data = df)
+
+ plt.subplot(4,3,7)
+ sns.boxplot(x = 'speechiness', data = df)
+
+ plt.subplot(4,3,8)
+ sns.boxplot(x = 'tempo', data = df)
+
+ plt.subplot(4,3,9)
+ sns.boxplot(x = 'time_signature', data = df)
+
+ plt.subplot(4,3,10)
+ sns.boxplot(x = 'danceability', data = df)
+
+ plt.subplot(4,3,11)
+ sns.boxplot(x = 'length', data = df)
+
+ plt.subplot(4,3,12)
+ sns.boxplot(x = 'release_date', data = df)
+ ```
+
+ This data is a little noisy: by observing each column as a boxplot, you can see outliers.
+
+ 
+
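+You could go through the dataset and remove these outliers, but that would leave very little data.
+
+1. For now, choose which columns you will use for your clustering exercise. Pick ones with similar ranges and encode the `artist_top_genre` column as numeric data: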
+
+ ```python
+ from sklearn.preprocessing import LabelEncoder
+ le = LabelEncoder()
+
+ X = df.loc[:, ['artist_top_genre','popularity','danceability','acousticness','loudness','energy']]
+
+ y = df['artist_top_genre']
+
+ X['artist_top_genre'] = le.fit_transform(X['artist_top_genre'])
+
+ y = le.transform(y)
+ ```
+
+1. Now you need to pick how many clusters to target. You know there are 3 song genres that we carved out of the dataset, so let's try 3:
+
+ ```python
+ from sklearn.cluster import KMeans
+
+ nclusters = 3
+ seed = 0
+
+ km = KMeans(n_clusters=nclusters, random_state=seed)
+ km.fit(X)
+
+ # Predict the cluster for each data point
+
+ y_cluster_kmeans = km.predict(X)
+ y_cluster_kmeans
+ ```
+
+You see an array printed out with the predicted cluster (0, 1, or 2) for each row of the dataframe.
+
+1. Use this array to calculate a 'silhouette score':
+
+ ```python
+ from sklearn import metrics
+ score = metrics.silhouette_score(X, y_cluster_kmeans)
+ score
+ ```
+
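+## Silhouette score
+
+Look for a silhouette score close to 1. The score ranges from -1 to 1: a score near 1 means the cluster is dense and well separated from other clusters, a value near 0 means clusters overlap, with samples very close to the decision boundary of neighboring clusters, and negative values suggest samples may have been assigned to the wrong cluster. [(Source)](https://dzone.com/articles/kmeans-silhouette-score-explained-with-python-exam)
+
+Our score is **0.53**, so right in the middle. This indicates that our data is not particularly well suited to this type of clustering, but let's continue.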
+
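To see where such a score comes from, here is a small from-scratch sketch of the silhouette coefficient. For each sample it compares `a`, the mean distance to the other members of its own cluster, with `b`, the mean distance to the nearest other cluster, and scores the sample as `(b - a) / max(a, b)`. The function below and its toy data are illustrative assumptions; the lesson itself uses `metrics.silhouette_score`.

```python
import math
from collections import defaultdict


def silhouette_score(points, labels):
    """Mean silhouette coefficient (Euclidean distance, at least 2 clusters)."""
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)

    scores = []
    for point, label in zip(points, labels):
        own = clusters[label]
        if len(own) == 1:
            # Convention: a sample alone in its cluster scores 0.
            scores.append(0.0)
            continue
        # a: mean distance to the other members of the sample's own cluster.
        # The zero distance to itself adds nothing, so divide by len(own) - 1.
        a = sum(math.dist(point, q) for q in own) / (len(own) - 1)
        # b: mean distance to the members of the nearest other cluster.
        b = min(
            sum(math.dist(point, q) for q in members) / len(members)
            for other, members in clusters.items()
            if other != label
        )
        scores.append((b - a) / max(a, b))
    return sum(scores) / len(scores)


# Hypothetical one-dimensional data: two tight, well-separated pairs.
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
good = silhouette_score(points, [0, 0, 1, 1])  # sensible grouping, close to 1
bad = silhouette_score(points, [0, 1, 0, 1])   # mixed-up grouping, negative
```

The sensible grouping scores close to 1, while the mixed-up grouping scores below 0, matching the interpretation given above.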
+### Exercise - build a model
+
+1. Import `KMeans` and start the clustering process.
+
+ ```python
+ from sklearn.cluster import KMeans
+ wcss = []
+
+ for i in range(1, 11):
+ kmeans = KMeans(n_clusters = i, init = 'k-means++', random_state = 42)
+ kmeans.fit(X)
+ wcss.append(kmeans.inertia_)
+
+ ```
+
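+ There are a few parts here that warrant explaining.
+
+ > 🎓 range: These are the candidate numbers of clusters. The loop fits one model for each `k` from 1 to 10.
+
+ > 🎓 random_state: "Determines random number generation for centroid initialization." [Source](https://scikit-learn.org/stable/modules/generated/sklearn.cluster.KMeans.html#sklearn.cluster.KMeans)
+
+ > 🎓 WCSS: the "within-cluster sum of squares" is the sum of the squared distances between each point in a cluster and that cluster's centroid, added up over all clusters. [Source](https://medium.com/@ODSC/unsupervised-learning-evaluating-clusters-bd47eed175ce)
+
+ > 🎓 Inertia: K-Means algorithms attempt to choose centroids to minimize 'inertia', "a measure of how internally coherent clusters are." [Source](https://scikit-learn.org/stable/modules/clustering.html). In scikit-learn, `inertia_` is the WCSS of the fitted model; the loop appends it to the `wcss` list once per value of `k`.
+
+ > 🎓 k-means++: In [Scikit-learn](https://scikit-learn.org/stable/modules/clustering.html#k-means) you can use the 'k-means++' optimization, which "initializes the centroids to be (generally) distant from each other, leading to probably better results than random initialization."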
+
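As a rough illustration of what WCSS (inertia) measures, the sketch below computes it by hand for a fixed cluster assignment. The `wcss` helper and the toy points are assumptions for this example; in the lesson the value comes from `kmeans.inertia_`.

```python
from collections import defaultdict


def wcss(points, labels):
    """Sum of squared distances from each point to its cluster's mean."""
    clusters = defaultdict(list)
    for point, label in zip(points, labels):
        clusters[label].append(point)

    total = 0.0
    for members in clusters.values():
        # The centroid is the coordinate-wise mean of the cluster's members.
        centroid = [sum(coords) / len(members) for coords in zip(*members)]
        total += sum(
            (x - m) ** 2 for point in members for x, m in zip(point, centroid)
        )
    return total


# Hypothetical one-dimensional data: two tight pairs far apart.
points = [(0.0,), (1.0,), (10.0,), (11.0,)]
wcss_one_cluster = wcss(points, [0, 0, 0, 0])   # everything in one cluster
wcss_two_clusters = wcss(points, [0, 0, 1, 1])  # one cluster per pair
```

Splitting the data into its two natural groups cuts the WCSS from 101.0 to 1.0 here. Adding further clusters can only shave off a little more, which is the pattern the elbow plot looks for.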
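+### Elbow method
+
+Previously, you surmised that, because you have targeted 3 song genres, you should choose 3 clusters. But is that the case?
+
+1. Use the 'elbow method' to make sure.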
+
+ ```python
+ plt.figure(figsize=(10,5))
+ sns.lineplot(x=range(1, 11), y=wcss, marker='o', color='red')
+ plt.title('Elbow')
+ plt.xlabel('Number of clusters')
+ plt.ylabel('WCSS')
+ plt.show()
+ ```
+
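+ Use the `wcss` variable that you built in the previous step to create a chart showing where the 'bend' in the elbow is, which indicates the optimum number of clusters. Maybe it **is** 3!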
+
+ 
+
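The lesson reads the elbow off the chart by eye. As a rough numeric cross-check, one simple heuristic (an assumption of this sketch, not something the lesson or scikit-learn prescribes) is to pick the `k` where the drop in WCSS slows down the most. The `wcss_values` below are made-up numbers for illustration.

```python
def elbow_k(wcss_values, first_k=1):
    """Return the k at which the decrease in WCSS slows down the most."""
    # How much WCSS falls when going from k to k + 1.
    drops = [
        wcss_values[i] - wcss_values[i + 1] for i in range(len(wcss_values) - 1)
    ]
    # How much smaller the next drop is than the current one.
    slowdown = [drops[i] - drops[i + 1] for i in range(len(drops) - 1)]
    best = max(range(len(slowdown)), key=slowdown.__getitem__)
    # slowdown[i] describes the bend at k = first_k + i + 1.
    return first_k + best + 1


# Hypothetical WCSS values for k = 1..6.
wcss_values = [1000.0, 600.0, 200.0, 170.0, 150.0, 140.0]
suggested_k = elbow_k(wcss_values)
print(suggested_k)
```

Treat the result as a hint only: on noisy curves the largest slowdown can land on a different `k` than the one a person would pick from the plot.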
+## Exercise - display the clusters
+
+1. Try the process again, this time setting three clusters, and display the clusters as a scatterplot:
+
+ ```python
+ from sklearn.cluster import KMeans
+ kmeans = KMeans(n_clusters = 3)
+ kmeans.fit(X)
+ labels = kmeans.predict(X)
+ plt.scatter(df['popularity'],df['danceability'],c = labels)
+ plt.xlabel('popularity')
+ plt.ylabel('danceability')
+ plt.show()
+ ```
+
+1. Check the model's accuracy:
+
+ ```python
+ labels = kmeans.labels_
+
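+ # K-Means cluster IDs are arbitrary, so this only counts a match when an ID happens to equal the encoded genre
+ correct_labels = sum(y == labels)
+
+ print("Result: %d out of %d samples were correctly labeled." % (correct_labels, y.size))
+
+ print('Accuracy score: {0:0.2f}'.format(correct_labels/float(y.size)))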
+ ```
+
+ This model's accuracy is not very good, and the shape of the clusters gives you a hint why.
+
+ 
+
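+ This data is too imbalanced, too weakly correlated, and there is too much variance between the column values to cluster well. In fact, the clusters that form are probably heavily influenced or skewed by the three genre categories we defined above. That was a learning process!
+
+ In Scikit-learn's documentation, you can see that a model like this one, with clusters not very well demarcated, has a 'variance' problem:
+
+ 
+ > Infographic from Scikit-learn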
+
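One caveat about the accuracy check above: K-Means numbers its clusters arbitrarily, so cluster 0 need not correspond to encoded genre 0. A fairer comparison tries every relabeling of the cluster IDs and keeps the best match. The sketch below does this by brute force, which is only practical for a handful of clusters; the function name and the sample labels are assumptions for illustration.

```python
from itertools import permutations


def best_match_accuracy(true_labels, cluster_labels):
    """Accuracy under the most favorable renaming of the cluster IDs."""
    ids = sorted(set(true_labels) | set(cluster_labels))
    best_hits = 0
    for perm in permutations(ids):
        # Rename cluster ID ids[i] to perm[i] and count the agreements.
        mapping = dict(zip(ids, perm))
        hits = sum(mapping[c] == t for t, c in zip(true_labels, cluster_labels))
        best_hits = max(best_hits, hits)
    return best_hits / len(true_labels)


# Hypothetical labels: the grouping is perfect, only the IDs are shuffled.
y_true = [0, 0, 1, 1, 2, 2]
y_clusters = [2, 2, 0, 0, 1, 1]

naive_accuracy = sum(t == c for t, c in zip(y_true, y_clusters)) / len(y_true)
matched_accuracy = best_match_accuracy(y_true, y_clusters)
```

Here the naive comparison reports 0.0 even though the clusters reproduce the genres exactly, while the matched comparison reports 1.0.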
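+## Variance
+
+Variance is defined as "the average of the squared differences from the Mean" [(Source)](https://www.mathsisfun.com/data/standard-deviation.html). In the context of this clustering problem, it refers to how far the values in our dataset tend to diverge from the mean.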
+
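A quick way to see the problem is to compute the variance of two columns side by side. The values below are the `popularity` and `danceability` entries of the five rows printed by `df.head()` in this lesson's notebook after filtering; using only five rows is a simplification for illustration.

```python
import statistics

# popularity and danceability for the five rows shown by df.head().
popularity = [30.0, 14.0, 25.0, 26.0, 29.0]
danceability = [0.710, 0.894, 0.702, 0.803, 0.818]

# Population variance: the average squared difference from the mean.
popularity_variance = statistics.pvariance(popularity)
danceability_variance = statistics.pvariance(danceability)

print(popularity_variance, danceability_variance)
```

The two variances differ by several orders of magnitude, so a distance computed across both columns is driven almost entirely by `popularity`.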
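+✅ This is a great moment to think about all the ways you could correct this issue. Tweak the data a bit more? Use different columns? Use a different algorithm? Hint: Try [scaling your data](https://www.mygreatlearning.com/blog/learning-data-science-with-k-means-clustering/) to normalize it and test other columns.
+
+> Try this '[variance calculator](https://www.calculatorsoup.com/calculators/statistics/variance-calculator.php)' to understand the concept a bit more.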
+
+---
+
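+## 🚀Challenge
+
+Spend some time with this notebook, tweaking parameters. Can you improve the accuracy of the model by cleaning the data more (removing outliers, for example)? You can use weights to give more weight to given data samples. What else can you do to create better clusters?
+
+Hint: Try to scale your data. There's commented code in the notebook that adds standard scaling to make the data columns resemble each other more closely in terms of range. You'll find that while the silhouette score goes down, the 'kink' in the elbow graph smooths out. This is because K-Means relies on Euclidean distance, so in unscaled data the columns with the largest ranges (and therefore the largest variance) dominate the result. Read a bit more on this problem [here](https://stats.stackexchange.com/questions/21222/are-mean-normalization-and-feature-scaling-needed-for-k-means-clustering/21226#21226).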
+
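Standard scaling itself is a small computation: subtract the column mean and divide by the column's standard deviation. The sketch below does this in plain Python on five sample values per column; the helper name `standard_scale` is an assumption, and in the notebook the same idea would typically be applied with scikit-learn's `StandardScaler`.

```python
import statistics


def standard_scale(column):
    """Rescale a column to mean 0 and (population) standard deviation 1."""
    mean = statistics.fmean(column)
    std = statistics.pstdev(column)
    # A constant column has std == 0 and cannot be scaled this way.
    return [(value - mean) / std for value in column]


# Sample values for two columns with very different ranges.
popularity = [30.0, 14.0, 25.0, 26.0, 29.0]
danceability = [0.710, 0.894, 0.702, 0.803, 0.818]

scaled_popularity = standard_scale(popularity)
scaled_danceability = standard_scale(danceability)
```

After scaling, both columns have mean 0 and variance 1, so neither dominates the Euclidean distance simply because of its units.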
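+## [Post-lecture quiz](https://ff-quizzes.netlify.app/en/ml/)
+
+## Review & Self Study
+
+Take a look at a K-Means simulator [such as this one](https://user.ceng.metu.edu.tr/~akifakkus/courses/ceng574/k-means/). You can use this tool to visualize sample data points and determine their centroids. You can edit the data's randomness, the number of clusters, and the number of centroids. Does this help you get an idea of how the data can be grouped?
+
+Also, take a look at [this handout on K-Means](https://stanford.edu/~cpiech/cs221/handouts/kmeans.html) from Stanford.
+
+## Assignment
+
+[Try different clustering methods](assignment.md)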
+
+---
+
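+**Disclaimer**:
+This document has been translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The original document in its native language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.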
\ No newline at end of file
diff --git a/translations/zh-CN/5-Clustering/2-K-Means/assignment.md b/translations/zh-CN/5-Clustering/2-K-Means/assignment.md
new file mode 100644
index 000000000..925481561
--- /dev/null
+++ b/translations/zh-CN/5-Clustering/2-K-Means/assignment.md
@@ -0,0 +1,16 @@
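+# Try different clustering methods
+
+## Instructions
+
+In this lesson you learned about K-Means clustering. Sometimes K-Means is not appropriate for your data. Create a notebook using data either from these lessons or from somewhere else (credit your source) and show a different clustering method that does NOT use K-Means. What did you learn?
+
+## Rubric
+
+| Criteria | Exemplary | Adequate | Needs Improvement |
+| -------- | ------------------------------------------------------------ | ------------------------------------------------------------ | -------------------------- |
+| | A notebook is presented with a well-documented clustering model | A notebook is presented without good documentation and/or incomplete | Incomplete work is submitted |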
+
+---
+
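+**Disclaimer**:
+This document has been translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The original document in its native language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.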
\ No newline at end of file
diff --git a/translations/zh-CN/5-Clustering/2-K-Means/notebook.ipynb b/translations/zh-CN/5-Clustering/2-K-Means/notebook.ipynb
new file mode 100644
index 000000000..25ed95875
--- /dev/null
+++ b/translations/zh-CN/5-Clustering/2-K-Means/notebook.ipynb
@@ -0,0 +1,231 @@
+{
+ "metadata": {
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.7.0"
+ },
+ "orig_nbformat": 2,
+ "kernelspec": {
+ "name": "python37364bit8d3b438fb5fc4430a93ac2cb74d693a7",
+ "display_name": "Python 3.7.0 64-bit ('3.7')"
+ },
+ "metadata": {
+ "interpreter": {
+ "hash": "70b38d7a306a849643e446cd70466270a13445e5987dfa1344ef2b127438fa4d"
+ }
+ },
+ "coopTranslator": {
+ "original_hash": "3e5c8ab363e8d88f566d4365efc7e0bd",
+ "translation_date": "2025-09-03T20:10:14+00:00",
+ "source_file": "5-Clustering/2-K-Means/notebook.ipynb",
+ "language_code": "zh"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2,
+ "cells": [
+ {
+ "source": [],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Requirement already satisfied: seaborn in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (0.11.1)\n",
+ "Requirement already satisfied: numpy>=1.15 in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from seaborn) (1.19.2)\n",
+ "Requirement already satisfied: pandas>=0.23 in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from seaborn) (1.1.2)\n",
+ "Requirement already satisfied: scipy>=1.0 in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from seaborn) (1.4.1)\n",
+ "Requirement already satisfied: matplotlib>=2.2 in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from seaborn) (3.1.0)\n",
+ "Requirement already satisfied: python-dateutil>=2.7.3 in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from pandas>=0.23->seaborn) (2.8.0)\n",
+ "Requirement already satisfied: pytz>=2017.2 in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from pandas>=0.23->seaborn) (2019.1)\n",
+ "Requirement already satisfied: cycler>=0.10 in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from matplotlib>=2.2->seaborn) (0.10.0)\n",
+ "Requirement already satisfied: kiwisolver>=1.0.1 in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from matplotlib>=2.2->seaborn) (1.1.0)\n",
+ "Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from matplotlib>=2.2->seaborn) (2.4.0)\n",
+ "Requirement already satisfied: six>=1.5 in /Users/jenlooper/Library/Python/3.7/lib/python/site-packages (from python-dateutil>=2.7.3->pandas>=0.23->seaborn) (1.12.0)\n",
+ "Requirement already satisfied: setuptools in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from kiwisolver>=1.0.1->matplotlib>=2.2->seaborn) (45.1.0)\n",
+ "\u001b[33mWARNING: You are using pip version 20.2.3; however, version 21.1.2 is available.\n",
+ "You should consider upgrading via the '/Library/Frameworks/Python.framework/Versions/3.7/bin/python3.7 -m pip install --upgrade pip' command.\u001b[0m\n",
+ "Note: you may need to restart the kernel to use updated packages.\n"
+ ]
+ }
+ ],
+ "source": [
+ "pip install seaborn"
+ ]
+ },
+ {
+ "source": [
+ "Start where we finished in the last lesson, with the data imported and filtered.\n"
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ " name album \\\n",
+ "0 Sparky Mandy & The Jungle \n",
+ "1 shuga rush EVERYTHING YOU HEARD IS TRUE \n",
+ "2 LITT! LITT! \n",
+ "3 Confident / Feeling Cool Enjoy Your Life \n",
+ "4 wanted you rare. \n",
+ "\n",
+ " artist artist_top_genre release_date length popularity \\\n",
+ "0 Cruel Santino alternative r&b 2019 144000 48 \n",
+ "1 Odunsi (The Engine) afropop 2020 89488 30 \n",
+ "2 AYLØ indie r&b 2018 207758 40 \n",
+ "3 Lady Donli nigerian pop 2019 175135 14 \n",
+ "4 Odunsi (The Engine) afropop 2018 152049 25 \n",
+ "\n",
+ " danceability acousticness energy instrumentalness liveness loudness \\\n",
+ "0 0.666 0.8510 0.420 0.534000 0.1100 -6.699 \n",
+ "1 0.710 0.0822 0.683 0.000169 0.1010 -5.640 \n",
+ "2 0.836 0.2720 0.564 0.000537 0.1100 -7.127 \n",
+ "3 0.894 0.7980 0.611 0.000187 0.0964 -4.961 \n",
+ "4 0.702 0.1160 0.833 0.910000 0.3480 -6.044 \n",
+ "\n",
+ " speechiness tempo time_signature \n",
+ "0 0.0829 133.015 5 \n",
+ "1 0.3600 129.993 3 \n",
+ "2 0.0424 130.005 4 \n",
+ "3 0.1130 111.087 4 \n",
+ "4 0.0447 105.115 4 "
+ ],
+ "text/html": "\n\n
\n \n \n \n name \n album \n artist \n artist_top_genre \n release_date \n length \n popularity \n danceability \n acousticness \n energy \n instrumentalness \n liveness \n loudness \n speechiness \n tempo \n time_signature \n \n \n \n \n 0 \n Sparky \n Mandy & The Jungle \n Cruel Santino \n alternative r&b \n 2019 \n 144000 \n 48 \n 0.666 \n 0.8510 \n 0.420 \n 0.534000 \n 0.1100 \n -6.699 \n 0.0829 \n 133.015 \n 5 \n \n \n 1 \n shuga rush \n EVERYTHING YOU HEARD IS TRUE \n Odunsi (The Engine) \n afropop \n 2020 \n 89488 \n 30 \n 0.710 \n 0.0822 \n 0.683 \n 0.000169 \n 0.1010 \n -5.640 \n 0.3600 \n 129.993 \n 3 \n \n \n 2 \n LITT! \n LITT! \n AYLØ \n indie r&b \n 2018 \n 207758 \n 40 \n 0.836 \n 0.2720 \n 0.564 \n 0.000537 \n 0.1100 \n -7.127 \n 0.0424 \n 130.005 \n 4 \n \n \n 3 \n Confident / Feeling Cool \n Enjoy Your Life \n Lady Donli \n nigerian pop \n 2019 \n 175135 \n 14 \n 0.894 \n 0.7980 \n 0.611 \n 0.000187 \n 0.0964 \n -4.961 \n 0.1130 \n 111.087 \n 4 \n \n \n 4 \n wanted you \n rare. \n Odunsi (The Engine) \n afropop \n 2018 \n 152049 \n 25 \n 0.702 \n 0.1160 \n 0.833 \n 0.910000 \n 0.3480 \n -6.044 \n 0.0447 \n 105.115 \n 4 \n \n \n
\n
"
+ },
+ "metadata": {},
+ "execution_count": 6
+ }
+ ],
+ "source": [
+ "\n",
+ "import matplotlib.pyplot as plt\n",
+ "import pandas as pd\n",
+ "import seaborn as sns\n",
+ "\n",
+ "\n",
+ "df = pd.read_csv(\"../data/nigerian-songs.csv\")\n",
+ "df.head()"
+ ]
+ },
+ {
+ "source": [
+ "We will focus only on three genres. Maybe we can build three clusters!\n"
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "Text(0.5, 1.0, 'Top genres')"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 7
+ },
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": "",
+ "image/svg+xml": "\n\n\n\n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n \n",
+ "image/png": "\n"
+ },
+ "metadata": {
+ "needs_background": "light"
+ }
+ }
+ ],
+ "source": [
+ "df = df[(df['artist_top_genre'] == 'afro dancehall') | (df['artist_top_genre'] == 'afropop') | (df['artist_top_genre'] == 'nigerian pop')]\n",
+ "df = df[(df['popularity'] > 0)]\n",
+ "top = df['artist_top_genre'].value_counts()\n",
+ "plt.figure(figsize=(10,7))\n",
+ "sns.barplot(x=top.index,y=top.values)\n",
+ "plt.xticks(rotation=45)\n",
+ "plt.title('Top genres',color = 'blue')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ " name album \\\n",
+ "1 shuga rush EVERYTHING YOU HEARD IS TRUE \n",
+ "3 Confident / Feeling Cool Enjoy Your Life \n",
+ "4 wanted you rare. \n",
+ "5 Kasala Pioneers \n",
+ "6 Pull Up Everything Pretty \n",
+ "\n",
+ " artist artist_top_genre release_date length popularity \\\n",
+ "1 Odunsi (The Engine) afropop 2020 89488 30 \n",
+ "3 Lady Donli nigerian pop 2019 175135 14 \n",
+ "4 Odunsi (The Engine) afropop 2018 152049 25 \n",
+ "5 DRB Lasgidi nigerian pop 2020 184800 26 \n",
+ "6 prettyboydo nigerian pop 2018 202648 29 \n",
+ "\n",
+ " danceability acousticness energy instrumentalness liveness loudness \\\n",
+ "1 0.710 0.0822 0.683 0.000169 0.1010 -5.640 \n",
+ "3 0.894 0.7980 0.611 0.000187 0.0964 -4.961 \n",
+ "4 0.702 0.1160 0.833 0.910000 0.3480 -6.044 \n",
+ "5 0.803 0.1270 0.525 0.000007 0.1290 -10.034 \n",
+ "6 0.818 0.4520 0.587 0.004490 0.5900 -9.840 \n",
+ "\n",
+ " speechiness tempo time_signature \n",
+ "1 0.3600 129.993 3 \n",
+ "3 0.1130 111.087 4 \n",
+ "4 0.0447 105.115 4 \n",
+ "5 0.1970 100.103 4 \n",
+ "6 0.1990 95.842 4 "
+ ],
+ "text/html": "\n\n
\n \n \n \n name \n album \n artist \n artist_top_genre \n release_date \n length \n popularity \n danceability \n acousticness \n energy \n instrumentalness \n liveness \n loudness \n speechiness \n tempo \n time_signature \n \n \n \n \n 1 \n shuga rush \n EVERYTHING YOU HEARD IS TRUE \n Odunsi (The Engine) \n afropop \n 2020 \n 89488 \n 30 \n 0.710 \n 0.0822 \n 0.683 \n 0.000169 \n 0.1010 \n -5.640 \n 0.3600 \n 129.993 \n 3 \n \n \n 3 \n Confident / Feeling Cool \n Enjoy Your Life \n Lady Donli \n nigerian pop \n 2019 \n 175135 \n 14 \n 0.894 \n 0.7980 \n 0.611 \n 0.000187 \n 0.0964 \n -4.961 \n 0.1130 \n 111.087 \n 4 \n \n \n 4 \n wanted you \n rare. \n Odunsi (The Engine) \n afropop \n 2018 \n 152049 \n 25 \n 0.702 \n 0.1160 \n 0.833 \n 0.910000 \n 0.3480 \n -6.044 \n 0.0447 \n 105.115 \n 4 \n \n \n 5 \n Kasala \n Pioneers \n DRB Lasgidi \n nigerian pop \n 2020 \n 184800 \n 26 \n 0.803 \n 0.1270 \n 0.525 \n 0.000007 \n 0.1290 \n -10.034 \n 0.1970 \n 100.103 \n 4 \n \n \n 6 \n Pull Up \n Everything Pretty \n prettyboydo \n nigerian pop \n 2018 \n 202648 \n 29 \n 0.818 \n 0.4520 \n 0.587 \n 0.004490 \n 0.5900 \n -9.840 \n 0.1990 \n 95.842 \n 4 \n \n \n
\n
"
+ },
+ "metadata": {},
+ "execution_count": 8
+ }
+ ],
+ "source": [
+ "df.head()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
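+ "\n---\n\n**Disclaimer**: \nThis document has been translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The original document in its native language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.\n"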
+ ]
+ }
+ ]
+}
\ No newline at end of file
diff --git a/translations/zh-CN/5-Clustering/2-K-Means/solution/Julia/README.md b/translations/zh-CN/5-Clustering/2-K-Means/solution/Julia/README.md
new file mode 100644
index 000000000..f30fc4eeb
--- /dev/null
+++ b/translations/zh-CN/5-Clustering/2-K-Means/solution/Julia/README.md
@@ -0,0 +1,6 @@
+
+
+---
+
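+**Disclaimer**:
+This document has been translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The original document in its native language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.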
\ No newline at end of file
diff --git a/translations/zh-CN/5-Clustering/2-K-Means/solution/R/lesson_15-R.ipynb b/translations/zh-CN/5-Clustering/2-K-Means/solution/R/lesson_15-R.ipynb
new file mode 100644
index 000000000..d60c3460e
--- /dev/null
+++ b/translations/zh-CN/5-Clustering/2-K-Means/solution/R/lesson_15-R.ipynb
@@ -0,0 +1,637 @@
+{
+ "nbformat": 4,
+ "nbformat_minor": 0,
+ "metadata": {
+ "anaconda-cloud": "",
+ "kernelspec": {
+ "display_name": "R",
+ "language": "R",
+ "name": "ir"
+ },
+ "language_info": {
+ "codemirror_mode": "r",
+ "file_extension": ".r",
+ "mimetype": "text/x-r-source",
+ "name": "R",
+ "pygments_lexer": "r",
+ "version": "3.4.1"
+ },
+ "colab": {
+ "name": "lesson_14.ipynb",
+ "provenance": [],
+ "collapsed_sections": [],
+ "toc_visible": true
+ },
+ "coopTranslator": {
+ "original_hash": "ad65fb4aad0a156b42216e4929f490fc",
+ "translation_date": "2025-09-03T20:17:16+00:00",
+ "source_file": "5-Clustering/2-K-Means/solution/R/lesson_15-R.ipynb",
+ "language_code": "zh"
+ }
+ },
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "GULATlQXLXyR"
+ },
+ "source": [
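+ "## Explore K-Means clustering using R and Tidy data principles\n",
+ "\n",
+ "### [**Pre-lecture quiz**](https://gray-sand-07a10f403.1.azurestaticapps.net/quiz/29/)\n",
+ "\n",
+ "In this lesson, you will learn how to create clusters using the Tidymodels package and other packages in the R ecosystem (we'll call them friends 🧑🤝🧑), and the Nigerian music dataset you imported earlier. We will cover the basics of K-Means for clustering. Keep in mind that, as you learned in the earlier lesson, there are many ways to work with clusters and the method you use depends on your data. We will try K-Means as it's the most common clustering technique. Let's get started!\n",
+ "\n",
+ "Terms you will learn about:\n",
+ "\n",
+ "- Silhouette scoring\n",
+ "\n",
+ "- Elbow method\n",
+ "\n",
+ "- Inertia\n",
+ "\n",
+ "- Variance\n",
+ "\n",
+ "### **Introduction**\n",
+ "\n",
+ "[K-Means Clustering](https://wikipedia.org/wiki/K-means_clustering) is a method derived from the domain of signal processing. It is used to divide and partition groups of data into `k clusters` based on similarities in their features.\n",
+ "\n",
+ "The clusters can be visualized as [Voronoi diagrams](https://wikipedia.org/wiki/Voronoi_diagram), which include a point (or 'seed') and its corresponding region.\n",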
+ "\n",
+ "\n",
+ " \n",
+ " Jen Looper 制作的信息图 \n",
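+ " Infographic by Jen Looper \n",
+ "\n",
+ "K-Means clustering has the following steps:\n",
+ "\n",
+ "1. The data scientist starts by specifying the desired number of clusters to be created.\n",
+ "\n",
+ "2. Next, the algorithm randomly selects K observations from the data set to serve as the initial centers for the clusters (i.e., centroids).\n",
+ "\n",
+ "3. Next, each of the remaining observations is assigned to its closest centroid.\n",
+ "\n",
+ "4. Next, the new mean of each cluster is computed and the centroid is moved to the mean.\n",
+ "\n",
+ "5. Now that the centers have been recalculated, every observation is checked again to see if it might be closer to a different cluster. All the objects are reassigned again using the updated cluster means. The cluster assignment and centroid update steps are iteratively repeated until the cluster assignments stop changing (i.e., when convergence is achieved). Typically, the algorithm terminates when each new iteration results in negligible movement of centroids and the clusters become static.\n",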
+ "\n",
+ "\n",
+ "\n",
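+ "> Note that due to randomization of the initial k observations used as the starting centroids, we can get slightly different results each time we apply the procedure. For this reason, most algorithms use several *random starts* and choose the iteration with the lowest WCSS. As such, it is strongly recommended to always run K-Means with *nstart* set to more than one random start to avoid an *undesirable local optimum*.\n",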
+ "\n",
+ "\n",
+ "\n",
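+ "The following short animation explains the clustering process, using [artwork](https://github.com/allisonhorst/stats-illustrations) by Allison Horst:\n",
+ "\n",
+ "\n",
+ " \n",
+ " Artwork by @allison_horst \n",
+ "\n",
+ "A fundamental question that arises in clustering is this: how do you know how many clusters to separate your data into? One drawback of using K-Means is that you need to establish `k`, the number of `centroids`. Fortunately, the `elbow method` helps to estimate a good starting value for `k`. You'll try it in a minute.\n",
+ "\n",
+ "### **Prerequisite**\n",
+ "\n",
+ "We'll pick up right from where we stopped in the [previous lesson](https://github.com/microsoft/ML-For-Beginners/blob/main/5-Clustering/1-Visualize/solution/R/lesson_14-R.ipynb), where we analysed the data set, made lots of visualizations and filtered the data set to observations of interest. Be sure to check it out!\n",
+ "\n",
+ "We'll require some packages to complete this module. You can install them with: `install.packages(c('tidyverse', 'tidymodels', 'cluster', 'summarytools', 'plotly', 'paletteer', 'factoextra', 'patchwork'))`\n",
+ "\n",
+ "Alternatively, the script below checks whether you have the packages required to complete this module and installs them for you in case some are missing.\n"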
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "ah_tBi58LXyi"
+ },
+ "source": [
+ "suppressWarnings(if(!require(\"pacman\")) install.packages(\"pacman\"))\n",
+ "\n",
+ "pacman::p_load('tidyverse', 'tidymodels', 'cluster', 'summarytools', 'plotly', 'paletteer', 'factoextra', 'patchwork')\n"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "7e--UCUTLXym"
+ },
+ "source": [
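+ "Let's hit the ground running!\n",
+ "\n",
+ "## 1. A dance with data: Narrow down to the 3 most popular music genres\n",
+ "\n",
+ "This is a recap of what we did in the previous lesson. Let's slice and dice some data!\n"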
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "Ycamx7GGLXyn"
+ },
+ "source": [
+ "# Load the core tidyverse and make it available in your current R session\n",
+ "library(tidyverse)\n",
+ "\n",
+ "# Import the data into a tibble\n",
+ "df <- read_csv(file = \"https://raw.githubusercontent.com/microsoft/ML-For-Beginners/main/5-Clustering/data/nigerian-songs.csv\", show_col_types = FALSE)\n",
+ "\n",
+ "# Narrow down to top 3 popular genres\n",
+ "nigerian_songs <- df %>% \n",
+ " # Concentrate on top 3 genres\n",
+ " filter(artist_top_genre %in% c(\"afro dancehall\", \"afropop\",\"nigerian pop\")) %>% \n",
+ " # Remove unclassified observations\n",
+ " filter(popularity != 0)\n",
+ "\n",
+ "\n",
+ "\n",
+ "# Visualize popular genres using bar plots\n",
+ "theme_set(theme_light())\n",
+ "nigerian_songs %>%\n",
+ " count(artist_top_genre) %>%\n",
+ " ggplot(mapping = aes(x = artist_top_genre, y = n,\n",
+ " fill = artist_top_genre)) +\n",
+ " geom_col(alpha = 0.8) +\n",
+ " paletteer::scale_fill_paletteer_d(\"ggsci::category10_d3\") +\n",
+ " ggtitle(\"Top genres\") +\n",
+ " theme(plot.title = element_text(hjust = 0.5))\n"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "b5h5zmkPLXyp"
+ },
+ "source": [
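+ "🤩 That went well!\n",
+ "\n",
+ "## 2. More data exploration\n",
+ "\n",
+ "How clean is this data? Let's check for outliers using box plots. We will concentrate on numeric columns with fewer outliers (although you could clean out the outliers). Boxplots can show the range of the data and will help choose which columns to use. Note that boxplots do not show variance, an important element of good clusterable data. Please see [this discussion](https://stats.stackexchange.com/questions/91536/deduce-variance-from-boxplot) for further reading.\n",
+ "\n",
+ "[Boxplots](https://en.wikipedia.org/wiki/Box_plot) are used to graphically depict the distribution of `numeric` data, so let's start by *selecting* all numeric columns alongside the popular music genres.\n"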
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "HhNreJKLLXyq"
+ },
+ "source": [
+ "# Select top genre column and all other numeric columns\n",
+ "df_numeric <- nigerian_songs %>% \n",
+ " select(artist_top_genre, where(is.numeric)) \n",
+ "\n",
+ "# Display the data\n",
+ "df_numeric %>% \n",
+ " slice_head(n = 5)\n"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "uYXrwJRaLXyq"
+ },
+ "source": [
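+ "See how easy the selection helper `where` makes this 💁? Explore other such functions [here](https://tidyselect.r-lib.org/).\n",
+ "\n",
+ "Since we'll be making a boxplot for each numeric feature and we want to avoid using loops, let's reformat our data into a *longer* format that will allow us to take advantage of `facets`: subplots that each display one subset of the data.\n"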
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "gd5bR3f8LXys"
+ },
+ "source": [
+ "# Pivot data from wide to long\n",
+ "df_numeric_long <- df_numeric %>% \n",
+ " pivot_longer(!artist_top_genre, names_to = \"feature_names\", values_to = \"values\") \n",
+ "\n",
+ "# Print out data\n",
+ "df_numeric_long %>% \n",
+ " slice_head(n = 15)\n"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "-7tE1swnLXyv"
+ },
+ "source": [
+ "Much longer! Now time for some `ggplots`! So what `geom` will we use?\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "r88bIsyuLXyy"
+ },
+ "source": [
+ "# Make a box plot\n",
+ "df_numeric_long %>% \n",
+ " ggplot(mapping = aes(x = feature_names, y = values, fill = feature_names)) +\n",
+ " geom_boxplot() +\n",
+ " facet_wrap(~ feature_names, ncol = 4, scales = \"free\") +\n",
+ " theme(legend.position = \"none\")\n"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "EYVyKIUELXyz"
+ },
+ "source": [
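+ "Now we can see this data is a little noisy: by observing each column as a boxplot, you can see outliers. You could go through the data set and remove these outliers, but that would leave very little data.\n",
+ "\n",
+ "For now, let's choose which columns we will use for our clustering exercise. Let's pick the numeric columns with similar ranges. We could encode `artist_top_genre` as numeric, but we'll drop it for now.\n"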
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "-wkpINyZLXy0"
+ },
+ "source": [
+ "# Select variables with similar ranges\n",
+ "df_numeric_select <- df_numeric %>% \n",
+ " select(popularity, danceability, acousticness, loudness, energy) \n",
+ "\n",
+ "# Normalize data\n",
+ "# df_numeric_select <- scale(df_numeric_select)\n"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "D7dLzgpqLXy1"
+ },
+ "source": [
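+ "## 3. Computing k-means clustering in R\n",
+ "\n",
+ "We can compute k-means in R with the built-in `kmeans` function, see `help(\"kmeans\")`. The `kmeans()` function accepts a data frame with all numeric columns as its primary argument.\n",
+ "\n",
+ "The first step when using k-means clustering is to specify the number of clusters (k) that will be generated in the final solution. We know there are 3 song genres that we carved out of the data set, so let's try 3:\n"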
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "uC4EQ5w7LXy5"
+ },
+ "source": [
+ "set.seed(2056)\n",
+ "# Kmeans clustering for 3 clusters\n",
+ "kclust <- kmeans(\n",
+ " df_numeric_select,\n",
+ " # Specify the number of clusters\n",
+ " centers = 3,\n",
+ " # How many random initial configurations\n",
+ " nstart = 25\n",
+ ")\n",
+ "\n",
+ "# Display clustering object\n",
+ "kclust\n"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "hzfhscWrLXy-"
+ },
+ "source": [
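+ "The kmeans object contains several bits of information, which are well explained in `help(\"kmeans\")`. For now, let's focus on a few. We see that the data has been grouped into 3 clusters of sizes 65, 110 and 111. The output also contains the cluster centers (means) for the 3 groups across the 5 variables.\n",
+ "\n",
+ "The clustering vector is the cluster assignment for each observation. Let's use the `augment` function to add the cluster assignment to the original data set.\n"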
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "0XwwpFGQLXy_"
+ },
+ "source": [
+ "# Add predicted cluster assignment to data set\n",
+ "augment(kclust, df_numeric_select) %>% \n",
+ " relocate(.cluster) %>% \n",
+ " slice_head(n = 10)\n"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "NXIVXXACLXzA"
+ },
+ "source": [
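+ "Perfect, we have just partitioned our data set into a set of 3 groups. So, how good is our clustering 🤷? Let's take a look at the `Silhouette score`.\n",
+ "\n",
+ "### **Silhouette score**\n",
+ "\n",
+ "[Silhouette analysis](https://en.wikipedia.org/wiki/Silhouette_(clustering)) can be used to study the separation distance between the resulting clusters. This score varies from -1 to 1, and if the score is near 1, the cluster is dense and well-separated from other clusters. A value near 0 represents overlapping clusters with samples very close to the decision boundary of the neighboring clusters. [Source](https://dzone.com/articles/kmeans-silhouette-score-explained-with-python-exam).\n",
+ "\n",
+ "The average silhouette method computes the average silhouette of observations for different values of *k*. A high average silhouette score indicates a good clustering.\n",
+ "\n",
+ "The `silhouette` function in the cluster package computes the silhouette width of each observation, which we then average.\n",
+ "\n",
+ "> The silhouette can be calculated with any [distance](https://en.wikipedia.org/wiki/Distance \"Distance\") metric, such as the [Euclidean distance](https://en.wikipedia.org/wiki/Euclidean_distance \"Euclidean distance\") or the [Manhattan distance](https://en.wikipedia.org/wiki/Manhattan_distance \"Manhattan distance\") that we discussed in the [previous lesson](https://github.com/microsoft/ML-For-Beginners/blob/main/5-Clustering/1-Visualize/solution/R/lesson_14-R.ipynb).\n"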
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "Jn0McL28LXzB"
+ },
+ "source": [
+ "# Load cluster package\n",
+ "library(cluster)\n",
+ "\n",
+ "# Compute average silhouette score\n",
+ "ss <- silhouette(kclust$cluster,\n",
+ " # Compute euclidean distance\n",
+ " dist = dist(df_numeric_select))\n",
+ "mean(ss[, 3])\n"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "QyQRn97nLXzC"
+ },
+ "source": [
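+ "Our score is **.549**, so right in the middle. This indicates that our data is not particularly well-suited to this type of clustering. Let's see whether we can confirm this hunch visually. The [factoextra package](https://rpkgs.datanovia.com/factoextra/index.html) provides functions (`fviz_cluster()`) to visualize clustering.\n"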
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "7a6Km1_FLXzD"
+ },
+ "source": [
+ "library(factoextra)\n",
+ "\n",
+ "# Visualize clustering results\n",
+ "fviz_cluster(kclust, df_numeric_select)\n"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "IBwCWt-0LXzD"
+ },
+ "source": [
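+ "The overlap in clusters indicates that our data is not particularly well-suited to this type of clustering, but let's continue.\n",
+ "\n",
+ "## 4. Determining optimal clusters\n",
+ "\n",
+ "A fundamental question that often arises in K-Means clustering is this: without known class labels, how do you know how many clusters to separate your data into?\n",
+ "\n",
+ "One way we can try to find out is to use a data sample to `create a series of clustering models` with an incrementing number of clusters (e.g. from 1 to 10), and evaluate clustering metrics such as the **Silhouette score.**\n",
+ "\n",
+ "Let's determine the optimal number of clusters by computing the clustering algorithm for different values of *k* and evaluating the **Within Cluster Sum of Squares** (WCSS). The total within-cluster sum of squares (WCSS) measures the compactness of the clustering and we want it to be as small as possible, with lower values meaning that the data points are closer to their cluster centers.\n",
+ "\n",
+ "Let's explore the effect of different choices of `k`, from 1 to 10, on this clustering.\n"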
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "hSeIiylDLXzE"
+ },
+ "source": [
+ "# Create a series of clustering models\n",
+ "kclusts <- tibble(k = 1:10) %>% \n",
+ " # Perform kmeans clustering for 1,2,3 ... ,10 clusters\n",
+ " mutate(model = map(k, ~ kmeans(df_numeric_select, centers = .x, nstart = 25)),\n",
+ " # Farm out clustering metrics eg WCSS\n",
+ " glanced = map(model, ~ glance(.x))) %>% \n",
+ " unnest(cols = glanced)\n",
+ " \n",
+ "\n",
+ "# View clustering results\n",
+ "kclusts\n"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "m7rS2U1eLXzE"
+ },
+ "source": [
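+ "Now that we have the total within-cluster sum of squares (tot.withinss) for each clustering model with *k* centers, we use the [elbow method](https://en.wikipedia.org/wiki/Elbow_method_(clustering)) to find the optimal number of clusters. The method consists of plotting the WCSS as a function of the number of clusters, and picking the [elbow of the curve](https://en.wikipedia.org/wiki/Elbow_of_the_curve \"Elbow of the curve\") as the number of clusters to use.\n"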
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "o_DjHGItLXzF"
+ },
+ "source": [
+ "set.seed(2056)\n",
+ "# Use elbow method to determine optimum number of clusters\n",
+ "kclusts %>% \n",
+ " ggplot(mapping = aes(x = k, y = tot.withinss)) +\n",
+ " geom_line(size = 1.2, alpha = 0.8, color = \"#FF7F0EFF\") +\n",
+ " geom_point(size = 2, color = \"#FF7F0EFF\")\n"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "pLYyt5XSLXzG"
+ },
+ "source": [
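+ "The plot shows a large reduction in WCSS (so greater *tightness*) as the number of clusters increases from one to two, and a further noticeable reduction from two to three clusters. After that, the reduction is less pronounced, resulting in an `elbow` 💪 in the chart at around three clusters. This is a good indication that there are two to three reasonably well separated clusters of data points.\n",
+ "\n",
+ "We can now go ahead and extract the clustering model where `k = 3`:\n",
+ "\n",
+ "> `pull()`: used to extract a single column\n",
+ ">\n",
+ "> `pluck()`: used to index data structures such as lists\n"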
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "JP_JPKBILXzG"
+ },
+ "source": [
+ "# Extract k = 3 clustering\n",
+ "final_kmeans <- kclusts %>% \n",
+ " filter(k == 3) %>% \n",
+ " pull(model) %>% \n",
+ " pluck(1)\n",
+ "\n",
+ "\n",
+ "final_kmeans\n"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "l_PDTu8tLXzI"
+ },
+ "source": [
+ "Great! Let's go ahead and visualize the clusters obtained. Care for some interactivity using `plotly`?\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "dNcleFe-LXzJ"
+ },
+ "source": [
+ "# Add predicted cluster assignment to data set\n",
+ "results <- augment(final_kmeans, df_numeric_select) %>% \n",
+ " bind_cols(df_numeric %>% select(artist_top_genre)) \n",
+ "\n",
+ "# Plot cluster assignments\n",
+ "clust_plt <- results %>% \n",
+ " ggplot(mapping = aes(x = popularity, y = danceability, color = .cluster, shape = artist_top_genre)) +\n",
+ " geom_point(size = 2, alpha = 0.8) +\n",
+ " paletteer::scale_color_paletteer_d(\"ggthemes::Tableau_10\")\n",
+ "\n",
+ "ggplotly(clust_plt)\n"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "6JUM_51VLXzK"
+ },
+ "source": [
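+ "Perhaps we would have expected that each cluster (represented by different colors) would have distinct genres (represented by different shapes).\n",
+ "\n",
+ "Let's take a look at the model's accuracy.\n"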
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "HdIMUGq7LXzL"
+ },
+ "source": [
+ "# Assign genres to predefined integers\n",
+ "label_count <- results %>% \n",
+ " group_by(artist_top_genre) %>% \n",
+ " mutate(id = cur_group_id()) %>% \n",
+ " ungroup() %>% \n",
+ " summarise(correct_labels = sum(.cluster == id))\n",
+ "\n",
+ "\n",
+ "# Print results \n",
+ "cat(\"Result:\", label_count$correct_labels, \"out of\", nrow(results), \"samples were correctly labeled.\")\n",
+ "\n",
+ "cat(\"\\nAccuracy score:\", label_count$correct_labels/nrow(results))\n"
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "C50wvaAOLXzM"
+ },
+ "source": [
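+ "The accuracy of this model is not bad, but not great. It may be that the data does not lend itself well to K-Means clustering. This data is too imbalanced, too weakly correlated, and there is too much variance between the column values to cluster well. In fact, the clusters that form are probably heavily influenced or skewed by the three genre categories we defined above.\n",
+ "\n",
+ "Nevertheless, that was quite a learning process!\n",
+ "\n",
+ "In Scikit-learn's documentation, you can see that a model like this one, with clusters not very well demarcated, has a 'variance' problem:\n",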
+ "\n",
+ "\n",
+ " \n",
+ " Infographic from Scikit-learn \n",
+ "\n",
+ "\n",
+ "\n",
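+ "## **Variance**\n",
+ "\n",
+ "Variance is defined as “the average of the squared differences from the mean” [(source)](https://www.mathsisfun.com/data/standard-deviation.html). In the context of this clustering problem, it refers to the numbers in our dataset diverging a bit too much from the mean.\n",
+ "\n",
+ "✅ This is a great moment to think about all the ways you could correct this issue. Tweak the data a bit more? Use different columns? Use a different algorithm? Hint: try [scaling your data](https://www.mygreatlearning.com/blog/learning-data-science-with-k-means-clustering/) to normalize it, and test other columns.\n",
+ "\n",
+ "> Try this '[variance calculator](https://www.calculatorsoup.com/calculators/statistics/variance-calculator.php)' to understand the concept a bit more.\n",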
+ "\n",
+ "------------------------------------------------------------------------\n",
+ "\n",
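+ "## **🚀Challenge**\n",
+ "\n",
+ "Spend some time with this notebook, tweaking parameters. Can you improve the accuracy of the model by cleaning the data further (removing outliers, for example)? You can use weights to give more weight to certain data samples. What else can you do to create better clusters?\n",
+ "\n",
+ "Hint: try scaling your data. There is commented code in the notebook that adds standard scaling so that the data columns resemble each other more closely in range. You'll find that while the silhouette score goes down, the 'kink' in the elbow graph smooths out. This is because leaving the data unscaled lets columns with a larger range dominate the distance calculation, while columns with less variance barely count. Read more about this problem [here](https://stats.stackexchange.com/questions/21222/are-mean-normalization-and-feature-scaling-needed-for-k-means-clustering/21226#21226).\n",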
+ "\n",
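+ "## [**Post-lecture quiz**](https://gray-sand-07a10f403.1.azurestaticapps.net/quiz/30/)\n",
+ "\n",
+ "## **Review & Self Study**\n",
+ "\n",
+ "- Take a look at a K-Means simulator [such as this one](https://user.ceng.metu.edu.tr/~akifakkus/courses/ceng574/k-means/). You can use this tool to visualize sample data points and determine their centroids. You can edit the data's randomness, the number of clusters, and the number of centroids. Does this help you get an idea of how the data can be grouped?\n",
+ "\n",
+ "- Also, take a look at [this handout on K-Means](https://stanford.edu/~cpiech/cs221/handouts/kmeans.html) from Stanford.\n",
+ "\n",
+ "Want to try out your newly acquired clustering skills on datasets that lend themselves well to K-Means clustering? Please see:\n",
+ "\n",
+ "- [Train and Evaluate Clustering Models](https://rpubs.com/eR_ic/clustering) using Tidymodels and friends\n",
+ "\n",
+ "- [K-means Cluster Analysis](https://uc-r.github.io/kmeans_clustering), UC Business Analytics R Programming Guide\n",
+ "\n",
+ "- [K-means clustering with tidy data principles](https://www.tidymodels.org/learn/statistics/k-means/)\n",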
+ "\n",
+ "## **Assignment**\n",
+ "\n",
+ "[Try different clustering methods](https://github.com/microsoft/ML-For-Beginners/blob/main/5-Clustering/2-K-Means/assignment.md)\n",
+ "\n",
+ "## Special thanks to:\n",
+ "\n",
+ "[Jen Looper](https://www.twitter.com/jenlooper) for creating the original Python version of this module ♥️\n",
+ "\n",
+ "[`Allison Horst`](https://twitter.com/allison_horst/) for creating the amazing illustrations that make R more welcoming and engaging. Find more illustrations in her [gallery](https://www.google.com/url?q=https://github.com/allisonhorst/stats-illustrations&sa=D&source=editors&ust=1626380772530000&usg=AOvVaw3zcfyCizFQZpkSLzxiiQEM).\n",
+ "\n",
+ "Happy learning,\n",
+ "\n",
+ "[Eric](https://twitter.com/ericntay), Gold Microsoft Learn Student Ambassador.\n",
+ "\n",
+ "[Figure: artwork by @allison_horst]\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
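+ "\n---\n\n**Disclaimer**: \nThis document has been translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The original document in its native language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.\n"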
+ ]
+ }
+ ]
+}
\ No newline at end of file
diff --git a/translations/zh-CN/5-Clustering/2-K-Means/solution/notebook.ipynb b/translations/zh-CN/5-Clustering/2-K-Means/solution/notebook.ipynb
new file mode 100644
index 000000000..5bcd2b1e8
--- /dev/null
+++ b/translations/zh-CN/5-Clustering/2-K-Means/solution/notebook.ipynb
@@ -0,0 +1,550 @@
+{
+ "metadata": {
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.7.0"
+ },
+ "orig_nbformat": 2,
+ "kernelspec": {
+ "name": "python37364bit8d3b438fb5fc4430a93ac2cb74d693a7",
+ "display_name": "Python 3.7.0 64-bit ('3.7')"
+ },
+ "metadata": {
+ "interpreter": {
+ "hash": "70b38d7a306a849643e446cd70466270a13445e5987dfa1344ef2b127438fa4d"
+ }
+ },
+ "coopTranslator": {
+ "original_hash": "e867e87e3129c8875423a82945f4ad5e",
+ "translation_date": "2025-09-03T20:11:32+00:00",
+ "source_file": "5-Clustering/2-K-Means/solution/notebook.ipynb",
+ "language_code": "zh"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2,
+ "cells": [
+ {
+ "source": [],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "%pip install seaborn"
+ ]
+ },
+ {
+ "source": [
+ "Start where we finished in the last lesson, with the data imported and filtered.\n"
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ " name album \\\n",
+ "0 Sparky Mandy & The Jungle \n",
+ "1 shuga rush EVERYTHING YOU HEARD IS TRUE \n",
+ "2 LITT! LITT! \n",
+ "3 Confident / Feeling Cool Enjoy Your Life \n",
+ "4 wanted you rare. \n",
+ "\n",
+ " artist artist_top_genre release_date length popularity \\\n",
+ "0 Cruel Santino alternative r&b 2019 144000 48 \n",
+ "1 Odunsi (The Engine) afropop 2020 89488 30 \n",
+ "2 AYLØ indie r&b 2018 207758 40 \n",
+ "3 Lady Donli nigerian pop 2019 175135 14 \n",
+ "4 Odunsi (The Engine) afropop 2018 152049 25 \n",
+ "\n",
+ " danceability acousticness energy instrumentalness liveness loudness \\\n",
+ "0 0.666 0.8510 0.420 0.534000 0.1100 -6.699 \n",
+ "1 0.710 0.0822 0.683 0.000169 0.1010 -5.640 \n",
+ "2 0.836 0.2720 0.564 0.000537 0.1100 -7.127 \n",
+ "3 0.894 0.7980 0.611 0.000187 0.0964 -4.961 \n",
+ "4 0.702 0.1160 0.833 0.910000 0.3480 -6.044 \n",
+ "\n",
+ " speechiness tempo time_signature \n",
+ "0 0.0829 133.015 5 \n",
+ "1 0.3600 129.993 3 \n",
+ "2 0.0424 130.005 4 \n",
+ "3 0.1130 111.087 4 \n",
+ "4 0.0447 105.115 4 "
+ ]
+ },
+ "metadata": {},
+ "execution_count": 11
+ }
+ ],
+ "source": [
+ "\n",
+ "import matplotlib.pyplot as plt\n",
+ "import pandas as pd\n",
+ "import seaborn as sns\n",
+ "\n",
+ "\n",
+ "df = pd.read_csv(\"../../data/nigerian-songs.csv\")\n",
+ "df.head()"
+ ]
+ },
+ {
+ "source": [
+ "We will focus on only three genres. Maybe we can build three clusters!\n"
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "Text(0.5, 1.0, 'Top genres')"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 12
+ },
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": "[Figure: bar plot titled 'Top genres' showing the number of songs per artist_top_genre]"
+ },
+ "metadata": {
+ "needs_background": "light"
+ }
+ }
+ ],
+ "source": [
+ "df = df[(df['artist_top_genre'] == 'afro dancehall') | (df['artist_top_genre'] == 'afropop') | (df['artist_top_genre'] == 'nigerian pop')]\n",
+ "df = df[(df['popularity'] > 0)]\n",
+ "top = df['artist_top_genre'].value_counts()\n",
+ "plt.figure(figsize=(10,7))\n",
+ "sns.barplot(x=top.index,y=top.values)\n",
+ "plt.xticks(rotation=45)\n",
+ "plt.title('Top genres',color = 'blue')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 13,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ " name album \\\n",
+ "1 shuga rush EVERYTHING YOU HEARD IS TRUE \n",
+ "3 Confident / Feeling Cool Enjoy Your Life \n",
+ "4 wanted you rare. \n",
+ "5 Kasala Pioneers \n",
+ "6 Pull Up Everything Pretty \n",
+ "\n",
+ " artist artist_top_genre release_date length popularity \\\n",
+ "1 Odunsi (The Engine) afropop 2020 89488 30 \n",
+ "3 Lady Donli nigerian pop 2019 175135 14 \n",
+ "4 Odunsi (The Engine) afropop 2018 152049 25 \n",
+ "5 DRB Lasgidi nigerian pop 2020 184800 26 \n",
+ "6 prettyboydo nigerian pop 2018 202648 29 \n",
+ "\n",
+ " danceability acousticness energy instrumentalness liveness loudness \\\n",
+ "1 0.710 0.0822 0.683 0.000169 0.1010 -5.640 \n",
+ "3 0.894 0.7980 0.611 0.000187 0.0964 -4.961 \n",
+ "4 0.702 0.1160 0.833 0.910000 0.3480 -6.044 \n",
+ "5 0.803 0.1270 0.525 0.000007 0.1290 -10.034 \n",
+ "6 0.818 0.4520 0.587 0.004490 0.5900 -9.840 \n",
+ "\n",
+ " speechiness tempo time_signature \n",
+ "1 0.3600 129.993 3 \n",
+ "3 0.1130 111.087 4 \n",
+ "4 0.0447 105.115 4 \n",
+ "5 0.1970 100.103 4 \n",
+ "6 0.1990 95.842 4 "
+ ]
+ },
+ "metadata": {},
+ "execution_count": 13
+ }
+ ],
+ "source": [
+ "df.head()"
+ ]
+ },
+ {
+ "source": [
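+ "How clean is this data? Check for outliers using box plots. We will concentrate on columns with fewer outliers (although you could clean out the outliers). Box plots show the range of the data and help choose which columns to use. Note that box plots do not show variance, an important element of good clusterable data ([see this discussion](https://stats.stackexchange.com/questions/91536/deduce-variance-from-boxplot)).\n"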
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 14,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {},
+ "execution_count": 14
+ },
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": "[Figure: 4x3 grid of box plots, one per column: popularity, acousticness, energy, instrumentalness, liveness, loudness, speechiness, tempo, time_signature, danceability, length, release_date]"
+ },
+ "metadata": {
+ "needs_background": "light"
+ }
+ }
+ ],
+ "source": [
+ "plt.figure(figsize=(20,20), dpi=200)\n",
+ "\n",
+ "plt.subplot(4,3,1)\n",
+ "sns.boxplot(x = 'popularity', data = df)\n",
+ "\n",
+ "plt.subplot(4,3,2)\n",
+ "sns.boxplot(x = 'acousticness', data = df)\n",
+ "\n",
+ "plt.subplot(4,3,3)\n",
+ "sns.boxplot(x = 'energy', data = df)\n",
+ "\n",
+ "plt.subplot(4,3,4)\n",
+ "sns.boxplot(x = 'instrumentalness', data = df)\n",
+ "\n",
+ "plt.subplot(4,3,5)\n",
+ "sns.boxplot(x = 'liveness', data = df)\n",
+ "\n",
+ "plt.subplot(4,3,6)\n",
+ "sns.boxplot(x = 'loudness', data = df)\n",
+ "\n",
+ "plt.subplot(4,3,7)\n",
+ "sns.boxplot(x = 'speechiness', data = df)\n",
+ "\n",
+ "plt.subplot(4,3,8)\n",
+ "sns.boxplot(x = 'tempo', data = df)\n",
+ "\n",
+ "plt.subplot(4,3,9)\n",
+ "sns.boxplot(x = 'time_signature', data = df)\n",
+ "\n",
+ "plt.subplot(4,3,10)\n",
+ "sns.boxplot(x = 'danceability', data = df)\n",
+ "\n",
+ "plt.subplot(4,3,11)\n",
+ "sns.boxplot(x = 'length', data = df)\n",
+ "\n",
+ "plt.subplot(4,3,12)\n",
+ "sns.boxplot(x = 'release_date', data = df)"
+ ]
+ },
+ {
+ "source": [
+ "Choose several columns with similar ranges. Make sure to include the artist_top_genre column to keep our genres straight.\n"
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 15,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from sklearn.preprocessing import LabelEncoder, StandardScaler\n",
+ "le = LabelEncoder()\n",
+ "\n",
+ "# scaler = StandardScaler()\n",
+ "\n",
+ "X = df.loc[:, ['artist_top_genre','popularity','danceability','acousticness','loudness','energy']].copy()\n",
+ "\n",
+ "y = df['artist_top_genre']\n",
+ "\n",
+ "X['artist_top_genre'] = le.fit_transform(X['artist_top_genre'])\n",
+ "\n",
+ "# X = scaler.fit_transform(X)\n",
+ "\n",
+ "y = le.transform(y)\n",
+ "\n"
+ ]
+ },
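+ {
+ "source": [
+ "The commented-out `StandardScaler` lines in the previous cell hint at one way to bring columns onto a comparable scale. As a minimal sketch, assuming synthetic data from `make_blobs` rather than the songs dataset (the names `X_demo` and `X_scaled` are illustrative and not part of the lesson), the cell below shows what standard scaling does to the per-column spread. It is meant only to illustrate the idea, not to replace the lesson's own preprocessing.\n"
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Illustrative sketch on synthetic data; X_demo and X_scaled are hypothetical names,\n",
+ "# not variables used elsewhere in this lesson.\n",
+ "from sklearn.datasets import make_blobs\n",
+ "from sklearn.preprocessing import StandardScaler\n",
+ "\n",
+ "# Two features; stretch the second so its range dwarfs the first\n",
+ "X_demo, _ = make_blobs(n_samples=200, centers=3, n_features=2, random_state=0)\n",
+ "X_demo[:, 1] = X_demo[:, 1] * 1000\n",
+ "\n",
+ "# Standard scaling gives every column zero mean and unit variance\n",
+ "X_scaled = StandardScaler().fit_transform(X_demo)\n",
+ "\n",
+ "print(\"std before scaling:\", X_demo.std(axis=0))\n",
+ "print(\"std after scaling: \", X_scaled.std(axis=0))"
+ ]
+ },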
+ {
+ "source": [],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 16,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "array([2, 1, 2, 2, 2, 2, 2, 2, 2, 1, 2, 2, 1, 2, 0, 2, 1, 1, 0, 1, 0, 0,\n",
+ " 0, 1, 0, 2, 0, 0, 2, 2, 1, 1, 0, 2, 2, 2, 2, 1, 1, 0, 2, 0, 2, 0,\n",
+ " 2, 0, 0, 1, 1, 2, 1, 0, 0, 2, 2, 2, 2, 1, 1, 0, 1, 2, 2, 1, 2, 2,\n",
+ " 1, 2, 1, 2, 2, 1, 1, 1, 1, 1, 2, 1, 2, 2, 0, 2, 1, 1, 1, 2, 2, 2,\n",
+ " 2, 1, 2, 2, 2, 2, 1, 1, 2, 1, 1, 2, 1, 2, 1, 2, 1, 2, 2, 1, 2, 0,\n",
+ " 1, 1, 2, 1, 1, 2, 2, 2, 2, 2, 2, 2, 0, 1, 1, 1, 1, 0, 1, 2, 1, 2,\n",
+ " 1, 2, 2, 2, 0, 2, 1, 1, 1, 2, 1, 0, 1, 2, 2, 1, 1, 1, 0, 1, 2, 2,\n",
+ " 2, 1, 1, 0, 1, 2, 1, 1, 1, 1, 2, 0, 0, 0, 1, 0, 0, 0, 0, 0, 0, 2,\n",
+ " 0, 1, 0, 0, 1, 0, 0, 2, 0, 0, 1, 1, 2, 0, 2, 2, 0, 2, 2, 1, 1, 0,\n",
+ " 1, 1, 0, 0, 1, 0, 2, 0, 1, 0, 2, 0, 0, 2, 2, 2, 1, 1, 1, 1, 1, 0,\n",
+ " 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 2, 2, 1, 1, 0, 1, 1, 1, 0, 2, 2, 2,\n",
+ " 1, 1, 0, 0, 1, 1, 2, 0, 0, 0, 0, 0, 2, 0, 0, 2, 1, 1, 1, 2, 2, 2,\n",
+ " 1, 2, 1, 2, 1, 1, 1, 0, 2, 2, 2, 1, 2, 1, 0, 1, 2, 1, 1, 1, 2, 1],\n",
+ " dtype=int32)"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 16
+ }
+ ],
+ "source": [
+ "\n",
+ "from sklearn.cluster import KMeans\n",
+ "\n",
+ "nclusters = 3 \n",
+ "seed = 0\n",
+ "\n",
+ "km = KMeans(n_clusters=nclusters, random_state=seed)\n",
+ "km.fit(X)\n",
+ "\n",
+ "# Predict the cluster for each data point\n",
+ "\n",
+ "y_cluster_kmeans = km.predict(X)\n",
+ "y_cluster_kmeans"
+ ]
+ },
+ {
+ "source": [
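+ "Those numbers don't mean much to us, so let's compute a “silhouette score” to gauge how well separated the clusters are. The score ranges from -1 to 1; ours, about 0.55, is middling.\n"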
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 17,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "0.5466747351275563"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 17
+ }
+ ],
+ "source": [
+ "from sklearn import metrics\n",
+ "score = metrics.silhouette_score(X, y_cluster_kmeans)\n",
+ "score"
+ ]
+ },
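+ {
+ "source": [
+ "To get a feel for how the silhouette score reacts to the number of clusters, here is a small sketch on synthetic data generated with `make_blobs` (an assumption made purely for illustration; `X_demo` is a hypothetical name, not the lesson's songs data). Higher scores suggest better-separated clusters, so comparing the printed values across `k` gives a rough sanity check alongside the elbow method.\n"
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Illustrative sketch on synthetic data; X_demo is a hypothetical name,\n",
+ "# not a variable used elsewhere in this lesson.\n",
+ "from sklearn.cluster import KMeans\n",
+ "from sklearn.datasets import make_blobs\n",
+ "from sklearn.metrics import silhouette_score\n",
+ "\n",
+ "# Three synthetic blobs in two dimensions\n",
+ "X_demo, _ = make_blobs(n_samples=300, centers=3, cluster_std=1.0, random_state=0)\n",
+ "\n",
+ "# The silhouette score needs at least two clusters, so start at k = 2\n",
+ "for k in range(2, 6):\n",
+ "    labels = KMeans(n_clusters=k, n_init=10, random_state=0).fit_predict(X_demo)\n",
+ "    print(k, round(silhouette_score(X_demo, labels), 3))"
+ ]
+ },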
+ {
+ "source": [
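+ "Import KMeans and build a model for each candidate number of clusters, recording its within-cluster sum of squares (WCSS).\n"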
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 19,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from sklearn.cluster import KMeans\n",
+ "wcss = []\n",
+ "\n",
+ "for i in range(1, 11):\n",
+ " kmeans = KMeans(n_clusters = i, init = 'k-means++', random_state = 42)\n",
+ " kmeans.fit(X)\n",
+ " wcss.append(kmeans.inertia_)"
+ ]
+ },
+ {
+ "source": [
+ "Use these models to decide, with the elbow method, the best number of clusters to build.\n"
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 20,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": "[Figure: line plot titled 'Elbow' showing WCSS against the number of clusters (1 to 10)]"
+ },
+ "metadata": {
+ "needs_background": "light"
+ }
+ }
+ ],
+ "source": [
+ "plt.figure(figsize=(10,5))\n",
+ "sns.lineplot(x=list(range(1, 11)), y=wcss, marker='o', color='red')\n",
+ "plt.title('Elbow')\n",
+ "plt.xlabel('Number of clusters')\n",
+ "plt.ylabel('WCSS')\n",
+ "plt.show()"
+ ]
+ },
+ {
+ "source": [
+ "# Looks like 3 is a good number after all. Fit the model again and create a scatterplot of your clusters. They do group in bunches, but they are pretty close together."
+ ],
+ "cell_type": "code",
+ "metadata": {},
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 21,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": "",
+ "image/png": "iVBORw0KGgoAAAANSUhEUgAAAYIAAAEKCAYAAAAfGVI8AAAABHNCSVQICAgIfAhkiAAAAAlwSFlzAAALEgAACxIB0t1+/AAAADh0RVh0U29mdHdhcmUAbWF0cGxvdGxpYiB2ZXJzaW9uMy4xLjAsIGh0dHA6Ly9tYXRwbG90bGliLm9yZy+17YcXAAAgAElEQVR4nOydd3gUVduH7zOzNZWE0BKqVAHpAgpSFBTBil2wI2Lvfnasr72j2MWCBVGQpqgIAtKl914SSoD0bJ853x+zCdnsbFhCAop7X5eXZHZ2zskmOc85T/k9QkpJjBgxYsT476Ic7wnEiBEjRozjS8wQxIgRI8Z/nJghiBEjRoz/ODFDECNGjBj/cWKGIEaMGDH+48QMQYwYMWL8x6k2QyCE+FQIkS2EWB3hdSGEeFsIsVkIsVII0am65hIjRowYMSJTnSeCMcCACl4/F2ge/G84MLoa5xIjRowYMSJQbYZASjkbyKnglguBL6TBAqCGEKJedc0nRowYMWKYYzmOY2cAu8p8nRm8tqeiN6WlpcnGjRtX47RixIgR48Tj77//PiClrGX22vE0BFEjhBiO4T6iYcOGLFmy5DjPKEaMGDH+XQghdkR67XhmDWUBDcp8XT94LQwp5YdSyi5Syi61apkatBgxYsSIUUmOpyGYBFwbzB7qDuRLKSt0C8WIESNGjKqn2lxDQohvgD5AmhAiExgJWAGklO8D04CBwGbABdxQXXOJESNGjBiRqTZDIKW86jCvS+D26ho/RowYMWJER6yyOEaMGDH+48QMQYwYMWL8x4kZghgxYsT4jxMzBDFixIjxH+dfUVAWI0aM/xYysAu8vxtf2PsjLPWP74ROcGKGIEaMGP8o9OLPofBVQBoXCl9HJj6IEn/tcZ3XiUzMNRQjRox/DDKwM2gEvIAv+J8XCl8xTgkxqoWYIYgRI8Y/B+9vgG7ygg7eX4/1bP4zxFxDMWIcJQd25zD5velsWbGdFqc24/wRZ5NSO/l4T+tfijzeE/hPEjMEMU54pJRIKVGUqj8Ab125g3t7PYHf68fvDbBsxiomvDWVd+b/j/ot0qt8vBMee38ofMvkBcV4LUa1EHMNHWN25uex/sB+NN3s+BujKinOL+bl60cx0Hk1A2xXcn/fkezaYCpwW2neuvVDXAVu/N4AAD6Pn+I8F+/dM6ZKx/mvICyNIOEewI6xT7UY/064B2FpeETPklJH+jciAxHVl2MEEYbkz7+HLl26yH9jP4LMgnxumfIT2/JyUYXAplp49ewB9G180vGe2gmJlJI7uj3CtpU78PuMRVoIQXxyHGM2vk1yWtJRj6FpGufar0Lq4X9DVruFae5vjnqM/yoysB08wZiA42yEpfGRvd87D5n/AEgXSB3U+oiUdxGWJlU+138LQoi/pZRdzF6LnQiOAbqUXP3jODYcPIAnEKDY7yfX4+b2aZPZlpd7vKfHrvx83l44n2dnz2Tuzh382zYHZqxbuImd6zJLjQAYxsHn9fPLp39UyRiKomCxmntX7U57lYxRFUj57zt9CktjRMJw478jNQLaHmTuraAfMAwBHtC2IHOGIqW/Wub7bydmCI4Bi7IyyXW70cstsAFd5+tVK47TrAymbtzAOWPH8O7iBXy2fCkjpv7ELVN+qhbXVXF+MZuXb6Mwt6jKn12ezA27Ta/73D62rqwaV4EQgn5De2G1W0Ou2xxWBtx0ZpWMcTTorono2b2Q+1qhZ/dEd/1wvKd0TJCu8UCg/FXDKHjnHo8p/eOJBYuPAftdxabXA7pOVmHBMZ7NIVx+Pw/9/gueQCDk2rzMnUzfspmBzVtUyTi6rvPhg18yefR0LDYLAV+As4b24u73bka1qFUyRgl5+/NZ+vsq8g8UoJu4bOxxNpp3qjp33K1vXMfuLXtZv2gTqkVF82u069OGG569ssrGqAy66ycoeBLwBC9kQ8Ez6IASd8nxnFr1o+8F
THb+UjdOCTHCiBmCY0CnuukETHbYTouV3g0bH/sJBVmUlYlqkknj8vv5acPaKjMEP7w+hSkf/IbP48fnMf5A/xg7h6TUBIa9OLRKxgCY+M40PnzoKyw2FYEg4AugWo3FGUBRBPY4O+fc0LfKxnQmOHn1j6fYtnonmRt206hNAxq2yjC9t8TlJoSosvEjUvwmpUagFDcUvQUnuCEQttOQ7qkY/a7KooOt0/GY0j+emGvoGJCRlMRlrdvitBxyIdhVlXqJCVzY6uTjNi+LokRM27apVbdHGP/6ZLwub8g1r9vHpPemV1k8YsuK7Xz88Fj8Xj/uQg+uQje6piMAZ6IDq91Ct0GdeXfRiySmJFTJmGVp0rYhZ1zS3dQI7N2ezaPnPs8A25UMdF7NC0PfoiCnsMrnEIIWoeurvveEiAFViONssDTEyDwqwQmOAQhL0+M1q380sRPBMeLpPmfRqV4GX6xchsvnY2DzltzQoRMOi/Xwb64mumbUR1XCd6dOi5XLW7etsnEixQQ8xR50Ta8S99D0MTPxe8PdATaHjQc+u50zBnc76jEqg6vQzZ3dHqHgYCG6LtE1ndnjF7B15Q4+WP5qtdQ2AKDWB21n+HUl/dicSMohpRf0IlBSEKJ6959C2CD1W6TrS/BMBuyIuCHgvKhax/03EzMExwghBBe1OpmLjuMJoDw2VeWD8y7ipkk/AqAFC6+uansKPRs2qrJxWnRuypp5G8KuN2iVUWUxAneRxzQmIKXEU1zeRXLsmPHVbDwub8jcAr4A+7bvZ8WsNXQ885TqGTjhfsj/P0LdQw5IuK9Kh5FSgn8J0vMHiHiE84KQfH8pfciC58A9AZCgJCMTn0BxDqjSeZRHKHGIhFsg4ZZqHedEoVpNsxBigBBigxBisxDiYZPXGwkhZgghVgohZgkhYlqzx5iuGfVZcNMInjuzP4/27M3PQ67j8V59q3TXeOsb1+OIs6METx9CCOxxNu5456YqG+OMwd1xxIenbGoBjc7921XZOEfKttU78RR7w65rms6u9eaZTVWB4jwXkl8CtSGgGv9PfgEl7oIqG0NKicx/CJk7DFyfQPF7yAOD0F0TD92T/zi4J1IqIqfvh/yHkL5FVTaPGEdPtZ0IhBAq8C7QH8gEFgshJkkp15a57VXgCynl50KIM4EXgGuqa04nGlJKxq5awUdLl5DncXNqen0e7tmLZqk1j+g58TYbF7asvpNKy1Ob8c7CFxj7/A9sXrqVRm0aMOSxS6o0e6fLOe3pfHZ7/v51BZ5iL0KAzWnjuqevILVuSpWNEwktoFGYW0RiSkLIKadp+8Y44u1hxkBVFRq1rt59j+I8F5znVt8AvjmGSJx0By8EjP8KnkA6zgQkeKZhKIiWxYMsGo1I7Vp9c4txRFSna6grsFlKuRVACPEtcCFQ1hC0BkrOqjOBifzLkVJG3E3vys/npw3rKPL5OLPJSZyannFUO+8X587mq1XLcQfTP2du38rCrEymXX0tDZL/WaJnjds04LGv76m25yuKwlWPXMzmpdvwuo0UwabtG9NvaK9KPa+in+OcHxYw7tVJ5O8voMs57YlLdDJ59K/4fQFsDitDn7yMS+4ZhBCCM6/uyedPjcPn8aNrRuaY1WYho3k92vVuXblv9h+CdE8NFmyVQ1jA9xeozUBYQZY3BJjHL2IcN6rTEGQAZQXEM4HyEbsVwGDgLeBiIFEIUVNKebDsTUKI4cBwgIYNj0xv5FixKCuTZ/78g3UH9pNkt3NTx87c2qVbaXrmTxvW8cjvv6JJHb+u8+XK5XTLqE+yw8HO/Dy612/A9R06USsuPqrx8j0evli5DK+mlV6TgCfg5/2/F/H8mcdWoCtvfz5/TViE3xug23mdqNekzlE/c/nM1Ux4exp5+wvocVFXzrulP3GJzpAxJ749jaUzVpNat4ZxGiiTnbRh8Rbu7/sUn6x5I2qDu2XFdt654xPWzt+AI87OucPO4qYXhmALFo19+ez3
fPfyT3iDO/wpH/wWIjHh9/r5/IlviUt0MHBYP5wJTkYtfIF37/qUxb8sQ7Wo9LmyByNevfa4BG2rFGEBBOapZxawNDBy98NQwNqheucW44ioNq0hIcSlwAAp5bDg19cA3aSUd5S5Jx0YBTQBZgOXAG2llHmRnlsVWkP5Hg/j1qxi2b49tKyZxlVt21E7vvIphWuy93H5+G9Ld+YATouFy9ucwsjeZ1Lg9dL9k/dDCrdKKPkzsikq8TYrk6+6hvTEw+vgLN+7h+smjqfQF77balUzjWlDrotq7gFd5+OlS/hq1XKKfX56NWrMQz3OICOKOZQwe/x8XrpuFIoQ6MF6iasfu4Qhjx1Zvrqr0M3fv61ECNi1Pouxz/9YmnZqc9qo3aAm7y15CWeCk4N7chnR8UGK812m2UIlOBMdPDvpYdr3bnPY8fft2M/Np9yHu+hQgNXmtNLl7A48PeEhivOLubzecHwekx1uOWo1SOPrHaOj+K7/vUjfEmTOTYA79AURh6g9HyGc6EXvQ/HoMu4jAcKJqPlDLJXzGFOR1lB1ngiygAZlvq4fvFaKlHI3xokAIUQCcElFRqBKJlVYwKCvv6DI60NH8uvmTXy4ZDHjL7+Kk2vVrtQz3160IGyRdwcCfLt6Jfd278G8XTuNnH0TSsywT9fQvDpvLPiLV/of3q+bnpiIr8xpoAQBnJSSGvXcH/j1Z6Zv2VR6spiycT1zd+7gt2uuJ9UZd9j3F+QU8tJ1o/C5QxfHb/73I90GdqJZx+hEvmaPn8/L149CtahIKXEXhmb6+Nw+sncdZOqHv3Ppfecz9rnxFOYUoQXCP4OySF2yd1t2VIZgwttTS1VED43rZ8n05ezZto+DWTlY7ZaoDEHu3iPTkFo+czVf/+9H9m7LpvXpLRn6xKXUb17viJ5xrBG2Lsj466H4U4zfPAWQiBqjEMI4uYn4W0BNRxaNNqp6bZ0g4U7wr0Z3jTdE4ByDEEp0J+EY1UN1Zg0tBpoLIZoIIWzAlcCksjcIIdLEoaTiR4BPq3E+APzfb9Mp8HrRg0uwDri1ALdPm1LpZ248eMD0cGxVVXYXFmAxydU3Q5OSOTui08GpHZ9A38YnYVdD0y8dFgsjuhwKwi3bs5sLv/2KZu+8TocPRvHmgr9Kq5x35efzy+aNYe6lAq+Hb1atjGoeC6csRVXDf438Xj8zvp4T1TMOZB3kpetG4XX5cBW4w4xACT63j/mTjdPg4p+XH9YIgOHrj9YYbVq6jYA//NRmtVvJ3LCb1HopISJ2FdHgZPPqYjP++HYuj5//AstmrGLP1n3M/GYut3V5iB3rMqN+xvFCSbwXkTYVkfgQIulJRO25CHtPpLYbGdhmSEFru0E/CLIQApmQOxxZMBJcnyALnkfuP8toUVkBUi9Gun9CFn+O9IenIsc4OqrNEEgpA8AdwHRgHTBOSrlGCPGMEKIkh60PsEEIsRGoAzxfXfMpYUGWed/T7fm5eE1cN9HQKi0Ns6Xer2lkJCbRo0GjqKs5kx2OqMd9/Zxzuahla+yqikVRqJ+UxHsDL6BtbcM/v+ngQYZO+J5V2fvQpaTA6+XDpUt4YubvAKzdn43fRPpCk5KZ27dGNQdN002/NymJaqEG+HPcfFMp5/IIASl1awCQWPPwrjy700a7Xm1o2r5xVPNo3qmJqZqo3+unfst00pvWpWWXplhsFR+k7U4bt7wSXaN1TdN47+7P8LoOnTJ0TcdT5GXM499G9YzjjbA0RMQPRcQNBj0X/cCFyP3nIA9cBPs7Q9EokHmADtqmMqqgAG6QeciCxyM+X/qWI/efgcwfiSx8BXnwMvT8h0/8CuljSLXWEUgpp0kpW0gpm0opnw9ee1JKOSn47/FSyubBe4ZJKcMTrquY8gqgZamsIbiz62k4LKGLg9Ni4aq27Ui023Farbw38AIcqorDYsGqKMZBulyw0GmxMKxj56jHdVisvNDvbFaMuJPFw27lz+uG0bvxod3v6CUL
Q3b7AJ5AgInr15LjdlHk90VsDGgWezCj28COpdkwZbE5bfS+7PSonuEp9kZlNGxOGxfdYbjNLr3v/LC6AdWqUDMjleS0JNLq1+TKhy/i6YkPRjUHgMF3D8LqKKck6rRx6oCOpcHvpyY8SPverbHarTgTHCSkxHPF/11Iiy5NSUiJp/XpLXh+6qN07t8+qjFz9+WHxCRKkFKy+q91Uc+99H2BzUjvQqRezRIWZmPLADJnCAQ2YNQNuIML/uF+l3TwLUKaZBdJqSHzbgVZhKEd5AM84P4ZvNOr+luocqS2Fz1/JPr+/ugHr0Z6Zh7vKZnyn6ssTrDZKDJZ5FQhiLfZKvXM1rVqM+aiS3jmz5msO7CfZLudG4NZQyXUS0ykZlw8+13FKEJgV1XqJyWTWViAVVHwaRpDTmnPpZWQdrCpKjY1vEJ37YH9pobPpqrsys+nht1hBHhN7qkZRXwAIKVODW598wZG3zsGLaChazo2h43+1/biQOZBJrw9jdant6Rll8iBwVPP7cA3L04I0yMSwpCIUK0qWkDnllevoW2PVgD0vbIH21fv5Ic3pmC1Wwn4AjTr2IRnJv0fSamJUc29PLUb1uLNOc8y6s5PWDPPyBoaOLwfNzx3Vek9SamJvDj9CXL35VGYW0xGs7pHVR0dnxwX8TSUcgT1D1I7gMwdDoHNpSmbMuEOlGNZWeubG1ywKyNhXhJjKId/FUgzV6Eb6RqHcFRvhfLRILV9yAMXBD+TAGg7kHlrkIn3osRff7ynF8J/zhDc2+10np/7Z8jiJ4Ahp7Q3VeKMxAGXi+/WrGTDgQO0q1OXy1q3ZfJV5rVwmq4z9Mfv2e8qDtmBZxYW8N7A87GrFlqmpUUVnD0STk6rxeacg2ELvVfTaJCcTIPkZCxBI1QWu6pyXouWUY9z3vD+dDyzLTO//Qu/x0+Tdo14546PmTF2DgGfhqIqdOp3CiPHP2C6aLbo3JRmnZqwZu76kOsd+rZlxOvXU5hTRItTm+KMP+Q2E0Jw4/NXc+n957N1xQ7SMlKrpEdwfHIcaRmpJNSIIy7R+LfFGj7nlDo1SKlT46jHc8Y7OPPqnsz89q+QgLs93s7Vj1xc+rXU9iILXwPvbBBOiBuCiL8BIYw/YZl3OwTWA4FDC2fxe0hLC4Sj6tRWK0TLjpAuejgsYO9V+r2EEgBTxysgK3eCP1bI4g8PGYFS3FD4BjLuitKA+j+B/5whuL5DJ3I8bj5eagQddQmXnNyax87oE/UzNh48wGXff4tPC+DVNH7ftoX3/17ExCuGUD8pvJBrfuYuiv3+MDeMX9OYtWM7T/WuniYmt3bpyq9bNoWktTosFi5o0arU6NzZtTvvLV5Yeo/DYqFRcg0Gnxy52ElKyboFG1k4bSlxiU76XtmDjGb1GPr4pUgpubH1PRQeLKSs/Vn6+yqmfvg7F9x2Ttjz9mzbx6YlW8Kur5m3gcTUBE5qZ657tG/HfsY+/wMr/1xDrQZpXPXwxXTqV3k5idx9edza5SGKc4vRdUnBwSI+e/xbdqzN5L4PR1T6uYfjrneH4ff6mfPDQiw2FalLhj55GX2u6AGA1POQBy8GPQ/QQOZC0TvIwDpEjdeRgUzwryWsGYt0I12fVbkhkHoe+DeAWi+0j7C1IxHlbEMQGMqgunF6UVIRSc+a32pth7kH24mI+4eLyHnnE94gBxAqBLaAteqEHY+W/2zPYpffT2ZBPnUTEkiyRx+gBbjs+29Yumd3yK+8KgRnNWnK++ddGHb/lI3refj3X3EFTNQxVZWArtM8tSaP9+pDjwZVJ/YGRtbQU3/+wersfSTa7VzXviN3dj0tJJ11zs7tfLliOXleDwObteCKNqfgtJqrokopeeWGd5kzfgEetxeL1YKiKjz02e30vvx0dm/Zy/D294cEP0to2qEx7y99Jez6uFd+4rMnviVQLiPH5rAy7MWhXHzXwLD37Nm2j1s7P4SnyIMWMHah9jg7t799A+feeNYRfUYl
jHnyW8a9MimsLsHqsPLF5lGkpRtpuWvmbWDCO9PI2ZNL9/O6MGh4P+KTKj7NSWmksVpsFmrVN5cAKcgpJGdPHvVOqh3S6lIv+gCK3iW8v4AdkTYNZAEy55rg7rMcanOUWlONOfhXGgJw/lUgEiDuGkTCbRF24ubfgyx8FVxfgLCB9IO1PSLlPYRiuOP03HvAO5NDtQV2YywZAArA0goSH0GgQWAjqI3BfkaFc5De2cjcOwEN8IGIA2tnRMoHUc/9eKDn3Gi4y8KwIWr9jlDrHtP5HK86gn8Ufk3DHQiQaLMhhCDOaqVFzbQjfk5A11m2d0/YvkeTktk7t5u+p1PddFMjAJS6ZTYcPMDNkyfy5cWX0rleaOphkc+HVVGwW478x9WxXjo/XVlx85czGjbmjCgb5Cz+ZTlzflhQWsFbsni/cuO7nHpuRwJ+LWLFbMBvHhAO+DVTP7muy4jv+eqZ8bgLPSGBaq/Lywf3f0H/a3pH7CUspWTt/I0cyMqhRZeTQiqgV81ZZy5lbbeybeUO0tJTmfrRb4y+93N8bi9SGtXLU97/ldF/v0R8snku/Nr5G3hhyFvkZucjdUmDVhk8Me4+MpqF1gkkpSaaxzd8Swg3Ahi76cA6sPfBfCduA4dx2pSBrUFjEVygZT4Uf4zU9yKS/2c67zA8E8H1FeCFkrwO/zJk/oOIlPeNKdV4DekaB+5vjHscgxDxNyIUkywve4+ohhX2XlDrV6R7MugHEfaeYDut2uWsjxYRfzPS9zehBXc2sHU95kbgcJzwhsCvabz412y+Wb2SgK6TFhfHYz37ALD2QDZNaqQwqHnLiDvg8ihCoEYIsJbP6S9ha14uFqEQOIz/1BMI8Mb8v/hq8OUALN2zm0dm/MrW3FwUAec0bc5zZ/YnyX78GqPPGDvHVE1Ttags/X0lPS7qSmJqQtg9NqeN/teY6/6cfuGpjH3+h7DMIUURnH6h6QaGFbPWmGYraZrO3m3ZpvGCg3tyefCsp9ifmYMiBAF/gL5X9eS+j0agKAoNW2Wweu76sOcGfAHqNK6Nx+Xl/fs+Dwlq+9w+DmTl8NN707n6kcFhY+buy+Phc54LyQzaunIH9/V6krE7Rkc0WCFYmoJvHuHtFzVQ6yOEDZn4BBQ8hZGtIwG7of0ffyNQ4q8OF3/DPQmZ+ABCOXwRoiz+lLAqYnzgnYvU8xFKMkKoiPirIP4qs0dUGqHWQSQMq9JnVjfCfhoy6TEofBGjZ7LfMGA1Xj/eUwvjhDcEI2fNYOL6dXg0Y+e6t6iIu36Zgk1V8WoacVYrL/01hx8vvzoqoTZFCAY2b8G0TZvw64cWLruqRsz4yfO4sVssBPyHT8ncmGPILO3Kz+eaCeNxB08SmoTpWzax/sB+GiQnE9B1Lm7VhvNbtDyiIPfRoloij6WoCkIIHvvmXh4Z8BxaQMPn8eNMcNDw5AwuutO8YrpxmwZcev/5/PDa5NKCLavNwlWPDg7ZNWdu2sOPb0xh+5pdYW6kEjS/RlKaedbQ81e9QdamvSEL/azv5nFyt+YMGt6fwfcM4rcvZ4cs9Fa7hZanNsMRb2fRz8tMTyh+r5854+ebGoJfv5gVZuCkLnEXe1n08zJOv+BU07mWRcQNQbq/MRaSQzMDtTnCasRylLjBSEsTZPFnRs9e2xmI+GsQSjCg7V+L4Vop/3A7BHaALYpqdD1S0b8SdEslG7n93llI93cgPQjH+eC8ACGOXwOmo0EGthlGWCSC/awjroBW4i5HOi8yPmMlBaEeuRfiWHBCG4ICr5cf163Fp5f7Q4TS/HqX348nEODhGdMZG9yJH46n+/RjW14em3MOIjBqEzrWTef+08yPul3SMwjo0RVXNUsx/Mefr1iKTwtd7Py6zubcHDbn5gCwZPduJm9cx8fnX1ytAmYuv5+J69eyKCsTxzn1UX9LQNsT6o+WuiwN1LY5vSVfbBnFjLFz2J+ZQ7szTqbbeZ1QI5yYAG545krOGNyNOeMX
IISg9+Wn0eSUQ/GSNfM28PA5z+L3+tECuqlBsjqsdBvYydS9krsvj/ULN4ft9r0uLxNH/cyg4f1p0DKD56c8wmvDRrM/0zDI7fq0IXvHAW5oeRdSyogGyBWhGjp754HSPs1l8Xv9HMzKifh5lEVYGkDKp8j8R0ELVhvbe4e5dIStI8LW0fwhlpMNn3z51E7pBbWB6VvCsPcM9hYo97usJIBiGGxZ+CK4vqXk5CB9y4ymNKmfYyjT/zsw4iHPgWuccUFYgKcg5SOELfpaHwh2TLM2r/I5ViUntCHYX1xkWjlbHl1KFmVl4g0EovLDJ9ntTLj8albu28u2vFxa1kyrUKeobkIiN3ToxOcrlpfu8C2KgqbrIZ5dh8XCPd2NIqwV+/aiHSaQ7w74WZiVyfzMXZzeIHpV1vUH9vPrls1YFIWBzVvQuEbkfPVct5sLvv2KHLcbd8CPTVWRD7Yl/f31OHcUBbWB4Inv78cRd8hlVaNWMpfcc17UcwJo1qEJzTqYy0G8ccsHIe6mkgCxoirY42wEfBqnntOBh8bcbvp+T7EXRTU3lmXdNu37tOHzTe9QcLAQCdx08t0U5hRxuJyKSIa4bqNaptcDvsARyVAIW2dI+yVYoWtHKEeWaiwSbkZ6fiHUteMw+vgGd6lSSmMnX/QB6MXg6A2Jj6KoKcFn3GV0IpPFGIVdCmBDJD2LEAoysAtcX2O4p0pwQ2A1eGeBo3JB/OOC709wjaf0ewnGRGTuCKg97197wonECW0IFCGQUaWzAYgj2lULIWhftx7t60YnDPbg6WfQOT2Dr1Yup8jnY1DzlhT7fHy6/G9yPR6apqTyRK++dM0wmpWYCcqZ4fL7mb1je9SG4NV5c/h0+VL8moYiBKMWLeDhnr24tr35TvKthfPILmNQfZoGCgQe6MJNRRk4E5z0HNy10kVc0eBxedm1Psv0NavDyltzn6NGnRqk1I7s2qvbpDaJKQl4XaG7cIvNQs9y/YyFECSnJTFj7Bz83sBhjQBAk1PMP//c7ALT64qqkL3D6JsgpWTy6Ol8/cIE8vbl06h1fW557To6nRXaxlIIAaJyTXaEpRmkfo4seBoCa0HEg/NqROLdpffIvLtCq3U9P4FnOnqtuShqEk0NypcAACAASURBVEKth6z5FeQ9adQsqDUh4T5EyQLvW4hpqqd0Ib0zD933L0C6xhMeDwEIgG8p2I9PD+zq4oQ2BGayz2aoQtCzYUPT6tyqQgTTS89qElphe3vX7qZNUOKt0VU521SVFGd06a9r92fz6fKlpZ+LJiV+dF6Y+ydnN21G3YTwxXz6lk2mp6oDXjc9h/WhTkLl5bujxVqBto+qKCEupEgIIXjo8zt44oKX0PwBAn4Ne5yd5FqJpr59gANZOaZunfCHw3m3mPd/MAtog/E9lcQbvn1xAl//78fSE8/WlTt48oIXeXH647TtWXWd44StAyJtgunvmx7IiiDZ4IHC56DGy0htH+RcazShxwuaG/IfRirxCHtvUJJAKCYJTBaIIhj9j8KsmQ5g1EBE8TvxL+OfnX91lFgPE0S1KArxVht14hN48axDhU5+TePzFcs4/5svOe/rLxizfGnUO/TKYHYS6dO4ScQspLIoQhDQJW/M/4u/du2oUIhr2qaN+Ew0fYQQzNh2SGRu6Z7d3Dx5Av2//MxUjgOMQjx7FTWej4pK1Lvous5vX/zJnac9ys3t7mPt/A28Pe85LrzzXHpc1JVhLw7ho5Wvk1TT/DTTqlszrPbD75UsFpVNS7eZvnbGJd2xx4UbdV2XdB3YEb/PzzcvTAjLsvK6fXz2RKjonJRepG8Z0r8pKsE16d+AnncP+v4B6Ll3If2GdpHpydddQXNA75/G84pGgZ7PIdePDniQ+Y8hpQ723pjvLS0I55H1pjjeCOcFRgV3GDrYzDPZ/s2c0CeC+snJEfsnNUhKZsgp7WlcowZ9G5+ENbjoSikZNnkCS3ZnlVbbvjxvDr9t3cxXF192
zLpKXdn2FMasWMpBl6t0R25TVQSUzlXTJbrUeXex0Q8hbrmVjnXr8ekFg0vvKYsiBMLkAykrgPf71s3c9ctUvIEAEvPiflUIOtdLp4ajakvkc/flsXDqUhCC7ud1okYtw9Xj84ZXZZegaTq7t+wlOS0xLI//zVs+ZOa3c0sX2a83/8isb+fx7pKXSjuOVUS7Xq1peWoz1i/chNcdOeMr4NeY++NCrvy/8ErXlqc2xRHvCCuwa9Q6g7T0VA5kHYx4atix9pAMte6eDAVPAgKkBmo6pHwQUtkr/auRxV+CtsdIOXX/gOHL10HbjvT+CamfIMwWMrUCl5N0oe8/G7S9mFbK6gWg70GoGZA6Bpl7c7BeQRhjJ72IsFRtoWS14zgXPJMNd5d0AVZAhaSXEeLIClD/DZzQhqDA60URwjToWuD1MHP7VlrXqk2X9IxSyYXFu7P4e8/uEFkGTyDAin17KwzKegMBpm3ayF+7dpCRlMTlbU45oi5f5UmyO5h85TW8t2Qhv23dTKLNELK7oEUrlu/bg1/TueeXqRxwH+oZ6/L7WbpnN+PWrGJIu/BWgOe1aMnHy5aglXOZ6VLS76SmSCl5cuaMEJdaySenCIFDtSAE1IqP580B4dW+R8PPn85g1B2foAR7G7xz+0fc+9EI+g3phTPeQUbzemRu2B32Pr/Pzy0dHkALaPQc3J37Px6B3Wkna/MeZoydHeLa8Xn87N2ezZ/j5tH/mt6HnZMQgv/9/BgT357GL5/+gdftI2dPrmkKaXIt81PF/ElLwpr2AOxav5utK3dQv2U6IkK/igYtjVoI6V8H+Y8RUlSmbUPmXg9pvyOEgu6eAvmPUrrw+xcSavF1wI0seBaR9lP4YI5LoOAZzAXjvKBtN51jcDJGzAEQ1jZQaw74VxgBVlvHals4pdQNgyPiqnyDJoQKNd4H3zzDgCo1EM4LDWN3jJHSjXRNBv8SsDRGOC9FqJVrohWJE9oQFPv9WBUVTQvfxeR7vSzMymRhViafL1/K95ddRcd66SzZnWUaW3D7/SzZnWVqCIp8Pi4Z9zVZhQW4/EZmzcdLl/DR+RcfUTZPeWrGxfFEr7480StUK+bU9Pqs3Z9tWq3sDgQYv26NqSFoUTONu7qexlsL5wHGQiel5Nm+/cj3eFiTvY8ct1mADOIsVp7qcybpiUl0zagfJqF9NOzdns2oOz4J88e/cfP7dOjblrT0VO4ZPZzHznsBv9doAq+oCrqmowd0PAFjx//XhIUAPDr2btbO2xgUuAt9pqfYy9+/rohoCApyCvnh9SnM+2kxiakJDL5nEJc/eCGXP2hIh9zc7j52rssK2cU74u1cfNcg0+ctn7XaVGYaYM1f6zmpXSMuvf98vn91ckj9gt1p47qnrwBAur4mXMpZBz3XqOy1ngIFIwmtPo5whgqsN40RKIodPfl1yL+PI1MPtRpFUsohAT4hVKMTWclM9ELwTEVq2QhbJ7CdflRVwVJKZPEnwRaYLlBqIBPuQ4m7rNLPNEMIAfYeiCgroKsDqecgD14CWg5G8NpmFAemfoGwVl5XqzwntCFolFyDRLsNj6vioLEO3PDTDywfcSdpcXE4LBZc/tAFxGGxUis+1PXg9vvJLi5m/LrV7MzPK61NKIkn3Dd9GvNuuqVKF80SKtoBVfTaiC5dGdi8Bb9v3YJFUWhXpw4P//4ruwryUYQIq7kooVZ8PINPPny7x8ow+/v56BGkmMc8/i2tujWnw5lteWfB//j+1UnsWLOLA1k55OwNLXDyefzM/XEhhblFRgMbk8/BYlOp1cC8qKc4v5hbOz9E7t680paVm5ZuZfN927j+6SsBeH7KIzw84Dn27zqIalHxe/0MefwSupxt3n8gLSMVm8MaZuRUi1raZOfakZfjKfLw06hf8PsCJKTGc8fbN9Ghb7BAUduH+eIsQM+BwKYIr5vhiPj7oTgHIu09kK7vjE5ingoa44jEoNZQK0SNVyPepvvWQM6VlMQV
ZLECluZQczxCVK5CXhZ/CkXvUJrVox+EgufQsSNsHYzCLaX6stiOJbLwneDPv2QN8xkS43kPIWr9UmXjnNCGQBGCV/qfy4ipPxHQNAIVBNgKfD6KfT4GNm/J83Nmhb2uCsGg5i0AQ1b6xbmzGbt6BYoQuE2URQGK/D625ebQNNVcZOxI8QYCjF6yiB/WrUGLUB/htFi44jA9DRom1+DGjp2RUtL3i0/ILCiosGGP02JhROfDV8BWloBfQ5r4yX0eP398M4dZ4/5C6pJzbzqLBz+7HSEEQ5vcZvos1aqSv7+Ajme1JT7ZiafYE6JjpFpUBt5snsY45YPfyMsuCOlb7Cn28v0rkxh81yCSaiZSu2EtPlnzJpuXbSP/QCEtT21KYkrkzKn+1/bh6+d/DLkmhJH22nWgsWte9sdqJr//G1rwM/C5fLx/7xja9WptCNTZ+wZTM8ud1qTPUPyUbiNuEA3iMJWxIgFh64BUalVgCASixpug1EVUUCglpYTc6witK9AhsAFZ8AoiOXJXsgqfWfw+4amdbih4CIkd0JCOsxHJz/+jpJ4rhfdXTOMyWiZSO4hQq2ZtOaGzhgB6NWrMlKuuYWi7jmGpm+XxBPwk2GyMHXw59ZOScFqsxFmtpCcm8uXgy0pVSt9ZtICvV6/AEwjgimAEwAjmOixVU3gipeTaieP5cOlisgoL2FtcZNQCYCzUFkXBabHSo0EjLomyuc2yvXs44HKZGgFFCBJsNhwWC8M6dalUw5xoOe2CLqgmmv8Afm8Ar8uHz+Nn+piZLJjyNwBterQqjSeEzFsR1GlcC1VVeW3m0zRu0wC704YzwUGNWkmM/OHBEKG5siz+ZbmpP99is7ChjEy2EILmnU6iy9ntKzQCAGnpqTw35RFS69XAkeDAHmcjo0U6r816GpvdipSS14a9h9flLXU3+Tx+CnOL+Hzkd8Z4cReDmgGU9bU7IX4YQk0zKo8tzYEosrjkwYgZR9K/Cbm/NzL3FiiMIAsNYDkFYT+jQiNgPG8DSPM6Cjw/HH6upniN3semGHEQ8IHnN2Tew5Uc459EpFOTNEQHq4hqPREIIQYAb2H8hn4spXyx3OsNgc+BGsF7HpZSTqvqeZyUksqTvQ0/e9O3X4u4cCcGF/q2tevw53XD2JKbg5TQLDW19DgtpeTTZX+HBJPNUISgaUoKGUmVDxiXZcmeLNbszw6JX/h1nTirlUtatSY9KYmu6fXpULde1IGzHLcrotuqe0YDHunZi0Y1UkioZOe2w5G5cTcrZq0hLskZMWBaFk+xl2kf/c5p53fh2qcuY8GUJXiKPKVuJUecnZv+dzVWm/EHkt60Lh+ueI2szXvwunw0alO/QpmLWg3TEIoIU0LVNZ2UOkYGk6ZpfPviRCa+M43ifDdtTm/BrW/cELFnAkD73m34ZtcH7FibidVmIaP5oZ9Rzt48cvflh71HC+gsmrYUwAi21vzeiBV4poOShIi7JqTPgEh5D5l7Y1CCQjWXpAZjx2/yM5dSN96vZ0f8PgziIGX0Ye4Jou+P/Fqlu9LaQUmLYp5e8M5A6rkIpXJFeFJ6ka4Jxq5cSUHEDTFiHMeSuCtMJMhVsHVGKFWztkA1GgJhCIu8C/QHMoHFQohJUsq1ZW57HKOp/WghRGtgGtC4uuYEMLB5C6Zu2hh2/ZTadUIKyoQQNDNx6fh1neIKxOPirFYEgkS7jfcGXVA1kwZW7ttHwMQd5PL7sVss3NK56xE/s0PddNP6CKfFwoBmzWlT23znfLRIKXnrtg/57YvZCGHoFEVVuAV4PcZnn9GsHu8ufokvnx7HqjnrSMuoyVWPXMxp54enRpaXe47ExXeey5zv54ekiiqqQr2T6tC0fWMA3hrxIX98M7c0HXT5zDXc0/NxPlj+KvVOivx5KYpCk7bhiQOOeHvEVpXxyYdkJIQSj0i4GRJuNr1XqHWg5hRDllo/gCz+Dny/hd9o624+Qf+yCMZDgNrU0COyd0fEXR29
b9/awXi/2dZLqVw6qRACmfBAMJXWPAh/6GarET+ohCGQ0os8eAUEtmGcMgTS8xsy8X6U+OsqM/VKIeJvRPqXgXeeUayHACUNkRze1+NoqM4TQVdgs5RyK4AQ4lvgQqCsIZBAiVlLBsLzA6uY58/sz7oD+9mZn09A17EoCikOJ+8PCm8oY4ZNVWmQnMzO/PBdXKuaadzYsTO14xPo0aBhlaqCZiQmYVXUsIXbabHQMLlyLRPT4uK4pfOpfLz071INJLuqUi8xsdoCwwBzfljAjK/mmLphKsIRb+esq88o/bp+83o88tXdEe+XUjLzm7lMGj0dT7GXPlf04MI7BoS0vCxLi85Nuf+T23jr1g/RdYkW0DjplIaM/PFBhBDkZufz+1dzwnoW+Dx+xr3yE3ePHn5E3w9AfFIcXc7pwJLpy0ME7exxdi4yachTEUIICKqRyvxnzG/yrTG/LoswrxqRoNZDSf3giOYCoKiJ6PZzwVv+kC8guQLX0+GeG3cRUnEiC98EbXewSU4BpgYnWkG9ckjXhDJGgOCzPVD4KtI5+JgFo4WwIlLeR/rXg3+NUT9i61blvRiq0xBkALvKfJ0JlBfoeAr4VQhxJxAP9DN7kBBiODAcoGHDyqdjgpGfP33I9czeuZ11+/fTuEYK/U5qekTyEiN7n8nt0yaXumkEhmDcM3370SW9evKMz2xyEvE2K+6AP8Snb1EUEu12vl29klPTM0wD05qus7eoiES7PayXwb3de9Cudl0+X7mMfI+Hc5u1IMFmY8BXY8h2FdMstSaP9ux9VGmw5Zn64e+mPQ3MUC0qWkDDkeDg5O7NOWvIGYd/U5C3bvuQGV8d6p+wa0MWM7+ZyzsLX4hYUNb3yh6ccUk3tq/eRUJKPHUbH8rXztywG5vDGmYItIAWEkM4Uh4aczuPn/cCW1bswGJV8Xn99Bt6BuePOLvSz0TPNL8ud5umj2LtVE7mOohwHlWDeFHjNWRhBrg+B3yGSqnzCigciZ67A9S6kHAvitM8/Tbicx3nIByGGoAM7EQevChY+KWXzpuEeyqdmWQEaU1SqYUV/EuDVdTHDmFtBdZW1ff86mpVKYS4FBggpRwW/PoaoJuU8o4y99wXnMNrQojTgE+AtlJG7uBSVa0qD8fyvXuYtGE9Esn5LVrRqV5oo5M/tm3ludl/sK+4mPqJSTxzZn+6BQXjqov5u3YybPKE0viERQgcVitI0KShZDqoeUte6ndOqe9/2qYNPDlrBi6/YUD6NWnKS/3OIT6C33/M8qW8Mm9OWJ/jMRdeUiqId7Tce8YTrP5rfYX3WG0WajeqRd+relBwsJCu53bi1AEdUKI8Ze3Zuo9hbe8Nczk54u3cPXo4/YaaN8mpiAO7c7i26R1hhkBRFfpf04sHPjVXPo2W7Wt2sW97Nk07NCYt4+iyQfTsPqCbHLCV2ii1zdongl48FgpforQoTThBbYao+Y0hpXwUGOuMZhRo5d5BqFvHAUkjUeIqL0MhAzuQRW+DbxEotREJtyIcpvvKqNDz7gfPFMLL8OMRKZ9Glvv+B1NRq8rqzBrKAsqey+oHr5XlJmAcgJRyPkZaxHHv3PDyX7O5+sdxfLFyGV+sWMY1E77nf2VSSrfm5nD/rz+zt7gYdyBAZmEhd/08hT2FkbIZjh5vIMCt0yaFLNABKSny+Sjy+3AHAniC1c2TNhiL7N97snjwt1/IcbvxBAL4NI0Z27Zw1y9TTMfQdJ03F84LC4R7AgFemTenyr6XM4ecgT0ufKdmsVmIS3KSUCOec27oy6iFL3DdU1dw5zvD6DawU9RGAGD1X+tNexZ4ir0smb68UvNOS0/ltAu6YHOGLopWu5XLHozOtVgRjds0oNugzkdtBABIuJvQLCMAJyTcFfEtSvwQROqX4LwI7GciEkdWiREAw20lhAVZ+Arhvn0PFL0elX5SRLQd4F9pBKi1LYYmk4xOdNJ0vnFDCM/YESCSwGpeM/JvpjpdQ4uB5kKI
JhgG4Erg6nL37ATOAsYIIU7G+M2tINWg+tmcc5AxK5aFZOe4AwG+WrWCwSe3oVVaLR774zcKvJ7SvYI74MenBXh+zixGDTy/WuY1ddMGCryHd6e4A36+Xr2Ci1qdzAdLFodVSXs1jXm7drK3qJC6CYn4NI13Fs3n61UrKPb5I7bT3BTsnFYVDLixL398PYcty7fjLvJgtVlQLApPfHcf3QZF3/Rj7YKNfPjAF2xaupXkWklc/uCFJNSI589x83AVutG18IXFYrVELCgDw300+r7PWTlrDc5EB+ffNoCrH7m4tKXk/31xJx899CU/f/IHPrePxqc05K53h9Ho5CM7LXlcXhZOXYq70E2n/u2oXcGcKoMSdzE6ASh6IxgwTYX4u1HiKm6+JGztEbZqXOgC282v6wcx6g2OXI5C+paHnjKkC1xfIvX9RnBc2MDex7xvcgSErRMy8X4ofDWYpilBJCFSP/3H90quDNVmCKSUASHEHcB0jNTQT6WUa4QQzwBLpJSTgPuBj4QQ92Kcwa6X1eCryi4u4vu1q8kqKKBb/Qac26xFxJjAH9u2mhZr+TWNGdu20iy1Jot3Z5k2r5+5fWvY+6qKP3dsj/reksV/Z0G+aaqsVVXZV1xM3YREbp36E39u345+mL4NjSoZkDbDarPy0m9P8sVT45g/eQk1aiVxw3NX0rZH9JLLm5dv46F+z5TKMuzfdZD37vkMRVXQSrSATGKfqlVl4DDzgrIDWQe5s/ujuArcSCnxun2Me2kimRt28+hYIyhts1u5/a0bue3NG9ACGharhaK8Yia8PZVtq3fRvGMTzhrai7jEyIVMq+eu47FBLyCRSF2iBXSufPgirh0ZXYe8aFHiLoO4y5DSB1iPmWBihah1jd17eUQikXPmK0YWjcL0lOGZiPROx3B86FBjFMIefYxJib8O6RxsxASCJ4ET0QhANdcRBGsCppW79mSZf68FqlXIY8nuLK6Z8D1+TUMHxq9dzdsL5/PTlUNN8+MjGQghBPag+qcSoXm9Ral6WeZV2fuYu3M72UXRuZ0cFgsXtTQW1K7pGWzNzQlLO/VrOk1TUtmWm8Os7dsO27rHYbFwb/fQH9OWFdtZ/PMyHAkOel92Gil1ojcUPq+fh/o9zZYVO/AUedhtU3n47Od4/Lv76H5edCeCL54ah88dekKSukQrK5ERlE+12a2oFhWLzcL/fXFnxDTPCW//jM/tC3FReN0+5k5YSPbO/dRueKjbmBACi9VC1uY93HXao3jdPrwuHzPj7Xz5zPeMWvSi6S7f5/Xz+Pkv4ioMDUSOe2USJ7VvTFp6Cg1aZRCfdGQdyCqiIteODGwH7wxAAcc5CDU94r1VgtrY3BAodStvqDRzCXAgqIIa/GfeHVDrryM7GSiJxzwwfDw4oSUmpJQMnzyhVAMIDL/6trxcHp0xnR4NGtG4RgpdM+qX/hKeVr+BaSOWgK5zWn0jJfScps34dcvmkPtsqsrgk1tXOB+338/M7Vtx+f30bNjItBHM6ux9rMneR/2kZCasX8u0zRvxa5ppYl8JJYYpzmqlWWpNrj7FEKMa0aUrkzaup8jrK93xOy0Wbul8Kgk2G9+t3hrRCAiMxa5uQgKP9uxNz4aNyCosoIbdwSf3fs6vn88i4NNQrSof/99XPPrNPVE1YgeY/tlMNi/bVpqLH/BpBNB48Zq3+X7fx6UFYRWxZfn2qFoUOOIdDHlsMF3O7kCTUxoGhejMWb9oE36TnsQ2u5UdazNDDEEJb474kMLc4tJaAE+xF5/Hz+h7xjDyhwfC7l/+x2pTX7jX5eWZS1/FmeAg4NO47MHzue6pK6p1F68XvR8sVtIBAYWvIxMfMpQt9SKjbqCq1Tb9y8yva5uRMoAQlViSLCcHC+kO9wuhGL0VjjBD6b/ACW0ItuXmkhfBrz5l00ZmbNuKIgTpiUl8c8nlpDrjWLJnN6qJdLWKYMmeTNrUrs2zffuxOSeHXQX5pal4LWumcUOH
TvyxbSu14+NpU6t2yB/xoqxMbpo0obTZvSZ17ujandtPNQp8vIEAN0+eyN97jHi6LiU+TYuq0WaduHjibDb6NG7CLZ1PZfzaNazdn03rWrX5ZvAVfLR0MfMyd5LmjGN451M5v4WRhmapIPiakZjI79fehE1V+Wb1Srp89B5+TUPTdOLz9pHi9SM0I9ce4IUhbzFu78cRc/TL8sfYOWH6/GAY7o1LttLm9JaHfUaDlulk7zxw2PsURdCwVX2adTTvhVyWJqc0ZM28DYdcS0H8Xj/pzeqGzHPD4s3kZhewYtYa00rkRT8vNR2jfMZRWaQucRUYO9gfXp9CRrN6UcllVwbp3xQ0AuX+PgqfReI0BJEKNGTcdShJ4Qat8kQK4EqOTPX0ECLhDqR3DuatJcuPUdmK5hObE9oQZLuKK3y9JDtme14uj/3xG6MHXchBl8u0f4GG5IDL0P6v4XAy7eprWZSVyba8XFrUrMnPmzZyzldjsKkqmi5pkJzM5xddQu34hOAiPyGsIvm9xQs5rX5DOtVL570lC1m8OzPk9BIt+90uAsVFZBUWMGb5UmyqijsQwGmxEGe1MuGKIbyeFF6c1L9pM56ZPdPU2JzfohU2VeX3rZt5dvbMUGmLjqlofo1a47eXXlNUhWW/r+L0Cw9/KrCZdOwCYyG0OaLTTxn65GWsnru+woYxYFT0dhkQKsl9YHcOOXtyadAyHWfCIV/+4LsHMf2zmSGGwOaw0r5Pm9Lq5L3bs/m/s58ld28eiHA5ihJKgsvl6dC3jWk/g/J4ir2Me+Wn6jMEnulEXpTdhzbX7i+R9tOqTorZ3hc8vwBlPwMB1k6Vzk4S1laQ+gWy8CXwrwIRFyyQK2d0pQa26GME/yVOzMhHkBY1a1boUinBr+vM2LYVv6axsyAv4n078w69JoSgW/0GXNm2HVmFhXy9ehVeTaPQ58MV8LM55yC3T5sMwNxdO0zdGJ5AgO/XrgZg3JrVlTICQGkMwBMIoElZauDcgQC5Hg8jZ80wfV96YhIXt2qNWs79UMPhYHhQsuKdRQvCMo+kTaWwW210W+ivT7Rx/vOG98cRHx4YTEiJj2rnDtC2Ryue+P5+0pvWQVEEzgQH3c/vjM1hJS7JSVyik+S0RF6c/nhp8Zir0M1j5/2Pa5vdwYNnPc1ldYbx7UsTSp+Z3rQur8wYSbOOjVEUgdVupd/QXjw5/oHS7+/x815g79Z9uIs8uAvNd6BWuyVi4Vt8cjztelfsQiwhLzu8er3qiDInQ7oNWeoqQiT+X7B/cYkBdoBIRFRQaazrPvSDN6DvbYW+twX6vu7onlmhz7W1R6n5NUrdVYjaC4wOY6XKo4oxTuL9CDXcvRfjBD8RpDrj6Fa/AQsydx32Xl1KdCnJLop8isguNn/NEKErV20qJauz97G3qBCvSZ9gMP4U3cG+B9H2RLapKrXj4vFoAfI9HtN4Rll0KZmz0yQ4F+Tl/gPoULceY5Yvpcjno99JTbm72+kkOwwXT8TaCCnR4ywowZ7GAb/GytlrWT5rDb0u6U7bnq1M/dsBf4AeF3el34ze/DpmJkJRUFUFi03l2UkPH5FPvNvATnQb2Amfx4fFZkFRFNxFblbOXofdaeOUM04OiQm8fP0ols1Yhd8bwB8sNBv73A9kNKvHGZcYLrpWXZsz+u9X8Hn9WKxqSO3CjrWZ7Nu+37R3gqIqWO3GHBq2rs/NLw81nfOB3TmsnBVB5qHs8xRB+z7Vp/gqHOcgiz8idGceAXk4l8sRjKvWhbRfke5JEFgFluYI58UhjW3CODgoNMAscyBvOHrq9ygmqa5CCEh+BXwLjJOPcBrdxaqxMvffzgltCADeH3Qht02dxOLdmShCwa8ZvXjL/ikLoH2dutgtFlrXqsXcXeYL58m1zHcTkfL7VUWh0OujR4OG+E0avsRZrQxqbvjDzz6pGT+sXxOW4VMStC3pV/zewAvo3bgJGw8e4OLvxh7WEABYKkh5U4RgaLsO
DDXpaAbQvm49/ti2JWz/KAISa3EAq8OKpunoAY2J7/yM1HV++WQGfa7owX0fjShd2H//6k8+zyHhdgAAIABJREFUfngsOXtySa6VzDUjL+OD5a+ycvY6kmom0HVgp6j6CJthcxxyKTgTnHQbGK4QWZBTyKJpy0J6DYDhgvnu5YmlhqD0mSZzKc53oZgUqQHUb1GPwXcPolGbBrQ5vWVEg7Z23gYsNkuFInuqVcURZ+eG566MeM/RIqwtkAkjoGg0hjEQhLlSAHAinOeVfiWlBN8CoxWlmg6O/kes+S+UeET8VcBVh71X9601zzICoytb2kRjXtp+pOszQ5xNTUfE34Swn4awn3ZEc/uvcsIbgiS7na8GX0ZWYQH7i4tJttu56sdxFHp9uAN+HBYLNlXlxbMM3ZI7u57GJ8v+No0TTNu0kXFrV9Mtoz6P9Oxdqk7a/6SmfLZ8Wdhib1ctnJSSgqoojOx9Js/MnmkEXIMZPj0aNOSsk4weCfef3pM5O7eT5/Xg8vtxqBYsqsKzfc4is7CARJudQc1bUjPOSCtsHmWzG6uicl6LwwdfI3H/aT2Yt2tHSLWxTVW5rV0H0kY2R9d0xv7vBwLlmrnM+u4v+l/bm3a9WjPru794c8RHpTn/edn5fPjglwx/eSgX3FZ5HZsjoTCnCNWi4Dex2eU7nUWiWcfGpqcBm9PG2df1YdDw/od9Ro3ayaYuNEUR1G5UC0ecnVN6ncwVD11EnUbV68ZQEm5DOgaA5//ZO8/wKKo2DN9ntqZ3IEDoTXoTEFFRARVQUBAQ7KBi7+WzYu8FCypiwYKAgAoCIoh0lI50pNdACOnJtpnz/ZhkSbKzyZJkQ8t9XVyY2Z2Zs7iZd+YtzzMPhAmJGbLeRa8deIBQsLYFu95lI6UDefxW8GzVZaSFHTJfhbgJCHPJXh9lxrXM/2v5w2lSPYI8ds2JuoBnC9K5BBn5Mkpo+Se+zwXO+kBQQK2ISK+Z/J833c4v27awPvkwTeLiGdi8BdF2/a4mzGrll8HDuPXXaaTmG8MX3I0XFJ8X7NnNykMHmT3sFmpFRHJXh078tn0bxx26lINJCCwmE69f3tOrQDqkZWvaJ9Zk6pZNZLuc9GrQmIvq1vNqAsWHhjL3ptv4ddsW1iYfokF0LAObt/Re+IuT4XT49UQQgM1sRhGC+tExPFvM8/hkiAsJ9RafC9CkpFPrJnTpncTvX83HbDYVCQSgB4OXB72HxaYPXBX24wW9XfLbF38qdyBIO5LOjrW7ia8VS/1WRaWNVVVlxqd/MOPTP3DmOr0OYIVRTArtewTm/WoLsXHnWzfy8f1fFplajogN5+q7rwjoGC27NSMiJhxHjrNIodlis/Dy9Keo16JsapmBINVD+vCVcykosYiwEWDvjQjXVVMFIG0XI/OmgpaOsF0Gtkt0D2JA5ozTFTALOm9kDpCLTH8EEf9rcBZtMX5SBfThNEBmf5pvVlPwHSxQCn0ZGdIbUYEGLmcrQROdCxaVJToH+kRyam4u106e4JPDtygKQ1u14YVLLgP09NCkjf+yeN9eakVGckubdjSLL/mOzuFx88fOHRzKyqJ19RpcUDsp4Bz5T5s28OSffxi+Zsl/AmkUG8f5NWuVqxf9vlkzmLXD178hxm5n1R338OcPi/nwni/8GrSXhBAwy/Gj3w6bkpBS8vmj45n+6R9YbGZUj0adZrV4bfbTRCfoJjKv3vA+y2es9gYhk9mEqqrevKDJbCIkws5na94O+O77f1e9wrr5m/C4C0lGh1j5YMkrJRa6NU3zGtMIRfB8vzc5svcYJpOCUASPfDGSiwcGnsaQUtMN4XMnAW6w90OEDvTbeaPfNffNv2su+C7bwXoByHTAjAi9HuzX+J2e1VIuB9Wo3mZFJPwVtEKsdrSL7s1cnOgvUOyX+F+XCEXETUGYGwVlXWcaJYnOnTNPBCdDptPJO8sWM2P7Njyaavgo79Y0Zu/Yzv6MDNrUqMHQlm24o8P5
3BGgt++utOMMnjIRR75YnM1spll8At9fOzAge8ukqCi/r4VbrQxtdaKIpknJsdwcwq02Qi0nd3c0b7exvHKaw8HBrCy69O3AByNPXqseIDYxpkxBAGDe94uY+cU83E63tzd/1797efWGD3h73gvs23qQZdNXFfE8UD0qVruFWo0TkZqkdfcWDH6iX8A6P0f2pvDvws1FggDo08KT3v6VZyY8ZLjf+oWbeO2GD8jN1v2TE5LiGDX1MRCCvKw8GratF9AQXWFkxpO6VHJBIde9BemYBbHfGl7IZc6X+TLNhW9oHOD668R7MjeCYwEi5gM/Jy3ppjGIN5TxsyD1hkITxFaIeArFnt9aq8QZBwLpAVE2d7JzjapAUAxV0xg0ZSJ70tNK7eQ5lpPD/JxdLN2/l6/WruHnwUOpFx3YF++h32dyPC/P++uT63az6egRPl+9kgc7dzXcx6WqfLNuDZM3b0DVJBZFMSwWj2h/IujP3bmD5xbMI8Ohi+Rd1agJr17WM+CAYOSKVkCOy0nt+ASenfgIrwx5D0VRUD1qQG5jtlArt79WXIMwcKZ98JuPp4HqUdm0dBtpRzPYtmKHoaexy+Gmfuu6/O87/yqc/ji67xgWm8Xn80lNcvC/w4b7HDt0nGf7vl5krQe3H+LRS0cxYd9nZSqQS/c23bKyiL6OAzwbwbUIbN19d3L9g3ExuPCB88D5F9K9CWExMCYK6Qc54yg6lCXAXE+fRg4SihILCXPQNBeQi1Ksw0iEDUemP0HRgTILWDtVmLn72c5ZPUdQFhbu3cPBzIyA2jkLLpH6/ICTVxcvDOgcx3Jz2X481eceyqmqTNls3FoopWTE9J/54J9l7EpLY29GOkjpMyfRq0Ej7u6o+/+sSz7Mg3NmcjQnB6eq4lJVft+xnYfnzAxonQDVQsMMtwsgKV+IrkvfDny6+i26XdeZDr3aYPZjRF+Y869sV65hqazjxq28ikmQk5FLfO1YjDJiFpu5REvJkqjborbhZLDZaqL1xcaCeXO/XeA1pS9ASnA73F5P4pPG9Q+Gd+AyF+n0U1w11cbYgaw4Hl3T3wARdgeYG+sDWwCE6oqcUe8FcNzyoyhWnyAAeiss4SPRZxLCAZs+oBb9fqWs62zgnH0i8Ggax/NyibaHFBGa23IshTy3cRG2YPDKqKNIk5IlJfTrF0aW8Bjt77W1yYdZffhQ0QlfKQmzWBjRrgMJYeH0atiY+ELF5U9XrcBpIEO9cO8ejmRnUz28dPGt5y65lAdm/1bkMwv0yeOCp4q/Ji3lndvH6PIZmoaqaigmxecCWJgl0/7hiye/5863bip1DUbUbpLIkb2+iuWqW6Vmw+rUbFid6IQonLmuIuswmf2rj5ZGZGwE/e67iulj5njrDooisIfZGfiIsfz4sYPHDZ+QVI+qTyfnM+/7RfzwylRSDx2nYdt63PHWTTTv0sR4IUosCLOBAbwFXOvR0u4G68WI0P7e1k4RNgLpXELpMgwWvx6/QgmFuCngWoR0/asL1NmvQijGNwuViRJ+NzL0JvBsByUBYQ5e0f1s5Jx8Ivhm3Ro6jB1D9/Ff0u7zT3hn2RKvmmjdqChCDPLWYRYL93Xqwie9r8biR2XUaD8jEkLDaBAd43N/ZjOZuLap8dTpuuTDhrMIOW43uW4PQ1u1KRIEAPZmpBuGFavJRHIxNVOZP1AHsGz/PoZOnUy3r8fy2/atDG/XAatiQhECBejTuAlv9tC7ZDJTs3jntjG48lw481y4nZ78bhiJ2WIynCAuYOr7xgY5geCv5VMCGceyUBSFd/4aRbPOjbDYzNhCrFSrE89rs54pl/b/HW/eyL2jb6POebWIrhbFJYO7MmbVm37NZNp2b0lIuLH+UosL9QGnaR/O5IORYzmw/RB52Q42LtnKEz1eYtvKHcaLsPdAV3Yvjhs863U10aw3kMf6IzXdkF5Y20HU6/k58xDAguGvv1DA5r8NVggFYeuOEvGAXpw+DYJAAUIJ
R1jbVwWBMnDOPRFM27LJx4rx63WrsSgKD3bpSs8GjXh18QKvXAPoQ1cOj4exq1diVhRUTfURprOZTAxu0SrgdXxwZR8GT5mIS9XI87gJs1ioFx3D3eefsHXOcbmYsGE9v+/8j1y322++PsRs/L+xY2Itdh1PxVPsCcatajSIiQX0yebXlixk6pZNOD0e6kXHcDAr05saK5gstppMaJrEpphYsHcPO9OO0zyhGv/MXGPoBCYl9L6jBwMe7sstje83XJ+majjznNhCTl6HPi/LuEvJYjWTk5FLTLUoqiXFM3rJq6QdSceZ56J63QTdhP5IOmlHMqjVuIbhuXeu38OaeRuIiAmja//zWTFrLb9/NR9N0+h1y6X0uqU7Vw0P7Kmia7/zSWpWiz2b9nsL1/YwG136dqBB67p43B6+fWGyYXvtV8/+yJtznvM5phB2XVsnbWS+abvILwQXFm7LA/UQMnc8Ily30FRCeiPtV4B6EJRIcG9Fpj+Ibk1Jvg3jmKBf3KVnPzLnK91RzNIYETai1M4e6VyGzBmvdw/ZL0OE3lhpBvLnAudcIPhoxd8+/fd5Hg9frl3N/Z0vwGY2M2XQUJ6aN8crTWE1mXB4PEX2U6TEajJhNZlwqxoXJtXhIT9FXiOaxMWz+LY7mfnfNg5lZtK6Rg26163vnTvIc7u5dtIP7M/MxKmWbLnnz1Xs7o6dmLF9CznuE4b3IWYzw9t1JCLfxP6u334tIna3Oz2tyDEKQkjB605NxelSeXjOLObceCuapvlNZ5mtZmo2rFFimqjwVPDJ0Llve2Z+PtdHwM1kNvH1MxMIiQjhytsvo3H7+mz5+z8cuU4kks8eHs/K39dhsZnRVI1bXhzsTetIKXl3+BgWTF6G6tEwW0y8f9fnuqF8fnrnv9W7WDx1Oa/M+J+3LVdKydYVOziw/RD1WiTRuH2DIut5b+GL/Prx7/z5w2LMVjNXj+xFz1v0+kj60Qw8BtLXALvW7/H7+YWlOSQs1Ien3NshcxSQW+xdTnDMhvATXspCmMBcR//B1gWqLQPPJsAM5mZBN16R7u3I44Pz01oe8GxG5s2G2C8RVsPORrScryBrNN60VvZWZN4UiPv1pLwFqvDPORcIjuRkG27P87hxejyEWCzUiojku2uvx+nxsPVYCkOn/eRzqdOAdjUSual1O5rGxdEwwEnfwoRbrX6fIqZu2cTBrNKDAICRAKaUklqRkfwy5EbeXbaEvw/uJzYklDvbd2TAeXpHyM7jqaw6fLBMYnd7M9JJycmhU+/2aPd84fO61W6l+2BdsbL74K7Mn+BrmN6+Z+siMw471u5m4U/LEIpC90FdadC6rs8+BQx7ZgCLp/5Ddlo2Lodb7xCSEpfDzaIpfyOEyD+nxGKzIKXEkeNEUQSqR/MWfce/MInEBtW5sH8nlvy8goU/LS/kk6D/27sKBTFHjpN/F27m34WbadO9BTkZOTzZ6xX2bt6PEAJNkzQ9vyGvznwae74vsy3ExqDH+zHIwNc4Ms7/XW2N+iUXtYUQYGkOIgzpT8JZlHzXLIQJLIEN1FUEMuv1/EG0AlQgD5n5AiLet4lBatmQ9T5FO5WcoB5F5k7wDsOVaS3OJfowmnoIrB0Q4fchzPXKfLwzmXMuEDSLi2fdkWSf7SFmCzf+/BNxIaHc2rY9XZPqYDObcaoqJsW428KtqvRu7KegV07+3L3T7+RwYULMFno0ODHeP23LJt5ZvoTk7GwSwyN4rGu3Ij7KblVl8uaN/LxlE5lOZ5kNw6WUmBRBTLUo7vt4OB/f9yWaJtFUDYvVTN+RPTmvc2MAnhh/H2lHMlg7f4P3EaNZ58bc+tIQprw3g8i4CPZs2s/0T37XL9BCMO3937jhf9cy7NmBhuePqR7NFxve5bfP/mDNvA2YzCY2Lt3qTb9IKb0X+8L6QmqxqOnIcTLxjZ+5sH8n/vjmL5+WVCMcOU7WLdhIm+4t+OTBr9m1fk8RQ5stf//H2Ce+5eq7
ehGVEElsDf8txVa7lWvuvYLpY/4okh6yhVq5+YXrS10LgDDXRZrrguc/imr6hyBCy1aMDxqu1cbbPTuR0okQxVJ17o2657BPYdwJzr+gjIFAy50KmS/hfcpwHEY6/4S4aQhzYAq4ZxNBDQRCiCuB0eiVrXFSyjeKvf4+UKB/EApUk1JWnDmuAU91u4Rbf51apPtGAE7Vw9pkvRd86f69PNzlQka070jLatUNPYztJhO9GjYO2jqrhYX7tcQsINRs4aK6dflt+1bGrVlFhM3GjG1bcBTk97OzeGb+XACubdYcTUqGT/+Z1YcPBhRkSiIpKorYEL04fdXtl9P20pYsnLwct9PNBdd0pFHbE79MJpOJt+Y+T9qRdA5sP0z1egmMe+p7Hr/8RVSPismkFPMV0P2CJ7z+M5cMvpDajRMN1xAZG8HQpwcw9OkBvDt8TJHhsZMh9bCeDlP9qMQWxxpiJSouEiklf01c6pPacTvdzBjzB/O+W4zH5aHtZS15ZsKDhEUVzb3n5ThwO9wMf30YZquZXz6cjcvpJjo+kjvfvZnzr2wX8GcQ0Z8ij9+UbwKfXysIHazLMZ9OKBGgGdV3LPl/ir8/BmPfBAFK2Yr+Unog63WKdlBpuuR29ofnZNtp0AKB0AVKPgF6AgeAlUKI6fk+xQBIKR8u9P77gcC/+WWkU63afH/t9byzbAlbU1Owm82k5uYWGczK83h4d/lSBrVoRaTNxjMXdfcWkCW6h29ieESR6d2K5sbWbZmxfWuRgKUIQUJoGJfVa4AqNRLDw/l8zSqvkJ3At7vc4fHw7vIl9G96Hkv27WVN8qESg4BAdy4TgNlkwuF2GyYdMhwONCm9WkmJ9asz5Mn+JX6mmOrRxFSPZv6ExSyfvsp7B+xvNZqqsXz6Kq5/1Lg1szBh0WGltqwaoZgU2l6qyz23urg5K39fV+o+UpPMGf8XUz/4zW9+H/D6Faybv4FXh47mtZlPA3qn1du3j2HVHP1cifWr8dhX93DLi4NxZDsIjQwtgyyIlp9ycaJ/CxRQDWQZTjWhN0H2GIoOw9kgZKBxfcLcRJ+B8Oyi6FS0HRF2S9nWoCaDNBqu08BVOfI1pxsBBQIhxGrgK2CClDKttPfn0wnYIaXclX+MiUA/YLOf998AvBDgsctF+8SaTBgwCIAhUyeRnO1bN7CaFDYcTebCpLoMbdWGZvEJjF+/hqM5OfRs0IjBLVoRZi1boTMQWlWrzsuX9uCFBX+iCIGqadSMiOTLa66lTlQ0blXl/C8+LRIo/D07HMrKotFH72H2M4ks0AviqpR0qZ3ECxdfSkJYOMdycxgydZLXma0wOW43R7KzSYwwzkE785ws/WUlacnpNO/alGadGnkvbrO/nB9QCkZRREDDaQBX3HYpv332R6mOZT5IyfWPXQPAvq0H/b7NFmpDMQk0VUNqkh1rSjBML4bb6WHd/I2kHk4jtkY0T/Z6mT0b93kL3fu3HeLJK17hy43vGfoiB/QxUgfnawYVoIFzOjK3LSLM2BvhVCDC7kCq+yHvVxA2kC6wdUdEPmX8fiEgZhwy7U7w7ANh0p3GIp7yW1wuFSUavz4MQZyQPp0J9IlgMHAb+l39KuBr4A9ZcoK5FlBYAOQA0NnojUKIukB9YH6A66kwEkJDDe+kPZokxn5CZ719Yk3aJ9as1LUNOK8FfRo3YePRo0TYbDSJjfNeTLccS0H10y1khAS/3gUSePLCi7m1bVEd/0ibjWh7iGEg0KQk3E8g3L1xH49d+gJulwe304PZYqJN9xa8+PMTJ4TfAuSiAYZfGR/qt6zD3R/cypgHv8FsNeFxqwGlikwWE4umLKd+yzqk+PFADo0M4daXBlO3RR2eu/p1wwGxgqcRIYwlecxWM+lHMzh2IJUD2w/5dDt5XB6mfzqHEa/rF+2czFzmfP0XGxZvIalpTfqO7OV3/kFzbwGZavwBsz+FcgQCKSW41+laPpbm5RZwE8KEiHoVGf6wrh1k
StLNakrax5SIiJ+B9OwALR3MzfXhtrKuQQlH2q/Kt8wsfEMSgggbWer++myGPKvaVwPqFZNS7pBSPgM0ASagPx3sFUK8KISIrYB1DAGmSCkNrxBCiDuFEKuEEKtSUnynScvDLW3aYy/Wh28SgloREZxXinpoaTg9HlYcPMC65MMl5vpLwm620LFmLZrGxRdJF4RZLOXO8xfG3/ruaNfBZ07BajJxab363hbUwkgpeWngO2SmZpOX5cDj8ujF1b828tvnulpqr5u7eztqimMLtWEPtWG1W3jg0zv8DmoZ0eeOnkw6NJbHv76X/vdd6XeQqzBup4c/f1gMQMdebbCG+AY3j8vDpTd0w2ozY7EZ3zvF1ojm0iEXct4FTYq4ohUgpSSpaU2S96QUcT0rfI79Ww8BcDw5jeHNH+KrZ35kybR/mPLuDIa3eIhN859HS+mJduxaZN7PJwr9nv/8f0BZdrtLqaUhU/sh025DZr6APHYdWtpdSFm2WkxhhCkeYT2/1CBQZB9zI4S1Y7mCgPdYUS+DvSdg1SUzRJhuZWn3P0wn1UNoqcOQRzshj3ZGOzYQ6dlV7rWcDgRcIxBCtEZ/KugNTAV+ALqh38UbiYYfBAqP+NXO32bEEOBeP68hpRwLjAVdhjrQNQdCx5q1ePqi7ry2eAFmRcGjaSRFRvFlv+vKJd88d+cOHp07G9AvAmFWK+OuvpaW1cqmc1Mcm8nst+PHoiioUmISIiAHM5vJ5FeEbmDzluw4fpxv/12L1WTCpWp0SKzJWz2MfQQO7Uwm5YDv3akz18V3L/7E/AlLqNsiicYd6vPf2j04sh1Y7RYUReHRr+4hJz0HIQQXXNORmOon3zcQHh1Gt2s706JrU375aHZA+7idHqa+/xvV6iYQERtGZorm7QKyhVpp36M1rw0bjSPb6W0tLYwQcF6XJjw94SFSDqRyV9vHyM3M8xaf7aE2Rrw+FKvdSsO29fAYFKVtIVZadtMnjb9+9kfSj2Z693e7PLhdHt4duY5xi3QZE5kxClwbEFHPg6WD/w9nKvtTrMx4Gjw7KFLFcS5DZn+OiDAeEjxTEMKOiH4PqWXoBXZTbb8S3gBSuvT0m5aCtzPLswGZOgQS5p/x8wwB+RHk1wjSgS+BqVKe6OUSQkyTUl5nsI8Z2A5cjh4AVgJDpZSbir2vGfA7UL+UVBMQPD+CXLebjUePEG230yTu5LsRth5L4c0li1idfIhom53DWZk+WchIq5V/RtyNzc8k8MmwPvkww37+iVy3b5qiaVw8s4fdwiNzZjFj+1ZDbaTC2Ewmltx2p18THIC0vDy2px4jMSKCOlH+L9D7tx3kno5PllgDMJkVLDYLt782jOPJacQkRHHp0G7EVPMvrV0Wprw3g2+en4jb4UbTJEIRRcxgQL+ImyxmvThuMxMWFcYFV3dg7fyNRMaGgxDsWr/H+3mEIkBSJAjbQq28Ne8FrzZQyoFUfnx9GmvmbSC+ViyDn+hXpAPo5cHv8c9vq731DJNZITIugq+2jCY8OoyB1YeTkZLp83nMVo0JqzcTFVfwzbIiEuYhTDXQjl2bPxhW5NNB9NgTcs0ngZQO5JH2GJbylWoo1XznQs5mpGMOMuOpYjMQAKGIyKcRoYNOybpOhorwI7i+oOhb6KD1pZS7jYIAgJTSI4S4D5iD3j76lZRykxDiJWCVlHJ6/luHABMDCQLBJNRioVOt2mXad1facfpP/B5X/t13tsv40Tnb7eavPbu5slH5204bxcYZpnMsikK3Ovog1t0dOzPrv20+gnGgO5gJ9JTQWz2uLDEIAMSEhNC5dukaLrWb1CQyLqLEQKB6NFSPkz++mc+nq98u9ZhlZeAjV5PYsAY/vj4NR46TTr3bMXf8Apx5LjwuFU1V0VTp7fxxuzw4c10c3nWEr7eMZvvqnTxyyQtF+vtlfkAxm02YbRbMFhP3fzKiiEBcQu04HvjkDr/revqHB/np3RnM+GwOjhwnXfp04LZX
byA8Wm8vtYfaMEzoSLDYCv0/F1ZdpsFUAxH7HTLtHnCvzH/RAhHPlCkI6OcqIe3o09N/ZiK1THDMQHoO6FpMtsvQ718NUA/4+dy5SHVfQLqupzOBBoIpQHFH8ClACc+kIKWcBcwqtu35Yj+PCnANFcahrEy+WruadUeSaRIbx4j2Hb3aO2Vh1II/vUGgJDQp2Xg0uUICQZjVykOduzL6n2XeWoFZUYiw2bizvW6Ok+fRh7MKowjBRXXqcln9hlgUhR4NGpUaBE4GIQTPTHyYp654Gc2jldjFs3PdXn2OwCCnXhH88e0CRt/9BZpHRfVoJO8+yiWDLqBz7/akHkpj/AuTyMkoWgjXVI21f27E5XCxbv5GHxMa0INB/4evoveIHtRsWOOk128ymxjyZH+/7bZ9R/bi+5d/KpKGMps12l2cTWh44e+ZBoqeahRKOCLuW6R2HLQ0MNUpl0WjUMKR5obg2VZ89WAru/Xp6YJ0b9bnLqQHyEPmhYKpDsT+aKy1ZG6eP9hW7PsgQhGWlpWy5mBSYiDIT9u0AKKEEIXv/COB0itxpwlSSlYeOsjRnGyi7XbumTUDp8eDW9NYn3yYX7dtYXz/gXSsWatMx1916FDA7y1emC4Pd3Y4nwYxMYxds4qUnBwurluPezp2JiFM/yJ/tmoF7mIdOqqULD+wnzd7XOl9X0XTvEsTvt81hvkTlnDsYCq/jpmDw8DK0mw1G5rHBIoj18m87xaxfsFGqtdLoO9dvahRT2//y0rLZvTIL3A5TlxMnblOFv20nMuHXczFAy/g2xcn+zmyRNMkkXERWKxm1GIdPla7hep1qpHUtGzfl9K4/tGr2b56Z76gnwmpqSTWyeSxD/YVepcJlEQfeQihxOoy1QEitex8g3gT2Lp6ZasBRNTr+RdLN7owXQgoYYiIR8v1+U4HZPoj+T7HBRtywbMLmTMWEfGw7w7WLmBqlB8YC54MLKDUAFvZZM1PJ0q7KjUF+gLRQOGpnizA/7PvacSF1AToAAAgAElEQVThrCyG/TyZozk5CPRhscIpFVVK8jwenpk/lzk33lq2kwT4XGgSgjY1jKdky0qPBo3o0aBoS19ydhZ70tPZfvyY4WyBBAb+9CNhVitDWrRiaKs2mA06WcpDZFwE/e/Xp1oX/rScZINAEBJhL3NBPistm3vPf4q0I+k4cpyYrSZ+/eh3Xp7xFG0vbcmqOesxWZSic0vo8hALJi6h/eWtqNciiY1LtvocO7paFPZQGxcN6MyYh7/2eV0xKXQfErjA4MliMpt4fvKj7N92kB1r91C9XgLN2h6EzKf1C5ZUwdISET26XA0NWt5MyPif7m2gb4HoDxG2iwH0O934ObovsrpDN3sJGXDGF0alekRP9fjggrzpYBAIhBC64mvOJ5D3i/7/wN4HEfFguZ68ThdKDARSyl+BX4UQF0gpl1fSmiqU+2bPYH9GRqkF051px3F43AH5BRenY2Itluwv3ZRGk5J2NYI3i+BSVR6fO5s/du7AajL5rVW4VJX9mXoW+s2li1i8bw9fXH1t0NZl1EUEkHksq8ypoYlv/ELKgVRvft/jUvG4VN685SMm7P3MUBob9F9oU/6QWsp+47mBzOPZqB6VsKgwnpnwEC8OfMerV2S2mHjsq3uITqjYwrYRSU1rFXrqaIK0LQF1ry4XbSpf95lUD0LGU4CzyBCNTLsfqi1E5DuBCVO1M75DyJcSvm/C/2tCCUVEPA4RjwdhTaeWEm8DhRBP5P/nUCHEh8X/VML6ysWR7Gw2pxwtNQiAnl/3ZzhTGqO6X4bNVPq+ihAs2BO8vuP3li9l7q6d+daZroDsxPM8Hpbt38e/BkJ8FYXZz4VeMSllTg0tnvq3obxD1vFsDu86Qscr2qKpvv8C1hArPW7UC6iZqcZKtJpHw5HrRPWofPrIeNRCshWapjH28e9wGVhWVjSappGVlu1tIxXChDA38AYBKR1Ix19Ixx9eA5pAkXm/gZF4iAAc
c8u58tMbYYrXLTd9Ln92CDEWOTzbKe23cEv+36uA1QZ/TmvyPO6AsjY2k4lrmzX3egGcLA1iYvl58DC6161PhNXq1yhGlZJ9GWUf8CmNCRvXF5GcCBRVSlYfDrzOcdL4SV+UI6uBzc9AmqZJbKE2QiNCeG7Sw9hCrNjD9AE1q93CgIf70KJrU0BXQDUirlYMoREhrJi9luPJaWieQoFAlWQdz2bpz8a+vhXF3O8WMqTWnQxKvIP+sbfy1bMTikxkS+cy5NELkBmPIjOeQh69AC13eglHLIbMwbA1VKr5JjdnNyL6A13QToQBFhAhYGmLCLv9VC/tlFBaamhG/t/jK2c5FUudqGgfd67CRFituFSNrklJPH9x+TohmsUn8FU/vZ5++6/TWLDXWItGKc/VrxSMZgoCwaIoVA9S4RjwKwSnqbpsdVlSQ/3uvZLPHh1fpLVTMSk0aluPuERd9rlznw5M2P8ZS39egSPHyflXteXvGau5PnEEWalZJDaogdVuwe3yeOcLbKFW7vtwOEII9m4+YChTkZftYM+m/T7bK4q/f1vN6LvHFvFFmPbBLKQqGf76MKSWhUy/G2Qx/+HMZ5DWtogC45kSELbuuuOXj4exAraLKuaDnMYIc13d2Mc5H9TDetHd0r5cNZczmdK6hmbgX8sMKeU1Fb6iCuRYbo6hhDRA9bAw3ul1FXUio0mKqth8b4uEaizcu9vnH86iKNSNDp7Kdutq1Q29FkpCF5wzc3n9hqW+t6y0uLAp6+Zv9NneoE3dMreO9r7jcjYt38aiycv0YwiIqRbFs5MeKfK+yNgIr63k18/+yNQPZnqDx4Hth7DaLbS9tCXJu49Qq3Eiw54dSMt8L+E6zWphDbH62GKGhNupe15wOoYAvh01yWeC2Znr5JePZ3Pzi4Mwq/NAGl2wVGTedETEfaWfxNIO7FeA84/8JwAB2CF0CMLcoLS9y4XUMpF5U/UZCHMTRMgghOnkjZ3KixBWsBtPyJ9rlNY19E6lrCJIHMjM9BvF0hwOLkzy74BVHga1aMW4tat8nL9CLBYuqxe8X7IXL+3BDVMn4fJ48EiJWQjMJhOtEqqzNTWFmJAQrmjYmN+2bSXd6UACtSIiGdP7mgqZdvbHPe/fyoPdnsPlcKG6VRSTomsJfTKizMdUFIUnv7mPG58dwLYVO4irFUuri84z1PEBvdW0cBAowOVwY7Ga+XbHJz77dO7TnuiEKFx5bm+e3mRWCIsOo9uALmVee2kk7zXW09JUjey0HKLDczDM76OCDKxWIISAqDfB1QeZNwOECWG/FmEL3ucCXa9Hpg4ALQe9pWseMmccxE5EWILn71FFyZSWGlpYWQsJBpEGomgFBFLcLStJUVF8cEUfHps7GyEEUkpCLVbGXXNtUC+4rapVZ+YNN/PFmpVsTjlKi2rVGdGuo89TyFMXXszu9DTMilKiXERFUb9VXcauf4cp781g28qdNGhdh+sfvYbaTcrfQVWrUSK1GpXeknvs4HEUP05zuzfsM9xuMpsYvexVPr7/S5b9uhKpSTr3bc/9Hw3Hagtey2DDNvUMn6CsIVYi4yNAXgxZb/ruKEIQ9sB72oUQYLsEYSvj9HEZkJmv6wNv3kDmBOlCZj6LiJtUaeuooiilpYYmSykHCSE2UDRFJAAppaw8s9MyUC86hjCLhRyD3PkVhdzFPJrGHzv/48/du4gPCWVwy1blmjQGuKJRY7rXq8/a5MPYTCba1EgMan2ggLrR0bxymX8FRdAvAOX9fCdLjXrVuO/D4ZV6zsLE1YzxW6uo28JXWiQvOw+z1UxMtSiem/SIV1uoMnLIt786lMcvH1UkPWQLtXHbKzdgMpmAOsjQWyD3O/S7aqkraFq7g6WMGv2VhWsRvk8zEtzrkdJVovBbFcGjRNE5IUSilPJwvl+AD1LK0pvnK5iTFZ2buX0bj86djSs/TaMIQbTNxqxht1AtLByXqjJs2mS2HEsh1+32plPe6nEFfZs0
C9bHqOIUMPaJ75g+Zo6PN3BhwbhNy7bx/p2fcWD7IRRF4ZLBXXngkxGEhIf4O2xQ2Lx8G+Oe+oGd6/cQXyuWm18YxCWDig6xSdcKPdcu3Qj71frdvZHL12mEduR8P9LYFkT19f61fqooNyWJzgWkPpp/kBrormMSWCmlDF7jeQmURX10/ZFkxq1ZxYHMDC5MqsOtbTsQn6+v8+OG9byyeIGPtn+oxcKqO+4u04BZFSdPekoGnz78jd6WKQQXD+zCyHdvITKuZPMPj9vDkb0pRMVHekXb/KFpGhPf+Jkp784gKy2HOs1rc+8Ht9G+h/5ge+C/w9zd/vEignkWm4WW3Zrx1tzn/R22ipNAy3wDcn+gqCGMBexXoES/d6qWdU5Q7kAghBgBPI/uPSCAS4CXpJRfVeRCA6GiZaiHTZvM8gO+rYDhViuf9+nHBUmlt+JVUT7cLje3n/cQKftTvUVZs8VEjfrVGLfxfb+dRTO/mMsXT3yPmi8q1+26zjzyxUi/pjeFkVL6pHk+um8cM8fO8zGxt4VY+Wzt2xVS0zjXkdKBTLsLXOtODJKY6iFixyOU4E9rn8tUhAz140A7KXU/PCFEHLAM3ansjCbUYpyTlFIS4sespSxIKZm+fSvfrl9LjttN70ZNuL1dB792j+cSy35dRUZKZpELsMetkno4jX9mraHrNef77LNi9lo+ffibInn0pT//g5SSZyY8VOo5jXL9ezbt9wkCoIvjHd51pCoQVABC2BGx45HuzbqAm6k+WNqcs/37pwuBBoJUdKG5ArLyt53xDG3VmmX79+mSzYWIsNpoXT1wG73SGLVwPlM2b/SmoPampzFj+1Zm3HDjGZ9+klKyYvZaZo/7E6fDxeVDL+LSIReWOiOgqipH9x5j+6qd5BmI0jlzXezddMAwEEx4bZpPr73L4WbpzyvISssmIubkhdGaX9CELX9v9+oKFeB2uqnXsurJsCIRluZgaX6ql1FFPqV1DRVM5+wA/hFC/IpeI+gH/BvktVUK3evW56bWbRm/fg0mRUFBYDGZ+LLfdRXW5XMwM5PJmzYUmStwqiqHsrL4ddtWBrdoVSHnOVV8/th4Zo6d582tb1y8hXnfL+K1WU/77etfPO0fRt89FkeOE4/L4zV/L4wt1ErtpsZ34cf8CNmZLCYyUjLLFAj639+b3z6bi8etFpk0vmjgBSTUrvyBpyqqqCxKeyIoqNTtzP9TwK/BWU5wyHI6mfXfNpJzsmlXoybd6tT1XuSFEDzV7WJuatOWFQcOEGW3c1GdulgqcM5gTfIhzIriM2CW53GzcO/uMzoQHNqZzIxP/8DlOPFE5chxsmnZNlbNWU+nq9r57LNt1U7evPlDQ//fAkxmE5GxEVxwtbH3UYtuzUg5kOoTPBRFUL1eQpk+S1xiDB+veJ0vnvyetfM2EBIRQv/7ruT6x07rAfoqqig3pQ2UvVhZCwkWm1OOcsPUyaiaRq7HTajFQrO4eH64blCR4a5aEZFce15wHlUTQo27WcyKQq2IyKCcs7JY++cGhMFdvyPbwd+/rTIMBFPem4Erz3e2Q1EECD04d+nbgQfGjMBsMf6K3vzC9fw9YxWOHKc3GNhCbQx/fRgWa9lTbbUaJTJq6tknM1xFFSURUI1ACJEAPIHuVuZ1JpNSXlbKflcCo9EFwMdJKd8weM8gYBR6ymm9lHJooIsvDSkl98/+jSzXiVa1XLebzSkpfLl2Nfec37miTlUinWrVJtoe4mOKY1YUhrZqUylrCBbhMeEoJt8Umtli8tv6mbzrCEbdaiHhdl785QlaXdzcb0qpgFqNEhmz6k2+HfUTGxZvJqF2HDf87zq69C3RPbWKKqowINBi8Q/AJHS3spHALYCxIEo+QggT8AnQEzgArBRCTJdSbi70nsbA/4ALpZRpQohqJ/8R/HMwK5PD2Vk+2x2qh2lbN1VaIFCEYMJ1g7jrt1/Yk5GOSeh1iHd6XkX96JhKWYM/NE3jzWWLmbjx
X1yqSrsaNXm755XUigzsSaVzn/aGF22T2cQVtxorura5tAU71+3BXcxPwO3y0KBNvVKDQAG1GiXyv+8fCOi9VVRRhX8CDQRxUsovhRAP5usPLRRCrCxln07ADinlLgAhxET0IvPmQu+5A/hESpkGIKU8enLLLxmBQNWM5yQCnKOrMJKiopg17Bb2pqeT63HTODauwu0hy8LAn34solj698H9XPrtlyy97Q4SwkovuNpDbbz++7M8d/UbuJ1uELo42uNf30tiA2MXrQEP9WX2l/PR0nNQ87X+7WE2rn2gd5mKvFVUUUX5CDQQFCR0Dwsh+gCHgNLEamoBhSe1DgDFb8GbAAghlqKnj0ZJKX8PcE2lUjMiAn8q2pUhtmZEMGWoT5atKSmGstUeTWPUwvl80juwIul5nRsz6dBYNi/fjtvppsWFTbGF+B/qiqkezWdr3ua7l35i1Zx1RMVHMPCRa7hsaLcyf5Yqqqii7AQaCF4RQkQBjwIfAZGAr8Nz2c7fGOgO1AYWCSFaSSnTC79JCHEncCdAnTqB93MfzMr0O6iyLyPdcPu5xG//+Rq3F/D3ASNzb/+YzCZaXXRewO9PqB3HI2NHntQ5qqiiiuAQUCCQUv6W/58ZQKBWXgeBpEI/187fVpgDwD9SSjewWwixHT0wFEk7SSnHAmNBl5gI8PwIhN9ZgMpQAj3dKempKDak7CJrUkp+/nAWk9+ZTlZqFk06NmTke7fStGPJ5jcZxzJZMWstCL32EBlbss5QFacHUkrQjoCwIZRTW/OqomwElKQWQjQRQvwphNiY/3NrIcSzpey2EmgshKgvdG3ZIUBxU9Vf0J8GEELEo6eKKszdvVZkpGF7pt1sZuB5LSrqNGcsA89r4bdO8XCXC8t83HFPfc9Xz/xI6sHjuBxuNi7ZymPdXyjR3vGPbxcwtM5IPrpvHB/dO46hSSP5a9LSMq/hZJBS4nK4DDuZqigZ6VqFPNYDmdITebQbWuqNSLVCS31VVAKBViu/QO/ucQNIKf9Fv7D7RUrpAe4D5gBbgMlSyk1CiJeEEAXJ5zlAqhBiM/AX8HiBnlFF8dFVfYmy2Qi1WFCEINRioXX1Gtzatn1FnuaMRFEUJg4Y7GPSc3vb9vRu3KRMx8zNyuOXj343cAJz8cMrUwz3ObovhdEjx+JyuMnLdpCX7cCZ5+Kd28aQejitTOsIlD8nLOaGpLu4OvxGrou/jUlv/1oVEAJEqoeQacNB3Y+uJuoG92rk8Zuq/g3PMAKtEYRKKVcUy7d7/L25ACnlLGBWsW3PF/pvCTyS/ycoNItPYOntdzF7x3aSs7NpXyORLrWTqkSu8mmfWJMt9z7Esv37SMnJpmeDRoSWQwgvefdRzFYTrmLSQZom+W+N8cPewsnLvZIORRCweOrf9L/vqjKvpySW/bqS9+/8zDvhnJ2Ww/cv/QRSMviJ/kE559mEzJ0IsvhlQNXTRO5VYPXViKri9CTQQHBMCNGQ/BYcIcRA4HDQVlXBhFosDKhKBZVI1wqS205IivMRbQNdcbjOeb5OYAAupxuPgeqn6lFxO3wnkCuKr5/70UfmwpHj5MfXf2bgo1fnu4FV4Rd1LycaCgsjQK1cuxIpXeD4A+n6B0y1ECHXIUwVOpZ0VhNoauhe4HOgmRDiIPAQcHfQVlXFGUtETDg9b7oYW2jRpwpriJVhzwww3Kdz7/aGgx2aR6NTn+Cl8JL3GM9EOnOdOAzUUKsohqUTYNBUID1gaVlpy5BaNjL1OmTGs5A3CbI/Rh7rhXRVnG/J2U5AgUBKuUtK2QNIAJpJKbtJKfcEdWVVBIRbVflz904mb9rA7vTg5tMD5f5PRtDv3iuxh9kQiqB2k0Re/PkJmp7fyPD9qYfTUAwkq01mhdRDwftMdZoZK5uGRYUSEnHiApednsP8CYuZ9/0iMlN9J9XPVURIf1BigMLaTnaw90CY61faOmTOV+DZC+Tmb3GBzEWmP1JVqwiQQLWGXgPeKujv
F0LEAI9KKUvrHKoiiOw4nsoNUyfj9HhQpUSTGv2bNue1y3ue0hqI2WLmjjdvYvjrw/C4Vay2kkXgtq3Yger2TQ1pmmT7yh20vzw46qzDX7+R5695A2deMZP4V2/wylwsmrKcN2/5GJNJ/1n1qDz42Z30url7UNZ0JiGUMIifhsz+BBxzQdghdBgi9MbKXYhjJkWtL/PRMvT0lble5a7nDCTQ1NBVhYe88iUhegdnSVUEgpSSO2f8wvG8XLLdLvI8bpyqyoztW5m+3f+gWGWiKEqpQQD0uoI9zHcS2RZiJSEpPhhLA6D95a148ZcnaNimHla7hZqNavDI2Lvoc0dPANKOpPPmLR/jynN5u5lcDjejR47lyN4SpbbOGYQSixL5HEq1RSgJf6CE3YIuM1aZi/DX3KCV8FoVhQk0EJiEEN7fVCFECFC6MWwVQWPH8eMcycnxEdDI9bj5YcP6U7KmsnLJoK6YrWYKP8QIIbDYLXS7rlNQz92hZxs+W/s2M3MnMH77R1w29CLva4un/oPRc5WmSRZOXhbUdVVxEtj83JMqsQhTlb1oIAQaCH4A/hRCDBdCDAfmAuODt6wqSsOpevxORzs8weu0CQahESG8t/Al6raog8VmwWKz0KB1Xd5f9HKJmkXBxuVwoWmaz3bNo+J0+DfVAUhPyWDF7LXsWLu7Kk8dbNTDYBSytSx00YIqSiNQiYk3hRD/Apfnb3pZSjkneMuqojSaxSdgMSk+3Xt2s5lrmgSu+XO6UL9lHb74911SD6chBMTWOPVSBZ37tOfr5yb6bLfYLVzQt6PhPlJKvnpmAtM+mInFZkH1qNSoX5035jxLXOKp/0xnJe6lGIpLCi2/RmDcpFDFCQLWQZZSzpZSPpb/pyoInGLMisL7vXoTYjZjUfScbKjFQqPYOG5sfeaa3cQlxpwWQQAgqWktBjzUB1uo3v0khMAeauOq4ZfTqJ1xV8ziqX/zy0ezcTnc5GTk4shxsm/LAV4c8HYlr/4cQvHjJy09IE6P79LpTqBdQ9cBbwLV0J/BBPpg8Jnts3iGc0m9+sy58VYmb9rIkZxsLqpTlysaNq5Qv+WKYvvqnYx/YTK7/91LUrOa3PTCIFpe2OxUL6tUbn91KBdccz5//rAITZNcNuRCWpSw7qkfzMSRU7SDRVM1dq7bw9F9KVSrUzY/5bIgtRxk3m/g2QLmpoiQqxHK2ef3IMKGI9OfAPIKbbWAtTPC5CdIVFGEQCeL3wKullJuCeZiqjh5akdG8cgFZReIqww2Lt3KU1e8givPiZSQciCVTUu38cLUxzj/Sl9P49ON8zo35rzOjQN6b3ZatuF2k8VEdnou1SpmgLtUpHoYmToAtBz0C2QIMns0xE1BmI0nvM9UhP0KZPhuyB4DwgzSDZa2iOj3TvXSzhgCDQRHqoJAUaSUrEk+xJ70dJrExdOqmrEbV0Wfc+n+fUzdshGPJunXtBmX1W942ktqf/boeB8ROmeei08e/Jpvtp3+geBk6NrvfA7vPOJjw2kym6jbvPIuwDLzJdCOAwXF7jyQTmTmKETsuEpbR2WhhI9Eht4Inv9ASTjrgl2wCTQQrBJCTEKXjfb+RksppwVlVac5mU4Hw6b95J3klVLSqnoNvr7mOkIspffNl5VXFy/gx40byMvvCvprzy4uq9eA0Vf2Oa1F9Hat32O4/dCOw6geFZPBVPGZyvWPXcP8CUtIT8nEledCUfQ22Ic+u6tyP6dzESeCQAEauJYipTytvy9lRSjhYD27biwqi0ADQST6/HavQtskcE4Gguf/+pPtqcdwF2otXJ98mHeWL+G5iwP17Tk5dqUd54cN/+JUT9xp5rrd/Ll7F6sOH+T8mqfvHVBkfCSpB4/7bA+JCEExnXrf5ookMjaCsevfYebYeayas45qdeLp/0BvGrWtPMkFHRPGgnBnT9AtCSldyNzJ4JgBwo4IvQFsV5yVAbAiCLR99LZgL+RMQZOS2Tu2FwkCAE5VZeqW
TUELBIv27sGoRS7P4+av3btO60Aw5Mn+jHvqhyLpIVuojese6ntW/mKGRYUx6PF+DHq836lbREgfyJtO0WBgAfvZfzGU0oM8fiO4t1FQQJaudRDyNyJq1Cld2+lKoF1DdmA40AKwF2yXUt4epHWdtmhSovoZEHKpvno5FUWY1YpJUaDYOSyKQrj19B7y7nfvlWQcy+Snd2agKAJN1eh7Zw9ufM5YjbSs5GTkMPHNX1j403Jsdit97+5F37t6npNy0iLif0j3Zr2PXqogTGCqjYh8vvSdz3Sc88CznaJdRHmQNxUZdhvCXLdchy8YEDybAmqgqaHvgK3AFcBLwDB017FzDrOi0L5GTVYfPljk/lxBcEndekE77xUNGzFqwXyf7SZFoV/T03uATAjBLaMGM+TJ/qQcOE5czRhCwuyl73gSuBwu7uvyNEf2pOB26nfBXzzxPRsXb+GZHx+u0HOdCQglEuJ+AdcKvYBqbqi3U4qzKxVnhHQuAZlr8IoCrpVQxkAgtePIjFF6oEEibRcjIl9EmGqUZ7mnBYF+KxpJKZ8DcqSU44E+QOfgLev05rXLexJhs2E363E0xGwmJsTOs0FKCwFE2ux83rcf4Var90+I2cw7Pa+kVuSZMc5hC7FRu3FihQcBgL8mLuXYgVRvEADdV2D59FXs23ow4OO4XW7mfb+Ilwa+w+h7xrLTT6H7TEAIgbB1RoTdiLBdcE4EASB/wMygaUMo+bLZJ4+UKjJ1SH4Q8AAqOBciUwci5ZnvXRHoE0HBb1e6EKIlkIw+XHZO0ig2jvk3385PmzayNfUYratXZ8B5LYm0BTdF061OXVaOuJvlB/ajahpdaicRVg5bybOJdX9t9BnkAhCKYMvf26nTrFapx3A53Tx6yfPs2bQfR44TxaQwd/xC7vt4OFfedlkwll1FEBAhA5E5X+NbLLeC7SKjXUrHuQi0FIo69Gogs8HxO4Sc2damgQaCsfkeBM8C04Fw4LnSdhJCXAmMRm9VGCelfKPY67cCbwMFt2wfSynPiCbn2JBQ7uoYXGVMI2xmM93rVXYHyulPjfrVsNjMPjaZQhHE14oN6Bhzxy9g98b93qK2pmo481x8fP9XXDKoa1CeZKqoeIQ5CaJHIzMeQ2+w0EBEI2I+R5RVllrdBdLA80DmIt3/IQyM2s4kSgwEQojCpvIFnUOf5P8dVsq+pvz39gQOACuFENOllJuLvXWSlPK+wJd8+uBWVTKdTqLtdr2QW8Upo/eIHkx5d0aRQKAogojYcNpeFpht4sKflvsMvoHulLZ52TY69DxzNZzONYT9UrD9De6NIGxgPq98xV1zI/04spgftwhFWJqWb7GnAaU9EUTk/90UOB/9aQDgamBFKft2AnZIKXcBCCEmAv2A4oHgjENKyUcr/uaLNStxaxp2k5kHu3TltrbB89etomQSasfx6syneeOmD8lMzUJTJQ1a1+G5yY8G3DUUHm18byOlLGJdWcWZgRCWihsws3YDpYbeheVND5lARIH9ioo5xymkxEAgpXwRQAixCGgvpczK/3kUMLOUY9cC9hf6+QDGBeYBQoiLge3Aw1LK/cXfIIS4E7gToE6dShJrKYHPV6/k89UryPPoXwiXqvLOssVEWK0MbF55pt1G7E1PZ9zaVWxKOUqLhGqMaNeRutHRp3RNlUXri5vzw55PSd59FIvdQnzNwFJCBfQd2YuVs9fiKPZUEB4VRrNOVVLG5zJCmCDuR2Tma3pNAA1slyEin6WQZ9cZS6D5jOpAYScOV/628jIDqCelbE0JZjdSyrFSyo5Syo4JCZWn3uhnLXxWKAgUkOfx8OGKv0/RqnQ2Hj1Cnx+/ZdLGf1mXfJhJG/+lz4/fsvHokVO6rspECEFig+onHQRAt64c/GQ/LHYLoREhhEaEEFM9mtdmP+P1MK7i3EUo0SjRb6HU+BelxkaUmKyH6lIAABJdSURBVA8RprOjZybQYvG3wAohxM/5P/cHvilln4NA
UqGfa3OiKAyAlDK10I/j0FVOT2vcmkaW06BoBBzNMVaerCxeWPAnue4TnRIeKfG43Yxa8CdTBg09hSurHHIyc/np3RksnLQMW4iVq+/uxVUjLj+pi/iNz11Pnzt78u+iLUTEhNGme4tyawTl5TiY+t4M5v+4FLPFRJ87e9D3rl5nlcZSFWc2gUpMvCqEmA0U9F7dJqVcW8puK4HGQoj66AFgCFDkaiSESJRSHs7/8RpO0yE1l6rywd/L+GHDOnLcbsyK4iMxAXpb6alk/ZFkw+3r/Gw/m3A5XDxwwdMc3n0Ut0MPhp8+Mp5/F2/hf989cFLHioyPoF7LJMKjw8p9sfa4PTx80XPs33oQV/66vnjyB9bO38ioqY+X69hVVFFRBPpEgJRyDbDmJN7vEULcB8xBbx/9Skq5SQjxErBKSjkdeEAIcQ169eU4cOvJLL6yeHzu78zdtQNHfjpIM5CYsJvNPN3tkspeWhFCLRayXb5eumFBVEQtTF6OA03VCIsMrZTzFWbh5OUc3XfMGwRAHyhbMu0f9j97kKSmpc8RACz5+R/ev+tzXA43qkelWafGPDf5EWKqRZVpXUt/WcmhHcneIFCwrlVz1rFj3e5TIEZXRRW+BDXxKaWcJaVsIqVsKKV8NX/b8/lBACnl/6SULaSUbaSUl0optwZzPWUhOTuLP3b+5w0CBShCEGsPIdJmo0NiTb7pN4CuSae2kH1Dy9beaecC7GYzw1q1Dep5Uw+n8b8rX+Ha2FsZEH8793R8gt0b9gb1nMVZM3+D4UCZogg2L98e0DF2rNutdx0dy8KR7cDtcLN5+TaevurVMq/r30WbyMs2mDyVsCXAdVVRRbAJ+IngXGVPejpWkxlnMbE3TUrqREUxbfCwU7QyXx69oBuHs7KYu2sHVpMJl6rSs0FDHurSNWjnVFWVRy5+jiN7U1A9errsvzW7efiS5/l2x8dExkaUcoSKoXrdBCxWs48hjKIoxAVYOP7lw1lFnigAVLfK/m2H2PXvXhq0PnmNmoSkeKx2S5EnAtCNagJdVxVVBJuqVohSqBcdXcQDoACzEDRPOL06BqwmEx9e1Zf5Nw/nsz79mH/zcEZf2RdrENU31/65kbSjGd4gUIDH6eGP8QuCdt7i9B5xOSZL0c+pKIKw6FDaXR5YS2/ynhQ0zTftZ7aYSD3k66cQCL1uvsSnziCEwBZq4/yrgvukVkUVgVIVCEqhRngEVzRs5JNysZrNjGjf8RStqmQSIyK4IKkOiRHBvxs/vOsImse3cO7Mc3Fg22GDPYJDtaR4Xp7+FHE1Y7CH2bDaLTRoU493F7wY8EBZ+56tsdp9JQjcTjeNOzQo07pia8Tw2uxnSEiKxx5qwxpipV7LJN5b+CIWa+XUbqqoojTOidSQ0+Phrz27OZKTRbsaNWld/eRkY9/ueRXv/72UCRvWk+N206Z6DUZ1v5x60WVTMjybaNSuPkLxHd23h9k4r0tghu8VRdtLWzJh32cc2pGM1W6hWp2Tmzm5emQvpo+ZQ0ZKJp78FJM9zMY1915JdELZisUALS9sxg97xnBwRzIWq5nqdU/tLEwVVRRHSD8mK6crHTt2lKtWrQr4/bvSjjN4yiQcHjceTUMRgs61k/i8Tz8s56BhSUUjpeTxy19ky9/bvXlwk8VEfM1Yvtz8PraQM2vqMj0lg4lv/sLyX1cRERvGgIevpvvgrmeVCUkV5yZCiNVSSsM0xlkfCHr/MJ5tqceKmMiEmM081rUbt7XtUPELPAdxOVx8/8oU5nz9Fx6XyoXXduL2V28o1110FVVUUbGcs4HgUFYml3/7lU/HD0Dj2Djm3HhrBa+uiiqqqOL0pKRAcFYXiz2a5veR3mMwGfz/9u4/yKryvuP4+7PL7soiIhTGqFB3VaISlR9FA0karUqLP4KdmEZMmpBEy8SqI8ROitKxE1NnqjP1RyamDqYmnalKq8aUAadqiWmjbQQUURQRlE2BIGADQfm1LPvtH+es
OSzrsqR7Oefe83nN3Ln3POe5dz/cs8v3nufc8xwzszKq6UIw6pghjGg+eGrhpvr6wl/n18zsSKnpQiCJ70y9lEENjR98/bO5oYFThg4r7Fc/rfqtemENN0y+halN0/mTj1zN/DueoNN7oFZgNX2MoMuvdu/ix2+sYuN7OzjnhJFcdPIpDPC0wlYB61b+DzdMuuWAK501NTdxyTUX8uf3fLWXZ5pVVm/HCEpxHsGwgc18bby/IWSV9/Dtj9O+58CJ//bu2suiec8w41ufZ9CQXq/wapaLUhQCq007fvUeC+9/muWLV/KRlhF8dtaltJ51+PMB9ae1y9uInqapaBzApnVbPNuoFZILgVWlbZu38/UJ3+T9be/TvmcfdfV1PDv/eebOn83kz+R3/KflzFFsXLOJ7kOuHe0dPqPYCssD5VaVHr79R+x4d8cHZzN37u9k7+527vqz+3M9MPvFuVfQOPDAOYSamhuZ8uXzGDz06JxSmfXOhcCq0s8XvkjHvoNPFNyzcw+/fCu/azSfOr6V2xfdQutZybUpmo8ZyBWzP8MN370mt0xmh+KhIatKRw9thraD2/d3dDJoyJG/QlrW2PM+xrwVf0dnZ6cvem9Vwb+lVpU+O+syjhp04IR2Axrq+dgnT/utLyvZ31wErFpU9DdV0lRJqyWtlTSnl35XSApJPsvL+uSiP/00l86cQsNRDQwaMpCm5iZazz6JuY/MyjuaWdWp2AllkuqBN4EpwAZgKXBVRLzerd9gYBHQCFwfEb2eLfbbnFBmtWvb5u2sXb6O4ScOy/2ro2ZFltekc+cCayPi7YhoB+YDl/fQ79vAHUAPV/g2693Q447lnKnjXQTM/h8qWQhOBNZnljekbR+QNAEYFRGLKpjDzMx6kdvRLEl1wF3ATX3oO1PSMknLtm7dWvlwZmYlUslCsBEYlVkembZ1GQycCfxUUhswCVjQ0wHjiJgXERMjYuKIET4708ysP1WyECwFRktqldQITAcWdK2MiF9HxPCIaImIFuDnwLRDHSw2M7P+VbFCEBEdwPXAU8Aq4F8i4jVJt0maVqmfa2Zmh6eiZxZHxJPAk93abv2QvudXMouZmfXMpz6amZWcC4GZWcm5EJiZlZwLgZlZybkQmJmVnAuBmVnJuRCYmZWcC4GZWcm5EJiZlZwLgZlZybkQmJmVnAuBmVnJuRCYmZWcC4GZWcm5EJiZlZwLgZlZybkQmJmVnAuBmVnJVbQQSJoqabWktZLm9LD+65JelfSypOckjalkHjMzO1jFCoGkeuA+4GJgDHBVD//RPxwRZ0XEOOBO4K5K5TEzs55Vco/gXGBtRLwdEe3AfODybIeI2JFZHAREBfOYmVkPBlTwtU8E1meWNwAf795J0nXAN4BG4IIK5jEzsx7kfrA4Iu6LiFOAvwT+qqc+kmZKWiZp2datW49sQDOzGlfJQrARGJVZHpm2fZj5wB/3tCIi5kXExIiYOGLEiH6MaGZmlSwES4HRklolNQLTgQXZDpJGZxYvBdZUMI+ZmfWgYscIIqJD0vXAU0A98GBEvCbpNmBZRCwArpd0EbAP2AbMqFQeMzPrWSUPFhMRTwJPdmu7NfP4xkr+fDMzO7TcDxabmVm+XAjMzErOhcDMrOQqeozArD9FBGteeptfrn2H1rNP4qQzRuYdyawmuBBYVXh/+07m/NHf8IvX11NXV8f+jv2Mu+BM/vrxv6ChsSHveGZVzUNDVhXuvXYeb61oY8/Ovex6bzd7d7ez/Ccreejbj+UdzazquRBY4XXs6+C5J5bQ0d5xQHv77nYWPbA4p1RmtcOFwApvf8d+Ovd39rhu7+69RziNWe1xIbDCaxrYxMljTzqova5OnDN1fA6JzGqLC4FVhZseuJaBgwfS0JQcGG4c2MjgYYOZeeeXck5mVv38rSGrCqeOb+XBVfew8P6naVu5njMmjebiay7kmGGD845mVvVcCKxqDD9hGF+5bXreMcxqjoeGzMxKzoXA
zKzkXAjMzErOhcDMrORcCMzMSs6FwMys5FwIzMxKzoXAzKzkXAjMzEpOEZF3hsMiaSvwiz52Hw68W8E4/aVackL1ZHXO/lUtOaF6sh7pnCdFxIieVlRdITgckpZFxMS8cxxKteSE6snqnP2rWnJC9WQtUk4PDZmZlZwLgZlZydV6IZiXd4A+qpacUD1ZnbN/VUtOqJ6shclZ08cIzMzs0Gp9j8DMzA6hZguBpKmSVktaK2lO3nm6SHpQ0hZJKzNtwyQ9I2lNej80z4xpplGSnpX0uqTXJN1YxKySjpK0RNKKNOe30vZWSS+k2/+fJTXmmbOLpHpJyyUtTJeLmrNN0quSXpa0LG0r1LZPMx0r6TFJb0haJWly0XJKOi19H7tuOyTNKlLOmiwEkuqB+4CLgTHAVZLG5JvqAz8EpnZrmwMsjojRwOJ0OW8dwE0RMQaYBFyXvodFy7oXuCAixgLjgKmSJgF3AHdHxKnANuDqHDNm3QisyiwXNSfAH0TEuMxXHIu27QHuBf4tIk4HxpK8t4XKGRGr0/dxHPB7wC7gCYqUMyJq7gZMBp7KLN8M3Jx3rkyeFmBlZnk1cHz6+Hhgdd4Ze8j8r8CUImcFmoGXgI+TnKgzoKffhxzzjST5g78AWAioiDnTLG3A8G5thdr2wBBgHemxzqLm7JbtD4Hni5azJvcIgBOB9ZnlDWlbUR0XEZvSx+8Ax+UZpjtJLcB44AUKmDUdbnkZ2AI8A7wFbI+IjrRLUbb/PcA3gc50+XcoZk6AAJ6W9KKkmWlb0bZ9K7AV+EE63PZ9SYMoXs6s6cAj6ePC5KzVQlC1Ivl4UJivckk6GngcmBURO7LripI1IvZHsts9EjgXOD3nSAeRdBmwJSJezDtLH30qIiaQDK9eJ+nT2ZUF2fYDgAnA30fEeGAn3YZXCpITgPT4zzTg0e7r8s5Zq4VgIzAqszwybSuqzZKOB0jvt+ScBwBJDSRF4KGI+FHaXMisABGxHXiWZIjlWEkD0lVF2P6fBKZJagPmkwwP3UvxcgIQERvT+y0k49nnUrxtvwHYEBEvpMuPkRSGouXscjHwUkRsTpcLk7NWC8FSYHT6jYxGkt2xBTln6s0CYEb6eAbJeHyuJAn4B2BVRNyVWVWorJJGSDo2fTyQ5DjGKpKC8Lm0W+45I+LmiBgZES0kv48/iYgvUrCcAJIGSRrc9ZhkXHslBdv2EfEOsF7SaWnThcDrFCxnxlX8ZlgIipQz74MnFTwocwnwJsl48dy882RyPQJsAvaRfKK5mmSseDGwBvh3YFgBcn6KZFf1FeDl9HZJ0bICZwPL05wrgVvT9pOBJcBakl3xprzf00zm84GFRc2ZZlqR3l7r+vsp2rZPM40DlqXb/8fA0ILmHAT8LzAk01aYnD6z2Mys5Gp1aMjMzPrIhcDMrORcCMzMSs6FwMys5FwIzMxKzoXArJ9IasnOKnsYz/uvzPO/0P/JzHrnQmCWk64ziiPiE2lTC+BCYEecC4GVRvqJ+w1JD6Vz1z8mqVnShemkZa+m14toSvu3SbozbV8i6dS0/YeSPpd53fc/5Gf9TNJL6e0Tafv5afsCkrNgs8//W+D30znrZ0v6T0njMq/5nKSxFXuDrLRcCKxsTgO+FxFnADuAb5BcI+LKiDiLZCKzazP9f522f5dk9tC+2gJMiWTitiuB72TWTQBujIiPdnvOHOBnkcxdfzfJFB9fAZD0UeCoiFhxGBnM+sSFwMpmfUQ8nz7+J5L5adZFxJtp2z8C2Zk2H8ncTz6Mn9MAPCDpVZKpI7IXRloSEev68BqPApelk/99jaRgmfW7AYfuYlZTus+psp1kzpe+9O963EH6IUpSHdDT5SVnA5tJrppVB+zJrNvZp6ARuyQ9A1wOfJ7k6lZm/c57BFY2vyup65P9F0gmLGvpGv8HvgT8R6b/lZn7/04ft/Gb/5SnkXz6724IsCkiOtPXrO9DtveAwd3avk8yrLQ0Irb14TXMDpsL
gZXNapILrawimanybuCrwKPpME4ncH+m/1BJr5Bca3h22vYAcJ6kFSTDRT19wv8eMCPtc/qH9OnuFWC/pBWSZgNEciGbHcAPDu+fadZ3nn3USiO95ObCiDizj/3bgIkR8W4FYx0qwwnAT4HT070Ls37nPQKzgpL0ZZLrRM91EbBK8h6BmVnJeY/AzKzkXAjMzErOhcDMrORcCMzMSs6FwMys5FwIzMxK7v8A+me/ab3a2nMAAAAASUVORK5CYII=\n"
+ },
+ "metadata": {
+ "needs_background": "light"
+ }
+ }
+ ],
+ "source": [
+ "from sklearn.cluster import KMeans\n",
+ "kmeans = KMeans(n_clusters = 3)\n",
+ "kmeans.fit(X)\n",
+ "labels = kmeans.predict(X)\n",
+ "plt.scatter(df['popularity'],df['danceability'],c = labels)\n",
+ "plt.xlabel('popularity')\n",
+ "plt.ylabel('danceability')\n",
+ "plt.show()"
+ ]
+ },
+ {
+ "source": [
+    "The model's accuracy is passable but not great. The data may not be well suited to K-Means clustering; you could try a different method.\n"
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 811,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Result: 109 out of 286 samples were correctly labeled.\nAccuracy score: 0.38\n"
+ ]
+ }
+ ],
+ "source": [
+ "labels = kmeans.labels_\n",
+ "\n",
+ "correct_labels = sum(y == labels)\n",
+ "\n",
+ "print(\"Result: %d out of %d samples were correctly labeled.\" % (correct_labels, y.size))\n",
+ "\n",
+    "print('Accuracy score: {0:0.2f}'.format(correct_labels/float(y.size)))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
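+    "\n---\n\n**Disclaimer**:  \nThis document was translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The document in its original language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.\n"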
+ ]
+ }
+ ]
+}
\ No newline at end of file
diff --git a/translations/zh-CN/5-Clustering/2-K-Means/solution/tester.ipynb b/translations/zh-CN/5-Clustering/2-K-Means/solution/tester.ipynb
new file mode 100644
index 000000000..7e71c1e7f
--- /dev/null
+++ b/translations/zh-CN/5-Clustering/2-K-Means/solution/tester.ipynb
@@ -0,0 +1,343 @@
+{
+ "metadata": {
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.7.0"
+ },
+ "orig_nbformat": 2,
+ "kernelspec": {
+ "name": "python3",
+ "display_name": "Python 3.7.0 64-bit ('3.7')"
+ },
+ "metadata": {
+ "interpreter": {
+ "hash": "70b38d7a306a849643e446cd70466270a13445e5987dfa1344ef2b127438fa4d"
+ }
+ },
+ "interpreter": {
+ "hash": "70b38d7a306a849643e446cd70466270a13445e5987dfa1344ef2b127438fa4d"
+ },
+ "coopTranslator": {
+ "original_hash": "6f92868513e59d321245137c1c4c5311",
+ "translation_date": "2025-09-03T20:12:37+00:00",
+ "source_file": "5-Clustering/2-K-Means/solution/tester.ipynb",
+ "language_code": "zh"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2,
+ "cells": [
+ {
+ "source": [],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 104,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Requirement already satisfied: seaborn in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (0.11.1)\n",
+ "Requirement already satisfied: pandas>=0.23 in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from seaborn) (1.1.2)\n",
+ "Requirement already satisfied: matplotlib>=2.2 in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from seaborn) (3.1.0)\n",
+ "Requirement already satisfied: numpy>=1.15 in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from seaborn) (1.19.2)\n",
+ "Requirement already satisfied: scipy>=1.0 in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from seaborn) (1.4.1)\n",
+ "Requirement already satisfied: pytz>=2017.2 in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from pandas>=0.23->seaborn) (2019.1)\n",
+ "Requirement already satisfied: python-dateutil>=2.7.3 in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from pandas>=0.23->seaborn) (2.8.0)\n",
+ "Requirement already satisfied: kiwisolver>=1.0.1 in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from matplotlib>=2.2->seaborn) (1.1.0)\n",
+ "Requirement already satisfied: pyparsing!=2.0.4,!=2.1.2,!=2.1.6,>=2.0.1 in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from matplotlib>=2.2->seaborn) (2.4.0)\n",
+ "Requirement already satisfied: cycler>=0.10 in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from matplotlib>=2.2->seaborn) (0.10.0)\n",
+ "Requirement already satisfied: six>=1.5 in /Users/jenlooper/Library/Python/3.7/lib/python/site-packages (from python-dateutil>=2.7.3->pandas>=0.23->seaborn) (1.12.0)\n",
+ "Requirement already satisfied: setuptools in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from kiwisolver>=1.0.1->matplotlib>=2.2->seaborn) (45.1.0)\n",
+ "\u001b[33mWARNING: You are using pip version 20.2.3; however, version 21.1.2 is available.\n",
+ "You should consider upgrading via the '/Library/Frameworks/Python.framework/Versions/3.7/bin/python3.7 -m pip install --upgrade pip' command.\u001b[0m\n",
+ "Note: you may need to restart the kernel to use updated packages.\n"
+ ]
+ }
+ ],
+ "source": [
+ "pip install seaborn"
+ ]
+ },
+ {
+ "source": [
+    "Start where we left off in the last lesson, with the data imported and filtered.\n"
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 105,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ " name album \\\n",
+ "0 Sparky Mandy & The Jungle \n",
+ "1 shuga rush EVERYTHING YOU HEARD IS TRUE \n",
+ "2 LITT! LITT! \n",
+ "3 Confident / Feeling Cool Enjoy Your Life \n",
+ "4 wanted you rare. \n",
+ "\n",
+ " artist artist_top_genre release_date length popularity \\\n",
+ "0 Cruel Santino alternative r&b 2019 144000 48 \n",
+ "1 Odunsi (The Engine) afropop 2020 89488 30 \n",
+ "2 AYLØ indie r&b 2018 207758 40 \n",
+ "3 Lady Donli nigerian pop 2019 175135 14 \n",
+ "4 Odunsi (The Engine) afropop 2018 152049 25 \n",
+ "\n",
+ " danceability acousticness energy instrumentalness liveness loudness \\\n",
+ "0 0.666 0.8510 0.420 0.534000 0.1100 -6.699 \n",
+ "1 0.710 0.0822 0.683 0.000169 0.1010 -5.640 \n",
+ "2 0.836 0.2720 0.564 0.000537 0.1100 -7.127 \n",
+ "3 0.894 0.7980 0.611 0.000187 0.0964 -4.961 \n",
+ "4 0.702 0.1160 0.833 0.910000 0.3480 -6.044 \n",
+ "\n",
+ " speechiness tempo time_signature \n",
+ "0 0.0829 133.015 5 \n",
+ "1 0.3600 129.993 3 \n",
+ "2 0.0424 130.005 4 \n",
+ "3 0.1130 111.087 4 \n",
+ "4 0.0447 105.115 4 "
+      ]
+ },
+ "metadata": {},
+ "execution_count": 105
+ }
+ ],
+ "source": [
+ "\n",
+ "import matplotlib.pyplot as plt\n",
+ "import pandas as pd\n",
+ "import seaborn as sns\n",
+ "import numpy as np\n",
+ "\n",
+ "df = pd.read_csv(\"../../data/nigerian-songs.csv\")\n",
+ "df.head()"
+ ]
+ },
+ {
+ "source": [
+    "We will focus on only three genres. Maybe we can build three clusters!\n"
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 106,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "Text(0.5, 1.0, 'Top genres')"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 106
+ },
+ {
+ "output_type": "display_data",
+ "data": {
+      "text/plain": "<Figure size 720x504 with 1 Axes>"
+ },
+ "metadata": {
+ "needs_background": "light"
+ }
+ }
+ ],
+ "source": [
+    "# Keep only the three genres of interest\n",
+    "df = df[df['artist_top_genre'].isin(['afro dancehall', 'afropop', 'nigerian pop'])]\n",
+    "# Drop songs with zero popularity\n",
+    "df = df[df['popularity'] > 0]\n",
+ "top = df['artist_top_genre'].value_counts()\n",
+ "plt.figure(figsize=(10,7))\n",
+ "sns.barplot(x=top.index,y=top.values)\n",
+ "plt.xticks(rotation=45)\n",
+ "plt.title('Top genres',color = 'blue')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 107,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ " name album \\\n",
+ "1 shuga rush EVERYTHING YOU HEARD IS TRUE \n",
+ "3 Confident / Feeling Cool Enjoy Your Life \n",
+ "4 wanted you rare. \n",
+ "5 Kasala Pioneers \n",
+ "6 Pull Up Everything Pretty \n",
+ "\n",
+ " artist artist_top_genre release_date length popularity \\\n",
+ "1 Odunsi (The Engine) afropop 2020 89488 30 \n",
+ "3 Lady Donli nigerian pop 2019 175135 14 \n",
+ "4 Odunsi (The Engine) afropop 2018 152049 25 \n",
+ "5 DRB Lasgidi nigerian pop 2020 184800 26 \n",
+ "6 prettyboydo nigerian pop 2018 202648 29 \n",
+ "\n",
+ " danceability acousticness energy instrumentalness liveness loudness \\\n",
+ "1 0.710 0.0822 0.683 0.000169 0.1010 -5.640 \n",
+ "3 0.894 0.7980 0.611 0.000187 0.0964 -4.961 \n",
+ "4 0.702 0.1160 0.833 0.910000 0.3480 -6.044 \n",
+ "5 0.803 0.1270 0.525 0.000007 0.1290 -10.034 \n",
+ "6 0.818 0.4520 0.587 0.004490 0.5900 -9.840 \n",
+ "\n",
+ " speechiness tempo time_signature \n",
+ "1 0.3600 129.993 3 \n",
+ "3 0.1130 111.087 4 \n",
+ "4 0.0447 105.115 4 \n",
+ "5 0.1970 100.103 4 \n",
+ "6 0.1990 95.842 4 "
+      ]
+ },
+ "metadata": {},
+ "execution_count": 107
+ }
+ ],
+ "source": [
+ "df.head()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 108,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from sklearn.preprocessing import StandardScaler\n",
+ "\n",
+ "scaler = StandardScaler()\n",
+ "\n",
+    "# X = df.loc[:, ('danceability','energy')]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 110,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "error",
+ "ename": "ValueError",
+ "evalue": "Unknown label type: 'continuous'",
+ "traceback": [
+ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
+ "\u001b[0;31mValueError\u001b[0m Traceback (most recent call last)",
+ "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[1;32m 20\u001b[0m \u001b[0;31m# we create an instance of SVM and fit out data. We do not scale our\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 21\u001b[0m \u001b[0;31m# data since we want to plot the support vectors\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m---> 22\u001b[0;31m \u001b[0mls30\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m(\u001b[0m\u001b[0mLabelSpreading\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mfit\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mX\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0my_30\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0my_30\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'Label Spreading 30% data'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 23\u001b[0m \u001b[0mls50\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m(\u001b[0m\u001b[0mLabelSpreading\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mfit\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mX\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0my_50\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0my_50\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'Label Spreading 50% data'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 24\u001b[0m \u001b[0mls100\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0;34m(\u001b[0m\u001b[0mLabelSpreading\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mfit\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mX\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0my\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0my\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0;34m'Label Spreading 100% data'\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
+ "\u001b[0;32m/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/sklearn/semi_supervised/_label_propagation.py\u001b[0m in \u001b[0;36mfit\u001b[0;34m(self, X, y)\u001b[0m\n\u001b[1;32m 228\u001b[0m \u001b[0mX\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0my\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0m_validate_data\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mX\u001b[0m\u001b[0;34m,\u001b[0m \u001b[0my\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 229\u001b[0m \u001b[0mself\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mX_\u001b[0m \u001b[0;34m=\u001b[0m \u001b[0mX\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0;32m--> 230\u001b[0;31m \u001b[0mcheck_classification_targets\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0my\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 231\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 232\u001b[0m \u001b[0;31m# actual graph construction (implementations should override this)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n",
+ "\u001b[0;32m/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/sklearn/utils/multiclass.py\u001b[0m in \u001b[0;36mcheck_classification_targets\u001b[0;34m(y)\u001b[0m\n\u001b[1;32m 181\u001b[0m if y_type not in ['binary', 'multiclass', 'multiclass-multioutput',\n\u001b[1;32m 182\u001b[0m 'multilabel-indicator', 'multilabel-sequences']:\n\u001b[0;32m--> 183\u001b[0;31m \u001b[0;32mraise\u001b[0m \u001b[0mValueError\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m\"Unknown label type: %r\"\u001b[0m \u001b[0;34m%\u001b[0m \u001b[0my_type\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m\u001b[1;32m 184\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n\u001b[1;32m 185\u001b[0m \u001b[0;34m\u001b[0m\u001b[0m\n",
+ "\u001b[0;31mValueError\u001b[0m: Unknown label type: 'continuous'"
+ ]
+ }
+ ],
+ "source": [
+ "from sklearn.svm import SVC\n",
+ "from sklearn.semi_supervised import LabelSpreading\n",
+ "from sklearn.semi_supervised import SelfTrainingClassifier\n",
+ "from sklearn import datasets\n",
+ "\n",
+ "X = df[['danceability','acousticness']].values\n",
+ "y = df['energy'].values\n",
+ "\n",
+ "# X = scaler.fit_transform(X)\n",
+ "\n",
+ "# step size in the mesh\n",
+ "h = .02\n",
+ "\n",
+ "rng = np.random.RandomState(0)\n",
+ "y_rand = rng.rand(y.shape[0])\n",
+ "y_30 = np.copy(y)\n",
+ "y_30[y_rand < 0.3] = -1 # set random samples to be unlabeled\n",
+ "y_50 = np.copy(y)\n",
+ "y_50[y_rand < 0.5] = -1\n",
+ "# we create an instance of SVM and fit out data. We do not scale our\n",
+ "# data since we want to plot the support vectors\n",
+ "ls30 = (LabelSpreading().fit(X, y_30), y_30, 'Label Spreading 30% data')\n",
+ "ls50 = (LabelSpreading().fit(X, y_50), y_50, 'Label Spreading 50% data')\n",
+ "ls100 = (LabelSpreading().fit(X, y), y, 'Label Spreading 100% data')\n",
+ "\n",
+ "# the base classifier for self-training is identical to the SVC\n",
+ "base_classifier = SVC(kernel='rbf', gamma=.5, probability=True)\n",
+ "st30 = (SelfTrainingClassifier(base_classifier).fit(X, y_30),\n",
+ " y_30, 'Self-training 30% data')\n",
+ "st50 = (SelfTrainingClassifier(base_classifier).fit(X, y_50),\n",
+ " y_50, 'Self-training 50% data')\n",
+ "\n",
+ "rbf_svc = (SVC(kernel='rbf', gamma=.5).fit(X, y), y, 'SVC with rbf kernel')\n",
+ "\n",
+ "# create a mesh to plot in\n",
+ "x_min, x_max = X[:, 0].min() - 1, X[:, 0].max() + 1\n",
+ "y_min, y_max = X[:, 1].min() - 1, X[:, 1].max() + 1\n",
+ "xx, yy = np.meshgrid(np.arange(x_min, x_max, h),\n",
+ " np.arange(y_min, y_max, h))\n",
+ "\n",
+ "color_map = {-1: (1, 1, 1), 0: (0, 0, .9), 1: (1, 0, 0), 2: (.8, .6, 0)}\n",
+ "\n",
+ "classifiers = (ls30, st30, ls50, st50, ls100, rbf_svc)\n",
+ "for i, (clf, y_train, title) in enumerate(classifiers):\n",
+ " # Plot the decision boundary. For that, we will assign a color to each\n",
+ " # point in the mesh [x_min, x_max]x[y_min, y_max].\n",
+ " plt.subplot(3, 2, i + 1)\n",
+ " Z = clf.predict(np.c_[xx.ravel(), yy.ravel()])\n",
+ "\n",
+ " # Put the result into a color plot\n",
+ " Z = Z.reshape(xx.shape)\n",
+ " plt.contourf(xx, yy, Z, cmap=plt.cm.Paired)\n",
+ " plt.axis('off')\n",
+ "\n",
+ " # Plot also the training points\n",
+ " colors = [color_map[y] for y in y_train]\n",
+ " plt.scatter(X[:, 0], X[:, 1], c=colors, edgecolors='black')\n",
+ "\n",
+ " plt.title(title)\n",
+ "\n",
+ "plt.suptitle(\"Unlabeled points are colored white\", y=0.1)\n",
+ "plt.show()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "\n---\n\n**免责声明**: \n本文档使用AI翻译服务[Co-op Translator](https://github.com/Azure/co-op-translator)进行翻译。尽管我们努力确保翻译的准确性,但请注意,自动翻译可能包含错误或不准确之处。原始语言的文档应被视为权威来源。对于关键信息,建议使用专业人工翻译。我们不对因使用此翻译而产生的任何误解或误读承担责任。\n"
+ ]
+ }
+ ]
+}
\ No newline at end of file
diff --git a/translations/zh-CN/5-Clustering/README.md b/translations/zh-CN/5-Clustering/README.md
new file mode 100644
index 000000000..b7194600d
--- /dev/null
+++ b/translations/zh-CN/5-Clustering/README.md
@@ -0,0 +1,33 @@
+# 机器学习中的聚类模型
+
+聚类是一种机器学习任务,旨在寻找彼此相似的对象并将它们分组到称为“聚类”的组中。与机器学习中的其他方法不同,聚类是自动进行的,实际上可以说它是监督学习的反面。
+
+## 地区主题:针对尼日利亚观众音乐品味的聚类模型 🎧
+
+尼日利亚的观众拥有多样化的音乐品味。通过从 Spotify 抓取的数据(灵感来源于[这篇文章](https://towardsdatascience.com/country-wise-visual-analysis-of-music-taste-using-spotify-api-seaborn-in-python-77f5b749b421)),让我们来看看尼日利亚流行的一些音乐。这份数据集包括关于各种歌曲的“舞蹈性”评分、“声学性”、响度、“语音性”、流行度和能量的相关数据。发现这些数据中的模式将会非常有趣!
+
+
+
+> 图片由 Marcela Laskoski 提供,来自 Unsplash
+
+在这一系列课程中,你将学习使用聚类技术分析数据的新方法。聚类特别适用于数据集缺乏标签的情况。如果数据集有标签,那么你在之前课程中学到的分类技术可能会更有用。但在需要对无标签数据进行分组的情况下,聚类是发现模式的绝佳方法。
+
+> 有一些实用的低代码工具可以帮助你学习如何使用聚类模型。试试 [Azure ML](https://docs.microsoft.com/learn/modules/create-clustering-model-azure-machine-learning-designer/?WT.mc_id=academic-77952-leestott) 来完成这个任务。
+
+## 课程
+
+1. [聚类简介](1-Visualize/README.md)
+2. [K-Means 聚类](2-K-Means/README.md)
+
+## 致谢
+
+These lessons were written by [Jen Looper](https://www.twitter.com/jenlooper), with helpful reviews by [Rishit Dagli](https://twitter.com/rishit_dagli) and [Muhammad Sakib Khan Inan](https://twitter.com/Sakibinan).
+
+The [Nigerian Songs](https://www.kaggle.com/sootersaalu/nigerian-songs-spotify) dataset was sourced from Kaggle, as scraped from Spotify.
+
+在创建本课程时,以下 K-Means 示例提供了帮助,包括这个 [鸢尾花探索](https://www.kaggle.com/bburns/iris-exploration-pca-k-means-and-gmm-clustering)、这个[入门笔记本](https://www.kaggle.com/prashant111/k-means-clustering-with-python),以及这个[假设的 NGO 示例](https://www.kaggle.com/ankandash/pca-k-means-clustering-hierarchical-clustering)。
+
+---
+
+**免责声明**:
+本文档使用AI翻译服务 [Co-op Translator](https://github.com/Azure/co-op-translator) 进行翻译。尽管我们努力确保翻译的准确性,但请注意,自动翻译可能包含错误或不准确之处。原始语言的文档应被视为权威来源。对于关键信息,建议使用专业人工翻译。我们不对因使用此翻译而产生的任何误解或误读承担责任。
\ No newline at end of file
diff --git a/translations/zh-CN/6-NLP/1-Introduction-to-NLP/README.md b/translations/zh-CN/6-NLP/1-Introduction-to-NLP/README.md
new file mode 100644
index 000000000..5ae742a3c
--- /dev/null
+++ b/translations/zh-CN/6-NLP/1-Introduction-to-NLP/README.md
@@ -0,0 +1,170 @@
+# 自然语言处理简介
+
+本课程涵盖了*自然语言处理*(NLP)的简要历史和重要概念,这是*计算语言学*的一个分支领域。
+
+## [课前测验](https://ff-quizzes.netlify.app/en/ml/)
+
+## 简介
+
+NLP是机器学习应用最广泛的领域之一,并已被用于生产软件中。
+
+✅ 你能想到每天使用的软件中可能嵌入了NLP吗?比如你经常使用的文字处理程序或手机应用?
+
+你将学习以下内容:
+
+- **语言的概念**。了解语言的发展以及主要研究领域。
+- **定义和概念**。你还将学习计算机如何处理文本的定义和概念,包括解析、语法以及识别名词和动词。本课程中有一些编码任务,并引入了几个重要概念,这些概念将在后续课程中学习如何编写代码。
+
+## 计算语言学
+
+计算语言学是一个研究和开发领域,已有数十年的历史,研究计算机如何与语言协作,甚至理解、翻译和与语言进行交流。自然语言处理(NLP)是一个相关领域,专注于计算机如何处理“自然”或人类语言。
+
+### 示例 - 手机语音输入
+
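+If you have ever dictated to your phone instead of typing, or asked a virtual assistant a question, your speech was converted into text and then processed, or *parsed*, from the language you spoke. The detected keywords were then processed into a format that the phone or assistant could understand and act on.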
+
+
+> 真正的语言理解很难!图片来源:[Jen Looper](https://twitter.com/jenlooper)
+
+### 这项技术是如何实现的?
+
+这项技术的实现是因为有人编写了一个计算机程序来完成这些任务。几十年前,一些科幻作家预测人们将主要通过语音与计算机交流,而计算机将始终准确理解他们的意思。然而,事实证明这是一个比许多人想象的更难的问题。尽管今天对这个问题的理解已经大大提高,但在实现“完美”的自然语言处理以理解句子的意义时仍然存在重大挑战。尤其是在理解幽默或检测句子中的情感(如讽刺)时,这个问题尤为困难。
+
+此时,你可能会回忆起学校课堂上老师讲解句子语法部分的情景。在一些国家,学生会专门学习语法和语言学,而在许多国家,这些主题是语言学习的一部分:小学学习母语(学习阅读和写作),高中可能学习第二语言。如果你不擅长区分名词和动词或副词和形容词,不用担心!
+
+如果你对区分*一般现在时*和*现在进行时*感到困难,你并不孤单。这对许多人来说是一个挑战,即使是母语使用者。好消息是,计算机非常擅长应用正式规则,你将学习编写代码来像人类一样*解析*句子。更大的挑战是理解句子的*意义*和*情感*。
+
+## 前置知识
+
+本课程的主要前置知识是能够阅读和理解本课程的语言。本课程没有数学问题或需要解决的方程。虽然课程的原作者是用英语编写的,但它也被翻译成其他语言,因此你可能正在阅读翻译版本。本课程中有一些例子使用了多种语言(用于比较不同语言的语法规则)。这些例子*没有*被翻译,但解释性文本是翻译过的,因此意义应该是清晰的。
+
+对于编码任务,你将使用Python,示例使用的是Python 3.8。
+
+在本节中,你将需要并使用以下内容:
+
+- **Python 3理解能力**。理解Python 3编程语言,本课程使用输入、循环、文件读取、数组。
+- **Visual Studio Code + 扩展**。我们将使用Visual Studio Code及其Python扩展。你也可以使用自己选择的Python IDE。
+- **TextBlob**。 [TextBlob](https://github.com/sloria/TextBlob) 是一个简化的Python文本处理库。按照TextBlob网站上的说明将其安装到你的系统中(同时安装语料库,如下所示):
+
+ ```bash
+ pip install -U textblob
+ python -m textblob.download_corpora
+ ```
+
+> 💡 提示:你可以直接在VS Code环境中运行Python。查看[文档](https://code.visualstudio.com/docs/languages/python?WT.mc_id=academic-77952-leestott)了解更多信息。
+
+## 与机器对话
+
+让计算机理解人类语言的历史可以追溯到几十年前,最早考虑自然语言处理的科学家之一是*艾伦·图灵*。
+
+### 图灵测试
+
+当图灵在20世纪50年代研究*人工智能*时,他提出了一个对话测试:通过打字交流,让人类和计算机进行对话,而人类无法确定自己是在与另一个人还是计算机交流。
+
+如果在一定时间的对话后,人类无法判断回答是来自计算机还是人类,那么是否可以说计算机在“思考”?
+
+### 灵感来源 - 模仿游戏
+
+这个想法来源于一个叫*模仿游戏*的派对游戏,游戏中一个审问者独自待在一个房间里,任务是判断另一个房间里的两个人分别是男性还是女性。审问者可以发送纸条,并试图提出问题,通过书面回答来揭示神秘人物的性别。当然,另一个房间里的玩家会试图通过回答问题来误导或迷惑审问者,同时也要表现得像是在诚实回答。
+
+### 开发Eliza
+
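+In the 1960s, an MIT scientist called *Joseph Weizenbaum* developed [*Eliza*](https://wikipedia.org/wiki/ELIZA), a computer "therapist" that would ask the human questions and give the appearance of understanding their answers. However, while Eliza could parse a sentence and identify certain grammatical constructs and keywords so as to give a reasonable answer, it could not be said to *understand* the sentence. If Eliza was presented with a sentence in the format "**I am** sad", it might rearrange and substitute words in the sentence to form the response "How long have **you been** sad".
+
+This gave the impression that Eliza understood the statement and was asking a follow-on question, whereas in reality it was changing the tense and adding some words. If Eliza could not identify a keyword, it would instead give a random response that could apply to many different statements. For example, if a user wrote "**You are** a bicycle", it might respond with "How long have **I been** a bicycle?" instead of a more reasoned response.
+
+[Chatting with Eliza](https://youtu.be/RMK9AphfLco "Chatting with Eliza")
+
+> 🎥 Follow the link above for a video about the original ELIZA program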
+
+> 注意:如果你有ACM账户,可以阅读1966年发表的[Eliza](https://cacm.acm.org/magazines/1966/1/13317-elizaa-computer-program-for-the-study-of-natural-language-communication-between-man-and-machine/abstract)原始描述。或者,可以在[wikipedia](https://wikipedia.org/wiki/ELIZA)上了解Eliza。
+
+## 练习 - 编写一个基础对话机器人
+
+一个对话机器人,比如Eliza,是一个能够引导用户输入并表现出理解和智能回应的程序。与Eliza不同,我们的机器人不会有多个规则来表现出智能对话。相反,我们的机器人只有一个功能,即通过随机回应来保持对话,这些回应可能适用于几乎任何简单对话。
+
+### 计划
+
+构建对话机器人的步骤:
+
+1. 打印说明,告知用户如何与机器人互动
+2. 开始一个循环
+ 1. 接收用户输入
+ 2. 如果用户要求退出,则退出
+ 3. 处理用户输入并确定回应(在本例中,回应是从可能的通用回应列表中随机选择)
+ 4. 打印回应
+3. 返回步骤2
+
+### 构建机器人
+
+接下来我们来创建机器人。首先定义一些短语。
+
+1. 使用以下随机回应在Python中创建这个机器人:
+
+ ```python
+ random_responses = ["That is quite interesting, please tell me more.",
+ "I see. Do go on.",
+ "Why do you say that?",
+ "Funny weather we've been having, isn't it?",
+ "Let's change the subject.",
+ "Did you catch the game last night?"]
+ ```
+
+ 以下是一些示例输出(用户输入以`>`开头的行):
+
+ ```output
+ Hello, I am Marvin, the simple robot.
+ You can end this conversation at any time by typing 'bye'
+ After typing each answer, press 'enter'
+ How are you today?
+ > I am good thanks
+ That is quite interesting, please tell me more.
+ > today I went for a walk
+ Did you catch the game last night?
+ > I did, but my team lost
+ Funny weather we've been having, isn't it?
+ > yes but I hope next week is better
+ Let's change the subject.
+ > ok, lets talk about music
+ Why do you say that?
+ > because I like music!
+ Why do you say that?
+ > bye
+ It was nice talking to you, goodbye!
+ ```
+
+ 任务的一个可能解决方案在[这里](https://github.com/microsoft/ML-For-Beginners/blob/main/6-NLP/1-Introduction-to-NLP/solution/bot.py)
+
+ ✅ 停下来思考
+
+ 1. 你认为随机回应能否“欺骗”某人,让他们认为机器人真的理解了他们?
+ 2. 机器人需要哪些功能才能更有效?
+ 3. 如果一个机器人真的能“理解”句子的意义,它是否需要“记住”对话中前几句的意义?
+
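
The plan above could be sketched roughly as follows. This is only one possible shape, not the linked solution: the `run_bot` function and its `read_line`, `write_line`, and `rng` parameters are hypothetical names introduced here so that the loop can be driven without a real terminal.

```python
import random

# Generic replies taken from the lesson; they fit almost any statement.
RANDOM_RESPONSES = [
    "That is quite interesting, please tell me more.",
    "I see. Do go on.",
    "Why do you say that?",
    "Funny weather we've been having, isn't it?",
    "Let's change the subject.",
    "Did you catch the game last night?",
]


def run_bot(read_line=input, write_line=print, rng=random):
    """Run the chat loop until the user types 'bye'.

    read_line, write_line and rng are injectable (an assumption of this
    sketch) so the loop can be exercised with scripted input.
    """
    # Step 1: print the instructions.
    write_line("Hello, I am Marvin, the simple robot.")
    write_line("You can end this conversation at any time by typing 'bye'")
    write_line("After typing each answer, press 'enter'")
    write_line("How are you today?")

    # Step 2: loop until the user asks to exit.
    while True:
        user_input = read_line("> ")
        if user_input.strip().lower() == "bye":
            break
        # The "processing" step is just a random pick from the list.
        write_line(rng.choice(RANDOM_RESPONSES))

    write_line("It was nice talking to you, goodbye!")
```

Calling `run_bot()` with no arguments starts an interactive session. Passing a seeded `random.Random` as `rng` makes the replies repeatable, which can help when experimenting.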
+---
+
+## 🚀挑战
+
+选择上述“停下来思考”中的一个元素,尝试用代码实现它,或者用伪代码在纸上写出解决方案。
+
+在下一节课中,你将学习其他解析自然语言和机器学习的方法。
+
+## [课后测验](https://ff-quizzes.netlify.app/en/ml/)
+
+## 复习与自学
+
+查看以下参考资料,作为进一步阅读的机会。
+
+### 参考资料
+
+1. Schubert, Lenhart, "Computational Linguistics", *The Stanford Encyclopedia of Philosophy* (Spring 2020 Edition), Edward N. Zalta (ed.), URL = .
+2. Princeton University "About WordNet." [WordNet](https://wordnet.princeton.edu/). Princeton University. 2010.
+
+## 作业
+
+[寻找一个机器人](assignment.md)
+
+---
+
+**免责声明**:
+本文档使用AI翻译服务[Co-op Translator](https://github.com/Azure/co-op-translator)进行翻译。尽管我们努力确保翻译的准确性,但请注意,自动翻译可能包含错误或不准确之处。原始语言的文档应被视为权威来源。对于关键信息,建议使用专业人工翻译。我们不对因使用此翻译而产生的任何误解或误读承担责任。
\ No newline at end of file
diff --git a/translations/zh-CN/6-NLP/1-Introduction-to-NLP/assignment.md b/translations/zh-CN/6-NLP/1-Introduction-to-NLP/assignment.md
new file mode 100644
index 000000000..a7435831a
--- /dev/null
+++ b/translations/zh-CN/6-NLP/1-Introduction-to-NLP/assignment.md
@@ -0,0 +1,16 @@
+# 寻找一个机器人
+
+## 说明
+
+机器人无处不在。你的任务是:找到一个并与它互动!你可以在网站上、银行应用程序中,或者通过电话找到它,例如,当你拨打金融服务公司的电话咨询或查询账户信息时。分析这个机器人,看看你是否能让它困惑。如果你能让机器人困惑,为什么会发生这种情况?写一篇简短的文章,描述你的体验。
+
+## 评分标准
+
+| 标准 | 优秀 | 合格 | 需要改进 |
+| -------- | -------------------------------------------------------------------------------------------------------- | ---------------------------------------- | --------------------- |
+| | 撰写了一整页文章,解释了假定的机器人架构并概述了与它的互动体验 | 文章不完整或研究不充分 | 未提交文章 |
+
+---
+
+**免责声明**:
+本文档使用AI翻译服务[Co-op Translator](https://github.com/Azure/co-op-translator)进行翻译。尽管我们努力确保准确性,但请注意,自动翻译可能包含错误或不准确之处。应以原始语言的文档作为权威来源。对于关键信息,建议使用专业人工翻译。对于因使用本翻译而引起的任何误解或误读,我们概不负责。
\ No newline at end of file
diff --git a/translations/zh-CN/6-NLP/2-Tasks/README.md b/translations/zh-CN/6-NLP/2-Tasks/README.md
new file mode 100644
index 000000000..d1644e674
--- /dev/null
+++ b/translations/zh-CN/6-NLP/2-Tasks/README.md
@@ -0,0 +1,219 @@
+# 常见的自然语言处理任务和技术
+
+对于大多数*自然语言处理*任务,需要将待处理的文本分解、分析,并将结果存储或与规则和数据集进行交叉引用。这些任务使程序员能够推导出文本中的_意义_、_意图_或仅仅是_词语和术语的频率_。
+
+## [课前测验](https://ff-quizzes.netlify.app/en/ml/)
+
+让我们来探索处理文本时常用的技术。这些技术结合机器学习,可以帮助你高效地分析大量文本。然而,在将机器学习应用于这些任务之前,我们需要了解自然语言处理专家可能遇到的问题。
+
+## 自然语言处理的常见任务
+
+分析文本有多种方法。通过执行不同的任务,你可以理解文本并得出结论。这些任务通常按顺序进行。
+
+### 分词
+
+大多数自然语言处理算法的第一步可能是将文本分解为词或标记。虽然这听起来很简单,但考虑到标点符号以及不同语言的词和句子的分隔符,这可能会变得复杂。你可能需要使用多种方法来确定分界点。
+
+
+> 从**傲慢与偏见**中分词的示例。信息图由 [Jen Looper](https://twitter.com/jenlooper) 制作
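
To make the tokenization step concrete, here is a minimal sketch using only the standard library. The regular expression is an assumption that suits simple English text with straight apostrophes; real tokenizers (for example those in NLTK) handle many more cases.

```python
import re

# A word (optionally with internal apostrophes, e.g. "isn't") or a single
# punctuation mark. This pattern is an assumption for English-like text.
TOKEN_PATTERN = re.compile(r"\w+(?:'\w+)*|[^\w\s]")


def tokenize(text):
    """Split text into word and punctuation tokens."""
    return TOKEN_PATTERN.findall(text)


tokens = tokenize("It is a truth universally acknowledged, isn't it?")
print(tokens)
```

Punctuation becomes its own token here, so a later step can decide whether to keep or discard it.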
+
+### 嵌入
+
+[词嵌入](https://wikipedia.org/wiki/Word_embedding)是一种将文本数据转换为数值的方式。嵌入的方式使得具有相似意义或经常一起使用的词汇聚集在一起。
+
+
+> “我对你的神经非常尊重,它们是我的老朋友。” - **傲慢与偏见**中的一句话的词嵌入。信息图由 [Jen Looper](https://twitter.com/jenlooper) 制作
+
+✅ 尝试[这个有趣的工具](https://projector.tensorflow.org/)来实验词嵌入。点击一个词可以显示类似词的聚类,例如“toy”与“disney”、“lego”、“playstation”和“console”聚类在一起。
+
+### 解析与词性标注
+
+每个被分词的词都可以标注为词性,例如名词、动词或形容词。句子`the quick red fox jumped over the lazy brown dog`可能会被词性标注为:fox = 名词,jumped = 动词。
+
+
+
+> **傲慢与偏见**中的一句话解析示例。信息图由 [Jen Looper](https://twitter.com/jenlooper) 制作
+
+解析是识别句子中哪些词彼此相关,例如`the quick red fox jumped`是一个形容词-名词-动词序列,与`lazy brown dog`序列分开。
+
+### 词和短语频率
+
+分析大量文本时,一个有用的步骤是构建一个字典,记录每个感兴趣的词或短语及其出现的频率。短语`the quick red fox jumped over the lazy brown dog`中,词`the`的频率为2。
+
+让我们看一个示例文本,统计词频。拉迪亚德·吉卜林的诗《胜利者》中有以下诗句:
+
+```output
+What the moral? Who rides may read.
+When the night is thick and the tracks are blind
+A friend at a pinch is a friend, indeed,
+But a fool to wait for the laggard behind.
+Down to Gehenna or up to the Throne,
+He travels the fastest who travels alone.
+```
+
+由于短语频率可以根据需要区分大小写,短语`a friend`的频率为2,`the`的频率为6,`travels`的频率为2。
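
The counts quoted above can be reproduced with a short standard-library sketch. The helper names `word_frequencies` and `phrase_frequency` are made up for this illustration, and matching is case-insensitive by assumption.

```python
import re
from collections import Counter

POEM = """What the moral? Who rides may read.
When the night is thick and the tracks are blind
A friend at a pinch is a friend, indeed,
But a fool to wait for the laggard behind.
Down to Gehenna or up to the Throne,
He travels the fastest who travels alone."""


def words_of(text):
    """Lower-case the text and return its words in order."""
    return re.findall(r"[a-z']+", text.lower())


def word_frequencies(text):
    """Count case-insensitive occurrences of every word."""
    return Counter(words_of(text))


def phrase_frequency(text, phrase):
    """Count case-insensitive occurrences of a multi-word phrase."""
    words = words_of(text)
    target = phrase.lower().split()
    n = len(target)
    return sum(1 for i in range(len(words) - n + 1) if words[i:i + n] == target)


freq = word_frequencies(POEM)
print(freq["the"], freq["travels"], phrase_frequency(POEM, "a friend"))  # 6 2 2
```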
+
+### N-grams
+
+文本可以分解为固定长度的词序列,例如单词(unigram)、两个词(bigram)、三个词(trigram)或任意数量的词(n-grams)。
+
+例如,`the quick red fox jumped over the lazy brown dog`的n-gram长度为2,生成以下n-grams:
+
+1. the quick
+2. quick red
+3. red fox
+4. fox jumped
+5. jumped over
+6. over the
+7. the lazy
+8. lazy brown
+9. brown dog
+
+可以将其想象为一个滑动窗口在句子上移动。以下是长度为3的n-grams,每个句子中的n-gram用加粗表示:
+
+1. **the quick red** fox jumped over the lazy brown dog
+2. the **quick red fox** jumped over the lazy brown dog
+3. the quick **red fox jumped** over the lazy brown dog
+4. the quick red **fox jumped over** the lazy brown dog
+5. the quick red fox **jumped over the** lazy brown dog
+6. the quick red fox jumped **over the lazy** brown dog
+7. the quick red fox jumped over **the lazy brown** dog
+8. the quick red fox jumped over the **lazy brown dog**
+
+
+
+> N-gram值为3:信息图由 [Jen Looper](https://twitter.com/jenlooper) 制作
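
The sliding window described above can be sketched in a few lines. The `ngrams` helper is a hypothetical name, and splitting on whitespace is a simplifying assumption.

```python
def ngrams(text, n):
    """Return every run of n consecutive words as a string."""
    words = text.split()
    return [" ".join(words[i:i + n]) for i in range(len(words) - n + 1)]


sentence = "the quick red fox jumped over the lazy brown dog"
bigrams = ngrams(sentence, 2)
trigrams = ngrams(sentence, 3)
print(len(bigrams), len(trigrams))  # 9 8
```

A sentence of ten words yields nine bigrams and eight trigrams, matching the two lists in this section.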
+
+### 名词短语提取
+
+在大多数句子中,有一个名词是句子的主语或宾语。在英语中,通常可以通过前面的`a`、`an`或`the`来识别。通过“提取名词短语”来识别句子的主语或宾语是自然语言处理中理解句子意义的常见任务。
+
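+✅ In the sentence "I cannot fix on the hour, or the spot, or the look or the words, which laid the foundation. It is too long ago. I was in the middle before I knew that I had begun.", can you identify the noun phrases?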
+
+在句子`the quick red fox jumped over the lazy brown dog`中,有两个名词短语:**quick red fox**和**lazy brown dog**。
+
+### 情感分析
+
+可以分析句子或文本的情感,即其*积极性*或*消极性*。情感通过*极性*和*客观性/主观性*来衡量。极性范围从-1.0到1.0(消极到积极),客观性范围从0.0到1.0(最客观到最主观)。
+
+✅ 稍后你会学习使用机器学习确定情感的不同方法,但一种方法是由人工专家将词和短语分类为积极或消极,并将该模型应用于文本以计算极性分数。你能看到这种方法在某些情况下有效,而在其他情况下效果较差吗?
+
+### 词形变化
+
+词形变化使你能够获取一个词的单数或复数形式。
+
+### 词形还原
+
+*词形还原*是指获取一组词的词根或主词,例如*flew*、*flies*、*flying*的词形还原为动词*fly*。
+
+还有一些对自然语言处理研究人员非常有用的数据库,例如:
+
+### WordNet
+
+[WordNet](https://wordnet.princeton.edu/)是一个包含词汇、同义词、反义词以及许多其他细节的数据库,涵盖多种语言中的每个词汇。在构建翻译、拼写检查器或任何类型的语言工具时,它非常有用。
+
+## 自然语言处理库
+
+幸运的是,你不需要自己构建所有这些技术,因为有许多优秀的Python库可以让非自然语言处理或机器学习专家的开发者更容易使用。在接下来的课程中会有更多示例,但这里你将学习一些有用的示例来帮助你完成下一项任务。
+
+### 练习 - 使用`TextBlob`库
+
+让我们使用一个名为TextBlob的库,它包含处理这些任务的有用API。TextBlob“基于[NLTK](https://nltk.org)和[pattern](https://github.com/clips/pattern),并与它们很好地协作。”它的API中嵌入了大量机器学习功能。
+
+> 注意:推荐给有经验的Python开发者的TextBlob[快速入门指南](https://textblob.readthedocs.io/en/dev/quickstart.html#quickstart)
+
+在尝试识别*名词短语*时,TextBlob提供了几种提取器选项来找到名词短语。
+
+1. 看看`ConllExtractor`。
+
+ ```python
+ from textblob import TextBlob
+ from textblob.np_extractors import ConllExtractor
+ # import and create a Conll extractor to use later
+ extractor = ConllExtractor()
+
+ # later when you need a noun phrase extractor:
+ user_input = input("> ")
+ user_input_blob = TextBlob(user_input, np_extractor=extractor) # note non-default extractor specified
+ np = user_input_blob.noun_phrases
+ ```
+
+ > 这里发生了什么?[ConllExtractor](https://textblob.readthedocs.io/en/dev/api_reference.html?highlight=Conll#textblob.en.np_extractors.ConllExtractor)是“一个使用ConLL-2000训练语料库进行块解析的名词短语提取器。”ConLL-2000指的是2000年计算自然语言学习会议。每年会议都会举办一个研讨会来解决一个棘手的自然语言处理问题,2000年的主题是名词块解析。一个模型在《华尔街日报》上进行了训练,“使用第15-18节作为训练数据(211727个标记),第20节作为测试数据(47377个标记)”。你可以查看使用的程序[这里](https://www.clips.uantwerpen.be/conll2000/chunking/)以及[结果](https://ifarm.nl/erikt/research/np-chunking.html)。
+
+### 挑战 - 使用自然语言处理改进你的机器人
+
+在上一课中,你构建了一个非常简单的问答机器人。现在,你将通过分析用户输入的情感并打印出匹配情感的响应,使Marvin更加富有同情心。你还需要识别一个`noun_phrase`并围绕它提出更多问题。
+
+构建更好的对话机器人的步骤:
+
+1. 打印说明,指导用户如何与机器人互动
+2. 开始循环
+ 1. 接收用户输入
+ 2. 如果用户要求退出,则退出
+ 3. 处理用户输入并确定适当的情感响应
+ 4. 如果在情感中检测到名词短语,将其复数化并围绕该主题提出更多问题
+ 5. 打印响应
+3. 返回步骤2
+
+以下是使用TextBlob确定情感的代码片段。注意,这里只有四种*情感响应梯度*(如果你愿意,可以增加更多):
+
+```python
+if user_input_blob.polarity <= -0.5:
+ response = "Oh dear, that sounds bad. "
+elif user_input_blob.polarity <= 0:
+ response = "Hmm, that's not great. "
+elif user_input_blob.polarity <= 0.5:
+ response = "Well, that sounds positive. "
+elif user_input_blob.polarity <= 1:
+ response = "Wow, that sounds great. "
+```
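
As a sketch of how the polarity thresholds and a detected noun phrase might be combined into one reply, the functions below take plain values rather than a `TextBlob` object. The names `sentiment_prefix`, `naive_plural`, and `build_response` are hypothetical, and the pluralization is deliberately crude.

```python
def sentiment_prefix(polarity):
    """Map a polarity in [-1.0, 1.0] to one of four canned openings."""
    if polarity <= -0.5:
        return "Oh dear, that sounds bad. "
    if polarity <= 0:
        return "Hmm, that's not great. "
    if polarity <= 0.5:
        return "Well, that sounds positive. "
    return "Wow, that sounds great. "


def naive_plural(noun_phrase):
    """Hypothetical helper: append 's' unless the phrase already ends in one."""
    return noun_phrase if noun_phrase.endswith("s") else noun_phrase + "s"


def build_response(polarity, noun_phrases):
    """Combine the sentiment opening with a follow-up question."""
    response = sentiment_prefix(polarity)
    if noun_phrases:
        # Ask about the first detected noun phrase, pluralized.
        response += "Can you tell me more about " + naive_plural(noun_phrases[0]) + "?"
    else:
        response += "Can you tell me more?"
    return response


print(build_response(0.5, ["lovely cat"]))
print(build_response(-0.2, []))
```

Unlike the snippet above, this version treats any polarity greater than 0.5 as the most positive band, so `response` is always defined.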
+
+以下是一些示例输出以供参考(用户输入以`>`开头的行):
+
+```output
+Hello, I am Marvin, the friendly robot.
+You can end this conversation at any time by typing 'bye'
+After typing each answer, press 'enter'
+How are you today?
+> I am ok
+Well, that sounds positive. Can you tell me more?
+> I went for a walk and saw a lovely cat
+Well, that sounds positive. Can you tell me more about lovely cats?
+> cats are the best. But I also have a cool dog
+Wow, that sounds great. Can you tell me more about cool dogs?
+> I have an old hounddog but he is sick
+Hmm, that's not great. Can you tell me more about old hounddogs?
+> bye
+It was nice talking to you, goodbye!
+```
+
+任务的一个可能解决方案在[这里](https://github.com/microsoft/ML-For-Beginners/blob/main/6-NLP/2-Tasks/solution/bot.py)
+
+✅ 知识检查
+
+1. 你认为这些富有同情心的响应能否“欺骗”某人,让他们认为机器人真的理解他们?
+2. 识别名词短语是否让机器人更“可信”?
+3. 为什么从句子中提取“名词短语”是一件有用的事情?
+
+---
+
+实现上述知识检查中的机器人并测试它。它能否欺骗你的朋友?你能让你的机器人更“可信”吗?
+
+## 🚀挑战
+
+尝试实现上述知识检查中的任务并测试机器人。它能否欺骗你的朋友?你能让你的机器人更“可信”吗?
+
+## [课后测验](https://ff-quizzes.netlify.app/en/ml/)
+
+## 复习与自学
+
+在接下来的几节课中,你将学习更多关于情感分析的内容。通过阅读像[KDNuggets](https://www.kdnuggets.com/tag/nlp)上的文章来研究这一有趣的技术。
+
+## 作业
+
+[让机器人回复](assignment.md)
+
+---
+
+**免责声明**:
+本文档使用AI翻译服务[Co-op Translator](https://github.com/Azure/co-op-translator)进行翻译。尽管我们努力确保准确性,但请注意,自动翻译可能包含错误或不准确之处。应以原始语言的文档作为权威来源。对于关键信息,建议使用专业人工翻译。因使用本翻译而导致的任何误解或误读,我们概不负责。
\ No newline at end of file
diff --git a/translations/zh-CN/6-NLP/2-Tasks/assignment.md b/translations/zh-CN/6-NLP/2-Tasks/assignment.md
new file mode 100644
index 000000000..6921a6b69
--- /dev/null
+++ b/translations/zh-CN/6-NLP/2-Tasks/assignment.md
@@ -0,0 +1,16 @@
+# 让机器人回应
+
+## 说明
+
+在之前的课程中,你编写了一个基础的聊天机器人。这个机器人会随机回答,直到你说“bye”。你能让它的回答不那么随机,并在你说特定内容(比如“为什么”或“怎么”)时触发特定回答吗?思考一下,机器学习如何让这种工作变得更自动化,同时扩展你的机器人。你可以使用 NLTK 或 TextBlob 库来简化任务。
+
+## 评分标准
+
+| 标准 | 卓越表现 | 合格表现 | 需要改进 |
+| -------- | ------------------------------------------ | -------------------------------------------- | ---------------------- |
+| | 提供了一个新的 bot.py 文件并进行了文档记录 | 提供了一个新的 bot 文件,但存在一些问题 | 未提供文件 |
+
+---
+
+**免责声明**:
+本文档使用AI翻译服务[Co-op Translator](https://github.com/Azure/co-op-translator)进行翻译。尽管我们努力确保准确性,但请注意,自动翻译可能包含错误或不准确之处。应以原始语言的文档作为权威来源。对于关键信息,建议使用专业人工翻译。因使用本翻译而导致的任何误解或误读,我们概不负责。
\ No newline at end of file
diff --git a/translations/zh-CN/6-NLP/3-Translation-Sentiment/README.md b/translations/zh-CN/6-NLP/3-Translation-Sentiment/README.md
new file mode 100644
index 000000000..a197169ca
--- /dev/null
+++ b/translations/zh-CN/6-NLP/3-Translation-Sentiment/README.md
@@ -0,0 +1,191 @@
+# 使用机器学习进行翻译和情感分析
+
+在之前的课程中,你学习了如何使用 `TextBlob` 构建一个基础的机器人。`TextBlob` 是一个库,它在幕后嵌入了机器学习技术,用于执行基本的自然语言处理任务,例如名词短语提取。计算语言学中的另一个重要挑战是准确地将一个语言的句子翻译成另一种语言。
+
+## [课前测验](https://ff-quizzes.netlify.app/en/ml/)
+
+翻译是一个非常困难的问题,因为世界上有成千上万种语言,每种语言都有非常不同的语法规则。一种方法是将一种语言(例如英语)的正式语法规则转换为一种与语言无关的结构,然后通过转换回另一种语言来完成翻译。这种方法的步骤如下:
+
+1. **识别**。识别或标记输入语言中的单词,例如名词、动词等。
+2. **创建翻译**。以目标语言的格式直接翻译每个单词。
+
+### 示例句子:英语到爱尔兰语
+
+在“英语”中,句子 _I feel happy_ 是三个单词,顺序为:
+
+- **主语** (I)
+- **动词** (feel)
+- **形容词** (happy)
+
+然而,在“爱尔兰语”中,同样的句子有非常不同的语法结构——像“happy”或“sad”这样的情感被表达为“在你身上”。
+
+英语短语 `I feel happy` 在爱尔兰语中是 `Tá athas orm`。一个*字面*翻译是 `Happy is upon me`。
+
+一个讲爱尔兰语的人翻译成英语时会说 `I feel happy`,而不是 `Happy is upon me`,因为他们理解句子的含义,即使单词和句子结构不同。
+
+在爱尔兰语中,这句话的正式顺序是:
+
+- **动词** (Tá 或 is)
+- **形容词** (athas 或 happy)
+- **主语** (orm 或 upon me)
+
+## 翻译
+
+一个简单的翻译程序可能只翻译单词,而忽略句子结构。
+
+✅ 如果你作为成年人学习了第二(或第三甚至更多)语言,你可能一开始会在脑海中用母语思考,将一个概念逐字翻译成第二语言,然后说出你的翻译。这类似于简单翻译计算机程序的工作方式。要达到流利程度,重要的是要超越这个阶段!
+
+简单翻译会导致糟糕(有时甚至是搞笑)的误译:`I feel happy` 字面翻译成爱尔兰语是 `Mise bhraitheann athas`。这意味着(字面上)`me feel happy`,但这不是一个有效的爱尔兰语句子。尽管英语和爱尔兰语是两个邻近岛屿上使用的语言,但它们是非常不同的语言,语法结构也不同。
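
A word-for-word translator of the kind described here takes only a few lines, which is exactly why it fails. The glossary below contains just the three word pairs from the example above, and `naive_translate` is a hypothetical name.

```python
# Word-for-word glossary built only from the lesson's example.
EN_TO_GA = {"i": "Mise", "feel": "bhraitheann", "happy": "athas"}


def naive_translate(sentence, glossary):
    """Replace each word independently; unknown words are left unchanged."""
    return " ".join(glossary.get(word.lower(), word) for word in sentence.split())


result = naive_translate("I feel happy", EN_TO_GA)
print(result)  # Mise bhraitheann athas
```

The output keeps English word order, so it never arrives at the idiomatic `Tá athas orm`.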
+
+> 你可以观看一些关于爱尔兰语言传统的视频,例如 [这个](https://www.youtube.com/watch?v=mRIaLSdRMMs)
+
+### 机器学习方法
+
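+So far, you have learned about the formal-rules approach to natural language processing. Another approach is to ignore the meaning of the words and *instead use machine learning to detect patterns*. This can work for translation if you have a lot of text (a *corpus*) or texts (*corpora*) in both the source and target languages.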
+
+例如,考虑《傲慢与偏见》的情况,这是一本由简·奥斯汀于1813年写的著名英语小说。如果你查阅这本书的英语版本和人类翻译的*法语*版本,你可以检测到一种语言中的短语在另一种语言中被*习惯性地*翻译。这就是你接下来要做的。
+
+例如,当英语短语 `I have no money` 被字面翻译成法语时,它可能变成 `Je n'ai pas de monnaie`。“Monnaie” 是一个棘手的法语“假同源词”,因为“money”和“monnaie”并不是同义词。一个人类可能会做出更好的翻译,即 `Je n'ai pas d'argent`,因为它更好地传达了你没有钱的意思(而不是“零钱”,这是“monnaie”的意思)。
+
+
+
+> 图片由 [Jen Looper](https://twitter.com/jenlooper) 提供
+
+如果一个机器学习模型有足够的人工翻译来构建模型,它可以通过识别之前由精通两种语言的专家翻译的文本中的常见模式来提高翻译的准确性。
+
+### 练习 - 翻译
+
+你可以使用 `TextBlob` 来翻译句子。试试《傲慢与偏见》的著名第一句:
+
+```python
+from textblob import TextBlob
+
+blob = TextBlob(
+ "It is a truth universally acknowledged, that a single man in possession of a good fortune, must be in want of a wife!"
+)
+print(blob.translate(to="fr"))
+
+```
+
+`TextBlob` 的翻译效果相当不错:“C'est une vérité universellement reconnue, qu'un homme célibataire en possession d'une bonne fortune doit avoir besoin d'une femme!”。
+
+可以说,`TextBlob` 的翻译实际上比1932年由 V. Leconte 和 Ch. Pressoir 翻译的法语版本更精确:
+
+“C'est une vérité universelle qu'un célibataire pourvu d'une belle fortune doit avoir envie de se marier, et, si peu que l'on sache de son sentiment à cet égard, lorsqu'il arrive dans une nouvelle résidence, cette idée est si bien fixée dans l'esprit de ses voisins qu'ils le considèrent sur-le-champ comme la propriété légitime de l'une ou l'autre de leurs filles.”
+
+在这种情况下,由机器学习支持的翻译比人类翻译更好,因为后者为了“清晰”而不必要地在原作者的文字中添加了额外的内容。
+
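+> What is going on here, and why is `TextBlob` so good at translation? Behind the scenes it uses Google Translate, a sophisticated AI able to parse millions of phrases to predict the best strings for the task at hand. Nothing is done manually here, and you need an internet connection to use `blob.translate`. Note that `translate` was deprecated in TextBlob 0.16.0 and appears to have been removed in later releases, so this call may only work with an older version of the library.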
+
+✅ 尝试更多句子。机器学习翻译和人工翻译哪个更好?在哪些情况下?
+
+## 情感分析
+
+机器学习在情感分析领域也表现得非常出色。一种非机器学习的方法是识别“积极”和“消极”的单词和短语。然后,给定一段新的文本,计算积极、消极和中性单词的总值,以确定整体情感。
+
+这种方法很容易被欺骗,就像你在 Marvin 任务中看到的那样——句子 `Great, that was a wonderful waste of time, I'm glad we are lost on this dark road` 是一个讽刺性的消极情感句子,但简单的算法会检测到“great”、“wonderful”、“glad”是积极的,“waste”、“lost”和“dark”是消极的。整体情感被这些矛盾的单词所影响。
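
The word-counting approach, and the way sarcasm defeats it, can be shown with a tiny sketch. The two word sets contain only the words named above, and `lexicon_score` is a hypothetical helper.

```python
import re

# Hand-labelled words, taken from the example sentence in the lesson.
POSITIVE = {"great", "wonderful", "glad"}
NEGATIVE = {"waste", "lost", "dark"}


def lexicon_score(text):
    """Return (number of positive words) minus (number of negative words)."""
    words = re.findall(r"[a-z']+", text.lower())
    positives = sum(word in POSITIVE for word in words)
    negatives = sum(word in NEGATIVE for word in words)
    return positives - negatives


sarcastic = "Great, that was a wonderful waste of time, I'm glad we are lost on this dark road"
print(lexicon_score(sarcastic))  # 0
```

Three positive and three negative words cancel out, so the clearly negative sentence scores as neutral.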
+
+✅ 停下来想一想,作为人类说话者,我们如何表达讽刺。语调的变化起到了很大的作用。试着用不同的方式说“Well, that film was awesome”,看看你的声音如何传达意义。
+
+### 机器学习方法
+
+机器学习方法是手动收集消极和积极的文本——例如推文、电影评论,或者任何带有评分*和*书面意见的内容。然后可以将 NLP 技术应用于意见和评分,从而发现模式(例如,积极的电影评论中“奥斯卡级”这个短语出现的频率比消极电影评论中高,或者积极的餐厅评论中“美食”出现的频率比“恶心”高)。
+
+> ⚖️ **示例**:如果你在一个政治家的办公室工作,并且有一项新的法律正在讨论,选民可能会写邮件支持或反对这项新法律。假设你的任务是阅读这些邮件并将它们分为两类:*支持*和*反对*。如果邮件很多,你可能会因为试图阅读所有邮件而感到不堪重负。如果有一个机器人可以阅读所有邮件,理解它们并告诉你每封邮件属于哪个类别,那不是很好吗?
+>
+> 一种实现方法是使用机器学习。你可以用一部分*反对*邮件和一部分*支持*邮件来训练模型。模型会倾向于将某些短语和单词与反对方或支持方关联起来,*但它不会理解任何内容*,只会知道某些单词和模式更可能出现在反对或支持邮件中。你可以用一些未用于训练模型的邮件进行测试,看看它是否得出了与你相同的结论。然后,一旦你对模型的准确性感到满意,你就可以处理未来的邮件,而无需逐一阅读。
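
One way to picture the email example is a very small Naive Bayes classifier written with the standard library. This is a sketch under stated assumptions, not how a production system would be built: the four training emails are invented for illustration, and the `train` and `classify` helpers are hypothetical names.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lower-case the text and return its words."""
    return re.findall(r"[a-z']+", text.lower())


def train(labelled_emails):
    """Count words per label from a list of (text, label) pairs."""
    word_counts = {}
    label_counts = Counter()
    for text, label in labelled_emails:
        label_counts[label] += 1
        word_counts.setdefault(label, Counter()).update(tokenize(text))
    return word_counts, label_counts


def classify(text, model):
    """Return the label with the highest Naive Bayes log score."""
    word_counts, label_counts = model
    vocabulary = set()
    for counts in word_counts.values():
        vocabulary.update(counts)
    total_emails = sum(label_counts.values())
    best_label, best_score = None, -math.inf
    for label, counts in word_counts.items():
        # Log prior plus log likelihood of each word, with add-one smoothing.
        score = math.log(label_counts[label] / total_emails)
        total_words = sum(counts.values())
        for word in tokenize(text):
            score += math.log((counts[word] + 1) / (total_words + len(vocabulary)))
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Invented training emails, for illustration only.
TRAINING = [
    ("I support this law, it will help families", "for"),
    ("This law is a great idea and I support it", "for"),
    ("I oppose this law, it will hurt small businesses", "against"),
    ("Please vote against this terrible law", "against"),
]

model = train(TRAINING)
print(classify("I strongly support the new law", model))
print(classify("This terrible law will hurt us", model))
```

The model only tallies which words co-occur with each label. It has no notion of what "support" or "terrible" mean, which is the point made in the example above.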
+
+✅ 这个过程是否类似于你在之前课程中使用的过程?
+
+## 练习 - 情感句子
+
+情感通过*极性*从 -1 到 1 来衡量,-1 表示最消极的情感,1 表示最积极的情感。情感还通过 0 到 1 的分数来衡量客观性(0)和主观性(1)。
+
+再看一眼简·奥斯汀的《傲慢与偏见》。文本可以在 [Project Gutenberg](https://www.gutenberg.org/files/1342/1342-h/1342-h.htm) 找到。以下示例展示了一个简短的程序,它分析了书中第一句和最后一句的情感,并显示其情感极性和主观性/客观性分数。
+
+你应该使用 `TextBlob` 库(如上所述)来确定 `sentiment`(你不需要自己编写情感计算器)来完成以下任务。
+
+```python
+from textblob import TextBlob
+
+quote1 = """It is a truth universally acknowledged, that a single man in possession of a good fortune, must be in want of a wife."""
+
+quote2 = """Darcy, as well as Elizabeth, really loved them; and they were both ever sensible of the warmest gratitude towards the persons who, by bringing her into Derbyshire, had been the means of uniting them."""
+
+sentiment1 = TextBlob(quote1).sentiment
+sentiment2 = TextBlob(quote2).sentiment
+
+print(quote1 + " has a sentiment of " + str(sentiment1))
+print(quote2 + " has a sentiment of " + str(sentiment2))
+```
+
+你会看到以下输出:
+
+```output
+It is a truth universally acknowledged, that a single man in possession of a good fortune, must be in want of a wife. has a sentiment of Sentiment(polarity=0.20952380952380953, subjectivity=0.27142857142857146)
+
+Darcy, as well as Elizabeth, really loved them; and they were both ever sensible of the warmest gratitude towards the persons who, by bringing her into Derbyshire, had been the means of uniting them. has a sentiment of Sentiment(polarity=0.7, subjectivity=0.8)
+```
+
+## 挑战 - 检查情感极性
+
+你的任务是使用情感极性来确定《傲慢与偏见》中绝对积极的句子是否多于绝对消极的句子。对于此任务,你可以假设极性分数为 1 或 -1 的句子是绝对积极或消极的。
+
+**步骤:**
+
+1. 从 Project Gutenberg 下载一份《傲慢与偏见》的 [副本](https://www.gutenberg.org/files/1342/1342-h/1342-h.htm) 作为 .txt 文件。删除文件开头和结尾的元数据,仅保留原始文本。
+2. 在 Python 中打开文件并将内容提取为字符串。
+3. 使用书的字符串创建一个 TextBlob。
+4. 在循环中分析书中的每个句子:
+ 1. 如果极性为 1 或 -1,将句子存储在一个数组或列表中,分别存储积极或消极的消息。
+5. 最后,分别打印出所有积极句子和消极句子,以及它们的数量。
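
For step 1, removing the metadata by hand works, but it could also be sketched in code. The version below assumes the plain-text file marks the body with lines beginning `*** START OF` and `*** END OF`; check your downloaded copy, since the exact marker wording is an assumption here. `strip_gutenberg_boilerplate` is a hypothetical helper name.

```python
def strip_gutenberg_boilerplate(text):
    """Keep only the lines between the START and END marker lines.

    If a marker is missing, the corresponding end of the text is kept as is.
    """
    lines = text.splitlines()
    start, end = 0, len(lines)
    for index, line in enumerate(lines):
        if line.startswith("*** START OF"):
            start = index + 1
        elif line.startswith("*** END OF"):
            end = index
            break
    return "\n".join(lines[start:end]).strip()


# A tiny stand-in for the real file, for illustration only.
sample = "\n".join([
    "The Project Gutenberg eBook of Pride and Prejudice",
    "*** START OF THE PROJECT GUTENBERG EBOOK PRIDE AND PREJUDICE ***",
    "",
    "It is a truth universally acknowledged...",
    "",
    "*** END OF THE PROJECT GUTENBERG EBOOK PRIDE AND PREJUDICE ***",
    "Licence text follows.",
])
body = strip_gutenberg_boilerplate(sample)
print(body)
```

The cleaned string can then be passed to `TextBlob` as described in step 3.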
+
+这里是一个 [示例解决方案](https://github.com/microsoft/ML-For-Beginners/blob/main/6-NLP/3-Translation-Sentiment/solution/notebook.ipynb)。
+
+✅ 知识检查
+
+1. 情感是基于句子中使用的单词,但代码是否*理解*这些单词?
+2. 你认为情感极性准确吗?换句话说,你是否*同意*这些分数?
+ 1. 特别是,你是否同意或不同意以下句子的绝对**积极**极性:
+ * “What an excellent father you have, girls!” said she, when the door was shut.
+ * “Your examination of Mr. Darcy is over, I presume,” said Miss Bingley; “and pray what is the result?” “I am perfectly convinced by it that Mr. Darcy has no defect.
+ * How wonderfully these sort of things occur!
+ * I have the greatest dislike in the world to that sort of thing.
+ * Charlotte is an excellent manager, I dare say.
+ * “This is delightful indeed!
+ * I am so happy!
+ * Your idea of the ponies is delightful.
+ 2. 以下三个句子被评分为绝对积极情感,但仔细阅读后,它们并不是积极句子。为什么情感分析认为它们是积极句子?
+ * Happy shall I be, when his stay at Netherfield is over!” “I wish I could say anything to comfort you,” replied Elizabeth; “but it is wholly out of my power.
+ * If I could but see you as happy!
+ * Our distress, my dear Lizzy, is very great.
+ 3. 你是否同意或不同意以下句子的绝对**消极**极性:
+ - Everybody is disgusted with his pride.
+ - “I should like to know how he behaves among strangers.” “You shall hear then—but prepare yourself for something very dreadful.
+ - The pause was to Elizabeth’s feelings dreadful.
+ - It would be dreadful!
+
+✅ 任何简·奥斯汀的爱好者都会理解,她经常在书中批评英国摄政时期社会中更荒谬的方面。《傲慢与偏见》的主角伊丽莎白·班内特是一个敏锐的社会观察者(就像作者一样),她的语言通常充满了深意。甚至故事中的爱情对象达西先生也注意到伊丽莎白的俏皮和戏谑的语言使用:“我有幸认识你足够久,知道你偶尔会发表一些实际上并非你真实观点的意见,并从中获得极大的乐趣。”
+
+---
+
+## 🚀挑战
+
+你能通过从用户输入中提取其他特征来让 Marvin 更加出色吗?
+
+## [课后测验](https://ff-quizzes.netlify.app/en/ml/)
+
+## 复习与自学
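+
+There are many ways to extract sentiment from text. Think of the business applications that might make use of this technique, and think about how it can go wrong. Read more about sophisticated enterprise-ready systems that analyze sentiment, such as [Azure Text Analysis](https://docs.microsoft.com/azure/cognitive-services/Text-Analytics/how-tos/text-analytics-how-to-sentiment-analysis?tabs=version-3-1&WT.mc_id=academic-77952-leestott). Test some of the Pride and Prejudice sentences above and see whether it can detect nuance.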
+
+## 作业
+
+[诗意许可](assignment.md)
+
+---
+
+**免责声明**:
+本文档使用AI翻译服务[Co-op Translator](https://github.com/Azure/co-op-translator)进行翻译。尽管我们努力确保翻译的准确性,但请注意,自动翻译可能包含错误或不准确之处。应以原始语言的文档作为权威来源。对于关键信息,建议使用专业人工翻译。我们不对因使用此翻译而产生的任何误解或误读承担责任。
\ No newline at end of file
diff --git a/translations/zh-CN/6-NLP/3-Translation-Sentiment/assignment.md b/translations/zh-CN/6-NLP/3-Translation-Sentiment/assignment.md
new file mode 100644
index 000000000..8c2c6f9a9
--- /dev/null
+++ b/translations/zh-CN/6-NLP/3-Translation-Sentiment/assignment.md
@@ -0,0 +1,16 @@
+# 诗意的许可
+
+## 说明
+
+在[这个笔记本](https://www.kaggle.com/jenlooper/emily-dickinson-word-frequency)中,你可以找到超过500首艾米莉·狄金森的诗,这些诗之前已经使用 Azure 文本分析进行了情感分析。利用这个数据集,按照课程中描述的方法进行分析。一首诗的情感倾向是否与更复杂的 Azure 服务的判断一致?在你看来,为什么会一致或不一致?有没有什么让你感到意外的地方?
+
+## 评分标准
+
+| 标准 | 优秀 | 合格 | 需要改进 |
+| -------- | -------------------------------------------------------------------------- | ------------------------------------------------------- | ------------------------ |
+| | 提交了一个包含对作者样本输出的深入分析的笔记本 | 笔记本不完整或未进行分析 | 未提交笔记本 |
+
+---
+
+**免责声明**:
+本文档使用AI翻译服务[Co-op Translator](https://github.com/Azure/co-op-translator)进行翻译。尽管我们努力确保翻译的准确性,但请注意,自动翻译可能包含错误或不准确之处。原始语言的文档应被视为权威来源。对于关键信息,建议使用专业人工翻译。我们不对因使用此翻译而产生的任何误解或误读承担责任。
\ No newline at end of file
diff --git a/translations/zh-CN/6-NLP/3-Translation-Sentiment/solution/Julia/README.md b/translations/zh-CN/6-NLP/3-Translation-Sentiment/solution/Julia/README.md
new file mode 100644
index 000000000..e2fb46232
--- /dev/null
+++ b/translations/zh-CN/6-NLP/3-Translation-Sentiment/solution/Julia/README.md
@@ -0,0 +1,6 @@
+
+
+---
+
+**免责声明**:
+本文档使用AI翻译服务[Co-op Translator](https://github.com/Azure/co-op-translator)进行翻译。尽管我们努力确保准确性,但请注意,自动翻译可能包含错误或不准确之处。应以原始语言的文档作为权威来源。对于关键信息,建议使用专业人工翻译。对于因使用本翻译而引起的任何误解或误读,我们概不负责。
\ No newline at end of file
diff --git a/translations/zh-CN/6-NLP/3-Translation-Sentiment/solution/R/README.md b/translations/zh-CN/6-NLP/3-Translation-Sentiment/solution/R/README.md
new file mode 100644
index 000000000..ba3fc1469
--- /dev/null
+++ b/translations/zh-CN/6-NLP/3-Translation-Sentiment/solution/R/README.md
@@ -0,0 +1,6 @@
+这是一个临时占位符
+
+---
+
+**免责声明**:
+本文档使用AI翻译服务 [Co-op Translator](https://github.com/Azure/co-op-translator) 进行翻译。尽管我们努力确保翻译的准确性,但请注意,自动翻译可能包含错误或不准确之处。原始语言的文档应被视为权威来源。对于关键信息,建议使用专业人工翻译。我们不对因使用此翻译而产生的任何误解或误读承担责任。
\ No newline at end of file
diff --git a/translations/zh-CN/6-NLP/3-Translation-Sentiment/solution/notebook.ipynb b/translations/zh-CN/6-NLP/3-Translation-Sentiment/solution/notebook.ipynb
new file mode 100644
index 000000000..581dd4d69
--- /dev/null
+++ b/translations/zh-CN/6-NLP/3-Translation-Sentiment/solution/notebook.ipynb
@@ -0,0 +1,100 @@
+{
+ "metadata": {
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": 3
+ },
+ "orig_nbformat": 4,
+ "coopTranslator": {
+ "original_hash": "27de2abc0235ebd22080fc8f1107454d",
+ "translation_date": "2025-09-03T20:58:06+00:00",
+ "source_file": "6-NLP/3-Translation-Sentiment/solution/notebook.ipynb",
+ "language_code": "zh"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2,
+ "cells": [
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from textblob import TextBlob\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# You should download the book text, clean it, and import it here\n",
+ "with open(\"pride.txt\", encoding=\"utf8\") as f:\n",
+ " file_contents = f.read()\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "book_pride = TextBlob(file_contents)\n",
+ "positive_sentiment_sentences = []\n",
+ "negative_sentiment_sentences = []"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "for sentence in book_pride.sentences:\n",
+ " if sentence.sentiment.polarity == 1:\n",
+ " positive_sentiment_sentences.append(sentence)\n",
+ " if sentence.sentiment.polarity == -1:\n",
+ " negative_sentiment_sentences.append(sentence)\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "print(\"The \" + str(len(positive_sentiment_sentences)) + \" most positive sentences:\")\n",
+ "for sentence in positive_sentiment_sentences:\n",
+ " print(\"+ \" + str(sentence.replace(\"\\n\", \"\").replace(\" \", \" \")))\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "print(\"The \" + str(len(negative_sentiment_sentences)) + \" most negative sentences:\")\n",
+ "for sentence in negative_sentiment_sentences:\n",
+ " print(\"- \" + str(sentence.replace(\"\\n\", \"\").replace(\" \", \" \")))"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "\n---\n\n**免责声明**: \n本文档使用AI翻译服务[Co-op Translator](https://github.com/Azure/co-op-translator)进行翻译。尽管我们努力确保翻译的准确性,但请注意,自动翻译可能包含错误或不准确之处。原始语言的文档应被视为权威来源。对于关键信息,建议使用专业人工翻译。我们不对因使用此翻译而产生的任何误解或误读承担责任。\n"
+ ]
+ }
+ ]
+}
\ No newline at end of file
diff --git a/translations/zh-CN/6-NLP/4-Hotel-Reviews-1/README.md b/translations/zh-CN/6-NLP/4-Hotel-Reviews-1/README.md
new file mode 100644
index 000000000..037a97521
--- /dev/null
+++ b/translations/zh-CN/6-NLP/4-Hotel-Reviews-1/README.md
@@ -0,0 +1,408 @@
+# 使用酒店评论进行情感分析 - 数据处理
+
+在本节中,您将使用前几课中的技术对一个大型数据集进行一些探索性数据分析。一旦您对各列的实用性有了良好的理解,您将学习:
+
+- 如何删除不必要的列
+- 如何基于现有列计算一些新数据
+- 如何保存处理后的数据集以用于最终挑战
+
+## [课前测验](https://ff-quizzes.netlify.app/en/ml/)
+
+### 简介
+
+到目前为止,您已经了解了文本数据与数值数据类型的不同。如果文本是由人类书写或口述的,它可以被分析以发现模式和频率、情感和意义。本课将带您进入一个真实的数据集并面对一个真实的挑战:**[欧洲515K酒店评论数据](https://www.kaggle.com/jiashenliu/515k-hotel-reviews-data-in-europe)**,并包含一个[CC0: 公共领域许可](https://creativecommons.org/publicdomain/zero/1.0/)。该数据集是从Booking.com的公共来源抓取的,数据集的创建者是Jiashen Liu。
+
+### 准备工作
+
+您需要:
+
+* 能够使用Python 3运行.ipynb笔记本
+* pandas
+* NLTK,[您需要在本地安装](https://www.nltk.org/install.html)
+* 数据集可从Kaggle下载:[欧洲515K酒店评论数据](https://www.kaggle.com/jiashenliu/515k-hotel-reviews-data-in-europe)。解压后约230 MB。将其下载到与这些NLP课程相关的根目录`/data`文件夹中。
+
+## 探索性数据分析
+
+本次挑战假设您正在构建一个使用情感分析和客人评论评分的酒店推荐机器人。您将使用的数据集包括6个城市中1493家不同酒店的评论。
+
+使用Python、酒店评论数据集和NLTK的情感分析,您可以发现:
+
+* 评论中最常用的词汇和短语是什么?
+* 描述酒店的官方*标签*是否与评论评分相关(例如,某个酒店的*家庭带小孩*标签是否比*独行旅客*标签有更多负面评论,这可能表明该酒店更适合*独行旅客*?)
+* NLTK的情感评分是否与酒店评论者的数值评分“吻合”?
+
+#### 数据集
+
+让我们探索您已下载并保存到本地的数据集。使用VS Code或Excel等编辑器打开文件。
+
+数据集的标题如下:
+
+*Hotel_Address, Additional_Number_of_Scoring, Review_Date, Average_Score, Hotel_Name, Reviewer_Nationality, Negative_Review, Review_Total_Negative_Word_Counts, Total_Number_of_Reviews, Positive_Review, Review_Total_Positive_Word_Counts, Total_Number_of_Reviews_Reviewer_Has_Given, Reviewer_Score, Tags, days_since_review, lat, lng*
+
+以下是按类别分组的标题,可能更容易检查:
+##### 酒店相关列
+
+* `Hotel_Name`, `Hotel_Address`, `lat`(纬度), `lng`(经度)
+ * 使用*lat*和*lng*,您可以使用Python绘制一张地图,显示酒店位置(或许可以根据正面和负面评论进行颜色编码)
+ * Hotel_Address对我们来说似乎没有明显的用处,我们可能会将其替换为国家名称以便更容易排序和搜索
+
+**酒店元评论列**
+
+* `Average_Score`
+ * 根据数据集创建者的说法,此列是*酒店的平均评分,基于过去一年内的最新评论计算*。这似乎是一种不寻常的评分计算方式,但由于数据是抓取的,我们暂时接受这一点。
+
+ ✅ 根据此数据中的其他列,您能想到另一种计算平均评分的方法吗?
+
+* `Total_Number_of_Reviews`
+ * 此酒店收到的评论总数——尚不清楚(需要编写一些代码)这是否指数据集中的评论。
+* `Additional_Number_of_Scoring`
+ * 表示评论者给出了评分但没有写正面或负面评论
+
+**评论相关列**
+
+- `Reviewer_Score`
+ - 这是一个数值,最多有1位小数,范围在2.5到10之间
+ - 未解释为何最低评分为2.5
+- `Negative_Review`
+ - 如果评论者未写任何内容,此字段将显示“**No Negative**”
+ - 请注意,评论者可能会在负面评论列中写正面评论(例如,“这家酒店没有任何不好的地方”)
+- `Review_Total_Negative_Word_Counts`
+ - 较高的负面词汇计数表明评分较低(无需检查情感性)
+- `Positive_Review`
+ - 如果评论者未写任何内容,此字段将显示“**No Positive**”
+ - 请注意,评论者可能会在正面评论列中写负面评论(例如,“这家酒店完全没有任何好的地方”)
+- `Review_Total_Positive_Word_Counts`
+ - 较高的正面词汇计数表明评分较高(无需检查情感性)
+- `Review_Date`和`days_since_review`
+ - 可以对评论应用新鲜度或陈旧度的衡量(较旧的评论可能不如较新的评论准确,因为酒店管理可能发生了变化,或者进行了装修,或者新增了泳池等)
+- `Tags`
+ - 这些是评论者可能选择的简短描述,用于描述他们的客人类型(例如独行或家庭)、房间类型、入住时长以及评论提交方式。
+ - 不幸的是,使用这些标签存在问题,请查看下面讨论其实用性的部分
+
+**评论者相关列**
+
+- `Total_Number_of_Reviews_Reviewer_Has_Given`
+ - 这可能是推荐模型中的一个因素,例如,如果您可以确定评论数量较多的评论者(有数百条评论)更倾向于给出负面而非正面评论。然而,任何特定评论的评论者并未通过唯一代码标识,因此无法链接到一组评论。有30位评论者有100条或更多评论,但很难看出这如何帮助推荐模型。
+- `Reviewer_Nationality`
+ - 有些人可能认为某些国籍更倾向于给出正面或负面评论,因为有某种国家倾向。构建这样的轶事观点到模型中时要小心。这些是国家(有时是种族)刻板印象,每位评论者都是根据自己的经历写评论的个体。评论可能受到许多因素的影响,例如他们之前的酒店住宿经历、旅行距离以及个人性格。认为评论评分是由国籍决定的很难证明。
+
+##### 示例
+
+| 平均评分 | 评论总数 | 评论者评分 | 负面评论 | 正面评论 | 标签 |
+| -------- | -------- | ---------- | -------- | -------- | ---- |
+| 7.8 | 1945 | 2.5 | 这家酒店目前不是酒店而是一个施工现场,我在长途旅行后休息时被早晨和全天的建筑噪音折磨。人们整天在相邻房间工作,例如使用凿岩机。我要求换房,但没有安静的房间可用。更糟糕的是,我被多收了费用。我在晚上退房,因为我需要赶早班飞机,并收到了一张适当的账单。一天后,酒店未经我同意又收取了超出预订价格的费用。这是一个可怕的地方,不要惩罚自己来这里预订。 | 没有任何好处,糟糕的地方,远离这里 | 商务旅行,情侣,标准双人房,入住2晚 |
+
+如您所见,这位客人在这家酒店的入住体验非常糟糕。酒店的平均评分为7.8,有1945条评论,但这位评论者给出了2.5分,并写了115个词描述他们的负面体验。如果他们在正面评论列中未写任何内容,您可能会推测没有任何正面内容,但他们写了7个词警告其他人。如果我们仅仅统计词汇数量而不是词汇的意义或情感,我们可能会对评论者的意图有一个偏差的看法。奇怪的是,他们的评分为2.5令人困惑,因为如果酒店体验如此糟糕,为什么还给了任何分数?仔细调查数据集,您会发现最低可能评分是2.5,而不是0。最高可能评分是10。
+
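+##### Tags
+
+As mentioned above, at first glance the idea of using `Tags` to categorize the data makes sense. Unfortunately these tags are not standardized: in one hotel the options might be *Single room*, *Twin room*, and *Double room*, while in the next they are *Deluxe Single Room*, *Classic Queen Room*, and *Executive King Room*. These might be the same room types, but there are so many variations that the choice becomes:
+
+1. Attempt to change all terms to a single standard, which is very difficult because the conversion path is not clear in every case (e.g. *Classic single room* maps to *Single room*, but *Superior Queen Room with Courtyard Garden or City View* is much harder to map)
+
+1. Take an NLP approach and measure the frequency of certain terms like *Solo*, *Business Traveller*, or *Family with young kids* as they apply to each hotel, and factor that into the recommendation
+
+Tags are usually (but not always) a single field containing a list of 5 to 6 comma-separated values aligning to *Type of trip*, *Type of guests*, *Type of room*, *Number of nights*, and *Type of device the review was submitted on*. However, because some reviewers don't fill in each field (they might leave one blank), the values are not always in the same order.
+
+As an example, take *Type of group*. There are 1025 unique possibilities for this field in the `Tags` column, and unfortunately only some of them refer to a group (some are the type of room etc.). If you filter only the ones that mention family, the results contain many *Family room* type results. If you include the term *with*, i.e. count the *Family with* values, the results are better, with over 80,000 of the 515,000 results containing the phrase "Family with young children" or "Family with older children".
+
+This means the tags column is not completely useless to us, but it will take some work to make it useful.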
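+
+The filtering idea above can be sketched with pandas string matching. This is a minimal illustration and not the lesson's own code: the three tag strings are made-up stand-ins written in the dataset's format, and `tags` is a hypothetical name.
+
+```python
+import pandas as pd
+
+# Hypothetical sample rows in the same format as the `Tags` column
+tags = pd.Series([
+    "[' Leisure trip ', ' Family with young children ', ' Family Room ', ' Stayed 2 nights ']",
+    "[' Business trip ', ' Solo traveler ', ' Family Room ', ' Stayed 1 night ']",
+    "[' Leisure trip ', ' Family with older children ', ' Double Room ', ' Stayed 3 nights ']",
+])
+
+# Matching "Family" alone also catches the room type "Family Room"
+mentions_family = tags.str.contains("Family", regex=False)
+# Matching "Family with" keeps only the guest-group tags
+family_groups = tags.str.contains("Family with", regex=False)
+
+print(int(mentions_family.sum()), int(family_groups.sum()))
+```
+
+On the full dataset, comparing the same two masks would show how many of the "family" matches are really room types.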
+
+##### 酒店平均评分
+
+数据集中有一些奇怪或不一致的地方我无法解释,但在此列出以便您在构建模型时注意。如果您能解决,请在讨论区告诉我们!
+
+数据集有以下与平均评分和评论数量相关的列:
+
+1. Hotel_Name
+2. Additional_Number_of_Scoring
+3. Average_Score
+4. Total_Number_of_Reviews
+5. Reviewer_Score
+
+数据集中评论最多的单一酒店是*Britannia International Hotel Canary Wharf*,有4789条评论(总计515,000条)。但如果我们查看此酒店的`Total_Number_of_Reviews`值,它是9086。您可能会推测有更多评分没有评论,因此我们可能需要加上`Additional_Number_of_Scoring`列的值。该值是2682,加上4789得到7471,仍然比`Total_Number_of_Reviews`少1615。
+
+如果您查看`Average_Score`列,您可能会推测它是数据集中评论的平均值,但Kaggle的描述是“*酒店的平均评分,基于过去一年内的最新评论计算*”。这似乎不太有用,但我们可以根据数据集中的评论评分计算自己的平均值。以同一家酒店为例,给出的平均酒店评分是7.1,但计算得出的评分(数据集中评论者评分的平均值)是6.8。这很接近,但不是相同的值,我们只能猜测`Additional_Number_of_Scoring`评论中的评分将平均值提高到7.1。不幸的是,由于无法测试或证明这一假设,使用或信任`Average_Score`、`Additional_Number_of_Scoring`和`Total_Number_of_Reviews`变得困难,因为它们基于或引用了我们没有的数据。
+
+更复杂的是,评论数量第二多的酒店的计算平均评分是8.12,而数据集中的`Average_Score`是8.1。这是否正确评分是巧合还是第一家酒店存在不一致?
+
+考虑到这些酒店可能是异常值,并且可能大多数值是匹配的(但由于某些原因有些不匹配),我们将在下一步编写一个简短的程序来探索数据集中的值并确定这些值的正确使用(或不使用)。
+> 🚨 注意事项
+>
+> 在处理这个数据集时,你将编写代码从文本中计算某些内容,而无需自己阅读或分析文本。这正是自然语言处理(NLP)的核心:无需人工参与即可解读意义或情感。然而,有可能你会读到一些负面评论。我建议你不要这样做,因为没有必要。有些评论很荒谬,或者是与酒店无关的负面评论,比如“天气不好”,这是酒店甚至任何人都无法控制的事情。但有些评论也有阴暗的一面。有时负面评论可能带有种族歧视、性别歧视或年龄歧视。这种情况令人遗憾,但在从公共网站抓取的数据集中是可以预料的。一些评论者会留下让人觉得反感、不适或不安的评论。最好让代码来衡量情感,而不是自己阅读这些评论后感到不快。话虽如此,这类评论只占少数,但它们确实存在。
+## 练习 - 数据探索
+### 加载数据
+
+通过视觉检查数据已经足够了,现在你需要编写一些代码来获取答案!本节将使用 pandas 库。你的第一个任务是确保能够加载并读取 CSV 数据。pandas 库提供了一个快速的 CSV 加载器,加载结果会存储在一个 dataframe 中,就像之前的课程一样。我们加载的 CSV 文件有超过 50 万行,但只有 17 列。pandas 提供了许多强大的方法来与 dataframe 交互,包括对每一行执行操作的能力。
+
+从现在开始,这节课将包含代码片段、代码解释以及对结果的讨论。请使用提供的 _notebook.ipynb_ 文件来编写代码。
+
+让我们从加载你将使用的数据文件开始:
+
+```python
+# Load the hotel reviews from CSV
+import pandas as pd
+import time
+# importing time so the start and end time can be used to calculate file loading time
+print("Loading data file now, this could take a while depending on file size")
+start = time.time()
+# df is 'DataFrame' - make sure you downloaded the file to the data folder
+df = pd.read_csv('../../data/Hotel_Reviews.csv')
+end = time.time()
+print("Loading took " + str(round(end - start, 2)) + " seconds")
+```
+
+现在数据已经加载,我们可以对其进行一些操作。在接下来的部分中,请将这段代码保留在程序的顶部。
+
+## 数据探索
+
+在这个例子中,数据已经是*干净的*,这意味着它已经可以直接使用,并且没有其他语言的字符,这些字符可能会干扰只期望英文字符的算法。
+
+✅ 你可能需要处理一些需要初步格式化的数据,然后再应用 NLP 技术,但这次不需要。如果需要处理非英文字符,你会怎么做?
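+
+One possible starting point for the question above is simply to find which reviews contain characters outside ASCII, before deciding whether to drop, transliterate, or keep them. The sketch below uses only the standard library; the sample strings and the helper name `non_ascii_chars` are hypothetical.
+
+```python
+def non_ascii_chars(text):
+    """Return the set of characters in `text` that fall outside ASCII."""
+    return {ch for ch in text if not ch.isascii()}
+
+# Hypothetical review snippets, one of which contains accented letters
+samples = ["Great location", "Très bon hôtel", "No Negative"]
+
+# Keep only the samples that contain at least one non-ASCII character
+flagged = [s for s in samples if non_ascii_chars(s)]
+print(flagged)
+```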
+
+花点时间确保数据加载后,你可以通过代码来探索它。很容易想要直接关注 `Negative_Review` 和 `Positive_Review` 列。它们包含了自然文本,供你的 NLP 算法处理。但等等!在跳入 NLP 和情感分析之前,你应该按照下面的代码检查数据集中给出的值是否与通过 pandas 计算的值一致。
+
+## Dataframe 操作
+
+本节的第一个任务是通过编写代码检查以下断言是否正确(无需更改 dataframe)。
+
+> 就像许多编程任务一样,完成这些任务的方法有很多,但一个好的建议是尽可能简单、易懂,尤其是当你以后需要回顾这段代码时。对于 dataframe,pandas 提供了一个全面的 API,通常可以高效地完成你想要的操作。
+
+将以下问题视为编码任务,尝试在不查看答案的情况下完成它们。
+
+1. 打印出刚刚加载的 dataframe 的*形状*(即行数和列数)。
+2. 计算评论者国籍的频率统计:
+ 1. `Reviewer_Nationality` 列中有多少个不同的值?它们分别是什么?
+ 2. 数据集中最常见的评论者国籍是什么?(打印国家和评论数量)
+ 3. 接下来最常见的 10 个国籍及其频率统计是什么?
+3. 对于评论最多的前 10 个国籍,每个国籍评论最多的酒店是什么?
+4. 数据集中每个酒店的评论数量是多少?(按酒店统计频率)
+5. 数据集中每个酒店都有一个 `Average_Score` 列,但你也可以计算一个平均分(即根据数据集中每个酒店的所有评论分数计算平均值)。为 dataframe 添加一个新列,列名为 `Calc_Average_Score`,存储计算的平均分。
+6. 是否有酒店的 `Average_Score` 和 `Calc_Average_Score`(四舍五入到小数点后一位)相同?
+ 1. 尝试编写一个 Python 函数,该函数接受一个 Series(行)作为参数,比较这两个值,并在值不相等时打印消息。然后使用 `.apply()` 方法对每一行应用该函数。
+7. 计算并打印 `Negative_Review` 列中值为 "No Negative" 的行数。
+8. 计算并打印 `Positive_Review` 列中值为 "No Positive" 的行数。
+9. 计算并打印 `Positive_Review` 列中值为 "No Positive" 且 `Negative_Review` 列中值为 "No Negative" 的行数。
+
+### 代码答案
+
+1. 打印出刚刚加载的 dataframe 的*形状*(即行数和列数)
+
+ ```python
+ print("The shape of the data (rows, cols) is " + str(df.shape))
+ > The shape of the data (rows, cols) is (515738, 17)
+ ```
+
+2. 计算评论者国籍的频率统计:
+
+ 1. `Reviewer_Nationality` 列中有多少个不同的值?它们分别是什么?
+ 2. 数据集中最常见的评论者国籍是什么?(打印国家和评论数量)
+
+ ```python
+ # value_counts() creates a Series object that has index and values in this case, the country and the frequency they occur in reviewer nationality
+ nationality_freq = df["Reviewer_Nationality"].value_counts()
+ print("There are " + str(nationality_freq.size) + " different nationalities")
+ # print first and last rows of the Series. Change to nationality_freq.to_string() to print all of the data
+ print(nationality_freq)
+
+ There are 227 different nationalities
+ United Kingdom 245246
+ United States of America 35437
+ Australia 21686
+ Ireland 14827
+ United Arab Emirates 10235
+ ...
+ Comoros 1
+ Palau 1
+ Northern Mariana Islands 1
+ Cape Verde 1
+ Guinea 1
+ Name: Reviewer_Nationality, Length: 227, dtype: int64
+ ```
+
+ 3. 接下来最常见的 10 个国籍及其频率统计是什么?
+
+ ```python
+   print("The highest frequency reviewer nationality is " + str(nationality_freq.index[0]).strip() + " with " + str(nationality_freq.iloc[0]) + " reviews.")
+   # Notice there is a leading space on the values, strip() removes that for printing
+   # What are the next 10 most common nationalities and their frequencies?
+ print("The next 10 highest frequency reviewer nationalities are:")
+ print(nationality_freq[1:11].to_string())
+
+ The highest frequency reviewer nationality is United Kingdom with 245246 reviews.
+ The next 10 highest frequency reviewer nationalities are:
+ United States of America 35437
+ Australia 21686
+ Ireland 14827
+ United Arab Emirates 10235
+ Saudi Arabia 8951
+ Netherlands 8772
+ Switzerland 8678
+ Germany 7941
+ Canada 7894
+ France 7296
+ ```
+
+3. 对于评论最多的前 10 个国籍,每个国籍评论最多的酒店是什么?
+
+ ```python
+ # What was the most frequently reviewed hotel for the top 10 nationalities
+ # Normally with pandas you will avoid an explicit loop, but wanted to show creating a new dataframe using criteria (don't do this with large amounts of data because it could be very slow)
+ for nat in nationality_freq[:10].index:
+ # First, extract all the rows that match the criteria into a new dataframe
+ nat_df = df[df["Reviewer_Nationality"] == nat]
+ # Now get the hotel freq
+ freq = nat_df["Hotel_Name"].value_counts()
+      print("The most reviewed hotel for " + str(nat).strip() + " was " + str(freq.index[0]) + " with " + str(freq.iloc[0]) + " reviews.")
+
+ The most reviewed hotel for United Kingdom was Britannia International Hotel Canary Wharf with 3833 reviews.
+ The most reviewed hotel for United States of America was Hotel Esther a with 423 reviews.
+ The most reviewed hotel for Australia was Park Plaza Westminster Bridge London with 167 reviews.
+ The most reviewed hotel for Ireland was Copthorne Tara Hotel London Kensington with 239 reviews.
+ The most reviewed hotel for United Arab Emirates was Millennium Hotel London Knightsbridge with 129 reviews.
+ The most reviewed hotel for Saudi Arabia was The Cumberland A Guoman Hotel with 142 reviews.
+ The most reviewed hotel for Netherlands was Jaz Amsterdam with 97 reviews.
+ The most reviewed hotel for Switzerland was Hotel Da Vinci with 97 reviews.
+ The most reviewed hotel for Germany was Hotel Da Vinci with 86 reviews.
+ The most reviewed hotel for Canada was St James Court A Taj Hotel London with 61 reviews.
+ ```
+
+4. 数据集中每个酒店的评论数量是多少?(按酒店统计频率)
+
+ ```python
+   # First create a new dataframe based on the old one, removing the unneeded columns
+ hotel_freq_df = df.drop(["Hotel_Address", "Additional_Number_of_Scoring", "Review_Date", "Average_Score", "Reviewer_Nationality", "Negative_Review", "Review_Total_Negative_Word_Counts", "Positive_Review", "Review_Total_Positive_Word_Counts", "Total_Number_of_Reviews_Reviewer_Has_Given", "Reviewer_Score", "Tags", "days_since_review", "lat", "lng"], axis = 1)
+
+ # Group the rows by Hotel_Name, count them and put the result in a new column Total_Reviews_Found
+   hotel_freq_df['Total_Reviews_Found'] = hotel_freq_df.groupby('Hotel_Name')['Total_Number_of_Reviews'].transform('count')
+
+ # Get rid of all the duplicated rows
+ hotel_freq_df = hotel_freq_df.drop_duplicates(subset = ["Hotel_Name"])
+ display(hotel_freq_df)
+ ```
+ | Hotel_Name | Total_Number_of_Reviews | Total_Reviews_Found |
+ | :----------------------------------------: | :---------------------: | :-----------------: |
+ | Britannia International Hotel Canary Wharf | 9086 | 4789 |
+ | Park Plaza Westminster Bridge London | 12158 | 4169 |
+ | Copthorne Tara Hotel London Kensington | 7105 | 3578 |
+ | ... | ... | ... |
+ | Mercure Paris Porte d Orleans | 110 | 10 |
+ | Hotel Wagner | 135 | 10 |
+ | Hotel Gallitzinberg | 173 | 8 |
+
+ 你可能会注意到,*数据集中统计的*结果与 `Total_Number_of_Reviews` 的值不匹配。目前尚不清楚数据集中该值是否表示酒店的总评论数,但并未全部被抓取,或者是其他计算方式。由于这种不确定性,`Total_Number_of_Reviews` 并未用于模型中。
+
+5. 数据集中每个酒店都有一个 `Average_Score` 列,但你也可以计算一个平均分(即根据数据集中每个酒店的所有评论分数计算平均值)。为 dataframe 添加一个新列,列名为 `Calc_Average_Score`,存储计算的平均分。打印出 `Hotel_Name`、`Average_Score` 和 `Calc_Average_Score` 列。
+
+ ```python
+ # define a function that takes a row and performs some calculation with it
+ def get_difference_review_avg(row):
+ return row["Average_Score"] - row["Calc_Average_Score"]
+
+ # 'mean' is mathematical word for 'average'
+ df['Calc_Average_Score'] = round(df.groupby('Hotel_Name').Reviewer_Score.transform('mean'), 1)
+
+ # Add a new column with the difference between the two average scores
+ df["Average_Score_Difference"] = df.apply(get_difference_review_avg, axis = 1)
+
+ # Create a df without all the duplicates of Hotel_Name (so only 1 row per hotel)
+ review_scores_df = df.drop_duplicates(subset = ["Hotel_Name"])
+
+ # Sort the dataframe to find the lowest and highest average score difference
+ review_scores_df = review_scores_df.sort_values(by=["Average_Score_Difference"])
+
+ display(review_scores_df[["Average_Score_Difference", "Average_Score", "Calc_Average_Score", "Hotel_Name"]])
+ ```
+
+ 你可能还会疑惑 `Average_Score` 的值为何有时与计算的平均分不同。由于我们无法知道为什么有些值匹配,而其他值存在差异,在这种情况下,最安全的做法是使用评论分数自行计算平均分。不过,差异通常非常小,以下是数据集中平均分与计算平均分差异最大的酒店:
+
+ | Average_Score_Difference | Average_Score | Calc_Average_Score | Hotel_Name |
+ | :----------------------: | :-----------: | :----------------: | ------------------------------------------: |
+ | -0.8 | 7.7 | 8.5 | Best Western Hotel Astoria |
+ | -0.7 | 8.8 | 9.5 | Hotel Stendhal Place Vend me Paris MGallery |
+ | -0.7 | 7.5 | 8.2 | Mercure Paris Porte d Orleans |
+ | -0.7 | 7.9 | 8.6 | Renaissance Paris Vendome Hotel |
+ | -0.5 | 7.0 | 7.5 | Hotel Royal Elys es |
+ | ... | ... | ... | ... |
+ | 0.7 | 7.5 | 6.8 | Mercure Paris Op ra Faubourg Montmartre |
+ | 0.8 | 7.1 | 6.3 | Holiday Inn Paris Montparnasse Pasteur |
+ | 0.9 | 6.8 | 5.9 | Villa Eugenie |
+ | 0.9 | 8.6 | 7.7 | MARQUIS Faubourg St Honor Relais Ch teaux |
+ | 1.3 | 7.2 | 5.9 | Kube Hotel Ice Bar |
+
+ 只有 1 家酒店的分数差异大于 1,这意味着我们可以忽略这些差异,使用计算的平均分。
+
+6. 计算并打印 `Negative_Review` 列中值为 "No Negative" 的行数。
+
+7. 计算并打印 `Positive_Review` 列中值为 "No Positive" 的行数。
+
+8. 计算并打印 `Positive_Review` 列中值为 "No Positive" 且 `Negative_Review` 列中值为 "No Negative" 的行数。
+
+ ```python
+ # with lambdas:
+ start = time.time()
+ no_negative_reviews = df.apply(lambda x: True if x['Negative_Review'] == "No Negative" else False , axis=1)
+ print("Number of No Negative reviews: " + str(len(no_negative_reviews[no_negative_reviews == True].index)))
+
+ no_positive_reviews = df.apply(lambda x: True if x['Positive_Review'] == "No Positive" else False , axis=1)
+ print("Number of No Positive reviews: " + str(len(no_positive_reviews[no_positive_reviews == True].index)))
+
+ both_no_reviews = df.apply(lambda x: True if x['Negative_Review'] == "No Negative" and x['Positive_Review'] == "No Positive" else False , axis=1)
+ print("Number of both No Negative and No Positive reviews: " + str(len(both_no_reviews[both_no_reviews == True].index)))
+ end = time.time()
+ print("Lambdas took " + str(round(end - start, 2)) + " seconds")
+
+ Number of No Negative reviews: 127890
+ Number of No Positive reviews: 35946
+ Number of both No Negative and No Positive reviews: 127
+ Lambdas took 9.64 seconds
+ ```
+
+## 另一种方法
+
+另一种方法是不用 Lambdas,而是使用 sum 来统计行数:
+
+ ```python
+ # without lambdas (using a mixture of notations to show you can use both)
+ start = time.time()
+ no_negative_reviews = sum(df.Negative_Review == "No Negative")
+ print("Number of No Negative reviews: " + str(no_negative_reviews))
+
+ no_positive_reviews = sum(df["Positive_Review"] == "No Positive")
+ print("Number of No Positive reviews: " + str(no_positive_reviews))
+
+ both_no_reviews = sum((df.Negative_Review == "No Negative") & (df.Positive_Review == "No Positive"))
+ print("Number of both No Negative and No Positive reviews: " + str(both_no_reviews))
+
+ end = time.time()
+ print("Sum took " + str(round(end - start, 2)) + " seconds")
+
+ Number of No Negative reviews: 127890
+ Number of No Positive reviews: 35946
+ Number of both No Negative and No Positive reviews: 127
+ Sum took 0.19 seconds
+ ```
+
+ 你可能注意到,有 127 行的 `Negative_Review` 和 `Positive_Review` 列分别为 "No Negative" 和 "No Positive"。这意味着评论者给酒店打了一个数字分数,但没有写任何正面或负面的评论。幸运的是,这只是很少的一部分数据(127 行占 515738 行的 0.02%),所以它可能不会对我们的模型或结果产生显著影响。不过,你可能没有预料到一个评论数据集中会有没有评论内容的行,因此值得探索数据以发现类似的情况。
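+
+The question list above also asks for a function that compares `Average_Score` with `Calc_Average_Score` row by row, which the answers do not show. The following is one hedged sketch of how that could look. It runs on a hypothetical three-row frame (`df_small`, with invented hotel names and scores) in place of the real data, and `scores_match` is an assumed helper name.
+
+```python
+import pandas as pd
+
+# Hypothetical miniature of the dataframe after `Calc_Average_Score` was added
+df_small = pd.DataFrame({
+    "Hotel_Name": ["Hotel A", "Hotel B", "Hotel C"],
+    "Average_Score": [7.1, 8.1, 9.0],
+    "Calc_Average_Score": [6.8, 8.1, 9.0],
+})
+
+def scores_match(row):
+    """Return True when both averages agree to one decimal place; print a note otherwise."""
+    same = round(row["Average_Score"], 1) == round(row["Calc_Average_Score"], 1)
+    if not same:
+        print(row["Hotel_Name"] + ": " + str(row["Average_Score"]) + " vs " + str(row["Calc_Average_Score"]))
+    return same
+
+# axis=1 passes each row to the function as a Series
+matches = df_small.apply(scores_match, axis=1)
+print(str(int(matches.sum())) + " of " + str(len(matches)) + " hotels have matching averages")
+```
+
+On the real dataframe it would be sensible to apply this to the one-row-per-hotel frame (`review_scores_df`), since a row-wise `apply` over all 515,738 rows is slow, as the lambda timing above suggests.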
+
+现在你已经探索了数据集,在下一节课中,你将过滤数据并添加一些情感分析。
+
+---
+## 🚀挑战
+
+正如我们在之前的课程中看到的,这节课展示了理解数据及其特性在执行操作之前是多么重要。特别是基于文本的数据需要仔细检查。深入挖掘各种以文本为主的数据集,看看是否能发现可能引入偏差或导致情感倾斜的地方。
+
+## [课后测验](https://ff-quizzes.netlify.app/en/ml/)
+
+## 复习与自学
+
+参加 [NLP 学习路径](https://docs.microsoft.com/learn/paths/explore-natural-language-processing/?WT.mc_id=academic-77952-leestott),了解构建语音和文本模型时可以尝试的工具。
+
+## 作业
+
+[NLTK](assignment.md)
+
+---
+
+**免责声明**:
+本文档使用AI翻译服务[Co-op Translator](https://github.com/Azure/co-op-translator)进行翻译。尽管我们努力确保准确性,但请注意,自动翻译可能包含错误或不准确之处。应以原始语言的文档作为权威来源。对于关键信息,建议使用专业人工翻译。对于因使用本翻译而引起的任何误解或误读,我们概不负责。
\ No newline at end of file
diff --git a/translations/zh-CN/6-NLP/4-Hotel-Reviews-1/assignment.md b/translations/zh-CN/6-NLP/4-Hotel-Reviews-1/assignment.md
new file mode 100644
index 000000000..2490acf81
--- /dev/null
+++ b/translations/zh-CN/6-NLP/4-Hotel-Reviews-1/assignment.md
@@ -0,0 +1,10 @@
+# NLTK
+
+## 使用说明
+
+NLTK 是一个广受欢迎的库,用于计算语言学和自然语言处理。请利用这个机会阅读 '[NLTK 书籍](https://www.nltk.org/book/)' 并尝试其中的练习。在这个不计分的作业中,您将更深入地了解这个库。
+
+---
+
+**免责声明**:
+本文档使用AI翻译服务[Co-op Translator](https://github.com/Azure/co-op-translator)进行翻译。尽管我们努力确保准确性,但请注意,自动翻译可能包含错误或不准确之处。应以原始语言的文档作为权威来源。对于关键信息,建议使用专业人工翻译。因使用本翻译而导致的任何误解或误读,我们概不负责。
\ No newline at end of file
diff --git a/translations/zh-CN/6-NLP/4-Hotel-Reviews-1/notebook.ipynb b/translations/zh-CN/6-NLP/4-Hotel-Reviews-1/notebook.ipynb
new file mode 100644
index 000000000..e69de29bb
diff --git a/translations/zh-CN/6-NLP/4-Hotel-Reviews-1/solution/Julia/README.md b/translations/zh-CN/6-NLP/4-Hotel-Reviews-1/solution/Julia/README.md
new file mode 100644
index 000000000..c0d51b129
--- /dev/null
+++ b/translations/zh-CN/6-NLP/4-Hotel-Reviews-1/solution/Julia/README.md
@@ -0,0 +1,6 @@
+
+
+---
+
+**免责声明**:
+本文档使用AI翻译服务[Co-op Translator](https://github.com/Azure/co-op-translator)进行翻译。尽管我们努力确保准确性,但请注意,自动翻译可能包含错误或不准确之处。应以原始语言的文档作为权威来源。对于关键信息,建议使用专业人工翻译。因使用本翻译而导致的任何误解或误读,我们概不负责。
\ No newline at end of file
diff --git a/translations/zh-CN/6-NLP/4-Hotel-Reviews-1/solution/R/README.md b/translations/zh-CN/6-NLP/4-Hotel-Reviews-1/solution/R/README.md
new file mode 100644
index 000000000..e939b3c66
--- /dev/null
+++ b/translations/zh-CN/6-NLP/4-Hotel-Reviews-1/solution/R/README.md
@@ -0,0 +1,6 @@
+这是一个临时占位符
+
+---
+
+**免责声明**:
+本文档使用AI翻译服务[Co-op Translator](https://github.com/Azure/co-op-translator)进行翻译。尽管我们努力确保翻译的准确性,但请注意,自动翻译可能包含错误或不准确之处。原始语言的文档应被视为权威来源。对于关键信息,建议使用专业人工翻译。我们不对因使用此翻译而产生的任何误解或误读承担责任。
\ No newline at end of file
diff --git a/translations/zh-CN/6-NLP/4-Hotel-Reviews-1/solution/notebook.ipynb b/translations/zh-CN/6-NLP/4-Hotel-Reviews-1/solution/notebook.ipynb
new file mode 100644
index 000000000..113161615
--- /dev/null
+++ b/translations/zh-CN/6-NLP/4-Hotel-Reviews-1/solution/notebook.ipynb
@@ -0,0 +1,174 @@
+{
+ "metadata": {
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": 3
+ },
+ "orig_nbformat": 4,
+ "coopTranslator": {
+ "original_hash": "2d05e7db439376aa824f4b387f8324ca",
+ "translation_date": "2025-09-03T20:57:48+00:00",
+ "source_file": "6-NLP/4-Hotel-Reviews-1/solution/notebook.ipynb",
+ "language_code": "zh"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2,
+ "cells": [
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# EDA\n",
+ "import pandas as pd\n",
+ "import time"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def get_difference_review_avg(row):\n",
+ " return row[\"Average_Score\"] - row[\"Calc_Average_Score\"]"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Load the hotel reviews from CSV\n",
+ "print(\"Loading data file now, this could take a while depending on file size\")\n",
+ "start = time.time()\n",
+ "df = pd.read_csv('../../data/Hotel_Reviews.csv')\n",
+ "end = time.time()\n",
+ "print(\"Loading took \" + str(round(end - start, 2)) + \" seconds\")\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# What shape is the data (rows, columns)?\n",
+ "print(\"The shape of the data (rows, cols) is \" + str(df.shape))\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# value_counts() creates a Series object that has index and values\n",
+ "# in this case, the country and the frequency they occur in reviewer nationality\n",
+ "nationality_freq = df[\"Reviewer_Nationality\"].value_counts()\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# What reviewer nationality is the most common in the dataset?\n",
+    "print(\"The highest frequency reviewer nationality is \" + str(nationality_freq.index[0]).strip() + \" with \" + str(nationality_freq.iloc[0]) + \" reviews.\")\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# What is the top 10 most common nationalities and their frequencies?\n",
+ "print(\"The top 10 highest frequency reviewer nationalities are:\")\n",
+ "print(nationality_freq[0:10].to_string())\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# How many unique nationalities are there?\n",
+ "print(\"There are \" + str(nationality_freq.index.size) + \" unique nationalities in the dataset\")\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# What was the most frequently reviewed hotel for the top 10 nationalities - print the hotel and number of reviews\n",
+ "for nat in nationality_freq[:10].index:\n",
+ " # First, extract all the rows that match the criteria into a new dataframe\n",
+ " nat_df = df[df[\"Reviewer_Nationality\"] == nat] \n",
+ " # Now get the hotel freq\n",
+ " freq = nat_df[\"Hotel_Name\"].value_counts()\n",
+    "    print(\"The most reviewed hotel for \" + str(nat).strip() + \" was \" + str(freq.index[0]) + \" with \" + str(freq.iloc[0]) + \" reviews.\")\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# How many reviews are there per hotel (frequency count of hotel) and do the results match the value in `Total_Number_of_Reviews`?\n",
+    "# First create a new dataframe based on the old one, removing the unneeded columns\n",
+ "hotel_freq_df = df.drop([\"Hotel_Address\", \"Additional_Number_of_Scoring\", \"Review_Date\", \"Average_Score\", \"Reviewer_Nationality\", \"Negative_Review\", \"Review_Total_Negative_Word_Counts\", \"Positive_Review\", \"Review_Total_Positive_Word_Counts\", \"Total_Number_of_Reviews_Reviewer_Has_Given\", \"Reviewer_Score\", \"Tags\", \"days_since_review\", \"lat\", \"lng\"], axis = 1)\n",
+ "# Group the rows by Hotel_Name, count them and put the result in a new column Total_Reviews_Found\n",
+ "hotel_freq_df['Total_Reviews_Found'] = hotel_freq_df.groupby('Hotel_Name').transform('count')\n",
+ "# Get rid of all the duplicated rows\n",
+ "hotel_freq_df = hotel_freq_df.drop_duplicates(subset = [\"Hotel_Name\"])\n",
+ "print()\n",
+ "print(hotel_freq_df.to_string())\n",
+ "print(str(hotel_freq_df.shape))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# While there is an `Average_Score` for each hotel according to the dataset, \n",
+ "# you can also calculate an average score (getting the average of all reviewer scores in the dataset for each hotel)\n",
+ "# Add a new column to your dataframe with the column header `Calc_Average_Score` that contains that calculated average. \n",
+ "df['Calc_Average_Score'] = round(df.groupby('Hotel_Name').Reviewer_Score.transform('mean'), 1)\n",
+ "# Add a new column with the difference between the two average scores\n",
+ "df[\"Average_Score_Difference\"] = df.apply(get_difference_review_avg, axis = 1)\n",
+ "# Create a df without all the duplicates of Hotel_Name (so only 1 row per hotel)\n",
+ "review_scores_df = df.drop_duplicates(subset = [\"Hotel_Name\"])\n",
+ "# Sort the dataframe to find the lowest and highest average score difference\n",
+ "review_scores_df = review_scores_df.sort_values(by=[\"Average_Score_Difference\"])\n",
+ "print(review_scores_df[[\"Average_Score_Difference\", \"Average_Score\", \"Calc_Average_Score\", \"Hotel_Name\"]])\n",
+ "# Do any hotels have the same (rounded to 1 decimal place) `Average_Score` and `Calc_Average_Score`?\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "\n---\n\n**免责声明**: \n本文档使用AI翻译服务 [Co-op Translator](https://github.com/Azure/co-op-translator) 进行翻译。尽管我们努力确保翻译的准确性,但请注意,自动翻译可能包含错误或不准确之处。原始语言的文档应被视为权威来源。对于关键信息,建议使用专业人工翻译。我们不对因使用此翻译而产生的任何误解或误读承担责任。\n"
+ ]
+ }
+ ]
+}
\ No newline at end of file
diff --git a/translations/zh-CN/6-NLP/5-Hotel-Reviews-2/README.md b/translations/zh-CN/6-NLP/5-Hotel-Reviews-2/README.md
new file mode 100644
index 000000000..c212881e0
--- /dev/null
+++ b/translations/zh-CN/6-NLP/5-Hotel-Reviews-2/README.md
@@ -0,0 +1,375 @@
+# 使用酒店评论进行情感分析
+
+现在您已经详细探索了数据集,是时候筛选列并对数据集应用NLP技术,以便获得关于酒店的新见解。
+
+## [课前测验](https://ff-quizzes.netlify.app/en/ml/)
+
+### 筛选与情感分析操作
+
+正如您可能已经注意到的,数据集存在一些问题。一些列充满了无用的信息,另一些列看起来不正确。即使它们是正确的,也不清楚它们是如何计算的,您无法通过自己的计算独立验证答案。
+
+## 练习:进一步处理数据
+
+对数据进行更多清理。添加一些后续会用到的列,修改其他列中的值,并完全删除某些列。
+
+1. 初步列处理
+
+ 1. 删除 `lat` 和 `lng`
+
+ 2. 将 `Hotel_Address` 的值替换为以下值(如果地址中包含城市和国家的名称,则将其更改为仅包含城市和国家)。
+
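+      The dataset contains only the following cities and countries:
+
+      Amsterdam, Netherlands
+      Barcelona, Spain
+      London, United Kingdom
+      Milan, Italy
+      Paris, France
+      Vienna, Austria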
+
+ ```python
+ def replace_address(row):
+ if "Netherlands" in row["Hotel_Address"]:
+ return "Amsterdam, Netherlands"
+ elif "Barcelona" in row["Hotel_Address"]:
+ return "Barcelona, Spain"
+ elif "United Kingdom" in row["Hotel_Address"]:
+ return "London, United Kingdom"
+ elif "Milan" in row["Hotel_Address"]:
+ return "Milan, Italy"
+ elif "France" in row["Hotel_Address"]:
+ return "Paris, France"
+ elif "Vienna" in row["Hotel_Address"]:
+ return "Vienna, Austria"
+
+ # Replace all the addresses with a shortened, more useful form
+ df["Hotel_Address"] = df.apply(replace_address, axis = 1)
+ # The sum of the value_counts() should add up to the total number of reviews
+ print(df["Hotel_Address"].value_counts())
+ ```
+
+ 现在您可以查询国家级别的数据:
+
+ ```python
+ display(df.groupby("Hotel_Address").agg({"Hotel_Name": "nunique"}))
+ ```
+
+ | Hotel_Address | Hotel_Name |
+ | :--------------------- | :--------: |
+      | Amsterdam, Netherlands |    105     |
+      | Barcelona, Spain       |    211     |
+      | London, United Kingdom |    400     |
+      | Milan, Italy           |    162     |
+      | Paris, France          |    458     |
+      | Vienna, Austria        |    158     |
+
+2. 处理酒店元评论列
+
+ 1. 删除 `Additional_Number_of_Scoring`
+
+ 2. 将 `Total_Number_of_Reviews` 替换为数据集中该酒店实际的评论总数
+
+ 3. 用我们自己计算的分数替换 `Average_Score`
+
+ ```python
+ # Drop `Additional_Number_of_Scoring`
+ df.drop(["Additional_Number_of_Scoring"], axis = 1, inplace=True)
+ # Replace `Total_Number_of_Reviews` and `Average_Score` with our own calculated values
+      df['Total_Number_of_Reviews'] = df.groupby('Hotel_Name').Reviewer_Score.transform('count')
+ df.Average_Score = round(df.groupby('Hotel_Name').Reviewer_Score.transform('mean'), 1)
+ ```
+
+3. 处理评论列
+
+ 1. 删除 `Review_Total_Negative_Word_Counts`、`Review_Total_Positive_Word_Counts`、`Review_Date` 和 `days_since_review`
+
+ 2. 保留 `Reviewer_Score`、`Negative_Review` 和 `Positive_Review` 不变
+
+ 3. 暂时保留 `Tags`
+
+ - 我们将在下一部分对标签进行一些额外的筛选操作,然后再删除标签
+
+4. 处理评论者列
+
+ 1. 删除 `Total_Number_of_Reviews_Reviewer_Has_Given`
+
+ 2. 保留 `Reviewer_Nationality`
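+
+Steps 3 and 4 above are listed without code. A minimal sketch of the column removal, assuming the column names given in those steps, could look like the following. `df_demo` is a hypothetical one-row frame with invented values, used only so the example runs on its own.
+
+```python
+import pandas as pd
+
+# Hypothetical one-row frame holding only the columns relevant to steps 3 and 4
+df_demo = pd.DataFrame({
+    "Reviewer_Score": [8.5],
+    "Negative_Review": ["No Negative"],
+    "Positive_Review": ["Great location"],
+    "Tags": ["[' Leisure trip ', ' Couple ']"],
+    "Reviewer_Nationality": [" United Kingdom "],
+    "Review_Total_Negative_Word_Counts": [0],
+    "Review_Total_Positive_Word_Counts": [2],
+    "Review_Date": ["placeholder"],
+    "days_since_review": ["placeholder"],
+    "Total_Number_of_Reviews_Reviewer_Has_Given": [3],
+})
+
+# Columns named for removal in steps 3 and 4
+columns_to_drop = [
+    "Review_Total_Negative_Word_Counts",
+    "Review_Total_Positive_Word_Counts",
+    "Review_Date",
+    "days_since_review",
+    "Total_Number_of_Reviews_Reviewer_Has_Given",
+]
+df_demo = df_demo.drop(columns=columns_to_drop)
+
+# The review text, score, tags and nationality columns remain
+print(list(df_demo.columns))
+```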
+
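+### Tag columns
+
+The `Tags` column is problematic because it is a list stored in text form. Unfortunately the order and number of subsections in this column are not always the same. It is hard for a human to identify the correct phrases to focus on, because there are 515,000 rows and 1427 hotels, and each has slightly different options a reviewer could choose. This is where NLP shines: you can scan the text, find the most common phrases, and count them.
+
+Unfortunately, we are not interested in single words but in multi-word phrases (e.g. *Business trip*). Running a multi-word frequency distribution algorithm on that much data (6762646 words) could take an extraordinary amount of time, and without looking at the data it would seem to be a necessary expense. This is where exploratory data analysis comes in useful: because you have seen a sample of the tags, such as `[' Business trip ', ' Solo traveler ', ' Single Room ', ' Stayed 5 nights ', ' Submitted from a mobile device ']`, you can begin to ask whether it is possible to greatly reduce the processing you have to do. Luckily, it is, but first you need to follow a few steps to determine the tags of interest.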
+
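+### Filtering tags
+
+Remember that the goal of the dataset is to add sentiment and columns that will help you choose the best hotel (for yourself, or for a client asking you to build a hotel recommendation bot). You need to ask yourself whether the tags are useful in the final dataset. Here is one interpretation (if you needed the dataset for other reasons, different tags might be kept or dropped):
+
+1. The type of trip is relevant and should stay
+2. The type of guest group is important and should stay
+3. The type of room, suite, or studio that the guest stayed in is irrelevant (all hotels have basically the same rooms)
+4. The device the review was submitted on is irrelevant
+5. The number of nights the reviewer stayed *could* be relevant if you assume longer stays mean they liked the hotel more, but that is a stretch, and it is probably irrelevant
+
+In summary, **keep 2 kinds of tags and remove the others**.
+
+First, you don't want to count the tags until they are in a better format, which means removing the square brackets and quotes. You can do this in several ways, but you want the fastest, because processing a lot of data can take a long time. Luckily, pandas has an easy way to do each of these steps.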
+
+```python
+# Remove the opening and closing brackets and the outermost quotes
+df.Tags = df.Tags.str.strip("[']")
+# Replace the quote-comma-quote separators between tags with plain commas
+df.Tags = df.Tags.str.replace(" ', '", ",", regex = False)
+```
+
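+Each tag becomes something like: `Business trip, Solo traveler, Single Room, Stayed 5 nights, Submitted from a mobile device`.
+
+Next we find a problem. Some reviews (rows) have 5 values, some have 3, some have 6. This is a result of how the dataset was created, and it is hard to fix. You want a frequency count of each phrase, but the phrases appear in a different order in each review, so the count might be off and a hotel might not be assigned a tag it deserves.
+
+Instead, you can use the varying order to your advantage, because each tag is multi-word but also separated by a comma. The simplest way to do this is to create 6 temporary columns, with each tag inserted into the column corresponding to its position in the list. You can then merge the 6 columns into one big column and run the `value_counts()` method on the result. Printing that out, you will see there are 2428 unique tags. Here is a small sample: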
+
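+| Tag                            | Count  |
+| ------------------------------ | ------ |
+| Leisure trip                   | 417778 |
+| Submitted from a mobile device | 307640 |
+| Couple                         | 252294 |
+| Stayed 1 night                 | 193645 |
+| Stayed 2 nights                | 133937 |
+| Solo traveler                  | 108545 |
+| Stayed 3 nights                | 95821  |
+| Business trip                  | 82939  |
+| Group                          | 65392  |
+| Family with young children     | 61015  |
+| Stayed 4 nights                | 47817  |
+| Double Room                    | 35207  |
+| Standard Double Room           | 32248  |
+| Superior Double Room           | 31393  |
+| Family with older children     | 26349  |
+| Deluxe Double Room             | 24823  |
+| Double or Twin Room            | 22393  |
+| Stayed 5 nights                | 20845  |
+| Standard Double or Twin Room   | 17483  |
+| Classic Double Room            | 16989  |
+| Superior Double or Twin Room   | 13570  |
+| 2 rooms                        | 12393  |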
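+
+The counting step described above can also be sketched without the six temporary columns, by splitting each cleaned string on commas and flattening the result with `explode()`. This is an alternative to the approach in the text, not the lesson's own implementation, and the three sample rows are hypothetical.
+
+```python
+import pandas as pd
+
+# Hypothetical sample rows in the raw format of the `Tags` column
+tags = pd.Series([
+    "[' Leisure trip ', ' Couple ', ' Double Room ', ' Stayed 2 nights ']",
+    "[' Business trip ', ' Solo traveler ', ' Single Room ', ' Stayed 1 night ', ' Submitted from a mobile device ']",
+    "[' Leisure trip ', ' Couple ', ' Stayed 1 night ']",
+])
+
+# Same cleaning as in the lesson: drop brackets and quotes, keep comma separators
+cleaned = tags.str.strip("[']").str.replace(" ', '", ",", regex=False)
+
+# Split each row into a list, give every tag its own row, trim spaces, then count
+tag_counts = cleaned.str.split(",").explode().str.strip().value_counts()
+print(tag_counts)
+```
+
+Because every tag ends up in a single column regardless of its position, rows with 3, 5, or 6 tags are all handled the same way.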
+
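+Some of the common tags, like `Submitted from a mobile device`, are of no use to us, so it might be smart to remove them before counting phrase occurrences. Counting is such a fast operation, though, that you can leave them in and ignore them.
+
+### Removing the length of stay tags
+
+Removing these tags is step 1; it slightly reduces the total number of tags to consider. Note that you do not remove them from the dataset, you just choose not to treat them as values to count or keep in the reviews dataset.
+
+| Length of stay  | Count  |
+| --------------- | ------ |
+| Stayed 1 night  | 193645 |
+| Stayed 2 nights | 133937 |
+| Stayed 3 nights | 95821  |
+| Stayed 4 nights | 47817  |
+| Stayed 5 nights | 20845  |
+| Stayed 6 nights | 9776   |
+| Stayed 7 nights | 7399   |
+| Stayed 8 nights | 2502   |
+| Stayed 9 nights | 1293   |
+| ...             | ...    |
+
+
+There is a huge variety of rooms, suites, studios, apartments and so on. They all mean roughly the same thing and are not relevant to you, so remove them from consideration.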
+
+| 房间类型 | 计数 |
+| ---------------------------- | ----- |
+| 双人房 | 35207 |
+| 标准双人房 | 32248 |
+| 高级双人房 | 31393 |
+| 豪华双人房 | 24823 |
+| 双人或双床房 | 22393 |
+| 标准双人或双床房 | 17483 |
+| 经典双人房 | 16989 |
+| 高级双人或双床房 | 13570 |
+
+最后,令人欣喜的是(因为几乎不需要处理),您将剩下以下**有用**的标签:
+
+| 标签 | 计数 |
+| --------------------------------------------- | ------ |
+| 休闲旅行 | 417778 |
+| 夫妻 | 252294 |
+| 独自旅行者 | 108545 |
+| 商务旅行 | 82939 |
+| 团体(与朋友旅行者合并) | 67535 |
+| 带小孩的家庭 | 61015 |
+| 带大孩的家庭 | 26349 |
+| 带宠物 | 1405 |
+
+您可以认为 `与朋友旅行者` 与 `团体` 基本相同,将两者合并是合理的,如上所示。识别正确标签的代码在 [Tags notebook](https://github.com/microsoft/ML-For-Beginners/blob/main/6-NLP/5-Hotel-Reviews-2/solution/1-notebook.ipynb) 中。
+
+最后一步是为每个这些标签创建新列。然后,对于每条评论行,如果 `Tag` 列与新列之一匹配,则添加1,否则添加0。最终结果将是一个统计数据,显示有多少评论者选择了这家酒店(总体上)用于商务、休闲或带宠物入住,这在推荐酒店时是有用的信息。
+
+```python
+# Process the Tags into new columns
+# The file Hotel_Reviews_Tags.py identifies the most important tags
+# Leisure trip, Couple, Solo traveler, Business trip, Group combined with Travelers with friends,
+# Family with young children, Family with older children, With a pet
+df["Leisure_trip"] = df.Tags.apply(lambda tag: 1 if "Leisure trip" in tag else 0)
+df["Couple"] = df.Tags.apply(lambda tag: 1 if "Couple" in tag else 0)
+df["Solo_traveler"] = df.Tags.apply(lambda tag: 1 if "Solo traveler" in tag else 0)
+df["Business_trip"] = df.Tags.apply(lambda tag: 1 if "Business trip" in tag else 0)
+df["Group"] = df.Tags.apply(lambda tag: 1 if "Group" in tag or "Travelers with friends" in tag else 0)
+df["Family_with_young_children"] = df.Tags.apply(lambda tag: 1 if "Family with young children" in tag else 0)
+df["Family_with_older_children"] = df.Tags.apply(lambda tag: 1 if "Family with older children" in tag else 0)
+df["With_a_pet"] = df.Tags.apply(lambda tag: 1 if "With a pet" in tag else 0)
+
+```
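
Once the 0/1 indicator columns exist, the per-hotel totals mentioned above follow from a group-and-sum. The sketch below assumes a tiny made-up frame with a subset of the indicator columns; the hotel names and values are hypothetical.

```python
import pandas as pd

# Hypothetical rows that mimic the indicator columns created above
reviews = pd.DataFrame({
    "Hotel_Name": ["Hotel A", "Hotel A", "Hotel A", "Hotel B"],
    "Business_trip": [1, 0, 1, 0],
    "Leisure_trip": [0, 1, 0, 1],
    "With_a_pet": [0, 1, 0, 0],
})

# Summing 0/1 indicators per hotel gives the number of reviewers in each category
tag_totals = reviews.groupby("Hotel_Name")[["Business_trip", "Leisure_trip", "With_a_pet"]].sum()
print(tag_totals)
```

A recommendation bot could then, for example, rank hotels by `Business_trip` when the user says they are travelling for work.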
+
+### Save your file
+
+Finally, save the dataset as it is now under a new name.
+
+```python
+df.drop(["Review_Total_Negative_Word_Counts", "Review_Total_Positive_Word_Counts", "days_since_review", "Total_Number_of_Reviews_Reviewer_Has_Given"], axis = 1, inplace=True)
+
+# Saving new data file with calculated columns
+print("Saving results to Hotel_Reviews_Filtered.csv")
+df.to_csv(r'../../data/Hotel_Reviews_Filtered.csv', index = False)
+```
+
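+## Sentiment Analysis Operations
+
+In this final section, you will apply sentiment analysis to the review columns and save the results in a dataset.
+
+## Exercise: load and save the filtered data
+
+Note that you are now loading the filtered dataset that was saved in the previous section, **not** the original dataset.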
+
+```python
+import time
+import pandas as pd
+import nltk
+from nltk.corpus import stopwords
+from nltk.sentiment.vader import SentimentIntensityAnalyzer
+# Download the VADER lexicon and the stop word list used later in this lesson
+nltk.download('vader_lexicon')
+nltk.download('stopwords')
+
+# Load the filtered hotel reviews from CSV
+df = pd.read_csv('../../data/Hotel_Reviews_Filtered.csv')
+
+# Your code will be added here
+
+
+# Finally remember to save the hotel reviews with new NLP data added
+print("Saving results to Hotel_Reviews_NLP.csv")
+df.to_csv(r'../../data/Hotel_Reviews_NLP.csv', index = False)
+```
+
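+### Removing stop words
+
+If you were to run sentiment analysis on the negative and positive review columns, it could take a long time. Tested on a powerful test laptop, it took 12 to 14 minutes depending on which sentiment library was used. That is a (relatively) long time, so it is worth investigating whether it can be sped up.
+
+Removing stop words, common English words that do not change the sentiment of a sentence, is the first step. With them removed, the sentiment analysis should run faster without being less accurate (the stop words do not affect sentiment, but they do slow down the analysis).
+
+The longest negative review was 395 words, but after removing the stop words it is 195 words.
+
+Removing the stop words is also a fast operation: removing them from 2 review columns over 515,000 rows took 3.3 seconds on the test device. It could take slightly more or less time for you depending on your device's CPU speed, RAM, whether you have an SSD, and some other factors. The relative shortness of the operation means that if it improves sentiment analysis time, it is worth doing.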
+
+```python
+from nltk.corpus import stopwords
+
+# Load the hotel reviews from CSV
+df = pd.read_csv("../../data/Hotel_Reviews_Filtered.csv")
+
+# Remove stop words - can be slow for a lot of text!
+# Ryan Han (ryanxjhan on Kaggle) has a great post measuring performance of different stop words removal approaches
+# https://www.kaggle.com/ryanxjhan/fast-stop-words-removal # using the approach that Ryan recommends
+start = time.time()
+cache = set(stopwords.words("english"))
+def remove_stopwords(review):
+    text = " ".join([word for word in review.split() if word not in cache])
+    return text
+
+# Remove the stop words from both columns
+df.Negative_Review = df.Negative_Review.apply(remove_stopwords)
+df.Positive_Review = df.Positive_Review.apply(remove_stopwords)
+```
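
To see what the stop word filter does to a single review, here is a small sketch that runs without any downloads. The seven-word `stop_words` set is a hand-picked assumption for illustration only; the lesson's code uses the full list from `stopwords.words("english")`.

```python
# Tiny hand-picked stop word set, used only so the example runs without downloads.
# The lesson itself uses the much larger list from stopwords.words("english").
stop_words = {"the", "was", "and", "a", "in", "of", "to"}

def remove_stopwords(review):
    # Set membership is a fast lookup, which is why the word list is stored as a set
    return " ".join(word for word in review.split() if word not in stop_words)

original = "the room was small and the staff was rude"
shortened = remove_stopwords(original)
print(shortened)
print(len(original.split()), "words ->", len(shortened.split()), "words")
```

Note that the comparison is case-sensitive, so a capitalized `The` would survive the filter unless the text is lowercased first.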
+
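+### Performing sentiment analysis
+
+Now you should calculate the sentiment analysis for both the negative and positive review columns, and store the result in 2 new columns. The test of the sentiment will be to compare it to the reviewer's score for the same review. For instance, if the sentiment analysis thinks the negative review had a sentiment of 1 (extremely positive sentiment) and the positive review also a sentiment of 1, but the reviewer gave the hotel the lowest score possible, then either the review text does not match the score, or the sentiment analyser could not recognize the sentiment correctly. You should expect some sentiment scores to be completely wrong, and often that will be explainable, e.g. the review could be extremely sarcastic: "Of course I *loved* sleeping in a room with no heating". The sentiment analyser might think that is positive sentiment, even though a human reading it would know it was sarcasm.
+
+NLTK supplies different sentiment analyzers to learn with, and you can substitute them and see if the sentiment is more or less accurate. The VADER sentiment analysis is used here.
+
+> Hutto, C.J. & Gilbert, E.E. (2014). VADER: A Parsimonious Rule-based Model for Sentiment Analysis of Social Media Text. Eighth International Conference on Weblogs and Social Media (ICWSM-14). Ann Arbor, MI, June 2014.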
+
+```python
+from nltk.sentiment.vader import SentimentIntensityAnalyzer
+
+# Create the vader sentiment analyser (there are others in NLTK you can try too)
+vader_sentiment = SentimentIntensityAnalyzer()
+# Hutto, C.J. & Gilbert, E.E. (2014). VADER: A Parsimonious Rule-based Model for Sentiment Analysis of Social Media Text. Eighth International Conference on Weblogs and Social Media (ICWSM-14). Ann Arbor, MI, June 2014.
+
+# There are 3 possibilities of input for a review:
+# It could be "No Negative", in which case, return 0
+# It could be "No Positive", in which case, return 0
+# It could be a review, in which case calculate the sentiment
+def calc_sentiment(review):
+    if review == "No Negative" or review == "No Positive":
+        return 0
+    return vader_sentiment.polarity_scores(review)["compound"]
+```
+
+Later in your program, when you are ready to calculate sentiment, you can apply it to each review as follows:
+
+```python
+# Add a negative sentiment and positive sentiment column
+print("Calculating sentiment columns for both positive and negative reviews")
+start = time.time()
+df["Negative_Sentiment"] = df.Negative_Review.apply(calc_sentiment)
+df["Positive_Sentiment"] = df.Positive_Review.apply(calc_sentiment)
+end = time.time()
+print("Calculating sentiment took " + str(round(end - start, 2)) + " seconds")
+```
+
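+This takes approximately 120 seconds on my computer, but it will vary on each computer. If you want to print the results and see whether the sentiment matches the review: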
+
+```python
+df = df.sort_values(by=["Negative_Sentiment"], ascending=True)
+print(df[["Negative_Review", "Negative_Sentiment"]])
+df = df.sort_values(by=["Positive_Sentiment"], ascending=True)
+print(df[["Positive_Review", "Positive_Sentiment"]])
+```
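
Beyond eyeballing the sorted output, the comparison against the reviewer's score can be sketched as a simple filter. The thresholds `LOW_SCORE` and `HIGH_SENTIMENT` and the four sample rows are assumptions chosen for illustration; they are not part of the lesson's notebooks.

```python
import pandas as pd

# Hypothetical scores; the real columns come from the sentiment step above
sample = pd.DataFrame({
    "Reviewer_Score": [2.5, 9.6, 3.0, 8.8],
    "Negative_Sentiment": [-0.85, 0.0, 0.70, -0.10],
    "Positive_Sentiment": [0.10, 0.95, 0.92, 0.60],
})

# Assumed thresholds: a low reviewer score paired with clearly positive text
LOW_SCORE = 4.0
HIGH_SENTIMENT = 0.5

# Rows where both text columns read as positive but the score is low
suspicious = sample[
    (sample.Reviewer_Score <= LOW_SCORE)
    & (sample.Negative_Sentiment >= HIGH_SENTIMENT)
    & (sample.Positive_Sentiment >= HIGH_SENTIMENT)
]
print(suspicious)
```

Rows flagged this way are candidates for sarcasm or for text that simply does not match the score, and are worth reading by hand.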
+
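+The very last thing to do with the file before using it in the challenge is to save it! You should also consider reordering all your new columns so they are easy to work with (for a human, it is a cosmetic change).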
+
+```python
+# Reorder the columns (This is cosmetic, but to make it easier to explore the data later)
+df = df.reindex(["Hotel_Name", "Hotel_Address", "Total_Number_of_Reviews", "Average_Score", "Reviewer_Score", "Negative_Sentiment", "Positive_Sentiment", "Reviewer_Nationality", "Leisure_trip", "Couple", "Solo_traveler", "Business_trip", "Group", "Family_with_young_children", "Family_with_older_children", "With_a_pet", "Negative_Review", "Positive_Review"], axis=1)
+
+print("Saving results to Hotel_Reviews_NLP.csv")
+df.to_csv(r"../../data/Hotel_Reviews_NLP.csv", index = False)
+```
+
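+You should run the entire code for [the analysis notebook](https://github.com/microsoft/ML-For-Beginners/blob/main/6-NLP/5-Hotel-Reviews-2/solution/3-notebook.ipynb) (after you have run [your filtering notebook](https://github.com/microsoft/ML-For-Beginners/blob/main/6-NLP/5-Hotel-Reviews-2/solution/1-notebook.ipynb) to generate the Hotel_Reviews_Filtered.csv file).
+
+To review, the steps are:
+
+1. The original dataset file **Hotel_Reviews.csv** is explored in the previous lesson with [the explorer notebook](https://github.com/microsoft/ML-For-Beginners/blob/main/6-NLP/4-Hotel-Reviews-1/solution/notebook.ipynb).
+2. Hotel_Reviews.csv is filtered by [the filtering notebook](https://github.com/microsoft/ML-For-Beginners/blob/main/6-NLP/5-Hotel-Reviews-2/solution/1-notebook.ipynb), resulting in **Hotel_Reviews_Filtered.csv**.
+3. Hotel_Reviews_Filtered.csv is processed by [the sentiment analysis notebook](https://github.com/microsoft/ML-For-Beginners/blob/main/6-NLP/5-Hotel-Reviews-2/solution/3-notebook.ipynb), resulting in **Hotel_Reviews_NLP.csv**.
+4. Use Hotel_Reviews_NLP.csv in the NLP challenge below.
+
+### Conclusion
+
+When you started, you had a dataset with columns and data, but not all of it could be verified or used. You have explored the data, filtered out what you don't need, converted tags into something useful, calculated your own averages, added some sentiment columns, and hopefully learned some interesting things about processing natural text.
+
+## [Post-lecture quiz](https://ff-quizzes.netlify.app/en/ml/)
+
+## Challenge
+
+Now that you have your dataset analyzed for sentiment, see if you can use strategies you have learned in this curriculum (clustering, perhaps?) to determine patterns around sentiment.
+
+## Review & Self Study
+
+Take [this Learn module](https://docs.microsoft.com/en-us/learn/modules/classify-user-feedback-with-the-text-analytics-api/?WT.mc_id=academic-77952-leestott) to learn more and use different tools to explore sentiment in text.
+
+## Assignment
+
+[Try a different dataset](assignment.md)
+
+---
+
+**Disclaimer**:
+This document was translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, automated translations may contain errors or inaccuracies. The document in its original language should be treated as the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.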
\ No newline at end of file
diff --git a/translations/zh-CN/6-NLP/5-Hotel-Reviews-2/assignment.md b/translations/zh-CN/6-NLP/5-Hotel-Reviews-2/assignment.md
new file mode 100644
index 000000000..c6ef325c0
--- /dev/null
+++ b/translations/zh-CN/6-NLP/5-Hotel-Reviews-2/assignment.md
@@ -0,0 +1,16 @@
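+# Try a different dataset
+
+## Instructions
+
+Now that you have learned about using NLTK to assign sentiment to text, try a different dataset. You will probably need to do some data processing around it, so create a notebook and document your thought process. What do you discover?
+
+## Rubric
+
+| Criteria | Exemplary | Adequate | Needs Improvement |
+| -------- | --------- | -------- | ----------------- |
+|          | A complete notebook and dataset are presented, with well-documented cells explaining how the sentiment is assigned | The notebook is missing good explanations | The notebook is flawed |
+
+---
+
+**Disclaimer**:
+This document was translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, automated translations may contain errors or inaccuracies. The document in its original language should be treated as the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.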
\ No newline at end of file
diff --git a/translations/zh-CN/6-NLP/5-Hotel-Reviews-2/notebook.ipynb b/translations/zh-CN/6-NLP/5-Hotel-Reviews-2/notebook.ipynb
new file mode 100644
index 000000000..e69de29bb
diff --git a/translations/zh-CN/6-NLP/5-Hotel-Reviews-2/solution/1-notebook.ipynb b/translations/zh-CN/6-NLP/5-Hotel-Reviews-2/solution/1-notebook.ipynb
new file mode 100644
index 000000000..6cb58d031
--- /dev/null
+++ b/translations/zh-CN/6-NLP/5-Hotel-Reviews-2/solution/1-notebook.ipynb
@@ -0,0 +1,172 @@
+{
+ "metadata": {
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.7.0"
+ },
+ "orig_nbformat": 4,
+ "kernelspec": {
+ "name": "python3",
+ "display_name": "Python 3.7.0 64-bit ('3.7')"
+ },
+ "interpreter": {
+ "hash": "70b38d7a306a849643e446cd70466270a13445e5987dfa1344ef2b127438fa4d"
+ },
+ "coopTranslator": {
+ "original_hash": "033cb89c85500224b3c63fd04f49b4aa",
+ "translation_date": "2025-09-03T20:58:26+00:00",
+ "source_file": "6-NLP/5-Hotel-Reviews-2/solution/1-notebook.ipynb",
+ "language_code": "zh"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2,
+ "cells": [
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import pandas as pd\n",
+ "import time\n",
+ "import ast"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def replace_address(row):\n",
+ "    if \"Netherlands\" in row[\"Hotel_Address\"]:\n",
+ "        return \"Amsterdam, Netherlands\"\n",
+ "    elif \"Barcelona\" in row[\"Hotel_Address\"]:\n",
+ "        return \"Barcelona, Spain\"\n",
+ "    elif \"United Kingdom\" in row[\"Hotel_Address\"]:\n",
+ "        return \"London, United Kingdom\"\n",
+ "    elif \"Milan\" in row[\"Hotel_Address\"]:\n",
+ "        return \"Milan, Italy\"\n",
+ "    elif \"France\" in row[\"Hotel_Address\"]:\n",
+ "        return \"Paris, France\"\n",
+ "    elif \"Vienna\" in row[\"Hotel_Address\"]:\n",
+ "        return \"Vienna, Austria\"\n",
+ "    else:\n",
+ "        return row.Hotel_Address\n",
+ "    "
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Load the hotel reviews from CSV\n",
+ "start = time.time()\n",
+ "df = pd.read_csv('../../data/Hotel_Reviews.csv')\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# dropping columns we will not use:\n",
+ "df.drop([\"lat\", \"lng\"], axis = 1, inplace=True)\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Replace all the addresses with a shortened, more useful form\n",
+ "df[\"Hotel_Address\"] = df.apply(replace_address, axis = 1)\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Drop `Additional_Number_of_Scoring`\n",
+ "df.drop([\"Additional_Number_of_Scoring\"], axis = 1, inplace=True)\n",
+ "# Replace `Total_Number_of_Reviews` and `Average_Score` with our own calculated values\n",
+ "df.Total_Number_of_Reviews = df.groupby('Hotel_Name')['Hotel_Name'].transform('count')\n",
+ "df.Average_Score = round(df.groupby('Hotel_Name').Reviewer_Score.transform('mean'), 1)\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Process the Tags into new columns\n",
+ "# The file Hotel_Reviews_Tags.py identifies the most important tags\n",
+ "# Leisure trip, Couple, Solo traveler, Business trip, Group combined with Travelers with friends, \n",
+ "# Family with young children, Family with older children, With a pet\n",
+ "df[\"Leisure_trip\"] = df.Tags.apply(lambda tag: 1 if \"Leisure trip\" in tag else 0)\n",
+ "df[\"Couple\"] = df.Tags.apply(lambda tag: 1 if \"Couple\" in tag else 0)\n",
+ "df[\"Solo_traveler\"] = df.Tags.apply(lambda tag: 1 if \"Solo traveler\" in tag else 0)\n",
+ "df[\"Business_trip\"] = df.Tags.apply(lambda tag: 1 if \"Business trip\" in tag else 0)\n",
+ "df[\"Group\"] = df.Tags.apply(lambda tag: 1 if \"Group\" in tag or \"Travelers with friends\" in tag else 0)\n",
+ "df[\"Family_with_young_children\"] = df.Tags.apply(lambda tag: 1 if \"Family with young children\" in tag else 0)\n",
+ "df[\"Family_with_older_children\"] = df.Tags.apply(lambda tag: 1 if \"Family with older children\" in tag else 0)\n",
+ "df[\"With_a_pet\"] = df.Tags.apply(lambda tag: 1 if \"With a pet\" in tag else 0)\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# No longer need any of these columns\n",
+ "df.drop([\"Review_Date\", \"Review_Total_Negative_Word_Counts\", \"Review_Total_Positive_Word_Counts\", \"days_since_review\", \"Total_Number_of_Reviews_Reviewer_Has_Given\"], axis = 1, inplace=True)\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Saving results to Hotel_Reviews_Filtered.csv\n",
+ "Filtering took 23.74 seconds\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Saving new data file with calculated columns\n",
+ "print(\"Saving results to Hotel_Reviews_Filtered.csv\")\n",
+ "df.to_csv(r'../../data/Hotel_Reviews_Filtered.csv', index = False)\n",
+ "end = time.time()\n",
+ "print(\"Filtering took \" + str(round(end - start, 2)) + \" seconds\")\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
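+ "\n---\n\n**Disclaimer**:  \nThis document was translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, automated translations may contain errors or inaccuracies. The document in its original language should be treated as the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.\n"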
+ ]
+ }
+ ]
+}
\ No newline at end of file
diff --git a/translations/zh-CN/6-NLP/5-Hotel-Reviews-2/solution/2-notebook.ipynb b/translations/zh-CN/6-NLP/5-Hotel-Reviews-2/solution/2-notebook.ipynb
new file mode 100644
index 000000000..a7e0b93fd
--- /dev/null
+++ b/translations/zh-CN/6-NLP/5-Hotel-Reviews-2/solution/2-notebook.ipynb
@@ -0,0 +1,137 @@
+{
+ "metadata": {
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.7.0"
+ },
+ "orig_nbformat": 4,
+ "kernelspec": {
+ "name": "python3",
+ "display_name": "Python 3.7.0 64-bit ('3.7')"
+ },
+ "interpreter": {
+ "hash": "70b38d7a306a849643e446cd70466270a13445e5987dfa1344ef2b127438fa4d"
+ },
+ "coopTranslator": {
+ "original_hash": "341efc86325ec2a214f682f57a189dfd",
+ "translation_date": "2025-09-03T20:58:43+00:00",
+ "source_file": "6-NLP/5-Hotel-Reviews-2/solution/2-notebook.ipynb",
+ "language_code": "zh"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2,
+ "cells": [
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Load the hotel reviews from CSV\n",
+ "import pandas as pd \n",
+ "\n",
+ "df = pd.read_csv('../../data/Hotel_Reviews_Filtered.csv')\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# We want to find the most useful tags to keep\n",
+ "# Remove opening and closing brackets\n",
+ "df.Tags = df.Tags.str.strip(\"[']\")\n",
+ "# remove all quotes too\n",
+ "df.Tags = df.Tags.str.replace(\" ', '\", \",\", regex = False)\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# removing this to take advantage of the 'already a phrase' fact of the dataset \n",
+ "# Now split the strings into a list\n",
+ "tag_list_df = df.Tags.str.split(',', expand = True)\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Remove leading and trailing spaces\n",
+ "df[\"Tag_1\"] = tag_list_df[0].str.strip()\n",
+ "df[\"Tag_2\"] = tag_list_df[1].str.strip()\n",
+ "df[\"Tag_3\"] = tag_list_df[2].str.strip()\n",
+ "df[\"Tag_4\"] = tag_list_df[3].str.strip()\n",
+ "df[\"Tag_5\"] = tag_list_df[4].str.strip()\n",
+ "df[\"Tag_6\"] = tag_list_df[5].str.strip()\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Merge the 6 columns into one with melt\n",
+ "df_tags = df.melt(value_vars=[\"Tag_1\", \"Tag_2\", \"Tag_3\", \"Tag_4\", \"Tag_5\", \"Tag_6\"])\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "The shape of the tags with no filtering: (2514684, 2)\n",
+ " index count\n",
+ "0 Leisure trip 338423\n",
+ "1 Couple 205305\n",
+ "2 Solo traveler 89779\n",
+ "3 Business trip 68176\n",
+ "4 Group 51593\n",
+ "5 Family with young children 49318\n",
+ "6 Family with older children 21509\n",
+ "7 Travelers with friends 1610\n",
+ "8 With a pet 1078\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Get the value counts\n",
+ "tag_vc = df_tags.value.value_counts()\n",
+ "# print(tag_vc)\n",
+ "print(\"The shape of the tags with no filtering:\", str(df_tags.shape))\n",
+ "# Drop rooms, suites, length of stay, mobile device, and anything with a count of 1000 or less\n",
+ "df_tags = df_tags[~df_tags.value.str.contains(\"Standard|room|Stayed|device|Beds|Suite|Studio|King|Superior|Double\", na=False, case=False)]\n",
+ "tag_vc = df_tags.value.value_counts().reset_index(name=\"count\").query(\"count > 1000\")\n",
+ "# Print the top 10 (there should only be 9 and we'll use these in the filtering section)\n",
+ "print(tag_vc[:10])"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
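+ "\n---\n\n**Disclaimer**:  \nThis document was translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, automated translations may contain errors or inaccuracies. The document in its original language should be treated as the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.\n"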
+ ]
+ }
+ ]
+}
\ No newline at end of file
diff --git a/translations/zh-CN/6-NLP/5-Hotel-Reviews-2/solution/3-notebook.ipynb b/translations/zh-CN/6-NLP/5-Hotel-Reviews-2/solution/3-notebook.ipynb
new file mode 100644
index 000000000..0107825cc
--- /dev/null
+++ b/translations/zh-CN/6-NLP/5-Hotel-Reviews-2/solution/3-notebook.ipynb
@@ -0,0 +1,260 @@
+{
+ "metadata": {
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.7.0"
+ },
+ "orig_nbformat": 4,
+ "kernelspec": {
+ "name": "python3",
+ "display_name": "Python 3.7.0 64-bit ('3.7')"
+ },
+ "interpreter": {
+ "hash": "70b38d7a306a849643e446cd70466270a13445e5987dfa1344ef2b127438fa4d"
+ },
+ "coopTranslator": {
+ "original_hash": "705bf02633759f689abc37b19749a16d",
+ "translation_date": "2025-09-03T20:58:59+00:00",
+ "source_file": "6-NLP/5-Hotel-Reviews-2/solution/3-notebook.ipynb",
+ "language_code": "zh"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2,
+ "cells": [
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stderr",
+ "text": [
+ "[nltk_data] Downloading package vader_lexicon to\n[nltk_data] /Users/jenlooper/nltk_data...\n"
+ ]
+ },
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "True"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 9
+ }
+ ],
+ "source": [
+ "import time\n",
+ "import pandas as pd\n",
+ "import nltk\n",
+ "from nltk.corpus import stopwords\n",
+ "from nltk.sentiment.vader import SentimentIntensityAnalyzer\n",
+ "nltk.download('stopwords')\n",
+ "nltk.download('vader_lexicon')\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "vader_sentiment = SentimentIntensityAnalyzer()\n",
+ "\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# There are 3 possibilities of input for a review:\n",
+ "# It could be \"No Negative\", in which case, return 0\n",
+ "# It could be \"No Positive\", in which case, return 0\n",
+ "# It could be a review, in which case calculate the sentiment\n",
+ "def calc_sentiment(review):\n",
+ "    if review == \"No Negative\" or review == \"No Positive\":\n",
+ "        return 0\n",
+ "    return vader_sentiment.polarity_scores(review)[\"compound\"]\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Load the hotel reviews from CSV\n",
+ "df = pd.read_csv(\"../../data/Hotel_Reviews_Filtered.csv\")\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 13,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Remove stop words - can be slow for a lot of text!\n",
+ "# Ryan Han (ryanxjhan on Kaggle) has a great post measuring performance of different stop words removal approaches\n",
+ "# https://www.kaggle.com/ryanxjhan/fast-stop-words-removal # using the approach that Ryan recommends\n",
+ "start = time.time()\n",
+ "cache = set(stopwords.words(\"english\"))\n",
+ "def remove_stopwords(review):\n",
+ "    text = \" \".join([word for word in review.split() if word not in cache])\n",
+ "    return text\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 14,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Remove the stop words from both columns\n",
+ "df.Negative_Review = df.Negative_Review.apply(remove_stopwords) \n",
+ "df.Positive_Review = df.Positive_Review.apply(remove_stopwords)\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 15,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Removing stop words took 5.77 seconds\n"
+ ]
+ }
+ ],
+ "source": [
+ "end = time.time()\n",
+ "print(\"Removing stop words took \" + str(round(end - start, 2)) + \" seconds\")\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 16,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Calculating sentiment columns for both positive and negative reviews\n",
+ "Calculating sentiment took 201.07 seconds\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Add a negative sentiment and positive sentiment column\n",
+ "print(\"Calculating sentiment columns for both positive and negative reviews\")\n",
+ "start = time.time()\n",
+ "df[\"Negative_Sentiment\"] = df.Negative_Review.apply(calc_sentiment)\n",
+ "df[\"Positive_Sentiment\"] = df.Positive_Review.apply(calc_sentiment)\n",
+ "end = time.time()\n",
+ "print(\"Calculating sentiment took \" + str(round(end - start, 2)) + \" seconds\")\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 17,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ " Negative_Review Negative_Sentiment\n",
+ "186584 So bad experience memories I hotel The first n... -0.9920\n",
+ "129503 First charged twice room booked booking second... -0.9896\n",
+ "307286 The staff Had bad experience even booking Janu... -0.9889\n",
+ "452092 No WLAN room Incredibly rude restaurant staff ... -0.9884\n",
+ "201293 We usually traveling Paris 2 3 times year busi... -0.9873\n",
+ "... ... ...\n",
+ "26899 I would say however one night expensive even d... 0.9933\n",
+ "138365 Wifi terribly slow I speed test network upload... 0.9938\n",
+ "79215 I find anything hotel first I walked past hote... 0.9938\n",
+ "278506 The property great location There bakery next ... 0.9945\n",
+ "339189 Guys I like hotel I wish return next year Howe... 0.9948\n",
+ "\n",
+ "[515738 rows x 2 columns]\n",
+ " Positive_Review Positive_Sentiment\n",
+ "137893 Bathroom Shower We going stay twice hotel 2 ni... -0.9820\n",
+ "5839 I completely disappointed mad since reception ... -0.9780\n",
+ "64158 get everything extra internet parking breakfas... -0.9751\n",
+ "124178 I didnt like anythig Room small Asked upgrade ... -0.9721\n",
+ "489137 Very rude manager abusive staff reception Dirt... -0.9703\n",
+ "... ... ...\n",
+ "331570 Everything This recently renovated hotel class... 0.9984\n",
+ "322920 From moment stepped doors Guesthouse Hotel sta... 0.9985\n",
+ "293710 This place surprise expected good actually gre... 0.9985\n",
+ "417442 We celebrated wedding night Langham I commend ... 0.9985\n",
+ "132492 We arrived super cute boutique hotel area expl... 0.9987\n",
+ "\n",
+ "[515738 rows x 2 columns]\n"
+ ]
+ }
+ ],
+ "source": [
+ "df = df.sort_values(by=[\"Negative_Sentiment\"], ascending=True)\n",
+ "print(df[[\"Negative_Review\", \"Negative_Sentiment\"]])\n",
+ "df = df.sort_values(by=[\"Positive_Sentiment\"], ascending=True)\n",
+ "print(df[[\"Positive_Review\", \"Positive_Sentiment\"]])\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 18,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Reorder the columns (This is cosmetic, but to make it easier to explore the data later)\n",
+ "df = df.reindex([\"Hotel_Name\", \"Hotel_Address\", \"Total_Number_of_Reviews\", \"Average_Score\", \"Reviewer_Score\", \"Negative_Sentiment\", \"Positive_Sentiment\", \"Reviewer_Nationality\", \"Leisure_trip\", \"Couple\", \"Solo_traveler\", \"Business_trip\", \"Group\", \"Family_with_young_children\", \"Family_with_older_children\", \"With_a_pet\", \"Negative_Review\", \"Positive_Review\"], axis=1)\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 19,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Saving results to Hotel_Reviews_NLP.csv\n"
+ ]
+ }
+ ],
+ "source": [
+ "print(\"Saving results to Hotel_Reviews_NLP.csv\")\n",
+ "df.to_csv(r\"../../data/Hotel_Reviews_NLP.csv\", index = False)\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
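+ "\n---\n\n**Disclaimer**:  \nThis document was translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, automated translations may contain errors or inaccuracies. The document in its original language should be treated as the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.\n"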
+ ]
+ }
+ ]
+}
\ No newline at end of file
diff --git a/translations/zh-CN/6-NLP/5-Hotel-Reviews-2/solution/Julia/README.md b/translations/zh-CN/6-NLP/5-Hotel-Reviews-2/solution/Julia/README.md
new file mode 100644
index 000000000..c443a00dd
--- /dev/null
+++ b/translations/zh-CN/6-NLP/5-Hotel-Reviews-2/solution/Julia/README.md
@@ -0,0 +1,6 @@
+
+
+---
+
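+**Disclaimer**:
+This document was translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, automated translations may contain errors or inaccuracies. The document in its original language should be treated as the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.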
\ No newline at end of file
diff --git a/translations/zh-CN/6-NLP/5-Hotel-Reviews-2/solution/R/README.md b/translations/zh-CN/6-NLP/5-Hotel-Reviews-2/solution/R/README.md
new file mode 100644
index 000000000..2b73ba091
--- /dev/null
+++ b/translations/zh-CN/6-NLP/5-Hotel-Reviews-2/solution/R/README.md
@@ -0,0 +1,6 @@
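+This is a temporary placeholder
+
+---
+
+**Disclaimer**:
+This document was translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, automated translations may contain errors or inaccuracies. The document in its original language should be treated as the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.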
\ No newline at end of file
diff --git a/translations/zh-CN/6-NLP/README.md b/translations/zh-CN/6-NLP/README.md
new file mode 100644
index 000000000..7418af484
--- /dev/null
+++ b/translations/zh-CN/6-NLP/README.md
@@ -0,0 +1,29 @@
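+# Getting started with natural language processing
+
+Natural language processing (NLP) is the ability of a computer program to understand human language as it is spoken and written, referred to as natural language. It is a component of artificial intelligence (AI). NLP has existed for more than 50 years and has roots in the field of linguistics. The whole field is directed at helping machines understand and process human language. This can then be used to perform tasks like spell check or machine translation. It has a variety of real-world applications in a number of fields, including medical research, search engines, and business intelligence.
+
+## Regional topic: European languages and literature and romantic hotels of Europe ❤️
+
+In this section of the curriculum, you will be introduced to one of the most widespread uses of machine learning: natural language processing (NLP). Derived from computational linguistics, this category of artificial intelligence is the bridge between humans and machines via voice or textual communication.
+
+In these lessons we will learn the basics of NLP by building small conversational bots, to see how machine learning helps make these conversations more and more "smart". You will travel back in time, chatting with Elizabeth Bennet and Mr. Darcy from Jane Austen's classic novel *Pride and Prejudice*, published in 1813. Then, you will further your knowledge by learning about sentiment analysis via hotel reviews in Europe.
+
+
+> Photo by Elaine Howlin on Unsplash
+
+## Lessons
+
+1. [Introduction to natural language processing](1-Introduction-to-NLP/README.md)
+2. [Common NLP tasks and techniques](2-Tasks/README.md)
+3. [Translation and sentiment analysis with machine learning](3-Translation-Sentiment/README.md)
+4. [Preparing your data](4-Hotel-Reviews-1/README.md)
+5. [Sentiment analysis with NLTK](5-Hotel-Reviews-2/README.md)
+
+## Credits
+
+These natural language processing lessons were written with ☕ by [Stephen Howell](https://twitter.com/Howell_MSFT)
+
+---
+
+**Disclaimer**:
+This document was translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, automated translations may contain errors or inaccuracies. The document in its original language should be treated as the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.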
\ No newline at end of file
diff --git a/translations/zh-CN/6-NLP/data/README.md b/translations/zh-CN/6-NLP/data/README.md
new file mode 100644
index 000000000..444e947cc
--- /dev/null
+++ b/translations/zh-CN/6-NLP/data/README.md
@@ -0,0 +1,6 @@
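+Download the hotel review data to this folder.
+
+---
+
+**Disclaimer**:
+This document was translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, automated translations may contain errors or inaccuracies. The document in its original language should be treated as the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.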
\ No newline at end of file
diff --git a/translations/zh-CN/7-TimeSeries/1-Introduction/README.md b/translations/zh-CN/7-TimeSeries/1-Introduction/README.md
new file mode 100644
index 000000000..f88e67406
--- /dev/null
+++ b/translations/zh-CN/7-TimeSeries/1-Introduction/README.md
@@ -0,0 +1,190 @@
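+# Introduction to time series forecasting
+
+
+
+> Sketchnote by [Tomomi Imura](https://www.twitter.com/girlie_mac)
+
+In this lesson and the following one, you will learn a bit about time series forecasting, an interesting and valuable part of an ML scientist's repertoire that is a bit less known than other topics. Time series forecasting is a sort of "crystal ball": based on the past performance of a variable such as price, you can predict its future potential value.
+
+[](https://youtu.be/cBojo1hsHiI "Introduction to time series forecasting")
+
+> 🎥 Click the image above for a video about time series forecasting
+
+## [Pre-lecture quiz](https://ff-quizzes.netlify.app/en/ml/)
+
+It is a useful and interesting field with real value to business, given its direct application to problems of pricing, inventory, and supply chain issues. While deep learning techniques have started to be used to gain more insight and better predict future performance, time series forecasting remains a field greatly informed by classic ML techniques.
+
+> Penn State's time series curriculum can be found [here](https://online.stat.psu.edu/stat510/lesson/1)
+
+## Introduction
+
+Suppose you maintain an array of smart parking meters that provide data about how often they are used and for how long.
+
+> What if you could predict, based on the meter's past performance, its future value according to the laws of supply and demand?
+
+Accurately predicting when to act so as to achieve your goal is a challenge that could be tackled by time series forecasting. Raising parking fees during busy times might not make people happy, but it would be an effective way to generate revenue to clean the streets!
+
+Let's explore some of the types of time series algorithms and start a notebook to clean and prepare some data. The data you will analyze is taken from the GEFCom2014 forecasting competition. It consists of 3 years of hourly electricity load and temperature values between 2012 and 2014. Given the historical patterns of electricity load and temperature, you can predict future values of electricity load.
+
+In this example, you will learn how to forecast one time step ahead, using historical load data only. Before starting, however, it is useful to understand what is going on behind the scenes.
+
+## Some definitions
+
+When encountering the term "time series" you need to understand its use in several different contexts.
+
+🎓 **Time series**
+
+In mathematics, "a time series is a series of data points indexed (or listed or graphed) in time order. Most commonly, a time series is a sequence taken at successive equally spaced points in time." An example of a time series is the daily closing value of the [Dow Jones Industrial Average](https://wikipedia.org/wiki/Time_series). The use of time series plots and statistical modeling is frequently encountered in signal processing, weather forecasting, earthquake prediction, and other fields where events occur and data points can be plotted over time.
+
+🎓 **Time series analysis**
+
+Time series analysis is the analysis of the time series data described above. Time series data can take distinct forms, including "interrupted time series", which detects patterns in a time series' evolution before and after an interrupting event. The type of analysis needed for the time series depends on the nature of the data. Time series data itself can take the form of a series of numbers or characters.
+
+The analysis uses a variety of methods, including frequency-domain and time-domain, linear and nonlinear, and more. [Learn more](https://www.itl.nist.gov/div898/handbook/pmc/section4/pmc4.htm) about the many ways to analyze this type of data.
+
+🎓 **Time series forecasting**
+
+Time series forecasting is the use of a model to predict future values based on patterns displayed by data gathered in the past. While it is possible to use regression models to explore time series data, with time indices as x variables on a plot, such data is best analyzed using special types of models.
+
+Time series data is a list of ordered observations, unlike data that can be analyzed by linear regression. The most common model is ARIMA, an acronym that stands for "Autoregressive Integrated Moving Average".
+
+[ARIMA models](https://online.stat.psu.edu/stat510/lesson/1/1.1) "relate the present value of a series to past values and past prediction errors." They are most appropriate for analyzing time-domain data, where data is ordered over time.
+
+> There are several types of ARIMA models, which you can learn about [here](https://people.duke.edu/~rnau/411arim.htm) and which you will touch on in the next lesson.
+
+In the next lesson, you will build an ARIMA model using a [univariate time series](https://itl.nist.gov/div898/handbook/pmc/section4/pmc44.htm), which focuses on one variable that changes its value over time. An example of this type of data is [this dataset](https://itl.nist.gov/div898/handbook/pmc/section4/pmc4411.htm) that records the monthly CO2 concentration at the Mauna Loa Observatory: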
+
+| CO2 | YearMonth | Year | Month |
+| :-----: | :-------: | :---: | :---: |
+| 330.62 | 1975.04 | 1975 | 1 |
+| 331.40 | 1975.13 | 1975 | 2 |
+| 331.87 | 1975.21 | 1975 | 3 |
+| 333.18 | 1975.29 | 1975 | 4 |
+| 333.92 | 1975.38 | 1975 | 5 |
+| 333.43 | 1975.46 | 1975 | 6 |
+| 331.85 | 1975.54 | 1975 | 7 |
+| 330.01 | 1975.63 | 1975 | 8 |
+| 328.51 | 1975.71 | 1975 | 9 |
+| 328.41 | 1975.79 | 1975 | 10 |
+| 329.25 | 1975.88 | 1975 | 11 |
+| 330.97 | 1975.96 | 1975 | 12 |
+
+✅ 在这个数据集中,识别随时间变化的变量
+
+## 时间序列数据需要考虑的特性
+
+观察时间序列数据时,你可能会注意到它具有一些 [特性](https://online.stat.psu.edu/stat510/lesson/1/1.1),需要考虑并减轻这些特性以更好地理解其模式。如果你将时间序列数据视为一个你想要分析的“信号”,这些特性可以被视为“噪声”。你通常需要通过使用一些统计技术来减少这些“噪声”。
+
+以下是一些你需要了解的概念,以便能够处理时间序列:
+
+🎓 **趋势**
+
+趋势是指随时间可测量的增长和下降。[阅读更多](https://machinelearningmastery.com/time-series-trends-in-python)。在时间序列的背景下,这涉及如何使用以及(如果必要)移除时间序列中的趋势。
+
+🎓 **[季节性](https://machinelearningmastery.com/time-series-seasonality-with-python/)**
+
+季节性是指周期性波动,例如节假日的销售高峰。[看看](https://itl.nist.gov/div898/handbook/pmc/section4/pmc443.htm) 不同类型的图如何显示数据中的季节性。
+
+🎓 **异常值**
+
+异常值是明显偏离数据正常波动范围的观测值。
+
+🎓 **长期周期**
+
+独立于季节性,数据可能显示长期周期,例如持续超过一年的经济衰退。
+
+🎓 **恒定方差**
+
+随着时间的推移,某些数据显示恒定的波动,例如每天和夜间的能源使用量。
+
+🎓 **突变**
+
+数据可能显示突变,需要进一步分析。例如,由于 COVID 的突然停业导致数据发生变化。
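+
+作为上文所述“用统计技术减少噪声”的一个简单示意(移动平均只是众多平滑方法之一,这里的符号 $y_t$、$k$ 和 $\bar{y}_t$ 为说明而引入,并非本课程的记号),可以用最近 $k$ 个观测值的平均来平滑序列,从而更容易看出趋势:
+
+$$\bar{y}_t = \frac{1}{k} \sum_{i=0}^{k-1} y_{t-i}$$
+
+以前面 CO2 表格中 1975 年 1 月至 3 月的数值为例,取 $k = 3$:$\bar{y}_3 = (330.62 + 331.40 + 331.87) / 3 \approx 331.30$。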
+
+✅ 这里有一个 [时间序列图示例](https://www.kaggle.com/kashnitsky/topic-9-part-1-time-series-analysis-in-python),显示了几年来每日游戏内货币消费。你能在这些数据中识别出上述任何特性吗?
+
+
+
+## 练习 - 开始使用电力使用数据
+
+让我们开始创建一个时间序列模型,根据过去的使用情况预测未来的电力使用。
+
+> 本例中的数据来自 GEFCom2014 预测竞赛,包括 2012 年至 2014 年间 3 年的每小时电力负载和温度值。
+>
+> Tao Hong, Pierre Pinson, Shu Fan, Hamidreza Zareipour, Alberto Troccoli 和 Rob J. Hyndman,“概率能源预测:全球能源预测竞赛 2014 及未来”,《国际预测杂志》,第 32 卷,第 3 期,第 896-913 页,2016 年 7 月至 9 月。
+
+1. 在本课的 `working` 文件夹中,打开 _notebook.ipynb_ 文件。首先添加一些库以帮助加载和可视化数据:
+
+ ```python
+ import os
+ import matplotlib.pyplot as plt
+ from common.utils import load_data
+ %matplotlib inline
+ ```
+
+ 注意,你正在使用包含的 `common` 文件夹中的文件,这些文件设置了你的环境并处理数据下载。
+
+2. 接下来,调用 `load_data()` 将数据加载为数据框,并用 `head()` 查看前几行:
+
+ ```python
+ data_dir = './data'
+ energy = load_data(data_dir)[['load']]
+ energy.head()
+ ```
+
+ 你可以看到,数据框以时间戳为索引,并包含一列 `load`(负载):
+
+ | | load |
+ | :-----------------: | :----: |
+ | 2012-01-01 00:00:00 | 2698.0 |
+ | 2012-01-01 01:00:00 | 2558.0 |
+ | 2012-01-01 02:00:00 | 2444.0 |
+ | 2012-01-01 03:00:00 | 2402.0 |
+ | 2012-01-01 04:00:00 | 2403.0 |
+
+3. 现在,通过调用 `plot()` 绘制数据:
+
+ ```python
+ energy.plot(y='load', subplots=True, figsize=(15, 8), fontsize=12)
+ plt.xlabel('timestamp', fontsize=12)
+ plt.ylabel('load', fontsize=12)
+ plt.show()
+ ```
+
+ 
+
+4. 现在,通过提供 `[起始日期]:[结束日期]` 模式的输入,绘制 2014 年 7 月的第一周:
+
+ ```python
+ energy['2014-07-01':'2014-07-07'].plot(y='load', subplots=True, figsize=(15, 8), fontsize=12)
+ plt.xlabel('timestamp', fontsize=12)
+ plt.ylabel('load', fontsize=12)
+ plt.show()
+ ```
+
+ 
+
+ 一个漂亮的图!看看这些图,看看你是否能确定上述列出的任何特性。通过可视化数据,我们可以推测出什么?
+
+在下一课中,你将创建一个 ARIMA 模型来进行一些预测。
+
+---
+
+## 🚀挑战
+
+列出你能想到的所有可能受益于时间序列预测的行业和研究领域。你能想到这些技术在艺术领域的应用吗?在计量经济学、生态学、零售业、工业、金融领域呢?还有哪些领域?
+
+## [课后测验](https://ff-quizzes.netlify.app/en/ml/)
+
+## 复习与自学
+
+虽然我们不会在这里讨论,但有时会使用神经网络来增强经典的时间序列预测方法。[阅读更多](https://medium.com/microsoftazure/neural-networks-for-forecasting-financial-and-economic-time-series-6aca370ff412) 关于它们的内容。
+
+## 作业
+
+[可视化更多时间序列](assignment.md)
+
+---
+
+**免责声明**:
+本文档使用AI翻译服务[Co-op Translator](https://github.com/Azure/co-op-translator)进行翻译。尽管我们努力确保准确性,但请注意,自动翻译可能包含错误或不准确之处。应以原始语言的文档作为权威来源。对于关键信息,建议使用专业人工翻译。因使用本翻译而导致的任何误解或误读,我们概不负责。
\ No newline at end of file
diff --git a/translations/zh-CN/7-TimeSeries/1-Introduction/assignment.md b/translations/zh-CN/7-TimeSeries/1-Introduction/assignment.md
new file mode 100644
index 000000000..fe2260fd9
--- /dev/null
+++ b/translations/zh-CN/7-TimeSeries/1-Introduction/assignment.md
@@ -0,0 +1,16 @@
+# 可视化更多时间序列
+
+## 说明
+
+你已经开始通过观察需要特殊建模的数据类型来学习时间序列预测。你已经可视化了一些关于能源的数据。现在,寻找一些其他可以从时间序列预测中受益的数据。找到三个例子(可以尝试 [Kaggle](https://kaggle.com) 和 [Azure Open Datasets](https://azure.microsoft.com/en-us/services/open-datasets/catalog/?WT.mc_id=academic-77952-leestott)),并创建一个笔记本来可视化这些数据。在笔记本中记录它们的任何特殊特性(季节性、突变或其他趋势)。
+
+## 评分标准
+
+| 标准 | 卓越表现 | 合格表现 | 需要改进 |
+| -------- | ----------------------------------------------------- | --------------------------------------------------- | ---------------------------------------------------------------------------------------- |
+| | 在笔记本中绘制并解释了三个数据集 | 在笔记本中绘制并解释了两个数据集 | 在笔记本中绘制或解释的数据集较少,或者所展示的数据不足 |
+
+---
+
+**免责声明**:
+本文档使用AI翻译服务[Co-op Translator](https://github.com/Azure/co-op-translator)进行翻译。尽管我们努力确保翻译的准确性,但请注意,自动翻译可能包含错误或不准确之处。原始语言的文档应被视为权威来源。对于关键信息,建议使用专业人工翻译。我们不对因使用此翻译而产生的任何误解或误读承担责任。
\ No newline at end of file
diff --git a/translations/zh-CN/7-TimeSeries/1-Introduction/solution/Julia/README.md b/translations/zh-CN/7-TimeSeries/1-Introduction/solution/Julia/README.md
new file mode 100644
index 000000000..b411dd85f
--- /dev/null
+++ b/translations/zh-CN/7-TimeSeries/1-Introduction/solution/Julia/README.md
@@ -0,0 +1,6 @@
+
+
+---
+
+**免责声明**:
+本文档使用AI翻译服务 [Co-op Translator](https://github.com/Azure/co-op-translator) 进行翻译。尽管我们努力确保翻译的准确性,但请注意,自动翻译可能包含错误或不准确之处。应以原始语言的文档作为权威来源。对于重要信息,建议使用专业人工翻译。我们不对因使用此翻译而产生的任何误解或误读承担责任。
\ No newline at end of file
diff --git a/translations/zh-CN/7-TimeSeries/1-Introduction/solution/R/README.md b/translations/zh-CN/7-TimeSeries/1-Introduction/solution/R/README.md
new file mode 100644
index 000000000..cc1524018
--- /dev/null
+++ b/translations/zh-CN/7-TimeSeries/1-Introduction/solution/R/README.md
@@ -0,0 +1,6 @@
+这是一个临时占位符
+
+---
+
+**免责声明**:
+本文档使用AI翻译服务[Co-op Translator](https://github.com/Azure/co-op-translator)进行翻译。尽管我们努力确保翻译的准确性,但请注意,自动翻译可能包含错误或不准确之处。原始语言的文档应被视为权威来源。对于重要信息,建议使用专业人工翻译。我们不对因使用此翻译而产生的任何误解或误读承担责任。
\ No newline at end of file
diff --git a/translations/zh-CN/7-TimeSeries/1-Introduction/solution/notebook.ipynb b/translations/zh-CN/7-TimeSeries/1-Introduction/solution/notebook.ipynb
new file mode 100644
index 000000000..8ca393a09
--- /dev/null
+++ b/translations/zh-CN/7-TimeSeries/1-Introduction/solution/notebook.ipynb
@@ -0,0 +1,168 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "在本笔记中,我们将演示如何:\n",
+ "- 设置本模块的时间序列数据\n",
+ "- 可视化数据\n",
+ "\n",
+ "本示例中的数据来自GEFCom2014预测竞赛。它包含2012年至2014年间3年的每小时电力负荷和温度值。\n",
+ "\n",
+ "Tao Hong、Pierre Pinson、Shu Fan、Hamidreza Zareipour、Alberto Troccoli 和 Rob J. Hyndman,“概率能源预测:全球能源预测竞赛 2014 及未来”,《国际预测杂志》,第 32 卷,第 3 期,第 896-913 页,2016 年 7 月至 9 月。\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import os\n",
+ "import matplotlib.pyplot as plt\n",
+ "from common.utils import load_data\n",
+ "%matplotlib inline"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "将数据从csv加载到Pandas数据框中\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ " load\n",
+ "2012-01-01 00:00:00 2698.0\n",
+ "2012-01-01 01:00:00 2558.0\n",
+ "2012-01-01 02:00:00 2444.0\n",
+ "2012-01-01 03:00:00 2402.0\n",
+ "2012-01-01 04:00:00 2403.0"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 7
+ }
+ ],
+ "source": [
+ "data_dir = './data'\n",
+ "energy = load_data(data_dir)[['load']]\n",
+ "energy.head()\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "绘制所有可用的负载数据(2012年1月至2014年12月)\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": "",
+ "image/png": "\n"
+ },
+ "metadata": {
+ "needs_background": "light"
+ }
+ }
+ ],
+ "source": [
+ "energy.plot(y='load', subplots=True, figsize=(15, 8), fontsize=12)\n",
+ "plt.xlabel('timestamp', fontsize=12)\n",
+ "plt.ylabel('load', fontsize=12)\n",
+ "plt.show()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": "",
+ "image/png": "\n"
+ },
+ "metadata": {
+ "needs_background": "light"
+ }
+ }
+ ],
+ "source": [
+ "energy['2014-07-01':'2014-07-07'].plot(y='load', subplots=True, figsize=(15, 8), fontsize=12)\n",
+ "plt.xlabel('timestamp', fontsize=12)\n",
+ "plt.ylabel('load', fontsize=12)\n",
+ "plt.show()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "\n---\n\n**免责声明**: \n本文档使用AI翻译服务 [Co-op Translator](https://github.com/Azure/co-op-translator) 进行翻译。尽管我们努力确保翻译的准确性,但请注意,自动翻译可能包含错误或不准确之处。原始语言的文档应被视为权威来源。对于关键信息,建议使用专业人工翻译。我们对因使用此翻译而产生的任何误解或误读不承担责任。\n"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernel_info": {
+ "name": "python3"
+ },
+ "kernelspec": {
+ "name": "python37364bit8d3b438fb5fc4430a93ac2cb74d693a7",
+ "display_name": "Python 3.7.0 64-bit ('3.7')"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.7.0"
+ },
+ "nteract": {
+ "version": "nteract-front-end@1.0.0"
+ },
+ "metadata": {
+ "interpreter": {
+ "hash": "70b38d7a306a849643e446cd70466270a13445e5987dfa1344ef2b127438fa4d"
+ }
+ },
+ "coopTranslator": {
+ "original_hash": "dddca9ad9e34435494e0933c218e1579",
+ "translation_date": "2025-09-03T19:56:28+00:00",
+ "source_file": "7-TimeSeries/1-Introduction/solution/notebook.ipynb",
+ "language_code": "zh"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
\ No newline at end of file
diff --git a/translations/zh-CN/7-TimeSeries/1-Introduction/working/notebook.ipynb b/translations/zh-CN/7-TimeSeries/1-Introduction/working/notebook.ipynb
new file mode 100644
index 000000000..10da8cc34
--- /dev/null
+++ b/translations/zh-CN/7-TimeSeries/1-Introduction/working/notebook.ipynb
@@ -0,0 +1,64 @@
+{
+ "cells": [
+ {
+ "source": [
+ "# 数据设置\n",
+ "\n",
+ "在本笔记中,我们将演示如何:\n",
+ "\n",
+ "设置本模块的时间序列数据 \n",
+ "可视化数据 \n",
+ "\n",
+ "本示例中的数据来自GEFCom2014预测竞赛。数据包括2012年至2014年间3年的每小时电力负荷和温度值。\n",
+ "\n",
+ "Tao Hong、Pierre Pinson、Shu Fan、Hamidreza Zareipour、Alberto Troccoli 和 Rob J. Hyndman,“概率能源预测:全球能源预测竞赛 2014 及未来”,《国际预测杂志》,第 32 卷,第 3 期,第 896-913 页,2016 年 7 月至 9 月。\n"
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "\n---\n\n**免责声明**: \n本文档使用AI翻译服务[Co-op Translator](https://github.com/Azure/co-op-translator)进行翻译。尽管我们努力确保翻译的准确性,但请注意,自动翻译可能包含错误或不准确之处。原始语言的文档应被视为权威来源。对于关键信息,建议使用专业人工翻译。我们不对因使用此翻译而产生的任何误解或误读承担责任。\n"
+ ]
+ }
+ ],
+ "metadata": {
+ "kernel_info": {
+ "name": "python3"
+ },
+ "kernelspec": {
+ "name": "python37364bit8d3b438fb5fc4430a93ac2cb74d693a7",
+ "display_name": "Python 3.7.0 64-bit ('3.7')"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.7.0"
+ },
+ "nteract": {
+ "version": "nteract-front-end@1.0.0"
+ },
+ "metadata": {
+ "interpreter": {
+ "hash": "70b38d7a306a849643e446cd70466270a13445e5987dfa1344ef2b127438fa4d"
+ }
+ },
+ "coopTranslator": {
+ "original_hash": "5e2bbe594906dce3aaaa736d6dac6683",
+ "translation_date": "2025-09-03T19:57:09+00:00",
+ "source_file": "7-TimeSeries/1-Introduction/working/notebook.ipynb",
+ "language_code": "zh"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
\ No newline at end of file
diff --git a/translations/zh-CN/7-TimeSeries/2-ARIMA/README.md b/translations/zh-CN/7-TimeSeries/2-ARIMA/README.md
new file mode 100644
index 000000000..ee4a1d8d9
--- /dev/null
+++ b/translations/zh-CN/7-TimeSeries/2-ARIMA/README.md
@@ -0,0 +1,398 @@
+# 使用 ARIMA 进行时间序列预测
+
+在上一节课中,您学习了一些关于时间序列预测的知识,并加载了一个显示电力负载随时间波动的数据集。
+
+[ARIMA 简介(视频)](https://youtu.be/IUSk-YDau10 "ARIMA 简介")
+
+> 🎥 点击上方链接观看视频:ARIMA 模型的简要介绍。示例使用 R 语言,但概念具有普适性。
+
+## [课前测验](https://ff-quizzes.netlify.app/en/ml/)
+
+## 简介
+
+在本节课中,您将学习一种特定的方法来构建 [ARIMA: *A*uto*R*egressive *I*ntegrated *M*oving *A*verage](https://wikipedia.org/wiki/Autoregressive_integrated_moving_average) 模型。ARIMA 模型特别适合拟合显示 [非平稳性](https://wikipedia.org/wiki/Stationary_process) 的数据。
+
+## 基本概念
+
+为了能够使用 ARIMA,您需要了解以下一些概念:
+
+- 🎓 **平稳性**。从统计学的角度来看,平稳性指的是分布在时间上不发生变化的数据。非平稳数据则由于趋势而出现波动,必须经过转换才能进行分析。例如,季节性可能会引入数据波动,可以通过“季节性差分”过程来消除。
+
+- 🎓 **[差分](https://wikipedia.org/wiki/Autoregressive_integrated_moving_average#Differencing)**。差分数据是指从统计学角度将非平稳数据转换为平稳数据的过程,通过去除其非恒定趋势来实现。“差分消除了时间序列中的水平变化,消除了趋势和季节性,从而稳定了时间序列的均值。” [Shixiong 等人的论文](https://arxiv.org/abs/1904.07632)
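+
+上述差分操作可以粗略地写成如下形式(仅为示意;符号 $y_t$、$y'_t$ 和季节周期 $m$ 是为说明而假设的记号):
+
+$$y'_t = y_t - y_{t-1} \qquad \text{(一阶差分)}$$
+
+$$y'_t = y_t - y_{t-m} \qquad \text{(季节性差分)}$$
+
+例如,以能源数据集的前五个负载值 2698、2558、2444、2402、2403 为例,一阶差分的结果是 -140、-114、-42、1。对于具有日周期的每小时数据,季节性差分通常取 $m = 24$。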
+
+## ARIMA 在时间序列中的应用
+
+让我们拆解 ARIMA 的各个部分,以更好地理解它如何帮助我们对时间序列建模并进行预测。
+
+- **AR - 自回归**。顾名思义,自回归模型会“回溯”时间,分析数据中的先前值并对其进行假设。这些先前值称为“滞后”。例如,显示每月铅笔销售数据的时间序列。每个月的销售总额可以被视为数据集中的“演变变量”。该模型的构建方式是“将感兴趣的演变变量回归到其自身的滞后(即先前)值上。” [维基百科](https://wikipedia.org/wiki/Autoregressive_integrated_moving_average)
+
+- **I - 积分(Integrated,亦译“整合”)**。与类似的“ARMA”模型不同,ARIMA 中的“I”指的是其 *[积分](https://wikipedia.org/wiki/Order_of_integration)* 特性:先对数据进行差分以消除非平稳性,再对差分后的序列建模;原序列可由差分后的序列逐项累加(即“积分”)还原。
+
+- **MA - 移动平均**。该模型的 [移动平均](https://wikipedia.org/wiki/Moving-average_model) 部分指的是:输出变量由当前及过去若干期的误差项(即滞后的预测误差)决定。
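+
+把这三个部分放在一起,ARIMA$(p, d, q)$ 模型通常写成如下形式(这是教科书中常见的写法,符号 $c$、$\phi_i$、$\theta_j$、$\varepsilon_t$ 并非出自本课程;$y'_t$ 表示经过 $d$ 次差分后的序列):
+
+$$y'_t = c + \phi_1 y'_{t-1} + \cdots + \phi_p y'_{t-p} + \theta_1 \varepsilon_{t-1} + \cdots + \theta_q \varepsilon_{t-q} + \varepsilon_t$$
+
+其中带 $\phi$ 的项是自回归(AR)部分,带 $\theta$ 的项是移动平均(MA)部分,而差分次数 $d$ 对应“I”。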
+
+总结:ARIMA 用于使模型尽可能贴合时间序列数据的特殊形式。
+
+## 练习 - 构建 ARIMA 模型
+
+打开本节课中的 [_/working_](https://github.com/microsoft/ML-For-Beginners/tree/main/7-TimeSeries/2-ARIMA/working) 文件夹,找到 [_notebook.ipynb_](https://github.com/microsoft/ML-For-Beginners/blob/main/7-TimeSeries/2-ARIMA/working/notebook.ipynb) 文件。
+
+1. 运行 notebook 加载 `statsmodels` Python 库;您将需要它来构建 ARIMA 模型。
+
+1. 加载必要的库。
+
+1. 接下来,加载一些用于绘制数据的库:
+
+ ```python
+ import os
+ import warnings
+ import matplotlib.pyplot as plt
+ import numpy as np
+ import pandas as pd
+ import datetime as dt
+ import math
+
+ from pandas.plotting import autocorrelation_plot
+ from statsmodels.tsa.statespace.sarimax import SARIMAX
+ from sklearn.preprocessing import MinMaxScaler
+ from common.utils import load_data, mape
+ from IPython.display import Image
+
+ %matplotlib inline
+ pd.options.display.float_format = '{:,.2f}'.format
+ np.set_printoptions(precision=2)
+ warnings.filterwarnings("ignore") # specify to ignore warning messages
+ ```
+
+1. 从 `/data/energy.csv` 文件中加载数据到 Pandas 数据框并查看:
+
+ ```python
+ energy = load_data('./data')[['load']]
+ energy.head(10)
+ ```
+
+1. 绘制 2012 年 1 月至 2014 年 12 月的所有可用能源数据。没有意外,因为我们在上一节课中已经看到过这些数据:
+
+ ```python
+ energy.plot(y='load', subplots=True, figsize=(15, 8), fontsize=12)
+ plt.xlabel('timestamp', fontsize=12)
+ plt.ylabel('load', fontsize=12)
+ plt.show()
+ ```
+
+ 现在,让我们构建一个模型!
+
+### 创建训练和测试数据集
+
+现在数据已加载,您可以将其分为训练集和测试集。您将在训练集上训练模型。与往常一样,模型训练完成后,您将使用测试集评估其准确性。您需要确保测试集覆盖的时间段晚于训练集,以确保模型不会从未来时间段中获取信息。
+
+1. 将 2014 年 11 月 1 日至 12 月 29 日这约两个月的数据分配给训练集。测试集则包括 2014 年 12 月 30 日至 31 日这两天:
+
+ ```python
+ train_start_dt = '2014-11-01 00:00:00'
+ test_start_dt = '2014-12-30 00:00:00'
+ ```
+
+ 由于这些数据反映了每日能源消耗,因此存在强烈的季节性模式,但消耗与最近几天的消耗最为相似。
+
+1. 可视化差异:
+
+ ```python
+ energy[(energy.index < test_start_dt) & (energy.index >= train_start_dt)][['load']].rename(columns={'load':'train'}) \
+ .join(energy[test_start_dt:][['load']].rename(columns={'load':'test'}), how='outer') \
+ .plot(y=['train', 'test'], figsize=(15, 8), fontsize=12)
+ plt.xlabel('timestamp', fontsize=12)
+ plt.ylabel('load', fontsize=12)
+ plt.show()
+ ```
+
+ 
+
+ 因此,使用一个相对较小的时间窗口来训练数据应该是足够的。
+
+ > 注意:由于我们用于拟合 ARIMA 模型的函数在拟合过程中使用了样本内验证,因此我们将省略验证数据。
+
+### 准备训练数据
+
+现在,您需要通过过滤和缩放数据来准备训练数据。过滤数据集以仅保留所需的时间段和列,并缩放数据使其落在区间 (0, 1) 内。
+
+1. 过滤原始数据集,仅包含每个集合中上述时间段以及所需的“load”列和日期:
+
+ ```python
+ train = energy.copy()[(energy.index >= train_start_dt) & (energy.index < test_start_dt)][['load']]
+ test = energy.copy()[energy.index >= test_start_dt][['load']]
+
+ print('Training data shape: ', train.shape)
+ print('Test data shape: ', test.shape)
+ ```
+
+ 您可以查看数据的形状:
+
+ ```output
+ Training data shape: (1416, 1)
+ Test data shape: (48, 1)
+ ```
+
+1. 将数据缩放到范围 (0, 1)。
+
+ ```python
+ scaler = MinMaxScaler()
+ train['load'] = scaler.fit_transform(train)
+ train.head(10)
+ ```
+
+1. 可视化原始数据与缩放数据:
+
+ ```python
+ energy[(energy.index >= train_start_dt) & (energy.index < test_start_dt)][['load']].rename(columns={'load':'original load'}).plot.hist(bins=100, fontsize=12)
+ train.rename(columns={'load':'scaled load'}).plot.hist(bins=100, fontsize=12)
+ plt.show()
+ ```
+
+ 
+
+ > 原始数据
+
+ 
+
+ > 缩放数据
+
+1. 现在您已经校准了缩放数据,可以对测试数据进行缩放:
+
+ ```python
+ test['load'] = scaler.transform(test)
+ test.head()
+ ```
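+
+上面的缩放步骤对应于最小-最大缩放,可以用下面的公式示意(符号为说明而引入;$x_{\min}$ 和 $x_{\max}$ 取自训练集):
+
+$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}, \qquad x = x' \, (x_{\max} - x_{\min}) + x_{\min}$$
+
+第二个式子是逆变换,对应之后把预测值还原到原始单位的 `scaler.inverse_transform()`。由于测试集沿用训练集的 $x_{\min}$ 和 $x_{\max}$,其缩放后的值有可能略微超出 0 到 1 的范围。举一个假设的例子:若 $x_{\min} = 2000$、$x_{\max} = 4000$,则 $x = 3000$ 被缩放为 $0.5$。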
+
+### 实现 ARIMA
+
+现在是时候实现 ARIMA 了!您将使用之前安装的 `statsmodels` 库。
+
+接下来需要遵循几个步骤:
+
+ 1. 通过调用 `SARIMAX()` 并传入模型参数:p、d 和 q 参数,以及 P、D 和 Q 参数来定义模型。
+ 2. 通过调用 `fit()` 函数为训练数据准备模型。
+ 3. 通过调用 `forecast()` 函数并指定预测步数(即预测的时间范围)来进行预测。
+
+> 🎓 这些参数的作用是什么?在 ARIMA 模型中,有 3 个参数用于帮助建模时间序列的主要方面:季节性、趋势和噪声。这些参数是:
+
+- `p`:与模型的自回归部分相关的参数,包含 *过去* 的值。
+- `d`:与模型的积分部分相关的参数,影响应用于时间序列的 *差分*(🎓 记得差分 👆?)。
+- `q`:与模型的移动平均部分相关的参数。
+
+> 注意:如果您的数据具有季节性特征(例如本数据),我们使用季节性 ARIMA 模型(SARIMA)。在这种情况下,您需要使用另一组参数:`P`、`D` 和 `Q`,它们与 `p`、`d` 和 `q` 的关联相同,但对应于模型的季节性部分。
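+
+季节性模型通常记作 SARIMA$(p, d, q)(P, D, Q)_m$,其中 $m$ 是一个季节周期所包含的时间步数。下面以本课稍后手动选择的 `order = (4, 1, 0)`、`seasonal_order = (1, 1, 0, 24)` 为例,用滞后算子 $B$(定义为 $B y_t = y_{t-1}$)给出一种常见的写法;这只是一个示意,系数符号 $\phi_i$、$\Phi_1$ 和误差项 $\varepsilon_t$ 均为假设的记号:
+
+$$(1 - \phi_1 B - \phi_2 B^2 - \phi_3 B^3 - \phi_4 B^4)\,(1 - \Phi_1 B^{24})\,(1 - B)\,(1 - B^{24})\, y_t = \varepsilon_t$$
+
+四个因子依次对应 $p = 4$ 的自回归项、$P = 1$ 的季节性自回归项、$d = 1$ 的一阶差分和 $D = 1$ 的季节性差分(周期 $m = 24$ 小时);由于 $q = Q = 0$,等式右侧没有移动平均项。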
+
+1. 首先设置您偏好的时间范围值。我们尝试 3 小时:
+
+ ```python
+ # Specify the number of steps to forecast ahead
+ HORIZON = 3
+ print('Forecasting horizon:', HORIZON, 'hours')
+ ```
+
+ 为 ARIMA 模型选择最佳参数值可能具有挑战性,因为这一过程带有一定主观性且耗时。您可以考虑使用 [`pmdarima` 库(原名 `pyramid`)](https://alkaline-ml.com/pmdarima/0.9.0/modules/generated/pyramid.arima.auto_arima.html) 中的 `auto_arima()` 函数。
+
+1. 目前尝试一些手动选择以找到一个好的模型。
+
+ ```python
+ order = (4, 1, 0)
+ seasonal_order = (1, 1, 0, 24)
+
+ model = SARIMAX(endog=train, order=order, seasonal_order=seasonal_order)
+ results = model.fit()
+
+ print(results.summary())
+ ```
+
+ 打印出结果表。
+
+您已经构建了第一个模型!现在我们需要找到一种方法来评估它。
+
+### 评估您的模型
+
+为了评估您的模型,您可以执行所谓的 `逐步验证`。在实践中,每次有新数据可用时,时间序列模型都会重新训练。这使得模型能够在每个时间步进行最佳预测。
+
+使用此技术从时间序列的开头开始,在训练数据集上训练模型。然后对下一个时间步进行预测。预测结果与已知值进行评估。然后扩展训练集以包含已知值,并重复该过程。
+
+> 注意:为了更高效地训练,您应该保持训练集窗口固定,这样每次向训练集中添加新观测值时,您都会从集合的开头移除观测值。
+
+此过程提供了模型在实践中表现的更稳健估计。然而,这需要创建许多模型的计算成本。如果数据量较小或模型较简单,这是可以接受的,但在规模较大时可能会成为问题。
+
+逐步验证是时间序列模型评估的黄金标准,建议在您的项目中使用。
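+
+固定窗口的逐步验证可以用如下记号来示意(符号为说明而引入:$W$ 为训练窗口长度,$H$ 为预测步数,$\hat{y}_{t+h \mid t}$ 表示在时刻 $t$ 对 $t+h$ 做出的预测):
+
+$$\text{训练窗口:}\ \{y_{t-W+1}, \ldots, y_t\} \;\longrightarrow\; \text{预测:}\ \hat{y}_{t+1 \mid t}, \ldots, \hat{y}_{t+H \mid t}$$
+
+每得到一个新的观测值 $y_{t+1}$,窗口就整体向前移动一步,变为 $\{y_{t-W+2}, \ldots, y_{t+1}\}$,然后重新拟合并预测。在本课的代码中,$W = 720$(30 天),$H = 3$。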
+
+1. 首先,为每个时间范围步创建一个测试数据点。
+
+ ```python
+ test_shifted = test.copy()
+
+ for t in range(1, HORIZON+1):
+     test_shifted['load+'+str(t)] = test_shifted['load'].shift(-t, freq='H')
+
+ test_shifted = test_shifted.dropna(how='any')
+ test_shifted.head(5)
+ ```
+
+ | | load | load+1 | load+2 |
+ | ------------------- | ---- | ------ | ------ |
+ | 2014-12-30 00:00:00 | 0.33 | 0.29 | 0.27 |
+ | 2014-12-30 01:00:00 | 0.29 | 0.27 | 0.27 |
+ | 2014-12-30 02:00:00 | 0.27 | 0.27 | 0.30 |
+ | 2014-12-30 03:00:00 | 0.27 | 0.30 | 0.41 |
+ | 2014-12-30 04:00:00 | 0.30 | 0.41 | 0.57 |
+
+ 数据根据其时间范围点水平移动。
+
+1. 使用滑动窗口方法对测试数据进行预测,循环大小为测试数据长度:
+
+ ```python
+ %%time
+ training_window = 720 # dedicate 30 days (720 hours) for training
+
+ train_ts = train['load']
+ test_ts = test_shifted
+
+ history = [x for x in train_ts]
+ history = history[(-training_window):]
+
+ predictions = list()
+
+ order = (2, 1, 0)
+ seasonal_order = (1, 1, 0, 24)
+
+ for t in range(test_ts.shape[0]):
+     model = SARIMAX(endog=history, order=order, seasonal_order=seasonal_order)
+     model_fit = model.fit()
+     yhat = model_fit.forecast(steps = HORIZON)
+     predictions.append(yhat)
+     obs = list(test_ts.iloc[t])
+     # move the training window
+     history.append(obs[0])
+     history.pop(0)
+     print(test_ts.index[t])
+     print(t+1, ': predicted =', yhat, 'expected =', obs)
+ ```
+
+ 您可以观察训练过程:
+
+ ```output
+ 2014-12-30 00:00:00
+ 1 : predicted = [0.32 0.29 0.28] expected = [0.32945389435989236, 0.2900626678603402, 0.2739480752014323]
+
+ 2014-12-30 01:00:00
+ 2 : predicted = [0.3 0.29 0.3 ] expected = [0.2900626678603402, 0.2739480752014323, 0.26812891674127126]
+
+ 2014-12-30 02:00:00
+ 3 : predicted = [0.27 0.28 0.32] expected = [0.2739480752014323, 0.26812891674127126, 0.3025962399283795]
+ ```
+
+1. 将预测结果与实际负载进行比较:
+
+ ```python
+ eval_df = pd.DataFrame(predictions, columns=['t+'+str(t) for t in range(1, HORIZON+1)])
+ eval_df['timestamp'] = test.index[0:len(test.index)-HORIZON+1]
+ eval_df = pd.melt(eval_df, id_vars='timestamp', value_name='prediction', var_name='h')
+ eval_df['actual'] = np.array(np.transpose(test_ts)).ravel()
+ eval_df[['prediction', 'actual']] = scaler.inverse_transform(eval_df[['prediction', 'actual']])
+ eval_df.head()
+ ```
+
+ 输出:
+
+ | | timestamp | h | prediction | actual |
+ | --- | ------------------- | --- | ---------- | -------- |
+ | 0 | 2014-12-30 00:00:00 | t+1 | 3,008.74 | 3,023.00 |
+ | 1 | 2014-12-30 01:00:00 | t+1 | 2,955.53 | 2,935.00 |
+ | 2 | 2014-12-30 02:00:00 | t+1 | 2,900.17 | 2,899.00 |
+ | 3 | 2014-12-30 03:00:00 | t+1 | 2,917.69 | 2,886.00 |
+ | 4 | 2014-12-30 04:00:00 | t+1 | 2,946.99 | 2,963.00 |
+
+ 观察每小时数据的预测结果,与实际负载进行比较。准确性如何?
+
+### 检查模型准确性
+
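+通过计算所有预测的平均绝对百分比误差 (MAPE) 来检查模型的准确性。
+
+> **🧮 展示数学公式**
+>
+> $$\mathrm{MAPE} = \frac{100\%}{n} \sum_{t=1}^{n} \left| \frac{A_t - F_t}{A_t} \right|$$
+>
+> 其中 $A_t$ 为实际值,$F_t$ 为预测值,$n$ 为预测点的数量。[MAPE](https://www.linkedin.com/pulse/what-mape-mad-msd-time-series-allameh-statistics/) 以上述公式定义的比率表示预测准确性:先将实际值与预测值之差除以实际值。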
+>
+> “在此计算中,绝对值会对每个预测点进行求和,然后除以拟合点的数量 n。” [wikipedia](https://wikipedia.org/wiki/Mean_absolute_percentage_error)
+1. 用代码表示公式:
+
+ ```python
+ if(HORIZON > 1):
+     eval_df['APE'] = (eval_df['prediction'] - eval_df['actual']).abs() / eval_df['actual']
+     print(eval_df.groupby('h')['APE'].mean())
+ ```
+
+1. 计算单步预测的MAPE:
+
+ ```python
+ print('One step forecast MAPE: ', (mape(eval_df[eval_df['h'] == 't+1']['prediction'], eval_df[eval_df['h'] == 't+1']['actual']))*100, '%')
+ ```
+
+ 输出:`One step forecast MAPE: 0.5570581332313952 %`
+
+1. 打印多步预测的MAPE:
+
+ ```python
+ print('Multi-step forecast MAPE: ', mape(eval_df['prediction'], eval_df['actual'])*100, '%')
+ ```
+
+ ```output
+ Multi-step forecast MAPE: 1.1460048657704118 %
+ ```
+
+ 数值越低越好:例如,MAPE 为 10 表示预测的平均误差为 10%。
+
+1. 但正如往常一样,这种准确性测量通过可视化更容易理解,所以让我们绘制一下:
+
+ ```python
+ if(HORIZON == 1):
+     ## Plotting single step forecast
+     eval_df.plot(x='timestamp', y=['actual', 'prediction'], style=['r', 'b'], figsize=(15, 8))
+
+ else:
+     ## Plotting multi step forecast
+     plot_df = eval_df[(eval_df.h=='t+1')][['timestamp', 'actual']]
+     for t in range(1, HORIZON+1):
+         plot_df['t+'+str(t)] = eval_df[(eval_df.h=='t+'+str(t))]['prediction'].values
+
+     fig = plt.figure(figsize=(15, 8))
+     # Create the axes first, then draw every series on the same axes
+     ax = fig.add_subplot(111)
+     ax.plot(plot_df['timestamp'], plot_df['actual'], color='red', linewidth=4.0, label='actual')
+     for t in range(1, HORIZON+1):
+         x = plot_df['timestamp'][(t-1):]
+         y = plot_df['t+'+str(t)][0:len(x)]
+         # Label each forecast line so that the legend has entries to show
+         ax.plot(x, y, color='blue', linewidth=4*math.pow(.9,t), alpha=math.pow(0.8,t), label='t+'+str(t))
+
+     ax.legend(loc='best')
+
+ plt.xlabel('timestamp', fontsize=12)
+ plt.ylabel('load', fontsize=12)
+ plt.show()
+ ```
+
+ 
+
+🏆 非常棒的图表,展示了一个具有良好准确性的模型。干得好!
+
+---
+
+## 🚀挑战
+
+深入研究测试时间序列模型准确性的方法。本课中我们提到了MAPE,但还有其他方法可以使用吗?研究它们并进行注释。可以参考[这份文档](https://otexts.com/fpp2/accuracy.html)。
+
+## [课后测验](https://ff-quizzes.netlify.app/en/ml/)
+
+## 复习与自学
+
+本课仅涉及ARIMA时间序列预测的基础知识。花些时间通过研究[这个仓库](https://microsoft.github.io/forecasting/)及其各种模型类型,深入了解其他构建时间序列模型的方法。
+
+## 作业
+
+[一个新的ARIMA模型](assignment.md)
+
+---
+
+**免责声明**:
+本文档使用AI翻译服务[Co-op Translator](https://github.com/Azure/co-op-translator)进行翻译。尽管我们努力确保准确性,但请注意,自动翻译可能包含错误或不准确之处。应以原始语言的文档作为权威来源。对于关键信息,建议使用专业人工翻译。因使用本翻译而导致的任何误解或误读,我们概不负责。
\ No newline at end of file
diff --git a/translations/zh-CN/7-TimeSeries/2-ARIMA/assignment.md b/translations/zh-CN/7-TimeSeries/2-ARIMA/assignment.md
new file mode 100644
index 000000000..67cfc44e3
--- /dev/null
+++ b/translations/zh-CN/7-TimeSeries/2-ARIMA/assignment.md
@@ -0,0 +1,16 @@
+# 一个新的ARIMA模型
+
+## 说明
+
+现在您已经构建了一个ARIMA模型,请使用新的数据集构建一个新的模型(可以尝试使用[杜克大学的这些数据集](http://www2.stat.duke.edu/~mw/ts_data_sets.html))。在笔记本中记录您的工作,可视化数据和模型,并使用MAPE测试其准确性。
+
+## 评分标准
+
+| 标准 | 卓越 | 合格 | 需要改进 |
+| -------- | ------------------------------------------------------------------------------------------------------------------- | -------------------------------------------------------- | ----------------------------------- |
+| | 提交的笔记本中包含一个新的ARIMA模型,经过测试并通过可视化和准确性说明进行解释。 | 提交的笔记本未进行注释或存在错误 | 提交的笔记本不完整 |
+
+---
+
+**免责声明**:
+本文档使用AI翻译服务[Co-op Translator](https://github.com/Azure/co-op-translator)进行翻译。尽管我们努力确保翻译的准确性,但请注意,自动翻译可能包含错误或不准确之处。原始语言的文档应被视为权威来源。对于关键信息,建议使用专业人工翻译。我们不对因使用此翻译而产生的任何误解或误读承担责任。
\ No newline at end of file
diff --git a/translations/zh-CN/7-TimeSeries/2-ARIMA/solution/Julia/README.md b/translations/zh-CN/7-TimeSeries/2-ARIMA/solution/Julia/README.md
new file mode 100644
index 000000000..779236745
--- /dev/null
+++ b/translations/zh-CN/7-TimeSeries/2-ARIMA/solution/Julia/README.md
@@ -0,0 +1,6 @@
+
+
+---
+
+**免责声明**:
+本文档使用AI翻译服务 [Co-op Translator](https://github.com/Azure/co-op-translator) 进行翻译。尽管我们努力确保翻译的准确性,但请注意,自动翻译可能包含错误或不准确之处。原始语言的文档应被视为权威来源。对于关键信息,建议使用专业人工翻译。我们不对因使用此翻译而产生的任何误解或误读承担责任。
\ No newline at end of file
diff --git a/translations/zh-CN/7-TimeSeries/2-ARIMA/solution/R/README.md b/translations/zh-CN/7-TimeSeries/2-ARIMA/solution/R/README.md
new file mode 100644
index 000000000..ba3fc1469
--- /dev/null
+++ b/translations/zh-CN/7-TimeSeries/2-ARIMA/solution/R/README.md
@@ -0,0 +1,6 @@
+这是一个临时占位符
+
+---
+
+**免责声明**:
+本文档使用AI翻译服务 [Co-op Translator](https://github.com/Azure/co-op-translator) 进行翻译。尽管我们努力确保翻译的准确性,但请注意,自动翻译可能包含错误或不准确之处。原始语言的文档应被视为权威来源。对于关键信息,建议使用专业人工翻译。我们不对因使用此翻译而产生的任何误解或误读承担责任。
\ No newline at end of file
diff --git a/translations/zh-CN/7-TimeSeries/2-ARIMA/solution/notebook.ipynb b/translations/zh-CN/7-TimeSeries/2-ARIMA/solution/notebook.ipynb
new file mode 100644
index 000000000..e0e051feb
--- /dev/null
+++ b/translations/zh-CN/7-TimeSeries/2-ARIMA/solution/notebook.ipynb
@@ -0,0 +1,1135 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Tao Hong、Pierre Pinson、Shu Fan、Hamidreza Zareipour、Alberto Troccoli 和 Rob J. Hyndman,“概率能源预测:全球能源预测竞赛 2014 及未来”,《国际预测杂志》,第 32 卷,第 3 期,第 896-913 页,2016 年 7 月至 9 月。\n"
+ ],
+ "metadata": {}
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "## 安装依赖项\n",
+ "首先安装一些必要的依赖项。这些库及其对应版本已知可以正常运行解决方案:\n",
+ "\n",
+ "* `statsmodels == 0.12.2`\n",
+ "* `matplotlib == 3.4.2`\n",
+ "* `scikit-learn == 0.24.2`\n"
+ ],
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 16,
+ "source": [
+ "!pip install statsmodels"
+ ],
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "/bin/sh: pip: command not found\n"
+ ]
+ }
+ ],
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 17,
+ "source": [
+ "import os\n",
+ "import warnings\n",
+ "import matplotlib.pyplot as plt\n",
+ "import numpy as np\n",
+ "import pandas as pd\n",
+ "import datetime as dt\n",
+ "import math\n",
+ "\n",
+ "from pandas.plotting import autocorrelation_plot\n",
+ "from statsmodels.tsa.statespace.sarimax import SARIMAX\n",
+ "from sklearn.preprocessing import MinMaxScaler\n",
+ "from common.utils import load_data, mape\n",
+ "from IPython.display import Image\n",
+ "\n",
+ "%matplotlib inline\n",
+ "pd.options.display.float_format = '{:,.2f}'.format\n",
+ "np.set_printoptions(precision=2)\n",
+ "warnings.filterwarnings(\"ignore\") # specify to ignore warning messages\n"
+ ],
+ "outputs": [],
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 18,
+ "source": [
+ "energy = load_data('./data')[['load']]\n",
+ "energy.head(10)"
+ ],
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ " load\n",
+ "2012-01-01 00:00:00 2,698.00\n",
+ "2012-01-01 01:00:00 2,558.00\n",
+ "2012-01-01 02:00:00 2,444.00\n",
+ "2012-01-01 03:00:00 2,402.00\n",
+ "2012-01-01 04:00:00 2,403.00\n",
+ "2012-01-01 05:00:00 2,453.00\n",
+ "2012-01-01 06:00:00 2,560.00\n",
+ "2012-01-01 07:00:00 2,719.00\n",
+ "2012-01-01 08:00:00 2,916.00\n",
+ "2012-01-01 09:00:00 3,105.00"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 18
+ }
+ ],
+ "metadata": {}
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "绘制所有可用的负载数据(2012年1月至2014年12月)\n"
+ ],
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 19,
+ "source": [
+ "energy.plot(y='load', subplots=True, figsize=(15, 8), fontsize=12)\n",
+ "plt.xlabel('timestamp', fontsize=12)\n",
+ "plt.ylabel('load', fontsize=12)\n",
+ "plt.show()"
+ ],
+ "outputs": [
+ {
+ "output_type": "display_data",
+ "data": {
+ "image/png": "",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ }
+ }
+ ],
+ "metadata": {}
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
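+ "## Create training and testing data sets\n",
+ "\n",
+ "### Why the data set matters\n",
+ "In machine learning, the data set is central to both training and evaluating a model. A good data set can markedly improve model performance, while an imbalanced or low-quality one can lead to poor results.\n",
+ "\n",
+ "### Splitting the data set\n",
+ "A data set is usually divided into three parts:\n",
+ "- **Training set**: used to fit the model.\n",
+ "- **Validation set**: used to tune model parameters and select the best model.\n",
+ "- **Test set**: used to evaluate the model's final performance.\n",
+ "\n",
+ "### How to split the data set\n",
+ "Some common split ratios:\n",
+ "- 70% training, 15% validation, 15% test.\n",
+ "- With a large amount of data, consider 80% training and 10% each for validation and test.\n",
+ "\n",
+ "### Notes\n",
+ "- Keep the distribution consistent across the splits, so that the training and test sets do not differ in distribution.\n",
+ "- If the data set is small, cross-validation can make the evaluation more reliable.\n",
+ "- For time series such as the load data in this notebook, split chronologically rather than by shuffling, so that the test period follows the training period. The cells below do this with `train_start_dt` and `test_start_dt`.\n",
+ "\n",
+ "### Example code\n",
+ "The following simple example shows how to use @@INLINE_CODE_x@@ to split a data set:\n",
+ "\n",
+ "```python\n",
+ "# Import the required library\n",
+ "import random\n",
+ "\n",
+ "# Define the data set\n",
+ "data = ['sample1', 'sample2', 'sample3', 'sample4', 'sample5']\n",
+ "\n",
+ "# Shuffle the data randomly (suitable for independent samples, not for time series)\n",
+ "random.shuffle(data)\n",
+ "\n",
+ "# Split the data set\n",
+ "train_data = data[:3]\n",
+ "test_data = data[3:]\n",
+ "\n",
+ "print(\"Training set:\", train_data)\n",
+ "print(\"Test set:\", test_data)\n",
+ "```\n",
+ "\n",
+ "### Tip\n",
+ "In real projects, using a library (such as @@INLINE_CODE_x@@ or @@INLINE_CODE_x@@) can simplify splitting a data set.\n",
+ "\n",
+ "### Summary\n",
+ "Splitting the data set is an important step in a machine learning project. A sensible split makes training and evaluation more reliable, which improves how well the model works in practice.\n"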
+ ],
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 20,
+ "source": [
+ "train_start_dt = '2014-11-01 00:00:00'\n",
+ "test_start_dt = '2014-12-30 00:00:00' "
+ ],
+ "outputs": [],
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 21,
+ "source": [
+ "energy[(energy.index < test_start_dt) & (energy.index >= train_start_dt)][['load']].rename(columns={'load':'train'}) \\\n",
+ " .join(energy[test_start_dt:][['load']].rename(columns={'load':'test'}), how='outer') \\\n",
+ " .plot(y=['train', 'test'], figsize=(15, 8), fontsize=12)\n",
+ "plt.xlabel('timestamp', fontsize=12)\n",
+ "plt.ylabel('load', fontsize=12)\n",
+ "plt.show()"
+ ],
+ "outputs": [
+ {
+ "output_type": "display_data",
+ "data": {
+ "image/png": "",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ }
+ }
+ ],
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 22,
+ "source": [
+ "train = energy.copy()[(energy.index >= train_start_dt) & (energy.index < test_start_dt)][['load']]\n",
+ "test = energy.copy()[energy.index >= test_start_dt][['load']]\n",
+ "\n",
+ "print('Training data shape: ', train.shape)\n",
+ "print('Test data shape: ', test.shape)"
+ ],
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Training data shape: (1416, 1)\n",
+ "Test data shape: (48, 1)\n"
+ ]
+ }
+ ],
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 23,
+ "source": [
+ "scaler = MinMaxScaler()\n",
+ "train['load'] = scaler.fit_transform(train)\n",
+ "train.head(10)"
+ ],
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/html": [
+ "<table border=\"1\" class=\"dataframe\">\n",
+ "  <thead>\n",
+ "    <tr><th></th><th>load</th></tr>\n",
+ "  </thead>\n",
+ "  <tbody>\n",
+ "    <tr><th>2014-11-01 00:00:00</th><td>0.10</td></tr>\n",
+ "    <tr><th>2014-11-01 01:00:00</th><td>0.07</td></tr>\n",
+ "    <tr><th>2014-11-01 02:00:00</th><td>0.05</td></tr>\n",
+ "    <tr><th>2014-11-01 03:00:00</th><td>0.04</td></tr>\n",
+ "    <tr><th>2014-11-01 04:00:00</th><td>0.06</td></tr>\n",
+ "    <tr><th>2014-11-01 05:00:00</th><td>0.10</td></tr>\n",
+ "    <tr><th>2014-11-01 06:00:00</th><td>0.19</td></tr>\n",
+ "    <tr><th>2014-11-01 07:00:00</th><td>0.31</td></tr>\n",
+ "    <tr><th>2014-11-01 08:00:00</th><td>0.40</td></tr>\n",
+ "    <tr><th>2014-11-01 09:00:00</th><td>0.48</td></tr>\n",
+ "  </tbody>\n",
+ "</table>"
+ ],
+ "text/plain": [
+ " load\n",
+ "2014-11-01 00:00:00 0.10\n",
+ "2014-11-01 01:00:00 0.07\n",
+ "2014-11-01 02:00:00 0.05\n",
+ "2014-11-01 03:00:00 0.04\n",
+ "2014-11-01 04:00:00 0.06\n",
+ "2014-11-01 05:00:00 0.10\n",
+ "2014-11-01 06:00:00 0.19\n",
+ "2014-11-01 07:00:00 0.31\n",
+ "2014-11-01 08:00:00 0.40\n",
+ "2014-11-01 09:00:00 0.48"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 23
+ }
+ ],
+ "metadata": {}
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Original vs scaled data:\n"
+ ],
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 24,
+ "source": [
+ "energy[(energy.index >= train_start_dt) & (energy.index < test_start_dt)][['load']].rename(columns={'load':'original load'}).plot.hist(bins=100, fontsize=12)\n",
+ "train.rename(columns={'load':'scaled load'}).plot.hist(bins=100, fontsize=12)\n",
+ "plt.show()"
+ ],
+ "outputs": [
+ {
+ "output_type": "display_data",
+ "data": {
+ "image/png": "",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ }
+ },
+ {
+ "output_type": "display_data",
+ "data": {
+ "image/png": "",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ }
+ }
+ ],
+ "metadata": {}
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Let's also scale the test data\n"
+ ],
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 25,
+ "source": [
+ "test['load'] = scaler.transform(test)\n",
+ "test.head()"
+ ],
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/html": [
+ "<table border=\"1\" class=\"dataframe\">\n",
+ "  <thead>\n",
+ "    <tr><th></th><th>load</th></tr>\n",
+ "  </thead>\n",
+ "  <tbody>\n",
+ "    <tr><th>2014-12-30 00:00:00</th><td>0.33</td></tr>\n",
+ "    <tr><th>2014-12-30 01:00:00</th><td>0.29</td></tr>\n",
+ "    <tr><th>2014-12-30 02:00:00</th><td>0.27</td></tr>\n",
+ "    <tr><th>2014-12-30 03:00:00</th><td>0.27</td></tr>\n",
+ "    <tr><th>2014-12-30 04:00:00</th><td>0.30</td></tr>\n",
+ "  </tbody>\n",
+ "</table>"
+ ],
+ "text/plain": [
+ " load\n",
+ "2014-12-30 00:00:00 0.33\n",
+ "2014-12-30 01:00:00 0.29\n",
+ "2014-12-30 02:00:00 0.27\n",
+ "2014-12-30 03:00:00 0.27\n",
+ "2014-12-30 04:00:00 0.30"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 25
+ }
+ ],
+ "metadata": {}
+ },
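+ {
+ "cell_type": "markdown",
+ "source": [
+ "As a rough sketch of what the two scaling cells above compute, assuming `MinMaxScaler` is scikit-learn's scaler with its default output range of 0 to 1: the scaler learns the minimum and maximum of the training load and maps each value $x$ to\n",
+ "\n",
+ "$$x' = \\frac{x - x_{min}}{x_{max} - x_{min}}$$\n",
+ "\n",
+ "The test data reuses the training $x_{min}$ and $x_{max}$ (`transform`, not `fit_transform`), so scaled test values are not guaranteed to stay inside that range.\n",
+ "\n",
+ "The values printed in this notebook are consistent with $x_{min} = 2287$ and $x_{max} = 4521$. These two numbers are inferred from the printed scaled and unscaled loads; the notebook does not state them. Under that assumption, the load of 3,023.00 listed for 2014-12-30 00:00:00 in the comparison table further down scales to\n",
+ "\n",
+ "$$x' = \\frac{3023 - 2287}{4521 - 2287} = \\frac{736}{2234} \\approx 0.33$$\n",
+ "\n",
+ "which agrees with the first row of `test.head()` above.\n"
+ ],
+ "metadata": {}
+ },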
+ {
+ "cell_type": "markdown",
+ "source": [],
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 26,
+ "source": [
+ "# Specify the number of steps to forecast ahead\n",
+ "HORIZON = 3\n",
+ "print('Forecasting horizon:', HORIZON, 'hours')"
+ ],
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Forecasting horizon: 3 hours\n"
+ ]
+ }
+ ],
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 27,
+ "source": [
+ "order = (4, 1, 0)\n",
+ "seasonal_order = (1, 1, 0, 24)\n",
+ "\n",
+ "model = SARIMAX(endog=train, order=order, seasonal_order=seasonal_order)\n",
+ "results = model.fit()\n",
+ "\n",
+ "print(results.summary())\n"
+ ],
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ " SARIMAX Results \n",
+ "==========================================================================================\n",
+ "Dep. Variable: load No. Observations: 1416\n",
+ "Model: SARIMAX(4, 1, 0)x(1, 1, 0, 24) Log Likelihood 3477.239\n",
+ "Date: Thu, 30 Sep 2021 AIC -6942.477\n",
+ "Time: 14:36:28 BIC -6911.050\n",
+ "Sample: 11-01-2014 HQIC -6930.725\n",
+ " - 12-29-2014 \n",
+ "Covariance Type: opg \n",
+ "==============================================================================\n",
+ " coef std err z P>|z| [0.025 0.975]\n",
+ "------------------------------------------------------------------------------\n",
+ "ar.L1 0.8403 0.016 52.226 0.000 0.809 0.872\n",
+ "ar.L2 -0.5220 0.034 -15.388 0.000 -0.588 -0.456\n",
+ "ar.L3 0.1536 0.044 3.470 0.001 0.067 0.240\n",
+ "ar.L4 -0.0778 0.036 -2.158 0.031 -0.148 -0.007\n",
+ "ar.S.L24 -0.2327 0.024 -9.718 0.000 -0.280 -0.186\n",
+ "sigma2 0.0004 8.32e-06 47.358 0.000 0.000 0.000\n",
+ "===================================================================================\n",
+ "Ljung-Box (L1) (Q): 0.05 Jarque-Bera (JB): 1464.60\n",
+ "Prob(Q): 0.83 Prob(JB): 0.00\n",
+ "Heteroskedasticity (H): 0.84 Skew: 0.14\n",
+ "Prob(H) (two-sided): 0.07 Kurtosis: 8.02\n",
+ "===================================================================================\n",
+ "\n",
+ "Warnings:\n",
+ "[1] Covariance matrix calculated using the outer product of gradients (complex-step).\n"
+ ]
+ }
+ ],
+ "metadata": {}
+ },
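+ {
+ "cell_type": "markdown",
+ "source": [
+ "To make the `order` and `seasonal_order` arguments above more concrete, here is a sketch of the model they describe in standard SARIMA notation. The symbols $B$ (the backshift operator, $B y_t = y_{t-1}$), $\\phi_i$, $\\Phi_1$ and $\\varepsilon_t$ are notation introduced for this sketch and are not names from the notebook. With `order = (4, 1, 0)` (four autoregressive lags, one difference, no moving-average terms) and `seasonal_order = (1, 1, 0, 24)` (one seasonal autoregressive lag, one seasonal difference, a period of 24 hours), the model can be written as\n",
+ "\n",
+ "$$(1 - \\phi_1 B - \\phi_2 B^2 - \\phi_3 B^3 - \\phi_4 B^4)(1 - \\Phi_1 B^{24})(1 - B)(1 - B^{24})\\, y_t = \\varepsilon_t$$\n",
+ "\n",
+ "In the summary above, `ar.L1` to `ar.L4` would then correspond to $\\phi_1, \\dots, \\phi_4$, `ar.S.L24` to $\\Phi_1$, and `sigma2` to the variance of $\\varepsilon_t$, assuming statsmodels' usual sign convention for autoregressive coefficients.\n"
+ ],
+ "metadata": {}
+ },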
+ {
+ "cell_type": "markdown",
+ "source": [
+ "## Evaluate the model\n"
+ ],
+ "metadata": {}
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Create a test data point for each HORIZON step.\n"
+ ],
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 28,
+ "source": [
+ "test_shifted = test.copy()\n",
+ "\n",
+ "for t in range(1, HORIZON):\n",
+ "    test_shifted['load+'+str(t)] = test_shifted['load'].shift(-t, freq='H')\n",
+ "\n",
+ "test_shifted = test_shifted.dropna(how='any')\n",
+ "test_shifted.head(5)"
+ ],
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/html": [
+ "<table border=\"1\" class=\"dataframe\">\n",
+ "  <thead>\n",
+ "    <tr><th></th><th>load</th><th>load+1</th><th>load+2</th></tr>\n",
+ "  </thead>\n",
+ "  <tbody>\n",
+ "    <tr><th>2014-12-30 00:00:00</th><td>0.33</td><td>0.29</td><td>0.27</td></tr>\n",
+ "    <tr><th>2014-12-30 01:00:00</th><td>0.29</td><td>0.27</td><td>0.27</td></tr>\n",
+ "    <tr><th>2014-12-30 02:00:00</th><td>0.27</td><td>0.27</td><td>0.30</td></tr>\n",
+ "    <tr><th>2014-12-30 03:00:00</th><td>0.27</td><td>0.30</td><td>0.41</td></tr>\n",
+ "    <tr><th>2014-12-30 04:00:00</th><td>0.30</td><td>0.41</td><td>0.57</td></tr>\n",
+ "  </tbody>\n",
+ "</table>"
+ ],
+ "text/plain": [
+ " load load+1 load+2\n",
+ "2014-12-30 00:00:00 0.33 0.29 0.27\n",
+ "2014-12-30 01:00:00 0.29 0.27 0.27\n",
+ "2014-12-30 02:00:00 0.27 0.27 0.30\n",
+ "2014-12-30 03:00:00 0.27 0.30 0.41\n",
+ "2014-12-30 04:00:00 0.30 0.41 0.57"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 28
+ }
+ ],
+ "metadata": {}
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Make predictions on the test data\n"
+ ],
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 29,
+ "source": [
+ "%%time\n",
+ "training_window = 720 # dedicate 30 days (720 hours) for training\n",
+ "\n",
+ "train_ts = train['load']\n",
+ "test_ts = test_shifted\n",
+ "\n",
+ "history = [x for x in train_ts]\n",
+ "history = history[(-training_window):]\n",
+ "\n",
+ "predictions = list()\n",
+ "\n",
+ "# let's use a simpler model for demonstration\n",
+ "order = (2, 1, 0)\n",
+ "seasonal_order = (1, 1, 0, 24)\n",
+ "\n",
+ "for t in range(test_ts.shape[0]):\n",
+ "    model = SARIMAX(endog=history, order=order, seasonal_order=seasonal_order)\n",
+ "    model_fit = model.fit()\n",
+ "    yhat = model_fit.forecast(steps=HORIZON)\n",
+ "    predictions.append(yhat)\n",
+ "    obs = list(test_ts.iloc[t])\n",
+ "    # move the training window\n",
+ "    history.append(obs[0])\n",
+ "    history.pop(0)\n",
+ "    print(test_ts.index[t])\n",
+ "    print(t+1, ': predicted =', yhat, 'expected =', obs)"
+ ],
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "2014-12-30 00:00:00\n",
+ "1 : predicted = [0.32 0.29 0.28] expected = [0.32945389435989236, 0.2900626678603402, 0.2739480752014323]\n",
+ "2014-12-30 01:00:00\n",
+ "2 : predicted = [0.3 0.29 0.3 ] expected = [0.2900626678603402, 0.2739480752014323, 0.26812891674127126]\n",
+ "2014-12-30 02:00:00\n",
+ "3 : predicted = [0.27 0.28 0.32] expected = [0.2739480752014323, 0.26812891674127126, 0.3025962399283795]\n",
+ "2014-12-30 03:00:00\n",
+ "4 : predicted = [0.28 0.32 0.42] expected = [0.26812891674127126, 0.3025962399283795, 0.40823634735899716]\n",
+ "2014-12-30 04:00:00\n",
+ "5 : predicted = [0.3 0.39 0.54] expected = [0.3025962399283795, 0.40823634735899716, 0.5689346463742166]\n",
+ "2014-12-30 05:00:00\n",
+ "6 : predicted = [0.4 0.55 0.66] expected = [0.40823634735899716, 0.5689346463742166, 0.6799462846911368]\n",
+ "2014-12-30 06:00:00\n",
+ "7 : predicted = [0.57 0.68 0.75] expected = [0.5689346463742166, 0.6799462846911368, 0.7309758281110115]\n",
+ "2014-12-30 07:00:00\n",
+ "8 : predicted = [0.68 0.75 0.8 ] expected = [0.6799462846911368, 0.7309758281110115, 0.7511190689346463]\n",
+ "2014-12-30 08:00:00\n",
+ "9 : predicted = [0.75 0.8 0.82] expected = [0.7309758281110115, 0.7511190689346463, 0.7636526410026856]\n",
+ "2014-12-30 09:00:00\n",
+ "10 : predicted = [0.77 0.78 0.78] expected = [0.7511190689346463, 0.7636526410026856, 0.7381378692927483]\n",
+ "2014-12-30 10:00:00\n",
+ "11 : predicted = [0.76 0.75 0.74] expected = [0.7636526410026856, 0.7381378692927483, 0.7188898836168307]\n",
+ "2014-12-30 11:00:00\n",
+ "12 : predicted = [0.77 0.76 0.75] expected = [0.7381378692927483, 0.7188898836168307, 0.7090420769919425]\n",
+ "2014-12-30 12:00:00\n",
+ "13 : predicted = [0.7 0.68 0.69] expected = [0.7188898836168307, 0.7090420769919425, 0.7081468218442255]\n",
+ "2014-12-30 13:00:00\n",
+ "14 : predicted = [0.72 0.73 0.76] expected = [0.7090420769919425, 0.7081468218442255, 0.7385854968666068]\n",
+ "2014-12-30 14:00:00\n",
+ "15 : predicted = [0.71 0.73 0.86] expected = [0.7081468218442255, 0.7385854968666068, 0.8478066248880931]\n",
+ "2014-12-30 15:00:00\n",
+ "16 : predicted = [0.73 0.85 0.97] expected = [0.7385854968666068, 0.8478066248880931, 0.9516562220232765]\n",
+ "2014-12-30 16:00:00\n",
+ "17 : predicted = [0.87 0.99 0.97] expected = [0.8478066248880931, 0.9516562220232765, 0.934198746642793]\n",
+ "2014-12-30 17:00:00\n",
+ "18 : predicted = [0.94 0.92 0.86] expected = [0.9516562220232765, 0.934198746642793, 0.8876454789615038]\n",
+ "2014-12-30 18:00:00\n",
+ "19 : predicted = [0.94 0.89 0.82] expected = [0.934198746642793, 0.8876454789615038, 0.8294538943598924]\n",
+ "2014-12-30 19:00:00\n",
+ "20 : predicted = [0.88 0.82 0.71] expected = [0.8876454789615038, 0.8294538943598924, 0.7197851387645477]\n",
+ "2014-12-30 20:00:00\n",
+ "21 : predicted = [0.83 0.72 0.58] expected = [0.8294538943598924, 0.7197851387645477, 0.5747538048343777]\n",
+ "2014-12-30 21:00:00\n",
+ "22 : predicted = [0.72 0.58 0.47] expected = [0.7197851387645477, 0.5747538048343777, 0.4592658907788718]\n",
+ "2014-12-30 22:00:00\n",
+ "23 : predicted = [0.58 0.47 0.39] expected = [0.5747538048343777, 0.4592658907788718, 0.3858549686660697]\n",
+ "2014-12-30 23:00:00\n",
+ "24 : predicted = [0.46 0.38 0.34] expected = [0.4592658907788718, 0.3858549686660697, 0.34377797672336596]\n",
+ "2014-12-31 00:00:00\n",
+ "25 : predicted = [0.38 0.34 0.33] expected = [0.3858549686660697, 0.34377797672336596, 0.32542524619516544]\n",
+ "2014-12-31 01:00:00\n",
+ "26 : predicted = [0.36 0.34 0.34] expected = [0.34377797672336596, 0.32542524619516544, 0.33034914950760963]\n",
+ "2014-12-31 02:00:00\n",
+ "27 : predicted = [0.32 0.32 0.35] expected = [0.32542524619516544, 0.33034914950760963, 0.3706356311548791]\n",
+ "2014-12-31 03:00:00\n",
+ "28 : predicted = [0.32 0.36 0.47] expected = [0.33034914950760963, 0.3706356311548791, 0.470008952551477]\n",
+ "2014-12-31 04:00:00\n",
+ "29 : predicted = [0.37 0.48 0.65] expected = [0.3706356311548791, 0.470008952551477, 0.6145926589077886]\n",
+ "2014-12-31 05:00:00\n",
+ "30 : predicted = [0.48 0.64 0.75] expected = [0.470008952551477, 0.6145926589077886, 0.7247090420769919]\n",
+ "2014-12-31 06:00:00\n",
+ "31 : predicted = [0.63 0.73 0.79] expected = [0.6145926589077886, 0.7247090420769919, 0.786034019695613]\n",
+ "2014-12-31 07:00:00\n",
+ "32 : predicted = [0.71 0.76 0.79] expected = [0.7247090420769919, 0.786034019695613, 0.8012533572068039]\n",
+ "2014-12-31 08:00:00\n",
+ "33 : predicted = [0.79 0.82 0.83] expected = [0.786034019695613, 0.8012533572068039, 0.7994628469113696]\n",
+ "2014-12-31 09:00:00\n",
+ "34 : predicted = [0.82 0.83 0.81] expected = [0.8012533572068039, 0.7994628469113696, 0.780214861235452]\n",
+ "2014-12-31 10:00:00\n",
+ "35 : predicted = [0.8 0.78 0.76] expected = [0.7994628469113696, 0.780214861235452, 0.7587287376902416]\n",
+ "2014-12-31 11:00:00\n",
+ "36 : predicted = [0.77 0.75 0.74] expected = [0.780214861235452, 0.7587287376902416, 0.7367949865711727]\n",
+ "2014-12-31 12:00:00\n",
+ "37 : predicted = [0.77 0.76 0.76] expected = [0.7587287376902416, 0.7367949865711727, 0.7188898836168307]\n",
+ "2014-12-31 13:00:00\n",
+ "38 : predicted = [0.75 0.75 0.78] expected = [0.7367949865711727, 0.7188898836168307, 0.7273948075201431]\n",
+ "2014-12-31 14:00:00\n",
+ "39 : predicted = [0.73 0.75 0.87] expected = [0.7188898836168307, 0.7273948075201431, 0.8299015219337511]\n",
+ "2014-12-31 15:00:00\n",
+ "40 : predicted = [0.74 0.85 0.96] expected = [0.7273948075201431, 0.8299015219337511, 0.909579230080573]\n",
+ "2014-12-31 16:00:00\n",
+ "41 : predicted = [0.83 0.94 0.93] expected = [0.8299015219337511, 0.909579230080573, 0.855863921217547]\n",
+ "2014-12-31 17:00:00\n",
+ "42 : predicted = [0.94 0.93 0.88] expected = [0.909579230080573, 0.855863921217547, 0.7721575649059982]\n",
+ "2014-12-31 18:00:00\n",
+ "43 : predicted = [0.87 0.82 0.77] expected = [0.855863921217547, 0.7721575649059982, 0.7023276633840643]\n",
+ "2014-12-31 19:00:00\n",
+ "44 : predicted = [0.79 0.73 0.63] expected = [0.7721575649059982, 0.7023276633840643, 0.6195165622202325]\n",
+ "2014-12-31 20:00:00\n",
+ "45 : predicted = [0.7 0.59 0.46] expected = [0.7023276633840643, 0.6195165622202325, 0.5425246195165621]\n",
+ "2014-12-31 21:00:00\n",
+ "46 : predicted = [0.6 0.47 0.36] expected = [0.6195165622202325, 0.5425246195165621, 0.4735899731423454]\n",
+ "CPU times: user 12min 15s, sys: 2min 39s, total: 14min 54s\n",
+ "Wall time: 2min 36s\n"
+ ]
+ }
+ ],
+ "metadata": {
+ "scrolled": true
+ }
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Compare the predictions with the actual load\n"
+ ],
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 30,
+ "source": [
+ "eval_df = pd.DataFrame(predictions, columns=['t+'+str(t) for t in range(1, HORIZON+1)])\n",
+ "eval_df['timestamp'] = test.index[0:len(test.index)-HORIZON+1]\n",
+ "eval_df = pd.melt(eval_df, id_vars='timestamp', value_name='prediction', var_name='h')\n",
+ "eval_df['actual'] = np.array(np.transpose(test_ts)).ravel()\n",
+ "eval_df[['prediction', 'actual']] = scaler.inverse_transform(eval_df[['prediction', 'actual']])\n",
+ "eval_df.head()"
+ ],
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/html": [
+ "<table border=\"1\" class=\"dataframe\">\n",
+ "  <thead>\n",
+ "    <tr><th></th><th>timestamp</th><th>h</th><th>prediction</th><th>actual</th></tr>\n",
+ "  </thead>\n",
+ "  <tbody>\n",
+ "    <tr><th>0</th><td>2014-12-30 00:00:00</td><td>t+1</td><td>3,008.74</td><td>3,023.00</td></tr>\n",
+ "    <tr><th>1</th><td>2014-12-30 01:00:00</td><td>t+1</td><td>2,955.53</td><td>2,935.00</td></tr>\n",
+ "    <tr><th>2</th><td>2014-12-30 02:00:00</td><td>t+1</td><td>2,900.17</td><td>2,899.00</td></tr>\n",
+ "    <tr><th>3</th><td>2014-12-30 03:00:00</td><td>t+1</td><td>2,917.69</td><td>2,886.00</td></tr>\n",
+ "    <tr><th>4</th><td>2014-12-30 04:00:00</td><td>t+1</td><td>2,946.99</td><td>2,963.00</td></tr>\n",
+ "  </tbody>\n",
+ "</table>"
+ ],
+ "text/plain": [
+ " timestamp h prediction actual\n",
+ "0 2014-12-30 00:00:00 t+1 3,008.74 3,023.00\n",
+ "1 2014-12-30 01:00:00 t+1 2,955.53 2,935.00\n",
+ "2 2014-12-30 02:00:00 t+1 2,900.17 2,899.00\n",
+ "3 2014-12-30 03:00:00 t+1 2,917.69 2,886.00\n",
+ "4 2014-12-30 04:00:00 t+1 2,946.99 2,963.00"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 30
+ }
+ ],
+ "metadata": {}
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Compute the **mean absolute percentage error (MAPE)** over all predictions\n",
+ "\n",
+ "$$MAPE = \\frac{1}{n} \\sum_{t=1}^{n}|\\frac{actual_t - predicted_t}{actual_t}|$$\n"
+ ],
+ "metadata": {}
+ },
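+ {
+ "cell_type": "markdown",
+ "source": [
+ "A small worked example of the formula above, using the first two rows of the comparison table shown earlier (predictions 3,008.74 and 2,955.53 against actual loads 3,023.00 and 2,935.00). The table values are rounded to two decimals, so the results here are approximate:\n",
+ "\n",
+ "$$\\left|\\frac{3023.00 - 3008.74}{3023.00}\\right| = \\frac{14.26}{3023.00} \\approx 0.0047, \\qquad \\left|\\frac{2935.00 - 2955.53}{2935.00}\\right| = \\frac{20.53}{2935.00} \\approx 0.0070$$\n",
+ "\n",
+ "Averaging just these two terms gives $(0.0047 + 0.0070)/2 \\approx 0.0059$, or about 0.59%. The MAPE reported below averages the same quantity over every prediction.\n"
+ ],
+ "metadata": {}
+ },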
+ {
+ "cell_type": "code",
+ "execution_count": 31,
+ "source": [
+ "if HORIZON > 1:\n",
+ "    eval_df['APE'] = (eval_df['prediction'] - eval_df['actual']).abs() / eval_df['actual']\n",
+ "    print(eval_df.groupby('h')['APE'].mean())"
+ ],
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "h\n",
+ "t+1 0.01\n",
+ "t+2 0.01\n",
+ "t+3 0.02\n",
+ "Name: APE, dtype: float64\n"
+ ]
+ }
+ ],
+ "metadata": {}
+ },
+ {
+ "cell_type": "markdown",
+ "source": [],
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 32,
+ "source": [
+ "print('One step forecast MAPE: ', (mape(eval_df[eval_df['h'] == 't+1']['prediction'], eval_df[eval_df['h'] == 't+1']['actual']))*100, '%')"
+ ],
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "One step forecast MAPE: 0.5570581332313952 %\n"
+ ]
+ }
+ ],
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 33,
+ "source": [
+ "print('Multi-step forecast MAPE: ', mape(eval_df['prediction'], eval_df['actual'])*100, '%')"
+ ],
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Multi-step forecast MAPE: 1.1460048657704118 %\n"
+ ]
+ }
+ ],
+ "metadata": {}
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
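+ "Plot the predicted values against the actual values over the test set, which here covers 48 hours (2014-12-30 to 2014-12-31)\n"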
+ ],
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 34,
+ "source": [
+ "if HORIZON == 1:\n",
+ "    # Plot the single-step forecast\n",
+ "    eval_df.plot(x='timestamp', y=['actual', 'prediction'], style=['r', 'b'], figsize=(15, 8))\n",
+ "\n",
+ "else:\n",
+ "    # Plot the multi-step forecast: one column of predictions per horizon\n",
+ "    plot_df = eval_df[(eval_df.h=='t+1')][['timestamp', 'actual']]\n",
+ "    for t in range(1, HORIZON+1):\n",
+ "        plot_df['t+'+str(t)] = eval_df[(eval_df.h=='t+'+str(t))]['prediction'].values\n",
+ "\n",
+ "    # Create the axes first so the actuals and the forecasts share them\n",
+ "    fig, ax = plt.subplots(figsize=(15, 8))\n",
+ "    ax.plot(plot_df['timestamp'], plot_df['actual'], color='red', linewidth=4.0, label='actual')\n",
+ "    for t in range(1, HORIZON+1):\n",
+ "        x = plot_df['timestamp'][(t-1):]\n",
+ "        y = plot_df['t+'+str(t)][0:len(x)]\n",
+ "        ax.plot(x, y, color='blue', linewidth=4*math.pow(.9,t), alpha=math.pow(0.8,t), label='t+'+str(t))\n",
+ "\n",
+ "    # Every line now carries a label, so the legend has entries to show\n",
+ "    ax.legend(loc='best')\n",
+ "\n",
+ "plt.xlabel('timestamp', fontsize=12)\n",
+ "plt.ylabel('load', fontsize=12)\n",
+ "plt.show()"
+ ],
+ "outputs": [
+ {
+ "output_type": "display_data",
+ "data": {
+ "image/png": "",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ }
+ }
+ ],
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "source": [],
+ "outputs": [],
+ "metadata": {}
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
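+ "\n---\n\n**Disclaimer**: \nThis document was translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The document in its original language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.\n"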
+ ]
+ }
+ ],
+ "metadata": {
+ "kernel_info": {
+ "name": "python3"
+ },
+ "kernelspec": {
+ "name": "python3",
+ "display_name": "Python 3.7.0 64-bit"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.7.0"
+ },
+ "nteract": {
+ "version": "nteract-front-end@1.0.0"
+ },
+ "metadata": {
+ "interpreter": {
+ "hash": "70b38d7a306a849643e446cd70466270a13445e5987dfa1344ef2b127438fa4d"
+ }
+ },
+ "interpreter": {
+ "hash": "70b38d7a306a849643e446cd70466270a13445e5987dfa1344ef2b127438fa4d"
+ },
+ "coopTranslator": {
+ "original_hash": "c193140200b9684da27e3890211391b6",
+ "translation_date": "2025-09-03T19:54:16+00:00",
+ "source_file": "7-TimeSeries/2-ARIMA/solution/notebook.ipynb",
+ "language_code": "zh"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2
+}
\ No newline at end of file
diff --git a/translations/zh-CN/7-TimeSeries/2-ARIMA/working/notebook.ipynb b/translations/zh-CN/7-TimeSeries/2-ARIMA/working/notebook.ipynb
new file mode 100644
index 000000000..fcb883143
--- /dev/null
+++ b/translations/zh-CN/7-TimeSeries/2-ARIMA/working/notebook.ipynb
@@ -0,0 +1,50 @@
+{
+ "metadata": {
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": 3
+ },
+ "orig_nbformat": 2,
+ "coopTranslator": {
+ "original_hash": "523ec472196307b3c4235337353c9ceb",
+ "translation_date": "2025-09-03T19:55:40+00:00",
+ "source_file": "7-TimeSeries/2-ARIMA/working/notebook.ipynb",
+ "language_code": "zh"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2,
+ "cells": [
+ {
+ "source": [
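+ "Tao Hong, Pierre Pinson, Shu Fan, Hamidreza Zareipour, Alberto Troccoli and Rob J. Hyndman, \"Probabilistic energy forecasting: Global Energy Forecasting Competition 2014 and beyond\", International Journal of Forecasting, vol. 32, no. 3, pp. 896-913, July-September 2016.\n"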
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "pip install statsmodels"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
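+ "\n---\n\n**Disclaimer**:  \nThis document was translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The document in its original language should be considered the authoritative source. For important information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.\n"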
+ ]
+ }
+ ]
+}
\ No newline at end of file
diff --git a/translations/zh-CN/7-TimeSeries/3-SVR/README.md b/translations/zh-CN/7-TimeSeries/3-SVR/README.md
new file mode 100644
index 000000000..481611734
--- /dev/null
+++ b/translations/zh-CN/7-TimeSeries/3-SVR/README.md
@@ -0,0 +1,384 @@
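+# Time Series Forecasting with Support Vector Regressor
+
+In the previous lesson, you learned how to use the ARIMA model to make time series predictions. Now you'll look at the Support Vector Regressor (SVR) model, a regression model used to predict continuous data.
+
+## [Pre-lecture quiz](https://ff-quizzes.netlify.app/en/ml/)
+
+## Introduction
+
+In this lesson, you will learn how to build regression models with [**SVM** (Support Vector Machine)](https://en.wikipedia.org/wiki/Support-vector_machine), that is, **SVR (Support Vector Regressor)**.
+
+### SVR in the context of time series [^1]
+
+Before understanding the importance of SVR in time series prediction, here are some key concepts you need to know:
+
+- **Regression:** A supervised learning technique that predicts continuous values from a given set of inputs. The idea is to fit a curve (or line) in the feature space that follows the data points as closely as possible. [Click here](https://en.wikipedia.org/wiki/Regression_analysis) for more information.
+- **Support Vector Machine (SVM):** A type of supervised machine learning model used for classification, regression and outlier detection. The model is a hyperplane in the feature space, which acts as a boundary in classification and as the best-fit line in regression. SVM generally uses a kernel function to map the dataset into a higher-dimensional space where it is easier to separate. [Click here](https://en.wikipedia.org/wiki/Support-vector_machine) for more information on SVMs.
+- **Support Vector Regressor (SVR):** A type of SVM that finds the best-fit line (a hyperplane, in SVM terms) such that as many data points as possible lie within a small tolerance of it.
+
+### Why SVR? [^1]
+
+In the last lesson you learned about ARIMA, a very successful statistical linear method for forecasting time series data. However, in many cases time series data is *non-linear*, and linear models cannot capture that. In such cases, the ability of SVM to handle non-linearity in regression tasks makes SVR successful in time series forecasting.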
+
+## Exercise - build an SVR model
+
+The first few steps of data preparation are the same as in the previous lesson on [ARIMA](https://github.com/microsoft/ML-For-Beginners/tree/main/7-TimeSeries/2-ARIMA).
+
+Open the [_/working_](https://github.com/microsoft/ML-For-Beginners/tree/main/7-TimeSeries/3-SVR/working) folder in this lesson and find the [_notebook.ipynb_](https://github.com/microsoft/ML-For-Beginners/blob/main/7-TimeSeries/3-SVR/working/notebook.ipynb) file.[^2]
+
+1. Run the notebook and import the necessary libraries:[^2]
+
+ ```python
+ import sys
+ sys.path.append('../../')
+ ```
+
+ ```python
+ import os
+ import warnings
+ import matplotlib.pyplot as plt
+ import numpy as np
+ import pandas as pd
+ import datetime as dt
+ import math
+
+ from sklearn.svm import SVR
+ from sklearn.preprocessing import MinMaxScaler
+ from common.utils import load_data, mape
+ ```
+
+2. Load the data from the `/data/energy.csv` file into a Pandas dataframe and take a look:[^2]
+
+ ```python
+ energy = load_data('../../data')[['load']]
+ ```
+
+3. Plot all the available energy data from January 2012 to December 2014:[^2]
+
+ ```python
+ energy.plot(y='load', subplots=True, figsize=(15, 8), fontsize=12)
+ plt.xlabel('timestamp', fontsize=12)
+ plt.ylabel('load', fontsize=12)
+ plt.show()
+ ```
+
+ 
+
+ 现在,让我们构建 SVR 模型。
+
+### 创建训练集和测试集
+
+现在数据已经加载,你可以将其分为训练集和测试集。接着,你需要对数据进行重塑,以创建基于时间步长的数据集,这是 SVR 所需的。你将在训练集上训练模型。训练完成后,你将在训练集、测试集以及完整数据集上评估模型的准确性,以查看整体性能。需要确保测试集覆盖的时间段晚于训练集,以避免模型从未来时间段中获取信息[^2](这种情况称为*过拟合*)。
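
The chronological split described above can be sketched in plain Python. This is a minimal illustration, not the lesson's code: the four-day hourly series and its load values are made up, and only the cutoff date mirrors `test_start_dt`.

```python
from datetime import datetime, timedelta

# Hypothetical hourly series: (timestamp, load) pairs covering four days.
start = datetime(2014, 12, 28)
series = [(start + timedelta(hours=h), 3000.0 + h) for h in range(96)]

# Everything strictly before the cutoff is training data; the rest is test data.
test_start = datetime(2014, 12, 30)
train = [(ts, v) for ts, v in series if ts < test_start]
test = [(ts, v) for ts, v in series if ts >= test_start]

# The latest training timestamp must precede the earliest test timestamp.
assert max(ts for ts, _ in train) < min(ts for ts, _ in test)
print(len(train), len(test))  # 48 48
```

Splitting on a timestamp rather than shuffling rows is what keeps future observations out of the training set.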
+
+1. Allocate the period from November 1 to December 29, 2014 to the training set. The test set will cover the last two days, December 30 and 31, 2014:[^2]
+
+ ```python
+ train_start_dt = '2014-11-01 00:00:00'
+ test_start_dt = '2014-12-30 00:00:00'
+ ```
+
+2. Visualize the differences:[^2]
+
+ ```python
+ energy[(energy.index < test_start_dt) & (energy.index >= train_start_dt)][['load']].rename(columns={'load':'train'}) \
+ .join(energy[test_start_dt:][['load']].rename(columns={'load':'test'}), how='outer') \
+ .plot(y=['train', 'test'], figsize=(15, 8), fontsize=12)
+ plt.xlabel('timestamp', fontsize=12)
+ plt.ylabel('load', fontsize=12)
+ plt.show()
+ ```
+
+ 
+
+### 准备训练数据
+
+现在,你需要通过过滤和缩放数据来准备训练数据。过滤数据集以仅包含所需的时间段和列,并通过缩放将数据投影到 0 到 1 的区间内。
+
+1. 过滤原始数据集,仅包含上述时间段的数据集,并仅保留所需的“load”列和日期:[^2]
+
+ ```python
+ train = energy.copy()[(energy.index >= train_start_dt) & (energy.index < test_start_dt)][['load']]
+ test = energy.copy()[energy.index >= test_start_dt][['load']]
+
+ print('Training data shape: ', train.shape)
+ print('Test data shape: ', test.shape)
+ ```
+
+ ```output
+ Training data shape: (1416, 1)
+ Test data shape: (48, 1)
+ ```
+
+2. Scale the training data to be in the range (0, 1):[^2]
+
+ ```python
+ scaler = MinMaxScaler()
+ train['load'] = scaler.fit_transform(train)
+ ```
+
+3. Now, scale the testing data:[^2]
+
+ ```python
+ test['load'] = scaler.transform(test)
+ ```
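
The two scaling steps above can be sketched without scikit-learn to show what `fit_transform` and `transform` compute. The helper names `fit_min_max` and `transform` and the load values are hypothetical; the formula is the standard min-max one, `(x - min) / (max - min)`.

```python
def fit_min_max(values):
    """Return the (min, max) pair learned from the training values."""
    return min(values), max(values)

def transform(values, lo, hi):
    """Map values to the scale on which lo -> 0.0 and hi -> 1.0."""
    return [(v - lo) / (hi - lo) for v in values]

# Hypothetical load readings, not taken from energy.csv.
train_loads = [2000.0, 2500.0, 3000.0, 4000.0]
test_loads = [2200.0, 4400.0]

lo, hi = fit_min_max(train_loads)  # statistics come from the training set only
train_scaled = transform(train_loads, lo, hi)
test_scaled = transform(test_loads, lo, hi)

print(train_scaled)  # [0.0, 0.25, 0.5, 1.0]
print(test_scaled)   # the second value lies above 1.0
```

Because the minimum and maximum come from the training set alone, test values outside the training range fall outside [0, 1]. That is expected, and it avoids leaking test statistics into training.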
+
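+### Create data with time-steps [^1]
+
+For the SVR, you transform the input data to be of the form `[batch, timesteps]`. So, you reshape the existing `train_data` and `test_data` so that there is a new dimension that represents the timesteps.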
+
+```python
+# Converting to numpy arrays
+train_data = train.values
+test_data = test.values
+```
+
+For this example, we take `timesteps = 5`. So, the inputs to the model are the data for the first 4 timesteps, and the output is the data for the 5th timestep.
+
+```python
+timesteps=5
+```
+
+Convert the training data to a 2D tensor using a nested list comprehension:
+
+```python
+train_data_timesteps=np.array([[j for j in train_data[i:i+timesteps]] for i in range(0,len(train_data)-timesteps+1)])[:,:,0]
+train_data_timesteps.shape
+```
+
+```output
+(1412, 5)
+```
+
+Convert the testing data to a 2D tensor:
+
+```python
+test_data_timesteps=np.array([[j for j in test_data[i:i+timesteps]] for i in range(0,len(test_data)-timesteps+1)])[:,:,0]
+test_data_timesteps.shape
+```
+
+```output
+(44, 5)
+```
+
+Select the inputs and outputs from the training and testing data:
+
+```python
+x_train, y_train = train_data_timesteps[:,:timesteps-1],train_data_timesteps[:,[timesteps-1]]
+x_test, y_test = test_data_timesteps[:,:timesteps-1],test_data_timesteps[:,[timesteps-1]]
+
+print(x_train.shape, y_train.shape)
+print(x_test.shape, y_test.shape)
+```
+
+```output
+(1412, 4) (1412, 1)
+(44, 4) (44, 1)
+```
+
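+### Implement SVR [^1]
+
+Now it's time to implement SVR. To read more about this implementation, you can refer to [this documentation](https://scikit-learn.org/stable/modules/generated/sklearn.svm.SVR.html). For our implementation, we follow these steps:
+
+1. Define the model by calling `SVR()` and passing in the model hyperparameters: kernel, gamma, C and epsilon.
+2. Fit the model to the training data by calling the `fit()` function.
+3. Make predictions by calling the `predict()` function.
+
+Now we create an SVR model. Here we use the [RBF kernel](https://scikit-learn.org/stable/modules/svm.html#parameters-of-the-rbf-kernel), and set the hyperparameters gamma, C and epsilon to 0.5, 10 and 0.05 respectively.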
+
+```python
+model = SVR(kernel='rbf',gamma=0.5, C=10, epsilon = 0.05)
+```
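
To give a feel for what `gamma` and `epsilon` control, here is a minimal plain-Python sketch of the RBF kernel and the epsilon-insensitive loss that SVR is built on. The function names and sample points are illustrative assumptions, not part of scikit-learn's API.

```python
import math

def rbf_kernel(a, b, gamma):
    """RBF kernel: exp(-gamma * squared Euclidean distance between a and b)."""
    sq_dist = sum((ai - bi) ** 2 for ai, bi in zip(a, b))
    return math.exp(-gamma * sq_dist)

def epsilon_insensitive_loss(y_true, y_pred, epsilon):
    """Errors no larger than epsilon cost nothing; larger ones are charged the excess."""
    return max(0.0, abs(y_true - y_pred) - epsilon)

gamma, epsilon = 0.5, 0.05  # same values as in the lesson

a = [0.1, 0.2, 0.3, 0.4]
print(rbf_kernel(a, a, gamma))                         # 1.0 for identical points
print(rbf_kernel(a, [1.1, 0.2, 0.3, 0.4], gamma))      # close to exp(-0.5)
print(epsilon_insensitive_loss(0.50, 0.53, epsilon))   # 0.0, inside the tube
print(epsilon_insensitive_loss(0.50, 0.60, epsilon))   # about 0.05, the excess
```

A larger `gamma` makes the kernel value drop faster with distance, so each training point has a more local influence. `C` scales how strongly errors beyond `epsilon` are penalized; in scikit-learn the regularization strength is inversely proportional to `C`.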
+
+#### Fit the model on training data [^1]
+
+```python
+model.fit(x_train, y_train[:,0])
+```
+
+```output
+SVR(C=10, cache_size=200, coef0=0.0, degree=3, epsilon=0.05, gamma=0.5,
+ kernel='rbf', max_iter=-1, shrinking=True, tol=0.001, verbose=False)
+```
+
+#### Make model predictions [^1]
+
+```python
+y_train_pred = model.predict(x_train).reshape(-1,1)
+y_test_pred = model.predict(x_test).reshape(-1,1)
+
+print(y_train_pred.shape, y_test_pred.shape)
+```
+
+```output
+(1412, 1) (44, 1)
+```
+
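+You've built your SVR! Now we need to evaluate it.
+
+### Evaluate your model [^1]
+
+To evaluate the model, we first scale the data back to its original scale. Then, to check the performance, we plot the original and predicted time series and print the MAPE result.
+
+Scale the predicted and original outputs back to the original scale: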
+
+```python
+# Scaling the predictions
+y_train_pred = scaler.inverse_transform(y_train_pred)
+y_test_pred = scaler.inverse_transform(y_test_pred)
+
+print(len(y_train_pred), len(y_test_pred))
+```
+
+```python
+# Scaling the original values
+y_train = scaler.inverse_transform(y_train)
+y_test = scaler.inverse_transform(y_test)
+
+print(len(y_train), len(y_test))
+```
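
The inverse scaling and the MAPE metric can be sketched in plain Python. This assumes that `mape` from `common.utils` computes the mean of `|prediction - actual| / |actual|`; the helpers and numbers below are hypothetical stand-ins, not the lesson's data.

```python
def inverse_transform(scaled, lo, hi):
    """Undo min-max scaling: map 0.0 back to lo and 1.0 back to hi."""
    return [s * (hi - lo) + lo for s in scaled]

def mape(predictions, actuals):
    """Mean of |prediction - actual| / |actual|, as a fraction (not a percentage)."""
    return sum(abs(p - a) / abs(a) for p, a in zip(predictions, actuals)) / len(actuals)

lo, hi = 2000.0, 4000.0  # hypothetical training minimum and maximum
actual_scaled = [0.25, 0.5, 1.0]
pred_scaled = [0.275, 0.45, 1.0]

actual = inverse_transform(actual_scaled, lo, hi)  # [2500.0, 3000.0, 4000.0]
pred = inverse_transform(pred_scaled, lo, hi)

print(mape(pred, actual) * 100, '%')  # roughly 1.78 %
```

MAPE is computed after the inverse transform because the ratio depends on the scale of the actual values: on scaled data, actual values near 0 would inflate the error.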
+
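+#### Check model performance on training and testing data [^1]
+
+We extract the timestamps from the dataset to show on the x-axis of the plot. Note that the first `timesteps-1` values serve as the input for the first output, so the timestamps for the outputs start after them.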
+
+```python
+train_timestamps = energy[(energy.index < test_start_dt) & (energy.index >= train_start_dt)].index[timesteps-1:]
+test_timestamps = energy[test_start_dt:].index[timesteps-1:]
+
+print(len(train_timestamps), len(test_timestamps))
+```
+
+```output
+1412 44
+```
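
The offset can be checked with a small plain-Python sketch. The 48 hourly timestamps mirror the test period; the variable names are illustrative.

```python
from datetime import datetime, timedelta

timesteps = 5
start = datetime(2014, 12, 30)
timestamps = [start + timedelta(hours=h) for h in range(48)]  # 48 hourly test points

# Window i covers timestamps[i : i + timesteps]; its target is the last one.
target_timestamps = timestamps[timesteps - 1:]

print(len(target_timestamps))  # 44
print(target_timestamps[0])    # 2014-12-30 04:00:00
```

The first prediction therefore belongs to the fifth hour of the period, and the number of timestamps matches the 44 test samples.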
+
+Plot the predictions for the training data:
+
+```python
+plt.figure(figsize=(25,6))
+plt.plot(train_timestamps, y_train, color = 'red', linewidth=2.0, alpha = 0.6)
+plt.plot(train_timestamps, y_train_pred, color = 'blue', linewidth=0.8)
+plt.legend(['Actual','Predicted'])
+plt.xlabel('Timestamp')
+plt.title("Training data prediction")
+plt.show()
+```
+
+
+
+打印训练数据的 MAPE:
+
+```python
+print('MAPE for training data: ', mape(y_train_pred, y_train)*100, '%')
+```
+
+```output
+MAPE for training data: 1.7195710200875551 %
+```
+
+Plot the predictions for the testing data:
+
+```python
+plt.figure(figsize=(10,3))
+plt.plot(test_timestamps, y_test, color = 'red', linewidth=2.0, alpha = 0.6)
+plt.plot(test_timestamps, y_test_pred, color = 'blue', linewidth=0.8)
+plt.legend(['Actual','Predicted'])
+plt.xlabel('Timestamp')
+plt.show()
+```
+
+
+
+打印测试数据的 MAPE:
+
+```python
+print('MAPE for testing data: ', mape(y_test_pred, y_test)*100, '%')
+```
+
+```output
+MAPE for testing data: 1.2623790187854018 %
+```
+
+🏆 You have a very good result on the testing dataset!
+
+### Check model performance on the full dataset [^1]
+
+```python
+# Extracting load values as numpy array
+data = energy.copy().values
+
+# Scaling
+data = scaler.transform(data)
+
+# Transforming to 2D tensor as per model input requirement
+data_timesteps=np.array([[j for j in data[i:i+timesteps]] for i in range(0,len(data)-timesteps+1)])[:,:,0]
+print("Tensor shape: ", data_timesteps.shape)
+
+# Selecting inputs and outputs from data
+X, Y = data_timesteps[:,:timesteps-1],data_timesteps[:,[timesteps-1]]
+print("X shape: ", X.shape,"\nY shape: ", Y.shape)
+```
+
+```output
+Tensor shape: (26300, 5)
+X shape: (26300, 4)
+Y shape: (26300, 1)
+```
+
+```python
+# Make model predictions
+Y_pred = model.predict(X).reshape(-1,1)
+
+# Inverse scale and reshape
+Y_pred = scaler.inverse_transform(Y_pred)
+Y = scaler.inverse_transform(Y)
+```
+
+```python
+plt.figure(figsize=(30,8))
+plt.plot(Y, color = 'red', linewidth=2.0, alpha = 0.6)
+plt.plot(Y_pred, color = 'blue', linewidth=0.8)
+plt.legend(['Actual','Predicted'])
+plt.xlabel('Timestamp')
+plt.show()
+```
+
+
+
+```python
+print('MAPE: ', mape(Y_pred, Y)*100, '%')
+```
+
+```output
+MAPE: 2.0572089029888656 %
+```
+
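+🏆 Very nice plots, showing a model with good accuracy. Well done!
+
+---
+
+## 🚀Challenge
+
+- Try tweaking the hyperparameters (gamma, C, epsilon) when creating the model and evaluate on the data to see which set of hyperparameters gives the best results on the testing data. To learn more about these hyperparameters, you can refer to [the documentation here](https://scikit-learn.org/stable/modules/svm.html#parameters-of-the-rbf-kernel).
+- Try using different kernel functions for the model and analyze their performance on the dataset. The relevant documentation can be found [here](https://scikit-learn.org/stable/modules/svm.html#kernel-functions).
+- Try using different values of `timesteps` and observe how the model's predictions change.
+
+## [Post-lecture quiz](https://ff-quizzes.netlify.app/en/ml/)
+
+## Review & Self Study
+
+This lesson introduced the application of SVR to time series forecasting. To read more about SVR, you can refer to [this blog](https://www.analyticsvidhya.com/blog/2020/03/support-vector-regression-tutorial-for-machine-learning/). The [scikit-learn documentation](https://scikit-learn.org/stable/modules/svm.html) provides a more comprehensive explanation of SVMs in general, of [SVRs](https://scikit-learn.org/stable/modules/svm.html#regression), and of other implementation details such as the different [kernel functions](https://scikit-learn.org/stable/modules/svm.html#kernel-functions) that can be used and their parameters.
+
+## Assignment
+
+[A new SVR model](assignment.md)
+
+## Credits
+
+[^1]: The text, code and output in this section were contributed by [@AnirbanMukherjeeXD](https://github.com/AnirbanMukherjeeXD)
+[^2]: The text, code and output in this section were taken from [ARIMA](https://github.com/microsoft/ML-For-Beginners/tree/main/7-TimeSeries/2-ARIMA)
+
+---
+
+**Disclaimer**:
+This document was translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The document in its original language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.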
\ No newline at end of file
diff --git a/translations/zh-CN/7-TimeSeries/3-SVR/assignment.md b/translations/zh-CN/7-TimeSeries/3-SVR/assignment.md
new file mode 100644
index 000000000..33e487d4a
--- /dev/null
+++ b/translations/zh-CN/7-TimeSeries/3-SVR/assignment.md
@@ -0,0 +1,18 @@
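+# A new SVR model
+
+## Instructions [^1]
+
+Now that you have built an SVR model, build a new one with fresh data (try one of [these datasets from Duke](http://www2.stat.duke.edu/~mw/ts_data_sets.html)). Annotate your work in a notebook, visualize the data and your model, and test its accuracy using appropriate plots and MAPE. Also try tweaking the different hyperparameters and using different values for the timesteps.
+
+## Rubric [^1]
+
+| Criteria | Exemplary | Adequate | Needs Improvement |
+| -------- | ------------------------------------------------------------ | --------------------------------------------------------- | ----------------------------------- |
+| | A notebook is presented with an SVR model built, tested and explained with visualizations and accuracy stated. | The notebook presented is not annotated or contains bugs. | An incomplete notebook is presented. |
+
+[^1]: The text in this section was based on the [assignment from ARIMA](https://github.com/microsoft/ML-For-Beginners/tree/main/7-TimeSeries/2-ARIMA/assignment.md).
+
+---
+
+**Disclaimer**:
+This document was translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The document in its original language should be considered the authoritative source. For important information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.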
\ No newline at end of file
diff --git a/translations/zh-CN/7-TimeSeries/3-SVR/solution/notebook.ipynb b/translations/zh-CN/7-TimeSeries/3-SVR/solution/notebook.ipynb
new file mode 100644
index 000000000..7b4443b5b
--- /dev/null
+++ b/translations/zh-CN/7-TimeSeries/3-SVR/solution/notebook.ipynb
@@ -0,0 +1,1029 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "fv9OoQsMFk5A"
+ },
+ "source": [
+ "# Time series forecasting with Support Vector Regressor\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
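+ "In this notebook, we demonstrate how to:\n",
+ "\n",
+ "- prepare 2D time series data for training an SVM regressor model\n",
+ "- implement SVR using the RBF kernel\n",
+ "- evaluate the model using plots and MAPE\n"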
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Importing modules\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import sys\n",
+ "sys.path.append('../../')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {
+ "id": "M687KNlQFp0-"
+ },
+ "outputs": [],
+ "source": [
+ "import os\n",
+ "import warnings\n",
+ "import matplotlib.pyplot as plt\n",
+ "import numpy as np\n",
+ "import pandas as pd\n",
+ "import datetime as dt\n",
+ "import math\n",
+ "\n",
+ "from sklearn.svm import SVR\n",
+ "from sklearn.preprocessing import MinMaxScaler\n",
+ "from common.utils import load_data, mape"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "Cj-kfVdMGjWP"
+ },
+ "source": [
+ "## Preparing data\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "8fywSjC6GsRz"
+ },
+ "source": [
+ "### Load data\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 363
+ },
+ "id": "aBDkEB11Fumg",
+ "outputId": "99cf7987-0509-4b73-8cc2-75d7da0d2740"
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
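+ "<div>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ " <thead>\n",
+ " <tr>\n",
+ " <th></th>\n",
+ " <th>load</th>\n",
+ " </tr>\n",
+ " </thead>\n",
+ " <tbody>\n",
+ " <tr>\n",
+ " <th>2012-01-01 00:00:00</th>\n",
+ " <td>2698.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>2012-01-01 01:00:00</th>\n",
+ " <td>2558.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>2012-01-01 02:00:00</th>\n",
+ " <td>2444.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>2012-01-01 03:00:00</th>\n",
+ " <td>2402.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>2012-01-01 04:00:00</th>\n",
+ " <td>2403.0</td>\n",
+ " </tr>\n",
+ " </tbody>\n",
+ "</table>\n",
+ "</div>"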
+ ],
+ "text/plain": [
+ " load\n",
+ "2012-01-01 00:00:00 2698.0\n",
+ "2012-01-01 01:00:00 2558.0\n",
+ "2012-01-01 02:00:00 2444.0\n",
+ "2012-01-01 03:00:00 2402.0\n",
+ "2012-01-01 04:00:00 2403.0"
+ ]
+ },
+ "execution_count": 3,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "energy = load_data('../../data')[['load']]\n",
+ "energy.head(5)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "O0BWP13rGnh4"
+ },
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 486
+ },
+ "id": "hGaNPKu_Gidk",
+ "outputId": "7f89b326-9057-4f49-efbe-cb100ebdf76d"
+ },
+ "outputs": [
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "energy.plot(y='load', subplots=True, figsize=(15, 8), fontsize=12)\n",
+ "plt.xlabel('timestamp', fontsize=12)\n",
+ "plt.ylabel('load', fontsize=12)\n",
+ "plt.show()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "IPuNor4eGwYY"
+ },
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {
+ "id": "ysvsNyONGt0Q"
+ },
+ "outputs": [],
+ "source": [
+ "train_start_dt = '2014-11-01 00:00:00'\n",
+ "test_start_dt = '2014-12-30 00:00:00'"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 548
+ },
+ "id": "SsfdLoPyGy9w",
+ "outputId": "d6d6c25b-b1f4-47e5-91d1-707e043237d7"
+ },
+ "outputs": [
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "energy[(energy.index < test_start_dt) & (energy.index >= train_start_dt)][['load']].rename(columns={'load':'train'}) \\\n",
+ " .join(energy[test_start_dt:][['load']].rename(columns={'load':'test'}), how='outer') \\\n",
+ " .plot(y=['train', 'test'], figsize=(15, 8), fontsize=12)\n",
+ "plt.xlabel('timestamp', fontsize=12)\n",
+ "plt.ylabel('load', fontsize=12)\n",
+ "plt.show()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "XbFTqBw6G1Ch"
+ },
+ "source": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Now, you need to prepare the data for training by filtering and scaling it.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "cYivRdQpHDj3",
+ "outputId": "a138f746-461c-4fd6-bfa6-0cee094c4aa1"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Training data shape: (1416, 1)\n",
+ "Test data shape: (48, 1)\n"
+ ]
+ }
+ ],
+ "source": [
+ "train = energy.copy()[(energy.index >= train_start_dt) & (energy.index < test_start_dt)][['load']]\n",
+ "test = energy.copy()[energy.index >= test_start_dt][['load']]\n",
+ "\n",
+ "print('Training data shape: ', train.shape)\n",
+ "print('Test data shape: ', test.shape)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Scale the data to be in the range (0, 1).\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 363
+ },
+ "id": "3DNntGQnZX8G",
+ "outputId": "210046bc-7a66-4ccd-d70d-aa4a7309949c"
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
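+ "<div>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ " <thead>\n",
+ " <tr>\n",
+ " <th></th>\n",
+ " <th>load</th>\n",
+ " </tr>\n",
+ " </thead>\n",
+ " <tbody>\n",
+ " <tr>\n",
+ " <th>2014-11-01 00:00:00</th>\n",
+ " <td>0.101611</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>2014-11-01 01:00:00</th>\n",
+ " <td>0.065801</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>2014-11-01 02:00:00</th>\n",
+ " <td>0.046106</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>2014-11-01 03:00:00</th>\n",
+ " <td>0.042525</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>2014-11-01 04:00:00</th>\n",
+ " <td>0.059087</td>\n",
+ " </tr>\n",
+ " </tbody>\n",
+ "</table>\n",
+ "</div>"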
+ ],
+ "text/plain": [
+ " load\n",
+ "2014-11-01 00:00:00 0.101611\n",
+ "2014-11-01 01:00:00 0.065801\n",
+ "2014-11-01 02:00:00 0.046106\n",
+ "2014-11-01 03:00:00 0.042525\n",
+ "2014-11-01 04:00:00 0.059087"
+ ]
+ },
+ "execution_count": 8,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "scaler = MinMaxScaler()\n",
+ "train['load'] = scaler.fit_transform(train)\n",
+ "train.head(5)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 206
+ },
+ "id": "26Yht-rzZexe",
+ "outputId": "20326077-a38a-4e78-cc5b-6fd7af95d301"
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
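+ "<div>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ " <thead>\n",
+ " <tr>\n",
+ " <th></th>\n",
+ " <th>load</th>\n",
+ " </tr>\n",
+ " </thead>\n",
+ " <tbody>\n",
+ " <tr>\n",
+ " <th>2014-12-30 00:00:00</th>\n",
+ " <td>0.329454</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>2014-12-30 01:00:00</th>\n",
+ " <td>0.290063</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>2014-12-30 02:00:00</th>\n",
+ " <td>0.273948</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>2014-12-30 03:00:00</th>\n",
+ " <td>0.268129</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>2014-12-30 04:00:00</th>\n",
+ " <td>0.302596</td>\n",
+ " </tr>\n",
+ " </tbody>\n",
+ "</table>\n",
+ "</div>"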
+ ],
+ "text/plain": [
+ " load\n",
+ "2014-12-30 00:00:00 0.329454\n",
+ "2014-12-30 01:00:00 0.290063\n",
+ "2014-12-30 02:00:00 0.273948\n",
+ "2014-12-30 03:00:00 0.268129\n",
+ "2014-12-30 04:00:00 0.302596"
+ ]
+ },
+ "execution_count": 9,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "test['load'] = scaler.transform(test)\n",
+ "test.head(5)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "x0n6jqxOQ41Z"
+ },
+ "source": [
+ "### Creating data with time-steps\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "fdmxTZtOQ8xs"
+ },
+ "source": [
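+ "For our SVR, we transform the input data to be of the form `[batch, timesteps]`. So, we reshape the existing `train_data` and `test_data` so that there is a new dimension that represents the timesteps. For our example, we take `timesteps = 5`. So, the inputs to the model are the data for the first 4 timesteps, and the output is the data for the 5th timestep.\n"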
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "metadata": {
+ "id": "Rpju-Sc2HFm0"
+ },
+ "outputs": [],
+ "source": [
+ "# Converting to numpy arrays\n",
+ "\n",
+ "train_data = train.values\n",
+ "test_data = test.values"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Selecting the timesteps\n",
+ "\n",
+ "timesteps=5"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "O-JrsrsVJhUQ",
+ "outputId": "c90dbe71-bacc-4ec4-b452-f82fe5aefaef"
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(1412, 5)"
+ ]
+ },
+ "execution_count": 12,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Converting data to 2D tensor\n",
+ "\n",
+ "train_data_timesteps=np.array([[j for j in train_data[i:i+timesteps]] for i in range(0,len(train_data)-timesteps+1)])[:,:,0]\n",
+ "train_data_timesteps.shape"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 13,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "exJD8AI7KE4g",
+ "outputId": "ce90260c-f327-427d-80f2-77307b5a6318"
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "(44, 5)"
+ ]
+ },
+ "execution_count": 13,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Converting test data to 2D tensor\n",
+ "\n",
+ "test_data_timesteps=np.array([[j for j in test_data[i:i+timesteps]] for i in range(0,len(test_data)-timesteps+1)])[:,:,0]\n",
+ "test_data_timesteps.shape"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 14,
+ "metadata": {
+ "id": "2u0R2sIsLuq5"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "(1412, 4) (1412, 1)\n",
+ "(44, 4) (44, 1)\n"
+ ]
+ }
+ ],
+ "source": [
+ "x_train, y_train = train_data_timesteps[:,:timesteps-1],train_data_timesteps[:,[timesteps-1]]\n",
+ "x_test, y_test = test_data_timesteps[:,:timesteps-1],test_data_timesteps[:,[timesteps-1]]\n",
+ "\n",
+ "print(x_train.shape, y_train.shape)\n",
+ "print(x_test.shape, y_test.shape)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "8wIPOtAGLZlh"
+ },
+ "source": [
+ "## Creating SVR model\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 15,
+ "metadata": {
+ "id": "EhA403BEPEiD"
+ },
+ "outputs": [],
+ "source": [
+ "# Create model using RBF kernel\n",
+ "\n",
+ "model = SVR(kernel='rbf',gamma=0.5, C=10, epsilon = 0.05)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 16,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "GS0UA3csMbqp",
+ "outputId": "d86b6f05-5742-4c1d-c2db-c40510bd4f0d"
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/plain": [
+ "SVR(C=10, cache_size=200, coef0=0.0, degree=3, epsilon=0.05, gamma=0.5,\n",
+ " kernel='rbf', max_iter=-1, shrinking=True, tol=0.001, verbose=False)"
+ ]
+ },
+ "execution_count": 16,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "# Fit model on training data\n",
+ "\n",
+ "model.fit(x_train, y_train[:,0])"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "Rz_x8S3UrlcF"
+ },
+ "source": [
+ "### Make model prediction\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 17,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "XR0gnt3MnuYS",
+ "outputId": "157e40ab-9a23-4b66-a885-0d52a24b2364"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "(1412, 1) (44, 1)\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Making predictions\n",
+ "\n",
+ "y_train_pred = model.predict(x_train).reshape(-1,1)\n",
+ "y_test_pred = model.predict(x_test).reshape(-1,1)\n",
+ "\n",
+ "print(y_train_pred.shape, y_test_pred.shape)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "_2epncg-SGzr"
+ },
+ "source": [
+ "## Analyzing model performance\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 18,
+ "metadata": {},
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "1412 44\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Scaling the predictions\n",
+ "\n",
+ "y_train_pred = scaler.inverse_transform(y_train_pred)\n",
+ "y_test_pred = scaler.inverse_transform(y_test_pred)\n",
+ "\n",
+ "print(len(y_train_pred), len(y_test_pred))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 19,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "xmm_YLXhq7gV",
+ "outputId": "18392f64-4029-49ac-c71a-a4e2411152a1"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "1412 44\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Scaling the original values\n",
+ "\n",
+ "y_train = scaler.inverse_transform(y_train)\n",
+ "y_test = scaler.inverse_transform(y_test)\n",
+ "\n",
+ "print(len(y_train), len(y_test))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 20,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "u3LBj93coHEi",
+ "outputId": "d4fd49e8-8c6e-4bb0-8ef9-ca0b26d725b4"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "1412 44\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Extract the timesteps for x-axis\n",
+ "\n",
+ "train_timestamps = energy[(energy.index < test_start_dt) & (energy.index >= train_start_dt)].index[timesteps-1:]\n",
+ "test_timestamps = energy[test_start_dt:].index[timesteps-1:]\n",
+ "\n",
+ "print(len(train_timestamps), len(test_timestamps))"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 21,
+ "metadata": {},
+ "outputs": [
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "plt.figure(figsize=(25,6))\n",
+ "plt.plot(train_timestamps, y_train, color = 'red', linewidth=2.0, alpha = 0.6)\n",
+ "plt.plot(train_timestamps, y_train_pred, color = 'blue', linewidth=0.8)\n",
+ "plt.legend(['Actual','Predicted'])\n",
+ "plt.xlabel('Timestamp')\n",
+ "plt.title(\"Training data prediction\")\n",
+ "plt.show()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 22,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "LnhzcnYtXHCm",
+ "outputId": "f5f0d711-f18b-4788-ad21-d4470ea2c02b"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "MAPE for training data: 1.7195710200875551 %\n"
+ ]
+ }
+ ],
+ "source": [
+ "print('MAPE for training data: ', mape(y_train_pred, y_train)*100, '%')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 23,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 225
+ },
+ "id": "53Q02FoqQH4V",
+ "outputId": "53e2d59b-5075-4765-ad9e-aed56c966583"
+ },
+ "outputs": [
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "plt.figure(figsize=(10,3))\n",
+ "plt.plot(test_timestamps, y_test, color = 'red', linewidth=2.0, alpha = 0.6)\n",
+ "plt.plot(test_timestamps, y_test_pred, color = 'blue', linewidth=0.8)\n",
+ "plt.legend(['Actual','Predicted'])\n",
+ "plt.xlabel('Timestamp')\n",
+ "plt.show()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 24,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "clOAUH-SXCJG",
+ "outputId": "a3aa85ff-126a-4a4a-cd9e-90b9cc465ef5"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "MAPE for testing data: 1.2623790187854018 %\n"
+ ]
+ }
+ ],
+ "source": [
+ "print('MAPE for testing data: ', mape(y_test_pred, y_test)*100, '%')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "DHlKvVCId5ue"
+ },
+ "source": [
+ "## Full dataset prediction\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 25,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "cOFJ45vreO0N",
+ "outputId": "35628e33-ecf9-4966-8036-f7ea86db6f16"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "Tensor shape: (26300, 5)\n",
+ "X shape: (26300, 4) \n",
+ "Y shape: (26300, 1)\n"
+ ]
+ }
+ ],
+ "source": [
+ "# Extracting load values as numpy array\n",
+ "data = energy.copy().values\n",
+ "\n",
+ "# Scaling\n",
+ "data = scaler.transform(data)\n",
+ "\n",
+ "# Transforming to 2D tensor as per model input requirement\n",
+ "data_timesteps=np.array([[j for j in data[i:i+timesteps]] for i in range(0,len(data)-timesteps+1)])[:,:,0]\n",
+ "print(\"Tensor shape: \", data_timesteps.shape)\n",
+ "\n",
+ "# Selecting inputs and outputs from data\n",
+ "X, Y = data_timesteps[:,:timesteps-1],data_timesteps[:,[timesteps-1]]\n",
+ "print(\"X shape: \", X.shape,\"\\nY shape: \", Y.shape)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 26,
+ "metadata": {
+ "id": "ESSAdQgwexIi"
+ },
+ "outputs": [],
+ "source": [
+ "# Make model predictions\n",
+ "Y_pred = model.predict(X).reshape(-1,1)\n",
+ "\n",
+ "# Inverse scale and reshape\n",
+ "Y_pred = scaler.inverse_transform(Y_pred)\n",
+ "Y = scaler.inverse_transform(Y)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 27,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 328
+ },
+ "id": "M_qhihN0RVVX",
+ "outputId": "a89cb23e-1d35-437f-9d63-8b8907e12f80"
+ },
+ "outputs": [
+ {
+ "data": {
+ "image/png": "\n",
+ "text/plain": [
+ ""
+ ]
+ },
+ "metadata": {
+ "needs_background": "light"
+ },
+ "output_type": "display_data"
+ }
+ ],
+ "source": [
+ "plt.figure(figsize=(30,8))\n",
+ "plt.plot(Y, color = 'red', linewidth=2.0, alpha = 0.6)\n",
+ "plt.plot(Y_pred, color = 'blue', linewidth=1)\n",
+ "plt.legend(['Actual','Predicted'])\n",
+ "plt.xlabel('Timestamp')\n",
+ "plt.show()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 28,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "AcN7pMYXVGTK",
+ "outputId": "7e1c2161-47ce-496c-9d86-7ad9ae0df770"
+ },
+ "outputs": [
+ {
+ "name": "stdout",
+ "output_type": "stream",
+ "text": [
+ "MAPE: 2.0572089029888656 %\n"
+ ]
+ }
+ ],
+ "source": [
+ "print('MAPE: ', mape(Y_pred, Y)*100, '%')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
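+ "\n---\n\n**Disclaimer**:  \nThis document was translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The document in its original language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.\n"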
+ ]
+ }
+ ],
+ "metadata": {
+ "accelerator": "GPU",
+ "colab": {
+ "collapsed_sections": [],
+ "name": "Recurrent_Neural_Networks.ipynb",
+ "provenance": []
+ },
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.7.1"
+ },
+ "coopTranslator": {
+ "original_hash": "f8f3967282314d3995245835bdaa8418",
+ "translation_date": "2025-09-03T19:58:46+00:00",
+ "source_file": "7-TimeSeries/3-SVR/solution/notebook.ipynb",
+ "language_code": "zh"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 1
+}
\ No newline at end of file
diff --git a/translations/zh-CN/7-TimeSeries/3-SVR/working/notebook.ipynb b/translations/zh-CN/7-TimeSeries/3-SVR/working/notebook.ipynb
new file mode 100644
index 000000000..a6926e949
--- /dev/null
+++ b/translations/zh-CN/7-TimeSeries/3-SVR/working/notebook.ipynb
@@ -0,0 +1,705 @@
+{
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "fv9OoQsMFk5A"
+ },
+ "source": [
+ "# Time series prediction using a Support Vector Regressor\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
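+ "In this notebook, we demonstrate how to:\n",
+ "\n",
+ "- prepare 2D time series data for training an SVM regressor model\n",
+ "- implement SVR using the RBF kernel\n",
+ "- evaluate the model using plots and MAPE\n"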
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "## Importing modules\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import sys\n",
+ "sys.path.append('../../')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {
+ "id": "M687KNlQFp0-"
+ },
+ "outputs": [],
+ "source": [
+ "import os\n",
+ "import warnings\n",
+ "import matplotlib.pyplot as plt\n",
+ "import numpy as np\n",
+ "import pandas as pd\n",
+ "import datetime as dt\n",
+ "import math\n",
+ "\n",
+ "from sklearn.svm import SVR\n",
+ "from sklearn.preprocessing import MinMaxScaler\n",
+ "from common.utils import load_data, mape"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "Cj-kfVdMGjWP"
+ },
+ "source": [
+ "## Preparing data\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "8fywSjC6GsRz"
+ },
+ "source": [
+ "### Load data\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 363
+ },
+ "id": "aBDkEB11Fumg",
+ "outputId": "99cf7987-0509-4b73-8cc2-75d7da0d2740"
+ },
+ "outputs": [
+ {
+ "data": {
+ "text/html": [
+ "<table border=\"1\" class=\"dataframe\">\n",
+ " <thead>\n",
+ " <tr>\n",
+ " <th></th>\n",
+ " <th>load</th>\n",
+ " </tr>\n",
+ " </thead>\n",
+ " <tbody>\n",
+ " <tr>\n",
+ " <th>2012-01-01 00:00:00</th>\n",
+ " <td>2698.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>2012-01-01 01:00:00</th>\n",
+ " <td>2558.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>2012-01-01 02:00:00</th>\n",
+ " <td>2444.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>2012-01-01 03:00:00</th>\n",
+ " <td>2402.0</td>\n",
+ " </tr>\n",
+ " <tr>\n",
+ " <th>2012-01-01 04:00:00</th>\n",
+ " <td>2403.0</td>\n",
+ " </tr>\n",
+ " </tbody>\n",
+ "</table>"
+ ],
+ "text/plain": [
+ " load\n",
+ "2012-01-01 00:00:00 2698.0\n",
+ "2012-01-01 01:00:00 2558.0\n",
+ "2012-01-01 02:00:00 2444.0\n",
+ "2012-01-01 03:00:00 2402.0\n",
+ "2012-01-01 04:00:00 2403.0"
+ ]
+ },
+ "execution_count": 3,
+ "metadata": {},
+ "output_type": "execute_result"
+ }
+ ],
+ "source": [
+ "energy = load_data('../../data')[['load']]\n",
+ "energy.head(5)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "O0BWP13rGnh4"
+ },
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 486
+ },
+ "id": "hGaNPKu_Gidk",
+ "outputId": "7f89b326-9057-4f49-efbe-cb100ebdf76d"
+ },
+ "outputs": [],
+ "source": [
+ "energy.plot(y='load', subplots=True, figsize=(15, 8), fontsize=12)\n",
+ "plt.xlabel('timestamp', fontsize=12)\n",
+ "plt.ylabel('load', fontsize=12)\n",
+ "plt.show()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "IPuNor4eGwYY"
+ },
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "ysvsNyONGt0Q"
+ },
+ "outputs": [],
+ "source": [
+ "train_start_dt = '2014-11-01 00:00:00'\n",
+ "test_start_dt = '2014-12-30 00:00:00'"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 548
+ },
+ "id": "SsfdLoPyGy9w",
+ "outputId": "d6d6c25b-b1f4-47e5-91d1-707e043237d7"
+ },
+ "outputs": [],
+ "source": [
+ "energy[(energy.index < test_start_dt) & (energy.index >= train_start_dt)][['load']].rename(columns={'load':'train'}) \\\n",
+ " .join(energy[test_start_dt:][['load']].rename(columns={'load':'test'}), how='outer') \\\n",
+ " .plot(y=['train', 'test'], figsize=(15, 8), fontsize=12)\n",
+ "plt.xlabel('timestamp', fontsize=12)\n",
+ "plt.ylabel('load', fontsize=12)\n",
+ "plt.show()"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "XbFTqBw6G1Ch"
+ },
+ "source": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Now, you need to prepare the data for training by filtering and scaling it.\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "cYivRdQpHDj3",
+ "outputId": "a138f746-461c-4fd6-bfa6-0cee094c4aa1"
+ },
+ "outputs": [],
+ "source": [
+ "train = energy.copy()[(energy.index >= train_start_dt) & (energy.index < test_start_dt)][['load']]\n",
+ "test = energy.copy()[energy.index >= test_start_dt][['load']]\n",
+ "\n",
+ "print('Training data shape: ', train.shape)\n",
+ "print('Test data shape: ', test.shape)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "Scale the data to the range [0, 1].\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 363
+ },
+ "id": "3DNntGQnZX8G",
+ "outputId": "210046bc-7a66-4ccd-d70d-aa4a7309949c"
+ },
+ "outputs": [],
+ "source": [
+ "scaler = MinMaxScaler()\n",
+ "train['load'] = scaler.fit_transform(train)\n",
+ "train.head(5)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 206
+ },
+ "id": "26Yht-rzZexe",
+ "outputId": "20326077-a38a-4e78-cc5b-6fd7af95d301"
+ },
+ "outputs": [],
+ "source": [
+ "test['load'] = scaler.transform(test)\n",
+ "test.head(5)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "x0n6jqxOQ41Z"
+ },
+ "source": [
+ "### Creating data with time-steps\n"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "fdmxTZtOQ8xs"
+ },
+ "source": [
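+ "For our SVR, we transform the input data into the form `[batch, timesteps]`. To do so, we reshape the existing `train_data` and `test_data` so that they gain a new dimension representing the timesteps. In our example we take `timesteps = 5`, so the inputs to the model are the data for the first 4 timesteps and the output is the data for the 5th timestep.\n"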
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "Rpju-Sc2HFm0"
+ },
+ "outputs": [],
+ "source": [
+ "# Converting to numpy arrays\n",
+ "\n",
+ "train_data = train.values\n",
+ "test_data = test.values"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Selecting the timesteps\n",
+ "\n",
+ "timesteps=None"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "O-JrsrsVJhUQ",
+ "outputId": "c90dbe71-bacc-4ec4-b452-f82fe5aefaef"
+ },
+ "outputs": [],
+ "source": [
+ "# Converting data to 2D tensor\n",
+ "\n",
+ "train_data_timesteps=None"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "exJD8AI7KE4g",
+ "outputId": "ce90260c-f327-427d-80f2-77307b5a6318"
+ },
+ "outputs": [],
+ "source": [
+ "# Converting test data to 2D tensor\n",
+ "\n",
+ "test_data_timesteps=None"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "2u0R2sIsLuq5"
+ },
+ "outputs": [],
+ "source": [
+ "x_train, y_train = None, None\n",
+ "x_test, y_test = None, None\n",
+ "\n",
+ "print(x_train.shape, y_train.shape)\n",
+ "print(x_test.shape, y_test.shape)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "8wIPOtAGLZlh"
+ },
+ "source": [
+ "## Creating SVR model\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "EhA403BEPEiD"
+ },
+ "outputs": [],
+ "source": [
+ "# Create model using RBF kernel\n",
+ "\n",
+ "model = None"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "GS0UA3csMbqp",
+ "outputId": "d86b6f05-5742-4c1d-c2db-c40510bd4f0d"
+ },
+ "outputs": [],
+ "source": [
+ "# Fit model on training data"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "Rz_x8S3UrlcF"
+ },
+ "source": [
+ "### Make model prediction\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "XR0gnt3MnuYS",
+ "outputId": "157e40ab-9a23-4b66-a885-0d52a24b2364"
+ },
+ "outputs": [],
+ "source": [
+ "# Making predictions\n",
+ "\n",
+ "y_train_pred = None\n",
+ "y_test_pred = None"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "_2epncg-SGzr"
+ },
+ "source": [
+ "## Analyzing model performance\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# Scaling the predictions\n",
+ "\n",
+ "y_train_pred = scaler.inverse_transform(y_train_pred)\n",
+ "y_test_pred = scaler.inverse_transform(y_test_pred)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "xmm_YLXhq7gV",
+ "outputId": "18392f64-4029-49ac-c71a-a4e2411152a1"
+ },
+ "outputs": [],
+ "source": [
+ "# Scaling the original values\n",
+ "\n",
+ "y_train = scaler.inverse_transform(y_train)\n",
+ "y_test = scaler.inverse_transform(y_test)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "u3LBj93coHEi",
+ "outputId": "d4fd49e8-8c6e-4bb0-8ef9-ca0b26d725b4"
+ },
+ "outputs": [],
+ "source": [
+ "# Extract the timesteps for x-axis\n",
+ "\n",
+ "train_timestamps = None\n",
+ "test_timestamps = None"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "plt.figure(figsize=(25,6))\n",
+ "# plot original output\n",
+ "# plot predicted output\n",
+ "plt.legend(['Actual','Predicted'])\n",
+ "plt.xlabel('Timestamp')\n",
+ "plt.title(\"Training data prediction\")\n",
+ "plt.show()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "LnhzcnYtXHCm",
+ "outputId": "f5f0d711-f18b-4788-ad21-d4470ea2c02b"
+ },
+ "outputs": [],
+ "source": [
+ "print('MAPE for training data: ', mape(y_train_pred, y_train)*100, '%')"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 225
+ },
+ "id": "53Q02FoqQH4V",
+ "outputId": "53e2d59b-5075-4765-ad9e-aed56c966583"
+ },
+ "outputs": [],
+ "source": [
+ "plt.figure(figsize=(10,3))\n",
+ "# plot original output\n",
+ "# plot predicted output\n",
+ "plt.legend(['Actual','Predicted'])\n",
+ "plt.xlabel('Timestamp')\n",
+ "plt.show()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "clOAUH-SXCJG",
+ "outputId": "a3aa85ff-126a-4a4a-cd9e-90b9cc465ef5"
+ },
+ "outputs": [],
+ "source": [
+ "print('MAPE for testing data: ', mape(y_test_pred, y_test)*100, '%')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "DHlKvVCId5ue"
+ },
+ "source": [
+ "## Full dataset prediction\n"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "cOFJ45vreO0N",
+ "outputId": "35628e33-ecf9-4966-8036-f7ea86db6f16"
+ },
+ "outputs": [],
+ "source": [
+ "# Extracting load values as numpy array\n",
+ "data = None\n",
+ "\n",
+ "# Scaling\n",
+ "data = None\n",
+ "\n",
+ "# Transforming to 2D tensor as per model input requirement\n",
+ "data_timesteps=None\n",
+ "\n",
+ "# Selecting inputs and outputs from data\n",
+ "X, Y = None, None"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "ESSAdQgwexIi"
+ },
+ "outputs": [],
+ "source": [
+ "# Make model predictions\n",
+ "\n",
+ "# Inverse scale and reshape\n",
+ "Y_pred = None\n",
+ "Y = None"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 328
+ },
+ "id": "M_qhihN0RVVX",
+ "outputId": "a89cb23e-1d35-437f-9d63-8b8907e12f80"
+ },
+ "outputs": [],
+ "source": [
+ "plt.figure(figsize=(30,8))\n",
+ "# plot original output\n",
+ "# plot predicted output\n",
+ "plt.legend(['Actual','Predicted'])\n",
+ "plt.xlabel('Timestamp')\n",
+ "plt.show()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "AcN7pMYXVGTK",
+ "outputId": "7e1c2161-47ce-496c-9d86-7ad9ae0df770"
+ },
+ "outputs": [],
+ "source": [
+ "print('MAPE: ', mape(Y_pred, Y)*100, '%')"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
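+ "\n---\n\n**Disclaimer**: \nThis document was translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, automated translations may contain errors or inaccuracies. The document in its original language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.\n"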
+ ]
+ }
+ ],
+ "metadata": {
+ "accelerator": "GPU",
+ "colab": {
+ "collapsed_sections": [],
+ "name": "Recurrent_Neural_Networks.ipynb",
+ "provenance": []
+ },
+ "kernelspec": {
+ "display_name": "Python 3",
+ "language": "python",
+ "name": "python3"
+ },
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.7.1"
+ },
+ "coopTranslator": {
+ "original_hash": "e86ce102239a14c44585623b9b924a74",
+ "translation_date": "2025-09-03T20:00:54+00:00",
+ "source_file": "7-TimeSeries/3-SVR/working/notebook.ipynb",
+ "language_code": "zh"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 1
+}
\ No newline at end of file
diff --git a/translations/zh-CN/7-TimeSeries/README.md b/translations/zh-CN/7-TimeSeries/README.md
new file mode 100644
index 000000000..f1d427adb
--- /dev/null
+++ b/translations/zh-CN/7-TimeSeries/README.md
@@ -0,0 +1,28 @@
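+# Introduction to time series forecasting
+
+What is time series forecasting? It is about predicting future events by analyzing the trends of the past.
+
+## Regional topic: worldwide electricity usage ✨
+
+In these lessons, you will be introduced to time series forecasting, a somewhat lesser-known area of machine learning that is nevertheless extremely valuable for industry and business applications, among other fields. While neural networks can be used to enhance the utility of these models, we will study them in the context of classical machine learning, as these models help predict future performance based on the past.
+
+Our regional focus is electrical usage in the world, an interesting dataset for learning how to forecast future power usage based on patterns of past load. You will see how this kind of forecasting can be extremely helpful in a business environment.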
+
+
+
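+Photo by [Peddi Sai hrithik](https://unsplash.com/@shutter_log?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText) of electrical towers on a road in Rajasthan on [Unsplash](https://unsplash.com/s/photos/electric-india?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText)
+
+## Lessons
+
+1. [Introduction to time series forecasting](1-Introduction/README.md)
+2. [Building ARIMA time series models](2-ARIMA/README.md)
+3. [Building a Support Vector Regressor for time series forecasting](3-SVR/README.md)
+
+## Credits
+
+"Introduction to time series forecasting" was written with ⚡️ by [Francesca Lazzeri](https://twitter.com/frlazzeri) and [Jen Looper](https://twitter.com/jenlooper). The notebooks first appeared in the [Azure "Deep Learning For Time Series" repo](https://github.com/Azure/DeepLearningForTimeSeriesForecasting), originally written by Francesca Lazzeri. The SVR lesson was written by [Anirban Mukherjee](https://github.com/AnirbanMukherjeeXD).
+
+---
+
+**Disclaimer**:
+This document was translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, automated translations may contain errors or inaccuracies. The document in its original language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.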
\ No newline at end of file
diff --git a/translations/zh-CN/8-Reinforcement/1-QLearning/README.md b/translations/zh-CN/8-Reinforcement/1-QLearning/README.md
new file mode 100644
index 000000000..827398ac5
--- /dev/null
+++ b/translations/zh-CN/8-Reinforcement/1-QLearning/README.md
@@ -0,0 +1,247 @@
+# Introduction to Reinforcement Learning and Q-Learning
+
+
+> Sketchnote by [Tomomi Imura](https://www.twitter.com/girlie_mac)
+
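+Reinforcement learning involves three important concepts: the agent, some states, and a set of actions per state. By executing an action in a specified state, the agent is given a reward. Imagine the computer game Super Mario. You are Mario, you are in a game level, standing next to a cliff edge. Above you is a coin. You being Mario, in a game level, at a specific position: that is your state. Moving one step to the right (an action) will take you over the edge, and that would give you a low numerical score. However, pressing the jump button would let you score a point and stay alive. That is a positive outcome, and it should award you a positive numerical score.
+
+By using reinforcement learning and a simulator (the game), you can learn how to play the game to maximize the reward, which means staying alive and scoring as many points as possible.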
+
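+[Watch Dmitry discuss reinforcement learning](https://www.youtube.com/watch?v=lDq_en8RNOo)
+
+> 🎥 Click the link above to hear Dmitry discuss reinforcement learning
+
+## [Pre-lecture quiz](https://ff-quizzes.netlify.app/en/ml/)
+
+## Prerequisites and Setup
+
+In this lesson, we will experiment with some code in Python. You should be able to run the Jupyter Notebook code from this lesson, either on your computer or in the cloud.
+
+You can open [the lesson notebook](https://github.com/microsoft/ML-For-Beginners/blob/main/8-Reinforcement/1-QLearning/notebook.ipynb) and follow along with the lesson.
+
+> **Note:** If you are opening this code from the cloud, you also need to fetch the [`rlboard.py`](https://github.com/microsoft/ML-For-Beginners/blob/main/8-Reinforcement/1-QLearning/rlboard.py) file, which is used in the notebook code. Add it to the same directory as the notebook.
+
+## Introduction
+
+In this lesson, we will explore the world of **Peter and the Wolf**, inspired by a musical fairy tale by the Russian composer [Sergei Prokofiev](https://en.wikipedia.org/wiki/Sergei_Prokofiev). We will use **reinforcement learning** to let Peter explore his environment, collect tasty apples and avoid meeting the wolf.
+
+**Reinforcement learning** (RL) is a learning technique that allows us to learn the optimal behavior of an **agent** in some **environment** by running many experiments. An agent in this environment should have some **goal**, defined by a **reward function**.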
+
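+## The Environment
+
+For simplicity, let's consider Peter's world to be a square board of size `width` x `height`, like this:
+
+
+
+Each cell in this board can be:
+
+* **ground**, on which Peter and other creatures can walk.
+* **water**, on which you obviously cannot walk.
+* a **tree** or **grass**, a place where you can rest.
+* an **apple**, food that Peter would be glad to find.
+* a **wolf**, which is dangerous and should be avoided.
+
+There is a separate Python module, [`rlboard.py`](https://github.com/microsoft/ML-For-Beginners/blob/main/8-Reinforcement/1-QLearning/rlboard.py), which contains the code to work with this environment. Because this code is not important for understanding our concepts, we will import the module and use it to create the sample board (code block 1):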
+
+```python
+from rlboard import *
+
+width, height = 8,8
+m = Board(width,height)
+m.randomize(seed=13)
+m.plot()
+```
+
+This code should print a picture of the environment similar to the one above.
+
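+## Actions and Policy
+
+In our example, Peter's goal is to find an apple while avoiding the wolf and other obstacles. To do this, he can walk around the board until he finds an apple.
+
+Therefore, at any position, he can choose one of the following actions: up, down, left and right.
+
+We will define those actions as a dictionary and map them to pairs of corresponding coordinate changes. For example, moving right (`R`) corresponds to the pair `(1,0)` (code block 2):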
+
+```python
+actions = { "U" : (0,-1), "D" : (0,1), "L" : (-1,0), "R" : (1,0) }
+action_idx = { a : i for i,a in enumerate(actions.keys()) }
+```
+
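+To sum up, the strategy and goal of this scenario are as follows:
+
+- **Strategy**: the strategy of our agent (Peter) is defined by a so-called **policy function**. A policy function returns the action to take in any given state. In our case, the state of the problem is represented by the board, including the current position of the player.
+
+- **Goal**: the goal of reinforcement learning is to eventually learn a good policy that allows us to solve the problem efficiently. As a baseline, however, let's consider the simplest policy, called the **random walk**.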
+
+## Random Walk
+
+Let's first solve our problem by implementing a random walk strategy. With a random walk, we randomly choose the next action from the allowed actions until we reach the apple (code block 3).
+
+1. Implement the random walk with the code below:
+
+    ```python
+    import random
+
+    def random_policy(m):
+        return random.choice(list(actions))
+
+    def walk(m,policy,start_position=None):
+        n = 0 # number of steps
+        # set initial position
+        if start_position:
+            m.human = start_position
+        else:
+            m.random_start()
+        while True:
+            if m.at() == Board.Cell.apple:
+                return n # success!
+            if m.at() in [Board.Cell.wolf, Board.Cell.water]:
+                return -1 # eaten by wolf or drowned
+            while True:
+                a = actions[policy(m)]
+                new_pos = m.move_pos(m.human,a)
+                if m.is_valid(new_pos) and m.at(new_pos)!=Board.Cell.water:
+                    m.move(a) # do the actual move
+                    break
+            n+=1
+
+    walk(m,random_policy)
+    ```
+
+    The call to `walk` should return the length of the corresponding path, which can vary from one run to another.
+
+1. Run the walk experiment a number of times (say, 100) and print the resulting statistics (code block 4):
+
+    ```python
+    def print_statistics(policy):
+        s,w,n = 0,0,0
+        for _ in range(100):
+            z = walk(m,policy)
+            if z<0:
+                w+=1
+            else:
+                s += z
+                n += 1
+        print(f"Average path length = {s/n}, eaten by wolf: {w} times")
+
+    print_statistics(random_policy)
+    ```
+
+    Note that the average length of a path is around 30-40 steps, which is quite a lot, given that the average distance to the nearest apple is around 5-6 steps.
+
+    You can also see what Peter's movement looks like during the random walk:
+
+ 
+
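+## Reward Function
+
+To make our policy more intelligent, we need to understand which moves are "better" than others. To do this, we need to define our goal.
+
+The goal can be defined in terms of a **reward function**, which returns a score value for each state. The higher the score, the better the state (code block 5):
+
+```python
+move_reward = -0.1
+goal_reward = 10
+end_reward = -10
+
+def reward(m,pos=None):
+    pos = pos or m.human
+    if not m.is_valid(pos):
+        return end_reward
+    x = m.at(pos)
+    if x==Board.Cell.water or x == Board.Cell.wolf:
+        return end_reward
+    if x==Board.Cell.apple:
+        return goal_reward
+    return move_reward
+```
+
+An interesting thing about reward functions is that in most cases, *we are only given a substantial reward at the end of the game*. This means that our algorithm should somehow remember the "good" steps that lead to a positive reward at the end, and increase their importance. Similarly, all moves that lead to bad results should be discouraged.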
+
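+## Q-Learning
+
+The algorithm that we will discuss here is called **Q-Learning**. In this algorithm, the policy is defined by a function (or a data structure) called a **Q-Table**. It records the "goodness" of each of the actions in a given state.
+
+It is called a Q-Table because it is often convenient to represent it as a table, or multi-dimensional array. Since our board has dimensions `width` x `height`, we can represent the Q-Table using a numpy array with shape `width` x `height` x `len(actions)` (code block 6):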
+
+```python
+Q = np.ones((width,height,len(actions)),dtype=float)*1.0/len(actions)
+```
+
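+Notice that we initialize all the values of the Q-Table with an equal value, in our case 0.25. This corresponds to the "random walk" policy, because all moves in each state are equally good. We can pass the Q-Table to the `plot` function in order to visualize the table on the board: `m.plot(Q)`.
+
+
+
+In the center of each cell there is an "arrow" that indicates the preferred direction of movement. Since all directions are equal, a dot is displayed.
+
+Now we need to run the simulation, explore our environment, and learn a better distribution of Q-Table values, which will allow us to find the path to the apple much faster.
+
+## Essence of Q-Learning: Bellman Equation
+
+Once we start moving, each action has a corresponding reward, so in theory we could select the next action based on the highest immediate reward. However, in most states the move does not achieve our goal of reaching the apple, and thus we cannot immediately decide which direction is better.
+
+> Remember that it is not the immediate result that matters, but rather the final result, which we will obtain at the end of the simulation.
+
+In order to account for this delayed reward, we need to use the principles of **[dynamic programming](https://en.wikipedia.org/wiki/Dynamic_programming)**, which allow us to think about our problem recursively.
+
+Suppose we are now at the state *s*, and we want to move to the next state *s'*. By doing so, we will receive the immediate reward *r(s,a)*, defined by the reward function, plus some future reward. If we suppose that our Q-Table correctly reflects the "attractiveness" of each action, then at state *s'* we will choose an action *a'* that corresponds to the maximum value of *Q(s',a')*. Thus, the best possible future reward we could get at state *s* will be defined as `max`<sub>a'</sub>*Q(s',a')*.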
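
The notebook that accompanies this lesson states the resulting update rule as Q(s,a) ← (1-α)Q(s,a) + α(r + γ max Q(s',a')), where the maximum is taken over the actions a' available in s'. The sketch below applies that rule once to a NumPy Q-Table of the shape described earlier. The helper name `q_update` and the values chosen for the learning rate `alpha` and the discount factor `gamma` are assumptions made for illustration, not values taken from the lesson.

```python
import numpy as np

def q_update(Q, state, action_index, reward, next_state, alpha=0.5, gamma=0.9):
    """Apply one Bellman update to Q in place and return the updated entry."""
    x, y = state
    nx, ny = next_state
    best_future = Q[nx, ny].max()            # max over a' of Q(s', a')
    target = reward + gamma * best_future    # r + gamma * max_a' Q(s', a')
    Q[x, y, action_index] = (1 - alpha) * Q[x, y, action_index] + alpha * target
    return Q[x, y, action_index]

# 8 x 8 board with 4 actions, every entry initialised to 0.25 (assumed sizes)
Q = np.full((8, 8, 4), 0.25)

# One step from (0, 0) to (1, 0) with action index 3 and a move reward of -0.1
new_value = q_update(Q, (0, 0), 3, -0.1, (1, 0))
print(new_value)
```

With every entry initialised to the same value, a single move with a small negative reward lowers the value of the chosen action; repeating such updates over many episodes is what gradually shapes the table.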
+
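+## Checking the Policy
+
+Since the Q-Table lists the "attractiveness" of each action at each state, it is quite easy to use it to define efficient navigation in our world. In the simplest case, we can select the action corresponding to the highest Q-Table value (code block 9):
+
+```python
+def qpolicy_strict(m):
+    x,y = m.human
+    v = probs(Q[x,y])
+    a = list(actions)[np.argmax(v)]
+    return a
+
+walk(m,qpolicy_strict)
+```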
+
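+> If you try the code above several times, you may notice that sometimes it "hangs", and you need to press the STOP button in the notebook to interrupt it. This happens because there can be situations in which two states "point" to each other in terms of optimal Q-value, in which case the agent ends up moving between those states indefinitely.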
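
`qpolicy_strict` calls a `probs` helper that is not shown in this part of the lesson. A minimal sketch follows, assuming `probs` only has to turn a vector of Q-values into a probability vector, with a small `eps` added so that the normalisation never divides by zero. The default value of `eps` is an assumption.

```python
import numpy as np

def probs(v, eps=1e-4):
    """Shift Q-values so the smallest becomes eps, then normalise to sum to 1."""
    v = np.asarray(v, dtype=float)
    v = v - v.min() + eps   # every entry is now strictly positive
    return v / v.sum()

uniform = probs([0.25, 0.25, 0.25, 0.25])  # identical inputs give equal probabilities
skewed = probs([0.1, 0.2, 0.9, 0.3])       # the largest Q-value gets the largest share
print(uniform, skewed)
```

Because the shift preserves the ordering of the values, `np.argmax` over the result picks the same action as `np.argmax` over the raw Q-values.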
+
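+## 🚀Challenge
+
+> **Task 1:** Modify the `walk` function to limit the maximum length of the path to a certain number of steps (say, 100), and watch the code above return this value from time to time.
+
+> **Task 2:** Modify the `walk` function so that it does not go back to places where it has already been. This will prevent `walk` from looping, but the agent can still end up "trapped" in a location from which it cannot escape.
+
+## Navigation
+
+A better navigation policy is the one that we used during training, which combines exploitation and exploration. In this policy, we select each action with a certain probability, proportional to the values in the Q-Table. This strategy may still result in the agent returning to a position it has already explored, but, as you can see from the code below, it results in a very short average path to the desired location (remember that `print_statistics` runs the simulation 100 times) (code block 10):
+
+```python
+def qpolicy(m):
+    x,y = m.human
+    v = probs(Q[x,y])
+    a = random.choices(list(actions),weights=v)[0]
+    return a
+
+print_statistics(qpolicy)
+```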
+
+After running this code, you should get a much smaller average path length than before, in the range of 3-6.
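
To see why sampling in proportion to the weights mixes exploitation with exploration, the sketch below draws many actions with `random.choices` from a made-up weight vector. The weights and the seed are illustrative assumptions, not values from a trained Q-Table.

```python
import random
from collections import Counter

random.seed(0)  # fixed seed so repeated runs give the same draws
action_names = ["U", "D", "L", "R"]
weights = [0.1, 0.1, 0.1, 0.7]  # hypothetical probabilities for one state

# Draw 10,000 actions; each is picked with probability proportional to its weight
draws = random.choices(action_names, weights=weights, k=10_000)
counts = Counter(draws)
for name in action_names:
    print(name, counts[name])
```

The favoured action dominates, yet the other three are still chosen from time to time, which is how the agent keeps exploring.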
+
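+## Investigating the Learning Process
+
+As we have mentioned, the learning process is a balance between exploration and exploitation of the knowledge gained about the structure of the problem space. We have seen that the results of learning (the ability to help an agent find a short path to the goal) have improved, but it is also interesting to observe how the average path length behaves during the learning process:
+
+The learnings can be summarized as:
+
+- **Average path length increases**. What we see is that at first, the average path length increases. This is probably because when we know nothing about the environment, we are likely to get trapped in bad states, such as water or the wolf. As we learn more and start using this knowledge, we can explore the environment for longer, but we still do not know well where the apples are.
+
+- **Path length decreases as we learn more**. Once we learn enough, it becomes easier for the agent to achieve the goal, and the path length starts to decrease. However, we are still open to exploration, so we often diverge from the best path and explore new options, making the path longer than optimal.
+
+- **Length increases abruptly**. What we also observe on the graph is that at some point the length increases abruptly. This reflects the stochastic nature of the process: at some point we can "spoil" the Q-Table coefficients by overwriting them with new values. This should ideally be minimized by decreasing the learning rate (for example, towards the end of training we only adjust Q-Table values by a small amount).
+
+Overall, it is important to remember that the success and quality of the learning process depend significantly on parameters such as the learning rate, learning rate decay, and discount factor. Those are often called **hyperparameters**, to distinguish them from **parameters**, which we optimize during training (for example, Q-Table coefficients). The process of finding the best hyperparameter values is called **hyperparameter optimization**, and it deserves a separate topic.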
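
As one illustration of such a hyperparameter, a learning rate decay schedule can be sketched as follows. The geometric form, the function name `decayed_alpha` and all default values are assumptions chosen for this example; the lesson text here does not prescribe a particular schedule.

```python
def decayed_alpha(epoch, alpha0=1.0, decay=0.999, alpha_min=0.01):
    """Return a learning rate that shrinks geometrically with the epoch number.

    alpha_min keeps the rate from reaching zero, so late updates still
    adjust the Q-Table by a small amount.
    """
    return max(alpha_min, alpha0 * decay ** epoch)

# Learning rate at the start, after 1000 epochs, and after 5000 epochs
schedule = [decayed_alpha(e) for e in (0, 1000, 5000)]
print(schedule)
```

A slower decay keeps the agent adapting for longer, while a faster one protects already learned Q-Table values from being overwritten late in training.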
+
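+## [Post-lecture quiz](https://ff-quizzes.netlify.app/en/ml/)
+
+## Assignment
+[A More Realistic World](assignment.md)
+
+---
+
+**Disclaimer**:
+This document was translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, automated translations may contain errors or inaccuracies. The document in its original language should be considered the authoritative source. For important information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.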
\ No newline at end of file
diff --git a/translations/zh-CN/8-Reinforcement/1-QLearning/assignment.md b/translations/zh-CN/8-Reinforcement/1-QLearning/assignment.md
new file mode 100644
index 000000000..328fead57
--- /dev/null
+++ b/translations/zh-CN/8-Reinforcement/1-QLearning/assignment.md
@@ -0,0 +1,32 @@
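+# A More Realistic World
+
+In our scenario, Peter was able to move around almost without getting tired or hungry. In a more realistic world, he has to sit down and rest from time to time, and he also needs to eat. Let's make our world more realistic by implementing the following rules:
+
+1. By moving from one place to another, Peter loses **energy** and gains some **fatigue**.
+2. Peter can gain more energy by eating apples.
+3. Peter can get rid of fatigue by resting under a tree or on the grass (that is, by walking into a board location with a tree or grass, a green field).
+4. Peter needs to find and kill the wolf.
+5. In order to kill the wolf, Peter needs to have certain levels of energy and fatigue, otherwise he loses the battle.
+
+## Instructions
+
+Use the original [notebook.ipynb](notebook.ipynb) notebook as a starting point for your solution.
+
+Modify the reward function above according to the rules of the game, run the reinforcement learning algorithm to learn the best strategy for winning the game, and compare the results of the random walk with your algorithm in terms of the number of games won and lost.
+
+> **Note**: In your new world, the state is more complex: in addition to the human's position, it also includes fatigue and energy levels. You may choose to represent the state as a tuple (Board,energy,fatigue), or define a class for the state (you may also want to derive it from `Board`), or even modify the original `Board` class inside [rlboard.py](../../../../8-Reinforcement/1-QLearning/rlboard.py).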
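
One possible shape for such a state is sketched below. It tracks Peter's position instead of the whole `Board`, and the class name `PeterState`, the `step` helper and the cost values are all hypothetical, so treat it as a starting point rather than the expected solution.

```python
from typing import NamedTuple, Tuple

class PeterState(NamedTuple):
    """Hypothetical state: board position plus energy and fatigue levels."""
    position: Tuple[int, int]
    energy: int
    fatigue: int

def step(state, delta, energy_cost=1, fatigue_gain=1):
    """Return the state after one move: energy drops and fatigue rises."""
    x, y = state.position
    dx, dy = delta
    return PeterState((x + dx, y + dy),
                      state.energy - energy_cost,
                      state.fatigue + fatigue_gain)

start = PeterState(position=(0, 0), energy=10, fatigue=0)
after_move = step(start, (1, 0))  # one step to the right
print(after_move)
```

Because a `NamedTuple` is hashable, such states can serve as dictionary keys if the Q-Table is stored as a dictionary rather than an array.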
+
+In your solution, please keep the code responsible for the random walk strategy, and compare the results of your algorithm with the random walk at the end.
+
+> **Note**: You may need to adjust hyperparameters to make it work, especially the number of epochs. Because success in the game (fighting the wolf) is a rare event, you can expect a much longer training time.
+
+## Rubric
+
+| Criteria | Exemplary | Adequate | Needs Improvement |
+| -------- | --------- | -------- | ----------------- |
+| | A notebook is presented with the definition of the new world rules, the Q-Learning algorithm and some textual explanations. Q-Learning is able to significantly improve the results compared to the random walk. | A notebook is presented, Q-Learning is implemented and improves the results compared to the random walk, but not significantly; or the notebook is poorly documented and the code is not well structured. | Some attempt to redefine the rules of the world has been made, but the Q-Learning algorithm does not work, or the reward function is not fully defined. |
+
+---
+
+**Disclaimer**:
+This document was translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, automated translations may contain errors or inaccuracies. The document in its original language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.
\ No newline at end of file
diff --git a/translations/zh-CN/8-Reinforcement/1-QLearning/notebook.ipynb b/translations/zh-CN/8-Reinforcement/1-QLearning/notebook.ipynb
new file mode 100644
index 000000000..44f884efb
--- /dev/null
+++ b/translations/zh-CN/8-Reinforcement/1-QLearning/notebook.ipynb
@@ -0,0 +1,411 @@
+{
+ "metadata": {
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.7.0"
+ },
+ "orig_nbformat": 2,
+ "kernelspec": {
+ "name": "python3",
+ "display_name": "Python 3.7.0 64-bit ('3.7')"
+ },
+ "interpreter": {
+ "hash": "70b38d7a306a849643e446cd70466270a13445e5987dfa1344ef2b127438fa4d"
+ },
+ "coopTranslator": {
+ "original_hash": "17e5a668646eabf5aabd0e9bfcf17876",
+ "translation_date": "2025-09-03T20:45:02+00:00",
+ "source_file": "8-Reinforcement/1-QLearning/notebook.ipynb",
+ "language_code": "zh"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2,
+ "cells": [
+ {
+ "source": [
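+ "# Peter and the Wolf: Reinforcement Learning Primer\n",
+ "\n",
+ "In this tutorial, we will learn how to apply reinforcement learning to a path-finding problem. The setting is inspired by the musical fairy tale [Peter and the Wolf](https://en.wikipedia.org/wiki/Peter_and_the_Wolf) by the Russian composer [Sergei Prokofiev](https://en.wikipedia.org/wiki/Sergei_Prokofiev). It is a story about the young pioneer Peter, who bravely leaves his house and goes to the forest clearing to chase a wolf. We will train machine learning algorithms that help Peter explore the surrounding area and build an optimal navigation map.\n",
+ "\n",
+ "First, let's import a set of useful libraries:\n"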
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import matplotlib.pyplot as plt\n",
+ "import numpy as np\n",
+ "import random\n",
+ "import math"
+ ]
+ },
+ {
+ "source": [
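+ "## Overview of Reinforcement Learning\n",
+ "\n",
+ "**Reinforcement Learning** (RL) is a learning technique that allows us to learn the optimal behavior of an **agent** in some **environment** by running many experiments. An agent in this environment should have a clear **goal**, defined by a **reward function**.\n",
+ "\n",
+ "## The Environment\n",
+ "\n",
+ "For simplicity, let's consider Peter's world to be a square board of size `width` x `height`. Each cell in this board can be one of the following:\n",
+ "* **ground**, on which Peter and other creatures can walk\n",
+ "* **water**, on which you obviously cannot walk\n",
+ "* a **tree** or **grass**, a place where you can rest\n",
+ "* an **apple**, which represents food that Peter would be glad to find\n",
+ "* a **wolf**, which is dangerous and should be avoided\n",
+ "\n",
+ "To work with the environment, we will define a class called `Board`. In order not to clutter this notebook, we have moved all code that works with the board into a separate `rlboard` module, which we will now import. You may look inside this module for details of the implementation.\n"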
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "source": [
+ "Let's now create a random board and see how it looks:\n"
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# code block 1"
+ ]
+ },
+ {
+ "source": [
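+ "## Actions and Policy\n",
+ "\n",
+ "In our example, Peter's goal is to find an apple while avoiding the wolf and other obstacles. Define the possible actions (up, down, left and right) as a dictionary, and map them to pairs of corresponding coordinate changes.\n"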
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# code block 2"
+ ]
+ },
+ {
+ "source": [
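+ "The strategy of our agent (Peter) is defined by a so-called **policy**. Let's consider the simplest policy, called the **random walk**.\n",
+ "\n",
+ "## Random Walk\n",
+ "\n",
+ "Let's first solve our problem by implementing a random walk strategy.\n"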
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "source": [
+ "# Let's run a random walk experiment several times and see the average number of steps taken: code block 3"
+ ],
+ "cell_type": "code",
+ "metadata": {},
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# code block 4"
+ ]
+ },
+ {
+ "source": [
+ "## Reward Function\n",
+ "\n",
+ "To make our policy more intelligent, we need to understand which moves are \"better\" than others.\n"
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#code block 5"
+ ]
+ },
+ {
+ "source": [
+ "## Q-Learning\n",
+ "\n",
+ "Build a Q-Table, that is, a multi-dimensional array. Since our board has dimensions `width` x `height`, we can represent the Q-Table by a numpy array with shape `width` x `height` x `len(actions)`:\n"
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# code block 6"
+ ]
+ },
+ {
+ "source": [
+ "Pass the Q-Table to the `plot` function in order to visualize the table on the board:\n"
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "error",
+ "ename": "NameError",
+ "evalue": "name 'm' is not defined",
+ "traceback": [
+ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
+ "\u001b[0;31mNameError\u001b[0m Traceback (most recent call last)",
+ "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mm\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mplot\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0mQ\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
+ "\u001b[0;31mNameError\u001b[0m: name 'm' is not defined"
+ ]
+ }
+ ],
+ "source": [
+ "m.plot(Q)"
+ ]
+ },
+ {
+ "source": [
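+ "## Essence of Q-Learning: Bellman Equation and Learning Algorithm\n",
+ "\n",
+ "Write pseudo-code for our learning algorithm:\n",
+ "\n",
+ "* Initialize the Q-Table Q with equal values for all states and actions\n",
+ "* Set the learning rate $\\alpha\\leftarrow 1$\n",
+ "* Repeat the simulation many times\n",
+ "   1. Start at a random position\n",
+ "   2. Repeat\n",
+ "        1. Select an action $a$ at state $s$\n",
+ "        2. Execute the action by moving to a new state $s'$\n",
+ "        3. If we encounter an end-of-game condition, or the total reward is too small, exit the simulation \n",
+ "        4. Compute the reward $r$ at the new state\n",
+ "        5. Update the Q-Function according to the Bellman equation: $Q(s,a)\\leftarrow (1-\\alpha)Q(s,a)+\\alpha(r+\\gamma\\max_{a'}Q(s',a'))$\n",
+ "        6. $s\\leftarrow s'$\n",
+ "        7. Update the total reward and decrease $\\alpha$.\n",
+ "\n",
+ "## Exploit vs. Explore\n",
+ "\n",
+ "The best approach is to balance exploration and exploitation. As we learn more about our environment, we become more likely to follow the optimal route, while still choosing an unexplored path once in a while.\n",
+ "\n",
+ "## Python Implementation\n",
+ "\n",
+ "Now we are ready to implement the learning algorithm. Before that, we also need a function that converts arbitrary numbers in the Q-Table into a vector of probabilities for the corresponding actions:\n"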
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# code block 7"
+ ]
+ },
+ {
+ "source": [
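+ "We add a small amount of `eps` to the original vector in order to avoid division by 0 in the initial case, when all components of the vector are identical.\n",
+ "\n",
+ "We will run the actual learning algorithm for 5000 experiments, also called **epochs**:\n"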
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 56,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ ""
+ ]
+ }
+ ],
+ "source": [
+ "\n",
+ "from IPython.display import clear_output\n",
+ "\n",
+ "lpath = []\n",
+ "\n",
+ "# code block 8"
+ ]
+ },
+ {
+ "source": [
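+ "After executing this algorithm, the Q-Table should be updated with values that define the attractiveness of different actions at each step. Visualize the table here:\n"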
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 43,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": "",
+ "image/svg+xml": "",
+ "image/png": "\n"
+ },
+ "metadata": {
+ "needs_background": "light"
+ }
+ }
+ ],
+ "source": [
+ "m.plot(Q)"
+ ]
+ },
+ {
+ "source": [
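+ "## Checking the Policy\n",
+ "\n",
+ "Since the Q-Table lists the \"attractiveness\" of each action at each state, it is easy to use it to define efficient navigation in our world. In the simplest case, we just select the action corresponding to the highest Q-Table value:\n"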
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 13,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "2"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 13
+ }
+ ],
+ "source": [
+ "# code block 9"
+ ]
+ },
+ {
+ "source": [
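+ "If you run the code above several times, you may notice that it sometimes \"hangs\", and you need to press the STOP button in the notebook to interrupt it.\n",
+ "\n",
+ "> **Task 1:** Modify the `walk` function to limit the maximum path length to a certain number of steps (say, 100), and watch whether the code above returns this value from time to time.\n",
+ "\n",
+ "> **Task 2:** Modify the `walk` function so that it does not return to places it has already visited. This prevents `walk` from looping, but the agent can still end up \"trapped\" in a location from which it cannot escape.\n"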
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 58,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Average path length = 5.31, eaten by wolf: 0 times\n"
+ ]
+ }
+ ],
+ "source": [
+ "\n",
+ "# code block 10"
+ ]
+ },
+ {
+ "source": [],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 57,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "[]"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 57
+ },
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": "",
+ "image/svg+xml": "",
+ "image/png": "\n"
+ },
+ "metadata": {
+ "needs_background": "light"
+ }
+ }
+ ],
+ "source": [
+ "plt.plot(lpath)"
+ ]
+ },
+ {
+ "source": [
+ "## Exercise\n",
+ "## A more realistic Peter and the Wolf world\n"
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
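+ "\n---\n\n**Disclaimer**:  \nThis document was translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The document in its original language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.\n"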
+ ]
+ }
+ ]
+}
\ No newline at end of file
diff --git a/translations/zh-CN/8-Reinforcement/1-QLearning/solution/Julia/README.md b/translations/zh-CN/8-Reinforcement/1-QLearning/solution/Julia/README.md
new file mode 100644
index 000000000..f30fc4eeb
--- /dev/null
+++ b/translations/zh-CN/8-Reinforcement/1-QLearning/solution/Julia/README.md
@@ -0,0 +1,6 @@
+
+
+---
+
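+**Disclaimer**:  
+This document was translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The document in its original language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.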
\ No newline at end of file
diff --git a/translations/zh-CN/8-Reinforcement/1-QLearning/solution/R/README.md b/translations/zh-CN/8-Reinforcement/1-QLearning/solution/R/README.md
new file mode 100644
index 000000000..e939b3c66
--- /dev/null
+++ b/translations/zh-CN/8-Reinforcement/1-QLearning/solution/R/README.md
@@ -0,0 +1,6 @@
+This is a temporary placeholder
+
+---
+
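+**Disclaimer**:  
+This document was translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The document in its original language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.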
\ No newline at end of file
diff --git a/translations/zh-CN/8-Reinforcement/1-QLearning/solution/assignment-solution.ipynb b/translations/zh-CN/8-Reinforcement/1-QLearning/solution/assignment-solution.ipynb
new file mode 100644
index 000000000..a69d0a05a
--- /dev/null
+++ b/translations/zh-CN/8-Reinforcement/1-QLearning/solution/assignment-solution.ipynb
@@ -0,0 +1,478 @@
+{
+ "metadata": {
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.7.0"
+ },
+ "orig_nbformat": 2,
+ "kernelspec": {
+ "name": "python3",
+ "display_name": "Python 3.7.0 64-bit ('3.7')"
+ },
+ "interpreter": {
+ "hash": "70b38d7a306a849643e446cd70466270a13445e5987dfa1344ef2b127438fa4d"
+ },
+ "coopTranslator": {
+ "original_hash": "eadbd20d2a075efb602615ad90b1e97a",
+ "translation_date": "2025-09-03T20:52:18+00:00",
+ "source_file": "8-Reinforcement/1-QLearning/solution/assignment-solution.ipynb",
+ "language_code": "zh"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2,
+ "cells": [
+ {
+ "source": [
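+ "# Peter and the Wolf: Realistic Environment\n",
+ "\n",
+ "In our scenario, Peter could move around almost without getting tired or hungry. In a more realistic world, he has to sit down and rest from time to time, and he also has to eat. Let's make our world more realistic by implementing the following rules:\n",
+ "\n",
+ "1. By moving from one place to another, Peter loses **energy** and gains some **fatigue**.\n",
+ "2. Peter can gain more energy by eating apples.\n",
+ "3. Peter can get rid of fatigue by resting under a tree or on the grass (i.e. by walking onto a board cell with a tree or grass, the green field).\n",
+ "4. Peter needs to find and kill the wolf.\n",
+ "5. In order to kill the wolf, Peter needs to have certain levels of energy and fatigue, otherwise he loses the battle.\n"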
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import matplotlib.pyplot as plt\n",
+ "import numpy as np\n",
+ "import random\n",
+ "import math\n",
+ "from rlboard import *"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": "",
+ "image/svg+xml": "",
+ "image/png": "\n"
+ },
+ "metadata": {
+ "needs_background": "light"
+ }
+ }
+ ],
+ "source": [
+ "width, height = 8,8\n",
+ "m = Board(width,height)\n",
+ "m.randomize(seed=13)\n",
+ "m.plot()"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "actions = { \"U\" : (0,-1), \"D\" : (0,1), \"L\" : (-1,0), \"R\" : (1,0) }\n",
+ "action_idx = { a : i for i,a in enumerate(actions.keys()) }"
+ ]
+ },
+ {
+ "source": [
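+ "## Defining State\n",
+ "\n",
+ "Under the new game rules, we need to keep track of energy and fatigue at each board state. We will therefore create a `state` object that carries all the required information about the current problem state: the state of the board, the current levels of energy and fatigue, and whether Peter can beat the wolf in the terminal state:\n"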
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "class state:\n",
+ "    def __init__(self,board,energy=10,fatigue=0,init=True):\n",
+ "        self.board = board\n",
+ "        self.energy = energy\n",
+ "        self.fatigue = fatigue\n",
+ "        self.dead = False\n",
+ "        if init:\n",
+ "            self.board.random_start()\n",
+ "        self.update()\n",
+ "\n",
+ "    def at(self):\n",
+ "        return self.board.at()\n",
+ "\n",
+ "    def update(self):\n",
+ "        if self.at() == Board.Cell.water:\n",
+ "            self.dead = True\n",
+ "            return\n",
+ "        if self.at() == Board.Cell.tree:\n",
+ "            self.fatigue = 0\n",
+ "        if self.at() == Board.Cell.apple:\n",
+ "            self.energy = 10\n",
+ "\n",
+ "    def move(self,a):\n",
+ "        self.board.move(a)\n",
+ "        self.energy -= 1\n",
+ "        self.fatigue += 1\n",
+ "        self.update()\n",
+ "\n",
+ "    def is_winning(self):\n",
+ "        return self.energy > self.fatigue"
+ ]
+ },
+ {
+ "source": [
+ "Let's try to solve the problem using a random walk and see whether we succeed:\n"
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "0"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 5
+ }
+ ],
+ "source": [
+ "def random_policy(board):\n",
+ "    return random.choice(list(actions))\n",
+ "\n",
+ "def walk(board,policy):\n",
+ "    n = 0 # number of steps\n",
+ "    s = state(board)\n",
+ "    while True:\n",
+ "        if s.at() == Board.Cell.wolf:\n",
+ "            if s.is_winning():\n",
+ "                return n # success!\n",
+ "            else:\n",
+ "                return -n # failure!\n",
+ "        if s.at() == Board.Cell.water:\n",
+ "            return 0 # died\n",
+ "        a = actions[policy(board)] # use the board passed in, not the global m\n",
+ "        s.move(a)\n",
+ "        n+=1\n",
+ "\n",
+ "walk(m,random_policy)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Killed by wolf = 5, won: 1 times, drown: 94 times\n"
+ ]
+ }
+ ],
+ "source": [
+ "def print_statistics(policy):\n",
+ "    s,w,n = 0,0,0\n",
+ "    for _ in range(100):\n",
+ "        z = walk(m,policy)\n",
+ "        if z<0:\n",
+ "            w+=1\n",
+ "        elif z==0:\n",
+ "            n+=1\n",
+ "        else:\n",
+ "            s+=1\n",
+ "    print(f\"Killed by wolf = {w}, won: {s} times, drown: {n} times\")\n",
+ "\n",
+ "print_statistics(random_policy)"
+ ]
+ },
+ {
+ "source": [
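+ "## Reward Function\n",
+ "\n",
+ "### What is a reward function?\n",
+ "\n",
+ "The reward function is a core component of reinforcement learning that guides the agent's behavior. It defines the reward the agent receives in a given state or for taking a given action. Through the reward function, the agent learns how to act in the environment in order to reach its goal.\n",
+ "\n",
+ "### Principles for designing a reward function\n",
+ "\n",
+ "An effective reward function should follow these principles:\n",
+ "\n",
+ "1. **State the goal clearly**  \n",
+ "   The reward function should clearly reflect the task goal. For example, if the goal is for a robot to avoid obstacles and reach a target position, the reward function should encourage the robot to approach the target and penalize collisions.\n",
+ "\n",
+ "2. **Avoid sparse rewards**  \n",
+ "   Sparse rewards can make learning slow. Try to provide more frequent feedback to help the agent understand sooner which behaviors are beneficial.\n",
+ "\n",
+ "3. **Balance short-term and long-term rewards**  \n",
+ "   The reward function should encourage actions that are beneficial in the short term while still accounting for the long-term goal. For example, avoid a function that only looks at the immediate reward and ignores long-term effects.\n",
+ "\n",
+ "4. **Prevent unintended behavior**  \n",
+ "   Make sure the reward function does not encourage unexpected or unreasonable behavior. For example, a reward function that rewards the agent for speed alone may lead the agent to ignore safety.\n",
+ "\n",
+ "### Example reward function\n",
+ "\n",
+ "Here is a simple example of a reward function:\n",
+ "\n",
+ "```python\n",
+ "def reward_function(state, action):\n",
+ "    if state == \"goal_reached\":\n",
+ "        return 100  # high reward for reaching the goal\n",
+ "    elif state == \"collision\":\n",
+ "        return -50  # penalty for a collision\n",
+ "    else:\n",
+ "        return -1   # small per-step penalty to encourage finishing quickly\n",
+ "```\n",
+ "\n",
+ "### Common questions\n",
+ "\n",
+ "#### What if the reward function is too complex?\n",
+ "\n",
+ "A reward function does not need to be complex. A simple and clear one is usually easier to debug and tune, while a complex one can make it hard for the agent to learn the right behavior.\n",
+ "\n",
+ "#### How should unexpected agent behavior be handled?\n",
+ "\n",
+ "If the agent behaves unexpectedly, check the reward function for loopholes. For example, the agent may take unreasonable actions that happen to maximize reward. Adjust the reward function so that it guides the agent correctly.\n",
+ "\n",
+ "#### Does the reward function need to be adjusted dynamically?\n",
+ "\n",
+ "In some cases, adjusting the reward function dynamically can help. For example, as the task becomes harder, the reward can be raised gradually to keep the agent improving.\n",
+ "\n",
+ "### Summary\n",
+ "\n",
+ "The reward function is a crucial part of reinforcement learning. Designing an effective one requires a clear goal, timely feedback, and protection against unintended behavior. Refining the reward function helps the agent learn faster and reach its goal.\n"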
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def reward(s):\n",
+ "    r = s.energy-s.fatigue\n",
+ "    if s.at()==Board.Cell.wolf:\n",
+ "        return 100 if s.is_winning() else -100\n",
+ "    if s.at()==Board.Cell.water:\n",
+ "        return -100\n",
+ "    return r"
+ ]
+ },
+ {
+ "source": [
+ "## Q-Learning Algorithm\n",
+ "\n",
+ "The learning algorithm itself is almost unchanged; we just use `state` instead of only the board position.\n"
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "Q = np.ones((width,height,len(actions)),dtype=float)*1.0/len(actions)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def probs(v,eps=1e-4):\n",
+ "    v = v-v.min()+eps\n",
+ "    v = v/v.sum()\n",
+ "    return v"
+ ]
+ },
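+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "To see what `probs` does to one row of the Q-Table, here is a small self-contained sketch. The two sample vectors are made up for illustration and are not taken from the trained table:\n",
+ "\n",
+ "```python\n",
+ "import numpy as np\n",
+ "\n",
+ "def probs(v, eps=1e-4):\n",
+ "    v = v - v.min() + eps   # shift so the smallest entry becomes eps\n",
+ "    v = v / v.sum()         # normalise so the entries sum to 1\n",
+ "    return v\n",
+ "\n",
+ "uniform = probs(np.array([0.25, 0.25, 0.25, 0.25]))  # identical entries: every action equally likely\n",
+ "skewed = probs(np.array([-10.0, 0.0, 0.0, 5.0]))     # the lowest entry gets a weight close to zero\n",
+ "print(uniform, skewed)\n",
+ "```\n",
+ "\n",
+ "Because of the shift by the minimum, the least attractive action is almost never sampled, while the remaining actions keep weights proportional to how far they sit above that minimum.\n"
+ ]
+ },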
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ ""
+ ]
+ }
+ ],
+ "source": [
+ "\n",
+ "from IPython.display import clear_output\n",
+ "\n",
+ "lpath = []\n",
+ "\n",
+ "for epoch in range(10000):\n",
+ "    clear_output(wait=True)\n",
+ "    print(f\"Epoch = {epoch}\",end='')\n",
+ "\n",
+ "    # Pick initial point\n",
+ "    s = state(m)\n",
+ "    \n",
+ "    # Start travelling\n",
+ "    n=0\n",
+ "    cum_reward = 0\n",
+ "    while True:\n",
+ "        x,y = s.board.human\n",
+ "        v = probs(Q[x,y])\n",
+ "        while True:\n",
+ "            a = random.choices(list(actions),weights=v)[0]\n",
+ "            dpos = actions[a]\n",
+ "            if s.board.is_valid(s.board.move_pos(s.board.human,dpos)):\n",
+ "                break\n",
+ "        s.move(dpos)\n",
+ "        r = reward(s)\n",
+ "        if abs(r)==100: # end of game\n",
+ "            print(f\" {n} steps\",end='\\r')\n",
+ "            lpath.append(n)\n",
+ "            break\n",
+ "        alpha = np.exp(-n / 3000)\n",
+ "        gamma = 0.5\n",
+ "        ai = action_idx[a]\n",
+ "        Q[x,y,ai] = (1 - alpha) * Q[x,y,ai] + alpha * (r + gamma * Q[x+dpos[0], y+dpos[1]].max())\n",
+ "        n+=1"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": "",
+ "image/svg+xml": "",
+ "image/png": "\n"
+ },
+ "metadata": {
+ "needs_background": "light"
+ }
+ }
+ ],
+ "source": [
+ "m.plot(Q)"
+ ]
+ },
+ {
+ "source": [
+ "## Results\n",
+ "\n",
+ "Let's see whether we managed to train Peter to fight the wolf!\n"
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Killed by wolf = 1, won: 9 times, drown: 90 times\n"
+ ]
+ }
+ ],
+ "source": [
+ "def qpolicy(m):\n",
+ "    x,y = m.human\n",
+ "    v = probs(Q[x,y])\n",
+ "    a = random.choices(list(actions),weights=v)[0]\n",
+ "    return a\n",
+ "\n",
+ "print_statistics(qpolicy)"
+ ]
+ },
+ {
+ "source": [
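+ "We now see fewer cases of drowning, but Peter is still not always able to kill the wolf. Try experimenting to see whether tuning the hyperparameters improves this result.\n"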
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 13,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "[]"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 13
+ },
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": "",
+ "image/svg+xml": "",
+ "image/png": "\n"
+ },
+ "metadata": {
+ "needs_background": "light"
+ }
+ }
+ ],
+ "source": [
+ "plt.plot(lpath)"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {},
+ "outputs": [],
+ "source": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
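+ "\n---\n\n**Disclaimer**:  \nThis document was translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The document in its original language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.\n"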
+ ]
+ }
+ ]
+}
\ No newline at end of file
diff --git a/translations/zh-CN/8-Reinforcement/1-QLearning/solution/notebook.ipynb b/translations/zh-CN/8-Reinforcement/1-QLearning/solution/notebook.ipynb
new file mode 100644
index 000000000..c4ea93772
--- /dev/null
+++ b/translations/zh-CN/8-Reinforcement/1-QLearning/solution/notebook.ipynb
@@ -0,0 +1,577 @@
+{
+ "metadata": {
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.7.0"
+ },
+ "orig_nbformat": 2,
+ "kernelspec": {
+ "name": "python3",
+ "display_name": "Python 3.7.0 64-bit ('3.7')"
+ },
+ "interpreter": {
+ "hash": "70b38d7a306a849643e446cd70466270a13445e5987dfa1344ef2b127438fa4d"
+ },
+ "coopTranslator": {
+ "original_hash": "488431336543f71f14d4aaf0399e3381",
+ "translation_date": "2025-09-03T20:49:17+00:00",
+ "source_file": "8-Reinforcement/1-QLearning/solution/notebook.ipynb",
+ "language_code": "zh"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2,
+ "cells": [
+ {
+ "source": [
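+ "# Peter and the Wolf: Reinforcement Learning Primer\n",
+ "\n",
+ "In this tutorial, we will learn how to apply reinforcement learning to a path-finding problem. The setting is inspired by the musical fairy tale [Peter and the Wolf](https://en.wikipedia.org/wiki/Peter_and_the_Wolf) by the Russian composer [Sergei Prokofiev](https://en.wikipedia.org/wiki/Sergei_Prokofiev). It is the story of the young pioneer Peter, who bravely leaves his house for the forest clearing to chase the wolf. We will train machine learning algorithms that help Peter explore the surrounding area and build an optimal navigation map.\n",
+ "\n",
+ "First, let's import a few useful libraries:\n"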
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "import matplotlib.pyplot as plt\n",
+ "import numpy as np\n",
+ "import random\n",
+ "import math"
+ ]
+ },
+ {
+ "source": [
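+ "## Overview of Reinforcement Learning\n",
+ "\n",
+ "**Reinforcement Learning** (RL) is a learning technique that lets us learn the optimal behavior of an **agent** in some **environment** by running many experiments. The agent in this environment should have a **goal**, defined by a **reward function**.\n",
+ "\n",
+ "## The Environment\n",
+ "\n",
+ "For simplicity, let's take Peter's world to be a square board of size `width` x `height`. Each cell on this board can be:\n",
+ "* **ground**, on which Peter and other creatures can walk\n",
+ "* **water**, on which you obviously cannot walk\n",
+ "* a **tree** or **grass**, a place where you can rest\n",
+ "* an **apple**, which represents food Peter would be glad to find\n",
+ "* a **wolf**, which is dangerous and should be avoided\n",
+ "\n",
+ "To work with the environment, we define a class called `Board`. To keep this notebook uncluttered, all the code for working with the board lives in a separate `rlboard` module, which we import now. You can look inside the module for the implementation details.\n"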
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 2,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "from rlboard import *"
+ ]
+ },
+ {
+ "source": [
+ "Now let's create a random board and see what it looks like:\n"
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 3,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": "",
+ "image/svg+xml": "",
+ "image/png": "\n"
+ },
+ "metadata": {
+ "needs_background": "light"
+ }
+ }
+ ],
+ "source": [
+ "width, height = 8,8\n",
+ "m = Board(width,height)\n",
+ "m.randomize(seed=13)\n",
+ "m.plot()"
+ ]
+ },
+ {
+ "source": [
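+ "## Actions and Policy\n",
+ "\n",
+ "In our example, Peter's goal is to find an apple while avoiding the wolf and other obstacles. To do this, he can walk around until he finds an apple. At any position he can therefore choose one of the following actions: up, down, left or right. We define these actions as a dictionary and map each one to the corresponding pair of coordinate changes. For example, moving right (`R`) corresponds to the pair `(1,0)`.\n"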
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 4,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "actions = { \"U\" : (0,-1), \"D\" : (0,1), \"L\" : (-1,0), \"R\" : (1,0) }\n",
+ "action_idx = { a : i for i,a in enumerate(actions.keys()) }"
+ ]
+ },
+ {
+ "source": [
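+ "The strategy of our agent (Peter) is defined by a so-called **policy**. Let's consider the simplest policy, called a **random walk**.\n",
+ "\n",
+ "## Random Walk\n",
+ "\n",
+ "Let's first solve our problem by implementing a random walk strategy.\n"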
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {
+ "tags": []
+ },
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "18"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 5
+ }
+ ],
+ "source": [
+ "def random_policy(m):\n",
+ "    return random.choice(list(actions))\n",
+ "\n",
+ "def walk(m,policy,start_position=None):\n",
+ "    n = 0 # number of steps\n",
+ "    # set initial position\n",
+ "    if start_position:\n",
+ "        m.human = start_position\n",
+ "    else:\n",
+ "        m.random_start()\n",
+ "    while True:\n",
+ "        if m.at() == Board.Cell.apple:\n",
+ "            return n # success!\n",
+ "        if m.at() in [Board.Cell.wolf, Board.Cell.water]:\n",
+ "            return -1 # eaten by wolf or drowned\n",
+ "        while True:\n",
+ "            a = actions[policy(m)]\n",
+ "            new_pos = m.move_pos(m.human,a)\n",
+ "            if m.is_valid(new_pos) and m.at(new_pos)!=Board.Cell.water:\n",
+ "                m.move(a) # do the actual move\n",
+ "                break\n",
+ "        n+=1\n",
+ "\n",
+ "walk(m,random_policy)"
+ ]
+ },
+ {
+ "source": [
+ "Let's run the random walk experiment several times and look at the average number of steps taken:\n"
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Average path length = 32.87096774193548, eaten by wolf: 7 times\n"
+ ]
+ }
+ ],
+ "source": [
+ "def print_statistics(policy):\n",
+ "    s,w,n = 0,0,0\n",
+ "    for _ in range(100):\n",
+ "        z = walk(m,policy)\n",
+ "        if z<0:\n",
+ "            w+=1\n",
+ "        else:\n",
+ "            s += z\n",
+ "            n += 1\n",
+ "    print(f\"Average path length = {s/n}, eaten by wolf: {w} times\")\n",
+ "\n",
+ "print_statistics(random_policy)"
+ ]
+ },
+ {
+ "source": [
+ "## Reward Function\n",
+ "\n",
+ "To make our policy more intelligent, we need to understand which moves are \"better\" than others.\n"
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "move_reward = -0.1\n",
+ "goal_reward = 10\n",
+ "end_reward = -10\n",
+ "\n",
+ "def reward(m,pos=None):\n",
+ "    pos = pos or m.human\n",
+ "    if not m.is_valid(pos):\n",
+ "        return end_reward\n",
+ "    x = m.at(pos)\n",
+ "    if x==Board.Cell.water or x == Board.Cell.wolf:\n",
+ "        return end_reward\n",
+ "    if x==Board.Cell.apple:\n",
+ "        return goal_reward\n",
+ "    return move_reward"
+ ]
+ },
+ {
+ "source": [
+ "## Q-Learning\n",
+ "\n",
+ "Build a Q-Table, that is, a multi-dimensional array. Since our board has dimensions `width` x `height`, we can represent the Q-Table by a numpy array of shape `width` x `height` x `len(actions)`:\n"
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "Q = np.ones((width,height,len(actions)),dtype=float)*1.0/len(actions)"
+ ]
+ },
+ {
+ "source": [
+ "Pass the Q-Table to the plot function in order to visualize the table on the board:\n"
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": "",
+ "image/svg+xml": "",
+ "image/png": "\n"
+ },
+ "metadata": {
+ "needs_background": "light"
+ }
+ }
+ ],
+ "source": [
+ "m.plot(Q)"
+ ]
+ },
+ {
+ "source": [
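+ "## Essence of Q-Learning: Bellman Equation and Learning Algorithm\n",
+ "\n",
+ "Pseudo-code for the learning algorithm:\n",
+ "\n",
+ "* Initialize the Q-Table Q with equal values for all states and actions\n",
+ "* Set the learning rate $\\alpha\\leftarrow 1$\n",
+ "* Repeat the simulation many times\n",
+ "   1. Start at a random position\n",
+ "   1. Repeat\n",
+ "        1. Select an action $a$ at state $s$\n",
+ "        2. Execute the action by moving to a new state $s'$\n",
+ "        3. If we encounter an end-of-game condition, or the total reward is too small, exit the simulation\n",
+ "        4. Compute the reward $r$ at the new state\n",
+ "        5. Update the Q-Function according to the Bellman equation: $Q(s,a)\\leftarrow (1-\\alpha)Q(s,a)+\\alpha(r+\\gamma\\max_{a'}Q(s',a'))$\n",
+ "        6. $s\\leftarrow s'$\n",
+ "        7. Update the total reward and decrease $\\alpha$.\n",
+ "\n",
+ "## Exploit vs. Explore\n",
+ "\n",
+ "The best approach is to balance exploration and exploitation. The more we learn about the environment, the more we tend to follow the optimal route, while still occasionally choosing an unexplored path.\n",
+ "\n",
+ "## Python Implementation\n",
+ "\n",
+ "Now we are ready to implement the learning algorithm. Before that, we also need a function that converts arbitrary numbers in the Q-Table into a vector of probabilities for the corresponding actions:\n"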
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def probs(v,eps=1e-4):\n",
+ "    v = v-v.min()+eps\n",
+ "    v = v/v.sum()\n",
+ "    return v"
+ ]
+ },
+ {
+ "source": [
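+ "We add a small amount `eps` to the original vector in order to avoid division by 0 in the initial case, when all components of the vector are identical.\n",
+ "\n",
+ "We will run the actual learning algorithm for 10000 experiments, also called **epochs**:\n"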
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ ""
+ ]
+ }
+ ],
+ "source": [
+ "\n",
+ "from IPython.display import clear_output\n",
+ "\n",
+ "lpath = []\n",
+ "\n",
+ "for epoch in range(10000):\n",
+ "    clear_output(wait=True)\n",
+ "    print(f\"Epoch = {epoch}\",end='')\n",
+ "\n",
+ "    # Pick initial point\n",
+ "    m.random_start()\n",
+ "    \n",
+ "    # Start travelling\n",
+ "    n=0\n",
+ "    cum_reward = 0\n",
+ "    while True:\n",
+ "        x,y = m.human\n",
+ "        v = probs(Q[x,y])\n",
+ "        a = random.choices(list(actions),weights=v)[0]\n",
+ "        dpos = actions[a]\n",
+ "        m.move(dpos,check_correctness=False) # we allow player to move outside the board, which terminates episode\n",
+ "        r = reward(m)\n",
+ "        cum_reward += r\n",
+ "        if r==end_reward or cum_reward < -1000:\n",
+ "            print(f\" {n} steps\",end='\\r')\n",
+ "            lpath.append(n)\n",
+ "            break\n",
+ "        alpha = np.exp(-n / 3000)\n",
+ "        gamma = 0.5\n",
+ "        ai = action_idx[a]\n",
+ "        Q[x,y,ai] = (1 - alpha) * Q[x,y,ai] + alpha * (r + gamma * Q[x+dpos[0], y+dpos[1]].max())\n",
+ "        n+=1"
+ ]
+ },
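+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "To make the update rule used in the loop above concrete, the sketch below applies it once to a made-up situation: a fresh Q-Table, a single move to the right from cell `(0,0)`, and an ordinary step reward of `-0.1`. The start cell, the chosen action and the variable names `dx`, `dy` and `n_actions` are assumptions picked for illustration only:\n",
+ "\n",
+ "```python\n",
+ "import numpy as np\n",
+ "\n",
+ "width, height, n_actions = 8, 8, 4\n",
+ "Q = np.ones((width, height, n_actions), dtype=float) / n_actions\n",
+ "\n",
+ "x, y = 0, 0                  # hypothetical current cell\n",
+ "dx, dy = 1, 0                # coordinate change for action R\n",
+ "ai = 3                       # index of R in the actions dictionary (U, D, L, R)\n",
+ "r = -0.1                     # reward for an ordinary move\n",
+ "n = 0                        # first step of the episode\n",
+ "alpha = np.exp(-n / 3000)    # equals 1 on the first step\n",
+ "gamma = 0.5\n",
+ "\n",
+ "# Bellman update for the (state, action) pair that was just taken\n",
+ "Q[x, y, ai] = (1 - alpha) * Q[x, y, ai] + alpha * (r + gamma * Q[x + dx, y + dy].max())\n",
+ "print(Q[x, y])\n",
+ "```\n",
+ "\n",
+ "With `alpha` equal to 1 the old value is overwritten entirely, so the entry for `R` becomes roughly `0.025` (that is, `-0.1 + 0.5 * 0.25`), while the other three entries of that cell stay at `0.25`.\n"
+ ]
+ },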
+ {
+ "source": [
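+ "After running this algorithm, the Q-Table should hold values that define the attractiveness of the different actions at each step. Visualize the table here:\n"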
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": "",
+ "image/svg+xml": "",
+ "image/png": "\n"
+ },
+ "metadata": {
+ "needs_background": "light"
+ }
+ }
+ ],
+ "source": [
+ "m.plot(Q)"
+ ]
+ },
+ {
+ "source": [
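+ "## Checking the Policy\n",
+ "\n",
+ "Since the Q-Table lists the \"attractiveness\" of each action at each state, it is easy to use it to define efficient navigation in our world. In the simplest case, we just select the action corresponding to the highest Q-Table value:\n"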
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 13,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "2"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 13
+ }
+ ],
+ "source": [
+ "def qpolicy_strict(m):\n",
+ "    x,y = m.human\n",
+ "    v = probs(Q[x,y])\n",
+ "    a = list(actions)[np.argmax(v)]\n",
+ "    return a\n",
+ "\n",
+ "walk(m,qpolicy_strict)"
+ ]
+ },
+ {
+ "source": [
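+ "If you run the code above several times, you may notice that it sometimes \"hangs\", and you need to press the STOP button in the notebook to interrupt it.\n",
+ "\n",
+ "> **Task 1:** Modify the `walk` function to limit the maximum path length to a certain number of steps (say, 100), and watch whether the code above returns this value from time to time.\n",
+ "\n",
+ "> **Task 2:** Modify the `walk` function so that it does not return to places it has already visited. This prevents `walk` from looping, but the agent can still end up \"trapped\" in a location from which it cannot escape.\n"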
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
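+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
+ "One possible approach to Task 1 is sketched below. It relies on `Board`, `actions` and the `m.at`, `m.move_pos`, `m.is_valid` and `m.move` calls already used in the cells above, so it is meant to run inside this notebook. The name `walk_limited` and the `max_steps` parameter are hypothetical choices for this sketch, not part of the original lesson:\n",
+ "\n",
+ "```python\n",
+ "def walk_limited(m, policy, start_position=None, max_steps=100):\n",
+ "    n = 0  # number of steps attempted so far\n",
+ "    if start_position:\n",
+ "        m.human = start_position\n",
+ "    else:\n",
+ "        m.random_start()\n",
+ "    while True:\n",
+ "        if m.at() == Board.Cell.apple:\n",
+ "            return n  # success\n",
+ "        if m.at() in [Board.Cell.wolf, Board.Cell.water]:\n",
+ "            return -1  # eaten by wolf or drowned\n",
+ "        if n >= max_steps:\n",
+ "            return max_steps  # step budget exhausted\n",
+ "        a = actions[policy(m)]\n",
+ "        new_pos = m.move_pos(m.human, a)\n",
+ "        # A rejected move still counts as a step, so a deterministic policy\n",
+ "        # that keeps proposing an invalid move cannot loop forever.\n",
+ "        if m.is_valid(new_pos) and m.at(new_pos) != Board.Cell.water:\n",
+ "            m.move(a)\n",
+ "        n += 1\n",
+ "```\n",
+ "\n",
+ "Calling `walk_limited(m, qpolicy_strict)` repeatedly and counting how often the result equals `max_steps` gives a rough idea of how frequently the greedy policy gets stuck.\n"
+ ]
+ },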
+ {
+ "cell_type": "code",
+ "execution_count": 14,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Average path length = 3.45, eaten by wolf: 0 times\n"
+ ]
+ }
+ ],
+ "source": [
+ "\n",
+ "def qpolicy(m):\n",
+ "    x,y = m.human\n",
+ "    v = probs(Q[x,y])\n",
+ "    a = random.choices(list(actions),weights=v)[0]\n",
+ "    return a\n",
+ "\n",
+ "print_statistics(qpolicy)"
+ ]
+ },
+ {
+ "source": [],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 15,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "[]"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 15
+ },
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": "",
+ "image/svg+xml": "",
+ "image/png": "\n"
+ },
+ "metadata": {
+ "needs_background": "light"
+ }
+ }
+ ],
+ "source": [
+ "plt.plot(lpath)"
+ ]
+ },
+ {
+ "source": [
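+ "We can see that the average path length increases at first. This is probably because, when we know nothing about the environment, we are likely to get trapped in bad states such as water or the wolf. As we learn more and start using this knowledge, we can explore the environment for longer, but we still do not know well where the apples are.\n",
+ "\n",
+ "Once we have learned enough, it becomes easier for the agent to reach the goal, and the path length starts to decrease. However, we are still open to exploration, so we often diverge from the best path and try new options, which makes the path longer than optimal.\n",
+ "\n",
+ "We also observe on this graph that at some point the path length increases abruptly. This shows the stochastic nature of the process: at some point we can \"spoil\" the Q-Table coefficients by overwriting them with new values. Ideally this should be minimized by decreasing the learning rate, so that towards the end of training we adjust the Q-Table values only by small amounts.\n",
+ "\n",
+ "Overall, it is important to remember that the success and quality of the learning process depend significantly on parameters such as the learning rate, learning rate decay, and discount factor. These are often called **hyperparameters**, to distinguish them from the **parameters** that we optimize during training (for example, the Q-Table coefficients). The process of finding the best hyperparameter values is called **hyperparameter optimization**, and it deserves a separate topic.\n"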
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "source": [
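+ "## Exercise\n",
+ "#### A more realistic Peter and the Wolf world\n",
+ "\n",
+ "In our scenario, Peter could move around almost without getting tired or hungry. In a more realistic world, he has to sit down and rest from time to time, and also feed himself. Let's make our world more realistic by implementing the following rules:\n",
+ "\n",
+ "1. Each time Peter moves from one place to another, he loses **energy** and gains some **fatigue**.\n",
+ "2. Peter can regain energy by eating apples.\n",
+ "3. Peter can get rid of fatigue by resting under a tree or on the grass (that is, by walking onto a board cell with a tree or grass, a green field).\n",
+ "4. Peter needs to find and kill the wolf.\n",
+ "5. In order to kill the wolf, Peter needs to have certain levels of energy and fatigue; otherwise he loses the battle.\n",
+ "\n",
+ "Modify the reward function above according to the rules of the game, run the reinforcement learning algorithm to learn the best strategy for winning the game, and compare the results of the random walk with your algorithm in terms of the number of games won and lost.\n",
+ "\n",
+ "> **Note**: You may need to adjust the hyperparameters to make it work, especially the number of epochs. Because success in the game (fighting the wolf) is a rare event, you can expect a much longer training time.\n"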
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
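+ "\n---\n\n**Disclaimer**: \nThis document was translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The document in its original language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.\n"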
+ ]
+ }
+ ]
+}
\ No newline at end of file
diff --git a/translations/zh-CN/8-Reinforcement/2-Gym/README.md b/translations/zh-CN/8-Reinforcement/2-Gym/README.md
new file mode 100644
index 000000000..2c8cc473d
--- /dev/null
+++ b/translations/zh-CN/8-Reinforcement/2-Gym/README.md
@@ -0,0 +1,342 @@
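+# CartPole Skating
+
+The problem we solved in the previous lesson might seem like a toy problem with little relevance to real-life scenarios. This is not the case, because many real-world problems share the same setting, including playing chess or Go. They are similar because there, too, we have a board with given rules and a **discrete state**.
+
+## [Pre-lecture quiz](https://ff-quizzes.netlify.app/en/ml/)
+
+## Introduction
+
+In this lesson we will apply the same principles of Q-Learning to a problem with a **continuous state**, that is, a state given by one or more real numbers. We will deal with the following problem:
+
+> **Problem**: If Peter wants to escape from the wolf, he needs to be able to move faster. We will see how Peter can learn to skate, and in particular to keep his balance, using Q-Learning.
+
+
+
+> Peter and his friends get creative to escape the wolf! Image by [Jen Looper](https://twitter.com/jenlooper)
+
+We will use a simplified version of balancing known as the **CartPole** problem. In the CartPole world, we have a horizontal slider that can move left or right, and the goal is to balance a vertical pole on top of the slider.
+
+## Prerequisites
+
+In this lesson we will use a library called **OpenAI Gym** to simulate different **environments**. You can run this lesson's code locally (for example, from Visual Studio Code), in which case the simulation opens in a new window. When running the code online, you may need to make some tweaks to the code, as described [here](https://towardsdatascience.com/rendering-openai-gym-envs-on-binder-and-google-colab-536f99391cc7).
+
+## OpenAI Gym
+
+In the previous lesson, the rules of the game and the state were given by the `Board` class, which we defined ourselves. Here we will use a special **simulation environment** that simulates the physics behind the balancing pole. One of the most popular simulation environments for training reinforcement learning algorithms is [Gym](https://gym.openai.com/), maintained by [OpenAI](https://openai.com/). With Gym we can create different **environments**, from a CartPole simulation to Atari games.
+
+> **Note**: You can see the other environments available in OpenAI Gym [here](https://gym.openai.com/envs/#classic_control).
+
+First, let's install Gym and import the required libraries (code block 1):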
+
+```python
+import sys
+!{sys.executable} -m pip install gym
+
+import gym
+import matplotlib.pyplot as plt
+import numpy as np
+import random
+```
+
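+## Exercise - initialize a CartPole environment
+
+To work with the CartPole balancing problem, we need to initialize the corresponding environment. Each environment is associated with:
+
+- **Observation space**, which defines the structure of the information we receive from the environment. For the CartPole problem, we receive the position of the pole, its velocity, and some other values.
+
+- **Action space**, which defines the possible actions. In our case the action space is discrete and consists of two actions: **left** and **right**. (code block 2)
+
+1. To initialize, type the following code: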
+
+ ```python
+ env = gym.make("CartPole-v1")
+ print(env.action_space)
+ print(env.observation_space)
+ print(env.action_space.sample())
+ ```
+
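+To see how the environment works, let's run a short simulation of 100 steps. At each step we provide an action to take; in this simulation we simply pick a random action from `action_space`.
+
+1. Run the code below and see what it leads to.
+
+    ✅ Remember that it is preferable to run this code on a local Python installation! (code block 3)
+
+    ```python
+    env.reset()
+
+    for i in range(100):
+        env.render()
+        env.step(env.action_space.sample())
+    env.close()
+    ```
+
+    You should see something similar to this image:
+
+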
+
+1. During the simulation, we need to get observations in order to decide how to act. In fact, the `step` function returns the current observation, a reward, and a `done` flag that tells us whether the simulation should stop: (code block 4)
+
+    ```python
+    env.reset()
+
+    done = False
+    while not done:
+        env.render()
+        obs, rew, done, info = env.step(env.action_space.sample())
+        print(f"{obs} -> {rew}")
+    env.close()
+    ```
+
+    You will see something like this in the notebook output:
+
+    ```text
+    [ 0.03403272 -0.24301182  0.02669811  0.2895829 ] -> 1.0
+    [ 0.02917248 -0.04828055  0.03248977  0.00543839] -> 1.0
+    [ 0.02820687  0.14636075  0.03259854 -0.27681916] -> 1.0
+    [ 0.03113408  0.34100283  0.02706215 -0.55904489] -> 1.0
+    [ 0.03795414  0.53573468  0.01588125 -0.84308041] -> 1.0
+    ...
+    [ 0.17299878  0.15868546 -0.20754175 -0.55975453] -> 1.0
+    [ 0.17617249  0.35602306 -0.21873684 -0.90998894] -> 1.0
+    ```
+
+    The observation vector returned at each step of the simulation contains the following values:
+    - Position of the cart
+    - Velocity of the cart
+    - Angle of the pole
+    - Rotation rate of the pole
+
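+1. Get the minimum and maximum values of those numbers: (code block 5)
+
+    ```python
+    print(env.observation_space.low)
+    print(env.observation_space.high)
+    ```
+
+    You may also notice that the reward at each simulation step is always 1. This is because our goal is to survive as long as possible, that is, to keep the pole in a reasonably vertical position for the longest period of time.
+
+    ✅ In fact, the CartPole simulation is considered solved if we manage to get an average reward of 195 over 100 consecutive trials.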
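The "solved" criterion above can be checked mechanically. The following is a minimal sketch, not part of the original lesson: `is_solved` is a hypothetical helper, and it assumes `episode_rewards` is a list that holds the total reward of each episode in the order the episodes were played.

```python
import numpy as np

def is_solved(episode_rewards, window=100, threshold=195.0):
    """Return True if any `window` consecutive episodes average at least `threshold`."""
    r = np.asarray(episode_rewards, dtype=float)
    if len(r) < window:
        return False
    # Moving average over every run of `window` consecutive episodes
    means = np.convolve(r, np.ones(window) / window, mode="valid")
    return bool(means.max() >= threshold)

# Illustrative, made-up reward histories (not real training results)
print(is_solved([200.0] * 100))               # 100 long episodes in a row
print(is_solved([200.0] * 50 + [20.0] * 50))  # the 100-episode average is only 110
```

The window and threshold are parameters so that the same check can be reused with other criteria.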
+
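+## State discretization
+
+In Q-Learning, we need to build a Q-Table that defines what to do in each state. To be able to do this, the state must be **discrete**; more precisely, it should contain a finite number of discrete values. Thus, we need to somehow **discretize** our observations, mapping them to a finite set of states.
+
+There are a few ways to do this:
+
+- **Divide into bins**. If we know the interval of a certain value, we can divide this interval into a number of **bins**, and then replace the value with the number of the bin it belongs to. This can be done using the numpy [`digitize`](https://numpy.org/doc/stable/reference/generated/numpy.digitize.html) method. In this case we know the state size precisely, because it depends on the number of bins we select for discretization.
+
+- **Linear interpolation**. We can use linear interpolation to bring values to some finite interval (say, from -20 to 20), and then convert the numbers to integers by rounding them. This gives us a bit less control over the size of the state, especially if we do not know the exact ranges of the input values. For example, in our case 2 out of the 4 observation values have no upper or lower bounds, which may result in an infinite number of states.
+
+In our example we will go with the second approach. As you may notice later, despite the undefined upper and lower bounds, those values rarely fall outside certain finite intervals, so states with extreme values will be very rare.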
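To make the difference between the two approaches concrete, here is a small stand-alone sketch. The sample values, the bin edges, and the step size of 0.25 are illustrative assumptions, not values required by the lesson.

```python
import numpy as np

values = np.array([-7.0, -4.5, 0.2, 4.9, 6.0])   # made-up observations

# Approach 1: bins. Eleven edges delimit ten equal-width bins over [-5, 5].
edges = np.linspace(-5, 5, 11)
bin_ids = np.digitize(values, edges)
# 0 means "below the first edge"; len(edges) means "at or above the last edge"
print(bin_ids)

# Approach 2: divide by a step size and truncate to an integer.
step = 0.25
scaled = (values / step).astype(int)   # astype(int) truncates toward zero
print(scaled)
```

Note that `np.digitize` reserves two extra indices for out-of-range inputs, so an array indexed by bin number needs room for them (or the indices need to be clipped), whereas the second approach produces unbounded integers.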
+
+1. Here is a function that takes the observation from our model and produces a tuple of 4 integer values: (code block 6)
+
+    ```python
+    def discretize(x):
+        return tuple((x/np.array([0.25, 0.25, 0.01, 0.1])).astype(int))
+    ```
+
+1. Let's also explore another discretization method, using bins: (code block 7)
+
+ ```python
+ def create_bins(i,num):
+ return np.arange(num+1)*(i[1]-i[0])/num+i[0]
+
+ print("Sample bins for interval (-5,5) with 10 bins\n",create_bins((-5,5),10))
+
+ ints = [(-5,5),(-2,2),(-0.5,0.5),(-2,2)] # intervals of values for each parameter
+ nbins = [20,20,10,10] # number of bins for each parameter
+ bins = [create_bins(ints[i],nbins[i]) for i in range(4)]
+
+ def discretize_bins(x):
+ return tuple(np.digitize(x[i],bins[i]) for i in range(4))
+ ```
+
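+1. Let's now run a short simulation and observe those discrete environment values. Feel free to try both `discretize` and `discretize_bins` and see whether there is a difference.
+
+    ✅ `discretize_bins` returns the bin number, which is 0-based. Thus, for input values around 0 it returns a number from the middle of the range (10). In `discretize`, we did not care about the range of the output values and allowed them to be negative, so the state values are not shifted and 0 corresponds to 0. (code block 8)
+
+    ```python
+    env.reset()
+
+    done = False
+    while not done:
+        #env.render()
+        obs, rew, done, info = env.step(env.action_space.sample())
+        #print(discretize_bins(obs))
+        print(discretize(obs))
+    env.close()
+    ```
+
+    ✅ Uncomment the line starting with `env.render` if you want to see how the environment executes. Otherwise you can execute it in the background, which is faster. We will use this "invisible" execution during our Q-Learning process.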
+
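+## The Q-Table structure
+
+In the previous lesson, the state was a simple pair of numbers from 0 to 7, so it was convenient to represent the Q-Table as a numpy tensor with a shape of 8x8x2. If we use bin discretization, the size of the state vector is also known, so we can use the same approach and represent the Q-Table as an array of shape 20x20x10x10x2 (here 2 is the dimension of the action space, and the first dimensions correspond to the number of bins we selected for each parameter of the observation space).
+
+However, sometimes the precise dimensions of the observation space are not known. With the `discretize` function, we can never be sure that the state stays within certain limits, because some of the original values are not bounded. Thus, we will use a slightly different approach and represent the Q-Table as a dictionary.
+
+1. Use the pair *(state, action)* as the dictionary key; the value corresponds to the Q-Table entry. (code block 9)
+
+    ```python
+    Q = {}
+    actions = (0,1)
+
+    def qvalues(state):
+        return [Q.get((state,a),0) for a in actions]
+    ```
+
+    Here we also define a function `qvalues()`, which returns a list of Q-Table values for a given state, one for each possible action. If the entry is not present in the Q-Table, we return 0 as the default.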
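The behaviour of the dictionary-based Q-Table is easy to verify in isolation. The sketch below repeats the definitions from code block 9 so that it runs on its own; the state tuple and the stored value 0.75 are made up for illustration.

```python
# Stand-alone copy of the dictionary-based Q-Table, for illustration only
Q = {}
actions = (0, 1)

def qvalues(state):
    # Unseen (state, action) pairs fall back to the default value 0
    return [Q.get((state, a), 0) for a in actions]

state = (0, 1, -2, -5)     # a discretized observation, such as discretize() returns
print(qvalues(state))      # nothing stored yet, so both entries are the default

Q[(state, 1)] = 0.75       # store a value for action 1 in this state
print(qvalues(state))
print(len(Q))              # only pairs that were written take up memory
```

Because `qvalues` uses `dict.get`, merely looking up an unseen state does not add it to the table, so the dictionary grows only with the states that are actually updated.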
+
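+## Let's start Q-Learning
+
+Now we are ready to teach Peter to balance!
+
+1. First, let's set some hyperparameters: (code block 10)
+
+    ```python
+    # hyperparameters
+    alpha = 0.3
+    gamma = 0.9
+    epsilon = 0.90
+    ```
+
+    Here, `alpha` is the **learning rate**, which defines how much we adjust the current Q-Table values at each step. In the previous lesson we started with 1 and then decreased `alpha` to lower values during training. In this example we keep it constant for simplicity; you can experiment with adjusting `alpha` later.
+
+    `gamma` is the **discount factor**, which shows how much we should prioritize future reward over current reward.
+
+    `epsilon` is the **exploration/exploitation factor**, which determines whether we should prefer exploration or exploitation. In our algorithm, with probability `epsilon` (here, 90% of the time) we select the next action according to the Q-Table values, and in the remaining cases we execute a random action. This allows us to explore areas of the search space that we have never seen before.
+
+    ✅ In terms of balancing, choosing a random action (exploration) acts like a random push in the wrong direction, and the pole has to learn how to recover its balance from those "mistakes".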
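To see how `alpha` and `gamma` enter a single learning step, here is a minimal sketch of the standard Q-Learning update applied to the dictionary-based table. The helper name `q_update` and the two states are hypothetical and chosen only for illustration; the update formula is the usual one and is assumed to match what the lesson's training loop does.

```python
alpha, gamma = 0.3, 0.9
actions = (0, 1)
Q = {}

def qvalues(state):
    return [Q.get((state, a), 0) for a in actions]

def q_update(s, a, reward, next_s):
    """One Q-Learning step: blend the old estimate with the new target."""
    target = reward + gamma * max(qvalues(next_s))
    Q[(s, a)] = (1 - alpha) * Q.get((s, a), 0) + alpha * target
    return Q[(s, a)]

s, ns = (0, 0, 1, 2), (0, 0, 2, 3)   # two made-up discretized states
print(q_update(s, 1, 1.0, ns))        # first visit: 0.7*0 + 0.3*(1 + 0.9*0)
print(q_update(s, 1, 1.0, ns))        # second visit: 0.7*0.3 + 0.3*1
```

With `alpha = 0.3`, each update moves the stored value 30% of the way towards the target, which is why a smaller learning rate makes it harder to overwrite values that were already learned.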
+
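+### Improve the algorithm
+
+We can also make two improvements to the algorithm from the previous lesson:
+
+- **Calculate the average cumulative reward** over a number of simulations. We will print the progress every 5000 iterations, averaging the cumulative reward over that period. This means that if we get more than 195 points, we can consider the problem solved, with even higher quality than required.
+
+- **Calculate the maximum average cumulative result**, `Qmax`, and store the Q-Table corresponding to that result. When you run the training, you will notice that sometimes the average cumulative result starts to drop, and we want to keep the Q-Table values that correspond to the best model observed during training.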
+
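+1. Collect the cumulative reward of each simulation in the `rewards` vector for later plotting. (code block 11)
+
+    ```python
+    def probs(v,eps=1e-4):
+        v = v-v.min()+eps
+        v = v/v.sum()
+        return v
+
+    Qmax = 0
+    cum_rewards = []
+    rewards = []
+    for epoch in range(100000):
+        obs = env.reset()
+        done = False
+        cum_reward=0
+        # == do the simulation ==
+        while not done:
+            s = discretize(obs)
+            if random.random()<epsilon:
+                # exploitation - choose the action according to Q-Table probabilities
+                v = probs(np.array(qvalues(s)))
+                a = random.choices(actions,weights=v)[0]
+            else:
+                # exploration - choose a random action
+                a = np.random.randint(env.action_space.n)
+
+            obs, rew, done, info = env.step(a)
+            cum_reward+=rew
+            ns = discretize(obs)
+            Q[(s,a)] = (1 - alpha) * Q.get((s,a),0) + alpha * (rew + gamma * max(qvalues(ns)))
+        cum_rewards.append(cum_reward)
+        rewards.append(cum_reward)
+        # == Periodically print results and calculate average reward ==
+        if epoch%5000==0:
+            print(f"{epoch}: {np.average(cum_rewards)}, alpha={alpha}, epsilon={epsilon}")
+            if np.average(cum_rewards) > Qmax:
+                Qmax = np.average(cum_rewards)
+                Qbest = Q.copy()  # copy, so later updates to Q do not alter the stored best table
+            cum_rewards=[]
+    ```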
+
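+What you may notice from those results:
+
+- **Close to our goal**. We are very close to achieving the goal of 195 cumulative reward over 100+ consecutive runs of the simulation, or we may have actually achieved it! Even if we get smaller numbers, we still cannot tell, because we average over 5000 runs, while the formal criterion requires only 100 runs.
+
+- **Reward starts to drop**. Sometimes the reward starts to drop, which means that we can "destroy" already learned values in the Q-Table by overwriting them with values that make the situation worse.
+
+This observation is more clearly visible if we plot the training progress.
+
+## Plotting training progress
+
+During training, we collected the cumulative reward value of each iteration in the `rewards` vector. Here is how it looks when we plot it against the iteration number: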
+
+```python
+plt.plot(rewards)
+```
+
+
+
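+From this graph it is not possible to tell anything, because due to the nature of the stochastic training process the length of training sessions varies greatly. To make more sense of this graph, we can calculate the **running average** over a series of experiments, say 100. This can be done conveniently using `np.convolve`: (code block 12)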
+
+```python
+def running_average(x,window):
+ return np.convolve(x,np.ones(window)/window,mode='valid')
+
+plt.plot(running_average(rewards,100))
+```
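To see what `running_average` does to a noisy series, the sketch below applies it to a short made-up reward sequence. The function body is copied from the block above so the example runs on its own.

```python
import numpy as np

def running_average(x, window):
    return np.convolve(x, np.ones(window) / window, mode='valid')

noisy = np.array([10, 30, 10, 30, 10, 30], dtype=float)   # made-up rewards
smooth = running_average(noisy, 2)
print(smooth)        # each pair of neighbours averages out the oscillation
print(len(smooth))   # 'valid' mode returns len(x) - window + 1 points
```

Because of `mode='valid'`, the smoothed curve is shorter than the input by `window - 1` points, which is worth remembering when aligning it with the epoch axis.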
+
+
+
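+## Varying hyperparameters
+
+To make learning more stable, it makes sense to adjust some of our hyperparameters during training. In particular:
+
+- **Learning rate** `alpha`: we may start with values close to 1 and then keep decreasing the parameter. With time, we will have good probability values in the Q-Table, so we should adjust them slightly rather than overwrite them completely with new values.
+
+- **Increase epsilon**: we may want to increase `epsilon` slowly, in order to explore less and exploit more. It probably makes sense to start with a lower value of `epsilon` and move it up to almost 1.
+
+> **Task 1**: Play with the hyperparameter values and see whether you can achieve a higher cumulative reward. Are you getting above 195?
+
+> **Task 2**: To formally solve the problem, you need to get an average reward of 195 across 100 consecutive runs. Measure that during training and make sure that you have formally solved the problem!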
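One possible way to vary `alpha` and `epsilon` during training is a simple linear schedule. This is only a sketch under stated assumptions: the helper `schedule`, the linear shape, and the start and end values are illustrative choices, not the lesson's prescribed settings.

```python
def schedule(epoch, total_epochs, start, end):
    """Linearly interpolate a hyperparameter from `start` to `end` over training."""
    frac = min(max(epoch / total_epochs, 0.0), 1.0)
    return start + (end - start) * frac

total_epochs = 100000
for epoch in (0, 50000, 100000):
    alpha = schedule(epoch, total_epochs, start=1.0, end=0.1)     # learning rate decays
    epsilon = schedule(epoch, total_epochs, start=0.3, end=0.99)  # exploitation grows
    print(f"{epoch}: alpha={alpha:.3f}, epsilon={epsilon:.3f}")
```

In a training loop, the two assignments would be recomputed at the start of each epoch; other shapes, such as exponential decay, are equally reasonable starting points.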
+
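+## Seeing the result in action
+
+It would be interesting to see how the trained model actually behaves. Let's run the simulation, following the same action selection strategy as during training and sampling according to the probability distribution in the Q-Table: (code block 13)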
+
+```python
+obs = env.reset()
+done = False
+while not done:
+ s = discretize(obs)
+ env.render()
+ v = probs(np.array(qvalues(s)))
+ a = random.choices(actions,weights=v)[0]
+ obs,_,done,_ = env.step(a)
+env.close()
+```
+
+You should see something like this:
+
+
+
+---
+
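+## 🚀Challenge
+
+> **Task 3**: Here we used the final copy of the Q-Table, which may not be the best-performing one. Remember that we stored the best-performing Q-Table in the `Qbest` variable! Try the same example with the best-performing Q-Table by copying `Qbest` over to `Q`, and see whether you notice a difference.
+
+> **Task 4**: Here we did not select the best action at each step, but rather sampled from the corresponding probability distribution. Would it make more sense to always select the best action, the one with the highest Q-Table value? This can be done by using the `np.argmax` function to find the action number that corresponds to the highest Q-Table value. Implement this strategy and see whether it improves the balancing.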
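As a possible starting point for Task 4, the sketch below shows greedy action selection with `np.argmax` on a dictionary-based Q-Table. The table contents and the helper name `greedy_action` are made up for illustration; plugging the function into the simulation loop is left as part of the task.

```python
import numpy as np

actions = (0, 1)
# Made-up Q-Table entries for a single state, for illustration only
Qbest = {((0, 0, 1, 2), 0): 0.2, ((0, 0, 1, 2), 1): 0.9}

def greedy_action(table, state):
    """Pick the action with the highest stored Q-value (ties go to the first action)."""
    values = [table.get((state, a), 0) for a in actions]
    return actions[int(np.argmax(values))]

print(greedy_action(Qbest, (0, 0, 1, 2)))   # action 1 has the larger value
print(greedy_action(Qbest, (5, 5, 5, 5)))   # unseen state: all zeros, falls back to action 0
```

Note the tie-breaking behaviour: in states the agent has never visited, a purely greedy policy always picks the first action, which is one reason sampling can behave differently from `argmax`.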
+
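+## [Post-lecture quiz](https://ff-quizzes.netlify.app/en/ml/)
+
+## Assignment
+[Train a Mountain Car](assignment.md)
+
+## Conclusion
+
+We have now learned how to train agents to achieve good results just by providing them with a reward function that defines the desired state of the game, and by giving them an opportunity to explore the search space intelligently. We have successfully applied the Q-Learning algorithm in both discrete and continuous environments, but with discrete actions.
+
+It is also important to study situations where the action space is continuous as well, and where the observation space is much more complex, such as an image of the screen from an Atari game. In those problems we often need more powerful machine learning techniques, such as neural networks, to achieve good results. These more advanced topics are the subject of our forthcoming more advanced AI course.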
+
+---
+
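+**Disclaimer**:
+This document was translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The document in its original language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.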
\ No newline at end of file
diff --git a/translations/zh-CN/8-Reinforcement/2-Gym/assignment.md b/translations/zh-CN/8-Reinforcement/2-Gym/assignment.md
new file mode 100644
index 000000000..cf1633d80
--- /dev/null
+++ b/translations/zh-CN/8-Reinforcement/2-Gym/assignment.md
@@ -0,0 +1,48 @@
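+# Train Mountain Car
+
+[OpenAI Gym](http://gym.openai.com) is designed so that all environments provide the same API, that is, the same methods `reset`, `step` and `render`, and the same abstractions of **action space** and **observation space**. Thus it should be possible to adapt the same reinforcement learning algorithm to different environments with minimal code changes.
+
+## A Mountain Car Environment
+
+The [Mountain Car environment](https://gym.openai.com/envs/MountainCar-v0/) contains a car stuck in a valley:
+
+The goal is to get out of the valley and capture the flag, by doing at each step one of the following actions:
+
+| Value | Meaning |
+|---|---|
+| 0 | Accelerate to the left |
+| 1 | Do not accelerate |
+| 2 | Accelerate to the right |
+
+The main trick of this problem, however, is that the car's engine is not strong enough to climb the mountain in a single pass. Therefore, the only way to succeed is to drive back and forth to build up momentum.
+
+The observation space consists of just two values:
+
+| Num | Observation | Min | Max |
+|-----|--------------|-----|-----|
+| 0 | Car position | -1.2 | 0.6 |
+| 1 | Car velocity | -0.07 | 0.07 |
+
+The reward system for the mountain car is rather tricky:
+
+ * A reward of 0 is awarded if the agent reaches the flag (position = 0.5) on top of the mountain.
+ * A reward of -1 is awarded if the position of the agent is less than 0.5.
+
+The episode terminates if the car position is more than 0.5, or if the episode length is greater than 200.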
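Since the observation bounds are known here, a bin-based discretization is one natural option. The following is a hedged sketch only: the helper name `discretize_mc` and the bin counts (18 and 14) are assumptions chosen for illustration, and finding a resolution that actually trains well remains part of the assignment.

```python
import numpy as np

# Bounds taken from the observation table above
low = np.array([-1.2, -0.07])
high = np.array([0.6, 0.07])
nbins = np.array([18, 14])     # assumed resolution; tune as needed

def discretize_mc(obs):
    """Map (position, velocity) to a tuple of bin indices in [0, nbins - 1]."""
    ratio = (np.asarray(obs, dtype=float) - low) / (high - low)
    idx = np.floor(ratio * nbins).astype(int)
    # Clip so that the upper bound and any out-of-range value land in a valid bin
    return tuple(int(i) for i in np.clip(idx, 0, nbins - 1))

print(discretize_mc([-1.2, -0.07]))    # lowest corner of the observation space
print(discretize_mc([0.6, 0.07]))      # highest corner is clipped into the last bin
print(discretize_mc([-0.55, 0.012]))   # a point near the bottom of the valley
```

The resulting tuples can be used directly as keys of a dictionary-based Q-Table, in the same way as the states in the CartPole lesson.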
+
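+## Instructions
+
+Adapt our reinforcement learning algorithm to solve the mountain car problem. Start with the existing [notebook.ipynb](notebook.ipynb) code, substitute the new environment, change the state discretization functions, and try to make the existing algorithm train with minimal code modifications. Optimize the result by adjusting hyperparameters.
+
+> **Note**: Hyperparameter adjustment is likely to be needed to make the algorithm converge.
+
+## Rubric
+
+| Criteria | Exemplary | Adequate | Needs Improvement |
+| -------- | --------- | -------- | ----------------- |
+|          | The Q-Learning algorithm is successfully adapted from the CartPole example with minimal code modifications, and is able to solve the problem of capturing the flag in under 200 steps. | A new Q-Learning algorithm has been adopted from the Internet, but is well documented; or an existing algorithm was adopted, but does not reach the desired results. | The student was not able to successfully adopt any algorithm, but has made substantial steps towards a solution (implemented state discretization, the Q-Table data structure, etc.). |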
+
+---
+
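+**Disclaimer**:
+This document was translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The document in its original language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.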
\ No newline at end of file
diff --git a/translations/zh-CN/8-Reinforcement/2-Gym/notebook.ipynb b/translations/zh-CN/8-Reinforcement/2-Gym/notebook.ipynb
new file mode 100644
index 000000000..9aa8ff149
--- /dev/null
+++ b/translations/zh-CN/8-Reinforcement/2-Gym/notebook.ipynb
@@ -0,0 +1,394 @@
+{
+ "metadata": {
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.7.4"
+ },
+ "orig_nbformat": 4,
+ "kernelspec": {
+ "name": "python3",
+ "display_name": "Python 3.7.4 64-bit ('base': conda)"
+ },
+ "interpreter": {
+ "hash": "86193a1ab0ba47eac1c69c1756090baa3b420b3eea7d4aafab8b85f8b312f0c5"
+ },
+ "coopTranslator": {
+ "original_hash": "f22f8f3daed4b6d34648d1254763105b",
+ "translation_date": "2025-09-03T20:54:32+00:00",
+ "source_file": "8-Reinforcement/2-Gym/notebook.ipynb",
+ "language_code": "zh"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2,
+ "cells": [
+ {
+ "source": [
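+ "## CartPole Skating\n",
+ "\n",
+ "> **Problem**: If Peter wants to escape from the wolf, he needs to be able to move faster than the wolf. We will see how Peter can learn to skate, and in particular to keep his balance, using Q-Learning.\n",
+ "\n",
+ "First, let's install gym and import the required libraries:\n"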
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#code block 1"
+ ]
+ },
+ {
+ "source": [
+ "## Create a CartPole environment\n"
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "source": [
+ "#code block 2"
+ ],
+ "cell_type": "code",
+ "metadata": {},
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "source": [
+ "To see how the environment works, let's run a short simulation of 100 steps.\n"
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "source": [
+ "#code block 3"
+ ],
+ "cell_type": "code",
+ "metadata": {},
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "source": [
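+ "During the simulation, we need to get observations in order to decide how to act. In fact, the `step` function returns the current observation, the reward, and the `done` flag, which indicates whether the simulation should stop:\n"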
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "source": [
+ "#code block 4"
+ ],
+ "cell_type": "code",
+ "metadata": {},
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "source": [
+ "We can get the minimum and maximum values of those numbers:\n"
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "[-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38]\n[4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38]\n"
+ ]
+ }
+ ],
+ "source": [
+ "#code block 5"
+ ]
+ },
+ {
+ "source": [],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#code block 6"
+ ]
+ },
+ {
+ "source": [
+ "Let's also explore another discretization method, using bins:\n"
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Sample bins for interval (-5,5) with 10 bins\n [-5. -4. -3. -2. -1. 0. 1. 2. 3. 4. 5.]\n"
+ ]
+ }
+ ],
+ "source": [
+ "#code block 7"
+ ]
+ },
+ {
+ "source": [
+ "Let's now run a short simulation and observe those discrete environment values.\n"
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "(0, 0, -2, -2)\n(0, 1, -2, -5)\n(0, 2, -3, -8)\n(0, 3, -5, -11)\n(0, 3, -7, -14)\n(0, 4, -10, -17)\n(0, 3, -14, -15)\n(0, 3, -17, -12)\n(0, 3, -20, -16)\n(0, 4, -23, -19)\n"
+ ]
+ }
+ ],
+ "source": [
+ "#code block 8"
+ ]
+ },
+ {
+ "source": [
+ "## Q-Table structure\n"
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 12,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#code block 9"
+ ]
+ },
+ {
+ "source": [],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 13,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "#code block 10"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 14,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "0: 22.0, alpha=0.3, epsilon=0.9\n",
+ "5000: 70.1384, alpha=0.3, epsilon=0.9\n",
+ "10000: 121.8586, alpha=0.3, epsilon=0.9\n",
+ "15000: 149.6368, alpha=0.3, epsilon=0.9\n",
+ "20000: 168.2782, alpha=0.3, epsilon=0.9\n",
+ "25000: 196.7356, alpha=0.3, epsilon=0.9\n",
+ "30000: 220.7614, alpha=0.3, epsilon=0.9\n",
+ "35000: 233.2138, alpha=0.3, epsilon=0.9\n",
+ "40000: 248.22, alpha=0.3, epsilon=0.9\n",
+ "45000: 264.636, alpha=0.3, epsilon=0.9\n",
+ "50000: 276.926, alpha=0.3, epsilon=0.9\n",
+ "55000: 277.9438, alpha=0.3, epsilon=0.9\n",
+ "60000: 248.881, alpha=0.3, epsilon=0.9\n",
+ "65000: 272.529, alpha=0.3, epsilon=0.9\n",
+ "70000: 281.7972, alpha=0.3, epsilon=0.9\n",
+ "75000: 284.2844, alpha=0.3, epsilon=0.9\n",
+ "80000: 269.667, alpha=0.3, epsilon=0.9\n",
+ "85000: 273.8652, alpha=0.3, epsilon=0.9\n",
+ "90000: 278.2466, alpha=0.3, epsilon=0.9\n",
+ "95000: 269.1736, alpha=0.3, epsilon=0.9\n"
+ ]
+ }
+ ],
+ "source": [
+ "#code block 11"
+ ]
+ },
+ {
+ "source": [],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 20,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "[]"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 20
+ },
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": "",
+ "image/svg+xml": "",
+ "image/png": "\n"
+ },
+ "metadata": {
+ "needs_background": "light"
+ }
+ }
+ ],
+ "source": [
+ "plt.plot(rewards)"
+ ]
+ },
+ {
+ "source": [
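+ "From this graph it is not possible to tell anything, because due to the nature of the stochastic training process the length of training sessions varies greatly. To make more sense of this graph, we can calculate the **running average** over a series of experiments, say 100. This can be done conveniently using `np.convolve`:\n"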
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 22,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "[]"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 22
+ },
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": "",
+ "image/svg+xml": "",
+ "image/png": "\n"
+ },
+ "metadata": {
+ "needs_background": "light"
+ }
+ }
+ ],
+ "source": [
+ "#code block 12"
+ ]
+ },
+ {
+ "source": [
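+ "## Varying hyperparameters and seeing the result in action\n",
+ "\n",
+ "Now it would be interesting to see how the trained model actually behaves. Let's run the simulation, following the same action selection strategy as during training: sampling according to the probability distribution in the Q-Table:\n"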
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 23,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# code block 13"
+ ]
+ },
+ {
+ "source": [
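+ "## Saving the result to an animated GIF\n",
+ "\n",
+ "If you want to impress your friends, you may want to send them an animated GIF of the balancing pole. To do this, we can call `env.render` to produce image frames, and then save those frames as an animated GIF using the PIL library:\n"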
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 26,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "360\n"
+ ]
+ }
+ ],
+ "source": [
+ "from PIL import Image\n",
+ "obs = env.reset()\n",
+ "done = False\n",
+ "i=0\n",
+ "ims = []\n",
+ "while not done:\n",
+ " s = discretize(obs)\n",
+ " img=env.render(mode='rgb_array')\n",
+ " ims.append(Image.fromarray(img))\n",
+ " v = probs(np.array([Qbest.get((s,a),0) for a in actions]))\n",
+ " a = random.choices(actions,weights=v)[0]\n",
+ " obs,_,done,_ = env.step(a)\n",
+ " i+=1\n",
+ "env.close()\n",
+ "ims[0].save('images/cartpole-balance.gif',save_all=True,append_images=ims[1::2],loop=0,duration=5)\n",
+ "print(i)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
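+ "\n---\n\n**Disclaimer**: \nThis document was translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The document in its original language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.\n"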
+ ]
+ }
+ ]
+}
\ No newline at end of file
diff --git a/translations/zh-CN/8-Reinforcement/2-Gym/solution/Julia/README.md b/translations/zh-CN/8-Reinforcement/2-Gym/solution/Julia/README.md
new file mode 100644
index 000000000..e2fb46232
--- /dev/null
+++ b/translations/zh-CN/8-Reinforcement/2-Gym/solution/Julia/README.md
@@ -0,0 +1,6 @@
+
+
+---
+
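+**Disclaimer**:
+This document was translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The document in its original language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.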
\ No newline at end of file
diff --git a/translations/zh-CN/8-Reinforcement/2-Gym/solution/R/README.md b/translations/zh-CN/8-Reinforcement/2-Gym/solution/R/README.md
new file mode 100644
index 000000000..61677dbd3
--- /dev/null
+++ b/translations/zh-CN/8-Reinforcement/2-Gym/solution/R/README.md
@@ -0,0 +1,6 @@
+This is a temporary placeholder
+
+---
+
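+**Disclaimer**:
+This document was translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The document in its original language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.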
\ No newline at end of file
diff --git a/translations/zh-CN/8-Reinforcement/2-Gym/solution/notebook.ipynb b/translations/zh-CN/8-Reinforcement/2-Gym/solution/notebook.ipynb
new file mode 100644
index 000000000..08632234b
--- /dev/null
+++ b/translations/zh-CN/8-Reinforcement/2-Gym/solution/notebook.ipynb
@@ -0,0 +1,526 @@
+{
+ "metadata": {
+ "language_info": {
+ "codemirror_mode": {
+ "name": "ipython",
+ "version": 3
+ },
+ "file_extension": ".py",
+ "mimetype": "text/x-python",
+ "name": "python",
+ "nbconvert_exporter": "python",
+ "pygments_lexer": "ipython3",
+ "version": "3.7.0"
+ },
+ "orig_nbformat": 4,
+ "kernelspec": {
+ "name": "python3",
+ "display_name": "Python 3.7.0 64-bit ('3.7')"
+ },
+ "interpreter": {
+ "hash": "70b38d7a306a849643e446cd70466270a13445e5987dfa1344ef2b127438fa4d"
+ },
+ "coopTranslator": {
+ "original_hash": "5c0e485e58d63c506f1791c4dbf990ce",
+ "translation_date": "2025-09-03T20:56:48+00:00",
+ "source_file": "8-Reinforcement/2-Gym/solution/notebook.ipynb",
+ "language_code": "zh"
+ }
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2,
+ "cells": [
+ {
+ "source": [
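+ "## CartPole Skating\n",
+ "\n",
+ "> **Problem**: If Peter wants to escape from the wolf, he needs to be able to move faster than the wolf. We will see how Peter can learn to skate, and in particular to keep his balance, using Q-Learning.\n",
+ "\n",
+ "First, let's install gym and import the required libraries:\n"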
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 1,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Requirement already satisfied: gym in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (0.18.3)\n",
+ "Requirement already satisfied: Pillow<=8.2.0 in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from gym) (7.0.0)\n",
+ "Requirement already satisfied: scipy in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from gym) (1.4.1)\n",
+ "Requirement already satisfied: numpy>=1.10.4 in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from gym) (1.19.2)\n",
+ "Requirement already satisfied: cloudpickle<1.7.0,>=1.2.0 in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from gym) (1.6.0)\n",
+ "Requirement already satisfied: pyglet<=1.5.15,>=1.4.0 in /Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages (from gym) (1.5.15)\n",
+ "\u001b[33mWARNING: You are using pip version 20.2.3; however, version 21.1.2 is available.\n",
+ "You should consider upgrading via the '/Library/Frameworks/Python.framework/Versions/3.7/bin/python3.7 -m pip install --upgrade pip' command.\u001b[0m\n"
+ ]
+ }
+ ],
+ "source": [
+ "import sys\n",
+ "!pip install gym \n",
+ "\n",
+ "import gym\n",
+ "import matplotlib.pyplot as plt\n",
+ "import numpy as np\n",
+ "import random"
+ ]
+ },
+ {
+ "source": [
+ "## Create a CartPole environment\n"
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "source": [
+ "env = gym.make(\"CartPole-v1\")\n",
+ "print(env.action_space)\n",
+ "print(env.observation_space)\n",
+ "print(env.action_space.sample())"
+ ],
+ "cell_type": "code",
+ "metadata": {},
+ "execution_count": 2,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Discrete(2)\nBox(-3.4028234663852886e+38, 3.4028234663852886e+38, (4,), float32)\n0\n"
+ ]
+ }
+ ]
+ },
+ {
+ "source": [
+ "To see how the environment works, let's run a short simulation of 100 steps.\n"
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "source": [
+ "env.reset()\n",
+ "\n",
+ "for i in range(100):\n",
+ " env.render()\n",
+ " env.step(env.action_space.sample())\n",
+ "env.close()"
+ ],
+ "cell_type": "code",
+ "metadata": {},
+ "execution_count": 3,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stderr",
+ "text": [
+ "/Library/Frameworks/Python.framework/Versions/3.7/lib/python3.7/site-packages/gym/logger.py:30: UserWarning: \u001b[33mWARN: You are calling 'step()' even though this environment has already returned done = True. You should always call 'reset()' once you receive 'done = True' -- any further steps are undefined behavior.\u001b[0m\n warnings.warn(colorize('%s: %s'%('WARN', msg % args), 'yellow'))\n"
+ ]
+ }
+ ]
+ },
+ {
+ "source": [
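+ "During the simulation, we need to get observations in order to decide how to act. In fact, the `step` function returns the current observation, the reward, and the `done` flag, which indicates whether the simulation should stop:\n"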
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "source": [
+ "env.reset()\n",
+ "\n",
+ "done = False\n",
+ "while not done:\n",
+ " env.render()\n",
+ " obs, rew, done, info = env.step(env.action_space.sample())\n",
+ " print(f\"{obs} -> {rew}\")\n",
+ "env.close()"
+ ],
+ "cell_type": "code",
+ "metadata": {},
+ "execution_count": 4,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "[ 0.03044442 -0.19543914 -0.04496216 0.28125618] -> 1.0\n",
+ "[ 0.02653564 -0.38989186 -0.03933704 0.55942606] -> 1.0\n",
+ "[ 0.0187378 -0.19424049 -0.02814852 0.25461393] -> 1.0\n",
+ "[ 0.01485299 -0.38894946 -0.02305624 0.53828712] -> 1.0\n",
+ "[ 0.007074 -0.19351108 -0.0122905 0.23842953] -> 1.0\n",
+ "[ 0.00320378 0.00178427 -0.00752191 -0.05810469] -> 1.0\n",
+ "[ 0.00323946 0.19701326 -0.008684 -0.35315131] -> 1.0\n",
+ "[ 0.00717973 0.00201587 -0.01574703 -0.06321931] -> 1.0\n",
+ "[ 0.00722005 0.19736001 -0.01701141 -0.36082863] -> 1.0\n",
+ "[ 0.01116725 0.39271958 -0.02422798 -0.65882671] -> 1.0\n",
+ "[ 0.01902164 0.19794307 -0.03740452 -0.37387001] -> 1.0\n",
+ "[ 0.0229805 0.39357584 -0.04488192 -0.67810827] -> 1.0\n",
+ "[ 0.03085202 0.58929164 -0.05844408 -0.98457719] -> 1.0\n",
+ "[ 0.04263785 0.78514572 -0.07813563 -1.2950295 ] -> 1.0\n",
+ "[ 0.05834076 0.98116859 -0.10403622 -1.61111521] -> 1.0\n",
+ "[ 0.07796413 0.78741784 -0.13625852 -1.35259196] -> 1.0\n",
+ "[ 0.09371249 0.98396202 -0.16331036 -1.68461179] -> 1.0\n",
+ "[ 0.11339173 0.79106371 -0.1970026 -1.44691436] -> 1.0\n",
+ "[ 0.12921301 0.59883361 -0.22594088 -1.22169133] -> 1.0\n"
+ ]
+ }
+ ]
+ },
+ {
+ "source": [
+ "We can get the minimum and maximum values of those numbers:\n"
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 5,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "[-4.8000002e+00 -3.4028235e+38 -4.1887903e-01 -3.4028235e+38]\n[4.8000002e+00 3.4028235e+38 4.1887903e-01 3.4028235e+38]\n"
+ ]
+ }
+ ],
+ "source": [
+ "print(env.observation_space.low)\n",
+ "print(env.observation_space.high)"
+ ]
+ },
+ {
+ "source": [],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 6,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "def discretize(x):\n",
+ "    return tuple((x/np.array([0.25, 0.25, 0.01, 0.1])).astype(int))"
+ ]
+ },
+ {
+ "source": [
+ "Let's also explore another discretization method, using bins:\n"
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 7,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Sample bins for interval (-5,5) with 10 bins\n [-5. -4. -3. -2. -1. 0. 1. 2. 3. 4. 5.]\n"
+ ]
+ }
+ ],
+ "source": [
+ "def create_bins(i,num):\n",
+ " return np.arange(num+1)*(i[1]-i[0])/num+i[0]\n",
+ "\n",
+ "print(\"Sample bins for interval (-5,5) with 10 bins\\n\",create_bins((-5,5),10))\n",
+ "\n",
+ "ints = [(-5,5),(-2,2),(-0.5,0.5),(-2,2)] # intervals of values for each parameter\n",
+ "nbins = [20,20,10,10] # number of bins for each parameter\n",
+ "bins = [create_bins(ints[i],nbins[i]) for i in range(4)]\n",
+ "\n",
+ "def discretize_bins(x):\n",
+ " return tuple(np.digitize(x[i],bins[i]) for i in range(4))"
+ ]
+ },
+ {
+ "source": [
+ "Let's now run a short simulation and observe those discrete environment values.\n"
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 8,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "(0, 0, -1, -3)\n(0, 0, -2, 0)\n(0, 0, -2, -3)\n(0, 1, -3, -6)\n(0, 2, -4, -9)\n(0, 3, -6, -12)\n(0, 2, -8, -9)\n(0, 3, -10, -13)\n(0, 4, -13, -16)\n(0, 4, -16, -19)\n(0, 4, -20, -17)\n(0, 4, -24, -20)\n"
+ ]
+ }
+ ],
+ "source": [
+ "env.reset()\n",
+ "\n",
+ "done = False\n",
+ "while not done:\n",
+ " #env.render()\n",
+ " obs, rew, done, info = env.step(env.action_space.sample())\n",
+ " #print(discretize_bins(obs))\n",
+ " print(discretize(obs))\n",
+ "env.close()"
+ ]
+ },
+ {
+ "source": [
+ "## Q-Table structure\n"
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 9,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "Q = {}\n",
+ "actions = (0,1)\n",
+ "\n",
+ "def qvalues(state):\n",
+ " return [Q.get((state,a),0) for a in actions]"
+ ]
+ },
+ {
+ "source": [],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 10,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "# hyperparameters\n",
+ "alpha = 0.3\n",
+ "gamma = 0.9\n",
+ "epsilon = 0.90"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 11,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "0: 108.0, alpha=0.3, epsilon=0.9\n"
+ ]
+ }
+ ],
+ "source": [
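+ "def probs(v,eps=1e-4):\n",
+ "    v = v-v.min()+eps\n",
+ "    v = v/v.sum()\n",
+ "    return v\n",
+ "\n",
+ "Qmax = 0\n",
+ "cum_rewards = []\n",
+ "rewards = []\n",
+ "for epoch in range(100000):\n",
+ "    obs = env.reset()\n",
+ "    done = False\n",
+ "    cum_reward=0\n",
+ "    # == do the simulation ==\n",
+ "    while not done:\n",
+ "        s = discretize(obs)\n",
+ "        # NOTE: everything from this `if` to the `> Qmax` check was lost in this copy and is reconstructed from the surrounding cells\n",
+ "        if random.random()<epsilon:\n",
+ "            # exploitation: sample an action according to Q-Table probabilities\n",
+ "            v = probs(np.array(qvalues(s)))\n",
+ "            a = random.choices(actions,weights=v)[0]\n",
+ "        else:\n",
+ "            # exploration: pick a random action\n",
+ "            a = np.random.randint(env.action_space.n)\n",
+ "        obs, rew, done, info = env.step(a)\n",
+ "        cum_reward+=rew\n",
+ "        ns = discretize(obs)\n",
+ "        Q[(s,a)] = (1 - alpha) * Q.get((s,a),0) + alpha * (rew + gamma * max(qvalues(ns)))\n",
+ "    cum_rewards.append(cum_reward)\n",
+ "    rewards.append(cum_reward)\n",
+ "    # == periodically print results and track the best average reward ==\n",
+ "    if epoch%5000==0:\n",
+ "        print(f\"{epoch}: {np.average(cum_rewards)}, alpha={alpha}, epsilon={epsilon}\")\n",
+ "        if np.average(cum_rewards) > Qmax:\n",
+ "            Qmax = np.average(cum_rewards)\n",
+ "            Qbest = dict(Q)  # copy; a plain assignment would alias Q and keep changing\n",
+ "        cum_rewards=[]"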
+ ]
+ },
+ {
+ "source": [],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 20,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "[]"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 20
+ },
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": "",
+ "image/svg+xml": "",
+ "image/png": "\n"
+ },
+ "metadata": {
+ "needs_background": "light"
+ }
+ }
+ ],
+ "source": [
+ "plt.plot(rewards)"
+ ]
+ },
+ {
+ "source": [
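+ "Nothing can be concluded from this graph, because the stochastic nature of the training process makes the length of training sessions vary greatly. To make more sense of the graph, we can calculate the **running average** over a series of experiments, say 100. This can be done conveniently using `np.convolve`:\n"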
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 22,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "[]"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 22
+ },
+ {
+ "output_type": "display_data",
+ "data": {
+ "text/plain": "",
+ "image/svg+xml": "",
+ "image/png": "\n"
+ },
+ "metadata": {
+ "needs_background": "light"
+ }
+ }
+ ],
+ "source": [
+ "def running_average(x,window):\n",
+ " return np.convolve(x,np.ones(window)/window,mode='valid')\n",
+ "\n",
+ "plt.plot(running_average(rewards,100))"
+ ]
+ },
+ {
+ "source": [
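+ "## Varying hyperparameters and seeing the result in action\n",
+ "\n",
+ "Now we can see how the trained model actually behaves. Let's run the simulation with the same action selection strategy as during training: sampling according to the probability distribution in the Q-Table:\n"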
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 23,
+ "metadata": {},
+ "outputs": [],
+ "source": [
+ "obs = env.reset()\n",
+ "done = False\n",
+ "while not done:\n",
+ " s = discretize(obs)\n",
+ " env.render()\n",
+ " v = probs(np.array(qvalues(s)))\n",
+ " a = random.choices(actions,weights=v)[0]\n",
+ " obs,_,done,_ = env.step(a)\n",
+ "env.close()"
+ ]
+ },
+ {
+ "source": [
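+ "## Saving the result to an animated GIF\n",
+ "\n",
+ "If you want to impress your friends, you may want to send them an animated GIF of the balancing pole. To do this, we can call `env.render` to produce image frames, and then save those frames as an animated GIF using the PIL library:\n"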
+ ],
+ "cell_type": "markdown",
+ "metadata": {}
+ },
+ {
+ "cell_type": "code",
+ "execution_count": 26,
+ "metadata": {},
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "360\n"
+ ]
+ }
+ ],
+ "source": [
+ "from PIL import Image\n",
+ "obs = env.reset()\n",
+ "done = False\n",
+ "i=0\n",
+ "ims = []\n",
+ "while not done:\n",
+ " s = discretize(obs)\n",
+ " img=env.render(mode='rgb_array')\n",
+ " ims.append(Image.fromarray(img))\n",
+ " v = probs(np.array([Qbest.get((s,a),0) for a in actions]))\n",
+ " a = random.choices(actions,weights=v)[0]\n",
+ " obs,_,done,_ = env.step(a)\n",
+ " i+=1\n",
+ "env.close()\n",
+ "ims[0].save('images/cartpole-balance.gif',save_all=True,append_images=ims[1::2],loop=0,duration=5)\n",
+ "print(i)"
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
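+ "\n---\n\n**Disclaimer**: \nThis document has been translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The original document in its native language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.\n"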
+ ]
+ }
+ ]
+}
\ No newline at end of file
diff --git a/translations/zh-CN/8-Reinforcement/README.md b/translations/zh-CN/8-Reinforcement/README.md
new file mode 100644
index 000000000..f13cbcfa9
--- /dev/null
+++ b/translations/zh-CN/8-Reinforcement/README.md
@@ -0,0 +1,58 @@
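+# Introduction to reinforcement learning
+
+Reinforcement learning (RL) is regarded as one of the basic machine learning paradigms, alongside supervised learning and unsupervised learning. RL is all about decisions: making the right decisions, or at least learning from them.
+
+Imagine you have a simulated environment, such as the stock market. What happens if you impose a given regulation? Does it have a positive or a negative effect? If something negative happens, you need to take this _negative reinforcement_, learn from it, and change course. If the outcome is positive, you need to build on that _positive reinforcement_.
+
+
+
+> Peter and his friends need to escape the hungry wolf! Image by [Jen Looper](https://twitter.com/jenlooper)
+
+## Regional topic: Peter and the Wolf (Russia)
+
+[Peter and the Wolf](https://en.wikipedia.org/wiki/Peter_and_the_Wolf) is a musical fairy tale written by the Russian composer [Sergei Prokofiev](https://en.wikipedia.org/wiki/Sergei_Prokofiev). It tells the story of the young pioneer Peter, who bravely leaves his house and goes to the forest clearing to chase the wolf. In this section, we will train machine learning algorithms that help Peter:
+
+- **Explore** the surrounding area and build an optimal navigation map
+- **Learn** how to use a skateboard and balance on it, in order to move around faster
+
+[](https://www.youtube.com/watch?v=Fmi5zHg4QSM)
+
+> 🎥 Click the image above to listen to Peter and the Wolf by Prokofiev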
+
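+## Reinforcement learning
+
+In previous sections, you have seen two kinds of machine learning problems:
+
+- **Supervised learning**, where we have datasets that provide sample solutions to the problem. [Classification](../4-Classification/README.md) and [regression](../2-Regression/README.md) are supervised learning tasks.
+- **Unsupervised learning**, where we do not have labeled training data. The main example of unsupervised learning is [Clustering](../5-Clustering/README.md).
+
+In this section, we will introduce a new type of learning problem that does not require labeled training data. There are several types of such problems:
+
+- **[Semi-supervised learning](https://wikipedia.org/wiki/Semi-supervised_learning)**, where we have a lot of unlabeled data that can be used to pre-train the model.
+- **[Reinforcement learning](https://wikipedia.org/wiki/Reinforcement_learning)**, in which an agent learns how to behave by performing experiments in some simulated environment.
+
+### Example - computer game
+
+Suppose you want to teach a computer to play a game, such as chess or [Super Mario](https://wikipedia.org/wiki/Super_Mario). For the computer to play, it has to predict which move to make in each game state. This may look like a classification problem, but it is not, because we do not have a dataset of states paired with the corresponding actions. We may have some data, such as existing chess matches or recordings of people playing Super Mario, but that data is unlikely to cover a large enough number of possible states.
+
+Instead of looking for existing game data, **reinforcement learning** (RL) is based on the idea of *making the computer play the game many times and observing the result*. To apply reinforcement learning, we therefore need two things:
+
+- **An environment** and **a simulator** that allow us to play the game many times. The simulator defines all the game rules as well as the possible states and actions.
+
+- **A reward function**, which tells us how well we did during each move or game.
+
+The main difference between RL and other types of machine learning is that in RL we typically do not know whether we have won or lost until the game is over. We cannot say whether a single move is good or bad on its own, because we only receive a reward at the end of the game. Our goal is to design algorithms that let us train a model under these uncertain conditions. We will learn about one RL algorithm called **Q-learning**.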
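+
+As a rough preview of what Q-learning does (the lessons listed below give the full treatment), the tabular update is commonly written as shown here. The symbols are assumptions introduced for illustration and are not defined in this overview: $s$ and $a$ are the current state and action, $r$ is the reward received, $s'$ is the next state, $\alpha$ is the learning rate, and $\gamma$ is the discount factor.
+
+```latex
+Q(s, a) \leftarrow (1 - \alpha)\, Q(s, a) + \alpha \left( r + \gamma \max_{a'} Q(s', a') \right)
+```
+
+Because the term $\gamma \max_{a'} Q(s', a')$ carries value back from later states, a reward that only arrives at the end of a game can still gradually inform the estimates for earlier moves.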
+
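+## Lessons
+
+1. [Introduction to reinforcement learning and Q-Learning](1-QLearning/README.md)
+2. [Using a Gym simulation environment](2-Gym/README.md)
+
+## Credits
+
+"Introduction to Reinforcement Learning" was written with ❤️ by [Dmitry Soshnikov](http://soshnikov.com)
+
+---
+
+**Disclaimer**:
+This document has been translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The original document in its native language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.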
\ No newline at end of file
diff --git a/translations/zh-CN/9-Real-World/1-Applications/README.md b/translations/zh-CN/9-Real-World/1-Applications/README.md
new file mode 100644
index 000000000..53f79ed03
--- /dev/null
+++ b/translations/zh-CN/9-Real-World/1-Applications/README.md
@@ -0,0 +1,150 @@
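+# Postscript: Machine learning in the real world
+
+
+> Sketchnote by [Tomomi Imura](https://www.twitter.com/girlie_mac)
+
+In this curriculum, you have learned many ways to prepare data for training and to create machine learning models. You built a series of classic regression, clustering, classification, natural language processing, and time series models. Congratulations! Now you might be wondering what it is all for: what are the real-world applications of these models?
+
+Although AI driven by deep learning has attracted a lot of interest in industry, classical machine learning models still have valuable applications. You may even be using some of these applications today! In this lesson, you will explore how eight different industries and subject-matter domains use these models to make their applications more performant, reliable, intelligent, and valuable to users.
+
+## [Pre-lecture quiz](https://ff-quizzes.netlify.app/en/ml/)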
+
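+## 💰 Finance
+
+The finance sector offers many opportunities for machine learning. Many problems in this area lend themselves to being modeled and solved with ML.
+
+### Credit card fraud detection
+
+We learned about [k-means clustering](../../5-Clustering/2-K-Means/README.md) earlier in the course, but how can it be used to solve problems related to credit card fraud?
+
+K-means clustering comes in handy in a credit card fraud detection technique called **outlier detection**. Outliers, or observations that deviate from the rest of a dataset, can tell us whether a credit card is being used normally or whether something unusual is going on. As described in the paper linked below, you can group credit card data with a k-means clustering algorithm and assign each transaction to a cluster based on how much of an outlier it appears to be. You can then evaluate the riskiest clusters to separate fraudulent from legitimate transactions.
+[Reference](https://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.680.1195&rep=rep1&type=pdf)
+
+### Wealth management
+
+In wealth management, an individual or firm handles investments on behalf of clients. Their job is to sustain and grow wealth over the long term, so it is essential to choose investments that perform well.
+
+One way to evaluate how a particular investment performs is statistical regression. [Linear regression](../../2-Regression/1-Tools/README.md) is a valuable tool for understanding how a fund performs relative to some benchmark. We can also infer whether the regression results are statistically significant and how much they would affect a client's investments. You could extend the analysis further with multiple regression, which takes additional risk factors into account. The paper below shows how regression can be used to evaluate the performance of a specific fund.
+[Reference](http://www.brightwoodventures.com/evaluating-fund-performance-using-regression/)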
+
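+## 🎓 Education
+
+Education is another very interesting area where ML can be applied. There are interesting problems to tackle, such as detecting cheating on tests or essays, or managing bias, intentional or not, in the grading process.
+
+### Predicting student behavior
+
+[Coursera](https://coursera.com), an online open course provider, discusses many of its engineering decisions on its tech blog. In this case study, they plotted a regression line to explore any correlation between a low NPS (Net Promoter Score) rating and course retention or drop-off.
+[Reference](https://medium.com/coursera-engineering/controlled-regression-quantifying-the-impact-of-course-quality-on-learner-retention-31f956bd592a)
+
+### Mitigating bias
+
+[Grammarly](https://grammarly.com), a writing assistant that checks for spelling and grammar errors, uses sophisticated [natural language processing systems](../../6-NLP/README.md) throughout its products. They published an interesting case study on their tech blog about how they dealt with gender bias in machine learning, a topic you learned about in our [introductory fairness lesson](../../1-Introduction/3-fairness/README.md).
+[Reference](https://www.grammarly.com/blog/engineering/mitigating-gender-bias-in-autocorrect/)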
+
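+## 👜 Retail
+
+The retail sector can benefit from ML in everything from creating a better customer journey to stocking inventory in an optimal way.
+
+### Personalizing the customer journey
+
+At Wayfair, a company that sells home goods such as furniture, helping customers find the right products for their taste and needs is paramount. In this article, engineers from the company describe how they use ML and NLP to "surface the right results for customers". Notably, their Query Intent Engine uses entity extraction, classifier training, asset and opinion extraction, and sentiment tagging on customer reviews. This is a classic use case for NLP in online retail.
+[Reference](https://www.aboutwayfair.com/tech-innovation/how-we-use-machine-learning-and-natural-language-processing-to-empower-search)
+
+### Inventory management
+
+Innovative companies like [StitchFix](https://stitchfix.com), a box service that ships clothing to consumers, rely heavily on ML for recommendations and inventory management. Their styling teams work closely with their merchandising teams: "one of our data scientists tinkered with a genetic algorithm and applied it to apparel to predict what would be a successful piece of clothing that doesn't exist today. We brought that to the merchandise team and now they can use that as a tool."
+[Reference](https://www.zdnet.com/article/how-stitch-fix-uses-machine-learning-to-master-the-science-of-styling/)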
+
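+## 🏥 Health Care
+
+The health care sector can use ML to optimize research tasks as well as logistical problems such as managing patient readmissions or stopping diseases from spreading.
+
+### Managing clinical trials
+
+Toxicity in clinical trials is a major concern for drug makers. How much toxicity is tolerable? In this study, analyzing various clinical trial methods led to a new approach for predicting the odds of clinical trial outcomes. Specifically, the authors used random forest to produce a [classifier](../../4-Classification/README.md) that can distinguish between groups of drugs.
+[Reference](https://www.sciencedirect.com/science/article/pii/S2451945616302914)
+
+### Hospital readmission management
+
+Hospital care is costly, especially when patients have to be readmitted. This paper discusses a company that uses ML to predict readmission potential with [clustering](../../5-Clustering/README.md) algorithms. These clusters help analysts to "discover groups of readmissions that may share a common cause".
+[Reference](https://healthmanagement.org/c/healthmanagement/issuearticle/hospital-readmissions-and-machine-learning)
+
+### Disease management
+
+The recent pandemic has highlighted the ways machine learning can help stop the spread of disease. In this article, you will recognize the use of ARIMA, logistic curves, linear regression, and SARIMA. "This work is an attempt to calculate the rate of spread of this virus and thus to predict the deaths, recoveries, and confirmed cases, so that it may help us to prepare better and survive."
+[Reference](https://www.ncbi.nlm.nih.gov/pmc/articles/PMC7979218/)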
+
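+## 🌲 Ecology and Green Tech
+
+Nature and ecology consist of many sensitive systems in which the interplay between animals and nature is especially important. It is important to measure these systems accurately and to act appropriately when something happens, such as a forest fire or a drop in an animal population.
+
+### Forest management
+
+You learned about [Reinforcement Learning](../../8-Reinforcement/README.md) in previous lessons. It can be very useful when trying to predict patterns in nature. In particular, it can be used to track ecological problems such as forest fires and the spread of invasive species. In Canada, a group of researchers used Reinforcement Learning to build forest wildfire dynamics models from satellite images. Using an innovative "spatially spreading process (SSP)", they treated a forest fire as "the agent at any cell in the landscape." "The set of actions the fire can take from a location at any point in time includes spreading north, south, east, or west or not spreading."
+
+This approach inverts the usual RL setup, because the dynamics of the corresponding Markov Decision Process (MDP) are a known function for immediate wildfire spread. Read more about the classic algorithms used by this group at the link below.
+[Reference](https://www.frontiersin.org/articles/10.3389/fict.2018.00006/full)
+
+### Motion sensing of animals
+
+While deep learning has revolutionized the visual tracking of animal movements (you can build your own [polar bear tracker](https://docs.microsoft.com/learn/modules/build-ml-model-with-azure-stream-analytics/?WT.mc_id=academic-77952-leestott) here), classic ML still has a place in this task.
+
+Sensors and IoT devices that track the movements of farm animals make use of this kind of visual processing, but more basic ML techniques are useful for preprocessing the data. For example, in this paper, sheep postures were monitored and analyzed using various classifier algorithms. You might recognize the ROC curve on page 335.
+[Reference](https://druckhaus-hofmann.de/gallery/31-wj-feb-2020.pdf)
+
+### ⚡️ Energy Management
+
+In our lessons on [time series forecasting](../../7-TimeSeries/README.md), we introduced the idea of smart parking meters that generate revenue for a town based on an understanding of supply and demand. This article discusses in detail how clustering, regression, and time series forecasting were combined to help predict future energy use in Ireland, based on smart metering.
+[Reference](https://www-cdn.knime.com/sites/default/files/inline-images/knime_bigdata_energy_timeseries_whitepaper.pdf)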
+
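+## 💼 Insurance
+
+The insurance sector is another area that uses ML to construct and optimize viable financial and actuarial models.
+
+### Volatility Management
+
+MetLife, a life insurance provider, is open about the way it analyzes and mitigates volatility in its financial models. In this article you will find visualizations of binary and ordinal classification, as well as forecasting visualizations.
+[Reference](https://investments.metlife.com/content/dam/metlifecom/us/investments/insights/research-topics/macro-strategy/pdf/MetLifeInvestmentManagement_MachineLearnedRanking_070920.pdf)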
+
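+## 🎨 Arts, Culture, and Literature
+
+In the arts, for example in journalism, there are many interesting problems. Detecting fake news is a huge challenge, because it has been shown to influence people's opinions and even to topple democracies. Museums can also benefit from ML in everything from finding links between artifacts to resource planning.
+
+### Fake news detection
+
+Detecting fake news has become a game of cat and mouse in today's media. In this article, researchers suggest testing a system that combines several of the ML techniques we have studied and deploying the best model: "This system is based on natural language processing to extract features from the data and then these features are used for the training of machine learning classifiers such as Naive Bayes, Support Vector Machine (SVM), Random Forest (RF), Stochastic Gradient Descent (SGD), and Logistic Regression (LR)."
+[Reference](https://www.irjet.net/archives/V7/i6/IRJET-V7I6688.pdf)
+
+This article shows how combining different ML domains can produce interesting results that help stop fake news from spreading and causing real damage; in this case, the impetus was the spread of rumors about COVID treatments that incited violence.
+
+### Museum ML
+
+Museums are at the cusp of an AI revolution in which cataloging and digitizing collections and finding links between artifacts become easier as technology advances. Projects such as [In Codice Ratio](https://www.sciencedirect.com/science/article/abs/pii/S0306457321001035#:~:text=1.,studies%20over%20large%20historical%20sources.) are helping to unlock hard-to-access collections such as the Vatican Archives. The business side of museums benefits from ML models as well.
+
+For example, the Art Institute of Chicago built models to predict what audiences are interested in and when they will attend exhibitions. The goal is to create an individualized and optimized experience each time a visitor comes to the museum. "During fiscal 2017, the model predicted attendance and admissions within 1 percent accuracy," says Andrew Simnick, senior vice president at the Art Institute.
+[Reference](https://www.chicagobusiness.com/article/20180518/ISSUE01/180519840/art-institute-of-chicago-uses-data-to-make-exhibit-choices)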
+
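+## 🏷 Marketing
+
+### Customer segmentation
+
+The most effective marketing strategies target customers in different ways based on various groupings. This article discusses how clustering algorithms can support differentiated marketing. Differentiated marketing helps companies improve brand recognition, reach more customers, and make more money.
+[Reference](https://ai.inqline.com/machine-learning-for-marketing-customer-segmentation/)
+
+## 🚀 Challenge
+
+Identify another sector that benefits from some of the techniques you learned in this curriculum, and discover how it uses ML.
+## [Post-lecture quiz](https://ff-quizzes.netlify.app/en/ml/)
+
+## Review & Self Study
+
+The Wayfair data science team has made several interesting videos about how they use ML at their company. They are worth [taking a look at](https://www.youtube.com/channel/UCe2PjkQXqOuwkW1gw6Ameuw/videos)!
+
+## Assignment
+
+[A ML scavenger hunt](assignment.md)
+
+---
+
+**Disclaimer**:
+This document has been translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The original document in its native language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.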
\ No newline at end of file
diff --git a/translations/zh-CN/9-Real-World/1-Applications/assignment.md b/translations/zh-CN/9-Real-World/1-Applications/assignment.md
new file mode 100644
index 000000000..50491247f
--- /dev/null
+++ b/translations/zh-CN/9-Real-World/1-Applications/assignment.md
@@ -0,0 +1,18 @@
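+# A ML Scavenger Hunt
+
+## Instructions
+
+In this lesson, you learned about many real-life use cases that were solved with classical ML. While deep learning, new techniques and tools in AI, and neural networks have helped speed up the development of tools in these sectors, classical ML using the techniques in this curriculum still holds great value.
+
+In this assignment, imagine that you are taking part in a hackathon. Use what you learned in the curriculum to propose a solution, based on classical ML, to a problem in one of the sectors discussed in this lesson. Create a presentation that discusses how you would implement your idea. Bonus points if you can gather sample data and build an ML model to support your concept!
+
+## Rubric
+
+| Criteria | Exemplary                                                        | Adequate                                      | Needs Improvement     |
+| -------- | ---------------------------------------------------------------- | --------------------------------------------- | --------------------- |
+|          | A PowerPoint presentation is submitted - bonus for building a model | A non-innovative, basic presentation is submitted | The work is incomplete |
+
+---
+
+**Disclaimer**:
+This document has been translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The original document in its native language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.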
\ No newline at end of file
diff --git a/translations/zh-CN/9-Real-World/2-Debugging-ML-Models/README.md b/translations/zh-CN/9-Real-World/2-Debugging-ML-Models/README.md
new file mode 100644
index 000000000..190ec0817
--- /dev/null
+++ b/translations/zh-CN/9-Real-World/2-Debugging-ML-Models/README.md
@@ -0,0 +1,174 @@
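+# Postscript: Model Debugging in Machine Learning using Responsible AI dashboard components
+
+## [Pre-lecture quiz](https://ff-quizzes.netlify.app/en/ml/)
+
+## Introduction
+
+Machine learning affects our everyday lives. AI is finding its way into some of the most important systems that affect us as individuals and as a society, in areas such as healthcare, finance, education, and employment. For instance, systems and models are involved in daily decision-making tasks such as health care diagnoses or fraud detection. As a result, the rapid advancement and adoption of AI are being met with evolving societal expectations and growing regulation. We regularly see areas where AI systems fall short of expectations and expose new challenges, and governments are starting to regulate AI solutions. It is therefore important to analyze these models so that they provide fair, reliable, inclusive, transparent, and accountable outcomes for everyone.
+
+In this curriculum, we will look at practical tools that can be used to assess whether a model has responsible AI issues. Traditional machine learning debugging techniques tend to be based on quantitative calculations such as aggregate accuracy or average error loss. Imagine what can happen when the data you use to build these models lacks certain demographics, such as race, gender, political view, or religion, or represents them disproportionately. If the model's output is interpreted to favor some demographic, this can introduce an over- or under-representation of these sensitive feature groups, resulting in fairness, inclusiveness, or reliability issues. In addition, machine learning models are often considered black boxes, which makes it hard to understand and explain what drives a model's predictions. All of these are challenges that data scientists and AI developers face when they do not have adequate tools to debug and assess the fairness or trustworthiness of a model.
+
+In this lesson, you will learn how to debug your models using:
+
+- **Error Analysis**: identify where in your data distribution the model has high error rates.
+- **Model Overview**: perform comparative analysis across different data cohorts to discover disparities in your model's performance metrics.
+- **Data Analysis**: investigate where your data may be over- or under-represented in ways that can skew your model to favor one data demographic over another.
+- **Feature Importance**: understand which features drive your model's predictions at a global or local level.
+
+## Prerequisite
+
+As a prerequisite, please review [Responsible AI tools for developers](https://www.microsoft.com/ai/ai-lab-responsible-ai-dashboard)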
+
+> 
+
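+## Error Analysis
+
+Traditional model performance metrics for measuring accuracy are mostly calculations based on correct versus incorrect predictions. For example, finding that a model is accurate 89% of the time with an error loss of 0.001 can be considered good performance. However, errors are often not distributed uniformly across your underlying dataset. You may get an 89% accuracy score and still discover that there are regions of your data for which the model fails 42% of the time. These failure patterns in specific data groups can lead to fairness or reliability issues, so it is essential to understand where the model performs well and where it does not. The data regions with a high number of inaccuracies may turn out to be an important data demographic.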
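+
+A small worked check shows how the two figures above can coexist. The 10% cohort share is an assumption chosen for illustration and is not from the lesson; $e_{\text{rest}}$ denotes the error rate on the remaining 90% of the data.
+
+```latex
+\underbrace{0.10 \times 0.42}_{\text{failing cohort}} + \underbrace{0.90 \times e_{\text{rest}}}_{\text{remaining data}} = 0.11
+\quad\Longrightarrow\quad
+e_{\text{rest}} = \frac{0.11 - 0.042}{0.90} \approx 0.076
+```
+
+Under that assumption, the model is wrong about 7.6% of the time on most of the data but 42% of the time on the small cohort, a more than fivefold gap that the aggregate 89% accuracy does not reveal.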
+
+
+
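+The Error Analysis component of the RAI dashboard uses a tree visualization to show how model failures are distributed across cohorts. This helps identify features or areas of your dataset with a high error rate. By seeing where most of the model's inaccuracies come from, you can start investigating the root cause. You can also create cohorts of data to analyze. These cohorts help in the debugging process to determine why the model performs well in one cohort but makes errors in another.
+
+
+
+The visual indicators on the tree map help locate problem areas more quickly. For instance, the darker the shade of red on a tree node, the higher the error rate.
+
+The heat map is another visualization that users can use to investigate the error rate by one or two features, in order to find what contributes to model errors across the entire dataset or across cohorts.
+
+
+
+Use error analysis when you need to:
+
+* Gain a deep understanding of how model failures are distributed across a dataset and across several input and feature dimensions.
+* Break down the aggregate performance metrics to automatically discover erroneous cohorts and inform your targeted mitigation steps.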
+
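+## Model Overview
+
+Evaluating the performance of a machine learning model requires a holistic understanding of its behavior. This can be achieved by reviewing more than one metric, such as error rate, accuracy, recall, precision, or mean absolute error (MAE), to find disparities among performance metrics. One metric may look great while another exposes inaccuracies. In addition, comparing metrics across the entire dataset or across cohorts helps reveal where the model performs well and where it does not. This is especially important for seeing how the model performs on sensitive features (for example, patient race, gender, or age) versus non-sensitive ones, in order to uncover potential unfairness. For example, discovering that the model has a higher error rate in a cohort defined by sensitive features can reveal potential unfairness in the model.
+
+The Model Overview component of the RAI dashboard helps not only with analyzing performance metrics for a data cohort, but also gives users the ability to compare the model's behavior across different cohorts.
+
+
+
+The component's feature-based analysis lets users narrow down data subgroups within a particular feature to identify anomalies at a more granular level. For example, the dashboard has built-in intelligence that automatically generates cohorts for a user-selected feature (e.g., *"time_in_hospital < 3"* or *"time_in_hospital >= 7"*). This enables a user to isolate a particular feature from a larger data group to see whether it is a key influencer of the model's erroneous outcomes.
+
+
+
+The Model Overview component supports two classes of disparity metrics:
+
+**Disparity in model performance**: These metrics calculate the disparity (difference) in the values of the selected performance metric across subgroups of data. Here are a few examples:
+
+* Disparity in accuracy rate
+* Disparity in error rate
+* Disparity in precision
+* Disparity in recall
+* Disparity in mean absolute error (MAE)
+
+**Disparity in selection rate**: This metric captures the difference in selection rate (favorable prediction) among subgroups. An example is the disparity in loan approval rates. Selection rate means the fraction of data points in each class classified as 1 (in binary classification) or the distribution of prediction values (in regression).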
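+
+One way to write these two quantities down, offered as an illustrative sketch rather than the dashboard's exact computation: for a subgroup $g$ with $n_g$ data points and binary predictions $\hat{y}_i$, the selection rate and the disparity between two subgroups $a$ and $b$ can be expressed as below. The subgroup names and the loan figures in the note that follows are hypothetical.
+
+```latex
+\mathrm{SR}_g = \frac{1}{n_g} \sum_{i \in g} \mathbf{1}[\hat{y}_i = 1],
+\qquad
+\Delta_{\mathrm{SR}}(a, b) = \mathrm{SR}_a - \mathrm{SR}_b
+```
+
+For example, if a model approves 60 of 100 loan applications in subgroup $a$ and 45 of 100 in subgroup $b$, then $\mathrm{SR}_a = 0.60$, $\mathrm{SR}_b = 0.45$, and the disparity in selection rate is $0.15$.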
+
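+## Data Analysis
+
+> "If you torture the data long enough, it will confess to anything" - Ronald Coase
+
+This statement sounds extreme, but it is true that data can be manipulated to support any conclusion. Such manipulation can sometimes be unintentional. As humans, we all have biases, and it is often difficult to notice when we are introducing bias into data. Guaranteeing fairness in AI and machine learning remains a complex challenge.
+
+Data is a huge blind spot for traditional model performance metrics. You may have high accuracy scores, but they do not always reflect the underlying data bias that could be in your dataset. For example, if a company's employee dataset has 27% women and 73% men in executive positions, a job advertising AI model trained on this data may target mostly a male audience for senior-level positions. This imbalance in the data skews the model's predictions toward one gender, which reveals a fairness issue: the model has a gender bias.
+
+The Data Analysis component of the RAI dashboard helps identify areas of over- and under-representation in the dataset. It helps users diagnose the root cause of errors and fairness issues introduced by data imbalance or by the lack of representation of a particular data group. It lets users visualize datasets by predicted and actual outcomes, error groups, and specific features. Sometimes discovering an under-represented data group also uncovers that the model is not learning well, which explains the high error rate. A model with data bias is not just a fairness issue; it also shows that the model is not inclusive or reliable.
+
+
+
+Use data analysis when you need to:
+
+* Explore your dataset statistics by selecting different filters to slice your data into different dimensions (also known as cohorts).
+* Understand the distribution of your dataset across different cohorts and feature groups.
+* Determine whether your findings related to fairness, error analysis, and causality (derived from other dashboard components) are a result of your dataset's distribution.
+* Decide in which areas to collect more data to mitigate errors that come from representation issues, label noise, feature noise, label bias, and similar factors.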
+
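+## Model Interpretability
+
+Machine learning models tend to be black boxes. Understanding which key data features drive a model's predictions can be challenging. It is important to provide transparency about why a model makes a certain prediction. For example, if an AI system predicts that a diabetic patient is at risk of being readmitted to a hospital within 30 days, it should be able to provide the supporting data that led to that prediction. Supporting data indicators help clinicians or hospitals make well-informed decisions. In addition, being able to explain why a model made a prediction for an individual patient supports accountability under health regulations. When you use machine learning models in ways that affect people's lives, it is crucial to understand and explain what drives the model's behavior. Model explainability and interpretability help answer questions in scenarios such as:
+
+* Model debugging: Why did my model make this mistake? How can I improve my model?
+* Human-AI collaboration: How can I understand and trust the model's decisions?
+* Regulatory compliance: Does my model satisfy legal requirements?
+
+The Feature Importance component of the RAI dashboard helps you debug and gain a comprehensive understanding of how a model makes predictions. It is also a useful tool for machine learning professionals and decision-makers to explain, and show evidence of, the features that influence a model's behavior for regulatory compliance. Users can explore both global and local explanations to validate which features drive a model's predictions. Global explanations list the top features that affect a model's overall predictions. Local explanations show which features led to a model's prediction for an individual case. The ability to evaluate local explanations is also helpful when debugging or auditing a specific case, to better understand and interpret why a model made an accurate or inaccurate prediction.
+
+
+
+* Global explanations: For example, what features affect the overall behavior of a diabetes hospital readmission model?
+* Local explanations: For example, why was a diabetic patient over 60 years old with prior hospitalizations predicted to be readmitted, or not readmitted, to a hospital within 30 days?
+
+When you debug a model by examining its performance across different cohorts, Feature Importance shows how much impact a feature has across those cohorts. It helps reveal anomalies when you compare how strongly a feature drives a model's erroneous predictions. The Feature Importance component can show which values of a feature positively or negatively influenced the model's outcome. For instance, if a model made an inaccurate prediction, the component lets you drill down and pinpoint which features or feature values drove the prediction. This level of detail helps not only in debugging but also provides transparency and accountability in auditing situations. Finally, the component can help you identify fairness issues. For example, if a sensitive feature such as ethnicity or gender is highly influential in driving a model's predictions, this could be a sign of race or gender bias in the model.
+
+
+
+Use interpretability when you need to:
+
+* Determine how trustworthy your AI system's predictions are by understanding which features matter most for the predictions.
+* Approach the debugging of your model by understanding it first and identifying whether the model is using healthy features or merely false correlations.
+* Uncover potential sources of unfairness by understanding whether the model bases predictions on sensitive features or on features that are highly correlated with them.
+* Build user trust in your model's decisions by generating local explanations to illustrate their outcomes.
+* Complete a regulatory audit of an AI system to validate models and monitor the impact of model decisions on humans.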
+
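+## Conclusion
+
+All of the RAI dashboard components are practical tools that help you build machine learning models that are less harmful and more trustworthy to society. They help prevent threats to human rights, discrimination against or exclusion of certain groups from life opportunities, and the risk of physical or psychological injury. They also help build trust in your model's decisions by generating local explanations to illustrate their outcomes. Some of the potential harms can be classified as:
+
+- **Allocation**: for example, one gender or ethnicity is favored over another.
+- **Quality of service**: if you train on data for one specific scenario but reality is much more complex, the result is a poorly performing service.
+- **Stereotyping**: associating a given group with pre-assigned attributes.
+- **Denigration**: unfairly criticizing and labeling something or someone.
+- **Over- or under-representation**: a certain group is not seen in a certain profession, and any service or function that keeps promoting this contributes to harm.
+
+### Azure RAI dashboard
+
+The [Azure RAI dashboard](https://learn.microsoft.com/en-us/azure/machine-learning/concept-responsible-ai-dashboard?WT.mc_id=aiml-90525-ruyakubu) is built on open-source tools developed by leading academic institutions and organizations, including Microsoft. These tools are instrumental for data scientists and AI developers who want to better understand model behavior and to discover and mitigate undesirable issues in AI models.
+
+- Learn how to use the different components by checking out the RAI dashboard [docs](https://learn.microsoft.com/en-us/azure/machine-learning/how-to-responsible-ai-dashboard?WT.mc_id=aiml-90525-ruyakubu).
+
+- Check out some RAI dashboard [sample notebooks](https://github.com/Azure/RAI-vNext-Preview/tree/main/examples/notebooks) for debugging more responsible AI scenarios in Azure Machine Learning.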
+
+---
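+## 🚀 Challenge
+
+To prevent statistical or data biases from being introduced in the first place, we should:
+
+- make sure the people working on systems have a diversity of backgrounds and perspectives
+- invest in datasets that reflect the diversity of our society
+- develop better methods for detecting and correcting bias
+
+Think about real-life scenarios where unfairness is evident in model building and usage. What else should we consider?
+
+## [Post-lecture quiz](https://ff-quizzes.netlify.app/en/ml/)
+## Review & Self Study
+
+In this lesson, you learned about some practical tools for incorporating responsible AI into machine learning.
+
+Watch this workshop to dive deeper into the topics:
+
+- Responsible AI Dashboard: One-stop shop for operationalizing RAI in practice, by Besmira Nushi and Mehrnoosh Sameki
+
+[](https://www.youtube.com/watch?v=f1oaDNl3djg "Responsible AI Dashboard: One-stop shop for operationalizing RAI in practice")
+
+> 🎥 Click the image above for a video: Responsible AI Dashboard: One-stop shop for operationalizing RAI in practice, by Besmira Nushi and Mehrnoosh Sameki
+
+Refer to the following materials to learn more about responsible AI and how to build more trustworthy models:
+
+- Microsoft's RAI dashboard tools for debugging ML models: [Responsible AI tools resources](https://aka.ms/rai-dashboard)
+
+- Explore the Responsible AI toolkit: [Github](https://github.com/microsoft/responsible-ai-toolbox)
+
+- Microsoft's RAI resource center: [Responsible AI Resources – Microsoft AI](https://www.microsoft.com/ai/responsible-ai-resources?activetab=pivot1%3aprimaryr4)
+
+- Microsoft's FATE research group: [FATE: Fairness, Accountability, Transparency, and Ethics in AI - Microsoft Research](https://www.microsoft.com/research/theme/fate/)
+
+## Assignment
+
+[Explore RAI dashboard](assignment.md)
+
+---
+
+**Disclaimer**:
+This document has been translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The original document in its native language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.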
\ No newline at end of file
diff --git a/translations/zh-CN/9-Real-World/2-Debugging-ML-Models/assignment.md b/translations/zh-CN/9-Real-World/2-Debugging-ML-Models/assignment.md
new file mode 100644
index 000000000..a5ac0eb36
--- /dev/null
+++ b/translations/zh-CN/9-Real-World/2-Debugging-ML-Models/assignment.md
@@ -0,0 +1,16 @@
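+# Explore the Responsible AI (RAI) dashboard
+
+## Instructions
+
+In this lesson you learned about the RAI dashboard, a suite of components built on "open-source" tools that helps data scientists perform error analysis, data exploration, fairness assessment, model interpretability, counterfactual/what-if assessments, and causal analysis on AI systems. For this assignment, explore some of the RAI dashboard's sample [notebooks](https://github.com/Azure/RAI-vNext-Preview/tree/main/examples/notebooks) and report your findings in a paper or presentation.
+
+## Rubric
+
+| Criteria | Exemplary | Adequate | Needs Improvement |
+| -------- | --------- | -------- | ----------------- |
+|          | A paper or PowerPoint presentation is submitted that discusses the RAI dashboard's components, the notebook that was run, and the conclusions drawn from running it | A paper is submitted without conclusions | No paper is submitted |
+
+---
+
+**Disclaimer**:
+This document has been translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The original document in its native language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.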
\ No newline at end of file
diff --git a/translations/zh-CN/9-Real-World/README.md b/translations/zh-CN/9-Real-World/README.md
new file mode 100644
index 000000000..a442de445
--- /dev/null
+++ b/translations/zh-CN/9-Real-World/README.md
@@ -0,0 +1,23 @@
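+# Postscript: Real world applications of classic machine learning
+
+In this section of the curriculum, you will be introduced to some real-world applications of classical ML. We searched the internet for whitepapers and articles about applications of these strategies, avoiding neural networks, deep learning, and AI as much as possible. Learn how ML is used in business systems, ecological applications, finance, arts and culture, and more.
+
+
+
+> Photo by Alexis Fauvet on Unsplash
+
+## Lessons
+
+1. [Real-World Applications for ML](1-Applications/README.md)
+2. [Model Debugging in Machine Learning using Responsible AI dashboard components](2-Debugging-ML-Models/README.md)
+
+## Credits
+
+"Real-World Applications" was written by a team including [Jen Looper](https://twitter.com/jenlooper) and [Ornella Altunyan](https://twitter.com/ornelladotcom).
+
+"Model Debugging in Machine Learning using Responsible AI dashboard components" was written by [Ruth Yakubu](https://twitter.com/ruthieyakubu).
+
+---
+
+**Disclaimer**:
+This document has been translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The original document in its native language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.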
\ No newline at end of file
diff --git a/translations/zh-CN/AGENTS.md b/translations/zh-CN/AGENTS.md
new file mode 100644
index 000000000..20b20fa4a
--- /dev/null
+++ b/translations/zh-CN/AGENTS.md
@@ -0,0 +1,336 @@
+# AGENTS.md
+
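+## Project Overview
+
+This is **Machine Learning for Beginners**, a comprehensive 12-week, 26-lesson curriculum covering classic machine learning concepts using Python (primarily Scikit-learn) and R. The repository is designed as a self-paced learning resource with hands-on projects, quizzes, and assignments. Each lesson explores ML concepts through real-world data from different cultures and regions around the world.
+
+Key components:
+- **Educational Content**: 26 lessons covering introduction to ML, regression, classification, clustering, natural language processing (NLP), time series, and reinforcement learning
+- **Quiz Application**: a Vue.js-based quiz app with pre- and post-lesson assessments
+- **Multi-language Support**: automated translation into 40+ languages via GitHub Actions
+- **Dual Language Support**: lessons available in both Python (Jupyter notebooks) and R (R Markdown files)
+- **Project-Based Learning**: each topic includes practical projects and assignments
+
+## Repository Structure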
+
+```
+ML-For-Beginners/
+├── 1-Introduction/ # ML basics, history, fairness, techniques
+├── 2-Regression/ # Regression models with Python/R
+├── 3-Web-App/ # Flask web app for ML model deployment
+├── 4-Classification/ # Classification algorithms
+├── 5-Clustering/ # Clustering techniques
+├── 6-NLP/ # Natural Language Processing
+├── 7-TimeSeries/ # Time series forecasting
+├── 8-Reinforcement/ # Reinforcement learning
+├── 9-Real-World/ # Real-world ML applications
+├── quiz-app/ # Vue.js quiz application
+├── translations/ # Auto-generated translations
+└── sketchnotes/ # Visual learning aids
+```
+
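+Each lesson folder typically contains:
+- `README.md` - main lesson content
+- `notebook.ipynb` - Python Jupyter notebook
+- `solution/` - solution code (Python and R versions)
+- `assignment.md` - practice exercises
+- `images/` - visual resources
+
+## Setup Commands
+
+### For Python Lessons
+
+Most lessons use Jupyter notebooks. Install the required dependencies: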
+
+```bash
+# Install Python 3.8+ if not already installed
+python --version
+
+# Install Jupyter
+pip install jupyter
+
+# Install common ML libraries
+pip install scikit-learn pandas numpy matplotlib seaborn
+
+# For specific lessons, check lesson-specific requirements
+# Example: Web App lesson
+pip install flask
+```
+
+### For R Lessons
+
+R lessons are in the `solution/R/` folders as `.rmd` or `.ipynb` files:
+
+```bash
+# Install R and required packages
+# In R console:
+install.packages(c("tidyverse", "tidymodels", "caret"))
+```
+
+### For the Quiz Application
+
+The quiz app is a Vue.js application located in the `quiz-app/` directory:
+
+```bash
+cd quiz-app
+npm install
+```
+
+### For the Documentation Site
+
+To run the documentation locally:
+
+```bash
+# Install Docsify
+npm install -g docsify-cli
+
+# Serve from repository root
+docsify serve
+
+# Access at http://localhost:3000
+```
+
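+## Development Workflow
+
+### Working with Lesson Notebooks
+
+1. Navigate to the lesson directory (e.g., `2-Regression/1-Tools/`)
+2. Open the Jupyter notebook:
+   ```bash
+   jupyter notebook notebook.ipynb
+   ```
+3. Work through the lesson content and exercises
+4. Check the solutions in the `solution/` folder if needed
+
+### Python Development
+
+- Lessons use standard Python data science libraries
+- Jupyter notebooks are used for interactive learning
+- Solution code is available in each lesson's `solution/` folder
+
+### R Development
+
+- R lessons are provided in `.rmd` format (R Markdown)
+- Solutions are located in `solution/R/` subdirectories
+- Use RStudio or Jupyter with the R kernel to run R notebooks
+
+### Quiz Application Development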
+
+```bash
+cd quiz-app
+
+# Start development server
+npm run serve
+# Access at http://localhost:8080
+
+# Build for production
+npm run build
+
+# Lint and fix files
+npm run lint
+```
+
+## Testing Instructions
+
+### Quiz Application Testing
+
+```bash
+cd quiz-app
+
+# Lint code
+npm run lint
+
+# Build to verify no errors
+npm run build
+```
+
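+**Note**: This is primarily an educational curriculum repository. There are no automated tests for lesson content. Validation is done by:
+- Completing the lesson exercises
+- Running the notebook cells successfully
+- Comparing output against the expected results in the solutions
+
+## Code Style Guidelines
+
+### Python Code
+- Follow the PEP 8 style guide
+- Use clear, descriptive variable names
+- Add comments for complex operations
+- Jupyter notebooks should include Markdown cells that explain the concepts
+
+### JavaScript/Vue.js (Quiz App)
+- Follow the Vue.js style guide
+- The ESLint configuration is in `quiz-app/package.json`
+- Run `npm run lint` to check for and auto-fix issues
+
+### Documentation
+- Markdown files should be clear and well structured
+- Include code examples in fenced code blocks
+- Use relative links for internal references
+- Follow the existing formatting conventions
+
+## Build and Deployment
+
+### Quiz Application Deployment
+
+The quiz app can be deployed to Azure Static Web Apps: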
+
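+1. **Prerequisites**:
+   - An Azure account
+   - A GitHub repository (already forked)
+
+2. **Deploy to Azure**:
+   - Create an Azure Static Web App resource
+   - Connect it to the GitHub repository
+   - Set the app location: `/quiz-app`
+   - Set the output location: `dist`
+   - Azure automatically creates a GitHub Actions workflow
+
+3. **GitHub Actions Workflow**:
+   - The workflow file is created at `.github/workflows/azure-static-web-apps-*.yml`
+   - It builds and deploys automatically on push to the main branch
+
+### Documentation PDF
+
+To generate a PDF from the documentation: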
+
+```bash
+npm install
+npm run convert
+```
+
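+## Translation Workflow
+
+**Important**: Translations are automated via GitHub Actions using Co-op Translator.
+
+- Translations are generated automatically when changes are pushed to the `main` branch
+- **Do not translate content manually** - the system handles this
+- The workflow is defined in `.github/workflows/co-op-translator.yml`
+- Azure AI/OpenAI services are used for translation
+- 40+ languages are supported
+
+## Contributing Guidelines
+
+### For Content Contributors
+
+1. **Fork the repository** and create a feature branch
+2. **Modify lesson content** to add or update lessons
+3. **Do not modify translated files** - they are auto-generated
+4. **Test your code** - make sure all notebook cells run successfully
+5. **Verify that links and images** work correctly
+6. **Submit a pull request** with a clear description
+
+### Pull Request Guidelines
+
+- **Title format**: `[Section] Brief description of changes`
+  - Example: `[Regression] Fix typo in lesson 5`
+  - Example: `[Quiz-App] Update dependencies`
+- **Before submitting**:
+  - Make sure all notebook cells execute without errors
+  - Run `npm run lint` if you modified the quiz app
+  - Verify the Markdown formatting
+  - Test any new code examples
+- **A pull request must include**:
+  - A description of the changes
+  - The reason for the changes
+  - Screenshots, if there are UI changes
+- **Code of Conduct**: Follow the [Microsoft Open Source Code of Conduct](CODE_OF_CONDUCT.md)
+- **CLA**: You will need to sign the Contributor License Agreement
+
+## Lesson Structure
+
+Each lesson follows a consistent pattern:
+
+1. **Pre-lecture quiz** - tests baseline knowledge
+2. **Lesson content** - written instructions and explanations
+3. **Code demonstrations** - hands-on examples in notebooks
+4. **Knowledge checks** - verify understanding along the way
+5. **Challenge** - apply the concepts independently
+6. **Assignment** - extended practice
+7. **Post-lecture quiz** - assesses learning outcomes
+
+## Common Commands Reference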
+
+```bash
+# Python/Jupyter
+jupyter notebook # Start Jupyter server
+jupyter notebook notebook.ipynb # Open specific notebook
+pip install -r requirements.txt # Install dependencies (where available)
+
+# Quiz App
+cd quiz-app
+npm install # Install dependencies
+npm run serve # Development server
+npm run build # Production build
+npm run lint # Lint and fix
+
+# Documentation
+docsify serve # Serve documentation locally
+npm run convert # Generate PDF
+
+# Git workflow
+git checkout -b feature/my-change # Create feature branch
+git add . # Stage changes
+git commit -m "Description" # Commit changes
+git push origin feature/my-change # Push to remote
+```
+
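+## Additional Resources
+
+- **Microsoft Learn Collection**: [ML for Beginners modules](https://learn.microsoft.com/en-us/collections/qrqzamz1nn2wx3?WT.mc_id=academic-77952-bethanycheum)
+- **Quiz App**: [Online quizzes](https://ff-quizzes.netlify.app/en/ml/)
+- **Discussion Board**: [GitHub Discussions](https://github.com/microsoft/ML-For-Beginners/discussions)
+- **Video Walkthroughs**: [YouTube playlist](https://aka.ms/ml-beginners-videos)
+
+## Key Technologies
+
+- **Python**: primary language for the ML lessons (Scikit-learn, Pandas, NumPy, Matplotlib)
+- **R**: alternative implementation using tidyverse, tidymodels, caret
+- **Jupyter**: interactive notebooks for the Python lessons
+- **R Markdown**: documents for the R lessons
+- **Vue.js 3**: quiz application framework
+- **Flask**: web application framework for ML model deployment
+- **Docsify**: documentation site generator
+- **GitHub Actions**: CI/CD and automated translations
+
+## Security Considerations
+
+- **No secrets in code**: never commit API keys or credentials
+- **Dependencies**: keep npm and pip packages up to date
+- **User input**: the Flask web app examples include basic input validation
+- **Sensitive data**: the example datasets are public and non-sensitive
+
+## Troubleshooting
+
+### Jupyter Notebooks
+
+- **Kernel issues**: restart the kernel if cells hang: Kernel → Restart
+- **Import errors**: make sure all required packages are installed with pip
+- **Path issues**: run notebooks from the directory that contains them
+
+### Quiz Application
+
+- **npm install fails**: clear the npm cache: `npm cache clean --force`
+- **Port conflicts**: change the port: `npm run serve -- --port 8081`
+- **Build errors**: delete `node_modules` and reinstall: `rm -rf node_modules && npm install`
+
+### R Lessons
+
+- **Package not found**: install it with: `install.packages("package-name")`
+- **RMarkdown rendering issues**: make sure the rmarkdown package is installed
+- **Kernel issues**: you may need to install IRkernel for Jupyter
+
+## Project-Specific Notes
+
+- This is primarily a **learning curriculum**, not production code
+- The focus is on **understanding ML concepts** through hands-on practice
+- Code examples prioritize **clarity over optimization**
+- Most lessons are **self-contained** and can be completed independently
+- **Solutions are provided**, but learners should attempt the exercises first
+- The repository uses **Docsify** for web documentation, with no build step
+- **Sketchnotes** provide visual summaries of the concepts
+- **Multi-language support** makes the content globally accessible
+
+---
+
+**Disclaimer**:
+This document has been translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The original document in its native language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.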
\ No newline at end of file
diff --git a/translations/zh-CN/CODE_OF_CONDUCT.md b/translations/zh-CN/CODE_OF_CONDUCT.md
new file mode 100644
index 000000000..fa794ccfb
--- /dev/null
+++ b/translations/zh-CN/CODE_OF_CONDUCT.md
@@ -0,0 +1,14 @@
+# Microsoft Open Source Code of Conduct
+
+This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
+
+Resources:
+
+- [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/)
+- [Microsoft Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/)
+- For questions or help, contact [opencode@microsoft.com](mailto:opencode@microsoft.com)
+
+---
+
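+**Disclaimer**:
+This document was translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The document in its original language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.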
\ No newline at end of file
diff --git a/translations/zh-CN/CONTRIBUTING.md b/translations/zh-CN/CONTRIBUTING.md
new file mode 100644
index 000000000..832b3d123
--- /dev/null
+++ b/translations/zh-CN/CONTRIBUTING.md
@@ -0,0 +1,16 @@
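+# Contributing
+
+This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.
+
+> Important: when translating text in this repository, please do not use machine translation. We will verify translations through the community, so please volunteer only for languages in which you are proficient.
+
+When you submit a pull request, the CLA-bot automatically determines whether you need to provide a CLA and marks the PR accordingly (e.g., label, comment). Simply follow the instructions the bot provides. You only need to do this once across all repositories that use our CLA.
+
+This project has adopted the [Microsoft Open Source Code of Conduct](https://opensource.microsoft.com/codeofconduct/).
+For more information, see the [Code of Conduct FAQ](https://opensource.microsoft.com/codeofconduct/faq/)
+or contact [opencode@microsoft.com](mailto:opencode@microsoft.com) with any additional questions or comments.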
+
+---
+
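+**Disclaimer**:
+This document was translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The document in its original language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.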
\ No newline at end of file
diff --git a/translations/zh-CN/PyTorch_Fundamentals.ipynb b/translations/zh-CN/PyTorch_Fundamentals.ipynb
new file mode 100644
index 000000000..c82c561cb
--- /dev/null
+++ b/translations/zh-CN/PyTorch_Fundamentals.ipynb
@@ -0,0 +1,2830 @@
+{
+ "nbformat": 4,
+ "nbformat_minor": 0,
+ "metadata": {
+ "colab": {
+ "provenance": [],
+ "gpuType": "T4",
+ "authorship_tag": "ABX9TyOgv0AozH1FKQBD+RkgT2bV",
+ "include_colab_link": true
+ },
+ "kernelspec": {
+ "name": "python3",
+ "display_name": "Python 3"
+ },
+ "language_info": {
+ "name": "python"
+ },
+ "accelerator": "GPU",
+ "coopTranslator": {
+ "original_hash": "0ca21b6ee62904d616f2e36dc1cf0da7",
+ "translation_date": "2025-09-03T19:15:43+00:00",
+ "source_file": "PyTorch_Fundamentals.ipynb",
+ "language_code": "zh"
+ }
+ },
+ "cells": [
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "view-in-github",
+ "colab_type": "text"
+ },
+ "source": []
+ },
+ {
+ "cell_type": "code",
+ "execution_count": null,
+ "metadata": {
+ "id": "EHh5JllMh1rG",
+ "outputId": "f55755ad-c369-414c-85ec-6e9d4f061a02",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 35
+ }
+ },
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "'2.2.1+cu121'"
+ ],
+ "application/vnd.google.colaboratory.intrinsic+json": {
+ "type": "string"
+ }
+ },
+ "metadata": {},
+ "execution_count": 1
+ }
+ ],
+ "source": [
+ "import torch\n",
+ "torch.__version__"
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "print(\"I am excited to run this\")"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "UPlb-duwXAfz",
+ "outputId": "cfd687e4-1238-49f4-ab6b-ee1305b740d2"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "I am excited to run this\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "import torch\n",
+ "import pandas as pd\n",
+ "import numpy as np\n",
+ "import matplotlib.pyplot as plt\n",
+ "print(torch.__version__)"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "byWVlJ9wXDSk",
+ "outputId": "fd74a5c4-4d4a-41b2-ef3c-562ea3e4811f"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "2.2.1+cu121\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [],
+ "metadata": {
+ "id": "Osm80zoEYklS"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# scalar\n",
+ "scalar = torch.tensor(7)\n",
+ "scalar"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "-o8wvJ-VXZmI",
+ "outputId": "558816f5-1205-4de1-fe1f-2f96e9bd79e6"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "tensor(7)"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 4
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "scalar.ndim"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "mCZ2tXC4Y_Sg",
+ "outputId": "2d86dbdc-56e1-45c6-d3dd-14515f2a457a"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "0"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 5
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "scalar.item()"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "ssN00By0ZQgS",
+ "outputId": "490f40d1-5135-4969-a6d3-c8c902cdc473"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "7"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 6
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# vector\n",
+ "vector = torch.tensor([7, 7])\n",
+ "vector\n",
+ "#vector.ndim\n",
+ "#vector.item()"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "Bws__5wlZnmF",
+ "outputId": "944e38f9-5ba1-4ddc-a9c6-cfb6a19bb488"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "tensor([7, 7])"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 7
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "vector.shape"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "9pjCvnsZZzNG",
+ "outputId": "e030a4da-8f81-4858-fbce-86da2aaafe52"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "torch.Size([2])"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 8
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# Matrix\n",
+ "MATRIX = torch.tensor([[7, 8],[9, 10]])\n",
+ "MATRIX"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "a747hI9SaBGW",
+ "outputId": "af835ddb-81ff-4981-badb-441567194d15"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "tensor([[ 7, 8],\n",
+ " [ 9, 10]])"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 9
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "MATRIX.ndim"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "XdTfFa7vaRUj",
+ "outputId": "0fbbab9c-8263-4cad-a380-0d2a16ca499e"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "2"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 10
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "MATRIX[0]\n",
+ "MATRIX[1]"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "TFeD3jSDafm7",
+ "outputId": "69b44ab3-5ba7-451a-c6b2-f019a03d0c96"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "tensor([ 9, 10])"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 11
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "# Tensor\n",
+ "TENSOR = torch.tensor([[[1, 2, 3],[3,6,9], [2,4,5]]])\n",
+ "TENSOR"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "ic3cE47tah42",
+ "outputId": "f250e295-91de-43ec-9d80-588a6fe0abde"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "tensor([[[1, 2, 3],\n",
+ " [3, 6, 9],\n",
+ " [2, 4, 5]]])"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 12
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "TENSOR.shape"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "Wvjf5fczbAM1",
+ "outputId": "9c72b5b8-bafe-4ae7-9883-b051e209eada"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "torch.Size([1, 3, 3])"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 13
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "TENSOR.ndim"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "mwtXZwiMbN3m",
+ "outputId": "331a5e36-b1b0-4a5f-a9b8-e7049cbaa8f9"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "3"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 14
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "TENSOR[0]"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "vzdZu_IfbP3J",
+ "outputId": "e24e7e71-e365-412d-ff50-fc094b56d2f3"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "tensor([[1, 2, 3],\n",
+ " [3, 6, 9],\n",
+ " [2, 4, 5]])"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 15
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [],
+ "metadata": {
+ "id": "A8OL9eWfcRrJ"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "random_tensor = torch.rand(3,4)\n",
+ "random_tensor"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "hAqSDE1EcVS_",
+ "outputId": "946171c3-d054-400c-f893-79110356888c"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "tensor([[0.4414, 0.7681, 0.8385, 0.3166],\n",
+ " [0.0468, 0.5812, 0.0670, 0.9173],\n",
+ " [0.2959, 0.3276, 0.7411, 0.4643]])"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 16
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "random_tensor.ndim"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "g4fvPE5GcwzP",
+ "outputId": "8737f36b-6864-4059-eaed-6f9156c22306"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "2"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 17
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "random_tensor.shape"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "XsAg99QmdAU6",
+ "outputId": "35467c11-257c-4f16-99aa-eca930bcbc36"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "torch.Size([3, 4])"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 18
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "random_tensor.size()"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "cii1pNdVdB68",
+ "outputId": "fc8d2de6-9215-43de-99f7-7b0d7f7d20fa"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "torch.Size([3, 4])"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 19
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "random_image_tensor = torch.rand(size=(3, 224, 224)) #color channels, height, width\n",
+ "random_image_tensor.ndim, random_image_tensor.shape"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "aTKq2j0cdDjb",
+ "outputId": "6be42057-20b9-4faf-d79d-8b65c42cc27e"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "(3, torch.Size([3, 224, 224]))"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 20
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "random_tensor_ofownsize = torch.rand(size=(5,10,10))\n",
+ "random_tensor_ofownsize.ndim, random_tensor_ofownsize.shape\n"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "IyhDdj-Pd6nC",
+ "outputId": "43e5e334-6d4d-4b67-f87d-7d364c6d8c67"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "(3, torch.Size([5, 10, 10]))"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 21
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [],
+ "metadata": {
+ "id": "UOJW08uOert_"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "zero = torch.zeros(size=(3, 4))\n",
+ "zero"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "uGvXtaXyefie",
+ "outputId": "d40d3e28-8667-4d2f-8b62-f0829c6162ad"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "tensor([[0., 0., 0., 0.],\n",
+ " [0., 0., 0., 0.],\n",
+ " [0., 0., 0., 0.]])"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 22
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "zero*random_tensor"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "OyUkUPkDe0uH",
+ "outputId": "26c2e4be-36ba-4c6c-9a90-2704ec135828"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "tensor([[0., 0., 0., 0.],\n",
+ " [0., 0., 0., 0.],\n",
+ " [0., 0., 0., 0.]])"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 23
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "ones = torch.ones(size=(3, 4))\n",
+ "ones\n"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "y_Ac62Aqe82G",
+ "outputId": "291de5d9-b9df-49de-c9d1-d098e3e9f4d8"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "tensor([[1., 1., 1., 1.],\n",
+ " [1., 1., 1., 1.],\n",
+ " [1., 1., 1., 1.]])"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 24
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "ones.dtype"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "TvGOA9odfIEO",
+ "outputId": "45949ef4-6649-4b6c-d6af-2d4bfb8de832"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "torch.float32"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 25
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "ones*zero"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "--pTyge-fI-8",
+ "outputId": "c4d9bb7e-829b-43db-e2db-b1a2d64e61f0"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "tensor([[0., 0., 0., 0.],\n",
+ " [0., 0., 0., 0.],\n",
+ " [0., 0., 0., 0.]])"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 26
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [],
+ "metadata": {
+ "id": "qDcc7Z36fSJF"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "one_to_ten = torch.arange(start = 1, end = 11, step = 1)\n",
+ "one_to_ten"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "w3CZB4zUfR1s",
+ "outputId": "197fcba1-da0a-4b4a-ed11-3974bd6c01aa"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "tensor([ 1, 2, 3, 4, 5, 6, 7, 8, 9, 10])"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 27
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "ten_zeros = torch.zeros_like(one_to_ten)\n",
+ "ten_zeros"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "WZh99BwVfRy8",
+ "outputId": "51ef8bfb-6fa0-4099-ff66-b97d65b2ddea"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "tensor([0, 0, 0, 0, 0, 0, 0, 0, 0, 0])"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 28
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "source": [
+ "Tensor data types\n"
+ ],
+ "metadata": {
+ "id": "pGGhgsbUgqbW"
+ }
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "float_32_tensor = torch.tensor([3.0, 6.0,9.0], dtype = None, device = None, requires_grad = False)\n",
+ "float_32_tensor"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "JORJl4XkfRsx",
+ "outputId": "71114171-0f49-481f-b6fc-6cb48e2fb895"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "tensor([3., 6., 9.])"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 29
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "float_32_tensor.dtype"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "6wOPPwGyfRLn",
+ "outputId": "f23776a1-b682-404a-9f67-d5bcb0402666"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "torch.float32"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 30
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "float_16_tensor = float_32_tensor.type(torch.float16)\n",
+ "float_16_tensor.dtype"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "tFsHCvmZfOYe",
+ "outputId": "d3aa305a-7591-47f5-97fd-61bff60b44bd"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "torch.float16"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 31
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "float_16_tensor*float_32_tensor"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "TQiCGTPuwq0q",
+ "outputId": "98750fce-1ca3-4889-e269-8b753efdea96"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "tensor([ 9., 36., 81.])"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 32
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "int_32_tensor = torch.tensor([3, 6, 9], dtype = torch.int32)\n",
+ "int_32_tensor"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "5hlrLvGUw5D_",
+ "outputId": "41d890a0-9aee-446c-d906-631ce2ab0995"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "tensor([3, 6, 9], dtype=torch.int32)"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 33
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "int_32_tensor*float_32_tensor"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "ihApD9u3xTNW",
+ "outputId": "d295eed0-6996-4e0f-8502-ff4b55cd1373"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "tensor([ 9., 36., 81.])"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 34
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "x = torch.arange(0,100,10)"
+ ],
+ "metadata": {
+ "id": "utKhlb_KxWDQ"
+ },
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "x"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "p78D74E9Rj7Y",
+ "outputId": "781a1614-a900-41f5-9e5d-358f0b2390aa"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "tensor([ 0, 10, 20, 30, 40, 50, 60, 70, 80, 90])"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 36
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "x.min()"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "4BcSs5NeRkcj",
+ "outputId": "3f24a8dc-58e9-4a5f-9834-e85856a34f9d"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "tensor(0)"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 37
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "x.max()"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "hinqvXVLRm4q",
+ "outputId": "5c7d8a53-3913-4ac1-bba3-5ba8ff68250a"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "tensor(90)"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 38
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "torch.mean(x.type(torch.float32))"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "k7okc0_vRpnB",
+ "outputId": "91e5494f-dc57-417c-ea4d-25dbc547c893"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "tensor(45.)"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 39
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "x.type(torch.float32).mean()"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "29QcDTjHRq10",
+ "outputId": "62937c6c-78e0-49f2-dde3-1543ee8f7907"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "tensor(45.)"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 40
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "x.sum()"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "wlpY_G_sbdKF",
+ "outputId": "475d8258-af65-4011-a258-b93d4d8142d4"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "tensor(450)"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 41
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "x.argmax()"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "GT6HJzwhbk4n",
+ "outputId": "2e455c20-c322-4bcf-d07c-1259d3ccefc6"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "tensor(9)"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 42
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "x.argmin()"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "egL3oi2Mb19P",
+ "outputId": "f71fb32f-6338-44a3-b377-75bea0a3ab54"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "tensor(0)"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 43
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "x[0]"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "p2U8DZKib3DP",
+ "outputId": "b9f613b9-74e9-45f4-ed01-05babb6a6793"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "tensor(0)"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 44
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "x[9]"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "24qBFlGYcABe",
+ "outputId": "5813cfcb-7f63-4bd7-ee46-f95ccbfda939"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "tensor(90)"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 45
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "x = torch.arange(1, 10)\n",
+ "x.shape"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "0GPOxEzkcBHO",
+ "outputId": "aefbd903-4f4c-4d2c-c90f-eccd682fe018"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "torch.Size([9])"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 46
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "x_reshaped = x.reshape(1,9)\n",
+ "x_reshaped, x_reshaped.shape"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "spmRgQjwddgp",
+ "outputId": "85a7c55c-2909-4ea2-fc68-386dddc65742"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "(tensor([[1, 2, 3, 4, 5, 6, 7, 8, 9]]), torch.Size([1, 9]))"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 47
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "x_reshaped.view(1,9)"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "tH2ahWGydqqP",
+ "outputId": "65d92263-4fc4-434a-c06d-c5e08436f7fe"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "tensor([[1, 2, 3, 4, 5, 6, 7, 8, 9]])"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 48
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "x_stacked = torch.stack([x, x, x, x], dim = 1)\n",
+ "x_stacked"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "jgCeJcaud_-1",
+ "outputId": "7f293a37-6ef1-43b6-aee5-9d6d91c94f9e"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "tensor([[1, 1, 1, 1],\n",
+ " [2, 2, 2, 2],\n",
+ " [3, 3, 3, 3],\n",
+ " [4, 4, 4, 4],\n",
+ " [5, 5, 5, 5],\n",
+ " [6, 6, 6, 6],\n",
+ " [7, 7, 7, 7],\n",
+ " [8, 8, 8, 8],\n",
+ " [9, 9, 9, 9]])"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 49
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "x_stacked.squeeze()"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "XhJHIK6cfPse",
+ "outputId": "06c47b89-3a9e-453e-bcc3-00cbcb0b8b49"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "tensor([[1, 1, 1, 1],\n",
+ " [2, 2, 2, 2],\n",
+ " [3, 3, 3, 3],\n",
+ " [4, 4, 4, 4],\n",
+ " [5, 5, 5, 5],\n",
+ " [6, 6, 6, 6],\n",
+ " [7, 7, 7, 7],\n",
+ " [8, 8, 8, 8],\n",
+ " [9, 9, 9, 9]])"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 50
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "x_stacked.unsqueeze(dim=1)"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "ej2c3Xxzf0tq",
+ "outputId": "94024061-eb37-446d-c4a8-e4d16cb6de81"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "tensor([[[1, 1, 1, 1]],\n",
+ "\n",
+ " [[2, 2, 2, 2]],\n",
+ "\n",
+ " [[3, 3, 3, 3]],\n",
+ "\n",
+ " [[4, 4, 4, 4]],\n",
+ "\n",
+ " [[5, 5, 5, 5]],\n",
+ "\n",
+ " [[6, 6, 6, 6]],\n",
+ "\n",
+ " [[7, 7, 7, 7]],\n",
+ "\n",
+ " [[8, 8, 8, 8]],\n",
+ "\n",
+ " [[9, 9, 9, 9]]])"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 52
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "x_stacked.squeeze()"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "4DJYo1a0f5M0",
+ "outputId": "efca2b47-1b14-44de-9a9a-2c83629d153f"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "tensor([[1, 1, 1, 1],\n",
+ " [2, 2, 2, 2],\n",
+ " [3, 3, 3, 3],\n",
+ " [4, 4, 4, 4],\n",
+ " [5, 5, 5, 5],\n",
+ " [6, 6, 6, 6],\n",
+ " [7, 7, 7, 7],\n",
+ " [8, 8, 8, 8],\n",
+ " [9, 9, 9, 9]])"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 53
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "x_stacked.unsqueeze(dim=-2)"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "J4iEjn2ah2HL",
+ "outputId": "22395593-7c16-4162-beae-dd2bbe7bda35"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "tensor([[[1, 1, 1, 1]],\n",
+ "\n",
+ " [[2, 2, 2, 2]],\n",
+ "\n",
+ " [[3, 3, 3, 3]],\n",
+ "\n",
+ " [[4, 4, 4, 4]],\n",
+ "\n",
+ " [[5, 5, 5, 5]],\n",
+ "\n",
+ " [[6, 6, 6, 6]],\n",
+ "\n",
+ " [[7, 7, 7, 7]],\n",
+ "\n",
+ " [[8, 8, 8, 8]],\n",
+ "\n",
+ " [[9, 9, 9, 9]]])"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 55
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "import torch\n",
+ "tensor = torch.tensor([1, 2, 3])\n",
+ "tensor = tensor - 10\n",
+ "tensor"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "cFfiD7Nth7Z_",
+ "outputId": "1139e1f8-fc1a-46ca-d636-f2bc4fd2eef6"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "tensor([-9, -8, -7])"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 7
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "torch.mul(tensor, 10)"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "dyA7BM_GHhqE",
+ "outputId": "0e3b9671-d9e8-4a32-87bb-59bc05986142"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "tensor([-90, -80, -70])"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 9
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "torch.sub(tensor, 100)"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "owtUsZ1KNegI",
+ "outputId": "189b7b23-0041-4e09-b991-cd209a48506a"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "tensor([-109, -108, -107])"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 10
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "torch.add(tensor, 100)"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "K5STXlQONsyc",
+ "outputId": "00cbb79a-0a1d-4e21-86ec-5c91c37a2d01"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "tensor([91, 92, 93])"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 11
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "torch.divide(tensor, 2)"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "xqMGnzIUNvp0",
+ "outputId": "c894cf3e-f148-45f8-cfc8-d78740735306"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "tensor([-4.5000, -4.0000, -3.5000])"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 13
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "torch.matmul(tensor, tensor)"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "ruGzKpV8NyBc",
+ "outputId": "fddb63bf-006f-48b6-ae28-287fbcda8bc5"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "tensor(194)"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 15
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "tensor@tensor"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "8GS3r9yTeGfD",
+ "outputId": "c80b12ac-30b5-4f3d-c38c-9e41ba511b0e"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "tensor(194)"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 16
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "%%time\n",
+ "tensor@tensor"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "QmuYHqXTemC0",
+ "outputId": "402fe3ba-70b5-4bb2-c83b-254db84ff810"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "CPU times: user 622 µs, sys: 0 ns, total: 622 µs\n",
+ "Wall time: 516 µs\n"
+ ]
+ },
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "tensor(194)"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 17
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "%%time\n",
+ "torch.matmul(tensor,tensor)"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "dGr1fzdNepd8",
+ "outputId": "97bd6c91-bc25-4b38-cdf5-f22dcdef243e"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "CPU times: user 424 µs, sys: 998 µs, total: 1.42 ms\n",
+ "Wall time: 1.43 ms\n"
+ ]
+ },
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "tensor(194)"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 18
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "torch.rand(3,2)"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "pGYDoK2gevfo",
+ "outputId": "2c8783d5-0453-47c5-c7ed-af10d25d6989"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "tensor([[0.5999, 0.0073],\n",
+ " [0.9321, 0.3026],\n",
+ " [0.3463, 0.3872]])"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 20
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "torch.matmul(torch.rand(3,2), torch.rand(2,3))"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "KGBGQoB8e2DP",
+ "outputId": "4c2ef361-a2d0-41ee-c328-3992cbbc138d"
+ },
+ "execution_count": null,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "tensor([[0.3528, 0.1893, 0.0714],\n",
+ " [1.2791, 0.7110, 0.2563],\n",
+ " [0.8812, 0.4553, 0.1803]])"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 23
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "import torch"
+ ],
+ "metadata": {
+ "id": "ib8DMtkBe_LJ"
+ },
+ "execution_count": 1,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "x = torch.rand(2,9)"
+ ],
+ "metadata": {
+ "id": "nJo8ZBdrQY1b"
+ },
+ "execution_count": 2,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "x"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "wi6oRv4MQfgf",
+ "outputId": "55c99f55-31f6-4cf5-ba4e-19a47c3a0167"
+ },
+ "execution_count": 3,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "tensor([[0.5894, 0.4391, 0.2018, 0.5417, 0.3844, 0.3592, 0.9209, 0.9269, 0.0681],\n",
+ " [0.0746, 0.1740, 0.6821, 0.6890, 0.0999, 0.7444, 0.2391, 0.4625, 0.8302]])"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 3
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "y=torch.randn(2,3,5)\n",
+ "y"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "Zpx8myAUQgoc",
+ "outputId": "07756d70-56bd-437c-c74e-9aecc1a77311"
+ },
+ "execution_count": 5,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "tensor([[[ 1.5552, -0.4877, 0.5175, -1.7958, -0.6187],\n",
+ " [-0.3359, -1.9710, 0.0112, -1.7578, -1.5295],\n",
+ " [ 0.0932, 1.4079, 0.9108, 0.3328, -0.6978]],\n",
+ "\n",
+ " [[-0.9406, -1.0809, -0.2595, 0.1282, 1.6605],\n",
+ " [ 1.1624, 1.0902, 1.7092, -0.2842, -1.3780],\n",
+ " [-0.1534, -1.2795, -0.5495, 0.9902, 0.1822]]])"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 5
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "x_original = torch.rand(size=(224,224,3))\n",
+ "x_original"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "s4U-X9bJQnWe",
+ "outputId": "657a7a76-962c-4b41-a76b-902d0482266c"
+ },
+ "execution_count": 6,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "tensor([[[0.4549, 0.6809, 0.2118],\n",
+ " [0.4824, 0.9008, 0.8741],\n",
+ " [0.1715, 0.1757, 0.1845],\n",
+ " ...,\n",
+ " [0.8741, 0.6594, 0.2610],\n",
+ " [0.0092, 0.1984, 0.1955],\n",
+ " [0.4236, 0.4182, 0.0251]],\n",
+ "\n",
+ " [[0.9174, 0.1661, 0.5852],\n",
+ " [0.1837, 0.2351, 0.3810],\n",
+ " [0.3726, 0.4808, 0.8732],\n",
+ " ...,\n",
+ " [0.6794, 0.0554, 0.9202],\n",
+ " [0.0864, 0.8750, 0.3558],\n",
+ " [0.8445, 0.9759, 0.4934]],\n",
+ "\n",
+ " [[0.1600, 0.2635, 0.7194],\n",
+ " [0.9488, 0.3405, 0.3647],\n",
+ " [0.6683, 0.5168, 0.9592],\n",
+ " ...,\n",
+ " [0.0521, 0.0140, 0.2445],\n",
+ " [0.3596, 0.3999, 0.2730],\n",
+ " [0.5926, 0.9877, 0.7784]],\n",
+ "\n",
+ " ...,\n",
+ "\n",
+ " [[0.4794, 0.5635, 0.3764],\n",
+ " [0.9124, 0.6094, 0.5059],\n",
+ " [0.4528, 0.4447, 0.5021],\n",
+ " ...,\n",
+ " [0.0089, 0.4816, 0.8727],\n",
+ " [0.2173, 0.6296, 0.2347],\n",
+ " [0.2028, 0.9931, 0.7201]],\n",
+ "\n",
+ " [[0.3116, 0.6459, 0.4703],\n",
+ " [0.0148, 0.2345, 0.7149],\n",
+ " [0.8393, 0.5804, 0.6691],\n",
+ " ...,\n",
+ " [0.2105, 0.9460, 0.2696],\n",
+ " [0.5918, 0.9295, 0.2616],\n",
+ " [0.2537, 0.7819, 0.4700]],\n",
+ "\n",
+ " [[0.6654, 0.1200, 0.5841],\n",
+ " [0.9147, 0.5522, 0.6529],\n",
+ " [0.1799, 0.5276, 0.5415],\n",
+ " ...,\n",
+ " [0.7536, 0.4346, 0.8793],\n",
+ " [0.3793, 0.1750, 0.7792],\n",
+ " [0.9266, 0.8325, 0.9974]]])"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 6
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "x_permuted=x_original.permute(2, 0, 1)\n",
+ "print(x_original.shape)\n",
+ "print(x_permuted.shape)"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "DD19_zvbQzHo",
+ "outputId": "1d64ce1b-eb48-47e3-90b6-7f1340e7f2b2"
+ },
+ "execution_count": 9,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "torch.Size([224, 224, 3])\n",
+ "torch.Size([3, 224, 224])\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "x_original[0,0,0]"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "NnPmMk4ZRF7w",
+ "outputId": "2cd5da7f-4a23-4a76-8c4a-bb982113f2a4"
+ },
+ "execution_count": 10,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "tensor(0.4549)"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 10
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "x_permuted[0,0,0]"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "Z0ylNoAARgTo",
+ "outputId": "ddca0298-cddf-4048-9b71-a791655e5bed"
+ },
+ "execution_count": 11,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "tensor(0.4549)"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 11
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "x_original[0,0,0]=0.989"
+ ],
+ "metadata": {
+ "id": "RXw0xXsDRi4L"
+ },
+ "execution_count": 13,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "x_original[0,0,0]"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "1sFdV6wzRo3f",
+ "outputId": "1cf87d2c-6d88-453a-d136-0f625a2800f1"
+ },
+ "execution_count": 14,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "tensor(0.9890)"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 14
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "x_permuted[0,0,0]"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "xTX-hx2SR1wp",
+ "outputId": "0d4908c4-c3bc-44e3-8ec6-1487104cc209"
+ },
+ "execution_count": 15,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "tensor(0.9890)"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 15
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "x=torch.arange(1,10).reshape(1,3,3)\n",
+ "x, x.shape"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "mZomOe7gR4Q8",
+ "outputId": "0b3c922f-ec11-46de-b8a5-9f9533d866ad"
+ },
+ "execution_count": 18,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "(tensor([[[1, 2, 3],\n",
+ " [4, 5, 6],\n",
+ " [7, 8, 9]]]),\n",
+ " torch.Size([1, 3, 3]))"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 18
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "x[0]"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "3y7v4SQvSBs1",
+ "outputId": "8c53307d-e628-404d-db66-56c6bdffab7c"
+ },
+ "execution_count": 19,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "tensor([[1, 2, 3],\n",
+ " [4, 5, 6],\n",
+ " [7, 8, 9]])"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 19
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "x[0][0]"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "hf9uG4xLSNya",
+ "outputId": "3075bc42-9ffa-426b-8a86-95628ffcd824"
+ },
+ "execution_count": 21,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "tensor([1, 2, 3])"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 21
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "x[0][0][0]"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "zA4G2Se4SRB3",
+ "outputId": "324312d2-ed0a-49eb-f81f-e904e53992fe"
+ },
+ "execution_count": 22,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "tensor(1)"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 22
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "x[0][2][2]"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "Mwy3zmKKSdbk",
+ "outputId": "d35172c3-b099-40a6-ddf1-a453c2adfa44"
+ },
+ "execution_count": 23,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "tensor(9)"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 23
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "x[:,1,1]"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "fE3nCM1KS7XT",
+ "outputId": "01f5d755-9737-4235-9f73-dce89ff6ba16"
+ },
+ "execution_count": 24,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "tensor([5])"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 24
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "x[0,0,:]"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "luNDINKNTTxp",
+ "outputId": "091195ef-2f71-4602-e95f-529a69193150"
+ },
+ "execution_count": 25,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "tensor([1, 2, 3])"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 25
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "x[0,:,2]"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "KG8A4xbfThCL",
+ "outputId": "5866bc41-9241-4619-be7b-e9206b3f80ab"
+ },
+ "execution_count": 26,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "tensor([3, 6, 9])"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 26
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "import numpy as np"
+ ],
+ "metadata": {
+ "id": "CZ3PX0qlTwHJ"
+ },
+ "execution_count": 27,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "array = np.arange(1.0, 8.0)"
+ ],
+ "metadata": {
+ "id": "UOBeTumiT3Lf"
+ },
+ "execution_count": 28,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "array"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "RzcO32E9UCQl",
+ "outputId": "430def24-c42c-461f-e5e7-398544c695d3"
+ },
+ "execution_count": 29,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "array([1., 2., 3., 4., 5., 6., 7.])"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 29
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "tensor = torch.from_numpy(array)\n",
+ "tensor"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "JJIL0q1DUC6O",
+ "outputId": "8a3b1d7c-4482-4d32-f34f-9212d9d3a177"
+ },
+ "execution_count": 32,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "tensor([1., 2., 3., 4., 5., 6., 7.], dtype=torch.float64)"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 32
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "array[3]=11.0"
+ ],
+ "metadata": {
+ "id": "j3Ce6q3DUIEK"
+ },
+ "execution_count": 33,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "array"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "dc_BCVdjUsCc",
+ "outputId": "65537325-8b11-4f36-fc73-e56f30d6a036"
+ },
+ "execution_count": 34,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "array([ 1., 2., 3., 11., 5., 6., 7.])"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 34
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "tensor"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "VG1e_eITUta2",
+ "outputId": "a26c5198-23b6-4a6d-d73a-ba20cd9782b8"
+ },
+ "execution_count": 35,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "tensor([ 1., 2., 3., 11., 5., 6., 7.], dtype=torch.float64)"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 35
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "tensor = torch.ones(7)\n",
+ "tensor, tensor.dtype\n",
+ "numpy_tensor = tensor.numpy()\n",
+ "numpy_tensor, numpy_tensor.dtype"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "Swt8JF8vUuev",
+ "outputId": "c9e5bf6a-6d2c-41d6-8327-366867ffdd2d"
+ },
+ "execution_count": 37,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "(array([1., 1., 1., 1., 1., 1., 1.], dtype=float32), dtype('float32'))"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 37
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "import torch\n",
+ "random_tensor_A = torch.rand(3,4)\n",
+ "random_tensor_B = torch.rand(3,4)\n",
+ "print(random_tensor_A)\n",
+ "print(random_tensor_B)\n",
+ "print(random_tensor_A == random_tensor_B)"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "uGcagTteVFTD",
+ "outputId": "49405790-08e7-4210-b7f1-f00b904c7eb9"
+ },
+ "execution_count": 38,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "tensor([[0.9870, 0.6636, 0.6873, 0.8863],\n",
+ " [0.8386, 0.4169, 0.3587, 0.0265],\n",
+ " [0.2981, 0.6025, 0.5652, 0.5840]])\n",
+ "tensor([[0.9821, 0.3481, 0.0913, 0.4940],\n",
+ " [0.7495, 0.4387, 0.9582, 0.8659],\n",
+ " [0.5064, 0.6919, 0.0809, 0.9771]])\n",
+ "tensor([[False, False, False, False],\n",
+ " [False, False, False, False],\n",
+ " [False, False, False, False]])\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "RANDOM_SEED = 42\n",
+ "torch.manual_seed(RANDOM_SEED)\n",
+ "random_tensor_C = torch.rand(3,4)\n",
+ "torch.manual_seed(RANDOM_SEED)\n",
+ "random_tensor_D = torch.rand(3,4)\n",
+ "print(random_tensor_C)\n",
+ "print(random_tensor_D)\n",
+ "print(random_tensor_C == random_tensor_D)"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "HznyXyEaWjLM",
+ "outputId": "25956434-01b6-4059-9054-c9978884ddc1"
+ },
+ "execution_count": 46,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "tensor([[0.8823, 0.9150, 0.3829, 0.9593],\n",
+ " [0.3904, 0.6009, 0.2566, 0.7936],\n",
+ " [0.9408, 0.1332, 0.9346, 0.5936]])\n",
+ "tensor([[0.8823, 0.9150, 0.3829, 0.9593],\n",
+ " [0.3904, 0.6009, 0.2566, 0.7936],\n",
+ " [0.9408, 0.1332, 0.9346, 0.5936]])\n",
+ "tensor([[True, True, True, True],\n",
+ " [True, True, True, True],\n",
+ " [True, True, True, True]])\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "!nvidia-smi"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "vltPTh0YXJSt",
+ "outputId": "807af6dc-a9ca-4301-ec32-b688dbde8be8"
+ },
+ "execution_count": 2,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "Thu May 23 02:57:59 2024 \n",
+ "+---------------------------------------------------------------------------------------+\n",
+ "| NVIDIA-SMI 535.104.05 Driver Version: 535.104.05 CUDA Version: 12.2 |\n",
+ "|-----------------------------------------+----------------------+----------------------+\n",
+ "| GPU Name Persistence-M | Bus-Id Disp.A | Volatile Uncorr. ECC |\n",
+ "| Fan Temp Perf Pwr:Usage/Cap | Memory-Usage | GPU-Util Compute M. |\n",
+ "| | | MIG M. |\n",
+ "|=========================================+======================+======================|\n",
+ "| 0 Tesla T4 Off | 00000000:00:04.0 Off | 0 |\n",
+ "| N/A 60C P8 11W / 70W | 0MiB / 15360MiB | 0% Default |\n",
+ "| | | N/A |\n",
+ "+-----------------------------------------+----------------------+----------------------+\n",
+ " \n",
+ "+---------------------------------------------------------------------------------------+\n",
+ "| Processes: |\n",
+ "| GPU GI CI PID Type Process name GPU Memory |\n",
+ "| ID ID Usage |\n",
+ "|=======================================================================================|\n",
+ "| No running processes found |\n",
+ "+---------------------------------------------------------------------------------------+\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "import torch\n",
+ "torch.cuda.is_available()"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "L6mMyPDyYh1j",
+ "outputId": "279c5dd8-c2a8-4fbd-f321-2f5d7c6e90e6"
+ },
+ "execution_count": 3,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "True"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 3
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "device = \"cuda\" if torch.cuda.is_available() else \"cpu\"\n",
+ "device"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 35
+ },
+ "id": "oOdiYa7ZYytx",
+ "outputId": "d73b04fc-8963-4826-9722-08d118d5ab91"
+ },
+ "execution_count": 5,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "'cuda'"
+ ],
+ "application/vnd.google.colaboratory.intrinsic+json": {
+ "type": "string"
+ }
+ },
+ "metadata": {},
+ "execution_count": 5
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "torch.cuda.device_count()"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "vOdsazLqZFM5",
+ "outputId": "8189cd6a-9017-4663-a652-3e15c517d9c3"
+ },
+ "execution_count": 6,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "1"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 6
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "tensor = torch.tensor([1,2,3], device = \"cpu\")\n",
+ "print(tensor, tensor.device)"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "cdik9Vw3ZMv0",
+ "outputId": "044a68fd-83a1-409d-8e3b-655142ca0270"
+ },
+ "execution_count": 7,
+ "outputs": [
+ {
+ "output_type": "stream",
+ "name": "stdout",
+ "text": [
+ "tensor([1, 2, 3]) cpu\n"
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "tensor_on_gpu = tensor.to(device)\n",
+ "tensor_on_gpu"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ },
+ "id": "Zmp835rrZp-z",
+ "outputId": "37fa3413-18a3-47bf-ae51-5b36ff85a3ef"
+ },
+ "execution_count": 8,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "tensor([1, 2, 3], device='cuda:0')"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 8
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "tensor_on_gpu.numpy()"
+ ],
+ "metadata": {
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 159
+ },
+ "id": "jhriaa8uZ1yM",
+ "outputId": "bc5a3226-1a12-4fea-8769-a44f21cdc323"
+ },
+ "execution_count": 10,
+ "outputs": [
+ {
+ "output_type": "error",
+ "ename": "TypeError",
+ "evalue": "can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first.",
+ "traceback": [
+ "\u001b[0;31m---------------------------------------------------------------------------\u001b[0m",
+ "\u001b[0;31mTypeError\u001b[0m Traceback (most recent call last)",
+ "\u001b[0;32m\u001b[0m in \u001b[0;36m\u001b[0;34m()\u001b[0m\n\u001b[0;32m----> 1\u001b[0;31m \u001b[0mtensor_on_gpu\u001b[0m\u001b[0;34m.\u001b[0m\u001b[0mnumpy\u001b[0m\u001b[0;34m(\u001b[0m\u001b[0;34m)\u001b[0m\u001b[0;34m\u001b[0m\u001b[0;34m\u001b[0m\u001b[0m\n\u001b[0m",
+ "\u001b[0;31mTypeError\u001b[0m: can't convert cuda:0 device type tensor to numpy. Use Tensor.cpu() to copy the tensor to host memory first."
+ ]
+ }
+ ]
+ },
+ {
+ "cell_type": "code",
+ "source": [
+ "tensor_on_cpu = tensor_on_gpu.cpu().numpy()"
+ ],
+ "metadata": {
+ "id": "LHGXK3GgaOzL"
+ },
+ "execution_count": 12,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "source": [],
+ "metadata": {
+ "id": "j-El4LlCajfq"
+ },
+ "execution_count": null,
+ "outputs": []
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {},
+ "source": [
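+ "\n---\n\n**Disclaimer**: \nThis document has been translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The original document in its native language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.\n"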
+ ]
+ }
+ ]
+}
\ No newline at end of file
diff --git a/translations/zh-CN/README.md b/translations/zh-CN/README.md
new file mode 100644
index 000000000..5c36134d4
--- /dev/null
+++ b/translations/zh-CN/README.md
@@ -0,0 +1,221 @@
+[](https://github.com/microsoft/ML-For-Beginners/blob/master/LICENSE)
+[](https://GitHub.com/microsoft/ML-For-Beginners/graphs/contributors/)
+[](https://GitHub.com/microsoft/ML-For-Beginners/issues/)
+[](https://GitHub.com/microsoft/ML-For-Beginners/pulls/)
+[](http://makeapullrequest.com)
+
+[](https://GitHub.com/microsoft/ML-For-Beginners/watchers/)
+[](https://GitHub.com/microsoft/ML-For-Beginners/network/)
+[](https://GitHub.com/microsoft/ML-For-Beginners/stargazers/)
+
+### 🌐 Multi-Language Support
+
+#### Supported via GitHub Action (Automated & Always Up-to-Date)
+
+
+[Arabic](../ar/README.md) | [Bengali](../bn/README.md) | [Bulgarian](../bg/README.md) | [Burmese](../my/README.md) | [Chinese (Simplified)](./README.md) | [Chinese (Traditional, Hong Kong)](../zh-HK/README.md) | [Chinese (Traditional, Macau)](../zh-MO/README.md) | [Chinese (Traditional, Taiwan)](../zh-TW/README.md) | [Croatian](../hr/README.md) | [Czech](../cs/README.md) | [Danish](../da/README.md) | [Dutch](../nl/README.md) | [Estonian](../et/README.md) | [Finnish](../fi/README.md) | [French](../fr/README.md) | [German](../de/README.md) | [Greek](../el/README.md) | [Hebrew](../he/README.md) | [Hindi](../hi/README.md) | [Hungarian](../hu/README.md) | [Indonesian](../id/README.md) | [Italian](../it/README.md) | [Japanese](../ja/README.md) | [Kannada](../kn/README.md) | [Korean](../ko/README.md) | [Lithuanian](../lt/README.md) | [Malay](../ms/README.md) | [Malayalam](../ml/README.md) | [Marathi](../mr/README.md) | [Nepali](../ne/README.md) | [Nigerian Pidgin](../pcm/README.md) | [Norwegian](../no/README.md) | [Persian (Farsi)](../fa/README.md) | [Polish](../pl/README.md) | [Portuguese (Brazil)](../pt-BR/README.md) | [Portuguese (Portugal)](../pt-PT/README.md) | [Punjabi (Gurmukhi)](../pa/README.md) | [Romanian](../ro/README.md) | [Russian](../ru/README.md) | [Serbian (Cyrillic)](../sr/README.md) | [Slovak](../sk/README.md) | [Slovenian](../sl/README.md) | [Spanish](../es/README.md) | [Swahili](../sw/README.md) | [Swedish](../sv/README.md) | [Tagalog (Filipino)](../tl/README.md) | [Tamil](../ta/README.md) | [Telugu](../te/README.md) | [Thai](../th/README.md) | [Turkish](../tr/README.md) | [Ukrainian](../uk/README.md) | [Urdu](../ur/README.md) | [Vietnamese](../vi/README.md)
+
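+> **Prefer to Clone Locally?**
+
+> This repository includes translations in more than 50 languages, which significantly increases the download size. To clone without the translations, use sparse checkout: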
+> ```bash
+> git clone --filter=blob:none --sparse https://github.com/microsoft/ML-For-Beginners.git
+> cd ML-For-Beginners
+> git sparse-checkout set --no-cone '/*' '!translations' '!translated_images'
+> ```
+> This gives you everything you need to complete the course, with a much faster download.
+
+
+#### Join Our Community
+
+[](https://discord.gg/nTYy5BXMWG)
+
+We are running a Learn with AI series on Discord from 18 to 30 September 2025. Learn more and join us at [Learn with AI Series](https://aka.ms/learnwithai/discord). You will get tips and tricks for using GitHub Copilot for data science.
+
+
+
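+# Machine Learning for Beginners - A Curriculum
+
+> 🌍 Travel around the world as we explore Machine Learning by means of world cultures 🌍
+
+Cloud Advocates at Microsoft are pleased to offer a 12-week, 26-lesson curriculum all about **Machine Learning**. In this curriculum, you will learn about what is sometimes called **classic machine learning**, using primarily the Scikit-learn library and avoiding deep learning, which is covered in our [AI for Beginners curriculum](https://aka.ms/ai4beginners). You can also pair these lessons with our [Data Science for Beginners curriculum](https://aka.ms/ds4beginners).
+
+Travel with us around the world as we apply these classic techniques to data from many regions. Each lesson includes pre- and post-lesson quizzes, written instructions to complete the lesson, a solution, an assignment, and more. Our project-based pedagogy lets you learn while building, which helps new skills stick.
+
+**✍️ Hearty thanks to our authors** Jen Looper, Stephen Howell, Francesca Lazzeri, Tomomi Imura, Cassie Breviu, Dmitry Soshnikov, Chris Noring, Anirban Mukherjee, Ornella Altunyan, Ruth Yakubu, and Amy Boyd
+
+**🎨 Thanks as well to our illustrators** Tomomi Imura, Dasani Madipalli, and Jen Looper
+
+**🙏 Special thanks 🙏 to our Microsoft Student Ambassador authors, reviewers, and content contributors**, notably Rishit Dagli, Muhammad Sakib Khan Inan, Rohan Raj, Alexandru Petrescu, Abhishek Jaiswal, Nawrin Tabassum, Ioan Samuila, and Snigdha Agarwal
+
+**🤩 Extra gratitude to Microsoft Student Ambassadors Eric Wanjau, Jasleen Sondhi, and Vidushi Gupta for their contributions to our R lessons!**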
+
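+# Getting Started
+
+Follow these steps:
+1. **Fork the repository**: Click the "Fork" button at the top-right corner of this page.
+2. **Clone the repository**: `git clone https://github.com/microsoft/ML-For-Beginners.git`
+
+> [Find all additional resources for this course in our Microsoft Learn collection](https://learn.microsoft.com/en-us/collections/qrqzamz1nn2wx3?WT.mc_id=academic-77952-bethanycheum)
+
+> 🔧 **Need help?** Check our [Troubleshooting Guide](TROUBLESHOOTING.md) for solutions to problems with installation, setup, and running lessons.
+
+**[Students](https://aka.ms/student-page)**, to use this curriculum, fork the entire repo to your own GitHub account and complete the exercises on your own or with a group: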
+
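+- Start with the pre-lecture quiz.
+- Read the lecture and complete the activities, pausing and reflecting at each knowledge check.
+- Try to create the projects by understanding the lessons rather than running the solution code; that code is available in the `/solution` folder of each project-oriented lesson.
+- Take the post-lecture quiz.
+- Complete the challenge.
+- Complete the assignment.
+- After completing a lesson group, visit the [Discussion Board](https://github.com/microsoft/ML-For-Beginners/discussions) and "learn out loud" by filling out the appropriate PAT rubric. A 'PAT' is a Progress Assessment Tool, a rubric you fill out to further your learning. You can also react to other people's PATs so we can learn together.
+
+> For further study, we recommend following these [Microsoft Learn](https://docs.microsoft.com/en-us/users/jenlooper-2911/collections/k7o7tg1gp306q4?WT.mc_id=academic-77952-leestott) modules and learning paths.
+
+**Teachers**, we have [included some suggestions](for-teachers.md) on how to use this curriculum.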
+
+---
+
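+## Video walkthroughs
+
+Some of the lessons are available as short-form videos. You can find all of them in-line in the lessons, or in the [ML for Beginners playlist on the Microsoft Developer channel](https://aka.ms/ml-beginners-videos) by clicking the image below.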
+
+[](https://aka.ms/ml-beginners-videos)
+
+---
+
+## Meet the Team
+
+[](https://youtu.be/Tj1XWrDSYJU)
+
+**Gif by** [Mohit Jaisal](https://linkedin.com/in/mohitjaisal)
+
+> 🎥 Click the image above for a video about the project and the people who created it!
+
+---
+
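+## Pedagogy
+
+We chose two pedagogical tenets while building this curriculum: ensuring that it is hands-on and **project-based**, and that it includes **frequent quizzes**. In addition, the curriculum has a common **theme** to give it cohesion.
+
+Aligning the content with projects makes the process more engaging for students and improves retention of concepts. In addition, a low-stakes quiz before a class sets the student's intention towards learning a topic, while a second quiz after class reinforces retention. This curriculum was designed to be flexible and fun, and it can be taken in whole or in part. The projects start small and become increasingly complex by the end of the 12-week cycle. The curriculum also includes a postscript on real-world applications of ML, which can be used as extra credit or as a basis for discussion.
+
+> Find our [Code of Conduct](CODE_OF_CONDUCT.md), [Contributing](CONTRIBUTING.md), [Translation](TRANSLATIONS.md), and [Troubleshooting](TROUBLESHOOTING.md) guidelines. We welcome your constructive feedback!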
+
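+## Each lesson includes
+
+- an optional sketchnote
+- an optional supplemental video
+- a video walkthrough (some lessons only)
+- [a pre-lecture warmup quiz](https://ff-quizzes.netlify.app/en/ml/)
+- a written lesson
+- for project-based lessons, step-by-step guides on how to build the project
+- knowledge checks
+- a challenge
+- supplemental reading
+- an assignment
+- [a post-lecture quiz](https://ff-quizzes.netlify.app/en/ml/)
+
+> **A note about languages**: These lessons are primarily written in Python, but many are also available in R. To complete an R lesson, go to the `/solution` folder and look for files with the `.rmd` extension. This extension marks an **R Markdown** file, which embeds code chunks (in R or other languages) and a `YAML` header (which guides how to format outputs such as PDF) in a Markdown document. This makes it a good authoring framework for data science, since you can combine your code, its output, and your thoughts in one Markdown document. R Markdown documents can also be rendered to output formats such as PDF, HTML, or Word.
+> **A note about quizzes**: All quizzes are contained in the [Quiz App folder](../../quiz-app), for 52 quizzes in total, each with three questions. They are linked from within the lessons, but the quiz app can also be run locally; follow the instructions in the `quiz-app` folder to host it locally or deploy it to Azure.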
+
+| Lesson Number | Topic | Lesson Grouping | Learning Objectives | Linked Lesson | Author |
+| :------: | :------------------------------------------------------------: | :------------------------------------------: | ---------------------------------------------------------------------------------------------------------------------------- | :-------------------------------------------------------------------------------------------------------------------------------------: | :--------------------------------------------------: |
+| 01 | Introduction to machine learning | [Introduction](1-Introduction/README.md) | Learn the basic concepts behind machine learning | [Lesson](1-Introduction/1-intro-to-ML/README.md) | Muhammad |
+| 02 | The history of machine learning | [Introduction](1-Introduction/README.md) | Learn the history underlying this field | [Lesson](1-Introduction/2-history-of-ML/README.md) | Jen and Amy |
+| 03 | Fairness and machine learning | [Introduction](1-Introduction/README.md) | What are the important philosophical issues around fairness that students should consider when building and applying ML models? | [Lesson](1-Introduction/3-fairness/README.md) | Tomomi |
+| 04 | Techniques for machine learning | [Introduction](1-Introduction/README.md) | What techniques do ML researchers use to build ML models? | [Lesson](1-Introduction/4-techniques-of-ML/README.md) | Chris and Jen |
+| 05 | Introduction to regression | [Regression](2-Regression/README.md) | Get started with Python and Scikit-learn for regression models | [Python](2-Regression/1-Tools/README.md) • [R](../../2-Regression/1-Tools/solution/R/lesson_1.html) | Jen • Eric Wanjau |
+| 06 | North American pumpkin prices 🎃 | [Regression](2-Regression/README.md) | Visualize and clean data in preparation for ML | [Python](2-Regression/2-Data/README.md) • [R](../../2-Regression/2-Data/solution/R/lesson_2.html) | Jen • Eric Wanjau |
+| 07 | North American pumpkin prices 🎃 | [Regression](2-Regression/README.md) | Build linear and polynomial regression models | [Python](2-Regression/3-Linear/README.md) • [R](../../2-Regression/3-Linear/solution/R/lesson_3.html) | Jen and Dmitry • Eric Wanjau |
+| 08 | North American pumpkin prices 🎃 | [Regression](2-Regression/README.md) | Build a logistic regression model | [Python](2-Regression/4-Logistic/README.md) • [R](../../2-Regression/4-Logistic/solution/R/lesson_4.html) | Jen • Eric Wanjau |
+| 09 | A Web App 🔌 | [Web App](3-Web-App/README.md) | Build a web app that uses your trained model | [Python](3-Web-App/1-Web-App/README.md) | Jen |
+| 10 | Introduction to classification | [Classification](4-Classification/README.md) | Clean, prepare, and visualize your data; introduction to classification | [Python](4-Classification/1-Introduction/README.md) • [R](../../4-Classification/1-Introduction/solution/R/lesson_10.html) | Jen and Cassie • Eric Wanjau |
+| 11 | Delicious Asian and Indian cuisines 🍜 | [Classification](4-Classification/README.md) | Introduction to classifiers | [Python](4-Classification/2-Classifiers-1/README.md) • [R](../../4-Classification/2-Classifiers-1/solution/R/lesson_11.html) | Jen and Cassie • Eric Wanjau |
+| 12 | Delicious Asian and Indian cuisines 🍜 | [Classification](4-Classification/README.md) | More classifiers | [Python](4-Classification/3-Classifiers-2/README.md) • [R](../../4-Classification/3-Classifiers-2/solution/R/lesson_12.html) | Jen and Cassie • Eric Wanjau |
+| 13 | Delicious Asian and Indian cuisines 🍜 | [Classification](4-Classification/README.md) | Build a recommender web app using your model | [Python](4-Classification/4-Applied/README.md) | Jen |
+| 14 | Introduction to clustering | [Clustering](5-Clustering/README.md) | Clean, prepare, and visualize your data; introduction to clustering | [Python](5-Clustering/1-Visualize/README.md) • [R](../../5-Clustering/1-Visualize/solution/R/lesson_14.html) | Jen • Eric Wanjau |
+| 15 | Exploring Nigerian musical tastes 🎧 | [Clustering](5-Clustering/README.md) | Explore the K-Means clustering method | [Python](5-Clustering/2-K-Means/README.md) • [R](../../5-Clustering/2-K-Means/solution/R/lesson_15.html) | Jen • Eric Wanjau |
+| 16 | Introduction to natural language processing ☕️ | [Natural language processing](6-NLP/README.md) | Learn the basics of NLP by building a simple bot | [Python](6-NLP/1-Introduction-to-NLP/README.md) | Stephen |
+| 17 | Common NLP tasks ☕️ | [Natural language processing](6-NLP/README.md) | Deepen your NLP knowledge by understanding common tasks required when dealing with language structures | [Python](6-NLP/2-Tasks/README.md) | Stephen |
+| 18 | Translation and sentiment analysis ♥️ | [Natural language processing](6-NLP/README.md) | Translation and sentiment analysis with Jane Austen | [Python](6-NLP/3-Translation-Sentiment/README.md) | Stephen |
+| 19 | Romantic hotels of Europe ♥️ | [Natural language processing](6-NLP/README.md) | Sentiment analysis with hotel reviews 1 | [Python](6-NLP/4-Hotel-Reviews-1/README.md) | Stephen |
+| 20 | Romantic hotels of Europe ♥️ | [Natural language processing](6-NLP/README.md) | Sentiment analysis with hotel reviews 2 | [Python](6-NLP/5-Hotel-Reviews-2/README.md) | Stephen |
+| 21 | Introduction to time series forecasting | [Time series](7-TimeSeries/README.md) | Introduction to time series forecasting | [Python](7-TimeSeries/1-Introduction/README.md) | Francesca |
+| 22 | ⚡️ World Power Usage ⚡️ - time series forecasting with ARIMA | [Time series](7-TimeSeries/README.md) | Time series forecasting with ARIMA | [Python](7-TimeSeries/2-ARIMA/README.md) | Francesca |
+| 23 | ⚡️ World Power Usage ⚡️ - time series forecasting with SVR | [Time series](7-TimeSeries/README.md) | Time series forecasting with Support Vector Regressor | [Python](7-TimeSeries/3-SVR/README.md) | Anirban |
+| 24 | Introduction to reinforcement learning | [Reinforcement learning](8-Reinforcement/README.md) | Introduction to reinforcement learning with Q-Learning | [Python](8-Reinforcement/1-QLearning/README.md) | Dmitry |
+| 25 | Help Peter avoid the wolf! 🐺 | [Reinforcement learning](8-Reinforcement/README.md) | Reinforcement learning Gym | [Python](8-Reinforcement/2-Gym/README.md) | Dmitry |
+| Postscript | Real-world ML scenarios and applications | [ML in the Wild](9-Real-World/README.md) | Interesting and revealing real-world applications of classical ML | [Lesson](9-Real-World/1-Applications/README.md) | Team |
+| Postscript | Model debugging in ML using the RAI dashboard | [ML in the Wild](9-Real-World/README.md) | Model debugging in machine learning using Responsible AI dashboard components | [Lesson](9-Real-World/2-Debugging-ML-Models/README.md) | Ruth Yakubu |
+
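+> [Find all additional resources for this course in our Microsoft Learn collection](https://learn.microsoft.com/en-us/collections/qrqzamz1nn2wx3?WT.mc_id=academic-77952-bethanycheum)
+
+## Offline access
+
+You can run this documentation offline by using [Docsify](https://docsify.js.org/#/). Fork this repo, [install Docsify](https://docsify.js.org/#/quickstart) on your local machine, and then type `docsify serve` in the root folder of this repo. The website will be served on port 3000 on your localhost: `localhost:3000`.
+
+## PDFs
+
+Find a PDF of the curriculum with links [here](https://microsoft.github.io/ML-For-Beginners/pdf/readme.pdf).
+
+## 🎒 Other Courses
+
+Our team produces other courses! Check out: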
+
+
+### LangChain
+[](https://aka.ms/langchain4j-for-beginners)
+[](https://aka.ms/langchainjs-for-beginners?WT.mc_id=m365-94501-dwahlin)
+
+---
+
+### Azure / Edge / MCP / Agents
+[](https://github.com/microsoft/AZD-for-beginners?WT.mc_id=academic-105485-koreyst)
+[](https://github.com/microsoft/edgeai-for-beginners?WT.mc_id=academic-105485-koreyst)
+[](https://github.com/microsoft/mcp-for-beginners?WT.mc_id=academic-105485-koreyst)
+[](https://github.com/microsoft/ai-agents-for-beginners?WT.mc_id=academic-105485-koreyst)
+
+---
+
+### Generative AI Series
+[](https://github.com/microsoft/generative-ai-for-beginners?WT.mc_id=academic-105485-koreyst)
+[Generative AI for Beginners (.NET)](https://github.com/microsoft/Generative-AI-for-beginners-dotnet?WT.mc_id=academic-105485-koreyst)
+[Generative AI for Beginners (Java)](https://github.com/microsoft/generative-ai-for-beginners-java?WT.mc_id=academic-105485-koreyst)
+[Generative AI with JavaScript](https://github.com/microsoft/generative-ai-with-javascript?WT.mc_id=academic-105485-koreyst)
+
+---
+
+### 核心学习
+[](https://aka.ms/ml-beginners?WT.mc_id=academic-105485-koreyst)
+[](https://aka.ms/datascience-beginners?WT.mc_id=academic-105485-koreyst)
+[](https://aka.ms/ai-beginners?WT.mc_id=academic-105485-koreyst)
+[](https://github.com/microsoft/Security-101?WT.mc_id=academic-96948-sayoung)
+[](https://aka.ms/webdev-beginners?WT.mc_id=academic-105485-koreyst)
+[](https://aka.ms/iot-beginners?WT.mc_id=academic-105485-koreyst)
+[](https://github.com/microsoft/xr-development-for-beginners?WT.mc_id=academic-105485-koreyst)
+
+---
+
+### Copilot 系列
+[](https://aka.ms/GitHubCopilotAI?WT.mc_id=academic-105485-koreyst)
+[](https://github.com/microsoft/mastering-github-copilot-for-dotnet-csharp-developers?WT.mc_id=academic-105485-koreyst)
+[](https://github.com/microsoft/CopilotAdventures?WT.mc_id=academic-105485-koreyst)
+
+
+## 获取帮助
+
+如果您遇到困难或对构建 AI 应用程序有任何疑问,欢迎加入其他学习者和经验丰富的开发者,一起参与有关 MCP 的讨论。这是一个互助的社区,欢迎提问,也鼓励自由分享知识。
+
+[](https://discord.gg/nTYy5BXMWG)
+
+如果您在构建过程中有产品反馈或遇到错误,请访问:
+
+[](https://aka.ms/foundry/forum)
+
+---
+
+
+**免责声明**:
+本文件由 AI 翻译服务 [Co-op Translator](https://github.com/Azure/co-op-translator) 翻译。尽管我们力求准确,但请注意自动翻译可能包含错误或不准确之处。原始语言版本的文件应被视为权威来源。对于重要信息,建议采用专业人工翻译。我们不对因使用本翻译而引起的任何误解或误释承担任何责任。
+
\ No newline at end of file
diff --git a/translations/zh-CN/SECURITY.md b/translations/zh-CN/SECURITY.md
new file mode 100644
index 000000000..592a633a1
--- /dev/null
+++ b/translations/zh-CN/SECURITY.md
@@ -0,0 +1,42 @@
+## 安全性
+
+微软非常重视我们软件产品和服务的安全性,这包括通过我们的 GitHub 组织管理的所有源代码库,这些组织包括 [Microsoft](https://github.com/Microsoft)、[Azure](https://github.com/Azure)、[DotNet](https://github.com/dotnet)、[AspNet](https://github.com/aspnet)、[Xamarin](https://github.com/xamarin) 以及 [我们的 GitHub 组织](https://opensource.microsoft.com/)。
+
+如果您认为在任何微软拥有的代码库中发现了符合 [微软安全漏洞定义](https://docs.microsoft.com/previous-versions/tn-archive/cc751383(v=technet.10)?WT.mc_id=academic-77952-leestott) 的安全漏洞,请按照以下描述向我们报告。
+
+## 报告安全问题
+
+**请不要通过公开的 GitHub 问题报告安全漏洞。**
+
+相反,请通过微软安全响应中心 (MSRC) 报告,网址为 [https://msrc.microsoft.com/create-report](https://msrc.microsoft.com/create-report)。
+
+如果您希望在不登录的情况下提交报告,可以发送电子邮件至 [secure@microsoft.com](mailto:secure@microsoft.com)。如果可能,请使用我们的 PGP 密钥加密您的消息;您可以从 [微软安全响应中心 PGP 密钥页面](https://www.microsoft.com/en-us/msrc/pgp-key-msrc) 下载密钥。
+
+您应该会在 24 小时内收到回复。如果由于某种原因未收到回复,请通过电子邮件进行跟进,以确保我们收到了您的原始消息。更多信息可以在 [microsoft.com/msrc](https://www.microsoft.com/msrc) 找到。
+
+请尽可能提供以下所需信息,以帮助我们更好地理解问题的性质和范围:
+
+ * 问题类型(例如缓冲区溢出、SQL 注入、跨站脚本攻击等)
+ * 与问题表现相关的源文件的完整路径
+ * 受影响源代码的位置(标签/分支/提交或直接 URL)
+ * 重现问题所需的任何特殊配置
+ * 重现问题的逐步说明
+ * 概念验证或漏洞利用代码(如果可能)
+ * 问题的影响,包括攻击者可能如何利用该问题
+
+这些信息将帮助我们更快地处理您的报告。
+
+如果您是为漏洞赏金计划报告问题,更完整的报告可能会获得更高的赏金奖励。请访问我们的 [微软漏洞赏金计划](https://microsoft.com/msrc/bounty) 页面,了解有关我们当前计划的更多详情。
+
+## 首选语言
+
+我们希望所有交流均使用英语。
+
+## 政策
+
+微软遵循 [协调漏洞披露](https://www.microsoft.com/en-us/msrc/cvd) 原则。
+
+---
+
+**免责声明**:
+本文档使用AI翻译服务 [Co-op Translator](https://github.com/Azure/co-op-translator) 进行翻译。尽管我们努力确保翻译的准确性,但请注意,自动翻译可能包含错误或不准确之处。原始语言的文档应被视为权威来源。对于重要信息,建议使用专业人工翻译。我们不对因使用此翻译而产生的任何误解或误读承担责任。
\ No newline at end of file
diff --git a/translations/zh-CN/SUPPORT.md b/translations/zh-CN/SUPPORT.md
new file mode 100644
index 000000000..ed355fe39
--- /dev/null
+++ b/translations/zh-CN/SUPPORT.md
@@ -0,0 +1,20 @@
+# 支持
+## 如何提交问题并获得帮助
+
+在提交问题之前,请先查看我们的 [故障排除指南](TROUBLESHOOTING.md),以解决安装、设置和运行课程时的常见问题。
+
+此项目使用 GitHub Issues 来跟踪错误和功能请求。在提交新问题之前,请先搜索现有问题以避免重复。对于新问题,请将您的错误或功能请求作为新问题提交。
+
+如果您需要帮助或对使用此项目有疑问,也可以:
+- 查看 [故障排除指南](TROUBLESHOOTING.md)
+- 访问我们的 [Discord 讨论 #ml-for-beginners 频道](https://aka.ms/foundry/discord)
+- 提交问题
+
+## Microsoft 支持政策
+
+对该存储库的支持仅限于上述资源。
+
+---
+
+**免责声明**:
+本文档使用AI翻译服务 [Co-op Translator](https://github.com/Azure/co-op-translator) 进行翻译。尽管我们努力确保翻译的准确性,但请注意,自动翻译可能包含错误或不准确之处。原始语言的文档应被视为权威来源。对于关键信息,建议使用专业人工翻译。我们不对因使用此翻译而产生的任何误解或误读承担责任。
\ No newline at end of file
diff --git a/translations/zh-CN/TROUBLESHOOTING.md b/translations/zh-CN/TROUBLESHOOTING.md
new file mode 100644
index 000000000..59a4f7ee7
--- /dev/null
+++ b/translations/zh-CN/TROUBLESHOOTING.md
@@ -0,0 +1,601 @@
+# 故障排查指南
+
+本指南帮助您解决使用《机器学习初学者》课程时常见的问题。如果您在这里找不到解决方案,请查看我们的[Discord讨论](https://aka.ms/foundry/discord)或[提交问题](https://github.com/microsoft/ML-For-Beginners/issues)。
+
+## 目录
+
+- [安装问题](#安装问题)
+- [Jupyter Notebook问题](#jupyter-notebook问题)
+- [Python包问题](#python包问题)
+- [R环境问题](#r环境问题)
+- [测验应用问题](#测验应用问题)
+- [数据和文件路径问题](#数据和文件路径问题)
+- [常见错误信息](#常见错误信息)
+- [性能问题](#性能问题)
+- [环境和配置](#环境和配置)
+
+---
+
+## 安装问题
+
+### Python安装
+
+**问题**:`python: command not found`
+
+**解决方案**:
+1. 从[python.org](https://www.python.org/downloads/)安装Python 3.8或更高版本
+2. 验证安装:`python --version`或`python3 --version`
+3. 在macOS/Linux上,可能需要使用`python3`而不是`python`
+
+**问题**:多个Python版本导致冲突
+
+**解决方案**:
+```bash
+# Use virtual environments to isolate projects
+python -m venv ml-env
+
+# Activate virtual environment
+# On Windows:
+ml-env\Scripts\activate
+# On macOS/Linux:
+source ml-env/bin/activate
+```
+
+### Jupyter安装
+
+**问题**:`jupyter: command not found`
+
+**解决方案**:
+```bash
+# Install Jupyter
+pip install jupyter
+
+# Or with pip3
+pip3 install jupyter
+
+# Verify installation
+jupyter --version
+```
+
+**问题**:Jupyter无法在浏览器中启动
+
+**解决方案**:
+```bash
+# Try specifying the browser
+jupyter notebook --browser=chrome
+
+# Or copy the URL with token from terminal and paste in browser manually
+# Look for: http://localhost:8888/?token=...
+```
+
+### R安装
+
+**问题**:R包无法安装
+
+**解决方案**:
+```r
+# Ensure you have the latest R version
+# Install packages with dependencies
+install.packages(c("tidyverse", "tidymodels", "caret"), dependencies = TRUE)
+
+# If compilation fails, try installing binary versions
+install.packages("package-name", type = "binary")
+```
+
+**问题**:IRkernel在Jupyter中不可用
+
+**解决方案**:
+```r
+# In R console
+install.packages('IRkernel')
+IRkernel::installspec(user = TRUE)
+```
+
+---
+
+## Jupyter Notebook问题
+
+### 内核问题
+
+**问题**:内核不断崩溃或重启
+
+**解决方案**:
+1. 重启内核:`Kernel → Restart`
+2. 清除输出并重启:`Kernel → Restart & Clear Output`
+3. 检查内存问题(参见[性能问题](#性能问题))
+4. 尝试逐个运行单元格以识别问题代码
+
+**问题**:选择了错误的Python内核
+
+**解决方案**:
+1. 检查当前内核:`Kernel → Change Kernel`
+2. 选择正确的Python版本
+3. 如果内核缺失,请创建:
+```bash
+python -m ipykernel install --user --name=ml-env
+```
+
+**问题**:内核无法启动
+
+**解决方案**:
+```bash
+# Reinstall ipykernel
+pip uninstall ipykernel
+pip install ipykernel
+
+# Register the kernel again
+python -m ipykernel install --user
+```
+
+### Notebook单元格问题
+
+**问题**:单元格正在运行但不显示输出
+
+**解决方案**:
+1. 检查单元格是否仍在运行(查看`[*]`指示器)
+2. 重启内核并运行所有单元格:`Kernel → Restart & Run All`
+3. 检查浏览器控制台是否有JavaScript错误(按F12)
+
+**问题**:无法运行单元格——点击“运行”无响应
+
+**解决方案**:
+1. 检查Jupyter服务器是否仍在终端中运行
+2. 刷新浏览器页面
+3. 关闭并重新打开Notebook
+4. 重启Jupyter服务器
+
+---
+
+## Python包问题
+
+### 导入错误
+
+**问题**:`ModuleNotFoundError: No module named 'sklearn'`
+
+**解决方案**:
+```bash
+pip install scikit-learn
+
+# Common ML packages for this course
+pip install scikit-learn pandas numpy matplotlib seaborn
+```
+
+**问题**:`ImportError: cannot import name 'X' from 'sklearn'`
+
+**解决方案**:
+```bash
+# Update scikit-learn to latest version
+pip install --upgrade scikit-learn
+
+# Check version
+python -c "import sklearn; print(sklearn.__version__)"
+```
+
+### 版本冲突
+
+**问题**:包版本不兼容错误
+
+**解决方案**:
+```bash
+# Create a new virtual environment
+python -m venv fresh-env
+source fresh-env/bin/activate # or fresh-env\Scripts\activate on Windows
+
+# Install packages fresh
+pip install jupyter scikit-learn pandas numpy matplotlib seaborn
+
+# If specific version needed
+pip install scikit-learn==1.3.0
+```
+
+**问题**:`pip install`因权限错误失败
+
+**解决方案**:
+```bash
+# Install for current user only
+pip install --user package-name
+
+# Or use virtual environment (recommended)
+python -m venv venv
+source venv/bin/activate
+pip install package-name
+```
+
+### 数据加载问题
+
+**问题**:加载CSV文件时出现`FileNotFoundError`
+
+**解决方案**:
+```python
+import os
+import pandas as pd
+# Check current working directory
+print(os.getcwd())
+
+# Use relative paths from notebook location
+df = pd.read_csv('../../data/filename.csv')
+
+# Or use absolute paths
+df = pd.read_csv('/full/path/to/data/filename.csv')
+```
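
如果不确定数据文件相对于 Notebook 的位置,可以先在几个候选目录中查找,再把找到的路径交给 `pd.read_csv`。下面只是一个最简示意:函数名 `resolve_data_path` 和候选目录列表都是为了说明而假设的,并非课程代码的一部分,请按您的实际目录结构调整。

```python
from pathlib import Path


def resolve_data_path(filename, search_dirs=(".", "data", "../data", "../../data")):
    """Return the first existing path to `filename`, or None if it is not found.

    `search_dirs` is a hypothetical list of candidate folders, resolved
    relative to the current working directory (the notebook's folder).
    """
    for directory in search_dirs:
        candidate = Path(directory) / filename
        if candidate.is_file():
            return candidate.resolve()
    return None


path = resolve_data_path("filename.csv")
if path is None:
    # Printing the working directory usually reveals why a relative path failed
    print("Not found; current working directory is", Path.cwd())
else:
    print("Found data file at", path)
```

找到文件后,把返回的绝对路径传给 `pd.read_csv(path)` 即可;如果返回 `None`,打印出的工作目录通常能说明相对路径失效的原因。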
+
+---
+
+## R环境问题
+
+### 包安装
+
+**问题**:包安装因编译错误失败
+
+**解决方案**:
+```r
+# Install binary version (Windows/macOS)
+install.packages("package-name", type = "binary")
+
+# Update R to latest version if packages require it
+# Check R version
+R.version.string
+
+# Install system dependencies (Linux)
+# For Ubuntu/Debian, in terminal:
+# sudo apt-get install r-base-dev
+```
+
+**问题**:`tidyverse`无法安装
+
+**解决方案**:
+```r
+# Install dependencies first
+install.packages(c("rlang", "vctrs", "pillar"))
+
+# Then install tidyverse
+install.packages("tidyverse")
+
+# Or install components individually
+install.packages(c("dplyr", "ggplot2", "tidyr", "readr"))
+```
+
+### RMarkdown问题
+
+**问题**:RMarkdown无法渲染
+
+**解决方案**:
+```r
+# Install/update rmarkdown
+install.packages("rmarkdown")
+
+# Install pandoc if needed
+install.packages("pandoc")
+
+# For PDF output, install tinytex
+install.packages("tinytex")
+tinytex::install_tinytex()
+```
+
+---
+
+## 测验应用问题
+
+### 构建和安装
+
+**问题**:`npm install`失败
+
+**解决方案**:
+```bash
+# Clear npm cache
+npm cache clean --force
+
+# Remove node_modules and package-lock.json
+rm -rf node_modules package-lock.json
+
+# Reinstall
+npm install
+
+# If still fails, try with legacy peer deps
+npm install --legacy-peer-deps
+```
+
+**问题**:端口8080已被占用
+
+**解决方案**:
+```bash
+# Use different port
+npm run serve -- --port 8081
+
+# Or find and kill process using port 8080
+# On Linux/macOS:
+lsof -ti:8080 | xargs kill -9
+
+# On Windows:
+netstat -ano | findstr :8080
+# Replace <PID> with the process ID shown in the last column of the netstat output
+taskkill /PID <PID> /F
+```
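
在换端口之前,也可以先确认哪个端口实际可用。下面是一个跨平台的简单示意,只用到 Python 标准库的 `socket`;`is_port_free` 是为说明而假设的辅助函数,端口范围 8080 到 8089 也只是示例。

```python
import socket


def is_port_free(port, host="127.0.0.1"):
    """Return True if a TCP socket can be bound to (host, port) right now."""
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as s:
        try:
            s.bind((host, port))
        except OSError:
            # Typically "address already in use"
            return False
    return True


# Pick the first free port in a small, arbitrary range
port = next((p for p in range(8080, 8090) if is_port_free(p)), None)
print("First free port:", port)
```

得到可用端口后,可以把它传给上面的 `npm run serve -- --port` 命令。注意这只是检查那一刻的状态,端口之后仍可能被其他进程占用。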
+
+### 构建错误
+
+**问题**:`npm run build`失败
+
+**解决方案**:
+```bash
+# Check Node.js version (should be 14+)
+node --version
+
+# Update Node.js if needed
+# Then clean install
+rm -rf node_modules package-lock.json
+npm install
+npm run build
+```
+
+**问题**:Linting错误阻止构建
+
+**解决方案**:
+```bash
+# Fix auto-fixable issues
+npm run lint -- --fix
+
+# Or temporarily disable linting in build
+# (not recommended for production)
+```
+
+---
+
+## 数据和文件路径问题
+
+### 路径问题
+
+**问题**:运行Notebook时找不到数据文件
+
+**解决方案**:
+1. **始终从包含Notebook的目录运行**
+ ```bash
+ cd /path/to/lesson/folder
+ jupyter notebook
+ ```
+
+2. **检查代码中的相对路径**
+ ```python
+ # Correct path from notebook location
+ df = pd.read_csv('../data/filename.csv')
+
+ # Not from your terminal location
+ ```
+
+3. **必要时使用绝对路径**
+ ```python
+ from pathlib import Path
+ # __file__ is not defined inside a notebook, so build the path from the working directory
+ base_path = Path.cwd()
+ data_path = base_path / 'data' / 'filename.csv'
+ ```
+
+### 数据文件丢失
+
+**问题**:数据集文件丢失
+
+**解决方案**:
+1. 检查数据是否应该在仓库中——大多数数据集都已包含
+2. 某些课程可能需要下载数据——请查看课程README
+3. 确保您已拉取最新的更改:
+ ```bash
+ git pull origin main
+ ```
+
+---
+
+## 常见错误信息
+
+### 内存错误
+
+**错误**:处理数据时出现`MemoryError`或内核崩溃
+
+**解决方案**:
+```python
+# Load data in chunks
+for chunk in pd.read_csv('large_file.csv', chunksize=10000):
+ process(chunk)
+
+# Or read only needed columns
+df = pd.read_csv('file.csv', usecols=['col1', 'col2'])
+
+# Free memory when done
+del large_dataframe
+import gc
+gc.collect()
+```
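
上面代码中的 `process(chunk)` 只是占位符。分块处理的关键是只保留累计结果,而不保留数据本身。下面的示意用标准库 `csv` 演示同样的思路(不依赖 pandas):逐块读取并累加,最后得到某一列的平均值。`iter_chunks` 和 `column_mean` 都是为说明而假设的函数名,并假设文件为 UTF-8 编码且该列均为数值。

```python
import csv
from itertools import islice


def iter_chunks(path, chunksize=10000):
    """Yield lists of row dicts, at most `chunksize` rows at a time."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        while True:
            chunk = list(islice(reader, chunksize))
            if not chunk:
                break
            yield chunk


def column_mean(path, column, chunksize=10000):
    """Compute the mean of a numeric column without loading the whole file."""
    total, count = 0.0, 0
    for chunk in iter_chunks(path, chunksize):
        # Only the running sum and row count stay in memory between chunks
        for row in chunk:
            total += float(row[column])
            count += 1
    return total / count if count else None


# Example call (hypothetical file and column names):
# print(column_mean("large_file.csv", "price"))
```

使用 pandas 的 `chunksize` 时做法相同:在循环内更新累计量,而不是把每个块都追加到列表中。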
+
+### 收敛警告
+
+**警告**:`ConvergenceWarning: Maximum number of iterations reached`
+
+**解决方案**:
+```python
+from sklearn.linear_model import LogisticRegression
+
+# Increase max iterations
+model = LogisticRegression(max_iter=1000)
+
+# Or scale your features first
+from sklearn.preprocessing import StandardScaler
+scaler = StandardScaler()
+X_scaled = scaler.fit_transform(X)
+```
+
+### 绘图问题
+
+**问题**:Jupyter中不显示图表
+
+**解决方案**:
+```python
+# Enable inline plotting
+%matplotlib inline
+
+# Import pyplot
+import matplotlib.pyplot as plt
+
+# Show plot explicitly
+plt.plot(data)
+plt.show()
+```
+
+**问题**:Seaborn图表显示异常或报错
+
+**解决方案**:
+```python
+import warnings
+warnings.filterwarnings('ignore', category=UserWarning)
+
+# Update to compatible version
+# pip install --upgrade seaborn matplotlib
+```
+
+### Unicode/编码错误
+
+**问题**:读取文件时出现`UnicodeDecodeError`
+
+**解决方案**:
+```python
+# Specify encoding explicitly
+df = pd.read_csv('file.csv', encoding='utf-8')
+
+# Or try different encoding
+df = pd.read_csv('file.csv', encoding='latin-1')
+
+# Or skip undecodable bytes (the encoding_errors parameter requires pandas 1.3 or later)
+df = pd.read_csv('file.csv', encoding='utf-8', encoding_errors='ignore')
+```
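
如果不知道文件的实际编码,可以按顺序尝试几种候选编码,并使用第一个能完整解码的编码。以下是一个仅使用标准库的示意;`detect_encoding` 和候选列表都是为说明而假设的。请注意 `latin-1` 能解码任意字节,因此它只能作为最后的兜底选项,解码成功并不代表文字内容一定正确。

```python
def detect_encoding(path, candidates=("utf-8", "latin-1")):
    """Return the first candidate encoding that decodes the whole file, else None."""
    for enc in candidates:
        try:
            with open(path, encoding=enc) as f:
                f.read()
            return enc
        except UnicodeDecodeError:
            # This encoding cannot decode the file; try the next candidate
            continue
    return None


# Example call (hypothetical file name):
# enc = detect_encoding("file.csv")
# df = pd.read_csv("file.csv", encoding=enc)
```

该函数会读取整个文件,对于非常大的文件,可以改为只读取开头的一部分再判断。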
+
+---
+
+## 性能问题
+
+### Notebook执行缓慢
+
+**问题**:Notebook运行速度非常慢
+
+**解决方案**:
+1. **重启内核释放内存**:`Kernel → Restart`
+2. **关闭未使用的Notebook**以释放资源
+3. **使用较小的数据样本进行测试**:
+ ```python
+ # Work with subset during development
+ df_sample = df.sample(n=1000)
+ ```
+4. **分析代码性能**以找到瓶颈:
+ ```python
+ %time operation() # Time single operation
+ %timeit operation() # Time with multiple runs
+ ```
+
+### 高内存使用
+
+**问题**:系统内存不足
+
+**解决方案**:
+```python
+# Check memory usage
+df.info(memory_usage='deep')
+
+# Optimize data types
+df['column'] = df['column'].astype('int32') # Instead of int64
+
+# Drop unnecessary columns
+df = df[['col1', 'col2']] # Keep only needed columns
+
+# Process in batches
+for batch in np.array_split(df, 10):
+ process(batch)
+```
+
+---
+
+## 环境和配置
+
+### 虚拟环境问题
+
+**问题**:虚拟环境未激活
+
+**解决方案**:
+```bash
+# Windows
+python -m venv venv
+venv\Scripts\activate.bat
+
+# macOS/Linux
+python3 -m venv venv
+source venv/bin/activate
+
+# Check if activated (should show venv name in prompt)
+which python # Should point to venv python
+```
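
`which python` 在 Windows 的命令提示符中不可用。作为跨平台的替代办法,可以直接让 Python 报告自己是否运行在虚拟环境中。下面是一个简单示意,`in_virtualenv` 是为说明而假设的函数名;它适用于 `venv` 创建的环境,不能识别 conda 环境。

```python
import sys


def in_virtualenv():
    """Return True when the interpreter runs inside a venv-style virtual environment."""
    # Inside a venv, sys.prefix points at the environment,
    # while sys.base_prefix still points at the base installation.
    return sys.prefix != getattr(sys, "base_prefix", sys.prefix)


print("Interpreter:", sys.executable)
print("Inside virtual environment:", in_virtualenv())
```

如果输出为 `False`,说明当前终端尚未激活虚拟环境,请重新执行上面的激活命令。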
+
+**问题**:包已安装但在Notebook中找不到
+
+**解决方案**:
+```bash
+# Ensure notebook uses the correct kernel
+# Install ipykernel in your venv
+pip install ipykernel
+python -m ipykernel install --user --name=ml-env --display-name="Python (ml-env)"
+
+# In Jupyter: Kernel → Change Kernel → Python (ml-env)
+```
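
要确认问题是否出在内核选择上,可以在 Notebook 单元格中查看当前内核使用的解释器路径,并检查哪些包在该解释器下无法导入。下面是一个示意,`missing_packages` 是为说明而假设的辅助函数;传入的是导入名(例如 `sklearn`),而不是 pip 包名(`scikit-learn`)。

```python
import sys
from importlib.util import find_spec


def missing_packages(import_names):
    """Return the import names that cannot be found by the current interpreter."""
    return [name for name in import_names if find_spec(name) is None]


# The path shows which environment the running kernel actually uses
print("Kernel interpreter:", sys.executable)
print("Missing:", missing_packages(["sklearn", "pandas", "numpy", "matplotlib", "seaborn"]))
```

如果解释器路径不在您的虚拟环境目录下,说明 Notebook 使用了其他内核,请按上面的步骤切换。也可以在单元格中运行 `%pip install 包名`,把包安装到当前内核所用的环境中。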
+
+### Git问题
+
+**问题**:无法拉取最新更改——出现合并冲突
+
+**解决方案**:
+```bash
+# Stash your changes
+git stash
+
+# Pull latest
+git pull origin main
+
+# Reapply your changes
+git stash pop
+
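+# If conflicts, resolve manually, or pick one side.
+# During `git stash pop`, "ours" is the freshly pulled branch and "theirs" is your stash:
+git checkout --ours path/to/file # Take the pulled (remote) version
+git checkout --theirs path/to/file # Keep your stashed version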
+```
+
+### VS Code集成
+
+**问题**:Jupyter Notebook无法在VS Code中打开
+
+**解决方案**:
+1. 在VS Code中安装Python扩展
+2. 在VS Code中安装Jupyter扩展
+3. 选择正确的Python解释器:`Ctrl+Shift+P` → "Python: Select Interpreter"
+4. 重启VS Code
+
+---
+
+## 其他资源
+
+- **Discord讨论**:[在#ml-for-beginners频道提问并分享解决方案](https://aka.ms/foundry/discord)
+- **Microsoft Learn**:[机器学习初学者模块](https://learn.microsoft.com/en-us/collections/qrqzamz1nn2wx3?WT.mc_id=academic-77952-bethanycheum)
+- **视频教程**:[YouTube播放列表](https://aka.ms/ml-beginners-videos)
+- **问题追踪器**:[报告错误](https://github.com/microsoft/ML-For-Beginners/issues)
+
+---
+
+## 仍有问题?
+
+如果您尝试了上述解决方案但仍然遇到问题:
+
+1. **搜索现有问题**:[GitHub Issues](https://github.com/microsoft/ML-For-Beginners/issues)
+2. **查看Discord讨论**:[Discord Discussions](https://aka.ms/foundry/discord)
+3. **提交新问题**:包括以下内容:
+ - 您的操作系统及版本
+ - Python/R版本
+ - 错误信息(完整回溯)
+ - 重现问题的步骤
+ - 您已尝试的解决方法
+
+我们随时为您提供帮助!🚀
+
+---
+
+**免责声明**:
+本文档使用AI翻译服务 [Co-op Translator](https://github.com/Azure/co-op-translator) 进行翻译。尽管我们努力确保翻译的准确性,但请注意,自动翻译可能包含错误或不准确之处。原始语言的文档应被视为权威来源。对于关键信息,建议使用专业人工翻译。我们不对因使用此翻译而产生的任何误解或误读承担责任。
\ No newline at end of file
diff --git a/translations/zh-CN/docs/_sidebar.md b/translations/zh-CN/docs/_sidebar.md
new file mode 100644
index 000000000..d912a1dfd
--- /dev/null
+++ b/translations/zh-CN/docs/_sidebar.md
@@ -0,0 +1,48 @@
+- 简介
+ - [机器学习简介](../1-Introduction/1-intro-to-ML/README.md)
+ - [机器学习的历史](../1-Introduction/2-history-of-ML/README.md)
+ - [机器学习与公平性](../1-Introduction/3-fairness/README.md)
+ - [机器学习的技术](../1-Introduction/4-techniques-of-ML/README.md)
+
+- 回归
+ - [实用工具](../2-Regression/1-Tools/README.md)
+ - [数据](../2-Regression/2-Data/README.md)
+ - [线性回归](../2-Regression/3-Linear/README.md)
+ - [逻辑回归](../2-Regression/4-Logistic/README.md)
+
+- 构建一个网页应用
+ - [网页应用](../3-Web-App/1-Web-App/README.md)
+
+- 分类
+ - [分类简介](../4-Classification/1-Introduction/README.md)
+ - [分类器 1](../4-Classification/2-Classifiers-1/README.md)
+ - [分类器 2](../4-Classification/3-Classifiers-2/README.md)
+ - [应用机器学习](../4-Classification/4-Applied/README.md)
+
+- 聚类
+ - [数据可视化](../5-Clustering/1-Visualize/README.md)
+ - [K-Means](../5-Clustering/2-K-Means/README.md)
+
+- 自然语言处理
+ - [自然语言处理简介](../6-NLP/1-Introduction-to-NLP/README.md)
+ - [自然语言处理任务](../6-NLP/2-Tasks/README.md)
+ - [翻译与情感分析](../6-NLP/3-Translation-Sentiment/README.md)
+ - [酒店评论 1](../6-NLP/4-Hotel-Reviews-1/README.md)
+ - [酒店评论 2](../6-NLP/5-Hotel-Reviews-2/README.md)
+
+- 时间序列预测
+ - [时间序列预测简介](../7-TimeSeries/1-Introduction/README.md)
+ - [ARIMA](../7-TimeSeries/2-ARIMA/README.md)
+ - [SVR](../7-TimeSeries/3-SVR/README.md)
+
+- 强化学习
+ - [Q-Learning](../8-Reinforcement/1-QLearning/README.md)
+ - [Gym](../8-Reinforcement/2-Gym/README.md)
+
+- 真实世界中的机器学习
+ - [应用](../9-Real-World/1-Applications/README.md)
+
+---
+
+**免责声明**:
+本文档使用AI翻译服务 [Co-op Translator](https://github.com/Azure/co-op-translator) 进行翻译。尽管我们努力确保翻译的准确性,但请注意,自动翻译可能包含错误或不准确之处。应以原始语言的文档作为权威来源。对于关键信息,建议使用专业人工翻译。我们不对因使用此翻译而产生的任何误解或误读承担责任。
\ No newline at end of file
diff --git a/translations/zh-CN/for-teachers.md b/translations/zh-CN/for-teachers.md
new file mode 100644
index 000000000..7792f4c1d
--- /dev/null
+++ b/translations/zh-CN/for-teachers.md
@@ -0,0 +1,28 @@
+## 给教育工作者
+
+您想在课堂上使用这套课程吗?请随意使用!
+
+事实上,您可以直接在 GitHub 上使用它,通过 GitHub Classroom 来实现。
+
+为此,您需要 fork 此仓库。您需要为每节课创建一个单独的仓库,因此需要将每个文件夹提取到一个独立的仓库中。这样,[GitHub Classroom](https://classroom.github.com/classrooms) 就可以单独识别每节课。
+
+这些[完整的说明](https://github.blog/2020-03-18-set-up-your-digital-classroom-with-github-classroom/)可以帮助您了解如何设置您的课堂。
+
+## 按现有形式使用仓库
+
+如果您希望按当前形式使用此仓库,而不使用 GitHub Classroom,也完全可以实现。您需要与您的学生沟通,共同决定要学习的课程。
+
+在在线教学环境中(如 Zoom、Teams 或其他平台),您可以为测验创建分组讨论室,并指导学生做好学习准备。然后邀请学生参加测验,并在规定时间内以“问题”的形式提交答案。如果您希望学生公开协作完成作业,也可以采用类似的方式。
+
+如果您更倾向于私密的教学方式,可以让学生逐课 fork 课程到他们自己的 GitHub 私有仓库,并授予您访问权限。这样,他们可以私下完成测验和作业,并通过您课堂仓库中的问题提交给您。
+
+在在线课堂环境中,有很多方法可以让这套课程发挥作用。请告诉我们哪种方式最适合您!
+
+## 请告诉我们您的想法!
+
+我们希望这套课程能够满足您和您学生的需求。请通过[反馈](https://forms.microsoft.com/Pages/ResponsePage.aspx?id=v4j5cvGGr0GRqy180BHbR2humCsRZhxNuI79cm6n0hRUQzRVVU9VVlU5UlFLWTRLWlkyQUxORTg5WS4u)告诉我们您的意见!
+
+---
+
+**免责声明**:
+本文档使用AI翻译服务 [Co-op Translator](https://github.com/Azure/co-op-translator) 进行翻译。尽管我们努力确保翻译的准确性,但请注意,自动翻译可能包含错误或不准确之处。原始语言的文档应被视为权威来源。对于关键信息,建议使用专业人工翻译。我们不对因使用此翻译而产生的任何误解或误读承担责任。
\ No newline at end of file
diff --git a/translations/zh-CN/quiz-app/README.md b/translations/zh-CN/quiz-app/README.md
new file mode 100644
index 000000000..faf68f16c
--- /dev/null
+++ b/translations/zh-CN/quiz-app/README.md
@@ -0,0 +1,118 @@
+# 测验
+
+这些测验是 ML 课程(https://aka.ms/ml-beginners)的课前和课后测验。
+
+## 项目设置
+
+```
+npm install
+```
+
+### 编译并热加载用于开发
+
+```
+npm run serve
+```
+
+### 编译并压缩用于生产
+
+```
+npm run build
+```
+
+### 检查并修复文件
+
+```
+npm run lint
+```
+
+### 自定义配置
+
+请参阅 [配置参考](https://cli.vuejs.org/config/)。
+
+致谢:感谢此测验应用的原始版本:https://github.com/arpan45/simple-quiz-vue
+
+## 部署到 Azure
+
+以下是帮助您入门的分步指南:
+
+1. Fork 一个 GitHub 仓库
+确保您的静态 Web 应用代码在您的 GitHub 仓库中。Fork 此仓库。
+
+2. 创建一个 Azure 静态 Web 应用
+- 创建一个 [Azure 账户](http://azure.microsoft.com)
+- 访问 [Azure 门户](https://portal.azure.com)
+- 点击“创建资源”,搜索“静态 Web 应用”。
+- 点击“创建”。
+
+3. 配置静态 Web 应用
+- 基本信息:
+ - 订阅:选择您的 Azure 订阅。
+ - 资源组:创建一个新的资源组或使用现有的资源组。
+ - 名称:为您的静态 Web 应用提供一个名称。
+ - 区域:选择离您的用户最近的区域。
+
+- #### 部署详情:
+ - 来源:选择“GitHub”。
+ - GitHub 账户:授权 Azure 访问您的 GitHub 账户。
+ - 组织:选择您的 GitHub 组织。
+ - 仓库:选择包含静态 Web 应用的仓库。
+ - 分支:选择您希望部署的分支。
+
+- #### 构建详情:
+ - 构建预设:选择您的应用所使用的框架(例如 React、Angular、Vue 等)。
+ - 应用位置:指定包含应用代码的文件夹(例如,如果在根目录则为 /)。
+ - API 位置:如果有 API,请指定其位置(可选)。
+ - 输出位置:指定生成构建输出的文件夹(例如 build 或 dist)。
+
+4. 审核并创建
+审核您的设置并点击“创建”。Azure 将设置必要的资源,并在您的仓库中创建一个 GitHub Actions 工作流。
+
+5. GitHub Actions 工作流
+Azure 会自动在您的仓库中创建一个 GitHub Actions 工作流文件(.github/workflows/azure-static-web-apps-.yml)。此工作流将处理构建和部署过程。
+
+6. 监控部署
+进入 GitHub 仓库中的“Actions”标签页。
+您应该会看到一个工作流正在运行。此工作流将构建并部署您的静态 Web 应用到 Azure。
+工作流完成后,您的应用将上线并可通过提供的 Azure URL 访问。
+
+### 示例工作流文件
+
+以下是 GitHub Actions 工作流文件的示例:
+```
+name: Azure Static Web Apps CI/CD
+on:
+  push:
+    branches:
+      - main
+  pull_request:
+    types: [opened, synchronize, reopened, closed]
+    branches:
+      - main
+
+jobs:
+  build_and_deploy_job:
+    runs-on: ubuntu-latest
+    name: Build and Deploy Job
+    steps:
+      - uses: actions/checkout@v2
+      - name: Build And Deploy
+        id: builddeploy
+        uses: Azure/static-web-apps-deploy@v1
+        with:
+          azure_static_web_apps_api_token: ${{ secrets.AZURE_STATIC_WEB_APPS_API_TOKEN }}
+          repo_token: ${{ secrets.GITHUB_TOKEN }}
+          action: "upload"
+          app_location: "/quiz-app" # App source code path
+          api_location: "" # API source code path - optional
+          output_location: "dist" # Built app content directory - optional
+```
+
+### 其他资源
+- [Azure 静态 Web 应用文档](https://learn.microsoft.com/azure/static-web-apps/getting-started)
+- [GitHub Actions 文档](https://docs.github.com/actions/use-cases-and-examples/deploying/deploying-to-azure-static-web-app)
+
+---
+
+**免责声明**:
+本文档使用AI翻译服务[Co-op Translator](https://github.com/Azure/co-op-translator)进行翻译。尽管我们努力确保翻译的准确性,但请注意,自动翻译可能包含错误或不准确之处。应以原始语言的文档作为权威来源。对于重要信息,建议使用专业人工翻译。我们不对因使用此翻译而产生的任何误解或误读承担责任。
\ No newline at end of file
diff --git a/translations/zh-CN/sketchnotes/LICENSE.md b/translations/zh-CN/sketchnotes/LICENSE.md
new file mode 100644
index 000000000..dde8df6a2
--- /dev/null
+++ b/translations/zh-CN/sketchnotes/LICENSE.md
@@ -0,0 +1,190 @@
+归属-相同方式共享 4.0 国际许可协议
+
+=======================================================================
+
+创作共用组织(Creative Commons)不是律师事务所,也不提供法律服务或法律建议。分发创作共用公共许可协议并不会建立律师与客户或其他关系。创作共用以“现状”形式提供其许可协议及相关信息。创作共用对其许可协议、根据其条款和条件许可的任何材料或相关信息不作任何保证。创作共用在法律允许的最大范围内对因使用其许可协议而导致的损害不承担任何责任。
+
+使用创作共用公共许可协议
+
+创作共用公共许可协议提供了一套标准条款和条件,创作者和其他权利持有人可以使用这些条款和条件来分享原创作品及其他受版权和以下公共许可中规定的某些其他权利约束的材料。以下注意事项仅供参考,并不详尽,也不构成我们许可协议的一部分。
+
+ 对许可人的注意事项:我们的公共许可协议旨在供那些有权向公众授权使用材料的人使用,这些材料的使用方式通常受到版权和某些其他权利的限制。我们的许可协议是不可撤销的。许可人在应用许可协议之前应阅读并理解所选许可协议的条款和条件。许可人还应在应用我们的许可协议之前确保获得所有必要的权利,以便公众能够按照预期重新使用材料。许可人应明确标记任何不受许可协议约束的材料,包括其他创作共用许可的材料,或根据版权的例外或限制使用的材料。更多关于许可人的注意事项:
+ wiki.creativecommons.org/Considerations_for_licensors
+
+ 对公众的注意事项:通过使用我们的公共许可协议,许可人授予公众在指定条款和条件下使用许可材料的权限。如果由于任何原因不需要许可人的授权,例如适用的版权例外或限制,则该使用不受许可协议的约束。我们的许可协议仅授予许可人在版权和某些其他权利范围内有权授予的权限。对许可材料的使用可能仍因其他原因受到限制,包括其他人对材料拥有版权或其他权利。许可人可能会提出特殊要求,例如要求标记或描述所有更改。虽然我们的许可协议不要求这样做,但我们鼓励您在合理的情况下尊重这些要求。更多关于公众的注意事项:
+ wiki.creativecommons.org/Considerations_for_licensees
+
+=======================================================================
+
+创作共用归属-相同方式共享 4.0 国际公共许可协议
+
+通过行使以下定义的许可权利,您接受并同意受创作共用归属-相同方式共享 4.0 国际公共许可协议(“公共许可协议”)条款和条件的约束。如果此公共许可协议可被解释为合同,则您因接受这些条款和条件而获得许可权利,许可人因根据这些条款和条件提供许可材料而获得利益。
+
+第1节——定义。
+
+ a. 改编材料指受版权及类似权利约束的材料,这些材料基于许可材料进行衍生或改编,并且许可材料被翻译、修改、编排、转化或以其他方式更改,需获得许可人持有的版权及类似权利的许可。对于本公共许可协议而言,如果许可材料是音乐作品、表演或录音,则改编材料总是在许可材料与动态影像同步时产生。
+
+ b. 改编者许可指您根据本公共许可协议的条款和条件应用于您对改编材料的贡献的版权及类似权利的许可。
+
+ c. BY-SA 兼容许可指创作共用批准的、与本公共许可协议基本等同的许可,列于 creativecommons.org/compatiblelicenses。
+
+ d. 版权及类似权利指与版权密切相关的权利,包括但不限于表演权、广播权、录音权以及独特数据库权利,无论这些权利如何被标记或分类。对于本公共许可协议而言,第2节(b)(1)-(2)中规定的权利不属于版权及类似权利。
+
+ e. 有效技术措施指在没有适当授权的情况下,根据1996年12月20日通过的《世界知识产权组织版权条约》第11条及/或类似国际协议的法律规定,不得规避的措施。
+
+ f. 例外和限制指适用于您使用许可材料的版权及类似权利的任何例外或限制,例如合理使用、合理交易等。
+
+ g. 许可元素指创作共用公共许可协议名称中列出的许可属性。本公共许可协议的许可元素为归属和相同方式共享。
+
+ h. 许可材料指许可人应用本公共许可协议的艺术或文学作品、数据库或其他材料。
+
+ i. 许可权利指根据本公共许可协议的条款和条件授予您的权利,这些权利仅限于适用于您使用许可材料的所有版权及类似权利,并且许可人有权许可这些权利。
+
+ j. 许可人指根据本公共许可协议授予权利的个人或实体。
+
+ k. 分享指通过任何需要许可权利的方式或过程向公众提供材料,例如复制、公开展示、公开表演、分发、传播、通信或进口,以及以公众可以在其选择的时间和地点访问材料的方式向公众提供材料。
+
+ l. 独特数据库权利指除版权外,根据1996年3月11日欧洲议会和理事会通过的《数据库法律保护指令》(Directive 96/9/EC)及其修订或后续版本,以及全球范围内其他基本等同的权利所产生的权利。
+
+ m. 您指根据本公共许可协议行使许可权利的个人或实体。“您的”具有相应含义。
+
+第2节——范围。
+
+ a. 许可授予。
+
+ 1. 根据本公共许可协议的条款和条件,许可人特此授予您全球范围内的、免版税的、不可转授权的、非独占的、不可撤销的许可权,以行使许可材料中的许可权利:
+
+ a. 复制和分享许可材料,无论是全部还是部分;以及
+
+ b. 生产、复制和分享改编材料。
+
+ 2. 例外和限制。为避免疑义,如果例外和限制适用于您的使用,则本公共许可协议不适用,您无需遵守其条款和条件。
+
+ 3. 期限。本公共许可协议的期限在第6节(a)中规定。
+
+ 4. 媒体和格式;允许技术修改。许可人授权您在现有或未来创建的所有媒体和格式中行使许可权利,并进行必要的技术修改以实现这一点。许可人放弃并/或同意不主张任何权利或权限,以禁止您进行必要的技术修改以行使许可权利,包括必要的技术修改以规避有效技术措施。就本公共许可协议而言,仅进行本第2节(a)(4)所授权的修改绝不会产生改编材料。
+
+ 5. 下游接收者。
+
+ a. 许可人的要约——许可材料。每个许可材料的接收者自动收到许可人的要约,以根据本公共许可协议的条款和条件行使许可权利。
+
+ b. 许可人的额外要约——改编材料。每个从您处接收改编材料的接收者自动收到许可人的要约,以根据您应用的改编者许可的条件行使改编材料中的许可权利。
+
+ c. 无下游限制。您不得对许可材料施加任何额外或不同的条款和条件,也不得应用任何有效技术措施,如果这样做会限制任何接收者行使许可权利。
+
+ 6. 无认可。本公共许可协议中的任何内容均不构成或可被解释为许可您主张或暗示您与许可人或其他指定接收归属的人有联系,或您的使用获得许可人或其他人的认可、支持或官方地位。
+
+ b. 其他权利。
+
+ 1. 道德权利,例如完整性权利,不在本公共许可协议的许可范围内,也不包括宣传权、隐私权和/或其他类似的个性权利;然而,在可能的范围内,许可人放弃并/或同意不主张许可人持有的任何此类权利,以允许您在有限范围内行使许可权利,但不包括其他情况。
+
+ 2. 专利权和商标权不在本公共许可协议的许可范围内。
+
+ 3. 在可能的范围内,许可人放弃任何直接或通过收集机构根据任何自愿或可放弃的法定或强制许可计划向您收取版税的权利。在所有其他情况下,许可人明确保留收取此类版税的权利。
+
+第3节——许可条件。
+
+您行使许可权利明确以以下条件为前提。
+
+ a. 归属。
+
+ 1. 如果您分享许可材料(包括以修改形式),您必须:
+
+ a. 保留以下内容(如果许可人随许可材料提供):
+
+ i. 许可材料创作者及任何其他指定接收归属者的身份信息,以许可人要求的任何合理方式(包括使用化名,如果指定);
+
+ ii. 版权声明;
+
+ iii. 提及本公共许可协议的声明;
+
+ iv. 提及免责声明的声明;
+
+ v. 在合理可行的范围内,许可材料的URI或超链接;
+
+ b. 表明您是否修改了许可材料,并保留任何先前修改的指示;以及
+
+ c. 表明许可材料是根据本公共许可协议授权的,并包括本公共许可协议的文本或URI或超链接。
+
+ 2. 您可以根据您分享许可材料的媒介、方式和上下文,以任何合理方式满足第3节(a)(1)中的条件。例如,可以通过提供URI或超链接到包含所需信息的资源来满足条件。
+
+ 3. 如果许可人要求,您必须在合理可行的范围内移除第3节(a)(1)(A)中要求的任何信息。
+
+ b. 相同方式共享。
+
+ 除第3节(a)中的条件外,如果您分享您制作的改编材料,还需满足以下条件。
+
+ 1. 您应用的改编者许可必须是具有相同许可元素的创作共用许可协议(本版本或更高版本),或BY-SA兼容许可。
+
+ 2. 您必须包括您应用的改编者许可的文本或URI或超链接。您可以根据您分享改编材料的媒介、方式和上下文,以任何合理方式满足此条件。
+
+ 3. 您不得对改编材料施加任何额外或不同的条款和条件,也不得应用任何有效技术措施,这些行为会限制根据您应用的改编者许可授予的权利的行使。
+
+第4节——独特数据库权利。
+
+如果许可权利包括适用于您使用许可材料的独特数据库权利:
+
+ a. 为避免疑义,第2节(a)(1)授予您提取、重用、复制和分享数据库内容全部或实质部分的权利;
+
+ b. 如果您将数据库内容的全部或实质部分包含在您拥有独特数据库权利的数据库中,则您拥有独特数据库权利的该数据库(但不包括其单独内容)属于改编材料,包括用于第3节(b)的目的;以及
+
+ c. 如果您分享数据库内容的全部或实质部分,则必须遵守第3节(a)中的条件。
+
+为避免疑义,本第4节是对您的义务的补充,而不是替代您在本公共许可下的义务,当许可权利包括其他版权和类似权利时。
+
+
+第5节——免责声明和责任限制。
+
+a. 除非许可方另行单独承诺,在可能的范围内,许可方按“现状”和“可用”提供许可材料,并且不对许可材料作出任何形式的陈述或保证,无论是明示、暗示、法定或其他。这包括但不限于所有权保证、适销性、特定用途适用性、非侵权、无潜在或其他缺陷、准确性或是否存在错误(无论是否已知或可发现)。如果法律不允许完全或部分免责声明,则此免责声明可能不适用于您。
+
+b. 在可能的范围内,无论基于何种法律理论(包括但不限于过失)或其他原因,许可方在任何情况下均不对您因本公共许可或使用许可材料而产生的任何直接、特殊、间接、附带、后果性、惩罚性、示范性或其他损失、成本、费用或损害承担责任,即使许可方已被告知可能发生此类损失、成本、费用或损害。如果法律不允许完全或部分责任限制,则此限制可能不适用于您。
+
+c. 上述免责声明和责任限制应以尽可能接近绝对免责声明和放弃所有责任的方式进行解释。
+
+
+第6节——期限和终止。
+
+a. 本公共许可适用于此处许可的版权和类似权利的期限。然而,如果您未能遵守本公共许可,则您在本公共许可下的权利将自动终止。
+
+b. 如果您的使用许可材料的权利根据第6(a)节终止,则该权利可恢复:
+
+1. 如果在您发现违规行为后的30天内纠正违规行为,则自违规行为纠正之日起自动恢复;或
+2. 经许可方明确恢复。
+
+为避免疑义,本第6(b)节不影响许可方因您违反本公共许可而寻求补救的任何权利。
+
+c. 为避免疑义,许可方也可以随时根据单独的条款或条件提供许可材料或停止分发许可材料;然而,这不会终止本公共许可。
+
+d. 第1、5、6、7和8节在本公共许可终止后仍然有效。
+
+
+第7节——其他条款和条件。
+
+a. 除非许可方明确同意,否则许可方不受您传达的任何额外或不同条款或条件的约束。
+
+b. 关于许可材料的任何安排、理解或协议未在此处说明的,均与本公共许可的条款和条件分离且独立。
+
+
+第8节——解释。
+
+a. 为避免疑义,本公共许可不会,也不应被解释为减少、限制、约束或对任何在未获得本公共许可许可的情况下合法使用许可材料的行为施加条件。
+
+b. 在可能的范围内,如果本公共许可的任何条款被认为不可执行,则应自动调整至使其可执行的最低程度。如果该条款无法调整,则应从本公共许可中删除,而不影响其余条款和条件的可执行性。
+
+c. 除非许可方明确同意,否则本公共许可的任何条款或条件均不得被放弃,也不得因未遵守而被视为同意。
+
+d. 本公共许可中的任何内容均不构成或可被解释为对适用于许可方或您的任何特权和豁免的限制或放弃,包括任何司法辖区或权威的法律程序。
+
+
+=======================================================================
+
+Creative Commons不是其公共许可的当事方。然而,Creative Commons可以选择将其公共许可应用于其发布的材料,在这些情况下将被视为“许可方”。Creative Commons公共许可的文本已通过CC0公共领域奉献声明献给公共领域。除用于表明材料是根据Creative Commons公共许可共享或根据Creative Commons政策(发布于creativecommons.org/policies)允许的有限目的外,Creative Commons不授权使用“Creative Commons”商标或任何其他Creative Commons商标或标志,未经其事先书面同意,包括但不限于与任何未经授权修改其公共许可或任何其他关于许可材料使用的安排、理解或协议相关的情况。为避免疑义,本段不构成公共许可的一部分。
+
+您可以通过creativecommons.org联系Creative Commons。
+
+---
+
+**免责声明**:
+本文档使用AI翻译服务[Co-op Translator](https://github.com/Azure/co-op-translator)进行翻译。尽管我们努力确保翻译的准确性,但请注意,自动翻译可能包含错误或不准确之处。原始语言的文档应被视为权威来源。对于关键信息,建议使用专业人工翻译。我们不对因使用此翻译而产生的任何误解或误读承担责任。
\ No newline at end of file
diff --git a/translations/zh-CN/sketchnotes/README.md b/translations/zh-CN/sketchnotes/README.md
new file mode 100644
index 000000000..16a3343be
--- /dev/null
+++ b/translations/zh-CN/sketchnotes/README.md
@@ -0,0 +1,12 @@
+所有课程的手绘笔记可以在这里下载。
+
+🖨 如果需要打印高分辨率版本,可以在 [这个仓库](https://github.com/girliemac/a-picture-is-worth-a-1000-words/tree/main/ml/tiff) 中找到 TIFF 格式文件。
+
+🎨 制作人: [Tomomi Imura](https://github.com/girliemac) (Twitter: [@girlie_mac](https://twitter.com/girlie_mac))
+
+[](https://creativecommons.org/licenses/by-sa/4.0/)
+
+---
+
+**免责声明**:
+本文档使用AI翻译服务[Co-op Translator](https://github.com/Azure/co-op-translator)进行翻译。尽管我们努力确保翻译的准确性,但请注意,自动翻译可能包含错误或不准确之处。原始语言的文档应被视为权威来源。对于关键信息,建议使用专业人工翻译。我们不对因使用此翻译而产生的任何误解或误读承担责任。
\ No newline at end of file
|