ML-For-Beginners/translations/hk/5-Clustering/1-Visualize/solution/R/lesson_14-R.ipynb

{
 "cells": [
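  {
   "cell_type": "markdown",
   "source": [
    "## **Nigerian Music scraped from Spotify: an analysis**\n",
    "\n",
    "Clustering is a type of [unsupervised learning](https://wikipedia.org/wiki/Unsupervised_learning) that presumes a dataset is unlabelled, or that its inputs are not matched with predefined outputs. It uses various algorithms to sort through unlabelled data and group it according to the patterns it discerns in the data.\n",
    "\n",
    "[**Pre-lecture quiz**](https://gray-sand-07a10f403.1.azurestaticapps.net/quiz/27/)\n",
    "\n",
    "### **Introduction**\n",
    "\n",
    "[Clustering](https://link.springer.com/referenceworkentry/10.1007%2F978-0-387-30164-8_124) is very useful for data exploration. Let's see if it can help discover trends and patterns in the way Nigerian audiences consume music.\n",
    "\n",
    "> ✅ Take a minute to think about the uses of clustering. In everyday life, clustering happens whenever you have a pile of laundry and need to sort out your family members' clothes 🧦👕👖🩲. In data science, clustering happens when you analyze a user's preferences or determine the characteristics of any unlabelled dataset. Clustering, in a way, helps make sense of chaos, like tidying a sock drawer.\n",
    "\n",
    "In a professional setting, clustering can be used for market segmentation, for example to determine which age groups buy which items. Another use is anomaly detection, such as detecting fraud in a dataset of credit card transactions. You might also use clustering to identify tumors in a batch of medical scans.\n",
    "\n",
    "✅ Take a minute to think about where you might have encountered clustering in a banking, e-commerce, or business setting.\n",
    "\n",
    "> 🎓 Interestingly, cluster analysis originated in the fields of anthropology and psychology in the 1930s. Can you imagine how it might have been used?\n",
    "\n",
    "Alternatively, you could use it to group search results, for example by shopping links, images, or reviews. Clustering is useful when you have a large dataset that you want to reduce and then analyze at a finer granularity, so it can help you learn about the data before other models are built.\n",
    "\n",
    "✅ Once your data is organized into clusters, you can assign each cluster an ID. This technique is useful for preserving a dataset's privacy: you can refer to a data point by its cluster ID rather than by more revealing, identifiable data. Can you think of other reasons to identify a cluster by its ID rather than by other elements of the cluster?\n",
    "\n",
    "### Getting started with clustering\n",
    "\n",
    "> 🎓 How we create clusters has a lot to do with how we gather data points into groups. Let's unpack some vocabulary:\n",
    ">\n",
    "> 🎓 ['Transductive' vs. 'inductive'](https://wikipedia.org/wiki/Transduction_(machine_learning))\n",
    ">\n",
    "> Transductive inference is derived from observed training cases that map to specific test cases. Inductive inference is derived from training cases that map to general rules, which are only then applied to test cases.\n",
    ">\n",
    "> An example: imagine you have a partially labelled dataset. Some items are 'records', some are 'CDs', and some are blank. Your job is to provide labels for the blanks. With an inductive approach, you would train a model to look for 'records' and 'CDs' and apply those labels to the unlabelled data. This approach will have trouble classifying things that are actually 'cassettes'. A transductive approach handles this unknown data more effectively, because it works to group similar items together and then assigns a label to the whole group. In this case, the clusters might reflect 'round musical things' and 'square musical things'.\n",
    ">\n",
    "> 🎓 ['Non-flat' vs. 'flat' geometry](https://datascience.stackexchange.com/questions/52260/terminology-flat-geometry-in-the-context-of-clustering)\n",
    ">\n",
    "> Derived from mathematical terminology, non-flat vs. flat geometry refers to measuring the distances between points by either 'flat' ([Euclidean](https://wikipedia.org/wiki/Euclidean_geometry)) or 'non-flat' (non-Euclidean) geometrical methods.\n",
    ">\n",
    "> 'Flat' in this context refers to Euclidean geometry (parts of which are taught as 'plane' geometry), and 'non-flat' refers to non-Euclidean geometry. What does geometry have to do with machine learning? Both fields are rooted in mathematics, and clustering needs a common way to measure the distances between points in clusters. That can be done in a 'flat' or a 'non-flat' way, depending on the nature of the data. [Euclidean distances](https://wikipedia.org/wiki/Euclidean_distance) are measured as the length of the line segment between two points. [Non-Euclidean distances](https://wikipedia.org/wiki/Non-Euclidean_geometry) are measured along a curve. If your data, when visualized, does not seem to lie on a plane, you might need a specialized algorithm to handle it.\n",
    "\n",
    "<p >\n",
    "   <img src=\"../../images/flat-nonflat.png\"\n",
    "   width=\"600\"/>\n",
    "   <figcaption>Infographic by Dasani Madipalli</figcaption>\n",
    "\n",
    "> 🎓 ['Distances'](https://web.stanford.edu/class/cs345a/slides/12-clustering.pdf)\n",
    ">\n",
    "> Clusters are defined by their distance matrix, i.e. the distances between points. This distance can be measured in a few ways. Euclidean clusters are defined by the average of the point values and contain a 'centroid', or center point. Distances are thus measured to that centroid. Non-Euclidean distances refer to 'clustroids', the point closest to the other points. Clustroids can in turn be defined in various ways.\n",
    ">\n",
    "> 🎓 ['Constrained'](https://wikipedia.org/wiki/Constrained_clustering)\n",
    ">\n",
    "> [Constrained clustering](https://web.cs.ucdavis.edu/~davidson/Publications/ICDMTutorial.pdf) introduces 'semi-supervised' learning into this unsupervised method. The relationships between points are flagged as 'cannot-link' or 'must-link', so some rules are imposed on the dataset.\n",
    ">\n",
    "> An example: if an algorithm is set free on a batch of unlabelled or semi-labelled data, the clusters it produces may be of poor quality. In the example above, the clusters might group 'round musical things', 'square musical things', 'triangular things' and 'cookies'. Given some constraints or rules to follow ('the item must be made of plastic', 'the item needs to be able to produce music'), the algorithm can be 'constrained' to make better choices.\n",
    ">\n",
    "> 🎓 'Density'\n",
    ">\n",
    "> Real datasets are often 'noisy', and their clusters can differ in density: on examination, the points in one cluster may be tightly packed while those in another are sparse. Such data needs to be analyzed with an appropriate clustering method. [This article](https://www.kdnuggets.com/2020/02/understanding-density-based-clustering.html) demonstrates the difference between using K-Means clustering and the HDBSCAN algorithm to explore a noisy dataset with uneven cluster density.\n",
    "\n",
    "Deepen your understanding of clustering techniques in this [Learn module](https://docs.microsoft.com/learn/modules/train-evaluate-cluster-models?WT.mc_id=academic-77952-leestott).\n",
    "\n",
    "### **Clustering algorithms**\n",
    "\n",
    "There are over 100 clustering algorithms, and their use depends on the nature of the data at hand. Let's discuss some of the major ones:\n",
    "\n",
    "-   **Hierarchical clustering**. If an object is classified by its proximity to a nearby object rather than to one farther away, clusters are formed based on their members' distance to and from other objects. Hierarchical clustering is characterized by repeatedly combining two clusters.\n",
    "\n",
    "<p >\n",
    "   <img src=\"../../images/hierarchical.png\"\n",
    "   width=\"600\"/>\n",
    "   <figcaption>Infographic by Dasani Madipalli</figcaption>\n",
    "\n",
    "-   **Centroid clustering**. This popular algorithm requires choosing 'k', the number of clusters to form, after which the algorithm determines the center point of each cluster and gathers data around that point. [K-means clustering](https://wikipedia.org/wiki/K-means_clustering) is a popular version of centroid clustering that separates a dataset into K pre-defined groups. Each point is assigned to the cluster with the nearest mean, hence the name, and the squared distance between each point and its cluster's center is minimized.\n",
    "\n",
    "<p >\n",
    "   <img src=\"../../images/centroid.png\"\n",
    "   width=\"600\"/>\n",
    "   <figcaption>Infographic by Dasani Madipalli</figcaption>\n",
    "\n",
    "-   **Distribution-based clustering**. Based on statistical modeling, distribution-based clustering centers on determining the probability that a data point belongs to a cluster and assigning it accordingly. Gaussian mixture methods belong to this type.\n",
    "\n",
    "-   **Density-based clustering**. Data points are assigned to clusters based on their density, that is, how closely they group around each other. Data points far from the group are considered outliers or noise. DBSCAN, Mean-shift and OPTICS belong to this type of clustering.\n",
    "\n",
    "-   **Grid-based clustering**. For multi-dimensional datasets, a grid is created and the data is divided among the grid's cells, thereby creating clusters.\n",
    "\n",
    "The best way to learn about clustering is to try it for yourself, so that's what you'll do in this exercise.\n",
    "\n",
    "We'll need some packages to complete this module. You can install them with: `install.packages(c('tidyverse', 'tidymodels', 'DataExplorer', 'summarytools', 'plotly', 'paletteer', 'corrplot', 'patchwork'))`\n",
    "\n",
    "Alternatively, the script below checks whether you have the packages required to complete this module and installs any that are missing.\n"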
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "source": [
    "suppressWarnings(if(!require(\"pacman\")) install.packages(\"pacman\"))\r\n",
    "\r\n",
    "pacman::p_load('tidyverse', 'tidymodels', 'DataExplorer', 'summarytools', 'plotly', 'paletteer', 'corrplot', 'patchwork')\r\n"
   ],
   "outputs": [],
   "metadata": {}
  },
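  {
   "cell_type": "markdown",
   "source": [
    "Before moving on to the exercise, here is a minimal sketch of the distance and algorithm ideas introduced above, using only base R. It is not part of the original lesson: the synthetic points, the seed, the names `pts`, `d`, `km` and `hc`, the choice of `k = 2`, and the use of Manhattan distance as an example of an alternative metric are all illustrative assumptions.\n"
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "source": [
    "# Synthetic 2-D data: two loose groups of 20 points each (illustrative only)\r\n",
    "set.seed(42)\r\n",
    "pts <- rbind(\r\n",
    "  matrix(rnorm(40, mean = 0, sd = 0.5), ncol = 2),\r\n",
    "  matrix(rnorm(40, mean = 3, sd = 0.5), ncol = 2)\r\n",
    ")\r\n",
    "\r\n",
    "# Distance matrix between all pairs of points (Euclidean is the default)\r\n",
    "d <- dist(pts)\r\n",
    "\r\n",
    "# Euclidean distance between the first two points, computed by hand\r\n",
    "sqrt(sum((pts[1, ] - pts[2, ])^2))\r\n",
    "# The same value, read from the distance matrix\r\n",
    "as.matrix(d)[1, 2]\r\n",
    "\r\n",
    "# Manhattan distance is one alternative metric that dist() supports\r\n",
    "dist(pts[1:2, ], method = 'manhattan')\r\n",
    "\r\n",
    "# Centroid clustering: k is chosen up front, then k-means locates the centroids\r\n",
    "km <- kmeans(pts, centers = 2, nstart = 10)\r\n",
    "km$centers\r\n",
    "table(km$cluster)\r\n",
    "\r\n",
    "# Hierarchical clustering: repeatedly merge the two closest clusters,\r\n",
    "# then cut the resulting tree into k groups\r\n",
    "hc <- hclust(d, method = 'average')\r\n",
    "table(cutree(hc, k = 2))\r\n"
   ],
   "outputs": [],
   "metadata": {}
  },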
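  {
   "cell_type": "markdown",
   "source": [
    "## Exercise - Cluster your data\n",
    "\n",
    "Clustering as a technique is greatly aided by proper visualization, so let's get started by visualizing our music data. This exercise will help us decide which clustering method is best suited to the nature of this data.\n",
    "\n",
    "Let's hit the ground running by importing the data.\n"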
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "source": [
    "# Load the core tidyverse and make it available in your current R session\r\n",
    "library(tidyverse)\r\n",
    "\r\n",
    "# Import the data into a tibble\r\n",
    "df <- read_csv(file = \"https://raw.githubusercontent.com/microsoft/ML-For-Beginners/main/5-Clustering/data/nigerian-songs.csv\")\r\n",
    "\r\n",
    "# View the first 5 rows of the data set\r\n",
    "df %>% \r\n",
    "  slice_head(n = 5)\r\n"
   ],
   "outputs": [],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "Sometimes we may want a little more information about our data. We can look at the data and its structure using the [*glimpse()*](https://pillar.r-lib.org/reference/glimpse.html) function:\n"
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "source": [
    "# Glimpse into the data set\r\n",
    "df %>% \r\n",
    "  glimpse()\r\n"
   ],
   "outputs": [],
   "metadata": {}
  },
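  {
   "cell_type": "markdown",
   "source": [
    "Good job! 💪\n",
    "\n",
    "We can see that `glimpse()` gives the total number of rows (observations) and columns (variables), followed by the first few entries of each variable in a row after the variable name. In addition, the *data type* of each variable is shown immediately after its name inside `< >`.\n",
    "\n",
    "`DataExplorer::introduce()` summarizes this information neatly:\n"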
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "source": [
    "# Describe basic information for our data\r\n",
    "df %>% \r\n",
    "  introduce()\r\n",
    "\r\n",
    "# A visual display of the same\r\n",
    "df %>% \r\n",
    "  plot_intro()\r\n"
   ],
   "outputs": [],
   "metadata": {}
  },
  {
   "cell_type": "markdown",
   "source": [
    "Awesome! We have just learnt that our data has no missing values.\n",
    "\n",
    "While we are at it, we can explore common measures of central tendency (e.g. the [mean](https://en.wikipedia.org/wiki/Arithmetic_mean) and [median](https://en.wikipedia.org/wiki/Median)) and measures of dispersion (e.g. the [standard deviation](https://en.wikipedia.org/wiki/Standard_deviation)) using `summarytools::descr()`.\n"
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "source": [
    "# Describe common statistics\r\n",
    "df %>% \r\n",
    "  descr(stats = \"common\")\r\n"
   ],
   "outputs": [],
   "metadata": {}
  },
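  {
   "cell_type": "markdown",
   "source": [
    "Let's look at the general values of the data. Note that popularity can be `0`, which indicates songs that have no ranking. We'll remove those shortly.\n",
    "\n",
    "> 🤔 If we are working with clustering, an unsupervised method that does not require labelled data, why are we showing this data with labels? The labels are handy during the data exploration phase, but the clustering algorithms do not need them to work.\n",
    "\n",
    "### 1. Explore popular genres\n",
    "\n",
    "Let's go ahead and find the most popular genres 🎶 by counting the number of times each one appears.\n"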
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "source": [
    "# Popular genres\r\n",
    "top_genres <- df %>% \r\n",
    "  count(artist_top_genre, sort = TRUE) %>% \r\n",
    "  # Encode to categorical and reorder according to count\r\n",
    "  mutate(artist_top_genre = factor(artist_top_genre) %>% fct_inorder())\r\n",
    "\r\n",
    "# Print the top genres\r\n",
    "top_genres\r\n"
   ],
   "outputs": [],
   "metadata": {}
  },
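  {
   "cell_type": "markdown",
   "source": [
    "That went well! They say a picture is worth a thousand rows of a data frame (actually, nobody ever says that 😅). But you get the gist of it, right?\n",
    "\n",
    "One way to visualize categorical data (character or factor variables) is with a bar plot. Let's make a bar plot of the top 10 genres:\n"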
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "source": [
    "# Change the default gray theme\r\n",
    "theme_set(theme_light())\r\n",
    "\r\n",
    "# Visualize popular genres\r\n",
    "top_genres %>%\r\n",
    "  slice(1:10) %>% \r\n",
    "  ggplot(mapping = aes(x = artist_top_genre, y = n,\r\n",
    "                       fill = artist_top_genre)) +\r\n",
    "  geom_col(alpha = 0.8) +\r\n",
    "  paletteer::scale_fill_paletteer_d(\"rcartocolor::Vivid\") +\r\n",
    "  ggtitle(\"Top genres\") +\r\n",
    "  theme(plot.title = element_text(hjust = 0.5),\r\n",
    "        # Rotates the X markers (so we can read them)\r\n",
    "    axis.text.x = element_text(angle = 90))\r\n"
   ],
   "outputs": [],
   "metadata": {}
  },
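  {
   "cell_type": "markdown",
   "source": [
    "Now it's much easier to see that we have `Missing` genres 🧐!\n",
    "\n",
    "> “A good visualisation will show you things that you did not expect, or raise new questions about the data.” Hadley Wickham and Garrett Grolemund, [R For Data Science](https://r4ds.had.co.nz/introduction.html)\n",
    "\n",
    "Note that when the top genre is described as `Missing`, it means Spotify did not classify it, so let's remove it.\n"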
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "source": [
    "# Visualize popular genres\r\n",
    "top_genres %>%\r\n",
    "  filter(artist_top_genre != \"Missing\") %>% \r\n",
    "  slice(1:10) %>% \r\n",
    "  ggplot(mapping = aes(x = artist_top_genre, y = n,\r\n",
    "                       fill = artist_top_genre)) +\r\n",
    "  geom_col(alpha = 0.8) +\r\n",
    "  paletteer::scale_fill_paletteer_d(\"rcartocolor::Vivid\") +\r\n",
    "  ggtitle(\"Top genres\") +\r\n",
    "  theme(plot.title = element_text(hjust = 0.5),\r\n",
    "        # Rotates the X markers (so we can read them)\r\n",
    "    axis.text.x = element_text(angle = 90))\r\n"
   ],
   "outputs": [],
   "metadata": {}
  },
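  {
   "cell_type": "markdown",
   "source": [
    "From this initial exploration, we learn that the top three genres dominate this dataset. Let's concentrate on `afro dancehall`, `afropop` and `nigerian pop`, and additionally filter the dataset to remove anything with a popularity value of 0 (meaning it was not classified with a popularity in the dataset and can be considered noise for our purposes):\n"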
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "source": [
    "nigerian_songs <- df %>% \r\n",
    "  # Concentrate on top 3 genres\r\n",
    "  filter(artist_top_genre %in% c(\"afro dancehall\", \"afropop\",\"nigerian pop\")) %>% \r\n",
    "  # Remove unclassified observations\r\n",
    "  filter(popularity != 0)\r\n",
    "\r\n",
    "\r\n",
    "\r\n",
    "# Visualize popular genres\r\n",
    "nigerian_songs %>%\r\n",
    "  count(artist_top_genre) %>%\r\n",
    "  ggplot(mapping = aes(x = artist_top_genre, y = n,\r\n",
    "                       fill = artist_top_genre)) +\r\n",
    "  geom_col(alpha = 0.8) +\r\n",
    "  paletteer::scale_fill_paletteer_d(\"ggsci::category10_d3\") +\r\n",
    "  ggtitle(\"Top genres\") +\r\n",
    "  theme(plot.title = element_text(hjust = 0.5))\r\n"
   ],
   "outputs": [],
   "metadata": {}
  },
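  {
   "cell_type": "markdown",
   "source": [
    "Let's see whether there is any apparent linear relationship among the numerical variables in our dataset. This relationship is quantified mathematically by the [correlation statistic](https://en.wikipedia.org/wiki/Correlation).\n",
    "\n",
    "The correlation statistic is a value between -1 and 1 that indicates the strength of a relationship. Values above 0 indicate a *positive* correlation (high values of one variable tend to coincide with high values of the other), while values below 0 indicate a *negative* correlation (high values of one variable tend to coincide with low values of the other).\n"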
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "source": [
    "# Narrow down to numeric variables and find correlation\r\n",
    "corr_mat <- nigerian_songs %>% \r\n",
    "  select(where(is.numeric)) %>% \r\n",
    "  cor()\r\n",
    "\r\n",
    "# Visualize correlation matrix\r\n",
    "corrplot(corr_mat, order = 'AOE', col = c('white', 'black'), bg = 'gold2')  \r\n"
   ],
   "outputs": [],
   "metadata": {}
  },
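  {
   "cell_type": "markdown",
   "source": [
    "To make the correlation coefficient described above more concrete, the following sketch computes Pearson's coefficient from its definition and compares it with base R's `cor()`. It is not part of the original lesson: the vectors `x` and `y` hold made-up illustrative values, not columns from the songs data, and the name `r_manual` is an assumption.\n"
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "source": [
    "# Made-up illustrative values (not taken from the songs data)\r\n",
    "x <- c(1, 2, 3, 4, 5)\r\n",
    "y <- c(2.1, 3.9, 6.2, 7.8, 10.1)\r\n",
    "\r\n",
    "# Pearson's r: sum of co-deviations scaled by the spread of x and of y\r\n",
    "r_manual <- sum((x - mean(x)) * (y - mean(y))) /\r\n",
    "  sqrt(sum((x - mean(x))^2) * sum((y - mean(y))^2))\r\n",
    "r_manual\r\n",
    "\r\n",
    "# cor() uses the Pearson method by default, so the two values should agree\r\n",
    "all.equal(r_manual, cor(x, y))\r\n",
    "\r\n",
    "# Reversing y pairs high x with low y, so the correlation becomes negative\r\n",
    "cor(x, rev(y))\r\n"
   ],
   "outputs": [],
   "metadata": {}
  },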
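  {
   "cell_type": "markdown",
   "source": [
    "The data is not strongly correlated, except between `energy` and `loudness`, which makes sense given that loud music is usually quite energetic. `Popularity` has some correspondence with `release date`, which also makes sense, as more recent songs are probably more popular. `Length` and `energy` seem to be somewhat correlated too.\n",
    "\n",
    "It will be interesting to see what a clustering algorithm can make of this data!\n",
    "\n",
    "> 🎓 Note that correlation does not imply causation! We have evidence of correlation but no evidence of causation. An [amusing web site](https://tylervigen.com/spurious-correlations) has some visuals that emphasize this point.\n",
    "\n",
    "### 2. Explore data distribution\n",
    "\n",
    "Let's ask some more subtle questions. Are the genres significantly different in the perception of their danceability, based on their popularity? Let's examine the distribution of popularity and danceability for our top three genres along a given x and y axis using [density plots](https://www.khanacademy.org/math/ap-statistics/density-curves-normal-distribution-ap/density-curves/v/density-curves).\n"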
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "source": [
    "# Perform 2D kernel density estimation\r\n",
    "density_estimate_2d <- nigerian_songs %>% \r\n",
    "  ggplot(mapping = aes(x = popularity, y = danceability, color = artist_top_genre)) +\r\n",
    "  geom_density_2d(bins = 5, size = 1) +\r\n",
    "  paletteer::scale_color_paletteer_d(\"RSkittleBrewer::wildberry\") +\r\n",
    "  xlim(-20, 80) +\r\n",
    "  ylim(0, 1.2)\r\n",
    "\r\n",
    "# Density plot based on the popularity\r\n",
    "density_estimate_pop <- nigerian_songs %>% \r\n",
    "  ggplot(mapping = aes(x = popularity, fill = artist_top_genre, color = artist_top_genre)) +\r\n",
    "  geom_density(size = 1, alpha = 0.5) +\r\n",
    "  paletteer::scale_fill_paletteer_d(\"RSkittleBrewer::wildberry\") +\r\n",
    "  paletteer::scale_color_paletteer_d(\"RSkittleBrewer::wildberry\") +\r\n",
    "  theme(legend.position = \"none\")\r\n",
    "\r\n",
    "# Density plot based on the danceability\r\n",
    "density_estimate_dance <- nigerian_songs %>% \r\n",
    "  ggplot(mapping = aes(x = danceability, fill = artist_top_genre, color = artist_top_genre)) +\r\n",
    "  geom_density(size = 1, alpha = 0.5) +\r\n",
    "  paletteer::scale_fill_paletteer_d(\"RSkittleBrewer::wildberry\") +\r\n",
    "  paletteer::scale_color_paletteer_d(\"RSkittleBrewer::wildberry\")\r\n",
    "\r\n",
    "\r\n",
    "# Patch everything together\r\n",
    "library(patchwork)\r\n",
    "density_estimate_2d / (density_estimate_pop + density_estimate_dance)\r\n"
   ],
   "outputs": [],
   "metadata": {}
  },
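  {
   "cell_type": "markdown",
   "source": [
    "We see concentric circles that line up regardless of genre. Could it be that Nigerian tastes converge at a certain level of danceability for these genres?\n",
    "\n",
    "In general, the three genres align in terms of their popularity and danceability. Finding clusters in this loosely aligned data will be a challenge. Let's see whether a scatter plot can support this.\n"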
   ],
   "metadata": {}
  },
  {
   "cell_type": "code",
   "execution_count": null,
   "source": [
    "# A scatter plot of popularity and danceability\r\n",
    "scatter_plot <- nigerian_songs %>% \r\n",
    "  ggplot(mapping = aes(x = popularity, y = danceability, color = artist_top_genre, shape = artist_top_genre)) +\r\n",
    "  geom_point(size = 2, alpha = 0.8) +\r\n",
    "  paletteer::scale_color_paletteer_d(\"futurevisions::mars\")\r\n",
    "\r\n",
    "# Add a touch of interactivity\r\n",
    "ggplotly(scatter_plot)\r\n"
   ],
   "outputs": [],
   "metadata": {}
  },
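  {
   "cell_type": "markdown",
   "source": [
    "A scatter plot on the same axes shows a similar pattern of convergence.\n",
    "\n",
    "In general, you can use scatter plots to show clusters of data, so mastering this type of visualization is very useful. In the next lesson, we will take this filtered data and use k-means clustering to discover groups in it that overlap in interesting ways.\n",
    "\n",
    "## **🚀 Challenge**\n",
    "\n",
    "In preparation for the next lesson, make a chart of the various clustering algorithms you might discover and use in a production environment. What kinds of problems is clustering trying to address?\n",
    "\n",
    "## [**Post-lecture quiz**](https://gray-sand-07a10f403.1.azurestaticapps.net/quiz/28/)\n",
    "\n",
    "## **Review & Self Study**\n",
    "\n",
    "Before you apply clustering algorithms, as we have learned, it's a good idea to understand the nature of your dataset. You can read more on this topic [here](https://www.kdnuggets.com/2019/10/right-clustering-algorithm.html).\n",
    "\n",
    "Deepen your understanding of clustering techniques:\n",
    "\n",
    "-   [Train and Evaluate Clustering Models using Tidymodels and friends](https://rpubs.com/eR_ic/clustering)\n",
    "\n",
    "-   Bradley Boehmke & Brandon Greenwell, [*Hands-On Machine Learning with R*](https://bradleyboehmke.github.io/HOML/)*.*\n",
    "\n",
    "## **Assignment**\n",
    "\n",
    "[Research other visualizations for clustering](https://github.com/microsoft/ML-For-Beginners/blob/main/5-Clustering/1-Visualize/assignment.md)\n",
    "\n",
    "## Thank you to:\n",
    "\n",
    "[Jen Looper](https://www.twitter.com/jenlooper) for creating the original Python version of this module ♥️\n",
    "\n",
    "[`Dasani Madipalli`](https://twitter.com/dasani_decoded) for creating the amazing illustrations that make machine learning concepts easier to understand.\n",
    "\n",
    "Happy learning,\n",
    "\n",
    "[Eric](https://twitter.com/ericntay), Gold Microsoft Learn Student Ambassador.\n"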
   ],
   "metadata": {}
  },
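  {
   "cell_type": "markdown",
   "metadata": {},
   "source": [
    "\n---\n\n**Disclaimer**:  \nThis document has been translated using the AI translation service [Co-op Translator](https://github.com/Azure/co-op-translator). While we strive for accuracy, please be aware that automated translations may contain errors or inaccuracies. The original document in its native language should be considered the authoritative source. For critical information, professional human translation is recommended. We are not liable for any misunderstandings or misinterpretations arising from the use of this translation.\n"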
   ]
  }
 ],
 "metadata": {
  "anaconda-cloud": "",
  "kernelspec": {
   "display_name": "R",
   "language": "R",
   "name": "ir"
  },
  "language_info": {
   "codemirror_mode": "r",
   "file_extension": ".r",
   "mimetype": "text/x-r-source",
   "name": "R",
   "pygments_lexer": "r",
   "version": "3.4.1"
  },
  "coopTranslator": {
   "original_hash": "99c36449cad3708a435f6798cfa39972",
   "translation_date": "2025-09-03T20:08:46+00:00",
   "source_file": "5-Clustering/1-Visualize/solution/R/lesson_14-R.ipynb",
   "language_code": "hk"
  }
 },
 "nbformat": 4,
 "nbformat_minor": 1
}