adding a note on constrained data

3 years ago · 407aa1f731
parent 6b2cd620fc
commit 407aa1f731
3 changed files with 32 additions and 3 deletions
--- a/Clustering/1-Visualize/README.md
+++ b/Clustering/1-Visualize/README.md
@@ -10,7 +10,7 @@ Clustering is a type of [Unsupervised Learning](https://wikipedia.org/wiki/Unsup

 > TODO infographic

-[Clustering]() is very useful for data exploration. Let's see if it can help discover trends and patterns in the way Nigerian audiences consume music.
+[Clustering](https://link.springer.com/referenceworkentry/10.1007%2F978-0-387-30164-8_124) is very useful for data exploration. Let's see if it can help discover trends and patterns in the way Nigerian audiences consume music.

 ✅ Take a minute to think about the uses of clustering. In real life, clustering happens whenever you have a pile of laundry and need to sort out your family members' clothes 🧦👕👖🩲. In data science, clustering happens when trying to analyze a user's preferences, or determine the characteristics of any unlabeled dataset. Clustering, in a way, helps make sense of chaos.

@@ -63,6 +63,8 @@ Alternately, you could use it for grouping search results - by shopping links, i
 > 🎓 ['Constrained'](https://wikipedia.org/wiki/Constrained_clustering)
 > 
 > Constrained Clustering introduces 'semi-supervised' learning into this unsupervised method. The relationships between points are flagged as 'cannot-link' or 'must-link', so that some rules are enforced on the dataset.
+>
+> An example: if an algorithm is set free on a batch of unlabelled or semi-labelled data, the clusters it produces may be of poor quality. In the example above, the clusters might group 'round music things', 'square music things', 'triangular things', and 'cookies'. Giving the algorithm some constraints, or rules to follow ("the item must be made of plastic", "the item needs to be able to produce music"), can help 'constrain' it to make better choices.
 > 
 > 🎓 'Density'
 > 
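To make the 'must-link' / 'cannot-link' idea in the constrained clustering note more concrete, here is a minimal sketch in Python of one well-known approach, COP-KMeans: ordinary k-means in which each point is assigned to the nearest centroid that does not break a constraint. This is not code from the lessons; the function names `violates` and `cop_kmeans`, the toy points, and the constraint pairs are all hypothetical and chosen only for illustration. Constraints are given as pairs of point indices.

```python
import math
import random


def violates(i, cluster, labels, must_link, cannot_link):
    """Return True if putting point i in `cluster` breaks a constraint,
    given the labels assigned so far in this pass (None = not yet assigned)."""
    for a, b in must_link:
        other = b if a == i else (a if b == i else None)
        # A must-link partner already placed in a different cluster is a violation.
        if other is not None and labels[other] is not None and labels[other] != cluster:
            return True
    for a, b in cannot_link:
        other = b if a == i else (a if b == i else None)
        # A cannot-link partner already placed in the same cluster is a violation.
        if other is not None and labels[other] == cluster:
            return True
    return False


def cop_kmeans(points, k, must_link=(), cannot_link=(), max_iter=100, seed=0):
    """Hypothetical COP-KMeans sketch: returns a list of cluster labels,
    or None if some point has no constraint-respecting cluster."""
    rng = random.Random(seed)
    # Start from k distinct data points as the initial centroids.
    centers = [list(p) for p in rng.sample(list(points), k)]
    labels = [None] * len(points)
    for _ in range(max_iter):
        new_labels = [None] * len(points)
        for i, p in enumerate(points):
            # Try clusters from nearest to farthest centroid.
            order = sorted(range(k), key=lambda c: math.dist(p, centers[c]))
            for c in order:
                if not violates(i, c, new_labels, must_link, cannot_link):
                    new_labels[i] = c
                    break
            else:
                return None  # no feasible assignment for point i
        # Move each centroid to the mean of its members (empty clusters keep theirs).
        for c in range(k):
            members = [points[j] for j in range(len(points)) if new_labels[j] == c]
            if members:
                centers[c] = [sum(coord) / len(members) for coord in zip(*members)]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
    return labels


if __name__ == "__main__":
    pts = [(0, 0), (0, 1), (10, 10), (10, 11)]
    print(cop_kmeans(pts, 2))                           # unconstrained
    print(cop_kmeans(pts, 2, cannot_link=[(0, 1)]))     # force 0 and 1 apart
    print(cop_kmeans(pts, 2, must_link=[(0, 2)]))       # force 0 and 2 together
```

Unconstrained, the two nearby pairs end up in separate clusters; adding a cannot-link or must-link pair overrides pure distance for those points. The greedy, order-dependent assignment is a simplification, and a real analysis would more likely rely on a maintained library than on a hand-rolled loop like this.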
--- a/Clustering/2-K-Means/notebook.ipynb
+++ b/Clustering/2-K-Means/notebook.ipynb
@@ -0,0 +1,28 @@
+{
+ "metadata": {
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": 3
+  },
+  "orig_nbformat": 2
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2,
+ "cells": [
+  {
+   "source": [
+    "# Nigerian Music scraped from Spotify - an analysis"
+   ],
+   "cell_type": "markdown",
+   "metadata": {}
+  }
+ ]
+}
--- a/Clustering/README.md
+++ b/Clustering/README.md
@@ -13,9 +13,8 @@ In this series of lessons, you will discover new ways to analyze data using Clus

 1. [Introduction to Clustering](1-Visualize/README.md)
 2. [K-Means Clustering](2-K-Means/README.md)
-3. [Centroid Clustering](3-Centroid/README.md)
 ## Credits

-These lessons were written with ♥️ by [Jen Looper](https://www.twitter.com/jenlooper)
+These lessons were written with ♥️ by [Jen Looper](https://www.twitter.com/jenlooper) with helpful reviews by Muhammad Sakib Khan Inan.

 The [Nigerian Songs](https://www.kaggle.com/sootersaalu/nigerian-songs-spotify) dataset was sourced from Kaggle as scraped from Spotify.