diff --git a/Clustering/1-Visualize/README.md b/Clustering/1-Visualize/README.md
index 0b5c7ff3..d568641c 100644
--- a/Clustering/1-Visualize/README.md
+++ b/Clustering/1-Visualize/README.md
@@ -10,7 +10,7 @@ Clustering is a type of [Unsupervised Learning](https://wikipedia.org/wiki/Unsup
 
 > TODO infographic
 
-[Clustering]() is very useful for data exploration. Let's see if it can help discover trends and patterns in the way Nigerian audiences consume music.
+[Clustering](https://link.springer.com/referenceworkentry/10.1007%2F978-0-387-30164-8_124) is very useful for data exploration. Let's see if it can help discover trends and patterns in the way Nigerian audiences consume music.
 
 ✅ Take a minute to think about the uses of clustering. In real life, clustering happens whenever you have a pile of laundry and need to sort out your family members' clothes 🧦👕👖🩲. In data science, clustering happens when trying to analyze a user's preferences, or determine the characteristics of any unlabeled dataset. Clustering, in a way, helps make sense of chaos.
 
@@ -63,6 +63,8 @@ Alternately, you could use it for grouping search results - by shopping links, i
 > 🎓 ['Constrained'](https://wikipedia.org/wiki/Constrained_clustering)
 >
 > Constrained Clustering introduces 'semi-supervised' learning into this unsupervised method. The relationships between points are flagged as 'cannot link' or 'must-link' so some rules are forced on the dataset.
+>
+>An example: If an algorithm is set free on a batch of unlabelled or semi-labelled data, the clusters it produces may be of poor quality. In the example above, the clusters might group 'round music things' and 'square music things' and 'triangular things' and 'cookies'. If given some constraints, or rules to follow ("the item must be made of plastic", "the item needs to be able to produce music") this can help 'constrain' the algorithm to make better choices.
 >
 > 🎓 'Density'
 >
diff --git a/Clustering/2-K-Means/notebook.ipynb b/Clustering/2-K-Means/notebook.ipynb
index e69de29b..fd6b5b32 100644
--- a/Clustering/2-K-Means/notebook.ipynb
+++ b/Clustering/2-K-Means/notebook.ipynb
@@ -0,0 +1,28 @@
+{
+ "metadata": {
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": 3
+  },
+  "orig_nbformat": 2
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2,
+ "cells": [
+  {
+   "source": [
+    "# Nigerian Music scraped from Spotify - an analysis"
+   ],
+   "cell_type": "markdown",
+   "metadata": {}
+  }
+ ]
+}
\ No newline at end of file
diff --git a/Clustering/README.md b/Clustering/README.md
index 812b4b1c..c9cc2fc0 100644
--- a/Clustering/README.md
+++ b/Clustering/README.md
@@ -13,9 +13,8 @@ In this series of lessons, you will discover new ways to analyze data using Clus
 
 1. [Introduction to Clustering](1-Visualize/README.md)
 2. [K-Means Clustering](2-K-Means/README.md)
-3. [Centroid Clustering](3-Centroid/README.md)
 
 ## Credits
-These lessons were written with ♥️ by [Jen Looper](https://www.twitter.com/jenlooper)
+These lessons were written with ♥️ by [Jen Looper](https://www.twitter.com/jenlooper) with helpful reviews by Muhammad Sakib Khan Inan.
 
 The [Nigerian Songs](https://www.kaggle.com/sootersaalu/nigerian-songs-spotify) dataset was sourced from Kaggle as scraped from Spotify.
\ No newline at end of file