From eecc4e1d3aed6ec7eb62c32a6912c3904ddc18a4 Mon Sep 17 00:00:00 2001
From: Jen Looper
Date: Wed, 26 May 2021 16:41:04 -0400
Subject: [PATCH] adding a note on constrained data

---
 Clustering/1-Visualize/README.md    |  4 +++-
 Clustering/2-K-Means/notebook.ipynb | 28 ++++++++++++++++++++++++++++
 Clustering/README.md                |  3 +--
 3 files changed, 32 insertions(+), 3 deletions(-)

diff --git a/Clustering/1-Visualize/README.md b/Clustering/1-Visualize/README.md
index 0b5c7ff34..d568641c1 100644
--- a/Clustering/1-Visualize/README.md
+++ b/Clustering/1-Visualize/README.md
@@ -10,7 +10,7 @@ Clustering is a type of [Unsupervised Learning](https://wikipedia.org/wiki/Unsup
 
 > TODO infographic
 
-[Clustering]() is very useful for data exploration. Let's see if it can help discover trends and patterns in the way Nigerian audiences consume music.
+[Clustering](https://link.springer.com/referenceworkentry/10.1007%2F978-0-387-30164-8_124) is very useful for data exploration. Let's see if it can help discover trends and patterns in the way Nigerian audiences consume music.
 
 ✅ Take a minute to think about the uses of clustering. In real life, clustering happens whenever you have a pile of laundry and need to sort out your family members' clothes 🧦👕👖🩲. In data science, clustering happens when trying to analyze a user's preferences, or determine the characteristics of any unlabeled dataset. Clustering, in a way, helps make sense of chaos.
 
@@ -63,6 +63,8 @@ Alternately, you could use it for grouping search results - by shopping links, i
 > 🎓 ['Constrained'](https://wikipedia.org/wiki/Constrained_clustering)
 > 
 > Constrained Clustering introduces 'semi-supervised' learning into this unsupervised method. The relationships between points are flagged as 'cannot link' or 'must-link' so some rules are forced on the dataset.
+> 
+>An example: If an algorithm is set free on a batch of unlabelled or semi-labelled data, the clusters it produces may be of poor quality. In the example above, the clusters might group 'round music things' and 'square music things' and 'triangular things' and 'cookies'. If given some constraints, or rules to follow ("the item must be made of plastic", "the item needs to be able to produce music") this can help 'constrain' the algorithm to make better choices.
 > 
 > 🎓 'Density'
 > 
diff --git a/Clustering/2-K-Means/notebook.ipynb b/Clustering/2-K-Means/notebook.ipynb
index e69de29bb..fd6b5b321 100644
--- a/Clustering/2-K-Means/notebook.ipynb
+++ b/Clustering/2-K-Means/notebook.ipynb
@@ -0,0 +1,28 @@
+{
+ "metadata": {
+  "language_info": {
+   "codemirror_mode": {
+    "name": "ipython",
+    "version": 3
+   },
+   "file_extension": ".py",
+   "mimetype": "text/x-python",
+   "name": "python",
+   "nbconvert_exporter": "python",
+   "pygments_lexer": "ipython3",
+   "version": 3
+  },
+  "orig_nbformat": 2
+ },
+ "nbformat": 4,
+ "nbformat_minor": 2,
+ "cells": [
+  {
+   "source": [
+    "# Nigerian Music scraped from Spotify - an analysis"
+   ],
+   "cell_type": "markdown",
+   "metadata": {}
+  }
+ ]
+}
\ No newline at end of file
diff --git a/Clustering/README.md b/Clustering/README.md
index 812b4b1c8..c9cc2fc01 100644
--- a/Clustering/README.md
+++ b/Clustering/README.md
@@ -13,9 +13,8 @@ In this series of lessons, you will discover new ways to analyze data using Clus
 
 1. [Introduction to Clustering](1-Visualize/README.md)
 2. [K-Means Clustering](2-K-Means/README.md)
-3. [Centroid Clustering](3-Centroid/README.md)
 
 ## Credits
 
-These lessons were written with ♥️ by [Jen Looper](https://www.twitter.com/jenlooper)
+These lessons were written with ♥️ by [Jen Looper](https://www.twitter.com/jenlooper) with helpful reviews by Muhammad Sakib Khan Inan.
 The [Nigerian Songs](https://www.kaggle.com/sootersaalu/nigerian-songs-spotify) dataset was sourced from Kaggle as scraped from Spotify.
\ No newline at end of file