Update with infographics

Adding infographics for the following:
- flat vs nonflat geometry
- hierarchical clustering
- centroid clustering
pull/34/head
Dasani Madipalli 4 years ago committed by GitHub
parent fd3824a086
commit 259fee55fe

@@ -58,7 +58,8 @@ Deepen your understanding of Clustering techniques in this [Learn module](https:
>
>'Flat' in this context refers to Euclidean geometry (parts of which are taught as 'plane' geometry), and non-flat refers to non-Euclidean geometry. What does geometry have to do with machine learning? Well, as two fields that are rooted in mathematics, there must be a common way to measure distances between points in clusters, and that can be done in a 'flat' or 'non-flat' way, depending on the nature of the data. [Euclidean distances](https://wikipedia.org/wiki/Euclidean_distance) are measured as the length of a line segment between two points. [Non-Euclidean distances](https://wikipedia.org/wiki/Non-Euclidean_geometry) are measured along a curve. If your data, visualized, seems to not exist on a plane, you might need to use a specialized algorithm to handle it.
>
-> Infographic: like the last one here https://datascience.stackexchange.com/questions/52260/terminology-flat-geometry-in-the-context-of-clustering
+![Flat vs Nonflat Geometry Infographic](https://github.com/microsoft/ML-For-Beginners/blob/main/5-Clustering/images/Flat%20Vs%20Nonflat%20Geometry.png)
+> Infographic by [Dasani Madipalli](https://twitter.com/dasani_decoded)
>
> 🎓 ['Distances'](https://web.stanford.edu/class/cs345a/slides/12-clustering.pdf)
>
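As a rough illustration of the flat versus non-flat distinction above, the sketch below compares a straight-line (Euclidean) distance with a distance measured along a curve. The great-circle (haversine) distance on a unit sphere is an assumption chosen for illustration; the text does not name a specific non-Euclidean metric, and all function names here are hypothetical.

```python
import math


def euclidean_distance(p, q):
    """Length of the straight line segment between two points."""
    return math.dist(p, q)


def great_circle_distance(lat1, lon1, lat2, lon2, radius=1.0):
    """Distance along the surface of a sphere (haversine formula).

    Angles are in radians. This is one example of a 'non-flat' distance,
    picked for illustration only.
    """
    dlat = lat2 - lat1
    dlon = lon2 - lon1
    a = (math.sin(dlat / 2) ** 2
         + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2)
    # Clamp to 1.0 to guard against floating-point overshoot.
    return 2 * radius * math.asin(min(1.0, math.sqrt(a)))


def to_xyz(lat, lon, radius=1.0):
    """Convert latitude/longitude (radians) to 3-D Cartesian coordinates."""
    return (radius * math.cos(lat) * math.cos(lon),
            radius * math.cos(lat) * math.sin(lon),
            radius * math.sin(lat))


# Two points on the equator of a unit sphere, a quarter turn apart.
a = (0.0, 0.0)
b = (0.0, math.pi / 2)

straight = euclidean_distance(to_xyz(*a), to_xyz(*b))  # cuts through the sphere
curved = great_circle_distance(*a, *b)                 # follows the surface
```

For the same pair of points, the curved distance is longer than the straight one, which is why the choice of metric can change which points count as "near" each other.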
@@ -82,13 +83,15 @@ There are over 100 clustering algorithms, and their use depends on the nature of
If an object is classified by its proximity to a nearby object, rather than to one farther away, clusters are formed based on their members' distance to and from other objects. Scikit-Learn's Agglomerative clustering is hierarchical.
-TODO: infographic
+![Hierarchical clustering Infographic](https://github.com/microsoft/ML-For-Beginners/blob/main/5-Clustering/images/Hierarchical%20Clustering.png)
+> Infographic by [Dasani Madipalli](https://twitter.com/dasani_decoded)
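The agglomerative idea described above can be sketched in plain Python. This is a minimal, hand-rolled version that assumes single linkage (clusters are compared by their closest pair of members) and Euclidean distance; it is not Scikit-Learn's implementation, and the function name is hypothetical.

```python
import math


def single_linkage(points, n_clusters):
    """Merge the two closest clusters until n_clusters remain.

    Returns a list of clusters, each a sorted list of indices into points.
    Assumes n_clusters >= 1.
    """
    # Start with every point in its own cluster.
    clusters = [[i] for i in range(len(points))]
    while len(clusters) > n_clusters:
        best = None
        for a in range(len(clusters)):
            for b in range(a + 1, len(clusters)):
                # Single linkage: distance between the closest members.
                d = min(math.dist(points[i], points[j])
                        for i in clusters[a] for j in clusters[b])
                if best is None or d < best[0]:
                    best = (d, a, b)
        _, a, b = best
        clusters[a].extend(clusters[b])
        del clusters[b]
    return [sorted(c) for c in clusters]


points = [(0, 0), (0, 1), (5, 5), (5, 6), (10, 0)]
three = single_linkage(points, 3)
two = single_linkage(points, 2)
```

Stopping at different values of `n_clusters` corresponds to cutting the hierarchy at different levels. The nested loops make this sketch far too slow for real data sets; it is meant only to show the merge step.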
**Centroid clustering**
This popular algorithm requires the choice of 'k', or the number of clusters to form, after which the algorithm determines the center point of a cluster and gathers data around that point. [K-means clustering](https://wikipedia.org/wiki/K-means_clustering) is a popular version of centroid clustering. Each point is assigned to the cluster with the nearest mean, hence the name. The squared distance from each point to its cluster's center is minimized.
-TODO: infographic
+![Centroid clustering Infographic](https://github.com/microsoft/ML-For-Beginners/blob/main/5-Clustering/images/Centroid%20Clustering.png)
+> Infographic by [Dasani Madipalli](https://twitter.com/dasani_decoded)
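To make the centroid idea concrete, here is a minimal sketch of the usual K-means loop (assign each point to the nearest center, then move each center to the mean of its points). It assumes Euclidean distance and, purely to stay deterministic, uses the first `k` points as the starting centers; real libraries typically choose starting centers more carefully. The function names are hypothetical.

```python
import math


def k_means(points, k, max_iter=100):
    """Return (centroids, labels) after iterating until labels stop changing."""
    # Assumption for this sketch: the first k points seed the centroids.
    centroids = [tuple(p) for p in points[:k]]
    labels = None
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster with the nearest mean.
        new_labels = [min(range(k), key=lambda c: math.dist(p, centroids[c]))
                      for p in points]
        if new_labels == labels:
            break
        labels = new_labels
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = tuple(sum(x) / len(members)
                                     for x in zip(*members))
    return centroids, labels


def inertia(points, centroids, labels):
    """Sum of squared distances from each point to its assigned centroid."""
    return sum(math.dist(p, centroids[lab]) ** 2
               for p, lab in zip(points, labels))


points = [(1, 1), (9, 9), (1, 2), (2, 1), (8, 9), (9, 8)]
centroids, labels = k_means(points, k=2)
total = inertia(points, centroids, labels)
```

The `inertia` helper computes the quantity the loop tries to reduce. Because the result depends on the starting centers, a poor choice can leave the algorithm in a worse grouping than the best one available.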
**Distribution-based clustering**
