Welcome to this course on classic machine learning for beginners! If you are totally new to this topic, you're very welcome. If you are an experienced ML practitioner looking to brush up on an area, you're equally welcome. We want to create a friendly launching spot for your ML learning and would be happy to evaluate, respond to, and incorporate your [feedback](https://github.com/microsoft/ML-For-Beginners/discussions).
[![Introduction to ML](https://img.youtube.com/vi/h0e2HAPTGF4/0.jpg)](https://youtu.be/h0e2HAPTGF4 "Introduction to ML")
> 🎥 Click the image above for a video: MIT's John Guttag introduces Machine Learning
@ -14,11 +14,35 @@ Describe what will be covered
Before embarking on this curriculum, you need to have your computer set up and ready to run notebooks locally. Learn more about how to do this in this [set of videos](https://www.youtube.com/playlist?list=PLlrxD0HtieHhS8VzuMCfQD4uJ9yne1mE6)
It's also recommended to grasp the basics of Python, which we use in this course. Familiarize yourself with Scikit-Learn as well.
In today's era of Industrial Revolution 4.0, Machine Learning is one of the most popular and frequently used terms. If you have any familiarity with technology, whatever your working domain, you have very likely heard it at least once. Yet Machine Learning remains a mystery to most people, and for a beginner the subject can feel overwhelming. It is therefore important to understand what Machine Learning actually is and what motivates it.
We live in a universe full of mysteries. Great scientists such as Stephen Hawking, Albert Einstein, and many others devoted their lives to searching for meaningful information that uncovers the mysteries of the world around us. A human child likewise learns to make sense of the world year by year while growing into an adult. The human brain is not preprogrammed: we do not come into the world with a fixed set of knowledge.
Instead, a child's brain perceives its surroundings and learns the hidden patterns in different objects, which helps it build logical rules for identifying those objects the next time. This learning process makes humans the most sophisticated living creatures in the world. Learning continuously by discovering hidden patterns in our environment, and improving throughout our lifetime, is related to a concept called brain plasticity. Superficially, we can draw some motivational similarities between the learning process of the human brain and the concepts of Machine Learning.
The human brain perceives things from the real world, processes the perceived information, makes rational decisions, and performs actions based on circumstances. This is what we call behaving intelligently. When we artificially program this intelligent behavior into a machine, it is called Artificial Intelligence. Machine Learning, an important subset of Artificial Intelligence, is concerned with extracting meaningful information and finding hidden patterns in perceived data to support the rational decision-making process.
In this curriculum, we cover only the core concepts of Machine Learning that a beginner must know. Topics such as Artificial Intelligence as a whole and Deep Neural Networks are outside the scope of this learning module. However, a strong fundamental knowledge of Machine Learning is indispensable for understanding Artificial Intelligence or Deep Learning more broadly.
The major motivation of Machine Learning is to create automated systems that learn hidden patterns from data and use them to make intelligent decisions, loosely inspired by how the human brain learns from the data it perceives from the outside world.
Machine Learning is now applied almost everywhere. Given the immense potential of state-of-the-art Machine Learning algorithms, researchers have been exploring their ability to solve multi-dimensional, multi-disciplinary real-life problems, with very positive outcomes. Natural Language Processing (NLP) and Computer Vision (CV) are two such areas: NLP is concerned with processing text or speech data to understand human languages, and CV with interpreting images captured from the real world.
Diagnosing a disease such as breast cancer from a patient's medical history or reports, understanding climate change from historical weather data to predict natural calamities, automatically identifying a person in an image, understanding the sentiment of a text, and detecting fake news to stop the spread of propaganda are some common recent use cases of Machine Learning. Finance, economics, earth science, space exploration, biomedicine, cognitive science, and even fields in the humanities have adopted Machine Learning to solve arduous problems in their domains.
Machine Learning automates the process of automation by finding meaningful insights in real-world or generated data. In this context, a bright future for Machine Learning is not far off. In the near future, knowledge of Machine Learning is likely to become essential for people in almost every domain, because the technology is already so widely used.
@ -58,7 +58,7 @@ Deepen your understanding of Clustering techniques in this [Learn module](https:
>
>
>'Flat' in this context refers to Euclidean geometry (parts of which are taught as 'plane' geometry), and non-flat refers to non-Euclidean geometry. What does geometry have to do with machine learning? Well, because both fields are rooted in mathematics, there must be a common way to measure distances between points in clusters, and that can be done in a 'flat' or 'non-flat' way, depending on the nature of the data. [Euclidean distances](https://wikipedia.org/wiki/Euclidean_distance) are measured as the length of a line segment between two points. [Non-Euclidean distances](https://wikipedia.org/wiki/Non-Euclidean_geometry) are measured along a curve. If your data, when visualized, does not seem to lie on a plane, you might need to use a specialized algorithm to handle it.
>
>
![Flat vs Nonflat Geometry Infographic](https://github.com/microsoft/ML-For-Beginners/blob/main/5-Clustering/images/Flat%20Vs%20Nonflat%20Geometry.png)
![Flat vs Nonflat Geometry Infographic](./images/flat-nonflat.png)
> Infographic by [Dasani Madipalli](https://twitter.com/dasani_decoded)
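
To make the 'flat' versus 'non-flat' distinction concrete, here is a minimal sketch that measures the distance between the same two points in both ways. It assumes, purely for illustration, that the points sit on a unit sphere: the 'flat' distance is the straight chord between them, and the 'non-flat' distance is the great-circle arc along the surface. The function names and the sphere example are assumptions for this sketch, not part of the lesson's notebooks.

```python
import math


def euclidean_distance(p, q):
    """'Flat' distance: length of the straight line segment between p and q."""
    return math.dist(p, q)


def great_circle_distance(lat1, lon1, lat2, lon2, radius=1.0):
    """'Non-flat' distance: arc length along a sphere (haversine formula).

    Latitudes and longitudes are given in degrees.
    """
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = (math.sin(d_phi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2)
    return 2 * radius * math.asin(math.sqrt(a))


def to_cartesian(lat, lon, radius=1.0):
    """Convert latitude/longitude in degrees to 3D coordinates on the sphere."""
    phi, lam = math.radians(lat), math.radians(lon)
    return (radius * math.cos(phi) * math.cos(lam),
            radius * math.cos(phi) * math.sin(lam),
            radius * math.sin(phi))


# Two points on the equator of a unit sphere, a quarter turn apart.
chord = euclidean_distance(to_cartesian(0, 0), to_cartesian(0, 90))
arc = great_circle_distance(0, 0, 0, 90)

print(f"straight-line (Euclidean) distance: {chord:.4f}")
print(f"along-the-surface distance:         {arc:.4f}")
```

The two numbers differ because the arc follows the curved surface while the chord cuts through it. A clustering algorithm that assumes flat distances would treat such data as closer together than it really is along the surface.
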
@ -83,14 +83,14 @@ There are over 100 clustering algorithms, and their use depends on the nature of
If an object is classified by its proximity to a nearby object, rather than to one farther away, clusters are formed based on their members' distance to and from other objects. Scikit-Learn's Agglomerative clustering is hierarchical.
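
The idea of grouping objects by proximity can be sketched in a few lines of plain Python. This is a simplified illustration that assumes single linkage (the distance between two clusters is the distance between their closest members); it is not how Scikit-Learn implements `sklearn.cluster.AgglomerativeClustering`, and the function name and sample points are made up for this example.

```python
import math


def agglomerative_clusters(points, n_clusters):
    """Repeatedly merge the two closest clusters until n_clusters remain.

    Returns a list of clusters, each a sorted list of point indices.
    """
    if n_clusters < 1:
        raise ValueError("n_clusters must be at least 1")

    # Start with every point in its own cluster.
    clusters = [[i] for i in range(len(points))]

    def linkage(a, b):
        # Single linkage: distance between the closest pair of members.
        return min(math.dist(points[i], points[j]) for i in a for j in b)

    while len(clusters) > n_clusters:
        best = None  # (distance, index of first cluster, index of second)
        for i in range(len(clusters)):
            for j in range(i + 1, len(clusters)):
                d = linkage(clusters[i], clusters[j])
                if best is None or d < best[0]:
                    best = (d, i, j)
        _, i, j = best
        clusters[i] = clusters[i] + clusters[j]  # merge j into i
        del clusters[j]

    return [sorted(c) for c in clusters]


# Hypothetical data: two visibly separate groups of three points each.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
clusters = agglomerative_clusters(points, n_clusters=2)
print(clusters)
```

The sequence of merges forms the hierarchy: stopping earlier gives many small clusters, and stopping later gives a few large ones. This brute-force version rechecks every pair on each pass, so it is only suitable for small examples.
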
> Infographic by [Dasani Madipalli](https://twitter.com/dasani_decoded)
**Centroid clustering**
This popular approach requires choosing 'k', the number of clusters to form, after which the algorithm determines the center point of each cluster and gathers data around that point. [K-means clustering](https://wikipedia.org/wiki/K-means_clustering) is a popular version of centroid clustering. Each point is assigned to the cluster with the nearest mean (its centroid), thus the name. The algorithm minimizes the squared distance between each point and the center of its cluster.
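
The assign-then-update loop behind K-means can be sketched in plain Python as follows. This is a minimal illustration, assuming hand-picked starting centers so the result is reproducible; library implementations such as `sklearn.cluster.KMeans` choose starting centers more carefully and are the better choice for real data. The function name, the sample points, and the starting centers are all hypothetical.

```python
import math


def k_means(points, initial_centers, max_iter=100):
    """Simplified K-means: alternate assignment and update steps.

    Returns (labels, centers), where labels[i] is the index of the center
    nearest to points[i].
    """
    centers = [tuple(c) for c in initial_centers]
    labels = []
    for _ in range(max_iter):
        # Assignment step: each point joins the cluster with the nearest mean.
        labels = [
            min(range(len(centers)), key=lambda k: math.dist(p, centers[k]))
            for p in points
        ]

        # Update step: move each center to the mean of its members.
        new_centers = []
        for k in range(len(centers)):
            members = [p for p, label in zip(points, labels) if label == k]
            if members:
                new_centers.append(
                    tuple(sum(coord) / len(members) for coord in zip(*members))
                )
            else:
                new_centers.append(centers[k])  # keep an empty cluster's center

        if new_centers == centers:  # nothing moved, so the loop has converged
            break
        centers = new_centers
    return labels, centers


def inertia(points, labels, centers):
    """Sum of squared distances from each point to its assigned center."""
    return sum(math.dist(p, centers[k]) ** 2 for p, k in zip(points, labels))


# Hypothetical data: two groups, with k = 2 chosen up front.
points = [(1.0, 1.0), (1.2, 0.8), (0.9, 1.1),
          (8.0, 8.0), (8.2, 7.9), (7.8, 8.1)]
labels, centers = k_means(points, initial_centers=[(0.0, 0.0), (10.0, 10.0)])

print(labels)
print(centers)
print(inertia(points, labels, centers))
```

The `inertia` value is the quantity the algorithm tries to make small. Running the sketch with different values of 'k' and comparing the resulting inertia is one common way to judge how many clusters the data supports.
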