diff --git a/.github/ISSUE_TEMPLATE/lesson-card.md b/.github/ISSUE_TEMPLATE/lesson-card.md new file mode 100644 index 00000000..71ebeba6 --- /dev/null +++ b/.github/ISSUE_TEMPLATE/lesson-card.md @@ -0,0 +1,15 @@ +--- +name: Lesson Card +about: Add a Lesson Card +title: "[LESSON]" +labels: '' +assignees: '' + +--- + +- [ ] quiz 1 +- [ ] written content +- [ ] quiz 2 +- [ ] challenge +- [ ] extra reading +- [ ] assignment diff --git a/.github/ISSUE_TEMPLATE/lesson_elements.md b/.github/ISSUE_TEMPLATE/lesson_elements.md new file mode 100644 index 00000000..eef18922 --- /dev/null +++ b/.github/ISSUE_TEMPLATE/lesson_elements.md @@ -0,0 +1,6 @@ +- [ ] quiz 1 +- [ ] written content +- [ ] quiz 2 +- [ ] challenge +- [ ] extra reading +- [ ] assignment diff --git a/Clustering/1-Visualize/README.md b/Clustering/1-Visualize/README.md index 5a8093d0..2f50b043 100644 --- a/Clustering/1-Visualize/README.md +++ b/Clustering/1-Visualize/README.md @@ -1,12 +1,13 @@ # Introduction to Clustering +Clustering is a type of [Unsupervised Learning](https://wikipedia.org/wiki/Unsupervised_learning) that presumes that a dataset is unlabelled. It uses various algorithms to sort through unlabeled data and provide groupings according to patterns it discerns in the data. + [![No One Like You by PSquare](https://img.youtube.com/vi/ty2advRiWJM/0.jpg)](https://youtu.be/ty2advRiWJM "No One Like You by PSquare") > While you're studying Machine Learning with Clustering, enjoy some Nigerian Dance Hall tracks - this is a highly rated song from 2014 by PSquare. ## [Pre-lecture quiz](link-to-quiz-app) -### Introduction -Clustering is a type of [Unsupervised Learning](https://wikipedia.org/wiki/Unsupervised_learning) that presumes that a dataset is unlabelled. It uses various algorithms to sort through unlabeled data and provide groupings according to patterns it discerns in the data. 
+### Introduction > TODO infographic diff --git a/NLP/1-Introduction-to-NLP/README.md b/NLP/1-Introduction-to-NLP/README.md index dcc0178c..bd8a7e95 100644 --- a/NLP/1-Introduction-to-NLP/README.md +++ b/NLP/1-Introduction-to-NLP/README.md @@ -1,13 +1,14 @@ # Introduction to Natural Language Processing - -Add a sketchnote if possible/appropriate + +This lesson covers a brief history and important concepts of *Computational Linguistics* focusing on *Natural Language Processing*. [![NLP 101](https://img.youtube.com/vi/C75SiVhXjRM/0.jpg)](https://youtu.be/C75SiVhXjRM "NLP 101") + ## [Pre-lecture quiz](link-to-quiz-app) ## Introduction -This lesson covers a brief history and important concepts of *Computational Linguistics* focusing on *Natural Language Processing*. NLP, as it is commonly known, is one of the best-known areas where machine learning has been applied and used in production software. +NLP, as it is commonly known, is one of the best-known areas where machine learning has been applied and used in production software. ✅ Can you think of software that you use every day that probably has some NLP embedded? What about your word processing programs or mobile apps that you use regularly? @@ -20,6 +21,7 @@ This is possible because someone wrote a computer program to do this. A few deca At this point, you may be remembering school classes where the teacher covered the parts of grammar in a sentence. In some countries, students are taught grammar and linguistics as a dedicated subject, but in many, these topics are included as part of learning a language: either your first language in primary school (learning to read and write) and perhaps a second language in post-primary, or high school. Don't worry if you are not an expert at differentiating nouns from verbs or adverbs from adjectives! If you struggle with the difference between the *simple present* and *present progressive*, you are not alone. 
This is a challenging thing for many people, even native speakers of a language. The good news is that computers are really good at applying formal rules, and you will learn to write code that can *parse* a sentence as well as a human. The greater challenge you will examine later is understanding the *meaning*, and *sentiment*, of a sentence. + ## Prerequisites For this lesson, the main prerequisite is being able to read and understand the language of this lesson. There are no math problems or equations to solve. While the original author wrote this lesson in English, it is also translated into other languages, so you could be reading a translation. There are examples where a number of different languages are used (to compare the different grammar rules of different languages). These are *not* translated, but the explanatory text is, so the meaning should be clear. diff --git a/NLP/2-Tasks/README.md b/NLP/2-Tasks/README.md index 1a9bc2de..db80d056 100644 --- a/NLP/2-Tasks/README.md +++ b/NLP/2-Tasks/README.md @@ -1,14 +1,12 @@ # Common Natural Language Processing Tasks and Techniques -Add a sketchnote if possible/appropriate - -![Embed a video here if available](video-url) +For most *Natural Language Processing* tasks, the text to be processed must be broken down, examined, and the results stored or cross referenced with rules and data sets. This allows the programmer to derive the meaning or intent or only the frequency of terms and words in a text. ## [Pre-lecture quiz](link-to-quiz-app) -For most *Natural Language Processing* tasks, the text to be processed must be broken down, examined, and the results stored or cross referenced with rules and data sets. This allows the programmer to derive the meaning or intent or only the frequency of terms and words in a text. Let's discover common techniques used in processing text. Combined with machine learning, these techniques help you to analyse large amounts of text efficiently. 
Before applying ML to these tasks, however, let's understand the problems encountered by an NLP specialist. + ## Tasks common to NLP > 🎓 **Tokenization** @@ -117,6 +115,7 @@ np = user_input_blob.noun_phrases ``` > What's going on here? [ConllExtractor](https://textblob.readthedocs.io/en/dev/api_reference.html?highlight=Conll#textblob.en.np_extractors.ConllExtractor) is "A noun phrase extractor that uses chunk parsing trained with the ConLL-2000 training corpus." ConLL-2000 refers to the Conference on Computational Natural Language Learning (CoNLL-2000). Each year the conference hosted a workshop to tackle a thorny NLP problem, and in 2000 it was noun chunking. A model was trained on the Wall Street Journal, with "sections 15-18 as training data (211727 tokens) and section 20 as test data (47377 tokens)". You can look at the procedures used [here](https://www.clips.uantwerpen.be/conll2000/chunking/) and the [results](https://ifarm.nl/erikt/research/np-chunking.html). + ## Task: Improving your bot with a little NLP In the previous lesson you built a very simple Q&A bot. Now, you'll make Marvin a bit more sympathetic by analyzing your input for sentiment and printing out a response to match the sentiment. You'll also need to identify a `noun_phrase` and ask about it. @@ -170,9 +169,11 @@ One possible solution to the task is [here](solution/bot.py) 1. Do you think the sympathetic responses would 'trick' someone into thinking that the bot actually understood them? 2. Does identifying the noun phrase make the bot more 'believable'? 3. Why would extracting a 'noun phrase' from a sentence a useful thing to do? + ## 🚀Challenge Take a task in the prior knowledge check and try to implement it. Test the bot on a friend. Can it trick them? Can you make your bot more 'believable?' 
+ ## [Post-lecture quiz](link-to-quiz-app) ## Review & Self Study diff --git a/NLP/3-Translation-Sentiment/README.md b/NLP/3-Translation-Sentiment/README.md index 8b3458f2..70f9ffad 100644 --- a/NLP/3-Translation-Sentiment/README.md +++ b/NLP/3-Translation-Sentiment/README.md @@ -1,13 +1,9 @@ # Translation and Sentiment Analysis with ML -Add a sketchnote if possible/appropriate - -![Embed a video here if available](video-url) +In the previous lessons you learned how to build a basic bot using TextBlob, a library that embeds ML behind-the-scenes to perform basic NLP tasks such as noun phrase extraction. Another important challenge in computational linguistics is accurate *translation* of a sentence from one spoken or written language to another. ## [Pre-lecture quiz](link-to-quiz-app) -In the previous lessons you learned how to build a basic bot using TextBlob, a library that embeds ML behind-the-scenes to perform basic NLP tasks such as noun phrase extraction. Another important challenge in computational linguistics is accurate *translation* of a sentence from one spoken or written language to another. - This is a very hard problem compounded by the fact that there are thousands of languages and each can have very different grammar rules. One approach is to convert the formal grammar rules for one language, such as English, into a non-language dependent structure, and then translate it by converting back to another language. This means that you would take the following steps: 1. Identify or tag the words in input language into nouns, verbs etc. @@ -55,6 +51,7 @@ In this case, the translation informed by ML does a better job than the human tr > What's going on here? and why is TextBlob so good at translation? Well, behind the scenes, it's using Google translate, a sophisticated AI able to parse millions of phrases to predict the best strings for the task at hand. There's nothing manual going on here and you need an internet connection to use `blob.translate`. 
✅ Try some more sentences. Which is better, ML or human translation? In which cases? + ## Sentiment analysis Another area where machine learning can work very well is sentiment analysis. A non-ML approach to sentiment is to identify words and phrases which are 'positive' and 'negative'. Then, given a new piece of text, calculate the total value of the positive, negative and neutral words to identify the overall sentiment. @@ -72,6 +69,7 @@ The ML approach would be to hand gather negative and positive bodies of text - t > One way to achieve that is to use Machine Learning. You would train the model with a portion of the *against* emails and a portion of the *for* emails. The model would tend to associate phrases and words with the against side and the for side, *but it would not understand any of the content*, only that certain words and patterns were more likely to appear in an *against* or a *for* email. You could test it with some emails that you had not used to train the model, and see if it came to the same conclusion as you did. Then, once you were happy with the accuracy of the model, you could process future emails without having to read each one. ✅ Does this process sound like processes you have used in previous lessons? + ### Task: Sentimental Sentences Sentiment is measured in with a *polarity* of -1 to 1, meaning -1 is the most negative sentiment, and 1 is the most positive. Sentiment is also measured with an 0 - 1 score for objectivity (0) and subjectivity (1). 
diff --git a/TimeSeries/1-Introduction/README.md b/TimeSeries/1-Introduction/README.md index 9380dabb..e2aab22e 100644 --- a/TimeSeries/1-Introduction/README.md +++ b/TimeSeries/1-Introduction/README.md @@ -1,13 +1,14 @@ # Introduction to Time Series Forecasting -[![Introduction to Time Series Forecasting](https://img.youtube.com/vi/wGUV_XqchbE/0.jpg)](https://youtu.be/wGUV_XqchbE "Introduction to Time Series Forecasting") -## [Pre-lecture quiz](link-to-quiz-app) +In this lesson and the following one, you will learn a bit about Time Series Forecasting, an interesting and valuable part of a ML scientist's repertoire that is a bit lesser known than other topics. Time Series Forecasting is a sort of crystal ball: based on past performance of a variable such as price, you can predict its future potential value. -In this lesson and the following one, you will learn a bit about Time Series Forecasting, an interesting and valuable part of a ML scientist's repertoire that is a bit lesser known than other topics. Time Series Forecasting is a sort of crystal ball: based on past performance of a variable such as price, you can predict its future potential value. +[![Introduction to Time Series Forecasting](https://img.youtube.com/vi/wGUV_XqchbE/0.jpg)](https://youtu.be/wGUV_XqchbE "Introduction to Time Series Forecasting") +## [Pre-lecture quiz](link-to-quiz-app) It's a powerful and interesting field especially in business, given its direct application to problems of value, pricing, inventory, and supply chain issues. While deep learning techniques have started to be used to gain more insights in the prediction of future performance, Time Series Forecasting remains a field greatly informed by classic ML techniques. > Penn State's useful Time Series curriculum can be found [here](https://online.stat.psu.edu/stat510/lesson/1) + ### Introduction Supposing you maintain an array of smart parking meters that provide data about how often they are used and for how long over time. 
What if you could generate revenue to maintain your streets by slightly augmenting the prices of the meters when there is greater demand for them? What if you could predict, based on the meter's past performance, its future value according to the laws of supply and demand? This is a challenge that could be tackled by Time Series Forecasting. It wouldn't make those folks in search of a rare parking spot in busy times very happy to have to pay more for it, but it would be a sure way to generate revenue to clean the streets! @@ -19,14 +20,17 @@ Before starting, however, it's useful to understand what's going on behind the s ## Some Definitions When encountering the term 'time series' you need to understand its use in several different contexts. + ### Time Series In mathematics, "a time series is a series of data points indexed (or listed or graphed) in time order. Most commonly, a time series is a sequence taken at successive equally spaced points in time." An example of a time series is the daily closing value of the [Dow Jones Industrial Average](https://wikipedia.org/wiki/Time_series). The use of time series plots and statistical modeling is frequently encountered in signal processing, weather forecasting, earthquake prediction, and other fields where events occur and data points can be plotted over time. + ### Time Series Analysis Time Series Analysis is the analysis of the above mentioned time series data. Time series data can take distinct forms, including 'interrupted time series' which detects patterns in a time series' evolution before and after an interrupting event. The type of analysis needed for the time series depends on the nature of the data. Time series data itself can take the form of series of numbers or characters. The analysis be performed using a variety of methods, including frequency-domain and time-domain, linear and nonlinear, and more. 
[Learn more](https://www.itl.nist.gov/div898/handbook/pmc/section4/pmc4.htm) about the may ways to analyze this type of data. + ### Time Series Forecasting Time Series Forecasting is the use of a model to predict future values based on patterns displayed by previously gathered data as it occurred in the past. While it is possible to use regression models to explore time series data, with time indices as x variables on a plot, this type of data is best analyzed using special types of models. @@ -55,11 +59,12 @@ In the next lesson, you will build an ARIMA model using [Univariate Time Series] | 330.97 | 1975.96 | 1975 | 12 | ✅ Identify the variable that changes over time in this dataset + ## Time Series [data characteristics](https://online.stat.psu.edu/stat510/lesson/1/1.1) to consider When looking at time series data, you might notice that it has certain characteristics that you need to take into account and mitigate to better understand its patterns. If you consider time series data as potentially providing a 'signal' that you want to analyze, these characteristics can be thought of as 'noise'. You often will need to reduce this 'noise' by offsetting some of these characteristics using some statistical techniques. ### Trends -Measurable increases and decreases over time +Measurable increases and decreases over time. [Read more](https://machinelearningmastery.com/time-series-trends-in-python) about how to use and, if necessary, remove trends from your time series. ### [Seasonality](https://machinelearningmastery.com/time-series-seasonality-with-python/) Periodic fluctuations, such as holiday rushes that might affect sales, for example. [Take a look](https://itl.nist.gov/div898/handbook/pmc/section4/pmc443.htm) at how different types of plots display seasonality in data. 
### Outliers diff --git a/TimeSeries/2-ARIMA/README.md b/TimeSeries/2-ARIMA/README.md index aebdaaed..2bcd28af 100644 --- a/TimeSeries/2-ARIMA/README.md +++ b/TimeSeries/2-ARIMA/README.md @@ -1,15 +1,17 @@ # Time Series Forecasting with ARIMA +In the previous lesson, you learned a bit about Time Series Forecasting and loaded a dataset showing the fluctuations of electrical load over a time period. + [![Introduction to ARIMA](https://img.youtube.com/vi/IUSk-YDau10/0.jpg)](https://youtu.be/IUSk-YDau10 "Introduction to ARIMA") > A brief introduction to ARIMA models. The example is done in R, but the concepts are universal. ## [Pre-lecture quiz](link-to-quiz-app) -In the previous lesson, you learned a bit about Time Series Forecasting and loaded a dataset showing the fluctuations of electrical load over a time period. In this lesson, you will discover a specific way to build models with [ARIMA: *A*uto*R*egressive *I*ntegrated *M*oving *A*verage](https://wikipedia.org/wiki/Autoregressive_integrated_moving_average). ARIMA models are particularly suited to fit data that shows [non-stationarity](https://wikipedia.org/wiki/Stationary_process). +In this lesson, you will discover a specific way to build models with [ARIMA: *A*uto*R*egressive *I*ntegrated *M*oving *A*verage](https://wikipedia.org/wiki/Autoregressive_integrated_moving_average). ARIMA models are particularly suited to fit data that shows [non-stationarity](https://wikipedia.org/wiki/Stationary_process). > 🎓 Stationarity, from a statistical context, refers to data whose distribution does not change when shifted in time. Non-stationary data, then, shows fluctuations due to trends that must be transformed to be analyzed. Seasonality, for example, can introduce fluctuations in data and can be eliminated by a process of 'seasonal-differencing'. 
-> 🎓 [Differencing](https://wikipedia.org/wiki/Autoregressive_integrated_moving_average#Differencing) data, again from a statistical context, refers to the process of transforming non-stationary data to make it stationary by removing its non-constant trend. "Differencing removes the changes in the level of a time series, eliminating trend and seasonality and consequently stabilizing the mean of the time series."[Paper by Shixiong et al](https://arxiv.org/abs/1904.07632) +> 🎓 [Differencing](https://wikipedia.org/wiki/Autoregressive_integrated_moving_average#Differencing) data, again from a statistical context, refers to the process of transforming non-stationary data to make it stationary by removing its non-constant trend. "Differencing removes the changes in the level of a time series, eliminating trend and seasonality and consequently stabilizing the mean of the time series." [Paper by Shixiong et al](https://arxiv.org/abs/1904.07632) Let's unpack the parts of ARIMA to better understand how it helps us model Time Series and help us make predictions against it. ## AR - for AutoRegressive @@ -184,7 +186,7 @@ results = model.fit() print(results.summary()) ``` -TODO: Explain these results and show residuals +A table of results is printed. You've built your first model! Now we need to find a way to evaluate it. @@ -345,11 +347,13 @@ plt.show() ``` A very nice plot, showing a model with good accuracy. Well done! + ## 🚀Challenge Dig into the ways to test the accuracy of a Time Series Model. We touch on MAPE in this lesson, but are there other methods you could use? Research them and annotate them. A helpful document can be found [here](https://otexts.com/fpp2/accuracy.html) ## [Post-lecture quiz](link-to-quiz-app) + ## Review & Self Study This lesson touches on only the basics of Time Series Forecasting with ARIMA. 
Take some time to deepen your knowledge by digging into [this repository](https://microsoft.github.io/forecasting/) and its various model types to learn other ways to build Time Series models. diff --git a/TimeSeries/2-ARIMA/solution/notebook.ipynb b/TimeSeries/2-ARIMA/solution/notebook.ipynb index a201d67f..62ebccaa 100644 --- a/TimeSeries/2-ARIMA/solution/notebook.ipynb +++ b/TimeSeries/2-ARIMA/solution/notebook.ipynb @@ -354,13 +354,7 @@ "print(results.summary())\n" ] }, - { - "cell_type": "markdown", - "metadata": {}, - "source": [ - "Next we display the distribution of residuals. A zero mean in the residuals may indicate that there is no bias in the prediction. " - ] - }, + { "cell_type": "markdown", "metadata": {}, @@ -747,4 +741,4 @@ }, "nbformat": 4, "nbformat_minor": 2 -} \ No newline at end of file +} diff --git a/Web-App/1-Web-App/assignment.md b/Web-App/1-Web-App/assignment.md index bee297f7..00dbee8a 100644 --- a/Web-App/1-Web-App/assignment.md +++ b/Web-App/1-Web-App/assignment.md @@ -8,4 +8,4 @@ Now that you have built one web app using a trained Regression model, use one of | Criteria | Exemplary | Adequate | Needs Improvement | | -------------------------- | --------------------------------------------------------- | --------------------------------------------------------- | -------------------------------------- | -| A new web app is presented | The web app runs as expected and is deployed to the cloud | The web app contains flaws or exhibits unexpected results | The web app does not function properly | +| | The web app runs as expected and is deployed to the cloud | The web app contains flaws or exhibits unexpected results | The web app does not function properly |