NLP!

4 years ago · e71506d8ea
parent 61205016e7
commit e71506d8ea
20 changed files with 161 additions and 170 deletions
--- a/NLP/1-Introduction-to-NLP/README.md
+++ b/NLP/1-Introduction-to-NLP/README.md
@@ -0,0 +1,127 @@
+# Introduction to Natural Language Processing
+
+Add a sketchnote if possible/appropriate
+
+TODO: We need a good video here!
+
+## [Pre-lecture quiz](link-to-quiz-app)
+
+## Introduction
+
+This lesson covers a brief history and important concepts of *Computational Linguistics*, focusing on *Natural Language Processing*. NLP, as it is commonly known, is one of the best-known areas where machine learning has been applied and used in production software.
+
+✅ Can you think of software that you use every day that probably has some NLP embedded? What about your word processing programs or mobile apps that you use regularly?
+
+You will learn how ideas about language developed and what the major areas of study have been. You will also learn definitions and concepts about how computers process text, including parsing, grammar, and identifying nouns and verbs. There are some coding tasks in this lesson, and several important concepts are introduced that you will learn to code in the next lessons.
+
+Computational linguistics is a field of research and development, spanning many decades, that studies how computers can work with, and even understand, translate, and communicate with languages. Natural Language Processing (NLP) is a related field focused on how computers can process 'natural', or human, languages. If you have ever dictated to your phone instead of typing, or asked a virtual assistant a question, your speech was converted into text and then processed, or *parsed*, from the language you spoke into a format that the phone or assistant could understand and act on.
+
+This is possible because programmers wrote software to do it. A few decades ago, some science fiction writers predicted that people would mostly speak to their computers, and that the computers would always understand exactly what they meant. Sadly, it turned out to be a harder problem than many imagined, and while it is much better understood today, there are significant challenges in achieving 'perfect' natural language processing when it comes to understanding the meaning of a sentence. It is particularly hard to understand humour, or to detect emotion and tone, such as sarcasm, in a sentence.
+
+At this point, you may be remembering school classes where the teacher covered the parts of grammar in a sentence. In some countries, students are taught grammar and linguistics as a dedicated subject, but in many, these topics are included as part of learning a language: your first language in primary school (learning to read and write) and perhaps a second language in post-primary (high) school. Don't worry if you are not an expert at differentiating nouns from verbs, or adverbs from adjectives!
+
+If you struggle with the difference between the *simple present* and *present progressive*, you are not alone. This is a challenging thing for many people, even native speakers of a language. The good news is that computers are really good at applying formal rules, and you will learn to write code that can *parse* a sentence as well as a human. The greater challenge you will examine later is understanding the *meaning*, and *sentiment*, of a sentence.
+
+## Prerequisites
+
+For this lesson, the main prerequisite is being able to read and understand the language of this lesson. There are no math problems or equations to solve. While the original author wrote this lesson in English, it is also translated into other languages, so you could be reading a translation. Some examples use several different languages to compare their grammar rules. These are *not* translated, but the explanatory text is, so the meaning should be clear.
+
+For the coding tasks, you will use Python; the examples use Python 3.8.
+
+In this section, you will need:
+* Working knowledge of the Python 3 programming language
+  * this lesson uses input, loops, file reading, and lists
+* Visual Studio Code with its Python extension
+  * (*or the Python IDE of your choice*)
+* [TextBlob](https://github.com/sloria/TextBlob), a simplified text processing library for Python
+  * Follow the instructions on the TextBlob site to install it on your system (install the corpora as well, as shown below)
+    ```bash
+    pip install -U textblob
+    python -m textblob.download_corpora
+    ```
+
+> 💡 Tip: You can run Python directly in VS Code environments. Check the [docs](https://code.visualstudio.com/docs/languages/python?WT.mc_id=academic-15963-cxa) for more information.
+
+## Conversing with Eliza
+
+The history of trying to make computers understand human language goes back decades, and one of the earliest scientists to consider natural language processing was *Alan Turing*. When Turing was researching *Artificial Intelligence* in the 1950s, he considered whether a conversational test could be given to a human and a computer (via typed correspondence) where the human in the conversation was not sure if they were conversing with another human or a computer. If, after a certain length of conversation, the human could not determine whether the answers came from a computer, could the computer be said to be *thinking*?
+
+The idea for this came from a party game called *The Imitation Game*, in which an interrogator is alone in a room and tasked with determining which of two people (in another room) is male and which is female. The interrogator can send notes, and must try to think of questions whose written answers reveal the gender of the mystery person. Of course, the players in the other room are trying to trick the interrogator by answering questions in such a way as to mislead or confuse the interrogator, whilst also giving the appearance of answering honestly.
+
+In the 1960s, an MIT scientist called *Joseph Weizenbaum* developed [*Eliza*](https://en.wikipedia.org/wiki/ELIZA), a computer 'therapist' that would ask the human questions and give the appearance of understanding their answers. However, while Eliza could parse a sentence and identify certain grammatical constructs and keywords so as to give a reasonable answer, it could not be said to *understand* the sentence. If Eliza was presented with a sentence following the format "**I am** <u>sad</u>", it might rearrange and substitute words in the sentence to form the response "How long have **you been** <u>sad</u>?". This gave the impression that Eliza understood the statement and was asking a follow-on question, whereas in reality, it was swapping the pronoun, changing the tense, and adding some words. If Eliza could not identify a keyword that it had a response for, it would instead give a random response that could be applicable to many different statements. Eliza could easily be tricked: for instance, if a user wrote "**You are** a <u>bicycle</u>", it might respond with "How long have **I been** a <u>bicycle</u>?" instead of a more reasoned response.
+
+> Note: You can read the original description of [Eliza](https://cacm.acm.org/magazines/1966/1/13317-elizaa-computer-program-for-the-study-of-natural-language-communication-between-man-and-machine/abstract) published in 1966 if you have an ACM account. Alternatively, read about Eliza on [Wikipedia](https://en.wikipedia.org/wiki/ELIZA).
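To make the keyword-and-substitution idea concrete, here is a minimal Python sketch of an Eliza-style reply. It is an illustration, not Weizenbaum's original program: the two regular-expression rules, the `REFLECTIONS` pronoun table, the `FALLBACKS` list, and the function names are hypothetical choices made for this example.

```python
import random
import re

# Hypothetical pronoun table: swap first and second person so the user's
# words can be echoed back from the bot's point of view. This is crude and
# ignores grammatical case (e.g. it cannot tell "you" as subject from object).
REFLECTIONS = {
    "i": "you", "me": "you", "my": "your", "am": "are",
    "you": "me", "your": "my", "yours": "mine",
}

# Generic fallbacks used when no keyword rule matches (illustrative only).
FALLBACKS = ["Please tell me more.", "I see. Do go on."]


def reflect(fragment):
    """Swap pronouns word by word, e.g. 'my exam' -> 'your exam'."""
    return " ".join(REFLECTIONS.get(word, word) for word in fragment.lower().split())


def eliza_reply(sentence, rng=random):
    """Apply the first matching keyword rule, or fall back to a generic reply."""
    text = sentence.strip().rstrip(".!?")
    # Rule 1: "I am X" -> "How long have you been X?"
    match = re.match(r"i am (.+)", text, re.IGNORECASE)
    if match:
        return f"How long have you been {reflect(match.group(1))}?"
    # Rule 2: "You are X" -> "How long have I been X?"
    match = re.match(r"you are (.+)", text, re.IGNORECASE)
    if match:
        return f"How long have I been {reflect(match.group(1))}?"
    # No keyword found: give a response that fits almost any statement
    return rng.choice(FALLBACKS)


print(eliza_reply("I am sad"))           # prints: How long have you been sad?
print(eliza_reply("You are a bicycle"))  # prints: How long have I been a bicycle?
```

Like Eliza, this sketch has no idea what a bicycle is; it only matches a pattern and echoes the rest of the sentence back with the pronouns swapped.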
+
+### Task: Coding a simple conversational bot
+
+A conversational bot, like Eliza, is a program that elicits user input and seems to understand and respond intelligently. Unlike Eliza, our bot will not have a set of rules giving it the appearance of having an intelligent conversation. Instead, our bot will have only one ability: to keep the conversation going with random responses that might work in just about any trivial conversation.
+
+Your steps when building a conversational bot:
+
+1. Print instructions advising the user how to interact with the bot
+2. Start a loop
+   1. Accept user input
+   2. If the user has asked to exit, then exit
+   3. Process user input and determine a response (in this case, the response is a random choice from a list of possible generic responses)
+   4. Print the response
+3. Loop back to step 2
+
+Create this bot yourself in Python with the following random responses:
+
+```python
+random_responses = ["That is quite interesting, please tell me more.",
+                    "I see. Do go on.",
+                    "Why do you say that?",
+                    "Funny weather we've been having, isn't it?",
+                    "Let's change the subject.",
+                    "Did you catch the game last night?"]
+```
+
+Here is some sample output to guide you (user input is on the lines starting with >):
+
+```
+Hello, I am Marvin, the simple robot.
+You can end this conversation at any time by typing 'bye'
+After typing each answer, press 'enter'
+How are you today?
+> I am good thanks
+That is quite interesting, please tell me more.
+> today I went for a walk     
+Did you catch the game last night?
+> I did, but my team lost
+Funny weather we've been having, isn't it?
+> yes but I hope next week is better
+Let's change the subject.
+> ok, lets talk about music
+Why do you say that?
+> because I like music!
+Why do you say that?
+> bye
+It was nice talking to you, goodbye!
+```
+
+One possible solution to this task is [here](solutions/bot.py).
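If you want something to compare your attempt with, the steps above can be sketched as a single function. This is a minimal sketch, not the linked solution: the `run_bot` name and its `read_input`, `write`, and `rng` parameters are hypothetical, added only so the loop can be exercised without typing at a keyboard.

```python
import random

random_responses = ["That is quite interesting, please tell me more.",
                    "I see. Do go on.",
                    "Why do you say that?",
                    "Funny weather we've been having, isn't it?",
                    "Let's change the subject.",
                    "Did you catch the game last night?"]


def run_bot(read_input=input, write=print, rng=random):
    """Run the conversation loop until the user types 'bye'.

    read_input, write and rng are injectable (hypothetical parameters) so the
    loop can be tested; by default they use input(), print() and the random module.
    """
    # Step 1: print instructions advising the user how to interact with the bot
    write("Hello, I am Marvin, the simple robot.")
    write("You can end this conversation at any time by typing 'bye'")
    write("After typing each answer, press 'enter'")
    write("How are you today?")

    # Step 2: start a loop
    while True:
        user_input = read_input("> ")            # 2.1 accept user input
        if user_input.strip().lower() == "bye":  # 2.2 exit if the user asked to
            break
        # 2.3 determine a response: a random choice, ignoring what was typed
        response = rng.choice(random_responses)
        write(response)                          # 2.4 print it, then loop back

    write("It was nice talking to you, goodbye!")
```

Calling `run_bot()` with no arguments starts an interactive session using `input()` and `print()`. Because each reply is chosen at random, your output will not match the sample above line for line.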
+
+✅ Stop and consider
+1. Do you think the random responses would 'trick' someone into thinking that the bot actually understood them?
+2. What features would the bot need to be more effective?
+3. If a bot could really 'understand' the meaning of a sentence, would it need to 'remember' the meaning of previous sentences in a conversation too?
+
+## 🚀Challenge
+
+Choose one of the "stop and consider" elements above and either try to implement them in code or write a solution on paper using pseudocode.
+
+In the next lesson, you'll learn about a number of other approaches to parsing natural language and machine learning.
+
+### [Post-lecture quiz](link-to-quiz-app)
+
+## Review & Self Study
+
+Take a look at the references below as further reading opportunities.
+
+### References
+
+1. Schubert, Lenhart, "Computational Linguistics", *The Stanford Encyclopedia of Philosophy* (Spring 2020 Edition), Edward N. Zalta (ed.), URL = <https://plato.stanford.edu/archives/spr2020/entries/computational-linguistics/>.
+2. Princeton University, "About WordNet", *WordNet*, Princeton University, 2010, URL = <https://wordnet.princeton.edu/>.
+
+**Assignment**: [Make a Bot talk back](assignment.md)
--- a/NLP/1-Introduction-to-NLP/assignment.md
+++ b/NLP/1-Introduction-to-NLP/assignment.md
@@ -0,0 +1,11 @@
+# Make a Bot talk back
+
+## Instructions
+
+In this lesson, you programmed a basic bot to chat with. This bot gives random answers until you say 'bye'. Can you make the answers a little less random, and trigger specific answers when you say specific things, like 'why' or 'how'? Think a bit about how machine learning might make this type of work less manual as you extend your bot.
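One possible starting point, sketched under stated assumptions: look for trigger words in the user's input and fall back to a random answer when none is found. The `keyword_responses` table, its replies, and the `choose_response` function are hypothetical examples, not part of the lesson's code.

```python
import random

# Hypothetical trigger table: if one of these words appears in the input,
# reply with the matching canned answer instead of a random one.
keyword_responses = {
    "why": "Why do you think that is?",
    "how": "How would you go about it yourself?",
    "music": "What kind of music do you enjoy?",
}

# A shortened version of the lesson's random responses, used as a fallback.
random_responses = ["That is quite interesting, please tell me more.",
                    "I see. Do go on."]


def choose_response(user_input, rng=random):
    """Return the reply for the first trigger word found, else a random reply."""
    # Lower-case and strip surrounding punctuation so "Why?" matches "why"
    words = [w.strip(".,!?'\"") for w in user_input.lower().split()]
    for word in words:
        if word in keyword_responses:
            return keyword_responses[word]
    # No trigger word: keep the original random behaviour
    return rng.choice(random_responses)
```

Every trigger word and reply in a table like this has to be written by hand, which is the kind of manual work the instructions ask you to think about replacing with machine learning.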
+
+## Rubric
+
+| Criteria | Exemplary                                     | Adequate                                         | Needs Improvement       |
+| -------- | --------------------------------------------- | ------------------------------------------------ | ----------------------- |
+|          | A new bot.py file is presented and documented | A new bot file is presented but it contains bugs | A file is not presented |
--- a/NLP/1-Introduction/solutions/lesson1_task1.py
+++ b/NLP/1-Introduction/solutions/lesson1_task1.py
--- a/NLP/1-Introduction-to-NLP/translations/README.es.md
+++ b/NLP/1-Introduction-to-NLP/translations/README.es.md
--- a/NLP/1-Introduction/README.md
+++ b/NLP/1-Introduction/README.md
@@ -1,4 +1,4 @@
-# Natural Language Processing
+# Common Natural Language Processing Tasks and Techniques

 Add a sketchnote if possible/appropriate

@@ -6,100 +6,6 @@ Add a sketchnote if possible/appropriate

 ## [Pre-lecture quiz](link-to-quiz-app)

-## Introduction
-
-This section covers a brief history and important concepts of *Computational Linguistics* focusing on *Natural Language Processing*. 
-You will learn about how the ideas about languages developed and what the major areas of study have been. 
-You will also learn definitions and concepts about how computers process text, including parsing, grammar, and identifying nouns and verbs. There are some coding tasks in this lesson, and several important concepts are introduced that you will learn to code later on in the next lessons. 
-
-Computational linguistics is an area of research and development over many decades that studies how computers can work with, and even understand, translate, and communicate with languages. Natural Language Processing (NLP) is a related field focused on how computers can process 'natural', or human, languages. If you have ever dictated to your phone instead of typing, or asked a virtual assistant a question, your speech was converted into a text form and then processed or *parsed* from the language you spoke into a format that the phone or assistant could understand and act on. This is possible because a coder wrote a program to do this. A few decades ago, some science fiction writers predicted that people would mostly speak to their computers, and the computers would always understand exactly what they meant. Sadly, it turned out to be a harder problem that many imagined, and while it is a much better understood problem today, there are significant challenges in achieving 'perfect' natural language processing when it comes to understanding the meaning of a sentence. This is a particularly hard problem when it comes to understanding humour, or detecting emotions such as sarcasm in a sentence.
-
-At this point, you may be remembering school classes where the teacher covered the parts of grammar in a sentence. In some countries, students are taught grammar and linguistics as a dedicated subject, but in many, these topics are included as part of learning a language: either your first language in primary school (learning to read and write) and perhaps a second language in post-primary, or high school. Don't  worry if you are not an expert at differentiating nouns from verbs, or adverbs from adjectives! If you struggle with the difference between the *simple present* and *present progressive*, you are not alone. This is a challenging thing for many people, even native speakers of a language. The good news is that computers are really good at applying formal rules, and you will learn to write code that can *parse* a sentence as well as a human. The greater challenge you will examine later is understanding the *meaning*, and *sentiment*, of a sentence.
-
-## Prerequisites
-
-For this lesson, the main prerequisite is being able to read and understand the language of this lesson. There are no maths or equations to handle. While I happen to be writing this lesson in English, it is also translated into other languages, so you could be reading a translation. There are examples where a number of different languages are used (to compare the different grammar rules of different languages). These are *not* translated, but the explanatory text is, so the meaning should be clear.
-
-For the coding tasks, you will use Python and the examples are using Python 3.8.
-
-In this section, you will need:
-* Python 3 programming language 
-  * this lesson uses input, loops, file reading, arrays
-* Visual Studio Code & the Python extension
-  * (*or the Python IDE of your choice*)
-* [TextBlob](https://github.com/sloria/TextBlob) a simplified text processing library for Python
-  * Follow the instructions on the TextBlob site to install it on your system (install the corpora as well as shown below)
-    ```bash
-    pip install -U textblob
-    python -m textblob.download_corpora
-    ```
-
-## Conversing with Eliza
-
-The history of trying to make computers understand human language goes back decades, and one of the earliest scientists to consider natural language processing was *Alan Turing*. When Turing was researching *Artificial Intelligence* in the 1950's, he considered if a conversational test could be given to a human and computer (via typed correspondence) where the human in the conversation was not sure if they were conversing with another human or a computer. If, after a certain length of conversation, the human could not determine that the answers were from a computer or not, then could the computer be said to be *thinking*? The idea for this came from a party game called *The Imitation Game* where an interrogator is alone in a room and tasked with determining which of two people (in another room) are male and female respectively. The interrogator can send notes, and must try to think of questions where the written answers reveal the gender of the mystery person. Of course, the players in the other room are trying to trick the interrogator by answering questions in such as way as to mislead or confuse the interrogator, whilst also giving the appearance of answering honestly.
-
-In the 1960's an MIT scientist called *Joseph Weizenbaum* developed [*Eliza*](https://en.wikipedia.org/wiki/ELIZA), a computer 'therapist' that would ask the human questions and give the appearance of understanding their answers. However, while Eliza could parse a sentence and identify certain grammatical constructs and keywords so as to give a reasonable answer, it could not be said to *understand* the sentence. If Eliza was presented with a sentence following the format "**I am** <u>sad</u>" it might rearrange and substitute words in the sentence to form the response "How long have **you been** <u>sad</u>". This gave the impression that Eliza understood the statement and was asking a follow-on question, whereas in reality, it was changing the tense and adding some words. If Eliza could not identify a keyword that it had a response for, it would instead give a random response that could be applicable to many different statements. Eliza could be easily tricked, for instance if a user wrote "**You are** a <u>bicycle</u>" it might respond with "How long have **I been** a <u>bicycle</u>?", instead of a more reasoned response. 
-
-> Note: You can read the original description of [Eliza](https://cacm.acm.org/magazines/1966/1/13317-elizaa-computer-program-for-the-study-of-natural-language-communication-between-man-and-machine/abstract) published in 1966 if you have an ACM account
-
-### Task: Coding a simple conversational bot
-
-A conversational bot, like Eliza, is a program that elicits user input and seems to understand and respond intelligently. Unlike Eliza, our bot will not have several rules giving it the appearance of having an intelligent conversation. Instead, out bot will have one ability only, to keep the conversation going with random responses that might work in just about any trivial conversation.
-
-Your steps when building a conversational bot:
-
-1. Print instructions advising the user how to interact with the bot
-2. Start a loop
-   1. Accept user input
-   2. If user has asked to exit, then exit
-   3. Process user input and determine response (in this case, the response is a random choice from a list of possible generic responses)
-   4. Print response and 
-3. loop back to step 2
-
-If you have some Python coding skills, attempt to write this bot yourself with the following random responses:
-
-```python
-random_responses = ["That is quite interesting, please tell me more.",
-                    "I see. Do go on.",
-                    "Why do you say that?",
-                    "Funny weather we've been having, isn't it?",
-                    "Let's change the subject.",
-                    "Did you catch the game last night?"]
-```
-
-Here is some sample output to guide you (user input is on the lines with starting with >):
-
-```
-Hello, I am Marvin, the simple robot.
-You can end this conversation at any time by typing 'bye'
-After typing each answer, press 'enter'
-How are you today?
-> I am good thanks
-That is quite interesting, please tell me more.
-> today I went for a walk     
-Did you catch the game last night?
-> I did, but my team lost
-Funny weather we've been having, isn't it?
-> yes but I hope next week is better
-Let's change the subject.
-> ok, lets talk about music
-Why do you say that?
-> because I like music!
-Why do you say that?
-> bye
-It was nice talking to you, goodbye!
-```
-
-One possible solution to the task is [here](solutions/lesson1_task1.py)
-
-✅ Knowledge Check
-1. Do you think the random responses would 'trick' someone into thinking that the bot actually understood them?
-2. What features would the bot need to be more effective?
-3. If a bot could really 'understand' the meaning of a sentence, would it need to 'remember' the meaning of previous sentences in a conversation too?
-NLP Techniques
-
-In the next lesson, you'll learn about a number of other approaches to parsing natural language and machine learning, but there are a few concepts you should know here first.
-
 For most *Natural Language Processing* tasks, the text to be processed must be broken down, examined, and the results stored or cross-referenced with rules and data sets. This allows the programmer to derive the meaning or intent, or even just the frequency of terms and words in a text. Here is a list of common techniques used in processing text. You should know what these are because they are combined with machine learning techniques to analyse large amounts of text efficiently. In the next lesson, you'll learn how to code some of these.

 ### Tokenization
@@ -336,9 +242,14 @@ Here is a sample [solution](solutions/lesson1_task3.py).

 🚀 Challenge: Can you make Marvin even better by extracting other features from the user input?

-### [Post-lesson quiz](link-to-quiz-app)
+## 🚀Challenge
+
+Add a challenge for students to work on collaboratively in class to enhance the project
+
+Optional: add a screenshot of the completed lesson's UI if appropriate
+
+## [Post-lecture quiz](link-to-quiz-app)

-### References
+## Review & Self Study

-1. Schubert, Lenhart, "Computational Linguistics", *The Stanford Encyclopedia of Philosophy* (Spring 2020 Edition), Edward N. Zalta (ed.), URL = <https://plato.stanford.edu/archives/spr2020/entries/computational-linguistics/>.
-2. Princeton University "About WordNet." [WordNet](https://wordnet.princeton.edu/). Princeton University. 2010. 
+**Assignment**: [Assignment Name](assignment.md)
--- a/NLP/1-Introduction/assignment.md
+++ b/NLP/1-Introduction/assignment.md
--- a/NLP/1-Introduction/solutions/lesson1_task2.py
+++ b/NLP/1-Introduction/solutions/lesson1_task2.py
--- a/NLP/1-Introduction/solutions/lesson1_task3.py
+++ b/NLP/1-Introduction/solutions/lesson1_task3.py
--- a/NLP/2-Algorithms/translations/README.es.md
+++ b/NLP/2-Algorithms/translations/README.es.md
--- a/NLP/3-Hotel-Reviews-1/README.md
+++ b/NLP/3-Hotel-Reviews-1/README.md
--- a/NLP/3-Hotel-Reviews-1/assignment.md
+++ b/NLP/3-Hotel-Reviews-1/assignment.md
--- a/NLP/3-Hotel-Reviews-1/translations/README.es.md
+++ b/NLP/3-Hotel-Reviews-1/translations/README.es.md
--- a/NLP/4-Bot/README.md
+++ b/NLP/4-Bot/README.md
@@ -1,55 +0,0 @@
-# [Lesson Topic]
-
-Add a sketchnote if possible/appropriate
-
-![Embed a video here if available](video-url)
-
-## [Pre-lecture quiz](link-to-quiz-app)
-
-Describe what we will learn
-
-### Introduction
-
-Describe what will be covered
-
-> Notes
-
-### Prerequisite
-
-What steps should have been covered before this lesson?
-
-### Preparation
-
-Preparatory steps to start this lesson
-
---
-
-[Step through content in blocks]
-
-## [Topic 1]
-
-### Task:
-
-Work together to progressively enhance your codebase to build the project with shared code:
-
-```html
-code blocks
-```
-
-✅ Knowledge Check - use this moment to stretch students' knowledge with open questions
-
-## [Topic 2]
-
-## [Topic 3]
-
-## 🚀Challenge
-
-Add a challenge for students to work on collaboratively in class to enhance the project
-
-Optional: add a screenshot of the completed lesson's UI if appropriate
-
-## [Post-lecture quiz](link-to-quiz-app)
-
-## Review & Self Study
-
-**Assignment**: [Assignment Name](assignment.md)
--- a/NLP/4-Bot/assignment.md
+++ b/NLP/4-Bot/assignment.md
@@ -1,9 +0,0 @@
-# [Assignment Name]
-
-## Instructions
-
-## Rubric
-
-| Criteria | Exemplary | Adequate | Needs Improvement |
-| -------- | --------- | -------- | ----------------- |
-|          |           |          |                   |
--- a/NLP/4-Hotel-Reviews-2/README.md
+++ b/NLP/4-Hotel-Reviews-2/README.md
--- a/NLP/4-Hotel-Reviews-2/assignment.md
+++ b/NLP/4-Hotel-Reviews-2/assignment.md
--- a/NLP/4-Hotel-Reviews-2/translations/README.es.md
+++ b/NLP/4-Hotel-Reviews-2/translations/README.es.md
--- a/NLP/README.md
+++ b/NLP/README.md
@@ -1,12 +1,18 @@
-# Getting Started with 
+# Getting Started with Natural Language Processing 

-In this section of the curriculum, you will be introduced to ...
+In this section of the curriculum, you will be introduced to one of the most widespread uses of machine learning: Natural Language Processing. Derived from Computational Linguistics, NLP underpins models that you use every day to communicate with machines via voice and text. In these lessons, we'll learn the basics of NLP by building small conversational bots and seeing how machine learning helps make these conversations more and more 'smart'. You'll travel back in time, chatting with Elizabeth Bennet and Mr. Darcy from Jane Austen's classic novel, **Pride and Prejudice**, published in 1813. Then, you'll further your knowledge by learning about sentiment analysis via hotel reviews in Europe.

+![Pride and Prejudice book and tea](images/p&p.jpg)
+> Photo by <a href="https://unsplash.com/@elaineh?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText">Elaine Howlin</a> on <a href="https://unsplash.com/s/photos/pride-and-prejudice?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText">Unsplash</a>
+  
 ## Lessons

-1. [Introduction to](1-intro-to/README.md)
+1. [Introduction to NLP](1-Introduction-to-NLP/README.md)
+2. [NLP Tasks](2-NLP-Tasks/README.md)
+3. TBD
+4. TBD


-## Credits
+## Credits 

-"Introduction to" was written with ♥️ by [Name](Twitter)
+These Natural Language Processing lessons were written with ☕ by [Stephen Howell](https://twitter.com/Howell_MSFT)
--- a/NLP/images/p&p.jpg
+++ b/NLP/images/p&p.jpg
--- a/README.md
+++ b/README.md
@@ -50,14 +50,14 @@ By ensuring that the content aligns with projects, the process is made more enga

 - optional sketchnote
 - optional supplemental video
- pre-lesson warmup quiz
+- pre-lecture warmup quiz
 - written lesson
 - for project-based lessons, step-by-step guides on how to build the project
 - knowledge checks
 - a challenge
 - supplemental reading
 - assignment
- post-lesson quiz
+- post-lecture quiz

 > **A note about quizzes**: All quizzes are contained [in this app](https://jolly-sea-0a877260f.azurestaticapps.net), for 48 total quizzes of three questions each. They are linked from within the lessons but the quiz app can be run locally; follow the instructions in the `quiz-app` folder.