NLP, as it is commonly known, is one of the best-known areas where machine learning has been applied and used in production software.
众所周知,自然语言处理(Natural Language Processing, NLP)是机器学习在生产软件中应用最广泛的领域之一。
✅ Can you think of software that you use every day that probably has some NLP embedded? What about your word processing programs or mobile apps that you use regularly?
- **The idea of languages**. How languages developed and what the major areas of study have been.
- **Definition and concepts**. You will also learn definitions and concepts about how computers process text, including parsing, grammar, and identifying nouns and verbs. There are some coding tasks in this lesson, and several important concepts are introduced that you will learn to code later on in the next lessons.
Computational linguistics is an area of research and development over many decades that studies how computers can work with, and even understand, translate, and communicate with languages. natural language processing (NLP) is a related field focused on how computers can process 'natural', or human, languages.
If you have ever dictated to your phone instead of typing or asked a virtual assistant a question, your speech was converted into a text form and then processed or *parsed* from the language you spoke. The detected keywords were then processed into a format that the phone or assistant could understand and act on.
This is possible because someone wrote a computer program to do this. A few decades ago, some science fiction writers predicted that people would mostly speak to their computers, and the computers would always understand exactly what they meant. Sadly, it turned out to be a harder problem that many imagined, and while it is a much better understood problem today, there are significant challenges in achieving 'perfect' natural language processing when it comes to understanding the meaning of a sentence. This is a particularly hard problem when it comes to understanding humour or detecting emotions such as sarcasm in a sentence.
At this point, you may be remembering school classes where the teacher covered the parts of grammar in a sentence. In some countries, students are taught grammar and linguistics as a dedicated subject, but in many, these topics are included as part of learning a language: either your first language in primary school (learning to read and write) and perhaps a second language in post-primary, or high school. Don't worry if you are not an expert at differentiating nouns from verbs or adverbs from adjectives!
If you struggle with the difference between the *simple present* and *present progressive*, you are not alone. This is a challenging thing for many people, even native speakers of a language. The good news is that computers are really good at applying formal rules, and you will learn to write code that can *parse* a sentence as well as a human. The greater challenge you will examine later is understanding the *meaning*, and *sentiment*, of a sentence.
For this lesson, the main prerequisite is being able to read and understand the language of this lesson. There are no math problems or equations to solve. While the original author wrote this lesson in English, it is also translated into other languages, so you could be reading a translation. There are examples where a number of different languages are used (to compare the different grammar rules of different languages). These are *not* translated, but the explanatory text is, so the meaning should be clear.
For the coding tasks, you will use Python and the examples are using Python 3.8.
编程任务中,你将会使用Python语言,示例使用的是Python 3.8版本。
In this section, you will need, and use:
在本节中你将需要并使用:
- **Python 3 comprehension**. Programming language comprehension in Python 3, this lesson uses input, loops, file reading, arrays.
- **Visual Studio Code + extension**. We will use Visual Studio Code and its Python extension. You can also use a Python IDE of your choice.
- **TextBlob**. [TextBlob](https://github.com/sloria/TextBlob) is a simplified text processing library for Python. Follow the instructions on the TextBlob site to install it on your system (install the corpora as well, as shown below):
@ -73,66 +45,40 @@ In this section, you will need, and use:
python -m textblob.download_corpora
```
> 💡 Tip: You can run Python directly in VS Code environments. Check the [docs](https://code.visualstudio.com/docs/languages/python?WT.mc_id=academic-15963-cxa) for more information.
> 💡 提示:可以在 VS Code 环境中直接运行 Python。 点击[docs](https://code.visualstudio.com/docs/languages/python?WT.mc_id=academic-15963-cxa)查看更多信息。
## Talking to machines
## 与机器对话
The history of trying to make computers understand human language goes back decades, and one of the earliest scientists to consider natural language processing was *Alan Turing*.
When Turing was researching *artificial intelligence* in the 1950's, he considered if a conversational test could be given to a human and computer (via typed correspondence) where the human in the conversation was not sure if they were conversing with another human or a computer.
If, after a certain length of conversation, the human could not determine that the answers were from a computer or not, then could the computer be said to be *thinking*?
如果经过一定时间的交谈,人类无法确定答案是否来自计算机,那么是否可以说计算机正在“思考”?
### The inspiration - 'the imitation game'
### 灵感 - “模仿游戏”
The idea for this came from a party game called *The Imitation Game* where an interrogator is alone in a room and tasked with determining which of two people (in another room) are male and female respectively. The interrogator can send notes, and must try to think of questions where the written answers reveal the gender of the mystery person. Of course, the players in the other room are trying to trick the interrogator by answering questions in such as way as to mislead or confuse the interrogator, whilst also giving the appearance of answering honestly.
In the 1960's an MIT scientist called *Joseph Weizenbaum* developed [*Eliza*](https:/wikipedia.org/wiki/ELIZA), a computer 'therapist' that would ask the human questions and give the appearance of understanding their answers. However, while Eliza could parse a sentence and identify certain grammatical constructs and keywords so as to give a reasonable answer, it could not be said to *understand* the sentence. If Eliza was presented with a sentence following the format "**I am** <u>sad</u>" it might rearrange and substitute words in the sentence to form the response "How long have **you been**<u>sad</u>".
在 1960 年代,一位名叫 *Joseph Weizenbaum* 的麻省理工学院科学家开发了[*Eliza*](https:/wikipedia.org/wiki/ELIZA),Eliza是一位计算机“治疗师”,它可以向人类提出问题并表现出理解他们的答案。然而,虽然 Eliza 可以解析句子并识别某些语法结构和关键字以给出合理的答案,但不能说它*理解*了句子。如果 Eliza 看到的句子格式为“**I am** <u>sad</u>”,它可能会重新排列并替换句子中的单词以形成响应“How long have ** you been** <u>sad</u>"。
This gave the impression that Eliza understood the statement and was asking a follow-on question, whereas in reality, it was changing the tense and adding some words. If Eliza could not identify a keyword that it had a response for, it would instead give a random response that could be applicable to many different statements. Eliza could be easily tricked, for instance if a user wrote "**You are** a <u>bicycle</u>" it might respond with "How long have **I been** a <u>bicycle</u>?", instead of a more reasoned response.
这给人的印象是伊丽莎理解了这句话,并在问一个后续问题,而实际上,它是在改变时态并添加一些词。如果 Eliza 无法识别它有响应的关键字,它会给出一个随机响应,该响应可以适用于许多不同的语句。 Eliza 很容易被欺骗,例如,如果用户写了**You are** a <u>bicycle</u>",它可能会回复"How long have **I been** a <u>bicycle</u>?",而不是更合理的回答。
[![Chatting with Eliza](https://img.youtube.com/vi/RMK9AphfLco/0.jpg)](https://youtu.be/RMK9AphfLco "Chatting with Eliza")
> 🎥 Click the image above for a video about original ELIZA program
> 🎥 点击上方的图片查看真实的ELIZA程序视频
> Note: You can read the original description of [Eliza](https://cacm.acm.org/magazines/1966/1/13317-elizaa-computer-program-for-the-study-of-natural-language-communication-between-man-and-machine/abstract) published in 1966 if you have an ACM account. Alternately, read about Eliza on [wikipedia](https://wikipedia.org/wiki/ELIZA)
> 注意:如果你拥有ACM账户,你可以阅读1996年发表的[Eliza](https://cacm.acm.org/magazines/1966/1/13317-elizaa-computer-program-for-the-study-of-natural-language-communication-between-man-and-machine/abstract)的原始介绍。或者,在[wikipedia](https://wikipedia.org/wiki/ELIZA)阅读有关 Eliza 的信息
## Exercise - coding a basic conversational bot
## 联系 - 编码实现一个基础的对话机器人
A conversational bot, like Eliza, is a program that elicits user input and seems to understand and respond intelligently. Unlike Eliza, our bot will not have several rules giving it the appearance of having an intelligent conversation. Instead, out bot will have one ability only, to keep the conversation going with random responses that might work in almost any trivial conversation.
像 Eliza 一样的对话机器人是一个似乎可以智能地理解和响应用户输入的程序。与 Eliza 不同的是,我们的机器人不会用规则让它看起来像是在进行智能对话。取而代之的是,我们的对话机器人将只有一种能力,通过几乎在所有琐碎对话中都适用的随机响应保持对话的进行。
### The plan
### 计划
Your steps when building a conversational bot:
搭建聊天机器人的步骤
1. Print instructions advising the user how to interact with the bot
2. Start a loop
1. Accept user input
2. If user has asked to exit, then exit
3. Process user input and determine response (in this case, the response is a random choice from a list of possible generic responses)
4. Print response
3. loop back to step 2
1. 打印指导用户如何与机器人交互的说明
2. 开启循环
1. 获取用户输入
@ -141,7 +87,6 @@ Your steps when building a conversational bot:
4. 打印回答
3. 重复步骤2
### Building the bot
### 构建聊天机器人
接下来让我们构建聊天机器人。我们将从定义一些短语开始。
@ -194,32 +139,22 @@ Your steps when building a conversational bot:
3. 如果机器人真的可以“理解”一个句子的意思,它是否也需要“记住”对话中前面句子的意思?
---
## 🚀Challenge
## 🚀挑战
Choose one of the "stop and consider" elements above and either try to implement them in code or write a solution on paper using pseudocode.
选择上面的“停止并思考”元素之一,然后尝试在代码中实现它们或使用伪代码在纸上编写解决方案。
In the next lesson, you'll learn about a number of other approaches to parsing natural language and machine learning.