From 2095f755200990206697e9bdada5cf48cbe1ecaf Mon Sep 17 00:00:00 2001
From: Jen Looper <jen.looper@gmail.com>
Date: Wed, 2 Jun 2021 22:03:29 -0400
Subject: [PATCH] NLP review

---
 NLP/2-Tasks/README.md | 8 +++++++-
 1 file changed, 7 insertions(+), 1 deletion(-)

diff --git a/NLP/2-Tasks/README.md b/NLP/2-Tasks/README.md
index 4ee08932..881acc28 100644
--- a/NLP/2-Tasks/README.md
+++ b/NLP/2-Tasks/README.md
@@ -11,7 +11,13 @@ Let's discover common techniques used in processing text. Combined with machine
 > 🎓 **Tokenization**
 >
-> Probably the first thing most NLP algorithms have to do is split the text into tokens, or words. While this sounds simple, having to account for punctuation and different language's word and sentence delimiters can make it tricky.
+> Probably the first thing most NLP algorithms have to do is split the text into tokens, or words. While this sounds simple, having to account for punctuation and different languages' word and sentence delimiters can make it tricky. Though splitting a sentence into words might seem straightforward, you may need other methods to determine demarcations.
+
+> 🎓 **Embeddings**
+
+> [Word embeddings](https://en.wikipedia.org/wiki/Word_embedding) are a way to represent your text data numerically, in such a way that words with similar meanings or words often used together cluster together.
+
+✅ Try [this interesting tool](https://projector.tensorflow.org/) to experiment with word embeddings. Clicking on one word shows clusters of similar words: 'toy' clusters with 'disney', 'lego', 'playstation', and 'console'.
 
 > 🎓 **Parsing & Part-of-speech Tagging**
 >