diff --git a/1-Introduction/01-defining-data-science/notebook.ipynb b/1-Introduction/01-defining-data-science/notebook.ipynb
index cf3988e8..5edafb9b 100644
--- a/1-Introduction/01-defining-data-science/notebook.ipynb
+++ b/1-Introduction/01-defining-data-science/notebook.ipynb
@@ -1,419 +1,736 @@
 {
-  "cells": [
-    {
-      "cell_type": "markdown",
-      "source": [
-        "# Challenge: Analyzing Text about Data Science\r\n",
-        "\r\n",
-        "In this example, let's do a simple exercise that covers all steps of a traditional data science process. You do not have to write any code, you can just click on the cells below to execute them and observe the result. As a challenge, you are encouraged to try this code out with different data. \r\n",
-        "\r\n",
-        "## Goal\r\n",
-        "\r\n",
-        "In this lesson, we have been discussing different concepts related to Data Science. Let's try to discover more related concepts by doing some **text mining**. We will start with a text about Data Science, extract keywords from it, and then try to visualize the result.\r\n",
-        "\r\n",
-        "As a text, I will use the page on Data Science from Wikipedia:"
-      ],
-      "metadata": {}
-    },
-    {
-      "cell_type": "markdown",
-      "source": [],
-      "metadata": {}
-    },
-    {
-      "cell_type": "code",
-      "execution_count": 62,
-      "source": [
-        "url = 'https://en.wikipedia.org/wiki/Data_science'"
-      ],
-      "outputs": [],
-      "metadata": {}
-    },
-    {
-      "cell_type": "markdown",
-      "source": [
-        "## Step 1: Getting the Data\r\n",
-        "\r\n",
-        "First step in every data science process is getting the data. We will use `requests` library to do that:"
-      ],
-      "metadata": {}
-    },
-    {
-      "cell_type": "code",
-      "execution_count": 63,
-      "source": [
-        "import requests\r\n",
-        "\r\n",
-        "text = requests.get(url).content.decode('utf-8')\r\n",
-        "print(text[:1000])"
-      ],
-      "outputs": [
-        {
-          "output_type": "stream",
-          "name": "stdout",
-          "text": [
-            "\n",
-            "\n",
-            "
\n", - "\n", - "