{ "cells": [ { "cell_type": "markdown", "source": [ "# Challenge: Analyzing Text about Data Science\n", "\n", "In this example, we'll work through a simple exercise that covers all the steps of a typical data science process. You don't need to write any code; you can simply click on the cells below to run them and observe the results. As a challenge, you're encouraged to try this code with different data.\n", "\n", "## Goal\n", "\n", "In this lesson, we've been discussing various concepts related to Data Science. Let's explore more related concepts by performing **text mining**. We'll start with a text about Data Science, extract keywords from it, and then visualize the results.\n", "\n", "For the text, we'll use the Wikipedia page on Data Science:\n" ], "metadata": {} }, { "cell_type": "code", "execution_count": 62, "source": [ "url = 'https://en.wikipedia.org/wiki/Data_science'" ], "outputs": [], "metadata": {} }, { "cell_type": "markdown", "source": [ "## Step 1: Obtaining the Data\n", "\n", "The first step in any data science process is obtaining the data. We'll use the `requests` library to download the page:\n" ], "metadata": {} }, { "cell_type": "code", "execution_count": 63, "source": [ "import requests\n", "\n", "# Download the page and decode the raw response bytes as UTF-8\n", "text = requests.get(url).content.decode('utf-8')\n", "print(text[:1000])" ], "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "\n", "\n", "
\n", "\n", "