{ "cells": [ { "cell_type": "markdown", "source": [ "# Challenge: Analyzing Text about Data Science\n", "\n", "In this example, let's do a simple exercise that covers all the steps of a traditional data science process. You do not have to write any code; you can just click on the cells below to execute them and observe the result. As a challenge, you are encouraged to try this code out with different data.\n", "\n", "## Goal\n", "\n", "In this lesson, we have been discussing different concepts related to Data Science. Let's try to discover more related concepts by doing some **text mining**. We will start with a text about Data Science, extract keywords from it, and then try to visualize the result.\n", "\n", "As the text, I will use the Wikipedia page on Data Science:\n" ], "metadata": {} }, { "cell_type": "markdown", "source": [], "metadata": {} }, { "cell_type": "code", "execution_count": 62, "source": [ "url = 'https://en.wikipedia.org/wiki/Data_science'" ], "outputs": [], "metadata": {} }, { "cell_type": "markdown", "source": [ "## Step 1: Getting the Data\n", "\n", "The first step in every data science process is getting the data. We will use the `requests` library to do that:\n" ], "metadata": {} }, { "cell_type": "code", "execution_count": 63, "source": [ "import requests\r\n", "\r\n", "# Download the page and decode the response bytes as UTF-8 text\r\n", "text = requests.get(url).content.decode('utf-8')\r\n", "# Preview the first 1000 characters of the raw HTML\r\n", "print(text[:1000])" ], "outputs": [ { "output_type": "stream", "name": "stdout", "text": [ "\n", "\n", "
\n", "\n", "