{
"cells": [
{
"cell_type": "markdown",
"source": [
"# Using Python for Data Processing\r\n",
"\r\n",
"## Tabular Data\r\n",
"\r\n",
"We will use data on COVID-19 infected individuals provided by the [Center for Systems Science and Engineering](https://systems.jhu.edu/) (CSSE) at [Johns Hopkins University](https://jhu.edu/). The dataset is available in [this GitHub repository](https://github.com/CSSEGISandData/COVID-19)."
],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": 1,
"source": [
"import numpy as np\r\n",
"import pandas as pd"
],
"outputs": [],
"metadata": {}
},
{
"cell_type": "markdown",
"source": [
"We can load the most recent data directly from GitHub using `pd.read_csv`. If the data is unavailable for some reason, you can use the copy stored locally in the `data` folder instead: just uncomment the `base_url` line marked as loading from disk below."
],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": 4,
"source": [
"base_url = \"https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/\" # loading from the Internet\r\n",
"# base_url = \"../../data/COVID/\" # loading from disk\r\n",
"infected_dataset_url = base_url + \"time_series_covid19_confirmed_global.csv\"\r\n",
"recovered_dataset_url = base_url + \"time_series_covid19_recovered_global.csv\"\r\n",
"deaths_dataset_url = base_url + \"time_series_covid19_deaths_global.csv\"\r\n",
"countries_dataset_url = base_url + \"../UID_ISO_FIPS_LookUp_Table.csv\""
],
"outputs": [],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": 5,
"source": [
"infected = pd.read_csv(infected_dataset_url)\r\n",
"infected.head()"
],
"outputs": [
{
"output_type": "execute_result",
"data": {
"text/plain": [
" Province/State Country/Region Lat Long 1/22/20 1/23/20 \\\n",
"0 NaN Afghanistan 33.93911 67.709953 0 0 \n",
"1 NaN Albania 41.15330 20.168300 0 0 \n",
"2 NaN Algeria 28.03390 1.659600 0 0 \n",
"3 NaN Andorra 42.50630 1.521800 0 0 \n",
"4 NaN Angola -11.20270 17.873900 0 0 \n",
"\n",
" 1/24/20 1/25/20 1/26/20 1/27/20 ... 8/14/21 8/15/21 8/16/21 \\\n",
"0 0 0 0 0 ... 151770 151770 152142 \n",
"1 0 0 0 0 ... 135550 135947 136147 \n",
"2 0 0 0 0 ... 186655 187258 187968 \n",
"3 0 0 0 0 ... 14924 14924 14954 \n",
"4 0 0 0 0 ... 44534 44617 44739 \n",
"\n",
" 8/17/21 8/18/21 8/19/21 8/20/21 8/21/21 8/22/21 8/23/21 \n",
"0 152243 152363 152411 152448 152448 152448 152583 \n",
"1 136598 137075 137597 138132 138790 139324 139721 \n",
"2 188663 189384 190078 190656 191171 191583 192089 \n",
"3 14960 14976 14981 14988 14988 14988 15002 \n",
"4 44972 45175 45325 45583 45817 45945 46076 \n",
"\n",
"[5 rows x 584 columns]"
],
"text/html": [
"<div>\n",
"<style scoped>\n",
" .dataframe tbody tr th:only-of-type {\n",
" vertical-align: middle;\n",
" }\n",
"\n",
" .dataframe tbody tr th {\n",
" vertical-align: top;\n",
" }\n",
"\n",
" .dataframe thead th {\n",
" text-align: right;\n",
" }\n",
"</style>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\">\n",
" <th></th>\n",
" <th>Province/State</th>\n",
" <th>Country/Region</th>\n",
" <th>Lat</th>\n",
" <th>Long</th>\n",
" <th>1/22/20</th>\n",
" <th>1/23/20</th>\n",
" <th>1/24/20</th>\n",
" <th>1/25/20</th>\n",
" <th>1/26/20</th>\n",
" <th>1/27/20</th>\n",
" <th>...</th>\n",
" <th>8/14/21</th>\n",
" <th>8/15/21</th>\n",
" <th>8/16/21</th>\n",
" <th>8/17/21</th>\n",
" <th>8/18/21</th>\n",
" <th>8/19/21</th>\n",
" <th>8/20/21</th>\n",
" <th>8/21/21</th>\n",
" <th>8/22/21</th>\n",
" <th>8/23/21</th>\n",
" </tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr>\n",
" <th>0</th>\n",
" <td>NaN</td>\n",
" <td>Afghanistan</td>\n",
" <td>33.93911</td>\n",
" <td>67.709953</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>151770</td>\n",
" <td>151770</td>\n",
" <td>152142</td>\n",
" <td>152243</td>\n",
" <td>152363</td>\n",
" <td>152411</td>\n",
" <td>152448</td>\n",
" <td>152448</td>\n",
" <td>152448</td>\n",
" <td>152583</td>\n",
" </tr>\n",
" <tr>\n",
" <th>1</th>\n",
" <td>NaN</td>\n",
" <td>Albania</td>\n",
" <td>41.15330</td>\n",
" <td>20.168300</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>135550</td>\n",
" <td>135947</td>\n",
" <td>136147</td>\n",
" <td>136598</td>\n",
" <td>137075</td>\n",
" <td>137597</td>\n",
" <td>138132</td>\n",
" <td>138790</td>\n",
" <td>139324</td>\n",
" <td>139721</td>\n",
" </tr>\n",
" <tr>\n",
" <th>2</th>\n",
" <td>NaN</td>\n",
" <td>Algeria</td>\n",
" <td>28.03390</td>\n",
" <td>1.659600</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>186655</td>\n",
" <td>187258</td>\n",
" <td>187968</td>\n",
" <td>188663</td>\n",
" <td>189384</td>\n",
" <td>190078</td>\n",
" <td>190656</td>\n",
" <td>191171</td>\n",
" <td>191583</td>\n",
" <td>192089</td>\n",
" </tr>\n",
" <tr>\n",
" <th>3</th>\n",
" <td>NaN</td>\n",
" <td>Andorra</td>\n",
" <td>42.50630</td>\n",
" <td>1.521800</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>14924</td>\n",
" <td>14924</td>\n",
" <td>14954</td>\n",
" <td>14960</td>\n",
" <td>14976</td>\n",
" <td>14981</td>\n",
" <td>14988</td>\n",
" <td>14988</td>\n",
" <td>14988</td>\n",
" <td>15002</td>\n",
" </tr>\n",
" <tr>\n",
" <th>4</th>\n",
" <td>NaN</td>\n",
" <td>Angola</td>\n",
" <td>-11.20270</td>\n",
" <td>17.873900</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>0</td>\n",
" <td>...</td>\n",
" <td>44534</td>\n",
" <td>44617</td>\n",
" <td>44739</td>\n",
" <td>44972</td>\n",
" <td>45175</td>\n",
" <td>45325</td>\n",
" <td>45583</td>\n",
" <td>45817</td>\n",
" <td>45945</td>\n",
" <td>46076</td>\n",
" </tr>\n",
" </tbody>\n",
"</table>\n",
"<p>5 rows × 584 columns</p>\n",
"</div>"
]
},
"metadata": {},
"execution_count": 5
}
],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": null,
"source": [],
"outputs": [],
"metadata": {}
}
],
"metadata": {
"orig_nbformat": 4,
"language_info": {
"name": "python",
"version": "3.8.8",
"mimetype": "text/x-python",
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"pygments_lexer": "ipython3",
"nbconvert_exporter": "python",
"file_extension": ".py"
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3.8.8 64-bit (conda)"
},
"interpreter": {
"hash": "86193a1ab0ba47eac1c69c1756090baa3b420b3eea7d4aafab8b85f8b312f0c5"
}
},
"nbformat": 4,
"nbformat_minor": 2
}