{ "cells": [ { "cell_type": "markdown", "source": [ "# Using Python for Data Processing\r\n", "\r\n", "## Tabular Data\r\n", "\r\n", "We will use data on COVID-19 infected individuals, provided by the [Center for Systems Science and Engineering](https://systems.jhu.edu/) (CSSE) at [Johns Hopkins University](https://jhu.edu/). The dataset is available in [this GitHub repository](https://github.com/CSSEGISandData/COVID-19)." ], "metadata": {} }, { "cell_type": "code", "execution_count": 1, "source": [ "import numpy as np\r\n", "import pandas as pd" ], "outputs": [], "metadata": {} }, { "cell_type": "markdown", "source": [ "We can load the most recent data directly from GitHub using `pd.read_csv`. If the data is unavailable for some reason, you can use the local copy in the `data` folder instead by uncommenting the second `base_url` line below:" ], "metadata": {} }, { "cell_type": "code", "execution_count": 4, "source": [ "base_url = \"https://raw.githubusercontent.com/CSSEGISandData/COVID-19/master/csse_covid_19_data/csse_covid_19_time_series/\" # load from the Internet\r\n", "# base_url = \"../../data/COVID/\" # load from disk (overrides the line above when uncommented)\r\n", "infected_dataset_url = base_url + \"time_series_covid19_confirmed_global.csv\"\r\n", "recovered_dataset_url = base_url + \"time_series_covid19_recovered_global.csv\"\r\n", "deaths_dataset_url = base_url + \"time_series_covid19_deaths_global.csv\"\r\n", "countries_dataset_url = base_url + \"../UID_ISO_FIPS_LookUp_Table.csv\"" ], "outputs": [], "metadata": {} }, { "cell_type": "code", "execution_count": 5, "source": [ "infected = pd.read_csv(infected_dataset_url)\r\n", "infected.head()" ], "outputs": [ { "output_type": "execute_result", "data": { "text/plain": [ " Province/State Country/Region Lat Long 1/22/20 1/23/20 \\\n", "0 NaN Afghanistan 33.93911 67.709953 0 0 \n", "1 NaN Albania 41.15330 20.168300 0 0 \n", "2 NaN Algeria 28.03390 1.659600 0 0 \n", "3 NaN Andorra 42.50630 1.521800 0 0 \n", "4 NaN Angola -11.20270 17.873900 0 0 \n", "\n", " 
1/24/20 1/25/20 1/26/20 1/27/20 ... 8/14/21 8/15/21 8/16/21 \\\n", "0 0 0 0 0 ... 151770 151770 152142 \n", "1 0 0 0 0 ... 135550 135947 136147 \n", "2 0 0 0 0 ... 186655 187258 187968 \n", "3 0 0 0 0 ... 14924 14924 14954 \n", "4 0 0 0 0 ... 44534 44617 44739 \n", "\n", " 8/17/21 8/18/21 8/19/21 8/20/21 8/21/21 8/22/21 8/23/21 \n", "0 152243 152363 152411 152448 152448 152448 152583 \n", "1 136598 137075 137597 138132 138790 139324 139721 \n", "2 188663 189384 190078 190656 191171 191583 192089 \n", "3 14960 14976 14981 14988 14988 14988 15002 \n", "4 44972 45175 45325 45583 45817 45945 46076 \n", "\n", "[5 rows x 584 columns]" ], "text/html": [ "
<div>\n",
"<table border=\"1\" class=\"dataframe\">\n",
" <thead>\n",
" <tr style=\"text-align: right;\"><th></th><th>Province/State</th><th>Country/Region</th><th>Lat</th><th>Long</th><th>1/22/20</th><th>1/23/20</th><th>1/24/20</th><th>1/25/20</th><th>1/26/20</th><th>1/27/20</th><th>...</th><th>8/14/21</th><th>8/15/21</th><th>8/16/21</th><th>8/17/21</th><th>8/18/21</th><th>8/19/21</th><th>8/20/21</th><th>8/21/21</th><th>8/22/21</th><th>8/23/21</th></tr>\n",
" </thead>\n",
" <tbody>\n",
" <tr><th>0</th><td>NaN</td><td>Afghanistan</td><td>33.93911</td><td>67.709953</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>...</td><td>151770</td><td>151770</td><td>152142</td><td>152243</td><td>152363</td><td>152411</td><td>152448</td><td>152448</td><td>152448</td><td>152583</td></tr>\n",
" <tr><th>1</th><td>NaN</td><td>Albania</td><td>41.15330</td><td>20.168300</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>...</td><td>135550</td><td>135947</td><td>136147</td><td>136598</td><td>137075</td><td>137597</td><td>138132</td><td>138790</td><td>139324</td><td>139721</td></tr>\n",
" <tr><th>2</th><td>NaN</td><td>Algeria</td><td>28.03390</td><td>1.659600</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>...</td><td>186655</td><td>187258</td><td>187968</td><td>188663</td><td>189384</td><td>190078</td><td>190656</td><td>191171</td><td>191583</td><td>192089</td></tr>\n",
" <tr><th>3</th><td>NaN</td><td>Andorra</td><td>42.50630</td><td>1.521800</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>...</td><td>14924</td><td>14924</td><td>14954</td><td>14960</td><td>14976</td><td>14981</td><td>14988</td><td>14988</td><td>14988</td><td>15002</td></tr>\n",
" <tr><th>4</th><td>NaN</td><td>Angola</td><td>-11.20270</td><td>17.873900</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>0</td><td>...</td><td>44534</td><td>44617</td><td>44739</td><td>44972</td><td>45175</td><td>45325</td><td>45583</td><td>45817</td><td>45945</td><td>46076</td></tr>\n",
" </tbody>\n",
"</table>\n",
"<p>5 rows × 584 columns</p>\n",
"</div>
" ] }, "metadata": {}, "execution_count": 5 } ], "metadata": {} }, { "cell_type": "code", "execution_count": null, "source": [], "outputs": [], "metadata": {} } ], "metadata": { "orig_nbformat": 4, "language_info": { "name": "python", "version": "3.8.8", "mimetype": "text/x-python", "codemirror_mode": { "name": "ipython", "version": 3 }, "pygments_lexer": "ipython3", "nbconvert_exporter": "python", "file_extension": ".py" }, "kernelspec": { "name": "python3", "display_name": "Python 3.8.8 64-bit (conda)" }, "interpreter": { "hash": "86193a1ab0ba47eac1c69c1756090baa3b420b3eea7d4aafab8b85f8b312f0c5" } }, "nbformat": 4, "nbformat_minor": 2 }