Data-Science-For-Beginners/2-Working-With-Data/08-data-preparation/assignment.ipynb

{
"cells": [
{
"cell_type": "markdown",
"source": [],
"metadata": {}
},
{
"cell_type": "markdown",
"source": [
"# Assignment: Evaluating Data from a Form\r\n",
"\r\n",
"A client has been testing a [small form](index.html) to gather basic data about their client base. They have asked you to validate the data they collected. You can open the `index.html` page in a browser to look at the form.\r\n",
"\r\n",
"You have been provided with a [dataset of CSV records](../../data/form.csv) that contains entries from the form, along with some basic visualizations. The client pointed out that some of the visualizations look incorrect, but they are unsure how to resolve them. You can explore the data in the [assignment notebook](assignment.ipynb).\r\n",
"\r\n",
"## Instructions\r\n",
"\r\n",
"Use the techniques in this lesson to make recommendations about the form so it captures accurate and consistent information."
],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": null,
"source": [
"!pip install pandas\r\n",
"!pip install matplotlib"
],
"outputs": [],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": 4,
"source": [
"import pandas as pd\r\n",
"import matplotlib.pyplot as plt\r\n",
"\r\n",
"# Load the form responses from the CSV file\r\n",
"path = '../../data/form.csv'\r\n",
"form_df = pd.read_csv(path)\r\n",
"print(form_df)"
],
"outputs": [
{
"output_type": "stream",
"name": "stdout",
"text": [
" birth_month state pet\n",
"0 January NaN Cats\n",
"1 JAN CA Cats\n",
"2 Sept Hawaii Dog\n",
"3 january AK Dog\n",
"4 July RI Cats\n",
"5 September California Cats\n",
"6 April CA Dog\n",
"7 January California Cats\n",
"8 November FL Dog\n",
"9 December Florida Cats\n"
]
}
],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": 7,
"source": [
"# Count how often each distinct state value appears and plot the counts as a bar chart\r\n",
"form_df['state'].value_counts().plot(kind='bar')\r\n",
"plt.show()"
],
"outputs": [
{
"output_type": "display_data",
"data": {
"image/png": "",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
}
}
],
"metadata": {}
},
{
"cell_type": "code",
"execution_count": 8,
"source": [
"# Count how often each distinct birth month value appears and plot the counts as a bar chart\r\n",
"form_df['birth_month'].value_counts().plot(kind='bar')\r\n",
"plt.show()"
],
"outputs": [
{
"output_type": "display_data",
"data": {
"image/png": "",
"text/plain": [
"<Figure size 432x288 with 1 Axes>"
]
},
"metadata": {
"needs_background": "light"
}
}
],
"metadata": {}
},
{
"cell_type": "markdown",
"source": [],
"metadata": {}
}
],
"metadata": {
"orig_nbformat": 4,
"language_info": {
"name": "python",
"version": "3.9.7",
"mimetype": "text/x-python",
"codemirror_mode": {
"name": "ipython",
"version": 3
},
"pygments_lexer": "ipython3",
"nbconvert_exporter": "python",
"file_extension": ".py"
},
"kernelspec": {
"name": "python3",
"display_name": "Python 3.9.7 64-bit ('venv': venv)"
},
"interpreter": {
"hash": "6b9b57232c4b57163d057191678da2030059e733b8becc68f245de5a75abe84e"
}
},
"nbformat": 4,
"nbformat_minor": 2
}
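
The bar charts in the notebook look wrong because the same value is spelled several ways (`January`, `JAN`, `january`; `CA`, `California`), so `value_counts()` treats each spelling as a separate category. The following is a minimal sketch of one possible cleanup, not the assignment's intended answer. It rebuilds the ten rows from the printed output above rather than reading `form.csv`, and the lookup tables `MONTH_ALIASES` and `STATE_ABBREVIATIONS` and the helper `normalize_month` are hypothetical names that cover only the variants seen in this sample.

```python
import pandas as pd

# Rows copied from the printed output of form.csv shown in the notebook.
form_df = pd.DataFrame({
    "birth_month": ["January", "JAN", "Sept", "january", "July",
                    "September", "April", "January", "November", "December"],
    "state": [None, "CA", "Hawaii", "AK", "RI",
              "California", "CA", "California", "FL", "Florida"],
    "pet": ["Cats", "Cats", "Dog", "Dog", "Cats",
            "Cats", "Dog", "Cats", "Dog", "Cats"],
})

# Hypothetical lookup tables: they cover only the variants in this sample.
MONTH_ALIASES = {"jan": "January", "sept": "September"}
STATE_ABBREVIATIONS = {"California": "CA", "Florida": "FL", "Hawaii": "HI"}


def normalize_month(value):
    """Return a title-cased month name, expanding known abbreviations."""
    if pd.isna(value):
        return value
    cleaned = str(value).strip().lower()
    return MONTH_ALIASES.get(cleaned, cleaned.title())


clean_df = form_df.copy()
clean_df["birth_month"] = clean_df["birth_month"].map(normalize_month)
# replace() leaves values that are not dictionary keys, including
# missing values, unchanged.
clean_df["state"] = clean_df["state"].str.strip().replace(STATE_ABBREVIATIONS)

month_counts = clean_df["birth_month"].value_counts()
state_counts = clean_df["state"].value_counts()
print(month_counts)
print(state_counts)
```

Cleaning after the fact only covers spellings someone has already seen, and it cannot recover the missing `state` entry. That is why the recommendations for the form itself would likely favor constrained inputs, such as drop-down lists for month and state and a required state field, over free-text boxes.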