"Let's take a look at this dataset to see what we have:"
"The amount of data (given by the `shape` attribute) and the names of the features or columns (given by the `columns` attribute) tell us something about the dataset. To dive deeper, we can use the `DataFrame.info()` method, which prints a concise summary of the `DataFrame`: each column's data type and number of non-null values, along with its memory usage."
"From the output of `DataFrame.info()`, we can make a few observations:\n",
"1. Data type of each column: the *Iris* dataset has four columns, and all of them are stored as 64-bit floating-point numbers (`float64`).\n",
"2. Number of non-null values: each column reports how many of its 150 entries are not null. Dealing with null values is an important step in data preparation and is covered later in this notebook."
]
},
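{
"cell_type": "markdown",
"metadata": {},
"source": [
"The non-null counts reported by `DataFrame.info()` become more informative once a column has missing values. The code cell below is a minimal sketch on a small, made-up `DataFrame` (`sample_df` and its values are hypothetical and not part of the *Iris* data). One value is `NaN`, so `info()` should report 3 non-null entries in that column out of 4 rows. The same per-column counts can also be computed directly with `DataFrame.notna().sum()`."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Hypothetical toy data (not the Iris dataset) with one missing value\n",
"sample_df = pd.DataFrame({\n",
"    'sepal_length': [5.1, 4.9, np.nan, 4.6],\n",
"    'sepal_width': [3.5, 3.0, 3.2, 3.1],\n",
"})\n",
"\n",
"print(sample_df.shape)          # (number of rows, number of columns)\n",
"print(list(sample_df.columns))  # names of the features\n",
"\n",
"# Data type and non-null count of each column, plus memory usage\n",
"sample_df.info()\n",
"\n",
"# Non-null values per column, computed directly\n",
"sample_df.notna().sum()"
],
"execution_count": null,
"outputs": []
},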
{
"cell_type": "markdown",
"metadata": {
"id": "IYlyxbpWFEF4"
},
"source": [
"### `DataFrame.describe`\n",
"Suppose our dataset contains a lot of numerical data. Univariate statistics such as the mean, median, and quartiles can be computed for each column individually. The `DataFrame.describe()` method computes them all at once, returning a statistical summary of the numerical columns of a dataset.\n",
"The output of `DataFrame.describe()` shows, for each numerical column, the number of non-null data points (`count`), the mean, the standard deviation, the minimum, the lower quartile (25%), the median (50%), the upper quartile (75%), and the maximum."
]
},
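{
"cell_type": "markdown",
"metadata": {},
"source": [
"The quartile rows of `describe()` are the same values that `DataFrame.quantile` computes, and the `50%` row is the median. The code cell below is a minimal sketch that checks this on a small, made-up `DataFrame` (`toy_df` and its values are hypothetical and not taken from the *Iris* data). It also checks that the `std` row is the sample standard deviation, i.e. the value `DataFrame.std()` returns with its default `ddof=1`."
]
},
{
"cell_type": "code",
"metadata": {},
"source": [
"import numpy as np\n",
"import pandas as pd\n",
"\n",
"# Hypothetical toy data (not the Iris dataset)\n",
"toy_df = pd.DataFrame({\n",
"    'petal_length': [1.4, 1.3, 4.7, 4.5, 6.0],\n",
"    'petal_width': [0.2, 0.2, 1.4, 1.5, 2.5],\n",
"})\n",
"\n",
"summary = toy_df.describe()\n",
"\n",
"# The '50%' row is the median of each column\n",
"assert np.allclose(summary.loc['50%'], toy_df.median())\n",
"# The '25%' and '75%' rows match DataFrame.quantile\n",
"assert np.allclose(summary.loc['25%'], toy_df.quantile(0.25))\n",
"assert np.allclose(summary.loc['75%'], toy_df.quantile(0.75))\n",
"# The 'std' row is the sample standard deviation (ddof=1)\n",
"assert np.allclose(summary.loc['std'], toy_df.std())\n",
"\n",
"summary"
],
"execution_count": null,
"outputs": []
},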
{
@ -216,20 +359,117 @@
},
"source": [
"### `DataFrame.head`\n",
"With the functions and attributes above, we have a top-level view of the dataset: we know how many data points and features there are, the data type of each feature, and the number of non-null values in each feature.\n",
"\n",
"Now it's time to look at the data itself. Let's see what the first few rows (the first few data points) of our `DataFrame` look like:"
"In the output, we can see five entries of the dataset. The index on the left confirms that these are the first five rows."
]
},
{
"cell_type": "markdown",
@ -239,7 +479,7 @@
"source": [
"### Exercise:\n",
"\n",
"As the example above shows, `DataFrame.head` returns the first five rows of a `DataFrame` by default. In the code cell below, can you figure out a way to display more than five rows?"
]
},
{
@ -252,7 +492,7 @@
"source": [
"# Hint: Consult the documentation by using iris_df.head?"
],
"execution_count": 6,
"outputs": []
},
{
@ -262,20 +502,106 @@
},
"source": [
"### `DataFrame.tail`\n",
"We can also look at the data from the end instead of the beginning. The flip side of `DataFrame.head` is `DataFrame.tail`, which returns the last five rows of a `DataFrame`:"
"In practice, it is useful to be able to quickly examine the first or last few rows of a `DataFrame`, particularly when you are looking for outliers in ordered datasets.\n",
"\n",
"All the functions and attributes shown above help us get a feel for the data.\n",
"\n",
"> **Takeaway:** Just by looking at the metadata of a `DataFrame`, or at its first and last few rows, you can get an immediate idea of the size, shape, and content of the data you are dealing with."