diff --git a/2-Working-With-Data/08-data-preparation/notebook.ipynb b/2-Working-With-Data/08-data-preparation/notebook.ipynb
index ac9bab8..3e8ae01 100644
--- a/2-Working-With-Data/08-data-preparation/notebook.ipynb
+++ b/2-Working-With-Data/08-data-preparation/notebook.ipynb
@@ -1614,6 +1614,300 @@
"You could use `isnull` to do this in place, but that can be laborious, particularly if you have a lot of values to fill. Because this is such a common task in data science, pandas provides `fillna`, which returns a copy of the `Series` or `DataFrame` with the missing values replaced with one of your choosing. Let's create another example `Series` to see how this works in practice."
]
},
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "CE8S7louLezV"
+ },
+ "source": [
+ "First, let us consider non-numeric data. Datasets often contain columns with categorical data, e.g. gender or True/False values.\n",
+ "\n",
+ "In most of these cases, we replace missing values with the `mode` of the column. Say we have 100 data points: 90 say True, 8 say False and 2 are left blank. Then we can fill those 2 with True, the most frequent value in the column.\n",
+ "\n",
+ "Again, we can use domain knowledge here. Let us consider an example of filling with the mode."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "MY5faq4yLdpQ",
+ "outputId": "c3838b07-0d15-471e-8dad-370de91d4bdc",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 204
+ }
+ },
+ "source": [
+ "fill_with_mode = pd.DataFrame([[1,2,\"True\"],\n",
+ " [3,4,None],\n",
+ " [5,6,\"False\"],\n",
+ " [7,8,\"True\"],\n",
+ " [9,10,\"True\"]])\n",
+ "\n",
+ "fill_with_mode"
+ ],
+ "execution_count": 28,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/html": [
+ "<div>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ "  <thead>\n",
+ "    <tr><th></th><th>0</th><th>1</th><th>2</th></tr>\n",
+ "  </thead>\n",
+ "  <tbody>\n",
+ "    <tr><th>0</th><td>1</td><td>2</td><td>True</td></tr>\n",
+ "    <tr><th>1</th><td>3</td><td>4</td><td>None</td></tr>\n",
+ "    <tr><th>2</th><td>5</td><td>6</td><td>False</td></tr>\n",
+ "    <tr><th>3</th><td>7</td><td>8</td><td>True</td></tr>\n",
+ "    <tr><th>4</th><td>9</td><td>10</td><td>True</td></tr>\n",
+ "  </tbody>\n",
+ "</table>\n",
+ "</div>"
+ ],
+ "text/plain": [
+ " 0 1 2\n",
+ "0 1 2 True\n",
+ "1 3 4 None\n",
+ "2 5 6 False\n",
+ "3 7 8 True\n",
+ "4 9 10 True"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 28
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "MLAoMQOfNPlA"
+ },
+ "source": [
+ "Now, let's find the mode before filling the `None` value with it."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "WKy-9Y2tN5jv",
+ "outputId": "41f5064e-502d-4aec-dc2d-86f885068b4f",
+ "colab": {
+ "base_uri": "https://localhost:8080/"
+ }
+ },
+ "source": [
+ "fill_with_mode[2].value_counts()"
+ ],
+ "execution_count": 29,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/plain": [
+ "True 3\n",
+ "False 1\n",
+ "Name: 2, dtype: int64"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 29
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "6iNz_zG_OKrx"
+ },
+ "source": [
+ "So, we will replace `None` with `'True'`."
+ ]
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "TxPKteRvNPOs"
+ },
+ "source": [
+ "fill_with_mode[2] = fill_with_mode[2].fillna('True')"
+ ],
+ "execution_count": 30,
+ "outputs": []
+ },
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "tvas7c9_OPWE",
+ "outputId": "7282c4f7-0e59-4398-b4f2-5919baf61164",
+ "colab": {
+ "base_uri": "https://localhost:8080/",
+ "height": 204
+ }
+ },
+ "source": [
+ "fill_with_mode"
+ ],
+ "execution_count": 31,
+ "outputs": [
+ {
+ "output_type": "execute_result",
+ "data": {
+ "text/html": [
+ "<div>\n",
+ "<table border=\"1\" class=\"dataframe\">\n",
+ "  <thead>\n",
+ "    <tr><th></th><th>0</th><th>1</th><th>2</th></tr>\n",
+ "  </thead>\n",
+ "  <tbody>\n",
+ "    <tr><th>0</th><td>1</td><td>2</td><td>True</td></tr>\n",
+ "    <tr><th>1</th><td>3</td><td>4</td><td>True</td></tr>\n",
+ "    <tr><th>2</th><td>5</td><td>6</td><td>False</td></tr>\n",
+ "    <tr><th>3</th><td>7</td><td>8</td><td>True</td></tr>\n",
+ "    <tr><th>4</th><td>9</td><td>10</td><td>True</td></tr>\n",
+ "  </tbody>\n",
+ "</table>\n",
+ "</div>"
+ ],
+ "text/plain": [
+ " 0 1 2\n",
+ "0 1 2 True\n",
+ "1 3 4 True\n",
+ "2 5 6 False\n",
+ "3 7 8 True\n",
+ "4 9 10 True"
+ ]
+ },
+ "metadata": {},
+ "execution_count": 31
+ }
+ ]
+ },
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "SktitLxxOR16"
+ },
+ "source": [
+ "As we can see, the null value has been replaced. Needless to say, we could have written anything in place of `'True'` and it would have been substituted."
+ ]
+ },
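Rather than reading the mode off `value_counts()` and typing it in by hand, the mode can be computed and passed to `fillna` directly. The following is a minimal sketch that rebuilds the same example frame; the names `df` and `most_common` are illustrative only and are not part of the lesson.

```python
import pandas as pd

# Rebuild the example frame used above (column 2 holds categorical strings).
df = pd.DataFrame([[1, 2, "True"],
                   [3, 4, None],
                   [5, 6, "False"],
                   [7, 8, "True"],
                   [9, 10, "True"]])

# Series.mode() skips missing values by default and returns a Series,
# because several values can tie for most frequent; take the first one.
most_common = df[2].mode()[0]

# Assign the filled column back rather than using inplace=True on a column.
df[2] = df[2].fillna(most_common)

print(most_common)
print(df[2].tolist())
```

If several values tie for the mode, taking the first is an arbitrary choice; domain knowledge may suggest a better one.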
+ {
+ "cell_type": "markdown",
+ "metadata": {
+ "id": "heYe1I0dOmQ_"
+ },
+ "source": [
+ "Now, coming to numeric data. Here, we have two common ways of replacing missing values:\n",
+ "\n",
+ "1. Replace with the median of the column\n",
+ "2. Replace with the mean of the column\n",
+ "\n",
+ "We replace with the median when the data is skewed or contains outliers, because the median is robust to outliers.\n",
+ "\n",
+ "When the data is roughly symmetric (for example, normally distributed), we can use the mean, since in that case the mean and median are close."
+ ]
+ },
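To make the two strategies concrete, here is a small sketch using a made-up column with one outlier. The data and the name `numeric` are assumptions for illustration and do not come from the lesson's dataset.

```python
import numpy as np
import pandas as pd

# Hypothetical column: mostly small values, one large outlier, one missing value.
numeric = pd.Series([1.0, 2.0, np.nan, 3.0, 100.0])

# Both statistics skip NaN by default (skipna=True).
median_value = numeric.median()  # middle of 1, 2, 3, 100
mean_value = numeric.mean()      # pulled upward by the outlier

# fillna returns a new Series; the original is left unchanged.
filled_with_median = numeric.fillna(median_value)
filled_with_mean = numeric.fillna(mean_value)

print(median_value, mean_value)
print(filled_with_median.tolist())
print(filled_with_mean.tolist())
```

In this sketch the median fill stays near the typical values, while the mean fill is dragged toward the outlier, which is why the median is usually preferred for skewed data.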
+ {
+ "cell_type": "code",
+ "metadata": {
+ "id": "09HM_2feOj5Y"
+ },
+ "source": [
+ ""
+ ],
+ "execution_count": null,
+ "outputs": []
+ },
{
"cell_type": "code",
"metadata": {