Explain what tibble is

pull/313/head
R-icntay 3 years ago
parent 558f25f80d
commit 74bf75f37f

@ -163,12 +163,12 @@
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": null,
"source": [ "source": [
"# Basic information about the data\n", "# Basic information about the data\r\n",
"df %>%\n", "df %>%\r\n",
" introduce()\n", " introduce()\r\n",
"\n", "\r\n",
"# Visualize basic information above\n", "# Visualize basic information above\r\n",
"df %>% \n", "df %>% \r\n",
" plot_intro(ggtheme = theme_light())" " plot_intro(ggtheme = theme_light())"
], ],
"outputs": [], "outputs": [],
@ -193,17 +193,17 @@
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": null,
"source": [ "source": [
"# Count observations per cuisine\n", "# Count observations per cuisine\r\n",
"df %>% \n", "df %>% \r\n",
" count(cuisine) %>% \n", " count(cuisine) %>% \r\n",
" arrange(n)\n", " arrange(n)\r\n",
"\n", "\r\n",
"# Plot the distribution\n", "# Plot the distribution\r\n",
"theme_set(theme_light())\n", "theme_set(theme_light())\r\n",
"df %>% \n", "df %>% \r\n",
" count(cuisine) %>% \n", " count(cuisine) %>% \r\n",
" ggplot(mapping = aes(x = n, y = reorder(cuisine, -n))) +\n", " ggplot(mapping = aes(x = n, y = reorder(cuisine, -n))) +\r\n",
" geom_col(fill = \"midnightblue\", alpha = 0.7) +\n", " geom_col(fill = \"midnightblue\", alpha = 0.7) +\r\n",
" ylab(\"cuisine\")" " ylab(\"cuisine\")"
], ],
"outputs": [], "outputs": [],
@ -214,15 +214,17 @@
{ {
"cell_type": "markdown", "cell_type": "markdown",
"source": [ "source": [
"There are a finite number of cuisines, but the distribution of data is uneven. You can fix that! Before doing so, explore a little more.\n", "There are a finite number of cuisines, but the distribution of data is uneven. You can fix that! Before doing so, explore a little more.\r\n",
"\n", "\r\n",
"Next, let's assign each cuisine into its individual table and find out how much data is available (rows, columns) per cuisine.\n", "Next, let's assign each cuisine into its individual tibble and find out how much data is available (rows, columns) per cuisine.\r\n",
"\n", "\r\n",
"<p >\n", "> A tibble is a modern reimagining of the data frame, keeping what time has proven to be effective, and throwing out what is not.\r\n",
" <img src=\"../../images/dplyr_filter.jpg\"\n", "\r\n",
" width=\"600\"/>\n", "<p >\r\n",
" <figcaption>Artwork by @allison_horst</figcaption>\n", " <img src=\"../../images/dplyr_filter.jpg\"\r\n",
"\n" " width=\"600\"/>\r\n",
" <figcaption>Artwork by @allison_horst</figcaption>\r\n",
"\r\n"
], ],
"metadata": { "metadata": {
"id": "vVvyDb1kG2in" "id": "vVvyDb1kG2in"
@ -232,24 +234,24 @@
"cell_type": "code", "cell_type": "code",
"execution_count": null, "execution_count": null,
"source": [ "source": [
"# Create individual tables for the cuisines\n", "# Create individual tibbles for the cuisines\r\n",
"thai_df <- df %>% \n", "thai_df <- df %>% \r\n",
" filter(cuisine == \"thai\")\n", " filter(cuisine == \"thai\")\r\n",
"japanese_df <- df %>% \n", "japanese_df <- df %>% \r\n",
" filter(cuisine == \"japanese\")\n", " filter(cuisine == \"japanese\")\r\n",
"chinese_df <- df %>% \n", "chinese_df <- df %>% \r\n",
" filter(cuisine == \"chinese\")\n", " filter(cuisine == \"chinese\")\r\n",
"indian_df <- df %>% \n", "indian_df <- df %>% \r\n",
" filter(cuisine == \"indian\")\n", " filter(cuisine == \"indian\")\r\n",
"korean_df <- df %>% \n", "korean_df <- df %>% \r\n",
" filter(cuisine == \"korean\")\n", " filter(cuisine == \"korean\")\r\n",
"\n", "\r\n",
"\n", "\r\n",
"# Find out how much data is available per cuisine\n", "# Find out how much data is available per cuisine\r\n",
"cat(\" thai df:\", dim(thai_df), \"\\n\",\n", "cat(\" thai df:\", dim(thai_df), \"\\n\",\r\n",
" \"japanese df:\", dim(japanese_df), \"\\n\",\n", " \"japanese df:\", dim(japanese_df), \"\\n\",\r\n",
" \"chinese_df:\", dim(chinese_df), \"\\n\",\n", " \"chinese_df:\", dim(chinese_df), \"\\n\",\r\n",
" \"indian_df:\", dim(indian_df), \"\\n\",\n", " \"indian_df:\", dim(indian_df), \"\\n\",\r\n",
" \"korean_df:\", dim(korean_df))" " \"korean_df:\", dim(korean_df))"
], ],
"outputs": [], "outputs": [],

@ -34,7 +34,7 @@ Classification is one of the fundamental activities of the machine learning rese
To state the process in a more scientific way, your classification method creates a predictive model that enables you to map the relationship between input variables to output variables. To state the process in a more scientific way, your classification method creates a predictive model that enables you to map the relationship between input variables to output variables.
![Binary vs. multiclass problems for classification algorithms to handle. Infographic by Jen Looper](../images/binary-multiclass.png) ![Binary vs. multiclass problems for classification algorithms to handle. Infographic by Jen Looper](../../images/binary-multiclass.png){width="500"}
Before starting the process of cleaning our data, visualizing it, and prepping it for our ML tasks, let's learn a bit about the various ways machine learning can be leveraged to classify data. Before starting the process of cleaning our data, visualizing it, and prepping it for our ML tasks, let's learn a bit about the various ways machine learning can be leveraged to classify data.
@ -127,7 +127,9 @@ There are a finite number of cuisines, but the distribution of data is uneven. Y
2. Next, let's assign each cuisine into its individual tibble and find out how much data is available (rows, columns) per cuisine. 2. Next, let's assign each cuisine into its individual tibble and find out how much data is available (rows, columns) per cuisine.
![Artwork by \@allison_horst](../images/dplyr_filter.jpg) > A tibble, or tbl_df, is a modern reimagining of the data.frame, keeping what time has proven to be effective, and throwing out what is not.
![Artwork by \@allison_horst](../../images/dplyr_filter.jpg)
```{r cuisine_df} ```{r cuisine_df}
# Create individual tibbles for the cuisines # Create individual tibbles for the cuisines
@ -297,7 +299,7 @@ df_select %>%
## Preprocessing data using recipes 👩‍🍳👨‍🍳 - Dealing with imbalanced data ⚖️ ## Preprocessing data using recipes 👩‍🍳👨‍🍳 - Dealing with imbalanced data ⚖️
![Artwork by \@allison_horst](../images/recipes.png) ![Artwork by \@allison_horst](../../images/recipes.png)
Given that this lesson is about cuisines, we have to put `recipes` into context. Given that this lesson is about cuisines, we have to put `recipes` into context.
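
As a side note to the tibble explanation this commit adds, here is a minimal sketch of how a tibble behaves next to a base data frame. It assumes the `tibble` and `dplyr` packages are installed; `toy_df`, `thai_toy`, and `base_df` are hypothetical names with made-up values and are not part of the lesson's dataset.

```r
# Assumes the tibble and dplyr packages are installed.
library(tibble)
library(dplyr)

# Hypothetical toy data standing in for the lesson's `df`
toy_df <- tibble(
  cuisine = c("thai", "thai", "korean", "indian"),
  rice    = c(1, 0, 1, 1)
)

# filter() on a tibble returns another tibble
thai_toy <- toy_df %>%
  filter(cuisine == "thai")

is_tibble(thai_toy)  # TRUE
dim(thai_toy)        # number of rows, then number of columns

# Selecting one column with `[`: a tibble stays a tibble,
# while a base data frame drops to a plain vector
base_df <- as.data.frame(toy_df)
class(toy_df[, "rice"])
class(base_df[, "rice"])
```

Because `filter()` preserves the tibble class, per-cuisine objects such as `thai_df` in the diff above remain tibbles, and `dim()` reports their rows and columns in the same way it does for a data frame.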
