Merge branch 'main' of https://github.com/jasleen101010/ML-For-Beginners

1 year ago · 9054bad5e9
parent 6fddfa55c6 ab24e44a52
commit 9054bad5e9
1 changed files with 21 additions and 0 deletions
--- a/2-Regression/4-Logistic/solution/R/lesson_4.Rmd
+++ b/2-Regression/4-Logistic/solution/R/lesson_4.Rmd
@@ -243,6 +243,27 @@ Now that we have an idea of the relationship between the binary categories of co

 ## 3. Build your model

+Let's begin by splitting the data into `training` and `test` sets. The training set is used to train a classifier so that it finds a statistical relationship between the features and the label value.
+
+It is best practice to hold out some of your data for **testing**. Comparing the model's predicted labels with the already known labels in the test set gives a better estimate of how the model will perform on new data. [rsample](https://rsample.tidymodels.org/), a package in Tidymodels, provides infrastructure for efficient data splitting and resampling:
+
+```{r split_data}
+# Split data into 80% for training and 20% for testing
+set.seed(2056)
+pumpkins_split <- pumpkins_select %>% 
+  initial_split(prop = 0.8)
+
+# Extract the data in each split
+pumpkins_train <- training(pumpkins_split)
+pumpkins_test <- testing(pumpkins_split)
+
+# Print out the first 5 rows of the training set
+pumpkins_train %>% 
+  slice_head(n = 5)
+
+
+```
+
 🙌 We are now ready to train a model by fitting the training features to the training label (color).

 We'll begin by creating a recipe that specifies the preprocessing steps that should be carried out on our data to get it ready for modelling, for example encoding categorical variables into a set of integers. Just like `baked_pumpkins`, we create a `pumpkins_recipe` but do not `prep` and `bake` it, since it will be bundled into a workflow, which you will see in just a few steps from now.
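
A note on the split step added in this hunk: when the label is imbalanced, a purely random 80/20 split can leave the test set with a different class balance from the training set. The sketch below is not part of the commit. It uses a hypothetical `toy_pumpkins` data frame (made-up columns and values, not the lesson's `pumpkins_select`) to show how the `strata` argument of `initial_split()` might be used to keep the class proportions roughly similar in both sets.

```r
# Illustrative sketch only: `toy_pumpkins` and its columns are hypothetical,
# not the lesson's `pumpkins_select` data.
library(rsample)

set.seed(2056)

# A small, imbalanced toy data set: 80 orange vs 20 white pumpkins
toy_pumpkins <- data.frame(
  color = factor(rep(c("ORANGE", "WHITE"), times = c(80, 20))),
  item_size = sample(1:6, size = 100, replace = TRUE)
)

# Stratify on the label so both sets keep roughly the same class balance
toy_split <- initial_split(toy_pumpkins, prop = 0.8, strata = color)
toy_train <- training(toy_split)
toy_test <- testing(toy_split)

# Every row lands in exactly one of the two sets
nrow(toy_train) + nrow(toy_test) == nrow(toy_pumpkins)

# Compare class proportions in each set
prop.table(table(toy_train$color))
prop.table(table(toy_test$color))
```

Whether stratifying is worthwhile for the lesson's data depends on how imbalanced the `color` label actually is. The unstratified split in the commit is a reasonable default when the classes are fairly even.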