Working with Data: Data Preparation
Data Preparation - Sketchnote by @nitya
Pre-Lecture Quiz
Depending on its source, raw data may contain inconsistencies that complicate analysis and modeling. Such data is often called “dirty” and needs to be cleaned before use. This lesson focuses on techniques for cleaning and transforming data that is missing, inaccurate, or incomplete. The topics use Python and the Pandas library and are demonstrated in the notebook in this directory.
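As a first taste of what "dirty" data looks like in Pandas, the following minimal sketch counts missing values in a small table. The DataFrame `raw` and its column names and values are made up for illustration and are not taken from the lesson notebook.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data with a few gaps (not from the lesson notebook)
raw = pd.DataFrame({
    "name": ["Ana", "Ben", None, "Dee"],
    "age": [34, np.nan, 29, 41],
    "city": ["Lima", "Oslo", "Pune", None],
})

# Count how many values are missing in each column
missing_per_column = raw.isna().sum()

# Keep only the rows that have at least one missing value
incomplete_rows = raw[raw.isna().any(axis=1)]

print(missing_per_column)
print(incomplete_rows)
```

Checking where the gaps are before doing anything else helps decide whether to drop rows, fill values, or go back to the source.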
The importance of cleaning data
- Ease of use and reuse: When data is properly organized and normalized, it’s easier to search, use, and share with others.
- Consistency: Data science often requires working with more than one dataset, and datasets from different sources often need to be joined together. Standardizing each dataset in the same way ensures that the data remains useful once they are merged into one dataset.
- Model accuracy: Data that has been cleaned improves the accuracy of models that rely on it.
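
The consistency point can be illustrated with a small sketch, assuming two hypothetical datasets whose join key is formatted differently. The frames `sales` and `targets`, the helper `standardize_region`, and all values are invented for this example. It shows one possible way to standardize a key column before merging, not a prescribed method.

```python
import pandas as pd

# Hypothetical datasets from two sources; the key column differs in
# name, letter case, and surrounding whitespace
sales = pd.DataFrame({
    "Region": [" north", "SOUTH ", "east"],
    "sales": [120, 95, 60],
})
targets = pd.DataFrame({
    "region": ["north", "south", "east"],
    "target": [100, 100, 50],
})

def standardize_region(frame, column):
    """Return a copy with the key column renamed to 'region',
    stripped of whitespace, and lowercased."""
    cleaned = frame.rename(columns={column: "region"})
    cleaned["region"] = cleaned["region"].str.strip().str.lower()
    return cleaned

# Merging without standardizing the values silently drops mismatched rows
naive = sales.rename(columns={"Region": "region"}).merge(targets, on="region")

# Merging after standardizing both sides keeps every region
merged = standardize_region(sales, "Region").merge(
    standardize_region(targets, "region"), on="region"
)

print(len(naive), len(merged))
```

Because the default inner join raises no error for unmatched keys, comparing row counts before and after a merge is a cheap way to notice inconsistent formatting.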
