data science lifecycle intro challenge

pull/108/head
Jasmine 3 years ago
parent 9adcaacf2b
commit f94391a031

@ -1,5 +1,9 @@
# Introduction to the Data Science Lifecycle
## Pre-Lecture Quiz
[Pre-lecture quiz]()
At this point you've probably come to the realization that data science is a process. This process can be broken down into 5 stages:
- Capturing
@ -61,9 +65,6 @@ On premise refers to hosting managing the data on your own equipment, like ownin
**Cold vs hot data**
When training your models, you may require more training data. Even if you're content with your model, more data will arrive as the model serves its purpose. In either case, the cost of storing and accessing data will increase as you accumulate more of it. Separating rarely used data, known as cold data, from frequently accessed hot data can be a cheaper storage option through hardware or software services. If cold data needs to be accessed, it may take a little longer to retrieve than hot data.
Below is an example of the cost of owning an Azure Storage Account
[screenshot of Azure cost calculator]
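The hot/cold split can be sketched in a few lines of Python. This is only an illustration: the 30-day threshold, the file names, and the `split_hot_cold` helper are assumptions made for the example, not part of the lesson or of any storage service's API.

```python
from datetime import datetime, timedelta

# Hypothetical threshold: data untouched for 30 days or more is treated as cold.
COLD_AFTER = timedelta(days=30)


def split_hot_cold(last_access, now, cold_after=COLD_AFTER):
    """Split a {name: last_access_datetime} mapping into (hot, cold) name lists."""
    hot, cold = [], []
    for name, accessed in last_access.items():
        if now - accessed >= cold_after:
            cold.append(name)
        else:
            hot.append(name)
    return sorted(hot), sorted(cold)


# Made-up example data, for illustration only.
now = datetime(2021, 9, 1)
last_access = {
    "sales_2019.csv": datetime(2020, 1, 15),
    "sales_2021.csv": datetime(2021, 8, 30),
    "training_set.parquet": datetime(2021, 8, 20),
}

hot, cold = split_hot_cold(last_access, now)
print("hot:", hot)
print("cold:", cold)
```

In practice a storage service would usually apply a rule like this for you; the sketch only shows the decision being made.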
### Managing Data
As you work with data, you may discover that some of it needs to be cleaned using techniques covered in the lesson on [data preparation](2-Working-With-Data\08-data-preparation) before you can build accurate models. When new data arrives, it will need the same techniques applied to maintain consistent quality. Some projects use an automated tool for cleansing, aggregation, and compression before the data is moved to its final location. Azure Data Factory is an example of one of these tools.
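One way to keep quality consistent is to put the cleaning steps in a single function and run every new batch through it. The sketch below is a minimal, plain-Python illustration of that idea. The `clean_records` helper and the sample rows are hypothetical, and it does not represent how Azure Data Factory works.

```python
def clean_records(records):
    """Drop rows with missing values, trim whitespace, and remove duplicates."""
    seen = set()
    cleaned = []
    for row in records:
        # Skip rows where any value is missing or blank.
        if any(v is None or str(v).strip() == "" for v in row.values()):
            continue
        # Trim surrounding whitespace from string values.
        normalized = {
            k: v.strip() if isinstance(v, str) else v for k, v in row.items()
        }
        # Use the sorted items as a fingerprint to detect duplicate rows.
        fingerprint = tuple(sorted(normalized.items()))
        if fingerprint not in seen:
            seen.add(fingerprint)
            cleaned.append(normalized)
    return cleaned


# Made-up data: the same function handles the existing set and a new batch.
existing = [
    {"city": " Seattle ", "temp": 61},
    {"city": "Seattle", "temp": 61},
    {"city": "", "temp": 70},
]
new_batch = [
    {"city": "Austin ", "temp": 95},
    {"city": "Austin", "temp": None},
]

clean_existing = clean_records(existing)
clean_new = clean_records(new_batch)
```

Because both calls share one function, a change to the cleaning rules applies to old and new data alike.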
@ -77,11 +78,15 @@ Heres some things that a team may do with security in mind:
- Let only certain project members alter the data
## Pre-Lecture Quiz
## 🚀 Challenge
[Pre-lecture quiz]()
There are many versions of the Data Science Lifecycle. Each version may use different names for its steps and a different number of stages, but all contain the same processes mentioned in this lesson.
## 🚀 Challenge
Explore the [Team Data Science Process lifecycle](https://docs.microsoft.com/en-us/azure/architecture/data-science-process/lifecycle) and the [Cross-industry standard process for data mining](https://www.datascience-pm.com/crisp-dm-2/). Name 3 similarities and differences between the two.
|Team Data Science Process (TDSP)|Cross-industry standard process for data mining (CRISP-DM)|
|--|--|
|![](..\images\tdsp-lifecycle2.png)|![](..\images\CRISP-DM.png)|
## Post-Lecture Quiz

Binary file not shown. Size after: 20 KiB

Binary file not shown. Size after: 279 KiB
