capturing stage

4 years ago · 302af31481
parent 0d569a60e0
commit 302af31481
1 changed files with 34 additions and 1 deletions
--- a/4-Data-Science-Lifecycle/14-introduction-to-data-science-lifecycle/README.md
+++ b/4-Data-Science-Lifecycle/14-introduction-to-data-science-lifecycle/README.md
@@ -1,4 +1,37 @@
-# The Data Science Lifecycle: Capturing
+# Introduction to the Data Science Lifecycle
+
+At this point you've probably come to the realization that data science is a process. This process can be broken down into 5 stages, starting with capturing and ending with maintenance.
+This lesson focuses on 3 parts of the lifecycle: capturing, processing and maintenance.
+
+![](./data-science-lifecycle.jpg)
+(source??)
+
+## Capturing
+
+The first stage of the lifecycle is very important because the later stages depend on it. It’s practically two stages combined into one: acquiring the data, and defining the purpose and problems that need to be addressed.
+Defining the goals of the project requires deeper context into the problem or question. First, we need to identify and connect with those who need their problem solved. These may be stakeholders in a business or sponsors of the project, who can help identify who or what will benefit from the project, as well as what they need and why they need it. A well-defined goal should be measurable and quantifiable so that an acceptable result can be defined.
+
+Questions a data scientist may ask:
+-	Has this problem been approached before? What was discovered?
+-	Are the purpose and goal understood by everyone involved?
+-	Where is there ambiguity, and how can it be reduced?
+-	What are the constraints?
+-	What will the end result potentially look like?
+-	What resources (time, people, computational) are available, and how much of each?
+
+The next step is identifying, collecting, and finally exploring the data needed to achieve these defined goals. At this step of acquisition, data scientists must also evaluate the quantity and quality of the data. This requires some data exploration to confirm that what has been acquired will support reaching the desired result.
+
+Questions a data scientist may ask about the data:
+-	What data is already available to me?
+-	Who owns this data?
+-	What are the privacy concerns? 
+-	Do I have enough to solve this problem?
+-	Is the data of acceptable quality for this problem?
+-	If I discover additional information through this data, should I consider changing or redefining the goals?
+
+
+## Processing
+## Maintaining

 ## Pre-Lecture Quiz