diff --git a/4-Data-Science-Lifecycle/14-Introduction/README.md b/4-Data-Science-Lifecycle/14-Introduction/README.md
index 2b26d697..393db5fa 100644
--- a/4-Data-Science-Lifecycle/14-Introduction/README.md
+++ b/4-Data-Science-Lifecycle/14-Introduction/README.md
@@ -57,7 +57,7 @@ Common techniques used in this stage are covered in the ML for Beginners curricu
 In the diagram of lifecycle, you may have noticed that maintenance sits between capturing and processing. Maintenance is an ongoing process of managing, storing and securing the data throughout the process of a project and should be taken into consideration throughout the entirety of the project.
 ### Storing Data
-Considerations of how and where the data is stored can influence the cost of its storage as well as performance of how fast the data can be accessed. Decisions like these are not likely to made by a data scientist alone but they may find themselves making choices on how to work with the data based on how it’s stored.
+Considerations of how and where the data is stored can influence the cost of its storage as well as performance of how fast the data can be accessed. Decisions like these are not likely to be made by a data scientist alone but they may find themselves making choices on how to work with the data based on how it’s stored.
 Here’s some aspects of modern data storage systems that can affect these choices:
diff --git a/4-Data-Science-Lifecycle/15-analyzing/README.md b/4-Data-Science-Lifecycle/15-analyzing/README.md
index a71cf337..fa3ee14e 100644
--- a/4-Data-Science-Lifecycle/15-analyzing/README.md
+++ b/4-Data-Science-Lifecycle/15-analyzing/README.md
@@ -4,8 +4,6 @@
 |:---:|
 | Data Science Lifecycle: Analyzing - _Sketchnote by [@nitya](https://twitter.com/nitya)_ |
-## Pre-Lecture Quiz
-
 ## [Pre-Lecture Quiz](https://ff-quizzes.netlify.app/en/ds/quiz/28)
 Analyzing in the data lifecycle confirms that the data can answer the questions that are proposed or solving a particular problem. This step can also focus on confirming a model is correctly addressing these questions and problems. This lesson is focused on Exploratory Data Analysis or EDA, which are techniques for defining features and relationships within the data and can be used to prepare the data for modeling.
@@ -27,11 +25,11 @@ How do we evaluate if we have enough data to solve this problem? Data profiling
 In a few of the previous lessons, we have used Pandas to provide some descriptive statistics with the [`describe()` function]( https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.describe.html). It provides the count, max and min values, mean, standard deviation and quantiles on the numerical data. Using descriptive statistics like the `describe()` function can help you assess how much you have and if you need more.
 ## Sampling and Querying
-Exploring everything in a large dataset can be very time consuming and a task that’s usually left up to a computer to do. However, sampling is a helpful tool in understanding of the data and allows us to have a better understanding of what’s in the dataset and what it represents. With a sample, you can apply probability and statistics to come to some general conclusions about your data. While there’s no defined rule on how much data you should sample it’s important to note that the more data you sample, the more precise of a generalization you can make of about data.
+Exploring everything in a large dataset can be very time consuming and a task that’s usually left up to a computer to do. However, sampling is a helpful tool in understanding the data and allows us to have a better understanding of what’s in the dataset and what it represents. With a sample, you can apply probability and statistics to come to some general conclusions about your data. While there’s no defined rule on how much data you should sample it’s important to note that the more data you sample, the more precise of a generalization you can make of about data.
 Pandas has the [`sample()` function in its library](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.sample.html) where you can pass an argument of how many random samples you’d like to receive and use.
 General querying of the data can help you answer some general questions and theories you may have. In contrast to sampling, queries allow you to have control and focus on specific parts of the data you have questions about.
-The [`query() `function](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.query.html) in the Pandas library allows you to select columns and receive simple answers about the data through the rows retrieved.
+The [`query()` function](https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.query.html) in the Pandas library allows you to select columns and receive simple answers about the data through the rows retrieved.
 ## Exploring with Visualizations
 You don’t have to wait until the data is thoroughly cleaned and analyzed to start creating visualizations. In fact, having a visual representation while exploring can help identify patterns, relationships, and problems in the data. Furthermore, visualizations provide a means of communication with those who are not involved with managing the data and can be an opportunity to share and clarify additional questions that were not addressed in the capture stage. Refer to the [section on Visualizations](/3-Data-Visualization) to learn more about some popular ways to explore visually.
diff --git a/4-Data-Science-Lifecycle/16-communication/README.md b/4-Data-Science-Lifecycle/16-communication/README.md
index 6d10f537..a2cb45a3 100644
--- a/4-Data-Science-Lifecycle/16-communication/README.md
+++ b/4-Data-Science-Lifecycle/16-communication/README.md
@@ -11,7 +11,7 @@ Test your knowledge of what's to come with the Pre-Lecture Quiz above!
 # Introduction
 ### What is Communication?
-Let’s start this lesson by defining what is means to communicate. **To communicate is to convey or exchange information.** Information can be ideas, thoughts, feelings, messages, covert signals, data – anything that a **_sender_** (someone sending information) wants a **_receiver_** (someone receiving information) to understand. In this lesson, we will refer to senders as communicators, and receivers as the audience.
+Let’s start this lesson by defining what it means to communicate. **To communicate is to convey or exchange information.** Information can be ideas, thoughts, feelings, messages, covert signals, data – anything that a **_sender_** (someone sending information) wants a **_receiver_** (someone receiving information) to understand. In this lesson, we will refer to senders as communicators, and receivers as the audience.
 ### Data Communication & Storytelling
 We understand that when communicating, the aim is to convey or exchange information. But when communicating data, your aim shouldn't be to simply pass along numbers to your audience. Your aim should be to communicate a story that is informed by your data - effective data communication and storytelling go hand-in-hand. Your audience is more likely to remember a story you tell, than a number you give. Later in this lesson, we will go over a few ways that you can use storytelling to communicate your data more effectively.
@@ -91,7 +91,7 @@ That messaging is more clear. When communicating data, it can be easy to think t
 You can communicate data more clearly when you use meaningful words and phrases, instead of vague ones. Below are a few examples.
 - We had an *impressive* year!
-  - One person could think a impressive means a 2% - 3% increase in revenue, and one person could think it means a 50% - 60% increase.
+  - One person could think an impressive year means a 2% - 3% increase in revenue, and one person could think it means a 50% - 60% increase.
 - Our users' success rates increased *dramatically*.
   - How large of an increase is a dramatic increase?
 - This undertaking will require *significant* effort.
@@ -121,7 +121,7 @@ How do you use emotion when communicating data? Below are a couple of ways.
 - Yellow is usually optimism and happiness
 # Communication Case Study
-Emerson is a Product Manager for a mobile app. Emerson has noticed that customers submit 42% more complaints and bug reports on the weekends. Emerson also noticed that customers who submit a complaint that goes unanswered after 48 hours are more 32% more likely to give the app a rating of 1 or 2 in the app store.
+Emerson is a Product Manager for a mobile app. Emerson has noticed that customers submit 42% more complaints and bug reports on the weekends. Emerson also noticed that customers who submit a complaint that goes unanswered after 48 hours are 32% more likely to give the app a rating of 1 or 2 in the app store.
 After doing research, Emerson has a couple of solutions that will address the issue. Emerson sets up a 30-minute meeting with the 3 company leads to communicate the data and the proposed solutions.
@@ -143,7 +143,7 @@ How could Emerson improve this approach?
 Context, Conflict, Climax, Closure, Conclusion
 **Context** - Emerson could spend the first 5 minutes introducing the entire situation and making sure that the team leads understand how the problems affect metrics that are critical to the company, like revenue.
-It could be laid out this way: "Currently, our app's rating in the app store is a 2.5. Ratings in the app store are critical to App Store Optimization, which impacts how many users see our app in search, and how our app is viewed to perspective users. And ofcourse, the number of users we have is tied directly to revenue."
+It could be laid out this way: "Currently, our app's rating in the app store is a 2.5. Ratings in the app store are critical to App Store Optimization, which impacts how many users see our app in search, and how our app is viewed to prospective users. And of course, the number of users we have is tied directly to revenue."
 **Conflict**
 Emerson could then move to talk for the next 5 minutes or so on the conflict.