diff --git a/4-Data-Science-Lifecycle/14-Introduction/README.md b/4-Data-Science-Lifecycle/14-Introduction/README.md
index e248c16f..20e7c12d 100644
--- a/4-Data-Science-Lifecycle/14-Introduction/README.md
+++ b/4-Data-Science-Lifecycle/14-Introduction/README.md
@@ -4,8 +4,6 @@
 |:---:|
 | Introduction to the Data Science Lifecycle - _Sketchnote by [@nitya](https://twitter.com/nitya)_ |
 
-## Pre-Lecture Quiz
-
 ## [Pre-Lecture Quiz](https://red-water-0103e7a0f.azurestaticapps.net/quiz/26)
 
 At this point you've probably come to the realization that data science is a process. This process can be broken down into 5 stages:
@@ -93,8 +91,7 @@ Explore the [Team Data Science Process lifecycle](https://docs.microsoft.com/en-
 
 |![](..\images\tdsp-lifecycle2.png)> Photo by [Microsoft](https://docs.microsoft.com/en-us/azure/architecture/data-science-process/lifecycle)| ![](..\images\CRISP-DM.png)> Photo by [Data Science Process Alliance](https://www.datascience-pm.com/crisp-dm-2/)
 
-## Post-Lecture Quiz
-[Post-lecture quiz]()
+## [Post-Lecture Quiz](https://red-water-0103e7a0f.azurestaticapps.net/quiz/27)
 
 ## Review & Self Study
 
@@ -105,4 +102,4 @@ Applying the Data Science Lifecycle involves multiple roles and tasks, where som
 
 ## Assignment
 
-[Exploring and Assessing a Dataset](assignment.md)
+[Assessing a Dataset](assignment.md)
diff --git a/4-Data-Science-Lifecycle/14-Introduction/assignment.md b/4-Data-Science-Lifecycle/14-Introduction/assignment.md
index 44f8436c..36b01d5d 100644
--- a/4-Data-Science-Lifecycle/14-Introduction/assignment.md
+++ b/4-Data-Science-Lifecycle/14-Introduction/assignment.md
@@ -1,18 +1,20 @@
-# Exploring and Assessing a Dataset
+# Assessing a Dataset
 
 A client has approached your team for help in investigating a taxi customer's seasonal spending habits in New York City.
 
 They want to know: **Do yellow taxi passengers in New York City tip drivers more in the winter or summer?**
 
-Your team is in the [Capturing](Readme.md#Capturing) stage of the Data Science Lifecycle and you are in charge of exploring the dataset. You have been provided a notebook and data from Azure Open Datasets to explore and assess if the data can answer the client's question. You have decided to select a small sample of 1 summer month and 1 winter month in the year 2019.
+Your team is in the [Capturing](Readme.md#Capturing) stage of the Data Science Lifecycle and you are in charge of handling the dataset. You have been provided a notebook and [data](../../data/taxi.csv) to explore.
 
-## Instructions
+In this directory is a [notebook](notebook.ipynb) that uses Python to load yellow taxi trip data from the [NYC Taxi & Limousine Commission](https://docs.microsoft.com/en-us/azure/open-datasets/dataset-taxi-yellow?tabs=azureml-opendatasets).
+You can also open the taxi data file in a text editor or spreadsheet software like Excel.
 
-In this directory is a [notebook](notebook.ipynb) that uses Python to load yellow taxi trip data from the [NYC Taxi & Limousine Commission](https://docs.microsoft.com/en-us/azure/open-datasets/dataset-taxi-yellow?tabs=azureml-opendatasets) for the months of January and July 2019. These datasets have been joined together in a Pandas dataframe.
+## Instructions
 
-Your task is to identify the columns that are the most likely required to answer this question, then reorganize the joined dataset so that these columns are displayed first.
+- Assess whether or not the data in this dataset can help answer the question.
+- Write 3 questions that you would ask the client for more clarification and better understanding of the problem.
 
-Finally, write 3 questions that you would ask the client for more clarification and better understanding of the problem.
+Refer to the [dataset's dictionary](https://www1.nyc.gov/assets/tlc/downloads/pdf/data_dictionary_trip_records_yellow.pdf) for more information about the
 
 ## Rubric
 
diff --git a/4-Data-Science-Lifecycle/14-Introduction/notebook.ipynb b/4-Data-Science-Lifecycle/14-Introduction/notebook.ipynb
index cce66031..e4041090 100644
--- a/4-Data-Science-Lifecycle/14-Introduction/notebook.ipynb
+++ b/4-Data-Science-Lifecycle/14-Introduction/notebook.ipynb
@@ -1,20 +1,11 @@
 {
  "cells": [
-  {
-   "cell_type": "markdown",
-   "source": [
-    "Copyright (c) Microsoft Corporation. All rights reserved.\r\n",
-    "\r\n",
-    "Licensed under the MIT License."
-   ],
-   "metadata": {}
-  },
   {
    "cell_type": "markdown",
    "source": [
     "# Exploring NYC Taxi data in Winter and Summer\r\n",
     "\r\n",
-    "Refer to the [Data dictionary](https://www1.nyc.gov/assets/tlc/downloads/pdf/data_dictionary_trip_records_yellow.pdf) to explore the columns that have been provided.\r\n"
+    "Refer to the [Data dictionary](https://www1.nyc.gov/assets/tlc/downloads/pdf/data_dictionary_trip_records_yellow.pdf) to learn more about the columns that have been provided.\r\n"
    ],
    "metadata": {}
   },
@@ -37,12 +28,7 @@
     "import glob\r\n",
     "\r\n",
     "path = '../../data/taxi.csv'\r\n",
-    "# july_taxi = pd.read_csv(path.format('07'))\r\n",
-    "# january_taxi = pd.read_csv(path.format('01'))\r\n",
-    "\r\n",
-    "# df = pd.concat([july_taxi.sample(100), january_taxi.sample(100)])\r\n",
     "df = pd.read_csv(path)\r\n",
-    "# df.describe()\r\n",
     "print(df)\r\n"
    ],
    "outputs": [