update assignment

pull/117/head
Jasmine 4 years ago
parent b0fa4305d2
commit cf4ada2020

@@ -4,8 +4,6 @@
|:---:| |:---:|
| Introduction to the Data Science Lifecycle - _Sketchnote by [@nitya](https://twitter.com/nitya)_ | | Introduction to the Data Science Lifecycle - _Sketchnote by [@nitya](https://twitter.com/nitya)_ |
## Pre-Lecture Quiz
## [Pre-Lecture Quiz](https://red-water-0103e7a0f.azurestaticapps.net/quiz/26) ## [Pre-Lecture Quiz](https://red-water-0103e7a0f.azurestaticapps.net/quiz/26)
At this point you've probably come to the realization that data science is a process. This process can be broken down into 5 stages: At this point you've probably come to the realization that data science is a process. This process can be broken down into 5 stages:
@@ -93,8 +91,7 @@ Explore the [Team Data Science Process lifecycle](https://docs.microsoft.com/en-
|![](..\images\tdsp-lifecycle2.png)> Photo by [Microsoft](https://docs.microsoft.com/en-us/azure/architecture/data-science-process/lifecycle)| ![](..\images\CRISP-DM.png)> Photo by [Data Science Process Alliance](https://www.datascience-pm.com/crisp-dm-2/) |![](..\images\tdsp-lifecycle2.png)> Photo by [Microsoft](https://docs.microsoft.com/en-us/azure/architecture/data-science-process/lifecycle)| ![](..\images\CRISP-DM.png)> Photo by [Data Science Process Alliance](https://www.datascience-pm.com/crisp-dm-2/)
## Post-Lecture Quiz ## [Post-Lecture Quiz](https://red-water-0103e7a0f.azurestaticapps.net/quiz/27)
[Post-lecture quiz]()
## Review & Self Study ## Review & Self Study
@@ -105,4 +102,4 @@ Applying the Data Science Lifecycle involves multiple roles and tasks, where som
## Assignment ## Assignment
[Exploring and Assessing a Dataset](assignment.md) [Assessing a Dataset](assignment.md)

@@ -1,18 +1,20 @@
# Exploring and Assessing a Dataset # Assessing a Dataset
A client has approached your team for help in investigating a taxi customer's seasonal spending habits in New York City. A client has approached your team for help in investigating a taxi customer's seasonal spending habits in New York City.
They want to know: **Do yellow taxi passengers in New York City tip drivers more in the winter or summer?** They want to know: **Do yellow taxi passengers in New York City tip drivers more in the winter or summer?**
Your team is in the [Capturing](Readme.md#Capturing) stage of the Data Science Lifecycle and you are in charge of exploring the dataset. You have been provided a notebook and data from Azure Open Datasets to explore and assess if the data can answer the client's question. You have decided to select a small sample of 1 summer month and 1 winter month in the year 2019. Your team is in the [Capturing](Readme.md#Capturing) stage of the Data Science Lifecycle and you are in charge of handling the dataset. You have been provided a notebook and [data](../../data/taxi.csv) to explore.
## Instructions In this directory is a [notebook](notebook.ipynb) that uses Python to load yellow taxi trip data from the [NYC Taxi & Limousine Commission](https://docs.microsoft.com/en-us/azure/open-datasets/dataset-taxi-yellow?tabs=azureml-opendatasets).
You can also open the taxi data file in a text editor or spreadsheet software like Excel.
In this directory is a [notebook](notebook.ipynb) that uses Python to load yellow taxi trip data from the [NYC Taxi & Limousine Commission](https://docs.microsoft.com/en-us/azure/open-datasets/dataset-taxi-yellow?tabs=azureml-opendatasets) for the months of January and July 2019. These datasets have been joined together in a Pandas dataframe. ## Instructions
Your task is to identify the columns that are the most likely required to answer this question, then reorganize the joined dataset so that these columns are displayed first. - Assess whether or not the data in this dataset can help answer the question.
- Write 3 questions that you would ask the client for more clarification and better understanding of the problem.
Finally, write 3 questions that you would ask the client for more clarification and better understanding of the problem. Refer to the [dataset's dictionary](https://www1.nyc.gov/assets/tlc/downloads/pdf/data_dictionary_trip_records_yellow.pdf) for more information about the
## Rubric ## Rubric
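One way to approach the first instruction is to check whether tips can be compared by season at all. The following is a minimal sketch, not the notebook's own code. It assumes the column names `tpep_pickup_datetime`, `payment_type`, and `tip_amount` as listed in the TLC data dictionary, and it uses a tiny made-up sample in place of `../../data/taxi.csv`. The helpers `season_of` and `tip_summary` are hypothetical names introduced only for illustration.

```python
import io

import pandas as pd

# Made-up rows standing in for ../../data/taxi.csv (assumption: the real file
# uses these TLC data dictionary column names). Values are illustrative only.
SAMPLE_CSV = """tpep_pickup_datetime,payment_type,fare_amount,tip_amount
2019-01-05 08:15:00,1,10.0,2.0
2019-01-12 18:40:00,2,8.0,0.0
2019-01-20 23:05:00,1,20.0,4.0
2019-07-03 09:30:00,1,12.0,3.0
2019-07-14 14:10:00,1,6.0,1.0
2019-07-28 21:45:00,2,15.0,0.0
"""


def season_of(month: int) -> str:
    """Map a calendar month to a coarse season label (hypothetical helper)."""
    if month in (12, 1, 2):
        return "winter"
    if month in (6, 7, 8):
        return "summer"
    return "other"


def tip_summary(df: pd.DataFrame) -> pd.DataFrame:
    """Return the trip count and mean tip per season."""
    pickup = pd.to_datetime(df["tpep_pickup_datetime"])
    labelled = df.assign(season=pickup.dt.month.map(season_of))
    return labelled.groupby("season")["tip_amount"].agg(["count", "mean"])


df = pd.read_csv(io.StringIO(SAMPLE_CSV))
summary = tip_summary(df)
print(summary)
```

To try this on the provided file, the sample could be swapped for `pd.read_csv('../../data/taxi.csv')`, provided the column names match. According to the data dictionary, `tip_amount` is populated for credit card tips and does not include cash tips, so a seasonal comparison built on it may undercount tipping. That gap is a natural source for one of the clarifying questions to the client.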

@@ -1,20 +1,11 @@
{ {
"cells": [ "cells": [
{
"cell_type": "markdown",
"source": [
"Copyright (c) Microsoft Corporation. All rights reserved.\r\n",
"\r\n",
"Licensed under the MIT License."
],
"metadata": {}
},
{ {
"cell_type": "markdown", "cell_type": "markdown",
"source": [ "source": [
"# Exploring NYC Taxi data in Winter and Summer\r\n", "# Exploring NYC Taxi data in Winter and Summer\r\n",
"\r\n", "\r\n",
"Refer to the [Data dictionary](https://www1.nyc.gov/assets/tlc/downloads/pdf/data_dictionary_trip_records_yellow.pdf) to explore the columns that have been provided.\r\n" "Refer to the [Data dictionary](https://www1.nyc.gov/assets/tlc/downloads/pdf/data_dictionary_trip_records_yellow.pdf) to learn more about the columns that have been provided.\r\n"
], ],
"metadata": {} "metadata": {}
}, },
@@ -37,12 +28,7 @@
"import glob\r\n", "import glob\r\n",
"\r\n", "\r\n",
"path = '../../data/taxi.csv'\r\n", "path = '../../data/taxi.csv'\r\n",
"# july_taxi = pd.read_csv(path.format('07'))\r\n",
"# january_taxi = pd.read_csv(path.format('01'))\r\n",
"\r\n",
"# df = pd.concat([july_taxi.sample(100), january_taxi.sample(100)])\r\n",
"df = pd.read_csv(path)\r\n", "df = pd.read_csv(path)\r\n",
"# df.describe()\r\n",
"print(df)\r\n" "print(df)\r\n"
], ],
"outputs": [ "outputs": [
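Since the updated cell only loads the CSV and prints it, a quick structural check can help when assessing the data. This is a rough sketch under stated assumptions: `assess_columns` is a hypothetical helper, the column names come from the TLC data dictionary rather than from the file itself, and the small DataFrame is made up so the example runs without `taxi.csv`.

```python
import pandas as pd


def assess_columns(df: pd.DataFrame, required: list) -> dict:
    """Report which required columns are absent and how many nulls the rest hold."""
    missing = [col for col in required if col not in df.columns]
    present = [col for col in required if col in df.columns]
    null_counts = df[present].isna().sum().to_dict()
    return {"missing_columns": missing, "null_counts": null_counts}


# Made-up frame standing in for the result of pd.read_csv(path).
df = pd.DataFrame(
    {
        "tpep_pickup_datetime": ["2019-01-05 08:15:00", "2019-07-03 09:30:00", None],
        "tip_amount": [2.0, None, 1.5],
    }
)

# Columns assumed to matter for the winter-versus-summer tipping question.
report = assess_columns(df, ["tpep_pickup_datetime", "tip_amount", "payment_type"])
print(report)
```

In the notebook, the same call could be made on the `df` returned by `pd.read_csv(path)`. A missing column or a high null count for the tip or pickup time fields would suggest the data cannot answer the question as posed.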
