At this point you've probably come to the realization that data science is a process. This process can be broken down into 5 stages:
@@ -93,8 +91,7 @@ Explore the [Team Data Science Process lifecycle](https://docs.microsoft.com/en-
|> Photo by [Microsoft](https://docs.microsoft.com/en-us/azure/architecture/data-science-process/lifecycle)| > Photo by [Data Science Process Alliance](https://www.datascience-pm.com/crisp-dm-2/)
A client has approached your team for help investigating taxi customers' seasonal spending habits in New York City.
They want to know: **Do yellow taxi passengers in New York City tip drivers more in the winter or summer?**
Your team is in the [Capturing](Readme.md#Capturing) stage of the Data Science Lifecycle and you are in charge of exploring the dataset. You have been provided a notebook and data from Azure Open Datasets to explore and to assess whether the data can answer the client's question. You have decided to select a small sample of one summer month and one winter month from 2019.
Your team is in the [Capturing](Readme.md#Capturing) stage of the Data Science Lifecycle and you are in charge of handling the dataset. You have been provided a notebook and [data](../../data/taxi.csv) to explore.
## Instructions
In this directory is a [notebook](notebook.ipynb) that uses Python to load yellow taxi trip data from the [NYC Taxi & Limousine Commission](https://docs.microsoft.com/en-us/azure/open-datasets/dataset-taxi-yellow?tabs=azureml-opendatasets).
You can also open the taxi data file in a text editor or in spreadsheet software such as Excel.
In this directory is a [notebook](notebook.ipynb) that uses Python to load yellow taxi trip data from the [NYC Taxi & Limousine Commission](https://docs.microsoft.com/en-us/azure/open-datasets/dataset-taxi-yellow?tabs=azureml-opendatasets) for the months of January and July 2019. These datasets have been joined together in a Pandas dataframe.
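
As a rough illustration of what "joined together in a Pandas dataframe" can mean, here is a minimal sketch that stacks two monthly frames with `pd.concat`. The rows are made up, the column names `tpep_pickup_datetime` and `tip_amount` are assumptions based on the TLC data dictionary, and the notebook itself may join the months differently.

```python
import pandas as pd

# Made-up rows for illustration only; the real notebook loads full months.
# Column names are assumed and may differ from those in the notebook.
january = pd.DataFrame({
    "tpep_pickup_datetime": pd.to_datetime(["2019-01-05 08:15", "2019-01-19 22:30"]),
    "tip_amount": [1.5, 2.0],
})
july = pd.DataFrame({
    "tpep_pickup_datetime": pd.to_datetime(["2019-07-12 18:40"]),
    "tip_amount": [3.0],
})

# Stack the rows of both months; ignore_index=True renumbers the rows 0..n-1
# so the joined frame does not carry duplicate index labels.
joined = pd.concat([january, july], ignore_index=True)
print(joined)
```

Both frames need the same columns for the result to stay tidy; columns missing from one month would be filled with `NaN`.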
## Instructions
Your task is to identify the columns most likely to be needed to answer this question, then reorganize the joined dataset so that these columns are displayed first.
- Assess whether or not the data in this dataset can help answer the question.
- Write 3 questions that you would ask the client for more clarification and better understanding of the problem.
Finally, write 3 questions that you would ask the client for more clarification and better understanding of the problem.
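
The column reordering described in the task could look something like the sketch below. The helper `move_columns_first` is hypothetical, the sample rows are invented, and the column names (and the choice of which ones count as relevant) are assumptions for illustration; check them against the columns actually present in the notebook's dataframe.

```python
import pandas as pd

def move_columns_first(df, first):
    """Return df with the columns in `first` moved to the front.

    The remaining columns keep their original relative order.
    (Hypothetical helper, not part of the provided notebook.)
    """
    missing = [c for c in first if c not in df.columns]
    if missing:
        raise KeyError(f"columns not found: {missing}")
    rest = [c for c in df.columns if c not in first]
    return df[first + rest]

# Invented sample rows; column names are assumed, not taken from the notebook.
sample = pd.DataFrame({
    "vendor_id": [1, 2, 1],
    "tpep_pickup_datetime": pd.to_datetime(
        ["2019-01-05 08:15", "2019-07-12 18:40", "2019-07-20 23:05"]
    ),
    "passenger_count": [1, 2, 1],
    "payment_type": [1, 1, 2],
    "fare_amount": [7.0, 14.5, 5.5],
    "tip_amount": [1.5, 3.0, 0.0],
})

# One possible guess at the columns relevant to a tipping question.
priority = ["tpep_pickup_datetime", "tip_amount", "fare_amount", "payment_type"]
reordered = move_columns_first(sample, priority)
print(list(reordered.columns))
```

Selecting with a list of column names returns a new dataframe, so the original column order in `sample` is left unchanged.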
Refer to the [dataset's dictionary](https://www1.nyc.gov/assets/tlc/downloads/pdf/data_dictionary_trip_records_yellow.pdf) for more information about the
"Copyright (c) Microsoft Corporation. All rights reserved.\r\n",
"\r\n",
"Licensed under the MIT License."
],
"metadata": {}
},
{
{
"cell_type": "markdown",
"cell_type": "markdown",
"source": [
"source": [
"# Exploring NYC Taxi data in Winter and Summer\r\n",
"# Exploring NYC Taxi data in Winter and Summer\r\n",
"\r\n",
"\r\n",
"Refer to the [Data dictionary](https://www1.nyc.gov/assets/tlc/downloads/pdf/data_dictionary_trip_records_yellow.pdf) to explore the columns that have been provided.\r\n"
"Refer to the [Data dictionary](https://www1.nyc.gov/assets/tlc/downloads/pdf/data_dictionary_trip_records_yellow.pdf) to learn more about the columns that have been provided.\r\n"