# Exploring and Assessing a Dataset

A client has approached your team for help investigating taxi customers' seasonal spending habits in New York City.

They want to know: **Do yellow taxi passengers in New York City tip drivers more in the winter or summer?**

Your team is in the [Capturing](Readme.md#Capturing) stage of the Data Science Lifecycle, and you are in charge of exploring the dataset. You have been provided a notebook and data from Azure Open Datasets to explore and to assess whether the data can answer the client's question. You have decided to select a small sample of one summer month and one winter month from 2019.

## Instructions

In this directory is a [notebook](notebook.ipynb) that uses Python to load yellow taxi trip data from the [NYC Taxi & Limousine Commission](https://docs.microsoft.com/en-us/azure/open-datasets/dataset-taxi-yellow?tabs=azureml-opendatasets) for January and July 2019. These datasets have been joined together in a pandas DataFrame.

Your task is to identify the columns most likely required to answer the client's question, then reorganize the joined dataset so that these columns are displayed first.

Finally, write three questions that you would ask the client to clarify and better understand the problem.

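The column reordering step described in the instructions above could be approached along these lines. This is only a minimal sketch: the helper `prioritize_columns` and the column names in the example are hypothetical, chosen for illustration, and the actual columns in the notebook's joined dataset should be checked before reusing any of it.

```python
from typing import Iterable, List


def prioritize_columns(all_columns: Iterable[str], key_columns: Iterable[str]) -> List[str]:
    """Return all_columns reordered so that key_columns come first.

    Key columns that do not exist in all_columns are skipped, and the
    remaining columns keep their original relative order.
    """
    all_columns = list(all_columns)
    present = set(all_columns)

    # Keep only the requested columns that exist, in the requested order,
    # ignoring duplicates.
    front: List[str] = []
    for name in key_columns:
        if name in present and name not in front:
            front.append(name)

    # Everything else follows in its original order.
    rest = [name for name in all_columns if name not in front]
    return front + rest


# Hypothetical column names, used only for illustration.
columns = ["vendorID", "tpepPickupDateTime", "passengerCount", "tipAmount", "totalAmount"]
key_columns = ["tipAmount", "tpepPickupDateTime", "notAColumn"]

new_order = prioritize_columns(columns, key_columns)
print(new_order)
```

With a pandas DataFrame, here assumed to be named `df`, the resulting list could be applied as `df = df[prioritize_columns(df.columns, key_columns)]`, which selects every column in the new order without dropping any.
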
## Rubric

Exemplary | Adequate | Needs Improvement
--- | --- | ---