"# Exploring NYC Taxi data in Winter and Summer\r\n",
"\r\n",
"Refer to the [Data dictionary](https://www1.nyc.gov/assets/tlc/downloads/pdf/data_dictionary_trip_records_yellow.pdf) to explore the columns that have been provided.\r\n"
A client has approached your team for help in investigating taxi customers' seasonal spending habits in New York City.
This assignment continues the process of the Data Science Lifecycle.
They want to know: **Do yellow taxi passengers in New York City tip drivers more in the winter or summer?**
Your team is in the [Capturing](Readme.md#Capturing) stage of the Data Science Lifecycle and you are in charge of exploring the data. You have been provided a notebook and data from Azure Open Datasets to explore. You have decided to begin by exploring taxi data in the year 2019. For summer you choose June, July, and August and for winter you choose January, February, and December.
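The seasonal split described above can be written down as a small lookup. The following is only a minimal sketch: the column names `tpepPickupDateTime` and `tipAmount` are assumptions about the dataset schema, the helper `add_season` is hypothetical, and the sample rows are made up for illustration.

```python
import pandas as pd

# Month-to-season mapping taken from the assignment text.
SEASON_BY_MONTH = {
    6: "summer", 7: "summer", 8: "summer",
    1: "winter", 2: "winter", 12: "winter",
}

def add_season(df, pickup_col="tpepPickupDateTime"):
    """Return a copy of df with a 'season' column derived from the pickup month.

    pickup_col is an assumed column name; adjust it to match the notebook's data.
    Months outside the chosen six are left as missing values.
    """
    out = df.copy()
    months = pd.to_datetime(out[pickup_col]).dt.month
    out["season"] = months.map(SEASON_BY_MONTH)
    return out

# Made-up sample rows, only to show the shape of the result.
sample = pd.DataFrame({
    "tpepPickupDateTime": ["2019-01-15 08:30:00", "2019-07-04 21:10:00", "2019-04-01 12:00:00"],
    "tipAmount": [2.0, 3.5, 1.0],
})
labeled = add_season(sample)
```

Rows from months outside the six chosen ones get a missing `season` value, so they can be dropped or inspected separately.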
## Instructions
This directory contains a [notebook](notebook.ipynb) that uses Python to load six months of yellow taxi trip data from the [NYC Taxi & Limousine Commission](https://docs.microsoft.com/en-us/azure/open-datasets/dataset-taxi-yellow?tabs=azureml-opendatasets) and Integrated Surface Data from NOAA. These datasets have been joined in a pandas DataFrame.
Your task is to identify the columns most likely needed to answer this question, then reorganize the joined dataset so that these columns are displayed first.
Finally, write three questions that you would ask the client to clarify and better understand the problem.
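The column-reordering step could be sketched as follows. This is one possible approach, not the notebook's own code: the helper `move_columns_first` is hypothetical, and the column names (`tipAmount`, `fareAmount`, `tpepPickupDateTime`, `vendorID`, `temperature`) are assumptions standing in for whatever the joined DataFrame actually contains.

```python
import pandas as pd

def move_columns_first(df, first):
    """Return df with the columns named in `first` moved to the front.

    Names in `first` that are not in df are ignored; the remaining
    columns keep their original relative order.
    """
    present = [c for c in first if c in df.columns]
    rest = [c for c in df.columns if c not in present]
    return df[present + rest]

# Made-up stand-in for the joined taxi and weather DataFrame.
joined = pd.DataFrame({
    "vendorID": [1, 2],
    "tpepPickupDateTime": pd.to_datetime(["2019-01-15", "2019-07-04"]),
    "fareAmount": [10.0, 12.5],
    "temperature": [-1.0, 29.0],
    "tipAmount": [2.0, 3.5],
})

# Candidate columns for the tipping question (an assumption, not a final answer).
candidates = ["tipAmount", "fareAmount", "tpepPickupDateTime", "not_a_column"]
reordered = move_columns_first(joined, candidates)
```

Selecting with a reordered list of names leaves the data untouched and changes only the display order, so the step is easy to repeat as the list of candidate columns changes.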