Merge pull request #50 from softchris/timeseries-intro

editorial
pull/73/head
chris 4 years ago committed by GitHub
commit 0d685849e1
No known key found for this signature in database
GPG Key ID: 4AEE18F83AFDEB23

@ -4,24 +4,29 @@
> Sketchnote by [Tomomi Imura](https://www.twitter.com/girlie_mac)
In this lesson and the following one, you will learn a bit about time series forecasting, an interesting and valuable part of a ML scientist's repertoire that is a bit lesser known than other topics. Time series forecasting is a sort of crystal ball: based on past performance of a variable such as price, you can predict its future potential value.
In this lesson and the following one, you will learn a bit about time series forecasting, an interesting and valuable part of an ML scientist's repertoire that is a bit less well known than other topics. Time series forecasting is a sort of 'crystal ball': based on past performance of a variable such as price, you can predict its future potential value.
[![Introduction to time series forecasting](https://img.youtube.com/vi/cBojo1hsHiI/0.jpg)](https://youtu.be/cBojo1hsHiI "Introduction to time series forecasting")
> 🎥 Click the image above for a video about time series forecasting
## [Pre-lecture quiz](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/41/)
It's a useful and interesting field with real value to business, given its direct application to problems of pricing, inventory, and supply chain issues. While deep learning techniques have started to be used to gain more insights in the prediction of future performance, time series forecasting remains a field greatly informed by classic ML techniques.
It's a useful and interesting field with real value to business, given its direct application to problems of pricing, inventory, and supply chain issues. While deep learning techniques have started to be used to gain more insights to better predict future performance, time series forecasting remains a field greatly informed by classic ML techniques.
> Penn State's useful time series curriculum can be found [here](https://online.stat.psu.edu/stat510/lesson/1)
### Introduction
## Introduction
Suppose you maintain an array of smart parking meters that provide data about how often they are used and for how long over time.
> What if you could predict, based on the meter's past performance, its future value according to the laws of supply and demand?
Supposing you maintain an array of smart parking meters that provide data about how often they are used and for how long over time. What if you could generate revenue to maintain your streets by slightly augmenting the prices of the meters when there is greater demand for them? What if you could predict, based on the meter's past performance, its future value according to the laws of supply and demand? This is a challenge that could be tackled by time series forecasting. It wouldn't make those folks in search of a rare parking spot in busy times very happy to have to pay more for it, but it would be a sure way to generate revenue to clean the streets!
Accurately predicting when to act so as to achieve your goal is a challenge that could be tackled by time series forecasting. It wouldn't make folks happy to be charged more in busy times when they're looking for a parking spot, but it would be a sure way to generate revenue to clean the streets!
Let's explore some of the types of time series algorithms and start a notebook to clean and prepare some data. The data you will analyze is taken from the GEFCom2014 forecasting competition. It consists of 3 years of hourly electricity load and temperature values between 2012 and 2014. Given the historical patterns of electricity load and temperature, you can predict future values of electricity load. In this example, you'll learn how to forecast one time step ahead, using historical load data only.
Let's explore some of the types of time series algorithms and start a notebook to clean and prepare some data. The data you will analyze is taken from the GEFCom2014 forecasting competition. It consists of 3 years of hourly electricity load and temperature values between 2012 and 2014. Given the historical patterns of electricity load and temperature, you can predict future values of electricity load.
Before starting, however, it's useful to understand what's going on behind the scenes.
In this example, you'll learn how to forecast one time step ahead, using historical load data only. Before starting, however, it's useful to understand what's going on behind the scenes.
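To make "forecasting one time step ahead, using historical load data only" concrete, here is a minimal sketch of a naive persistence baseline, which predicts that the next value equals the last observed one. This is not the model used in the lesson; the function name `persistence_forecast` is hypothetical, and the five values are the load readings shown by `energy.head()` in the exercise.

```python
# First five hourly load readings, as shown by energy.head() in the exercise.
load = [2698.0, 2558.0, 2444.0, 2402.0, 2403.0]


def persistence_forecast(history):
    """Hypothetical baseline: predict the next value as the last observed one."""
    return history[-1]


# Walk forward through the series, forecasting one step ahead each time
# from the history available up to that point.
forecasts = [persistence_forecast(load[:t]) for t in range(1, len(load))]
actuals = load[1:]

# Mean absolute error of the one-step-ahead forecasts.
mae = sum(abs(a - f) for a, f in zip(actuals, forecasts)) / len(actuals)
print(forecasts)
print(mae)
```

A baseline like this is a useful yardstick: a more elaborate model is only worth its complexity if it beats the persistence error.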
## Some definitions
@ -33,13 +38,13 @@ In mathematics, "a time series is a series of data points indexed (or listed or
🎓 **Time series analysis**
Time series analysis is the analysis of the above mentioned time series data. Time series data can take distinct forms, including 'interrupted time series' which detects patterns in a time series' evolution before and after an interrupting event. The type of analysis needed for the time series depends on the nature of the data. Time series data itself can take the form of series of numbers or characters.
Time series analysis is the analysis of the above-mentioned time series data. Time series data can take distinct forms, including 'interrupted time series', which detects patterns in a time series' evolution before and after an interrupting event. The type of analysis needed for the time series depends on the nature of the data. Time series data itself can take the form of a series of numbers or characters.
The analysis can be performed using a variety of methods, including frequency-domain and time-domain, linear and nonlinear, and more. [Learn more](https://www.itl.nist.gov/div898/handbook/pmc/section4/pmc4.htm) about the many ways to analyze this type of data.
The analysis to be performed uses a variety of methods, including frequency-domain and time-domain, linear and nonlinear, and more. [Learn more](https://www.itl.nist.gov/div898/handbook/pmc/section4/pmc4.htm) about the many ways to analyze this type of data.
🎓 **Time series forecasting**
Time series forecasting is the use of a model to predict future values based on patterns displayed by previously gathered data as it occurred in the past. While it is possible to use regression models to explore time series data, with time indices as x variables on a plot, this type of data is best analyzed using special types of models.
Time series forecasting is the use of a model to predict future values based on patterns displayed by previously gathered data as it occurred in the past. While it is possible to use regression models to explore time series data, with time indices as x variables on a plot, such data is best analyzed using special types of models.
Unlike the data typically analyzed with linear regression, time series data is a list of ordered observations. The most common such model is ARIMA, an acronym that stands for "Autoregressive Integrated Moving Average".
@ -68,19 +73,21 @@ In the next lesson, you will build an ARIMA model using [Univariate Time Series]
## Time Series [data characteristics](https://online.stat.psu.edu/stat510/lesson/1/1.1) to consider
When looking at time series data, you might notice that it has certain characteristics that you need to take into account and mitigate to better understand its patterns. If you consider time series data as potentially providing a 'signal' that you want to analyze, these characteristics can be thought of as 'noise'. You often will need to reduce this 'noise' by offsetting some of these characteristics using some statistical techniques.
When looking at time series data, you might notice that it has certain characteristics that you need to take into account and mitigate to better understand its patterns. If you consider time series data as potentially providing a 'signal' that you want to analyze, these characteristics can be thought of as 'noise'. You often will need to reduce this 'noise' by offsetting some of these characteristics using some statistical techniques.
Here are some concepts you should know to be able to work with time series:
🎓 **Trends**
Measurable increases and decreases over time. [Read more](https://machinelearningmastery.com/time-series-trends-in-python) about how to use and, if necessary, remove trends from your time series.
Trends are defined as measurable increases and decreases over time. [Read more](https://machinelearningmastery.com/time-series-trends-in-python) about how to use and, if necessary, remove trends from your time series.
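One common way to remove a trend is first-order differencing: replace each value with its change from the previous value. The sketch below uses made-up data and a hypothetical helper named `difference`, and is only meant to illustrate the idea.

```python
# Made-up readings: a steady upward trend (slope 2) plus a small alternating wiggle.
series = [10.0 + 2.0 * t + (1.0 if t % 2 == 0 else -1.0) for t in range(8)]


def difference(values, lag=1):
    """Hypothetical helper: return values[t] - values[t - lag] for every t >= lag."""
    return [values[t] - values[t - lag] for t in range(lag, len(values))]


detrended = difference(series)

# The original series climbs steadily; the differenced series no longer drifts
# upward and instead fluctuates around the slope of the removed trend.
print(series)
print(detrended)
```

Note that differencing shortens the series by `lag` observations, so the result has one fewer point here.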
🎓 **[Seasonality](https://machinelearningmastery.com/time-series-seasonality-with-python/)**
Periodic fluctuations, such as holiday rushes that might affect sales, for example. [Take a look](https://itl.nist.gov/div898/handbook/pmc/section4/pmc443.htm) at how different types of plots display seasonality in data.
Seasonality is defined as periodic fluctuations, such as holiday rushes that might affect sales, for example. [Take a look](https://itl.nist.gov/div898/handbook/pmc/section4/pmc443.htm) at how different types of plots display seasonality in data.
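A simple way to see seasonality is to average the observations that share the same position in the cycle. The sketch below assumes a cycle length of four (for example, quarters in a year) and uses invented sales figures; both are illustrative assumptions rather than data from this lesson.

```python
from statistics import mean

PERIOD = 4  # assumed cycle length, e.g. four quarters per year

# Invented quarterly sales over three years, with a rush in every fourth quarter.
sales = [20, 22, 21, 35, 21, 23, 22, 37, 22, 24, 23, 39]

# Average all observations that fall on the same position within the cycle.
seasonal_profile = [mean(sales[i::PERIOD]) for i in range(PERIOD)]

# Position in the cycle with the highest average (0-based).
peak_position = seasonal_profile.index(max(seasonal_profile))

print(seasonal_profile)
print(peak_position)
```

If the profile is roughly flat, the chosen period probably does not match a real seasonal pattern in the data.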
🎓 **Outliers**
Outliers are far away from the standard data variance.
Outliers are data points that lie far away from the rest of the data, well outside its standard variance.
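One rough way to flag outliers is a z-score check: measure how many standard deviations each point lies from the mean. In the sketch below the value `9999.0` is an injected, made-up spike, and the threshold of 2 is an assumption chosen for this tiny sample; real data usually calls for a more careful choice.

```python
from statistics import mean, stdev

# Load-like readings with one injected, made-up spike (9999.0).
readings = [2698.0, 2558.0, 2444.0, 2402.0, 2403.0, 2500.0, 9999.0]

THRESHOLD = 2.0  # assumed cut-off in standard deviations

mu = mean(readings)
sigma = stdev(readings)  # sample standard deviation

# Keep the points whose distance from the mean exceeds the threshold.
outliers = [x for x in readings if abs(x - mu) / sigma > THRESHOLD]

print(outliers)
```

Because a large spike inflates both the mean and the standard deviation, robust alternatives based on the median are often preferred on small samples.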
🎓 **Long-run cycle**
@ -97,76 +104,82 @@ The data might display an abrupt change that might need further analysis. The ab
✅ Here is a [sample time series plot](https://www.kaggle.com/kashnitsky/topic-9-part-1-time-series-analysis-in-python) showing daily in-game currency spent over a few years. Can you identify any of the characteristics listed above in this data?
![In-game currency spend](./images/currency.png)
## Exercise: Getting started with power usage data
Let's get started creating a time series model to predict future power usage given past usage.
## Exercise - getting started with power usage data
Let's get started creating a time series model to predict future power usage given past usage.
> The data in this example is taken from the GEFCom2014 forecasting competition. It consists of 3 years of hourly electricity load and temperature values between 2012 and 2014.
>
> Tao Hong, Pierre Pinson, Shu Fan, Hamidreza Zareipour, Alberto Troccoli and Rob J. Hyndman, "Probabilistic energy forecasting: Global Energy Forecasting Competition 2014 and beyond", International Journal of Forecasting, vol.32, no.3, pp 896-913, July-September, 2016.
1. In the `working` folder of this lesson, open the `notebook.ipynb` file. Start by adding libraries that will help you load and visualize data
1. In the `working` folder of this lesson, open the _notebook.ipynb_ file. Start by adding libraries that will help you load and visualize data:
```python
import os
import matplotlib.pyplot as plt
from common.utils import load_data
%matplotlib inline
```
Note, you are using the files from the included `common` folder which set up your environment and handle downloading the data.
```python
import os
import matplotlib.pyplot as plt
from common.utils import load_data
%matplotlib inline
```
2. Next, examine the data as a dataframe
Note, you are using the files from the included `common` folder which set up your environment and handle downloading the data.
```python
data_dir = './data'
energy = load_data(data_dir)[['load']]
energy.head()
```
You can see that there are two columns representing date and load:
2. Next, examine the data as a dataframe by calling `load_data()` and `head()`:
| | load |
| :-----------------: | :----: |
| 2012-01-01 00:00:00 | 2698.0 |
| 2012-01-01 01:00:00 | 2558.0 |
| 2012-01-01 02:00:00 | 2444.0 |
| 2012-01-01 03:00:00 | 2402.0 |
| 2012-01-01 04:00:00 | 2403.0 |
```python
data_dir = './data'
energy = load_data(data_dir)[['load']]
energy.head()
```
3. Now, plot the data:
You can see that there are two columns representing date and load:
```python
energy.plot(y='load', subplots=True, figsize=(15, 8), fontsize=12)
plt.xlabel('timestamp', fontsize=12)
plt.ylabel('load', fontsize=12)
plt.show()
```
![energy plot](images/energy-plot.png)
| | load |
| :-----------------: | :----: |
| 2012-01-01 00:00:00 | 2698.0 |
| 2012-01-01 01:00:00 | 2558.0 |
| 2012-01-01 02:00:00 | 2444.0 |
| 2012-01-01 03:00:00 | 2402.0 |
| 2012-01-01 04:00:00 | 2403.0 |
4. Now, plot the first week of July 2014
3. Now, plot the data by calling `plot()`:
```python
energy['2014-07-01':'2014-07-07'].plot(y='load', subplots=True, figsize=(15, 8), fontsize=12)
plt.xlabel('timestamp', fontsize=12)
plt.ylabel('load', fontsize=12)
plt.show()
```
```python
energy.plot(y='load', subplots=True, figsize=(15, 8), fontsize=12)
plt.xlabel('timestamp', fontsize=12)
plt.ylabel('load', fontsize=12)
plt.show()
```
![july](images/july-2014.png)
![energy plot](images/energy-plot.png)
A beautiful plot! Take a look at these plots and see if you can determine any of the characteristics listed above. What can we surmise just by visualizing the data?
4. Now, plot the first week of July 2014 by slicing `energy` with the `[from date]:[to date]` pattern:
```python
energy['2014-07-01':'2014-07-07'].plot(y='load', subplots=True, figsize=(15, 8), fontsize=12)
plt.xlabel('timestamp', fontsize=12)
plt.ylabel('load', fontsize=12)
plt.show()
```
![july](images/july-2014.png)
A beautiful plot! Take a look at these plots and see if you can determine any of the characteristics listed above. What can we surmise by visualizing the data?
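The date-range slicing used above relies on the dataframe having a datetime index. Here is a minimal sketch of that behavior on a synthetic stand-in; `energy_demo` and its values are invented for illustration and are not the GEFCom2014 data. It uses `.loc`, which makes the label-based row selection explicit.

```python
import pandas as pd

# Synthetic stand-in for the lesson's `energy` dataframe: one row per hour.
index = pd.date_range(
    "2014-06-30 00:00", "2014-07-08 23:00", freq=pd.Timedelta(hours=1)
)
energy_demo = pd.DataFrame({"load": list(range(len(index)))}, index=index)

# Date strings slice a datetime index by label. Both ends are inclusive,
# and the end date covers the whole of that day.
first_week = energy_demo.loc["2014-07-01":"2014-07-07"]

print(len(first_week))
print(first_week.index.min(), first_week.index.max())
```

Seven full days of hourly data give 168 rows, running from midnight on July 1 through 23:00 on July 7.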
In the next lesson, you will create an ARIMA model to create some forecasts.
---
## 🚀Challenge
Make a list of all the industries and areas of inquiry you can think of that would benefit from time series forecasting. Can you think of an application of these techniques in the arts? In Econometrics? Ecology? Retail? Industry? Finance? Where else?
## [Post-lecture quiz](https://jolly-sea-0a877260f.azurestaticapps.net/quiz/42/)
## Review & Self Study
Although we won't cover them here, neural networks are sometimes used to enhance classic methods of time series forecasting. Read more about them [in this article](https://medium.com/microsoftazure/neural-networks-for-forecasting-financial-and-economic-time-series-6aca370ff412).
## Assignment
## Assignment
[Visualize some more time series](assignment.md)

@ -1,4 +1,7 @@
# Introduction to time series forecasting
What is time series forecasting? It's about predicting future events by analyzing trends of the past.
## Regional topic: worldwide electricity usage ✨
In these two lessons, you will be introduced to time series forecasting, a somewhat lesser-known area of machine learning that is nevertheless extremely valuable for industry and business applications, among other fields. While neural networks can be used to enhance the utility of these models, we will study them in the context of classical machine learning, as models that help predict future performance based on the past.
@ -8,6 +11,7 @@ Our regional focus is electrical usage in the world, an interesting dataset to l
![electric grid](images/electric-grid.jpg)
Photo by <a href="https://unsplash.com/@shutter_log?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText">Peddi Sai hrithik</a> of electrical towers on a road in Rajasthan on <a href="https://unsplash.com/s/photos/electric-india?utm_source=unsplash&utm_medium=referral&utm_content=creditCopyText">Unsplash</a>
## Lessons
1. [Introduction to time series forecasting](1-Introduction/README.md)
