@ -83,16 +83,16 @@ Open the _/working_ folder in this lesson and find the _notebook.ipynb_ file.
### Create training and testing datasets
Now your data is loaded, so you can separate it into train and test sets. You'll train your model on the train set. As usual, after the model has finished training, you'll evaluate its accuracy using the test set. You need to ensure that the test set covers a later period in time from the training set to ensure that the model does not gain information from future time periods.
Now your data is loaded, so you can separate it into train and test sets. You'll train your model on the train set. As usual, after the model has finished training, you'll evaluate its accuracy using the test set. You need to ensure that the test set covers a later period in time from the training set to ensure that the model does not gain information from future time periods.
1. Allocate a two-month period from September 1 to October 31, 2014 to the training set. The test set will include the two-month period of November 1 to December 31, 2014:
```python
train_start_dt = '2014-11-01 00:00:00'
test_start_dt = '2014-12-30 00:00:00'
test_start_dt = '2014-12-30 00:00:00'
```
Since this data reflects the daily consumption of energy, there is a strong seasonal pattern, but the consumption is most similar to the consumption in more recent days.
Since this data reflects the daily consumption of energy, there is a strong seasonal pattern, but the consumption is most similar to the consumption in more recent days.
1. Visualize the differences:
@ -120,11 +120,11 @@ Now, you need to prepare the data for training by performing filtering and scali
test = energy.copy()[energy.index >= test_start_dt][['load']]
print('Training data shape: ', train.shape)
print('Test data shape: ', test.shape)
```
You can see the shape of the data:
```output
@ -189,17 +189,17 @@ Now you need to follow several steps
print('Forecasting horizon:', HORIZON, 'hours')
```
Selecting the best values for an ARIMA model's parameters can be challenging as it's somewhat subjective and time intensive. You might consider using an `auto_arima()` function from the [`pyramid` library](https://alkaline-ml.com/pmdarima/0.9.0/modules/generated/pyramid.arima.auto_arima.html),
Selecting the best values for an ARIMA model's parameters can be challenging as it's somewhat subjective and time intensive. You might consider using an `auto_arima()` function from the [`pyramid` library](https://alkaline-ml.com/pmdarima/0.9.0/modules/generated/pyramid.arima.auto_arima.html),
1. For now try some manual selections to find a good model.
```python
order = (4, 1, 0)
seasonal_order = (1, 1, 0, 24)
model = SARIMAX(endog=train, order=order, seasonal_order=seasonal_order)
results = model.fit()
print(results.summary())
```
@ -223,10 +223,10 @@ Walk-forward validation is the gold standard of time series model evaluation and
Observe the hourly data's prediction, compared to the actual load. How accurate is this?
@ -311,10 +311,10 @@ Walk-forward validation is the gold standard of time series model evaluation and
Check the accuracy of your model by testing its mean absolute percentage error (MAPE) over all the predictions.
> **🧮 Show me the math**
> **🧮 Show me the math**
>
> ![MAPE](images/mape.png)
>
>
> [MAPE](https://www.linkedin.com/pulse/what-mape-mad-msd-time-series-allameh-statistics/) is used to show prediction accuracy as a ratio defined by the above formula. The difference between actual<sub>t</sub> and predicted<sub>t</sub> is divided by the actual<sub>t</sub>. "The absolute value in this calculation is summed for every forecasted point in time and divided by the number of fitted points n." [wikipedia](https://wikipedia.org/wiki/Mean_absolute_percentage_error)
1. Express equation in code:
@ -351,13 +351,13 @@ Check the accuracy of your model by testing its mean absolute percentage error (
@ -365,9 +365,9 @@ Check the accuracy of your model by testing its mean absolute percentage error (
x = plot_df['timestamp'][(t-1):]
y = plot_df['t+'+str(t)][0:len(x)]
ax.plot(x, y, color='blue', linewidth=4*math.pow(.9,t), alpha=math.pow(0.8,t))
ax.legend(loc='best')
plt.xlabel('timestamp', fontsize=12)
plt.ylabel('load', fontsize=12)
plt.show()
@ -389,6 +389,6 @@ Dig into the ways to test the accuracy of a Time Series Model. We touch on MAPE
This lesson touches on only the basics of Time Series Forecasting with ARIMA. Take some time to deepen your knowledge by digging into [this repository](https://microsoft.github.io/forecasting/) and its various model types to learn other ways to build Time Series models.