Multistep Time Series Forecasting with LSTMs in Python (Translation)

I translated this post for my own study. It may contain many mistranslations. (Corrections and feedback are welcome.)

- Source

machinelearningmastery.com/multi-step-time-series-forecasting-long-short-term-memory-networks-python/

Multistep Time Series Forecasting with LSTMs in Python

The Long Short-Term Memory network or LSTM is a recurrent neural network that can learn and forecast long sequences. A benefit of LSTMs in addition to learning long sequences is that they can learn to make a one-shot multi-step forecast which may be useful


Shampoo Sales Dataset

The dataset describes the monthly number of shampoo sales over a 3-year period.

The units are a sales count and there are 36 observations. The original dataset is credited to Makridakis, Wheelwright, and Hyndman (1998).

Download the dataset.

The example below loads the dataset and creates a plot of it.

# load and plot dataset
from pandas import read_csv
from datetime import datetime
from matplotlib import pyplot
# load dataset
def parser(x):
	return datetime.strptime('190'+x, '%Y-%m')
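# squeeze the single data column into a Series (the squeeze argument of read_csv was removed in pandas 2.0)
series = read_csv('shampoo-sales.csv', header=0, parse_dates=[0], index_col=0, date_parser=parser).squeeze('columns')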
# summarize first few rows
print(series.head())
# line plot
series.plot()
pyplot.show()

Running the example loads the dataset as a Pandas Series and prints the first 5 rows.

Month
1901-01-01    266.0
1901-02-01    145.9
1901-03-01    183.1
1901-04-01    119.3
1901-05-01    180.3
Name: Sales, dtype: float64

A line plot of the series is then created, showing a clear increasing trend.

Line Plot of Shampoo Sales Dataset

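Next, we will take a look at the model configuration and test harness used in the experiment.

(Translator's note, test harness: code and data created to support testing, forming part of the environment in which a system or its components are tested.)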

Data Preparation and Model Evaluation

This section describes the data preparation and model evaluation used in this tutorial.

Data Split

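We will split the Shampoo Sales dataset into two parts: a training set and a test set.

The first two years of data will be taken for the training dataset and the remaining one year of data will be used for the test set.

Models will be developed using the training dataset and will make predictions on the test dataset.

For reference, the last 12 months of observations are as follows: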

"3-01",339.7
"3-02",440.4
"3-03",315.9
"3-04",439.3
"3-05",401.3
"3-06",437.4
"3-07",575.5
"3-08",407.6
"3-09",682.0
"3-10",475.3
"3-11",581.3
"3-12",646.9

Multi-Step Forecast

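We will contrive a multi-step forecast.

For a given month in the final 12 months of the dataset, we will be required to make a 3-month forecast.

That is, given historical observations (t-1, t-2, ... t-n), forecast t, t+1 and t+2.

Specifically, from December in year 2, we must forecast January, February and March. From January, we must forecast February, March and April. This continues all the way to an October, November, December forecast from September in year 3.

A total of 10 3-month forecasts are required, as follows: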

Dec,	Jan, Feb, Mar
Jan,	Feb, Mar, Apr
Feb,	Mar, Apr, May
Mar,	Apr, May, Jun
Apr,	May, Jun, Jul
May,	Jun, Jul, Aug
Jun,	Jul, Aug, Sep
Jul,	Aug, Sep, Oct
Aug,	Sep, Oct, Nov
Sep,	Oct, Nov, Dec

Model Evaluation

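A rolling-forecast scenario will be used, also called walk-forward model validation.

Each time step of the test dataset will be walked one at a time.

A model will be used to make a forecast for the time step, then the actual expected value for the next month from the test set will be taken and made available to the model for the forecast on the next time step.

This mimics a real-world scenario where new Shampoo Sales observations would be available each month and used in the forecasting of the following month. This will be simulated by the structure of the train and test datasets.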
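
The walk-forward loop described above can be sketched in a few lines. This is only an illustration: the observations are made up, and a one-step persistence forecast stands in for whatever model is being evaluated.

```python
# made-up monthly observations (illustrative values, not the Shampoo data)
history = [100.0, 110.0, 105.0]
test = [120.0, 115.0, 130.0]

predictions = []
for observed in test:
    # forecast the next step from the data seen so far
    # (persistence is used here as a stand-in model)
    predictions.append(history[-1])
    # the real observation then becomes available for the next forecast
    history.append(observed)

print(predictions)
```

Each forecast only ever sees observations that precede it, which is the property that makes the evaluation mimic real use.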

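All forecasts on the test dataset will be collected and an error score calculated to summarize the skill of the model for each of the forecast time steps.

The root mean squared error (RMSE) will be used as it punishes large errors and results in a score that is in the same units as the forecast data, namely monthly shampoo sales.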
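
To make the per-step scoring concrete, here is a minimal sketch that computes one RMSE per forecast horizon using only the standard library. The values and the helper name rmse_per_step() are made up for illustration and are not part of the tutorial code.

```python
from math import sqrt

# hypothetical actual values and 3-step forecasts for two forecast origins
actual = [[10.0, 12.0, 14.0],
          [12.0, 14.0, 16.0]]
forecasts = [[11.0, 12.0, 12.0],
             [13.0, 14.0, 14.0]]

def rmse_per_step(actual, forecasts):
    # one RMSE per forecast horizon (t+1, t+2, ...)
    scores = []
    for step in range(len(actual[0])):
        squared = [(a[step] - f[step]) ** 2 for a, f in zip(actual, forecasts)]
        scores.append(sqrt(sum(squared) / len(squared)))
    return scores

scores = rmse_per_step(actual, forecasts)
for step, score in enumerate(scores):
    print('t+%d RMSE: %f' % (step + 1, score))
```

Because the errors are squared before averaging, the two-unit misses at t+3 weigh four times as much as the one-unit misses at t+1.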

Persistence Model

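A good baseline for time series forecasting is the persistence model.

This is a forecasting model where the last observation is persisted forward.

Because of its simplicity, it is often called the naive forecast.

You can learn more about the persistence model for time series forecasting in the post: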

How to Make Baseline Predictions for Time Series Forecasting with Python

Prepare Data

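The first step is to transform the data from a series into a supervised learning problem.

That is, to go from a list of numbers to a list of input and output patterns.

We can achieve this using a pre-prepared function called series_to_supervised().

For more on the series_to_supervised() function, see the post: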

How to Convert a Time Series to a Supervised Learning Problem in Python

The function is listed below.

# convert time series into supervised learning problem
def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
	n_vars = 1 if type(data) is list else data.shape[1]
	df = DataFrame(data)
	cols, names = list(), list()
	# input sequence (t-n, ... t-1)
	for i in range(n_in, 0, -1):
		cols.append(df.shift(i))
		names += [('var%d(t-%d)' % (j+1, i)) for j in range(n_vars)]
	# forecast sequence (t, t+1, ... t+n)
	for i in range(0, n_out):
		cols.append(df.shift(-i))
		if i == 0:
			names += [('var%d(t)' % (j+1)) for j in range(n_vars)]
		else:
			names += [('var%d(t+%d)' % (j+1, i)) for j in range(n_vars)]
	# put it all together
	agg = concat(cols, axis=1)
	agg.columns = names
	# drop rows with NaN values
	if dropnan:
		agg.dropna(inplace=True)
	return agg

The function can be called by passing in the loaded series values, an n_in value of 1 and an n_out value of 3.

For example:

supervised = series_to_supervised(raw_values, 1, 3)
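
To see what this framing produces, the sketch below applies the same shift-based windowing to a short, made-up series (the values 10 to 60 are illustrative only). It assumes pandas is installed and hard-codes one lag and three outputs rather than calling series_to_supervised().

```python
from pandas import DataFrame, concat

# a made-up series of six observations (illustrative values only)
values = [10, 20, 30, 40, 50, 60]
df = DataFrame(values)

# one lag input (t-1) and three outputs (t, t+1, t+2), as with n_in=1, n_out=3
cols = [df.shift(1), df.shift(0), df.shift(-1), df.shift(-2)]
framed = concat(cols, axis=1)
framed.columns = ['var1(t-1)', 'var1(t)', 'var1(t+1)', 'var1(t+2)']

# rows with missing values at the edges are dropped
framed = framed.dropna()
print(framed)
```

Six observations yield three complete rows: one row is lost at the start for the lag and two at the end for the forecast steps.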

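Next, we can split the supervised learning dataset into training and test sets.

We know that in this form, the last 10 rows contain data for the final year. These rows comprise the test set and the rest of the data makes up the training dataset.

We can put all of this together in a new function that takes the loaded series and some parameters and returns a train and test set ready for modeling.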

# transform series into train and test sets for supervised learning
def prepare_data(series, n_test, n_lag, n_seq):
	# extract raw values
	raw_values = series.values
	raw_values = raw_values.reshape(len(raw_values), 1)
	# transform into supervised learning problem X, y
	supervised = series_to_supervised(raw_values, n_lag, n_seq)
	supervised_values = supervised.values
	# split into train and test sets
	train, test = supervised_values[0:-n_test], supervised_values[-n_test:]
	return train, test

We can test this with the Shampoo dataset. The complete example is listed below.

from pandas import DataFrame
from pandas import concat
from pandas import read_csv
from datetime import datetime

# date-time parsing function for loading the dataset
def parser(x):
	return datetime.strptime('190'+x, '%Y-%m')

# convert time series into supervised learning problem
def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
	n_vars = 1 if type(data) is list else data.shape[1]
	df = DataFrame(data)
	cols, names = list(), list()
	# input sequence (t-n, ... t-1)
	for i in range(n_in, 0, -1):
		cols.append(df.shift(i))
		names += [('var%d(t-%d)' % (j+1, i)) for j in range(n_vars)]
	# forecast sequence (t, t+1, ... t+n)
	for i in range(0, n_out):
		cols.append(df.shift(-i))
		if i == 0:
			names += [('var%d(t)' % (j+1)) for j in range(n_vars)]
		else:
			names += [('var%d(t+%d)' % (j+1, i)) for j in range(n_vars)]
	# put it all together
	agg = concat(cols, axis=1)
	agg.columns = names
	# drop rows with NaN values
	if dropnan:
		agg.dropna(inplace=True)
	return agg

# transform series into train and test sets for supervised learning
def prepare_data(series, n_test, n_lag, n_seq):
	# extract raw values
	raw_values = series.values
	raw_values = raw_values.reshape(len(raw_values), 1)
	# transform into supervised learning problem X, y
	supervised = series_to_supervised(raw_values, n_lag, n_seq)
	supervised_values = supervised.values
	# split into train and test sets
	train, test = supervised_values[0:-n_test], supervised_values[-n_test:]
	return train, test

# load dataset
series = read_csv('shampoo-sales.csv', header=0, parse_dates=[0], index_col=0, date_parser=parser).squeeze('columns')
# configure
n_lag = 1
n_seq = 3
n_test = 10
# prepare data
train, test = prepare_data(series, n_test, n_lag, n_seq)
print(test)
print('Train: %s, Test: %s' % (train.shape, test.shape))

Running the example first prints the entire test dataset, which is the last 10 rows. The shape and size of the train and test datasets are also printed.

[[ 342.3  339.7  440.4  315.9]
 [ 339.7  440.4  315.9  439.3]
 [ 440.4  315.9  439.3  401.3]
 [ 315.9  439.3  401.3  437.4]
 [ 439.3  401.3  437.4  575.5]
 [ 401.3  437.4  575.5  407.6]
 [ 437.4  575.5  407.6  682. ]
 [ 575.5  407.6  682.   475.3]
 [ 407.6  682.   475.3  581.3]
 [ 682.   475.3  581.3  646.9]]
Train: (23, 4), Test: (10, 4)

We can see that the single input value (first column) on the first row of the test dataset matches the observation of shampoo sales for December in the 2nd year:

"2-12",342.3

We can also see that each row contains 4 columns: 1 input value and 3 output values.

Make Forecasts

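The next step is to make persistence forecasts.

We can implement the persistence forecast easily in a function named persistence() that takes the last observation and the number of forecast steps to persist.

This function returns an array containing the forecast.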

# make a persistence forecast
def persistence(last_ob, n_seq):
	return [last_ob for i in range(n_seq)]

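We can then call this function for each time step in the test dataset, from December in year 2 to September in year 3.

Below is a function make_forecasts() that does this. It takes the train and test datasets and the configuration for the dataset as arguments and returns a list of forecasts.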

# evaluate the persistence model
def make_forecasts(train, test, n_lag, n_seq):
	forecasts = list()
	for i in range(len(test)):
		X, y = test[i, 0:n_lag], test[i, n_lag:]
		# make forecast
		forecast = persistence(X[-1], n_seq)
		# store the forecast
		forecasts.append(forecast)
	return forecasts

We can call this function as follows:

forecasts = make_forecasts(train, test, 1, 3)
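
As a quick worked example, the sketch below applies the same persistence logic to the first test row printed earlier (342.3 as the input, followed by the three expected values). The errors list is an extra added here for illustration and is not part of the tutorial code.

```python
# first row of the test set printed above: one input followed by three outputs
row = [342.3, 339.7, 440.4, 315.9]
n_lag, n_seq = 1, 3

X, y = row[0:n_lag], row[n_lag:]
# persist the last observed input across all forecast steps
forecast = [X[-1] for _ in range(n_seq)]
# difference between each expected value and the persisted value
errors = [round(actual - predicted, 1) for actual, predicted in zip(y, forecast)]

print(forecast)
print(errors)
```

The forecast is flat by construction, so its error at each step is simply how far sales moved away from the December value.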

Evaluate Forecasts

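The final step is to evaluate the forecasts.

We can do that by calculating the RMSE for each time step of the multi-step forecast, in this case giving us 3 RMSE scores.

The function below, evaluate_forecasts(), calculates and prints the RMSE for each forecasted time step.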

# evaluate the RMSE for each forecast time step
def evaluate_forecasts(test, forecasts, n_lag, n_seq):
	for i in range(n_seq):
		actual = test[:,(n_lag+i)]
		predicted = [forecast[i] for forecast in forecasts]
		rmse = sqrt(mean_squared_error(actual, predicted))
		print('t+%d RMSE: %f' % ((i+1), rmse))

We can call it as follows:

evaluate_forecasts(test, forecasts, 1, 3)

It is also helpful to plot the forecasts in the context of the original dataset to get an idea of how the RMSE scores relate to the problem in context.

We can first plot the entire Shampoo dataset, then plot each forecast as a red line.

The function plot_forecasts() below will create and show this plot.

# plot the forecasts in the context of the original dataset
def plot_forecasts(series, forecasts, n_test):
	# plot the entire dataset in blue
	pyplot.plot(series.values)
	# plot the forecasts in red
	for i in range(len(forecasts)):
		off_s = len(series) - n_test + i
		off_e = off_s + len(forecasts[i])
		xaxis = [x for x in range(off_s, off_e)]
		pyplot.plot(xaxis, forecasts[i], color='red')
	# show the plot
	pyplot.show()

We can call the function as follows.

Note that the number of observations held back on the test set is 12 for the 12 months, as opposed to 10 for the 10 supervised learning input/output patterns as was used above.

# plot forecasts
plot_forecasts(series, forecasts, 12)

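We can make the plot better by connecting the persisted forecast to the actual persisted value in the original dataset.

This requires adding the last observed value to the front of the forecast.

Below is an updated version of the plot_forecasts() function with this improvement.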

# plot the forecasts in the context of the original dataset
def plot_forecasts(series, forecasts, n_test):
	# plot the entire dataset in blue
	pyplot.plot(series.values)
	# plot the forecasts in red
	for i in range(len(forecasts)):
		off_s = len(series) - n_test + i - 1
		off_e = off_s + len(forecasts[i]) + 1
		xaxis = [x for x in range(off_s, off_e)]
		yaxis = [series.values[off_s]] + forecasts[i]
		pyplot.plot(xaxis, yaxis, color='red')
	# show the plot
	pyplot.show()

Complete Example

We can put all of these pieces together.

The complete code example for the multi-step persistence forecast is listed below.

from pandas import DataFrame
from pandas import concat
from pandas import read_csv
from datetime import datetime
from sklearn.metrics import mean_squared_error
from math import sqrt
from matplotlib import pyplot

# date-time parsing function for loading the dataset
def parser(x):
	return datetime.strptime('190'+x, '%Y-%m')

# convert time series into supervised learning problem
def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
	n_vars = 1 if type(data) is list else data.shape[1]
	df = DataFrame(data)
	cols, names = list(), list()
	# input sequence (t-n, ... t-1)
	for i in range(n_in, 0, -1):
		cols.append(df.shift(i))
		names += [('var%d(t-%d)' % (j+1, i)) for j in range(n_vars)]
	# forecast sequence (t, t+1, ... t+n)
	for i in range(0, n_out):
		cols.append(df.shift(-i))
		if i == 0:
			names += [('var%d(t)' % (j+1)) for j in range(n_vars)]
		else:
			names += [('var%d(t+%d)' % (j+1, i)) for j in range(n_vars)]
	# put it all together
	agg = concat(cols, axis=1)
	agg.columns = names
	# drop rows with NaN values
	if dropnan:
		agg.dropna(inplace=True)
	return agg

# transform series into train and test sets for supervised learning
def prepare_data(series, n_test, n_lag, n_seq):
	# extract raw values
	raw_values = series.values
	raw_values = raw_values.reshape(len(raw_values), 1)
	# transform into supervised learning problem X, y
	supervised = series_to_supervised(raw_values, n_lag, n_seq)
	supervised_values = supervised.values
	# split into train and test sets
	train, test = supervised_values[0:-n_test], supervised_values[-n_test:]
	return train, test

# make a persistence forecast
def persistence(last_ob, n_seq):
	return [last_ob for i in range(n_seq)]

# evaluate the persistence model
def make_forecasts(train, test, n_lag, n_seq):
	forecasts = list()
	for i in range(len(test)):
		X, y = test[i, 0:n_lag], test[i, n_lag:]
		# make forecast
		forecast = persistence(X[-1], n_seq)
		# store the forecast
		forecasts.append(forecast)
	return forecasts

# evaluate the RMSE for each forecast time step
def evaluate_forecasts(test, forecasts, n_lag, n_seq):
	for i in range(n_seq):
		actual = test[:,(n_lag+i)]
		predicted = [forecast[i] for forecast in forecasts]
		rmse = sqrt(mean_squared_error(actual, predicted))
		print('t+%d RMSE: %f' % ((i+1), rmse))

# plot the forecasts in the context of the original dataset
def plot_forecasts(series, forecasts, n_test):
	# plot the entire dataset in blue
	pyplot.plot(series.values)
	# plot the forecasts in red
	for i in range(len(forecasts)):
		off_s = len(series) - n_test + i - 1
		off_e = off_s + len(forecasts[i]) + 1
		xaxis = [x for x in range(off_s, off_e)]
		yaxis = [series.values[off_s]] + forecasts[i]
		pyplot.plot(xaxis, yaxis, color='red')
	# show the plot
	pyplot.show()

# load dataset
series = read_csv('shampoo-sales.csv', header=0, parse_dates=[0], index_col=0, date_parser=parser).squeeze('columns')
# configure
n_lag = 1
n_seq = 3
n_test = 10
# prepare data
train, test = prepare_data(series, n_test, n_lag, n_seq)
# make forecasts
forecasts = make_forecasts(train, test, n_lag, n_seq)
# evaluate forecasts
evaluate_forecasts(test, forecasts, n_lag, n_seq)
# plot forecasts
plot_forecasts(series, forecasts, n_test+2)

Running the example first prints the RMSE for each of the forecasted time steps.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision.

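Consider running the example a few times and comparing the average outcome.

This gives us a baseline of performance on each time step that we would expect an LSTM to outperform.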

t+1 RMSE: 144.535304
t+2 RMSE: 86.479905
t+3 RMSE: 121.149168

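A plot of the original time series with the multi-step persistence forecasts is also created.

The lines connect to the appropriate input value for each forecast.

This context shows how naive the persistence forecasts actually are.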

Line Plot of Shampoo Sales Dataset with Multi-Step Persistence Forecasts

Multi-Step LSTM Network

In this section, we will use the persistence example as a starting point and look at the changes needed to fit an LSTM to the training data and make multi-step forecasts for the test dataset.

Prepare Data

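The data must be prepared before we can use it to train an LSTM.

Specifically, two additional changes are required:

Stationary. The data shows an increasing trend that must be removed by differencing.
Scale. The scale of the data must be reduced to values between -1 and 1, the output range of the activation function (tanh by default in Keras) of the LSTM units.

We can introduce a function called difference() to make the data stationary.

This will transform the series of values into a series of differences, a simpler representation to work with.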

# create a differenced series
def difference(dataset, interval=1):
	diff = list()
	for i in range(interval, len(dataset)):
		value = dataset[i] - dataset[i - interval]
		diff.append(value)
	return Series(diff)
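
As a small illustration of what differencing does, the sketch below uses a made-up series with a steady upward trend. It applies the same subtraction as difference() with interval=1, written as a plain list comprehension so that it runs without pandas.

```python
# made-up series with a steady upward trend (illustrative values only)
values = [100.0, 110.0, 120.0, 130.0, 140.0]

# first-order differencing: each value minus the previous one
diff = [values[i] - values[i - 1] for i in range(1, len(values))]

print(diff)
print(len(values), len(diff))
```

The trend disappears from the differenced values, and the result is one observation shorter than the input, which matters later when the transform is inverted.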

We can use the MinMaxScaler from the sklearn library to scale the data.
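
The arithmetic behind a min-max rescale to the range -1 to 1 can be sketched by hand. The helpers scale() and unscale() below are hypothetical names written for this illustration, not sklearn functions, and the values are made up. MinMaxScaler with feature_range=(-1, 1) applies the same kind of linear mapping per column.

```python
# made-up differenced values (illustrative only)
diffs = [-120.0, 0.0, 60.0, 120.0]

lo, hi = min(diffs), max(diffs)

def scale(x):
    # map [lo, hi] linearly onto [-1, 1]
    return (x - lo) / (hi - lo) * 2.0 - 1.0

def unscale(s):
    # inverse mapping back to the original units
    return (s + 1.0) / 2.0 * (hi - lo) + lo

scaled = [scale(x) for x in diffs]
restored = [unscale(s) for s in scaled]

print(scaled)
print(restored)
```

The minimum and maximum have to be kept around, because the same two numbers are needed to map forecasts back to the original units.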

Putting this together, we can update the prepare_data() function to first difference the data and rescale it, then perform the transform into a supervised learning problem and train and test sets as we did before with the persistence example.

The function now returns a scaler in addition to the train and test datasets.

# transform series into train and test sets for supervised learning
def prepare_data(series, n_test, n_lag, n_seq):
	# extract raw values
	raw_values = series.values
	# transform data to be stationary
	diff_series = difference(raw_values, 1)
	diff_values = diff_series.values
	diff_values = diff_values.reshape(len(diff_values), 1)
	# rescale values to -1, 1
	scaler = MinMaxScaler(feature_range=(-1, 1))
	scaled_values = scaler.fit_transform(diff_values)
	scaled_values = scaled_values.reshape(len(scaled_values), 1)
	# transform into supervised learning problem X, y
	supervised = series_to_supervised(scaled_values, n_lag, n_seq)
	supervised_values = supervised.values
	# split into train and test sets
	train, test = supervised_values[0:-n_test], supervised_values[-n_test:]
	return scaler, train, test

We can call this function as follows:

# prepare data
scaler, train, test = prepare_data(series, n_test, n_lag, n_seq)

Fit LSTM Network

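Next, we need to fit an LSTM network model to the training data.

This first requires that the training dataset be transformed from a 2D array [samples, features] to a 3D array [samples, timesteps, features].

We will fix the time steps at 1, so this change is straightforward.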
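
The reshape itself is a one-liner in NumPy. The sketch below uses four made-up samples with a single lag feature each and assumes NumPy is installed.

```python
from numpy import array

# made-up training inputs: 4 samples, 1 lag feature each (illustrative values)
X = array([[0.1], [0.2], [0.3], [0.4]])
print(X.shape)

# insert a time-step axis of length 1 between samples and features
X3d = X.reshape(X.shape[0], 1, X.shape[1])
print(X3d.shape)
```

No values change; only the array's shape metadata does, so the number of elements is the same before and after.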

Next, we need to design an LSTM network. We will use a simple structure with 1 hidden layer with 1 LSTM unit, then an output layer with linear activation and 3 output values.

The network will use a mean squared error loss function and the efficient ADAM optimization algorithm.

The LSTM is stateful; this means that we have to manually reset the state of the network at the end of each training epoch.

The network will be fit for 1500 epochs.

The same batch size must be used for training and prediction, and we require predictions to be made at each time step of the test dataset. This means that a batch size of 1 must be used.

A batch size of 1 is also called online learning as the network weights will be updated during training after each training pattern (as opposed to mini batch or batch updates).

We can put all of this together in a function called fit_lstm(). The function takes a number of key parameters that can be used to tune the network later and returns a fit LSTM model ready for forecasting.

# fit an LSTM network to training data
def fit_lstm(train, n_lag, n_seq, n_batch, nb_epoch, n_neurons):
	# reshape training into [samples, timesteps, features]
	X, y = train[:, 0:n_lag], train[:, n_lag:]
	X = X.reshape(X.shape[0], 1, X.shape[1])
	# design network
	model = Sequential()
	model.add(LSTM(n_neurons, batch_input_shape=(n_batch, X.shape[1], X.shape[2]), stateful=True))
	model.add(Dense(y.shape[1]))
	model.compile(loss='mean_squared_error', optimizer='adam')
	# fit network
	for i in range(nb_epoch):
		model.fit(X, y, epochs=1, batch_size=n_batch, verbose=0, shuffle=False)
		model.reset_states()
	return model

The function can be called as follows:

# fit model
model = fit_lstm(train, 1, 3, 1, 1500, 1)

The configuration of the network was not tuned; try different parameters if you like.

Make LSTM Forecasts

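The next step is to use the fit LSTM network to make forecasts.

A single forecast can be made with the fit LSTM network by calling model.predict().

Again, the data must be formatted as a 3D array.

We can wrap this up into a function called forecast_lstm().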

# make one forecast with an LSTM,
def forecast_lstm(model, X, n_batch):
	# reshape input pattern to [samples, timesteps, features]
	X = X.reshape(1, 1, len(X))
	# make forecast
	forecast = model.predict(X, batch_size=n_batch)
	# convert to array
	return [x for x in forecast[0, :]]

We can call this function from the make_forecasts() function and update it to take the model as an argument. The updated version is listed below.

# make a forecast with the LSTM model for each test pattern
def make_forecasts(model, n_batch, train, test, n_lag, n_seq):
	forecasts = list()
	for i in range(len(test)):
		X, y = test[i, 0:n_lag], test[i, n_lag:]
		# make forecast
		forecast = forecast_lstm(model, X, n_batch)
		# store the forecast
		forecasts.append(forecast)
	return forecasts

This updated version of the make_forecasts() function can be called as follows:

# make forecasts
forecasts = make_forecasts(model, 1, train, test, 1, 3)

Invert Transforms

After the forecasts have been made, we need to invert the transforms to return the values back into the original scale.

This is needed so that we can calculate error scores and plots that are comparable with other models, like the persistence forecast above.

We can invert the scale of the forecasts directly using the MinMaxScaler object that offers an inverse_transform() function.

We can invert the differencing by adding the value of the last observation (the prior month's shampoo sales) to the first forecasted value, then propagating the value down the forecast.

This is a little fiddly; we can wrap up the behavior in a function named inverse_difference() that takes the last observed value prior to the forecast and the forecast as arguments and returns the inverted forecast.

# invert differenced forecast
def inverse_difference(last_ob, forecast):
	# invert first forecast
	inverted = list()
	inverted.append(forecast[0] + last_ob)
	# propagate difference forecast using inverted first value
	for i in range(1, len(forecast)):
		inverted.append(forecast[i] + inverted[i-1])
	return inverted
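
A small worked example may help here. The sketch below repeats the inverse_difference() logic in compact form so that it runs on its own, and applies it to made-up numbers: a last observation of 100 and three forecast differences.

```python
def inverse_difference(last_ob, forecast):
    # running sum of the forecast differences, starting from the last observation
    inverted = [forecast[0] + last_ob]
    for i in range(1, len(forecast)):
        inverted.append(forecast[i] + inverted[i - 1])
    return inverted

# made-up example: last observed sales of 100 and three forecast differences
restored = inverse_difference(100.0, [5.0, -3.0, 2.0])
print(restored)
```

Each step builds on the previous inverted value rather than on the last observation, so an error in the first forecast difference carries through to every later step.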

Putting this together, we can create an inverse_transform() function that works through each forecast, first inverting the scale and then inverting the differences, returning forecasts to their original scale.

# inverse data transform on forecasts
def inverse_transform(series, forecasts, scaler, n_test):
	inverted = list()
	for i in range(len(forecasts)):
		# create array from forecast
		forecast = array(forecasts[i])
		forecast = forecast.reshape(1, len(forecast))
		# invert scaling
		inv_scale = scaler.inverse_transform(forecast)
		inv_scale = inv_scale[0, :]
		# invert differencing
		index = len(series) - n_test + i - 1
		last_ob = series.values[index]
		inv_diff = inverse_difference(last_ob, inv_scale)
		# store
		inverted.append(inv_diff)
	return inverted

We can call this function with the forecasts as follows:

# inverse transform forecasts and test
forecasts = inverse_transform(series, forecasts, scaler, n_test+2)

We can also invert the transforms on the output part of the test dataset so that we can correctly calculate the RMSE scores, as follows:

actual = [row[n_lag:] for row in test]
actual = inverse_transform(series, actual, scaler, n_test+2)
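
The n_test+2 argument is easy to get wrong, so here is the index arithmetic spelled out. This is a sketch that assumes the 36 observations and the configuration used in this tutorial (n_lag=1, n_seq=3, n_test=10); the variable names n_obs, n_diff, n_rows and offsets are introduced only for this illustration.

```python
n_obs = 36            # monthly observations in the dataset
n_lag, n_seq = 1, 3
n_test = 10

n_diff = n_obs - 1                       # differencing drops one observation
n_rows = n_diff - n_lag - (n_seq - 1)    # framing drops rows with missing values
n_train = n_rows - n_test

# index in the original series of the last observation before each test forecast,
# using the same formula as inverse_transform() with n_test+2 passed in
offsets = [n_obs - (n_test + 2) + i - 1 for i in range(n_test)]

print(n_diff, n_rows, n_train)
print(offsets)
```

The first offset is 23, the zero-based index of December in year 2, which is the observation the first test forecast starts from. The 10 test patterns span the final 12 months because each pattern covers 3 forecast months, hence n_test+2.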

We can also simplify the calculation of RMSE scores to expect the test data to only contain the output values, as follows:

def evaluate_forecasts(test, forecasts, n_lag, n_seq):
	for i in range(n_seq):
		actual = [row[i] for row in test]
		predicted = [forecast[i] for forecast in forecasts]
		rmse = sqrt(mean_squared_error(actual, predicted))
		print('t+%d RMSE: %f' % ((i+1), rmse))

Complete Example

We can tie all of these pieces together and fit an LSTM network to the multi-step time series forecasting problem.

The complete code listing is provided below.

from pandas import DataFrame
from pandas import Series
from pandas import concat
from pandas import read_csv
from datetime import datetime
from sklearn.metrics import mean_squared_error
from sklearn.preprocessing import MinMaxScaler
from keras.models import Sequential
from keras.layers import Dense
from keras.layers import LSTM
from math import sqrt
from matplotlib import pyplot
from numpy import array
 
# date-time parsing function for loading the dataset
def parser(x):
	return datetime.strptime('190'+x, '%Y-%m')
 
# convert time series into supervised learning problem
def series_to_supervised(data, n_in=1, n_out=1, dropnan=True):
	n_vars = 1 if type(data) is list else data.shape[1]
	df = DataFrame(data)
	cols, names = list(), list()
	# input sequence (t-n, ... t-1)
	for i in range(n_in, 0, -1):
		cols.append(df.shift(i))
		names += [('var%d(t-%d)' % (j+1, i)) for j in range(n_vars)]
	# forecast sequence (t, t+1, ... t+n)
	for i in range(0, n_out):
		cols.append(df.shift(-i))
		if i == 0:
			names += [('var%d(t)' % (j+1)) for j in range(n_vars)]
		else:
			names += [('var%d(t+%d)' % (j+1, i)) for j in range(n_vars)]
	# put it all together
	agg = concat(cols, axis=1)
	agg.columns = names
	# drop rows with NaN values
	if dropnan:
		agg.dropna(inplace=True)
	return agg
 
# create a differenced series
def difference(dataset, interval=1):
	diff = list()
	for i in range(interval, len(dataset)):
		value = dataset[i] - dataset[i - interval]
		diff.append(value)
	return Series(diff)
 
# transform series into train and test sets for supervised learning
def prepare_data(series, n_test, n_lag, n_seq):
	# extract raw values
	raw_values = series.values
	# transform data to be stationary
	diff_series = difference(raw_values, 1)
	diff_values = diff_series.values
	diff_values = diff_values.reshape(len(diff_values), 1)
	# rescale values to -1, 1
	scaler = MinMaxScaler(feature_range=(-1, 1))
	scaled_values = scaler.fit_transform(diff_values)
	scaled_values = scaled_values.reshape(len(scaled_values), 1)
	# transform into supervised learning problem X, y
	supervised = series_to_supervised(scaled_values, n_lag, n_seq)
	supervised_values = supervised.values
	# split into train and test sets
	train, test = supervised_values[0:-n_test], supervised_values[-n_test:]
	return scaler, train, test
 
# fit an LSTM network to training data
def fit_lstm(train, n_lag, n_seq, n_batch, nb_epoch, n_neurons):
	# reshape training into [samples, timesteps, features]
	X, y = train[:, 0:n_lag], train[:, n_lag:]
	X = X.reshape(X.shape[0], 1, X.shape[1])
	# design network
	model = Sequential()
	model.add(LSTM(n_neurons, batch_input_shape=(n_batch, X.shape[1], X.shape[2]), stateful=True))
	model.add(Dense(y.shape[1]))
	model.compile(loss='mean_squared_error', optimizer='adam')
	# fit network
	for i in range(nb_epoch):
		model.fit(X, y, epochs=1, batch_size=n_batch, verbose=0, shuffle=False)
		model.reset_states()
	return model
 
# make one forecast with an LSTM,
def forecast_lstm(model, X, n_batch):
	# reshape input pattern to [samples, timesteps, features]
	X = X.reshape(1, 1, len(X))
	# make forecast
	forecast = model.predict(X, batch_size=n_batch)
	# convert to array
	return [x for x in forecast[0, :]]
 
# make a forecast with the LSTM model for each test pattern
def make_forecasts(model, n_batch, train, test, n_lag, n_seq):
	forecasts = list()
	for i in range(len(test)):
		X, y = test[i, 0:n_lag], test[i, n_lag:]
		# make forecast
		forecast = forecast_lstm(model, X, n_batch)
		# store the forecast
		forecasts.append(forecast)
	return forecasts
 
# invert differenced forecast
def inverse_difference(last_ob, forecast):
	# invert first forecast
	inverted = list()
	inverted.append(forecast[0] + last_ob)
	# propagate difference forecast using inverted first value
	for i in range(1, len(forecast)):
		inverted.append(forecast[i] + inverted[i-1])
	return inverted
 
# inverse data transform on forecasts
def inverse_transform(series, forecasts, scaler, n_test):
	inverted = list()
	for i in range(len(forecasts)):
		# create array from forecast
		forecast = array(forecasts[i])
		forecast = forecast.reshape(1, len(forecast))
		# invert scaling
		inv_scale = scaler.inverse_transform(forecast)
		inv_scale = inv_scale[0, :]
		# invert differencing
		index = len(series) - n_test + i - 1
		last_ob = series.values[index]
		inv_diff = inverse_difference(last_ob, inv_scale)
		# store
		inverted.append(inv_diff)
	return inverted
 
# evaluate the RMSE for each forecast time step
def evaluate_forecasts(test, forecasts, n_lag, n_seq):
	for i in range(n_seq):
		actual = [row[i] for row in test]
		predicted = [forecast[i] for forecast in forecasts]
		rmse = sqrt(mean_squared_error(actual, predicted))
		print('t+%d RMSE: %f' % ((i+1), rmse))
 
# plot the forecasts in the context of the original dataset
def plot_forecasts(series, forecasts, n_test):
	# plot the entire dataset in blue
	pyplot.plot(series.values)
	# plot the forecasts in red
	for i in range(len(forecasts)):
		off_s = len(series) - n_test + i - 1
		off_e = off_s + len(forecasts[i]) + 1
		xaxis = [x for x in range(off_s, off_e)]
		yaxis = [series.values[off_s]] + forecasts[i]
		pyplot.plot(xaxis, yaxis, color='red')
	# show the plot
	pyplot.show()
 
# load dataset
series = read_csv('shampoo-sales.csv', header=0, parse_dates=[0], index_col=0, date_parser=parser).squeeze('columns')
# configure
n_lag = 1
n_seq = 3
n_test = 10
n_epochs = 1500
n_batch = 1
n_neurons = 1
# prepare data
scaler, train, test = prepare_data(series, n_test, n_lag, n_seq)
# fit model
model = fit_lstm(train, n_lag, n_seq, n_batch, n_epochs, n_neurons)
# make forecasts
forecasts = make_forecasts(model, n_batch, train, test, n_lag, n_seq)
# inverse transform forecasts and test
forecasts = inverse_transform(series, forecasts, scaler, n_test+2)
actual = [row[n_lag:] for row in test]
actual = inverse_transform(series, actual, scaler, n_test+2)
# evaluate forecasts
evaluate_forecasts(actual, forecasts, n_lag, n_seq)
# plot forecasts
plot_forecasts(series, forecasts, n_test+2)

Running the example first prints the RMSE for each of the forecasted time steps.

Note: Your results may vary given the stochastic nature of the algorithm or evaluation procedure, or differences in numerical precision.

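Consider running the example a few times and comparing the average outcome.

We can see that the scores at each forecasted time step are better than the persistence forecast.

This shows that the configured LSTM does have skill on the problem.

It is interesting to note that the RMSE does not become progressively worse with the length of the forecast horizon.

This is marked by the fact that t+2 appears easier to forecast than t+1.

This may be because the downward tick is easier to predict than the upward tick noted in the series (this could be confirmed with a more in-depth analysis of the results).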

t+1 RMSE: 95.973221
t+2 RMSE: 78.872348
t+3 RMSE: 105.613951

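A line plot of the series (blue) with the forecasts (red) is also created.

The plot shows that although the skill of the model is better, some of the forecasts are not very good and there is plenty of room for improvement.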

Line Plot of Shampoo Sales Dataset with Multi-Step LSTM Forecasts

Extensions

There are some extensions you may consider if you are looking to push beyond this tutorial.

Update LSTM. Change the example to refit or update the LSTM as new data is made available. Tens of training epochs should be sufficient to retrain with a new observation.
Tune the LSTM. Grid search some of the LSTM parameters used in the tutorial, such as number of epochs, number of neurons, and number of layers to see if you can further lift performance.
Seq2Seq. Use the encoder-decoder paradigm for LSTMs to forecast each sequence to see if this offers any benefit.
Time Horizon. Experiment with forecasting different time horizons and see how the behavior of the network varies at different lead times.

Did you try any of these extensions?

Share your results in the comments; I’d love to hear about it.

Summary

In this tutorial, you discovered how to develop LSTM networks for multi-step time series forecasting.

Specifically, you learned:

How to develop a persistence model for multi-step time series forecasting.
How to develop an LSTM network for multi-step time series forecasting.
How to evaluate and plot the results from multi-step time series forecasting.

Do you have any questions about multi-step time series forecasting with LSTMs?
Ask your questions in the comments below and I will do my best to answer.

Multi-step LSTM Time Series Forecasting (by Jason Brownlee)
