Main Problem
The following code produces an FFT feature that leaks future data.
close_fft = np.fft.fft(np.asarray(data_combine['Close'].tolist()))
fft_df = pd.DataFrame({'fft':close_fft})
fft_df['absolute'] = fft_df['fft'].apply(lambda x: np.abs(x))
fft_df['angle'] = fft_df['fft'].apply(lambda x: np.angle(x))
plt.figure(figsize=(14, 7), dpi=100)
fft_list = np.asarray(fft_df['fft'].tolist())
for num_ in [3, 6, 9, 27, 81, 100]:
    fft_list_m10 = np.copy(fft_list)
    fft_list_m10[num_:-num_] = 0
    data_combine[f'FT_{num_}components'] = np.fft.ifft(fft_list_m10)
    plt.plot(np.fft.ifft(fft_list_m10), label='Fourier transform with {} components'.format(num_))
plt.plot(data_combine['Close'].values, label='Real')
What goes wrong
- The FFT is computed once over the whole time series.
- As a result, the feature value at every past bar is computed from future data.
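The leak can be checked directly. The sketch below is only an illustration on synthetic prices: the helper name `fft_lowpass`, the random-walk series, and the choice to keep only the real part of the inverse transform are assumptions, not part of the original code. It changes the last bar only and measures how much the filtered values of the first 100 bars move.

```python
import numpy as np

def fft_lowpass(series, num_components):
    """Zero all but the lowest frequencies, as in the leaky code above (hypothetical helper)."""
    spectrum = np.fft.fft(np.asarray(series, dtype=float))
    spectrum[num_components:-num_components] = 0
    # Assumption: keep the real part so the result is a plain float array.
    return np.fft.ifft(spectrum).real

# Synthetic random-walk "close" prices, used only for this demonstration.
rng = np.random.default_rng(0)
close = 100 + np.cumsum(rng.normal(size=200))

original = fft_lowpass(close, 3)

# Change only the final bar, i.e. data that lies in the future of every earlier bar.
modified_close = close.copy()
modified_close[-1] += 10.0
modified = fft_lowpass(modified_close, 3)

# Largest change in the filtered feature over the first 100 bars.
past_shift = np.abs(modified[:100] - original[:100]).max()
print("max change in past feature values:", past_shift)
```

A non-zero shift means the feature at early bars depends on a price that was not known at the time, which is exactly the leak described above.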
What can happen
Even a simple MLP model can appear to predict the next day's up/down trend well, because the feature already contains information from future prices.
Solution
To provide an FFT feature without data leakage, you need to generate it bar by bar, using only the data available up to that bar, for example:
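index_data = []
for i in range(1, len(df)):
    # Use only the bars before bar i, so no future data is involved.
    window = df['close'].iloc[:i]
    index_data.append(df.index[i])
    fft_close = np.fft.fft(window.values)
    absolute = np.abs(fft_close)
    # Take the phase of the complex FFT values, not of their magnitudes.
    angle = np.angle(fft_close)
    ...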
After this change, the model will perform very badly, which suggests that the earlier good result came from the leaked future data.
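One possible way to turn the per-bar loop into a complete feature is sketched below. It is not the author's implementation: the function name `causal_fft_feature`, the use of the last smoothed value of each window as the feature, and the synthetic test series are all assumptions made for illustration. The check at the end changes the prices from bar 100 onward and compares the features before that point.

```python
import numpy as np

def causal_fft_feature(close, num_components=3):
    """Low-pass FFT feature for bar i built only from bars before i (hypothetical helper)."""
    values = np.asarray(close, dtype=float)
    feature = np.full(len(values), np.nan)  # bar 0 has no history, so it stays NaN
    for i in range(1, len(values)):
        window = values[:i]                 # bars strictly before bar i
        spectrum = np.fft.fft(window)
        spectrum[num_components:-num_components] = 0  # keep the lowest frequencies only
        smoothed = np.fft.ifft(spectrum).real
        feature[i] = smoothed[-1]           # assumption: use the most recent smoothed value
    return feature

# Synthetic random-walk prices, used only for this check.
rng = np.random.default_rng(0)
close = 100 + np.cumsum(rng.normal(size=120))
feature = causal_fft_feature(close)

# Alter the prices from bar 100 onward and recompute the feature.
changed = close.copy()
changed[100:] += 10.0
feature_changed = causal_fft_feature(changed)

# Features for bars 1..100 only see bars 0..99, so they should be identical.
print(np.array_equal(feature[1:101], feature_changed[1:101]))
```

The returned array has the same length as the input, so it could be assigned as a column of the original frame. Recomputing the FFT for every bar is much slower than one transform over the whole series; a fixed-length rolling window would be a cheaper variant of the same idea.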