mcompetitions / m4-methods
Data, Benchmarks, and methods submitted to the M4 forecasting competition
Hello. Thanks for sharing this repository.
I'm looking at ML_benchmarks.py#L156 and noticed that you just reshape x_train to have sequences along each row. This means that your rows do not have any overlap between them. Doesn't this hinder the RNN's performance?
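To make the question concrete, here is a minimal sketch contrasting the two ways of cutting a series into rows. The function names are hypothetical and are not taken from ML_benchmarks.py; the sketch only illustrates the difference between a plain reshape and overlapping windows.

```python
def non_overlapping_windows(series, width):
    """Split a series into consecutive chunks of `width` values (like a plain reshape).

    Trailing values that do not fill a whole chunk are dropped.
    """
    usable = len(series) - len(series) % width
    return [series[i:i + width] for i in range(0, usable, width)]


def sliding_windows(series, width, step=1):
    """Build overlapping windows by moving the start index `step` values at a time."""
    return [series[i:i + width] for i in range(0, len(series) - width + 1, step)]


series = list(range(10))
chunks = non_overlapping_windows(series, 4)  # rows share no values
windows = sliding_windows(series, 4)         # neighbouring rows share 3 values

print(len(chunks), len(windows))
```

With ten observations and a width of 4, the reshape yields 2 training rows while the sliding window yields 7, which is the trade-off the question is about.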
Hello, excuse me. I would like to ask, what do V1, V2, V3, V4, V5, V6, etc. in the dataset represent? What do Y1, Y2, Y3, Y4 represent?
ignore
I have a quick question regarding the order of the data.
Each row is a time series; does it run from left to right (small column index to large index), or in the other direction?
Thanks!
How were the ARIMA results generated? I can only find the submission file with the results, but not the code used to generate them. Was any preprocessing applied? Any non-default hyper-parameter settings? Did you use forecast's auto.arima (v8.2)?
Hi there, can we reuse and redistribute code from "Benchmarks and Evaluation.R" in our organization?
Hello, is this the best validation dataset in the dataset? Is it generated by someone else's model? Thanks.
Hi,
I came across this topic when searching for an existing forecasting tool.
With respect to the method shown, is there any reference for the theta model exhibited (i.e. the logic of the code)?
Much appreciated.
Hi,
I am having issues replicating the results from the code submitted by Nikoletta-Zampeta Legaki. Running the code in RStudio gives the following error:
Error in datasets[[j]][[i]] : subscript out of bounds
Your assistance in this regard is much appreciated!
Sorry if this is a false alarm, but my antivirus just reacted to SIOPREDM4.exe.
Is the source code for this entrant not available or have I just overlooked it?
Hello :)
I was trying to run predict.py, but line 105 of predict.py needs template_Naive.csv, which I couldn't find in the M4 dataset. Where can I find it?
Thank you very much
Hi all, I was wondering whether there is any information about the date format or the range of the StartingDate column given in M4-info.csv.
From my observation, StartingDate is usually in the format "DD-MM-YY hh:mm", but some values break this rule or are ambiguous. For example:
- 1882-07-01 12:00:00 (which I suppose is the 1st of July, 1882).
- 01-01-17 12:00 (which I can't tell if it is the 1st of January of 1917 or 2017).
Any clarification is appreciated!
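A minimal sketch of how the two example values behave when parsed with Python's standard library. The two format strings are assumptions based on the observations above, not a documented specification of M4-info.csv, and parse_starting_date is a hypothetical helper.

```python
from datetime import datetime

# Candidate formats (assumptions): a four-digit-year ISO-like form first,
# then the "DD-MM-YY hh:mm" pattern observed for most rows.
CANDIDATE_FORMATS = ("%Y-%m-%d %H:%M:%S", "%d-%m-%y %H:%M")


def parse_starting_date(text):
    """Try each candidate format in turn and return the first successful parse."""
    for fmt in CANDIDATE_FORMATS:
        try:
            return datetime.strptime(text, fmt)
        except ValueError:
            continue
    raise ValueError(f"unrecognised StartingDate: {text!r}")


print(parse_starting_date("1882-07-01 12:00:00"))
print(parse_starting_date("01-01-17 12:00"))
```

Note that %y cannot resolve the century question: Python follows the POSIX convention, mapping two-digit years 69 to 99 onto 1969 to 1999 and 00 to 68 onto 2000 to 2068, so "17" is read as 2017 whether or not that is what the data intends.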
The submission 237 - prologistica does not replicate due to the missing R files classifier.R and models.R referenced in model_choice_M4.r.
Hi,
I'm having issues replicating the results for the MLP method, especially for the hourly dataset.
I'm using the hyper-parameter settings found in https://github.com/Mcompetitions/M4-methods/blob/master/ML_benchmarks.py.
| | Yearly | Quarterly | Monthly | Weekly | Daily | Hourly |
|---|---|---|---|---|---|---|
| MLP | -7.910408 | -7.948233 | -4.790635 | -44.658715 | -51.489338 | 147.500501 |
Values are the percentage differences between the published sMAPE and the replicated ones, i.e. a value of 100 means a 100% difference; positive values indicate that the replicated results are worse than the published results.
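As a sketch of how such a percentage difference could be computed: the formula below assumes the published sMAPE is the denominator, which is an interpretation of the description rather than something the post states, and the numbers in the usage lines are made up for illustration.

```python
def pct_difference(published, replicated):
    """Relative difference of the replicated score versus the published one, in percent.

    Positive means the replicated sMAPE is higher (worse) than the published one.
    """
    return 100.0 * (replicated - published) / published


# Illustrative values only, not taken from the M4 results.
print(pct_difference(10.0, 12.0))  # replicated is worse
print(pct_difference(10.0, 8.0))   # replicated is better
```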
The plot shows part of the H1 training series, the full test series and the MLP point forecasts, where y_pred_orig are the point forecasts found in the submission-MLP.rar file, y_pred_add are the point forecasts I obtain with additive deseasonalisation and y_pred_mul are the point forecasts I obtain with multiplicative deseasonalisation.
I find similar patterns for RNN. Are you using additive seasonality by any chance? Any other idea where the deviation may come from?
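For readers unfamiliar with the two variants, here is a minimal sketch of classical decomposition seasonal indices in both forms. It is a generic textbook version with hypothetical function names; it is not claimed to match what ML_benchmarks.py or the submitted MLP forecasts actually do.

```python
def centred_moving_average(ts, ppy):
    """Centred moving average over one seasonal cycle (2 x ppy MA when ppy is even)."""
    n, half = len(ts), ppy // 2
    out = [None] * n  # ends stay None because the window does not fit there
    for t in range(half, n - half):
        window = ts[t - half:t + half + 1]
        if ppy % 2 == 0:
            total = 0.5 * window[0] + sum(window[1:-1]) + 0.5 * window[-1]
        else:
            total = sum(window)
        out[t] = total / ppy
    return out


def seasonal_indices(ts, ppy, multiplicative=True):
    """Average the detrended values per season and normalise them.

    Multiplicative indices average to 1, additive indices average to 0.
    """
    trend = centred_moving_average(ts, ppy)
    buckets = [[] for _ in range(ppy)]
    for t, (y, m) in enumerate(zip(ts, trend)):
        if m is not None:
            buckets[t % ppy].append(y / m if multiplicative else y - m)
    raw = [sum(b) / len(b) for b in buckets]
    centre = sum(raw) / ppy
    return [r / centre for r in raw] if multiplicative else [r - centre for r in raw]


def deseasonalise(ts, indices, multiplicative=True):
    """Remove the seasonal component: divide (multiplicative) or subtract (additive)."""
    ppy = len(indices)
    if multiplicative:
        return [y / indices[t % ppy] for t, y in enumerate(ts)]
    return [y - indices[t % ppy] for t, y in enumerate(ts)]
```

The two variants differ only in whether the seasonal effect is treated as a ratio or an offset, so forecasts rebuilt with the wrong variant can show the kind of systematic amplitude mismatch described above.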
Hi,
What version of the forecast package was used for calculating the point forecasts of the benchmarks?
I'm asking because if I try to reproduce the benchmarks' point forecasts with the code from "Benchmarks and Evaluation.R", I get different results for SES, Holt etc. I'm using forecast 8.21 on R 4.3.1 on Linux.
Thank you.
I suggest adding the topics time-series, time-series-analysis, forecasting to the About section at https://github.com/Mcompetitions/M4-methods
Depending on where the dataset comes from, this might affect which license applies to the M4 dataset.
As a for-profit business, we need to figure out what we are allowed to do with this data.
It seems the mean() function is missing in your smape_cal function in Benchmarks and Evaluation.R.
Arsa
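To illustrate the point, a minimal Python sketch of sMAPE as it is commonly defined for M4 (200 * |actual - forecast| / (|actual| + |forecast|), averaged over the horizon). This is not a port of smape_cal; the function names are hypothetical, and the sketch only shows the difference between the per-point values and their mean.

```python
def smape_per_point(actual, forecast):
    """Symmetric absolute percentage error for each forecast step (a list, not a score).

    Undefined when both the actual and the forecast value are zero.
    """
    return [200.0 * abs(a - f) / (abs(a) + abs(f)) for a, f in zip(actual, forecast)]


def smape(actual, forecast):
    """Single sMAPE score: the mean of the per-point errors."""
    errors = smape_per_point(actual, forecast)
    return sum(errors) / len(errors)


# Illustrative values only.
print(smape_per_point([100.0, 200.0], [110.0, 180.0]))
print(smape([100.0, 200.0], [110.0, 180.0]))
```

Without the final averaging step the function returns one value per forecast step, which is only a usable score if the caller takes the mean afterwards.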
Hello, in the paper "The M4 Competition: 100000 time series and 61 forecasting methods", it is stated that the M4 dataset is divided into six data frequencies and six application fields. For the Yearly dataset, Micro accounts for 6538, Industry accounts for 3716, Macro accounts for 3903, Finance accounts for 6519, Demographic accounts for 1088, and Other accounts for 1236. May I ask which rows of the dataset belong to Micro? Which rows belong to Industry, to Macro, and to Finance?
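One way to approach this, sketched under loud assumptions: that a metadata file such as M4-info.csv carries one row per series with an identifier column and a domain column. The column names M4id and category, the "Y" prefix for yearly series, and the sample rows below are all assumptions or made-up illustrations, not verified contents of the file.

```python
import csv
import io

# Made-up sample rows for illustration; column names are assumptions.
SAMPLE = """M4id,category
Y1,Macro
Y2,Micro
Q1,Micro
Y3,Finance
"""


def ids_by_category(csv_file, id_prefix):
    """Group series identifiers by domain, keeping only ids that start with `id_prefix`."""
    groups = {}
    for row in csv.DictReader(csv_file):
        if row["M4id"].startswith(id_prefix):
            groups.setdefault(row["category"], []).append(row["M4id"])
    return groups


yearly = ids_by_category(io.StringIO(SAMPLE), "Y")
print(yearly)
```

With the real file, the same function could be called on an opened file handle instead of the in-memory sample, provided the column names match.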
The seasonality tests in Python and R seem to give different results.
If you run the R code snippet below, you get FALSE. If you run the Python snippet you get True.
R code:
# copied from https://github.com/Mcompetitions/M4-…
SeasonalityTest <- function(input, ppy){
  #Used to determine whether a time series is seasonal
  tcrit <- 1.645
  if (length(input)<3*ppy){
    test_seasonal <- FALSE
  }else{
    xacf <- acf(input, plot = FALSE)$acf[-1, 1, 1]
    clim <- tcrit/sqrt(length(input)) * sqrt(cumsum(c(1, 2 * xacf^2)))
    test_seasonal <- ( abs(xacf[ppy]) > clim[ppy] )
    if (is.na(test_seasonal)==TRUE){ test_seasonal <- FALSE }
  }
  return(test_seasonal)
}
data <- c(2.62434536, -0.61175641, -0.52817175, -1.07296862, 1.86540763,
          -2.3015387 , 1.74481176, -0.7612069 , 1.3190391 , -0.24937038,
          1.46210794, -2.06014071, 0.6775828 , -0.38405435, 1.13376944,
          -1.09989127)
ppy <- 4
SeasonalityTest(data, ppy)
Python code:
import numpy as np
from math import sqrt

data = np.array([2.62434536, -0.61175641, -0.52817175, -1.07296862, 1.86540763,
                 -2.3015387 , 1.74481176, -0.7612069 , 1.3190391 , -0.24937038,
                 1.46210794, -2.06014071, 0.6775828 , -0.38405435, 1.13376944,
                 -1.09989127])

# copied from https://github.com/Mcompetitions/M4-methods/blob/master/ML_benchmarks.py
def seasonality_test(original_ts, ppy):
    """
    Seasonality test
    :param original_ts: time series
    :param ppy: periods per year
    :return: boolean value: whether the TS is seasonal
    """
    s = acf(original_ts, 1)
    for i in range(2, ppy):
        s = s + (acf(original_ts, i) ** 2)
    limit = 1.645 * (sqrt((1 + 2 * s) / len(original_ts)))
    return (abs(acf(original_ts, ppy))) > limit

def acf(data, k):
    """
    Autocorrelation function
    :param data: time series
    :param k: lag
    :return:
    """
    m = np.mean(data)
    s1 = 0
    for i in range(k, len(data)):
        s1 = s1 + ((data[i] - m) * (data[i - k] - m))
    s2 = 0
    for i in range(0, len(data)):
        s2 = s2 + ((data[i] - m) ** 2)
    return float(s1 / s2)

ppy = 4
seasonality_test(data, ppy)
The difference is that the Python code does not square the autocorrelation coefficient at the first lag. The first line of seasonality_test should be:
s = acf(original_ts, 1) ** 2
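A self-contained sketch of the test with the first lag squared, written in plain Python so it runs without NumPy. The square_first_lag flag is a hypothetical switch added here only to compare the two behaviours; like the quoted Python version, the sketch omits the minimum-length check that the R function performs.

```python
from math import sqrt


def acf(data, k):
    """Sample autocorrelation of `data` at lag `k`."""
    m = sum(data) / len(data)
    num = sum((data[i] - m) * (data[i - k] - m) for i in range(k, len(data)))
    den = sum((x - m) ** 2 for x in data)
    return num / den


def seasonality_test(ts, ppy, square_first_lag=True):
    """Compare the lag-`ppy` autocorrelation with a 90% confidence limit.

    square_first_lag=False reproduces the behaviour reported in this issue.
    """
    r1 = acf(ts, 1)
    s = r1 ** 2 if square_first_lag else r1
    for i in range(2, ppy):
        s += acf(ts, i) ** 2
    limit = 1.645 * sqrt((1 + 2 * s) / len(ts))
    return abs(acf(ts, ppy)) > limit


data = [2.62434536, -0.61175641, -0.52817175, -1.07296862, 1.86540763,
        -2.3015387, 1.74481176, -0.7612069, 1.3190391, -0.24937038,
        1.46210794, -2.06014071, 0.6775828, -0.38405435, 1.13376944,
        -1.09989127]

print(seasonality_test(data, 4))                          # False, as in R
print(seasonality_test(data, 4, square_first_lag=False))  # True, as reported
```

The lag-1 autocorrelation of this series is negative, so leaving it unsquared shrinks the limit and flips the result.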
In the file 'Benchmarks and Evaluation.R', function naive_seasonal, line 43, is "+ frcst - frcst" actually meaningful? It seems to do nothing.