fzj-iek3-vsa / tsam Goto Github PK

Time series aggregation module (tsam). Determines typical operation periods or dereases the temporal resolution. Accelerates model or experiment runs.

Home Page: https://tsam.readthedocs.io/

License: MIT License

Python 100.00%

clustering timeseries energy-system typical-periods optimization aggregation python time-series

tsam's People

Contributors

Stargazers

Watchers

tsam's Issues

Helmholtz Energy System 2.0 tsam page: wrong repo link

https://www.helmholtz.de/en/research/energy/energy-system-2050/heci-artikel/time-series-aggregation-module-tsam/ says:

Open-source available here.

But "here" links to http//. It should link to https://github.com/FZJ-IEK3-VSA/tsam.

I'm aware that the tsam team probably cannot change the Helmholtz website, but perhaps you can get in touch to get the link fixed. I couldn't find a "webmaster" e-mail address or anything like that, so I thought I'd at least report this issue here.

Integer multiple error

Dear Mr @l-kotzur,

Part of my masters thesis is to reduce the size of the input data of an energy system without loosing the peaks! I would like to use tsam to do it.
I do not have much reliable data so right now I am working with one month data with a minutely resolution which means that the length of the time series is 43200. It is already normalized!
My goal is to aggregate the data into 15 min using the peak integration.
I am trying to run the package functions but for some lack of knowledge and misunderstanding I simply can`t run it.

My question is: In the case that I want to aggregate the data into 15 min, What would be the number of typical periods and the hours per period?

I`m using the following which throws me an error of integer multiple. 2880 typical periods because there are 2880 15min in 30 days

aggregation = tsam.TimeSeriesAggregation(ts, 
						noTypicalPeriods = 2880, 
						hoursPerPeriod =  15, 
						clusterMethod = 'k_means')

ValueError: The combination of hoursPerPeriod and the resulution does 
not result in an integer number of time steps per period

Accuracy indicators: Which normalization type?

Hi there,

I have tried your new method to calculate indicators: indicators = aggregation.accuracyIndicators()

As both the RMSE and MAE are absolute error measures in their original form, I assume that the obtained low values are calculated based on the normalized time series and consequently relative i.e. the NRMSE and NMAE (see https://en.wikipedia.org/wiki/Root-mean-square_deviation for an example). From your code base, I see that you just import the calculation from scikit.learn.

This brings me to my question(s):

Am I right with my assumption?
And if yes, which normalization type did you choose for the time series?
Is is value(t)/max(values)-min(values) or value(t)/max(values)?
And the indicator itself is calculated for each column itself, right?

Thanks in advance for your help!

Cheers,
Cord

Problem with TerminationCondition check in method k-medoids

Hi tsam-developer-team,

I've noticed an issue in the k-medoids method. It looks like you create and solve a pyomo model in the method “_k_medoids_excact” and you want to check if the optimization was successful. However, also for successful processes, a ValueError is raised in my software set-up when using Gurobi (my versions: Gurobi 9.1, tsam 1.1.0, pyomo 5.7.1).

File "C:\...\lib\site-packages\tsam\utils\k_medoids_exact.py", line 236, in _k_medoids_exact raise ValueError(results['Solver'][0]['Termination message']) ValueError: Model was solved to optimality (subject to tolerances), and an optimal solution is available.

Probably pyomo changed the order of the TerminationConditions. Maybe it’s more stable to load the TerminationConditions directly from the module “pyomo.opt” and check for the required ones.

Best regards,
Stefan

Documentation (of clustering algorithms)

Hi there,

thanks for creating this package and I am already looking forward to the webinar (on Netzwerk Energysystemanalyse) tomorrow!

As we would like to use the package for a research project, I am very interested in the documentation. Do you already have a pre-print version of your paper or something that can be used?

Or did you implement the clustering (I am interested in hierachical and k-means) according to the following papers?

Then I could have a look at these and already cite with you DOI..

Thanks in advance!
Cord

Data Normalization

Hi again,

I have just looked at the example in which the input data has not been normalized: https://github.com/FZJ-IEK3-VSA/tsam/blob/master/example/aggregation_example.ipynb

If I have let's say 3 time series and (one normalized and two regular ones), is it mandatory to have all of them normalized in order to have the algorithm working correctly?

I am guessing, yes..

Best
Cord

Max iteration number reached while rescaling the cluster periods

Print total/relative deviation and specific time series name

Aggregated peak

Hi guys (@l-kotzur @maximilian-hoffmann),
I was wondering if there was a way to create an additional typical period (extremePeriodMethod) for the aggregated peak of a collection of profiles (concurrent peak) and not just the profile which contains the peak.

Great to see the activity on this package. It is super useful!

Best,

Sam

Add automatic documentation with Sphinx

Python allows for automatic extraction of docstrings, methods, classes, etc. and automatically creating more readable documentations in HTML or PDF format. For example, the Python documentation itself and the documentation of Pyomo are created this way.

The package for this is called Sphinx and can be automatically run by CI runners (e.g. Travis CI [free for open-source projects]). The documentation can also be hosted on ReadTheDocs.org for free and make the package even more accessible.

Problem with "predictOriginalData" and "weightDict"

Sorry, it’s me again today.
I have another and a rather important issue.
It seems like the predicted time series is not calculated correctly when a weighting value is applied for that series. Hence, also the accuracy indicators are wrong.
The problem is somehow related to the "inverse_transform" method of the MinMaxScaler. Probably that’s a good spot to start searching.
For validation, please consider the following minimal example. Everything is fine with a weighting value of 1. If selected differently – see yourself...

import pandas as pd
from tsam.timeseriesaggregation import TimeSeriesAggregation

data = pd.DataFrame()
idx = pd.date_range('2020-01-01 00:00:00', periods=3, freq='1H')
data['test'] = pd.Series(index=idx, data=[1, 2, 3])

tsa = TimeSeriesAggregation(data, noTypicalPeriods=3, hoursPerPeriod=1,
                            weightDict={'test': 0.1})

print(data)
print(tsa.createTypicalPeriods())
print(tsa.predictOriginalData())
print(tsa.accuracyIndicators())

Expected incompatibility with pandas 3.0 due to changes in stack function implemetation

Dear tsam developers,

when using a recent version of pandas, I receive the following deprecation warning when using tsam:

The previous implementation of stack is deprecated and will be removed in a future version of pandas. See the What's New notes for pandas 2.1.0 for details. Specify future_stack=True to adopt the new implementation and silence this warning.

The notes referenced in the deprecation warning state that from pandas 3.0 on, the previous implementation of the stack-function will not be available anymore, cf. https://pandas.pydata.org/docs/whatsnew/v2.1.0.html#new-implementation-of-dataframe-stack. According to their milestones, release of pandas 3.0 could be expected already around April 2024, cf. https://github.com/pandas-dev/pandas/milestone/102.

In tsam, the mentioned stack function is used in multiple positions in timeseriesaggregation.py and durationRepresentation.py. Since, e.g., the default sorting behavior is expected to change in the new implementation, I am unsure whether these changes in behavior could impact the outputs of tsam. Therefore, I wanted to ask whether you are already aware of this change and if it is already planned to adjust the use of pandas at these points in the code, accordingly.

Thank you for providing this very useful software!

Allow to choose solvers from SolverFactory

I have used tsam again for a small project and therefore installed the latest version from pypi.
Compared to my experiences with the initial version it took really long to cluster the data using all available methods. Then, I noticed that the standard solver has changed to glpk and before probably had been set to gurobi.

Unfortunately, I could not find an option to change this behaviour as the call for the medoids clustering uses the default value within the utils package/subfolder.

Wouldn't it make sense to provide this as an argument along with all other options such as cluster methods, etc.?

If you agree, I could also provide a quick pull request!

Cheers
Cord

Allow deterministic k-means/k-medoids

Allow to set random_state parameter of scikit's cluster methods to make TSA reproducible and deterministic (e.g., random_state=0)

Error when creating typical periods which are longer than 24h (e.g. 48h or 72h)

I am using version 2.0.1 of Pandas. Unfortunately, as of version 1.4.0, DataFrame.append() has been deprecated. So now I get an error message when I try to create typical periods longer than 24h, for example, typical periods that are 48h long.

Tsam erroneously raises error after successfully solving k_medoids clustering problem with Gurobi

tsam/tsam/utils/k_medoids_exact.py

Lines 186 to 187 in 25c5b39

 elif self.solver=='gurobi' and not results['Solver'][0]['Termination condition'].index in [2,7,8,9,10]: # optimal 

 raise ValueError(results['Solver'][0]['Termination message'])

The above code raises an error although Gurobi finds a valid solution for clustering method k_medoids. Hence, the error message should not be raised.

Possible cause for the bug:

API changes in Pyomo (my version: 6.0.1) or Gurobi (my version: 9.1.1). The index attribute of results['Solver'][0]['Termination condition'] seems not to be present in Pyomo 6.0.1 anymore. Could it be replaced by results['Solver'][0]['Return code']?

Hotfix:

If I comment out the above lines everything works just fine.

Error: "No module named 'tsa'" when using 'k_medoids'

Hi again,

I am testing the clustering methods against each other for a reduced unit commitment model by producing single results for each method:

# cluster data
methods = ['k_means', 'hierarchical', 'k_medoids']
for m in methods:
    aggregation = tsam.TimeSeriesAggregation(normalized, noTypicalPeriods=4,
	                                     hoursPerPeriod=24*7,
	                                     clusterMethod=m)
    typPeriods = aggregation.createTypicalPeriods()
    typPeriods.shape
    typPeriods.to_csv('clustered_' + str(m) + '.csv')
    predictedPeriods = aggregation.predictOriginalData()
    predictedPeriods.shape
    predictedPeriods.to_csv('clustered_predicted_' + str(m) + '.csv')

which produces an error:

Traceback (most recent call last):
  File "cluster.py", line 36, in <module>
    typPeriods = aggregation.createTypicalPeriods()
  File "/some/path/tsam/tsam/timeseriesaggregation.py", line 757, in createTypicalPeriods
    clusterMethod=self.clusterMethod)
  File "/some/path/tsam/tsam/timeseriesaggregation.py", line 113, in aggregatePeriods
    from tsa.utils.k_medoids_exact import KMedoids
ImportError: No module named 'tsa'

It works perfectly with:

methods = ['k_means', 'hierarchical']

so 'k_medoids' seems to produce the error due to a missing tsa module.

I thought tsa is included within statsmodels which is installed within my python environment..

Do you have any advice here?

Best
Cord

Documentation builds seem to be out of date

The documentation https://tsam.readthedocs.io seems to have been updated last for version 2.0.0.

Solver selection

Make solver selection available as keyword argument

New release?

Are there any plans for a new release?
And would you consider to release tsam also at pypi?

Since we use tsam now in our apps it would be nice to
a) get the recent changes within in a new released version
b) being able to simply install that version from pypi

Standartize nomenclature such as clusterOrder vs. periodOrder etc.

Numerical issues for hierarchical aggregation

If the "predefined_sequence_example.ipynb" is operated under different systems (e.g. Windows ), a different clustering (check clusterOrder) occurs. This behavior was tested for Python 3.6 and 3.7, as well as differekt scikit-learn versions.
It is expected that this is due to numerical precision of the candidates in different systems: If the candidate inputs (normalizedPeriodlyProfiles) get rounded to dec=14 before the aggregation, the results of the Windows system can be reproduced under Linux.
For testing check test_aggregate_hierarchical.py and round normalizedPeriodlyProfiles before the aggregation.

Mapping: Period<->Representative period

Hi again,

how can I obtain information about which periods have been selected e.g. weeks 2, 15, 35, and 46 are the chosen ones with the corresponding weighting factors (if any)?

Sorry for asking so many questions at once..

Best
Cord

incompatible with pandas 2.0

Since the append-method was removed from pandas 2.0, all commands using this function do not work with this pandas version.
e.g. l64 in timeseriesaggregation.py:

unstackedTimeSeries = unstackedTimeSeries.append(rep_data, ignore_index=False)

which I think should be rewritten to:

unstackedTimeSeries=pd.concat([unstackedTimeSeries, rep_data])

Input column order affects clustering output

Column order of original time series data affects resulting clustered time series data for a larger number of typical periods.

Update setup.py pip etc.

Picking more than 24 hours (4 type weeks)

I am doing some tests and everything seems to work well with less than 24 hours.

But if I change the call from let's say:

aggregation = tsam.TimeSeriesAggregation(normalized, noTypicalPeriods=7,
	                                 hoursPerPeriod=24,
	                                 clusterMethod='hierarchical')

to (trying to pick 4 type weeks per year):

aggregation = tsam.TimeSeriesAggregation(normalized, noTypicalPeriods=4,
	                                 hoursPerPeriod=24*7,
	                                 clusterMethod='hierarchical')

I get an error:

Traceback (most recent call last):
  File "cluster.py", line 34, in <module>
    typPeriods = aggregation.createTypicalPeriods()
  File "/home/cord/Programmierung/tsam/tsam/timeseriesaggregation.py", line 793, in createTypicalPeriods
    self.clusterPeriodNoOccur[self.clusterOrder[-1]] -= (1 - float(len(self.timeSeries) % self.timeStepsPerPeriod)/self.timeStepsPerPeriod)
AttributeError: 'TimeSeriesAggregation' object has no attribute 'clusterPeriodNoOccur'

I have observed that it pops up when I change the value of hoursPerPeriod to more than 24 hours.

Can anyone help?

Best
Cord

	elif self.solver=='gurobi' and not results['Solver'][0]['Termination condition'].index in [2,7,8,9,10]: # optimal
	raise ValueError(results['Solver'][0]['Termination message'])

fzj-iek3-vsa / tsam Goto Github PK

tsam's People

Contributors

Stargazers

Watchers

Forkers

tsam's Issues

Recommend Projects

Recommend Topics

Recommend Org