
eemeter's Introduction

EEmeter: tools for calculating metered energy savings


EEmeter — an open source toolkit for implementing and developing standard methods for calculating normalized metered energy consumption (NMEC) and avoided energy use.

Background - why use the EEMeter library

At time of writing (Sept 2018), the OpenEEmeter, as implemented in the eemeter package and sibling eeweather package, contains the most complete open source implementation of the CalTRACK Methods, which specify a family of ways to calculate and aggregate estimates of avoided energy use at a single meter and are particularly suitable for use in pay-for-performance (P4P) programs.

The eemeter package contains a toolkit written in the Python language which may help in implementing a CalTRACK-compliant analysis.

It contains a modular set of functions, parameters, and classes which can be configured to run the CalTRACK methods and close variants.

Note

Please keep in mind that use of the OpenEEmeter is neither necessary nor sufficient for compliance with the CalTRACK method specification. For example, while the CalTRACK methods set specific hard limits for the purpose of standardization and consistency, the EEmeter library can be configured to edit or entirely ignore those limits. This is because the eemeter package is used not only for compliance with, but also for development of, the CalTRACK methods.

Please also keep in mind that the EEmeter assumes that certain data cleaning tasks specified in the CalTRACK methods have occurred prior to usage with the eemeter. The package proactively exposes warnings to point out issues of this nature where possible.

Installation

EEmeter is a python package and can be installed with pip.

$ pip install eemeter

Features

  • Reference implementation of standard methods
    • CalTRACK Daily Method
    • CalTRACK Monthly Billing Method
    • CalTRACK Hourly Method
  • Flexible sources of temperature data. See EEweather.
  • Candidate model selection
  • Data sufficiency checking
  • Model serialization
  • First-class warnings reporting
  • Pandas dataframe support
  • Visualization tools

Roadmap for 2020 development

The OpenEEmeter project growth goals for the year fall into two categories:

  1. Community goals - we want to help our community thrive and continue to grow.
  2. Technical goals - we want to keep building the library in new ways that make it as easy as possible to use.

Community goals

  1. Develop project documentation and tutorials

A number of users have expressed how hard it is to get started when tutorials are out of date. We will dedicate time and energy this year to help create high quality tutorials that build upon the API documentation and existing tutorials.

  2. Make it easier to contribute

As our user base grows, the need and desire for users to contribute back to the library also grows, and we want to make this as seamless as possible. This means writing and maintaining contribution guides, and creating checklists to guide users through the process.

Technical goals

  1. Implement new CalTRACK recommendations

The CalTRACK process continues to improve the underlying methods used in the OpenEEmeter. Our primary technical goal is to keep up with these changes and continue to be a resource for testing and experimentation during the CalTRACK methods setting process.

  2. Hourly model visualizations

The hourly methods implemented in the OpenEEMeter library are not yet packaged with high quality visualizations like the daily and billing methods are. As we build and package new visualizations with the library, more users will be able to understand, deploy, and contribute to the hourly methods.

  3. Weather normal and unusual scenarios

The EEweather package, which supports the OpenEEmeter, comes packaged with publicly available weather normal scenarios. A useful addition would be to package methods for creating custom weather year scenarios.

  4. Greater weather coverage

The weather station coverage in the EEweather package includes full coverage of the US and Australia, but with some technical work it could be expanded to greater, or even worldwide, coverage.

License

This project is licensed under [Apache 2.0](LICENSE).


eemeter's Issues

Parse and graph temperatures in degC

Can you please add functionality to receive temperature in degC (not just degF) and to plot graphs using degC, too?

(incl. .plot_time_series etc.)
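Until such an option exists, one workaround is to convert Celsius readings to Fahrenheit before handing them to eemeter, and to convert results back for plotting. A minimal sketch follows; the helper names are hypothetical and are not part of eemeter.

```python
def celsius_to_fahrenheit(temp_c):
    """Convert a temperature from degrees Celsius to degrees Fahrenheit."""
    return temp_c * 9.0 / 5.0 + 32.0


def fahrenheit_to_celsius(temp_f):
    """Convert a temperature from degrees Fahrenheit to degrees Celsius."""
    return (temp_f - 32.0) * 5.0 / 9.0


# Hypothetical hourly readings in degC.
readings_c = [-5.0, 0.0, 12.5, 30.0]
readings_f = [celsius_to_fahrenheit(t) for t in readings_c]
print(readings_f)
```

Both helpers use plain arithmetic, so they should also work element-wise on a pandas Series of temperatures.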

Constant (average) counterfactual with eemeter daily matrix on certain datasets only.

Hi,

I'm using eemeter for a research project comparing reliability of metered savings from hourly and daily gas consumption data. I've been able to generate varying hourly counterfactuals for a set of publicly available data; however, I'm having trouble generating a varying counterfactual for daily consumption with some datasets. Instead of giving me a counterfactual that varies with temperature, I'm getting the average of the baseline meter period.

Have you ever come across an issue like this? Is this just a data issue, or is there an issue with the model?

Output from Dataset 1 (LCL-June2015v2_126)

Summary statistics from baseline period included for reference.

            value
count  385.000000
mean     1.341932
std      0.924915
min      0.000000
25%      0.708000
50%      1.299000
75%      1.908000
max      7.219000


                          reporting_observed  counterfactual_usage
2013-01-21 00:00:00+00:00               1.560              1.642287
2013-01-22 00:00:00+00:00               3.207              1.628237
2013-01-23 00:00:00+00:00               1.796              1.598183
2013-01-24 00:00:00+00:00               2.400              1.610889
2013-01-25 00:00:00+00:00               1.746              1.610034
2013-01-26 00:00:00+00:00               2.336              1.497270
2013-01-27 00:00:00+00:00               2.314              1.408208
2013-01-28 00:00:00+00:00               1.914              1.459275
2013-01-29 00:00:00+00:00               1.635              1.304730
2013-01-30 00:00:00+00:00               0.000              1.352743

Output from Dataset 2 (LCL-June2015v2_0)

Summary statistics from baseline period included for reference.

            value
count  385.000000
mean     5.912592
std      2.664848
min      0.000000
25%      5.144000
50%      6.102000
75%      6.922000
max     23.399000

                        reporting_observed  counterfactual_usage
2013-01-21 00:00:00+00:00               6.083              5.912592
2013-01-22 00:00:00+00:00               5.715              5.912592
2013-01-23 00:00:00+00:00               6.080              5.912592
2013-01-24 00:00:00+00:00               6.491              5.912592
2013-01-25 00:00:00+00:00               4.954              5.912592
2013-01-26 00:00:00+00:00               8.271              5.912592
2013-01-27 00:00:00+00:00               6.022              5.912592
2013-01-28 00:00:00+00:00               5.305              5.912592
2013-01-29 00:00:00+00:00               4.802              5.912592
2013-01-30 00:00:00+00:00               0.000              5.912592
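
In the Dataset 2 output every counterfactual value equals the baseline mean (5.912592), which is what a model with no temperature terms would predict. The sketch below uses only the standard library and made-up numbers to show that a least-squares fit with just an intercept reduces to the mean. Whether eemeter actually selected such a candidate model for this meter is an assumption that would need to be checked against the model results.

```python
from statistics import mean

# Made-up daily usage values for a baseline period (not the LCL data).
baseline_usage = [6.1, 5.7, 6.0, 6.5, 5.0, 8.3, 6.0]


def fit_intercept_only(values):
    """Least-squares fit of y = b: the b minimizing squared error is the mean."""
    return mean(values)


def sum_squared_error(values, b):
    """Total squared error of predicting the constant b for every value."""
    return sum((v - b) ** 2 for v in values)


intercept = fit_intercept_only(baseline_usage)

# The prediction is the same for every day, whatever the temperature.
counterfactual = [intercept for _ in range(3)]
print(counterfactual)

# Nudging the intercept in either direction only increases the error.
best_error = sum_squared_error(baseline_usage, intercept)
print(best_error < sum_squared_error(baseline_usage, intercept + 0.1))
print(best_error < sum_squared_error(baseline_usage, intercept - 0.1))
```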

Extracting Confidence Interval of fitted regression models in "CalTRACK Hourly method" with a “one_month” setting

I am using the OpenEE open source code to measure the energy efficiency of an intervention, and my client is asking for the confidence interval (CI) of the fitted regression model.

I am running the eemeter with hourly meter and temperature data sets using the "CalTRACK Hourly method" with a "one_month" setting (one regression model for each month, or 12 models in total).

Can you please show me how to extract the confidence interval for each monthly model?

This is the core code I am using to run the "CalTRACK Hourly method" with a "one_month" setting:

# Get meter data suitable for fitting a baseline model
baseline_end_date_hr = min(meter_data.index) + pd.Timedelta(days=365)
baseline_meter_data_hr, warnings = eemeter.get_baseline_data(
    meter_data, end=baseline_end_date_hr, max_days=365
    )

# Create a design matrix for occupancy and segmentation
preliminary_design_matrix = (
    eemeter.create_caltrack_hourly_preliminary_design_matrix(
        baseline_meter_data_hr, temperature_data,
        )
    )

# Build 12 monthly models - each step from now on operates on each segment
segmentation = eemeter.segment_time_series(
    preliminary_design_matrix.index,
    'one_month',
    )

# Assign an occupancy status to each hour of the week (0-167)
occupancy_lookup = eemeter.estimate_hour_of_week_occupancy(
    preliminary_design_matrix,
    segmentation=segmentation,
    )

# Assign temperatures to bins
temperature_bins = eemeter.fit_temperature_bins(
    preliminary_design_matrix,
    segmentation=segmentation,
    )

# Build a design matrix for each monthly segment
segmented_design_matrices = (
    eemeter.create_caltrack_hourly_segmented_design_matrices(
        preliminary_design_matrix,
        segmentation,
        occupancy_lookup,
        temperature_bins,
        )
    )

# BEGIN NEW CODE for fitting baseline model - example of using SegmentedModel
# directly with modified segment type. CalTRACKHourlyModel is a very thin wrapper
# around SegmentedModel, which is why this works
segment_models = [
    eemeter.fit_caltrack_hourly_model_segment(segment_name, segment_data)
    for segment_name, segment_data in segmented_design_matrices.items()
    ]

# Fit a CalTRACK hourly model
baseline_model_hr = eemeter.SegmentedModel(
    prediction_segment_type="one_month",
    prediction_segment_name_mapping=None,
    segment_models=segment_models,
    prediction_feature_processor=eemeter.caltrack_hourly_prediction_feature_processor,
    prediction_feature_processor_kwargs={
        "occupancy_lookup": occupancy_lookup,
        "temperature_bins": temperature_bins,
        },
    )

# END NEW CODE

# Get a year of reporting period data
reporting_meter_data_hr, warnings_hr = eemeter.get_reporting_data(
    meter_data, start=baseline_end_date_hr, max_days=(455)
    )
warnings_hr

# Compute metered savings for the year of the reporting period we've selected
metered_savings_hr, error_bands_hr = eemeter.metered_savings(
    baseline_model_hr, reporting_meter_data_hr,
    temperature_data, confidence_level=0.90, with_disaggregated=True
    )
error_bands_hr
metered_savings_hr.metered_savings.plot()

Flake 8 failures

The command pytest --flake8 currently fails because of a bunch of bare except clauses (E722). We should either ignore this type of error where it's not as relevant or fix them in cases where the error flags a real code smell.
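
As an illustration of the two options, the sketch below contrasts a bare except, silenced on that line with a `# noqa: E722` comment, with a handler that names the exceptions it expects. The function names are hypothetical and not taken from the eemeter code base.

```python
def parse_float_bare(text):
    """Bare except (flagged as E722): also swallows KeyboardInterrupt and SystemExit."""
    try:
        return float(text)
    except:  # noqa: E722 - kept only to show the pattern being flagged
        return None


def parse_float(text):
    """Catch only the errors float() is expected to raise for bad input."""
    try:
        return float(text)
    except (TypeError, ValueError):
        return None


print(parse_float("1.5"), parse_float("n/a"), parse_float(None))
```

Naming the exceptions keeps unrelated failures visible, which is usually the point of fixing E722 rather than ignoring it.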

Segmentation with holidays

Energy usage in buildings on holidays typically differs from usage on weekends or in other weekday-hour brackets. The segmentation allows us to easily define a new map, like the following example, and segregate holiday data from the rest. This improves regression accuracy through more precise occupancy bins. One challenge, however, is that the holiday segment may contain too few data points to prevent overfitting, given the number of independent variables (such as 168 weekday-hours, temperature bins, etc.). I recommend updating the segment_weights... and segment_time_series functions of segmentation.py to include holidays.

"three_month_weighted": {
    "jan": "dec-jan-feb-weighted",
    "feb": "jan-feb-mar-weighted",
    "mar": "feb-mar-apr-weighted",
    "apr": "mar-apr-may-weighted",
    "may": "apr-may-jun-weighted",
    "jun": "may-jun-jul-weighted",
    "jul": "jun-jul-aug-weighted",
    "aug": "jul-aug-sep-weighted",
    "sep": "aug-sep-oct-weighted",
    "oct": "sep-oct-nov-weighted",
    "nov": "oct-nov-dec-weighted",
    "dec": "nov-dec-jan-weighted",
    "holiday": "holiday",
},
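
To make the data-volume concern concrete, the sketch below assigns each day of a year either to a "holiday" segment or to its month and counts the days per segment. It uses only the standard library; the holiday calendar and the `segment_key` helper are hypothetical and do not reflect how segmentation.py is implemented.

```python
from collections import Counter
from datetime import date, timedelta

# Hypothetical holiday calendar; a real one would come from the program's rules.
HOLIDAYS = {date(2019, 1, 1), date(2019, 7, 4), date(2019, 12, 25)}

MONTH_NAMES = ["jan", "feb", "mar", "apr", "may", "jun",
               "jul", "aug", "sep", "oct", "nov", "dec"]


def segment_key(day, holidays=HOLIDAYS):
    """Return "holiday" for holidays, otherwise the lowercase month abbreviation."""
    if day in holidays:
        return "holiday"
    return MONTH_NAMES[day.month - 1]


print(segment_key(date(2019, 7, 4)))  # holiday
print(segment_key(date(2019, 7, 5)))  # jul

# Count how many days of 2019 land in each segment.
days = [date(2019, 1, 1) + timedelta(days=i) for i in range(365)]
counts = Counter(segment_key(d) for d in days)
print(counts["holiday"], counts["jul"])
```

With only a handful of holiday days per year, an hourly model for that segment would have far fewer observations than the monthly segments, which is the overfitting risk described above.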

Change datetime type for samples read_meter_data_from_csv

When loading the default sample data, .tz_localize('UTC') requires a different datetime type than is being passed. The parser is unable to standardize dates and is unable to convert 'start' to a datetime index.

I fixed this bug by explicitly converting the datetime column.

SOLUTION:
import numpy as np
import pandas as pd


def meter_data_from_csv(
    filepath_or_buffer,
    tz=None,
    start_col="start",
    value_col="value",
    gzipped=False,
    freq=None,
    **kwargs
):
    """ Load meter data from a CSV file.

    Default format::

        start,value
        2017-01-01T00:00:00+00:00,0.31
        2017-01-02T00:00:00+00:00,0.4
        2017-01-03T00:00:00+00:00,0.58

    Parameters
    ----------
    filepath_or_buffer : :any:`str` or file-handle
        File path or object.
    tz : :any:`str`, optional
        E.g., ``'UTC'`` or ``'US/Pacific'``
    start_col : :any:`str`, optional, default ``'start'``
        Date period start column.
    value_col : :any:`str`, optional, default ``'value'``
        Value column, can be in any unit.
    gzipped : :any:`bool`, optional
        Whether file is gzipped.
    freq : :any:`str`, optional
        If given, apply frequency to data using :any:`pandas.DataFrame.resample`.
    **kwargs
        Extra keyword arguments to pass to :any:`pandas.read_csv`, such as
        ``sep='|'``.
    """

    read_csv_kwargs = {
        "usecols": [start_col, value_col],
        "dtype": {value_col: np.float64},
        "parse_dates": [start_col],
        "index_col": start_col,
    }

    if gzipped:
        read_csv_kwargs.update({"compression": "gzip"})

    # allow passing extra kwargs
    read_csv_kwargs.update(kwargs)

    df = pd.read_csv(filepath_or_buffer, **read_csv_kwargs)
    # The fix: convert the index explicitly instead of calling .tz_localize("UTC")
    df.index = pd.to_datetime(df.index, utc=True)
    if tz is not None:
        df = df.tz_convert(tz)

    if freq == "hourly":
        df = df.resample("H").sum()
    elif freq == "daily":
        df = df.resample("D").sum()

    return df

ERROR:

TypeError Traceback (most recent call last)
in
1 #Daily Billing for Caltrack
2 meter_data, temperature_data, sample_metadata = (
----> 3 eemeter.load_sample("il-electricity-cdd-hdd-daily")
4 )
5

~/anaconda3/envs/eenv/lib/python3.7/site-packages/eemeter/samples/load.py in load_sample(sample)
80 meter_data_filename = metadata["meter_data_filename"]
81 with resource_stream("eemeter.samples", meter_data_filename) as f:
---> 82 meter_data = meter_data_from_csv(f, gzipped=True, freq=freq)
83
84 temperature_filename = metadata["temperature_filename"]

~/anaconda3/envs/eenv/lib/python3.7/site-packages/eemeter/io.py in meter_data_from_csv(filepath_or_buffer, tz, start_col, value_col, gzipped, freq, **kwargs)
81 read_csv_kwargs.update(kwargs)
82
---> 83 df = pd.read_csv(filepath_or_buffer, **read_csv_kwargs).tz_localize("UTC")
84 if tz is not None:
85 df = df.tz_convert(tz)

~/anaconda3/envs/eenv/lib/python3.7/site-packages/pandas/core/generic.py in tz_localize(self, tz, axis, level, copy, ambiguous, nonexistent)
9865 if level not in (None, 0, ax.name):
9866 raise ValueError("The level {0} is not valid".format(level))
-> 9867 ax = _tz_localize(ax, tz, ambiguous, nonexistent)
9868
9869 result = self._constructor(self._data, copy=copy)

~/anaconda3/envs/eenv/lib/python3.7/site-packages/pandas/core/generic.py in _tz_localize(ax, tz, ambiguous, nonexistent)
9848 ax_name = self._get_axis_name(axis)
9849 raise TypeError(
-> 9850 "%s is not a valid DatetimeIndex or " "PeriodIndex" % ax_name
9851 )
9852 else:

TypeError: index is not a valid DatetimeIndex or PeriodIndex
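
The failure in the traceback above and the explicit conversion proposed as the fix can be reproduced in isolation with pandas alone. This is a minimal sketch on a two-row, made-up CSV. It deliberately reads the file without date parsing so that the index holds plain strings, which is an assumption about what happens when the parser cannot standardize the dates.

```python
import io

import pandas as pd

csv_text = (
    "start,value\n"
    "2017-01-01T00:00:00+00:00,0.31\n"
    "2017-01-02T00:00:00+00:00,0.4\n"
)

# Read without date parsing so the index is plain strings, mimicking the
# case where the parser does not produce a DatetimeIndex.
df = pd.read_csv(io.StringIO(csv_text), index_col="start")

try:
    df.tz_localize("UTC")
    localize_failed = False
except TypeError as exc:
    localize_failed = True
    print(exc)

# Explicit conversion yields a timezone-aware DatetimeIndex in UTC.
df.index = pd.to_datetime(df.index, utc=True)
print(type(df.index).__name__)
```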

loading sample data

>>> meter_data, temperature_data, metadata = \
...     eemeter.load_sample('il-electricity-cdd-hdd-daily')
Traceback (most recent call last):
File "", line 2, in
File "c:\users\ben\src\eemeter\eemeter\samples\load.py", line 82, in load_sample
meter_data = meter_data_from_csv(f, gzipped=True, freq=freq)
File "c:\users\ben\src\eemeter\eemeter\io.py", line 83, in meter_data_from_csv
df = pd.read_csv(filepath_or_buffer, **read_csv_kwargs).tz_localize("UTC")
File "C:\Users\Ben\Anaconda3\envs\OpenEE\lib\site-packages\pandas\core\generic.py", line 9407, in tz_localize
ax = _tz_localize(ax, tz, ambiguous, nonexistent)
File "C:\Users\Ben\Anaconda3\envs\OpenEE\lib\site-packages\pandas\core\generic.py", line 9387, in _tz_localize
'PeriodIndex' % ax_name)
TypeError: index is not a valid DatetimeIndex or PeriodIndex

Error produced when using metered_savings() on hourly data.

Hi, I am trying to run an example in the Tutorial (http://eemeter.openee.io/tutorial.html) for the hourly data (Quickstart for CalTRACK Hourly). Please advise. Thank you!

  1. The error occurs when running the following code:

    metered_savings_dataframe, error_bands = eemeter.metered_savings(
        baseline_model, reporting_meter_data,
        temperature_data, with_disaggregated=True
    )

Traceback (most recent call last):

File "", line 3, in temperature_data, with_disaggregated=True
File "C:\Users\xxx\anaconda3\lib\site-packages\eemeter\derivatives.py", line 227, in metered_savings prediction_index, temperature_data, **predict_kwargs
File "C:\Users\xxx\anaconda3\lib\site-packages\eemeter\caltrack\hourly.py", line 191, in predict
return self.model.predict(prediction_index, temperature_data, **kwargs)
File "C:\Users\xxx\anaconda3\lib\site-packages\eemeter\segmentation.py", line 221, in predict
prediction = segment_model.predict(segmented_data) * segmented_data.weight
File "C:\Users\xxx\anaconda3\lib\site-packages\eemeter\segmentation.py", line 98, in predict
columns={0: "predicted_usage"}
TypeError: rename() got an unexpected keyword argument 'columns'

  2. Version strings of eemeter, pandas, and their dependencies:
eemeter version : 2.9.2
pandas version : 1.0.1
python : 3.7.6

ETL using Singer

Just a suggestion.

It might be worth looking at the https://www.singer.io/ project. It is sponsored by Stitch Data and offers a nice open source, generalised ETL approach that could be adopted, or even recommended, when integrating eemeter at organisations.

Problem executing tutorial hourly example

Hi,

trying to reproduce the hourly example found here, I get the following error:

Traceback (most recent call last):
  File "test_hourly.py", line 64, in <module>
    metered_savings_dataframe, error_bands = eemeter.metered_savings(
  File "/home/stefano/evogy/caltrack/venv/lib/python3.8/site-packages/eemeter/derivatives.py", line 226, in metered_savings
    model_prediction = baseline_model.predict(
  File "/home/stefano/evogy/caltrack/venv/lib/python3.8/site-packages/eemeter/caltrack/hourly.py", line 191, in predict
    return self.model.predict(prediction_index, temperature_data, **kwargs)
  File "/home/stefano/evogy/caltrack/venv/lib/python3.8/site-packages/eemeter/segmentation.py", line 221, in predict
    prediction = segment_model.predict(segmented_data) * segmented_data.weight
  File "/home/stefano/evogy/caltrack/venv/lib/python3.8/site-packages/eemeter/segmentation.py", line 97, in predict
    prediction = design_matrix_granular.dot(parameters).rename(
TypeError: rename() got an unexpected keyword argument 'columns'

The problem seems to be that rename is called with the keyword argument "columns" on a pandas Series instead of a pandas DataFrame.
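
This diagnosis can be checked with pandas alone. The sketch below uses a toy design matrix and toy parameters (both made up) to show that the dot product of a DataFrame with a Series is a Series, that passing `columns` to its rename raises the same TypeError on the pandas versions reported in these issues and later, and two ways of naming the result that do work. It is not the patch applied to eemeter.

```python
import pandas as pd

# Toy design matrix (2 rows, 2 features) and toy fitted parameters.
design_matrix = pd.DataFrame({"a": [1.0, 2.0], "b": [3.0, 4.0]})
parameters = pd.Series({"a": 0.5, "b": 2.0})

# DataFrame.dot(Series) returns a Series, not a DataFrame.
product = design_matrix.dot(parameters)
print(type(product).__name__)

try:
    product.rename(columns={0: "predicted_usage"})
    rename_failed = False
except TypeError as exc:
    rename_failed = True
    print(exc)

# Two ways to name the result that work on a Series.
named_series = product.rename("predicted_usage")
named_frame = product.to_frame("predicted_usage")
print(list(named_frame.columns))
```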

Report installed package versions

pandas==1.1.3 
eemeter==2.10.0

Minimal example

import eemeter

meter_data, temperature_data, sample_metadata = (
    eemeter.load_sample("il-electricity-cdd-hdd-hourly")
)

# the dates of an analysis "blackout" period during which a project was performed.
blackout_start_date = sample_metadata["blackout_start_date"]
blackout_end_date = sample_metadata["blackout_end_date"]

# get meter data suitable for fitting a baseline model
baseline_meter_data, warnings = eemeter.get_baseline_data(
    meter_data, end=blackout_start_date, max_days=365
)

# create a design matrix for occupancy and segmentation
preliminary_design_matrix = (
    eemeter.create_caltrack_hourly_preliminary_design_matrix(
        baseline_meter_data, temperature_data,
    )
)

# build 12 monthly models - each step from now on operates on each segment
segmentation = eemeter.segment_time_series(
    preliminary_design_matrix.index,
    'three_month_weighted'
)

# assign an occupancy status to each hour of the week (0-167)
occupancy_lookup = eemeter.estimate_hour_of_week_occupancy(
    preliminary_design_matrix,
    segmentation=segmentation,
)

# assign temperatures to bins
temperature_bins = eemeter.fit_temperature_bins(
    preliminary_design_matrix,
    segmentation=segmentation,
)

# build a design matrix for each monthly segment
segmented_design_matrices = (
    eemeter.create_caltrack_hourly_segmented_design_matrices(
        preliminary_design_matrix,
        segmentation,
        occupancy_lookup,
        temperature_bins,
    )
)

# build a CalTRACK hourly model
baseline_model = eemeter.fit_caltrack_hourly_model(
    segmented_design_matrices,
    occupancy_lookup,
    temperature_bins,
)

# get a year of reporting period data
reporting_meter_data, warnings = eemeter.get_reporting_data(
    meter_data, start=blackout_end_date, max_days=365
)

# compute metered savings for the year of the reporting period we've selected
metered_savings_dataframe, error_bands = eemeter.metered_savings(
    baseline_model, reporting_meter_data,
    temperature_data, with_disaggregated=True
)

Thank you!

Versions of eemeter > 1.5.1 on PyPI?

Hello,

It appears that none of the 2.x series have been uploaded to PyPI. Is that an oversight or by design? I noticed that several releases for 2.5.x were released today.

Thanks!

pip does not install site-packages/eemeter/sample_data

eemeter sample fails with
IOError: [Errno 2] No such file or directory: '/usr/lib64/python2.7/site-packages/eemeter/sample_data/projects.csv'
Installing the sample_data directory by hand enabled the command to run to completion

get_baseline_data does not partition data (using daily data set).

Report installed package versions

eemeter==2.7.2
pandas==0.23.4
scipy==1.3.0
numpy==1.16.4

Describe the bug
The get_baseline_data function with option max_days = 365 returns the input dataframe, not a version subsetted to 365 days.

  1. Include a short, self-contained Python snippet reproducing the problem. You can
    format the code nicely by using GitHub Flavored Markdown:

    In [1]: import eemeter

    In [2]: import pandas as pd

    In [3]: meter_data, temperature_data, metadata = \
       ...:     eemeter.load_sample('il-electricity-cdd-hdd-daily')

    In [5]: data = eemeter.create_caltrack_daily_design_matrix(meter_data, temperature_data)

    In [6]: baseline_data, warnings = eemeter.get_baseline_data(data, max_days=365)

    In [7]: baseline_data.equals(data)
    Out[7]: True

    In [8]: eemeter.get_version()
    Out[8]: '2.7.2'

    In [9]: pd.show_versions()

INSTALLED VERSIONS

commit: None
python: 3.7.3.final.0
python-bits: 64
OS: Windows
OS-release: 10
machine: AMD64
processor: Intel64 Family 6 Model 61 Stepping 4, GenuineIntel
byteorder: little
LC_ALL: None
LANG: None
LOCALE: None.None

pandas: 0.23.4
pytest: None
pip: 19.1.1
setuptools: 41.0.1
Cython: None
numpy: 1.16.4
scipy: 1.3.0
pyarrow: None
xarray: None
IPython: 7.6.1
sphinx: None
patsy: 0.5.1
dateutil: 2.8.0
pytz: 2019.1
blosc: None
bottleneck: None
tables: None
numexpr: None
feather: None
matplotlib: 3.1.1
openpyxl: None
xlrd: None
xlwt: None
xlsxwriter: None
lxml: None
bs4: None
html5lib: None
sqlalchemy: 1.3.6
pymysql: None
psycopg2: None
jinja2: None
s3fs: None
fastparquet: None
pandas_gbq: None
pandas_datareader: None

    In [10]: import scipy

    In [12]: scipy.__version__
    Out[12]: '1.3.0'

    In [13]: import numpy

    In [15]: numpy.__version__
    Out[15]: '1.16.4'

    In [16]: len(baseline_data)
    Out[16]: 810

    In [17]: len(data)
    Out[17]: 810

Expected behavior

Expected a dataframe covering only the first 365 days of data (365 rows).
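
The get_baseline_data docstring quoted in a later issue on this page states that max_days is ignored if end is not set, which may explain the behavior here, since the call above passes no end. As a workaround sketch, the first 365 days can be selected with plain pandas. The toy data and variable names below are hypothetical.

```python
import pandas as pd

# Toy daily data: 810 rows, matching the length reported above.
index = pd.date_range("2015-11-22", periods=810, freq="D", tz="UTC")
data = pd.DataFrame({"meter_value": range(810)}, index=index)

# Keep rows whose timestamp falls within 365 days of the first timestamp.
cutoff = data.index.min() + pd.Timedelta(days=365)
first_year = data[data.index < cutoff]
print(len(first_year))
```

Going by that docstring, passing an explicit end (for example the cutoff computed above) together with max_days=365 would be the other thing to try.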


LF Energy data architecture presentation request

Hi openEEmeter community,

In the LF Energy Data Architecture Working Group, we would like more insight into the current LF Energy projects and their data architecture. The goal of the data architecture is to improve interoperability between the LF Energy projects.

We would like insight into the following topics. Could you give a 30-minute presentation on these topics during one of the office hours?

  • Project focus and introduction
  • Data input
  • Data output
  • Semantics used (e.g., which information standards are used?)

Please select a date and I will send an invite.
https://wiki.lfenergy.org/display/HOME/Data+Architecture+Working+Group

Data architecture working document:
https://docs.google.com/document/d/1QcHqPRSmUUJQlJnfygGDkOpDPlId6U1V22pBuvZvDYk/edit#heading=h.g0v5yhj0kiyj

Regards, Sander

Problems loading sample data in 2.5.4

Installing using pip install eemeter and trying to run the sample data using eemeter.samples() gives me the following error:

FileNotFoundError: [Errno 2] No such file or directory: '/Users/jeff/anaconda3/envs/openee/lib/python3.7/site-packages/eemeter/samples/metadata.json'

The contents of /Users/jeff/anaconda3/envs/openee/lib/python3.7/site-packages/eemeter/samples/ has:

__init__.py
__pycache__
load.py

No sample data or metadata.json.

However, based on issue #330 I installed in a new environment using:

pip install -e git+git://github.com/openeemeter/[email protected]#egg=eemeter

eemeter.samples() worked at this point.

I'm using conda to manage environments, but I suspect that isn't the problem. I would guess those assets are not being distributed in the release, but it works when cloning. Either way, here is my setup from conda info:

     active environment : openee
    active env location : /Users/jeff/anaconda3/envs/openee
            shell level : 2
       user config file : /Users/jeff/.condarc
 populated config files : /Users/jeff/.condarc
          conda version : 4.6.14
    conda-build version : 3.17.8
         python version : 3.7.3.final.0
       base environment : /Users/jeff/anaconda3  (writable)
           channel URLs : https://repo.anaconda.com/pkgs/main/osx-64
                          https://repo.anaconda.com/pkgs/main/noarch
                          https://repo.anaconda.com/pkgs/free/osx-64
                          https://repo.anaconda.com/pkgs/free/noarch
                          https://repo.anaconda.com/pkgs/r/osx-64
                          https://repo.anaconda.com/pkgs/r/noarch
          package cache : /Users/jeff/anaconda3/pkgs
                          /Users/jeff/.conda/pkgs
       envs directories : /Users/jeff/anaconda3/envs
                          /Users/jeff/.conda/envs
               platform : osx-64
             user-agent : conda/4.6.14 requests/2.21.0 CPython/3.7.3 Darwin/18.5.0 OSX/10.14.4
                UID:GID : 501:20
             netrc file : /Users/jeff/.netrc
           offline mode : False

So I'm not blocked, but I thought you should know. Thanks.

eemeter 2.x.x version to test tutorial

Hey there,

I cloned the repo locally to test the tutorial Jupyter notebook. However, when I installed the eemeter library, I got version 1.5.1, which does not work with the tutorial code. Can you tell me how I can install version 2.x.x? Thanks a lot!

Best,
Doris

Documentation is misleading about dates & datetimes

I'm deviating from the issue reporting template here because I'm reporting a bug in the documentation.

It's a confusing aspect of eemeter's documentation that in some places it refers to dates and datetimes interchangeably. For example, the documentation for get_baseline_data():

def get_baseline_data(...):
    """
    ...
    start : :any:`datetime.datetime`
        A timezone-aware datetime that represents the earliest allowable start
        date for the baseline data. The stricter of this or `max_days` is used
        to determine the earliest allowable baseline period date.
    end : :any:`datetime.datetime`
        A timezone-aware datetime that represents the latest allowable end
        date for the baseline data, i.e., the latest date for which data is
        available before the intervention begins.
    max_days : :any:`int`, default 365
        The maximum length of the period. Ignored if `end` is not set.
        The stricter of this or `start` is used to determine the earliest
        allowable baseline period date.
    ...
    """

The language here talks about both datetimes and dates, but they're different things and you'd expect different handling:

  • If using dates, you assume the smallest unit of processing is the date and then the fact you're passing a 'timezone-aware datetime' seems weird and leads you to question what happens if you pass a datetime in the middle of a day (is the time ignored?).
  • On the other hand, if you assume these values are handled as timestamps, then the talk of dates makes it hard to understand what will happen without reading the code (max_days gets added to start, so if start is 12:00 one day, does the end 'date' end up being at midday as well?)

It turns out that really everything is using timestamps/datetimes and the talk of dates is a bit misleading. It would be a lot clearer if the language of timestamps was used throughout:

def get_baseline_data(...):
    """
    ...
    start : :any:`datetime.datetime`
        A timezone-aware datetime that represents the earliest allowable moment
        for the baseline data. The stricter of this or `max_days` is used
        to determine the earliest allowable baseline period timestamp.
    end : :any:`datetime.datetime`
        A timezone-aware datetime that represents the latest allowable end
        moment for the baseline data, i.e., the latest moment for which data is
        available before the intervention begins.
    max_days : :any:`int`, default 365
        The maximum length of the period. Ignored if `end` is not set.
        The stricter of this or `start` is used to determine the earliest
        allowable baseline period timestamp.
    ...
    """

It would also be worth mentioning that start and max_days are mutually exclusive, as that's not clear in the current description for this function :)
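
The difference between the two readings can be shown with the standard library. The sketch below assumes, as described above, that values are handled as timestamps. It illustrates the arithmetic only and is not a copy of eemeter's internals.

```python
from datetime import datetime, timedelta, timezone

max_days = 365

# An end timestamp that falls in the middle of a day.
end = datetime(2018, 6, 15, 12, 0, tzinfo=timezone.utc)

# Timestamp arithmetic keeps the time of day: the earliest allowable
# moment is also at 12:00, not at midnight.
earliest = end - timedelta(days=max_days)
print(earliest.isoformat())

# Date-style handling would instead truncate to midnight first,
# shifting the boundary by half a day in this example.
earliest_midnight = earliest.replace(hour=0, minute=0, second=0, microsecond=0)
print(earliest - earliest_midnight)
```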

Caltrack usage per day predict

I am trying to use eemeter.caltrack_usage_per_day_predict with my caltrack daily model to predict usage over a provided prediction index. I get an error:
'DatetimeIndex' object has no attribute 'index'

The error seems to be associated with the compute_temperature_features step. When I isolate the compute_temperature_features step with the same values, it runs without errors.

Here is my code with details of each argument for eemeter.caltrack_usage_per_day_predict

eemeter.caltrack_usage_per_day_predict(
    baseline_model_results_daily.model.model_type, 
    baseline_model_results_daily.model.model_params,
    normal_year_temperatures_F,
    prediction_index_hourly
    )

baseline_model_results_daily.model.model_type=

'cdd_hdd'

baseline_model_results_daily.model.model_params =

{'intercept': 15.348747394608491,
'beta_cdd': 0.9809488470884145,
'beta_hdd': 0.37002743650584663,
'cooling_balance_point': 55,
'heating_balance_point': 55}

normal_year_temperatures_F (a Series; I get the same error message if I convert it to a DataFrame) =

2020-10-10 15:00:00+00:00 53.96
2020-10-10 16:00:00+00:00 55.94
2020-10-10 17:00:00+00:00 57.92
2020-10-10 18:00:00+00:00 60.08
2020-10-10 19:00:00+00:00 60.98
...
2021-10-11 10:00:00+00:00 42.98
2021-10-11 11:00:00+00:00 44.06
2021-10-11 12:00:00+00:00 44.60
2021-10-11 13:00:00+00:00 44.96
2021-10-11 14:00:00+00:00 44.96
Freq: H, Length: 8784, dtype: float64

prediction_index_hourly=

DatetimeIndex(['2020-10-10 15:00:00+00:00', '2020-10-10 16:00:00+00:00',
'2020-10-10 17:00:00+00:00', '2020-10-10 18:00:00+00:00',
'2020-10-10 19:00:00+00:00', '2020-10-10 20:00:00+00:00',
'2020-10-10 21:00:00+00:00', '2020-10-10 22:00:00+00:00',
'2020-10-10 23:00:00+00:00', '2020-10-11 00:00:00+00:00',
...
'2021-10-11 05:00:00+00:00', '2021-10-11 06:00:00+00:00',
'2021-10-11 07:00:00+00:00', '2021-10-11 08:00:00+00:00',
'2021-10-11 09:00:00+00:00', '2021-10-11 10:00:00+00:00',
'2021-10-11 11:00:00+00:00', '2021-10-11 12:00:00+00:00',
'2021-10-11 13:00:00+00:00', '2021-10-11 14:00:00+00:00'],
dtype='datetime64[ns, UTC]', length=8784, freq='H')

Here is my code for eemeter.compute_temperature_features

eemeter.compute_temperature_features(
    prediction_index_daily, 
    normal_year_temperatures_F, 
    heating_balance_points=None, 
    cooling_balance_points=None, 
    data_quality=False, 
    temperature_mean=True, 
    degree_day_method='daily', 
    percent_hourly_coverage_per_day=0.5, 
    percent_hourly_coverage_per_billing_period=0.9, 
    use_mean_daily_values=True, 
    tolerance=None, 
    keep_partial_nan_rows=True
    )

I am only posting this because this call ran successfully, even though I believe this step is what caused the error in caltrack_usage_per_day_predict.
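One possible explanation, offered as a guess rather than a diagnosis: the message 'DatetimeIndex' object has no attribute 'index' is what Python raises when code that expects a Series or DataFrame receives a bare DatetimeIndex instead. That would happen if caltrack_usage_per_day_predict expects the prediction index before the temperature data, the reverse of the order in the call above. That ordering is an assumption here and should be checked against the signature of the installed version. The sketch below reproduces only the generic pandas behaviour with a hypothetical helper, `first_timestamp`, and does not call eemeter.

```python
import pandas as pd

# Small stand-ins for the hourly index and temperature series above.
index = pd.date_range("2020-10-10 15:00", periods=4, freq="h", tz="UTC")
temperatures = pd.Series([53.96, 55.94, 57.92, 60.08], index=index)


def first_timestamp(temperature_data):
    """Hypothetical helper standing in for any code that expects a Series."""
    return temperature_data.index[0]


# Passing the Series works as expected.
print(first_timestamp(temperatures))

# Passing the DatetimeIndex where the Series belongs raises AttributeError.
try:
    first_timestamp(index)
except AttributeError as exc:
    message = str(exc)
    print(message)
```

Passing the arguments by keyword, where the function allows it, is one way to rule out an ordering mistake.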

Docker build fails

Re testing: running both docker build . and docker-compose build returns an error relating to line 11 in Dockerfile: executor failed running [/bin/sh -c set -ex && pipenv install --system --deploy --dev]: exit code: 1. Has this issue been noticed before? Is there a step-by-step testing tutorial that could be shared specific to eemeter?

This occurs for me on Windows 11 when running the eemeter tests on an unchanged copy of eemeter cloned directly from this repo. Any help is much appreciated.

"OutOfBoundsDatetime: Out of bounds nanosecond timestamp" error.

An error is produced when using eemeter with the most recent version of pandas.

>>> meter_data_daily, temperature_data_daily, metadata_daily = eemeter.load_sample('il-electricity-cdd-hdd-daily')
>>> meter_data_billing, temperature_data_billing, metadata_billing = eemeter.load_sample('il-electricity-cdd-hdd-billing_monthly')
>>> baseline_end_date = metadata_billing['blackout_start_date']
>>> baseline_meter_data_daily, baseline_warnings_daily = eemeter.get_baseline_data(meter_data_daily, end=baseline_end_date, max_days=365)

After calling eemeter.get_baseline_data(), the error is: OutOfBoundsDatetime: Out of bounds nanosecond timestamp: 1677-09-21 00:12:43.

Package versions

Python==3.9.1
eemeter==3.1.0
pandas==1.3.2

Reverting pandas version back to 1.2.1 has fixed the issue.
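For context, the timestamp in the error message sits at the earliest instant that a signed 64-bit count of nanoseconds since the Unix epoch can represent, which is the representation behind the pandas datetime64[ns] type. The sketch below derives both bounds with the standard library only. It does not show where the out-of-range value arises inside eemeter or pandas.

```python
from datetime import datetime, timedelta

EPOCH = datetime(1970, 1, 1)
INT64_MIN = -(2**63)
INT64_MAX = 2**63 - 1


def ns_to_datetime(nanoseconds):
    """Convert nanoseconds since the epoch to a naive datetime.

    datetime has microsecond resolution, so the value is floored
    to whole microseconds first.
    """
    return EPOCH + timedelta(microseconds=nanoseconds // 1000)


lower_bound = ns_to_datetime(INT64_MIN)
upper_bound = ns_to_datetime(INT64_MAX)

print(lower_bound)
print(upper_bound)
```

A value at exactly this lower bound suggests that some date computation underflowed the representable range, which fits the report that the behaviour changed between pandas versions.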

pip install eemeter failing to install. Is statsmodels the culprit or just a symptom?

An issue that Devan from Kilowatt Analytics ran into trying to pip install eemeter

image

I’m guessing this issue is due to the eemeter install not completing without errors. Having just repeated the process, I realize that ‘pip install eemeter’ is failing due to an issue with lxml. I had to manually apt-get install lxml, as I was previously warned that it was not present when trying to install eemeter. Maybe there’s an lxml version dependency that is causing an issue. Regardless, I cannot seem to install eemeter following the instructions on either GitHub or RTD.

Documentation is out of date

Describe the bug

The documentation at https://eemeter.openee.io/index.html doesn't reflect the latest version of the package, 4.0.

For example, functions that appear in the tutorial and the API reference are not available in the package:

import eemeter
eemeter.create_caltrack_billing_design_matrix
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'eemeter' has no attribute 'create_caltrack_billing_design_matrix'

Expected behavior

The documentation should reflect the current state of the software.

Additional context

None

Advice on how to use eemeter

Hello,

I'm interested in applying the eemeter library on data for building winter energy usage, but I'm having some difficulties in applying the model for daily data, while I'm getting consistent results in the monthly and weekly cases.

Data

Data are for a building in Italy for years 2019, 2020 and 2021 (data.zip). The meter data report the number of instants in which the heating machine was on. Below you can find a plot of the data.

image

Weekly and monthly models

Aggregating the data to obtain monthly and weekly frequencies, the resulting models make sense.
This is the code that I'm using to generate the model.

import datetime

import pytz
import pandas as pd
import matplotlib.pyplot as plt

import eemeter


# Load data
meter_data_path = "./meter_data.csv"
temp_data_path = "./temperature_data.csv"

meter_data = pd.read_csv(meter_data_path, index_col=0)
meter_data.index = pd.to_datetime(meter_data.index)

temp_data = pd.read_csv(temp_data_path, index_col=0)

temp_data.index = pd.to_datetime(temp_data.index)
temp_data = temp_data.resample("1H").mean().interpolate(method="linear").value

temp_data = temp_data.loc[temp_data.index >= datetime.datetime(2019, 1, 1, tzinfo=pytz.utc)]


# Define parameters: "W" for weekly model, "M" for monthly model
time_freq = "W"
use_billing_presets = True
weights_col = "num_days"

# Aggregate meter data
meter_data_agg = meter_data.value.dropna().resample(time_freq).agg(["sum", "size"]) 
meter_data_agg["num_days"] = meter_data_agg["size"] / 24

meter_data_agg = meter_data_agg.rename(columns={"sum": "value"}) 

# Create caltrack billing design matrix and extract baseline data
data = eemeter.create_caltrack_billing_design_matrix(meter_data_agg, temp_data)
    
baseline_data = eemeter.get_baseline_data(
    data,
    start=datetime.datetime(2019, 1, 1, tzinfo=pytz.utc),
    end=datetime.datetime(2019, 12, 31, tzinfo=pytz.utc),
    max_days=None
)

# Add weights column to baseline data
baseline_df = baseline_data[0]
baseline_df[weights_col] = meter_data_agg[weights_col]

# Fit Caltrack model
model_results = eemeter.fit_caltrack_usage_per_day_model(
    baseline_data[0],
    use_billing_presets=use_billing_presets,
    weights_col=weights_col
)

# Plot resulting model
fig, ax = plt.subplots(2, 1, figsize=(12, 8))

ax[0].set_title("Reference period")
eemeter.plot_energy_signature(
    meter_data_agg.loc[meter_data_agg.index <= datetime.datetime(2020, 1, 1, tzinfo=pytz.utc)],
    temp_data, ax=ax[0])
model_results.plot(ax=ax[0], with_candidates=False)

ax[1].set_title("Whole dataset")
eemeter.plot_energy_signature(meter_data_agg, temp_data, ax=ax[1])
model_results.plot(ax=ax[1], with_candidates=False)

fig.subplots_adjust(hspace=0.5)

plt.show()

The above code generates the following two figures (setting time_freq to "W" and "M" respectively).

Weekly

image

Monthly

image

Daily data

The daily data show a strong dependence on the day of the week with a very different pattern between weekdays and weekends (see image below).

image

Consequently, when I fit the CalTRACK daily model, I obtain a model that underestimates the weekday values and overestimates the weekend values.

image

image

My idea was to include a day-of-week categorical variable in the regression model features (overriding the methods get_single_*_only_candidate_model). Do you have any advice on how to improve the daily model?
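As a rough illustration of the day-of-week idea, the sketch below one-hot encodes the weekday of each date using only the standard library. The helper `day_of_week_dummies` and the `dow_*` column names are hypothetical and not part of eemeter. Monday is dropped as the reference level so that the dummies are not collinear with a regression intercept.

```python
from datetime import date, timedelta

DAY_NAMES = ["mon", "tue", "wed", "thu", "fri", "sat", "sun"]


def day_of_week_dummies(day, drop_first=True):
    """Return one-hot weekday indicators for a date (hypothetical helper).

    With drop_first=True, Monday is the reference level and gets no column.
    """
    names = DAY_NAMES[1:] if drop_first else DAY_NAMES
    today = DAY_NAMES[day.weekday()]
    return {f"dow_{name}": int(name == today) for name in names}


start = date(2019, 1, 7)  # a Monday
features = [day_of_week_dummies(start + timedelta(days=i)) for i in range(7)]

for offset, row in enumerate(features):
    print(start + timedelta(days=offset), row)
```

These indicator columns could be joined to the daily design matrix before fitting a custom regression. A simpler variant of the same idea is a single weekend indicator, or separate models for weekdays and weekends.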

Thank you!

Bug in eemeter when doing CalTRACK Hourly method with 'one_month' adjustment

I am trying to run the eemeter with my hourly meter and temperature data sets using "CalTRACK Hourly method".

When I do it with 'three_month_weighted' setting, it works well and I can calculate metered_savings after fitting a CalTRACK hourly model.

But, when I change that setting to 'one_month', I can still fit a CalTRACK hourly model, but when I want to calculate the metered_savings, it gives me the following error:

>>> KeyError: 'dec-jan-feb-weighted'

It seems that the source of error is here:

>>>  File "eemeter-2.8.5\eemeter\caltrack\hourly.py", line 159, in predict return self.model.predict(prediction_index, temperature_data, **kwargs) 

I also get the same error when the setting is 'single' or 'three_month'.

Do you know how we can fix this error?

Thanks in advance,
Ali

can't load sample data

I'm trying to load sample data as described in http://eemeter.openee.io/basics.html#loading-sample-data. I tried with both 1.5.1 (pip install) and 2.2.6 (pip install git+git://github.com/openeemeter/[email protected]), both with Python 3.6.0.

running eemeter.samples() with 1.5.1 fails with module 'eemeter' has no attribute 'samples'

(eemeter) ~/work/platform $ pip install eemeter
Installing collected packages: eemeter
Successfully installed eemeter-1.5.1
(eemeter) ~/work/platform $ python3
Python 3.6.0 (default, Sep 12 2017, 20:42:47)
[GCC 4.2.1 Compatible Apple LLVM 8.1.0 (clang-802.0.42)] on darwin
>>> import eemeter
>>> eemeter.samples()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
AttributeError: module 'eemeter' has no attribute 'samples'
>>>

running eemeter.samples() with 2.2.6 fails with FileNotFoundError: [Errno 2] No such file or directory: '/Users/kimberly/.pyenv/versions/eemeter/lib/python3.6/site-packages/eemeter/samples/metadata.json'

(eemeter) ~/work/platform $ pip install git+git://github.com/openeemeter/[email protected]
Successfully installed eemeter-2.2.6
(eemeter) ~/work/platform $ python3
Python 3.6.0 (default, Sep 12 2017, 20:42:47)
[GCC 4.2.1 Compatible Apple LLVM 8.1.0 (clang-802.0.42)] on darwin
>>> import eemeter
>>> eemeter.samples()
Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/kimberly/.pyenv/versions/eemeter/lib/python3.6/site-packages/eemeter/samples/load.py", line 45, in samples
    sample_metadata = _load_sample_metadata()
  File "/Users/kimberly/.pyenv/versions/eemeter/lib/python3.6/site-packages/eemeter/samples/load.py", line 32, in _load_sample_metadata
    with resource_stream("eemeter.samples", "metadata.json") as f:
  File "/Users/kimberly/.pyenv/versions/eemeter/lib/python3.6/site-packages/pkg_resources/__init__.py", line 1208, in resource_stream
    self, resource_name
  File "/Users/kimberly/.pyenv/versions/eemeter/lib/python3.6/site-packages/pkg_resources/__init__.py", line 1573, in get_resource_stream
    return open(self._fn(self.module_path, resource_name), 'rb')
FileNotFoundError: [Errno 2] No such file or directory: '/Users/kimberly/.pyenv/versions/eemeter/lib/python3.6/site-packages/eemeter/samples/metadata.json'

I expected it to load sample data as described on http://eemeter.openee.io/basics.html#loading-sample-data

I'm using MacOS High Sierra 10.13.6 and python 3.6.0

Daily Model Overpredictions - eemeter version 4.0

In this specific zone, the heating system has reached its maximum capacity, resulting in a plateau in energy usage. However, the regression used for prediction has not been split appropriately. It’s important to note that this issue is not about the heating balance point; rather, it pertains to properly handling the regression when HVAC is operating at its maximum capacity.

image


caltrack hourly method problem

I plan to try the EEmeter CalTRACK hourly method, but one line of the code doesn't work. I used the sample data and followed the tutorial exactly, but this line raises an error. I wonder if we need a specific version of pandas to run the code? I'm really confused about it; would you please have a look at this problem?

occupancy_lookup_hourly = eemeter.estimate_hour_of_week_occupancy(
    preliminary_design_matrix_hourly,
    segmentation=segmentation_hourly,
    # threshold=0.65  # default
)

Add more versatile tools for controlling logging from the CLI

From @hangtwenty: Add versatile tools for controlling logging from the CLI. Allows changing the whole logging config, but more typically you would let the default config get used... and optionally --log-console to turn on console log output, and/or --log-level=DEBUG to increase the verbosity. For the console logger and the DEBUG level logger, the default logging config's log format has a "debug trace" flavor to it, showing modules, function names, and line numbers. (Reusing my favorite base config from other projects.) See #127
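The two flags described above could be wired up roughly as follows. This is a minimal sketch with `argparse` and the standard `logging` module, not the actual eemeter CLI implementation; the logger name "example", the default level, and the format string are assumptions chosen for illustration.

```python
import argparse
import logging

# "Debug trace" style format: logger name, function name, and line number.
DEBUG_FORMAT = (
    "%(asctime)s %(levelname)s %(name)s.%(funcName)s:%(lineno)d %(message)s"
)


def build_parser():
    parser = argparse.ArgumentParser(description="hypothetical CLI")
    parser.add_argument("--log-console", action="store_true",
                        help="also write log output to the console")
    parser.add_argument("--log-level", default="WARNING",
                        choices=["DEBUG", "INFO", "WARNING", "ERROR"],
                        help="verbosity of the log output")
    return parser


def configure_logging(args, logger_name="example"):
    logger = logging.getLogger(logger_name)
    logger.setLevel(getattr(logging, args.log_level))
    if args.log_console:
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter(DEBUG_FORMAT))
        logger.addHandler(handler)
    return logger


# Parse an explicit argument list so the sketch runs without a real command line.
args = build_parser().parse_args(["--log-console", "--log-level=DEBUG"])
logger = configure_logging(args)
logger.debug("debug output is now visible on the console")
```

Without either flag, the logger in this sketch stays at WARNING and gets no console handler, which matches the idea of letting a default configuration apply unless the user opts in.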

load_sample() error

From the basic usage docs:
http://eemeter.openee.io/basics.html#loading-sample-data

meter_data, temperature_data, metadata = eemeter.load_sample('il-electricity-cdd-hdd-daily')

I get this:

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
<ipython-input-17-a24e0ab28a69> in <module>
----> 1 meter_data, temperature_data, metadata = eemeter.load_sample('il-gas-intercept-only-hourly')

~/git/learning/openee3/src/eemeter/eemeter/samples/load.py in load_sample(sample)
     80     meter_data_filename = metadata["meter_data_filename"]
     81     with resource_stream("eemeter.samples", meter_data_filename) as f:
---> 82         meter_data = meter_data_from_csv(f, gzipped=True, freq=freq)
     83 
     84     temperature_filename = metadata["temperature_filename"]

~/git/learning/openee3/src/eemeter/eemeter/io.py in meter_data_from_csv(filepath_or_buffer, tz, start_col, value_col, gzipped, freq, **kwargs)
     81     read_csv_kwargs.update(kwargs)
     82 
---> 83     df = pd.read_csv(filepath_or_buffer, **read_csv_kwargs).tz_localize("UTC")
     84     if tz is not None:
     85         df = df.tz_convert(tz)

~/anaconda3/envs/openee2/lib/python3.7/site-packages/pandas/core/generic.py in tz_localize(self, tz, axis, level, copy, ambiguous, nonexistent)
   9405             if level not in (None, 0, ax.name):
   9406                 raise ValueError("The level {0} is not valid".format(level))
-> 9407             ax = _tz_localize(ax, tz, ambiguous, nonexistent)
   9408 
   9409         result = self._constructor(self._data, copy=copy)

~/anaconda3/envs/openee2/lib/python3.7/site-packages/pandas/core/generic.py in _tz_localize(ax, tz, ambiguous, nonexistent)
   9385                     ax_name = self._get_axis_name(axis)
   9386                     raise TypeError('%s is not a valid DatetimeIndex or '
-> 9387                                     'PeriodIndex' % ax_name)
   9388                 else:
   9389                     ax = DatetimeIndex([], tz=tz)

TypeError: index is not a valid DatetimeIndex or PeriodIndex

I installed eemeter with pip install -e git+git://github.com/openeemeter/[email protected]#egg=eemeter as in #352, but I don't know if that is causing it.

eemeter is loaded and available:

> eemeter.samples()
['il-electricity-cdd-hdd-billing_bimonthly',
 'il-electricity-cdd-hdd-billing_monthly',
 'il-electricity-cdd-hdd-daily',
 'il-electricity-cdd-hdd-hourly',
 'il-electricity-cdd-only-billing_bimonthly',
 'il-electricity-cdd-only-billing_monthly',
 'il-electricity-cdd-only-daily',
 'il-electricity-cdd-only-hourly',
 'il-gas-hdd-only-billing_bimonthly',
 'il-gas-hdd-only-billing_monthly',
 'il-gas-hdd-only-daily',
 'il-gas-hdd-only-hourly',
 'il-gas-intercept-only-billing_bimonthly',
 'il-gas-intercept-only-billing_monthly',
 'il-gas-intercept-only-daily',
 'il-gas-intercept-only-hourly']

and see(eemeter.load_sample) shows this:

isfunction    isroutine     ()            <             <=            ==
    !=            >             >=            dir()         hash()
    help()        repr()        str()

Thanks for looking into it.

Best way to contribute?

Hello,

I was wondering what the best way to contribute to this project might be. Most of my experience is with front-end stuff with React/Angular (although I am by no means a master of it) and I was wondering if I could perhaps contribute towards a basic GUI/charting system for the project.
