modern-time-series-forecasting-with-python's Introduction

Modern Time Series Forecasting with Python

This is the code repository for Modern Time Series Forecasting with Python, published by Packt.

Explore industry-ready time series forecasting using modern machine learning and deep learning

What is this book about?

We live in a serendipitous era where the explosion in the volume of data collected and a renewed interest in data-driven techniques such as machine learning (ML) have changed the landscape of analytics, and with it, time series forecasting. This book, filled with industry-tested tips and tricks, takes you beyond commonly used classical statistical methods such as ARIMA and introduces you to the latest techniques from the world of ML.

This book covers the following exciting features:

  • Find out how to manipulate and visualize time series data like a pro
  • Set strong baselines with popular models such as ARIMA
  • Discover how time series forecasting can be cast as regression
  • Engineer features for machine learning models for forecasting
  • Explore the exciting world of ensembling and stacking models
  • Get to grips with the global forecasting paradigm
  • Understand and apply state-of-the-art DL models such as N-BEATS and Autoformer
  • Explore multi-step forecasting and cross-validation strategies

If you feel this book is for you, get your copy today!

https://www.packtpub.com/

Instructions and Navigations

All of the code is organized into folders.

The code will look like the following:

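from statsmodels.tsa.seasonal import seasonal_decompose

# seasonal_decompose does not support missing values, so use the imputed ts instead
# period=7*48: a weekly cycle for half-hourly data (48 readings per day)
res = seasonal_decompose(ts, period=7*48, model="additive",
                         extrapolate_trend="freq")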

Following is what you need for this book: The book is for data scientists, data analysts, machine learning engineers, and Python developers who want to build industry-ready time series models. Since the book explains most concepts from the ground up, basic proficiency in Python is all you need. Prior understanding of machine learning or forecasting will help speed up your learning. For experienced machine learning and forecasting practitioners, this book has a lot to offer in terms of advanced techniques and traversing the latest research frontiers in time series forecasting.

Set up the environment

The easiest way to set up the environment is by using Anaconda, a distribution of Python for scientific computing. You can also use Miniconda, a minimal installer for conda, if you do not want the pre-installed packages that come with Anaconda.

  1. Install Anaconda/Miniconda: Anaconda can be installed from https://www.anaconda.com/products/distribution. Choose the installer for your operating system and follow the instructions. Alternatively, you can install Miniconda from https://docs.conda.io/en/latest/miniconda.html#latest-miniconda-installer-links.
  2. Open conda prompt: To open Anaconda Prompt (or terminal on Linux or macOS):
    1. Windows: Open the Anaconda Prompt (Start >> Anaconda Prompt)
    2. macOS: Open Launchpad and then open Terminal. Type conda activate
    3. Linux: Open Terminal. Type conda activate
  3. Navigate to the downloaded code: Use your operating system's commands to navigate to the folder where you downloaded or cloned the code from this repository. For instance, use cd followed by the folder path; this works on Windows, macOS, and Linux.
  4. Install the environment: Run conda env create -f anaconda_env.yml to install the environment from the included anaconda_env.yml file. This creates a new environment named modern_ts and installs all the required libraries in it. This can take a while.
  5. Check the installation: Verify that all the libraries required for the book are installed properly by running a script in the downloaded code folder: python test_installation.py
  6. Activate the environment and run the notebooks: Every time you want to run the notebooks, first activate the environment with conda activate modern_ts, and then start Jupyter Notebook (jupyter notebook) or Jupyter Lab (jupyter lab), according to your preference.
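Step 5 relies on the repository's test_installation.py. If you only want a quick, independent check that key packages are present, a minimal sketch follows. The helper name `missing_packages` and the package list are illustrative assumptions, not the script's contents. The check uses importlib.util.find_spec, which locates a top-level module without importing it.

```python
import importlib.util


def missing_packages(module_names):
    """Return the module names that cannot be found in the active environment.

    importlib.util.find_spec locates a top-level module without importing it,
    so heavy libraries are not loaded just to confirm that they exist.
    """
    return [name for name in module_names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    # Illustrative list only; the authoritative list is anaconda_env.yml.
    required = ["numpy", "pandas", "statsmodels", "torch", "darts"]
    missing = missing_packages(required)
    if missing:
        print("Missing packages:", ", ".join(missing))
    else:
        print("All listed packages were found.")
```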

If anaconda installation stalls

Sometimes the conda installation can stall at "Solving environment". This happens because conda can be very slow at resolving package dependencies. We can get around this by using Mamba.

Mamba is a fast, robust, and cross-platform package manager.

It runs on Windows, OS X and Linux (ARM64 and PPC64LE included) and is fully compatible with conda packages and supports most of conda’s commands.

All we need to do is:

  1. Install mamba - conda install mamba -n base -c conda-forge
  2. Instead of using conda, use mamba to install the environment - mamba env create -f anaconda_env.yml

Special Instructions for macOS

If the installation doesn't work on macOS, please try the following:

  1. In anaconda_env.yml, change the line python-kaleido==0.1.0 to python-kaleido>=0.1.0
  2. In anaconda_env.yml, change the line statsforecast==0.6.0 to statsforecast>=0.6.0

Now, try installing the environment again. If this doesn't work, please raise an issue on the GitHub repo.
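The two edits above can also be scripted. This is a minimal sketch that assumes each dependency appears in anaconda_env.yml as a `name==version` entry. `relax_pins` is a hypothetical helper, and editing the file by hand works just as well.

```python
import re


def relax_pins(yml_text, packages):
    """Replace exact pins (name==version) with minimum pins (name>=version).

    Only the listed packages change; all other lines are left untouched.
    """
    for pkg in packages:
        # The lookbehind stops "kaleido" from also matching "python-kaleido".
        pattern = re.compile(rf"(?<![\w.-]){re.escape(pkg)}==([\w.]+)")
        yml_text = pattern.sub(lambda m, pkg=pkg: f"{pkg}>={m.group(1)}", yml_text)
    return yml_text


# Usage, run from the folder that contains anaconda_env.yml:
# from pathlib import Path
# env_file = Path("anaconda_env.yml")
# env_file.write_text(relax_pins(env_file.read_text(), ["python-kaleido", "statsforecast"]))
```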

Download the Data

You are going to use a single dataset throughout the book: the London Smart Meters Dataset from Kaggle. If you don’t have a Kaggle account, please create one (https://www.kaggle.com/account/login?phase=startRegisterTab). There are two ways to download the data: automated and manual. For the automated way, we need to download an API key from Kaggle. Let’s do that first (if you are going to use the manual way, you can skip this).

  1. Click on your profile picture on the top right corner of Kaggle
  2. Select “Account” and find the “API” section.
  3. Click the “Create New API Token” button. A file by the name kaggle.json will be downloaded.
  4. Copy the file and place it in the api_keys folder in the downloaded code folder.

Now that kaggle.json is downloaded and placed in the right folder, let’s look at the two methods to download the data:

Method 1: Automated Download

  1. Activate the environment using conda activate modern_ts
  2. Run the provided script from the root directory of the downloaded code: python scripts/download_data.py

That’s it. Now just wait for the script to finish downloading, unzipping, and organizing the files in the expected format.
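How scripts/download_data.py reads the key is not described here. As a hedged alternative for other tools, note that the official Kaggle API client can take credentials from the KAGGLE_USERNAME and KAGGLE_KEY environment variables, and kaggle.json stores them under "username" and "key". The hypothetical helper below copies them from api_keys/kaggle.json into the environment.

```python
import json
import os
from pathlib import Path


def load_kaggle_credentials(key_path=Path("api_keys") / "kaggle.json"):
    """Read kaggle.json and export its values as environment variables.

    kaggle.json holds a JSON object with "username" and "key" fields.
    Returns the username so callers can confirm which account is in use.
    """
    key_path = Path(key_path)
    if not key_path.exists():
        raise FileNotFoundError(
            f"{key_path} not found; create an API token on Kaggle first."
        )
    creds = json.loads(key_path.read_text())
    os.environ["KAGGLE_USERNAME"] = creds["username"]
    os.environ["KAGGLE_KEY"] = creds["key"]
    return creds["username"]
```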

Method 2: Manual Download

  1. Go to https://www.kaggle.com/jeanmidev/smart-meters-in-london and download the dataset
  2. Unzip the contents to data/london_smart_meters
  3. Unzip hhblock_dataset to get the raw files we want to work with.
  4. Make sure the unzipped files are in the expected folder structure, shown below.

Now that you have downloaded the data, make sure it is arranged in the folder structure below. The automated download does this for you, but for the manual download you need to create this structure yourself. To avoid ambiguity, the expected folder structure is:
data
└── london_smart_meters
    ├── hhblock_dataset
    │   └── hhblock_dataset
    │       ├── block_0.csv
    │       ├── block_1.csv
    │       ├── ...
    │       └── block_109.csv
    ├── acorn_details.csv
    ├── informations_households.csv
    ├── uk_bank_holidays.csv
    ├── weather_daily_darksky.csv
    └── weather_hourly_darksky.csv

Additional files may appear as part of the extraction process; you can remove them without affecting anything. A helper script checks this structure: python test_data_download.py
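A rough sketch of the kind of check test_data_download.py performs, written against the folder tree above. It is not the repository's script: `check_data_layout` is a hypothetical helper, and the count of 110 files assumes the full block_0.csv to block_109.csv range is present.

```python
from pathlib import Path

# Top-level files expected in data/london_smart_meters, per the folder tree.
EXPECTED_FILES = [
    "acorn_details.csv",
    "informations_households.csv",
    "uk_bank_holidays.csv",
    "weather_daily_darksky.csv",
    "weather_hourly_darksky.csv",
]


def check_data_layout(root="data"):
    """Return a list of human-readable problems; an empty list means OK."""
    base = Path(root) / "london_smart_meters"
    problems = [f"missing {base / name}" for name in EXPECTED_FILES
                if not (base / name).is_file()]
    # The block files live in a nested hhblock_dataset/hhblock_dataset folder.
    block_dir = base / "hhblock_dataset" / "hhblock_dataset"
    n_blocks = len(list(block_dir.glob("block_*.csv"))) if block_dir.is_dir() else 0
    if n_blocks != 110:  # block_0.csv ... block_109.csv
        problems.append(f"expected 110 block files in {block_dir}, found {n_blocks}")
    return problems
```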

Blocks vs RAM

The number of blocks to select from the dataset depends on how much RAM your machine has. The following are not rules, but rough guidelines on how many blocks to choose based on your RAM. If you still face problems, try lowering the number of blocks.

  • 1 or <1 Block for 4GB RAM
  • 1 or 2 Blocks for 8GB RAM
  • 3 Blocks for 16GB RAM
  • 5 Blocks for 32GB RAM
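The guideline list can be encoded directly. In this sketch `suggested_num_blocks` returns the upper end of each range (2 blocks for 8GB), and `block_paths` simply takes the first n blocks in numeric order. Both names are hypothetical, and the notebooks may select blocks differently.

```python
from pathlib import Path

# (minimum RAM in GB, suggested number of blocks), from the guideline list.
RAM_GUIDELINES = [(32, 5), (16, 3), (8, 2), (0, 1)]


def suggested_num_blocks(ram_gb):
    """Rough block count for the given RAM; lower it if you run out of memory."""
    for min_ram, n_blocks in RAM_GUIDELINES:
        if ram_gb >= min_ram:
            return n_blocks


def block_paths(block_dir, n_blocks):
    """Paths of block_0.csv ... block_{n-1}.csv in numeric order."""
    return [Path(block_dir) / f"block_{i}.csv" for i in range(n_blocks)]
```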

We also provide a PDF file that has color images of the screenshots/diagrams used in this book. Click here to download it.

Get to Know the Author

Manu Joseph is a self-made data scientist with more than a decade of experience working with many Fortune 500 companies, enabling digital and AI transformations, specifically in machine learning-based demand forecasting. He is considered an expert, thought leader, and strong voice in the world of time series forecasting. Currently, Manu leads applied research at Thoucentric, where he advances research by bringing cutting-edge AI technologies to the industry. He is also an active open source contributor and has developed an open source library, PyTorch Tabular, which makes deep learning for tabular data easy and accessible. Originally from Thiruvananthapuram, India, Manu currently resides in Bengaluru, India, with his wife and son.

Download a free PDF

If you have already purchased a print or Kindle version of this book, you can get a DRM-free PDF version at no cost.
Simply click on the link to claim your free PDF.

https://packt.link/free-ebook/9781803246802

Errata

On page 3 of the book, in Chapter 1, the text should read Welcome to "Modern Time Series Forecasting with Python" instead of Welcome to "Advanced Time Series Analysis Using Python".

modern-time-series-forecasting-with-python's People

Contributors

manujosephv, packt-itservice, roshank10, utkarsha-packt

modern-time-series-forecasting-with-python's Issues

Reference to downloaded code in preface before download location is mentioned

Hi,
new to Anaconda et al so maybe it's obvious to other users. However, wanted to note it:

You write:

Install the environment: Using the anaconda_env.yml file that is included, install the
environment:
conda env create -f anaconda_env.yml
This creates a new environment under the name modern_ts and will install all the required
libraries in the environment. This can take a while.

However, the "downloaded code" location (this very github repo here) is only mentioned two subtitles down the road.

Thanks for the book!
Andreas

Setting up environment: 404 pytorch

When setting up the environment with mamba, it fails to download at pytorch.

RuntimeError: Multi-download failed. Reason: Transfer finalized, status: 404 [https://conda.anaconda.org/pytorch/pytorch-2.0.1-py3.9_cpu_0.tar.bz2] 955 bytes

Chapter 04 - Kaboudan Metric

I noticed an error in the book regarding the explanation of the Kaboudan metric. The current explanation states that if a time series is predictable, the Kaboudan metric approaches zero, and if it is not predictable, the metric approaches one. However, this is incorrect.

According to the correct interpretation of the Kaboudan metric, if a time series contains predictable patterns, the metric should approach one, indicating higher predictability. Conversely, if the time series lacks predictability, the metric should approach zero.

Thank you for your attention to this matter.
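The corrected direction follows from how the metric is usually written: eta = 1 - SSE_B / SSE_S, where SSE_B is a model's forecast error on the series as observed and SSE_S is its error after the series is shuffled. Shuffling destroys temporal structure, so a predictable series has SSE_B much smaller than SSE_S (eta near 1), while for an unpredictable series the two are similar (eta near 0). The sketch below assumes a one-step naive forecast as the model and a single full shuffle; the book's implementation may use a different model and shuffling scheme.

```python
import numpy as np


def naive_sse(y):
    """Sum of squared one-step naive forecast errors (forecast = previous value)."""
    y = np.asarray(y, dtype=float)
    return float(np.sum((y[1:] - y[:-1]) ** 2))


def kaboudan_eta(y, seed=0):
    """Kaboudan metric eta = 1 - SSE_B / SSE_S, with a naive forecaster as the model.

    SSE_B: error on the series as observed.
    SSE_S: error after randomly shuffling the series, which removes any
    temporal structure the model could exploit.
    """
    rng = np.random.default_rng(seed)
    sse_before = naive_sse(y)
    sse_shuffled = naive_sse(rng.permutation(np.asarray(y, dtype=float)))
    return 1.0 - sse_before / sse_shuffled


t = np.arange(5000)
smooth = np.sin(2 * np.pi * t / 50)                 # strong, repeating pattern
noise = np.random.default_rng(1).normal(size=5000)  # nothing to exploit
print(round(kaboudan_eta(smooth), 2))  # close to 1
print(round(kaboudan_eta(noise), 2))   # close to 0
```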

KeyError: "None of [Index(['Year', 'Revenue(mil. USD)'], dtype='object')] are in the [columns]"

Posting an issue I received through email:

in Chapter 3, notebook "02-Decomposing Time Series.ipynb", I get an error in the line:

tesla_revenue = pd.read_html("https://en.wikipedia.org/wiki/Tesla,_Inc.")[4][['Year', "Revenue(mil. USD)"]]

Error:

KeyError: "None of [Index(['Year', 'Revenue(mil. USD)'], dtype='object')] are in the [columns]"

In this loaded file there are no columns „Year“ and „Revenue“.
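The positional index [4] breaks whenever Wikipedia adds, removes, or reorders tables on the page, and the header text itself may change. A more defensive sketch picks the table by column names instead. `find_table` is a hypothetical helper, and the search terms must be adjusted to whatever headers the page currently uses.

```python
import pandas as pd


def find_table(tables, required):
    """Return the first DataFrame whose column labels contain every required substring.

    Labels are flattened to strings first, because read_html can produce
    MultiIndex headers for tables with multi-row headings.
    """
    for df in tables:
        labels = [" ".join(map(str, c)) if isinstance(c, tuple) else str(c)
                  for c in df.columns]
        if all(any(req in label for label in labels) for req in required):
            return df
    raise ValueError(f"No table has columns matching {required}")


# Usage (needs network access; the page layout may change again):
# tables = pd.read_html("https://en.wikipedia.org/wiki/Tesla,_Inc.", match="Revenue")
# tesla_revenue = find_table(tables, ["Year", "Revenue"])
# print(tesla_revenue.columns)  # inspect the actual headers before selecting columns
```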

automated data download error

Hi -
Sorry to bother you, but I'm receiving the following error while running the download_data script from my root directory: "ModuleNotFoundError: No module named 'logger_api'". It occurs on line 8, from logger_api import get_logger. I'd appreciate it if you can help with this.

Thanks again!

Chapter 4- 02-Baseline Forecasts using darts

I am unable to import NaiveMovingAverage when I run the following code:
from src.forecasting.baselines import NaiveMovingAverage

I am getting the following error:

ImportError Traceback (most recent call last)
Cell In[25], line 1
----> 1 from src.forecasting.baselines import NaiveMovingAverage

File ~\OneDrive\Desktop\thesis\book_reading\Modern-Time-Series-Forecasting\src\forecasting\baselines.py:2
1 import numpy as np
----> 2 from darts.models.forecasting.forecasting_model import LocalForecastingModel
3 from darts import TimeSeries
6 class NaiveMovingAverage(LocalForecastingModel):

ImportError: cannot import name 'LocalForecastingModel' from 'darts.models.forecasting.forecasting_model'

Chapter 08 : 01-Forecasting with ML

sample_train_df = train_df.loc[train_df.LCLid == "MAC000193", :]
sample_test_df = test_df.loc[test_df.LCLid == "MAC000193", :]

# display(sample_train_df)
train_features, train_target, train_original_target = feat_config.get_X_y(
    sample_train_df, categorical=False, exogenous=False
)
# Loading the Validation as test
test_features, test_target, test_original_target = feat_config.get_X_y(
    sample_test_df, categorical=False, exogenous=False
)
del sample_train_df, sample_test_df

I continue to get this error message.

---------------------------------------------------------------------------
TypeError                                 Traceback (most recent call last)
Cell In[14], line 5
      2 sample_test_df = test_df.loc[test_df.LCLid == "MAC000193", :]
      4 # display(sample_train_df)
----> 5 train_features, train_target, train_original_target = feat_config.get_X_y(sample_train_df, categorical=False, exogenous=False
      6 )
      7 # # Loading the Validation as test
      8 # test_features, test_target, test_original_target = feat_config.get_X_y(
      9 #     sample_test_df, categorical=False, exogenous=False
     10 # )
     11 # del sample_train_df, sample_test_df

File ~\OneDrive - Florida A&M University\COURSES\Online\Modern-Time-Series-Forecasting-with-Python-main\src\forecasting\ml_forecasting.py:153, in FeatureConfig.get_X_y(self, df, categorical, exogenous)
    150 feature_list = list(set(feature_list))
    151 delete_index_cols = list(set(self.index_cols) - set(self.feature_list))
    152 (X, y, y_orig) = (
--> 153     df.loc[:, set(feature_list + self.index_cols)]
    154     .set_index(self.index_cols, drop=False)
    155     .drop(columns=delete_index_cols),
    156     df.loc[:, [self.target] + self.index_cols].set_index(
    157         self.index_cols, drop=True
    158     )
    159     if self.target in df.columns
    160     else None,
    161     df.loc[:, [self.original_target] + self.index_cols].set_index(
    162         self.index_cols, drop=True
    163     )
    164     if self.original_target in df.columns
    165     else None,
    166 )
    167 return X, y, y_orig

File ~\anaconda3\envs\modern_ts\lib\site-packages\pandas\core\indexing.py:1091, in _LocationIndexer.__getitem__(self, key)
   1089 @final
   1090 def __getitem__(self, key):
-> 1091     check_dict_or_set_indexers(key)
   1092     if type(key) is tuple:
   1093         key = tuple(list(x) if is_iterator(x) else x for x in key)

File ~\anaconda3\envs\modern_ts\lib\site-packages\pandas\core\indexing.py:2618, in check_dict_or_set_indexers(key)
   2610 """
   2611 Check if the indexer is or contains a dict or set, which is no longer allowed.
   2612 """
   2613 if (
   2614     isinstance(key, set)
   2615     or isinstance(key, tuple)
   2616     and any(isinstance(x, set) for x in key)
   2617 ):
-> 2618     raise TypeError(
   2619         "Passing a set as an indexer is not supported. Use a list instead."
   2620     )
   2622 if (
   2623     isinstance(key, dict)
   2624     or isinstance(key, tuple)
   2625     and any(isinstance(x, dict) for x in key)
   2626 ):
   2627     raise TypeError(
   2628         "Passing a dict as an indexer is not supported. Use a list instead."
   2629     )

TypeError: Passing a set as an indexer is not supported. Use a list instead.

Ch10/01 - Global Forecasting Models: AttributeError: 'CountEncoder' object has no attribute 'get_feature_names'

Hi,
when running the notebook 01 for the Global Forecasting Models, I keep getting the following error:
AttributeError: 'CountEncoder' object has no attribute 'get_feature_names'

I have noticed that in the requirements, category_encoders package is listed without a specific version. When searching through its documentation, I haven't found the .get_feature_names method (I guess it might have been replaced by .get_feature_names_in() and .get_feature_names_out(), see here
I've been trying to fix the \src\forecasting\ml_forecasting.py module, however didn't get the code to work. Maybe providing a specific version of the category_encoders package you are using should suffice. Thanks

Traceback here:

Cell In[11], line 15, in train_model(model_config, feature_config, missing_config, train_features, train_target, test_features, fit_kwargs)
      1 def train_model(
      2     model_config,
      3     feature_config,
   (...)
      8     fit_kwargs={}
      9 ):
     10     ml_model = MLForecast(
     11         model_config=model_config,
     12         feature_config=feature_config,
     13         missing_config=missing_config,
     14     )
---> 15     ml_model.fit(train_features, train_target, fit_kwargs=fit_kwargs)
     16     y_pred = ml_model.predict(test_features)
     17     feat_df = ml_model.feature_importance()

File ~\Documents\ModernTimeSeries\src\forecasting\ml_forecasting.py:287, in MLForecast.fit(self, X, y, is_transformed, fit_kwargs)
    282     assert (
    283         len(missing_cat_cols) == 0
    284     ), f"These categorical features are not handled by the categorical_encoder : {missing_cat_cols}"
    285     X = self._cat_encoder.fit_transform(X, y)
    286     self._encoded_categorical_features = difference_list(
--> 287         self.model_config.categorical_encoder.get_feature_names(),
    288         self.feature_config.continuous_features
    289         + self.feature_config.boolean_features,
    290     )
    291 else:
    292     self._encoded_categorical_features = []

AttributeError: 'CountEncoder' object has no attribute 'get_feature_names'
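As the issue notes, recent category_encoders releases appear to expose get_feature_names_out() rather than get_feature_names(). Besides pinning an older version, one option is a small compatibility helper. `get_feature_names_compat` is hypothetical; it could replace the direct get_feature_names() call at line 287 of src/forecasting/ml_forecasting.py, after the encoder has been fitted.

```python
def get_feature_names_compat(encoder):
    """Return the output feature names of a fitted encoder across API versions.

    Tries the newer scikit-learn style get_feature_names_out() first and
    falls back to the older get_feature_names().
    """
    if hasattr(encoder, "get_feature_names_out"):
        return list(encoder.get_feature_names_out())
    if hasattr(encoder, "get_feature_names"):
        return list(encoder.get_feature_names())
    raise AttributeError(f"{type(encoder).__name__} exposes no feature-name method")
```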

python test_installation.py => AssertionError: Torch not compiled with CUDA enabled

(modern_ts) D:\PycharmProjects\Modern-Time-Series-Forecasting-with-Python>python test_installation.py
Traceback (most recent call last):
  File "D:\PycharmProjects\Modern-Time-Series-Forecasting-with-Python\test_installation.py", line 8, in <module>
    f"GPU: {torch.cuda.is_available()} | # of GPU: {torch.cuda.device_count()}| Default GPU Name: {torch.cuda.get_device_name(0)}"
  File "D:\Anaconda\envs\modern_ts\lib\site-packages\torch\cuda\__init__.py", line 341, in get_device_name
    return get_device_properties(device).name
  File "D:\Anaconda\envs\modern_ts\lib\site-packages\torch\cuda\__init__.py", line 371, in get_device_properties
    _lazy_init()  # will define get_device_properties
  File "D:\Anaconda\envs\modern_ts\lib\site-packages\torch\cuda\__init__.py", line 221, in _lazy_init
    raise AssertionError("Torch not compiled with CUDA enabled")
AssertionError: Torch not compiled with CUDA enabled
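The traceback shows test_installation.py calling torch.cuda.get_device_name(0) unconditionally, which raises on a CPU-only PyTorch build even though the rest of the environment may be fine. A guarded sketch (not the repository's fix; `gpu_summary` is a hypothetical name) only queries device details when torch.cuda.is_available() is True:

```python
def gpu_summary():
    """Describe GPU availability without failing on CPU-only PyTorch builds."""
    try:
        import torch
    except ImportError:
        return "PyTorch is not installed"
    if not torch.cuda.is_available():
        # get_device_name(0) would raise here, so skip it entirely.
        return "GPU: False | # of GPU: 0 | running on CPU"
    return (
        f"GPU: True | # of GPU: {torch.cuda.device_count()} | "
        f"Default GPU Name: {torch.cuda.get_device_name(0)}"
    )


print(gpu_summary())
```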

ModuleNotFoundError: No module named 'src'

During the execution of this command:

from src.utils.data_utils import compact_to_expanded

I get the error:

ModuleNotFoundError Traceback (most recent call last)
Cell In[6], line 2
1 # Reading Blocks 0-7
----> 2 from src.utils.data_utils import compact_to_expanded

ModuleNotFoundError: No module named 'src'

When I try to install src with pip install src as stated here, I get the following error:

Collecting src
Using cached src-0.0.7.zip (6.3 kB)
Preparing metadata (setup.py) ... done
Building wheels for collected packages: src
Building wheel for src (setup.py) ... error
error: subprocess-exited-with-error

× python setup.py bdist_wheel did not run successfully.
│ exit code: 1
╰─> [49 lines of output]
running bdist_wheel
running build
running build_py
creating build
creating build/lib
creating build/lib/src
copying src/__init__.py -> build/lib/src
running egg_info
writing src.egg-info/PKG-INFO
writing dependency_links to src.egg-info/dependency_links.txt
deleting src.egg-info/entry_points.txt
writing requirements to src.egg-info/requires.txt
writing top-level names to src.egg-info/top_level.txt
reading manifest file 'src.egg-info/SOURCES.txt'
reading manifest template 'MANIFEST.in'
adding license file 'LICENSE.rst'
writing manifest file 'src.egg-info/SOURCES.txt'
/Users/bills/anaconda3/lib/python3.10/site-packages/setuptools/command/install.py:34: SetuptoolsDeprecationWarning: setup.py install is deprecated. Use build and pip and other standards-based tools.
warnings.warn(
Traceback (most recent call last):
File "", line 2, in
File "", line 34, in
File "/private/var/folders/rl/7k9wy1rj49b16tfgl8dttf200000gn/T/pip-install-t58cj2_l/src_44b4321ef9494d748a5c8d74e2d549ea/setup.py", line 70, in
setup(
File "/Users/bills/anaconda3/lib/python3.10/site-packages/setuptools/__init__.py", line 108, in setup
return distutils.core.setup(**attrs)
File "/Users/bills/anaconda3/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 185, in setup
return run_commands(dist)
File "/Users/bills/anaconda3/lib/python3.10/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
dist.run_commands()
File "/Users/bills/anaconda3/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
self.run_command(cmd)
File "/Users/bills/anaconda3/lib/python3.10/site-packages/setuptools/dist.py", line 1221, in run_command
super().run_command(command)
File "/Users/bills/anaconda3/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
cmd_obj.run()
File "/Users/bills/anaconda3/lib/python3.10/site-packages/wheel/bdist_wheel.py", line 292, in run
install = self.reinitialize_command('install',
File "/Users/bills/anaconda3/lib/python3.10/site-packages/setuptools/__init__.py", line 221, in reinitialize_command
cmd = _Command.reinitialize_command(self, command, reinit_subcommands)
File "/Users/bills/anaconda3/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 311, in reinitialize_command
return self.distribution.reinitialize_command(command, reinit_subcommands)
File "/Users/bills/anaconda3/lib/python3.10/site-packages/setuptools/_distutils/dist.py", line 953, in reinitialize_command
for sub in command.get_sub_commands():
File "/Users/bills/anaconda3/lib/python3.10/site-packages/setuptools/_distutils/cmd.py", line 329, in get_sub_commands
if method is None or method(self):
File "/Users/bills/anaconda3/lib/python3.10/site-packages/setuptools/_distutils/command/install.py", line 787, in has_lib
self.distribution.has_pure_modules() or self.distribution.has_ext_modules()
AttributeError: 'NoneType' object has no attribute 'has_pure_modules'
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
ERROR: Failed building wheel for src
Running setup.py clean for src
Failed to build src
Installing collected packages: src
Running setup.py install for src ... error
error: subprocess-exited-with-error

× Running setup.py install for src did not run successfully.
│ exit code: 1
╰─> [2 lines of output]
running install
You've probably made a mistake here and are trying to install from a 'src' directory which doesn't exist.
[end of output]

note: This error originates from a subprocess, and is likely not a problem with pip.
error: legacy-install-failure

× Encountered error while trying to install package.
╰─> src

note: This is an issue with the package mentioned above, not pip.
hint: See above for output from the failure.

Has anyone faced the same issue? I am using python version 3.10.9 on macOS (M2) with jupyter.
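The `src` in `from src.utils.data_utils import compact_to_expanded` refers to the src folder of this repository, not to the unrelated `src` project on PyPI, so `pip install src` cannot fix the import. A hedged workaround, assuming the notebook is opened from a subfolder of the cloned repository, is to put the repository root on sys.path before importing. `add_repo_root_to_path` is a hypothetical helper.

```python
import sys
from pathlib import Path


def add_repo_root_to_path(start=None, marker="src"):
    """Walk up from `start` (default: current directory) to the first folder
    containing a `marker` subfolder, and put that folder at the front of
    sys.path so `import src...` resolves to the repository's own package.
    """
    start = Path(start or Path.cwd()).resolve()
    for folder in [start, *start.parents]:
        if (folder / marker).is_dir():
            root = str(folder)
            if root not in sys.path:
                sys.path.insert(0, root)
            return folder
    raise FileNotFoundError(f"No '{marker}' folder found above {start}")


# Usage in the first cell of a notebook:
# add_repo_root_to_path()
# from src.utils.data_utils import compact_to_expanded
```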

Environment Issues on M1 pro

The environment crashes after installing it with the suggested changes to anaconda_env.yml:
In anaconda_env.yml, change the line python-kaleido==0.1.0 to python-kaleido>=0.1.0
In anaconda_env.yml, change the line statsforecast==0.6.0 to statsforecast>=0.6.0

The following error appears in Jupyter notebooks after installing the environment; even a simple numpy import fails:
Canceled future for execute_request message before replies were done
The Kernel crashed while executing code in the the current cell or a previous cell. Please review the code in the cell(s) to identify a possible cause of the failure.

Additionally, following error is noticed on terminal:

Modern-Time-Series-Forecasting-with-Python % python
Python 3.9.0 | packaged by conda-forge | (default, Nov 26 2020, 07:54:06)
[Clang 11.0.0 ] on darwin
Type "help", "copyright", "credits" or "license" for more information.

import numpy as np
Intel MKL FATAL ERROR: This system does not meet the minimum requirements for use of the Intel(R) Math Kernel Library.
The processor must support the Intel(R) Supplemental Streaming SIMD Extensions 3 (Intel(R) SSSE3) instructions.
The processor must support the Intel(R) Streaming SIMD Extensions 4.2 (Intel(R) SSE4.2) instructions.
The processor must support the Intel(R) Advanced Vector Extensions (Intel(R) AVX) instructions.

Looking forward to the help.

Command "conda env create -f anaconda_env.yml" fails

My environment is:

  • Apple M1 Pro
  • MacOS Sonoma 14.5
  • conda 24.5.0

Command I ran:

conda env create -f anaconda_env.yml

My fix:

Failed with error:

Channels:
 - pytorch
 - conda-forge
 - defaults
Platform: osx-arm64
Collecting package metadata (repodata.json): done
Solving environment: failed
Channels:
 - pytorch
 - conda-forge
 - defaults
Platform: osx-arm64
Collecting package metadata (repodata.json): done
Solving environment: failed

LibMambaUnsatisfiableError: Encountered problems while solving:
  - nothing provides requested pytorch::torchvision
  - nothing provides requested pytorch::torchaudio
  - nothing provides requested pytorch::pytorch
  - nothing provides kaleido-core 0.1.0.* needed by python-kaleido-0.1.0-pyhd8ed1ab_0

Could not solve for environment specs
The following packages are incompatible
├─ python-kaleido 0.1.0  is not installable because it requires
│  └─ kaleido-core 0.1.0.* , which does not exist (perhaps a missing channel);
├─ pytorch does not exist (perhaps a typo or a missing channel);
├─ torchaudio does not exist (perhaps a typo or a missing channel);
└─ torchvision does not exist (perhaps a typo or a missing channel).

Data structure is not correct

In the book's preface (pages xxii-xxiii), the data structure is depicted as:
data
├── london_smart_meters
│ ├── hhblock_dataset
│ │ ├── hhblock_dataset
│ │ ├── block_0.csv
│ │ ├── block_1.csv
│ │ ├── ...
│ │ ├── block_109.csv
│── acorn_details.csv
├── informations_households.csv
├── uk_bank_holidays.csv
├── weather_daily_darksky.csv
├── weather_hourly_darksky.csv

But based on the test code, the data structure should be like this:
data
├── london_smart_meters
│ ├── hhblock_dataset
│ │ ├── hhblock_dataset
│ │ ├── block_0.csv
│ │ ├── block_1.csv
│ │ ├── ...
│ │ ├── block_109.csv
│ ├── acorn_details.csv
│ ├── informations_households.csv
│ ├── uk_bank_holidays.csv
│ ├── weather_daily_darksky.csv
│ ├── weather_hourly_darksky.csv

Target Transformation in ML_Forecast class not working (workaround solution included)

I tried making the target stationary to account for extrapolation issues with tree-based models but encountered a frequency inference error and had to manually set a frequency of "30min".
 

y = self.target_transformer.fit_transform(y)

The issue is that the frequency is not being inferred from the target (I'm not sure why). I worked around it by modifying the code to: y = self.target_transformer.fit_transform(y.asfreq('30min'))

Can you suggest a better way to debug this (maybe there is an issue with the data itself?) and help me understand why this might be happening?
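One plausible cause, offered as a hypothesis rather than a diagnosis of this dataset: pandas infers a frequency only when timestamps are unique and evenly spaced, so a single missing or duplicated half-hour in the target is enough to make inference fail. The sketch uses made-up readings to show the symptom, two quick checks, and what asfreq('30min') actually does.

```python
import pandas as pd

# Made-up half-hourly readings with the 01:00 timestamp missing.
idx = pd.DatetimeIndex(
    ["2013-01-01 00:00", "2013-01-01 00:30", "2013-01-01 01:30", "2013-01-01 02:00"]
)
y = pd.Series([0.2, 0.3, 0.25, 0.4], index=idx)

# Debugging aids: duplicated timestamps and irregular spacing both block inference.
print(y.index.duplicated().any())                  # False
print(y.index.to_series().diff().value_counts())   # 30 min twice, 60 min once

# With uneven spacing, pandas cannot infer a frequency.
print(pd.infer_freq(y.index))                      # None

# asfreq builds a regular 30-minute grid and inserts NaN where readings are
# missing, so the gap still needs imputing before the target transform.
y_regular = y.asfreq("30min")
print(len(y_regular), int(y_regular.isna().sum()))  # 5 1
```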

Ch04 - Baseline Forecasts using darts, 'MAC000193' not in data set

Hi,

I ran into a problem where the household 'MAC000193' was not available in 'selected_blocks_train_missing_imputed.parquet'.

This is probably due to a bug in Ch02 - 02-Preprocessing. Here, block_data_path.glob("*.csv") returns a list of files, for which the ordering is not deterministic. On a Windows system, this list was ordered alphabetically, but on a Linux system, the list was ordered differently.

In Ch04 - 01-Setting up Experiment Harness, 50 households are sampled. Even though random_state is specified explicitly, the results may differ because the incoming lists are ordered differently.

A quick fix is to wrap block_data_path.glob("*.csv") in the Ch02 - 02-Preprocessing notebook with sorted().
Hope this helps somebody who runs into the same problem.
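The fix can be wrapped in a small helper. Plain sorted() orders names as strings, which matches the alphabetical order the issue observed on Windows. The optional numeric key orders block_2 before block_10, but that is a different order from the alphabetical one and could therefore change which households are sampled. `list_block_files` is a hypothetical name.

```python
from pathlib import Path


def list_block_files(block_dir, numeric=False):
    """Return block CSV paths in a deterministic order on every OS.

    Path.glob yields files in whatever order the file system returns them,
    so sorting is what makes the result reproducible across machines.
    """
    paths = Path(block_dir).glob("*.csv")
    if numeric:
        # "block_12.csv" -> 12, so block_2 sorts before block_10
        return sorted(paths, key=lambda p: int(p.stem.split("_")[-1]))
    return sorted(paths)  # string order: block_10.csv comes before block_2.csv
```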

Chapter 07 Code

y_trend has no time stamp

AttributeError Traceback (most recent call last)
Cell In[27], line 1
----> 1 y_trend.plot()
2 plt.show()
3 kendall_tau_res = check_trend(y_trend, confidence=0.05)

AttributeError: 'numpy.ndarray' object has no attribute 'plot'
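The traceback shows that y_trend arrived as a bare NumPy array, which has no timestamps and no .plot() method. Without knowing which function produced it in the notebook, a hedged fix is to wrap it back into a pandas Series using the index of the series it was estimated from, assuming the lengths match. The series and the rolling-mean trend below are illustrative stand-ins.

```python
import numpy as np
import pandas as pd

# Illustrative stand-ins: a daily series and a trend returned as a bare array.
y = pd.Series(
    [1.0, 1.2, 1.1, 1.4, 1.5, 1.7],
    index=pd.date_range("2013-01-01", periods=6, freq="D"),
)
y_trend = y.rolling(3, center=True, min_periods=1).mean().to_numpy()

# Re-attach the timestamps of the series the trend was estimated from.
# This assumes the trend has exactly one value per original timestamp.
if isinstance(y_trend, np.ndarray):
    assert len(y_trend) == len(y), "trend and series lengths differ"
    y_trend = pd.Series(y_trend, index=y.index, name="trend")

# y_trend.plot() now works (plotting itself requires matplotlib).
```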

TypeError: Passing a set as an indexer is not supported. Use a list instead. Occurred in Chapter 08.

I faced this issue when I was running 01-Forecasting with ML.ipynb file, the following error occurred when I executed the Sample Household block.
The error originated from src/forecasting/ml_forecasting.py --- line 153
df.loc[:, set(feature_list + self.index_cols)]
The error occurs because recent pandas versions no longer accept a set as an indexer (a set has no defined order), as the error message states.
So I replaced that line with the following:
df.loc[:, list(set(feature_list + self.index_cols))]
and it worked.
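The list(set(...)) fix works, but the resulting column order can vary between Python sessions because string hashing is randomized. A sketch of an order-preserving alternative follows; `unique_in_order` is a hypothetical helper and the tiny DataFrame is illustrative.

```python
import pandas as pd


def unique_in_order(items):
    """De-duplicate while keeping first-seen order (dict keys preserve insertion order)."""
    return list(dict.fromkeys(items))


df = pd.DataFrame({"LCLid": ["MAC000193"], "timestamp": [0], "lag_1": [0.5]})
feature_list = ["lag_1", "timestamp"]
index_cols = ["LCLid", "timestamp"]

# df.loc[:, set(feature_list + index_cols)] raises TypeError in recent pandas.
# A list is accepted, and dict.fromkeys also keeps the column order stable.
subset = df.loc[:, unique_in_order(feature_list + index_cols)]
print(list(subset.columns))  # ['lag_1', 'timestamp', 'LCLid']
```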

In Chapter 2, the memory usage of the compact form seems unrealistically low.

"For one block, this representation takes up only ~0.002 MB of memory." This is probably based on the result of pandas memory_usage. But if you calculate the numpy bytes in the energy consumption column, it takes ~10MB:
sum([x.nbytes for x in block1_compact['energy_consumption']])/1024**2
Out[32]: 9.8807373046875

I think memory_usage only considers the array pointers in the column (even with deep=True). Also if you save the table to disk it still takes ~15MB.
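The issue's measurement can be reproduced on synthetic data. In this sketch, zero-filled arrays stand in for real readings, and ndarray.nbytes counts the bytes held by the arrays directly; that is the figure to rely on when an object column stores arrays. Whether pandas' own report includes the array buffers may depend on the pandas and NumPy versions, so it is printed for comparison only.

```python
import numpy as np
import pandas as pd

# Illustrative compact-form frame: one row per household, with a year of
# half-hourly readings stored as a NumPy array inside an object column.
compact = pd.DataFrame({
    "LCLid": ["A", "B", "C"],
    "energy_consumption": [np.zeros(48 * 365) for _ in range(3)],
})

# Bytes held by the arrays themselves (float64 -> 8 bytes per value).
array_bytes = sum(arr.nbytes for arr in compact["energy_consumption"])
print(array_bytes / 1024**2, "MB in the arrays")

# What pandas reports for the same column, for comparison.
print(compact["energy_consumption"].memory_usage(deep=True) / 1024**2, "MB reported")
```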

For BoxCoxTransformer, `y = self._add_one(y)` twice in fit + transform?

src\transforms\target_transformations.py

class BoxCoxTransformer:
...

    def fit(self, y: pd.Series):
        """No action is being done apart from checking the input. This is a dummy method for compatibility

        Args:
            y (pd.Series): The time series as a pandas series or dataframe with a datetime index
        """
        check_input(y)
        check_negative(y)
        y = self._add_one(y)
        if self._do_optimize:
            self.boxcox_lambda = self._optimize_lambda(y)
        self._is_fitted = True
        return self

    def transform(self, y: pd.Series) -> pd.Series:
        """Applies the Box-Cox transform

        Args:
            y (pd.Series): The time series as a pandas series or dataframe with a datetime index

        Raises:
            ValueError: If there are zero values and `add_one` is False

        Returns:
            pd.Series: The transformed series
        """
        check_fitted(self._is_fitted)
        y = check_input(y)
        check_negative(y)
        y = self._add_one(y)
        return pd.Series(boxcox(y.values, lmbda=self.boxcox_lambda), index=y.index)

Book reference: Ch07 > Detecting and correcting for heteroscedasticity > Box-Cox transform, page 157
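Whether the two calls shift the data twice depends on _add_one, which is not shown above. Assuming it returns a new series (for example y + 1) rather than modifying its argument in place, the assignment in fit only rebinds a local name, so transform still receives the caller's unshifted series and shifts it exactly once, with the same shift that fit used. The toy class below is hypothetical and only mimics that pattern; it is not the repository's transformer.

```python
import pandas as pd


class ToyShiftTransformer:
    """Hypothetical stand-in: fit and transform both shift their own input by one."""

    def _add_one(self, y: pd.Series) -> pd.Series:
        # Returns a new Series; the caller's object is not modified
        return y + 1

    def fit(self, y: pd.Series) -> "ToyShiftTransformer":
        y = self._add_one(y)         # rebinds the local name only
        self.min_shifted_ = y.min()  # "learn" something from the shifted data
        return self

    def transform(self, y: pd.Series) -> pd.Series:
        # Receives the original, unshifted series and shifts it once
        return self._add_one(y)


y = pd.Series([0.0, 1.0, 2.0])
tr = ToyShiftTransformer().fit(y)
out = tr.transform(y)
print(y.tolist())    # [0.0, 1.0, 2.0]  original untouched
print(out.tolist())  # [1.0, 2.0, 3.0]  shifted once, not twice
```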

Abstract method supports_multivariate not implemented in baselines.py

Hello,
I think there is an abstract method called "supports_multivariate" that should be implemented in baselines.py. Otherwise, we can't instantiate the class "NaiveMovingAverage", because it leaves that abstract method unimplemented.

I suggest simply adding this piece of code to the aforementioned class:

def supports_multivariate(self) -> bool:
    return True
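The reason the missing method blocks instantiation is standard Python abc behaviour: a subclass that leaves any abstract member unimplemented raises TypeError when it is instantiated. A minimal sketch with a hypothetical base class (not the library's actual one):

```python
from abc import ABC, abstractmethod


class BaseModel(ABC):
    """Hypothetical base class standing in for the library's abstract model."""

    @property
    @abstractmethod
    def supports_multivariate(self) -> bool: ...


class Incomplete(BaseModel):
    pass  # does not implement supports_multivariate


class Complete(BaseModel):
    @property
    def supports_multivariate(self) -> bool:
        return True


try:
    Incomplete()
    incomplete_failed = False
except TypeError:
    # "Can't instantiate abstract class Incomplete ..."
    incomplete_failed = True

print(incomplete_failed)                   # True
print(Complete().supports_multivariate)    # True
```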

Naive Moving Forecast error

I am getting this error for the NaiveMovingAverage class. I have been playing around with it to try to figure it out, but no luck yet. Any ideas?

[Screenshot of the error]

Issue creating Parquet file

For some reason, the dataframe in notebook "02 - Preprocessing London Smart Meter Dataset" cannot be converted to Parquet format, as shown in the screenshot below. I have tried setting the engine to both fastparquet and pyarrow, with no luck. This would not be a problem, except that the next chapter relies on this file, so I am stuck until I figure it out. Any ideas?

[Screenshot of the error]

In the implementation of Seasonal Decompose (STL) with FourierSeries, `max_cycle=np.max(seasonal_cycle)` is wrong?

This is about the textbook's implementation of Seasonal Decompose (STL) with FourierSeries & RidgeCV.

(if I understood correctly)
In .\src\decomposition\seasonal.py

class FourierDecomposition(BaseDecomposition):
...
    def _calculate_fourier_terms(self, seasonal_cycle: np.ndarray, max_cycle: int):
        """Calculates Fourier Terms given the seasonal cycle and max_cycle"""
        sin_X = np.empty((len(seasonal_cycle), self.n_fourier_terms), dtype="float64")
        cos_X = np.empty((len(seasonal_cycle), self.n_fourier_terms), dtype="float64")
        for i in range(1, self.n_fourier_terms + 1):
            sin_X[:, i - 1] = np.sin((2 * np.pi * seasonal_cycle * i) / max_cycle)
            cos_X[:, i - 1] = np.cos((2 * np.pi * seasonal_cycle * i) / max_cycle)
        return np.hstack([sin_X, cos_X])
...

    def _prepare_X(self, detrended, **seasonality_kwargs):
...
            seasonal_cycle = (getattr(date_index, self.seasonality_period)).values
        return self._calculate_fourier_terms(
            seasonal_cycle, max_cycle=np.max(seasonal_cycle)
        )

max_cycle=np.max(seasonal_cycle) is wrong.

For example:
seasonal_cycle = df.index.dayofweek.values => dayofweek takes the values 0, 1, 2, 3, 4, 5, 6
max_cycle=np.max(seasonal_cycle) => this gives 6, which is wrong
max_cycle should be 7, the period of the Fourier series. With a period of 6, day 6 completes a full cycle (2π·6/6 = 2π), so it gets exactly the same Fourier terms as day 0.
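The off-by-one period can be checked with a small sketch. The helper fourier_terms below is hypothetical; it mirrors the sin/cos construction of _calculate_fourier_terms and compares the features of Monday (0) and Sunday (6) under a period of 6 and of 7.

```python
import numpy as np


def fourier_terms(seasonal_cycle: np.ndarray, period: int, n_terms: int) -> np.ndarray:
    """Sin/cos features for an integer seasonal cycle, same layout as _calculate_fourier_terms."""
    k = np.arange(1, n_terms + 1)
    angles = 2 * np.pi * np.outer(seasonal_cycle, k) / period
    return np.hstack([np.sin(angles), np.cos(angles)])


dayofweek = np.arange(7)  # 0 = Monday ... 6 = Sunday, as in pandas DatetimeIndex.dayofweek

X_wrong = fourier_terms(dayofweek, period=dayofweek.max(), n_terms=3)      # period 6
X_right = fourier_terms(dayofweek, period=dayofweek.max() + 1, n_terms=3)  # period 7

# With period 6, Sunday completes a full cycle and gets Monday's features
print(np.allclose(X_wrong[0], X_wrong[6]))  # True
print(np.allclose(X_right[0], X_right[6]))  # False
```

Adding 1 to the maximum is only right for 0-based attributes such as dayofweek or hour; for month, which pandas numbers 1 to 12, the maximum already equals the period.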

For example, a comparison on a simple example:
[Image: fit using max_cycle=6 vs max_cycle=7]
Using 7 clearly fits better than using 6.

A comparison on the textbook's energy_consumption data:
[Image: fit using max_cycle=6 vs max_cycle=7 on energy_consumption]
The difference is less obvious here, but with 6 you can see that, in each period, two days always stick to the same level.

Issues installing the environment on an M2 MacBook Air

Installing the environment on an M2 MacBook Air with the latest anaconda_env.yml, statsforecast>=0.6.0 and python-kaleido>=0.1.0 leads to the error message below.

Any help with resolving this would be much appreciated.

Pip subprocess error:
  Running command git clone --filter=blob:none --quiet https://github.com/manujosephv/pytorch_tabular.git /private/var/folders/mm/55vl94nd119gsfty5_1q66q40000gn/T/pip-req-build-kdxixgmy
  Running command git checkout -b modern_ts_freeze --track origin/modern_ts_freeze
  Switched to a new branch 'modern_ts_freeze'
  branch 'modern_ts_freeze' set up to track 'origin/modern_ts_freeze'.
  error: subprocess-exited-with-error
  
  × Getting requirements to build wheel did not run successfully.
  │ exit code: 1
  ╰─> [62 lines of output]
      /private/var/folders/mm/55vl94nd119gsfty5_1q66q40000gn/T/pip-build-env-pk92viix/overlay/lib/python3.9/site-packages/setuptools/config/setupcfg.py:293: _DeprecatedConfig: Deprecated config in `setup.cfg`
      !!
      
              ********************************************************************************
              The license_file parameter is deprecated, use license_files instead.
      
              By 2023-Oct-30, you need to update your project and remove deprecated calls
              or your builds will no longer be supported.
      
              See https://setuptools.pypa.io/en/latest/userguide/declarative_config.html for details.
              ********************************************************************************
      
      !!
        parsed = self.parsers.get(option_name, lambda x: x)(value)
      running egg_info
      writing lib3/PyYAML.egg-info/PKG-INFO
      writing dependency_links to lib3/PyYAML.egg-info/dependency_links.txt
      writing top-level names to lib3/PyYAML.egg-info/top_level.txt
      Traceback (most recent call last):
        File "/Users/daniels/anaconda3/envs/modern_ts/lib/python3.9/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 353, in <module>
          main()
        File "/Users/daniels/anaconda3/envs/modern_ts/lib/python3.9/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 335, in main
          json_out['return_val'] = hook(**hook_input['kwargs'])
        File "/Users/daniels/anaconda3/envs/modern_ts/lib/python3.9/site-packages/pip/_vendor/pyproject_hooks/_in_process/_in_process.py", line 118, in get_requires_for_build_wheel
          return hook(config_settings)
        File "/private/var/folders/mm/55vl94nd119gsfty5_1q66q40000gn/T/pip-build-env-pk92viix/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 341, in get_requires_for_build_wheel
          return self._get_build_requires(config_settings, requirements=['wheel'])
        File "/private/var/folders/mm/55vl94nd119gsfty5_1q66q40000gn/T/pip-build-env-pk92viix/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 323, in _get_build_requires
          self.run_setup()
        File "/private/var/folders/mm/55vl94nd119gsfty5_1q66q40000gn/T/pip-build-env-pk92viix/overlay/lib/python3.9/site-packages/setuptools/build_meta.py", line 338, in run_setup
          exec(code, locals())
        File "<string>", line 271, in <module>
        File "/private/var/folders/mm/55vl94nd119gsfty5_1q66q40000gn/T/pip-build-env-pk92viix/overlay/lib/python3.9/site-packages/setuptools/__init__.py", line 107, in setup
          return distutils.core.setup(**attrs)
        File "/private/var/folders/mm/55vl94nd119gsfty5_1q66q40000gn/T/pip-build-env-pk92viix/overlay/lib/python3.9/site-packages/setuptools/_distutils/core.py", line 185, in setup
          return run_commands(dist)
        File "/private/var/folders/mm/55vl94nd119gsfty5_1q66q40000gn/T/pip-build-env-pk92viix/overlay/lib/python3.9/site-packages/setuptools/_distutils/core.py", line 201, in run_commands
          dist.run_commands()
        File "/private/var/folders/mm/55vl94nd119gsfty5_1q66q40000gn/T/pip-build-env-pk92viix/overlay/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 969, in run_commands
          self.run_command(cmd)
        File "/private/var/folders/mm/55vl94nd119gsfty5_1q66q40000gn/T/pip-build-env-pk92viix/overlay/lib/python3.9/site-packages/setuptools/dist.py", line 1234, in run_command
          super().run_command(command)
        File "/private/var/folders/mm/55vl94nd119gsfty5_1q66q40000gn/T/pip-build-env-pk92viix/overlay/lib/python3.9/site-packages/setuptools/_distutils/dist.py", line 988, in run_command
          cmd_obj.run()
        File "/private/var/folders/mm/55vl94nd119gsfty5_1q66q40000gn/T/pip-build-env-pk92viix/overlay/lib/python3.9/site-packages/setuptools/command/egg_info.py", line 314, in run
          self.find_sources()
        File "/private/var/folders/mm/55vl94nd119gsfty5_1q66q40000gn/T/pip-build-env-pk92viix/overlay/lib/python3.9/site-packages/setuptools/command/egg_info.py", line 322, in find_sources
          mm.run()
        File "/private/var/folders/mm/55vl94nd119gsfty5_1q66q40000gn/T/pip-build-env-pk92viix/overlay/lib/python3.9/site-packages/setuptools/command/egg_info.py", line 551, in run
          self.add_defaults()
        File "/private/var/folders/mm/55vl94nd119gsfty5_1q66q40000gn/T/pip-build-env-pk92viix/overlay/lib/python3.9/site-packages/setuptools/command/egg_info.py", line 589, in add_defaults
          sdist.add_defaults(self)
        File "/private/var/folders/mm/55vl94nd119gsfty5_1q66q40000gn/T/pip-build-env-pk92viix/overlay/lib/python3.9/site-packages/setuptools/command/sdist.py", line 104, in add_defaults
          super().add_defaults()
        File "/private/var/folders/mm/55vl94nd119gsfty5_1q66q40000gn/T/pip-build-env-pk92viix/overlay/lib/python3.9/site-packages/setuptools/_distutils/command/sdist.py", line 251, in add_defaults
          self._add_defaults_ext()
        File "/private/var/folders/mm/55vl94nd119gsfty5_1q66q40000gn/T/pip-build-env-pk92viix/overlay/lib/python3.9/site-packages/setuptools/_distutils/command/sdist.py", line 336, in _add_defaults_ext
          self.filelist.extend(build_ext.get_source_files())
        File "<string>", line 201, in get_source_files
        File "/private/var/folders/mm/55vl94nd119gsfty5_1q66q40000gn/T/pip-build-env-pk92viix/overlay/lib/python3.9/site-packages/setuptools/_distutils/cmd.py", line 107, in __getattr__
          raise AttributeError(attr)
      AttributeError: cython_sources
      [end of output]
  
  note: This error originates from a subprocess, and is likely not a problem with pip.
error: subprocess-exited-with-error

× Getting requirements to build wheel did not run successfully.
│ exit code: 1
╰─> See above for output.

note: This error originates from a subprocess, and is likely not a problem with pip.

failed

CondaEnvException: Pip failed

Mismatch between `ts_utils.py` implementation of `forecast_bias` vs what is written in Chapter 4 of Book

There is a mismatch in the numerator of the Forecast Bias calculation between Chapter 4, page 80 of the book and the forecast_bias implementation in ts_utils.py.

In ts_utils.py at line 105, forecast bias is calculated as follows:
return ((y_true_sum - y_pred_sum) / y_true_sum) * 100.

Compare that with the book, where the numerator is the negation of the one in the code. I understand that both are valid definitions of forecast bias and the choice just depends on your point of view, but I bring it up in case you want the text and the code to be consistent.

Also, FYI, the metric values I get from running the notebook do not match the plots shown in the book.
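To make the sign difference concrete, here are both conventions side by side. The first follows the ts_utils.py line quoted above; the second simply negates the numerator as described for the book and is not copied from it. The function names are hypothetical.

```python
import numpy as np


def forecast_bias_code(y_true, y_pred):
    # Numerator as in ts_utils.py: actual minus forecast
    y_true_sum, y_pred_sum = np.sum(y_true), np.sum(y_pred)
    return ((y_true_sum - y_pred_sum) / y_true_sum) * 100


def forecast_bias_negated(y_true, y_pred):
    # Negated numerator: forecast minus actual
    y_true_sum, y_pred_sum = np.sum(y_true), np.sum(y_pred)
    return ((y_pred_sum - y_true_sum) / y_true_sum) * 100


y_true = np.array([10.0, 20.0, 30.0])
y_pred = np.array([12.0, 22.0, 32.0])  # over-forecasts by 6 in total

print(forecast_bias_code(y_true, y_pred))     # negative for an over-forecast
print(forecast_bias_negated(y_true, y_pred))  # positive for an over-forecast
```

Both give the same magnitude (6 / 60 = 10%); only the sign that signals over-forecasting differs.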

ValueError from `eval_model` function within Chapter 4: 02-Baseline Forecasts using darts.ipynb Cell 12

When running the eval_model function in Chapter 4: 02-Baseline Forecasts using darts.ipynb at cell 12, I get the following error:

Cell In[12], line 4
      2 naive_model = NaiveSeasonal(K=1)
      3 with LogTime() as timer:
----> 4     y_pred, metrics = eval_model(naive_model, ts_train, ts_val, name=name)
      5 metrics['Time Elapsed'] = timer.elapsed
      6 metric_record.append(metrics)

Cell In[10], line 11
      4 model.fit(ts_train)
      5 y_pred = model.predict(len(ts_test))
      6 return y_pred, {
      7     "Algorithm": name,
      8     "MAE": mae(actual_series = ts_test, pred_series = y_pred),
      9     "MSE": mse(actual_series = ts_test, pred_series = y_pred),
     10     "MASE": mase(actual_series = ts_test, pred_series = y_pred, insample=ts_train),
---> 11     "Forecast Bias": forecast_bias(actual_series = ts_test, pred_series = y_pred)
     12 }

File ~/Modern-Time-Series-Forecasting-with-Python/src/utils/ts_utils.py:102, in forecast_bias(actual_series, pred_series, intersect, reduction, inter_reduction, n_jobs, verbose)
    100 else:
    101     y_true, y_pred = _get_values_or_raise(actual_series, pred_series, intersect)
--> 102 y_true, y_pred = _remove_nan_union(y_true, y_pred)
    103 y_true_sum, y_pred_sum = np.sum(y_true), np.sum(y_pred)
...
   5348                          'length of {}'.format(N))
   5350     # optimization, the other branch is slower
   5351     keep = ~obj

ValueError: boolean array argument obj to delete must be one dimensional and match the axis length of 1488

I went into ts_utils.py and made the following changes:

# Imports this function relies on
from typing import Tuple

import numpy as np

def _remove_nan_union(array_a: np.ndarray,
                      array_b: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    """
    Returns the two input arrays with every element removed whose index corresponds to
    a NaN value in either of the two input arrays.
    """

    isnan_mask = np.logical_or(np.isnan(array_a), np.isnan(array_b))

    # I added the line below: np.delete requires a 1-D boolean mask
    isnan_mask = isnan_mask.reshape(-1)

    return np.delete(array_a, isnan_mask), np.delete(array_b, isnan_mask)

This resolved the ValueError, but I do not know whether my modification has any side effects.
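The error can be reproduced with a small 2-D array, assuming the values arrive with shape (n, 1). np.delete without an axis argument flattens its input and requires a 1-D boolean mask of matching length, which is why the reshape fixes it. One side effect visible in this sketch is that both outputs come back flattened to 1-D; for a sum-based metric such as forecast_bias that should not matter. The data are hypothetical.

```python
from typing import Tuple

import numpy as np


def remove_nan_union(array_a: np.ndarray, array_b: np.ndarray) -> Tuple[np.ndarray, np.ndarray]:
    # Same logic as the patched _remove_nan_union: flatten the mask before deleting
    isnan_mask = np.logical_or(np.isnan(array_a), np.isnan(array_b)).reshape(-1)
    return np.delete(array_a, isnan_mask), np.delete(array_b, isnan_mask)


# Hypothetical single-component values with shape (n, 1)
y_true = np.array([[1.0], [np.nan], [3.0], [4.0]])
y_pred = np.array([[1.5], [2.0], [np.nan], [4.5]])

# Without the reshape, the mask is 2-D and np.delete rejects it
mask_2d = np.logical_or(np.isnan(y_true), np.isnan(y_pred))
try:
    np.delete(y_true, mask_2d)
    rejected_2d = False
except ValueError:
    rejected_2d = True

a, b = remove_nan_union(y_true, y_pred)
print(rejected_2d)  # True
print(a, b)         # [1. 4.] [1.5 4.5]
```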

Chapter 13: output tensor dimension error

In the multi-step prediction section of Chapter 13 (LSTM-FC Seq2Seq Last Hidden, LSTM-FC Seq2Seq All Hidden), I got RuntimeError: Predictions and targets are expected to have the same shape, but got torch.Size([32, 1, 48]) and torch.Size([32, 48, 1]).

Lines 361 and 363 of src/dl/models.py (in the forward method of the Seq2SeqModel class) should be changed to use unsqueeze(-1), so that the predictions have the same shape as the targets, [32, 48, 1]:

if self.hparams.decoder_type == "FC":
    if self.hparams.decoder_use_all_hidden:
        y_hat = self.decoder(o.reshape(o.size(0), -1)).unsqueeze(-1)  # fixed
    else:
        y_hat = self.decoder(o[:, -1, :]).unsqueeze(-1)  # fixed
