Comments (4)
Thanks. Feel free to make a PR.
cc @thinkall
from flaml.
Thanks @jgukelberger, @sonichi. I don't see the issue with pandas 2.0.3 or 2.2.2.
numpy 1.24.3
pandas 2.2.2 / 2.0.3
That's interesting @sonichi. Attached is the full pip list output for the environment I'm seeing these errors in. The errors are fixed with pip install "pandas<2".
The only difference between the two environments is pandas:
$ diff pipenv-works.txt pipenv-fails.txt
31c31
< pandas 1.5.3
---
> pandas 2.2.2
Here's the full output of a failing test case:
$ pytest -v test/automl/test_forecast.py -k test_numpy
=============================================== test session starts ================================================
platform linux -- Python 3.10.14, pytest-7.4.0, pluggy-1.0.0 -- /home/jagukelb/opt/miniconda3/envs/flaml-test/bin/python
cachedir: .pytest_cache
rootdir: /home/jagukelb/src/experiments/FLAML
configfile: pyproject.toml
collected 9 items / 7 deselected / 2 selected
test/automl/test_forecast.py::test_numpy FAILED [ 50%]
test/automl/test_forecast.py::test_numpy_large PASSED [100%]
===================================================== FAILURES =====================================================
____________________________________________________ test_numpy ____________________________________________________
    def test_numpy():
        X_train = np.arange("2014-01", "2021-01", dtype="datetime64[M]")
        y_train = np.random.random(size=len(X_train))
        automl = AutoML()
>       automl.fit(
            X_train=X_train[:72],  # a single column of timestamp
            y_train=y_train[:72],  # value for each timestamp
            period=12,  # time horizon to forecast, e.g., 12 months
            task="ts_forecast",
            time_budget=3,  # time budget in seconds
            log_file_name="test/ts_forecast.log",
            n_splits=3,  # number of splits
        )
test/automl/test_forecast.py:126:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
flaml/automl/automl.py:1664: in fit
task.validate_data(
flaml/automl/task/time_series_task.py:166: in validate_data
data = TimeSeriesDataset(
flaml/automl/time_series/ts_data.py:57: in __init__
self.frequency = pd.infer_freq(train_data[time_col].unique())
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
index = array([1.3885344e+09, 1.3912128e+09, 1.3936320e+09, 1.3963104e+09,
1.3989024e+09, 1.4015808e+09, 1.4041728e+09,...8e+09, 1.5593472e+09, 1.5619392e+09, 1.5646176e+09,
1.5672960e+09, 1.5698880e+09, 1.5725664e+09, 1.5751584e+09])
    def infer_freq(
        index: DatetimeIndex | TimedeltaIndex | Series | DatetimeLikeArrayMixin,
    ) -> str | None:
        """
        Infer the most likely frequency given the input index.
        Parameters
        ----------
        index : DatetimeIndex, TimedeltaIndex, Series or array-like
          If passed a Series will use the values of the series (NOT THE INDEX).
        Returns
        -------
        str or None
            None if no discernible frequency.
        Raises
        ------
        TypeError
            If the index is not datetime-like.
        ValueError
            If there are fewer than three values.
        Examples
        --------
        >>> idx = pd.date_range(start='2020/12/01', end='2020/12/30', periods=30)
        >>> pd.infer_freq(idx)
        'D'
        """
        from pandas.core.api import DatetimeIndex
        if isinstance(index, ABCSeries):
            values = index._values
            if not (
                lib.is_np_dtype(values.dtype, "mM")
                or isinstance(values.dtype, DatetimeTZDtype)
                or values.dtype == object
            ):
                raise TypeError(
                    "cannot infer freq from a non-convertible dtype "
                    f"on a Series of {index.dtype}"
                )
            index = values
        inferer: _FrequencyInferer
        if not hasattr(index, "dtype"):
            pass
        elif isinstance(index.dtype, PeriodDtype):
            raise TypeError(
                "PeriodIndex given. Check the `freq` attribute "
                "instead of using infer_freq."
            )
        elif lib.is_np_dtype(index.dtype, "m"):
            # Allow TimedeltaIndex and TimedeltaArray
            inferer = _TimedeltaFrequencyInferer(index)
            return inferer.get_freq()
        elif is_numeric_dtype(index.dtype):
>           raise TypeError(
                f"cannot infer freq from a non-convertible index of dtype {index.dtype}"
            )
E           TypeError: cannot infer freq from a non-convertible index of dtype float64
../../../opt/miniconda3/envs/flaml-test/lib/python3.10/site-packages/pandas/tseries/frequencies.py:148: TypeError
================================================= warnings summary =================================================
test/automl/test_forecast.py::test_numpy
test/automl/test_forecast.py::test_numpy_large
/home/jagukelb/src/experiments/FLAML/flaml/automl/time_series/ts_data.py:121: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.
return pd.concat([self.X_train, self.X_val], axis=0)
test/automl/test_forecast.py::test_numpy_large
/home/jagukelb/src/experiments/FLAML/test/automl/test_forecast.py:158: FutureWarning: 'T' is deprecated and will be removed in a future version, please use 'min' instead.
X_train = pd.date_range("2017-01-01", periods=70000, freq="T")
test/automl/test_forecast.py::test_numpy_large
/home/jagukelb/opt/miniconda3/envs/flaml-test/lib/python3.10/site-packages/prophet/models.py:16: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
import pkg_resources
test/automl/test_forecast.py: 20 warnings
/home/jagukelb/opt/miniconda3/envs/flaml-test/lib/python3.10/site-packages/lightgbm/basic.py:696: UserWarning: Usage of np.ndarray subset (sliced data) is not recommended due to it will double the peak memory cost in LightGBM.
_log_warning("Usage of np.ndarray subset (sliced data) is not recommended "
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============================================= short test summary info ==============================================
FAILED test/automl/test_forecast.py::test_numpy - TypeError: cannot infer freq from a non-convertible index of dtype float64
============================= 1 failed, 1 passed, 7 deselected, 24 warnings in 19.07s ==============================
And here is the output when it works after downgrading pandas:
$ pytest -v test/automl/test_forecast.py -k test_numpy
=============================================== test session starts ================================================
platform linux -- Python 3.10.14, pytest-7.4.0, pluggy-1.0.0 -- /home/jagukelb/opt/miniconda3/envs/flaml-test/bin/python
cachedir: .pytest_cache
rootdir: /home/jagukelb/src/experiments/FLAML
configfile: pyproject.toml
collected 9 items / 7 deselected / 2 selected
test/automl/test_forecast.py::test_numpy PASSED [ 50%]
test/automl/test_forecast.py::test_numpy_large PASSED [100%]
================================================= warnings summary =================================================
test/automl/test_forecast.py::test_numpy
/home/jagukelb/opt/miniconda3/envs/flaml-test/lib/python3.10/site-packages/prophet/models.py:16: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
import pkg_resources
test/automl/test_forecast.py: 22 warnings
/home/jagukelb/opt/miniconda3/envs/flaml-test/lib/python3.10/site-packages/pandas/core/dtypes/cast.py:1641: DeprecationWarning: np.find_common_type is deprecated. Please use `np.result_type` or `np.promote_types`.
See https://numpy.org/devdocs/release/1.25.0-notes.html and the docs for more information. (Deprecated NumPy 1.25)
return np.find_common_type(types, [])
test/automl/test_forecast.py::test_numpy
test/automl/test_forecast.py::test_numpy
test/automl/test_forecast.py::test_numpy
test/automl/test_forecast.py::test_numpy
test/automl/test_forecast.py::test_numpy
test/automl/test_forecast.py::test_numpy
test/automl/test_forecast.py::test_numpy_large
test/automl/test_forecast.py::test_numpy_large
/home/jagukelb/opt/miniconda3/envs/flaml-test/lib/python3.10/site-packages/lightgbm/basic.py:696: UserWarning: Usage of np.ndarray subset (sliced data) is not recommended due to it will double the peak memory cost in LightGBM.
_log_warning("Usage of np.ndarray subset (sliced data) is not recommended "
test/automl/test_forecast.py::test_numpy
test/automl/test_forecast.py::test_numpy
test/automl/test_forecast.py::test_numpy
/home/jagukelb/opt/miniconda3/envs/flaml-test/lib/python3.10/site-packages/statsmodels/tsa/base/tsa_model.py:473: ValueWarning: No frequency information was provided, so inferred frequency MS will be used.
self._init_dates(dates, freq)
-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
================================== 2 passed, 7 deselected, 34 warnings in 20.20s ===================================
pipenv-fails.txt
pipenv-works.txt
A similar error occurs in both the Kaggle and Colab environments.
numpy: 1.25.2
pandas: 2.0.3
flaml: 2.1.2
---------------------------------------------------------------------------
AssertionError Traceback (most recent call last)
<ipython-input-3-97dd957c5800> in <cell line: 13>()
     11 y_train = np.random.random(size=84)
     12 automl = AutoML()
---> 13 automl.fit(
     14     X_train=X_train[:84],  # a single column of timestamp
     15     y_train=y_train,  # value for each timestamp
2 frames
/usr/local/lib/python3.10/dist-packages/flaml/automl/time_series/ts_data.py in __init__(self, train_data, time_col, target_names, time_idx, test_data)
     56
     57         self.frequency = pd.infer_freq(train_data[time_col].unique())
---> 58         assert self.frequency is not None, "Only time series of regular frequency are currently supported."
     59
     60         float_cols = list(train_data.select_dtypes(include=["floating"]).columns)
AssertionError: Only time series of regular frequency are currently supported.
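Both tracebacks in this thread end at the same call, pd.infer_freq(train_data[time_col].unique()). The following is only a minimal sketch of that failure mode outside FLAML, not a proposed patch. It assumes the float64 values shown in the first traceback are epoch seconds (the first one, 1.3885344e+09, corresponds to 2014-01-01), and the helper name infer_freq_from_epoch_seconds is hypothetical. What the raw call does with floats depends on the pandas version, so the sketch records the result instead of asserting it.

```python
import numpy as np
import pandas as pd


def infer_freq_from_epoch_seconds(values):
    """Hypothetical helper: turn epoch seconds into datetimes, then infer the frequency."""
    index = pd.DatetimeIndex(pd.to_datetime(np.asarray(values), unit="s"))
    return pd.infer_freq(index)


# Monthly timestamps like the ones in test_numpy, expressed as float epoch seconds
# (assumption: this mirrors the float64 array shown in the traceback).
months = np.arange("2014-01", "2020-01", dtype="datetime64[M]")
epoch_seconds = months.astype("datetime64[s]").astype("int64").astype("float64")

# Passing the floats directly: pandas 2.2.2 raised TypeError in the report above;
# other versions may return None or a value instead.
try:
    raw_result = pd.infer_freq(epoch_seconds)
except (TypeError, ValueError) as exc:
    raw_result = f"{type(exc).__name__}: {exc}"

# Converting to a datetime-like index first gives infer_freq something it accepts.
freq = infer_freq_from_epoch_seconds(epoch_seconds)

print("raw float input:", raw_result)
print("after conversion:", freq)
```

For month-start data like this, the converted index should yield the same MS frequency that statsmodels reports in the passing run above.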