Several of the official examples and tests currently error out due to incompatibility

automl broken with pandas 2 about flaml HOT 4 OPEN

jgukelberger commented on September 27, 2024

automl broken with pandas 2

Comments (4)

sonichi commented on September 27, 2024

Thanks. Feel free to make a PR.
cc @thinkall

thinkall commented on September 27, 2024

Thanks @jgukelberger, @sonichi. I don't see the issue with pandas 2.0.3 or 2.2.2.

numpy 1.24.3
pandas 2.2.2 / 2.0.3
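
Since the thread turns on which package versions are installed, a small script can make such reports easier to compare. This is a minimal sketch using only the standard library; the helper name `report_versions` is hypothetical, and it assumes the packages are registered under the distribution names `numpy`, `pandas`, and `flaml`.

```python
from importlib import metadata


def report_versions(packages=("numpy", "pandas", "flaml")):
    """Return a dict mapping each package name to its installed version.

    Packages that are not installed map to None instead of raising.
    """
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in report_versions().items():
        print(f"{name}: {version or 'not installed'}")
```

Running it in both the working and the failing environment gives a short list to paste alongside the full pip list output.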

jgukelberger commented on September 27, 2024

That's interesting @sonichi. Attached is the full pip list output for the environment I'm seeing these errors in. The errors are fixed with pip install "pandas<2".

The only difference between the two environments is pandas:

$ diff pipenv-works.txt pipenv-fails.txt
31c31
< pandas                1.5.3
---
> pandas                2.2.2

Here's the full output of a failing test case:

$ pytest -v test/automl/test_forecast.py -k test_numpy
=============================================== test session starts ================================================
platform linux -- Python 3.10.14, pytest-7.4.0, pluggy-1.0.0 -- /home/jagukelb/opt/miniconda3/envs/flaml-test/bin/python
cachedir: .pytest_cache
rootdir: /home/jagukelb/src/experiments/FLAML
configfile: pyproject.toml
collected 9 items / 7 deselected / 2 selected

test/automl/test_forecast.py::test_numpy FAILED                                                              [ 50%]
test/automl/test_forecast.py::test_numpy_large PASSED                                                        [100%]

===================================================== FAILURES =====================================================
____________________________________________________ test_numpy ____________________________________________________

    def test_numpy():
        X_train = np.arange("2014-01", "2021-01", dtype="datetime64[M]")
        y_train = np.random.random(size=len(X_train))
        automl = AutoML()
>       automl.fit(
            X_train=X_train[:72],  # a single column of timestamp
            y_train=y_train[:72],  # value for each timestamp
            period=12,  # time horizon to forecast, e.g., 12 months
            task="ts_forecast",
            time_budget=3,  # time budget in seconds
            log_file_name="test/ts_forecast.log",
            n_splits=3,  # number of splits
        )

test/automl/test_forecast.py:126:
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _
flaml/automl/automl.py:1664: in fit
    task.validate_data(
flaml/automl/task/time_series_task.py:166: in validate_data
    data = TimeSeriesDataset(
flaml/automl/time_series/ts_data.py:57: in __init__
    self.frequency = pd.infer_freq(train_data[time_col].unique())
_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _

index = array([1.3885344e+09, 1.3912128e+09, 1.3936320e+09, 1.3963104e+09,
       1.3989024e+09, 1.4015808e+09, 1.4041728e+09,...8e+09, 1.5593472e+09, 1.5619392e+09, 1.5646176e+09,
       1.5672960e+09, 1.5698880e+09, 1.5725664e+09, 1.5751584e+09])

    def infer_freq(
        index: DatetimeIndex | TimedeltaIndex | Series | DatetimeLikeArrayMixin,
    ) -> str | None:
        """
        Infer the most likely frequency given the input index.

        Parameters
        ----------
        index : DatetimeIndex, TimedeltaIndex, Series or array-like
          If passed a Series will use the values of the series (NOT THE INDEX).

        Returns
        -------
        str or None
            None if no discernible frequency.

        Raises
        ------
        TypeError
            If the index is not datetime-like.
        ValueError
            If there are fewer than three values.

        Examples
        --------
        >>> idx = pd.date_range(start='2020/12/01', end='2020/12/30', periods=30)
        >>> pd.infer_freq(idx)
        'D'
        """
        from pandas.core.api import DatetimeIndex

        if isinstance(index, ABCSeries):
            values = index._values
            if not (
                lib.is_np_dtype(values.dtype, "mM")
                or isinstance(values.dtype, DatetimeTZDtype)
                or values.dtype == object
            ):
                raise TypeError(
                    "cannot infer freq from a non-convertible dtype "
                    f"on a Series of {index.dtype}"
                )
            index = values

        inferer: _FrequencyInferer

        if not hasattr(index, "dtype"):
            pass
        elif isinstance(index.dtype, PeriodDtype):
            raise TypeError(
                "PeriodIndex given. Check the `freq` attribute "
                "instead of using infer_freq."
            )
        elif lib.is_np_dtype(index.dtype, "m"):
            # Allow TimedeltaIndex and TimedeltaArray
            inferer = _TimedeltaFrequencyInferer(index)
            return inferer.get_freq()

        elif is_numeric_dtype(index.dtype):
>           raise TypeError(
                f"cannot infer freq from a non-convertible index of dtype {index.dtype}"
            )
E           TypeError: cannot infer freq from a non-convertible index of dtype float64

../../../opt/miniconda3/envs/flaml-test/lib/python3.10/site-packages/pandas/tseries/frequencies.py:148: TypeError
================================================= warnings summary =================================================
test/automl/test_forecast.py::test_numpy
test/automl/test_forecast.py::test_numpy_large
  /home/jagukelb/src/experiments/FLAML/flaml/automl/time_series/ts_data.py:121: FutureWarning: The behavior of DataFrame concatenation with empty or all-NA entries is deprecated. In a future version, this will no longer exclude empty or all-NA columns when determining the result dtypes. To retain the old behavior, exclude the relevant entries before the concat operation.
    return pd.concat([self.X_train, self.X_val], axis=0)

test/automl/test_forecast.py::test_numpy_large
  /home/jagukelb/src/experiments/FLAML/test/automl/test_forecast.py:158: FutureWarning: 'T' is deprecated and will be removed in a future version, please use 'min' instead.
    X_train = pd.date_range("2017-01-01", periods=70000, freq="T")

test/automl/test_forecast.py::test_numpy_large
  /home/jagukelb/opt/miniconda3/envs/flaml-test/lib/python3.10/site-packages/prophet/models.py:16: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
    import pkg_resources

test/automl/test_forecast.py: 20 warnings
  /home/jagukelb/opt/miniconda3/envs/flaml-test/lib/python3.10/site-packages/lightgbm/basic.py:696: UserWarning: Usage of np.ndarray subset (sliced data) is not recommended due to it will double the peak memory cost in LightGBM.
    _log_warning("Usage of np.ndarray subset (sliced data) is not recommended "

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
============================================= short test summary info ==============================================
FAILED test/automl/test_forecast.py::test_numpy - TypeError: cannot infer freq from a non-convertible index of dtype float64
============================= 1 failed, 1 passed, 7 deselected, 24 warnings in 19.07s ==============================
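
The traceback above shows `pd.infer_freq` receiving a float64 array whose values look like epoch seconds (1.3885344e+09 corresponds to 2014-01-01). The following is a sketch of one possible workaround, not the fix adopted by FLAML: convert numeric values back to datetimes before inferring the frequency. The helper name `infer_freq_from_epoch_seconds` is hypothetical, and treating numeric input as seconds since the Unix epoch is an assumption based on the values printed in the traceback.

```python
import numpy as np
import pandas as pd


def infer_freq_from_epoch_seconds(values):
    """Infer a frequency string, converting numeric epoch seconds first."""
    values = np.asarray(values)
    if values.dtype.kind in "iuf":
        # Assumption: integer/float input holds seconds since the Unix epoch.
        index = pd.to_datetime(values, unit="s")
    else:
        # Already datetime-like: wrap it so infer_freq sees a DatetimeIndex.
        index = pd.DatetimeIndex(values)
    return pd.infer_freq(index)


# Monthly timestamps, stored as float epoch seconds like the array in the traceback.
months = np.arange("2014-01", "2020-01", dtype="datetime64[M]")
epoch_seconds = months.astype("datetime64[s]").astype("float64")

print(infer_freq_from_epoch_seconds(epoch_seconds))  # MS
```

Passing the raw float array straight to `pd.infer_freq` is what triggers the `TypeError` in the failing run; the conversion step avoids that branch.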

And here when it works after downgrading pandas:

$ pytest -v test/automl/test_forecast.py -k test_numpy
=============================================== test session starts ================================================
platform linux -- Python 3.10.14, pytest-7.4.0, pluggy-1.0.0 -- /home/jagukelb/opt/miniconda3/envs/flaml-test/bin/python
cachedir: .pytest_cache
rootdir: /home/jagukelb/src/experiments/FLAML
configfile: pyproject.toml
collected 9 items / 7 deselected / 2 selected

test/automl/test_forecast.py::test_numpy PASSED                                                              [ 50%]
test/automl/test_forecast.py::test_numpy_large PASSED                                                        [100%]

================================================= warnings summary =================================================
test/automl/test_forecast.py::test_numpy
  /home/jagukelb/opt/miniconda3/envs/flaml-test/lib/python3.10/site-packages/prophet/models.py:16: DeprecationWarning: pkg_resources is deprecated as an API. See https://setuptools.pypa.io/en/latest/pkg_resources.html
    import pkg_resources

test/automl/test_forecast.py: 22 warnings
  /home/jagukelb/opt/miniconda3/envs/flaml-test/lib/python3.10/site-packages/pandas/core/dtypes/cast.py:1641: DeprecationWarning: np.find_common_type is deprecated.  Please use `np.result_type` or `np.promote_types`.
  See https://numpy.org/devdocs/release/1.25.0-notes.html and the docs for more information.  (Deprecated NumPy 1.25)
    return np.find_common_type(types, [])

test/automl/test_forecast.py::test_numpy
test/automl/test_forecast.py::test_numpy
test/automl/test_forecast.py::test_numpy
test/automl/test_forecast.py::test_numpy
test/automl/test_forecast.py::test_numpy
test/automl/test_forecast.py::test_numpy
test/automl/test_forecast.py::test_numpy_large
test/automl/test_forecast.py::test_numpy_large
  /home/jagukelb/opt/miniconda3/envs/flaml-test/lib/python3.10/site-packages/lightgbm/basic.py:696: UserWarning: Usage of np.ndarray subset (sliced data) is not recommended due to it will double the peak memory cost in LightGBM.
    _log_warning("Usage of np.ndarray subset (sliced data) is not recommended "

test/automl/test_forecast.py::test_numpy
test/automl/test_forecast.py::test_numpy
test/automl/test_forecast.py::test_numpy
  /home/jagukelb/opt/miniconda3/envs/flaml-test/lib/python3.10/site-packages/statsmodels/tsa/base/tsa_model.py:473: ValueWarning: No frequency information was provided, so inferred frequency MS will be used.
    self._init_dates(dates, freq)

-- Docs: https://docs.pytest.org/en/stable/how-to/capture-warnings.html
================================== 2 passed, 7 deselected, 34 warnings in 20.20s ===================================

pipenv-fails.txt
pipenv-works.txt

yareyaredesuyo commented on September 27, 2024

A similar error occurs in both the Kaggle and Colab environments.

numpy: 1.25.2
pandas: 2.0.3
flaml: 2.1.2

1.25.2
2.0.3
2.1.2
---------------------------------------------------------------------------
AssertionError                            Traceback (most recent call last)
<ipython-input-3-97dd957c5800> in <cell line: 13>()
     11 y_train = np.random.random(size=84)
     12 automl = AutoML()
---> 13 automl.fit(
     14     X_train=X_train[:84],  # a single column of timestamp
     15     y_train=y_train,  # value for each timestamp

2 frames
/usr/local/lib/python3.10/dist-packages/flaml/automl/time_series/ts_data.py in __init__(self, train_data, time_col, target_names, time_idx, test_data)
     56 
     57         self.frequency = pd.infer_freq(train_data[time_col].unique())
---> 58         assert self.frequency is not None, "Only time series of regular frequency are currently supported."
     59 
     60         float_cols = list(train_data.select_dtypes(include=["floating"]).columns)

AssertionError: Only time series of regular frequency are currently supported.
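
The assertion in the traceback above fires whenever `pd.infer_freq` returns `None`. As a rough sketch, the same check can be run on a time column before calling `automl.fit`. The helper name `has_regular_frequency` is hypothetical, and the example does not explain why regular monthly data fails in the reported environment; it only shows how the return value behaves for regular and irregular timestamps.

```python
import pandas as pd


def has_regular_frequency(timestamps):
    """Return True if pandas can infer a single frequency for the timestamps."""
    index = pd.DatetimeIndex(pd.to_datetime(timestamps)).unique()
    if len(index) < 3:
        # pd.infer_freq raises ValueError with fewer than three values.
        return False
    return pd.infer_freq(index) is not None


regular = pd.date_range("2014-01-01", periods=12, freq="MS")
irregular = ["2020-01-01", "2020-01-02", "2020-01-04", "2020-01-09"]

print(has_regular_frequency(regular))    # True
print(has_regular_frequency(irregular))  # False
```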
