Getting 【ValueError: Input contains NaN 】 when using ARIMA model on non-null data #516

Closed
theabc50111 opened this issue Sep 6, 2022 · 1 comment

theabc50111 commented Sep 6, 2022

Describe the bug

I get the error ValueError: Input contains NaN when I try to predict the next value of a series using the ARIMA model from pmdarima.

However, the data I use does not contain any null values.

To Reproduce

Run the following code:

import pandas as pd
from pmdarima.arima import ARIMA
tmp_series = pd.Series([0.8867208063423082, 0.4969678051201152, -0.35079875681211814, 0.07156197743204402, 0.6888394890593726, 0.6136916470350972, 0.9020102952782968, 0.38539523911177426, -0.02211092685162178, 0.7051282791422511, -0.21841121961990842, 0.003262841037836234, 0.3970253153400027, 0.8187445259415379, -0.525847439014037, 0.3039480910711944, 0.0279240073596233, 0.8238419467739897, 0.8234157376839023, 0.5897892005398399, 0.8333118174945449])
model_211 = ARIMA(order=(2, 1, 1), out_of_sample_size=0, mle_regression=True, suppress_warnings=True)
model_211.fit(tmp_series[:-1])
print(model_211.predict())
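
As a quick sanity check on the claim that the input holds no missing values, the sketch below scans the same 21 numbers using only the standard library. The names `values` and `non_finite` are illustrative and not part of the original report.

```python
import math

# The 21 values from the report, copied verbatim.
values = [
    0.8867208063423082, 0.4969678051201152, -0.35079875681211814,
    0.07156197743204402, 0.6888394890593726, 0.6136916470350972,
    0.9020102952782968, 0.38539523911177426, -0.02211092685162178,
    0.7051282791422511, -0.21841121961990842, 0.003262841037836234,
    0.3970253153400027, 0.8187445259415379, -0.525847439014037,
    0.3039480910711944, 0.0279240073596233, 0.8238419467739897,
    0.8234157376839023, 0.5897892005398399, 0.8333118174945449,
]

# Indices of any entries that are NaN or infinite.
non_finite = [i for i, v in enumerate(values) if not math.isfinite(v)]

print(len(values), non_finite)
```

An empty list here means every input value is finite, so any NaN must be introduced somewhere after the data is handed to the model.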

Versions

  • I ran this code in a Docker container
    • OS info:
      Distributor ID: Ubuntu
      Description:    Ubuntu 20.04.4 LTS
      Release:        20.04
      Codename:       focal
      
    • pip package info (the complete pip package list is linked here):
/usr/local/lib/python3.8/dist-packages/_distutils_hack/__init__.py:33: UserWarning: Setuptools is replacing distutils.
  warnings.warn("Setuptools is replacing distutils.")

System:
    python: 3.8.10 (default, Jun 22 2022, 20:18:18)  [GCC 9.4.0]
executable: /usr/local/bin/python
   machine: Linux-5.4.0-121-generic-x86_64-with-glibc2.29

Python dependencies:
        pip: 20.2.4
 setuptools: 65.3.0
    sklearn: 1.1.2
statsmodels: 0.13.2
      numpy: 1.19.3
      scipy: 1.9.1
     Cython: 0.29.32
     pandas: 1.3.3
     joblib: 1.1.0
   pmdarima: 1.8.3
Linux-5.4.0-121-generic-x86_64-with-glibc2.29
Python 3.8.10 (default, Jun 22 2022, 20:18:18)
[GCC 9.4.0]
pmdarima 1.8.3
NumPy 1.19.3
SciPy 1.9.1
Scikit-Learn 1.1.2
Statsmodels 0.13.2

Expected Behavior

Output:

[0.31694344 0.33824822 0.37848267 0.37804227 0.37307705 0.37364949
 0.37511297 0.37583506 0.37639757 0.37705751]

Actual Behavior

Error message:

---------------------------------------------------------------------------
ValueError                                Traceback (most recent call last)
Input In [7], in <cell line: 7>()
      5 display(model_211.params())
      6 display(model_211.aic())
----> 7 display(model_211.predict())

File /usr/local/lib/python3.8/dist-packages/pmdarima/arima/arima.py:793, in ARIMA.predict(self, n_periods, X, return_conf_int, alpha, **kwargs)
    790 arima = self.arima_res_
    791 end = arima.nobs + n_periods - 1
--> 793 f, conf_int = _seasonal_prediction_with_confidence(
    794     arima_res=arima,
    795     start=arima.nobs,
    796     end=end,
    797     X=X,
    798     alpha=alpha)
    800 if return_conf_int:
    801     # The confidence intervals may be a Pandas frame if it comes from
    802     # SARIMAX & we want Numpy. We will to duck type it so we don't add
    803     # new explicit requirements for the package
    804     return f, check_array(conf_int, force_all_finite=False)

File /usr/local/lib/python3.8/dist-packages/pmdarima/arima/arima.py:205, in _seasonal_prediction_with_confidence(arima_res, start, end, X, alpha, **kwargs)
    202     conf_int[:, 1] = f + q * np.sqrt(var)
    204 y_pred = check_endog(f, dtype=None, copy=False, preserve_series=True)
--> 205 conf_int = check_array(conf_int, copy=False, dtype=None)
    207 return y_pred, conf_int

File /usr/local/lib/python3.8/dist-packages/sklearn/utils/validation.py:899, in check_array(array, accept_sparse, accept_large_sparse, dtype, order, copy, force_all_finite, ensure_2d, allow_nd, ensure_min_samples, ensure_min_features, estimator, input_name)
    893         raise ValueError(
    894             "Found array with dim %d. %s expected <= 2."
    895             % (array.ndim, estimator_name)
    896         )
    898     if force_all_finite:
--> 899         _assert_all_finite(
    900             array,
    901             input_name=input_name,
    902             estimator_name=estimator_name,
    903             allow_nan=force_all_finite == "allow-nan",
    904         )
    906 if ensure_min_samples > 0:
    907     n_samples = _num_samples(array)

File /usr/local/lib/python3.8/dist-packages/sklearn/utils/validation.py:146, in _assert_all_finite(X, allow_nan, msg_dtype, estimator_name, input_name)
    124         if (
    125             not allow_nan
    126             and estimator_name
   (...)
    130             # Improve the error message on how to handle missing values in
    131             # scikit-learn.
    132             msg_err += (
    133                 f"\n{estimator_name} does not accept missing values"
    134                 " encoded as NaN natively. For supervised learning, you might want"
   (...)
    144                 "#estimators-that-handle-nan-values"
    145             )
--> 146         raise ValueError(msg_err)
    148 # for object dtype data, we only check for NaNs (GH-13254)
    149 elif X.dtype == np.dtype("object") and not allow_nan:

ValueError: Input contains NaN.
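
Reading the trace, the failing `check_array` call at arima.py:205 validates `conf_int`, which line 202 fills from `f`, `q`, and `var`. That suggests the NaN appears in the model's output rather than in the input series. A minimal, hypothetical helper (not part of pmdarima or scikit-learn) for reporting which array holds the non-finite entries could look like the sketch below; the numbers in it are made up purely for illustration.

```python
import math


def non_finite_report(**named_sequences):
    """Map each name to the indices whose values are NaN or infinite.

    Sequences with only finite values are left out of the result.
    """
    report = {}
    for name, seq in named_sequences.items():
        bad = [i for i, v in enumerate(seq) if not math.isfinite(v)]
        if bad:
            report[name] = bad
    return report


# Made-up numbers, only to show the shape of the output.
forecast = [0.31, 0.33, float("nan")]
variance = [0.20, float("nan"), float("nan")]

print(non_finite_report(forecast=forecast, variance=variance))
```

In the failing session, the same helper could be pointed at whatever forecast and variance arrays are obtainable from `model_211.arima_res_`; how to extract them depends on the installed statsmodels version.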

Additional Context

  • Downgrading to the following package versions resolves this error:
    numpy==1.19.3
    pandas==1.3.3
    pmdarima==1.8.3
    
  • I also posted this error on Stack Overflow

tgsmith61591 (Member) commented

I believe this is related to #404, which is possibly related to #492. We believe the underlying issue was fixed upstream in statsmodels, but they have not cut a new release with the fix yet. Closing this as a duplicate; let's continue to track it in #404 (which is still open).
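
Since the fix is expected to arrive with a future statsmodels release, it can help to record exactly which versions are installed before comparing notes in #404. The sketch below uses only `importlib.metadata` from the standard library (available from Python 3.8); the function name `installed_versions` is an illustrative choice, and a package that is not installed is reported as `None`.

```python
from importlib import metadata


def installed_versions(packages):
    """Return {package name: installed version, or None if not installed}."""
    versions = {}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


# The packages named in this report.
print(installed_versions(["pmdarima", "statsmodels", "numpy", "pandas"]))
```

Running this before and after an upgrade or downgrade gives a compact record of what changed between a failing and a working environment.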
