Error: Perfect separation detected, results not available #29

Open
wiekern opened this issue Nov 8, 2019 · 3 comments

Comments


wiekern commented Nov 8, 2019

Hi,
I hit the error described in the title when invoking fit_scores(). My data structure is shown below:
image

I drew 2,000 samples for test and 20,000 for control to fit the matcher, but I have no clue why this error occurs (I have looked into the source code). The example code for loan.csv ran successfully, so I wonder whether the fields of the data should be integers rather than strings. That said, the loan example data contains strings as well, see below:
image

Hope anyone can help, thanks!

@mark-mediware

@wiekern Not sure if it helps you, but I had similar errors and was pretty stuck. After some basic data analysis, I realized a few of my input variables were very unevenly distributed across groups (e.g., a binary age bin with 10,000 rows = 0 and 5 rows = 1). After removing these variables/features, the errors went away.

Again, not sure if that's applicable to you, but that was my (embarrassing) issue.
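
The check described above can be sketched in plain Python. This is only an illustration: the column names `age_bin` and `treated`, the `min_count` threshold, and the data are hypothetical, and the counts simply mirror the 10,000 vs. 5 example. A feature level that is missing or nearly missing in one group is a common source of separation problems, so it is worth listing such cells before fitting.

```python
from collections import Counter

def sparse_levels(rows, feature, group, min_count=10):
    """Return (level, group_value, count) for every cell with fewer than min_count rows."""
    counts = Counter((row[feature], row[group]) for row in rows)
    levels = sorted({row[feature] for row in rows})
    groups = sorted({row[group] for row in rows})
    flagged = []
    for level in levels:
        for g in groups:
            n = counts.get((level, g), 0)
            if n < min_count:
                flagged.append((level, g, n))
    return flagged

# Hypothetical data: 10,000 rows with age_bin = 0 (split evenly across groups)
# and 5 rows with age_bin = 1, all in the treated group.
rows = [{"age_bin": 0, "treated": i % 2} for i in range(10_000)]
rows += [{"age_bin": 1, "treated": 1} for _ in range(5)]

flagged = sparse_levels(rows, "age_bin", "treated")
print(flagged)
```

With pandas, `pd.crosstab` on the feature and group columns would give a similar table of counts.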


wiekern commented Jan 14, 2020

Thanks for your answer!
In my view, the distribution might not be the problem. I am wondering whether the regression model supports string input, like the "text" column in my case. I think I must convert the text into numeric values or word embeddings (vectors).
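
To make the string question concrete, here is a minimal sketch of indicator (one-hot) encoding in plain Python. It assumes the model is fit through a formula interface that expands string columns into indicator columns (patsy-style formulas typically do this); whether that applies here is an assumption. The `one_hot` helper and the sample values are hypothetical. The point is that a low-cardinality string column yields a few indicators, while a free-text column yields roughly one indicator per row, which lets the model identify each row exactly and can lead to perfect separation.

```python
def one_hot(values, drop_first=True):
    """Encode a list of strings as 0/1 indicator columns, one per distinct level."""
    levels = sorted(set(values))
    kept = levels[1:] if drop_first else levels  # drop one level as the baseline
    matrix = [[1 if v == level else 0 for level in kept] for v in values]
    return kept, matrix

# Low-cardinality column: two indicators for three levels.
grades = ["A", "B", "A", "C"]
cols, X = one_hot(grades)

# Free-text column (hypothetical values): every row is its own level,
# so there are as many indicator columns as rows.
texts = ["great product", "too slow", "works fine", "never again"]
text_cols, text_X = one_hot(texts, drop_first=False)

print(cols, X)
print(len(text_cols), "indicator columns for", len(texts), "rows")
```

If the "text" column behaves like the second case, dropping it or replacing it with a small number of numeric features before fitting may be worth trying.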

@umangdadhaniya

model = sm.logit('Result ~ Year + Amount_Spent + Popularity_Rank', data = train_data).fit()
Traceback (most recent call last):

File "", line 1, in
model = sm.logit('Result ~ Year + Amount_Spent + Popularity_Rank', data = train_data).fit()

File "C:\Users\UMANG\anaconda3\lib\site-packages\statsmodels\discrete\discrete_model.py", line 1963, in fit
bnryfit = super().fit(start_params=start_params,

File "C:\Users\UMANG\anaconda3\lib\site-packages\statsmodels\discrete\discrete_model.py", line 227, in fit
mlefit = super().fit(start_params=start_params,

File "C:\Users\UMANG\anaconda3\lib\site-packages\statsmodels\base\model.py", line 519, in fit
xopt, retvals, optim_settings = optimizer._fit(f, score, start_params,

File "C:\Users\UMANG\anaconda3\lib\site-packages\statsmodels\base\optimizer.py", line 215, in _fit
xopt, retvals = func(objective, gradient, start_params, fargs, kwargs,

File "C:\Users\UMANG\anaconda3\lib\site-packages\statsmodels\base\optimizer.py", line 327, in _fit_newton
callback(newparams)

File "C:\Users\UMANG\anaconda3\lib\site-packages\statsmodels\discrete\discrete_model.py", line 211, in _check_perfect_pred
raise PerfectSeparationError(msg)

PerfectSeparationError: Perfect separation detected, results not available
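
One quick way to look for the cause of this error is to test whether any single predictor splits the outcome without error. The sketch below is plain Python and uses made-up values; only the column names `Year`, `Amount_Spent`, and `Result` come from the formula above, and `Result` is assumed to be coded 0/1. It checks one variable at a time, so it will not catch separation that only appears through a combination of predictors.

```python
def separates(xs, ys):
    """True if some threshold on xs splits the 0/1 labels ys with no errors."""
    zeros = [x for x, y in zip(xs, ys) if y == 0]
    ones = [x for x, y in zip(xs, ys) if y == 1]
    if not zeros or not ones:
        return True  # only one class present, so the outcome is trivially separated
    return max(zeros) < min(ones) or max(ones) < min(zeros)

# Hypothetical training data, not taken from the thread.
data = {
    "Year": [2015, 2016, 2017, 2018, 2019, 2020],
    "Amount_Spent": [10.0, 40.0, 11.0, 30.0, 28.0, 35.0],
    "Result": [0, 0, 0, 1, 1, 1],
}

report = {name: separates(data[name], data["Result"])
          for name in ("Year", "Amount_Spent")}
print(report)
```

If a predictor is flagged, possible next steps include dropping or coarsening it, collecting more data, or trying a penalized fit such as statsmodels' `fit_regularized`.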

