MultinomialNBClassifier not available. #753
Comments
I figured it out, MultinomialNBClassifier needs ints to be passed to it so the solution is to convert everything to with
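For later readers, here is a minimal sketch of one way to do this kind of conversion in MLJ, using `coerce`. Whether this is exactly what was run above is an assumption. The table `X` and its column names `a` and `b` are hypothetical toy data, not the car dataset from the issue.

```julia
using MLJ

# Hypothetical toy table: integer-valued data stored as floats,
# so MLJ sees both columns as Continuous.
X = (a = [1.0, 2.0, 3.0], b = [0.0, 4.0, 1.0])
schema(X)

# Coerce each column to the Count scitype (integers).
# This fails if a column holds values that are not whole numbers.
Xc = coerce(X, :a => Count, :b => Count)
schema(Xc)
```

After this, `scitype(Xc)` should be a table of `Count` columns, which is what the model's `input_scitype` asks for.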
It would be nice to have documentation that spells out how to deal with typing issues like this. I am not sure if this calls for a documentation improvement or a clearer warning message. Querying models is very useful, but it's challenging to figure out why something isn't hooking up as it should.
Thanks for reporting anyway. Yes, your data needs to have the scitype given by the model's input_scitype:

julia> info("MultinomialNBClassifier", pkg="ScikitLearn").input_scitype
Table{_s24} where _s24<:(AbstractArray{_s23,1} where _s23<:Count)

Do you have a specific suggestion how to improve the documentation? The MLJ documentation already has this section. The packages providing the models (ScikitLearn.jl and NaiveBayes.jl) have their own documentation, but we don't really have any control over that.

If you try to use the model with data of the wrong type, you do get an informative message:

X, y = make_moons()
model = (@load MultinomialNBClassifier pkg=ScikitLearn)()
julia> machine(model, X, y)
┌ Warning: The scitype of `X`, in `machine(model, X, ...)` is incompatible with `model=MultinomialNBClassifier @121`:
│ scitype(X) = Table{AbstractArray{Continuous,1}}
│ input_scitype(model) = Table{var"#s45"} where var"#s45"<:(AbstractArray{var"#s13",1} where var"#s13"<:Count).
└ @ MLJBase ~/.julia/packages/MLJBase/pCCd7/src/machines.jl:91
Machine{MultinomialNBClassifier,…} @789 trained 0 times; caches data
args:
1: Source @615 ⏎ `Table{AbstractArray{Continuous,1}}`
    2: Source @897 ⏎ `AbstractArray{Multiclass{2},1}`

Can you think of a way to improve this?
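For anyone decoding the warning above: it compares `scitype(X)` with the model's `input_scitype`. A rough sketch of doing the same comparison by hand, before building a machine, might look like the following. It assumes `Table(Count)` is an acceptable stand-in for the input scitype printed above; the variable names are illustrative only.

```julia
using MLJ

X, y = make_moons()

# Per-column view of what MLJ thinks the data is.
schema(X)

# Stand-in for the model's input_scitype shown in the warning
# (an assumption made so no model code has to be loaded).
fits_count = scitype(X) <: Table(Count)

# The make_moons features are floats, so they match Continuous instead.
fits_continuous = scitype(X) <: Table(Continuous)
```

Here `fits_count` comes out `false`, which is the incompatibility the warning is reporting.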
Documentation is a hard thing to get right, because we have to consider the skill level and familiarity of the user to get the right level of information. At the start, I struggled with understanding the error messages, but then I realized what they were trying to tell me and they did make sense. This is my first time using Julia as well as machine learning, so I had the challenge of understanding Julia type errors and machine learning too.

This error message is informative if you know how to read it. That's not something a new user like myself can do quickly without some pain. For older languages and libraries like Python and sklearn, you can google things and find a nice StackOverflow answer explaining the error and how to fix it. But MLJ and Julia are much newer and there aren't many questions (https://stackoverflow.com/search?q=MLJ+Julia). For "Julia MLJ" I get 4 results. For "Python Sklearn", I get 4,300 results. Being able to search StackOverflow for error messages makes a huge difference for beginners. I am not sure if there is anything that Julia authors can do other than write documentation and hope for the best, though.

But one thing that can be done is a clearer overview of typing and why it's important to the project. The one thing that really helped me figure out how to use MLJ is querying models. It would be nice to have a more complete version of the "working with categorical data" section, where you take data and transform it for use with different models. Right now the documentation covers the pieces but not a complete guide on how to put them together. It can be confusing to figure out how to string functionality together when you aren't familiar with the library. For example, if you wanted to use a Neural Network with categorical data, you have to transform the data into continuous types, and it isn't clear how or why to do that in the documentation.

A second major issue is that measures are basically undocumented. There is no explanation of how to use them or what to do if things aren't going well. For example, with things like Neural Networks, you can't get After combing through the documentation I found out you can just do There isn't much guidance on how to do the same querying on measures instead of models. While the documentation does have a nice Overall, I think a page explaining the design and how MLJ wraps things
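On the Neural Network example mentioned in this comment: one possible way to turn categorical columns into continuous ones is MLJ's built-in `ContinuousEncoder`, which one-hot encodes `Multiclass` columns. This is only a sketch on hypothetical toy data (the `color` and `size` columns are made up), not a recommendation from the maintainers.

```julia
using MLJ

# Hypothetical toy table: one categorical column, one continuous column.
X = (color = coerce(["red", "blue", "red"], Multiclass),
     size  = [1.0, 2.5, 3.0])

# ContinuousEncoder ships with MLJ, so no @load is needed.
mach = machine(ContinuousEncoder(), X)
fit!(mach, verbosity=0)

# Multiclass columns are one-hot encoded; Continuous columns pass through.
Xcont = transform(mach, X)
schema(Xcont)
```

The resulting table has only `Continuous` columns, so it can be passed to models whose `input_scitype` is a table of `Continuous` features.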
@f0lie thanks indeed for taking the time to give such detailed feedback. Very much appreciated. Creating a link to it here.
Indeed. And one would often want to devote finite resources elsewhere 😄
Good point.
Good idea! JuliaAI/MLJBase.jl#529 FYI. I think some of the early "data ingestion" stuff is covered in this workshop: https://github.com/ablaom/MachineLearningInJulia2020 or at https://alan-turing-institute.github.io/DataScienceTutorials.jl/ . But I will think about how to include this better in the manual. Again, many thanks.
And generally a lot of Julia questions get posted and answered on Julia Discourse: https://discourse.julialang.org
Describe the bug
Some classifiers are not showing up for some reason.
To Reproduce
The car data is from here. https://archive.ics.uci.edu/ml/datasets/Car+Evaluation
Expected behavior
Clearly MultinomialNBClassifier is supposed to be here. There is probably some way to use OneHotEncoder or something to transform the data to get it to work, but it's impossible for me to figure that out from the documentation.
Versions
The latest version in the pkg.