Gaussian Mixture Model capability #359

Open
cbrautigam2 opened this issue Feb 14, 2024 · 4 comments · May be fixed by #369
Labels
question General question

Comments

@cbrautigam2

Hi,

I need to port some Matlab code to Java and I'm looking at what is out there in Java land that can do Gaussian Mixture Models (GMMs). Specifically, the code I have to port makes heavy use of Matlab's gmdistribution https://www.mathworks.com/help/stats/gmdistribution.html and fitgmdist https://www.mathworks.com/help/stats/fitgmdist.html. I see that Tribuo alludes to Gaussian mixtures in the KMeans tutorial: https://tribuo.org/learn/4.3/tutorials/clustering-tribuo-v4.html. So maybe this would suffice? I'm definitely not a mathematician, but I'm trying to see whether Tribuo can do GMMs like these Matlab functions. It appears that Matlab supports two covariance types, 'full' and 'diagonal'.

Can you please elaborate on Tribuo's capabilities in regards to GMMs?
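For readers less familiar with the two covariance types: a 'diagonal' covariance is just a 'full' covariance matrix whose off-diagonal entries are zero, so one density routine covers both. Below is a minimal plain-Java sketch of the mixture density that Matlab's `pdf(gmdistribution, x)` evaluates. It uses no Tribuo classes, and the names (`GmmDensity`, `logNormal`, `pdf`, `diag`) are hypothetical.

```java
/**
 * Sketch: density of a Gaussian mixture, roughly what pdf(gmdistribution, x) computes in Matlab.
 * Plain Java with no Tribuo classes. A 'diagonal' covariance is a 'full' one with zero off-diagonals.
 */
public final class GmmDensity {

    /** Lower-triangular Cholesky factor L with cov = L * L^T; throws if cov is not positive definite. */
    public static double[][] cholesky(double[][] cov) {
        int d = cov.length;
        double[][] l = new double[d][d];
        for (int i = 0; i < d; i++) {
            for (int j = 0; j <= i; j++) {
                double sum = cov[i][j];
                for (int k = 0; k < j; k++) {
                    sum -= l[i][k] * l[j][k];
                }
                if (i == j) {
                    if (sum <= 0.0) {
                        throw new IllegalArgumentException("covariance is not positive definite");
                    }
                    l[i][i] = Math.sqrt(sum);
                } else {
                    l[i][j] = sum / l[j][j];
                }
            }
        }
        return l;
    }

    /** log N(x | mean, cov), using the Cholesky factor for both the log-determinant and the Mahalanobis term. */
    public static double logNormal(double[] x, double[] mean, double[][] cov) {
        int d = x.length;
        double[][] l = cholesky(cov);
        double[] z = new double[d]; // solves L z = x - mean, so |z|^2 is the squared Mahalanobis distance
        double logDet = 0.0;
        double maha = 0.0;
        for (int i = 0; i < d; i++) {
            double v = x[i] - mean[i];
            for (int k = 0; k < i; k++) {
                v -= l[i][k] * z[k];
            }
            z[i] = v / l[i][i];
            maha += z[i] * z[i];
            logDet += 2.0 * Math.log(l[i][i]);
        }
        return -0.5 * (d * Math.log(2.0 * Math.PI) + logDet + maha);
    }

    /** Mixture density: sum over k of weights[k] * N(x | means[k], covs[k]). */
    public static double pdf(double[] x, double[] weights, double[][] means, double[][][] covs) {
        double total = 0.0;
        for (int k = 0; k < weights.length; k++) {
            total += weights[k] * Math.exp(logNormal(x, means[k], covs[k]));
        }
        return total;
    }

    /** Diagonal covariance matrix built from per-dimension variances. */
    public static double[][] diag(double... variances) {
        double[][] cov = new double[variances.length][variances.length];
        for (int i = 0; i < variances.length; i++) {
            cov[i][i] = variances[i];
        }
        return cov;
    }

    public static void main(String[] args) {
        double[] weights = {0.3, 0.7};
        double[][] means = {{0.0, 0.0}, {3.0, 3.0}};
        double[][][] full = {{{1.0, 0.5}, {0.5, 2.0}}, {{1.0, -0.3}, {-0.3, 1.0}}};
        double[][][] diagonal = {diag(1.0, 2.0), diag(1.0, 1.0)};
        double[] x = {1.0, 1.0};
        System.out.println("full:     " + pdf(x, weights, means, full));
        System.out.println("diagonal: " + pdf(x, weights, means, diagonal));
    }
}
```

With a diagonal covariance the log-density reduces to a sum of independent univariate log-densities, which is why the diagonal type is cheaper to fit and needs fewer parameters.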

@cbrautigam2 cbrautigam2 added the question General question label Feb 14, 2024
@Craigacp
Member

Craigacp commented Feb 14, 2024

Tribuo doesn't have an implementation of fitting GMMs. We have a data generator that can sample from them to generate example data, but it can't fit that generator to a dataset. The data generator is roughly analogous to the gmdistribution function, but it's pretty limited in the number of Gaussians it supports. Building a more flexible version with the functionality of gmdistribution isn't too hard on top of what we provide (e.g. MultivariateNormalDistribution).
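A minimal sketch of the sampling side of gmdistribution, assuming plain Java rather than Tribuo's MultivariateNormalDistribution (whose constructor isn't reproduced here): pick a component with probability proportional to its weight, then draw `mean + L z`, where `L` is the Cholesky factor of that component's covariance and `z` is a vector of independent standard normals. `GmmSampler` is a hypothetical name.

```java
import java.util.Random;

/**
 * Sketch: sampling from a Gaussian mixture, roughly what random(gmdistribution, n) does in Matlab.
 * Plain Java; it deliberately does not call Tribuo's MultivariateNormalDistribution.
 */
public final class GmmSampler {
    private final double[] cumulativeWeights;
    private final double[][] means;
    private final double[][][] choleskyFactors;

    public GmmSampler(double[] weights, double[][] means, double[][][] covs) {
        this.cumulativeWeights = new double[weights.length];
        double running = 0.0;
        for (int k = 0; k < weights.length; k++) {
            running += weights[k];
            cumulativeWeights[k] = running;
        }
        this.means = means;
        this.choleskyFactors = new double[covs.length][][];
        for (int k = 0; k < covs.length; k++) {
            choleskyFactors[k] = cholesky(covs[k]);
        }
    }

    /** Lower-triangular L with cov = L * L^T (assumes cov is symmetric positive definite). */
    static double[][] cholesky(double[][] cov) {
        int d = cov.length;
        double[][] l = new double[d][d];
        for (int i = 0; i < d; i++) {
            for (int j = 0; j <= i; j++) {
                double sum = cov[i][j];
                for (int k = 0; k < j; k++) {
                    sum -= l[i][k] * l[j][k];
                }
                l[i][j] = (i == j) ? Math.sqrt(sum) : sum / l[j][j];
            }
        }
        return l;
    }

    /** One draw: pick component k with probability proportional to its weight, then return mean_k + L_k z. */
    public double[] sample(Random rng) {
        double u = rng.nextDouble() * cumulativeWeights[cumulativeWeights.length - 1];
        int k = 0;
        while (k < cumulativeWeights.length - 1 && u >= cumulativeWeights[k]) {
            k++;
        }
        double[][] l = choleskyFactors[k];
        int d = means[k].length;
        double[] z = new double[d];
        for (int i = 0; i < d; i++) {
            z[i] = rng.nextGaussian(); // independent standard normals
        }
        double[] x = means[k].clone();
        for (int i = 0; i < d; i++) {
            for (int j = 0; j <= i; j++) {
                x[i] += l[i][j] * z[j];
            }
        }
        return x;
    }

    public static void main(String[] args) {
        GmmSampler sampler = new GmmSampler(
                new double[]{0.25, 0.75},
                new double[][]{{-2.0, 0.0}, {2.0, 1.0}},
                new double[][][]{{{1.0, 0.5}, {0.5, 1.0}}, {{1.0, 0.0}, {0.0, 0.5}}});
        Random rng = new Random(42);
        for (int n = 0; n < 5; n++) {
            double[] s = sampler.sample(rng);
            System.out.printf("%.3f %.3f%n", s[0], s[1]);
        }
    }
}
```

The Cholesky factors are computed once in the constructor, so each draw only costs a triangular matrix-vector product.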

Implementing a basic EM algorithm to fit a GMM like fitgmdist wouldn't be too hard, as we have the Cholesky factorization, which is used in the M step, but making something scalable requires more effort (as our matrix algebra library isn't parallel yet).
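To make the E and M steps concrete, here is a minimal sketch of EM for a one-dimensional mixture. It is not Tribuo's implementation and not fitgmdist: the caller supplies the starting means, every component starts with weight 1/K and variance 1, there is no convergence check, and `Em1d` is a hypothetical name. In the multivariate case the variances become covariance matrices, which is where a Cholesky factorization comes in.

```java
import java.util.Arrays;
import java.util.Random;

/**
 * Sketch of expectation-maximization (EM) for a one-dimensional Gaussian mixture.
 * Not Tribuo's implementation and not fitgmdist; runs a fixed number of iterations.
 */
public final class Em1d {
    public final double[] weights;
    public final double[] means;
    public final double[] variances;

    public Em1d(double[] initialMeans) {
        int k = initialMeans.length;
        this.means = initialMeans.clone();
        this.weights = new double[k];
        this.variances = new double[k];
        Arrays.fill(weights, 1.0 / k);
        Arrays.fill(variances, 1.0);
    }

    static double normalPdf(double x, double mean, double var) {
        double diff = x - mean;
        return Math.exp(-0.5 * diff * diff / var) / Math.sqrt(2.0 * Math.PI * var);
    }

    public void fit(double[] data, int iterations) {
        int n = data.length;
        int k = means.length;
        double[][] resp = new double[n][k];
        for (int iter = 0; iter < iterations; iter++) {
            // E step: resp[i][j] = posterior probability that point i came from component j.
            for (int i = 0; i < n; i++) {
                double total = 0.0;
                for (int j = 0; j < k; j++) {
                    resp[i][j] = weights[j] * normalPdf(data[i], means[j], variances[j]);
                    total += resp[i][j];
                }
                for (int j = 0; j < k; j++) {
                    resp[i][j] /= total;
                }
            }
            // M step: responsibility-weighted estimates of each component's weight, mean and variance.
            for (int j = 0; j < k; j++) {
                double nj = 0.0;
                double sum = 0.0;
                for (int i = 0; i < n; i++) {
                    nj += resp[i][j];
                    sum += resp[i][j] * data[i];
                }
                double mean = sum / nj;
                double sq = 0.0;
                for (int i = 0; i < n; i++) {
                    double d = data[i] - mean;
                    sq += resp[i][j] * d * d;
                }
                weights[j] = nj / n;
                means[j] = mean;
                variances[j] = Math.max(sq / nj, 1e-6); // floor keeps a component from collapsing onto one point
            }
        }
    }

    public static void main(String[] args) {
        Random rng = new Random(7);
        double[] data = new double[1000];
        for (int i = 0; i < data.length; i++) {
            data[i] = (i < 500 ? -5.0 : 5.0) + rng.nextGaussian();
        }
        Em1d gmm = new Em1d(new double[]{-1.0, 1.0});
        gmm.fit(data, 50);
        System.out.println(Arrays.toString(gmm.means) + " " + Arrays.toString(gmm.variances));
    }
}
```

Working with raw densities is fine for well-separated toy data like this; a more robust version would compute the responsibilities in log space (log-sum-exp) so that points far from every component don't underflow to 0/0.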

@Craigacp
Member

I've written a GMM implementation which is currently being debugged. Do you only need the gmdistribution functionality applied to a distribution fit on data, or do you also want to be able to sample from a mixture distribution that you've created by hand?

@cbrautigam2
Author

cbrautigam2 commented Apr 29, 2024 via email

@Craigacp
Member

Ok. You'll be able to save the model and reuse it for future predictions, but extracting a distribution object like MultivariateNormalDistribution back out of it will be a little complicated. The dimensions of the samples are based on Tribuo's feature dimensions, which are named rather than indexed, and getting the index is a little more work. I've thought about it a bit more today, and I think I will add a MixtureDistribution class and try to add a distributions interface, but the sampling method will likely be exposed on both MixtureDistribution and GaussianMixtureModel.
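One possible shape for the design described above, written as a hypothetical sketch: every name here (`SampledDistribution`, `DiagonalGaussian`, `Mixture`, `FittedModel`, `sampleNamed`) is invented for illustration and is not Tribuo's API. The mixture implements the same sampling interface as its components, and the model keeps the list of feature names so that index `i` of a sample can be mapped back to a named feature.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;
import java.util.Random;

/** Hypothetical design sketch only: none of these names are Tribuo APIs. */
public final class MixtureSketch {

    /** Hypothetical common interface for anything that can draw samples. */
    public interface SampledDistribution {
        double[] sample(Random rng);
    }

    /** Gaussian with diagonal covariance, parameterized by per-dimension standard deviations. */
    public static final class DiagonalGaussian implements SampledDistribution {
        private final double[] mean;
        private final double[] stdDevs;

        public DiagonalGaussian(double[] mean, double[] stdDevs) {
            if (mean.length != stdDevs.length) {
                throw new IllegalArgumentException("mean and stdDevs must have the same length");
            }
            this.mean = mean.clone();
            this.stdDevs = stdDevs.clone();
        }

        @Override
        public double[] sample(Random rng) {
            double[] x = new double[mean.length];
            for (int i = 0; i < x.length; i++) {
                x[i] = mean[i] + stdDevs[i] * rng.nextGaussian();
            }
            return x;
        }
    }

    /** A mixture is itself a SampledDistribution, so mixtures and single components share one interface. */
    public static final class Mixture implements SampledDistribution {
        private final List<SampledDistribution> components;
        private final double[] cumulative;

        public Mixture(List<SampledDistribution> components, double[] weights) {
            if (components.size() != weights.length) {
                throw new IllegalArgumentException("one weight per component is required");
            }
            this.components = new ArrayList<>(components);
            this.cumulative = new double[weights.length];
            double running = 0.0;
            for (int k = 0; k < weights.length; k++) {
                running += weights[k];
                cumulative[k] = running;
            }
        }

        @Override
        public double[] sample(Random rng) {
            // Choose a component with probability proportional to its weight, then delegate to it.
            double u = rng.nextDouble() * cumulative[cumulative.length - 1];
            for (int k = 0; k < cumulative.length; k++) {
                if (u < cumulative[k]) {
                    return components.get(k).sample(rng);
                }
            }
            return components.get(components.size() - 1).sample(rng);
        }
    }

    /** Stand-in for a fitted model: delegates sampling to its mixture and maps indices back to feature names. */
    public static final class FittedModel {
        private final List<String> featureNames; // featureNames.get(i) names dimension i of every sample
        private final Mixture mixture;

        public FittedModel(List<String> featureNames, Mixture mixture) {
            this.featureNames = new ArrayList<>(featureNames);
            this.mixture = mixture;
        }

        public double[] sample(Random rng) {
            return mixture.sample(rng);
        }

        public Map<String, Double> sampleNamed(Random rng) {
            double[] x = mixture.sample(rng);
            Map<String, Double> named = new LinkedHashMap<>();
            for (int i = 0; i < x.length; i++) {
                named.put(featureNames.get(i), x[i]);
            }
            return named;
        }
    }
}
```

Keeping sampling behind one interface means a hand-built mixture and a fitted model can be used interchangeably wherever only samples are needed.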

@Craigacp Craigacp linked a pull request May 20, 2024 that will close this issue