New Semi Supervised Learning Algorithms? #205

Open
hansen7 opened this issue May 15, 2019 · 5 comments
@hansen7
Contributor

hansen7 commented May 15, 2019

Description

Hi, are there plans to add metric learning algorithms for the semi-supervised setting, i.e. algorithms that use both labels/pairwise constraints and unlabelled data to learn the distance metric?

Some References

Locally linear metric adaptation for semi-supervised clustering
Metric Learning from Relative Comparisons by Minimizing Squared Residual
Semi-Supervised Metric Learning Using Pairwise Constraints

@wdevazelhes
Member

wdevazelhes commented May 22, 2019

Hi @hansen7, thanks for those references.

Metric Learning from Relative Comparisons by Minimizing Squared Residual is LSML, which is already in metric-learn, but I didn't see where the paper explains how to use unlabeled data (I didn't read it thoroughly, though).
I did take a quick look at Locally linear metric adaptation for semi-supervised clustering, and it does seem able to use unlabeled data as well as pairwise constraints.

So yes, I think it would be cool to have this kind of algorithm. At some point we will need to decide which algorithms are a priority for metric-learn, so it's useful to have these in mind already.

Any thoughts, @bellet @perimosocordiae @terrytangyuan @nvauquie?

@perimosocordiae
Contributor

Note that we also have gh-13 tracking other requested algorithms. Let's keep that list updated as new algorithms are proposed/implemented.

In general, I'm in favor of adding more algorithm diversity to the package. I think our standards can be looser than scikit-learn's or scipy's, but we should also be pragmatic and not take on too much. Criteria might include:

  • A publication with a reasonable number of citations.
  • A reference implementation or published inputs/outputs that we can validate our version against.
  • An implementation that doesn't require thousands of lines of new code, or adding new mandatory dependencies.

Of course, any of these three guidelines could be ignored in special cases.

@terrytangyuan
Member

I agree with what @perimosocordiae said above. Just adding my two cents: we should prioritize algorithms that have:

  • Larger number of citations
  • Common parts that can be reused by other/existing algorithms
  • Better proven performance over other similar/existing algorithms

@bellet
Member

bellet commented May 24, 2019

I am adding the following paper which I think is the most classic semi-supervised metric learning algorithm (using graph regularization through the Laplacian):
https://dl.acm.org/citation.cfm?id=1823752
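As a rough illustration of what "graph regularization through the Laplacian" usually means, here is a minimal sketch. It is a generic graph-Laplacian smoothness term, not the exact formulation of the paper linked above, and it is not part of metric-learn: the function names (`knn_affinity`, `mahalanobis_sq`, `laplacian_regularizer`) and the 0/1 k-nearest-neighbour graph are assumptions made for this example. The idea is to build a neighbourhood graph over all points, labeled or not, and penalize a candidate Mahalanobis matrix `M` that places graph neighbours far apart.

```python
import math


def knn_affinity(points, k=1):
    """Symmetric 0/1 k-nearest-neighbour affinity matrix (Euclidean distance)."""
    n = len(points)
    W = [[0.0] * n for _ in range(n)]
    for i in range(n):
        # Rank every other point by its Euclidean distance to point i.
        others = sorted((j for j in range(n) if j != i),
                        key=lambda j: math.dist(points[i], points[j]))
        for j in others[:k]:
            W[i][j] = W[j][i] = 1.0  # symmetrize the graph
    return W


def mahalanobis_sq(x, y, M):
    """Squared Mahalanobis distance (x - y)^T M (x - y)."""
    d = [a - b for a, b in zip(x, y)]
    dim = len(d)
    return sum(d[p] * M[p][q] * d[q] for p in range(dim) for q in range(dim))


def laplacian_regularizer(points, W, M):
    """0.5 * sum_ij W_ij * d_M(x_i, x_j)^2.

    With L = D - W the graph Laplacian and X the matrix whose rows are the
    points, this equals trace(M X^T L X).
    """
    n = len(points)
    return 0.5 * sum(W[i][j] * mahalanobis_sq(points[i], points[j], M)
                     for i in range(n) for j in range(n))


# Toy data: two well-separated pairs of (unlabeled) points.
points = [(0.0, 0.0), (1.0, 0.0), (5.0, 0.0), (6.0, 0.0)]
W = knn_affinity(points, k=1)
identity = [[1.0, 0.0], [0.0, 1.0]]
print(laplacian_regularizer(points, W, identity))  # 2.0
```

In a semi-supervised method of this family, a term like this would typically be added to a supervised loss on the labeled pairs, so that the unlabeled points influence the learned metric only through the graph.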

It would be nice to have such an algorithm in the package at some point. But inputting unlabeled points may require some thinking in terms of API, and the use-cases are perhaps a bit limited.

@hansen7 do you have a personal interest in implementing such semi-supervised methods, or are you simply looking for ideas on what could be included in metric-learn? If it is the latter, gh-13 is indeed a good place to look. In my opinion, adding the classic and effective triplet-based approach of https://www.cs.cornell.edu/people/tj/publications/schultz_joachims_03a.pdf would be awesome.

@hansen7
Contributor Author

hansen7 commented Jun 5, 2019

> @hansen7 do you have a personal interest in implementing such semi-supervised methods, or are just simply looking for ideas on what could be included in metric-learn? If it is the latter, indeed gh-13 is a good place to look at. In my opinion, adding the super classic and effective triplet-based approach of https://www.cs.cornell.edu/people/tj/publications/schultz_joachims_03a.pdf would be awesome

Thanks. I have actually implemented a few semi-supervised algorithms, such as SERAPH from here, for my research projects, and I would be very happy to help develop these methods within the metric-learn module.
