[MRG] fix quadruplets decision_function #217
Conversation
As I understand it, this is just a "smoke test": it verifies that we don't throw any exceptions, but doesn't check the numeric results. If so, we can probably shrink down the input size significantly, as we don't really care about convergence behavior.
I agree, done: I just kept 30 samples (below that, around 20 samples, there are problems with either LMNN or RCA). But it makes the test last only 6s (instead of 15), so I guess it's OK for now? (And maybe later we can address this by making more adapted datasets that have the minimum number of samples needed for every algorithm.)
Looks good, but why do we need as many as 30 samples?
Still puzzled by this ;-)
With 20 samples, I get the following error for LMNN: ValueError: not enough class labels for specified k (smallest class has 2). With 10 samples, only RCA throws: ValueError: Unable to make 10 chunks of 2 examples each.
Apparently we need 30 samples for the test to pass in all versions on CI; otherwise we hit the RCA problem.
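For context, here is a rough sketch of the sample-size constraints mentioned above (the parameter names k, num_chunks and chunk_size follow the metric-learn API of that era and may differ in other versions):

    import numpy as np
    from metric_learn import LMNN, RCA_Supervised

    rng = np.random.RandomState(42)
    X = rng.randn(30, 4)             # 30 samples, 4 features
    y = np.repeat(np.arange(3), 10)  # 3 balanced classes of 10 samples each

    # LMNN needs every class to contain more than k samples (k=3 here),
    # hence the "not enough class labels for specified k" error on tiny splits
    LMNN(k=3).fit(X, y)

    # RCA_Supervised draws num_chunks disjoint chunks of chunk_size same-class
    # points, so it needs at least num_chunks * chunk_size suitable samples
    # (20 here), hence the "Unable to make 10 chunks of 2 examples each" error
    RCA_Supervised(num_chunks=10, chunk_size=2).fit(X, y)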
Note (TODO): I'll prevent that and try to use fewer samples by setting
Done: if tests are green, I think we are good to merge |
Tests are green. If everybody agrees, we can merge.
|
    # we subsample the data for the test to be more efficient
    input_data, _, labels, _ = train_test_split(input_data, labels,
                                                train_size=20)
Maybe setting stratify=labels would make LMNN pass for train_size=10?
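A minimal sketch of that suggestion with standard scikit-learn usage (the data here is made up just to illustrate the call):

    import numpy as np
    from sklearn.model_selection import train_test_split

    rng = np.random.RandomState(0)
    input_data = rng.randn(100, 5)
    labels = rng.randint(0, 3, size=100)

    # stratify keeps the class proportions in the subsample, so every class is
    # still represented even with a very small train_size
    input_data, _, labels, _ = train_test_split(input_data, labels,
                                                train_size=10,
                                                stratify=labels)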
You're right, I tried and it worked for 11 (though it didn't work for 10; I think it's because the chunk construction removes a point from the pool once it has been included in a chunk, and the dataset has 3 classes, but I would need to investigate more to be sure).
However, it fails for MLKR, since labels is an array of continuous values that cannot be used as the stratify argument.
So I suggest that, to keep it simple, we keep it like that (20 samples and no stratify). What do you think?
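For reference, a quick illustration of why stratify cannot be used with MLKR-style continuous targets (the exact error message may vary across scikit-learn versions):

    import numpy as np
    from sklearn.model_selection import train_test_split

    rng = np.random.RandomState(0)
    X = rng.randn(100, 5)
    y = rng.randn(100)  # continuous regression targets, as MLKR uses

    # stratification treats every distinct value as its own class, so with
    # continuous targets it raises a ValueError about under-populated classes
    try:
        train_test_split(X, y, train_size=10, stratify=y)
    except ValueError as e:
        print(e)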
Fair enough, let's merge
I just realized there was a problem with decision_function for quadruplets, since it didn't work on lists of lists of lists (as opposed to 3D arrays). It made me realize that we don't have integration tests for these array-like inputs (although we had extensive unit tests of the check_input functions). Here is a fix. It adds significant test time (15s, because it tests all possibilities with reasonably sized datasets), but as with other similar tests it's hard to find smaller datasets, since the algorithms sometimes fail on them. I've opened a separate issue (#218) for reducing test time, so if everything is OK apart from the time consideration, I guess we can merge.
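For reference, a rough sketch of the kind of array-like input the fix targets, using LSML as an example quadruplets learner (the choice of estimator and its defaults are assumptions; check the metric-learn docs for the version you use):

    import numpy as np
    from metric_learn import LSML

    rng = np.random.RandomState(42)
    # 20 quadruplets of 2D points, passed as a plain list of lists of lists
    # rather than a 3D numpy array
    quadruplets = rng.randn(20, 4, 2).tolist()

    lsml = LSML()
    lsml.fit(quadruplets)

    # decision_function should accept the same array-like formats as fit;
    # the sign of each score encodes which pair of the quadruplet is
    # predicted to be the more similar one
    scores = lsml.decision_function(quadruplets)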