
[MRG] fix quadruplets decision_function #217

Conversation

@wdevazelhes (Member) commented Jun 13, 2019

I just realized there was a problem with decision_function for quadruplets: it didn't work on lists of lists of lists (as opposed to 3D arrays). It made me realize that we don't have integration tests for these array-like inputs (although we had extensive unit tests of the check_input functions). Here is a fix. It adds significant test time (15s, because it tests all possibilities with reasonably sized datasets), but as with other similar tests, it's hard to find smaller datasets since the algorithms sometimes fail on them. I've opened a separate issue (#218) for reducing the time of tests, so if everything is OK apart from the time consideration, I guess we can merge.
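For context, here is a minimal sketch of the kind of smoke test being discussed, assuming metric_learn's LSML quadruplets learner; the test name and dataset are illustrative, not the actual code added in this PR:

import numpy as np
from metric_learn import LSML

def test_decision_function_accepts_array_like_quadruplets():
    rng = np.random.RandomState(42)
    X = rng.randn(30, 4)
    # build quadruplets as a plain list of lists of lists, not a 3D array
    quadruplets = [[list(X[i]), list(X[i + 1]), list(X[i + 2]), list(X[i + 3])]
                   for i in range(0, 24, 4)]
    lsml = LSML().fit(quadruplets)
    # the point of the smoke test: this call should not raise on array-like input
    scores = lsml.decision_function(quadruplets)
    assert scores.shape == (len(quadruplets),)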

@wdevazelhes requested review from bellet and perimosocordiae and removed request for bellet, June 13, 2019 13:47
@perimosocordiae (Contributor) commented:
As I understand it, this is just a "smoke test": it verifies that we don't throw any exceptions, but doesn't check the numeric results. If so, we can probably shrink down the input size significantly, as we don't really care about convergence behavior.

@wdevazelhes (Member, Author) commented:

> As I understand it, this is just a "smoke test": it verifies that we don't throw any exceptions, but doesn't check the numeric results. If so, we can probably shrink down the input size significantly, as we don't really care about convergence behavior.

I agree, done: I just kept 30 samples (below that, around 20 samples, there are problems with either LMNN or RCA). That makes the test last only 6s (instead of 15), so I guess it's OK for now? (And maybe later we can address this by making more adapted datasets that have the minimum number of samples needed to work for every algorithm.)

@bellet (Member) commented Jun 14, 2019:

Looks good but why do we need as many as 30 samples?

@bellet (Member) commented Jun 24, 2019:

> Looks good but why do we need as many as 30 samples?

Still puzzled by this ;-)

@wdevazelhes (Member, Author) commented:

> Looks good but why do we need as many as 30 samples?

With 20 samples, I get the following error for LMNN:

ValueError: not enough class labels for specified k (smallest class has 2)

I've tried using train_test_split to balance the subsampling with only 20 samples; let's see if Travis turns green.

(With only 10 samples, RCA throws:

ValueError: Unable to make 10 chunks of 2 examples each

)
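For reference, a minimal sketch of that LMNN failure mode, assuming metric_learn's LMNN with its default k=3; the data below are made up to reproduce the unbalanced-subsample situation:

import numpy as np
from metric_learn import LMNN

X = np.random.randn(20, 4)
# an unlucky, unbalanced subsample: the smallest class has only 2 members
y = np.array([0] * 9 + [1] * 9 + [2] * 2)

# with k=3, LMNN needs more than k samples per class, so this raises:
# ValueError: not enough class labels for specified k (smallest class has 2)
LMNN(k=3).fit(X, y)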

@wdevazelhes (Member, Author) commented:

Apparently we need 30 samples for the test to pass on all versions on CI; otherwise we hit the RCA problem.

@wdevazelhes (Member, Author) commented:

Note (TODO): I'll prevent that and try to use fewer samples by setting num_chunks= to a lower number than 100.
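A rough sketch of what that workaround could look like, assuming the RCA_Supervised signature of that release (num_chunks, chunk_size); the values are illustrative:

import numpy as np
from metric_learn import RCA_Supervised

X = np.random.randn(20, 4)
y = np.repeat([0, 1], 10)

# with num_chunks=100, 20 samples cannot supply enough chunks of 2 examples
# each; lowering num_chunks lets RCA run on a much smaller dataset
rca = RCA_Supervised(num_chunks=5, chunk_size=2)
rca.fit(X, y)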

@wdevazelhes (Member, Author) commented:
Done: if tests are green, I think we are good to merge

@wdevazelhes (Member, Author) commented:

Tests are green. If everybody agrees, we can merge.


# we subsample the data for the test to be more efficient
input_data, _, labels, _ = train_test_split(input_data, labels,
                                             train_size=20)
Review comment (Member):
maybe setting stratify=labels would make LMNN pass for train_size=10?

@wdevazelhes (Member, Author) replied:
You're right, I tried and it worked for 11 (though it didn't work for 10; I think that's because the choice of chunks removes a point once it's already included in a chunk, or something like that, and the dataset has 3 classes, but I would need to investigate more to be sure).
However, it fails for MLKR, since labels is an array of continuous values that cannot be passed as the "stratify" argument.
So I suggest that, to keep it simple, we keep it like that (with 20 samples and no stratify). What do you think?
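To illustrate that limitation, a small sketch assuming scikit-learn's train_test_split; the arrays are made up:

import numpy as np
from sklearn.model_selection import train_test_split

X = np.random.randn(40, 3)
y_classes = np.repeat([0, 1, 2, 3], 10)   # discrete labels (e.g. for LMNN)
y_continuous = np.random.randn(40)        # continuous targets (e.g. for MLKR)

# stratified subsampling keeps every class represented even for a small train_size
X_small, _, y_small, _ = train_test_split(X, y_classes, train_size=11,
                                          stratify=y_classes, random_state=0)

# with continuous targets every value is its own "class", so stratify raises
# a ValueError and cannot be used uniformly across all algorithms in the test
try:
    train_test_split(X, y_continuous, train_size=11, stratify=y_continuous)
except ValueError as e:
    print(e)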

Reply (Member):

Fair enough, let's merge

@bellet bellet merged commit 580d38d into scikit-learn-contrib:master Jun 25, 2019