
Slow event-wise scores evaluation #17

Open
rth opened this issue Mar 28, 2019 · 7 comments

Comments

@rth

rth commented Mar 28, 2019

Evaluating the event-wise scores appears to be very time-consuming, mostly due to the repeated calls to pd.to_datetime within the Eventwise* scores.

At least for ramp_test_submission --quick-test on the starting kit, this accounts for most of the runtime:

[Screenshot: profiling output (Screenshot_2019-03-29 out)]

Running the starting kit with the --quick-test option takes 63 s on my laptop; with event-wise scores disabled this drops to 9 s.

I'm looking for a way to speed it up, but generally such a long runtime (on a dataset that is not that big) makes iterations slower, which is problematic when running events.

@rth rth changed the title Very slow evementwise scores evaluation Very slow element-wise scores evaluation Mar 28, 2019
@glemaitre
Contributor

ping @jorisvandenbossche

@rth rth changed the title Very slow element-wise scores evaluation Very slow event-wise scores evaluation Mar 28, 2019
@rth
Author

rth commented Mar 28, 2019

To give more details: the slow part is pd.to_datetime(y_true[:, 0], unit='m'), where y_true[:, 0] is a numpy array of float64 values in minutes. I think converting it to an integer timestamp (e.g. ns) and then back to a datetime, without going through pd.to_datetime, might be faster, but I have not found a vectorized way to do that yet.
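For reference, such a round trip can be sketched with plain NumPy casts, bypassing pd.to_datetime entirely. This is a minimal sketch, not the kit's actual code; the `minutes` sample array is hypothetical, and it assumes the float64 values hold whole minutes since the epoch:

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for y_true[:, 0]: float64 minutes since the epoch
minutes = np.array([0.0, 60.0, 1440.0])

# Vectorized conversion: cast the floats to int64, then view as datetime64[m]
dt_np = minutes.astype('int64').astype('datetime64[m]')

# The same values as a pandas DatetimeIndex, avoiding the slow float path
dt_pd = pd.DatetimeIndex(dt_np)
```

The result matches what pd.to_datetime(minutes, unit='m') produces, as long as the floats carry no fractional minutes.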

@jorisvandenbossche
Contributor

Yes, we recently had an issue about that on the pandas issue tracker (it might be fixed in the latest pandas release).
In older releases, when the input is floats, pandas takes a very slow generic object path, while for ints the conversion is optimized:

In [9]: a = np.arange(100000.)                

In [10]: %timeit pd.to_datetime(a, unit='m') 
593 ms ± 6.02 ms per loop (mean ± std. dev. of 7 runs, 1 loop each)

In [11]: %timeit pd.to_datetime(a.astype(int), unit='m')   
2.11 ms ± 44.4 µs per loop (mean ± std. dev. of 7 runs, 100 loops each)

I don't fully recall, but it might be that we can simply cast to ints there? (I mean, the fact that it is float is maybe just because it was concatenated into a 2D array with actual floats, but those dates are originally ints?)

@jorisvandenbossche
Contributor

Yes, they are ints:

arr = y_true.index.values.astype('datetime64[m]').astype(int)

So doing an astype(int) in the to_datetime call should be safe and should speed it up a lot.
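A minimal sketch of the suggested cast (the `y_true_col` array here is made up for illustration; it assumes the floats hold integer-valued minutes, as established above):

```python
import numpy as np
import pandas as pd

# Hypothetical float64 column of minutes, standing in for y_true[:, 0]
y_true_col = np.array([0.0, 30.0, 120.0])

# Float path: hits the slow generic object path in older pandas releases
slow = pd.to_datetime(y_true_col, unit='m')

# Int path: optimized, gives the same result for whole-minute values
fast = pd.to_datetime(y_true_col.astype('int64'), unit='m')
```

Both produce identical DatetimeIndex values, so the cast only changes which code path pandas takes.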

@jorisvandenbossche
Contributor

See #18

@rth
Author

rth commented Mar 29, 2019

Thanks @jorisvandenbossche ! Looks great!

@rth
Author

rth commented Mar 29, 2019

Let's keep this open for now, even if the to_datetime fix is a major improvement -- I'll try to see if caching some of the calculations in the event-wise scores could improve performance further.
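One possible direction for such caching, sketched under the assumption that the Eventwise* scores repeatedly convert the same y_true array: memoize the conversion keyed on the array's raw bytes. The helper name and cache scheme here are hypothetical, not part of the actual codebase:

```python
import numpy as np
import pandas as pd

_dt_cache = {}

def cached_to_datetime(arr):
    """Memoized datetime conversion (hypothetical helper).

    Keyed on the array's raw bytes and dtype, so repeated score
    evaluations on the same y_true reuse a single conversion.
    """
    key = (arr.tobytes(), arr.dtype.str)
    if key not in _dt_cache:
        # Cast to int64 first to stay on the fast integer path
        _dt_cache[key] = pd.to_datetime(arr.astype('int64'), unit='m')
    return _dt_cache[key]
```

Repeated calls with the same array then return the exact same cached object instead of re-running the conversion.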

@rth rth changed the title Very slow event-wise scores evaluation Slow event-wise scores evaluation Mar 29, 2019