
Optimize conditional entropy functions #50

Open · dwysocki opened this issue Dec 8, 2014 · 12 comments

@dwysocki (Member) commented Dec 8, 2014

As of now, there are bottlenecks in the CE functions. The biggest one is probably this line:

return np.sum((lambda p: p * np.log(np.sum(bins[i,:]) / size / p) \
                         if p > 0 else 0)(bins[i][j] / size)
              for i in np.arange(0, xbins)
              for j in np.arange(0, ybins)) if size > 0 else np.PINF

which uses vanilla Python iterators to perform a mesh computation that NumPy array operations would handle much more efficiently (see the vectorized sketch at the end of this comment). Other small optimizations include

size = len(r.T[1])

which, for a 2-D array, is equivalent to the simpler r.shape[0], and

entropies = list(m(partial_job, periods))
return periods[np.argmin(entropies)]

which can probably be done without evaluating the lazy iterable into a concrete list.
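
For example, a plain min over the enumerated iterable would avoid the intermediate list (a sketch; note that if m is multiprocessing's map, it returns a concrete list anyway, so this only helps for truly lazy iterables):

best, _ = min(enumerate(m(partial_job, periods)), key=lambda pair: pair[1])
return periods[best]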

When I get the chance I will implement these optimizations, as right now conditional entropy runs at an unacceptably slow rate.
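
For reference, a rough vectorized equivalent of that return statement (an illustrative sketch, not a committed implementation) could look like:

import numpy as np

def ce_sum(bins, size):
    # bins is the 2-D (xbins, ybins) count histogram from above;
    # size is the number of points.
    if size <= 0:
        return np.inf
    p = bins / size                               # joint probabilities p_ij
    row = bins.sum(axis=1, keepdims=True) / size  # marginal of each phase bin
    safe_p = np.where(p > 0, p, 1.0)              # dummy 1s where p == 0
    log_arg = np.where(p > 0, row / safe_p, 1.0)  # log(1) = 0 for empty cells
    return np.sum(p * np.log(log_arg))

Empty cells contribute exactly zero, matching the `if p > 0 else 0` branch, and no division by zero ever occurs.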

@dwysocki dwysocki self-assigned this Dec 8, 2014
@dwysocki dwysocki added this to the Engineering goals milestone Dec 8, 2014
@dwysocki (Member, Author) commented:

I've made an optimization that did not actually change run time: c91e6cd

There were a lot of matrix transposes used to access a single column in an array, when that can be accomplished with a simple column slice. I tested the performance of those operations in isolation and found an improvement, but within the context of conditional_entropy, this nanosecond-scale boost was hidden by the runtime of the rest of the function, which is on the order of milliseconds to seconds.
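
For illustration, with a hypothetical 2-D array r, the change is just:

column = r.T[1]    # old: transpose, then index row 1 of the transpose
column = r[:, 1]   # new: slice column 1 directly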

Even if this didn't help runtime by any measurable amount, it most likely helped memory usage a little, as the transposed arrays are no longer created and stored in memory. It also gave me a chance to test out the unit test script (https://gist.github.com/dwysocki/03391f11630e87d2cac4) I wrote to help with optimizing, which not only displays run time, but also asserts that behavior is not altered. Subsequent optimizations will be benchmarked with this script.

Now to fix the real bottleneck: that return statement.

@earlbellinger (Contributor) commented:

Do you have any reason to believe that slices do anything different from transposes? I'd think them to do the exact same thing, and in fact, based on a small test, slices are actually *slower* than using transposes:

timeit.timeit('import numpy; numpy.random.randint(1,1000,(1000,1000)).T[0]', number=1000)
15.503364500124007
timeit.timeit('import numpy; numpy.random.randint(1,1000,(1000,1000))[:,0]', number=1000)
15.650746430968866
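
Worth noting that nearly all of that time is spent in randint, which runs inside the timed statement; a variant that times only the access (a sketch using timeit's setup argument) isolates the difference:

import timeit

setup = 'import numpy; a = numpy.random.randint(1, 1000, (1000, 1000))'
print(timeit.timeit('a.T[0]', setup=setup, number=1000))   # transpose + index
print(timeit.timeit('a[:, 0]', setup=setup, number=1000))  # column slice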


@earlbellinger (Contributor) commented:

Not that it really matters anyway. Micro-optimization, root of all evil, etc.


@dwysocki (Member, Author) commented:

From my understanding, slicing gives you a view of the existing array, while transposing creates a new array. Either way, it at least seems like better style to take a slice of an array than to transpose it and then slice it.

@dwysocki (Member, Author) commented:

So I guess I was wrong: transposing also produces a view. I think the performance difference between the two is probably too small to measure, so I'm going to say we should adopt the column-slicing approach, just for readability's sake.
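
A quick throwaway check confirms that both give views of the original data:

import numpy as np

a = np.zeros((3, 4))
print(a.T.base is a)      # True: the transpose is a view of a
print(a[:, 0].base is a)  # True: the column slice is a view as well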

@dwysocki (Member, Author) commented:

OK, here's the real performance boost: f2d57bc

All I changed here was the return line in CE: I replaced the generator expression with much more efficient numpy array operations. It isn't as readable as it was before, but it gave slightly more than a factor-of-2 speedup without changing the behavior.

There is one caveat, though. The way I've done it, I perform the computation over all of the bins, even when that means dividing by zero. This doesn't change the end behavior, because we know in advance the indices where the division by zero will occur, and I simply overwrite them with zeroes afterwards.

This has the annoying side effect of numpy printing division-by-zero and invalid-operation warnings, which I suppressed. That could backfire, though, because it may hide other, genuine warnings. Ensuring the computation only occurs at the indices which do not need to be replaced with 0 is rather difficult, so for now I'm leaving these changes in a new function until I figure out a better way.
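
One way to limit the blast radius (a sketch, not the committed code) is to scope the suppression with numpy's errstate context manager, so only this known-benign block is silenced:

import numpy as np

# stand-ins for the CE histogram and sample count
bins = np.array([[3.0, 0.0], [1.0, 2.0]])
size = bins.sum()
p = bins / size
row = bins.sum(axis=1, keepdims=True) / size
with np.errstate(divide='ignore', invalid='ignore'):
    terms = p * np.log(row / p)   # empty cells yield inf/nan here, silenced
terms[p == 0] = 0.0               # overwrite the known-bad cells afterwards
print(terms.sum())

Warnings raised anywhere else in the program remain enabled; only the arithmetic inside the with block is exempt.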

Here are the performance tests:

$ ipython CE_test.ipy OGLE-LMC-CEP-0005.dat 0.001 5.0 6.0
Old performance:    1 loops, best of 3: 1.9 s per loop
New performance:    1 loops, best of 3: 829 ms per loop
$ ipython CE_test.ipy OGLE-LMC-CEP-0005.dat 0.0001 5.0 6.0
Old performance:    1 loops, best of 3: 17.4 s per loop
New performance:    1 loops, best of 3: 7.3 s per loop

@dwysocki (Member, Author) commented:

I've done away with the division by zero issue altogether: 4374018
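
One way to eliminate the division entirely (a guess at the general idea, not necessarily what the commit does) is to restrict the computation to the occupied cells up front:

import numpy as np

def ce_sum_nonzero(bins, size):
    i, j = np.nonzero(bins)           # indices of the nonempty cells only
    p = bins[i, j] / size             # their joint probabilities
    row = bins.sum(axis=1)[i] / size  # matching phase-bin marginals
    return np.sum(p * np.log(row / p))

Since every p here is strictly positive, no masking, overwriting, or warning suppression is needed.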

Here are the performance tests:

$ ipython CE_test.ipy OGLE-LMC-CEP-0005.dat 0.001 5.0 6.0
Old performance:    1 loops, best of 3: 1.83 s per loop
New performance:    1 loops, best of 3: 707 ms per loop
$ ipython CE_test.ipy ~/research/cv/input/OGLE-LMC-CEP-0005.dat 0.0001 5.0 6.0
Old performance:    1 loops, best of 3: 17.5 s per loop
New performance:    1 loops, best of 3: 7.18 s per loop

@earlbellinger (Contributor) commented:

Good jorb. Is this issue solved now?

@dwysocki (Member, Author) commented:

I think it should be tested on a sample of OGLE stars, timed, and compared to the results Gabriel got previously.

@earlbellinger (Contributor) commented:

Possibly now solved?

@dwysocki (Member, Author) commented:

Probably. For future reference: the few test cases I tried gave results consistent with expectations, but Gabriel uncovered test cases that did not. I had introduced a bug and summed along the wrong axis.

To close this issue off, I think we should obtain Gabriel's old results, and confirm that we are now consistent with a random sample of them.

@dwysocki (Member, Author) commented:

Here is a tabulation of Gabriel's results, divided into folders for M different star types. I propose we take the complete data, randomly select N from each, and then run CE for the sample of N*M stars.

We might expect to see a very small difference, due to numerical error, so we should reproduce these results, take the difference, and make a call as to whether it's tolerable.
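
The final comparison step might look like this (placeholder values; the real arrays would hold the tabulated and recomputed periods for the sampled stars):

import numpy as np

old_periods = np.array([3.1415, 2.7182, 1.6180])  # Gabriel's tabulated values (placeholders)
new_periods = np.array([3.1415, 2.7183, 1.6180])  # our recomputed values (placeholders)
diff = np.abs(new_periods - old_periods)
print(diff.max())                                        # worst-case discrepancy
print(np.allclose(old_periods, new_periods, atol=1e-3))  # tolerance is a judgment call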
