Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Histogram advice #18

Open
jmh530 opened this issue Jul 23, 2020 · 7 comments
Open

Histogram advice #18

jmh530 opened this issue Jul 23, 2020 · 7 comments

Comments

@jmh530
Copy link
Contributor

jmh530 commented Jul 23, 2020

@9il I've got a basic implementation of a histogram function, available here. However, I'm not entirely happy with how I am handling the types.

For the HistogramAccumulator, I took a similar approach as in that MovingAverage UT in using T[] and U[]. This requires converting the inputs in histogram to be rcarrays and then slicing them. However, then the output type is a U[] for the histogram function, and I would rather it be a slice of some kind. I'm not really sure on the best way to accomplish this and would appreciate some advice if you can think of a better way to structure it.

@9il
Copy link
Member

9il commented Jul 26, 2020

bool isRightClosed, and bool includeEdgea doesn't look like required params. They can be removed though. nextUp/nextDown can be used instead by user.

We don't need an accumulator struct here. Just a function that accepts iterable or a single element.

We can add two kinds of historgram API. Functions that accept slices to fill into. And functions that gc-allocates zero slices and use the previous one functions as a low level implementation.

Two algorithms can be provided:

  1. With breaks. Complexity is O(log(binCount)) for one input element
  2. Without breaks (but with lower/upper/length). Complexity is O(1) for one input element

@jmh530
Copy link
Contributor Author

jmh530 commented Jul 26, 2020

Thanks!

The isRightClosed and includeEdge were influenced by R's implementation (here)

I'm not sure what you mean by nextUp/nextDown.

My intention was to first get it working with user-provided breaks and then to create functions that create breaks and integrate them somehow. The part I wasn't sure about was whether to gc-allocate the result. So I think you've made that a little more clear.

I'm not sure I completely understand your second point about an algorithm without breaks. You mean that this would take one input element and tell you if it is within lower/upper? But what is length for? To tell you where lower is within breaks?

@jmh530
Copy link
Contributor Author

jmh530 commented Jul 27, 2020

@9il I've spent some time looking at Boost Histogram. It makes some interesting design decisions. I think your #2 on the algorithms could correspond to their idea of an axis, which provides the mapping from the input to the location in the bin.

@9il
Copy link
Member

9il commented Jul 31, 2020

I'm not sure what you mean by nextUp/nextDown.

http://mir-core.libmir.org/mir_math_ieee.html#.nextDown

@9il
Copy link
Member

9il commented Jul 31, 2020

I'm not sure I completely understand your second point about an algorithm without breaks. You mean that this would take one input element and tell you if it is within lower/upper? But what is length for? To tell you where lower is within breaks?

I mean a uniform grid. If you have lower, upper bounds, and length, don't need to have the breaks then.

@jmh530
Copy link
Contributor Author

jmh530 commented Jul 31, 2020 via email

@jmh530
Copy link
Contributor Author

jmh530 commented Apr 18, 2023

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants