Add zipWithIndexAndSize() #363

Closed
Spike2050 opened this issue Oct 29, 2019 · 3 comments

Comments

@Spike2050
Contributor

Sometimes I need to interrupt my method chaining to switch behaviours depending on the size of my sequence.

Would a zipWithIndexAndSize() make sense for you or is there a way around that I'm not seeing right now?

It could look like this: Tuple<T,Tuple<Long,Long>> zipWithIndexAndSize()

@lukaseder
Member

Would a zipWithIndexAndSize() make sense for you or is there a way around that I'm not seeing right now?

The fact that the word "And" appears in your method name hints that this practice would not scale in terms of API design. If we did this here, we'd have to do it everywhere, as someone might eventually request crossJoinAndSize() or flatMapAndSize(), or just plain zipWithSize().

When in fact, you could just write:

seq.zipWithIndex()
   .map(t -> tuple(t.v1, t.v2, list.size()))

Or, if you don't have access to the original list, use jOOλ's window API:

seq.window()
   .map(w -> tuple(w.value(), w.rowNumber(), w.count()));
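
For readers who want to see the shape of the result without depending on jOOλ, here is a minimal plain-JDK sketch of the same idea. It is not how jOOλ implements window() or zipWithIndex(); the names IndexAndSize, Indexed and withIndexAndSize are hypothetical and exist only for this illustration. It assumes a zero-based index and buffers the stream first, because the total size of a one-shot stream is not known until it has been consumed.

```java
import java.util.List;
import java.util.stream.Collectors;
import java.util.stream.IntStream;
import java.util.stream.Stream;

public class IndexAndSize {

    // Hypothetical holder for (value, index, size); this is not a jOOλ type.
    public static final class Indexed<T> {
        public final T value;
        public final long index;
        public final long size;

        Indexed(T value, long index, long size) {
            this.value = value;
            this.index = index;
            this.size = size;
        }

        @Override
        public String toString() {
            return "(" + value + ", " + index + ", " + size + ")";
        }
    }

    // Buffers the stream so that its size is known, then pairs every
    // element with its zero-based index and the total element count.
    public static <T> List<Indexed<T>> withIndexAndSize(Stream<T> stream) {
        List<T> buffer = stream.collect(Collectors.toList());
        int size = buffer.size();
        return IntStream.range(0, size)
                .mapToObj(i -> new Indexed<T>(buffer.get(i), i, size))
                .collect(Collectors.toList());
    }

    public static void main(String[] args) {
        // Each printed element carries its value, its index, and the total size.
        withIndexAndSize(Stream.of("a", "b", "c")).forEach(System.out::println);
    }
}
```

The buffering step is the important design point: any operation that exposes the total count to every element has to see the whole sequence before it can emit the first result.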

@Spike2050
Contributor Author

Hi, thx for the feedback.

Window functions! Seq has window functions! I can't see this here. And I couldn't see it on your github page. I found your blog post after googling it. How did I not know this? Why isn't that advertised on your github page?

@lukaseder
Member

Why isn't that advertised on your github page?

I'm still reluctant to advertise it too much, because the implementation is not optimised as much as it could have been. Once you use window frames, algorithms tend to be O(n^2) instead of O(n log n).
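
To illustrate where quadratic cost can come from once frames are involved, here is a small plain-JDK sketch. It is not taken from jOOλ and says nothing definite about its internals; FrameSum, naive and incremental are hypothetical names. It computes a running sum over a frame of "preceding rows through current row" in two ways: by re-scanning the frame for every row, and by updating the previous result incrementally.

```java
public class FrameSum {

    // Naive evaluation: re-scan the frame [i - preceding, i] for every row.
    // With a frame that covers all preceding rows, this is O(n^2) overall.
    public static long[] naive(long[] values, int preceding) {
        long[] result = new long[values.length];
        for (int i = 0; i < values.length; i++) {
            long sum = 0;
            for (int j = Math.max(0, i - preceding); j <= i; j++) {
                sum += values[j];
            }
            result[i] = sum;
        }
        return result;
    }

    // Incremental evaluation: add the row entering the frame and subtract
    // the row leaving it, so each row costs O(1) and the whole pass is O(n).
    public static long[] incremental(long[] values, int preceding) {
        long[] result = new long[values.length];
        long sum = 0;
        for (int i = 0; i < values.length; i++) {
            sum += values[i];
            int leaving = i - preceding - 1;
            if (leaving >= 0) {
                sum -= values[leaving];
            }
            result[i] = sum;
        }
        return result;
    }

    public static void main(String[] args) {
        long[] values = {1, 2, 3, 4, 5};
        System.out.println(java.util.Arrays.toString(naive(values, 1)));
        System.out.println(java.util.Arrays.toString(incremental(values, 1)));
    }
}
```

The incremental trick only works for aggregates that can be "undone", such as sums and counts; aggregates like min or max need other data structures, which is part of why optimising frames in general is hard.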

Some limitations include:

There's a lot of research in this area, which we have currently not investigated: http://vldb.org/pvldb/vol5/p1244_yucao_vldb2012.pdf
