Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Do we need more functional elements? #9

Open
timyates opened this issue Jan 28, 2013 · 52 comments
Open

Do we need more functional elements? #9

timyates opened this issue Jan 28, 2013 · 52 comments
Assignees
Milestone

Comments

@timyates
Copy link
Owner

So currently you can do:

def s = Stream.from x:1..3, y:1..3 map { x + y } filter { x != y }
assert s.take( 5 ).collect() == [ 3, 4, 3, 5, 4 ]

And to stop a stream you can do:

def s = Stream.from { x + y++ } filter { it < 5 ?: STOP } using x:1, y:1
assert s.take( 5 ).collect() == [ 2, 3, 4 ]

But I need to think whether it's worth adding some different constructs instead, something like:

def s = Stream.using x:1, y:1 map { x + y } update { x += 1 ; y += 3 } until { x >= 3 }
assert s.take( 5 ).collect() == [ 2, 6 ]

Anyone got any opinions before I arbitrarily make a decision?

Knowing me, you've got a good few weeks ;-)

@ghost ghost assigned timyates Mar 26, 2013
@timyates
Copy link
Owner Author

Added an until block as of commit d715f7c

@tbarker9
Copy link
Contributor

What about validate{}? It would work like filter, but also filter out anything causes an AssertionError.

@timyates
Copy link
Owner Author

So you'd do:

Stream.from list validate { assert it % 2 == 0 }

Rather than

Stream.from list filter { it % 2 == 0 }

?

What does it gain you?

@tbarker9
Copy link
Contributor

Might be nice for Stream to optionally store some stats. Four counts might be enough:

processed - count of all items that went through all steps without be filtered out
ignored - count of all items that were filtered out (not including invalid)
invalid - count of all items that were invalid
total - processed + ignored + invalid

@tbarker9
Copy link
Contributor

More descriptive I suppose. Say I am processing a list of people with some dirty data. I only want males (filter), but say one of the people has a number for a last name, I would put that check in validate.

@timyates
Copy link
Owner Author

Yeah, that would be nice... Need to think where this data can be stored...

I was thinking the other week, if we get rid of the using block, then the whole stream can be implemented as a chain of Iterators. However this does mean that each iterator step will be ignorant of the other steps in the chain, so metrics like this would be tricky (unless the stream keeps it's parent steps, and allows the user to interrogate each step)

Still not decided in my head whether it's worth abandoning using though... It's quite nice to have variables scoped to the stream, but it abandons all sense of functional purity

@tbarker9
Copy link
Contributor

Although with that example I could still use just conditionals and have validate match filter functionality. Given that a lot of people might use this library for data processing (me 😄 ), having validate regardless of whether or not it supports AssertErrors might be handy. Think of validate as given vs setup in spock. Its existence might be for readability only.

@tbarker9
Copy link
Contributor

To get the stats to work, you could implement each or something like each and return the stats. Maybe be explicit about it with eachWithStats.

@tbarker9
Copy link
Contributor

Actually, after looking at Stream, why can't you put the stats in there? You already iterate over the steps in loadNext. Probably a good idea to put the stats in a class of its own.

@timyates
Copy link
Owner Author

Added the stats as issue #14 so I don't forget or lose this 😉

@tbarker9
Copy link
Contributor

This is probably totally out of scope, but might as well shoot for the stars! What about chunking? Say you have

Stream.from 1..8 chunk(5)

This will create a list with two element, each containing a list of their own. The first element contains 1 to 5, the second 6 to 8. Each chunk has no more than 5 elements. Pretty handy if you want to process data to a database in transactional chunks. aggregate might be a reasonable name as well. To make it easier to implement, you could make this a terminating step where a plain old iterator is returned. Otherwise dealing with stats gets really dicy.....

@timyates
Copy link
Owner Author

I like it, kinda the opposite of flatMap?

I think chunk is the name used in Scala as well (and it's always good to
stick with used names)

Or was that chunkBy? (that's probably a different method though)

I'll do some searching... Can you add this as a new issue?

@tbarker9
Copy link
Contributor

I have a batch iterator lying around somewhere that can probably do this. Basically wraps another iterator and returns 'chunks'. Let me see if I can use it here.

@tbarker9
Copy link
Contributor

I like the idea of making the stream a chain of iterators. If someone wanted to make a stream 'plugin', you could do it easily using a groovy extension. You could also have a method addStep to add a custom step.

@timyates
Copy link
Owner Author

The only problem is that I think we would have to get rid of the using block, as otherwise this map needs to be passed between (and possibly mutated by) all Iterators in the chain

@tbarker9
Copy link
Contributor

I think the flexibility from chained iterators warrants canning using, but maybe I am too tempted by shiny things to see clearly

@timyates
Copy link
Owner Author

That's my worry too ;-)

Ooooh...shiny!

@tbarker9
Copy link
Contributor

so when I see Stream.from{} filter{} map{} using([foo:"bar"]) I think using applies to the previous map, but not to the filter. From the code I think using is a global concept (applied to all the steps). If we make it apply to the previous step only, then we could keep using using without much work. I think the examples you give on the webpage would still work too. I might be missing something.

@tbarker9
Copy link
Contributor

Actually never mind, noticed the last example on the web page would not work. Oh well.

@timyates
Copy link
Owner Author

That last example doesn't work with v0.6, as I removed STOP 😦

I need to revisit the docs, and get them up to scratch

Since v0.6, that code becomes:

Stream s = Stream.from { 1 } map { it } until { ++idx > 5 } using idx:0
assert s.collect() == [ 1, 1, 1, 1 ,1 ]

Which could equally be written as:

def idx = 0 
Stream s = Stream.from { 1 } map { it } until { ++idx > 5 }
s.collect()

Avoiding the using block... Right...time to revisit the documentation...

@timyates
Copy link
Owner Author

Right, thrown up a quick changed version to http://timyates.github.io/groovy-stream/

Documentation is always the first casualty of coding ;-)

@tbarker9
Copy link
Contributor

Last thing I thought about... If I am iterating through something big, I frequently want to give feed back for every n records. What about something like

Stream.from(somethingBig).atEvery(1000){log.info "I processed [$it] records"}.filter{}.map{}

The other option is to do module in map. atEvery would just pass the record to the next step no matter what. You could also pass a boolean to indicate if you want it to fire off at the end. it represents the cumulative count of records that have passed through atEvery

@timyates
Copy link
Owner Author

I like that idea 😃

Need to think what it should be called... It's kinda like the tap method I added in groovy-common-extensions

Maybe tapEvery or something... Or actually, maybe atEvery is a better name...

@tbarker9
Copy link
Contributor

hmmm... I like tap or tapEvery since I am a camel guy. This would be like a wiretap where nothing happens to the message, but you are allowed to do something else with it. Maybe I will do this this weekend.

@timyates
Copy link
Owner Author

I like tap too... ;-)

came up with this:

ba2a4d8

@tbarker9
Copy link
Contributor

Damn that was fast. I am sad, was going to pound this out tonight

On Wed, Oct 23, 2013 at 3:38 PM, Tim Yates [email protected] wrote:

I like tap too... ;-)

came up with this:

ba2a4d8ba2a4d8


Reply to this email directly or view it on GitHubhttps://github.com//issues/9#issuecomment-26938682
.

@timyates
Copy link
Owner Author

Sorry, I was left home alone with the kids asleep ;-)

There's probably a better way of doing it than that way (so it's in a separate branch) :-)

Code golf! Game on! :-D

@tbarker9
Copy link
Contributor

tbarker9 commented Nov 6, 2013

after using groovy-stream a decent amount, at this point I am not sure I like using at all. I 'think' I would prefer boring old normal closure variable scope handling. Seems more natural to me.

@tbarker9
Copy link
Contributor

tbarker9 commented Nov 6, 2013

Also, I started created extensions to stream to handle events I commonly deal with. Maybe I could carve them out of my projects and make a groovy-stream-ext project.... or something like that. Basically I have Stream.fromResultSet(ResultSet resultSet), Stream.fromXml(InputStream stream, String tag ) and a few potentially less useful ones for csv files and other file formats. The xml one is made to handle large documents that have repeating tags, and includes some other settings to deal with namespaces. Thoughts?

@timyates
Copy link
Owner Author

timyates commented Nov 8, 2013

Hiya! Sorry for the late response, multiple family birthdays and rampant cold viruses are to blame...

Re: using: I tend to agree, and will probably look to remove this in v0.7 it should make everything simpler

Re: extensions. Brilliant! :-D A separate module with static Stream extensions would be brilliant (so we can call ResultSet.toStream() for example. this project shows how you can set up a library that extends classes in Groovy when loaded :-)

Nice work!

@tbarker9
Copy link
Contributor

tbarker9 commented Nov 8, 2013

I already have done a lot of this already in a separate project for work. So it will be a matter of carving it out for public consumption. I will create a module called groovy-stream-ext and add at least my ResultSet and xml extensions.

In addition to ResultSet.toStream() (which I am not sure I did but will 😉 ), I also extended Stream with fromResultSet. I was a little wary of extending from in case it leads to future clashes, or leads to other weird behavior with extensions. Does that seem reasonable?

Anyway, with the extensions we could be on our way to creating a lightweight ETL scripting tool which is pretty darn cool.

If you like it, you are welcome to merge it into this project as a multi module build.

@timyates
Copy link
Owner Author

timyates commented Nov 8, 2013

Yeah, if you extend an existing method I believe it completely overwrites the original method, I think it should be ok in this case though as it would have a different method signature 😉

I reckon two separate projects to start with with strong href links between them is the place to start 😄

Going to spend the weekend thinking about "Stream without using", we could take a few methods from RxJava as well (which does a similar thing but in an asynchronous push way, rather then the stream which is a synchronous pull)

@tbarker9
Copy link
Contributor

tbarker9 commented Nov 8, 2013

I think I won't extend from still and leave fromResultSet. I like the from being defined in one place and more extensions claiming another relevant namespace... maybe I am just paranoid.

I was thinking about the async stuff the other day. At very least a decent opportunity for actors (or seda) I think. Basically a unit of work for grab -> process -> put (much like the laundrey problem, wash your clothes while they are drying, but can't dry the washed stuff until the dryer is done). The more steps you have the more you will benefit from parallel processing while everything maintains order. You could still use the main thread as well if using the Actor model (at least I think so) and make it synchronous. Please correct me if I am being stupid, not exactly my specialty.

@timyates
Copy link
Owner Author

@tbarker9 How does 8a02c39 look?

@tbarker9
Copy link
Contributor

From a cursory glance I like it. Might open it up at lunch today for a more thorough review. So what happens when Java 8 comes out? Will this project still be relevant? I know very little about Java 8's stream ambitions, but have seen similar concepts.

@tbarker9
Copy link
Contributor

Oh, and what does Next up, zip so we can cross the streams ;-) mean? Ghostbusters would disagree.

@timyates
Copy link
Owner Author

Yeah, the Java 8 streams are a very similar beast.

Indeed, I renamed a lot of the functions in this package to follow their naming conventions

The only tricky bit I can think of would be using lambdas with the map/filter/etc blocks (and I haven't yet tested to see what happens)... With the map and filter steps, we set the delegate of the passed in closure, so that when you pass [ x: 1, y: 2 ] into a map closure, you can access the var like { x + y } instead of { it.x + it.y}

I am not sure if lambdas can be used in place of Closures in this instance :-(

More testing is required... Also, this package may be made obsolete by Java Streams when java 8 is the de-facto standard...

I guess this is also usable in Java 6+ (should be Java5+, but I can't test Java 5 at the moment)

My thoughts about a zip method were:

Say you have 2 iterators (or streams), you could have a:

Stream streamA = Stream.from 1..3
Stream streamB = Stream.from 'a'..'c'

Stream streamC = streamA.zip( streamB ) { a, b -> [ a: a, b: b ] }

assert streamC.collect() == [ [ a:1, b:'a' ], [ a:2, b:'b' ], [ a:3, b:'c' ] ]

Or even:

Stream streamA = Stream.from 1..3
Stream streamB = Stream.from 'a'..'c'

Stream streamC = streamA.zip( streamB ) { a, b -> [ a, b ] } flatMap { it }

assert streamC.collect() == [ 1, 'a', 2, 'b', 3, 'c' ]

So basically a transpose call on iterators...

It worked against Mr Stay-Puft 😆

@tbarker9
Copy link
Contributor

So I guess the main plus over java streams would be the use of closures? I assume you have functionality they don't have too?

So what about concat, is that in there? Seems like a useful function to have.

@timyates
Copy link
Owner Author

Yeah, they have concat as a static method of Stream:

http://download.java.net/jdk8/docs/api/java/util/stream/Stream.html#concat-java.util.stream.Stream-java.util.stream.Stream-

So you'd do:

Stream.concat( streamA,streamB )

Not sure whether I prefer this, or:

streamA.concat( streamB )

In RxJava (the push version of streams if you think of Streams as being a "pull" technique), they have concat to do the elements of one thing, followed by the elements of the other (as with Java Streams)

And zip to take an item from each and generate something new from that

@tbarker9
Copy link
Contributor

Yeah, I guess you can't support both. The second option has the appeal of maintaining method chaining which groovy-stream promotes, but the first is what java 8 does. I think I like the second one more. I kind of like

streamA.map{}.filter{}.concat(streamB).map{}.filter{}.etc{}

It wouldn't be unusual to transform a stream to adhere to a similar one and then process them both at once. Both methods accomplish this, but the fluid one looks better to me.

@tbarker9
Copy link
Contributor

Actually, why not support both with concat for the static version and append for method chaining?

@timyates
Copy link
Owner Author

I'm also thinking of adding mapWithIndex, filterWithIndex and untilWithIndex which accept 2 arg Closures the second of which will be the current local index of that Iterator

@tbarker9
Copy link
Contributor

yes please!

@tbarker9
Copy link
Contributor

If you got really nutty you could also do prepend, although its usefulness is questionable.

@timyates
Copy link
Owner Author

We could call them both concat as they would have different forms wouldn't they? The 2 arg concat would be the static one, and the 1 arg concat the instance one?

prepend makes my head hurt ;-)

@tbarker9
Copy link
Contributor

So I thought groovy gets mad if you have a static and non static method with the same name regardless of different parameters? If I am wrong then yeah, seems good. Although this is in java now so maybe it is fine.

@tbarker9
Copy link
Contributor

You could also do << or addAll

@timyates
Copy link
Owner Author

Think it's ok... THis seems to work:

class Test {
    String name
    static stuff( String name ) {
        println "Mmmm.. $name..."
        new Test( name: name )
    }

    def stuff() {
        println "Lovely $name..."
    }
}

Test.stuff( 'Stuff' ).stuff()

And prints:

Mmmm.. Stuff...
Lovely Stuff...

@tbarker9
Copy link
Contributor

Huh... Maybe this is a grails thing? I swear I upgraded something groovy related and had to stop doing that. I tested it out, seems to work for me too. Yeah, supporting both seems reasonable then.

@timyates
Copy link
Owner Author

@timyates
Copy link
Owner Author

And added streamA.concat( streamB )

949f8b6

@uris77
Copy link

uris77 commented Jul 30, 2014

Not sure if this is a functional element, but I've needed to do a lazy sort (even multi sort at times). I think RxJava has a toSortedList(Func2).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
None yet
Development

No branches or pull requests

3 participants