
spores not serializable with spark? #36

Open
nimbusgo opened this issue Mar 15, 2016 · 9 comments

@nimbusgo

This throws a java.io.NotSerializableException:

sc.parallelize(1 to 5).map(spore{(x: Int) => x * 2}).collect()

I thought this was the intended usage, am I missing something?

This looks like just the library I was hoping for, but I can't seem to find any way to specify a spore that passes Spark's closure serializer.

This is with Spark 1.6, Scala 2.11.6.

@nimbusgo
Author

This seems like an issue:

    val f = (x: Int) => x * 2
    val s = spore { (x: Int) => x * 2 }

    println(f.isInstanceOf[Serializable]) // true
    println(s.isInstanceOf[Serializable]) // false

@nimbusgo
Author

Oh, I was definitely missing something. I think this is the intended usage? It works with Spark now.

    // Assumes scala-pickling and the spores pickling integration are
    // on the classpath (exact imports may vary by version).
    import scala.pickling.Defaults._
    import scala.pickling.json._
    import scala.spores._

    val s = spore { (x: Int) => x * 2 }
    val res = s.pickle

    def unpack(jsonPickle: JSONPickle) =
      jsonPickle.unpickle[Spore[Int, Int]]

    sc.parallelize(1 to 5).map(x => unpack(res)(x)).foreach(println)

This is great, and I think I'll be able to use this with Spark now.

@nimbusgo
Author

Hmm, the JSON that comes out has something like this in it:

"className": "my.package.SomeClass$$anonfun$10$anonspore$macro$24$1"

And it only works if you unpickle this where you have that function on the classpath. You can't ship it somewhere else and unpickle it.

Is there any way to serialize a spore that can be run somewhere else, or is that a topic of future research?

@jvican
Collaborator

jvican commented Mar 23, 2016

Yes, the className is the prefix name of the Unpickler and Pickler that you'll use when you receive the spore on the recipient side. However, note that this class name depends on where the spore is defined. If you want to be able to unpickle it on the recipient, you'll need to share the same program between master and worker; that is, both sides must have a class SomeClass defined in my.package.
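
A sketch of that setup (module name and layout are hypothetical): define the spore in code that is compiled into the jar deployed to both sides, so the macro-generated class exists on both classpaths.

    // shared code, compiled into the jar shipped to both master and workers
    package my.`package`

    import scala.spores._

    object SharedSpores {
      // The macro-generated class for this spore lives under
      // my.package.SharedSpores..., so any JVM with this jar on its
      // classpath can unpickle it.
      val double: Spore[Int, Int] = spore { (x: Int) => x * 2 }
    }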

@heathermiller
Owner

Hotswapping (dynamically loading new classfiles) is not future work; there have been decades of not-so-successful research on mobile code. Security is the biggest problem on that front, as I'm sure you can understand.

So if you'd like to use spores, you'd have to do what Spark does and ship new jars to workers.
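
In practice that can be done with Spark's standard mechanisms, e.g. spark-submit --jars, or from the driver (the path below is a placeholder):

    // Ship the jar containing the compiled spore classes to every
    // worker so the generated classes can be loaded there.
    sc.addJar("/path/to/app-with-spores.jar")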

@jvican
Collaborator

jvican commented Jun 1, 2016

Could we close this and, maybe, add a note about this limitation to the project README so that people know about it from the beginning?

@samthebest

The conclusion of this thread isn't clear to me. In Spark, the jar does get shipped from the driver to the workers, so spores should be able to work.

In what situations is that not the case? I'm not sure what the "limitation" is, then.

@nimbusgo
Author

nimbusgo commented Jun 2, 2016

So, I definitely understand the reasoning on the tangent: it would require shipping class files / jars, which is definitely a hassle. I'm not sure, though, how it would be any more of a security risk than a browser-accessible REPL (notebooks), which is one of the primary ways people use Spark.

To return to the original issue, is there a good reason why this shouldn't work?

sc.parallelize(1 to 5).map(spore{(x: Int) => x * 2}).collect()

When I found this project and the related slides, I thought this was one of the primary intended uses.

I mean, Spark is explicitly mentioned as a use case for spores here: https://speakerdeck.com/heathermiller/spores-distributable-functions-in-scala

If this is not the intended way to use spores with Spark, I'm curious what the intended way is, or whether spores are not intended to work with Spark.

As I mentioned in an earlier reply, I did find a way to make use of spores, by manually pickling to and unpickling from JSON. But this seems like a lot of effort for not much reward. You get protection against closure problems within the immediate scope of the declared function; that is pretty good, and covers many cases. But it doesn't guarantee that your function is actually free of closure problems, because it could call a function that has closure problems without knowing it. Worse, you can get inconsistent behavior when this happens.

Try this out. In theory, sp and its unpickled copy s should be the same function, but they produce different results.


    // Same pickling imports as in the earlier snippet.
    import scala.pickling.Defaults._
    import scala.pickling.json._
    import scala.spores._

    val y = 1
    val f = (x: Int) => x * 2 + y
    val sp = spore {
      val fun = f
      (x: Int) => fun(x)
    }

    val s = JSONPickle(sp.pickle.value).unpickle[Spore[Int, Int]]

Also, with this method, because I have to cast when unpickling, I'm just trading one set of potential bugs (closure serialization) for another (bad type casts). This makes me think I've got the wrong idea about this and that there's got to be a better way.
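
One way to contain the casting risk (a sketch; the object and method names are hypothetical) is to write the spore's type down exactly once, next to its definition, so every unpickle site reuses the same type instead of repeating the cast:

    import scala.pickling.Defaults._
    import scala.pickling.json._
    import scala.spores._

    // Hypothetical helper: pack/unpack share one type alias, so the
    // unpickle cast can't silently drift to a different Spore type.
    object DoublerSpore {
      type S = Spore[Int, Int]
      def pack(s: S): String = s.pickle.value
      def unpack(json: String): S = JSONPickle(json).unpickle[S]
    }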

@jvican
Collaborator

jvican commented Jun 2, 2016

Hi @nimbusgo.

No, spores is not a project made exclusively for Spark support; it has more use cases, all of them related to the use of safe, reliable closures in the context of distributed systems (e.g. I'm using it in the function-passing model). While we agree that it should work, we don't provide official support here, since this is more an issue of solving compatibility problems with Spark. Nonetheless, I'd be happy to help you get it working in Spark; I agree that it's an important target.

There's an ongoing PR that considerably improves the serialization/deserialization experience with scala-pickling. That is the default mechanism for serializing and deserializing spores, and I'd encourage you to give it a try. Spark uses its own serialization infrastructure (IIRC Kryo), and serializing and deserializing spores needs some special information that normal reflection mechanisms lack.
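
For context, this is how Spark is usually switched to Kryo (standard Spark configuration, not spores-specific); note that Spark still serializes closures themselves with Java serialization, which is the path the original snippet fails on:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("spores-test")
      // Kryo replaces Java serialization for shuffled/cached data;
      // the closure serializer remains Java serialization.
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    val sc = new SparkContext(conf)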

W.r.t. the first code snippet, it should work perfectly, but, again, that depends on what Spark is doing under the hood (I'm no expert in Spark, so I can't give you good feedback on this). If it doesn't work, please let us know why, with a small toy project or detailed error information.

The second example that you provide compiles, and I'm going to explain why. Spores have some limitations because they use macros. By definition, macros can't inspect the environment in which they are expanded. So, in this case, there's no way the macro can inspect the body of f, because the spore only holds a reference to f. You could remove this limitation by putting the body of f directly into the spore, or by ascribing f a spore type, in which case an automatic conversion takes place and compilation fails. Also, if you remove val fun = f from the spore header, it won't compile. If users decide to make use of functions from the outer scope and declare them in the header, there is almost nothing we can do to prevent it. What we may be able to do, though, is disallow any use of functions in the Captured type member. I'll entertain this idea for a while, and I may even implement it, but I have to look deeply into the problem. A sketch of these variants follows below.
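
A sketch of the variants described above (assumes the spores syntax from the earlier snippets):

    import scala.spores._

    val y = 1
    val f = (x: Int) => x * 2 + y

    // Rejected: the macro sees the body and the free reference to `y`.
    // val bad = spore { (x: Int) => x * 2 + y }      // does not compile

    // Rejected: ascribing a spore type triggers the conversion macro,
    // which inspects f's body and finds the capture of `y`.
    // val alsoBad: Spore[Int, Int] = f               // does not compile

    // Accepted: the header captures only a reference to `f`; the macro
    // cannot look inside `f`, so the hidden capture of `y` slips through.
    val compiles = spore {
      val fun = f
      (x: Int) => fun(x)
    }

    // A safe alternative: capture the plain value explicitly, so the
    // closure's whole environment is spelled out in the header.
    val safe = spore {
      val captured = y
      (x: Int) => x * 2 + captured
    }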

So, in conclusion, two things:

  1. Scala pickling and spores work perfectly together (but wait a few days, because we're cutting releases of both soon). I don't know what you mean by packing and unpacking, but you can serialize a spore with s.pickle and deserialize it with unpickle[SporeType].
  2. Spores are not the ultimate solution for safe closures and have some limitations. However, in most cases they work. In fact, your example falls into one of those limited cases where it's the developer's fault, not ours (remember, there is almost nothing we can do to prevent it).

Right now, I'm focused on improving the test infrastructure and reducing the number of potential bugs. If you feel that spores is still a useful project, I'd very much like to see error reports that I can use to improve its overall quality.
