
spores not serializable with spark? #36

Open
nimbusgo opened this issue Mar 15, 2016 · 9 comments

@nimbusgo

This throws a java.io.NotSerializableException:

sc.parallelize(1 to 5).map(spore{(x: Int) => x * 2}).collect()

I thought this was the intended usage, am I missing something?

This looks like just the library I was hoping for, but I can't seem to find any way to specify a spore that passes Spark's closure serializer.

This is with Spark 1.6, Scala 2.11.6.

@nimbusgo
Author

This seems like an issue:

    val f = (x: Int) => x * 2
    val s = spore { (x: Int) => x * 2 }

    println(f.isInstanceOf[Serializable]) // true
    println(s.isInstanceOf[Serializable]) // false

@nimbusgo
Author

Oh, I was definitely missing something. I think this is the intended usage? It works with Spark now.

    // Assumes scala-pickling and the spores pickling integration are
    // on the classpath (exact imports may vary by version).
    import scala.pickling.Defaults._
    import scala.pickling.json._
    import scala.spores._

    val s = spore { (x: Int) => x * 2 }
    val res = s.pickle

    def unpack(jsonPickle: JSONPickle) =
      jsonPickle.unpickle[Spore[Int, Int]]

    sc.parallelize(1 to 5).map(x => unpack(res)(x)).foreach(println)

This is great, and I think I'll be able to use this with Spark now.

@nimbusgo
Author

Hmm, the JSON that comes out has something like this in it:

"className": "my.package.SomeClass$$anonfun$10$anonspore$macro$24$1"

And it only works if you unpickle this where you have that function on the classpath. You can't ship it somewhere else and unpickle it.

Is there any way to serialize a spore that can be run somewhere else, or is that a topic of future research?

@jvican
Collaborator

jvican commented Mar 23, 2016

Yes, the className is the prefix name of the Unpickler and Pickler that you'll use when you receive the spore on the recipient side. However, note that this class name depends on where the spore is defined. If you want to be able to unpickle it on the recipient, you'll need to share the same program between master and worker; that is, both sides must have a class SomeClass defined in my.package.
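
A sketch of that setup (module name and layout are hypothetical): define the spore in code that is compiled into the jar deployed to both sides, so the macro-generated class exists on both classpaths.

    // shared code, compiled into the jar shipped to both master and workers
    package my.`package`

    import scala.spores._

    object SharedSpores {
      // The macro-generated class for this spore lives under
      // my.package.SharedSpores..., so any JVM with this jar on its
      // classpath can unpickle it.
      val double: Spore[Int, Int] = spore { (x: Int) => x * 2 }
    }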

@heathermiller
Owner

Hotswapping (dynamically loading new classfiles) is not future work; there have been decades of not-so-successful research on mobile code. Security is the biggest problem on that front, as I'm sure you can understand.

So if you'd like to use spores, you'd have to do what Spark does and ship new jars to workers.
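
In practice that can be done with Spark's standard mechanisms, e.g. spark-submit --jars, or from the driver (the path below is a placeholder):

    // Ship the jar containing the compiled spore classes to every
    // worker so the generated classes can be loaded there.
    sc.addJar("/path/to/app-with-spores.jar")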

@jvican
Collaborator

jvican commented Jun 1, 2016

Could we close this and, maybe, add a note about this limitation to the project README so that people know about it from the beginning?

@samthebest

The conclusion of this thread isn't clear to me. In Spark, the jar does get shipped from the driver to the workers, so spores should be able to work.

In what situations is that not the case? I'm not sure what the "limitation" is, then.

@nimbusgo
Author

nimbusgo commented Jun 2, 2016

So, I definitely understand the reasoning on the tangent: it would require shipping class files / jars, which is definitely a hassle. I'm not sure, though, how it would be any more of a security risk than a browser-accessible REPL (notebooks), which is one of the primary ways people use Spark.

To return to the original issue, is there a good reason why this shouldn't work?

sc.parallelize(1 to 5).map(spore{(x: Int) => x * 2}).collect()

When I found this project and the related slides, I thought this was one of the primary intended uses.

I mean, Spark is explicitly mentioned as a use case for spores here: https://speakerdeck.com/heathermiller/spores-distributable-functions-in-scala

If this is not the intended way to use spores with Spark, I'm curious what the intended way is, or whether spores are not intended to work with Spark.

As I mentioned in an earlier reply, I did find a way to make use of spores, by manually pickling to and unpickling from JSON. But this seems like a lot of effort for not much reward. You get protection against closure problems within the immediate scope of the declared function; that is pretty good, and covers many cases. But it doesn't guarantee that your function is actually free of closure problems, because it could call a function that has closure problems without knowing it. Worse, you can get inconsistent behavior when this happens.

Try this out. In theory, sp and its unpickled copy s should be the same function, but they produce different results.


    // Same pickling imports as in the earlier snippet.
    import scala.pickling.Defaults._
    import scala.pickling.json._
    import scala.spores._

    val y = 1
    val f = (x: Int) => x * 2 + y
    val sp = spore {
      val fun = f
      (x: Int) => fun(x)
    }

    val s = JSONPickle(sp.pickle.value).unpickle[Spore[Int, Int]]

Also, with this method, because I have to cast when unpickling, I'm just trading one set of potential bugs (closure serialization) for another (bad type casts). This makes me think I've got the wrong idea about this and that there's got to be a better way.
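
One way to contain the casting risk (a sketch; the object and method names are hypothetical) is to write the spore's type down exactly once, next to its definition, so every unpickle site reuses the same type instead of repeating the cast:

    import scala.pickling.Defaults._
    import scala.pickling.json._
    import scala.spores._

    // Hypothetical helper: pack/unpack share one type alias, so the
    // unpickle cast can't silently drift to a different Spore type.
    object DoublerSpore {
      type S = Spore[Int, Int]
      def pack(s: S): String = s.pickle.value
      def unpack(json: String): S = JSONPickle(json).unpickle[S]
    }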

@jvican
Collaborator

jvican commented Jun 2, 2016

Hi @nimbusgo.

No, spores is not a project made exclusively for Spark support; it has more use cases, all of them related to the use of safe, reliable closures in the context of distributed systems (e.g. I'm using it in the function-passing model). While we agree that it should work, we don't provide official support here, since this is more an issue of solving compatibility problems with Spark. Nonetheless, I'd be happy to help you get it working in Spark; I agree that it's an important target.

There's an ongoing PR that considerably improves the serialization/deserialization experience with scala-pickling. That is the default mechanism for serializing and deserializing spores, and I'd encourage you to give it a try. Spark uses its own serialization infrastructure (IIRC Kryo), and serializing and deserializing spores needs some special information that normal reflection mechanisms lack.
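
For context, this is how Spark is usually switched to Kryo (standard Spark configuration, not spores-specific); note that Spark still serializes closures themselves with Java serialization, which is the path the original snippet fails on:

    import org.apache.spark.{SparkConf, SparkContext}

    val conf = new SparkConf()
      .setAppName("spores-test")
      // Kryo replaces Java serialization for shuffled/cached data;
      // the closure serializer remains Java serialization.
      .set("spark.serializer", "org.apache.spark.serializer.KryoSerializer")
    val sc = new SparkContext(conf)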

W.r.t. the first code snippet, it should work perfectly, but, again, that depends on what Spark is doing under the hood (I'm no expert in Spark, so I can't give you good feedback on this). If it doesn't work, please let us know why, with a small toy project or detailed error information.

The second example that you provide compiles, and I'm going to explain why. Spores have some limitations because they use macros. By definition, macros can't inspect the environment in which they are expanded. So, in this case, there's no way the macro can inspect the body of f, because the spore only holds a reference to f. You could remove this limitation by putting the body of f directly into the spore, or by ascribing f a spore type, in which case an automatic conversion takes place and compilation fails. Also, if you remove val fun = f from the spore header, it won't compile. If users decide to make use of functions from the outer scope and declare them in the header, there is almost nothing we can do to prevent it. What we may be able to do, though, is disallow any use of functions in the Captured type member. I'll entertain this idea for a while, and I may even implement it, but I have to look deeply into the problem. A sketch of these variants follows below.
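
A sketch of the variants described above (assumes the spores syntax from the earlier snippets):

    import scala.spores._

    val y = 1
    val f = (x: Int) => x * 2 + y

    // Rejected: the macro sees the body and the free reference to `y`.
    // val bad = spore { (x: Int) => x * 2 + y }      // does not compile

    // Rejected: ascribing a spore type triggers the conversion macro,
    // which inspects f's body and finds the capture of `y`.
    // val alsoBad: Spore[Int, Int] = f               // does not compile

    // Accepted: the header captures only a reference to `f`; the macro
    // cannot look inside `f`, so the hidden capture of `y` slips through.
    val compiles = spore {
      val fun = f
      (x: Int) => fun(x)
    }

    // A safe alternative: capture the plain value explicitly, so the
    // closure's whole environment is spelled out in the header.
    val safe = spore {
      val captured = y
      (x: Int) => x * 2 + captured
    }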

So, in conclusion, two things:

  1. Scala pickling and spores work perfectly together (but wait a few days, because we're cutting releases of both soon). I don't know what you mean by packing and unpacking, but you can serialize a spore with s.pickle and deserialize it with unpickle[SporeType].
  2. Spores are not the ultimate solution for safe closures and have some limitations. However, in most cases they work. In fact, your example falls into one of those limited cases where it's the developer's fault, not ours (remember, there is almost nothing we can do to prevent it).

Right now, I'm focused on improving the test infrastructure and reducing the number of potential bugs. If you feel that spores is still a useful project, I'd very much like to see error reports that I can use to improve its overall quality.
