Serialization issues with Mono 5 #689
For anyone else out there expecting their PySpark-style lambda expressions to work as-is: you are going to have a bad day. The fundamental issue is that the C# compiler does not make anonymous functions that capture outside data serializable. SparkCLR does some work to serialize fully closed lambda expressions, but any lambda that references variables outside its own scope is compiled into a closure class (the <>c__DisplayClass type reported in the error) that is not marked as serializable. I'm sure there are ways to elegantly turn any anonymous function into a method on a serializable class, but they have thus far eluded me.

The bottom line is that you need to write a serializable helper class and pass a method from that class directly to Map, etc., without any lambda expressions that are not fully closed. The SparkCLR samples include a BroadcastHelper class that can help you transfer data as a broadcast variable, and you can use the same architecture to send any sort of data to the worker threads: first initialize a new object with the data you want used in the delegate, then pass that object's method. That can work, whereas a lambda that captures the same data does not.

I would love to chat with anyone out there who has the experience to make these kinds of lambda expressions work out of the box without helper classes.
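A minimal sketch of the helper-class pattern described in this comment. The class name PathHelper and its Execute method are hypothetical, and the commented-out Map call assumes an RDD<string> named rdd and that Map accepts a Func<string, string>; treat it as an illustration rather than the samples' actual code.

```csharp
using System;

// Hypothetical helper: the runtime value travels as a field of a class we
// control, so the delegate's target type can be marked [Serializable].
[Serializable]
public class PathHelper
{
    private readonly string path;

    public PathHelper(string path)
    {
        this.path = path;
    }

    // Shaped like Func<string, string> so it can be passed as a method group.
    public string Execute(string input)
    {
        Console.WriteLine(path);
        return input;
    }
}

// Assumed usage (not runnable without SparkCLR and an existing RDD<string> rdd):
//   string x = "/path";
//   var helper = new PathHelper(x);
//   var results = rdd.Map(helper.Execute);
```

The point of the design is that the delegate handed to Map targets a PathHelper instance instead of a compiler-generated closure object, so the serializer only encounters a type that is explicitly marked as serializable.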
I get a serialization error any time I'm passing data at runtime. This happens with any calls using a lambda expression with data originating outside of the lambda expression. Using broadcast variables also gives the same error.
This gives a serialization error:
string x = "/path"; var results = rdd.Map(input => { Console.WriteLine(x); return input; });
but there is no serialization error here:
var results = rdd.Map(input => { Console.WriteLine("/path"); return input; });
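The difference between the two calls can be inspected without Spark by looking at the object each delegate is bound to. The following is a small sketch; ClosureDemo and its members are hypothetical names, and the exact names of the generated classes are compiler implementation details that may differ between Roslyn and mcs.

```csharp
using System;

public static class ClosureDemo
{
    // Lambda that captures the local variable x. The compiler hoists x into
    // a generated closure class and binds the delegate to an instance of it.
    public static Func<string, string> Capturing()
    {
        string x = "/path";
        return input => x + input;
    }

    // Lambda that uses only its own parameter and a constant; nothing is
    // captured, so no closure object holding runtime data is needed.
    public static Func<string, string> NonCapturing()
    {
        return input => "/path" + input;
    }

    // Describes the delegate's target: its type name and whether that type
    // is marked [Serializable]. A null target means a static method.
    public static string Describe(Func<string, string> f)
    {
        if (f.Target == null)
        {
            return "static method, no target object";
        }
        Type t = f.Target.GetType();
        return t.FullName + " (IsSerializable: " + t.IsSerializable + ")";
    }
}
```

Calling ClosureDemo.Describe(ClosureDemo.Capturing()) should report a generated closure type that is not serializable, which is the kind of type the SerializationException complains about.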
This is running Spark 2.0.2 on Linux with Mono 5.10.1.20, built with msbuild. I've also tested Mono 4.8.1 with xbuild and get the same error.
actual error:
ERROR System.Runtime.Serialization.SerializationException: Type '(MyClassName+<>c__DisplayClass4_0' in Assembly 'MyAssembly, Version=1.0.0.0, Culture=neutral, PublicKeyToken=null' is not marked as serializable.
at System.Runtime.Serialization.FormatterServices.InternalGetSerializableMembers (System.RuntimeType type) [0x00045] in <8fbafb724c144c9dad69bccfec38ae40>:0
at System.Runtime.Serialization.FormatterServices+<>c__DisplayClass9_0.b__0 (System.Runtime.Serialization.MemberHolder _) [0x00000] in <8fbafb724c144c9dad69bccfec38ae40>:0
at System.Collections.Concurrent.ConcurrentDictionary`2[TKey,TValue].GetOrAdd (TKey key, System.Func`2[T,TResult] valueFactory) [0x00034] in <8fbafb724c144c9dad69bccfec38ae40>:0
at System.Runtime.Serialization.FormatterServices.GetSerializableMembers (System.Type type, System.Runtime.Serialization.StreamingContext context) [0x0005e] in <8fbafb724c144c9dad69bccfec38ae40>:0