I would love that. It seems to me that not running the computations in parallel is the reason models could underperform: some partitions finish last and therefore have the biggest impact on the final model (https://www.youtube.com/watch?v=nNrdv45O3pE at 15:00). At least this is what I understood from this interesting talk.
BTW: Should all partitions of the training data be the same size? Are there any guarantees on how close the model performance of this async training is to that of normal training, or other estimates that would help me get a grasp on the impact of this execution model? For example: given the same initial weights and the same data, would different training runs lead to very different final weights just because other processes running on the workers cause partitions to finish in different orders?
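To make the ordering question concrete, here is a toy sketch in plain Python. It is not SparkFlow's training loop, and the data, learning rate, and function names are made up for illustration. It applies one SGD update per "partition" to a 1-D linear model in every possible finishing order, and compares that with a synchronous step that averages all gradients computed at the same starting weight.

```python
import itertools


def sgd_step(w, x, y, lr=0.1):
    """One SGD step for the 1-D model y ~ w * x with squared error."""
    grad = 2.0 * (w * x - y) * x
    return w - lr * grad


def train_in_order(w0, batches, order, lr=0.1):
    """Apply one update per partition, in the order the partitions 'finish'."""
    w = w0
    for idx in order:
        x, y = batches[idx]
        w = sgd_step(w, x, y, lr)
    return w


def synchronous_step(w0, batches, order, lr=0.1):
    """Average the gradients of all partitions at the same weight w0."""
    grads = [2.0 * (w0 * batches[i][0] - batches[i][1]) * batches[i][0] for i in order]
    return w0 - lr * sum(grads) / len(grads)


# Three hypothetical partitions, each reduced to a single (x, y) example.
batches = [(1.0, 2.0), (2.0, 3.0), (3.0, 7.0)]
orders = list(itertools.permutations(range(len(batches))))

# Sequential (order-dependent) updates: one final weight per finishing order.
final_weights = {order: train_in_order(0.0, batches, order) for order in orders}
# Synchronous (averaged) update: the finishing order should not matter.
sync_weights = {order: synchronous_step(0.0, batches, order) for order in orders}

for order in orders:
    print(order, round(final_weights[order], 4), round(sync_weights[order], 4))
```

In this toy setup the sequential updates end at different weights depending on the order, while the averaged step gives the same weight for every order. How large the effect is for a real network over many iterations is a separate question that this sketch does not answer.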
Spark 2.4.0 added barrier execution mode, which ensures that all tasks of a stage run at the same time. We should use this for training in SparkFlow.
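For reference, a minimal sketch of what a barrier stage looks like in PySpark 2.4+. This is only an illustration of the Spark API, not a proposal for how SparkFlow should wire it in: `partition_mean`, `train_partition`, and `run_demo` are hypothetical names, and the per-partition "training" is just a mean over the rows.

```python
def partition_mean(rows):
    """Pure helper: stand-in for one partition's local training result."""
    values = list(rows)
    return sum(values) / len(values) if values else 0.0


def train_partition(iterator):
    """Runs inside a barrier stage, where all tasks are launched together."""
    from pyspark import BarrierTaskContext  # available since Spark 2.4.0

    context = BarrierTaskContext.get()
    local_result = partition_mean(iterator)
    # Global sync point: blocks until every task in the stage reaches it.
    context.barrier()
    yield (context.partitionId(), local_result)


def run_demo():
    """Start a local session with 4 slots and run 4 barrier tasks."""
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder.master("local[4]")
        .appName("barrier-sketch")
        .getOrCreate()
    )
    try:
        rdd = spark.sparkContext.parallelize(range(8), numSlices=4)
        # barrier() marks the stage so that all 4 tasks are scheduled at once.
        return sorted(rdd.barrier().mapPartitions(train_partition).collect())
    finally:
        spark.stop()
```

Calling `run_demo()` requires a PySpark installation. Note that a barrier stage needs as many free task slots as it has partitions, which is why the sketch pairs `local[4]` with `numSlices=4`.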