Sk Kamaruddin February 2016

Memory error in standalone Spark cluster: "shutting down JVM since 'akka.jvm-exit-on-fatal-error' is enabled for ActorSystem[Remote]"

I get the following memory error in my standalone Spark cluster after about 140 iterations of my code. How can I run my code without hitting this memory fault?

I have 7 nodes with 8 GB RAM each, of which 6 GB is allocated to the workers. The master also has 8 GB RAM.

[error] application - Remote calculator (Actor[akka.tcp://Remote@127.0.0.1:44545/remote/akka.tcp/NotebookServer@127.0.0.1:50778/user/$c/$a#872469007]) has been terminated !!!!!
[info] application - View notebook 'kamaruddin/PSOAANN_BreastCancer_optimized.snb', presentation: 'None'
[info] application - Closing websockets for kernel 6c8e8090-cbeb-430e-9d45-5710ce60b984
Uncaught error from thread [Remote-akka.actor.default-dispatcher-6] shutting down JVM since 'akka.jvm-exit-on-fatal-error' is enabled for ActorSystem[Remote]
Exception in thread "Thread-36" java.lang.OutOfMemoryError: GC overhead limit exceeded
    at java.util.jar.Attributes.read(Attributes.java:394)
    at java.util.jar.Manifest.read(Manifest.java:199)
    at java.util.jar.Manifest.<init>(Manifest.java:69)
    at java.util.jar.JarFile.getManifestFromReference(JarFile.java:186)
    at java.util.jar.JarFile.getManifest(JarFile.java:167)
    at sun.misc.URLClassPath$JarLoader$2.getManifest(URLClassPath.java:779)
    at java.net.URLClassLoader.defineClass(URLClassLoader.java:416)
    at java.net.URLClassLoader.access$100(URLClassLoader.java:71)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:361)
    at java.net.URLClassLoader$1.run(URLClassLoader.java:355)
    at java.security.AccessController.doPrivileged(Native Method)
    at java.net.URLClassLoader.findClass(URLClassLoader.java:354)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:425)
    at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:308)
    at java.lang.ClassLoader.loadClass(ClassLoader.java:358)
    at org.apache.spark.repl.SparkIMain$ReadEvalPrint.bindError(SparkIMain.scala:1041)
    at org.apache.spark.repl.SparkIMain$Request.        

Answers


Hlib February 2016

Maybe you can try to use checkpointing.

Data checkpointing - Saving of the generated RDDs to reliable storage. This is necessary in some stateful transformations that combine data across multiple batches. In such transformations, the generated RDDs depend on RDDs of previous batches, which causes the length of the dependency chain to keep increasing with time. To avoid such unbounded increases in recovery time (proportional to the dependency chain), intermediate RDDs of stateful transformations are periodically checkpointed to reliable storage (e.g. HDFS) to cut off the dependency chain.
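
In an iterative job like yours, each new RDD is derived from the previous one, so the lineage (and the memory needed to track it) grows with every iteration until the JVM runs out. Below is a minimal sketch of periodic checkpointing inside an iterative loop; the app name, checkpoint directory, update step and checkpoint interval are assumptions for illustration, not taken from your notebook.

    import org.apache.spark.{SparkConf, SparkContext}

    // Minimal sketch of periodic RDD checkpointing in an iterative loop.
    // Names, the update step and the interval are hypothetical placeholders.
    object CheckpointedIterations {
      def main(args: Array[String]): Unit = {
        val sc = new SparkContext(new SparkConf().setAppName("iterative-job"))

        // Checkpoints must go to reliable storage such as HDFS.
        sc.setCheckpointDir("hdfs:///tmp/spark-checkpoints")

        var state = sc.parallelize(1 to 1000).map(_.toDouble)

        for (iter <- 1 to 500) {
          // Hypothetical update: each iteration derives a new RDD from the
          // previous one, so the lineage keeps growing unless it is cut.
          state = state.map(_ + 1.0).cache()

          if (iter % 20 == 0) {
            state.checkpoint() // truncate the lineage every 20 iterations
            state.count()      // force an action so the checkpoint is written
          }
        }

        println(state.sum())
        sc.stop()
      }
    }

Note that checkpoint() by itself is lazy; an action such as count() is needed so the RDD is actually materialized to the checkpoint directory and its lineage truncated.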

Post Status

Asked in February 2016
Viewed 1,720 times
Voted 14
Answered 1 time
