
Apache Spark Structured Streaming and AWS EMR memory issues

  1. Externalize all logs to an S3 bucket so they do not accumulate on the cluster.

  2. Reduce the checkpoint interval (see the checkpointing sketch at the end of this page).

  3. If the driver program performs operations that collect and hold data in memory, change the logic to keep the work on the executors (see the sketch at the end of this page).

  4. See the Spark monitoring configuration options: http://spark.apache.org/docs/latest/monitoring.html#spark-configuration-options

  5. Set your log level in /etc/spark/log4j.properties (a programmatic alternative is sketched at the end of this page).

  6. Event logs are kept by default on AWS EMR under hdfs:///var/log/spark/apps; to disable them, set spark.eventLog.enabled to false (see the sketch at the end of this page):

                   spark.eventLog.enabled = false
    
  7. Make sure any state you keep (using functions like mapWithState) does not grow in an unbounded fashion (see the bounded-state sketch at the end of this page), and periodically clean up logs under /tmp/spark.

  8. Disk usage: you can run out of space on HDFS (thereby crashing your app) when you have a cluster up for a long time.

For example, logs under /var/log/spark may pile up, especially if you have loose logging settings and/or print a lot of stuff to STDOUT.

You can check your current disk usage using commands such as

      $ hadoop fs -df -h /

      $ hadoop fs -du -h /

Configuring a RollingFileAppender and setting the log file location to YARN's log directory will avoid disk overflow caused by large log files, and the logs can be accessed using YARN's log utility.
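
The sketches below illustrate a few of the tips above in Scala; the bucket names, paths, schemas, and intervals in them are assumptions, not values from this page. For tip 2, this is a minimal Structured Streaming job showing where the checkpoint location and trigger interval are set; a checkpoint is committed at every trigger, so the trigger interval effectively controls how often checkpoints are written.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.streaming.Trigger

    val spark = SparkSession.builder().appName("checkpoint-sketch").getOrCreate()

    // Hypothetical JSON source; the schema and S3 paths are placeholders.
    val events = spark.readStream
      .format("json")
      .schema("id STRING, ts TIMESTAMP")
      .load("s3://my-bucket/input/")

    events.writeStream
      .format("parquet")
      .option("path", "s3://my-bucket/output/")
      // Keep the checkpoint on durable storage (HDFS or S3), not local disk.
      .option("checkpointLocation", "s3://my-bucket/checkpoints/")
      // A checkpoint is committed every trigger, so this interval also
      // controls checkpoint frequency.
      .trigger(Trigger.ProcessingTime("30 seconds"))
      .start()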
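
For tip 3, a sketch of keeping results on the executors instead of collecting them into driver memory; the input path and grouping column are placeholders.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().appName("driver-memory-sketch").getOrCreate()

    // Placeholder input path.
    val df = spark.read.parquet("s3://my-bucket/input/")

    // Anti-pattern: collect() pulls every row into the driver JVM.
    // val rows = df.collect()

    // Prefer keeping the work distributed and writing results out directly.
    df.groupBy("id").count()
      .write
      .mode("overwrite")
      .parquet("s3://my-bucket/aggregates/")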
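
For tip 5, a sketch of lowering the log level from inside the application as an alternative to editing /etc/spark/log4j.properties; this affects only the running application, not the EMR daemons.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder().getOrCreate()
    // Raise the threshold so INFO noise stops piling up in the logs.
    spark.sparkContext.setLogLevel("WARN")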
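
For tip 6, the same setting can also be applied per application when the session is built (equivalent to passing --conf spark.eventLog.enabled=false to spark-submit); it has to be set before the SparkContext starts.

    import org.apache.spark.sql.SparkSession

    val spark = SparkSession.builder()
      .appName("no-event-log")
      // Disable event logging so nothing accumulates under hdfs:///var/log/spark/apps.
      .config("spark.eventLog.enabled", "false")
      .getOrCreate()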
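
For tip 7, mapWithState belongs to the older DStream API; the Structured Streaming analogue is mapGroupsWithState, and a processing-time timeout is one way to keep that state from growing without bound. The Event schema, source path, and one-hour timeout below are assumptions for the sketch.

    import org.apache.spark.sql.SparkSession
    import org.apache.spark.sql.streaming.{GroupState, GroupStateTimeout}

    case class Event(user: String, count: Long)

    val spark = SparkSession.builder().appName("bounded-state-sketch").getOrCreate()
    import spark.implicits._

    // Hypothetical streaming source of Event rows.
    val events = spark.readStream
      .format("json")
      .schema("user STRING, count BIGINT")
      .load("s3://my-bucket/events/")
      .as[Event]

    // Running count per user; idle keys expire so the state store stays bounded.
    val counts = events
      .groupByKey(_.user)
      .mapGroupsWithState[Long, (String, Long)](GroupStateTimeout.ProcessingTimeTimeout) {
        (user: String, rows: Iterator[Event], state: GroupState[Long]) =>
          if (state.hasTimedOut) {
            state.remove()                     // drop state for keys that went idle
            (user, 0L)
          } else {
            val total = state.getOption.getOrElse(0L) + rows.map(_.count).sum
            state.update(total)
            state.setTimeoutDuration("1 hour") // reset the idle timer for this key
            (user, total)
          }
      }

The resulting Dataset would then be written out with writeStream in update output mode.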
