concept checkpoint in category spark

appears as: checkpoints, checkpoint, A checkpoint, checkpoint
Spark in Action, Second Edition

This is an excerpt from Manning's book Spark in Action, Second Edition.

Spark will need a checkpoint directory to store its intermediate states and checkpoints (you’ll learn more about checkpoints in chapter 14). You can specify it here per output stream, or globally at the SparkSession level by using SparkSession.conf.set("spark.sql.streaming.checkpointLocation", ...).
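As a minimal sketch of those two options, here is how they might look in PySpark. This is an illustration, not a listing from the book: the helper names set_global_checkpoint_location and start_parquet_query, the rate source, the Parquet sink, and every path are assumptions chosen for the example. Only the configuration key and the checkpointLocation writer option come from Spark itself.

```python
# Sketch only: helper names, the rate source, the Parquet sink, and all
# paths are illustrative assumptions, not taken from the book.

CHECKPOINT_KEY = "spark.sql.streaming.checkpointLocation"


def set_global_checkpoint_location(spark, checkpoint_dir):
    """Set a session-level default checkpoint location for streaming queries."""
    spark.conf.set(CHECKPOINT_KEY, checkpoint_dir)
    return spark


def start_parquet_query(stream_df, output_dir, checkpoint_dir=None):
    """Start a streaming write to Parquet files.

    When checkpoint_dir is given, it is set on this output stream only;
    otherwise the query relies on the session-level setting.
    """
    writer = (
        stream_df.writeStream
        .format("parquet")
        .outputMode("append")
        .option("path", output_dir)
    )
    if checkpoint_dir is not None:
        writer = writer.option("checkpointLocation", checkpoint_dir)
    return writer.start()


def demo():
    """Run both variants locally for a few seconds (requires pyspark)."""
    from pyspark.sql import SparkSession

    spark = (
        SparkSession.builder
        .appName("checkpoint-location-sketch")
        .master("local[*]")
        .getOrCreate()
    )
    # Global default, used by any query that does not set its own location.
    set_global_checkpoint_location(spark, "/tmp/checkpoints/default")

    # The built-in "rate" source generates rows, which is handy for a demo.
    rows = spark.readStream.format("rate").load()

    # Per-output-stream location, overriding the session-level default.
    query = start_parquet_query(rows, "/tmp/out/rate", "/tmp/checkpoints/rate")
    query.awaitTermination(10)  # wait up to 10 seconds
    query.stop()
    spark.stop()
```

Calling demo() needs a local PySpark installation. The two helpers only chain calls on the objects they receive, so you can read them independently of the demo.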

Checkpoints are another way to increase Spark performance. In this subsection, you’ll learn what checkpointing is, what kinds of checkpointing you can perform, and how it differs from caching.

Figure 16.1 Visual representation of the transformation: if you do not put a cache or checkpoint after the filter, the filter will be recomputed every time.
...
1995 ... 1337
 
Processing times
Without cache ............... 3618 ms
With cache .................. 2559 ms
With checkpoint ............. 1860 ms
With non-eager checkpoint ... 1420 ms
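To make the four rows above concrete, here is a hedged sketch in PySpark of how such a comparison could be structured. It is not the book's benchmark: the function names reuse_strategy and time_mode, the mode labels, the generated dataset, and the checkpoint path are all assumptions, and the timings it prints will not match the figures above. The Spark calls it relies on are DataFrame.cache(), DataFrame.checkpoint(eager=...), and SparkContext.setCheckpointDir().

```python
import time

# Sketch only: mode labels, function names, dataset, and paths are
# illustrative assumptions, not the book's benchmark.
MODES = ("no_cache", "cache", "checkpoint", "checkpoint_non_eager")


def reuse_strategy(df, mode):
    """Return the dataframe to reuse downstream, according to the mode."""
    if mode == "no_cache":
        return df  # every action recomputes the full lineage
    if mode == "cache":
        return df.cache()  # lazy: materialized by the first action
    if mode == "checkpoint":
        return df.checkpoint(eager=True)  # written to the checkpoint dir now
    if mode == "checkpoint_non_eager":
        return df.checkpoint(eager=False)  # written when first computed
    raise ValueError(f"unknown mode: {mode}")


def time_mode(df, mode, actions=3):
    """Apply the strategy, run several actions, return elapsed milliseconds."""
    start = time.perf_counter()
    reused = reuse_strategy(df, mode)
    for _ in range(actions):
        reused.count()
    return (time.perf_counter() - start) * 1000.0


def demo(checkpoint_dir="/tmp/spark-checkpoints"):
    """Compare the four modes on generated data (requires pyspark)."""
    from pyspark.sql import SparkSession
    from pyspark.sql.functions import col

    spark = (
        SparkSession.builder
        .appName("cache-vs-checkpoint-sketch")
        .master("local[*]")
        .getOrCreate()
    )
    # checkpoint() fails unless a checkpoint directory has been set.
    spark.sparkContext.setCheckpointDir(checkpoint_dir)

    base = spark.range(1_000_000).withColumn("bucket", col("id") % 10)
    for mode in MODES:
        filtered = base.filter(col("bucket") < 5)
        print(f"{mode}: {time_mode(filtered, mode):.0f} ms")
        # Drop any cached copy so it cannot speed up the next mode.
        filtered.unpersist()
    spark.stop()
```

Treat the numbers from demo() as rough indications only: the dataset is small and synthetic, and a single local run is noisy.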