Could you help me with instructions on how to set the checkpoint directory for a PySpark session on IBM's Data Science Experience?
The need arose because I have to run connectedComponents() from GraphFrames, and it raises the following error:
Py4JJavaError: An error occurred while calling o221.run.
: java.io.IOException: Checkpoint directory is not set. Please set it first using sc.setCheckpointDir().
The main issue is finding the notebook's working directory, so that it can be used to set the checkpoint directory with sc.setCheckpointDir(). This can be done easily with
!pwd
Then create a directory for checkpoints under that path:
!mkdir <pwd_output>/checkpoints
Finally, set the checkpoint directory:
spark.sparkContext.setCheckpointDir('<pwd_output>/checkpoints')
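The same three steps can also be done from Python alone, without copying the output of !pwd by hand. The following is only a sketch: the helper names prepare_checkpoint_dir and set_checkpoint_dir are made up for illustration, and it assumes the notebook already provides a SparkSession named spark, as in the line above. Note that on a multi-node cluster Spark expects the checkpoint directory to be on a filesystem shared by all executors (such as HDFS), so a local path like this may only be suitable when the working directory is visible to every node.

```python
import os


def prepare_checkpoint_dir(base_dir=None, name="checkpoints"):
    """Create <base_dir>/<name> if needed and return its path.

    base_dir defaults to the current working directory, which is
    what `!pwd` prints in the notebook.
    """
    base_dir = base_dir or os.getcwd()
    path = os.path.join(base_dir, name)
    # exist_ok=True makes re-running the cell harmless
    os.makedirs(path, exist_ok=True)
    return path


def set_checkpoint_dir(spark, path):
    """Point the SparkContext behind `spark` at the checkpoint directory."""
    spark.sparkContext.setCheckpointDir(path)
```

In the notebook this would be called as set_checkpoint_dir(spark, prepare_checkpoint_dir()) before running connectedComponents().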