
How to set the checkpoint dir for PySpark on Data Science Experience

Could you help me with instructions on how to set the checkpoint directory for a PySpark session on IBM's Data Science Experience?

I need this because I have to run connectedComponents() from GraphFrames, and it raises the following error:

Py4JJavaError: An error occurred while calling o221.run.
: java.io.IOException: Checkpoint directory is not set. Please set it first using sc.setCheckpointDir(). 
ElBrocas asked Oct 23 '25 19:10



1 Answer

The main issue is finding the notebook's working directory, so that a checkpoint directory under it can be passed to sc.setCheckpointDir(). The working directory can be printed with:

!pwd

Then create a directory for checkpoints under that path:

!mkdir <pwd_output>/checkpoints

Finally, set the checkpoint directory:

spark.sparkContext.setCheckpointDir('<pwd_output>/checkpoints')
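
The three steps above can also be done from Python, without copying the output of !pwd by hand. This is only a minimal sketch: the helper name set_local_checkpoint_dir and the default folder name are made up for illustration, and it assumes the notebook's working directory is writable and visible to Spark (on a multi-node cluster, Spark expects the checkpoint directory to be on a shared filesystem such as HDFS).

```python
import os


def set_local_checkpoint_dir(sc, dirname="checkpoints"):
    """Create <working dir>/<dirname> and register it as the checkpoint dir.

    `sc` is expected to be a SparkContext (e.g. spark.sparkContext).
    The function name and the `dirname` default are illustrative only.
    """
    path = os.path.join(os.getcwd(), dirname)  # same value `!pwd` prints
    os.makedirs(path, exist_ok=True)           # like `!mkdir`, but safe to re-run
    sc.setCheckpointDir(path)
    return path


# In the notebook, assuming `spark` is the session the notebook provides:
# checkpoint_path = set_local_checkpoint_dir(spark.sparkContext)
```

Calling it once before connectedComponents() should be enough; the returned path can be printed to confirm where the checkpoints go.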
ElBrocas answered Oct 26 '25 03:10
