
Databricks cluster installs all the packages every time I start it

I have been working in Databricks notebooks using Python and R. Once a job is done, we need to terminate the cluster to save cost (we are billed for as long as the machines are in use).

So we also have to start the cluster again whenever we want to work on a notebook. I have noticed that startup takes a long time and that the packages are installed on the cluster again each time. Is there any way to avoid this installation every time we start the cluster?

Arpit Sisodia asked Oct 27 '25 09:10



1 Answer

Update: Databricks now allows custom Docker containers.

Unfortunately not.

When you terminate a cluster, its memory state is lost, so when you start it again it comes up from a clean image. Even if you put the desired packages in an init script, they still have to be installed on every initialization.
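To make the init-script point concrete, here is a minimal sketch in Python that generates such a script. The function name and the package list are hypothetical, and the pip path /databricks/python/bin/pip is an assumption about the cluster image that you should check against your runtime. The script it produces runs on every cluster start, which is exactly why the install time comes back each time.

```python
def build_init_script(packages):
    """Return the text of a bash init script that pip-installs `packages`.

    Hypothetical helper for illustration only. The pip path below is an
    assumption about where the cluster's Python environment lives.
    """
    lines = ["#!/bin/bash", "set -e"]
    if packages:
        # Pinning versions keeps every start reproducible, but the install
        # itself still runs again on each cluster start.
        lines.append("/databricks/python/bin/pip install " + " ".join(packages))
    return "\n".join(lines) + "\n"


# Example package pins (hypothetical); print the script so it can be
# saved and attached to the cluster as an init script.
script = build_init_script(["pandas==2.1.4", "requests==2.31.0"])
print(script)
```

Pinning versions does not remove the reinstall, but it does keep the environment identical from one start to the next.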

You may ask Databricks support to check if it is possible to create a custom cluster image for you.
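With the custom-container option mentioned in the update at the top of this answer, the packages can be baked into an image once instead of being installed at startup. The sketch below only builds the cluster definition as a Python dict and prints it as JSON; it does not call any API. The field names (including docker_image and its url key) reflect my understanding of the Databricks Clusters API and should be verified against the current documentation. The image URL, cluster name, and the placeholder values are all hypothetical.

```python
import json


def cluster_spec_with_image(image_url, spark_version, node_type_id, num_workers=1):
    """Build a cluster definition that points at a custom Docker image.

    Hypothetical helper: field names are assumed from the Clusters API,
    and every value passed in below is a placeholder.
    """
    return {
        "cluster_name": "custom-image-cluster",  # hypothetical name
        "spark_version": spark_version,
        "node_type_id": node_type_id,
        "num_workers": num_workers,
        # The image already contains the Python/R packages, so nothing
        # needs to be installed when the cluster starts.
        "docker_image": {"url": image_url},
    }


spec = cluster_spec_with_image(
    "myregistry/my-databricks-image:1.0",  # hypothetical image
    "<spark-version>",                     # placeholder, fill in for your workspace
    "<node-type-id>",                      # placeholder, fill in for your cloud
)
print(json.dumps(spec, indent=2))
```

The cluster still starts from a clean image after each termination, but that image is now yours, so the startup cost is pulling the image rather than reinstalling every package.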

Henrique Florencio answered Oct 29 '25 08:10



