
How to install Jupyter notebook on Google Dataproc

I have already created a 3-node cluster on Dataproc.

I don't want to delete the cluster and recreate it with initialization actions just to install Jupyter.

Can anyone tell me how to install Jupyter on an existing Dataproc cluster?

-Revan

Revan asked Feb 03 '26 11:02



1 Answer

Step 1: Get a Cloud Dataproc cluster up and running

In this step, you'll use the command line to create a Cloud Dataproc cluster named "datascience" with Jupyter notebooks initialized and running. (Note: Please do not use Cloud Shell, as you will not be able to create a socket connection from it in Step 2.)

The simplest approach is to use all default settings for your cluster. Jupyter will run on port 8123 of your master node. If you don't have defaults set, you'll be prompted at this stage to enter a zone for the cluster. As you'll be connecting to the UI on the cluster, choose a zone in a region close to you.

gcloud dataproc clusters create datascience \
    --initialization-actions \
    gs://dataproc-initialization-actions/jupyter/jupyter.sh


Waiting on operation [projects/------/regions/global/operations/XXX-XXX-XXX-XXX-XXX].
Waiting for cluster creation operation...done.
Created [https://dataproc.googleapis.com/v1/projects/------/regions/global/clusters/datascience].

(If you prefer using a graphical user interface, then the same action can be taken by following these instructions.)

Once completed, your Cloud Dataproc cluster is up and running and ready for a connection.

For the next step, you'll need to know the hostname of your Cloud Dataproc master machine as well as the zone in which your instance was created. To determine that zone, run the following command in your terminal:

gcloud dataproc clusters list

Output:

NAME         WORKER_COUNT  STATUS   ZONE
datascience  2             RUNNING  europe-west1-c
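
If you'd rather pull the zone out of that listing in a script, something like the following should work. It is only a sketch: zone_of is a helper name made up for this example, and it assumes the four-column layout shown above (other gcloud versions may print different columns). The sample listing is inlined so the snippet runs without gcloud.

```shell
# Hypothetical helper: print the ZONE column for the named cluster.
# Assumes the column order NAME WORKER_COUNT STATUS ZONE shown above.
zone_of() {
  awk -v name="$1" '$1 == name { print $4 }'
}

# Sample listing copied from the output above; in real use you would
# pipe `gcloud dataproc clusters list` into zone_of instead.
zone_of datascience <<'EOF'
NAME         WORKER_COUNT  STATUS   ZONE
datascience  2             RUNNING  europe-west1-c
EOF
# prints: europe-west1-c
```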

The cluster master-host-name is the name of your Cloud Dataproc cluster followed by an -m suffix. For example, if your cluster is named "my-cluster", the master-host-name would be "my-cluster-m".
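
The naming rule is simple enough to script. A minimal sketch, where master_host is a hypothetical helper name and not part of gcloud:

```shell
# Hypothetical helper: derive the master host name from a cluster name
# by appending the -m suffix described above.
master_host() {
  printf '%s-m\n' "$1"
}

master_host my-cluster    # prints: my-cluster-m
master_host datascience   # prints: datascience-m
```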

Step 2: Connect to the Jupyter notebook

You'll use an ssh tunnel from your local machine to the server to connect to the notebook. Depending on your machine’s networking setup, this step can take a little while to get right, so before proceeding confirm that everything is working by accessing the YARN UI. From the browser that you launched when following the instructions in the cluster-web-interfaces cloud documentation, access the following URL.

http://datascience-m:8088/
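
The tunnel command itself isn't spelled out in this answer. As a rough sketch, assuming the SOCKS-proxy approach over gcloud compute ssh: the helper name tunnel_cmd and the local port 1080 are choices made for this example, and the zone is the one from the listing in Step 1. The snippet only prints the command so you can review it before running it.

```shell
# Assumed values: cluster and zone from Step 1; 1080 is an arbitrary local port.
CLUSTER="datascience"
ZONE="europe-west1-c"
LOCAL_PORT=1080

# Hypothetical helper: build the SSH tunnel command for the master node.
# -D opens a local SOCKS proxy; -N tells ssh not to run a remote command.
tunnel_cmd() {
  printf 'gcloud compute ssh %s-m --zone=%s -- -D %s -N\n' "$1" "$2" "$3"
}

tunnel_cmd "$CLUSTER" "$ZONE" "$LOCAL_PORT"
# prints: gcloud compute ssh datascience-m --zone=europe-west1-c -- -D 1080 -N
```

Run the printed command in a terminal and leave it open, then configure your browser to use localhost:1080 as a SOCKS5 proxy before opening the URLs in this step.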

Once you have the tunnel running, connect to the notebook on the master host at its port. The default port is 8123.

http://datascience-m:8123
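
If the page doesn't load, it can help to test the notebook from the command line before blaming the browser. A small sketch, assuming your tunnel exposes a SOCKS proxy on local port 1080 (an assumption, adjust to your setup); notebook_url and check_notebook are hypothetical helper names.

```shell
# Assumed: local SOCKS proxy port opened by your SSH tunnel.
PROXY_PORT=1080

# Hypothetical helper: build the notebook URL from cluster name and port.
notebook_url() {
  printf 'http://%s-m:%s\n' "$1" "$2"
}

# Hypothetical helper: request the URL through the SOCKS proxy and print
# only the HTTP status code. --socks5-hostname resolves the host name on
# the far side of the tunnel, which is needed for names like datascience-m.
check_notebook() {
  curl --socks5-hostname "localhost:${PROXY_PORT}" --silent --max-time 10 \
       --output /dev/null --write-out '%{http_code}\n' "$1"
}

notebook_url datascience 8123   # prints: http://datascience-m:8123
# With the tunnel running: check_notebook "$(notebook_url datascience 8123)"
```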

For more details, follow this Google post.

Enjoy.

Ritul Lakhtariya answered Feb 06 '26 13:02



