Commit 4c262b7
committed: initial image/notebook comments
1 parent: bb80104
File tree: 1 file changed (+56, −1 lines)


modules/tutorials/pages/jupyterhub.adoc

@@ -1,9 +1,10 @@
 = JupyterHub
 :description: A tutorial on how to configure various aspects of JupyterHub on Kubernetes.
-:keywords: notebook, JupyterHub, Kubernetes, k8s, Spark, HDFS, S3
+:keywords: notebook, JupyterHub, Kubernetes, k8s, Apache Spark, HDFS, S3
 
 This tutorial illustrates various scenarios and configuration options when using JupyterHub on Kubernetes.
 The custom resources and configuration settings that are discussed here are based on the JupyterHub-Keycloak demo, so you may find it helpful to have that demo running to reference things as you read through this tutorial.
+The example notebook is used to demonstrate simple read/write interactions with an S3 storage backend using Apache Spark.
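
(Editorial aside, not part of this commit: a minimal PySpark sketch of the kind of S3 read/write round trip the notebook performs. The endpoint, bucket and credentials below are hypothetical placeholders, and the `s3a://` scheme assumes `hadoop-aws` is on the classpath.)

[source,python]
----
from pyspark.sql import SparkSession

# Hypothetical S3 settings; the demo's actual endpoint, bucket and
# credentials differ.
spark = (
    SparkSession.builder
    .appName("s3-roundtrip")
    .config("spark.hadoop.fs.s3a.endpoint", "http://minio:9000")
    .config("spark.hadoop.fs.s3a.access.key", "demo-user")
    .config("spark.hadoop.fs.s3a.secret.key", "demo-password")
    .config("spark.hadoop.fs.s3a.path.style.access", "true")
    .getOrCreate()
)

# Write a small DataFrame to S3, then read it back.
df = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "value"])
df.write.mode("overwrite").parquet("s3a://demo-bucket/example")
spark.read.parquet("s3a://demo-bucket/example").show()
----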
 
 == Keycloak

@@ -426,6 +427,8 @@ This script instructs JupyterHub to use `KubeSpawner` to create a service refere
 
 The `singleuser.profileList` section of the Helm chart values allows us to define notebook profiles by setting the CPU, memory and image combinations that can be selected. For instance, the profiles below allow the user to select 2/4/... CPUs, 4/8/... GB of RAM and one of two images.
 
+[source,yaml]
+----
 singleuser:
   ...
   profileList:
@@ -472,15 +475,67 @@ The `singleuser.profileList` section of the Helm chart values allows us to defin
       display_name: "quay.io/jupyter/pyspark-notebook:spark-3.5.2"
       kubespawner_override:
         image: "quay.io/jupyter/pyspark-notebook:spark-3.5.2"
+----
 
 These options are then displayed as drop-down lists for the user once logged in:
 
 image::jupyterhub/server-options.png[Server options]
 
 == Images
 
+The demo uses the following images:
+
+* Notebook images
+** `quay.io/jupyter/pyspark-notebook:spark-3.5.2`
+** `quay.io/jupyter/pyspark-notebook:python-3.11.9`
+* Spark image
+** `oci.stackable.tech/sandbox/spark:3.5.2-python311` (a custom image adding Python 3.11, built on `spark:3.5.2-scala2.12-java17-ubuntu`)
+
+.Dockerfile for the custom image
+[%collapsible]
+====
+[source,dockerfile]
+----
+FROM spark:3.5.2-scala2.12-java17-ubuntu
+
+USER root
+
+RUN set -ex; \
+    apt-get update; \
+    # Install dependencies for Python 3.11
+    apt-get install -y \
+        software-properties-common \
+    && apt-get update && apt-get install -y \
+        python3.11 \
+        python3.11-venv \
+        python3.11-dev \
+    && rm -rf /var/lib/apt/lists/*; \
+    # Install pip manually for Python 3.11
+    curl https://bootstrap.pypa.io/get-pip.py -o get-pip.py && \
+    python3.11 get-pip.py && \
+    rm get-pip.py
+
+# Make Python 3.11 the default Python version
+RUN update-alternatives --install /usr/bin/python3 python3 /usr/bin/python3.11 1 \
+    && update-alternatives --install /usr/bin/pip pip /usr/local/bin/pip3 1
+
+USER spark
+----
+====
+
+NOTE: The example notebook in the demo starts a distributed Spark cluster, whereby the notebook acts as the driver and spawns a number of executors.
+The driver uses the user-specific driver service (see above) to pass job dependencies to each executor.
+The Spark versions of these dependencies must be the same, or else serialization errors can occur.
+This is increasingly likely where Java or Scala classes do not have a specified `serialVersionUID`: in that case one is calculated at runtime from the contents of each class (method signatures etc.), so if those contents have changed, the UID may differ between driver and executor.
+To avoid this, take care that the notebook image and the Spark job image use a common Spark build.
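
(Editorial aside, not part of this commit: one way to sanity-check from inside the notebook that driver and executors agree on Spark and Python versions, assuming a running SparkSession. A hedged sketch, not the demo's code.)

[source,python]
----
import sys

from pyspark.sql import SparkSession

spark = SparkSession.builder.getOrCreate()
sc = spark.sparkContext

# Versions on the driver (the notebook itself).
print("driver:", sc.version, sys.version_info[:3])

# Versions on the executors, gathered by a tiny throwaway job.
def versions(_):
    import sys
    import pyspark
    yield (pyspark.__version__, sys.version_info[:3])

executor_versions = (
    sc.parallelize(range(4), 4).mapPartitions(versions).distinct().collect()
)
print("executors:", executor_versions)
----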
+
 == Example Notebook
 
 === Provisos
 
+WARNING: When running a distributed Spark cluster from within a JupyterHub notebook, the notebook acts as the driver and requests executor Pods from k8s.
+These Pods can in turn mount *all* volumes and Secrets in that namespace.
+To prevent this from breaking user separation, it is planned to use OPA Gatekeeper to define rules that restrict what the created executor Pods may mount. This is not yet implemented in the demo, nor reflected in this tutorial.
+
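
(Editorial aside, not part of this commit: roughly how the notebook, acting as the driver, requests executor Pods from k8s. Namespace, image, service name and port are hypothetical placeholders; the demo wires these up per user via the driver service mentioned above.)

[source,python]
----
from pyspark.sql import SparkSession

spark = (
    SparkSession.builder
    # Talk to the Kubernetes API from inside the cluster.
    .master("k8s://https://kubernetes.default.svc:443")
    .config("spark.kubernetes.namespace", "jupyterhub-demo")
    # Executor image; must share a Spark build with the notebook image.
    .config("spark.kubernetes.container.image",
            "oci.stackable.tech/sandbox/spark:3.5.2-python311")
    .config("spark.executor.instances", "2")
    # Executors call back to the driver via the user-specific service.
    .config("spark.driver.host", "driver-service-some-user")
    .config("spark.driver.port", "2222")
    .getOrCreate()
)
----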
 === Overview