How to Select and Configure Your Container on the Data Science/Machine Learning Platform (DSMLP)


Launch Script Overview


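Please review "Launching Containers From the Command Line" for how to access and launch containers.  Launch scripts include, for example, launch.sh and launch-scipy-ml.sh, both of which are used in the examples in this article.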

Launch Script Command Line Options


Defaults set within launch scripts' environment variables may be overridden using the following command-line options:

| Option | Description | Example |
| --- | --- | --- |
| -h | List all command-line options | -h |
| -c <# CPU> | Adjust # CPU cores | -c 8 |
| -g <# GPU> | Adjust # GPU cards | -g 2 |
| -m <GB RAM> | Adjust # GB RAM | -m 64 |
| -i <IMAGE> | Docker image name | -i nvidia/cuda:latest (see "Creating and Specifying a Custom Docker Image") |
| -e <COMMAND> | Docker image ENTRYPOINT/CMD. Review the Dockerfile for the name of the launch script. | -e /setup.sh |
| -n <NODE> | Request specific cluster node (1-10) | -n 7 |
| -v <GPU TYPE> | Request specific GPU type (gtx1080ti, k5200, titan) | -v k5200 |
| -b | Request background pod (implies -J) | (see "Background Execution / Long-Running Jobs") |
| -j | Launch Jupyter notebook server within container (default) | |
| -J | Inhibit launch of Jupyter notebook server | |
| -W <COURSE> | Run in course-specific workspace directory | -W DSC10_FA22_A00 |
| -G <GROUP ID> | Launch with /teams folder | -G 100001234 |
| -s | Launch only CLI shell; do not launch Jupyter notebook server | |
| -S | Do not launch container CLI shell | |
| -q | Quiet mode: suppress informational messages during container launch | |
| -f <COMMAND> | Execute command within container and dump job output to stdout; implies -S (no shell) and -J (no Jupyter). If the command is a shell script, any file paths in the script must be relative to the root directory inside the container (not your dsmlp-login home directory). | -f ./myscript.sh OR -f ./private/myscript.sh |
| -H | Launch sshd within container for use with ProxyCommand (see VS Code documentation) | |
| -P <POLICY> | Specify image pull policy (ifnotpresent, always, or never) assigned to container | -P Always |
| -N <NAME> | Specify alternate Pod name | -N mypod |
| -d | Dump Kubernetes Pod spec (JSON); do not execute | |
| -- | End processing of command line; remaining arguments are passed to container | -- /mycommand.sh arg1 |

Example:

launch-scipy-ml.sh -g 1 -m 64 -W DSC10_FA22_A00 -v k5200
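
The options listed above can be combined on one command line. The following is a minimal sketch, not an official recipe: it assembles a non-interactive batch launch (-f) and a Pod spec dump (-d) using only options and example values from this article. The DRY_RUN variable and the run helper are hypothetical conveniences added for illustration; by default the commands are only printed, so the sketch can be read or tried anywhere.

```shell
#!/bin/sh
# Sketch: compose launch commands from the documented options.
# DRY_RUN and run() are illustrative helpers, not part of the launch scripts.
set -eu

DRY_RUN="${DRY_RUN:-1}"

run() {
  # Print the command line; execute it only when DRY_RUN=0.
  printf '%s\n' "$*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

# Batch job: 4 CPU cores, 16 GB RAM, run a script and print its output.
# -f implies -S (no shell) and -J (no Jupyter).
run launch-scipy-ml.sh -c 4 -m 16 -f ./private/myscript.sh

# Dump the Kubernetes Pod spec (JSON) for a GPU request without launching it.
run launch-scipy-ml.sh -g 1 -v k5200 -d
```

On dsmlp-login, setting DRY_RUN=0 would execute the printed commands instead of only displaying them.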

Adjusting CPU/GPU/RAM Limits


The maximum limits (8 CPU, 64 GB RAM, 1 GPU) apply to all of your running containers.

Increases to GPU allocations require the consent of your TA, instructor, or advisor.
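
As an illustration of the limits above, the sketch below checks a requested allocation before building a launch command. It is an assumption-laden example: the within_limits function is hypothetical, it looks only at a single request, and it does not account for containers you already have running. The cluster itself remains the authority on what is allowed.

```shell
#!/bin/sh
# Sketch: compare a requested allocation with the documented maximums.
# Assumption: only one request is checked; already-running containers are ignored.
set -eu

MAX_CPU=8
MAX_RAM_GB=64
MAX_GPU=1

within_limits() {
  # Usage: within_limits <cpu> <ram_gb> <gpu>
  # Returns 0 when every value is at or below its maximum.
  [ "$1" -le "$MAX_CPU" ] && [ "$2" -le "$MAX_RAM_GB" ] && [ "$3" -le "$MAX_GPU" ]
}

cpu=8
ram=64
gpu=1

if within_limits "$cpu" "$ram" "$gpu"; then
  # Print the command that would be used for this request.
  echo "launch-scipy-ml.sh -c $cpu -m $ram -g $gpu"
else
  echo "Request exceeds the documented limits (8 CPU, 64 GB RAM, 1 GPU)." >&2
fi
```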

Background Execution / Long-Running Jobs


To support longer training runs, we permit background execution of student containers for up to 12 hours via the "-b" command-line option.

Use the ‘kubesh <pod-name>’ command to connect or reconnect to a background container, and ‘kubectl delete pod <pod-name>’ to terminate.
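
The background workflow described above can be sketched as a short sequence of commands. This is an illustrative outline only: DRY_RUN, POD_NAME, and the run helper are hypothetical names introduced here, and the placeholder pod name must be replaced with a real one (for example, one shown by kubectl get pods) before anything is executed.

```shell
#!/bin/sh
# Sketch: lifecycle of a background pod. Commands are printed only by default.
# DRY_RUN, POD_NAME, and run() are illustrative, not part of DSMLP.
set -eu

DRY_RUN="${DRY_RUN:-1}"
POD_NAME="${POD_NAME:-<pod-name>}"   # placeholder; set to your actual pod name

run() {
  # Print the command line; execute it only when DRY_RUN=0.
  printf '%s\n' "$*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

# 1. Start a background pod with one GPU (-b implies -J, so no Jupyter server).
run launch-scipy-ml.sh -g 1 -b

# 2. List your pods to find the pod name.
run kubectl get pods

# 3. Connect or reconnect to the background container.
run kubesh "$POD_NAME"

# 4. Terminate the pod when the job is done so its GPU is released.
run kubectl delete pod "$POD_NAME"
```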

Please be considerate and terminate idle containers:  while containers share system RAM and CPU resources under the standard Linux/Unix model, the cluster’s 80 GPU cards are assigned to users on an exclusive basis. When attached to a container they become unusable by others even if completely idle.

Creating and Specifying a Custom Docker Image


If you need to create a custom container from one of the standard containers, please see Instructions on Building a Custom Image.  Custom images are typically used for entire courses on the DSMLP platform, or when individual users need to install operating system-level packages (e.g. with 'apt-get install' or 'yum').  To create and use a custom image:

  1. Select any standard image (ucsdets/datahub-base-notebook, ucsdets/datascience-notebook, etc.) as the base for your image
  2. Follow this guide to create, build and host a Docker image.  As of June 2021 we recommend GitHub Actions.
  3. SSH to dsmlp-login.ucsd.edu
  4. Run the command launch.sh -i {MY_DOCKER_ACCOUNT/REPO:TAG} to launch your container.  More information: Launching Containers from the Command Line
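
To make steps 1 and 2 concrete, the sketch below writes a minimal Dockerfile that extends a standard image with one operating system package. Several details are assumptions rather than DSMLP requirements: the package (htop) is arbitrary, the base image is shown without a tag even though pinning one is usually preferable, and the non-root user name jovyan is assumed from the Jupyter Docker Stacks convention and may differ in your base image.

```shell
#!/bin/sh
# Sketch: generate a minimal Dockerfile in a temporary directory.
set -eu

workdir="$(mktemp -d)"

cat > "$workdir/Dockerfile" <<'EOF'
# Base image: one of the standard images (pin a specific tag in practice).
FROM ucsdets/datascience-notebook

# Installing operating system packages requires root.
USER root
RUN apt-get update && \
    apt-get install -y --no-install-recommends htop && \
    rm -rf /var/lib/apt/lists/*

# Assumption: the base image provides the non-root user "jovyan".
USER jovyan
EOF

echo "Wrote $workdir/Dockerfile"

# To build and host the image (names are placeholders), one option is:
#   docker build -t MY_DOCKER_ACCOUNT/REPO:TAG "$workdir"
#   docker push MY_DOCKER_ACCOUNT/REPO:TAG
```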

It may take some time for your container to start, because the image must first be downloaded onto the DSMLP servers. To see the state of your container, run:  kubectl describe pod -n {USERNAME}
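
Launching a custom image and checking on it might look like the following sketch. The IMAGE and K8S_NAMESPACE variables, the DRY_RUN switch, and the run helper are hypothetical; the sketch also assumes, as the command above suggests, that your Kubernetes namespace matches your username. Adding -P Always is optional and is shown only as one way to request a fresh pull after rebuilding a tag.

```shell
#!/bin/sh
# Sketch: launch a custom image, then inspect the pod while the image downloads.
# IMAGE, K8S_NAMESPACE, DRY_RUN, and run() are illustrative names.
set -eu

DRY_RUN="${DRY_RUN:-1}"
IMAGE="${IMAGE:-MY_DOCKER_ACCOUNT/REPO:TAG}"          # placeholder image reference
K8S_NAMESPACE="${K8S_NAMESPACE:-${USER:-USERNAME}}"   # assumed to equal your username

run() {
  # Print the command line; execute it only when DRY_RUN=0.
  printf '%s\n' "$*"
  if [ "$DRY_RUN" = "0" ]; then
    "$@"
  fi
}

# Launch the custom image; -P Always requests a fresh pull of the image.
run launch.sh -i "$IMAGE" -P Always

# From a second terminal: show pod state, including image pull events.
run kubectl describe pod -n "$K8S_NAMESPACE"
```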

Adding Custom Python Packages to Your Container


To customize your environment within your existing container, please see: How To: Customize your environment in DSMLP/Datahub (including jupyter notebooks)

Original version of this guide (warning: may be outdated).

For more information about datahub.ucsd.edu, check out the FAQ.

Your instructor or TA will be your best resource for course-specific questions.

If you still have questions or need additional assistance, please email dsmlp@ucsd.edu or visit support.ucsd.edu.