TA Support for Building and Testing Custom Course Containers on DSMLP


Overview


This document describes how to test and maintain custom Docker images for courses on the Data Science/Machine Learning Platform (DSMLP). Information about standard DSMLP Docker images is also available.

For more information about DSMLP, see the FAQ.

Development


For Dockerfile best practices, see https://github.com/ucsd-ets/datahub-example-notebook/blob/master/additional-information.md

Prerequisites

To use a custom Docker image on DSMLP, request it when you submit an Instructional Technology Request (CINFO; cinfo.ucsd.edu) for your course. If you have already submitted the request, add the specifics to the support.ucsd.edu ServiceNow (SNOW) ticket that the CINFO request created automatically. Please include the following:

The DSMLP team will configure a GitHub repository with a Dockerfile and a build process that automatically updates the image whenever changes are pushed to the repo. We typically create git branches that correspond to specific Docker tags. For example, pushing changes to the wi24 branch on GitHub will update the {docker_image_name}:wi24 image.

Updating Base Image

New base images are released periodically. To update your Dockerfile to use a new base image, modify the FROM line. For example, if you're switching from the Spring 2023 datascience image (2023.2-stable) to the Fall 2024 image (2024.4-stable), update the FROM line to:

FROM ghcr.io/ucsd-ets/datascience-notebook:2024.4-stable

If you require support for CUDA or RStudio, you can use one of our other base images. For instance, to use the Fall 2024 scipy-ml image (2024.4-stable), update the FROM line to: 

FROM ghcr.io/ucsd-ets/scipy-ml-notebook:2024.4-stable

Similarly, for the Fall 2024 RStudio image (2024.4-stable), use:

FROM ghcr.io/ucsd-ets/rstudio-notebook:2024.4-stable
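The FROM-line change can also be scripted. The following is a minimal sketch, not an official DSMLP tool: it assumes a sed that accepts -i with a backup suffix, and it works on a sample Dockerfile in a temporary directory so no real repository is touched. The variable names (workdir, old_tag, new_tag) are illustrative.

```shell
#!/bin/sh
# Sketch: bump the base-image tag on the FROM line of a Dockerfile.
# Operates on a sample Dockerfile in a temporary directory (illustrative only).
set -eu

workdir=$(mktemp -d)
dockerfile="$workdir/Dockerfile"

# Sample Dockerfile pinned to the older base image.
cat > "$dockerfile" <<'EOF'
FROM ghcr.io/ucsd-ets/datascience-notebook:2023.2-stable
USER root
EOF

old_tag="2023.2-stable"
new_tag="2024.4-stable"

# Rewrite only FROM lines that end with the old tag; a .bak backup is kept.
sed -i.bak "/^FROM /s/:${old_tag}\$/:${new_tag}/" "$dockerfile"

from_line=$(grep '^FROM ' "$dockerfile")
echo "$from_line"
```

To apply the same edit to a real course repository, point dockerfile at the repository's Dockerfile instead of the sample, then review the change with git diff before committing.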

Creating a tag

A new version of the image is built and tagged each time the git repository is updated. The tag is the name of the branch, e.g. main. The previous image tagged main is overwritten.

To preserve a version of an image, you can create a tag. For example, you can create a tag named "fa24" for the version of the image used in fall 2024. You can then request that the course use the "fa24" tag.
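One way to create such a tag is with a git tag. This is a sketch under a stated assumption: that the build process publishes a Docker tag for a pushed git tag in the same way it does for a branch, which this document does not spell out and which you should confirm with the DSMLP team. The example runs in a throwaway local repository, with a local bare repository standing in for GitHub; the commit identity and paths are placeholders.

```shell
#!/bin/sh
# Sketch: tag the current commit as "fa24" and push the tag.
# ASSUMPTION: the image build reacts to git tags like it does to branches.
set -eu

workdir=$(mktemp -d)
git init -q --bare "$workdir/remote.git"     # stand-in for the GitHub repo
git init -q "$workdir/course-image"
cd "$workdir/course-image"
git remote add origin "$workdir/remote.git"

# A minimal commit to tag (placeholder author identity).
echo "FROM ghcr.io/ucsd-ets/datascience-notebook:2024.4-stable" > Dockerfile
git add Dockerfile
git -c user.name="TA" -c user.email="ta@example.com" commit -q -m "Fall 2024 image"

# Create a lightweight tag on the current commit and push it.
git tag fa24
git push -q origin fa24

# List the tags the remote now knows about.
remote_tags=$(git ls-remote --tags origin)
echo "$remote_tags"
```

In a real course repository only the git tag and git push lines are needed, run from the commit whose image you want to preserve.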

Branches

  1. Create a new git branch within the GitHub repo. You can call it dev or test, or anything else that indicates it's in a test phase. See this document for managing branches: https://docs.github.com/en/desktop/contributing-and-collaborating-using-github-desktop/making-changes-in-a-branch/managing-branches
  2. Add the required software and files for your course. For basic installation instructions, see this document: https://github.com/ucsd-ets/datahub-example-notebook#step-1-customize-the-dockerfile
  3. After committing your changes to the test branch, the build process will kick off. It will eventually publish your new image to your GitHub repository's "Packages" section.

Screenshot: Location of Github Packages Section from Repo

Note: If step 3 didn't complete (i.e., the image didn't build), look through the build logs under the “Actions” tab to find out why. Make the necessary changes and commit them to the branch to start a new automated build.
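For those who prefer the command line to GitHub Desktop, the three steps above can be sketched with plain git. This is an illustration only: it runs in a throwaway local repository with a local bare repository standing in for GitHub, so no build is triggered, and the package name example-package and the commit identity are placeholders.

```shell
#!/bin/sh
# Sketch: create a test branch, change the Dockerfile, and push the branch.
set -eu

workdir=$(mktemp -d)
git init -q --bare "$workdir/remote.git"     # stand-in for the GitHub repo
git init -q "$workdir/course-image"
cd "$workdir/course-image"
git remote add origin "$workdir/remote.git"

# Starting point: a Dockerfile already committed on the default branch.
echo "FROM ghcr.io/ucsd-ets/datascience-notebook:2024.4-stable" > Dockerfile
git add Dockerfile
git -c user.name="TA" -c user.email="ta@example.com" commit -q -m "Initial Dockerfile"

# Step 1: create and switch to a branch named "test".
git checkout -q -b test

# Step 2: add course software (example-package is a placeholder name).
echo "RUN pip install --no-cache-dir example-package" >> Dockerfile
git -c user.name="TA" -c user.email="ta@example.com" commit -q -am "Add course packages"

# Step 3: push the branch; on the real repo this push starts the build.
git push -q -u origin test

remote_branches=$(git ls-remote --heads origin)
echo "$remote_branches"
```

In the real repository, clone it from GitHub instead of creating it locally, then run steps 1 to 3 as shown.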

Testing

  1. Connect to the UCSD VPN. Open a terminal on your computer and SSH to dsmlp-login:
    Screenshot: Example ssh command
  2. Use the launch.sh command to test your image. Supply three flags: -i myimage:tag specifies the image and tag to launch, -P Always sets the image pull policy so that the image is pulled on every launch, and -W COURSE_ID specifies the workspace.

    Example
    launch.sh -i myimage:tag -P Always -W COURSE_ID

    To get COURSE_ID please run "workspace --list"

  3. This script will generate a URL. Open it in your browser and use the notebook environment to test your custom features.
  4. If everything works as expected, create a pull request to the branch that corresponds to your course's tag on DSMLP. For example, if the course uses cogs101-notebook:wi24 and you have confirmed that cogs101-notebook:test works, create a pull request to the wi24 branch.
  5. Have a team member review your work and merge the pull request. The build process should then start and rebuild your course image. Check the logs to ensure that the production image builds correctly.
  6. Test your newly built container on datahub.ucsd.edu by launching it from the “Launch your Environment” spawn page.
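The launch command from step 2 can be assembled from variables so that the image, tag, and course ID are easy to swap. The following is a hedged sketch, not part of the DSMLP tooling: it only prints the command unless the hypothetical RUN variable is set to 1, the image and tag are taken from the cogs101-notebook example in step 4, and COURSE_ID remains a placeholder to be replaced with a value from "workspace --list".

```shell
#!/bin/sh
# Sketch: build the launch.sh command line from variables (dry run by default).
set -eu

image="cogs101-notebook"   # example image name from step 4
tag="test"                 # branch/tag under test
course_id="COURSE_ID"      # placeholder; get the real value from "workspace --list"

launch_cmd="launch.sh -i ${image}:${tag} -P Always -W ${course_id}"
echo "$launch_cmd"

# RUN is a hypothetical switch for this sketch: set RUN=1 on dsmlp-login
# to actually execute the command instead of only printing it.
if [ "${RUN:-0}" = "1" ]; then
    $launch_cmd
fi
```

Run without RUN set, the script prints the command for inspection; on dsmlp-login, RUN=1 executes it.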

If you still have questions or need additional assistance, please email datahub@ucsd.edu or visit support.ucsd.edu.