Instructor Guidance for Datahub and the Data Science and Machine Learning Platform


Core Service Functionality


The Data Science and Machine Learning Platform (DSMLP) provides a standard set of features for instructional use, including web and command-line access to Jupyter notebook servers, GPU access, student home directories and storage, and large dataset access. Learn more about DSMLP.

datahub.ucsd.edu is the web-based DSMLP platform.

For more information about the default course containers available for this platform, see "An overview of Standard Datahub/DSMLP Containers maintained by Educational Technology Services."

Scope of Support

Refer to Scope of Support & Guidelines for Usage

Critical Dates

Service Level Objectives and Roles/Responsibilities


IT Services' Responsibilities

Instructor/TA Responsibilities

Service Timeline


How-To's for Instructors and TAs


Add Custom Packages to a Standard Course Container

Please review TA Support for Building and Testing Custom Course Containers on DSMLP, and if necessary, request a consultation with IT Services staff via your course ticket or by emailing datahub@ucsd.edu.

Add TAs or Observers to a Datahub/DSMLP Course

  1. Add Teaching Assistants (TAs) and Other Users to a Canvas Course
  2. If the user is not a UC San Diego student, such as departmental staff and co-instructors, please email datahub@ucsd.edu (or follow up in your support.ucsd.edu course ticket) to request ITS staff add them to the datahub course roster.

Add Groups to a Datahub/DSMLP Course

How to use Teams in Canvas for DataHub/AWS Educate

Student Disk Space Quotas

All students have a 10GB disk quota. This applies regardless of any groups they are also a member of.
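One way to see how close a home directory is to that limit is to total it with du. The sketch below assumes the quota is 10 GB counted in 1024-byte kilobytes and that du's total approximates what the quota system measures; neither is confirmed here, and the helper name quota_usage is hypothetical.

```shell
# Report how much of a disk quota a directory is using.
# Assumptions: the default quota is 10 GB (10 * 1024 * 1024 KB), and
# du's total is close to what the quota system actually counts.
# The helper name quota_usage is hypothetical.
quota_usage() {
    dir="${1:-$HOME}"
    quota_kb="${2:-10485760}"
    if [ ! -d "$dir" ]; then
        echo "not a directory: $dir" >&2
        return 1
    fi
    # du -sk prints "<kilobytes><TAB><path>"; keep only the number.
    used_kb=$(du -sk "$dir" 2>/dev/null | cut -f1)
    pct=$(( used_kb * 100 / quota_kb ))
    echo "$dir: ${used_kb} KB used of ${quota_kb} KB (${pct}%)"
}
```

Calling quota_usage with no arguments checks the current user's home directory against the assumed 10 GB figure.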

Large Datasets Shared By Multiple Courses

There are several large datasets available for use with DSMLP/Datahub. To navigate to the directory containing the datasets, use the following command in a terminal: cd /datasets.

To request uploading a dataset to /datasets, please submit a ticket by emailing datahub@ucsd.edu.
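To see what is already available before requesting an upload, the shared directory can be listed with per-entry sizes. This is a minimal sketch: the /datasets path comes from the text above, while the helper name list_datasets is hypothetical.

```shell
# Print a human-readable size for each top-level entry in a dataset directory.
# The default path /datasets is taken from the text above; the helper
# name list_datasets is hypothetical.
list_datasets() {
    dir="${1:-/datasets}"
    if [ ! -d "$dir" ]; then
        echo "no such directory: $dir" >&2
        return 1
    fi
    for entry in "$dir"/*; do
        # Skip the unexpanded pattern when the directory is empty.
        [ -e "$entry" ] || continue
        du -sh "$entry"
    done
}
```

Run list_datasets with no arguments in a DSMLP terminal, or pass another directory to inspect it instead. Sizing very large datasets with du can take a while.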

Datasets Used By a Single Datahub/DSMLP Course

All courses have workspaces enabled, which allows the use of a shared course directory that can be accessed in the public directory in JupyterHub. By default, this is writeable by the grader account (e.g., grader-dsc102-01) and readable by everyone enrolled in the course, and can be used to store course datasets. You can use the chmod command as the grader user to change read/write permissions in the public directory so that only the grader can write to it:

chmod -R u+rwX,go+rXs,go-w ~/public

Note: if you unzip a file in this directory, the extracted files keep the permissions stored in the zip archive. In general, group (g) and others (o) should not have write access.
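After unzipping an archive, it can help to check whether any extracted files ended up writable by group or others. The following sketch uses find's permission tests for that; the helper name find_writable is hypothetical, and the ~/public default is the course directory named above.

```shell
# List entries under a directory that group or others can write to.
# Defaults to ~/public; the helper name find_writable is hypothetical.
find_writable() {
    dir="${1:-$HOME/public}"
    # -perm -020 matches entries with the group-write bit set,
    # -perm -002 matches entries with the other-write bit set.
    # Symbolic links are skipped because their own mode bits are not meaningful.
    find "$dir" ! -type l \( -perm -020 -o -perm -002 \) -print
}
```

If find_writable prints anything, rerunning the chmod command above as the grader user should clear the extra write bits.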

Naming assignments

"nbgrader" requires each assignment to have a unique name across all instructor assignments.

Removing old "nbgrader" courses from "Assignments" dropdown

  1. Open a new Terminal
    1. New -> Terminal
  2. Navigate to the "nbgrader_cache" directory
    1. Enter command: "cd ~/.local/share/jupyter/nbgrader_cache"
  3. List all files in the directory
    1. Enter command: "ls -al"
  4. Delete the old directories
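The deletion in the last step can be sketched as a small helper that removes one cached course directory at a time. The cache path is the one given in the steps above; the helper name remove_cached_course, the NBGRADER_CACHE override variable, and any directory name passed to it are hypothetical, so check the "ls -al" output for the real names first.

```shell
# Remove one directory from the nbgrader cache after confirming it exists.
# The default cache path comes from the steps above. The helper name and
# the NBGRADER_CACHE override variable are hypothetical.
remove_cached_course() {
    cache_dir="${NBGRADER_CACHE:-$HOME/.local/share/jupyter/nbgrader_cache}"
    course="$1"
    # Refuse empty names and anything that could escape the cache directory.
    case "$course" in
        ""|.|..|*/*)
            echo "usage: remove_cached_course <directory-name>" >&2
            return 2
            ;;
    esac
    target="$cache_dir/$course"
    if [ ! -d "$target" ]; then
        echo "not found: $target" >&2
        return 1
    fi
    rm -r -- "$target"
    echo "removed $target"
}
```

Deleting by explicit name, rather than with a wildcard, keeps the cache entries for current courses untouched.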

Service Resource Limitations


Resource exhaustion (insufficient GPUs/CPUs) can occur during periods of heavy usage. If eviction of a user's server is required, they will be notified 10 minutes before termination and instructed to save their data.

Scheduled Weekly Patching and Potential Downtime


Datahub/DSMLP Weekly Maintenance Window: Tuesdays, 6AM-8AM (Pacific)

The worldwide increase in ransomware and similar cybersecurity attacks has prompted UC San Diego to adopt strict policies regarding updating/patching of campus servers.

To minimize interruption to your students' work, we've designated Tuesdays from 6AM-8AM as a weekly maintenance period.

During this time, DataHub/DSMLP may operate at reduced capacity, and depending on the nature of the update, the cluster as a whole may be inaccessible.

Scheduling policies will try to run jobs on nodes that won't be impacted, but the risk remains that user jobs (i.e., Jupyter notebooks) may be terminated. It is important to periodically save state during long-running jobs ("checkpointing") so they can be resumed when service is restored.
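As an illustration of checkpointing, the sketch below runs a series of numbered steps and records the last completed step in a state file, so a restarted job picks up where it stopped. It is a generic pattern rather than a DSMLP feature; the function names, the state file, and the body of do_step are hypothetical placeholders for real work.

```shell
# Run numbered steps, recording the last completed one so a restart can resume.
# The function names, state file, and do_step body are hypothetical placeholders.
do_step() {
    echo "working on step $1"
}

run_with_checkpoint() {
    state_file="$1"
    total_steps="$2"
    last_done=0
    # Resume from the recorded step if a checkpoint already exists.
    if [ -f "$state_file" ]; then
        last_done=$(cat "$state_file")
    fi
    step=$(( last_done + 1 ))
    while [ "$step" -le "$total_steps" ]; do
        do_step "$step"
        # Write to a temporary file and rename it, so an interruption
        # mid-write cannot leave a truncated checkpoint behind.
        echo "$step" > "$state_file.tmp"
        mv "$state_file.tmp" "$state_file"
        step=$(( step + 1 ))
    done
}
```

For this to survive a terminated server, the state file would need to live in persistent storage such as the user's home directory.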

We recognize the impact of any disruption during 10th week and Finals week, and will attempt to defer patching during these critical instructional periods unless the nature of the vulnerability requires immediate action.

Finally, as campus policy requires that "Critical" severity threats be addressed within 24 hours of discovery, we may be forced to patch outside of the Tuesday 6AM-8AM window. In such cases we will provide advance warning to instructors and TAs to the extent possible.

Troubleshooting


If you have further questions or concerns, submit a ticket or email us at datahub@ucsd.edu.