Instructor Guidance for Datahub and the Data Science and Machine Learning Platform


Core Service Functionality


The Data Science and Machine Learning Platform (DSMLP) provides a standard set of features for instructional use, including web and command-line access to Jupyter notebook servers, GPU access, student home directories and storage, and access to large datasets. Learn more about DSMLP.

datahub.ucsd.edu is the web-based DSMLP platform.

For more information about the default and custom containers available, see "An overview of Standard Datahub/DSMLP Containers maintained by Educational Technology Services."

Service Level Objectives and Roles/Responsibilities


IT Services’ Responsibilities

Instructor/TA Responsibilities


Service Timeline


How-To's for Instructors and TAs


Note: Non-students, such as departmental staff and co-instructors, may require additional steps before they can be added to a course. After following the steps below, please contact datahub@ucsd.edu so ITS staff can complete those additional steps.

Add TAs or Observers to Course

  1. Log into Canvas https://canvas.ucsd.edu/
  2. Click Courses on left nav bar
  3. Select course by clicking link
  4. Click Add TAs on right nav bar
  5. Click Access Level dropdown
  6. Select Teaching Assistant
  7. Enter user's email address to add
  8. Click Add Teaching Assistant or Add Observer
    1. Note: if observer is a UC San Diego Extension student, they will need to fill out the Concurrent Enrollment Account form first

    Screenshot: Canvas Add TA Tool

Add Groups to Course

Note: if you need to add a student to more than one group, please request this via your course ticket in support.ucsd.edu, or email awsed@ucsd.edu

  1. Log into Canvas https://canvas.ucsd.edu and navigate to the course
  2. Click People on the left nav bar
  3. Click +Group Set to create a group set, e.g. Group Projects

    Screenshot: Page for creating a new group set

  4. Click the newly created group set
  5. Click +Group to create a group, e.g. Project 1

    Screenshot: Canvas page to add group to a project

  6. Drag students into the group

    Screenshot: Canvas menu to add students to a group in a project

  7. Repeat for each student and group


Accessing Datasets

There are several large datasets available for use with DSMLP/Datahub. To navigate to the directory containing the datasets, use the following command in a terminal: cd /datasets.
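The same directory can also be inspected from inside a notebook. The following is a minimal sketch, not an official utility: the function name list_datasets is hypothetical, and the only detail taken from this guide is the /datasets path.

```python
from pathlib import Path


def list_datasets(root="/datasets"):
    """Return the sorted names of the entries directly under `root`.

    Returns an empty list if `root` does not exist, for example when the
    code is run outside DSMLP/Datahub.
    """
    root_path = Path(root)
    if not root_path.is_dir():
        return []
    return sorted(entry.name for entry in root_path.iterdir())
```

Calling list_datasets() in a notebook cell returns the names found under /datasets. Pass a different root to inspect a subdirectory.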


Request a Shared Course Directory

To request a shared course directory for your course, please include this in the comments of your Instructional Technology Request (CINFO; see below) or follow up in the support.ucsd.edu ticket that is created for your course when the CINFO is submitted.


Set up the "nb2canvas" Tool (LTI) in your Canvas course to upload datahub assignments/submissions

More information: How To: Configure and Use nb2canvas

For Winter 2022 courses, please email datahub@ucsd.edu and we will set this up for you. We plan to allow instructors to install this tool directly in their Canvas course starting in the Spring 2022 term.


Naming assignments

"nbgrader" requires every assignment name to be unique across all of an instructor's assignments.
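One way to catch a clash before creating assignments is to check a planned list of names for repeats. This is a small sketch only: find_duplicate_names is a hypothetical helper, it is not part of nbgrader, and it assumes names are compared exactly after trimming surrounding whitespace.

```python
from collections import Counter


def find_duplicate_names(assignment_names):
    """Return the sorted names that appear more than once in the list."""
    # Assumption: names differing only in surrounding whitespace are the same.
    counts = Counter(name.strip() for name in assignment_names)
    return sorted(name for name, count in counts.items() if count > 1)
```

For example, find_duplicate_names(["hw1", "hw2", "hw1"]) returns ["hw1"]. Prefixing names with a course and term identifier is one way to avoid such repeats.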


Removing old "nbgrader" courses from "Assignments" dropdown

  1. Open a new Terminal
    1. New -> Terminal
  2. Navigate to the "nbgrader_cache" directory
    1. Enter command: "cd ~/.local/share/jupyter/nbgrader_cache"
  3. List all files in the directory
    1. Enter command: "ls -al"
  4. Delete the old directories
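The deletion step can also be done from a notebook. The sketch below assumes each old course appears as a subdirectory of the cache directory named in step 2. The function name remove_cached_courses and its dry_run parameter are hypothetical, and the course directory names must come from your own "ls -al" output.

```python
import shutil
from pathlib import Path

# Cache location taken from step 2 of the instructions above.
CACHE_DIR = Path.home() / ".local" / "share" / "jupyter" / "nbgrader_cache"


def remove_cached_courses(course_names, cache_dir=CACHE_DIR, dry_run=True):
    """Delete the named subdirectories of the nbgrader cache.

    With dry_run=True (the default) nothing is deleted; the function only
    returns the directories that would be removed.
    """
    cache_dir = Path(cache_dir)
    selected = []
    for name in course_names:
        # Skip anything that is not a plain directory name (no "..", no paths).
        if name in {"", ".", ".."} or Path(name).name != name:
            continue
        target = cache_dir / name
        # Skip missing entries, regular files, and symbolic links.
        if target.is_symlink() or not target.is_dir():
            continue
        if not dry_run:
            shutil.rmtree(target)
        selected.append(target)
    return selected
```

Run it once with the default dry_run=True to review what would be removed, then again with dry_run=False to delete. Deletion is permanent, so check the list first.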


Service Resource Limitations


Resource exhaustion (insufficient GPUs/CPUs) can occur during periods of heavy usage. If eviction of a user’s server is required, they will be notified 10 minutes before termination and instructed to save their data.


Scheduled Weekly Patching and Potential Downtime


Datahub/DSMLP Weekly Maintenance Window: Tuesdays, 6AM-8AM (Pacific)

The worldwide increase in ransomware and similar cybersecurity attacks has prompted UC San Diego to adopt strict policies regarding updating/patching of campus servers.

To minimize interruption to your students' work, we've designated Tuesdays from 6AM-8AM as a weekly maintenance period.

During this time, DataHub/DSMLP may operate at reduced capacity, and depending on the nature of the update, the cluster as a whole may be inaccessible.

Scheduling policies will try to run jobs on nodes that won’t be impacted, but the risk remains that user jobs, such as Jupyter notebooks, may be terminated. It is important to periodically save state during long-running jobs ("checkpointing") so they can be resumed when service is restored.
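As an illustration of checkpointing, here is a minimal sketch using only the Python standard library. The function names, the shape of the state dictionary, and the save interval are all assumptions made for this example. Machine learning frameworks provide their own checkpoint mechanisms, which are usually preferable for model training.

```python
import os
import pickle
import tempfile


def save_checkpoint(state, path):
    """Write `state` to `path` via a temporary file and an atomic rename,
    so a termination mid-write cannot leave a corrupt checkpoint."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp_path = tempfile.mkstemp(dir=directory, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as f:
            pickle.dump(state, f)
        os.replace(tmp_path, path)  # atomic when both are on one filesystem
    except BaseException:
        if os.path.exists(tmp_path):
            os.remove(tmp_path)
        raise


def load_checkpoint(path, default):
    """Return the saved state, or `default` if no checkpoint exists yet."""
    if not os.path.exists(path):
        return default
    with open(path, "rb") as f:
        return pickle.load(f)


def run_job(total_steps, checkpoint_path, save_every=100):
    """Resume from the last checkpoint and save progress periodically."""
    state = load_checkpoint(checkpoint_path, {"step": 0, "total": 0})
    while state["step"] < total_steps:
        state["total"] += state["step"]  # stand-in for the real work
        state["step"] += 1
        if state["step"] % save_every == 0:
            save_checkpoint(state, checkpoint_path)
    save_checkpoint(state, checkpoint_path)
    return state
```

If the server is terminated partway through, calling run_job again with the same checkpoint_path continues from the last saved step instead of starting over. Store the checkpoint file in your home directory or another persistent location.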

We recognize the impact of any disruption during 10th week and Finals week, and will attempt to defer patching during these critical instructional periods unless the nature of the vulnerability requires immediate action.

Finally, because campus policy requires that "Critical" severity threats be addressed within 24 hours of discovery, we may be forced to patch outside of the Tuesday 6AM-8AM window. In such cases we will provide advance warning to instructors and TAs to the extent possible.


Troubleshooting


If you have further questions or concerns, submit a ticket or email us at datahub@ucsd.edu.