Core Service Functionality
The Data Science and Machine Learning Platform (DSMLP) provides a standard set of features for instructional use, including web and command-line access to Jupyter notebook servers, GPU access, student home directories and storage, and large dataset access. Learn more about DSMLP.
datahub.ucsd.edu is the web-based DSMLP platform.
For more information about the default course containers available for this platform, see "An overview of Standard Datahub/DSMLP Containers maintained by Educational Technology Services."
Scope of Support
Refer to Scope of Support & Guidelines for Usage
- CINFO Instructional Technology Requests for use of Datahub or DSMLP for a course (including designation of a Technical Point of Contact where required) are due 4 weeks before start of term. As of Summer 2023, these now route to a request form in support.ucsd.edu.
- Requests for 1:1 consultations on setup issues for course-specific/customized environments are due 2 weeks before start of term. Requests that arrive between two weeks before start of term and the end of the second week of term may be triaged behind other needs.
- Late requests for Datahub/DSMLP use may be accommodated depending on request load, but provisioning will not be delivered before the fourth week of the term.
Service Level Objectives and Roles/Responsibilities
IT Services' Responsibilities
- Provide reliable system operations and core service functionality/restoration
- Enroll students (including waitlisted students) in courses
- Aim to resolve individual user issues within 1-2 business days
- Incidents affecting multiple courses or many users, or occurring at critical points in the quarter (e.g., exams), will have a target response time of 30 minutes from first contact with the Service Desk, and a target resolution time of 8 hours
- Assist instructors/TAs with incident resolution and minor configuration change requests
- Maintain our online FAQ and additional Knowledge Base documentation
- Adding TAs and Observers to courses (see How-To below). Formerly-waitlisted students who have been removed can be re-added as an Observer.
- Provide first-tier support for students with Jupyter notebook usage questions
- Basic knowledge of Jupyter notebooks and nbgrader/formgrader
- Request course resources via an Instructional Technology Request (CINFO)
- On page 2 of the form (Web & Cloud Resources), select "Datahub" for the web-based version of the platform, "Data Science / Machine Learning GPU Cluster" for the command-line version, or both.
- In the comments, indicate:
- Which standard containers are needed for the course, or whether your course requires additional/custom software packages ("custom containers")
- Important course dates (e.g., course setup required by; midterm; final project due dates)
- Typical assignment due dates (e.g., every Friday at midnight)
- For courses requiring custom containers, please review TA Support for Building and Testing Custom Course Containers on DSMLP
- Provide timely communication of incidents/requests via the course's support ticket, or by:
- Knowledge of UC sensitive data policies
- 2-4 weeks prior to term start:
- Instructor visits the Instructional Technology Request (CINFO) page (https://cinfo.ucsd.edu/), which routes to a support.ucsd.edu request form with your selected course and term code. See above for details.
- If necessary, instructor requests meetings with ITS staff to discuss any non-standard requirements (such as custom containers) and/or to assist TAs with nbgrader/formgrader setup
- Early access granted for instructors/TAs to dsmlp-login.ucsd.edu and/or https://datahub.ucsd.edu. Student-test accounts available by request.
- Each course receives a shared nbgrader TA grader account for access to the formgrader UI in https://datahub.ucsd.edu, used for assignment creation/distribution/collection and grading. nbgrader works only with the shared grader account; other accounts, such as the instructor or TA accounts, cannot be used with nbgrader.
- Week 1:
- Students enrolled in course are granted ssh access to dsmlp-login.ucsd.edu and/or https://datahub.ucsd.edu
- IT Service Hub ticket for course is open for instructor and TA communication with IT Services, including new software package requests
- Week 1-10:
- nbgrader/formgrader student roster automatically updated based on course enrollment
- 11 days after the last day of class:
- IT Service Hub ticket closed
- 45 days after the last day of class:
- The shared nbgrader TA account and TA access to the course environment are removed
- 90 days after the last day of class:
- Instructor and student access to the course environment is removed. Instructors are still able to access the platform with a generic environment.
How-To's for Instructors and TAs
Add Custom Packages to a Standard Course Container
Please review TA Support for Building and Testing Custom Course Containers on DSMLP, and if necessary, request a consultation with IT Services staff via your course ticket or by emailing email@example.com.
Add TAs or Observers to a Datahub/DSMLP Course
- Add Teaching Assistants (TAs) and Other Users to a Canvas Course
- If the user is not a UC San Diego student (for example, departmental staff or a co-instructor), please email firstname.lastname@example.org (or follow up in your support.ucsd.edu course ticket) to request that ITS staff add them to the Datahub course roster.
Add Groups to a Datahub/DSMLP Course
How to use Teams in Canvas for DataHub/AWS Educate
Student Disk Space Quotas
All students have a 10GB disk quota. This applies regardless of any groups they are also a member of.
Large Datasets Shared By Multiple Courses
There are several large datasets available for use with DSMLP/Datahub. To navigate to the directory containing the datasets, use the following command in a terminal:
To request uploading a dataset to /datasets, please submit a ticket by emailing email@example.com.
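As a minimal sketch, assuming the shared datasets are mounted at /datasets (the path named above), browsing them from a terminal might look like the following. The helper name list_datasets is hypothetical and is not part of DSMLP.

```shell
# list_datasets is a hypothetical helper for illustration only.
# It lists a directory if it exists and reports an error otherwise.
list_datasets() {
  dir="${1:-/datasets}"   # default path is assumed from the text above
  if [ -d "$dir" ]; then
    ls -lh "$dir"
  else
    echo "directory not found: $dir" >&2
    return 1
  fi
}
```

Calling list_datasets with no argument in a DSMLP/Datahub terminal would list /datasets, if that directory exists on the node.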
Datasets Used By a Single Datahub/DSMLP Course
All courses have workspaces enabled, which allows the use of a shared course directory that can be accessed in the public directory in JupyterHub. By default, this is writeable by the grader account (e.g., grader-dsc102-01) and readable by everyone enrolled in the course, and can be used to store course datasets. You can use the chmod command as the grader user to change read/write permissions in the public directory so that only the grader can write to it:
chmod -R u+rwX,go+rXs,go-w ~/public
Note: if you unzip a file in this directory, the permissions from within the zip file will apply. In general, group (g) and others (o) should not have write access.
"nbgrader" requires every assignment name to be unique across all of an instructor's assignments.
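Because extracted files keep the permissions stored in the archive, it may help to re-apply the chmod command after unzipping and then confirm that nothing is left writable by group or others. The sketch below uses a temporary directory as a stand-in for ~/public so it can be tried safely; the directory and file names are made up for illustration.

```shell
# Hypothetical stand-in for ~/public.
demo_dir="$(mktemp -d)"
mkdir -p "$demo_dir/dataset"
echo "sample" > "$demo_dir/dataset/data.csv"

# Simulate an archive that unpacked with write access for everyone.
chmod -R a+rwX "$demo_dir"

# Re-apply the command from this page.
chmod -R u+rwX,go+rXs,go-w "$demo_dir"

# Collect anything still writable by group or others (expected to be empty).
writable="$(find "$demo_dir" \( -perm -020 -o -perm -002 \))"
echo "still writable by group/others: ${writable:-none}"
```

On the real course directory, the same find check can be run against ~/public as the grader user.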
Removing old "nbgrader" courses from "Assignments" dropdown
- Open a new Terminal
- New -> Terminal
- Navigate to the "nbgrader_cache" directory
- Enter command: "cd ~/.local/share/jupyter/nbgrader_cache"
- List all files in the directory
- Enter command: "ls -al"
- Delete the old directories
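The steps above can be sketched as a small shell helper. This is an illustration under assumptions, not an official tool: remove_cached_course is a hypothetical function name, and it only removes a direct child of the cache directory you pass in.

```shell
# Remove one cached course directory from an nbgrader cache directory.
# Usage: remove_cached_course <cache_dir> <course_dir_name>
remove_cached_course() {
  cache_dir="$1"
  course="$2"
  # Refuse empty names, "." and "..", and anything containing a slash,
  # so only a direct child of the cache directory can be removed.
  case "$course" in
    ""|.|..|*/*) echo "refusing to remove '$course'" >&2; return 1 ;;
  esac
  if [ -d "$cache_dir/$course" ]; then
    rm -r -- "$cache_dir/$course"
  else
    echo "no such cached course: $course" >&2
    return 1
  fi
}
```

For example, after running "ls -al" in ~/.local/share/jupyter/nbgrader_cache to find the old course directory, you could call remove_cached_course with that cache path and the directory name you found there.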
Service Resource Limitations
Resource exhaustion (insufficient GPUs/CPUs) can occur during periods of heavy usage. If eviction of a user's server is required, they will be notified 10 minutes before termination and instructed to save their data.
Scheduled Weekly Patching and Potential Downtime
Datahub/DSMLP Weekly Maintenance Window: Tuesdays, 6AM-8AM (Pacific)
The worldwide increase in ransomware and similar cybersecurity attacks has prompted UC San Diego to adopt strict policies regarding updating/patching of campus servers.
To minimize interruption to your students' work, we've designated Tuesdays from 6AM-8AM as a weekly maintenance period.
During this time, DataHub/DSMLP may operate at reduced capacity, and depending on the nature of the update, the cluster as a whole may be inaccessible.
Scheduling policies will try to run jobs on nodes that won't be impacted, but the risk remains that user jobs, i.e. Jupyter notebooks, may be terminated. It is important to periodically save state during long-running jobs ("checkpointing") so they can be resumed when service is restored.
We recognize the impact of any disruption during 10th week and Finals week, and will attempt to defer patching during these critical instructional periods unless the nature of the vulnerability requires immediate action.
Finally, as campus policy requires that "Critical" severity threats be addressed within 24 hours of discovery, we may be forced to patch outside of the Tuesday 6AM-8AM window. In such cases we will provide advance warning to instructors and TAs to the extent possible.
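To illustrate the checkpointing idea, here is a minimal sketch of a resumable loop in shell. It is a stand-in for real training or analysis code: run_with_checkpoint and the checkpoint file are hypothetical, and the "work" is only a counter. Most ML frameworks provide their own save/restore mechanisms, which would be the usual choice in practice.

```shell
# Run steps 1..total_steps, recording the last finished step in a file
# so that a restarted job resumes where it left off.
# Usage: run_with_checkpoint <checkpoint_file> <total_steps>
run_with_checkpoint() {
  checkpoint_file="$1"
  total_steps="$2"
  step=0
  # Resume from the saved step if a checkpoint already exists.
  if [ -f "$checkpoint_file" ]; then
    step="$(cat "$checkpoint_file")"
  fi
  while [ "$step" -lt "$total_steps" ]; do
    # ... one unit of real work would go here ...
    step=$((step + 1))
    # Write to a temporary file and rename it, so an interruption
    # does not leave a half-written checkpoint behind.
    echo "$step" > "$checkpoint_file.tmp" && mv "$checkpoint_file.tmp" "$checkpoint_file"
  done
  echo "$step"
}
```

If the job is terminated partway through, calling the function again with the same checkpoint file continues from the last recorded step rather than from zero.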
- Student's Notebook Failed to Validate/Cannot autograde assignment/Metadata Corrupted
- This occurs if a read-only or autograded cell has been copied. To resolve this problem, your student will need to:
- Back up their assignment: Rename the existing notebook by adding '-corrupted' to the end of the notebook filename, and download a copy to their computer in case they need to start over.
- Re-fetch their notebook from the 'Assignments' tab. They will now see both the fresh copy and the corrupted copy in their Files.
- Open both copies of the notebook, and copy their answers from the corrupted notebook to the fresh copy.
- Re-submit their assignment.
- You re-collect the assignment in Formgrader.
- Nbgrader export failed
- This can happen if an assignment is created with a space in the title, or if an assignment is deleted through the filesystem rather than the "nbgrader db assignment remove" command.
- Delete the assignment and rerun the command.
- Not Enough GPUs
- Your students may need to wait some time (on average 5-10 minutes) before a GPU is available for scheduling. If so, make sure they are not running their launch scripts with the "-f" flag, as that will kill the containers after the given command is executed, instead of launching the container's shell.
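For the "Nbgrader export failed" case above, one listed cause is a space in the assignment title. A minimal sketch of a check you could run on a proposed name before creating the assignment; has_space is a hypothetical helper, not an nbgrader command:

```shell
# Return success (0) if the given name contains any whitespace.
has_space() {
  case "$1" in
    *[[:space:]]*) return 0 ;;
    *) return 1 ;;
  esac
}

# Example: warn about a name that would likely cause trouble.
name="Homework 1"
if has_space "$name"; then
  echo "'$name' contains whitespace; consider a name like 'homework1'"
fi
```

A name such as "hw1" passes the check, while "Homework 1" is flagged.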
If you have further questions or concerns, submit a ticket or email us at firstname.lastname@example.org.