UC San Diego's Datahub and Data Science & Machine Learning Platform (DSMLP) provide thousands of undergraduates, graduate students, and their instructors with access to Jupyter, RStudio, and other advanced computational resources for scheduled classes, formal independent study, and student projects.
The platforms are used in courses ranging from introductory Data Science lectures, to graduate-level applied machine-learning, to curricula in Biology, Music, Social Sciences, and Public Health. Over AY 2022-23, more than 120 classes, 75 instructors, and 14,000 student enrollments were hosted.
This article describes the instructional request process and timeline for Datahub/DSMLP; the curated Standard Software Images and how to customize them; options for support & technical consulting; and finally, a few pertinent caveats and conditions including important limitations on use of sensitive/protected data.
Please reach out to our team with any questions or feedback regarding these guidelines or Datahub/DSMLP as a whole. We look forward to working with you!
Instructors, TAs, and departmental staff may request Datahub/DSMLP access via the Instructional Technology Request (CINFO) website beginning the 2nd week of the previous term. (For Summer/Fall classes, submission opens the 2nd week of Spring.) This page now routes to a form in support.ucsd.edu with your selected course and term codes.
Requests must be submitted 4 weeks prior to the start of instruction; this includes designation of a Technical Point of Contact when required (see below). Late requests are given lower priority and will be reviewed as time permits. Setup and ultimate availability to students will depend upon individual complexity and overall request load; students may not gain access until as late as the 4th week of the term.
Course setup and instructor access open 4-5 weeks prior to the start of instruction. The instructor, TAs, and other course staff should begin testing features, validating assignments, and making desired customizations. Earlier setup is available upon request. Note that 1:1 Consulting support for customization (discussed below) is limited in the final weeks of each term.
Student access follows TritonLink course rosters, and opens one business day prior to the start of instruction. Subsequent add/drop activity is reflected in Datahub by 10am the following business day. Non-roster auditor and observer access is managed through Canvas; Concurrent Enrollment students receive access by way of Extended Studies staff.
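The lead times above can be turned into calendar dates with a little date arithmetic. The following is a minimal sketch, not an official ITS tool: the function names and the sample start date are hypothetical, campus holidays are ignored, and "4-5 weeks" is represented by its earliest bound of 5 weeks.

```python
from datetime import date, timedelta


def previous_business_day(d: date) -> date:
    """Return the closest weekday strictly before d (holidays are ignored)."""
    d -= timedelta(days=1)
    while d.weekday() >= 5:  # 5 = Saturday, 6 = Sunday
        d -= timedelta(days=1)
    return d


def key_dates(instruction_start: date) -> dict:
    """Derive the request and access dates described above from the first day of instruction."""
    return {
        # Setup opens 4-5 weeks ahead; the earliest bound is used here.
        "setup_opens_earliest": instruction_start - timedelta(weeks=5),
        # Requests are due 4 weeks before instruction begins.
        "request_deadline": instruction_start - timedelta(weeks=4),
        # Students gain access one business day before instruction begins.
        "student_access_opens": previous_business_day(instruction_start),
    }


# Hypothetical first day of instruction, chosen only for illustration.
for name, value in key_dates(date(2024, 1, 8)).items():
    print(f"{name}: {value.isoformat()}")
```

Actual dates for a given term should always be confirmed against the official academic calendar.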
Instructors and students retain access for one additional quarter beyond the instructional term(s), ignoring Summer for Spring courses. Instructors may request individual accounts remain active longer to facilitate Incomplete grade resolution, academic integrity proceedings, course development or hand-off, or similar circumstances.
Course environments are purged one quarter after account deactivation, i.e. after two quarters excluding Summer, except for: individual accounts extended as above; and instructor/course-wide files which can be archived for up to 3 years upon request. Please contact ITS to revive a previously archived class environment, or to make an archive available for download. (Note: large datasets cannot be archived due to storage limitations; contact ITS to discuss alternate options.)
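The retention rules above amount to counting forward through Fall, Winter, and Spring while skipping Summer. The sketch below illustrates that counting under stated assumptions: the function names are hypothetical, years are calendar years, and a Summer course is assumed to be treated like a Spring course (the text above only addresses Spring explicitly).

```python
def next_regular_quarter(quarter: str, year: int) -> tuple:
    """Return the next Fall/Winter/Spring quarter after (quarter, calendar year)."""
    if quarter in ("Spring", "Summer"):
        # Summer is skipped; treating Summer courses like Spring is an assumption.
        return ("Fall", year)
    if quarter == "Fall":
        return ("Winter", year + 1)
    if quarter == "Winter":
        return ("Spring", year)
    raise ValueError(f"Unknown quarter: {quarter!r}")


def retention(quarter: str, year: int) -> dict:
    """Estimate how long default access and course files last for a course."""
    # Access continues for one additional regular quarter after the course.
    access_through = next_regular_quarter(quarter, year)
    # Files are purged one regular quarter after accounts are deactivated.
    files_kept_through = next_regular_quarter(*access_through)
    return {"access_through": access_through, "files_kept_through": files_kept_through}


# A Spring 2023 course: access continues through Fall 2023,
# and files are kept through Winter 2024 unless extended or archived.
print(retention("Spring", 2023))
```

Individually extended accounts and archived course files fall outside this default schedule.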
For-credit Independent study, capstones, and similar student projects may request ongoing access via the Independent Study Request form. Research IT can help connect faculty, staff, and student researchers with compute platforms for non-credit research.
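Key Dates & Events
Requests may be submitted: beginning the 2nd week of the previous term (2nd week of Spring for Summer/Fall classes)
Setup & instructor testing begins: 4-5 weeks prior to the start of instruction
Request submission deadline: 4 weeks prior to the start of instruction
Students receive access: one business day prior to the start of instruction
Student & instructor access ends: one quarter after the instructional term, ignoring Summer for Spring courses
Course files purged or archived: one quarter after account deactivation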
Datahub offers a set of curated software environments which are sufficient for many courses and use cases. Questions, configuration assistance, malfunctions, and errors relating to these standard features receive priority support from IT Services via a number of channels (see below).
Instructors may supplement standard software images through the addition of language modules (e.g. Python or R libraries), new system-level packages (compilers, utilities, etc.), or more extensive modifications to meet the specific needs of their course.
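Before requesting a customization, it can help to see which Python libraries a course needs that are not already present in the environment. The sketch below is one way to check from a notebook or terminal using only the Python standard library; the function name and the list of course requirements are hypothetical examples, not a statement of what any Standard Image contains.

```python
from importlib import metadata


def missing_packages(required):
    """Return the distribution names in `required` that are not installed here."""
    missing = []
    for name in required:
        try:
            metadata.version(name)  # raises if the distribution is absent
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing


# Hypothetical course requirements, listed only for illustration.
course_requirements = ["numpy", "pandas", "scikit-learn"]
print("Not yet installed:", missing_packages(course_requirements) or "none")
```

The resulting list is a reasonable starting point for a conversation with ITS about which additions a course actually requires.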
As experts in their curriculum and subject area, the instructor and/or a designated Technical Point of Contact ("TPOC") must take the lead in the installation, configuration, and student use of the new course-specific features. ITS technical staff provide support for customization via 1:1 Consultation (see below).
The instructor and/or TPOC is expected to have basic familiarity with Unix commands such as 'ssh', 'mkdir' and 'chmod' and should be proficient using the intended core platform (e.g. Python or R) in a desktop Mac/PC setting. ITS will provide guidance and basic training regarding the Datahub/DSMLP environment and any system-specific procedures.
Begin customization well in advance to ensure adequate time for testing and support from ITS. Note that 1:1 Consultation availability is limited in the final weeks of each term.
Highly complex use cases (e.g. those incorporating clustered services such as Spark or Postgres, or software not deriving from our Standard Images) are regularly utilized within Datahub/DSMLP, but will require substantially more time and expertise from the instructor/TPOC than ordinary customizations. See "Complex Customization & Experimental Features" below for guidance.
IT Services provides prioritized support for standard features and functionality via several routes:
Online Documentation offers general information about Datahub and usage of the platform's various features, as well as documentation for known issues, workarounds, and limitations of the environment.
The IT Service Desk can be reached by Phone, Web, or Email for assistance with:
1:1 Consultation appointments connect instructors, TAs, and Technical Points of Contact with ITS technical experts for real-time guidance or assistance on standard features, course-specific customizations, and complex/experimental features.
Possible topics include:
Note that 1:1 Consultation availability is reduced in the final weeks of each term, and staffing levels limit courses to a maximum number of hours per term (see "Caveats and Limitations" below).
The compute clusters underpinning Datahub and DSMLP are capable of hosting complex or novel customizations which fall outside ITS' normal bounds of support for a variety of reasons. For example, installing or integrating them may exercise untested or seldom-used features of Kubernetes, Docker, or Linux, or student use of the new features may require sophisticated technical expertise or hand-holding.
Examples of available capabilities considered complex or experimental include:
Incorporation of these or similar features into coursework will require the instructor and/or TPOC to invest significant time ahead of and during instruction, first to become independently familiar with the underlying technologies and then to serve as primary support for their students' activities.
ITS is eager to provide technical guidance for innovation within our services, but without advance agreement cannot become responsible for implementation or usage. We recommend scheduling a 1:1 Consultation with us at least one full quarter in advance of the planned use to discuss feasibility.
We ask all Datahub/DSMLP users to understand and abide by the following:
No Sensitive Data: Datahub is not engineered to protect highly-sensitive data such as clinical records or export-controlled information ("P4" per University of California classification levels) and must not be used for such purposes. Legally or contractually protected information ("P3") may be permitted after review; note that depending on the nature of the data, vetting may take 4-6 weeks or longer.
Shared Consulting Resources: IT Services staff must balance consulting duties against other operational and support responsibilities, and must ensure attention is distributed fairly across all courses and users. At Spring 2023 staffing levels, each course may request up to 6 hours of 1:1 Consulting services.
Scheduled Maintenance: Datahub may be unavailable Tuesdays, 6-8AM for installation of time-sensitive updates or security patches. Infrequent 'Critical' updates may require downtime outside of this timeframe, in which case we will notify instructors as soon as practical and attempt to minimize impact on coursework and long-running jobs.
Shared Compute Resources: Datahub and DSMLP system resources are shared among all courses assigned to Datahub and DSMLP. Demand for resources, in particular for GPUs, may exceed capacity at peak hours during 10th and Finals Weeks or at assignment deadlines. (Queuing mechanisms are in place to provide equitable access in such situations.)
Self-Supporting Programs: "Self-supporting" programs (e.g. MAS, MBA, etc.) are welcome to utilize DSMLP/Datahub for coursework or projects, but as Datahub equipment and staff are state-funded, UC policy requires us to recover associated direct and indirect costs. Please contact us to discuss.
Availability and Reliability: Datahub and DSMLP were designed with student workloads in mind, deliberately trading some of the costly redundancy typical of financial or health settings for additional capacity and capability. As such, they should not be used to host externally-available services or applications except as required for coursework or projects. (This caveat applies primarily to the compute nodes executing student jobs; critical components such as networking, file storage, and backups are maintained to Enterprise IT standards.)
Appropriate Use: The campus-wide IT Acceptable Use Policy applies to use of Datahub and DSMLP, including prohibitions on commercial or political activity, hacking or cyberstalking, and other types of unwelcome behavior.
If you still have questions or need additional assistance, email us at email@example.com, submit a ticket, or call the ITS Service Desk at (858) 246-4357.