Overview
ITS/ETS' Datahub and Data Science/Machine Learning Platform (DSMLP) are available to undergraduate and graduate students in support of:
- For-credit independent study/research (e.g. 199/299),
- Thesis-related research,
- Certain departmentally-sponsored student projects (e.g. HDSI capstone projects)
Datahub provides web-based Jupyter notebooks allowing students to combine live code, equations, visualizations and narrative text for:
- Data cleaning and transformation
- Numerical simulation
- Statistical modeling
- Data visualization
- Machine learning
DSMLP's Jupyter notebooks offer interactive access to popular languages and GPU-enabled frameworks such as:
- Python
- R
- Pandas
- PyTorch
- TensorFlow
- Keras
- NLTK
- AllenNLP
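Which of these packages a particular notebook environment provides can vary. As a minimal sketch, the following standard-library check reports which of them can be imported from the current kernel. The import names in the table are the packages' usual module names and are an assumption about the DSMLP images, not something this page specifies; `available_frameworks` is a hypothetical helper name.

```python
import importlib.util

# Display name -> usual top-level import name (assumed, not taken from DSMLP docs).
FRAMEWORK_MODULES = {
    "Pandas": "pandas",
    "PyTorch": "torch",
    "TensorFlow": "tensorflow",
    "Keras": "keras",
    "NLTK": "nltk",
    "AllenNLP": "allennlp",
}


def available_frameworks(modules=FRAMEWORK_MODULES):
    """Return {display name: True/False} for whether each module can be found."""
    return {
        label: importlib.util.find_spec(name) is not None
        for label, name in modules.items()
    }


if __name__ == "__main__":
    for label, found in available_frameworks().items():
        print(f"{label:12s} {'available' if found else 'not installed'}")
```

The check only looks the modules up; it does not import them, so it runs quickly even where the heavier frameworks are installed.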
The underlying DSMLP platform is also accessible for interactive terminal and batch jobs.
- Complex ML workflows are supported through terminal/SSH logins, background batch jobs, and a full Linux/Ubuntu CUDA development suite.
- Users can opt to replace the default environments by launching their own custom Docker containers. (Detailed documentation)
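From a terminal session or a custom container, it can be useful to confirm that the CUDA tooling is visible before submitting a longer job. The sketch below assumes the standard NVIDIA tool names `nvidia-smi` and `nvcc`; whether they are on the PATH in a given DSMLP environment is not stated on this page, and `cuda_tools_on_path` is a hypothetical helper name.

```python
import shutil

# Tools commonly shipped with NVIDIA drivers and the CUDA toolkit (assumed names).
CUDA_TOOLS = ("nvidia-smi", "nvcc")


def cuda_tools_on_path(tools=CUDA_TOOLS):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {tool: shutil.which(tool) for tool in tools}


if __name__ == "__main__":
    for tool, path in cuda_tools_on_path().items():
        print(f"{tool}: {path or 'not found on PATH'}")
```

A `None` result only means the tool was not found on the PATH of the current session; it does not by itself show that no GPU is attached.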
Critical Concepts
Datahub/DSMLP is available for use by undergraduate and graduate students in nearly all academic departments (with some exceptions listed below) for their:
- Registered independent study/research courses (e.g., 198/199, 293/298/299)
- State-supported capstone or final projects (HDSI UG Capstone, SOM Independent Study Project)
- Certain departmentally-sponsored initiatives (e.g. Halıcıoğlu Data Science Undergraduate Fellowships)
- Non-instructional activities (e.g. personal projects, clubs, teams) which may be approved on a case-by-case basis as resources are available.
Limitations apply in the following situations:
- Professional students in certain self-supported programs (MBA, MAS, Extension) may be ineligible due to state funding restrictions; contact ITS to confirm.
- Students in Health-Sciences departments (e.g. Medicine, Pharmacy, Neurosciences) may have limited access to Datahub due to HS security restrictions; contact ITS to discuss. DSMLP access is unaffected.
- Non-student researchers, including postdoctoral scholars, should contact Research IT Services for help identifying comparable GPU compute environments.
Steps to Take
Requesting Access
- Eligible students should request Datahub/DSMLP access via the DSMLP Independent Study Access Request Form. You will be asked to provide information about your project, resource requirements, and sponsorship.
- Instructors and instructional support staff may request personal DSMLP access in order to support their students' use of the platform, or to explore and evaluate the environment for course development purposes. To request such an account, please submit your request through the ITS Service Desk.
- To request access for your scheduled undergraduate and graduate-level courses, please submit an Instructional Technology Request (CINFO).
Requesting Support
- Once you have been provisioned for access, any questions, requests, or problems should be submitted to the ITS Service Desk.
- Datahub/DSMLP resources are provided to Independent Study users "as is":
- We are unable to modify them to support specific project needs (e.g. custom hardware or software versions).
- We will provide guidance in configuring the environment for your work, as time permits.
- ITS/ETS provides trouble-ticket support for Independent Study users, including resolution of system problems affecting only those users, at a lower priority than for scheduled instruction:
- We cannot provide assistance outside of business hours or on weekends.
- It may take up to 5 days to respond to your problem reports.
Usage Policies
By applying for and using DSMLP resources, you (and your advisor, if you are a doctoral candidate) agree to read and understand the following policies and guidelines:
- Access to ETS resources is limited to currently-registered UC San Diego students, which may impede collaboration with UC San Diego faculty, staff researchers, postdoctoral scholars, or individuals at other institutions. Continuity of research may be impacted should responsibilities shift within your project or group.
- Student research jobs run at a lower priority than coursework, and resource contention is to be expected during the busy 8th-10th weeks and final exam periods. We cannot raise the priority of research jobs to accommodate publication deadlines.
- DSMLP resources are provided "as is"; we are unable to modify them to support specific project needs (e.g. custom hardware or software versions). We will, however, provide guidance in configuring the environment for your work as time permits.
- DSMLP's capabilities are relatively modest compared to those available via San Diego Supercomputer Center (SDSC), NSF/national HPC resources such as XSEDE, and cloud-based services such as Amazon AWS, and may not accommodate natural growth in a project's resource needs over time.
- Larger requests, or projects requiring capabilities outside of ETS' instructional portfolio, will be routed to campus Research IT Services facilitators, who can help evaluate the project's broader context, including the related efforts of your PI and research group, and then help you identify the most appropriate resources.
- DSMLP is not suitable for storage or processing of Category P3/P4 sensitive data (definitions), examples of which include:
- Government classified/controlled (CUI/CTI/ITAR/FISMA)
- Health or personal protected information (PHI/HIPAA, IRB-controlled, or statutory PII)
- Other students' grades or academic records (FERPA)
- Information subject to certain Data Use Agreements (DUA)