How To: Customize your environment in DSMLP/Datahub (including Jupyter notebooks)


Overview


If you need to customize your environment in one of the Standard DSMLP/Datahub containers, please see the instructions below. 

If you need to create a custom container from one of the standard containers, please see TA Support for Building and Testing Custom Course Containers on DSMLP. Custom containers are typically used for entire courses on the DSMLP platform, or when individual users need to install operating system-level packages (e.g. 'apt-get install', 'yum'). To launch your custom container, including via VSCode, see How To: Launching Containers from the Command Line.

How do I install Python packages into my own virtual environment and kernel on datahub.ucsd.edu?


If you want to install a package into your Datahub Jupyter environment without adding it to the existing Python 3 kernel, you can create a virtual environment and register it as a separate kernel.

  1. From the home page (https://datahub.ucsd.edu/user/[username]/tree), open a new terminal using "New>Terminal".
  2. Make a new directory for your kernel and create a virtual environment using Python 3's Virtual Environment module.
    # create the directory
    mkdir mykernel
    
    # create the virtual environment within your directory
    python3 -m venv mykernel
  3. Activate your virtual environment and install ipython and ipykernel.
    # activate the virtual environment
    source mykernel/bin/activate
    
    # pip should now reference your virtual environment's pip
    which pip # output = /datasets/home/.../<YOUR USERNAME>/mykernel/bin/pip
    
    pip install ipython ipykernel
  4. Install the Python packages you'd like using pip
    # for example, install Scrapy
    pip install scrapy
  5. Install your kernel into your DSMLP Jupyter notebook environment using the following:
    # make sure that ipython is referencing the virtual environment's ipython
    # if it isn't, make sure your virtual environment is activated by following Step 3
    which ipython # output = /datasets/home/.../<YOUR USERNAME>/mykernel/bin/ipython
    
    ipython kernel install --user --name=mykernel
    
    # deactivate your virtual environment and close the terminal
    deactivate
  6. Navigate back to the home page (see Step 1) and refresh it. You should now see your kernel appear under "New". Create a new notebook using this kernel by selecting it.
  7. Verify that your installed packages can now be imported.
    # within a cell of your jupyter notebook
    import scrapy
    scrapy?

    Note: Libraries installed into your kernel's virtual environment are only available to notebooks that use that kernel.
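If the import in Step 7 fails, it can help to confirm which interpreter the notebook is actually running. The cell below is a minimal sketch, not part of the official instructions: describe_environment is a hypothetical helper name, and it assumes the kernel was created from a virtual environment as in Steps 2 through 5, in which case the reported prefix would normally point inside your mykernel directory.

```python
import importlib.util
import sys


def describe_environment(package_name):
    """Report which interpreter is running and whether a top-level package is importable."""
    # Inside a venv, sys.prefix points at the venv, while sys.base_prefix
    # points at the Python installation the venv was created from.
    in_virtualenv = sys.prefix != sys.base_prefix
    spec = importlib.util.find_spec(package_name)
    return {
        "executable": sys.executable,
        "prefix": sys.prefix,
        "in_virtualenv": in_virtualenv,
        "package_found": spec is not None,
        "package_location": spec.origin if spec is not None else None,
    }


# "scrapy" matches the example package from Step 4; substitute your own.
for key, value in describe_environment("scrapy").items():
    print(f"{key}: {value}")
```

If in_virtualenv is False or the prefix does not point at your virtual environment, the notebook is probably using a different kernel; select your kernel under "New" as in Step 6.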

How do I fix/reset my local environment if I'm having issues?

Warning: Installing packages using pip or conda can break your local environment.
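Before resetting anything, it may be useful to see what has been installed at the user level. The following is a rough sketch that assumes the breakage comes from packages installed into Python's per-user site directory (normally under .local in your home directory); user_site_packages is a hypothetical helper name, and conda-managed packages are not covered.

```python
import importlib.metadata
import site


def user_site_packages():
    """Return sorted (name, version) pairs for packages in the per-user site directory."""
    user_site = site.getusersitepackages()
    found = set()
    # Restrict the search to the per-user directory only; a missing
    # directory simply yields no distributions.
    for dist in importlib.metadata.distributions(path=[user_site]):
        name = dist.metadata.get("Name")
        version = dist.metadata.get("Version") or "unknown"
        if name:
            found.add((name, version))
    return sorted(found)


print("per-user site directory:", site.getusersitepackages())
for name, version in user_site_packages():
    print(f"{name}=={version}")
```

An empty list suggests that user-level pip installs are not the cause of the problem.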

We do not provide support for customizations to your environment made with conda. If your course requires conda and you are having issues, please reach out to your TA or professor for assistance. However, if conda itself isn't installed correctly or isn't working, please reach out to us.

To manually reset your environment, navigate to datahub.ucsd.edu, click on the services dropdown, select manual-resetter, and click on the reset button. This will stop your servers, log you out, and reset your profile while preserving all of your work and files.

Alternative option: If the issue is with a notebook, you can create a clean Python3 notebook that ignores all packages located in the .local directory. To do so, select "New>Python3(clean)" in the notebook server. If the clean notebook fixes your problem, the cause lies in your local environment, which you will then need to repair. Moving or deleting your .local/lib directory will fix many cases. In some cases, you may also need to move or delete your .local/jupyter directory (NOTE: if you are a TA using the shared grader account, please follow up in your course support ticket instead of doing this).
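Moving a directory aside is safer than deleting it, since it can be moved back later. Here is a minimal sketch of that approach, under the assumption that you run it from a notebook or terminal on your own account (not the shared grader account); move_aside and the ".bak-" suffix are illustrative choices, not an official tool.

```python
import shutil
import time
from pathlib import Path


def move_aside(target, dry_run=True):
    """Rename target to a timestamped backup beside it and return the backup path.

    Returns None if target does not exist. With dry_run=True nothing is moved.
    """
    target = Path(target).expanduser()
    if not target.exists():
        return None
    stamp = time.strftime("%Y%m%d-%H%M%S")
    backup = target.with_name(f"{target.name}.bak-{stamp}")
    if not dry_run:
        shutil.move(str(target), str(backup))
    return backup


# Dry run by default: report what would happen without changing anything.
planned = move_aside("~/.local/lib", dry_run=True)
print("nothing to move" if planned is None else f"would move to {planned}")
```

Calling move_aside("~/.local/lib", dry_run=False) performs the move. Restart your notebook server afterwards so that running kernels no longer see the old packages.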

For a tool to save and switch environments, see: Save and switch local environment

If you still have questions or need additional assistance, please email datahub@ucsd.edu to create a support ticket, or visit support.ucsd.edu.