Software Resources
If you'd like to use GitHub to handle your course materials (and to create interact links), you'll need an account on [GitHub][github] and the ability to push to the GitHub repo associated with your connector course.
[GitHub][github] is a website that hosts code and files (it is the site you are currently visiting). A repository on GitHub holds the files for a specific project. Each connector class has a repository. If you don't know which repository is yours, contact the datahub team.
It is possible to create a link that will automatically pull new files into a student jupyterhub instance from your class github repository. See the `url_to_interact` function in the `connectortools` module. There is also a demo in the `connectortools` binder to show how this is done.
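As a rough illustration of what such a link encodes, the sketch below builds a URL that names a repository and a file path as query parameters. The function name `build_interact_url`, the hub address, the `/interact` endpoint, and the `repo` and `path` parameter names are all assumptions made for this example. They are not taken from `connectortools`, so check `url_to_interact` for the format your hub actually expects.

```python
from urllib.parse import urlencode


def build_interact_url(hub_base, repo, path):
    """Return a link that asks a hub to pull `path` from `repo`.

    Hypothetical layout: the `/interact` endpoint and the query
    parameter names are assumptions, not the connectortools format.
    """
    # urlencode percent-escapes characters such as "/" in the file path.
    query = urlencode({"repo": repo, "path": path})
    return f"{hub_base.rstrip('/')}/interact?{query}"


if __name__ == "__main__":
    # Placeholder hub address and repository name.
    print(build_interact_url("https://hub.example.org", "my-connector", "labs/lab01.ipynb"))
```

Keeping the repository and path as separate parameters means a single helper can generate links for every notebook in a class repository.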
[github]: https://github.com/
GitHub is one of several options for storing datasets. For other options see the hardware page.
Each user will have several common packages for scientific computing installed by default. However, many connectors wish to install packages specific to their class.
There are two main options to install packages on the cluster. Choose the one that is best suited for your needs:
- To install a new package that all students will automatically have access to, create a new issue in the `connector-instructors` repo and flag the jupyterhub administrator. Attach (or point to) a small notebook that uses this package. We'll try to integrate any additional libraries that your notebook specifies. The more lead time we have the better. This can be a little clunky if you want to update a package on a regular basis.
- Install packages directly to the cluster in each user session. In this case packages must be installed to your user directory. If you get permissions errors when using `pip`, it's usually because you're trying to write to the base cluster directory, not your user directory. Check out the `connectortools` module for a function called `install_package` that makes this straightforward.
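For the second option, a per-user install can be sketched with pip's `--user` flag, which writes to the user site directory instead of the shared base environment. This is only an illustration of the idea, not the implementation of `install_package`; the helper names `pip_user_command` and `install_to_user_dir` are hypothetical.

```python
import subprocess
import sys


def pip_user_command(package):
    """Build a pip command that installs into the user site directory."""
    # `sys.executable -m pip` targets the interpreter the kernel is running.
    return [sys.executable, "-m", "pip", "install", "--user", package]


def install_to_user_dir(package):
    """Run the install; raises CalledProcessError if pip fails."""
    subprocess.check_call(pip_user_command(package))


# Example call (package name is a placeholder):
# install_to_user_dir("seaborn")
```

Invoking pip through `sys.executable` avoids accidentally installing into a different Python than the one the notebook kernel uses.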
Whenever a package is updated to a new version, students should restart their kernel. If you updated the package by contacting the datahub tech crew, students will need to stop and start their server.
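A rough heuristic for spotting a stale kernel is to compare the version recorded on disk with the version the already-imported module reports. The sketch below uses the standard library's `importlib.metadata` (Python 3.8+). The helper name `needs_restart` is hypothetical, and it assumes that the import name matches the installed distribution name and that the module exposes `__version__`, which not every package does.

```python
import sys
from importlib.metadata import PackageNotFoundError, version


def needs_restart(package):
    """Return True if the loaded module's version differs from the one on disk."""
    # Only modules already imported in this kernel appear in sys.modules.
    module = sys.modules.get(package)
    loaded = getattr(module, "__version__", None)
    try:
        on_disk = version(package)
    except PackageNotFoundError:
        # Nothing installed under this name, so there is nothing to compare.
        return False
    return loaded is not None and loaded != on_disk


if __name__ == "__main__":
    print(needs_restart("datascience"))
```

A `False` result is not proof that the kernel is current, since a package that has not been imported yet or that lacks `__version__` cannot be compared.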
The `datascience` package was written for use in Berkeley’s Data Science courses and contains useful functionality for investigating and graphically displaying data. There is detailed documentation available on Tables, Maps, and other components of the `datascience` package.
- Run your code from start to finish in one go before pushing it to students. This ensures that it runs in a timely fashion, and that you aren't baking memory issues into the code itself.
- Always run the code on the cluster before distributing it, just in case you haven't accounted for some hardware or library restriction on the cluster.
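To put numbers on the checks above, the following sketch times one complete run of a function and records its peak memory with the standard library's `tracemalloc`. The helper name `profile_run` and the stand-in workload are assumptions for illustration. Note that `tracemalloc` only sees allocations made through Python's memory allocator, so treat the figure as a lower bound.

```python
import time
import tracemalloc


def profile_run(func, *args, **kwargs):
    """Run func once; return (result, seconds elapsed, peak bytes allocated)."""
    tracemalloc.start()
    start = time.perf_counter()
    try:
        result = func(*args, **kwargs)
        elapsed = time.perf_counter() - start
        # get_traced_memory returns (current, peak) in bytes.
        _, peak = tracemalloc.get_traced_memory()
    finally:
        # Always stop tracing, even if func raised.
        tracemalloc.stop()
    return result, elapsed, peak


if __name__ == "__main__":
    # Stand-in workload; replace with the analysis your notebook performs.
    result, seconds, peak_bytes = profile_run(lambda: sum(x * x for x in range(100_000)))
    print(f"{seconds:.3f} s, peak {peak_bytes / 1e6:.2f} MB")
```

Running this on the cluster rather than on a laptop gives figures that reflect the hardware students will actually use.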