
Dependencies of ML projects #1

Open
nielsbauman opened this issue Nov 17, 2021 · 8 comments

Comments

@nielsbauman

We could try to identify how dependencies affect the maintainability of an ML project.

Possible approaches:

  • Looking at one project - look at the history of the dependencies and relate it to the maintainability of the project.
  • Comparing multiple projects - see when developers are adding/removing dependencies.
  • Try to find a correlation between certain dependencies and maintainability.
  • Look at the history of the dependency requirements.
@rlogothetis01
Collaborator

Motivation: can we improve the maintainability and robustness of ML projects with respect to their dependencies?

@rlogothetis01
Collaborator

rlogothetis01 commented Nov 17, 2021

Data:

  • data set
  • ML repos
  • Git projects with a limited number of commits (about 100)
  • Python projects

@DungLai

DungLai commented Nov 17, 2021

Motivating example: Many machine learning projects and paper implementations are written in Python 2. However, Python 2 is no longer supported. It is difficult to migrate these projects to Python 3, and difficult to install dependencies for Python 2. If there is a security vulnerability in the code, it is difficult to maintain or fix existing systems that rely on deprecated dependencies.

@luiscruz
Collaborator

Tool to extract module dependencies
https://pypi.org/project/findimports/

@DungLai

DungLai commented Nov 17, 2021

Candidate projects can be taken from this list of over 4000 repositories, which comes from this Microsoft paper.

@rlogothetis01
Collaborator

rlogothetis01 commented Nov 17, 2021

Methodology:

  • identify dependencies by looking at import statements, or write a script (tool) that does this [AST]
  • track how dependencies change over time (when each was added and, if applicable, when it was removed)
  • look at the size of projects: define size (features, number of files, etc.)
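
A minimal sketch of the AST idea above, assuming the projects parse under Python 3 and using only the standard-library `ast` module. The function name `top_level_imports` and the example source are hypothetical, made up for illustration; this is not the `findimports` tool mentioned earlier in the thread.

```python
import ast
from typing import Set


def top_level_imports(source: str) -> Set[str]:
    """Return the top-level module names imported by a Python source string."""
    modules: Set[str] = set()
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            # "import os.path" counts as a dependency on "os".
            for alias in node.names:
                modules.add(alias.name.split(".")[0])
        elif isinstance(node, ast.ImportFrom):
            # level > 0 is a relative import (e.g. "from . import utils"),
            # which is internal to the project, so it is skipped.
            if node.level == 0 and node.module:
                modules.add(node.module.split(".")[0])
    return modules


# Hypothetical example source, not taken from any real repository.
example = """
import numpy as np
import os.path
from sklearn.model_selection import train_test_split
from . import utils
"""

print(sorted(top_level_imports(example)))  # ['numpy', 'os', 'sklearn']
```

Note that `ast.parse` raises `SyntaxError` on Python 2-only syntax (for example the `print` statement), so Python 2 projects would need separate handling. Standard-library modules such as `os` also show up in the result and would have to be filtered out if only third-party dependencies are of interest.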

@rlogothetis01
Collaborator

rlogothetis01 commented Nov 17, 2021

  • do a comparative analysis
  • identify whether there is a correlation between dependencies and change over time
  • what changes were made over time?
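
One possible way to approach "what changes were made over time" is to compare the dependency set at each commit with the previous one. The sketch below assumes the dependency set per commit has already been extracted, in chronological order. The function name `dependency_changes`, the commit labels, and the data are hypothetical and purely illustrative.

```python
from typing import Dict, Iterable, List, Set, Tuple


def dependency_changes(snapshots: Iterable[Tuple[str, Set[str]]]) -> List[Dict]:
    """List the dependencies added and removed at each commit.

    `snapshots` is a chronologically ordered sequence of
    (commit label, dependency set) pairs. Commits that do not
    change the dependency set are omitted from the result.
    """
    changes: List[Dict] = []
    previous: Set[str] = set()
    for label, deps in snapshots:
        added = deps - previous
        removed = previous - deps
        if added or removed:
            changes.append(
                {"commit": label, "added": sorted(added), "removed": sorted(removed)}
            )
        previous = deps
    return changes


# Made-up history for illustration only.
history = [
    ("c1", {"numpy"}),
    ("c2", {"numpy", "pandas"}),
    ("c3", {"numpy", "pandas"}),
    ("c4", {"numpy"}),
]

for change in dependency_changes(history):
    print(change)
```

In this made-up history, `pandas` is added at `c2` and removed at `c4`, while `c3` produces no entry because nothing changed.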

@rlogothetis01
Collaborator

Dependencies:

  • Look at practicality (what is the problem we are solving?)
  • Too many dependencies: impact on deployment etc.
  • DS (data science) pulls in as many dependencies as needed to make things work
  • SE (software engineering) removes dependencies
  • Example: pandas may not be needed but is commonly pulled into DS projects; there are often ways to avoid including it
  • Evolution aspect
  • From build to deployment
  • Dependencies between ML microservices (cross-boundary issues)


4 participants