added MLFlowBackend #163

Draft: wants to merge 2 commits into master

itan1 commented Apr 9, 2024

I've added an MLFlowBackend type for logging to MLFlow, similar to the TensorBoardBackend. It uses the MLFLogger from MLFlowLogger.jl (which now uses the REST API).

Currently, the log_to() method is implemented only for Loggables.Value. I can look into adding log methods for the other types (Loggables.Image, Loggables.Text, Loggables.Histogram) later; I just wanted to get some early feedback first.
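For context, logging a Loggables.Value through a REST-based logger comes down to one HTTP request per scalar. Below is a minimal Python sketch, assuming MLflow's REST endpoint `POST /api/2.0/mlflow/runs/log-metric` and a server on localhost:5000 as in the CI step. `build_log_metric_request` and `send` are hypothetical helpers for illustration only, not MLFlowLogger.jl's API.

```python
import json
import time
import urllib.request


def build_log_metric_request(base_url, run_id, key, value, step, timestamp_ms=None):
    """Build (but do not send) a POST to MLflow's REST log-metric endpoint."""
    if timestamp_ms is None:
        # MLflow timestamps are milliseconds since the Unix epoch.
        timestamp_ms = int(time.time() * 1000)
    payload = {
        "run_id": run_id,
        "key": key,
        "value": float(value),
        "timestamp": timestamp_ms,
        "step": int(step),
    }
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/api/2.0/mlflow/runs/log-metric",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def send(request, timeout=5.0):
    # If no server is listening, urlopen raises urllib.error.URLError
    # (connection refused), the Python counterpart of HTTP.ConnectError in Julia.
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return response.status
```

Keeping request construction separate from sending makes the payload testable without a running tracking server.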

I also still have to figure out how to start the mlflow server in CI on Windows so that all tests pass.

Any first feedback?

PR Checklist

  • Tests are added: added a test for LogMetrics
  • Documentation, if applicable: documentation in the code and added a reference in Features.md

Comment on lines +27 to +35
- name: Setup python and mlflow server
  uses: actions/setup-python@v1
  with:
    python-version: ${{ matrix.python-version }}
    architecture: ${{ matrix.arch }}
- run: |
    python -m pip install mlflow
    python -m pip show mlflow
    mlflow server --host localhost --port 5000 &
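
Because `mlflow server ... &` returns immediately, the tests can start before the server accepts connections. One way to avoid that race is to poll the port first. Below is a minimal Python sketch, assuming the host and port from the step above. `wait_for_server` is a hypothetical helper, not part of mlflow.

```python
import socket
import time


def wait_for_server(host="localhost", port=5000, timeout=60.0, interval=0.5):
    """Poll until a TCP connection to host:port succeeds or `timeout` seconds pass.

    Returns True once something accepts connections on the port, False on timeout.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # A successful connect means a process is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: retry until the deadline.
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

A plain TCP check only shows that something is listening, not that MLflow has finished initializing, so it narrows the race rather than removing it entirely.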

Member

Thanks for the PR! For my understanding, is this still necessary if MLFlowLogger.jl doesn't use Python? I thought MLFlow logging did not require a running server.

Author

We get an HTTP.ConnectError when trying to log to a server that isn't running, so unfortunately these tests require a running server. Maybe it would be more developer-friendly to run the MLFlow tests only in CI, so that FluxTraining developers can just run ]test without having to bother with MLFlow?
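
The gating suggested here is a small environment check. Below is a sketch written in Python for illustration; a Julia `runtests.jl` could read `ENV` the same way. It assumes GitHub Actions' `CI=true` variable and uses `MLFLOW_TRACKING_URI` as a hypothetical opt-in for local runs. `mlflow_tests_enabled` is an illustrative name.

```python
import os


def mlflow_tests_enabled(env=None):
    """Decide whether the MLflow-dependent tests should run.

    Hypothetical policy: always run on CI (GitHub Actions sets CI=true), and
    run locally only if the developer opted in by setting MLFLOW_TRACKING_URI
    to a server they started themselves.
    """
    env = os.environ if env is None else env
    on_ci = env.get("CI", "").lower() == "true"
    opted_in = bool(env.get("MLFLOW_TRACKING_URI"))
    return on_ci or opted_in
```

With this policy a plain local `]test` skips the MLflow tests unless the developer explicitly points them at a server.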

Member

I think I misunderstood what https://github.com/rejuvyesh/MLFlowLogger.jl does and is capable of. I've always used setup 1) (localhost, the default) or 2) (local tracking with a local database) from https://www.mlflow.org/docs/latest/tracking.html#common-setups, but it looks like this library only supports 3) (remote tracking with an MLflow tracking server)? Are there plans to add logging support that doesn't need a running tracking server?

Author

Aha, I assumed for setup 1 or 2 there would always be a tracking server running on localhost, but I see now that that does not make sense.

The last activity in MLFlowLogger.jl was 3 years ago, so I don't think there are any plans there. I'm mainly interested in setup 3 because I'm collaborating with others on a Julia ML project, but I can invest a little time if needed.

I assume the way to go would be to add file logging to MLFlowLogger.jl, similar to what TensorBoardLogger.jl does. Full file logging support is probably a larger effort, but I could work on a first version that at least supports creating experiments and runs, plus log_metric().
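
To gauge the scope of a first version, here is what log_metric() would need to write, assuming MLflow's local file-store layout: one file per metric under `mlruns/<experiment_id>/<run_id>/metrics/`, with one `timestamp value step` line per logged point. Below is a Python sketch under that assumption. `log_metric_to_file_store` is hypothetical, and the `meta.yaml` files that MLflow also expects for experiments and runs are deliberately left out.

```python
import time
from pathlib import Path


def log_metric_to_file_store(root, experiment_id, run_id, key, value, step, timestamp_ms=None):
    """Append one metric point using the assumed MLflow file-store layout.

    Only the metrics file is written; experiment/run metadata is omitted, and
    `key` is assumed to be a plain name without path separators.
    """
    if timestamp_ms is None:
        timestamp_ms = int(time.time() * 1000)  # milliseconds since epoch
    metrics_dir = Path(root) / str(experiment_id) / run_id / "metrics"
    metrics_dir.mkdir(parents=True, exist_ok=True)
    metric_file = metrics_dir / key
    # Append mode: each call adds one "<timestamp> <value> <step>" line.
    with open(metric_file, "a", encoding="utf-8") as f:
        f.write(f"{timestamp_ms} {float(value)} {int(step)}\n")
    return metric_file
```

Appending keeps writes cheap and preserves the full history of each metric, which is what a step-indexed training log needs.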

What kind of roadmap do you envision to add MLFlow logging support in FluxTraining.jl?

Member

Finally had some time to look into this further. If we're already including Python, I wonder if it would help to use the more actively maintained https://github.com/JuliaAI/MLJFlow.jl or underlying https://github.com/JuliaAI/MLFlowClient.jl? If that doesn't sound appealing, we can continue with this approach.

I would also consider whether this could be implemented as a package extension. If you're comfortable with trying that, please do.

Author

Yes, in that case I would also use the underlying MLFlowClient.jl directly.

I like the idea of MLFlowBackend supporting both use cases: use MLFlowLogger.jl to log locally and MLFlowClient.jl to log to a remote MLFlow server.

I sketched an overview of such a design here. What do you think? (I would have to change MLFlowLogger.jl to a local logger, which makes sense since then it's similar to TensorBoardLogger.jl).
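
The dual-mode backend could dispatch on the tracking URI. Below is a hypothetical Python sketch. `select_backend` and its return values are illustrative names only: `http(s)` URIs map to the remote role (MLFlowClient.jl) and bare paths or `file:` URIs map to the local role (MLFlowLogger.jl).

```python
from urllib.parse import urlparse


def select_backend(tracking_uri):
    """Pick a logging mode from a tracking URI (illustrative dispatch only)."""
    # urlparse lowercases the scheme; bare paths such as "./mlruns" have none.
    scheme = urlparse(tracking_uri).scheme
    if scheme in ("http", "https"):
        return "remote"  # talk to a tracking server over REST
    if scheme in ("", "file") or len(scheme) == 1:
        # Bare path, file: URI, or a Windows drive letter such as "C:\\mlruns".
        return "local"  # write files directly, no server needed
    raise ValueError(f"unsupported tracking URI scheme: {scheme!r}")
```

Other schemes, such as a `sqlite:` database (setup 2), are rejected in this sketch.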

Member

That sounds good to me. I realize adding local logging support could be quite a bit of work though, and I hadn't realized you already got MLFlow CI working with rejuvyesh/MLFlowLogger.jl#5. So if the local logging part turns out to be too much of a hassle, I'd be ok continuing with the current PR setup and revisiting local logging once the MLFlow client library in question supports it.


2 participants