added MLFlowBackend #163

itan1 · 2024-04-09T07:38:57Z

I've added an MLFlowBackend type for logging to MLFlow, similar to the TensorBoardBackend. It uses the MLFLogger from MLFlowLogger.jl (which now uses the REST API).

Currently, the log_to() method is implemented only for Loggables.Value. I can look into adding log methods for the other types (Loggables.Image, Loggables.Text, Loggables.Histogram) later; I just wanted to get some first feedback.

I also still have to figure out how to start the mlflow server in the CI on Windows to make all tests pass.

Any first feedback?

PR Checklist

Tests are added: added a test for LogMetrics
Documentation, if applicable: documentation in the code and added a reference in Features.md

ToucheSir · 2024-04-13T02:43:17Z

.github/workflows/ci.yml

+      - name: Setup python and mlflow server
+        uses: actions/setup-python@v1
+        with:
+          python-version: ${{ matrix.python-version }}
+          architecture: ${{ matrix.arch }}
+      - run: |
+          python -m pip install mlflow
+          python -m pip show mlflow
+          mlflow server --host localhost --port 5000 &


Thanks for the PR! For my understanding, is this still necessary if MLFlowLogger.jl doesn't use Python? I thought MLFlow logging does not require an active server running.

itan1 · Apr 15, 2024

We'll get an HTTP.ConnectError when trying to log to a server that's not running, so unfortunately these tests require a running server. Maybe it's more developer-friendly to run the MLFlow tests only in the CI, so that FluxTraining developers can just do ]test without having to bother with MLFlow?
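
Since logging fails with a connection error until the server is actually listening, one option for the CI step is to poll the port before the tests start. The following is only a sketch under stated assumptions: it uses the Python interpreter that the workflow step already installs, the helper name `wait_for_port` and its timeout values are hypothetical (not part of mlflow or of this PR), and the host and port would mirror the `--host localhost --port 5000` flags from the workflow.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False on timeout.

    Hypothetical helper for illustration; names and defaults are assumptions.
    """
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            # A successful connect means something is listening on the port.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            # Connection refused or timed out: wait briefly, then retry.
            time.sleep(interval)
    return False
```

In the workflow, this could be called as `wait_for_port("localhost", 5000)` right after the server is started in the background, with the step failing when it returns False. Note that it only confirms the port accepts TCP connections, not that the MLFlow REST API is fully ready.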

ToucheSir · Apr 15, 2024

I think I misunderstood what https://github.com/rejuvyesh/MLFlowLogger.jl does and is capable of. I've always used a setup like 1) or 2) in https://www.mlflow.org/docs/latest/tracking.html#common-setups, but it looks like this library only supports 3)? Are there plans to add logging support without a tracking server running?

itan1 · Apr 17, 2024

Aha, I assumed that for setup 1 or 2 there would always be a tracking server running on localhost, but I see now that that does not make sense.

The last activity in MLFlowLogger.jl was 3 years ago, so I don't think there are any plans there. I'm personally mainly interested in setup 3 because I am collaborating with others on a Julia ML project, but I can invest a little time if needed.

I assume the way to go would be to add file logging functionality to MLFlowLogger.jl, similar to what was done for TensorBoardLogger.jl. Adding all file logging functionality is probably a larger effort, but I could work on a first version to at least support creating experiments, runs and log_metric().

What kind of roadmap do you envision to add MLFlow logging support in FluxTraining.jl?

ToucheSir · Apr 28, 2024

Finally had some time to look into this further. If we're already including Python, I wonder if it would help to use the more actively maintained https://github.com/JuliaAI/MLJFlow.jl or underlying https://github.com/JuliaAI/MLFlowClient.jl? If that doesn't sound appealing, we can continue with this approach.

I would also consider whether this could be implemented as a package extension. If you're comfortable with trying that, please do.

itan1 · Jun 4, 2024

Yes, in that case I would also use the underlying MLFlowClient.jl directly.

I like the idea of MLFlowBackend supporting both use cases: use MLFlowLogger.jl to log locally and MLFlowClient.jl to log to a remote MLFlow server.

I sketched an overview of such a design here. What do you think? (I would have to change MLFlowLogger.jl to a local logger, which makes sense since then it's similar to TensorBoardLogger.jl).

ToucheSir · Jun 28, 2024

That sounds good to me. I realize adding local logging support could be quite a bit of work though, and I hadn't realized you already got MLFlow CI working with rejuvyesh/MLFlowLogger.jl#5. So if the local logging part turns out to be too much of a hassle, I'd be ok continuing with the current PR setup and revisiting local logging once the MLFlow client library in question supports it.

added MLFlowBackend for LogMetrics

302f4fd

ToucheSir reviewed Apr 13, 2024

try whether adding sleep 5 after starting the mlflow server solves the mlflow connection error in the Windows CI

8b6280b

ToucheSir mentioned this pull request Sep 22, 2024

src/callbacks/logging/mlflow.jl: Add basic MLFlowBackend. #164

Open

itan1 commented Apr 9, 2024

ToucheSir Apr 13, 2024

itan1 Apr 15, 2024

ToucheSir Apr 15, 2024

itan1 Apr 17, 2024

ToucheSir Apr 28, 2024

itan1 Jun 4, 2024

ToucheSir Jun 28, 2024
