integrations: Evidently #699

Open
daavoo opened this issue Sep 4, 2023 · 19 comments
Labels: A: docs (Area: user documentation), A: frameworks (Area: ML Framework integration), hacktoberfest, p3-nice-to-have

Comments

@daavoo
Contributor

daavoo commented Sep 4, 2023

There is probably no need to build anything custom; an example using the Python API should be enough.

Example with MLFlow: https://docs.evidentlyai.com/integrations/evidently-and-mlflow

@daavoo daavoo added A: docs Area: user documentation A: frameworks Area: ML Framework integration labels Sep 4, 2023
@francesco086
Contributor

Hi @daavoo, I would be happy to contribute to this one.

From what I see, I agree that an example with DVC is sufficient. But this example should also appear on the Evidently docs website, right?

@daavoo
Contributor Author

daavoo commented Sep 28, 2023

> Hi @daavoo , would be happy to contribute to this one.
>
> From what I see, I agree that an example with dvc is sufficient. But this example should appear on the evidently doc webpage, right?

Yes.

I think it wouldn't hurt to have content both on dvc.org (i.e. in https://dvc.org/doc/user-guide/integrations/ml-frameworks) and on the Evidently website.

@shcheklein
Member

By the way, I think @mnrozhkov can share some materials on this.

@mnrozhkov
Contributor

I would be happy to help with this! There is an example with DVC pipelines + Evidently, but DVCLive can be used as well.

@francesco086
Contributor

Thank you! I will open a PR that adds a new evidently.md file here toward the end of next week. Once that is merged, I will try to get it onto the Evidently docs website too.

@francesco086
Contributor

Hi all, sorry for the delay; I was sick last week.

So, I picked this up, and what seems most meaningful to me is to mimic the Evidently page, using DVCLive instead of MLflow.
As soon as I started writing, I realized it would be useful to have something that others can reproduce.
That is why I ended up with a Colab notebook (I have barely used Colab before and I am not an enthusiast, but I think it serves the purpose here):
https://colab.research.google.com/drive/14usegPOSArF9tdO7NUOPndNk5vuUaYaX?usp=sharing

I think I could replicate everything, except that I cannot show all the steps at the end. Any hints?
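The approach being mimicked evaluates data drift for successive batches of data against a reference dataset. As a dependency-free sketch of that per-batch computation, the code below uses a two-sample Kolmogorov-Smirnov statistic with a fixed threshold as a stand-in for Evidently's drift tests. The names `ks_statistic` and `drift_share` and the 0.3 threshold are illustrative assumptions, not Evidently's API or defaults; with Evidently, these values would come from a drift report instead.

```python
from bisect import bisect_right


def ks_statistic(reference, current):
    """Two-sample Kolmogorov-Smirnov statistic: the largest gap between
    the empirical CDFs of the two samples."""
    ref, cur = sorted(reference), sorted(current)
    gap = 0.0
    # The supremum of |F_ref - F_cur| is attained at one of the sample points.
    for x in ref + cur:
        f_ref = bisect_right(ref, x) / len(ref)
        f_cur = bisect_right(cur, x) / len(cur)
        gap = max(gap, abs(f_ref - f_cur))
    return gap


def drift_share(reference, current, threshold=0.3):
    """Return (fraction of drifted columns, list of drifted column names).

    `reference` and `current` map column names to lists of numbers.
    Hypothetical stand-in for Evidently's per-column drift tests."""
    drifted = [
        name
        for name in reference
        if ks_statistic(reference[name], current[name]) > threshold
    ]
    return len(drifted) / len(reference), drifted


# Illustrative data: "humidity" shifts completely, "temp" does not.
reference = {"temp": [10, 12, 11, 13, 12], "humidity": [40, 42, 41, 43, 44]}
batch = {"temp": [10, 11, 12, 13, 12], "humidity": [70, 72, 75, 71, 73]}
share, drifted = drift_share(reference, batch)
print(share, drifted)  # 0.5 ['humidity']
```

Each batch yields one number per comparison, which is exactly what needs to be logged step by step (or experiment by experiment) with DVCLive.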

@dberenbaum
Collaborator

Awesome work @francesco086!

There are two possible ways to handle the steps here:

  1. In a single experiment (the way you did it). In this case, the table does not show every step, only the latest one. However, you can compare the steps using plots (try dvc plots show) or view them in the VS Code extension or in Studio.
  2. Make each comparison a separate experiment. This gives you each step as a separate row in the table, but the rows are not inherently grouped as steps.
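A minimal sketch of the two options, assuming per-batch drift shares have already been computed (for example, from an Evidently report). It relies only on DVCLive's `log_metric`, `log_param`, and `next_step` methods and on `Live` being usable as a context manager; `RecordingLive` is a hypothetical stand-in that records calls so the sketch runs without DVCLive installed. With the real library, option 1 would pass a single `dvclive.Live()` session, and option 2 would pass `Live` itself as the factory so that every batch gets its own session (assuming a DVCLive version that saves each session as a DVC experiment).

```python
class RecordingLive:
    """Hypothetical stand-in for dvclive.Live: it only records calls."""

    def __init__(self):
        self.step = 0
        self.metrics = []  # (step, name, value) tuples
        self.params = {}

    def __enter__(self):
        return self

    def __exit__(self, *exc):
        return False

    def log_metric(self, name, value):
        self.metrics.append((self.step, name, value))

    def log_param(self, name, value):
        self.params[name] = value

    def next_step(self):
        self.step += 1


def log_as_steps(live, shares):
    """Option 1: one experiment, one step per batch (compare via plots)."""
    for share in shares.values():
        live.log_metric("drift_share", share)
        live.next_step()


def log_as_experiments(make_live, shares):
    """Option 2: one session per batch, i.e. one table row per batch."""
    sessions = []
    for batch_name, share in shares.items():
        with make_live() as live:
            live.log_param("batch", batch_name)
            live.log_metric("drift_share", share)
        sessions.append(live)
    return sessions


shares = {"week_1": 0.0, "week_2": 0.5, "week_3": 0.75}  # illustrative values

single = RecordingLive()
log_as_steps(single, shares)
print(single.metrics)
# [(0, 'drift_share', 0.0), (1, 'drift_share', 0.5), (2, 'drift_share', 0.75)]

separate = log_as_experiments(RecordingLive, shares)
print([(s.params["batch"], s.metrics) for s in separate])
```

The trade-off matches the description above: option 1 keeps the batches grouped as steps of one run, while option 2 flattens them into independent rows.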

@francesco086
Copy link
Contributor

Thanks for the clarification @dberenbaum, I thought there was a way I was unaware of.

I then adjusted the notebook to show both approaches: everything in one experiment with plots, or multiple experiments visualized via dvc exp show. I think we can keep both, right?

OK, now that the formalities are done, let's get concrete. I can easily write a markdown page, but keeping it up to date after every change would be a pain... I think it would be easier to keep a notebook next to the markdown in the documentation and have a script that turns the notebook into a doc page. There are quite a few tools for this; perhaps you already use one. Or perhaps you think it is not a good idea and prefer plain markdown.
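A script of the kind described above can be quite small. The sketch below is a hypothetical helper, not the tooling dvc.org uses: it reads the notebook's JSON, which in nbformat v4 stores a list of `cells`, each with a `cell_type` and a `source` given as a string or a list of strings, and emits markdown cells verbatim and code cells as fenced blocks, dropping outputs. Dedicated tools such as `jupyter nbconvert --to markdown` also render outputs.

```python
import json

FENCE = "`" * 3  # built this way so the snippet can itself live in a markdown page


def cell_text(cell):
    """nbformat v4 stores `source` either as one string or a list of lines."""
    source = cell.get("source", "")
    return source if isinstance(source, str) else "".join(source)


def notebook_to_markdown(notebook):
    """Convert a parsed .ipynb (a dict) into markdown, dropping outputs."""
    language = (
        notebook.get("metadata", {}).get("language_info", {}).get("name", "python")
    )
    parts = []
    for cell in notebook.get("cells", []):
        text = cell_text(cell).strip("\n")
        if not text:
            continue
        if cell["cell_type"] == "markdown":
            parts.append(text)
        elif cell["cell_type"] == "code":
            parts.append(f"{FENCE}{language}\n{text}\n{FENCE}")
        # "raw" cells are skipped in this sketch
    return "\n\n".join(parts) + "\n"


def convert_file(ipynb_path, md_path):
    """Read a notebook from disk and write the markdown next to it."""
    with open(ipynb_path, encoding="utf-8") as f:
        notebook = json.load(f)
    with open(md_path, "w", encoding="utf-8") as f:
        f.write(notebook_to_markdown(notebook))


nb = {
    "cells": [
        {"cell_type": "markdown", "source": ["# DVCLive + Evidently\n"]},
        {"cell_type": "code", "source": "from dvclive import Live", "outputs": []},
    ],
    "metadata": {"language_info": {"name": "python"}},
}
print(notebook_to_markdown(nb))
```

Hand-written prose around the generated page would still need manual edits, which is part of the maintenance cost discussed in the reply.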

Could you please share with me what is the best way (technically) to turn this notebook into a doc page?

PS: of course I will do some final polishing on the notebook and get rid of the warnings :)

@dberenbaum
Collaborator

Thanks @francesco086! Looks great!

> I think it would be easier if one could keep a notebook next to the markdown in the documentation, and have a script or something that turns the notebook into a doc page.

It's a good idea, but we don't have the capacity right now to set this up in our existing docs framework. We have had many discussions about it, but for now the cost of updating manually is lower than the cost of changing our docs framework.

> Could you please share with me what is the best way (technically) to turn this notebook into a doc page?

It's fine to link the notebook from the docs and to include it as an example in dvclive itself, but either way we will want the basics in a markdown page here. See the guide for contributing.

Are you also planning to add something to the evidently docs?

@francesco086
Copy link
Contributor

Thanks @dberenbaum !

Yes, I plan to add the same in the evidently docs.

Alright, I did the final polishing on the notebook, and now I am going to prepare the markdown document (I will read the contributing guide first).

@francesco086
Contributor

Et voilà: iterative/dvc.org#4918

By the way, if it is not too much work, I would really appreciate it if you could add the hacktoberfest label to this PR, as I am participating.

@dberenbaum
Collaborator

@francesco086 I added the hacktoberfest label here. Is that all I need to do?

Thanks so much for the contribution!

@francesco086
Contributor

francesco086 commented Oct 17, 2023

It was a great pleasure, @dberenbaum!

Here are the instructions for Hacktoberfest: https://hacktoberfest.com/participation/#maintainers

I think what is missing is:

  • Add the “hacktoberfest” topic to your repository to opt-in to Hacktoberfest and indicate you’re looking for contributions.

I am waiting for these PRs to be merged before trying to "push" this to the Evidently docs website.

@dberenbaum
Collaborator

> Here the instructions for the hacktoberfest: https://hacktoberfest.com/participation/#maintainers
>
> I think what is missing is:
>
> * Add the “hacktoberfest” topic to your repository to opt-in to Hacktoberfest and indicate you’re looking for contributions.

Thanks @francesco086! I think this is already a topic for https://github.com/iterative/dvc.org.

@shcheklein Could you please help with this for dvclive since I'm not an admin on this repo?

@shcheklein
Member

Done! Thanks @francesco086

@francesco086
Contributor

I created an issue on the Evidently side.

@francesco086
Contributor

Et voilà!

https://docs.evidentlyai.com/integrations/evidently-and-dvclive

I think the issue can be closed 🤗

@dberenbaum
Collaborator

Thanks @francesco086!

I will keep this open to discuss one more thought (feel free to share yours here or not): evidentlyai/evidently#819 (comment) made a good point that this integration is specifically about DVCLive, while https://www.youtube.com/watch?v=qFnwZ653Aks discusses other ways to integrate DVC. Should we include a link to that video in https://dvc.org/doc/user-guide/integrations/evidently? @mnrozhkov any thoughts?

@mnrozhkov
Contributor

First and foremost, I'd like to express my gratitude for the incredible work done by @francesco086! I apologize for not responding directly within the issue thread earlier; I needed time to reflect on the best way to articulate my thoughts.

I've been closely examining the DVCLive + Evidently integration put forward by @francesco086 , and I appreciate the thoughtful approach taken. The concept of data monitoring during model training is intriguing, although its current implementation as part of DVC metrics and plots might not fully capture the essence of its potential value.

There are a few ideas on the next steps:

  1. While DVC plots provide a certain level of insight, the richness and interactivity of Evidently's HTML reports are unmatched for a deeper understanding, especially when investigating detected drift. The clarity and depth offered by Evidently could significantly improve the way we interpret our models' performance. So, we could use DVC to store and version the HTML reports (when drift is detected).

  2. Generating and versioning a reference dataset alongside the other artifacts of the ML pipeline could be highly advantageous.
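A minimal sketch of the first idea, assuming a drift flag has already been computed for each batch. The helper `save_report_if_drifted` is hypothetical: it writes the HTML report only when drift is detected and then hands the file to DVCLive's `log_artifact` so DVC can version it with the experiment. The `write_html` argument is any callable that writes HTML to a path; with Evidently this could be a report's `save_html` method (an assumption about how the pieces would be wired together). `RecordingLive` and `fake_report_writer` are stand-ins so the sketch runs without either library.

```python
import os
import tempfile


class RecordingLive:
    """Hypothetical stand-in for dvclive.Live that records logged artifacts."""

    def __init__(self):
        self.artifacts = []

    def log_artifact(self, path):
        self.artifacts.append(path)


def save_report_if_drifted(live, drift_detected, write_html, path):
    """Write and log the HTML report only when drift was detected."""
    if not drift_detected:
        return None
    write_html(path)
    live.log_artifact(path)  # with real DVCLive, the file is versioned with DVC
    return path


def fake_report_writer(path):
    # Stand-in for an Evidently report's save_html(path).
    with open(path, "w", encoding="utf-8") as f:
        f.write("<html><body>drift report</body></html>")


out_dir = tempfile.mkdtemp()
live = RecordingLive()
for batch, drifted in [("week_1", False), ("week_2", True)]:  # illustrative flags
    save_report_if_drifted(
        live, drifted, fake_report_writer, os.path.join(out_dir, f"drift_{batch}.html")
    )
print([os.path.basename(p) for p in live.artifacts])  # ['drift_week_2.html']
```

Only saving reports for drifted batches keeps the versioned artifacts small while still preserving the full interactive report where it matters.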

I'm excited about the dialogue this integration has opened up, and I see it as a fantastic starting point for continued innovation. The potential for what DVCLive could incorporate and represent in the future is truly inspiring.


5 participants