[DOCS] Create a tutorial about using SpanMarker and Argilla #4086

Closed
sdiazlor opened this issue Oct 30, 2023 · 14 comments
Assignees
Labels
team: interns Indicates that the issue or pull request is owned by the machine learning interns team type: documentation Improvements or additions to documentation

Comments

@sdiazlor
Contributor

Which page or section is this issue related to?

This issue proposes creating a tutorial on how to use SpanMarker and Argilla for NER.

@sdiazlor added the "type: documentation" label on Oct 30, 2023
@davidberenstein1957 added the "team: interns" label on Oct 30, 2023
@davidberenstein1957
Member

@davidberenstein1957 changed the title from "[DOCS] Create a tutorial about using SpanMaker and Argilla" to "[DOCS] Create a tutorial about using SpanMarker and Argilla" on Nov 20, 2023
@davidberenstein1957
Member

@davidberenstein1957
Member

@davidberenstein1957 self-assigned this on Dec 11, 2023
@Rami-Ismael

Hi there! I'm Rami Ismael, the individual behind the GitHub issues initiative as discussed here. I'm currently enjoying my winter break and have some free time on my hands. I'm keen on offering my assistance to help finalize the documentation. Would that be possible?

@nataliaElv
Member

Perhaps this tutorial will be more useful once we release the Spans Question for Feedback Datasets? Otherwise it will be outdated quite soon.

@ceteri

ceteri commented Feb 14, 2024

We're building this https://github.com/DerwenAI/textgraphs which leverages SpanMarker and other LLM-based tasks in KG construction ... and if you notice the "report" this project has a very large Argilla-shaped puzzle piece missing in its center (why we needed the gradients for extracted entity and relation streams). I'd like to offer help on the SpanMarker + Argilla tutorial too.

@louisguitton
Contributor

louisguitton commented Mar 1, 2024

I'd also like to offer help on this tutorial, whether on designing it, writing it or maintaining it.

My notes on writing a tutorial:

  • the end of the tutorial must be meaningful and achievable for a beginner
  • having done the tutorial, the reader is in a position to make sense of the rest of the documentation and of Argilla itself
  • objective: turn learners into users; get the learner started on their Argilla journey, not to their destination
  • tutorials need to be useful for the beginner, easy to follow, meaningful, extremely robust, and kept up to date
  • build from the simplest tools or operations to the most complex
  • be concrete and specific; don't explain anything the learner doesn't need to know to complete the tutorial (e.g. Argilla telemetry)
  • Note that it doesn’t tell you what you will learn, just what you will do. The learning comes out of that doing.

Proposed promise for this Argilla + SpanMarker tutorial

If you have the basic knowledge required to follow this tutorial (e.g. spaCy?) and you follow its directions, you will end up with a working Argilla Server, complete with a FeedbackDataset with Span Categorization Questions and NER label Suggestions machine-generated by SpanMarker, ready for Annotators to add Responses. Advanced readers will be able to add Metadata or Vectors.

What do you think? What would be a good amount of required knowledge?

And we are waiting for Span Categorization to be released in the FeedbackDataset, right? Or did I miss this going live?
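
To make the proposed promise a bit more concrete, here is a minimal sketch of the glue step between the two libraries: turning SpanMarker-style predictions into span suggestions that could be attached to a record. It is an illustration under assumptions, not the tutorial's code. The prediction keys (`label`, `score`, `char_start_index`, `char_end_index`) follow what SpanMarker's `predict` is understood to return, the predictions are hard-coded rather than produced by a model, and `to_span_suggestions` and the output dict shape are hypothetical names that would still need mapping onto Argilla's own span value type.

```python
from typing import Any


def to_span_suggestions(
    text: str,
    predictions: list[dict[str, Any]],
    min_score: float = 0.0,
) -> list[dict[str, Any]]:
    """Convert SpanMarker-style predictions into plain span dicts.

    Hypothetical helper: the input keys are assumed to match SpanMarker's
    prediction dicts; the output shape is illustrative only.
    """
    spans = []
    for pred in predictions:
        start = pred["char_start_index"]
        end = pred["char_end_index"]
        # Skip low-confidence predictions.
        if pred["score"] < min_score:
            continue
        # Skip spans that do not fit inside the text.
        if not (0 <= start < end <= len(text)):
            continue
        spans.append(
            {
                "start": start,
                "end": end,
                "label": pred["label"],
                "score": float(pred["score"]),
            }
        )
    # Sort by position so annotators see suggestions in reading order.
    spans.sort(key=lambda s: (s["start"], s["end"]))
    return spans


# Hard-coded stand-in for model output (assumed shape, made-up scores).
text = "Lionel Messi joined Inter Miami in 2023."
predictions = [
    {"span": "Inter Miami", "label": "ORG", "score": 0.91,
     "char_start_index": 20, "char_end_index": 31},
    {"span": "Lionel Messi", "label": "PER", "score": 0.98,
     "char_start_index": 0, "char_end_index": 12},
]

suggestions = to_span_suggestions(text, predictions)
for s in suggestions:
    print(s["label"], repr(text[s["start"]:s["end"]]), s["score"])
```

Keeping this conversion as a plain function with no library imports means it can be tested without a running Argilla Server or a downloaded model, which fits the "extremely robust" goal in the notes above.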

@nataliaElv
Member

Hi @louisguitton! Yes, we're working on releasing a Spans question for Feedback datasets, and once that's out, we can start working on the tutorial. I think it would be highly beneficial for the adoption of this feature if the tutorial were published soon after the release.

I'll leave it to @davidberenstein1957 and @sdiazlor to tell you if they need any help with this one or if it's something they prefer to do internally.

@dvsrepo
Member

dvsrepo commented Mar 1, 2024

Very cool notes about what a tutorial should be, @louisguitton. Fully agree!

We used to use tutorials more like blog posts, to promote and introduce Argilla to new users on social media, but that has created a bit of a mismatch now. We'll take this into account for version 2.0 of the docs (not started yet, but planned).

@sdiazlor
Contributor Author

sdiazlor commented Mar 5, 2024

Hi, @louisguitton. Thanks for your notes! Any feedback is always welcome. Feel free to work on this tutorial, and let us know if you have any questions.

@dvsrepo
Member

dvsrepo commented Mar 5, 2024

Let's wait for the new SpanQuestion.


This issue is stale because it has been open for 90 days with no activity.

@github-actions bot added the "status: stale" label on Jun 20, 2024
@louisguitton
Contributor

Since we started this discussion, SpanQuestion has been released.

The football news dataset, the code snippets I contributed in the talk, and the structure of the talk can all be used to create a tutorial.
A part 2 of the talk was also discussed, to address some of the parts I didn't have time to cover: training a model, using weak supervision with skweak, doing KG construction with the entities found, etc.

The scoping exercise for NER (i.e. splitting the work into parts and making sure we deliver small, incremental value) is key, I think, so any input from user feedback, customer needs, or product vision is welcome to help prioritise.

@github-actions bot removed the "status: stale" label on Jun 23, 2024

9 participants