feat: Tool dataclass - unified abstraction to represent tools #8652

Merged: anakin87 merged 20 commits into main from tool-dataclass on Dec 18, 2024

anakin87 (Member) commented on Dec 17, 2024

Related Issues

Proposed Changes:

How did you test it?

CI, several new tests

Notes for the reviewer

DO NOT MERGE
This PR is currently based on the new-chatmessage branch, to allow me to create other PRs for Chat Generators, which require both new ChatMessage and Tool.

Checklist

  • I have read the contributors guidelines and the code of conduct
  • I have updated the related issue with new insights and changes
  • I added unit tests and updated the docstrings
  • I've used one of the conventional commit types for my PR title: fix:, feat:, build:, chore:, ci:, docs:, style:, refactor:, perf:, test: and added ! in case the PR includes breaking changes.
  • I documented my code
  • I ran pre-commit hooks and fixed any issue

coveralls (Collaborator) commented on Dec 17, 2024

Pull Request Test Coverage Report for Build 12391961949

Details

  • 0 of 0 changed or added relevant lines in 0 files are covered.
  • No unchanged relevant lines lost coverage.
  • Overall coverage increased (+0.09%) to 90.653%

Totals Coverage Status
Change from base Build 12376759419: 0.09%
Covered Lines: 8292
Relevant Lines: 9147

💛 - Coveralls

@anakin87 anakin87 changed the title feat: Tool dataclass feat: Tool dataclass Dec 17, 2024
@anakin87 anakin87 changed the title from "feat: Tool dataclass" to "feat: Tool dataclass - unified abstraction to represent tools" Dec 17, 2024
@anakin87 anakin87 marked this pull request as ready for review December 17, 2024 14:30
@anakin87 anakin87 requested review from a team as code owners December 17, 2024 14:30
@anakin87 anakin87 requested review from dfokina and vblagoje and removed request for a team December 17, 2024 14:30
anakin87 (Member, Author) commented:

@dfokina take a look at this when possible

@anakin87 anakin87 changed the base branch from main to new-chatmessage December 17, 2024 15:03
Base automatically changed from new-chatmessage to main December 17, 2024 16:02
vblagoje (Member) left a comment

Only one comment - should we undermine UX/DX for agents and tools over 80kb binary? :-)

    function: Callable

    def __post_init__(self):
        jsonschema_import.check()
Member

Knowing that Tool is a central piece of the Agents push, should we make this dependency a default? It's an 80 KB binary.

Member Author

I'm not totally sure. Let's also involve @julian-risch in the decision.

vblagoje (Member) commented on Dec 18, 2024

It doesn't seem to occur on 3.9 and later, but OK, let's get this integrated and then we can experiment with including it as a default dependency. Or not.
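
For readers following the thread above, here is a minimal sketch of the pattern under discussion: a dataclass whose `__post_init__` checks for an optional dependency and fails with an install hint if it is missing. This is an illustration only, not the code from this PR. `ToolSketch`, `check_optional_dependency`, and the `schema_module` field are hypothetical names, and the check uses the standard library's `importlib.util.find_spec` rather than the `jsonschema_import` object seen in the diff.

```python
import importlib.util
from dataclasses import dataclass
from typing import Any, Callable, Dict


def check_optional_dependency(module_name: str, install_hint: str) -> None:
    """Raise ImportError with an install hint if a top-level module is not importable."""
    if importlib.util.find_spec(module_name) is None:
        raise ImportError(f"Could not import '{module_name}'. Run '{install_hint}'.")


@dataclass
class ToolSketch:
    """Hypothetical stand-in for a tool abstraction; not the class added in this PR."""

    name: str
    description: str
    parameters: Dict[str, Any]
    function: Callable
    # Assumption for illustration: the optional module is configurable so the
    # sketch can be exercised even where jsonschema is not installed.
    schema_module: str = "jsonschema"

    def __post_init__(self):
        # Fail at construction time if the optional dependency is absent.
        check_optional_dependency(self.schema_module, f"pip install {self.schema_module}")

    def invoke(self, **kwargs: Any) -> Any:
        """Call the wrapped function with keyword arguments."""
        return self.function(**kwargs)


# Usage: "json" (always available) stands in for the optional dependency here.
add_tool = ToolSketch(
    name="add",
    description="Add two integers",
    parameters={"type": "object", "properties": {"a": {"type": "integer"}, "b": {"type": "integer"}}},
    function=lambda a, b: a + b,
    schema_module="json",
)
result = add_tool.invoke(a=1, b=2)
```

The trade-off raised in the review is visible in this shape: with an optional dependency, every construction pays for a check and users may hit an ImportError; making the package a default dependency would remove the check at the cost of a slightly larger install.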

@vblagoje vblagoje self-requested a review December 18, 2024 09:14
dfokina (Contributor) left a comment

I capitalized all "Tools" in the docstrings so that it looks unified :)

@anakin87 anakin87 enabled auto-merge (squash) December 18, 2024 11:28
@anakin87 anakin87 merged commit 96b4a1d into main Dec 18, 2024
20 checks passed
@anakin87 anakin87 deleted the tool-dataclass branch December 18, 2024 11:36
davidsbatista pushed a commit that referenced this pull request Dec 19, 2024
* draft

* del HF token in tests

* adaptations

* progress

* fix type

* import sorting

* more control on deserialization

* release note

* improvements

* support name field

* fix chatpromptbuilder test

* port Tool from experimental

* release note

* docs upd

* Update tool.py

---------

Co-authored-by: Daria Fokina <[email protected]>
davidsbatista added a commit that referenced this pull request Jan 10, 2025
…8605)

* initial import

* adding initial version + tests

* adding more tests

* more tests

* incorporating SentenceSplitter based on NLTK

* adding more tests

* adding release notes

* adding LICENSE header

* removing unused imports

* fixing example docstring

* addding docstrings

* fixing tests and returning a dictionary

* updating release notes

* attending PR comments

* Update haystack/components/preprocessors/recursive_splitter.py

Co-authored-by: Sebastian Husch Lee <[email protected]>

* wip: updating tests for split_idx_start and _split_overlap

* adding tests for split_idx and split_start and overlaps

* adjusting file for LICENSE checking

* adding more tests

* adding tests for page numbering

* adding tests for min split lenghts and falling back to character-level chunking based on size

* fixing linting issue

* Update haystack/components/preprocessors/recursive_splitter.py

Co-authored-by: Sebastian Husch Lee <[email protected]>

* Update haystack/components/preprocessors/recursive_splitter.py

Co-authored-by: Sebastian Husch Lee <[email protected]>

* Update haystack/components/preprocessors/recursive_splitter.py

Co-authored-by: Sebastian Husch Lee <[email protected]>

* Update haystack/components/preprocessors/recursive_splitter.py

Co-authored-by: Sebastian Husch Lee <[email protected]>

* Update haystack/components/preprocessors/recursive_splitter.py

Co-authored-by: Sebastian Husch Lee <[email protected]>

* Update haystack/components/preprocessors/recursive_splitter.py

Co-authored-by: Sebastian Husch Lee <[email protected]>

* Update haystack/components/preprocessors/recursive_splitter.py

Co-authored-by: Sebastian Husch Lee <[email protected]>

* Update haystack/components/preprocessors/recursive_splitter.py

Co-authored-by: Sebastian Husch Lee <[email protected]>

* wip

* wip

* updating tests

* wip: fixing all tests after changes

* more tests

* wip: debugging sentence overlap

* wip: debugging page number

* wip

* wip; fixed bug with sentence tokenizer, needs to keep white spaces

* adding tests for counting pages on different split approaches

* NLTK checks done on SentenceSplitter

* fixing types

* adding detecting for full overlap with previous chunks

* fixing types

* improving docstring

* improving docstring

* adding custom lenght, 'character' use case

* customising overlap function for word and adding a few tests

* updating docstring

* Update haystack/components/preprocessors/recursive_splitter.py

Co-authored-by: Sebastian Husch Lee <[email protected]>

* Update haystack/components/preprocessors/recursive_splitter.py

Co-authored-by: Sebastian Husch Lee <[email protected]>

* Update haystack/components/preprocessors/recursive_splitter.py

Co-authored-by: Sebastian Husch Lee <[email protected]>

* wip: adding more tests for word unit length

* fix

* feat: `Tool` dataclass - unified abstraction to represent tools (#8652)

* draft

* del HF token in tests

* adaptations

* progress

* fix type

* import sorting

* more control on deserialization

* release note

* improvements

* support name field

* fix chatpromptbuilder test

* port Tool from experimental

* release note

* docs upd

* Update tool.py

---------

Co-authored-by: Daria Fokina <[email protected]>

* fix: fix deserialization issues in multi-threading environments (#8651)

* adding 'word' as default length

* fixing types

* handing both default strategies

* wip

* \f was not being counted properly

* updating tests

* fixing the overlap bug

* adding more tests

* refactoring _apply_overlap

* further refactoring

* Update haystack/components/preprocessors/recursive_splitter.py

Co-authored-by: Sebastian Husch Lee <[email protected]>

* Update haystack/components/preprocessors/recursive_splitter.py

Co-authored-by: Sebastian Husch Lee <[email protected]>

* Update haystack/components/preprocessors/recursive_splitter.py

Co-authored-by: Sebastian Husch Lee <[email protected]>

* Update haystack/components/preprocessors/recursive_splitter.py

Co-authored-by: Sebastian Husch Lee <[email protected]>

* adding ticks to close code block

* fixing comments

* applying changes: split with space and force keep_white_spaces=True

* fixing some tests and replacing count words approach in more places

* keep_white_spaces = True only if not defined

* cleaning docs

* handling some more edge cases, when split is still too big and all separators ran

* fixing fallback whitespaces count to fixed word/char split based on split size

* cleaning

---------

Co-authored-by: Sebastian Husch Lee <[email protected]>
Co-authored-by: Stefano Fiorucci <[email protected]>
Co-authored-by: Daria Fokina <[email protected]>
Co-authored-by: Tobias Wochinger <[email protected]>

4 participants