forked from dottxt-ai/outlines
Update branch #2 (Open)

LouisHernandez17 wants to merge 161 commits into craft-ai:probabilities from dottxt-ai:main.
Conversation
…eration (#1012) I've added an abridged version of this [post on the .txt blog](https://blog.dottxt.co/coding-for-structured-generation.html) to the cookbook. It should provide a good overview of a basic workflow for developing code when working with structured generation.
Co-authored-by: Patrice Bechard <[email protected]>
…ration) (#1039) As [discussed in our Discord server](https://discord.com/channels/1182316225284554793/1182317446225481788/1261998326077984802), this PR adds support for custom regex parsers. It doesn't change the behavior of Outlines by default, but it allows us to write custom `Guide` classes that use custom regex parsers, e.g. for multimodal generation. Also improves documentation.
As requested by @rlouf this PR adds a question answering with citations example to the Cookbook using llama-cpp-python.
Add a fallback tokenizer for when tiktoken cannot get an encoding from the model name, to support LLM services that expose an OpenAI-compatible API, such as Ollama.
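The pattern behind this fix can be sketched as follows. This is a minimal illustration, not the PR's actual code: in real tiktoken, `tiktoken.encoding_for_model(name)` raises `KeyError` for unknown model names (as happens with Ollama-served models) and the fallback would call `tiktoken.get_encoding(...)` with a default; here a plain dict stands in for tiktoken's model registry.

```python
# Hypothetical sketch of the fallback-tokenizer pattern.
# KNOWN_ENCODINGS stands in for tiktoken's model-name registry;
# DEFAULT_ENCODING is the encoding to fall back to for unknown models.
KNOWN_ENCODINGS = {"gpt-4": "cl100k_base", "gpt-3.5-turbo": "cl100k_base"}
DEFAULT_ENCODING = "cl100k_base"

def encoding_for_model(model_name: str) -> str:
    """Return the encoding name for a model, falling back to a default
    instead of raising when the model is unknown (e.g. served via an
    OpenAI-compatible API such as Ollama)."""
    try:
        return KNOWN_ENCODINGS[model_name]   # mirrors tiktoken.encoding_for_model
    except KeyError:
        return DEFAULT_ENCODING              # mirrors tiktoken.get_encoding(...)

print(encoding_for_model("gpt-4"))    # cl100k_base
print(encoding_for_model("llama3"))   # cl100k_base (fallback path)
```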
- Correct link for llama-cpp-python
- Add installation instructions for llama-cpp-python
- Correct first question-answer
Rendered Docs: https://github.com/lapp0/outlines/blob/multimodal-models/docs/reference/models/transformers_vision.md

- Fixes #787
- Fixes #662

Changes:
- Introduce `models.transformers_vision`, which subclasses `models.transformers` and overrides its behavior so it applies `AutoProcessor` instead of `AutoTokenizer`, handling both the text AND `PIL.Images` media.
- Introduce `VisionSequenceGeneratorAdapter`, handling and validating the `media` argument.
- Update `outlines.generate` to dispatch `TransformersVision` models to `VisionSequenceGeneratorAdapter`.

Tests:
- `tests/generate/test_api.py`: test `prompt` / `media` validation.
- `tests/generate/test_generate.py`:
  - Add a `model_transformers_vision` fixture. **Tests pass locally, but are disabled because a model small enough for CI isn't available.**
  - Test all `outlines.generate` generators to ensure dispatch to this new sequence generator is handled correctly.
The `memory=` parameter is deprecated in favor of `size=`. See https://modal.com/docs/reference/changelog#062174-2024-05-17. The current doc example produces the following error:

```
/path/test_modal.py:56: DeprecationError: 2024-05-16: The `memory` parameter is deprecated. Use the `size='80GB'` parameter instead.
  @app.function(image=outlines_image, gpu=gpu.A100(memory=80))
```
It seems Modal deletes environment variables, which makes `outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.2")` fail even after login. This workaround instructs the user to manually add a key before importing the model. Fixes #1024
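The kind of workaround described here can be sketched as follows. This is an assumption-laden illustration, not the PR's actual text: the variable name `HF_TOKEN` (the one Hugging Face libraries read) and the placeholder value are mine.

```python
import os

# Hypothetical sketch of the workaround: set the Hugging Face token
# explicitly before loading the model, since environment variables from
# the host may not survive inside the remote container.
os.environ["HF_TOKEN"] = "hf_your_token_here"  # placeholder, not a real token

# The gated model load would then see the token, e.g.:
# import outlines
# model = outlines.models.transformers("mistralai/Mistral-7B-Instruct-v0.2")
print(os.environ["HF_TOKEN"])
```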
Add links to the two examples:
- Q&A with Citations
- Knowledge Graph Generation
Hi, thank you for this great library! It seems that the docstrings are not rendered correctly in the docs. I think we should explicitly set the `docstring_style`, because [it defaults to `"google"`](https://mkdocstrings.github.io/python/usage/configuration/docstrings/#docstring_style) but outlines uses numpy.

Before: ![Screenshot 2024-12-16 at 23 00 26](https://github.com/user-attachments/assets/c752ee3d-519e-4098-b943-3aab43c8af25)

After: ![Screenshot 2024-12-16 at 23 00 41](https://github.com/user-attachments/assets/5b5f524b-6921-4dbe-994d-72c079e677bc)

There seem to be other issues in the docstrings:
- for example, [`Properties`](https://github.com/dottxt-ai/outlines/blob/main/outlines/models/openai.py#L23) should be [`Attributes`](https://numpydoc.readthedocs.io/en/latest/format.html#parameters)
- only the openai and transformers models are present in the [api reference](https://github.com/dottxt-ai/outlines/blob/main/docs/api/models.md)

I'm happy to make follow-up PRs for those. Please let me know if I missed something; I couldn't find related issues/PRs.
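The fix described above amounts to setting the docstring style in the mkdocstrings handler options. A minimal sketch of the relevant `mkdocs.yml` fragment (the surrounding plugin keys are assumed, not taken from this PR):

```yaml
plugins:
  - mkdocstrings:
      handlers:
        python:
          options:
            docstring_style: numpy  # default is "google"; outlines uses numpy
```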
Use `is` and `is not` for exact type comparisons, or `isinstance()` for instance checks.
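The distinction this lint rule enforces can be shown in a few lines. A minimal sketch (the `MyInt` class is a made-up example):

```python
# Illustration of the lint rule: compare types with `is`/`is not`,
# or use isinstance() when subclasses should also match.
class MyInt(int):
    pass

x = MyInt(3)

# Equality comparison on types is what linters flag:
print(type(x) == int)      # False, and discouraged style

# `is` expresses exact-type identity:
print(type(x) is int)      # False: x is a MyInt, not exactly an int

# isinstance() accepts subclasses:
print(isinstance(x, int))  # True
```

In short, `type(x) is int` is a strict check that excludes subclasses, while `isinstance(x, int)` follows the inheritance hierarchy; the lint rule only asks that you pick one of these instead of `==` on types.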
The old library structure in the docs has not been updated to reflect the present one.
Before this commit, running `pytest -k specific_test` spawned dozens of identical skipped-test warning messages on stdout. IMHO, that was not ideal ^^ Bug introduced in d32dfde
Allow giving custom filters to the prompt decorator:
```
def reverse(s: str) -> str:
    return s[::-1]

@prompt(filters={'reverse': reverse})
def reverse_prompt(text):
    '''{{ text | reverse }}'''

prompt = reverse_prompt("Hello")
print(prompt)
>>> "olleH"
```
There's an extra `outlines.generate` row in the feature matrix docs. This removes it. I also modified the markdown syntax for one header to use ** rather than __, consistent with the rest of the table.
We have been noticing the following error with a recent version of outlines when used with MLX:

```
TypeError: argument 'token_id': 'float' object cannot be interpreted as an integer
At:
  /.../outlines_core/fsm/guide.py(294): get_next_state
  /.../outlines/processors/structured.py(101): process_logits
  /.../outlines/processors/base_logits_processor.py(90): __call__
```

The issue is that the MLX array of tokens, which are integers, is being force-converted to floats, even though outlines expects an integer array. This happens because all MLX arrays are converted to `float32`, even when that isn't appropriate, as in this case. Looking at the [commented link](https://ml-explore.github.io/mlx/build/html/usage/numpy.html#pytorch), the advice was to convert to `float32` only for `bfloat16`, because numpy does not support `bfloat16`. Now the MLX `_to_torch` implementation matches the other array libraries: none of the other libraries are force-cast to float.
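The logic of the fix can be sketched in NumPy terms. This is a hypothetical illustration of the dtype rule, not the actual `_to_torch` code: only `bfloat16` (which NumPy cannot represent) needs a `float32` round-trip, while every other dtype, notably integer token ids, should pass through unchanged.

```python
import numpy as np

# Hypothetical sketch: dtype_name stands in for the source framework's
# dtype, since NumPy itself has no bfloat16 to demonstrate with.
def to_numpy_preserving_dtype(values, dtype_name: str) -> np.ndarray:
    if dtype_name == "bfloat16":
        # NumPy does not support bfloat16, so upcast only in this case.
        return np.asarray(values, dtype=np.float32)
    # All other dtypes pass through; integer token ids stay integers.
    return np.asarray(values)

tokens = to_numpy_preserving_dtype([1, 2, 3], "int32")
print(tokens.dtype.kind)  # 'i': still an integer array, usable as token ids
```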
The existing README has underwhelming or incorrect results (Example is underwhelming #1347) due to the lack of templating for instruct models. This adds special tokens to each instruct-model call, and provides comments on how to obtain/produce special tokens. --------- Co-authored-by: Victoria Terenina <[email protected]>
Also add instructions about different outlines "flavors"! Co-authored-by: Cameron Pfiffer <[email protected]>