
[WIP] Structured model state #929

Open · wants to merge 15 commits into base: main
Conversation

@hudson-ai (Collaborator) commented Jun 25, 2024

This PR primarily replaces the implementation of model state (Model._state) with a list of Python objects. Currently, state is a str with embedded "tags" that represent special embedded objects, e.g. HTML for formatting, images for multimodal models, etc.

On the one hand, the existing implementation is "nice" because it allows adding these special objects as strings inside of normal (and even stateless) guidance functions without having to do anything "dirty" like reach into private methods on Model -- everything is string concatenation.

On the other hand, this means that this string has to be "parsed" to produce actual prompts (usually straightforwardly via re.sub), introspect on token probabilities, or extract image data from tags. This feels fragile, as a model could easily produce strings that correspond to our tags and blow everything up. Furthermore, if we ever have a guidance function that produces "special" strings with formatting, etc. inside of a select block, we're actually asking the model to produce this special structure instead of just the actual textual content.
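To make the contrast concrete, here is a minimal sketch (the class names are purely illustrative, not the PR's actual API): with tag-embedded strings, model output that happens to contain a tag is indistinguishable from a real control sequence, whereas a list of typed objects keeps content and control apart by construction.

```python
from dataclasses import dataclass

# Hypothetical state objects, purely to illustrate the idea --
# these are not the PR's actual class names.
@dataclass(frozen=True)
class Text:
    content: str

@dataclass(frozen=True)
class Image:
    data: bytes

# Tag-based state: an image lives inside the string as a special tag,
# so a model that happens to emit the tag text corrupts the state.
tag_state = "Here is a picture: <|_image:abc123|>"

# Structured state: content and control are distinct types, so model
# output (always Text) can never masquerade as an Image.
structured_state = [Text("Here is a picture: "), Image(b"abc123")]

# Producing the textual prompt becomes type dispatch, not string parsing.
prompt = "".join(o.content for o in structured_state if isinstance(o, Text))
```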

A few TODOs:

  • Rethink formatting-only context blocks like silent and monospace
  • See if this approach simplifies the implementation of multimodal models (@nking-1)
  • See if this approach simplifies chat engines
    • e.g. have a method on ModelState that turns it into a list of {"role": "user", "message": ...} objects
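The last TODO above -- turning state into chat-API message dicts -- could be sketched roughly like this (again with hypothetical object names, assuming the flat-list representation from this PR):

```python
from dataclasses import dataclass

# Hypothetical state objects; names are illustrative, not the PR's API.
@dataclass(frozen=True)
class RoleOpener:
    role: str

@dataclass(frozen=True)
class RoleCloser:
    role: str

@dataclass(frozen=True)
class Text:
    content: str

def to_chat_messages(objects):
    """Fold a flat list of state objects into chat-API message dicts."""
    messages = []
    current_role = None
    buffer = []
    for obj in objects:
        if isinstance(obj, RoleOpener):
            current_role = obj.role
            buffer = []
        elif isinstance(obj, RoleCloser):
            messages.append({"role": current_role, "message": "".join(buffer)})
            current_role = None
        elif isinstance(obj, Text) and current_role is not None:
            buffer.append(obj.content)
    return messages

state = [RoleOpener("user"), Text("Hi!"), RoleCloser("user"),
         RoleOpener("assistant"), Text("Hello."), RoleCloser("assistant")]
# to_chat_messages(state) == [{"role": "user", "message": "Hi!"},
#                             {"role": "assistant", "message": "Hello."}]
```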

Thank you @nking-1 for the help and feedback on this so far!
Note that this train of thought originated from the discussion on PR #905 for adding terminal colors. This isn't technically a blocker for that PR, but adding structured state will make that PR trivial :)

@hudson-ai hudson-ai self-assigned this Jun 25, 2024
@@ -966,10 +966,6 @@ def _re_with_temperature(grammar, temperature, visited_set):
# return ModelVariable(name)


def active_role_end() -> ModelVariable:
Collaborator Author:

The current implementation of active_role_end will actually produce the closing tag twice, since it is essentially a stateless grammar and can't pop ContextBlocks off of a model. Furthermore, generating active_role_end requires the model to produce special tags which will be hidden from future prompts. A better implementation may exist, but I'm not quite sure what it looks like yet, so I'm deleting it for now.

Comment on lines +989 to +990
# Import in function to guard against circular import
from ..library._role import RoleBlock
Collaborator Author:

I have a feeling that some may object to the import inside of a function... Is anyone else better at resolving issues with circular imports?
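For reference, one common alternative is the `typing.TYPE_CHECKING` guard -- though it only helps when the imported name is needed purely for type annotations, not at runtime (e.g. not for isinstance checks, in which case the function-level import remains the usual fix). A sketch, with a hypothetical import path:

```python
# Sketch of the TYPE_CHECKING pattern for breaking import cycles when a
# name is needed only for annotations. The import path is hypothetical.
from __future__ import annotations

from typing import TYPE_CHECKING

if TYPE_CHECKING:
    # Evaluated only by static type checkers, never at runtime,
    # so it cannot participate in an import cycle.
    from guidance.library._role import RoleBlock

def describe(block: RoleBlock) -> str:
    # With `from __future__ import annotations`, the annotation above is
    # stored as a string and never evaluated at runtime, so the skipped
    # import is harmless here.
    return f"block: {block!r}"
```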

Collaborator:

Well, we haven't tried turning on ruff or flake8 yet...

Comment on lines +10 to +14
def _html(self) -> str:
raise NotImplementedError

def __str__(self) -> str:
raise NotImplementedError
Collaborator Author:

Currently undecided as to whether these should be methods, or whether these should just be data containers with external "Handler" classes that produce HTML, strings for prompts, pretty ANSI codes for terminal output, etc.
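The "external Handler" option might look something like this (hypothetical names throughout -- a sketch of the separation, not a proposed API):

```python
from dataclasses import dataclass

# Hypothetical data-only state objects (illustrative names).
@dataclass(frozen=True)
class Text:
    content: str

@dataclass(frozen=True)
class RoleOpener:
    role: str

class HtmlHandler:
    """All HTML display logic lives here, not on the data classes."""
    def render(self, obj) -> str:
        if isinstance(obj, Text):
            return obj.content
        if isinstance(obj, RoleOpener):
            return f"<div class='role'>{obj.role}</div>"
        raise TypeError(f"unhandled state object: {obj!r}")

class PromptHandler:
    """Renders the raw prompt; role markers are handled elsewhere."""
    def render(self, obj) -> str:
        if isinstance(obj, Text):
            return obj.content
        if isinstance(obj, RoleOpener):
            return ""
        raise TypeError(f"unhandled state object: {obj!r}")

state = [RoleOpener("user"), Text("hi")]
html = "".join(HtmlHandler().render(o) for o in state)
prompt = "".join(PromptHandler().render(o) for o in state)
```

One upside of this split: adding a new output target (e.g. ANSI terminal colors, as in PR #905) means adding a handler, not touching the data classes.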

Collaborator:

I'd definitely separate model state from how it gets displayed.

@riedgar-ms (Collaborator) left a comment:

I think the basic idea of this is great.



@dataclass(frozen=True, slots=True)
class Object:
Collaborator:

We might want a better name....

Collaborator Author:

Definitely -- please consider this a sketch 😆

__slots__ = ["objects"]

def __init__(self) -> None:
self.objects: list[Object] = []
Collaborator:

I imagine there are constraints on this list - for example, a RoleCloser must be followed by a RoleOpener?

Collaborator Author:

Closers may come arbitrarily far in the future, but there is logic inside the model class to ensure that they get closed. Currently we just have run-time checks for balanced openers/closers inside the chat models, and I'm not sure if we can do much better. But we may at least be able to enforce the constraint that you can't open a role if one is already open.
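That last constraint -- no opening a role while one is already open -- is easy to enforce at append time. A sketch (hypothetical API, illustrating the invariant only):

```python
from typing import Optional

class RoleError(Exception):
    pass

class ModelState:
    """Sketch of an append-time invariant: a role cannot be opened
    while another role is still open."""

    def __init__(self) -> None:
        self.objects: list = []
        self._open_role: Optional[str] = None

    def open_role(self, role: str) -> None:
        if self._open_role is not None:
            raise RoleError(f"role {self._open_role!r} is still open")
        self._open_role = role
        self.objects.append(("role_open", role))

    def close_role(self) -> None:
        if self._open_role is None:
            raise RoleError("no role is currently open")
        self.objects.append(("role_close", self._open_role))
        self._open_role = None
```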

Collaborator:

I was wondering if there should be a nested structure - a list of Role blocks, each of which could contain an arbitrary list of Text, Image etc. blocks. Or would that be overly complicated?
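The nested structure described here might look like the following sketch (hypothetical names; one possible shape, not a proposal):

```python
from dataclasses import dataclass

# Hypothetical state objects (illustrative names).
@dataclass(frozen=True)
class Text:
    content: str

@dataclass(frozen=True)
class Image:
    data: bytes

@dataclass
class RoleBlock:
    role: str
    children: list  # Text and/or Image objects, in order

# Nested layout: state is a list of role blocks, each owning its content.
state = [
    RoleBlock("user", [Text("What is in this image? "), Image(b"<png bytes>")]),
    RoleBlock("assistant", [Text("A cat.")]),
]

# Balanced openers/closers are guaranteed by construction, and chat
# messages fall straight out of the nesting.
messages = [
    {"role": block.role,
     "message": "".join(c.content for c in block.children if isinstance(c, Text))}
    for block in state
]
```

The trade-off versus the flat list is handling a role block that is still open while generation continues inside it.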

Collaborator Author:

I had the same thought actually. On my first attempt, that felt pretty complicated, but it might not be so bad now that this skeleton is here. I'll experiment with it a bit :)

@paulbkoch (Collaborator):

Hi @hudson-ai, the one concern I have with moving away from purely text based state is that I think there may be useful scenarios where the guidance developer would want to use the grammar to switch roles. For example, if the prompt is currently in an assistant role, the guidance program might switch to the user role, include some user role text, then switch back to the assistant role and then force the assistant to write some initial text in the response. Is this still a possible scenario to implement under this PR? I do recognize there are benefits to disallowing the user from creating illegal inputs, but I'm curious to explore the tradeoff here, if any.

@hudson-ai (Collaborator Author):

> Hi @hudson-ai, the one concern I have with moving away from purely text based state is that I think there may be useful scenarios where the guidance developer would want to use the grammar to switch roles. For example, if the prompt is currently in an assistant role, the guidance program might switch to the user role, include some user role text, then switch back to the assistant role and then force the assistant to write some initial text in the response. Is this still a possible scenario to implement under this PR? I do recognize there are benefits to disallowing the user from creating illegal inputs, but I'm curious to explore the tradeoff here, if any.

Thanks for taking a look @paulbkoch :)

I have to think about this a bit, but my first impression is that there is nothing about structured model state that would prohibit special control sequences from the models themselves. Just thinking out loud...

  • internal model state can contain extra data that isn't used for LLM prompting, e.g. probabilities of generated tokens, formatting information, etc.
  • models are allowed to produce control sequences that add special data to the model state (e.g. start/end of roles)
  • that being said, models should never be able to write that extra data directly (i.e. they have no control over the probability we log for a token or how we format role blocks; they only have control over whether or not a role block has begun, etc.)

The idea is just to make a logical distinction between the way that we do internal bookkeeping and how the contents of that bookkeeping are displayed to models, to ipython for formatting, etc.
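One way to picture that distinction (a hypothetical sketch, with made-up control strings -- not how guidance actually routes output): control sequences the model may emit are mapped to state operations, but the data written into state is constructed by the library.

```python
# Hypothetical: the model chooses *that* a role opens by emitting a
# control sequence, but never writes the bookkeeping payload itself.
CONTROL_OPS = {
    "<|assistant|>": lambda state: state.append(("role_open", "assistant")),
    "<|end|>": lambda state: state.append(("role_close", "assistant")),
}

def consume(chunk: str, state: list) -> None:
    """Route one chunk of model output into structured state."""
    op = CONTROL_OPS.get(chunk)
    if op is not None:
        op(state)  # library-built payload, not model-written data
    else:
        state.append(("text", chunk))

state: list = []
for chunk in ["<|assistant|>", "Hello!", "<|end|>"]:
    consume(chunk, state)
```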

Is this reasonable?

@Harsha-Nori (Collaborator):

I had the same concern as @paulbkoch, but I think your reply makes sense here Hudson :). In thinking about this, I wanted to brainstorm broadly about what types of metadata/bookkeeping we may want to track -- even beyond what we have implemented today -- and what the atomic "unit" the metadata ties to is (e.g. each token, each char in the model state, each byte, etc.)

A rough laundry list off the top of my head (we certainly don't need to implement all of these, but just want to brainstorm broadly):

  • Generated vs. templated text
  • (Log) Probability of each token
  • Was a token forced by the grammar?
  • What block(s) does the text/tokens belong to (role tags, user defined blocks)
  • Sampling parameters/strategy at time of generation (?)
  • If we introduce speculative decoding, is there speculative model metadata (e.g. how many tokens were accepted in the speculation)
  • Runtime performance metrics per token (wall clock time, hardware util stats, etc.)
  • If we update the parser to be hierarchical, which parser handled this piece of text (e.g. regex lexer vs parser)
  • Is a byte/token a text byte, or something multimodal (e.g. image, audio)?
  • etc.

If we get the design right here, it should get much easier to expose rich metrics to users and build alternative UIs. We could also shift over the (potentially buggy) token counting we're doing in the Metrics classes to this new model state view. Thoughts?
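Several of the brainstormed items could hang off a per-token metadata record, something like this sketch (field names are illustrative, not a proposed API):

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical per-token metadata record collecting a few of the items
# brainstormed above; fields are illustrative only.
@dataclass(frozen=True)
class TokenInfo:
    token: str
    generated: bool = True           # False for templated/prompted text
    forced: bool = False             # single legal choice under the grammar
    logprob: Optional[float] = None  # None for non-generated tokens
    latency_ms: Optional[float] = None

tokens = [
    TokenInfo("Hello", generated=False),
    TokenInfo(",", forced=True, logprob=-0.02),
    TokenInfo(" world", logprob=-1.3, latency_ms=12.5),
]

# Rich metrics (and the Metrics-class token counting mentioned above)
# become simple queries over the structured state.
n_forced = sum(t.forced for t in tokens)
n_generated = sum(t.generated for t in tokens)
```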

@hudson-ai (Collaborator Author):

@Harsha-Nori love these ideas!

I feel that all of these "logging" use cases, multimodality, chat vs. completion APIs, ... suggest that some kind of additional structure is needed (whether or not it ends up looking anything like what I drafted here).

Whatever the right design is, it should probably be:

  1. Extensible (e.g. it should be trivial to add a new modality or piece of logging attached to certain tokens or bytes in the future)
  2. Performant (I don't want to worry too much about optimizing so early, but the structure should hopefully be lightweight enough that we don't spend all of our CPU time formatting prompts...)
  3. User friendly (this maybe overlaps with extensibility, but the current html-ish/tag formatting business is confusing, and the implementation is distributed across the repo, making it very hard to understand/introspect/modify)

@riedgar-ms (Collaborator):

When it comes to performance, do bear in mind that we already have things like this hanging around:

(embedded code snippet not captured in this transcript)

(other models have similar, if not quite identical code). If we make the internal model state richer, then we'll be able to generate conversation turns directly, rather than having to parse them out of a single state string.
