Unification of submodels and distributions #2485

Open · Tracked by #2420
penelopeysm opened this issue Feb 10, 2025 · 20 comments

penelopeysm (Member) commented Feb 10, 2025

@mhauru @willtebbutt and I discussed submodels this evening (10 Feb). The present issue is that our support for submodels is currently halfway done - we are able to extract the return value of a submodel, but not its latent variables.

(Note that this has always been true, even with the old @submodel macro; TuringLang/DynamicPPL.jl#696 merely changed the syntax we used to achieve this.)

(1) Overview

After a fair bit of back and forth, the summary of the interface we would like is something along these lines:

using DynamicPPL, Distributions

@model function inner()
    x ~ Normal()
    y ~ Normal()
    return "my string"
end

@model function outer()
    a ~ Normal()
    b ~ inner()
    @show b      # Should be a NamedTuple{:x, :y}
    @show b.x    # Should be a float
    c ~ inner() {OP} retval
    @show c      # Should also be a NamedTuple{:x, :y}
    @show retval # Should be "my string"
end

# Conditioning on submodel variables should work
outer() | (@varname(c.x) => 1.0)
# This should ideally work too
outer() | (c = (x = 1.0,),)

for some infix operator {OP} (see section 3.2 below for some possible options).

Note that there are several changes with respect to the current behaviour (as of 10/2/2025):

  1. No need to wrap in to_submodel if possible (I am not totally sure if this is doable)
  2. Manual prefixing should not be needed and may be disallowed
  3. Prefixing should occur not by prepending directly to the symbol (as is currently done), but rather by making the submodel's variables be a field of the parent model's variable. Thus, we can write @show c.x instead of @show var"c.x".
  4. The lhs of a tilde should capture the submodel's random variables instead of its return value.
  5. The return value, if desired, can be extracted by placing a further operator on the right-hand side of the submodel.
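
Point 3 can be illustrated with plain NamedTuples; this is only a sketch of the two naming schemes, not of DynamicPPL internals:

```julia
# Current scheme: the prefix is baked into the symbol itself, so the
# value lives under a single composite name like `c.x`.
flat = (var"c.x" = 1.0, var"c.y" = 2.0)
getproperty(flat, Symbol("c.x"))   # 1.0, but awkward to write

# Proposed scheme: submodel variables become fields of a nested value,
# so ordinary field access works.
nested = (c = (x = 1.0, y = 2.0),)
nested.c.x                         # 1.0
```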

Although we are collectively in favour of this interface, this is not meant to be set in stone yet, and there are several further points of discussion, which are detailed below.

(2) Motivation

Turing models in general have two types of 'useful information' that one might want to extract:

  1. The values of the random variables inside. This is best represented by the model trace, i.e., VarInfo that is used during execution.
  2. Since @model function ... end itself expands into a function definition (the so-called 'model evaluation function'), this function will itself also have a return value.

This return value may be constructed from the random variables' values, and in many of the DynamicPPL/Turing docs, this is indeed the case; however, this is not mandatory and in general the return value can contain arbitrary information.

With models, these two pieces of information are obtained respectively using rand() and function calls:

julia> using DynamicPPL, Distributions

julia> @model function f()
           x ~ Normal()
           return "hello, world"
       end
f (generic function with 2 methods)

julia> model = f()
Model{typeof(f), (), (), (), Tuple{}, Tuple{}, DefaultContext}(f, NamedTuple(), NamedTuple(), DefaultContext())

julia> rand(model)
(x = 0.12314369056401028,)

julia> model()
"hello, world"

Currently, x ~ to_submodel(inner()) does not assign the random variables in inner() to x, but rather the return value. This means that there are several inconsistencies between the behaviour of submodels and distributions:

  1. The obvious difference is that with a distribution on the rhs, the value of x is sampled by calling rand(dist). With a submodel on the rhs, the value of x is obtained by calling inner()().
  2. It is not possible to calculate the logpdf of a submodel inner() evaluated at x. This is because the return value x, in general, has no relationship to the random variables contained inside inner(), and indeed there is no guarantee that a well-defined 'distribution' of return values exists.
  3. In x ~ to_submodel(inner()), although the variables of inner() are added to the VarInfo and the resulting chains from sampling, x itself is not.

This proposal therefore seeks to unify the behaviour of submodels and distributions in a way that is internally consistent and thus easier for users to intuit. In particular, it is proposed that:

  1. The syntax lhs ~ rhs is reserved for the results of sampling from a submodel or distribution using rand(). The result of sampling from a model should be some kind of data structure (a NamedTuple, struct, or dictionary) which allows for indexing. The variable lhs (or its subvariables) should always be part of the VarInfo and it should be possible to condition on them.

  2. We adopt new syntax, in the form of lhs ~ submodel {OP} retval where {OP} is an infix operator, to extract the return value of a submodel (if so desired). Because distributions do not have return values, this syntax would only be accepted when the rhs is a submodel. The {OP} retval section may be omitted, in which case the return value is simply discarded.

  3. Running a submodel without extracting its random values (i.e. just writing submodel {OP} retval) should be forbidden, because in such a case, users should refactor their code to use a plain Julia function instead of a submodel.

(3) Concrete steps

  1. Decide if the general idea makes sense.

  2. Decide on the infix operator {OP}. We would probably like the operator to (1) be ASCII-compatible; (2) resemble a rightwards arrow.

    • I originally proposed ~>, but this is not allowed by the Julia parser.
    • The best boring option I see is -->
    • >>= is also possible, and I have a Haskell bias towards it, but it technically conflicts with right-bit-shift-and-assign.
    • The simpler -> and => are probably best avoided because they are already used for anonymous functions and Pair respectively.
  3. Figure out the data structure that should be obtained when sampling from a submodel. Right now, rand(model) returns a NamedTuple. To me, this feels like the most natural interface to use; it 'makes sense' that if t is a random variable in a submodel, c ~ submodel should allow us to access c.t. It is possible that we may want to use a different type of data structure that retains more information (i.e. is closer to a varinfo) but still has an interface that allows field access.

  4. Figure out how to obtain this data structure when sampling from a submodel. My original proposal was to evaluate submodels with a special wrapper context, say SubmodelContext, which would collect sampled variables and their values in a NamedTuple as each assume statement was hit. (Note, the behaviour of this would be very similar to ValuesAsInModelContext.) However, it seems quite plausible that this could be obtained simply by subsetting the global varinfo.

  5. Implement this in the DynamicPPL compiler. Note that this may require special attention to e.g. operator precedence / associativity which may in turn place more restrictions on the possible operators used. Some extra abstract type machinery will likely also be needed if we plan to not wrap submodels in a new type; my suspicion is that this might actually be the hardest part of it.

  6. Iron out the odd bits of conditioning submodels. I actually suspect that all the infrastructure necessary for this is already in place, and it's mostly a matter of writing a comprehensive set of tests to make sure that everything behaves 'as expected'.

  7. Iron out the expected behaviour when varnames conflict, e.g. if we have c ~ submodel() then we should probably not allow the identifier c to be reused on the lhs of another tilde.

  8. Write tests. And more tests. And more tests. Even with as elegant an implementation as we can come up with, my gut feeling is that there are bound to be many awkward edge cases!

  9. Turn the contents of this issue into documentation. (I wrote it up, so the hard bit's already done 😉)
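
A note on the operator options in step 2: --> is already accepted by the Julia parser as a generic binary operator with no built-in meaning, so a package is free to give it one, whereas ~> is rejected outright. A quick sketch with made-up semantics:

```julia
# `-->` parses as an ordinary binary call; here we give it made-up
# semantics that simply pair the latents with a return value.
(-->)(latents, retval) = (latents = latents, retval = retval)

result = (x = 1.0, y = 2.0) --> "my string"
result.latents   # (x = 1.0, y = 2.0)
result.retval    # "my string"

# By contrast, `Meta.parse("a ~> b")` throws a ParseError.
```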

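The collection idea in step 4 can also be sketched in plain Julia. Collector, record!, and to_namedtuple below are hypothetical names for illustration, not DynamicPPL API:

```julia
# Hypothetical collector: records each (name => value) pair as "assume"
# statements are hit, then converts the accumulated trace to a NamedTuple.
struct Collector
    vals::Vector{Pair{Symbol,Any}}
end
Collector() = Collector(Pair{Symbol,Any}[])

# Record a sampled value under its name and pass the value through.
record!(c::Collector, name::Symbol, value) = (push!(c.vals, name => value); value)

# The data structure that would be bound to the lhs of `~`.
to_namedtuple(c::Collector) = NamedTuple(c.vals)

c = Collector()
record!(c, :x, 0.5)
record!(c, :y, -1.2)
to_namedtuple(c)   # (x = 0.5, y = -1.2)
```
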
(4) Alternatives considered.

The main alternative considered was to use two different operators for extracting the random variables and the return value, plus one for extracting both, so something like:

@model function inner()
    x ~ Normal()
    y ~ Normal()
    return "my string"
end

@model function outer()
    a ~ Normal()
    b ~ inner()
    @show b       # Should be a NamedTuple{:x, :y}
    retval {OP1} inner()
    @show retval  # Should be "my string"
    c, retval2 {OP2} inner()
    @show c       # Should be a NamedTuple{:x, :y}
    @show retval2 # Should be "my string"
end

for some infix operators {OP1} and {OP2}.

We also considered having a single statement b ~ submodel return some data structure from which the random variables could be accessed using b.vars and the return value with b.retval.
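
For concreteness, that wrapper alternative could have looked roughly like this (SubmodelResult is a hypothetical name), which shows the extra layer of indirection:

```julia
# Hypothetical wrapper: a single `~` yields one object holding both the
# latents and the return value, accessed through an extra field lookup.
struct SubmodelResult{V,R}
    vars::V
    retval::R
end

b = SubmodelResult((x = 0.1, y = 0.2), "my string")
b.vars.x   # 0.1 -- note the extra `.vars` hop the main proposal avoids
b.retval   # "my string"
```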

However, we all agreed that the main proposal here is better, because its syntax is more elegant and it also does not introduce any extra layers of indirection.

mhauru (Member) commented Feb 11, 2025

Thanks for being the main brains behind this proposal and for an excellent write up. I don't really have much to add, I agree with essentially everything in the OP.

I'm still thinking a bit about whether we could come up with an even more elegant way of getting the return values. I think the core premise here - that c ~ inner() should create a NamedTuple with the latents of inner and add to __varinfo__ all the latents of inner prefixed with c, and that some complementary syntax should be used to extract the return value of inner() - is solid. Submodels are primarily models, which is to say things that take values for a set of random variables, return logprobs, and allow sampling from the prior of those variables; secondarily, they are allowed to return arbitrary Julia objects for whatever strange needs the user may have. The second infix operator as part of the RHS is the best proposal I've seen or can think of for capturing the return values, but I may keep thinking to see if there could be something even better.

yebai (Member) commented Feb 11, 2025

Looks good; one simple solution to unification is: to_distribution(model()) would construct a NamedTupleDistribution that returns (latent=..., retval=...).

I prefer to minimise extra syntax or macro defined by Turing.jl. This is not a strict rule, but we should keep the bar high.

willtebbutt (Member) commented:

This is basically what @penelopeysm mentioned as an alternative that we considered -- see (4) Alternatives considered in her above post. There she discussed returning a Tuple, and you're proposing a NamedTuple, but it's basically the same. I do think that calling it to_distribution is a step in the wrong direction though, because we won't be able to implement logpdf on it, and we won't be able to condition on it. i.e. if you have a model

@model function inner()
    x ~ Normal()
    y ~ Bernoulli()
    return "hello"
end

@model function outer()
    (c, retval) ~ to_distribution(inner())
    return nothing
end

you cannot condition on retval, despite the fact that it appears on the lhs of a ~. Moreover, you cannot condition on a value of the NamedTuple (c, retval), despite the fact it also appears on the lhs of the tilde.

A key benefit of the proposed syntax is that it balances a few concerns:

  1. you need to be able to get access to the thing that a model returns somehow,
  2. you need to be able to get access to any of the latents of a model (in order to encourage model re-use in the most general way), and
  3. simple semantics are preferable, and the rule "you can condition on anything on the lhs of a tilde and nothing else" is extremely simple.

The to_distribution approach violates the third point, and makes it hard for users to intuit what is going on.

yebai (Member) commented Feb 11, 2025

I do think that calling it to_distribution is a step in the wrong direction though, because we won't be able to implement logpdf on it, and we won't be able to condition on it. i.e. if you have a model

There is retval ~ to_sampleable(model) for inner models where we cannot implement the log density of returned variables. For something more general like (latent, retval) ~ to_sampleable(model), the docs can clearly explain that users can only condition on latent but not retval.

I proposed retval ~ to_sampleable(model) to indicate cases where conditioning is not allowed. The current to_submodel is simply an alias for to_sampleable. So, clarity on which variables one can condition on should not be an issue. However, I agree that simple semantics are preferable. The debatable point is what is simpler, i.e. whether a new special syntax is simpler than to_distribution / to_sampleable.

torfjelde (Member) commented:

Thanks for writing this up Penny!

Running a submodel without extracting its random values (i.e. just writing submodel {OP} retval) should be forbidden, because in such a case, users should refactor their code to use a plain Julia function instead of a submodel.

Not sure I quite see this.

The aim of submodels is to make it so that models can easily be shared across projects and applications, right? If so, there are definitely applications where you have submodels which happen to only represent a log-likelihood, but in a more general case might represent a model where you want to capture the return values. Asking the user to refactor this into a standard Julia function would either a) require accessing internal DPPL variables, or b) require rewriting the model. (a) seems non-ideal, and (b) isn't so easy if the model they use comes from a different package which the user doesn't have control over.

But I guess this is technically something we've already decided to ignore after moving to ~ syntax for submodels (though I forget if we added the option of doing _ ~ ...).

Decide on the infix operator {OP}. We would probably like the operator to (1) be ASCII-compatible; (2) resemble a rightwards arrow.

Personally, I'm a bit worried about such an infix operation. It seems a bit too elaborate?

Maybe it's worth querying end-users about the syntax?

It is possible that we may want to use a different type of data structure that retains more information (i.e. is closer to a varinfo) but still has an interface that allows field access.

Would be surprised if we can find a good and performant solution that doesn't involve a VarInfo (or similar) 😕

Figure out how to obtain this data structure when sampling from a submodel.

I think the most likely solution would be:

  1. Nest varinfo objects.
  2. Extract the relevant varinfo object when you hit a submodel.
  3. "Merge" (not quite the current merge since this would be a nested version) the resulting varinfo from submodel call into the "parent" varinfo (or defer these things until they're needed, e.g. when calling getlogp we would need to gather all the logp values from the nested varinfos).
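
The deferred-gathering part of step 3 can be sketched with plain nested dictionaries (hypothetical structure; real varinfos carry far more information):

```julia
# Hypothetical nested traces: each level stores its own log-probability,
# and a submodel's trace is nested under its prefix in the parent.
parent = Dict{Symbol,Any}(:a => 0.3, :logp => -1.25)
child  = Dict{Symbol,Any}(:x => 1.0, :logp => -0.75)
parent[:b] = child   # "merge" the submodel trace under prefix `b`

# `getlogp` would walk the nesting instead of eagerly flattening it.
function total_logp(trace::Dict{Symbol,Any})
    total = get(trace, :logp, 0.0)
    for (_, v) in trace
        v isa Dict{Symbol,Any} && (total += total_logp(v))
    end
    return total
end

total_logp(parent)   # -2.0
```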

my suspicion is that this might actually be the hardest part of it.

Agreed 😬

I would also point out some of these discussion points seem somewhat independent, e.g. how to represent return- and trace-values from submodels vs. syntax for specifying submodels. Might be worth separating these to avoid getting bogged down in one or the other?

mhauru (Member) commented Feb 12, 2025

One thing I really like about Penny's proposal is that everything that goes into a VarInfo is always on the LHS of a ~ statement, and everything that is on the LHS of a ~ statement results in a corresponding entry in a VarInfo. This comes from not allowing submodel {OP} retval without a ~, and from using an infix operator rather than a tuple of (latents, retval). Also, anything on the LHS of a ~ can be conditioned, and nothing else can be. That simplicity, I think, will a) make it very easy to learn and understand, and b) pay long-term dividends when composing this with other language features, developing new syntax, etc.

Note the pleasing analogy with = and bringing variables into scope. ~ is like =, but for __varinfo__ rather than your current Julia scope.

Note also that this would allow us to get rid of prefix; the prefix is obvious from the LHS and always available.

torfjelde (Member) commented:

Note also that this would allow us to get rid of prefix; The prefix is obvious from the LHS and always available.

But not always wanted, no? E.g. if you have a model that is nested 10 levels, you don't necessarily want to prefix all that.

willtebbutt (Member) commented Feb 13, 2025

But not always wanted, no? E.g. if you have a model that is nested 10 levels, you don't necessarily want to prefix all that.

Not prefixing something which is 10 levels down feels extremely dangerous to me. This seems like the kind of feature that feels like a win at first, but which you would quickly regret using when you accidentally define the same symbol somewhere else, and are now conditioning on the wrong thing and erroring / silently failing. If we want to make it convenient to access symbols which are buried under layers of models, would a safer mechanism not be preferable?

yebai (Member) commented Feb 14, 2025

I think turning on auto-prefixing is a sensible default, but that doesn't require us to change the syntax (i.e. to_distribution/to_sampleable allows it).

IIRC, some real-world applications for post-inference analysis could benefit from disabling auto-prefixing on demand. @torfjelde knows more about the details.

torfjelde (Member) commented Feb 14, 2025

Not prefixing something which is 10 levels down feels extremely dangerous to me

This is why automatically prefixing is deffo preferable. But not having the option at all seems quite annoying, no?

A very simple example:

@model demo() = x ~ Normal()
@model function demo_used_for_prediction()
    @submodel x = demo()
    # Let's sample some predictions!
    y_predict ~ Normal(x, 1)
end

chain = sample(demo(), sampler, num_samples)
generated_quantities(demo_used_for_prediction(), chain)

I've used this pattern a lot, and not just for predictions.

penelopeysm (Member, Author) commented Feb 14, 2025

I guess the alternative would be this?

@model demo() = x ~ Normal()

@model function demo_used_for_inference()
    @submodel x = demo()
end

@model function demo_used_for_prediction()
    @submodel x = demo()
    # Let's sample some predictions!
    y_predict ~ Normal(x, 1)
end

chain = sample(demo_used_for_inference(), sampler, num_samples)
generated_quantities(demo_used_for_prediction(), chain)

Personally, I think it's clearer this way because it forces the user to recognise that the inference and prediction models should have the same structure.

I do begin to see though why you might want to have something like an if @is_inference check TuringLang/DynamicPPL.jl#589. (I mean, I kind of got the idea before, but this exemplifies it really clearly to me.)

torfjelde (Member) commented:

Personally, I think it's clearer this way because it forces the user to recognise that the inference and prediction models should have the same structure.

That's not strictly necessary though 😕 Many times I have used chains from one model with the same parameters but not the exact same structure, and then run predict or generated_quantities. Seems overly restrictive to not even allow this, even if it deffo shouldn't be the default 🤷

penelopeysm (Member, Author) commented Feb 14, 2025

From a technical point of view, I'm quite uncomfortable with having submodel variables not be prefixed. To me it feels almost analogous to writing this

struct MyStruct
    a::A
    b::B
end

s = MyStruct(1, 2)

and having a and b in scope without explicitly unpacking with a, b = s.a, s.b.

I can see the convenience argument, but it doesn't really sway me enough. I think dealing with this irregularity with variable scope could be easier for us to navigate because we've seen all of DynamicPPL, but for somebody who is new to modelling it could easily be the other way round — having variables from a submodel injected into their varinfo/chain without prefixing/namespacing could be confusing as it wouldn't tally with their understanding of variable scope.

It's kind of like how using Module is super convenient if you know your packages, but isn't really good for programming. Edit: Actually a better analogy is include("file.jl"), where file.jl could declare literally anything it wanted without namespacing and thus break code that followed it. In the same way, changing the names of variables in an unprefixed submodel could break tilde-statements that followed the submodel.

torfjelde (Member) commented:

All sensible arguments 👍 In the end, it's just a question of convenience vs. "don't allow incorrectness". Personally I've found the convenience very, well, convenient 🙃 so I'm obviously biased.

I believe I mentioned this earlier in the thread, but might be worth asking users here? Understand if you want to avoid that though, as it can complicate the discussion ofc 👍

penelopeysm (Member, Author) commented:

Definitely worth checking with users, maybe one for Slack :) but also let's discuss it at one of the meetings. As you said there are also several points in this issue, I'm not sure how separable they are but I reckon the prefixing one is separate from the syntax.

seabbs commented Feb 17, 2025

My unasked opinion here is that this sounds appealing, especially when viewed from a dev perspective, but @torfjelde has done a good job of capturing this user's concerns.

Not prefixing something which is 10 levels down feels extremely dangerous to me.

Would this mean 10 layers of prefix? I think this would be unpleasant from a user perspective, especially when trying to write post-processing code targeting deeply nested submodels that can change depth across a few models. My preference for not having prefixing be the default comes from not wanting to see (or show others) the complexity of what is happening unless it is needed, and from being fine with an error message telling me there is a name conflict. I see the dev arguments for it though, and also agree it seems separate from the rest of this discussion (I just love telling people what I think though).

penelopeysm (Member, Author) commented Feb 17, 2025

Would this mean 10 layers of prefix?

Indeed.

change depth across a few models

That's awkward, I can see that.

I might still be open to enabling no-prefix as an option, but making no-prefix the default is not something I would like to be responsible for implementing. If you want to live life on the edge, you should opt-in. We could have

@model function outer()
    lhs ~ inner()
end

do prefixing, and

@model function outer()
    lhs ~ noprefix(inner())
end

do no-prefixing. Note that you would have to apply noprefix() at each level of nesting if you wanted the variable 10 submodels deep to not be prefixed at all.

yebai (Member) commented Feb 17, 2025

Here are a few more points to weigh when considering between

  1. (latent, retval) ~ to_sampleable/to_distribution(demo())
  2. latent ~ demo() --> retval

From a statistical perspective, the infix operator latent ~ demo() --> retval is not super intuitive. Also,

  • Turing models are not identical to distributions, e.g. many Distributions.jl APIs are missing for models, so treating demo() in the same way as Normal could lead to confusion.
  • to_sampleable/to_distribution(demo()) is not exclusive to the @model block. Thus, one can inspect them in the terminal to understand their behaviour and associated APIs; such modularity makes working with and debugging sub-models easier. Extending them, e.g. to work with JuliaBUGS / SSMProblems models, is also straightforward.
  • The argument that everything on the LHS can be conditioned on does not always hold: a new distribution can be defined following the Distributions.jl API without implementing the logpdf function. Thus, even with the infix operator, we still need to explain to users that the LHS can be conditioned on if and only if the RHS implements a log density function.

In short, we need to carefully consider the intuitiveness of any syntax proposed from the perspective of statistical inference, together with programming language properties.

torfjelde (Member) commented:

I might still be open to enabling no-prefix as an option, but making no-prefix the default is not something I would like to be responsible for implementing.

Definitively agree we shouldn't do it by default 👍

seabbs commented Feb 18, 2025

Note that you would have to apply noprefix() at each level of nesting if you wanted the variable 10 submodels deep to not be prefixed at all.

I see where you are all coming from, but noting this would be quite painful - especially if you wanted to override it at specific points as we do now (currently we use another wrapper submodel to add the optional prefixing, which I don't think could work with this formulation). It's a shame there is no way (?) to control this behaviour globally. Are you aware of other users making a lot of use of nested submodels somewhere? I would be very keen to see how they have found dealing with this.

If you want to live life on the edge, you should opt-in. We could have

Relatedly, I would be interested in hearing how many people have had serious issues with this (or do you mean from a developer view?). In our use I don't think we have run into it more than a few times, and then it's just been a case of adding a prefix quickly. I had a quick look at the issues but didn't have much success (mostly thinking here about whether we are just wrong and should drop the only-prefix-when-needed stance we have been moving towards).
