RFC on Hypotheses Resampling #196

Open · wants to merge 9 commits into `main` · Changes from 3 commits

118 changes: 104 additions & 14 deletions rfcs/0000_hypotheses_resampling.md
@@ -3,21 +3,26 @@

# Summary

Resample hypotheses at every step in a manner inspired by particle filters. This is the first step toward Monty interacting with multiple objects and recognizing compositional objects. The newly sampled hypotheses will come from:
1) a subset (uniformly sampled) of new hypotheses initialized from the current step's observation.
2) a set of hypotheses newly sampled from the distribution of the most likely hypotheses.
3) a subset of the old hypotheses, retained according to the metric that identifies the most likely hypotheses.
**Contributor:**

I found the wording of (3) a bit confusing/hard to read for some reason. Maybe something like "A subset of the old hypotheses. Which of these are maintained is based on...".

Also for 2 and 3, my understanding is we are basing resampling on the first-order derivative of their evidence accumulation. It might be worth saying something like "... the most rapidly rising hypotheses" rather than "most likely".


# High-Level Motivation

In an unsupervised experiment setup, Monty may be presented with multiple objects in a single episode. Ideally, we would like to move away from the traditional data loading setup of machine learning where there is a strict definition of an epoch, episode and step. As Monty starts to interact with the real world, the definition of epoch and episode will start to fade away and we'll be left with simple time discretization (i.e., step). The current definitions are:
* Epoch: Used by the experiment class to denote one full pass through all the objects at a specified rotation
* Episode: Denotes a change in object
* Step: Denotes a single sensation and action in the sensorimotor framework.

Real-world interactions do not have epochs or episodes (these are only used for performance benchmarks); instead, we can imagine the agent wandering around in a multi-object, dynamic environment. The objects can be occluded, moving, or even disappearing behind new objects. The objects could also be compositional, such as a logo on a coffee mug.

**We want Monty to handle dynamic environments by seamlessly switching from one object to another as its sensors move around on the different, potentially compositional, objects.**

*We note that for learning, we will continue to assume for now that Monty learns about objects in an isolated manner (i.e., one at a time), whether or not it receives a supervisory signal in the form of an object label. This is akin to a child holding an object and devoting its attention to it to the exclusion of the rest of the world (something the nearsightedness of infants may actually assist with). Relaxing this learning assumption is therefore a separate topic for future work.*

# The Problem
Monty is designed to receive a weak supervision signal during inference when an episode ends and a new episode begins (denoting a change of object). This signal performs a full reset of all states within Monty, including counters, the buffer, goal-state generators, learning modules, and sensory modules. Additionally, this reset sets Monty back into matching mode. The figure below shows where this resetting is done; most of it happens in the `pre_episode` functions of the Monty, SM, and LM classes.

![Monty Reset Logic](0000_hypotheses_resampling/monty_reset_logic.png)

@@ -31,6 +36,8 @@ To overcome this, I manually `reset_episode_steps()` such that the `matching_ste

This reveals the main problem. Monty is still unable to accumulate evidence on the existing hypotheses. The current implementation of Monty uses `_get_all_informed_possible_poses()` to initialize hypotheses after seeing a single pose of the object. This is a smart way to reduce the number of initial hypotheses based on the principal curvature but it assumes that the object doesn't change and that these hypotheses will always be valid. However, when we change the object we would need to update these initial hypotheses based on a new pose observation of the new object. A simple test of sampling additional hypotheses (with informed poses) on the second object pose shows that we are able to accumulate evidence on these new hypotheses. See figure below.
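As a rough illustration of the initialization logic described above (a sketch only; `informed_pose_hypotheses`, its arguments, and the candidate counts are hypothetical stand-ins, not Monty's actual API): informed hypotheses constrain candidate rotations using the sensed surface frame, and the number of candidates per graph point depends on whether the principal curvature direction is usable.

```python
import math

def informed_pose_hypotheses(point_normal, pc_defined, n_undefined=8):
    """Sketch: candidate in-plane rotation angles for one observed point.

    With a defined principal curvature, the surface frame pins the rotation
    down to a 180-degree ambiguity (2 candidates). Without it, we can only
    coarsely sample in-plane rotations (n_undefined candidates).
    """
    if pc_defined:
        angles = [0.0, math.pi]
    else:
        angles = [i * 2.0 * math.pi / n_undefined for i in range(n_undefined)]
    return [(point_normal, a) for a in angles]

# Two candidates per point when PC is defined, eight when it is not.
defined = informed_pose_hypotheses((0.0, 0.0, 1.0), pc_defined=True)
undefined = informed_pose_hypotheses((0.0, 0.0, 1.0), pc_defined=False)
```

This is only meant to show why the hypothesis pool grows much faster when curvature is undefined, which is what makes resampling on a new pose observation attractive.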

*Note that even when testing a single object, a noisy initial pose observation can affect the quality of the initially sampled hypotheses. Using these incorrect hypotheses (without resampling) will limit Monty's performance until the end of the episode.*

![Resampling](0000_hypotheses_resampling/resampling_banana_mug.png)

# The Proposed Solution
@@ -45,21 +52,104 @@ We currently use the total evidence score to decide which hypotheses are more pr
Why:
* **Faster**: we don't have to wait for high unbounded evidence to decay enough to realize that a new hypothesis is more likely. We also may not need to worry about initializing new hypotheses with mean evidence to give them a fighting chance against older hypotheses; the average slope is fairer in this sense.
* **Accurate resampling**: If we sample new hypotheses close to the hypotheses with high total accumulated evidence (e.g., particle filter), we could be sampling from the incorrect hypotheses (if we had just switched objects). If we sample close to the hypotheses with high evidence slope, we may converge faster.
* **Practical**: The total evidence can still be unbounded; it doesn't matter because we only consider the slope. This metric does not care about how much evidence has already been accumulated. In other words, a hypothesis with high evidence can be removed if it hasn't accumulated evidence in a while, while a consistently growing hypothesis is less likely to be removed even if it was just added.
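The slope metric above can be sketched minimally as follows (the helper name and history layout are hypothetical, not Monty's API): rank hypotheses by their average evidence change over the last few steps rather than by total evidence.

```python
def average_evidence_slope(history, window=3):
    """Average per-step evidence change over the last `window` steps.

    `history` is a list of per-step evidence lists, one value per hypothesis.
    """
    steps = min(window, len(history) - 1)
    if steps < 1:
        return [0.0 for _ in history[0]]
    last, past = history[-1], history[-1 - steps]
    return [(l - p) / steps for l, p in zip(last, past)]

# Hypothesis 0 has more total evidence, but hypothesis 1 is rising faster.
history = [[5.0, 0.0], [5.1, 0.8], [5.2, 1.6], [5.3, 2.4]]
slopes = average_evidence_slope(history)  # approximately [0.1, 0.8]
```

Under this metric, hypothesis 1 would be favored for retention and resampling despite its lower total evidence, which is exactly the object-switching behavior the proposal is after.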
**Contributor:**

thought: What about the other end of the spectrum? Would another practical reason be that evidence could be tracked for S steps only? If my cotton candy becomes nothing in water, does it matter how much evidence I stored for cotton candy now that it all disappeared?

*(gif: raccoon-cotton-candy)*

**Contributor Author:**

Yes, good point. Considering the average evidence slope over only the last S steps will allow us to forget about objects that disappear in favor of other objects with rapidly rising evidence. It doesn't matter if the old object quickly accumulated evidence in the past (i.e., before S steps). But that's another parameter to tune...

**Contributor:**

Yeah, many thoughts around this! It seems like cotton candy disappearing in water may actually be a learned object behavior (and if you haven't learned it, like this raccoon, you are quite surprised). Just making the evidence horizon shorter will have similar effects as we saw when using bounded evidence (past_weight+present_weight=1) where our current policies are not efficient enough to explore a sufficient area of the object to make a confident classification. Of course, we could mitigate this more with more efficient policies, but it seems like forgetting about stuff is more than just a time horizon parameter. I originally thought of it this way as well (which is why I suggested testing the bounded evidence first as a solution for switching hypotheses when moving from one object to another) but it seems like there should be more rapid model-free and model-based mechanisms at play here. Such as using predictions and prediction errors to reset hypotheses (like described here #196 (comment)).


## Assumptions and constraints:
* Number of hypotheses should not scale up with steps, if anything they should decrease. For now, any sampled hypothesis must replace an old "unlikely" hypothesis.
* Sampling of likely and unlikely hypotheses is based on evidence change instead of absolute evidence.
* Terminal state calculation is still based on accumulated evidence and `x_percent_threshold`.
* The resampling procedure only occurs if principal curvature is defined.
**Contributor:**

This could be a pretty limiting constraint. Consider an object like a ball, where PC is not well defined anywhere on the object. Why not still use the point normal to inform new hypotheses? If you are worried about adding too many new hypotheses, we could set the number of sampled hypotheses in get_more_directions_in_plane lower.

**Contributor Author:**

I'm just worried that we would be replacing more accurate hypotheses with less accurate ones, especially if the object hasn't changed. For example, going from an area of defined PC to undefined PC. The hypotheses generated without a principal curvature are many and very coarse (9 per axis). It may not be too much of a concern because we only replace the unlikely hypotheses, I'm just worried that resampling every step will eventually run out of unlikely hypotheses and start replacing some good hypotheses too. What do you think?

You raise a good point on the ball object. Maybe add fewer "semi-informed" hypotheses when PC is undefined?

**Contributor:**

Yeah just elaborating on this further - for objects like the ball, we shouldn't need a ton of hypotheses to recognize them, because the poses are symmetric. So I think this fits with the suggestion that if pose is undefined, we can add some hypotheses, but rather than adding them for e.g. 8 different rotations, we can just add them for say, 4 (or even 2). If it is an object that is less symmetric than a sphere, then this should hopefully get accounted for by informed hypotheses that are sampled when we get onto more pose-defining features.

(this also couples well with Monty's approach to symmetry, i.e. it doesn't matter if the poses it samples and converges to aren't the Euler-angle equivalent of the ground-truth pose, especially if it has stored which poses are equivalent, which is a feature in the pipeline)

**Contributor Author:**

Yeah makes sense. For what it's worth, I won't be directly controlling the number of different rotations to be sampled based on PC defined/undefined. Instead, I will be uniformly sampling a fixed number of informed hypotheses based on the parameters introduced in this RFC. For example, if I need to get 100 informed hypotheses, I will uniformly sample them from 2000 hypotheses (with PC defined) or 8000 hypotheses (with PC undefined).

My concern here was mainly for scenarios where the sensor moves on the side of the coffee mug for a while (i.e., PC defined) and then moves on the bottom side (i.e., PC undefined). Sampling from a bigger pool of inaccurate hypotheses to replace more accurate hypotheses that were sampled when PC was well defined may cause some problems. That being said, it may turn out to be empirically insignificant.


## The Resampling Procedure

1) If principal curvature is not defined, skip
**Contributor:**

Based on the separate discussion, this may need to be updated.

**Contributor Author:**

Sounds good, I've removed this from the resampling procedure and the constraints.

2) For every object, at every new step, get initial hypotheses based on the observed pose.
3) Calculate the needed hypotheses counts to be sampled based on the defined parameters as shown [here](#The-Resampling-Count-Calculation).
4) Sample `needed_old_sampled_hyp` from the existing hypotheses based on highest evidence slope. We will keep these old hypotheses and remove the rest.
5) Uniformly sample `needed_new_informed_hyp` from the new informed hypotheses. We will add these new hypotheses.
6) Sample `needed_new_reinforced_hyp` from the existing hypotheses distribution of highest evidence slope. These sampled hypotheses should be "close" to the most likely hypotheses.
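The steps above could be sketched as follows (a hypothetical helper; hypotheses are simplified to plain floats, and `slopes` stands in for the average evidence slope of each existing hypothesis):

```python
import random

def resample_hypotheses(old_hyps, slopes, new_informed_hyps, counts, seed=0):
    """counts = (needed_old, needed_informed, needed_reinforced)."""
    rng = random.Random(seed)
    n_old, n_informed, n_reinforced = counts

    # Keep the old hypotheses with the highest evidence slope.
    ranked = sorted(range(len(old_hyps)), key=lambda i: slopes[i], reverse=True)
    kept = [old_hyps[i] for i in ranked[:n_old]]

    # Uniformly sample from the newly initialized informed hypotheses.
    informed = rng.sample(new_informed_hyps, n_informed)

    # Sample "reinforced" hypotheses close to the most likely kept ones,
    # modeled here as small perturbations of high-slope hypotheses.
    reinforced = [
        rng.choice(kept) + rng.gauss(0.0, 0.01) for _ in range(n_reinforced)
    ]
    return kept + informed + reinforced

pool = resample_hypotheses(
    old_hyps=[0.0, 1.0, 2.0, 3.0],
    slopes=[0.1, 0.9, 0.2, 0.5],
    new_informed_hyps=[10.0, 11.0, 12.0],
    counts=(2, 2, 1),
)
# The pool keeps its size bounded: 2 old + 2 informed + 1 reinforced = 5.
```

Note that the pool never grows: the counts are chosen so that every newly sampled hypothesis replaces a discarded unlikely one, matching the constraint above.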


## High Level Code Changes

The needed modification will only change the `EvidenceLM` class. More specifically, we would either modify the `_update_evidence` function
**Contributor (@nielsleadholm, Mar 4, 2025):**

"be either be modifying" ... "defining" --> "modify either"... "define" (noticed a typo, but also a useful reminder to use active voice in writing 😅)

directly or define another function that calls `_update_evidence` as one of its steps. A rough proposal of the needed changes is shown below.
**Contributor:**

For function number 8, it would be nice to refactor the existing section below into its own function so that it can be used, ideally as-is, both i) in the existing _update_evidence, and ii) in the new setting you are considering.

Lines 952-1002 in evidence_matching.py

        # Before moving we initialize the hypothesis space:
        if displacements is None:
                ...

**Contributor Author:**

Yes, makes sense. I may end up rewriting and factoring out parts of _update_evidence in the process. I was thinking of just rewriting _update_evidence to support resampling and breaking it up into manageable smaller functions.

**Contributor:**

Nice, yeah that would be great. It's a pretty big method at the moment.


**Contributor:**

It would be worth thinking whether function 6 "Resample old and reinforced" can be broken up into two methods - at least on the surface it seems like these should probably be separated out.

**Contributor Author:**

Good point. The main purpose of the proposed figure is to show roughly where the changes will occur within the matching_step but I imagine that the exact number of functions will change as I start implementing. I may even do all the changes in update_evidence as discussed above. But at least this gives you an idea of where I plan to start.

![code change](0000_hypotheses_resampling/high_level_code_change.png)

*Note that the two types (reinforced and informed) of resampled hypotheses are treated differently.
Unlike reinforced hypotheses, the current positions of the informed hypotheses do not need to be rotated and displaced at resampling.
Informed hypotheses are added after the reinforced (and old) hypotheses are updated/displaced.*
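In other words (a schematic sketch with placeholder names and 1-D toy positions): old and reinforced hypotheses travel with the sensor's displacement, while informed hypotheses are appended afterwards, already expressed at the current observation.

```python
def step_hypotheses(kept, reinforced, informed, displacement):
    """Sketch of the update ordering described in the note above."""
    # Old and reinforced hypotheses were created at earlier observations,
    # so their positions must be displaced to the current step...
    moved = [h + displacement for h in kept + reinforced]
    # ...while informed hypotheses come from the current observation and
    # are added as-is, after the displacement update.
    return moved + informed

pool = step_hypotheses(kept=[1.0], reinforced=[2.0], informed=[10.0], displacement=0.5)
# -> [1.5, 2.5, 10.0]
```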
**Contributor:**

You still plan on adding feature evidence to the new informed hypotheses, right? (like in _calculate_feature_evidence_for_all_nodes)

**Contributor Author:**

Yes. I would need to compare with sensed features and add evidence if needed.

**Contributor Author:**

At this point, I'm probably better off rewriting _update_evidence with support for resampling.



## The Resampling Count Calculation <a name="The-Resampling-Count-Calculation"></a>

This is a proposed approach to calculating the new set of hypotheses at every step.
This should take care of dynamically increasing or decreasing the hypotheses count.
For example, if we want to start by coarse sampling of hypotheses and then increase the number of hypotheses (and vice versa), we should be able to do that here.

I'm introducing three new parameters. The naming of these parameters is preliminary and subject to change.

| Parameter | Description | Range |
| ----------|-------------|-------|
| **hypotheses_count_ratio** | A multiplier for the needed number of hypotheses at this new step; `1` keeps the hypotheses count unchanged | [0, inf) |
**Contributor:**

Does this mean if it is 1, the number of hypotheses remains the same?

**Contributor:**

Ah I see this below now :) Could be worth adding to the description here too since you have that in the description of the other params as well and it's useful info

**Contributor Author:**

Yeah, I'll add that.

| **hypotheses_old_to_new_ratio** | How many old to new hypotheses to be added. `0` means all old, `1` means all new | [0, 1] |
**Contributor:**

issue: I find the use of _ratio in hypothesis_old_to_new_ratio (and this table) confusing. I would expect a ratio to not be bound to [0, 1]. When I think ratio of old to new I think of phrases like 2 old to 1 new 2:1, which would be 2 numerically. Maybe this could be called hypothesis_old_to_new_range|mix|factor?

**Contributor Author:**

I see your point here. Yes, the actual parameter is not a ratio, but instead it allows us to set the ratio (e.g., 0.5 would give us 1:1). I like factor. We can also call it weight, as in the weight of old to new in the hypotheses mix.

| **hypotheses_informed_to_reinforce_ratio** | How many informed (sampled based on newly observed pose) to reinforced hypotheses (sampled close to existing likely hypotheses) to be added. `0` means all informed, `1` means all reinforced | [0, 1] |
**Contributor:**

I think it would be useful to define informed and reinforced earlier in the document, as these terms are used a few times but are a bit confusing without a clear description.

**Contributor Author:**

yeah, good idea. I've added them to the summary. Also I came up with "reinforced" and I'm not sure if it's the best term, but we can go with it for now if no objections.


*Note that it is possible to configure these parameters to remove the effect of resampling and return to the current `EvidenceLM` behavior. Simply set `hypotheses_count_ratio=1` to keep the same number of hypotheses and `hypotheses_old_to_new_ratio=0` to sample only from existing hypotheses.*



```python

# A multiplier for the needed number of hypotheses at this new step.
# A value of 1 keeps the hypotheses count unchanged.
hypotheses_count_ratio = 1

# 0 means all sampled from old, 1 means all sampled from new
hypotheses_old_to_new_ratio = 0.5

# 0 means all sampled from informed hypotheses,
# 1 means all sampled from reinforced hypotheses
hypotheses_informed_to_reinforce_ratio = 0.5


def calculate_new_hypotheses_counts(
    curr_hypotheses_count, new_informed_hypotheses_count
):
    # Calculate the total number of hypotheses needed.
    needed_hypotheses_count = curr_hypotheses_count * hypotheses_count_ratio

    # Calculate how many old and new hypotheses are needed.
    needed_old_sampled_hyp, needed_new_sampled_hyp = (
        needed_hypotheses_count * (1 - hypotheses_old_to_new_ratio),
        needed_hypotheses_count * hypotheses_old_to_new_ratio,
    )
    # If more old hypotheses are requested than exist, cap at the existing
    # count and take the remainder from new hypotheses.
    if needed_old_sampled_hyp > curr_hypotheses_count:
        needed_old_sampled_hyp = curr_hypotheses_count
        needed_new_sampled_hyp = needed_hypotheses_count - curr_hypotheses_count

    # Calculate how many informed and reinforced hypotheses are needed.
    needed_new_informed_hyp, needed_new_reinforced_hyp = (
        needed_new_sampled_hyp * (1 - hypotheses_informed_to_reinforce_ratio),
        needed_new_sampled_hyp * hypotheses_informed_to_reinforce_ratio,
    )
    # Informed hypotheses are capped by how many are available; any
    # remainder comes from reinforced hypotheses.
    if needed_new_informed_hyp > new_informed_hypotheses_count:
        needed_new_informed_hyp = new_informed_hypotheses_count
        needed_new_reinforced_hyp = (
            needed_new_sampled_hyp - new_informed_hypotheses_count
        )

    return (
        int(needed_old_sampled_hyp),
        int(needed_new_informed_hyp),
        int(needed_new_reinforced_hyp),
    )


calculate_new_hypotheses_counts(
    curr_hypotheses_count=100, new_informed_hypotheses_count=100
)
```

**Contributor:**

What exactly is the new_informed_hypotheses_count parameter for? Would be useful with some description here and on line 135.

**Contributor Author:**

These are the total number of informed hypotheses sampled based on the current pose observation. Assuming there are x points in the object graph, these can be 2x if the PC is defined or 8x if PC is undefined. When calculating the needed number of informed hypotheses (i.e., needed_new_informed_hyp), we should not request more than the available new_informed_hypotheses_count. If we request more, the difference will come from reinforced hypotheses, which are technically infinite.

Note that we do not need to actually get the informed hypotheses at this step to know how many we have; we can just use the number of points in the graph and whether the pose is defined.

I'll add some description.

**Contributor:**

Would be worth adding a comment here, e.g. if trying to include more old hypotheses than exist, set their count as the ceiling.

**Contributor:**

typo: Should be reinforced.
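Tracing the default parameters through the example call (assuming neither cap is triggered), the arithmetic works out as follows:

```python
hypotheses_count_ratio = 1
hypotheses_old_to_new_ratio = 0.5
hypotheses_informed_to_reinforce_ratio = 0.5

curr_hypotheses_count = 100
needed_total = curr_hypotheses_count * hypotheses_count_ratio                # 100
needed_old = needed_total * (1 - hypotheses_old_to_new_ratio)                # 50.0
needed_new = needed_total * hypotheses_old_to_new_ratio                      # 50.0
needed_informed = needed_new * (1 - hypotheses_informed_to_reinforce_ratio)  # 25.0
needed_reinforced = needed_new * hypotheses_informed_to_reinforce_ratio      # 25.0

counts = (int(needed_old), int(needed_informed), int(needed_reinforced))
# -> (50, 25, 25): half the pool is retained old hypotheses, and the new
#    half splits evenly between informed and reinforced hypotheses.
```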

# Alternatives Considered:
* Detect change with evidence decay and hard reset?
@@ -72,4 +162,4 @@
* Maybe we should brainstorm about what a terminal state should be in a multi-object environment.
* Maybe also brainstorm about ways to more seamlessly switch between matching and exploration? I don't like forcing Monty to always be in matching mode, by doing this it feels that we are making the gap between learning and inference wider.
**Contributor:**

Definitely two good topics to talk about more. I wrote my thoughts on both in earlier comments. lmk what you think.

* Start designing environments with multiple objects aiming towards compositional models.
* Will require motor policy change to allow for moving between objects.