# RFC on Hypotheses Resampling (#196)

`rfcs/0000_hypotheses_resampling.md`

- Start Date: 2025-02-27
- RFC PR: (leave this empty, it will be filled in after RFC is merged)

# Summary

Resample hypotheses at every step in a manner inspired by particle filters. This is the first step toward Monty interacting with multiple objects and recognizing compositional objects. The newly sampled hypotheses will come from:
1) a subset (uniformly sampled) of new hypotheses initialized based on the current step observation.
2) a set of newly sampled hypotheses from the distribution of the most likely hypotheses.
3) a subset of the old hypotheses based on the metric representing the most likely hypotheses.
> **Contributor:** I found the wording of (3) a bit confusing/hard to read for some reason. Maybe something like "A subset of the old hypotheses. Which of these are maintained is based on...".
>
> Also for 2 and 3, my understanding is we are basing resampling on the first-order derivative of their evidence accumulation. It might be worth saying something like "... the most rapidly rising hypotheses" rather than "most likely".
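To make the composition of the three sources concrete, here is a minimal sketch of what assembling one resampled hypothesis set might look like. All names are hypothetical and the selection details (the score used, the jitter scale) are assumptions for illustration, not the RFC's final design:

```python
import numpy as np

def resample_step(old_hyps, scores, informed_hyps, n_old, n_informed, n_reinforced, rng):
    """Assemble the next hypothesis set from the three sources above."""
    # (3) keep the old hypotheses that score best under the chosen metric
    kept = old_hyps[np.argsort(scores)[len(scores) - n_old:]]
    # (1) uniformly sample from the newly initialized informed hypotheses
    informed = informed_hyps[rng.choice(len(informed_hyps), size=n_informed, replace=False)]
    # (2) draw new hypotheses near the most promising kept ones, with small jitter
    anchors = kept[rng.choice(len(kept), size=n_reinforced)]
    reinforced = anchors + rng.normal(0.0, 0.01, size=anchors.shape)
    return np.concatenate([kept, informed, reinforced])

rng = np.random.default_rng(0)
old = rng.uniform(size=(100, 3))       # existing hypothesis locations
informed = rng.uniform(size=(200, 3))  # hypotheses informed by the new observation
next_set = resample_step(old, rng.normal(size=100), informed, 50, 25, 25, rng)
print(next_set.shape)  # (100, 3)
```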


# High-Level Motivation

In an unsupervised experiment setup, Monty may be presented with multiple objects in a single episode. Ideally, we would like to move away from the traditional data-loading setup of machine learning, where there is a strict definition of an epoch, episode, and step. As Monty starts to interact with the real world, the definitions of epoch and episode will start to fade away and we'll be left with simple time discretization (i.e., the step). The current definitions are:
* Epoch: Used by the experiment class to denote one full pass through all the objects at a specified rotation.
* Episode: Denotes a change in object.
* Step: Denotes a single sensation and action in the sensorimotor framework.

Real-world interactions do not have epochs or episodes (these are only used for performance benchmarks); instead, we can imagine the agent wandering around in a multi-object dynamic environment. The objects can be occluded, moving, or even disappearing behind new objects. The objects could also be compositional, such as a logo on a coffee mug.

**We want Monty to handle dynamic environments by seamlessly switching from one object to another as its sensors move around on the different, potentially compositional, objects.**

*We note that for learning, we will continue to assume for now that Monty learns about objects in an isolated manner (i.e., one at a time), whether or not it receives a supervisory signal in the form of an object label. This is akin to a child holding an object and devoting its attention to it to the exclusion of the rest of the world (something which the nearsightedness of infants may actually assist with). Relaxing this learning assumption is therefore a separate topic for future work.*

# The Problem
Monty is designed to receive a weak supervision signal during inference when an episode ends and a new episode begins (denoting a change of object). This signal performs a full reset of all states within Monty, including counters, buffers, goal-state generators, learning modules, and sensory modules. Additionally, this reset sets Monty back into matching mode. The figure below shows where this resetting is done. Most of the resetting happens in the `pre_episode` functions of the Monty, SM, and LM classes.

![Monty Reset Logic](0000_hypotheses_resampling/monty_reset_logic.png)

If we simply disable this resetting signal for the Monty class (and, by extension, the SMs and LMs) between episodes, a single step on a new object will not produce a large enough evidence update to get out of a terminal state. Monty will still think it is seeing the old object after getting a single observation of the new object. See the plot below.

![Single Step Terminal State](0000_hypotheses_resampling/terminal_state_reached.png)

To overcome this, I manually call `reset_episode_steps()` so that `matching_steps` stays under `min_eval_steps`, allowing Monty time to gather enough evidence on the new object. Additionally, I manually call `switch_to_matching_step()` between episodes, since the `_compute_possible_matches` function that accumulates evidence is only called during matching, not exploration. This results in the following plot.

![No Resampling](0000_hypotheses_resampling/no_resampling.png)

This reveals the main problem: Monty is still unable to accumulate evidence on the existing hypotheses. The current implementation of Monty uses `_get_all_informed_possible_poses()` to initialize hypotheses after seeing a single pose of the object. This is a smart way to reduce the number of initial hypotheses based on the principal curvature, but it assumes that the object doesn't change and that these hypotheses will always be valid. However, when we change the object, we need to update these initial hypotheses based on a pose observation of the new object. A simple test of sampling additional hypotheses (with informed poses) on the second object's pose shows that we are able to accumulate evidence on these new hypotheses. See the figure below.
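To illustrate what "informed" initialization means here, below is a simplified sketch (my own illustration, not `_get_all_informed_possible_poses()` itself): each stored point normal yields a candidate rotation that aligns the sensed normal with it; principal curvature directions (ignored below) would further constrain the remaining in-plane rotation.

```python
import numpy as np

def informed_pose_hypotheses(sensed_normal, stored_normals):
    """One candidate rotation per stored normal, aligning sensed -> stored."""
    hyps = []
    for target in stored_normals:
        v = np.cross(sensed_normal, target)
        c = float(np.dot(sensed_normal, target))
        vx = np.array([[0, -v[2], v[1]], [v[2], 0, -v[0]], [-v[1], v[0], 0]])
        # Rodrigues' formula for the rotation taking sensed_normal onto target
        # (assumes unit vectors and target != -sensed_normal)
        hyps.append(np.eye(3) + vx + vx @ vx / (1 + c))
    return hyps

stored = [np.array([0.0, 0.0, 1.0]), np.array([0.0, 1.0, 0.0])]
print(len(informed_pose_hypotheses(np.array([1.0, 0.0, 0.0]), stored)))  # 2
```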

*Note that even when testing a single object, a noisy initial pose observation can affect the quality of the initially sampled hypotheses. Using these incorrect hypotheses (without resampling) will limit Monty's performance until the end of the episode.*

![Resampling](0000_hypotheses_resampling/resampling_banana_mug.png)

# The Proposed Solution


Monty doesn't know when the object will be swapped, or whether the next observation will land on a different object. Because an object change can happen at any step, we have to resample hypotheses every step in a systematic manner. Inspired by particle filters, we propose a modified resampling procedure.

## A New Metric

We currently use the total evidence score to decide which hypotheses are more promising. This works if the object doesn't change, because the accumulated evidence scores are tied to a specific object. I propose using the mean slope of evidence (over the last S steps) to decide on the goodness of a hypothesis. After swapping the object, the most promising hypotheses are the ones with the highest positive slope, not the ones with high accumulated evidence.

Why:
* **Faster**: We don't have to wait for high unbounded evidence to decay enough to realize that a new hypothesis is more likely. We also may not need to worry about initializing new hypotheses with mean evidence to give them a fighting chance against the old hypotheses; the average slope is fairer in this sense.
* **Accurate resampling**: If we sample new hypotheses close to the hypotheses with high total accumulated evidence (e.g., as in a particle filter), we could be sampling from incorrect hypotheses (if we had just switched objects). If we sample close to the hypotheses with a high evidence slope, we may converge faster.
* **Practical**: The total evidence can still be unbounded; it doesn't matter, because we only consider the slope. This metric does not care about how much evidence has already been accumulated. In other words, a hypothesis with high evidence can be removed if it hasn't accumulated evidence in a while, while a consistently growing hypothesis is less likely to be removed even if it was just added.
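As a minimal sketch of the metric (illustrative only; the array layout and window handling are assumptions, not Monty's implementation), the score reduces to the mean step-to-step evidence change over the last S steps:

```python
import numpy as np

def mean_evidence_slope(evidence_history: np.ndarray, s: int) -> np.ndarray:
    """Mean per-step evidence change over the last `s` steps.

    evidence_history: (num_steps, num_hypotheses) evidence values per step.
    """
    window = evidence_history[-(s + 1):]  # s deltas need s + 1 samples
    return np.diff(window, axis=0).mean(axis=0)

# Hypothesis 0 has more total evidence, but hypothesis 1 is rising faster
# and therefore ranks higher under the slope metric.
history = np.array([[10.0, 1.0], [10.5, 2.5], [11.0, 4.0], [11.5, 5.5]])
print(mean_evidence_slope(history, s=3))  # -> [0.5 1.5]
```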
> **Contributor:** thought: What about the other end of the spectrum? Would another practical reason be that evidence could be tracked for S steps only? If my cotton candy becomes nothing in water, does it matter how much evidence I stored for cotton candy now that it all disappeared?
>
> *(raccoon-cotton-candy gif)*

> **Author:** Yes, good point. Considering the average evidence slope over the last S steps only will allow us to forget about objects that disappear in favor of other objects with rapidly rising evidence. It doesn't matter if the old object quickly accumulated evidence in the past (i.e., before S steps). But that's another parameter to tune..

> **Contributor:** Yeah, many thoughts around this! It seems like cotton candy disappearing in water may actually be a learned object behavior (and if you haven't learned it, like this raccoon, you are quite surprised). Just making the evidence horizon shorter will have similar effects as we saw when using bounded evidence (past_weight+present_weight=1), where our current policies are not efficient enough to explore a sufficient area of the object to make a confident classification. Of course, we could mitigate this with more efficient policies, but it seems like forgetting about stuff is more than just a time-horizon parameter. I originally thought of it this way as well (which is why I suggested testing bounded evidence first as a solution for switching hypotheses when moving from one object to another), but it seems like there should be more rapid model-free and model-based mechanisms at play here, such as using predictions and prediction errors to reset hypotheses (like described here: #196 (comment)).


## Assumptions and Constraints
* Sampling of likely and unlikely hypotheses is based on evidence change instead of absolute evidence.
* Terminal state calculation is still based on accumulated evidence and `x_percent_threshold`.
* The resampling procedure only occurs if principal curvature is defined.
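For context on the second constraint above, here is a hedged sketch of a terminal-state check based on accumulated evidence and `x_percent_threshold` (illustrative only, not Monty's actual implementation): recognition terminates when the runner-up hypothesis falls outside an x-percent window below the top hypothesis' evidence.

```python
def reached_terminal_state(top_evidence: float, second_evidence: float,
                           x_percent_threshold: float) -> bool:
    # terminal when the runner-up is outside x% of the top accumulated evidence
    return second_evidence < top_evidence * (1 - x_percent_threshold / 100)

print(reached_terminal_state(20.0, 15.0, x_percent_threshold=20))  # True
```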
> **Contributor:** This could be a pretty limiting constraint. Consider an object like a ball, where PC is not well defined anywhere on the object. Why not still use the point normal to inform new hypotheses? If you are worried about adding too many new hypotheses, we could set the number of sampled hypotheses in get_more_directions_in_plane lower.

> **Author:** I'm just worried that we would be replacing more accurate hypotheses with less accurate ones, especially if the object hasn't changed. For example, going from an area of defined PC to undefined PC. The hypotheses generated without a principal curvature are many and very coarse (9 per axis). It may not be too much of a concern because we only replace the unlikely hypotheses; I'm just worried that resampling every step will eventually run out of unlikely hypotheses and start replacing some good hypotheses too. What do you think?
>
> You raise a good point on the ball object. Maybe add fewer "semi-informed" hypotheses when PC is undefined?

> **Contributor:** Yeah, just elaborating on this further: for objects like the ball, we shouldn't need a ton of hypotheses to recognize them, because the poses are symmetric. So I think this fits with the suggestion that if pose is undefined, we can add some hypotheses, but rather than adding them for e.g. 8 different rotations, we can just add them for, say, 4 (or even 2). If it is an object that is less symmetric than a sphere, then this should hopefully get accounted for by informed hypotheses that are sampled when we get onto more pose-defining features.
>
> (This also couples well with Monty's approach to symmetry, i.e. it doesn't matter if the poses it samples and converges to aren't the Euler-angle equivalent of the ground-truth pose, especially if it has stored which poses are equivalent, which is a feature in the pipeline.)

> **Author:** Yeah, makes sense. For what it's worth, I won't be directly controlling the number of different rotations to be sampled based on PC defined/undefined. Instead, I will be uniformly sampling a fixed number of informed hypotheses based on the parameters introduced in this RFC. For example, if I need to get 100 informed hypotheses, I will uniformly sample them from 2000 hypotheses (with PC defined) or 8000 hypotheses (with PC undefined).
>
> My concern here was mainly for scenarios where the sensor moves on the side of the coffee mug for a while (i.e., PC defined) and then moves to the bottom (i.e., PC undefined). Sampling from a bigger pool of inaccurate hypotheses to replace more accurate hypotheses that were sampled when PC was well defined may cause some problems. That being said, it may turn out to be empirically insignificant.


## The Resampling Procedure

1) If principal curvature is not defined, skip.
2) For every object, at every new step, get initial hypotheses based on the observed pose.
3) Calculate the needed hypotheses counts to be sampled based on the defined parameters, as shown [here](#The-Resampling-Count-Calculation).
4) Sample `needed_old_sampled_hyp` from the existing hypotheses based on the highest evidence slope. We keep these old hypotheses and remove the rest.
5) Uniformly sample `needed_new_informed_hyp` from the new informed hypotheses. We add these new hypotheses.
6) Sample `needed_new_reinforced_hyp` from the distribution of the existing hypotheses with the highest evidence slope. These sampled hypotheses should be "close" to the most likely hypotheses.

> **Contributor:** (on step 1) Based on the separate discussion, this may need to be updated.

> **Author:** Sounds good, I've removed this from the resampling procedure and the constraints.
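A hedged sketch of step 6 (illustrative only; the softmax-style weighting and jitter scale are assumptions, not the RFC's final design): reinforced hypotheses are drawn in proportion to evidence slope, so rapidly rising hypotheses spawn more nearby samples.

```python
import numpy as np

def sample_reinforced(hyp_locations, evidence_slopes, n_reinforced, rng):
    """Draw new hypotheses near existing ones, weighted by evidence slope."""
    # softmax-style weights: hypotheses with a higher slope are picked more often
    weights = np.exp(evidence_slopes - evidence_slopes.max())
    weights /= weights.sum()
    idx = rng.choice(len(hyp_locations), size=n_reinforced, p=weights)
    # jitter the selected locations so samples stay "close" to their anchors
    return hyp_locations[idx] + rng.normal(0.0, 0.005, size=(n_reinforced, hyp_locations.shape[1]))

rng = np.random.default_rng(0)
locations = rng.uniform(size=(10, 3))  # 10 existing hypothesis locations
slopes = rng.normal(size=10)           # their mean evidence slopes
print(sample_reinforced(locations, slopes, 5, rng).shape)  # (5, 3)
```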


## High-Level Code Changes

The needed modification will only change the `EvidenceLM` class. More specifically, we would either modify the `_update_evidence` function directly or define another function that calls `_update_evidence` as one of its steps. A rough proposal of the needed changes is shown below.

> **@nielsleadholm (Contributor), Mar 4, 2025:** "be either be modifying" ... "defining" --> "modify either" ... "define" (noticed a typo, but also a useful reminder to use active voice in writing 😅)
> **Contributor:** For function number 8, it would be nice to refactor out the existing section below into its own function so that it can be used, ideally as-is, both i) in the existing _update_evidence, and ii) in the new setting you are considering.
>
> Lines 952-1002 in evidence_matching.py:
>
> ```python
>         # Before moving we initialize the hypothesis space:
>         if displacements is None:
>                 ...
> ```

> **Author:** Yes, makes sense. I may end up rewriting and factoring out parts of _update_evidence in the process. I was thinking of just rewriting _update_evidence to support resampling and breaking it up into manageable smaller functions.

> **Contributor:** Nice, yeah, that would be great. It's a pretty big method at the moment.


> **Contributor:** It would be worth thinking about whether function 6, "Resample old and reinforced", can be broken up into two methods; at least on the surface it seems like these should probably be separated out.

> **Author:** Good point. The main purpose of the proposed figure is to show roughly where the changes will occur within the matching_step, but I imagine that the exact number of functions will change as I start implementing. I may even do all the changes in update_evidence as discussed above. But at least this gives you an idea of where I plan to start.

![code change](0000_hypotheses_resampling/high_level_code_change.png)

*Note that the two types (reinforced and informed) of resampled hypotheses are treated differently.
Unlike reinforced hypotheses, the current positions of the informed hypotheses do not need to be rotated and displaced at resampling.
Informed hypotheses are added after the reinforced (and old) hypotheses are updated/displaced.*
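A minimal sketch of that ordering (illustrative names, not Monty's actual API; per-hypothesis rotation is omitted for brevity): existing hypotheses are displaced to follow the sensor's movement, while informed hypotheses are appended afterwards, already expressed at the currently observed pose.

```python
import numpy as np

def apply_resampled_hypotheses(existing_locations, displacement, informed_locations):
    # old + reinforced hypotheses follow the sensor's displacement (rotation omitted)...
    moved = existing_locations + displacement
    # ...while informed hypotheses are appended as-is, at the observed pose
    return np.concatenate([moved, informed_locations])

existing = np.zeros((4, 3))  # 4 existing hypotheses in 3D
informed = np.ones((2, 3))   # 2 newly informed hypotheses
print(apply_resampled_hypotheses(existing, np.array([0.0, 0.0, 0.1]), informed).shape)  # (6, 3)
```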
> **Contributor:** You still plan on adding feature evidence to the new informed hypotheses, right? (like in _calculate_feature_evidence_for_all_nodes)

> **Author:** Yes. I would need to compare with sensed features and add evidence if needed.

> **Author:** At this point, I'm probably better off rewriting _update_evidence with support for resampling.



## The Resampling Count Calculation <a name="The-Resampling-Count-Calculation"></a>

This is a proposed approach to calculating the new set of hypotheses at every step. It takes care of dynamically increasing or decreasing the hypotheses count. For example, if we want to start with a coarse sampling of hypotheses and then increase their number (or vice versa), we can do that here.

I'm introducing three new parameters. The naming of these parameters is preliminary and subject to change.

| Parameter | Description | Range |
| ----------|-------------|-------|
| **hypotheses_count_ratio** | A multiplier for the needed number of hypotheses at this new step. `1` keeps the number of hypotheses the same | [0, inf) |
| **hypotheses_old_to_new_ratio** | How many old vs. new hypotheses to be added. `0` means all old, `1` means all new | [0, 1] |
| **hypotheses_informed_to_reinforce_ratio** | How many informed (sampled based on the newly observed pose) vs. reinforced (sampled close to existing likely hypotheses) hypotheses to be added. `0` means all informed, `1` means all reinforced | [0, 1] |

On `hypotheses_count_ratio`:

> **Contributor:** Does this mean if it is 1, the number of hypotheses remains the same?
>
> Ah, I see this below now :) Could be worth adding to the description here too, since you have that in the description of the other params as well and it's useful info.

> **Author:** Yeah, I'll add that.

On `hypotheses_old_to_new_ratio`:

> **Contributor:** issue: I find the use of _ratio in hypothesis_old_to_new_ratio (and this table) confusing. I would expect a ratio to not be bound to [0, 1]. When I think ratio of old to new, I think of phrases like 2 old to 1 new (2:1), which would be 2 numerically. Maybe this could be called hypothesis_old_to_new_range|mix|factor?

> **Author:** I see your point here. Yes, the actual parameter is not a ratio, but instead it allows us to set the ratio (e.g., 0.5 would give us 1:1). I like factor. We can also call it weight, as in the weight of old to new in the hypotheses mix.

On `hypotheses_informed_to_reinforce_ratio`:

> **Contributor:** I think it would be useful to define informed and reinforced earlier in the document, as these terms are used a few times but are a bit confusing without a clear description.

> **Author:** Yeah, good idea. I've added them to the summary. Also, I came up with "reinforced" and I'm not sure if it's the best term, but we can go with it for now if no objections.


*Note that it is possible to configure these parameters to remove the effect of resampling and return to the current `EvidenceLM` behavior. Simply set `hypotheses_count_ratio=1` to keep the same number of hypotheses and `hypotheses_old_to_new_ratio=0` to sample only from existing hypotheses.*



```python
# a multiplier for the needed number of hypotheses at this new step;
# 1 keeps the number of hypotheses the same
hypotheses_count_ratio = 1

# 0 means all sampled from old, 1 means all sampled from new
hypotheses_old_to_new_ratio = 0.5

# 0 means all sampled from informed hypotheses, 1 means all sampled from reinforced hypotheses
hypotheses_informed_to_reinforce_ratio = 0.5


def calculate_new_hypotheses_counts(
    curr_hypotheses_count, new_informed_hypotheses_count
):
    # calculate the total number of hypotheses needed
    needed_hypotheses_count = curr_hypotheses_count * hypotheses_count_ratio

    # calculate how many old and new hypotheses are needed
    needed_old_sampled_hyp, needed_new_sampled_hyp = (
        needed_hypotheses_count * (1 - hypotheses_old_to_new_ratio),
        needed_hypotheses_count * hypotheses_old_to_new_ratio,
    )
    # if more old hypotheses are requested than exist, cap their count at the
    # number of existing hypotheses and draw the remainder from new ones
    if needed_old_sampled_hyp > curr_hypotheses_count:
        needed_old_sampled_hyp = curr_hypotheses_count
        needed_new_sampled_hyp = needed_hypotheses_count - curr_hypotheses_count

    # calculate how many informed and reinforced hypotheses are needed
    needed_new_informed_hyp, needed_new_reinforced_hyp = (
        needed_new_sampled_hyp * (1 - hypotheses_informed_to_reinforce_ratio),
        needed_new_sampled_hyp * hypotheses_informed_to_reinforce_ratio,
    )
    # informed hypotheses are capped by how many were actually generated from
    # the current observation; any excess is drawn from reinforced hypotheses
    if needed_new_informed_hyp > new_informed_hypotheses_count:
        needed_new_informed_hyp = new_informed_hypotheses_count
        needed_new_reinforced_hyp = (
            needed_new_sampled_hyp - new_informed_hypotheses_count
        )

    return (
        int(needed_old_sampled_hyp),
        int(needed_new_informed_hyp),
        int(needed_new_reinforced_hyp),
    )


calculate_new_hypotheses_counts(
    curr_hypotheses_count=100, new_informed_hypotheses_count=100
)
```

> **Contributor:** (on the `new_informed_hypotheses_count` parameter) What exactly is the new_informed_hypotheses_count parameter for? Would be useful with some description here and on line 135.

> **Author:** These are the total number of informed hypotheses sampled based on the current pose observation. Assuming there are x points in the object graph, these can be 2x if the PC is defined or 8x if PC is undefined. When calculating the needed number of informed hypotheses (i.e., needed_new_informed_hyp), we should not request more than the available new_informed_hypotheses_count. If we request more, the difference will come from reinforced hypotheses, which are technically infinite.
>
> Note that we do not need to actually get the informed hypotheses at this step to know how many we have; we can just use the number of points in the graph and whether the pose is defined.
>
> I'll add some description.

> **Contributor:** (on the old-hypotheses cap) Would be worth adding a comment here, e.g. if trying to include more old hypotheses than exist, set their count as the ceiling.

> **Contributor:** typo: "re-enforced" should be reinforced.
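With the example values above (`hypotheses_count_ratio=1`, `hypotheses_old_to_new_ratio=0.5`, `hypotheses_informed_to_reinforce_ratio=0.5`, and 100 current and 100 informed hypotheses), the call returns `(50, 25, 25)`: keep the 50 old hypotheses with the highest evidence slope, add 25 uniformly sampled informed hypotheses, and add 25 reinforced hypotheses. Setting `hypotheses_count_ratio=1` and `hypotheses_old_to_new_ratio=0` instead returns `(100, 0, 0)`, recovering the current no-resampling behavior described in the note above.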

# Alternatives Considered
* Detect change with evidence decay and a hard reset?
  * Here we would focus on detecting the change that triggers a full reset of Monty's states.
  * Detecting the object change will likely be based on evidence decay, which can be problematic. If an object is introduced and quickly swapped out before accumulating enough evidence, it will be difficult to detect the change based on evidence decay. This would still result in incorrect initial hypotheses. Resampling without a hard reset is more general.

# Future possibilities
> **@nielsleadholm (Contributor), Feb 28, 2025:** One thing I've been thinking about in terms of the artificial nature of this task (which is also something we've discussed before) is that this current setup is a bit like a "magic trick" where someone puts their hand in a box to feel an object with one finger, then after a couple of minutes, someone discreetly replaces the object without their finger really feeling a major change. In other words, it's not very naturalistic. What we are actually trying to build towards is being able to seamlessly move from one sub-object to another if we have e.g. a logo on a mug, or a car composed of parts, and for the LM representation to update without needing a supervisory signal that it's on a new object. In the compositional case however, if the LM was 100% confident about where it was on a learned reference frame (e.g. on the "N" in the child-object logo), after performing a movement onto the parent object (e.g. mug), its hypothesized location in the child object's reference frame would now be in empty space, far from any learned features that exist for that object. In that case, the evidence associated with this "off-object" location should essentially be zero by definition, and it should be straightforward to implement this.
>
> In the above case, resampling would still be important, but I think it just emphasizes that the evidence scores could be a lot less adversarial than in the current experiment setup.
>
> For what it's worth, in the above situation, if you moved back onto the object, I think it would be the responsibility of a higher-level LM that remembers e.g. logo at location x to bias the low-level LM, rather than requiring the low-level LM to maintain a memory of the previous object. This would tie in with resampling, i.e. we wouldn't need to worry about keeping lots of sampling points for the old object if we've moved off of it.
>
> @vkakerbeck what do you think? It almost makes me wonder whether we would be better off having the compositional dataset to evaluate these approaches on, rather than trying to shoehorn it into the current YCB object-cycling setup. This is similar to some of Jeff's comments at the retreat. Maybe something we can also discuss on Monday.

> **Contributor:** Good points. I think this may relate to the mechanism for switching between matching and exploration policies we discussed briefly:
>
> * If we have recognized an object and are now exploring it, we don't need to keep testing hypotheses; we just make predictions using our model of this object and check if these predictions are correct.
> * We would exit this mode if 1) our prediction is suddenly incorrect (the artificial setup where you suddenly replace the object, or you were wrong about the object) or 2) the sensor moves to a location where the model wouldn't predict the object to exist anymore (moving off the object), which is the usual, more natural case.
> * The second case would give us a model-based signal for switching back into matching mode and potentially reinitializing the entire hypothesis space.
> * In my mind we are currently dealing with the case where we have actually not recognized the object yet, so we can't make model-based predictions. I mean, you could argue that we can make model-based predictions using the mlh (which we do when testing that hypothesis), but it is not so reliable a signal that it could be used for a hard reset.
>
> In terms of the test bed we use, I think that we can keep using the current setup for the hypothesis-resampling work, since this mechanism doesn't rely on physical movement from one object onto another. It should actually be useful for recognizing a single object in general (removing reliance on the first observation and allowing us to test fewer hypotheses at any point in time).
>
> However, it does bring up a good point on whether this item is really the most impactful one for us to tackle for the unsupervised learning milestone. We could evaluate whether it makes more sense to start with the prediction-error/off-object prediction signal I outlined. In that case we would definitely need a multi-object environment setup, since this relies on a physical separation between the two objects.
>
> Lmk what you think. Happy to talk about this more in a meeting too.

> **Contributor:** Yeah, I think my concern with optimizing for this current problem setup is that we may end up with solutions that aren't actually that necessary/useful. E.g. clearly the hypothesis resampling is useful, but the exact details of it might be led on a tangent by trying to deal with this adversarial (and artificial) problem setup.
>
> * One option is to just use this as you say, but we should be cognizant of this caveat and not worry about getting super high accuracy in this setting, and just consider it a test-bed for resampling.
> * The other option is, after this RFC, to refocus on the actual dataset we want to deal with. A lot of unsupervised learning features aren't actually important if we are learning a dataset where compositional objects can be learned in isolation, in which case we can potentially refocus the items under the first milestone.

> **@ramyamounir (Author), Mar 3, 2025:** I agree with @nielsleadholm that this setup is not very naturalistic and might lead to some unnecessary solutions because we're optimizing for a much harder experiment. I like the option where we consider this a test-bed for resampling and don't worry about super high accuracy. In practice, Monty should be able to always accumulate evidence even when we switch objects in place, but we don't need to spend too much time optimizing the speed of inference between objects.
>
> I also like @vkakerbeck's distinction between model-based and model-free evidence accumulation. If we have not recognized a model, we can't really use a model to make good predictions. This is where hypotheses resampling is most useful: to seamlessly adapt to changes while still trying to recognize the object. But if we have recognized the model, we can make predictions and figure out when a "hard" reset might be needed. This would require that we continuously test the recognized model at every exploratory step. My only concern here is how do we know that a prediction error due to an unseen part of the object (e.g., a previously unseen handle on a mug) should be treated as part of the object (i.e., add it to the cylinder graph) vs. trigger a prediction-error reset for recognizing a new object that happens to be spatially close? Is this problem specific to learning-from-scratch?
>
> I am a bit biased towards keeping this setup until we implement a decent hypotheses-resampling LM, because multiple objects would require us to think about (hierarchical) action policies to move between objects.
>
> * If it's a compositional object, such as wheels on a car, we would use a hypotheses-based hierarchical action policy. For example, if I know the higher-level LM is on the car and the lower-level LM is on the front wheel, the higher-level LM can instruct the lower-level LM to move to the back wheel and reset its hypotheses (and perhaps even bias the new hypotheses with extra evidence for the back-wheel object, as Niels mentioned).
> * If we are not on a compositional object, we need to think about model-free action policies to move from one object to another. Maybe we can hard-code that for now as a hack (i.e., after x steps, move the sensor to object 2 at a known location)? Note that the "model" in "model-free" here refers to the higher-level compositional model.
>
> @nielsleadholm when you say compositional dataset, are you talking about the dinner set scenes?

> **@nielsleadholm (Contributor), Mar 3, 2025:** Great, Viviane and I were discussing this earlier this morning, and yes, what we are thinking is:
>
> * Clarify that the current work to enable a first version of compositional objects is about unsupervised inference, not learning. For now, we assume that we are going to learn sub-objects in isolation (e.g. mug and logo, see below), and so it's more about equipping Monty to handle inference where it may move from one object onto another.
> * Continue the current work you are doing. Adding the ability to resample hypotheses will be important for several aspects of compositional objects. This first version would actually be best tested on the existing benchmarks by seeing whether we can get more robust inference with the same number of initial points, or equally achieve the same accuracy by starting with fewer initial points. We can still use this artificial object-swapping task as an adversarial condition to magnify the benefits of resampling, but we don't need to optimize parameters to achieve high performance on it.
> * Later, create an RFC proposing how we can use more model-based resetting based on moving off of an object. I agree this will be an interesting question to explore in terms of untangling prediction errors that should inform learning vs. hypothesis resetting. E.g. maybe the default is that the hypothesis space is reset, but with some evidence accumulating that the model might be incomplete. Given enough movements with a consistent prediction error, this might trigger a more learning-style phase. But there are clearly complications to work through.
> * The compositional dataset would initially be a simple one consisting of different objects with different logos on their surfaces. That way we can learn the sub-components in isolation. This ties in with simplifying the first part of the research to focus on unsupervised inference; in the future we can return to unsupervised learning to enable more complex compositional datasets.

> **Contributor:** I think another future work item would be to test if we can use this mechanism to make Monty more efficient. Basically, we don't have to initialize 1000s of hypotheses on the first step. We can initialize a small subset and then iteratively refine these hypotheses. This was one of my initial excitements about this idea.

> **Author:** Yeah, good point! I find it interesting that you said start with fewer hypotheses and add more, because that's the opposite of what they explained here on particle filters. They started with many particles and reduced the number as the model started narrowing down the location (i.e., a simpler distribution can be represented with fewer particles). I can see good arguments for both ways, actually.
>
> I know you mentioned this as part of future work, but I think it makes sense to accommodate it here. I've updated the RFC to allow for a changing count of hypotheses. So if we want to start with a small number and increase the hypotheses by 10% every step, we can define hypotheses_count_ratio=1.1. We can also control where these hypotheses come from using the hypotheses_old_to_new_ratio and hypotheses_informed_to_reinforce_ratio.

> **Author:** Upon reading your comment again, I see now that you said "iteratively refine", not increase 😅 Yeah, makes sense!

* This only works in inference, assuming that we have good object models learned in isolation. It likely breaks learning-from-scratch experiments.
> **Contributor:** Why would you think it would break learning from scratch? Do you mean because we would never have all hypotheses go below 0 evidence and can therefore never classify no_match? I would be curious about the details of how you plan to initialize the evidence values for the new hypotheses. I think we should try to make this mechanism work in the learning setting as well. Maybe by initializing with the average evidence, which could be negative? I'd have to think about this a bit more.

> **Author:** I think I am more concerned about environments with multiple objects during the learning phase. If we assume objects are learned in isolation, then these pretrained graphs will work fine during inference. But in learning from scratch, don't we use the same environment for learning and inference? If we start a multi-object episode, wouldn't we have to use it during learning too? Then objects won't be learned in isolation, and the graphs we use for matching might end up being graphs of multiple objects. I could be missing something here..

> **Contributor:** There is nothing that forces us to use the same environment setup during training as during evaluation. We could first show objects in isolation while not providing labels (making it a learning-from-scratch setup) and then show them in multi-object setups. I'm not saying this would work great; even in the current unsupervised learning setup with isolated objects, we often merge multiple objects into the same graph. I'm just saying that not providing labels doesn't keep us from showing objects in isolation initially.

> **Author:** I see. I had the wrong assumptions about the learning-from-scratch setup then.

* I manually force Monty back into the matching phase and reset its counters between episodes.
* Maybe we should brainstorm about what a terminal state should be in a multi-object environment.
* Maybe also brainstorm about ways to more seamlessly switch between matching and exploration? I don't like forcing Monty to always be in matching mode; it feels like we are widening the gap between learning and inference.
> **Contributor:** Definitely two good topics to talk about more. I wrote my thoughts on both in earlier comments. Lmk what you think.

* Start designing environments with multiple objects, aiming towards compositional models.
  * This will require a motor-policy change to allow for moving between objects.