RFC on Hypotheses Resampling #196
Conversation
rfcs/0000_hypotheses_resampling.md
# Summary

Resample hypotheses at every step in a manner inspired by particle-filters. This is the first step for Monty to interact with multiple objects and recognize compositional objects.
A link to some background reading on particle filters would be great.
Ah right. @vkakerbeck shared this video a week ago. I thought it was really helpful.
https://www.youtube.com/watch?v=NrzmH_yerBU
I can add it to the RFC if needed.
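In case it helps to have something concrete next to the video, here is a tiny, self-contained sketch of the classic particle-filter resample step (toy 1D example, not Monty code; all values are made up):

import numpy as np

rng = np.random.default_rng(0)

# Toy 1D example: particles are candidate states, weights reflect how well
# each particle explains the latest observation.
particles = rng.uniform(0.0, 10.0, size=1000)
observation = 4.2
weights = np.exp(-0.5 * ((particles - observation) / 0.5) ** 2)
weights /= weights.sum()

# Resample in proportion to the weights: likely particles get duplicated,
# unlikely ones are dropped, and the particle count stays the same.
indices = rng.choice(len(particles), size=len(particles), p=weights)
resampled = particles[indices] + rng.normal(0.0, 0.05, size=len(particles))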
Also maybe check out this (more specific to Monty): https://thousandbrainsproject.readme.io/docs/test-particle-filter-like-resampling-of-hypothesis-space
suggestion: Both of these links would be useful to include in the RFC.
Nice, thanks for putting this together! I like the idea of using the slope as a signal for sampling, i.e. even when the total evidence counts might be low.
Just a couple of comments, e.g. I think it would be worth also emphasizing the uniform sampling, because I think this will still be useful; otherwise we might not see a good slope for the new object in the first case? At least based on the second plot, it looks like it takes a while for the slope to appear.
I've also just written a high-level comment about the dataset we want to evaluate this work on, as this might be something we want to revisit.
rfcs/0000_hypotheses_resampling.md
1) For every object, at every new step, get initial hypotheses based on the observed pose.
2) If principal curvature is not defined, skip
3) Uniformly sample M percent from these new hypotheses (e.g., M = 20% = 1000 hyp)
I would mention further up in the RFC that we would also do some uniform sampling (which I think is reasonable, but I think this is the first time it is mentioned).
I've modified the summary section. Let me know if it's not clear enough.
* Here we would focus on detecting the change that triggers full reset of Monty states.
* Detecting the object change will likely be based on evidence decay and can be problematic. If an object was added and quickly changed before accumulating enough evidence, it will be difficult to detect change based on evidence decay. This will still result in incorrect initial hypotheses. Resampling without a hard reset is more general.

# Future possibilities
One thing I've been thinking about in terms of the artificial nature of this task (which is also something we've discussed before) is that this current setup is a bit like a "magic trick" where someone puts their hand in a box to feel an object with one finger, then after a couple minutes, someone discreetly replaces the object without their finger really feeling a major change. In other words, it's not very naturalistic. What we are actually trying to build towards is being able to seamlessly move from one sub-object to another if we have e.g. a logo on a mug, or a car composed of parts, and for the LM representation to update without needing a supervisory signal that it's on a new object. In the compositional case however, if the LM was 100% confident about where it was on a learned reference frame (e.g. on the "N" in the child object logo), after performing a movement onto the parent object (e.g. mug), its hypothesized location in the child object's reference frame would now be in empty space, far from any learned features that exist for that object. In that case, the evidence associated with this "off-object" location should essentially be zero by definition, and it should be straightforward to implement this.
In the above case, resampling would still be important, but I think it just emphasizes that the evidence scores could be a lot less adversarial than in the current experiment setup.
For what it's worth, in the above situation, if you moved back onto the object, I think it would be the responsibility of a higher level LM that remembers e.g. logo at location x to bias the low-level LM, rather than requiring the low-level LM to maintain a memory of the previous object. This would tie in with resampling, i.e. we wouldn't need to worry about keeping lots of sampling points for the old object if we've moved off of it.
@vkakerbeck what do you think? It almost makes me wonder whether we would be better off having the compositional dataset to evaluate these approaches on, rather than trying to shoehorn it into the current YCB object cycling setup. This is similar to some of Jeff's comments at the retreat. Maybe something we can also discuss on Monday.
Good points. I think this may relate to the mechanism for switching between matching and exploration policies we discussed briefly:
- If we have recognized an object and are now exploring it, we don't need to keep testing hypotheses, we just make predictions using our model of this object and check if these predictions are correct.
- We would exit this mode if 1) our prediction is suddenly incorrect (the artificial setup where you suddenly replace the object, or you were wrong about the object) or 2) the sensor moves to a location where the model wouldn't predict the object to exist anymore (moving off the object), which is the usual, more natural case.
- The second case would give us a model-based signal for switching back into matching mode and potentially reinitializing the entire hypothesis space.
- In my mind we are currently dealing with the case where we have actually not recognized the object yet, so we can't make model-based predictions. I mean you could argue that we can make model-based predictions using the mlh (which we do when testing that hypothesis), but it is not a reliable enough signal to be used for a hard reset.
In terms of the test bed we use, I think that we can keep using the current setup for the hypothesis resampling work since this mechanism doesn't rely on physical movement from one object onto another. It should actually be useful for recognizing a single object in general (removing reliance on the first observation and allowing us to test fewer hypotheses at any point in time).
However, it does bring up a good point on whether this item is really the most impactful one for us to tackle for the unsupervised learning milestone. We could evaluate whether it makes more sense to start with the prediction error/off-object prediction signal I outlined. In that case we would definitely need a multi-object environment setup since this relies on a physical separation between the two objects.
Lmk what you think. Happy to talk about this more in a meeting too.
Yeah I think my concern with optimizing for this current problem setup is that we may end up with solutions that aren't actually that necessary / useful. E.g. clearly the hypothesis resampling is useful, but the exact details of it might be led on a tangent by trying to deal with this adversarial (and artificial) problem setup.
- One option is to just use this as you say, but we should be cognizant of this caveat and not worry about getting super high accuracy in this setting, and just consider it as a test-bed for resampling.
- The other option is, after this RFC, to refocus on the actual dataset we want to deal with. A lot of unsupervised learning features aren't actually important if we are learning a dataset where compositional objects can be learned in isolation, in which case we can potentially refocus the items under the first milestone.
I agree with @nielsleadholm that this setup is not very naturalistic and might lead to some unnecessary solutions because we're optimizing for a much harder experiment. I like the option where we consider this a test-bed for resampling and not worry about super high accuracy. In practice, Monty should be able to always accumulate evidence even when we switch objects in place, but we don't need to spend too much time optimizing the speed of inference between objects.
I also like @vkakerbeck's distinction between model-based and model-free evidence accumulation. If we have not recognized a model, we can't really use a model to make good predictions. This is where hypotheses resampling is most useful - to seamlessly adapt to changes while still trying to recognize the object. But if we have recognized the model, we can make predictions and figure out when a "hard" reset might be needed. This would require that we continuously test the recognized model at every exploratory step. My only concern here is how do we know that a prediction error due to an unseen part of the object (e.g., previously unseen handle on a mug) should be treated as part of the object (i.e., add it to the cylinder graph) vs. trigger a prediction error reset for recognizing a new object that happens to be spatially close? Is this problem specific to learning-from-scratch?
I am a bit biased towards keeping this setup until we implement a decent hypotheses resampling LM because multiple objects would require us to think about (hierarchical) action policies to move between objects.
- If it's a compositional object, such as wheels on a car, we would use a hypotheses-based hierarchical action policy. For example, if I know the higher level LM is on the car and the lower level LM is on the front wheel, now the higher level LM can instruct the lower level LM to move to the back wheel and reset its hypotheses (and perhaps even bias the new hypotheses with extra evidence for the back wheel object as Niels mentioned)
- If we are not on a compositional object, we need to think about model-free action policies to move from one object to another. Maybe we can hard-code that in for now as a hack (i.e., after x steps move the sensor to object 2 at a known location)? Note that the "model" in "model-free" here refers to higher-level compositional model.
@nielsleadholm when you say compositional dataset, are you talking about the dinner set scenes?
Great, Viviane and I were discussing this earlier this morning and yes what we are thinking is:
- Clarify that the current work to enable a first version of compositional objects is about unsupervised inference, not learning. For now, we assume that we are going to learn sub-objects in isolation (e.g. mug and logo, see below), and so it's more about equipping Monty to handle inference where it may move from one object onto another.
- Continue the current work you are doing. Adding the ability to resample hypotheses will be important for several aspects of compositional objects. This first version would actually be best tested on the existing benchmarks by seeing whether we can get more robust inference with the same number of initial points, or equally achieve the same accuracy by starting with fewer initial points. We can still use this artificial object-swapping task as an adversarial condition to magnify the benefits of resampling, but we don't need to optimize parameters to achieve high performance on it.
- Later, create an RFC proposing how we can use more model-based resetting based on moving off of an object. I agree this will be an interesting question to explore in terms of untangling prediction errors that should inform learning vs. hypothesis resetting. E.g. maybe the default is that the hypothesis space is reset, but with some evidence accumulating that the model might be incomplete. Given enough movements with a consistent prediction error, this might trigger a more learning-style phase. But there are clearly complications to work through.
- The compositional dataset would initially be a simple one consisting of different objects with different logos on their surfaces. That way we can learn the sub-components in isolation. This ties in with simplifying the first part of research to focus on unsupervised inference, and in the future we can return to unsupervised learning to enable more complex compositional datasets.
Nice outline and great figures! Overall really like it. I just asked for clarifications in a few places. On a major note, we should talk more about the longer thread in the future possibilities section (is this really the most impactful next step?). This shouldn't keep this RFC from being merged but may determine whether it's actually the next thing to work on.
rfcs/0000_hypotheses_resampling.md
2) If principal curvature is not defined, skip
3) Uniformly sample M percent from these new hypotheses (e.g., M = 20% = 1000 hyp)
4) Sample N percent from the existing hypotheses distribution of highest evidence slope (e.g., N = 10% = 500 hyp). These sampled hypotheses should be "close" to the most likely hypotheses.
5) Replace unlikely M+N percent of existing hypotheses (e.g., 30% = 1500 hyp) with these newly sampled hypotheses. The unlikely hypotheses are chosen based on evidence change.
How is the evidence count for these new hypotheses initialized?
On a technical level, we have this large matrix of hypotheses; do you basically replace the low-evidence ones in place with the new ones and thereby not change the shape of the hypothesis space?
I was thinking we would just initialize with the mean of the evidence from the corresponding graph. It wouldn't matter when sampling new hypotheses and removing old ones because we can use evidence slope for that, but it would matter for deciding on the mlh for terminal state.
Yes, that was my plan, but I'm updating the RFC to include more flexible resampling that doesn't necessarily keep the same number of hypotheses.
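To make the in-place idea concrete, here is a rough sketch (illustrative only, not the actual `EvidenceGraphLM` code) of replacing the lowest-slope hypotheses and initializing the newcomers with the mean evidence of the graph:

import numpy as np

def replace_unlikely_hypotheses(evidence, evidence_slope, num_new):
    """Illustrative sketch: overwrite the lowest-slope hypotheses in place.

    `evidence` and `evidence_slope` are 1D arrays over the current hypothesis
    space; the space keeps its shape, only the selected slots are reused.
    """
    replace_ids = np.argsort(evidence_slope)[:num_new]
    # Newcomers start at the mean evidence of the graph so they have a
    # fighting chance, without immediately competing for the mlh.
    evidence[replace_ids] = evidence.mean()
    return replace_ids  # the caller would also overwrite poses/locations at these ids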
* Detecting the object change will likely be based on evidence decay and can be problematic. If an object was added and quickly changed before accumulating enough evidence, it will be difficult to detect change based on evidence decay. This will still result in incorrect initial hypotheses. Resampling without a hard reset is more general.

# Future possibilities
* This only works in inference assuming that we have good object models learned in isolation. It likely breaks learning from scratch experiments.
Why would you think it would break learning from scratch? Do you mean because we would never have all hypotheses go below 0 evidence and can therefore never classify no_match? I would be curious about the details of how you plan to initialize the evidence values for the new hypotheses. I think we should try to make this mechanism work in the learning setting as well. Maybe by initializing with the average evidence, which could be negative? I'd have to think about this a bit more.
I think I am more concerned about environments with multiple objects during the learning phase. If we assume objects are learned in isolation, then these pretrained graphs will work fine during inference. But in learning from scratch, don't we use the same environment for learning and inference? If we start a multi-object episode, wouldn't we have to use it during learning too? Then objects won't be learned in isolation and the graphs we use for matching might end up being graphs of multiple objects. I could be missing something here...
There is nothing that forces us to use the same environment setup during training as during evaluation. We could first show objects in isolation while not providing labels (making it a learning from scratch setup) and then show them in multi object setups. I'm not saying this would work great, even in the current unsupervised learning setup with isolated objects we often merge multiple objects into the same graph. Just saying that not providing labels doesn't keep us from showing objects in isolation initially.
I see. I had the wrong assumptions about the learning from scratch setup then.
* This only works in inference assuming that we have good object models learned in isolation. It likely breaks learning from scratch experiments.
* I manually force Monty back into the matching phase and reset its counters between episodes.
* Maybe we should brainstorm about what a terminal state should be in a multi-object environment.
* Maybe also brainstorm about ways to more seamlessly switch between matching and exploration? I don't like forcing Monty to always be in matching mode; by doing this it feels that we are making the gap between learning and inference wider.
Definitely two good topics to talk about more. I wrote my thoughts on both in earlier comments. lmk what you think.
* Here we would focus on detecting the change that triggers full reset of Monty states.
* Detecting the object change will likely be based on evidence decay and can be problematic. If an object was added and quickly changed before accumulating enough evidence, it will be difficult to detect change based on evidence decay. This will still result in incorrect initial hypotheses. Resampling without a hard reset is more general.

# Future possibilities
I think another future work item would be to test if we can use this mechanism to make Monty more efficient. Basically we don't have to initialize 1000s of hypotheses on the first step. We can initialize a small subset and then iteratively refine these hypotheses. This was one of my initial excitements about this idea.
Yeah good point! I find it interesting that you said start with fewer hypotheses and add more, because that's the opposite of what they explained here on particle filters. They started with many particles and reduced the number as the model started narrowing down the location (i.e., a simpler distribution can be represented with fewer particles). I can see good arguments for both ways actually.
I know you mentioned this as part of future work, but I think it makes sense to accommodate for it here. I've updated the RFC to allow for changing the count of hypotheses. So if we want to start with a small number and increase the hypotheses by 10% every step, we can define `hypotheses_count_ratio=1.1`. We can also control where these hypotheses come from using the `hypotheses_old_to_new_ratio` and `hypotheses_informed_to_reinforce_ratio`.
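As a purely illustrative example of the growing-count case (numbers made up):

# Start with a small hypothesis space and grow it by 10% every step.
hypotheses_count_ratio = 1.1

count = 200
for step in range(5):
    count = int(count * hypotheses_count_ratio)
    print(step, count)  # 220, 242, 266, 292, 321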
Upon reading your comment again, I see now that you said "iteratively refine" not increase 😅 Yeah makes sense!
Thanks for the great updates! I like the proposed changes to the code, as well as the parameters you've introduced. Just added a few comments.
rfcs/0000_hypotheses_resampling.md
* Number of hypotheses should not scale up with steps, if anything they should decrease. For now, any sampled hypothesis must replace an old "unlikely" hypothesis.
* Sampling of likely and unlikely hypotheses is based on evidence change instead of absolute evidence.
* Terminal state calculation is still based on accumulated evidence and `x_percent_threshold`.
* The resampling procedure only occurs if principal curvature is defined.
Yeah just elaborating on this further - for objects like the ball, we shouldn't need a ton of hypotheses to recognize them, because the poses are symmetric. So I think this fits with the suggestion that if pose is undefined, we can add some hypotheses, but rather than adding them for e.g. 8 different rotations, we can just add them for say, 4 (or even 2). If it is an object that is less symmetric than a sphere, then this should hopefully get accounted for by informed hypotheses that are sampled when we get onto more pose-defining features.
(this also couples well with Monty's approach to symmetry, i.e. it doesn't matter if the poses it samples and converges to aren't the Euler-angle equivalent of the ground-truth pose, especially if it has stored which poses are equivalent, which is a feature in the pipeline)
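To sketch that idea (illustrative only; the exact rotation counts here are assumptions, not decided values):

from scipy.spatial.transform import Rotation

def candidate_rotations(pose_defined, num_if_undefined=4):
    """Illustrative: how many rotations to hypothesize per sampled graph point."""
    if pose_defined:
        # pose-defining features leave mainly the curvature-direction flip ambiguous
        return Rotation.from_euler("z", [0, 180], degrees=True)
    # undefined pose (e.g. on a sphere): a handful of random rotations may suffice
    return Rotation.random(num_if_undefined, random_state=0)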
rfcs/0000_hypotheses_resampling.md
Resample hypotheses at every step in a manner inspired by particle-filters. This is the first step for Monty to interact with multiple objects and recognize compositional objects. The newly sampled hypotheses will come from:
1) a subset (uniformly sampled) of new hypotheses initialized based on the current step observation.
2) a set of newly sampled hypotheses from the distribution of the most likely hypotheses.
3) a subset of the old hypotheses based on the metric representing the most likely hypotheses.
I found the wording of (3) a bit confusing/hard to read for some reason. Maybe something like "A subset of the old hypotheses. Which of these are maintained is based on...".
Also for 2 and 3, my understanding is we are basing resampling on the first-order derivative of their evidence accumulation. It might be worth saying something like "... the most rapidly rising hypotheses" rather than "most likely".
rfcs/0000_hypotheses_resampling.md
3) Uniformly sample M percent from these new hypotheses (e.g., M = 20% = 1000 hyp)
4) Sample N percent from the existing hypotheses distribution of highest evidence slope (e.g., N = 10% = 500 hyp). These sampled hypotheses should be "close" to the most likely hypotheses.
5) Replace unlikely M+N percent of existing hypotheses (e.g., 30% = 1500 hyp) with these newly sampled hypotheses. The unlikely hypotheses are chosen based on evidence change.
1) If principal curvature is not defined, skip
Based on the separate discussion, this may need to be updated.
Sounds good, I've removed this from the resampling procedure and the constraints.
rfcs/0000_hypotheses_resampling.md
## High Level Code Changes

The needed modification will only change the `EvidenceLM` class. More specifically, we would be either be modifying the `_update_evidence` function
"be either be modifying" ... "defining" --> "modify either"... "define" (noticed a typo, but also a useful reminder to use active voice in writing 😅)
rfcs/0000_hypotheses_resampling.md
## High Level Code Changes

The needed modification will only change the `EvidenceLM` class. More specifically, we would be either be modifying the `_update_evidence` function
directly or defining another function that calls `_update_evidence` as one of its steps. A rough proposal of the needed changes is shown below.
For function number 8, it would be nice to refactor out the existing section below into its own function so that it can ideally be used as-is both i) in the existing `_update_evidence`, and ii) in the new setting you are considering.
Lines 952-1002 in evidence_matching.py
# Before moving we initialize the hypothesis space:
if displacements is None:
...
Yes, makes sense. I may end up rewriting and factoring out parts of `_update_evidence` in the process. I was thinking of just rewriting `_update_evidence` to support resampling and breaking it up into manageable smaller functions.
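To sketch roughly what that decomposition could look like (placeholder names, nothing here is existing Monty code):

def calculate_resampling_counts(state):
    return {"old": 0, "informed": 0, "reinforced": 0}  # stub

def select_old_hypotheses(state, counts):
    return []  # stub: keep the hypotheses with the highest evidence slope

def sample_reinforced_hypotheses(state, counts):
    return []  # stub: sample near the most rapidly rising hypotheses

def sample_informed_hypotheses(state, counts, observation):
    return []  # stub: initialize from the currently sensed pose

def update_evidence_with_resampling(state, observation):
    counts = calculate_resampling_counts(state)
    kept = select_old_hypotheses(state, counts)
    reinforced = sample_reinforced_hypotheses(state, counts)
    # kept + reinforced hypotheses get displaced and their evidence updated;
    # informed hypotheses are appended afterwards without displacement
    informed = sample_informed_hypotheses(state, counts, observation)
    return kept + reinforced + informed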
Nice, yeah that would be great. It's a pretty big method at the moment.
rfcs/0000_hypotheses_resampling.md
| Parameter | Description | Range |
| ----------|-------------|-------|
| **hypotheses_count_ratio** | A multiplier for the needed number of hypotheses at this new step | [0, inf) |
| **hypotheses_old_to_new_ratio** | How many old to new hypotheses to be added. `0` means all old, `1` means all new | [0, 1] |
| **hypotheses_informed_to_reinforce_ratio** | How many informed (sampled based on newly observed pose) to reinforced hypotheses (sampled close to existing likely hypotheses) to be added. `0` means all informed, `1` means all reinforced | [0, 1] |
I think it would be useful to define informed and reinforced earlier in the document, as these terms are used a few times but are a bit confusing without a clear description.
yeah, good idea. I've added them to the summary. Also I came up with "reinforced" and I'm not sure if it's the best term, but we can go with it for now if no objections.
rfcs/0000_hypotheses_resampling.md
def calculate_new_hypotheses_counts(
    curr_hypotheses_count, new_informed_hypotheses_count
What exactly is the `new_informed_hypotheses_count` parameter for? It would be useful to have some description here and on line 135.
These are the total number of informed hypotheses sampled based on the current pose observation. Assuming there are `x` points in the object graph, these can be `2x` if the PC is defined or `8x` if PC is undefined. When calculating the needed number of informed hypotheses (i.e., `needed_new_informed_hyp`), we should not request more than the available `new_informed_hypotheses_count`. If we request more, the difference will come from reinforced hypotheses, which are technically infinite.
Note that we do not need to actually get the informed hypotheses at this step to know how many we have, we can just use the number of points in the graph and whether the pose is defined.
I'll add some description.
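For example (just restating the arithmetic above as code):

def informed_hypotheses_count(num_graph_points, pose_defined):
    # 2 candidate rotations per point when PC is defined, 8 when undefined
    return num_graph_points * (2 if pose_defined else 8)

assert informed_hypotheses_count(1000, pose_defined=True) == 2000
assert informed_hypotheses_count(1000, pose_defined=False) == 8000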
rfcs/0000_hypotheses_resampling.md
needed_hypotheses_count * (1 - hypotheses_old_to_new_ratio),
needed_hypotheses_count * hypotheses_old_to_new_ratio,
)
if needed_old_sampled_hyp > curr_hypotheses_count:
It would be worth adding a comment here, e.g. if trying to include more old hypotheses than exist, set their count as the ceiling.
The needed modification will only change the `EvidenceLM` class. More specifically, we would be either be modifying the `_update_evidence` function
directly or defining another function that calls `_update_evidence` as one of its steps. A rough proposal of the needed changes is shown below.
It would be worth thinking about whether function 6 "Resample old and reinforced" can be broken up into two methods - at least on the surface it seems like these should probably be separated out.
Good point. The main purpose of the proposed figure is to show roughly where the changes will occur within the `matching_step`, but I imagine that the exact number of functions will change as I start implementing. I may even do all the changes in `update_evidence` as discussed above. But at least this gives you an idea of where I plan to start.
Nice, thanks for making those changes! Just a couple more very minor comments but otherwise it looks ready to merge from my perspective.
rfcs/0000_hypotheses_resampling.md
3) a subset of the old hypotheses based on the metric representing the most likely hypotheses.
1) **Informed Hypotheses:** a subset (uniformly sampled) of new hypotheses initialized based on the current step observation (i.e., sensed pose).
2) **Reinforced Hypotheses:** a set of newly sampled hypotheses from the distribution of the most rapidly rising hypotheses.
3) **Old Hypotheses:** a subset of the old hypotheses, sampled based on the most rapidly rising hypotheses.
Nice thanks for writing these descriptions, I think it would be useful to maybe say "maintained based on..." rather than "sampled based on..." for (3) - sampled to me suggests drawing from a random distribution, whereas here we are just keeping ones that already exist.
rfcs/0000_hypotheses_resampling.md
Outputs:
- `needed_old_sampled_hyp`: The number of existing hypotheses to be kept from the
  previous step. Should not exceed `curr_hypotheses_count`.
Rather than "should not", maybe describe it as "Will not exceed" - should implies to me that it is a user-specified parameter that the user needs to be mindful of setting correctly.
Really nice, thanks for the updates! I just left one more major comment around the execution order of step 1 and 2. But happy to approve this already so you can get started on implementing this.
rfcs/0000_hypotheses_resampling.md
* Sampling of likely and unlikely hypotheses is based on evidence change instead of absolute evidence.
* Terminal state calculation is still based on accumulated evidence and `x_percent_threshold`.
* The resampling procedure only occurs if principal curvature is defined.

## The Resampling Procedure

1) For every object, at every new step, get initial hypotheses based on the observed pose.
I think for efficiency reasons it could make sense to do step 2 first, and then only calculate initial hypotheses for `needed_new_informed_hyp` locations on the object.
Yes, that makes sense. This is what I planned out in the high-level code change diagram, but I forgot to update this here in the resampling procedure.
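Something roughly like this (illustrative sketch, assuming 2 rotations per graph point; names are placeholders):

import numpy as np

def sample_informed_point_ids(graph_point_ids, needed_new_informed_hyp, rng):
    """Pick only the graph points we will expand into informed hypotheses."""
    num_points = max(1, needed_new_informed_hyp // 2)  # assumes 2 rotations/point
    num_points = min(num_points, len(graph_point_ids))
    # could also be biased toward points whose stored features match the sensed ones
    return rng.choice(graph_point_ids, size=num_points, replace=False)

rng = np.random.default_rng(0)
ids = sample_informed_point_ids(np.arange(5000), needed_new_informed_hyp=1000, rng=rng)
print(len(ids))  # 500 points -> ~1000 informed hypotheses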
*Note that the two types (reinforced and informed) of resampled hypotheses are treated differently.
Unlike reinforced hypotheses, the current positions of the informed hypotheses do not need to be rotated and displaced at resampling.
Informed hypotheses are added after the reinforced (and old) hypotheses are updated/displaced.*
You still plan on adding feature evidence to the new informed hypotheses, right? (like in `_calculate_feature_evidence_for_all_nodes`)
Yes. I would need to compare with sensed features and add evidence if needed.
At this point, I'm probably better off rewriting `_update_evidence` with support for resampling.
rfcs/0000_hypotheses_resampling.md
| Parameter | Description | Range |
| ----------|-------------|-------|
| **hypotheses_count_ratio** | A multiplier for the needed number of hypotheses at this new step | [0, inf) |
Does this mean if it is 1, the number of hypotheses remains the same?
Ah I see this below now :) Could be worth adding to the description here too since you have that in the description of the other params as well and it's useful info
Yeah, I'll add that.
- `curr_hypotheses_count`: The current number of hypotheses
- `new_informed_hypotheses_count`: The total number of informed hypotheses
  sampled based on the current pose observation. Assuming there are x points
  in the object graph, these can be 2x if the PC is defined or 8x if PC is undefined.
Related to my earlier comment, I think it would be more efficient if we sampled the informed hypotheses after we calculated the counts so we don't need to calculate possible poses for all points in the graph. You could take the `needed_new_informed_hyp` and sample that number/2 points from the graph (these could even be sampled in an informed way by first picking the ones where the features actually match the sensed features).
This would remove the need for this parameter but would introduce the risk that `needed_new_informed_hyp` is larger than the number of points in the graph * 2. Is this actually something that would realistically happen? In what case would this happen? If `hypotheses_count_ratio` is >1? Do we want to support this case? And if so, could we just duplicate hypotheses with some noise added to them to fill up the number of requested hypotheses?
Yes, that's a good point. I don't plan on sampling the informed hypotheses first, but I only need the number of expected informed hypotheses at this step. We can quickly figure that out from the number of graph points and whether PC is defined. Then, I can know if the `needed_new_informed_hyp` is more than the actual informed hypotheses and handle it. I will only sample what I need from the graph later.
Yes, we run this risk if `hypotheses_count_ratio` is > 1. Another case is if we started off with undefined PC at step 1 (graph points * 8) and moved to defined PC at step 2 (graph points * 2). Even if `hypotheses_count_ratio` is < 1, we could still request too many informed hypotheses to maintain the high number of hypotheses we started with. A bit of an edge case, but could happen.
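To put rough numbers on that edge case (all values made up):

# A 1000-point graph with undefined PC at step 1, defined PC at step 2.
graph_points = 1000
step1_hypotheses = graph_points * 8                               # 8000
hypotheses_count_ratio = 0.9                                      # shrinking slightly
needed_at_step2 = int(step1_hypotheses * hypotheses_count_ratio)  # 7200
available_informed_step2 = graph_points * 2                       # only 2000 informed
print(needed_at_step2 > available_informed_step2)  # True: rest must come from old/reinforced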
rfcs/0000_hypotheses_resampling.md
2) Calculate the needed hypotheses counts to be sampled based on the defined parameters as shown [here](#The-Resampling-Count-Calculation).
3) Sample `needed_old_sampled_hyp` from the existing hypotheses based on highest evidence slope. We will keep these old hypotheses and remove the rest.
4) Uniformly sample `needed_new_informed_hyp` from the new informed hypotheses. We will add these new hypotheses.
5) Sample `needed_new_reinforced_hyp` from the existing hypotheses distribution of highest evidence slope. These sampled hypotheses should be "close" to the most likely hypotheses.
I'm not sure what you are planning exactly for how to initialize the evidence values but I think it would be worth considering using the average of the most likely hypotheses (the ones that the new hypotheses are sampled from) for this here instead of the average evidence for all hypotheses. We can use the average evidence of all hypotheses for step 4 (+evidence based on the sensed feature match)
Yes, great idea!
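A small sketch of that initialization scheme (illustrative only; the exact form would be worked out in the PR):

import numpy as np

def initial_evidence(evidence, likely_ids, feature_evidence, kind):
    """Reinforced: mean of the likely hypotheses they were sampled around.
    Informed: mean over all hypotheses plus the sensed-feature match evidence."""
    if kind == "reinforced":
        return evidence[likely_ids].mean()
    return evidence.mean() + feature_evidence  # kind == "informed"

evidence = np.array([5.0, 4.0, 1.0, 0.5])
likely_ids = np.array([0, 1])
print(initial_evidence(evidence, likely_ids, 0.0, "reinforced"))  # 4.5
print(initial_evidence(evidence, likely_ids, 0.3, "informed"))    # 2.925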
Resample hypotheses at every step in a manner inspired by particle-filters. This is the first step for Monty to interact with multiple objects and recognize compositional objects. The newly sampled hypotheses will come from:
1) **Informed Hypotheses:** a subset (uniformly sampled) of new hypotheses initialized based on the current step observation (i.e., sensed pose).
2) **Reinforced Hypotheses:** a set of newly sampled hypotheses from the distribution of the most rapidly rising hypotheses.
Is "reinforced hypotheses" a known term from particle filters? To me it sounds maybe a bit ambiguous as to what they actually are. If there is a common terminology for this already we should use this. I also find "informed hypotheses" a bit confusing in this context now (since technically all 3 types are informed, just by different things). Initially I used the term to contrast the initialization to uniformly initialized hypotheses. I get how it would be useful to keep using the same terminology so I don't have a good solution. I just noticed reading through the code that I was getting quite confused at time and the variable names don't seem so intuitive. I don't know you @nielsleadholm you have any thoughts or if we should just stick with it and add good documentation.
It is not a known term, I just needed a placeholder and this is the best I could come up with 😅
My intuition is we are reinforcing existing hypotheses that are most likely correct by adding ones that are "close" to them. Open to suggestions though..
We could call it resampled hypotheses, similar to here: https://stonesoup.readthedocs.io/en/v1.5/auto_tutorials/sampling/ResamplingTutorial.html Or offspring hypotheses (borrowing from evolutionary algorithm terminology) as it is a resampling of the "fittest" hypotheses. But yeah not sure those are much better.
Yeah both are great! I am a bit biased towards offspring because resampled sounds a bit more generic, we've used the term resampling in many other places. I'll go with offspring for now but we can change it later in the actual PR.
Wow, great write-up and an interesting read. It looks like there are quite a few ideas in here to try out, and I'm looking forward to seeing how it pans out.
I don't have much detailed feedback since my knowledge of `EvidenceLM`'s details is incomplete. I'll keep digesting it, but I'm approving so as not to hold up its merge.
Great, thanks @scottcanoe! And thanks @nielsleadholm and @vkakerbeck for the thorough and quick review.
@vkakerbeck @nielsleadholm @tristanls @codeallthethingz @scottcanoe @hlee9212 I'd like to motion for final comment period. All maintainers please state your final comments/vote. I will take a thumbs up on this comment as a vote for "merge", and thumbs down as a vote for "No merge". However feel free to also write comments to explain or provide additional feedback if needed.
LGTM
LGTM.
I added some comments. Voting to merge.
Why:
* **Faster**: we don't have to wait for high unbounded evidence to decay enough to realize that a new hypothesis is more likely. We also may not need to worry about initializing new hypotheses with mean evidence, giving them a fighting chance against other old hypotheses. Average slope is more fair in this sense.
* **Accurate resampling**: If we sample new hypotheses close to the hypotheses with high total accumulated evidence (e.g., particle filter), we could be sampling from the incorrect hypotheses (if we had just switched objects). If we sample close to the hypotheses with high evidence slope, we may converge faster.
* **Practical**: The total evidence can still be unbounded; it doesn't matter because we only consider the slope. This metric does not care about how much evidence we've accumulated already. In other words, a hypothesis with high evidence can be removed if it hasn't accumulated evidence in a while, while a consistently growing hypothesis is less likely to be removed even if it was just added.
Yes, good point. Considering the average evidence slope over the last S steps only will allow us to forget about objects that disappear in favor of other objects with rapidly rising evidence. It doesn't matter if the old object has quickly accumulated evidence in the past (i.e., before S steps). But that's another parameter to tune...
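For example, the windowed slope could look something like this (illustrative sketch):

import numpy as np

def average_evidence_slope(evidence_history, window):
    """Per-hypothesis average evidence slope over the last `window` steps.
    `evidence_history` has shape (steps, num_hypotheses)."""
    recent = evidence_history[-(window + 1):]
    return np.diff(recent, axis=0).mean(axis=0)

# Hypothesis 0 accumulated a lot early but has stalled; hypothesis 1 keeps rising.
history = np.array([[0.0, 0.0], [5.0, 0.1], [9.0, 0.3], [9.1, 0.9], [9.2, 1.6]])
print(average_evidence_slope(history, window=2))  # [0.1, 0.65] -> hypothesis 1 wins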
rfcs/0000_hypotheses_resampling.md
## The Resampling Procedure

1) Calculate the needed hypotheses counts to be sampled based on the defined parameters as shown [here](#The-Resampling-Count-Calculation).
2) Select `needed_old_sampled_hyp` from the existing hypotheses based on highest evidence slope. We will keep these old hypotheses and remove the rest.
nitpick: All of these variables have a `needed_` prefix and `_hyp` postfix, which makes them seem redundant as they don't highlight any differences.
suggestion: `old_sampled`, `new_informed`, `new_reinforced`, since the method is going to be `calculate_new_hypotheses_counts()`, providing the context that these are all hypothesis counts.
The whole function would read as follows, which I regard as easier to read:
# a multiplier for the needed number of hypotheses at this new step
hypotheses_count_ratio = 1
# 0 means all sampled from old, 1 means all sampled from new
hypotheses_old_to_new_ratio = 0.5
# 0 means all sampled from informed hypotheses, 1 means all sampled from reinforced hypotheses
hypotheses_informed_to_reinforced_ratio = 0.5
def calculate_new_hypotheses_counts(
    current, informed
):
    """
    This is a proposed function for calculating the needed count of each type of
    hypotheses ("informed", "reinforced", "old").

    Inputs:
    - `current`: The current number of hypotheses
    - `informed`: The total number of informed hypotheses
      sampled based on the current pose observation. Assuming there are x points
      in the object graph, these can be 2x if the PC is defined or 8x if PC is undefined.

    Outputs:
    - `old_sampled`: The number of existing hypotheses to be kept from the
      previous step. Will not exceed `current`.
    - `new_informed`: The number of informed hypotheses to be sampled from
      the pool of informed hypotheses. Will not exceed `informed`.
    - `new_reinforced`: The number of needed reinforced hypotheses. Can technically
      be an infinite amount.
    """
    # calculate the total number of hypotheses needed
    needed = current * hypotheses_count_ratio

    # calculate how many old and new hypotheses needed
    old_sampled, new_sampled = (
        needed * (1 - hypotheses_old_to_new_ratio),
        needed * hypotheses_old_to_new_ratio,
    )

    # needed old hypotheses should not exceed the existing hypotheses
    # if trying to sample more hypotheses, set the available count as ceiling
    if old_sampled > current:
        old_sampled = current
        new_sampled = needed - current

    # calculate how many informed and reinforced hypotheses needed
    new_informed, new_reinforced = (
        new_sampled * (1 - hypotheses_informed_to_reinforced_ratio),
        new_sampled * hypotheses_informed_to_reinforced_ratio,
    )

    # needed informed hypotheses should not exceed the available informed hypotheses
    # if trying to sample more hypotheses, set the available count as ceiling
    if new_informed > informed:
        new_informed = informed
        new_reinforced = new_sampled - informed

    return (
        int(old_sampled),
        int(new_informed),
        int(new_reinforced),
    )
calculate_new_hypotheses_counts(
    current=100, informed=100
)
Great, I've added this to the RFC. I've also changed `old_sampled` to `old_maintained` since it's not really stochastically sampled, as Niels has pointed out in another discussion.
| Parameter | Description | Range |
| ----------|-------------|-------|
| **hypotheses_count_ratio** | A multiplier for the needed number of hypotheses at this new step. `1` means maintain the same number of hypotheses. | [0, inf) |
| **hypotheses_old_to_new_ratio** | How many old to new hypotheses to be added. `0` means all old, `1` means all new | [0, 1] |
issue: I find the use of `_ratio` in `hypothesis_old_to_new_ratio` (and this table) confusing. I would expect a ratio to not be bound to [0, 1]. When I think of a ratio of old to new, I think of phrases like 2 old to 1 new (2:1), which would be 2 numerically. Maybe this could be called `hypothesis_old_to_new_range|mix|factor`?
I see your point here. Yes, the actual parameter is not a ratio, but instead it allows us to set the ratio (e.g., 0.5 would give us 1:1). I like `factor`. We can also call it `weight`, as in the weight of old to new in the hypotheses mix.
rfcs/0000_hypotheses_resampling.md
| Parameter | Description | Range |
| ----------|-------------|-------|
| **hypotheses_count_ratio** | A multiplier for the needed number of hypotheses at this new step. `1` means maintain the same number of hypotheses. | [0, inf) |
suggestion: Rename to `hypothesis_count_multiplier` as per the description.
rfcs/0000_hypotheses_resampling.md
needed_old_sampled_hyp = curr_hypotheses_count
needed_new_sampled_hyp = needed_hypotheses_count - curr_hypotheses_count

# calculate how many informed and re-enforced hypotheses needed
typo: Should be `reinforced`.
rfcs/0000_hypotheses_resampling.md
## High Level Code Changes

The needed modification will only change the `EvidenceLM` class. More specifically, we would modify either the `_update_evidence` function
typo: There is no `EvidenceLM` class. I think you mean `EvidenceGraphLM`?
This RFC is for resampling hypotheses inspired by particle filters.
View here: https://github.com/ramyamounir/tbp.monty/blob/rfc_hypotheses_resampling/rfcs/0000_hypotheses_resampling.md