
Demonstrating Monty's Capabilities: Configs Code-Share #70

Closed · wants to merge 10 commits

Conversation

scottcanoe (Contributor)

This is largely unchanged from the code-share in the last repo except for a couple of pretraining parameters. I expect this to undergo fairly significant changes and additions in the near future as we add more experiments.

Some thoughts:

  • It would probably be best to do this all in monty_lab. Right now, we are blocked from doing that because custom config dataclasses (like those for the 9- and 10-LM configs) must be in tbp.monty in order to load model.pt files. I wonder if this might be a pain point for users wanting to use the config dataclass pattern.
  • A workaround is to use functions that return instances of core tbp.monty config dataclasses, with the functions doing the work of customizing config parameters. I adopted that pattern in dmc_pretraining_experiments.py and dmc_eval_experiments.py before realizing it might be a solution to the above problem.
  • It's on my list to write functions to generate configs for multi-LM Monty and dataset configs. Once that is done, we should be able to move to monty_lab without issue. It'll also help with switching to different SM/LM setups, since 1, 2, 5, 9, 10 is kind of arbitrary and non-ideal. We talked about doing something like 1, 2, 4, 8, 16 SMs/LMs.
  • We might want to repackage this code in the future so that different sets of experiments are broken into different files. eval_experiments.py is already getting pretty long, and it'll get quite a bit longer when we add the rapid learning experiments. In that case, it might make sense to create a module like dmc_common.py that contains parameters, reusable config-getter functions, rotations, etc.
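The config-getter workaround described in the second bullet could look something like the following. This is a minimal sketch: the dataclass names and fields (`LearningModuleConfig`, `MontyConfig`, `make_eval_config`, etc.) are hypothetical stand-ins, not the actual tbp.monty classes.

```python
from dataclasses import dataclass, field

# Hypothetical stand-ins for core tbp.monty config dataclasses; the real
# classes live in tbp.monty and have many more fields.
@dataclass
class LearningModuleConfig:
    max_nneighbors: int = 5
    tolerance: float = 0.001

@dataclass
class MontyConfig:
    num_lms: int = 1
    lm_config: LearningModuleConfig = field(default_factory=LearningModuleConfig)

def make_eval_config(num_lms: int = 1, tolerance: float = 0.001) -> MontyConfig:
    """Return a customized instance of a core config dataclass.

    Because this returns an instance of a class defined in the core
    package (rather than a new subclass), no new class needs to live in
    tbp.monty for pickled artifacts to load.
    """
    lm = LearningModuleConfig(tolerance=tolerance)
    return MontyConfig(num_lms=num_lms, lm_config=lm)

config = make_eval_config(num_lms=9, tolerance=0.0005)
```

The key property is that everything pickled is an instance of a core class; the customization logic lives in plain functions, which never need to be importable at load time.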

- use cube_face_and_corner_views
- set output_dir during final chores
- various cleanup tasks
- use MontyForEvidenceGraphMatching in 1-LM experiments
- use cube_face_and_corner_views
- set output_dir during final chores
- add PYTHON_LOG_LEVEL
- various cleanup tasks
@@ -15,3 +17,5 @@
CONFIGS.update(PRETRAININGS)
CONFIGS.update(YCB)
CONFIGS.update(MONTY_WORLD)
CONFIGS.update(DMC_EVAL)
Contributor

Thanks for sharing! Is there a reason you are adding these to the tbp.monty repository instead of the monty_lab repo? Since these experiments are not part of our benchmark test suite, I would not add them here in the benchmarks folder. Can we put these configs into this folder: https://github.com/thousandbrainsproject/monty_lab/tree/main/monty_capabilities_analysis ? Similar to how we have a bunch of configs here: https://github.com/thousandbrainsproject/monty_lab/tree/main/experiments/configs


> It would probably be best to do this all in monty_lab. Right now, we are blocked from doing that because custom config dataclasses (like those for 9 and 10 LM configs) must be in tbp.monty in order to load model.pt files. I wonder if this might be a pain point for users wanting to use the config dataclass pattern.

Slightly tangential but I'm also testing out having all configs in monty_lab and will see if I can take this Draft PR and create a new one in monty_lab repo.

Contributor

Maybe you can give me some more details sometime on why we can't define new dataclasses in monty_lab? In general, if there is no way to do this, we could make this PR contain only the new dataclasses (and actually merge it) and then keep the configs in monty_lab.

@tristanls just as an FYI of another current pain point with configs.

Contributor Author

Well... I ran into this problem with loading models before, when the dataclasses weren't in tbp.monty, but I just successfully loaded a 9-LM model.pt while on main and had no problem.

So here's what I think happened: about a month ago, I was saving checkpoints during pretraining so I could keep track of how many points were added per epoch. This saved config.pt files, and loading those threw errors because config.pt had pickled everything in a Monty experiment. Since the custom dataclasses had to be visible for unpickling, loading config.pt without those classes in tbp.monty failed. I didn't keep detailed notes, but I remember there was a class discovery mechanism in torch that wasn't seeing classes defined outside tbp.monty.

I had written a custom logger for saving checkpoints, so maybe it was an issue on my end. Like maybe I needed to be dict-ifying the experiment config before saving it.

I think the good news is that I was wrong about not being able to load model.pt files. On the other hand, I'll be revisiting checkpointing soon for the rapid learning experiments. While I'm at it, I'll try to nail down what the issue was exactly.
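For context, the class-visibility constraint described above is standard pickle behavior (which torch.load uses under the hood): a pickle stores only a reference to the class (module path plus name), not the class definition, so the class must be importable when the file is loaded. A small illustration with a hypothetical `CustomConfig` class:

```python
import pickle

class CustomConfig:
    """Stand-in for a custom config class defined outside tbp.monty."""
    def __init__(self, num_lms):
        self.num_lms = num_lms

# pickle records only a reference to the class (its module path + name),
# not the class body itself; loading re-imports that path.
buf = pickle.dumps(CustomConfig(9))
restored = pickle.loads(buf)  # works: CustomConfig is importable here

# If the class is not importable at load time, unpickling fails with an
# AttributeError ("Can't get attribute 'CustomConfig' ...").
del globals()["CustomConfig"]
load_failed = False
try:
    pickle.loads(buf)
except AttributeError:
    load_failed = True
```

This is why pickling a full experiment config (rather than a dict of primitives) forces every custom dataclass to live somewhere importable, such as tbp.monty.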

Contributor

Okay, interesting, thanks for the extra info! I'll be curious to hear what you find out.


Re: "dict-ifying" the experiment config -- that's the top thing I want, haha. Instead of saving the dataclass itself, save a serialized version.
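For plain dataclass configs, the standard library already covers the dict-ifying step: `dataclasses.asdict` recursively converts a dataclass (including nested dataclass fields) to a dict, which can then be saved as JSON. A minimal sketch with a hypothetical `ExperimentConfig`; note this only works cleanly when the leaf values are JSON-serializable, so fields holding class references or callables would need custom handling.

```python
import json
from dataclasses import asdict, dataclass

@dataclass
class ExperimentConfig:  # hypothetical stand-in for an experiment config
    num_lms: int = 1
    tolerance: float = 0.001

cfg = ExperimentConfig(num_lms=9)

# Save a plain dict/JSON instead of pickling the dataclass itself;
# loading the dict later does not require the class to be importable.
serialized = json.dumps(asdict(cfg))
restored = ExperimentConfig(**json.loads(serialized))
```

Round-tripping through the dict reconstructs an equal config, and the saved file stays readable even if the class definition later moves or changes module.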

Contributor

Thanks for sharing those details Scott.

Yeah, in terms of making complex dataclasses serializable, I'm curious what the best options are; e.g., it might just be worth implementing some custom methods since we don't have that many different types of objects. Something we can continue to keep in mind as part of the RFC in PR 46.


I learned that the pydantic package can do this automatically with its built-in model_dump() (serializes to a dictionary) and model_dump_json() (serializes to JSON) 😄 (link: https://docs.pydantic.dev/latest/concepts/serialization/)
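A quick sketch of what that looks like with pydantic v2 (the `ExperimentConfig` model here is a hypothetical stand-in, not an actual tbp.monty config):

```python
from pydantic import BaseModel

class ExperimentConfig(BaseModel):  # hypothetical stand-in config
    num_lms: int = 1
    tolerance: float = 0.001

cfg = ExperimentConfig(num_lms=9)

as_dict = cfg.model_dump()       # plain dict: {'num_lms': 9, 'tolerance': 0.001}
as_json = cfg.model_dump_json()  # JSON string of the same data

# Validation on the way back in: types are checked/coerced automatically.
restored = ExperimentConfig.model_validate(as_dict)
```

A nice side effect versus raw pickling is that the round-trip validates field types on load, so a stale or hand-edited config file fails loudly instead of silently carrying bad values.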

Contributor

Ah ok, interesting! Yeah, this description certainly sounds promising (from the main pydantic page):

"Models share many similarities with Python's dataclasses, but have been designed with some subtle-yet-important differences that streamline certain workflows related to validation, serialization, and JSON schema generation. You can find more discussion of this in the Dataclasses section of the docs."


hlee9212 commented Dec 2, 2024

Hi @scottcanoe, @vkakerbeck, @nielsleadholm - I moved the configs from this PR to monty_lab here. I checked with pretrain_dist_agent_1lm that the config loads and starts training.

I think this PR can now be closed.

@nielsleadholm (Contributor)

Thanks @hlee9212 !

@scottcanoe scottcanoe deleted the dmc branch January 15, 2025 21:57
4 participants