Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Deterministic iteration order for reproducible codegen #846

Merged
merged 2 commits into from
Dec 10, 2024

Conversation

harrysarson
Copy link
Contributor

@harrysarson harrysarson commented Nov 27, 2024

Two small changes to get derministic iteration order which makes codegen reproducible

  • Use OrderedDict for reproducible codegen

    • Python set's do not have a deterministic iteration order (unlike dicts),
      therefore replace usage of set with OrderedDict's for the includes
      variable in full__description.c.em to make codegen reproducible.
  • Use deterministic iteration in type_support header

    • The include_directives set is used by the nested templates to
      prevent generation of duplicate include directives, but is not iterated
      over by those nested templates.

    • In idl__type_support.c.em rather that iterating over the
      include_directives set (which introduces non-determinism) to generate
      the includes for top level header files, we iterate over those top level
      includes as a list and seperately construct the include_directives set
      from the list. This ensures deterministic codegen.

@harrysarson harrysarson force-pushed the harry/deterministic-codegen branch from 7cf813d to 1c760e0 Compare November 27, 2024 15:59
Copy link
Contributor

@clalancette clalancette left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

While I'm all for making code generation reproducible, does this actually have an effect? My reading of this is that the include_directives set() is used just to tell if we've every included a particular header before. That is, we don't actually iterate over include_directives at any point, so this is basically a no-op. Can you give an example of what changes with this in place?

@harrysarson
Copy link
Contributor Author

we don't actually iterate over include_directives at any point

We iterated over include_directives here

@[for header_file in include_directives]@

There are other include_directives variables (e.g the one defined at here https://github.com/ros2/rosidl/blob/dd0530023e758914c9e8e48ef0ddc1f797161a64/rosidl_generator_c/resource/idl__struct.h.em) and for those we could keep using a set. However, I think it is worth being consistent and using an OrderedDict for all the different include_directives variables.

@clalancette
Copy link
Contributor

We iterated over include_directives here

Fair enough. I think then we should just replace that one with a list. We shouldn't need the rest of the changes here.

@harrysarson
Copy link
Contributor Author

harrysarson commented Dec 4, 2024

The include_directives defined here

is passed to other templates

TEMPLATE(
'srv__type_support.c.em',
package_name=package_name, service=service,
interface_path=interface_path, include_directives=include_directives)
}@
and
TEMPLATE(
'action__type_support.c.em',
package_name=package_name, action=action,
interface_path=interface_path, include_directives=include_directives)
}@

and those templates then pass include_directives to other templates.


I guess we could do

--- a/rosidl_generator_c/resource/idl__type_support.c.em
+++ b/rosidl_generator_c/resource/idl__type_support.c.em
@@ -16,16 +16,19 @@ include_parts = [package_name] + list(interface_path.parents[0].parts) + [
     'detail', convert_camel_case_to_lower_case_underscore(interface_path.stem)]
 include_base = '/'.join(include_parts)
 
-include_directives = {
+include_directives_ordered = [
   'rosidl_typesupport_interface/macros.h',
   include_base + '__type_support.h',
   include_base + '__struct.h',
-  include_base + '__functions.h'}
+  include_base + '__functions.h']
+
+include_directives = set(include_directives_ordered)
+
 }@
 
 #include <string.h>
 
-@[for header_file in include_directives]@
+@[for header_file in include_directives_ordered]@
 #include "@(header_file)"
 @[end for]@

so we iterate over the ordered list but and use the set in the other templates

@clalancette
Copy link
Contributor

so we iterate over the ordered list but and use the set in the other templates

Yeah, I like this. I'd say let's go ahead with that change. Thanks!

Python set's do not have a deterministic iteration order (unlike dicts),
therefore replace usage of set with OrderedDict's for the `includes`
variable in `full__description.c.em` to make codegen reproducible.

Signed-off-by: Harry Sarson <[email protected]>
The `include_directives` set is used by the nested templates to
prevent generation of duplicate include directives, but is not iterated
over by those nested templates.

In `idl__type_support.c.em` rather that iterating over the
`include_directives` set (which introduces non-determinism) to generate
the includes for top level header files, we iterate over those top level
includes as a list and seperately construct the `include_directives` set
from the list. This ensures deterministic codegen.

Signed-off-by: Harry Sarson <[email protected]>
@harrysarson harrysarson force-pushed the harry/deterministic-codegen branch from 1c760e0 to a9ea6b0 Compare December 5, 2024 09:15
@harrysarson
Copy link
Contributor Author

harrysarson commented Dec 5, 2024

I have had a go at this, please see latest commits. I kept the OrderedDict in rosidl_generator_c/resource/full__description.c.em for the includes variable (this matches what we do here for example

).

@harrysarson harrysarson closed this Dec 5, 2024
@harrysarson harrysarson reopened this Dec 5, 2024
Copy link
Contributor

@clalancette clalancette left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

One more change, then this will be ready for CI. Thanks for iterating.

@harrysarson harrysarson changed the title Use OrderedDict for reproducible codegen Deterministic iteration order for reproducible codegen Dec 6, 2024
@clalancette
Copy link
Contributor

Pulls: #846
Gist: https://gist.githubusercontent.com/clalancette/3b0f4ae9440be6b9395a03b5fb6fdfa0/raw/b854cb8c05665050c8ef1c8f0a5c5ecd0630c6a4/ros2.repos
BUILD args:
TEST args:
ROS Distro: rolling
Job: ci_launcher
ci_launcher ran: https://ci.ros2.org/job/ci_launcher/14929

  • Linux Build Status
  • Linux-aarch64 Build Status
  • Linux-rhel Build Status
  • Windows Build Status

@clalancette clalancette merged commit 95739d5 into ros2:rolling Dec 10, 2024
4 checks passed
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

2 participants