Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Implement subgraph splitting for overflow resolution #388

Merged
merged 1 commit into from
May 10, 2023
Merged

Conversation

cmyr
Copy link
Member

@cmyr cmyr commented May 5, 2023

A port of the harfbuzz implementation; when overflows occur in subgraphs with multiple 32-bit roots (e.g. extension subtables sharing coverage tables or similar) we will iteratively select some subset of those roots and move them to another subgraph, duplicating and dependencies.

based on #386, which should go in first.

Base automatically changed from hb-repacker to main May 8, 2023 15:49
@cmyr cmyr force-pushed the isolate-subgraph branch 4 times, most recently from a122fdf to 357be56 Compare May 9, 2023 14:46

assert!(!overflows.is_empty());
self.debug_overflows(&overflows);
//FIXME: return some kind of error here
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I suggest we might err on the side of issues over comments for this sort of thing, with possible exception for cases where we're going to fix it in O(days)

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oops this comment is out of date; we check the return and convert to an error in the dump_table fn.

n_overflows == 0

// now isolate spaces in a loop, until there are no more left:
let overflows = loop {
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can you write this as while !overflows.is_empty() instead of sticking a return in the middle? Hm. Or structure it as iteration?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The problem is that we aren't always going to continue until there are no overflows left; sometimes overflow is unavoidable.

The control flow here is:

  • check if there are overflows
  • if there are none, return
  • if there are some, try to make progress on resolving them
  • if we don't make progress, break the loop and print diagnostics

This should always terminate, since we exit the loop if either there are no overflows, or if there are overflows but there are no splittable subgraphs remaining.

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The comment above reads as if we are going to continue until there are none so that might need a fix then. IMHO while primary-exit-condition is still nicer, even if there is another path toward failure.

What about using Result<(), SomeErrType> so you can use normal control flow, e.g. try_splitting_subgraphs(...)??

}
}
}
result
}

fn debug_overflows(&self, overflows: &[Overflow]) {
let (parents, children): (HashSet<_>, HashSet<_>) =
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

if debug is not enabled early exit

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also should be behind log feature per other conversation? - np if that's out of scope here though.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

yea we don't have a log feature yet, will revisit that when I make that change (opened #397 to track this)

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It looks like the early exit for log disabled - avoiding the iterations and zips to produce logs we never write - was not done?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

We don't have a log feature, so there's nothing to use to gate this code. In any case this exists right now as a utility for me to use while I am implementing the splitting/promotion stuff.

Comment on lines +694 to +698
for overflow in overflows {
let child_space = self.nodes[&overflow.child].space;
// we only isolate subgraphs in wide-space
if !child_space.is_custom() || self.num_roots_per_space[&child_space] <= 1 {
continue;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Optional, looks like this would fit very naturally as .iter().filter()?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It doesn't feel like a clear improvement, because i want to reuse the child_space binding outside of the filter. 🤷

for overflow in overflows.iter().filter(|overflow| {
            let child_space = self.nodes[&overflow.child].space;
            // we only isolate subgraphs in wide-space
            !(!child_space.is_custom() || self.num_roots_per_space[&child_space] <= 1)
        }) {
            let child_space = self.nodes[&overflow.child].space;
            let root = self.find_root_of_space(overflow.child);
            debug_assert_eq!(self.nodes[&root].space, child_space);
            to_isolate
                .entry(child_space)
                .or_insert_with(HashSet::new)
                .insert(root);
        }


if to_isolate.is_empty() {
return false;
}
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

At a glance it looks like you could write this as to_isolate = overflows.iter.filter.map.collect?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The reason I do this in a loop is because I want to use the 'entry' method on HashMap, which doesn't work with collect. Basically I'm finding multiple roots per space.

// │ └─8─┘ │ / \ / \
// │ │ (cov tables) 7' 8' 7 8
// └───7───┘
//
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Diagram appreciated, without it the test would be inscrutable

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed, I found it impossible to come up with good test cases without drawing them out first.

In the case that a subgraph in 32-bit space contains an overflow, we
will now attempt to select some set of roots in that subgraph and move
them to a new space, duplicating any children as required.
@dfrg
Copy link
Member

dfrg commented May 10, 2023

QQ: ObjectIds are allocated with a global usize counter. Should we change this to an explicit u64 (or allocate ids locally in the graph) to avoid potential overflows if someone does decide to run this on a 32-bit system?

@cmyr
Copy link
Member Author

cmyr commented May 10, 2023

QQ: ObjectIds are allocated with a global usize counter. Should we change this to an explicit u64 (or allocate ids locally in the graph) to avoid potential overflows if someone does decide to run this on a 32-bit system?

Hmm... I would expect the process to only be running for a specific compilation, which even in the case of a very large multi-access font it's hard to imagine there being 4.2e9 unique objects? And in that case I would definitely expect us to be on a 64-bit system. That said, it wouldn't hurt...

@cmyr cmyr mentioned this pull request May 10, 2023
@cmyr cmyr merged commit 812ceb6 into main May 10, 2023
@cmyr cmyr deleted the isolate-subgraph branch May 10, 2023 17:31
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants