Add initial layout selection framework #1264
Conversation
Force-pushed from c942df4 to f796e2d
Force-pushed from 818383c to dd5197a
Force-pushed from af8ea16 to 4f040e1
// result of sum has row-major layout, i.e., with don't cares at the end
// (1, 2, ..., 32, x, x, x, x, ..., x)
Related to #1258
rotate-and-reduce code (for example, summing tensor<32x32xi16> to tensor<32xi16>, then summing that to i16) would assume cyclic repetition for the middle layout (d0 -> d0), i.e., the tail is not "don't care" but meaningful data. Could we describe that?
This is a good question. affine_map can't natively represent the semantics of what lies outside the image of the layout map. It seems reasonable to me that we could require cyclic repetition by default for all layouts that don't fill up their ciphertexts.
I can think of a few obstacles to doing this:
- It requires the data to have dimensions divisible by a power of 2 (so that we can get at least one full repetition; a partial repetition would be invalid), though I think most compilers pad non-power-of-two-sized inputs anyway.
- Some ops may have a kernel that doesn't produce cyclic repetition in its output layout, in which case we would have to insert an op (likely during the lowering) to restore this invariant. The overhead could add up to an unacceptable performance cost.
But I'm willing to forge ahead with this assumption anyway, since if/when it becomes a problem, we can wrap the layout attribute in a larger attribute that includes an enum specifying the semantics.
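For concreteness, here is a rough plaintext sketch of the two tail semantics being discussed: "don't care" padding versus cyclic repetition of a 32-element vector packed into a larger slot vector. The slot count and helper names are hypothetical illustrations, not HEIR APIs.

```python
# Illustrative sketch only: the two possible tail semantics for a
# 32-element vector packed into a ciphertext with more slots.
# NUM_SLOTS and the helper names are hypothetical, not HEIR APIs.

NUM_SLOTS = 1024  # assumed ciphertext slot count

def pack_with_dont_cares(data, num_slots=NUM_SLOTS):
    """Row-major layout: data in the leading slots, unspecified tail.

    Here the tail is filled with zeros, but under "don't care" semantics
    any value is permitted there.
    """
    return data + [0] * (num_slots - len(data))

def pack_with_cyclic_repetition(data, num_slots=NUM_SLOTS):
    """Cyclic repetition: the data repeats to fill every slot.

    Requires len(data) to divide num_slots so every copy is complete.
    """
    assert num_slots % len(data) == 0, "need a whole number of repetitions"
    return data * (num_slots // len(data))

if __name__ == "__main__":
    vec = list(range(1, 33))                      # (1, 2, ..., 32)
    print(pack_with_dont_cares(vec)[:40])          # 1..32 then zeros ("x"s)
    print(pack_with_cyclic_repetition(vec)[:40])   # 1..32, then 1, 2, ... again
```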
Force-pushed from 159def7 to 9d6144b
Did a bit of cleanup and left some TODOs so we can get the first big piece of this PR in. New stuff since last comment:
Followup work:
Happy to see this! Just some comments on the MLIR side; the rest LGTM!
Force-pushed from c2e9461 to 609cbf3
Thanks! Just a few quick questions and I'll mark as approved for you to submit.
@@ -0,0 +1,114 @@
#ifndef LIB_TRANSFORMS_LAYOUTPROPAGATION_LAYOUTPROPAGATION_TD_ |
Could you place this transform under lib/Dialect/TensorExt/Transforms? If a transform is pretty dialect-specific I think we'd like to put it there, to avoid making lib/Transforms/ such a mixed bag (@AlexanderViand-Intel and I still have some leftover uncategorized transforms there, but this one seems to belong pretty clearly under TensorExt).
I actually had it there and decided, because of the dependence on ops from tensor, tensor_ext, and linalg, that it should be considered a general transform not specific to one dialect. There will definitely be more linalg ops coming (convolution, matmul, matvec), and I could also see it supporting layout assignment for higher-level ML ops if they are present.
Force-pushed from 9cb2bc6 to b488f00
thanks!
This PR adds an initial ciphertext layout selection pass.
This involves the following:
This pass does not optimize for the selected layouts. Instead, I expect to define additional passes that optimize, while allowing this pass to double as an analysis pass (persisting layout information to ops as attributes) as well as an initial layout selection pass. As such, the logic in this pass is to convert all secret tensor operands of an op to have the layout of the op's first operand.
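A rough sketch of that rule in Python pseudocode (the Op class, value names, and layout strings are hypothetical stand-ins for the real MLIR structures, not the actual pass):

```python
# Hypothetical sketch of the propagation rule described above: every secret
# tensor operand is converted to the layout of the op's first operand.
# Op and the layout map are made-up stand-ins, not HEIR APIs.
from dataclasses import dataclass, field

@dataclass
class Op:
    name: str
    operands: list                      # SSA value names (strings) in this sketch
    results: list = field(default_factory=list)

def propagate_layouts(op, layouts):
    """Make all operands share the first operand's layout; tag the results.

    `layouts` maps value names to layout descriptors and is updated in place.
    Returns the (value, target_layout) pairs that would need a convert_layout
    op materialized during lowering.
    """
    conversions = []
    if not op.operands:
        return conversions
    target = layouts[op.operands[0]]
    for operand in op.operands[1:]:
        if layouts.get(operand) != target:
            conversions.append((operand, target))
    for result in op.results:
        layouts[result] = target
    return conversions

if __name__ == "__main__":
    layouts = {"%a": "row-major", "%b": "diagonal"}
    add = Op("arith.addi", operands=["%a", "%b"], results=["%c"])
    print(propagate_layouts(add, layouts))  # [('%b', 'row-major')]
    print(layouts["%c"])                    # row-major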
This PR also does not yet lower vecmat/matvec using Halevi-Shoup, nor does it lower convert_layout (which will require its own effort to implement a shift network optimization algorithm). I wanted to get feedback from the community on the direction as it relates to the layout attribute and propagation before doing too much additional work.
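For context, the Halevi-Shoup approach referenced above packs the matrix by generalized diagonals so that a matrix-vector product becomes a sequence of rotations, pointwise multiplications, and additions. A plaintext Python sketch of that arithmetic (modeling the kernel, not the HEIR lowering):

```python
# Plaintext sketch of the Halevi-Shoup diagonal method for a square
# matrix-vector product; it models the arithmetic a ciphertext kernel
# would perform, not the actual lowering in this PR.

def rotate(vec, k):
    """Cyclic left rotation by k slots."""
    k %= len(vec)
    return vec[k:] + vec[:k]

def halevi_shoup_matvec(matrix, vec):
    """Compute matrix @ vec using the diagonal layout.

    matrix is an n x n list of lists; vec has length n. The i-th
    generalized diagonal is diag_i[j] = matrix[j][(j + i) % n].
    """
    n = len(vec)
    result = [0] * n
    for i in range(n):
        diag = [matrix[j][(j + i) % n] for j in range(n)]
        rotated = rotate(vec, i)
        result = [r + d * v for r, d, v in zip(result, diag, rotated)]
    return result

if __name__ == "__main__":
    m = [[1, 2], [3, 4]]
    v = [5, 6]
    assert halevi_shoup_matvec(m, v) == [17, 39]  # matches the plain matvec
```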
Example commands
An IR that requires layout conversion ops to be inserted
bazel run --noallow_analysis_cache_discard //tools:heir-opt -- --layout-propagation $PWD/tests/Transforms/layout_propagation/insert_conversion.mlir
Result: https://gist.github.com/j2kun/eef84273a83893a992694c219b4dde3f
Layout assignments for Halevi-Shoup matvec mul:
bazel run //tools:heir-opt -- --secretize --wrap-generic --drop-unit-dims --canonicalize --cse --linalg-canonicalizations --layout-propagation $PWD/tests/Examples/openfhe/halevi_shoup_matmul.mlir
Result: https://gist.github.com/j2kun/75b676fe90d5ba1f2c67ad1a2409406b
Open questions
layout-propagation as defined in this PR, and will have to do a global optimization from scratch.
Notes