-
Notifications
You must be signed in to change notification settings - Fork 423
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
--llvm-wide-opt TODOs #7523
Comments
2 tasks
mppf
added a commit
that referenced
this issue
Oct 3, 2017
Enable --llvm-wide-opt with struct wide pointers Previous to this PR, --llvm-wide-opt has not been working, because `CHPL_WIDE_POINTERS=node16` was not working. See issue #6149. But the packed wide pointer representation was introduced along with `--llvm-wide-opt` as a workaround for problems with LLVM optimizations not understanding that pointers in different address spaces can have different sizes. That was a serious issue in the LLVM 3.3 era but it appears to have been fixed in later LLVMs. The packed wide pointer representation has never had much support, other than enabling the experimental `--llvm-wide-opt`. For one thing, the representation limits maximum scaling. For another, it appears to inhibit some C/LLVM optimization because the packed pointers lose type information. Therefore this PR takes two steps: 1. Completele remove CHPL_WIDE_POINTERS and support for packed wide pointers. 2. Update `--llvm-wide-opt` support to work with a structure wide pointer representation 3. Make use of new interfaces provided by LLVM to remove some workarounds 4. Fix bugs and get `--llvm-wide-opt` in many cases. I don't think `--llvm-wide-opt` has ever passed full local testing, and it hasn't worked at all for a release or so. So even if this PR doesn't get all tests passing with `--llvm-wide-opt`, it is an improvement to the current situation. ### What is --llvm-wide-opt? See also doc/rst/technotes/llvm.rst, but --llvm-wide-opt activates a mode where Chapel generates wide pointers as address-space-100 pointers, and GET is load, PUT is store (or these could possibly be the LLVM memcpy instrinsic in more general cases). Then normal LLVM optimizations run and can do things like hoist a GET/load out of a loop. After LLVM optimizations run, we run llvmAggregateGlobalOps, which un-does some deoptimization that some other LLVM passes might do, and combines loads/stores to provably adjacent memory locations. Finally, we run llvmGlobalToWide, which "lowers" all of this address-space-100 business back to wide pointers and calls to runtime functions to GET and PUT. ### Performance Impact * `performance/ferguson/remote-record-read-licm.chpl` shows LLVM optimizations hoisting a GET out of a loop * observed no change in performance for stream-ep, lulesh with or without --llvm-wide-opt * miniMD shows about 25% performance improvement with --llvm-wide-opt over C backend * PTRANS shows about a 4x performance speedup with --llvm-wide-opt vs the C backend. Elliot ran some performance testing with revision 3218725 and there were performance regressions that are fixed by later commits. ### Notes on particular changes: dc82aa3 makes a few methods on AST classes const and adds Type::isWidePtrType() which enables a workaround in a later commit. 5c4c514 uses Type::isWidePtrType in code generation of types to work around a hack in parallel.cpp and then 96ec460 adjusts codegenWideThingField to further work around this issue. ad9319b cleans up how we use clang and separates out details of the clang invocation from GenInfo into a new ClangInfo structure. Additionally, it uses CreateLLVMCodeGen (new in 3.9) now instead of relying on internal Clang types. Unfortunately, I couldn't find a way to generate an llvm::Type* from a clang::Type* using the public API and so we still use some internal clang headers, but we are much closer to being API-clean, so to speak. We might be able to get this Type information and also GEP offset generation (the other missing piece) in to an additional interface for clang's CodeGenModule. 8783dba switches over to running our own LLVM optimization pipeline, instead of relying on clang's pipeline. Without this change, I was unable to get the data layout string in the module to reflect that address space 100 pointers are 128 bits (and that causes all sorts of problems with the lowering in llvmGlobalToWide). Within EmitBackendOutput, the clang code *really* wants to replace the data layout string with whatever it thinks is appropriate. But we need to adjust it before/after the llvmGlobalToWide pass. 70b0a10 uses an alternative strategy to EmitBackendOutput so we don't run optimizations twice and 0520ced fixes some problems with detecting target Features (such as sse) and -mllvm parsing. 1313e3e was originally in PR #7496 and is included here so that this PR can be tested with LLVM 5 and to minimize conflicts. That PR has a good description of the change. Resolves #6149. Reviewed by @dmk42 - thanks! Further TODOs (including getting more tests working with --llvm-wide-opt) are in #7523 - [x] full local testing with CHPL_LLVM=llvm - [x] full local --llvm testing
I'm closing this issue because now we track TODOs in a different repository, for the most part. So this is moved to https://github.com/Cray/chapel-private/issues/2121. |
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
The text was updated successfully, but these errors were encountered: