[WIP] LLVM-based Locality Optimization for Chapel programs #9
base: master
Function* f = call->getCalledFunction();
if (f != NULL) {
/*
 * We are assuming that gf.addr function calls correspond to Chapel's local statements, but this is not always true because gf.addr is also used to extract a local pointer from a wide pointer. We work on this later pass (see exemptionTest in llvmLocalityOptimization.cpp).
Could you split this comment line into multiple lines < 80 characters?
To infer the locality of each possibly-remote data access while taking the CFG into account, the inequality graph must be constructed accordingly. Since the original LLVM IR does not carry any information on the def/use of locality (e.g. the local statement in Chapel), we first need to construct our own CFG that has such information. Next, locality-SSA can be built in the same way that classic compilers build SSA, and finally the inequality graph can easily be constructed. Here are the steps:
(Step 1: This commit) Construct our own CFG by recognizing def/use of locality (e.g. calling @.gf.addr and then doing load and store => local statement)
(Step 2: Locality-SSA Construction 1 TODO) Compute Dominator Tree and Dominance Frontier of the CFG
(Step 3: Locality-SSA Construction 2 TODO) Insert Phi-function
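Step 2 could be sketched roughly as follows. This is a minimal, self-contained illustration that assumes the iterative dominator algorithm of Cooper, Harvey and Kennedy and a plain adjacency-list CFG; the `Graph` alias and the function names `computeIdom` and `dominanceFrontier` are hypothetical and are not taken from this pull request. It also assumes that block 0 is the entry and has no incoming edges.

```cpp
#include <cassert>
#include <cstddef>
#include <set>
#include <vector>

// Hypothetical CFG encoding: succ[i] lists the successors of block i,
// and block 0 is the entry. This is not the PR's own CFG class.
using Graph = std::vector<std::vector<int>>;

// Depth-first traversal that appends blocks in postorder.
static void postorder(const Graph& succ, int node, std::vector<bool>& seen,
                      std::vector<int>& out) {
  seen[node] = true;
  for (int s : succ[node])
    if (!seen[s]) postorder(succ, s, seen, out);
  out.push_back(node);
}

// Invert the successor lists into predecessor lists.
static Graph predecessors(const Graph& succ) {
  Graph pred(succ.size());
  for (std::size_t u = 0; u < succ.size(); ++u)
    for (int v : succ[u]) pred[v].push_back(static_cast<int>(u));
  return pred;
}

// Immediate dominators; idom[0] == 0 and unreachable blocks keep -1.
std::vector<int> computeIdom(const Graph& succ) {
  assert(!succ.empty());
  const int n = static_cast<int>(succ.size());
  std::vector<bool> seen(n, false);
  std::vector<int> post;
  postorder(succ, 0, seen, post);
  std::vector<int> rank(n, -1);  // postorder index of each reachable block
  for (std::size_t i = 0; i < post.size(); ++i)
    rank[post[i]] = static_cast<int>(i);
  const Graph pred = predecessors(succ);

  std::vector<int> idom(n, -1);
  idom[0] = 0;
  // Walk both candidates up the current dominator tree until they meet.
  auto intersect = [&](int a, int b) {
    while (a != b) {
      while (rank[a] < rank[b]) a = idom[a];
      while (rank[b] < rank[a]) b = idom[b];
    }
    return a;
  };
  for (bool changed = true; changed;) {
    changed = false;
    // Visit blocks in reverse postorder, skipping the entry.
    for (auto it = post.rbegin(); it != post.rend(); ++it) {
      const int b = *it;
      if (b == 0) continue;
      int best = -1;
      for (int p : pred[b]) {
        if (idom[p] == -1) continue;  // predecessor not processed yet
        best = (best == -1) ? p : intersect(p, best);
      }
      if (best != idom[b]) {
        idom[b] = best;
        changed = true;
      }
    }
  }
  return idom;
}

// Dominance frontier of every block, derived from the immediate dominators.
std::vector<std::set<int>> dominanceFrontier(const Graph& succ,
                                             const std::vector<int>& idom) {
  const Graph pred = predecessors(succ);
  std::vector<std::set<int>> df(succ.size());
  for (std::size_t b = 0; b < succ.size(); ++b) {
    if (idom[b] == -1 || pred[b].size() < 2) continue;  // only join points
    for (int p : pred[b]) {
      if (idom[p] == -1) continue;
      for (int r = p; r != idom[b]; r = idom[r])
        df[r].insert(static_cast<int>(b));
    }
  }
  return df;
}
```

For a diamond-shaped CFG (entry 0 branching to 1 and 2, both joining at 3), the frontier of blocks 1 and 2 is the join block 3, which is where Step 3 would place a Phi-function for locality.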
Hi Michael, thank you very much for the valuable comments; they are very helpful. My changes include:
I've just sent you an email with more details, including things that I would like to keep confidential for now. Please let me know if there is anything I missed. Thanks, Akihiro
Summary
An initial version of an LLVM-based Locality Inference Pass (Locality Optimization Pass). This pass tries to convert as many possibly-remote accesses (addrspace(100)* accesses) as possible into definitely-local accesses at compile time, in order to avoid the overhead of runtime affinity checks.
What I would expect
I would appreciate it if the Chapel folks could give me feedback, particularly on:
Detection of Chapel Runtime API calls
The current implementation tries to recover local statements, Chapel array construction, and Chapel array accesses, since this information is lost when LLVM IR is generated. The recovery process can easily fail because it depends entirely on how the Chapel-LLVM frontend emits LLVM IR. I'd suggest that the frontend add annotations and/or attributes so that LLVM-based PGAS optimizations can easily recognize the high-level information. This would also help make the pass language-agnostic.
Coding style
If there is a preferred coding style for this repository, please let me know.
Any other feedback is certainly appreciated! For more details of the locality optimization pass, please see below.
How it works
To infer locality, the locality optimization pass tries to utilize the following information:
(Please also see test/local.ll)
Case 1. Scalar access enclosed by Chapel's LOCAL statement
The locality level of x is inferred by searching an SSA value graph, which is implemented in IGraph.[h|cpp]. When you specify debugThisFn, the pass generates a .dot file that can be visualized with the Graphviz tool (http://www.graphviz.org/).
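The graph search in Case 1 can be pictured with the following minimal sketch. It is not the code in IGraph.[h|cpp]: the `ValueGraph` struct, the meaning of an edge ("this value holds the same address as that value") and the functions `inferLocality` and `toDot` are assumptions made for illustration only.

```cpp
#include <cassert>
#include <cstddef>
#include <queue>
#include <sstream>
#include <string>
#include <vector>

enum class Locality { PossiblyRemote, DefinitelyLocal };

// Hypothetical stand-in for the SSA value graph; all names are illustrative.
struct ValueGraph {
  std::vector<std::string> names;
  std::vector<bool> knownLocal;               // seeded from local statements
  std::vector<std::vector<int>> sameAddress;  // v -> values with the same address

  int addValue(const std::string& name, bool local = false) {
    names.push_back(name);
    knownLocal.push_back(local);
    sameAddress.emplace_back();
    return static_cast<int>(names.size()) - 1;
  }
  void addEdge(int from, int to) {
    assert(from >= 0 && from < static_cast<int>(names.size()));
    assert(to >= 0 && to < static_cast<int>(names.size()));
    sameAddress[from].push_back(to);
  }
};

// Breadth-first search: a value is definitely-local if it reaches a value
// that was already proven local; otherwise it stays possibly-remote.
Locality inferLocality(const ValueGraph& g, int start) {
  std::vector<bool> seen(g.names.size(), false);
  std::queue<int> work;
  work.push(start);
  seen[start] = true;
  while (!work.empty()) {
    const int v = work.front();
    work.pop();
    if (g.knownLocal[v]) return Locality::DefinitelyLocal;
    for (int next : g.sameAddress[v]) {
      if (!seen[next]) {
        seen[next] = true;
        work.push(next);
      }
    }
  }
  return Locality::PossiblyRemote;
}

// Emit the graph in Graphviz DOT syntax, one edge per line.
std::string toDot(const ValueGraph& g) {
  std::ostringstream out;
  out << "digraph IGraph {\n";
  for (std::size_t v = 0; v < g.names.size(); ++v)
    for (int to : g.sameAddress[v])
      out << "  \"" << g.names[v] << "\" -> \"" << g.names[to] << "\";\n";
  out << "}\n";
  return out.str();
}
```

A conservative default matters here: anything that cannot be connected to a proven-local value remains possibly-remote, so a failed search never makes the generated code incorrect, only slower.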
Case 2. Array access enclosed by Chapel's LOCAL statement
The locality optimization pass is element-sensitive. For example, the locality of A(1) is definitely-local, but the pass leaves A(2) as possibly-remote since there is not enough information about the locality of A(2). This is done by using a reduced version of LLVM's global value numbering pass to assign a value number to variables and expressions (in ValueTable.[h|cpp]), combined with an array offset analysis.
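One way to picture the element-sensitive bookkeeping is the sketch below. It assumes that each array base receives a value number and that locality facts are stored per (value number, constant offset) pair; the classes `ValueTable` and `ElementLocality` here are simplified, hypothetical stand-ins that work on strings rather than LLVM values, and they do not reproduce ValueTable.[h|cpp].

```cpp
#include <cassert>
#include <cstdint>
#include <map>
#include <set>
#include <string>
#include <utility>

// Hypothetical, much-reduced value table: identical expression strings
// receive the same number, distinct ones receive fresh numbers.
class ValueTable {
  std::map<std::string, std::uint32_t> numbers_;
  std::uint32_t next_ = 1;

 public:
  std::uint32_t lookupOrAdd(const std::string& expr) {
    auto it = numbers_.find(expr);
    if (it != numbers_.end()) return it->second;
    numbers_[expr] = next_;
    return next_++;
  }
};

// Records which (array base, constant offset) pairs are known to be local.
class ElementLocality {
  ValueTable table_;
  std::set<std::pair<std::uint32_t, std::int64_t>> local_;

 public:
  // Called when an access such as A(offset) appears inside a local statement.
  void markLocal(const std::string& base, std::int64_t offset) {
    assert(!base.empty());
    local_.insert(std::make_pair(table_.lookupOrAdd(base), offset));
  }
  // True only for elements that were explicitly proven local.
  bool isDefinitelyLocal(const std::string& base, std::int64_t offset) {
    return local_.count(std::make_pair(table_.lookupOrAdd(base), offset)) > 0;
  }
};
```

With this layout, marking A(1) as local says nothing about A(2), which matches the behavior described above: the second element stays possibly-remote until there is separate evidence for it.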
Case 3. Locale-local array declaration
The locality of A(5) is definitely-local since the array A is declared in this scope. Note that in this case the pass is not element-sensitive so far.
Limitations
Chapel's Local statement Detection
Currently, we assume that gf.addr function calls correspond to Chapel's local statements, but this is not always true because gf.addr is also used to extract a local pointer from a wide pointer. To work around this problem, we keep a std::vector named "NonLocals" that records every return value of gf.addr that is also an argument of gf.make, and NonLocals is consulted when doing "exemptionTest". This heuristic may not always be correct. Ideally, a PGAS-LLVM frontend should tell the locality optimization pass which gf.addr calls are local statements.
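The NonLocals heuristic described above might look roughly like the following sketch. It models calls as plain structs instead of LLVM instructions, matches callees by substring, and uses the hypothetical names `Call` and `collectNonLocals`; only `exemptionTest`, `NonLocals`, gf.addr and gf.make come from the description, and the real signatures in llvmLocalityOptimization.cpp may differ.

```cpp
#include <algorithm>
#include <cassert>
#include <set>
#include <string>
#include <vector>

// Hypothetical, simplified view of a call instruction.
struct Call {
  std::string callee;             // e.g. ".gf.addr.1"
  std::string result;             // SSA name of the returned value
  std::vector<std::string> args;  // SSA names of the arguments
};

static bool callsFunction(const Call& c, const std::string& fragment) {
  return c.callee.find(fragment) != std::string::npos;
}

// Collect gf.addr results that are later passed to gf.make. Such values only
// extract a local pointer from a wide pointer, so they are not treated as
// evidence of a local statement.
std::vector<std::string> collectNonLocals(const std::vector<Call>& body) {
  std::set<std::string> addrResults;
  for (const Call& c : body)
    if (callsFunction(c, "gf.addr")) addrResults.insert(c.result);

  std::vector<std::string> nonLocals;
  for (const Call& c : body) {
    if (!callsFunction(c, "gf.make")) continue;
    for (const std::string& arg : c.args) {
      const bool recorded =
          std::find(nonLocals.begin(), nonLocals.end(), arg) != nonLocals.end();
      if (addrResults.count(arg) > 0 && !recorded) nonLocals.push_back(arg);
    }
  }
  return nonLocals;
}

// True if the value must be exempted from the "gf.addr means local" rule.
bool exemptionTest(const std::vector<std::string>& nonLocals,
                   const std::string& value) {
  assert(!value.empty());
  return std::find(nonLocals.begin(), nonLocals.end(), value) !=
         nonLocals.end();
}
```

Because the test is purely syntactic, a gf.addr result that escapes through some other path than gf.make would still be treated as a local statement, which is one reason frontend annotations would be more robust.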
Example :
Chapel's Array Declaration Detection
We basically look for chpl__convertRuntimeTypeToValue to detect Chapel's array declaration. Please see analyzeCallInsn for more details.
TODOs and future work
The utilization of high-level information
The locality optimization pass currently has to recover high-level information such as array accesses and local statements from low-level LLVM IR. Ideally, a PGAS-LLVM frontend would add annotations that preserve this information, so that the locality optimization can easily recognize it and perform language-agnostic PGAS optimization.
Locality Inference considering if statements
The current implementation does not propagate a condition even if a local statement is enclosed by an if statement. Hence, we may fail to infer the locality in cases like this:
if (condition) { local { p = x } }
Make it inter-procedural pass
This would allow the pass to convert more possibly-remote accesses into definitely-local accesses.
More experiments with the latest version of the Chapel compiler (1.12.0)
I have mainly been working with the Chapel compiler 1.9.0. I still need to check whether the locality optimization pass works with the latest version.