forked from intel/llvm
-
Notifications
You must be signed in to change notification settings - Fork 0
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
[SYCL][Sema] Prevent Ctor/Dtor from being called during SYCL kernel r… #1
Open
Naghasan
wants to merge
1
commit into
sycl
Choose a base branch
from
victor/memcpy-kernel-arg-tmp
base: sycl
Could not load branches
Branch not found: {{ refName }}
Loading
Could not load tags
Nothing to show
Loading
Are you sure you want to change the base?
Some commits from the old base branch may be removed from the timeline,
and old review comments may become outdated.
Conversation
This file contains bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
…ebuild User can declare their type as device copyable even if they do not meet the criterias. In some cases, this triggers calls to undefined, deleted or not allowed on device functions. Per SYCL rules, a SYCL kernel object must be device copiable which implies the structure can be initialized using memcpy and have a trivial destructor. Plus the spec doesn't guarantee that the constructor/destructor should get called at all as it allows the functor to be reused by multiple WI. This patch exploit that implementation defined behavior by preventing the constructor/destructor of being called by wrapping it into a union. In order to prevent this, the SYCL kernel is wrapped into a union and instead of using initializer list, the SYCL kernel clone is init using assignment for scalars, memcpy for non special types and Ctor/__init for SYCL special types. Signed-off-by: Victor Lomuller <[email protected]>
Naghasan
pushed a commit
that referenced
this pull request
Feb 6, 2023
In RegisterInfos_loongarch64.h, r22 is defined twice. Having an extra array member causes problems reading and writing registers defined after r22. So, for r22, keep the alias fp, delete the s9 alias. The PC register is incorrectly accessed when the step command is executed. The step command behavior is incorrect. This test reflects this problem: ``` loongson@linux:~$ cat test.c #include <stdio.h> int func(int a) { return a + 1; } int main(int argc, char const *argv[]) { func(10); return 0; } loongson@linux:~$ clang -g test.c -o test ``` Without this patch: ``` loongson@linux:~$ llvm-project/llvm/build/bin/lldb test (lldb) target create "test" Current executable set to '/home/loongson/test' (loongarch64). (lldb) b main Breakpoint 1: where = test`main + 40 at test.c:8:3, address = 0x0000000120000668 (lldb) r Process 278049 launched: '/home/loongson/test' (loongarch64) Process 278049 stopped * thread #1, name = 'test', stop reason = breakpoint 1.1 frame #0: 0x0000000120000668 test`main(argc=1, argv=0x00007fffffff72a8) at test.c:8:3 5 } 6 7 int main(int argc, char const *argv[]) { -> 8 func(10); 9 return 0; 10 } 11 (lldb) s Process 278049 stopped * thread #1, name = 'test', stop reason = step in frame #0: 0x0000000120000670 test`main(argc=1, argv=0x00007fffffff72a8) at test.c:9:3 6 7 int main(int argc, char const *argv[]) { 8 func(10); -> 9 return 0; 10 } ``` With this patch: ``` loongson@linux:~$ llvm-project/llvm/build/bin/lldb test (lldb) target create "test" Current executable set to '/home/loongson/test' (loongarch64). (lldb) b main Breakpoint 1: where = test`main + 40 at test.c:8:3, address = 0x0000000120000668 (lldb) r Process 278632 launched: '/home/loongson/test' (loongarch64) Process 278632 stopped * thread #1, name = 'test', stop reason = breakpoint 1.1 frame #0: 0x0000000120000668 test`main(argc=1, argv=0x00007fffffff72a8) at test.c:8:3 5 } 6 7 int main(int argc, char const *argv[]) { -> 8 func(10); 9 return 0; 10 } 11 (lldb) s Process 278632 stopped * thread #1, name = 'test', stop reason = step in frame #0: 0x0000000120000624 test`func(a=10) at test.c:4:10 1 #include <stdio.h> 2 3 int func(int a) { -> 4 return a + 1; 5 } ``` Reviewed By: SixWeining, DavidSpickett Differential Revision: https://reviews.llvm.org/D140615
Naghasan
pushed a commit
that referenced
this pull request
Feb 6, 2023
When building/testing ASan inside the GCC tree on Solaris while using GNU `ld` instead of Solaris `ld`, a large number of tests SEGVs on both sparc and x86 like this: Thread 2 received signal SIGSEGV, Segmentation fault. [Switching to Thread 1 (LWP 1)] 0xfe014cfc in __sanitizer::atomic_load<__sanitizer::atomic_uintptr_t> (a=0xfc602a58, mo=__sanitizer::memory_order_acquire) at sanitizer_common/sanitizer_atomic_clang_x86.h:46 46 v = a->val_dont_use; 1: x/i $pc => 0xfe014cfc <_ZN11__sanitizer11atomic_loadINS_16atomic_uintptr_tEEENT_4TypeEPVKS2_NS_12memory_orderE+62>: mov (%eax),%eax (gdb) bt #0 0xfe014cfc in __sanitizer::atomic_load<__sanitizer::atomic_uintptr_t> (a=0xfc602a58, mo=__sanitizer::memory_order_acquire) at sanitizer_common/sanitizer_atomic_clang_x86.h:46 #1 0xfe0bd1d7 in __sanitizer::DTLS_NextBlock (cur=0xfc602a58) at sanitizer_common/sanitizer_tls_get_addr.cpp:53 intel#2 0xfe0bd319 in __sanitizer::DTLS_Find (id=1) at sanitizer_common/sanitizer_tls_get_addr.cpp:77 intel#3 0xfe0bd466 in __sanitizer::DTLS_on_tls_get_addr (arg_void=0xfeffd068, res=0xfe602a18, static_tls_begin=0, static_tls_end=0) at sanitizer_common/sanitizer_tls_get_addr.cpp:116 intel#4 0xfe063f81 in __interceptor___tls_get_addr (arg=0xfeffd068) at sanitizer_common/sanitizer_common_interceptors.inc:5501 intel#5 0xfe0a3054 in __sanitizer::CollectStaticTlsBlocks (info=0xfeffd108, size=40, data=0xfeffd16c) at sanitizer_common/sanitizer_linux_libcdep.cpp:366 intel#6 0xfe6ba9fa in dl_iterate_phdr () from /usr/lib/ld.so.1 intel#7 0xfe0a3132 in __sanitizer::GetStaticTlsBoundary (addr=0xfe608020, size=0xfeffd244, align=0xfeffd1b0) at sanitizer_common/sanitizer_linux_libcdep.cpp:382 intel#8 0xfe0a33f7 in __sanitizer::GetTls (addr=0xfe608020, size=0xfeffd244) at sanitizer_common/sanitizer_linux_libcdep.cpp:482 intel#9 0xfe0a34b1 in __sanitizer::GetThreadStackAndTls (main=true, stk_addr=0xfe608010, stk_size=0xfeffd240, tls_addr=0xfe608020, tls_size=0xfeffd244) at sanitizer_common/sanitizer_linux_libcdep.cpp:565 The address being accessed is unmapped. However, even when the tests `PASS` with Solaris `ld`, `ASAN_OPTIONS=verbosity=2` shows ==6582==__tls_get_addr: Can't guess glibc version Given that that the code is stricly `glibc`-specific according to `sanitizer_tls_get_addr.h`, there seems little point in using the interceptor on non-`glibc` targets. That's what this patch does. Tested on `i386-pc-solaris2.11` and `sparc-sun-solaris2.11` inside the GCC tree. Differential Revision: https://reviews.llvm.org/D141385
Naghasan
pushed a commit
that referenced
this pull request
May 30, 2023
…est unittest Need to finalize the DIBuilder to avoid leak sanitizer errors like this: Direct leak of 48 byte(s) in 1 object(s) allocated from: #0 0x55c99ea1761d in operator new(unsigned long) #1 0x55c9a518ae49 in operator new intel#2 0x55c9a518ae49 in llvm::MDTuple::getImpl(...) intel#3 0x55c9a4f1b1ec in getTemporary intel#4 0x55c9a4f1b1ec in llvm::DIBuilder::createFunction(...)
Naghasan
pushed a commit
that referenced
this pull request
Oct 9, 2023
Summary: Thread sanitizer reports the following data race: ``` Write of size 8 at 0x000103303e70 by thread T1 (mutexes: write M0): #0 RNBRemote::CommDataReceived(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>> const&) RNBRemote.cpp:1075 (debugserver:arm64+0x100038db8) (BuildId: f130b34f693c4f3eba96139104af2b7132000000200000000100000000000e00) #1 RNBRemote::ThreadFunctionReadRemoteData(void*) RNBRemote.cpp:1180 (debugserver:arm64+0x1000391dc) (BuildId: f130b34f693c4f3eba96139104af2b7132000000200000000100000000000e00) Previous read of size 8 at 0x000103303e70 by main thread: #0 RNBRemote::GetPacketPayload(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>&) RNBRemote.cpp:797 (debugserver:arm64+0x100037c5c) (BuildId: f130b34f693c4f3eba96139104af2b7132000000200000000100000000000e00) #1 RNBRemote::GetPacket(std::__1::basic_string<char, std::__1::char_traits<char>, std::__1::allocator<char>>&, RNBRemote::Packet&, bool) RNBRemote.cpp:907 (debugserver:arm64+0x1000378cc) (BuildId: f130b34f693c4f3eba96139104af2b7132000000200000000100000000000e00) ``` RNBRemote already has a mutex, extend its usage to protect the read of m_rx_packets. Reviewers: jdevlieghere, bulbazord, jingham Subscribers:
Naghasan
pushed a commit
that referenced
this pull request
Feb 15, 2024
…ass template explict specializations (#78720) According to [[dcl.type.elab] p2](http://eel.is/c++draft/dcl.type.elab#2): > If an [elaborated-type-specifier](http://eel.is/c++draft/dcl.type.elab#nt:elaborated-type-specifier) is the sole constituent of a declaration, the declaration is ill-formed unless it is an explicit specialization, an explicit instantiation or it has one of the following forms [...] Consider the following: ```cpp template<typename T> struct A { template<typename U> struct B; }; template<> template<typename U> struct A<int>::B; // #1 ``` The _elaborated-type-specifier_ at `#1` declares an explicit specialization (which is itself a template). We currently (incorrectly) reject this, and this PR fixes that. I moved the point at which _elaborated-type-specifiers_ with _nested-name-specifiers_ are diagnosed from `ParsedFreeStandingDeclSpec` to `ActOnTag` for two reasons: `ActOnTag` isn't called for explicit instantiations and partial/explicit specializations, and because it's where we determine if a member specialization is being declared. With respect to diagnostics, I am currently issuing the diagnostic without marking the declaration as invalid or returning early, which results in more diagnostics that I think is necessary. I would like feedback regarding what the "correct" behavior should be here.
Naghasan
pushed a commit
that referenced
this pull request
Feb 15, 2024
…ing bound ops (#80317) `getDataOperandBaseAddr` retrieve the address of a value when we need to generate bound operations. When switching to HLFIR, we did not really handle the fact that this value was then pointing to the result of a hlfir.declare. Because of that the `#1` value was being used. `#0` value is carrying the correct information about lowerbounds and should be used. This patch updates the `getDataOperandBaseAddr` function to use the correct result value from hlfir.declare.
Naghasan
pushed a commit
that referenced
this pull request
Feb 19, 2024
The concurrent tests all do a pthread_join at the end, and concurrent_base.py stops after that pthread_join and sanity checks that only 1 thread is running. On macOS, after pthread_join() has completed, there can be an extra thread still running which is completing the details of that task asynchronously; this causes testsuite failures. When this happens, we see the second thread is in ``` frame #0: 0x0000000180ce7700 libsystem_kernel.dylib`__ulock_wake + 8 frame #1: 0x0000000180d25ad4 libsystem_pthread.dylib`_pthread_joiner_wake + 52 frame intel#2: 0x0000000180d23c18 libsystem_pthread.dylib`_pthread_terminate + 384 frame intel#3: 0x0000000180d23a98 libsystem_pthread.dylib`_pthread_terminate_invoke + 92 frame intel#4: 0x0000000180d26740 libsystem_pthread.dylib`_pthread_exit + 112 frame intel#5: 0x0000000180d26040 libsystem_pthread.dylib`_pthread_start + 148 ``` there are none of the functions from the test file present on this thread. In this patch, instead of counting the number of threads, I iterate over the threads looking for functions from our test file (by name) and only count threads that have at least one of them. It's a lower frequency failure than the darwin kernel bug causing an extra step instruction mach exception when hardware breakpoint/watchpoints are used, but once I fixed that, this came up as the next most common failure for these tests. rdar://110555062
Naghasan
pushed a commit
that referenced
this pull request
Feb 27, 2024
…(#80904)" This reverts commit b1ac052. This commit breaks coroutine splitting for non-swift calling convention functions. In this example: ```ll ; ModuleID = 'repro.ll' source_filename = "stdlib/test/runtime/test_llcl.mojo" target datalayout = "e-m:e-p270:32:32-p271:32:32-p272:64:64-i64:64-i128:128-f80:128-n8:16:32:64-S128" target triple = "x86_64-unknown-linux-gnu" @0 = internal constant { i32, i32 } { i32 trunc (i64 sub (i64 ptrtoint (ptr @craSH to i64), i64 ptrtoint (ptr getelementptr inbounds ({ i32, i32 }, ptr @0, i32 0, i32 1) to i64)) to i32), i32 64 } define dso_local void @af_suspend_fn(ptr %0, i64 %1, ptr %2) #0 { ret void } define dso_local void @craSH(ptr %0) #0 { %2 = call token @llvm.coro.id.async(i32 64, i32 8, i32 0, ptr @0) %3 = call ptr @llvm.coro.begin(token %2, ptr null) %4 = getelementptr inbounds { ptr, { ptr, ptr }, i64, { ptr, i1 }, i64, i64 }, ptr poison, i32 0, i32 0 %5 = call ptr @llvm.coro.async.resume() store ptr %5, ptr %4, align 8 %6 = call { ptr, ptr, ptr } (i32, ptr, ptr, ...) @llvm.coro.suspend.async.sl_p0p0p0s(i32 0, ptr %5, ptr @ctxt_proj_fn, ptr @af_suspend_fn, ptr poison, i64 -1, ptr poison) ret void } define dso_local ptr @ctxt_proj_fn(ptr %0) #0 { ret ptr %0 } ; Function Attrs: nomerge nounwind declare { ptr, ptr, ptr } @llvm.coro.suspend.async.sl_p0p0p0s(i32, ptr, ptr, ...) #1 ; Function Attrs: nounwind declare token @llvm.coro.id.async(i32, i32, i32, ptr) intel#2 ; Function Attrs: nounwind declare ptr @llvm.coro.begin(token, ptr writeonly) intel#2 ; Function Attrs: nomerge nounwind declare ptr @llvm.coro.async.resume() #1 attributes #0 = { "target-features"="+adx,+aes,+avx,+avx2,+bmi,+bmi2,+clflushopt,+clwb,+clzero,+crc32,+cx16,+cx8,+f16c,+fma,+fsgsbase,+fxsr,+invpcid,+lzcnt,+mmx,+movbe,+mwaitx,+pclmul,+pku,+popcnt,+prfchw,+rdpid,+rdpru,+rdrnd,+rdseed,+sahf,+sha,+sse,+sse2,+sse3,+sse4.1,+sse4.2,+sse4a,+ssse3,+vaes,+vpclmulqdq,+wbnoinvd,+x87,+xsave,+xsavec,+xsaveopt,+xsaves" } attributes #1 = { nomerge nounwind } attributes intel#2 = { nounwind } ``` This verifier crashes after the `coro-split` pass with ``` cannot guarantee tail call due to mismatched parameter counts musttail call void @af_suspend_fn(ptr poison, i64 -1, ptr poison) LLVM ERROR: Broken function PLEASE submit a bug report to https://github.com/llvm/llvm-project/issues/ and include the crash backtrace. Stack dump: 0. Program arguments: opt ../../../reduced.ll -O0 #0 0x00007f1d89645c0e __interceptor_backtrace.part.0 /build/gcc-11-XeT9lY/gcc-11-11.4.0/build/x86_64-linux-gnu/libsanitizer/asan/../../../../src/libsanitizer/sanitizer_common/sanitizer_common_interceptors.inc:4193:28 #1 0x0000556d94d254f7 llvm::sys::PrintStackTrace(llvm::raw_ostream&, int) /home/ubuntu/modular/third-party/llvm-project/llvm/lib/Support/Unix/Signals.inc:723:22 intel#2 0x0000556d94d19a2f llvm::sys::RunSignalHandlers() /home/ubuntu/modular/third-party/llvm-project/llvm/lib/Support/Signals.cpp:105:20 intel#3 0x0000556d94d1aa42 SignalHandler(int) /home/ubuntu/modular/third-party/llvm-project/llvm/lib/Support/Unix/Signals.inc:371:36 intel#4 0x00007f1d88e42520 (/lib/x86_64-linux-gnu/libc.so.6+0x42520) intel#5 0x00007f1d88e969fc __pthread_kill_implementation ./nptl/pthread_kill.c:44:76 intel#6 0x00007f1d88e969fc __pthread_kill_internal ./nptl/pthread_kill.c:78:10 intel#7 0x00007f1d88e969fc pthread_kill ./nptl/pthread_kill.c:89:10 intel#8 0x00007f1d88e42476 gsignal ./signal/../sysdeps/posix/raise.c:27:6 intel#9 0x00007f1d88e287f3 abort ./stdlib/abort.c:81:7 intel#10 0x0000556d8944be01 std::vector<llvm::json::Value, std::allocator<llvm::json::Value>>::size() const /usr/include/c++/11/bits/stl_vector.h:919:40 intel#11 0x0000556d8944be01 bool std::operator==<llvm::json::Value, std::allocator<llvm::json::Value>>(std::vector<llvm::json::Value, std::allocator<llvm::json::Value>> const&, std::vector<llvm::json::Value, std::allocator<llvm::json::Value>> const&) /usr/include/c++/11/bits/stl_vector.h:1893:23 intel#12 0x0000556d8944be01 llvm::json::operator==(llvm::json::Array const&, llvm::json::Array const&) /home/ubuntu/modular/third-party/llvm-project/llvm/include/llvm/Support/JSON.h:572:69 intel#13 0x0000556d8944be01 llvm::json::operator==(llvm::json::Value const&, llvm::json::Value const&) (.cold) /home/ubuntu/modular/third-party/llvm-project/llvm/lib/Support/JSON.cpp:204:28 intel#14 0x0000556d949ed2bd llvm::report_fatal_error(char const*, bool) /home/ubuntu/modular/third-party/llvm-project/llvm/lib/Support/ErrorHandling.cpp:82:70 intel#15 0x0000556d8e37e876 llvm::SmallVectorBase<unsigned int>::size() const /home/ubuntu/modular/third-party/llvm-project/llvm/include/llvm/ADT/SmallVector.h:91:32 intel#16 0x0000556d8e37e876 llvm::SmallVectorTemplateCommon<llvm::DiagnosticInfoOptimizationBase::Argument, void>::end() /home/ubuntu/modular/third-party/llvm-project/llvm/include/llvm/ADT/SmallVector.h:282:41 intel#17 0x0000556d8e37e876 llvm::SmallVector<llvm::DiagnosticInfoOptimizationBase::Argument, 4u>::~SmallVector() /home/ubuntu/modular/third-party/llvm-project/llvm/include/llvm/ADT/SmallVector.h:1215:24 intel#18 0x0000556d8e37e876 llvm::DiagnosticInfoOptimizationBase::~DiagnosticInfoOptimizationBase() /home/ubuntu/modular/third-party/llvm-project/llvm/include/llvm/IR/DiagnosticInfo.h:413:7 intel#19 0x0000556d8e37e876 llvm::DiagnosticInfoIROptimization::~DiagnosticInfoIROptimization() /home/ubuntu/modular/third-party/llvm-project/llvm/include/llvm/IR/DiagnosticInfo.h:622:7 intel#20 0x0000556d8e37e876 llvm::OptimizationRemark::~OptimizationRemark() /home/ubuntu/modular/third-party/llvm-project/llvm/include/llvm/IR/DiagnosticInfo.h:689:7 intel#21 0x0000556d8e37e876 operator() /home/ubuntu/modular/third-party/llvm-project/llvm/lib/Transforms/Coroutines/CoroSplit.cpp:2213:14 intel#22 0x0000556d8e37e876 emit<llvm::CoroSplitPass::run(llvm::LazyCallGraph::SCC&, llvm::CGSCCAnalysisManager&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&)::<lambda()> > /home/ubuntu/modular/third-party/llvm-project/llvm/include/llvm/Analysis/OptimizationRemarkEmitter.h:83:12 intel#23 0x0000556d8e37e876 llvm::CoroSplitPass::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&) /home/ubuntu/modular/third-party/llvm-project/llvm/lib/Transforms/Coroutines/CoroSplit.cpp:2212:13 intel#24 0x0000556d8c36ecb1 llvm::detail::PassModel<llvm::LazyCallGraph::SCC, llvm::CoroSplitPass, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&>::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&) /home/ubuntu/modular/third-party/llvm-project/llvm/include/llvm/IR/PassManagerInternal.h:91:3 intel#25 0x0000556d91c1a84f llvm::PassManager<llvm::LazyCallGraph::SCC, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&>::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&) /home/ubuntu/modular/third-party/llvm-project/llvm/lib/Analysis/CGSCCPassManager.cpp:90:12 intel#26 0x0000556d8c3690d1 llvm::detail::PassModel<llvm::LazyCallGraph::SCC, llvm::PassManager<llvm::LazyCallGraph::SCC, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&>, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&>::run(llvm::LazyCallGraph::SCC&, llvm::AnalysisManager<llvm::LazyCallGraph::SCC, llvm::LazyCallGraph&>&, llvm::LazyCallGraph&, llvm::CGSCCUpdateResult&) /home/ubuntu/modular/third-party/llvm-project/llvm/include/llvm/IR/PassManagerInternal.h:91:3 intel#27 0x0000556d91c2162d llvm::ModuleToPostOrderCGSCCPassAdaptor::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /home/ubuntu/modular/third-party/llvm-project/llvm/lib/Analysis/CGSCCPassManager.cpp:278:18 intel#28 0x0000556d8c369035 llvm::detail::PassModel<llvm::Module, llvm::ModuleToPostOrderCGSCCPassAdaptor, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /home/ubuntu/modular/third-party/llvm-project/llvm/include/llvm/IR/PassManagerInternal.h:91:3 intel#29 0x0000556d9457abc5 llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /home/ubuntu/modular/third-party/llvm-project/llvm/include/llvm/IR/PassManager.h:247:20 intel#30 0x0000556d8e30979e llvm::CoroConditionalWrapper::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /home/ubuntu/modular/third-party/llvm-project/llvm/lib/Transforms/Coroutines/CoroConditionalWrapper.cpp:19:74 intel#31 0x0000556d8c365755 llvm::detail::PassModel<llvm::Module, llvm::CoroConditionalWrapper, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /home/ubuntu/modular/third-party/llvm-project/llvm/include/llvm/IR/PassManagerInternal.h:91:3 intel#32 0x0000556d9457abc5 llvm::PassManager<llvm::Module, llvm::AnalysisManager<llvm::Module>>::run(llvm::Module&, llvm::AnalysisManager<llvm::Module>&) /home/ubuntu/modular/third-party/llvm-project/llvm/include/llvm/IR/PassManager.h:247:20 intel#33 0x0000556d89818556 llvm::SmallPtrSetImplBase::isSmall() const /home/ubuntu/modular/third-party/llvm-project/llvm/include/llvm/ADT/SmallPtrSet.h:196:33 intel#34 0x0000556d89818556 llvm::SmallPtrSetImplBase::~SmallPtrSetImplBase() /home/ubuntu/modular/third-party/llvm-project/llvm/include/llvm/ADT/SmallPtrSet.h:84:17 intel#35 0x0000556d89818556 llvm::SmallPtrSetImpl<llvm::AnalysisKey*>::~SmallPtrSetImpl() /home/ubuntu/modular/third-party/llvm-project/llvm/include/llvm/ADT/SmallPtrSet.h:321:7 intel#36 0x0000556d89818556 llvm::SmallPtrSet<llvm::AnalysisKey*, 2u>::~SmallPtrSet() /home/ubuntu/modular/third-party/llvm-project/llvm/include/llvm/ADT/SmallPtrSet.h:427:7 intel#37 0x0000556d89818556 llvm::PreservedAnalyses::~PreservedAnalyses() /home/ubuntu/modular/third-party/llvm-project/llvm/include/llvm/IR/Analysis.h:109:7 intel#38 0x0000556d89818556 llvm::runPassPipeline(llvm::StringRef, llvm::Module&, llvm::TargetMachine*, llvm::TargetLibraryInfoImpl*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::ToolOutputFile*, llvm::StringRef, llvm::ArrayRef<llvm::PassPlugin>, llvm::ArrayRef<std::function<void (llvm::PassBuilder&)>>, llvm::opt_tool::OutputKind, llvm::opt_tool::VerifierKind, bool, bool, bool, bool, bool, bool, bool) /home/ubuntu/modular/third-party/llvm-project/llvm/tools/opt/NewPMDriver.cpp:532:10 intel#39 0x0000556d897e3939 optMain /home/ubuntu/modular/third-party/llvm-project/llvm/tools/opt/optdriver.cpp:737:27 intel#40 0x0000556d89455461 main /home/ubuntu/modular/third-party/llvm-project/llvm/tools/opt/opt.cpp:25:33 intel#41 0x00007f1d88e29d90 __libc_start_call_main ./csu/../sysdeps/nptl/libc_start_call_main.h:58:16 intel#42 0x00007f1d88e29e40 call_init ./csu/../csu/libc-start.c:128:20 intel#43 0x00007f1d88e29e40 __libc_start_main ./csu/../csu/libc-start.c:379:5 intel#44 0x0000556d897b6335 _start (/home/ubuntu/modular/.derived/third-party/llvm-project/build-relwithdebinfo-asan/bin/opt+0x150c335) Aborted (core dumped)
Naghasan
pushed a commit
that referenced
this pull request
Jul 3, 2024
…erSize (#67657)" This reverts commit f0b3654. This commit triggers UB by reading an uninitialized variable. `UP.PartialThreshold` is used uninitialized in `getUnrollingPreferences()` when it is called from `LoopVectorizationPlanner::executePlan()`. In this case the `UP` variable is created on the stack and its fields are not initialized. ``` ==8802==WARNING: MemorySanitizer: use-of-uninitialized-value #0 0x557c0b081b99 in llvm::BasicTTIImplBase<llvm::X86TTIImpl>::getUnrollingPreferences(llvm::Loop*, llvm::ScalarEvolution&, llvm::TargetTransformInfo::UnrollingPreferences&, llvm::OptimizationRemarkEmitter*) llvm-project/llvm/include/llvm/CodeGen/BasicTTIImpl.h #1 0x557c0b07a40c in llvm::TargetTransformInfo::Model<llvm::X86TTIImpl>::getUnrollingPreferences(llvm::Loop*, llvm::ScalarEvolution&, llvm::TargetTransformInfo::UnrollingPreferences&, llvm::OptimizationRemarkEmitter*) llvm-project/llvm/include/llvm/Analysis/TargetTransformInfo.h:2277:17 intel#2 0x557c0f5d69ee in llvm::TargetTransformInfo::getUnrollingPreferences(llvm::Loop*, llvm::ScalarEvolution&, llvm::TargetTransformInfo::UnrollingPreferences&, llvm::OptimizationRemarkEmitter*) const llvm-project/llvm/lib/Analysis/TargetTransformInfo.cpp:387:19 intel#3 0x557c0e6b96a0 in llvm::LoopVectorizationPlanner::executePlan(llvm::ElementCount, unsigned int, llvm::VPlan&, llvm::InnerLoopVectorizer&, llvm::DominatorTree*, bool, llvm::DenseMap<llvm::SCEV const*, llvm::Value*, llvm::DenseMapInfo<llvm::SCEV const*, void>, llvm::detail::DenseMapPair<llvm::SCEV const*, llvm::Value*>> const*) llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7624:7 intel#4 0x557c0e6e4b63 in llvm::LoopVectorizePass::processLoop(llvm::Loop*) llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:10253:13 intel#5 0x557c0e6f2429 in llvm::LoopVectorizePass::runImpl(llvm::Function&, llvm::ScalarEvolution&, llvm::LoopInfo&, llvm::TargetTransformInfo&, llvm::DominatorTree&, llvm::BlockFrequencyInfo*, llvm::TargetLibraryInfo*, llvm::DemandedBits&, llvm::AssumptionCache&, llvm::LoopAccessInfoManager&, llvm::OptimizationRemarkEmitter&, llvm::ProfileSummaryInfo*) llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:10344:30 intel#6 0x557c0e6f2f97 in llvm::LoopVectorizePass::run(llvm::Function&, llvm::AnalysisManager<llvm::Function>&) llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:10383:9 [...] Uninitialized value was created by an allocation of 'UP' in the stack frame #0 0x557c0e6b961e in llvm::LoopVectorizationPlanner::executePlan(llvm::ElementCount, unsigned int, llvm::VPlan&, llvm::InnerLoopVectorizer&, llvm::DominatorTree*, bool, llvm::DenseMap<llvm::SCEV const*, llvm::Value*, llvm::DenseMapInfo<llvm::SCEV const*, void>, llvm::detail::DenseMapPair<llvm::SCEV const*, llvm::Value*>> const*) llvm-project/llvm/lib/Transforms/Vectorize/LoopVectorize.cpp:7623:3 ```
Naghasan
pushed a commit
that referenced
this pull request
Jul 3, 2024
…0820) This solves some ambuguity introduced in P0522 regarding how template template parameters are partially ordered, and should reduce the negative impact of enabling `-frelaxed-template-template-args` by default. When performing template argument deduction, a template template parameter containing no packs should be more specialized than one that does. Given the following example: ```C++ template<class T2> struct A; template<template<class ...T3s> class TT1, class T4> struct A<TT1<T4>>; // #1 template<template<class T5 > class TT2, class T6> struct A<TT2<T6>>; // intel#2 template<class T1> struct B; template struct A<B<char>>; ``` Prior to P0522, candidate `intel#2` would be more specialized. After P0522, neither is more specialized, so this becomes ambiguous. With this change, `intel#2` becomes more specialized again, maintaining compatibility with pre-P0522 implementations. The problem is that in P0522, candidates are at least as specialized when matching packs to fixed-size lists both ways, whereas before, a fixed-size list is more specialized. This patch keeps the original behavior when checking template arguments outside deduction, but restores this aspect of pre-P0522 matching during deduction. --- Since this changes provisional implementation of CWG2398 which has not been released yet, and already contains a changelog entry, we don't provide a changelog entry here.
Naghasan
pushed a commit
that referenced
this pull request
Jul 3, 2024
… (#92855) This solves some ambuguity introduced in P0522 regarding how template template parameters are partially ordered, and should reduce the negative impact of enabling `-frelaxed-template-template-args` by default. When performing template argument deduction, we extend the provisional wording introduced in llvm/llvm-project#89807 so it also covers deduction of class templates. Given the following example: ```C++ template <class T1, class T2 = float> struct A; template <class T3> struct B; template <template <class T4> class TT1, class T5> struct B<TT1<T5>>; // #1 template <class T6, class T7> struct B<A<T6, T7>>; // intel#2 template struct B<A<int>>; ``` Prior to P0522, `intel#2` was picked. Afterwards, this became ambiguous. This patch restores the pre-P0522 behavior, `intel#2` is picked again. This has the beneficial side effect of making the following code valid: ```C++ template<class T, class U> struct A {}; A<int, float> v; template<template<class> class TT> void f(TT<int>); // OK: TT picks 'float' as the default argument for the second parameter. void g() { f(v); } ``` --- Since this changes provisional implementation of CWG2398 which has not been released yet, and already contains a changelog entry, we don't provide a changelog entry here.
Naghasan
pushed a commit
that referenced
this pull request
Jul 3, 2024
The problematic program is as follows: ```shell #define pre_a 0 #define PRE(x) pre_##x void f(void) { PRE(a) && 0; } int main(void) { return 0; } ``` in which after token concatenation (`##`), there's another nested macro `pre_a`. Currently only the outer expansion region will be produced. ([compiler explorer link](https://godbolt.org/#g:!((g:!((g:!((h:codeEditor,i:(filename:'1',fontScale:14,fontUsePx:'0',j:1,lang:___c,selection:(endColumn:29,endLineNumber:8,positionColumn:29,positionLineNumber:8,selectionStartColumn:29,selectionStartLineNumber:8,startColumn:29,startLineNumber:8),source:'%23define+pre_a+0%0A%23define+PRE(x)+pre_%23%23x%0A%0Avoid+f(void)+%7B%0A++++PRE(a)+%26%26+0%3B%0A%7D%0A%0Aint+main(void)+%7B+return+0%3B+%7D'),l:'5',n:'0',o:'C+source+%231',t:'0')),k:51.69491525423727,l:'4',n:'0',o:'',s:0,t:'0'),(g:!((g:!((h:compiler,i:(compiler:cclang_assertions_trunk,filters:(b:'0',binary:'1',binaryObject:'1',commentOnly:'0',debugCalls:'1',demangle:'0',directives:'0',execute:'0',intel:'0',libraryCode:'1',trim:'1',verboseDemangling:'0'),flagsViewOpen:'1',fontScale:14,fontUsePx:'0',j:2,lang:___c,libs:!(),options:'-fprofile-instr-generate+-fcoverage-mapping+-fcoverage-mcdc+-Xclang+-dump-coverage-mapping+',overrides:!(),selection:(endColumn:1,endLineNumber:1,positionColumn:1,positionLineNumber:1,selectionStartColumn:1,selectionStartLineNumber:1,startColumn:1,startLineNumber:1),source:1),l:'5',n:'0',o:'+x86-64+clang+(assertions+trunk)+(Editor+%231)',t:'0')),k:34.5741843594503,l:'4',m:28.903654485049834,n:'0',o:'',s:0,t:'0'),(g:!((h:output,i:(compilerName:'x86-64+clang+(trunk)',editorid:1,fontScale:14,fontUsePx:'0',j:2,wrap:'1'),l:'5',n:'0',o:'Output+of+x86-64+clang+(assertions+trunk)+(Compiler+%232)',t:'0')),header:(),l:'4',m:71.09634551495017,n:'0',o:'',s:0,t:'0')),k:48.30508474576271,l:'3',n:'0',o:'',t:'0')),l:'2',m:100,n:'0',o:'',t:'0')),version:4)) ```text f: File 0, 4:14 -> 6:2 = #0 Decision,File 0, 5:5 -> 5:16 = M:0, C:2 Expansion,File 0, 5:5 -> 5:8 = #0 (Expanded file = 1) File 0, 5:15 -> 5:16 = #1 Branch,File 0, 5:15 -> 5:16 = 0, 0 [2,0,0] File 1, 2:16 -> 2:23 = #0 File 2, 1:15 -> 1:16 = #0 File 2, 1:15 -> 1:16 = #0 Branch,File 2, 1:15 -> 1:16 = 0, 0 [1,2,0] ``` The inner expansion region isn't produced because: 1. In the range-based for loop quoted below, each sloc is processed and possibly emit a corresponding expansion region. 2. For our sloc in question, its direct parent returned by `getIncludeOrExpansionLoc()` is a `<scratch space>`, because that's how `##` is processed. https://github.com/llvm/llvm-project/blob/88b6186af3908c55b357858eb348b5143f21c289/clang/lib/CodeGen/CoverageMappingGen.cpp#L518-L520 3. This `<scratch space>` cannot be found in the FileID mapping so `ParentFileID` will be assigned an `std::nullopt` https://github.com/llvm/llvm-project/blob/88b6186af3908c55b357858eb348b5143f21c289/clang/lib/CodeGen/CoverageMappingGen.cpp#L521-L526 4. As a result this iteration of for loop finishes early and no expansion region is added for the sloc. This problem gets worse with MC/DC: as the example shows, there's a branch from File 2 but File 2 itself is missing. This will trigger assertion failures. The fix is more or less a workaround and takes a similar approach as #89573. ~~Depends on #89573.~~ This includes #89573. Kudos to @chapuni! This and #89573 together fix #87000: I tested locally, both the reduced program and my original use case (fwiw, Linux kernel) can run successfully. --------- Co-authored-by: NAKAMURA Takumi <[email protected]>
Naghasan
pushed a commit
that referenced
this pull request
Jul 3, 2024
…intel#13999) Example `sycl-ls --verbose` output: ``` ... Platform [#1]: Version : 1.3 Name : Intel(R) Level-Zero Vendor : Intel(R) Corporation Devices : 1 Type : gpu Version : 12.60.7 Name : Intel(R) Data Center GPU Max 1550 Vendor : Intel(R) Corporation Driver : 1.3.26918 UUID : 1341282141147000580000000 Num SubDevices : 2 Num SubSubDevices : 8 Aspects : gpu fp16 fp64 ... ```
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.
This suggestion is invalid because no changes were made to the code.
Suggestions cannot be applied while the pull request is closed.
Suggestions cannot be applied while viewing a subset of changes.
Only one suggestion per line can be applied in a batch.
Add this suggestion to a batch that can be applied as a single commit.
Applying suggestions on deleted lines is not supported.
You must change the existing code in this line in order to create a valid suggestion.
Outdated suggestions cannot be applied.
This suggestion has been applied or marked resolved.
Suggestions cannot be applied from pending reviews.
Suggestions cannot be applied on multi-line comments.
Suggestions cannot be applied while the pull request is queued to merge.
Suggestion cannot be applied right now. Please check back later.
…ebuild
User can declare their type as device copyable even if they do not meet the criterias. In some cases,
this triggers calls to undefined, deleted or not allowed on device functions.
Per SYCL rules, a SYCL kernel object must be device copiable which implies the structure can be initialized using memcpy and have a trivial destructor. Plus the spec doesn't guarantee that the constructor/destructor should get called at all as it allows the functor to be reused by multiple WI.
This patch exploit that implementation defined behavior by preventing the constructor/destructor of being called by wrapping it into a union.
In order to prevent this, the SYCL kernel is wrapped into a union and instead of using initializer list,
the SYCL kernel clone is init using assignment for scalars, memcpy for non special types and Ctor/__init for SYCL special types.
Signed-off-by: Victor Lomuller [email protected]