HIP Clang compatibility #567
|
With the hip clang setup, the main compiler used by HIP is sitting in /opt/rocm/llvm/bin/

### clang++ cannot compile c++11 with a default installation

The basic fact that c++11 is not properly compiled is clear when we

When compiling with a recent g++, we get instead:

The important part here is that the clang++ version does not use

### Root cause: HIP clang does not know recent gcc installations

To understand the root of the problem, we can find which gcc installation

Even when loading the relevant modules (I have gcc 9.2.0 loaded), this

### Unpractical solution: using a recent gcc with clang the manual way

There is one unpractical way for projects to target another gcc

We need to pass in the

### Other unpractical solution (untested?)

Maybe using a CMake cross-compilation toolchain file would work?

### Summary and possible fixes

In summary, clang is unable to find a modern version of gcc. Here are some
|
|
We might be able to make the unpractical solution work by determining the gcc toolchain path from the implicit include directories used by the CXX compiler: |
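A minimal CMake sketch of that idea, under stated assumptions: the host CXX compiler is a GCC whose implicit include directories contain an entry of the form `<prefix>/include/c++/<version>`, and the flag to populate is clang's `--gcc-toolchain=<prefix>`. The function name `guess_gcc_toolchain` and the variable `MY_HIP_CLANG_FLAGS` are hypothetical and not part of Ginkgo or FindHIP.

```cmake
# Hypothetical helper: guess a GCC toolchain root from the implicit include
# directories of the CXX compiler. Must run after project()/enable_language(CXX),
# since CMAKE_CXX_IMPLICIT_INCLUDE_DIRECTORIES is only populated then.
function(guess_gcc_toolchain out_var)
    set(result "")
    foreach(dir IN LISTS CMAKE_CXX_IMPLICIT_INCLUDE_DIRECTORIES)
        # Assumed layout: <prefix>/include/c++/<version>
        if(dir MATCHES "^(.*)/include/c\\+\\+/[0-9.]+$")
            set(result "${CMAKE_MATCH_1}")
            break()
        endif()
    endforeach()
    # Empty result means no matching directory was found.
    set(${out_var} "${result}" PARENT_SCOPE)
endfunction()

guess_gcc_toolchain(GCC_TOOLCHAIN_DIR)
if(GCC_TOOLCHAIN_DIR)
    # MY_HIP_CLANG_FLAGS is a placeholder for whatever list is forwarded to hip-clang.
    list(APPEND MY_HIP_CLANG_FLAGS "--gcc-toolchain=${GCC_TOOLCHAIN_DIR}")
endif()
```

Depending on the CMake version and the compiler, the implicit include list may not contain such an entry at all, in which case `GCC_TOOLCHAIN_DIR` stays empty and no flag is added.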
|
Indeed I guess that would work, though I think that's for a specific case of using clang with gcc, and we need to populate all the separate |
|
That might be an issue with either LLVM or Spack: a quick look through the clang source [1] [2] suggests that this should be set to a suitable GCC toolchain location at configuration time.
|
I don't think hip-clang can be installed from spack, so it cannot give the We need the |
|
Ah sorry, I guess you knew all that already, I must have skipped over your first comment too quickly. |
|
Yes maybe I didn't lay this out properly, but the last three points are some other solutions I thought of which are rather on the server/system side. One of which include the What really surprised me is that looking at clang's code, it didn't seem like it tried to use the environment or something to add some runtime check. I guess there can be multiple issues with that. |
yhmtsai left a comment:
hip < 3.5 always uses hip-hcc, and hip >= 3.5 always uses hip-clang, is that correct?
endif()

if(NOT DEFINED HIPBLAS_PATH)
    if(DEFINED ENV{HIPBLAS_PATH})
I forgot to mention it before. In hipblas/hipsparse, they use `if (DEFINED $ENV{})`: https://github.com/tcojean/hipBLAS/blob/ad14b3bf5ec6da92abd646a533268aace33e0902/CMakeLists.txt#L9
That should be `if (DEFINED ENV{})`, as it is written here.
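To illustrate the difference, here is a small sketch completing the `HIPBLAS_PATH` lookup quoted above. `if(DEFINED ENV{X})` tests whether the environment variable `X` exists, whereas `if(DEFINED $ENV{X})` first expands the value of `X` and then tests whether a CMake variable with that name is defined, which is almost never what is intended. The fallback path `/opt/rocm/hipblas` is an assumption for illustration only.

```cmake
if(NOT DEFINED HIPBLAS_PATH)
    # Correct form: checks that the environment variable itself is set.
    if(DEFINED ENV{HIPBLAS_PATH})
        set(HIPBLAS_PATH "$ENV{HIPBLAS_PATH}" CACHE PATH "Path to the hipBLAS installation")
    else()
        # Assumed default location, adjust to the actual ROCm layout.
        set(HIPBLAS_PATH "/opt/rocm/hipblas" CACHE PATH "Path to the hipBLAS installation")
    endif()
endif()

# Incorrect form, shown only for contrast: with HIPBLAS_PATH=/opt/rocm/hipblas in
# the environment this expands to if(DEFINED /opt/rocm/hipblas), which is false.
# if(DEFINED $ENV{HIPBLAS_PATH})
```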
Good find. Maybe they fixed it upstream. I need to update these repositories.
|
Yes, 3.5 always uses hip-clang. They deprecated hcc. Technically, I guess it could still be installed manually though. |
Codecov Report
@@ Coverage Diff @@
## develop #567 +/- ##
========================================
Coverage 84.01% 84.01%
========================================
Files 296 296
Lines 20352 20358 +6
========================================
+ Hits 17098 17104 +6
Misses 3254 3254
|
|
I can now confirm that this fully works with a hip-clang setup (with both shared and static libraries). I ended up trying it out on our AMD system after updating the ROCm suite to 3.5. I had to fix a last naughty issue concerning the benchmarks linking to |
+ Add the new option to pass in HIP clang specific options
+ Properly detect and populate the new critical ROCM_PATH and HIP_CLANG_PATH variables.
+ Make HIP THRUST easier to be found thanks to using an optional environment variable.
+ Also add some missing test for the HipError and related classes
+ Also fix some missing includes which were found
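The detection of `ROCM_PATH` and `HIP_CLANG_PATH` could look roughly like the sketch below. It is not the code from this PR: the default `/opt/rocm` and the derived `${ROCM_PATH}/llvm/bin` are assumptions based on the `/opt/rocm/llvm/bin` location mentioned earlier in this conversation.

```cmake
# Prefer an explicit -DROCM_PATH=..., then the environment, then an assumed default.
if(NOT DEFINED ROCM_PATH)
    if(DEFINED ENV{ROCM_PATH})
        set(ROCM_PATH "$ENV{ROCM_PATH}" CACHE PATH "Path to the ROCm installation")
    else()
        set(ROCM_PATH "/opt/rocm" CACHE PATH "Path to the ROCm installation")
    endif()
endif()

# Same lookup order for the directory containing the hip-clang binaries.
if(NOT DEFINED HIP_CLANG_PATH)
    if(DEFINED ENV{HIP_CLANG_PATH})
        set(HIP_CLANG_PATH "$ENV{HIP_CLANG_PATH}" CACHE PATH "Path to the HIP clang binaries")
    else()
        set(HIP_CLANG_PATH "${ROCM_PATH}/llvm/bin" CACHE PATH "Path to the HIP clang binaries")
    endif()
endif()

message(STATUS "ROCM_PATH: ${ROCM_PATH}, HIP_CLANG_PATH: ${HIP_CLANG_PATH}")
```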
The `CLANG_OPTIONS` addition to the hip_add_executable is available since roc-3.5 only. See:
https://github.com/ROCm-Developer-Tools/HIP/blob/roc-3.3.x/cmake/FindHIP.cmake (does not have it)
https://github.com/ROCm-Developer-Tools/HIP/blob/roc-3.5.x/cmake/FindHIP.cmake (has it)
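A hedged sketch of guarding the `CLANG_OPTIONS` keyword by HIP version follows. It assumes that `find_package(HIP)` locates FindHIP.cmake (typically by adding the HIP `cmake` directory to `CMAKE_MODULE_PATH`) and sets `HIP_VERSION`. The target name `my_hip_app`, the source file `main.hip.cpp`, and the variable `MY_HIP_CLANG_FLAGS` are hypothetical.

```cmake
find_package(HIP REQUIRED)

# Placeholder list of extra flags meant for hip-clang only.
set(MY_HIP_CLANG_FLAGS "")

if(HIP_VERSION VERSION_GREATER_EQUAL "3.5")
    # roc-3.5 and later: FindHIP.cmake accepts the CLANG_OPTIONS keyword.
    hip_add_executable(my_hip_app main.hip.cpp CLANG_OPTIONS ${MY_HIP_CLANG_FLAGS})
else()
    # Older FindHIP.cmake (e.g. roc-3.3) does not know CLANG_OPTIONS.
    hip_add_executable(my_hip_app main.hip.cpp)
endif()
```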
Since it does not work in the FindHIP.cmake file.
+ It is necessary to ban several flags in both INTERFACE_COMPILE_OPTIONS and INTERFACE_LINK_LIBRARIES, which are clang (with HIP enabled) dependent.
+ Add a macro which does this and use it for both static libraries so as to not propagate this issue everywhere, in particular to user libraries/executables, and for benchmarks using both CUDA and HIP capabilities.
fix
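One way such a flag filter might be written is sketched below. This is not the macro added in the commit: it is written as a function so that its temporary variables do not leak into the caller, the name `remove_interface_flags` is hypothetical, and the pattern in the usage line is a placeholder because the conversation does not list the banned flags.

```cmake
# Hypothetical helper: drop every entry matching `pattern` from the interface
# properties of `target`, so the entries are not propagated to consumers.
function(remove_interface_flags target pattern)
    foreach(prop INTERFACE_COMPILE_OPTIONS INTERFACE_LINK_LIBRARIES)
        get_target_property(values ${target} ${prop})
        # get_target_property yields "<var>-NOTFOUND" when the property is unset.
        if(values)
            list(FILTER values EXCLUDE REGEX "${pattern}")
            set_property(TARGET ${target} PROPERTY ${prop} "${values}")
        endif()
    endforeach()
endfunction()

# Usage (target name and pattern are placeholders):
# remove_interface_flags(my_hip_target "^-example-banned-flag")
```

`list(FILTER ...)` requires CMake 3.6 or newer, which is covered by the cmake 3.9+ requirement stated in the release notes.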
Co-authored-by: Tobias Ribizel <ribizel@kit.edu>
|
Kudos, SonarCloud Quality Gate passed!
|
The Ginkgo team is proud to announce the new minor release of Ginkgo version 1.2.0. This release brings full HIP support to Ginkgo, new preconditioners (ParILUT, ISAI), conversion between double and float for all LinOps, and many more features and fixes.

Supported systems and requirements:
+ For all platforms, cmake 3.9+
+ Linux and MacOS
  + gcc: 5.3+, 6.3+, 7.3+, all versions after 8.1+
  + clang: 3.9+
  + Intel compiler: 2017+
  + Apple LLVM: 8.0+
  + CUDA module: CUDA 9.0+
  + HIP module: ROCm 2.8+
+ Windows
  + MinGW and CygWin: gcc 5.3+, 6.3+, 7.3+, all versions after 8.1+
  + Microsoft Visual Studio: VS 2017 15.7+
  + CUDA module: CUDA 9.0+, Microsoft Visual Studio
  + OpenMP module: MinGW or CygWin.

The current known issues can be found in the [known issues page](https://github.com/ginkgo-project/ginkgo/wiki/Known-Issues).

# Additions
Here are the main additions to the Ginkgo library. Other thematic additions are listed below.
+ Add full HIP support to Ginkgo [#344](#344), [#357](#357), [#384](#384), [#373](#373), [#391](#391), [#396](#396), [#395](#395), [#393](#393), [#404](#404), [#439](#439), [#443](#443), [#567](#567)
+ Add a new ISAI preconditioner [#489](#489), [#502](#502), [#512](#512), [#508](#508), [#520](#520)
+ Add support for ParILUT and ParICT factorization with ILU preconditioners [#400](#400)
+ Add a new BiCG solver [#438](#438)
+ Add a new permutation matrix format [#352](#352), [#469](#469)
+ Add CSR SpGEMM support [#386](#386), [#398](#398), [#418](#418), [#457](#457)
+ Add CSR SpGEAM support [#556](#556)
+ Make all solvers and preconditioners transposable [#535](#535)
+ Add CsrBuilder and CooBuilder for intrusive access to matrix arrays [#437](#437)
+ Add a standard-compliant allocator based on the Executors [#504](#504)
+ Support conversions for all LinOp between double and float [#521](#521)
+ Add a new boolean to the CUDA and HIP executors to control DeviceReset (default off) [#557](#557)
+ Add a relaxation factor to IR to represent Richardson Relaxation [#574](#574)
+ Add two new stopping criteria, for relative (to `norm(b)`) and absolute residual norm [#577](#577)

### Example additions
+ Templatize all examples to simplify changing the precision [#513](#513)
+ Add a new adaptive precision block-Jacobi example [#507](#507)
+ Add a new IR example [#522](#522)
+ Add a new Mixed Precision Iterative Refinement example [#525](#525)
+ Add a new example on iterative trisolves in ILU preconditioning [#526](#526), [#536](#536), [#550](#550)

### Compilation and library changes
+ Auto-detect compilation settings based on environment [#435](#435), [#537](#537)
+ Add SONAME to shared libraries [#524](#524)
+ Add clang-cuda support [#543](#543)

### Other additions
+ Add sorting, searching and merging kernels for GPUs [#403](#403), [#428](#428), [#417](#417), [#455](#455)
+ Add `gko::as` support for smart pointers [#493](#493)
+ Add setters and getters for criterion factories [#527](#527)
+ Add a new method to check whether a solver uses `x` as an initial guess [#531](#531)
+ Add contribution guidelines [#549](#549)

# Fixes
### Algorithms
+ Improve the classical CSR strategy's performance [#401](#401)
+ Improve the CSR automatical strategy [#407](#407), [#559](#559)
+ Memory, speed improvements to the ELL kernel [#411](#411)
+ Multiple improvements and fixes to ParILU [#419](#419), [#427](#427), [#429](#429), [#456](#456), [#544](#544)
+ Fix multiple issues with GMRES [#481](#481), [#523](#523), [#575](#575)
+ Optimize OpenMP matrix conversions [#505](#505)
+ Ensure the linearity of the ILU preconditioner [#506](#506)
+ Fix IR's use of the advanced apply [#522](#522)
+ Fix empty matrices conversions and add tests [#560](#560)

### Other core functionalities
+ Fix complex number support in our math header [#410](#410)
+ Fix CUDA compatibility of the main ginkgo header [#450](#450)
+ Fix isfinite issues [#465](#465)
+ Fix the Array::view memory leak and the array/view copy/move [#485](#485)
+ Fix typos preventing use of some interface functions [#496](#496)
+ Fix the `gko::dim` to abide to the C++ standard [#498](#498)
+ Simplify the executor copy interface [#516](#516)
+ Optimize intermediate storage for Composition [#540](#540)
+ Provide an initial guess for relevant Compositions [#561](#561)
+ Better management of nullptr as criterion [#562](#562)
+ Fix the norm calculations for complex support [#564](#564)

### CUDA and HIP specific
+ Use the return value of the atomic operations in our wrappers [#405](#405)
+ Improve the portability of warp lane masks [#422](#422)
+ Extract thread ID computation into a separate function [#464](#464)
+ Reorder kernel parameters for consistency [#474](#474)
+ Fix the use of `pragma unroll` in HIP [#492](#492)

### Other
+ Fix the Ginkgo CMake installation files [#414](#414), [#553](#553)
+ Fix the Windows compilation [#415](#415)
+ Always use demangled types in error messages [#434](#434), [#486](#486)
+ Add CUDA header dependency to appropriate tests [#452](#452)
+ Fix several sonarqube or compilation warnings [#453](#453), [#463](#463), [#532](#532), [#569](#569)
+ Add shuffle tests [#460](#460)
+ Fix MSVC C2398 error [#490](#490)
+ Fix missing interface tests in test install [#558](#558)

# Tools and ecosystem
### Benchmarks
+ Add better norm support in the benchmarks [#377](#377)
+ Add CUDA 10.1 generic SpMV support in benchmarks [#468](#468), [#473](#473)
+ Add sparse library ILU in benchmarks [#487](#487)
+ Add overhead benchmarking capacities [#501](#501)
+ Allow benchmarking from a matrix list file [#503](#503)
+ Fix benchmarking issue with JSON and non-finite numbers [#514](#514)
+ Fix benchmark logger crashers with OpenMP [#565](#565)

### CI related
+ Improvements to the CI setup with HIP compilation [#421](#421), [#466](#466)
+ Add MacOSX CI support [#470](#470), [#488](#488)
+ Add Windows CI support [#471](#471), [#488](#488), [#510](#510), [#566](#566)
+ Use sanitizers instead of valgrind [#476](#476)
+ Add automatic container generation and update facilities [#499](#499)
+ Fix the CI parallelism settings [#517](#517), [#538](#538), [#539](#539)
+ Make the codecov patch check informational [#519](#519)
+ Add support for LLVM sanitizers with improved thread sanitizer support [#578](#578)

### Test suite
+ Add an assertion for sparsity pattern equality [#416](#416)
+ Add core and reference multiprecision tests support [#448](#448)
+ Speed up GPU tests by avoiding device reset [#467](#467)
+ Change test matrix location string [#494](#494)

### Other
+ Add Ginkgo badges from our tools [#413](#413)
+ Update the `create_new_algorithm.sh` script [#420](#420)
+ Bump copyright and improve license management [#436](#436), [#433](#433)
+ Set clang-format minimum requirement [#441](#441), [#484](#484)
+ Update git-cmake-format [#446](#446), [#484](#484)
+ Disable the development tools by default [#442](#442)
+ Add a script for automatic header formatting [#447](#447)
+ Add GDB pretty printer for `gko::Array` [#509](#509)
+ Improve compilation speed [#533](#533)
+ Add editorconfig support [#546](#546)
+ Add a compile-time check for header self-sufficiency [#552](#552)

Related PR: #583
Make Ginkgo compatible with HIP Clang compilation.
The MI50 machine available at ICL's cluster uses hip-clang as a compiler.
hip-clang actually fixes some of the most horrible things with the classical
hcc compiler; in particular, it's not necessary to ban flags anymore because all of these were
hcc related.

Nonetheless, this is still an ongoing process. I believe what is here is mostly
what we need to do, but due to a toolchain issue which took me a very long
time to debug, I cannot successfully compile everything on the MI50 machine
from ICL yet.
HIP_CLANG_PATH variables.
As I believe this might be relevant to the future (for our CI systems or any
cluster), I will add in a comment the sneaky toolchain issue that hip clang has.