Skip to content

Add basic distributed capability to Ginkgo#932

Merged
pratikvn merged 61 commits into
developfrom
distributed-develop
Nov 29, 2021
Merged

Add basic distributed capability to Ginkgo#932
pratikvn merged 61 commits into
developfrom
distributed-develop

Conversation

@pratikvn

Copy link
Copy Markdown
Member

This PR merges the basic distributed capability into Ginkgo, namely

  1. MPI layer (Add a MPI layer, some bindings, CMake setup and MPI test framework #908)
  2. Partition class and kernels (Add a class for partitions of intervals #909)

The discussion regarding the interfaces for the above functionalities can be found in the above linked PR's.

MarcelKoch and others added 7 commits November 29, 2021 09:34
- clarify documentation
- renaming
- small kernel fix
- simplify tests

Co-authored-by: Pratik Nayak <pratik.nayak@kit.edu>
Co-authored-by: Pratik Nayak <pratik.nayak@kit.edu>
- renaming
- unifying kernels for computing local starting indices
- documentation

Co-authored-by: Aditya Kashi <aditya.kashi@kit.edu>
Co-authored-by: Yu-Hsiang Tsai <yhmtsai@gmail.com>
- merge /common/unified/.../partitions_kernels.hpp.inc into /commun/cuda_hip/.../partitions_kernsls.hpp.inc
- documentation

Co-authored-by: Yu-Hsiang Tsai <yhmtsai@gmail.com>
This PR adds a class to handle a partition of an interval [0,N) into a number of disjoint sub ranges. Each range is associated with a not necessarily unique part ID. 

Related PR: #909
@pratikvn pratikvn added the 1:ST:ready-for-review This PR is ready for review label Nov 29, 2021
@pratikvn pratikvn added this to the Ginkgo 1.5.0 milestone Nov 29, 2021
@pratikvn pratikvn requested a review from a team November 29, 2021 13:50
@pratikvn pratikvn self-assigned this Nov 29, 2021
@ginkgo-bot ginkgo-bot added mod:all This touches all Ginkgo modules. reg:build This is related to the build system. reg:ci-cd This is related to the continuous integration system. reg:documentation This is related to documentation. reg:testing This is related to testing. labels Nov 29, 2021
@pratikvn pratikvn added is:new-feature A request or implementation of a feature that does not exist yet. mod:mpi This is related to the MPI module type:distributed-functionality labels Nov 29, 2021
@ginkgo-bot

Copy link
Copy Markdown
Member

Note: This PR changes the Ginkgo ABI:

Functions changes summary: 0 Removed, 0 Changed, 340 Added functions
Variables changes summary: 0 Removed, 0 Changed, 0 Added variable

For details check the full ABI diff under Artifacts here

@codecov

codecov Bot commented Nov 29, 2021

Copy link
Copy Markdown

Codecov Report

Merging #932 (ed77851) into develop (00226d0) will decrease coverage by 1.04%.
The diff coverage is 92.88%.

Impacted file tree graph

@@             Coverage Diff             @@
##           develop     #932      +/-   ##
===========================================
- Coverage    94.39%   93.35%   -1.05%     
===========================================
  Files          453      467      +14     
  Lines        36908    37992    +1084     
===========================================
+ Hits         34839    35467     +628     
- Misses        2069     2525     +456     
Impacted Files Coverage Δ
core/device_hooks/common_kernels.inc.cpp 0.00% <0.00%> (ø)
core/test/utils.hpp 100.00% <ø> (ø)
include/ginkgo/core/base/exception_helpers.hpp 90.90% <ø> (ø)
include/ginkgo/core/base/types.hpp 92.59% <ø> (ø)
third_party/gtest/gtest_mpi_listener.cpp 55.72% <55.72%> (ø)
include/ginkgo/core/distributed/partition.hpp 90.90% <90.90%> (ø)
include/ginkgo/core/base/mpi.hpp 92.39% <92.39%> (ø)
core/distributed/partition.cpp 97.82% <97.82%> (ø)
test/distributed/partition_kernels.cpp 99.25% <99.25%> (ø)
common/unified/distributed/partition_kernels.cpp 100.00% <100.00%> (ø)
... and 18 more

Continue to review full report at Codecov.

Legend - Click here to learn more
Δ = absolute <relative> (impact), ø = not affected, ? = missing data
Powered by Codecov. Last update 00226d0...ed77851. Read the comment docs.

@pratikvn pratikvn added 1:ST:ready-to-merge This PR is ready to merge. and removed 1:ST:ready-for-review This PR is ready for review labels Nov 29, 2021
@pratikvn pratikvn merged commit f7dd61b into develop Nov 29, 2021
@pratikvn pratikvn deleted the distributed-develop branch November 29, 2021 20:06
@pratikvn pratikvn restored the distributed-develop branch November 29, 2021 20:06
tcojean pushed a commit that referenced this pull request Nov 12, 2022
Advertise release 1.5.0 and last changes

+ Add changelog,
+ Update third party libraries
+ A small fix to a CMake file

See PR: #1195

The Ginkgo team is proud to announce the new Ginkgo minor release 1.5.0. This release brings many important new features such as:
- MPI-based multi-node support for all matrix formats and most solvers;
- full DPC++/SYCL support,
- functionality and interface for GPU-resident sparse direct solvers,
- an interface for wrapping solvers with scaling and reordering applied,
- a new algebraic Multigrid solver/preconditioner,
- improved mixed-precision support,
- support for device matrix assembly,

and much more.

If you face an issue, please first check our [known issues page](https://github.com/ginkgo-project/ginkgo/wiki/Known-Issues) and the [open issues list](https://github.com/ginkgo-project/ginkgo/issues) and if you do not find a solution, feel free to [open a new issue](https://github.com/ginkgo-project/ginkgo/issues/new/choose) or ask a question using the [github discussions](https://github.com/ginkgo-project/ginkgo/discussions).

Supported systems and requirements:
+ For all platforms, CMake 3.13+
+ C++14 compliant compiler
+ Linux and macOS
  + GCC: 5.5+
  + clang: 3.9+
  + Intel compiler: 2018+
  + Apple LLVM: 8.0+
  + NVHPC: 22.7+
  + Cray Compiler: 14.0.1+
  + CUDA module: CUDA 9.2+ or NVHPC 22.7+
  + HIP module: ROCm 4.0+
  + DPC++ module: Intel OneAPI 2021.3 with oneMKL and oneDPL. Set the CXX compiler to `dpcpp`.
+ Windows
  + MinGW and Cygwin: GCC 5.5+
  + Microsoft Visual Studio: VS 2019
  + CUDA module: CUDA 9.2+, Microsoft Visual Studio
  + OpenMP module: MinGW or Cygwin.


Algorithm and important feature additions:
+ Add MPI-based multi-node for all matrix formats and solvers (except GMRES and IDR). ([#676](#676), [#908](#908), [#909](#909), [#932](#932), [#951](#951), [#961](#961), [#971](#971), [#976](#976), [#985](#985), [#1007](#1007), [#1030](#1030), [#1054](#1054), [#1100](#1100), [#1148](#1148))
+ Porting the remaining algorithms (preconditioners like ISAI, Jacobi, Multigrid, ParILU(T) and ParIC(T)) to DPC++/SYCL, update to SYCL 2020, and improve support and performance ([#896](#896), [#924](#924), [#928](#928), [#929](#929), [#933](#933), [#943](#943), [#960](#960), [#1057](#1057), [#1110](#1110),  [#1142](#1142))
+ Add a Sparse Direct interface supporting GPU-resident numerical LU factorization, symbolic Cholesky factorization, improved triangular solvers, and more ([#957](#957), [#1058](#1058), [#1072](#1072), [#1082](#1082))
+ Add a ScaleReordered interface that can wrap solvers and automatically apply reorderings and scalings ([#1059](#1059))
+ Add a Multigrid solver and improve the aggregation based PGM coarsening scheme ([#542](#542), [#913](#913), [#980](#980), [#982](#982),  [#986](#986))
+ Add infrastructure for unified, lambda-based, backend agnostic, kernels and utilize it for some simple kernels ([#833](#833), [#910](#910), [#926](#926))
+ Merge different CUDA, HIP, DPC++ and OpenMP tests under a common interface ([#904](#904), [#973](#973), [#1044](#1044), [#1117](#1117))
+ Add a device_matrix_data type for device-side matrix assembly ([#886](#886), [#963](#963), [#965](#965))
+ Add support for mixed real/complex BLAS operations ([#864](#864))
+ Add a FFT LinOp for all but DPC++/SYCL ([#701](#701))
+ Add FBCSR support for NVIDIA and AMD GPUs and CPUs with OpenMP ([#775](#775))
+ Add CSR scaling ([#848](#848))
+ Add array::const_view and equivalent to create constant matrices from non-const data ([#890](#890))
+ Add a RowGatherer LinOp supporting mixed precision to gather dense matrix rows ([#901](#901))
+ Add mixed precision SparsityCsr SpMV support ([#970](#970))
+ Allow creating CSR submatrix including from (possibly discontinuous) index sets ([#885](#885), [#964](#964))
+ Add a scaled identity addition (M <- aI + bM) feature interface and impls for Csr and Dense ([#942](#942))


Deprecations and important changes:
+ Deprecate AmgxPgm in favor of the new Pgm name. ([#1149](#1149)).
+ Deprecate specialized residual norm classes in favor of a common `ResidualNorm` class ([#1101](#1101))
+ Deprecate CamelCase non-polymorphic types in favor of snake_case versions (like array, machine_topology, uninitialized_array, index_set) ([#1031](#1031), [#1052](#1052))
+ Bug fix: restrict gko::share to rvalue references (*possible interface break*) ([#1020](#1020))
+ Bug fix: when using cuSPARSE's triangular solvers, specifying the factory parameter `num_rhs` is now required when solving for more than one right-hand side, otherwise an exception is thrown ([#1184](#1184)).
+ Drop official support for old CUDA < 9.2 ([#887](#887))


Improved performance additions:
+ Reuse tmp storage in reductions in solvers and add a mutable workspace to all solvers ([#1013](#1013), [#1028](#1028))
+ Add HIP unsafe atomic option for AMD ([#1091](#1091))
+ Prefer vendor implementations for Dense dot, conj_dot and norm2 when available ([#967](#967)).
+ Tuned OpenMP SellP, COO, and ELL SpMV kernels for a small number of RHS ([#809](#809))


Fixes:
+ Fix various compilation warnings ([#1076](#1076), [#1183](#1183), [#1189](#1189))
+ Fix issues with hwloc-related tests ([#1074](#1074))
+ Fix include headers for GCC 12 ([#1071](#1071))
+ Fix for simple-solver-logging example ([#1066](#1066))
+ Fix for potential memory leak in Logger ([#1056](#1056))
+ Fix logging of mixin classes ([#1037](#1037))
+ Improve value semantics for LinOp types, like moved-from state in cross-executor copy/clones ([#753](#753))
+ Fix some matrix SpMV and conversion corner cases ([#905](#905), [#978](#978))
+ Fix uninitialized data ([#958](#958))
+ Fix CUDA version requirement for cusparseSpSM ([#953](#953))
+ Fix several issues within bash-script ([#1016](#1016))
+ Fixes for `NVHPC` compiler support ([#1194](#1194))


Other additions:
+ Simplify and properly name GMRES kernels ([#861](#861))
+ Improve pkg-config support for non-CMake libraries ([#923](#923), [#1109](#1109))
+ Improve gdb pretty printer ([#987](#987), [#1114](#1114))
+ Add a logger highlighting inefficient allocation and copy patterns ([#1035](#1035))
+ Improved and optimized test random matrix generation ([#954](#954), [#1032](#1032))
+ Better CSR strategy defaults ([#969](#969))
+ Add `move_from` to `PolymorphicObject` ([#997](#997))
+ Remove unnecessary device_guard usage ([#956](#956))
+ Improvements to the generic accessor for mixed-precision ([#727](#727))
+ Add a naive lower triangular solver implementation for CUDA ([#764](#764))
+ Add support for int64 indices from CUDA 11 onward with SpMV and SpGEMM ([#897](#897))
+ Add a L1 norm implementation ([#900](#900))
+ Add reduce_add for arrays ([#831](#831))
+ Add utility to simplify Dense View creation from an existing Dense vector ([#1136](#1136)).
+ Add a custom transpose implementation for Fbcsr and Csr transpose for unsupported vendor types ([#1123](#1123))
+ Make IDR random initilization deterministic ([#1116](#1116))
+ Move the algorithm choice for triangular solvers from Csr::strategy_type to a factory parameter ([#1088](#1088))
+ Update CUDA archCoresPerSM ([#1175](#1116))
+ Add kernels for Csr sparsity pattern lookup ([#994](#994))
+ Differentiate between structural and numerical zeros in Ell/Sellp ([#1027](#1027))
+ Add a binary IO format for matrix data ([#984](#984))
+ Add a tuple zip_iterator implementation ([#966](#966))
+ Simplify kernel stubs and declarations ([#888](#888))
+ Simplify GKO_REGISTER_OPERATION with lambdas ([#859](#859))
+ Simplify copy to device in tests and examples ([#863](#863))
+ More verbose output to array assertions ([#858](#858))
+ Allow parallel compilation for Jacobi kernels ([#871](#871))
+ Change clang-format pointer alignment to left ([#872](#872))
+ Various improvements and fixes to the benchmarking framework ([#750](#750), [#759](#759), [#870](#870), [#911](#911), [#1033](#1033), [#1137](#1137))
+ Various documentation improvements ([#892](#892), [#921](#921), [#950](#950), [#977](#977), [#1021](#1021), [#1068](#1068), [#1069](#1069), [#1080](#1080), [#1081](#1081), [#1108](#1108), [#1153](#1153), [#1154](#1154))
+ Various CI improvements ([#868](#868), [#874](#874), [#884](#884), [#889](#889), [#899](#899), [#903](#903),  [#922](#922), [#925](#925), [#930](#930), [#936](#936), [#937](#937), [#958](#958), [#882](#882), [#1011](#1011), [#1015](#1015), [#989](#989), [#1039](#1039), [#1042](#1042), [#1067](#1067), [#1073](#1073), [#1075](#1075), [#1083](#1083), [#1084](#1084), [#1085](#1085), [#1139](#1139), [#1178](#1178), [#1187](#1187))
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

1:ST:ready-to-merge This PR is ready to merge. is:new-feature A request or implementation of a feature that does not exist yet. mod:all This touches all Ginkgo modules. mod:mpi This is related to the MPI module reg:build This is related to the build system. reg:ci-cd This is related to the continuous integration system. reg:documentation This is related to documentation. reg:testing This is related to testing. type:distributed-functionality

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants