Skip to content

Add a batch::Bicgstab solver class, core, ref and omp kernels#1438

Merged
pratikvn merged 36 commits into
developfrom
batch-bicgstab
Nov 1, 2023
Merged

Add a batch::Bicgstab solver class, core, ref and omp kernels#1438
pratikvn merged 36 commits into
developfrom
batch-bicgstab

Conversation

@pratikvn

Copy link
Copy Markdown
Member

This PR adds a batch::Bicgstab solver and only the reference kernels for now. Another PR will be created to add the cuda, hip and dpcpp kernels to avoid making this PR too large.

In addition, some general solver, stopping critieria, logger and preconditioner framework is also added. These are fairly simple and I think it helps review these in the context of the solver itself.

  1. Batch stopping criteria
  2. Simple batch logger
  3. Some batch matrix generation utilities
  4. A basic BatchIdentity matrix class and a corresponding Identity preconditioner to enable unpreconditioned solves.
  5. The batch dispatch mechanism that selects the correct matrix, solver, preconditioner, stopping critieria at runtime and dispatches the correct kernel on the device.

@pratikvn pratikvn added 1:ST:WIP This PR is a work in progress. Not ready for review. type:batched-functionality This is related to the batched functionality in Ginkgo labels Oct 21, 2023
@pratikvn pratikvn added this to the Release 1.7.0 milestone Oct 21, 2023
@pratikvn pratikvn self-assigned this Oct 21, 2023
@ginkgo-bot ginkgo-bot added reg:build This is related to the build system. reg:testing This is related to testing. type:solver This is related to the solvers type:preconditioner This is related to the preconditioners type:matrix-format This is related to the Matrix formats type:stopping-criteria This is related to the stopping criteria mod:all This touches all Ginkgo modules. labels Oct 21, 2023
@pratikvn pratikvn force-pushed the batch-bicgstab branch 2 times, most recently from 25a894a to 26472b9 Compare October 23, 2023 05:36

@MarcelKoch MarcelKoch left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think we can use our unified kernels approach for some of these parts. In particular, the logger and stopping criteria don't use any backend specific stuff, except for some function attributes. Those could also be handled uniformly through macros, which we already have.

I think even the identity preconditioner could be handled this way, although that would require some adjustments to our unified kernels, so I think we should postpone that.

@MarcelKoch MarcelKoch self-requested a review October 23, 2023 09:13

@yhmtsai yhmtsai left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

first part of my review



/**
* Logs the final residual and iteration count for a batch solver.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
* Logs the final residual and iteration count for a batch solver.
* Logs the final actual residual norm and iteration count for a batch solver.

It is for actual residual not implicit residual, right?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

That depends on the solver, so I would not specify that here.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it also applied to criterion?
If it is, it gives unexpected convergence behavior. User sometimes gets the residual indeed less the requirement (actual residual) but sometimes get higher residual as converged result because it depends on the implicit one

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, criterion checks are also always with whatever residual the solver provides.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Maybe I should clarify that we always check against the implicit residual within the solvers. In some cases, the implicit residual and the actual residual may be the same, but that depends on the solver.

Comment thread common/cuda_hip/log/batch_logger.hpp.inc Outdated
Comment thread common/cuda_hip/preconditioner/batch_identity.hpp.inc Outdated
Comment thread common/cuda_hip/stop/batch_criteria.hpp.inc
Comment thread common/cuda_hip/stop/batch_criteria.hpp.inc Outdated
Comment thread include/ginkgo/core/matrix/batch_identity.hpp Outdated
Comment thread include/ginkgo/core/matrix/batch_identity.hpp Outdated
Comment thread include/ginkgo/core/solver/batch_bicgstab.hpp Outdated
Comment thread reference/preconditioner/batch_identity.hpp
Comment thread reference/stop/batch_criteria.hpp Outdated

@MarcelKoch MarcelKoch left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think the code can use some of the new core developments. For example, the factory parameter can be unified, or maybe the workspace can be extended to also cover the batched case. But some of those changes (e.g. the workspace) could be done at a later time. So for now I'm focusing on the interface to allow for these changes.
Part 1/n

Comment thread include/ginkgo/core/solver/batch_solver_base.hpp Outdated
Comment thread include/ginkgo/core/solver/batch_solver_base.hpp Outdated
Comment thread include/ginkgo/core/solver/batch_solver_base.hpp Outdated
Comment thread include/ginkgo/core/solver/batch_solver_base.hpp Outdated
Comment thread include/ginkgo/core/solver/batch_solver_base.hpp Outdated
Comment thread include/ginkgo/core/solver/batch_solver_base.hpp Outdated
Comment thread include/ginkgo/core/solver/batch_solver_base.hpp Outdated
Comment thread include/ginkgo/core/solver/batch_solver_base.hpp Outdated
Comment thread include/ginkgo/core/solver/batch_solver_base.hpp Outdated
@MarcelKoch MarcelKoch self-requested a review October 24, 2023 08:04
Comment thread include/ginkgo/core/log/batch_logger.hpp Outdated
Comment thread include/ginkgo/core/log/batch_logger.hpp Outdated
Comment thread include/ginkgo/core/log/batch_logger.hpp Outdated
Comment thread include/ginkgo/core/log/batch_logger.hpp Outdated
Comment thread include/ginkgo/core/log/batch_logger.hpp Outdated
Comment thread include/ginkgo/core/solver/batch_solver_base.hpp Outdated

@MarcelKoch MarcelKoch left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Part 2/n, mostly done with the interface and core stuff (except the test helpers). I think especially on the logger side there are some inconsistencies that I would like to see addressed.

Comment thread include/ginkgo/core/matrix/batch_dense.hpp Outdated
Comment thread core/matrix/batch_struct.hpp
Comment thread include/ginkgo/core/log/batch_logger.hpp Outdated
Comment thread core/solver/batch_dispatch.hpp Outdated
Comment thread core/solver/batch_dispatch.hpp Outdated
Comment thread core/matrix/batch_struct.hpp
Comment thread include/ginkgo/core/solver/batch_solver_base.hpp
Comment thread core/solver/batch_dispatch.hpp Outdated
Comment thread core/test/solver/batch_bicgstab.cpp Outdated
Comment thread core/test/solver/batch_bicgstab.cpp Outdated

@yhmtsai yhmtsai left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

second part

Comment thread core/solver/batch_dispatch.hpp
Comment thread dpcpp/log/batch_logger.hpp
Comment thread include/ginkgo/core/log/logger.hpp Outdated
Comment thread reference/preconditioner/batch_identity.hpp Outdated
* Sets the input and generates the identity preconditioner.(Nothing needs
* to be actually generated.)
*/
void generate(size_type,

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does batch_identity need to be preconditioner?
batch_identity will be passed through the generated_preconditioner or the default preconditioner, right?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Essentially, the solver will always have prec.generate(...) and prec_apply(...) calls. As it is templated, in the default case, we need to have the identity preconditioner.

Comment on lines +266 to +262
initialize(A_entry, b_entry, gko::batch::to_const(x_entry), rho_old_entry,
omega_entry, alpha_entry, r_entry, r_hat_entry, p_entry,
p_hat_entry, v_entry, rhs_norms_entry, res_norms_entry);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

the function call is slightly different from the core/solver/bicgstab. Is there any benefit merge b-Ax and r_hat = r to initialize? keeping them similar to core might be easier for reviewing

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I draw back my comment because the other kernel can put the dot together unlike the core already

Comment thread reference/solver/batch_bicgstab_kernels.hpp.inc Outdated
Comment thread reference/solver/batch_bicgstab_kernels.hpp.inc Outdated

template <typename StopType, typename PrecType, typename LogType,
typename BatchMatrixType, typename ValueType>
inline void batch_entry_bicgstab_impl(

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I also think the core part can be shared among backends, but I do not focus on that now.
I assume the fused kernel from GPU perspective

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think we can think about unifying this later.

Comment thread omp/solver/batch_bicgstab_kernels.cpp Outdated

@MarcelKoch MarcelKoch left a comment

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Part 3/3. This concerns mostly the reference/omp kernel and tests. There are only few notes on the kernels (beside moving parts into common/unified). I think there are some easy generalizations in the test helpers possible.

Comment thread core/test/utils/batch_helpers.hpp Outdated
Comment thread core/test/utils/batch_helpers.hpp Outdated
Comment thread core/test/utils/batch_helpers.hpp Outdated
Comment thread core/test/utils/batch_helpers.hpp
Comment thread core/test/utils/batch_helpers.hpp Outdated
for (size_t i = 0; i < this->num_batch_items; i++) {
ASSERT_LE(res_log_array[i] / this->linear_system.rhs_norm->at(i, 0, 0),
this->solver_settings.residual_tol);
ASSERT_NEAR(res_log_array[i], res.res_norm->get_const_values()[i],

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not sure that this is a helpful test. IMO it would be better to compare the solver result to the true solution, or just leave it out. The test above might already be sufficient.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, it should be equal not near, I think?

Comment thread reference/stop/batch_criteria.hpp Outdated
Comment thread reference/test/solver/batch_bicgstab_kernels.cpp Outdated
Comment thread omp/solver/batch_bicgstab_kernels.cpp
Comment thread omp/solver/batch_bicgstab_kernels.cpp Outdated
Comment thread include/ginkgo/core/log/batch_logger.hpp Outdated
Comment thread include/ginkgo/core/log/batch_logger.hpp Outdated
Comment thread include/ginkgo/core/log/batch_logger.hpp
Comment thread include/ginkgo/core/log/batch_logger.hpp Outdated
Comment thread include/ginkgo/core/solver/batch_bicgstab.hpp Outdated
Comment on lines +165 to +168
auto iter_array = res.log_data->iter_counts.get_const_data();
for (size_t i = 0; i < num_batch_items; i++) {
ASSERT_EQ(iter_array[i], ref_iters);
}

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

does it make the linear system unsolved? otherwise, it might be less than ref_iters

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, the tolerance of 0 is not acheivable and it should always hit the ref iters

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

using nan is maybe more general, which also fit if we decide to use <= not <

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Will that work on device as well ?

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Yes, I think so. It should work if the compiler does not use fast math.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

In this case, it is still not possible be acheive a tolerance of 0, so i think nan is not necessary.

auto comp_res_norm =
exec->copy_val_to_host(res.res_norm->get_const_values() + i);
ASSERT_LE(iter_counts->get_const_data()[i], max_iters);
EXPECT_LE(res_norm->get_const_data()[i], comp_tol);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

why does this criterion need use 100 * tol not tol if the criterion is absolute residual norm?

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think there were issues only on some systems, particularly MSVC. Not sure why.

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It's might related to the optimization or different random input?
The codes gives me the confusion about the criterion.
From my first thought, it is actual residual norm check. That's why I do not think that the residual norm does not match the required criterion makes sense.

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think this code is a bit stale and has been updated. So, I think it should be correct now. In the updated code, comp_res_norm is the actual residual while resnorm is the residual from the logger, which in this case is the implicit residual.

for (size_t i = 0; i < this->num_batch_items; i++) {
ASSERT_LE(res_log_array[i] / this->linear_system.rhs_norm->at(i, 0, 0),
this->solver_settings.residual_tol);
ASSERT_NEAR(res_log_array[i], res.res_norm->get_const_values()[i],

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also, it should be equal not near, I think?

Comment on lines +259 to +260
EXPECT_LE(rel_res_norm, res_norm.get_const_data()[i]);
ASSERT_LE(rel_res_norm, tol * 10);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
EXPECT_LE(rel_res_norm, res_norm.get_const_data()[i]);
ASSERT_LE(rel_res_norm, tol * 10);
EXPECT_EQ(rel_res_norm, res_norm.get_const_data()[i]);
ASSERT_LE(rel_res_norm, tol);


GKO_ASSERT_BATCH_MTX_NEAR(res.x, linear_system.exact_sol, tol * 50);
for (size_t i = 0; i < num_batch_items; i++) {
ASSERT_LE(res.res_norm->get_const_values()[i], tol * 50);

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
ASSERT_LE(res.res_norm->get_const_values()[i], tol * 50);
ASSERT_LE(res.res_norm->get_const_values()[i], tol);

Copy link
Copy Markdown
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Both MSVC and NVHPC seem to have issues with even 50.

@MarcelKoch

Copy link
Copy Markdown
Member

@pratikvn Do you mind holding off on the rebasing until all reviews are done (unless necessary)? Github can't keep track of the new changes otherwise (and VS Code seems also unable to do so).

@pratikvn pratikvn force-pushed the batch-bicgstab branch 2 times, most recently from 82712a3 to e17e58d Compare October 25, 2023 22:54
@pratikvn

pratikvn commented Oct 25, 2023

Copy link
Copy Markdown
Member Author

@yhmtsai , the issue of tolerance is the same we have had in other places. Some compilers always seem to need higher values for tolerances, so the values of 50, 10 and 100 have been set empirically.

@pratikvn pratikvn added 1:ST:ready-for-review This PR is ready for review and removed 1:ST:WIP This PR is a work in progress. Not ready for review. labels Oct 25, 2023
pratikvn and others added 23 commits October 31, 2023 23:46
Co-authored-by: Yu-Hsiang Tsai <yhmtsai@gmail.com>
Co-authored-by: Marcel Koch <marcel.koch@kit.edu>
Co-authored-by: Yu-Hsiang Tsai <yhmtsai@gmail.com>
Co-authored-by: Marcel Koch <marcel.koch@kit.edu>
Co-authored-by: Pratik Nayak <pratikvn@pm.me>
Co-authored-by: Yu-Hsian Tsai <yhmtsai@gmail.com>
Co-authored-by: Marcel Koch <marcel.koch@kit.edu>
Co-authored-by: Terry Cojean <terry.cojean@kit.edu>
Co-authored-by: Yu-Hsiang Tsai <yhmtsai@gmail.com>
Co-authored-by: Yu-Hsiang Tsai <yhmtsai@gmail.com>
@pratikvn

pratikvn commented Nov 1, 2023

Copy link
Copy Markdown
Member Author

As the discussion of the experimental namespace is independent of this PR and this PR has been reviewed, I will go ahead and merge this now to simplify the other batch PR as our CI seems to be stuck.

@pratikvn pratikvn merged commit 3d8dc38 into develop Nov 1, 2023
@pratikvn pratikvn deleted the batch-bicgstab branch November 1, 2023 09:06
@tcojean tcojean mentioned this pull request Nov 6, 2023
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

1:ST:ready-to-merge This PR is ready to merge. mod:all This touches all Ginkgo modules. reg:build This is related to the build system. reg:testing This is related to testing. type:batched-functionality This is related to the batched functionality in Ginkgo type:matrix-format This is related to the Matrix formats type:preconditioner This is related to the preconditioners type:solver This is related to the solvers type:stopping-criteria This is related to the stopping criteria

Projects

Status: Completed

Development

Successfully merging this pull request may close these issues.

6 participants