Add a batch::Bicgstab solver class, core, ref and omp kernels by pratikvn · Pull Request #1438 · ginkgo-project/ginkgo

pratikvn · 2023-10-21T19:33:15Z

This PR adds a batch::Bicgstab solver and only the reference kernels for now. Another PR will be created to add the cuda, hip and dpcpp kernels to avoid making this PR too large.

In addition, a general framework for solvers, stopping criteria, loggers and preconditioners is also added. These parts are fairly simple, and I think it helps to review them in the context of the solver itself.

Batch stopping criteria
Simple batch logger
Some batch matrix generation utilities
A basic BatchIdentity matrix class and a corresponding Identity preconditioner to enable unpreconditioned solves.
The batch dispatch mechanism that selects the correct matrix, solver, preconditioner and stopping criteria at runtime and dispatches the correct kernel on the device.
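
The dispatch idea in the last item can be sketched in plain C++. This is a minimal illustration and not the code added in this PR: the tag types, the enums and the functions `run_kernel`, `dispatch_on_stop` and `dispatch` are all hypothetical names, and the "kernel" only returns a string naming the instantiation that was selected.

```cpp
#include <cassert>
#include <string>

// Hypothetical tag types standing in for real preconditioner and stopping
// criterion implementations.
struct IdentityPrec { static const char* name() { return "identity"; } };
struct JacobiPrec { static const char* name() { return "jacobi"; } };
struct AbsoluteStop { static const char* name() { return "absolute"; } };
struct RelativeStop { static const char* name() { return "relative"; } };

// Runtime descriptions of the same choices.
enum class PrecKind { identity, jacobi };
enum class StopKind { absolute, relative };

// Stand-in for the templated kernel: each type combination is a separate
// instantiation that the compiler can optimize on its own.
template <typename Prec, typename Stop>
std::string run_kernel()
{
    return std::string(Prec::name()) + "+" + Stop::name();
}

// Second dispatch level: the preconditioner type is already fixed.
template <typename Prec>
std::string dispatch_on_stop(StopKind stop)
{
    switch (stop) {
    case StopKind::absolute:
        return run_kernel<Prec, AbsoluteStop>();
    case StopKind::relative:
        return run_kernel<Prec, RelativeStop>();
    }
    assert(false && "unhandled stopping criterion");
    return std::string();
}

// First dispatch level: map the runtime preconditioner choice to a type.
std::string dispatch(PrecKind prec, StopKind stop)
{
    switch (prec) {
    case PrecKind::identity:
        return dispatch_on_stop<IdentityPrec>(stop);
    case PrecKind::jacobi:
        return dispatch_on_stop<JacobiPrec>(stop);
    }
    assert(false && "unhandled preconditioner");
    return std::string();
}
```

Calling `dispatch(PrecKind::identity, StopKind::relative)` selects the `run_kernel<IdentityPrec, RelativeStop>` instantiation. Each additional runtime choice (matrix format, logger) would add one more nesting level in the same way.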

MarcelKoch

I think we can use our unified kernels approach for some of these parts. In particular, the logger and stopping criteria don't use any backend specific stuff, except for some function attributes. Those could also be handled uniformly through macros, which we already have.

I think even the identity preconditioner could be handled this way, although that would require some adjustments to our unified kernels, so I think we should postpone that.

yhmtsai

first part of my review

yhmtsai · 2023-10-23T08:46:15Z

+
+
+/**
+ * Logs the final residual and iteration count for a batch solver.


Suggested change

* Logs the final residual and iteration count for a batch solver.

* Logs the final actual residual norm and iteration count for a batch solver.

It is for actual residual not implicit residual, right?

That depends on the solver, so I would not specify that here.

Is it also applied to criterion?
If it is, it gives unexpected convergence behavior: the user sometimes gets an actual residual that is indeed below the requirement, but sometimes gets a higher residual reported as converged, because the check depends on the implicit one.

Yes, criterion checks are also always with whatever residual the solver provides.

Maybe I should clarify that we always check against the implicit residual within the solvers. In some cases, the implicit residual and the actual residual may be the same, but that depends on the solver.
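
To make the distinction concrete: the actual residual is b - Ax recomputed from the current iterate, while the implicit residual is whatever vector the solver updates by recurrence. The sketch below is not Ginkgo code; it assumes a dense row-major matrix, and `actual_residual_norm` and `converged` are hypothetical helpers.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Actual residual norm ||b - A x||_2 for a dense row-major n x n matrix.
double actual_residual_norm(const std::vector<double>& a,
                            const std::vector<double>& b,
                            const std::vector<double>& x)
{
    const std::size_t n = b.size();
    assert(a.size() == n * n && x.size() == n);
    double sum = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        double r = b[i];
        for (std::size_t j = 0; j < n; ++j) {
            r -= a[i * n + j] * x[j];
        }
        sum += r * r;
    }
    return std::sqrt(sum);
}

// The check as described in this thread: only the norm provided by the
// solver (the implicit residual) is compared against the tolerance.
bool converged(double solver_res_norm, double tol)
{
    return solver_res_norm < tol;
}
```

A caller that needs a guarantee on the actual residual would have to evaluate `actual_residual_norm` on the returned solution, because `converged` never looks at it.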

MarcelKoch

I think the code can use some of the new core developments. For example, the factory parameter can be unified, or maybe the workspace can be extended to also cover the batched case. But some of those changes (e.g. the workspace) could be done at a later time. So for now I'm focusing on the interface to allow for these changes.
Part 1/n

MarcelKoch

Part 2/n, mostly done with the interface and core stuff (except the test helpers). I think especially on the logger side there are some inconsistencies that I would like to see addressed.

yhmtsai

second part

yhmtsai · 2023-10-24T09:25:31Z

+     * Sets the input and generates the identity preconditioner.(Nothing needs
+     * to be actually generated.)
+     */
+    void generate(size_type,


Does batch_identity need to be a preconditioner?
batch_identity will be passed through the generated_preconditioner or the default preconditioner, right?

Essentially, the solver will always have prec.generate(...) and prec_apply(...) calls. As it is templated, in the default case, we need to have the identity preconditioner.
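
A minimal sketch of why a templated solver needs an identity preconditioner, assuming dense row-major matrices. The types `IdentityPrec` and `ScalarJacobiPrec`, the function `preconditioned_residual` and the `generate(n, a)` signature are hypothetical and do not match the interfaces in this PR.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Identity preconditioner: generate does nothing, apply copies r into z.
struct IdentityPrec {
    void generate(std::size_t, const std::vector<double>&) {}
    void apply(const std::vector<double>& r, std::vector<double>& z) const
    {
        z = r;
    }
};

// Diagonal scaling, shown only for contrast with the identity case.
struct ScalarJacobiPrec {
    std::vector<double> inv_diag;
    void generate(std::size_t n, const std::vector<double>& a)
    {
        assert(a.size() == n * n);
        inv_diag.resize(n);
        for (std::size_t i = 0; i < n; ++i) {
            inv_diag[i] = 1.0 / a[i * n + i];
        }
    }
    void apply(const std::vector<double>& r, std::vector<double>& z) const
    {
        assert(r.size() == inv_diag.size());
        z.resize(r.size());
        for (std::size_t i = 0; i < r.size(); ++i) {
            z[i] = inv_diag[i] * r[i];
        }
    }
};

// The solver body always calls generate and apply; the template parameter
// decides what those calls do, with no branching inside the solver.
template <typename Prec>
std::vector<double> preconditioned_residual(Prec prec, std::size_t n,
                                            const std::vector<double>& a,
                                            const std::vector<double>& r)
{
    prec.generate(n, a);
    std::vector<double> z;
    prec.apply(r, z);
    return z;
}
```

With `IdentityPrec` the call returns `r` unchanged, so the unpreconditioned solve uses the same solver code path as the preconditioned one.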

yhmtsai · 2023-10-24T11:21:43Z

+    initialize(A_entry, b_entry, gko::batch::to_const(x_entry), rho_old_entry,
+               omega_entry, alpha_entry, r_entry, r_hat_entry, p_entry,
+               p_hat_entry, v_entry, rhs_norms_entry, res_norms_entry);


The function call is slightly different from core/solver/bicgstab. Is there any benefit to merging b - Ax and r_hat = r into initialize? Keeping them similar to core might make reviewing easier.

I withdraw my comment, because unlike core, the other kernels can already put the dot products together.
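
As a rough illustration of what a fused initialize step can look like, the sketch below follows the textbook BiCGSTAB initialization (r = b - Ax, r_hat = r, p = v = 0, rho_old = omega = alpha = 1) and computes both norms in the same pass over the rows. It assumes a dense row-major matrix; `BicgstabState` and this `initialize` signature are hypothetical and are not the kernel from this PR.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <vector>

// Hypothetical per-batch-entry state, named after the arguments in the diff.
struct BicgstabState {
    double rho_old = 1.0;
    double omega = 1.0;
    double alpha = 1.0;
    std::vector<double> r, r_hat, p, v;
    double rhs_norm = 0.0;
    double res_norm = 0.0;
};

// Computes r = b - A x, copies it to r_hat, zeroes p and v, and accumulates
// ||b|| and ||r|| in the same loop instead of in separate kernels.
BicgstabState initialize(const std::vector<double>& a,
                         const std::vector<double>& b,
                         const std::vector<double>& x)
{
    const std::size_t n = b.size();
    assert(a.size() == n * n && x.size() == n);
    BicgstabState s;
    s.r.resize(n);
    s.r_hat.resize(n);
    s.p.assign(n, 0.0);
    s.v.assign(n, 0.0);
    double rhs_sq = 0.0;
    double res_sq = 0.0;
    for (std::size_t i = 0; i < n; ++i) {
        double ri = b[i];
        for (std::size_t j = 0; j < n; ++j) {
            ri -= a[i * n + j] * x[j];
        }
        s.r[i] = ri;
        s.r_hat[i] = ri;
        rhs_sq += b[i] * b[i];
        res_sq += ri * ri;
    }
    s.rhs_norm = std::sqrt(rhs_sq);
    s.res_norm = std::sqrt(res_sq);
    return s;
}
```

Fusing these steps means each row of the system is read once, which is the kind of saving a single fused device kernel is after.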

yhmtsai · 2023-10-24T11:33:56Z

+
+template <typename StopType, typename PrecType, typename LogType,
+          typename BatchMatrixType, typename ValueType>
+inline void batch_entry_bicgstab_impl(


I also think the core part can be shared among backends, but I am not focusing on that now.
I assume the kernel is fused with the GPU perspective in mind.

Yes, I think we can think about unifying this later.

MarcelKoch

Part 3/3. This mostly concerns the reference/omp kernels and tests. There are only a few notes on the kernels (besides moving parts into common/unified). I think some easy generalizations are possible in the test helpers.

MarcelKoch · 2023-10-24T12:46:47Z

+    for (size_t i = 0; i < this->num_batch_items; i++) {
+        ASSERT_LE(res_log_array[i] / this->linear_system.rhs_norm->at(i, 0, 0),
+                  this->solver_settings.residual_tol);
+        ASSERT_NEAR(res_log_array[i], res.res_norm->get_const_values()[i],


I'm not sure that this is a helpful test. IMO it would be better to compare the solver result to the true solution, or just leave it out. The test above might already be sufficient.

also, it should be equal not near, I think?

yhmtsai · 2023-10-25T09:40:19Z

+    auto iter_array = res.log_data->iter_counts.get_const_data();
+    for (size_t i = 0; i < num_batch_items; i++) {
+        ASSERT_EQ(iter_array[i], ref_iters);
+    }


Does this make the linear system unsolvable? Otherwise, the iteration count might be less than ref_iters.

Yes, the tolerance of 0 is not achievable, so it should always hit ref_iters.

Using NaN is maybe more general, and it would also still work if we decide to use <= instead of <.

Will that work on the device as well?

Yes, I think so. It should work if the compiler does not use fast math.

In this case, it is still not possible to achieve a tolerance of 0, so I think NaN is not necessary.
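
The difference between a tolerance of 0 and a NaN tolerance can be sketched with a toy loop. This is not the test from the PR: `iterations_until_stop` is a hypothetical model in which the residual norm shrinks by 0.25 per iteration and reaches exactly zero, which is the case where `<` and `<=` disagree.

```cpp
#include <algorithm>
#include <cassert>
#include <limits>

bool stop_strict(double res_norm, double tol) { return res_norm < tol; }
bool stop_inclusive(double res_norm, double tol) { return res_norm <= tol; }

// A tolerance that no comparison can satisfy: every ordered comparison
// against NaN is false under IEEE semantics (so not under fast math).
double unreachable_tolerance()
{
    return std::numeric_limits<double>::quiet_NaN();
}

// Toy iteration: the residual norm starts at 1 and drops by 0.25 per step,
// clamped at 0. Returns the number of iterations performed.
int iterations_until_stop(double tol, int max_iters, bool inclusive)
{
    assert(max_iters >= 0);
    double res_norm = 1.0;
    int iter = 0;
    while (iter < max_iters) {
        const bool stop = inclusive ? stop_inclusive(res_norm, tol)
                                    : stop_strict(res_norm, tol);
        if (stop) {
            break;
        }
        res_norm = std::max(0.0, res_norm - 0.25);
        ++iter;
    }
    return iter;
}
```

In this model, a tolerance of 0 forces the full iteration count only with the strict comparison, while a NaN tolerance does so with either comparison, provided the compiler keeps IEEE NaN semantics.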

yhmtsai · 2023-10-25T09:42:26Z

+        auto comp_res_norm =
+            exec->copy_val_to_host(res.res_norm->get_const_values() + i);
+        ASSERT_LE(iter_counts->get_const_data()[i], max_iters);
+        EXPECT_LE(res_norm->get_const_data()[i], comp_tol);


Why does this criterion need to use 100 * tol instead of tol if the criterion is the absolute residual norm?

I think there were issues only on some systems, particularly MSVC. Not sure why.

It might be related to the optimization or to different random input?
The code confuses me about the criterion.
My first thought was that this is an actual residual norm check. That is why it does not make sense to me that the residual norm fails to meet the required criterion.

I think this code is a bit stale and has since been updated, so it should be correct now. In the updated code, comp_res_norm is the actual residual while res_norm is the residual from the logger, which in this case is the implicit residual.

yhmtsai · 2023-10-25T09:44:03Z

+    for (size_t i = 0; i < this->num_batch_items; i++) {
+        ASSERT_LE(res_log_array[i] / this->linear_system.rhs_norm->at(i, 0, 0),
+                  this->solver_settings.residual_tol);
+        ASSERT_NEAR(res_log_array[i], res.res_norm->get_const_values()[i],


also, it should be equal not near, I think?

yhmtsai · 2023-10-25T09:46:28Z

+        EXPECT_LE(rel_res_norm, res_norm.get_const_data()[i]);
+        ASSERT_LE(rel_res_norm, tol * 10);


Suggested change

EXPECT_LE(rel_res_norm, res_norm.get_const_data()[i]);

ASSERT_LE(rel_res_norm, tol * 10);

EXPECT_EQ(rel_res_norm, res_norm.get_const_data()[i]);

ASSERT_LE(rel_res_norm, tol);

yhmtsai · 2023-10-25T09:46:55Z

+
+    GKO_ASSERT_BATCH_MTX_NEAR(res.x, linear_system.exact_sol, tol * 50);
+    for (size_t i = 0; i < num_batch_items; i++) {
+        ASSERT_LE(res.res_norm->get_const_values()[i], tol * 50);


Suggested change

ASSERT_LE(res.res_norm->get_const_values()[i], tol * 50);

ASSERT_LE(res.res_norm->get_const_values()[i], tol);

Both MSVC and NVHPC seem to have issues with even 50.

MarcelKoch · 2023-10-25T14:15:02Z

@pratikvn Do you mind holding off on the rebasing until all reviews are done (unless necessary)? GitHub can't keep track of the new changes otherwise (and VS Code also seems unable to do so).

pratikvn · 2023-10-25T22:56:14Z

@yhmtsai, the tolerance issue is the same one we have had in other places. Some compilers always seem to need higher values for tolerances, so the values of 50, 10 and 100 have been set empirically.

pratikvn · 2023-11-01T09:03:48Z

As the discussion of the experimental namespace is independent of this PR and this PR has been reviewed, I will go ahead and merge this now to simplify the other batch PR as our CI seems to be stuck.

pratikvn added the 1:ST:WIP and type:batched-functionality labels Oct 21, 2023

pratikvn added this to the Release 1.7.0 milestone Oct 21, 2023

pratikvn self-assigned this Oct 21, 2023

pratikvn force-pushed the batch-bicgstab branch 2 times, most recently from 25a894a to 26472b9 Compare October 23, 2023 05:36

MarcelKoch reviewed Oct 23, 2023

MarcelKoch self-requested a review October 23, 2023 09:13

yhmtsai reviewed Oct 23, 2023

MarcelKoch reviewed Oct 23, 2023

MarcelKoch self-requested a review October 24, 2023 08:04

MarcelKoch reviewed Oct 24, 2023

yhmtsai reviewed Oct 24, 2023

MarcelKoch reviewed Oct 24, 2023

pratikvn force-pushed the batch-bicgstab branch from 2611c7a to 5e282b5 Compare October 25, 2023 09:03

yhmtsai reviewed Oct 25, 2023

pratikvn force-pushed the batch-bicgstab branch 2 times, most recently from 82712a3 to e17e58d Compare October 25, 2023 22:54

pratikvn added the 1:ST:ready-for-review label and removed the 1:ST:WIP label Oct 25, 2023

pratikvn and others added 23 commits October 31, 2023 23:46

Fix ref test issues

98f9f29

Add omp tests and gen improvements

201b3e0

Fix logger and update docs

9262bbf

re-template logger and logdata

8d55033

doc improvements and some restructuring

fb0b856

formatting and renames

d58a997

generic logdata improvements

6f61cd0

rename kernel namespaces

83b1fde

use workspace for logger

1fc68ce

use new factory setup, move crit to base

4f67841

Add batch identity test and fix apply

20419af

Review updates

a89b4af

Co-authored-by: Yu-Hsiang Tsai <yhmtsai@gmail.com> Co-authored-by: Marcel Koch <marcel.koch@kit.edu>

s/BicgstabSettings/settings

d5a55ad

Fix workspace issues and review updates

f6ae1a4

Co-authored-by: Yu-Hsiang Tsai <yhmtsai@gmail.com>

Review updates

8f920ea

Co-authored-by: Marcel Koch <marcel.koch@kit.edu>

rename crit getters and setters

f7bcbea

Format files

9b831a3

Co-authored-by: Pratik Nayak <pratikvn@pm.me>

Update copy/move semantics

0a6e700

Review updates

f4f69ba

Co-authored-by: Yu-Hsian Tsai <yhmtsai@gmail.com>

Review updates

9c1e139

Co-authored-by: Marcel Koch <marcel.koch@kit.edu> Co-authored-by: Terry Cojean <terry.cojean@kit.edu> Co-authored-by: Yu-Hsiang Tsai <yhmtsai@gmail.com>

Fix cuda incom type and check defaults

b87f213

clarify implicit/actual res norm docs, MSVC fixes

adb8f97

review updates

2260c8f

Co-authored-by: Yu-Hsiang Tsai <yhmtsai@gmail.com>

pratikvn force-pushed the batch-bicgstab branch from e21b275 to 2260c8f Compare October 31, 2023 22:47

pratikvn merged commit 3d8dc38 into develop Nov 1, 2023

pratikvn deleted the batch-bicgstab branch November 1, 2023 09:06

tcojean mentioned this pull request Nov 6, 2023

Release 1.7.0 to master #1451

Merged


