Add a batch::Bicgstab solver class, core, ref and omp kernels#1438
Conversation
25a894a to
26472b9
Compare
MarcelKoch
left a comment
There was a problem hiding this comment.
I think we can use our unified kernels approach for some of these parts. In particular, the logger and stopping criteria don't use any backend specific stuff, except for some function attributes. Those could also be handled uniformly through macros, which we already have.
I think even the identity preconditioner could be handled this way, although that would require some adjustments to our unified kernels, so I think we should postpone that.
|
|
||
|
|
||
| /** | ||
| * Logs the final residual and iteration count for a batch solver. |
There was a problem hiding this comment.
| * Logs the final residual and iteration count for a batch solver. | |
| * Logs the final actual residual norm and iteration count for a batch solver. |
It is for actual residual not implicit residual, right?
There was a problem hiding this comment.
That depends on the solver, so I would not specify that here.
There was a problem hiding this comment.
Is it also applied to criterion?
If it is, it gives unexpected convergence behavior. User sometimes gets the residual indeed less the requirement (actual residual) but sometimes get higher residual as converged result because it depends on the implicit one
There was a problem hiding this comment.
Yes, criterion checks are also always with whatever residual the solver provides.
There was a problem hiding this comment.
Maybe I should clarify that we always check against the implicit residual within the solvers. In some cases, the implicit residual and the actual residual may be the same, but that depends on the solver.
MarcelKoch
left a comment
There was a problem hiding this comment.
I think the code can use some of the new core developments. For example, the factory parameter can be unified, or maybe the workspace can be extended to also cover the batched case. But some of those changes (e.g. the workspace) could be done at a later time. So for now I'm focusing on the interface to allow for these changes.
Part 1/n
MarcelKoch
left a comment
There was a problem hiding this comment.
Part 2/n, mostly done with the interface and core stuff (except the test helpers). I think especially on the logger side there are some inconsistencies that I would like to see addressed.
| * Sets the input and generates the identity preconditioner.(Nothing needs | ||
| * to be actually generated.) | ||
| */ | ||
| void generate(size_type, |
There was a problem hiding this comment.
does batch_identity need to be preconditioner?
batch_identity will be passed through the generated_preconditioner or the default preconditioner, right?
There was a problem hiding this comment.
Essentially, the solver will always have prec.generate(...) and prec_apply(...) calls. As it is templated, in the default case, we need to have the identity preconditioner.
| initialize(A_entry, b_entry, gko::batch::to_const(x_entry), rho_old_entry, | ||
| omega_entry, alpha_entry, r_entry, r_hat_entry, p_entry, | ||
| p_hat_entry, v_entry, rhs_norms_entry, res_norms_entry); |
There was a problem hiding this comment.
the function call is slightly different from the core/solver/bicgstab. Is there any benefit merge b-Ax and r_hat = r to initialize? keeping them similar to core might be easier for reviewing
There was a problem hiding this comment.
I draw back my comment because the other kernel can put the dot together unlike the core already
|
|
||
| template <typename StopType, typename PrecType, typename LogType, | ||
| typename BatchMatrixType, typename ValueType> | ||
| inline void batch_entry_bicgstab_impl( |
There was a problem hiding this comment.
I also think the core part can be shared among backends, but I do not focus on that now.
I assume the fused kernel from GPU perspective
There was a problem hiding this comment.
Yes, I think we can think about unifying this later.
MarcelKoch
left a comment
There was a problem hiding this comment.
Part 3/3. This concerns mostly the reference/omp kernel and tests. There are only few notes on the kernels (beside moving parts into common/unified). I think there are some easy generalizations in the test helpers possible.
| for (size_t i = 0; i < this->num_batch_items; i++) { | ||
| ASSERT_LE(res_log_array[i] / this->linear_system.rhs_norm->at(i, 0, 0), | ||
| this->solver_settings.residual_tol); | ||
| ASSERT_NEAR(res_log_array[i], res.res_norm->get_const_values()[i], |
There was a problem hiding this comment.
I'm not sure that this is a helpful test. IMO it would be better to compare the solver result to the true solution, or just leave it out. The test above might already be sufficient.
There was a problem hiding this comment.
also, it should be equal not near, I think?
2611c7a to
5e282b5
Compare
| auto iter_array = res.log_data->iter_counts.get_const_data(); | ||
| for (size_t i = 0; i < num_batch_items; i++) { | ||
| ASSERT_EQ(iter_array[i], ref_iters); | ||
| } |
There was a problem hiding this comment.
does it make the linear system unsolved? otherwise, it might be less than ref_iters
There was a problem hiding this comment.
Yes, the tolerance of 0 is not acheivable and it should always hit the ref iters
There was a problem hiding this comment.
using nan is maybe more general, which also fit if we decide to use <= not <
There was a problem hiding this comment.
Will that work on device as well ?
There was a problem hiding this comment.
Yes, I think so. It should work if the compiler does not use fast math.
There was a problem hiding this comment.
In this case, it is still not possible be acheive a tolerance of 0, so i think nan is not necessary.
| auto comp_res_norm = | ||
| exec->copy_val_to_host(res.res_norm->get_const_values() + i); | ||
| ASSERT_LE(iter_counts->get_const_data()[i], max_iters); | ||
| EXPECT_LE(res_norm->get_const_data()[i], comp_tol); |
There was a problem hiding this comment.
why does this criterion need use 100 * tol not tol if the criterion is absolute residual norm?
There was a problem hiding this comment.
I think there were issues only on some systems, particularly MSVC. Not sure why.
There was a problem hiding this comment.
It's might related to the optimization or different random input?
The codes gives me the confusion about the criterion.
From my first thought, it is actual residual norm check. That's why I do not think that the residual norm does not match the required criterion makes sense.
There was a problem hiding this comment.
I think this code is a bit stale and has been updated. So, I think it should be correct now. In the updated code, comp_res_norm is the actual residual while resnorm is the residual from the logger, which in this case is the implicit residual.
| for (size_t i = 0; i < this->num_batch_items; i++) { | ||
| ASSERT_LE(res_log_array[i] / this->linear_system.rhs_norm->at(i, 0, 0), | ||
| this->solver_settings.residual_tol); | ||
| ASSERT_NEAR(res_log_array[i], res.res_norm->get_const_values()[i], |
There was a problem hiding this comment.
also, it should be equal not near, I think?
| EXPECT_LE(rel_res_norm, res_norm.get_const_data()[i]); | ||
| ASSERT_LE(rel_res_norm, tol * 10); |
There was a problem hiding this comment.
| EXPECT_LE(rel_res_norm, res_norm.get_const_data()[i]); | |
| ASSERT_LE(rel_res_norm, tol * 10); | |
| EXPECT_EQ(rel_res_norm, res_norm.get_const_data()[i]); | |
| ASSERT_LE(rel_res_norm, tol); |
|
|
||
| GKO_ASSERT_BATCH_MTX_NEAR(res.x, linear_system.exact_sol, tol * 50); | ||
| for (size_t i = 0; i < num_batch_items; i++) { | ||
| ASSERT_LE(res.res_norm->get_const_values()[i], tol * 50); |
There was a problem hiding this comment.
| ASSERT_LE(res.res_norm->get_const_values()[i], tol * 50); | |
| ASSERT_LE(res.res_norm->get_const_values()[i], tol); |
|
@pratikvn Do you mind holding off on the rebasing until all reviews are done (unless necessary)? Github can't keep track of the new changes otherwise (and VS Code seems also unable to do so). |
82712a3 to
e17e58d
Compare
|
@yhmtsai , the issue of tolerance is the same we have had in other places. Some compilers always seem to need higher values for tolerances, so the values of 50, 10 and 100 have been set empirically. |
Co-authored-by: Yu-Hsiang Tsai <yhmtsai@gmail.com> Co-authored-by: Marcel Koch <marcel.koch@kit.edu>
Co-authored-by: Yu-Hsiang Tsai <yhmtsai@gmail.com>
Co-authored-by: Marcel Koch <marcel.koch@kit.edu>
Co-authored-by: Pratik Nayak <pratikvn@pm.me>
Co-authored-by: Yu-Hsian Tsai <yhmtsai@gmail.com>
Co-authored-by: Marcel Koch <marcel.koch@kit.edu> Co-authored-by: Terry Cojean <terry.cojean@kit.edu> Co-authored-by: Yu-Hsiang Tsai <yhmtsai@gmail.com>
Co-authored-by: Yu-Hsiang Tsai <yhmtsai@gmail.com>
e21b275 to
2260c8f
Compare
|
As the discussion of the |
This PR adds a batch::Bicgstab solver and only the reference kernels for now. Another PR will be created to add the cuda, hip and dpcpp kernels to avoid making this PR too large.
In addition, some general solver, stopping critieria, logger and preconditioner framework is also added. These are fairly simple and I think it helps review these in the context of the solver itself.