Skip to content
Draft
Show file tree
Hide file tree
Changes from all commits
Commits
File filter

Filter by extension

Filter by extension

Conversations
Failed to load comments.
Loading
Jump to
Jump to file
Failed to load files.
Loading
Diff view
Diff view
1 change: 1 addition & 0 deletions src/linalg_iterative/CMakeLists.txt
Original file line number Diff line number Diff line change
@@ -1,6 +1,7 @@
set(linalg_iterative_fppFiles
stdlib_linalg_iterative_solvers_bicgstab.fypp
stdlib_linalg_iterative_solvers_cg.fypp
stdlib_linalg_iterative_solvers_gmres.fypp
stdlib_linalg_iterative_solvers.fypp
stdlib_linalg_iterative_solvers_pcg.fypp
)
Expand Down
54 changes: 53 additions & 1 deletion src/linalg_iterative/stdlib_linalg_iterative_solvers.fypp
Original file line number Diff line number Diff line change
Expand Up @@ -16,8 +16,9 @@ module stdlib_linalg_iterative_solvers
enumerator :: stdlib_size_wksp_cg = 3
enumerator :: stdlib_size_wksp_pcg = 4
enumerator :: stdlib_size_wksp_bicgstab = 8
enumerator :: stdlib_size_wksp_gmres = 3
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

is it on purpose the same as stdlib_size_wksp_cg?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It is not on purpose the as stdlib_size_wksp_cg. The workspace for the GMRES algorithm is bit more tricky than for the other gradient descent methods as not only are there reference buffer vectors with the size of the problem but also there are arrays depending on the kdim (number of internal iterations per restart cycle).

The current workspace design :
r(:) → residual (1 vector)
w(:) → Arnoldi work vector (1 vector)
v(:,1:kdim+1) → Krylov basis (kdim+1 vectors)
z(:,1:kdim) → preconditioned basis (kdim vectors)

Actually the last vector z could be reduced to z(:) as the preconditionner is applied per each j inner iteration.

I'll review this and try to propose a more lean version in terms of internal storage size. This won't change the fact that the workspace depends not only on n (problem size) but also on the selected number of internal iterations kdim.

Maybe @AlexanderGSC you have some suggestions here?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Hi @jvdp1, @jalvesz!

@jalvesz is right; the size of the workspace depends on the size of the restart (kdim), and there isn’t much that can be done about it. Storing only the last vector from the preconditioner saves a lot of memory – well spotted!

I can think of a few ideas for keeping the interface unchanged, each with its own pros and cons:

  • move the allocation of v inside the kernel:
    pros: keeps the current interface unchanged.
    cons: I don’t know if it’s allowed to reserve memory inside the kernel. At the moment there are variables such as the Hessemberg matrix, the rotation vectors, etc. that are allocated inside the kernel, but they are small, depending only on kdim, which by default is 30, meaning very little memory is used.

  • Explicit workspace handling:
    Pros: no memory allocation within the kernel.
    Cons: we would need to document that the workspace size is 3+(kdim+1). And we would need to add a check in the code to ensure this is the case. Furthermore, in the interface, the value would not reflect the workspace size.

  • Implicit kdim handling: the workspace size is checked, and the kdim size is deduced implicitly. If a value of 33 is set in the interface, gmres would run with a default kdim=30.
    Pros: allows running ‘by default’ or with ‘fine-tuning’ without allocating memory in the kernel.
    Cons: the way the workspace is managed becomes a little odd and less straightforward. Restart is a parameter that every version of gmres has.

In my opinion, if it isn’t a problem, I would move the v memory allocation into the kernel. If that is a problem, then we could explore other options.

end enum
public :: stdlib_size_wksp_cg, stdlib_size_wksp_pcg, stdlib_size_wksp_bicgstab
public :: stdlib_size_wksp_cg, stdlib_size_wksp_pcg, stdlib_size_wksp_bicgstab, stdlib_size_wksp_gmres

enum, bind(c)
enumerator :: pc_none = 0
Expand Down Expand Up @@ -161,6 +162,27 @@ module stdlib_linalg_iterative_solvers
end interface
public :: stdlib_solve_bicgstab_kernel

!! version: experimental
!!
!! stdlib_solve_gmres_kernel interface for the restarted generalized minimal residual method.
!! [Specifications](../page/specs/stdlib_linalg_iterative_solvers.html#stdlib_solve_gmres_kernel)
interface stdlib_solve_gmres_kernel
#:for k, t, s in R_KINDS_TYPES
module subroutine stdlib_solve_gmres_kernel_${s}$(A,M,b,x,rtol,atol,maxiter,kdim,workspace)
class(stdlib_linop_${s}$_type), intent(in) :: A !! linear operator
class(stdlib_linop_${s}$_type), intent(in) :: M !! preconditioner linear operator
${t}$, intent(in) :: b(:) !! right-hand side vector
${t}$, intent(inout) :: x(:) !! solution vector and initial guess
${t}$, intent(in) :: rtol !! relative tolerance for convergence
${t}$, intent(in) :: atol !! absolute tolerance for convergence
integer, intent(in) :: maxiter !! maximum number of iterations
integer, intent(in) :: kdim !! Krylov subspace size before restart
type(stdlib_solver_workspace_${s}$_type), intent(inout) :: workspace !! workspace for the solver
end subroutine
#:endfor
end interface
public :: stdlib_solve_gmres_kernel

!! version: experimental
!!
!! [Specifications](../page/specs/stdlib_linalg_iterative_solvers.html#stdlib_solve_pcg)
Expand Down Expand Up @@ -219,6 +241,36 @@ module stdlib_linalg_iterative_solvers
end interface
public :: stdlib_solve_bicgstab

!! version: experimental
!!
!! [Specifications](../page/specs/stdlib_linalg_iterative_solvers.html#stdlib_solve_gmres)
interface stdlib_solve_gmres
#:for matrix in MATRIX_TYPES
#:for k, t, s in R_KINDS_TYPES
module subroutine stdlib_solve_gmres_${matrix}$_${s}$(A,b,x,di,rtol,atol,maxiter,restart,kdim,precond,M,workspace)
!! linear operator matrix
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

should it be !> as the docstring precedes the declaration?

Suggested change
!! linear operator matrix
!> linear operator matrix

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Oh yes it could. In that case all the docstring characters in this file should be changed.

#:if matrix == "dense"
${t}$, intent(in) :: A(:,:)
#:else
type(${matrix}$_${s}$_type), intent(in) :: A
#:endif
${t}$, intent(in) :: b(:) !! right-hand side vector
${t}$, intent(inout) :: x(:) !! solution vector and initial guess
${t}$, intent(in), optional :: rtol !! relative tolerance for convergence
${t}$, intent(in), optional :: atol !! absolute tolerance for convergence
logical(int8), intent(in), optional, target :: di(:) !! dirichlet conditions mask
integer, intent(in), optional :: maxiter !! maximum number of iterations
logical, intent(in), optional :: restart !! restart flag
integer, intent(in), optional :: kdim !! Krylov subspace size before restart
integer, intent(in), optional :: precond !! preconditioner method enumerator
class(stdlib_linop_${s}$_type), optional, intent(in), target :: M !! preconditioner linear operator
type(stdlib_solver_workspace_${s}$_type), optional, intent(inout), target :: workspace !! workspace for the solver
end subroutine
#:endfor
#:endfor
end interface
public :: stdlib_solve_gmres

contains

!------------------------------------------------------------------
Expand Down
292 changes: 292 additions & 0 deletions src/linalg_iterative/stdlib_linalg_iterative_solvers_gmres.fypp
Original file line number Diff line number Diff line change
@@ -0,0 +1,292 @@
#:include "common.fypp"
#:set R_KINDS_TYPES = list(zip(REAL_KINDS, REAL_TYPES, REAL_SUFFIX))
#:set C_KINDS_TYPES = list(zip(CMPLX_KINDS, CMPLX_TYPES, CMPLX_SUFFIX))
#:set MATRIX_TYPES = ["dense", "CSR"]
#:set RANKS = range(1, 2+1)

submodule(stdlib_linalg_iterative_solvers) stdlib_linalg_iterative_gmres
use stdlib_kinds
use stdlib_sparse
use stdlib_constants
use stdlib_optval, only: optval
implicit none

contains

#:for k, t, s in R_KINDS_TYPES
module subroutine stdlib_solve_gmres_kernel_${s}$(A,M,b,x,rtol,atol,maxiter,kdim,workspace)
class(stdlib_linop_${s}$_type), intent(in) :: A
class(stdlib_linop_${s}$_type), intent(in) :: M
${t}$, intent(in) :: b(:), rtol, atol
${t}$, intent(inout) :: x(:)
integer, intent(in) :: maxiter, kdim
type(stdlib_solver_workspace_${s}$_type), intent(inout) :: workspace
integer :: i, iter, inner_iter, j, iorth
${t}$ :: beta, denom, hnext, htmp, norm_sq, norm_sq0, temp, tolsq
${t}$, allocatable :: cs(:), g(:), h(:,:), sn(:), x_base(:), y(:)

allocate(h(kdim+1, kdim), cs(kdim), sn(kdim), g(kdim+1), y(kdim), x_base(size(x)))

associate( r => workspace%tmp(:,1), &
w => workspace%tmp(:,2), &
v => workspace%tmp(:,3:kdim+3), &
z => workspace%tmp(:,kdim+4:2*kdim+3) )

! Initialize convergence targets from the right-hand side norm.
norm_sq0 = A%inner_product(b, b)
tolsq = max(rtol*rtol*norm_sq0, atol*atol)

! Form the initial residual and report the starting iterate.
r = b
call A%matvec(x, r, alpha=-one_${s}$, beta=one_${s}$, op='N')
norm_sq = A%inner_product(r, r)
if (associated(workspace%callback)) call workspace%callback(x, norm_sq, 0)

if (norm_sq <= tolsq .or. maxiter <= 0) then
deallocate(h, cs, sn, g, y, x_base)
return
end if

iter = 0
do while (iter < maxiter .and. norm_sq >= tolsq)
! Start a new GMRES cycle from the current residual.
beta = sqrt(max(norm_sq, zero_${s}$))
if (beta <= epsilon(one_${s}$)) exit

inner_iter = min(kdim, maxiter - iter)
x_base = x
h = zero_${s}$
cs = zero_${s}$
sn = zero_${s}$
g = zero_${s}$
y = zero_${s}$

! Initialize the Krylov basis and least-squares right-hand side.
v(:,1) = r / beta
g(1) = beta

do j = 1, inner_iter
! Run Arnoldi with the preconditioned basis vector.
call M%matvec(v(:,j), z(:,j), alpha=one_${s}$, beta=zero_${s}$, op='N')
call A%matvec(z(:,j), w, alpha=one_${s}$, beta=zero_${s}$, op='N')

! Modified Gram Schmidt (MGSR)
do iorth = 1, 2 ! reorthogonalization
do i = 1, j
htmp = A%inner_product(v(:,i), w)
h(i,j) = h(i,j) + htmp
w = w - htmp*v(:,i)
end do
end do

hnext = sqrt(max(A%inner_product(w, w), zero_${s}$))
h(j+1,j) = hnext
if (hnext > epsilon(one_${s}$)) then
v(:,j+1) = w / hnext
else
v(:,j+1) = zero_${s}$
end if

! Apply the previously accumulated Givens rotations.
do i = 1, j - 1
temp = cs(i) * h(i,j) + sn(i) * h(i+1,j)
h(i+1,j) = -sn(i) * h(i,j) + cs(i) * h(i+1,j)
h(i,j) = temp
end do

! Build and apply the next Givens rotation.
denom = sqrt(h(j,j) * h(j,j) + h(j+1,j) * h(j+1,j))
if (denom > epsilon(one_${s}$)) then
cs(j) = h(j,j) / denom
sn(j) = h(j+1,j) / denom
else
cs(j) = one_${s}$
sn(j) = zero_${s}$
end if

temp = cs(j) * h(j,j) + sn(j) * h(j+1,j)
h(j+1,j) = -sn(j) * h(j,j) + cs(j) * h(j+1,j)
h(j,j) = temp

temp = cs(j) * g(j) + sn(j) * g(j+1)
g(j+1) = -sn(j) * g(j) + cs(j) * g(j+1)
g(j) = temp

! Solve the reduced system and update the current iterate.
call upper_triangular_solve(h, g, y, j)
x = x_base
do i = 1, j
x = x + y(i) * z(:,i)
end do

! Track the residual norm estimate for convergence.
norm_sq = g(j+1) * g(j+1)
iter = iter + 1
if (associated(workspace%callback)) call workspace%callback(x, norm_sq, iter)

if (norm_sq < tolsq .or. hnext <= epsilon(one_${s}$) .or. iter >= maxiter) exit
end do

if (norm_sq < tolsq .or. iter >= maxiter) exit

! Refresh the residual before the next restarted cycle.
r = b
call A%matvec(x, r, alpha=-one_${s}$, beta=one_${s}$, op='N')
norm_sq = A%inner_product(r, r)
end do
end associate

deallocate(h, cs, sn, g, y, x_base)

contains

subroutine upper_triangular_solve(h, g, y, n)
${t}$, intent(in) :: h(:,:), g(:)
${t}$, intent(inout) :: y(:)
integer, intent(in) :: n
integer :: row

y(1:n) = g(1:n)
do row = n, 1, -1
if (row < n) y(row) = y(row) - dot_product(h(row,row+1:n) , y(row+1:n))
if (abs(h(row,row)) > epsilon(one_${s}$)) then
y(row) = y(row) / h(row,row)
else
y(row) = zero_${s}$
end if
end do
end subroutine
end subroutine
#:endfor

#:for matrix in MATRIX_TYPES
#:for k, t, s in R_KINDS_TYPES
module subroutine stdlib_solve_gmres_${matrix}$_${s}$(A,b,x,di,rtol,atol,maxiter,restart,kdim,precond,M,workspace)
#:if matrix == "dense"
use stdlib_linalg, only: diag
${t}$, intent(in) :: A(:,:)
#:else
type(${matrix}$_${s}$_type), intent(in) :: A
#:endif
${t}$, intent(in) :: b(:)
${t}$, intent(inout) :: x(:)
${t}$, intent(in), optional :: rtol, atol
logical(int8), intent(in), optional, target :: di(:)
integer, intent(in), optional :: maxiter, kdim
logical, intent(in), optional :: restart
integer, intent(in), optional :: precond
class(stdlib_linop_${s}$_type), optional, intent(in), target :: M
type(stdlib_solver_workspace_${s}$_type), optional, intent(inout), target :: workspace
type(stdlib_linop_${s}$_type) :: op
type(stdlib_linop_${s}$_type), pointer :: M_ => null()
type(stdlib_solver_workspace_${s}$_type), pointer :: workspace_
integer :: kdim_, maxiter_, n, ncols, precond_
${t}$ :: rtol_, atol_
logical :: restart_
logical(int8), pointer :: di_(:)
${t}$, allocatable :: diagonal(:)

n = size(b)
maxiter_ = optval(x=maxiter, default=n)
kdim_ = max(1, min(optval(x=kdim, default=min(30, n)), n))
restart_ = optval(x=restart, default=.true.)
rtol_ = optval(x=rtol, default=1.e-5_${s}$)
atol_ = optval(x=atol, default=epsilon(one_${s}$))
precond_ = optval(x=precond, default=pc_none)
ncols = 2 * kdim_ + stdlib_size_wksp_gmres

if (present(M)) then
M_ => M
else
allocate(M_)
allocate(diagonal(n), source=zero_${s}$)

select case(precond_)
case(pc_jacobi)
#:if matrix == "dense"
diagonal = diag(A)
#:else
call diag(A, diagonal)
#:endif
M_%matvec => precond_jacobi
case default
M_%matvec => precond_none
end select
where(abs(diagonal) > epsilon(zero_${s}$)) diagonal = one_${s}$ / diagonal
end if

op%matvec => matvec

if (present(di)) then
di_ => di
else
allocate(di_(n), source=.false._int8)
end if

if (present(workspace)) then
workspace_ => workspace
else
allocate(workspace_)
end if
if (.not.allocated(workspace_%tmp)) then
allocate(workspace_%tmp(n, ncols), source=zero_${s}$)
else if (size(workspace_%tmp,1) /= n .or. size(workspace_%tmp,2) < ncols) then
deallocate(workspace_%tmp)
allocate(workspace_%tmp(n, ncols), source=zero_${s}$)
end if

if (restart_) x = zero_${s}$
x = merge(b, x, di_)
call stdlib_solve_gmres_kernel(op, M_, b, x, rtol_, atol_, maxiter_, kdim_, workspace_)

if (.not.present(di)) deallocate(di_)
di_ => null()

if (.not.present(workspace)) then
deallocate(workspace_%tmp)
deallocate(workspace_)
end if
M_ => null()
workspace_ => null()

contains

subroutine matvec(x,y,alpha,beta,op)
#:if matrix == "dense"
use stdlib_linalg_blas, only: gemv
#:endif
${t}$, intent(in) :: x(:)
${t}$, intent(inout) :: y(:)
${t}$, intent(in) :: alpha
${t}$, intent(in) :: beta
character(1), intent(in) :: op
#:if matrix == "dense"
call gemv(op, m=size(A,1), n=size(A,2), alpha=alpha, a=A, lda=size(A,1), x=x, incx=1, beta=beta, y=y, incy=1)
#:else
call spmv(A, x, y, alpha, beta, op)
#:endif
y = merge(zero_${s}$, y, di_)
end subroutine

subroutine precond_none(x,y,alpha,beta,op)
${t}$, intent(in) :: x(:)
${t}$, intent(inout) :: y(:)
${t}$, intent(in) :: alpha
${t}$, intent(in) :: beta
character(1), intent(in) :: op
y = merge(zero_${s}$, x, di_)
end subroutine

subroutine precond_jacobi(x,y,alpha,beta,op)
${t}$, intent(in) :: x(:)
${t}$, intent(inout) :: y(:)
${t}$, intent(in) :: alpha
${t}$, intent(in) :: beta
character(1), intent(in) :: op
y = merge(zero_${s}$, diagonal * x, di_)
end subroutine
end subroutine
#:endfor
#:endfor

end submodule stdlib_linalg_iterative_gmres
Loading
Loading