-
Notifications
You must be signed in to change notification settings - Fork 107
Optimizing XC instantiation for GPUs #1262
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: master
Are you sure you want to change the base?
Changes from all commits
92708ef
af6c331
c65c8ad
102079e
3e8d219
278fb6d
File filter
Filter by extension
Conversations
Jump to
Diff view
Diff view
There are no files selected for viewing
| Original file line number | Diff line number | Diff line change |
|---|---|---|
|
|
@@ -43,14 +43,18 @@ eval_psp_energy_correction(T, ::Element) = zero(T) | |
| eval_psp_energy_correction(psp::Element) = eval_psp_energy_correction(Float64, psp) | ||
|
|
||
| # Fall back to the Gaussian table for Elements without pseudopotentials | ||
| function valence_charge_density_fourier(el::Element, p::T)::T where {T <: Real} | ||
| function valence_charge_density_fourier(el::Element, p) | ||
| gaussian_valence_charge_density_fourier(el, p) | ||
| end | ||
|
|
||
| """Gaussian valence charge density using Abinit's coefficient table, in Fourier space.""" | ||
| function gaussian_valence_charge_density_fourier(el::Element, p::T)::T where {T <: Real} | ||
| charge_ionic(el) * exp(-(p * atom_decay_length(el))^2) | ||
| end | ||
| function gaussian_valence_charge_density_fourier(el::Element, ps::AbstractVector{T}) where {T <: Real} | ||
| arch = architecture(ps) | ||
| to_device(arch, map(p -> gaussian_valence_charge_density_fourier(el, p), to_cpu(ps))) | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I'm surprised this needs to be executed on the CPU. The implementation above looks so simple.
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. It should work on the GPU, I was on auto pilot. I'll check and change
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Well, actually it is not that straight forward, because would not work. It would be possible to port it given some refactoring, but I don't think it is worth the trouble, because it never shows up as a performance bottleneck. |
||
| end | ||
|
|
||
| function core_charge_density_fourier(::Element, ::T)::T where {T <: Real} | ||
| error("Abstract elements do not necesesarily provide core charge density.") | ||
|
|
@@ -60,12 +64,6 @@ function core_kinetic_energy_density_fourier(::Element, ::T)::T where {T <: Real | |
| error("Abstract elements do not necesesarily provide core kinetic energy density.") | ||
| end | ||
|
|
||
| # Generic vectorized version of local_potential_fourier, GPU-safe | ||
| function local_potential_fourier(el::Element, ps::AbstractVector{T}) where {T <: Real} | ||
| arch = architecture(ps) | ||
| to_device(arch, map(p -> local_potential_fourier(el, p), to_cpu(ps))) | ||
| end | ||
|
|
||
| Base.show(io::IO, el::Element) = print(io, "$(typeof(el))(:$(species(el)))") | ||
|
|
||
| # | ||
|
|
@@ -175,26 +173,17 @@ has_core_density(el::ElementPsp) = has_core_density(el.psp) | |
| has_core_kinetic_energy_density(el::ElementPsp) = has_core_kinetic_energy_density(el.psp) | ||
| eval_psp_energy_correction(T, el::ElementPsp) = eval_psp_energy_correction(T, el.psp) | ||
|
|
||
| function local_potential_fourier(el::ElementPsp, p::T) where {T <: Real} | ||
| eval_psp_local_fourier(el.psp, p) | ||
| end | ||
| # Vectorized version of the above | ||
| function local_potential_fourier(el::ElementPsp, ps::AbstractVector{T}) where {T <: Real} | ||
| eval_psp_local_fourier(el.psp, ps) | ||
| end | ||
| local_potential_real(el::ElementPsp, r::Real) = eval_psp_local_real(el.psp, r) | ||
| local_potential_fourier(el::ElementPsp, p) = eval_psp_local_fourier(el.psp, p) | ||
| local_potential_real(el::ElementPsp, r) = eval_psp_local_real(el.psp, r) | ||
|
|
||
| function valence_charge_density_fourier(el::ElementPsp, p::T) where {T <: Real} | ||
| function valence_charge_density_fourier(el::ElementPsp, p) | ||
| if has_valence_density(el.psp) | ||
| eval_psp_valence_density_fourier(el.psp, p) | ||
| else | ||
| gaussian_valence_charge_density_fourier(el, p) | ||
| end | ||
| end | ||
| function core_charge_density_fourier(el::ElementPsp, p::T) where {T <: Real} | ||
| eval_psp_core_density_fourier(el.psp, p) | ||
| end | ||
|
|
||
| core_charge_density_fourier(el::ElementPsp, p) = eval_psp_core_density_fourier(el.psp, p) | ||
|
|
||
| # | ||
| # ElementCohenBergstresser | ||
|
|
@@ -264,7 +253,6 @@ function local_potential_fourier(el::ElementCohenBergstresser, p::T) where {T <: | |
| end | ||
| # TODO Strictly speaking needs a eval_psp_energy_correction | ||
|
|
||
|
|
||
| # | ||
| # ElementGaussian | ||
| # | ||
|
|
@@ -297,6 +285,24 @@ function local_potential_fourier(el::ElementGaussian, p::Real) | |
| end | ||
| # TODO Strictly speaking needs a eval_psp_energy_correction | ||
|
|
||
| # Generic API expectations: all element functions taking a real space scalar |r| or a | ||
| # reciprocal space scalar |p| should have a vectorized version accepting vectors of |r| or |p|. | ||
| # The following loop vectorizes element functions by calling the single-value version | ||
| # elementwise. This is GPU safe and generic. Performance critical functions should have their | ||
| # own GPU-optimized implementation. Note: explicit loop over Element types in order to avoid | ||
| # ambiguities with the specific ElementPsp functions. | ||
| for fn in [:gaussian_valence_charge_density_fourier, :core_charge_density_fourier, | ||
|
Member
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I trust you checked we still really need this after the simplifications in the But we should make sure that once new
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. Yes, while it is more compact than the previous solution, it is absolutely necessary that the abstract That being said, when adding a new element type, tests will not pass if the important vectorized functions are not provided.
Collaborator
Author
There was a problem hiding this comment. Choose a reason for hiding this commentThe reason will be displayed to describe this comment to others. Learn more. I think this double loop is showing some limitations. One of the But Element2DCoulomb is defined nowhere within DFTK. It would be caught if the default vectorized functions were defined for generic I am slowly running out of ideas. |
||
| :local_potential_fourier, :local_potential_real] | ||
| for el_type in [ElementCoulomb, ElementCohenBergstresser, ElementGaussian] | ||
| @eval begin | ||
| function DFTK.$fn(el::$el_type, arg::AbstractVector{T}) where {T <: Real} | ||
| arch = architecture(arg) | ||
| to_device(arch, map(p -> $fn(el, p), to_cpu(arg))) | ||
| end | ||
| end | ||
| end | ||
| end | ||
|
|
||
| # | ||
| # Helper functions | ||
| # | ||
|
|
||
Uh oh!
There was an error while loading. Please reload this page.