Draft
115 commits
cd90f5b
Fix `ReplaceAxisSymbol` and keep it to Tasklets -> `ReplaceAxisSymbo…
FlorianDeconinck May 14, 2026
2c8f74e
Add `TreeOptimizationStatistics` to capture the results of the opt at…
FlorianDeconinck May 14, 2026
8b49e3b
Add a master `CartesianMerge` bringing everything axis merge, refacto…
FlorianDeconinck May 14, 2026
c8d05af
Move helpers into `common` and break them by type
FlorianDeconinck May 15, 2026
20665a8
Fix imports
FlorianDeconinck May 15, 2026
c8a225e
`InlineVertical2DWrite` + utest
FlorianDeconinck May 15, 2026
73f5609
Fix InlineVertical2DWrite
FlorianDeconinck May 15, 2026
fc1ecb1
cleanup
romanc May 18, 2026
d7e40aa
fix symbol replacement
romanc May 18, 2026
55ad8fa
update gt4py (log10 and friends)
romanc May 18, 2026
1c9bb5f
more cleanup (all minor nothing fancy)
romanc May 18, 2026
376a187
Merge remote-tracking branch 'origin/develop' into opt_cycle_I/loop_m…
romanc May 20, 2026
36204b0
Add support for InlineOffgridConditionals
romanc May 20, 2026
689ab89
fixup: temp fix for test of InlineOffgridConditionals
romanc May 20, 2026
c263116
cleanup: remove old "push if down" codepath
romanc May 20, 2026
7d6ecc1
Normalize cartesian index during data dependency check
FlorianDeconinck May 20, 2026
de03d34
Update tests
FlorianDeconinck May 20, 2026
ff57227
ReplaceAxisSymbolInTasklet -> ReplaceAxisSymbol + debug of its usage
FlorianDeconinck May 20, 2026
94b2e99
fix unit test by hardening detection of "our" loops
romanc May 21, 2026
3a50577
unrelated cleanup: fix/assert type issues
romanc May 21, 2026
d6824f3
Changes to `InlineVertical2DWrite`
romanc May 21, 2026
454fb44
dace update: connect source/sink nodes with empty memlets
romanc May 22, 2026
9ba2664
dace update: support for self-assigning copy nodes
romanc May 22, 2026
f8798a0
GPU tree orchestration pipeline
FlorianDeconinck May 22, 2026
89294d2
Add scalarized array to tree statistics
FlorianDeconinck May 22, 2026
456b5fb
Replace `AxisSymbol` in "masklet as well + rename file
FlorianDeconinck May 22, 2026
0aaa78d
Deactivate `InlineVertical2DWrite` for now
FlorianDeconinck May 22, 2026
634a097
Lint
FlorianDeconinck May 22, 2026
02102af
Fix tests after collapsing maps / fix non-cartesian loop inline
romanc May 26, 2026
43674af
fixes to run GFLD_1M with orch:dace:cpu:KJI backend
romanc May 28, 2026
1ea1314
ci: gt4py update (restore temp dace working branch)
romanc May 29, 2026
fa60dcc
remove extra `f` in result report header
romanc May 29, 2026
78d8487
Merge remote-tracking branch 'origin/develop' into opt_cycle_I/loop_m…
romanc May 29, 2026
160923f
unrelated dace/gt4py update: just test fixes and a typo
romanc Jun 1, 2026
c2bd78d
Expose `gpu:IJK` backends to NDSL
FlorianDeconinck Jun 2, 2026
eaaa0cc
Disable DaceConfig.from_dict() as it is incomplete
romanc Jun 3, 2026
749f8a3
readability of cache location code
romanc Jun 3, 2026
84fc75c
Merge branch 'opt_cycle_I/loop_merge' of github.com:FlorianDeconinck/…
romanc Jun 3, 2026
0d860bf
translate tests: fix crash in reporting when comparing scalars
romanc Jun 4, 2026
5ee3bb9
Weaken the cube-sphere communicator hard ranks limit. We need "at lea…
FlorianDeconinck Jun 5, 2026
da82ede
Adjust `cflags` format read for orchestrated compile
FlorianDeconinck Jun 5, 2026
64dd47c
Lint
FlorianDeconinck Jun 5, 2026
26fb0ef
Introduce hardware configuration good defaults
FlorianDeconinck Jun 6, 2026
7bdd3fa
Fix double load for compiling rank
FlorianDeconinck Jun 6, 2026
0fcd9bd
Hardware default: gives back default when no `cp` instead of raising
FlorianDeconinck Jun 6, 2026
d843a2c
Orch: always collapse maps to maximize the kernel parallel basis
FlorianDeconinck Jun 7, 2026
15dbc89
Merge remote-tracking branch 'origin/develop' into opt_cycle_I/loop_m…
romanc Jun 9, 2026
fd9e588
gt4py update to latest romanc/fix-log10-precision
romanc Jun 9, 2026
c32fabf
review of new gpu hardware detection
romanc Jun 9, 2026
7e45ff7
update dace/gt4py to bring a typehint fix from dace
romanc Jun 9, 2026
0e7197f
Merge remote-tracking branch 'origin/develop' into opt_cycle_I/loop_m…
romanc Jun 10, 2026
099cd91
gt4py update: unit-aligned dace gpu backends
romanc Jun 12, 2026
ea873ec
unrelated: use dace import shortcut conventions in ndsl code
romanc Jun 12, 2026
a94d5a0
Transform to kernelize maps on GPU
romanc Jun 12, 2026
d9bbe1d
Merge remote-tracking branch 'origin/develop' into opt_cycle_I/loop_m…
romanc Jun 12, 2026
42e16de
fix some red squiggly lines in vscode :)
romanc Jun 12, 2026
2c64992
Don't raise kernelize_maps in KJI layout
romanc Jun 13, 2026
3c8e53b
Merge remote-tracking branch 'origin/develop' into opt_cycle_I/loop_m…
romanc Jun 15, 2026
cfc4e87
Merge remote-tracking branch 'origin/develop' into opt_cycle_I/loop_m…
romanc Jun 15, 2026
546ce16
Move sdfg save on verbose into a DaceProgress
FlorianDeconinck Jun 15, 2026
145ee10
Fix init of StreePipeline
FlorianDeconinck Jun 15, 2026
c997988
Lint
FlorianDeconinck Jun 15, 2026
a390e53
allow map collapse with different schedules, unique loop region names
romanc Jun 17, 2026
43a2903
Detect write-after-write where offset/index differs
FlorianDeconinck Jun 17, 2026
5124545
Lint
FlorianDeconinck Jun 17, 2026
a97c3bc
Merge branch 'fix/write-after-write-detection' into opt_cycle_I/loop_…
FlorianDeconinck Jun 18, 2026
bd14cc6
Remove `lineinfo` in `DaCe`
FlorianDeconinck Jun 18, 2026
c46c12e
Move `gt4py` to `tmp_June26_01` to bring in the `better_parallel_kern…
FlorianDeconinck Jun 18, 2026
741994f
dace: fix for read-after-write in input_memlets
FlorianDeconinck Jun 18, 2026
63c5773
feat: enable/disable stree via dace_config
romanc Jun 19, 2026
839af5d
feat: turn on/off overcompute merge `NDSL_STREE_OVERCOMPUTE_MERGE`
romanc Jun 19, 2026
af8c2d4
Re-work GPU xforms to exclude callback from going to host
FlorianDeconinck Jun 19, 2026
de89c11
feat: optimization config for orchestrated code
romanc Jun 19, 2026
20018ff
`OptimizationConfig` tweaks:
FlorianDeconinck Jun 19, 2026
ed298be
GPU opt: apply AddThreadBlock so we have proper thread-blocking
FlorianDeconinck Jun 19, 2026
df1aac3
Fix `OptimizationConfig` tests
FlorianDeconinck Jun 19, 2026
5bdec8f
Lint
FlorianDeconinck Jun 19, 2026
f2430f4
Default `common_gpu_xforms` to False as it crashes more often than no…
FlorianDeconinck Jun 19, 2026
1c8d377
Add option `pad_non_interface_dimensions` to GridSizer to undo the be…
FlorianDeconinck Jun 20, 2026
8656704
[Opt config] Add `kernalize` to `stree` and `enabled` to `stree.merger`
FlorianDeconinck Jun 21, 2026
d26bdf4
Always print report of schedule tree opt
FlorianDeconinck Jun 21, 2026
24c33dc
Add 3D kernel count to stree stats
FlorianDeconinck Jun 21, 2026
450da57
feat/fix: push maps to GPU if on GPU
romanc Jun 22, 2026
dce60af
refactor: rename "kernalize" -> "kernelize" in optimization config
romanc Jun 22, 2026
f14836a
gt4py update: keep dace version in sync
romanc Jun 22, 2026
944ee09
refactor: leverage OptimizationConfig as pipeline config
romanc Jun 22, 2026
2861f05
Merge remote-tracking branch 'origin/develop' into opt_cycle_I/loop_m…
romanc Jun 22, 2026
3842db5
build: ignore non-version tags in setuptools-scm
romanc Jun 23, 2026
71efc5a
gt4py update: avoid `x+0` / `0+x` in horizontal regions
romanc Jun 23, 2026
9b4d0be
update dace/gt4py: gpu transformation fix
romanc Jun 23, 2026
908b02e
Push `matplotlib` import into the plotting function
FlorianDeconinck Jun 23, 2026
b8432e4
Moved log to grab `_parse` call and move labeler so it is applied onl…
FlorianDeconinck Jun 24, 2026
633f73b
fixup: consistently pass opt config to `_parse_sdfg()`
romanc Jun 25, 2026
7d48048
refactor: use "dace import slang" in our memlet helper
romanc Jun 25, 2026
de9763d
fix: account for map start in axis normalization
romanc Jun 25, 2026
b24e1fc
fixup: use normalized indices in debug message
romanc Jun 25, 2026
0d98445
Make "dace" import non-ambiguous
romanc Jun 25, 2026
6683347
feat: support for custom merging order in OptimizationConfig
romanc Jun 25, 2026
74ff202
Revert accounting for map start in axis normalization
romanc Jun 26, 2026
f37b547
feature: `LabledSection`s for use with local optimizations
romanc Jun 26, 2026
55db71a
feat: local optimization
romanc Jun 26, 2026
e31d1d7
fix: new algo for creating labeled sections
Jun 28, 2026
699949b
fix new algo
romanc Jun 28, 2026
4f5804d
Clean up logging in orchestration
FlorianDeconinck Jun 29, 2026
b2e282a
Add API to get an equivalent CPU and STENCIL backend from an existing…
FlorianDeconinck Jun 29, 2026
1d242bc
Lint
FlorianDeconinck Jun 29, 2026
c310377
Merge remote-tracking branch 'origin/develop' into opt_cycle_I/loop_m…
romanc Jul 1, 2026
10ab2a4
fixup: clean the docs for stree opt
romanc Jul 1, 2026
2975f3f
fix: support for plain numbers in index normalization
romanc Jul 1, 2026
d5b6b0d
fix: match whole words when replacing axis symbols
romanc Jul 1, 2026
475b050
Merge remote-tracking branch 'origin/develop' into opt_cycle_I/loop_m…
romanc Jul 1, 2026
6d06987
first version of local optimization pipeline
romanc Jul 1, 2026
902c647
fix test case of no-overcompute merge
romanc Jul 1, 2026
987493e
Merge branch 'opt_cycle_I/loop_merge_with_local_opt' into opt_cycle_I…
romanc Jul 1, 2026
2865641
Merge remote-tracking branch 'origin/develop' into opt_cycle_I/loop_m…
romanc Jul 1, 2026
2 changes: 1 addition & 1 deletion docs/docstrings/dsl/dace/stree/optimizations/axis_merge.md
@@ -1,6 +1,6 @@
# axis_merge

::: dsl.dace.stree.optimizations
::: dsl.dace.stree.optimizations.axis_merge

<style>
/* re-enable the left side navigation bar for this page */
2 changes: 1 addition & 1 deletion docs/docstrings/dsl/dace/stree/optimizations/clean_tree.md
@@ -1,6 +1,6 @@
# clean_tree

::: dsl.dace.stree.optimizations
::: dsl.dace.stree.optimizations.clean_tree

<style>
/* re-enable the left side navigation bar for this page */
@@ -1,6 +1,6 @@
# memlet_helpers
# loops

::: dsl.dace.stree.optimizations.memlet_helpers
::: dsl.dace.stree.optimizations.common.loops

<style>
/* re-enable the left side navigation bar for this page */
@@ -1,6 +1,6 @@
# tree_common_op
# memlet

::: dsl.dace.stree.optimizations.tree_common_op
::: dsl.dace.stree.optimizations.common.memlet

<style>
/* re-enable the left side navigation bar for this page */
12 changes: 12 additions & 0 deletions docs/docstrings/dsl/dace/stree/optimizations/common/topology.md
@@ -0,0 +1,12 @@
# topology

::: dsl.dace.stree.optimizations.common.topology

<style>
/* re-enable the left side navigation bar for this page */
@media screen and (min-width: 76.1875em) {
.md-sidebar--primary {
display: block !important;
}
}
</style>
12 changes: 12 additions & 0 deletions docs/docstrings/dsl/dace/stree/optimizations/kernelize_maps.md
Original file line number Diff line number Diff line change
@@ -0,0 +1,12 @@
# kernelize_maps

::: dsl.dace.stree.optimizations.kernelize_maps

<style>
/* re-enable the left side navigation bar for this page */
@media screen and (min-width: 76.1875em) {
.md-sidebar--primary {
display: block !important;
}
}
</style>
@@ -0,0 +1,12 @@
# local_optimizations

::: dsl.dace.stree.optimizations.local_optimizations

<style>
/* re-enable the left side navigation bar for this page */
@media screen and (min-width: 76.1875em) {
.md-sidebar--primary {
display: block !important;
}
}
</style>
@@ -0,0 +1,12 @@
# offgrid_conditionals

::: dsl.dace.stree.optimizations.offgrid_conditionals

<style>
/* re-enable the left side navigation bar for this page */
@media screen and (min-width: 76.1875em) {
.md-sidebar--primary {
display: block !important;
}
}
</style>
@@ -1,6 +1,6 @@
# refine_transients

::: dsl.dace.stree.optimizations
::: dsl.dace.stree.optimizations.refine_transients

<style>
/* re-enable the left side navigation bar for this page */
12 changes: 12 additions & 0 deletions docs/docstrings/dsl/dace/stree/optimizations/remove_loops.md
@@ -0,0 +1,12 @@
# remove_loops

::: dsl.dace.stree.optimizations.remove_loops

<style>
/* re-enable the left side navigation bar for this page */
@media screen and (min-width: 76.1875em) {
.md-sidebar--primary {
display: block !important;
}
}
</style>
@@ -0,0 +1,12 @@
# replace_axis_symbol

::: dsl.dace.stree.optimizations.replace_axis_symbol

<style>
/* re-enable the left side navigation bar for this page */
@media screen and (min-width: 76.1875em) {
.md-sidebar--primary {
display: block !important;
}
}
</style>
12 changes: 12 additions & 0 deletions docs/docstrings/dsl/dace/stree/optimizations/statistics.md
@@ -0,0 +1,12 @@
# statistics

::: dsl.dace.stree.optimizations.statistics

<style>
/* re-enable the left side navigation bar for this page */
@media screen and (min-width: 76.1875em) {
.md-sidebar--primary {
display: block !important;
}
}
</style>
2 changes: 1 addition & 1 deletion external/dace
Submodule dace updated 88 files
+1 −1 .github/workflows/general-ci.yml
+1 −1 .github/workflows/ml-ci.yml
+4 −1 dace/codegen/compiled_sdfg.py
+36 −32 dace/codegen/control_flow.py
+1 −1 dace/codegen/instrumentation/papi.py
+5 −2 dace/codegen/targets/cuda.py
+11 −10 dace/codegen/targets/framecode.py
+1 −1 dace/config.py
+4 −1 dace/data/ctypes_interop.py
+2 −2 dace/dtypes.py
+4 −1 dace/frontend/python/newast.py
+3 −2 dace/frontend/python/parser.py
+1 −0 dace/frontend/python/replacements/array_creation.py
+1 −0 dace/frontend/python/replacements/array_manipulation.py
+1 −0 dace/frontend/python/replacements/array_metadata.py
+1 −0 dace/frontend/python/replacements/linalg.py
+3 −2 dace/frontend/python/replacements/misc.py
+1 −0 dace/frontend/python/replacements/reduction.py
+1 −0 dace/frontend/python/replacements/ufunc.py
+1 −1 dace/memlet.py
+114 −4 dace/runtime/include/dace/math.h
+115 −48 dace/sdfg/analysis/schedule_tree/tree_to_sdfg.py
+29 −3 dace/sdfg/analysis/schedule_tree/treenodes.py
+1 −1 dace/sdfg/analysis/vector_inference.py
+1 −0 dace/sdfg/infer_types.py
+10 −3 dace/sdfg/nodes.py
+296 −97 dace/sdfg/propagation.py
+94 −14 dace/sdfg/sdfg.py
+203 −18 dace/sdfg/state.py
+3 −0 dace/sdfg/utils.py
+35 −35 dace/subsets.py
+1 −0 dace/transformation/dataflow/add_threadblock_map.py
+69 −22 dace/transformation/dataflow/map_fission.py
+38 −23 dace/transformation/dataflow/redundant_array.py
+1 −1 dace/transformation/dataflow/sve/infer_types.py
+155 −47 dace/transformation/helpers.py
+9 −7 dace/transformation/interstate/gpu_transform_sdfg.py
+12 −1 dace/transformation/interstate/state_fusion_with_happens_before.py
+7 −5 dace/transformation/passes/analysis/analysis.py
+511 −0 dace/transformation/passes/loop_to_reduce.py
+8 −15 dace/transformation/passes/reference_reduction.py
+32 −1 doc/conf.py
+218 −0 doc/extensions/backend.rst
+9 −8 doc/extensions/extensions.rst
+129 −0 doc/extensions/frontend.rst
+113 −0 doc/extensions/instrumentation.rst
+393 −0 doc/extensions/libraries.rst
+111 −0 doc/extensions/sdfgconvertible.rst
+133 −0 doc/extensions/symbolic.rst
+1 −1 doc/frontend/daceprograms.rst
+4 −6 doc/frontend/parsing.rst
+129 −0 doc/frontend/preprocessing.rst
+56 −14 doc/frontend/pysupport.rst
+1 −3 doc/frontend/python.rst
+51 −0 doc/general/faq.rst
+1 −1 doc/general/glossary.rst
+1 −1 doc/general/structure.rst
+8 −2 doc/index.rst
+84 −0 doc/optimization/guidelines.rst
+112 −0 doc/optimization/interactive.rst
+3 −5 doc/optimization/optimization.rst
+1 −4 doc/sdfg/ir.rst
+177 −0 doc/sdfg/schedule_tree.rst
+1 −1 doc/setup/integration.rst
+1 −1 doc/setup/quickstart.rst
+2 −2 doc/source/dace.cli.rst
+1 −1 doc/source/dace.codegen.instrumentation.rst
+0 −9 doc/source/dace.rst
+1 −1 tests/codegen/allocation_lifetime_test.py
+2 −1 tests/codegen/control_flow_generation_test.py
+27 −0 tests/codegen/gpu_min_warps_per_eu_test.py
+2 −1 tests/graph_test.py
+134 −0 tests/memlet_propagation_squeezing_test.py
+1 −1 tests/numpy/common.py
+406 −0 tests/passes/loop_to_reduce_test.py
+26 −21 tests/passes/writeset_underapproximation_test.py
+2 −2 tests/schedule_tree/naming_test.py
+3 −4 tests/schedule_tree/schedule_test.py
+142 −1 tests/schedule_tree/to_sdfg_test.py
+32 −1 tests/schedule_tree/treenodes_test.py
+17 −10 tests/sdfg/data/container_array_test.py
+50 −37 tests/sdfg/data/structure_test.py
+29 −4 tests/sdfg/reference_test.py
+1 −1 tests/state_transition_test.py
+105 −0 tests/transformations/helpers_test.py
+6 −6 tests/transformations/loop_to_map_test.py
+147 −1 tests/transformations/map_fission_test.py
+1 −1 tutorials/getting_started.ipynb
2 changes: 1 addition & 1 deletion external/gt4py
Submodule gt4py updated 139 files
2 changes: 2 additions & 0 deletions ndsl/__init__.py
@@ -10,6 +10,7 @@
from .constants import ConstantVersions
from .dsl.caches.codepath import FV3CodePath
from .quantity import Quantity
from .dsl.optimization_config import OptimizationConfig
from .dsl.ndsl_runtime import NDSLRuntime
from .dsl.stencil import FrozenStencil, GridIndexing, StencilFactory, TimingCollector
from .dsl.stencil_config import CompilationConfig, RunMode, StencilConfig
@@ -90,6 +91,7 @@
"MetaEnumStr",
"State",
"LocalState",
"OptimizationConfig",
"NDSLRuntime",
"Local",
"DiagManagerMonitor",
2 changes: 2 additions & 0 deletions ndsl/config/backend.py
@@ -52,6 +52,8 @@ class BackendLoopOrder(Enum):
"orch:dace:cpu:KJI": "dace:cpu_KJI",
"st:dace:gpu:KJI": "dace:gpu",
"orch:dace:gpu:KJI": "dace:gpu",
"st:dace:gpu:IJK": "dace:gpu_IJK",
"orch:dace:gpu:IJK": "dace:gpu_IJK",
}
"""Internal: match the NDSL backend names with the GT4Py names"""
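
The two added entries expose the `gpu:IJK` layout under both the stencil (`st:`) and orchestrated (`orch:`) prefixes. A minimal sketch of how such a table can be consulted is below. The dictionary name `NDSL_TO_GT4PY` and the helper `gt4py_backend_name` are hypothetical and only illustrate the lookup; the entries are copied from the hunk above and the real table contains more.

```python
# Hypothetical excerpt of the NDSL -> GT4Py backend name table.
# Only the entries visible in the hunk above are reproduced here.
NDSL_TO_GT4PY = {
    "orch:dace:cpu:KJI": "dace:cpu_KJI",
    "st:dace:gpu:KJI": "dace:gpu",
    "orch:dace:gpu:KJI": "dace:gpu",
    "st:dace:gpu:IJK": "dace:gpu_IJK",
    "orch:dace:gpu:IJK": "dace:gpu_IJK",
}


def gt4py_backend_name(ndsl_name: str) -> str:
    """Translate an NDSL backend name to its GT4Py name (illustrative helper)."""
    try:
        return NDSL_TO_GT4PY[ndsl_name]
    except KeyError:
        raise ValueError(f"Unknown NDSL backend: {ndsl_name!r}") from None


# Stencil and orchestrated variants of one layout share a GT4Py backend.
same = gt4py_backend_name("st:dace:gpu:IJK") == gt4py_backend_name("orch:dace:gpu:IJK")
```

In this excerpt the stencil/orchestration prefix does not change the GT4Py backend, only the loop-order suffix does.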

64 changes: 33 additions & 31 deletions ndsl/dsl/caches/cache_location.py
@@ -7,46 +7,48 @@ def identify_code_path(
partitioner: Partitioner,
single_code_path: bool,
) -> FV3CodePath:
"""Determine which code path your rank will hit.
"""
Determine which code path your rank will hit.

If single_code_path is True, single_code_path is True,
only one code path exists (case of doubly periodic grid).
If single_code_path is True, only one code path exists,
e.g. in case of a doubly periodic grid.
If single_code_path is False, we are in the case of the
cube-sphere and we will look at our position on the tile."""
cube-sphere and we will look at our position on the tile.
"""

# Doubly-periodic or single tile grid
if single_code_path:
if single_code_path or partitioner.layout == (1, 1):
return FV3CodePath.All

# Cube-sphere
if partitioner.layout == (1, 1):
return FV3CodePath.All
elif partitioner.layout[0] == 1 or partitioner.layout[1] == 1:
if partitioner.layout[0] <= 1 or partitioner.layout[1] <= 1:
raise NotImplementedError(
f"Build for layout {partitioner.layout} is not handled"
f"Build for layout {partitioner.layout} is not handled."
)
else:
if partitioner.tile.on_tile_bottom(rank):
if partitioner.tile.on_tile_left(rank):
return FV3CodePath.BottomLeft
if partitioner.tile.on_tile_right(rank):
return FV3CodePath.BottomRight
else:
return FV3CodePath.Bottom
if partitioner.tile.on_tile_top(rank):
if partitioner.tile.on_tile_left(rank):
return FV3CodePath.TopLeft
if partitioner.tile.on_tile_right(rank):
return FV3CodePath.TopRight
else:
return FV3CodePath.Top
else:
if partitioner.tile.on_tile_left(rank):
return FV3CodePath.Left
if partitioner.tile.on_tile_right(rank):
return FV3CodePath.Right
else:
return FV3CodePath.Center

# Bottom row
if partitioner.tile.on_tile_bottom(rank):
if partitioner.tile.on_tile_left(rank):
return FV3CodePath.BottomLeft
if partitioner.tile.on_tile_right(rank):
return FV3CodePath.BottomRight
return FV3CodePath.Bottom

# Top row
if partitioner.tile.on_tile_top(rank):
if partitioner.tile.on_tile_left(rank):
return FV3CodePath.TopLeft
if partitioner.tile.on_tile_right(rank):
return FV3CodePath.TopRight
return FV3CodePath.Top

# Left & right column with corners already handled
if partitioner.tile.on_tile_left(rank):
return FV3CodePath.Left
if partitioner.tile.on_tile_right(rank):
return FV3CodePath.Right

return FV3CodePath.Center


def get_cache_fullpath(code_path: FV3CodePath) -> str:
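
The refactored `identify_code_path` flattens the nested `if/else` into early returns: the single-code-path case first, then unsupported layouts, then rows (bottom, top), then columns. The sketch below reproduces that control flow in a self-contained form. `CodePath` is a hypothetical stand-in for `FV3CodePath`, and the rank position is passed as plain `(x, y)` tile coordinates with the layout read as `(nx, ny)`; both are assumptions made for illustration and replace the `partitioner.tile.on_tile_*()` queries used in the PR.

```python
from enum import Enum, auto


class CodePath(Enum):
    """Hypothetical stand-in for FV3CodePath."""

    All = auto()
    BottomLeft = auto()
    Bottom = auto()
    BottomRight = auto()
    Left = auto()
    Center = auto()
    Right = auto()
    TopLeft = auto()
    Top = auto()
    TopRight = auto()


def classify(
    layout: tuple[int, int], x: int, y: int, single_code_path: bool = False
) -> CodePath:
    """Classify a rank at 0-based tile position (x, y); y == 0 is the bottom row."""
    # Doubly-periodic or single tile grid
    if single_code_path or layout == (1, 1):
        return CodePath.All

    # Layouts with a single row or column are not handled
    if layout[0] <= 1 or layout[1] <= 1:
        raise NotImplementedError(f"Build for layout {layout} is not handled.")

    nx, ny = layout  # assumption: layout is (nx, ny) in this sketch
    left, right = x == 0, x == nx - 1
    bottom, top = y == 0, y == ny - 1

    # Bottom row
    if bottom:
        if left:
            return CodePath.BottomLeft
        if right:
            return CodePath.BottomRight
        return CodePath.Bottom

    # Top row
    if top:
        if left:
            return CodePath.TopLeft
        if right:
            return CodePath.TopRight
        return CodePath.Top

    # Left and right columns, corners already handled
    if left:
        return CodePath.Left
    if right:
        return CodePath.Right

    return CodePath.Center
```

Because corners are resolved inside the row checks, the column checks that follow can only ever see edge ranks that are not corners, which is what the comment in the PR points out.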
71 changes: 50 additions & 21 deletions ndsl/dsl/dace/dace_config.py
@@ -10,14 +10,20 @@
from gt4py.cartesian.utils.compiler import cxx_compiler_defaults, gpu_configuration

from ndsl import LocalComm
from ndsl.comm import Comm
from ndsl.comm.communicator import Communicator
from ndsl.comm.partitioner import Partitioner
from ndsl.config import Backend
from ndsl.dsl import NDSL_COMPILER_SILENCE, NDSL_GLOBAL_PRECISION
from ndsl.dsl.caches.cache_location import identify_code_path
from ndsl.dsl.caches.codepath import FV3CodePath
from ndsl.dsl.dace.hardware_config import get_gpu_hardware_defaults
from ndsl.optional_imports import cupy as cp
from ndsl.performance.collector import NullPerformanceCollector, PerformanceCollector
from ndsl.performance.collector import (
AbstractPerformanceCollector,
NullPerformanceCollector,
PerformanceCollector,
)


if TYPE_CHECKING:
@@ -166,8 +172,8 @@ def __init__(
Args:
communicator: used for setting the distributed caches
backend: string for the backend
tile_nx: x/y domain size for a single time
tile_nz: z domain size for a single time
tile_nx: x/y domain size for a single tile
tile_nz: z domain size for a single tile
orchestration: orchestration mode from DaCeOrchestration
time: trigger performance collection, available to user with
`performance_collector`
@@ -181,16 +187,12 @@ def __init__(
# ToDo: DaceConfig becomes a bit more than a read-only config
# with this. Should be refactored into a DaceExecutor carrying a config
self.loaded_dace_executables: DaceExecutables = {}
self.performance_collector = (
PerformanceCollector(
"InternalOrchestrationTimer",
comm=(
LocalComm(0, 6, {}) if communicator is None else communicator.comm
),
if not time:
self.performance_collector: AbstractPerformanceCollector = (
NullPerformanceCollector()
)
if time
else NullPerformanceCollector()
)
else:
self.set_timer(communicator.comm if communicator else None)

# Temporary. This is a bit too out of the ordinary for the common user.
# We should refactor the architecture to allow for a `gtc:orchestrated:dace:X`
@@ -265,21 +267,29 @@ def __init__(
march_option = "-mcpu=native" if is_arm_neoverse else "-march=native"
# Removed --fast-math
gpu_config = gpu_configuration(GT4PY_COMPILE_OPT_LEVEL)
gpu_cflags = " ".join(gpu_config.gpu_compile_flags).strip()
dace.config.Config.set(
"compiler",
"cuda",
"args",
value=f"-std=c++14 {warnings_policy} -Xcompiler -fPIC -O{optimization_level} -Xcompiler {march_option} {gpu_config.gpu_compile_flags}",
value=f"-std=c++14 {warnings_policy} -Xcompiler -fPIC -O{optimization_level} -Xcompiler {march_option} {gpu_cflags}",
)

cuda_sm = cp.cuda.Device(0).compute_capability if cp else 60
dace.config.Config.set("compiler", "cuda", "cuda_arch", value=f"{cuda_sm}")
# Block size/thread count is defaulted to an average value for recent
# hardware (Pascal and upward). The problem of setting an optimized
# block/thread is both hardware and problem dependant. Fine tuners
# available in DaCe should be relied on for further tuning of this value.
# Target compilation for hardware micro-code capacities
gpu_defaults = get_gpu_hardware_defaults()
dace.config.Config.set(
"compiler", "cuda", "default_block_size", value="64,8,1"
"compiler",
"cuda",
"cuda_arch",
value=f"{gpu_defaults.compute_capability}",
)

# Default block size for kernels launch
dace.config.Config.set(
"compiler",
"cuda",
"default_block_size",
value=str(gpu_defaults.block_size)[1:-1],
)
# Potentially buggy - deactivate
dace.config.Config.set(
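
Two string-formatting details in this hunk are easy to miss. Interpolating a list directly into an f-string embeds its Python `repr` (brackets and quotes), which is why the flags are now joined first. Slicing `str(...)[1:-1]` off a tuple keeps the `", "` separator, so the result contains spaces, unlike the previous literal `"64,8,1"`. The sketch below illustrates both, assuming `gpu_compile_flags` is a list of strings and `block_size` is a tuple of integers; the flag values and helper names are hypothetical.

```python
def join_flags(flags: list[str]) -> str:
    """Join compiler flags into one space-separated string (as in the PR)."""
    return " ".join(flags).strip()


def block_size_sliced(block_size: tuple[int, ...]) -> str:
    """Format a block size the way the PR does: str() minus the parentheses."""
    return str(block_size)[1:-1]


def block_size_joined(block_size: tuple[int, ...]) -> str:
    """Alternative formatting without spaces (not what the PR does)."""
    return ",".join(str(v) for v in block_size)


flags = ["-O3", "-DNDEBUG"]  # hypothetical flag values
naive = f"-std=c++14 {flags}"  # embeds the list repr, not usable as arguments
fixed = f"-std=c++14 {join_flags(flags)}"

sliced = block_size_sliced((64, 8, 1))  # "64, 8, 1"
joined = block_size_joined((64, 8, 1))  # "64,8,1"
```

Whether the extra spaces matter depends on how the consumer of `default_block_size` parses the value; the sketch only shows that the two spellings differ.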
@@ -346,6 +356,9 @@ def __init__(
value="c",
)

# Debug lineinfo is incorrect anyway for the stencils
dace.config.Config.set("compiler", "lineinfo", value="none")

# Attempt to kill the dace.conf to avoid confusion
dace_conf_to_kill = dace.config.Config.cfg_filename()
if dace_conf_to_kill is not None:
@@ -413,4 +426,20 @@ def from_dict(cls, data: dict) -> Self:
config.rank_size = data["rank_size"]
config.layout = data["layout"]
config.tile_resolution = data["tile_resolution"]
return config
# TODO
# Computed properties like `self.code_path` and `self.do_compile`
# aren't updated.
# We also don't `set_distributed_caches()` based on that updated
# information.
raise NotImplementedError(
"Implementation of `DaceConfig.from_dict()` is incomplete."
)

def set_timer(self, comm: Comm | None) -> None:
"""Set timer on configuration externally"""
# TODO: this absolutely should not be a on a Configuration object
# and even less setup outside. Madness, we have lost our ways...
self.performance_collector = PerformanceCollector(
"InternalOrchestrationTimer",
comm=(LocalComm(0, 6, {}) if comm is None else comm),
)
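
The `performance_collector` change follows a null-object pattern: the attribute is typed against an abstract base, defaults to a do-nothing collector, and is swapped for a real one only when timing is requested (either at construction or later through `set_timer`). A minimal sketch of that pattern is below. All class and method names here (`AbstractCollector`, `NullCollector`, `Collector`, `record`, `Config`) are hypothetical and do not reproduce the `ndsl.performance.collector` API.

```python
from abc import ABC, abstractmethod


class AbstractCollector(ABC):
    """Hypothetical common interface for collectors."""

    @abstractmethod
    def record(self, name: str, seconds: float) -> None: ...


class NullCollector(AbstractCollector):
    """Accepts every call and stores nothing."""

    def record(self, name: str, seconds: float) -> None:
        pass


class Collector(AbstractCollector):
    """Stores timings per name."""

    def __init__(self, label: str) -> None:
        self.label = label
        self.timings: dict[str, list[float]] = {}

    def record(self, name: str, seconds: float) -> None:
        self.timings.setdefault(name, []).append(seconds)


class Config:
    """Illustrative configuration object holding a collector."""

    def __init__(self, time: bool = False) -> None:
        # Default to the null object so callers never need a None check.
        self.collector: AbstractCollector = NullCollector()
        if time:
            self.set_timer("InternalOrchestrationTimer")

    def set_timer(self, label: str) -> None:
        """Install a real collector after construction."""
        self.collector = Collector(label)


quiet = Config()
quiet.collector.record("step", 0.1)  # silently ignored

timed = Config(time=True)
timed.collector.record("step", 0.1)
```

Call sites use `collector.record(...)` unconditionally in both cases, which is the main benefit of typing the attribute against the abstract base.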