
{ai}[foss/2023a] DeePMD-kit v3.0.1, Horovod v0.28.1 w/ TensorFlow 2.13.0 #22217

Open
pavelToman wants to merge 7 commits into easybuilders:develop from pavelToman:20250127161535_new_pr_DeePDM-kit301

@pavelToman
Collaborator

@pavelToman pavelToman commented Jan 27, 2025

(created using eb --new-pr)
resolves vscentrum/vsc-software-stack#487

…, DeePDM-kit-3.0.1-foss-2023a.eb, Horovod-0.28.1-foss-2023a-TensorFlow-2.15.1.eb
@github-actions

github-actions Bot commented Jan 27, 2025

Updated software Horovod-0.28.1-foss-2023a-TensorFlow-2.13.0.eb

Diff against Horovod-0.28.1-foss-2022a-PyTorch-1.12.0.eb

easybuild/easyconfigs/h/Horovod/Horovod-0.28.1-foss-2022a-PyTorch-1.12.0.eb

diff --git a/easybuild/easyconfigs/h/Horovod/Horovod-0.28.1-foss-2022a-PyTorch-1.12.0.eb b/easybuild/easyconfigs/h/Horovod/Horovod-0.28.1-foss-2023a-TensorFlow-2.13.0.eb
index 90aa0d7222..514a13da15 100644
--- a/easybuild/easyconfigs/h/Horovod/Horovod-0.28.1-foss-2022a-PyTorch-1.12.0.eb
+++ b/easybuild/easyconfigs/h/Horovod/Horovod-0.28.1-foss-2023a-TensorFlow-2.13.0.eb
@@ -2,33 +2,40 @@ easyblock = 'PythonBundle'
 
 name = 'Horovod'
 version = '0.28.1'
-local_pt_version = '1.12.0'
-versionsuffix = '-PyTorch-%s' % local_pt_version
+local_tf_version = '2.13.0'
+versionsuffix = '-TensorFlow-%s' % local_tf_version
 
 homepage = 'https://github.com/uber/horovod'
-description = """Horovod is a distributed training framework for TensorFlow, PyTorch and MXnet.
-This build only has PyTorch enabled."""
+description = "Horovod is a distributed training framework for TensorFlow."
 
-toolchain = {'name': 'foss', 'version': '2022a'}
+toolchain = {'name': 'foss', 'version': '2023a'}
 
 builddependencies = [
-    ('CMake', '3.23.1'),
+    ('CMake', '3.26.3'),
 ]
 dependencies = [
-    ('Python', '3.10.4'),
+    ('Python', '3.11.3'),
     ('PyYAML', '6.0'),
-    ('PyTorch', local_pt_version),
+    ('TensorFlow', local_tf_version),
 ]
 
-preinstallopts = 'HOROVOD_WITH_MPI=1 '
-preinstallopts += 'HOROVOD_WITHOUT_TENSORFLOW=1 HOROVOD_WITH_PYTORCH=1 HOROVOD_WITHOUT_MXNET=1 '
+use_pip = True
+sanity_pip_check = True
+
+local_preinstallopts = 'module swap protobuf/3.21.9-GCCcore-12.3.0 && HOROVOD_WITH_MPI=1 '
+local_preinstallopts += 'HOROVOD_WITH_TENSORFLOW=1 HOROVOD_WITHOUT_PYTORCH=1 HOROVOD_WITHOUT_MXNET=1 '
 
 exts_list = [
     ('cloudpickle', '2.2.1', {
         'checksums': ['d89684b8de9e34a2a43b3460fbca07d09d6e25ce858df4d5a44240403b6178f5'],
     }),
     ('horovod', version, {
-        'checksums': ['92a43f5a94c43907a56805bad15f19700c62ffc83b7ca483f9e104e229f67ef0'],
+        'preinstallopts': local_preinstallopts,
+        'patches': ['Horovod-0.28.1_support_flatbuffers_2.0.6.patch'],
+        'checksums': [
+            '92a43f5a94c43907a56805bad15f19700c62ffc83b7ca483f9e104e229f67ef0',
+            '9696ffb3b2bad1d6dd5a9f37bc58078ca7c585f933bcbec037036ad9fc0b297d',
+        ],
     }),
 ]
 
Diff against Horovod-0.28.1-foss-2022a-CUDA-11.7.0-PyTorch-1.13.1.eb

easybuild/easyconfigs/h/Horovod/Horovod-0.28.1-foss-2022a-CUDA-11.7.0-PyTorch-1.13.1.eb

diff --git a/easybuild/easyconfigs/h/Horovod/Horovod-0.28.1-foss-2022a-CUDA-11.7.0-PyTorch-1.13.1.eb b/easybuild/easyconfigs/h/Horovod/Horovod-0.28.1-foss-2023a-TensorFlow-2.13.0.eb
index 744c678169..514a13da15 100644
--- a/easybuild/easyconfigs/h/Horovod/Horovod-0.28.1-foss-2022a-CUDA-11.7.0-PyTorch-1.13.1.eb
+++ b/easybuild/easyconfigs/h/Horovod/Horovod-0.28.1-foss-2023a-TensorFlow-2.13.0.eb
@@ -2,36 +2,40 @@ easyblock = 'PythonBundle'
 
 name = 'Horovod'
 version = '0.28.1'
-local_pt_version = '1.13.1'
-local_cuda_suffix = '-CUDA-%(cudaver)s'
-versionsuffix = local_cuda_suffix + '-PyTorch-%s' % local_pt_version
+local_tf_version = '2.13.0'
+versionsuffix = '-TensorFlow-%s' % local_tf_version
 
 homepage = 'https://github.com/uber/horovod'
-description = """Horovod is a distributed training framework for TensorFlow, PyTorch and MXnet.
-This build only has PyTorch enabled."""
+description = "Horovod is a distributed training framework for TensorFlow."
 
-toolchain = {'name': 'foss', 'version': '2022a'}
+toolchain = {'name': 'foss', 'version': '2023a'}
 
 builddependencies = [
-    ('CMake', '3.23.1'),
+    ('CMake', '3.26.3'),
 ]
 dependencies = [
-    ('Python', '3.10.4'),
+    ('Python', '3.11.3'),
     ('PyYAML', '6.0'),
-    ('CUDA', '11.7.0', '', SYSTEM),
-    ('NCCL', '2.12.12', local_cuda_suffix),
-    ('PyTorch', local_pt_version, local_cuda_suffix),
+    ('TensorFlow', local_tf_version),
 ]
 
-preinstallopts = 'HOROVOD_WITH_MPI=1 HOROVOD_GPU_ALLREDUCE=NCCL HOROVOD_GPU_BROADCAST=NCCL '
-preinstallopts += 'HOROVOD_WITHOUT_TENSORFLOW=1 HOROVOD_WITH_PYTORCH=1 HOROVOD_WITHOUT_MXNET=1 '
+use_pip = True
+sanity_pip_check = True
+
+local_preinstallopts = 'module swap protobuf/3.21.9-GCCcore-12.3.0 && HOROVOD_WITH_MPI=1 '
+local_preinstallopts += 'HOROVOD_WITH_TENSORFLOW=1 HOROVOD_WITHOUT_PYTORCH=1 HOROVOD_WITHOUT_MXNET=1 '
 
 exts_list = [
     ('cloudpickle', '2.2.1', {
         'checksums': ['d89684b8de9e34a2a43b3460fbca07d09d6e25ce858df4d5a44240403b6178f5'],
     }),
     ('horovod', version, {
-        'checksums': ['92a43f5a94c43907a56805bad15f19700c62ffc83b7ca483f9e104e229f67ef0'],
+        'preinstallopts': local_preinstallopts,
+        'patches': ['Horovod-0.28.1_support_flatbuffers_2.0.6.patch'],
+        'checksums': [
+            '92a43f5a94c43907a56805bad15f19700c62ffc83b7ca483f9e104e229f67ef0',
+            '9696ffb3b2bad1d6dd5a9f37bc58078ca7c585f933bcbec037036ad9fc0b297d',
+        ],
     }),
 ]
 
Diff against Horovod-0.28.1-foss-2022a-CUDA-11.7.0-PyTorch-1.12.1.eb

easybuild/easyconfigs/h/Horovod/Horovod-0.28.1-foss-2022a-CUDA-11.7.0-PyTorch-1.12.1.eb

diff --git a/easybuild/easyconfigs/h/Horovod/Horovod-0.28.1-foss-2022a-CUDA-11.7.0-PyTorch-1.12.1.eb b/easybuild/easyconfigs/h/Horovod/Horovod-0.28.1-foss-2023a-TensorFlow-2.13.0.eb
index 8485defceb..514a13da15 100644
--- a/easybuild/easyconfigs/h/Horovod/Horovod-0.28.1-foss-2022a-CUDA-11.7.0-PyTorch-1.12.1.eb
+++ b/easybuild/easyconfigs/h/Horovod/Horovod-0.28.1-foss-2023a-TensorFlow-2.13.0.eb
@@ -2,36 +2,40 @@ easyblock = 'PythonBundle'
 
 name = 'Horovod'
 version = '0.28.1'
-local_pt_version = '1.12.1'
-local_cuda_suffix = '-CUDA-%(cudaver)s'
-versionsuffix = local_cuda_suffix + '-PyTorch-%s' % local_pt_version
+local_tf_version = '2.13.0'
+versionsuffix = '-TensorFlow-%s' % local_tf_version
 
 homepage = 'https://github.com/uber/horovod'
-description = """Horovod is a distributed training framework for TensorFlow, PyTorch and MXnet.
-This build only has PyTorch enabled."""
+description = "Horovod is a distributed training framework for TensorFlow."
 
-toolchain = {'name': 'foss', 'version': '2022a'}
+toolchain = {'name': 'foss', 'version': '2023a'}
 
 builddependencies = [
-    ('CMake', '3.23.1'),
+    ('CMake', '3.26.3'),
 ]
 dependencies = [
-    ('Python', '3.10.4'),
+    ('Python', '3.11.3'),
     ('PyYAML', '6.0'),
-    ('CUDA', '11.7.0', '', SYSTEM),
-    ('NCCL', '2.12.12', local_cuda_suffix),
-    ('PyTorch', local_pt_version, local_cuda_suffix),
+    ('TensorFlow', local_tf_version),
 ]
 
-preinstallopts = 'HOROVOD_WITH_MPI=1 HOROVOD_GPU_ALLREDUCE=NCCL HOROVOD_GPU_BROADCAST=NCCL '
-preinstallopts += 'HOROVOD_WITHOUT_TENSORFLOW=1 HOROVOD_WITH_PYTORCH=1 HOROVOD_WITHOUT_MXNET=1 '
+use_pip = True
+sanity_pip_check = True
+
+local_preinstallopts = 'module swap protobuf/3.21.9-GCCcore-12.3.0 && HOROVOD_WITH_MPI=1 '
+local_preinstallopts += 'HOROVOD_WITH_TENSORFLOW=1 HOROVOD_WITHOUT_PYTORCH=1 HOROVOD_WITHOUT_MXNET=1 '
 
 exts_list = [
     ('cloudpickle', '2.2.1', {
         'checksums': ['d89684b8de9e34a2a43b3460fbca07d09d6e25ce858df4d5a44240403b6178f5'],
     }),
     ('horovod', version, {
-        'checksums': ['92a43f5a94c43907a56805bad15f19700c62ffc83b7ca483f9e104e229f67ef0'],
+        'preinstallopts': local_preinstallopts,
+        'patches': ['Horovod-0.28.1_support_flatbuffers_2.0.6.patch'],
+        'checksums': [
+            '92a43f5a94c43907a56805bad15f19700c62ffc83b7ca483f9e104e229f67ef0',
+            '9696ffb3b2bad1d6dd5a9f37bc58078ca7c585f933bcbec037036ad9fc0b297d',
+        ],
     }),
 ]
 

Updated software protobuf-3.21.9-GCCcore-12.3.0.eb

Diff against protobuf-28.0-GCCcore-13.3.0.eb

easybuild/easyconfigs/p/protobuf/protobuf-28.0-GCCcore-13.3.0.eb

diff --git a/easybuild/easyconfigs/p/protobuf/protobuf-28.0-GCCcore-13.3.0.eb b/easybuild/easyconfigs/p/protobuf/protobuf-3.21.9-GCCcore-12.3.0.eb
index b5f959fc06..2da707b2e7 100644
--- a/easybuild/easyconfigs/p/protobuf/protobuf-28.0-GCCcore-13.3.0.eb
+++ b/easybuild/easyconfigs/p/protobuf/protobuf-3.21.9-GCCcore-12.3.0.eb
@@ -1,31 +1,31 @@
 easyblock = 'CMakeMake'
 
 name = 'protobuf'
-version = '28.0'
+version = '3.21.9'
 
 homepage = 'https://github.com/protocolbuffers/protobuf'
-description = """Protocol Buffers (a.k.a., protobuf) are Google's
-language-neutral, platform-neutral, extensible mechanism for
+description = """Protocol Buffers (a.k.a., protobuf) are Google's 
+language-neutral, platform-neutral, extensible mechanism for 
 serializing structured data."""
 
-toolchain = {'name': 'GCCcore', 'version': '13.3.0'}
+toolchain = {'name': 'GCCcore', 'version': '12.3.0'}
 
-github_account = 'protocolbuffers'
-source_urls = [GITHUB_RELEASE]
-sources = [SOURCE_TAR_GZ]
-checksums = ['13e7749c30bc24af6ee93e092422f9dc08491c7097efa69461f88eb5f61805ce']
+source_urls = ['https://github.com/protocolbuffers/protobuf/archive/refs/tags/']
+sources = ['v21.9.tar.gz']
+patches = ['protobuf-21.9_linking-error.patch']
+checksums = [
+    '0aa7df8289c957a4c54cbe694fbabe99b180e64ca0f8fdb5e2f76dcf56ff2422',  # v21.9.tar.gz
+    '14487154fa9d50cc647d6837f9e83f24d2002bcbac876b6b35eb042ededee7ad',  # protobuf-21.9_linking-error.patch
+]
 
 builddependencies = [
-    ('binutils', '2.42'),
-    ('CMake', '3.29.3'),
-]
-dependencies = [
-    ('Abseil', '20240722.0'),
+    ('binutils', '2.40'),
+    ('CMake', '3.26.3'),
 ]
 
-srcdir = '.'
+srcdir = 'cmake'
 
-configopts = '-Dprotobuf_BUILD_TESTS=OFF -Dprotobuf_BUILD_SHARED_LIBS=ON -Dprotobuf_ABSL_PROVIDER="package" '
+configopts = '-Dprotobuf_BUILD_TESTS=OFF -Dprotobuf_BUILD_SHARED_LIBS=ON '
 
 sanity_check_paths = {
     'files': ['bin/protoc', 'lib/libprotobuf.%s' % SHLIB_EXT],
Diff against protobuf-25.3-GCCcore-13.2.0.eb

easybuild/easyconfigs/p/protobuf/protobuf-25.3-GCCcore-13.2.0.eb

diff --git a/easybuild/easyconfigs/p/protobuf/protobuf-25.3-GCCcore-13.2.0.eb b/easybuild/easyconfigs/p/protobuf/protobuf-3.21.9-GCCcore-12.3.0.eb
index 5a700f6825..2da707b2e7 100644
--- a/easybuild/easyconfigs/p/protobuf/protobuf-25.3-GCCcore-13.2.0.eb
+++ b/easybuild/easyconfigs/p/protobuf/protobuf-3.21.9-GCCcore-12.3.0.eb
@@ -1,31 +1,31 @@
 easyblock = 'CMakeMake'
 
 name = 'protobuf'
-version = '25.3'
+version = '3.21.9'
 
 homepage = 'https://github.com/protocolbuffers/protobuf'
-description = """Protocol Buffers (a.k.a., protobuf) are Google's
-language-neutral, platform-neutral, extensible mechanism for
+description = """Protocol Buffers (a.k.a., protobuf) are Google's 
+language-neutral, platform-neutral, extensible mechanism for 
 serializing structured data."""
 
-toolchain = {'name': 'GCCcore', 'version': '13.2.0'}
+toolchain = {'name': 'GCCcore', 'version': '12.3.0'}
 
-github_account = 'protocolbuffers'
-source_urls = [GITHUB_RELEASE]
-sources = [SOURCE_TAR_GZ]
-checksums = ['d19643d265b978383352b3143f04c0641eea75a75235c111cc01a1350173180e']
+source_urls = ['https://github.com/protocolbuffers/protobuf/archive/refs/tags/']
+sources = ['v21.9.tar.gz']
+patches = ['protobuf-21.9_linking-error.patch']
+checksums = [
+    '0aa7df8289c957a4c54cbe694fbabe99b180e64ca0f8fdb5e2f76dcf56ff2422',  # v21.9.tar.gz
+    '14487154fa9d50cc647d6837f9e83f24d2002bcbac876b6b35eb042ededee7ad',  # protobuf-21.9_linking-error.patch
+]
 
 builddependencies = [
     ('binutils', '2.40'),
-    ('CMake', '3.27.6'),
-]
-dependencies = [
-    ('Abseil', '20240116.1'),
+    ('CMake', '3.26.3'),
 ]
 
-srcdir = '.'
+srcdir = 'cmake'
 
-configopts = '-Dprotobuf_BUILD_TESTS=OFF -Dprotobuf_BUILD_SHARED_LIBS=ON -Dprotobuf_ABSL_PROVIDER="package" '
+configopts = '-Dprotobuf_BUILD_TESTS=OFF -Dprotobuf_BUILD_SHARED_LIBS=ON '
 
 sanity_check_paths = {
     'files': ['bin/protoc', 'lib/libprotobuf.%s' % SHLIB_EXT],
Diff against protobuf-24.0-GCCcore-12.3.0.eb

easybuild/easyconfigs/p/protobuf/protobuf-24.0-GCCcore-12.3.0.eb

diff --git a/easybuild/easyconfigs/p/protobuf/protobuf-24.0-GCCcore-12.3.0.eb b/easybuild/easyconfigs/p/protobuf/protobuf-3.21.9-GCCcore-12.3.0.eb
index 55040a3fd3..2da707b2e7 100644
--- a/easybuild/easyconfigs/p/protobuf/protobuf-24.0-GCCcore-12.3.0.eb
+++ b/easybuild/easyconfigs/p/protobuf/protobuf-3.21.9-GCCcore-12.3.0.eb
@@ -1,31 +1,31 @@
 easyblock = 'CMakeMake'
 
 name = 'protobuf'
-version = '24.0'
+version = '3.21.9'
 
 homepage = 'https://github.com/protocolbuffers/protobuf'
-description = """Protocol Buffers (a.k.a., protobuf) are Google's
-language-neutral, platform-neutral, extensible mechanism for
+description = """Protocol Buffers (a.k.a., protobuf) are Google's 
+language-neutral, platform-neutral, extensible mechanism for 
 serializing structured data."""
 
 toolchain = {'name': 'GCCcore', 'version': '12.3.0'}
 
 source_urls = ['https://github.com/protocolbuffers/protobuf/archive/refs/tags/']
-sources = ['v%(version)s.tar.gz']
-checksums = ['850357336189c470e429e9bdffca92229d8cd5b7f84aa2f3b4c5fdb80ce8351b']
+sources = ['v21.9.tar.gz']
+patches = ['protobuf-21.9_linking-error.patch']
+checksums = [
+    '0aa7df8289c957a4c54cbe694fbabe99b180e64ca0f8fdb5e2f76dcf56ff2422',  # v21.9.tar.gz
+    '14487154fa9d50cc647d6837f9e83f24d2002bcbac876b6b35eb042ededee7ad',  # protobuf-21.9_linking-error.patch
+]
 
 builddependencies = [
     ('binutils', '2.40'),
     ('CMake', '3.26.3'),
 ]
 
-srcdir = '.'
-
-configopts = '-Dprotobuf_BUILD_TESTS=OFF -Dprotobuf_BUILD_SHARED_LIBS=ON -Dprotobuf_ABSL_PROVIDER="package" '
+srcdir = 'cmake'
 
-dependencies = [
-    ('Abseil', '20230125.3'),
-]
+configopts = '-Dprotobuf_BUILD_TESTS=OFF -Dprotobuf_BUILD_SHARED_LIBS=ON '
 
 sanity_check_paths = {
     'files': ['bin/protoc', 'lib/libprotobuf.%s' % SHLIB_EXT],

@pavelToman pavelToman changed the title from "{ai}[foss/2023a] DeePDM-kit v3.0.1, Horovod v0.28.1 w/ TensorFlow 2.15.1, with LAMMPS plugin" to "{ai}[foss/2023a] DeePDM-kit v3.0.1 (+ version with LAMMPS plugin) , Horovod v0.28.1 w/ TensorFlow 2.15.1" on Jan 27, 2025
@pavelToman
Collaborator Author

@boegelbot please test @ jsc-zen3

@boegelbot
Collaborator

@pavelToman: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=22217 EB_ARGS= EB_CONTAINER= EB_REPO=easybuild-easyconfigs EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_22217 --ntasks=8 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 5607

Test results coming soon (I hope)...

Details

- notification for comment with ID 2616046892 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Collaborator

Test report by @boegelbot
FAILED
Build succeeded for 0 out of 3 (3 easyconfigs in total)
jsczen3c1.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.5, x86_64, AMD EPYC-Milan Processor (zen3), Python 3.9.21
See https://gist.github.com/boegelbot/08b6bde5515f3a0bba1384f8ea1cdc91 for a full test report.

@pavelToman
Collaborator Author

@boegelbot please test @ jsc-zen3

@pavelToman
Collaborator Author

Test report by @pavelToman
SUCCESS
Build succeeded for 3 out of 3 (3 easyconfigs in total)
node3129.skitty.os - Linux RHEL 9.4, x86_64, Intel(R) Xeon(R) Gold 6140 CPU @ 2.30GHz, Python 3.9.18
See https://gist.github.com/pavelToman/3019635aaee390c66173434fd4b8875d for a full test report.

@boegelbot
Collaborator

@pavelToman: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=22217 EB_ARGS= EB_CONTAINER= EB_REPO=easybuild-easyconfigs EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_22217 --ntasks=8 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 5608

Test results coming soon (I hope)...

Details

- notification for comment with ID 2616106179 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Collaborator

Test report by @boegelbot
FAILED
Build succeeded for 1 out of 4 (4 easyconfigs in total)
jsczen3c1.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.5, x86_64, AMD EPYC-Milan Processor (zen3), Python 3.9.21
See https://gist.github.com/boegelbot/2684f1ad26e30636b53ce63c50c02f75 for a full test report.

@pavelToman
Collaborator Author

pavelToman commented Jan 27, 2025

Test report by @boegelbot FAILED Build succeeded for 0 out of 3 (3 easyconfigs in total) jsczen3c1.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.5, x86_64, AMD EPYC-Milan Processor (zen3), Python 3.9.21 See https://gist.github.com/boegelbot/08b6bde5515f3a0bba1384f8ea1cdc91 for a full test report.

Horovod is built before protobuf. How can I fix this? I cannot put protobuf in builddependencies: the build crashes with "Lmod has detected the following error: A different version of the 'protobuf' module is already loaded..."
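The Lmod error quoted above arises when two modules with the same software name are requested in one environment. A minimal sketch of that check, assuming flat `name/version-toolchain` module names; the helper `find_same_name_conflicts` and the module list are hypothetical illustrations, not part of EasyBuild or Lmod:

```python
from collections import defaultdict


def find_same_name_conflicts(module_names):
    """Group full module names by software name; return names with more than one version."""
    by_name = defaultdict(list)
    for full_name in module_names:
        # 'protobuf/3.21.9-GCCcore-12.3.0' -> software name 'protobuf'
        software_name = full_name.split('/', 1)[0]
        if full_name not in by_name[software_name]:
            by_name[software_name].append(full_name)
    return {name: mods for name, mods in by_name.items() if len(mods) > 1}


# Hypothetical environment: one protobuf assumed to be pulled in by another
# dependency, plus the older protobuf requested as a build dependency.
requested = [
    'Python/3.11.3-GCCcore-12.3.0',
    'protobuf/24.0-GCCcore-12.3.0',
    'protobuf/3.21.9-GCCcore-12.3.0',
]
conflicts = find_same_name_conflicts(requested)
print(conflicts)
```

Any software name that appears in the result would need a swap rather than a plain load, which is what the `module swap` prefix in the easyconfig works around.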

@pavelToman
Collaborator Author

Test report by @pavelToman
SUCCESS
Build succeeded for 4 out of 4 (4 easyconfigs in total)
node4012.donphan.os - Linux RHEL 8.8, x86_64, Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz, 1 x NVIDIA NVIDIA A2, 545.23.08, Python 3.6.8
See https://gist.github.com/pavelToman/9af021d84b21f81486d83b8f0916b27e for a full test report.

@pavelToman
Collaborator Author

@boegelbot please test @ jsc-zen3 EB_ARGS="protobuf-3.21.9-GCCcore-12.3.0.eb"

@boegelbot
Collaborator

@pavelToman: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=22217 EB_ARGS="protobuf-3.21.9-GCCcore-12.3.0.eb" EB_CONTAINER= EB_REPO=easybuild-easyconfigs EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_22217 --ntasks=8 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 5610

Test results coming soon (I hope)...

Details

- notification for comment with ID 2618551104 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 1 out of 1 (1 easyconfigs in total)
jsczen3c1.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.5, x86_64, AMD EPYC-Milan Processor (zen3), Python 3.9.21
See https://gist.github.com/boegelbot/ba284be51459c57c09b15d3a98909af7 for a full test report.

@pavelToman
Collaborator Author

@boegelbot please test @ jsc-zen3

@boegelbot
Collaborator

@pavelToman: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=22217 EB_ARGS= EB_CONTAINER= EB_REPO=easybuild-easyconfigs EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_22217 --ntasks=8 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 5612

Test results coming soon (I hope)...

Details

- notification for comment with ID 2618582441 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 4 out of 4 (4 easyconfigs in total)
jsczen3c1.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.5, x86_64, AMD EPYC-Milan Processor (zen3), Python 3.9.21
See https://gist.github.com/boegelbot/3ab086c93624216fcec5fb5fe5ee940d for a full test report.

@github-actions github-actions Bot added update and removed new labels Jan 28, 2025
@pavelToman pavelToman changed the title from "{ai}[foss/2023a] DeePDM-kit v3.0.1 (+ version with LAMMPS plugin) , Horovod v0.28.1 w/ TensorFlow 2.15.1" to "{ai}[foss/2023a] DeePDM-kit v3.0.1 (+ version with LAMMPS plugin) , Horovod v0.28.1 w/ TensorFlow 2.13.0" on Jan 28, 2025
@pavelToman
Collaborator Author

@boegelbot please test @ jsc-zen3

@pavelToman pavelToman added new and removed update labels Jan 28, 2025
@boegelbot
Collaborator

@pavelToman: Request for testing this PR well received on jsczen3l1.int.jsc-zen3.fz-juelich.de

PR test command 'if [[ develop != 'develop' ]]; then EB_BRANCH=develop ./easybuild_develop.sh 2> /dev/null 1>&2; EB_PREFIX=/home/boegelbot/easybuild/develop source init_env_easybuild_develop.sh; fi; EB_PR=22217 EB_ARGS= EB_CONTAINER= EB_REPO=easybuild-easyconfigs EB_BRANCH=develop /opt/software/slurm/bin/sbatch --job-name test_PR_22217 --ntasks=8 ~/boegelbot/eb_from_pr_upload_jsc-zen3.sh' executed!

  • exit code: 0
  • output:
Submitted batch job 5616

Test results coming soon (I hope)...

Details

- notification for comment with ID 2618922698 processed

Message to humans: this is just bookkeeping information for me,
it is of no use to you (unless you think I have a bug, which I don't).

@boegelbot
Collaborator

Test report by @boegelbot
SUCCESS
Build succeeded for 4 out of 4 (4 easyconfigs in total)
jsczen3c1.int.jsc-zen3.fz-juelich.de - Linux Rocky Linux 9.5, x86_64, AMD EPYC-Milan Processor (zen3), Python 3.9.21
See https://gist.github.com/boegelbot/05f6700983179333450a9734d3cac00b for a full test report.

@pavelToman
Collaborator Author

Test report by @pavelToman
SUCCESS
Build succeeded for 4 out of 4 (4 easyconfigs in total)
node4012.donphan.os - Linux RHEL 8.8, x86_64, Intel(R) Xeon(R) Gold 6240 CPU @ 2.60GHz, 1 x NVIDIA NVIDIA A2, 545.23.08, Python 3.6.8
See https://gist.github.com/pavelToman/29bb0cdb2e6ee33a4e80b6c2e5672a8c for a full test report.

'easyblock': 'PythonPackage',
'source_urls': ['https://pypi.python.org/packages/source/d/deepmd-kit/'],
'sources': ['deepmd_kit-%(version)s.tar.gz'],
'preinstallopts': "module swap protobuf/3.21.9-GCCcore-12.3.0 && ",
Member

I would love to find a better way of dealing with this, since this is quite hackish.
It assumes that the default module naming scheme is used, for example, and it also assumes that this module is already installed (since it's not listed in dependencies or builddependencies).

Would a new framework feature like swap_dependencies make sense here, perhaps?
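To illustrate the naming-scheme assumption mentioned above, here is a rough sketch comparing the hardcoded module name with what a flat and a hierarchical scheme would produce. Both functions are simplified stand-ins written for this example, not EasyBuild's actual module naming scheme classes:

```python
def flat_module_name(name, version, toolchain, versionsuffix=''):
    """Flat 'name/version-toolchain' style, matching the hardcoded swap command."""
    if toolchain['name'] == 'system':
        tc_suffix = ''
    else:
        tc_suffix = '-%s-%s' % (toolchain['name'], toolchain['version'])
    return '%s/%s%s%s' % (name, version, tc_suffix, versionsuffix)


def hierarchical_short_name(name, version, toolchain, versionsuffix=''):
    """Hierarchical style: the toolchain lives in the module path, not in the name."""
    return '%s/%s%s' % (name, version, versionsuffix)


toolchain = {'name': 'GCCcore', 'version': '12.3.0'}
hardcoded = 'protobuf/3.21.9-GCCcore-12.3.0'

flat = flat_module_name('protobuf', '3.21.9', toolchain)
hier = hierarchical_short_name('protobuf', '3.21.9', toolchain)

print(flat == hardcoded)  # the hardcoded name only matches the flat scheme
print(hier == hardcoded)
```

Under these assumptions, a site using a hierarchical scheme would have no module named `protobuf/3.21.9-GCCcore-12.3.0`, so the hardcoded swap would fail there.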

Contributor

@boegel is this feature available now? I found framework PR easybuilders/easybuild-framework#1506. What needs to change in this PR for it to get merged? Someone else has also expressed interest in having the DeePMD plugin with LAMMPS, which is stuck behind this PR. See #24733.

Member

easybuilders/easybuild-framework#1506 has been included since EasyBuild v2.5.0, so that's quite old. It may be helpful in this context though, but more work is needed.

The idea would be to be able to specify something like this in an easyconfig file:

swap_dependencies = [('protobuf', '3.21.9')]

EasyBuild framework would then pick up on this to do a module swap after the modules for toolchain and all dependencies have been loaded (so picked up in the prepare method of the Toolchain class).

In this particular easyconfig it would replace the hardcoded module swap command:

components = [
    ('deepmd', version, {
        'easyblock': 'PythonPackage',
        'source_urls': ['https://pypi.python.org/packages/source/d/deepmd-kit/'],
        'sources': ['deepmd_kit-%(version)s.tar.gz'],
        'swap_dependencies': [('protobuf', '3.21.9')],
        'use_pip': True,
        'start_dir': 'deepmd_kit-%(version)s',
        'checksums': ['10d4443c6fe31a9a4573ed6eda73b6a669dae572cf2bc43f45e9a63aaae02cff'],
    }),

I think making it work for components is easy once the top-level support for swap_dependencies is in place, but it may need a small change to the Bundle easyblock (not sure, didn't check).
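A rough sketch of the resolution step such a feature might perform once the toolchain and dependency modules are loaded. Everything here is hypothetical: `swap_dependencies` is only proposed in this thread, and `plan_module_swaps`, `flat_name` and the loaded-module list are illustrative names rather than EasyBuild framework API:

```python
def plan_module_swaps(loaded_modules, swap_dependencies, full_module_name):
    """Return (old, new) module pairs for a hypothetical 'swap_dependencies' setting.

    loaded_modules:    full names of modules loaded for toolchain and dependencies
    swap_dependencies: list of (name, version) tuples, as in the proposal above
    full_module_name:  callable mapping (name, version) to a full module name,
                       standing in for whichever module naming scheme is active
    """
    swaps = []
    for name, version in swap_dependencies:
        new_mod = full_module_name(name, version)
        # look for a currently loaded module with the same software name
        old_mods = [mod for mod in loaded_modules if mod.split('/', 1)[0] == name]
        if not old_mods:
            raise ValueError("no loaded module found for '%s', nothing to swap" % name)
        if old_mods[0] != new_mod:
            swaps.append((old_mods[0], new_mod))
    return swaps


def flat_name(name, version):
    """Assumed flat naming with a fixed GCCcore 12.3.0 toolchain suffix."""
    return '%s/%s-GCCcore-12.3.0' % (name, version)


# Hypothetical set of modules loaded before the swap
loaded = ['Python/3.11.3-GCCcore-12.3.0', 'protobuf/24.0-GCCcore-12.3.0']
swaps = plan_module_swaps(loaded, [('protobuf', '3.21.9')], flat_name)
for old, new in swaps:
    print('module swap %s %s' % (old, new))
```

Passing the naming function in as a parameter keeps the module name out of the easyconfig, which is the point of the proposal: the easyconfig would state only the software name and version.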

Member

@pavelToman Are you up for trying to implement support for swap_dependencies yourself in EasyBuild framework, so we can get this across the finish line?

@boegel boegel self-assigned this May 20, 2025
@pavelToman pavelToman changed the title from "{ai}[foss/2023a] DeePDM-kit v3.0.1 (+ version with LAMMPS plugin) , Horovod v0.28.1 w/ TensorFlow 2.13.0" to "{ai}[foss/2023a] DeePMD-kit v3.0.1 (+ version with LAMMPS plugin) , Horovod v0.28.1 w/ TensorFlow 2.13.0" on Jun 13, 2025
@@ -0,0 +1,113 @@
easyblock = 'PythonBundle'

name = 'DeePMD-kit'
Contributor

@laraPPr laraPPr Jun 20, 2025

This config is not working: running lmp -h | grep deepmd returns nothing. LAMMPS must be compiled with DeePMD-kit for this to work. This easyconfig should be removed. See #23166 for LAMMPS with the deepmd plugin.
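The check described above can be scripted. The following is a minimal sketch that assumes the LAMMPS binary is called `lmp` and is on `PATH`; the helper names `mentions_deepmd` and `lmp_has_deepmd` are made up for this example:

```python
import shutil
import subprocess


def mentions_deepmd(help_text):
    """Return True if any line of the LAMMPS help output contains 'deepmd'."""
    return any('deepmd' in line for line in help_text.splitlines())


def lmp_has_deepmd(lmp_cmd='lmp'):
    """Run '<lmp_cmd> -h' and scan its output; return False if the binary is missing."""
    if shutil.which(lmp_cmd) is None:
        return False
    result = subprocess.run([lmp_cmd, '-h'], capture_output=True, text=True, check=False)
    return mentions_deepmd(result.stdout)


if __name__ == '__main__':
    print('deepmd available in lmp:', lmp_has_deepmd())
```

Like `grep deepmd`, the match is case-sensitive. In an easyconfig, the same check could presumably be expressed as an entry in `sanity_check_commands`, so that an installation without the plugin fails its sanity check.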

Collaborator Author

What about installing this updated LAMMPS-with-deepmd-plugin as a component in this EC?

Collaborator Author

DeePMD needs LAMMPS sources to be built anyway, so I will try to add another LAMMPS component, which will install LAMMPS-with-deepmd-plugin.

Contributor

Yes, going that way might be good as well, because the docs seem to indicate that /bin/lmp should be in the DeePMD root directory.

Contributor

Not sure if that is possible, because I get this error when I try to use the lammps easyblock. It seems that with components you can only use generic easyblocks.

ERROR: Failed to get application instance for DeePMD-kit (easyblock: PythonBundle): Failed to obtain class for lammps easyblock (not available?): No module named 'easybuild.easyblocks.generic.lammps'

@github-actions github-actions Bot added the new label Jun 25, 2025
@laraPPr laraPPr changed the title from "{ai}[foss/2023a] DeePMD-kit v3.0.1 (+ version with LAMMPS plugin) , Horovod v0.28.1 w/ TensorFlow 2.13.0" to "{ai}[foss/2023a] DeePMD-kit v3.0.1, Horovod v0.28.1 w/ TensorFlow 2.13.0" on Jun 26, 2025
@Thyre Thyre added the 2023a label Aug 18, 2025
Development

Successfully merging this pull request may close these issues.

DeePMD-kit

6 participants