Document and scripts for OpenShift deployment #4679

Merged: IsaacYangSLA merged 13 commits into NVIDIA:main from IsaacYangSLA:final_oc on Jun 2, 2026

Conversation

@IsaacYangSLA

Collaborator

Description

README and user guide describing how to deploy nvflare to OpenShift.
The following scripts are included and described in the user guides.

  • provisioning nvflare with one server, two clients (site-1 and site-2), and one admin
  • deploying all participants to OpenShift to form nvflare system
  • submitting one job to the nvflare system
  • monitoring the live pods in the OpenShift cluster

For users without an existing OpenShift cluster, the following two scripts are also included and described in the user guide.

  • creating Red Hat OpenShift Local cluster
  • starting Red Hat OpenShift Local cluster

One small update in k8s job launcher to support different token formats.

Types of changes

  • Non-breaking change (fix or new feature that would not break existing functionality).
  • Breaking change (fix or new feature that would cause existing functionality to change).
  • New tests added to cover the changes.
  • Quick tests passed locally by running ./runtest.sh.
  • In-line docstrings updated.
  • Documentation updated.

Copilot AI review requested due to automatic review settings May 22, 2026 17:59

Copilot AI left a comment

Contributor

Pull request overview

Adds a full OpenShift deployment guide (plus helper scripts) to the admin deployment docs, and includes a small compatibility tweak to the Kubernetes job launcher to handle newer Kubernetes Python client token key behavior.

Changes:

  • Documented an end-to-end OpenShift workflow (provision → deploy → submit job → monitor) under docs/user_guide/admin_guide/deployment/openshift/.
  • Added helper scripts for CRC (OpenShift Local) cluster creation/start, scripted NVFlare provisioning/deploy/job submission, and a Rich-based live pod monitor.
  • Updated K8sJobLauncher to mirror an in-cluster token from authorization to BearerToken for newer Kubernetes Python client behavior.
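
A minimal sketch of what such a token-key shim could look like, written against a plain dict so it runs without a cluster. The helper name `mirror_bearer_token` is hypothetical, and the key names `authorization` and `BearerToken` are taken from the summary above; the actual change in `k8s_launcher.py` may differ.

```python
def mirror_bearer_token(api_key: dict) -> dict:
    """Copy the 'authorization' entry to 'BearerToken' when only the former is set.

    Hypothetical helper: assumes api_key behaves like the api_key dict held by
    a Kubernetes Python client Configuration object.
    """
    token = api_key.get("authorization")
    # Only mirror when a token exists and the newer key has not been populated.
    if token and "BearerToken" not in api_key:
        api_key["BearerToken"] = token
    return api_key


if __name__ == "__main__":
    # Placeholder token value, for illustration only.
    print(mirror_bearer_token({"authorization": "bearer abc123"}))
```

Leaving an existing `BearerToken` entry untouched keeps the shim harmless on client versions that already populate the newer key.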

Reviewed changes

Copilot reviewed 14 out of 14 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| nvflare/app_opt/job_launcher/k8s_launcher.py | Adds a small token-key compatibility shim for in-cluster Kubernetes client configuration. |
| docs/user_guide/admin_guide/deployment/openshift/scripts/start_openshift_cluster.sh | Starts CRC, optionally logs in via oc, and prepares/selects the target project/namespace. |
| docs/user_guide/admin_guide/deployment/openshift/scripts/create_openshift_cluster.sh | Configures CRC settings and runs crc setup, optionally delegating to the start script. |
| docs/user_guide/admin_guide/deployment/openshift/scripts/openshift_k8s_common.sh | Shared implementation for provision/deploy/submit phases (workspace prep, PVC staging, helm installs, job submit/wait). |
| docs/user_guide/admin_guide/deployment/openshift/scripts/openshift_k8s_provision.sh | Phase script to generate startup kits via nvflare provision. |
| docs/user_guide/admin_guide/deployment/openshift/scripts/openshift_k8s_deploy.sh | Phase script to run nvflare deploy prepare, stage PVCs, and install generated Helm charts. |
| docs/user_guide/admin_guide/deployment/openshift/scripts/openshift_k8s_submit_job.sh | Phase script to export and submit hello-numpy from an in-cluster admin pod and wait for completion. |
| docs/user_guide/admin_guide/deployment/openshift/scripts/openshift_k8s_watch.sh | Shell wrapper to run the Rich pod watcher with shared env defaults. |
| docs/user_guide/admin_guide/deployment/openshift/scripts/openshift_k8s_watch.py | Implements a Rich live pod table driven by oc/kubectl get pods -o json. |
| docs/user_guide/admin_guide/deployment/openshift/scripts/openshift_k8s_e2e.sh | Convenience script to run provision → deploy → submit sequentially. |
| docs/user_guide/admin_guide/deployment/openshift/scripts/Dockerfile | Provides an OpenShift restricted-SCC-compatible NVFlare image build recipe. |
| docs/user_guide/admin_guide/deployment/openshift/README.md | Describes the OpenShift docs/scripts layout and quickstart entry points. |
| docs/user_guide/admin_guide/deployment/openshift/index.rst | New OpenShift deployment guide page (CRC setup, image requirements, scripted and manual workflows). |
| docs/user_guide/admin_guide/deployment/index.rst | Adds the OpenShift guide to the deployment guide TOC. |
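
To illustrate how a pod watcher can be driven by `oc/kubectl get pods -o json`, here is a small sketch that turns that JSON into table rows. It is not the code in `openshift_k8s_watch.py`: the function names, the row fields, and the sample document are assumptions, and the Rich rendering is left out so the block runs with the standard library alone.

```python
import json
from datetime import datetime, timezone

# Hypothetical sample shaped like `kubectl get pods -o json` output.
SAMPLE = """
{"items": [
  {"metadata": {"name": "server", "creationTimestamp": "2026-05-22T17:59:00Z"},
   "status": {"phase": "Running",
              "containerStatuses": [{"ready": true, "restartCount": 2}]}},
  {"metadata": {"name": "site-1"}, "status": null}
]}
"""


def parse_ts(value):
    """Parse a Kubernetes UTC timestamp; return None if missing or malformed."""
    try:
        parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    except (TypeError, ValueError):
        return None
    return parsed.replace(tzinfo=timezone.utc)


def summarize_pods(raw_json):
    """Build one row per pod, tolerating null or missing sections."""
    doc = json.loads(raw_json) or {}
    rows = []
    for item in doc.get("items") or []:
        meta = item.get("metadata") or {}
        status = item.get("status") or {}
        containers = status.get("containerStatuses") or []
        ready = sum(1 for c in containers if c.get("ready"))
        rows.append({
            "name": meta.get("name", "<unknown>"),
            "phase": status.get("phase", "Unknown"),
            "ready": f"{ready}/{len(containers)}",
            "restarts": sum(c.get("restartCount", 0) for c in containers),
            "created": parse_ts(meta.get("creationTimestamp")),
        })
    return rows


if __name__ == "__main__":
    for row in summarize_pods(SAMPLE):
        print(row)
```

The `or {}` and `or []` guards matter because a pod that has not been scheduled yet can report null or absent status fields.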

Comment thread docs/user_guide/admin_guide/deployment/openshift/scripts/Dockerfile Outdated
@greptile-apps

greptile-apps Bot commented May 22, 2026

Contributor

Greptile Summary

This PR adds OpenShift deployment documentation, helper shell scripts, and a Python watch utility for running NVFlare on OpenShift/Kubernetes. A targeted kubernetes==36.0.0 exclusion in setup.cfg addresses a known client-library regression, and the k8s launcher tests are extended to cover the in-cluster config code path.

  • New scripts (k8s_common.sh, k8s_deploy.sh, k8s_e2e.sh, k8s_provision.sh, k8s_submit_job.sh, k8s_watch.py, k8s_watch.sh, start_openshift_cluster.sh, create_openshift_cluster.sh, cleanup_openshift_cluster.sh) implement a full end-to-end OpenShift workflow: provision, deploy, submit, and monitor.
  • setup.cfg pins kubernetes!=36.0.0 to skip the broken release, and tests validate that config_file_path=None correctly triggers load_incluster_config.
  • Documentation (openshift.rst, index.md, README.md) is added to guide users through the workflow.
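
The in-cluster branch mentioned above can be sketched as follows. This is an illustration rather than the launcher's code: `load_cluster_config` and `check_incluster_branch` are hypothetical names, and the two loaders are injected as callables (standing in for `kubernetes.config.load_incluster_config` and `kubernetes.config.load_kube_config`) so the example runs without the Kubernetes client installed.

```python
from unittest import mock


def load_cluster_config(config_file_path, load_incluster, load_kube):
    """Pick the config loader: in-cluster when no kubeconfig path is given."""
    if config_file_path is None:
        load_incluster()
    else:
        load_kube(config_file=config_file_path)


def check_incluster_branch():
    """Mock-based check in the spirit of the test described above."""
    load_incluster = mock.Mock()
    load_kube = mock.Mock()
    load_cluster_config(None, load_incluster, load_kube)
    load_incluster.assert_called_once_with()
    load_kube.assert_not_called()
    return True


if __name__ == "__main__":
    print(check_incluster_branch())
```

Injecting the loaders keeps the check independent of which client release is installed, which is convenient when one release is being excluded.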

Confidence Score: 5/5

Safe to merge. The Python code change is a targeted version exclusion with a matching test; the new scripts and documentation do not touch any existing runtime paths.

The only runtime code change is excluding kubernetes==36.0.0 in setup.cfg and adding a test for the already-existing in-cluster config branch. All other additions are new deployment scripts and documentation that are opt-in and have no effect on existing functionality.

examples/devops/openshift/scripts/k8s_common.sh — the tarfile extraction call and the python-path comparison are worth a second look before these scripts are used in production or CI.

Important Files Changed

| Filename | Overview |
| --- | --- |
| examples/devops/openshift/scripts/k8s_common.sh | Core shared library for all OpenShift scripts (~846 lines); well-structured with good path traversal protection in copy_dir_to_admin_pod, but uses tarfile.extract without a filter parameter (Python 3.12+ deprecation) and has a misleading python-path comparison in write_prepare_config. |
| setup.cfg | Excludes kubernetes==36.0.0 to avoid the known API-key header regression; change is minimal and correct. |
| tests/unit_test/app_opt/job_launcher/k8s_launcher_test.py | Adds a well-structured test verifying load_incluster_config is called when config_file_path=None; makes config_file_path parametric in the shared helper. |
| examples/devops/openshift/scripts/k8s_watch.py | Rich-based live pod monitor; handles missing rich gracefully, correctly wraps all dict accesses with or-empty-dict guards, and parses timestamps safely. |
| examples/devops/openshift/scripts/k8s_e2e.sh | Orchestrates the three-phase workflow (provision → deploy → submit) by invoking individual phase scripts; correctly sets CLEAN_WORK_DIR=false for the second and third phases. |
| tests/unit_test/tool/deploy/deploy_commands_test.py | New test verifies config_file_path defaults to None (in-cluster config) when no config_file_path is provided in the prepare config; straightforward and correct. |
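
For the tarfile note on k8s_common.sh, one possible way to extract with an explicit filter is sketched below. The `safe_extract` helper is hypothetical and is not the code embedded in the script; it uses the standard library `filter="data"` option where available and falls back to a manual path check on interpreters that predate extraction filters.

```python
import tarfile
from pathlib import Path


def safe_extract(tar_path, dest):
    """Extract an archive while rejecting members that escape dest."""
    dest = Path(dest)
    dest.mkdir(parents=True, exist_ok=True)
    with tarfile.open(tar_path) as tar:
        if hasattr(tarfile, "data_filter"):
            # Extraction filters are available: let the stdlib reject
            # absolute paths, '..' traversal, and special files.
            tar.extractall(dest, filter="data")
        else:
            # Fallback for older interpreters: check each target path manually.
            root = dest.resolve()
            for member in tar.getmembers():
                target = (root / member.name).resolve()
                if target != root and root not in target.parents:
                    raise ValueError(f"unsafe path in archive: {member.name}")
            tar.extractall(dest)
```

Passing the filter explicitly also avoids the deprecation warning that Python 3.12 and later emit when no filter is given.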

Sequence Diagram

sequenceDiagram
    participant User
    participant k8s_e2e.sh
    participant k8s_provision.sh
    participant k8s_deploy.sh
    participant k8s_submit_job.sh
    participant OpenShift_K8s

    User->>k8s_e2e.sh: "IMAGE=... bash k8s_e2e.sh"
    k8s_e2e.sh->>k8s_provision.sh: "CLEAN_WORK_DIR=true"
    k8s_provision.sh->>k8s_provision.sh: write_project_file()
    k8s_provision.sh->>k8s_provision.sh: nvflare provision
    k8s_provision.sh-->>k8s_e2e.sh: prod_00/ ready
    k8s_e2e.sh->>k8s_deploy.sh: "CLEAN_WORK_DIR=false"
    k8s_deploy.sh->>k8s_deploy.sh: nvflare deploy prepare (each participant)
    k8s_deploy.sh->>OpenShift_K8s: create namespace + PVCs
    k8s_deploy.sh->>OpenShift_K8s: stage workspace via copy pods
    k8s_deploy.sh->>OpenShift_K8s: helm upgrade --install (server + clients)
    OpenShift_K8s-->>k8s_deploy.sh: rollout status OK
    k8s_e2e.sh->>k8s_submit_job.sh: "CLEAN_WORK_DIR=false"
    k8s_submit_job.sh->>k8s_submit_job.sh: export_hello_numpy_job + patch_job_launcher_spec
    k8s_submit_job.sh->>OpenShift_K8s: launch admin pod
    k8s_submit_job.sh->>OpenShift_K8s: nvflare job submit (in-cluster)
    OpenShift_K8s->>OpenShift_K8s: K8sJobLauncher (load_incluster_config) spawns job pods
    OpenShift_K8s-->>k8s_submit_job.sh: job FINISHED:COMPLETED
    k8s_submit_job.sh-->>k8s_e2e.sh: done
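
The phase ordering in the diagram above can be sketched in Python as follows. The script names and `CLEAN_WORK_DIR` values come from the diagram; the `run_phases` function, its parameters, and the use of `subprocess.run` are assumptions for illustration, since the real orchestration lives in the shell script `k8s_e2e.sh`.

```python
import os
import subprocess

# Phase scripts and per-phase environment overrides, as shown in the diagram.
PHASES = [
    ("k8s_provision.sh", {"CLEAN_WORK_DIR": "true"}),
    ("k8s_deploy.sh", {"CLEAN_WORK_DIR": "false"}),
    ("k8s_submit_job.sh", {"CLEAN_WORK_DIR": "false"}),
]


def run_phases(script_dir, runner=subprocess.run, base_env=None):
    """Run each phase in order; check=True stops the chain on the first failure."""
    env0 = dict(os.environ if base_env is None else base_env)
    for script, overrides in PHASES:
        env = {**env0, **overrides}
        runner(["bash", os.path.join(script_dir, script)], env=env, check=True)
```

Only the first phase cleans the work directory, so the startup kits produced by provisioning remain available to the deploy and submit phases.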

Reviews (17): Last reviewed commit: "Merge branch 'main' into final_oc"

Comment thread nvflare/app_opt/job_launcher/k8s_launcher.py Outdated
Comment thread nvflare/app_opt/job_launcher/k8s_launcher.py Outdated

@YuanTingHsieh YuanTingHsieh left a comment

Collaborator

overall LGTM

Comment thread setup.cfg Outdated
Comment thread docs/user_guide/admin_guide/deployment/openshift/index.rst Outdated
Comment thread docs/user_guide/admin_guide/deployment/openshift/scripts/openshift_k8s_common.sh Outdated
Comment thread docs/user_guide/admin_guide/deployment/openshift/scripts/openshift_k8s_common.sh Outdated

Copilot AI left a comment

Contributor

Pull request overview

Copilot reviewed 22 out of 22 changed files in this pull request and generated 4 comments.

Comment thread examples/devops/openshift/scripts/k8s_watch.py
Comment thread docs/user_guide/admin_guide/deployment/openshift/index.rst Outdated
Comment thread docs/user_guide/admin_guide/deployment/openshift/index.rst Outdated
Comment thread docs/user_guide/admin_guide/deployment/openshift/scripts/openshift_k8s_common.sh Outdated
Comment thread examples/devops/openshift/scripts/k8s_deploy.sh
Comment thread examples/devops/openshift/scripts/create_openshift_cluster.sh
Comment thread docs/user_guide/admin_guide/deployment/openshift/scripts/Dockerfile Outdated
Comment thread docs/user_guide/admin_guide/deployment/openshift/scripts/openshift_k8s_deploy.sh Outdated
Comment thread docs/user_guide/admin_guide/deployment/openshift/index.rst Outdated
Comment thread docs/user_guide/admin_guide/deployment/openshift/index.rst Outdated
Comment thread docs/user_guide/admin_guide/deployment/openshift/index.rst Outdated
Comment thread examples/devops/openshift/scripts/k8s_provision.sh

@chesterxgchen chesterxgchen left a comment

Collaborator

The PR adds too many wrapper scripts under the scripts folder.
Users will have to understand each script and learn what it does. Remember, users are not interested in just running your scripts and seeing them work. They are here to learn the basic commands they can use for their own deployment.

Wrapping the nvflare commands in these scripts creates a new barrier to learning. Many of the wrapper scripts are just conveniences.

Try to remove or minimize these wrapper scripts to expose the base instructions for users.

@IsaacYangSLA IsaacYangSLA enabled auto-merge (squash) June 2, 2026 21:12
@IsaacYangSLA IsaacYangSLA merged commit a5673d6 into NVIDIA:main Jun 2, 2026
16 checks passed
@IsaacYangSLA IsaacYangSLA deleted the final_oc branch June 3, 2026 16:00

5 participants