Merged
1 change: 1 addition & 0 deletions docs/index.rst
@@ -84,6 +84,7 @@ NVIDIA FLARE
Deploy Prepare <user_guide/nvflare_cli/deploy_command>
Running FLARE in Docker <user_guide/admin_guide/deployment/containerized_deployment>
Running FLARE in Kubernetes <user_guide/admin_guide/deployment/helm_chart>
Deploying FLARE on OpenShift <user_guide/admin_guide/deployment/openshift>
Brev Scripted Deployment Quickstart <user_guide/admin_guide/deployment/brev_scripted_deployment>
Brev Kubernetes Helm Deployment <user_guide/admin_guide/deployment/brev_deployment>
Preflight Check <user_guide/nvflare_cli/preflight_check>
16 changes: 12 additions & 4 deletions docs/user_guide/admin_guide/deployment/brev_deployment.rst
@@ -302,11 +302,15 @@ Dockerfile, install the dependency in the image:

.. code-block:: dockerfile

RUN pip install kubernetes
RUN pip install "kubernetes!=36.0.0"

The repository ``docker/Dockerfile.parent`` already installs the NVFlare
``K8S`` extra, which includes this dependency. Keep that install line, or add
the explicit ``pip install kubernetes`` line above before building your image.
the explicit ``pip install "kubernetes!=36.0.0"`` line above before building your image.

The prepared Brev launcher uses in-cluster Kubernetes config
(``job_launcher.config_file_path: null``), so the parent pod authenticates with
its ServiceAccount token.

.. code-block:: shell

@@ -576,7 +580,9 @@ in that namespace.
Copy the prepared server ``startup/`` and ``local/`` directories into the
``nvflws`` PVC. The chart starts the server with
``-m /var/tmp/nvflare/workspace``, so the PVC root must contain ``startup/``
and ``local/`` directly.
and ``local/`` directly. The temporary copy pod image must contain ``tar``
because ``kubectl cp`` requires it in the target container; ``busybox:1.36``
includes ``tar``.

.. code-block:: shell

@@ -701,7 +707,9 @@ same launcher settings from ``/tmp/nvflare-k8s.yaml``. Keep the Helm namespace
consistent with the ``namespace`` value used by ``nvflare deploy prepare``.

Copy the prepared ``site-1`` ``startup/`` and ``local/`` directories into the
client ``nvflws`` PVC:
client ``nvflws`` PVC. The temporary copy pod image must contain ``tar``
because ``kubectl cp`` requires it in the target container; ``busybox:1.36``
includes ``tar``:

.. code-block:: shell

@@ -94,6 +94,11 @@ if prepared_namespace != namespace:
f"Prepared launcher namespace is {prepared_namespace!r}, but launch NAMESPACE is {namespace!r}. "
"Use the same NAMESPACE for prepare and launch."
)
config_file_path = args.get("config_file_path")
if config_file_path not in (None, ""):
raise SystemExit(
f"Prepared launcher config_file_path is {config_file_path!r}; expected null/empty for in-cluster config."
)
if not args.get("workspace_mount_path"):
raise SystemExit("k8s_launcher args missing workspace_mount_path")

@@ -197,6 +202,7 @@ spec:
restartPolicy: Never
containers:
- name: copy
# kubectl cp requires tar in the target container; busybox includes it.
image: busybox:1.36
command:
- sh
@@ -298,6 +304,14 @@ install_chart() {
helm "${helm_args[@]}"
}

verify_parent_kubernetes_client() {
kubectl -n "${NAMESPACE}" exec "deploy/${PARTICIPANT}" -- "${PARENT_PYTHON_PATH}" -c '
import kubernetes

print(f"kubernetes-python-client={kubernetes.__version__}")
'
}

if [[ "${1:-}" == "-h" || "${1:-}" == "--help" ]]; then
usage
exit 0
@@ -318,6 +332,7 @@ ARCHIVE="${ARCHIVE:-${HOME}/nvflare-${PARTICIPANT}.tgz}"
COPY_POD="${COPY_POD:-nvflare-pvc-copy}"
ROLLOUT_TIMEOUT="${ROLLOUT_TIMEOUT:-300s}"
LOG_TAIL="${LOG_TAIL:-100}"
PARENT_PYTHON_PATH="${PARENT_PYTHON_PATH:-/usr/local/bin/python3}"

require_cmd kubectl helm tar python3
[[ -f "${ARCHIVE}" ]] || fail "Archive not found: ${ARCHIVE}"
@@ -361,6 +376,7 @@ fi
install_chart

kubectl -n "${NAMESPACE}" rollout status "deployment/${PARTICIPANT}" --timeout="${ROLLOUT_TIMEOUT}"
verify_parent_kubernetes_client
kubectl -n "${NAMESPACE}" get pods
kubectl -n "${NAMESPACE}" logs "deploy/${PARTICIPANT}" --tail="${LOG_TAIL}" || true

17 changes: 13 additions & 4 deletions docs/user_guide/admin_guide/deployment/helm_chart.rst
@@ -13,8 +13,8 @@ kits and then preparing each server or client kit for the Kubernetes runtime.
The prepared kit contains a participant-specific Helm chart plus the
``startup/`` and ``local/`` folders that must be staged into Kubernetes storage.

For example scripts that automate temporary Kubernetes and managed cloud cluster
testing flows, see
For example scripts that automate temporary Kubernetes, OpenShift, and managed
cloud cluster testing flows, see
:github_nvflare_link:`examples/devops <examples/devops>`. These scripts are
for development, smoke testing, demos, and learning only; they are not
production deployment guidance.
@@ -28,6 +28,9 @@ Before you start, make sure you have:
``nvflare deploy prepare``.
* ``kubectl`` configured for the target cluster. Use a ``kubectl`` version that
is compatible with the Kubernetes API server.
* ``tar`` installed locally and in any temporary pod image used with
``kubectl cp``. The staging examples below use ``busybox:1.36``, which
includes ``tar``.
* Helm 3.
* A Kubernetes cluster with standard ``apps/v1`` Deployment,
``rbac.authorization.k8s.io/v1`` Role/RoleBinding, Service, Secret, and PVC
@@ -94,6 +97,9 @@ The generated Helm chart does not run submitted jobs directly. It installs the
parent participant process, its Kubernetes Service, its ServiceAccount, and the
Role/RoleBinding that allow the launcher to create job pods.

When ``job_launcher.config_file_path`` is omitted or set to ``null``, the
launcher uses Kubernetes in-cluster config from the parent pod's ServiceAccount.

The parent Service is the stable in-cluster address for dynamically launched job
pods. ``nvflare deploy prepare`` patches the prepared kit's internal
communication settings to use the generated Service name and ``parent_port``.
@@ -315,7 +321,9 @@ mounts ``parent.workspace_pvc`` at ``parent.workspace_mount_path``, but it does
not upload files to the PVC. Copy the prepared kit's ``startup/`` and
``local/`` directories into the root of that workspace PVC before installing the
chart. For server kits, also create or copy ``transfer/`` at the workspace root
for admin file-transfer storage.
for admin file-transfer storage. If you use ``kubectl cp`` as shown below, the
temporary copy pod image must contain ``tar`` because ``kubectl cp`` requires it
in the target container.

Example ``workspace-pvc.yaml``:

@@ -977,7 +985,8 @@ Check the parent logs for Kubernetes import or authorization failures:
--as=system:serviceaccount:"$NAMESPACE":server

If the logs show that the ``kubernetes`` Python package is missing, rebuild the
parent image with the NVFlare ``K8S`` extra or ``pip install kubernetes``.
parent image with the NVFlare ``K8S`` extra or
``pip install "kubernetes!=36.0.0"``.

If the logs show ``SSLCertVerificationError`` with
``CA cert does not include key usage extension``, the parent Kubernetes client
1 change: 1 addition & 0 deletions docs/user_guide/admin_guide/deployment/index.rst
@@ -13,6 +13,7 @@ Deployment Guide
operation
containerized_deployment
helm_chart
openshift
brev_scripted_deployment
brev_deployment
cloud_deployment
31 changes: 31 additions & 0 deletions docs/user_guide/admin_guide/deployment/openshift.rst
@@ -0,0 +1,31 @@
.. _openshift_k8s_deployment:

##############################
Deploying FLARE on OpenShift
##############################

The OpenShift deployment guide and helper scripts now live in the DevOps
examples directory:

``examples/devops/openshift``

Where to Start
==============

Open ``examples/devops/openshift/README.md`` first for a concise folder
overview. It lists the Dockerfiles, helper scripts, and typical quickstart
commands.

Open ``examples/devops/openshift/index.md`` for the full OpenShift deployment
guide. That source document covers prerequisites, image requirements, the
scripted workflow, manual deployment steps, OpenShift SCC notes,
troubleshooting, and cleanup.

Run the scripts from the NVFlare repository root. For example:

.. code-block:: bash

bash examples/devops/openshift/scripts/k8s_e2e.sh

The OpenShift example builds on the generic Kubernetes deployment runtime. See
:ref:`helm_chart` for the Kubernetes Helm chart workflow and runtime details.
3 changes: 2 additions & 1 deletion docs/user_guide/nvflare_cli/deploy_command.rst
@@ -176,7 +176,8 @@ Top-level keys:
``job_launcher`` keys:

- ``config_file_path``: kubeconfig path used by ``K8sJobLauncher``. Use
``null`` for in-cluster config.
``null`` for in-cluster config, where the Kubernetes Python client uses the
pod's ServiceAccount token.
- ``pending_timeout``: seconds to wait for a job pod to leave ``Pending``.
- ``default_python_path``: Python executable used in job pods unless a job
overrides it with ``launcher_spec[site]["k8s"]["python_path"]``. Defaults to
1 change: 1 addition & 0 deletions examples/README.md
@@ -186,5 +186,6 @@ When you open a notebook, select the kernel `nvflare_example` using the dropdown
| Example | Framework | Summary |
|-------------------------------------------------------------|-----------|--------------------------------------------------------------------------------------------------------------------------|
| [Docker Job Launcher](./docker/README.md) | NA | End-to-end Docker runtime example using `nvflare deploy prepare` and per-job Docker containers. |
| [OpenShift Deployment](./devops/openshift/README.md) | NA | OpenShift-specific deployment guide and helper scripts using the Kubernetes runtime support. |
| [DevOps Deployment Examples](./devops/README.md) | NA | Test-only helper scripts for trying NVFlare deployment flows on Kubernetes and managed cloud clusters; not production deployment guidance. |
| [Monitoring](./advanced/monitoring/README.md) | NA | FLARE Monitoring provides an initial solution for tracking system metrics of your federated learning jobs. |
15 changes: 9 additions & 6 deletions examples/devops/README.md
@@ -1,8 +1,8 @@
# NVFlare DevOps Examples

This directory contains example scripts for quickly testing NVFlare deployment
flows on Kubernetes and managed cloud clusters. They are intended for local
development, smoke testing, demos, and learning.
flows on Kubernetes, OpenShift, and managed cloud clusters. They are intended
for local development, smoke testing, demos, and learning.

These scripts are not production quality. They are not a hardened deployment
blueprint and do not replace site-specific review for security, networking,
@@ -11,10 +11,11 @@ operations.

## Scope

Use these examples to create temporary test clusters, build and push a test
NVFlare image, deploy a small NVFlare system, inspect it, and tear it down.
They assume you already have an NVFlare development environment and the
required cloud CLIs configured for the target accounts or projects.
Use these examples to create or target temporary test clusters, build and push
a test NVFlare image, deploy a small NVFlare system, inspect it, and tear it
down. They assume you already have an NVFlare development environment and the
required Kubernetes, OpenShift, or cloud CLIs configured for the target
clusters, accounts, or projects.

Before running a deployment, copy or edit `examples/devops/multicloud/all-clouds.yaml`
and replace the placeholder image tag, kubeconfig inputs, namespaces, storage
@@ -24,6 +25,8 @@ classes, and participants for the clusters you want to test.

- `multicloud/` - YAML-driven NVFlare deployment, status, dashboard, and image
build/push helpers.
- `openshift/` - OpenShift-specific deployment guide and helper scripts using
the Kubernetes runtime support.
- `gcp/gke/`, `aws/eks/`, `azure/aks/` - cloud cluster setup scripts and notes.
- `examples/devops/.tmp/` - local generated kubeconfigs and state; not intended for
commit.
77 changes: 77 additions & 0 deletions examples/devops/openshift/README.md
@@ -0,0 +1,77 @@
# OpenShift Deployment Helpers

This directory contains the OpenShift-specific NVFlare deployment guide and helper scripts.

- [index.md](index.md) is the detailed OpenShift deployment guide.
- The repository's `docker/Dockerfile.parent` builds the parent image used by the server, client, and admin pods.
- The repository's `docker/Dockerfile.job` builds the workload image used by job pods.
- `scripts/create_openshift_cluster.sh` configures Red Hat OpenShift Local (CRC) and optionally starts it.
- `scripts/start_openshift_cluster.sh` starts CRC, logs in with `oc`, and prepares the target project.
- `scripts/cleanup_openshift_cluster.sh` deletes scripted deployment resources and stops CRC.
- `scripts/k8s_provision.sh` runs `nvflare provision` for the sample server, `site-1`, `site-2`, and admin.
- `scripts/k8s_deploy.sh` prepares K8s startup kits, stages PVC workspaces, installs Helm charts, and verifies parent pods can import the Kubernetes Python client.
- `scripts/k8s_submit_job.sh` submits `hello-numpy` from an in-cluster admin pod and waits for successful completion.
- `scripts/k8s_watch.sh` shows a live, in-place Rich table of the created pods.
- `scripts/k8s_watch.py` implements the Rich table used by the shell wrapper.
- `scripts/k8s_e2e.sh` runs provision, deploy, and submit in order.
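
The import check that `scripts/k8s_deploy.sh` runs inside each parent pod can be approximated with the sketch below. It is not the script's code: the helper names `client_status` and `installed_client_version` are hypothetical, and treating `36.0.0` as excluded is an assumption taken from the `pip install "kubernetes!=36.0.0"` pin in the Kubernetes deployment docs.

```python
import importlib

# Assumption: mirrors the documented pin `pip install "kubernetes!=36.0.0"`.
EXCLUDED_VERSIONS = {"36.0.0"}


def client_status(version):
    """Classify a client version string; None means the package is not installed."""
    if version is None:
        return "missing"
    if version in EXCLUDED_VERSIONS:
        return "excluded"
    return "ok"


def installed_client_version():
    """Return the installed kubernetes client version, or None if it cannot be imported."""
    try:
        module = importlib.import_module("kubernetes")
    except ImportError:
        return None
    return getattr(module, "__version__", "unknown")


if __name__ == "__main__":
    version = installed_client_version()
    print(f"kubernetes-python-client={version} status={client_status(version)}")
```

Running this with the parent image's Python interpreter gives a quick signal before submitting a job; a `missing` or `excluded` status suggests rebuilding the parent image with the `K8S` extra.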

## Create a Local OpenShift Cluster

Use the CRC helper scripts only when you need a single-node Red Hat OpenShift
Local cluster for development or testing. Production OpenShift clusters are
platform-specific; create those with your organization's approved installer or
cloud service workflow, then use the deployment scripts here against that
cluster.

Before using the local-cluster scripts, install Red Hat OpenShift Local so the
`crc` command is available, download your Red Hat OpenShift pull secret from
`https://console.redhat.com/openshift/create/local`, enable host hardware
virtualization, and make sure the host has enough CPU, memory, and disk for
OpenShift plus the NVFlare test pods. The create script defaults to 6 vCPUs,
24576 MiB memory, and 120 GiB disk.
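
A rough host pre-check against those defaults might look like the following sketch. The helper names are hypothetical, the thresholds are the create script's defaults quoted above, and checking free disk under the home directory assumes CRC keeps its data there. Treat the numbers as a lower bound, since the host also needs headroom for its own workloads.

```python
import os
import shutil

# Defaults quoted from this README for create_openshift_cluster.sh.
CRC_CPUS = 6
CRC_MEMORY_MIB = 24576
CRC_DISK_GIB = 120


def host_shortfalls(cpus, memory_mib, free_disk_gib):
    """Return a list of shortfall messages; an empty list means the host fits.

    memory_mib may be None when it cannot be probed; the memory check is then skipped.
    """
    problems = []
    if cpus < CRC_CPUS:
        problems.append(f"cpus: have {cpus}, need {CRC_CPUS}")
    if memory_mib is not None and memory_mib < CRC_MEMORY_MIB:
        problems.append(f"memory: have {memory_mib} MiB, need {CRC_MEMORY_MIB} MiB")
    if free_disk_gib < CRC_DISK_GIB:
        problems.append(f"disk: have {free_disk_gib} GiB free, need {CRC_DISK_GIB} GiB")
    return problems


def probe_host(path=None):
    """Probe CPU count, physical memory (if available), and free disk at path."""
    path = path or os.path.expanduser("~")  # assumption: CRC data lives under home
    cpus = os.cpu_count() or 0
    try:
        page_bytes = os.sysconf("SC_PAGE_SIZE") * os.sysconf("SC_PHYS_PAGES")
        memory_mib = page_bytes // (1024 * 1024)
    except (AttributeError, ValueError, OSError):
        memory_mib = None  # os.sysconf is unavailable on some platforms
    free_disk_gib = shutil.disk_usage(path).free // (1024 ** 3)
    return cpus, memory_mib, free_disk_gib


if __name__ == "__main__":
    for problem in host_shortfalls(*probe_host()):
        print(problem)
```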

Use `scripts/create_openshift_cluster.sh` for first-time local CRC setup. It
validates that `crc` exists, requires `PULL_SECRET_FILE` when the cluster will
be started, writes CRC settings such as resource sizing and shared-directory
behavior, runs `crc setup`, and starts the cluster by delegating to
`scripts/start_openshift_cluster.sh` unless `START_AFTER_CREATE=false` is set.

```bash
export PULL_SECRET_FILE="$HOME/Downloads/pull-secret.txt"
export NAMESPACE=nvflare-e2e

bash examples/devops/openshift/scripts/create_openshift_cluster.sh
```

Use `scripts/start_openshift_cluster.sh` after CRC has already been configured,
or when restarting after `crc stop`. It runs `crc start` when needed, adds the
CRC-provided `oc` to `PATH` if needed, waits for OpenShift to report running,
logs in with `oc`, creates or selects `NAMESPACE`, and prints the console URL
and available StorageClasses.

```bash
PULL_SECRET_FILE="$HOME/Downloads/pull-secret.txt" \
bash examples/devops/openshift/scripts/start_openshift_cluster.sh
```

Run the scripts from the repository root. Build the maintained images from `docker/Dockerfile.parent` and `docker/Dockerfile.job`, push them to a registry the cluster can pull from, then set `IMAGE` to the parent image and `JOB_IMAGE` to the workload image. `ADMIN_IMAGE` defaults to `IMAGE`, so the parent image also serves the temporary admin pod unless you override it. The parent image needs NVFlare installed with the `K8S` extra, which provides the Kubernetes Python client. A custom `COPY_IMAGE` must provide `sh`, `sleep`, and `tar`; `JOB_IMAGE` needs `tar` only if the job workload itself uses it.

```bash
export IMAGE=registry.example.com/nvflare-parent:dev
export JOB_IMAGE=registry.example.com/nvflare-job:dev
export NAMESPACE=nvflare-e2e

bash examples/devops/openshift/scripts/k8s_e2e.sh
```

The watch tool requires the Python `rich` package:

```bash
python3 -m pip install rich
```

Clean up generated resources and stop OpenShift Local:

```bash
bash examples/devops/openshift/scripts/cleanup_openshift_cluster.sh
```