apache · wangzhigang1999 · May 15, 2026 · May 15, 2026 · May 15, 2026 · May 15, 2026
diff --git a/AGENTS.md b/AGENTS.md
@@ -0,0 +1,222 @@
+<!--
+  Licensed to the Apache Software Foundation (ASF) under one or more
+  contributor license agreements.  See the NOTICE file distributed with
+  this work for additional information regarding copyright ownership.
+  The ASF licenses this file to You under the Apache License, Version 2.0
+  (the "License"); you may not use this file except in compliance with
+  the License.  You may obtain a copy of the License at
+
+      http://www.apache.org/licenses/LICENSE-2.0
+
+  Unless required by applicable law or agreed to in writing, software
+  distributed under the License is distributed on an "AS IS" BASIS,
+  WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied.
+  See the License for the specific language governing permissions and
+  limitations under the License.
+-->
+
+# Apache Kyuubi — Agent Instructions
+
+A guide for AI coding agents contributing to Apache Kyuubi. Pairs with `CONTRIBUTING.md`, the [Contributor Guide](https://kyuubi.readthedocs.io/en/master/contributing/code/index.html), and review-derived rules in `docs/design/merged_rules.md` (the source of truth when this file is silent).
+
+## Pre-flight Checks
+
+Before the first edit or test in a session:
+
+1. Run `git remote -v`. An `upstream` (or equivalent) remote must point to `apache/kyuubi`; do not work from a fork-only checkout.
+2. If `<upstream>/master` is stale, `git fetch <upstream> master` before branching or rebasing.
+3. Check `git status`. If the tree is dirty, ask the user to stash before any branch switch or release/RAT-style check.
+4. Existing PR: resolve the branch via `gh api repos/apache/kyuubi/pulls/<num> --jq '.head.ref'`, then check it out.
+5. New work: branch from `<upstream>/master`. Branch names like `kyuubi-NNNN-short-slug` are convention, not policy.
+6. Confirm `git config user.email` matches an email on the GitHub account that will open the PR.
+7. Keep private/ignored files outside the repo root — RAT scans generated local files even when `.gitignore`d.
+
+## Architecture
+
+Kyuubi is a multi-tenant gateway that fronts pluggable SQL engines (Spark, Flink, Hive, Trino, JDBC). The module split is load-bearing in code review.
+
+### Module Layout
+
+- `kyuubi-server/` — gateway process. Must not depend on engine modules under `externals/` or implement engine-side runtime behavior.
+- `kyuubi-common/` — shared service abstractions, config registry, session/operation base classes, relocated Hive Thrift RPC.
+- `kyuubi-ha/`, `kyuubi-events/`, `kyuubi-metrics/`, `kyuubi-rest-client/`, `kyuubi-zookeeper/` — focused libraries.
+- `kyuubi-ctl/` — admin CLI.
+- `kyuubi-hive-jdbc/`, `kyuubi-hive-beeline/` — JDBC client and shell.
+- `externals/kyuubi-{spark,flink,hive,trino,jdbc,data-agent}-sql-engine/` — engine processes. Must not depend on each other.
+- `extensions/{server,spark,flink}/` — opt-in plug-ins.
+- `integration-tests/` — cross-module suites.
+- `kyuubi-assembly/` — binary distribution.
+
+### Hard Boundaries
+
+- Server must not depend on Kyuubi engine modules or use engine runtime APIs (Spark/Flink internals) to implement engine behavior server-side.
+- Protocol/client libraries are not a blanket exception — if one is needed in `kyuubi-server`, keep it limited to wire-protocol or client-model translation and verify the dep tree.
+- Engines must not depend on sibling engines (e.g. no `kyuubi-flink-sql-engine` → `kyuubi-spark-sql-engine`).
+- Public APIs are abstract over the cluster manager. Use `killApplication`, not `closeYarnJob`; APIs, classes, and config keys must work for YARN, Kubernetes, and future managers.
+- Spark SQL syntax extensions need maintainer agreement, version-aware parser support, and a clear reason they cannot stay upstream-compatible.
+- Do not break the Thrift wire protocol. Kyuubi speaks Hive TCLIService through relocated dependencies; Python bindings are patched under `python/scripts/thrift-patches/`. Never remove or renumber wire fields.
+
+### High-Sensitivity Areas
+
+Get reviewer attention before changing:
+
+- `python/scripts/thrift-patches/` — patches against upstream Hive TCLIService schema.
+- `kyuubi-common/.../session/SessionManager`, `.../operation/OperationManager` — lifecycle, concurrency, shutdown.
+- `kyuubi-server/.../engine/KubernetesApplicationOperation` — cluster-manager integration; tests must not assume a real cluster.
+- `kyuubi-common/.../config/KyuubiConf` — config registry; changes require regenerating `settings.md`.
+- `kyuubi-server/.../api/v1/` — public REST surface; add auth checks before exposing.
+
+## Build and Test
+
+Use the bundled Maven wrapper (`build/mvn`).
+
+```
+build/mvn -Pfast clean package -DskipTests        # local compile, skips tests/style/docs/RAT/downloads
+build/mvn clean package -pl kyuubi-common -DskipTests
+build/mvn clean install                           # all tests
+build/mvn clean install -pl kyuubi-common         # one module's tests
+build/mvn test -pl kyuubi-server -Dtest=none \
+    -DwildcardSuites=org.apache.kyuubi.server.api.v1.SessionsResourceSuite
+build/mvn test -pl kyuubi-hive-jdbc -Dtest=KyuubiStatementTest -DwildcardSuites=none
+```
+
+Integration-style tests that exercise a packaged engine require `build/dist` first.
+
+### Engine profile matrix
+
+| Profile | Notes |
+|---|---|
+| `-Pspark-3.5` (default), `-Pspark-{3.3,3.4,4.0,4.1,master}` | Spark version |
+| `-Pflink-1.20` (default), `-Pflink-{1.17,1.18,1.19}` | Flink version |
+| `-Pscala-2.13` | Scala 2.13 (default is 2.12) |
+| `-P{spark,flink,hive}-provided` | skip bundled engine downloads |
+| `-Pmirror-cdn` | use Apache mirror CDN for engine archives |
+| `-Pfast` | skip tests/style/docs/enforcer/RAT/downloads |
+
+When engine code varies across versions, gate source/binary differences by Maven profile and runtime capability differences by feature detection — not by parsing version strings.
+
+### Style, deps, docs
+
+```
+dev/reformat                          # Spotless (+ Python Spotless if `black` is installed); run before every commit
+build/dependency.sh [--replace]       # detect / update dev/dependencyList drift after dep changes
+dev/gen/gen_all_config_docs.sh        # regenerate docs/configuration/settings.md after KyuubiConf changes
+```
+
+Runtime dep changes may also require `LICENSE-binary` and `NOTICE` updates.
+
+## Coding Conventions
+
+### Languages
+
+- Prefer Java for new cross-module business logic. Use Scala for Spark/Flink extension internals, Scala test suites, and APIs already dominated by Scala types.
+- Scala follows the Scalafmt/Scalastyle config; Java follows Spotless/google-java-format. `dev/reformat` enforces both.
+
+### Scala
+
+- `Option[T]`, never `null`.
+- Pattern-match guards (`case X(...) if cond =>`) over nested `if`/`match`.
+- `()` on no-arg methods with side effects; omit on pure methods.
+- Early return for guard conditions over nested `if-else`.
+- Avoid new `SparkSession.active` lookups in production code; pass `SparkSession` explicitly unless a Spark entry point gives no parameterized path.
+
+### Naming
+
+- Names describe purpose, not type. Avoid generic terms like `line`, `clue`, `principal`.
+- No abbreviations (`currentUser`, not `currUser`).
+- Avoid generic factories like `from(...)` — prefer `fromConfig`, `fromUserInput`.
+- Names must reflect scope: a connection-level counter is not `totalRows` if it counts one operation.
+- No magic numbers/strings. Use named constants; for enums use `MyEnum.X.toString`.
+
+### Comments
+
+- Explain why, not what. Self-explanatory code needs no comment.
+- Comment intentional omissions and surprising decisions.
+- TODO/FIXME should link a tracking issue when the work is deferred.
+
+### Error handling and logging
+
+- Throw for unsupported features and invalid configs; do not silently ignore.
+- Do not introduce direct `process.destroy()` / `destroyForcibly()` for engine shutdown — notify engines so their shutdown hooks run.
+- Catch the specific exception (`IOException`, `JsonProcessingException`), not its parent.
+- Log levels: `error` = actionable failure, `warn` = recoverable degradation, `info` = lifecycle, `debug` = diagnostics.
+- SLF4J placeholders, not concatenation: `log.info("opened session {}", handle)`.
+
+## Configuration
+
+Configs are `ConfigEntry` instances in `KyuubiConf` or per-engine config classes.
+
+- Server-only entries call `.serverOnly` on the builder; `KyuubiConf` tracks them in `serverOnlyConfEntries` so they are stripped before engine-side propagation.
+- `version()` on `ConfigEntry` is the release the key first ships in.
+- Namespace matches scope: `kyuubi.operation.*`, `kyuubi.session.*`, `kyuubi.<engine>.*`.
+- Provide a sensible default; if none is safe, leave it undefined and document the requirement.
+- Initialize config-dependent fields in `initialize(conf)`, not at declaration time.
+- Reuse `fallbackConf` to share defaults across frontends; do not copy identical entries.
+- After any entry change, run `dev/gen/gen_all_config_docs.sh` and commit the diff.
+
+## Testing
+
+- Tests ship in the same PR as the feature. "Tests will follow" is rejected.
+- Place tests in the existing suite for the area; create a new suite only when the domain is genuinely new.
+- Use `withSessionConf` from `HiveJDBCTestHelper` for per-test config overrides; do not mutate shared fixtures.
+- Scala tests use ScalaTest style — no JUnit `@Test` annotations.
+- Java tests follow surrounding module style (JDBC tests use JUnit 4: `Assert.assertEquals`, `Assert.assertThrows`).
+- A test that still passes when the change under test is reverted is not a test. Verify meaningful failure locally when practical.
+- Do not add new `Utils.isTesting` branches in production code; use test config or build properties.
+- Benchmarks belong in the Spark benchmark framework, not regular suites.
+
+## PR Workflow
+
+PR title: `[KYUUBI #NNNN][COMPONENT] Short imperative summary`. `NNNN` is the linked issue number; `[COMPONENT]` is optional but encouraged (use existing tags from recent `git log` for precedent). Use `[WIP]` while iterating. Update title and description if scope expands — both flow into the merge commit.
+
+Fill in `.github/PULL_REQUEST_TEMPLATE`:
+
+- **Why are the changes needed?** — motivation. Reviewers read the diff for "what".
+- **How was this patch tested?** — concrete commands or manual steps; explain if tests were not added.
+- **Was this patch authored or co-authored using generative AI tooling?** — fill the field.
+
+One concern per PR. Refactors go in their own PR; if an unrelated nit catches your eye, open a separate PR or follow-up issue.
+
+### Investigating CI failures
+
+Use scoped `gh run` commands instead of downloading full logs first.
+
+```
+gh run list -R apache/kyuubi -b <pr-branch> -L 5
+gh run view <run-id> -R apache/kyuubi --log-failed
+gh run view <run-id> -R apache/kyuubi -j <job-id> --log-failed
+gh run download <run-id> -R apache/kyuubi -n <unit-tests-log-...>   # full logs of a failed Kyuubi+Spark job
+```
+
+## AI-Assisted Contributions
+
+Allowed under ASF guidance and Kyuubi's PR template.
+
+- Disclose tools via `Generated-by: <tool name and version>` in the PR template field.
+- Do not use `Co-Authored-By: <AI tool>`.
+- The human author is responsible for the patch and must review every line.
+- No AI self-references in source, comments, commit messages, or PR titles — disclosure stays in the PR description.
+- Verify license/terms compatibility for generated content.
+
+## Boundaries
+
+### Never
+
+- Push directly to `apache/kyuubi`. All changes go through a PR.
+- Force-push to a shared branch. Force-push to your own PR branch is fine when coordinated.
+- Break Thrift wire compatibility. Adding optional fields is less risky than removing/renumbering, but any wire change needs reviewer attention.
+- Commit `kyuubi_state_store.db`, logs, generated local configs, IDE files, credentials, or other private artifacts.
+- Skip `dev/reformat` before committing.
+- Introduce new direct `process.destroy()` / `destroyForcibly()` for engine shutdown.
+- Introduce new `SparkSession.active` lookups in production code without a reviewer-approved reason.
+
+### Ask first
+
+- New runtime dependencies, especially with non-ASL2-compatible licenses.
+- Public API changes (`kyuubi-common`, REST endpoints, Thrift IDL).
+- Cross-engine refactors that touch more than one `externals/*` module.
+- New configuration entries that gate user-visible behavior.
+- New subsystems or large architectural shifts — discuss on the dev mailing list when scope warrants community design input.
+- Anything in High-Sensitivity Areas above.
+
+For anything not covered here, see the [Contributor Guide](https://kyuubi.readthedocs.io/en/master/contributing/code/index.html) or ask on the [Apache Kyuubi mailing list](https://kyuubi.apache.org/mailing_lists.html).