[KYUUBI #7445] Add AGENTS.md to guide AI coding agents by wangzhigang1999 · Pull Request #7452 · apache/kyuubi

wangzhigang1999 · 2026-05-15T08:29:40Z

Why are the changes needed?

Resolves #7445. Adds AGENTS.md at the repo root — a terse, command-oriented guide for AI coding agents contributing to Kyuubi. Complements CONTRIBUTING.md. Content is distilled from review feedback on landed PRs and aligned with sibling Apache projects (Spark, Iceberg, Airflow, DataFusion); the methodology and underlying rule set are linked in #7445.

How was this patch tested?

Documentation-only change. Every referenced command, Maven profile, script path, class name, and file path was verified against the current master. Maven invocations (-Pfast, -DwildcardSuites, -Dtest, dev/reformat via Spotless) were run on tiny modules to confirm the documented usage actually executes.

Was this patch authored or co-authored using generative AI tooling?

Assisted-by: Claude:claude-opus-4-7

Resolves apache#7445.

Verified all commands, file paths, class names, and Maven profiles against the actual repository. Corrected the following: - Thrift IDL is upstream (Hive TCLIService via kyuubi-relocated-thrift); rewrote the wire-format references accordingly. Only Kyuubi-side delta is python/scripts/thrift-patches/. - SessionManager and OperationManager live in kyuubi-common, not kyuubi-server. Corrected high-sensitivity area paths. - Replaced KyuubiBaseConnectionTest (does not exist) with the actual KyuubiStatementTest example. - Replaced PlanOnlyStatementSuite (does not exist) with PlanOnlyOperationSuite. - Replaced the broad "KYUUBI_UPDATE=1 build/mvn clean test" with dev/gen/gen_all_config_docs.sh, which is the scoped wrapper that actually regenerates settings.md. - Rewrote the CI failure investigation section: Kyuubi CI does not emit per-test check-run annotations the way Spark CI does, so the recipe now uses gh run --log-failed and artifact download. - Enumerated the externals/* engine modules explicitly instead of relying on a glob pattern that did not cover kyuubi-trino-engine.

Verified against root pom.xml, kyuubi-common/pom.xml, ConfigBuilder.scala, and live source. Corrections: - Removed externals/kyuubi-chat-engine reference; the CHAT engine was removed in Kyuubi 1.12 (per docs/deployment/migration-guide.md). - Rephrased the engine profile matrix: only Spark and Scala have version profiles in pom.xml; Flink (1.20.3), Hive (3.1.3), and Trino client (411) are pinned to single versions. - Java tests use JUnit 4 (no AssertJ dependency in any pom.xml); corrected the testing section accordingly. - kyuubi-common does not contain in-tree Thrift IDL; described it as hosting relocated Hive Thrift RPC types instead. - Distinguished integration-style tests from unit tests for build/dist prerequisite (per docs/contributing/code/testing.md). - Server-only configs are declared via .serverOnly on the ConfigBuilder; serverOnlyConfEntries is the private registry that tracks them. - Softened the "Never" rules for process.destroy() and SparkSession.active to "do not introduce new usage", since legacy usages exist in Spark plugin entry points that require a no-arg constructor.

Markdown reflows naturally on render; the hard wraps at ~70 chars were a stylistic choice from the initial draft that made the source harder to edit. Code fences, tables, nested list items, and the license header are preserved unchanged.

- Server boundary: rephrase from "no Spark/Flink/Hive/Trino dependency" to "no Kyuubi engine module dependency". The server legitimately uses io.trino:trino-client for the Trino frontend and similar protocol/client libraries are allowed. - Engine profile matrix: Flink also has version profiles (flink-1.17/1.18/1.19/1.20). Updated the table and the surrounding text; Hive and Trino remain pinned. - Spark SQL syntax: Kyuubi extensions already add project-specific clauses (OPTIMIZE ... ZORDER BY). Rephrased from "no SQL syntax that Spark doesn't support" to require maintainer agreement and version-gated parser support for new clauses. - Thrift dependencies: kyuubi-common pulls both kyuubi-relocated-thrift and kyuubi-relocated-hive-service-rpc; spelled both out. - Utils.isTesting: legacy usages remain in production (EngineRef, JdbcProcessBuilder, TrinoProcessBuilder, BatchesResource). Softened from "never" to "do not introduce new ones without reviewer agreement". - git config user.email: removed the "reviewers will block" framing, which is not documented as a hard policy; kept the practical fix-it guidance.

…d process.destroy primary rules - Pre-flight: warn that Apache RAT scans the working tree and only honors .rat-excludes (not .gitignore), so private ignored files outside the repo root are safer. - Scala specifics: rephrase the SparkSession.active rule to match the softened Never-list wording (allow Spark plugin/catalog entry points with no parameterized path). - Error handling: rephrase the process.destroy() rule to "do not introduce direct process.destroy() / destroyForcibly()" for consistency with the Never list.

….md Never list

… Spark/Iceberg style

pan3793

@wangzhigang1999, thank you for drafting this proposal, I went through it and left some comments, some points might need more discussion.

pan3793 · 2026-05-17T17:38:12Z

+
+Allowed under ASF guidance and Kyuubi's PR template.
+
+- Disclose tools via `Generated-by: <tool name and version>` in the PR template field.


I feel this is actually not a good practice; we borrowed it from Spark, which was derived from ASF guidance

When providing contributions authored using generative AI tooling, a recommended practice is for contributors to indicate the tooling used to create the contribution. This should be included as a token in the source control commit message, for example including the phrase “Generated-by: ”.

https://www.apache.org/legal/generative-tooling.html

I think this is a misinterpretation of that, Generated-by: used to be an informal git commit message tag proposed by the Linux kernel community, and never got adopted. It is also misleading - we usually use LLM and Agent for not only code generation, but also code review, code reading, solution exploring, etc., the Generated-by: makes me feel like the whole patch is generated by the AI and I am just a transporter delivering it to GitHub.

Instead, I suggest we update this to adopt the Assisted-by: tag, which has been adopted by the Linux kernel community.

torvalds/linux@78d979d
https://docs.kernel.org/process/coding-assistants.html

Agreed on dropping Generated-by:. The thing I'm not sure about is what to put after it.

I went through all 221 real Assisted-by: trailers in torvalds/linux — formats are pretty messy (claude-opus-4-6 / 4.6 / just Claude...), and around 17% use the [TOOL] slot, almost always for coccinelle or kconfiglint, neither of which we run.

So three ways we could go:

Copy kernel as-is: Assisted-by: <AGENT>:<MODEL> [TOOL1] [TOOL2]. My worry is that "tool" in an AI context easily reads as "agent tool call" (Read/Edit/Bash), and people will start dumping their IDE or formatter in there.

Same thing but drop the tool slot: Assisted-by: <AGENT>:<MODEL>. This is actually what ~80% of real kernel commits look like anyway, so we'd be matching what people actually do, not what the spec says.

Just rename the field, keep the value free-form: Assisted-by: <tool name and version>. Smallest change to the existing PR template.

I lean toward 2. WDYT?

(Whichever we pick I'll update .github/PULL_REQUEST_TEMPLATE in this PR too.)

I prefer the option 2

- Use `apache` remote; Kyuubi scripts assume the name - Engine list trails with `etc.`; drop Hive from the example set - Add `-am` to single-module Maven examples - Spell out "dependency" instead of "dep" - Drop the "Prefer Java" stance; both languages are first-class - Soften `Option[T]` rule with perf exception - Remove Python thrift-patches reference - Note Thrift Operation reuse with operation-level config keys - Cite fixed version when referencing upstream issues - Keep PR prose precise and concise - Switch AI disclosure to `Assisted-by: <AGENT>:<MODEL>` - Loosen Java test note to defer to surrounding module style - Mention GitHub Discussions alongside the dev mailing list

…NT>:<MODEL_ID> - AGENTS.md: pin the example to the canonical model id (`claude-opus-4-7`). - PULL_REQUEST_TEMPLATE: rename the field from `Generated-by:` to `Assisted-by:`, broaden the prompt to cover review/exploration, and align the example.

aajisaka · 2026-05-19T03:23:23Z

+
+Allowed under [ASF generative tooling guidance](https://www.apache.org/legal/generative-tooling.html) and Kyuubi's PR template. AI is used for more than code generation (review, reading, solution exploration), so disclose assistance rather than implying the patch was machine-authored.
+
+- Disclose assistance via `Assisted-by: <AGENT>:<MODEL_VERSION>` in the PR template field, following the [Linux kernel convention](https://docs.kernel.org/process/coding-assistants.html). Example: `Assisted-by: Claude:claude-opus-4-7`.


Thank you for the update. Would you also update the pull request template?

I just made changes in bb861cb.

wangzhigang1999 added 8 commits May 15, 2026 11:12

[KYUUBI apache#7445] Add AGENTS.md to guide AI coding agents

a09dabf

Resolves apache#7445.

[KYUUBI apache#7445] Refine SparkSession.active legacy note in AGENTS…

f17ae41

….md Never list

[KYUUBI apache#7445] Trim AGENTS.md: drop Source Notation, tighten to…

91cf6c0

… Spark/Iceberg style

github-actions Bot added the kind:documentation Documentation is a feature! label May 15, 2026

wangzhigang1999 marked this pull request as ready for review May 15, 2026 08:54

pan3793 reviewed May 17, 2026

View reviewed changes

wangzhigang1999 force-pushed the kyuubi-7445-agents-md branch from f9f67d4 to d0c685c Compare May 18, 2026 09:47

wangzhigang1999 added 2 commits May 18, 2026 23:48

[KYUUBI apache#7445] Retrigger CI

caf20d9

github-actions Bot added the kind:infra license, community building, project builds, asf infra related, etc. label May 19, 2026

aajisaka reviewed May 19, 2026

View reviewed changes

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

[KYUUBI #7445] Add AGENTS.md to guide AI coding agents#7452

[KYUUBI #7445] Add AGENTS.md to guide AI coding agents#7452
wangzhigang1999 wants to merge 11 commits into
apache:masterfrom
wangzhigang1999:kyuubi-7445-agents-md

wangzhigang1999 commented May 15, 2026 •

edited

Loading

Uh oh!

pan3793 left a comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

pan3793 May 17, 2026

Uh oh!

wangzhigang1999 May 18, 2026

Uh oh!

aajisaka May 18, 2026

Uh oh!

Uh oh!

aajisaka May 19, 2026

Uh oh!

wangzhigang1999 May 19, 2026

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants


		Allowed under ASF guidance and Kyuubi's PR template.

		- Disclose tools via `Generated-by: <tool name and version>` in the PR template field.


		Allowed under [ASF generative tooling guidance](https://www.apache.org/legal/generative-tooling.html) and Kyuubi's PR template. AI is used for more than code generation (review, reading, solution exploration), so disclose assistance rather than implying the patch was machine-authored.

		- Disclose assistance via `Assisted-by: <AGENT>:<MODEL_VERSION>` in the PR template field, following the [Linux kernel convention](https://docs.kernel.org/process/coding-assistants.html). Example: `Assisted-by: Claude:claude-opus-4-7`.

Conversation

wangzhigang1999 commented May 15, 2026 • edited Loading Uh oh! There was an error while loading. Please reload this page.

Uh oh!

Why are the changes needed?

How was this patch tested?

Was this patch authored or co-authored using generative AI tooling?

Uh oh!

pan3793 left a comment

Choose a reason for hiding this comment

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

Uh oh!

pan3793 May 17, 2026

Choose a reason for hiding this comment

Uh oh!

wangzhigang1999 May 18, 2026

Choose a reason for hiding this comment

Uh oh!

aajisaka May 18, 2026

Choose a reason for hiding this comment

Uh oh!

Uh oh!

aajisaka May 19, 2026

Choose a reason for hiding this comment

Uh oh!

wangzhigang1999 May 19, 2026

Choose a reason for hiding this comment

Uh oh!

Reviewers

Assignees

Labels

Projects

Milestone

Development

Uh oh!

3 participants

wangzhigang1999 commented May 15, 2026 •

edited

Loading