Skip to content

[KYUUBI #7445] Add AGENTS.md to guide AI coding agents#7452

Open
wangzhigang1999 wants to merge 11 commits into
apache:masterfrom
wangzhigang1999:kyuubi-7445-agents-md
Open

[KYUUBI #7445] Add AGENTS.md to guide AI coding agents#7452
wangzhigang1999 wants to merge 11 commits into
apache:masterfrom
wangzhigang1999:kyuubi-7445-agents-md

Conversation

@wangzhigang1999
Copy link
Copy Markdown
Contributor

@wangzhigang1999 wangzhigang1999 commented May 15, 2026

Why are the changes needed?

Resolves #7445. Adds AGENTS.md at the repo root — a terse, command-oriented guide for AI coding agents contributing to Kyuubi. Complements CONTRIBUTING.md. Content is distilled from review feedback on landed PRs and aligned with sibling Apache projects (Spark, Iceberg, Airflow, DataFusion); the methodology and underlying rule set are linked in #7445.

How was this patch tested?

Documentation-only change. Every referenced command, Maven profile, script path, class name, and file path was verified against the current master. Maven invocations (-Pfast, -DwildcardSuites, -Dtest, dev/reformat via Spotless) were run on tiny modules to confirm the documented usage actually executes.

Was this patch authored or co-authored using generative AI tooling?

Assisted-by: Claude:claude-opus-4-7

Verified all commands, file paths, class names, and Maven profiles
against the actual repository. Corrected the following:

- Thrift IDL is upstream (Hive TCLIService via kyuubi-relocated-thrift);
  rewrote the wire-format references accordingly. Only Kyuubi-side
  delta is python/scripts/thrift-patches/.
- SessionManager and OperationManager live in kyuubi-common, not
  kyuubi-server. Corrected high-sensitivity area paths.
- Replaced KyuubiBaseConnectionTest (does not exist) with the actual
  KyuubiStatementTest example.
- Replaced PlanOnlyStatementSuite (does not exist) with
  PlanOnlyOperationSuite.
- Replaced the broad "KYUUBI_UPDATE=1 build/mvn clean test" with
  dev/gen/gen_all_config_docs.sh, which is the scoped wrapper that
  actually regenerates settings.md.
- Rewrote the CI failure investigation section: Kyuubi CI does not
  emit per-test check-run annotations the way Spark CI does, so the
  recipe now uses gh run --log-failed and artifact download.
- Enumerated the externals/* engine modules explicitly instead of
  relying on a glob pattern that did not cover kyuubi-trino-engine.
Verified against root pom.xml, kyuubi-common/pom.xml,
ConfigBuilder.scala, and live source. Corrections:

- Removed externals/kyuubi-chat-engine reference; the CHAT engine was
  removed in Kyuubi 1.12 (per docs/deployment/migration-guide.md).
- Rephrased the engine profile matrix: only Spark and Scala have
  version profiles in pom.xml; Flink (1.20.3), Hive (3.1.3), and Trino
  client (411) are pinned to single versions.
- Java tests use JUnit 4 (no AssertJ dependency in any pom.xml);
  corrected the testing section accordingly.
- kyuubi-common does not contain in-tree Thrift IDL; described it as
  hosting relocated Hive Thrift RPC types instead.
- Distinguished integration-style tests from unit tests for build/dist
  prerequisite (per docs/contributing/code/testing.md).
- Server-only configs are declared via .serverOnly on the
  ConfigBuilder; serverOnlyConfEntries is the private registry that
  tracks them.
- Softened the "Never" rules for process.destroy() and
  SparkSession.active to "do not introduce new usage", since legacy
  usages exist in Spark plugin entry points that require a no-arg
  constructor.
Markdown reflows naturally on render; the hard wraps at ~70 chars
were a stylistic choice from the initial draft that made the source
harder to edit. Code fences, tables, nested list items, and the
license header are preserved unchanged.
- Server boundary: rephrase from "no Spark/Flink/Hive/Trino
  dependency" to "no Kyuubi engine module dependency". The server
  legitimately uses io.trino:trino-client for the Trino frontend and
  similar protocol/client libraries are allowed.
- Engine profile matrix: Flink also has version profiles
  (flink-1.17/1.18/1.19/1.20). Updated the table and the surrounding
  text; Hive and Trino remain pinned.
- Spark SQL syntax: Kyuubi extensions already add project-specific
  clauses (OPTIMIZE ... ZORDER BY). Rephrased from "no SQL syntax that
  Spark doesn't support" to require maintainer agreement and
  version-gated parser support for new clauses.
- Thrift dependencies: kyuubi-common pulls both
  kyuubi-relocated-thrift and kyuubi-relocated-hive-service-rpc;
  spelled both out.
- Utils.isTesting: legacy usages remain in production
  (EngineRef, JdbcProcessBuilder, TrinoProcessBuilder, BatchesResource).
  Softened from "never" to "do not introduce new ones without
  reviewer agreement".
- git config user.email: removed the "reviewers will block" framing,
  which is not documented as a hard policy; kept the practical fix-it
  guidance.
…d process.destroy primary rules

- Pre-flight: warn that Apache RAT scans the working tree and only
  honors .rat-excludes (not .gitignore), so private ignored files
  outside the repo root are safer.
- Scala specifics: rephrase the SparkSession.active rule to match the
  softened Never-list wording (allow Spark plugin/catalog entry
  points with no parameterized path).
- Error handling: rephrase the process.destroy() rule to "do not
  introduce direct process.destroy() / destroyForcibly()" for
  consistency with the Never list.
@github-actions github-actions Bot added the kind:documentation Documentation is a feature! label May 15, 2026
@wangzhigang1999 wangzhigang1999 marked this pull request as ready for review May 15, 2026 08:54
Copy link
Copy Markdown
Member

@pan3793 pan3793 left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@wangzhigang1999, thank you for drafting this proposal, I went through it and left some comments, some points might need more discussion.

Comment thread AGENTS.md Outdated
Comment thread AGENTS.md Outdated
Comment thread AGENTS.md Outdated
Comment thread AGENTS.md Outdated
Comment thread AGENTS.md Outdated
Comment thread AGENTS.md Outdated
Comment thread AGENTS.md Outdated
Comment thread AGENTS.md
Comment thread AGENTS.md Outdated

Allowed under ASF guidance and Kyuubi's PR template.

- Disclose tools via `Generated-by: <tool name and version>` in the PR template field.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I feel this is actually not a good practice; we borrowed it from Spark, which was derived from ASF guidance

When providing contributions authored using generative AI tooling, a recommended practice is for contributors to indicate the tooling used to create the contribution. This should be included as a token in the source control commit message, for example including the phrase “Generated-by: ”.

https://www.apache.org/legal/generative-tooling.html

I think this is a misinterpretation of that, Generated-by: used to be an informal git commit message tag proposed by the Linux kernel community, and never got adopted. It is also misleading - we usually use LLM and Agent for not only code generation, but also code review, code reading, solution exploring, etc., the Generated-by: makes me feel like the whole patch is generated by the AI and I am just a transporter delivering it to GitHub.

Instead, I suggest we update this to adopt the Assisted-by: tag, which has been adopted by the Linux kernel community.

torvalds/linux@78d979d
https://docs.kernel.org/process/coding-assistants.html

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Agreed on dropping Generated-by:. The thing I'm not sure about is what to put after it.

I went through all 221 real Assisted-by: trailers in torvalds/linux — formats are pretty messy (claude-opus-4-6 / 4.6 / just Claude...), and around 17% use the [TOOL] slot, almost always for coccinelle or kconfiglint, neither of which we run.

So three ways we could go:

  1. Copy kernel as-is: Assisted-by: <AGENT>:<MODEL> [TOOL1] [TOOL2]. My worry is that "tool" in an AI context easily reads as "agent tool call" (Read/Edit/Bash), and people will start dumping their IDE or formatter in there.

  2. Same thing but drop the tool slot: Assisted-by: <AGENT>:<MODEL>. This is actually what ~80% of real kernel commits look like anyway, so we'd be matching what people actually do, not what the spec says.

  3. Just rename the field, keep the value free-form: Assisted-by: <tool name and version>. Smallest change to the existing PR template.

I lean toward 2. WDYT?

(Whichever we pick I'll update .github/PULL_REQUEST_TEMPLATE in this PR too.)

Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I prefer the option 2

Comment thread AGENTS.md Outdated
- Use `apache` remote; Kyuubi scripts assume the name
- Engine list trails with `etc.`; drop Hive from the example set
- Add `-am` to single-module Maven examples
- Spell out "dependency" instead of "dep"
- Drop the "Prefer Java" stance; both languages are first-class
- Soften `Option[T]` rule with perf exception
- Remove Python thrift-patches reference
- Note Thrift Operation reuse with operation-level config keys
- Cite fixed version when referencing upstream issues
- Keep PR prose precise and concise
- Switch AI disclosure to `Assisted-by: <AGENT>:<MODEL>`
- Loosen Java test note to defer to surrounding module style
- Mention GitHub Discussions alongside the dev mailing list
@wangzhigang1999 wangzhigang1999 force-pushed the kyuubi-7445-agents-md branch from f9f67d4 to d0c685c Compare May 18, 2026 09:47
…NT>:<MODEL_ID>

- AGENTS.md: pin the example to the canonical model id (`claude-opus-4-7`).
- PULL_REQUEST_TEMPLATE: rename the field from `Generated-by:` to
  `Assisted-by:`, broaden the prompt to cover review/exploration, and
  align the example.
@github-actions github-actions Bot added the kind:infra license, community building, project builds, asf infra related, etc. label May 19, 2026
Comment thread AGENTS.md Outdated

Allowed under [ASF generative tooling guidance](https://www.apache.org/legal/generative-tooling.html) and Kyuubi's PR template. AI is used for more than code generation (review, reading, solution exploration), so disclose assistance rather than implying the patch was machine-authored.

- Disclose assistance via `Assisted-by: <AGENT>:<MODEL_VERSION>` in the PR template field, following the [Linux kernel convention](https://docs.kernel.org/process/coding-assistants.html). Example: `Assisted-by: Claude:claude-opus-4-7`.
Copy link
Copy Markdown
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thank you for the update. Would you also update the pull request template?

Copy link
Copy Markdown
Contributor Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I just made changes in bb861cb.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

kind:documentation Documentation is a feature! kind:infra license, community building, project builds, asf infra related, etc.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[FEATURE] Add AGENTS.md to guide AI coding agents contributing to Kyuubi

3 participants