Skip to content

Commit 0bb17bc

Browse files
rluvatonblagininmartin-g
authored
build: allow posting comments on PRs made from forks and fix missing protobuf (#21913)
## Which issue does this PR close? - #21911 ## Rationale for this change The breaking-change detector added in #21499 fails on fork PRs with HTTP 403: > The GITHUB_TOKEN has read-only permissions in pull requests from forked repositories. > > From [GitHub Docs](https://docs.github.com/en/actions/reference/events-that-trigger-workflows#pull_request) A read-only token can't post the sticky comment, so the workflow errors out at the `gh api … POST /comments` call. We can't switch to `pull_request_target` either - ASF infra policy forbids it for any workflow exposing `GITHUB_TOKEN` (https://infra.apache.org/github-actions-policy.html), and `cargo-semver-checks` compiles fork-controlled code (`build.rs`, proc macros) anyway, so granting it a write token would be unsafe. ## What changes are included in this PR? Split the comment posting into a companion `workflow_run` workflow: - `breaking_changes_detector.yml` keeps the `pull_request` trigger but only stages the result (`pr_number`, `result`, `logs`) and uploads it as an artifact. No write token, no comment posting from this workflow. - `breaking_changes_detector_comment.yml` triggers on `workflow_run`, runs in the base-repo context with `pull-requests: write`, downloads the artifact, validates the inputs, and upserts/deletes the sticky comment via `actions-cool/maintain-one-comment`. Never checks out PR code. The comment workflow uses a runtime-randomized heredoc delimiter when piping the fork-controlled logs into `$GITHUB_OUTPUT`, to stop log content from closing the heredoc early and overwriting the validated `result` output (or injecting other keys). Drops the now-unused `comment` subcommand from `ci/scripts/changed_crates.sh`. ---- also installed protobuf as noticed this failed when building subtrait in: - #15591 ## Are these changes tested? No, cant really test it ## Are there any user-facing changes? No --------- Co-authored-by: Dmitrii Blaginin <dmitrii@blaginin.me> Co-authored-by: Martin Grigorov <martin-g@users.noreply.github.com>
1 parent 42cd2fa commit 0bb17bc

3 files changed

Lines changed: 183 additions & 93 deletions

File tree

.github/workflows/breaking_changes_detector.yml

Lines changed: 50 additions & 34 deletions
Original file line numberDiff line numberDiff line change
@@ -20,8 +20,24 @@
2020
# Only public workspace crates that have file changes are checked.
2121
# Internal crates (benchmarks, test-utils, sqllogictest, doc) are excluded.
2222
#
23-
# If breaking changes are found, a sticky comment is posted on the PR.
24-
# The comment is removed automatically once the issues are resolved.
23+
# This workflow only runs cargo-semver-checks and uploads the result as an
24+
# artifact. The actual PR comment is posted by a companion workflow
25+
# (`breaking_changes_detector_comment.yml`) that picks up the artifact via
26+
# `workflow_run`.
27+
#
28+
# Why split it?
29+
# "The GITHUB_TOKEN has read-only permissions in pull requests from forked
30+
# repositories."
31+
# https://docs.github.com/en/actions/reference/events-that-trigger-workflows#pull_request
32+
# A read-only token cannot post comments, so on fork PRs the previous
33+
# single-workflow design failed with HTTP 403. We can't simply broaden the
34+
# trigger here either: cargo-semver-checks compiles PR code (build.rs, proc
35+
# macros), so granting this job a write token would expose it to any code
36+
# in the PR. And ASF infra policy independently forbids `pull_request_target`
37+
# for any workflow that exposes GITHUB_TOKEN
38+
# (https://infra.apache.org/github-actions-policy.html). The companion
39+
# `workflow_run` workflow runs in the base-repo context with write access
40+
# and never executes PR code.
2541

2642
name: "Detect breaking changes"
2743

@@ -37,11 +53,6 @@ jobs:
3753
check-semver:
3854
name: Check semver
3955
runs-on: ubuntu-latest
40-
outputs:
41-
logs: ${{ steps.check_semver.outputs.logs }}
42-
# Default to "success" so the comment job clears any stale comment
43-
# when the check step is skipped (e.g. no published crates changed).
44-
result: ${{ steps.check_semver.outputs.result || 'success' }}
4556
steps:
4657
- name: Checkout
4758
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
@@ -66,6 +77,15 @@ jobs:
6677
echo "packages=$PACKAGES" >> "$GITHUB_OUTPUT"
6778
echo "Changed crates: $PACKAGES"
6879
80+
# `datafusion-substrait` (and crates that depend on it via sqllogictest)
81+
# have a build script that calls protoc, which is not preinstalled on
82+
# ubuntu-latest runners.
83+
- name: Install Protobuf Compiler
84+
if: steps.changed_crates.outputs.packages != ''
85+
run: |
86+
sudo apt-get update
87+
sudo apt-get install -y protobuf-compiler
88+
6989
- name: Install cargo-semver-checks
7090
if: steps.changed_crates.outputs.packages != ''
7191
uses: taiki-e/install-action@94cb46f8d6e437890146ffbd78a778b78e623fb2 # v2.74.0
@@ -85,11 +105,6 @@ jobs:
85105
ci/scripts/changed_crates.sh semver-check "origin/${BASE_REF}" $PACKAGES \
86106
2>&1 | tee /tmp/semver-output.txt
87107
EXIT_CODE=${PIPESTATUS[0]}
88-
{
89-
echo "logs<<EOF"
90-
sed 's/\x1b\[[0-9;]*m//g' /tmp/semver-output.txt
91-
echo "EOF"
92-
} >> "$GITHUB_OUTPUT"
93108
# Pass the result through an output instead of failing the job:
94109
# a detected breaking change should surface as a PR comment, not a
95110
# red check, so PR authors aren't confused by an intentional break.
@@ -99,28 +114,29 @@ jobs:
99114
echo "result=failure" >> "$GITHUB_OUTPUT"
100115
fi
101116
102-
# Post or remove a sticky comment on the PR based on the semver check result.
103-
comment-on-pr:
104-
name: Comment on pull request
105-
runs-on: ubuntu-latest
106-
needs: check-semver
107-
if: always()
108-
permissions:
109-
contents: read
110-
pull-requests: write
111-
steps:
112-
- name: Checkout
113-
uses: actions/checkout@de0fac2e4500dabe0009e67214ff5f5447ce83dd # v6.0.2
114-
with:
115-
sparse-checkout: ci/scripts
116-
117-
- name: Update PR comment
117+
# Stage the data the companion comment workflow needs into a single
118+
# directory. We default the result to "success" so the comment
119+
# workflow clears any stale comment when the check step is skipped
120+
# (e.g. no published crates changed).
121+
- name: Stage artifact for comment workflow
122+
if: always()
118123
env:
119-
GH_TOKEN: ${{ github.token }}
120-
REPO: ${{ github.repository }}
121124
PR_NUMBER: ${{ github.event.pull_request.number }}
122-
CHECK_RESULT: ${{ needs.check-semver.outputs.result }}
123-
SEMVER_LOGS: ${{ needs.check-semver.outputs.logs }}
125+
CHECK_RESULT: ${{ steps.check_semver.outputs.result || 'success' }}
124126
run: |
125-
ci/scripts/changed_crates.sh comment \
126-
"$REPO" "$PR_NUMBER" "$CHECK_RESULT" "$SEMVER_LOGS"
127+
mkdir -p semver-artifact
128+
echo "$PR_NUMBER" > semver-artifact/pr_number
129+
echo "$CHECK_RESULT" > semver-artifact/result
130+
if [ -f /tmp/semver-output.txt ]; then
131+
sed 's/\x1b\[[0-9;]*m//g' /tmp/semver-output.txt > semver-artifact/logs
132+
else
133+
: > semver-artifact/logs
134+
fi
135+
136+
- name: Upload artifact
137+
if: always()
138+
uses: actions/upload-artifact@043fb46d1a93c77aae656e7c1c64a875d1fc6a0a # v7.0.1
139+
with:
140+
name: semver-check-result
141+
path: semver-artifact/
142+
retention-days: 1
Lines changed: 132 additions & 0 deletions
Original file line numberDiff line numberDiff line change
@@ -0,0 +1,132 @@
1+
# Licensed to the Apache Software Foundation (ASF) under one
2+
# or more contributor license agreements. See the NOTICE file
3+
# distributed with this work for additional information
4+
# regarding copyright ownership. The ASF licenses this file
5+
# to you under the Apache License, Version 2.0 (the
6+
# "License"); you may not use this file except in compliance
7+
# with the License. You may obtain a copy of the License at
8+
#
9+
# http://www.apache.org/licenses/LICENSE-2.0
10+
#
11+
# Unless required by applicable law or agreed to in writing,
12+
# software distributed under the License is distributed on an
13+
# "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY
14+
# KIND, either express or implied. See the License for the
15+
# specific language governing permissions and limitations
16+
# under the License.
17+
18+
# Companion to `breaking_changes_detector.yml`. Posts the sticky PR comment.
19+
#
20+
# Why this workflow exists:
21+
# "The GITHUB_TOKEN has read-only permissions in pull requests from forked
22+
# repositories."
23+
# https://docs.github.com/en/actions/reference/events-that-trigger-workflows#pull_request
24+
# That is why the upstream `pull_request` workflow cannot post the comment
25+
# itself when the PR comes from a fork.
26+
#
27+
# Why not `pull_request_target`? ASF infra policy forbids it:
28+
# "You MUST NOT use `pull_request_target` as a trigger on ANY action that
29+
# exports ANY confidential credentials or tokens such as GITHUB_TOKEN or
30+
# NPM_TOKEN."
31+
# https://infra.apache.org/github-actions-policy.html
32+
# `workflow_run` is the supported alternative: it runs in the base
33+
# repository's context regardless of where the upstream run was triggered
34+
# from, so the GITHUB_TOKEN here can be granted `pull-requests: write`. See:
35+
# https://docs.github.com/en/actions/reference/events-that-trigger-workflows#workflow_run
36+
#
37+
# Security note: this workflow MUST NOT check out or execute any code from
38+
# the PR. The artifact's contents originate from a workflow run that may
39+
# have compiled fork-controlled code, so PR_NUMBER and CHECK_RESULT are
40+
# validated against strict patterns before being passed to any action.
41+
42+
name: "Detect breaking changes - Comment"
43+
44+
on:
45+
workflow_run:
46+
workflows: ["Detect breaking changes"]
47+
types:
48+
- completed
49+
50+
permissions:
51+
contents: read
52+
53+
jobs:
54+
comment-on-pr:
55+
name: Comment on pull request
56+
if: github.event.workflow_run.event == 'pull_request'
57+
runs-on: ubuntu-latest
58+
# Scoped to the minimum needed to upsert/delete the sticky comment.
59+
permissions:
60+
actions: read
61+
pull-requests: write
62+
steps:
63+
- name: Download semver-check artifact
64+
uses: actions/download-artifact@3e5f45b2cfb9172054b4087a40e8e0b5a5461e7c # v8.0.1
65+
with:
66+
name: semver-check-result
67+
run-id: ${{ github.event.workflow_run.id }}
68+
github-token: ${{ github.token }}
69+
path: ./semver-artifact
70+
71+
- name: Read and validate artifact
72+
id: read
73+
run: |
74+
set -euo pipefail
75+
# Validate every field: the artifact comes from a workflow run
76+
# that compiled fork-controlled code, so its contents are untrusted.
77+
PR_NUMBER=$(cat ./semver-artifact/pr_number)
78+
if ! [[ "$PR_NUMBER" =~ ^[0-9]+$ ]]; then
79+
echo "Invalid PR number: $PR_NUMBER" >&2
80+
exit 1
81+
fi
82+
CHECK_RESULT=$(cat ./semver-artifact/result)
83+
if [[ "$CHECK_RESULT" != "success" && "$CHECK_RESULT" != "failure" ]]; then
84+
echo "Invalid check result: $CHECK_RESULT" >&2
85+
exit 1
86+
fi
87+
echo "pr_number=$PR_NUMBER" >> "$GITHUB_OUTPUT"
88+
echo "result=$CHECK_RESULT" >> "$GITHUB_OUTPUT"
89+
90+
# Multi-line output: random delimiter so a malicious log line can't
91+
# close the heredoc and inject extra output keys. See:
92+
# https://docs.github.com/en/actions/reference/workflow-commands-for-github-actions#multiline-strings
93+
DELIM="EOF_$(openssl rand -hex 16)"
94+
{
95+
echo "logs<<${DELIM}"
96+
cat ./semver-artifact/logs
97+
echo "${DELIM}"
98+
} >> "$GITHUB_OUTPUT"
99+
100+
# The marker `<!-- semver-check-comment -->` is what makes the comment
101+
# "sticky": maintain-one-comment uses it to find and replace (or
102+
# delete) the existing comment instead of stacking new ones.
103+
- name: Upsert sticky comment
104+
if: steps.read.outputs.result != 'success'
105+
uses: actions-cool/maintain-one-comment@909842216bc8e8658364c572ec52100f4c2cc50a # v3.3.0
106+
with:
107+
token: ${{ secrets.GITHUB_TOKEN }}
108+
number: ${{ steps.read.outputs.pr_number }}
109+
body-include: '<!-- semver-check-comment -->'
110+
body: |
111+
<!-- semver-check-comment -->
112+
Thank you for opening this pull request!
113+
114+
Reviewer note: [cargo-semver-checks](https://github.com/obi1kenobi/cargo-semver-checks) reported the current version number is not SemVer-compatible with the changes in this pull request (compared against the base branch).
115+
116+
<details>
117+
<summary>Details</summary>
118+
119+
```
120+
${{ steps.read.outputs.logs }}
121+
```
122+
123+
</details>
124+
125+
- name: Delete sticky comment
126+
if: steps.read.outputs.result == 'success'
127+
uses: actions-cool/maintain-one-comment@909842216bc8e8658364c572ec52100f4c2cc50a # v3.3.0
128+
with:
129+
token: ${{ secrets.GITHUB_TOKEN }}
130+
number: ${{ steps.read.outputs.pr_number }}
131+
body-include: '<!-- semver-check-comment -->'
132+
delete: true

ci/scripts/changed_crates.sh

Lines changed: 1 addition & 59 deletions
Original file line numberDiff line numberDiff line change
@@ -28,17 +28,9 @@
2828
# Run cargo-semver-checks for the given packages against base_ref.
2929
# Output and exit code are passed through unchanged; the caller is
3030
# responsible for capturing/formatting them.
31-
#
32-
# comment <repo> <pr_number> <check_result> [logs]
33-
# Upsert or delete a sticky PR comment based on check_result.
34-
# check_result: "success" deletes any existing comment,
35-
# anything else upserts the comment with the provided logs.
36-
# Requires GH_TOKEN to be set.
3731

3832
set -euo pipefail
3933

40-
MARKER="<!-- semver-check-comment -->"
41-
4234
# ── changed-crates ──────────────────────────────────────────────────
4335
cmd_changed_crates() {
4436
local base_ref="${1:?Usage: changed_crates.sh changed-crates <base_ref>}"
@@ -80,62 +72,12 @@ cmd_semver_check() {
8072
cargo semver-checks --baseline-rev "$base_ref" "${args[@]}"
8173
}
8274

83-
# ── comment ─────────────────────────────────────────────────────────
84-
cmd_comment() {
85-
local repo="${1:?Usage: changed_crates.sh comment <repo> <pr_number> <check_result> [logs]}"
86-
local pr_number="${2:?}"
87-
local check_result="${3:?}"
88-
local logs="${4:-}"
89-
90-
# Find existing comment with our marker
91-
local comment_id
92-
comment_id=$(gh api "repos/${repo}/issues/${pr_number}/comments" \
93-
--jq ".[] | select(.body | contains(\"${MARKER}\")) | .id" | head -1)
94-
95-
echo "existing breaking change comment id $comment_id"
96-
97-
if [ "$check_result" = "success" ]; then
98-
# Delete the comment if one exists
99-
if [ -n "$comment_id" ]; then
100-
echo "result is success, so deleting breaking change comment"
101-
gh api "repos/${repo}/issues/comments/${comment_id}" --method DELETE
102-
else
103-
echo "result is success and no previous comment to delete"
104-
fi
105-
else
106-
local body="${MARKER}
107-
Thank you for opening this pull request!
108-
109-
Reviewer note: [cargo-semver-checks](https://github.com/obi1kenobi/cargo-semver-checks) reported the current version number is not SemVer-compatible with the changes in this pull request (compared against the base branch).
110-
111-
<details>
112-
<summary>Details</summary>
113-
114-
\`\`\`
115-
${logs}
116-
\`\`\`
117-
118-
</details>"
119-
120-
if [ -n "$comment_id" ]; then
121-
echo "comment already exists, updating content"
122-
gh api "repos/${repo}/issues/comments/${comment_id}" \
123-
--method PATCH --field body="$body"
124-
else
125-
echo "no comment with breaking changes, creating a new one"
126-
gh api "repos/${repo}/issues/${pr_number}/comments" \
127-
--method POST --field body="$body"
128-
fi
129-
fi
130-
}
131-
13275
# ── main ────────────────────────────────────────────────────────────
133-
cmd="${1:?Usage: changed_crates.sh <changed-crates|semver-check|comment> [args...]}"
76+
cmd="${1:?Usage: changed_crates.sh <changed-crates|semver-check> [args...]}"
13477
shift
13578

13679
case "$cmd" in
13780
changed-crates) cmd_changed_crates "$@" ;;
13881
semver-check) cmd_semver_check "$@" ;;
139-
comment) cmd_comment "$@" ;;
14082
*) echo "Unknown command: $cmd" >&2; exit 1 ;;
14183
esac

0 commit comments

Comments
 (0)