
fix(FR-2582): preserve final TPS value after LLM Playground response ends #6707

Merged

graphite-app[bot] merged 1 commit into main from
04-15-fix_fr-2582_preserve_final_tps_value_after_llm_playground_response_ends
on Apr 23, 2026

Conversation

@yomybaby yomybaby (Member) commented Apr 15, 2026

Resolves #6705 (FR-2582)

Summary

In the LLM Playground, the TPS (tokens per second) indicator dropped to 0 immediately after a streaming response finished because onFinish cleared startTime to null and ChatTokenCounter returned 0 whenever startTime was nullish. This made the final TPS measurement disappear from the UI.

This change:

  • Aligns TPS measurement with the standard LLM inference convention (vLLM, Ollama, NVIDIA GenAI-Perf, Anyscale): start the measurement window when the first output token actually arrives, not when the user presses send. This excludes file upload, network RTT, and prefill time (TTFT) from the TPS denominator, so the displayed value reflects the pure decode rate.
  • Tracks the measurement window as { startTime, endTime } in ChatCard:
    • startTime is set by a useEffect when status transitions to 'streaming' (i.e., the first token has been received).
    • endTime is set by a useEffect when streaming ends — covers normal completion and abort / error paths, so TPS freezes correctly in every case instead of drifting downward indefinitely after stop().
    • handleSendMessage resets both to null on every new send.
  • ChatTokenCounter now computes elapsed as ((endTime ?? Date.now()) - startTime) / 1000 and short-circuits to 0 when elapsed is non-positive, avoiding an Infinity TPS display when the computation runs before the first token chunk has been counted (a sketch of this wiring follows below).
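
A minimal sketch of this wiring in one TypeScript module, assuming AI SDK-style status values and that a token count is maintained elsewhere; only startTime, endTime, status, and handleSendMessage are named in this PR, so every other name here is illustrative:

```ts
import { useEffect, useRef, useState } from 'react';

// Status values are assumed to follow the AI SDK's useChat convention.
type ChatStatus = 'submitted' | 'streaming' | 'ready' | 'error';

// ChatCard side: open the window when the first token arrives, freeze it
// when the stream ends (completion, abort, or error), clear it on new send.
export function useTpsWindow(status: ChatStatus) {
  const [startTime, setStartTime] = useState<number | null>(null);
  const [endTime, setEndTime] = useState<number | null>(null);
  const prevStatus = useRef<ChatStatus>(status);

  useEffect(() => {
    if (status === 'streaming' && prevStatus.current !== 'streaming') {
      setStartTime(Date.now()); // first output token has been received
    }
    if (status !== 'streaming' && prevStatus.current === 'streaming') {
      setEndTime(Date.now()); // stream over: freeze the measurement window
    }
    prevStatus.current = status;
  }, [status]);

  // Called from handleSendMessage so each run starts with a clean window.
  const reset = () => {
    setStartTime(null);
    setEndTime(null);
  };

  return { startTime, endTime, reset };
}

// ChatTokenCounter side: live value while endTime is null, frozen afterwards.
export function computeTps(args: {
  tokenCount: number;
  startTime: number | null;
  endTime: number | null;
}): number {
  const { tokenCount, startTime, endTime } = args;
  if (startTime === null) return 0; // no token has arrived yet
  const elapsed = ((endTime ?? Date.now()) - startTime) / 1000;
  if (elapsed <= 0) return 0; // guard against an Infinity/NaN display
  return tokenCount / elapsed;
}
```

The prevStatus ref is just one way to detect the streaming transitions the bullets describe; the real components may track them differently.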

Files changed

  • react/src/components/Chat/ChatCard.tsx
  • react/src/components/Chat/ChatMessages.tsx
  • react/src/components/Chat/ChatTokenCounter.tsx

Manual test plan

  • Send a prompt to a model and confirm the TPS counter updates while the response streams in.
  • After the response completes, confirm the TPS value remains visible (frozen at the last measurement) instead of resetting to 0.
  • Send a second prompt and confirm the TPS counter resets and starts measuring the new response.
  • Click stop mid-stream: TPS should freeze at the partial value rather than continue drifting downward.
  • Send a prompt with a large file attachment: TPS should reflect only the model's decode rate, not the upload duration. (An automated sketch of a few of these checks follows below.)
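
If part of this plan were ever automated, a jest sketch against the hypothetical computeTps helper from the sketch above could cover the freeze/reset behavior (the module path and helper name are assumptions of this write-up, not the component's actual API):

```ts
import { computeTps } from './chatTps'; // hypothetical module from the sketch above

describe('computeTps', () => {
  beforeEach(() => jest.useFakeTimers());
  afterEach(() => jest.useRealTimers());

  it('returns 0 before the first token arrives', () => {
    expect(computeTps({ tokenCount: 0, startTime: null, endTime: null })).toBe(0);
  });

  it('measures a live stream against the current time while endTime is null', () => {
    jest.setSystemTime(10_000);
    // Window opened 2 seconds ago and is still open: 40 tokens / 2 s = 20 TPS.
    expect(computeTps({ tokenCount: 40, startTime: 8_000, endTime: null })).toBe(20);
  });

  it('freezes the value once endTime is set', () => {
    // 50 tokens over a fixed 2-second window stays 25 TPS, regardless of "now".
    expect(computeTps({ tokenCount: 50, startTime: 1_000, endTime: 3_000 })).toBe(25);
  });
});
```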

Verification

bash scripts/verify.sh -> === ALL PASS === (Relay, Lint, Format, TypeScript)

Copilot AI review requested due to automatic review settings April 15, 2026 05:28
@github-actions github-actions Bot added the size:S 10~30 LoC label Apr 15, 2026


How to use the Graphite Merge Queue

Add either label to this PR to merge it via the merge queue:

  • flow:merge-queue - adds this PR to the back of the merge queue
  • flow:hotfix - for urgent changes, fast-track this PR to the front of the merge queue

You must have a Graphite account in order to use the merge queue. Sign up using this link.

An organization admin has required the Graphite Merge Queue in this repository.

Please do not merge from GitHub as this will restart CI on PRs being processed by the merge queue.

This stack of pull requests is managed by Graphite. Learn more about stacking.


Copilot AI left a comment


Pull request overview

Preserves the final “tokens per second” (TPS) value in the LLM Playground UI after a streaming response completes by capturing an end timestamp and using it in the TPS calculation.

Changes:

  • Add endTime state in ChatCard and set it when streaming finishes; reset it when a new message send starts.
  • Plumb endTime through ChatMessages into ChatTokenCounter.
  • Update TPS calculation to use (endTime ?? Date.now()) so TPS remains stable after completion (prop plumbing sketched below).
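
Read together with the earlier sketch, the plumbing could be as thin as the following; this is a sketch reusing the hypothetical computeTps helper, and any prop name beyond startTime/endTime is a guess:

```tsx
import React from 'react';
import { computeTps } from './chatTps'; // hypothetical module from the earlier sketch

interface TpsWindowProps {
  startTime: number | null;
  endTime: number | null;
  tokenCount: number;
}

// ChatTokenCounter renders the value; (endTime ?? Date.now()) inside
// computeTps keeps it live mid-stream and frozen after completion.
function ChatTokenCounter(props: TpsWindowProps) {
  return <span>{computeTps(props).toFixed(1)} tok/s</span>;
}

// ChatMessages does not consume the window itself; it only forwards it.
export function ChatMessages(props: TpsWindowProps) {
  return <ChatTokenCounter {...props} />;
}
```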

Reviewed changes

Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.

  • react/src/components/Chat/ChatCard.tsx: tracks endTime for each run (set on finish, cleared on new send) and passes it down to the message UI.
  • react/src/components/Chat/ChatMessages.tsx: extends props to accept endTime and forwards it to the token counter.
  • react/src/components/Chat/ChatTokenCounter.tsx: uses endTime (or the current time while streaming) to compute TPS and keeps the final value after completion.

Comment thread react/src/components/Chat/ChatTokenCounter.tsx Outdated
Comment thread react/src/components/Chat/ChatCard.tsx Outdated

github-actions Bot commented Apr 15, 2026

Coverage report for ./react

Category Percentage Covered / Total
🔴 Statements 8.59% 1757/20443
🔴 Branches 7.88% 1131/14351
🔴 Functions 5.14% 285/5544
🔴 Lines 8.31% 1649/19837

Test suite run success

856 tests passing in 39 suites.

Report generated by 🧪 jest coverage report action from 2e0a5e1

@yomybaby yomybaby force-pushed the 04-15-fix_fr-2582_preserve_final_tps_value_after_llm_playground_response_ends branch from 5a82835 to 1ca31e4 on April 15, 2026 05:37
@github-actions github-actions Bot added the bug label Apr 15, 2026
@yomybaby yomybaby force-pushed the 04-15-fix_fr-2582_preserve_final_tps_value_after_llm_playground_response_ends branch from 1ca31e4 to d8d3d02 on April 15, 2026 07:02
@github-actions github-actions Bot added size:M 30~100 LoC and removed size:S 10~30 LoC labels Apr 15, 2026
@yomybaby yomybaby requested a review from agatha197 April 21, 2026 04:05
Comment thread react/src/components/Chat/ChatTokenCounter.tsx Outdated
@yomybaby yomybaby force-pushed the 04-15-fix_fr-2582_preserve_final_tps_value_after_llm_playground_response_ends branch from d8d3d02 to 91ff835 on April 23, 2026 03:38
@yomybaby yomybaby requested a review from agatha197 April 23, 2026 03:38

@agatha197 agatha197 left a comment


LGTM


graphite-app Bot commented Apr 23, 2026

Merge activity

fix(FR-2582): preserve final TPS value after LLM Playground response ends (#6707)

@graphite-app graphite-app Bot force-pushed the 04-15-fix_fr-2582_preserve_final_tps_value_after_llm_playground_response_ends branch from 91ff835 to 2e0a5e1 on April 23, 2026 07:33
@graphite-app graphite-app Bot merged commit 2e0a5e1 into main Apr 23, 2026
11 checks passed
@graphite-app graphite-app Bot deleted the 04-15-fix_fr-2582_preserve_final_tps_value_after_llm_playground_response_ends branch April 23, 2026 07:34

Labels

bug, size:M 30~100 LoC


Development

Successfully merging this pull request may close these issues.

LLM Playground TPS value drops to 0 after response ends

3 participants