fix(FR-2582): preserve final TPS value after LLM Playground response ends (#6707)
Conversation
Pull request overview
Preserves the final “tokens per second” (TPS) value in the LLM Playground UI after a streaming response completes by capturing an end timestamp and using it in the TPS calculation.
Changes:
- Add `endTime` state in `ChatCard` and set it when streaming finishes; reset it when a new message send starts.
- Plumb `endTime` through `ChatMessages` into `ChatTokenCounter`.
- Update the TPS calculation to use `(endTime ?? Date.now())` so TPS remains stable after completion.
Reviewed changes
Copilot reviewed 3 out of 3 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
| `react/src/components/Chat/ChatCard.tsx` | Tracks `endTime` for each run (set on finish, cleared on new send) and passes it down to the message UI. |
| `react/src/components/Chat/ChatMessages.tsx` | Extends props to accept `endTime` and forwards it to the token counter. |
| `react/src/components/Chat/ChatTokenCounter.tsx` | Uses `endTime` (or the current time while streaming) to compute TPS and keep the final value after completion. |
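The TPS computation these changes describe can be sketched as a pure function. This is a hypothetical illustration based on the review summary; the actual `ChatTokenCounter` component and its prop names may differ.

```typescript
// Hypothetical sketch of the TPS calculation described above.
function computeTps(
  tokenCount: number,
  startTime: number | null, // set when the first output token arrives
  endTime: number | null, // set when streaming ends; null while streaming
): number {
  if (startTime === null) {
    return 0; // no output tokens received yet
  }
  // While streaming, measure against the current time; once endTime is set,
  // the window freezes so the final TPS value is preserved in the UI.
  const elapsedSeconds = ((endTime ?? Date.now()) - startTime) / 1000;
  if (elapsedSeconds <= 0) {
    return 0; // avoid an Infinity display before the clock advances
  }
  return tokenCount / elapsedSeconds;
}
```

With a frozen window the result is stable: once `endTime` is non-null, repeated renders keep returning the same value instead of drifting toward zero.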
Coverage report for

| St. | Category | Percentage | Covered / Total |
|---|---|---|---|
| 🔴 | Statements | 8.59% | 1757/20443 |
| 🔴 | Branches | 7.88% | 1131/14351 |
| 🔴 | Functions | 5.14% | 285/5544 |
| 🔴 | Lines | 8.31% | 1649/19837 |
Test suite run success
856 tests passing in 39 suites.
Report generated by 🧪jest coverage report action from 2e0a5e1
Force-pushed 5a82835 to 1ca31e4
Force-pushed 1ca31e4 to d8d3d02
Force-pushed d8d3d02 to 91ff835
Merge activity
…ends (#6707)

Resolves #6705 (FR-2582)

## Summary

In the LLM Playground, the TPS (tokens per second) indicator dropped to `0` immediately after a streaming response finished because `onFinish` cleared `startTime` to `null` and `ChatTokenCounter` returned `0` whenever `startTime` was nullish. This made the final TPS measurement disappear from the UI.

This change:

- **Aligns TPS measurement with the standard LLM inference convention** ([vLLM](https://docs.vllm.ai/en/stable/design/metrics/), [Ollama](https://github.com/ollama/ollama/blob/main/docs/api.md), [NVIDIA GenAI-Perf](https://docs.nvidia.com/nim/benchmarking/llm/latest/metrics.html), [Anyscale](https://docs.anyscale.com/llm/serving/benchmarking/metrics)): start the measurement window when the **first output token** actually arrives, not when the user presses send. This excludes file upload, network RTT, and prefill time (TTFT) from the TPS denominator, so the displayed value reflects the pure decode rate.
- Tracks the measurement window as `{ startTime, endTime }` in `ChatCard`:
  - `startTime` is set by a `useEffect` when `status` transitions to `'streaming'` (i.e., the first token has been received).
  - `endTime` is set by a `useEffect` when streaming ends — this covers normal completion **and** abort / error paths, so TPS freezes correctly in every case instead of drifting downward indefinitely after `stop()`.
  - `handleSendMessage` resets both to `null` on every new send.
- `ChatTokenCounter` now computes the elapsed time as `((endTime ?? Date.now()) - startTime) / 1000` and short-circuits to `0` when elapsed is non-positive, avoiding an `Infinity` TPS display when the computation runs before the first token chunk has been counted.

## Files changed

- `react/src/components/Chat/ChatCard.tsx`
- `react/src/components/Chat/ChatMessages.tsx`
- `react/src/components/Chat/ChatTokenCounter.tsx`

## Manual test plan

- Send a prompt to a model and confirm the TPS counter updates while the response streams in.
- After the response completes, confirm the TPS value remains visible (frozen at the last measurement) instead of resetting to `0`.
- Send a second prompt and confirm the TPS counter resets and starts measuring the new response.
- Click stop mid-stream: TPS should freeze at the partial value rather than continue drifting downward.
- Send a prompt with a large file attachment: TPS should reflect only the model's decode rate, not the upload duration.

## Verification

`bash scripts/verify.sh` -> `=== ALL PASS ===` (Relay, Lint, Format, TypeScript)
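The `{ startTime, endTime }` bookkeeping described above can be modeled as a small pure transition function. This is a sketch under assumed status names (`'streaming'`, `'ready'`, etc.); in `ChatCard` the equivalent logic runs inside `useEffect` hooks reacting to `status` changes, and `resetWindow` corresponds to the reset in `handleSendMessage`.

```typescript
// Hypothetical model of the measurement-window tracking; names other than
// startTime/endTime are assumptions for illustration.
type Status = 'submitted' | 'streaming' | 'ready' | 'error';

interface MeasureWindow {
  startTime: number | null;
  endTime: number | null;
}

// Applied whenever `status` changes (in ChatCard this runs in a useEffect).
function trackWindow(
  win: MeasureWindow,
  status: Status,
  now: number,
): MeasureWindow {
  if (status === 'streaming' && win.startTime === null) {
    // First output token has arrived: open the measurement window.
    return { startTime: now, endTime: null };
  }
  if (status !== 'streaming' && win.startTime !== null && win.endTime === null) {
    // Streaming ended (completion, abort, or error): freeze the window.
    return { ...win, endTime: now };
  }
  return win;
}

// handleSendMessage clears both fields so the next run measures fresh.
const resetWindow = (): MeasureWindow => ({ startTime: null, endTime: null });
```

Because the second branch fires on any non-streaming status once a window is open, the abort and error paths freeze TPS the same way normal completion does.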
Force-pushed 91ff835 to 2e0a5e1
