feat(copilot): Ask Ontos uplift — personalization, smart prompts, audience contract (#280) #488
Draft
mvkonchits-db wants to merge 27 commits into
Add 13 markdown files under docs/concepts/ that serve as the grounding
corpus for the Ask Ontos copilot. Covers:
- roles & RBAC + permission model
- data product / data contract lifecycles
- agreement workflow (workflow vs execution vs agreement)
- ontology and knowledge-graph model, semantic linking (three-tier)
- data quality + DQX integration end-to-end
- delivery modes vs delivery methods (disambiguated)
- MCP and Ask Ontos surfaces
- asset model
- personas quick-reference
- end-to-end flows (bottom-up UC -> catalog, top-down ontology -> assets)
Every major section carries an explicit {#kebab-anchor} so the copilot
can cite via search_ontos_concepts in a follow-up commit. Citations are
hidden from end-users in v1; the corpus is LLM grounding, not a
user-facing docs site.
Vocabulary aligned with the pitch deck + CUJ doc (ODPS v1.0.0, ODCS
v3.1.0). Forward-compatibility softening applied for several in-flight
PRs (versioning, Ontos admin decoupling, approver-role filter, etc.)
without naming them.
Co-authored-by: Isaac
Make the in-product copilot citation-anchor conceptual answers to the
new docs/concepts/ corpus.
- Add SearchOntosConceptsTool that walks docs/concepts/, parses sections
by heading and {#anchor}, returns top-K excerpts ranked by title >
anchor > body keyword frequency. Each match returns file, anchor,
title, excerpt, source_uri (file.md#anchor).
- Add 'concepts' query-classifier category in DEFAULT_CATEGORIES and
ALWAYS_INCLUDED_CATEGORIES so the tool is offered on every conceptual
question.
- Extract hardcoded SYSTEM_PROMPT into a new
controller/system_prompts.py module exposing get_system_prompt() with
personalization slots (role, page_name, selected_entity,
adoption_mode) for Phase 2/3 to fill. v1 ignores the slots.
- Honor LLM_SYSTEM_PROMPT env override (previously defined in Settings
but never consumed).
- New default system prompt: vocabulary primer aligned with the pitch
deck + CUJ doc, tool-first policy for conceptual questions, three-tier
confidence labels ([Confirmed]/[Documented]/[Inferred]), hidden
citation discipline, strict refusal template, out-of-scope deflection.
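The section parsing and ranking described for SearchOntosConceptsTool can be sketched roughly as follows. This is a minimal illustration, not the tool's actual code: the `Section` dataclass, the heading regex, the 100/10/1 weights, and the 200-character excerpt length are all assumptions chosen to reproduce the "title > anchor > body keyword frequency" ordering.

```python
import re
from dataclasses import dataclass

# Assumed heading shape: "## Title {#kebab-anchor}" at h2 or h3.
HEADING_RE = re.compile(r"^(#{2,3})\s+(.*?)\s*(?:\{#([a-z0-9-]+)\})?\s*$")


@dataclass
class Section:
    file: str
    anchor: str
    title: str
    body: str


def parse_sections(file: str, text: str) -> list[Section]:
    """Split one markdown document into sections at each h2/h3 heading."""
    sections: list[Section] = []
    current = None
    for line in text.splitlines():
        match = HEADING_RE.match(line)
        if match:
            current = Section(file, match.group(3) or "", match.group(2), "")
            sections.append(current)
        elif current is not None:
            current.body += line + "\n"
    return sections


def score(section: Section, query: str) -> int:
    """Weight title hits above anchor hits above body hits (weights are hypothetical)."""
    terms = re.findall(r"[a-z0-9]+", query.lower())
    title = section.title.lower()
    anchor = section.anchor.lower()
    body = section.body.lower()
    return sum(100 * title.count(t) + 10 * anchor.count(t) + body.count(t) for t in terms)


def search(sections: list[Section], query: str, top_k: int = 3) -> list[dict]:
    """Return the top-K matching sections; an empty or unmatched query yields []."""
    scored = [(score(s, query), s) for s in sections]
    hits = sorted((p for p in scored if p[0] > 0), key=lambda p: p[0], reverse=True)
    return [
        {
            "file": s.file,
            "anchor": s.anchor,
            "title": s.title,
            "excerpt": s.body.strip()[:200],
            "source_uri": f"{s.file}#{s.anchor}",
        }
        for _, s in hits[:top_k]
    ]
```

Each match carries the same five fields the commit lists (file, anchor, title, excerpt, source_uri), so a caller could cite `file.md#anchor` directly.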
Tests:
- 13 unit tests for SearchOntosConceptsTool (empty query, known concept,
multi-doc concept, no-match, anchor extraction)
- 6 integration tests for /api/llm-search/chat with the new tool +
system prompt
- Full unit suite passes (1011/1011); no regressions
Co-authored-by: Isaac
The previous resolution walked exactly 5 parents above concepts.py to find docs/concepts/. That assumed the local-dev layout (with src/ as a wrapper) and silently broke in deployed Databricks Apps where src/ is stripped (so the corpus lives 4 parents up, not 5).
Replace with:
- ONTOS_CONCEPTS_DIR env var override (explicit, takes precedence)
- Walk-up search across parents 2..6 looking for docs/concepts/
- Graceful None on miss (tool still returns success=True, empty matches)
Verified for both layouts:
- Local: <ontos>/src/backend/src/tools/concepts.py -> finds at parents[4]
- Deployed: <approot>/backend/src/tools/concepts.py -> finds at parents[3]
Co-authored-by: Isaac
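A resolver along these lines would behave as the commit describes. It is a sketch under assumptions: the function name `resolve_corpus_dir` is hypothetical, and falling through to the walk-up search when the env var points at a missing directory is a guess, since the commit only says the override takes precedence.

```python
import os
from pathlib import Path
from typing import Optional

ENV_VAR = "ONTOS_CONCEPTS_DIR"  # env var name as given in the commit message


def resolve_corpus_dir(start: Path) -> Optional[Path]:
    """Locate docs/concepts/ for the module at `start`, or return None on a miss."""
    override = os.environ.get(ENV_VAR)
    if override and Path(override).is_dir():
        return Path(override)  # explicit override wins

    # Assumption: an unset or invalid override falls through to the walk-up search.
    parents = start.resolve().parents
    for depth in range(2, 7):  # parents[2] .. parents[6]
        if depth >= len(parents):
            break
        candidate = parents[depth] / "docs" / "concepts"
        if candidate.is_dir():
            return candidate
    return None  # caller degrades to an empty result instead of raising
```

Searching a range of depths rather than a fixed one is what lets the same code work for both the local layout (match at `parents[4]`) and the deployed layout (match at `parents[3]`).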
…onse
The system prompt asks the model to anchor conceptual answers with hidden HTML-comment citations (e.g. `<!-- ref: roles-and-rbac.md#... -->`) so reviewers can audit grounding without exposing them to end users. Most markdown renderers drop HTML comments on render, but the chat UI surfaces them as visible text, which is what live E2E confirmed.
Add a server-side strip in LLMSearchManager:
- `_CITATION_COMMENT_RE` matches `<!-- ref: ... -->` (non-greedy)
- `_strip_internal_citations` returns (cleaned_text, [refs]) so debug_info retains the citations for audit while the user-facing response is clean
- Applied at the inner-loop final return; collapses any 3+ newlines created by the strip back to double
Citations remain accessible via `debug_info["internal_citations"]` when the client sets `debug=True`.
Co-authored-by: Isaac
Add a 14th file covering install (Marketplace vs Git), update procedures,
maintenance, and common UI errors. 37 anchors so any specific error can
be cited. Topics:
- Distribution channels: Marketplace vs GitHub repo, when to choose which
- First install: prerequisites, first-admin bootstrap, demo presets
- Updates: Marketplace path, Git path, migration discipline (append-only,
≤32-char revision IDs), DB state vs code state
- Maintenance: alembic at startup, role re-seeding (first-start-only),
workspace sync direction (from src/), OAuth scope-change cookie gotcha,
customer fork hygiene
- UI errors users actually see:
* Identity — Request role prompt, unexpected 403s, UC scope missing
* Workflows — Cannot approve, grant_permissions failed (MANAGE required)
* Database — Alembic version too long, Lakebase autoscale stuck, stale
data after git revert
* Deploy — Process did not start in 10 min, corpus not found
6 customer-voice "Common questions". Cross-references to roles-and-rbac,
agreement-workflow, delivery-and-propagation, mcp-and-ask-ontos. No
customer names, no internal ticket IDs.
README.md updated to 14 files; verification footer bumped to 2026-05-29.
Co-authored-by: Isaac
…ponse
The labels [Confirmed]/[Documented]/[Inferred] were emitted user-visible per the v1 system prompt, but they expose grounding mechanics that don't belong in the surfaced answer. Treat them the same way as citation comments: emit them so the model still stratifies confidence and so reviewers can audit grounding, but strip server-side before returning.
- Add `_CONFIDENCE_LABEL_RE` and extend `_strip_internal_citations` to a 3-tuple return (cleaned_text, citations, confidence_labels)
- Surface both into debug_info (`internal_citations`, `confidence_labels`) so audit consumers can still see them via `debug=true`
- Update system prompt to declare the labels internal/stripped (so the model knows the act of stratifying matters even though they're hidden)
Co-authored-by: Isaac
The model was opening conceptual answers with a bolded restatement of the user's question (e.g. **What is a Team?** followed by the answer). That's redundant in the chat thread where the user already sees their own question above, and reads as noise.
Update the Response format section to explicitly forbid:
- restating, echoing, or rephrasing the question
- bolded-question headers as openers
- "Great question!" / "Let me explain..." fillers
Begin with the answer directly.
Co-authored-by: Isaac
system_prompts.py is not a controller — it's a pure templating function with no state, no business logic, no DB or manager dependencies. Move it next to the MCP/LLM tools registry it actually feeds, where the file structure already groups everything the copilot reaches for. Per Lars' review on PR #472. Co-authored-by: Isaac
…" namespace (#280) Ontos already uses "Concept" for an RDF/SKOS ontology entity — a node in the knowledge graph identified by an IRI, surfaced as an ontology class or glossary term. Overloading the same noun for the LLM grounding corpus caused real confusion: `tools/concepts.py` was unrelated to the `Concept` entity, `search_ontos_concepts` could mean either, and "concept corpus" / "concept docs" prose blurred the distinction throughout the codebase. This commit renames the LLM grounding corpus to "handbook" everywhere — `docs/concepts/` → `docs/handbook/`, `tools/concepts.py` → `tools/handbook.py`, the LLM-callable tool `search_ontos_concepts` → `search_ontos_handbook`, the `ONTOS_CONCEPTS_DIR` env var → `ONTOS_HANDBOOK_DIR`, and the `concepts` query-classifier category → `handbook`. "Concept" is now reserved for the ontology entity. Per Lars' review on PR #472. Co-authored-by: Isaac
The handbook corpus is English-only, but the Ontos UI ships in
seven locales (English, German, Spanish, French, Italian, Japanese,
Dutch). Today the system prompt gives the LLM no guidance on what
to translate vs. keep in English — so non-English answers end up
translating product nouns ("Datenprodukt", "Lieferbar") that don't
exist anywhere in the UI.
Add a `## Language` section that tells the model: answer in the
user's language, but keep Ontos product nouns and UI labels in
English exactly as they appear in the app.
Per Lars' review on PR #472.
Co-authored-by: Isaac
…ure log The corpus-not-found warning referenced a symbol that was never defined (carried over from the original concepts.py implementation; the constant was removed but the log f-string still referenced it). This was a latent NameError on the unhappy path — if the resolver ever returned None the log emit itself would crash before the empty-matches ToolResult could be returned. Replace with the env-var constant we actually have, so the warning tells operators exactly which knobs were tried. Co-authored-by: Isaac
…ble (#280)
Phase 2 of the Ask Ontos uplift teaches the copilot whether the workspace is fresh (no published data products) or active and tailors every conceptual answer accordingly. Two coordinated paths share one snapshot helper so the prompt and the LLM-callable tool can never disagree.
Backend
- New ``src/tools/app_state.py`` exposes ``GetAppStateTool`` (LLM-callable, no params, always-on category) and ``get_adoption_snapshot(db)``, the shared helper that counts entities and derives ``adoption_mode in {blank, active}``. Mode hinges on *published* products (non-null ``publication_scope`` AND not literal 'none'), so a workspace with only drafts is still blank.
- Tool registered in ``tools/registry.py`` and added to ``ToolName.GET_APP_STATE``. New ``app_state`` keyword category in ``query_classifier.py``, also added to ``ALWAYS_INCLUDED_CATEGORIES`` so the LLM can introspect on every call.
- ``LLMSearchManager.chat`` pre-fetches the snapshot ONCE per request and passes ``adoption_mode`` into ``get_system_prompt``. Snapshot failure logs and degrades to ``adoption_mode=None``; the prompt falls back to the byte-identical Phase 1 default, so the existing override-path and grounded-prompt tests still pass.
- ``system_prompts.py`` prepends a ``## Current workspace state`` H2 with one of two short preambles (onboarding vs operational tone). The ``LLM_SYSTEM_PROMPT`` env override still wins verbatim; preambles do NOT stack on top of it.
- ``LLMSearchStatus`` gains ``adoption_mode`` so the frontend can pick mode-aware starter prompts without an extra round-trip.
Also fixes a NameError in ``tools/concepts.py`` (PR1 left a reference to an undefined ``_DEFAULT_CONCEPTS_DIR`` in the missing-corpus branch); covered by the existing test which now passes.
Frontend
- ``LLMSearchStatus`` TS type and ``AdoptionMode`` literal mirror the backend.
- ``copilot-questions.ts`` gains an optional ``adoptionMode`` field on ``CopilotQuestionDef`` and a new ``getting_started`` category. Five onboarding starter prompts (create first product, set up domains, assign roles, what-is-Ontos, core concepts) tagged ``adoptionMode: 'blank'``.
- ``useCopilotQuestions`` filters by adoption mode passed from the panel. ``null`` mode (snapshot unavailable) hides mode-tagged questions only; the historical catalog still renders.
- ``copilot-panel.tsx`` forwards ``status?.adoption_mode`` into the hook. ``llm-search.tsx``'s ``ExampleQuestions`` also switches its 4 starter buttons based on mode.
Tests
- 10 unit tests for ``GetAppStateTool`` + the snapshot helper (blank, draft-only-still-blank, literal-'none' handling, active-on-publish, multi-entity counts, no-param tolerance, error degrade, snapshot shape).
- 8 integration tests through ``LLMSearchManager`` (registry, always-on category, blank-mode preamble, active-mode preamble, override path bypasses preamble, status surfaces mode, snapshot failure falls back to default prompt).
All 37 unit + integration tests in the touched scope pass.
Co-authored-by: Isaac
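The adoption-mode rule can be reduced to a few lines. This sketch leaves out the database entirely and takes the ``publication_scope`` values as plain input; the function names are hypothetical, and treating 'none' as an exact, case-sensitive literal is an assumption.

```python
from typing import Callable, Iterable, Optional


def is_published(publication_scope: Optional[str]) -> bool:
    """A product counts as published when its scope is non-null and not the literal 'none'."""
    return publication_scope is not None and publication_scope != "none"


def derive_adoption_mode(scopes: Iterable[Optional[str]]) -> str:
    """'active' as soon as one product is published; drafts alone leave the workspace 'blank'."""
    return "active" if any(is_published(s) for s in scopes) else "blank"


def safe_adoption_mode(fetch_scopes: Callable[[], Iterable[Optional[str]]]) -> Optional[str]:
    """Degrade to None when the snapshot cannot be read, so the caller keeps the default prompt."""
    try:
        return derive_adoption_mode(fetch_scopes())
    except Exception:
        return None
```

Keeping the derivation in one helper is what lets the prompt preamble and the LLM-callable tool report the same mode.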
…mpt (#280)
Phase 3 of the Ask Ontos uplift teaches the copilot WHO is asking and WHAT they're looking at. The frontend already had the data in its copilot store; this PR wires it through the chat payload and renders a ``## Current user context`` block in the system prompt so answers can be tailored ("a Data Consumer needs task-completion help; an Admin needs configuration depth").
Backend
- ``ChatMessageCreate`` gains optional ``page_name`` / ``page_url`` / ``feature_id`` / ``selected_entity`` fields. A new ``SelectedEntity`` Pydantic model holds the small (type, name, id) descriptor. All fields are optional so pre-Phase-3 clients (MCP, tests) keep working unchanged.
- The chat route receives those fields and passes them to ``LLMSearchManager.chat``. The user's effective Ontos role label is derived server-side via a new helper ``_derive_effective_role_label`` that intersects the user's groups with role assignments (case-insensitive, comma-joined when multiple match). Role is NOT taken from the request payload: defense-in-depth so a client can't impersonate a role just by lying.
- ``system_prompts.get_system_prompt`` now renders the ``## Current user context`` H2 below the Phase 2 workspace-state preamble, with bullets for Role / Currently on / Viewing. Empty inputs cleanly omit individual lines or the whole block; it returns the byte-identical Phase 1 default when nothing is passed.
- ``LLM_SYSTEM_PROMPT`` env override still wins verbatim; Phase 3 preambles do NOT prepend on top of an override.
Frontend
- ``sendMessage`` in ``llm-search-api.ts`` now reads ``pageContext`` off the Zustand store via ``getState()`` (the API helper is a plain function, not a hook) and forwards ``page_name``, ``page_url``, ``feature_id``, ``selected_entity`` in the chat payload. A small inline ``ChatRequestBody`` type documents the shape without dragging in a fuller request-model export.
Tests
- 11 integration tests in ``test_llm_search_user_context.py``:
  - 4 prompt-assembly tests (full block, omit-when-empty, partial inputs, entity-without-id).
  - 3 end-to-end chat tests (propagation, no-context backward compat, ``_chat_context`` is cleared between calls so a long-lived manager can't leak request state).
  - 4 role-derivation unit tests (no settings manager, empty groups, case-insensitive intersection, multi-role join).
All 48 unit + integration tests in the Ask Ontos touch-set pass. Full backend suite: my touched scope has zero failures; the 165 pre-existing failures on this branch are in unrelated areas (test_teams_routes, test_user_routes, test_workspace_routes, test_pr_i_partial_put, test_semantic_upload_integration) and predate this PR.
Co-authored-by: Isaac
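The prompt layering from Phases 2 and 3 might be assembled along these lines. Only the function name ``get_system_prompt``, the two H2 titles, and the Role / Currently on / Viewing bullets come from the commits; the placeholder default prompt, the preamble wording, the `override` parameter, and the dict shape for the selected entity are illustrative assumptions.

```python
from typing import Optional

DEFAULT_PROMPT = "You are Ask Ontos."  # placeholder, not the real default prompt

# Hypothetical preamble text; the real wording is not shown in the commits.
PREAMBLES = {
    "blank": "The workspace has no published data products yet. Favor onboarding guidance.",
    "active": "The workspace has published data products. Favor operational guidance.",
}


def get_system_prompt(
    override: Optional[str] = None,
    adoption_mode: Optional[str] = None,
    role: Optional[str] = None,
    page_name: Optional[str] = None,
    selected_entity: Optional[dict] = None,
) -> str:
    if override:
        return override  # env override wins verbatim; no preambles are stacked on it

    blocks = []
    if adoption_mode in PREAMBLES:
        blocks.append("## Current workspace state\n" + PREAMBLES[adoption_mode])

    bullets = []
    if role:
        bullets.append(f"- Role: {role}")
    if page_name:
        bullets.append(f"- Currently on: {page_name}")
    if selected_entity and selected_entity.get("name"):
        kind = selected_entity.get("type", "entity")
        bullets.append(f"- Viewing: {kind} \"{selected_entity['name']}\"")
    if bullets:  # the whole block is omitted when every input is empty
        blocks.append("## Current user context\n" + "\n".join(bullets))

    blocks.append(DEFAULT_PROMPT)
    return "\n\n".join(blocks)
```

With no arguments the function returns the default prompt unchanged, which is the property the backward-compatibility tests rely on.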
Phase 3 (commit 5f3ed7a8) added structured page / role / entity context to the chat-request payload and rendered it as a `## Current user context` preamble in the system prompt. The old client-side `buildContextPrefix` function in copilot-panel.tsx is now duplicating the same information by injecting it directly into the user's message content, which:
1. eats user-message tokens unnecessarily
2. duplicates structured context as unstructured prose
3. makes session transcripts confusing (the LLM sees the context twice, once in the prompt preamble and once in every user turn)
Remove the helper and the call site. The server-side preamble is the single source of truth.
Co-authored-by: Isaac
) When a user clicks "Role: X (Override)" in the UI, the override is persisted via SettingsManager.set_applied_role_override_for_user but does not mutate user.groups. The Ask Ontos role-derivation helper was the lone holdout still consulting only the user's groups, so the LLM kept seeing the original (e.g. Admin) role instead of the applied override (e.g. Data Producer). Mirror the override-first pattern used in authorization.py and other call sites: consult get_applied_role_override_for_user before falling back to group intersection. If the override id no longer matches any current role, fall through rather than erroring. Fits Phase 3 of the Ask Ontos uplift. Co-authored-by: Isaac
Detail pages were surfacing the same prompts as list pages — "Which
contracts are linked to this data product?" appeared on both
/data-products and /data-products/<id>, never naming the actual
entity. Tag entity-specific questions with `requiresEntity: true`
and hide them when no `selectedEntity` is in the copilot page-context
store. Substitute `{{entityName}}` from `pageContext.selectedEntity.name`
in the rendered text so prompts read e.g. "Which contracts are linked
to Customer 360?". Tagged keys: dp_show_contracts, ct_explain_quality,
ct_add_quality_check, asset_show_lineage, dom_domain_health.
Translations updated across all 7 locales (en, de, es, fr, it, ja, nl)
to use the same placeholder spelling.
Co-authored-by: Isaac
Add 20 new `requiresEntity: true` prompts across data products,
contracts, assets, and domains, each using `{{entityName}}` so they
only surface on detail pages and substitute the current entity name.
New keys: dp_quality_score, dp_owner, dp_schema, dp_last_updated,
dp_subscribe, dp_consumers, dp_publication_status, dp_publish_blockers;
ct_what_covers, ct_used_by, ct_owner, ct_version_impact; asset_built_on,
asset_freshness, asset_quality, asset_consumers; dom_products_in,
dom_business_terms, dom_owner, dom_health_detail. Spread across
explore / build / govern categories. `dom_domain_health` is kept as-is
for backward compat. Localized in all 7 bundles (en/de/es/fr/it/ja/nl).
Co-authored-by: Isaac
Sort matching starter prompts by specificity (entity-templated first, then page-scoped, then global) so the most-context-bound questions surface at the top of the panel. Cap the total prompt count per scope: 6 on detail pages, 6 on type-scoped list pages, 15 on main/marketplace/search. Combined with the new sort, detail pages structurally land ~70% entity-aware prompts (cap=6 with the entity-templated tier taking the top slots).
Co-authored-by: Isaac
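The sort-then-cap behaviour is easy to see in a small sketch. The real hook is frontend TypeScript; this Python version only illustrates the ordering, and the `Question` shape and scope names (`detail`, `list`, `main`) are assumptions.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class Question:
    key: str
    requires_entity: bool = False
    page: Optional[str] = None  # None means the question is global


# Caps from the commit: 6 on detail pages, 6 on list pages, 15 on main/marketplace/search.
CAPS = {"detail": 6, "list": 6, "main": 15}


def specificity(q: Question) -> int:
    """Lower sorts first: entity-templated, then page-scoped, then global."""
    if q.requires_entity:
        return 0
    if q.page is not None:
        return 1
    return 2


def rank_and_cap(questions: list[Question], scope: str) -> list[Question]:
    # sorted() is stable, so questions within a tier keep their catalog order.
    return sorted(questions, key=specificity)[: CAPS[scope]]
```

Because the cap is applied after the sort, the entity-templated tier fills the top slots on detail pages without any explicit quota.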
Turn the "Asking about" chip into a dropdown so users can flip between page-scoped help and a scope-free "Ontos (general)" mode. The chip now renders on every page: page-specific shows the selected entity (detail pages) or the page name (list/main pages); general shows "Ontos (general)" with a check next to the active choice. When the scope is `general`, the frontend strips `page_name`, `page_url`, `feature_id`, and `selected_entity` from the chat payload (backend treats their absence as no context), and the starter-prompt hook filters out page-scoped and entity-templated questions — only globals survive, with the wider main-page cap. The chosen scope is persisted in localStorage under `copilot-context-scope` so it survives panel close/reopen. New i18n keys: `search:copilot.scopePageSpecific`, `search:copilot.scopeGeneral`, `search:copilot.scopeThisPage` — added to all 7 locales (en/de/es/fr/it/ja/nl). Co-authored-by: Isaac
`AppRole.id` is typed as Python `UUID` (object) but the applied role
override is stored as a string (JSON has no UUID type), so the
`role.id == override_id` comparison was always False — every override
silently fell through to group intersection, returning the user's
default role ("Admin" for admins) instead of the impersonated role.
Compare via `str()` on both sides.
Caught when a user with admin groups + Data Producer override saw
admin-flavored Ask Ontos responses despite the UI reflecting the
override.
Co-authored-by: Isaac
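The override-first lookup and the UUID-versus-string pitfall can be reproduced in isolation. This is a sketch, not the actual helper: the `AppRole` dataclass here is a stand-in with only the fields needed, and the ", " separator and the `None` return on no match are assumptions.

```python
from dataclasses import dataclass, field
from typing import Optional
from uuid import UUID, uuid4


@dataclass
class AppRole:
    id: UUID
    name: str
    assigned_groups: list[str] = field(default_factory=list)


def derive_effective_role_label(
    user_groups: list[str], roles: list[AppRole], override_id: Optional[str]
) -> Optional[str]:
    if override_id:
        for role in roles:
            # UUID("...") == "..." is always False, so normalize both sides to str.
            if str(role.id) == str(override_id):
                return role.name
        # A stale override id matches nothing; fall through instead of raising.

    groups = {g.lower() for g in user_groups}
    matched = [r.name for r in roles if groups & {g.lower() for g in r.assigned_groups}]
    return ", ".join(matched) if matched else None
```

The role is derived only from server-side data (groups, assignments, the stored override), which matches the rule that the client payload never supplies it.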
…280)
Ask Ontos answers were leaking implementation language (SQLAlchemy `Db` model names, raw column names, internal workflow IDs) into user-facing output, missing the Ontos Asset concept when users wanted to publish UC tables, and emitting literal `****` strings as section dividers. This commit addresses the structural causes in the prompt + classifier.
System prompt (`controller/system_prompts.py`):
- New **Audience and tone** section establishes the end-user contract and enumerates forbidden vocabulary (`*Db` class names, Pydantic classes, raw column names, workflow IDs). Includes translation examples so the model knows how to map corpus internals to UI labels.
- Added **Asset** to "Core artifacts" so the model treats UC tables as Ontos Assets the moment they enter the platform, instead of staying in raw-catalog framing.
- Tightened the **Consumable** definition to clarify that first-time products exposing existing UC tables don't need Consumables; only products that depend on other Ontos-governed products do.
- Added a Discovery-strategy bullet that routes UC-resource mentions (tables, views, models) to the Asset grounding.
- Added Response-format rules: markdown only, no `****` / unmatched asterisk sequences, and a four-part Action / What happens / Where to see it / Next template for "how do I…" and "where is X…" questions.
Query classifier (`tools/query_classifier.py`):
- New `assets` category triggered by `asset`, `table`, `unity catalog`, `uc`, `delta table`, `publish`, `govern`, etc. The category is wired into the per-request category list so the system prompt's Asset awareness fires on the right queries. No tool currently registers `category = "assets"`, so this is purely a classifier-side signal; future asset tools and tests can rely on it without further work.
Co-authored-by: Isaac
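A keyword classifier of this kind could look roughly like the sketch below. The keyword list and the always-included category names are taken from the commit messages; the data structures, the `classify` function, and the word-boundary matching with an optional plural "s" are assumptions about how such a classifier might avoid false hits (for example "uc" inside "such").

```python
import re

# Trigger keywords listed in the commit (the "etc." remainder is not reproduced here).
CATEGORY_KEYWORDS = {
    "assets": ("asset", "table", "unity catalog", "uc", "delta table", "publish", "govern"),
}

# Categories offered on every request regardless of the query text.
ALWAYS_INCLUDED_CATEGORIES = {"handbook", "app_state"}


def classify(query: str) -> set[str]:
    """Return the categories whose keywords appear in the query as whole words."""
    text = query.lower()
    categories = set(ALWAYS_INCLUDED_CATEGORIES)
    for category, keywords in CATEGORY_KEYWORDS.items():
        if any(re.search(rf"\b{re.escape(kw)}s?\b", text) for kw in keywords):
            categories.add(category)
    return categories
```

Since no tool registers under `assets` yet, the category only changes which grounding the prompt emphasizes, not which tools are offered.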
…ions (#280)
Each of the 13 concept docs now splits into two top-level sections:
- **What you see in Ontos**: UI labels, surface names, action steps, user-visible outcomes. The system prompt's audience contract (Commit A) instructs the LLM to prefer this layer when answering an Ontos end user.
- **Under the hood**: SQLAlchemy `*Db` model names, raw column names, workflow IDs, RDF/SPARQL internals, transport detail. The LLM is instructed to descend here only when the user explicitly asks about implementation or schema.
Heaviest restructure landed on `data-quality.md`, `asset-model.md`, and `ontology-and-knowledge-graph.md`; these mixed UI surfaces and DB internals throughout. The two-system framing in `data-quality.md` keeps the operational steps in "What you see" (Profile dataset action, Quality panel, Schema tab) and the persisted-row detail (the check definition fields, the measurement row fields, the DQX workflow internals, the DQX→ODCS dimension mapping) in "Under the hood". `asset-model.md` rephrases "What an Asset is" in user-facing terms and puts the `AssetDb`/`EntityRelationshipDb` schema detail and the asset-type sync mechanics under the implementation layer. `ontology-and-knowledge-graph.md` reorders sections so the user-facing three-tier linking story, glossary-as-view explanation, industry packs, and round-trip asymmetry all live under "What you see", while the rdflib runtime graph internals and SPARQL endpoint detail land under "Under the hood".
Lighter restructures land on:
- `agreement-workflow.md` (StepType enum, WorkflowExecutionDb state machine, `grant_permissions` SDK detail go under the hood; agreement concept, approval gates, roles, triggers, webhook/delivery/immutability stay user-facing)
- `data-contract-lifecycle.md` (schema-object / server / role / SLA DB-table detail under the hood, ODCS lifecycle and editor-of-record user-facing)
- `data-product-lifecycle.md` (most ODPS field detail is already user-visible spec talk, so the under-the-hood block is a brief pointer to `db_models/data_products.py`)
- `delivery-and-propagation.md` (change-types enum table under the hood, modes and concept→UC-tag flow user-facing)
- `roles-and-rbac.md` (permission model + built-in roles + Filtered + persona override stay user-facing, identity resolution and per-execution authorization go under the hood)
- `installation-and-troubleshooting.md` (admin-facing throughout, so a shallow wrap with a one-paragraph under-the-hood pointer)
- `mcp-and-ask-ontos.md`, `entities-glossary.md`, `personas-quick-reference.md`, `end-to-end-flows.md` (all primarily user-facing, light wrapping)
All existing anchors are preserved so cross-references continue to resolve. No content was deleted; this is a layout-only restructure plus a small amount of in-place rewording for sectional flow on `data-quality.md` and `asset-model.md`.
Co-authored-by: Isaac
"in the top-right" was inserted during the corpus restructure without verifying the UI. The action label "Profile dataset" is enough — locations like top-right/sidebar/header should not appear in concept docs unless explicitly verified against the live UI, since they leak into Ask Ontos answers verbatim. Co-authored-by: Isaac
Two corpus corrections caught when Ask Ontos gave imprecise UI guidance:
1. The action button is **Profile with DQX**, not "Profile dataset".
2. On the contract detail page, check definitions live in the **Quality Rules** section. The **Quality panel** is the Data Product detail page rollup of measurements. The two pages have distinct surfaces and shouldn't be conflated.
Co-authored-by: Isaac
The prompt asked the LLM to itemize quality checks on a contract, but `get_data_contract` doesn't return the embedded `qualityRules` array, so Ask Ontos consistently fell back to "I don't have authoritative information." Better to remove the prompt than to set the LLM up to fail — itemizing checks is the job of the UI's Quality Rules section on the contract detail page, not the copilot. Removes the question definition from copilot-questions.ts and the corresponding i18n strings from all 7 locales. Co-authored-by: Isaac
The LLM ignored the previous negative rule ("Never emit ****") and
kept emitting 4+ asterisk sequences as visual markers — they leak
into the chat panel as literal characters because they're never valid
CommonMark.
Two defenses:
1. Server-side strip in LLMSearchManager.chat: post-process the
assistant response with `re.sub(r"\*{4,}", "", text)`. Four or
more consecutive asterisks are never valid markdown (bold needs
exactly two, bold-italic exactly three), so stripping is safe.
2. Rewrote the markdown rule in system_prompts.py as a positive
spec ("every `*` must pair correctly, closing `**` on the same
line"), with the no-three-or-more-asterisks rule called out
explicitly. Also fixed the inline "Profile dataset" example to
match the actual UI label ("Profile with DQX").
Co-authored-by: Isaac
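The server-side defense is a single substitution. The regex is the one quoted in the commit; wrapping it in a named function is only for illustration.

```python
import re

_ASTERISK_RUN_RE = re.compile(r"\*{4,}")


def strip_asterisk_runs(text: str) -> str:
    """Remove every run of four or more consecutive asterisks; shorter runs are untouched."""
    return _ASTERISK_RUN_RE.sub("", text)
```

Runs of two or three asterisks pass through, so ordinary bold and bold-italic markers survive the strip.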
Two issues surfaced when PR2 rebased onto the renamed PR1:
1. test_llm_search_user_context.py still imported from src.controller.system_prompts; updated to src.tools.system_prompts.
2. After the docs corpus restructure into "What you see in Ontos" / "Under the hood", named-entity content (e.g. `#### Data Steward`) landed under h4, but the parser only saw h2/h3 sections, so focused content was buried inside long h3 bodies and lost token-density ranking. Extended _HEADING_RE and the section selector to accept h4.
Net effect: queries like "data steward" now hit the entity's own focused section in roles-and-rbac.md instead of being outscored by smaller, denser docs.
Co-authored-by: Isaac
aa6b9c4 to
7836d3b
Summary
PR 2 of the Ask Ontos uplift (#280), stacked on top of #472. Where PR 1 grounded the copilot in the curated handbook corpus, this PR personalizes it (role/page/entity awareness + adoption-mode preamble), tightens the response style (audience contract, how-to template, asterisk strip), and adds entity-aware starter prompts with a context-scope toggle. 16 commits, all atomic.
Base branch: `feature/ask-ontos-uplift-pr1`. Merge #472 first; this PR rebases cleanly onto its head.
Phase 2: App-state awareness
- `feat(copilot): app-state awareness — adoption mode + mode-aware preamble`: new `get_app_state` tool + shared `get_adoption_snapshot` helper. `adoption_mode: blank | active` (binary, hinges on published products). The `## Current workspace state` preamble is prepended to the system prompt; the frontend reads the same value to pick mode-aware starter prompts.
Phase 3: Role / page / entity injection
- `feat(copilot): inject role + page + entity context into Ask Ontos prompt`: `ChatMessageCreate` gains optional `page_name` / `page_url` / `feature_id` / `selected_entity`. Server derives the effective role label via group ∩ assigned_groups (defense-in-depth: never accepts role from the client). Frontend forwards page context from the Zustand copilot store.
- `refactor(copilot): drop redundant client-side context prefix`: server-side preamble is the single source of truth; removed the duplicate client-side prose injection.
Role-override fix (caught during E2E)
- `feat(copilot): honor applied role override in Ask Ontos role label`: initial wiring.
- `fix(copilot): match role override id as string in role label derivation`: root cause: `AppRole.id` is a Python UUID but the override is stored as a string from JSON, so `role.id == override_id` was always False and the label silently fell through to group intersection. Compare via `str()` on both sides. Now Data Producer impersonation actually addresses the user as Data Producer.
Starter-prompt UX
- `feat(copilot): entity-aware starter prompts on detail pages`: adds `requiresEntity?: boolean` to `CopilotQuestionDef`; hides those questions when no `selectedEntity` is in the copilot store; substitutes `{{entityName}}` from `selectedEntity.name` in the localized text.
- `feat(copilot): expand entity-templated starter prompts`: ~20 new questions across data products, contracts, assets, domains (e.g. "What is the data quality score of {{entityName}}?"). All 7 locales updated.
- `feat(copilot): smart prompt ranking + per-scope cap`: sort: entity-templated > page-scoped > global. Cap: 6 on detail/list pages, 15 on main/marketplace. The 70%+ entity-bias on detail pages emerges structurally from the sort + cap.
- `feat(copilot): context-scope toggle on Asking-about chip`: chip is now a dropdown with `Page-specific` / `Ontos (general)`. Persisted in localStorage. When `general` is selected the frontend strips page context from the chat payload and the hook filters to globally-scoped starters only. Chip renders on every page (not just when an entity is selected).
- `feat(copilot): drop ct_explain_quality starter prompt`: the prompt asked the LLM to itemize quality checks on a contract, but `get_data_contract` doesn't return the `qualityRules` array, so the LLM consistently fell back to "I don't have authoritative information." Itemization is the UI's job (Quality Rules section); removed the prompt.
Response-style discipline
- `feat(copilot): audience contract + Asset awareness + how-to template`: explicit "you are speaking to an end user" framing in the system prompt, forbidden-vocabulary list (`*Db` class names, raw column names, internal workflow IDs), Asset added to "Core artifacts", Consumable definition tightened ("only needed for upstream-product dependencies"), new how-to template (Action / What happens / Where to see it / Next), `assets` keyword category added to the query classifier.
- `docs(concepts): layer corpus into user-facing and implementation sections`: restructures all 13 handbook docs into `## What you see in Ontos` (UI labels, surfaces, actions) and `## Under the hood` (DB models, columns, workflow IDs). The system prompt instructs the LLM to prefer the user-facing section when answering user questions.
- `docs(concepts): drop unverified UI placement in DQX profiling step`: removed an unverified "top-right" location claim.
- `docs(concepts): align Quality UI labels with the actual app`: `Profile dataset` → `Profile with DQX` (actual button label); disambiguated `Quality Rules` (contract page) vs `Quality panel` (product page).
- `fix(copilot): strip orphan asterisk runs + tighten markdown rule`: server-side `re.sub(r"\*{4,}", "", text)` defense-in-depth strip, plus rewrote the markdown rule as a positive spec.
Post-rebase reconciliation
- `fix(copilot): post-rebase fixes after concepts→handbook rename`: when rebased onto the renamed PR 1 (feat(copilot): ground Ask Ontos in concept docs corpus (#280) #472), one PR-2 test still imported from `src.controller.system_prompts` (now `src.tools.system_prompts`); fixed. Also extended the heading parser to include H4 so that named-entity sections under the new "What you see / Under the hood" H2 split (e.g. `#### Data Steward`) get their own focused scoring instead of being diluted into long H3 bodies.
Verification
- … -k "llm_search or system_prompt or handbook or role_override"): 52 pass, 0 fail.
- … `knowledge-graph.tsx` (4) and `slider.tsx` (1); zero new errors introduced.
- … `{{entityName}}` substitution; ranking + cap honored.
- … `*Db` class names, raw columns, or internal workflow IDs in user-visible output.
- … `****` reaches the chat panel even when the LLM emits them.
Stacking notes
This PR's diff against `main` includes #472's 4 commits transitively. The 16 commits in this PR specifically sit on top of #472's head. CI failures (pre-existing on `main`): TS6133 unused-import errors in `data-contract-details.tsx`, pytest-asyncio fixture issues in `test_trigger_types_endpoint.py`. Neither is in the touched scope.
Closes #280.
This pull request and its description were written by Isaac.