feat(copilot): Ask Ontos uplift — personalization, smart prompts, audience contract (#280) by mvkonchits-db · Pull Request #488 · databrickslabs/ontos

mvkonchits-db · 2026-06-03T07:51:07Z

Summary

PR 2 of the Ask Ontos uplift (#280), stacked on top of #472. Where PR 1 grounded the copilot in the curated handbook corpus, this PR personalizes it (role/page/entity awareness + adoption-mode preamble), tightens the response style (audience contract, how-to template, asterisk strip), and adds entity-aware starter prompts with a context-scope toggle. 16 commits, all atomic.

Base branch: feature/ask-ontos-uplift-pr1. Merge #472 first; this PR rebases cleanly onto its head.

Phase 2 — App-state awareness

feat(copilot): app-state awareness — adoption mode + mode-aware preamble — new get_app_state tool + shared get_adoption_snapshot helper. adoption_mode: blank | active (binary, hinges on published products). The ## Current workspace state preamble is prepended to the system prompt; the frontend reads the same value to pick mode-aware starter prompts.

Phase 3 — Role / page / entity injection

feat(copilot): inject role + page + entity context into Ask Ontos prompt — ChatMessageCreate gains optional page_name / page_url / feature_id / selected_entity. Server derives the effective role label via group ∩ assigned_groups (defense-in-depth — never accepts role from the client). Frontend forwards page context from the Zustand copilot store.
refactor(copilot): drop redundant client-side context prefix — server-side preamble is the single source of truth; removed the duplicate client-side prose injection.

Role-override fix (caught during E2E)

feat(copilot): honor applied role override in Ask Ontos role label — initial wiring.
fix(copilot): match role override id as string in role label derivation — root cause: AppRole.id is a Python UUID but the override is stored as a string from JSON, so role.id == override_id was always False and the label silently fell through to group intersection. Compare via str() on both sides. Now Data Producer impersonation actually addresses the user as Data Producer.

Starter-prompt UX

feat(copilot): entity-aware starter prompts on detail pages — adds requiresEntity?: boolean to CopilotQuestionDef; hides those questions when no selectedEntity is in the copilot store; substitutes {{entityName}} from selectedEntity.name in the localized text.
feat(copilot): expand entity-templated starter prompts — ~20 new questions across data products, contracts, assets, domains (e.g. "What is the data quality score of {{entityName}}?"). All 7 locales updated.
feat(copilot): smart prompt ranking + per-scope cap — sort: entity-templated > page-scoped > global. Cap: 6 on detail/list pages, 15 on main/marketplace/search. The 70%+ entity bias on detail pages emerges structurally from the sort + cap.
feat(copilot): context-scope toggle on Asking-about chip — chip is now a dropdown with Page-specific / Ontos (general). Persisted in localStorage. When general is selected the frontend strips page context from the chat payload and the hook filters to globally-scoped starters only. Chip renders on every page (not just when an entity is selected).
feat(copilot): drop ct_explain_quality starter prompt — the prompt asked the LLM to itemize quality checks on a contract, but get_data_contract doesn't return the qualityRules array, so the LLM consistently fell back to "I don't have authoritative information." Itemization is the UI's job (Quality Rules section); removed the prompt.

Response-style discipline

feat(copilot): audience contract + Asset awareness + how-to template — explicit "you are speaking to an end user" framing in the system prompt, forbidden-vocabulary list (*Db class names, raw column names, internal workflow IDs), Asset added to "Core artifacts", Consumable definition tightened ("only needed for upstream-product dependencies"), new how-to template (Action / What happens / Where to see it / Next), assets keyword category added to the query classifier.
docs(concepts): layer corpus into user-facing and implementation sections — restructures all 13 handbook docs into ## What you see in Ontos (UI labels, surfaces, actions) and ## Under the hood (DB models, columns, workflow IDs). The system prompt instructs the LLM to prefer the user-facing section when answering user questions.
docs(concepts): drop unverified UI placement in DQX profiling step — removed an unverified "top-right" location claim.
docs(concepts): align Quality UI labels with the actual app — Profile dataset → Profile with DQX (actual button label); disambiguated Quality Rules (contract page) vs Quality panel (product page).
fix(copilot): strip orphan asterisk runs + tighten markdown rule — server-side re.sub(r"\*{4,}", "", text) defense-in-depth strip, plus rewrote the markdown rule as a positive spec.

Post-rebase reconciliation

fix(copilot): post-rebase fixes after concepts→handbook rename — when rebased onto the renamed PR 1 (#472, "feat(copilot): ground Ask Ontos in concept docs corpus (#280)"), one PR-2 test still imported from src.controller.system_prompts (now src.tools.system_prompts); fixed. Also extended the heading parser to include H4 so that named-entity sections under the new "What you see / Under the hood" H2 split (e.g. #### Data Steward) get their own focused scoring instead of being diluted into long H3 bodies.

Verification

Backend tests for touched scope (-k "llm_search or system_prompt or handbook or role_override"): 52 pass, 0 fail.
Frontend typecheck: 5 pre-existing errors in knowledge-graph.tsx (4) and slider.tsx (1); zero new errors introduced.
E2E on FEVM workspace:
- Role override → answer addresses user as Data Producer (verified after the UUID-vs-string fix).
- Detail-page starter prompts → entity-templated prompts with {{entityName}} substitution; ranking + cap honored.
- Context-scope toggle → dropdown switches between page-specific and general; chat payload + starter prompts both respect the mode.
- Audience contract → no *Db class names, raw columns, or internal workflow IDs in user-visible output.
- Asterisk strip → no literal **** reaches the chat panel even when the LLM emits them.

Stacking notes

This PR's diff against main includes #472's 4 commits transitively. The 16 commits in this PR specifically sit on top of #472's head. CI failures (pre-existing on main): TS6133 unused-import errors in data-contract-details.tsx, pytest-asyncio fixture issues in test_trigger_types_endpoint.py. Neither is in the touched scope.

Closes #280.

This pull request and its description were written by Isaac.

Add 13 markdown files under docs/concepts/ that serve as the grounding corpus for the Ask Ontos copilot. Covers:
- roles & RBAC + permission model
- data product / data contract lifecycles
- agreement workflow (workflow vs execution vs agreement)
- ontology and knowledge-graph model, semantic linking (three-tier)
- data quality + DQX integration end-to-end
- delivery modes vs delivery methods (disambiguated)
- MCP and Ask Ontos surfaces
- asset model
- personas quick-reference
- end-to-end flows (bottom-up UC -> catalog, top-down ontology -> assets)

Every major section carries an explicit {#kebab-anchor} so the copilot can cite it via search_ontos_concepts in a follow-up commit. Citations are hidden from end users in v1; the corpus is LLM grounding, not a user-facing docs site. Vocabulary is aligned with the pitch deck + CUJ doc (ODPS v1.0.0, ODCS v3.1.0). Forward-compatibility softening is applied for several in-flight PRs (versioning, Ontos admin decoupling, approver-role filter, etc.) without naming them.

Co-authored-by: Isaac

Make the in-product copilot anchor its conceptual answers in citations to the new docs/concepts/ corpus.
- Add SearchOntosConceptsTool, which walks docs/concepts/, parses sections by heading and {#anchor}, and returns top-K excerpts ranked by title > anchor > body keyword frequency. Each match returns file, anchor, title, excerpt, and source_uri (file.md#anchor).
- Add a 'concepts' query-classifier category to DEFAULT_CATEGORIES and ALWAYS_INCLUDED_CATEGORIES so the tool is offered on every conceptual question.
- Extract the hardcoded SYSTEM_PROMPT into a new controller/system_prompts.py module exposing get_system_prompt() with personalization slots (role, page_name, selected_entity, adoption_mode) for Phase 2/3 to fill. v1 ignores the slots.
- Honor the LLM_SYSTEM_PROMPT env override (previously defined in Settings but never consumed).
- New default system prompt: vocabulary primer aligned with the pitch deck + CUJ doc, tool-first policy for conceptual questions, three-tier confidence labels ([Confirmed]/[Documented]/[Inferred]), hidden citation discipline, strict refusal template, out-of-scope deflection.

Tests:
- 13 unit tests for SearchOntosConceptsTool (empty query, known concept, multi-doc concept, no-match, anchor extraction)
- 6 integration tests for /api/llm-search/chat with the new tool + system prompt
- Full unit suite passes (1011/1011); no regressions

Co-authored-by: Isaac
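The title > anchor > body ranking can be sketched as follows, assuming sections have already been parsed into records. `Section`, `score`, `search_sections`, the weights, and the 200-character excerpt are hypothetical illustrations, not the actual `SearchOntosConceptsTool` code.

```python
import re
from dataclasses import dataclass


@dataclass
class Section:
    file: str
    anchor: str
    title: str
    body: str


# Hypothetical weights: a title hit outranks an anchor hit,
# which outranks each single occurrence in the body.
TITLE_WEIGHT = 10
ANCHOR_WEIGHT = 5
BODY_WEIGHT = 1


def _tokens(text: str) -> list[str]:
    """Lowercase alphanumeric tokens; kebab anchors split on '-'."""
    return re.findall(r"[a-z0-9]+", text.lower())


def score(section: Section, query: str) -> int:
    title = set(_tokens(section.title))
    anchor = set(_tokens(section.anchor))
    body = _tokens(section.body)
    total = 0
    for term in set(_tokens(query)):
        if term in title:
            total += TITLE_WEIGHT
        if term in anchor:
            total += ANCHOR_WEIGHT
        total += BODY_WEIGHT * body.count(term)
    return total


def search_sections(sections: list[Section], query: str, top_k: int = 3) -> list[dict]:
    scored = [(score(s, query), s) for s in sections]
    hits = [(sc, s) for sc, s in scored if sc > 0]
    hits.sort(key=lambda pair: pair[0], reverse=True)  # stable for ties
    return [
        {
            "file": s.file,
            "anchor": s.anchor,
            "title": s.title,
            "excerpt": s.body[:200],
            "source_uri": f"{s.file}#{s.anchor}",
        }
        for _, s in hits[:top_k]
    ]
```

Weighting title and anchor hits well above single body hits lets a short, precisely titled section outrank a long section that merely mentions the term.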

The previous resolution walked exactly 5 parents above concepts.py to find docs/concepts/. That assumed the local-dev layout (with src/ as a wrapper) and silently broke in deployed Databricks Apps, where src/ is stripped (so the corpus lives 4 parents up, not 5). Replace it with:
- ONTOS_CONCEPTS_DIR env var override (explicit, takes precedence)
- Walk-up search across parents 2..6 looking for docs/concepts/
- Graceful None on miss (the tool still returns success=True with empty matches)

Verified for both layouts:
- Local: <ontos>/src/backend/src/tools/concepts.py -> finds at parents[4]
- Deployed: <approot>/backend/src/tools/concepts.py -> finds at parents[3]

Co-authored-by: Isaac
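A minimal sketch of the env-override-then-walk-up resolution using `pathlib`. `resolve_corpus_dir` is a hypothetical name, and treating an override that points at a missing directory as a miss (rather than falling back to the walk-up) is an assumption.

```python
import os
from pathlib import Path
from typing import Optional

# Env var named in this commit (a later commit renames it to ONTOS_HANDBOOK_DIR).
ENV_VAR = "ONTOS_CONCEPTS_DIR"


def resolve_corpus_dir(module_file: str, rel: str = "docs/concepts") -> Optional[Path]:
    """Return the corpus directory, or None if it cannot be found."""
    override = os.environ.get(ENV_VAR)
    if override:
        # Explicit override takes precedence; no walk-up when it is set.
        path = Path(override)
        return path if path.is_dir() else None
    parents = Path(module_file).resolve().parents
    # Check parents[2] .. parents[6], which covers both local and deployed layouts.
    for depth in range(2, 7):
        if depth >= len(parents):
            break
        candidate = parents[depth] / rel
        if candidate.is_dir():
            return candidate
    return None  # caller logs a warning and returns empty matches
```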

…onse The system prompt asks the model to anchor conceptual answers with hidden HTML-comment citations so reviewers can audit grounding without exposing them to end users. Most markdown renderers drop HTML comments on render, but the chat UI surfaces them as visible text, which live E2E confirmed. Add a server-side strip in LlmSearchManager:
- `_CITATION_COMMENT_RE` matches the HTML-comment citations (non-greedy)
- `_strip_internal_citations` returns (cleaned_text, [refs]) so debug_info retains the citations for audit while the user-facing response is clean
- Applied at the inner-loop final return; collapses any runs of 3+ newlines created by the strip back to a double newline

Citations remain accessible via `debug_info["internal_citations"]` when the client sets `debug=True`.

Co-authored-by: Isaac

Add a 14th file covering install (Marketplace vs Git), update procedures, maintenance, and common UI errors. It carries 37 anchors so any specific error can be cited. Topics:
- Distribution channels: Marketplace vs GitHub repo, when to choose which
- First install: prerequisites, first-admin bootstrap, demo presets
- Updates: Marketplace path, Git path, migration discipline (append-only, ≤32-char revision IDs), DB state vs code state
- Maintenance: alembic at startup, role re-seeding (first-start-only), workspace sync direction (from src/), OAuth scope-change cookie gotcha, customer fork hygiene
- UI errors users actually see:
  * Identity: Request role prompt, unexpected 403s, UC scope missing
  * Workflows: Cannot approve, grant_permissions failed (MANAGE required)
  * Database: Alembic version too long, Lakebase autoscale stuck, stale data after git revert
  * Deploy: Process did not start in 10 min, corpus not found

Includes 6 customer-voice "Common questions". Cross-references to roles-and-rbac, agreement-workflow, delivery-and-propagation, mcp-and-ask-ontos. No customer names, no internal ticket IDs. README.md updated to 14 files; verification footer bumped to 2026-05-29.

Co-authored-by: Isaac

…ponse The labels [Confirmed]/[Documented]/[Inferred] were emitted user-visibly per the v1 system prompt, but they expose grounding mechanics that don't belong in the surfaced answer. Treat them the same way as citation comments: the model still emits them, so it keeps stratifying confidence and reviewers can audit grounding, but they are stripped server-side before the response is returned.
- Add `_CONFIDENCE_LABEL_RE` and extend `_strip_internal_citations` to a 3-tuple return (cleaned_text, citations, confidence_labels)
- Surface both into debug_info (`internal_citations`, `confidence_labels`) so audit consumers can still see them via `debug=true`
- Update the system prompt to declare the labels internal/stripped (so the model knows the act of stratifying matters even though they're hidden)

Co-authored-by: Isaac
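The citation and confidence-label stripping can be sketched like this. The HTML-comment pattern is an assumption (the exact citation format is not shown here), and `strip_internal_markers` is a hypothetical stand-in for `_strip_internal_citations` that returns the same (cleaned_text, citations, confidence_labels) shape.

```python
import re

# Assumed citation shape: an HTML comment such as <!-- file.md#anchor -->.
_CITATION_COMMENT_RE = re.compile(r"<!--\s*(.*?)\s*-->", re.DOTALL)
# The three confidence labels named in the system prompt, plus one trailing space.
_CONFIDENCE_LABEL_RE = re.compile(r"\[(Confirmed|Documented|Inferred)\] ?")
_EXTRA_NEWLINES_RE = re.compile(r"\n{3,}")


def strip_internal_markers(text: str) -> tuple[str, list[str], list[str]]:
    """Return (cleaned_text, citations, confidence_labels)."""
    citations = _CITATION_COMMENT_RE.findall(text)
    labels = _CONFIDENCE_LABEL_RE.findall(text)
    cleaned = _CITATION_COMMENT_RE.sub("", text)
    cleaned = _CONFIDENCE_LABEL_RE.sub("", cleaned)
    # Removing a comment that sat on its own line leaves 3+ newlines; collapse them.
    cleaned = _EXTRA_NEWLINES_RE.sub("\n\n", cleaned)
    return cleaned.strip(), citations, labels
```

The extracted citations and labels would then go into `debug_info` while only `cleaned_text` reaches the user.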

The model was opening conceptual answers with a bolded restatement of the user's question (e.g. **What is a Team?** followed by the answer). That's redundant in the chat thread, where the user already sees their own question above, and reads as noise. Update the Response format section to explicitly forbid:
- restating, echoing, or rephrasing the question
- bolded-question headers as openers
- "Great question!" / "Let me explain..." fillers

Begin with the answer directly.

Co-authored-by: Isaac

system_prompts.py is not a controller: it's a pure templating function with no state, no business logic, and no DB or manager dependencies. Move it next to the MCP/LLM tools registry it actually feeds, where the file structure already groups everything the copilot reaches for. Per Lars' review on PR #472.

Co-authored-by: Isaac

…" namespace (#280) Ontos already uses "Concept" for an RDF/SKOS ontology entity: a node in the knowledge graph identified by an IRI, surfaced as an ontology class or glossary term. Overloading the same noun for the LLM grounding corpus caused real confusion: `tools/concepts.py` was unrelated to the `Concept` entity, `search_ontos_concepts` could mean either, and "concept corpus" / "concept docs" prose blurred the distinction throughout the codebase.

This commit renames the LLM grounding corpus to "handbook" everywhere:
- `docs/concepts/` → `docs/handbook/`
- `tools/concepts.py` → `tools/handbook.py`
- the LLM-callable tool `search_ontos_concepts` → `search_ontos_handbook`
- the `ONTOS_CONCEPTS_DIR` env var → `ONTOS_HANDBOOK_DIR`
- the `concepts` query-classifier category → `handbook`

"Concept" is now reserved for the ontology entity. Per Lars' review on PR #472.

Co-authored-by: Isaac

The handbook corpus is English-only, but the Ontos UI ships in seven locales (English, German, Spanish, French, Italian, Japanese, Dutch). Today the system prompt gives the LLM no guidance on what to translate vs. keep in English, so non-English answers end up translating product nouns ("Datenprodukt", "Lieferbar") that don't exist anywhere in the UI. Add a `## Language` section that tells the model: answer in the user's language, but keep Ontos product nouns and UI labels in English exactly as they appear in the app. Per Lars' review on PR #472.

Co-authored-by: Isaac

…ure log The corpus-not-found warning referenced a symbol that was never defined (carried over from the original concepts.py implementation: the constant was removed, but the log f-string still referenced it). This was a latent NameError on the unhappy path: if the resolver ever returned None, the log emit itself would crash before the empty-matches ToolResult could be returned. Replace it with the env-var constant we actually have, so the warning tells operators exactly which knobs were tried.

Co-authored-by: Isaac

…ble (#280) Phase 2 of the Ask Ontos uplift teaches the copilot whether the workspace is fresh (no published data products) or active, and tailors every conceptual answer accordingly. Two coordinated paths share one snapshot helper so the prompt and the LLM-callable tool can never disagree.

Backend
- New ``src/tools/app_state.py`` exposes ``GetAppStateTool`` (LLM-callable, no params, always-on category) and ``get_adoption_snapshot(db)``, the shared helper that counts entities and derives ``adoption_mode in {blank, active}``. Mode hinges on *published* products (non-null ``publication_scope`` AND not the literal 'none'), so a workspace with only drafts is still blank.
- Tool registered in ``tools/registry.py`` and added as ``ToolName.GET_APP_STATE``. New ``app_state`` keyword category in ``query_classifier.py``, also added to ``ALWAYS_INCLUDED_CATEGORIES`` so the LLM can introspect on every call.
- ``LLMSearchManager.chat`` pre-fetches the snapshot ONCE per request and passes ``adoption_mode`` into ``get_system_prompt``. Snapshot failure logs and degrades to ``adoption_mode=None``; the prompt then falls back to the byte-identical Phase 1 default, so the existing override-path and grounded-prompt tests still pass.
- ``system_prompts.py`` prepends a ``## Current workspace state`` H2 with one of two short preambles (onboarding vs operational tone). The ``LLM_SYSTEM_PROMPT`` env override still wins verbatim; preambles do NOT stack on top of it.
- ``LLMSearchStatus`` gains ``adoption_mode`` so the frontend can pick mode-aware starter prompts without an extra round-trip.

Also fixes a NameError in ``tools/concepts.py`` (PR1 left a reference to an undefined ``_DEFAULT_CONCEPTS_DIR`` in the missing-corpus branch); covered by the existing test, which now passes.

Frontend
- ``LLMSearchStatus`` TS type and ``AdoptionMode`` literal mirror the backend.
- ``copilot-questions.ts`` gains an optional ``adoptionMode`` field on ``CopilotQuestionDef`` and a new ``getting_started`` category. Five onboarding starter prompts (create first product, set up domains, assign roles, what-is-Ontos, core concepts) are tagged ``adoptionMode: 'blank'``.
- ``useCopilotQuestions`` filters by the adoption mode passed from the panel. A ``null`` mode (snapshot unavailable) hides only mode-tagged questions; the historical catalog still renders.
- ``copilot-panel.tsx`` forwards ``status?.adoption_mode`` into the hook. ``llm-search.tsx``'s ``ExampleQuestions`` also switches its 4 starter buttons based on mode.

Tests
- 10 unit tests for ``GetAppStateTool`` + the snapshot helper (blank, draft-only-still-blank, literal-'none' handling, active-on-publish, multi-entity counts, no-param tolerance, error degrade, snapshot shape).
- 8 integration tests through ``LLMSearchManager`` (registry, always-on category, blank-mode preamble, active-mode preamble, override path bypasses preamble, status surfaces mode, snapshot failure falls back to default prompt).

All 37 unit + integration tests in the touched scope pass.

Co-authored-by: Isaac
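The adoption-mode rule fits in a few lines. This sketch assumes products are plain records with a `publication_scope` attribute; `ProductRow`, `derive_adoption_mode`, `workspace_preamble`, and the preamble wording are hypothetical, not the code in `src/tools/app_state.py`.

```python
from dataclasses import dataclass
from typing import Iterable, Literal, Optional

AdoptionMode = Literal["blank", "active"]


@dataclass
class ProductRow:
    """Hypothetical stand-in for a data product record."""
    name: str
    publication_scope: Optional[str]


def is_published(product: ProductRow) -> bool:
    # Published means a non-null scope that is not the literal string 'none'.
    scope = product.publication_scope
    return scope is not None and scope != "none"


def derive_adoption_mode(products: Iterable[ProductRow]) -> AdoptionMode:
    # Drafts alone keep the workspace "blank"; one published product flips it.
    return "active" if any(is_published(p) for p in products) else "blank"


# Placeholder wording; the real preambles are not reproduced here.
_PREAMBLES = {
    "blank": "## Current workspace state\nNo data products are published yet. Lean toward onboarding guidance.",
    "active": "## Current workspace state\nThis workspace has published data products. Lean toward operational guidance.",
}


def workspace_preamble(mode: Optional[AdoptionMode]) -> str:
    # None means the snapshot failed: add nothing, keep the default prompt as-is.
    return _PREAMBLES[mode] if mode is not None else ""
```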

…mpt (#280) Phase 3 of the Ask Ontos uplift teaches the copilot WHO is asking and WHAT they're looking at. The frontend already had the data in its copilot store; this PR wires it through the chat payload and renders a ``## Current user context`` block in the system prompt so answers can be tailored ("a Data Consumer needs task-completion help; an Admin needs configuration depth").

Backend
- ``ChatMessageCreate`` gains optional ``page_name`` / ``page_url`` / ``feature_id`` / ``selected_entity`` fields. A new ``SelectedEntity`` Pydantic model holds the small (type, name, id) descriptor. All fields are optional so pre-Phase-3 clients (MCP, tests) keep working unchanged.
- The chat route receives those fields and passes them to ``LLMSearchManager.chat``. The user's effective Ontos role label is derived server-side via a new helper, ``_derive_effective_role_label``, which intersects the user's groups with role assignments (case-insensitive, comma-joined when multiple match). Role is NOT taken from the request payload: defense-in-depth, so a client can't impersonate a role just by lying.
- ``system_prompts.get_system_prompt`` now renders the ``## Current user context`` H2 below the Phase 2 workspace-state preamble, with bullets for Role / Currently on / Viewing. Empty inputs cleanly omit individual lines or the whole block; when nothing is passed, it returns the byte-identical Phase 1 default.
- The ``LLM_SYSTEM_PROMPT`` env override still wins verbatim; Phase 3 preambles are NOT prepended on top of an override.

Frontend
- ``sendMessage`` in ``llm-search-api.ts`` now reads ``pageContext`` off the Zustand store via ``getState()`` (the API helper is a plain function, not a hook) and forwards ``page_name``, ``page_url``, ``feature_id``, and ``selected_entity`` in the chat payload. A small inline ``ChatRequestBody`` type documents the shape without dragging in a fuller request-model export.

Tests
- 11 integration tests in ``test_llm_search_user_context.py``:
  - 4 prompt-assembly tests (full block, omit-when-empty, partial inputs, entity-without-id).
  - 3 end-to-end chat tests (propagation, no-context backward compat, ``_chat_context`` is cleared between calls so a long-lived manager can't leak request state).
  - 4 role-derivation unit tests (no settings manager, empty groups, case-insensitive intersection, multi-role join).

All 48 unit + integration tests in the Ask Ontos touch-set pass. Full backend suite: my touched scope has zero failures; the 165 pre-existing failures on this branch are in unrelated areas (test_teams_routes, test_user_routes, test_workspace_routes, test_pr_i_partial_put, test_semantic_upload_integration) and predate this PR.

Co-authored-by: Isaac
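The prompt-assembly rules above (override wins verbatim, workspace preamble first, user-context bullets omitted when empty, unchanged default when nothing is passed) can be sketched as a pure function. `build_system_prompt` and its parameters are hypothetical; the real entry point is `get_system_prompt`, whose exact signature is not shown here. The `type name (id: ...)` rendering of the Viewing line is also an assumption.

```python
from typing import Optional


def build_system_prompt(
    default_prompt: str,
    override: Optional[str] = None,
    workspace_preamble: str = "",
    role: Optional[str] = None,
    page_name: Optional[str] = None,
    entity: Optional[dict] = None,
) -> str:
    # An env override wins verbatim: nothing is prepended to it.
    if override:
        return override

    bullets = []
    if role:
        bullets.append(f"- Role: {role}")
    if page_name:
        bullets.append(f"- Currently on: {page_name}")
    if entity and entity.get("name"):
        viewing = f"{entity.get('type', 'entity')} {entity['name']}"
        if entity.get("id"):  # an entity without an id just omits this suffix
            viewing += f" (id: {entity['id']})"
        bullets.append(f"- Viewing: {viewing}")

    blocks = []
    if workspace_preamble:
        blocks.append(workspace_preamble)
    if bullets:  # omit the whole block when every input is empty
        blocks.append("## Current user context\n" + "\n".join(bullets))
    blocks.append(default_prompt)
    # With no inputs this returns exactly default_prompt.
    return "\n\n".join(blocks)
```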

Phase 3 (commit 5f3ed7a8) added structured page / role / entity context to the chat-request payload and rendered it as a `## Current user context` preamble in the system prompt. The old client-side `buildContextPrefix` function in copilot-panel.tsx now duplicates the same information by injecting it directly into the user's message content, which:
1. eats user-message tokens unnecessarily
2. duplicates structured context as unstructured prose
3. makes session transcripts confusing (the LLM sees the context twice, once in the prompt preamble and once in every user turn)

Remove the helper and the call site. The server-side preamble is the single source of truth.

Co-authored-by: Isaac

) When a user clicks "Role: X (Override)" in the UI, the override is persisted via SettingsManager.set_applied_role_override_for_user but does not mutate user.groups. The Ask Ontos role-derivation helper was the lone holdout still consulting only the user's groups, so the LLM kept seeing the original role (e.g. Admin) instead of the applied override (e.g. Data Producer). Mirror the override-first pattern used in authorization.py and other call sites: consult get_applied_role_override_for_user before falling back to group intersection. If the override id no longer matches any current role, fall through rather than erroring. Fits Phase 3 of the Ask Ontos uplift.

Co-authored-by: Isaac

Detail pages were surfacing the same prompts as list pages: "Which contracts are linked to this data product?" appeared on both /data-products and /data-products/<id>, never naming the actual entity. Tag entity-specific questions with `requiresEntity: true` and hide them when no `selectedEntity` is in the copilot page-context store. Substitute `{{entityName}}` from `pageContext.selectedEntity.name` in the rendered text so prompts read e.g. "Which contracts are linked to Customer 360?". Tagged keys: dp_show_contracts, ct_explain_quality, ct_add_quality_check, asset_show_lineage, dom_domain_health. Translations updated across all 7 locales (en, de, es, fr, it, ja, nl) to use the same placeholder spelling.

Co-authored-by: Isaac
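The filter-then-substitute behavior is sketched here in Python for brevity; the real logic lives in the TypeScript copilot-questions hook. `QuestionDef` and `render_questions` are hypothetical names; only the `requiresEntity` flag and the `{{entityName}}` placeholder come from this commit.

```python
from dataclasses import dataclass
from typing import Optional

PLACEHOLDER = "{{entityName}}"


@dataclass
class QuestionDef:
    key: str
    text: str  # already-localized text; may contain {{entityName}}
    requires_entity: bool = False


def render_questions(questions: list[QuestionDef], entity_name: Optional[str]) -> list[str]:
    rendered = []
    for q in questions:
        if q.requires_entity and not entity_name:
            continue  # no selected entity: hide entity-specific prompts
        text = q.text.replace(PLACEHOLDER, entity_name) if entity_name else q.text
        rendered.append(text)
    return rendered
```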

Add 20 new `requiresEntity: true` prompts across data products, contracts, assets, and domains, each using `{{entityName}}` so they only surface on detail pages and substitute the current entity name. New keys:
- data products: dp_quality_score, dp_owner, dp_schema, dp_last_updated, dp_subscribe, dp_consumers, dp_publication_status, dp_publish_blockers
- contracts: ct_what_covers, ct_used_by, ct_owner, ct_version_impact
- assets: asset_built_on, asset_freshness, asset_quality, asset_consumers
- domains: dom_products_in, dom_business_terms, dom_owner, dom_health_detail

Spread across explore / build / govern categories. `dom_domain_health` is kept as-is for backward compat. Localized in all 7 bundles (en/de/es/fr/it/ja/nl).

Co-authored-by: Isaac

Sort matching starter prompts by specificity (entity-templated first, then page-scoped, then global) so the most context-bound questions surface at the top of the panel. Cap the total prompt count per scope: 6 on detail pages, 6 on type-scoped list pages, 15 on main/marketplace/search. Combined with the new sort, detail pages structurally land ~70% entity-aware prompts (cap=6, with the entity-templated tier taking the top slots).

Co-authored-by: Isaac
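A sketch of the tiered sort plus cap, again in Python for illustration rather than the frontend's TypeScript. The `requires_entity` / `page_scoped` flags and the scope keys are assumptions; the cap values come from this commit.

```python
from enum import IntEnum


class Tier(IntEnum):
    ENTITY = 0  # entity-templated: most specific, sorts first
    PAGE = 1    # page-scoped
    GLOBAL = 2  # everything else


# Cap values from this commit; the scope keys are hypothetical.
CAPS = {"detail": 6, "list": 6, "main": 15}


def tier_of(question: dict) -> Tier:
    if question.get("requires_entity"):
        return Tier.ENTITY
    if question.get("page_scoped"):
        return Tier.PAGE
    return Tier.GLOBAL


def rank_and_cap(questions: list[dict], scope: str) -> list[dict]:
    # sorted() is stable, so catalog order is preserved inside each tier.
    ranked = sorted(questions, key=tier_of)
    return ranked[: CAPS[scope]]
```

With five entity-templated prompts available on a detail page, five of the six slots go to that tier; the entity-heavy mix falls out of sort + cap without any explicit quota.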

Turn the "Asking about" chip into a dropdown so users can flip between page-scoped help and a scope-free "Ontos (general)" mode. The chip now renders on every page: page-specific shows the selected entity (detail pages) or the page name (list/main pages); general shows "Ontos (general)", with a check next to the active choice. When the scope is `general`, the frontend strips `page_name`, `page_url`, `feature_id`, and `selected_entity` from the chat payload (the backend treats their absence as no context), and the starter-prompt hook filters out page-scoped and entity-templated questions, so only globals survive, with the wider main-page cap. The chosen scope is persisted in localStorage under `copilot-context-scope` so it survives panel close/reopen. New i18n keys `search:copilot.scopePageSpecific`, `search:copilot.scopeGeneral`, and `search:copilot.scopeThisPage` are added to all 7 locales (en/de/es/fr/it/ja/nl).

Co-authored-by: Isaac
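Stripping page context in general mode amounts to building the payload conditionally. This Python sketch assumes a `content` field for the message text and the scope values `"page"` and `"general"`; both are assumptions, and only the four page-context field names come from this commit.

```python
from typing import Any

PAGE_CONTEXT_FIELDS = ("page_name", "page_url", "feature_id", "selected_entity")


def build_chat_payload(content: str, page_context: dict[str, Any], scope: str) -> dict[str, Any]:
    # "content" is an assumed field name for the user's message text.
    payload: dict[str, Any] = {"content": content}
    if scope == "general":
        # Omit every page field; the backend reads absence as "no context".
        return payload
    for field in PAGE_CONTEXT_FIELDS:
        value = page_context.get(field)
        if value is not None:
            payload[field] = value
    return payload
```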

`AppRole.id` is typed as a Python `UUID` (object), but the applied role override is stored as a string (JSON has no UUID type), so the `role.id == override_id` comparison was always False. Every override silently fell through to group intersection, returning the user's default role ("Admin" for admins) instead of the impersonated role. Compare via `str()` on both sides. Caught when a user with admin groups + a Data Producer override saw admin-flavored Ask Ontos responses despite the UI reflecting the override.

Co-authored-by: Isaac
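The corrected derivation order (override first, compared as strings, then case-insensitive group intersection) can be sketched as below. `AppRole` here is a minimal hypothetical dataclass, and `derive_effective_role_label` only mirrors the behavior described for `_derive_effective_role_label`; it is not its source.

```python
import uuid
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class AppRole:
    """Minimal hypothetical stand-in: id is a UUID object, as in the bug report."""
    id: uuid.UUID
    name: str
    assigned_groups: list[str] = field(default_factory=list)


def derive_effective_role_label(
    user_groups: list[str], roles: list[AppRole], override_id: Optional[str]
) -> Optional[str]:
    # 1. Applied override first. It arrives as a JSON string while role.id is a
    #    UUID, so compare string forms; `role.id == override_id` is always False.
    if override_id:
        for role in roles:
            if str(role.id) == str(override_id):
                return role.name
        # Stale override (role no longer exists): fall through instead of erroring.

    # 2. Case-insensitive intersection of user groups with each role's groups.
    groups = {g.lower() for g in user_groups}
    matched = [r.name for r in roles if groups & {g.lower() for g in r.assigned_groups}]
    return ", ".join(matched) or None
```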

…280) Ask Ontos answers were leaking implementation language (SQLAlchemy `Db` model names, raw column names, internal workflow IDs) into user-facing output, missing the Ontos Asset concept when users wanted to publish UC tables, and emitting literal `****` strings as section dividers. This commit addresses the structural causes in the prompt + classifier.

System prompt (`controller/system_prompts.py`):
- New **Audience and tone** section establishes the end-user contract and enumerates forbidden vocabulary (`*Db` class names, Pydantic classes, raw column names, workflow IDs). Includes translation examples so the model knows how to map corpus internals to UI labels.
- Added **Asset** to "Core artifacts" so the model treats UC tables as Ontos Assets the moment they enter the platform, instead of staying in raw-catalog framing.
- Tightened the **Consumable** definition to clarify that first-time products exposing existing UC tables don't need Consumables; only products that depend on other Ontos-governed products do.
- Added a Discovery-strategy bullet that routes UC-resource mentions (tables, views, models) to the Asset grounding.
- Added Response-format rules: markdown only, no `****` / unmatched asterisk sequences, and a four-part Action / What happens / Where to see it / Next template for "how do I…" and "where is X…" questions.

Query classifier (`tools/query_classifier.py`):
- New `assets` category triggered by `asset`, `table`, `unity catalog`, `uc`, `delta table`, `publish`, `govern`, etc. The category is wired into the per-request category list so the system prompt's Asset awareness fires on the right queries. No tool currently registers `category = "assets"`, so this is purely a classifier-side signal; future asset tools and tests can rely on it without further work.

Co-authored-by: Isaac
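One way to implement keyword triggers like these is a case-insensitive regex with a leading word boundary. The sketch uses only the keywords listed in this commit (the real list is longer), and `CATEGORY_KEYWORDS` and `classify` are hypothetical names. The leading `\b` matters for short keywords: without it, `uc` would fire inside ordinary words such as "product".

```python
import re

# Keywords listed in this commit; the real list is longer ("etc.").
CATEGORY_KEYWORDS = {
    "assets": ["asset", "table", "unity catalog", "uc", "delta table", "publish", "govern"],
}


def _compile(keywords: list[str]) -> re.Pattern:
    # Leading \b only: "assets", "tables", "governance" still match, but
    # "uc" does not fire inside words like "product".
    alternation = "|".join(re.escape(k) for k in keywords)
    return re.compile(rf"\b(?:{alternation})", re.IGNORECASE)


_PATTERNS = {cat: _compile(kws) for cat, kws in CATEGORY_KEYWORDS.items()}


def classify(query: str) -> set[str]:
    return {cat for cat, pattern in _PATTERNS.items() if pattern.search(query)}
```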

…ions (#280) Each of the 13 concept docs now splits into two top-level sections:
- **What you see in Ontos**: UI labels, surface names, action steps, user-visible outcomes. The system prompt's audience contract (Commit A) instructs the LLM to prefer this layer when answering an Ontos end user.
- **Under the hood**: SQLAlchemy `*Db` model names, raw column names, workflow IDs, RDF/SPARQL internals, transport detail. The LLM is instructed to descend here only when the user explicitly asks about implementation or schema.

The heaviest restructure landed on `data-quality.md`, `asset-model.md`, and `ontology-and-knowledge-graph.md`, which mixed UI surfaces and DB internals throughout:
- `data-quality.md`: the two-system framing keeps the operational steps in "What you see" (Profile dataset action, Quality panel, Schema tab) and the persisted-row detail (the check definition fields, the measurement row fields, the DQX workflow internals, the DQX→ODCS dimension mapping) in "Under the hood".
- `asset-model.md` rephrases "What an Asset is" in user-facing terms and puts the `AssetDb`/`EntityRelationshipDb` schema detail and the asset-type sync mechanics under the implementation layer.
- `ontology-and-knowledge-graph.md` reorders sections so the user-facing three-tier linking story, glossary-as-view explanation, industry packs, and round-trip asymmetry all live under "What you see", while the rdflib runtime graph internals and SPARQL endpoint detail land under "Under the hood".

Lighter restructures land on:
- `agreement-workflow.md`: StepType enum, WorkflowExecutionDb state machine, and `grant_permissions` SDK detail go under the hood; agreement concept, approval gates, roles, triggers, webhook/delivery/immutability stay user-facing.
- `data-contract-lifecycle.md`: schema-object / server / role / SLA DB-table detail under the hood; ODCS lifecycle and editor-of-record user-facing.
- `data-product-lifecycle.md`: most ODPS field detail is already user-visible spec talk, so the under-the-hood block is a brief pointer to `db_models/data_products.py`.
- `delivery-and-propagation.md`: change-types enum table under the hood; modes and concept→UC-tag flow user-facing.
- `roles-and-rbac.md`: permission model, built-in roles, Filtered, and persona override stay user-facing; identity resolution and per-execution authorization go under the hood.
- `installation-and-troubleshooting.md`: admin-facing throughout, so a shallow wrap with a one-paragraph under-the-hood pointer.
- `mcp-and-ask-ontos.md`, `entities-glossary.md`, `personas-quick-reference.md`, `end-to-end-flows.md`: all primarily user-facing, light wrapping.

All existing anchors are preserved so cross-references continue to resolve. No content was deleted; this is a layout-only restructure plus a small amount of in-place rewording for sectional flow on `data-quality.md` and `asset-model.md`.

Co-authored-by: Isaac

"in the top-right" was inserted during the corpus restructure without verifying the UI. The action label "Profile dataset" is enough; locations like top-right/sidebar/header should not appear in concept docs unless explicitly verified against the live UI, since they leak into Ask Ontos answers verbatim.

Co-authored-by: Isaac

Two corpus corrections caught when Ask Ontos gave imprecise UI guidance:
1. The action button is **Profile with DQX**, not "Profile dataset".
2. On the contract detail page, check definitions live in the **Quality Rules** section. The **Quality panel** is the Data Product detail page rollup of measurements. The two pages have distinct surfaces and shouldn't be conflated.

Co-authored-by: Isaac

The prompt asked the LLM to itemize quality checks on a contract, but `get_data_contract` doesn't return the embedded `qualityRules` array, so Ask Ontos consistently fell back to "I don't have authoritative information." Better to remove the prompt than to set the LLM up to fail: itemizing checks is the job of the UI's Quality Rules section on the contract detail page, not the copilot. Removes the question definition from copilot-questions.ts and the corresponding i18n strings from all 7 locales.

Co-authored-by: Isaac

The LLM ignored the previous negative rule ("Never emit ****") and kept emitting runs of 4+ asterisks as visual markers; they leak into the chat panel as literal characters. Two defenses:
1. Server-side strip in LLMSearchManager.chat: post-process the assistant response with `re.sub(r"\*{4,}", "", text)`. Bold needs exactly two asterisks and bold-italic exactly three, so removing runs of four or more leaves the intended emphasis intact. CommonMark does give such runs meaning in a few cases (a line of `****` is a thematic break, and `****x****` is nested strong emphasis), but neither is wanted in chat answers, so stripping is safe in practice.
2. Rewrote the markdown rule in system_prompts.py as a positive spec ("every `*` must pair correctly, closing `**` on the same line"), with the no-three-or-more-asterisks rule called out explicitly.

Also fixed the inline "Profile dataset" example to match the actual UI label ("Profile with DQX").

Co-authored-by: Isaac
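The strip is a single regex substitution; this small sketch (the function name is hypothetical) shows that intended `**` and `***` emphasis survives while longer runs disappear.

```python
import re

_ASTERISK_RUN_RE = re.compile(r"\*{4,}")


def strip_asterisk_runs(text: str) -> str:
    """Drop runs of 4+ asterisks; ** (bold) and *** (bold-italic) are untouched."""
    return _ASTERISK_RUN_RE.sub("", text)
```

A side effect is that deliberate `****x****` nesting is flattened to `x`, an acceptable trade for chat answers.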

Two issues surfaced when PR2 rebased onto the renamed PR1:
1. test_llm_search_user_context.py still imported from src.controller.system_prompts; updated to src.tools.system_prompts.
2. After the docs corpus restructure into "What you see in Ontos" / "Under the hood", named-entity content (e.g. `#### Data Steward`) landed under h4, but the parser only saw h2/h3 sections, so focused content was buried inside long h3 bodies and lost token-density ranking. Extended _HEADING_RE and the section selector to accept h4.

Net effect: queries like "data steward" now hit the entity's own focused section in roles-and-rbac.md instead of being outscored by smaller, denser docs.

Co-authored-by: Isaac
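A sketch of an H2-to-H4 heading regex with an optional `{#kebab-anchor}` suffix, plus a splitter that slices the text between consecutive headings. The name mirrors `_HEADING_RE` from this commit, but the pattern itself is an assumption, and it does not skip headings inside fenced code blocks.

```python
import re

# H2-H4 headings with an optional trailing {#kebab-anchor}; H1 and H5+ are ignored.
_HEADING_RE = re.compile(
    r"^(?P<hashes>#{2,4})[ \t]+(?P<title>.+?)"
    r"(?:[ \t]*\{#(?P<anchor>[a-z0-9-]+)\})?[ \t]*$",
    re.MULTILINE,
)


def split_sections(markdown: str) -> list[dict]:
    matches = list(_HEADING_RE.finditer(markdown))
    sections = []
    for i, m in enumerate(matches):
        # A section runs until the next H2-H4 heading, so an H4 gets its own body
        # instead of being folded into the enclosing H3.
        end = matches[i + 1].start() if i + 1 < len(matches) else len(markdown)
        sections.append(
            {
                "level": len(m.group("hashes")),
                "title": m.group("title").strip(),
                "anchor": m.group("anchor"),
                "body": markdown[m.end():end].strip(),
            }
        )
    return sections
```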

mvkonchits-db added 27 commits May 29, 2026 14:02

larsgeorge-db force-pushed the feature/ask-ontos-uplift-pr1 branch from aa6b9c4 to 7836d3b on June 8, 2026 07:50

mvkonchits-db wants to merge 27 commits into feature/ask-ontos-uplift-pr1 from feature/ask-ontos-uplift-pr2
