Skip to content

fix(search): default project filter, FTS5 fallback, concept remap, import sync, dedup#2064

Closed
thedotmack wants to merge 2 commits intomainfrom
bugfix/search-bugs-1
Closed

fix(search): default project filter, FTS5 fallback, concept remap, import sync, dedup#2064
thedotmack wants to merge 2 commits intomainfrom
bugfix/search-bugs-1

Conversation

@thedotmack
Copy link
Copy Markdown
Owner

Summary

Test plan

  • Verify search without project param returns only current project results
  • Verify per-type search endpoints (/api/search/observations, /api/search/sessions, /api/search/prompts) respect project filter in Chroma queries
  • Verify text search works when Chroma is disabled (FTS5 fallback)
  • Verify /api/search/by-concept works with singular concept param
  • Verify imported observations appear in MCP search results (ChromaDB sync)
  • Verify search results are deduplicated by content_hash

Closes #1911, closes #1912, closes #1913, closes #1916, closes #1914, closes #1915

🤖 Generated with Claude Code

…port sync, dedup

- #1911: Default project filter to current project in normalizeParams()
- #1912: Pass project filter through to Chroma where clause and SQLite hydration
  in searchObservations(), searchSessions(), searchUserPrompts()
- #1913: Fall back to FTS5 MATCH when ChromaDB is disabled/unavailable
- #1916: Remap singular `concept` to plural `concepts` in normalizeParams()
- #1914: Sync imported observations to ChromaDB via new syncObservationRow()
- #1915: Deduplicate results by content_hash and per-project+session diversity cap

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@coderabbitai
Copy link
Copy Markdown
Contributor

coderabbitai Bot commented Apr 18, 2026

Warning

Rate limit exceeded

@thedotmack has exceeded the limit for the number of commits that can be reviewed per hour. Please wait 4 minutes and 48 seconds before requesting another review.

Your organization is not enrolled in usage-based pricing. Contact your admin to enable usage-based pricing to continue reviews beyond the rate limit, or try again in 4 minutes and 48 seconds.

⌛ How to resolve this issue?

After the wait time has elapsed, a review can be triggered using the @coderabbitai review command as a PR comment. Alternatively, push new commits to this PR.

We recommend that you space out your commits to avoid hitting the rate limit.

🚦 How do rate limits work?

CodeRabbit enforces hourly rate limits for each developer per organization.

Our paid plans have higher rate limits than the trial, open-source and free plans. In all cases, we re-allow further reviews after a brief timeout.

Please see our FAQ for further information.

ℹ️ Review info
⚙️ Run configuration

Configuration used: Organization UI

Review profile: CHILL

Plan: Pro

Run ID: d63a0c47-d75d-47fd-baea-8ad328f7d71a

📥 Commits

Reviewing files that changed from the base of the PR and between beea789 and d63471f.

📒 Files selected for processing (3)
  • src/services/sync/ChromaSync.ts
  • src/services/worker/SearchManager.ts
  • src/services/worker/http/routes/DataRoutes.ts
✨ Finishing Touches
🧪 Generate unit tests (beta)
  • Create PR with unit tests
  • Commit unit tests in branch bugfix/search-bugs-1

Comment @coderabbitai help to get the list of available commands and usage tips.

@claude
Copy link
Copy Markdown

claude Bot commented Apr 18, 2026

Code Review

Good set of bug fixes that addresses real gaps in search correctness. Here's my analysis by area:


SearchManager.ts — FTS5 Fallback (searchFTS5Fallback)

Incomplete FTS5 query sanitization (potential runtime error)
The sanitizer strips ' and " but leaves other FTS5 special characters (*, (, ), -, ^) that can cause parse errors:

// Current — misses ( ) * - ^
.replace(/['"]/g, '')
// Suggested — strip all FTS5 special chars
.replace(/['"()*\-^]/g, '')

Repeated FTS table-existence checks
session_summaries_fts and user_prompts_fts are each checked inside the conditional block on every call. Consider caching these booleans on first run or checking all three up-front alongside observations_fts.

any[] params type

const params: any[] = [sanitizedQuery];

This can be (string | number)[] since the only values pushed are sanitizedQuery (string), options.project (string), and options.limit (number).


SearchManager.ts — Default project filter (normalizeParams)

Silently breaks cross-project search with no opt-out

if (!normalized.project) {
  normalized.project = getProjectContext(process.cwd()).primary;
}

There is no way for callers to request all-project results anymore. Consider a sentinel value (e.g. project: '*') or an explicit allProjects: true flag so power users aren't permanently locked out.


SearchManager.ts — Deduplication (deduplicateResults)

Hardcoded diversity cap
maxPerProjectSession = 5 is hardcoded. If the defaults in normalizeParams keep a project filter, the diversity cap will mostly be a no-op — results are already scoped to one project. Consider documenting this or relaxing the default when a project filter is active.

Missing content_hash on some result types
content_hash is optional on the generic type, so records without it bypass hash dedup entirely. That seems intentional for sessions/prompts, but worth a brief comment to avoid future confusion.


ChromaSync.ts — syncObservationRow

Inline row type duplicates StoredObservation
The parameter type is a large anonymous struct that mirrors StoredObservation almost exactly. Consider accepting StoredObservation directly (the caller in DataRoutes already constructs one), or at minimum define a named type alias:

type ObservationDbRow = Omit<StoredObservation, /* any diverging fields */>;

No Chroma-initialized guard
addDocuments is called unconditionally. If ChromaSync is constructed but the underlying client fails to connect mid-session, this will throw. The existing syncObservation methods check initialization state — this one should too.


DataRoutes.ts — Fire-and-forget Chroma sync

The fire-and-forget pattern is appropriate here and the double .catch() is a good safety net. One minor nit: the outer IIFE is unnecessary — you can call .catch() directly on the async arrow:

// Before
(async () => { ... })().catch(...)
// After
const syncAll = async () => { ... };
syncAll().catch(...)

Comment style

Per project conventions, comments like // Bug #1916: and /** Bug #1913: ... */ reference issue numbers that will rot as the codebase evolves. The why is valuable — the issue number pointer is not. Consider rewriting as pure rationale:

// Remap singular `concept` to plural `concepts` — findByConcept() destructures
// the plural form and silently produces malformed SQL when it is absent.

No automated tests

All test-plan items remain unchecked. The FTS5 fallback path, deduplication logic, and conceptconcepts remap are pure functions that would be straightforward to unit test. At minimum a test for deduplicateResults and the FTS5 sanitization would catch regressions cheaply.


Summary

The fixes are directionally correct and address real bugs. The main concerns before merging:

  1. FTS5 sanitization — incomplete escaping can cause runtime errors on user queries with (, ), *, etc.
  2. No cross-project search escape hatch — the default project filter is a breaking change for any workflow relying on global search.
  3. syncObservationRow type duplication — minor, but worth cleaning up before it drifts further from StoredObservation.

🤖 Generated with Claude Code

@greptile-apps
Copy link
Copy Markdown
Contributor

greptile-apps Bot commented Apr 18, 2026

Greptile Summary

This PR fixes six search isolation bugs: defaulting project scope, passing the project filter through to Chroma where-clauses and SQLite hydration, FTS5 fallback when Chroma is absent, singular concept param remapping, ChromaDB sync on import, and result deduplication.

  • P1 — Diversity cap on filter-only queries: deduplicateResults (cap = 5 per project+session) is applied unconditionally across all three paths including the filter-only path, silently truncating date-range or type-only queries to 5 results per session.
  • P2 — Concept remap type mismatch: the remap stores concepts = [concept] (array), but findByConcept declares concept: string; works at runtime by accident via buildFilterClause's array-normalisation.

Confidence Score: 3/5

Merge-ready after the filter-only deduplication cap is scoped to semantic search paths only.

One P1 defect exists: the per-session diversity cap (5) is applied unconditionally to filter-only queries, silently returning fewer results than the requested limit for date/type-only searches. Remaining issues are P2.

src/services/worker/SearchManager.ts — deduplicateResults call site and concept remap logic.

Important Files Changed

Filename Overview
src/services/worker/SearchManager.ts Core of the PR: adds project-scoped FTS5 fallback, deduplication with diversity cap, concept remap, and default project injection — the diversity cap is unconditionally applied to filter-only queries (P1), and the concept remap wraps the value in an array causing a type mismatch (P2).
src/services/sync/ChromaSync.ts Adds syncObservationRow() to sync raw DB rows to Chroma; logic is sound but prompt_number is typed as non-nullable while the import schema allows null (P2 type mismatch).
src/services/worker/http/routes/DataRoutes.ts Fire-and-forget ChromaDB sync for imported observations is correctly wired; error handling and logging are in place, no issues found.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[search args] --> B[normalizeParams]
    B -->|no project| C[default to cwd project]
    B -->|concept singular| D[remap to concepts array]
    B --> E{query present?}

    E -->|No| F[PATH 1: Filter-only SQLite]
    E -->|Yes, chromaSync available| G[PATH 2: Chroma semantic search]
    E -->|Yes, chromaSync null| H[PATH 3: FTS5 fallback]

    G -->|Chroma returns 0| I[Empty result - no FTS5 fallback]
    G -->|Chroma returns IDs| J[Filter by date window]
    J --> K[Hydrate from SQLite with project filter]

    H --> L{FTS5 tables exist?}
    L -->|No| M[chromaFailed=true, empty]
    L -->|Yes| N[FTS5 MATCH with project filter]

    F --> O[deduplicateResults ⚠️ diversity cap=5/session]
    K --> O
    N --> O
    M --> O

    O --> P[Format & return]

    style O fill:#ffcccc
    style I fill:#ffffcc
Loading

Fix All in Claude Code

Reviews (1): Last reviewed commit: "fix(search): default project filter, FTS..." | Re-trigger Greptile

Comment thread src/services/worker/SearchManager.ts Outdated
Comment on lines +471 to +474
// Bug #1915: Deduplicate results by content hash and apply diversity cap
observations = this.deduplicateResults(observations);
sessions = this.deduplicateResults(sessions);
prompts = this.deduplicateResults(prompts);
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P1 Diversity cap silently truncates filter-only results

deduplicateResults (with maxPerProjectSession = 5) is applied unconditionally across all three search paths, including PATH 1 (filter-only, no query text). A user calling /api/search?type=observations&project=myproject or any date-range filter against a single active session will silently receive at most 5 results per session, even if the limit is 20 and 20 rows match. The hash-dedup step doesn't help (no duplicates in direct DB queries), so only the diversity cap fires.

Consider gating the diversity cap on the semantic-search paths only:

// Bug #1915: Deduplicate results only for semantic/FTS paths where
// result monopolization is a concern (not filter-only queries)
if (query) {
  observations = this.deduplicateResults(observations);
  sessions    = this.deduplicateResults(sessions);
  prompts     = this.deduplicateResults(prompts);
}

Fix in Claude Code

Comment on lines +129 to +132
if (normalized.concept && !normalized.concepts) {
normalized.concepts = [normalized.concept];
delete normalized.concept;
}
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 concept remap wraps the string in an array; findByConcept receives string[] where it declares string

After the remap, normalized.concepts = [normalized.concept] (an array). findByConcept then does const { concepts: concept, ...filters } = normalized, so concept becomes string[]. SessionSearch.findByConcept(concept: string, ...) receives an array at runtime, but it only works by accident because buildFilterClause normalises the concepts filter with Array.isArray.

A simpler and type-safe fix is to keep it as a string:

if (normalized.concept && !normalized.concepts) {
  normalized.concepts = normalized.concept; // keep as string; CSV splitting happens above
  delete normalized.concept;
}

Fix in Claude Code

Comment on lines +385 to +402
async syncObservationRow(row: {
id: number;
memory_session_id: string;
project: string;
merged_into_project?: string | null;
text: string | null;
type: string;
title: string | null;
subtitle: string | null;
facts: string | null;
narrative: string | null;
concepts: string | null;
files_read: string | null;
files_modified: string | null;
prompt_number: number;
discovery_tokens: number;
created_at: string;
created_at_epoch: number;
Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2 prompt_number declared non-nullable here but import payload allows null

SessionStore.importObservation accepts prompt_number: number | null, so an imported observation may have prompt_number: null. syncObservationRow's inline parameter type declares prompt_number: number, creating a TypeScript type mismatch at the call site in DataRoutes.ts. TypeScript will infer the element type from the import schema's nullable field and flag the mismatch.

Suggested change
async syncObservationRow(row: {
id: number;
memory_session_id: string;
project: string;
merged_into_project?: string | null;
text: string | null;
type: string;
title: string | null;
subtitle: string | null;
facts: string | null;
narrative: string | null;
concepts: string | null;
files_read: string | null;
files_modified: string | null;
prompt_number: number;
discovery_tokens: number;
created_at: string;
created_at_epoch: number;
prompt_number: number | null;

Fix in Claude Code

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@claude
Copy link
Copy Markdown

claude Bot commented Apr 18, 2026

Code Review

Good batch of correctness fixes. The overall approach is sound — each bug is addressed directly without over-engineering. A few things worth addressing before merge:


Issues

SQL injection risk in FTS5 sanitization (SearchManager.tssearchFTS5Fallback)

The current sanitizer strips ' and " then wraps terms in double-quotes, but FTS5 MATCH still accepts operators like ^, NEAR/n, and - that can produce unexpected query behavior. Consider using a stricter allowlist:

const sanitizedQuery = query
  .replace(/[^a-zA-Z0-9\s]/g, ' ')  // strip all non-alphanumeric
  .split(/\s+/)
  .filter(Boolean)
  .map(term => `"${term}"`)
  .join(' OR ');

process.cwd() in normalizeParams() (SearchManager.ts — Bug #1911)

The worker is a long-running Express service — its process.cwd() is fixed at startup and won't reflect the user's current project when they switch directories in a new Claude session. If getProjectContext is what drives project isolation, this default may silently return wrong-project results for multi-project users. Worth verifying how the worker receives the correct project context (presumably via the hook/HTTP request), and whether a fallback to cwd is actually safe here.

syncObservationRow type redundancy (ChromaSync.ts — Bug #1914)

The inline parameter type is 17 fields that duplicate StoredObservation. Since the method immediately maps to StoredObservation, just accept Omit<StoredObservation, ...> or whatever subset is available from the DB row. Less drift risk.


Nits

Repeated whereFilter construction — The Chroma filter block ($and: [doc_type, $or: [project, merged_into_project]]) is copied verbatim for observations, sessions, and prompts. Extract a helper:

private buildChromaFilter(docType: string, project?: string): Record<string, any> {
  if (!project) return { doc_type: docType };
  return { $and: [{ doc_type: docType }, { $or: [{ project }, { merged_into_project: project }] }] };
}

params: any[] in searchFTS5Fallback — should be (string | number)[].

Hardcoded maxPerProjectSession = 5 — fine for now, but this will likely need to be configurable as usage patterns vary.


Looks Good

  • FTS5 fallback logic is clean; graceful degradation when neither Chroma nor FTS is available is handled correctly.
  • Fire-and-forget Chroma sync for imports (DataRoutes.ts) is the right call — no reason to block the import response.
  • deduplicateResults generic is well-typed and the hash-first then diversity-cap ordering is correct.
  • Skipping diversity cap for filter-only (no query) searches is a good UX decision.

Test Coverage

All test plan checkboxes are unchecked. The FTS5 fallback path in particular is easy to miss in manual testing since it only triggers when Chroma is absent. Would be good to add at least an integration test for the fallback path before merge.

@thedotmack
Copy link
Copy Markdown
Owner Author

Closing to start fresh from main — will redo fixes isolated in Docker container.

@thedotmack thedotmack closed this Apr 20, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment