fix(search): default project filter, FTS5 fallback, concept remap, import sync, dedup #2064
thedotmack wants to merge 2 commits into main
…port sync, dedup

- #1911: Default project filter to current project in normalizeParams()
- #1912: Pass project filter through to Chroma where clause and SQLite hydration in searchObservations(), searchSessions(), searchUserPrompts()
- #1913: Fall back to FTS5 MATCH when ChromaDB is disabled/unavailable
- #1916: Remap singular `concept` to plural `concepts` in normalizeParams()
- #1914: Sync imported observations to ChromaDB via new syncObservationRow()
- #1915: Deduplicate results by content_hash and per-project+session diversity cap

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
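The two normalizeParams() changes (#1911 and #1916) can be pictured with a minimal sketch. The `SearchParams` shape, the `currentProject` argument, and the name `normalizeParamsSketch` are assumptions made for illustration; the PR's actual implementation may differ.

```typescript
// Hypothetical parameter shape; the real SearchManager types may differ.
interface SearchParams {
  query?: string;
  project?: string;
  concept?: string;
  concepts?: string | string[];
  limit?: number;
}

// Sketch of the two normalizations described in the commit message.
// `currentProject` is passed in explicitly here (an assumption) so the
// function stays pure and easy to test.
function normalizeParamsSketch(raw: SearchParams, currentProject: string): SearchParams {
  const normalized: SearchParams = { ...raw };

  // #1911: scope the search to the current project unless the caller chose one.
  if (!normalized.project) {
    normalized.project = currentProject;
  }

  // #1916: accept the singular key, but never override an explicit plural.
  if (normalized.concept && !normalized.concepts) {
    normalized.concepts = [normalized.concept];
    delete normalized.concept;
  }

  return normalized;
}

// Usage: a caller that passes only `concept` ends up with a project scope
// and a `concepts` array.
const normalizedExample = normalizeParamsSketch({ query: "auth", concept: "caching" }, "my-project");
```

Passing the current project as an argument, rather than reading it from the environment inside the function, is a design choice of this sketch only.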
Code Review

Good set of bug fixes that addresses real gaps in search correctness. Here's my analysis by area:

SearchManager.ts: FTS5 Fallback (
Greptile Summary

This PR fixes six search isolation bugs: defaulting project scope, passing the project filter through to Chroma where-clauses and SQLite hydration, FTS5 fallback when Chroma is absent, singular
Confidence Score: 3/5

Merge-ready after the filter-only deduplication cap is scoped to semantic search paths only.

One P1 defect exists: the per-session diversity cap (5) is applied unconditionally to filter-only queries, silently returning fewer results than the requested limit for date/type-only searches. Remaining issues are P2.

src/services/worker/SearchManager.ts: deduplicateResults call site and concept remap logic.

Important Files Changed
Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
A[search args] --> B[normalizeParams]
B -->|no project| C[default to cwd project]
B -->|concept singular| D[remap to concepts array]
B --> E{query present?}
E -->|No| F[PATH 1: Filter-only SQLite]
E -->|Yes, chromaSync available| G[PATH 2: Chroma semantic search]
E -->|Yes, chromaSync null| H[PATH 3: FTS5 fallback]
G -->|Chroma returns 0| I[Empty result - no FTS5 fallback]
G -->|Chroma returns IDs| J[Filter by date window]
J --> K[Hydrate from SQLite with project filter]
H --> L{FTS5 tables exist?}
L -->|No| M[chromaFailed=true, empty]
L -->|Yes| N[FTS5 MATCH with project filter]
F --> O[deduplicateResults ⚠️ diversity cap=5/session]
K --> O
N --> O
M --> O
O --> P[Format & return]
style O fill:#ffcccc
style I fill:#ffffcc
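The branching at node E can be sketched as a small pure function. The name `chooseSearchPath` and the `SearchPath` labels are hypothetical and only mirror the flowchart; they are not taken from SearchManager.ts.

```typescript
// Labels for the three branches in the flowchart (names are assumptions).
type SearchPath = "filter-only" | "chroma" | "fts5";

// Picks a search path up front from the two inputs the flowchart branches on.
function chooseSearchPath(query: string | undefined, chromaAvailable: boolean): SearchPath {
  if (!query) return "filter-only";           // PATH 1: no query text, SQLite filters only
  return chromaAvailable ? "chroma" : "fts5"; // PATH 2 when Chroma is usable, else PATH 3
}

// Usage: the same query takes a different path depending on Chroma availability.
const withChroma = chooseSearchPath("auth bug", true);
const withoutChroma = chooseSearchPath("auth bug", false);
const noQuery = chooseSearchPath(undefined, true);
```

Because the path is chosen before any search runs, this sketch shares the behavior flagged at node I: a Chroma search that returns zero hits does not fall through to FTS5.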
// Bug #1915: Deduplicate results by content hash and apply diversity cap
observations = this.deduplicateResults(observations);
sessions = this.deduplicateResults(sessions);
prompts = this.deduplicateResults(prompts);
Diversity cap silently truncates filter-only results
deduplicateResults (with maxPerProjectSession = 5) is applied unconditionally across all three search paths, including PATH 1 (filter-only, no query text). A user calling /api/search?type=observations&project=myproject or any date-range filter against a single active session will silently receive at most 5 results per session, even if the limit is 20 and 20 rows match. The hash-dedup step doesn't help (no duplicates in direct DB queries), so only the diversity cap fires.
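To make the truncation concrete, here is a minimal sketch of a hash-dedup step followed by a per-project+session cap. The `ResultRow` shape and the function `deduplicateSketch` are assumptions for illustration and are not the PR's deduplicateResults.

```typescript
// Hypothetical result shape: only the fields the two steps need.
interface ResultRow {
  id: number;
  project: string;
  memory_session_id: string;
  content_hash: string;
}

// Step 1 drops rows whose content_hash was already seen.
// Step 2 keeps at most `maxPerProjectSession` rows per (project, session) pair.
function deduplicateSketch<T extends ResultRow>(rows: T[], maxPerProjectSession = 5): T[] {
  const seenHashes = new Set<string>();
  const perGroup = new Map<string, number>();
  const kept: T[] = [];

  for (const row of rows) {
    if (seenHashes.has(row.content_hash)) continue;
    seenHashes.add(row.content_hash);

    const key = `${row.project}::${row.memory_session_id}`;
    const count = perGroup.get(key) ?? 0;
    if (count >= maxPerProjectSession) continue;

    perGroup.set(key, count + 1);
    kept.push(row);
  }
  return kept;
}

// 20 distinct rows from a single session: the hash step removes nothing,
// so only the cap fires and 15 matching rows are dropped.
const twentyRows: ResultRow[] = Array.from({ length: 20 }, (_, i) => ({
  id: i + 1,
  project: "myproject",
  memory_session_id: "session-a",
  content_hash: `hash-${i}`,
}));
const capped = deduplicateSketch(twentyRows);
```

In this sketch `capped` holds the first five rows only, which is the situation the comment describes for a filter-only query with a limit of 20.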
Consider gating the diversity cap on the semantic-search paths only:
// Bug #1915: Deduplicate results only for semantic/FTS paths where
// result monopolization is a concern (not filter-only queries)
if (query) {
  observations = this.deduplicateResults(observations);
  sessions = this.deduplicateResults(sessions);
  prompts = this.deduplicateResults(prompts);
}

if (normalized.concept && !normalized.concepts) {
  normalized.concepts = [normalized.concept];
  delete normalized.concept;
}
concept remap wraps the string in an array; findByConcept receives string[] where it declares string
After the remap, normalized.concepts = [normalized.concept] (an array). findByConcept then does const { concepts: concept, ...filters } = normalized, so concept becomes string[]. SessionSearch.findByConcept(concept: string, ...) receives an array at runtime, but it only works by accident because buildFilterClause normalises the concepts filter with Array.isArray.
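The "works by accident" behavior comes down to a normalization like the one below. `toConceptList` is a hypothetical stand-in for the `Array.isArray` handling the comment attributes to buildFilterClause; its name and signature are assumptions.

```typescript
// Accepts either form and always yields a list, so a string[] passed where
// a string is declared still produces a valid filter at runtime.
function toConceptList(concepts: string | string[] | undefined): string[] {
  if (concepts === undefined) return [];
  return Array.isArray(concepts) ? concepts : [concepts];
}

// Both call shapes collapse to the same list.
const fromString = toConceptList("caching");
const fromArray = toConceptList(["caching"]);
```

The runtime result is identical for both inputs, which hides the mismatch between the declared `string` parameter and the array actually passed.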
A simpler and type-safe fix is to keep it as a string:
if (normalized.concept && !normalized.concepts) {
  normalized.concepts = normalized.concept; // keep as string; CSV splitting happens above
  delete normalized.concept;
}

async syncObservationRow(row: {
  id: number;
  memory_session_id: string;
  project: string;
  merged_into_project?: string | null;
  text: string | null;
  type: string;
  title: string | null;
  subtitle: string | null;
  facts: string | null;
  narrative: string | null;
  concepts: string | null;
  files_read: string | null;
  files_modified: string | null;
  prompt_number: number;
  discovery_tokens: number;
  created_at: string;
  created_at_epoch: number;
prompt_number declared non-nullable here but import payload allows null
SessionStore.importObservation accepts prompt_number: number | null, so an imported observation may have prompt_number: null. syncObservationRow's inline parameter type declares prompt_number: number, creating a TypeScript type mismatch at the call site in DataRoutes.ts. TypeScript will infer the element type from the import schema's nullable field and flag the mismatch.
Suggested change in the syncObservationRow parameter type:

prompt_number: number;

becomes

prompt_number: number | null;
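Once the type admits `null`, the sync code has to decide what to do with a missing value. One option, shown purely as an assumption and not as the PR's behavior, is to leave the key out of the metadata rather than invent a number. `ImportedObservationRow` and `promptNumberMetadata` are hypothetical names.

```typescript
// Hypothetical subset of the imported row; only the nullable field matters here.
interface ImportedObservationRow {
  id: number;
  prompt_number: number | null;
}

// Emits the metadata entry only when a prompt number is actually present.
function promptNumberMetadata(row: ImportedObservationRow): Record<string, number> {
  return row.prompt_number === null ? {} : { prompt_number: row.prompt_number };
}

// Usage: an imported row without a prompt number contributes no metadata key.
const withPromptNumber = promptNumberMetadata({ id: 1, prompt_number: 3 });
const withoutPromptNumber = promptNumberMetadata({ id: 2, prompt_number: null });
```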
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Code Review

Good batch of correctness fixes. The overall approach is sound: each bug is addressed directly without over-engineering. A few things worth addressing before merge:

Issues

SQL injection risk in FTS5 sanitization (

The current sanitizer strips

const sanitizedQuery = query
.replace(/[^a-zA-Z0-9\s]/g, ' ') // strip all non-alphanumeric
.split(/\s+/)
.filter(Boolean)
.map(term => `"${term}"`)
.join(' OR ');
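Wrapped in a function, the suggested sanitizer is easy to check against hostile input. The wrapper name `buildFtsMatchQuery` is an assumption; the body is the chain from the snippet above.

```typescript
// Builds an FTS5 MATCH expression from free text.
// Every character outside [a-zA-Z0-9] and whitespace becomes a space, and each
// remaining term is double-quoted so FTS5 reads it as a string, not an operator.
function buildFtsMatchQuery(query: string): string {
  return query
    .replace(/[^a-zA-Z0-9\s]/g, " ") // strip all non-alphanumeric characters
    .split(/\s+/)
    .filter(Boolean)                 // drop empty fragments left by stripping
    .map((term) => `"${term}"`)
    .join(" OR ");
}

// Usage: operators and stray quotes in the input end up as quoted terms.
const sanitized = buildFtsMatchQuery('auth* OR "drop');
// sanitized is: "auth" OR "OR" OR "drop"
```

Two consequences follow from the regex itself: non-ASCII letters are stripped along with punctuation, and input consisting only of punctuation yields an empty string, so a caller would likely want to skip the MATCH statement in that case.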
The worker is a long-running Express service — its
The inline parameter type is 17 fields that duplicate

Nits

Repeated

private buildChromaFilter(docType: string, project?: string): Record<string, any> {
  if (!project) return { doc_type: docType };
  return { $and: [{ doc_type: docType }, { $or: [{ project }, { merged_into_project: project }] }] };
}
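As a free function the helper can be exercised on its own. This sketch keeps the reviewer's logic and only swaps `Record<string, any>` for a named alias; the alias `ChromaWhere` and the sample doc type value are assumptions.

```typescript
// Loose alias for a where clause; a stricter type would be an improvement.
type ChromaWhere = Record<string, unknown>;

// Without a project, filter on doc_type only. With one, also match rows whose
// project or merged_into_project equals it.
function buildChromaFilter(docType: string, project?: string): ChromaWhere {
  if (!project) return { doc_type: docType };
  return {
    $and: [
      { doc_type: docType },
      { $or: [{ project }, { merged_into_project: project }] },
    ],
  };
}

// Usage: the same helper serves every doc type the search paths query.
const unscoped = buildChromaFilter("observation");
const scoped = buildChromaFilter("observation", "my-project");
```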
Hardcoded

Looks Good
Test Coverage

All test plan checkboxes are unchecked. The FTS5 fallback path in particular is easy to miss in manual testing since it only triggers when Chroma is absent. Would be good to add at least an integration test for the fallback path before merge.
Closing to start fresh from main; will redo the fixes in isolation in a Docker container.
Summary

- … concept to plural concepts in normalizeParams() to prevent malformed SQL
- … syncObservationRow() method

Test plan

- … concept param

Closes #1911, closes #1912, closes #1913, closes #1916, closes #1914, closes #1915
🤖 Generated with Claude Code