fix(forms,storage): Form 144 num() whitespace + PG bulk-writer duplicate-CIK dedup#118
Open
sroussey wants to merge 1 commit into
Open
fix(forms,storage): Form 144 num() whitespace + PG bulk-writer duplicate-CIK dedup#118sroussey wants to merge 1 commit into
sroussey wants to merge 1 commit into
Conversation
…ate-CIK dedup
- Extract shared str/num helpers (scalar + wrapped) into _valueHelpers.ts
so Form 144 and OwnershipDocument share the same null-on-empty/whitespace
semantics. Form 144's previous local num() treated "" as null but a
whitespace-only value would Number(" ")=0, fabricating a 0 in DB.
- Bump Form_144.storage extractor_version 1.0.0 -> 1.1.0 to trigger
re-extraction. OwnershipDocument behaviour is unchanged so its version
stays.
- In the Postgres cik_names bulk writer, dedup duplicate CIKs per slice
(last value wins) before building the multi-row INSERT, so an in-batch
duplicate cik no longer trips ON CONFLICT once-per-statement.
11 tasks
sroussey
added a commit
that referenced
this pull request
Jun 2, 2026
Plan H part 1 of 5 — split across multiple commits due to the push-files tool size limit; logically one fix. See later commits for Form_1_A tests, Form_1_K, Form_1_Z, and Form_C. Adds the shared _valueHelpers module (numScalar/strScalar/numWrapped/ strWrapped) plus its unit tests. numScalar treats empty / whitespace- only / NaN / Infinity input as null and rejects thousand-separator strings; legitimate "0" round-trips so the regression guard holds. NOTE: this is an inline copy of the helpers introduced by PR #118 under src/sec/forms/insider-trading/_valueHelpers.ts. When #118 merges the duplicate should be removed in favour of a single shared module. Also switches Form_1_A.schema.ts decimal-type aliases from Type.Number() to Type.String() (4 aliases) so the storage layer can make the null-vs-zero decision per cell with numScalar() instead of Value.Convert silently producing 0 for empty text. Form_1_A.storage.ts is updated to numScalar() every decimal field in processFinancialData and the RegAOfferingHistory build, and the extractor_version is bumped 1.0.0 → 1.1.0 to force re-extraction. https://claude.ai/code/session_01Wws8oZpB5imjKL2e7DRXtc
Contributor
There was a problem hiding this comment.
Pull request overview
This PR delivers two data-integrity fixes: (1) prevent whitespace-only numeric values in insider-trading form XML from being coerced into fabricated 0 values during extraction, and (2) prevent Postgres bulk inserts for CIK-name imports from failing when duplicate CIKs appear within the same INSERT ... ON CONFLICT DO UPDATE statement.
Changes:
- Introduces shared
str*/num*value helpers that trim and map empty/whitespace-only inputs tonull, then adopts them in Form 144 and Ownership Document storage code. - Bumps Form 144
extractor_versionto1.1.0to force re-extraction and overwrite previously stored fabricated zeros. - Adds Postgres per-slice dedup for
cikNameBulkWriterto avoid duplicate-conflict-key failures, plus new/updated unit tests.
Reviewed changes
Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.
Show a summary per file
| File | Description |
|---|---|
| src/sec/forms/insider-trading/_valueHelpers.ts | Adds shared scalar vs wrapped value-parsing helpers with whitespace-to-null handling. |
| src/sec/forms/insider-trading/_valueHelpers.test.ts | Adds unit tests pinning empty/whitespace and finite-number coercion semantics. |
| src/sec/forms/insider-trading/Form_144.storage.ts | Switches to shared helpers and bumps extractor version to force re-extract. |
| src/sec/forms/insider-trading/Form_144.storage.test.ts | Adds regression tests ensuring whitespace-only numeric fields persist as null. |
| src/sec/forms/insider-trading/OwnershipDocument.storage.ts | Switches to shared wrapped helpers (intended behavior unchanged). |
| src/sec/forms/insider-trading/OwnershipDocument.storage.test.ts | Adds regression test for whitespace-only wrapped numeric element (transactionShares). |
| src/storage/entity/cikNameBulkWriter.ts | Adds per-slice dedup in Postgres writer to prevent ON CONFLICT duplicate-key statement failures. |
| src/task/ciknames/FetchAllCikNamesTask.test.ts | Adds a duplicate-CIK batch test (currently routed through the repository fallback writer in this suite). |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
Comment on lines
+86
to
+98
| it("dedups duplicate CIKs within a single batch, last value wins", async () => { | ||
| const writer = createCikNameBulkWriter(); | ||
| await writer.writeBatch([ | ||
| { cik: 1, name: "FIRST" }, | ||
| { cik: 2, name: "B" }, | ||
| { cik: 1, name: "LAST" }, | ||
| ]); | ||
| await writer.close(); | ||
| const repo = globalServiceRegistry.get(CIK_NAME_REPOSITORY_TOKEN); | ||
| expect((await repo.get({ cik: 1 }))?.name).toBe("LAST"); | ||
| const all = await repo.getAll(); | ||
| expect(all?.length).toBe(2); | ||
| }); |
sroussey
added a commit
that referenced
this pull request
Jun 3, 2026
Plan H part 1 of 5 — split across multiple commits due to the push-files tool size limit; logically one fix. See later commits for Form_1_A tests, Form_1_K, Form_1_Z, and Form_C. Adds the shared _valueHelpers module (numScalar/strScalar/numWrapped/ strWrapped) plus its unit tests. numScalar treats empty / whitespace- only / NaN / Infinity input as null and rejects thousand-separator strings; legitimate "0" round-trips so the regression guard holds. NOTE: this is an inline copy of the helpers introduced by PR #118 under src/sec/forms/insider-trading/_valueHelpers.ts. When #118 merges the duplicate should be removed in favour of a single shared module. Also switches Form_1_A.schema.ts decimal-type aliases from Type.Number() to Type.String() (4 aliases) so the storage layer can make the null-vs-zero decision per cell with numScalar() instead of Value.Convert silently producing 0 for empty text. Form_1_A.storage.ts is updated to numScalar() every decimal field in processFinancialData and the RegAOfferingHistory build, and the extractor_version is bumped 1.0.0 → 1.1.0 to force re-extraction. https://claude.ai/code/session_01Wws8oZpB5imjKL2e7DRXtc
sroussey
added a commit
that referenced
this pull request
Jun 4, 2026
…SP + download leak (sec) (#121) * fix(cli): complete CSV escape hardening with unicode + bare-CR support Layers two refinements on top of the line-by-line CSV defusing in #119: 1. Tighten LEADING_WS so it only strips space-like characters that spreadsheets silently ignore (ASCII space, NBSP, SHY, ZWSP, ZWNJ, ZWJ, LRM, RLM, BOM). \t and \r are themselves dangerous formula leads, so we no longer strip them away before the DANGEROUS_LEAD check — otherwise "\t=cmd" or "\r=cmd" would slip through as non-dangerous after the strip. 2. Split on /(\r\n|\r|\n)/ instead of /(\r?\n)/ so that a bare CR inside a multi-line cell is also a line boundary; the line after the CR is independently defused. Excel re-parses every physical line of a quoted cell, including lines separated by lone CR. Tests cover the zero-width-prefix bypasses (ZWSP/ZWNJ/ZWJ/LRM/RLM/SHY/BOM + "=cmd"), mixed-WS bypasses ("ZWSP space =cmd"), bare-CR-followed-by-formula ("safe\r=cmd"), and a negative control to prove ZWSP-then-benign is left alone. All 44 cases in TableRenderer.test.ts pass. https://claude.ai/code/session_01Wws8oZpB5imjKL2e7DRXtc * fix(storage,task): close observation TOCTOU + bootstrap zip leak Three related correctness fixes: 1. PersonObservationRepo / CompanyObservationRepo upsertByNaturalKey had a TOCTOU window — two concurrent upserts on an empty in-memory store would both observe `size()` as 0 and assign observation_id=1 to two different natural keys. Introduce a tiny process-wide AsyncMutex in src/util/AsyncMutex.ts that serializes the INSERT branch only (the existing-row update branch is keyed by observation_id and remains lock-free). Inside the critical section we re-query the natural key before allocating an id so that a parallel upsert of the same natural key still collapses to one row and one id. 2. BootstrapDownloadTask staged the multi-GB zip into SEC_RAW_DATA_FOLDER and only removed it on the success branch. If Bun.spawn threw, or unzip exited non-zero, the archive leaked until the next bootstrap run, silently consuming disk on operator VMs. Wrap the spawn+exit-code block in try/finally and remove the zip (with `force: true`) on every exit path. 3. AsyncMutex itself: trivial queue-based serializer. Rejections in one critical section don't poison the queue because the chained tracker swallows both branches; each caller still observes its own rejection through the returned promise. Tests: - PersonObservationRepo.test.ts / CompanyObservationRepo.test.ts each gain two new cases: concurrent INSERT-on-empty must yield distinct sequential ids, and concurrent UPDATE-existing + parallel INSERT must keep the original id stable and give the insert the next id. - BootstrapDownloadTask.test.ts gains an "execute zip cleanup" describe block that stubs fetch + Bun.spawn + Bun.which and asserts the zip is removed on (a) spawn-throws, (b) unzip-exits-nonzero, and (c) the success path. All 24 affected tests pass. https://claude.ai/code/session_01Wws8oZpB5imjKL2e7DRXtc * fix(forms): stop fabricating 0 for empty exempt-offering numerics (1/5) Plan H part 1 of 5 — split across multiple commits due to the push-files tool size limit; logically one fix. See later commits for Form_1_A tests, Form_1_K, Form_1_Z, and Form_C. Adds the shared _valueHelpers module (numScalar/strScalar/numWrapped/ strWrapped) plus its unit tests. numScalar treats empty / whitespace- only / NaN / Infinity input as null and rejects thousand-separator strings; legitimate "0" round-trips so the regression guard holds. NOTE: this is an inline copy of the helpers introduced by PR #118 under src/sec/forms/insider-trading/_valueHelpers.ts. When #118 merges the duplicate should be removed in favour of a single shared module. Also switches Form_1_A.schema.ts decimal-type aliases from Type.Number() to Type.String() (4 aliases) so the storage layer can make the null-vs-zero decision per cell with numScalar() instead of Value.Convert silently producing 0 for empty text. Form_1_A.storage.ts is updated to numScalar() every decimal field in processFinancialData and the RegAOfferingHistory build, and the extractor_version is bumped 1.0.0 → 1.1.0 to force re-extraction. https://claude.ai/code/session_01Wws8oZpB5imjKL2e7DRXtc * fix(forms): stop fabricating 0 for empty exempt-offering numerics (2/5) Plan H part 2 of 5 — Form_1_A test updates. Form_1_A.test.ts: decimal-typed leaves now arrive as raw strings from the parser; assertions updated accordingly (pricePerSecurity, audit/ legal/etc. fees). New "validate financial data types and ranges" split into decimal-string vs integer-number checks. Form_1_A.storage.test.ts: three new regression tests pin the null / empty / "0" round-trip behaviour through processForm1A. https://claude.ai/code/session_01Wws8oZpB5imjKL2e7DRXtc * fix(forms): stop fabricating 0 for empty exempt-offering numerics (3/5) Plan H part 3 of 5 — Form_1_K. Schema: 2 DECIMAL_TYPE aliases switch from Type.Number() to Type.String(). Storage (processOfferingHistory): price_per_security, aggregate_offering_price, aggregate_offering_price_holders, and estimated_net_amount now flow through numScalar(). extractor_version bumped 1.0.0 → 1.1.0 with the standard re-extract comment. Tests: parsing assertions updated for the new string contract; new storage regression tests for null/empty/"0" round-trip. https://claude.ai/code/session_01Wws8oZpB5imjKL2e7DRXtc * fix(forms): stop fabricating 0 for empty exempt-offering numerics (4/5) Plan H part 4 of 5 — Form_1_Z. Schema: 2 DECIMAL_TYPE aliases switch from Type.Number() to Type.String(). Storage: processOfferingSummaries now flows price_per_security, portionSecuritiesSoldIssuer/Securityholders, and issuerNetProceeds through numScalar(). processCertificationSuspension's approxRecordHolders also goes through numScalar() now (with a null-check instead of an undefined-check so the row is dropped when the input is empty/whitespace). extractor_version 1.0.0 → 1.1.0. Tests: parsing assertions updated for string contract; new storage regression tests for null/empty/"0" round-trip. https://claude.ai/code/session_01Wws8oZpB5imjKL2e7DRXtc * fix(forms): stop fabricating 0 for empty exempt-offering numerics (5/5) Plan H part 5 of 5 — Form_C and final wrap-up. Schema: 4 DECIMAL_TYPE aliases switch from Type.Number() to Type.String() — including the previously minimum-0 DECIMAL_TYPE7_2_ NONNEGATIVE. The minimum-0 invariant is no longer expressible at the schema level; that constraint becomes the extractor's responsibility (see Form_C.storage.ts numScalar() use). Storage: - processOfferingInfo now flows price, offering_amount, and maximum_offering_amount through numScalar(); priceDeterminationMethod remains a passthrough. - processAnnualReportDisclosures now flows every disclosure field through numScalar() and drops the row when the result is null, instead of writing a fabricated 0 disclosure_value. - extractor_version 1.0.0 → 1.1.0. Tests: parsing assertions updated for the new string contract on price/offeringAmount/maximumOfferingAmount/currentEmployees/ totalAssetMostRecentFiscalYear/revenueMostRecentFiscalYear. New storage regression tests cover null/empty/"0" round-trip for both offering and disclosure numerics. Whole-suite verification (bun test) passes 845/845 with the helpers, schemas, storage, and tests all in place. https://claude.ai/code/session_01Wws8oZpB5imjKL2e7DRXtc * fix(cli): broaden CSV LEADING_WS to cover all spreadsheet-stripped invisibles The LEADING_WS character class used to strip a handful of zero-width characters before the formula-injection check, but missed common spreadsheet-stripped invisibles (U+00AD SHY, U+034F CGJ, U+061C ALM, U+115F/U+1160 Hangul fillers, U+1680 Ogham space, U+180E MVS, the bidi formatting block U+202A..U+202E, U+205F MMSP, the invisible operators U+2061..U+2064 / U+206A..U+206F, U+3164 Hangul filler, U+FFA0 halfwidth Hangul filler). Each of those can prefix a `=`/`+`/`-`/`@` cell that Excel/Sheets/Numbers strip before formula parsing. The previous source also kept raw invisible characters inline, which editors silently normalise. The regex is now built from `\uXXXX` escape sequences so the codepoint set is reviewable. The pre-existing test "defuses leading U+00A0 NBSP before '='" passed ASCII SPACE to escapeCsvValue() — vacuous because ASCII space was already stripped. The test now uses the U+00A0 escape and we add a table-driven regression for every codepoint LEADING_WS covers, plus NBSP-prefixed variants of every DANGEROUS_LEAD codepoint. https://claude.ai/code/session_011KMd9sERp2rguyekAi8a3u * fix(task): clean up partial bulk-download on stream abort or size mismatch streamDownloadToFile previously left a half-written archive on disk if the fetch body errored mid-stream. The next bootstrap run would then hand that partial multi-GB ZIP straight to unzip and either silently produce a corrupt extract or fail in a way that masked the real cause. Three fixes: - await writer.write(value) so back-pressure is honoured. The previous code fired-and-forgot, which let bytes pile up in the writer queue. - On any error path (network abort, writer failure, size mismatch), rmSync(destPath, { force: true }) before rethrowing. Best-effort — the original error is the one callers see. - Validate `bytes === content-length` when the server advertised one. A short response is now a hard failure instead of a truncated archive. The reader lock is released before the writer is closed so any pending flush doesn't deadlock against a still-locked stream. Tests cover the mid-stream abort and content-length mismatch paths, asserting both the throw AND that the destination file is gone. https://claude.ai/code/session_011KMd9sERp2rguyekAi8a3u * fix(observation): delegate observation_id assignment to the storage backend The old INSERT path computed `(await repo.size()) + 1` for the new row's observation_id, guarded by a process-wide AsyncMutex to close the size()/put() TOCTOU window. That was wrong in two ways: 1. The mutex was only effective in a single process. Two `sec` processes — or two workers — sharing the same SQLite/Postgres backend would still collide. 2. `repo.size()` reads the live row count, so a delete elsewhere in the table would silently reassign an id to a new natural key. Replace the whole dance with `x-auto-generated: true` on the observation_id schema field. The storage backend assigns the id (SQLite INTEGER PRIMARY KEY AUTOINCREMENT, Postgres SERIAL, in-memory counter) and returns the persisted row. `put()`'s return value gives us back the assigned id without a second query. The repo-side AsyncMutex import is dropped from both observation repos. Tests previously asserted specific id values (`expect(...).toBe(1)`). Auto-generated keys do not contract for "first id is 1" across backends (Postgres SERIAL can skip on rollback), so the assertions are now `toBeGreaterThan(0)` + `.not.toBe(other)`. EntityObserver tests get the same treatment. A smoke-test against InMemoryTabularStorage confirmed the backend does return the auto-generated id from put() — the natural-key PK fallback documented in the plan was not needed. https://claude.ai/code/session_011KMd9sERp2rguyekAi8a3u * fix(resolver): serialize person/company find-or-create per natural key Two parallel `resolver.resolve(obs)` calls that mapped to the same (resolver_version, key_kind, key_value) used to race: both observed no canonical row, both minted a fresh UUID via randomUUID(), both inserted, yielding two distinct canonical ids for what should have been the same entity. Every downstream identity-link that pointed at the duplicate canonical was then silently wrong. PersonResolver and CompanyResolver now hold a process-wide Map<string, { mutex: AsyncMutex; refs: number }> keyed on the resolver_version-plus-natural-key tuple. The find-or-create critical section runs inside `mutex.lock()` so a queued caller observes the canonical row the previous caller just inserted, and returns the same id. The refcount is decremented in a finally block and the entry is deleted from the map when it hits zero, so the map stays bounded for processes that resolve millions of distinct keys. Multi-process callers (separate `sec` invocations, workers sharing a backend) still need a backend-level UNIQUE constraint to be race-free — a single-process mutex isn't visible across processes. That is noted in the class JSDoc. New PersonResolver.race.test.ts and CompanyResolver.race.test.ts cover: - two parallel resolves on the same CIK collapsing to one canonical - ditto for the name-key path (Person) and CRD + name paths (Company) - 20-25 fan-out stress for queue-depth coverage - distinct-key parallel resolves do NOT block each other https://claude.ai/code/session_011KMd9sERp2rguyekAi8a3u * fix(pr-review): address 5 Copilot review findings on PR #121 * AsyncMutex: rewrite the file docblock so it documents the current per-key resolver find-or-create use-case instead of the retired observation-repo TOCTOU window (observation_id is now auto-generated). * BootstrapDownloadTask.streamDownloadToFile: close the writer before unlinking so Windows can actually delete the partial file, and surface writer.end() failures as the operation error on the success path instead of silently returning success over a half-flushed file. * BootstrapDownloadTask.test: snapshot+restore the previous SEC_RAW_DATA_FOLDER binding around setupRawDataFolder() so the globalServiceRegistry no longer leaks a stale tmpdir into later test files. * Form_1_Z.test: replace the vacuous `expect(cik).toBe(cik)` self- comparison with assertions that the value is a 1-10 digit string, matching the CIK_TYPE schema. * Form_C.storage: restore the schema-level minimum:0 invariant on `currentEmployees` that was lost when the field moved to string-typed decimals + numScalar(). Negative values now drop to null instead of persisting as a negative disclosure_value. Adds a regression test. * fix(cli): strip U+2028/U+2029 before formula-lead check JS `\n` does not match U+2028 LINE SEPARATOR or U+2029 PARAGRAPH SEPARATOR, so escapeCsvValue never split them out as line breaks. Some spreadsheet importers silently treat them as leading whitespace before formula parsing, leaving a "<LS>=cmd" payload unguarded. Add both codepoints to LEADING_WS and pin them with regression tests. https://claude.ai/code/session_011KMd9sERp2rguyekAi8a3u --------- Co-authored-by: Claude <noreply@anthropic.com>
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
Two independent High-priority data-integrity fixes in the insider-trading extractor and the CIK-names bulk writer.
Fix 1 — Form 144
num()whitespace + shared helpersThe local
num()helper insideForm_144.storage.tstreated""asnullbut a whitespace-only value (e.g." ","\t\n") slipped past the early-return and reachedNumber(" ")which returns0. EDGAR filings have been observed with whitespace-only numeric elements; the extractor was silently fabricating a0foraggregate_market_value,gross_proceeds,amount_acquired, etc. and stamping it into Postgres as if the filer had actually reported zero.src/sec/forms/insider-trading/_valueHelpers.tsexports two pairs of helpers with intentionally distinct signatures (scalar vs{value}-wrapped) so call sites can't accidentally cross-wire them:strScalar/numScalar— for Form 144's flat string/number leaves.strWrapped/numWrapped— for OwnershipDocument's{ value }leaves.null.Form_144.storage.tsremoves the localstr/numand importsstrScalar as str, numScalar as num. Bumpsextractor_versionfrom"1.0.0"→"1.1.0"so the production version-slot machinery re-runs the extractor against every previously-stored Form 144 and overwrites the fabricated zeros.OwnershipDocument.storage.tsremoves its localstr/numand importsstrWrapped as str, numWrapped as num.extractor_versionis intentionally NOT bumped — that helper already trimmed before the empty-check, so behaviour is byte-for-byte identical and re-extraction would be pure churn._valueHelpers.test.tspins null-on-empty/whitespace, finite-only coercion, and the scalar/wrapped boundary (including{value:" "},{value:undefined},{}).Form_144.storage.test.tsgets parallel whitespace-only regression tests foraggregateMarketValue,grossProceeds(recent sales), andamountOfSecuritiesAcquired(acquisitions).OwnershipDocument.storage.test.tsgets a parallel whitespace-onlytransactionSharesregression test.Fix 2 — PG
cikNameBulkWriterper-slice dedupcreatePostgresWriter().writeBatchbuilt one multi-rowINSERT ... ON CONFLICT ("cik") DO UPDATEper 30 000-row slice. Postgres rejects a statement that names the same conflict key twice in a single INSERT (ON CONFLICT DO UPDATE command cannot affect row a second time), so a single duplicate CIK incik-lookup-data.txtaborted the whole transaction and lost all ~1M rows for the run. The SQLite branch'sINSERT OR REPLACEalready swallowed in-batch duplicates with last-write-wins.Map<number, string>dedup runs after slicing and only shrinks the row set, so the existing 60 000-bind cap (PG_MAX_ROWS_PER_STATEMENT * 2) still holds — left a comment to that effect.console.debugrecords the drop count when dedup actually fires.PG_MAX_ROWS_PER_STATEMENT = 30_000and the SQLite branch are untouched.FetchAllCikNamesTask.test.tscovers the duplicate-CIK case end-to-end through the repository writer (the in-memory writer it falls through to in tests still exercises the dedup ordering invariant: last value wins, row count shrinks).Follow-up
The same
Number("")===0/Number(" ")===0bug class is plausible in the exempt-offerings extractors (Form C/D/1-A/1-K/1-Z); they were not in scope for this PR but warrant an audit pass and, where a localnum()exists, migration to the shared_valueHelpers.Test plan
bun test src/sec/forms/insider-trading/_valueHelpers.test.ts— new helpers cover all empty/whitespace/non-finite branches for both scalar and wrapped shapes.bun test src/sec/forms/insider-trading/Form_144.storage.test.ts— existing tests pass under the new shared helpers; new whitespace-only tests foraggregate_market_value,gross_proceeds,amount_acquiredall assertnull.bun test src/sec/forms/insider-trading/OwnershipDocument.storage.test.ts— existing empty-element tests still pass (behaviour unchanged); new whitespace-onlytransactionSharestest assertsnull.bun test src/task/ciknames/FetchAllCikNamesTask.test.ts— new duplicate-CIK test asserts the second value wins and the row count collapses to the unique-key count.aggregateMarketValuewas previously0and confirm the new rows areNULLand thatextractor_runsreflects1.1.0.Generated by Claude Code