fix(forms,storage): Form 144 num() whitespace + PG bulk-writer duplicate-CIK dedup#118

Open
sroussey wants to merge 1 commit into main from claude/wonderful-hypatia-j2Nma
@sroussey
Contributor

Summary

Two independent High-priority data-integrity fixes in the insider-trading extractor and the CIK-names bulk writer.

Fix 1 — Form 144 num() whitespace + shared helpers

The local num() helper inside Form_144.storage.ts treated "" as null but a whitespace-only value (e.g. " ", "\t\n") slipped past the early-return and reached Number(" ") which returns 0. EDGAR filings have been observed with whitespace-only numeric elements; the extractor was silently fabricating a 0 for aggregate_market_value, gross_proceeds, amount_acquired, etc. and stamping it into Postgres as if the filer had actually reported zero.

  • New src/sec/forms/insider-trading/_valueHelpers.ts exports two pairs of helpers with intentionally distinct signatures (scalar vs {value}-wrapped) so call sites can't accidentally cross-wire them:
    • strScalar / numScalar — for Form 144's flat string/number leaves.
    • strWrapped / numWrapped — for OwnershipDocument's { value } leaves.
    • All four trim before testing for empty, so whitespace-only values now correctly resolve to null.
  • Form_144.storage.ts removes the local str/num and imports strScalar as str, numScalar as num. Bumps extractor_version from "1.0.0" to "1.1.0" so the production version-slot machinery re-runs the extractor against every previously-stored Form 144 and overwrites the fabricated zeros.
  • OwnershipDocument.storage.ts removes its local str/num and imports strWrapped as str, numWrapped as num. extractor_version is intentionally NOT bumped — that helper already trimmed before the empty-check, so behaviour is byte-for-byte identical and re-extraction would be pure churn.
  • New _valueHelpers.test.ts pins null-on-empty/whitespace, finite-only coercion, and the scalar/wrapped boundary (including {value:" "}, {value:undefined}, {}).
  • Form_144.storage.test.ts gets parallel whitespace-only regression tests for aggregateMarketValue, grossProceeds (recent sales), and amountOfSecuritiesAcquired (acquisitions).
  • OwnershipDocument.storage.test.ts gets a parallel whitespace-only transactionShares regression test.
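
The trim-before-empty-check behaviour described in the list above can be sketched roughly as follows. This is a minimal illustration, not the contents of _valueHelpers.ts: the four function names come from this PR, but the `Scalar` and `Wrapped` types, the exact signatures, and the function bodies are assumptions.

```typescript
// Hypothetical sketch; the real _valueHelpers.ts may differ in types and details.
type Scalar = string | number | null | undefined;
type Wrapped = { value?: Scalar } | null | undefined;

/** Trimmed string, or null when the input is missing, empty, or whitespace-only. */
function strScalar(raw: Scalar): string | null {
  if (raw === null || raw === undefined) return null;
  const trimmed = String(raw).trim();
  return trimmed === "" ? null : trimmed;
}

/** Finite number, or null for missing, empty, whitespace-only, or non-finite input. */
function numScalar(raw: Scalar): number | null {
  const text = strScalar(raw);
  if (text === null) return null; // never reaches Number(" ")
  const parsed = Number(text);
  return Number.isFinite(parsed) ? parsed : null;
}

/** Same semantics for { value }-wrapped leaves. */
function strWrapped(leaf: Wrapped): string | null {
  return strScalar(leaf?.value);
}

function numWrapped(leaf: Wrapped): number | null {
  return numScalar(leaf?.value);
}

console.log(Number(" "));               // 0 (the fabricated value)
console.log(numScalar(" "));            // null
console.log(numScalar("0"));            // 0 (a reported zero survives)
console.log(numWrapped({ value: " " })); // null
```

Routing the number helper through the string helper keeps the "what counts as empty" decision in one place, so the scalar and wrapped variants cannot drift apart.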

Fix 2 — PG cikNameBulkWriter per-slice dedup

createPostgresWriter().writeBatch built one multi-row INSERT ... ON CONFLICT ("cik") DO UPDATE per 30 000-row slice. Postgres rejects a statement that names the same conflict key twice in a single INSERT (ON CONFLICT DO UPDATE command cannot affect row a second time), so a single duplicate CIK in cik-lookup-data.txt aborted the whole transaction and lost all ~1M rows for the run. The SQLite branch's INSERT OR REPLACE already swallowed in-batch duplicates with last-write-wins.

  • Per-slice Map<number, string> dedup runs after slicing and only shrinks the row set, so the existing 60 000-bind cap (PG_MAX_ROWS_PER_STATEMENT * 2) still holds — left a comment to that effect.
  • Last-write-wins ordering matches the SQLite path.
  • console.debug records the drop count when dedup actually fires.
  • PG_MAX_ROWS_PER_STATEMENT = 30_000 and the SQLite branch are untouched.
  • New regression test in FetchAllCikNamesTask.test.ts covers the duplicate-CIK case end-to-end through the repository writer (the in-memory writer it falls through to in tests still exercises the dedup ordering invariant: last value wins, row count shrinks).
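
As an illustration of the per-slice dedup described above, here is a minimal sketch. `CikNameRow`, `dedupSlice`, and `toDedupedSlices` are hypothetical names chosen for this example; only `PG_MAX_ROWS_PER_STATEMENT = 30_000` and the `Map<number, string>` approach come from the PR text, and the real writer builds SQL rather than returning arrays.

```typescript
// Hypothetical row shape for the cik_names table.
interface CikNameRow {
  cik: number;
  name: string;
}

const PG_MAX_ROWS_PER_STATEMENT = 30_000; // value quoted in the PR description

/** Collapse duplicate CIKs within one slice; the last occurrence wins. */
function dedupSlice(slice: readonly CikNameRow[]): CikNameRow[] {
  const byCik = new Map<number, string>();
  for (const row of slice) {
    byCik.set(row.cik, row.name); // a later row overwrites an earlier one
  }
  return Array.from(byCik, ([cik, name]) => ({ cik, name }));
}

/** Slice first, then dedup, so a slice can only shrink below the row cap. */
function toDedupedSlices(
  rows: readonly CikNameRow[],
  maxRows: number = PG_MAX_ROWS_PER_STATEMENT,
): CikNameRow[][] {
  const slices: CikNameRow[][] = [];
  for (let start = 0; start < rows.length; start += maxRows) {
    slices.push(dedupSlice(rows.slice(start, start + maxRows)));
  }
  return slices;
}

const deduped = dedupSlice([
  { cik: 1, name: "FIRST" },
  { cik: 2, name: "B" },
  { cik: 1, name: "LAST" },
]);
console.log(deduped.length); // 2
```

A duplicate that straddles two slices ends up in two separate statements, which ON CONFLICT DO UPDATE handles on its own, so dedup only needs to run within a slice.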

Follow-up

The same Number("")===0 / Number(" ")===0 bug class is plausible in the exempt-offerings extractors (Form C/D/1-A/1-K/1-Z); they were not in scope for this PR but warrant an audit pass and, where a local num() exists, migration to the shared _valueHelpers.

Test plan

  • bun test src/sec/forms/insider-trading/_valueHelpers.test.ts — new helpers cover all empty/whitespace/non-finite branches for both scalar and wrapped shapes.
  • bun test src/sec/forms/insider-trading/Form_144.storage.test.ts — existing tests pass under the new shared helpers; new whitespace-only tests for aggregate_market_value, gross_proceeds, amount_acquired all assert null.
  • bun test src/sec/forms/insider-trading/OwnershipDocument.storage.test.ts — existing empty-element tests still pass (behaviour unchanged); new whitespace-only transactionShares test asserts null.
  • bun test src/task/ciknames/FetchAllCikNamesTask.test.ts — new duplicate-CIK test asserts the second value wins and the row count collapses to the unique-key count.
  • On a staging Postgres, re-run the Form 144 extractor against a known sample of filings whose aggregateMarketValue was previously 0 and confirm the new rows are NULL and that extractor_runs reflects 1.1.0.

Generated by Claude Code

…ate-CIK dedup

- Extract shared str/num helpers (scalar + wrapped) into _valueHelpers.ts
  so Form 144 and OwnershipDocument share the same null-on-empty/whitespace
  semantics. Form 144's previous local num() treated "" as null, but a
  whitespace-only value reached Number("   "), which returns 0 and
  fabricated a 0 in the DB.
- Bump Form_144.storage extractor_version 1.0.0 -> 1.1.0 to trigger
  re-extraction. OwnershipDocument behaviour is unchanged so its version
  stays.
- In the Postgres cik_names bulk writer, dedup duplicate CIKs per slice
  (last value wins) before building the multi-row INSERT, so an in-batch
  duplicate cik no longer trips ON CONFLICT once-per-statement.
sroussey added a commit that referenced this pull request Jun 2, 2026
Plan H part 1 of 5 — split across multiple commits due to the
push-files tool size limit; logically one fix. See later commits for
Form_1_A tests, Form_1_K, Form_1_Z, and Form_C.

Adds the shared _valueHelpers module (numScalar/strScalar/numWrapped/
strWrapped) plus its unit tests. numScalar treats empty / whitespace-
only / NaN / Infinity input as null and rejects thousand-separator
strings; legitimate "0" round-trips so the regression guard holds.

NOTE: this is an inline copy of the helpers introduced by PR #118
under src/sec/forms/insider-trading/_valueHelpers.ts. When #118 merges
the duplicate should be removed in favour of a single shared module.

Also switches Form_1_A.schema.ts decimal-type aliases from
Type.Number() to Type.String() (4 aliases) so the storage layer can
make the null-vs-zero decision per cell with numScalar() instead of
Value.Convert silently producing 0 for empty text. Form_1_A.storage.ts
is updated to numScalar() every decimal field in processFinancialData
and the RegAOfferingHistory build, and the extractor_version is bumped
1.0.0 → 1.1.0 to force re-extraction.

https://claude.ai/code/session_01Wws8oZpB5imjKL2e7DRXtc
@sroussey sroussey requested a review from Copilot June 3, 2026 18:44
@sroussey sroussey self-assigned this Jun 3, 2026
Contributor

Copilot AI left a comment

Pull request overview

This PR delivers two data-integrity fixes: (1) prevent whitespace-only numeric values in insider-trading form XML from being coerced into fabricated 0 values during extraction, and (2) prevent Postgres bulk inserts for CIK-name imports from failing when duplicate CIKs appear within the same INSERT ... ON CONFLICT DO UPDATE statement.

Changes:

  • Introduces shared str*/num* value helpers that trim and map empty/whitespace-only inputs to null, then adopts them in Form 144 and Ownership Document storage code.
  • Bumps Form 144 extractor_version to 1.1.0 to force re-extraction and overwrite previously stored fabricated zeros.
  • Adds Postgres per-slice dedup for cikNameBulkWriter to avoid duplicate-conflict-key failures, plus new/updated unit tests.

Reviewed changes

Copilot reviewed 8 out of 8 changed files in this pull request and generated 1 comment.

| File | Description |
| --- | --- |
| src/sec/forms/insider-trading/_valueHelpers.ts | Adds shared scalar vs wrapped value-parsing helpers with whitespace-to-null handling. |
| src/sec/forms/insider-trading/_valueHelpers.test.ts | Adds unit tests pinning empty/whitespace and finite-number coercion semantics. |
| src/sec/forms/insider-trading/Form_144.storage.ts | Switches to shared helpers and bumps extractor version to force re-extract. |
| src/sec/forms/insider-trading/Form_144.storage.test.ts | Adds regression tests ensuring whitespace-only numeric fields persist as null. |
| src/sec/forms/insider-trading/OwnershipDocument.storage.ts | Switches to shared wrapped helpers (intended behavior unchanged). |
| src/sec/forms/insider-trading/OwnershipDocument.storage.test.ts | Adds regression test for whitespace-only wrapped numeric element (transactionShares). |
| src/storage/entity/cikNameBulkWriter.ts | Adds per-slice dedup in Postgres writer to prevent ON CONFLICT duplicate-key statement failures. |
| src/task/ciknames/FetchAllCikNamesTask.test.ts | Adds a duplicate-CIK batch test (currently routed through the repository fallback writer in this suite). |

Comment on lines +86 to +98
it("dedups duplicate CIKs within a single batch, last value wins", async () => {
  const writer = createCikNameBulkWriter();
  await writer.writeBatch([
    { cik: 1, name: "FIRST" },
    { cik: 2, name: "B" },
    { cik: 1, name: "LAST" },
  ]);
  await writer.close();
  const repo = globalServiceRegistry.get(CIK_NAME_REPOSITORY_TOKEN);
  expect((await repo.get({ cik: 1 }))?.name).toBe("LAST");
  const all = await repo.getAll();
  expect(all?.length).toBe(2);
});
sroussey added a commit that referenced this pull request Jun 3, 2026
Plan H part 1 of 5 — split across multiple commits due to the
push-files tool size limit; logically one fix. See later commits for
Form_1_A tests, Form_1_K, Form_1_Z, and Form_C.

Adds the shared _valueHelpers module (numScalar/strScalar/numWrapped/
strWrapped) plus its unit tests. numScalar treats empty / whitespace-
only / NaN / Infinity input as null and rejects thousand-separator
strings; legitimate "0" round-trips so the regression guard holds.

NOTE: this is an inline copy of the helpers introduced by PR #118
under src/sec/forms/insider-trading/_valueHelpers.ts. When #118 merges
the duplicate should be removed in favour of a single shared module.

Also switches Form_1_A.schema.ts decimal-type aliases from
Type.Number() to Type.String() (4 aliases) so the storage layer can
make the null-vs-zero decision per cell with numScalar() instead of
Value.Convert silently producing 0 for empty text. Form_1_A.storage.ts
is updated to numScalar() every decimal field in processFinancialData
and the RegAOfferingHistory build, and the extractor_version is bumped
1.0.0 → 1.1.0 to force re-extraction.

https://claude.ai/code/session_01Wws8oZpB5imjKL2e7DRXtc
sroussey added a commit that referenced this pull request Jun 4, 2026
…SP + download leak (sec) (#121)

* fix(cli): complete CSV escape hardening with unicode + bare-CR support

Layers two refinements on top of the line-by-line CSV defusing in #119:

1. Tighten LEADING_WS so it only strips space-like characters that
   spreadsheets silently ignore (ASCII space, NBSP, SHY, ZWSP, ZWNJ,
   ZWJ, LRM, RLM, BOM). \t and \r are themselves dangerous formula
   leads, so we no longer strip them away before the DANGEROUS_LEAD
   check — otherwise "\t=cmd" or "\r=cmd" would slip through as
   non-dangerous after the strip.

2. Split on /(\r\n|\r|\n)/ instead of /(\r?\n)/ so that a bare CR
   inside a multi-line cell is also a line boundary; the line after
   the CR is independently defused. Excel re-parses every physical
   line of a quoted cell, including lines separated by lone CR.

Tests cover the zero-width-prefix bypasses (ZWSP/ZWNJ/ZWJ/LRM/RLM/SHY/BOM
+ "=cmd"), mixed-WS bypasses ("ZWSP space =cmd"), bare-CR-followed-by-formula
("safe\r=cmd"), and a negative control to prove ZWSP-then-benign is left
alone. All 44 cases in TableRenderer.test.ts pass.
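
The two refinements above can be illustrated with a rough sketch. It is not the TableRenderer code. The character classes follow the lists in this commit message, while `defuseLine`, `defuseCell`, and the choice to defuse by prepending an apostrophe are assumptions made for the example. CSV quoting is left out.

```typescript
// Space-like characters that are stripped before the check (list from the commit
// message): space, NBSP, SHY, ZWSP, ZWNJ, ZWJ, LRM, RLM, BOM. Tab and CR are absent.
const LEADING_WS = /^[ \u00A0\u00AD\u200B\u200C\u200D\u200E\u200F\uFEFF]+/;

// Assumed set of dangerous leads: formula triggers plus tab and carriage return.
const DANGEROUS_LEAD = /^[=+\-@\t\r]/;

/** Defuse one physical line. The apostrophe prefix is an assumed strategy. */
function defuseLine(line: string): string {
  const stripped = line.replace(LEADING_WS, "");
  return DANGEROUS_LEAD.test(stripped) ? "'" + line : line;
}

/** Defuse every physical line of a cell, treating a bare CR as a line boundary. */
function defuseCell(value: string): string {
  // The capturing group keeps separators in the output at odd indices.
  return value
    .split(/(\r\n|\r|\n)/)
    .map((part, index) => (index % 2 === 1 ? part : defuseLine(part)))
    .join("");
}

console.log(JSON.stringify(defuseCell("safe\r=cmd")));     // "safe\r'=cmd"
console.log(JSON.stringify(defuseCell("\t=cmd")));         // "'\t=cmd"
console.log(JSON.stringify(defuseCell("\u200Bharmless"))); // unchanged
```

Because tab is not in `LEADING_WS`, the `"\t=cmd"` case is caught by the dangerous-lead check itself instead of being stripped away first.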

https://claude.ai/code/session_01Wws8oZpB5imjKL2e7DRXtc

* fix(storage,task): close observation TOCTOU + bootstrap zip leak

Three related correctness fixes:

1. PersonObservationRepo / CompanyObservationRepo upsertByNaturalKey
   had a TOCTOU window — two concurrent upserts on an empty in-memory
   store would both observe `size()` as 0 and assign observation_id=1
   to two different natural keys. Introduce a tiny process-wide
   AsyncMutex in src/util/AsyncMutex.ts that serializes the INSERT
   branch only (the existing-row update branch is keyed by
   observation_id and remains lock-free). Inside the critical section
   we re-query the natural key before allocating an id so that a
   parallel upsert of the same natural key still collapses to one row
   and one id.

2. BootstrapDownloadTask staged the multi-GB zip into
   SEC_RAW_DATA_FOLDER and only removed it on the success branch. If
   Bun.spawn threw, or unzip exited non-zero, the archive leaked until
   the next bootstrap run, silently consuming disk on operator VMs.
   Wrap the spawn+exit-code block in try/finally and remove the zip
   (with `force: true`) on every exit path.

3. AsyncMutex itself: trivial queue-based serializer. Rejections in
   one critical section don't poison the queue because the chained
   tracker swallows both branches; each caller still observes its own
   rejection through the returned promise.
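
A queue-based serializer with the properties listed in item 3 can be sketched in a few lines. This is an assumed shape, not the contents of src/util/AsyncMutex.ts: the class name `AsyncMutexSketch` and the `run` method are hypothetical.

```typescript
// Hypothetical stand-in for AsyncMutex; the real API may differ.
class AsyncMutexSketch {
  // Settles when the most recently queued critical section settles.
  private tail: Promise<void> = Promise.resolve();

  /** Run fn after every previously queued critical section has settled. */
  run<T>(fn: () => Promise<T>): Promise<T> {
    const result = this.tail.then(fn);
    // The tracker swallows both outcomes, so a rejection cannot poison the queue.
    this.tail = result.then(
      () => undefined,
      () => undefined,
    );
    // The caller still sees its own resolution or rejection.
    return result;
  }
}

async function demoMutex(): Promise<string[]> {
  const mutex = new AsyncMutexSketch();
  const order: string[] = [];
  const first = mutex.run(async () => {
    await Promise.resolve(); // yield once inside the critical section
    order.push("first");
    throw new Error("boom");
  });
  const second = mutex.run(async () => {
    order.push("second");
  });
  await first.catch(() => order.push("first rejected"));
  await second;
  return order;
}
```

In `demoMutex`, the second section only starts after the first has settled, and the failure of the first is observed by its own caller without blocking the second.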

Tests:
- PersonObservationRepo.test.ts / CompanyObservationRepo.test.ts each
  gain two new cases: concurrent INSERT-on-empty must yield distinct
  sequential ids, and concurrent UPDATE-existing + parallel INSERT
  must keep the original id stable and give the insert the next id.
- BootstrapDownloadTask.test.ts gains an "execute zip cleanup"
  describe block that stubs fetch + Bun.spawn + Bun.which and asserts
  the zip is removed on (a) spawn-throws, (b) unzip-exits-nonzero,
  and (c) the success path.

All 24 affected tests pass.

https://claude.ai/code/session_01Wws8oZpB5imjKL2e7DRXtc

* fix(forms): stop fabricating 0 for empty exempt-offering numerics (1/5)

Plan H part 1 of 5 — split across multiple commits due to the
push-files tool size limit; logically one fix. See later commits for
Form_1_A tests, Form_1_K, Form_1_Z, and Form_C.

Adds the shared _valueHelpers module (numScalar/strScalar/numWrapped/
strWrapped) plus its unit tests. numScalar treats empty / whitespace-
only / NaN / Infinity input as null and rejects thousand-separator
strings; legitimate "0" round-trips so the regression guard holds.

NOTE: this is an inline copy of the helpers introduced by PR #118
under src/sec/forms/insider-trading/_valueHelpers.ts. When #118 merges
the duplicate should be removed in favour of a single shared module.

Also switches Form_1_A.schema.ts decimal-type aliases from
Type.Number() to Type.String() (4 aliases) so the storage layer can
make the null-vs-zero decision per cell with numScalar() instead of
Value.Convert silently producing 0 for empty text. Form_1_A.storage.ts
is updated to numScalar() every decimal field in processFinancialData
and the RegAOfferingHistory build, and the extractor_version is bumped
1.0.0 → 1.1.0 to force re-extraction.

https://claude.ai/code/session_01Wws8oZpB5imjKL2e7DRXtc

* fix(forms): stop fabricating 0 for empty exempt-offering numerics (2/5)

Plan H part 2 of 5 — Form_1_A test updates.

Form_1_A.test.ts: decimal-typed leaves now arrive as raw strings from
the parser; assertions updated accordingly (pricePerSecurity, audit/
legal/etc. fees). New "validate financial data types and ranges" split
into decimal-string vs integer-number checks.

Form_1_A.storage.test.ts: three new regression tests pin the null /
empty / "0" round-trip behaviour through processForm1A.

https://claude.ai/code/session_01Wws8oZpB5imjKL2e7DRXtc

* fix(forms): stop fabricating 0 for empty exempt-offering numerics (3/5)

Plan H part 3 of 5 — Form_1_K.

Schema: 2 DECIMAL_TYPE aliases switch from Type.Number() to
Type.String().

Storage (processOfferingHistory): price_per_security,
aggregate_offering_price, aggregate_offering_price_holders, and
estimated_net_amount now flow through numScalar(). extractor_version
bumped 1.0.0 → 1.1.0 with the standard re-extract comment.

Tests: parsing assertions updated for the new string contract; new
storage regression tests for null/empty/"0" round-trip.

https://claude.ai/code/session_01Wws8oZpB5imjKL2e7DRXtc

* fix(forms): stop fabricating 0 for empty exempt-offering numerics (4/5)

Plan H part 4 of 5 — Form_1_Z.

Schema: 2 DECIMAL_TYPE aliases switch from Type.Number() to
Type.String().

Storage: processOfferingSummaries now flows price_per_security,
portionSecuritiesSoldIssuer/Securityholders, and issuerNetProceeds
through numScalar(). processCertificationSuspension's
approxRecordHolders also goes through numScalar() now (with a
null-check instead of an undefined-check so the row is dropped
when the input is empty/whitespace). extractor_version 1.0.0 →
1.1.0.

Tests: parsing assertions updated for string contract; new storage
regression tests for null/empty/"0" round-trip.

https://claude.ai/code/session_01Wws8oZpB5imjKL2e7DRXtc

* fix(forms): stop fabricating 0 for empty exempt-offering numerics (5/5)

Plan H part 5 of 5 — Form_C and final wrap-up.

Schema: 4 DECIMAL_TYPE aliases switch from Type.Number() to
Type.String() — including the previously minimum-0 DECIMAL_TYPE7_2_
NONNEGATIVE. The minimum-0 invariant is no longer expressible at the
schema level; that constraint becomes the extractor's responsibility
(see Form_C.storage.ts numScalar() use).

Storage:
- processOfferingInfo now flows price, offering_amount, and
  maximum_offering_amount through numScalar(); priceDeterminationMethod
  remains a passthrough.
- processAnnualReportDisclosures now flows every disclosure field
  through numScalar() and drops the row when the result is null,
  instead of writing a fabricated 0 disclosure_value.
- extractor_version 1.0.0 → 1.1.0.

Tests: parsing assertions updated for the new string contract on
price/offeringAmount/maximumOfferingAmount/currentEmployees/
totalAssetMostRecentFiscalYear/revenueMostRecentFiscalYear. New
storage regression tests cover null/empty/"0" round-trip for both
offering and disclosure numerics.

Whole-suite verification (bun test) passes 845/845 with the helpers,
schemas, storage, and tests all in place.

https://claude.ai/code/session_01Wws8oZpB5imjKL2e7DRXtc

* fix(cli): broaden CSV LEADING_WS to cover all spreadsheet-stripped invisibles

The LEADING_WS character class used to strip a handful of zero-width
characters before the formula-injection check, but missed common
spreadsheet-stripped invisibles (U+00AD SHY, U+034F CGJ, U+061C ALM,
U+115F/U+1160 Hangul fillers, U+1680 Ogham space, U+180E MVS, the bidi
formatting block U+202A..U+202E, U+205F MMSP, the invisible operators
U+2061..U+2064 / U+206A..U+206F, U+3164 Hangul filler, U+FFA0 halfwidth
Hangul filler). Each of those can prefix a `=`/`+`/`-`/`@` cell that
Excel/Sheets/Numbers strip before formula parsing.

The previous source also kept raw invisible characters inline, which
editors silently normalise. The regex is now built from `\uXXXX` escape
sequences so the codepoint set is reviewable.

The pre-existing test "defuses leading U+00A0 NBSP before '='" passed
ASCII SPACE to escapeCsvValue() — vacuous because ASCII space was
already stripped. The test now uses the U+00A0 escape and we add a
table-driven regression for every codepoint LEADING_WS covers, plus
NBSP-prefixed variants of every DANGEROUS_LEAD codepoint.

https://claude.ai/code/session_011KMd9sERp2rguyekAi8a3u

* fix(task): clean up partial bulk-download on stream abort or size mismatch

streamDownloadToFile previously left a half-written archive on disk if
the fetch body errored mid-stream. The next bootstrap run would then
hand that partial multi-GB ZIP straight to unzip and either silently
produce a corrupt extract or fail in a way that masked the real cause.

Three fixes:
- await writer.write(value) so back-pressure is honoured. The previous
  code fired-and-forgot, which let bytes pile up in the writer queue.
- On any error path (network abort, writer failure, size mismatch),
  rmSync(destPath, { force: true }) before rethrowing. Best-effort —
  the original error is the one callers see.
- Validate `bytes === content-length` when the server advertised one.
  A short response is now a hard failure instead of a truncated archive.

The reader lock is released before the writer is closed so any pending
flush doesn't deadlock against a still-locked stream.

Tests cover the mid-stream abort and content-length mismatch paths,
asserting both the throw AND that the destination file is gone.

https://claude.ai/code/session_011KMd9sERp2rguyekAi8a3u

* fix(observation): delegate observation_id assignment to the storage backend

The old INSERT path computed `(await repo.size()) + 1` for the new row's
observation_id, guarded by a process-wide AsyncMutex to close the
size()/put() TOCTOU window. That was wrong in two ways:

  1. The mutex was only effective in a single process. Two `sec`
     processes — or two workers — sharing the same SQLite/Postgres
     backend would still collide.
  2. `repo.size()` reads the live row count, so a delete elsewhere in
     the table would silently reassign an id to a new natural key.

Replace the whole dance with `x-auto-generated: true` on the
observation_id schema field. The storage backend assigns the id (SQLite
INTEGER PRIMARY KEY AUTOINCREMENT, Postgres SERIAL, in-memory counter)
and returns the persisted row. `put()`'s return value gives us back the
assigned id without a second query. The repo-side AsyncMutex import is
dropped from both observation repos.

Tests previously asserted specific id values (`expect(...).toBe(1)`).
Auto-generated keys do not contract for "first id is 1" across backends
(Postgres SERIAL can skip on rollback), so the assertions are now
`toBeGreaterThan(0)` + `.not.toBe(other)`. EntityObserver tests get the
same treatment.

A smoke-test against InMemoryTabularStorage confirmed the backend does
return the auto-generated id from put() — the natural-key PK fallback
documented in the plan was not needed.

https://claude.ai/code/session_011KMd9sERp2rguyekAi8a3u

* fix(resolver): serialize person/company find-or-create per natural key

Two parallel `resolver.resolve(obs)` calls that mapped to the same
(resolver_version, key_kind, key_value) used to race: both observed no
canonical row, both minted a fresh UUID via randomUUID(), both inserted,
yielding two distinct canonical ids for what should have been the same
entity. Every downstream identity-link that pointed at the duplicate
canonical was then silently wrong.

PersonResolver and CompanyResolver now hold a process-wide
Map<string, { mutex: AsyncMutex; refs: number }> keyed on the
resolver_version-plus-natural-key tuple. The find-or-create critical
section runs inside `mutex.lock()` so a queued caller observes the
canonical row the previous caller just inserted, and returns the same
id. The refcount is decremented in a finally block and the entry is
deleted from the map when it hits zero, so the map stays bounded for
processes that resolve millions of distinct keys.

Multi-process callers (separate `sec` invocations, workers sharing a
backend) still need a backend-level UNIQUE constraint to be race-free —
a single-process mutex isn't visible across processes. That is noted
in the class JSDoc.
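
The refcounted per-key lock described above might look roughly like this. It is a single-process sketch under stated assumptions: `ChainMutex`, `withKeyLock`, `resolveSketch`, and the in-memory `canonical` map are hypothetical stand-ins for AsyncMutex, the resolver internals, and the canonical-row store, and a counter replaces randomUUID() to keep the example import-free.

```typescript
// Minimal promise-chain mutex; a stand-in for the repository's AsyncMutex.
class ChainMutex {
  private tail: Promise<void> = Promise.resolve();
  run<T>(fn: () => Promise<T>): Promise<T> {
    const result = this.tail.then(fn);
    this.tail = result.then(() => undefined, () => undefined);
    return result;
  }
}

type LockEntry = { mutex: ChainMutex; refs: number };
const locks = new Map<string, LockEntry>();

/** Serialize fn per key and drop the map entry once no caller holds it. */
async function withKeyLock<T>(key: string, fn: () => Promise<T>): Promise<T> {
  const entry = locks.get(key) ?? { mutex: new ChainMutex(), refs: 0 };
  locks.set(key, entry);
  entry.refs += 1; // taken synchronously, before any await
  try {
    return await entry.mutex.run(fn);
  } finally {
    entry.refs -= 1;
    if (entry.refs === 0) locks.delete(key); // keeps the map bounded
  }
}

// Hypothetical canonical store and find-or-create built on the lock.
const canonical = new Map<string, string>();
let nextCanonical = 0;

async function resolveSketch(key: string): Promise<string> {
  return withKeyLock(key, async () => {
    const existing = canonical.get(key);
    if (existing !== undefined) return existing;
    await Promise.resolve(); // simulated I/O gap where the race used to occur
    const id = `canonical-${++nextCanonical}`;
    canonical.set(key, id);
    return id;
  });
}
```

Two parallel `resolveSketch("v1|cik|123")` calls return the same id because the second caller runs its lookup only after the first has inserted. Calls on different keys use different mutexes and do not wait on each other.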

New PersonResolver.race.test.ts and CompanyResolver.race.test.ts cover:
- two parallel resolves on the same CIK collapsing to one canonical
- ditto for the name-key path (Person) and CRD + name paths (Company)
- 20-25 fan-out stress for queue-depth coverage
- distinct-key parallel resolves do NOT block each other

https://claude.ai/code/session_011KMd9sERp2rguyekAi8a3u

* fix(pr-review): address 5 Copilot review findings on PR #121

* AsyncMutex: rewrite the file docblock so it documents the current
  per-key resolver find-or-create use-case instead of the retired
  observation-repo TOCTOU window (observation_id is now auto-generated).
* BootstrapDownloadTask.streamDownloadToFile: close the writer before
  unlinking so Windows can actually delete the partial file, and
  surface writer.end() failures as the operation error on the success
  path instead of silently returning success over a half-flushed file.
* BootstrapDownloadTask.test: snapshot+restore the previous
  SEC_RAW_DATA_FOLDER binding around setupRawDataFolder() so the
  globalServiceRegistry no longer leaks a stale tmpdir into later test
  files.
* Form_1_Z.test: replace the vacuous `expect(cik).toBe(cik)` self-
  comparison with assertions that the value is a 1-10 digit string,
  matching the CIK_TYPE schema.
* Form_C.storage: restore the schema-level minimum:0 invariant on
  `currentEmployees` that was lost when the field moved to string-typed
  decimals + numScalar(). Negative values now drop to null instead of
  persisting as a negative disclosure_value. Adds a regression test.

* fix(cli): strip U+2028/U+2029 before formula-lead check

JS `\n` does not match U+2028 LINE SEPARATOR or U+2029 PARAGRAPH
SEPARATOR, so escapeCsvValue never split them out as line breaks.
Some spreadsheet importers silently treat them as leading whitespace
before formula parsing, leaving a "<LS>=cmd" payload unguarded.
Add both codepoints to LEADING_WS and pin them with regression tests.

https://claude.ai/code/session_011KMd9sERp2rguyekAi8a3u

---------

Co-authored-by: Claude <noreply@anthropic.com>