T021 P1 stabilization: CI layering, Redis, /scrape contract, health, logging lifecycle#921

Open
lsh-915 wants to merge 33 commits into NanmiCoder:main from lsh-915:t021-p1-stabilization

@lsh-915 lsh-915 commented Jun 23, 2026


Summary

This PR contains the T021 P1 stabilization work, separated from PR #918 to keep the Content Asset release scope frozen.

It includes five focused stabilization commits:

  1. CI/test baseline layering
  2. Redis configuration alignment
  3. Legacy /crawler API contract cleanup
  4. Health field contract alignment
  5. Logging / task file lifecycle cleanup

PR #918 remains frozen and should only receive review feedback fixes.

Completed

1. CI / test baseline governance

  • Split tests into core / legacy / external layers.
  • Kept core-tests as the blocking gate.
  • Added explicit markers:
    • core
    • legacy
    • external
    • redis
    • mongo
    • proxy
    • playwright
    • known_fail
  • Added Makefile commands:
    • test-core
    • test-baseline
    • test-all
    • test-known-failures
    • test-external
  • Documented the 165-test baseline in docs/TESTING_BASELINE.md.
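A minimal sketch of how the markers listed above could be registered in a conftest.py so that strict marker checking accepts them. The marker names come from this PR; the one-line descriptions and the choice of a pytest_configure hook (rather than an ini file) are assumptions for illustration, not the actual project layout.

```python
# conftest.py (sketch): register the layer and dependency markers.
# Descriptions are illustrative assumptions, not copied from the repo.
MARKERS = {
    "core": "blocking gate; no external services required",
    "legacy": "tests for older code paths, outside the blocking gate",
    "external": "needs a service or tool outside the test process",
    "redis": "needs a reachable Redis instance",
    "mongo": "needs a reachable MongoDB instance",
    "proxy": "needs a working proxy provider",
    "playwright": "needs a browser driven by Playwright",
    "known_fail": "tracked failure, excluded from the gate",
}


def pytest_configure(config):
    """Register every marker so `--strict-markers` does not reject it."""
    for name, description in MARKERS.items():
        config.addinivalue_line("markers", f"{name}: {description}")
```

With markers registered this way, `pytest -m core` selects only the gate layer, and an expression such as `pytest -m "not external and not known_fail"` excludes the non-blocking layers. How the Makefile targets map onto marker expressions is not shown in this description.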

2. Redis configuration alignment

  • Removed hardcoded default Redis password behavior.
  • Default local Redis config is now no-password:
    • REDIS_HOST=localhost
    • REDIS_PORT=6379
    • REDIS_PASSWORD=
    • REDIS_DB=0
  • Compose API uses Redis service name redis.
  • Compose Redis does not publish 6379 publicly by default.
  • Empty password is normalized so the client does not send AUTH.
  • Redis tests remain external and do not pollute the core gate.
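The empty-password normalization can be sketched as below. The environment variable names and defaults are the ones listed above; the helper name redis_connection_kwargs is hypothetical, and the real change may structure this differently.

```python
import os
from typing import Any, Dict, Mapping, Optional


def redis_connection_kwargs(env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Build Redis client keyword arguments from environment-style settings.

    An empty or whitespace-only REDIS_PASSWORD becomes None, so a client
    such as redis-py has no password to send and skips AUTH.
    """
    env = os.environ if env is None else env
    password = (env.get("REDIS_PASSWORD") or "").strip()
    return {
        "host": env.get("REDIS_HOST", "localhost"),
        "port": int(env.get("REDIS_PORT", "6379")),
        "db": int(env.get("REDIS_DB", "0")),
        "password": password or None,  # "" -> None: no AUTH command
    }
```

Under Compose the same helper would see REDIS_HOST=redis, so only the environment differs between local and container runs.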

3. Legacy /crawler API contract cleanup

  • /crawler remains deprecated and unmounted.
  • /scrape is the only official scrape API.
  • Existing legacy limit tests were migrated to the /scrape contract.
  • Old /crawler path is explicitly tested as unavailable instead of being silently revived.
  • The 8 legacy /crawler known failures were resolved without reintroducing duplicate API routes.
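One framework-agnostic way to express the "unavailable instead of silently revived" check is to inspect the list of mounted paths. This is only a sketch: the function name route_contract_violations is hypothetical, and how the application exposes its route table is an assumption that depends on the web framework in use.

```python
from typing import Iterable, List

OFFICIAL_PREFIX = "/scrape"   # the only official scrape API
RETIRED_PREFIX = "/crawler"   # deprecated and must stay unmounted


def _under(path: str, prefix: str) -> bool:
    """True if path is the prefix itself or a sub-path of it."""
    return path == prefix or path.startswith(prefix + "/")


def route_contract_violations(mounted_paths: Iterable[str]) -> List[str]:
    """Return human-readable problems; an empty list means the contract holds."""
    paths = list(mounted_paths)
    problems = []
    if not any(_under(p, OFFICIAL_PREFIX) for p in paths):
        problems.append("/scrape is not mounted")
    for p in paths:
        if _under(p, RETIRED_PREFIX):
            problems.append(f"retired route is mounted: {p}")
    return problems
```

A test in the core layer could then assert that the returned list is empty for the application's actual routes, alongside a request-level check that the old path does not respond.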

4. Health field contract alignment

  • Frontend and backend now consistently use checks.disk.free_gb.
  • Deprecated available_gb is no longer used by the frontend.
  • API tests assert:
    • free_gb is numeric
    • available_gb is absent
    • /health remains public
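The disk part of the health contract can be sketched as follows. Only the field path checks.disk.free_gb and the absence of available_gb come from this PR; the helper names, the use of shutil.disk_usage, and the GiB (1024**3) conversion are assumptions.

```python
import shutil
from numbers import Real


def disk_check(path: str = ".") -> dict:
    """Produce the disk section of a health payload (assumed shape)."""
    usage = shutil.disk_usage(path)
    # Assumption: "GB" is computed as bytes / 1024**3.
    return {"free_gb": round(usage.free / 1024**3, 2)}


def assert_disk_contract(health: dict) -> None:
    """Assert the contract: numeric free_gb, no deprecated available_gb."""
    disk = health["checks"]["disk"]
    free = disk["free_gb"]
    # bool is a subclass of int, so exclude it explicitly.
    assert isinstance(free, Real) and not isinstance(free, bool)
    assert "available_gb" not in disk
```

Checking for the absence of available_gb, not just the presence of free_gb, is what keeps the deprecated field from creeping back in.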

5. Logging and task file lifecycle cleanup

  • Fixed pytest shutdown noise:
ValueError: I/O operation on closed file
--- Logging error ---
  • Added idempotent TaskManager.shutdown().
  • Ensured task workers are tracked and shut down cleanly.
  • Ensured atexit registration is global and not duplicated per TaskManager instance.
  • Added log handler cleanup utilities.
  • Released execution_log.jsonl handlers before task deletion.
  • Fixed Windows DELETE failure caused by locked execution_log.jsonl.
  • Added tests for log handler release and delete-after-failed-task behavior.
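The lifecycle items above can be sketched roughly as below: a shutdown that is safe to call more than once, a single module-level atexit hook instead of one per instance, and a helper that closes file handlers before the log file is deleted. TaskManager.shutdown() is named in this PR; start_worker, release_file_handlers, and every internal detail are hypothetical and stand in for the real implementation.

```python
import atexit
import logging
import os
import threading
import weakref

_managers = weakref.WeakSet()  # live managers, tracked without keeping them alive


def _shutdown_all():
    for manager in list(_managers):
        manager.shutdown()


atexit.register(_shutdown_all)  # registered once at import, not per instance


class TaskManager:
    def __init__(self):
        self._lock = threading.Lock()
        self._closed = False
        self._workers = []
        _managers.add(self)

    def start_worker(self, target):
        """Start a worker thread and track it for shutdown."""
        worker = threading.Thread(target=target, daemon=True)
        with self._lock:
            if self._closed:
                raise RuntimeError("TaskManager is shut down")
            self._workers.append(worker)
        worker.start()
        return worker

    def shutdown(self, timeout=5.0):
        """Idempotent: the second and later calls return immediately."""
        with self._lock:
            if self._closed:
                return
            self._closed = True
            workers, self._workers = self._workers, []
        for worker in workers:
            worker.join(timeout)


def release_file_handlers(logger, log_path):
    """Detach and close FileHandlers writing to log_path; return the count."""
    target = os.path.normcase(os.path.abspath(log_path))
    released = 0
    for handler in list(logger.handlers):
        if (isinstance(handler, logging.FileHandler)
                and os.path.normcase(handler.baseFilename) == target):
            logger.removeHandler(handler)
            handler.close()  # closes the stream, releasing the Windows file lock
            released += 1
    return released
```

Closing the handler before removing the file matters mainly on Windows, where an open handle blocks deletion; on POSIX systems the delete would succeed either way, so the regression test is most meaningful when run on Windows.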

Validation

Local validation:

pytest api/tests.py -q
38 passed

pytest douyin_scraper/tests api/tests.py -q
91 passed

pytest -q
53 passed

git diff --check
PASS

Manual acceptance result:

Internal use: conditionally allowed
Public deployment: not allowed

Manual notes:

  • API and Web can start locally.
  • API Key auth works.
  • CORS defaults are restricted.
  • Health panel contract is aligned on free_gb.
  • DELETE / cleanup safety is improved.
  • Full Douyin E2E still requires Chrome CDP on port 19222.
  • Redis Docker E2E still requires Docker Desktop / Redis runtime.

Remaining known issues

Not fixed in this PR:

  • Store Factory known-fail: 1
  • Proxy external known-fail: 3
  • Full CDP E2E validation requires Chrome remote debugging on port 19222
  • Redis Docker E2E requires Docker Desktop / Redis runtime
  • Error field sanitization is still P2
  • output_dir nested workspace path is still P2
  • Public deployment still requires HTTPS, reverse proxy hardening, rate limiting, key rotation, and production secret management

Port contract

This PR keeps the existing project port contract:

API container port: 8000
API host port: 18080
Web dev port: 15173
Chrome/CDP port: 19222

Do not default to:

host API 8000
Web 3000
CDP 9222

Commit range

74ecc288 test(ci): expose full regression baseline
1562b4a9 fix(redis): align default redis configuration
01ece7dc chore(api): reconcile legacy crawler route contract
544afd11 fix(web): align health disk capacity fields
67527463 fix(logging): avoid closed stream errors during test shutdown

lsh-915 and others added 30 commits June 16, 2026 10:42
Co-authored-by: Cursor <cursoragent@cursor.com>
…ut-utf8
@lsh-915 lsh-915 requested a review from NanmiCoder as a code owner June 23, 2026 03:34
@dosubot dosubot Bot added the size:XXL label (this PR changes 1000+ lines, ignoring generated files) Jun 23, 2026
lsh-915 commented Jun 23, 2026


T021 review note:

This PR is a stacked follow-up to PR #918. The full PR diff against main may include the PR #918 Content Asset / release / P0 security scope because #918 has not been merged yet.

Please review the T021-specific delta only:

git log bdb2c802..HEAD
git diff bdb2c802..HEAD

T021 delta commits:

74ecc288 test(ci): expose full regression baseline
1562b4a9 fix(redis): align default redis configuration
01ece7dc chore(api): reconcile legacy crawler route contract
544afd11 fix(web): align health disk capacity fields
67527463 fix(logging): avoid closed stream errors during test shutdown

Local validation:

api/tests.py: 38 passed
douyin_scraper/tests + api/tests.py: 91 passed
pytest -q: 53 passed
closed-stream logging errors: resolved
Windows execution_log.jsonl delete lock: covered by new test

Checks are currently not reported on GitHub for this fork PR; maintainer workflow approval may be required.

Recommended merge order:

  1. Review / merge PR #918 (feat: Content Asset pipeline, release isolation, and API regression harness) first.
  2. Rebase or update this PR after #918 lands.
  3. Review / merge the T021 delta.

Out of scope for this PR:

  • Store Factory known-fail: 1
  • proxy external known-fail: 3
  • CDP E2E validation with Chrome remote debugging on 19222
  • Redis Docker E2E
  • error field sanitization
  • output_dir nested path cleanup
  • public deployment readiness
