
Add standalone tests for LLM behavior regression testing #102

Description

@neuromechanist

Summary

The CI workflow has a standalone-tests job that runs on PRs targeting main when LangGraph components are touched, but there are currently zero tests marked with @pytest.mark.standalone. The job needs real tests before it can verify that LLM behavior has not regressed.

Context

  • The standalone marker is defined in pyproject.toml and intended for tests that "run locally without backend (requires OPENROUTER_API_KEY_FOR_TESTING)"
  • The CI job was recently fixed to trigger only on PRs targeting main, to avoid unnecessary API costs
  • There are @pytest.mark.integration tests in test_integration_openrouter.py, but none with @pytest.mark.standalone
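
The marker description implies these tests need the testing key but no running backend. A test module could apply the marker to all of its tests and skip cleanly when the key is absent. Below is a minimal sketch using module-level `pytestmark`; only the marker name and the environment variable come from this issue, the rest is an assumption about how the module might be set up.

```python
import os

import pytest

# Environment variable named in the pyproject.toml marker description.
API_KEY_VAR = "OPENROUTER_API_KEY_FOR_TESTING"

# Applied to every test in this module: `standalone` lets `pytest -m standalone`
# select these tests, and `skipif` skips them (spending no API credits) when
# no testing key is configured.
pytestmark = [
    pytest.mark.standalone,
    pytest.mark.skipif(
        not os.environ.get(API_KEY_VAR),
        reason=f"{API_KEY_VAR} is not set",
    ),
]
```

With this in place, a local run without the key reports the tests as skipped rather than failing on the first API call.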

Requirements

  • Create standalone tests that verify core LLM annotation quality (e.g., known event descriptions produce valid HED annotations)
  • Tests should use OPENROUTER_API_KEY_FOR_TESTING and a cheap model
  • Cover key regression scenarios: no invalid tags are generated, tags are grouped properly, and annotations stay semantically faithful to the description
  • Consider golden-reference test cases (known input -> expected valid output patterns)
  • Keep tests minimal to control API costs; focus on grounding behavior, not exhaustive coverage
  • All tests should be marked with @pytest.mark.standalone

Related
