Add standalone tests for LLM behavior regression testing #102

## Summary
The CI workflow has a `standalone-tests` job that runs on PRs targeting `main` when LangGraph components are touched, but there are currently **zero tests** marked with `@pytest.mark.standalone`. This job needs actual tests to fulfill its purpose of verifying LLM behavior hasn't regressed.

## Context
- The standalone marker is defined in `pyproject.toml` and intended for tests that "run locally without backend (requires OPENROUTER_API_KEY_FOR_TESTING)"
- The CI job was recently fixed to only trigger on PRs targeting `main` to avoid unnecessary API costs
- There are `@pytest.mark.integration` tests in `test_integration_openrouter.py`, but none with `@pytest.mark.standalone`
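
Because the marker description says these tests require `OPENROUTER_API_KEY_FOR_TESTING`, one option is to skip them cleanly when the key is absent instead of letting them fail. The sketch below is only an illustration: the helper name `standalone_skip_reason` is hypothetical and does not exist in the repository.

```python
import os

# Environment variable named in the marker description in pyproject.toml.
API_KEY_ENV = "OPENROUTER_API_KEY_FOR_TESTING"


def standalone_skip_reason(env=None):
    """Return a skip message if the API key is missing, else None.

    `env` defaults to os.environ; passing a plain dict makes the helper
    easy to exercise without touching the real environment.
    """
    env = os.environ if env is None else env
    if not env.get(API_KEY_ENV, "").strip():
        return f"{API_KEY_ENV} is not set; skipping standalone tests"
    return None
```

A test module could combine this with pytest's built-in `pytest.mark.skipif(condition, reason=...)`, using `standalone_skip_reason() is not None` as the condition, alongside `@pytest.mark.standalone`.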

## Requirements
- [ ] Create standalone tests that verify core LLM annotation quality (e.g., known event descriptions produce valid HED annotations)
- [ ] Tests should use `OPENROUTER_API_KEY_FOR_TESTING` and a cheap model
- [ ] Cover key regression scenarios: no invalid tags generated, proper grouping, semantic faithfulness
- [ ] Consider golden-reference test cases (known input -> expected valid output patterns)
- [ ] Keep tests minimal to control API costs; focus on grounding behavior, not exhaustive coverage
- [ ] All tests should be marked with `@pytest.mark.standalone`
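
To make the golden-reference idea above concrete, here is a minimal sketch of pattern-based checks that a standalone test could apply to a model's output. Everything in it is an assumption for illustration: `GOLDEN_CASES`, `extract_tags`, `parentheses_balanced`, and `check_annotation` are hypothetical names, the example tags are illustrative only, and the checks are purely textual rather than real HED schema validation.

```python
import re

# Hypothetical golden cases: an event description plus loose expectations
# about the annotation, rather than one exact expected string.
GOLDEN_CASES = [
    {
        "description": "A red circle appears on the left of the screen",
        "required_tags": {"Red", "Circle"},
        "forbidden_tags": {"Blue"},
    },
]


def extract_tags(annotation: str) -> set[str]:
    """Split on commas and parentheses; keep the last path segment of each tag."""
    parts = re.split(r"[(),]", annotation)
    return {p.strip().split("/")[-1] for p in parts if p.strip()}


def parentheses_balanced(annotation: str) -> bool:
    """Return True if every '(' has a matching ')' in the right order."""
    depth = 0
    for ch in annotation:
        if ch == "(":
            depth += 1
        elif ch == ")":
            depth -= 1
            if depth < 0:
                return False
    return depth == 0


def check_annotation(annotation: str, case: dict) -> list[str]:
    """Return a list of problems; an empty list means the case passed."""
    problems = []
    if not parentheses_balanced(annotation):
        problems.append("unbalanced parentheses")
    tags = extract_tags(annotation)
    missing = case["required_tags"] - tags
    if missing:
        problems.append(f"missing required tags: {sorted(missing)}")
    unexpected = case["forbidden_tags"] & tags
    if unexpected:
        problems.append(f"forbidden tags present: {sorted(unexpected)}")
    return problems
```

In an actual test marked `@pytest.mark.standalone`, the annotation string would come from the LLM pipeline and the assertion would be `check_annotation(annotation, case) == []`. Checking patterns instead of exact strings keeps the tests tolerant of harmless variation in model output while still catching the regressions listed above.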

## Related
- PR #101 fixed the standalone CI trigger to only run on PRs to main
- The `standalone` marker is already registered in `pyproject.toml` (line 128)
