
refactor: Restructure bot into modular agent with multi-provider LLM support #59

Open
nalbam wants to merge 57 commits into awskrug:main from nalbam:main

Conversation

@nalbam
Member

@nalbam nalbam commented Apr 18, 2026

Summary

  • Rewrote the single 754-line handler.py into an app.py + src/ module structure
  • Multi-provider support: OpenAI / xAI / Bedrock (Claude · Nova · Titan · Stability)
  • Introduced an agent loop built on native function calling (intent → plan → tools → response)
  • Added 125 unit/integration tests (86% coverage)

Changes

Architecture restructuring

  • src/agent.py — owns the agent loop: native function calling via tool_calls, duplicate-call suppression, forced compose when max_steps is reached
  • src/llm.py — LLMProvider Protocol with OpenAI / xAI / Bedrock implementations; includes a _CompositeProvider for when the text and image providers differ
  • src/tools.py — ToolRegistry whose @tool decorator declares dispatch and the LLM schema together
  • src/dedup.py — race-safe deduplication via DynamoDB conditional puts, plus per-thread conversation memory
  • src/slack_helpers.py — message splitting that preserves code fences, streaming message handling
  • src/config.py — lazily validated Settings; invalid enum/int values fall back to defaults
  • src/logging_utils.py — structured JSON logging with request_id context
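The @tool pattern described for src/tools.py — one declaration registering both dispatch and the LLM-facing schema — can be sketched roughly like this (names and signatures are illustrative, not the project's actual API):

```python
from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass
class ToolRegistry:
    """Maps tool names to a handler plus the JSON Schema sent to the LLM."""
    _tools: dict[str, tuple[Callable[..., Any], dict]] = field(default_factory=dict)

    def tool(self, description: str, parameters: dict):
        """Decorator: registers dispatch and schema in a single declaration."""
        def wrap(fn):
            schema = {
                "type": "function",
                "function": {
                    "name": fn.__name__,
                    "description": description,
                    "parameters": parameters,
                },
            }
            self._tools[fn.__name__] = (fn, schema)
            return fn
        return wrap

    def schemas(self) -> list[dict]:
        """Everything the LLM needs to emit tool_calls."""
        return [schema for _, schema in self._tools.values()]

    def dispatch(self, name: str, args: dict) -> Any:
        fn, _ = self._tools[name]
        return fn(**args)

registry = ToolRegistry()

@registry.tool(
    description="Return the current UTC time as ISO 8601.",
    parameters={"type": "object", "properties": {}},
)
def get_current_time() -> dict:
    from datetime import datetime, timezone
    return {"ok": True, "time": datetime.now(timezone.utc).isoformat()}
```

Because the schema is derived at registration time, the tool list handed to the provider and the dispatch table can never drift apart.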

Infrastructure / deployment

  • serverless.yml — Python 3.12, x86_64, DynamoDB (user-index GSI + TTL), Bedrock IAM
  • .github/workflows/push.yml — runs pytest --cov, then deploys with Serverless using the OIDC role (lambda-gurumi-bot)
  • .github/aws-role/ — OIDC trust/role policies covering through the Phase 2 KB work

Cleanup

  • Removed unused scripts/bedrock/, scripts/notion/, scripts/awsdocs/
  • Removed the Node-based package.json / package-lock.json (Serverless is installed globally)
  • Added localtest.py for local debugging (calls the agent from the CLI without Slack)

Test Plan

  • pip install -r requirements.txt -r requirements-dev.txt
  • python -m pytest --cov=src --cov-report=term-missing — 125 tests pass, 86% coverage
  • python localtest.py "안녕" ("hello") — verified streaming responses work
  • python localtest.py "강아지 그림 그려줘" ("draw me a dog") — verified the generate_image tool is called and the response is composed from its result
  • Deployed the dev stage and verified mention, DM, and thread conversations in a Slack workspace
  • Verified that resending the same message is not processed twice (DynamoDB dedup)

nalbam added 11 commits April 13, 2026 14:18
… tools

- Introduced `logging_utils.py` for structured JSON logging with request ID support.
- Created `slack_helpers.py` for message formatting, streaming messages, and user name caching.
- Implemented `tools.py` with the tool registry and six built-in tools:
  - `read_attached_images`
  - `read_attached_document`
  - `fetch_thread_history`
  - `search_web`
  - `generate_image`
  - `get_current_time`
- Each tool includes JSON Schema specifications and error handling.
…nalities

- Implement tests for JsonFormatter to ensure request_id and extra fields are included in logs.
- Create tests for MessageFormatter to validate message splitting and handling of code blocks.
- Add tests for UserNameCache to verify caching behavior and error handling.
- Introduce tests for channel_allowed and sanitize_error functions to ensure proper functionality.
- Develop comprehensive tests for ToolExecutor, including timeout handling and error capturing.
- Add tests for reading attached images and documents, ensuring proper handling of various file types and limits.
- Implement tests for web search and image generation tools to validate expected outputs.

@amazon-q-developer amazon-q-developer Bot left a comment


Review Summary

This PR restructures the lambda-gurumi-bot from a monolithic handler.py into a modular architecture with multi-provider LLM support. While the refactoring approach is sound, there are six issues (four critical, two architectural) that must be fixed before merge:

Critical Issues (Must Fix):

  1. Workflow References Deleted Infrastructure: The GitHub Actions workflow still attempts to update Bedrock Agent resources that were completely removed from serverless.yml
  2. Service Name Mismatch: Workflow references old stack/role names ("lambda-gurumi-ai-bot") while serverless.yml uses "lambda-gurumi-bot"
  3. DynamoDB Schema Breaking Change: Adding expire_at as a GSI RANGE key breaks existing user-only queries
  4. Invalid DynamoDB AttributeDefinition: expire_at is defined as an attribute but only used for TTL, not as a key

Architecture Concerns:

  1. Suboptimal Architecture Choice: Changed from arm64 to x86_64 citing build complexity, but serverless-python-requirements already supports arm64 with dockerizePip
  2. Deleted Serverless Plugin Reference: Workflow installs serverless-dotenv-plugin which is no longer configured in serverless.yml

Recommendations:

  • Fix the six issues identified above
  • Ensure all new src/ module files (app.py, src/agent.py, src/llm.py, etc.) are properly tested
  • Update deployment documentation to reflect the new architecture and environment variables
  • Verify the test coverage claim (86%) includes integration tests for the agent loop and tool calling

The refactoring direction is good, but deployment will fail without addressing these infrastructure and configuration mismatches.


You can now have the agent implement changes and create commits directly on your pull request's source branch. Simply comment with /q followed by your request in natural language to ask the agent to make changes.


⚠️ This PR contains more than 30 files. Amazon Q is better at reviewing smaller PRs, and may miss issues in larger changesets.

Comment threads: .github/workflows/push.yml (×4), serverless.yml (×4)
nalbam and others added 18 commits April 20, 2026 11:39
…_READER_BASE)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Move _PUBLIC_WEB_UA, _WEB_FETCH_TIMEOUT, _JINA_HEADER_MAX_LINES,
_JINA_LINK_RE, _HtmlTextExtractor, _NoRedirectHandler,
_validate_public_https_url, _filter_links, _extract_markdown_links,
_parse_jina_response, _read_body_capped, _jina_fetch, _raw_fetch from
src/tools.py into a new src/tools_web.py. Re-export all moved symbols
from src/tools for backward compatibility. Drop unused imports (ipaddress,
re, socket, HTMLParser) from src/tools.py. Update monkeypatch targets in
tests/test_tools.py for the moved helpers (src.tools_web.socket.getaddrinfo,
src.tools_web.urllib.request.urlopen/build_opener).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Register fetch_webpage in default_registry alongside other built-in
tools. Tries Jina Reader first (markdown output, link extraction);
falls back to direct raw GET + HTML text extraction on 5xx or oversize.
Adds import socket for socket.timeout handling. Includes 7 end-to-end
tests covering happy path, fallback, truncation, and link dedup.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Move 818-line src/llm.py into a 7-file src/llms/ package so each
responsibility lives in its own module. Public API names are preserved
via src/llms/__init__.py. All callers updated; 161 tests still pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
nalbam and others added 28 commits April 20, 2026 16:41
…ls package

- Created src/tools/__init__.py with public re-exports and side-effect imports
- Created src/tools/registry.py with ToolDef, ToolRegistry, tool, ToolContext, ToolExecutor, default_registry
- Created src/tools/slack.py with read_attached_images, read_attached_document, fetch_thread_history
- Created src/tools/search.py with search_web, _ddg_search, _tavily_search
- Created src/tools/web.py merging fetch_webpage from tools.py + all SSRF/HTML/fetch helpers from tools_web.py
- Created src/tools/image.py with generate_image
- Created src/tools/time.py with get_current_time
- Deleted src/tools.py and src/tools_web.py
- Updated tests/test_tools.py import paths and patch targets to new submodule locations

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
BedrockProvider.describe_image always built an Anthropic Messages body,
so vision requests failed with ValidationException when the configured
LLM_MODEL was a Nova ID (amazon.nova-pro-v1:0 and its us./eu./apac./
global. inference-profile variants). chat() already family-routes between
Claude and Nova, but the vision entrypoint did not — read_attached_images
blew up for every Nova-based deployment.

Split describe_image by _text_family: Claude keeps the Messages body path,
Nova goes through converse() with an {image: {format, source: {bytes}}}
content block. MIME is mapped to Nova's short format (image/jpeg → jpeg)
with a conservative png fallback for unknown types.

Also hardens _to_nova_messages tool content — dict/list payloads now go
through json.dumps instead of str() repr, so callers that hand us
structured content don't end up with un-parseable Python syntax in Nova's
text blocks.
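The family split described above — Claude keeps an Anthropic Messages body, Nova goes through converse-style content blocks, MIME mapped to Nova's short format with a conservative png fallback — could be sketched like this (the dict shapes follow the commit message; they are not verified SDK output):

```python
import base64

# Nova's converse() wants short format names, not full MIME types.
_NOVA_FORMATS = {"image/jpeg": "jpeg", "image/png": "png",
                 "image/gif": "gif", "image/webp": "webp"}

def build_vision_body(family: str, image_bytes: bytes, mime: str, prompt: str) -> dict:
    if family == "claude":
        # Anthropic Messages API shape: base64-encoded source block.
        return {
            "anthropic_version": "bedrock-2023-05-31",
            "messages": [{"role": "user", "content": [
                {"type": "image", "source": {
                    "type": "base64", "media_type": mime,
                    "data": base64.b64encode(image_bytes).decode()}},
                {"type": "text", "text": prompt},
            ]}],
        }
    # Nova shape: raw bytes plus short format, png fallback for unknown MIMEs.
    fmt = _NOVA_FORMATS.get(mime, "png")
    return {"messages": [{"role": "user", "content": [
        {"image": {"format": fmt, "source": {"bytes": image_bytes}}},
        {"text": prompt},
    ]}]}
```

Routing on a single `family` value keeps chat() and the vision entrypoint from diverging again, which is exactly the bug this commit fixes.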
Two related issues in the tool dispatch layer:

1) ToolExecutor created a ThreadPoolExecutor in __init__ and never shut
   it down. SlackMentionAgent builds a new executor per request, so in
   Lambda warm containers every invocation added non-daemon workers to
   the process registry that stayed until interpreter exit. Added
   ToolExecutor.close() (idempotent, wait=False) and a try/finally in
   SlackMentionAgent.run() that only closes executors the agent created
   itself — injected executors remain caller-owned.

2) execute() caught a stdlib allowlist (ValueError, TypeError, URLError,
   SlackApiError, Boto*). Provider SDKs raise their own hierarchies
   (openai.APIError, anthropic.APIError, httpx.HTTPError) that don't
   subclass those, so generate_image failures bubbled past the executor
   and aborted the whole agent loop with a generic "요청 처리 중 오류"
   ("error while processing the request") reply. The LLM never got a
   chance to see the tool_result failure and
   recover. Broadened to `except Exception` — the agent treats
   {"ok": False, ...} as recoverable, which is the entire point.
Fallback streaming path called chat_update and only logged on failure —
if the current ts became unusable (deleted, message-level rate limit,
permission change) the streamer would burn the rest of the request
re-trying the same broken ts until the buffer hit max_len. Added a
3-strike counter that rolls to a fresh chat_postMessage once crossed,
with preserve_buffer=True so deltas that never reached Slack are
re-sent against the new ts. Successful updates reset the counter.
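The 3-strike roll-over can be isolated into a small state machine (a sketch; `post_message` stands in for chat_postMessage and returns the fresh ts, mirroring the preserve_buffer behavior described above):

```python
class StreamRecovery:
    """Rolls to a fresh message after 3 consecutive failed chat_update calls."""

    def __init__(self, post_message, max_failures: int = 3):
        self._post = post_message   # called with the full buffer, returns a new ts
        self._max = max_failures
        self._failures = 0

    def on_update(self, ok: bool, buffer: str, ts: str) -> str:
        """Return the ts future updates should target."""
        if ok:
            self._failures = 0      # successful updates reset the counter
            return ts
        self._failures += 1
        if self._failures < self._max:
            return ts               # tolerate transient failures on the same ts
        self._failures = 0
        # ts is presumed dead: re-send unflushed deltas against a fresh message.
        return self._post(buffer)
```

Passing the whole buffer to the new message is what keeps deltas that never reached Slack from being lost when the roll-over happens.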

Widened sanitize_error redaction: sk-ant-* (Anthropic), xai-* (Grok),
tvly-* (Tavily), and AKIA/ASIA AWS access keys all carried through
error messages to end users unredacted. More-specific patterns run
before the generic sk-* rule so labels stay accurate.
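Ordering the more-specific key patterns ahead of the generic sk-* rule is what keeps the labels accurate; a sketch (the exact patterns and labels are illustrative, not the project's):

```python
import re

# Most-specific first: sk-ant-* must be labeled Anthropic, not generic.
_REDACTIONS = [
    (re.compile(r"sk-ant-[A-Za-z0-9_-]+"), "[REDACTED anthropic key]"),
    (re.compile(r"xai-[A-Za-z0-9_-]+"), "[REDACTED xai key]"),
    (re.compile(r"tvly-[A-Za-z0-9_-]+"), "[REDACTED tavily key]"),
    (re.compile(r"\b(?:AKIA|ASIA)[A-Z0-9]{16}\b"), "[REDACTED aws key]"),
    (re.compile(r"sk-[A-Za-z0-9_-]+"), "[REDACTED api key]"),
]

def sanitize_error(msg: str) -> str:
    """Strip known credential shapes before an error reaches end users."""
    for pattern, label in _REDACTIONS:
        msg = pattern.sub(label, msg)
    return msg
```

If the generic rule ran first it would consume the `sk-ant-` prefix and the Anthropic pattern would never match, so the list order is load-bearing.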
Old implementation re-serialized the full kept list on every pop(0),
giving quadratic behavior in message count. Now we serialize each
message once, then walk backwards from newest accumulating exact size
(including JSON array brackets and ", " item separators) until the
budget would be exceeded.

Added a regression test that pins output size to the real
len(json.dumps(kept)) across a range of budgets, so future rewrites
can't drift off the exact separator accounting.
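The exact separator accounting — 2 bytes for the array brackets plus 2 bytes for each ", " between items — can be sketched as (a simplified stand-in for the real trimmer; names are illustrative):

```python
import json

def trim_history(messages: list[dict], budget: int) -> list[dict]:
    """Keep the newest messages whose exact json.dumps size fits the budget.

    Each message is serialized once (no quadratic re-serialization); we walk
    backwards from the newest, accumulating len(serialized) plus bracket and
    separator overhead so the total equals len(json.dumps(kept)) exactly.
    """
    sizes = [len(json.dumps(m)) for m in messages]
    total = 2  # "[" and "]"
    kept = 0
    for size in reversed(sizes):
        cost = size + (2 if kept else 0)  # ", " before every item but the first
        if total + cost > budget:
            break
        total += cost
        kept += 1
    return messages[len(messages) - kept:]
```

Because the accounting matches json.dumps byte for byte, a regression test can pin the output to `len(json.dumps(kept)) <= budget` across arbitrary budgets, which is exactly the test the commit describes.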
Two flow fixes in _process:

- DMs (is_dm=True) no longer go through channel_allowed. DM channel IDs
  are D-prefixed and not normally in ALLOWED_CHANNEL_IDS, so setting the
  allowlist for a team used to lock every user out of the direct-message
  path the moment the var was configured. Slack's install permission
  already gates who can open the DM with the bot.

- Bare "@bot" with no prompt now returns before DedupStore.reserve. The
  previous ordering reserved a 1-hour TTL row for every empty ping and
  made Slack retries of the same no-op register as dedup.skip noise.
Nothing in src/ reads AWS_REGION_NAME — config.py pulls the region from
AWS_REGION, which the Lambda runtime auto-injects and cannot be
overridden via serverless's environment block. The alias was dead
config left over from an earlier attempt to work around the reserved
name; removed and replaced with a comment so the next contributor
doesn't re-introduce it.
Test suite grew from 161 to 189 after the audit follow-ups; per-module
coverage moved too (dedup 78→80, slack_helpers 84→86, bedrock 76→78,
tools/slack 86→87). Update both CLAUDE.md and README.md so onboarding
doesn't trust stale numbers.

Also extend CLAUDE.md's "Things that are easy to break" with the new
load-bearing invariants the fixes in this series rely on: Nova vision
must family-route, agent.run() must close its owned executor, tool
exception catch must stay broad enough to wrap provider SDK errors,
and channel allowlist must skip DMs.
Previously SYSTEM_MESSAGE overrode the entire default base prompt. An
operator who set it to a persona-style string (e.g. "한국어로 답하세요",
"answer in Korean") silently deleted the loop's planning / tool-use /
attachment-lookup
guidance — which made ambiguous attachment requests need 3 hops to
resolve and let GitHub-markdown and tool-failure patterns slip through.

Split the system prompt into three append-only layers:

- Task rules (code-owned, always present): plan work, parallel
  tool_calls, and tool-failure policy (ok:false → tell user briefly +
  suggest alternative, no blind retry, no fabrication).
- Operator policy: SYSTEM_MESSAGE is now appended, not substituted.
- Persona: new PERSONA_MESSAGE env carries answer style / tone.

Add three targeted guidances on top:

- Slack mrkdwn rules so the streaming fallback path doesn't surface
  raw `**bold**` or `[label](url)` to users (enable_native defaults off
  so Slack receives a plain `text` field).
- Speculative attachment lookup for ambiguous references ("이 사진" /
  "this photo", "아까 그 파일" / "that file from earlier") —
  read_attached_* returns [] cheaply when the event has no file,
  signaling the fetch_thread_history fallback.
- Parallel tool_calls hint to avoid hop waste when independent tools
  are needed together.

Wire PERSONA_MESSAGE through config, app, serverless, push.yml, and
.env.example. Update README env table and add regression tests asserting
task rules survive a set SYSTEM_MESSAGE and the three new guidances stay
in the prompt.
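The append-only layering could be assembled like this (the TASK_RULES text is a placeholder for the code-owned rules, and the function name is illustrative):

```python
# Code-owned layer: always present, never overridden by operator config.
TASK_RULES = (
    "Plan before acting. Issue independent tool_calls in parallel. "
    "On a tool result with ok:false, tell the user briefly and suggest "
    "an alternative; do not blindly retry or fabricate."
)

def build_system_prompt(system_message: str = "", persona_message: str = "") -> str:
    """Task rules always survive; operator policy and persona are appended."""
    layers = [TASK_RULES]
    if system_message.strip():
        layers.append(system_message.strip())   # operator policy (SYSTEM_MESSAGE)
    if persona_message.strip():
        layers.append(persona_message.strip())  # answer style / tone (PERSONA_MESSAGE)
    return "\n\n".join(layers)
```

Since the operator layers can only append, no SYSTEM_MESSAGE value can delete the planning and tool-failure guidance — the property the regression tests assert.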
Extend Lambda timeout from 90s to 300s and the generate_image tool
from 75s to 240s. gpt-image-2 routinely takes 60-180s and was
tripping the 75s tool cap, surfacing as "이미지 생성 중 시간 초과가
발생했습니다" ("a timeout occurred during image generation") to users.

Output flows via Slack Web API (chat_postMessage, files_upload_v2),
so the API Gateway 29s integration limit is independent of Lambda
runtime. Slack's retry burst from the 29s 504 is already absorbed
by DedupStore, so extending Lambda runtime is safe for end-user
delivery.
Long-running agent calls (notably generate_image with gpt-image-2)
could exceed the API Gateway 29s integration timeout, which caused
Slack to record a delivery failure and issue retries that raced
against the real reply.

The receiver path now fires a fire-and-forget lambda:Invoke
(InvocationType=Event) against this same function with _worker=True
and returns HTTP 200 within a few hundred ms. The worker path
picks up the payload in a separate Lambda invocation and runs
_process to completion inside the full 300s timeout. The existing
dedup key still absorbs Slack's retries on the receiver side and
Lambda async's built-in 2x retry on worker failure — all paths
converge on the same dedup:{msg_id} row.

Adds lambda:InvokeFunction on self to the runtime IAM role, a
WebClient-from-token rebuild inside the worker (Bolt's injected
client dies with the receiver process), and an inline-execution
fallback so localtest / pytest paths keep working without
AWS_LAMBDA_FUNCTION_NAME.
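A receiver that re-invokes itself asynchronously, with the inline fallback for local runs, could look roughly like this (`invoke` is injected so the sketch runs without AWS; the real path would use boto3's lambda client with InvocationType="Event" against its own function name):

```python
import os

def handle_event(event: dict, process, invoke=None) -> dict:
    """Receiver/worker split on a single Lambda function.

    process(payload) runs the slow agent work; invoke(payload) stands in
    for the fire-and-forget async self-invoke on the receiver path.
    """
    if event.get("_worker"):
        process(event)               # worker path: runs inside the full timeout
        return {"statusCode": 200}
    payload = dict(event, _worker=True)
    if invoke is not None and os.environ.get("AWS_LAMBDA_FUNCTION_NAME"):
        invoke(payload)              # fire-and-forget async self-invoke
    else:
        process(payload)             # inline fallback: localtest / pytest
    return {"statusCode": 200}       # ack Slack well under the 29s API GW limit
```

Both paths funnel into the same process() and therefore the same dedup row, which is what lets Slack's receiver-side retries and Lambda's async worker retries converge safely.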
