refactor: Restructure bot into modular agent with multi-provider LLM support#59
Open
nalbam wants to merge 57 commits intoawskrug:mainfrom
Open
refactor: Restructure bot into modular agent with multi-provider LLM support#59nalbam wants to merge 57 commits intoawskrug:mainfrom
nalbam wants to merge 57 commits intoawskrug:mainfrom
Conversation
… tools - Introduced `logging_utils.py` for structured JSON logging with request ID support. - Created `slack_helpers.py` for message formatting, streaming messages, and user name caching. - Implemented `tools.py` for tool registry and six built-in tools: - `read_attached_images` - `read_attached_document` - `fetch_thread_history` - `search_web` - `generate_image` - `get_current_time` - Each tool includes JSON Schema specifications and error handling.
…nalities - Implement tests for JsonFormatter to ensure request_id and extra fields are included in logs. - Create tests for MessageFormatter to validate message splitting and handling of code blocks. - Add tests for UserNameCache to verify caching behavior and error handling. - Introduce tests for channel_allowed and sanitize_error functions to ensure proper functionality. - Develop comprehensive tests for ToolExecutor, including timeout handling and error capturing. - Add tests for reading attached images and documents, ensuring proper handling of various file types and limits. - Implement tests for web search and image generation tools to validate expected outputs.
There was a problem hiding this comment.
Review Summary
This PR restructures the lambda-gurumi-bot from a monolithic handler.py into a modular architecture with multi-provider LLM support. While the refactoring approach is sound, there are 6 critical issues that must be fixed before merge:
Critical Issues (Must Fix):
- Workflow References Deleted Infrastructure: The GitHub Actions workflow still attempts to update Bedrock Agent resources that were completely removed from serverless.yml
- Service Name Mismatch: Workflow references old stack/role names ("lambda-gurumi-ai-bot") while serverless.yml uses "lambda-gurumi-bot"
- DynamoDB Schema Breaking Change: Adding expire_at as a GSI RANGE key breaks existing user-only queries
- Invalid DynamoDB AttributeDefinition: expire_at is defined as an attribute but only used for TTL, not as a key
Architecture Concerns:
- Suboptimal Architecture Choice: Changed from arm64 to x86_64 citing build complexity, but serverless-python-requirements already supports arm64 with dockerizePip
- Deleted Serverless Plugin Reference: Workflow installs serverless-dotenv-plugin which is no longer configured in serverless.yml
Recommendations:
- Fix the 6 critical issues identified above
- Ensure all new src/ module files (app.py, src/agent.py, src/llm.py, etc.) are properly tested
- Update deployment documentation to reflect the new architecture and environment variables
- Verify the test coverage claim (86%) includes integration tests for the agent loop and tool calling
The refactoring direction is good, but deployment will fail without addressing these infrastructure and configuration mismatches.
You can now have the agent implement changes and create commits directly on your pull request's source branch. Simply comment with /q followed by your request in natural language to ask the agent to make changes.
…_READER_BASE) Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ost normalization)
Move _PUBLIC_WEB_UA, _WEB_FETCH_TIMEOUT, _JINA_HEADER_MAX_LINES, _JINA_LINK_RE, _HtmlTextExtractor, _NoRedirectHandler, _validate_public_https_url, _filter_links, _extract_markdown_links, _parse_jina_response, _read_body_capped, _jina_fetch, _raw_fetch from src/tools.py into a new src/tools_web.py. Re-export all moved symbols from src/tools for backward compatibility. Drop unused imports (ipaddress, re, socket, HTMLParser) from src/tools.py. Update monkeypatch targets in tests/test_tools.py for the moved helpers (src.tools_web.socket.getaddrinfo, src.tools_web.urllib.request.urlopen/build_opener). Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Register fetch_webpage in default_registry alongside other built-in tools. Tries Jina Reader first (markdown output, link extraction); falls back to direct raw GET + HTML text extraction on 5xx or oversize. Adds import socket for socket.timeout handling. Includes 7 end-to-end tests covering happy path, fallback, truncation, and link dedup. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…NA_READER_BASE scheme
Move 818-line src/llm.py into a 7-file src/llms/ package so each responsibility lives in its own module. Public API names are preserved via src/llms/__init__.py. All callers updated; 161 tests still pass. Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…ls package - Created src/tools/__init__.py with public re-exports and side-effect imports - Created src/tools/registry.py with ToolDef, ToolRegistry, tool, ToolContext, ToolExecutor, default_registry - Created src/tools/slack.py with read_attached_images, read_attached_document, fetch_thread_history - Created src/tools/search.py with search_web, _ddg_search, _tavily_search - Created src/tools/web.py merging fetch_webpage from tools.py + all SSRF/HTML/fetch helpers from tools_web.py - Created src/tools/image.py with generate_image - Created src/tools/time.py with get_current_time - Deleted src/tools.py and src/tools_web.py - Updated tests/test_tools.py import paths and patch targets to new submodule locations Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
…TIMEZONE through CI+serverless
…crets→vars, grouped, A-Z)
… fetching for user names
BedrockProvider.describe_image always built an Anthropic Messages body,
so vision requests failed with ValidationException when the configured
LLM_MODEL was a Nova ID (amazon.nova-pro-v1:0 and its us./eu./apac./
global. inference-profile variants). chat() already family-routes between
Claude and Nova, but the vision entrypoint did not — read_attached_images
blew up for every Nova-based deployment.
Split describe_image by _text_family: Claude keeps the Messages body path,
Nova goes through converse() with an {image: {format, source: {bytes}}}
content block. MIME is mapped to Nova's short format (image/jpeg → jpeg)
with a conservative png fallback for unknown types.
Also hardens _to_nova_messages tool content — dict/list payloads now go
through json.dumps instead of str() repr, so callers that hand us
structured content don't end up with un-parseable Python syntax in Nova's
text blocks.
Two related issues in the tool dispatch layer:
1) ToolExecutor created a ThreadPoolExecutor in __init__ and never shut
it down. SlackMentionAgent builds a new executor per request, so in
Lambda warm containers every invocation added non-daemon workers to
the process registry that stayed until interpreter exit. Added
ToolExecutor.close() (idempotent, wait=False) and a try/finally in
SlackMentionAgent.run() that only closes executors the agent created
itself — injected executors remain caller-owned.
2) execute() caught a stdlib allowlist (ValueError, TypeError, URLError,
SlackApiError, Boto*). Provider SDKs raise their own hierarchies
(openai.APIError, anthropic.APIError, httpx.HTTPError) that don't
subclass those, so generate_image failures bubbled past the executor
and aborted the whole agent loop with a generic "요청 처리 중 오류"
reply. The LLM never got a chance to see the tool_result failure and
recover. Broadened to `except Exception` — the agent treats
{"ok": False, ...} as recoverable, which is the entire point.
Fallback streaming path called chat_update and only logged on failure — if the current ts became unusable (deleted, message-level rate limit, permission change) the streamer would burn the rest of the request re-trying the same broken ts until the buffer hit max_len. Added a 3-strike counter that rolls to a fresh chat_postMessage once crossed, with preserve_buffer=True so deltas that never reached Slack are re-sent against the new ts. Successful updates reset the counter. Widened sanitize_error redaction: sk-ant-* (Anthropic), xai-* (Grok), tvly-* (Tavily), and AKIA/ASIA AWS access keys all carried through error messages to end users unredacted. More-specific patterns run before the generic sk-* rule so labels stay accurate.
Old implementation re-serialized the full kept list on every pop(0), giving quadratic behavior in message count. Now we serialize each message once, then walk backwards from newest accumulating exact size (including JSON array brackets and ", " item separators) until the budget would be exceeded. Added a regression test that pins output size to the real len(json.dumps(kept)) across a range of budgets, so future rewrites can't drift off the exact separator accounting.
Two flow fixes in _process: - DMs (is_dm=True) no longer go through channel_allowed. DM channel IDs are D-prefixed and not normally in ALLOWED_CHANNEL_IDS, so setting the allowlist for a team used to lock every user out of the direct-message path the moment the var was configured. Slack's install permission already gates who can open the DM with the bot. - Bare "@bot" with no prompt now returns before DedupStore.reserve. The previous ordering reserved a 1-hour TTL row for every empty ping and made Slack retries of the same no-op register as dedup.skip noise.
Nothing in src/ reads AWS_REGION_NAME — config.py pulls the region from AWS_REGION, which the Lambda runtime auto-injects and cannot be overridden via serverless's environment block. The alias was dead config left over from an earlier attempt to work around the reserved name; removed and replaced with a comment so the next contributor doesn't re-introduce it.
Test suite grew from 161 to 189 after the audit follow-ups; per-module coverage moved too (dedup 78→80, slack_helpers 84→86, bedrock 76→78, tools/slack 86→87). Update both CLAUDE.md and README.md so onboarding doesn't trust stale numbers. Also extend CLAUDE.md's "Things that are easy to break" with the new load-bearing invariants the fixes in this series rely on: Nova vision must family-route, agent.run() must close its owned executor, tool exception catch must stay broad enough to wrap provider SDK errors, and channel allowlist must skip DMs.
Previously SYSTEM_MESSAGE overrode the entire default base prompt. An
operator who set it to a persona-style string (e.g. "한국어로 답하세요")
silently deleted the loop's planning / tool-use / attachment-lookup
guidance — which made ambiguous attachment requests need 3 hops to
resolve and let GitHub-markdown and tool-failure patterns slip through.
Split the system prompt into three append-only layers:
- Task rules (code-owned, always present): plan work, parallel
tool_calls, and tool-failure policy (ok:false → tell user briefly +
suggest alternative, no blind retry, no fabrication).
- Operator policy: SYSTEM_MESSAGE is now appended, not substituted.
- Persona: new PERSONA_MESSAGE env carries answer style / tone.
Add three targeted guidances on top:
- Slack mrkdwn rules so the streaming fallback path doesn't surface
raw `**bold**` or `[label](url)` to users (enable_native defaults off
so Slack receives a plain `text` field).
- Speculative attachment lookup for ambiguous references ("이 사진",
"아까 그 파일") — read_attached_* returns [] cheaply when the event
has no file, signaling the fetch_thread_history fallback.
- Parallel tool_calls hint to avoid hop waste when independent tools
are needed together.
Wire PERSONA_MESSAGE through config, app, serverless, push.yml, and
.env.example. Update README env table and add regression tests asserting
task rules survive a set SYSTEM_MESSAGE and the three new guidances stay
in the prompt.
Extend Lambda timeout from 90s to 300s and the generate_image tool from 75s to 240s. gpt-image-2 routinely takes 60-180s and was tripping the 75s tool cap, surfacing as "이미지 생성 중 시간 초과가 발생했습니다" to users. Output flows via Slack Web API (chat_postMessage, files_upload_v2), so the API Gateway 29s integration limit is independent of Lambda runtime. Slack's retry burst from the 29s 504 is already absorbed by DedupStore, so extending Lambda runtime is safe for end-user delivery.
Long-running agent calls (notably generate_image with gpt-image-2)
could exceed the API Gateway 29s integration timeout, which caused
Slack to record a delivery failure and issue retries that raced
against the real reply.
The receiver path now fires a fire-and-forget lambda:Invoke
(InvocationType=Event) against this same function with _worker=True
and returns HTTP 200 within a few hundred ms. The worker path
picks up the payload in a separate Lambda invocation and runs
_process to completion inside the full 300s timeout. The existing
dedup key still absorbs Slack's retries on the receiver side and
Lambda async's built-in 2x retry on worker failure — all paths
converge on the same dedup:{msg_id} row.
Adds lambda:InvokeFunction on self to the runtime IAM role, a
WebClient-from-token rebuild inside the worker (Bolt's injected
client dies with the receiver process), and an inline-execution
fallback so localtest / pytest paths keep working without
AWS_LAMBDA_FUNCTION_NAME.
This file contains hidden or bidirectional Unicode text that may be interpreted or compiled differently than what appears below. To review, open the file in an editor that reveals hidden Unicode characters.
Learn more about bidirectional Unicode characters
Sign up for free
to join this conversation on GitHub.
Already have an account?
Sign in to comment
Add this suggestion to a batch that can be applied as a single commit.This suggestion is invalid because no changes were made to the code.Suggestions cannot be applied while the pull request is closed.Suggestions cannot be applied while viewing a subset of changes.Only one suggestion per line can be applied in a batch.Add this suggestion to a batch that can be applied as a single commit.Applying suggestions on deleted lines is not supported.You must change the existing code in this line in order to create a valid suggestion.Outdated suggestions cannot be applied.This suggestion has been applied or marked resolved.Suggestions cannot be applied from pending reviews.Suggestions cannot be applied on multi-line comments.Suggestions cannot be applied while the pull request is queued to merge.Suggestion cannot be applied right now. Please check back later.
Summary
handler.py단일 파일(754줄)을app.py+src/모듈 구조로 전면 재작성Changes
아키텍처 재구성
src/agent.py— 에이전트 루프 소유.tool_calls기반 native function calling, 중복 호출 억제, max_steps 도달 시 강제 composesrc/llm.py—LLMProviderProtocol과 OpenAI / xAI / Bedrock 구현체. 텍스트·이미지 프로바이더가 다를 때를 위한_CompositeProvider포함src/tools.py—@tool데코레이터로 dispatch와 LLM 스키마 동시 선언하는ToolRegistrysrc/dedup.py— DynamoDB 조건부 put 기반 race-safe 중복 제거 + 스레드 대화 메모리src/slack_helpers.py— 코드 펜스를 보존하는 메시지 분할, 스트리밍 메시지 처리src/config.py— 지연 검증Settings, 잘못된 enum/int는 기본값 폴백src/logging_utils.py— request_id 컨텍스트 포함 구조화 JSON 로깅인프라 / 배포
serverless.yml— Python 3.12, x86_64, DynamoDB(GSIuser-index+ TTL), Bedrock IAM.github/workflows/push.yml—pytest --cov실행 후 OIDC 역할(lambda-gurumi-bot)로 Serverless 배포.github/aws-role/— Phase 2 KB 작업까지 커버하는 OIDC trust/role 정책정리
scripts/bedrock/,scripts/notion/,scripts/awsdocs/제거package.json/package-lock.json제거 (Serverless는 글로벌 설치)localtest.py추가 (Slack 없이 CLI로 에이전트 호출)Test Plan
pip install -r requirements.txt -r requirements-dev.txtpython -m pytest --cov=src --cov-report=term-missing— 125 tests pass, 86% coveragepython localtest.py \"안녕\"— 스트리밍 응답 동작 확인python localtest.py \"강아지 그림 그려줘\"—generate_image툴 호출 후 응답 합성 확인