
refactor: Restructure bot into modular agent with multi-provider LLM support #59

Open
nalbam wants to merge 57 commits into awskrug:main from nalbam:main

Conversation

@nalbam
Member

@nalbam nalbam commented Apr 18, 2026

Summary

  • Rewrote the single 754-line handler.py into an app.py + src/ module structure
  • Multi-provider support: OpenAI / xAI / Bedrock (Claude · Nova · Titan · Stability)
  • Introduced an agent loop built on native function calling (intent → plan → tools → response)
  • Added 125 unit/integration tests (86% coverage)

Changes

Architecture restructuring

  • src/agent.py — owns the agent loop: native function calling via tool_calls, duplicate-call suppression, forced compose when max_steps is reached
  • src/llm.py — LLMProvider Protocol with OpenAI / xAI / Bedrock implementations; includes a _CompositeProvider for when the text and image providers differ
  • src/tools.py — ToolRegistry whose @tool decorator declares dispatch and the LLM schema together
  • src/dedup.py — race-safe deduplication via DynamoDB conditional puts, plus per-thread conversation memory
  • src/slack_helpers.py — message splitting that preserves code fences, streaming message handling
  • src/config.py — lazily validated Settings; invalid enum/int values fall back to defaults
  • src/logging_utils.py — structured JSON logging with request_id context
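The @tool pattern described for src/tools.py — one declaration registering both dispatch and the LLM-facing schema — can be sketched roughly like this (names and signatures are illustrative, not the project's actual API):

```python
from dataclasses import dataclass, field
from typing import Any, Callable

@dataclass
class ToolRegistry:
    """Maps tool names to a handler plus the JSON Schema sent to the LLM."""
    _tools: dict[str, tuple[Callable[..., Any], dict]] = field(default_factory=dict)

    def tool(self, description: str, parameters: dict):
        """Decorator: registers dispatch and schema in a single declaration."""
        def wrap(fn):
            schema = {
                "type": "function",
                "function": {
                    "name": fn.__name__,
                    "description": description,
                    "parameters": parameters,
                },
            }
            self._tools[fn.__name__] = (fn, schema)
            return fn
        return wrap

    def schemas(self) -> list[dict]:
        """Everything the LLM needs to emit tool_calls."""
        return [schema for _, schema in self._tools.values()]

    def dispatch(self, name: str, args: dict) -> Any:
        fn, _ = self._tools[name]
        return fn(**args)

registry = ToolRegistry()

@registry.tool(
    description="Return the current UTC time as ISO 8601.",
    parameters={"type": "object", "properties": {}},
)
def get_current_time() -> dict:
    from datetime import datetime, timezone
    return {"ok": True, "time": datetime.now(timezone.utc).isoformat()}
```

Because the schema is derived at registration time, the tool list handed to the provider and the dispatch table can never drift apart.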

Infrastructure / deployment

  • serverless.yml — Python 3.12, x86_64, DynamoDB (user-index GSI + TTL), Bedrock IAM
  • .github/workflows/push.yml — runs pytest --cov, then deploys with Serverless using the OIDC role (lambda-gurumi-bot)
  • .github/aws-role/ — OIDC trust/role policies covering through the Phase 2 KB work

Cleanup

  • Removed unused scripts/bedrock/, scripts/notion/, scripts/awsdocs/
  • Removed the Node-based package.json / package-lock.json (Serverless is installed globally)
  • Added localtest.py for local debugging (calls the agent from the CLI without Slack)

Test Plan

  • pip install -r requirements.txt -r requirements-dev.txt
  • python -m pytest --cov=src --cov-report=term-missing — 125 tests pass, 86% coverage
  • python localtest.py "안녕" ("hello") — verified streaming responses work
  • python localtest.py "강아지 그림 그려줘" ("draw me a dog") — verified the generate_image tool is called and the response is composed from its result
  • Deployed the dev stage and verified mention, DM, and thread conversations in a Slack workspace
  • Verified that resending the same message is not processed twice (DynamoDB dedup)

nalbam added 11 commits April 13, 2026 14:18
… tools

- Introduced `logging_utils.py` for structured JSON logging with request ID support.
- Created `slack_helpers.py` for message formatting, streaming messages, and user name caching.
- Implemented `tools.py` with the tool registry and six built-in tools:
  - `read_attached_images`
  - `read_attached_document`
  - `fetch_thread_history`
  - `search_web`
  - `generate_image`
  - `get_current_time`
- Each tool includes JSON Schema specifications and error handling.
…nalities

- Implement tests for JsonFormatter to ensure request_id and extra fields are included in logs.
- Create tests for MessageFormatter to validate message splitting and handling of code blocks.
- Add tests for UserNameCache to verify caching behavior and error handling.
- Introduce tests for channel_allowed and sanitize_error functions to ensure proper functionality.
- Develop comprehensive tests for ToolExecutor, including timeout handling and error capturing.
- Add tests for reading attached images and documents, ensuring proper handling of various file types and limits.
- Implement tests for web search and image generation tools to validate expected outputs.

@amazon-q-developer amazon-q-developer Bot left a comment


Review Summary

This PR restructures the lambda-gurumi-bot from a monolithic handler.py into a modular architecture with multi-provider LLM support. While the refactoring approach is sound, there are six issues (four critical, two architectural) that must be fixed before merge:

Critical Issues (Must Fix):

  1. Workflow References Deleted Infrastructure: The GitHub Actions workflow still attempts to update Bedrock Agent resources that were completely removed from serverless.yml
  2. Service Name Mismatch: Workflow references old stack/role names ("lambda-gurumi-ai-bot") while serverless.yml uses "lambda-gurumi-bot"
  3. DynamoDB Schema Breaking Change: Adding expire_at as a GSI RANGE key breaks existing user-only queries
  4. Invalid DynamoDB AttributeDefinition: expire_at is defined as an attribute but only used for TTL, not as a key

Architecture Concerns:

  1. Suboptimal Architecture Choice: Changed from arm64 to x86_64 citing build complexity, but serverless-python-requirements already supports arm64 with dockerizePip
  2. Deleted Serverless Plugin Reference: Workflow installs serverless-dotenv-plugin which is no longer configured in serverless.yml

Recommendations:

  • Fix the six issues identified above
  • Ensure all new src/ module files (app.py, src/agent.py, src/llm.py, etc.) are properly tested
  • Update deployment documentation to reflect the new architecture and environment variables
  • Verify the test coverage claim (86%) includes integration tests for the agent loop and tool calling

The refactoring direction is good, but deployment will fail without addressing these infrastructure and configuration mismatches.


You can now have the agent implement changes and create commits directly on your pull request's source branch. Simply comment with /q followed by your request in natural language to ask the agent to make changes.


⚠️ This PR contains more than 30 files. Amazon Q is better at reviewing smaller PRs, and may miss issues in larger changesets.

Comment threads: .github/workflows/push.yml (×4), serverless.yml (×4)
nalbam and others added 18 commits April 20, 2026 11:39
…_READER_BASE)

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Move _PUBLIC_WEB_UA, _WEB_FETCH_TIMEOUT, _JINA_HEADER_MAX_LINES,
_JINA_LINK_RE, _HtmlTextExtractor, _NoRedirectHandler,
_validate_public_https_url, _filter_links, _extract_markdown_links,
_parse_jina_response, _read_body_capped, _jina_fetch, _raw_fetch from
src/tools.py into a new src/tools_web.py. Re-export all moved symbols
from src/tools for backward compatibility. Drop unused imports (ipaddress,
re, socket, HTMLParser) from src/tools.py. Update monkeypatch targets in
tests/test_tools.py for the moved helpers (src.tools_web.socket.getaddrinfo,
src.tools_web.urllib.request.urlopen/build_opener).

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Register fetch_webpage in default_registry alongside other built-in
tools. Tries Jina Reader first (markdown output, link extraction);
falls back to direct raw GET + HTML text extraction on 5xx or oversize.
Adds import socket for socket.timeout handling. Includes 7 end-to-end
tests covering happy path, fallback, truncation, and link dedup.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Move 818-line src/llm.py into a 7-file src/llms/ package so each
responsibility lives in its own module. Public API names are preserved
via src/llms/__init__.py. All callers updated; 161 tests still pass.

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
nalbam and others added 28 commits April 20, 2026 16:41
…ls package

- Created src/tools/__init__.py with public re-exports and side-effect imports
- Created src/tools/registry.py with ToolDef, ToolRegistry, tool, ToolContext, ToolExecutor, default_registry
- Created src/tools/slack.py with read_attached_images, read_attached_document, fetch_thread_history
- Created src/tools/search.py with search_web, _ddg_search, _tavily_search
- Created src/tools/web.py merging fetch_webpage from tools.py + all SSRF/HTML/fetch helpers from tools_web.py
- Created src/tools/image.py with generate_image
- Created src/tools/time.py with get_current_time
- Deleted src/tools.py and src/tools_web.py
- Updated tests/test_tools.py import paths and patch targets to new submodule locations

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
BedrockProvider.describe_image always built an Anthropic Messages body,
so vision requests failed with ValidationException when the configured
LLM_MODEL was a Nova ID (amazon.nova-pro-v1:0 and its us./eu./apac./
global. inference-profile variants). chat() already family-routes between
Claude and Nova, but the vision entrypoint did not — read_attached_images
blew up for every Nova-based deployment.

Split describe_image by _text_family: Claude keeps the Messages body path,
Nova goes through converse() with an {image: {format, source: {bytes}}}
content block. MIME is mapped to Nova's short format (image/jpeg → jpeg)
with a conservative png fallback for unknown types.

Also hardens _to_nova_messages tool content — dict/list payloads now go
through json.dumps instead of str() repr, so callers that hand us
structured content don't end up with un-parseable Python syntax in Nova's
text blocks.
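The family split described above — Claude keeps an Anthropic Messages body, Nova goes through converse-style content blocks, MIME mapped to Nova's short format with a conservative png fallback — could be sketched like this (the dict shapes follow the commit message; they are not verified SDK output):

```python
import base64

# Nova's converse() wants short format names, not full MIME types.
_NOVA_FORMATS = {"image/jpeg": "jpeg", "image/png": "png",
                 "image/gif": "gif", "image/webp": "webp"}

def build_vision_body(family: str, image_bytes: bytes, mime: str, prompt: str) -> dict:
    if family == "claude":
        # Anthropic Messages API shape: base64-encoded source block.
        return {
            "anthropic_version": "bedrock-2023-05-31",
            "messages": [{"role": "user", "content": [
                {"type": "image", "source": {
                    "type": "base64", "media_type": mime,
                    "data": base64.b64encode(image_bytes).decode()}},
                {"type": "text", "text": prompt},
            ]}],
        }
    # Nova shape: raw bytes plus short format, png fallback for unknown MIMEs.
    fmt = _NOVA_FORMATS.get(mime, "png")
    return {"messages": [{"role": "user", "content": [
        {"image": {"format": fmt, "source": {"bytes": image_bytes}}},
        {"text": prompt},
    ]}]}
```

Routing on a single `family` value keeps chat() and the vision entrypoint from diverging again, which is exactly the bug this commit fixes.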
Two related issues in the tool dispatch layer:

1) ToolExecutor created a ThreadPoolExecutor in __init__ and never shut
   it down. SlackMentionAgent builds a new executor per request, so in
   Lambda warm containers every invocation added non-daemon workers to
   the process registry that stayed until interpreter exit. Added
   ToolExecutor.close() (idempotent, wait=False) and a try/finally in
   SlackMentionAgent.run() that only closes executors the agent created
   itself — injected executors remain caller-owned.

2) execute() caught a stdlib allowlist (ValueError, TypeError, URLError,
   SlackApiError, Boto*). Provider SDKs raise their own hierarchies
   (openai.APIError, anthropic.APIError, httpx.HTTPError) that don't
   subclass those, so generate_image failures bubbled past the executor
   and aborted the whole agent loop with a generic "요청 처리 중 오류"
   ("error while processing the request") reply. The LLM never got a
   chance to see the tool_result failure and
   recover. Broadened to `except Exception` — the agent treats
   {"ok": False, ...} as recoverable, which is the entire point.
Fallback streaming path called chat_update and only logged on failure —
if the current ts became unusable (deleted, message-level rate limit,
permission change) the streamer would burn the rest of the request
re-trying the same broken ts until the buffer hit max_len. Added a
3-strike counter that rolls to a fresh chat_postMessage once crossed,
with preserve_buffer=True so deltas that never reached Slack are
re-sent against the new ts. Successful updates reset the counter.
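The 3-strike roll-over can be isolated into a small state machine (a sketch; `post_message` stands in for chat_postMessage and returns the fresh ts, mirroring the preserve_buffer behavior described above):

```python
class StreamRecovery:
    """Rolls to a fresh message after 3 consecutive failed chat_update calls."""

    def __init__(self, post_message, max_failures: int = 3):
        self._post = post_message   # called with the full buffer, returns a new ts
        self._max = max_failures
        self._failures = 0

    def on_update(self, ok: bool, buffer: str, ts: str) -> str:
        """Return the ts future updates should target."""
        if ok:
            self._failures = 0      # successful updates reset the counter
            return ts
        self._failures += 1
        if self._failures < self._max:
            return ts               # tolerate transient failures on the same ts
        self._failures = 0
        # ts is presumed dead: re-send unflushed deltas against a fresh message.
        return self._post(buffer)
```

Passing the whole buffer to the new message is what keeps deltas that never reached Slack from being lost when the roll-over happens.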

Widened sanitize_error redaction: sk-ant-* (Anthropic), xai-* (Grok),
tvly-* (Tavily), and AKIA/ASIA AWS access keys all carried through
error messages to end users unredacted. More-specific patterns run
before the generic sk-* rule so labels stay accurate.
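Ordering the more-specific key patterns ahead of the generic sk-* rule is what keeps the labels accurate; a sketch (the exact patterns and labels are illustrative, not the project's):

```python
import re

# Most-specific first: sk-ant-* must be labeled Anthropic, not generic.
_REDACTIONS = [
    (re.compile(r"sk-ant-[A-Za-z0-9_-]+"), "[REDACTED anthropic key]"),
    (re.compile(r"xai-[A-Za-z0-9_-]+"), "[REDACTED xai key]"),
    (re.compile(r"tvly-[A-Za-z0-9_-]+"), "[REDACTED tavily key]"),
    (re.compile(r"\b(?:AKIA|ASIA)[A-Z0-9]{16}\b"), "[REDACTED aws key]"),
    (re.compile(r"sk-[A-Za-z0-9_-]+"), "[REDACTED api key]"),
]

def sanitize_error(msg: str) -> str:
    """Strip known credential shapes before an error reaches end users."""
    for pattern, label in _REDACTIONS:
        msg = pattern.sub(label, msg)
    return msg
```

If the generic rule ran first it would consume the `sk-ant-` prefix and the Anthropic pattern would never match, so the list order is load-bearing.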
Old implementation re-serialized the full kept list on every pop(0),
giving quadratic behavior in message count. Now we serialize each
message once, then walk backwards from newest accumulating exact size
(including JSON array brackets and ", " item separators) until the
budget would be exceeded.

Added a regression test that pins output size to the real
len(json.dumps(kept)) across a range of budgets, so future rewrites
can't drift off the exact separator accounting.
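The exact separator accounting — 2 bytes for the array brackets plus 2 bytes for each ", " between items — can be sketched as (a simplified stand-in for the real trimmer; names are illustrative):

```python
import json

def trim_history(messages: list[dict], budget: int) -> list[dict]:
    """Keep the newest messages whose exact json.dumps size fits the budget.

    Each message is serialized once (no quadratic re-serialization); we walk
    backwards from the newest, accumulating len(serialized) plus bracket and
    separator overhead so the total equals len(json.dumps(kept)) exactly.
    """
    sizes = [len(json.dumps(m)) for m in messages]
    total = 2  # "[" and "]"
    kept = 0
    for size in reversed(sizes):
        cost = size + (2 if kept else 0)  # ", " before every item but the first
        if total + cost > budget:
            break
        total += cost
        kept += 1
    return messages[len(messages) - kept:]
```

Because the accounting matches json.dumps byte for byte, a regression test can pin the output to `len(json.dumps(kept)) <= budget` across arbitrary budgets, which is exactly the test the commit describes.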
Two flow fixes in _process:

- DMs (is_dm=True) no longer go through channel_allowed. DM channel IDs
  are D-prefixed and not normally in ALLOWED_CHANNEL_IDS, so setting the
  allowlist for a team used to lock every user out of the direct-message
  path the moment the var was configured. Slack's install permission
  already gates who can open the DM with the bot.

- Bare "@bot" with no prompt now returns before DedupStore.reserve. The
  previous ordering reserved a 1-hour TTL row for every empty ping and
  made Slack retries of the same no-op register as dedup.skip noise.
Nothing in src/ reads AWS_REGION_NAME — config.py pulls the region from
AWS_REGION, which the Lambda runtime auto-injects and cannot be
overridden via serverless's environment block. The alias was dead
config left over from an earlier attempt to work around the reserved
name; removed and replaced with a comment so the next contributor
doesn't re-introduce it.
Test suite grew from 161 to 189 after the audit follow-ups; per-module
coverage moved too (dedup 78→80, slack_helpers 84→86, bedrock 76→78,
tools/slack 86→87). Update both CLAUDE.md and README.md so onboarding
doesn't trust stale numbers.

Also extend CLAUDE.md's "Things that are easy to break" with the new
load-bearing invariants the fixes in this series rely on: Nova vision
must family-route, agent.run() must close its owned executor, tool
exception catch must stay broad enough to wrap provider SDK errors,
and channel allowlist must skip DMs.
Previously SYSTEM_MESSAGE overrode the entire default base prompt. An
operator who set it to a persona-style string (e.g. "한국어로 답하세요",
"answer in Korean") silently deleted the loop's planning / tool-use /
attachment-lookup
guidance — which made ambiguous attachment requests need 3 hops to
resolve and let GitHub-markdown and tool-failure patterns slip through.

Split the system prompt into three append-only layers:

- Task rules (code-owned, always present): plan work, parallel
  tool_calls, and tool-failure policy (ok:false → tell user briefly +
  suggest alternative, no blind retry, no fabrication).
- Operator policy: SYSTEM_MESSAGE is now appended, not substituted.
- Persona: new PERSONA_MESSAGE env carries answer style / tone.

Add three targeted guidances on top:

- Slack mrkdwn rules so the streaming fallback path doesn't surface
  raw `**bold**` or `[label](url)` to users (enable_native defaults off
  so Slack receives a plain `text` field).
- Speculative attachment lookup for ambiguous references ("이 사진" /
  "this photo", "아까 그 파일" / "that file from earlier") —
  read_attached_* returns [] cheaply when the event has no file,
  signaling the fetch_thread_history fallback.
- Parallel tool_calls hint to avoid hop waste when independent tools
  are needed together.

Wire PERSONA_MESSAGE through config, app, serverless, push.yml, and
.env.example. Update README env table and add regression tests asserting
task rules survive a set SYSTEM_MESSAGE and the three new guidances stay
in the prompt.
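The append-only layering could be assembled like this (the TASK_RULES text is a placeholder for the code-owned rules, and the function name is illustrative):

```python
# Code-owned layer: always present, never overridden by operator config.
TASK_RULES = (
    "Plan before acting. Issue independent tool_calls in parallel. "
    "On a tool result with ok:false, tell the user briefly and suggest "
    "an alternative; do not blindly retry or fabricate."
)

def build_system_prompt(system_message: str = "", persona_message: str = "") -> str:
    """Task rules always survive; operator policy and persona are appended."""
    layers = [TASK_RULES]
    if system_message.strip():
        layers.append(system_message.strip())   # operator policy (SYSTEM_MESSAGE)
    if persona_message.strip():
        layers.append(persona_message.strip())  # answer style / tone (PERSONA_MESSAGE)
    return "\n\n".join(layers)
```

Since the operator layers can only append, no SYSTEM_MESSAGE value can delete the planning and tool-failure guidance — the property the regression tests assert.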
Extend Lambda timeout from 90s to 300s and the generate_image tool
from 75s to 240s. gpt-image-2 routinely takes 60-180s and was
tripping the 75s tool cap, surfacing as "이미지 생성 중 시간 초과가
발생했습니다" ("a timeout occurred during image generation") to users.

Output flows via Slack Web API (chat_postMessage, files_upload_v2),
so the API Gateway 29s integration limit is independent of Lambda
runtime. Slack's retry burst from the 29s 504 is already absorbed
by DedupStore, so extending Lambda runtime is safe for end-user
delivery.
Long-running agent calls (notably generate_image with gpt-image-2)
could exceed the API Gateway 29s integration timeout, which caused
Slack to record a delivery failure and issue retries that raced
against the real reply.

The receiver path now fires a fire-and-forget lambda:Invoke
(InvocationType=Event) against this same function with _worker=True
and returns HTTP 200 within a few hundred ms. The worker path
picks up the payload in a separate Lambda invocation and runs
_process to completion inside the full 300s timeout. The existing
dedup key still absorbs Slack's retries on the receiver side and
Lambda async's built-in 2x retry on worker failure — all paths
converge on the same dedup:{msg_id} row.

Adds lambda:InvokeFunction on self to the runtime IAM role, a
WebClient-from-token rebuild inside the worker (Bolt's injected
client dies with the receiver process), and an inline-execution
fallback so localtest / pytest paths keep working without
AWS_LAMBDA_FUNCTION_NAME.
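A receiver that re-invokes itself asynchronously, with the inline fallback for local runs, could look roughly like this (`invoke` is injected so the sketch runs without AWS; the real path would use boto3's lambda client with InvocationType="Event" against its own function name):

```python
import os

def handle_event(event: dict, process, invoke=None) -> dict:
    """Receiver/worker split on a single Lambda function.

    process(payload) runs the slow agent work; invoke(payload) stands in
    for the fire-and-forget async self-invoke on the receiver path.
    """
    if event.get("_worker"):
        process(event)               # worker path: runs inside the full timeout
        return {"statusCode": 200}
    payload = dict(event, _worker=True)
    if invoke is not None and os.environ.get("AWS_LAMBDA_FUNCTION_NAME"):
        invoke(payload)              # fire-and-forget async self-invoke
    else:
        process(payload)             # inline fallback: localtest / pytest
    return {"statusCode": 200}       # ack Slack well under the 29s API GW limit
```

Both paths funnel into the same process() and therefore the same dedup row, which is what lets Slack's receiver-side retries and Lambda's async worker retries converge safely.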
