Luda — Tourism Data Agent

A Text-to-SQL / Chat-Data system for the tourism industry. Users ask business questions in natural language; the system parses metrics, dimensions, time ranges, and filters, generates safe SQL queries, and returns structured data with visualization suggestions and natural language summaries.

Dual-Mode Engine

Luda ships with two execution modes, switchable via environment variable:

Mode	Description	When to use
`legacy` (default)	Local semantic dictionary + SQL template matching. No LLM dependency, 100% deterministic.	Baseline, fast iteration, no API key
`langgraph`	13-node LangGraph workflow with parallel retrieval, semantic filtering, SQL self-correction, and answer summarization.	Full AI capability, requires LLM + embedding keys

LangGraph Architecture (13 nodes)

extract_keywords
    ↓
┌──────────────────────────────────────┐
│  recall_column  │  recall_value  │  recall_metric  │  (parallel)
└──────────────────────────────────────┘
    ↓
merge_retrieved_info
    ↓
┌──────────────────────────────────────┐
│  filter_table  │  filter_metric  │  (parallel)
└──────────────────────────────────────┘
    ↓
add_extra_context → generate_sql → validate_sql
    ↓
execute_sql → [pass] → summarize_result → END
            → [fail] → correct_sql → re-execute → summarize_result → END

Three-way parallel recall: Vector search (Qdrant) for schema entities + fuzzy text search (ES) for dimension values + metric definition lookup
Semantic filtering: LLM prunes recalled tables/metrics by relevance before SQL generation
SQL self-correction: On validation failure, the LLM corrects the SQL and retries (up to N iterations)
Answer summarization: Raw query results are rewritten into natural language with chart suggestions

Hybrid Retrieval (RAG)

Qdrant: Semantic vector recall for table fields, relationships, and business metrics
Elasticsearch: Fuzzy and full-text search for dimension values (destinations, merchants, product names)

Tech Stack

Layer	Technology
Backend	Python 3.12+ · FastAPI (async)
Orchestration	LangGraph · LangChain
Database	MySQL 8.0
Vector DB	Qdrant
Text Search	Elasticsearch 8.x
LLM / Embedding	OpenAI-compatible API (DeepSeek / DashScope)
Package Manager	uv
Frontend	Vanilla HTML + JS
Testing	pytest

Project Structure

旅答/
├── frontend/                      # Web UI (chat + visualization)
│   ├── index.html
│   ├── app.js
│   └── styles.css
├── docs/
│   ├── architecture.md            # Detailed architecture document
│   ├── api_contract.md            # API / SSE event contract
│   ├── metric_glossary.md         # Business metric definitions
│   ├── demo_questions.md          # Standard test question set
│   ├── frontend_integration.md    # Frontend SSE integration guide
│   └── handoff.md                 # Project handoff documentation
└── data/                          # Backend sub-project (own pyproject.toml)
    ├── app/
    │   ├── agent/                 # Legacy mode SQL template engine + LangGraph graph
    │   ├── api/                   # FastAPI route handlers
    │   ├── prompts/               # LLM prompt templates
    │   ├── scripts/               # Meta knowledge build scripts
    │   ├── services/              # Query service (SSE emission)
    │   └── ...
    ├── generate/                  # Business data generation engine
    ├── seeds/                     # Seed data
    ├── sql/                       # DDL scripts (meta + business)
    ├── tests/                     # Automated tests
    ├── docker/                    # Infrastructure configs (MySQL, Qdrant, ES)
    ├── init_db.py                 # Database initialization
    ├── pyproject.toml
    └── .env.example

data/ is a sub-project with its own pyproject.toml. All uv commands use --project data.

Quick Start

1. Start infrastructure

# MySQL 8 (host port 3307 → container port 3306)
docker run --name luda-mysql \
  -e MYSQL_ROOT_PASSWORD=123321 \
  -p 3307:3306 \
  -d mysql:8.0 \
  --default-authentication-plugin=mysql_native_password \
  --character-set-server=utf8mb4 \
  --collation-server=utf8mb4_unicode_ci

# Qdrant + Elasticsearch (for langgraph mode)
docker compose -f data/docker/docker-compose.retrieval.yaml up -d

2. Configure environment

cp data/.env.example data/.env

Edit data/.env:

# Database
DB_HOST=127.0.0.1
DB_PORT=3307
DB_USER=root
DB_PASSWORD=123321
DB_NAME=travel
APP_PORT=8000

# Agent mode: legacy (default) / langgraph
LUDA_AGENT_MODE=legacy

# Retrieval toggle (default: off; enable for langgraph mode)
RETRIEVAL_ENABLED=false

# LLM config (required for langgraph mode)
LLM_API_KEY=your-llm-key
LLM_BASE_URL=https://api.deepseek.com
LLM_MODEL=deepseek-chat

# Embedding config (required for langgraph mode)
EMBEDDING_API_KEY=your-embedding-key
EMBEDDING_BASE_URL=https://dashscope.aliyuncs.com/compatible-mode/v1
EMBEDDING_MODEL=text-embedding-v4

3. Install dependencies & initialize

# Install Python dependencies
uv sync --project data

# Create meta database
docker exec -i luda-mysql mysql -uroot -p123321 < data/sql/meta.sql

# Initialize business database
uv run --project data python data/init_db.py

# Generate business data (Windows PowerShell)
$env:PYTHONPATH="data"
uv run --project data python -m generate.main --profile full

# Build meta knowledge + retrieval indexes (langgraph mode only)
uv run --project data python -m app.scripts.build_meta_knowledge

4. Start backend

# Windows PowerShell
$env:PYTHONPATH="data"
uv run --project data python -m app.main

# macOS / Linux
PYTHONPATH=data uv run --project data python -m app.main

Swagger docs: http://127.0.0.1:8000/docs
Health check: http://127.0.0.1:8000/health

5. Open frontend

Open frontend/index.html in a browser (or serve it via any static file server).

SSE Event Contract

The system streams progress and results via Server-Sent Events:

Event	Description	Key Fields
`meta`	Request metadata	`trace_id`, `created_at`, `agent_mode`
`progress`	Node execution progress	`step`, `status`, `message`
`keywords`	Extracted keywords and entities	keyword list
`retrieval`	Retrieved tables, columns, metrics, values	retrieval results
`filtered_tables`	LLM-pruned table entities	filtered list
`filtered_metrics`	LLM-pruned metric entities	filtered list
`sql`	Generated SQL statement	SQL string
`sql_validation`	EXPLAIN safety check result	validation status
`sql_correction`	Self-correction log (on failure)	correction details
`result`	Structured tabular data	rows, columns
`chart`	Visualization suggestion config	chart config
`answer`	Natural language summary	summary text
`error`	Exception or safety block	error message
`done`	Workflow complete	—

SQL Safety Guardrails

All generated SQL passes through multi-layer safety checks before execution:

Read-only filter: Only SELECT allowed. Blocks INSERT, UPDATE, DELETE, DROP, ALTER, TRUNCATE, and multi-statement queries.
System schema protection: Blocks access to MySQL internal schemas (information_schema, etc.).
Row limit enforcement: Rewrites unbounded SELECT * queries to include LIMIT.
EXPLAIN validation: Executes EXPLAIN <SQL> on a read-only connection to verify syntax and catch join errors before returning results.

Testing

# Run agent regression tests
uv run --project data pytest data/tests/test_query_agent.py -v

# Run all backend tests
uv run --project data pytest data/tests -v

Name		Name	Last commit message	Last commit date
Latest commit History 4 Commits
data		data
docs		docs
frontend		frontend
.gitignore		.gitignore
README.md		README.md
需求说明.md		需求说明.md

Provide feedback

Saved searches

Use saved searches to filter your results more quickly

Repository files navigation

Luda — Tourism Data Agent

Dual-Mode Engine

LangGraph Architecture (13 nodes)

Hybrid Retrieval (RAG)

Tech Stack

Project Structure

Quick Start

1. Start infrastructure

2. Configure environment

3. Install dependencies & initialize

4. Start backend

5. Open frontend

SSE Event Contract

SQL Safety Guardrails

Testing

About

Uh oh!

Releases

Packages

Uh oh!

Contributors

Uh oh!

Languages

Folders and files

Latest commit

History

Repository files navigation

Luda — Tourism Data Agent

Dual-Mode Engine

LangGraph Architecture (13 nodes)

Hybrid Retrieval (RAG)

Tech Stack

Project Structure

Quick Start

1. Start infrastructure

2. Configure environment

3. Install dependencies & initialize

4. Start backend

5. Open frontend

SSE Event Contract

SQL Safety Guardrails

Testing

About

Topics

Resources

Uh oh!

Stars

Watchers

Forks

Releases

Packages 0

Uh oh!

Contributors

Uh oh!

Languages

Packages