Graduate researcher at the seam of Audit / Large Language Models / Causal Inference
A graduate researcher at the intersection of audit, finance and large language models. Empirical research has been the through-line of my studies; what changes is the instrument.
-
Undergraduate years — Several years of empirical work in corporate finance, begun midway through the degree, with a focus on derivatives and acquisition activity.
-
Master's years — Began with fraud-related empirical research, then pivoted with the LLM wave to studying how large models actually behave inside audit practice. Causal inference, a direction long emphasised in my training, runs alongside both threads and remains an ongoing self-study.
-
In recent years — Worked through the modelling stack as it appeared, from classical machine learning and deep learning to prompt engineering, parameter-efficient fine-tuning and retrieval-augmented generation. The work has stayed anchored in real audit and finance scenarios rather than benchmarks, in both engineering and research modes.
-
On causal inference — Today it mostly lives inside paper narratives, a way of telling a clean story. I am interested in carrying it past the journal page into the actual decisions an audit team or a firm has to make.
-
On Claude Code — A standing research interest in how coding agents are reshaping empirical workflows, especially in regulated and domain-specific settings.
-
After graduation — Half research, half on-site, staying in audit, finance and management while sharing what I see at the intersection of LLMs, causal inference and industry practice. If you are scoping a thesis in this area, these field notes may hand you a thread to pull.
- academic-agents — A Claude Code skill suite for the day-to-day of academic work: research lookup, evidence binding, structured drafting, and verification before handoff. 56+ GitHub stars.
- paper-discipline-skills — Eleven discipline skills for Chinese scholarly writing: terminology protection, citation auditing, batch-edit safety rails, pre-handoff verification.
- ForenSight — A multi-agent evidentiary-reasoning prototype for fraud review: coordinated retrieval, anomaly screening, and narrative drafting under regulatory constraints.
- awesome-ai-research-writing — A curated reading list on using LLMs for scholarly writing, without the polish-and-publish trap.
- empirical-research-pipeline — A modular Claude Code skill chain covering the full empirical workflow, from data intake to final results.
-
Sequential Policy Learning under Regulatory Constraints · A reinforcement-learning-inspired formulation that frames regulated review as a sequential decision process, with reward shaping derived from compliance standards and from adversarial behaviour on the reviewed side.
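One standard way to derive a shaped reward from a compliance standard is potential-based shaping, where a potential function Φ scores each review state and the agent receives r + γΦ(s′) − Φ(s). The sketch below assumes a hypothetical checklist (`CHECKLIST`, `ReviewState`, `potential` are illustrative names, not the project's code) and covers only the compliance side; the adversarial term is not modelled. Shaping of this form is known not to change which policies are optimal, which is why it is a common choice.

```python
from dataclasses import dataclass

# Hypothetical compliance checklist; item names are illustrative only.
CHECKLIST = frozenset({"revenue_cutoff", "related_party", "inventory_count", "going_concern"})


@dataclass(frozen=True)
class ReviewState:
    """Hypothetical review state: the checklist items examined so far."""
    covered: frozenset


def potential(state: ReviewState) -> float:
    """Hypothetical potential Phi(s): share of the checklist already covered."""
    return len(state.covered & CHECKLIST) / len(CHECKLIST)


def shaped_reward(base_reward: float, state: ReviewState,
                  next_state: ReviewState, gamma: float = 0.99) -> float:
    """Potential-based shaping: r + gamma * Phi(s') - Phi(s)."""
    return base_reward + gamma * potential(next_state) - potential(state)


s0 = ReviewState(frozenset())
s1 = ReviewState(frozenset({"revenue_cutoff"}))
print(shaped_reward(0.0, s0, s1, gamma=1.0))  # 0.25
```

With γ = 1 the shaping terms telescope along a trajectory, so the total bonus depends only on the start and end states and cannot be farmed by looping.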
-
Multi-Agent Reasoning under Compliance Constraints · Specialised LLM agents coordinated for retrieval, anomaly screening, review and report drafting, built atop parameter-efficient fine-tuning and structured tool use.
-
Domain-Adaptive Foundation Models for Regulatory Corpora · Continued pre-training and instruction tuning of open foundation models on regulatory and accounting text, with attention to evaluation under low-resource conditions.
-
Causal Inference × Large Language Models · Two threads: surfacing candidate causal hypotheses from unstructured corporate disclosures with LLMs; and using DAG-based causal frameworks to constrain and audit LLM-driven decisions in audit and management contexts.
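A minimal sketch of the second thread, assuming the DAG is written down by the analyst and the LLM only proposes which covariates to control for. The check rejects any proposed covariate that is a descendant of the treatment, which the backdoor criterion forbids. Variable names and the example graph are hypothetical, and passing this check is necessary but not sufficient: it does not verify that every backdoor path is blocked.

```python
from collections import deque


def descendants(dag: dict, node: str) -> set:
    """All nodes reachable from `node` along directed edges (excluding `node`)."""
    seen, queue = set(), deque(dag.get(node, []))
    while queue:
        n = queue.popleft()
        if n not in seen:
            seen.add(n)
            queue.extend(dag.get(n, []))
    return seen


def audit_adjustment_set(dag: dict, treatment: str, proposed: list) -> list:
    """Return proposed covariates that are descendants of the treatment.

    Any hit violates the backdoor criterion, so the proposal should be rejected.
    An empty result does NOT prove the set is valid.
    """
    return sorted(set(proposed) & descendants(dag, treatment))


# Hypothetical analyst-specified DAG: edges point from cause to effect.
dag = {
    "firm_size": ["big4_auditor", "restatement"],
    "big4_auditor": ["audit_quality"],
    "audit_quality": ["restatement"],
}
# Suppose an LLM proposes controlling for firm size and audit quality.
print(audit_adjustment_set(dag, "big4_auditor", ["firm_size", "audit_quality"]))  # ['audit_quality']
```

Here `audit_quality` is a mediator of the auditor effect, so conditioning on it would block part of the effect being estimated; the DAG catches this before the LLM's suggestion reaches an estimator.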
-
Statement-Level Anomaly Detection · Multi-modal signal integration across textual disclosures, accounting ratios and disclosure-network features, under a unified scoring framework for early-warning analytics.
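One simple reading of a "unified scoring framework" is to standardise each signal across the cross-section of firms and combine them with weights. The sketch below assumes that form; the signal names, weights and values are hypothetical, and each signal is assumed to be oriented so that higher means riskier.

```python
from statistics import mean, pstdev


def zscores(values):
    """Standardise a list of per-firm values; a constant signal maps to zeros."""
    mu, sd = mean(values), pstdev(values)
    return [0.0 if sd == 0 else (v - mu) / sd for v in values]


def unified_scores(signals, weights):
    """Weighted sum of z-scored signals, one score per firm.

    signals: signal name -> list of per-firm values (same firm order in every list)
    weights: signal name -> weight (hypothetical, e.g. set by validation)
    """
    n = len(next(iter(signals.values())))
    z = {name: zscores(vals) for name, vals in signals.items()}
    return [sum(weights[name] * z[name][i] for name in signals) for i in range(n)]


# Hypothetical signals for three firms.
signals = {
    "text_tone_risk": [0.1, 0.2, 0.9],       # from textual disclosures
    "accrual_ratio": [0.02, 0.03, 0.15],     # from accounting ratios
    "network_centrality": [0.5, 0.4, 0.6],   # from disclosure-network features
}
weights = {"text_tone_risk": 0.4, "accrual_ratio": 0.4, "network_centrality": 0.2}
print(unified_scores(signals, weights))
```

The third firm, which sits above the cross-sectional mean on every signal, receives the highest score. Z-scoring keeps signals on different scales from dominating one another, at the cost of making scores relative to the peer group rather than absolute.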
-
Empirical Notebooks (Legacy) · Earlier undergraduate work in empirical corporate finance. Surviving routines, interpolation utilities for Chinese R&D statistics, and LaTeX templates remain in active use.
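As an illustration of the kind of interpolation utility mentioned above, the sketch below fills interior gaps in an annual series by linear interpolation between the nearest reported years. It is a hypothetical stand-in, not the surviving routine itself: `interpolate_years` is an assumed name, missing years must already be present as keys with `None`, and leading or trailing gaps are left empty rather than extrapolated.

```python
from typing import Dict, Optional


def interpolate_years(series: Dict[int, Optional[float]]) -> Dict[int, Optional[float]]:
    """Linearly interpolate missing (None) years between reported years.

    Gaps before the first or after the last reported year are left as None.
    """
    years = sorted(series)
    known = [y for y in years if series[y] is not None]
    out = dict(series)
    # Walk consecutive pairs of reported years and fill everything in between.
    for lo, hi in zip(known, known[1:]):
        span = hi - lo
        for y in years:
            if lo < y < hi:
                t = (y - lo) / span
                out[y] = series[lo] + t * (series[hi] - series[lo])
    return out


# Hypothetical regional R&D expenditure with missing survey years.
rd = {2015: 100.0, 2016: None, 2017: None, 2018: 130.0, 2019: None}
print(interpolate_years(rd))  # 2016 and 2017 become roughly 110 and 120; 2019 stays None
```

Refusing to extrapolate is deliberate: a trend extended past the last reported year is a forecast, and it should be flagged as such rather than silently blended into the panel.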
- Parameter-Efficient Fine-Tuning — LoRA, QLoRA, adapter tuning, prefix tuning.
- Domain Adaptation — Continued pre-training and instruction tuning on regulatory and accounting corpora.
- Retrieval-Augmented Generation — Hybrid sparse–dense retrieval, document re-ranking, and citation-grounded generation.
- Multi-Agent Orchestration — Specialised role decomposition with structured tool use and intermediate verification.
- Causal Identification — Difference-in-differences, instrumental variables, propensity-score matching, regression discontinuity, DAG-based identification.
- Panel-Data Econometrics — Fixed-effect estimation, dynamic panel methods, and staggered-adoption designs.
- Languages — Python · R · SQL · Stata · LaTeX · Bash
- Foundation Models — PyTorch · Hugging Face Transformers · TRL · PEFT · DeepSpeed · vLLM · Unsloth
- Agentic Systems — LangChain · LlamaIndex · LangGraph · Model Context Protocol (MCP) · FAISS · Chroma
- Causal & Empirical — DoWhy · EconML · statsmodels · linearmodels · CausalImpact
- Experimentation — Weights & Biases · MLflow · lm-eval-harness
- Workflow — Claude Code · Anthropic API · Dify · Git · Overleaf
- Domain — Audit · Corporate Disclosures · M&A · Financial Derivatives
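Of the causal-identification designs listed above, difference-in-differences is the easiest to show compactly. The sketch below computes the canonical 2×2 estimate from group means, (treated post − treated pre) − (control post − control pre), on hypothetical data. In practice the same estimate usually comes from a regression with a treated×post interaction and fixed effects (for example via `linearmodels`), and staggered-adoption designs need more care than this two-period version.

```python
from statistics import mean


def did_2x2(rows):
    """Canonical 2x2 difference-in-differences from cell means.

    rows: iterable of (treated: bool, post: bool, outcome) tuples.
    Returns (T_post - T_pre) - (C_post - C_pre).
    """
    cells = {}
    for treated, post, y in rows:
        cells.setdefault((treated, post), []).append(y)
    m = {k: mean(v) for k, v in cells.items()}
    return (m[(True, True)] - m[(True, False)]) - (m[(False, True)] - m[(False, False)])


# Hypothetical outcomes: two firms per group-period cell.
rows = [
    (True, False, 10), (True, False, 12),    # treated, before
    (True, True, 15), (True, True, 17),      # treated, after
    (False, False, 9), (False, False, 11),   # control, before
    (False, True, 11), (False, True, 13),    # control, after
]
print(did_2x2(rows))  # 3
```

The treated group rises by 5 and the control group by 2, so under parallel trends the estimated treatment effect is 3.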
