fix(langchain): extract model name for ChatHuggingFace by vismaytiwari · Pull Request #1728 · langfuse/langfuse-python

vismaytiwari · 2026-06-28T18:36:41Z

What does this PR do?

When a generation runs through ChatHuggingFace, Langfuse recorded it without a model name and logged "Langfuse was not able to parse the LLM model". ChatHuggingFace is not LangChain-serializable, so it reaches the callback as a not_implemented stub with no kwargs; the model id is only present in the repr string as model_id='...'. _extract_model_name had no entry for it, so extraction fell through and returned None.

This adds a repr-pattern entry that reads model_id, matching how the other repr-only models (HuggingFaceHub, Ollama, etc.) are already handled. ChatHuggingFace exposes model_id regardless of the underlying backend (HuggingFaceEndpoint, HuggingFaceHub, or HuggingFacePipeline), so this covers those cases.
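For readers unfamiliar with the repr-based path, here is a minimal sketch of the idea. It is not the code in langfuse/langchain/utils.py: `MODELS_BY_PATTERN`, `extract_model_from_repr`, the regex form, and the exact stub and repr contents are assumptions made for illustration. Only the `("ChatHuggingFace", "model_id", None)` tuple and the `not_implemented` stub type come from this PR.

```python
import re
from typing import Any, Dict, List, Optional, Tuple

# Hypothetical, trimmed-down stand-in for the pattern table; the real
# models_by_pattern list in langfuse/langchain/utils.py has more entries.
MODELS_BY_PATTERN: List[Tuple[str, str, Optional[str]]] = [
    ("ChatHuggingFace", "model_id", None),
]


def extract_model_from_repr(serialized: Optional[Dict[str, Any]]) -> Optional[str]:
    """Return the model id found in serialized["repr"], or None."""
    if not serialized:
        return None
    # The class name is assumed to be the last element of the "id" path.
    class_name = (serialized.get("id") or [None])[-1]
    text = serialized.get("repr", "")
    for name, key, default in MODELS_BY_PATTERN:
        if class_name != name:
            continue
        # re.search returns the first occurrence of key='...' in the repr.
        match = re.search(rf"{key}='(.*?)'", text)
        return match.group(1) if match else default
    return None


# Assumed shape of a not_implemented stub: no kwargs, only id and repr.
stub = {
    "lc": 1,
    "type": "not_implemented",
    "id": ["langchain_huggingface", "chat_models", "huggingface", "ChatHuggingFace"],
    "repr": (
        "ChatHuggingFace(llm=HuggingFaceEndpoint("
        "repo_id='Qwen/Qwen2.5-Coder-32B-Instruct'), "
        "model_id='Qwen/Qwen2.5-Coder-32B-Instruct')"
    ),
}
model = extract_model_from_repr(stub)
print(model)
```

A class name that is not in the table yields None in this sketch, which mirrors the pre-fix behaviour described above for ChatHuggingFace.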

Type of change

Bug fix

Verification

I confirmed the model id is only available in the repr by dumping a real ChatHuggingFace via langchain_core.load.dumpd, then wrote a deterministic unit test using that serialized shape so it needs no live HuggingFace call. The test returns None before the change and the correct model id after.

uv run --frozen pytest tests/unit/test_langchain_utils.py
uv run --frozen ruff check langfuse/langchain/utils.py tests/unit/test_langchain_utils.py
uv run --frozen ruff format --check langfuse/langchain/utils.py tests/unit/test_langchain_utils.py
uv run --frozen mypy langfuse/langchain/utils.py --no-error-summary

Checklist

I self-reviewed the diff using code_review.md.
I added or updated tests for behavior changes.
I updated docs, examples, or .env.template if needed.
I did not hand-edit generated files; if generated files changed, I used the upstream regeneration path.
I did not commit secrets or credentials.

Greptile Summary

Adds ChatHuggingFace to the repr-based model-name extraction table so that Langfuse records a model name when LangChain callbacks arrive from ChatHuggingFace, which serialises as a not_implemented stub with the model id only available in the repr string.

langfuse/langchain/utils.py: Inserts ("ChatHuggingFace", "model_id", None) into models_by_pattern, following the identical approach already used for HuggingFaceHub, Ollama, DeepInfra, and others that expose their model identifier only via repr.
tests/unit/test_langchain_utils.py: Adds a new unit test file with a deterministic serialized stub matching the real dumpd shape of ChatHuggingFace, verifying the regex extracts the correct model_id.

Confidence Score: 5/5

Safe to merge — a one-line addition to a lookup table with a corresponding unit test.

The change is minimal and self-contained: one new tuple in models_by_pattern using the same repr-regex mechanism already proven by multiple other entries. The unit test confirms the extraction works correctly for the primary backend shape. The only gap is that the HuggingFacePipeline-backend repr variant is not explicitly tested. In that variant model_id appears twice in the repr (once on the inner HuggingFacePipeline and once on ChatHuggingFace itself), but since both occurrences carry the same value in practice this is a test-coverage gap rather than a functional defect.

No files require special attention.

Flowchart

%%{init: {'theme': 'neutral'}}%%
flowchart TD
    A[_extract_model_name called] --> B{Match by ID path\nmodels_by_id list}
    B -- match --> Z[Return model name]
    B -- no match --> C{AzureOpenAI\nspecial case?}
    C -- yes --> D[Extract from\ninvocation_params / kwargs]
    D --> Z
    C -- no --> E{Match by repr\nmodels_by_pattern list}
    E -- includes new ChatHuggingFace entry\nre.search model_id='.+' in repr --> F{repr contains\nmodel_id='...'?}
    F -- yes --> Z
    F -- no --> G[Return None default]
    E -- no match --> H{Catch-all path\nkwargs / serialized}
    H -- found --> Z
    H -- not found --> I[Return None]

Greptile also left 1 inline comment on this PR.

ChatHuggingFace is not LangChain-serializable, so it is passed to the callback as a not_implemented stub with no kwargs; the model id is only present in the repr string as model_id='...'. _extract_model_name had no entry for it, so generations from a ChatHuggingFace model were recorded without a model name. Add a repr-pattern entry that reads model_id.

claude

Claude Code Review

This pull request is from a fork — automated review is disabled. A repository maintainer can comment @claude review to run a one-time review.

greptile-apps · 2026-06-28T18:38:39Z

+
+def test_extract_model_name_chat_huggingface():
+    serialized = _chat_huggingface_serialized("Qwen/Qwen2.5-Coder-32B-Instruct")
+


Test only covers HuggingFaceEndpoint backend shape

The PR description correctly notes that ChatHuggingFace works with three backends — HuggingFaceEndpoint, HuggingFaceHub, and HuggingFacePipeline. HuggingFacePipeline itself has a model_id attribute in its own repr, so a wrapped repr could look like ChatHuggingFace(llm=HuggingFacePipeline(model_id='...', ...), model_id='...'). The regex uses re.search (first match), meaning it would pick up the inner HuggingFacePipeline's model_id rather than ChatHuggingFace's. In practice both should be the same value, but adding a parameterised test for each backend shape would guard against any future repr change where the two values might differ.
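The first-match behaviour described in this comment can be seen in a small standalone sketch. The repr string and the regex below are illustrative assumptions, and the two ids are deliberately made different so the match order is visible; in a real ChatHuggingFace repr they are expected to be equal.

```python
import re

# Illustrative repr with two different ids, to make the match order visible.
nested_repr = (
    "ChatHuggingFace(llm=HuggingFacePipeline(model_id='inner/model'), "
    "model_id='outer/model')"
)

# re.search stops at the first occurrence, which belongs to the inner backend.
first = re.search(r"model_id='(.*?)'", nested_repr)
first_id = first.group(1) if first else None

# re.findall lists every occurrence in order of appearance.
all_ids = re.findall(r"model_id='(.*?)'", nested_repr)

print(first_id)  # inner/model
print(all_ids)   # ['inner/model', 'outer/model']
```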


Good catch. I parameterised the test to cover all three backends (HuggingFaceEndpoint, HuggingFaceHub, HuggingFacePipeline). The HuggingFacePipeline case specifically exercises the double model_id repr you flagged, where re.search matches the inner one; since both ids resolve to the same model the extracted value is correct, and the test now guards against future drift between them.
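A rough sketch of what such a per-backend check might look like follows. It is not the test committed in df65690: the helper `extract_model_id`, the `BACKEND_REPRS` table, and the repr strings are hypothetical, and a plain loop stands in for a pytest parametrization so the snippet runs on its own.

```python
import re
from typing import Dict, Optional

MODEL = "Qwen/Qwen2.5-Coder-32B-Instruct"


def extract_model_id(repr_text: str) -> Optional[str]:
    """Hypothetical helper: first model_id='...' value in a repr string."""
    match = re.search(r"model_id='(.*?)'", repr_text)
    return match.group(1) if match else None


# Assumed repr shapes, one per backend; real reprs carry more fields.
BACKEND_REPRS: Dict[str, str] = {
    "HuggingFaceEndpoint": (
        f"ChatHuggingFace(llm=HuggingFaceEndpoint(repo_id='{MODEL}'), "
        f"model_id='{MODEL}')"
    ),
    "HuggingFaceHub": (
        f"ChatHuggingFace(llm=HuggingFaceHub(repo_id='{MODEL}'), "
        f"model_id='{MODEL}')"
    ),
    # Double model_id case: the inner occurrence is matched first.
    "HuggingFacePipeline": (
        f"ChatHuggingFace(llm=HuggingFacePipeline(model_id='{MODEL}'), "
        f"model_id='{MODEL}')"
    ),
}

results = {name: extract_model_id(text) for name, text in BACKEND_REPRS.items()}
for backend, extracted in results.items():
    # Every backend shape should resolve to the same model id.
    assert extracted == MODEL, backend
```

Keeping the cases in one table makes it cheap to add a new backend shape later if the repr format changes.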

claude Bot reviewed Jun 28, 2026

vismaytiwari mentioned this pull request Jun 28, 2026

bug: <short description> Langfuse was not able to parse the LLM model. langfuse/langfuse#14103

Open

greptile-apps Bot reviewed Jun 28, 2026

test(langchain): cover all ChatHuggingFace backends in model-name test

df65690

vismaytiwari wants to merge 2 commits into langfuse:main from vismaytiwari:fix-langchain-chathuggingface-model
