feat(index): Add fallback for community report extraction on provider…#2399
Open
Luotianyi-0712-tech wants to merge 1 commit into
…s without json_schema support
@microsoft-github-policy-service agree
Problem
When I first configured GraphRAG with the DeepSeek API, indexing failed with "Pipeline error: 'community'". The cause is that GraphRAG's community report generation forces a JSON mode that the DeepSeek API does not support.
The `create_community_reports` workflow fails entirely when the LLM provider does not support `response_format` with Pydantic models (i.e., `json_schema` mode). This includes popular providers such as DeepSeek, which only supports `{"type": "json_object"}` and returns `BadRequestError: This response_format type is unavailable now`.
When all community report requests fail, the pipeline crashes with `KeyError: 'community'` because the `community_reports` DataFrame is empty.

Solution
Implement a three-tier fallback strategy in `CommunityReportsExtractor`:
1. Try `response_format` with the Pydantic model, i.e. `json_schema` mode (best quality, works on OpenAI/Azure/Anthropic).
2. If the provider rejects `json_schema`, catch the error and retry with `{"type": "json_object"}` (works on DeepSeek, Gemini 1.5, etc.).
3. If `json_object` is rejected, fall back to plain text completion and extract JSON manually via regex (works on Ollama and other minimal providers).

Provider Support Matrix
Changes
- `community_reports_extractor.py`: Added `_is_unsupported_response_format_error()`, `_parse_json_from_text()`, and three-tier try/catch logic.
- Providers that support `json_schema` continue to use `json_schema` without any behavior change.

Testing
- OpenAI (`gpt-4o-mini`): uses the `json_schema` path
- DeepSeek (`deepseek-v4-flash`): falls back to the `json_object` path
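
For reviewers who want to see the shape of the three-tier fallback described under Solution, here is a minimal sketch. It is not the code in this PR: `complete_with_fallback`, the `LLMCall` signature, the fake provider functions, and the error-message heuristic are all hypothetical, and a plain dict stands in for the Pydantic model.

```python
import json
import re
from typing import Callable, Optional, Tuple

# Hypothetical signature: (prompt, response_format) -> raw completion text.
LLMCall = Callable[[str, Optional[dict]], str]


def is_unsupported_response_format_error(exc: Exception) -> bool:
    """Heuristic (assumption): the provider names response_format in its error text."""
    msg = str(exc).lower()
    return "response_format" in msg and any(
        hint in msg for hint in ("unavailable", "unsupported", "not supported")
    )


def parse_json_from_text(text: str) -> Optional[dict]:
    """Take the span from the first '{' to the last '}' and parse it as JSON."""
    match = re.search(r"\{.*\}", text, re.DOTALL)
    if match is None:
        return None
    try:
        parsed = json.loads(match.group(0))
    except json.JSONDecodeError:
        return None
    return parsed if isinstance(parsed, dict) else None


def complete_with_fallback(
    call: LLMCall, prompt: str, schema: dict
) -> Tuple[Optional[dict], str]:
    """Try json_schema, then json_object, then plain text; report which tier won."""
    tiers = [
        ("json_schema", {"type": "json_schema",
                         "json_schema": {"name": "report", "schema": schema}}),
        ("json_object", {"type": "json_object"}),
        ("text", None),
    ]
    for name, response_format in tiers:
        try:
            raw = call(prompt, response_format)
        except Exception as exc:
            # Only downgrade on "format not supported"; re-raise everything else.
            if response_format is not None and is_unsupported_response_format_error(exc):
                continue
            raise
        return parse_json_from_text(raw), name
    raise RuntimeError("unreachable: the plain-text tier either returns or raises")


def fake_json_object_only(prompt: str, response_format: Optional[dict]) -> str:
    """Fake provider that rejects json_schema but accepts json_object."""
    if response_format and response_format["type"] == "json_schema":
        raise ValueError("This response_format type is unavailable now")
    return '{"title": "Community 1", "findings": []}'


def fake_text_only(prompt: str, response_format: Optional[dict]) -> str:
    """Fake provider that rejects every response_format and wraps JSON in prose."""
    if response_format is not None:
        raise ValueError("response_format is not supported")
    return 'Here is the report:\n{"title": "Community 1"}\nDone.'


if __name__ == "__main__":
    for provider in (fake_json_object_only, fake_text_only):
        report, tier = complete_with_fallback(provider, "Summarize community 1", {})
        print(provider.__name__, tier, report)
```

Re-raising errors that do not look like a `response_format` rejection keeps timeouts, auth failures, and rate limits visible instead of silently downgrading to a weaker tier.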