Skip to content

FEAT: Improve TranslationConverter prompt for faithful translation#2082

Open
romanlutz wants to merge 2 commits into
microsoft:mainfrom
romanlutz:improve-translation-converter
Open

FEAT: Improve TranslationConverter prompt for faithful translation#2082
romanlutz wants to merge 2 commits into
microsoft:mainfrom
romanlutz:improve-translation-converter

Conversation

@romanlutz

Copy link
Copy Markdown
Contributor

Description

The LLM-based TranslationConverter sometimes paraphrases content, uses synonyms, or fails to preserve code blocks and special formatting during translation. This is problematic for AI safety research where preserving exact prompt structure is critical.

This PR enhances the TranslationConverter system prompt to produce more faithful, literal translations by:

  • Adding an explicit role as a literal, word-for-word translation engine
  • Adding critical rules for preserving code blocks, URLs, file paths, and special characters
  • Adding rules against paraphrasing, synonym substitution, and idiomatic interpretation
  • Adding examples demonstrating code preservation and structural marker handling
  • Changing markers to [TRANSLATE_START]/[TRANSLATE_END]\ for clearer delimitation
  • Fixing the parameter name from \languages\ to \language\ (singular) for consistency

Testing results: Evaluation on 15 diverse test cases with round-trip translation (EN -> German -> EN) shows average similarity improved from 0.900 to 0.948 (+5.3%), with the improved version winning on 5 tests, the original on 2, and ties on 8. The biggest gains were in code preservation and structural fidelity.

Tests and Documentation

  • Tested manually with round-trip translation on diverse prompts covering code blocks, structural markers, quoted text, URLs, markdown formatting, and technical terminology
  • No new tests added as this is a prompt template improvement; existing unit tests for TranslationConverter still pass
  • Documentation in the YAML file is self-documenting through the examples

Copilot AI added 2 commits June 25, 2026 05:50
Enhance the LLM-based TranslationConverter to produce more faithful
translations by:

- Adding explicit role as a literal word-for-word translation engine
- Adding critical rules for preserving code blocks, URLs, special chars
- Adding rules against paraphrasing and synonym substitution
- Adding examples for code preservation and structural markers
- Changing markers to [TRANSLATE_START]/[TRANSLATE_END] for clarity
- Fixing parameter name from 'languages' to 'language' (singular)

Testing on 15 diverse prompts with round-trip translation (EN→DE→EN)
shows improvement from 0.900 to 0.948 average similarity (+5.3%),
with particular gains in code preservation and structural fidelity.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
The PR changed the user prompt markers from '=== begin ==='/
'=== end ===' to '[TRANSLATE_START]'/'[TRANSLATE_END]', but the
byte-for-byte regression test still asserted the old format. Update
the expected string to match the new template and remove the now-unused
dedent import.

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants