"If you can't explain it simply, you don't understand it well enough." - Richard Feynman
A physics-focused educational LLM fine-tuned with GRPO (Group Relative Policy Optimization) to teach complex physics concepts in the Feynman style: simple, intuitive, and using everyday analogies.
Special Focus: Core concepts from The Feynman Lectures on Physics, including the Ratchet and Pawl, Path Integrals, and the Electron Spring Model.
- GRPO Training: Implementation of Group Relative Policy Optimization for alignment
- Feynman-Style Physics Teaching: Trained on ~75 carefully selected physics concepts from classical mechanics to quantum field theory
- Feynman Lectures Integration: Includes classic examples like the Ratchet and Pawl, Electron Spring Model, and Path Integral formulation
- Efficient Fine-tuning: Uses LoRA (Low-Rank Adaptation) for parameter-efficient training
- Free Training Pipeline: Runs entirely on Google Colab free tier (T4 GPU)
- Gemini-Powered Data: Uses Gemini API for reward modeling and physics dialogue generation
Base Model: Qwen2.5-7B-Instruct
↓
SFT (Supervised Fine-Tuning) with LoRA
↓
GRPO (Group Relative Policy Optimization)
↓
ReFeynman Model
- Data Generation: Gemini 1.5 Flash generates Feynman-style teaching dialogues
- SFT Phase: LoRA fine-tuning on educational conversations
- GRPO Phase: Reinforcement learning with Gemini as reward model
- Evaluation: Multi-metric assessment of teaching quality
Our GRPO implementation follows the original paper:
# For each prompt:
1. Generate K responses from policy model
2. Score each response with reward model
3. Compute group-relative advantages:
advantage_i = (reward_i - mean(rewards)) / std(rewards)
4. Update policy with PPO-style objective:
L = E[min(ratio * A, clip(ratio, 1-ε, 1+ε) * A)] - β * KL
Key Components:
- Group Sampling: 4 responses per prompt for stable advantage estimation
- Gemini Reward Model: Evaluates teaching quality (clarity, analogies, accuracy)
- KL Penalty: Prevents model from deviating too far from SFT checkpoint
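The steps above can be sketched in plain Python. This is a minimal illustration, not the code in models/grpo_trainer.py: the function names, the use of population standard deviation, the small `eps` guard against a zero-variance group, and the sample rewards and log-probabilities are all assumptions made for the example. The clip range (0.2) and KL coefficient (0.05) are the values listed in this README.

```python
import math
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize each reward against the mean and std of its own group."""
    mu = mean(rewards)
    sigma = pstdev(rewards)  # assumption: population std; eps avoids division by zero
    return [(r - mu) / (sigma + eps) for r in rewards]


def clipped_objective(logp_new, logp_old, advantages, kl, clip_eps=0.2, beta=0.05):
    """PPO-style clipped surrogate averaged over the group, minus a KL penalty."""
    terms = []
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)  # pi_new / pi_old for this response
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        terms.append(min(ratio * adv, clipped * adv))
    return mean(terms) - beta * kl


# Four responses to one prompt (made-up reward scores).
rewards = [0.9, 0.6, 0.4, 0.7]
advantages = group_relative_advantages(rewards)

# Before any update the new and old policies agree, so every ratio is 1.
objective = clipped_objective(
    logp_new=[-1.0, -1.2, -0.9, -1.1],
    logp_old=[-1.0, -1.2, -0.9, -1.1],
    advantages=advantages,
    kl=0.0,
)
```

Because advantages are centered within each group, responses that score above the group mean are reinforced and those below it are discouraged, without needing a separate value network.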
- Python 3.10+
- Google Colab account (for free GPU training)
- HuggingFace account
- Gemini API key (free tier)
# Clone the repository
git clone https://github.com/SeanDF333/ReFeynman.git
cd ReFeynman
# Create conda environment
conda create -n LLM python=3.10
conda activate LLM
# Install dependencies
pip install -r requirements.txt
# Copy environment template
cp .env.example .env
# Edit .env with your credentials
# - HF_TOKEN: HuggingFace token
# - GEMINI_API_KEY: Gemini API key
- Upload notebooks/colab_training.ipynb to Google Colab
- Set runtime to GPU (T4)
- Add secrets in Colab:
  - HF_TOKEN
  - GEMINI_API_KEY
- Run all cells
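Before running anything, it can help to confirm that both credentials are actually visible to Python. The sketch below assumes they have been exported as environment variables; `missing_credentials` is a hypothetical helper, not part of this repository, and the token value is a placeholder.

```python
import os

# The two credential names this README asks for.
REQUIRED_KEYS = ("HF_TOKEN", "GEMINI_API_KEY")


def missing_credentials(env=None):
    """Return the required keys that are absent or empty in the given mapping."""
    env = os.environ if env is None else env
    return [key for key in REQUIRED_KEYS if not env.get(key)]


# Example with an explicit mapping (placeholder value, not a real token).
missing = missing_credentials({"HF_TOKEN": "hf_example"})
```

Calling `missing_credentials()` with no argument checks the current process environment instead.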
Expected Training Time:
- Data Generation: ~30 minutes (500 dialogues)
- SFT Training: ~2-3 hours
- GRPO Training: ~1-2 hours
# Interactive mode
python demo.py --checkpoint checkpoints/grpo_final
# Single question
python demo.py --question "Explain quantum entanglement simply"
ReFeynman/
├── data/
│   ├── generate_data.py         # Gemini-based data generation
│   └── train.jsonl              # Training data (generated)
├── models/
│   ├── sft_trainer.py           # Supervised fine-tuning
│   └── grpo_trainer.py          # GRPO implementation ⭐
├── evaluation/
│   └── metrics.py               # Evaluation metrics
├── notebooks/
│   └── colab_training.ipynb     # End-to-end Colab notebook
├── checkpoints/                 # Model checkpoints (gitignored)
├── config.yaml                  # Training configuration
├── demo.py                      # CLI demo
└── README.md
Question: "Can you explain quantum entanglement like I'm 10 years old?"
Before GRPO (SFT only):
Quantum entanglement is a phenomenon where particles become correlated...
After GRPO:
Imagine you have two magic coins. When you flip one and it lands on heads, the other coin INSTANTLY lands on tails - even if they're on opposite sides of the universe! That's quantum entanglement. The particles are "connected" in a spooky way, as Einstein called it...
| Metric | SFT | GRPO |
|---|---|---|
| Avg Reward | 0.62 | 0.81 |
| Clarity Score | 6.8/10 | 8.4/10 |
| Analogy Usage | 45% | 78% |
(Metrics evaluated on 100 test questions)
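As a rough illustration of how per-question scores could be rolled up into the three rows of the table, here is a small sketch. The record fields (`reward`, `clarity`, `uses_analogy`) and the `summarize` helper are hypothetical; the actual logic lives in evaluation/metrics.py and may differ. The two sample records are made up.

```python
from statistics import mean


def summarize(records):
    """Aggregate per-question judge outputs into table-style metrics."""
    return {
        "avg_reward": mean(r["reward"] for r in records),
        "clarity": mean(r["clarity"] for r in records),  # on a 0-10 scale
        # Share of answers flagged as containing an analogy, in percent.
        "analogy_usage_pct": 100.0 * sum(r["uses_analogy"] for r in records) / len(records),
    }


# Made-up records for two test questions.
records = [
    {"reward": 0.8, "clarity": 8, "uses_analogy": True},
    {"reward": 0.6, "clarity": 7, "uses_analogy": False},
]
summary = summarize(records)
```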
SFT:
- LoRA rank: 16
- Learning rate: 2e-4
- Batch size: 4 (grad accum: 4)
- Epochs: 3
GRPO:
- Samples per prompt: 4
- Learning rate: 5e-6
- KL coefficient: 0.05
- Clip range: 0.2
- Iterations: 5
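For reference, the same hyperparameters can be written as typed Python objects. The class and field names below are assumptions for illustration and do not necessarily match the keys in config.yaml; the values are the ones listed above. The derived property shows that a batch size of 4 with 4 gradient accumulation steps gives an effective batch of 16 sequences per optimizer step.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SFTHyperparams:
    lora_rank: int = 16
    learning_rate: float = 2e-4
    batch_size: int = 4
    grad_accum_steps: int = 4
    epochs: int = 3

    @property
    def effective_batch_size(self) -> int:
        # Gradients are accumulated over several micro-batches before each step.
        return self.batch_size * self.grad_accum_steps


@dataclass(frozen=True)
class GRPOHyperparams:
    samples_per_prompt: int = 4
    learning_rate: float = 5e-6
    kl_coef: float = 0.05
    clip_range: float = 0.2
    iterations: int = 5


sft = SFTHyperparams()
grpo = GRPOHyperparams()
```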
- Training: Free Google Colab T4 (16GB VRAM)
- Inference: NVIDIA GTX 1650 Ti (4GB VRAM) with 4-bit quantization
- Total Training Cost: $0 (using free tiers)
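A back-of-the-envelope estimate shows why 4-bit quantization matters on a small card. The sketch assumes a nominal 7 billion parameters and counts weights only; activations, the KV cache, and quantization metadata are not included, so real usage will be higher. `weight_memory_gib` is a helper written for this example.

```python
def weight_memory_gib(n_params: float, bits_per_param: int) -> float:
    """Memory needed to store the weights alone, in GiB."""
    return n_params * bits_per_param / 8 / 2**30


N_PARAMS = 7e9  # assumption: nominal parameter count of a 7B model

fp16_gib = weight_memory_gib(N_PARAMS, 16)  # half-precision weights
int4_gib = weight_memory_gib(N_PARAMS, 4)   # 4-bit quantized weights
```

Going from 16 bits to 4 bits per weight cuts weight storage by a factor of four, which is what brings a 7B model within reach of a 4GB GPU at all.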
- Implement ORPO/DAPO for comparison
- Multi-modal teaching (diagrams, animations)
- Adaptive difficulty based on user level
- Expand to 30+ subjects
- Fine-grained reward models (per subject)
- Gradio web interface
- Retrieval-augmented teaching (RAG)
Contributions are welcome! Please feel free to submit a Pull Request.
This project is licensed under the MIT License - see the LICENSE file for details.
- Richard Feynman for inspiring a generation of teachers
- Anthropic for Claude and research guidance
- Google for Gemini API and Colab
- HuggingFace for transformers and model hosting
- Alibaba for Qwen models
For questions or collaborations, please open an issue or contact [xiaoyanfan333@gmail.com]
⭐ If you find this project helpful, please consider starring it!