SeanDF333/ReFeynman

ReFeynman 🎓⚛️

"If you can't explain it simply, you don't understand it well enough." - Richard Feynman

A physics-focused educational LLM fine-tuned with GRPO (Group Relative Policy Optimization) to teach complex physics concepts in the Feynman style: simple, intuitive, and using everyday analogies.

Special Focus: Core concepts from The Feynman Lectures on Physics, including the Ratchet and Pawl, Path Integrals, and the Electron Spring Model.

License: MIT | Python 3.10+ | PyTorch

🌟 Features

  • GRPO Training: Implementation of Group Relative Policy Optimization for alignment
  • Feynman-Style Physics Teaching: Trained on ~75 carefully selected physics concepts from classical mechanics to quantum field theory
  • Feynman Lectures Integration: Includes classic examples like the Ratchet and Pawl, Electron Spring Model, and Path Integral formulation
  • Efficient Fine-tuning: Uses LoRA (Low-Rank Adaptation) for parameter-efficient training
  • Free Training Pipeline: Runs entirely on Google Colab free tier (T4 GPU)
  • Gemini-Powered Data: Uses Gemini API for reward modeling and physics dialogue generation
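
To make "parameter-efficient" concrete, here is a rough sketch that counts the trainable parameters LoRA adds to a single weight matrix. The rank of 16 comes from the SFT hyperparameters listed later in this README; the 3584 x 3584 projection shape and the helper name `lora_param_count` are assumptions chosen for illustration, not values read from the repository.

```python
def lora_param_count(d_out: int, d_in: int, rank: int) -> int:
    """Trainable parameters LoRA adds to one d_out x d_in weight matrix.

    LoRA freezes W and learns a low-rank update B @ A, where
    B is d_out x rank and A is rank x d_in.
    """
    return rank * (d_out + d_in)


# Assumed shape for illustration: a square 3584 x 3584 projection.
d_out = d_in = 3584
rank = 16  # LoRA rank listed under the SFT hyperparameters

full = d_out * d_in
lora = lora_param_count(d_out, d_in, rank)
print(f"full matrix: {full:,} params")
print(f"LoRA update: {lora:,} params ({100 * lora / full:.2f}% of full)")
```

Under these assumptions the adapter trains 114,688 parameters for that matrix, a little under 1% of the 12,845,056 in the frozen weight.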

🏗️ Architecture

Base Model: Qwen2.5-7B-Instruct
    ↓
SFT (Supervised Fine-Tuning) with LoRA
    ↓
GRPO (Group Relative Policy Optimization)
    ↓
ReFeynman Model

Training Pipeline

  1. Data Generation: Gemini 1.5 Flash generates Feynman-style teaching dialogues
  2. SFT Phase: LoRA fine-tuning on educational conversations
  3. GRPO Phase: Reinforcement learning with Gemini as reward model
  4. Evaluation: Multi-metric assessment of teaching quality

📊 GRPO Implementation

Our GRPO implementation follows the formulation introduced in the DeepSeekMath paper (see References):

# For each prompt:
1. Generate K responses from policy model
2. Score each response with reward model
3. Compute group-relative advantages:
   advantage_i = (reward_i - mean(rewards)) / std(rewards)
4. Update policy with PPO-style objective:
   L = E[min(ratio * A, clip(ratio, 1-ε, 1+ε) * A)] - β * KL
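
Step 3 can be sketched in a few lines of plain Python. This is a minimal illustration, not the code in `grpo_trainer.py`: the function name, the use of the population standard deviation, and the small `eps` term are all assumptions added here.

```python
from statistics import mean, pstdev


def group_relative_advantages(rewards, eps=1e-8):
    """Normalize one prompt's rewards to zero mean and unit variance.

    eps is an added safeguard (an assumption, not part of the steps above)
    so that a group with identical rewards yields zeros instead of a
    division by zero.
    """
    mu = mean(rewards)
    sigma = pstdev(rewards)  # population std over the K samples
    return [(r - mu) / (sigma + eps) for r in rewards]


# K = 4 hypothetical reward-model scores for one prompt
rewards = [0.9, 0.6, 0.7, 0.4]
advantages = group_relative_advantages(rewards)
print([round(a, 2) for a in advantages])
```

Responses scored above the group mean get a positive advantage and those below get a negative one, so no separate value network is needed to provide a baseline.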

Key Components:

  • Group Sampling: 4 responses per prompt for stable advantage estimation
  • Gemini Reward Model: Evaluates teaching quality (clarity, analogies, accuracy)
  • KL Penalty: Prevents model from deviating too far from SFT checkpoint
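
The clipped objective and KL penalty can be sketched as follows. To stay short, this version assumes sequence-level log-probabilities and a KL estimate computed elsewhere; `grpo_objective` is a hypothetical name, and the defaults for `clip_eps` and `beta` are taken from the GRPO hyperparameters listed in this README.

```python
import math


def grpo_objective(logp_new, logp_old, advantages, kl, clip_eps=0.2, beta=0.05):
    """Average clipped surrogate over a group, minus a KL penalty.

    logp_new / logp_old: per-response log-probabilities under the current
    and sampling policies (sequence-level here for brevity).
    kl: a precomputed KL estimate against the SFT reference policy.
    """
    terms = []
    for lp_new, lp_old, adv in zip(logp_new, logp_old, advantages):
        ratio = math.exp(lp_new - lp_old)
        clipped = max(1.0 - clip_eps, min(ratio, 1.0 + clip_eps))
        # Taking the minimum keeps the update pessimistic in both directions.
        terms.append(min(ratio * adv, clipped * adv))
    return sum(terms) / len(terms) - beta * kl


# A response whose probability rose sharply (ratio = e) with advantage +1
# contributes only the clipped value 1 + clip_eps.
capped = grpo_objective([0.0], [-1.0], [1.0], kl=0.0)
print(capped)
```

The quantity is maximized during training; a trainer built on an optimizer that minimizes would negate it to obtain a loss.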

🚀 Quick Start

Prerequisites

  • Python 3.10+
  • Google Colab account (for free GPU training)
  • HuggingFace account
  • Gemini API key (free tier)

Installation

# Clone the repository
git clone https://github.com/SeanDF333/ReFeynman.git
cd ReFeynman

# Create conda environment
conda create -n LLM python=3.10
conda activate LLM

# Install dependencies
pip install -r requirements.txt

Configuration

# Copy environment template
cp .env.example .env

# Edit .env with your credentials
# - HF_TOKEN: HuggingFace token
# - GEMINI_API_KEY: Gemini API key
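
For reference, a script could read the two credentials from `.env` along these lines. This is a standard-library sketch under the assumption that the file holds plain `KEY=VALUE` lines; `parse_env` and `require_credentials` are hypothetical helpers, and the repository may load its configuration differently.

```python
def parse_env(text):
    """Parse simple KEY=VALUE lines, skipping blanks and # comments."""
    values = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("'\"")
    return values


def require_credentials(values, names=("HF_TOKEN", "GEMINI_API_KEY")):
    """Return the named credentials, failing early if any is missing."""
    missing = [name for name in names if not values.get(name)]
    if missing:
        raise RuntimeError(f"Missing credentials: {', '.join(missing)}")
    return {name: values[name] for name in names}


# Example with inline text; a real script would read the .env file instead.
example = "# credentials\nHF_TOKEN=hf_example\nGEMINI_API_KEY='gm_example'\n"
creds = require_credentials(parse_env(example))
print(sorted(creds))
```

Failing early on a missing key avoids discovering the problem partway through a long training run.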

Training (on Colab)

  1. Upload notebooks/colab_training.ipynb to Google Colab
  2. Set runtime to GPU (T4)
  3. Add secrets in Colab:
    • HF_TOKEN
    • GEMINI_API_KEY
  4. Run all cells
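
Inside a notebook, the secrets added in step 3 can be read with Colab's `userdata` module. The sketch below falls back to environment variables when it runs outside Colab; `get_secret` is a hypothetical helper, and the actual notebook may read its credentials another way.

```python
import os


def get_secret(name):
    """Read a secret from Colab's Secrets panel, else from the environment."""
    try:
        from google.colab import userdata  # only importable inside Colab
    except ImportError:
        userdata = None
    if userdata is not None:
        return userdata.get(name)
    value = os.environ.get(name)
    if value is None:
        raise KeyError(f"{name} is not set")
    return value
```

In Colab, each secret also needs notebook access enabled in the Secrets panel before `userdata.get` will return it.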

Expected Training Time:

  • Data Generation: ~30 minutes (500 dialogues)
  • SFT Training: ~2-3 hours
  • GRPO Training: ~1-2 hours

Local Testing

# Interactive mode
python demo.py --checkpoint checkpoints/grpo_final

# Single question
python demo.py --question "Explain quantum entanglement simply"
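
The two invocations above suggest a command-line interface roughly like the following. This is an illustrative sketch, not the contents of `demo.py`: the default checkpoint path and the helper names are assumptions, inferred from the second command running without `--checkpoint`.

```python
import argparse


def build_parser():
    parser = argparse.ArgumentParser(description="ReFeynman CLI demo (sketch)")
    parser.add_argument("--checkpoint", default="checkpoints/grpo_final",
                        help="path to the trained checkpoint (assumed default)")
    parser.add_argument("--question", default=None,
                        help="ask one question and exit; omit for interactive mode")
    return parser


def resolve_mode(argv):
    """Return ('single' | 'interactive', parsed args) for an argument list."""
    args = build_parser().parse_args(argv)
    mode = "single" if args.question else "interactive"
    return mode, args


mode, args = resolve_mode(["--question", "Explain quantum entanglement simply"])
print(mode, args.checkpoint)
```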

📁 Project Structure

ReFeynman/
├── data/
│   ├── generate_data.py       # Gemini-based data generation
│   └── train.jsonl            # Training data (generated)
├── models/
│   ├── sft_trainer.py         # Supervised fine-tuning
│   └── grpo_trainer.py        # GRPO implementation ⭐
├── evaluation/
│   └── metrics.py             # Evaluation metrics
├── notebooks/
│   └── colab_training.ipynb   # End-to-end Colab notebook
├── checkpoints/               # Model checkpoints (gitignored)
├── config.yaml                # Training configuration
├── demo.py                    # CLI demo
└── README.md

🎯 Results

Sample Outputs

Question: "Can you explain quantum entanglement like I'm 10 years old?"

Before GRPO (SFT only):

Quantum entanglement is a phenomenon where particles become correlated...

After GRPO:

Imagine you have two magic coins. When you flip one and it lands on heads, the other coin INSTANTLY lands on tails - even if they're on opposite sides of the universe! That's quantum entanglement. The particles are "connected" in a spooky way, as Einstein called it...

Training Metrics

| Metric        | SFT    | GRPO   |
|---------------|--------|--------|
| Avg Reward    | 0.62   | 0.81   |
| Clarity Score | 6.8/10 | 8.4/10 |
| Analogy Usage | 45%    | 78%    |

(Metrics evaluated on 100 test questions)
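
To read the table as relative changes, a short calculation over the reported numbers is enough. The values are copied from the table above; `relative_gain` is a helper introduced here for illustration.

```python
results = {
    "Avg Reward": (0.62, 0.81),
    "Clarity Score": (6.8, 8.4),
    "Analogy Usage (%)": (45.0, 78.0),
}


def relative_gain(before, after):
    """Relative change from before to after, as a percentage."""
    return 100.0 * (after - before) / before


for name, (sft, grpo) in results.items():
    print(f"{name}: {sft} -> {grpo} ({relative_gain(sft, grpo):+.1f}%)")
```

This gives roughly +30.6% for average reward, +23.5% for clarity, and +73.3% for analogy usage, the last of which is a gain of 33 percentage points in absolute terms.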

🛠️ Technical Details

Hyperparameters

SFT:

  • LoRA rank: 16
  • Learning rate: 2e-4
  • Batch size: 4 (grad accum: 4)
  • Epochs: 3
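
These settings imply an effective batch size and a rough number of optimizer steps. The arithmetic below assumes all 500 generated dialogues are used for SFT, which this README does not state; the exact step count also depends on how the trainer handles the last partial batch.

```python
import math

batch_size = 4
grad_accum = 4
epochs = 3
num_examples = 500  # assumption: every generated dialogue is used for SFT

effective_batch = batch_size * grad_accum
steps_per_epoch = math.ceil(num_examples / effective_batch)
total_steps = steps_per_epoch * epochs
print(effective_batch, steps_per_epoch, total_steps)
```

Under that assumption, each optimizer step sees 16 examples, giving about 32 steps per epoch and about 96 steps in total.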

GRPO:

  • Samples per prompt: 4
  • Learning rate: 5e-6
  • KL coefficient: 0.05
  • Clip range: 0.2
  • Iterations: 5

Compute Requirements

  • Training: Free Google Colab T4 (16GB VRAM)
  • Inference: NVIDIA GTX 1650 Ti (4GB VRAM) with 4-bit quantization
  • Total Training Cost: $0 (using free tiers)
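
A back-of-the-envelope estimate shows why 4-bit quantization matters for these memory budgets. The sketch counts weight storage only and treats the model as a nominal 7 billion parameters, which is an assumption; the KV cache, activations, and quantization metadata add overhead on top.

```python
def weight_memory_gib(num_params, bits_per_param):
    """Memory for the weights alone, in GiB."""
    return num_params * bits_per_param / 8 / 2**30


params = 7e9  # nominal "7B"; the exact parameter count is an assumption
for bits in (16, 4):
    print(f"{bits}-bit weights: {weight_memory_gib(params, bits):.1f} GiB")
```

With these numbers the weights take about 13.0 GiB at 16-bit and about 3.3 GiB at 4-bit, so a 4GB card leaves little headroom and long contexts may not fit.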

🔮 Future Work

  • Implement ORPO/DAPO for comparison
  • Multi-modal teaching (diagrams, animations)
  • Adaptive difficulty based on user level
  • Expand to 30+ subjects
  • Fine-grained reward models (per subject)
  • Gradio web interface
  • Retrieval-augmented teaching (RAG)

📚 References

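  1. Shao et al., "DeepSeekMath: Pushing the Limits of Mathematical Reasoning in Open Language Models" (introduces Group Relative Policy Optimization, GRPO)
  2. Hu et al., "LoRA: Low-Rank Adaptation of Large Language Models"
  3. Qwen Team, "Qwen2.5 Technical Report"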

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

  • Richard Feynman for inspiring a generation of teachers
  • Anthropic for Claude and research guidance
  • Google for Gemini API and Colab
  • HuggingFace for transformers and model hosting
  • Alibaba for Qwen models

📧 Contact

For questions or collaborations, please open an issue or contact xiaoyanfan333@gmail.com.


⭐ If you find this project helpful, please consider starring it!
