
Param Δ for Direct Weight Mixing: Post-Train Large Language Model at Zero Cost

Training-free weight mixing for large language models


Param Δ is a new post-training method for large language models. Instead of re-running fine-tuning, it computes the weight difference between a post-trained model and its base model, ΔΘ = Θ_post − Θ_base, and adds it directly to the weights of a new base model: Θ_ParamΔ = Θ'_base + ΔΘ. The mixed model achieves performance comparable to conventional post-training without any additional training.

Method
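As a rough illustration of the weight-mixing step above, here is a minimal sketch that merges model state dicts, assuming all three checkpoints share the same architecture and parameter names. The function name and model identifiers are hypothetical, not part of the original post; real checkpoints may also need dtype and device handling.

```python
import torch
from transformers import AutoModelForCausalLM

def param_delta_merge(base_sd, post_sd, new_base_sd):
    """Direct weight mixing:
    ΔΘ = Θ_post − Θ_base, then Θ_ParamΔ = Θ'_base + ΔΘ.
    All inputs are state dicts with identical keys and tensor shapes."""
    merged = {}
    for name, new_base_w in new_base_sd.items():
        delta = post_sd[name] - base_sd[name]  # post-training weight delta
        merged[name] = new_base_w + delta      # apply the delta to the new base
    return merged

# Hypothetical usage (model names are illustrative placeholders):
# base  = AutoModelForCausalLM.from_pretrained("org/base-v1").state_dict()
# post  = AutoModelForCausalLM.from_pretrained("org/base-v1-instruct").state_dict()
# newer = AutoModelForCausalLM.from_pretrained("org/base-v2").state_dict()
# merged_sd = param_delta_merge(base, post, newer)
```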

Last updated: 2025-05-03