
LLaMA-Omni 2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis

Better model and token fusion strategy

Info

Comments

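  1. Read and write: the speech decoder first reads and fuses information from the incoming text tokens, then generates (writes) the speech tokens that correspond to those text tokens. Alternating the two steps lets speech synthesis start before the full text response is finished.
  2. The text-to-speech language model is initialized from Qwen2.5-0.5B.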
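
The read-and-write loop in point 1 can be sketched roughly as follows. This is only an illustration of the interleaving schedule, not the model's actual code: `read_write_stream`, `write_fn`, `dummy_write`, and the chunk sizes `read_size` and `write_size` are hypothetical names and values chosen for the example, and the fusion step is reduced to appending tokens to a shared context.

```python
from typing import Callable, Iterable, Iterator, List


def read_write_stream(
    text_tokens: Iterable[str],
    write_fn: Callable[[List[str], int], List[int]],
    read_size: int = 3,
    write_size: int = 10,
) -> Iterator[List[int]]:
    """Alternate between reading text tokens and writing speech tokens.

    After every `read_size` text tokens, call `write_fn` to produce
    `write_size` speech tokens conditioned on all text read so far.
    """
    context: List[str] = []  # all text tokens read so far (stand-in for fusion)
    pending = 0              # text tokens read since the last write
    for token in text_tokens:
        context.append(token)
        pending += 1
        if pending == read_size:
            yield write_fn(context, write_size)
            pending = 0
    if pending:
        # Flush: write once more for the leftover text tokens.
        # (Simplification: a real decoder would keep writing until it emits an end token.)
        yield write_fn(context, write_size)


def dummy_write(context: List[str], n: int) -> List[int]:
    """Hypothetical stand-in for the speech decoder: returns n fake token ids."""
    return [len(context)] * n


chunks = list(
    read_write_stream(["Hello", ",", "how", "are", "you"], dummy_write,
                      read_size=3, write_size=2)
)
print(chunks)  # [[3, 3], [5, 5]]
```

Because the generator yields after each read chunk, a downstream vocoder could start producing audio from the first group of speech tokens while later text tokens are still being generated.
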
Last updated: 2025-05-13