Daily Publication | Beyond Text: Utilizing Vocal Cues to Improve Decision Making in LLMs for Robot Navigation Tasks | Better model and token fusion strategy
Daily Publication | LLaMA-Omni 2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis | Better model and token fusion strategy
Daily Publication | ConceptAttention: Diffusion Transformers Learn Highly Interpretable Features | A clever way to attend to the important region
Daily Publication | CAV-MAE Sync: Improving Contrastive Audio-Visual Mask Autoencoders via Fine-Grained Alignment | Generating temporal alignment for video-text retrieval