2025
Daily arXiv Papers - 2025-08-27
Daily arXiv Papers - 2025-08-26
Daily arXiv Papers - 2025-08-25
Daily arXiv Papers - 2025-08-22
Daily arXiv Papers - 2025-08-21
Daily arXiv Papers - 2025-08-20
Daily arXiv Papers - 2025-08-19
Daily arXiv Papers - 2025-08-18
Daily arXiv Papers - 2025-08-15
Daily arXiv Papers - 2025-08-14
Daily arXiv Papers - 2025-08-13
Daily arXiv Papers - 2025-08-12
Daily arXiv Papers - 2025-08-11
Daily arXiv Papers - 2025-08-08
Daily arXiv Papers - 2025-08-07
Daily arXiv Papers - 2025-08-06
Daily arXiv Papers - 2025-08-05
Daily arXiv Papers - 2025-08-04
Daily arXiv Papers - 2025-08-01
Daily arXiv Papers - 2025-07-31
Daily arXiv Papers - 2025-07-30
Daily arXiv Papers - 2025-07-29
Daily arXiv Papers - 2025-07-28
Daily arXiv Papers - 2025-07-25
Daily arXiv Papers - 2025-07-24
Daily arXiv Papers - 2025-07-23
Daily arXiv Papers - 2025-07-22
Daily arXiv Papers - 2025-07-21
Daily arXiv Papers - 2025-07-16
Seed1.5-VL Technical Report

Talk With Human-like Agents: Empathetic Dialogue Through Perceptible Acoustic Reception and Reaction

Beyond Text: Utilizing Vocal Cues to Improve Decision Making in LLMs for Robot Navigation Tasks

LLaMA-Omni 2: LLM-based Real-time Spoken Chatbot with Autoregressive Streaming Speech Synthesis

CAVA: Comprehensive Assessment for Voice Assistants

ConceptAttention: Diffusion Transformers Learn Highly Interpretable Features

CAV-MAE Sync: Improving Contrastive Audio-Visual Mask Autoencoders via Fine-Grained Alignment

SVLTA: Benchmarking Vision-Language Temporal Alignment via Synthetic Video Situation

Param Δ for Direct Weight Mixing: Post-Train Large Language Model at Zero Cost

Kimi-Audio Technical Report

MR. Video: “MapReduce” is the Principle for Long Video Understanding

Φ-Decoding: Adaptive Foresight Sampling for Balanced Inference-Time Exploration and Exploitation

Memory-enhanced Retrieval Augmentation for Long Video Understanding

VACE: Video Tasks within an All-in-one Framework for Creation and Editing

Answer, Refuse, or Guess? Investigating Risk-Aware Decision Making in Language Models

MAS-GPT: Training LLMs to Build LLM-based Multi-Agent Systems

Talking Turns: Benchmarking Audio Foundation Models on Turn-Taking Dynamics

Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas

LeCun's talk on Advanced AI

OPTISHEAR: Towards Efficient and Adaptive Pruning of Large Language Models via Evolutionary Optimization

Reconstruction vs. Generation: Taming Optimization Dilemma in Latent Diffusion Models

Knowledge Bridger: Towards Training-Free Missing Modality Completion
