Daily Publication | VACE: Video Tasks within an All-in-one Framework for Creation and Editing | A unified model for video generation and editing
Daily Publication | Answer, Refuse, or Guess? Investigating Risk-Aware Decision Making in Language Models | Risk-aware decision making in language models
Daily Publication | MAS-GPT: Training LLMs to Build LLM-based Multi-Agent Systems | Train an LLM to build a multi-agent system
Daily Publication | Talking Turns: Benchmarking Audio Foundation Models on Turn-Taking Dynamics | Evaluation benchmark accepted at ICLR
Daily Publication | Why Is Spatial Reasoning Hard for VLMs? An Attention Mechanism Perspective on Focus Areas | Spatial reasoning in VLMs using attention visualization