Daily Publication | MR. Video: “MapReduce” is the Principle for Long Video Understanding | Map for dense short-clip perception and Reduce for joint aggregation
Daily Publication | Φ-Decoding: Adaptive Foresight Sampling for Balanced Inference-Time Exploration and Exploitation | Foresight sampling for better efficiency and accuracy
Daily Publication | Memory-enhanced Retrieval Augmentation for Long Video Understanding | Train a memory model for long video understanding
Daily Publication | VACE: Video Tasks within an All-in-one Framework for Creation and Editing | A unified model for video generation and editing
Daily Publication | Answer, Refuse, or Guess? Investigating Risk-Aware Decision Making in Language Models | Train an LLM to build a multi-agent system