Daily Publication
SVLTA: Benchmarking Vision-Language Temporal Alignment via Synthetic Video Situation
Generating temporal alignment for video-text retrieval

Daily Publication
Param Δ for Direct Weight Mixing: Post-Train Large Language Model at Zero Cost
Training-free weight mixing for large language models

Daily Publication
MR. Video: “MapReduce” is the Principle for Long Video Understanding
Map for dense short clip perception and Reduce for joint aggregation

Daily Publication
Φ-Decoding: Adaptive Foresight Sampling for Balanced Inference-Time Exploration and Exploitation
Foresight sampling for better efficiency and accuracy