Daily Publication
CAV-MAE Sync: Improving Contrastive Audio-Visual Mask Autoencoders via Fine-Grained Alignment
Generating temporal alignment for video-text retrieval

Daily Publication
SVLTA: Benchmarking Vision-Language Temporal Alignment via Synthetic Video Situation
Generating temporal alignment for video-text retrieval

Daily Publication
Param Δ for Direct Weight Mixing: Post-Train Large Language Model at Zero Cost
Training-free weight mixing for large language models

Daily Publication
MR. Video: "MapReduce" is the Principle for Long Video Understanding
Map for dense short clip perception and Reduce for joint aggregation