Daily Publication ConceptAttention: Diffusion Transformers Learn Highly Interpretable Features A clever way to attend to the important region
Daily Publication CAV-MAE Sync: Improving Contrastive Audio-Visual Mask Autoencoders via Fine-Grained Alignment Generating temporal alignment for video-text retrieval
Daily Publication SVLTA: Benchmarking Vision-Language Temporal Alignment via Synthetic Video Situation Generating temporal alignment for video-text retrieval
Daily Publication Param Δ for Direct Weight Mixing: Post-Train Large Language Model at Zero Cost Training-free weight mixing for large language models