Daily Publication | ConceptAttention: Diffusion Transformers Learn Highly Interpretable Features | A clever way to attend to the important region
Daily Publication | CAV-MAE Sync: Improving Contrastive Audio-Visual Mask Autoencoders via Fine-Grained Alignment | Generating temporal alignment for video-text retrieval
Daily Publication | SVLTA: Benchmarking Vision-Language Temporal Alignment via Synthetic Video Situation | Generating temporal alignment for video-text retrieval
Daily Publication | Param Δ for Direct Weight Mixing: Post-Train Large Language Model at Zero Cost | Training-free weight mixing for large language models