BATON: Aligning Text-to-Audio Model with Human Preference Feedback Paper • 2402.00744 • Published Feb 1, 2024
Rhythmic Foley: A Framework For Seamless Audio-Visual Alignment In Video-to-Audio Synthesis Paper • 2409.08628 • Published Sep 13, 2024
AToM: Aligning Text-to-Motion Model at Event-Level with GPT-4Vision Reward Paper • 2411.18654 • Published Nov 27, 2024
Metis: A Foundation Speech Generation Model with Masked Generative Pre-training Paper • 2502.03128 • Published Feb 5, 2025
NVSpeech: An Integrated and Scalable Pipeline for Human-Like Speech Modeling with Paralinguistic Vocalizations Paper • 2508.04195 • Published Aug 6, 2025