Next Token Is Enough: Realistic Image Quality and Aesthetic Scoring with Multimodal Large Language Model