PhD student's Thao Phuong's paper on multimodal emotion prediction from movies/music is now available on Arxiv, together with the code. AttendAffectNet uses transformers with feature-based attention to attend to the most useful features at any given time to predict the valence/arousal.
Ha Thi Phuong Thao, Balamurali B.T., Dorien Herremans, Gemma Roig, 2020. AttendAffectNet: Self-Attention based Networks for Predicting Affective Responses from Movies. arXiv:2010.11188