When a rock band's intensity peaks, the camera cuts to close-ups. When the scene fades to slow motion, the music softens. We all sense this relationship, but can AI models actually reason about it causally?
We introduce KARMA-MV, a benchmark for causal question answering on music videos.