Abstract: Video-based facial expression recognition (FER) in the wild is a common yet challenging task. Extracting spatial and temporal features simultaneously is a common approach but may not always ...