This script contains an implementation of attention: https://github.com/saikrishnarallabandi/falkon/blob/master/tasks/speech/self_assessed_affect/baseline/local/seqmodels/baseline_attention.py

The data balancing approach suggested in https://www.isca-speech.org/archive/Interspeech_2018/pdfs/1610.pdf (Interspeech 2018) seems interesting.
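The paper's exact balancing scheme isn't spelled out in this note, so as a placeholder here is a minimal sketch of a common baseline, random oversampling: minority classes are duplicated at random until every class matches the majority count. The function name `oversample` and its interface are my own, not from the paper or the linked script.

```python
import random
from collections import Counter

def oversample(samples, labels, seed=0):
    """Randomly duplicate minority-class items until every class
    has as many items as the majority class (hypothetical helper)."""
    rng = random.Random(seed)
    counts = Counter(labels)
    target = max(counts.values())
    # Group samples by their class label
    by_class = {}
    for x, y in zip(samples, labels):
        by_class.setdefault(y, []).append(x)
    out_x, out_y = [], []
    for y, xs in by_class.items():
        # Keep all originals, then pad with random repeats up to `target`
        padded = xs + [rng.choice(xs) for _ in range(target - len(xs))]
        out_x.extend(padded)
        out_y.extend([y] * target)
    return out_x, out_y
```

This would run once on the training split before batching; the dev/test sets stay untouched so evaluation still reflects the natural class distribution.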