
Talk With Human-like Agents: Empathetic Dialogue Through Perceptible Acoustic Reception and Reaction

Better model and token fusion strategy


Method

  1. They first use a pre-trained Q-former and a GPT-2 decoder to “caption” the speech, i.e., produce a textual description of how it is spoken (see the captioning sketch after this list).
  2. They use the TextrolSpeech dataset, which consists of 236,220 pairs of captions and their corresponding speech samples.
  3. PerceptiveAgent is evaluated along two axes: cognitive empathy, measured with BERTScore on the generated text (computed over MELD), and affective empathy, measured by the accuracy of an expressive style classifier on the synthesized audio (a sketch of both metrics follows this list).
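
Below is a minimal sketch of the speech-captioning idea in step 1: a small set of learnable queries cross-attends to acoustic features (a Q-Former-style bridge) and the result is fed as a soft prefix into a GPT-2 decoder. This is not the paper's implementation; the feature dimension `SPEECH_DIM`, the query count `NUM_QUERIES`, and the dummy acoustic features are all illustrative assumptions.

```python
import torch
import torch.nn as nn
from transformers import GPT2LMHeadModel, GPT2Tokenizer

SPEECH_DIM, NUM_QUERIES = 512, 32  # assumed acoustic-feature size and number of query tokens

class SpeechQFormerBridge(nn.Module):
    """Learnable queries cross-attend to speech features, then map into GPT-2's embedding space."""
    def __init__(self, gpt2_dim: int):
        super().__init__()
        self.queries = nn.Parameter(torch.randn(NUM_QUERIES, SPEECH_DIM))
        self.cross_attn = nn.MultiheadAttention(SPEECH_DIM, num_heads=8, batch_first=True)
        self.proj = nn.Linear(SPEECH_DIM, gpt2_dim)

    def forward(self, speech_feats: torch.Tensor) -> torch.Tensor:
        # speech_feats: (batch, time, SPEECH_DIM) from some frozen acoustic encoder
        q = self.queries.unsqueeze(0).expand(speech_feats.size(0), -1, -1)
        fused, _ = self.cross_attn(q, speech_feats, speech_feats)
        return self.proj(fused)  # (batch, NUM_QUERIES, gpt2_dim) soft prefix tokens

tokenizer = GPT2Tokenizer.from_pretrained("gpt2")
gpt2 = GPT2LMHeadModel.from_pretrained("gpt2")
bridge = SpeechQFormerBridge(gpt2.config.n_embd)

speech_feats = torch.randn(1, 100, SPEECH_DIM)  # dummy acoustic features: 1 utterance, 100 frames
prefix = bridge(speech_feats)                   # speech information as a soft prompt

# Greedy decoding: feed the speech prefix as input embeddings and extend with generated tokens.
generated, embeds = [], prefix
for _ in range(20):
    logits = gpt2(inputs_embeds=embeds).logits[:, -1, :]
    next_id = logits.argmax(dim=-1, keepdim=True)
    generated.append(next_id.item())
    embeds = torch.cat([embeds, gpt2.transformer.wte(next_id)], dim=1)

print(tokenizer.decode(generated))  # intended as a caption of the speaking style (gibberish until trained)
```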
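And a minimal sketch of the two evaluation signals in step 3. BERTScore is computed with the `bert_score` package; the candidate/reference strings here are placeholders rather than MELD data, and `affective_accuracy` stands in for scoring a real expressive-style classifier on synthesized audio.

```python
from bert_score import score  # pip install bert-score

# Cognitive empathy: BERTScore between generated responses and references (MELD in the paper).
candidates = ["I'm so sorry to hear that, are you okay?"]
references = ["Oh no, I'm sorry. Is everything alright?"]
P, R, F1 = score(candidates, references, lang="en")
print(f"BERTScore F1: {F1.mean().item():.4f}")

# Affective empathy: accuracy of an expressive-style classifier on the synthesized audio.
# The style labels would come from a speech-emotion classifier; dummy labels are used here.
def affective_accuracy(predicted_styles, target_styles):
    correct = sum(p == t for p, t in zip(predicted_styles, target_styles))
    return correct / len(target_styles)

print(affective_accuracy(["sad", "happy"], ["sad", "neutral"]))  # 0.5
```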