Sound vs. Silence: Investigating the Impact of Audio on Video Attention Heatmaps

Neurons HQ

February 6, 2023

Main findings

There are no significant differences between eye-tracking heatmaps for videos played with and without sound.

The average difference between the two conditions is around 5%, varying with the metric, the smoothness of the heatmap, and the frame rate of the video attention heatmap.

Scope of the study

What is the role of sound in advertising videos? Previous studies have produced mixed results: some found audio-free ads to be less effective, while others found limited to no effect. These diverging results may stem from inconsistent measures and metrics, as some studies have focused on attention and others on ad or brand recall.

This study aimed to investigate whether audio in videos influences consumer attention to ads. Focusing on attention alone meant leaving out other measures such as emotional, cognitive, and memory-related responses. To this end, we used attention heatmaps generated from eye-tracking data to compare videos presented with and without audio.

The broader aim of this study was also to inform the work on creating predictive AI models for visual attention. Should the results show that audio leads to a dramatic change in visual attention, then AI modeling should include explicit decisions on whether to use videos with or without audio. In the case of few to no differences, videos with and without audio could be included in the AI model creation.

The findings are intended only to inform AI prediction of visual attention; comparable work would need to be conducted for other domains such as emotional and cognitive responses.


We collected eye-tracking data from a total of 106 participants across two conditions. Participants in one condition were exposed to videos played without sound, and participants in the other watched the same videos with sound on.

  • Participants: We recorded data from 35 people (in Denmark) watching videos played without audio and compared the heatmaps to those collected a year earlier for the same videos played with audio (two groups of 36 and 35 participants). The recordings with audio were made in two groups (in Chicago and Orlando), so we first compared the attention heatmaps between these two groups to establish a baseline for the audio/no-audio comparison.
  • Eye-tracking: All participants were exposed to each video (played either with or without audio, never both) in pseudorandomized order, with no particular task. We used a stationary eye-tracking device (screen-based Tobii Pro Nano eye-tracker) to record participants' eye movements. Using this data, we created saliency maps for each group.
  • Comparisons: We then compared the heatmaps of the first 5 seconds of exposure on a pixel-by-pixel level to answer the research question, using 3 metrics commonly used to assess the similarity of heatmaps: KL divergence, Similarity score, and Correlation coefficient.
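The three pixel-by-pixel metrics named above can be sketched as follows. This is a minimal, dependency-free illustration that follows common saliency-benchmark definitions, not necessarily the exact formulas used in the study. The function names, the `EPS` constant, and the choice of which map serves as the KL reference are assumptions made for the example.

```python
import math

EPS = 1e-12  # assumed small constant that guards against log(0) and division by zero


def _flatten_pair(a, b):
    """Flatten two 2D heatmaps (lists of rows) and check they have the same size."""
    p = [float(v) for row in a for v in row]
    q = [float(v) for row in b for v in row]
    if len(p) != len(q):
        raise ValueError("heatmaps must have the same number of pixels")
    return p, q


def _normalize(values):
    """Scale non-negative pixel values so they sum to 1 (a probability map)."""
    total = sum(values)
    if total <= 0:
        raise ValueError("heatmap must contain positive mass")
    return [v / total for v in values]


def similarity(a, b):
    """SIM: sum of pixel-wise minima of the normalized maps (1 means identical)."""
    p, q = _flatten_pair(a, b)
    p, q = _normalize(p), _normalize(q)
    return sum(min(x, y) for x, y in zip(p, q))


def kl_divergence(a, b):
    """KLD of map b from reference map a (0 means identical, larger means more different)."""
    p, q = _flatten_pair(a, b)
    p, q = _normalize(p), _normalize(q)
    return sum(x * math.log(x / (y + EPS) + EPS) for x, y in zip(p, q))


def correlation(a, b):
    """CC: Pearson correlation between the pixel values of the two maps."""
    p, q = _flatten_pair(a, b)
    mean_p, mean_q = sum(p) / len(p), sum(q) / len(q)
    cov = sum((x - mean_p) * (y - mean_q) for x, y in zip(p, q))
    var_p = sum((x - mean_p) ** 2 for x in p)
    var_q = sum((y - mean_q) ** 2 for y in q)
    if var_p == 0 or var_q == 0:
        return float("nan")  # correlation is undefined for a constant map
    return cov / math.sqrt(var_p * var_q)
```

Note that KLD is asymmetric and unbounded, while SIM and CC are bounded and symmetric, which is one reason reported differences can vary by metric.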

Key findings

We did not see a significant difference between eye-tracking heatmaps for videos played with audio and videos played without.

Box plot showing the relative difference and similarity between sound and no sound conditions. The box represents the upper and lower quartiles, the middle line indicates the median value, and the whiskers denote the minimum and maximum values. The comparisons are Correlation Coefficient (CC), Kullback-Leibler Divergence (KLD), and Similarity Score (SIM).

The study concludes that there is no need to collect more data for videos played without sound, as audio does not result in a significant bias for the attention heatmaps.

These results may depend on how "raw" or "smoothed" the heatmaps are. Each heatmap is the result of smoothing raw fixations with a Gaussian filter, and the heatmaps look slightly different depending on the characteristics of this filter. However, when we compared heatmaps with different levels of smoothness, we still found the same results.
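To illustrate how the filter width changes a heatmap, here is a minimal sketch of turning raw fixations into a smoothed map with a separable Gaussian filter. The study does not report its filter settings or implementation, so the function names, the 3-sigma truncation, the zero-padding at the frame edges, and the `sigma` values are all assumptions chosen for illustration.

```python
import math


def gaussian_kernel(sigma):
    """1D Gaussian kernel truncated at 3 sigma (assumed cutoff), normalized to sum to 1."""
    radius = max(1, int(math.ceil(3 * sigma)))
    weights = [math.exp(-(i * i) / (2.0 * sigma * sigma))
               for i in range(-radius, radius + 1)]
    total = sum(weights)
    return [w / total for w in weights]


def blur_rows(grid, kernel):
    """Convolve every row with the kernel; pixels outside the frame count as zero."""
    radius = len(kernel) // 2
    width = len(grid[0])
    out = []
    for row in grid:
        new_row = []
        for x in range(width):
            acc = 0.0
            for k, w in enumerate(kernel):
                xx = x + k - radius
                if 0 <= xx < width:
                    acc += w * row[xx]
            new_row.append(acc)
        out.append(new_row)
    return out


def transpose(grid):
    """Swap rows and columns so the row blur can be reused for the vertical pass."""
    return [list(col) for col in zip(*grid)]


def fixation_heatmap(fixations, width, height, sigma):
    """Accumulate (x, y) pixel fixations into a grid, then apply a Gaussian blur."""
    grid = [[0.0] * width for _ in range(height)]
    for x, y in fixations:
        if 0 <= x < width and 0 <= y < height:
            grid[y][x] += 1.0
    kernel = gaussian_kernel(sigma)
    horizontal = blur_rows(grid, kernel)                          # blur along x
    return transpose(blur_rows(transpose(horizontal), kernel))   # blur along y


# Hypothetical example: the same single fixation rendered at two smoothing levels.
sharp = fixation_heatmap([(10, 10)], 21, 21, sigma=2.0)
smooth = fixation_heatmap([(10, 10)], 21, 21, sigma=3.0)
```

A larger `sigma` spreads each fixation over more pixels and lowers the peak, so two heatmaps built from slightly different fixation positions overlap more and tend to score as more similar.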

A visual comparison of eye-tracking heatmaps with and without audio is provided here.

Caveats and further actions

The study did not examine eye-tracking measures tied to specific areas of interest (AOIs), such as total fixation duration (TFD) and time to first fixation (TTFF). This lies beyond the scope of the current study but is part of our future research. Additionally, this study did not consider participants' emotional or cognitive responses while viewing the videos.
