The making of our #attentionAI model

Thomas Z. Ramsøy

June 8, 2020

It might seem like fancy wording, but Predict's latest algorithm is based on best-in-class neuroscience and machine learning work. Here, we show exactly how we went about making the new model. This primer serves as a simplified explanation of our technical paper.

Two powers combined

Good machine learning models are not only the product of the best algorithms or the most powerful machines. The old dictum of "garbage in, garbage out" still applies here. So you need data. Good data. And when we want to predict visual attention, the best source is high-quality eye-tracking data.

In our work on making the latest machine learning model, we have tapped into one of Neurons' goldmines: the thousands of people tested both before and since our formal start in 2013. Our dataset on customer behavior covers well over 12,000 participants from around the globe, who have been tested in a variety of situations: advertising in print and digital media, social media, in-store attention, outdoor banners, website visits, packaging, apps, and software GUIs.


The second power comes from machine learning models, all built with the TensorFlow framework. It might come as a surprise, but neuroscience has been using machine learning for a long time, even before it reached its current popularity and name recognition. Due to the sheer amount of data that neuroimaging methods such as EEG and fMRI produce, we as neuroscientists had to tackle these computational challenges and opportunities long before the rest of the world caught up.

Now, these two powers have been used to full effect to make the first machine learning model that predicts visual attention.

In the following, we'll describe in more detail how we went about making the model. This is all documented in our just-released technical paper, so this post explains the approach in simpler terms.

Initial model training and testing

We set up several different machine learning models, drawing on documented advances in machine learning, model frameworks, and graphics card development. This produced more than 30 different models. Over the course of the past year, these models each ran 24/7 for weeks to months before we evaluated their performance.
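For the technically curious, here is a minimal sketch, in Python with TensorFlow/Keras, of what one such candidate model could look like: a small convolutional encoder-decoder that maps an image to an attention heat map. The architecture, layer sizes, and training settings below are purely illustrative assumptions, not the actual models described in the technical paper.

    import tensorflow as tf
    from tensorflow.keras import layers

    def build_candidate_model(input_shape=(224, 224, 3)):
        """Illustrative encoder-decoder that maps an RGB image to an attention heat map."""
        inputs = tf.keras.Input(shape=input_shape)
        # Encoder: downsample and extract visual features
        x = layers.Conv2D(32, 3, padding="same", activation="relu")(inputs)
        x = layers.MaxPooling2D()(x)
        x = layers.Conv2D(64, 3, padding="same", activation="relu")(x)
        x = layers.MaxPooling2D()(x)
        # Decoder: upsample back to the input resolution
        x = layers.Conv2DTranspose(64, 3, strides=2, padding="same", activation="relu")(x)
        x = layers.Conv2DTranspose(32, 3, strides=2, padding="same", activation="relu")(x)
        # Single-channel attention heat map with values in [0, 1]
        heatmap = layers.Conv2D(1, 1, activation="sigmoid")(x)
        model = tf.keras.Model(inputs, heatmap)
        model.compile(optimizer="adam", loss="binary_crossentropy")
        return model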

Each model was fed a random sample of images from our large data set of existing eye-tracking data and then iteratively improved its predictions. At the end of training, a second, independent selection of images with eye-tracking data was used to evaluate the model's precision. That is, we evaluated each model on a set of eye-tracking images it had not been trained on. Performance on these evaluation data was then used to decide whether a model was good enough to be taken forward.
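In code, the training and held-out evaluation could look roughly like the sketch below, re-using the hypothetical build_candidate_model from above and assuming images and eye_tracking_maps are matched NumPy arrays of stimuli and ground-truth heat maps (all names and settings here are illustrative):

    import numpy as np

    rng = np.random.default_rng(seed=42)
    n = len(images)
    indices = rng.permutation(n)
    split = int(0.8 * n)                          # e.g. 80% of images for training
    train_idx, eval_idx = indices[:split], indices[split:]

    model = build_candidate_model()
    model.fit(images[train_idx], eye_tracking_maps[train_idx],
              epochs=50, batch_size=16, verbose=0)

    # Evaluate only on eye-tracking images the model has never seen
    eval_loss = model.evaluate(images[eval_idx], eye_tracking_maps[eval_idx], verbose=0)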

Many models were rejected through this process, and we ended up with a set of 5-7 models. These models were put through the next level of tests and comparisons.

Selection of models used for the initial model comparisons. Here, the models performed decently: all showed over 80% accuracy and a low error rate (below 0.0035 standard error).

Stepping up the model competition

To better evaluate and compare each of the remaining models, we chose to embark on a new level of comparisons not seen in machine learning work. This was inspired by work in clinical sciences such as neuropsychology.

In this work, we distinguished between three types of comparisons:

  • Pixel-by-pixel (PBP) comparison -- this was possibly the most conservative measure, as it looked at the correspondence between a model's prediction and the eye-tracking data at the level of individual pixels, which demands high precision and granularity from the model (a simplified sketch of the PBP and AOI comparisons follows this list).
  • Area of Interest (AOI) comparisons -- here, we drew AOIs on the original image (i.e., without seeing the model or eye-tracking heat maps), extracted aggregated AOI values for both the model prediction and the eye-tracking data, and compared these results.
  • Interpretation (INT) comparison -- the final stage tested whether people would draw the same conclusions from the model heat map as from the eye-tracking heat map. Here, we tested 7 participants, who answered specific questions about the level of attention to a given area of the picture using a 4-point scale ("none", "a little", "moderately", and "high"). Participants were not told whether a heat map came from eye-tracking or from the AI model, and all rated both versions in a randomized manner.
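As a rough illustration of the first two comparison types, the sketch below shows how a pixel-by-pixel agreement score and aggregated AOI values could be computed. The functions, tolerance value, and AOI format are simplified assumptions of our own; the exact metrics are defined in the technical paper.

    import numpy as np

    def pixel_by_pixel_accuracy(pred, gt, tol=0.1):
        """Share of pixels where model and eye-tracking heat maps (0-1) agree within a tolerance."""
        return float(np.mean(np.abs(pred - gt) <= tol))

    def aoi_scores(heatmap, aois):
        """Aggregate (mean) heat-map value inside each AOI, given as (top, left, bottom, right) boxes."""
        return [float(heatmap[t:b, l:r].mean()) for (t, l, b, r) in aois]

    def aoi_correlation(pred, gt, aois):
        """Correlation between aggregated AOI values from the model and from the eye-tracking data."""
        p, g = np.array(aoi_scores(pred, aois)), np.array(aoi_scores(gt, aois))
        return float(np.corrcoef(p, g)[0, 1])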

Model accuracy was not only trained for sensitivity (hitting the right targets) but also specificity (ignoring the wrong targets).
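In other words, a prediction is scored both on the regions it highlights and on the regions it correctly leaves cold. A minimal sketch of how sensitivity and specificity could be computed for a heat map, treating pixels above an illustrative threshold as "attended", might look like this:

    import numpy as np

    def sensitivity_specificity(pred, gt, threshold=0.5):
        """Compare thresholded model and eye-tracking heat maps (threshold is illustrative)."""
        pred_hot, gt_hot = pred >= threshold, gt >= threshold
        tp = np.sum(pred_hot & gt_hot)       # attention correctly predicted
        tn = np.sum(~pred_hot & ~gt_hot)     # non-attended regions correctly ignored
        fn = np.sum(~pred_hot & gt_hot)      # attention the model missed
        fp = np.sum(pred_hot & ~gt_hot)      # attention the model wrongly predicted
        sensitivity = tp / (tp + fn)         # hitting the right targets
        specificity = tn / (tn + fp)         # ignoring the wrong targets
        return sensitivity, specificity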

Results: the winning formula

Of the 5-7 models we ended up comparing, two showed equal performance across image and video types and were indistinguishable on each of the measures we had imposed. Here, we present the results in a simplified manner:

General performance

The chosen model showed an accuracy of well over 90%. This means that, on average, when comparing a model prediction to eye-tracking results, the PBP comparison shows better than 90% agreement.

Distribution of model comparison scores. As these graphs show, the accuracy distribution was skewed towards the high end, meaning the model was more likely to perform well than poorly. The opposite pattern was found for the standard error, showing that errors were most likely to be very low.

AOI performance

When comparing model predictions to eye-tracking data at the AOI level, we could run more detailed analyses. These showed a significant positive relationship between what the model predicted and what the eye-tracking data showed.

A further sub-analysis focused on whether we could tell if an AOI received low or high attention. Here, we found that when using a "heat map cutoff" value of 0.62 (on a 0-1 range), the model reaches an accuracy of over 92%.
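In practice, the cutoff works like a simple classification rule, as in the sketch below. The variable names are hypothetical placeholders for the aggregated AOI values from the model and the eye-tracking data; the 0.62 cutoff is the one reported above.

    import numpy as np

    CUTOFF = 0.62  # on the 0-1 heat-map scale, as reported above

    def high_attention_labels(aoi_values, cutoff=CUTOFF):
        """Label each AOI as high attention (True) or low attention (False)."""
        return np.array(aoi_values) >= cutoff

    model_labels = high_attention_labels(model_aoi_values)        # hypothetical input
    eye_labels = high_attention_labels(eye_tracking_aoi_values)   # hypothetical input
    accuracy = float(np.mean(model_labels == eye_labels))         # reported as over 92%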

Model comparisons at the AOI level show a highly positive and significant relationship between model predictions and actual eye-tracking data (top). When breaking these data down by type of visual material, we see the same level of performance (bottom).

Interpretation comparison

The final level was to test whether people would reach the same conclusions when interpreting the results from the model heat map and the eye-tracking heat map. Here, we found a highly significant relationship between the two types of conclusions:

The distribution of errors was centered virtually at zero across all images (top). Most of the variation was within one step in either direction (e.g., an eye-tracking heat map region was rated as showing "a little" attention, while the model heat map was rated "moderate"). These variations are comparable to the individual variation we see between raters. An analysis of the relationship between the scores showed a highly significant positive relationship (ordinal logistic regression, p < 0.0001).
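For reference, this kind of analysis can be run with an ordinal (proportional-odds) logistic regression, for example via statsmodels, as in the sketch below. The rating data here are randomly generated stand-ins for the actual participant ratings, purely to show the shape of the analysis.

    import numpy as np
    from statsmodels.miscmodels.ordinal_model import OrderedModel

    # Ratings coded 0-3: "none", "a little", "moderately", "high"
    rng = np.random.default_rng(0)
    eye_codes = rng.integers(0, 4, size=60)           # stand-in eye-tracking ratings
    noise = rng.integers(-1, 2, size=60)              # disagreement of at most one step
    model_codes = np.clip(eye_codes + noise, 0, 3)    # stand-in model ratings

    # Ordinal logistic regression of model ratings on eye-tracking ratings
    result = OrderedModel(model_codes, eye_codes.reshape(-1, 1).astype(float),
                          distr="logit").fit(method="bfgs", disp=False)
    print(result.summary())  # the slope's p-value corresponds to the reported test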

What does this mean?

Taken together, these results clearly demonstrate that the model prediction is comparable to eye-tracking data on a number of counts:

  • in a pixel-by-pixel comparison of heat maps, the results are extremely similar
  • the data are virtually the same at the AOI level
  • there is no difference in how we interpret heat maps from the AI model and from eye-tracking

These results clearly demonstrate the power of #attentionAI in predicting customer attention across a variety of image types and customer attention types.

To read the full technical report, go here.

Next steps and beyond?

With the publication of this customer attention model, we are also looking ahead to the next steps for Predict models. While work is underway to improve your overall Predict experience, from a machine learning perspective some ongoing initiatives include:

  • better models for specific uses, such as phones, retail, web pages, etc.
  • predicting subsequent behaviors, such as memory
  • predicting emotional responses
  • predicting market responses

Stay tuned for more attention AI and applied neuroscience work from the Neurons team!
