Carnegie Mellon University

July 23, 2018

Multi-armed bandits: using algorithms for sequential decision-making

By Lucas Grasha

How do online marketers know what to advertise to you? It’s not by magic—it’s with an algorithm.

ECE Assistant Professor Gauri Joshi is investigating multi-armed bandit algorithms to make data-driven inference and decision-making faster and more efficient. She recently received the Berkman Faculty Development Award to aid her research and acquire real-world data sets to validate her findings.

The term “multi-armed bandit” comes from an analogy in which a gambler faces a row of slot machines and must decide how many times to play each machine to reap the maximum reward. Multi-armed bandit algorithms work by exploring the available actions and estimating which ones yield the highest reward. “For example, suppose a company wants to market a product to a population with unknown demographics,” Joshi said. The company then uses an algorithm to figure out which demographics respond best to which products and adjusts its advertising accordingly.
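To make the explore-versus-exploit trade-off concrete, here is a minimal sketch of one classic bandit strategy, epsilon-greedy (an illustrative textbook example, not Joshi's algorithm). The arm means and parameter values below are hypothetical stand-ins for products with unknown click-through rates.

```python
import random

def epsilon_greedy(true_means, n_rounds=10_000, epsilon=0.1, seed=0):
    """Epsilon-greedy bandit: with probability epsilon pull a random arm
    (explore); otherwise pull the arm with the best estimated mean (exploit)."""
    rng = random.Random(seed)
    n_arms = len(true_means)
    counts = [0] * n_arms        # how many times each arm was pulled
    estimates = [0.0] * n_arms   # running mean reward per arm

    for _ in range(n_rounds):
        if rng.random() < epsilon:
            arm = rng.randrange(n_arms)                            # explore
        else:
            arm = max(range(n_arms), key=lambda a: estimates[a])   # exploit
        reward = 1.0 if rng.random() < true_means[arm] else 0.0    # Bernoulli reward
        counts[arm] += 1
        estimates[arm] += (reward - estimates[arm]) / counts[arm]  # incremental mean

    return counts, estimates

# Three hypothetical "products" with unknown click-through rates;
# pulls concentrate on the best one (true mean 0.7) over time.
counts, estimates = epsilon_greedy([0.3, 0.5, 0.7])
```

After enough rounds, the estimates converge toward the true means, and almost all pulls go to the best arm, which is exactly the behavior a marketer wants when allocating an ad budget across options.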

Currently, however, most work on multi-armed bandit algorithms assumes that the rewards obtained from different choices are independent of each other. “In reality, many choices result in correlated rewards,” Joshi said. Along with Ph.D. student Samarth Gupta and Prof. Osman Yagan, Joshi is designing algorithms that use the correlation between arms to reduce the need for exploration. This change will enable the algorithm to find the choice that yields the maximum reward with the minimum number of samples.
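The article does not detail the team's algorithms, but the general idea of exploiting correlation can be sketched with a toy model: assume every arm's mean reward is a known function of one hidden parameter, so samples from a single arm also inform the others. The reward curves and parameter values here are purely illustrative assumptions.

```python
import random

# Toy correlated-bandit sketch (illustrative only, not the authors' method).
# Assumption: each arm's mean reward is a KNOWN function of a hidden theta,
# so estimating theta from one arm ranks every arm.
ARM_MEANS = [lambda t: t,          # arm 0: mean = theta
             lambda t: 1.0 - t,    # arm 1: mean = 1 - theta
             lambda t: t / 2.0]    # arm 2: mean = theta / 2

def correlated_bandit(theta=0.8, n_explore=500, n_rounds=5000, seed=1):
    rng = random.Random(seed)
    pull = lambda arm: 1.0 if rng.random() < ARM_MEANS[arm](theta) else 0.0

    # Explore: sample ONLY arm 0. Its mean equals theta, so averaging its
    # rewards estimates theta directly.
    theta_hat = sum(pull(0) for _ in range(n_explore)) / n_explore

    # Exploit: the shared parameter gives a mean estimate for every arm
    # without pulling it -- correlation replaces per-arm exploration.
    best = max(range(len(ARM_MEANS)), key=lambda a: ARM_MEANS[a](theta_hat))
    n_exploit = n_rounds - n_explore
    avg_reward = sum(pull(best) for _ in range(n_exploit)) / n_exploit
    return best, theta_hat, avg_reward

best, theta_hat, avg_reward = correlated_bandit()
```

Because a few hundred samples of one arm pin down the shared parameter, the sketch identifies the best arm without separately exploring each one, which is the sample-efficiency gain correlation makes possible.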

The award lets Joshi test and validate her algorithms on real-world datasets. She envisions applications in marketing and recommendation systems, medical diagnosis, and scheduling in cloud computing.