Talk by Csaba Szepesvári: Explore-exploit tradeoff

Csaba Szepesvári, a researcher at the Department of Computing Science, University of Alberta, will give a talk related (in part) to reinforcement learning, to which everyone interested is warmly welcome. The title of the talk is "Explore-exploit tradeoff: A report from the AI battlefield".

Venue: IE224.

Time: May 11, 12:30-14:00.

Abstract:

Machine learning and artificial intelligence make the news almost every day. Big data, GPUs, distributed computing, and the comeback of neural networks, among other things, make it possible to address challenges that were out of reach just a little while ago. Some predict that, as a result of these improvements, true artificial intelligence is around the corner. But
true artificial intelligence requires learning purely from interactions with the environment and intelligent, forward-looking decision making, which is the topic of reinforcement learning. In this talk, I will focus on just one of the unique challenges of reinforcement learning, the so-called exploration-exploitation problem, which arises when an agent needs to act based on
uncertain knowledge. I will start by considering the simplest, so-called finite-armed bandit setting, reviewing and motivating basic strategies such as UCB1 and Thompson sampling. I will then discuss more realistic settings, where the number of actions is enormous, or the decision is to be made based on some side-information, and the payoff is structured. The question then is
how to exploit structure, for example when the payoff is a linear function of the features of the actions. Building on the linear structure, can sparsity, as in the supervised case, speed up learning?
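To make the finite-armed setting concrete, here is a minimal sketch of the UCB1 strategy mentioned in the abstract: play each arm once, then repeatedly pull the arm maximizing the empirical mean plus an optimism bonus. The Bernoulli reward model, arm means, and horizon below are illustrative assumptions, not part of the talk.

```python
import math
import random

def ucb1(arm_means, horizon, seed=0):
    """Run UCB1 on a simulated Bernoulli bandit; return per-arm pull counts."""
    rng = random.Random(seed)
    k = len(arm_means)
    counts = [0] * k    # number of pulls of each arm
    sums = [0.0] * k    # cumulative reward of each arm

    def pull(i):
        # Simulated Bernoulli reward with mean arm_means[i] (an assumption).
        return 1.0 if rng.random() < arm_means[i] else 0.0

    for t in range(1, horizon + 1):
        if t <= k:
            i = t - 1  # initialization: try every arm once
        else:
            # Optimism in the face of uncertainty: empirical mean + bonus.
            i = max(range(k), key=lambda j: sums[j] / counts[j]
                    + math.sqrt(2.0 * math.log(t) / counts[j]))
        counts[i] += 1
        sums[i] += pull(i)
    return counts

counts = ucb1([0.3, 0.5, 0.7], horizon=5000)
# The best arm (mean 0.7) ends up receiving the bulk of the pulls.
```

The bonus term shrinks as an arm is pulled more often, so under-explored arms stay attractive early on while the empirically best arm dominates in the long run.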
Bringing back "states", I will next discuss core results and strategies for optimal exploration in full-scale reinforcement learning. As we will see, while fascinating progress has been made during the last decades, much remains open in this exciting area.