A popular view in the AI community is that a sufficiently large neural network, once trained on enough data, will be able to do everything we want our AI systems to do. This idea makes sense if the world we live in has a finite number of things that are useful to learn. It does not make sense if the world is filled with infinitely many things that could be learned, and achieving different goals requires learning different things. The view that the world is big, filled with infinitely many subtle things that are useful to learn for achieving goals, is the big world hypothesis (Javed & Sutton, 2024).
Making autonomous agents that can achieve goals in big worlds poses unique challenges. An agent that interfaces with a big world inevitably encounters unforeseeable situations. Handling these situations necessitates the ability to continually adapt. Continual adaptation, in turn, requires learning algorithms that can run on the resources available to the agent throughout its lifetime, rather than on resources available only during a special training period.
Most prior work in deep reinforcement learning has not explicitly developed algorithms that allow agents to achieve goals in big worlds. In prior empirical work, a common practice is to evaluate large overparameterized agents, such as agents with millions of parameters that run on modern GPUs, on relatively simple environments, such as Atari games that can run on small embedded devices. In prior theoretical work, it is common to assume that the state space is finite and enumerable, that the state is observable, and that the optimal policy can be represented by the agent's parameterization.
There are exceptions. A handful of works have developed algorithms, benchmarks, or theoretical frameworks that are compatible with learning in big worlds. We list these papers in the relevant papers section below as background reading for the interested reader.
The purpose of this workshop is to bring together people who are already working on the problem of learning in big worlds and people who are interested in joining this endeavor. A significant portion of the earlier work on learning in big worlds has appeared at prior iterations of The Reinforcement Learning Conference, which makes the next Reinforcement Learning Conference a natural venue for this workshop.
Alex Lewandowski
Carlo D'Eramo
Esraa Elelimy
Khurram Javed
Kris De Asis
Théo Vincent
Jan Peters
Joseph Modayil
Richard S. Sutton