Reinforcement Learning

Sequential decision making and credit assignment under uncertainty and partial observability are central to developing Intelligent Systems. Reinforcement Learning (RL) provides a general and powerful computational framework for sequential decision making: an agent interacts with its environment, selecting actions so as to maximize cumulative reward.
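
To make this loop concrete, the following is a minimal sketch in Python of a tabular Q-learning agent interacting with a toy chain environment. The environment, its single delayed reward, and all hyperparameters are illustrative assumptions for exposition, not a description of our methods.

    import random

    class ChainEnv:
        """Hypothetical toy environment: a chain of states where only
        reaching the final state yields a (delayed) reward."""

        def __init__(self, n_states=10):
            self.n_states = n_states
            self.state = 0

        def reset(self):
            self.state = 0
            return self.state

        def step(self, action):
            # action 1 moves right, action 0 moves left (floored at 0)
            if action == 1:
                self.state = min(self.state + 1, self.n_states - 1)
            else:
                self.state = max(self.state - 1, 0)
            done = self.state == self.n_states - 1
            reward = 1.0 if done else 0.0  # reward only at the goal
            return self.state, reward, done

    def q_learning(env, episodes=500, alpha=0.1, gamma=0.99, epsilon=0.1):
        """Tabular Q-learning: epsilon-greedy action selection plus a
        temporal-difference update of the action-value table."""
        q = [[0.0, 0.0] for _ in range(env.n_states)]
        for _ in range(episodes):
            state, done = env.reset(), False
            while not done:
                if random.random() < epsilon:
                    action = random.randrange(2)                    # explore
                else:
                    action = 0 if q[state][0] > q[state][1] else 1  # exploit
                next_state, reward, done = env.step(action)
                # Move Q(s, a) toward reward + discounted best next value.
                target = reward + gamma * max(q[next_state])
                q[state][action] += alpha * (target - q[state][action])
                state = next_state
        return q

    q = q_learning(ChainEnv())
    print(q[0])  # learned action values for the start state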

Our research at the Institute for Machine Learning focuses on developing the new algorithms and theory needed to advance the state of the art in Reinforcement Learning. Credit assignment under delayed rewards has been central to our work in recent years. We also develop new function-approximation methods for scaling Reinforcement Learning to high-dimensional problems, and we study offline Reinforcement Learning, that is, learning to make decisions from stored data. We actively apply Reinforcement Learning in areas including robotics, logistics, and natural language processing.
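
To illustrate the reward-redistribution idea behind our work on RUDDER and Align-RUDDER, here is a minimal sketch: a single delayed episodic reward is turned into dense per-step rewards by differencing the outputs of a return predictor. In RUDDER the predictor is an LSTM trained to predict the episode return; the fixed numbers below stand in for such a model and are purely illustrative.

    def redistribute_reward(predicted_returns, episode_return):
        """Redistribute one delayed episodic reward over the episode.

        The reward assigned to step t is the difference of consecutive
        return predictions, g(t) - g(t-1), so steps that raise the
        predicted return are credited immediately instead of at the end.
        """
        rewards = [predicted_returns[0]]
        rewards += [predicted_returns[t] - predicted_returns[t - 1]
                    for t in range(1, len(predicted_returns))]
        # Spread any prediction error evenly so the redistributed rewards
        # still sum to the true return (return equivalence).
        correction = (episode_return - sum(rewards)) / len(rewards)
        return [r + correction for r in rewards]

    # Hypothetical predictions: the decisive action happens at step 1,
    # but the environment pays out only at the end of the episode.
    g = [0.0, 0.9, 0.9, 0.9, 1.0]
    print(redistribute_reward(g, episode_return=1.0))
    # -> [0.0, 0.9, 0.0, 0.0, 0.1]: credit moves to the decisive step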

Recent publications in Reinforcement Learning:

  1. ICML
    History Compression via Language Models in Reinforcement Learning
    Paischer, F., Adler, T., Patil, V., Bitto-Nemling, A., Holzleitner, M., Lehner, S., Eghbal-zadeh, H., and Hochreiter, S.
    In International Conference on Machine Learning (ICML), 2022
  2. ICML
    Align-RUDDER: Learning From Few Demonstrations by Reward Redistribution
    Patil, V., Hofmarcher, M., Dinu, M., Dorfer, M., Blies, P., Brandstetter, J., Arjona-Medina, J., and Hochreiter, S.
    In International Conference on Machine Learning (ICML), 2022 (arXiv preprint arXiv:2009.14108)
  3. NeurIPS
    Modern Hopfield Networks for Return Decomposition for Delayed Rewards
    Widrich, M., Hofmarcher, M., Patil, V., Bitto-Nemling, A., and Hochreiter, S.
    In Deep RL Workshop, NeurIPS, 2021
  4. arXiv
    Understanding the Effects of Dataset Characteristics on Offline Reinforcement Learning
    Schweighofer, K., Hofmarcher, M., Dinu, M., Renz, P., Bitto-Nemling, A., Patil, V., and Hochreiter, S.
    arXiv preprint, 2021
  5. arXiv
    Convergence Proof for Actor-Critic Methods Applied to PPO and RUDDER
    Holzleitner, M., Gruber, L., Arjona-Medina, J., Brandstetter, J., and Hochreiter, S.
    arXiv preprint, 2020
  6. NeurIPS
    RUDDER: Return Decomposition for Delayed Rewards
    Arjona-Medina, J., Gillhofer, M., Widrich, M., Unterthiner, T., Brandstetter, J., and Hochreiter, S.
    In Advances in Neural Information Processing Systems (NeurIPS), 2019