Reinforcement Learning

Sequential decision making and credit assignment under uncertainty and partial observability are central to developing intelligent systems. Reinforcement Learning (RL) provides a general and powerful computational framework for sequential decision making: an agent interacts with an environment and selects actions so as to maximize the reward it accumulates over time.
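This agent-environment-reward cycle can be sketched in a few lines of Python. The toy ChainEnv environment, the random policy, and the hyperparameters below are illustrative assumptions made for this page only; they do not correspond to any particular system or library mentioned here and merely show the generic interaction loop.

```python
# Minimal sketch of the RL interaction loop (illustrative assumptions only).
import random

class ChainEnv:
    """Toy chain environment: the agent starts at state 0 and receives
    reward 1.0 once it reaches the final state of the chain."""
    def __init__(self, length=5):
        self.length = length
        self.state = 0

    def reset(self):
        self.state = 0
        return self.state

    def step(self, action):
        # action 1 moves right, action 0 stays in place
        self.state = min(self.state + action, self.length - 1)
        reward = 1.0 if self.state == self.length - 1 else 0.0
        done = self.state == self.length - 1
        return self.state, reward, done

def run_episode(env, policy, max_steps=20):
    """Roll out one episode: the agent selects actions, the environment
    returns new states and rewards, and the rewards are accumulated."""
    state = env.reset()
    episode_return = 0.0
    for _ in range(max_steps):
        action = policy(state)
        state, reward, done = env.step(action)
        episode_return += reward
        if done:
            break
    return episode_return

if __name__ == "__main__":
    env = ChainEnv()
    random_policy = lambda state: random.choice([0, 1])
    print("return of a random policy:", run_episode(env, random_policy))
```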

Our research at the Institute for Machine Learning focuses on developing new algorithms and theory that advance the state of the art in Reinforcement Learning. Credit assignment under delayed rewards has been central to our work in recent years. We also develop new function approximation methods for scaling Reinforcement Learning to high-dimensional problems, and we study learning to make decisions from stored data (offline Reinforcement Learning). We actively apply Reinforcement Learning in domains such as robotics, logistics, and natural language processing.
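As a toy illustration of credit assignment under delayed rewards, the sketch below takes an episodic reward that arrives only at the final step and redistributes it uniformly over the trajectory, so that every step receives immediate credit while the per-episode return stays unchanged. This uniform scheme is a deliberately simple stand-in chosen for this page; it is not the learned return decomposition used in the RUDDER line of work listed below.

```python
def redistribute_uniformly(rewards):
    """Spread the total episodic return evenly over all time steps.

    The per-episode return is preserved; only the timing of the credit
    changes. This is a toy scheme for illustration, not a learned
    return decomposition.
    """
    total = sum(rewards)
    return [total / len(rewards)] * len(rewards)

if __name__ == "__main__":
    # Delayed reward: nothing arrives until the final step of the episode.
    delayed_rewards = [0.0, 0.0, 0.0, 0.0, 10.0]
    print(redistribute_uniformly(delayed_rewards))  # [2.0, 2.0, 2.0, 2.0, 2.0]
```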

Recent publications in Reinforcement Learning:

  1. Learning to Modulate pre-trained Models in RL
    Schmied, T., Hofmarcher, M., Paischer, F., Pascanu, R., and Hochreiter, S.
    RRL, 2023
  2. Toward Semantic History Compression for Reinforcement Learning
    Paischer, F., Adler, T., Radler, A., Hofmarcher, M., and Hochreiter, S.
    2022
  3. InfODist: Online distillation with Informative rewards improves generalization in Curriculum Learning
    Siripurapu, R., Patil, V., Schweighofer, K., Dinu, M., Schmied, T., Diez, L., Holzleitner, M., Eghbal-zadeh, H., Kopp, M., and Hochreiter, S.
    DeepRL, 2022
  4. Foundation Models for History Compression in Reinforcement Learning
    Paischer, F., Adler, T., Radler, A., Hofmarcher, M., and Hochreiter, S.
    FMDM, 2022
  5. A Dataset Perspective on Offline Reinforcement Learning
    Schweighofer, K., Radler, A., Dinu, M., Hofmarcher, M., Patil, V., Bitto-Nemling, A., Eghbal-zadeh, H., and Hochreiter, S.
    CoLLAs, 2022
  6. Reactive Exploration to Cope with Non-Stationarity in Lifelong Reinforcement Learning
    Steinparz, C., Schmied, T., Paischer, F., Dinu, M., Patil, V., Bitto-Nemling, A., Eghbal-zadeh, H., and Hochreiter, S.
    CoLLAs, 2022
  7. History Compression via Language Models in Reinforcement Learning
    Paischer, F., Adler, T., Patil, V., Bitto-Nemling, A., Holzleitner, M., Lehner, S., Eghbal-zadeh, H., and Hochreiter, S.
    In International Conference on Machine Learning (ICML), 2022
  8. Align-RUDDER: Learning From Few Demonstrations by Reward Redistribution
    Patil, V., Hofmarcher, M., Dinu, M., Dorfer, M., Blies, P., Brandstetter, J., Arjona-Medina, J., and Hochreiter, S.
    In International Conference on Machine Learning (ICML), 2022; arXiv:2009.14108
  9. Understanding the Effects of Dataset Characteristics on Offline Reinforcement Learning
    Schweighofer, K., Hofmarcher, M., Dinu, M., Renz, P., Bitto-Nemling, A., Patil, V., and Hochreiter, S.
    arXiv, 2021
  10. Modern Hopfield Networks for Return Decomposition for Delayed Rewards
    Widrich, M., Hofmarcher, M., Patil, V., Bitto-Nemling, A., and Hochreiter, S.
    In Deep RL Workshop, NeurIPS 2021
  11. Convergence Proof for Actor-Critic Methods Applied to PPO and RUDDER
    Holzleitner, M., Gruber, L., Arjona-Medina, J., Brandstetter, J., and Hochreiter, S.
    arXiv, 2020
  12. RUDDER: Return Decomposition for Delayed Rewards
    Arjona-Medina, J., Gillhofer, M., Widrich, M., Unterthiner, T., Brandstetter, J., and Hochreiter, S.
    In Advances in Neural Information Processing Systems (NeurIPS), 2019