Accidental exploration through value predictors


Title: Accidental exploration through value predictors
Authors: Kisielewski, Tomasz; Leśniak, Damian
Publication date: 2018
Keywords: reinforcement learning; value predictors; exploration
Language: English
Content provider: BazTech
Type: Article


Infinite trajectory length is an almost universal assumption in the theoretical foundations of reinforcement learning. In practice, learning occurs on finite trajectories. In this paper we examine one specific consequence of this disparity: a strong bias of the time-bounded every-visit Monte Carlo value estimator. This bias produces vastly different learning dynamics for algorithms that use value predictors, and it can either encourage or discourage exploration. We investigate these claims theoretically for a one-dimensional random walk and empirically on a number of simple environments. We use GAE (generalized advantage estimation) as an algorithm involving a value predictor, and evolution strategies as a reference point.
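The bias described in the abstract can be illustrated with a small sketch. This is not the authors' experimental setup. It assumes a symmetric one-dimensional random walk started at 0, a constant reward on every step, a fixed episode length `horizon`, and a discount `gamma`; the function name `every_visit_mc` and all parameter values are hypothetical. With a constant reward, every state has the same infinite-horizon value, `reward / (1 - gamma)`, so any difference between states in the estimates comes from truncating the trajectories.

```python
import random
from collections import defaultdict


def every_visit_mc(reward, horizon=50, episodes=2000, gamma=0.99, seed=0):
    """Every-visit Monte Carlo value estimates on time-bounded random walks.

    Assumed toy setup: the walk starts at 0, moves -1 or +1 with equal
    probability, and yields the same `reward` on each of `horizon` steps.
    """
    rng = random.Random(seed)
    totals = defaultdict(float)  # sum of observed returns per state
    counts = defaultdict(int)    # number of visits per state

    for _ in range(episodes):
        # Roll out one finite trajectory and record the visited states.
        state = 0
        visited = []
        for _ in range(horizon):
            visited.append(state)
            state += rng.choice((-1, 1))

        # Walk backwards to accumulate the truncated discounted return.
        g = 0.0
        for s in reversed(visited):
            g = reward + gamma * g
            totals[s] += g
            counts[s] += 1

    return {s: totals[s] / counts[s] for s in totals}


# Same walk, opposite reward signs.
v_pos = every_visit_mc(reward=1.0)
v_neg = every_visit_mc(reward=-1.0)

# Infinite-horizon value of every state for reward = +1 and gamma = 0.99.
true_value_pos = 1.0 / (1.0 - 0.99)

if __name__ == "__main__":
    for s in (0, 5, 10):
        print(s, v_pos.get(s), v_neg.get(s))
```

In this sketch, states far from the start are only reached late in an episode, so their observed returns are cut short. With a positive per-step reward they look worse than the start state, and with a negative per-step reward they look better, which a learner relying on these estimates could read as a reason to stay put or to move away from the start, respectively.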

Record prepared under agreement 509/P-DUN/2018, financed by the Polish Ministry of Science and Higher Education (MNiSW) from funds allocated to science dissemination activities (2019).
