MDP
  - DP Methods
    - Policy Iteration
    - Value Iteration
  - MC Methods
    - Monte Carlo
    - TD(1)
  - TD Methods
    - TD(0)
    - n-Step Bootstrap
    - TD(λ)
      - Forward View
      - Eligibility Traces
      - Backward View
  - Value Based
    - Sarsa
      - 1-Step Sarsa
      - n-Step Sarsa
      - Sarsa(λ)
      - Expected Sarsa
    - Q-Learning
      - 1-Step Q-Learning
      - n-Step Q-Learning
      - Q(λ)
      - Double Q
    - DQN
      - DDQN
      - Dueling-DQN
      - DRQN
      - C51 (Distributional DQN)
      - Noisy Net
      - Rainbow DQN
  - Model
    - Dyna-Q
    - Dyna-Q+
  - Policy Gradients
    - Reinforce
    - Reinforce w/ Baseline
    - TRPO
    - TRPO+
    - PPO
  - Intrinsic Reward
    - Curiosity (ICM)
    - Empowerment
  - Actor-Critic
    - A3C
    - GA3C
    - A2C
    - PAAC
    - ACER
    - ACKTR
    - DPG
    - DDPG
    - TD3
    - SAC
    - SAC w/ temperature auto-tune
    - OAC
  - Markov Game
    - IQL
    - IAC
    - MADDPG
    - Counterfactual Reasoning
    - Model of Other Agents
    - Factorizable
    - HyperNetwork
    - COMA
    - VDN
    - QMIX
    - QTRAN
      - QTRAN-base
      - QTRAN-alt
    - Social Influence
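The taxonomy above only names the methods. As a concrete anchor for the value-based branch, here is a minimal sketch of tabular 1-step Q-learning on a toy 5-state chain MDP; the environment, function name, and all hyperparameters are illustrative assumptions, not taken from the source.

```python
import random

def q_learning_chain(n_states=5, episodes=500, alpha=0.5, gamma=0.9, eps=0.3, seed=0):
    """Tabular 1-step Q-learning on a hypothetical chain MDP: states
    0..n_states-1, actions 0 = left / 1 = right, reward 1.0 only on
    reaching the rightmost (terminal) state."""
    rng = random.Random(seed)
    terminal = n_states - 1
    Q = [[0.0, 0.0] for _ in range(n_states)]  # Q[state][action]
    for _ in range(episodes):
        s = 0
        while s != terminal:
            # ε-greedy behaviour policy over the two actions
            a = rng.randrange(2) if rng.random() < eps else max((0, 1), key=lambda x: Q[s][x])
            s2 = max(0, s - 1) if a == 0 else s + 1
            r = 1.0 if s2 == terminal else 0.0
            # 1-step backup: bootstrap from the max over next-state actions
            target = r if s2 == terminal else r + gamma * max(Q[s2])
            Q[s][a] += alpha * (target - Q[s][a])
            s = s2
    return Q

Q = q_learning_chain()
# Greedy action per non-terminal state; 1 means "move right" toward the reward.
print([max((0, 1), key=lambda a: Q[s][a]) for s in range(4)])
```

Variants in the list above change only pieces of this loop: n-step and λ versions replace the 1-step target, Sarsa bootstraps from the action actually taken rather than the max, and the DQN family swaps the table for a neural network.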