- MDP
  - DP Methods
    - Policy Iteration
    - Value Iteration
  - MC Methods
    - Monte Carlo TD(1)
  - TD Methods
    - TD(0)
    - n-Step Bootstrap
    - TD(λ)
      - Forward View
      - Eligibility Traces
      - Backward View
  - Value Based
    - Sarsa
      - 1-Step Sarsa
      - n-Step Sarsa
      - Sarsa(λ)
      - Expected Sarsa
    - Q-Learning
      - 1-Step Q-Learning
      - n-Step Q-Learning
      - Q(λ)
      - Double Q
      - DQN
      - DDQN
      - Dueling-DQN
      - DRQN
      - C51 Distributional DQN
      - Noisy Net
      - Rainbow DQN
  - Model
    - Dyna-Q
    - Dyna-Q+
  - Policy Gradients
    - Reinforce
    - Reinforce w/ Baseline
    - TRPO
    - TRPO+
    - PPO
  - Intrinsic Reward
    - Curiosity ICM
    - Empowerment
  - Actor-Critic
    - A3C
    - GA3C
    - A2C
    - PAAC
    - ACER
    - ACKTR
    - DPG
    - DDPG
    - TD3
    - SAC
    - SAC w/ temperature auto-tune
    - OAC
  - Markov Game
    - IQL
    - IAC
    - MADDPG
    - Counterfactual Reasoning
    - Model of Other Agents
    - Factorizable
    - HyperNetwork
    - COMA
    - VDN
    - QMIX
    - QTRAN
      - QTRAN-base
      - QTRAN-alt
    - Social Influence
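Many of the value-based entries above (1-Step Q-Learning, Sarsa, Expected Sarsa) share the same tabular temporal-difference update, differing only in how the next-state value is chosen. A minimal sketch of the 1-step Q-Learning rule, which bootstraps from the greedy next action; the state/action encoding here is purely illustrative, not from the original:

```python
from collections import defaultdict

ACTIONS = ["left", "right"]  # toy action set for illustration

def q_update(Q, s, a, r, s_next, done, alpha=0.1, gamma=0.99):
    """1-step Q-Learning: Q(s,a) += alpha * (r + gamma * max_a' Q(s',a') - Q(s,a))."""
    target = r if done else r + gamma * max(Q[(s_next, b)] for b in ACTIONS)
    Q[(s, a)] += alpha * (target - Q[(s, a)])

# One terminal transition: Q(0, "right") moves a step toward the reward 1.0.
Q = defaultdict(float)
q_update(Q, s=0, a="right", r=1.0, s_next=1, done=True)
print(Q[(0, "right")])  # → 0.1 (alpha * reward on the first visit)
```

Sarsa would replace the `max` with the value of the action actually taken next (on-policy), and Expected Sarsa with the expectation under the current policy; the surrounding update is otherwise identical.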