Report

Towards Equilibrium Transfer in Markov Games
胡裕靖
2013-9-9

Outline
- Background
- Preliminary Ideas
- Some Results

Background

Multi-agent Reinforcement Learning
- Single-agent RL: path finding, Mountain Car
- RL in multi-agent tasks: robot soccer, IKEA furniture robot

Markov Games
- Single agent: $\langle S, A, r, p \rangle$
  - $S$: the discrete state space.
  - $A$: the action space of the agent.
  - $r: S \times A \to \mathbb{R}$ is the reward function.
  - $p: S \times A \times S \to [0,1]$ is the transition function.
- From one agent to more than one: $\langle N, S, \{A_i\}_{i=1\dots n}, \{r_i\}_{i=1\dots n}, p \rangle$
  - $N$: the set of agents.
  - $S$: the discrete state space.
  - $A = A_1 \times \dots \times A_n$: the joint action space of the agents.
  - $r_i: S \times A \to \mathbb{R}$ is the reward function of agent $i$.
  - $p: S \times A \times S \to [0,1]$ is the transition function.
- Agents take joint actions.

Equilibrium-based MARL
- Some equilibrium solution concepts in game theory can be adopted.

Our Previous Work
- Equilibrium-based MARL:
  - Multi-agent reinforcement learning with meta equilibrium []
  - Multi-agent reinforcement learning by negotiation with unshared value functions []
- Focusing on combining MARL with equilibrium solution concepts
- Problematic issues:
  - Equilibrium computing is complicated and time consuming. A new complexity class: TFNP! []
  - For tasks with many agents, equilibrium-based MARL algorithms may take too much time.
- How to accelerate the learning process of equilibrium-based MARL?

Transfer Learning in RL
- Matthew E. Taylor, Peter Stone. Transfer learning for reinforcement learning domains. Journal of Machine Learning Research, 2009.
- Alessandra Lazaric. Transfer in reinforcement learning: a framework and a survey. Reinforcement Learning, Springer, 2012.
- [Figure: knowledge learnt in a source task (instances, policy, value function, model, …) is reused to accelerate learning in a target task]

Transfer Learning in Markov Games?
- [Figure: transfer of instances, policy, value function, model, … between Markov games]
- Inter-task transfer
- Inner-task transfer
- [Figure: the sequence of normal-form games encountered within one Markov game]
- Why not transfer between these normal-form games within a Markov game?

Inner-task Transfer
- Transfer equilibrium between similar normal-form games during learning in a Markov game:
  - Reuse the equilibria computed in previous games
  - Reduce learning time
- Key problems:
  - Which games are similar? For example: the games that occur on different visits of a state
  - How to transfer equilibrium?
- [Figure: payoff tables of two similar normal-form games]

Preliminary Ideas

Game Similarity
- Games with the same action space?
- Games with different action spaces?
- Similarity as payoff distance?
- Equilibrium-based similarity or equilibrium-independent similarity?
- Drew Fudenberg and David M. Kreps. A theory of learning, experimentation and equilibrium in games. 1990.

Game Similarity: Weird Cycle
- Equilibrium-based similarity: find the equilibria of two games and compute the similarity.
- Equilibrium transfer: by then the equilibria of both games are already known, so transfer seems senseless!
- Why not take ( , ) in the second game?

Our Idea
- Transfer equilibrium between games which are thought to be similar.
- Evaluate how much loss the equilibrium transfer brings.
- Transfer is acceptable when the loss is small.
- [Figure: payoff tables of two normal-form games] The two games differ in only one entry.

Problem Definition
- [Figure: game $G$ with a computed Nash equilibrium $\sigma^*$; game $G'$ with an unknown strategy profile $\sigma'$. Transfer method?]
- Can we find a transfer method which transfers the computed Nash equilibrium $\sigma^*$ in game $G$ to a strategy profile $\sigma'$ in game $G'$ such that $\forall i \in N$ and $\forall a_i \in A_i$ there holds

  $$U'_i(a_i, \sigma'_{-i}) \le U'_i(\sigma') + \epsilon,$$

  where $U'_i$ is the payoff of agent $i$ in $G'$ and $\epsilon$ is close to 0? Such a $\sigma'$ is an approximate Nash equilibrium of $G'$.
- In other words, given a transfer method, if $\epsilon$ is small enough, then the transfer method is acceptable.
- Furthermore, $\forall i \in N$ and $\forall a_i \in A_i$, define the transfer error

  $$\epsilon_i(a_i, \sigma') = U'_i(a_i, \sigma'_{-i}) - U'_i(\sigma')$$

- Let $\epsilon_i(\sigma') = \max_{a_i \in A_i} \epsilon_i(a_i, \sigma')$.
- Let $\epsilon(\sigma') = \max_{i \in N} \epsilon_i(\sigma')$.
- Given a transfer method, we need to find the bound of $\epsilon(\sigma')$!

A Naïve Transfer Method
- Direct transfer: $\sigma' = \sigma^*$
- [Figure: the Nash equilibrium $\sigma^*$ computed in $G$ is used unchanged as $\sigma'$ in $G'$]
- Define the difference of the two games $\Delta = G' - G$ such that $\forall i \in N$ and $\forall a \in A$, $\Delta_i(a) = U'_i(a) - U_i(a)$.
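The transfer error of direct transfer can be computed directly from its definition. The following is a minimal Python sketch for a two-agent game, not the presenter's implementation. The function names (`expected_payoff`, `pure`, `transfer_error`) and both payoff tables are hypothetical and chosen only for illustration. The second game differs from the first in a single payoff entry.

```python
def expected_payoff(U, sigma_1, sigma_2):
    """Expected payoff under mixed strategies; U is indexed as U[a1][a2]."""
    return sum(
        sigma_1[a1] * sigma_2[a2] * U[a1][a2]
        for a1 in range(len(sigma_1))
        for a2 in range(len(sigma_2))
    )


def pure(action, n_actions):
    """Mixed strategy that puts probability 1 on a single action."""
    return [1.0 if k == action else 0.0 for k in range(n_actions)]


def transfer_error(U1_new, U2_new, sigma_1, sigma_2):
    """epsilon(sigma') = max over agents i and actions a_i of
    U'_i(a_i, sigma'_{-i}) - U'_i(sigma')."""
    n1, n2 = len(sigma_1), len(sigma_2)
    base_1 = expected_payoff(U1_new, sigma_1, sigma_2)
    base_2 = expected_payoff(U2_new, sigma_1, sigma_2)
    # Gain of each unilateral pure deviation, evaluated in the new game G'.
    gains_1 = [expected_payoff(U1_new, pure(a, n1), sigma_2) - base_1
               for a in range(n1)]
    gains_2 = [expected_payoff(U2_new, sigma_1, pure(b, n2)) - base_2
               for b in range(n2)]
    return max(gains_1 + gains_2)


# Hypothetical game G: a coordination game in which the pure profile
# (action 0, action 0) is a Nash equilibrium.
U1 = [[2.0, 0.0], [0.0, 1.0]]
U2 = [[2.0, 0.0], [0.0, 1.0]]
sigma_star = ([1.0, 0.0], [1.0, 0.0])

# Hypothetical game G': identical to G except for one entry of agent 1's payoff.
U1_new = [[2.0, 0.0], [2.5, 1.0]]
U2_new = U2

# Direct transfer: sigma' = sigma*.
eps = transfer_error(U1_new, U2_new, *sigma_star)
print(eps)  # 0.5
```

In this example agent 1 can gain 0.5 by deviating in the second game, so the transferred profile is a 0.5-approximate Nash equilibrium there. Whether that loss is acceptable depends on the threshold chosen for $\epsilon$.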
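- Examine the transfer error under direct transfer:

$$
\begin{aligned}
\epsilon_i(a_i, \sigma') &= \epsilon_i(a_i, \sigma^*) = U'_i(a_i, \sigma^*_{-i}) - U'_i(\sigma^*) \\
&= \sum_{a_{-i}} \sigma^*_{-i}(a_{-i})\, U'_i(a_i, a_{-i}) - \sum_{a'_i} \sigma^*_i(a'_i) \sum_{a_{-i}} \sigma^*_{-i}(a_{-i})\, U'_i(a'_i, a_{-i}) \\
&= \sum_{a_{-i}} \sigma^*_{-i}(a_{-i}) \Big[ U'_i(a_i, a_{-i}) - \sum_{a'_i} \sigma^*_i(a'_i)\, U'_i(a'_i, a_{-i}) \Big] \\
&= \sum_{a_{-i}} \sigma^*_{-i}(a_{-i}) \Big[ U_i(a_i, a_{-i}) + \Delta_i(a_i, a_{-i}) - \sum_{a'_i} \sigma^*_i(a'_i) \big[ U_i(a'_i, a_{-i}) + \Delta_i(a'_i, a_{-i}) \big] \Big] \\
&= \sum_{a_{-i}} \sigma^*_{-i}(a_{-i}) \Big[ U_i(a_i, a_{-i}) - \sum_{a'_i} \sigma^*_i(a'_i)\, U_i(a'_i, a_{-i}) \Big] \\
&\quad + \sum_{a_{-i}} \sigma^*_{-i}(a_{-i}) \Big[ \Delta_i(a_i, a_{-i}) - \sum_{a'_i} \sigma^*_i(a'_i)\, \Delta_i(a'_i, a_{-i}) \Big] \\
&\le \sum_{a_{-i}} \sigma^*_{-i}(a_{-i}) \Big[ \Delta_i(a_i, a_{-i}) - \sum_{a'_i} \sigma^*_i(a'_i)\, \Delta_i(a'_i, a_{-i}) \Big] \\
&= \sum_{a_{-i}} \sigma^*_{-i}(a_{-i})\, \Delta_i(a_i, a_{-i}) - \sum_{a} \sigma^*(a)\, \Delta_i(a) \\
&\le \sum_{a_{-i}} \sigma^*_{-i}(a_{-i})\, \Delta^+_i(a_i, a_{-i}) - \sum_{a} \sigma^*(a)\, \Delta_i(a) \\
&\le \sum_{a_{-i}} \Delta^+_i(a_i, a_{-i}) - \sum_{a} \sigma^*(a)\, \Delta_i(a)
\end{aligned}
$$

  where $\Delta^+_i(a_i, a_{-i}) = \max(0, \Delta_i(a_i, a_{-i}))$.
- The first inequality holds because $\sigma^*$ is a Nash equilibrium of $G$, so the term $U_i(a_i, \sigma^*_{-i}) - U_i(\sigma^*)$ is at most 0. The last two inequalities use $\Delta_i \le \Delta^+_i$, $\Delta^+_i \ge 0$ and $\sigma^*_{-i}(a_{-i}) \le 1$.
- Resulting bound: $\epsilon_i(a_i, \sigma^*) \le \sum_{a_{-i}} \Delta^+_i(a_i, a_{-i}) - \sum_{a} \sigma^*(a)\, \Delta_i(a)$
- Many items in $\Delta$ are zero if the two games are very similar.

Some Results

Future Work
- Some problems:
  - Other transfer methods?
  - Only Nash equilibrium?
  - Equilibrium finding algorithms
  - Transfer between games with different action spaces
  - Transfer between games with different numbers of agents
  - Game abstraction

Thanks!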
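
The bound on the transfer error of direct transfer can be checked numerically. The sketch below is an illustration under stated assumptions, not a result from the talk. The function name `row_player_error_and_bound` and the payoff values are hypothetical. The first game is matching pennies, whose uniform mixed profile is a Nash equilibrium, and the second game changes one payoff entry of the row player.

```python
def row_player_error_and_bound(U, U_new, sigma_row, sigma_col):
    """For each pure action a of the row player, return (error, bound) where
    error = U'(a, sigma*_col) - U'(sigma*)
    bound = sum_b max(0, Delta(a, b)) - sum_{a,b} sigma*(a, b) * Delta(a, b)
    and Delta = U_new - U. The bound assumes sigma* is a Nash equilibrium of U."""
    rows, cols = range(len(sigma_row)), range(len(sigma_col))
    delta = [[U_new[a][b] - U[a][b] for b in cols] for a in rows]
    # Expected payoff and expected payoff difference under the joint profile.
    exp_new = sum(sigma_row[a] * sigma_col[b] * U_new[a][b]
                  for a in rows for b in cols)
    exp_delta = sum(sigma_row[a] * sigma_col[b] * delta[a][b]
                    for a in rows for b in cols)
    results = []
    for a in rows:
        error = sum(sigma_col[b] * U_new[a][b] for b in cols) - exp_new
        bound = sum(max(0.0, delta[a][b]) for b in cols) - exp_delta
        results.append((error, bound))
    return results


# Hypothetical game G: row player's payoffs in matching pennies.
U = [[1.0, -1.0], [-1.0, 1.0]]
# Hypothetical game G': one entry of the row player's payoff is raised by 0.5.
U_new = [[1.5, -1.0], [-1.0, 1.0]]
# Uniform mixing by both agents is a Nash equilibrium of matching pennies.
sigma_star = ([0.5, 0.5], [0.5, 0.5])

results = row_player_error_and_bound(U, U_new, *sigma_star)
for action, (error, bound) in enumerate(results):
    print(action, error, bound, error <= bound)
```

Only the row player is checked here; the column player's case follows by transposing the payoff table. Because the bound drops the equilibrium weights on $\Delta^+$, it can be loose when the changed entry is played with low probability.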