This paper provides several theoretical results for empirical game theory. Specifically, we introduce bounds for empirical game theoretical analysis of complex multi-agent interactions. In doing so we provide insights in the empirical meta game showing that a Nash equilibrium of the estimated meta-game is an approximate Nash equilibrium of the true underlying meta-game. We investigate and show how many data samples are required to obtain a close enough approximation of the underlying game.

Additionally, we extend the evolutionary dynamics analysis of meta-games using heuristic payoff tables (HPTs) to asymmetric games. The state-of-the-art has only considered evolutionary dynamics of symmetric HPTs in which agents have access to the same strategy sets and the payoff structure is symmetric, implying that agents are interchangeable.

Finally, we carry out an empirical illustration of the generalised method in several domains, illustrating the theory and evolutionary dynamics of several versions of the AlphaGo algorithm (symmetric), the dynamics of the Colonel Blotto game played by human players on Facebook (symmetric), the dynamics of several teams of players in the capture the flag game (symmetric), and an example of a meta-game in Leduc Poker (asymmetric), generated by the policy-space response oracle multi-agent learning algorithm.

Using game theory to examine multi-agent interactions in complex systems is a non-trivial task, especially when a payoff table or normal form representation is not directly available.

Works by Walsh et al. Doing this turns the interaction into a smaller normal form game, or heuristic or meta-game, with the higher-level strategies now being the primitive actions of the game, making the complex multi-agent interaction amenable to game theoretic analysis.

Major limitations of this empirical game theoretic approach are that it comes without theoretical guarantees on the approximation of the true underlying meta-game (a model of the actual game or interaction) by an estimated meta-game based on sampled data or simulations, and that it is unclear how many data samples are required to achieve a good approximation.

Additionally, when examining the evolutionary dynamics of these games the approach remains limited to symmetric situations, in which the agents or players have access to the same set of strategies, and are interchangeable. One approach is to ignore asymmetry (types of players), and average over many samples of types, resulting in a single expected payoff to each player in each entry of the meta-game payoff table.

Many real-world situations though are asymmetric in nature and involve various roles for the agents that participate in the interactions. For instance, buyers and sellers in auctions, or games such as Scotland Yard [ 21 ], but also different roles in e. This type of analysis comes without strong guarantees on the approximation of the true underlying meta-game by an estimated meta-game based on sampled data, and it remains unclear how many data samples are required to achieve a good approximation.

In this paper we address these problems. Furthermore, we also examine how many data samples are required to confidently approximate the underlying meta-game. We also show how to generalise the heuristic payoff or meta-game method introduced by Walsh et al. Finally, we illustrate the generalised method in several domains. In the AlphaGo experiments we show how a symmetric meta-game analysis can provide insights into the evolutionary dynamics and strengths of various versions of the AlphaGo algorithm while it was being developed, and how intransitive behaviour can occur by introducing an additional strategy.

In the Colonel Blotto game we illustrate how the methodology can provide insights into how humans play this game, constructing several symmetric meta-games from data collected on Facebook.

In the CTF game we examine the dynamics of teams of two agents playing the Capture the Flag game, show examples of intransitive behaviours occurring between these advanced agents and illustrate how Elo rating [ 8 ] is incapable of capturing such intransitive behaviours. Finally, we illustrate the method in Leduc poker, by examining an asymmetric meta-game, generated by a recently introduced multiagent reinforcement learning algorithm, policy-space response oracles (PSRO) [ 18 ].

For this analysis we rely on some theoretical results that connect an asymmetric normal form game to its symmetric counterparts [ 32 ].

The purpose of the first applications of empirical game-theoretic analysis (EGTA) was to reduce the complexity of large economic problems in electronic commerce, such as continuous double auctions, supply chain management, market games, and automated trading [ 3944 ]. While these complex economic problems continue to be a primary application area of these methods [ 5373841 ], the general technique has been applied in many different settings.

These include analysis of interaction among heuristic meta-strategies in poker [ 24 ], network protocol compliance [ 43 ], collision avoidance in robotics [ 11 ], and security games [ 202548 ].

The initial paper of Walsh et al. The current paper is situated in the line of work focusing on the evolutionary dynamics of empirical or meta-games. Evolutionary dynamics (foremost replicator dynamics) have often been presented as a practical tool for analyzing interactions among meta-strategies found in EGTA [ 21139 ], and for studying the change in policies of multiple learning agents [ 3 ], as the EGTA approach is largely based on the same assumptions as evolutionary game-theory, viz.

Also several approaches have investigated the use of game-theoretic models, in combination with multi-agent learning, for understanding human learning in multi-agent systems, see e. There have also been other uses of EGTA in the context of multiagent reinforcement learning. For example, reinforcement learning can be used to find a best response using a succinct policy representation [ 15 ], which can be used to validate equilibria found in EGTA [ 47 ], as a regularization mechanism to learn more general meta-strategies than independent learners [ 18 ], or to determine the stability of non-adaptive trading strategies such as zero intelligence [ 49 ].

A major component of the EGTA paradigm is the estimation of the meta-game that acts as an approximation of the more complex underlying game.

The quality of the analyses and strategies derived from these estimates depends crucially on the quality of the approximation.

The first preferential sampling scheme suggested using a value of information criterion to focus the Monte Carlo samples [ 40 ]. Other approaches to efficient sampling, mentioned in [ 44 ], used regression to generalize the payoff of several different complex strategy profiles [ 36 ].

Stochastic search methods, such as simulated annealing, were also proposed as means to obtain Nash equilibrium approximations from simulation-based games [ 35 ]. More recent work also suggests player reductions that preserve deviations with granular subsampling of the strategy space to get higher-quality information from a finite number of samples [ 46 ].

Finally, there is an online tool that helps with managing EGTA experiments [ 6 ], which employs a sampling procedure that prioritizes by the estimated regret of the corresponding strategies, which is known to approach the true regret of the underlying game [ 34 ]. This work addresses this question of sampling given current estimates and their errors.
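The link between estimation error and equilibrium quality can be checked numerically. The following is a minimal sketch, not the paper's own bound: it relies on the standard argument that if every payoff entry of the estimated game is within eps of the true game, a Nash equilibrium of the estimated game has regret at most 2 * eps in the true game. The matching-pennies payoffs, the noise model and the function name `regret` are illustrative assumptions.

```python
import numpy as np

def regret(payoff_row, payoff_col, x, y):
    """Largest gain either player can obtain by deviating unilaterally from (x, y)."""
    row_values = payoff_row @ y   # row player's payoff for each pure strategy against y
    col_values = x @ payoff_col   # column player's payoff for each pure strategy against x
    row_gain = row_values.max() - x @ row_values
    col_gain = col_values.max() - col_values @ y
    return max(row_gain, col_gain)

rng = np.random.default_rng(0)
eps = 0.05

# Hypothetical estimated meta-game: matching pennies (zero-sum).
est_row = np.array([[1.0, -1.0], [-1.0, 1.0]])
est_col = -est_row

# Hypothetical "true" game: every entry lies within eps of the estimate.
true_row = est_row + rng.uniform(-eps, eps, est_row.shape)
true_col = est_col + rng.uniform(-eps, eps, est_col.shape)

# The uniform mix is the Nash equilibrium of the estimated game.
x = np.array([0.5, 0.5])
y = np.array([0.5, 0.5])

est_regret = regret(est_row, est_col, x, y)
true_regret = regret(true_row, true_col, x, y)
print(f"regret in estimated game: {est_regret:.4f}")
print(f"regret in true game:      {true_regret:.4f} (bound {2 * eps:.4f})")
```

The regret in the true game is generally non-zero, since the estimated equilibrium is only approximately an equilibrium of the true game, but it stays below 2 * eps for any noise drawn within the stated range.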

In this section, we introduce the necessary background to describe our game theoretic meta-game analysis of the repeated interaction between p players. A symmetric NFG captures interactions where payoffs depend on what strategies are played but not on who plays them. The first condition is therefore that the strategy sets are the same for all players, i.

To repeat, for a game to be symmetric there are two conditions: the players need to have access to the same strategy set and the payoff structure needs to be symmetric, such that players are interchangeable. If one of these two conditions is violated the game is asymmetric.

In the asymmetric case our analysis will focus on the two-player case (two roles) and thus we introduce specific notations for the sake of simplicity. Evolutionary game theory often considers a single strategy x that plays against itself. In this situation, the game is said to have a single population; such situations are often called single population games in the literature.

Replicator Dynamics are one of the central concepts from Evolutionary Game Theory [ 12194250 ].

They describe how a population of replicators, or a strategy profile, evolves in the midst of others through time under evolutionary pressure. Each replicator in the population is of a certain type, and they are randomly paired in interaction. Their reproductive success is determined by their fitness, which results from these interactions.

The replicator dynamics express that the population share of a certain type will increase if the replicators of this type have a higher fitness than the population average; otherwise their population share will decrease.
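The verbal description above can be sketched in code. This is a minimal illustration using the standard single-population form dx_i/dt = x_i ((A x)_i - x^T A x), discretised with a simple Euler step; the step size, the example payoff matrices and the function name `replicator_step` are assumptions made for the sketch.

```python
import numpy as np

def replicator_step(x, payoff, dt=0.01):
    """One Euler step of the single-population replicator dynamics."""
    fitness = payoff @ x        # fitness of each type against the current population
    average = x @ fitness       # population-average fitness
    x_next = x + dt * x * (fitness - average)
    return x_next / x_next.sum()  # renormalise to guard against numerical drift

# Rock-paper-scissors: the uniform population is a rest point.
rps = np.array([[0.0, -1.0, 1.0],
                [1.0, 0.0, -1.0],
                [-1.0, 1.0, 0.0]])
uniform = np.ones(3) / 3
after_one_step = replicator_step(uniform, rps)

# Hypothetical game in which type 0 always earns more than type 1:
# its share grows even from a small initial fraction.
dominant = np.array([[2.0, 2.0],
                     [1.0, 1.0]])
x = np.array([0.1, 0.9])
for _ in range(2000):
    x = replicator_step(x, dominant)
print(x)
```

Types with above-average fitness gain share and the others lose it, which is exactly the behaviour the text describes; rest points such as the uniform mix in rock-paper-scissors are populations where all present types earn the average.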

This evolutionary process is described according to a first order dynamical system. The dynamics defined by these two coupled differential equations changes the strategy profile to increase the probability of the strategies that have the best return or are the fittest.

A meta game or empirical game is a simplified model of a complex multi-agent interaction. In order to analyze complex multi-agent systems like poker, we do not consider all possible atomic actions but rather a set of relevant meta-strategies that are often played [ 24 ].

A p -type meta game is now a p -player repeated NFG where players play a limited number of meta strategies. When the NFG representation of such a complex multi-agent interaction is not available, one can use the heuristic payoff table (HPT), as introduced in Walsh et al.

The idea of the HPT is to capture the expected payoff of high-level meta-strategies through simulation, or from data of interactions, when the payoffs are not readily available. Note that the purpose of the HPT is not to directly apply it to simple known matrix games; in that case one can just plug the normal form game directly in the replicator equations.

Continuous-time replicator dynamics assume an infinite population, which is approximated in the HPT method by a finite population of p individuals to be able to run simulations. As such, the HPT is only an approximation.

The larger p gets, the more subtleties are captured by the HPT and the more accurately the resulting dynamics reflect the underlying true dynamics. Since all players can choose from the same strategy set and all players receive the same payoff for being in the same situation, we can simplify our payoff table. This means we consider a game where the payoffs for playing a particular strategy depend only on the other strategies employed by the other players, but not on who is playing them.

This corresponds to the setting of symmetric games. We now introduce the HPT. An important advantage of this type of table is that it easily extends to many agents, as opposed to the classical payoff matrix.

The left-hand side shows the counts and gives the matrix N, while the right-hand side gives the payoffs for playing any of the strategies given the discrete profile and corresponds to matrix U.

The HPT has one row per possible discrete distribution; for each row we usually run many simulations or collect many data samples to determine the expected payoff of each type present in the discrete distribution. There are p finite individuals present in the simulation at all times. In other words we simulate populations of p agents, and record their expected utilities in the HPT. There are now two possibilities, either the meta-game is symmetric, or it is asymmetric.
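To make the "one row per possible discrete distribution" structure concrete, the count matrix N can be enumerated directly. The sketch below is illustrative only; the function name `discrete_profiles` and the choice of p = 6 players over k = 3 strategies are assumptions, not values taken from the paper.

```python
from itertools import combinations_with_replacement
from math import comb

import numpy as np

def discrete_profiles(p, k):
    """All ways to distribute p interchangeable players over k strategies.

    Each row holds the number of players using each strategy and sums to p.
    """
    rows = []
    for assignment in combinations_with_replacement(range(k), p):
        # assignment lists the strategy index chosen by each of the p players
        rows.append(np.bincount(assignment, minlength=k))
    return np.array(rows)

N = discrete_profiles(p=6, k=3)
print(N.shape)         # number of HPT rows, number of strategies
print(N[0], N[-1])     # first and last discrete profile
```

The number of rows equals the binomial coefficient C(p + k - 1, k - 1), which grows far more slowly than the k^p entries of a full p-player payoff tensor; this is why the table extends to many agents more easily than a classical payoff matrix.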

We will start with the simpler symmetric case, which has been studied in empirical game theory, then we continue with asymmetric games, in which we consider two populations, or roles. If the game is symmetric then the formulation of meta strategies has the advantage that the payoff for a strategy does not depend on which player has chosen that strategy, and consequently the payoff for that strategy only depends on the composition of strategies it is facing in the game and not on who is playing the strategy.

This symmetry has been the main focus of the use of empirical game theory analysis [ 2224, 44 ]. In order to analyse the evolutionary dynamics of high-level meta-strategies, we also need to estimate the expected payoff of such strategies relative to each other.

In evolutionary game theoretic terms, this is the relative fitness of the various strategies, dependent on the current frequencies of those strategies in the population. This expected payoff function can now be used in Eq.
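The fitness computation can be sketched as follows. The formula itself did not survive in this text, so the code assumes the form commonly used with HPTs: each row N_j is drawn with multinomial probability P(N_j | x), and the expected payoff of strategy i is the sum over rows of P(N_j | x) U_ji, divided by the probability 1 - (1 - x_i)^p that strategy i is present at all. The table values and the function names are hypothetical.

```python
from math import factorial

import numpy as np

def profile_probability(counts, x):
    """Multinomial probability of drawing the discrete profile `counts` from mix `x`."""
    coefficient = factorial(int(counts.sum()))
    for c in counts:
        coefficient //= factorial(int(c))
    return coefficient * float(np.prod(x ** counts))

def expected_payoffs(N, U, x):
    """Expected payoff of each strategy under population mix x (x must be interior)."""
    p = int(N[0].sum())
    row_probs = np.array([profile_probability(row, x) for row in N])
    prob_present = 1.0 - (1.0 - x) ** p   # chance that a strategy appears among p draws
    return (row_probs @ U) / prob_present

# Hypothetical HPT for p = 3 players and 2 strategies.
# U is zero wherever the corresponding strategy is absent from the row.
N = np.array([[3, 0], [2, 1], [1, 2], [0, 3]])
U = np.array([[2.0, 0.0], [1.0, 3.0], [0.5, 2.5], [0.0, 2.0]])
x = np.array([0.3, 0.7])

row_probs = np.array([profile_probability(row, x) for row in N])
f = expected_payoffs(N, U, x)
print(f)
```

A quick sanity check on this form: if every present strategy earns the same constant payoff in every row, the expected payoff of each strategy equals that constant, because the numerator then reduces to the constant times the probability that the strategy is present.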

If the HPT approach were applied to capture simple matrix games (which is not its purpose), one needs to take into account that in the single population replicator dynamics model, two individuals are randomly matched to play the normal form game. For an infinite population, sampling two individuals with or without replacement is identical.

However, for finite populations, and especially if n is small, there is an important difference between sampling with and without replacement.

The payoff calculation method shown above correctly reproduces the expected payoff of matrix games if n is sufficiently large, both for sampling with replacement and without; however, for smaller n, sampling with replacement will result in lower errors.

One can now wonder how the previously introduced method extends to asymmetric games, which has not been considered in the literature. In this game both players do have the same strategy sets, i.

If we aim to carry out a similar evolutionary analysis as in the symmetric case, restricting ourselves to two populations or roles, we will need two meta-game payoff tables, one for each player over its own strategy set. We will also need to use the asymmetric version of the replicator dynamics as shown in Eq. Additionally, in order to compute the right payoffs for every situation we will have to interpret a discrete strategy profile in the meta-table slightly differently.
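A two-population analysis of this kind can be sketched with the standard asymmetric replicator dynamics, dx_i/dt = x_i ((A y)_i - x^T A y) and dy_j/dt = y_j ((x^T B)_j - x^T B y), where A and B are the payoff tables of the two roles. The code below is an illustration under that assumption; the Euler discretisation, the coordination-game payoffs and the function name are hypothetical choices.

```python
import numpy as np

def asymmetric_replicator_step(x, y, A, B, dt=0.01):
    """One Euler step of the two-population replicator dynamics.

    A[i, j] is the payoff to role 1 and B[i, j] the payoff to role 2
    when role 1 plays strategy i and role 2 plays strategy j.
    """
    fx = A @ y   # fitness of each role-1 strategy against population y
    fy = x @ B   # fitness of each role-2 strategy against population x
    x_next = x + dt * x * (fx - x @ fx)
    y_next = y + dt * y * (fy - fy @ y)
    return x_next / x_next.sum(), y_next / y_next.sum()

# Hypothetical asymmetric coordination game: both roles prefer to coordinate,
# but each role favours a different outcome.
A = np.array([[3.0, 0.0], [0.0, 2.0]])
B = np.array([[2.0, 0.0], [0.0, 3.0]])

x = np.array([0.7, 0.3])
y = np.array([0.7, 0.3])
for _ in range(5000):
    x, y = asymmetric_replicator_step(x, y, A, B)
print(x, y)
```

Each population is updated against the current mix of the other population rather than against itself, which is the key difference from the single-population case.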

The first one means that the first player is playing its first strategy while the second player is playing their second strategy. In order to turn the table into a similar format as for the symmetric case, we can now introduce p meta-tables, one for each player. One needs to take care in correctly interpreting these tables. Here we repeat an important result on the link between an asymmetric game and its symmetric counterpart games.

For a full treatment and discussion of these results see [ 32 ]. Formally from [ 32 ] : The result we state here is limited to strategies with the same support, but this condition can be softened (see [ 32 ]).