\input /zu/6001-devel/fall94/6001mac \begin{document} \psetheader{Sample programming assignment}{Prisoner's Dilemma} \section{The Prisoner's Dilemma: A Fable} In the mid-1920's, the Nebraska State Police achieved what may still be their finest moment. After a 400-mile car chase over dirt roads and corn fields, they finally caught up with the notorious bank robbers Bonnie and Clyde. The two criminals were brought back to the police station in Omaha for further interrogation. Bonnie and Clyde were questioned in separate rooms, and each was offered the same deal by the police. The deal went as follows (since both are the same, we need only describe the version presented to Bonnie): ``Bonnie, here's the offer that we are making to both you and Clyde. If you both hold out on us, and don't confess to bank robbery, then we admit that we don't have enough proof to convict you. However, we {\it will} be be able to jail you both for one year, for reckless driving and endangerment of corn. If you turn state's witness and help us convict Clyde (assuming he doesn't confess), then you will go free, and Clyde will get twenty years in prison. On the other hand, if you don't confess and Clyde does, then {\it he} will go free and {\it you} will get twenty years.'' ``What happens if both Clyde and I confess?'' asked Bonnie. ``Then you both get ten years,'' said the interrogator. Bonnie, who had been a math major at Cal Tech before turning to crime, reasoned this way: ``Suppose Clyde intends to confess. Then if I don't confess, I'll get twenty years, but if I do confess, I'll only get ten years. On the other hand, suppose Clyde intends to hold out on the cops. Then if I don't confess, I'll go to jail for a year, but if I do confess, I'll go free. So no matter what Clyde intends to do, I am better off confessing than holding out. So I'd better confess.'' Naturally, Clyde employed the very same reasoning. Both criminals confessed, and both went to jail for ten years.\footnote{Actually, they didn't go to jail. When they were in court, and heard that they had both turned state's witness, they strangled each other. But that's another story.} The police, of course, were triumphant, since the criminals would have been free in a year had both remained silent. \section{The Prisoner's Dilemma} The Bonnie and Clyde story is an example of a situation known in mathematical game theory as the ``prisoner's dilemma.'' A prisoner's dilemma always involves two ``game players,'' and each has a choice between ``cooperating'' and ``defecting.'' If the two players cooperate, they each do moderately well; if they both defect, they each do moderately poorly. If one player cooperates and the other defects, then the defector does extremely well and the cooperator does extremely poorly. (In the case of the Bonnie and Clyde story, ``cooperating'' means cooperating with one's partner --- i.e., holding out on the police --- and ``defecting'' means confessing to bank robbery.) Before formalizing the prisoner's dilemma situation, we need to introduce some basic game theory notation. \subsection{A Crash Course in Game Theory} In game theory, a {\it two-person binary-choice game} is represented as a two-by-two matrix. Figure 1 shows a hypothetical game matrix. The two players in this case are called {\bf A} and {\bf B}, and the choices are called ``cooperate'' and ``defect.'' \begin{center} \begin{tabular}{c|c|c|} & B cooperates & B defects \\ \cline{2-3} A cooperates & A gets 5 & A gets 2 \\ & B gets 5 & B gets 3 \\ \cline{2-3} A defects & A gets 3 & A gets 1 \\ & B gets 2 & B gets 1 \\ \cline{2-3} \end{tabular} \end{center} \centerline{{\it Figure 1:\/} A hypothetical game matrix} Players {\bf A} and {\bf B} can play a single game by separately (and secretly) choosing to either cooperate or defect. Once each player has made a choice, he announces it to the other player; and the two then look up their respective scores in the game matrix. Each entry in the matrix is a pair of numbers indicating a score for each player, depending on their choices. Thus, in Figure 1, if Player {\bf A} chooses to cooperate while Player {\bf B} defects, then {\bf A} gets 2 points and {\bf B} gets 3 points. If both players defect, they each get 1 point. Note, by the way, that the game matrix is a matter of public knowledge; for instance, Player {\bf A} knows before the game even starts that if he and {\bf B} both choose to defect, they will each get 1 point. In an {\it iterated game}, the two players play repeatedly: thus, after finishing one game, {\bf A} and {\bf B} may play another. (Admittedly, there is a little confusion in the terminology here: you can think of each individual game as a single ``round'' of the larger, iterated game.) There are a number of ways in which iterated games may be played; in the simplest situation, {\bf A} and {\bf B} play for some fixed number of rounds (say, 200), and before each round they are able to look at the record of all previous rounds. For instance, before playing the tenth round of their iterated game, both {\bf A} and {\bf B} are able to study the results of the previous nine rounds. \subsection{An Analysis of a Simple Game Matrix} The game depicted in Figure 1 is a particularly easy one to analyze. Let's examine the situation from Player {\bf A}'s point of view (Player {\bf B}'s point of view is identical): ``Suppose {\bf B} cooperates. Then I do better by cooperating myself (I receive five points instead of three). On the other hand, suppose {\bf B} defects. I still do better by cooperating (since I get two points instead of one). So no matter what {\bf B} does, I am better off cooperating.'' Player {\bf B} will, of course, reason the same way, and both will choose to cooperate. In the terminology of game theory, both {\bf A} and {\bf B} have a {\it dominant} choice --- i.e., a choice that gives a preferred outcome no matter what the other player chooses to do. Figure 1, by the way, does {\it not} represent a prisoner's dilemma situation, since when both players make their dominant choice, they also both achieve their highest personal scores. We'll see an example of a prisoner's dilemma game very shortly. To recap: in any particular game using the matrix of Figure 1, we would expect both players to cooperate; and in an iterated game, we would expect both players to cooperate repeatedly, on every round. \subsection{The Prisoner's Dilemma Game Matrix} Now consider the game matrix shown in Figure 2. \begin{center} \begin{tabular}{c|c|c|} & B cooperates & B defects \\ \cline{2-3} A cooperates & A gets 3 & A gets 0 \\ & B gets 3 & B gets 5 \\ \cline{2-3} A defects & A gets 5 & A gets 1 \\ & B gets 0 & B gets 1 \\ \cline{2-3} \end{tabular} \end{center} \centerline{{\it Figure 2:\/} Prisoner's Dilemma Game Matrix} In this case, Players {\bf A} and {\bf B} both have a dominant choice---namely, defection. No matter what Player {\bf B} does, Player {\bf A} improves his own score by defecting, and vice versa. However, there is something odd about this game. It seems as though the two players would benefit by choosing to cooperate. Instead of winning only one point each, they could win three points each. So the ``rational'' choice of mutual defection has a puzzling self-destructive flavor. The matrix of Figure 2 is an example of a prisoner's dilemma game situation. Just to formalize the situation, let {\it CC} be the number of points won by each player when they both cooperate; let {\it DD} be the number of points won when both defect; let {\it CD} be the number of points won by the cooperating party when the other defects; and let {\it DC} be the number of points won by the defecting party when the other cooperates. Then the prisoner's dilemma situation is characterized by the following conditions: \begin{eqnarray*} DC > CC > DD > CD \\ CC > (DC + CD) / 2 \end{eqnarray*} In the game matrix of Figure 2, we have: \begin{eqnarray*} DC = 5 \\ CC = 3 \\ DD = 1 \\ CD = 0 \end{eqnarray*} \noindent so both conditions are met. In the Bonnie and Clyde story, by the way, you can verify that: \begin{eqnarray*} DC = 0 \\ CC = -1 \\ DD = -10 \\ CD = -20 \end{eqnarray*} \noindent Again, these values satisfy the prisoner's dilemma conditions. \section{Axelrod's Tournament} In the late 1970's, political scientist Robert Axelrod held a computer tournament designed to investigate the prisoner's dilemma situation.\footnote{Actually, there were two tournaments. Their rules and results are described in Axelrod's book {\sl The Evolution of Cooperation}.} Contestants in the tournament submitted computer programs that would compete in an iterated prisoner's dilemma game of approximately two hundred rounds, using the same matrix shown in Figure 2. Each contestant's program played five iterated games against each of the other programs submitted, and after all games had been played the scores were tallied. The contestants in Axelrod's tournament included professors of political science, mathematics, psychology, computer science, and economics. The winning program --- the program with the highest average score --- was submitted by Anatol Rapoport, a professor of psychology at the University of Toronto. In this problem set, we will pursue Axelrod's investigations and make up our own Scheme programs to play the iterated prisoner's dilemma game. The second (optional) part of this problem set is itself an experiment in the spirit of Axelrod's tournament: a contest of programs that play a {\it three-person} prisoner's dilemma game. Before we look at the two-player program, it is worth speculating on what possible strategies might be employed in the iterated prisoner's dilemma game. Here are some examples: \begin{description} \item[All-Defect] A program using the {\bf all-defect} strategy simply defects on every round of every game. \item[Poor-Trusting-Fool] A program using the {\bf poor-trusting-fool} strategy cooperates on every round of every game. \item[Random] This program cooperates or defects on a random basis. \item[Go-by-Majority] This program cooperates on the first round. On all subsequent rounds, {\bf go-by-majority} examines the history of the other player's actions, counting the total number of defections and cooperations by the other player. If the other player's defections outnumber her cooperations, {\bf go-by-majority} will defect; otherwise this strategy will cooperate. \item[Tit-for-Tat] This program cooperates on the first round, and then on every subsequent round it mimics the other player's previous move. Thus, if the other player cooperates (defects) on the $n$th round, then {\bf tit-for-tat} will cooperate (defect) on the $(n + 1)$th round. \end{description} All of these strategies are extremely simple. (Indeed, the first three do not even pay any attention to the other player; their responses are uninfluenced by the previous rounds of the game.) Nevertheless, simplicity is not necessarily a disadvantage. Rapoport's first-prize program employed the {\bf tit-for-tat} strategy, and achieved the highest average score in a field of far more complicated programs. \section{The Two-Player Prisoner's Dilemma Program} A Scheme program for an iterated prisoner's dilemma game is shown at the end of this problem set. The procedure {\tt play-loop} pits two players (or, to be more precise, two ``strategies'') against one another for approximately 100 games, then prints out the scores for each of the two players. Player strategies are represented as procedures. Each strategy takes two inputs --- its own ``history'' (that is, a list of all of its previous ``plays'') and its opponent's ``history.'' The strategy returns either the symbol {\tt c} (for ``cooperate'') or the symbol {\tt d} (for ``defect''). At the beginning of an iterated game, each history is an empty list. As the game progresses, the histories grow (via {\tt extend-history}) into lists of {\tt c}'s and {\tt d}'s. Note how each strategy must have its {\it own} history as its first input. So in {\tt play-loop-iter}, {\tt strat0} has {\tt history0} as its first input, and {\tt strat1} has {\tt history1} as its first input. The values from the game matrix are stored in a list named {\tt *game-association-list*}. This list is used to calculate the scores at the end of the iterated game. Some sample strategies are given at the end of the program. {\tt All-defect} and {\tt poor-trusting-fool} are particularly simple; each returns a constant value regardless of the histories. {\tt Random-strategy} also ignores the histories and chooses randomly between cooperation and defection. You should study {\tt go-by-majority} and {\tt tit-for-tat} to see that their behavior is consistent with the descriptions in the previous section. \paragraph{Problem 0 (no write-up necessary)} Use {\tt play-loop} to play games among the five defined strategies. Notice how a strategy's performance varies sharply depending on its opponent. For example, {\tt poor-trusting-fool} does quite well against {\tt tit-for-tat} or against another {\tt poor-trusting-fool}, but it loses badly to {\tt all-defect}. Pay special attention to {\tt tit-for-tat}. Notice how it never beats its opponent --- but it never loses badly. \paragraph{Problem 1} {\it a.} Games involving {\tt go-by-majority} tend to be slower than other games. Why is that so? Use order-of-growth notation to explain your answer. {\it b.} Alyssa P. Hacker, upon seeing the code for {\tt go-by-majority}, suggested the following iterative version of the procedure: \beginlisp (define (go-by-majority my-history other-history) (define (majority-loop cs ds hist) (cond ((empty-history? hist) (if (> ds cs) 'd 'c)) ((eq? (most-recent-play hist) 'c) (majority-loop (1+ cs) ds (rest-of-plays hist))) (else (majority-loop cs (1+ ds) (rest-of-plays hist))))) (majority-loop 0 0 other-history)) \endlisp Compare this procedure with the original version. Do the orders of growth (in time) for the two procedures differ? Is the newer version faster? \paragraph{Problem 2} Write a new strategy {\tt tit-for-two-tats}. The strategy should always cooperate unless the opponent defected on both of the previous two rounds. (Looked at another way: {\tt tit-for-two-tats} should cooperate if the opponent cooperated on either of the previous two rounds.) Play {\tt tit-for-two-tats} against other strategies. \paragraph{Problem 3} Write a procedure {\tt make-tit-for-n-tats}. This procedure should take a number as input and return the appropriate {\tt tit-for-tat}-like strategy. For example, {\tt (make-tit-for-n-tats 2)} should return a strategy equivalent to {\tt tit-for-two-tats}. \paragraph{Problem 4} {\it a.} Write a procedure {\tt make-dual-strategy} which takes as input two strategies (say, {\tt strat0} and {\tt strat1}) and an integer (say, {\tt switch-point}). {\tt Make-dual-strategy} should return a strategy which plays {\tt strat0} for the first {\tt switch-point} rounds in the iterated game, then switches to {\tt strat1} for the remaining rounds. {\it b.} Use {\tt make-dual-strategy} to define a procedure {\tt make-triple-strategy} which takes as input three strategies and two switch points. \paragraph{Problem 5} Write a procedure {\tt niceify}, which takes as input a strategy (say, {\tt strat}) and a number between 0 and 1 (call it {\tt niceness-factor}). The {\tt niceify} procedure should return a strategy that plays the same as {\tt strat} except: when {\tt strat} defects, the new strategy should have a {\tt niceness-factor} chance of cooperating. (If {\tt niceness-factor} is 0, the returned strategy is exactly the same as {\tt strat}; if {\tt niceness-factor} is 1, the returned strategy is the same as {\tt poor-trusting-fool}.) Use {\tt niceify} with a low value for {\tt niceness-factor} --- say, 0.1 --- to create two new strategies: {\tt slightly-nice-all-defect} and {\tt slightly-nice-tit-for-tat}. \section{The Three-Player Prisoner's Dilemma} So far, all of our prisoner's dilemma examples have involved two players (and, indeed, most game-theory research on the prisoner's dilemma has focused on two-player games). But it is possible to create a prisoner's dilemma game involving three --- or even more --- players. Strategies from the two-player game do not necessarily extend to a three-person game in a natural way. For example, what does {\tt tit-for-tat} mean? Should the player defect if {\it either} of the opponents defected on the previous round? Or only if {\it both} opponents defected? And are either of these strategies nearly as effective in the three-player game as {\tt tit-for-tat} is in the two-player game? Before we analyze the three-player game more closely, we must introduce some notation for representing the payoffs. We use a notation similar to that used for the two-player game. For example, we let DCC represent the payoff to a defecting player if both opponents cooperate. Note that the first position represents the player under consideration. The second and third positions represent the opponents. Another example: CCD represents the payoff to a cooperating player if one opponent cooperates and the other opponent defects. Since we assume a symmetric game matrix, CCD could be written as CDC. The choice is arbitrary. Now we are ready to discuss the payoffs for the three-player game. We impose three rules:\footnote{Actually, there is no universal definition for the multi-player prisoner's dilemma. The constraints used here represent one possible version of the three-player prisoner's dilemma.} \noindent 1) Defection should be the dominant choice for each player. In other words, it should always better for a player to defect, regardless what the opponents do. This rule gives three constraints: \begin{eqnarray*} DCC > CCC & {\rm (both\ opponents\ cooperate)}\\ DDD > CDD & {\rm (both\ opponents\ defect)}\\ DCD > CCD & {\rm (one\ opponent\ cooperates,\ one\ defects)} \end{eqnarray*} \noindent 2) A player should always be better off if more of his opponents choose to cooperate. This rule gives: \begin{eqnarray*} DCC > DCD > DDD \\ CCC > CCD > CDD \end{eqnarray*} \noindent 3) If one player's choice is fixed, the other two players should be left in a two-player prisoner's dilemma. This rule gives the following constraints: \begin{eqnarray*} CCD > DDD \\ CCC > DCD \\ CCD > (CDD + DCD) / 2 \\ CCC > (CCD + DCC) / 2 \end{eqnarray*} \begin{eqnarray*} CDD = 0 \\ DDD = 1 \\ CCD = 3 \\ DCD = 5 \\ CCC = 7 \\ DCC = 9 \end{eqnarray*} \paragraph{Problem 6} Revise the Scheme code for the two-player game to make a three-player iterated game. The program should take three strategies as input, keep track of three histories, and print out results for three players. You need to change only three procedures: {\tt play-loop}, {\tt print-out-results}, and {\tt get-scores}. You also need to change {\tt *game-association-list*} as follows: \beginlisp (define *game-association-list* '(((c c c) (7 7 7)) ((c c d) (3 3 9)) ((c d c) (3 9 3)) ((d c c) (9 3 3)) ((c d d) (0 5 5)) ((d c d) (5 0 5)) ((d d c) (5 5 0)) ((d d d) (1 1 1)))) \endlisp \paragraph{Problem 7} {\it a.} Write strategies {\tt poor-trusting-fool-3}, {\tt all-defect-3}, and {\tt random-strategy-3} that will work in a three-player game. Try them out to make sure your code is working. {\it b.} Write two new strategies: {\tt tough-tit-for-tat} and {\tt soft-tit-for-tat}. {\tt tough-tit-for-tat} should defect if {\it either} of the opponents defected on the previous round. {\tt soft-tit-for-tat} should defect only if {\it both} opponents defected on the previous round. Play some games using these two new strategies. \paragraph{Problem 8} A natural idea in creating a prisoner's dilemma strategy is to try and deduce what kind of strategies the {\it other} players might be using. In this problem, we will implement a simple version of this idea. First, we need a procedure that takes three histories as arguments: call them {\tt hist-0}, {\tt hist-1}, and {\tt hist-2}. The idea is that we wish to characterize the strategy of the player responsible for {\tt hist-0}. Our procedure will return a list of three numbers: the probability that the {\tt hist-0} player cooperates given that the other two players cooperated on the previous round, the probability that the {\tt hist-0} player cooperates given that only one other player cooperated on the previous round, and the probability that the {\tt hist-0} player cooperates given that both others defected on the previous round. To fill out some details in this picture, let's look at a couple of examples. We will call our procedure {\tt get-probability-of-c}; here are a couple of sample calls. \beginlisp ==> (get-probability-of-c '(c c c c) '(d d d c) '(d d c c)) (1 1 1) \null ==> (get-probability-of-c '(c c c d c) '(d c d d c) '(d c c c c)) (0.5 1 ()) \endlisp In the top example, the returned list indicates that the first player cooperates with probability 1 no matter what the other two players do. In the bottom example, the first player cooperates with probability 0.5 when the other two players cooperate; the first player cooperates with probability 1 when one of the other two players defects; and since we have no data regarding what happens when both other players defect, our procedure returns {\tt ()} for that case. Write the {\tt get-probability-of-c} procedure. Using this procedure, you should be able to write some predicate procedures that help in deciphering another player's strategy. For instance, here are two possibilities: \beginlisp (define (is-he-a-fool? hist0 hist1 hist2) (equal? '(1 1 1) (get-probability-of-c hist0 hist1 hist2))) \null (define (could-he-be-a-fool? hist0 hist1 hist2) (equal? (list true true true) (map (lambda (elt) (or (null? elt) (eqv? elt 1))) (get-probability-of-c hist0 hist1 hist2)))) \endlisp Use the {\tt get-probability-of-c} procedure to write a predicate that tests whether another player is using the {\tt soft-tit-for-tat} strategy from Problem 7. Also, write a new strategy named {\tt dont-tolerate-fools}. This strategy should cooperate for the first ten rounds; on subsequent rounds it checks (on each round) to see whether the other players might both be playing {\tt poor-trusting-fool}. If our strategy finds that both other players seem to be cooperating uniformly, it defects; otherwise, it cooperates. \paragraph{Problem 9} Write a procedure {\tt make-combined-strategies} which takes as input two {\it two-player} strategies and a ``combining'' procedure. {\tt Make-combined-strategies} should return a {\it three-player} strategy that plays one of the two-player strategies against one of the opponents, and the other two-player strategy against the other opponent, then calls the ``combining'' procedure on the two two-player results. Here's an example: this call to {\tt make-combined-strategies} returns a strategy equivalent to {\tt tough-tit-for-tat} in Problem 7. \beginlisp (make-combined-strategies tit-for-tat tit-for-tat (lambda (r1 r2) (if (or (eq? r1 'd) (eq? r2 'd)) 'd 'c))) \endlisp The resulting strategy plays {\tt tit-for-tat} against each opponent, and then calls the combining procedure on the two results. If either of the two two-player strategies has returned {\tt d}, then the three-player strategy will also return {\tt d}. Here's another example. This call to {\tt make-combined-strategies} returns a three-player strategy that plays {\tt tit-for-tat} against one opponent, {\tt go-by-majority} against another, and chooses randomly between the two results: \beginlisp (make-combined-strategies tit-for-tat go-by-majority (lambda (r1 r2) (if (= (random 2) 0) r1 r2))) \endlisp \section{Extra Credit: The Three-Player Prisoner's Dilemma Tournament} As decribed earlier, Axelrod held two computer tournaments to investigate the two-player prisoner's dilemma. Last semester, 6.001 students made game theory history by participating in the world's first THREE-player prisoner's dilemma tournament. Now, in a return engagement, the second-ever three-player tournament will be held. Last semester's results indicated that basically cooperative strategies do very well. The winning strategy, for instance, defected only if both opponents had each defected twice in a row. And the third place strategy (out of more than fifty entries) was simply {\tt poor-trusting-fool-3}! You can participate this year by designing a strategy for the tournament. You might submit one of the strategies developed in the problem set, or develop a new one. The only restriction is that the strategy must work against any other legitimate entry. Any strategies that cause the tournament software to crash will be disqualified. Instructions for entering your strategy in the tournament will be provided on a separate handout and posted in the laboratory. \end{document}