In artificial intelligence, the iterated prisoner’s dilemma is a spinoff of the prisoner’s dilemma from game theory. There are many variations of the original game, so for consistency’s sake we’ll use the following one.
You are a mafia agent who has just stolen ten bars of gold from a bank with your partner-in-crime. To determine your rewards, your twisted mafia overlord sits you both down in front of the gold and instructs each of you to either give two bars to your companion, or take one bar for yourself. The twist? He places a screen between the two of you, so neither of you knows what the other will do.
What’s an agent to do? Two bars are better than one, but one bar is better than none. Do you trust that your partner is benevolent and will give you gold, or do you ensure that you get something by taking a bar?
We won’t go too much further into this, except to note that the best outcome for each agent occurs when he takes one bar and his partner gives him two. Since both agents know this, each will theoretically attempt to improve his own situation by taking gold for himself. The resulting outcome, where both agents take, is what game theory calls a Nash equilibrium: neither agent can improve his payoff by unilaterally changing his move.
The iterated prisoner’s dilemma game requires two prisoner’s dilemma agents to continuously face each other in a set number of rounds. On each round, an agent can either take one gold bar from an infinite store in the middle (selfishly), or give two bars to his opposing agent (benevolently). Agents have memory, so if one side acts selfishly, his opponent is likely to retaliate with selfishness later on.
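The per-round rules above can be sketched in a few lines of Python (the function name and interface are my own, not from any particular tournament framework):

```python
def play_round(a_takes: bool, b_takes: bool) -> tuple[int, int]:
    """Return (agent A's gold, agent B's gold) for one round.

    Each agent either takes 1 bar for himself or gives 2 bars
    to his opponent.
    """
    a_gold = (1 if a_takes else 0) + (0 if b_takes else 2)
    b_gold = (1 if b_takes else 0) + (0 if a_takes else 2)
    return a_gold, b_gold

# Both give: 2 bars each. Both take: 1 bar each. One takes while
# the other gives: the taker nets 3 and the giver nets 0.
```

Note that this yields the classic prisoner’s-dilemma ordering of payoffs: temptation (3) > mutual cooperation (2) > mutual defection (1) > being the sucker (0).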
In college, I took a class where we were tasked with creating a bot that competed with each of the other bots in the class in an iterated prisoner’s dilemma tournament. Importantly, each bot’s goal was to gain the most gold among all the rounds it played, and not to beat as many opponents as possible.
What is the difference between these two goals? It wasn’t clear to me until I sat down to discuss winning strategies with some friends.
The well-known solution was a ridiculously simple bot that does whatever its opponent did the previous round, starting by giving. This is known as the tit-for-tat bot, and it is famous in psychology/political science circles for beating every other bot in two international tournaments around 1980.  For the most part, my classmates schemed to beat it.
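The whole strategy fits in a handful of lines. Here is a minimal sketch, assuming the simplest possible interface (a function from the opponent’s previous move, or `None` on the first round, to this round’s move):

```python
def tit_for_tat(opponent_last_took):
    """Give on the first round, then mirror the opponent's last move.

    `opponent_last_took` is None on the first round, otherwise a bool
    recording whether the opponent took last round.
    """
    if opponent_last_took is None:
        return False               # first round: give benevolently
    return opponent_last_took      # take iff the opponent took last round
```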
More than one friend confided in me that their strategy was to build an agent guaranteed to outscore a tit-for-tat bot in any head-to-head match. To do this, the agents would be more nefarious than tit-for-tat: they would take gold at random, and certainly on the last round with any opponent. Thus, they were guaranteed a higher average score than their opponent across the rounds the pair played together.
It seemed sound. If you were guaranteed to score higher on every round against a provably great bot, surely you would perform better than that bot in the tournament. I sketched out what this looked like.
And then I realized I was missing 90% of the picture.
Because each bot played every other bot, the nefarious bots wound up hurting themselves more than they did anyone else. They would perform well locally, against a single tit-for-tat bot. But in the greater environment of the tournament, where all agents retaliated against their opponents for selfish behaviour, taking was extremely costly. Taking begets taking, which risks sending a pair of agents into a downward spiral of mistrusting each other and taking for the rest of their rounds. This results in significantly lower profits for both agents.
Meanwhile, the success of each tit-for-tat bot increased as the total number of tit-for-tat bots (or similarly generous agents) in the tournament increased. It was a win-win situation every time a tit-for-tat bot encountered another such bot, because both agents would give to their opponent indefinitely. The optimal agents were also the generous ones, who wouldn’t take unless given a reason to. As expected, tit-for-tat bots did extremely well in the tournament.
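A toy round-robin, in the spirit of the tournament, makes the arithmetic concrete. The bot names, the 100-round match length, and the 3-to-1 population mix are my own illustrative choices:

```python
def play_match(strat_a, strat_b, rounds=100):
    """Play two strategies against each other; return each one's total gold."""
    a_total = b_total = 0
    a_last = b_last = None  # None signals "first round" to each strategy
    for _ in range(rounds):
        a_takes = strat_a(b_last)
        b_takes = strat_b(a_last)
        a_total += (1 if a_takes else 0) + (0 if b_takes else 2)
        b_total += (1 if b_takes else 0) + (0 if a_takes else 2)
        a_last, b_last = a_takes, b_takes
    return a_total, b_total

def tit_for_tat(opp_last_took):
    return False if opp_last_took is None else opp_last_took

def always_take(opp_last_took):
    return True  # the maximally nefarious agent

# A tiny tournament: three tit-for-tat bots and one pure taker.
bots = {"tft1": tit_for_tat, "tft2": tit_for_tat,
        "tft3": tit_for_tat, "taker": always_take}
scores = {name: 0 for name in bots}
names = list(bots)
for i in range(len(names)):
    for j in range(i + 1, len(names)):
        a, b = play_match(bots[names[i]], bots[names[j]])
        scores[names[i]] += a
        scores[names[j]] += b
```

With these numbers, the taker wins every head-to-head against tit-for-tat 102 to 99, yet finishes the tournament with 306 gold against each tit-for-tat bot’s 499, because the tit-for-tat bots earn 200 apiece every time they meet each other.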
“You can lose some battles, and still win the war.”
For several weeks afterwards, I couldn’t get this idea out of my head. The iterated prisoner’s dilemma tournament illustrated an interesting perspective on life itself. While acting antagonistically may let you take the upper hand in isolated situations or relationships, acting benevolently lets you take the upper hand in the greater system. I thought about traits like generosity, cooperation, and humility, which can sometimes feel like losing to an undeserving opponent. Perhaps, in actuality, they’re win-win situations.
 To elaborate: each agent knows that his own most optimal move is to take gold, and he knows that his opponent’s most optimal move is to take gold. He thus assumes that his opponent will take gold, leaving him with no option but to take as well.
 Dr. Robert Axelrod’s famous paper, The Evolution of Cooperation. Axelrod was the one who ran the original iterated prisoner’s dilemma tournaments.