Google's DeepMind tests AI vs AI to see if they become 'aggressive' or cooperate

Google's artificial intelligence subsidiary DeepMind is pitting AI agents against one another to test how they interact and how they react in various "social dilemmas". In a new study, researchers said they used two video games – Wolfpack and Gathering – to examine, using principles from the social sciences and game theory, how AI agents change their behaviour depending on their environment and situation.

"The question of how and under what circumstances selfish agents cooperate is one of the fundamental questions in the social sciences," DeepMind researchers wrote in a blog post. "One of the simplest and most elegant models to describe this phenomenon is the well-known game of Prisoner's Dilemma from game theory."

The dilemma is based on a scenario in which two arrested suspects, jointly accused of a crime, are questioned separately. The police have enough evidence to charge them with a minor offence but not with the main, more serious crime.

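Both prisoners are offered the same deal: If you testify against the other prisoner - a move known as "defecting" - you will be released and the other will serve three years in prison. However, if both prisoners testify against each other, both will be handed a two-year prison sentence. If both stay silent, each can be charged only with the minor offence. Testifying is therefore tempting for each prisoner individually, yet it leaves the pair worse off than mutual silence, which is the social dilemma.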
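The deal can be written down as a small payoff table. The Python sketch below is only an illustration, not DeepMind's code: the 0-, 3- and 2-year sentences come from the description above, while the 1-year sentence for mutual silence is an assumed figure for the minor offence, and the names `SENTENCE`, `best_response` and `total_years` are hypothetical.

```python
# Years in prison for (my_move, other_move); lower is better.
# The 0-, 3- and 2-year figures follow the deal described above.
# ASSUMPTION: mutual silence costs 1 year each (the minor offence).
SENTENCE = {
    ("silent", "silent"): 1,
    ("silent", "defect"): 3,
    ("defect", "silent"): 0,
    ("defect", "defect"): 2,
}

MOVES = ("silent", "defect")


def best_response(other_move):
    """Return the move that minimises my own sentence given the other's move."""
    return min(MOVES, key=lambda me: SENTENCE[(me, other_move)])


def total_years(move_a, move_b):
    """Combined sentence served by both prisoners."""
    return SENTENCE[(move_a, move_b)] + SENTENCE[(move_b, move_a)]


if __name__ == "__main__":
    for other in MOVES:
        print(f"If the other prisoner plays {other!r}, my best reply is {best_response(other)!r}")
    print("Total years if both defect:", total_years("defect", "defect"))
    print("Total years if both stay silent:", total_years("silent", "silent"))
```

Under these assumed numbers, defecting is the best reply whatever the other prisoner does, yet mutual defection costs the pair more years in total than mutual silence.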

The researchers found that the AI's behaviour does change based on its environment and may react in an "aggressive manner" if it feels it may lose out. However, the agents will cooperate and work as a team if there is more to be gained.

In the game Gathering, two players are tasked with collecting apples (represented as green pixels) from a central pile. However, they also have the option of "tagging" the other player with a laser beam to temporarily remove them from the game, giving the first player more time to gather apples.

The researchers allowed the agents to play the game thousands of times and "let them learn how to behave rationally using deep multi-agent reinforcement learning". When there were enough apples in the environment, the agents learnt to "peacefully coexist" and collect as many apples as they could. However, when the number of apples was reduced, the agents learnt "highly aggressive policies" and fired their lasers at each other, since resources were scarce and they risked ending up without a reward.

"The greed motivation reflects the temptation to take out a rival and collect all the apples for oneself," the researchers wrote. "The fear motivation reflected the danger of being taken out oneself by a defecting rival."

In the second game, Wolfpack, two red players acting as in-game wolves have to hunt a third blue player, the prey, in an environment peppered with grey obstacles. If both wolves are near the prey when it is captured, both will receive a reward. However, if one wolf captures it alone, there is a risk of losing the carcass to scavengers.
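The reward rule described above can be sketched in a few lines of Python. This is a rough illustration under stated assumptions, not the game's actual implementation: the capture radius, the reward values and the function name `wolfpack_rewards` are all hypothetical, and the smaller lone reward merely stands in for the risk of losing the carcass to scavengers.

```python
import math


def wolfpack_rewards(wolves, prey, capturer, radius=2.0, lone=1.0, team=5.0):
    """Return one reward per wolf at the moment `capturer` catches the prey.

    wolves   -- list of (x, y) positions, one per wolf
    prey     -- (x, y) position of the prey
    capturer -- index into `wolves` of the wolf making the capture
    radius, lone, team -- ASSUMED values, chosen only for illustration
    """
    # A wolf counts as "near" if it is within `radius` of the prey.
    near = [math.dist(w, prey) <= radius for w in wolves]
    near[capturer] = True  # the capturing wolf is always part of the kill
    # A shared capture pays each nearby wolf more than a lone capture does.
    payout = team if sum(near) >= 2 else lone
    return [payout if is_near else 0.0 for is_near in near]


if __name__ == "__main__":
    # Both wolves close to the prey: each receives the team reward.
    print(wolfpack_rewards([(0, 0), (1, 0)], (0, 1), capturer=0))
    # Second wolf far away: the capturer receives only the lone reward.
    print(wolfpack_rewards([(0, 0), (9, 9)], (0, 1), capturer=0))
```

With a rule of this shape, each wolf does better by staying close to its partner before striking, which is the kind of coordination the game is meant to reward.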

In Wolfpack, the researchers found that the smarter the AI agent is, the more likely it is to cooperate with other players. They said this is likely because learning to cooperate in Wolfpack requires more coordination and computational power than Gathering does.

They noted that these experiments showed AI agents may alter their behaviour to become more cooperative or more aggressive depending on the situation, the rules that are in place and what is at stake.

"We showed that we can apply the modern AI technique of deep multi-agent reinforcement learning to age-old questions in social science such as the mystery of the emergence of cooperation," researchers wrote. "As a consequence, we may be able to better understand and control complex multi-agent systems such as the economy, traffic systems, or the ecological health of our planet - all of which depend on our continued cooperation."
