Google's artificially intelligent neural network of computers is now so clever that it can solve puzzles meant for computing professionals who apply for jobs with the internet giant.
According to researchers from Oxford University, the Canadian Institute for Advanced Research and Google, the DeepMind neural network can be taught to solve communication-based co-ordination tasks. The network previously succeeded in teaching itself to play 31 retro arcade video games, although it was unable to master Pac-Man.
Each agent first needs to develop a communications protocol in order to communicate with the other agents in the network, and the researchers say that, for the first time, deep reinforcement learning has enabled a neural network to learn such protocols successfully.
The puzzle set for the neural network is one that is posed to prospective Google employees when they attend an interview with the company.
Can you answer this Google interview question?
Q: An executioner lines up 100 prisoners single file and puts a red or a blue hat on each prisoner's head. Every prisoner can see the hats of the people in front of him in the line – but not his own hat, nor those of anyone behind him. The executioner starts at the end (back) and asks the last prisoner the colour of his hat. He must answer "red" or "blue."
If he answers correctly, he is allowed to live. If he gives the wrong answer, he is killed instantly and silently. (While everyone hears the answer, no one knows whether an answer was right.) On the night before the line-up, the prisoners confer on a strategy to help them. What should they do?
In order to solve this problem, the DeepMind neural network modelled each of the 100 prisoners as a separate computer in the network and then considered what coloured hats each agent would be able to see if they were lined up in a row. Each artificially intelligent agent had to consider what to tell the others, and then collectively they could work out the answer.
Answer, by Terence Gaffney, mathematics professor at Northeastern University:
"You have a 100% chance of saving all but the last prisoner, and a 50% chance of saving that one. Here's the strategy the prisoners have agreed on. The last prisoner counts the number of blue hats worn; if the number is even, the last prisoner yells "blue", if odd, yells "red". If the 99th prisoner hears "blue", but counts an odd number of blue hats, then his hat must be blue so that the total number of blue hats is even. If he counts an even number of blue hats, then his hat must be red. If the last prisoner yells red, then #99 knows that there are an odd number of blue hats.
So #99 counts the number of blue hats he can see. Again, if they are even, his hat is blue, if odd, his hat is red. The 99th prisoner then yells out the color of his hat and is spared. The next prisoner now knows whether the remaining number of blue hats, including his own, is odd or even, by taking into account whether #99 had a blue hat or not. Then by counting the number of blue hats he sees, he knows the color of his hat. So he yells out the color of his hat and is spared. This saves all but the last prisoner, and there is a 50% chance that his hat is the color he shouted out." - Answer first appeared in the New York Times
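For readers who want to check the human answer, the parity strategy Gaffney describes can be sketched in a few lines of Python. This is only an illustration of the quoted solution, not the protocol the DeepMind agents learned, which the researchers say is different. The function name `parity_strategy` and the list layout (index 0 is the prisoner at the back of the line) are assumptions made for this sketch.

```python
import random


def parity_strategy(hats):
    """Simulate the quoted parity strategy (illustrative sketch only).

    hats[0] is the prisoner at the back, who sees everyone else;
    hats[-1] is the prisoner at the front, who sees no one.
    Returns the answers in the order the executioner asks for them.
    """
    # The back prisoner encodes the parity of the blue hats he sees:
    # "blue" for an even count, "red" for an odd count.
    blue_seen = sum(1 for h in hats[1:] if h == "blue")
    answers = ["blue" if blue_seen % 2 == 0 else "red"]

    # Parity of blue hats among the prisoners still to answer,
    # which everyone can track from the answers heard so far.
    remaining_even = answers[0] == "blue"

    for i in range(1, len(hats)):
        # Each prisoner only uses the hats in front of him.
        blue_ahead = sum(1 for h in hats[i + 1:] if h == "blue")
        ahead_even = blue_ahead % 2 == 0
        # If the two parities differ, his own hat must be blue.
        own = "red" if ahead_even == remaining_even else "blue"
        answers.append(own)
        if own == "blue":
            # One blue hat is now accounted for, so the parity flips.
            remaining_even = not remaining_even
    return answers


if __name__ == "__main__":
    hats = [random.choice(["red", "blue"]) for _ in range(100)]
    answers = parity_strategy(hats)
    saved = sum(a == h for a, h in zip(answers, hats))
    print(f"{saved} of {len(hats)} prisoners saved")
```

Run on any random line-up, the sketch saves either 99 or 100 prisoners: everyone except the back prisoner is always right, and the back prisoner survives only when the parity signal happens to match his own hat.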
The researchers say that the network of computers was able to work out the answer to the problem by itself, and that this is a great step towards getting robots to work together successfully to complete tasks.
"They've come up with protocols that are different from how humans solve these problems. We don't yet fully understand what the solutions are, but we know that they work," Oxford University researcher Jakob Foerster told New Scientist magazine.
The research, entitled "Learning to Communicate to Solve Riddles with Deep Distributed Recurrent Q-Networks", is an open-access paper published in the Cornell University Library repository.