Empowerment

17 Oct 2016

One of the simple strategies for game-playing (games like chess, checkers, backgammon, and go) is called "rollout". This is where the computer evaluates a particular position by playing randomly from that position to the end, repeatedly. If a position is strong, then there will be a lot of ways to win, and the rollouts will show that.
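To make the rollout idea concrete, here is a minimal sketch on a toy game. The game is my own stand-in, not one of the games named above: players alternately take 1, 2, or 3 stones, and whoever takes the last stone wins. The names `legal_moves` and `rollout_value` are made up for illustration.

```python
import random

def legal_moves(stones):
    # Hypothetical toy game: take 1, 2, or 3 stones; taking the last stone wins.
    return [k for k in (1, 2, 3) if k <= stones]

def rollout_value(stones, n_rollouts=1000, rng=None):
    """Estimate the win rate for the player to move, using purely random playouts."""
    rng = rng or random.Random(0)
    wins = 0
    for _ in range(n_rollouts):
        remaining, mover_is_us = stones, True
        we_won = False  # if there are no stones left at the start, we have already lost
        while remaining > 0:
            remaining -= rng.choice(legal_moves(remaining))
            if remaining == 0:
                we_won = mover_is_us
            mover_is_us = not mover_is_us
        wins += we_won
    return wins / n_rollouts
```

With one stone left the only legal move wins, so `rollout_value(1)` is 1.0. Larger piles give a noisy estimate somewhere between 0 and 1, which is all a rollout evaluator promises.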

Empowerment is a similarly simple heuristic for NPCs, proposed by Klyubin, Polani, and Nehaniv. It applies not only to adversarial, game-playing AIs, but to generic agents. The agent evaluates a particular position by rolling out repeatedly from that position and counting the number of positions it can get to: more is better.
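Here is a rough sketch of that counting heuristic, as I understand it, in a deliberately tiny world. The world is an assumption of mine: a corridor of 7 cells where the agent can step left, stay, or step right. The names `step`, `sampled_empowerment`, and `best_action` are hypothetical, and this is the simple "count what the rollouts reach" version described above, not the full information-theoretic definition.

```python
import random

ACTIONS = (-1, 0, 1)  # step left, stay, step right

def step(pos, action, length=7):
    # Hypothetical dynamics: a corridor of `length` cells; walking into an end wall does nothing.
    return min(max(pos + action, 0), length - 1)

def sampled_empowerment(start, horizon=2, n_rollouts=500, rng=None):
    """Count the distinct end positions seen over many random rollouts from `start`."""
    rng = rng or random.Random(0)
    seen = set()
    for _ in range(n_rollouts):
        pos = start
        for _ in range(horizon):
            pos = step(pos, rng.choice(ACTIONS))
        seen.add(pos)
    return len(seen)

def best_action(pos, **kwargs):
    # Greedy policy: take the action whose resulting position scores highest.
    return max(ACTIONS, key=lambda a: sampled_empowerment(step(pos, a), **kwargs))
```

An agent standing against the left wall sees fewer distinct outcomes than one in the middle of the corridor, so `best_action(0)` steps right, away from the wall.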

An empowerment-based agent's behavior depends on what world the agent is in. If it is in a 5x5 open square grid, then its options would be maximized by going to the center of the grid. If the 5x5 grid is not open, but separated horizontally into "floors", except for one "elevator" column, then its options would instead be maximized by going to the elevator column, on floor 3. If you nudge the empowerment-based agent, it would behave reasonably: retrying if you push it "backward", and opportunistically taking advantage of your push if you push it "forward". If you change the world, adding a wall somehow, it reacts, rather than mindlessly pushing against the wall as if it isn't there.
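The two grid worlds are small enough to check by brute force. The sketch below replaces random rollouts with exhaustive enumeration of every move sequence up to a short horizon, which gives the same count without sampling noise. Two things are assumptions on my part: the elevator is placed in column 0 (nothing above says which column it is), and the horizon is 2 moves. All function names are made up.

```python
MOVES = {"stay": (0, 0), "up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
SIZE = 5
ELEVATOR_COL = 0  # assumption: which column holds the elevator is not specified

def open_step(cell, move):
    # Open grid: any move is fine unless it would leave the grid.
    r, c = cell[0] + move[0], cell[1] + move[1]
    return (r, c) if 0 <= r < SIZE and 0 <= c < SIZE else cell

def floors_step(cell, move):
    # Floors world: vertical moves only succeed inside the elevator column.
    if move[0] != 0 and cell[1] != ELEVATOR_COL:
        return cell
    return open_step(cell, move)

def reachable_count(cell, step, horizon=2):
    """Number of distinct cells reachable in `horizon` moves (staying put is a move)."""
    frontier = {cell}
    for _ in range(horizon):
        frontier = {step(s, m) for s in frontier for m in MOVES.values()}
    return len(frontier)

def most_empowered(step, horizon=2):
    cells = [(r, c) for r in range(SIZE) for c in range(SIZE)]
    return max(cells, key=lambda cell: reachable_count(cell, step, horizon))

print(most_empowered(open_step))    # → (2, 2), the center
print(most_empowered(floors_step))  # → (2, 0), the elevator column on the third floor
```

Rows are numbered from 0 here, so row 2 is "floor 3". Adding a wall amounts to editing the step function; nothing else in the agent has to change.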

If you add a "dead" state, where the agent has no options, then the agent will act to avoid death.
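A dead state is just a state where every action leads back to itself, so its reachable count is 1, the lowest possible. A small sketch, assuming a made-up corridor world where cell 0 is a pit (the names and the horizon of 3 are mine):

```python
ACTIONS = (-1, 0, 1)  # step left, stay, step right
DEAD = "dead"
LENGTH = 7

def step(state, action):
    # Hypothetical world: cells 0..6 in a row, where cell 0 is a pit.
    # Once dead, every action leaves the agent dead.
    if state == DEAD:
        return DEAD
    nxt = min(max(state + action, 0), LENGTH - 1)
    return DEAD if nxt == 0 else nxt

def reachable_count(state, horizon=3):
    """Number of distinct states reachable in `horizon` actions."""
    frontier = {state}
    for _ in range(horizon):
        frontier = {step(s, a) for s in frontier for a in ACTIONS}
    return len(frontier)

def best_action(state, horizon=3):
    # Greedy policy: take the action whose resulting state has the most options.
    return max(ACTIONS, key=lambda a: reachable_count(step(state, a), horizon))
```

Standing next to the pit, `best_action(1)` is a step to the right. Nobody told the agent that dying is bad; stepping into the pit simply scores 1, and everything else scores higher.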

A different group (of physicists) have independently invented this idea, which they call "Causal Entropy". They have a video of a simulated pendulum, subjected to a causal entropic force, balancing itself.

I think empowerment is a great idea for games. Specifying utility functions manually is hard. Specifying rules for behavior is hard. Specifying dynamics ("doing this will cause this to occur") is natural and easy. There's a need for "cheap and shitty NPCs" that nevertheless behave sort of reasonably almost immediately. After all, you can go back and replace them with some utility function optimizer after you work out what the game IS.

One idea that might be interesting would be combining the empowerment heuristic with a different idea, of "umwelt".

The way I understand it, Umwelt is the idea that someone else's world can be much smaller, simpler, and perhaps more exotic, compared to our own understanding of the world. There's an example in the Wikipedia article of someone trying to "get inside the head of" a tick.

From a game-AI perspective, the point is that empowerment needs a model world to run against, but it doesn't have to be the real world. An agent that lives in a 5x5 open square grid, but believes it lives in the floors-and-elevator world, will behave like it has a preference regarding its manner of movement. Explaining the floors-and-elevator world to the agent is straightforward, but writing rules that express the desired manner of movement (including things like opportunistically taking advantage of nonmodeled nudges that move the agent forward, and behaving reasonably when nonmodeled nudges move the agent backward or to a different floor) is harder. Writing a closed-form utility function for the agent to maximize is also a bit tricky.
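A sketch of that mismatch: the agent plans with `believed_step` (floors and an elevator) while the game actually moves it with `real_step` (an open grid). As before, the elevator column (0), the planning horizon (4), and all the names are assumptions for illustration only.

```python
MOVES = {"stay": (0, 0), "up": (-1, 0), "down": (1, 0), "left": (0, -1), "right": (0, 1)}
SIZE = 5
ELEVATOR_COL = 0  # assumption: the elevator's column is not specified

def real_step(cell, move):
    # The world the agent actually lives in: an open 5x5 grid.
    r, c = cell[0] + move[0], cell[1] + move[1]
    return (r, c) if 0 <= r < SIZE and 0 <= c < SIZE else cell

def believed_step(cell, move):
    # The agent's umwelt: vertical moves only work in the elevator column.
    if move[0] != 0 and cell[1] != ELEVATOR_COL:
        return cell
    return real_step(cell, move)

def reachable_count(cell, horizon=4):
    """Distinct cells the agent believes it can reach in `horizon` moves."""
    frontier = {cell}
    for _ in range(horizon):
        frontier = {believed_step(s, m) for s in frontier for m in MOVES.values()}
    return len(frontier)

def choose_move(cell):
    # Planning happens entirely inside the believed model.
    return max(MOVES, key=lambda name: reachable_count(believed_step(cell, MOVES[name])))

def run(cell, steps=8):
    path = [cell]
    for _ in range(steps):
        cell = real_step(cell, MOVES[choose_move(cell)])  # the real world applies the move
        path.append(cell)
    return path
```

Starting from `(4, 4)`, the agent walks left along its floor to column 0, rides "up" two rows, and then stays at `(2, 0)`, even though nothing in the real grid stops it from cutting diagonally. A nudge is just a different starting cell for the next call to `choose_move`, so being pushed to another floor needs no special handling.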

There's a story, in Lean manufacturing, that you have tasks piling up in a queue in front of a worker, and for one reason or another, it is a good idea to reduce the number of tasks in the queue. One reason given is that the worker works faster with fewer things in the queue (congestion). Another reason given is that the items in the queue gradually "go bad". A third reason given is that the tasks are actually expensive pieces of metal, and the business can operate more efficiently, requiring less capital, by buying the pieces of metal "just in time". I wonder if there is a way to express the "reduce the number of tasks in the queue" rule of thumb as "follow the empowerment heuristic".