Sandboxing the Homunculus

To understand the problem of dualistic experience, it is important to treat the brain as a cybernetic system.

This is a picture of your cerebrum, the part of your brain that embeds your consciousness.

[TODO image of human brain with cerebrum labeled]

The purpose of your cerebrum is to send signals that tell your body what to do.

[TODO image of cerebrum with output signal moving a hand]

To determine what signals it should send, the cerebrum needs to know what’s going on in the world. Your cerebrum thus gets input signals from your senses.

[TODO image of cerebrum with output signal and input signal]

The data going from your eyes into your cerebrum is, at best, only a snapshot of what is happening in the physical world. That’s not enough information to work from. What if a bad guy is behind you? The cerebrum doesn’t make decisions based only on the information coming through the senses right this instant. Instead, the cerebrum builds a real-time simulation of the world called your umwelt, which is merely updated by sensory inputs. The algorithm it uses for this is called predictive coding.

[TODO image of cerebrum with outside object, input signal, and simulated object]

Predictive coding isn’t the only important algorithm going on in the brain. There’s another important algorithm called operant conditioning[1] (aka reinforcement learning). Operant conditioning is the idea that certain behaviors are reinforced. If you’re hungry and you eat a yummy cookie, then this behavior is likely to be reinforced through the operant conditioning algorithm.

Basically:

  • Predictive coding takes input signals and uses them to create your umwelt.
  • Operant conditioning takes your umwelt and uses it to make decisions about what output signals to send.
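The two bullets above can be sketched as a toy loop. This is only an illustration under heavy assumptions: the names `Umwelt`, `predictive_coding`, `operant_conditioning`, and the `learning_rate` parameter are all made up here, the error-driven update is a crude stand-in for real predictive coding, and a fixed threshold rule stands in for a learned policy.

```python
from dataclasses import dataclass, field


@dataclass
class Umwelt:
    """Toy world-model: a dict of believed quantities (illustrative only)."""
    beliefs: dict = field(default_factory=dict)


def predictive_coding(umwelt, senses, learning_rate=0.5):
    """Nudge each belief toward the sensed value by a fraction of the prediction error."""
    for key, observed in senses.items():
        predicted = umwelt.beliefs.get(key, 0.0)
        error = observed - predicted
        umwelt.beliefs[key] = predicted + learning_rate * error
    return umwelt


def operant_conditioning(umwelt):
    """Choose an output signal by reading only the umwelt, never the raw senses."""
    return "eat" if umwelt.beliefs.get("hunger", 0.0) > 0.5 else "rest"


umwelt = Umwelt()
for _ in range(3):
    # Hypothetical sensory input: a steady hunger signal from the stomach.
    predictive_coding(umwelt, {"hunger": 1.0})

action = operant_conditioning(umwelt)
```

Note that the decision function never sees the `senses` dict at all. Everything it knows about the world arrives through the umwelt.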

Your umwelt and the external physical world are similar and have many 1-to-1 correspondences. There is something called “your right hand” in the physical world and there is something called “your right hand” in your umwelt. If your right hand is cut off in the physical world then your right hand will also be cut off in your umwelt, due to your predictive coding algorithm updating your umwelt.

If you get hunger signals from your senses, then your predictive coding algorithm updates your umwelt to include hunger. Your operant conditioning algorithm (seeded by hard-coded instincts) decides to output motor actions that make you eat food. Your umwelt updates itself immediately because it is a simulation of physical causality. Later, the predictive coding algorithm updates your umwelt either directly, via surprise, or indirectly, via nonsurprise, according to the sensory signals it receives from the physical world.

[TODO circular diagram: stomach -> brain [I’m hungry] -> eat food (smiling hamburger) -> stomach. Subtitle: “It’s an Impossible burger.”]

When the operant conditioning algorithm tells the physical world “move my left hand”, it sees your left hand move in your umwelt. Your operant conditioning algorithm doesn’t know that your umwelt and the physical world are different things. It doesn’t even notice a lag between action and observation, because your umwelt is a real-time prediction, not an after-the-fact observation. Your operant conditioning algorithm gets ground truth from one universe (your umwelt) and sends decision signals to a different reality (the physical one), all while not even noticing that they’re two fundamentally different phenomenological planes of existence. From the perspective of your operant conditioning algorithm, there is only one universe.

Since the operant conditioning algorithm doesn’t observe external physical reality, it only really cares about your umwelt. And since your umwelt is neural activity running in your cerebrum, and the operant conditioning algorithm is an algorithm modifying the neural activity in your cerebrum, there’s an obvious failure mode: Instead of sending signals telling your body to eat food, your operant conditioning algorithm might just modify your umwelt to say “not hungry”.
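Here is a minimal sketch of that failure mode, with the world and the umwelt reduced to two plain dicts. The function names `honest_step` and `cheating_step` are invented for this illustration and say nothing about how a brain implements either path.

```python
def honest_step(world, umwelt):
    """Act on the world; the umwelt changes only through sensing."""
    if umwelt["hunger"] > 0.5:
        world["hunger"] = 0.0           # motor output: eat food
    umwelt["hunger"] = world["hunger"]  # sensory update of the umwelt


def cheating_step(world, umwelt):
    """Overwrite the belief directly; the world is left untouched."""
    if umwelt["hunger"] > 0.5:
        umwelt["hunger"] = 0.0          # "No, I'm not hungry."


world_a, umwelt_a = {"hunger": 1.0}, {"hunger": 1.0}
honest_step(world_a, umwelt_a)

world_b, umwelt_b = {"hunger": 1.0}, {"hunger": 1.0}
cheating_step(world_b, umwelt_b)
```

After one step both umwelts report no hunger, so from the inside the two strategies look equally successful. Only `world_a` has actually been fed.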

[TODO diagram: stomach -> brain I’m hungry -> No I’m not hungry. This doesn’t work. (lonely hamburger). Subtitle: This one is an Impossible burger.]

This is bad and stupid. The part in your brain that tells your body what to do based on the umwelt should not be able to crudely, deliberately, and directly modify that same umwelt, because if it could it’d just modify the world into sunshine and rainbows instead of doing its job.

[TODO comic of brain with sunshine and rainbows inside and rain outside]

Consequently, normative human brains fall into a dynamical attractor where there is a world-model part and a decision-making part, and the decision-making part has read-only access to the world-model part. This sandboxed decisionmaker is called the homunculus.
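In software terms, the arrangement resembles handing the decision-maker a read-only view of the world-model. The sketch below is an analogy only: `WorldModel` and `homunculus` are hypothetical names, and Python's standard `types.MappingProxyType` is used here as one convenient way to expose a dict without write access.

```python
from types import MappingProxyType


class WorldModel:
    """Toy world-model part: only sensory updates may write to it."""

    def __init__(self):
        self._beliefs = {"hunger": 0.0}

    def update_from_senses(self, senses):
        self._beliefs.update(senses)

    def read_only_view(self):
        # A live view of the beliefs that rejects item assignment.
        return MappingProxyType(self._beliefs)


def homunculus(view):
    """Toy decision-making part: it can read the view but not modify it."""
    return "eat" if view["hunger"] > 0.5 else "rest"


model = WorldModel()
model.update_from_senses({"hunger": 1.0})
view = model.read_only_view()
action = homunculus(view)

try:
    view["hunger"] = 0.0   # the "sunshine and rainbows" shortcut
    write_blocked = False
except TypeError:
    write_blocked = True
```

With the shortcut blocked, the only way left for the decision-maker to make the hunger go away is to send the output signal and let the senses update the model.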

  1. Operant conditioning is specific to animals. People from a machine learning background will often use the broader term reinforcement learning, which applies to both animals and computers.