A few simple logical steps yield our main theoretical result 👇
This magical equation connects two seemingly distinct concepts:
1️⃣ P_obs: the subjective probability your brain assigns to its observations
2️⃣ KL divergence between the true world and your internal beliefs
🧵[4/n]
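The equation itself isn't reproduced in this post, so here is the standard identity I believe it reduces to (my notation and my reading, not a quote from the paper):

$$
\mathbb{E}_{x \sim P_{\text{world}}}\left[-\log P_{\text{obs}}(x)\right] = H(P_{\text{world}}) + D_{\mathrm{KL}}\left(P_{\text{world}} \,\|\, P_{\text{obs}}\right)
$$

The world's entropy $H(P_{\text{world}})$ is out of your control, so making your observations less surprising on average is exactly the same as shrinking $D_{\mathrm{KL}}(P_{\text{world}} \,\|\, P_{\text{obs}})$.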
you adapt to the world
⬇️
you accurately predict the world
⬇️
fewer nasty surprises that could get you killed
⬇️
KL(world || your beliefs) drops
🧵[5/n]
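A toy numerical check of that chain (my own sketch, not code from the paper; `p` stands in for the world, `q` for your beliefs):

```python
import numpy as np

p = np.array([0.7, 0.2, 0.1])  # "the world": true event probabilities

def kl(p, q):
    """KL(p || q) for discrete distributions, in nats."""
    return float(np.sum(p * np.log(p / q)))

def avg_surprisal(p, q):
    """Expected surprise -log q(x), averaged over the true distribution p."""
    return float(-np.sum(p * np.log(q)))

beliefs = [np.array([1/3, 1/3, 1/3]),  # naive
           np.array([0.5, 0.3, 0.2]),  # partially adapted
           np.array([0.7, 0.2, 0.1])]  # fully adapted (q = p)

for q in beliefs:
    print(f"KL = {kl(p, q):.3f} nats, avg surprisal = {avg_surprisal(p, q):.3f} nats")
# As q approaches p, KL drops to 0 and surprisal drops to the world's entropy.
```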
I didn't put it in by hand or make crazy assumptions.
✅ There is something truly fundamental, unique, and privileged about KL divergence.
🧵[6/n]
✅ In section 4, I apply these foundational principles to "derive" (almost) all of machine learning from KL minimization.
(first pointed out by @alexalemi.bsky.social: https://blog.alexalemi.com/kl-is-all-you-need.html)
🧵[7/n]
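To make the "(almost) all of ML" point concrete, a minimal sketch (mine, under the usual assumption that training minimizes average negative log-likelihood on samples from the world):

```python
import numpy as np

rng = np.random.default_rng(0)
p_world = np.array([0.7, 0.2, 0.1])
data = rng.choice(3, size=10_000, p=p_world)  # observations drawn from the world

# Maximum-likelihood fit of a categorical model = empirical frequencies.
q_model = np.bincount(data, minlength=3) / len(data)

nll = -np.mean(np.log(q_model[data]))             # the usual cross-entropy loss
entropy = -np.sum(p_world * np.log(p_world))      # fixed property of the world
kl = np.sum(p_world * np.log(p_world / q_model))  # KL(world || model)

# The two numbers agree up to sampling noise: minimizing the loss
# is minimizing KL(world || model), since the entropy term is constant.
print(f"NLL = {nll:.4f}, H(world) + KL(world || model) = {entropy + kl:.4f}")
```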
I then connect this to the philosophy of science, framing experiments as "truth-revealing actions."
(another relevant @alexalemi.bsky.social piece: https://blog.alexalemi.com/kl.html)
🧵[8/n]
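One way to make "truth-revealing actions" concrete (my toy example, not from the paper): flip a coin of unknown bias, update a Beta posterior, and measure the information that flowed in as KL(posterior || prior).

```python
from scipy.special import betaln, digamma

def kl_beta(a1, b1, a0, b0):
    """KL( Beta(a1, b1) || Beta(a0, b0) ), in nats."""
    return (betaln(a0, b0) - betaln(a1, b1)
            + (a1 - a0) * digamma(a1)
            + (b1 - b0) * digamma(b1)
            + (a0 - a1 + b0 - b1) * digamma(a1 + b1))

a, b = 1.0, 1.0                # uniform prior over the coin's bias
for flip in [1, 1, 0, 1, 1]:   # outcomes of the "experiment"
    a_new, b_new = a + flip, b + (1 - flip)
    print(f"flip={flip}: gained {kl_beta(a_new, b_new, a, b):.4f} nats")
    a, b = a_new, b_new
```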
✅ KL divergence must be asymmetric, because it captures the flow of information from the world to the brain (when you read a book or perform an experiment).
🧵[9/n]
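Quick check that the asymmetry is real (mine, not from the paper):

```python
import numpy as np

world = np.array([0.8, 0.1, 0.1])
beliefs = np.array([0.4, 0.3, 0.3])

def kl(p, q):
    """KL(p || q) for discrete distributions, in nats."""
    return float(np.sum(p * np.log(p / q)))

print(kl(world, beliefs))  # ≈ 0.335 nats: cost of holding these beliefs in this world
print(kl(beliefs, world))  # ≈ 0.382 nats: a different number; KL is not symmetric
```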
✅ Brains: adapt to survive, survive to adapt
✅ adaptation = KL minimization
✅ most of ML = KL minimization
✅ KL asymmetry captures the flow of information
✅ KL minimization = flow of information from the world into your brain, updating your beliefs
🧵[10/n]