Optimizing the marginal likelihood encoding the idea of Occam's razor leads to more sparsifiable neural networks. The title is 🔥