Nvidia needs to dominate inference; otherwise they will lose their valuation soon. Nothing is more important in the next two years than getting faster and cheaper inference. We are going to need trillions of tokens per hour.
