so we can think of the varying synaptic strengths of interconnected biological neurons as evolutionary pre-pretraining for language (with school as pretraining and vocational training as training), and we should translate the learnings of that pre-pretraining into a geometric understanding for computational linguistics, so that we can apply them as inductive biases to network architecture and weight initialization
craziest shit i've ever read
hopefully crazy in an interesting/good way!