meditation

A discussion with Dakota introduced me to the concept of a Context-Free Grammar (CFG). It became very intriguing once he fully explained it.

It builds up a DAG of production rules (a tree-like hierarchy) that is supposed to define the next-token prediction probability distribution. The benefits are:

It is a very portable dataset for conjuring up language-like sequence data.
The amount of information and logic is transparent and easy to manipulate. For example, you can ablate a branch of a designated size and hierarchy level to see how fast and how well the model handles the shift, or test any sort of curriculum learning, since the difficulty is explicitly defined.
Small experiments that we think will improve LLMs can be easily tested, and their benefits and impact measured.
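To make the idea concrete, here is a minimal Python sketch built on my own assumptions: a hypothetical toy grammar (not the one from the original paper), uniform choice among each non-terminal's rules, and an acyclic rule graph so every derivation terminates. Because the grammar is small and acyclic, the exact next-token distribution for any prefix can be computed by enumerating every derivable string. `ablate` is a hypothetical helper that removes one production, which simulates cutting a branch.

```python
import random
from collections import defaultdict
from fractions import Fraction
from typing import Dict, List, Tuple

# Hypothetical toy grammar (an assumption, not from the paper).
# Upper-case keys are non-terminals; anything not a key is a terminal token.
Grammar = Dict[str, List[List[str]]]
TOY_GRAMMAR: Grammar = {
    "S": [["A", "B"], ["B", "A"]],
    "A": [["C", "C"], ["a"]],
    "B": [["b", "C"], ["C", "b"]],
    "C": [["c"], ["d"]],
}

def sample(grammar: Grammar, symbol: str, rng: random.Random) -> List[str]:
    """Generate one sequence by expanding `symbol`, picking rules uniformly."""
    if symbol not in grammar:  # terminal token
        return [symbol]
    out: List[str] = []
    for s in rng.choice(grammar[symbol]):
        out.extend(sample(grammar, s, rng))
    return out

def string_probs(grammar: Grammar, symbol: str = "S") -> Dict[Tuple[str, ...], Fraction]:
    """Exact probability of every string derivable from `symbol` (grammar must be acyclic)."""
    if symbol not in grammar:
        return {(symbol,): Fraction(1)}
    rules = grammar[symbol]
    dist: Dict[Tuple[str, ...], Fraction] = defaultdict(Fraction)
    for rhs in rules:
        partial = {(): Fraction(1, len(rules))}  # uniform rule choice
        for s in rhs:  # concatenate children's distributions left to right
            child = string_probs(grammar, s)
            combined: Dict[Tuple[str, ...], Fraction] = defaultdict(Fraction)
            for p_str, p in partial.items():
                for c_str, q in child.items():
                    combined[p_str + c_str] += p * q
            partial = combined
        for k, v in partial.items():
            dist[k] += v  # ambiguous derivations of the same string are summed
    return dict(dist)

def next_token_dist(grammar: Grammar, prefix: List[str]) -> Dict[str, Fraction]:
    """Next-token distribution conditioned on `prefix`; '<eos>' marks the end."""
    n = len(prefix)
    counts: Dict[str, Fraction] = defaultdict(Fraction)
    for s, p in string_probs(grammar).items():
        if s[:n] == tuple(prefix):
            counts[s[n] if len(s) > n else "<eos>"] += p
    total = sum(counts.values())
    return {tok: p / total for tok, p in counts.items()}

def ablate(grammar: Grammar, symbol: str, rule_index: int) -> Grammar:
    """Copy of `grammar` with one production of `symbol` removed."""
    new = {k: [list(r) for r in v] for k, v in grammar.items()}
    del new[symbol][rule_index]
    if not new[symbol]:
        raise ValueError(f"removing the last rule of {symbol!r} leaves it unexpandable")
    return new

if __name__ == "__main__":
    print(" ".join(sample(TOY_GRAMMAR, "S", random.Random(0))))
    print(next_token_dist(TOY_GRAMMAR, ["a"]))
    print(next_token_dist(ablate(TOY_GRAMMAR, "A", 1), []))
```

For this toy grammar, the distribution after the prefix `a` is 1/2 for `b` and 1/4 each for `c` and `d`. Removing the rule `A -> a` makes `a` impossible as a first token and shifts its mass onto `c` and `d`, which is the kind of controlled shift a model could then be retrained against. Uniform rule weights are only a simplification; attaching explicit probabilities to each rule would be a natural extension.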

Original paper: