A discussion with Dakota introduced me to the concept of Context-Free Grammars (CFGs). It became very intriguing once he fully explained it.
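You build up a hierarchy of production rules (a tree, or more generally a DAG, of nonterminals), and sampling from it defines the sequence distribution the model is supposed to learn through next-token prediction. The benefits are: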
- It’s a very portable dataset for conjuring up language-like sequence data.
- The amount of information and the logic are transparent and easy to manipulate. For example, you can ablate a branch of a chosen size and level in the hierarchy to see how quickly and how well the model handles the shift, or test any sort of curriculum learning, since the difficulty is explicitly defined.
- Small experiments that we think will improve LLMs can be easily tested, and their benefits and impact measured.
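To make this concrete, here is a minimal Python sketch of a probabilistic CFG, assuming a hypothetical toy English-like grammar (not the grammars from the original paper or Dakota’s repo). The helper names `sample`, `ablate`, and `depth` are illustrative only: `sample` expands nonterminals recursively by weighted choice, `ablate` removes one production to simulate cutting a branch, and `depth` measures the height of the rule hierarchy as one crude, explicit difficulty knob.

```python
import random

# Hypothetical toy grammar: each nonterminal maps to (right-hand side, weight) pairs.
# Any symbol that is not a key in the dict is treated as a terminal token.
GRAMMAR = {
    "S": [(("NP", "VP"), 1.0)],
    "NP": [(("the", "N"), 0.7), (("the", "ADJ", "N"), 0.3)],
    "VP": [(("sees", "NP"), 0.6), (("runs",), 0.4)],
    "N": [(("cat",), 0.5), (("dog",), 0.5)],
    "ADJ": [(("big",), 0.5), (("small",), 0.5)],
}


def sample(grammar, symbol="S", rng=None):
    """Expand `symbol` into a list of terminal tokens by weighted random choice."""
    rng = rng or random.Random(0)
    if symbol not in grammar:  # terminal: emit as-is
        return [symbol]
    rhss, weights = zip(*grammar[symbol])
    rhs = rng.choices(rhss, weights=weights, k=1)[0]
    tokens = []
    for child in rhs:
        tokens.extend(sample(grammar, child, rng))
    return tokens


def ablate(grammar, nonterminal, index):
    """Return a copy of the grammar with one production of `nonterminal` removed."""
    if len(grammar[nonterminal]) <= 1:
        raise ValueError("cannot remove the last production of a nonterminal")
    new = {k: list(v) for k, v in grammar.items()}
    del new[nonterminal][index]  # remaining weights need no renormalizing for rng.choices
    return new


def depth(grammar, symbol="S"):
    """Height of the rule hierarchy below `symbol`; assumes the grammar is acyclic (a DAG)."""
    if symbol not in grammar:
        return 0
    return 1 + max(depth(grammar, s) for rhs, _ in grammar[symbol] for s in rhs)


if __name__ == "__main__":
    rng = random.Random(0)
    print(" ".join(sample(GRAMMAR, rng=rng)))
    # Cut the "the ADJ N" branch: adjectives should never appear afterwards.
    no_adj = ablate(GRAMMAR, "NP", 1)
    print(" ".join(sample(no_adj, rng=rng)))
    print(depth(GRAMMAR))  # 4
```

Because every production and weight is explicit, a distribution shift is just a diff between two grammar dicts, and difficulty can be dialed by adding layers (raising `depth`) or widening the choices per nonterminal.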
Original paper:
Dakota’s Repo