Recent Learnings
RLVR
Reinforcement Learning with Verifiable Rewards (RLVR) is something I very much want to try.
Several open-source resources and frameworks for building environments and integrating them into agent training:
- NVIDIA’s NeMo Gym: A framework for building environments and integrating them into training loops
- Prime Intellect’s Verifiers and Environment Hub: These resources are designed to encapsulate tasks, harnesses, and metrics, acting as a platform for sharing environment patterns and research experiments
- Meta’s OpenEnv: Another framework for building and integrating environments
- Open Rewards by General Reasoning: A newer hub that aggregates different environments for use with various training frameworks
- Hugging Face Spaces: A neutral, framework-agnostic set of environments that can be plugged into training workflows
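The common thread across these frameworks is the "verifiable reward": a programmatic check of the model's output rather than a learned reward model. A minimal sketch of such a reward function (the parsing convention here, "last number in the completion is the answer", is my own hypothetical choice, not any framework's API):

```python
import re

def math_reward(completion: str, ground_truth: str) -> float:
    """Verifiable reward: 1.0 if the final answer matches, else 0.0.

    Hypothetical convention: the last number appearing in the
    completion is taken to be the model's final answer.
    """
    numbers = re.findall(r"-?\d+(?:\.\d+)?", completion)
    if not numbers:
        return 0.0
    return 1.0 if numbers[-1] == ground_truth else 0.0

print(math_reward("The total is 12 apples, so the answer is 42.", "42"))  # 1.0
print(math_reward("I am not sure.", "42"))  # 0.0
```

Real environment hubs wrap exactly this kind of function together with the task prompt and rollout harness, so the same environment can be reused across training frameworks.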
Other things
- FlexAttention is a good way to implement efficient attention variants for transformer models: you express the attention pattern as a function over the indices of the attention matrix, and it generates an efficient kernel, without rewriting anything in CUDA.
- inductive bias
- Semantic IDs: train with a decreasing number of codes per layer, e.g. layer 1 has 2048 codes, layer 2 has 1024, layer 3 has 512, and so on. What I experimented with before was the reverse (increasing sizes per layer).
- Current LLMs, to the extent they still learn from human data, are not following the bitter lesson. Language should serve only as the interface for human-understandable input and output, not as the true carrier of the logic.
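On the FlexAttention point above: PyTorch's `torch.nn.attention.flex_attention` takes a `score_mod`/`mask_mod` callback over `(q_idx, kv_idx)` and compiles it into a fused kernel. Here is a NumPy sketch of just the underlying idea (not the PyTorch API): the sparsity pattern is a plain Python function of the attention-matrix indices.

```python
import numpy as np

def attention_with_index_mask(q, k, v, mask_fn):
    """Attention whose pattern is a function of (q_idx, kv_idx),
    in the spirit of FlexAttention's mask_mod -- no custom CUDA.

    q, k, v: (seq, dim) arrays.
    mask_fn(q_idx, kv_idx) -> bool array; True means "keep this score".
    """
    seq, dim = q.shape
    scores = q @ k.T / np.sqrt(dim)
    # Build index grids once; the mask is just a function of them.
    q_idx, kv_idx = np.meshgrid(np.arange(seq), np.arange(seq), indexing="ij")
    scores = np.where(mask_fn(q_idx, kv_idx), scores, -np.inf)
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)
    return weights @ v

# A causal mask expressed purely through indices:
causal = lambda q_idx, kv_idx: kv_idx <= q_idx

rng = np.random.default_rng(0)
q, k, v = (rng.normal(size=(4, 8)) for _ in range(3))
out = attention_with_index_mask(q, k, v, causal)
```

Swapping in a sliding-window or prefix mask is just a different one-line `mask_fn`; FlexAttention's value is doing this same thing while still emitting a block-sparse fused kernel.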
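On the semantic-ID note above, assuming the setup is residual-quantization style (one code per layer, each layer quantizing the previous layer's residual), a minimal sketch with the decreasing codebook sizes mentioned (2048 → 1024 → 512); the function name and random codebooks are illustrative, not from any library:

```python
import numpy as np

def rvq_encode(x, codebooks):
    """Residual quantization: each layer quantizes the remaining residual.

    codebooks: list of (num_codes, dim) arrays with shrinking num_codes,
    e.g. 2048 -> 1024 -> 512. Returns one code index per layer
    (the semantic ID) plus the final residual.
    """
    residual = x.copy()
    ids = []
    for cb in codebooks:
        # Nearest codeword to the current residual.
        idx = int(np.argmin(((residual[None, :] - cb) ** 2).sum(axis=1)))
        ids.append(idx)
        residual = residual - cb[idx]
    return ids, residual

rng = np.random.default_rng(0)
dim = 16
codebooks = [rng.normal(size=(n, dim)) for n in (2048, 1024, 512)]
x = rng.normal(size=dim)
ids, residual = rvq_encode(x, codebooks)
```

The intuition for shrinking sizes: the first layer carries the coarse, high-variance structure and needs the most codes, while later layers only refine an ever-smaller residual.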