BQW (bqw1013)

WebSculpt (Public)

Agent's procedural memory for the web. Turn one successful browser workflow into reusable, zero-cost CLI commands.

TypeScript 2

OpenRLHF (Public)

Forked from OpenRLHF/OpenRLHF

An Easy-to-use, Scalable and High-performance RLHF Framework (70B+ PPO Full Tuning & Iterative DPO & LoRA & RingAttention & RFT)

Python

simpleRL-reason (Public)

Forked from hkust-nlp/simpleRL-reason

This is a replication of DeepSeek-R1-Zero and DeepSeek-R1 training on small models with limited data

Python

PaperMind (Public)

Logic-RL (Public)

Forked from Unakar/Logic-RL

Reproduce R1 Zero on Logic Puzzle

Python

verl (Public)

Forked from verl-project/verl

verl: Volcano Engine Reinforcement Learning for LLMs

Python