Skip to content

Add multi-GPU vLLM CAA steering hook #694

Open
linmou wants to merge 4 commits into zjunlp:main from linmou:pr/vllm-caa-multigpu-hook

linmou commented Jun 19, 2026


Summary

This PR adds a lightweight CAA (Contrastive Activation Addition) hook for vLLM inference, including support for tensor-parallel workers.

The hook lets EasyEdit apply already-computed CAA vectors to vLLM-loaded models by installing activation additions on selected decoder layers inside each worker process.
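The install/clear idea can be illustrated with a minimal pure-Python sketch. It assumes a decoder layer is any object with a `forward` method. `FakeDecoderLayer`, `install_caa_vector`, and `clear_caa_vector` are hypothetical names, not the functions in `steer/vllm_caa_hooks.py`. Real vLLM decoder layers operate on tensors and may return more than one value (for example a hidden-state/residual pair), which a real hook would need to handle.

```python
from typing import List

Vector = List[float]


class FakeDecoderLayer:
    """Hypothetical stand-in for a decoder layer; real vLLM layers work on tensors."""

    def forward(self, hidden: Vector) -> Vector:
        # Identity transform so the effect of steering is easy to see.
        return list(hidden)


def clear_caa_vector(layer: FakeDecoderLayer) -> None:
    """Hypothetical helper: restore the unsteered forward, if one was saved."""
    original = getattr(layer, "_caa_original_forward", None)
    if original is not None:
        layer.forward = original
        del layer._caa_original_forward


def install_caa_vector(layer: FakeDecoderLayer, vector: Vector, multiplier: float) -> None:
    """Hypothetical helper: add multiplier * vector to every output of `layer`."""
    clear_caa_vector(layer)  # never stack two additions on the same layer
    original = layer.forward
    layer._caa_original_forward = original

    def steered_forward(hidden: Vector) -> Vector:
        out = original(hidden)
        # Activation addition: shift the layer output along the CAA direction.
        return [h + multiplier * v for h, v in zip(out, vector)]

    layer.forward = steered_forward
```

In PyTorch the same effect is commonly achieved with `torch.nn.Module.register_forward_hook`, whose hook may return a modified output; calling `remove()` on the returned handle plays the role of the clear step. Clearing before installing keeps repeated installs from stacking two vectors on one layer.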

What Changed

  • Added steer/vllm_caa_hooks.py

    • install and clear CAA vectors on vLLM model layers
    • worker RPC helpers for tensor-parallel vLLM engines
    • statistics on hook calls and configuration
    • support for common vLLM worker model layouts
  • Added tests/test_vllm_caa_hooks.py

    • focused hook-only unit tests with fake vLLM-style models/workers
    • no dataset dependency
    • no LLM judge dependency
    • no generated artifacts
  • Added examples/vllm_caa_gpu_e2e.py

    • optional, lightweight smoke test on real GPUs
    • records baseline, steered, and restored outputs
    • records worker install/clear results and hook stats
  • Added docs/vllm_caa_multigpu_hook.md

    • documents runtime API and lightweight validation commands
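The worker-side pieces listed above (locating decoder layers across model layouts, install/clear helpers that run inside each worker, and call statistics) can be sketched without vLLM. Everything here is hypothetical: `LAYER_PATHS`, `install_on_worker`, `clear_on_worker`, `make_fake_worker`, and the toy `collective_rpc` loop are illustrative names, not the API of `steer/vllm_caa_hooks.py`. Recent vLLM releases provide `LLM.collective_rpc` for running a method or callable on every worker; the loop below only mimics that fan-out. The same full vector goes to every rank because, under Megatron-style tensor parallelism, decoder-layer outputs are typically replicated across ranks after the all-reduce.

```python
from types import SimpleNamespace
from typing import Any, Callable, Dict, List

# Assumed attribute paths for decoder layers; not an exhaustive list of vLLM layouts.
LAYER_PATHS = ("model.layers", "model.model.layers", "language_model.model.layers")


def resolve_layers(model: Any) -> List[Any]:
    """Return the decoder layers found at the first matching path in LAYER_PATHS."""
    for path in LAYER_PATHS:
        obj = model
        try:
            for name in path.split("."):
                obj = getattr(obj, name)
        except AttributeError:
            continue
        return list(obj)
    raise ValueError("could not locate decoder layers on model")


def install_on_worker(worker: Any, layer_idx: int, vector: List[float], multiplier: float) -> Dict[str, Any]:
    """Runs inside one worker: steer one layer and count how often it is called."""
    layer = resolve_layers(worker.model)[layer_idx]
    stats = {"calls": 0}
    original = layer.forward

    def steered(hidden: List[float]) -> List[float]:
        stats["calls"] += 1
        return [h + multiplier * v for h, v in zip(original(hidden), vector)]

    layer.forward = steered
    worker.caa_state = (layer, original, stats)
    return {"rank": worker.rank, "layer": layer_idx, "installed": True}


def clear_on_worker(worker: Any) -> Dict[str, Any]:
    """Runs inside one worker: restore the original forward and report the call count."""
    layer, original, stats = worker.caa_state
    layer.forward = original
    del worker.caa_state
    return {"rank": worker.rank, "cleared": True, "calls": stats["calls"]}


def collective_rpc(workers: List[Any], fn: Callable[..., Dict[str, Any]], *args: Any) -> List[Dict[str, Any]]:
    """Toy stand-in for an engine-level broadcast: run fn on every worker, in rank order."""
    return [fn(worker, *args) for worker in workers]


def make_fake_worker(rank: int, num_layers: int = 2) -> SimpleNamespace:
    """Fake worker whose identity layers live at worker.model.model.layers."""
    layers = [SimpleNamespace(forward=lambda h: list(h)) for _ in range(num_layers)]
    return SimpleNamespace(rank=rank, model=SimpleNamespace(model=SimpleNamespace(layers=layers)))
```

For example, `collective_rpc(workers, install_on_worker, 1, [1.0, 1.0], 0.5)` steers layer 1 on every fake worker and returns one status record per rank; in a real engine, each forward pass through the steered layer would increment `calls` once.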

Validation

pytest -q tests/test_vllm_caa_hooks.py

Result: 9 passed.

python -m compileall steer/vllm_caa_hooks.py examples/vllm_caa_gpu_e2e.py tests/test_vllm_caa_hooks.py

Result: passed.

git diff --check main...HEAD

Result: passed.

Optional GPU smoke command:

CUDA_VISIBLE_DEVICES=0,1 \
VLLM_USE_FLASHINFER_SAMPLER=0 \
VLLM_ALLOW_INSECURE_SERIALIZATION=1 \
python examples/vllm_caa_gpu_e2e.py \
  --model /path/to/model \
  --tensor-parallel-size 2 \
  --layer 12 \
  --multiplier 0.0 \
  --vector-value 0.0 \
  --output /tmp/vllm_caa_gpu_e2e.json \
  --monitor-output /tmp/vllm_caa_gpu_e2e.gpu.csv
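With both `--multiplier 0.0` and `--vector-value 0.0`, the added activation should be exactly zero (assuming the hook adds `multiplier * vector`), so this run exercises the install/clear path while steered outputs are expected to match the baseline. A minimal sketch of checks one might apply to the recorded results, assuming deterministic (e.g. greedy) decoding. The record keys `baseline`, `steered`, `restored`, `multiplier`, `vector_value`, and `install_results` are hypothetical, not the actual schema written by `examples/vllm_caa_gpu_e2e.py`, and `vector_value` is assumed to fill every vector element.

```python
from typing import Any, Dict, List


def check_smoke_record(record: Dict[str, Any]) -> List[str]:
    """Return problems found in one smoke-test record (hypothetical schema)."""
    problems: List[str] = []
    baseline = record["baseline"]

    # After clearing, generations should be back to the unsteered baseline.
    if record["restored"] != baseline:
        problems.append("restored outputs differ from baseline")

    # A zero multiplier or an all-zero vector makes the addition a no-op.
    is_noop = record["multiplier"] == 0.0 or record["vector_value"] == 0.0
    if is_noop and record["steered"] != baseline:
        problems.append("no-op steering changed outputs")

    # Every tensor-parallel worker should report a successful install.
    if not all(r.get("installed") for r in record["install_results"]):
        problems.append("install failed on at least one worker")
    return problems
```

The restored-versus-baseline check applies to any multiplier, so it also catches hooks that leak past the clear step in runs that do change generations.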

Notes

The larger steering-effect consistency experiments run during development are intentionally left out of this PR, keeping the contribution focused on the runtime multi-GPU vLLM CAA hook and its lightweight validation.

Related Issue

Closes #695

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Development

Successfully merging this pull request may close these issues.

Add multi-GPU vLLM support for CAA steering hooks

1 participant