LEVER: Inference-Time Policy Reuse under Support Constraints
This work was conducted during my Research Fellow appointment in the RAI for Ukraine program at the Center for Responsible AI at New York University.
Preprint: Vitenko, I., et al. “LEVER: Inference-Time Policy Reuse under Support Constraints.” arXiv, 2026.
Code: The implementation is available in the LEVER GitHub repository.
LEVER studies inference-time reuse of reinforcement learning policies: given a library of pretrained policies and a new composite objective, the framework constructs a policy offline, without additional environment interaction. The method retrieves candidate policies using library metadata, evaluates the candidates with behavioral embeddings, and combines the selected policies through offline Q-value composition.
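The composition step can be pictured with a minimal tabular sketch. The snippet below assumes each library policy comes with a Q-table and that the composite objective is a weighted mix of the library tasks; the retrieval and embedding stages are omitted, and all names are illustrative rather than the paper's actual API.

```python
import numpy as np

def compose_q_offline(q_tables, weights):
    """Weighted-sum composition of library Q-tables (shape:
    n_states x n_actions) for a composite objective; computed
    entirely offline, with no environment interaction."""
    q_composite = np.zeros_like(q_tables[0], dtype=float)
    for q, w in zip(q_tables, weights):
        q_composite += w * q
    return q_composite

def greedy_policy(q_composite):
    """Act greedily on the composed Q-values: one action per state."""
    return q_composite.argmax(axis=1)

# Hypothetical example: two library policies over 3 states, 2 actions.
q_a = np.array([[1.0, 0.0], [0.2, 0.8], [0.5, 0.5]])
q_b = np.array([[0.0, 1.0], [0.9, 0.1], [0.3, 0.7]])
policy = greedy_policy(compose_q_offline([q_a, q_b], [0.5, 0.5]))
print(policy)  # greedy action index for each state
```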
The paper focuses on the support-limited regime, where value propagation is unavailable and reuse quality depends on transition coverage in the policy library. In deterministic GridWorld experiments, LEVER can match or exceed training-from-scratch performance while reducing computation time, but performance degrades when long-horizon dependencies require propagation beyond the available support.
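The support constraint can be made concrete with a rough sketch, again with hypothetical names: composed Q-values are only trusted on (state, action) pairs covered by transitions in the library, and greedy selection is restricted to that support. Where coverage gaps appear, no value can be propagated around them, which is the failure mode in the long-horizon cases above.

```python
import numpy as np

def support_mask(transitions, n_states, n_actions):
    """Mark (state, action) pairs that appear in the library's
    transition data; composed Q-values are only trusted there."""
    mask = np.zeros((n_states, n_actions), dtype=bool)
    for s, a, _next_s in transitions:
        mask[s, a] = True
    return mask

def greedy_on_support(q_composite, mask):
    """Greedy selection restricted to supported pairs; unsupported
    actions are set to -inf so they are never chosen. States with
    no supported action fall back to action 0 by argmax convention."""
    masked = np.where(mask, q_composite, -np.inf)
    return masked.argmax(axis=1)
```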
