Dual-Process Atomic Skill Learning for Long-Horizon Manipulation Tasks
Abstract
Language-conditioned Imitation Learning (IL) is essential for enabling robots to perform complex tasks following natural language instructions. However, executing long-horizon tasks sequentially remains a significant challenge. While hierarchical approaches attempt to address this by decomposing tasks into atomic skills, existing methods often suffer from training instability and codebook collapse due to the tight coupling between high-level skill reasoning and low-level action generation. Inspired by the Dual-Process Theory of cognition, we propose DASL, a novel asynchronous hierarchical imitation learning framework that effectively decouples slow, semantic reasoning from fast, real-time motion control. DASL comprises a Slow-Frequency Policy that predicts interpretable, discrete skills via Vector Quantization, and a High-Frequency Policy that leverages a diffusion model and Decision Transformer to generate precise actions conditioned on these latent skills. By asynchronously coordinating these modules, our framework mitigates the interference common in synchronous co-training without relying on complex auxiliary regularization. Extensive evaluations on robotic manipulation and grid-world navigation benchmarks demonstrate that DASL significantly outperforms state-of-the-art baselines, particularly in skill acquisition and compositional generalization to unseen instructions. We will share our source code on GitHub.
Framework of DASL
Overview of DASL, an asynchronous hierarchical imitation learning framework. The high-level policy operates at a slow timescale to generate discrete semantic skills from language instructions and sparse observations, while the low-level policy executes skill-conditioned actions at a fast timescale. A latent diffusion module regularizes the latent trajectory space during training and is removed at inference for efficient real-time control.
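The two-timescale idea in the overview can be illustrated with a minimal sketch. It is not the DASL implementation: the encoder, the controller, the codebook values, and the `skill_period` parameter are all hypothetical stand-ins chosen for illustration, and the diffusion and Decision Transformer components are not modeled. It only shows a slow policy that picks a discrete skill by nearest-neighbor lookup in a codebook (the core step of Vector Quantization) and a fast policy that acts on every step while the skill is held fixed.

```python
def quantize(z, codebook):
    """Return the index of the codebook vector nearest to z (squared Euclidean)."""
    def sqdist(a, b):
        return sum((x - y) ** 2 for x, y in zip(a, b))
    return min(range(len(codebook)), key=lambda k: sqdist(z, codebook[k]))


def slow_policy(obs, instruction, codebook):
    """Hypothetical high-level policy: encode, then snap to a discrete skill."""
    # Stand-in encoder (an assumption): shift the observation by a value
    # derived from the instruction length. A real encoder would be learned.
    z = [o + 0.01 * len(instruction) for o in obs]
    return quantize(z, codebook)


def fast_policy(obs, skill_vec):
    """Hypothetical low-level policy: small step toward the skill embedding."""
    return [0.1 * (s - o) for s, o in zip(skill_vec, obs)]


def rollout(obs, instruction, codebook, steps=12, skill_period=4):
    """Run the fast policy every step; refresh the skill every skill_period steps."""
    skill = None
    skill_log = []
    for t in range(steps):
        if t % skill_period == 0:  # slow timescale: re-plan the skill
            skill = slow_policy(obs, instruction, codebook)
        action = fast_policy(obs, codebook[skill])  # fast timescale: act
        obs = [o + a for o, a in zip(obs, action)]
        skill_log.append(skill)
    return obs, skill_log


if __name__ == "__main__":
    codebook = [[0.0, 0.0], [1.0, 1.0], [2.0, 0.0]]  # illustrative values only
    final_obs, skills = rollout([0.2, 0.1], "open the drawer", codebook)
    print(skills)
```

Because the skill index changes only at multiples of `skill_period`, the low-level controller sees a stable conditioning signal between high-level updates, which is the decoupling the overview describes.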
Results
Performance on LOReL Sawyer
[Figure: panels (a)-(e)]
Performance on Franka Kitchen
[Figure: panels (a)-(g)]
Performance on BabyAI
Interpretability Analysis
Bibtex
@inproceedings{Chen2026DASL,
title = {Dual-Process Atomic Skill Learning for Long-Horizon Manipulation Tasks},
author = {Jun Chen and Erdemt Bao and Wenlong Dong and Jierui Liu and Hao Wan and Shaopeng Li and Weijun Qin and Jing Liang and Huiping Zhuang},
booktitle = {},
year = {2026}
}