Dual-Process Atomic Skill Learning for Long-Horizon Manipulation Tasks

1 University of Electronic Science and Technology of China
2 Huazhong University of Science and Technology
3 Southern University of Science and Technology
4 South China University of Technology
5 EbTech Co. Ltd.

* Indicates equal contribution. Corresponding authors.
Under review

Abstract

Language-conditioned Imitation Learning (IL) is essential for enabling robots to perform complex tasks following natural language instructions. However, executing long-horizon tasks sequentially remains a significant challenge. While hierarchical approaches attempt to address this by decomposing tasks into atomic skills, existing methods often suffer from training instability and codebook collapse due to the tight coupling between high-level skill reasoning and low-level action generation. Inspired by the Dual-Process Theory of cognition, we propose DASL, a novel asynchronous hierarchical imitation learning framework that effectively decouples slow, semantic reasoning from fast, real-time motion control. DASL comprises a Slow-Frequency Policy that predicts interpretable, discrete skills via Vector Quantization, and a High-Frequency Policy that leverages a diffusion model and Decision Transformer to generate precise actions conditioned on these latent skills. By asynchronously coordinating these modules, our framework mitigates the interference common in synchronous co-training without relying on complex auxiliary regularization. Extensive evaluations on robotic manipulation and grid-world navigation benchmarks demonstrate that DASL significantly outperforms state-of-the-art baselines, particularly in skill acquisition and compositional generalization to unseen instructions. We will share our source code on GitHub.
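The asynchronous coordination described in the abstract can be sketched as a two-timescale control loop: the slow policy re-plans a discrete skill only every few steps, while the fast policy acts at every step conditioned on the current skill. This is an illustrative sketch, not the paper's implementation; all names (`run_episode`, `CountEnv`, `skill_every`) are hypothetical.

```python
def run_episode(slow_policy, fast_policy, env, instruction, horizon=100, skill_every=10):
    """Two-timescale loop: the slow policy re-plans a discrete skill every
    `skill_every` steps; the fast policy produces an action at every step."""
    obs = env.reset()
    skill = None
    for t in range(horizon):
        if t % skill_every == 0:                  # slow, sparse skill reasoning
            skill = slow_policy(instruction, obs)
        obs = env.step(fast_policy(obs, skill))   # fast, skill-conditioned control
    return obs

class CountEnv:
    """Toy environment whose state is simply the step count."""
    def reset(self):
        self.t = 0
        return self.t
    def step(self, action):
        self.t += 1
        return self.t

# toy run: the slow policy fires 10 times over a 100-step episode
calls = []
slow = lambda instr, obs: calls.append(obs) or 0   # always picks skill id 0
fast = lambda obs, skill: skill                    # trivial skill-conditioned action
run_episode(slow, fast, CountEnv(), "open drawer", horizon=100, skill_every=10)
```

The key point of the sketch is that high-level reasoning cost scales with `horizon / skill_every` rather than `horizon`, which is what makes real-time low-level control feasible.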

Framework of DASL

DASL framework overview

Overview of DASL, an asynchronous hierarchical imitation learning framework. The high-level policy operates at a slow timescale to generate discrete semantic skills from language instructions and sparse observations, while the low-level policy executes skill-conditioned actions at a fast timescale. A latent diffusion module regularizes the latent trajectory space during training and is removed at inference for efficient real-time control.
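The Vector Quantization step in the slow-frequency policy amounts to a nearest-neighbor lookup into a learned codebook of skill embeddings. A minimal numpy sketch, with illustrative names and toy values (not the paper's code or actual codebook size):

```python
import numpy as np

def quantize_skill(z, codebook):
    """Map a continuous skill embedding z to its nearest codebook entry.

    z: (d,) latent produced by the slow-frequency policy encoder.
    codebook: (K, d) learnable skill embeddings.
    Returns the discrete skill index and the quantized vector.
    """
    dists = np.linalg.norm(codebook - z, axis=1)  # distance to each of the K skills
    k = int(np.argmin(dists))                     # discrete, interpretable skill index
    return k, codebook[k]

# toy example: 4 skills in a 2-D latent space
codebook = np.array([[1.0, 0.0], [0.0, 1.0], [-1.0, 0.0], [0.0, -1.0]])
k, z_q = quantize_skill(np.array([0.9, 0.1]), codebook)
```

During training, a straight-through gradient estimator is the standard way to pass gradients through the non-differentiable `argmin`; the quantized vector `z_q` is what conditions the high-frequency policy.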

Results

Performance on LOReL Sawyer

Table.1 Rephrasal-wise success rates (%) on LOReL Sawyer
Fig.1 LOReL State (panels a–e): open drawer and move black mug right
Fig.2 LOReL Image (panels a–e): turn faucet right and close drawer

Performance on Franka Kitchen

Table.2 N-rates of different methods on seen and unseen instructions in Kitchen with state and image observations.
Fig.3 K-rates on seen and unseen tasks in Kitchen (image)
Fig.4 Kitchen State (panels a–g): activate bottom burner and activate top burner and turn on light switch and open sliding cabinet
Fig.5 Kitchen Image (panels a–g): activate bottom burner and activate top burner and turn on light switch and open sliding cabinet

Performance on BabyAI

Fig.6 Success rates (%) on BabyAI GoToSeq task with varying numbers of demonstrations.

Interpretability Analysis

Latent distribution visualization
Fig.7 Visualization of Latent Skill Distributions
Word clouds comparison
Fig.8 Word cloud of skills learned in LOReL Sawyer (state) compositional task
Correlation matrix
Fig.9 Skill heatmap visualization for LOReL on Sawyer (state) compositional tasks
Option frequency matrix
Fig.10 Skill heatmap visualization for LOReL on Sawyer (state) compositional tasks (normalized row-wise)
Word frequency matrix
Fig.11 Skill heatmap visualization for LOReL on Sawyer (state) compositional tasks (normalized column-wise)
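Figs. 10 and 11 differ only in how the same skill-by-word co-occurrence matrix is normalized: row-wise normalization reads as "which words does each skill attend to," column-wise as "which skills does each word activate." A minimal numpy sketch with hypothetical counts:

```python
import numpy as np

# hypothetical skill-by-word count matrix (rows: skills, cols: instruction words)
counts = np.array([[8.0, 2.0],
                   [1.0, 9.0]])

# row-wise (Fig.10-style): each row sums to 1, i.e. P(word | skill)
row_norm = counts / counts.sum(axis=1, keepdims=True)

# column-wise (Fig.11-style): each column sums to 1, i.e. P(skill | word)
col_norm = counts / counts.sum(axis=0, keepdims=True)
```

The two views are complementary: a word can dominate a skill's row while that skill contributes little to the word's column, so showing both normalizations avoids misreading the raw frequencies.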

Bibtex

@inproceedings{Chen2026DASL,
  title     = {Dual-Process Atomic Skill Learning for Long-Horizon Manipulation Tasks},
  author    = {Jun Chen and Erdemt Bao and Wenlong Dong and Jierui Liu and Hao Wan and Shaopeng Li and Weijun Qin and Jing Liang and Huiping Zhuang},
  booktitle = {},
  year      = {2026}
}