Lipschitz-constrained Unsupervised Skill Discovery
ICLR 2022
Seohong Park
Seoul National University
Jongwook Choi*
University of Michigan
Jaekyeom Kim*
Seoul National University
Honglak Lee
University of Michigan
Gunhee Kim
Seoul National University
Abstract
We study the problem of unsupervised skill discovery, whose goal is to learn a set of diverse and useful skills with no external reward. There have been a number of skill discovery methods based on maximizing the mutual information between skills and states. However, we point out that their objectives do not necessarily prefer dynamic skills that can be more useful for downstream tasks. To address this issue, we propose Lipschitz-constrained Skill Discovery (LSD), which encourages the agent to discover more diverse, dynamic, and far-reaching skills. Another benefit of LSD is that its learned representation function can be utilized for solving goal-reaching downstream tasks even in a zero-shot manner, i.e., without further training or complex planning. Through experiments on various MuJoCo robotic locomotion and manipulation environments, we demonstrate that LSD outperforms previous approaches in terms of skill diversity, state space coverage, and performance on seven downstream tasks, including the challenging task of following multiple goals on Humanoid. Videos of learned skills are available at https://shpark.me/projects/lsd/.
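This page does not spell out the LSD objective. Based on our reading of the paper (an assumption as far as this page goes), LSD learns a representation phi constrained to be 1-Lipschitz, ||phi(x) - phi(y)|| <= ||x - y||, and rewards the agent with (phi(s_T) - phi(s_0))^T z, which splits into per-step rewards (phi(s_{t+1}) - phi(s_t))^T z. Because phi cannot stretch distances, a large reward requires actually moving far in state space. The sketch below is illustrative only: it uses a hypothetical linear phi made 1-Lipschitz by spectral normalization (which we believe is how the paper enforces the constraint), with helper names of our own choosing.

```python
import math

# Hypothetical sketch: phi(x) = W_hat x, with W_hat = W / sigma_max(W),
# so that phi is (approximately) 1-Lipschitz in the Euclidean norm.


def matvec(W, x):
    """Multiply a matrix W (list of rows) by a vector x."""
    return [sum(w * xi for w, xi in zip(row, x)) for row in W]


def transpose(W):
    return [list(col) for col in zip(*W)]


def spectral_norm(W, iters=100):
    """Estimate the largest singular value of W by power iteration on W^T W."""
    Wt = transpose(W)
    v = [1.0] * len(W[0])
    for _ in range(iters):
        v = matvec(Wt, matvec(W, v))
        norm = math.sqrt(sum(x * x for x in v))
        v = [x / norm for x in v]
    return math.sqrt(sum(x * x for x in matvec(W, v)))


def spectral_normalize(W):
    """Rescale W so its spectral norm is ~1, making x -> W x 1-Lipschitz."""
    sigma = spectral_norm(W)
    return [[w / sigma for w in row] for row in W]


def lsd_step_reward(W_hat, s, s_next, z):
    """Per-step reward (phi(s_next) - phi(s))^T z with phi(x) = W_hat x."""
    diff = [a - b for a, b in zip(matvec(W_hat, s_next), matvec(W_hat, s))]
    return sum(d * zi for d, zi in zip(diff, z))


W_hat = spectral_normalize([[3.0, 0.0], [0.0, 1.0]])
trajectory = [[0.0, 0.0], [1.0, 0.5], [2.0, 0.0]]
z = [1.0, 0.0]
# The per-step rewards telescope to (phi(s_T) - phi(s_0))^T z.
total = sum(lsd_step_reward(W_hat, s, s2, z) for s, s2 in zip(trajectory, trajectory[1:]))
print(total)
```

Since phi is 1-Lipschitz, the total reward is at most ||s_T - s_0|| * ||z|| by Cauchy-Schwarz, so the only way to increase it is to travel farther.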
Zero-shot goal following
With LSD's learned representation function, the agent can follow goals without any further training (4x speed).
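One way to read "follow goals without any further training" is the following, which we state as an assumption rather than as the paper's exact procedure: since a skill z corresponds to a direction of travel in phi-space, the agent can re-select z at every step so that it points from phi(s) toward phi(g). For continuous skills this is the (here normalized) vector phi(g) - phi(s); for discrete skills, the skill whose vector best aligns with it. The rollout below uses a hypothetical idealized setting (phi is the identity and executing z moves the state by step * z) purely to show the selection rule.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def continuous_skill_for_goal(phi_s, phi_g):
    """Unit skill vector pointing from phi(s) toward phi(g) (zero vector if already there)."""
    d = [g - s for g, s in zip(phi_g, phi_s)]
    n = math.sqrt(dot(d, d))
    return [x / n for x in d] if n > 0 else [0.0] * len(d)


def discrete_skill_for_goal(phi_s, phi_g, skill_vectors):
    """Index of the discrete skill whose vector is best aligned with phi(g) - phi(s)."""
    d = [g - s for g, s in zip(phi_g, phi_s)]
    return max(range(len(skill_vectors)), key=lambda i: dot(d, skill_vectors[i]))


def toy_goal_following(start, goal, step=0.5, max_steps=100):
    """Re-select the skill at every step in a hypothetical idealized setting:
    phi is the identity and executing skill z moves the state by step * z."""
    s = list(start)
    for t in range(max_steps):
        if math.dist(s, goal) <= step:
            return s, t
        z = continuous_skill_for_goal(s, goal)
        s = [si + step * zi for si, zi in zip(s, z)]
    return s, max_steps


final_state, steps = toy_goal_following([0.0, 0.0], [3.0, 4.0])
print(final_state, steps)
```

Because the skill is recomputed from the current state, moving on to the next goal in a sequence only changes phi(g); the skill policy itself is left untouched.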
Skills discovered by LSD
We provide videos of skills learned by LSD. All the diverse behaviors shown in each video were obtained from a single training run of LSD. To show the consistency of the learned policy for each skill, we show two rollouts per skill.
HalfCheetah (8 discrete skills, 2 rollouts each)
On HalfCheetah, discrete LSD learned to run forward and backward with diverse gaits, to roll forward and backward, and to take different poses.
More random seeds
LSD discovered diverse and dynamic behaviors regardless of the random seed.
Ant (16 discrete skills, 2 rollouts each)
On Ant, discrete LSD learned a skill set consisting of:
- Locomotion skills: #1, #6, #7, #12, #16
- Rotation skills: #2, #3, #4, #8, #10, #15
- Posing skills: #5, #9, #14
- Flipping skills: #11, #13
More random seeds
LSD discovered diverse and dynamic behaviors regardless of the random seed.
Ant (2-D continuous skills, 2 rollouts each)
DIAYN
DIAYN discovered posing skills, as its mutual information objective does not necessarily prefer large state variations.
LSD
In contrast, LSD encourages the agent to produce larger variations in the state space, resulting in more dynamic skills.
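A toy computation (our own construction, not an experiment from the paper) makes the contrast concrete. Take two skills that end at states only 0.02 apart ("posing") and two that end 20 units apart ("running"). If the final states are perfectly distinguishable, the mutual information I(Z; S) is ln 2 in both cases, so an MI objective has no reason to prefer running. An LSD-style objective with a 1-Lipschitz phi (here simply the identity, one valid choice) scales with the distance traveled. Real DIAYN optimizes a variational bound with a learned discriminator; exact MI over deterministic outcomes is a simplifying assumption here.

```python
import math
from collections import Counter


def mutual_information(pairs):
    """Exact I(Z; S) in nats for equally likely (skill, final_state) samples."""
    n = len(pairs)
    p_z = Counter(z for z, _ in pairs)
    p_s = Counter(s for _, s in pairs)
    p_zs = Counter(pairs)
    return sum(
        (c / n) * math.log((c / n) / ((p_z[z] / n) * (p_s[s] / n)))
        for (z, s), c in p_zs.items()
    )


def lsd_objective(s0, final_states, skill_vectors):
    """Mean of (phi(s_T) - phi(s_0))^T z over skills, with phi = identity (1-Lipschitz)."""
    total = 0.0
    for sT, z in zip(final_states, skill_vectors):
        total += sum((b - a) * zi for a, b, zi in zip(s0, sT, z))
    return total / len(final_states)


s0 = (0.0, 0.0)
skills = [(1.0, 0.0), (-1.0, 0.0)]
posing = [(0.01, 0.0), (-0.01, 0.0)]   # two skills that barely move
running = [(10.0, 0.0), (-10.0, 0.0)]  # two skills that travel far

for name, finals in [("posing", posing), ("running", running)]:
    mi = mutual_information(list(enumerate(finals)))
    print(name, round(mi, 4), lsd_objective(s0, finals, skills))
```

Both skill sets reach the same mutual information (about 0.693 nats), while the LSD-style objective is 1000 times larger for the running skills.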
Humanoid (16 discrete skills, 2 rollouts each, 5x speed)
Humanoid (2-D continuous skills, 2 rollouts each, 5x speed)
Fetch environments (2-D continuous skills)
FetchSlide
FetchPickAndPlace