Dual representations
Seohong Park
UC Berkeley
October 2025
In this post, I'll introduce a "new" idea for representing states in reinforcement learning (RL). The one-line summary is: we represent a state as the set of "distances" from every other state. I'll first explain the motivation behind this idea and then discuss why it is useful.
A little motivation
When studying mathematics and machine learning, one may notice that there is one intriguing theme that recurrently comes up in many different contexts. It has something to do with a "relative" way of looking at things. Here are some examples:
- In linear algebra, there's a one-to-one correspondence between a vector and a linear functional (i.e., a dual vector) given by \( v \leftrightsquigarrow \langle v, \cdot \rangle \). This is often called the Riesz representation theorem. Intuitively, this means that a vector can be represented as "the set of inner products with all other vectors."
- In kernel machines, we represent a data point \( x \) as \( k(x, \cdot) \), where \( k \) is a kernel function. This kernel representation enables us to work implicitly in an infinite-dimensional space with the "kernel trick," \(k(x, y) = \langle \phi(x), \phi(y) \rangle\), where \(\phi\) is a feature map.
- In category theory, an object \(X\) is determined (up to isomorphism) by the morphisms ("structure-preserving maps") into it from all other objects, \( \mathrm{Hom}(-, X) \). This is a consequence of the celebrated Yoneda lemma. Again, it intuitively means that an object can be identified by its relations with all other objects.
The recurring theme in all these examples is that the relation to all other objects often contains enough information to identify the object itself. Can we apply this idea to reinforcement learning? More importantly, do we want to? The answer is yes and yes.
Dual representations for RL
Let's see how we can apply this idea to RL. The key question here is: how should we define the "relation" between two states in the context of RL? I'd argue that there's only one natural choice: the temporal distance. For simplicity, let's assume that the dynamics is deterministic. Then, the temporal distance \(d^*(s, g)\) denotes the minimum number of time steps required to reach state \(g\) from state \(s\). This is natural because it depends only on the intrinsic dynamics of the environment.
Based on this, we define a representation of a state \(s\) as the set of temporal distances from all other states. For example, if the state space is finite (\(\mathcal{S} = \{s_1, s_2, \ldots, s_K\}\)), then the representation of \(s\) is given by $$\begin{aligned} \phi(s) = \big[ d^*(s_1, s), \ldots, d^*(s_K, s) \big]. \end{aligned}$$ In the general (continuous) case, we define the representation of a state \(s\) as a function $$\begin{aligned} \phi(s) = d^*(\cdot, s). \end{aligned}$$ We call this a dual representation. It is named after the concept of dual spaces in mathematics (see the first example above).
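To make this concrete, here is a minimal sketch (in plain Python) of the finite-state case: temporal distances are computed exactly with a breadth-first search over a toy deterministic transition graph, and the dual representation of each state is simply the vector of distances from every state to it. The `successors` table and the state names are made up purely for illustration.

```python
from collections import deque

# A toy deterministic MDP: successors[s] lists the states reachable in one step.
# This 4-state graph is purely illustrative.
successors = {
    0: [1],
    1: [2, 3],
    2: [3],
    3: [0],
}
states = sorted(successors)

def temporal_distance(start, goal):
    """Minimum number of steps to reach `goal` from `start` (BFS over the graph)."""
    if start == goal:
        return 0
    queue, dist = deque([start]), {start: 0}
    while queue:
        s = queue.popleft()
        for nxt in successors[s]:
            if nxt not in dist:
                dist[nxt] = dist[s] + 1
                if nxt == goal:
                    return dist[nxt]
                queue.append(nxt)
    return float("inf")  # goal unreachable from start

def dual_representation(s):
    # phi(s) = [d*(s_1, s), ..., d*(s_K, s)]: distances from every state to s.
    return [temporal_distance(si, s) for si in states]

for s in states:
    print(s, dual_representation(s))
```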
Why is it a good idea?
Now, you may wonder why this is useful. It turns out that dual representations have a number of nice properties:
- First, dual representations are purely intrinsic. No matter how a state is originally represented (e.g., joint angles, pixels, etc.), its dual representation depends only on the temporal dynamics of the environment, providing a canonical way of representing states.
- Second, dual representations are rich, containing sufficient information to model an optimal goal-reaching policy. Let me explain what this means. One might worry that dual representations are too "lossy" (because they only contain some set of distances), so we may not be able to represent a good policy using them. However, this is not the case: in the paper, we prove that there exists an optimal goal-conditioned policy \( \pi(a \mid s, \phi(g)) \) that depends on the goal only through its dual representation \(\phi(g)\). So at least for goal-conditioned RL, dual representations provably contain enough information to represent an optimal policy!
- Third, dual representations are robust to external noise and improve generalization. Since dual representations depend only on temporal distances, they are naturally invariant to any external noise (e.g., background distractions). In the paper, we formalize this and show that they are indeed invariant to exogenous noise.
To summarize, dual representations are especially useful when used as goal representations in goal-conditioned RL, because they contain "necessary and sufficient" information for optimal goal-reaching!
How to implement this idea?
This all sounds great, but there are two problems when it comes to actually implementing this idea. First, dual representations are infinite-dimensional in general. Second, we don't have access to the true temporal distance function \(d^*\). So we need to employ some approximations. We do this in two steps.
To address the issue of infinite dimensionality, we use a parameterized temporal distance function. Namely, we model the temporal distance function as follows: $$\begin{aligned} d^*(s, g) = f(\psi(s), \phi(g)), \end{aligned}$$ where \(\psi\) and \(\phi\) are finite-dimensional embedding functions, and \(f\) is a simple aggregation function. We then take \(\phi\) as the dual representation of a state. This way, \(\phi\) captures temporal information about the state that can be "paired" with any other state to compute the temporal distance. In practice, we use the inner product parameterization for \(f\): $$\begin{aligned} d^*(s, g) = \psi(s)^\top \phi(g). \end{aligned}$$ Note that this parameterization is universal.
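As a rough illustration of this parameterization (not the exact architecture from the paper), here is a small PyTorch-style sketch: two MLP encoders \(\psi\) and \(\phi\) whose outputs are combined with an inner product to predict the temporal distance. The state dimension, network sizes, and representation dimension below are placeholder choices.

```python
import torch
import torch.nn as nn

def mlp(in_dim, out_dim, hidden=256):
    # Small placeholder MLP; the actual sizes may differ.
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, out_dim),
    )

class DualDistance(nn.Module):
    """d(s, g) = psi(s)^T phi(g); phi(g) serves as the dual (goal) representation."""

    def __init__(self, state_dim, repr_dim=64):
        super().__init__()
        self.psi = mlp(state_dim, repr_dim)  # embedding of the "source" state s
        self.phi = mlp(state_dim, repr_dim)  # dual representation of the goal g

    def forward(self, s, g):
        # Batched inner product between the two embeddings.
        return (self.psi(s) * self.phi(g)).sum(dim=-1)

model = DualDistance(state_dim=17)
s, g = torch.randn(32, 17), torch.randn(32, 17)
d_hat = model(s, g)  # predicted temporal distances, shape (32,)
```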
Next, to approximate the true temporal distance function \(d^*\), we use an existing goal-conditioned value learning algorithm, such as goal-conditioned implicit Q-learning. Since the optimal goal-conditioned value function \(V^*(s, g)\) is closely related to the temporal distance \(d^*(s, g)\), we can easily get the (approximate) distance function.
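To make this relationship concrete, consider the common goal-conditioned reward convention where the agent receives a reward of \(-1\) at every step until it reaches the goal (the exact convention in the paper may differ in details). Without discounting, \(V^*(s, g) = -d^*(s, g)\), and with a discount factor \(\gamma < 1\), $$\begin{aligned} V^*(s, g) = -\frac{1 - \gamma^{d^*(s, g)}}{1 - \gamma}, \qquad \text{i.e.,} \qquad d^*(s, g) = \log_\gamma\big(1 + (1 - \gamma) V^*(s, g)\big), \end{aligned}$$ so an approximate distance can be read off directly from a learned value function.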
With these two approximations, we can train dual representations in practice! (See the paper for the full training algorithm and details.)
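To give a flavor of how these pieces might fit together in code, here is a minimal, heavily simplified sketch, not the actual algorithm from the paper: the inner-product distance model is regressed toward distance targets that would, in practice, come from a learned goal-conditioned value function, and a goal-conditioned policy is then trained that sees the goal only through \(\phi(g)\). The synthetic data, the placeholder `distance_targets` function, behavioral cloning as the policy-extraction step, and all hyperparameters are assumptions made purely for illustration.

```python
import torch
import torch.nn as nn

# Placeholder setup (synthetic data); a real implementation would use an
# offline goal-conditioned dataset and a learned goal-conditioned value function.
state_dim, action_dim, repr_dim, batch = 17, 6, 64, 256

def mlp(in_dim, out_dim, hidden=256):
    return nn.Sequential(
        nn.Linear(in_dim, hidden), nn.ReLU(),
        nn.Linear(hidden, hidden), nn.ReLU(),
        nn.Linear(hidden, out_dim),
    )

psi, phi = mlp(state_dim, repr_dim), mlp(state_dim, repr_dim)
policy = mlp(state_dim + repr_dim, action_dim)  # pi(a | s, phi(g))
dist_opt = torch.optim.Adam(list(psi.parameters()) + list(phi.parameters()), lr=3e-4)
pi_opt = torch.optim.Adam(policy.parameters(), lr=3e-4)

def distance_targets(s, g):
    # Placeholder: in practice, these would come from a goal-conditioned value
    # function learned with an algorithm like goal-conditioned IQL.
    return torch.rand(s.shape[0])

for step in range(1000):
    s = torch.randn(batch, state_dim)
    g = torch.randn(batch, state_dim)
    a = torch.randn(batch, action_dim)  # placeholder dataset actions

    # Step 1: fit the inner-product distance model d(s, g) = psi(s)^T phi(g).
    d_pred = (psi(s) * phi(g)).sum(dim=-1)
    dist_loss = ((d_pred - distance_targets(s, g)) ** 2).mean()
    dist_opt.zero_grad(); dist_loss.backward(); dist_opt.step()

    # Step 2: train a goal-conditioned policy that sees the goal only through phi(g).
    # (Behavioral cloning here is a stand-in for the actual policy extraction step.)
    with torch.no_grad():
        zg = phi(g)  # dual representation of the goal
    a_pred = policy(torch.cat([s, zg], dim=-1))
    pi_loss = ((a_pred - a) ** 2).mean()
    pi_opt.zero_grad(); pi_loss.backward(); pi_opt.step()
```

In the actual method, both the value learning step and the policy extraction step follow an offline goal-conditioned RL algorithm such as goal-conditioned implicit Q-learning, as described above.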
Connections to prior work
The practical algorithm above has many interesting connections to prior work, including successor representations, VIP, HILP, TRA, and many others. I won't go into detail here, but if you're interested, check out the related work section of our paper!
How does it work?
So, how well do dual representations work in practice? We tested them on a variety of goal-conditioned RL tasks in OGBench, across robotic navigation and manipulation.
The main results show that dual representations indeed improve goal-reaching performance, and they often outperform previous representation learning methods as well. It's quite exciting to see that just changing the goal representation can lead to up to around \(2\times\) performance gains on some tasks!
We also show that dual representations are particularly strong in "noisy" environments. As we discussed earlier, this is likely because dual representations ignore features that are irrelevant to the temporal dynamics.
In the paper, we show many more results, analyses, ablation studies, and Q&As about dual representations. Check out the paper for more details!
Epilogue
In this post, I introduced the idea of dual representations for RL. This post is based on our recent work, Dual Goal Representations, co-led by Deepinder and me. This was a slightly unusual research journey for me. I usually prefer "problem-driven" research, where I first set an empirical, quantifiable goal and then try to come up with a solution to achieve that goal. But this time, it started from a rather mathematically motivated idea, which turned out to have some nice properties and practical benefits. It was an enjoyable journey, and I hope that the concept of dual representations is useful and inspiring for future research in RL and machine learning!