Rectified flow learns ODEs as generative models by causalizing (or rectifying) an interpolation process that smoothly connects noise and data. This procedure naturally favors dynamics with straighter trajectories, which allow fast Euler discretization, and it can be repeated to further improve straightness.
This blog provides a brief introduction to rectified flow, based on Chapter 1 of these lecture notes. For a more detailed introduction, please refer to the original papers and these blogs.
Problem: Learning Flow Generative Models
Generative modeling can be formulated as finding a computational procedure that transforms a noise distribution, denoted by $\pi_0$, into an unknown data distribution $\pi_1$ observed from data. In flow models, this procedure is represented by an ordinary differential equation (ODE):
$$\dot Z_t = v_t(Z_t), \quad \forall t \in [0, 1], \quad \text{starting from } Z_0 \sim \pi_0,$$
where $\dot Z_t = \mathrm{d}Z_t/\mathrm{d}t$ denotes the time derivative, and the velocity field $v_t(x) = v(x, t)$ is a learnable function to be estimated to ensure that $Z_1$ follows the target distribution $\pi_1$ when starting from $Z_0 \sim \pi_0$. In this case, we say that the stochastic process $Z = \{Z_t\}$ provides an (ODE) transport from $\pi_0$ to $\pi_1$.
It is important to note that, in all but trivial cases, there exist infinitely many ODE transports from $\pi_0$ to $\pi_1$, provided that at least one such process exists. Thus, it is essential to be clear about which types of ODEs we should prefer.
One option is to favor ODEs that are easy to solve at inference time. In practice, the ODEs are approximated using numerical methods, which typically construct piecewise linear approximations of the ODE trajectories. For instance, a common choice is the Euler method:
$$\hat Z_{t+\epsilon} = \hat Z_t + \epsilon\, v_t(\hat Z_t), \quad \forall t \in \{0, \epsilon, 2\epsilon, \dots, 1 - \epsilon\},$$
where $\epsilon > 0$ is a step size. Varying the step size introduces a trade-off between accuracy and computational cost: a smaller $\epsilon$ yields higher accuracy but requires more computation steps. Therefore, we should seek ODEs that can be approximated accurately even with large step sizes.
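To make the step-size trade-off concrete, here is a minimal Python sketch of the Euler method for a scalar ODE. The helper `euler_solve` and both velocity fields are hypothetical illustrations, not part of any library: the first field has straight trajectories $Z_t = (1 + 2t) Z_0$, the second has curved ones ($\dot Z_t = Z_t$, so $Z_1 = e\, Z_0$).

```python
import math
from typing import Callable

def euler_solve(v: Callable[[float, float], float], z0: float, num_steps: int) -> float:
    """Approximate Z_1 for dZ_t/dt = v(Z_t, t), Z_0 = z0, using num_steps Euler steps."""
    eps = 1.0 / num_steps                 # step size epsilon
    z = z0
    for k in range(num_steps):
        t = k * eps                       # grid time 0, eps, 2*eps, ..., 1 - eps
        z = z + eps * v(z, t)             # Z_{t+eps} = Z_t + eps * v_t(Z_t)
    return z

# Straight trajectories Z_t = (1 + 2t) Z_0 have velocity field v_t(z) = 2z / (1 + 2t).
straight = lambda z, t: 2.0 * z / (1.0 + 2.0 * t)
print(euler_solve(straight, 1.0, 1))      # 3.0: a single step already lands on Z_1 = 3 Z_0

# Curved trajectories dZ_t/dt = Z_t: the error only shrinks as the number of steps grows.
curved = lambda z, t: z
for n in (1, 10, 100):
    print(n, abs(euler_solve(curved, 1.0, n) - math.e))
```

With the straight field, one step is exact; with the curved field, more steps (smaller $\epsilon$) are needed to reduce the error.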
Figure 1. Lady Windermere's fan for illustration of error accumulation in Euler method trajectories, starting from various initial points and deviating from the true solution curve over time.
The ideal scenario arises when the ODE follows straight-line trajectories, in which case Euler approximation yields zero discretization error regardless of the choice of step sizes. In such cases, the ODE, up to time reparameterization, should satisfy:
$$Z_t = t Z_1 + (1 - t) Z_0, \qquad \dot Z_t = Z_1 - Z_0.$$
These ODEs, known as straight transports, enable fast generative models that can be simulated in a single step. We refer to the resulting pair $(Z_0, Z_1)$ as a straight coupling of $\pi_0$ and $\pi_1$. In practice, we may not achieve perfect straightness, but we can aim to make the ODE trajectories as straight as possible to maximize computational efficiency.
Rectified Flow
To construct a flow transporting $\pi_0$ to $\pi_1$, let us assume that we are given an arbitrary coupling $(X_0, X_1)$ of $\pi_0$ and $\pi_1$, from which we can obtain empirical draws. This can simply be the independent coupling with law $\pi_0 \times \pi_1$, as is common in practice when we have access to independent samples from $\pi_0$ and $\pi_1$. The idea is to take $(X_0, X_1)$ and convert it to a better coupling generated by an ODE model. Optionally, we can then iteratively repeat this process to further enhance desired properties, such as straightness.
Rectified flow is constructed in the following steps:
Build Interpolation:
The first step is to build an interpolation process $\{X_t\}_{t \in [0,1]}$ that smoothly interpolates between $X_0$ and $X_1$. Although general choices are possible, let us consider the canonical choice of straight-line interpolation:
$$X_t = t X_1 + (1 - t) X_0, \quad t \in [0, 1].$$
Here the interpolation $X_t$ is a stochastic process generated in an “anchor-and-bridge” way: we first sample the endpoints $X_0$ and $X_1$ and then sample the intermediate trajectory connecting them.
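A minimal Python sketch of the anchor-and-bridge construction, assuming a 1-D toy setting chosen only for illustration: noise $\pi_0 = N(0, 1)$, data $\pi_1 = N(M, S^2)$ with the hypothetical values $M = 3$, $S = 0.5$, and the independent coupling. The function names are our own.

```python
import random

M, S = 3.0, 0.5  # hypothetical toy target pi_1 = N(M, S^2); noise pi_0 = N(0, 1)

def sample_independent_pair(rng: random.Random) -> tuple[float, float]:
    """Draw (X_0, X_1) from the independent coupling pi_0 x pi_1."""
    return rng.gauss(0.0, 1.0), rng.gauss(M, S)

def interpolate(x0: float, x1: float, t: float) -> float:
    """Straight-line interpolation X_t = t * X_1 + (1 - t) * X_0."""
    return t * x1 + (1.0 - t) * x0

rng = random.Random(0)
x0, x1 = sample_independent_pair(rng)                    # anchor: sample the endpoints first
path = [interpolate(x0, x1, k / 4) for k in range(5)]    # bridge: fill in the points between
print(path[0] == x0, path[-1] == x1)                     # True True
```

Once the endpoints are fixed, the whole path is determined; all randomness sits in $(X_0, X_1)$.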
Marginal Matching:
By construction, the marginal distributions of $X_0$ and $X_1$ match the target distributions $\pi_0$ and $\pi_1$ through the interpolation process $X = \{X_t\}$. However, $X$ is not a causal ODE process like $\dot Z_t = v_t(Z_t)$, which generates the output by evolving forward in time from $Z_0$. Instead, generating $X_t$ requires knowledge of both $X_0$ and $X_1$, rather than evolving solely from $X_0$ as $t$ increases.
This issue can be resolved if we can somehow convert $X$ into a causal ODE process while preserving the marginal distribution of $X_t$ at each time $t$. Note that since we only care about the distribution of the output, we only need to match the marginal distributions of $X_t$ at each individual time $t$. There is no need to match the trajectory-wise joint distribution of $\{X_t\}$.
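Perhaps surprisingly, marginal matching can be achieved by simply training the velocity field $v_t$ of the ODE model to match the slope $\dot X_t$ of the interpolation process via:
$$\min_v \int_0^1 \mathbb{E}\left[\lVert \dot X_t - v_t(X_t) \rVert^2\right] \mathrm{d}t.$$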
The theoretical minimum is achieved by:
$$v^*_t(x) = \mathbb{E}[\dot X_t \mid X_t = x],$$
which is the conditional expectation of the slope $\dot X_t$ over all the interpolation trajectories passing through a given point $X_t = x$. If multiple trajectories pass through the point $x$, the velocity $v^*_t(x)$ is the average of $\dot X_t$ over these trajectories.
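With the canonical straight interpolation $X_t = t X_1 + (1 - t) X_0$, we have $\dot X_t = X_1 - X_0$ by taking the derivative of $X_t$ with respect to $t$. It yields:
$$\min_v \int_0^1 \mathbb{E}\left[\lVert (X_1 - X_0) - v_t(X_t) \rVert^2\right] \mathrm{d}t, \qquad v^*_t(x) = \mathbb{E}[X_1 - X_0 \mid X_t = x].$$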
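Because the minimizer is a conditional expectation, it can be checked numerically in a toy case. Assume, for illustration only, noise $\pi_0 = N(0, 1)$, data $\pi_1 = N(m, s^2)$ with $m = 3$, $s = 0.5$, and the independent coupling. Then $(X_t, X_1 - X_0)$ is jointly Gaussian, and standard Gaussian conditioning gives $v^*_t(x) = m + \frac{t s^2 - (1 - t)}{t^2 s^2 + (1 - t)^2}(x - t m)$, which is linear in $x$. Ordinary least squares at a fixed $t$ minimizes the loss over linear functions, a class that contains $v^*_t$ here, so the fitted line should match the closed form up to sampling error. The helper names below are hypothetical.

```python
import random

def v_star(x: float, t: float, m: float = 3.0, s: float = 0.5) -> float:
    """Closed-form E[X_1 - X_0 | X_t = x] for X_0 ~ N(0, 1), X_1 ~ N(m, s^2) independent."""
    cov = t * s**2 - (1.0 - t)            # Cov(X_1 - X_0, X_t)
    var = t**2 * s**2 + (1.0 - t) ** 2    # Var(X_t)
    return m + cov / var * (x - t * m)    # E[Y] + Cov / Var * (x - E[X_t])

def fit_line(t: float, n: int = 100_000, m: float = 3.0, s: float = 0.5, seed: int = 0):
    """Least-squares fit of the slope X_1 - X_0 against X_t at fixed t; returns (intercept, coef)."""
    rng = random.Random(seed)
    xs, ys = [], []
    for _ in range(n):
        x0, x1 = rng.gauss(0.0, 1.0), rng.gauss(m, s)
        xs.append(t * x1 + (1.0 - t) * x0)   # X_t
        ys.append(x1 - x0)                   # dX_t/dt for the straight interpolation
    mx, my = sum(xs) / n, sum(ys) / n
    coef = sum((x - mx) * (y - my) for x, y in zip(xs, ys)) / sum((x - mx) ** 2 for x in xs)
    return my - coef * mx, coef

a_hat, b_hat = fit_line(t=0.5)
a_true, b_true = v_star(0.0, 0.5), v_star(1.0, 0.5) - v_star(0.0, 0.5)
print(a_hat, a_true)   # fitted vs. closed-form intercept
print(b_hat, b_true)   # fitted vs. closed-form coefficient of x
```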
In practice, this least-squares problem can be solved efficiently even for large AI models when $v$ is parameterized as a modern deep neural net. This is achieved by leveraging off-the-shelf optimizers with stochastic gradients, computed by drawing pairs $(X_0, X_1)$ from data, sampling $t$ uniformly in $[0, 1]$, and then computing the corresponding $X_t$ using the interpolation formula.
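A Python sketch of the minibatch estimate of the training objective, under an assumed toy setting: noise $\pi_0 = N(0, 1)$, data $\pi_1 = N(M, S^2)$ with hypothetical $M = 3$, $S = 0.5$, and the independent coupling. In a real model, $v$ would be a neural network and an optimizer would differentiate this estimate; here we only evaluate it for two fixed candidates: the closed-form conditional expectation `v_star` (Gaussian conditioning, valid only for this toy) and the constant guess $v_t(x) = M$. All names are illustrative.

```python
import random

M, S = 3.0, 0.5  # hypothetical toy: pi_0 = N(0, 1), pi_1 = N(M, S^2), independent coupling

def v_star(x: float, t: float) -> float:
    """Closed-form E[X_1 - X_0 | X_t = x] for the toy Gaussian coupling."""
    cov = t * S**2 - (1.0 - t)
    var = t**2 * S**2 + (1.0 - t) ** 2
    return M + cov / var * (x - t * M)

def rectified_flow_loss(v, batch_size: int, rng: random.Random) -> float:
    """Minibatch Monte Carlo estimate of int_0^1 E||(X_1 - X_0) - v_t(X_t)||^2 dt."""
    total = 0.0
    for _ in range(batch_size):
        x0, x1 = rng.gauss(0.0, 1.0), rng.gauss(M, S)   # pair (X_0, X_1) from the coupling
        t = rng.random()                               # t ~ Uniform[0, 1)
        xt = t * x1 + (1.0 - t) * x0                   # interpolation point X_t
        total += (x1 - x0 - v(xt, t)) ** 2             # squared error against the slope
    return total / batch_size

loss_opt = rectified_flow_loss(v_star, 20_000, random.Random(0))
loss_const = rectified_flow_loss(lambda x, t: M, 20_000, random.Random(0))
print(loss_opt, loss_const)
```

Both calls reuse the same seed, so they see identical $(X_0, X_1, t)$ draws; the conditional expectation should give the lower loss, while the constant guess should land near $\mathrm{Var}(X_1 - X_0) = 1 + S^2$.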
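Notation. A stochastic process $\{X_t\}$ is a measurable function $X_t = X(t, \omega)$ of time $t$ and a random seed $\omega$ (with, say, distribution $\mathbb{P}$). In the case above, the endpoints are the random seed, i.e., $\omega = (X_0, X_1)$. The slope is given by $\dot X_t = \partial_t X(t, \omega)$, the partial derivative of $X$ w.r.t. $t$, which is also a function of the same random seed. The expectation in the loss, written in full, is
$$\mathbb{E}\left[\lVert \dot X_t - v_t(X_t) \rVert^2\right] = \mathbb{E}_{\omega \sim \mathbb{P}}\left[\lVert \partial_t X(t, \omega) - v_t(X(t, \omega)) \rVert^2\right].$$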
In writing, we often omit the random seed. Whenever we take the expectation, it averages out all random sources inside the brackets except for those explicitly included in the conditioning.
Figure 2. Rectified flow between $\pi_0$ and $\pi_1$. Blue and pink lines represent trajectories, colored by the mode they are associated with for visualization.
We illustrate the intuition in Fig.2:
In the interpolation process $\{X_t\}$, different trajectories may intersect, resulting in multiple possible values of $\dot X_t$ at the same point $X_t$, because it is uncertain which trajectory the point was drawn from (Fig.2a).
In contrast, by the definition of an ODE $\dot Z_t = v_t(Z_t)$, the update direction $\dot Z_t$ at each point $Z_t$ is uniquely determined by $Z_t$, making it impossible for different trajectories of $\{Z_t\}$ to intersect and then diverge along different directions.
Hence, at the intersection points of $\{X_t\}$ where $\dot X_t$ is uncertain and non-unique, the ODE “derandomizes” the update direction by following the conditional expectation $v^*_t(X_t) = \mathbb{E}[\dot X_t \mid X_t]$. Consequently, the trajectories of the ODE “reassemble” the interpolation trajectories in a way that avoids intersections. See Fig.2(b).
Since the ODE trajectories $\{Z_t\}$ cannot intersect, they must curve at potential intersection points to “rewire” the original interpolation paths and avoid crossing.
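Rectified Flow. For any time-differentiable stochastic process $X = \{X_t\}_{t \in [0,1]}$, we call the ODE process
$$\dot Z_t = v^*_t(Z_t), \quad \text{with } v^*_t(x) = \mathbb{E}[\dot X_t \mid X_t = x], \quad \text{starting from } Z_0 = X_0,$$
the rectified flow induced by $X$. We denote it as:
$$Z = \mathrm{Rectify}(X).$$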
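To see the rectified flow in action, the sketch below simulates $\dot Z_t = v^*_t(Z_t)$ with Euler steps in an assumed 1-D Gaussian toy: noise $N(0, 1)$, data $N(M, S^2)$ with hypothetical $M = 3$, $S = 0.5$, and the independent coupling, where $v^*_t$ has a closed form from Gaussian conditioning. In this toy one can check that $Z_t = tM + \sqrt{t^2 S^2 + (1 - t)^2}\, Z_0$ solves the ODE, so $Z_1 = M + S Z_0$. The names are illustrative only.

```python
M, S = 3.0, 0.5  # hypothetical toy: noise N(0, 1), data N(M, S^2), independent coupling

def v_star(x: float, t: float) -> float:
    """Closed-form E[X_1 - X_0 | X_t = x] for the toy Gaussian coupling."""
    cov = t * S**2 - (1.0 - t)
    var = t**2 * S**2 + (1.0 - t) ** 2
    return M + cov / var * (x - t * M)

def rectify_endpoint(z0: float, num_steps: int = 1000) -> float:
    """Euler-simulate dZ_t/dt = v*_t(Z_t) from Z_0 = z0 and return Z_1."""
    eps = 1.0 / num_steps
    z = z0
    for k in range(num_steps):
        z += eps * v_star(z, k * eps)
    return z

for z0 in (-1.0, 0.0, 2.0):
    # Solving this toy ODE exactly gives Z_1 = M + S * Z_0.
    print(z0, rectify_endpoint(z0), M + S * z0)
```

The simulated endpoints should agree with $M + S z_0$ up to Euler error. In this toy, $Z_1$ is a deterministic, increasing function of $Z_0$, whereas the independent coupling pairs each $X_0$ with a random $X_1$.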
Figure 3. A close-up view of how rectification “rewires” interpolation trajectories. (a) Interpolation trajectories with intersections. (b) Averaged velocity directions at intersection points (red arrows). (c) Trajectories of the resulting rectified flow.
Figure 3 illustrates a close-up view of how rectification “rewires” interpolation trajectories. Consider two “beams” of interpolation trajectories intersecting to form the “region of confusion” (shaded area in the middle). Within this region, a particle moving along the rectified flow follows the averaged direction $v^*_t$. Upon exiting, the particle joins one of the original interpolation streams based on its exit side and continues moving. Since rectified flow trajectories do not intersect within the region, they remain separated and exit from their respective sides, effectively “rewiring” the original interpolation trajectories.
What makes rectified flow useful is that it preserves the marginal distribution of $X_t$ at each time $t$ while resulting in a “better” coupling in terms of optimal transport:
Marginal Preservation
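The process $X$ and its rectified flow $Z$ share the same marginal distributions at each time $t$, that is:
$$\mathrm{Law}(Z_t) = \mathrm{Law}(X_t), \quad \forall t \in [0, 1],$$
where $\mathrm{Law}(Z_t)$ denotes the probability distribution (or law) of the random variable $Z_t$.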
Intuitively, because $v^*_t(x)$ is defined as the average slope $\mathbb{E}[\dot X_t \mid X_t = x]$, the total amount of mass flowing into and out of every infinitesimal volume in the space is the same under the dynamics of $X$ and $Z$. This ensures that the two processes yield the same marginal distributions, even though the flow directions of individual trajectories may differ.
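A short sketch of this argument, assuming $X_t$ has a density $\rho_t$ and $v^*_t$ is regular enough for the ODE to have unique solutions (technical conditions we do not verify here). For any smooth test function $\varphi$, the tower property of conditional expectation gives:

```latex
\begin{aligned}
\frac{\mathrm d}{\mathrm dt}\,\mathbb E[\varphi(X_t)]
  &= \mathbb E\big[\nabla\varphi(X_t)^\top \dot X_t\big]
   = \mathbb E\big[\nabla\varphi(X_t)^\top\, \mathbb E[\dot X_t \mid X_t]\big]
   = \mathbb E\big[\nabla\varphi(X_t)^\top v^*_t(X_t)\big], \\
\frac{\mathrm d}{\mathrm dt}\,\mathbb E[\varphi(Z_t)]
  &= \mathbb E\big[\nabla\varphi(Z_t)^\top v^*_t(Z_t)\big].
\end{aligned}
```

So both laws are weak solutions of the same continuity equation $\partial_t \rho_t + \nabla \cdot (v^*_t \rho_t) = 0$ with the same initial law ($Z_0 = X_0$); under uniqueness of its solution, the marginals coincide at every $t$.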
Transport Cost
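When $X$ is the straight-line interpolation of $(X_0, X_1)$, the start-end pairs $(Z_0, Z_1)$ from the rectified flow $Z$ are guaranteed to yield no larger transport cost than $(X_0, X_1)$, simultaneously for all convex cost functions $c$:
$$\mathbb{E}[c(Z_1 - Z_0)] \le \mathbb{E}[c(X_1 - X_0)], \quad \forall \text{ convex } c\colon \mathbb{R}^d \to \mathbb{R}.$$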
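Intuitively, this is because untangling the intersections can only shorten the trajectories, by the triangle inequality: when two straight paths cross, connecting each start point directly to the end point it is rewired to gives a total length no larger than going through the crossing.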
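The inequality can be sketched with Jensen's inequality applied twice, assuming the straight-line interpolation (so $\dot X_t = X_1 - X_0$) and the marginal preservation property above. This is a standard argument written out for illustration, not quoted from the paper:

```latex
\begin{aligned}
\mathbb E[c(Z_1 - Z_0)]
 &= \mathbb E\Big[c\Big(\textstyle\int_0^1 v^*_t(Z_t)\,\mathrm dt\Big)\Big]
  \le \int_0^1 \mathbb E\big[c(v^*_t(Z_t))\big]\,\mathrm dt
  && \text{(Jensen over } t) \\
 &= \int_0^1 \mathbb E\big[c(v^*_t(X_t))\big]\,\mathrm dt
  && (\mathrm{Law}(Z_t) = \mathrm{Law}(X_t)) \\
 &= \int_0^1 \mathbb E\big[c\big(\mathbb E[X_1 - X_0 \mid X_t]\big)\big]\,\mathrm dt
  \le \int_0^1 \mathbb E\big[c(X_1 - X_0)\big]\,\mathrm dt
  && \text{(conditional Jensen)} \\
 &= \mathbb E[c(X_1 - X_0)].
\end{aligned}
```

Taking $c(x) = \lVert x \rVert$ recovers the path-length intuition.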
Reflow
While rectified flows tend to favor straight trajectories, they are not perfectly straight. As Fig.2 shows, the flow makes turns at the intersection points of the interpolation trajectories $\{X_t\}$. How can we further improve the flow to achieve straighter trajectories and hence speed up inference?
A key insight is that the start-end pairs $(Z_0, Z_1)$ generated by rectified flow, called the rectified coupling of $(X_0, X_1)$, form a better and “straighter” coupling compared to $(X_0, X_1)$. This is because if we connect $Z_0$ and $Z_1$ with a new straight-line interpolation, it would yield fewer intersection points. Hence, training a new rectified flow based on this interpolation would result in straighter trajectories, leading to faster inference.
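Formally, we apply the procedure recursively, yielding a sequence of rectified flows starting from $(Z^0_0, Z^0_1) = (X_0, X_1)$:
$$Z^{k+1} = \mathrm{Rectify}\big(\mathrm{Interp}(Z^k_0, Z^k_1)\big), \quad k = 0, 1, 2, \dots,$$
where $\mathrm{Interp}(Z^k_0, Z^k_1)$ denotes an interpolation process given $(Z^k_0, Z^k_1)$ as the endpoints. We call $Z^k$ the $k$-th rectified flow, or simply the $k$-rectified flow, induced from $(X_0, X_1)$.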
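In an assumed 1-D Gaussian toy (noise $N(0, 1)$, data $N(M, S^2)$ with hypothetical $M = 3$, $S = 0.5$, independent starting coupling), solving the first rectified flow's ODE in closed form shows that it maps $Z_0$ to $M + S Z_0$. Reflow connects these pairs with straight lines, $X'_t = tM + (1 + t(S - 1)) Z_0$; each point $x$ at time $t$ then lies on exactly one such line, so the conditional expectation is simply that line's slope $M + (S - 1) Z_0$. The Python sketch below (hypothetical names) encodes this 2-rectified velocity and checks that a single Euler step lands on the correct endpoint. In general, reflow would instead simulate the previous flow to collect pairs and train a new $v$ on them.

```python
M, S = 3.0, 0.5  # hypothetical toy: noise N(0, 1), data N(M, S^2)

def v_reflow(x: float, t: float) -> float:
    """2-rectified velocity: slope of the straight line z0 -> M + S*z0 passing through x at time t."""
    scale = 1.0 + t * (S - 1.0)       # X'_t = t*M + scale * z0, with scale > 0 for S > 0
    z0 = (x - t * M) / scale          # the unique start point whose line passes through x
    return M + (S - 1.0) * z0         # slope (M + S*z0) - z0, deterministic given x

def euler_solve(v, z0: float, num_steps: int) -> float:
    """Euler integration of dZ_t/dt = v(Z_t, t) on [0, 1]; returns Z_1."""
    eps = 1.0 / num_steps
    z = z0
    for k in range(num_steps):
        z += eps * v(z, k * eps)
    return z

print(euler_solve(v_reflow, 2.0, 1), M + S * 2.0)   # 4.0 4.0: one step is exact
```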
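This reflow procedure is proved to “straighten” the paths of rectified flows in the following sense. Define the following measure of straightness of $Z$:
$$S(Z) = \int_0^1 \mathbb{E}\left[\lVert (Z_1 - Z_0) - \dot Z_t \rVert^2\right] \mathrm{d}t,$$
where $S(Z) \ge 0$, with $S(Z) = 0$ corresponding to straight paths. Then, as shown in the original paper,
$$\frac{1}{K} \sum_{k=1}^{K} S(Z^k) \le \frac{\mathbb{E}\left[\lVert X_1 - X_0 \rVert^2\right]}{K},$$
which shows that the average of $S(Z^k)$ over the first $K$ steps decays at an $O(1/K)$ rate.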
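A sketch of where the bound comes from, assuming straight-line interpolation in every round and exact velocity fields $v^*$; this is our reconstruction of the argument for illustration, not a quotation of the paper's proof. For $Z$ the rectified flow of a straight interpolation $X$:

```latex
\begin{aligned}
S(Z) &= \int_0^1 \mathbb E\lVert \dot Z_t\rVert^2\,\mathrm dt - \mathbb E\lVert Z_1 - Z_0\rVert^2
  && \text{(expand the square; } \textstyle\int_0^1 \dot Z_t\,\mathrm dt = Z_1 - Z_0) \\
\int_0^1 \mathbb E\lVert \dot Z_t\rVert^2\,\mathrm dt
  &= \int_0^1 \mathbb E\lVert v^*_t(X_t)\rVert^2\,\mathrm dt \le \mathbb E\lVert X_1 - X_0\rVert^2
  && \text{(equal marginals; conditional Jensen)} \\
\Longrightarrow\quad \mathbb E\lVert Z^{k+1}_1 - Z^{k+1}_0\rVert^2 + S(Z^{k+1})
  &\le \mathbb E\lVert Z^{k}_1 - Z^{k}_0\rVert^2 \\
\Longrightarrow\quad \textstyle\sum_{k=1}^{K} S(Z^k) &\le \mathbb E\lVert X_1 - X_0\rVert^2 .
\end{aligned}
```

The third line applies the first two with $X$ taken as the straight interpolation of $(Z^k_0, Z^k_1)$; summing it over $k = 0, \dots, K-1$ telescopes to the last line, and dividing by $K$ gives the $O(1/K)$ rate.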
Note that reflow can begin from any coupling $(X_0, X_1)$, so it provides a general procedure for straightening and thus speeding up any given dynamics while preserving the marginals.
As shown in Fig.2 (c), after applying the “Reflow” operation, the trajectories of the 2-rectified flow $Z^2$ become straighter than those of the original rectified flow $Z^1$.
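A Python sketch that estimates the straightness measure $S(Z) = \int_0^1 \mathbb{E}\lVert (Z_1 - Z_0) - \dot Z_t \rVert^2 \mathrm{d}t$ along Euler-simulated paths, in an assumed 1-D Gaussian toy (noise $N(0, 1)$, data $N(M, S^2)$ with hypothetical $M = 3$, $S = 0.5$). Here `v_rect1` is the closed-form velocity of the first rectified flow from the independent coupling, and `v_rect2` is the velocity after one reflow. The toy and the names are ours, not from the original papers.

```python
import random

M, S = 3.0, 0.5  # hypothetical toy: noise N(0, 1), data N(M, S^2), independent coupling

def v_rect1(x: float, t: float) -> float:
    """1-rectified velocity: E[X_1 - X_0 | X_t = x] for the independent Gaussian coupling."""
    cov = t * S**2 - (1.0 - t)
    var = t**2 * S**2 + (1.0 - t) ** 2
    return M + cov / var * (x - t * M)

def v_rect2(x: float, t: float) -> float:
    """2-rectified velocity, from straight interpolation of the pairs (z0, M + S*z0)."""
    return M + (S - 1.0) * (x - t * M) / (1.0 + t * (S - 1.0))

def straightness(v, num_paths: int = 1000, num_steps: int = 200, seed: int = 0) -> float:
    """Estimate S(Z) = int_0^1 E||(Z_1 - Z_0) - dZ_t/dt||^2 dt on an Euler time grid."""
    rng = random.Random(seed)
    eps = 1.0 / num_steps
    total = 0.0
    for _ in range(num_paths):
        z_start = z = rng.gauss(0.0, 1.0)          # Z_0 drawn from the noise distribution
        velocities = []
        for k in range(num_steps):
            u = v(z, k * eps)                      # dZ_t/dt at grid time t = k * eps
            velocities.append(u)
            z += eps * u
        disp = z - z_start                         # Z_1 - Z_0 along this path
        total += eps * sum((disp - w) ** 2 for w in velocities)
    return total / num_paths

s1, s2 = straightness(v_rect1), straightness(v_rect2)
print(s1, s2)
```

In this toy, the first estimate is clearly positive, while the second is zero up to floating-point rounding, since every 2-rectified path moves at a constant velocity.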
Reflow and Shortcut Learning. Intuitively, reflow resembles shortcut learning in humans: once we solve a problem for the first time, we learn to go directly to the solution, enabling us to solve it more quickly the next time.
References
Flow straight and fast: Learning to generate and transfer data with rectified flow. Liu, X., Gong, C. and Liu, Q., 2022. arXiv preprint arXiv:2209.03003.
Rectified flow: A marginal preserving approach to optimal transport. Liu, Q., 2022. arXiv preprint arXiv:2209.14577.