Non-linear beta schedules are implemented by defining a custom function that determines how the noise level (beta) changes over the steps of a diffusion or denoising process. Unlike linear schedules, which increase beta at a constant rate, non-linear schedules use mathematical functions like cosine, quadratic, or exponential curves to control the progression. The key is to map the timestep values to beta values in a way that balances stability and performance, often starting with slower changes in beta early in the process and accelerating later, or vice versa, depending on the use case.
To implement a non-linear beta schedule, you first choose a function that dictates the shape of the curve. For example, a cosine-based schedule might compute beta values using a cos² transformation (or a similar smooth function) to create a gradual transition. Here's a basic Python example for a cosine schedule:
```python
import math

import torch

def cosine_beta_schedule(timesteps, s=0.008):
    # Evaluate the cumulative alpha curve at timesteps + 1 points so that
    # consecutive ratios yield exactly `timesteps` beta values.
    steps = timesteps + 1
    x = torch.linspace(0, timesteps, steps)
    # Squared cosine curve; the small offset s keeps betas from vanishing near t = 0.
    alphas_cumprod = torch.cos((x / timesteps + s) / (1 + s) * math.pi * 0.5) ** 2
    # beta_t = 1 - alpha_bar_t / alpha_bar_{t-1}
    betas = 1 - (alphas_cumprod[1:] / alphas_cumprod[:-1])
    # Clip to avoid a degenerate beta of 1.0 at the final step.
    return torch.clip(betas, 0, 0.999)
```
This code calculates cumulative product terms (`alphas_cumprod`) using a cosine function, then derives beta values from their sequential ratios. Alternatively, a quadratic schedule might use `beta_t = (t/T)^2 * max_beta`, where `t` is the current step and `T` is the total number of steps. The choice of function depends on whether you want sharper transitions (e.g., quadratic) or smoother ones (e.g., cosine).
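That quadratic formula translates directly into code. In this sketch, `max_beta=0.02` is just an illustrative cap, not a prescribed value:

```python
import torch

def quadratic_beta_schedule(timesteps, max_beta=0.02):
    # beta_t = (t / T)^2 * max_beta: changes slowly at first, then sharply.
    t = torch.arange(1, timesteps + 1, dtype=torch.float32)
    return (t / timesteps) ** 2 * max_beta

betas = quadratic_beta_schedule(1000)
print(float(betas[-1]))  # equals max_beta at the final step
```

Because of the squared term, the first half of the steps uses only a quarter of the beta range, concentrating most of the noise growth in the later steps.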
When integrating a non-linear schedule into a diffusion model, ensure the beta values remain within valid ranges (typically 0 to 1) and avoid numerical instability. For example, very small beta values early in training might slow convergence, while large jumps could destabilize learning. Testing with visualization tools (e.g., plotting beta over time) helps validate the curve's shape. Additionally, consider interpolating between precomputed beta values during training for efficiency. Most frameworks like PyTorch or TensorFlow allow caching the schedule as a tensor upfront. Non-linear schedules often require tuning hyperparameters like the `s` offset in the cosine example to align with specific data domains, such as images versus audio.
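One way to cache the schedule upfront is to precompute every derived tensor once at construction time, so sampling code only indexes into them. This is a sketch; the class and attribute names are illustrative, not a framework API:

```python
import torch

class CachedSchedule:
    """Precompute schedule tensors once; training/sampling code just indexes them."""
    def __init__(self, betas: torch.Tensor):
        # Range check: betas outside [0, 1) break the closed-form noising equations.
        assert float(betas.min()) >= 0 and float(betas.max()) < 1, "betas must lie in [0, 1)"
        self.betas = betas
        alphas = 1.0 - betas
        # Cumulative products drive the closed-form forward (noising) process.
        self.alphas_cumprod = torch.cumprod(alphas, dim=0)
        self.sqrt_alphas_cumprod = torch.sqrt(self.alphas_cumprod)

# Example: cache a simple quadratic schedule with T = 1000 steps.
T = 1000
sched = CachedSchedule((torch.arange(1, T + 1) / T) ** 2 * 0.02)
```

The assertion at construction time catches out-of-range betas once, rather than letting them surface later as NaNs during training.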