Differentiable Rendering - Mesh Based Techniques

July 1, 2025

In the first part of this series, we introduced the concept of differentiable rendering (DR) and outlined its importance for bridging the gap between 2D computer vision and 3D scene understanding. We established the core challenge: standard graphics pipelines contain discrete, non-differentiable operations like rasterization and occlusion testing, which prevent the gradient flow essential for modern optimization algorithms.

While Part 1 provided a high-level historical overview, this post examines the foundational papers that first addressed these challenges. We trace the evolution from early approximation methods to physically-principled, fully differentiable frameworks. Our analysis covers four seminal works:

OpenDR (2014): A pioneering framework that introduced a practical, approximate differentiable renderer using image-space filtering.

Neural 3D Mesh Renderer (2018): A renderer designed specifically for deep learning pipelines, which proposed an approximate gradient for rasterization to enable end-to-end mesh generation.

Soft Rasterizer (2019): A paradigm shift that reimagined rendering as a probabilistic process, creating a truly differentiable forward pass.

Differentiable Monte Carlo Ray Tracing (2018): A mathematically rigorous approach that addressed the core discontinuities of physically-based rendering through edge sampling techniques.

Together, these papers demonstrate increasing sophistication, moving from gradient approximation to fundamental reformulation of the rendering process itself.

Early Approximation Methods

The first generation of differentiable renderers adopted a pragmatic approach: maintain standard, efficient, non-differentiable renderers for the forward pass while developing clever approximations for the backward pass. This strategy leveraged decades of graphics hardware optimization while circumventing the non-differentiability problem.

OpenDR: Differentiating at Object Boundaries

The 2014 paper “OpenDR: An Approximate Differentiable Renderer” provided one of the first general, publicly available frameworks for this problem [1]. The authors recognized that the primary source of non-differentiability occurs at object boundaries or silhouettes. Their solution involved defining a forward model with specific approximations and then differentiating it analytically.

Forward Model Components:

  • Appearance (A): Per-pixel appearance modeled as a product of mipmapped texture and per-vertex brightness, combining reflectance and lighting effects.
  • Geometry (V): 3D scenes approximated by triangulated meshes parameterized by vertices.
  • Camera (C): Standard pinhole camera model with distortion parameters, approximating continuous pixel intensities by their sampled central values.

The fundamental insight involves applying the chain rule. Final pixel values $f$ depend on the 3D model parameters $\Theta = \{V, C, A\}$ through the 2D projection of vertices onto the image plane, denoted $U$. The gradient decomposes as:

$$\frac{\partial f}{\partial \Theta} = \frac{\partial f}{\partial U} \frac{\partial U}{\partial \Theta}$$

The term $\partial U / \partial \Theta$ represents standard geometric projection with well-defined derivatives. The challenge lies in $\partial f / \partial U$: how pixel values change as projected 2D vertices move.
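To make the decomposition concrete, here is a minimal Python sketch of the chain-rule product; the shapes (a 64×64 image, 100 vertices, 72 parameters) are illustrative assumptions, not anything prescribed by OpenDR:

```python
import numpy as np

# Hypothetical shapes: 64x64 pixels, 100 vertices (2 image coordinates each),
# and 72 scene parameters Theta = {V, C, A}.
df_dU = np.random.rand(64 * 64, 2 * 100)   # how pixels change as 2D vertices move
dU_dTheta = np.random.rand(2 * 100, 72)    # standard projection derivatives

# Chain rule: df/dTheta = (df/dU) @ (dU/dTheta), a (pixels x parameters) Jacobian.
df_dTheta = df_dU @ dU_dTheta
print(df_dTheta.shape)  # (4096, 72)
```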

OpenDR’s key contribution was approximating this term by segmenting pixels into categories (a minimal sketch of the interior-pixel case follows the list):

Interior Pixels: For pixels away from occlusion boundaries, small geometric shifts result in smooth texture translation. OpenDR approximates this using image-space gradients, similar to applying convolution kernels like $[-1, 0, 1]$.

Boundary Pixels: For pixels on silhouettes where one surface occludes another, OpenDR employs surrogate gradients. Rather than computing differences between foreground and occluded background pixels, it uses differences between foreground and neighboring visible background pixels. The boundary gradient typically dominates the occluded background surface gradient, justifying this approximation.

Multi-Boundary Pixels: Rare cases containing multiple boundaries are treated as single-boundary cases, avoiding prohibitive computational overhead.
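As promised above, here is a minimal sketch of the interior-pixel approximation, assuming a grayscale image stored as a NumPy array; the boundary handling is simplified relative to OpenDR's actual filtering:

```python
import numpy as np

def interior_pixel_gradients(image: np.ndarray):
    """Approximate df/dU away from occlusion boundaries with the central
    difference kernel [-1, 0, 1] applied along each image axis."""
    df_dx = np.zeros_like(image, dtype=float)
    df_dy = np.zeros_like(image, dtype=float)
    # Horizontal: how a pixel's value changes as projected vertices shift in x.
    df_dx[:, 1:-1] = (image[:, 2:] - image[:, :-2]) / 2.0
    # Vertical: the same for shifts in y.
    df_dy[1:-1, :] = (image[2:, :] - image[:-2, :]) / 2.0
    return df_dx, df_dy
```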

OpenDR demonstrated practical inverse graphics on complex tasks, including human body shape estimation from Kinect data with thousands of parameters (lighting, albedo, pose, and body shape). However, its derivatives were approximate by design.

Neural 3D Mesh Renderer: Loss-Aware Gradient Design

The 2018 “Neural 3D Mesh Renderer” focused on tight integration with end-to-end deep learning pipelines [2]. This work advocated for polygon meshes as the optimal 3D representation for neural networks due to their compactness and suitability for geometric transformations, in contrast with memory-intensive voxels or surface-less point clouds.

Like OpenDR, it addressed rasterization’s non-differentiable nature. The core problem occurs when vertex movement causes face edges to cross pixel centers, creating instantaneous color changes. The Neural Renderer smoothed these transitions through linear interpolation and loss-aware gradients.

For pixel $P_j$ with loss change $\delta P_j$ and potential color change $\delta I_j$ caused by moving vertex coordinate $x_i$, gradients flow only when color changes would reduce the loss:

$$\frac{\partial I_j(x_i)}{\partial x_i}\bigg|_{x_i=x_0} = \begin{cases} \frac{\delta I_j}{\delta x_i} & \text{if } \delta P_j \cdot \delta I_j < 0 \\ 0 & \text{otherwise} \end{cases}$$

This gated gradient prevents vertex movements that would worsen rendered images. The authors demonstrated this by training networks to reconstruct 3D meshes from single silhouette images. Networks predicted deformations of base meshes (642-vertex spheres), rendered using the Neural Renderer. Silhouette differences provided error signals backpropagated through the renderer to update network weights. This approach significantly outperformed voxel-based methods but was limited to fixed topology—deforming spheres into chairs without creating holes or changing mesh connectivity.
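In code, the gate is a single conditional. The following is a sketch of the equation above for one pixel-vertex pair, not the paper's actual implementation:

```python
def gated_gradient(delta_I: float, delta_x: float, delta_P: float) -> float:
    """Gradient of pixel color I_j w.r.t. vertex coordinate x_i (delta_x assumed
    nonzero), passed only when the color change would reduce the loss."""
    if delta_P * delta_I < 0:   # the color change opposes the loss change
        return delta_I / delta_x
    return 0.0                  # gate shut: moving this vertex would not help
```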

Probabilistic Approaches

Rather than approximating backward passes, the 2019 “Soft Rasterizer” paper proposed a fundamental shift: making the forward rendering process inherently differentiable [4].

Soft Rasterizer: Probabilistic Rendering

SoftRas reimagined rendering as a probabilistic rather than deterministic process. Instead of hard decisions about triangle-pixel coverage, every triangle probabilistically influences every pixel. This approach uses two key components:

Probability Maps ($D_j$): For each triangle $f_j$, SoftRas computes a probability map $D_j$ over the entire image. The value $D_j(p_i)$ represents the probability that pixel $p_i$ is covered by triangle $f_j$, calculated as:

$$D_j(p_i) = \text{sigmoid}\left(\frac{\delta_j \cdot d(p_i, f_j)^2}{\sigma}\right)$$

Here, $d(p_i, f_j)$ is the distance from pixel $p_i$ to triangle $f_j$'s nearest edge, $\delta_j$ indicates inside ($+1$) or outside ($-1$), and $\sigma$ controls boundary softness. As $\sigma \to 0$, this converges to standard binary rasterization.
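As a sketch, assuming a signed pixel-to-boundary distance (positive inside the triangle, negative outside), the probability map reduces to one line, since $\delta_j \cdot d^2 = \operatorname{sign}(d) \cdot d^2$:

```python
import numpy as np

def soft_coverage(signed_d: np.ndarray, sigma: float = 1e-4) -> np.ndarray:
    """D_j(p_i) = sigmoid(delta_j * d^2 / sigma) for signed distances signed_d;
    pixels well inside a face approach 1, pixels well outside approach 0."""
    return 1.0 / (1.0 + np.exp(-np.sign(signed_d) * signed_d**2 / sigma))
```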

Aggregate Functions: Final pixel colors blend all triangle colors weighted by probability maps and normalized depths:

$$I_i = \sum_j w_j C_j + w_b C_b \quad \text{where} \quad w_j \propto D_j \cdot \exp(z_j / \gamma)$$

The parameter $\gamma$ controls depth influence, acting as a soft z-buffer. This formulation is smooth and differentiable throughout, allowing gradients to flow from pixels to all triangles, including fully occluded ones.
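A per-pixel sketch of this aggregation follows; the depth convention (larger $z_j$ means closer) and the background-weight form are assumptions rather than the paper's exact formulation:

```python
import numpy as np

def aggregate_pixel(D, z, colors, bg_color, gamma=1e-4, eps=1e-3):
    """One pixel of the aggregate function: I_i = sum_j w_j C_j + w_b C_b with
    w_j proportional to D_j * exp(z_j / gamma). Shapes: D and z are (n,),
    colors is (n, 3), bg_color is (3,)."""
    zmax = max(z.max(), eps)                   # softmax-style stabilization
    w = D * np.exp((z - zmax) / gamma)         # coverage- and depth-weighted
    w_b = np.exp((eps - zmax) / gamma)         # background weight (assumed form)
    return (w @ colors + w_b * bg_color) / (w.sum() + w_b)
```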

For silhouette rendering, aggregation models the probability that at least one triangle covers the pixel:

$$I_i^{\text{sil}} = 1 - \prod_j \left(1 - D_j(p_i)\right)$$
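The silhouette aggregate reuses the coverage probabilities directly; a minimal sketch:

```python
import numpy as np

def silhouette_pixel(D: np.ndarray) -> float:
    """I_i^sil = 1 - prod_j (1 - D_j(p_i)): the complement of the probability
    that every face misses the pixel."""
    return 1.0 - float(np.prod(1.0 - D))
```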

This probabilistic formulation smooths the loss landscape as $\sigma$ and $\gamma$ are tuned, which proves crucial for avoiding local minima in complex optimization tasks like pose fitting. SoftRas represents a paradigm shift from approximating gradients to redesigning rendering itself for differentiability.

Physically-Based Methods

While SoftRas achieved differentiability through probabilistic softening, another research direction pursued a different question: can we differentiate renderers that model precise light transport physics, including shadows and reflections? The 2018 paper “Differentiable Monte Carlo Ray Tracing through Edge Sampling” provided a mathematically rigorous solution [3].

Differentiable Ray Tracing: A Comprehensive Solution

This work provides the first comprehensive solution for computing derivatives of scalar functions over rendered images with respect to arbitrary scene parameters, including camera pose, scene geometry, materials, and lighting. The fundamental challenge arises from visibility terms in rendering integrals that introduce discontinuities at object boundaries, making traditional Monte Carlo sampling inadequate for gradient computation.

Mathematical Foundation

The rendering integral for pixel intensity is expressed as:

$$I = \iint k(x,y)\,L(x,y)\,dx\,dy$$

where $k(x,y)$ is the pixel reconstruction filter and $L(x,y)$ is the radiance arriving at the image plane.

The goal is to compute:

$$\nabla I = \nabla \iint f(x,y;\Phi)\,dx\,dy$$

where $f(x,y;\Phi) = k(x,y)\,L(x,y)$ is the scene function, parameterized by the scene parameters $\Phi$.

Handling Discontinuities Through Edge Sampling

The key insight recognizes that all discontinuities occur at triangle edges, which can be modeled as Heaviside step functions. The scene function becomes a sum of step functions multiplied by arbitrary functions. When differentiating, the derivative of a Heaviside step function produces a Dirac delta function, which has zero probability of being captured by traditional area sampling.

To differentiate the rendering integral, they first swap the gradient and integral operators and then apply the product rule. This separates the gradient of the integral into two distinct components:

$$\nabla \iint \theta(\alpha_i(x, y))\,f_i(x, y)\,dx\,dy = \iint \delta(\alpha_i(x, y))\,\nabla\alpha_i(x, y)\,f_i(x, y)\,dx\,dy + \iint \nabla f_i(x, y)\,\theta(\alpha_i(x, y))\,dx\,dy$$

This formulation splits the problem cleanly into two parts (a one-dimensional toy example follows the list):

  1. The Discontinuous Part: The first term contains the Dirac delta function δ\delta, which is non-zero only at the visibility boundaries (the edges). Standard Monte Carlo sampling will miss these infinitely narrow spikes with a probability of 1.
  2. The Smooth Part: The second term involves the gradient of the smooth function fif_i, which can be evaluated away from the boundaries using standard techniques.
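To see why the delta term demands its own estimator, consider a one-dimensional toy integral (my construction, not the paper's): $I(p) = \int_0^1 \theta(p - x) f(x)\,dx = \int_0^p f(x)\,dx$, whose derivative $dI/dp = f(p)$ comes entirely from the boundary term:

```python
import numpy as np

f = lambda x: np.cos(x)  # a smooth integrand with no direct p-dependence

def grad_area_sampling(p: float, n: int = 100_000, seed: int = 0) -> float:
    """Differentiating the samples of theta(p - x) gives zero at every sampled
    x (almost surely), so area sampling estimates the gradient as zero."""
    x = np.random.default_rng(seed).uniform(0.0, 1.0, n)
    return float(np.mean(np.zeros_like(x)))

def grad_edge_sampling(p: float) -> float:
    """Edge sampling evaluates the jump of the integrand across the single
    discontinuity at x = p; here that jump is f(p) - 0."""
    return float(f(p))

p = 0.7
print(grad_area_sampling(p))  # 0.0 -- misses the Dirac delta entirely
print(grad_edge_sampling(p))  # cos(0.7) ~ 0.765, the true dI/dp
```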

Two-Stage Sampling Strategy

The method employs complementary sampling approaches:

  1. Area Sampling: Standard Monte Carlo sampling for smooth parts of the integrand using automatic differentiation.
  2. Edge Sampling: Novel technique that explicitly samples points on triangle edges where discontinuities occur. For each edge point, the method computes function value differences on both sides of the edge.

Secondary Visibility and 3D Edges

The framework extends to handle shadows and global illumination through 3D edge sampling. Just as in the 2D primary-visibility case, 3D edges introduce step functions into the shading equations. The method generalizes the derivation to 3D with crucial modifications, including area-correction terms for projecting scene surface elements onto the infinitesimal width of an edge.

Efficient Importance Sampling

For practical implementation, the method develops sophisticated importance sampling strategies:

  • Edge Selection: Only silhouette edges contribute non-zero gradients. Edges are selected by projecting triangle meshes to screen space and clipping against the camera frustum.

  • Hierarchical Structures: Two hierarchies are constructed—one for hard-edged geometry and a 6D bounding volume hierarchy for smooth-shaded geometry using endpoint positions and normals.

  • Traversal Strategy: Edge importance is computed based on contribution bounds, with special focus on edges blocking light sources.

Validation and Accuracy

Experimental validation demonstrates the method’s effectiveness across various synthetic scenes and inverse rendering tasks. Comparisons with finite differences show derivative accuracy within 1% relative error, confirming mathematical correctness. The method correctly handles cases where approximate renderers fail, such as spatially-varying illumination where brightness changes are due to lighting rather than geometry—approximate methods incorrectly interpret illumination variation as geometric gradients, while this approach correctly outputs zero derivatives in continuous regions.
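The finite-difference comparison is straightforward to replicate in principle; `render_loss` below is an assumed scalar-valued function of the scene parameters, standing in for a full rendering pipeline:

```python
import numpy as np

def finite_diff_gradient(render_loss, theta: np.ndarray, h: float = 1e-4):
    """Central finite differences over a flat parameter vector theta: the
    reference against which analytic derivatives can be validated."""
    grad = np.zeros_like(theta)
    for i in range(theta.size):
        e = np.zeros_like(theta)
        e[i] = h
        grad[i] = (render_loss(theta + e) - render_loss(theta - e)) / (2 * h)
    return grad
```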

Conclusion

The evolution from OpenDR to Differentiable Monte Carlo Ray Tracing demonstrates clear progression in differentiable rendering approaches. We have moved from clever gradient approximations to fundamental reformulation of rendering processes.

| Method | Key Approach | Strengths | Limitations |
| --- | --- | --- | --- |
| OpenDR / NMR | Approximate gradients via image-space filtering and geometric rules | Fast, practical, leverages existing hardware | Inexact gradients; struggles with complex lighting and occlusions; fixed topology (NMR) |
| Soft Rasterizer | Reformulate the forward pass as a probabilistic process | Truly differentiable; graceful occlusion handling; smoothed loss landscapes | Non-photorealistic renders; hyperparameter dependence ($\sigma$, $\gamma$) |
| Diff. Ray Tracing | Edge sampling for discontinuities, automatic differentiation for smooth regions | Physically accurate, unbiased gradients; handles arbitrary scene parameters and complex light transport | Computationally expensive (seconds to minutes for 256×256 images); assumes static scenes without participating media |

The quest for differentiable rendering represents a critical enabler for next-generation 3D-aware AI systems. In the next part of this series, we will explore how these foundational ideas enabled current state-of-the-art approaches, including neural implicit representations like NeRF and real-time capabilities of 3D Gaussian Splatting.


References

[1] Loper, M. M., & Black, M. J. (2014). OpenDR: An Approximate Differentiable Renderer. In European Conference on Computer Vision (ECCV).

[2] Kato, H., Ushiku, Y., & Harada, T. (2018). Neural 3D Mesh Renderer. In Conference on Computer Vision and Pattern Recognition (CVPR).

[3] Li, T. M., Aittala, M., Durand, F., & Lehtinen, J. (2018). Differentiable Monte Carlo Ray Tracing through Edge Sampling. ACM Transactions on Graphics (TOG), 37(6), 1-11.

[4] Liu, S., Li, T., Chen, W., & Li, H. (2019). Soft Rasterizer: A Differentiable Renderer for Image-based 3D Reasoning. In International Conference on Computer Vision (ICCV).