Why this guide?
Even state-of-the-art methods (Gaussian Splatting, NeRF) are only as good as your poses and geometry. That means: disciplined capture and sensible COLMAP[1] settings. Below is a practical, professional workflow that consistently yields sharper point clouds, stable cameras, and faster training.
1) Capture fundamentals (what actually matters)
Texture & features
SfM needs repeatable features. Add texture to bland surfaces.
- Add anchors: posters, checkerboards, ArUco/AprilTags, textured fabric, books.
- Avoid large glossy areas; break them up with matte artifacts or cross-polarization (if available).
- Ensure every major surface appears in ≥ 3 views with good parallax.
Lighting & exposure
Aim for soft, uniform light.
- Prefer diffuse (cloudy daylight; bounced/softbox). Avoid harsh spotlights and specular glare.
- Lock exposure & white balance; disable auto-ISO/HDR.
- Shoot RAW (or minimally compressed) whenever possible.
Focus & optics
- Manual focus, fixed for the entire sequence.
- Disable aggressive EIS/IBIS modes that warp geometry.
- If using wide/fisheye lenses, note the camera model choice in COLMAP (see below).
Motion planning & overlap
- Move the camera center—don’t just rotate in place.
- Overlap per view: 50–70%; keep baseline/parallax modest but non-zero.
- Typical paths: ring + elevated ring, figure-8, orbital arcs with a few top-down passes.
Exposure triangle (typical targets)
- ISO: 64–400 (lower is better).
- Shutter: ≤ 1/100 s handheld; faster for moving subjects.
- Aperture: f/5.6–f/8 for sharpness (if available).
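To sanity-check a shutter speed against your shooting pace, the motion blur in the image is roughly focal_px × speed × shutter / distance for a translating camera. A quick sketch of that estimate (the speed, distance, and focal values below are illustrative assumptions, not measurements):

```python
def motion_blur_px(focal_px: float, speed_mps: float,
                   shutter_s: float, distance_m: float) -> float:
    """Approximate image-space blur for a translating camera.

    Small-motion approximation: a scene point shifts by
    focal_px * (speed * shutter) / distance pixels during the exposure.
    """
    return focal_px * speed_mps * shutter_s / distance_m

# Slow walking pace (~0.2 m/s), 1/100 s shutter, subject 2 m away,
# ~3000 px focal length (roughly a 4K phone camera):
print(motion_blur_px(3000, 0.2, 1 / 100, 2.0))  # 3.0 px -> borderline
```

Anything much above a pixel or two of blur degrades feature matching, so either slow down, raise the shutter speed, or move further from the subject.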
Quick pre-shoot checklist
- 4K+ resolution, RAW if possible.
- Lock: WB, exposure, ISO, focus.
- Add texture anchors where needed.
- Plan a path ensuring parallax + coverage.
2) Preparing media (stills & video)
Stills
- Keep burst rates sensible; avoid dozens of near-duplicates.
Video
- Extract frames at 1–5 FPS depending on motion speed.
- Remove motion-blurred or heavily redundant frames.
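A common way to drop motion-blurred frames automatically is the variance-of-Laplacian sharpness score: blurred frames score far below their neighbors. A minimal NumPy sketch (the relative threshold is an assumption to tune per capture, and frames are assumed to be grayscale arrays loaded by whatever image library you use):

```python
import numpy as np

def sharpness(gray: np.ndarray) -> float:
    """Variance of a 4-neighbor Laplacian; higher means sharper."""
    g = gray.astype(np.float64)
    lap = (-4 * g[1:-1, 1:-1]
           + g[:-2, 1:-1] + g[2:, 1:-1]
           + g[1:-1, :-2] + g[1:-1, 2:])
    return float(lap.var())

def keep_sharp(frames, rel_threshold: float = 0.5):
    """Keep frames scoring at least rel_threshold * median sharpness."""
    scores = [sharpness(f) for f in frames]
    cutoff = rel_threshold * float(np.median(scores))
    return [f for f, s in zip(frames, scores) if s >= cutoff]
```

Filtering relative to the median, rather than an absolute cutoff, keeps the rule stable across scenes with different texture levels.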
3) COLMAP: settings that move the needle
Below are the practical switches that most affect robustness and accuracy. Start with the Balanced preset, then adapt for low texture, videos, or wide lenses.
Image import (camera model & shared intrinsics)
If your capture used fixed zoom/focus, share the same camera across images:
- `--ImageReader.single_camera 1` (enforces identical intrinsics)
- Camera model:
  - SIMPLE_RADIAL (default) for normal lenses.
  - OPENCV for wide lenses with tangential distortion.
  - OPENCV_FISHEYE for fisheye action cams.

Example (project .ini or CLI flags used by multiple commands):

```
ImageReader.camera_model = OPENCV
ImageReader.single_camera = 1
# Optional: if EXIF is missing and you know the focal in pixels:
# ImageReader.camera_params = fx,fy,cx,cy,k1,k2,p1,p2,k3
```
A) Feature extraction (`feature_extractor`)
Key controls (SIFT):
- `--SiftExtraction.max_image_size`: raise to retain more resolution and extract more features (GPU/RAM permitting).
- `--SiftExtraction.max_num_features`: cap on per-image features.
- `--SiftExtraction.peak_threshold`: lower → more keypoints (helps low texture).
- `--SiftExtraction.use_gpu 1`: use the GPU extractor.
Balanced (good light, normal texture)
```
colmap feature_extractor \
  --database_path database.db \
  --image_path images \
  --ImageReader.single_camera 1 \
  --SiftExtraction.use_gpu 1 \
  --SiftExtraction.max_image_size 3200 \
  --SiftExtraction.max_num_features 12000 \
  --SiftExtraction.peak_threshold 0.004
```
Low-texture / matte walls
```
colmap feature_extractor \
  --database_path database.db \
  --image_path images \
  --ImageReader.single_camera 1 \
  --SiftExtraction.use_gpu 1 \
  --SiftExtraction.max_image_size 4096 \
  --SiftExtraction.max_num_features 16000 \
  --SiftExtraction.peak_threshold 0.0035 \
  --SiftExtraction.edge_threshold 10
```
Tip: if keypoints cluster only at edges, reduce `peak_threshold` a bit; if you get many spurious matches, increase it slightly.
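To check whether extraction actually produced enough keypoints per image, you can query COLMAP's SQLite `database.db` directly: in the current schema, the `keypoints` table stores one row per image, with the keypoint count in its `rows` column. A minimal sketch (paths and the 2000-keypoint floor are illustrative assumptions):

```python
import sqlite3

def keypoint_counts(database_path: str) -> dict:
    """Map image name -> number of extracted keypoints."""
    con = sqlite3.connect(database_path)
    try:
        rows = con.execute(
            "SELECT images.name, keypoints.rows "
            "FROM images JOIN keypoints USING (image_id)"
        ).fetchall()
    finally:
        con.close()
    return dict(rows)

def too_few(database_path: str, minimum: int = 2000) -> list:
    """Images that may need a lower peak_threshold (or more texture)."""
    return [name for name, count in keypoint_counts(database_path).items()
            if count < minimum]
```

Images that show up here repeatedly are usually the bland, glossy, or blurred ones from the capture checklist above.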
B) Matching (pair selection + SIFT matching)
Pick a matcher:
- Exhaustive (`exhaustive_matcher`) — best for < 500–800 images.
- Sequential (`sequential_matcher`) — best for video/ordered frames (with loop detection).
- Vocab tree (`vocab_tree_matcher`) — best for thousands of images.
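The crossover between matchers is driven by pair count: exhaustive matching compares all n(n−1)/2 image pairs, while sequential matching compares each frame only with its next `overlap` neighbors, roughly n × overlap pairs. A quick sketch of that arithmetic (the example image counts are just for illustration):

```python
def exhaustive_pairs(n: int) -> int:
    """All unordered image pairs: n * (n - 1) / 2."""
    return n * (n - 1) // 2

def sequential_pairs(n: int, overlap: int) -> int:
    """Each frame matched against its next `overlap` frames."""
    return sum(min(overlap, n - 1 - i) for i in range(n))

print(exhaustive_pairs(800))      # 319600 pairs -> still feasible
print(sequential_pairs(5000, 5))  # 24985 pairs -> video scale
```

Quadratic growth is why exhaustive matching stops being practical somewhere near the 500–800 image mark, and why ordered video frames should use the sequential matcher instead.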
Enable GPU matching and guided verification:
```
# Exhaustive
colmap exhaustive_matcher \
  --database_path database.db \
  --SiftMatching.use_gpu 1 \
  --SiftMatching.guided_matching 1
```
Sequential (video) with loop closure
```
colmap sequential_matcher \
  --database_path database.db \
  --SiftMatching.use_gpu 1 \
  --SiftMatching.guided_matching 1 \
  --SequentialMatching.overlap 5 \
  --SequentialMatching.loop_detection 1 \
  --SequentialMatching.loop_detection_num_images 50 \
  --SequentialMatching.loop_detection_period 10
```
Increase `overlap` if you sparsified video frames aggressively.
C) Incremental SfM (`mapper`)
Controls that affect stability/scale drift:
- Initialization & inliers:
  - `--Mapper.init_min_num_inliers` (e.g., 200–300 for dense captures)
  - `--Mapper.abs_pose_min_num_inliers` (e.g., 30–60)
- Triangulation & filtering:
  - `--Mapper.tri_min_angle 1.0` (larger → more robust, fewer points)
  - `--Mapper.filter_max_reproj_error 4.0` (tighter → cleaner points)
- Bundle adjustment (intrinsics refinement):
  - If you locked focus/zoom:
    - `--Mapper.ba_refine_focal_length 0`
    - `--Mapper.ba_refine_principal_point 0`
    - `--Mapper.ba_refine_extra_params 0`
  - If you did not lock, or EXIF is unreliable: set the above to 1.
Balanced mapper example
```
mkdir -p sparse
colmap mapper \
  --database_path database.db \
  --image_path images \
  --output_path sparse \
  --Mapper.init_min_num_inliers 200 \
  --Mapper.abs_pose_min_num_inliers 40 \
  --Mapper.tri_min_angle 1.0 \
  --Mapper.filter_max_reproj_error 4.0 \
  --Mapper.ba_refine_focal_length 0 \
  --Mapper.ba_refine_principal_point 0 \
  --Mapper.ba_refine_extra_params 0
```
If registration stalls: run `exhaustive_matcher` again, lower `peak_threshold` a touch, and retry `mapper`.
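To tell quickly whether mapping stalled, compare the number of registered images against your input set. In an exported `images.txt`, each registered image occupies two data lines (the pose line, then its 2D points), after `#` comment lines at the top. A minimal parser sketch (assumes every registered image has a non-empty points line):

```python
def count_registered(images_txt: str) -> int:
    """Count registered images in a COLMAP images.txt export."""
    data_lines = [ln for ln in images_txt.splitlines()
                  if ln.strip() and not ln.startswith("#")]
    # Two lines per image:
    #   IMAGE_ID QW QX QY QZ TX TY TZ CAMERA_ID NAME
    #   X1 Y1 POINT3D_ID1 X2 Y2 POINT3D_ID2 ...
    return len(data_lines) // 2
```

If this count is well below the number of input images, revisit matching before blaming the mapper.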
D) Dense stage (undistort → MVS → fuse)
```
mkdir -p dense

# Undistort (export for MVS)
colmap image_undistorter \
  --image_path images \
  --input_path sparse/0 \
  --output_path dense \
  --output_type COLMAP \
  --max_image_size 2000

# PatchMatch stereo (robust depth with geometric consistency)
colmap patch_match_stereo \
  --workspace_path dense \
  --workspace_format COLMAP \
  --PatchMatchStereo.geom_consistency true

# Depth fusion → point cloud
colmap stereo_fusion \
  --workspace_path dense \
  --workspace_format COLMAP \
  --input_type geometric \
  --output_path dense/fused.ply
```
For very detailed scenes and ample VRAM, keep `max_image_size` higher in both extraction and undistortion.
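To sanity-check the fusion output without opening a viewer, you can read the vertex count straight from the PLY header, since both ASCII and binary PLY files start with a text header containing an `element vertex N` line. A small sketch (the file path is a placeholder):

```python
def ply_vertex_count(path: str) -> int:
    """Parse 'element vertex N' from a PLY header (ASCII or binary body)."""
    with open(path, "rb") as f:
        for raw in f:
            line = raw.decode("ascii", errors="replace").strip()
            if line.startswith("element vertex"):
                return int(line.split()[-1])
            if line == "end_header":
                break
    raise ValueError("no vertex element found in PLY header")

# e.g. ply_vertex_count("dense/fused.ply")
```

A suspiciously low count usually means depth maps failed geometric consistency, often from weak texture or too little parallax in the capture.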
E) Post: orientation & scale for NeRF/GS
For reproducible training and camera paths:
- Orient to gravity / Manhattan:

```
colmap model_orientation_aligner \
  --input_path sparse/0 \
  --output_path sparse_aligned \
  --method MANHATTAN
```

- Align to known axes / GPS / control points (optional):

```
colmap model_aligner \
  --input_path sparse/0 \
  --output_path sparse_aligned \
  --ref_images_path ref.txt
```
Export `cameras.txt` / `images.txt` / `points3D.txt` or `fused.ply` for downstream pipelines.
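Most NeRF/GS pipelines want camera-to-world matrices, while `images.txt` stores the world-to-camera pose as a quaternion (QW QX QY QZ) plus translation. A conversion sketch following COLMAP's documented convention (it does not apply any axis flips your target pipeline may additionally require):

```python
import numpy as np

def qvec_to_rotmat(qw: float, qx: float, qy: float, qz: float) -> np.ndarray:
    """Unit quaternion (COLMAP order: QW QX QY QZ) -> 3x3 rotation matrix."""
    return np.array([
        [1 - 2*qy*qy - 2*qz*qz, 2*qx*qy - 2*qz*qw,     2*qx*qz + 2*qy*qw],
        [2*qx*qy + 2*qz*qw,     1 - 2*qx*qx - 2*qz*qz, 2*qy*qz - 2*qx*qw],
        [2*qx*qz - 2*qy*qw,     2*qy*qz + 2*qx*qw,     1 - 2*qx*qx - 2*qy*qy],
    ])

def camera_to_world(qw, qx, qy, qz, tx, ty, tz) -> np.ndarray:
    """Invert COLMAP's world-to-camera pose [R|t] into a 4x4 c2w matrix."""
    R = qvec_to_rotmat(qw, qx, qy, qz)
    t = np.array([tx, ty, tz])
    c2w = np.eye(4)
    c2w[:3, :3] = R.T
    c2w[:3, 3] = -R.T @ t  # camera center in world coordinates
    return c2w
```

The translation column of the result is the camera center, which is also what you plot when checking that your capture path came out as planned.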
4) One-page field checklist (print me)
- Before: RAW on, WB locked, exposure locked, ISO ≤ 400, manual focus, EIS/HDR off, add anchors.
- Path: ring + elevated ring + a few top-downs; 50–70% overlap; steady pace.
- Stills: no heavy bursts. Video: extract 1–5 FPS frames.
- COLMAP: extractor 3200–4096 px; 12–16k feats; peak 0.0035–0.004; GPU on; exhaustive or sequential+loops; mapper with tight reproj (3–4 px), refine intrinsics off if you truly locked optics.
- Dense: undistort 2000–3000 px; `geom_consistency true`; fuse; align orientation.
Bottom line: lock your camera, add texture, plan motion for parallax, and use the tuned COLMAP switches above. Your Gaussian Splatting/NeRF training will converge faster, with crisper detail and far fewer headaches.
Note: To use this guide in your business, contact us. Copyright Kali Ink.
References
[1] Schönberger, J. L., & Frahm, J.-M. (2016). Structure-from-Motion revisited. In Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition. IEEE