Visual Fusion And Realtime Gaussian Splatting Study
Thesis
Mimir’s visual side should not become a CPU image-processing application. The camera rig exists to produce time-indexed visual evidence that Fensalir can fuse on the GPU into a live volumetric program surface.
The direct-driver work matters because sensor fusion is brutally sensitive to:
- timestamp error;
- rolling-shutter latency;
- frame drops;
- calibration drift;
- duplicated CPU copies;
- inconsistent exposure/format state.
Current Sensor Roles
- Leap stereo IR: near-field timing/depth/tracking candidate.
- PS3 Eyes: high-rate feature/tracking witnesses.
- Kiyo: stable RGB context.
- Kiyo Pro: intended RGB ground truth, currently limited in high-speed mode and frame cadence.
- Raven-side devices: remote evidence producers, not local compute authority.
Evidence Layers
Raw Frame
Minimum:
- source id;
- device/arrival timestamp;
- dimensions/format/stride;
- native handle or byte span.
Feature Frame
Possible GPU outputs:
- corners/keypoints;
- descriptors;
- optical flow;
- silhouettes/masks;
- depth/disparity hints;
- confidence per feature.
Scene Claims
Rolling reservoir views:
- scene rays;
- surface claims;
- material claims;
- render packets.
Program Surface
Fensalir output:
- Gaussian splat/point cloud render;
- avatar/AR overlay;
- Spout2 program video.
Realtime Gaussian Splatting Relevance
Gaussian splatting is attractive because it represents scenes as many local anisotropic primitives that can be rasterized quickly. For Mimir, the hard part is not rendering a static splat set. The hard part is maintaining a dynamic time-indexed splat field from live unsynchronized cameras.
Mimir needs:
- synchronized observations;
- camera calibration;
- dynamic object handling;
- confidence and decay;
- GPU-resident update path;
- output that can be inspected/debugged live.
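One way to read "confidence and decay" is as an exponential half-life applied to a claim's confidence as its newest supporting observation ages. The following is a minimal sketch under that assumption. `DecayedConfidence`, `ShouldEvict`, and the `halfLifeNs` tuning parameter are hypothetical names, not part of Mimir or Fensalir, and a live path would evaluate the same arithmetic on the GPU.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Hypothetical helper: scales a claim's confidence by the age of its newest
// supporting observation. halfLifeNs is an assumed tuning parameter.
inline float DecayedConfidence(float confidence,
                               uint64_t sourceTimeMaxNs,
                               uint64_t nowNs,
                               uint64_t halfLifeNs)
{
    assert(halfLifeNs > 0);
    if (nowNs <= sourceTimeMaxNs)
        return confidence;  // observation is current; apply no decay

    const double ageNs = static_cast<double>(nowNs - sourceTimeMaxNs);
    const double halfLives = ageNs / static_cast<double>(halfLifeNs);
    // Confidence halves once per elapsed half-life.
    return confidence * static_cast<float>(std::exp2(-halfLives));
}

// Hypothetical eviction rule: drop claims whose decayed confidence falls
// below an assumed floor.
inline bool ShouldEvict(float decayedConfidence, float floor)
{
    return decayedConfidence < floor;
}
```

A half-life keeps the rule to a single tunable per claim class. Fast-moving regions could use a short half-life and static background a long one, though that split is an assumption rather than something the rig has validated.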
The low-level implementation references all point at the same renderer spine:
project Gaussians, bin/intersect them into tiles, sort or partially order for
view-consistent blending, then rasterize tiles with tightly packed GPU buffers.
gsplat, NVIDIA’s Vulkan sample, StopThePop, and FlashGS are useful because
they expose performance levers Mimir/Fensalir can reuse without pretending the
live rig is an offline training pipeline.
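The bin and sort stages of that spine can be sketched as a CPU reference. This is only meant to make the data flow concrete: it is not the API of gsplat or any of the other projects named above, and the live path would run the same steps in GPU kernels. `ProjectedSplat`, `TileEntry`, and `BinSplats` are hypothetical names, and the splats are assumed to be already projected to pixel space with a conservative radius.

```cpp
#include <algorithm>
#include <cassert>
#include <cmath>
#include <cstdint>
#include <vector>

// Hypothetical screen-space splat after projection.
struct ProjectedSplat
{
    float cx, cy;   // pixel-space center
    float radius;   // conservative pixel radius
    float depth;    // view-space depth, smaller is nearer
};

// One (tile, splat) intersection record.
struct TileEntry
{
    uint32_t tile;
    float depth;
    uint32_t splat;
};

// CPU reference of the bin + sort stages; returns entries ordered by tile,
// then front-to-back within each tile.
inline std::vector<TileEntry> BinSplats(const std::vector<ProjectedSplat>& splats,
                                        uint32_t width, uint32_t height,
                                        uint32_t tileSize)
{
    assert(width > 0 && height > 0 && tileSize > 0);
    const int tilesX = static_cast<int>((width + tileSize - 1) / tileSize);
    const int tilesY = static_cast<int>((height + tileSize - 1) / tileSize);
    const float ts = static_cast<float>(tileSize);

    std::vector<TileEntry> entries;
    for (uint32_t i = 0; i < splats.size(); ++i)
    {
        const ProjectedSplat& s = splats[i];
        // Skip splats whose bounding square misses the viewport entirely.
        if (s.cx + s.radius < 0.0f || s.cy + s.radius < 0.0f ||
            s.cx - s.radius >= static_cast<float>(width) ||
            s.cy - s.radius >= static_cast<float>(height))
            continue;

        // Clamp the covered tile range to the tile grid.
        const int x0 = std::max(0, static_cast<int>(std::floor((s.cx - s.radius) / ts)));
        const int y0 = std::max(0, static_cast<int>(std::floor((s.cy - s.radius) / ts)));
        const int x1 = std::min(tilesX - 1, static_cast<int>(std::floor((s.cx + s.radius) / ts)));
        const int y1 = std::min(tilesY - 1, static_cast<int>(std::floor((s.cy + s.radius) / ts)));

        for (int ty = y0; ty <= y1; ++ty)
            for (int tx = x0; tx <= x1; ++tx)
                entries.push_back({static_cast<uint32_t>(ty * tilesX + tx), s.depth, i});
    }

    std::sort(entries.begin(), entries.end(),
              [](const TileEntry& a, const TileEntry& b)
              {
                  if (a.tile != b.tile) return a.tile < b.tile;
                  return a.depth < b.depth;
              });
    return entries;
}
```

The entry count grows with both splat radius and the inverse of tile size, which is why tile size shows up as a first-order performance parameter.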
Useful borrowed ideas:
- packed structure-of-arrays splat buffers;
- tile-size as a first-order performance parameter;
- radius/opacity culling before sort/raster;
- global versus local/tile order tradeoffs;
- asynchronous CPU/GPU sorting only if it does not add frame latency;
- feature/depth render modes as diagnostics, not only RGB output.
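Two of these ideas, structure-of-arrays buffers and culling before sort/raster, combine naturally. The sketch below is a CPU illustration under assumed names (`SplatBufferSoA`, `CullSplats`) and assumed thresholds; it is not taken from any of the referenced renderers, and the real pass would be a GPU compaction kernel over the same layout.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical structure-of-arrays layout: one tightly packed array per
// field, so a culling pass touches only the fields it reads.
struct SplatBufferSoA
{
    std::vector<float> radiusPx;  // projected radius in pixels
    std::vector<float> opacity;   // 0..1
    std::vector<float> depth;     // view-space depth; <= 0 is behind the camera
};

// Returns the indices of splats worth sorting and rasterizing.
// minRadiusPx and minOpacity are assumed tuning thresholds.
inline std::vector<uint32_t> CullSplats(const SplatBufferSoA& buf,
                                        float minRadiusPx,
                                        float minOpacity)
{
    assert(buf.radiusPx.size() == buf.opacity.size());
    assert(buf.radiusPx.size() == buf.depth.size());

    std::vector<uint32_t> survivors;
    survivors.reserve(buf.radiusPx.size());
    for (std::size_t i = 0; i < buf.radiusPx.size(); ++i)
    {
        const bool inFront = buf.depth[i] > 0.0f;
        const bool bigEnough = buf.radiusPx[i] >= minRadiusPx;
        const bool opaqueEnough = buf.opacity[i] >= minOpacity;
        if (inFront && bigEnough && opaqueEnough)
            survivors.push_back(static_cast<uint32_t>(i));
    }
    return survivors;
}
```

Only the survivor index list feeds the sort, so every splat removed here also removes its tile intersections downstream.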
4D / Dynamic Splatting Shape
flowchart TD
    Frames["time-aligned camera frames"] --> Features["GPU features / masks / flow"]
    Features --> Tracks["temporal tracks"]
    Tracks --> Claims["surface/material claims"]
    Claims --> Splats["dynamic Gaussian / point primitives"]
    Audio["audio source estimates"] --> Claims
    Splats --> Render["Fensalir program render"]
Data Model Sketch
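#include <cstdint>

// One captured frame as indexed by the runtime/reservoir.
struct MimirVisualObservation
{
    uint64_t sourceHash;    // stable id of the producing source
    uint64_t timestampNs;   // device or arrival timestamp, in nanoseconds
    uint32_t width;
    uint32_t height;
    uint32_t format;        // pixel-format code
    uint64_t nativeHandle;  // native handle or byte-span reference
};

// One fused claim about a splat, valid over a source-time interval.
struct MimirSplatClaim
{
    uint64_t stableKeyHash;
    float position[3];
    float covariance[6];    // assumed packing: upper triangle of a symmetric 3x3
    float color[4];
    float confidence;
    uint64_t sourceTimeMinNs;
    uint64_t sourceTimeMaxNs;
};
Immediate Architecture Cut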
Do not start with “implement 4DGS” (4D Gaussian splatting). Start with ownership:
- direct camera driver pushes frame handles;
- runtime/reservoir indexes them;
- Fensalir reads current synchronized window;
- Fensalir produces simple point/render packet claims;
- only then expand to dynamic splat update.
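The "current synchronized window" step can be sketched as a nearest-timestamp selection per source. This is a minimal illustration, assuming the reservoir exposes each source's recent observations as a flat list and that a single tolerance applies to all sources. `NearestObservation`, `SelectSynchronizedWindow`, and `toleranceNs` are hypothetical names; the struct repeats the fields from the data model sketch earlier in this document.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Same fields as the data model sketch in this document.
struct MimirVisualObservation
{
    uint64_t sourceHash;
    uint64_t timestampNs;
    uint32_t width;
    uint32_t height;
    uint32_t format;
    uint64_t nativeHandle;
};

// Returns the index of the observation nearest targetNs, or -1 if none lies
// within toleranceNs. Only timestamps are read; frame data is never touched.
inline int NearestObservation(const std::vector<MimirVisualObservation>& frames,
                              uint64_t targetNs,
                              uint64_t toleranceNs)
{
    int best = -1;
    uint64_t bestDelta = toleranceNs;
    for (std::size_t i = 0; i < frames.size(); ++i)
    {
        const uint64_t t = frames[i].timestampNs;
        const uint64_t delta = (t > targetNs) ? (t - targetNs) : (targetNs - t);
        if (delta <= bestDelta)
        {
            best = static_cast<int>(i);
            bestDelta = delta;
        }
    }
    return best;
}

// For each source, picks the frame nearest targetNs. A -1 entry marks a
// source with no usable frame, so the consumer can lower confidence
// instead of stalling on it.
inline std::vector<int> SelectSynchronizedWindow(
    const std::vector<std::vector<MimirVisualObservation>>& perSource,
    uint64_t targetNs,
    uint64_t toleranceNs)
{
    std::vector<int> picks;
    picks.reserve(perSource.size());
    for (std::size_t s = 0; s < perSource.size(); ++s)
        picks.push_back(NearestObservation(perSource[s], targetNs, toleranceNs));
    return picks;
}
```

Returning indices rather than copies keeps the window a view over handles the driver already pushed, which matches the ownership order above.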
Micro-Optimization Risks
- CPU demosaic/format conversion before GPU upload.
- Multiple copies of compressed camera frames.
- Re-decoding frame data per feature stage.
- Per-frame heap allocations in capture workers.
- Blocking on GPU readback for UI/debug.
- Sorting every splat globally every frame before the field is even stable.
- Treating 4DGS optimization loops as live capture prerequisites.
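As an illustration of avoiding per-frame heap allocations in capture workers, the sketch below allocates all frame storage once and hands out slots in ring order. `FramePool` is a hypothetical class, not an existing Mimir type. It assumes fixed-size frames, and it omits the in-use tracking a real capture path would need so that a slot still being uploaded is not overwritten.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical fixed-capacity frame pool: one allocation at startup, none
// per frame. Slots are reused in ring order, oldest first.
class FramePool
{
public:
    FramePool(std::size_t slotCount, std::size_t bytesPerSlot)
        : storage_(slotCount * bytesPerSlot),
          slotCount_(slotCount),
          bytesPerSlot_(bytesPerSlot),
          next_(0)
    {
        assert(slotCount > 0 && bytesPerSlot > 0);
    }

    // Returns the next slot; after slotCount calls the oldest slot is reused.
    std::uint8_t* Acquire()
    {
        std::uint8_t* slot = storage_.data() + next_ * bytesPerSlot_;
        next_ = (next_ + 1) % slotCount_;
        return slot;
    }

    std::size_t SlotBytes() const { return bytesPerSlot_; }

private:
    std::vector<std::uint8_t> storage_;  // single backing allocation
    std::size_t slotCount_;
    std::size_t bytesPerSlot_;
    std::size_t next_;
};
```

Where the driver can deliver native handles directly, a pool like this is unnecessary. It only matters for paths that must stage bytes on the CPU before upload.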
Research Threads To Continue
- 3D Gaussian Splatting reference CUDA rasterizer.
- gsplat tile/raster APIs and CUDA memory layout.
- NVIDIA vk_gaussian_splatting mesh-shader versus vertex-shader paths.
- StopThePop / sorted Gaussian splatting for view-consistency tradeoffs.
- FlashGS-style kernel-level optimization and redundant-sort reduction.
- Dynamic/4D Gaussian splatting update strategies.
- PlayCanvas/WebGPU splat processing as a practical open implementation of GPU-side splat mutation and browser-scale rendering constraints.
- Gaussian-SLAM and multi-sensor calibration papers as shape references for online pose/calibration drift, even though Mimir’s rig is room-bound rather than vehicle-mounted.
- Online multi-camera calibration and temporal alignment.
- GPU optical flow/features for live camera arrays.
- Sensor-fusion priors from audio localization.
