Low-Level Implementation Notes
Purpose
This file collects the implementation details that are easy to lose when the conversation stays at the architecture level. None of these notes are promises; they are candidates to benchmark.
Audio Memory Layout
Preferred Analysis Layout
For decoder kernels:
channel0 samples: [s0 s1 s2 ...]
channel1 samples: [s0 s1 s2 ...]
...Separate channel spans make per-channel scoring straightforward and avoid striding through interleaved audio during analysis. Interleaving is useful for device I/O and some DSP kernels; it is not automatically the right analysis shape.
Preferred Scoring Layout
candidateStartSamples[]
candidateEnergy[]
symbolReliability[]
binFrequencies[]
binSin[]
binCos[]Dense arrays let SIMD/native/GPU kernels process candidates without object chasing. The codebook trellis can stay managed/CPU because it is small and branchy.
SIMD Chirp-Bin Scoring
Likely CPU kernel shape:
- load candidate window samples in 8-float AVX2 chunks;
- multiply by dechirp oscillator/window;
- accumulate bin real/imag sums;
- compute magnitude or normalized energy;
- write top K or full dense bin vector.
Important:
- keep oscillator tables aligned;
- keep candidate windows contiguous;
- batch multiple candidates per interop call;
- avoid denormals;
- normalize outside the deepest loop where possible.
FFT / Goertzel Decision
Use Goertzel/fixed-bin DFT when:
- symbol count is small;
- bins are known;
- candidate count is modest;
- native dispatch overhead matters.
Use FFT/CZT when:
- many bins are needed;
- frequency offset interpolation dominates;
- candidate batches are large enough;
- GPU/native FFT plans can be reused.
Do not choose FFT because it sounds more canonical. Canonical is not a profiler.
GPU Chirp Scoring
The GPU scoring pass only makes sense if data is already resident or batches are large enough to pay dispatch overhead.
Good GPU candidates:
- many mic streams;
- many remote receiver windows;
- wide candidate sets during acquisition;
- offline/artifact calibration sweeps.
Bad GPU candidates:
- one or two channels;
- tiny candidate counts;
- paths that require CPU readback before codebook solving every event.
ASIO Capture Hot Path
Current observed risks:
- per-block
std::vector<float>resize; std::dequequeue;- native-to-managed polling copies into
float[]; - managed byte array allocation for sample payload;
ConcurrentQueue.Countpressure.
Preferred cut:
- native SPSC ring;
- fixed block pool;
- monotonic frame counters;
- block descriptors with handle/index;
- C# metadata only until DSP needs samples.
Camera Capture Hot Path
Current probe strengths:
- KS async queueing is already close to the driver;
- PS3 Eye raw path proves high fps;
- JSON frame events make cadence visible.
Current probe risks:
- JSON must not become production payload;
- CPU frame buffers are diagnostic until Fensalir import/upload path exists;
- one-off command-line knobs must become configuration/stateful driver policy.
Preferred cut:
- one native capture worker per camera family;
- fixed frame ring;
- descriptor ABI;
- optional shared GPU texture handle;
- Fensalir owns texture import and GPU staging.
Fensalir GPU Fusion
Fensalir contracts already distinguish:
GpuSensorFramefor calibrated camera inputs and external textures;AcousticFieldFramefor timing/room constraints;CalibrationEventFramefor high-confidence event anchors;GpuFusionFieldfor point buffers/seeds;TemporalGaussianFieldfor stable gaussian observations.
Mimir should produce these rows. It should not pack shader structs directly.
Gaussian Splat Rendering Levers
From modern 3DGS implementations:
- tile size matters;
- sorting strategy matters;
- cull before sort;
- packed buffers matter;
- anti-aliasing and opacity/radius thresholds matter;
- view-consistency fixes cost real time;
- depth/feature render modes are useful for diagnostics.
Mimir-specific cut:
- first update live claims;
- then render simple claims;
- only later optimize dynamic radiance fields.
Benchmark Counters To Add
Decoder:
- samples consumed;
- proposals created;
- candidates scored;
- anchors accepted/rejected;
- elapsed scoring time;
- allocations per analysis tick.
ASIO:
- callback count;
- block count;
- queue/ring high-water mark;
- dropped blocks;
- conversion time if measurable.
Camera:
- frame interval p50/p95/p99;
- async queue depth;
- dropped/late frames;
- payload copy time;
- upload/import time.
Fensalir:
- contract lowering time;
- upload time;
- GPU pass timings;
- temporal field count;
- acoustic constraint count.
