Low-Level Implementation Notes

Purpose

This file collects the implementation details that are easy to lose when the conversation stays at the architecture level. None of these notes are promises; they are candidates to benchmark.

Audio Memory Layout

Preferred Analysis Layout

For decoder kernels:

channel0 samples: [s0 s1 s2 ...]
channel1 samples: [s0 s1 s2 ...]
...

Separate channel spans make per-channel scoring straightforward and avoid striding through interleaved audio during analysis. Interleaving is useful for device I/O and some DSP kernels; it is not automatically the right analysis shape.

Preferred Scoring Layout

candidateStartSamples[]
candidateEnergy[]
symbolReliability[]
binFrequencies[]
binSin[]
binCos[]

Dense arrays let SIMD/native/GPU kernels process candidates without object chasing. The codebook trellis can stay managed/CPU because it is small and branchy.

SIMD Chirp-Bin Scoring

Likely CPU kernel shape:

load candidate window samples in 8-float AVX2 chunks;
multiply by dechirp oscillator/window;
accumulate bin real/imag sums;
compute magnitude or normalized energy;
write top K or full dense bin vector.

Important:

keep oscillator tables aligned;
keep candidate windows contiguous;
batch multiple candidates per interop call;
avoid denormals;
normalize outside the deepest loop where possible.

FFT / Goertzel Decision

Use Goertzel/fixed-bin DFT when:

symbol count is small;
bins are known;
candidate count is modest;
native dispatch overhead matters.

Use FFT/CZT when:

many bins are needed;
frequency offset interpolation dominates;
candidate batches are large enough;
GPU/native FFT plans can be reused.

Do not choose FFT because it sounds more canonical. Canonical is not a profiler.

GPU Chirp Scoring

The GPU scoring pass only makes sense if data is already resident or batches are large enough to pay dispatch overhead.

Good GPU candidates:

many mic streams;
many remote receiver windows;
wide candidate sets during acquisition;
offline/artifact calibration sweeps.

Bad GPU candidates:

one or two channels;
tiny candidate counts;
paths that require CPU readback before codebook solving every event.

ASIO Capture Hot Path

Current observed risks:

per-block std::vector<float> resize;
std::deque queue;
native-to-managed polling copies into float[];
managed byte array allocation for sample payload;
ConcurrentQueue.Count pressure.

Preferred cut:

native SPSC ring;
fixed block pool;
monotonic frame counters;
block descriptors with handle/index;
C# metadata only until DSP needs samples.

Camera Capture Hot Path

Current probe strengths:

KS async queueing is already close to the driver;
PS3 Eye raw path proves high fps;
JSON frame events make cadence visible.

Current probe risks:

JSON must not become production payload;
CPU frame buffers are diagnostic until Fensalir import/upload path exists;
one-off command-line knobs must become configuration/stateful driver policy.

Preferred cut:

one native capture worker per camera family;
fixed frame ring;
descriptor ABI;
optional shared GPU texture handle;
Fensalir owns texture import and GPU staging.

Fensalir GPU Fusion

Fensalir contracts already distinguish:

GpuSensorFrame for calibrated camera inputs and external textures;
AcousticFieldFrame for timing/room constraints;
CalibrationEventFrame for high-confidence event anchors;
GpuFusionField for point buffers/seeds;
TemporalGaussianField for stable gaussian observations.

Mimir should produce these rows. It should not pack shader structs directly.

Gaussian Splat Rendering Levers

From modern 3DGS implementations:

tile size matters;
sorting strategy matters;
cull before sort;
packed buffers matter;
anti-aliasing and opacity/radius thresholds matter;
view-consistency fixes cost real time;
depth/feature render modes are useful for diagnostics.

Mimir-specific cut:

first update live claims;
then render simple claims;
only later optimize dynamic radiance fields.

Benchmark Counters To Add

Decoder:

samples consumed;
proposals created;
candidates scored;
anchors accepted/rejected;
elapsed scoring time;
allocations per analysis tick.

ASIO:

callback count;
block count;
queue/ring high-water mark;
dropped blocks;
conversion time if measurable.

Camera:

frame interval p50/p95/p99;
async queue depth;
dropped/late frames;
payload copy time;
upload/import time.

Fensalir:

contract lowering time;
upload time;
GPU pass timings;
temporal field count;
acoustic constraint count.

Low-Level Implementation Notes

Low-Level Implementation Notes

Purpose

Audio Memory Layout

Preferred Analysis Layout

Preferred Scoring Layout

SIMD Chirp-Bin Scoring

FFT / Goertzel Decision

GPU Chirp Scoring

ASIO Capture Hot Path

Camera Capture Hot Path

Fensalir GPU Fusion

Gaussian Splat Rendering Levers

Benchmark Counters To Add

Table of Contents

Backlinks