So I built a hyperspectral imaging system using some of the more recent algorithmic developments. I was quite pleased with the results, so I thought I would take some 3-channel photos of my cat, convert them to hyperspectral cubes, and showcase those. Instead, I decided to use some scaled tiles of myself.
That work forced me to think about perception, classification, and what it means to turn reality into data.
I watched the memorial service on January 27 at Auschwitz-Birkenau Memorial and Museum.
Words fail.
Two officers stood in front of Buildings 10 and 11, former medical and experimental blocks, their windows sealed. That image was surreal. It demanded the highest level of human respect and dignity. They weren’t there to enforce authority. They were there to stand witness.
This must never be repeated. And yet both young and old in our society increasingly behave as if historical memory were optional.
From the arrest of innocent families, their renaming and systematic indoctrination, to the consequences that persist across generations, we must understand the full depth of what happened, during and after. This was not confined to a handful of sites. It unfolded across more than 40,000 camps, ghettos, and detention facilities.
This was a distributed system, not isolated evil.
I don’t usually write outside technical documentation.
My world is syntax, signal paths, kernel logs, and build systems, not essays about power or society. But visibility changes responsibility. And if I’m going to maintain a public technical space, then this deserves to be said plainly.
Technology is powerful.
Building a convolutional neural network or a multi-stage inference pipeline isn’t magic. It’s engineering. Difficult, yes, but not inaccessible. Most doctorate-level research today focuses on algorithmic refinement, not on writing efficient compiled systems or production-grade pipelines. If you can code, understand semantics, and reason about data flow, you can build perception systems. In fact, it’s easier than it used to be.
Years ago, I would have written the entire stack in C, with optimized inline assembly where it mattered. Today, I spend more time wrestling with package dependencies in mismanaged Python repositories on a stripped-down Linux dev box. It isn’t rocket science. It just requires discipline, command-line fluency, and persistence.
So the real question isn’t whether we can build these systems.
It’s why.
And for whom.
Earlier this week I spoke with someone working inside a city agency. He showed me thermal camera feeds and object detection overlays. He was proud of how the system identified “subjects” as they moved, ran, or fled across a landscape. He talked about thermal scopes. Tracking. Classification.
Then he went further.
He explained that he had worked directly with materials used in thermal imaging sensor development. He spoke about it with the same pride, as if proximity to restricted materials implied novelty or superiority.
It doesn’t.
This is old technology.
Thermal imaging based on infrared radiometric principles has existed for decades. The physics is well understood. The sensors are mature. What has changed is not capability, but scale, automation, and framing.
What struck me wasn’t the technology.
What struck me was how casually it was framed.
Light converted to heat maps. Heat converted to bounding boxes. People reduced to vectors. Human movement, including fleeing, reduced to metadata.
This framing is deeply offensive.
Once raw physics becomes structured data, it stops being imagery. It becomes inference.
And inference, at scale, becomes power.
It is the kind of data that shows you exactly how perception is transformed into classification, signals into vectors, and people into metadata.
And it forces a harder reflection.
Where we were in 1935, and where we are today.
Back then, people were cataloged by hand. Paper forms. Filing cabinets. Human clerks deciding who belonged and who did not. The machinery was slower, but the intent was familiar. Classification first. Movement second. Consequences later.
In Nazi Germany this took the form of population registries, racial classification cards, Arbeitspass (work books), Kennkarten (identity cards), and local police card indexes. Everything depended on human operated paperwork pipelines.
Today, it’s automated.
Bounding boxes replace clipboards. Vectors replace names. Dashboards replace offices.
The velocity has changed. The abstraction has changed.
The danger has not.
Today that power increasingly lives behind closed source cameras, closed firmware, proprietary models, and vendor controlled pipelines. State and county camera systems are marketed as convenience and safety, but they operate as black boxes. Communities do not get to inspect the models. Citizens cannot audit the datasets. There is no meaningful public accountability for how identity, motion, or behavior are classified.
And it is no longer just governments.
These tools are now embedded in apartment complexes, homeowners associations, parking operators, retail centers, restaurants, and private property networks. Quasi-governmental surveillance infrastructure is quietly being absorbed into the private sector, where accountability is weaker and oversight is minimal.
You don’t need a badge anymore to run perception systems on the public.
You just need a contract.
When vision pipelines move into restaurants and storefronts, when license plate readers sit beside drive through lanes, when facial classifiers operate in spaces meant for ordinary life, the line between civic safety and commercial monitoring disappears.
When perception systems migrate outside formal government channels, the danger multiplies.
There are no transparency requirements. No due process guarantees. No public records requests. Just automated identification feeding proprietary dashboards.
That is how monitoring becomes rounding up.
Not through dramatic announcements, but through normalized pipelines. License plate hits. Face matches. Movement graphs. Quiet coordination between private entities using tools originally justified for public safety.
This is where remembrance matters.
We remember because forgetting makes repetition easy.
Entire populations have been cataloged before. Movement has been tracked before. Human beings have been reduced to identifiers before. History shows us exactly what happens when classification replaces compassion and automation replaces accountability.
Remembrance is not symbolic. It is technical. It is cultural. It is defensive.
It reminds us what centralized perception can become when left unchecked.
These systems are still owned by the same institutions, built on closed hardware, closed firmware, closed datasets, and export restricted sensors. The stack is opaque by design. Not because it has to be, but because opacity preserves control.
Software systems that perform hyperspectral transformation, thermal fusion, or automated classification aren’t rare anymore. What’s rare is transparency.
Open standards matter because they decentralize understanding. They allow inspection. Reproduction. Audit. They make it harder for surveillance to quietly become infrastructure.
Technology does not drift toward ethics.
It drifts toward whoever funds it.
And that brings us to where we were as a country, and where we are going.
Entire generations fought wars so that power would not be centralized, so that citizens could question authority, so that freedom of movement, thought, and expression were not mediated by unseen systems. Those sacrifices were not made so that people could eventually be reduced to vectors in proprietary dashboards.
We did not fight for independence so that perception itself could become privatized.
The same neural networks used to track movement across terrain can inspect bridges, analyze crops, preserve artwork, or study climate systems. The code is identical. The difference is governance.
Engineers like to believe we’re neutral.
We aren’t.
Every dataset encodes priorities. Every optimization implies intent. Every deployment answers a simple question.
Who benefits?
I don’t write this as a theorist.
I write this as someone who builds systems.
We are now creating machines that see.
If we don’t also build systems that explain, verify, and democratize that vision, we shouldn’t be surprised when it only looks in one direction.
If I could show you photos of the street on Merwedeplein in 1944, I would. I'm certain that the people who live there understand the full depth of everything they've gone through. They live in close proximity to where it all happened. And I know that they have not forgotten. Lately, it feels like the distance between there and here is shrinking.
They lived.
They loved.
They laughed.
And I saw what followed.
I lived part of it.
Several groups are actively exploring related deep learning approaches in this space. If you have questions or comments, feel free to reach out.
Bryan R Hinton
bryan (at) bryanhinton.com
The Quantum Nature of Observation
The interrogation of physical reality through the medium of light remains one of the most profound endeavors of scientific inquiry. This pursuit traces its modern theoretical roots to the mid-20th century, a pivotal era for physics.
In 1935, Albert Einstein and his colleagues Boris Podolsky and Nathan Rosen published a seminal paper that challenged the completeness of quantum mechanics [1]. They introduced the concept of EPR pairs to describe quantum entanglement, where particles remain inextricably linked, their states correlated regardless of spatial separation.
It is the quintessential example of quantum entanglement. An EPR pair is created when two particles are born from a single, indivisible quantum event, like the decay of a parent particle. This process "bakes in" a shared quantum reality where only the joint state of the pair is defined, governed by conservation laws such as spin summing to zero. As a result, the individual state of each particle is indeterminate, yet their fates are perfectly correlated.
Measuring one particle (e.g., finding its spin "up") instantaneously determines the state of its partner (spin "down"), regardless of the distance separating them. This "spooky action at a distance," as Einstein called it, revealed that particles could share hidden correlations across space that are invisible to any local measurement of one particle alone. While Einstein used this idea to argue quantum theory was incomplete, later work by John Bell [2] and experiments by Alain Aspect [3] confirmed this entanglement as a fundamental, non-classical feature of nature.
The EPR-Spectral Analogy: Hidden Correlations
| Quantum Physics (1935) | Spectral Imaging (Today) |
|---|---|
| EPR pairs: particles share non-local entanglement. | Spectral pairs: materials share spectral signatures. |
| Measuring one particle gives random results; the correlation only appears when comparing both. | The correlation is invisible to trichromatic (RGB) vision. |

In both cases, mathematical reconstruction reveals the hidden correlations.
While the EPR debate centered on the foundations of quantum mechanics, its core philosophy, that direct observation can miss profound hidden relationships, resonates deeply with modern imaging. Just as the naked eye perceives only a fraction of the electromagnetic spectrum, standard RGB sensors discard the high-dimensional "fingerprint" that defines the chemical and physical properties of a subject. Today, we resolve this limitation through multispectral imaging. By capturing the full spectral power distribution of light, we can mathematically reconstruct the invisible data that exists between the visible bands, revealing hidden correlations across wavelength, just as the analysis of EPR pairs revealed hidden correlations across space.
Silicon Photonic Architecture: The 48MP Foundation
The realization of this physics in modern hardware is constrained by the physical dimensions of the semiconductor used to capture it. The interaction of incident photons with the silicon lattice, generating electron-hole pairs, is the primary data acquisition step for any spectral analysis.
Sensor Architecture: Sony IMX803
The core of this pipeline is the Sony IMX803 sensor. Contrary to persistent rumors of a 1-inch sensor, this is a 1/1.28-inch type architecture, optimized for high-resolution radiometry.
- Active Sensing Area: Approximately \(9.8 \text{ mm} \times 7.3 \text{ mm}\). This physical limitation is paramount, as the sensor area is directly proportional to the total photon flux the device can integrate, setting the fundamental Signal-to-Noise Ratio (SNR) limit.
- Pixel Pitch: The native photodiode size is \(1.22 \, \mu\text{m}\). In standard operation, the sensor utilizes a Quad-Bayer color filter array to perform pixel binning, resulting in an effective pixel pitch of \(2.44 \, \mu\text{m}\).
Mode Selection
The choice between binned and unbinned modes depends on the analysis requirements:
- Binned mode (12MP, 2.44 µm effective pitch): Superior for low-light conditions and spectral estimation accuracy. By summing the charge from four photodiodes, the signal increases by a factor of 4, while read noise increases only by a factor of 2, significantly boosting the SNR required for accurate spectral estimation.
- Unbinned mode (48MP, 1.22 µm native pitch): Optimal for high-detail texture correlation where spatial resolution drives the analysis, such as resolving fine fiber patterns in historical documents or detecting micro-scale material boundaries.
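To make the binning trade-off concrete, here is a minimal numeric sketch; the photon count and read-noise figures are illustrative assumptions, not measured IMX803 values.

```python
import math

def snr(signal_e, read_noise_e):
    """SNR for a pixel limited by photon shot noise plus read noise (in electrons)."""
    return signal_e / math.sqrt(signal_e + read_noise_e ** 2)

photons_per_native_pixel = 200.0   # illustrative low-light signal (e-)
read_noise = 2.0                   # illustrative read noise per read (e-)

snr_native = snr(photons_per_native_pixel, read_noise)
# Binning sums four photodiodes: 4x the signal, while read noise grows
# roughly 2x (square root of four reads), as described above.
snr_binned = snr(4 * photons_per_native_pixel, 2 * read_noise)

print(f"native 1.22 um pixel SNR: {snr_native:.1f}")   # ~14
print(f"binned 2.44 um pixel SNR: {snr_binned:.1f}")   # ~28, roughly double
```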
The Optical Path
The light reaching the sensor passes through a 7-element lens assembly with an aperture of ƒ/1.78. It is critical to note that "Spectral Fingerprinting" measures the product of the material's reflectance \(R(\lambda)\) and the lens's transmittance \(T(\lambda)\). Modern high-refractive-index glass absorbs specific wavelengths in the near-UV (less than 400nm), which must be accounted for during calibration.
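As a sketch of that calibration step, the reconstructed spectrum can be divided by the lens transmittance to recover the material reflectance; the transmittance curve below is an assumed placeholder, not measured data for this lens assembly.

```python
import numpy as np

# The 16-band grid used throughout this pipeline: 400-700 nm in 20 nm steps.
WAVELENGTHS = np.arange(400, 701, 20)

# Placeholder lens transmittance T(lambda): rolls off toward the near-UV.
# A real curve must come from lens calibration, not from this formula.
T_LENS = np.clip(0.55 + 0.45 * (WAVELENGTHS - 400) / 100.0, 0.55, 1.0)

def correct_for_lens(measured, t_lens=T_LENS, eps=1e-6):
    """Recover estimated reflectance R(lambda) from the measured product
    R(lambda) * T(lambda) by dividing out the lens transmittance."""
    return measured / np.maximum(t_lens, eps)
```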
The Digital Container: DNG 1.7 and Linearity
The accuracy of computational physics depends entirely on the integrity of the input data. The Adobe DNG 1.7 specification provides the necessary framework for scientific mobile photography by strictly preserving signal linearity.
Scene-Referred Linearity
Apple ProRAW utilizes the Linear DNG pathway. Unlike standard RAW files, which store unprocessed mosaic data, ProRAW stores pixel values after demosaicing but before non-linear tone mapping. The data remains scene-referred linear, meaning the digital number stored is linearly proportional to the number of photons collected (\(DN \propto N_{photons}\)). This linearity is a prerequisite for the mathematical rigor of Wiener estimation and spectral reconstruction.
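One practical consequence: with scene-referred linear data, an exposure bracket can serve as a sanity check, since the stored values should scale with exposure time. A minimal sketch, assuming a static scene, fixed gain, and frames already normalized to [0, 1]:

```python
import numpy as np

def check_linearity(frame_short, frame_long, t_short, t_long, tol=0.02):
    """Return True if two bracketed linear frames scale with exposure time.

    frame_short, frame_long: linear images of the same static scene, in [0, 1]
    t_short, t_long: their exposure times (seconds)
    """
    # Ignore clipped highlights and near-black pixels before comparing.
    mask = (frame_long < 0.95) & (frame_short > 0.01)
    ratio = np.median(frame_long[mask] / frame_short[mask])
    expected = t_long / t_short
    return abs(ratio - expected) / expected < tol
```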
The ProfileGainTableMap
A key innovation in DNG 1.7 is the ProfileGainTableMap (Tag 0xCD2D). This tag stores a spatially varying map of gain values that represents the local tone mapping intended for display.
- Scientific Stewardship: By decoupling the "aesthetic" gain map from the "scientific" linear data, the pipeline can discard the gain map entirely. This ensures that the spectral reconstruction algorithms operate on pure, linear photon counts, free from the spatially variant distortions introduced by computational photography.
Algorithmic Inversion: From 3 Channels to 16 Bands
Recovering a high-dimensional spectral curve \(S(\lambda)\) (e.g., 16 channels from 400nm to 700nm) from a low-dimensional RGB input is an ill-posed inverse problem. While traditional methods like Wiener Estimation provide a baseline, modern high-end hardware enables the use of advanced Deep Learning architectures.
Wiener Estimation (The Linear Baseline)
The classical approach utilizes Wiener Estimation to minimize the mean square error between the estimated and actual spectra:
\[
\hat{S} = W\,\mathbf{c}, \qquad W = R_{Sc}\,R_{cc}^{-1}
\]
where \(\mathbf{c}\) is the 3-channel camera response vector, \(R_{Sc}\) is the cross-correlation matrix between training spectra and camera responses, and \(R_{cc}\) is the autocorrelation matrix of the camera responses. This method generates the initial 16-band approximation from the 3-channel input.
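A minimal NumPy sketch of how the Wiener matrix can be formed from paired training data; the 16-band 400-700 nm grid and the function names are illustrative, and a production pipeline would add regularization and an explicit noise term.

```python
import numpy as np

def fit_wiener_matrix(train_spectra, train_rgb):
    """Estimate the Wiener matrix W (16 x 3) from paired training data.

    train_spectra: (n, 16) reflectance spectra on the 400-700 nm grid
    train_rgb:     (n, 3)  corresponding linear camera responses
    """
    n = len(train_rgb)
    r_sc = train_spectra.T @ train_rgb / n        # cross-correlation, (16, 3)
    r_cc = train_rgb.T @ train_rgb / n            # autocorrelation,  (3, 3)
    return r_sc @ np.linalg.inv(r_cc)             # W = R_Sc @ R_cc^{-1}

def reconstruct_cube(rgb_image, w):
    """Expand a linear (H, W, 3) RGB image into an (H, W, 16) spectral cube."""
    return np.einsum('ij,hwj->hwi', w, rgb_image)
```

In practice, a small ridge term added to \(R_{cc}\) keeps the inversion stable when the camera channels are strongly correlated or noisy.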
State-of-the-Art: Transformers and Mamba
For high-end hardware environments, we can utilize predictive neural architectures that leverage spectral-spatial correlations to resolve ambiguities.
- MST++ (Multi-stage Spectral-wise Transformer): The MST++ architecture represents a significant leap in accuracy. Unlike global matrix methods, MST++ utilizes Spectral-wise Multi-head Self-Attention (S-MSA), which calculates attention maps across the spectral channel dimension, allowing the model to learn complex non-linear correlations between texture and spectrum (a simplified sketch of the idea follows this list). Hardware Demand: attention mechanisms scale quadratically, \(O(N^2)\), in the number of tokens, and multi-stage inference at full resolution requires significant GPU memory (VRAM), necessitating powerful dedicated hardware to process the full data arrays.
- MSS-Mamba (Linear Complexity): The MSS-Mamba (Multi-Scale Spectral-Spatial Mamba) model introduces Selective State Space Models (SSM) to the domain. It discretizes the continuous state space equation into a recurrent form that can be computed with linear complexity \(O(N)\). The Continuous Spectral-Spatial Scan (CS3) strategy integrates spatial neighbors and spectral channels simultaneously, effectively "reading" the molecular composition in a continuous stream.
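To illustrate the core idea behind spectral-wise attention (not the full MST++ architecture), here is a simplified single-head sketch: the attention map is computed across the band dimension, so its size depends on the number of bands rather than on spatial resolution. The projection matrices are placeholders standing in for learned weights.

```python
import numpy as np

def spectral_wise_attention(x, wq, wk, wv, temperature=1.0):
    """Simplified single-head spectral-wise self-attention.

    x:          (hw, c) image flattened spatially, one column per band
    wq, wk, wv: (c, c) projection matrices (learned in a real model)
    Returns     (hw, c) re-weighted bands.
    """
    q, k, v = x @ wq, x @ wk, x @ wv                     # (hw, c) each
    # Normalize each band vector along the spatial axis before comparing.
    q = q / (np.linalg.norm(q, axis=0, keepdims=True) + 1e-8)
    k = k / (np.linalg.norm(k, axis=0, keepdims=True) + 1e-8)
    attn = k.T @ q * temperature                         # (c, c) band-to-band map
    attn = np.exp(attn - attn.max(axis=0, keepdims=True))
    attn = attn / attn.sum(axis=0, keepdims=True)        # softmax over input bands
    return v @ attn                                      # mix the value bands
```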
Computational Architecture: The Linux Python Stack
Achieving multispectral precision requires a robust, modular architecture capable of handling massive arrays across 16 dimensions. The implementation relies on a heavy Linux-based Python stack designed to run on high-end hardware.
- Ingestion and Processing: We can utilize rawpy (a LibRaw wrapper) for the low-level ingestion of ProRAW DNG files, bypassing OS-level gamma correction to access the linear 12-bit data directly; a minimal ingestion sketch follows this list. NumPy engines handle the high-performance matrix algebra required to expand 3-channel RGB data into 16-band spectral cubes.
- Scientific Analysis: Scikit-image and SciPy are employed for geometric transforms, image restoration, and advanced spatial filtering. Matplotlib provides the visualization layer for generating spectral signature graphs and false-color composites.
- Data Footprint: The scale of this operation is significant. A single 48.8MP image converted to floating-point precision results in massive file sizes. Intermediate processing files often exceed 600MB for a single 3-band layer. When expanded to a full 16-band multispectral cube, the storage and I/O requirements scale proportionally, necessitating the stability and memory management capabilities of a Linux environment.
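A minimal sketch of the ingestion step from the first bullet, using rawpy's postprocess with gamma disabled to keep the data linear; the file name is a placeholder, and white-balance handling would depend on the calibration workflow.

```python
import numpy as np
import rawpy

def load_linear_rgb(path):
    """Load a ProRAW / Linear DNG and return scene-referred linear RGB floats."""
    with rawpy.imread(path) as raw:
        # gamma=(1, 1) and no_auto_bright keep the output linear;
        # output_bps=16 preserves the container's full bit depth.
        rgb16 = raw.postprocess(
            gamma=(1, 1),
            no_auto_bright=True,
            output_bps=16,
            use_camera_wb=True,
        )
    # At float32, one ~48 MP 3-band frame is roughly 0.6 GB,
    # consistent with the intermediate file sizes described above.
    return rgb16.astype(np.float32) / 65535.0
```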
The Spectral Solution
When two visually similar blue pigments, ultramarine and azurite, are analyzed through the 16-band multispectral pipeline, their signatures separate cleanly:
| Spectral Feature | Ultramarine (Lapis Lazuli) | Azurite (Copper Carbonate) |
|---|---|---|
| Primary Reflectance Peak | Approximately 450-480 nm (blue-violet region) | Approximately 470-500 nm, with a secondary green peak at 550-580 nm |
| UV Response (below 420 nm) | Minimal reflectance, strong absorption | Moderate reflectance, characteristic of copper minerals |
| Red Absorption (600-700 nm) | Moderate to strong absorption | Strong absorption, typical of blue pigments |
| Characteristic Features | Sharp reflectance increase at 400-420 nm (violet edge) | Broader reflectance curve with copper signature absorption bands |
Note: Spectral values are approximate and can vary based on particle size, binding medium, and aging.
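As an illustration of how the table might be used downstream, here is a rough heuristic classifier; the thresholds are loose readings of the approximate ranges above, not calibrated values, and the function name is hypothetical.

```python
import numpy as np

WAVELENGTHS = np.arange(400, 701, 20)   # the 16-band grid, 400-700 nm

def classify_blue_pigment(spectrum):
    """Separate ultramarine-like from azurite-like spectra, very roughly.

    spectrum: (16,) reconstructed reflectance values on WAVELENGTHS
    """
    uv = spectrum[WAVELENGTHS < 420].mean()                              # below 420 nm
    blue = spectrum[(WAVELENGTHS >= 450) & (WAVELENGTHS <= 500)].mean()
    green = spectrum[(WAVELENGTHS >= 550) & (WAVELENGTHS <= 580)].mean()

    # Azurite: moderate UV response and a secondary green peak;
    # ultramarine: strong UV absorption and a single blue-violet peak.
    if uv > 0.15 and green > 0.5 * blue:
        return "azurite-like"
    return "ultramarine-like"
```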
Completing the Picture
The successful analysis of complex material properties relies on a convergence of rigorous physics and advanced computation.
- Photonic Foundation: The Sony IMX803 provides the necessary high-SNR photonic capture, with mode selection (binned vs. unbinned) driven by the specific analytical requirements of each examination.
- Data Integrity: DNG 1.7 is the critical enabler, preserving the linear relationship between photon flux and digital value while sequestering non-linear aesthetic adjustments in metadata.
- Algorithmic Precision: While Wiener estimation serves as a fast approximation, the highest fidelity is achieved through Transformer (MST++) and Mamba-based architectures. These models disentangle the complex non-linear relationships between visible light and material properties, effectively generating 16 distinct spectral bands from 3 initial channels.
- Historical Continuity: The EPR paradox of 1935 revealed that quantum particles share hidden correlations across space, correlations invisible to local measurement but real nonetheless. Modern spectral imaging reveals an analogous truth: materials possess hidden correlations across wavelength, invisible to trichromatic vision but accessible through mathematical reconstruction. In both cases, completeness requires looking beyond what direct observation provides.
This synthesis of hardware specification, file format stewardship, and deep learning reconstruction defines the modern standard for non-destructive material analysis, a spectral witness to what light alone cannot tell us.
Light can expose structure.
It cannot carry history.
That part is on us.
References
- Einstein, A., Podolsky, B., & Rosen, N. (1935). Can Quantum-Mechanical Description of Physical Reality Be Considered Complete? Physical Review, 47(10), 777–780.
- Bell, J. S. (1964). On the Einstein Podolsky Rosen paradox. Physics Physique Физика, 1(3), 195–200.
- Aspect, A., Dalibard, J., & Roger, G. (1982). Experimental Test of Bell's Inequalities Using Time-Varying Analyzers. Physical Review Letters, 49(25), 1804–1807.