Why Image Hardening
Images are attack vectors. Adversarial perturbations can fool classifiers. LSB steganography hides data in the least significant bits of pixel values. DCT-domain payloads embed information in JPEG compression coefficients. If you are building any system that accepts user-uploaded images, you need a way to neutralize these threats without destroying the visual content. That is what pixmask does: a 4-stage image hardening pipeline that strips malicious payloads while keeping the image visually intact.
The design goal was simple. Every image that enters your system goes through pixmask, and what comes out is clean. No gradient perturbations survive. No steganographic payloads survive. No DCT-domain tricks survive. And it has to be fast enough to run inline on upload, not as a batch job.
The Four Stages
Stage 1 is validation. Before we touch any pixel data, we parse and validate the image container format. Malformed headers, oversized dimensions, unexpected color spaces, all rejected here. This catches the obvious stuff: polyglot files, zip bombs disguised as PNGs, truncated JPEGs that exploit decoder vulnerabilities. It is a cheap sanity check that prevents the expensive stages from operating on garbage.
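To make the idea concrete, here is a minimal sketch of a header check for a single format (PNG), written in Python for readability. It is not pixmask's validator: the function name and the 16384 pixel limit are assumptions chosen for illustration, and a real validator would cover more formats and more fields.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
MAX_DIMENSION = 16384  # hypothetical limit, not a documented pixmask default


def validate_png_header(data: bytes, max_dim: int = MAX_DIMENSION) -> tuple[int, int]:
    """Return (width, height), or raise ValueError. Reads only the first 24 bytes."""
    if len(data) < 24:
        raise ValueError("truncated header")
    if data[:8] != PNG_SIGNATURE:
        raise ValueError("missing PNG signature")
    # The first chunk must be IHDR, and its payload is always 13 bytes.
    length, chunk_type = struct.unpack(">I4s", data[8:16])
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("first chunk is not a valid IHDR")
    # Width and height are the first two big-endian 32-bit fields of IHDR.
    width, height = struct.unpack(">II", data[16:24])
    if not (0 < width <= max_dim and 0 < height <= max_dim):
        raise ValueError(f"dimensions {width}x{height} out of range")
    return width, height
```

The point of checking dimensions this early is that a tiny file can declare an enormous canvas, and rejecting it costs 24 bytes of parsing instead of a multi-gigabyte allocation.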
Stage 2 is safe decode. We decode the image using a hardened decoder path with strict bounds checking and no reallocation during decode. The decoded pixels land in a preallocated arena buffer. This stage exists because image decoders are historically a rich source of CVEs, and we want decode to happen in a controlled environment before any further processing.
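The arena idea can be sketched in a few lines. This is a Python analogy, not the C++ implementation: the class and method names are hypothetical, and Python still creates a small memoryview object per call, which the real zero-allocation hot path would avoid.

```python
class Arena:
    """Fixed-capacity pixel buffer, allocated once and reused across images."""

    def __init__(self, capacity: int) -> None:
        self._buf = bytearray(capacity)  # the only large allocation

    def reserve(self, width: int, height: int, channels: int = 3) -> memoryview:
        """Return a view of exactly the bytes one decoded image needs."""
        needed = width * height * channels
        if width <= 0 or height <= 0 or channels <= 0 or needed > len(self._buf):
            # Bounds check up front: the decoder never gets a chance to grow the buffer.
            raise ValueError("image does not fit in arena")
        return memoryview(self._buf)[:needed]
```

Sizing the arena for the largest image that validation will accept means the decoder can never write past the end or trigger a resize halfway through a decode.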
Stage 3 is SIMD-accelerated bit depth reduction. This is where LSB steganography dies. We mask out the bottom N bits of every channel value using vectorized operations. On x86, this runs on AVX2 via Google Highway. On ARM, it uses NEON. The bit depth reduction is configurable (default strips 2 bits), which destroys any information hidden in the least significant bits while introducing imperceptible visual change. A 2-bit reduction on 8-bit channels changes each value by at most 3 out of 255, roughly 1.2% of the range. Visually undetectable. But any payload stored in those bits is gone.
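The masking itself is a single bitwise AND per channel byte. The scalar Python sketch below shows the operation; the function name is hypothetical, and the SIMD version applies the same AND to a whole vector of bytes at a time rather than looping.

```python
def strip_low_bits(pixels: bytes, bits: int = 2) -> bytes:
    """Zero the lowest `bits` bits of every 8-bit channel value."""
    if not 0 <= bits <= 7:
        raise ValueError("bits must be between 0 and 7")
    mask = (0xFF << bits) & 0xFF  # bits=2 gives 0b11111100
    return bytes(value & mask for value in pixels)


def max_error(bits: int = 2) -> int:
    """Largest possible change to a single channel value."""
    return (1 << bits) - 1
```

For example, strip_low_bits(bytes([183, 255, 3])) gives bytes([180, 252, 0]): the top six bits of each value are untouched and the bottom two are always zero, so nothing written there survives.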
Stage 4 is a randomized JPEG roundtrip. We encode the image to JPEG at a randomly selected quality level (within a configurable range, default 85 to 95), then decode it back. This is the DCT-domain killer. Any payload embedded in specific DCT coefficients gets scrambled by the lossy compression, and the randomized quality level means an attacker cannot predict and compensate for the exact quantization table. This stage also disrupts any remaining gradient-based perturbations: the quantization step in JPEG is non-differentiable, and lossy compression breaks up the carefully crafted pixel-level noise that adversarial attacks rely on.
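Why does a random quality level matter? The sketch below uses the quality scaling convention from libjpeg to show how the quantization step for one coefficient changes across the default range. Whether pixmask's encoder uses exactly this mapping, and whether it draws quality from a cryptographic RNG, are assumptions here; the function names are hypothetical. The base value 16 is the DC entry of the example luminance table in the JPEG standard.

```python
import secrets


def quality_to_scale(quality: int) -> int:
    """libjpeg-style mapping from quality (1..100) to a percentage scale factor."""
    quality = max(1, min(100, quality))
    return 5000 // quality if quality < 50 else 200 - 2 * quality


def quant_step(base: int, quality: int) -> int:
    """Quantization step for one table entry at the given quality, clamped to 1..255."""
    step = (base * quality_to_scale(quality) + 50) // 100
    return max(1, min(255, step))


def requantize(coeff: int, step: int) -> int:
    """Quantize then dequantize: the coefficient snaps to a multiple of step."""
    return round(coeff / step) * step


def pick_quality(lo: int = 85, hi: int = 95) -> int:
    """Draw a quality level uniformly from [lo, hi] using the OS random source."""
    return lo + secrets.randbelow(hi - lo + 1)
```

With a base value of 16, the step is 5 at quality 85, 3 at quality 90, and 2 at quality 95. A coefficient value an attacker tuned to survive one of those grids gets snapped onto a different grid most of the time, and they do not know in advance which one.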
Performance
The entire pipeline runs in under 15 milliseconds per image on a typical 1080p input. The hot path has zero heap allocations. All pixel buffers come from a preallocated arena that gets reused across calls. The SIMD bit masking stage processes roughly 16 pixels per cycle on AVX2. Memory layout is channel-interleaved (RGBRGB...) to maximize cache line utilization during the vectorized masking pass.
The C++17 core uses Google Highway for portable SIMD. Highway dispatches to the best available instruction set at runtime: AVX2 on modern x86, NEON on ARM, SSE4 as fallback. You compile once and it runs fast everywhere. No manual intrinsics, no platform-specific build flags.
Python Bindings
The core is C++17, but I wanted this to be usable from Python without friction. nanobind provides the bindings. The Python API is a single function call: pixmask.harden(image_bytes) returns hardened bytes. There is also a pixmask.harden_file(path) for filesystem operations. The package is pip-installable with prebuilt wheels for Linux and macOS. No compiler needed on the user's end.
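Here is one way the single call might sit inside an upload handler. Only pixmask.harden(image_bytes) comes from the description above; the wrapper name, the injectable harden parameter, and the choice to return None on failure are assumptions for the sake of the sketch. The post does not say which exceptions pixmask raises, so the sketch catches broadly and fails closed.

```python
from typing import Callable, Optional


def harden_upload(
    image_bytes: bytes,
    harden: Optional[Callable[[bytes], bytes]] = None,
) -> Optional[bytes]:
    """Return hardened bytes, or None if the image should be rejected."""
    if harden is None:
        import pixmask  # imported lazily so the wrapper can be tested without it

        harden = pixmask.harden
    try:
        return harden(image_bytes)
    except Exception:
        # Assumption: any failure means the upload is not trustworthy.
        # Never fall back to storing the original bytes.
        return None
```

The important design choice is the fallback: if hardening fails for any reason, the original bytes are dropped rather than passed through.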
Under the hood, the Python binding passes buffer pointers directly to the C++ arena, so there is no copy on the way in. The hardened image bytes are copied out once when returning to Python. Total Python overhead is under 100 microseconds per call, dominated by the GIL acquisition. For batch processing, there is a pixmask.harden_batch that releases the GIL and processes images in parallel across a thread pool.
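The batch pattern looks roughly like this when written on the Python side. This is not how pixmask.harden_batch is implemented (its thread pool lives in C++, and its signature is not given here); the function name and the workers default are hypothetical. Threads only give real parallelism in this shape if the per-image function releases the GIL while it works.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, List


def harden_batch_sketch(
    images: Iterable[bytes],
    harden: Callable[[bytes], bytes],
    workers: int = 4,
) -> List[bytes]:
    """Apply `harden` to every image on a thread pool, preserving input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        # Executor.map yields results in the order the inputs were submitted.
        return list(pool.map(harden, images))
```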