The Sonic Shield – Experimental Quantum Watermarking

1. The Challenge

Context: The rapid evolution of Generative AI (GenAI) voice cloning has outpaced traditional security measures. My goal was to engineer a "Defense-in-Depth" prototype that replaces algorithmic pseudo-randomness (PRNGs) with physical entropy to create a unique, non-deterministic audio watermark.
The Obstacle: Developing on the bleeding edge requires navigating strict hardware limitations. I had to build this system using the IBM Quantum Open Plan, which restricts users to just 10 minutes of Quantum Processing Unit (QPU) execution time per month. This forced a shift from "real-time generation" to a highly optimized "batch-harvesting" architecture.

2. The Solution Architecture

The objective was to explore the theoretical application of Quantum Phase Poisoning against Neural Vocoders.

The Pipeline: IBM Brisbane (127-Qubit) $\to$ Batch Entropy Harvesting $\to$ Frequency Domain Injection $\to$ Phase Rotation $\to$ WAV Output

Key Decisions:

Resource Management (IBM Free Tier): With only 600 seconds of QPU time available per month, I couldn't run a circuit for every audio file. I architected a "Generate Once, Use Locally" system that harvests ~520 kbit (about 65 KB) of raw quantum entropy in a single job lasting roughly 3 to 4 seconds of QPU time, then caches it locally for long-term use.
Qiskit SamplerV2: I chose the latest Qiskit primitive because it returns raw bitstrings (shot memory) rather than probability distributions. This was essential for converting physical qubit collapse into a binary stream usable for DSP operations.
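The "Generate Once, Use Locally" cache and the bitstring-to-binary-stream conversion can be sketched with NumPy alone. This is a minimal sketch, not the project's actual storage code: the function names, the `.npz` format and the 8-bits-per-byte packing are assumptions. It takes the list of strings returned by `get_bitstrings()` and stores the bits packed, so ~520k bits occupy about 65 KB on disk.

```python
import numpy as np

def bitstrings_to_bits(bitstrings):
    """Flatten SamplerV2-style bitstrings (e.g. ['0110...', ...]) into one 0/1 array."""
    # Concatenate all shots in order; each '0'/'1' character becomes one uint8 bit.
    joined = "".join(bitstrings)
    return (np.frombuffer(joined.encode("ascii"), dtype=np.uint8) - ord("0")).astype(np.uint8)

def save_entropy_cache(bitstrings, path):
    """Pack the bits 8 per byte and write them to a local .npz cache (hypothetical format)."""
    bits = bitstrings_to_bits(bitstrings)
    # Store the exact bit count so padding added by packbits can be dropped on reload.
    np.savez(path, packed=np.packbits(bits), n_bits=bits.size)

def load_entropy_cache(path):
    """Reload the cache and unpack it back into the original 0/1 bit stream."""
    with np.load(path) as data:
        return np.unpackbits(data["packed"], count=int(data["n_bits"]))
```

`np.savez` appends `.npz` when the path lacks that extension, so pass a path that already ends in `.npz` to keep saving and loading symmetric.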

3. Implementation Highlights

A. Maximizing the "Free Tier" (Batch Processing)

To work within the IBM Open Plan limits, I designed a circuit that maximizes qubit utilization per shot.

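from qiskit import QuantumCircuit
from qiskit.transpiler.preset_passmanagers import generate_preset_pass_manager
from qiskit_ibm_runtime import SamplerV2 as Sampler

def generate_quantum_seed(backend, num_qubits=127, shots=4096):
    # Target the 127-qubit 'ibm_brisbane' processor.
    # A Hadamard on every qubit puts each one in equal superposition, so every
    # shot yields one (ideally unbiased) random bit per qubit.
    qc = QuantumCircuit(num_qubits)
    qc.h(range(num_qubits))
    qc.measure_all()  # adds a classical register named "meas"

    # Transpile to the backend's native gates and qubit layout (an ISA circuit),
    # which IBM Runtime requires before submission.
    pm = generate_preset_pass_manager(backend=backend, optimization_level=1)
    isa_circuit = pm.run(qc)

    # Optimization: one job, 4096 shots.
    # This consumes ~3 seconds of QPU time and yields 127 x 4096 = 520,192 bits.
    sampler = Sampler(mode=backend)
    job = sampler.run([isa_circuit], shots=shots)

    # SamplerV2 keeps per-shot results: one bitstring per shot
    return job.result()[0].data.meas.get_bitstrings()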
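Because SamplerV2 returns one bitstring per shot, a harvested batch can be sanity-checked before it is cached. The sketch below uses hypothetical helper names not taken from the original project: it reshapes the shots into a shots × qubits matrix and reports how often each qubit read 1. Readout error on real hardware can push individual qubits away from the ideal 0.5, which a single aggregate entropy figure can hide. It also confirms the yield arithmetic: 127 qubits × 4096 shots = 520,192 bits.

```python
import numpy as np

def bits_matrix(bitstrings):
    """Turn equal-length bitstrings into a (shots, qubits) 0/1 matrix.

    Qiskit prints bitstrings little-endian (the rightmost character is
    classical bit 0), so each string is reversed to put qubit 0 in column 0.
    """
    rows = [[int(c) for c in reversed(s)] for s in bitstrings]
    return np.array(rows, dtype=np.uint8)

def per_qubit_bias(bitstrings):
    """Fraction of shots in which each qubit was measured as 1 (ideal: 0.5)."""
    return bits_matrix(bitstrings).mean(axis=0)

def expected_yield(num_qubits=127, shots=4096):
    """Raw bits produced by one job: one bit per qubit per shot."""
    return num_qubits * shots
```

In practice, `per_qubit_bias(generate_quantum_seed(backend))` would be run once per harvest; qubits whose bias drifts far from 0.5 could be flagged before the batch is written to the local cache.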

B. Theoretical Phase Poisoning

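Neural Vocoders (like HiFi-GAN) must generate phase-coherent output to reconstruct natural-sounding waveforms, so the working hypothesis was that perturbing the phase of the source recording could interfere with cloning. I implemented a stochastic phase shifter that rotates the phase of each time-frequency bin based on the quantum key.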

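import numpy as np
import librosa

def inject_phase_poison(y, key_path, n_fft=2048):
    # Convert audio to the frequency domain (complex STFT matrix)
    D = librosa.stft(y, n_fft=n_fft)
    magnitude, phase = np.abs(D), np.angle(D)

    # Load the quantum key as 0/1 bits (load_quantum_bits_raw is the project's
    # own helper, not shown) and tile/trim it to one bit per time-frequency bin;
    # np.resize repeats bits if the key is shorter than the spectrogram
    q_bits = np.resize(np.asarray(load_quantum_bits_raw(key_path), dtype=float), D.shape)

    # Theory: rotating selected bins by 45 degrees (pi/4) should disrupt the
    # vocoder's phase reconstruction while the speech stays intelligible to humans
    phase_shift = q_bits * (np.pi / 4)
    new_phase = phase + phase_shift

    # Reconstruct, keeping the original signal length
    return librosa.istft(magnitude * np.exp(1j * new_phase), n_fft=n_fft, length=len(y))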
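The rotation at the heart of `inject_phase_poison` can be isolated and checked without any audio. A minimal NumPy sketch, assuming one key bit per time-frequency bin (the name `rotate_phase` is hypothetical): multiplying each bin by a unit phasor is equivalent to adding `bits * pi/4` to its angle, and it leaves the magnitude spectrum exactly unchanged.

```python
import numpy as np

def rotate_phase(D, bits, step=np.pi / 4):
    """Rotate each complex STFT bin by `step` radians where its key bit is 1.

    D    : complex array of shape (freq_bins, frames), e.g. from librosa.stft
    bits : 0/1 key, tiled or trimmed to D.shape with np.resize
    """
    key = np.resize(np.asarray(bits, dtype=float), D.shape)
    # A unit phasor changes each bin's angle but not its magnitude.
    return D * np.exp(1j * step * key)
```

That magnitude invariance is the design intent (the perturbation hides in phase), but it also means any feature computed from |D| alone, such as a mel spectrogram, sees identical values before resynthesis. After `istft`, overlapping frames interact, so re-analysing the output will not reproduce the rotated phases exactly.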

4. Challenges & Limiting Factors

The Roadblock: The "Denoising" Resilience of Transformers. While the system successfully injected high-entropy quantum noise and phase shifts, modern GenAI models (like ElevenLabs v2) proved remarkably resilient.

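The Issue: Current voice cloning models are often based on Diffusion or Transformer architectures, and diffusion models in particular are trained specifically to "denoise" their inputs. They effectively treat my quantum watermark as background static and filter it out during the feature extraction process. Many pipelines also extract magnitude-based features (such as mel spectrograms) that discard phase entirely, so a phase-only perturbation may be lost before the model ever sees it.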
The Result: Despite heavy signal processing, the AI was often able to reconstruct the voice features, resulting in a successful clone.
Engineering Pivot: I shifted the project focus from a "Prevention Tool" to an "Attribution Tool." Even if the voice can be cloned, the Morse Code signature layer remains detectable in the spectrogram, allowing the owner to demonstrate provenance of the original file.

5. Results & Post-Mortem

Engineering Efficiency: Successfully managed strict API limits, extracting weeks' worth of entropy in single-digit seconds of QPU time.
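Cryptographic Quality: The generated keys achieved a Shannon Entropy of 0.999 (against an ideal of 1.0), showing that the raw qubit measurements were nearly unbiased. This makes IBM's quantum hardware a viable entropy source, although a first-order entropy score alone cannot show that its output is more random than a well-seeded cryptographic PRNG.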
Defense Efficacy: Experimental. While theoretically sound against older DSP-based vocoders, this approach confirmed that Phase Poisoning alone is insufficient against modern, large-scale generative models, which "hear" through the noise better than anticipated.
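The write-up does not specify how the 0.999 figure was measured. Two common readings are sketched below, with hypothetical function names: entropy per bit from the frequency of 1s, and entropy per byte from the 256-value histogram, normalised to a maximum of 1.0. Both are first-order statistics: a value near 1.0 rules out strong bias, but a simple counter or a seeded PRNG can score just as high, so stronger randomness claims would need a fuller battery such as NIST SP 800-22.

```python
import numpy as np

def shannon_entropy_per_bit(bits):
    """Empirical Shannon entropy of a 0/1 stream, in bits per bit (max 1.0)."""
    p1 = float(np.asarray(bits).mean())
    if p1 in (0.0, 1.0):
        return 0.0  # a constant stream carries no information
    p0 = 1.0 - p1
    return float(-(p0 * np.log2(p0) + p1 * np.log2(p1)))

def shannon_entropy_per_byte(data):
    """Empirical Shannon entropy of a byte stream, normalised to [0, 1] (8 bits max)."""
    counts = np.bincount(np.frombuffer(bytes(data), dtype=np.uint8), minlength=256)
    p = counts[counts > 0] / counts.sum()
    return float(-(p * np.log2(p)).sum() / 8.0)
```

Note that a perfectly alternating `0101...` stream scores exactly 1.0 on the per-bit measure, which illustrates why this number is a bias check rather than proof of randomness.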
