Research

CAK: Emergent Audio Effects from Minimal Deep Learning | First Author
August 2025
A single 3×3 convolutional kernel with 11 learnable parameters discovers emergent audio effects when trained on 200 samples from a personalized corpus. The structured complexity of audio spectrograms, combined with targeted architectural constraints, forces the network toward minimal viable solutions where one well-learned pattern adapts contextually across diverse inputs rather than memorizing distributions.
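A minimal sketch of how an 11-parameter module of this kind could be written in PyTorch. The 9 + 1 + 1 breakdown (3×3 weights, a bias, and a learnable gate scale) and the tanh gate are assumptions made for illustration, not the paper's exact parameterization.

```python
import torch
import torch.nn as nn

class MinimalKernel(nn.Module):
    """Hypothetical 11-parameter detector: 3x3 conv weights (9) + bias (1) + gate scale (1)."""
    def __init__(self):
        super().__init__()
        # Single-channel 3x3 convolution over the spectrogram: 9 weights + 1 bias.
        self.conv = nn.Conv2d(1, 1, kernel_size=3, padding=1)
        # One learnable scalar used to soft-gate the detected pattern.
        self.gate_scale = nn.Parameter(torch.ones(1))

    def forward(self, spec):
        # spec: (batch, 1, freq, time) magnitude spectrogram.
        return torch.tanh(self.gate_scale * self.conv(spec))

kernel = MinimalKernel()
print(sum(p.numel() for p in kernel.parameters()))  # -> 11
```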
Introduces Conditioning Aware Kernels (CAK): additive residual modulation with soft-gating that preserves identity while scaling effect intensity through scalar multiplication. During training, each spectrogram is paired with a randomly sampled control scalar, and the network learns to associate arbitrary control values with proportional effect intensity through the audit game described below, rather than through explicit supervision.
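One way the additive residual modulation with a conditioning scalar might look, reusing the hypothetical MinimalKernel sketched above; the [0, 1] control range and the exact gating form are assumptions, not the paper's specification.

```python
import torch
import torch.nn as nn

class CAK(nn.Module):
    """Sketch of a Conditioning Aware Kernel: identity-preserving additive modulation."""
    def __init__(self, kernel: nn.Module):
        super().__init__()
        self.kernel = kernel  # e.g. the 11-parameter MinimalKernel above

    def forward(self, spec, control):
        # control: one scalar per batch item, broadcast over frequency and time.
        control = control.view(-1, 1, 1, 1)
        # Additive residual: at control = 0 the input passes through unchanged (identity);
        # larger control values scale the effect intensity proportionally.
        return spec + control * self.kernel(spec)

# During training, each spectrogram is paired with a randomly sampled control scalar.
cak = CAK(MinimalKernel())
spec = torch.randn(4, 1, 128, 256)   # batch of spectrograms
control = torch.rand(4)              # random control values in [0, 1]
out = cak(spec, control)
```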
Introduces AuGAN (Audit GAN, built on WGAN-GP), which reframes adversarial training from forgery detection to cooperative control verification: generator and discriminator share the same learned kernel, and the discriminator verifies that the applied effect intensity matches the requested control value. Training completes in ~2 hours on consumer hardware.
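A rough sketch of the shared-kernel audit setup under WGAN-GP. The critic architecture here (pooled kernel response plus a small linear head) and the gradient-penalty wiring are illustrative assumptions, not the published AuGAN objective.

```python
import torch
import torch.nn as nn

class AuditCritic(nn.Module):
    """Sketch of a critic that shares the learned kernel with the generator."""
    def __init__(self, shared_kernel: nn.Module):
        super().__init__()
        self.kernel = shared_kernel     # same 11-parameter kernel as the generator
        self.score = nn.Linear(2, 1)    # combines effect evidence with the claimed control

    def forward(self, spec, claimed_control):
        # Pool the shared kernel's response as evidence of how much effect was applied.
        response = self.kernel(spec).mean(dim=(1, 2, 3))
        return self.score(torch.stack([response, claimed_control], dim=1))

def gradient_penalty(critic, real, fake, control):
    # Standard WGAN-GP penalty on points interpolated between real and generated spectrograms.
    eps = torch.rand(real.size(0), 1, 1, 1)
    interp = (eps * real + (1 - eps) * fake).requires_grad_(True)
    grads = torch.autograd.grad(critic(interp, control).sum(), interp, create_graph=True)[0]
    return ((grads.flatten(1).norm(2, dim=1) - 1.0) ** 2).mean()
```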
Enables personalized effect discovery: artists train on their own curated material (roughly 50 minutes of audio) to uncover transformations specific to that corpus.