Nucleus Image
Nucleus Image introduces the first sparse Mixture-of-Experts architecture to diffusion-based image generation — activating only 2B of 17B total parameters per forward pass.
A custom Expert-Choice routing mechanism dynamically selects from 64 specialized experts plus one shared expert, enabling the model to allocate compute where it matters most.
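As a rough illustration of the general expert-choice idea (not Nucleus Image's custom router, whose details are not given here), the sketch below has each expert pick its own top-k tokens by affinity score instead of each token picking experts. The function names, the random scores, and the token count of 256 are hypothetical. Only the 64 routed experts come from the text above, and the always-on shared expert is left out because it processes every token regardless of routing.

```python
import math
import random


def softmax(row):
    """Numerically stable softmax over one token's expert logits."""
    m = max(row)
    exps = [math.exp(x - m) for x in row]
    total = sum(exps)
    return [x / total for x in exps]


def expert_choice_route(scores, capacity_factor):
    """scores[t][e] is the affinity of token t for expert e.

    Returns one list of token indices per expert. Each expert takes
    exactly k tokens, so the load is balanced by construction.
    """
    n_tokens = len(scores)
    n_experts = len(scores[0])
    # Tokens per expert: k = n_tokens * capacity_factor / n_experts
    k = min(n_tokens, max(1, int(n_tokens * capacity_factor / n_experts)))
    assignments = []
    for e in range(n_experts):
        ranked = sorted(range(n_tokens), key=lambda t: scores[t][e], reverse=True)
        assignments.append(ranked[:k])
    return assignments


random.seed(0)
N_TOKENS = 256   # hypothetical number of tokens (e.g. image patches)
N_EXPERTS = 64   # routed experts, as stated in the text

logits = [[random.gauss(0.0, 1.0) for _ in range(N_EXPERTS)] for _ in range(N_TOKENS)]
scores = [softmax(row) for row in logits]
assignments = expert_choice_route(scores, capacity_factor=2.0)

# How many experts ended up processing each token (can be 0 for some tokens).
experts_per_token = [0] * N_TOKENS
for chosen in assignments:
    for t in chosen:
        experts_per_token[t] += 1
```

With a capacity factor of 2.0, every expert handles exactly 8 of the 256 tokens, and a token is processed by 2 experts on average. Individual tokens may receive more or fewer, which is how this style of routing shifts compute toward the tokens experts score highest.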
The architecture employs Grouped-Query Attention with adaptive modulation across 32 transformer blocks, achieving state-of-the-art quality at a fraction of the compute.
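To make the Grouped-Query Attention idea concrete, here is a minimal sketch of how query heads share key/value heads. The head counts (16 query heads, 4 key/value heads) are hypothetical, since the text does not state the model's actual configuration, and the adaptive modulation is not modeled.

```python
def kv_head_for_query_head(q_head, n_q_heads, n_kv_heads):
    """Return the index of the key/value head shared by a given query head."""
    if n_q_heads % n_kv_heads != 0:
        raise ValueError("n_q_heads must be a multiple of n_kv_heads")
    group_size = n_q_heads // n_kv_heads
    return q_head // group_size


N_Q_HEADS = 16   # hypothetical, not from the source
N_KV_HEADS = 4   # hypothetical, not from the source

# Consecutive query heads fall into the same group and reuse one K/V head.
mapping = [kv_head_for_query_head(h, N_Q_HEADS, N_KV_HEADS) for h in range(N_Q_HEADS)]

# Relative to standard multi-head attention, K and V are projected
# for n_kv_heads instead of n_q_heads.
kv_reduction = N_Q_HEADS / N_KV_HEADS
```

In this configuration four query heads share each key/value head, so the key and value projections are a quarter of the size they would be with one K/V head per query head.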
Training uses a capacity factor schedule, 4.0 for early layers tapering to 2.0 for deeper layers, which lets the model learn efficient expert specialization without sacrificing diversity.
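The effect of such a schedule on per-expert load can be worked through with a small sketch. The text gives only the endpoints (4.0 and 2.0), so the linear taper below is an assumption, as are the helper names and the token count of 1024. The 32 blocks and 64 routed experts come from the description above.

```python
def capacity_schedule(n_layers, start=4.0, end=2.0):
    """Capacity factor per layer, linearly interpolated (assumed shape)."""
    if n_layers < 2:
        return [start] * n_layers
    return [start + (end - start) * i / (n_layers - 1) for i in range(n_layers)]


def tokens_per_expert(n_tokens, capacity_factor, n_experts=64):
    """Tokens each expert selects under expert-choice routing."""
    return int(n_tokens * capacity_factor / n_experts)


N_LAYERS = 32     # transformer blocks, as stated in the text
N_TOKENS = 1024   # hypothetical number of tokens per image

schedule = capacity_schedule(N_LAYERS)
first_layer_load = tokens_per_expert(N_TOKENS, schedule[0])
last_layer_load = tokens_per_expert(N_TOKENS, schedule[-1])
```

Under these assumptions each expert in the first block selects 64 of the 1024 tokens, while each expert in the last block selects 32. Because the capacity factor equals the average number of experts applied to each token, early layers spend roughly twice the routed-expert compute per token that deep layers do.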
The result: leading benchmark performance across DPG-Bench, GenEval, and overall quality metrics, at 10× parameter efficiency versus the nearest dense competitor.
Gallery
Curated generations