
Nucleus Image

The First Sparse MoE Diffusion Transformer · Leading Performance

2B active params
17B total params
64 experts

Nucleus Image introduces the first sparse Mixture-of-Experts architecture to diffusion-based image generation — activating only 2B of 17B total parameters per forward pass.

A custom Expert-Choice routing mechanism dynamically selects from 64 specialized experts plus one shared expert, enabling the model to allocate compute where it matters most.
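
Expert-Choice routing inverts the usual token-choice scheme: each expert selects the tokens it scores highest, rather than each token selecting experts. The sketch below is a minimal numpy illustration of that idea, not Nucleus Image's actual implementation; all names (`expert_choice_route`, `w_router`) and the toy sizes are assumptions for clarity, and the shared expert (which processes every token unconditionally) is omitted.

```python
import numpy as np

def expert_choice_route(x, w_router, capacity):
    """Expert-choice routing sketch: each expert picks its top-`capacity` tokens.

    x        : (tokens, d_model) token activations
    w_router : (d_model, n_experts) router projection (illustrative)
    capacity : number of tokens each expert may accept
    Returns {expert_id: (token_indices, gate_weights)}.
    """
    logits = x @ w_router                                   # (tokens, n_experts)
    # softmax over experts gives each token's affinity for each expert
    scores = np.exp(logits - logits.max(axis=1, keepdims=True))
    scores = scores / scores.sum(axis=1, keepdims=True)
    assignments = {}
    for e in range(w_router.shape[1]):
        # the expert, not the token, chooses: take its highest-scoring tokens
        top = np.argsort(scores[:, e])[-capacity:]
        assignments[e] = (top, scores[top, e])
    return assignments

# toy example with made-up sizes
rng = np.random.default_rng(0)
tokens, d_model, n_experts, capacity = 16, 8, 4, 4
x = rng.normal(size=(tokens, d_model))
w = rng.normal(size=(d_model, n_experts))
routed = expert_choice_route(x, w, capacity)
```

Because every expert fills exactly `capacity` slots, the compute load per expert is fixed by construction; load balancing falls out of the routing rule instead of requiring an auxiliary loss.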

The architecture employs Grouped-Query Attention with adaptive modulation across 32 transformer blocks, achieving state-of-the-art quality at a fraction of the compute.
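
In Grouped-Query Attention, several query heads share a single key/value head, shrinking the KV projections relative to full multi-head attention. A minimal numpy sketch of the sharing pattern follows; shapes and names are illustrative, and the adaptive modulation (condition-dependent scale/shift around the norms) is left out to keep the example focused.

```python
import numpy as np

def grouped_query_attention(x, wq, wk, wv, n_q_heads, n_kv_heads):
    """GQA sketch: n_q_heads query heads share n_kv_heads key/value heads."""
    T, D = x.shape
    hd = D // n_q_heads                       # per-head dimension
    q = (x @ wq).reshape(T, n_q_heads, hd)
    k = (x @ wk).reshape(T, n_kv_heads, hd)   # fewer KV heads than Q heads
    v = (x @ wv).reshape(T, n_kv_heads, hd)
    group = n_q_heads // n_kv_heads           # query heads per KV head
    out = np.empty_like(q)
    for h in range(n_q_heads):
        kv = h // group                       # which shared KV head this query head uses
        att = q[:, h] @ k[:, kv].T / np.sqrt(hd)
        att = np.exp(att - att.max(axis=1, keepdims=True))
        att /= att.sum(axis=1, keepdims=True) # row-wise softmax
        out[:, h] = att @ v[:, kv]
    return out.reshape(T, D)

# toy sizes, not the model's real d_model = 3,072
rng = np.random.default_rng(1)
T, D, n_q, n_kv = 6, 12, 4, 2
hd = D // n_q
x = rng.normal(size=(T, D))
wq = rng.normal(size=(D, D))
wk = rng.normal(size=(D, n_kv * hd))
wv = rng.normal(size=(D, n_kv * hd))
y = grouped_query_attention(x, wq, wk, wv, n_q, n_kv)
```

With `n_kv < n_q`, the K and V projection matrices (and the KV cache at inference) shrink by the grouping factor while query expressivity is preserved.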

Training uses a capacity factor schedule, 4.0 for early layers tapering to 2.0 for deeper layers, so the model learns efficient expert specialization without sacrificing diversity.
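
The capacity factor sets how many tokens each expert may accept relative to an even split. The helper below sketches the stated schedule; the source specifies 4.0 for layers 1–2 and 2.0 from layer 5 on, so the linear ramp over the intermediate layers is purely an assumption for illustration.

```python
import numpy as np

def expert_capacity(layer, tokens, n_experts, early_cf=4.0, late_cf=2.0):
    """Tokens each expert may accept at a given (1-indexed) layer.

    Source values: cf = 4.0 for layers 1-2, cf = 2.0 for layers 5+.
    The linear taper in between is an assumption, not the documented schedule.
    """
    if layer <= 2:
        cf = early_cf
    elif layer >= 5:
        cf = late_cf
    else:
        cf = early_cf + (late_cf - early_cf) * (layer - 2) / 3  # assumed ramp
    # capacity = cf * (tokens / n_experts), rounded up
    return int(np.ceil(cf * tokens / n_experts))
```

For example, with 4,096 tokens and 64 experts, an early layer lets each expert take up to 256 tokens, a deep layer up to 128: a high early capacity lets experts see overlapping token sets before specializing, while the lower deep capacity enforces sparsity.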

The result: leading benchmark performance across DPG-Bench, GenEval, and overall quality metrics, at 10× parameter efficiency versus the nearest dense competitor.
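
As a quick sanity check on the headline numbers, the sparsity implied by the specs above works out as follows:

```python
total_params = 17e9    # total parameters (17B)
active_params = 2e9    # parameters activated per forward pass (2B)

active_fraction = active_params / total_params
print(f"{active_fraction:.1%} of weights active per forward pass")  # ≈ 11.8%
```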

Architecture

Nucleus-Image · Transformer Block Diagram

[Diagram: the core transformer block, repeated 32×. Noisy latents pass through RMSNorm and cross-attention against the condition; then RMSNorm with modulation into Grouped-Query Attention (d_model = 3,072) with a residual connection; then a second RMSNorm with modulation into an Expert-Choice MoE layer (64 routed experts, top-k, plus one shared expert) with another residual. AdaLayerNorm and a linear output projection emit the denoised latents. Expanded view: the MoE router takes the unmodulated activations and the timestep, dispatching tokens across the 64 routed experts alongside the always-on shared expert.]
Architecture Note
First 3 blocks use dense FFN with hidden size 2,048 instead of MoE.
Model Specifications

Total parameters: 17B
Active parameters: 2B
Experts: 64
Shared experts: 1
Capacity factor: 4.0 (layers 1–2), 2.0 (layers 5+)

Gallery

Curated generations