
Nucleus Image

The First Sparse MoE Diffusion Transformer · Leading Performance

2B active params
17B total params
64 experts

Nucleus Image introduces the first sparse Mixture-of-Experts architecture to diffusion-based image generation — activating only 2B of 17B total parameters per forward pass.

A custom Expert-Choice routing mechanism dynamically selects from 64 specialized experts plus one shared expert, enabling the model to allocate compute where it matters most.
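
Expert-Choice routing inverts the usual token-choice scheme: each expert selects the tokens it scores highest, rather than each token selecting experts. The sketch below is a minimal numpy illustration of that idea, not Nucleus Image's actual implementation; all names (`expert_choice_route`, `w_router`) and the toy sizes are assumptions for clarity, and the shared expert (which processes every token unconditionally) is omitted.

```python
import numpy as np

def expert_choice_route(x, w_router, capacity):
    """Expert-choice routing sketch: each expert picks its top-`capacity` tokens.

    x        : (tokens, d_model) token activations
    w_router : (d_model, n_experts) router projection (illustrative)
    capacity : number of tokens each expert may accept
    Returns {expert_id: (token_indices, gate_weights)}.
    """
    logits = x @ w_router                                   # (tokens, n_experts)
    # softmax over experts gives each token's affinity for each expert
    scores = np.exp(logits - logits.max(axis=1, keepdims=True))
    scores = scores / scores.sum(axis=1, keepdims=True)
    assignments = {}
    for e in range(w_router.shape[1]):
        # the expert, not the token, chooses: take its highest-scoring tokens
        top = np.argsort(scores[:, e])[-capacity:]
        assignments[e] = (top, scores[top, e])
    return assignments

# toy example with made-up sizes
rng = np.random.default_rng(0)
tokens, d_model, n_experts, capacity = 16, 8, 4, 4
x = rng.normal(size=(tokens, d_model))
w = rng.normal(size=(d_model, n_experts))
routed = expert_choice_route(x, w, capacity)
```

Because every expert fills exactly `capacity` slots, the compute load per expert is fixed by construction; load balancing falls out of the routing rule instead of requiring an auxiliary loss.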

The architecture employs Grouped-Query Attention with adaptive modulation across 32 transformer blocks, achieving state-of-the-art quality at a fraction of the compute.
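
In Grouped-Query Attention, several query heads share a single key/value head, shrinking the KV projections relative to full multi-head attention. A minimal numpy sketch of the sharing pattern follows; shapes and names are illustrative, and the adaptive modulation (condition-dependent scale/shift around the norms) is left out to keep the example focused.

```python
import numpy as np

def grouped_query_attention(x, wq, wk, wv, n_q_heads, n_kv_heads):
    """GQA sketch: n_q_heads query heads share n_kv_heads key/value heads."""
    T, D = x.shape
    hd = D // n_q_heads                       # per-head dimension
    q = (x @ wq).reshape(T, n_q_heads, hd)
    k = (x @ wk).reshape(T, n_kv_heads, hd)   # fewer KV heads than Q heads
    v = (x @ wv).reshape(T, n_kv_heads, hd)
    group = n_q_heads // n_kv_heads           # query heads per KV head
    out = np.empty_like(q)
    for h in range(n_q_heads):
        kv = h // group                       # which shared KV head this query head uses
        att = q[:, h] @ k[:, kv].T / np.sqrt(hd)
        att = np.exp(att - att.max(axis=1, keepdims=True))
        att /= att.sum(axis=1, keepdims=True) # row-wise softmax
        out[:, h] = att @ v[:, kv]
    return out.reshape(T, D)

# toy sizes, not the model's real d_model = 3,072
rng = np.random.default_rng(1)
T, D, n_q, n_kv = 6, 12, 4, 2
hd = D // n_q
x = rng.normal(size=(T, D))
wq = rng.normal(size=(D, D))
wk = rng.normal(size=(D, n_kv * hd))
wv = rng.normal(size=(D, n_kv * hd))
y = grouped_query_attention(x, wq, wk, wv, n_q, n_kv)
```

With `n_kv < n_q`, the K and V projection matrices (and the KV cache at inference) shrink by the grouping factor while query expressivity is preserved.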

Training uses a capacity factor schedule, 4.0 for early layers tapering to 2.0 for deeper layers, so the model learns efficient expert specialization without sacrificing diversity.
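
The capacity factor sets how many tokens each expert may accept relative to an even split. The helper below sketches the stated schedule; the source specifies 4.0 for layers 1–2 and 2.0 from layer 5 on, so the linear ramp over the intermediate layers is purely an assumption for illustration.

```python
import numpy as np

def expert_capacity(layer, tokens, n_experts, early_cf=4.0, late_cf=2.0):
    """Tokens each expert may accept at a given (1-indexed) layer.

    Source values: cf = 4.0 for layers 1-2, cf = 2.0 for layers 5+.
    The linear taper in between is an assumption, not the documented schedule.
    """
    if layer <= 2:
        cf = early_cf
    elif layer >= 5:
        cf = late_cf
    else:
        cf = early_cf + (late_cf - early_cf) * (layer - 2) / 3  # assumed ramp
    # capacity = cf * (tokens / n_experts), rounded up
    return int(np.ceil(cf * tokens / n_experts))
```

For example, with 4,096 tokens and 64 experts, an early layer lets each expert take up to 256 tokens, a deep layer up to 128: a high early capacity lets experts see overlapping token sets before specializing, while the lower deep capacity enforces sparsity.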

The result: leading benchmark performance across DPG-Bench, GenEval, and overall quality metrics, at 10× parameter efficiency versus the nearest dense competitor.
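
As a quick sanity check on the headline numbers, the sparsity implied by the specs above works out as follows:

```python
total_params = 17e9    # total parameters (17B)
active_params = 2e9    # parameters activated per forward pass (2B)

active_fraction = active_params / total_params
print(f"{active_fraction:.1%} of weights active per forward pass")  # ≈ 11.8%
```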

Architecture

Nucleus-Image · Transformer Block Diagram

[Diagram: the core transformer block, repeated 32×. Noisy latents pass through RMSNorm and cross-attention against the condition; then RMSNorm with modulation into Grouped-Query Attention (d_model = 3,072) with a residual connection; then a second RMSNorm with modulation into an Expert-Choice MoE layer (64 routed experts, top-k, plus one shared expert) with another residual. AdaLayerNorm and a linear output projection emit the denoised latents. Expanded view: the MoE router takes the unmodulated activations and the timestep, dispatching tokens across the 64 routed experts alongside the always-on shared expert.]
Architecture Note
First 3 blocks use dense FFN with hidden size 2,048 instead of MoE.
Model Specifications

Total parameters: 17B
Active parameters: 2B
Experts: 64
Shared experts: 1
Capacity factor: 4.0 (layers 1–2), 2.0 (layers 5+)

Gallery

Curated generations