There’s a small army of on-device models coming to Chrome

Technical interpretations and parameter breakdowns for various AI models, including Gemini, Gemma, ULM, and StableLM, covering architecture and scale.

Today's landscape of large language models is highly specialized, with a heavy focus on speed, efficiency, and on-device performance. We see this in compact systems like ULM128M, likely a “Universal Language Model” with just 128 million parameters for lightweight applications. There is also an instruction-tuned model with 1 billion parameters (LLMIT1B), optimized to follow human directions quickly in chat and virtual assistants.

Google's Gemini family features several extra-small (XS) and second-generation Nano variants. Many of these appear to be designed for edge computing and to use lower-precision quantization to save memory. Some, at around the 700-million-parameter mark, are causal drafters that rapidly generate initial text. Even the larger Gemini XL drafter uses a 24-layer structure to streamline generation.

The Gemma series, in both its second and third generations, offers a scalable approach. The listed models range from a light 1-billion-parameter version up to a more robust 27-billion-parameter configuration, letting developers trade speed against capability.

Finally, models like StableLM demonstrate the push for mobile deployment. By packing 3 billion parameters into a format optimized for TensorFlow Lite, this kind of architecture shows that the future of artificial intelligence isn't just about getting bigger. It is about getting smarter, faster, and much closer to the user.

ULM128M
LLMIT1B
GEMINI2_NANOV2
GEMINI2_NANOV2_EE2Q
GEMINI_XS
GEMINI_XS_DRAFTER_6LAYER_CAUSAL_USM_700M_RESIDUAL
GEMINI_XS_LUSM_700M_RESIDUAL_BOTTOM15
GEMINI2_NANOV2_EE12Q
GEMINI2_NANOV2_EE2_LUSM_700M
GEMINI2_NANOV2_CAUSAL_700M
GEMINI2_NANOV2_EE20_CAUSAL_LUSM_700M
GEMINI_XL_DRAFTER_24LAYER
GEMINI_XS_FA1
GEMMA2_8B
GEMMA2_7B
GEMMA2_2B
GEMMA3_1B
GEMMA3_4B
GEMMA3_12B
GEMMA3_27B
STABLELM_4E1T_3B_PHI_2_TF_LITE
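
Several of the names above appear to end a token with a size such as 128M, 700M, or 27B. A minimal sketch of pulling that number out, assuming “M” means million and “B” means billion (the function name parameter_count is hypothetical, and this is only one way to read the names):

```python
import re

# Assumption: "M" = million and "B" = billion, as in the interpretations in this post.
_MULTIPLIER = {"M": 1_000_000, "B": 1_000_000_000}

# A size is a run of digits followed by M or B at the very end of a token.
_SIZE = re.compile(r"(\d+)([MB])$")


def parameter_count(identifier):
    """Return the parameter count encoded in a name, or None if no size token is found."""
    for token in identifier.split("_"):
        match = _SIZE.search(token)
        if match:
            return int(match.group(1)) * _MULTIPLIER[match.group(2)]
    return None


for name in ["ULM128M", "GEMINI2_NANOV2_CAUSAL_700M", "GEMMA3_27B", "GEMINI_XS_FA1"]:
    print(name, parameter_count(name))
```

Names without a size token, such as GEMINI_XS_FA1, return None rather than a guess.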

1. ULM128M

Interpretation:
Likely a “Universal Language Model” with 128 million parameters. Models of this size are common in smaller, efficient AI applications.

2. LLMIT1B

Interpretation:
Large Language Model, Instruction-Tuned, 1 Billion parameters.
- LLM: Large Language Model
- IT: Instruction-Tuned (fine-tuned to follow human instructions for chat, Q&A, etc.)
- 1B: 1 billion parameters
Typical Use Case:
A compact, efficient instruction-following model designed for conversational agents, chatbots, and smart assistants—optimized for inference speed while maintaining the ability to understand and follow complex user instructions.

3. GEMINI2_NANOV2

Interpretation:
“Gemini2” likely refers to the second generation of Google’s Gemini model, with “NanoV2” being version 2 of its smallest, most efficient “Nano” tier.

4. GEMINI2_NANOV2_EE2Q

Interpretation:
A variant of Gemini2 NanoV2. The trailing “Q” probably marks a quantized, lower-precision build (the “2” could even indicate 2-bit weights), and “EE” could mean “Edge-Enhanced.”
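
To see why quantization matters on-device, here is a rough sketch of weight storage at different precisions. The 700M parameter count is illustrative, borrowed from other names in the list (this entry does not state a size), and the function name weight_bytes is hypothetical. The estimate covers the weights only and ignores activations, caches, and any per-block scaling data a real quantization scheme would store.

```python
def weight_bytes(parameters, bits_per_weight):
    """Approximate storage for the weights alone: parameters * bits / 8."""
    return parameters * bits_per_weight // 8


# Illustrative parameter count, not a confirmed size for this model.
params = 700_000_000

for bits in (16, 8, 4, 2):
    megabytes = weight_bytes(params, bits) / 1e6
    print(f"{bits:>2}-bit weights: about {megabytes:.0f} MB")
```

Under these assumptions, halving the bits per weight halves the storage needed for the weights.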

5. GEMINI_XS

Interpretation:
“Gemini Extra Small”—likely the smallest, most efficient Gemini variant.

6. GEMINI_XS_DRAFTER_6LAYER_CAUSAL_USM_700M_RESIDUAL

Interpretation:
- “XS Drafter” = Gemini Extra Small, used for drafting (possibly text generation).
- “6Layer” = 6 transformer layers.
- “Causal” = Unidirectional, like GPT.
- “USM_700M” = a 700M-parameter “USM”, possibly a Universal Sentence Model, though USM is also the name of Google’s Universal Speech Model.
- “Residual” = Uses residual connections for better training/stability.
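
The token-by-token reading above can be sketched as a small labelling function. The labels follow this post's guesses, not any official naming scheme, and the function name describe is hypothetical.

```python
import re


def describe(identifier):
    """Label each underscore-separated token using the guessed meanings above."""
    labels = {}
    for token in identifier.split("_"):
        layer = re.fullmatch(r"(\d+)LAYER", token)
        size = re.fullmatch(r"(\d+)([MB])", token)
        if layer:
            labels["layers"] = int(layer.group(1))  # e.g. 6LAYER -> 6
        elif size:
            labels["size"] = token  # e.g. 700M
        elif token in {"DRAFTER", "CAUSAL", "RESIDUAL"}:
            labels[token.lower()] = True  # simple on/off traits
        else:
            labels.setdefault("other", []).append(token)  # unexplained tokens
    return labels


print(describe("GEMINI_XS_DRAFTER_6LAYER_CAUSAL_USM_700M_RESIDUAL"))
```

Tokens the sketch cannot classify, such as USM, are collected under “other” instead of being assigned a meaning.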

7. GEMINI_XS_LUSM_700M_RESIDUAL_BOTTOM15

Interpretation:
Similar to above, but “LUSM” could be a variant of the universal model, and “BOTTOM15” may mean it’s using the bottom 15 layers (or some layer selection trick).

8. GEMINI2_NANOV2_EE12Q

Interpretation:
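Gemini2 NanoV2, probably with Edge-Enhanced (EE) features. “12Q” may indicate 12-bit quantization or a quantization scheme, although by analogy with “EE2” and “EE20” in the other names it may instead parse as “EE12” plus “Q” for quantized.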

9. GEMINI2_NANOV2_EE2_LUSM_700M

Interpretation:
Another Gemini2 NanoV2 variant with Edge-Enhanced 2, using an LUSM model with 700M parameters.

10. GEMINI2_NANOV2_CAUSAL_700M

Interpretation:
Gemini2 NanoV2, causal (unidirectional), with 700M parameters.

11. GEMINI2_NANOV2_EE20_CAUSAL_LUSM_700M

Interpretation:
Gemini2 NanoV2, Edge-Enhanced 20 (version or setting), causal, LUSM, 700M parameters.

12. GEMINI_XL_DRAFTER_24LAYER

Interpretation:
- “XL” = Extra Large variant.
- “Drafter” = Possibly optimized for initial text generation or suggestion.
- “24Layer” = 24 transformer layers.

13. GEMINI_XS_FA1

Interpretation:
Gemini Extra Small, “FA1” could be “Fast Architecture 1” or a specific feature set/version.

14. GEMMA2_8B

Interpretation:
Gemma model, version 2, with 8 billion parameters.

15. GEMMA2_7B

Interpretation:
Gemma version 2, 7 billion parameters.

16. GEMMA2_2B

Interpretation:
Gemma version 2, 2 billion parameters.

17. GEMMA3_1B

Interpretation:
Gemma version 3, 1 billion parameters.

18. GEMMA3_4B

Interpretation:
Gemma version 3, 4 billion parameters.

19. GEMMA3_12B

Interpretation:
Gemma version 3, 12 billion parameters.

20. GEMMA3_27B

Interpretation:
Gemma version 3, 27 billion parameters.
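
The Gemma entries follow a regular pattern, so they can be grouped by generation with a short sketch. The pattern GEMMA<version>_<size>B is inferred from the names listed in this post, and sizes_by_generation is a hypothetical helper name.

```python
import re
from collections import defaultdict

GEMMA_NAMES = [
    "GEMMA2_8B", "GEMMA2_7B", "GEMMA2_2B",
    "GEMMA3_1B", "GEMMA3_4B", "GEMMA3_12B", "GEMMA3_27B",
]


def sizes_by_generation(names):
    """Map each Gemma generation to its sorted sizes in billions of parameters."""
    groups = defaultdict(list)
    for name in names:
        match = re.fullmatch(r"GEMMA(\d+)_(\d+)B", name)
        if match:  # names that do not fit the assumed pattern are skipped
            groups[int(match.group(1))].append(int(match.group(2)))
    return {gen: sorted(sizes) for gen, sizes in sorted(groups.items())}


print(sizes_by_generation(GEMMA_NAMES))
# {2: [2, 7, 8], 3: [1, 4, 12, 27]}
```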

21. STABLELM_4E1T_3B_PHI_2_TF_LITE

Interpretation:
- “StableLM” = Stable Language Model (by Stability AI).
- “4E1T” = Possibly a version or internal code.
- “3B” = 3 billion parameters.
- “PHI_2” = Possibly related to Microsoft’s Phi-2 model or a version.
- “TF_LITE” = TensorFlow Lite (optimized for mobile/edge deployment).

Dan Petrovic · Jun 05, 14:01