
There’s a small army of on-device models coming to Chrome

Technical interpretations and parameter breakdowns for various AI models, including Gemini, Gemma, ULM, and StableLM, covering architecture and scale.


The models on this list are highly specialized, with a heavy focus on speed, efficiency, and on-device performance. At the small end is what is likely a Universal Language Model with just one hundred and twenty-eight million parameters, suited to lightweight applications. There also appears to be an instruction-tuned model with one billion parameters, optimized to follow user instructions quickly in chat and virtual-assistant scenarios.

Google's Gemini family accounts for most of the list, with several extra-small (XS) and second-generation Nano variants. Nano is the Gemini tier designed for on-device use, and several of these names carry suffixes that may indicate quantization to lower precision to save memory. A number of variants include a seven-hundred-million-parameter component and causal (left-to-right) attention, and at least one is labeled a drafter, a model that rapidly proposes initial text. A larger twenty-four-layer Gemini XL drafter also appears.

The Gemma entries span the second and third generations and a wide range of sizes, from light one-billion-parameter versions up to twenty-seven-billion-parameter configurations, covering different trade-offs between speed and capability.

Finally, models like StableLM demonstrate the push for mobile deployment. By packing three billion parameters into a format optimized for TensorFlow Lite, these architectures show that the future of artificial intelligence isn't just about getting bigger. It is about getting smarter, faster, and much closer to the user.

  1. ULM128M
  2. LLMIT1B
  3. GEMINI2_NANOV2
  4. GEMINI2_NANOV2_EE2Q
  5. GEMINI_XS
  6. GEMINI_XS_DRAFTER_6LAYER_CAUSAL_USM_700M_RESIDUAL
  7. GEMINI_XS_LUSM_700M_RESIDUAL_BOTTOM15
  8. GEMINI2_NANOV2_EE12Q
  9. GEMINI2_NANOV2_EE2_LUSM_700M
  10. GEMINI2_NANOV2_CAUSAL_700M
  11. GEMINI2_NANOV2_EE20_CAUSAL_LUSM_700M
  12. GEMINI_XL_DRAFTER_24LAYER
  13. GEMINI_XS_FA1
  14. GEMMA2_8B
  15. GEMMA2_7B
  16. GEMMA2_2B
  17. GEMMA3_1B
  18. GEMMA3_4B
  19. GEMMA3_12B
  20. GEMMA3_27B
  21. STABLELM_4E1T_3B_PHI_2_TF_LITE

1. ULM128M

  • Interpretation:
    Likely a “Universal Language Model” with 128 million parameters, a size typical of small, efficient models used for lightweight tasks.

2. LLMIT1B

  • Interpretation:
    Large Language Model, Instruction-Tuned, 1 billion parameters.
    • LLM: Large Language Model
    • IT: Instruction-Tuned (fine-tuned to follow human instructions for chat, Q&A, etc.)
    • 1B: 1 billion parameters
  • Typical Use Case:
    A compact instruction-following model suited to conversational agents, chatbots, and smart assistants, trading some capability for inference speed while still following user instructions.

3. GEMINI2_NANOV2

  • Interpretation:
    Likely Google’s Gemini 2 generation in its “NanoV2” form, i.e. the second version of Nano, the smallest Gemini tier, which is designed for on-device use.

4. GEMINI2_NANOV2_EE2Q

  • Interpretation:
    A variant of Gemini2 NanoV2. The trailing “Q” probably indicates quantization to lower precision, and the “2” could mean 2-bit weights or simply a variant number. “EE” might stand for “Edge-Enhanced.”
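To make the quantization reading concrete, here is a minimal sketch of symmetric uniform (per-tensor) quantization. It assumes the suffix really does denote n-bit weight quantization, which the name alone does not confirm, and the `quantize`/`dequantize` helpers are hypothetical; this is a generic textbook scheme, not how Google quantizes Gemini Nano.

```python
def quantize(weights: list[float], bits: int) -> tuple[list[int], float]:
    """Symmetric uniform quantization: map floats to signed integers in [-qmax, qmax]."""
    if bits < 2:
        raise ValueError("symmetric quantization needs at least 2 bits")
    qmax = 2 ** (bits - 1) - 1  # 1 for 2-bit, 7 for 4-bit, 127 for 8-bit
    max_abs = max(abs(w) for w in weights)
    scale = max_abs / qmax if max_abs > 0 else 1.0
    # Round each weight to the nearest integer step; clamp guards against float drift.
    q = [max(-qmax, min(qmax, round(w / scale))) for w in weights]
    return q, scale


def dequantize(q: list[int], scale: float) -> list[float]:
    """Recover approximate float weights from the integer codes and the shared scale."""
    return [v * scale for v in q]


weights = [0.42, -1.3, 0.05, 0.9, -0.27]
for bits in (2, 4, 8):
    q, scale = quantize(weights, bits)
    restored = dequantize(q, scale)
    err = max(abs(a - b) for a, b in zip(weights, restored))
    print(f"{bits}-bit: codes={q} max error={err:.4f}")
```

Fewer bits means fewer distinct levels per weight: at 2 bits this symmetric scheme only has the codes -1, 0 and 1, which saves memory at the cost of a much larger rounding error.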

5. GEMINI_XS

  • Interpretation:
    “Gemini Extra Small”—likely the smallest, most efficient Gemini variant.

6. GEMINI_XS_DRAFTER_6LAYER_CAUSAL_USM_700M_RESIDUAL

  • Interpretation:
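    • “XS Drafter” = Gemini Extra Small model used as a drafter, i.e. a fast model that proposes candidate text.
    • “6Layer” = 6 transformer layers.
    • “Causal” = unidirectional (left-to-right) attention, as in GPT-style decoders.
    • “USM_700M” = a 700M-parameter USM component, possibly a Universal Sentence Model; Google’s published Universal Speech Model also goes by USM, so the expansion is uncertain.
    • “Residual” = uses residual connections for better training stability.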
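The “Causal” and “Residual” tokens name standard transformer ideas. The sketch below shows both on toy numbers: a causal mask lets position i attend only to positions 0 through i, and a residual connection adds a sub-layer's input back to its output. It is a generic illustration in plain Python, not a reconstruction of the drafter's actual architecture; `causal_mask`, `masked_softmax` and `residual` are hypothetical helper names.

```python
import math


def causal_mask(n: int) -> list[list[bool]]:
    # mask[i][j] is True when query position i may attend to key position j (only j <= i).
    return [[j <= i for j in range(n)] for i in range(n)]


def masked_softmax(scores: list[float], allowed: list[bool]) -> list[float]:
    # Masked-out (future) positions get zero weight; the rest are softmax-normalized.
    m = max(s for s, ok in zip(scores, allowed) if ok)
    exps = [math.exp(s - m) if ok else 0.0 for s, ok in zip(scores, allowed)]
    total = sum(exps)
    return [e / total for e in exps]


def residual(x: list[float], sublayer) -> list[float]:
    # Residual connection: output = x + f(x), so the input passes through unchanged.
    return [a + b for a, b in zip(x, sublayer(x))]


mask = causal_mask(4)
weights = masked_softmax([0.5, 1.0, 2.0, 3.0], mask[1])  # position 1 sees only 0 and 1
print(weights)  # positions 2 and 3 (the "future") receive weight 0.0
print(residual([1.0, 2.0], lambda v: [0.1 * t for t in v]))
```

Even though positions 2 and 3 have the highest raw scores, the mask zeroes them out, which is what makes generation strictly left-to-right.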

7. GEMINI_XS_LUSM_700M_RESIDUAL_BOTTOM15

  • Interpretation:
    Similar to item 6, but without the drafter and layer-count tokens. “LUSM” could be a variant of the USM component, and “BOTTOM15” may mean that only the bottom (first) 15 layers are used, or some other layer-selection scheme.
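If “BOTTOM15” does refer to using only the lowest 15 layers of a deeper network (an assumption; the name alone does not confirm it), the mechanism amounts to running a truncated layer stack. The toy sketch below uses a hypothetical `run_bottom_layers` helper and made-up layers whose effect is easy to trace; the 20-layer depth is arbitrary.

```python
from typing import Callable

# A layer maps a hidden-state vector to a new hidden-state vector.
Layer = Callable[[list[float]], list[float]]


def run_bottom_layers(layers: list[Layer], x: list[float], k: int) -> list[float]:
    """Apply only the bottom (first) k layers of a stack, in order."""
    if not 0 < k <= len(layers):
        raise ValueError("k must be between 1 and the number of layers")
    for layer in layers[:k]:
        x = layer(x)
    return x


# Toy 20-layer stack: layer i adds i + 1 to every element,
# so the output shows exactly which layers ran.
stack: list[Layer] = [lambda v, i=i: [t + (i + 1) for t in v] for i in range(20)]

print(run_bottom_layers(stack, [0.0], 15))  # → [120.0], i.e. 1 + 2 + ... + 15
```

Skipping the upper layers cuts compute roughly in proportion to the number of layers dropped, at the cost of whatever the skipped layers would have contributed.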

8. GEMINI2_NANOV2_EE12Q

  • Interpretation:
    Gemini2 NanoV2, probably with Edge-Enhanced (EE) features. “12Q” may indicate 12-bit quantization, although 12-bit weights are less common than 8- or 4-bit, so it could also be a numbered quantization scheme.

9. GEMINI2_NANOV2_EE2_LUSM_700M

  • Interpretation:
    Another Gemini2 NanoV2 variant with “EE2” (possibly Edge-Enhanced 2), combined with a 700M-parameter LUSM component.

10. GEMINI2_NANOV2_CAUSAL_700M

  • Interpretation:
    Gemini2 NanoV2, causal (unidirectional), with 700M parameters.

11. GEMINI2_NANOV2_EE20_CAUSAL_LUSM_700M

  • Interpretation:
    Gemini2 NanoV2, Edge-Enhanced 20 (version or setting), causal, LUSM, 700M parameters.

12. GEMINI_XL_DRAFTER_24LAYER

  • Interpretation:
    • “XL” = Extra Large variant.
    • “Drafter” = possibly a draft model that quickly proposes candidate tokens for a larger model to check, as in speculative decoding.
    • “24Layer” = 24 transformer layers.
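In speculative decoding, a small draft model proposes several tokens and a larger target model verifies them, keeping the longest agreeing prefix. Whether Chrome's drafters are used this way is an assumption. The sketch below shows one greedy round with two hypothetical toy models standing in for real networks; `speculative_step`, `draft` and `target` are illustrative names, not a Chrome API.

```python
from typing import Callable

# A toy "model": maps the tokens so far to its greedy next token.
Model = Callable[[list[str]], str]


def speculative_step(draft: Model, target: Model, prefix: list[str], k: int) -> list[str]:
    """One round of greedy speculative decoding; returns the tokens accepted this round."""
    # 1. The cheap drafter proposes k tokens autoregressively.
    proposal: list[str] = []
    ctx = list(prefix)
    for _ in range(k):
        tok = draft(ctx)
        proposal.append(tok)
        ctx.append(tok)

    # 2. The target checks each proposed token. Real systems score all k positions
    #    in one parallel forward pass; this loop does it sequentially for clarity.
    accepted: list[str] = []
    ctx = list(prefix)
    for tok in proposal:
        expected = target(ctx)
        if expected != tok:
            accepted.append(expected)  # first mismatch: keep the target's token and stop
            return accepted
        accepted.append(tok)
        ctx.append(tok)

    # 3. Every draft token matched, so the target adds one more token of its own.
    accepted.append(target(ctx))
    return accepted


SENTENCE = "the quick brown fox jumps over the lazy dog".split()


def target(ctx: list[str]) -> str:
    # Hypothetical "large" model: always knows the next word of SENTENCE.
    return SENTENCE[len(ctx)] if len(ctx) < len(SENTENCE) else "<eos>"


def draft(ctx: list[str]) -> str:
    # Hypothetical drafter: agrees with the target except that it guesses "cat" for "fox".
    tok = target(ctx)
    return "cat" if tok == "fox" else tok


print(speculative_step(draft, target, ["the"], 4))  # → ['quick', 'brown', 'fox']
print(speculative_step(draft, target, ["the", "quick", "brown", "fox"], 3))  # → ['jumps', 'over', 'the', 'lazy']
```

The output always matches what the target alone would produce; the drafter only changes how many target passes are needed, which is why a small, fast drafter can speed up generation without changing the text.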

13. GEMINI_XS_FA1

  • Interpretation:
    Gemini Extra Small, “FA1” could be “Fast Architecture 1” or a specific feature set/version.

14. GEMMA2_8B

  • Interpretation:
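    Gemma model, version 2, with 8 billion parameters. Publicly released Gemma 2 checkpoints came in 2B, 9B, and 27B sizes, so this may be an internal or differently labeled configuration.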

15. GEMMA2_7B

  • Interpretation:
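    Gemma version 2, 7 billion parameters. 7B was a first-generation Gemma size rather than a published Gemma 2 size, so the label may not map directly onto a public checkpoint.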

16. GEMMA2_2B

  • Interpretation:
    Gemma version 2, 2 billion parameters.

17. GEMMA3_1B

  • Interpretation:
    Gemma version 3, 1 billion parameters.

18. GEMMA3_4B

  • Interpretation:
    Gemma version 3, 4 billion parameters.

19. GEMMA3_12B

  • Interpretation:
    Gemma version 3, 12 billion parameters.

20. GEMMA3_27B

  • Interpretation:
    Gemma version 3, 27 billion parameters.
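Parameter count translates fairly directly into the memory needed just to hold the weights: bytes ≈ parameters × bits per weight ÷ 8. The sketch below applies that rule of thumb to the Gemma sizes listed above. It ignores activations, the KV cache and runtime overhead, so real footprints are larger, and the 16-, 8- and 4-bit precisions are illustrative choices, not what Chrome actually ships.

```python
# Parameter counts read at face value from the Gemma identifiers above.
GEMMA_PARAMS = {
    "GEMMA3_1B": 1e9,
    "GEMMA2_2B": 2e9,
    "GEMMA3_4B": 4e9,
    "GEMMA2_7B": 7e9,
    "GEMMA2_8B": 8e9,
    "GEMMA3_12B": 12e9,
    "GEMMA3_27B": 27e9,
}


def weight_gib(params: float, bits_per_weight: int) -> float:
    """Approximate size of the weights alone in GiB (2**30 bytes): params * bits / 8 bytes."""
    return params * bits_per_weight / 8 / 2**30


for name, params in GEMMA_PARAMS.items():
    sizes = "  ".join(f"{bits:>2}-bit {weight_gib(params, bits):6.2f} GiB" for bits in (16, 8, 4))
    print(f"{name:<11} {sizes}")
```

The arithmetic makes the on-device trade-off visible: halving the bits per weight halves the weight memory, which matters more than anything else once a model has several billion parameters.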

21. STABLELM_4E1T_3B_PHI_2_TF_LITE

  • Interpretation:
    • “StableLM” = Stable Language Model (by Stability AI).
    • “4E1T” = likely matches Stability AI’s StableLM-3B-4E1T, whose name refers to pre-training for 4 epochs on 1 trillion tokens.
    • “3B” = 3 billion parameters.
    • “PHI_2” = Possibly related to Microsoft’s Phi-2 model or a version.
    • “TF_LITE” = TensorFlow Lite (optimized for mobile/edge deployment).
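Taken together, the identifiers appear to follow a loose underscore-separated convention: family and generation, size tier, then optional modifiers. The sketch below tokenizes a name and attaches tentative readings drawn from the interpretations above. The `EXACT` table, the `PATTERNS` list and the `decode` function are hypothetical and encode guesses, not a documented Chrome naming scheme.

```python
import re

# Tentative readings of whole tokens, taken from the interpretations in this article.
# Every entry is a guess; none of this is a documented Chrome naming scheme.
EXACT = {
    "GEMINI": "Gemini family",
    "XS": "extra small",
    "XL": "extra large",
    "NANOV2": "Nano tier, version 2",
    "DRAFTER": "draft model",
    "CAUSAL": "unidirectional attention",
    "RESIDUAL": "residual connections",
    "USM": "USM component (expansion uncertain)",
    "LUSM": "variant of USM",
    "STABLELM": "Stability AI's StableLM",
    "TF": "TensorFlow",
    "LITE": "Lite (mobile/edge runtime)",
}

# Tokens that carry a number; each template is filled with the regex groups.
PATTERNS = [
    (re.compile(r"(GEMINI|GEMMA)(\d+)"), "{} family, generation {}"),
    (re.compile(r"(\d+)B"), "{} billion parameters"),
    (re.compile(r"(\d+)M"), "{} million parameters"),
    (re.compile(r"(\d+)LAYER"), "{} layers"),
    (re.compile(r"BOTTOM(\d+)"), "possibly only the bottom {} layers"),
    (re.compile(r"EE(\d+)Q?"), "EE variant {} (possibly Edge-Enhanced)"),
]


def decode(identifier: str) -> list[tuple[str, str]]:
    """Split an identifier on underscores and attach a tentative reading to each token."""
    readings = []
    for token in identifier.split("_"):
        reading = EXACT.get(token)
        if reading is None:
            for pattern, template in PATTERNS:
                match = pattern.fullmatch(token)
                if match:
                    reading = template.format(*match.groups())
                    break
        readings.append((token, reading or "unknown"))
    return readings


for token, reading in decode("GEMINI_XS_DRAFTER_6LAYER_CAUSAL_USM_700M_RESIDUAL"):
    print(f"{token:<10} {reading}")
```

Concatenated names such as ULM128M are not split by this sketch and come back as a single unknown token; tokens like FA1 stay unknown because the article offers no confident reading for them.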
Dan Petrovic · Jun 05, 14:01