Gemma-Embed is a bespoke 256-dim embedding model created by fine-tuning google/gemma-3-1b-pt with LoRA to enable high-fidelity query reformulation.
To automatically generate millions of search query suggestions, we need to translate vector embeddings back into natural language. But standard, off-the-shelf embedding models are designed for general similarity, not for being reversed back into text. To solve this, we built a custom embedding model called Gemma-Embed.
We created Gemma-Embed by fine-tuning a one-billion-parameter Google Gemma model. Using Low-Rank Adaptation (LoRA) together with a custom projection head, we compress the embeddings into a consistent 256-dimensional space. Because we control this geometry, our query decoder can accurately map these vectors back into readable text.
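To make the projection step concrete, here is a minimal sketch of how token-level hidden states could be reduced to one unit-length 256-dim vector. The mean pooling, the bias-free linear head, the toy hidden size of 8, and all function names are assumptions for illustration; the article does not specify how Gemma-Embed pools or projects.

```python
import math
import random

EMBED_DIM = 256  # output dimensionality stated in the article


def mean_pool(hidden_states, attention_mask):
    """Average token vectors, skipping padded positions (mask == 0)."""
    dim = len(hidden_states[0])
    total = [0.0] * dim
    count = 0
    for vec, keep in zip(hidden_states, attention_mask):
        if keep:
            count += 1
            for i, value in enumerate(vec):
                total[i] += value
    return [t / max(count, 1) for t in total]


def project(vec, weight):
    """Bias-free linear layer; weight has shape (out_dim, in_dim)."""
    return [sum(w * v for w, v in zip(row, vec)) for row in weight]


def l2_normalize(vec, eps=1e-12):
    """Scale a vector to unit length so dot products equal cosine similarity."""
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / max(norm, eps) for v in vec]


def embed(hidden_states, attention_mask, weight):
    """Pool, project to EMBED_DIM, then L2-normalize."""
    pooled = mean_pool(hidden_states, attention_mask)
    return l2_normalize(project(pooled, weight))


# Toy input: 4 token positions (the last is padding), hypothetical hidden size 8.
rng = random.Random(0)
hidden_size = 8
hidden_states = [[rng.gauss(0, 1) for _ in range(hidden_size)] for _ in range(4)]
attention_mask = [1, 1, 1, 0]
weight = [[rng.gauss(0, 0.1) for _ in range(hidden_size)] for _ in range(EMBED_DIM)]

embedding = embed(hidden_states, attention_mask, weight)
print(len(embedding))
```

Normalizing every output to unit length is what keeps the space consistent for a downstream decoder: all vectors lie on the same hypersphere, so only direction carries information.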
Our training pipeline runs in three phases. First, we use unsupervised learning on more than half a million sentences to establish basic semantic relationships. Second, we train the model on millions of paraphrase pairs so it learns which sentences share the same meaning. Finally, we run in-domain self-contrast training using over seven million search queries.
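The article does not state the exact loss used in each phase. As an illustration of what a contrastive objective over paraphrase data can look like, the sketch below implements an InfoNCE-style loss for one anchor, one positive, and a list of negatives. The function name, the temperature of 0.05, and the toy vectors are hypothetical, and the inputs are assumed to be L2-normalized.

```python
import math


def dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def info_nce_loss(anchor, positive, negatives, temperature=0.05):
    """Cross-entropy of picking the positive among positive + negatives.

    Inputs are assumed to be unit vectors, so dot products are cosine
    similarities. The temperature value is a hypothetical choice.
    """
    logits = [dot(anchor, positive) / temperature]
    logits += [dot(anchor, neg) / temperature for neg in negatives]
    # Log-sum-exp with the max subtracted for numerical stability.
    peak = max(logits)
    log_denominator = peak + math.log(sum(math.exp(l - peak) for l in logits))
    return log_denominator - logits[0]


# Toy 2-dim unit vectors: the positive is close to the anchor, the negative is orthogonal.
anchor = [1.0, 0.0]
positive = [0.8, 0.6]
negative = [0.0, 1.0]

loss_good = info_nce_loss(anchor, positive, [negative])
loss_bad = info_nce_loss(anchor, negative, [positive])  # roles swapped
print(loss_good < loss_bad)
```

The loss shrinks as the anchor moves toward its paraphrase and away from unrelated sentences, which is the behavior the second phase relies on.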
This multi-stage process locks in a precise latent space. It allows us to successfully traverse the embedding space and generate diverse, high-quality search queries without relying on manual human labeling.
In our previous post, Training a Query Fan-Out Model, we showed how to generate millions of high-quality query reformulations without human labelling. The method navigates the embedding space between a seed query and its target document, then decodes each intermediate vector back into text with a trained query decoder.
That decoder’s success depends on an embedding encoder whose latent geometry is fully under our control. Off-the-shelf models (e.g. mxbai-embed-large) optimize for general semantic similarity, not for invertibility, so their embeddings cannot reliably be mapped back into meaningful queries.
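One simple way to picture navigating between a seed query and a target document is linear interpolation followed by re-normalization. This is only a sketch under that assumption: the traversal strategy actually used in the previous post is not described here, and `interpolate_path` and the toy 3-dim vectors are hypothetical.

```python
import math


def l2_normalize(vec, eps=1e-12):
    norm = math.sqrt(sum(v * v for v in vec))
    return [v / max(norm, eps) for v in vec]


def interpolate_path(seed, target, steps):
    """Return `steps` unit vectors strictly between two embeddings.

    Each point is a linear mix of the endpoints, pushed back onto the
    unit sphere. Near-opposite endpoints would need special handling,
    because their midpoint is close to the zero vector.
    """
    path = []
    for k in range(1, steps + 1):
        t = k / (steps + 1)
        mixed = [(1 - t) * s + t * d for s, d in zip(seed, target)]
        path.append(l2_normalize(mixed))
    return path


# Toy unit vectors standing in for a seed query and its target document.
seed = [1.0, 0.0, 0.0]
target = [0.0, 1.0, 0.0]
path = interpolate_path(seed, target, steps=3)

for point in path:
    # Cosine similarity to the target rises as we move along the path.
    print(round(sum(p * t for p, t in zip(point, target)), 3))
```

Each intermediate vector would then be handed to the query decoder, which is why the encoder's geometry has to be smooth enough for in-between points to decode into sensible text.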
To bridge that gap, this article introduces Gemma-Embed, a bespoke 256-dim embedding model built by fine-tuning google/gemma-3-1b-pt with LoRA adapters and contrastive objectives. By training our own encoder, we lock in a consistent, L2-normalized latent space that the subsequent query decoder can invert with high fidelity.
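For readers unfamiliar with LoRA, the sketch below shows the core idea on a single linear layer: the pretrained weight W stays frozen, and a low-rank product B·A, scaled by alpha / rank, is added to its output. The dimensions, rank, and alpha are toy values chosen for illustration, not the settings used for Gemma-Embed.

```python
def matvec(matrix, vec):
    """Multiply a (rows x cols) matrix by a vector of length cols."""
    return [sum(m * v for m, v in zip(row, vec)) for row in matrix]


def lora_linear(x, base_weight, lora_a, lora_b, alpha, rank):
    """Compute y = W x + (alpha / rank) * B (A x).

    W is frozen; only A (rank x in_dim) and B (out_dim x rank) are trained.
    """
    base_out = matvec(base_weight, x)
    low_rank = matvec(lora_b, matvec(lora_a, x))
    scale = alpha / rank
    return [b + scale * d for b, d in zip(base_out, low_rank)]


# Toy layer: in_dim = 4, out_dim = 3, rank = 2 (all hypothetical).
x = [1.0, 2.0, 3.0, 4.0]
W = [[1.0, 0.0, 0.0, 0.0],
     [0.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 0.0]]
A = [[1.0, 1.0, 0.0, 0.0],
     [0.0, 0.0, 1.0, 1.0]]

# LoRA initializes B to zeros, so the adapted layer starts identical to the base.
B_init = [[0.0, 0.0], [0.0, 0.0], [0.0, 0.0]]
y_init = lora_linear(x, W, A, B_init, alpha=4, rank=2)
print(y_init)  # [1.0, 2.0, 3.0]

# After training, B is non-zero and shifts the output without touching W.
B_tuned = [[1.0, 0.0], [0.0, 1.0], [0.0, 0.0]]
y_tuned = lora_linear(x, W, A, B_tuned, alpha=4, rank=2)
print(y_tuned)  # [7.0, 16.0, 3.0]
```

In a real transformer the same update is applied to selected projection matrices (for example the attention q_proj and v_proj layers), so only a small fraction of parameters receives gradients.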
Together, these steps automate query fan-out, boost retrieval performance, and open the door to interpretable, language-agnostic search suggestions.
To power a query fan-out decoder that inverts embeddings back to natural language queries, we need an embedding encoder whose latent geometry we control. Since no off-the-shelf Gemma-3 embedding model exists, we fine-tune google/gemma-3-1b-pt with LoRA and contrastive objectives to produce high-quality, L2-normalized 256-dim embeddings.
Base model: google/gemma-3-1b-pt (1 B params)
LoRA target modules: q_proj, v_proj
text.txt (wiki sentences or plain text logs)
triplets.csv
a_ids, a_mask, p_ids, p_mask, n_ids, n_mask (token IDs & masks)
queries.db
pretokenized_queries.pt
input_ids (7,129,444 × 128), attention_mask (7,129,444 × 128)