Machine learning models

We believe in small, dedicated models trained on the highest quality data.

Each model we train does one thing only, and does it really well. Interested in hearing how we can transform your SEO using machine learning techniques?

Let’s talk

DeBERTa-v3 · Binary classifier

AI Content Detection Model

A fine-tuned DeBERTa-v3 binary classifier that distinguishes organic (human-written) text from AI-generated text. It uses class-weighted training to handle imbalanced datasets and is optimized for high precision in content authenticity detection.

Key features

Binary classification – organic vs AI-generated content detection
DeBERTa-v3 architecture – state-of-the-art transformer for text understanding
Class-weighted training – handles imbalanced datasets effectively
High precision – optimized for content authenticity verification
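The page does not say how the class weights are derived. A common choice, assumed in this sketch, is inverse-frequency ("balanced") weighting: each class gets weight n_samples / (n_classes × count), so the rarer class contributes proportionally more to the cross-entropy loss. The label encoding (0 = organic, 1 = AI-generated) and the helper names are hypothetical.

```python
import math
from collections import Counter

def balanced_class_weights(labels):
    """Inverse-frequency ("balanced") weights: n_samples / (n_classes * count_c)."""
    counts = Counter(labels)
    n_samples, n_classes = len(labels), len(counts)
    return {c: n_samples / (n_classes * count) for c, count in counts.items()}

def weighted_cross_entropy(probs, labels, weights):
    """Weighted mean of -log p(true class); each sample counts w[y] times.

    probs: per-sample lists of class probabilities; labels: true class indices.
    Dividing by the summed weights mirrors the usual 'mean' reduction for
    weighted cross-entropy.
    """
    total, weight_sum = 0.0, 0.0
    for p, y in zip(probs, labels):
        w = weights[y]
        total += -w * math.log(p[y])
        weight_sum += w
    return total / weight_sum

# Hypothetical imbalanced set: 0 = organic (human-written), 1 = AI-generated.
labels = [0] * 8 + [1] * 2
weights = balanced_class_weights(labels)
print(weights)  # {0: 0.625, 1: 2.5}
```

With 8 organic and 2 AI-generated samples, each AI-generated example carries four times the weight of an organic one, so misclassifying the minority class costs more during training.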

Use cases

Content moderation – detect AI-generated spam or fake content
Academic integrity – identify AI-assisted writing in submissions
Publishing verification – ensure content authenticity for publications
SEO quality control – filter AI-generated content in content strategies

Model demo ↗

Money-link detection

Link Spam Algorithm

A link spam algorithm that can identify money links on any page. If our algorithm can spot your link, so can Google’s, which means your links are either being devalued or at risk of a penalty.

Batch processing

Batch analysis is only available to our paying clients.
If you’re interested in batch processing, please get in touch.

Link Spam Algorithm demo ↗

Multilingual T5 · Query expansion

Search Query Fan-Out

For a given URL and seed query, the model generates diverse, contextually relevant search query variations. By reformulating queries, it helps capture a broader range of search intents, which can improve organic search visibility and click-through rates.

This is especially valuable for SEO, content optimization, and keyword targeting, enabling discovery of traffic-driving variations that may not surface through manual keyword research.

Inference: High Effort (Deep Analysis)

Stochastic sampling in large batches with varied random seeds and high temperature and top-p (nucleus) settings for diversity, followed by duplicate suppression and scoring by log-probability and length. Produces up to 200 unique candidates, sorted by score.
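The post-processing in the high-effort mode (duplicate suppression, then ranking by log-probability and length) can be sketched as below. This is an illustration, not the production code: the normalisation rule, the length penalty `n_tokens ** alpha` and the `alpha` value are assumptions, and the candidate tuples stand in for whatever the sampler actually returns.

```python
import re

def normalise(query):
    """Key used for duplicate suppression: lowercase, collapsed whitespace."""
    return re.sub(r"\s+", " ", query.strip().lower())

def rank_candidates(candidates, alpha=0.6, limit=200):
    """Deduplicate sampled queries and sort them by length-normalised log-probability.

    candidates: iterable of (text, sum_logprob, num_tokens) with num_tokens > 0.
    Score = sum_logprob / num_tokens ** alpha, so longer outputs are not
    penalised as heavily for having more tokens. Among duplicates, the best score wins.
    """
    best = {}
    for text, logprob, n_tokens in candidates:
        key = normalise(text)
        score = logprob / (n_tokens ** alpha)
        if key not in best or score > best[key][1]:
            best[key] = (text, score)
    ranked = sorted(best.values(), key=lambda item: item[1], reverse=True)
    return ranked[:limit]

samples = [
    ("best shoes", -2.0, 2),
    ("Best  Shoes", -1.0, 2),           # duplicate after normalisation
    ("running shoes for men", -4.0, 4),
]
print(rank_candidates(samples, alpha=1.0))
```

The `limit=200` default reflects the page's "up to 200 unique candidates"; everything else is a tunable guess.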

Inference: Quick Fan-Out

Diverse beam search: beam search with a diversity penalty and no sampling. Quickly produces a small, deterministic set of diverse expansions (10 by default).
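A diversity penalty in beam search is commonly implemented as a "Hamming" penalty: beams are split into groups, and when a later group picks its next token, the score of any token already chosen by earlier groups at that step is reduced by `penalty × count`. The toy sketch below shows one decoding step with one beam per group and word-level "tokens"; the page does not say which penalty the quick fan-out mode uses, so this formulation and all names are assumptions.

```python
from collections import Counter

def hamming_penalised(logprobs, earlier_group_tokens, diversity_penalty):
    """Subtract penalty * count(token) for tokens earlier groups picked at this step."""
    counts = Counter(earlier_group_tokens)
    return {tok: lp - diversity_penalty * counts[tok] for tok, lp in logprobs.items()}

def diverse_step(group_logprobs, diversity_penalty=1.0):
    """One deterministic step: each group (one beam each) takes its best penalised token."""
    chosen = []
    for logprobs in group_logprobs:
        scores = hamming_penalised(logprobs, chosen, diversity_penalty)
        chosen.append(max(scores, key=scores.get))
    return chosen

# Three groups share the same next-token distribution.
dist = {"cheap": -0.1, "best": -0.5, "buy": -0.9}
print(diverse_step([dist] * 3))  # ['cheap', 'best', 'buy']
```

Without the penalty (`diversity_penalty=0.0`) all three groups would pick "cheap"; with it, each group is pushed toward a different continuation while the whole process stays deterministic.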

Training ran for 70 hours: 5 epochs over 15 million training samples.
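As a quick sanity check on these figures, assuming "5 times" means five full epochs and the 70 hours is wall-clock training time, the run processed 75 million samples at an average of just under 300 samples per second:

```python
def throughput(samples, epochs, hours):
    """Average training samples processed per second over the whole run."""
    return samples * epochs / (hours * 3600)

rate = throughput(15_000_000, 5, 70)
print(f"{rate:.1f} samples/s")  # 297.6 samples/s
```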

Rigorously following all available steps in Google’s query fan-out process, we trained a search query reformulation model with optimizations geared towards SEO use. Training has two steps. First, we created a custom-architecture Gemma 3 1B model for feature extraction and used it to augment our training data by interpolating between each query and its target documents, traversing the vector embedding space between them. Second, we fine-tuned a large multilingual T5 model for query expansion on Google Search Console data (query and URL pairs) augmented with this synthetic data.
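The page does not detail the embedding-space traversal. One simple reading, assumed here, is linear interpolation between a query vector and a document vector, retrieving the nearest candidate phrase (by cosine similarity) at each intermediate point. The toy 2-D vectors and the helper names are hypothetical; in practice the vectors would come from the feature-extraction model.

```python
import math

def lerp(a, b, t):
    """Point a fraction t of the way from vector a to vector b."""
    return [x + t * (y - x) for x, y in zip(a, b)]

def cosine(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def traverse(query_vec, doc_vec, pool, steps=4):
    """Walk from query to document; at each interior point keep the nearest pool phrase.

    pool: list of (phrase, vector). Returns unique phrases in traversal order.
    """
    found = []
    for i in range(1, steps):
        point = lerp(query_vec, doc_vec, i / steps)
        phrase, _ = max(pool, key=lambda item: cosine(point, item[1]))
        if phrase not in found:
            found.append(phrase)
    return found

pool = [("a", [1.0, 0.2]), ("b", [1.0, 1.0]), ("c", [0.2, 1.0])]
print(traverse([1.0, 0.0], [0.0, 1.0], pool))  # ['a', 'b', 'c']
```

Phrases picked up along the path sit "between" the query and the document in meaning, which is what makes them plausible synthetic query variants.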

Search Query Fan-Out demo ↗

Open-set · Multi-label

Universal Search Query Classifier

Generalist, open-set classification for any label taxonomy.

Multi-label text classification for search queries, with support for arbitrary labels. The Universal Search Query Classifier assigns a query to several intent categories at once. Unlike traditional single-label classifiers, it assigns every label whose confidence clears a configurable threshold, and it works with any custom label set.

Key features

Multi-label classification – assigns multiple relevant labels per query
Arbitrary label support – works with any custom label set and descriptions
Threshold-based assignment – configurable confidence thresholds
Special token format – uses [QUERY], [LABEL_NAME], and [LABEL_DESCRIPTION] tokens
Multiple model sizes – X-Small, Small, Base, and Large variants
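The special-token format and threshold-based assignment might fit together as in the sketch below: one input per (query, label) pair, each scored by the model, then every label above the threshold kept. The page names the tokens but not how they are joined, so the string layout is an assumption, and the scores are made-up stand-ins for model outputs.

```python
def build_input(query, label_name, label_description):
    """Hypothetical layout for one (query, label) pair using the documented tokens."""
    return f"[QUERY] {query} [LABEL_NAME] {label_name} [LABEL_DESCRIPTION] {label_description}"

def assign_labels(scores, threshold=0.5):
    """Multi-label assignment: keep every label whose score meets the threshold."""
    return [label for label, score in scores.items() if score >= threshold]

labels = {
    "Commercial": "The searcher intends to buy something.",
    "Local": "The searcher wants a nearby business or place.",
    "Navigational": "The searcher wants a specific website.",
}
inputs = [build_input("best pizza near me", name, desc) for name, desc in labels.items()]
print(inputs[0])

# Made-up confidences standing in for the model's output on each input.
scores = {"Commercial": 0.81, "Local": 0.93, "Navigational": 0.12}
print(assign_labels(scores))  # ['Commercial', 'Local']
```

Because each label is scored independently against the query, adding a new label only means writing a new name and description; no retraining is implied by this sketch.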

Use cases

SEO intent analysis – classify queries by commercial intent, brand awareness, etc.
Content strategy – understand query intent to optimize content targeting
Search advertising – categorize keywords for campaign optimization
Customer support – route queries based on intent classification

Model variants

X-Small – fast inference for real-time applications
Small – balanced speed and accuracy
Base – recommended for most use cases
Large – highest accuracy for complex classification tasks

ALBERT · Multi-label intent

Classic Query Intent Classifier

A multi-label search query classification model developed by Dejan AI. It is designed for automated pipelines that classify the intent of large volumes of search queries from common data sources such as ad campaigns and organic search tools and platforms.

Classification labels

LABEL_0: Commercial
LABEL_1: Non-Commercial
LABEL_4: Informational
LABEL_5: Navigational
LABEL_6: Transactional
LABEL_7: Commercial Investigation
LABEL_8: Local
LABEL_9: Entertainment
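Decoding a multi-label output typically means applying an independent sigmoid to each label's logit and keeping those above a threshold, rather than taking a single softmax winner. The sketch assumes the model emits one logit per `LABEL_n` index and uses a 0.5 threshold; indices not listed on this page are simply skipped.

```python
import math

# Label names as listed above; indices without a listed name are skipped.
ID2LABEL = {
    0: "Commercial", 1: "Non-Commercial", 4: "Informational", 5: "Navigational",
    6: "Transactional", 7: "Commercial Investigation", 8: "Local", 9: "Entertainment",
}

def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))

def decode(logits, threshold=0.5):
    """Independent sigmoid per label; return every named label at or above threshold."""
    return [
        ID2LABEL[i]
        for i, z in enumerate(logits)
        if i in ID2LABEL and sigmoid(z) >= threshold
    ]

# Made-up logits for a query such as "buy running shoes review".
logits = [2.0, -3.0, 0.0, 0.0, 1.5, -1.0, -2.0, 0.3, -0.5, -4.0]
print(decode(logits))  # ['Commercial', 'Informational', 'Commercial Investigation']
```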

Base models

Models: dejanseo/Intent-XS · dejanseo/Intent-XL

BERT · Anchor-text prediction

LinkBERT

LinkBERT is a fine-tuned version of Google’s BERT model, designed to predict natural link placement within web content. This binary classification model identifies the distinct token ranges that web authors are likely to choose as anchor text for links. Applied to previously unseen text, LinkBERT predicts where links might naturally occur, effectively simulating web author behaviour in link creation.
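Turning per-token binary predictions into anchor-text candidates amounts to grouping consecutive positive tokens into spans. This sketch assumes label 1 means "part of anchor text" and uses whitespace tokens for readability; a real BERT pipeline would work on WordPiece sub-tokens and merge them back into words.

```python
def anchor_spans(tokens, labels):
    """Group consecutive tokens labelled 1 into candidate anchor-text spans."""
    spans, current = [], []
    for token, label in zip(tokens, labels):
        if label == 1:
            current.append(token)
        elif current:
            spans.append(" ".join(current))
            current = []
    if current:  # span running to the end of the text
        spans.append(" ".join(current))
    return spans

tokens = "Read our guide to technical SEO audits for more tips".split()
labels = [0, 0, 0, 0, 1, 1, 1, 0, 0, 0]  # made-up model predictions
print(anchor_spans(tokens, labels))  # ['technical SEO audits']
```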

LinkBERT is designed as a tool for content creators, SEO specialists and webmasters who want to optimize web content for both user engagement and search engine recognition.

Use cases

Spam and inorganic SEO detection – helps identify unnatural link patterns
Anchor text suggestion – suggests potential anchor texts during internal link optimization
Evaluation of existing links – assesses the naturalness of link placements
Link placement guide – suggests optimal placement for links within content
Anchor text idea generator – provides creative anchor text suggestions

Models: dejanseo/LinkBERT · LinkBERT-mini · LinkBERT-XL

Demo ↗

7-point · Multi-label sentiment

Sentiment

A multi-label sentiment classification model developed by Dejan Marketing. It is designed to run in an automated pipeline that classifies the sentiment of thousands (or even millions) of text chunks, or as part of a scraping pipeline.

Classification labels

0: very positive
1: positive
2: somewhat positive
3: neutral
4: somewhat negative
5: negative
6: very negative
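For pipelines handling millions of chunks, texts are usually streamed through the model in fixed-size batches. In this sketch `predict` is a hypothetical stand-in for the model, returning seven scores per text in the label order above, and reporting only the highest-scoring label is a simplifying assumption.

```python
LABELS = [
    "very positive", "positive", "somewhat positive", "neutral",
    "somewhat negative", "negative", "very negative",
]

def batched(items, size):
    """Yield lists of up to `size` items from any iterable, without loading it all."""
    batch = []
    for item in items:
        batch.append(item)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch

def classify_stream(chunks, predict, batch_size=32):
    """Yield (text, label) pairs; predict maps a list of texts to lists of 7 scores."""
    for batch in batched(chunks, batch_size):
        for text, scores in zip(batch, predict(batch)):
            best = max(range(len(LABELS)), key=lambda i: scores[i])
            yield text, LABELS[best]

def fake_predict(texts):
    """Toy stand-in for the model: 'ok' texts are neutral, everything else very positive."""
    return [[0, 0, 0, 1, 0, 0, 0] if "ok" in t else [1, 0, 0, 0, 0, 0, 0] for t in texts]

print(list(classify_stream(["great", "ok", "wow"], fake_predict, batch_size=2)))
```

Because both functions are generators, the same code can sit at the end of a scraping pipeline and classify chunks as they arrive.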

Sources of training data

Synthetic data generated with Llama 3.

Models: dejanseo/sentiment · dejanseo/good-vibes

Demo ↗

ALBERT · Well-formedness

Query Form Quality Classifier

We build on the work of Manaal Faruqui and Dipanjan Das from the Google AI Language team to train a classifier that identifies well-formed search queries. By using an ALBERT architecture instead of an LSTM, our model offers a 10% improvement over Google’s classifier.

Practical application

With 80% accuracy, the model is production-ready and already deployed in Dejan AI’s query processing pipeline, where it helps identify query expansion candidates by flagging ambiguous queries retrieved via the Google Search Console API.
Most search queries are ambiguous, making it difficult to classify intent and decide how to optimise for them. Query expansion helps, but only if you know which queries to expand. This is where our model comes in.
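Selecting expansion candidates from the classifier's output can be as simple as keeping the queries scored below a well-formedness threshold, most ambiguous first. `well_formed_prob` is a hypothetical stand-in for the model's probability that a query is well-formed, and the 0.5 threshold is an assumption.

```python
def expansion_candidates(queries, well_formed_prob, threshold=0.5):
    """Return queries scored as not well-formed, most ambiguous (lowest score) first."""
    scored = [(q, well_formed_prob(q)) for q in queries]
    return [q for q, p in sorted(scored, key=lambda item: item[1]) if p < threshold]

# Made-up scores standing in for classifier output on Search Console queries.
probs = {
    "shoes": 0.10,
    "what are the best running shoes for flat feet": 0.92,
    "nike": 0.20,
}
print(expansion_candidates(list(probs), probs.get))  # ['shoes', 'nike']
```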

Model: dejanseo/Query-Quality-Classifier

Demo ↗

Model quality

Our model validation process checks quality with standard metrics for the most common classification and natural language processing tasks:

Recall

Precision

Accuracy
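For a binary task these three metrics come straight from the confusion counts: precision = TP / (TP + FP), recall = TP / (TP + FN), and accuracy = correct predictions / all predictions. The helper below uses those textbook definitions; it is an illustration, not Dejan's validation code.

```python
def binary_metrics(y_true, y_pred, positive=1):
    """Precision, recall and accuracy for one positive class."""
    pairs = list(zip(y_true, y_pred))
    tp = sum(1 for t, p in pairs if t == positive and p == positive)
    fp = sum(1 for t, p in pairs if t != positive and p == positive)
    fn = sum(1 for t, p in pairs if t == positive and p != positive)
    correct = sum(1 for t, p in pairs if t == p)
    return {
        "precision": tp / (tp + fp) if tp + fp else 0.0,
        "recall": tp / (tp + fn) if tp + fn else 0.0,
        "accuracy": correct / len(pairs) if pairs else 0.0,
    }

y_true = [1, 1, 1, 0, 0, 0, 0, 0]
y_pred = [1, 1, 0, 1, 0, 0, 0, 0]
print(binary_metrics(y_true, y_pred))
```

Here the model finds 2 of 3 positives (recall 2/3), 2 of its 3 positive calls are right (precision 2/3), and 6 of 8 predictions are correct (accuracy 0.75). A high-precision model such as the AI content detector trades some recall for fewer false positives.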

Evaluation metrics ↗

Bespoke

Custom models

Our team can work with you to design and train your very own industry-, language- or task-specific model.

Example: Bulgarian Search Query Intent

Fine-tuned from mDeBERTa-v3 for search query intent classification in Bulgarian. It predicts one of five intent categories:
COMMERCIAL_INVESTIGATION – queries with purchase intent but requiring additional research
INFORMATIONAL – queries seeking knowledge or facts
LOCAL – queries related to local services or locations
NAVIGATIONAL – queries aiming to reach a specific website or service
TRANSACTIONAL – queries with a direct intent to complete an action

Model: dejanseo/bulgarian-search-query-intent

Demo ↗

Need a model built for your problem?

Book a conference call with our team to discuss a custom model for your industry, language or task.

Book a call