Google Research's TimesFM-ICF uses in-context fine-tuning to achieve high-performance time-series forecasting without the need for traditional model training.
If you have ever managed time-series forecasting in production, you know the struggle. Traditional tools like Prophet require manual tuning, while deep learning models demand massive training datasets. Zero-shot foundation models promised to solve this, but they historically fell short of models fine-tuned on specific data.
Now, Google Research has introduced a major breakthrough called TimesFM-ICF, which stands for In-Context Fine-tuning. Presented at the International Conference on Machine Learning, this model achieves the high performance of a custom, fine-tuned model without any actual training or gradient updates.
It works by borrowing a concept from large language models: few-shot prompting. Instead of feeding the model a single historical series, you prompt it with the target series plus up to fifty related examples. These could be historical sales curves, seasonal patterns, or even competitor data. Special separator tokens and a cross-example attention mechanism allow the model to learn from these references on the fly.
This solves some of the biggest headaches in forecasting. For cold-start scenarios with new products, you can immediately prompt the model with launch patterns from similar items. For sudden market shifts, you can feed in recent post-crisis data to guide the predictions in real time.
The results are striking. TimesFM-ICF improves performance on benchmarks by nearly seven percent over the base model, and it runs sixteen times faster than traditional fine-tuning. While this specific in-context model is not yet publicly available, it signals a massive shift toward a new era of instant, zero-training production forecasting.
In-Context Fine-Tuning for Time-Series: The Next Evolution Beyond Prophet and Traditional Forecasting
How Google’s TimesFM-ICF achieves fine-tuned model performance without training – and why this changes everything for production forecasting systems
If you’re reading this, you’ve likely wrestled with time-series forecasting in production. Perhaps you’ve implemented Facebook Prophet for its interpretable seasonality decomposition, experimented with Amazon’s DeepAR for probabilistic forecasting, or even tried retrofitting GPT models for numerical prediction. Each approach comes with trade-offs that practitioners know all too well.
Prophet excels at business time-series with strong seasonal patterns but requires manual tuning for each new dataset. DeepAR handles multiple related time-series but needs substantial training data. NeuralProphet adds deep learning components but inherits Prophet’s single-series limitations. And while foundation models like TimesFM and Chronos promised zero-shot forecasting, they’ve consistently underperformed compared to models fine-tuned on specific datasets.
Until now.
Geometric mean of scaled MASE on the OOD Benchmark. This benchmark is essentially the zero-shot benchmark used in (Ansari et al., 2024), modified slightly to guarantee a zero-shot evaluation of TimesFM-ICF. Our in-context fine-tuning approach lifts TimesFM (base) above all other benchmark models and matches the performance of TimesFM-FT, the model that separately fine-tunes TimesFM (base) on the training split of each task before making predictions.
Google Research’s new TimesFM-ICF (In-Context Fine-tuning) model, presented at ICML 2025, fundamentally changes this equation. It achieves fine-tuned model performance while remaining truly zero-shot – no gradient updates, no training loops, just inference with cleverly chosen context examples.
Visualization of TimesFM-ICF predictions on the Monash Australian Electricity dataset
The key insight is deceptively simple: what if we could “prompt” a time-series model with examples, just like we prompt ChatGPT with few-shot examples?
Analogous to few-shot prompting of a foundation LLM (left), we train a time-series foundation model to support few-shot prompting with an arbitrary number of related in-context time-series examples (right). The dashed box encloses the full context window/prompt.
Traditional time-series models see the world like this:
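```python
# Traditional approach (Prophet-style)
from prophet import Prophet

model = Prophet()
model.fit(historical_data)  # Training required; historical_data has 'ds' and 'y' columns
future_dates = model.make_future_dataframe(periods=30)  # 30-period horizon (assumed)
forecast = model.predict(future_dates)
```

TimesFM-ICF introduces a paradigm shift: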
```python
# In-context fine-tuning approach
# Illustrative only: TimesFM-ICF is not publicly released, so timesfm_icf.predict is a hypothetical API
forecast = timesfm_icf.predict(
    target_history=web_traffic[-512:],
    context_examples=[
        competitor_traffic[-512:],       # Related series 1
        seasonal_pattern_last_year,      # Related series 2
        similar_product_launch_traffic,  # Related series 3
        # ... up to 50 examples
    ],
)
```
Two illustrative examples of how in-context examples can disambiguate a prediction task: patterns that look likely from the history alone can be confirmed or ruled out by the patterns in the in-context examples.
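The model architecture builds on the decoder-only Transformer design, with two crucial modifications: special separator tokens that mark where one example ends and the next begins, and a cross-example attention mechanism that lets the model learn from the reference series on the fly.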
TimesFM-ICF employs the decoder-only architecture for time-series prediction with in-context examples.
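To make the input format concrete, here is a minimal sketch of how a prompt might be assembled from context examples and a target history. It assumes 32-point patches, left-pads short series with NaN, and uses a literal `"SEP"` string as a stand-in for the learned separator token. `patchify` and `build_prompt` are hypothetical helpers, not part of any released TimesFM API; a real model would embed each patch and track padding with a mask rather than NaN.

```python
import numpy as np

PATCH_LEN = 32  # points per patch (assumed)
SEP = "SEP"     # stand-in for the learned separator token (assumption)


def patchify(series, patch_len=PATCH_LEN):
    """Hypothetical helper: left-pad with NaN to a multiple of patch_len, then split into patches."""
    series = np.asarray(series, dtype=float)
    pad = (-len(series)) % patch_len
    padded = np.concatenate([np.full(pad, np.nan), series])
    return list(padded.reshape(-1, patch_len))


def build_prompt(context_examples, target_history, patch_len=PATCH_LEN):
    """Hypothetical helper: each example's patches followed by SEP, then the target's patches last."""
    tokens = []
    for example in context_examples:
        tokens.extend(patchify(example, patch_len))
        tokens.append(SEP)
    tokens.extend(patchify(target_history, patch_len))
    return tokens


site_a = np.random.default_rng(0).normal(size=512)     # 16 patches
site_b = np.random.default_rng(1).normal(size=512)     # 16 patches
your_site = np.random.default_rng(2).normal(size=384)  # 12 patches of history
prompt = build_prompt([site_a, site_b], your_site)
print(len(prompt))  # 16 + 1 + 16 + 1 + 12 = 46 tokens
```

Putting the target last means that, in a decoder-only (causal) setup, every context example precedes the target's patches and is therefore visible to them.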
Here’s a simplified visualization of how data flows through the model:
```text
[Series 1: E-commerce Site A Traffic]
      ↓ Patchify (32 points/patch)
[P1][P2][P3]...[P16][SEP]
      ↓
[Series 2: E-commerce Site B Traffic]
[P1][P2][P3]...[P16][SEP]
      ↓
[Target Series: Your Site Traffic]
[P1][P2][P3]...[P12][PREDICT→][P13][P14][P15][P16]
      ↓
Transformer with Cross-Example Attention
      ↓
Future Predictions
```

Cold-start use case: forecasting a brand-new product that has only a couple of weeks of history.
Traditional Approach: Wait months to gather data, or use naive baselines.
Prophet-Style Solution:
```python
# Not enough data for reliable seasonality detection
from prophet import Prophet

model = Prophet(yearly_seasonality=True)  # Guessing
model.fit(two_weeks_of_data)  # Unreliable
```

TimesFM-ICF Solution:
```python
# Leverage similar product launches immediately
# (model.predict_with_context is illustrative; TimesFM-ICF has no public API yet)
context_examples = [
    previous_product_launch_curves,
    category_average_patterns,
    seasonal_patterns_from_last_year,
]
forecast = model.predict_with_context(new_product_data, context_examples)
```

Traditional models struggle with sudden pattern changes. TimesFM-ICF can adapt in real time by including recent examples of the new regime:
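```python
# COVID-19 traffic pattern shift example (illustrative API; current_traffic is hypothetical)
pre_covid_patterns = traffic_jan_2020
early_covid_patterns = traffic_march_2020
forecast = timesfm_icf.predict(target_history=current_traffic,
                               context_examples=[pre_covid_patterns, early_covid_patterns])
```

Unlike Prophet, which requires separate models for different granularities, TimesFM-ICF handles multiple resolutions simultaneously: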
```python
# Single model, multiple granularities
hourly_context = [hourly_patterns_from_similar_days]
daily_context = [daily_patterns_from_similar_weeks]
weekly_context = [weekly_patterns_from_similar_quarters]
```
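Contexts at several resolutions can come from a single raw feed by resampling it. This is a minimal sketch assuming an hourly pandas Series with a DatetimeIndex. `build_granularity_contexts` is a hypothetical helper, and it aggregates with sums, which only suits additive quantities such as traffic counts.

```python
import numpy as np
import pandas as pd


def build_granularity_contexts(hourly, context_len=512):
    """Hypothetical helper: resample one hourly series into hourly, daily, and weekly examples.

    Uses sums, which suits additive quantities such as traffic counts.
    """
    return {
        "hourly": hourly.iloc[-context_len:].to_numpy(),
        "daily": hourly.resample("D").sum().iloc[-context_len:].to_numpy(),
        "weekly": hourly.resample("W").sum().iloc[-context_len:].to_numpy(),
    }


index = pd.date_range("2024-01-01", periods=24 * 28, freq=pd.offsets.Hour())  # four weeks
traffic = pd.Series(np.ones(len(index)), index=index)
contexts = build_granularity_contexts(traffic)
print({name: len(values) for name, values in contexts.items()})
```

For averaged quantities such as conversion rates, `.mean()` would be the more sensible aggregation.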
Scaled MASE (GM) vs. number of in-context examples over the short-context datasets in the OOD Benchmark. We also plot the total inference time for all the datasets as we vary the number of examples. All numbers are averaged over 5 runs and shown with one standard error.
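The figures report "Scaled MASE (GM)". Here is a minimal sketch of that kind of aggregate. It assumes that, as in the Chronos benchmark (Ansari et al., 2024) that the OOD benchmark is derived from, each dataset's MASE is divided by a seasonal-naive baseline's MASE, and the ratios are combined with a geometric mean. The function names are illustrative, and the paper's exact scaling may differ.

```python
import numpy as np


def mase(y_train, y_true, y_pred, m=1):
    """MASE: forecast MAE divided by the in-sample MAE of a seasonal-naive forecast with period m."""
    y_train = np.asarray(y_train, dtype=float)
    scale = np.mean(np.abs(y_train[m:] - y_train[:-m]))
    errors = np.abs(np.asarray(y_true, dtype=float) - np.asarray(y_pred, dtype=float))
    return float(np.mean(errors) / scale)


def scaled_mase_gm(model_mase, baseline_mase):
    """Geometric mean across datasets of model MASE relative to a baseline's MASE."""
    ratios = np.asarray(model_mase, dtype=float) / np.asarray(baseline_mase, dtype=float)
    return float(np.exp(np.mean(np.log(ratios))))


# One series with naive (m=1) scaling: in-sample steps are all 1, forecast errors are 0.5.
print(mase([1, 2, 3, 4, 5], y_true=[6, 7], y_pred=[6.5, 6.5]))  # 0.5
# Two datasets: 2x better on one and 2x worse on the other cancel out.
print(scaled_mase_gm([0.5, 2.0], [1.0, 1.0]))  # approximately 1.0
```

The geometric mean keeps a single dataset with a very large error ratio from dominating the aggregate the way an arithmetic mean would.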
Build a library of canonical patterns for your domain:
```python
class ContextLibrary:
    def __init__(self):
        # Each load_* method is a placeholder for your own data-loading logic
        self.patterns = {
            'black_friday': self.load_black_friday_patterns(),
            'product_launch': self.load_launch_patterns(),
            'seasonal_q4': self.load_q4_patterns(),
            'viral_growth': self.load_viral_patterns(),
            'paid_campaign': self.load_campaign_patterns(),
        }
```

Use similarity metrics to automatically select relevant examples:
```python
def select_context_examples(target_series, candidate_pool, n_examples=50):
    """
    Automatically select most relevant context examples
    using multiple similarity metrics
    """
    similarities = []
```

For complex businesses with multiple levels of aggregation:
```python
class HierarchicalContextBuilder:
    def build_context(self, target_store, target_category, target_sku):
        """
        Build context from multiple hierarchy levels
        """
        context = []
```
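The `HierarchicalContextBuilder` stub stops before it gathers anything. Below is a minimal sketch of one way to fill it in. It assumes sales are stored in a long-format pandas DataFrame with hypothetical `store`, `category`, `sku`, `date`, and `units` columns. It uses three levels as context examples: sibling SKUs in the same store and category, the store's category total, and the same SKU in other stores.

```python
import pandas as pd


def build_hierarchical_context(sales, store, category, sku, max_examples=50):
    """Gather related series from several hierarchy levels as context examples.

    Assumes a long-format DataFrame with columns: store, category, sku, date, units.
    """
    def as_series(frame):
        # Sum units per date so every example is a single 1-D series
        return frame.groupby("date")["units"].sum().sort_index().to_numpy()

    context = []
    same_category = sales[(sales["store"] == store) & (sales["category"] == category)]
    # 1) Sibling SKUs in the same store and category
    for other_sku, frame in same_category.groupby("sku"):
        if other_sku != sku:
            context.append(as_series(frame))
    # 2) Category total for this store
    context.append(as_series(same_category))
    # 3) The same SKU in other stores
    same_sku_elsewhere = sales[(sales["sku"] == sku) & (sales["store"] != store)]
    for _, frame in same_sku_elsewhere.groupby("store"):
        context.append(as_series(frame))
    return context[:max_examples]


sales = pd.DataFrame({
    "store": ["s1"] * 6 + ["s2"] * 2,
    "category": ["toys"] * 8,
    "sku": ["a", "a", "b", "b", "c", "c", "a", "a"],
    "date": pd.to_datetime(["2024-01-01", "2024-01-02"] * 4),
    "units": [1, 2, 3, 4, 5, 6, 7, 8],
})
context = build_hierarchical_context(sales, store="s1", category="toys", sku="a")
print([c.tolist() for c in context])  # [[3, 4], [5, 6], [9, 12], [7, 8]]
```

In this sketch the category total includes the target SKU's own history. Whether to exclude it is a design choice worth testing on your data.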


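The `select_context_examples` stub earlier in this section stops before computing any similarities. Below is a minimal sketch that uses a single metric: Euclidean distance between z-normalized recent windows. It serves as a cheap stand-in for the DTW distance used in one of the paper's selection strategies. `znorm`, `select_by_similarity`, and the `window` parameter are assumptions for illustration, not the paper's code.

```python
import numpy as np


def znorm(x):
    """Z-normalize a series so candidates on different scales are compared by shape."""
    x = np.asarray(x, dtype=float)
    std = x.std()
    return (x - x.mean()) / std if std > 0 else x - x.mean()


def select_by_similarity(target_series, candidate_pool, n_examples=50, window=64):
    """Hypothetical helper: keep the candidates whose last `window` points best match the target's."""
    target_tail = znorm(np.asarray(target_series, dtype=float)[-window:])
    scored = []
    for idx, candidate in enumerate(candidate_pool):
        tail = np.asarray(candidate, dtype=float)[-len(target_tail):]
        if len(tail) < len(target_tail):
            continue  # too short to compare over the same window
        distance = float(np.linalg.norm(znorm(tail) - target_tail))
        scored.append((distance, idx))
    scored.sort()  # smallest distance first
    return [candidate_pool[idx] for _, idx in scored[:n_examples]]


target = np.arange(100, dtype=float)
pool = [np.sin(np.arange(100)), 3.0 * np.arange(100) + 7, np.arange(10)]
best = select_by_similarity(target, pool, n_examples=1)
print(best[0] is pool[1])  # True: a rescaled copy of the trend has the same shape
```

Swapping in DTW or correlation only changes the `distance` line. Combining several metrics, as the stub's docstring suggests, could be done by ranking on each metric and averaging the ranks.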
Instead of waiting weeks for A/B test results:
```python
def predict_ab_test_outcome(test_config, early_results):
    """
    Predict full A/B test results from first 48 hours
    """
    context_examples = []
```

Understanding channel interactions without complex MMM models:
```python
def predict_channel_impact(channel_spend, other_channels_history):
    """
    Predict impact of channel spend changes using cross-channel patterns
    """
    # Include successful channel mix examples
    successful_campaigns = get_high_roi_campaign_patterns()
```

Unlike traditional anomaly detection that relies on fixed thresholds:
```python
class ContextualAnomalyDetector:
    def is_anomalous(self, current_pattern):
        """
        Determine if pattern is anomalous given context
        """
        # Get similar historical contexts
        similar_contexts = self.find_similar_contexts(current_pattern)
```

The empirical results are striking:
Validation errors during training suggest that (1) NoPE (no positional encoding) works better than APE (absolute positional encoding), and (2) NoPE performs on par with other positional encodings designed to generalize to longer sequences.
Scaled MASE (GM) for various in-context example selection strategies on the OOD benchmark: (1) 50 random examples; (2) 45 random examples and 5 examples from the immediate past history; (3) 45 examples chosen at random from similar time-series (according to DTW distance) and 5 examples from the immediate past history; (4) 40 random examples and 10 examples from the immediate past history. Error bars show one standard deviation of the evaluations, averaged over 10 random seeds.
Heatmap of in-context example configurations. The configuration with the smallest validation loss has 11 in-series examples and 22 randomly selected examples.
Most importantly, the comparison of selection strategies shows that simple random selection of context examples often works well: you don’t need a sophisticated retrieval mechanism to get started.
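The selection strategies compared above mix randomly chosen series with windows from the target's own immediate past. Below is a minimal sketch of that mix. It assumes a list of candidate series and cuts non-overlapping in-series windows that end before the target context. `mixed_context` and its parameters are hypothetical, and the default counts (45 random, 5 in-series) simply mirror one configuration from the caption.

```python
import numpy as np


def mixed_context(target_history, pool, n_random=45, n_in_series=5, window=512, seed=0):
    """Hypothetical helper: combine random pool series with earlier windows of the target itself.

    The last `window` points of target_history are the target context, so in-series
    examples are cut from non-overlapping windows that end before it.
    """
    rng = np.random.default_rng(seed)
    picks = rng.choice(len(pool), size=min(n_random, len(pool)), replace=False)
    random_examples = [pool[i] for i in picks]

    history = np.asarray(target_history, dtype=float)
    in_series = []
    end = len(history) - window
    while len(in_series) < n_in_series and end - window >= 0:
        in_series.append(history[end - window:end])  # most recent past window first
        end -= window
    return random_examples + in_series


pool = [np.full(256, float(i)) for i in range(100)]
history = np.arange(3000, dtype=float)
context = mixed_context(history, pool)
print(len(context))  # 49: 45 random series + 4 past windows (3000 points fit only four)
```

Because random sampling is the baseline the paper found competitive, this sketch deliberately needs no similarity computation. A retrieval step can be layered on later.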
For teams currently using Prophet or similar tools, here’s a practical migration path:
Organizations can benefit from patterns across companies without sharing raw data:
```python
# Company A provides encrypted pattern embeddings
company_a_patterns = encrypt_patterns(company_a_data)
```

Unlike traditional models that need retraining:
```python
class AdaptiveForecaster:
    def predict_with_adaptation(self, target):
        # Morning prediction with overnight context
        morning_context = get_overnight_patterns()
        morning_forecast = predict(target, morning_context)
```

Apply patterns from completely different domains:
```python
# Use viral social media patterns to predict product adoption
social_viral_patterns = get_tiktok_viral_patterns()
product_forecast = predict(
    new_product_sales,
    context=[social_viral_patterns, previous_product_launches],
)
```

TimesFM-ICF represents more than an incremental improvement; it’s a fundamental shift in how we approach time-series forecasting. By borrowing the in-context learning paradigm from LLMs, it offers fine-tuned accuracy without a training step: you adapt the model by choosing its context examples rather than by updating its weights.
For practitioners, this means less time managing model pipelines and more time understanding business context. The question isn’t whether to adopt in-context forecasting, but how quickly you can build your context library and migration plan.
The age of “train once, deploy everywhere” forecasting has arrived. The only question is: what patterns will you discover when you can learn from any related time-series, anywhere, instantly?
Based on the paper and current information available, here’s the status of model availability:
The original TimesFM that this work builds on is available:
GitHub Repository: https://github.com/google-research/timesfm
Hugging Face:
```bash
pip install timesfm
```

Current Usage Example (Base TimesFM):
```python
import timesfm
```

Consider these available alternatives that offer some similar capabilities:
MOMENT (Multi-variate forecasting):
```bash
pip install momentfm
```
https://github.com/moment-timeseries-foundation-model/moment
Chronos (Amazon’s foundation model):
```bash
pip install chronos-forecasting
```
https://github.com/amazon-science/chronos-forecasting
Lag-Llama (probabilistic forecasting):
https://github.com/time-series-foundation-models/lag-llama
The paper’s authors (senrajat@google.com, abhidas@google.com) are at Google Research, so the model may eventually follow Google’s standard productization path through Vertex AI.
I’ll update the article when the model becomes publicly available. For now, the base TimesFM offers solid zero-shot capabilities, just without the powerful in-context learning feature that makes ICF special.