TS-Eval is live · Diaugeia.AI

Today we are launching TS-Eval — an open, reproducible leaderboard for time-series forecasting. Every entry is a community submission: one agent trajectory plus one verified result, ranked transparently across tracks, datasets, and horizons. It sits on top of ModernTSF, the producer framework we released a few days ago.

The first result

The opening round puts 135 models head-to-head on the Stock CSI-300 (沪深300) realtime track — 151 submissions, predicting 5 trading days from the prior 20, ranked by MSE (lower is better). The field spans two modes run on the same data: 124 submissions in pure time-series mode (each series forecast largely on its own) and 27 in graph/spatiotemporal mode (the ~300 stocks treated as nodes, with the cross-sectional structure between them modeled directly).
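To make the task concrete, here is a minimal sketch of how 20-in, 5-out forecasting windows can be cut from a single price series and scored by MSE. It assumes stride-1 sliding windows on raw values. TS-Eval's actual windowing, normalization, and averaging across the ~300 stocks are not described here, and `make_windows` and `mse` are hypothetical helpers, not ModernTSF functions.

```python
from typing import List, Tuple

LOOKBACK = 20  # prior trading days the model sees
HORIZON = 5    # trading days to forecast

def make_windows(series: List[float], lookback: int = LOOKBACK,
                 horizon: int = HORIZON) -> List[Tuple[List[float], List[float]]]:
    """Cut (input, target) pairs from one series with a stride-1 sliding window (assumed)."""
    windows = []
    for start in range(len(series) - lookback - horizon + 1):
        x = series[start:start + lookback]
        y = series[start + lookback:start + lookback + horizon]
        windows.append((x, y))
    return windows

def mse(preds: List[List[float]], targets: List[List[float]]) -> float:
    """Mean squared error averaged over every forecast step of every window (lower is better)."""
    total, count = 0.0, 0
    for p, t in zip(preds, targets):
        for pi, ti in zip(p, t):
            total += (pi - ti) ** 2
            count += 1
    if count == 0:
        raise ValueError("no forecast steps to score")
    return total / count

if __name__ == "__main__":
    # A 30-day toy series yields 30 - 20 - 5 + 1 windows.
    demo = make_windows([float(i) for i in range(30)])
    print(len(demo), demo[0][1])
```

Averaging over all steps weights every forecast day equally. A per-stock or per-horizon average would be an equally plausible design and could shift scores slightly.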

Rank	Model	MSE	Type
1	NBeats	0.7483	time-series
2	MTGNN	0.7484	graph
3	DFDGCN	0.7485	graph
4	STPGNN	0.7487	graph
5	HimNet	0.7488	graph
6	GWNet	0.7489	graph
7	STNorm	0.7490	graph
8	STGCN	0.7497	graph
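The board ranks by MSE in ascending order, so you can check the size of the gaps at the top directly. The sketch below sorts the eight scores from the table and reports each model's distance from the leader. The scores are copied from the table, and `rank_by_mse` is a hypothetical helper, not part of ModernTSF or the leaderboard code.

```python
from typing import Dict, List, Tuple

# MSE scores for the top eight, copied from the table above.
SCORES: Dict[str, float] = {
    "NBeats": 0.7483, "MTGNN": 0.7484, "DFDGCN": 0.7485, "STPGNN": 0.7487,
    "HimNet": 0.7488, "GWNet": 0.7489, "STNorm": 0.7490, "STGCN": 0.7497,
}

def rank_by_mse(scores: Dict[str, float]) -> List[Tuple[int, str, float, float]]:
    """Return (rank, model, mse, gap_to_leader), lowest MSE first."""
    ordered = sorted(scores.items(), key=lambda kv: kv[1])
    best = ordered[0][1]
    # Round the gap to the table's 4-decimal precision to hide float noise.
    return [(i + 1, name, score, round(score - best, 4))
            for i, (name, score) in enumerate(ordered)]

if __name__ == "__main__":
    for rank, name, score, gap in rank_by_mse(SCORES):
        print(f"{rank}  {name:<7} {score:.4f}  +{gap:.4f}")
```

The largest gap in this group is 0.0014 (STGCN), so the whole top eight sits within about a thousandth of the leader.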

The honest finding is that nothing separates at the top. #1 NBeats (0.7483) and #2 MTGNN (0.7484) are only 0.0001 apart, and the rest of the top eight trails the leader by at most 0.0014: a dead heat. Two patterns survive it. First, graph/spatiotemporal models that exploit the cross-sectional structure between stocks fill most of the front (15 of the top 20). Yet the single best result comes from a pure per-series model (NBeats), so the cross-sectional models cluster at the top without pulling away. Second, no model captures much signal: correlation with the truth sits near 0.04 across the leaders. What the field does clear is a real bar. A naive last-value baseline (HL) lands near the bottom at ≈1.50, roughly double the leaders' ~0.748. Across the full board, MSE runs from a best of 0.7483 through a median of 0.7856 to a worst of 1.7141.
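The two reference points in this paragraph are the naive last-value floor and the correlation with the truth. Both can be sketched in a few lines. This sketch makes three assumptions. "Last value" means repeating the final observed point across all five forecast days. Correlation is a Pearson coefficient over the flattened forecast and target values. Returning 0 for a constant series is a convention chosen here. The leaderboard's exact definitions of HL and of its correlation metric may differ, and the helper names are hypothetical.

```python
import math
from typing import List

def last_value_forecast(history: List[float], horizon: int = 5) -> List[float]:
    """Naive baseline (assumed definition): repeat the last observed value for every step."""
    return [history[-1]] * horizon

def pearson(xs: List[float], ys: List[float]) -> float:
    """Pearson correlation between flattened predictions and targets."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = math.sqrt(sum((x - mx) ** 2 for x in xs))
    sy = math.sqrt(sum((y - my) ** 2 for y in ys))
    if sx == 0 or sy == 0:
        return 0.0  # undefined for a constant series; 0 is a convention chosen here
    return cov / (sx * sy)

if __name__ == "__main__":
    # Toy data only: two windows of history and their realized next 5 values.
    histories = [[1.0, 1.2, 0.9], [0.5, 0.7, 0.6]]
    futures = [[1.1, 0.8, 1.3, 0.7, 1.0], [0.4, 0.9, 0.5, 0.8, 0.6]]
    preds = [v for h in histories for v in last_value_forecast(h)]
    truth = [v for f in futures for v in f]
    err = sum((p - t) ** 2 for p, t in zip(preds, truth)) / len(truth)
    print(f"MSE={err:.4f}  corr={pearson(preds, truth):.4f}")
```

MSE and correlation measure different things. A forecast can have a much lower MSE than the naive baseline while remaining nearly uncorrelated with the realized values, which is the pattern the leaders show.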

Below the top, most models cluster tightly before a wide long tail. Absolute predictability stays low (near-zero correlation for most), so CSI-300 forecasting is still genuinely hard. The takeaway is not that deep models magically solve stocks, nor that one architecture wins. On this data, learned models beat the naive floor by a wide margin, graph models cluster at the front, and no single model meaningfully separates from the pack.

This is a launch snapshot, not a final verdict: results are mostly single-seed (seed 2024), cover one horizon, and come from the first round. More datasets, more horizons, and the realtime refresh are coming.

Where it lives

Live leaderboard: diaugeia.ai/tseval
Static datasets: Diaugeia/TSEval-Static
RealTime datasets: Diaugeia/TSEval-RealTime
Submissions: Diaugeia/TSEval-Submissions
Weights: Diaugeia/TSEval-Weights
Leaderboard Space: Diaugeia/TSEval

Submit

Clone ModernTSF, run your experiment via the tsf CLI, and capture the run with tsf trace. Then run tsf submit --push to open a community PR that puts your model on the board.

For the full story, the protocol, and the complete results table, read the research post.