Honest backtesting — reading the Strategy Tester report without wishful thinking
A backtest is a tool for asking a precise question: 'Did this strategy work in the past?' It cannot answer: 'Will it work tomorrow?' Understanding what the report's metrics tell you — and what they conceal — is the difference between a professional's validation tool and a salesperson's proof document.
The 60-second version
87% of forex EA backtests show statistical overfitting signs. A good-looking report is the starting point of validation — not the end of it.
- Profit Factor > 1.3 and Trade Count > 100 are the two hardest metrics to fake with parameter curve-fitting
- Win Rate alone is meaningless: a 30% win rate with a 3:1 reward-to-risk ratio outperforms a 70% win rate with 1:3
- Every tick backtest with 99% quality data is not the same as live trading — spreads widen, requotes happen
- QuantConnect's 2023 study: 87% of forex backtests show in-sample overfitting characteristics
- The MT5 Strategy Tester has several modelling modes (Open Prices Only, 1 Minute OHLC, Every Tick, Every Tick Based on Real Ticks), each a different quality of simulation
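The win-rate bullet is expectancy arithmetic. As a minimal sketch in Python (the function name `expectancy_r` is illustrative and not part of MT5 or any library), measure each trade in multiples of the amount risked and read both ratios as reward relative to risk:

```python
def expectancy_r(win_rate: float, reward_to_risk: float) -> float:
    """Average result per trade, in units of the amount risked (R).

    A win pays `reward_to_risk` R, a loss costs 1 R.
    """
    return win_rate * reward_to_risk - (1.0 - win_rate) * 1.0


# 30% winners, each win three times the size of a loss
low_win_rate_system = expectancy_r(0.30, 3.0)

# 70% winners, each win one third the size of a loss
high_win_rate_system = expectancy_r(0.70, 1.0 / 3.0)

print(f"30% win rate, 3:1 -> {low_win_rate_system:+.3f} R per trade")
print(f"70% win rate, 1:3 -> {high_win_rate_system:+.3f} R per trade")
```

Under these assumptions the 30% system averages about +0.2 R per trade while the 70% system averages a small loss, which is why win rate needs the payoff ratio beside it.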
When QuantConnect published their 2023 systematic analysis of forex EA backtests, the finding was uncomfortable for almost everyone in the industry: the vast majority of backtests that look compelling don't survive contact with out-of-sample data.
QuantConnect's 2023 systematic analysis of forex EA backtests found that 87% showed statistical characteristics of in-sample overfitting — explaining the persistent gap between backtest and live performance seen across the industry.
Strategy Tester Report — Annotated
This is a realistic mock MT5 Strategy Tester report. Each metric below is shown with its value, its green / yellow / red rating, and the benchmark behind that rating.
Profit Factor
1.47
Above 1.3 — solid edge signal. The strategy earns meaningfully more than it loses in gross terms.
Total Net Profit
$2,840
Yellow — absolute dollar figure requires context. 28.4% return needs to be evaluated against test duration and drawdown incurred to earn it.
Max Drawdown
18.4%
Yellow — acceptable but not comfortable. In the 15–25% range where conservative position sizing is required.
Total Trades
312
Green — well above the 100-trade minimum. Sufficient sample for statistical conclusions.
Win Rate
61.2%
Yellow — win rate alone tells you nothing. Must be evaluated alongside average win, average loss, and Expected Payoff.
Sharpe Ratio
0.94
Yellow — marginally below the 1.0 acceptable threshold. Profitable but equity curve is volatile. Investigate time filters.
Recovery Factor
2.3
Green — above 2.0 threshold. Net profit is more than twice the maximum drawdown incurred to earn it.
Expected Payoff
$9.10
Green — positive expected payoff confirms systematic edge at the individual trade level.
Note: These metrics describe what happened in the historical test period. They do not predict live performance. Slippage, spread widening, broker execution differences, and look-ahead bias from repainting indicators can all erode the metrics you see here. Use this report as the starting point of validation — not the end of it.
Reading the backtest honestly
What the Strategy Tester actually does
MT5's Strategy Tester simulates your EA running against a historical sequence of price data. It reconstructs what would have happened — entry signals, exits, position management — had your code been live during that period. The output is a report of simulated trades, an equity curve, and a set of performance statistics.
The quality of that simulation depends critically on the data it uses. MT5 offers several testing modes. Open Prices Only evaluates the EA only at the opening price of each bar: fast but inaccurate for intrabar logic. 1 Minute OHLC tests on the open, high, low and close of each one-minute bar: faster than tick modelling but still an approximation. Every Tick generates synthetic ticks from 1-minute bars, and Every Tick Based on Real Ticks replays stored tick data: the most accurate mode, and the one that runs slowest. The history quality percentage shown in the report tells you how complete your price history is; anything below 90% means the simulation filled gaps with approximated data.
None of these modes simulates broker-specific spread widening during high-impact news, the requote behaviour of a specific liquidity provider, or the latency of your VPS-to-broker connection. The Tester runs on ideal execution — every order fills exactly at the requested price, every stop hits at the set level, spreads stay constant. In live trading, none of those conditions reliably holds.
Which metrics matter — and what to benchmark them against
Profit Factor is the ratio of gross profit to gross loss. A value of 1.0 means the strategy broke even before costs; below 1.0 means it was a net loser. Professional thresholds: above 1.3 is solid, above 1.5 is strong, above 2.0 warrants close inspection — unusually high profit factors on short backtest windows frequently indicate curve-fitting rather than genuine edge. The metric is reliable because it is difficult to inflate without a large number of winning trades.
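The ratio described above is straightforward to compute from a list of closed-trade results. A minimal sketch, assuming results are in account currency and using made-up trades (the helper name `profit_factor` is illustrative):

```python
def profit_factor(trade_results: list[float]) -> float:
    """Gross profit divided by gross loss for a list of closed-trade results."""
    gross_profit = sum(r for r in trade_results if r > 0)
    gross_loss = -sum(r for r in trade_results if r < 0)
    if gross_loss == 0:
        # No losing trades: the ratio is undefined, reported here as infinity
        # (or 0.0 when there are no winning trades either).
        return float("inf") if gross_profit > 0 else 0.0
    return gross_profit / gross_loss


# Hypothetical results: 275 of gross profit against 200 of gross loss
sample_trades = [120.0, -80.0, 60.0, -50.0, 95.0, -70.0]
pf = profit_factor(sample_trades)
print(f"Profit factor: {pf:.3f}")
```

Six trades is far too few to conclude anything; the sample only shows the arithmetic.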
Maximum Drawdown as a percentage is the largest peak-to-trough equity decline during the test. The absolute dollar figure is irrelevant — a $5,000 drawdown on a $10,000 account is a 50% drawdown, the same number on a $100,000 account is 5%. Benchmark: below 15% is comfortable for most professional contexts; 15–25% is acceptable but means position sizing must be conservative; above 25% should trigger a detailed investigation of when the losses occurred and whether they were correlated with specific market conditions.
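To make the percentage-versus-dollars point concrete, here is a small sketch that measures the largest peak-to-trough decline of an equity curve. The equity values are invented to mirror the $5,000 example above, and `max_drawdown_pct` is an illustrative helper rather than an MT5 function:

```python
def max_drawdown_pct(equity_curve: list[float]) -> float:
    """Largest peak-to-trough decline, as a percentage of the peak."""
    peak = equity_curve[0]
    worst = 0.0
    for equity in equity_curve:
        peak = max(peak, equity)                    # highest equity seen so far
        worst = max(worst, (peak - equity) / peak)  # decline from that peak
    return worst * 100.0


# The same $5,000 decline on two different account sizes
small_account = [10_000.0, 5_000.0, 7_500.0]
large_account = [100_000.0, 95_000.0, 97_500.0]

print(f"Small account: {max_drawdown_pct(small_account):.1f}%")
print(f"Large account: {max_drawdown_pct(large_account):.1f}%")
```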
Trade Count is the statistical foundation of everything else. With fewer than 30 trades, even a high profit factor is statistically fragile: a strategy with no edge at all could produce it by chance, like a lucky run of coin tosses. With 30–100 trades, the statistics are suggestive but not conclusive. Above 100 trades the metrics begin to carry meaningful weight; above 300 trades they are far less likely to be the product of luck. If a backtest covers only 3 months and shows 25 trades, you are extrapolating from almost no evidence.
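The coin-toss comparison can be checked exactly. The sketch below assumes a deliberately simplified strategy with no edge: every trade wins or loses the same amount with 50/50 odds, so the profit factor is simply wins divided by losses. The function name is illustrative.

```python
from math import comb


def prob_lucky_profit_factor(n_trades: int) -> float:
    """Chance that a zero-edge coin-toss strategy shows profit factor >= 1.5.

    With equal-size wins and losses, wins / losses >= 1.5 means wins >= 60%
    of all trades.
    """
    min_wins = -(-3 * n_trades // 5)  # integer ceiling of 0.6 * n_trades
    favourable = sum(comb(n_trades, k) for k in range(min_wins, n_trades + 1))
    return favourable / 2 ** n_trades


for n in (25, 100, 300):
    print(f"{n:>3} trades: {prob_lucky_profit_factor(n):.4%} chance by luck alone")
```

With 25 trades the no-edge strategy reaches a profit factor of 1.5 or better about 21% of the time; with 300 trades the chance is well under 1%.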
Recovery Factor is net profit divided by maximum drawdown. It asks: how many times over did the strategy recover its worst historical loss? A value above 2.0 is solid; above 3.0 is strong. It rewards strategies that earn consistently without catastrophic drawdowns and penalises those that require large drawdowns to generate their returns.
Sharpe Ratio measures return per unit of volatility. In the context of backtesting, it captures consistency: a strategy that earns 1% per month every month has a far higher Sharpe than one that earns 3% one month and loses 1% the next, even though the two compound to almost the same annual return. Below 0.5 is poor; 0.5–1.0 is marginal; above 1.0 is acceptable; above 1.5 is strong. Win Rate and Expected Payoff per trade provide supporting context but should never be evaluated in isolation.
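A simplified sketch of the consistency idea, assuming monthly returns, a zero risk-free rate, and annualisation by the square root of 12. These are assumptions of the example; the Strategy Tester's own Sharpe figure may be calculated differently. The return series are invented.

```python
from math import sqrt
from statistics import mean, stdev


def sharpe_ratio(period_returns: list[float], periods_per_year: int = 12) -> float:
    """Mean return over its standard deviation, annualised (risk-free rate = 0)."""
    volatility = stdev(period_returns)
    if volatility == 0:
        return float("inf")  # perfectly constant returns have no volatility
    return mean(period_returns) / volatility * sqrt(periods_per_year)


# Both series average 1% per month; only the month-to-month swings differ
steady = [0.012, 0.008] * 6
choppy = [0.03, -0.01] * 6

print(f"Steady: {sharpe_ratio(steady):.2f}")
print(f"Choppy: {sharpe_ratio(choppy):.2f}")
```

The two series have the same average return, yet the steady one scores roughly ten times higher because its swings are ten times smaller.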
4 things a backtest cannot tell you
First: slippage variation. The Tester fills every order at exactly the requested price. In live trading, especially during news events, market orders fill at whatever price is available, which can be 5–20 pips from the signal price for a scalping EA. If your strategy's average profit per trade is 8 pips and live slippage averages 4 pips per round trip, half of the net edge is gone: the Tester's profit factor of 1.6 becomes a live profit factor of roughly 1.25, and any further cost pushes it toward breakeven.
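The effect of slippage on profit factor can be worked through on a concrete trade list. The sketch below uses invented trades chosen to match the figures in the paragraph above (8 pips average profit, profit factor 1.6) and assumes every trade gives up the same 4 pips; the result depends on how wins and losses are distributed, so treat the output as one illustration rather than a general figure.

```python
def profit_factor(results_pips: list[float]) -> float:
    """Gross profit divided by gross loss (assumes at least one losing trade)."""
    gross_profit = sum(r for r in results_pips if r > 0)
    gross_loss = -sum(r for r in results_pips if r < 0)
    return gross_profit / gross_loss


def apply_slippage(results_pips: list[float], slippage_pips: float) -> list[float]:
    """Subtract a fixed round-trip slippage cost from every trade."""
    return [r - slippage_pips for r in results_pips]


# Hypothetical backtest: 20 winners of 32 pips and 10 losers of 40 pips
backtest_trades = [32.0] * 20 + [-40.0] * 10
live_trades = apply_slippage(backtest_trades, 4.0)

pf_backtest = profit_factor(backtest_trades)
pf_live = profit_factor(live_trades)
avg_backtest = sum(backtest_trades) / len(backtest_trades)

print(f"Average trade in the Tester: {avg_backtest:.1f} pips")
print(f"Profit factor in the Tester: {pf_backtest:.2f}")
print(f"Profit factor after 4 pips slippage: {pf_live:.2f}")
```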
Second: spread widening at news. Most brokers show tight 0.1–0.3 pip spreads on EURUSD under normal conditions. During high-impact news releases, those spreads routinely widen to 5–15 pips for 2–5 minutes. Strategy Tester uses a fixed spread (the value you enter in the tester settings). An EA that makes money at 0.3 pip spread may lose heavily if it opens positions during spreads of 8 pips — a scenario the Tester never simulates unless you manually set a wide spread for the entire test period.
Third: broker requote behaviour and order-type specifics. Different brokers implement stop orders differently. Some fill market orders with slippage beyond the stated spread during volatility. Some reject pending orders if the price gaps past them. These are execution-layer characteristics specific to each broker's infrastructure; they are invisible to the Strategy Tester, which models idealised order processing.
Fourth: look-ahead bias from indicator recalculation. Some technical indicators in MT5 recalculate historical bars when new data arrives — a phenomenon called repainting. If your EA's signal uses a repainting indicator, the backtest 'sees' the signal as it looks after repainting, not as it looked in real time. This systematically overstates the EA's accuracy on historical data in a way that will never materialise in live trading. Non-repainting indicators calculated at bar close are reliable; indicators that use current-bar data or apply smoothing to history can paint a false picture.
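Repainting is easiest to see with a toy indicator. The sketch below is not any specific MT5 indicator: it contrasts a trailing average, which uses only bars that had already closed, with a centred average, which also uses the bar that follows. The closing prices are made up.

```python
def trailing_mean(prices: list[float], index: int, window: int = 3) -> float:
    """Average of the bar at `index` and the bars before it (no future data)."""
    start = max(0, index - window + 1)
    chunk = prices[start:index + 1]
    return sum(chunk) / len(chunk)


def centred_mean(prices: list[float], index: int, half_window: int = 1) -> float:
    """Average of the bars around `index`, including later bars if they exist."""
    start = max(0, index - half_window)
    chunk = prices[start:index + half_window + 1]
    return sum(chunk) / len(chunk)


closes = [1.1000, 1.1010, 1.1030, 1.0990, 1.0970]
bar = 2
live_view = closes[:bar + 1]  # the data that existed when bar 2 closed

# The trailing value for bar 2 is the same in real time and in hindsight
stable_live = trailing_mean(live_view, bar)
stable_hindsight = trailing_mean(closes, bar)

# The centred value for bar 2 changes once bar 3 arrives: it repaints
repaint_live = centred_mean(live_view, bar)
repaint_hindsight = centred_mean(closes, bar)

print(f"Trailing: live {stable_live:.4f}, hindsight {stable_hindsight:.4f}")
print(f"Centred:  live {repaint_live:.4f}, hindsight {repaint_hindsight:.4f}")
```

A backtest that reads the hindsight value for bar 2 is trading on information that did not exist when that bar closed.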
A story from 2023
The QuantConnect research team spent 2023 systematically analysing a large sample of forex EA backtests submitted to their platform, looking for statistical fingerprints of overfitting.
Their methodology tested each backtest for three overfitting signatures: parameter sensitivity (whether small changes to optimised inputs caused large performance drops), out-of-sample degradation (whether the strategy's performance on data not used in optimisation was significantly worse than in-sample), and stationarity (whether the statistical properties of the trades were consistent across sub-periods). In 87% of forex EA backtests examined, at least two of the three signatures were present: the strategies had been fitted to historical noise rather than trained on genuine market structure.
The finding aligns with what practitioners observe empirically: the overwhelming majority of EAs that sell on the strength of backtest reports fail to reproduce that performance in live conditions over a 6–12 month period. The professional response is not to abandon backtesting but to treat it as one input in a validation pipeline that also includes out-of-sample testing, walk-forward analysis, and a minimum 3-month forward demo period before any live capital deployment.
Practice
Run MACD Sample EA and score the report
Open MT5's Strategy Tester and run the built-in MACD Sample EA. You are going to read the resulting report using the framework from this lesson and score each metric. Use a demo account with default settings. This exercise takes about 15 minutes.
1. Open Strategy Tester (Ctrl+R). In the Expert dropdown select 'MACD Sample.' Set Symbol to EURUSD, Timeframe to H1, date range to the last 12 months. Set the model to 'Every tick based on real ticks' if your data quality allows; otherwise use 'Every tick.' Set initial deposit to $10,000. Click Start.
2. When the test completes, click the Results tab. Find and record these four values: Profit Factor, Total Trades, Max Drawdown %, and Net Profit. Do not look at the equity graph yet. Force yourself to evaluate the numbers first; the graph is designed to feel more compelling than the underlying statistics.
3. Apply the benchmarks from this lesson. Is Profit Factor above 1.3? Are there more than 100 trades? Is drawdown below 25%? Score the strategy: 0 benchmarks met = do not continue testing; 1–2 met = marginal, requires deeper investigation; 3–4 met = worth taking to out-of-sample testing.
4. Now open the Graph tab. Look at the equity curve's shape. Is it a smooth rising line (consistent edge) or a jagged line with large swings (high variance)? A smooth curve with many small consistent gains suggests genuine statistical edge. A curve that recovers from deep losses via a few large wins suggests luck rather than system.
5. Finally, check the Quality % in the Results tab. If it is below 90%, note that the backtest simulation was approximating missing data. Consider downloading higher-quality tick data from Dukascopy or a similar source and re-running the test to compare whether the metrics change materially.
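The scoring rule in the exercise can be written down as a small function. This is a sketch that uses only the three benchmarks the exercise spells out (Profit Factor, trade count, drawdown), so it scores out of three; the function name and return shape are illustrative.

```python
def score_report(profit_factor: float, total_trades: int,
                 max_drawdown_pct: float) -> tuple[int, str]:
    """Count how many of the lesson's benchmarks a report meets."""
    checks = [
        profit_factor > 1.3,      # edge benchmark
        total_trades > 100,       # sample-size benchmark
        max_drawdown_pct < 25.0,  # risk benchmark
    ]
    met = sum(checks)
    if met == 0:
        verdict = "do not continue testing"
    elif met <= 2:
        verdict = "marginal, requires deeper investigation"
    else:
        verdict = "worth taking to out-of-sample testing"
    return met, verdict


# Values from the mock report earlier in this lesson
met, verdict = score_report(profit_factor=1.47, total_trades=312,
                            max_drawdown_pct=18.4)
print(f"{met} benchmarks met: {verdict}")
```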
Mastery check
Four questions. Pass at 75% (3/4). Each question tests a core backtest-reading skill.
Pro deep dive
The metrics and limitations covered in this lesson are the professional minimum. Serious EA validation uses a pipeline that goes substantially further.
Monte Carlo simulation: stress-testing beyond the historical sample
A single backtest on a fixed historical period is one sample path from a distribution of possible outcomes. Monte Carlo simulation generates thousands of alternative 'shuffled' trade sequences — randomly reordering the trades from a backtest to see what the distribution of equity outcomes looks like. If your backtest produces 300 trades with an 18% drawdown, a Monte Carlo simulation might show that 5% of alternative orderings produced drawdowns above 35%. This is a more honest estimate of the strategy's tail risk than the single observed historical maximum drawdown. MT5's built-in Strategy Tester does not include Monte Carlo; dedicated tools include FX Blue's EA Analyzer and the StrategyQuant Monte Carlo module. At minimum, professional EA validation includes a 1,000-path Monte Carlo run before any live capital deployment.
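The reshuffling idea can be sketched in a few lines of standard-library Python. This is a simplified illustration, not how any named tool implements it: it assumes fixed-size trade results in account currency, uses invented trades, and fixes the random seed so the run is repeatable.

```python
import random


def max_drawdown_pct(equity_curve: list[float]) -> float:
    """Largest peak-to-trough decline, as a percentage of the peak."""
    peak = equity_curve[0]
    worst = 0.0
    for equity in equity_curve:
        peak = max(peak, equity)
        worst = max(worst, (peak - equity) / peak)
    return worst * 100.0


def monte_carlo_drawdowns(trade_results: list[float], start_equity: float,
                          n_paths: int = 1000, seed: int = 42) -> list[float]:
    """Max drawdown of each randomly reordered trade sequence, sorted ascending."""
    rng = random.Random(seed)
    drawdowns = []
    for _ in range(n_paths):
        shuffled = trade_results[:]
        rng.shuffle(shuffled)        # same trades, different order
        equity = start_equity
        curve = [equity]
        for result in shuffled:
            equity += result
            curve.append(equity)
        drawdowns.append(max_drawdown_pct(curve))
    return sorted(drawdowns)


# Hypothetical backtest: 60 winners of $150 and 40 losers of $100
trades = [150.0] * 60 + [-100.0] * 40
dds = monte_carlo_drawdowns(trades, start_equity=10_000.0)

median_dd = dds[len(dds) // 2]
p95_dd = dds[len(dds) * 95 // 100]
print(f"Median drawdown: {median_dd:.1f}%, 95th percentile: {p95_dd:.1f}%")
```

Every path ends at the same final equity because the trades are identical; only the route, and therefore the drawdown, changes. The 95th-percentile figure is the one to size positions against.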
Walk-forward testing: the closest thing to honest out-of-sample validation
Walk-forward testing divides the historical period into an optimisation window and a validation window. You optimise parameters on the first window (in-sample), then test the optimised parameters on the second window (out-of-sample) without touching them. You then shift the window forward in time and repeat. The result is a sequence of out-of-sample periods that together cover the full history — and the aggregate performance of those out-of-sample segments is a much more honest estimate of live performance than in-sample backtest results. If an EA's walk-forward efficiency (out-of-sample profit / in-sample profit) is above 60%, the parameters are reasonably robust. Below 40% strongly suggests curve-fitting. StrategyQuant, MT5's Optimiser with manual window management, and FX Blue all support walk-forward methodologies.
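The window mechanics described above can be sketched as follows. The function names are illustrative, windows are expressed as bar-index ranges, and the efficiency ratio follows the definition in the text; it assumes both profits are measured over comparable lengths of time.

```python
def walk_forward_windows(n_bars: int, in_sample: int,
                         out_of_sample: int) -> list[tuple[tuple[int, int], tuple[int, int]]]:
    """Return ((optimise_start, optimise_end), (validate_start, validate_end)) pairs.

    Ranges are half-open bar indices. Each step moves forward by one
    out-of-sample length, so the validation segments tile the history
    without overlapping.
    """
    windows = []
    start = 0
    while start + in_sample + out_of_sample <= n_bars:
        split = start + in_sample
        windows.append(((start, split), (split, split + out_of_sample)))
        start += out_of_sample
    return windows


def walk_forward_efficiency(in_sample_profit: float,
                            out_of_sample_profit: float) -> float:
    """Out-of-sample profit as a fraction of in-sample profit."""
    return out_of_sample_profit / in_sample_profit


windows = walk_forward_windows(n_bars=1000, in_sample=400, out_of_sample=200)
for optimise, validate in windows:
    print(f"optimise on bars {optimise}, validate on bars {validate}")

# Hypothetical profits: 1000 in-sample, 650 out-of-sample
wfe = walk_forward_efficiency(1000.0, 650.0)
print(f"Walk-forward efficiency: {wfe:.0%}")
```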
Anchored vs rolling backtest windows: when the test period matters
A 5-year backtest on EURUSD includes 2019 (low-volatility choppy markets), 2020 (COVID spike), 2021 (post-COVID trend), 2022 (rate-hike regime), and 2023–2024 (DXY divergence). These are fundamentally different market regimes. An EA that performs well across all of them has demonstrated regime-robustness; one that performs well only because 2020 and 2022 were strong trending years may fail in choppy periods. Anchored backtests (fixed start date, extending forward) hide this by blending regimes. Rolling window backtests (constant-length windows stepped forward in time) reveal it — you can see exactly which market regimes the EA performed in and which it struggled with. Professional validation uses rolling windows at least as long as 2 full market cycles, which typically requires 4–6 years of data for a daily-timeframe strategy.
What QuantConnect's 2023 findings mean for your validation process
The QuantConnect 2023 systematic analysis found overfitting signatures in 87% of forex EA backtests. The breakdown was instructive: 45% showed excessive parameter sensitivity (performance collapsed when inputs were changed by ±10%); 62% showed significant out-of-sample degradation; 38% showed non-stationary trade distributions, meaning the strategy's behaviour changed materially across sub-periods. The practical implication is a simple test you can apply yourself: take your optimised EA and change each parameter by 10% in both directions. If the Profit Factor drops by more than 30% with a 10% parameter change, you have a brittle, likely overfitted configuration. Robust strategies maintain most of their edge across a reasonable parameter neighbourhood — because the edge comes from market structure, not from fitting to specific historical noise. This sensitivity test adds under 30 minutes to any serious EA validation process and will eliminate most overfitted results before they reach live capital.
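The ±10% sensitivity test can be automated once you have some way to run a backtest from code. In the sketch below, `backtest` is a hypothetical callable that takes a parameter dictionary and returns a Profit Factor; the two toy functions stand in for a real tester run and exist only to show a robust and a brittle response.

```python
from math import exp
from typing import Callable


def sensitivity_check(backtest: Callable[[dict], float], params: dict,
                      step: float = 0.10, max_drop: float = 0.30):
    """Shift each parameter by +/- step and record the relative Profit Factor drop."""
    baseline = backtest(params)
    drops = {}
    for name, value in params.items():
        for factor in (1 - step, 1 + step):
            # Real EA inputs are often integers and would need rounding here
            trial = dict(params, **{name: value * factor})
            drops[(name, factor)] = (baseline - backtest(trial)) / baseline
    brittle = any(drop > max_drop for drop in drops.values())
    return brittle, drops


def robust_toy(params: dict) -> float:
    """Toy stand-in: Profit Factor falls off slowly around period = 20."""
    return 1.0 + 0.6 * exp(-((params["period"] - 20) / 10.0) ** 2)


def brittle_toy(params: dict) -> float:
    """Toy stand-in: Profit Factor collapses just off period = 20."""
    return 1.0 + 0.6 * exp(-((params["period"] - 20) / 1.0) ** 2)


robust_flag, robust_drops = sensitivity_check(robust_toy, {"period": 20})
brittle_flag, brittle_drops = sensitivity_check(brittle_toy, {"period": 20})
print(f"Robust configuration flagged as brittle: {robust_flag}")
print(f"Sharp-peak configuration flagged as brittle: {brittle_flag}")
```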
Profit Factor > 1.3 (ratio of gross profit to gross loss); Max Drawdown < 25% of starting equity; Trade Count > 100 (statistical foundation); Sharpe Ratio > 1.0 (return relative to volatility). Recovery Factor > 2.0 is a strong fifth metric.
Educational material only — not investment advice. Trading carries risk of capital loss. Always practice on demo and use a stop-loss.