Optimisation without overfitting — finding real parameters vs curve-fitting the past
MT5's Strategy Tester Optimiser can find parameter combinations that produce spectacular backtests. Most of them fail immediately in live markets. This lesson shows you how optimisation works, why it produces overfitted results by default, and the walk-forward protocol that separates genuinely robust parameters from historical accidents.
The 60-second version
Every parameter you optimise is a coin flip. The optimiser always finds the combination that won — not the combination that will win next.
- Running 10,000 parameter combinations all but guarantees finding one that looks spectacular, even with a random strategy
- Walk-forward testing is the only way to test whether optimised parameters generalise to unseen data
- Limit optimisation to 3–4 free parameters; each additional parameter multiplies the overfitting risk
- Signs of an overfit result: profit factor above 3.0, Sharpe above 3.0, equity curve with no drawdown periods
- The yen carry-trade unwind of August 2024 exposed which EAs were optimised for calm ranging markets and which were genuinely robust
August 5, 2024 was one of the most violent single-session moves in yen pairs in decades. EAs that had been optimised exclusively on the 2023 low-volatility ranging period encountered a market regime their parameters had never been calibrated for.
On August 5, 2024, USDJPY fell 10 big figures in a single Asian session. EAs running without a drawdown kill-switch absorbed the entire move. Those with a daily loss cap of 3-5% exited early and preserved capital.
Overfit Detector
The three zones below show how increasing the number of optimised parameters changes the relationship between in-sample and out-of-sample performance. This is an educational model: the values illustrate the pattern, not a precise formula.
Under-optimised (1–2 params)
Too few free parameters to fit noise effectively. In-sample and out-of-sample results are similar. The strategy is robust but may be leaving edge on the table.
Sweet spot (3–4 params)
Enough flexibility to capture genuine structural patterns without excessive noise-fitting. In-sample modestly outperforms out-of-sample. This is the target zone.
Overfit zone (5–8 params)
Too many free parameters. The optimiser has enough degrees of freedom to sculpt the equity curve to the historical data. Out-of-sample performance degrades sharply. High risk of live failure.
The sweet spot is 3–4 optimised parameters. Beyond that, each additional parameter is more likely to improve the historical fit than to improve forward performance.
Understanding optimisation and its traps
What the MT5 optimiser actually does
The MT5 Strategy Tester Optimiser is an automated search engine. You define a set of input parameters, a range of values to test for each, and an objective function — the metric you want to maximise. The optimiser then runs the EA's backtest hundreds or thousands of times, each run using a different combination of parameters within your defined ranges. It keeps track of which combinations produced the best result by the objective metric and presents them ranked in the optimisation results table.
Two search algorithms are available. The slow, thorough approach is a full scan (MT5 labels it the slow complete algorithm): systematic evaluation of every combination in the defined space. The faster approach is a genetic algorithm: an evolutionary search that progressively focuses on promising parameter regions, reducing the total number of backtests needed at the cost of occasionally missing the global optimum. For wide parameter ranges with many inputs, the genetic algorithm is the only practical choice. For narrow ranges with few inputs, a full scan is feasible and more complete.
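To see why a full scan stops being practical as inputs are added, count the runs: the number of backtests is the product of the number of values each input can take. The sketch below is a minimal illustration in Python. The first two ranges are the ones used in this lesson's practice exercise; `stop_pips` and `tp_pips` are hypothetical extra inputs added only to show the growth.

```python
from math import prod

def values_in_range(start, stop, step):
    """Number of values an input takes when stepped from start to stop inclusive."""
    return (stop - start) // step + 1

def full_scan_runs(ranges):
    """Total backtests a full scan needs: the product of per-input value counts."""
    return prod(values_in_range(*r) for r in ranges.values())

# Ranges from the practice exercise: Fast Period 5-20, Slow Period 15-50, step 1.
two_inputs = {"fast_period": (5, 20, 1), "slow_period": (15, 50, 1)}

# Hypothetical third and fourth inputs, stepped in 5-pip increments.
four_inputs = {**two_inputs, "stop_pips": (20, 80, 5), "tp_pips": (20, 120, 5)}

print(full_scan_runs(two_inputs))   # 16 * 36 = 576
print(full_scan_runs(four_inputs))  # 576 * 13 * 21 = 157248
```

Two inputs need a few hundred runs; four inputs need well over a hundred thousand, which is where the genetic search becomes the realistic option.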
The fundamental problem is not with the search algorithm — it is with the framing of the question. You are asking: 'What parameter set produced the best result on this specific historical data?' The answer to that question is always meaningful when interpreted correctly: it tells you which parameters were most suited to the specific market conditions in that period. It tells you almost nothing about which parameters will be most suited to the future, which will contain different conditions.
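A small simulation makes this selection effect concrete. The sketch below assumes a "strategy" is nothing more than 200 random trades with zero expected profit (an illustrative assumption, not a model of any real EA). It generates many such candidates, picks the one with the best in-sample profit factor, as an optimiser would, and then looks at how that same candidate does on a fresh set of random trades.

```python
import random

def profit_factor(trades):
    """Gross profit divided by gross loss for a list of trade P/L values."""
    gross_profit = sum(t for t in trades if t > 0)
    gross_loss = -sum(t for t in trades if t < 0)
    return gross_profit / gross_loss if gross_loss > 0 else float("inf")

def random_trades(rng, n_trades=200):
    """Zero-edge trades: each P/L is drawn symmetrically around zero."""
    return [rng.gauss(0.0, 1.0) for _ in range(n_trades)]

def best_of(n_candidates, seed=1):
    """Pick the candidate with the best in-sample PF; report its out-of-sample PF too."""
    rng = random.Random(seed)
    in_sample, out_of_sample = [], []
    for _ in range(n_candidates):
        in_sample.append(profit_factor(random_trades(rng)))
        out_of_sample.append(profit_factor(random_trades(rng)))
    winner = max(range(n_candidates), key=lambda i: in_sample[i])
    return in_sample[winner], out_of_sample[winner]

for n in (1, 100, 2000):
    best_is, same_oos = best_of(n)
    print(f"{n:>5} candidates: best in-sample PF {best_is:.2f}, "
          f"same candidate out-of-sample PF {same_oos:.2f}")
```

The best in-sample figure can only rise as more candidates are tried, even though no candidate has any edge. The out-of-sample figure for the winner is just another draw from the same zero-edge distribution.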
Walk-forward testing — the gold standard
Walk-forward testing is the professional answer to the overfitting problem. The principle is rigorous and simple: you never test the parameter set on data that was used to find it. The historical period is divided into two parts — the in-sample window, used for optimisation, and the out-of-sample window, used for validation. You optimise parameters on the in-sample window, then test the resulting parameter set on the out-of-sample window without any further adjustments. What the strategy achieves on the out-of-sample data is an honest estimate of how robust those parameters are.
A single in-sample/out-of-sample split is better than no split, but it is vulnerable to the possibility that your out-of-sample window happened to be favourable for any strategy. The full walk-forward methodology addresses this by rolling the window forward in time and repeating the process. A typical protocol takes 5 years of data, optimises on a 3-year in-sample window, records the result on the following 1-year out-of-sample window, shifts both windows forward by 6 months, and repeats. With these numbers you get three out-of-sample periods. Because the 6-month step is shorter than the 1-year out-of-sample window, consecutive periods overlap by six months, so they are not fully independent; step by a full year, or use a longer history, if you need non-overlapping periods. Their aggregate performance is still far more informative than any single test.
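The rolling schedule can be sketched in a few lines of Python. The function name and the use of month offsets (rather than calendar dates) are simplifying assumptions for illustration; MT5 itself takes explicit start and end dates for each run.

```python
def rolling_windows(total_months, in_sample_months, oos_months, step_months):
    """Return (is_start, is_end, oos_end) month offsets for each iteration.

    The out-of-sample window always starts where the in-sample window ends.
    """
    windows = []
    start = 0
    while start + in_sample_months + oos_months <= total_months:
        is_end = start + in_sample_months
        windows.append((start, is_end, is_end + oos_months))
        start += step_months
    return windows

# 5 years of data, 3-year in-sample, 1-year out-of-sample, 6-month step.
for is_start, is_end, oos_end in rolling_windows(60, 36, 12, 6):
    print(f"optimise on months {is_start}-{is_end}, validate on months {is_end}-{oos_end}")
```

With 60 months of data this yields three iterations, with out-of-sample periods covering months 36-48, 42-54 and 48-60. Consecutive periods overlap by six months; passing `step_months=12` gives two non-overlapping periods instead.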
The metric that results from this process is walk-forward efficiency: the ratio of average out-of-sample performance to average in-sample performance, with both measured in the same units (for example, annualised return, since the two windows differ in length). A ratio above 60% suggests the parameters are genuine: the strategy is earning most of its in-sample return on truly unseen data. A ratio below 40% strongly suggests curve-fitting: the strategy found the historical noise and cannot reproduce its performance on new data. A ratio between 40% and 60% is inconclusive; the strategy may be partially robust but requires further investigation with additional out-of-sample windows.
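As a worked example, the ratio and the three bands above can be computed as follows. This is a minimal sketch: the choice of annualised return as the performance measure and the sample figures are assumptions made for illustration, not results from a real EA.

```python
from statistics import mean

def walk_forward_efficiency(in_sample, out_of_sample):
    """Average out-of-sample performance divided by average in-sample performance."""
    avg_in_sample = mean(in_sample)
    if avg_in_sample <= 0:
        raise ValueError("in-sample average must be positive for the ratio to be meaningful")
    return mean(out_of_sample) / avg_in_sample

def classify(wfe):
    """Apply the lesson's bands: above 60%, below 40%, or in between."""
    if wfe > 0.60:
        return "likely robust"
    if wfe < 0.40:
        return "likely curve-fit"
    return "inconclusive"

# Hypothetical annualised returns for three walk-forward iterations.
in_sample_returns = [0.24, 0.30, 0.27]
out_of_sample_returns = [0.15, 0.10, 0.17]

wfe = walk_forward_efficiency(in_sample_returns, out_of_sample_returns)
print(f"walk-forward efficiency: {wfe:.0%} ({classify(wfe)})")
```

Here the averages are 27% in-sample and 14% out-of-sample, so the efficiency is about 52%, which lands in the inconclusive band.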
Signs of an overfit result
Certain patterns in optimisation results are diagnostic of curve-fitting rather than genuine edge. The most reliable red flags are statistical: a Profit Factor above 3.0 on a standard backtest period is almost always evidence of fitting, not edge — genuine systematic edges in liquid forex markets rarely produce profit factors that high when tested over multi-year periods. Similarly, a Sharpe Ratio above 3.0 in backtesting describes an equity curve that is unrealistically smooth. Real market conditions produce drawdown periods; a backtest that shows almost no drawdown has been optimised to avoid the specific bad periods in the test data, not to navigate genuinely adverse markets.
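Both red flags are easy to check mechanically once you have exported the trade list. The sketch below assumes profit factor is gross profit divided by gross loss and uses a common textbook Sharpe ratio (mean over standard deviation of monthly returns, annualised, risk-free rate taken as zero). MT5's own report may calculate Sharpe differently, so treat the numbers as indicative. The sample data is hypothetical.

```python
from math import sqrt
from statistics import mean, stdev

def profit_factor(trade_pnls):
    """Gross profit divided by gross loss."""
    gross_profit = sum(p for p in trade_pnls if p > 0)
    gross_loss = -sum(p for p in trade_pnls if p < 0)
    return gross_profit / gross_loss if gross_loss > 0 else float("inf")

def annualised_sharpe(period_returns, periods_per_year=12):
    """Textbook Sharpe ratio with a zero risk-free rate (an assumption)."""
    spread = stdev(period_returns)
    if spread == 0:
        return float("inf")
    return mean(period_returns) / spread * sqrt(periods_per_year)

def overfit_flags(trade_pnls, monthly_returns, limit=3.0):
    """Return the statistical red flags from the lesson that this backtest trips."""
    flags = []
    if profit_factor(trade_pnls) > limit:
        flags.append("profit factor above 3.0")
    if annualised_sharpe(monthly_returns) > limit:
        flags.append("Sharpe ratio above 3.0")
    return flags

# Hypothetical backtest: large wins, tiny losses, suspiciously steady months.
trades = [120, -40, 90, -30, 150, -20, 80]
monthly = [0.031, 0.028, 0.035, 0.030, 0.027, 0.033]
print(overfit_flags(trades, monthly))
```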
Equity curve shape is another powerful signal. A healthy backtest equity curve is trending upward but includes visible drawdown periods — sequences of consecutive losing trades that test the strategy's ability to recover. A curve that rises almost without interruption, particularly one that bends upward toward the end of the test period, is a hallmark of optimised-to-the-data performance. It has been fitted to the most recent price behaviour and is extrapolating that behaviour indefinitely.
Parameter values themselves reveal overfitting. Parameters that fall at the extreme edges of the tested range — the fastest or slowest moving average, the tightest or widest stop distance — suggest the optimiser found that performance improved monotonically as the parameter approached its boundary, which typically means the parameter is simply filtering out bad market conditions from the specific test period rather than capturing a robust structural relationship. Suspiciously round numbers are another tell: a stop-loss optimised to exactly 50 pips when tested across a range of 20–80 pips was probably placed at 50 because that value happened to avoid several major stop-hunt events in the test history — events that will recur in different locations in forward data.
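The boundary check in particular is simple to automate. In this sketch, the 5% margin, the parameter names, and the sample values are all illustrative assumptions; the idea is only to flag any optimised value that sits at or very near an edge of the range it was tested over.

```python
def at_range_edge(value, low, high, margin=0.05):
    """True if value lies within `margin` (a fraction of the range width) of either boundary."""
    width = high - low
    return value - low <= margin * width or high - value <= margin * width

# Tested ranges (low, high) and the optimiser's top-ranked values (hypothetical).
tested_ranges = {"fast_period": (5, 20), "slow_period": (15, 50), "stop_pips": (20, 80)}
best_values = {"fast_period": 5, "slow_period": 31, "stop_pips": 79}

suspicious = [
    name for name, value in best_values.items()
    if at_range_edge(value, *tested_ranges[name])
]
print(suspicious)  # ['fast_period', 'stop_pips']
```

A flagged parameter is a prompt to widen the range and re-test, not proof of overfitting on its own.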
August 2024: when optimised EAs met a regime shift
On August 5, 2024, USDJPY dropped more than 10 big figures in a single Asian session — one of the most violent single-day moves in yen pairs in over a decade. The catalyst was an unexpected Bank of Japan rate hike combined with a simultaneous unwinding of the global yen carry trade. For automated traders, the event was a live stress test of whether their EA's parameters had been validated for regime shifts or simply optimised for the calm, range-bound conditions that had dominated 2023.
The pattern that emerged was instructive. EAs that had been optimised exclusively on the 12 months preceding August 2024 — a period characterised by low volatility and relatively narrow ranges — had been fitted to parameters that performed beautifully in that specific regime: tight stops, short-duration trades, frequent small profits. Those exact parameters were catastrophically mismatched to a 10-big-figure trending move with sharply widened spreads and deep liquidity gaps. The core reason was not that the strategies were conceptually wrong — trend-following and mean-reversion both have genuine edges in forex — but that the optimised parameters represented the specific conditions of one regime, not the structural properties of the strategy's edge across regimes. Walk-forward testing across a multi-year window that included the 2022 USD strength trend and the 2020 COVID spike would have forced the optimisation to produce parameters robust enough to survive regime transitions. Optimising on a single calm year guaranteed parameters that would fail the moment conditions changed.
Practice
Run MT5 optimiser on MACD Sample EA and observe the degradation
This exercise deliberately demonstrates the overfitting trap. You will use the optimiser to find spectacular in-sample parameters, then test them on a forward period and observe the degradation. The degradation is the lesson. Allow approximately 20 minutes for both optimisation and forward testing.
1. Open Strategy Tester (Ctrl+R) and select the MACD Sample EA. Set Symbol to EURUSD, Timeframe H1. Set date range to cover the 2 years ending 12 months ago; this is your in-sample window. Set model to Every Tick. Click the Optimisation tab and enable it. Tick the Fast Period and Slow Period inputs for optimisation; leave others fixed. Set Fast Period range 5–20 step 1, Slow Period range 15–50 step 1. Click Start.
2. When optimisation completes, open the Optimisation Results tab. Sort by Net Profit descending. Record the top-ranked parameter set: its Fast Period, Slow Period, Net Profit, Profit Factor, and Max Drawdown. This is your in-sample champion. Notice that it likely shows a profit factor significantly higher than typical non-optimised runs. This is the optimiser's gift to you, and its trap.
3. Now switch back to single-run mode (disable optimisation). Enter exactly the parameter values from the top-ranked result. Change the date range to the 12 months immediately following your in-sample window; this is the out-of-sample period the optimiser never saw. Run the backtest.
4. Compare in-sample vs out-of-sample results. Record both Profit Factors and Net Profits. In most cases you will see the Profit Factor drop materially, often from above 2.0 to below 1.3, or even below 1.0. This degradation is not bad luck; it is the mechanical consequence of parameter selection on historical data. The in-sample champion was the parameter set most suited to that specific 2-year market behaviour, not the most robust strategy.
5. For comparison, re-run the out-of-sample backtest with the default (non-optimised) MACD parameters. Note that the default parameters often produce a more consistent, if smaller, profit factor across both periods. This is because unoptimised parameters reflect the strategy's structural logic rather than a specific historical fit. This comparison is the clearest possible demonstration of why professional validation always includes a walk-forward step.
Mastery check
Four questions. Pass at 75% (3/4). Each question tests a core optimisation and walk-forward concept.
Mastery check — Lesson 7
Reflect
Write your answers honestly; they are saved on this device only. Use them next week to spot patterns in your trading thinking.
Pro deep dive
Walk-forward testing is the minimum viable validation protocol. Professional-grade EA development goes further with statistical robustness tests, anchored vs rolling window analysis, and systematic regime classification.
Monte Carlo resampling: quantifying the luck in your walk-forward results
A walk-forward test with 4 windows produces 4 out-of-sample observations. That is enough to detect obvious curve-fitting but not enough to characterise the distribution of outcomes. Monte Carlo resampling addresses this by pooling the out-of-sample trades from the 4 windows, drawing from them at random with replacement, and simulating thousands of alternative trade sequences. (Simply reordering the same trades changes the drawdown path but leaves Profit Factor unchanged, so resampling with replacement is what produces a distribution of Profit Factors.) If your 4 windows produced a median out-of-sample Profit Factor of 1.25, Monte Carlo can tell you whether 1.25 is in the top 10% or the top 50% of achievable results given random variation, a measure of statistical significance that the raw walk-forward result cannot provide. Tools that support Monte Carlo on walk-forward results include FX Blue's EA Analyzer and StrategyQuant's complete backtesting environment. At minimum, run a 500-path simulation before treating any walk-forward result as conclusive evidence of edge.
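A 500-path simulation of this kind fits in a short script. The sketch below uses resampling with replacement (a bootstrap) rather than pure reordering, because reordering the same trades leaves Profit Factor unchanged. The pooled trade list is synthetic, generated only so the example runs; in practice you would load the out-of-sample trades exported from your own tests.

```python
import random

def profit_factor(trades):
    """Gross profit divided by gross loss."""
    gross_profit = sum(t for t in trades if t > 0)
    gross_loss = -sum(t for t in trades if t < 0)
    return gross_profit / gross_loss if gross_loss > 0 else float("inf")

def bootstrap_profit_factors(trades, n_paths=500, seed=7):
    """Resample the trade list with replacement n_paths times; return sorted PFs."""
    rng = random.Random(seed)
    n = len(trades)
    return sorted(profit_factor(rng.choices(trades, k=n)) for _ in range(n_paths))

def percentile(sorted_values, q):
    """Simple nearest-rank percentile on an already sorted list (q between 0 and 1)."""
    index = min(len(sorted_values) - 1, int(q * len(sorted_values)))
    return sorted_values[index]

# Synthetic stand-in for 200 pooled out-of-sample trades with a small positive edge.
data_rng = random.Random(0)
oos_trades = [data_rng.gauss(0.1, 1.0) for _ in range(200)]

pfs = bootstrap_profit_factors(oos_trades)
share_below_one = sum(pf < 1.0 for pf in pfs) / len(pfs)

print(f"observed PF: {profit_factor(oos_trades):.2f}")
print(f"5th / 50th / 95th percentile: {percentile(pfs, 0.05):.2f} / "
      f"{percentile(pfs, 0.50):.2f} / {percentile(pfs, 0.95):.2f}")
print(f"share of paths with PF below 1.0: {share_below_one:.0%}")
```

A wide gap between the 5th and 95th percentiles, or a large share of paths below 1.0, means the observed figure owes a lot to which trades happened to occur.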
Anchored vs rolling walk-forward windows: when the start date matters
Rolling walk-forward windows — where both the start and end date shift forward each iteration — treat all market periods equally. An anchored walk-forward window — where the start date is fixed but the end date extends forward — has a different property: as more data is added, the in-sample window grows and the strategy's parameters are always informed by the full available history. Anchored windows are appropriate when you believe the market's structural properties are stable over time and you want the most data-informed parameter set for live trading. Rolling windows are appropriate when you believe conditions shift materially over multi-year periods and you want your parameters to reflect recent behaviour. The professional approach is to run both and compare: if the rolling and anchored approaches produce similar walk-forward efficiencies, the strategy's edge is regime-robust; if they diverge significantly, the strategy may be performing well only in one specific era of market behaviour.
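The difference between the two schemes is only in where each in-sample window starts. A minimal sketch, again using month offsets and a hypothetical function name rather than anything built into MT5:

```python
def walk_forward_windows(total_months, in_sample_months, oos_months,
                         step_months, anchored=False):
    """Return (is_start, is_end, oos_end) month offsets for each iteration.

    Rolling: the in-sample start shifts forward with each step.
    Anchored: the in-sample start stays at month 0, so the window grows.
    """
    windows = []
    shift = 0
    while in_sample_months + shift + oos_months <= total_months:
        is_start = 0 if anchored else shift
        is_end = in_sample_months + shift
        windows.append((is_start, is_end, is_end + oos_months))
        shift += step_months
    return windows

rolling = walk_forward_windows(60, 36, 12, 12)
anchored = walk_forward_windows(60, 36, 12, 12, anchored=True)
print("rolling: ", rolling)   # [(0, 36, 48), (12, 48, 60)]
print("anchored:", anchored)  # [(0, 36, 48), (0, 48, 60)]
```

Both schemes validate on the same out-of-sample periods, so their walk-forward efficiencies can be compared directly; only the amount of history behind each optimisation differs.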
Statistical significance of backtest results: the p-value problem
In scientific research, a result is considered statistically significant when it could not plausibly have occurred by chance alone — typically when the p-value is below 0.05. Applying this framework to backtesting is harder than it sounds. A strategy with 200 trades and a Profit Factor of 1.4 over 3 years needs to be compared against the distribution of results you would expect from a random-entry strategy run on the same data. If 15% of random strategies achieve a Profit Factor of 1.4 over 200 trades on EURUSD H1 data, your result is not statistically significant — it is within the range of chance. The rigorous way to test this is permutation testing: randomise entry signals while keeping the same trade management logic, run 1,000 randomised backtests, and compare your real strategy's Profit Factor against the resulting distribution. If your result falls in the top 5% of random outcomes, you have a statistically meaningful edge. If it falls in the top 25%, you may have something real, but the evidence is weak. QuantConnect's research platform and the academic finance literature both use this methodology; it is not yet standard in retail EA development but is increasingly required by serious prop firms evaluating systematic strategies.
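The permutation test can be sketched as follows. Several simplifications are assumptions made for illustration: the price data is synthetic, every trade is a long position held for a fixed number of bars (standing in for "the same trade management logic"), and the "real" entries are simply every 25th bar, so no edge is expected. With a real EA you would substitute its actual entry bars and exit rules.

```python
import random

def profit_factor(trades):
    """Gross profit divided by gross loss."""
    gross_profit = sum(t for t in trades if t > 0)
    gross_loss = -sum(t for t in trades if t < 0)
    return gross_profit / gross_loss if gross_loss > 0 else float("inf")

def trade_pnls(bar_returns, entries, hold):
    """P/L of a long trade opened at each entry bar and held for `hold` bars."""
    return [sum(bar_returns[i:i + hold]) for i in entries]

def permutation_p_value(bar_returns, real_entries, hold, n_runs=1000, seed=3):
    """Share of random-entry runs whose PF matches or beats the real strategy's PF."""
    rng = random.Random(seed)
    real_pf = profit_factor(trade_pnls(bar_returns, real_entries, hold))
    candidate_bars = range(len(bar_returns) - hold + 1)
    beats = 0
    for _ in range(n_runs):
        random_entries = rng.sample(candidate_bars, len(real_entries))
        if profit_factor(trade_pnls(bar_returns, random_entries, hold)) >= real_pf:
            beats += 1
    # Add-one correction so the estimate is never exactly zero.
    return real_pf, (beats + 1) / (n_runs + 1)

# Synthetic H1-style bar returns and a placeholder entry rule (200 trades).
data_rng = random.Random(11)
bar_returns = [data_rng.gauss(0.0, 0.001) for _ in range(5000)]
real_entries = list(range(0, 4990, 25))

real_pf, p_value = permutation_p_value(bar_returns, real_entries, hold=10)
print(f"strategy PF {real_pf:.2f}, p-value {p_value:.3f}")
```

A p-value at or below 0.05 corresponds to the "top 5% of random outcomes" threshold described above; anything much larger means random entries do about as well.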
QuantConnect's walk-forward research findings: what the industry data shows
QuantConnect's 2023 systematic analysis specifically measured walk-forward efficiency across the forex EA backtests in their dataset. The median walk-forward efficiency was 41% — meaning the typical optimised EA retained less than half its in-sample performance on out-of-sample data. Only 18% of the tested strategies showed walk-forward efficiency above 60%, suggesting genuine edge. The remaining 82% showed characteristics consistent with curve-fitting. The strategies that achieved high walk-forward efficiency shared several properties: fewer than 5 optimised parameters, test periods covering at least 3 years, and consistent out-of-sample performance across multiple rolling windows rather than one exceptional window surrounded by failures. These findings provide a practical benchmark: if your walk-forward efficiency lands in the top quintile (above 60%) with fewer than 5 parameters and consistent cross-window results, you are in the rare minority of backtests that have passed a meaningful robustness standard.
Overfit signs: (1) Profit Factor above 3.0 or Sharpe above 3.0 in a standard forex backtest — unrealistically smooth; (2) equity curve with almost no drawdown periods; (3) parameters at the extreme boundaries of their tested range. Walk-forward efficiency = average out-of-sample performance / average in-sample performance. Above 60% suggests genuine robustness; below 40% strongly suggests curve-fitting.
Educational material only, not investment advice. Trading involves the risk of losing capital. Always practise on a demo account and use a stop-loss order.