
Building an LSTM scalping signal engine for MetaTrader 5

money-maker predicts whether EUR/USD hits a 2-pip take profit before a 1.5-pip stop. The hard parts are label design, latency into MT5, and accepting that forward testing is the only honest evaluation.

The model in money-maker is a multi-layer LSTM with a dense head that emits two probabilities. That description fits a hundred tutorials. What does not fit a tutorial is the part that decides whether any of it means something: the label.

The label is the whole problem

A scalping signal is a bet about order of events, not about price level. I do not care where EUR/USD closes in an hour. I care whether it touches a 2-pip take profit before it touches a 1.5-pip stop loss, starting now. So the label is not next-tick return. It is a forward-looking, path-dependent question: walking forward from the current tick, which barrier gets hit first.

That framing has consequences most people skip. You cannot label a sequence until you have seen enough future ticks to resolve one of the two barriers, so every training sample carries a lookahead window whose length is variable and unknown in advance. Get the windowing wrong and you leak the future into the features: if any part of your input sequence overlaps the ticks you used to resolve the label, the model learns to read the answer instead of predicting it. Validation accuracy looks excellent. Forward performance is noise. The single most expensive bug class here is not a bad model; it is a label that quietly knows the outcome.
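A minimal sketch of that labeling rule, under assumptions that are mine rather than the repo's: a single price series (no bid/ask spread), a long entry at the current tick, and hypothetical names `first_touch_label` and `make_sample`. The point it illustrates is the boundary: features stop at tick `i`, and the label only reads ticks after `i`.

```python
from typing import List, Optional, Sequence, Tuple

PIP = 0.0001   # one pip on EUR/USD
EPS = 1e-9     # tolerance for floating-point comparisons at the barrier


def first_touch_label(prices: Sequence[float], i: int,
                      tp_pips: float = 2.0, sl_pips: float = 1.5) -> Optional[int]:
    """Return 1 if the take profit is touched first after tick i,
    0 if the stop loss is touched first, None if neither resolves."""
    entry = prices[i]
    tp = entry + tp_pips * PIP
    sl = entry - sl_pips * PIP
    for p in prices[i + 1:]:          # strictly future ticks only
        if p >= tp - EPS:
            return 1
        if p <= sl + EPS:
            return 0
    return None                        # not enough future data yet


def make_sample(prices: Sequence[float], i: int,
                window: int) -> Optional[Tuple[List[float], int]]:
    """Build one (features, label) pair, or None if it cannot be built.
    Features cover ticks i-window+1 .. i; the label uses ticks after i."""
    if i + 1 < window:
        return None                    # not enough history for a full window
    label = first_touch_label(prices, i)
    if label is None:
        return None                    # unresolved samples are dropped, not guessed
    features = list(prices[i - window + 1: i + 1])
    return features, label
```

Dropping unresolved samples instead of assigning them a default class is one way to keep the tail of the history from polluting the labels; a timeout class would be another.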

The asymmetric barriers (2 pips up versus 1.5 pips down) are deliberate. A symmetric target turns the problem into a coin flip on which the model has no edge. The asymmetry means the classes are imbalanced in a way that tracks the actual microstructure, and that imbalance has to survive into the loss function rather than getting normalized away.
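The arithmetic behind the asymmetry is short enough to write down. This is a sketch that ignores spread and commissions, with hypothetical function names. The break-even win rate for a 2-pip target against a 1.5-pip stop is 1.5 / (2 + 1.5), about 43%, and that is also the probability that a driftless random walk reaches the upper barrier first. A predicted P(TP) only carries information to the extent it moves away from that base rate.

```python
def break_even_win_rate(tp_pips: float, sl_pips: float) -> float:
    """Win rate at which expected pips per trade is zero (costs ignored)."""
    return sl_pips / (tp_pips + sl_pips)


def expected_pips(p_tp: float, tp_pips: float, sl_pips: float) -> float:
    """Expected pips per trade given the probability of hitting TP first."""
    return p_tp * tp_pips - (1.0 - p_tp) * sl_pips


base = break_even_win_rate(2.0, 1.5)       # 3/7, roughly 0.4286
edge_at_half = expected_pips(0.5, 2.0, 1.5)  # 0.25 pips per trade before costs
```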

Features are normalized, then windowed, in that order

The engine polls MT5 for a tick every three seconds. The pipeline normalizes, then slices fixed-length sequences, then feeds the LSTM.

MT5 tick feed (3s)  ->  normalize  ->  sequence window  ->  LSTM  ->  P(TP), P(SL)

Order matters. Raw EUR/USD prices are around 1.08 with tiny tick-scale moves; an unnormalized LSTM spends its capacity learning the offset instead of the shape. I normalize first so the windowed sequences are already on a stable scale, and I keep the normalization stateless per window so it can be applied identically at training time and at serving time. The instant your live preprocessing diverges from your training preprocessing, your deployed model is a different model than the one you evaluated. That divergence is invisible in code review and obvious in the logs three days later.
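One transform that fits this description is tick-to-tick differences expressed in pips: it removes the 1.08 offset, needs no fitted statistics, and gives the same numbers offline and live. This is an illustrative choice, not necessarily the normalization money-maker uses, and `to_pip_deltas` and `windows` are hypothetical names.

```python
from typing import List, Sequence

PIP = 0.0001  # one pip on EUR/USD


def to_pip_deltas(prices: Sequence[float]) -> List[float]:
    """Convert raw prices to tick-to-tick changes in pips.
    Stateless: each output depends only on two adjacent prices."""
    return [(b - a) / PIP for a, b in zip(prices, prices[1:])]


def windows(values: Sequence[float], length: int) -> List[List[float]]:
    """Slice a normalized series into overlapping fixed-length sequences."""
    return [list(values[i:i + length])
            for i in range(len(values) - length + 1)]


# Training path: normalize the whole history, then window it.
history = [1.0800, 1.0801, 1.0803, 1.0802]
train_windows = windows(to_pip_deltas(history), 2)

# Serving path: normalize only the most recent prices.
live_window = to_pip_deltas(history[-3:])
```

Because the transform carries no state, the last training window and the live window are identical, which is the property the paragraph above depends on.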

Serving is a latency problem, not an ML problem

The training repo and the serving loop are not the same animal. Training is offline: pull three months of history from MT5, build sequences, fit, save weights. Serving runs main.py, polls a tick every three seconds, and writes a timestamped TP/SL probability. No order is ever placed.

Three seconds sounds generous until you account for the full budget: pull the tick over the MT5 bridge, append it to the rolling window, normalize, run the forward pass, log. On a scalping horizon a signal computed late is a signal about a market that no longer exists. The fix is keeping the rolling window in memory and only ever appending the newest tick, never rebuilding the sequence from scratch on each poll. The forward pass on a small LSTM is cheap; the trap is the bookkeeping around it.
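A sketch of the append-only window using the standard library's `collections.deque`. The class name `RollingWindow` is hypothetical, and how `main.py` actually stores its ticks is not shown in this post. With `maxlen` set, appending the newest tick evicts the oldest one in constant time, so nothing is rebuilt per poll.

```python
from collections import deque
from typing import Deque, List


class RollingWindow:
    """Fixed-length tick buffer that only ever appends the newest tick."""

    def __init__(self, length: int) -> None:
        self._ticks: Deque[float] = deque(maxlen=length)

    def append(self, price: float) -> None:
        # deque(maxlen=...) drops the oldest tick automatically when full.
        self._ticks.append(price)

    def ready(self) -> bool:
        # Hold back predictions until the window is completely filled.
        return len(self._ticks) == self._ticks.maxlen

    def snapshot(self) -> List[float]:
        # Copy for the model input; the buffer itself stays untouched.
        return list(self._ticks)
```

In a polling loop, each iteration would call `append` with the new tick and run the forward pass only when `ready()` is true.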

Why forward testing is the only honest evaluation

money-maker is forward testing only. That is a design decision, not a missing feature. A backtest on intraday FX flatters itself: it assumes you got filled at the price you saw, with no spread widening, no slippage, no requote, and it silently assumes your labels did not leak. Every one of those assumptions breaks live, and they break in the direction that makes the strategy look worse, not better.

So the engine logs predictions against ticks as they actually arrive and lets reality grade them, with no execution in the loop to launder the result. I will not publish a return number from this, because a return number implies a track record of profit, and a logged probability stream is not that. It is evidence the pipeline runs end to end and that the model emits calibrated-looking signals on data it has never seen. That is the claim. Anything stronger would be a story, and FX punishes stories.
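For what grading a probability log can look like without turning it into a return number, here is a minimal sketch using the Brier score, the mean squared error between predicted P(TP) and the realized 0/1 outcome. The record layout and the name `brier_score` are assumptions, not the repo's log format. A constant forecast of 0.5 scores 0.25, and lower is better.

```python
from typing import Iterable, Optional, Tuple

# Each record: (predicted P(TP), outcome), where outcome is 1 if TP was hit
# first, 0 if SL was hit first, and None if the prediction is still unresolved.
Record = Tuple[float, Optional[int]]


def brier_score(records: Iterable[Record]) -> Optional[float]:
    """Mean squared error of predicted probabilities on resolved records."""
    resolved = [(p, y) for p, y in records if y is not None]
    if not resolved:
        return None  # nothing to grade yet
    return sum((p - y) ** 2 for p, y in resolved) / len(resolved)
```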