Backtesting

engine_v2 is Finny's institutional-grade backtester. It computes more than 60 metrics across eight categories and renders them as formatted tables in the terminal UI (TUI). Walk-forward analysis, parameter sweeps, and Monte Carlo simulation are built in.

Run a backtest

From the Finny CLI, run /backtest against the active algo. The command writes a structured JSON report to v<NN>/backtest.json and prints a formatted TUI summary in the terminal.

bash
/backtest

Metric categories

Return

total_return, cagr, time_weighted_return, money_weighted_return, best_day, worst_day, best_month, worst_month, pct_positive_months, pct_positive_years

Risk

ann_vol, downside_deviation, semi_variance, skew, kurtosis, VaR_95, VaR_99, CVaR_95, CVaR_99, ulcer_index, pain_index, tail_ratio
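
As a rough illustration of two of the tail metrics listed above, historical VaR and CVaR can be estimated directly from a series of per-period returns. This is a minimal sketch, not engine_v2's implementation: the function name, the empirical-quantile convention, and reporting losses as positive fractions are all assumptions, and the engine may use a different estimator or sign convention.

```python
import math


def historical_var_cvar(returns, level=0.95):
    """Return (VaR, CVaR) as positive loss fractions from the empirical distribution.

    Hypothetical helper: VaR is the loss that is not exceeded in `level` of the
    observations, and CVaR is the mean of the losses at or beyond that point.
    """
    if not returns:
        raise ValueError("returns must be non-empty")
    losses = sorted(-r for r in returns)  # ascending: smallest loss first
    # Index of the empirical `level` quantile of the loss distribution.
    k = min(len(losses) - 1, max(0, math.ceil(level * len(losses)) - 1))
    var = losses[k]
    tail = losses[k:]  # the worst (1 - level) share of observations
    cvar = sum(tail) / len(tail)
    return var, cvar


daily_returns = [-0.10, -0.05, -0.02, 0.0, 0.01, 0.01, 0.02, 0.02, 0.03, 0.04]
var_90, cvar_90 = historical_var_cvar(daily_returns, level=0.90)
```

CVaR is never smaller than VaR at the same level, because it averages only the losses at or beyond the VaR cutoff.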

Ratios

sharpe, sortino, calmar, omega, MAR, sterling, k_ratio
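
For orientation, the three most common ratios in this list can be written down in a few lines using their textbook definitions. The sketch below assumes per-period simple returns, 252 periods per year, and a zero risk-free rate and target return by default; engine_v2's actual annualization and risk-free conventions are not documented here and may differ.

```python
import math
import statistics


def sharpe(returns, periods_per_year=252, rf_per_period=0.0):
    """Annualized mean excess return divided by its sample standard deviation."""
    excess = [r - rf_per_period for r in returns]
    sd = statistics.stdev(excess)
    if sd == 0:
        return float("nan")
    return math.sqrt(periods_per_year) * statistics.mean(excess) / sd


def sortino(returns, periods_per_year=252, target=0.0):
    """Like Sharpe, but penalizes only returns that fall below `target`."""
    excess = [r - target for r in returns]
    downside_dev = math.sqrt(sum(min(0.0, e) ** 2 for e in excess) / len(excess))
    if downside_dev == 0:
        return float("nan")
    return math.sqrt(periods_per_year) * statistics.mean(excess) / downside_dev


def calmar(cagr, max_drawdown):
    """Compound annual growth rate divided by the magnitude of the worst drawdown."""
    if max_drawdown == 0:
        return float("nan")
    return cagr / abs(max_drawdown)


sample = [0.01, -0.01, 0.02, -0.02, 0.03]
ratios = {"sharpe": sharpe(sample), "sortino": sortino(sample), "calmar": calmar(0.20, -0.10)}
```

Sortino is typically higher than Sharpe for the same series when losses are smaller or rarer than gains, since upside volatility is excluded from the denominator.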

Drawdown

max_drawdown, max_dd_duration_bars, max_dd_recovery_bars, avg_drawdown, avg_dd_duration_bars, current_drawdown, plus the top N drawdowns with start / trough / end timestamps.
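
To make the drawdown fields concrete, here is a small sketch that scans an equity curve for the deepest peak-to-trough decline. It assumes strictly positive equity values, measures duration as bars from peak to trough, and measures recovery as bars from trough back to the prior peak; those definitions and the function name are assumptions, and engine_v2 may count bars differently.

```python
def max_drawdown_stats(equity):
    """Locate the worst drawdown on an equity curve (hypothetical helper)."""
    if not equity:
        raise ValueError("equity must be non-empty")
    peak_idx = 0
    worst, worst_peak, worst_trough = 0.0, 0, 0
    for i, value in enumerate(equity):
        if value > equity[peak_idx]:
            peak_idx = i  # new running high
        drawdown = value / equity[peak_idx] - 1.0  # zero or negative
        if drawdown < worst:
            worst, worst_peak, worst_trough = drawdown, peak_idx, i
    if worst == 0.0:
        recovery_bars = 0  # the curve never fell below a prior high
    else:
        peak_value = equity[worst_peak]
        # First bar after the trough that regains the old peak, if any.
        recovered_at = next(
            (j for j in range(worst_trough + 1, len(equity)) if equity[j] >= peak_value),
            None,
        )
        recovery_bars = None if recovered_at is None else recovered_at - worst_trough
    return {
        "max_drawdown": worst,
        "peak_index": worst_peak,
        "trough_index": worst_trough,
        "duration_bars": worst_trough - worst_peak,
        "recovery_bars": recovery_bars,  # None means still under water
    }


stats = max_drawdown_stats([100, 110, 105, 90, 95, 112, 108])
```

In this example the curve peaks at 110, bottoms at 90, and regains the peak two bars later, so both the duration and the recovery are two bars.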

Trades

total_trades, win_rate, loss_rate, breakeven_rate, avg_win, avg_loss, payoff_ratio, expectancy, expectancy_r, profit_factor, max_consecutive_wins/losses, hold-bar stats, MAE_avg/max, MFE_avg/max, kelly_fraction, kelly_confidence
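
Several of the trade metrics above follow from nothing more than the list of per-trade PnL values. The sketch below uses the usual textbook definitions, including the classic Kelly formula W - (1 - W) / R, where W is the win rate and R is the payoff ratio. It is an illustration under those assumptions rather than engine_v2's code, and it does not attempt expectancy_r or kelly_confidence, whose definitions are not given here.

```python
import math


def trade_stats(pnls):
    """Summarize per-trade PnL values (hypothetical helper)."""
    if not pnls:
        raise ValueError("pnls must be non-empty")
    wins = [p for p in pnls if p > 0]
    losses = [p for p in pnls if p < 0]
    n = len(pnls)
    win_rate = len(wins) / n
    avg_win = sum(wins) / len(wins) if wins else 0.0
    avg_loss = sum(losses) / len(losses) if losses else 0.0  # negative or zero
    # Payoff ratio: average win relative to the size of the average loss.
    payoff_ratio = avg_win / abs(avg_loss) if avg_loss else math.nan
    # Profit factor: gross profit divided by gross loss.
    profit_factor = sum(wins) / abs(sum(losses)) if losses else math.inf
    if math.isnan(payoff_ratio) or payoff_ratio == 0:
        kelly = math.nan
    else:
        kelly = win_rate - (1.0 - win_rate) / payoff_ratio
    return {
        "total_trades": n,
        "win_rate": win_rate,
        "loss_rate": len(losses) / n,
        "breakeven_rate": (n - len(wins) - len(losses)) / n,
        "avg_win": avg_win,
        "avg_loss": avg_loss,
        "payoff_ratio": payoff_ratio,
        "expectancy": sum(pnls) / n,  # average PnL per trade
        "profit_factor": profit_factor,
        "kelly_fraction": kelly,
    }


summary = trade_stats([100.0, -50.0, 200.0, -50.0])
```

For this sample, half the trades win, the average win is three times the average loss, and the expectancy is 50 per trade.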

Exposure

time_in_market_pct, avg_gross_exposure, avg_net_exposure, max_gross_exposure, total_turnover, turnover_per_year, total_fees, fees_as_pct_return, total_funding, total_borrow, liquidation_count

Stability

equity_curve_r2, rolling_sharpe (mean / min over a configurable window), and a monthly_returns heatmap.

Optional advanced blocks

  • Benchmark: alpha, beta, r², correlation, tracking error, information ratio, Treynor, up/down capture, all against any benchmark ticker.
  • Monte Carlo: Trade-shuffle or block-bootstrap. Reports final_equity and max_dd at p5/p50/p95, and sharpe at p5/p50/p95.
  • Walk-forward: N-fold IS/OOS split. Adds deflated_sharpe and probabilistic_sharpe.
  • Regime breakdown: Per-regime (low / mid / high vol) return, Sharpe, drawdown, win rate.
  • Data quality: Coverage %, gap count, OHLC violations, outlier bars, zero-volume bars.
  • Per-symbol attribution: Realized + unrealized PnL, n_trades, win_rate, contribution % per symbol.
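
The Monte Carlo block listed above can be approximated with a short trade-resampling loop. This is a sketch under stated assumptions, not the engine's method: it treats each trade as a fractional return compounded on a starting equity of 1.0, uses a simple nearest-rank percentile, and exposes a hypothetical replace flag. Block-bootstrap and the Sharpe percentiles are not covered.

```python
import random


def max_drawdown(equity):
    """Worst peak-to-trough decline as a negative fraction (0.0 if none)."""
    peak, worst = equity[0], 0.0
    for value in equity:
        peak = max(peak, value)
        worst = min(worst, value / peak - 1.0)
    return worst


def percentile(sorted_values, q):
    """Nearest-rank style percentile on an already sorted list, q in [0, 1]."""
    return sorted_values[round(q * (len(sorted_values) - 1))]


def monte_carlo(trade_returns, n_paths=1000, replace=False, seed=0):
    """Resample the trade sequence and summarize final equity and max drawdown."""
    rng = random.Random(seed)  # seeded so runs are reproducible
    n = len(trade_returns)
    finals, drawdowns = [], []
    for _ in range(n_paths):
        if replace:
            path = rng.choices(trade_returns, k=n)  # bootstrap with replacement
        else:
            path = rng.sample(trade_returns, k=n)  # pure reordering of the trades
        equity = [1.0]
        for r in path:
            equity.append(equity[-1] * (1.0 + r))
        finals.append(equity[-1])
        drawdowns.append(max_drawdown(equity))
    finals.sort()
    drawdowns.sort()  # most negative first, so p5 is the worst case
    levels = {"p5": 0.05, "p50": 0.50, "p95": 0.95}
    return {
        "final_equity": {name: percentile(finals, q) for name, q in levels.items()},
        "max_dd": {name: percentile(drawdowns, q) for name, q in levels.items()},
    }


result = monte_carlo([0.05, -0.02, 0.03, -0.04, 0.01], n_paths=500, replace=True)
```

One design point worth knowing: a pure reordering (replace=False) changes the drawdown profile but leaves final equity unchanged, because the compounded product does not depend on order. A spread in final equity only appears when trades are drawn with replacement.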

Full trade log

Every trade is captured:

symbol, side, entry/exit_ts, qty, entry/exit_price,
pnl, pnl_pct, r_multiple, fees, funding, borrow,
MAE, MFE, hold_bars, entry/exit_tag, liquidation

Walk-forward (the overfitting gate)

finny_backtest_walkforward splits the window into in-sample (default 70%) and out-of-sample (30%), runs the strategy on each independently, then compares metric sets and renders a robustness verdict:

  • OOS Sharpe far below IS Sharpe → OVERFIT.
  • OOS loses money → OVERFIT.

Treat this as the primary gate before declaring a strategy "good."
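
The two rules above can be sketched as a small verdict function. Everything numeric here is an assumption made for illustration: the text does not say how far below the in-sample Sharpe counts as "far below", so the 0.5 ratio is hypothetical, as is the PASS label. The sketch also splits a single series of per-bar returns, whereas finny_backtest_walkforward runs the strategy on each window independently.

```python
import math
import statistics


def sharpe(returns, periods_per_year=252):
    """Annualized Sharpe ratio with a zero risk-free rate."""
    sd = statistics.stdev(returns)
    if sd == 0:
        return float("nan")
    return math.sqrt(periods_per_year) * statistics.mean(returns) / sd


def total_return(returns):
    """Compound a series of simple returns into one total return."""
    equity = 1.0
    for r in returns:
        equity *= 1.0 + r
    return equity - 1.0


def walk_forward_verdict(returns, is_fraction=0.70, min_sharpe_ratio=0.5):
    """Split returns into IS/OOS and apply the two overfit rules."""
    split = round(len(returns) * is_fraction)
    in_sample, out_of_sample = returns[:split], returns[split:]
    is_sharpe, oos_sharpe = sharpe(in_sample), sharpe(out_of_sample)
    reasons = []
    if total_return(out_of_sample) < 0:
        reasons.append("OOS loses money")
    # Hypothetical threshold: OOS Sharpe below half of IS Sharpe counts as "far below".
    if is_sharpe > 0 and oos_sharpe < min_sharpe_ratio * is_sharpe:
        reasons.append("OOS Sharpe far below IS Sharpe")
    return {
        "is_sharpe": is_sharpe,
        "oos_sharpe": oos_sharpe,
        "verdict": "OVERFIT" if reasons else "PASS",
        "reasons": reasons,
    }


report = walk_forward_verdict([0.01, 0.02, -0.005, 0.015, 0.01, 0.02, 0.005, -0.01, -0.02, 0.005])
```

In this example the last three bars form the out-of-sample window and lose money, so the verdict is OVERFIT with both reasons attached.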

Parameter sweep

finny_backtest_sweep takes a parameter grid and runs the Cartesian product as separate backtests (capped at 27 combinations):

json
{ "rsi_period": [10, 14, 18], "oversold": [25, 30, 35] }

Strategies must read params
The sweep only affects parameters that your strategy reads at runtime, for example via self.params.get("rsi_period", 14). A hardcoded constant ignores the grid, so every combination returns identical metrics.
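
To illustrate both halves of this section, the sketch below expands a grid into its Cartesian product and shows a strategy that reads its parameters instead of hardcoding them. The names expand_grid and RsiStrategy are hypothetical, and raising an error when the cap is exceeded is an assumption: the text states the 27-combination cap but not how finny_backtest_sweep enforces it.

```python
import itertools

MAX_COMBINATIONS = 27  # cap stated for finny_backtest_sweep


def expand_grid(grid):
    """Expand {"name": [values, ...]} into one params dict per combination."""
    names = list(grid)
    combos = [
        dict(zip(names, values))
        for values in itertools.product(*(grid[name] for name in names))
    ]
    if len(combos) > MAX_COMBINATIONS:
        # Assumed behavior; the real tool may truncate or reject differently.
        raise ValueError(f"{len(combos)} combinations exceeds the cap of {MAX_COMBINATIONS}")
    return combos


class RsiStrategy:
    """Hypothetical strategy that reads its settings from self.params."""

    def __init__(self, params=None):
        self.params = params or {}
        # Read from params with a default, so a sweep can vary these values.
        self.rsi_period = self.params.get("rsi_period", 14)
        self.oversold = self.params.get("oversold", 30)


grid = {"rsi_period": [10, 14, 18], "oversold": [25, 30, 35]}
strategies = [RsiStrategy(params) for params in expand_grid(grid)]
```

The grid above yields 3 × 3 = 9 combinations, well under the cap. Three parameters with three values each would reach exactly 27.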

Data sources

  • Equities / ETFs — Yahoo Finance via yfinance.
  • Crypto — Binance (preferred for coverage), with yfinance fallback.

Both routes flow through the data extractor subagent, which writes Parquet files into the algo workspace and produces a regime brief. See /docs/research.

Limitations
Backtests can mislead when a strategy has lookahead bias, overfit parameters, poor liquidity assumptions, missing fees, or unrealistic fills. Use walk-forward as the overfit check; treat every result as research, not investment advice.