Methods

Crash courses on the methods that recur across the portfolio.

This page is written as a compact research note rather than a formula gallery. For each topic, I keep the same structure: what the object is, why it matters in practice, and what I check before trusting the result.

MDP and Dynamic Programming in C++

Bellman optimality and value iteration

The Bellman equation rewrites a sequential decision problem as a fixed-point problem. Once the transition kernel and reward function are known, dynamic programming turns policy search into repeated value updates.

Core object

The value function V*(s) is the best expected discounted cumulative reward attainable from state s. The optimal policy is obtained by selecting, at each state, the action that achieves the maximum in the Bellman backup.

Why it matters

This is the cleanest way to separate modeling assumptions from optimization. If the model is explicit, policy computation becomes transparent, testable, and reproducible.

Research reflex

Before running value iteration, I check whether the state space is small enough for exact dynamic programming, whether rewards are well-scaled, and whether the discount factor gamma is far enough below 1 that the gamma-contraction of the Bellman operator converges at a practical rate.

Model-based dynamic programming is not the same thing as model-free reinforcement learning.
Convergence is driven by repeated Bellman backups, not by ad hoc search over policies.
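The backup-and-improve loop is easy to make concrete. A minimal sketch in Python (the portfolio implementation is in C++; the transition kernel, rewards, and discount factor below are invented purely for illustration):

```python
import numpy as np

# Toy 2-state, 2-action MDP; all numbers are invented for illustration.
# P[a, s, s'] = transition probability, R[a, s] = expected immediate reward.
P = np.array([[[0.9, 0.1],
               [0.2, 0.8]],
              [[0.5, 0.5],
               [0.0, 1.0]]])
R = np.array([[1.0, 0.0],
              [0.0, 2.0]])
gamma = 0.9

V = np.zeros(2)
for _ in range(10_000):
    Q = R + gamma * (P @ V)        # Bellman backup, indexed [action, state]
    V_new = Q.max(axis=0)          # greedy maximization over actions
    if np.max(np.abs(V_new - V)) < 1e-12:   # sup-norm stopping rule
        V = V_new
        break
    V = V_new

policy = Q.argmax(axis=0)          # greedy policy read off the converged values
```

The sup-norm stopping rule is exactly the contraction argument at work: each sweep shrinks the distance to V* by at least a factor gamma.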

Monte Carlo Methods for Quantile Estimation

Importance sampling for tail estimation

Importance sampling is a change-of-measure technique. Instead of drawing samples from the nominal distribution, it draws them from a proposal concentrated on the rare or expensive region, then reweights each sample by the likelihood ratio so the estimator remains unbiased.

Core object

The target density is f and the proposal density is g; each sample is weighted by the likelihood ratio f/g. The estimator remains unbiased as long as g has support wherever the target integrand is nonzero.

Why it matters

For rare events, naive Monte Carlo spends most of its budget in regions that contribute almost nothing to the final estimate. Importance sampling moves the simulation effort toward the tail.

Research reflex

I never look only at the point estimate. I also check the weight dispersion, whether the proposal actually covers the tail correctly, and whether the variance reduction is large enough to justify the change of measure.

A good proposal reduces variance without introducing unstable importance weights.
In practice, tail estimation quality depends more on the proposal design than on raw simulation volume.
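A minimal sketch of the mechanics, estimating P(Z > 4) for a standard normal with a mean-shifted proposal (the threshold, proposal, and sample size are illustrative choices, not a recommendation):

```python
import numpy as np

rng = np.random.default_rng(0)
n = 100_000
c = 4.0                                  # tail threshold (illustrative)

# Naive Monte Carlo: P(Z > 4) is about 3.2e-5, so almost every draw is wasted.
p_naive = np.mean(rng.standard_normal(n) > c)

# Importance sampling: propose from N(c, 1) and reweight by f/g.
x = rng.standard_normal(n) + c
w = np.exp(c**2 / 2 - c * x)             # likelihood ratio phi(x) / phi(x - c)
p_is = np.mean((x > c) * w)

# Weight diagnostic: effective sample size of the contributing weights.
w_hit = w[x > c]
ess = w_hit.sum() ** 2 / (w_hit ** 2).sum()
```

The effective sample size is the weight-dispersion check from the reflex above: a proposal that covers the tail badly shows up immediately as a collapsed ESS.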

Customer Segmentation and Environmental Audio Clustering

K-Means as a representation-dependent objective

K-Means is often presented as a simple clustering baseline, but in practice most of its power comes from preprocessing, scaling, and the geometry of the latent space rather than from the optimization routine itself.

Core object

The objective minimizes within-cluster inertia around centroids. The algorithm alternates assignment and centroid updates, but the problem remains non-convex and sensitive to initialization.

Why it matters

The same objective can behave very differently across raw variables, PCA factors, UMAP embeddings, or transformer representations. That is why clustering quality is often a feature-space question before it is an algorithm question.

Research reflex

I check scaling, latent dimension, cluster stability across seeds, and internal metrics such as silhouette or Davies-Bouldin. I do not treat a clean-looking cluster plot as sufficient evidence on its own.

Good clustering results usually come from better representations, not from decorative model changes.
The objective is simple, but the methodological discipline around it matters a lot.
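The representation-dependence is easy to demonstrate with a plain Lloyd-iteration K-Means on synthetic data, where one informative feature is drowned out by a large-scale noise feature until the features are standardized (all numbers below are invented for illustration):

```python
import numpy as np

rng = np.random.default_rng(1)

# Feature 1 carries the cluster structure; feature 2 is large-scale noise.
a = np.c_[rng.normal(0.0, 0.1, 200), rng.normal(0.0, 100.0, 200)]
b = np.c_[rng.normal(1.0, 0.1, 200), rng.normal(0.0, 100.0, 200)]
X = np.vstack([a, b])
y = np.r_[np.zeros(200), np.ones(200)]

def kmeans(X, k, iters=100, seed=0):
    """Plain Lloyd iterations: assign to the nearest centroid, recompute means."""
    r = np.random.default_rng(seed)
    C = X[r.choice(len(X), size=k, replace=False)]
    for _ in range(iters):
        z = ((X[:, None, :] - C[None, :, :]) ** 2).sum(-1).argmin(1)
        C = np.array([X[z == j].mean(0) if np.any(z == j) else C[j]
                      for j in range(k)])
    return z, C

def best_kmeans(X, k, n_init=10):
    """Restart from several seeds and keep the lowest-inertia solution."""
    best_z, best_inertia = None, np.inf
    for s in range(n_init):
        z, C = kmeans(X, k, seed=s)
        inertia = ((X - C[z]) ** 2).sum()
        if inertia < best_inertia:
            best_z, best_inertia = z, inertia
    return best_z

def agreement(z, y):
    """Permutation-invariant accuracy for two clusters."""
    acc = np.mean(z == y)
    return max(acc, 1.0 - acc)

acc_raw = agreement(best_kmeans(X, 2), y)        # the noise axis dominates
Xs = (X - X.mean(0)) / X.std(0)                  # standardize each feature
acc_scaled = agreement(best_kmeans(Xs, 2), y)    # structure recovered
```

The restarts are not decoration: they are the seed-stability check mentioned above, since a single run of a non-convex objective can land in a poor local optimum.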

Financial Time Series and Actuarial Modeling

GARCH(1,1) and conditional volatility

Financial prices are often close to random walks in level, but volatility is not. GARCH models exploit this asymmetry by modeling the dynamics of conditional variance rather than forcing predictability where there is little in the mean.

Core object

Conditional variance today depends on a constant omega, yesterday's squared shock, and yesterday's variance; the implied long-run variance is omega / (1 - alpha - beta). Large moves create volatility clustering because shocks feed into future risk estimates.

Why it matters

For risk measurement, derivatives, and market diagnostics, volatility is often the object that remains forecastable even when returns themselves are close to unpredictable.

Research reflex

I fit GARCH on returns, not price levels, and I check persistence, residual diagnostics, and whether Gaussian innovations are too optimistic. In the portfolio project, Student-t innovations were more credible because of fat tails.

The key question is often not whether price levels can be forecast, but whether conditional risk can be modeled well.
Heavy tails are not a cosmetic choice; they change the plausibility of the fitted risk model.
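A minimal simulation of the variance recursion makes the clustering mechanism visible (the parameter values are hypothetical; in real work I estimate them with a dedicated package such as arch):

```python
import numpy as np

rng = np.random.default_rng(2)

# Hypothetical parameters: omega (constant), alpha (shock), beta (persistence).
omega, alpha, beta = 0.05, 0.08, 0.90
long_run_var = omega / (1.0 - alpha - beta)   # implied unconditional variance

T = 20_000
h = np.empty(T)                                # conditional variances
r = np.empty(T)                                # returns
h[0] = long_run_var
r[0] = np.sqrt(h[0]) * rng.standard_normal()
for t in range(1, T):
    h[t] = omega + alpha * r[t - 1] ** 2 + beta * h[t - 1]
    r[t] = np.sqrt(h[t]) * rng.standard_normal()

def acf1(x):
    """Lag-1 sample autocorrelation."""
    x = x - x.mean()
    return (x[1:] * x[:-1]).mean() / (x * x).mean()

# Returns look white, but squared returns do not: volatility clustering.
acf_returns = acf1(r)
acf_squared = acf1(r ** 2)
```

This is the asymmetry in one picture: the mean is close to unpredictable, yet the squared series carries clear serial dependence that the recursion is built to capture.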

Financial Time Series and Actuarial Modeling

Unit roots, differencing, and the ADF test

Before fitting ARIMA, VAR, or macro regressions, I first ask whether the series is stationary. The ADF test is one of the standard tools for deciding whether a level series behaves like an integrated process that should be differenced.

Core object

The null hypothesis is a unit root, which means the series is non-stationary in level. The lagged differences are added so that short-run autocorrelation does not contaminate the test.

Why it matters

If a series is I(1), regressions in level can look impressive while being almost entirely spurious. In quantitative work, checking integration order is basic hygiene before interpreting coefficients or forecasting performance.

Research reflex

I choose the deterministic part carefully, compare the result with visual diagnostics and ACF behavior, and then work on returns or differences when the data behaves like a random walk. In finance, this often changes the whole modeling strategy.

A strong-looking regression can be statistically empty if the underlying series is not stationary.
Differencing is not a technical nuisance; it determines what object is actually being modeled.
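A numpy sketch of the underlying (non-augmented) Dickey-Fuller regression; in practice I use statsmodels' adfuller, which also supplies the correct critical values. The two simulated series are illustrative:

```python
import numpy as np

rng = np.random.default_rng(3)
T = 1_000

def df_tstat(y):
    """t-statistic on rho in dy_t = c + rho * y_{t-1} + e_t (constant only,
    no augmentation lags). It must be compared to Dickey-Fuller critical
    values, not the usual t-table."""
    dy, ylag = np.diff(y), y[:-1]
    X = np.c_[np.ones(len(ylag)), ylag]
    beta, *_ = np.linalg.lstsq(X, dy, rcond=None)
    resid = dy - X @ beta
    s2 = resid @ resid / (len(dy) - X.shape[1])
    cov = s2 * np.linalg.inv(X.T @ X)
    return beta[1] / np.sqrt(cov[1, 1])

random_walk = np.cumsum(rng.standard_normal(T))   # unit root: rho close to 0
ar1 = np.zeros(T)
for t in range(1, T):
    ar1[t] = 0.5 * ar1[t - 1] + rng.standard_normal()

t_rw = df_tstat(random_walk)   # typically above the 5% critical value (-2.86)
t_ar = df_tstat(ar1)           # strongly negative: reject the unit root
```

The augmentation lags of the full ADF test serve exactly the purpose stated above: soaking up short-run autocorrelation so it does not contaminate the statistic.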

Financial Time Series and Actuarial Modeling

Granger causality, VARs, and predictive structure

In macro and financial data, I separate predictive content from structural causality. Granger tests and VAR models are useful because they ask a modest question: do lagged values of one series improve the forecast of another?

Core object

A VAR stacks several time series into one system. Granger causality in that setting means testing whether the lagged coefficients of one variable in another variable's equation are jointly zero.

Why it matters

This gives a disciplined way to talk about lead-lag structure without pretending to have identified a structural economic mechanism. In the portfolio project, construction activity had predictive content for GDP growth, which is economically plausible and empirically testable.

Research reflex

I difference non-stationary series first, control the lag order, and interpret impulse responses and out-of-sample RMSE alongside the test. I do not translate a Granger result into a causal claim without stronger identification.

Predictive precedence is weaker than structural causality, but still useful for research and forecasting.
VARs are informative only when the dimensionality stays compatible with the sample size.
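A numpy sketch of the restricted-versus-unrestricted comparison behind a one-lag Granger test (the simulated system and its coefficients are invented; real work would use statsmodels' VAR machinery with proper lag selection and p-values):

```python
import numpy as np

rng = np.random.default_rng(4)
T = 500

# Invented bivariate system in which x leads y by one period.
x = np.zeros(T)
y = np.zeros(T)
for t in range(1, T):
    x[t] = 0.6 * x[t - 1] + rng.standard_normal()
    y[t] = 0.4 * y[t - 1] + 0.5 * x[t - 1] + rng.standard_normal()

def rss(Y, X):
    """Residual sum of squares of an OLS fit."""
    beta, *_ = np.linalg.lstsq(X, Y, rcond=None)
    resid = Y - X @ beta
    return resid @ resid

def granger_F(target, other):
    """F-statistic for excluding one lag of `other` from `target`'s equation."""
    Yt = target[1:]
    ones = np.ones(T - 1)
    restricted = rss(Yt, np.c_[ones, target[:-1]])
    unrestricted = rss(Yt, np.c_[ones, target[:-1], other[:-1]])
    df = (T - 1) - 3                  # observations minus parameters
    return (restricted - unrestricted) / (unrestricted / df)

F_x_to_y = granger_F(y, x)   # large: x's lag helps forecast y
F_y_to_x = granger_F(x, y)   # small: y's lag adds nothing for x
```

The question being asked is exactly the modest one from the text: does the extra lag reduce forecast error, nothing more. Both series here are already stationary; with real data I would difference first.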

ECG Signal Denoising

Kernel PCA for nonlinear denoising

Kernel PCA extends PCA by replacing linear covariance structure with a kernel-induced feature space. That makes it useful when the signal geometry is nonlinear and linear components no longer separate signal from noise well enough.

Core object

Instead of diagonalizing a covariance matrix in the original space, Kernel PCA diagonalizes the centered kernel matrix. The eigenvectors define principal directions in an implicit nonlinear feature map.

Why it matters

For ECG denoising, the structure of the waveform is not purely linear. Kernel PCA can preserve morphology better than standard PCA when the signal manifold is curved or locally nonlinear.

Research reflex

I treat the kernel and its scale parameter as modeling assumptions, not defaults. In the denoising benchmark, the point was not to declare one method universally superior, but to compare methods across noise regimes and records with a common MSE protocol.

Kernel methods are useful when linear projections discard too much structure.
Benchmarking by noise type is often more informative than reporting a single aggregate win.
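A numpy sketch of the kernel PCA score computation, including the sanity check I actually run on such code: with a linear kernel, the scores must reproduce ordinary PCA exactly. The data and the RBF bandwidth are illustrative, and the pre-image step needed for full denoising is not shown:

```python
import numpy as np

rng = np.random.default_rng(5)
X = rng.standard_normal((50, 3)) @ np.diag([3.0, 1.0, 0.3])  # anisotropic cloud

def kernel_pca_scores(K, n_components):
    """Kernel PCA scores from a precomputed kernel matrix:
    double-center K, eigendecompose, scale eigenvectors by sqrt(eigenvalue)."""
    n = len(K)
    J = np.eye(n) - np.ones((n, n)) / n
    vals, vecs = np.linalg.eigh(J @ K @ J)           # centered kernel matrix
    vals, vecs = vals[::-1], vecs[:, ::-1]           # descending order
    return vecs[:, :n_components] * np.sqrt(np.maximum(vals[:n_components], 0))

# RBF kernel: the bandwidth is a modeling assumption, not a default.
sq_dists = ((X[:, None] - X[None, :]) ** 2).sum(-1)
scores_rbf = kernel_pca_scores(np.exp(-0.1 * sq_dists), 2)

# Sanity check: with a linear kernel, kernel PCA must match ordinary PCA.
scores_lin = kernel_pca_scores(X @ X.T, 2)
Xc = X - X.mean(0)
U, S, _ = np.linalg.svd(Xc, full_matrices=False)
scores_pca = U[:, :2] * S[:2]
```

Recovering linear PCA as a special case is a cheap way to validate the centering and scaling conventions before trusting any nonlinear result.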