Applied reinforcement learning project inspired by a medical control paper

Chemotherapy Dose Control with Q-Learning

Reinforcement learning for adaptive treatment policies under constraints

This project sits at the intersection of control, reinforcement learning, and applied modeling. A pharmacological system is simulated with differential equations, and a Q-learning agent learns dosing policies that reduce tumor burden subject to normal-cell preservation constraints.
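To make the simulation side concrete, here is a minimal sketch of a dose-driven cell-population model. The three-state form (normal cells, tumor cells, drug concentration), every parameter value, and the function names are illustrative assumptions, not the model or code used in the project. A fixed-step RK4 integrator keeps the sketch dependency-light; scipy.integrate.solve_ivp would be a natural substitute.

```python
import numpy as np

# Hypothetical per-day parameters, chosen only for illustration.
PARAMS = {"r_n": 0.2, "r_t": 0.3, "k_n": 0.1, "k_t": 0.9, "d": 1.0}


def dynamics(x, dose, p=PARAMS):
    """Time derivative of [normal cells, tumor cells, drug concentration]."""
    normal, tumor, drug = x
    d_normal = p["r_n"] * normal * (1.0 - normal) - p["k_n"] * drug * normal
    d_tumor = p["r_t"] * tumor * (1.0 - tumor) - p["k_t"] * drug * tumor
    d_drug = -p["d"] * drug + dose  # first-order clearance plus infusion
    return np.array([d_normal, d_tumor, d_drug])


def rk4_step(x, dose, h, p=PARAMS):
    """One classical fourth-order Runge-Kutta step of size h (days)."""
    k1 = dynamics(x, dose, p)
    k2 = dynamics(x + 0.5 * h * k1, dose, p)
    k3 = dynamics(x + 0.5 * h * k2, dose, p)
    k4 = dynamics(x + h * k3, dose, p)
    return x + (h / 6.0) * (k1 + 2.0 * k2 + 2.0 * k3 + k4)


def simulate(x0, dose, days, h=0.05, p=PARAMS):
    """Integrate under a constant dose and record the state once per day."""
    x = np.asarray(x0, dtype=float)
    steps_per_day = int(round(1.0 / h))
    trajectory = [x.copy()]
    for _ in range(days):
        for _ in range(steps_per_day):
            x = rk4_step(x, dose, h, p)
        trajectory.append(x.copy())
    return np.array(trajectory)


x0 = [0.9, 0.5, 0.0]  # normalized populations, no drug on board
untreated = simulate(x0, dose=0.0, days=40)
treated = simulate(x0, dose=1.0, days=40)
```

In this toy setting a constant dose drives the tumor state toward zero while the normal-cell state settles at a reduced level, which is the trade-off a learned policy has to manage.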

50k
Training episodes

15
Simulated patients

43 days
Mean eradication time

Problem

A treatment policy must balance competing objectives: reducing tumor cells quickly without destroying healthy cells or violating scenario-specific safety constraints.

Approach

Modeled the treatment dynamics, formalized the control problem as an MDP, trained a tabular Q-learning agent over 50,000 episodes, then evaluated nominal, constrained, and perturbed patient scenarios including pregnancy and comorbidity cases.
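The training step can be sketched as a standard tabular Q-learning loop. The environment below is a hypothetical toy stand-in (ten tumor-burden bins, four dose levels, a reward that penalizes both remaining tumor and dose as a rough toxicity proxy), not the project's MDP, and the hyperparameters and the shortened episode count are assumptions chosen to keep the example fast.

```python
import numpy as np

N_STATES = 10   # discretized tumor-burden bins; bin 0 means "eradicated"
N_ACTIONS = 4   # discrete dose levels; action 0 means no drug


def step(state, action):
    """Toy transition: tumor grows one bin per step, the dose removes `action` bins."""
    next_state = int(np.clip(state + 1 - action, 0, N_STATES - 1))
    # Penalize remaining tumor and, as a toxicity proxy, the dose itself.
    reward = -float(next_state) - 0.3 * action
    done = next_state == 0
    return next_state, reward, done


def train(episodes=5000, alpha=0.1, gamma=0.95, epsilon=0.1, max_steps=50, seed=0):
    """Tabular Q-learning with an epsilon-greedy behavior policy."""
    rng = np.random.default_rng(seed)
    q = np.zeros((N_STATES, N_ACTIONS))
    for _ in range(episodes):
        state = int(rng.integers(1, N_STATES))  # random non-terminal start
        for _ in range(max_steps):
            if rng.random() < epsilon:
                action = int(rng.integers(N_ACTIONS))   # explore
            else:
                action = int(np.argmax(q[state]))       # exploit
            next_state, reward, done = step(state, action)
            # Bootstrapped target; no future value once the tumor is eradicated.
            target = reward if done else reward + gamma * np.max(q[next_state])
            q[state, action] += alpha * (target - q[state, action])
            if done:
                break
            state = next_state
    return q


q_table = train()
policy = np.argmax(q_table, axis=1)  # greedy dose level per tumor-burden bin
```

Reading the policy array from the highest bin downward shows which dose level the agent prefers at each tumor burden; constrained scenarios could be expressed by restricting the action set or adding penalty terms to the reward.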

Results

Across the 15-patient simulation, the mean tumor eradication time was about 43 days, and the learned controller remained stable under parameter perturbations ranging from -10% to +15%.
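One way to set up such a perturbation study is to scale each kinetic parameter by a random factor within the stated range. The sketch below assumes independent uniform sampling per parameter and per patient and uses hypothetical nominal values; the project's actual sampling scheme and parameters are not stated here.

```python
import numpy as np

# Hypothetical nominal kinetic parameters, for illustration only.
NOMINAL = {"r_n": 0.2, "r_t": 0.3, "k_n": 0.1, "k_t": 0.9, "d": 1.0}


def perturbed_cohort(nominal, n_patients=15, low=-0.10, high=0.15, seed=0):
    """Return one parameter dict per simulated patient.

    Each parameter is scaled by (1 + delta), with delta drawn uniformly
    from [low, high), independently per parameter and per patient.
    """
    rng = np.random.default_rng(seed)
    cohort = []
    for _ in range(n_patients):
        factors = 1.0 + rng.uniform(low, high, size=len(nominal))
        patient = {
            name: value * factor
            for (name, value), factor in zip(nominal.items(), factors)
        }
        cohort.append(patient)
    return cohort


cohort = perturbed_cohort(NOMINAL)
```

Each dict in the cohort can then be passed to the simulator in place of the nominal parameters, and the eradication time recorded per patient to build the outcome distribution.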

What is in the repository

Connected continuous treatment dynamics with a reinforcement learning control policy.

Tested three clinically different patient scenarios instead of only one nominal case.

Added robustness analysis under kinetic parameter perturbations.

Reported distributional outcomes, not just a single success story.

Role and scope

Mathematical modeling, RL agent design, robustness analysis, and scenario simulation

Project context

Applied reinforcement learning project inspired by a medical control paper

Main stack

Python, Q-learning, SciPy, NumPy, Matplotlib