#reinforcement-learning

3 posts · all tags

Jun 6, 2025 · Swarm Structure Simulation

After reading the book The Rules of the Flock, I was inspired to test some of its ideas. In short, swarm behavior emerges when individual agents, each following a simple set of local rules and with no central leader, produce complex and intelligent collective patterns.

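To make "simple local rules without a central leader" concrete, here is a minimal sketch of one update step. The two rules used (steer toward nearby agents, match their velocity) are borrowed from the classic boids model and are an assumption, not necessarily the rules tested in the post; `step_swarm` and all its parameters are hypothetical names.

```python
import math

def step_swarm(positions, velocities, radius=5.0, cohesion=0.05, alignment=0.1, dt=1.0):
    """Advance every agent one step using only information about nearby agents."""
    new_velocities = []
    for i, (pos, vel) in enumerate(zip(positions, velocities)):
        # Local neighbourhood: every other agent within `radius` (assumed rule).
        nbrs = [j for j, other in enumerate(positions)
                if j != i and math.dist(pos, other) <= radius]
        vx, vy = vel
        if nbrs:
            n = len(nbrs)
            # Cohesion: steer toward the neighbours' average position.
            cx = sum(positions[j][0] for j in nbrs) / n
            cy = sum(positions[j][1] for j in nbrs) / n
            # Alignment: steer toward the neighbours' average velocity.
            ax = sum(velocities[j][0] for j in nbrs) / n
            ay = sum(velocities[j][1] for j in nbrs) / n
            vx += cohesion * (cx - pos[0]) + alignment * (ax - vel[0])
            vy += cohesion * (cy - pos[1]) + alignment * (ay - vel[1])
        new_velocities.append((vx, vy))
    new_positions = [(p[0] + v[0] * dt, p[1] + v[1] * dt)
                     for p, v in zip(positions, new_velocities)]
    return new_positions, new_velocities
```

No agent reads global state or takes orders from a leader; any collective pattern comes from repeating this step many times.
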
Feb 6, 2025 · Group Relative Policy Optimization (GRPO)

PPO is a reinforcement learning algorithm originally designed to update policies in a stable and reliable way. In the context of LLM fine-tuning, the model (the “policy”) is trained using feedback from a reward model that represents human preferences. A value function (the critic) estimates the “goodness” of a state and is used with Generalized Advantage Estimation (GAE) to balance bias and variance. Basically it works as follows:

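As a rough illustration of the two advantage estimates this excerpt touches on, the sketch below computes GAE from critic values, and a group-relative advantage of the kind the post title refers to. The function names, default hyperparameters, and the use of the population standard deviation are assumptions for illustration, not the post's implementation.

```python
import statistics

def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """GAE over one trajectory.

    `values` holds len(rewards) + 1 critic estimates; the last entry is the
    bootstrap value for the state after the final step (0.0 if terminal).
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    for t in reversed(range(len(rewards))):
        # One-step TD error, using the critic as baseline.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # lam = 0 gives the low-variance one-step estimate,
        # lam = 1 gives the low-bias Monte Carlo estimate.
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages

def group_relative_advantages(group_rewards, eps=1e-8):
    """Score each sampled response against its own group, with no critic.

    Assumption: normalise by the group's mean and population std.
    """
    mean = statistics.mean(group_rewards)
    std = statistics.pstdev(group_rewards)
    return [(r - mean) / (std + eps) for r in group_rewards]
```

For example, `gae_advantages([1.0, 1.0], [0.5, 0.5, 0.0], gamma=1.0, lam=1.0)` sums the TD errors along the trajectory, while `group_relative_advantages` needs only the rewards of several responses sampled for the same prompt.
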
Aug 4, 2024 · Simple Reinforcement Learning Example

A simple RL example in which I want an agent to navigate a grid to reach a goal while avoiding holes. For more details, here is the Colab where I implemented the RL example: Link to code
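
For readers who want the idea without opening the notebook, the following is a minimal tabular Q-learning sketch for a grid with a goal and holes. The 4x4 layout, the reward values, the hyperparameters, and every function name are hypothetical; the linked Colab may use a different environment or algorithm.

```python
import random

# Hypothetical layout: S = start, G = goal, H = hole, . = free cell.
GRID = ["S...",
        ".H.H",
        "...H",
        "H..G"]
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right
N = len(GRID)

def step(state, action):
    """Apply one move and return (next_state, reward, done)."""
    dr, dc = ACTIONS[action]
    # Moves that would leave the grid keep the agent in place.
    nr = min(max(state[0] + dr, 0), N - 1)
    nc = min(max(state[1] + dc, 0), N - 1)
    cell = GRID[nr][nc]
    if cell == "G":
        return (nr, nc), 1.0, True    # reached the goal
    if cell == "H":
        return (nr, nc), -1.0, True   # fell into a hole
    return (nr, nc), 0.0, False

def train(episodes=3000, alpha=0.5, gamma=0.9, epsilon=0.3, seed=0):
    """Epsilon-greedy tabular Q-learning; returns the Q-table."""
    rng = random.Random(seed)
    q = {(r, c): [0.0] * 4 for r in range(N) for c in range(N)}
    for _ in range(episodes):
        state = (0, 0)
        for _ in range(100):  # cap on episode length
            if rng.random() < epsilon:
                action = rng.randrange(4)
            else:
                # Greedy action, with ties broken at random.
                action = max(range(4), key=lambda a: (q[state][a], rng.random()))
            nxt, reward, done = step(state, action)
            target = reward if done else reward + gamma * max(q[nxt])
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            if done:
                break
    return q

def greedy_path(q, max_steps=50):
    """Follow the learned greedy policy from the start cell."""
    state, path = (0, 0), [(0, 0)]
    for _ in range(max_steps):
        action = max(range(4), key=lambda a: q[state][a])
        state, _, done = step(state, action)
        path.append(state)
        if done:
            break
    return path
```

Calling `greedy_path(train())` returns the sequence of cells the trained agent visits, which is a quick way to check whether it learned to avoid the holes.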