#reinforcement-learning
3 posts · all tags
-
· Swarm Structure Simulation
After reading the book The Rules of the Flock, I was inspired to test some of its ideas. Swarm behavior emerges when individual agents, each following a simple set of local rules and with no central leader, produce complex and intelligent collective patterns.
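To make "a simple set of local rules" concrete, here is a minimal sketch, not the simulation from the post. It assumes three classic flocking rules (cohesion, alignment, separation); the `step` function, the tuple layout `(x, y, vx, vy)`, and all weights are hypothetical choices for illustration.

```python
import math


def step(agents, radius=5.0, cohesion=0.01, alignment=0.05,
         separation=0.1, min_dist=1.0):
    """Advance every agent one tick using only neighbours within `radius`.

    Each agent is a tuple (x, y, vx, vy). All weights are illustrative.
    """
    updated = []
    for i, (x, y, vx, vy) in enumerate(agents):
        # Local view only: no agent sees the whole swarm.
        neighbours = [a for j, a in enumerate(agents)
                      if j != i and math.hypot(a[0] - x, a[1] - y) < radius]
        new_vx, new_vy = vx, vy
        if neighbours:
            n = len(neighbours)
            centre_x = sum(a[0] for a in neighbours) / n
            centre_y = sum(a[1] for a in neighbours) / n
            mean_vx = sum(a[2] for a in neighbours) / n
            mean_vy = sum(a[3] for a in neighbours) / n
            # Cohesion: drift toward the local centre of mass.
            # Alignment: nudge velocity toward the local average velocity.
            new_vx += cohesion * (centre_x - x) + alignment * (mean_vx - vx)
            new_vy += cohesion * (centre_y - y) + alignment * (mean_vy - vy)
            # Separation: push away from neighbours that are too close.
            for ax, ay, _, _ in neighbours:
                if math.hypot(ax - x, ay - y) < min_dist:
                    new_vx += separation * (x - ax)
                    new_vy += separation * (y - ay)
        updated.append((x + new_vx, y + new_vy, new_vx, new_vy))
    return updated
```

Calling `step` repeatedly on a list of randomly placed agents is enough to watch groups form, since no agent ever reads more than its own neighbourhood.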
-
· Group Relative Policy Optimization (GRPO)
PPO is a reinforcement learning algorithm originally designed to update policies in a stable and reliable way. In the context of LLM fine-tuning, the model (the “policy”) is trained using feedback from a reward model that represents human preferences. PPO also relies on a value function (the critic), which estimates the “goodness” of a state and is used with Generalized Advantage Estimation (GAE) to balance bias and variance. Basically it works as follows:
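As a side illustration of the GAE step mentioned in this excerpt (not the post's own walkthrough), here is a minimal sketch of how critic values can be turned into advantages. The function name `gae_advantages` and the default `gamma` and `lam` values are assumptions made for this example.

```python
def gae_advantages(rewards, values, gamma=0.99, lam=0.95):
    """Generalized Advantage Estimation for one trajectory.

    `values` must have len(rewards) + 1 entries; the last entry is the
    critic's estimate for the state after the final step (0.0 if terminal).
    """
    advantages = [0.0] * len(rewards)
    running = 0.0
    # Work backwards so each step can reuse the discounted sum after it.
    for t in reversed(range(len(rewards))):
        # One-step TD error: how much better the step went than predicted.
        delta = rewards[t] + gamma * values[t + 1] - values[t]
        # lam = 0 keeps only the TD error (low variance, more bias);
        # lam = 1 sums all future TD errors (less bias, more variance).
        running = delta + gamma * lam * running
        advantages[t] = running
    return advantages
```

The `lam` parameter is the knob behind the bias and variance trade-off: values between 0 and 1 blend short and long horizon estimates.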
-
· Simple Reinforcement Learning Example
A simple RL example in which I want an agent to navigate a grid to reach a goal while avoiding holes. For more details, here is the Colab where I implemented the RL example: Link to code
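For readers who want a feel for the setup before opening the Colab, here is a minimal sketch using tabular Q-learning. It is not the code from the linked notebook: the 4x4 layout, the reward values, the algorithm choice, and every name below (`GRID`, `move`, `train`, `greedy_path`) are assumptions made for illustration.

```python
import random

# Hypothetical layout: S = start, G = goal, H = hole, . = safe cell.
GRID = ["S...",
        ".H.H",
        "...H",
        "H..G"]
ACTIONS = [(-1, 0), (1, 0), (0, -1), (0, 1)]  # up, down, left, right


def move(state, action):
    """Apply an action; return (next_state, reward, done)."""
    row, col = state
    d_row, d_col = ACTIONS[action]
    # Clamp to the grid so walking into a wall leaves the agent in place.
    row = min(max(row + d_row, 0), len(GRID) - 1)
    col = min(max(col + d_col, 0), len(GRID[0]) - 1)
    cell = GRID[row][col]
    if cell == "G":
        return (row, col), 1.0, True
    if cell == "H":
        return (row, col), -1.0, True
    return (row, col), 0.0, False


def train(episodes=5000, alpha=0.1, gamma=0.9, epsilon=0.2, seed=0):
    """Learn a Q-table with epsilon-greedy tabular Q-learning."""
    rng = random.Random(seed)
    q = {(r, c): [0.0] * len(ACTIONS)
         for r in range(len(GRID)) for c in range(len(GRID[0]))}
    for _ in range(episodes):
        state = (0, 0)
        for _ in range(100):  # cap episode length
            if rng.random() < epsilon:
                action = rng.randrange(len(ACTIONS))
            else:
                # Break ties at random so untrained states still explore.
                best = max(q[state])
                action = rng.choice(
                    [a for a in range(len(ACTIONS)) if q[state][a] == best])
            nxt, reward, done = move(state, action)
            target = reward + (0.0 if done else gamma * max(q[nxt]))
            q[state][action] += alpha * (target - q[state][action])
            state = nxt
            if done:
                break
    return q


def greedy_path(q, max_steps=20):
    """Follow the highest-valued action from the start cell."""
    state, path = (0, 0), [(0, 0)]
    for _ in range(max_steps):
        action = max(range(len(ACTIONS)), key=lambda a: q[state][a])
        state, _, done = move(state, action)
        path.append(state)
        if done:
            break
    return path
```

Running `greedy_path(train())` shows the route the learned table prefers; the negative reward on holes is one simple way to encode "avoid holes", and other reward schemes would work too.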