WORDLE ARENA

LLM vs RL vs Heuristics

Four Wordle solvers, plus a random control, go head to head across hundreds of games. Qwen 3.5 2B, a small language model, reasons about each board state in natural language and chooses words based on semantic and positional intuition. A2C and PPO are reinforcement learning agents that learn the value of letter placements through policy-gradient methods, refining strategies that balance information gain against exploiting letters already known. Frequency is a heuristic baseline that ranks candidate words by how well their letters overlap with the most frequent English letters. Random, the control, picks a valid word uniformly at random. Watch them race through each puzzle side by side, track their guess distributions, and see which approach best handles the variance of a five-letter word game.
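
To make the two simplest entrants concrete, here is a minimal Python sketch of a Frequency-style picker and a Random picker sharing one game loop. It is an illustration under stated assumptions, not the arena's actual code: the function names (`feedback`, `filter_candidates`, `frequency_pick`, `random_pick`, `play`) and the five-word list are hypothetical, and letter frequencies are estimated from the remaining candidates as a stand-in for a real English letter-frequency table.

```python
import random
from collections import Counter


def feedback(guess: str, answer: str) -> str:
    """Standard Wordle feedback: G = right spot, Y = wrong spot, - = absent."""
    result = ["-"] * len(guess)
    unmatched = Counter()
    # First pass: mark greens and count answer letters that were not matched.
    for i, (g, a) in enumerate(zip(guess, answer)):
        if g == a:
            result[i] = "G"
        else:
            unmatched[a] += 1
    # Second pass: mark yellows, consuming unmatched letters so that
    # repeated letters in the guess are not over-counted.
    for i, g in enumerate(guess):
        if result[i] == "-" and unmatched[g] > 0:
            result[i] = "Y"
            unmatched[g] -= 1
    return "".join(result)


def filter_candidates(candidates, guess, pattern):
    """Keep only words that would have produced the observed pattern."""
    return [w for w in candidates if feedback(guess, w) == pattern]


def frequency_pick(candidates):
    """Pick the word whose distinct letters are most common.

    Assumption: frequencies come from the candidate list itself, not from
    an English-language table. Ties are broken alphabetically.
    """
    counts = Counter(ch for w in candidates for ch in set(w))
    return min(candidates, key=lambda w: (-sum(counts[ch] for ch in set(w)), w))


def random_pick(candidates, rng=random):
    """Control strategy: uniform choice among the remaining valid words."""
    return rng.choice(candidates)


def play(answer, words, pick, max_guesses=6):
    """Run one game and return the list of guesses made."""
    candidates = list(words)
    guesses = []
    for _ in range(max_guesses):
        guess = pick(candidates)
        guesses.append(guess)
        if guess == answer:
            break
        candidates = filter_candidates(candidates, guess, feedback(guess, answer))
    return guesses


# Hypothetical toy word list, for illustration only.
WORDS = ["crane", "slate", "trace", "crate", "stale"]

if __name__ == "__main__":
    for name, pick in [("Frequency", frequency_pick), ("Random", random_pick)]:
        scores = [len(play(answer, WORDS, pick)) for answer in WORDS]
        print(name, "average guesses:", sum(scores) / len(scores))
```

Both pickers see the same filtered candidate list, so any difference in average guesses comes only from how each one chooses among the words still consistent with the feedback so far.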

STATISTICS

AVG SCORE