Evaluating SotA LLMs on a net-new LeetCode-style puzzle
Claude, GPT, Gemini and DeepSeek try to find the optimal placement for men occupying urinal stalls in a restroom!
Jan 23, 2025 · 20 min read
