Agent best practices - grind mode and various hooks
A comprehensive guide to working with coding agents, from starting with plans to managing context, customizing workflows, and reviewing code.
An agent generates Python code for a search reranker using the ESCI dataset and a keyword search tool, aiming to improve NDCG from a BM25 baseline of 0.30.
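For reference, NDCG for a single query can be sketched as below. This is a minimal illustration, not the evaluation code used in the experiment: the linear gain, the cutoff `k=10`, and the function names `dcg` and `ndcg` are all assumptions, since the source does not say which NDCG variant was measured.

```python
import math

def dcg(relevances, k=10):
    # Discounted cumulative gain over the top-k results.
    # Linear gain (rel) is an assumption; some setups use 2**rel - 1.
    return sum(rel / math.log2(rank + 2)
               for rank, rel in enumerate(relevances[:k]))

def ndcg(relevances, k=10):
    # Normalize by the DCG of the ideal (descending) ordering of the
    # same relevance labels; return 0.0 when nothing is relevant.
    ideal = dcg(sorted(relevances, reverse=True), k)
    return dcg(relevances, k) / ideal if ideal > 0 else 0.0
```

A reranker's score on a query set would then be the mean of `ndcg` over the queries, with `relevances` listing the graded labels of the results in ranked order.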
The initial code-dumping approach yields an inconsistent NDCG of around 0.33 across test queries, because the generated code risks overfitting. An iterative optimization process instead applies small code patches, checks each one against holdout evaluations to enforce generalization, and rejects overfit changes, producing a deployable reranker function.
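The accept-or-reject step of such a loop might look like the sketch below. It assumes a hypothetical `score(reranker, queries)` callable that returns mean NDCG, and it models each patch as a function that returns a modified reranker. The names `accept_patch`, `optimize`, and `min_gain` are illustrative and do not come from the source.

```python
from typing import Callable, List, Sequence

# Hypothetical types: a reranker reorders documents for a query, and a
# scorer returns the mean NDCG of a reranker over a set of queries.
Reranker = Callable[[str, Sequence[str]], List[str]]
Scorer = Callable[[Reranker, Sequence[str]], float]

def accept_patch(current, candidate, score, train_queries,
                 holdout_queries, min_gain=0.0):
    # Keep a patch only if it improves both the queries it was tuned on
    # and the holdout queries it never saw; a train-only gain is
    # treated as overfitting and rejected.
    train_gain = score(candidate, train_queries) - score(current, train_queries)
    holdout_gain = score(candidate, holdout_queries) - score(current, holdout_queries)
    return train_gain > min_gain and holdout_gain > min_gain

def optimize(baseline, patches, score, train_queries, holdout_queries):
    # Apply small patches one at a time, keeping only those that generalize.
    best = baseline
    for patch in patches:
        candidate = patch(best)
        if accept_patch(best, candidate, score, train_queries, holdout_queries):
            best = candidate
    return best
```

Requiring a gain on both sets is one possible design. A stricter setup would reserve a final test set that the loop never touches, because repeated selection against the same holdout can itself overfit.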