We argue that commonly used benchmarks overestimate the out-of-distribution (OOD) generalization advantage of programmatic policies over neural ones. By controlling input sparsity and reward functions, we show that neural policies can match or exceed programmatic policies on standard benchmarks, and we propose new tasks that highlight scenarios where symbolic representations are advantageous. In the context of OOD generalization, we argue that programmatic representations should be reserved for problems that require computational constructs that neural models have difficulty learning, such as stacks and queues.
For The Open Racing Car Simulator (TORCS), we show that a more cautious reward function slows the agent down, which in turn yields better generalization.
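For concreteness, the sketch below shows one way such a cautious reward could be instantiated: the progress reward commonly used for TORCS in the RL literature, with its speed term capped so the agent gains nothing by driving above the cap. The variable names, penalty terms, and cap value are illustrative assumptions, not the exact shaping used in our experiments.

```python
import math

def cautious_reward(speed_x, angle, track_pos, speed_cap=80.0):
    """Hypothetical cautious variant of the usual TORCS progress reward.

    speed_x: longitudinal speed; angle: angle between the car's heading
    and the track axis; track_pos: lateral offset from the track center.
    """
    # Reward forward progress only up to speed_cap, removing the
    # incentive to drive at top speed.
    progress = min(speed_x, speed_cap) * math.cos(angle)
    # Penalize misalignment with the track axis and lateral drift;
    # both terms push the agent toward conservative driving.
    penalty = abs(speed_x * math.sin(angle)) + abs(speed_x * track_pos)
    return progress - penalty
```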
Using sparse observations, augmented with the previous action, allows fully-connected policies to generalize to larger grids (100×100), outperforming LEAPS as well as convolutional and LSTM baselines, which fail to scale.
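A minimal sketch of this observation design is an environment wrapper that appends a one-hot encoding of the previous discrete action to each observation vector; the reset/step interface and all names below are assumptions for illustration.

```python
import numpy as np

class PrevActionWrapper:
    """Appends a one-hot encoding of the previous discrete action
    to each observation (illustrative sketch)."""

    def __init__(self, env, num_actions):
        self.env = env
        self.num_actions = num_actions
        self.prev_action = None  # no previous action at episode start

    def _augment(self, obs):
        one_hot = np.zeros(self.num_actions, dtype=np.float32)
        if self.prev_action is not None:
            one_hot[self.prev_action] = 1.0
        return np.concatenate([np.asarray(obs, dtype=np.float32), one_hot])

    def reset(self):
        self.prev_action = None
        return self._augment(self.env.reset())

    def step(self, action):
        self.prev_action = action
        obs, reward, done, info = self.env.step(action)
        return self._augment(obs), reward, done, info
```

Because the augmented observation carries the last action, a memoryless fully-connected policy can recover simple temporal behaviors that would otherwise require recurrence.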
We introduce a SPARSE MAZE task, whose corridors are wider than those of the standard MAZE, so that finding shortest paths requires explicit memory (a stack or queue). Neural policies struggle on this task, while programmatic search synthesizes an optimal BFS solution; we used FunSearch to generate the programmatic policies for this section.
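The sketch below illustrates the kind of queue-based program such a search can express: a breadth-first search over the maze grid that returns the first move on a shortest path. The grid encoding (0 = free, 1 = wall) and the function signature are assumptions for illustration, not the exact program FunSearch produced.

```python
from collections import deque

def bfs_first_move(grid, start, goal):
    """Return the first action on a shortest path from start to goal,
    or None if the goal is unreachable (or already reached). The FIFO
    queue is the explicit memory a memoryless neural policy lacks."""
    if start == goal:
        return None
    moves = {(-1, 0): "up", (1, 0): "down", (0, -1): "left", (0, 1): "right"}
    queue = deque([start])
    parent = {start: None}
    while queue:
        cell = queue.popleft()
        if cell == goal:
            # Walk back to the cell adjacent to start to read off
            # the first action of the shortest path.
            while parent[cell] != start:
                cell = parent[cell]
            return moves[(cell[0] - start[0], cell[1] - start[1])]
        for dr, dc in moves:
            nxt = (cell[0] + dr, cell[1] + dc)
            if (0 <= nxt[0] < len(grid) and 0 <= nxt[1] < len(grid[0])
                    and grid[nxt[0]][nxt[1]] == 0 and nxt not in parent):
                parent[nxt] = cell
                queue.append(nxt)
    return None
```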