FPGAs facilitate prototyping and debugging, and have recently been used to accelerate full-stack simulations. However, the turnaround time (TAT) of the FPGA toolflow is restrictive in exhaustive design-space explorations of parameterized RTL generators, especially DNN accelerators, whose full-stack search space is explosive. This paper presents Quickloop, an efficient and scalable framework that enables FPGA-accelerated exploration. Quickloop first abstracts away the cumbersome flow of RTL generation, software stack, FPGA toolflow, workload execution, and metrics extraction by wrapping these stages into isolated Quicksteps, which feature cascadability, scalability, and replay. We then analytically minimize the FPGA toolflow TAT via a novel, data-driven strategy that intelligently reuses build fragments from previous iterations, improving loop efficiency while simultaneously lowering the toolflow's compute utilization.
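The staging idea above can be illustrated with a minimal sketch: each stage is an isolated step keyed by its inputs, replayed from a cache when the same inputs recur, and cascaded so one stage's artifacts feed the next. All class, stage, and parameter names below (`Quickstep`, `cascade`, `pe_rows`, the latency model) are illustrative assumptions, not Quickloop's actual API.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict

@dataclass
class Quickstep:
    """Hypothetical isolated stage: params -> artifacts, with replay."""
    name: str
    run: Callable[[dict], dict]
    _cache: Dict[tuple, dict] = field(default_factory=dict)  # replay store

    def __call__(self, params: dict) -> dict:
        key = tuple(sorted(params.items()))
        if key not in self._cache:        # replay: skip rerun on a cache hit
            self._cache[key] = self.run(params)
        return self._cache[key]

def cascade(steps, params):
    """Feed each Quickstep's outputs into the next stage (cascadability)."""
    state = dict(params)
    for step in steps:
        state.update(step(state))
    return state

# Illustrative stand-ins for the real stages (names are assumptions):
rtl   = Quickstep("rtl_gen",    lambda p: {"rtl": f"gemmini_{p['pe_rows']}x{p['pe_cols']}"})
build = Quickstep("fpga_build", lambda p: {"bitstream": p["rtl"] + ".bit"})
run   = Quickstep("workload",   lambda p: {"latency_ms": 10.0 / (p["pe_rows"] * p["pe_cols"])})

result = cascade([rtl, build, run], {"pe_rows": 4, "pe_cols": 4})
```

Because each stage caches on its own inputs, re-exploring a design point that only changes the workload would replay the RTL and build stages instead of rerunning them, which is the intuition behind reusing build fragments across iterations.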
Quickloop is built around the OpenAI Gym environment framework and thus supports drop-in regression and reinforcement-learning explorations. Using a Quickloop around Berkeley's reference Gemmini DNN accelerator, we exhaustively explore its parameter space and discover complex performance patterns, based on full-stack simulation of ImageNet benchmarks as the workload. We further show that, compared to the conventional FPGA toolflow, Quickloop reduces episodic time by more than 30% as episodes approach realistic lengths.
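The Gym integration can be sketched as an environment whose observations are design points and whose reward comes from a full-stack metric. The sketch below follows the classic OpenAI Gym `reset`/`step` interface but does not import `gym` itself, so it stays self-contained; the design space and reward model are placeholder assumptions, not Quickloop's.

```python
class AcceleratorEnv:
    """Gym-style environment over a toy accelerator parameter space."""
    PE_SIZES = [1, 2, 4, 8, 16]   # assumed discrete PE-array design points

    def reset(self):
        self.idx = 0
        return self.idx            # observation: index of current design

    def step(self, action):
        # The action moves through the design space; the reward is the
        # negated latency of the chosen configuration (placeholder for a
        # metric extracted from full-stack FPGA simulation).
        self.idx = max(0, min(len(self.PE_SIZES) - 1, self.idx + action))
        latency = 10.0 / self.PE_SIZES[self.idx]
        reward = -latency
        done = self.idx == len(self.PE_SIZES) - 1
        return self.idx, reward, done, {}

env = AcceleratorEnv()
obs = env.reset()
obs, reward, done, info = env.step(+1)   # explore a larger PE array
```

An off-the-shelf RL agent (or a simple regression sweep) can then drive `step` directly, which is what makes the exploration "drop-in".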