HLS-Eval: The First Benchmark for Evaluating LLMs in High-Level Synthesis Design
The rise of LLMs in hardware design
Large language models (LLMs) have been making waves in hardware design, particularly in generating and optimizing code for hardware description languages (HDLs) like Verilog. But there's a new frontier in semiconductor design that's been largely overlooked: high-level synthesis (HLS).
HLS allows designers to write hardware accelerators in C++ instead of traditional HDLs, dramatically increasing productivity. However, it comes with its own set of domain-specific challenges—restricted C++ subsets, specialized pragmas, and vendor-specific quirks that make it a perfect candidate for LLM assistance.
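To make that concrete, here is a minimal sketch of what HLS C++ tends to look like, assuming the Vitis HLS dialect; the kernel, array size, and pragma choice are invented for illustration rather than taken from HLS-Eval.

```cpp
// Hypothetical example of the restricted style HLS tools expect:
// statically sized arrays (no malloc/new), simple loop nests, and
// tool-specific pragmas that tell the compiler how to map loops to hardware.
#define N 1024

void vadd(const int a[N], const int b[N], int out[N]) {
    VADD_LOOP:  // labeled loop, a common HLS convention for targeting pragmas
    for (int i = 0; i < N; ++i) {
#pragma HLS PIPELINE II=1  // issue one new loop iteration every clock cycle
        out[i] = a[i] + b[i];
    }
}
```

The same function compiles as plain C++, which is exactly why the entry barrier is so much lower than hand-written Verilog.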
Enter HLS-Eval, the first comprehensive benchmark and evaluation framework specifically designed to measure how well LLMs can handle HLS design tasks. Developed by researchers at Georgia Tech, this open-source tool could fundamentally change how we approach AI-assisted hardware design.
What HLS-Eval brings to the table
HLS-Eval isn't just another benchmark—it's a complete ecosystem for evaluating LLMs in HLS workflows. Here's what makes it stand out:
- 94 diverse benchmark designs curated from PolyBench, MachSuite, CHStone, and real-world accelerators like FlowGNN
- "LLM-ready" formatting with natural language descriptions, testbenches, and reference implementations
- Parallel evaluation engine that can test multiple LLMs simultaneously across different HLS tasks
- Four critical metrics measuring parseability, compilability, runnability, and synthesizability
"We're seeing LLMs being applied to Verilog generation, but HLS has been largely ignored despite its growing importance," says Stefan Abi-Karam, one of the paper's authors. "HLS-Eval gives researchers the tools to properly evaluate these models where it matters most."
The HLS design tasks that matter
HLS-Eval focuses on two core challenges where LLMs could provide the most value:
1. From English to HLS C++
The framework evaluates how well LLMs can generate working HLS code from natural language descriptions—a common starting point for many hardware designers.
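As an illustration only (the prompt and kernel below are invented for this post, not drawn from the benchmark), the task looks roughly like turning a short English spec into a synthesizable function:

```cpp
// Example description an LLM might be given:
//   "Read 256 integers from an input stream, scale each by a constant
//    gain of 3, and write the results to an output stream."
//
// One plausible HLS C++ answer, written against Vitis HLS streams:
#include <hls_stream.h>

#define LEN 256

void scale_stream(hls::stream<int> &in, hls::stream<int> &out) {
    SCALE_LOOP:
    for (int i = 0; i < LEN; ++i) {
#pragma HLS PIPELINE II=1
        out.write(in.read() * 3);
    }
}
```

HLS-Eval's metrics then check whether such an answer parses, compiles, runs against the provided testbench, and synthesizes.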
2. Hardware optimization edits
Beyond simply getting code to synthesize, HLS-Eval tests whether LLMs can apply targeted optimization edits (a sketch of two of them follows this list):
- Loop labeling for better pragma targeting
- Fixed-point conversion for area/power savings
- Dataflow refactoring for parallel execution
- Loop tiling for memory efficiency
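As a rough sketch of what two of these edits look like in practice (the kernel, tile size, and fixed-point width below are invented, not taken from the paper):

```cpp
#include <ap_fixed.h>

#define N 1024
#define TILE 64

// Fixed-point conversion: a 16-bit type with 8 integer bits standing in for
// float, trading precision for smaller area and lower power.
typedef ap_fixed<16, 8> data_t;

void scale_kernel(const data_t in[N], data_t out[N], data_t gain) {
    // Loop tiling: one N-iteration loop becomes an outer loop over tiles and
    // an inner loop within each tile, giving the tool a better-sized inner
    // loop to pipeline and making on-chip buffering easier.
    TILE_LOOP:
    for (int t = 0; t < N; t += TILE) {
        INNER_LOOP:  // loop labeling, so pragmas and reports can target it
        for (int i = 0; i < TILE; ++i) {
#pragma HLS PIPELINE II=1
            out[t + i] = in[t + i] * gain;
        }
    }
}
```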
"Simply making code synthesizable isn't enough," explains Cong Hao, co-author of the paper. "The real value comes when LLMs can suggest hardware optimizations that even experienced designers might miss."
Surprising baseline results
The team evaluated several open-source LLMs on their benchmark, with some unexpected findings:
- DeepSeek V3 outperformed larger models on code generation (97.6% synthesizability at pass@5; the pass@k metric is defined just after this list)
- Even Llama 3 8B showed promise on simpler editing tasks
- Fixed-point conversion proved particularly challenging across all models
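For readers unfamiliar with the metric: pass@k is the probability that at least one of k sampled generations succeeds, here meaning it produces a synthesizable design. Assuming the standard estimator used in code-generation benchmarks (HLS-Eval's exact sampling setup may differ), with n samples per problem of which c succeed:

$$\text{pass@}k \;=\; \mathbb{E}_{\text{problems}}\left[\, 1 - \frac{\binom{n-c}{k}}{\binom{n}{k}} \,\right]$$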
"The results show we're not yet at the point where LLMs can fully replace HLS experts," notes Abi-Karam. "But they're already useful as copilots, especially for routine optimizations."
Why this matters for AI in EDA
HLS-Eval represents more than just an academic exercise—it's a foundational tool for the coming wave of AI-assisted hardware design:
- Democratization: Makes HLS more accessible to software engineers
- Productivity: Could cut days from the hardware optimization cycle
- Innovation: Enables new research into LLM-based HLS workflows
The framework is already being used internally at several semiconductor companies, and the team plans to expand it with more benchmarks and tool integrations.
Get involved
The entire HLS-Eval framework—benchmarks, evaluation tools, and baseline results—is available on GitHub under an open-source license. For hardware engineers, ML researchers, or EDA tool developers, this represents a unique opportunity to shape the future of AI in hardware design.
"We're just scratching the surface of what's possible with LLMs in HLS," says Hao. "With HLS-Eval, we're giving the community the tools to explore this frontier together."
Read the full paper on arXiv: HLS-Eval: A Benchmark and Framework for Evaluating LLMs on High-Level Synthesis Design Tasks
Explore the code: GitHub Repository