HLS-Eval: The First Benchmark for Evaluating LLMs in High-Level Synthesis Design
The rise of LLMs in hardware design
Large language models (LLMs) have been making waves in hardware design, particularly in generating and optimizing code for hardware description languages (HDLs) like Verilog. But there's a new frontier in semiconductor design that's been largely overlooked: high-level synthesis (HLS).
HLS allows designers to write hardware accelerators in C++ instead of traditional HDLs, dramatically increasing productivity. However, it comes with its own set of domain-specific challenges—restricted C++ subsets, specialized pragmas, and vendor-specific quirks that make it a perfect candidate for LLM assistance.
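To make that concrete, here is a minimal sketch of what HLS C++ tends to look like, assuming the Vitis HLS dialect; the kernel, array size, and pragma choice are invented for illustration rather than taken from HLS-Eval.

```cpp
// Hypothetical example of the restricted style HLS tools expect:
// statically sized arrays (no malloc/new), simple loop nests, and
// tool-specific pragmas that tell the compiler how to map loops to hardware.
#define N 1024

void vadd(const int a[N], const int b[N], int out[N]) {
    VADD_LOOP:  // labeled loop, a common HLS convention for targeting pragmas
    for (int i = 0; i < N; ++i) {
#pragma HLS PIPELINE II=1  // issue one new loop iteration every clock cycle
        out[i] = a[i] + b[i];
    }
}
```

The same function compiles as plain C++, which is exactly why the entry barrier is so much lower than hand-written Verilog.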
Enter HLS-Eval, the first comprehensive benchmark and evaluation framework specifically designed to measure how well LLMs can handle HLS design tasks. Developed by researchers at Georgia Tech, this open-source tool could fundamentally change how we approach AI-assisted hardware design.
What HLS-Eval brings to the table
HLS-Eval isn't just another benchmark—it's a complete ecosystem for evaluating LLMs in HLS workflows. Here's what makes it stand out:
- 94 diverse benchmark designs curated from PolyBench, MachSuite, CHStone, and real-world accelerators like FlowGNN
- "LLM-ready" formatting with natural language descriptions, testbenches, and reference implementations
- Parallel evaluation engine that can test multiple LLMs simultaneously across different HLS tasks
- Four critical metrics measuring parseability, compilability, runnability, and synthesizability
"We're seeing LLMs being applied to Verilog generation, but HLS has been largely ignored despite its growing importance," says Stefan Abi-Karam, one of the paper's authors. "HLS-Eval gives researchers the tools to properly evaluate these models where it matters most."
The HLS design tasks that matter
HLS-Eval focuses on two core challenges where LLMs could provide the most value:
1. From English to HLS C++
The framework evaluates how well LLMs can generate working HLS code from natural language descriptions—a common starting point for many hardware designers.
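As an illustration only (the prompt and kernel below are invented for this post, not drawn from the benchmark), the task looks roughly like turning a short English spec into a synthesizable function:

```cpp
// Example description an LLM might be given:
//   "Read 256 integers from an input stream, scale each by a constant
//    gain of 3, and write the results to an output stream."
//
// One plausible HLS C++ answer, written against Vitis HLS streams:
#include <hls_stream.h>

#define LEN 256

void scale_stream(hls::stream<int> &in, hls::stream<int> &out) {
    SCALE_LOOP:
    for (int i = 0; i < LEN; ++i) {
#pragma HLS PIPELINE II=1
        out.write(in.read() * 3);
    }
}
```

HLS-Eval's metrics then check whether such an answer parses, compiles, runs against the provided testbench, and synthesizes.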
2. Hardware optimization edits
Beyond simply getting code to synthesize, HLS-Eval tests whether LLMs can apply targeted optimization edits (a sketch of two of them follows this list):
- Loop labeling for better pragma targeting
- Fixed-point conversion for area/power savings
- Dataflow refactoring for parallel execution
- Loop tiling for memory efficiency
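As a rough sketch of what two of these edits look like in practice (the kernel, tile size, and fixed-point width below are invented, not taken from the paper):

```cpp
#include <ap_fixed.h>

#define N 1024
#define TILE 64

// Fixed-point conversion: a 16-bit type with 8 integer bits standing in for
// float, trading precision for smaller area and lower power.
typedef ap_fixed<16, 8> data_t;

void scale_kernel(const data_t in[N], data_t out[N], data_t gain) {
    // Loop tiling: one N-iteration loop becomes an outer loop over tiles and
    // an inner loop within each tile, giving the tool a better-sized inner
    // loop to pipeline and making on-chip buffering easier.
    TILE_LOOP:
    for (int t = 0; t < N; t += TILE) {
        INNER_LOOP:  // loop labeling, so pragmas and reports can target it
        for (int i = 0; i < TILE; ++i) {
#pragma HLS PIPELINE II=1
            out[t + i] = in[t + i] * gain;
        }
    }
}
```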
"Simply making code synthesizable isn't enough," explains Cong Hao, co-author of the paper. "The real value comes when LLMs can suggest hardware optimizations that even experienced designers might miss."
Surprising baseline results
The team evaluated several open-source LLMs on their benchmark, with some unexpected findings:
- DeepSeek V3 outperformed larger models on code generation (97.6% synthesizability at pass@5; the pass@k metric is defined just after this list)
- Even Llama 3 8B showed promise on simpler editing tasks
- Fixed-point conversion proved particularly challenging across all models
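For readers unfamiliar with the metric: pass@k is the probability that at least one of k sampled generations succeeds, here meaning it produces a synthesizable design. Assuming the standard estimator used in code-generation benchmarks (HLS-Eval's exact sampling setup may differ), with n samples per problem of which c succeed:

$$\text{pass@}k \;=\; \mathbb{E}_{\text{problems}}\left[\, 1 - \frac{\binom{n-c}{k}}{\binom{n}{k}} \,\right]$$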
"The results show we're not yet at the point where LLMs can fully replace HLS experts," notes Abi-Karam. "But they're already useful as copilots, especially for routine optimizations."
Why this matters for AI in EDA
HLS-Eval represents more than just an academic exercise—it's a foundational tool for the coming wave of AI-assisted hardware design:
- Democratization: Makes HLS more accessible to software engineers
- Productivity: Could cut days from the hardware optimization cycle
- Innovation: Enables new research into LLM-based HLS workflows
The framework is already being used internally at several semiconductor companies, and the team plans to expand it with more benchmarks and tool integrations.
Get involved
The entire HLS-Eval framework—benchmarks, evaluation tools, and baseline results—is available on GitHub under an open-source license. For hardware engineers, ML researchers, or EDA tool developers, this represents a unique opportunity to shape the future of AI in hardware design.
"We're just scratching the surface of what's possible with LLMs in HLS," says Hao. "With HLS-Eval, we're giving the community the tools to explore this frontier together."
Read the full paper on arXiv: HLS-Eval: A Benchmark and Framework for Evaluating LLMs on High-Level Synthesis Design Tasks
Explore the code: GitHub Repository