
Papers

Discover Innovation: Explore the Latest Scientific Breakthroughs from PrimisAI Minds

Our Papers

Nexus: A Lightweight and Scalable Multi-Agent Framework for Complex Tasks Automation

Humza Sami¹, Mubashir ul Islam¹, Samy Charas¹, Asav Gandhi¹, Pierre-Emmanuel Gaillardon¹,², and Valerio Tenace¹

¹PrimisAI, Los Gatos, CA, USA

²University of Utah, Salt Lake City, UT, USA

Abstract:

Recent advancements in Large Language Models (LLMs) have substantially expanded the capabilities of Multi-Agent Systems (MASs), enabling systems that not only automate tasks but also leverage near-human reasoning capabilities. To achieve this, LLM-based MASs need to be built around two critical principles: (i) a robust architecture that fully exploits the potential of LLMs for a specific task, or a related set of tasks, and (ii) an effective methodology for equipping LLMs with the capabilities needed to perform tasks and manage information efficiently. Notably, rigid a priori architectural designs can limit the scalability and domain adaptability of a given MAS.
To address these challenges, in this paper we introduce Nexus: a lightweight Python framework designed to easily build and manage LLM-based MASs. Nexus introduces the following innovations: (i) a flexible multi-supervisor hierarchy, (ii) a simplified workflow design, and (iii) easy installation and open-source flexibility: Nexus can be installed via pip and is distributed under a permissive open-source license, allowing users to freely modify and extend its capabilities.
Experimental results demonstrate that architectures built with Nexus exhibit state-of-the-art performance across diverse domains. In coding tasks, Nexus-driven MASs achieve a 99% pass rate on HumanEval and a flawless 100% on VerilogEval-Human, outperforming cutting-edge reasoning language models such as o3-mini and DeepSeek-R1. Moreover, these architectures display robust proficiency in complex reasoning and mathematical problem solving, achieving correct solutions for all randomly selected problems from the MATH dataset. In the realm of multi-objective optimization, Nexus-based architectures successfully address challenging timing closure tasks on designs from the VTR benchmark suite, while guaranteeing, on average, a power saving of nearly 30%.
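To give a concrete picture of what a multi-supervisor hierarchy means in practice, here is a minimal, self-contained sketch in plain Python. It is not the Nexus API: the Supervisor and Agent classes, the keyword-based routing, and the stubbed skills are all illustrative assumptions.

```python
# Minimal sketch of a multi-supervisor hierarchy (illustrative only;
# not the actual Nexus API). LLM calls are stubbed with plain functions.

from dataclasses import dataclass, field
from typing import Callable, Dict, Union


@dataclass
class Agent:
    """A leaf worker: wraps a single (stubbed) LLM-backed skill."""
    name: str
    skill: Callable[[str], str]

    def run(self, task: str) -> str:
        return self.skill(task)


@dataclass
class Supervisor:
    """Routes tasks to child agents or nested supervisors by keyword."""
    name: str
    children: Dict[str, Union[Agent, "Supervisor"]] = field(default_factory=dict)

    def run(self, task: str) -> str:
        # A real system would let an LLM pick the child; here we simply
        # route on the first keyword that matches a child's key.
        for key, child in self.children.items():
            if key in task.lower():
                return child.run(task)
        return f"[{self.name}] no child could handle: {task!r}"


# Stubbed "skills" standing in for LLM calls.
coder = Agent("coder", lambda t: f"// RTL for: {t}")
mathematician = Agent("math", lambda t: f"solution to: {t}")

# Nested hierarchy: a top-level supervisor delegating to a domain supervisor.
eda_supervisor = Supervisor("eda", {"rtl": coder})
root = Supervisor("root", {"rtl": eda_supervisor, "math": mathematician})

print(root.run("write rtl for a 4-bit counter"))
print(root.run("solve this math problem: 2 + 2"))
```

In an actual Nexus deployment the routing decision and the leaf skills would be LLM-driven; the stub only shows how supervisors can nest to arbitrary depth.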

EDA-Aware RTL Generation with Large Language Models

Mubashir ul Islam¹, Humza Sami¹, Pierre-Emmanuel Gaillardon¹,², and Valerio Tenace¹

¹PrimisAI, Los Gatos, CA, USA

²University of Utah, Salt Lake City, UT, USA

Abstract:

Large Language Models (LLMs) have become increasingly popular for generating RTL code. However, producing error-free RTL code in a zero-shot setting remains highly challenging even for state-of-the-art LLMs, often yielding issues that require manual, iterative refinement. This additional debugging process can dramatically increase the verification workload, underscoring the need for robust, automated correction mechanisms that ensure code correctness from the start.

In this work, we introduce AIvril2, a self-verifying, LLM-agnostic agentic framework aimed at enhancing RTL code generation through iterative corrections of both syntax and functional errors. Our approach leverages a collaborative multi-agent system that incorporates feedback from error logs generated by EDA tools to automatically identify and resolve design flaws.
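As a rough illustration of this verification-in-the-loop idea, the sketch below compiles a candidate design with Icarus Verilog and feeds the tool's error log back to a repair step. Everything here is an assumption chosen for illustration: the framework itself is LLM- and EDA-tool-agnostic, and llm_repair is a stub standing in for the correction agents.

```python
# Illustrative feedback loop: compile generated RTL, and on failure feed
# the tool's error log back for repair. iverilog (Icarus Verilog, which
# must be on PATH) is used here as a stand-in EDA tool; llm_repair is a
# stub for the LLM-backed correction agent.

import pathlib
import subprocess
import tempfile


def llm_repair(rtl: str, error_log: str) -> str:
    """Stub: a real agent would prompt an LLM with the code and the log."""
    return rtl  # no-op placeholder


def syntax_loop(rtl: str, max_iters: int = 3) -> tuple[str, bool]:
    for _ in range(max_iters):
        with tempfile.TemporaryDirectory() as tmp:
            src = pathlib.Path(tmp) / "design.v"
            src.write_text(rtl)
            # Compile only; stderr carries the error log on failure.
            proc = subprocess.run(
                ["iverilog", "-o", str(pathlib.Path(tmp) / "a.out"), str(src)],
                capture_output=True, text=True,
            )
        if proc.returncode == 0:
            return rtl, True
        rtl = llm_repair(rtl, proc.stderr)
    return rtl, False


fixed, ok = syntax_loop("module counter(input clk); endmodule")
print("syntax clean" if ok else "still failing")
```

Functional errors would be handled the same way, with simulation output rather than compiler messages driving the repair step.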

AIvril: AI-Driven RTL Generation with Verification In-the-Loop

Mubashir ul Islam¹, Humza Sami¹, Pierre-Emmanuel Gaillardon¹,², and Valerio Tenace¹

¹PrimisAI, Los Gatos, CA, USA

²University of Utah, Salt Lake City, UT, USA

Abstract:

Large Language Models (LLMs) are computational models capable of performing complex natural language processing tasks. Leveraging these capabilities, LLMs hold the potential to transform the entire hardware design stack, with predictions suggesting that front-end and back-end tasks could be fully automated in the near future. Currently, LLMs show great promise in streamlining Register Transfer Level (RTL) generation, enhancing efficiency, and accelerating innovation. However, their probabilistic nature makes them prone to inaccuracies—a significant drawback in RTL design, where reliability and precision are essential.

To address these challenges, this paper introduces AIvril, an advanced framework designed to enhance the accuracy and reliability of RTL-aware LLMs. AIvril employs a multi-agent, LLM-agnostic system for automatic syntax correction and functional verification, significantly reducing, and in many cases completely eliminating, instances of erroneous code generation. Experimental results conducted on the VerilogEval-Human dataset show that our framework improves code quality by nearly 2× when compared to previous works, while achieving an 88.46% success rate in meeting verification objectives. This represents a critical step toward automating and optimizing hardware design workflows, offering a more dependable methodology for AI-driven RTL design.
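For context on how a success rate like this could be measured, a harness in the spirit of VerilogEval might look like the sketch below: each candidate design is simulated against its testbench, and the fraction of passing runs is reported. The tool choice (iverilog/vvp), the file layout, and the pass criterion are assumptions, not the paper's actual evaluation code.

```python
# Illustrative harness for a "verification objectives" metric: each
# candidate design is compiled and simulated against its testbench, and
# the success rate is the fraction of runs that finish cleanly.

import pathlib
import subprocess


def passes(dut: pathlib.Path, tb: pathlib.Path) -> bool:
    sim = dut.with_suffix(".sim")
    # Compile the DUT together with its testbench, then simulate with vvp.
    compile_proc = subprocess.run(
        ["iverilog", "-o", str(sim), str(tb), str(dut)],
        capture_output=True,
    )
    if compile_proc.returncode != 0:
        return False
    run = subprocess.run(["vvp", str(sim)], capture_output=True, text=True)
    # Assumed convention: the testbench prints "ERROR" on any mismatch.
    return run.returncode == 0 and "ERROR" not in run.stdout


def success_rate(pairs: list[tuple[pathlib.Path, pathlib.Path]]) -> float:
    """pairs: (design, testbench) file pairs for each benchmark task."""
    return sum(passes(d, t) for d, t in pairs) / len(pairs)
```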