Publications | Antonio De Caro

2026

PDP
Evaluating Portable Programming Models for Hypergraph Label Propagation on GPUs

Antonio De Caro, Dario De Maio, Francesco Monzillo, Alessia Antelmi, and Biagio Cosenza

In Proceedings of the 34th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP), 2026

Understanding community formation is a fundamental task in network science, as it reveals the structural organization of complex networks and provides insights into the functional roles and interactions of their nodes. Recently, hypergraphs have emerged as a powerful framework for modeling high-order relationships in real-world systems, capturing multi-entity relations beyond traditional pairwise connections. However, efficiently processing hypergraphs on GPUs remains challenging due to their inherent sparsity and structural irregularity, leading to poor memory locality and load imbalance. Given that the world’s most powerful supercomputers are equipped with GPUs from different vendors, such as NVIDIA, AMD, and Intel, a portable performance solution is essential to exploit these systems effectively without rewriting the entire codebase for each platform. In this work, we pursue this objective by evaluating three portable programming models, OpenMP, SYCL, and Kokkos, applied to the label propagation community detection algorithm for hypergraphs, examining their programmability and performance on heterogeneous GPU architectures.
@inproceedings{labelPDP26,
  title = {{E}valuating {P}ortable {P}rogramming {M}odels for {H}ypergraph {L}abel {P}ropagation on {G}{P}{U}s},
  author = {De Caro, Antonio and De Maio, Dario and Monzillo, Francesco and Antelmi, Alessia and Cosenza, Biagio},
  booktitle = {Proceedings of the 34th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP)},
  series = {PDP '26},
  year = {2026},
  isbn = {},
  publisher = {IEEE},
  doi = {},
  pages = {},
  numpages = {8},
  keywords = {Label Propagation, Hypergraphs, Portable Programming Models, GPU},
  address = {New York, NY, USA},
}
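For readers unfamiliar with the algorithm, the following is a minimal serial sketch of two-phase label propagation on a hypergraph: hyperedges first adopt the majority label of their nodes, then nodes adopt the majority label of their incident hyperedges. It assumes non-empty hyperedges given as lists of node ids, synchronous updates, and smallest-label tie-breaking; these are illustrative choices, the function name `label_propagation` is hypothetical, and this is not the GPU implementation evaluated in the paper.

```python
from collections import Counter


def label_propagation(num_nodes, hyperedges, max_iters=20):
    """Two-phase synchronous label propagation on a hypergraph (sketch).

    hyperedges: list of non-empty lists of node ids (assumption).
    Every node starts with its own id as label. Ties are broken by the
    smallest label so that the result is deterministic.
    """
    labels = list(range(num_nodes))
    incident = [[] for _ in range(num_nodes)]
    for e, nodes in enumerate(hyperedges):
        for v in nodes:
            incident[v].append(e)

    def majority(values):
        counts = Counter(values)
        best = max(counts.values())
        return min(lbl for lbl, c in counts.items() if c == best)

    for _ in range(max_iters):
        # Phase 1: nodes -> hyperedges.
        edge_labels = [majority(labels[v] for v in nodes) for nodes in hyperedges]
        # Phase 2: hyperedges -> nodes (isolated nodes keep their label).
        new_labels = [
            majority(edge_labels[e] for e in incident[v]) if incident[v] else labels[v]
            for v in range(num_nodes)
        ]
        if new_labels == labels:  # converged
            break
        labels = new_labels
    return labels
```

For example, `label_propagation(6, [[0, 1, 2], [1, 2], [3, 4], [4, 5, 3]])` returns `[0, 0, 0, 3, 3, 3]`, one label per community. Synchronous updates, where every node reads the previous iteration's labels, map naturally onto data-parallel kernels but can oscillate, hence the `max_iters` bound.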
PDP
Optimizing the LiGen Drug Discovery Pipeline for Intel Max GPUs

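Saleh Jamali Golzar, Lorenzo Carpentieri, Antonio De Caro, Biagio Cosenza, Davide Gadioli, Gianmarco Accordi, Gianluca Palermo, Federico Ficarelli, Daniele Gregori, and Andrea R. Beccari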

In Proceedings of the 34th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP), 2026

High-throughput virtual screening is a fundamental technique in modern drug discovery, enabling the identification of promising drug candidates by evaluating millions of ligand–protein interactions in silico. Achieving high performance and scalability in such workflows requires efficient exploitation of parallel architectures. LiGen is a high-performance virtual screening application designed to accelerate molecular docking and scoring computations. Initially implemented in CUDA to fully exploit NVIDIA GPUs, LiGen has since been ported to SYCL to extend its support to heterogeneous architectures. In this work, we enhance the LiGen SYCL codebase with a performance-portable implementation that maximizes ligand throughput by leveraging the new SYCL features introduced in oneAPI. Our optimized implementation uses portable features that abstract the underlying hardware resources, allowing the number of ligands processed per kernel to adapt dynamically to the characteristics of the target device. We further extend our implementation with architecture-specific optimizations for Intel GPUs, focusing on sub-group size tuning and efficient General Register File (GRF) utilization. Our experimental evaluation compares our optimized SYCL implementations against the manually tuned CUDA and SYCL baselines. Results show that our version achieves up to 1.69x the throughput of the SYCL baseline without requiring manual tuning, while the version with Intel-specific optimizations reaches up to 2.42x.
@inproceedings{ligenPDP26,
  title = {Optimizing the LiGen Drug Discovery Pipeline for Intel Max GPUs},
  author = {Jamali Golzar, Saleh and Carpentieri, Lorenzo and De Caro, Antonio and Cosenza, Biagio and Gadioli, Davide and Accordi, Gianmarco and Palermo, Gianluca and Ficarelli, Federico and Gregori, Daniele and Beccari, Andrea R.},
  booktitle = {Proceedings of the 34th Euromicro International Conference on Parallel, Distributed, and Network-Based Processing (PDP)},
  series = {PDP '26},
  year = {2026},
  isbn = {},
  publisher = {IEEE},
  doi = {},
  pages = {},
  numpages = {8},
  keywords = {},
  address = {New York, NY, USA},
}

2025

SC
SIGMo: High-Throughput Batched Subgraph Isomorphism on GPUs for Molecular Matching

Antonio De Caro, Gennaro Cordasco, Federico Ficarelli, and Biagio Cosenza

In Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis, 2025

Subgraph isomorphism is a fundamental graph problem with applications in diverse domains, from biology to social network analysis. Of particular interest is molecular matching, which uses a subgraph isomorphism formulation for the drug discovery process. While subgraph isomorphism is known to be NP-complete and computationally expensive, in the molecular matching formulation a number of domain constraints allow for efficient implementations. This paper presents SIGMo, a high-throughput, portable subgraph isomorphism framework for GPUs, specifically designed for batch molecular matching. SIGMo takes advantage of the specific domain formulation to provide a more efficient filter-and-join strategy: the framework introduces a novel multi-level iterative filtering technique based on neighborhood signature encoding to efficiently prune candidates prior to a GPU-optimized join phase using a stack-based DFS traversal. The GPU implementation is written in SYCL, allowing portable execution on AMD, Intel, and NVIDIA GPUs. Our experimental evaluation on a large dataset from ZINC shows up to a 1470x speedup over state-of-the-art subgraph isomorphism frameworks, and SIGMo achieves a throughput of 7.7 billion matches per second on a cluster with 256 GPUs.
@inproceedings{sigmoSC25,
  title = {SIGMo: High-Throughput Batched Subgraph Isomorphism on GPUs for Molecular Matching},
  author = {De Caro, Antonio and Cordasco, Gennaro and Ficarelli, Federico and Cosenza, Biagio},
  booktitle = {Proceedings of the International Conference for High Performance Computing, Networking, Storage and Analysis},
  series = {SC '25},
  year = {2025},
  isbn = {9798400714665},
  publisher = {Association for Computing Machinery},
  doi = {10.1145/3712285.3759782},
  pages = {1524--1538},
  numpages = {15},
  keywords = {Subgraph Isomorphism, GPU, Molecular Matching},
  address = {New York, NY, USA},
}
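The filter-and-join strategy described in the abstract can be illustrated with a small serial sketch. Here the neighborhood signature is simply the multiset of neighbor labels, refinement repeatedly drops candidates whose neighbors cannot host the query node's neighbors, and the join enumerates embeddings with an explicit stack instead of recursion. The signature encoding, the graph representation (label lists plus adjacency sets), and the function names are assumptions for illustration, not SIGMo's actual encoding or GPU kernels.

```python
from collections import Counter


def signature(labels, adj, v):
    # Neighborhood signature (assumed): multiset of the labels of v's neighbors.
    return Counter(labels[w] for w in adj[v])


def filter_candidates(q_labels, q_adj, d_labels, d_adj):
    """Return cand[u], the set of data nodes that may host query node u."""
    q_sig = [signature(q_labels, q_adj, u) for u in range(len(q_labels))]
    d_sig = [signature(d_labels, d_adj, v) for v in range(len(d_labels))]
    # First level: same label and a signature that dominates the query's one.
    cand = [
        {
            v
            for v in range(len(d_labels))
            if d_labels[v] == q_labels[u]
            and all(d_sig[v][lbl] >= c for lbl, c in q_sig[u].items())
        }
        for u in range(len(q_labels))
    ]
    # Iterative refinement: keep v for u only if every query neighbor of u
    # still has at least one candidate among v's neighbors.
    changed = True
    while changed:
        changed = False
        for u in range(len(q_labels)):
            keep = {v for v in cand[u] if all(cand[w] & d_adj[v] for w in q_adj[u])}
            if keep != cand[u]:
                cand[u] = keep
                changed = True
    return cand


def count_matches(q_adj, d_adj, cand):
    """Count injective, edge-preserving mappings with an explicit DFS stack.

    Query nodes are matched in index order 0, 1, ..., n-1.
    """
    n = len(q_adj)
    count = 0
    stack = [()]  # each entry is a partial mapping: query node i -> mapping[i]
    while stack:
        mapping = stack.pop()
        k = len(mapping)
        if k == n:
            count += 1
            continue
        used = set(mapping)
        for v in cand[k]:
            # v must be unused and adjacent to the images of k's mapped neighbors.
            if v not in used and all(mapping[w] in d_adj[v] for w in q_adj[k] if w < k):
                stack.append(mapping + (v,))
    return count
```

For a C–O query on the chain C–O–C, the filter keeps both terminal carbons as candidates for the query carbon, and `count_matches` returns 2. Pruning before the join keeps the DFS stack small, which is the property that makes the join phase cheap.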
ICPP
SYgraph: A Portable Heterogeneous Graph Analytics Framework for GPU

Antonio De Caro, Gennaro Cordasco, and Biagio Cosenza

In Proceedings of the 54th International Conference on Parallel Processing (ICPP 2025), 2025

Graph analytics play a crucial role in a wide range of fields, including social network analysis, bioinformatics, and scientific computing, due to their ability to model and explore complex relationships. However, optimizing graph algorithms is inherently difficult due to their memory-bound nature, which often results in poor performance on modern massively parallel hardware. In addition, most state-of-the-art implementations are written in CUDA for NVIDIA GPUs, and thus they cannot run on supercomputers equipped with AMD and Intel GPUs. To address these challenges, we propose SYgraph, a portable heterogeneous graph analytics framework written in SYCL. SYgraph provides an efficient two-layer bitmap data layout optimized for GPU memory, eliminates the need for pre- or post-processing steps, and abstracts the complexity of working with diverse target platforms. Experimental results demonstrate that SYgraph delivers competitive performance against state-of-the-art frameworks on datasets with up to 21 million nodes and 530 million edges on NVIDIA GPUs, while being able to target any SYCL-supported device, such as AMD and Intel GPUs.
@inproceedings{sygraphICPP25,
  title = {SYgraph: A Portable Heterogeneous Graph Analytics Framework for GPU},
  author = {De Caro, Antonio and Cordasco, Gennaro and Cosenza, Biagio},
  booktitle = {Proceedings of the 54th International Conference on Parallel Processing (ICPP 2025)},
  year = {2025},
  doi = {10.1145/3754598.3754615},
}
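As an illustration of the general idea behind a two-layer bitmap, the sketch below keeps one bit per vertex in 64-bit words (lower layer) and one bit per non-empty word (upper layer), so that iterating over an active frontier skips empty regions. The 64-bit word size, the class `TwoLevelBitmap`, and the `bfs_levels` driver are hypothetical; the abstract does not describe SYgraph's exact layout or API.

```python
class TwoLevelBitmap:
    """Hypothetical two-level bitmap for a set of active vertices.

    Lower layer: one bit per vertex, packed into 64-bit words.
    Upper layer: one bit per lower word, set when that word is non-zero,
    so iteration skips words (and whole 64-word blocks) with no active vertex.
    """

    WORD = 64

    def __init__(self, num_vertices):
        n_words = (num_vertices + self.WORD - 1) // self.WORD
        self.lower = [0] * n_words
        self.upper = [0] * ((n_words + self.WORD - 1) // self.WORD)

    def insert(self, v):
        w = v // self.WORD
        self.lower[w] |= 1 << (v % self.WORD)
        self.upper[w // self.WORD] |= 1 << (w % self.WORD)

    def contains(self, v):
        return bool((self.lower[v // self.WORD] >> (v % self.WORD)) & 1)

    def active(self):
        """Yield active vertices in ascending order, visiting only non-empty words."""
        for ui, ubits in enumerate(self.upper):
            while ubits:
                lowest = ubits & -ubits  # isolate the lowest set bit
                w = ui * self.WORD + lowest.bit_length() - 1
                ubits ^= lowest
                bits = self.lower[w]
                while bits:
                    lb = bits & -bits
                    yield w * self.WORD + lb.bit_length() - 1
                    bits ^= lb


def bfs_levels(adj, source):
    """Level-synchronous BFS whose frontiers are TwoLevelBitmap instances."""
    n = len(adj)
    level = [-1] * n
    level[source] = 0
    frontier = TwoLevelBitmap(n)
    frontier.insert(source)
    depth = 0
    while True:
        nxt = TwoLevelBitmap(n)
        empty = True
        for u in frontier.active():
            for v in adj[u]:
                if level[v] == -1:
                    level[v] = depth + 1
                    nxt.insert(v)
                    empty = False
        if empty:
            return level
        frontier = nxt
        depth += 1
```

Each BFS level builds the next frontier by setting bits, and the following level iterates only over words flagged in the upper layer, which is where a sparse frontier saves work. For example, `bfs_levels([[1, 2], [0, 3], [0], [1]], 0)` returns `[0, 1, 1, 2]`.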
Poster
SYgraph: Efficient Data Layout for Heterogeneous Parallel Graph Analytics

Antonio De Caro, Gennaro Cordasco, and Biagio Cosenza

In Proceedings of the 13th International Workshop on OpenCL and SYCL, 2025

Graph analytics serves as an essential tool for modeling and exploring complex relationships in a variety of domains, including social networks, bioinformatics, and scientific computing. These applications often involve analyzing massive and intricate datasets, making it critical to optimize graph algorithms for modern hardware. However, achieving optimal performance on massively parallel architectures is a challenging task due to the memory-bound nature of graph computations and their inherently irregular workloads. Existing graph processing frameworks, such as Gunrock, Tigr, or SEP-Graph, have made strides in optimizing these workloads, but are predominantly designed for NVIDIA GPUs using CUDA. This design choice restricts their applicability in environments equipped with other high-performance hardware, such as AMD and Intel GPUs, which now power some of the world’s fastest supercomputers.
@inproceedings{sygraphIWOCL25,
  title = {SYgraph: Efficient Data Layout for Heterogeneous Parallel Graph Analytics},
  author = {De Caro, Antonio and Cordasco, Gennaro and Cosenza, Biagio},
  booktitle = {Proceedings of the 13th International Workshop on OpenCL and SYCL},
  year = {2025},
  doi = {10.1145/3731125.3731132},
}
IPDPS
Phase-based Frequency Scaling for Energy-efficient Heterogeneous Computing

Lorenzo Carpentieri, Antonio De Caro, Majid Salimi Beni, Kaijie Fan, and Biagio Cosenza

In Proceedings of the 39th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2025), 2025

Energy efficiency has been a major challenge for exascale computing. Frequency scaling is a powerful technique for saving energy in modern heterogeneous systems, and can be applied either at a coarse granularity, per application, or at a fine granularity, by setting the frequency for each computational kernel. Because of the frequency-change overhead, the chosen granularity significantly impacts the performance and energy consumption of applications. We propose a novel phase-based method that minimizes the frequency-change overhead and improves performance and energy efficiency on heterogeneous multi-GPU systems. Our approach detects different phases through application profiling and DAG analysis, and sets an optimal frequency for each phase. Our methodology also considers MPI programs, where the overhead can be hidden by overlapping frequency changes with communication. Experimental results show up to 37% energy savings and a 1.87x speedup for various benchmarks on a single GPU, and 68% energy savings and a 3.63x speedup on two multi-GPU applications.
@inproceedings{phasebasedIPDPS25,
  title = {Phase-based Frequency Scaling for Energy-efficient Heterogeneous Computing},
  author = {Carpentieri, Lorenzo and De Caro, Antonio and Salimi Beni, Majid and Fan, Kaijie and Cosenza, Biagio},
  booktitle = {Proceedings of the 39th IEEE International Parallel & Distributed Processing Symposium (IPDPS 2025)},
  year = {2025},
  doi = {10.1109/IPDPS64566.2025.00078},
}
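To make the trade-off between per-kernel and per-application frequency setting concrete, the sketch below chooses a frequency for each kernel of a linear kernel sequence by dynamic programming, minimizing profiled energy plus a fixed penalty per frequency change; maximal runs at the same frequency then form phases. This is a swapped-in, simplified formulation, not the paper's method: it assumes hypothetical profiling tables `energy[k][f]` and a switch penalty expressed in the same units, a linear order instead of the DAG analysis used in the paper, and no overlap of frequency changes with MPI communication.

```python
def plan_phases(energy, switch_cost):
    """Pick one frequency index per kernel, minimizing total profiled energy
    plus switch_cost for every frequency change between consecutive kernels.

    energy[k][f]: hypothetical profiled energy of kernel k at frequency index f.
    Returns (total_cost, phases), where each phase is (first_kernel, last_kernel, f).
    """
    n, m = len(energy), len(energy[0])
    cost = [list(energy[0])]  # cost[k][f]: best cost of kernels 0..k ending at f
    back = [[None] * m]       # back[k][f]: frequency chosen for kernel k-1
    for k in range(1, n):
        row, brow = [], []
        for f in range(m):
            # Stay at f (no penalty) or arrive from another frequency g.
            prev = [cost[k - 1][g] + (0 if g == f else switch_cost) for g in range(m)]
            g_best = min(range(m), key=prev.__getitem__)
            row.append(energy[k][f] + prev[g_best])
            brow.append(g_best)
        cost.append(row)
        back.append(brow)

    # Backtrack the optimal frequency assignment.
    f = min(range(m), key=cost[-1].__getitem__)
    total = cost[-1][f]
    choice = [0] * n
    for k in range(n - 1, -1, -1):
        choice[k] = f
        if k > 0:
            f = back[k][f]

    # Maximal runs of equal frequency form phases.
    phases, start = [], 0
    for k in range(1, n + 1):
        if k == n or choice[k] != choice[start]:
            phases.append((start, k - 1, choice[start]))
            start = k
    return total, phases
```

With `energy = [[5, 3], [2, 4], [5, 3]]`, a zero switch cost yields one phase per kernel (total 8), whereas a switch cost of 10 merges all three kernels into a single phase at frequency index 1 (total 10), illustrating how a larger change overhead favors coarser phases.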

2022

Master Thesis

Developing Educational Serious Games via a Cloud Solution: Identification and implementation of a model that fits Cloud Computing into Educational Games

Antonio De Caro

DiVA, 2022

Cloud computing is a current trend in information technology and has become an important part of many industries. In the educational games field, cloud computing has the potential to revolutionize how educational games are created, delivered, and consumed. This research aims to contribute to a better understanding of how cloud computing can be integrated into the development of educational games and how developers can use its powerful tools to create games that are not only more engaging but also more effective in terms of learning outcomes and deployment speed. To this end, this thesis investigates the practicalities of integrating cloud computing with educational games using the fundamentals of “Educational Serious Games as a Service” (ESGaaS), which highlight a set of characteristics that enable the execution of educational games in a cloud environment. This research takes into account the requirements of the “Happy Heart” game in order to develop and evaluate such a model, as well as to identify an architecture that is compatible with the ESGaaS model and the quality criteria necessary for such an architecture. By applying a design science research method, we developed a prototype that addresses the needs of an ESGaaS-compliant architecture, and we evaluated it by highlighting the sensitivity points, risks, non-risks, and tradeoffs of such an architecture. This research has demonstrated that deploying educational games according to the characteristics of “Educational Serious Games as a Service” can solve some of the challenges of educational game development and lead to faster deployment of the game, improved learning outcomes, and easier assessment of student performance.