Drug design is a time-consuming and expensive endeavour. Computational strategies offer viable options to address this task; deep learning approaches in particular are gaining traction for their ability to deal with chemical structures. A straightforward way to represent such structures is via their molecular graph, which in turn can be naturally processed by graph neural networks. This paper introduces AMCG, a dual atomic-molecular, conditional, latent-space generative model built around graph processing layers, able to support both unconditional and conditional molecular graph generation. Among other features, AMCG is a one-shot model allowing for fast sampling, explicit assignment of atomic type histograms, and property optimization via gradient ascent. The model was trained on the Quantum Machines 9 (QM9) and ZINC datasets, achieving state-of-the-art performance. Alongside classic benchmarks, AMCG was also tested by generating large-scale sampled sets, showing robustness in terms of sustainable throughput of valid, novel and unique molecules.
@article{abate2024amcg,title={AMCG: a graph dual atomic-molecular conditional molecular generator},author={Abate, Carlo and Decherchi, Sergio and Cavalli, Andrea},journal={Machine Learning: Science and Technology},volume={5},number={3},pages={035004},year={2024},month=jul,publisher={IOP Publishing},doi={10.1088/2632-2153/ad5bbf},url={https://dx.doi.org/10.1088/2632-2153/ad5bbf},}
2023
WIREs
Graph neural networks for conditional de novo drug design
Drug design is costly in terms of resources and time. Generative deep learning techniques are using increasing amounts of biochemical data and computing power to pave the way for a new generation of tools and methods for drug discovery and optimization. Although early methods used SMILES strings, more recent approaches use molecular graphs to naturally represent chemical entities. Graph neural networks (GNNs) are learning models that can natively process graphs. The use of GNNs in drug discovery is growing exponentially. GNNs for drug design are often coupled with conditioning techniques to steer the generation process towards desired chemical and biological properties. These conditioned graph-based generative models and frameworks hold promise for the routine application of GNNs in drug discovery.
@article{abate2023graph,title={Graph neural networks for conditional de novo drug design},author={Abate, Carlo and Decherchi, Sergio and Cavalli, Andrea},journal={WIREs Computational Molecular Science},volume={13},number={4},pages={e1651},year={2023},doi={10.1002/wcms.1651},url={https://wires.onlinelibrary.wiley.com/doi/abs/10.1002/wcms.1651},eprint={https://wires.onlinelibrary.wiley.com/doi/pdf/10.1002/wcms.1651},keywords={deep learning, drug discovery, generative models, graph neural networks},}
WIREs
Ligandability and druggability assessment via machine learning
Drug discovery is a daunting and failure-prone task. A critical process in this research field is the identification of the biological target and pocket, as these steps heavily determine the subsequent effort of selecting a putative ligand, most often a small molecule. Finding "ligandable" pockets, namely protein cavities that may accept a drug-like binder, is instrumental to the more general, drug-discovery-oriented "druggability" estimation process. While high-throughput experimental techniques exist to identify putative binding sites other than the orthosteric one, these techniques are relatively expensive and not commonly available in labs. In this regard, computational means of detecting ligandable pockets are advisable for their low cost and speed. These methods can become, in principle, particularly predictive when supported by machine learning methodologies that provide the modeling framework. As with any data-driven effort, the outcome critically depends on the input data, its featurization process and possible associated biases. Also, the machine learning task (supervised/unsupervised), the learning method, and the possible usage of molecular dynamics data considerably shape the inherent assumptions of the modeling step. Defining a proper quantitative thermodynamic and/or kinetic score (or label) is key to the modeling process; here we review the literature and propose residence time as a novel ideal indicator of ligandability. Interestingly, the vast majority of methods take neither kinetics nor thermodynamics into consideration when devising predictors.
@article{dipalma2023ligandability,title={Ligandability and druggability assessment via machine learning},author={Di Palma, Francesco and Abate, Carlo and Decherchi, Sergio and Cavalli, Andrea},journal={WIREs Computational Molecular Science},volume={13},number={5},pages={e1676},year={2023},doi={10.1002/wcms.1676},url={https://wires.onlinelibrary.wiley.com/doi/abs/10.1002/wcms.1676},eprint={https://wires.onlinelibrary.wiley.com/doi/pdf/10.1002/wcms.1676},keywords={druggability, ligandability, machine learning, pocket detection},}
conference papers
2025
ICLR
MaxCutPool: differentiable feature-aware Maxcut for pooling in graph neural networks
We propose a novel approach to compute the MAXCUT in attributed graphs, i.e. graphs with features associated with nodes and edges, by exploiting heterophilic message passing to assign connected nodes to different partitions. The approach is fully differentiable, making it possible to find solutions that jointly optimize the MAXCUT along with other objectives. Based on the obtained MAXCUT partition, we implement MaxCutPool, a hierarchical graph pooling layer for graph neural networks. The layer is sparse, differentiable, and particularly suitable for downstream tasks on heterophilic graphs. Our key contributions include: 1) A novel MAXCUT computation method for attributed graphs, 2) A new hierarchical pooling layer especially effective for heterophilic graphs, 3) A general scheme for node-to-supernode assignment, 4) The introduction of the first heterophilic dataset for graph classification.
@inproceedings{abate2025maxcutpool,title={MaxCutPool: differentiable feature-aware Maxcut for pooling in graph neural networks},author={Abate, Carlo and Bianchi, Filippo Maria},booktitle={The Thirteenth International Conference on Learning Representations},year={2025},url={https://openreview.net/forum?id=xlbXRJ2XCP},}
2020
A flexible simulation-based framework for model-based/data-driven dependability evaluation
Modern predictive maintenance is the convergence of several technological trends, but developing new techniques and algorithms can be very costly due to the need for a physical prototype. The ultimate aim of this research is to build a simulation-based software framework for modeling and analysing complex systems and for defining predictive maintenance algorithms. Simulation makes quantitative evaluation of the dependability of such systems possible. The ERTMS/ETCS dependability case study is presented to demonstrate the applicability of the software.
@inproceedings{abate2020flexible,title={A flexible simulation-based framework for model-based/data-driven dependability evaluation},author={Abate, Carlo and Campanile, Lelio and Marrone, Stefano},booktitle={2020 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW)},year={2020},pages={261--266},doi={10.1109/ISSREW51248.2020.00083},}
preprints
doctoral thesis
2025
PhD Thesis
Graph neural network methods for representation and generation in drug discovery
Drug discovery is a time-consuming and expensive process, often spanning over a decade and costing billions of dollars. This thesis advances graph-based machine learning approaches to accelerate this process, making three main contributions. First, we provide a comprehensive review of graph neural networks for conditional molecular generation, establishing a framework for understanding and comparing different methods. Building on these insights, we introduce AMCG (Atomic-Molecular Conditional Generator), a novel generative framework that achieves state-of-the-art performance while offering one-shot generation capability and effective property optimization via gradient ascent. Motivated by the heterophilic nature of molecular graphs, where connected atoms often have dissimilar features, we then develop MaxCutPool, a differentiable graph pooling technique based on the maximum cut problem. By combining graph-theoretical principles with deep learning, MaxCutPool demonstrates superior performance on heterophilic graphs while remaining competitive on standard benchmarks and maintaining computational efficiency. Together, these contributions both advance the theoretical foundations of graph representation learning and provide practical tools for accelerating drug discovery.
@phdthesis{abate2025gnnmethods,school={University of Bologna},author={Abate, Carlo},month=mar,year={2025},title={Graph neural network methods for representation and generation in drug discovery},keywords={Machine Learning, Deep Learning, Graph Neural Networks, De novo drug design, Computational drug design, Graph Pooling, Graph Representation Learning},url={https://amsdottorato.unibo.it/id/eprint/11943/},doi={10.48676/unibo/amsdottorato/11943}}