Analyzing molecular surfaces to predict functional sites and identify protein cavities for small molecule binding is essential in structural biology and drug discovery, particularly when targeting allosteric sites or designing PROTACs. Moreover, measuring properties such as volume, surface area, and pocket chemical descriptors helps in understanding protein function and improving drug development. Over the past decades, numerous surface and pocket-detection tools have been developed. While these tools provide valuable insights, they often require extensive postprocessing of text output files, making the analysis workflow cumbersome. To address this limitation, we introduce NanoShaperWeb, a web server that not only provides the computational capabilities of NanoShaper but also eliminates the need for manual text file processing through an intuitive web-based interface. Molecular surface and pocket-detection computations are performed remotely via a queue, with results visualized interactively and available for download. For each pocket, the application also provides DrugPred descriptors, enabling deeper insight into pocket features. By streamlining molecular analysis, this tool offers an efficient and accessible platform for researchers, supporting key stages of the drug design pipeline. The NanoShaperWeb tool is freely accessible online at https://nanoshaperweb.iit.it/ with no registration required.
@article{abate2025nanoshaperweb,author={Abate, Carlo and Serra, Eleonora and Rocchia, Walter and Cavalli, Andrea and Decherchi, Sergio},title={NanoShaperWeb: Molecular Surface and Pocket Detection Made Visual},journal={Journal of Chemical Information and Modeling},year={2025},month=jun,day={30},publisher={American Chemical Society},issn={1549-9596},doi={10.1021/acs.jcim.5c00821},url={https://doi.org/10.1021/acs.jcim.5c00821},}
2024
MLST
AMCG: a graph dual atomic-molecular conditional molecular generator
Drug design is a time-consuming and expensive endeavour. Computational strategies offer viable options to address this task; deep learning approaches, in particular, are gaining traction for their ability to handle chemical structures. A straightforward way to represent such structures is via their molecular graph, which in turn can be naturally processed by graph neural networks. This paper introduces AMCG, a dual atomic-molecular, conditional, latent-space, generative model built around graph processing layers able to support both unconditional and conditional molecular graph generation. Among other features, AMCG is a one-shot model allowing for fast sampling, explicit assignment of the atom-type histogram, and property optimization via gradient ascent. The model was trained on the Quantum Machines 9 (QM9) and ZINC datasets, achieving state-of-the-art performance. In addition to classic benchmarks, AMCG was also tested by generating large-scale sampled sets, showing robustness in terms of sustainable throughput of valid, novel, and unique molecules.
@article{abate2024amcg,title={AMCG: a graph dual atomic-molecular conditional molecular generator},author={Abate, Carlo and Decherchi, Sergio and Cavalli, Andrea},journal={Machine Learning: Science and Technology},volume={5},number={3},pages={035004},year={2024},month=jul,publisher={IOP Publishing},doi={10.1088/2632-2153/ad5bbf},url={https://dx.doi.org/10.1088/2632-2153/ad5bbf},}
2023
WIREs
Graph neural networks for conditional de novo drug design
Drug design is costly in terms of resources and time. Generative deep learning techniques are using increasing amounts of biochemical data and computing power to pave the way for a new generation of tools and methods for drug discovery and optimization. Although early methods used SMILES strings, more recent approaches use molecular graphs to naturally represent chemical entities. Graph neural networks (GNNs) are learning models that can natively process graphs. The use of GNNs in drug discovery is growing exponentially. GNNs for drug design are often coupled with conditioning techniques to steer the generation process towards desired chemical and biological properties. These conditioned graph-based generative models and frameworks hold promise for the routine application of GNNs in drug discovery.
@article{abate2023graph,title={Graph neural networks for conditional de novo drug design},author={Abate, Carlo and Decherchi, Sergio and Cavalli, Andrea},journal={WIREs Computational Molecular Science},volume={13},number={4},pages={e1651},year={2023},doi={10.1002/wcms.1651},url={https://wires.onlinelibrary.wiley.com/doi/abs/10.1002/wcms.1651},eprint={https://wires.onlinelibrary.wiley.com/doi/pdf/10.1002/wcms.1651},keywords={deep learning, drug discovery, generative models, graph neural networks},}
WIREs
Ligandability and druggability assessment via machine learning
Drug discovery is a daunting and failure-prone task. A critical process in this research field is represented by the biological target and pocket identification steps, as they heavily determine the subsequent efforts in selecting a putative ligand, most often a small molecule. Finding "ligandable" pockets, namely protein cavities that may accept a drug-like binder, is instrumental to the more general and drug discovery oriented "druggability" estimation process. While high-throughput experimental techniques exist to identify putative binding sites other than the orthosteric one, these techniques are relatively expensive and not commonly available in labs. In this regard, computational means of detecting ligandable pockets are advisable for their low cost and speed. These methods can become, in principle, particularly predictive when supported by machine learning methodologies that provide the modeling framework. As with any data-driven effort, the outcome critically depends on the input data, its featurization process, and possible associated biases. Also, the machine learning task (supervised/unsupervised), the learning method, and the possible usage of molecular dynamics data considerably shape the inherent assumptions of the modeling step. Defining a proper quantitative thermodynamic and/or kinetic score (or label) is key to the modeling process; here we review the literature and propose residence time as a novel ideal indicator of ligandability. Interestingly, the vast majority of methods take neither kinetics nor thermodynamics into consideration when devising predictors.
@article{dipalma2023ligandability,title={Ligandability and druggability assessment via machine learning},author={Di Palma, Francesco and Abate, Carlo and Decherchi, Sergio and Cavalli, Andrea},journal={WIREs Computational Molecular Science},volume={13},number={5},pages={e1676},year={2023},doi={10.1002/wcms.1676},url={https://wires.onlinelibrary.wiley.com/doi/abs/10.1002/wcms.1676},eprint={https://wires.onlinelibrary.wiley.com/doi/pdf/10.1002/wcms.1676},keywords={druggability, ligandability, machine learning, pocket detection},}
conference papers
2025
ICLR
MaxCutPool: differentiable feature-aware Maxcut for pooling in graph neural networks
We propose a novel approach to compute the MAXCUT in attributed graphs, i.e. graphs with features associated with nodes and edges, by exploiting heterophilic message passing to assign connected nodes to different partitions. The approach is fully differentiable, making it possible to find solutions that jointly optimize the MAXCUT along with other objectives. Based on the obtained MAXCUT partition, we implement MaxCutPool, a hierarchical graph pooling layer for graph neural networks. The layer is sparse, differentiable, and particularly suitable for downstream tasks on heterophilic graphs. Our key contributions include: 1) A novel MAXCUT computation method for attributed graphs, 2) A new hierarchical pooling layer especially effective for heterophilic graphs, 3) A general scheme for node-to-supernode assignment, 4) The introduction of the first heterophilic dataset for graph classification.
@inproceedings{abate2025maxcutpool,title={MaxCutPool: differentiable feature-aware Maxcut for pooling in graph neural networks},author={Abate, Carlo and Bianchi, Filippo Maria},booktitle={The Thirteenth International Conference on Learning Representations},year={2025},url={https://openreview.net/forum?id=xlbXRJ2XCP},}
2020
A flexible simulation-based framework for model-based/data-driven dependability evaluation
Modern predictive maintenance lies at the convergence of several technological trends, and developing new techniques and algorithms can be very costly because a physical prototype is needed. The ultimate aim of this research is to build a simulation-based software framework for modeling and analysing complex systems and for defining predictive maintenance algorithms. Simulation makes a quantitative evaluation of the dependability of such systems possible. The ERTMS/ETCS dependability case study is presented to demonstrate the applicability of the software.
@inproceedings{abate2020flexible,title={A flexible simulation-based framework for model-based/data-driven dependability evaluation},author={Abate, Carlo and Campanile, Lelio and Marrone, Stefano},booktitle={2020 IEEE International Symposium on Software Reliability Engineering Workshops (ISSREW)},year={2020},pages={261--266},doi={10.1109/ISSREW51248.2020.00083},}
preprints
2025
SSRN
A Map of 1 Million Polymorphic Tandem Repeats in the Human Genome
Tandem repeats (TRs) are genomic structures consisting of multiple copies of a (typically short) motif sequence lying adjacent to each other. More permissive definitions allow some degree of variability within the sequences as well as small gaps among the motifs. Polymorphism of TRs, namely the increase or decrease in the number of repetitions, is one of the most common structural variations in eukaryotic genomes; it has already been correlated with more than 45 human diseases and is expected to be responsible for many others. In this scenario, it is not surprising that several research projects aimed at understanding the etiology of a disease have pursued the goal of discovering new associations with a polymorphic tandem repeat (PTR). A crucial step for all these projects to succeed is compiling a comprehensive list of candidate loci. The looseness of the definition of TRs and the difficulty of establishing polymorphism, which would require comparing the number of repetitions of the same TR within a heterogeneous population, make the task of distilling such a list challenging. In the present work, we compiled a comprehensive map of over 1 million PTRs by comparing more than 16 million candidates in a population of 56 assembled genomes from the NCBI repository. The two main achievements over previous results are the identification of PTRs too long to be measured with standard short-read NGS sequencing and the discovery of PTRs not annotated in the hg38 reference.
@misc{geraci2025map,title={A Map of 1 Million Polymorphic Tandem Repeats in the Human Genome},author={Geraci, Filippo and Abate, Carlo},year={2025},note={Available at SSRN},url={https://ssrn.com/abstract=5021767},doi={10.2139/ssrn.5021767},}
doctoral thesis
2025
PhD Thesis
Graph neural network methods for representation and generation in drug discovery
Carlo Abate
Alma Mater Studiorum - University of Bologna, Mar 2025
Drug discovery is a time-consuming and expensive process, often spanning over a decade and costing billions of dollars. This thesis advances graph-based machine learning approaches to accelerate this process, making three main contributions. First, we provide a comprehensive review of graph neural networks for conditional molecular generation, establishing a framework for understanding and comparing different methods. Building on these insights, we introduce AMCG (Atomic-Molecular Conditional Generator), a novel generative framework that achieves state-of-the-art performance while offering one-shot generation capability and effective property optimization via gradient ascent. Motivated by the heterophilic nature of molecular graphs, where connected atoms often have dissimilar features, we then develop MaxCutPool, a differentiable graph pooling technique based on the maximum cut problem. By combining graph-theoretical principles with deep learning, MaxCutPool demonstrates superior performance on heterophilic graphs while remaining competitive on standard benchmarks and maintaining computational efficiency. Together, these contributions both advance the theoretical foundations of graph representation learning and provide practical tools for accelerating drug discovery.
@phdthesis{abate2025gnnmethods,school={Alma Mater Studiorum - University of Bologna},author={Abate, Carlo},month=mar,year={2025},title={Graph neural network methods for representation and generation in drug discovery},keywords={Machine Learning, Deep Learning, Graph Neural Networks, De novo drug design, Computational drug design, Graph Pooling, Graph Representation Learning},url={https://amsdottorato.unibo.it/id/eprint/11943/},doi={10.48676/unibo/amsdottorato/11943}}