Predictive computational modeling underlies all of the work in the Jain Lab, primarily in the form of algorithmic approaches to drug discovery. The primary areas of research in the lab are: 1) methods for docking small molecules to proteins using empirically derived scoring functions, 2) methods for inducing the shape of a protein binding pocket given the structures and affinities of ligands that bind the pocket competitively, 3) generalized surface-based approaches to computing molecular similarity, both among small molecules and among proteins, 4) approaches for modeling and predicting polypharmacology based on molecular structure, and 5) applications of such methods to cancer drug discovery. All of the approaches are rooted in sophisticated computational algorithms involving object representation, function optimization, and search.
Following a long period of applied research in defense applications and in speech understanding, Prof. Jain began a research career focused exclusively on issues in computational chemistry and computational biology. His foundational work in computer-aided drug design was done in industry, beginning with the Compass and Hammerhead techniques (see papers from 1994-1997). Compass involved a new representational scheme for capturing the 3D surface properties of small molecules that made it possible to systematically address a previously unaddressed aspect of modeling the activity of small molecules: the choice of the relative alignment and conformation (or pose) of competitive ligands, including the detailed relationship of their hydrophobic shapes. A key insight, made with colleagues, was that the choice of pose should be directly governed by the function being used to predict binding affinity (a direct analogy to physics, where the lowest-energy state is sought). The difficulty was that the function to predict activity was being induced at the same time as the pose choice. The Compass method overcame this problem, and was one of the foundational methods in establishing the field of multiple-instance learning, as it has come to be known within the Computer Science community. This work led to the development of one of the first described molecular docking programs to address ligand conformational flexibility. The Hammerhead docking system built upon the molecular representations, multiple-instance approach, and search strategy developed for Compass.
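The interplay between pose choice and model induction described above can be illustrated with a small toy example. The sketch below is not the Compass algorithm; it is a minimal alternating scheme under simplifying assumptions: each molecule is a "bag" of candidate poses, each pose is a plain feature vector, and the activity model is linear. The names fit_multiple_instance and predict are hypothetical and chosen only for this illustration.

```python
import numpy as np


def fit_multiple_instance(bags, activities, n_rounds=20, seed=0):
    """Toy multiple-instance regression (illustrative assumption, not Compass).

    bags       -- list of arrays, one per molecule, shape (n_poses, n_features)
    activities -- one measured activity per molecule
    Returns the linear weights (last entry is the bias) and the chosen pose
    index for each molecule.
    """
    rng = np.random.default_rng(seed)
    y = np.asarray(activities, dtype=float)
    # Start from an arbitrary pose for every molecule.
    chosen = [int(rng.integers(len(bag))) for bag in bags]
    weights = None
    for _ in range(n_rounds):
        # Step 1: refit the scoring function on the currently chosen poses.
        X = np.stack([bag[i] for bag, i in zip(bags, chosen)])
        X1 = np.hstack([X, np.ones((len(X), 1))])
        weights, *_ = np.linalg.lstsq(X1, y, rcond=None)
        # Step 2: let the scoring function pick each molecule's pose
        # (the highest-scoring candidate, by analogy with seeking the
        # lowest-energy state).
        new_chosen = [
            int(np.argmax(bag @ weights[:-1] + weights[-1])) for bag in bags
        ]
        if new_chosen == chosen:
            break  # pose choices and model agree with each other
        chosen = new_chosen
    return weights, chosen


def predict(weights, bag):
    """Predicted activity of a molecule: the score of its best pose."""
    return float(np.max(bag @ weights[:-1] + weights[-1]))
```

The design point the sketch tries to show is the circularity: the model is fit on chosen poses, and the poses are chosen by the model, so the two are refined together until they stop changing or a round limit is reached. Convergence to a good solution is not guaranteed in this toy version.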
Advances in Molecular Docking and Ligand-Based Modeling
Subsequent work built on the foundation laid by Compass and Hammerhead, addressing problems in the computation of molecular diversity and the prediction of ADME properties (see papers from 1998-2000). Our most recent work in computational drug design (see the Surflex methodological papers from 2003 onward) is focused on pushing the frontiers of molecular docking and on constructing ligand-based models of protein active sites in cases where protein structure is unknown. The Surflex docking approach is unique, both with respect to scoring function and search methodology. Surflex-Dock is competitive with the best and most widely available methods in terms of docking accuracy and screening utility on publicly available benchmarks. We have recently made a substantial innovation to the multiple-instance parameter estimation process by generalizing our approach to include negative training data: putative inactive molecules have been added to a set of known active molecules when re-estimating the scoring function for the Surflex docking method. We have continued our work in ligand-based modeling as well. The Surflex similarity method has been augmented, both in search strategy and in its objective function, to support the construction of ligand-based models of protein activity. The models are competitive with the best docking methods in terms of effectiveness in identifying novel ligands, generalizing remarkably well even across different chemical scaffolds. The Surflex QMOD approach takes QSAR to a new level by transforming the problem into one of molecular docking: a protein binding site is induced from structure-activity data using the multiple-instance machine learning paradigm developed for Compass.
Rational and Predictive Pharmacology
Research within the lab has branched out to encompass larger biological scales, with studies that contemplate the multiple effects of small molecules in the whole organism. Our earliest work in drug discovery focused exclusively on the therapeutically desired target. At least as important are off-targets: those that are not intended to be modulated by a therapeutic but are affected at relevant drug concentrations. We are interested in building accurate predictive models for promiscuous bad-actor targets such as hERG and cytochrome P450 enzymes. More broadly, we are interested in building models for multitudes of human targets in order to help guide the design and selection of compounds during preclinical research. This is challenging, both because of the stringent accuracy required of the models and because of the need to curate information about the multiple effects of existing therapeutics and of compounds that have undergone clinical testing.
The laboratory is purely a dry one. We rely upon our collaborators to test predictions made by our computational tools. In addition to the hundreds of laboratories that make use of our software, we have active collaborations with both academic and industrial partners. We are particularly interested in applications involving cancer.