Arkansas Bioinformatics Consortium

Advancing Regulatory Science through Bioinformatics
Huixiao Hong, Roger Perkins and Weida Tong
Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, US Food and Drug Administration, 3900 NCTR Road, Jefferson, AR 72079, US

In 2010, the US FDA launched its Advancing Regulatory Science (ARS) initiative, aimed at developing new tools, standards, and approaches for assessing the safety, efficacy, quality, and performance of FDA-regulated products. The initiative identifies eight scientific areas that affect multiple regulated product domains or human populations, and bioinformatics plays a paramount role in them. The Division of Bioinformatics and Biostatistics (DBB) at FDA’s National Center for Toxicological Research (NCTR) engages in bioinformatics applicable to such areas as biomarker development and validation, drug safety and repurposing, and personalized medicine. This poster will highlight selected bioinformatics research as well as selected databases and software tools that have been developed, both in past years and more recently, in support of FDA regulatory science. For the past eight years, the DBB has led a large international consortium that has assessed the reliability of clinical and toxicological biomarkers derived from emerging microarray and next-generation sequencing technologies. Knowledge bases have been developed that aggregate diverse data associated with a disease, toxicity or phenotype, providing a means for mechanistic studies and the development of predictive models. The Liver Toxicity Knowledge Base integrates in vitro, in vivo, gene expression and textual data. The Endocrine Disruptor Knowledge Base contains in vitro and in vivo data for thousands of chemicals, used to build models that predict endocrine activity mediated by estrogen and androgen hormone receptors based solely on chemical structure. The Food-Borne Pathogen Genomics Knowledge Base provides tools to detect and characterize microbial isolates from gene expression data during pathogen outbreaks. ArrayTrack is a genomics tool widely used within FDA, as well as by the public, private and academic research community worldwide. ArrayTrack provides an integrated means to manage, analyze and interpret omics data. It contains many statistical and visualization tools as well as libraries for gene and protein function and biological pathways. FDALabel is a web-based database containing the entire set of 40,000 FDA-approved drug labels. It offers a powerful and flexible search capability and much other functionality valuable to researchers, regulators, drug developers and clinicians. FDALabel will provide an improved bridge for transparent drug safety knowledge exchange between the public and FDA. A common element of the databases and bioinformatics tools cited above is that they either are or will be openly available on the Internet, including an FDA external Cloud when available, thus advancing FDA data liberation. Many of NCTR’s bioinformatics tools can be accessed through the following link: FDA Bioinformatics Tools.
AR-BIC-1

Poster Abstracts AR-BIC First annual conference - March 11-12, 2015

Poster Number	Presenter	Title
AR-BIC-1	Hong, Huixiao	Advancing Regulatory Science through Bioinformatics
AR-BIC-2	Chen, Tao	Discovery of Novel MicroRNAs in Rat Kidney Using Next Generation Sequencing, Microarray and Bioinformatics Technologies
AR-BIC-3	Bisgin, Halil	Exploring the impact of miRNA-seq pipelines on downstream analysis
AR-BIC-4	Ng, Hui Wen	Development of a competitive molecular docking approach for predicting estrogen receptor agonists and antagonists
AR-BIC-5	Luo, Heng	Collection and molecular docking identification of associations between drugs and class I human leukocyte antigens for predicting idiosyncratic drug reactions
AR-BIC-6	Ye, Hao	Deciphering adverse outcome pathways through network analysis of ToxCast data
AR-BIC-7	Chen, Yu-Chuan	Ensemble Survival Trees for Identifying Subpopulations in Personalized Medicine
AR-BIC-8	Chen, Minjun	The development of Liver Toxicity Knowledge Base (LTKB) for research and review of drug-induced liver injury
AR-BIC-9
AR-BIC-10
Abstracts

Discovery of Novel MicroRNAs in Rat Kidney Using Next Generation Sequencing, Microarray and Bioinformatics Technologies
Tao Chen1, Fanxue Meng1, Michael Hackenberg2, Zhiguang Li1, Jian Yan1
1Division of Genetic and Molecular Toxicology, National Center for Toxicological Research, Food and Drug Administration, Jefferson; 2Dpto. de Genetica, Facultad de Ciencias, Universidad de Granada, Granada, Spain
MicroRNAs (miRNAs) are small non-coding RNAs that regulate a variety of biological processes. Release 18 of the miRBase database includes 1,157 mouse and 680 rat mature miRNAs. Only one new rat mature miRNA was added to the rat miRNA database from version 16 to version 18 of miRBase, suggesting that many rat miRNAs remain to be discovered. Given the importance of the rat as a model organism, discovery of the complete set of rat miRNAs is necessary for understanding rat miRNA regulation. In this study, next generation sequencing (NGS), microarray analysis and bioinformatics technologies were applied to discover novel miRNAs in rat kidneys. MiRanalyzer was used to analyze the sequences of the small RNAs generated from NGS analysis of rat kidney samples. Hundreds of novel miRNA candidates were examined according to the mapping of their reads to the rat genome, the presence of sequences that can form a miRNA hairpin structure around the mapped locations, Dicer cleavage patterns, and their expression levels as determined by both NGS and microarray analyses. Nine novel rat hairpin precursor miRNAs (pre-miRNAs) were discovered with high confidence. Five of the novel pre-miRNAs have also been reported in other species, while four are rat specific. In summary, 9 novel pre-miRNAs and 14 novel mature miRNAs were identified through a combination of NGS, microarray and bioinformatics high-throughput technologies.
AR-BIC-2

Exploring the impact of miRNA-seq pipelines on downstream analysis
Halil Bisgin, Binsheng Gong, Yuping Wang, Weida Tong
Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration
Background: The development of next-generation sequencing (NGS) techniques opened a new era in genomic research and led to several RNA-Seq studies. Despite the excitement, concerns have arisen about profiling tools and the definition of standards. In recent years, the FDA SEQC consortium took the initiative to address technical and statistical challenges in RNA-Seq. However, similar issues have not been extensively studied for miRNA-Seq in the research community.
Method: We investigated the effect of the parameter space on downstream analysis by exploring four miRNA-Seq profiling tools (miRDeep2, miRExpress, miRNAkey, sRNAbench). Using miRNA-Seq data generated from rat liver samples treated with thioacetamide at four time points and three dose levels, we first compared the variance in the number of differentially expressed miRNAs (DEMs) for each tool across its own parameters. miRDeep2 and sRNAbench were then examined further with common parameters (genome mapping, windowing, quantification) to study detection sensitivity and DEM variability along with the choice of normalization.
Results: The analysis showed that, under the same parameters, sRNAbench detected more miRNAs, most of which were also detected by miRDeep2. Under the same normalization method, miRDeep2 yielded more DEMs, which showed a higher overlap ratio with sRNAbench than was seen for detection sensitivity. Windowing introduced more variance in detection, and genome mapping also affected the variability of DEMs. For higher doses and longer durations, miRDeep2 was less sensitive to parameter changes, which resulted in more agreement on DEMs across its own settings. Profiling parameters accounted for no more than 8% of the variance when time, dose, and time-dose interaction were also considered. A change in the normalization step affected the DEMs to an extent close to that of the treatment factors, but the trend across time and dose remained similar.
Conclusion: The results indicate that, for a given normalization method, profiling parameters had limited impact on the downstream analysis. Normalization, on the other hand, considerably changed the number of DEMs, but under each choice of normalization the time-dose pattern still followed similar trends, which is the sign of a treatment effect.
AR-BIC-3

Development of a competitive molecular docking approach for predicting estrogen receptor agonists and antagonists
Hui Wen Ng, Weida Tong and Huixiao Hong
Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, 3900 NCTR Road, Jefferson, AR 72079
Molecular docking is a well-established molecular modeling technique commonly used in ligand screening and drug design. The method attempts to predict the binding mode and molecular interactions between a protein and a ligand, and to rank the predicted poses with scoring functions. Protein-ligand association in vivo is a dynamic process in which binding is accompanied by a conformational change in the complex, a phenomenon commonly referred to as “induced fit”. However, due to high computational costs, fully flexible docking remains impractical, so rigid docking and limited flexible docking have become the most commonly practiced methods. The estrogen receptors (ERs) adopt distinctly different conformations upon binding to agonists and antagonists. Using the agonist and antagonist conformations of ER subtype α, we designed an in silico approach that more closely mimics the biological process, and used it to differentiate the agonist versus antagonist status of potential binders. The approach was first evaluated using true agonists and antagonists extracted from the crystal structures available in the Protein Data Bank (PDB), and then further validated using a larger set of ligands from the literature. Its usefulness was demonstrated with enrichment analysis in data sets containing a large number of decoy ligands. The performance of the individual agonist and antagonist docking conformations was found to be comparable to that of similar models in the literature. When combined in a competitive docking approach, they discriminated agonists from antagonists with good accuracy and efficiently selected true agonists and antagonists from decoys during enrichment analysis. In conclusion, this approach offers potential applications not only in drug discovery projects in the pharmaceutical industry but also in the screening of potential endocrine disrupting compounds (EDCs) by regulatory authorities for risk assessment.
AR-BIC-4
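The competitive step described in this abstract can be illustrated with a minimal sketch. It assumes that each ligand has already been docked against both receptor conformations, that lower (more negative) scores indicate stronger predicted binding, and that a score cutoff separates binders from non-binders. The class name, function name, cutoff value and scores are hypothetical and are not taken from the poster.

```python
from dataclasses import dataclass


@dataclass
class DockingResult:
    ligand: str
    agonist_score: float      # best score against the agonist conformation
    antagonist_score: float   # best score against the antagonist conformation


def classify(result: DockingResult, binder_cutoff: float = -6.0) -> str:
    """Label a ligand by the conformation it docks into more favorably.

    Assumption: lower scores mean stronger predicted binding, and a ligand
    whose better score is still above `binder_cutoff` is treated as a decoy.
    """
    best = min(result.agonist_score, result.antagonist_score)
    if best > binder_cutoff:
        return "non-binder"
    if result.agonist_score <= result.antagonist_score:
        return "agonist"
    return "antagonist"


# Illustrative values only, not measured docking scores.
for r in [
    DockingResult("ligand_1", -9.2, -7.1),
    DockingResult("ligand_2", -6.8, -10.4),
    DockingResult("ligand_3", -4.0, -3.5),
]:
    print(r.ligand, classify(r))
```

Letting the two conformations compete for each ligand is what distinguishes this from running either docking model alone: the label comes from the relative score, not from a per-model threshold.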

Collection and molecular docking identification of associations between drugs and class I human leukocyte antigens for predicting idiosyncratic drug reactions
Heng Luo1,2, Huixiao Hong1
1National Center for Toxicological Research, US Food and Drug Administration; 2University of Arkansas at Little Rock/University of Arkansas for Medical Sciences joint Bioinformatics program
Correspondence to: Huixiao.Hong@fda.hhs.gov
Idiosyncratic drug reactions (IDRs) are rare, somewhat dose-independent, patient-specific and hard to predict. Human leukocyte antigens (HLAs) constitute the major histocompatibility complex (MHC) in humans, are highly polymorphic and are associated with specific IDRs. It is therefore important to identify potential drug-HLA associations so that individuals who would develop IDRs can be identified before drug exposure. We harvested the associations between drugs and HLAs from the literature and built a database named HLADR. Molecular docking was used to explore the known associations. Analysis of the docking scores between the 17 drugs and 74 class I HLAs showed that the significantly associated drug-HLA pairs had statistically lower docking scores than those not reported to be significantly associated (t-test, p < 0.05). This indicates that molecular docking can be used for screening drug-HLA interactions and predicting potential IDRs, and may improve drug safety and the implementation of personalized medicine. Examination of the binding modes of drugs in the docked HLAs suggested several distinct binding sites inside class I HLAs, expanding our knowledge of the underlying interaction mechanisms between drugs and HLAs.
AR-BIC-5
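The comparison of docking scores between associated and non-associated drug-HLA pairs can be sketched as a two-sample t-test. The abstract does not say which variant of the test was used; the sketch below assumes Welch's unequal-variance form and computes only the statistic and the degrees of freedom, since the Python standard library has no t-distribution for the p-value. The function name and the input values are illustrative, not data from the poster.

```python
from math import sqrt
from statistics import mean, variance


def welch_t(sample_a, sample_b):
    """Return (t statistic, Welch-Satterthwaite degrees of freedom).

    Assumption: Welch's unequal-variance t-test; the poster only says "t-test".
    """
    n_a, n_b = len(sample_a), len(sample_b)
    va = variance(sample_a) / n_a   # squared standard error of mean A
    vb = variance(sample_b) / n_b   # squared standard error of mean B
    t = (mean(sample_a) - mean(sample_b)) / sqrt(va + vb)
    df = (va + vb) ** 2 / (va ** 2 / (n_a - 1) + vb ** 2 / (n_b - 1))
    return t, df


# Made-up scores for illustration; a negative t means group A scores are lower.
associated = [-8.4, -7.9, -8.8, -7.5]
not_associated = [-6.9, -7.2, -6.1, -6.6, -7.0]
t, df = welch_t(associated, not_associated)
print(f"t = {t:.3f}, df = {df:.2f}")
```

The p-value would then be read from a t-distribution with `df` degrees of freedom, for example with a statistics package.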

Deciphering adverse outcome pathways through network analysis of ToxCast data
Hao Ye1, Heng Luo2, Hui Wen Ng1, Weigong Ge1, Weida Tong1, Huixiao Hong1*
1Division of Bioinformatics and Biostatistics, National Center for Toxicological Research, U.S. Food and Drug Administration, 3900 NCTR Road, Jefferson, AR 72079; 2University of Arkansas at Little Rock/University of Arkansas for Medical Sciences Bioinformatics Graduate Program, Little Rock, AR 72204
*Correspondence should be addressed to Dr. Huixiao Hong at huixiao.hong@fda.hhs.gov
ToxCast data have been shown to characterize the toxicological profiles of environmental chemicals efficiently. An adverse outcome pathway (AOP) is a group of molecular events, related at higher levels of biological organization (e.g. cell or tissue), that ultimately lead to an adverse outcome. Network analysis has frequently been used to investigate the group properties of networks such as social networks, electronic commerce networks, and biological networks. We first constructed a network in which the assays and chemicals assayed in ToxCast were treated as nodes and the positive assay results were used to connect the nodes. We then applied network analysis to aid the understanding of ToxCast data and to identify potential AOPs. We also demonstrated that the activity of untested chemicals in the ToxCast assays could be predicted using the network analysis. We found that the compound-assay network could be decomposed into seven densely connected modules based on its topological properties. Moreover, each of the seven modules was associated with different AOPs. For example, most ER-, AR-, and GR-related assays were significantly enriched in module one. We will present our results and discuss the implications, limitations and perspectives of network analysis on ToxCast data.
AR-BIC-6
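The network construction described in this abstract, with chemicals and assays as nodes and positive assay results as edges, can be sketched as a bipartite graph. The abstract does not specify how activity was predicted for untested chemicals; the shared-neighbor path count below is an assumed stand-in for illustration, not the authors' method, and the chemical and assay names are invented.

```python
from collections import defaultdict


def build_network(positive_results):
    """Build an undirected chemical-assay graph as a dict of adjacency sets.

    Nodes are ("chem", name) or ("assay", name); each edge is one positive result.
    """
    graph = defaultdict(set)
    for chemical, assay in positive_results:
        c, a = ("chem", chemical), ("assay", assay)
        graph[c].add(a)
        graph[a].add(c)
    return graph


def link_score(graph, chemical, assay):
    """Count paths chemical -> other assay -> other chemical -> assay.

    Assumption: a higher count suggests the untested pair is more likely active.
    """
    c, a = ("chem", chemical), ("assay", assay)
    score = 0
    for other_assay in graph.get(c, ()):
        for other_chem in graph.get(other_assay, ()):
            if other_chem != c and a in graph.get(other_chem, ()):
                score += 1
    return score


# Toy positive results; names are hypothetical.
positives = [
    ("chemA", "ER_binding"), ("chemA", "ER_transactivation"),
    ("chemB", "ER_binding"), ("chemB", "ER_transactivation"),
    ("chemC", "ER_binding"),
    ("chemD", "GR_binding"),
]
network = build_network(positives)
print(link_score(network, "chemC", "ER_transactivation"))  # 2
print(link_score(network, "chemC", "GR_binding"))          # 0
```

Decomposing such a graph into densely connected modules would require a community detection algorithm, which is beyond this sketch.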

Ensemble Survival Trees for Identifying Subpopulations in Personalized Medicine
Yu-Chuan Chen and James J. Chen
Recently, personalized medicine has received great attention as a way to improve safety and effectiveness in drug development. Personalized medicine aims to provide medical treatment tailored to the patient’s characteristics, such as genomic biomarkers and disease history, so that the benefit of treatment can be optimized. Subpopulation identification divides patients into several subgroups, each of which corresponds to an optimal treatment. For two subgroups with a survival time endpoint, a multivariate Cox proportional hazards model is traditionally fitted and used to calculate a risk score, and the median is commonly chosen as the cutoff value to separate patients. Here we propose a novel tree-based method that adopts the relative risk tree algorithm to identify patient subgroups. After growing a relative risk tree, we apply k-means clustering to group the terminal nodes based on their averaged covariates. Because the performance of a single tree is well known to be quite unstable, we adopt an ensemble bagging method to improve it. A simulation study is conducted to compare the performance of the proposed method with that of the multivariate Cox model. Applications of the proposed method to three public cancer data sets are also presented for illustration.
AR-BIC-7
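The traditional two-subgroup baseline mentioned in this abstract, a Cox-model risk score split at its median, can be sketched as follows. The sketch assumes the Cox coefficients have already been estimated elsewhere and shows only the scoring and splitting step; the function names, coefficients and covariate values are hypothetical. It does not implement the proposed relative risk tree, k-means or bagging steps.

```python
from statistics import median


def risk_scores(covariates, coefficients):
    """Linear predictor of a Cox model: sum of coefficient * covariate per patient.

    Assumption: `coefficients` come from a previously fitted Cox model.
    """
    return [sum(b * x for b, x in zip(coefficients, row)) for row in covariates]


def median_split(scores):
    """Assign "high" to scores above the median and "low" to the rest."""
    cutoff = median(scores)
    return ["high" if s > cutoff else "low" for s in scores]


# Illustrative covariates (one row per patient) and coefficients.
patients = [[1, 0], [0, 1], [1, 1], [0, 0]]
betas = [0.5, 1.0]
scores = risk_scores(patients, betas)
print(scores)                # [0.5, 1.0, 1.5, 0.0]
print(median_split(scores))  # ['low', 'high', 'high', 'low']
```

A fixed median cutoff always yields two groups of roughly equal size, which is one motivation for data-driven alternatives such as the tree-based grouping described in the abstract.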
