Clustering Among BAsal and Luminal androgen receptor

Breast cancer is a heterogeneous disease, and unsupervised clustering approaches using gene expression data have identified three to six distinct subtypes of triple-negative breast cancer (TNBC). A genomically and clinically distinct subtype of TNBC is referred to as LAR (luminal androgen receptor). Tumors of this subtype typically express high levels of the androgen receptor (AR) and carry alterations in genes of the PI3K pathway (e.g., PIK3CA mutations). Prospective studies are underway using drugs that target the AR alone or in combination with PI3K and CDK4/6 inhibitors. Given the importance of accurately identifying this subtype, we sought to develop an online tool that uses submitted gene expression data to characterize LAR samples confidently, corroborating each classification against previously published clustering approaches.
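
The paragraph does not say how the tool corroborates a call. One common approach, assumed here purely for illustration, is nearest-centroid classification: correlate the sample's expression profile with each subtype centroid of a published scheme, take the best match, and count how many schemes agree on a LAR-like call. The function names, the scheme layout and the toy centroids below are hypothetical.

```python
from math import sqrt
from typing import Dict, List, Tuple

Centroids = Dict[str, List[float]]  # subtype label -> centroid expression vector

def pearson(x: List[float], y: List[float]) -> float:
    """Pearson correlation of two equal-length expression vectors."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sx = sqrt(sum((a - mx) ** 2 for a in x))
    sy = sqrt(sum((b - my) ** 2 for b in y))
    return cov / (sx * sy)

def nearest_centroid(sample: List[float], centroids: Centroids) -> str:
    """Assign the subtype whose centroid correlates best with the sample."""
    return max(centroids, key=lambda label: pearson(sample, centroids[label]))

def corroborate(sample: List[float],
                schemes: Dict[str, Tuple[Centroids, str]]) -> Tuple[Dict[str, str], int]:
    """schemes: scheme name -> (centroids, label that scheme uses for LAR-like tumors).
    Returns each scheme's call and the number of schemes calling the sample LAR-like."""
    calls = {name: nearest_centroid(sample, cents) for name, (cents, _) in schemes.items()}
    votes = sum(calls[name] == lar_label for name, (_, lar_label) in schemes.items())
    return calls, votes

# Toy example over four genes (AR, FOXA1, KRT5, KRT14); all values are made up.
sample = [2.8, 2.0, -0.5, -1.2]
schemes = {
    "scheme_A": ({"LAR": [3.0, 2.5, -1.0, -1.5], "BL1": [-1.0, -1.0, 2.0, 2.5]}, "LAR"),
    "scheme_B": ({"LumAR": [2.0, 2.0, -2.0, -2.0], "Basal": [-2.0, -2.0, 2.0, 2.0]}, "LumAR"),
}
calls, votes = corroborate(sample, schemes)
print(calls, f"{votes}/{len(schemes)} schemes agree")
```

Because each scheme names its subtypes differently, the sketch stores the LAR-equivalent label alongside each scheme's centroids rather than assuming a shared vocabulary.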

Precision Cancer Genomic Report: Single Sample Inventory

With the advent of high-throughput technologies, the quantity of ‘omics’ data has rapidly increased, creating the need for methodologies that can analyze complex datasets and provide interpretations that assist in decision making. We have developed PANOPLY, a novel computational approach that integrates germline and somatic data obtained from multi-omics platforms for an individual of interest and analyzes those data in the context of matched control samples. This approach summarizes knowledge into informative and predictive models that narrow the appropriate treatment options and aid clinicians and researchers in decision making. The objective of PANOPLY is to process a variety of ‘omics’ data along with molecular, pathological and clinical data from cancer patients, with the ultimate goal of “individualizing therapy.” Specifically, PANOPLY takes pre-processed data, such as germline DNA sequence, somatic single nucleotide variants (SNVs), small insertions and deletions (INDELs), copy number variants (CNVs) and fusion transcripts, along with RNA-seq gene expression data, to build an integrated network that ranks cancer gene sets for a given patient. From these data, the PANOPLY system identifies the most promising drug targets.
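
PANOPLY's network construction and ranking are not detailed here. The sketch below only illustrates the general idea of integrating several alteration types per gene, smoothing the scores over an interaction network, and ranking gene sets. The evidence weights, the single propagation step controlled by `alpha`, and all function names are hypothetical assumptions, not PANOPLY's method.

```python
from typing import Dict, List, Set, Tuple

# Hypothetical evidence weights per alteration type (not PANOPLY's values).
WEIGHTS = {"germline": 1.0, "snv": 2.0, "indel": 2.0, "cnv": 1.5,
           "fusion": 2.5, "expression": 1.0}

def gene_scores(alterations: Dict[str, Set[str]]) -> Dict[str, float]:
    """Sum the weights of the alteration types observed in each gene."""
    return {g: sum(WEIGHTS[t] for t in types) for g, types in alterations.items()}

def propagate(scores: Dict[str, float], network: Dict[str, List[str]],
              alpha: float = 0.5) -> Dict[str, float]:
    """One smoothing step: each gene keeps its own score and gains
    alpha times the mean score of its interaction partners."""
    out = {}
    for g in set(scores) | set(network):
        partners = network.get(g, [])
        mean_partner = (sum(scores.get(p, 0.0) for p in partners) / len(partners)
                        if partners else 0.0)
        out[g] = scores.get(g, 0.0) + alpha * mean_partner
    return out

def rank_gene_sets(gene_sets: Dict[str, List[str]],
                   scores: Dict[str, float]) -> List[Tuple[str, float]]:
    """Rank gene sets by the mean (propagated) score of their member genes."""
    means = {name: sum(scores.get(g, 0.0) for g in genes) / len(genes)
             for name, genes in gene_sets.items()}
    return sorted(means.items(), key=lambda kv: kv[1], reverse=True)

# Toy patient; the alterations and network edges are made up.
alterations = {"PIK3CA": {"snv"}, "PTEN": {"cnv"}, "TP53": {"snv", "indel"}}
network = {"PIK3CA": ["PTEN", "AKT1"], "PTEN": ["PIK3CA"], "AKT1": ["PIK3CA"]}
gene_sets = {"PI3K_signaling": ["PIK3CA", "PTEN", "AKT1"], "p53_pathway": ["TP53", "MDM2"]}
scores = propagate(gene_scores(alterations), network)
for name, score in rank_gene_sets(gene_sets, scores):
    print(f"{name}\t{score:.3f}")
```

The propagation step lets an unaltered gene such as AKT1 in the toy data inherit evidence from an altered neighbor, which is the kind of context a network view adds over per-gene counts.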

A comprehensive bioinformatics workflow for detecting circular RNAs

Circular RNAs (circRNAs) are recently discovered members of the noncoding RNA family that range in length from a few hundred to thousands of nucleotides. In contrast to linear RNA transcripts, which retain free 5´ and 3´ ends after splicing, circRNAs are formed by back-splicing, which covalently joins their 3´ and 5´ ends (head-to-tail). Because they lack free 5´ and 3´ ends, circRNAs are resistant to exonuclease degradation and are therefore stable in cells. Additionally, studies have shown that circRNAs can sequester miRNAs away from their messenger RNA targets through shared miRNA binding sites (MREs, miRNA response elements). Depending on the number of MREs available, circRNAs can compete with messenger RNAs for a common pool of miRNAs, thereby regulating gene expression. Such networks of interactions between coding and noncoding RNAs within the cell are termed competing endogenous RNA (ceRNA) networks. Their stability and their ability to act as ceRNAs make circRNAs promising candidates for exploring novel diagnostic and therapeutic targets in disease.
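
The sponge mechanism can be made concrete with a simple seed-match scan. Assuming the canonical 7mer-m8 rule (a site complementary to miRNA nucleotides 2–8), the sketch counts candidate MREs in a circRNA and, because the molecule is circular, also checks sites that span the back-splice junction. It ignores other site types (8mer, 6mer, non-canonical pairing), and the function names are illustrative.

```python
# Complement map for RNA bases.
RNA_COMPLEMENT = str.maketrans("ACGU", "UGCA")

def seed_site(mirna: str) -> str:
    """7mer-m8 site: reverse complement of miRNA nucleotides 2-8."""
    seed = mirna.upper().replace("T", "U")[1:8]
    return seed.translate(RNA_COMPLEMENT)[::-1]

def count_mres(circ_seq: str, mirna: str) -> int:
    """Count seed-matched sites in a circular RNA, including any site that
    spans the back-splice junction (where the 3' end meets the 5' end)."""
    seq = circ_seq.upper().replace("T", "U")
    site = seed_site(mirna)
    k = len(site)
    wrapped = seq + seq[:k - 1]  # append the first k-1 bases to close the circle
    return sum(1 for i in range(len(seq)) if wrapped[i:i + k] == site)

let7a = "UGAGGUAGUAGGUUGUAUAGUU"   # hsa-let-7a-5p
toy_circ = "UCGGGCUACCUCGGGCUACC"   # made-up sequence
print(seed_site(let7a), count_mres(toy_circ, let7a))
```

In the toy sequence, one site lies inside the sequence and a second spans the junction (the trailing CUACC followed by the leading UC), so the scan reports two sites where a linear scan would report one.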

Reliable Identification of Variants Using RNA-seq Data

eSNV-Detect is a method for detecting expressed single nucleotide variants (eSNVs) with high specificity and sensitivity from high-throughput transcriptome sequencing data. Alignments from multiple aligners are used to offset aligner-specific bias, and multiple genomic features are used to improve specificity. For each detected eSNV, it can also identify the resulting amino acid change and annotate the affected protein domain.
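
eSNV-Detect's exact features and thresholds are not described here. As a hedged illustration of combining multiple aligners, the sketch keeps a variant only when at least `min_aligners` aligners support it with sufficient read depth and variant allele fraction (VAF). The thresholds, data layout and names are hypothetical placeholders.

```python
from typing import Dict, Set, Tuple

Variant = Tuple[str, int, str, str]  # (chrom, pos, ref, alt)
Support = Tuple[int, int]            # (alt-supporting reads, total depth)

def consensus_esnvs(calls_by_aligner: Dict[str, Dict[Variant, Support]],
                    min_aligners: int = 2, min_depth: int = 10,
                    min_vaf: float = 0.1) -> Set[Variant]:
    """Keep variants that pass the depth and VAF filters
    in at least `min_aligners` different aligners."""
    passing: Dict[Variant, Set[str]] = {}
    for aligner, calls in calls_by_aligner.items():
        for variant, (alt_reads, depth) in calls.items():
            if depth > 0 and depth >= min_depth and alt_reads / depth >= min_vaf:
                passing.setdefault(variant, set()).add(aligner)
    return {v for v, aligners in passing.items() if len(aligners) >= min_aligners}

# Toy calls from two hypothetical aligners; coordinates are made up.
calls = {
    "aligner_a": {("chr3", 1000, "A", "G"): (12, 40), ("chr7", 500, "C", "T"): (3, 50)},
    "aligner_b": {("chr3", 1000, "A", "G"): (10, 35), ("chr7", 500, "C", "T"): (2, 8)},
}
print(consensus_esnvs(calls))
```

The chr7 call fails the VAF filter under `aligner_a` (3/50 = 0.06) and the depth filter under `aligner_b` (8 reads), so only the chr3 variant is retained.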

A program for detecting viral insertion sequences in the genome of human cancers

An efficient and sensitive program for detecting insertions of sequences from known viral reference genomes into the genomes of human cancers.
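
The description does not state how insertions are detected. A common signal, assumed here, is discordant read pairs in which one mate aligns to the human genome and the other to a viral reference; clustering such pairs by the human mate's position nominates candidate integration sites. The data layout, the fixed-width binning and the thresholds are illustrative assumptions, not this program's implementation.

```python
from collections import defaultdict
from typing import Dict, Iterable, Optional, Set, Tuple

Mate = Tuple[Optional[str], int]  # (contig, position); contig is None if unmapped

def candidate_insertions(pairs: Iterable[Tuple[str, Mate, Mate]],
                         viral_contigs: Set[str], bin_size: int = 500,
                         min_support: int = 3) -> Dict[Tuple[str, int, str], int]:
    """Group human-viral chimeric read pairs by the human mate's binned
    position and report bins supported by at least `min_support` pairs."""
    bins: Dict[Tuple[str, int, str], Set[str]] = defaultdict(set)
    for read_id, mate1, mate2 in pairs:
        if mate1[0] is None or mate2[0] is None:
            continue  # an unmapped mate carries no junction evidence
        viral1, viral2 = mate1[0] in viral_contigs, mate2[0] in viral_contigs
        if viral1 == viral2:
            continue  # both human or both viral: not chimeric
        human_contig, human_pos = mate2 if viral1 else mate1
        virus = mate1[0] if viral1 else mate2[0]
        bins[(human_contig, human_pos // bin_size, virus)].add(read_id)
    return {(contig, b * bin_size, virus): len(ids)
            for (contig, b, virus), ids in bins.items() if len(ids) >= min_support}

# Toy read pairs; contig names and positions are made up.
pairs = [
    ("r1", ("chr8", 128_240_100), ("HPV16", 3000)),
    ("r2", ("HPV16", 3050), ("chr8", 128_240_300)),
    ("r3", ("chr8", 128_240_450), ("HPV16", 2900)),
    ("r4", ("chr1", 1000), ("chr1", 1200)),
    ("r5", ("chr8", 128_240_200), (None, 0)),
]
print(candidate_insertions(pairs, viral_contigs={"HPV16"}))
```

Fixed-width bins are the simplest grouping but can split one insertion across a bin boundary; a real tool would more likely cluster by distance and refine breakpoints with split reads.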

A comprehensive system for RNA-Sequencing data analysis

RNA-Sequencing (RNA-Seq) technology is information-rich; the breadth of information gained spans from large structural changes to single nucleotide variants (SNVs). By efficiently analyzing RNA-Seq data, we can query and obtain a variety of genomic features, such as gene expression, novel and fusion transcripts, alternative splice sites, long non-coding and circular RNAs, SNVs, etc. Most RNA-Seq bioinformatics tools output one or two genomic features for downstream analysis, but so far there have been no comprehensive workflows that can be used to obtain a number of features from RNA-Seq data. To address this shortfall, Mayo Clinic has developed MAP-RSeq – a computational workflow that leverages data from an RNA-Seq experiment to provide comprehensive reports on genomic features for secondary data analysis.
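
Gene expression is one of the features listed above, but the normalization MAP-RSeq reports is not specified in this paragraph. As an illustration, the sketch computes RPKM (reads per kilobase of exon per million mapped reads), RPKM = count × 10^9 / (exon length in bp × total mapped reads). The function name is hypothetical, and when `total_mapped` is omitted it falls back to the sum of the supplied gene counts.

```python
from typing import Dict, Optional

def rpkm(counts: Dict[str, int], exon_length_bp: Dict[str, int],
         total_mapped: Optional[int] = None) -> Dict[str, float]:
    """Reads per kilobase of exon per million mapped reads:
    count * 1e9 / (exon length in bp * total mapped reads)."""
    if total_mapped is None:
        total_mapped = sum(counts.values())  # fallback: reads assigned to the listed genes
    return {g: c * 1e9 / (exon_length_bp[g] * total_mapped) for g, c in counts.items()}

# Toy counts; gene names and numbers are made up.
print(rpkm({"GENE_A": 500, "GENE_B": 1500}, {"GENE_A": 1000, "GENE_B": 3000}))
```

Passing the library's true total of mapped reads as `total_mapped` is generally preferable to the fallback, since reads outside the listed genes otherwise drop out of the denominator.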