We provide training and service opportunities across a variety of bioinformatics techniques. We work with genomes and transcriptomes, including, but not limited to, de novo assemblies, resequencing, differential gene expression analyses, differential peak calling, variant discovery, and annotation. Custom services are available on a case-by-case basis.
We are experienced in de novo assembly and resequencing using long- and/or short-read next-generation sequencing technologies. We move datasets from raw reads through assemblies/alignments, gene and functional annotation, pathway analysis, variant discovery (single nucleotide polymorphisms, insertions/deletions, breakpoints, rearrangements, inversions, and copy-number variation), and selective sweep analysis.
We also have experience in comparative genomics, using techniques such as genome-wide association studies and comparisons based on the variant discovery methods mentioned above.
Finally, we have analyzed the subset of the genome known as the exome, including single samples, family trios, and matched tumor/normal pairs.
Transcriptomics: common analyses
We have completed RNA-seq projects, including small RNA profiling, differential gene expression (DGE) studies using de novo assembled transcriptomes and available reference sequences, and gene co-expression network construction. For de novo assemblies, we perform gene identification and functional annotation. Results from DGE studies include Gene Ontology terms, pathway enrichment, and information from the KEGG pathway database.
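As a rough illustration of what a pathway enrichment result means, the sketch below computes an over-representation p-value with a one-sided hypergeometric test. This is one common approach and is shown only as an assumption; the text above does not state which enrichment method is used. The function name and all gene counts are hypothetical.

```python
from math import comb


def enrichment_p_value(n_universe, n_in_pathway, n_selected, n_overlap):
    """One-sided hypergeometric p-value for pathway over-representation.

    n_universe   -- number of genes tested (background set)
    n_in_pathway -- background genes annotated to the pathway
    n_selected   -- differentially expressed genes
    n_overlap    -- differentially expressed genes in the pathway

    Returns P(X >= n_overlap), where X is the overlap expected by chance.
    """
    largest_overlap = min(n_in_pathway, n_selected)
    total = comb(n_universe, n_selected)
    tail = sum(
        comb(n_in_pathway, k) * comb(n_universe - n_in_pathway, n_selected - k)
        for k in range(n_overlap, largest_overlap + 1)
    )
    return tail / total


# Hypothetical counts: 20,000 background genes, 100 in the pathway,
# 500 differentially expressed, 10 of which fall in the pathway.
p = enrichment_p_value(20000, 100, 500, 10)
print(f"enrichment p-value: {p:.3g}")
```

In practice, a p-value like this is computed for every pathway or ontology term and then corrected for multiple testing.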
Nucleic acid-binding assays
We use next-generation sequencing after chromatin immunoprecipitation (ChIP-seq) to identify protein-DNA binding sites throughout the genome at up to single-base-pair resolution. A typical ChIP-seq analysis includes peak calling, functional analysis of the peak-associated promoter/gene, and identification of enriched motifs within the peaks. A similar analysis can be performed after RNA immunoprecipitation (RIP-seq) to identify the binding sites of RNA-binding proteins within target transcripts.
Machine learning (ML) models
Machine learning is a rapidly advancing domain within artificial intelligence (AI) that enables computers to learn from labeled, ground-truth training data. The trained model can then be used to label new data without manual curation. Given a large training dataset, deep learning, which uses neural networks, can automatically identify the features that are useful for a task. In the biological sciences, machine learning is often used to infer biochemical properties or regulation from sequence data and to classify image data.
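The train-then-label workflow described above can be sketched with a deliberately small example. The code below assumes a nearest-centroid classifier over dinucleotide frequencies, chosen only because it fits in a few lines of standard-library Python; it is not a description of the models used in our projects. The sequences, labels, and function names are all hypothetical.

```python
from collections import Counter
from itertools import product

K = 2  # k-mer length, an arbitrary choice for this illustration
KMERS = ["".join(p) for p in product("ACGT", repeat=K)]


def kmer_features(seq):
    """Return the normalized k-mer frequencies of a DNA sequence."""
    counts = Counter(seq[i:i + K] for i in range(len(seq) - K + 1))
    total = sum(counts[kmer] for kmer in KMERS) or 1
    return [counts[kmer] / total for kmer in KMERS]


def train(labeled_seqs):
    """Learn one mean feature vector (centroid) per label."""
    sums, n_per_label = {}, Counter()
    for seq, label in labeled_seqs:
        acc = sums.setdefault(label, [0.0] * len(KMERS))
        for i, value in enumerate(kmer_features(seq)):
            acc[i] += value
        n_per_label[label] += 1
    return {
        label: [value / n_per_label[label] for value in acc]
        for label, acc in sums.items()
    }


def predict(model, seq):
    """Assign the label whose centroid is closest to the sequence."""
    feats = kmer_features(seq)

    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(feats, centroid))

    return min(model, key=lambda label: squared_distance(model[label]))


# Hypothetical labeled training data (the "ground truth").
TRAINING = [
    ("GCGCGGCCGCGC", "gc_rich"),
    ("CCGGCGCGGCCG", "gc_rich"),
    ("ATATTAATATTA", "at_rich"),
    ("TTAATATAATAT", "at_rich"),
]

model = train(TRAINING)
print(predict(model, "GGCCGCGCGG"))  # new, unlabeled sequence
```

Here the features are chosen by hand. A deep learning model would instead learn its own features from the raw sequences, which is why it typically needs a much larger training set.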
CUGBF currently owns two high-memory nodes and a 150-terabyte (TB) file system on the Palmetto cluster, operated by Clemson University's Cyberinfrastructure Technology Integration (CITI) group. Our nodes are equipped with either 40 or 80 cores and 1.5 TB of random-access memory (RAM).
Our facility maintains hundreds of software programs for most types of bioinformatics analyses. A list grouped by categories can be found here, and a full list (with version numbers) is available here.
The software we maintain is accessible through modules on the Palmetto cluster. To use these programs, you must have a Palmetto account. Accounts are free to all Clemson faculty, staff, and students; if you need to apply for one, please follow this link.
CITI also provides introductory and advanced training throughout the year; its workshops are available here.
The Palmetto cluster's online user guide is a great resource for additional information.
Once you have a Palmetto account, please sign our acknowledgement agreement to be given access to our software modules.
“We had an excellent experience with the facility and service. We will come back to you in the future.”
-Dr. Venugopal Mendu
Department of Plant and Soil Science
Texas Tech University