
Transcriptomics CHO Alignment Pipeline

Summary: Using FastQC, HISAT2, featureCounts, samtools, and Trimmomatic, an alignment pipeline was built to align RNAseq data to the Chinese hamster ovary (CHO) genome. The annotation files were taken from NCBI. The workflow up to the featureCounts step shows an older annotation file. Note that the workflow is identical for the updated annotation file, but the updated …
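As a rough illustration of how the tools named in this summary are commonly chained, the sketch below only builds the command lines and does not run anything. It is not the post's actual workflow: the step order, single-end reads, the `trimmomatic` wrapper name (as installed by conda), the trimming thresholds, and every file name are assumptions made for the example.

```python
import shlex
from pathlib import Path


def build_pipeline_commands(sample, fastq, hisat2_index, annotation_gtf,
                            adapters_fasta, out_dir="results"):
    """Return one argument list per step, in a conventional RNAseq order.

    All paths and settings are hypothetical placeholders; single-end reads
    are assumed because the summary does not say otherwise.
    """
    out = Path(out_dir)
    trimmed = out / f"{sample}.trimmed.fastq.gz"
    sam = out / f"{sample}.sam"
    bam = out / f"{sample}.sorted.bam"
    counts = out / f"{sample}.counts.txt"
    return [
        # Quality report on the raw reads (the output directory must exist).
        ["fastqc", str(fastq), "-o", str(out)],
        # Adapter and quality trimming; thresholds are illustrative only.
        ["trimmomatic", "SE", "-phred33", str(fastq), str(trimmed),
         f"ILLUMINACLIP:{adapters_fasta}:2:30:10",
         "SLIDINGWINDOW:4:20", "MINLEN:36"],
        # Spliced alignment against a prebuilt HISAT2 index of the genome.
        ["hisat2", "-x", str(hisat2_index), "-U", str(trimmed), "-S", str(sam)],
        # Coordinate-sort and index the alignments.
        ["samtools", "sort", "-o", str(bam), str(sam)],
        ["samtools", "index", str(bam)],
        # Count reads per gene using the annotation file.
        ["featureCounts", "-a", str(annotation_gtf), "-o", str(counts), str(bam)],
    ]


if __name__ == "__main__":
    commands = build_pipeline_commands(
        sample="sample1",
        fastq="sample1.fastq.gz",
        hisat2_index="cho_index",
        annotation_gtf="cho_annotation.gtf",
        adapters_fasta="adapters.fa",
    )
    for command in commands:
        print(shlex.join(command))
```

Swapping in an updated annotation file would, under these assumptions, only change the `annotation_gtf` argument passed to the featureCounts step.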

Kaggle Leash Bio Therapeutics Competition

Source Code: GitHub. Competition Overview: In this competition, you’ll develop machine learning (ML) models to predict the binding affinity of small molecules to specific protein targets – a critical step in drug development for the pharmaceutical industry that would pave the way for more accurate drug discovery. You’ll help predict which drug-like small molecules (chemicals) …

Brain Region Enrichment Analysis

An exploration of different biological enrichment algorithms and machine learning algorithms applied to an RNA expression dataset. Source Code: GitHub. Motivation: This project is an exploration of RNAseq data from Kaggle. I initially downloaded this dataset because I wanted to learn how to analyze high-dimensional biological data. …

Dash and SQLAlchemy Dashboard

Source Code: GitHub. Summary: A stock-forecasting Dash dashboard with a MySQL database backend; the database was set up in AWS RDS as well as on a local MySQL server. The final project uses the local database simply to avoid unnecessary costs. Below is a high-level view of the program layout. App.py has the main plotly …

CAFA 5 Protein Function Prediction (Kaggle Competition)

Competition Description: The goal of this competition is to predict the function of a set of proteins. You will develop a model trained on the amino-acid sequences of the proteins and on other data. Your work will help researchers better understand the function of proteins, which is important for discovering how cells, tissues, and organs …