This Kaggle competition represented a significant opportunity to impact the future of drug development. The task was to predict how 144 small-molecule drugs perturb gene expression in new cell types. The dataset consisted of gene expression data from 6 different cell types. Two of them, B cells and myeloid cells, were severely underrepresented and made up 5% of the training dataset. Ultimately, the goal was to build a machine learning model that generalizes to a test dataset drawn from these two underrepresented cell types, predicting the gene expression profiles induced by small molecules not in the training set.