{"id":667,"date":"2024-04-18T23:34:25","date_gmt":"2024-04-18T23:34:25","guid":{"rendered":"https:\/\/alex-jimenez.com\/?post_type=rara-portfolio&#038;p=667"},"modified":"2024-11-11T14:52:17","modified_gmt":"2024-11-11T14:52:17","slug":"gradient-boosting-diabetes-data","status":"publish","type":"rara-portfolio","link":"https:\/\/alex-jimenez.com\/?rara-portfolio=gradient-boosting-diabetes-data","title":{"rendered":"Diabetes Readmission Classifier"},"content":{"rendered":"\n<h3 class=\"wp-block-heading\">Source Code: <a href=\"https:\/\/github.com\/alexjimenez99\/education-workflows\" data-type=\"link\" data-id=\"https:\/\/github.com\/alexjimenez99\/education-workflows\">GitHub<\/a><\/h3>\n\n\n\n<h3 class=\"wp-block-heading\">Summary (Not Completed!)<\/h3>\n\n\n\n<p>Exploration of a dataset related to readmission of diabetes patients based on attributes the hospitals have collected. The goal of this workflow was to see if there is a way to successfully create a classification model that could differentiate between no readmission, readmission within 30 days, and readmission after 30 days. Hospitals have a reason to care about this because readmission can place a financial burden on patients. With this information, perhaps hospitals could decide to prescribe more effective treatments. How this information is used is up to the end user, but readmission rates could inform several ways of running a hospital more effectively.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Data Background<\/h3>\n\n\n\n<p>The data for this project comes from 130 US hospitals and spans 1999-2008. 
The full background can be found <a href=\"https:\/\/www.kaggle.com\/datasets\/brandao\/diabetes\" data-type=\"link\" data-id=\"https:\/\/www.kaggle.com\/datasets\/brandao\/diabetes\">here<\/a>.<\/p>\n\n\n\n<h3 class=\"wp-block-heading\">Technical Background<\/h3>\n\n\n\n<p>This project compared the following algorithms on the task of predicting the readmission class in the diabetes dataset.<\/p>\n\n\n\n<ul>\n<li>Gradient Boosting (XGBoost)<\/li>\n\n\n\n<li>Random Forest<\/li>\n\n\n\n<li>Neural Network (MLP)<\/li>\n<\/ul>\n\n\n\n<p>The training data involved X features and 3 classes for the target. Before model building, exploratory data analysis with PCA was used to check whether the classes were linearly separable.<\/p>\n\n\n\n<figure class=\"wp-block-image size-large is-resized\"><img fetchpriority=\"high\" decoding=\"async\" width=\"1024\" height=\"497\" src=\"https:\/\/alex-jimenez.com\/wp-content\/uploads\/2024\/06\/Screenshot-2024-06-14-at-6.01.49-AM-1024x497.png\" alt=\"\" class=\"wp-image-818\" style=\"width:841px;height:auto\" srcset=\"https:\/\/alex-jimenez.com\/wp-content\/uploads\/2024\/06\/Screenshot-2024-06-14-at-6.01.49-AM-1024x497.png 1024w, https:\/\/alex-jimenez.com\/wp-content\/uploads\/2024\/06\/Screenshot-2024-06-14-at-6.01.49-AM-300x145.png 300w, https:\/\/alex-jimenez.com\/wp-content\/uploads\/2024\/06\/Screenshot-2024-06-14-at-6.01.49-AM-768x372.png 768w, https:\/\/alex-jimenez.com\/wp-content\/uploads\/2024\/06\/Screenshot-2024-06-14-at-6.01.49-AM-1536x745.png 1536w, https:\/\/alex-jimenez.com\/wp-content\/uploads\/2024\/06\/Screenshot-2024-06-14-at-6.01.49-AM-124x60.png 124w, https:\/\/alex-jimenez.com\/wp-content\/uploads\/2024\/06\/Screenshot-2024-06-14-at-6.01.49-AM.png 1790w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<figure class=\"wp-block-image size-large is-resized\"><img decoding=\"async\" width=\"1024\" height=\"783\" 
src=\"https:\/\/alex-jimenez.com\/wp-content\/uploads\/2024\/06\/Screenshot-2024-06-14-at-6.08.35-AM-1024x783.png\" alt=\"\" class=\"wp-image-819\" style=\"width:643px;height:auto\" srcset=\"https:\/\/alex-jimenez.com\/wp-content\/uploads\/2024\/06\/Screenshot-2024-06-14-at-6.08.35-AM-1024x783.png 1024w, https:\/\/alex-jimenez.com\/wp-content\/uploads\/2024\/06\/Screenshot-2024-06-14-at-6.08.35-AM-300x229.png 300w, https:\/\/alex-jimenez.com\/wp-content\/uploads\/2024\/06\/Screenshot-2024-06-14-at-6.08.35-AM-768x587.png 768w, https:\/\/alex-jimenez.com\/wp-content\/uploads\/2024\/06\/Screenshot-2024-06-14-at-6.08.35-AM-78x60.png 78w, https:\/\/alex-jimenez.com\/wp-content\/uploads\/2024\/06\/Screenshot-2024-06-14-at-6.08.35-AM.png 1114w\" sizes=\"(max-width: 1024px) 100vw, 1024px\" \/><\/figure>\n\n\n\n<p>Evidently, Principal Component Analysis yields almost no separation between these classes. Given that autoencoders performed better than PCA in the <a href=\"https:\/\/alex-jimenez.com\/?rara-portfolio=scanpy-rnaseq\" data-type=\"link\" data-id=\"https:\/\/alex-jimenez.com\/?rara-portfolio=scanpy-rnaseq\">RNA Transcriptomics Analysis project<\/a>, it was worth seeing whether an autoencoder could find more meaningful separating planes. The autoencoder has the following notable features:<\/p>\n\n\n\n<ul>\n<li>Latent Dimension of XX<\/li>\n\n\n\n<li>Embedding input_dim of <\/li>\n<\/ul>\n","protected":false},"excerpt":{"rendered":"<p>Source Code: GitHub Summary (Not Completed!) Exploration of a dataset related to readmission of diabetes patients based on attributes the hospitals have collected. 
The goal of this workflow was to see if there is a way to successfully create a classification model that could differentiate between no readmission, readmission within 30 days, and readmission after &hellip; <\/p>\n","protected":false},"author":1,"featured_media":701,"comment_status":"open","ping_status":"closed","template":"","rara_portfolio_categories":[3,4],"_links":{"self":[{"href":"https:\/\/alex-jimenez.com\/index.php?rest_route=\/wp\/v2\/rara-portfolio\/667"}],"collection":[{"href":"https:\/\/alex-jimenez.com\/index.php?rest_route=\/wp\/v2\/rara-portfolio"}],"about":[{"href":"https:\/\/alex-jimenez.com\/index.php?rest_route=\/wp\/v2\/types\/rara-portfolio"}],"author":[{"embeddable":true,"href":"https:\/\/alex-jimenez.com\/index.php?rest_route=\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/alex-jimenez.com\/index.php?rest_route=%2Fwp%2Fv2%2Fcomments&post=667"}],"wp:featuredmedia":[{"embeddable":true,"href":"https:\/\/alex-jimenez.com\/index.php?rest_route=\/wp\/v2\/media\/701"}],"wp:attachment":[{"href":"https:\/\/alex-jimenez.com\/index.php?rest_route=%2Fwp%2Fv2%2Fmedia&parent=667"}],"wp:term":[{"taxonomy":"rara_portfolio_categories","embeddable":true,"href":"https:\/\/alex-jimenez.com\/index.php?rest_route=%2Fwp%2Fv2%2Frara_portfolio_categories&post=667"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}