
For the summer of 2024, I am working as a Senior Data Science Intern with Thermo Fisher Scientific in Guilford, Connecticut. I am working with the surface chemistry team, within the broader R&D organization, on various data-driven projects. My first project was building a large language model (LLM) that takes error messages written in plain English and produces key:value pairs containing the relevant information. To do this, I fine-tuned Google's FLAN-T5 model using PyTorch. My compute resources were limited, as I only had access to an Intel Core i7 CPU and an Nvidia T1200 Laptop GPU. With this in mind, I chose a model size that would not exhaust those resources. The model I fine-tuned accurately and quickly transforms plain English sequencing error messages into actionable key:value pairs used for further debugging and error tracing.

Thermo Fisher Logo

Over the summer of 2023, I worked as a Data Science Intern with Thermo Fisher Scientific in Kalamazoo, Michigan. I led an enhanced, data-driven update of a previous targeted sales initiative, bringing current data science techniques and automation to my team. Using Python, I engineered several machine learning models that highlighted variables highly correlated with increased test utilization and account growth. Drawing on data from various databases, supplemented with web-scraped data, these models painted an improved picture of the key market drivers among more than 180,000 physicians across the United States. With these drivers in mind, I worked with the business intelligence team to develop an intricate targeting algorithm that combines many test utilization metrics into a score for each physician. The algorithm can be rerun whenever the team needs to find new targets. I designed it so that physicians who are already growing and maximizing their opportunities are placed in a different category from physicians who are more targetable at that time. The algorithm also adjusts for seasonal demand and market trends. All physicians were stack-ranked within their respective sales territories and plotted on a dashboard for use by field sales reps. This helps extend every sales rep's reach, promotes efficiency, and ultimately drives growth. The solutions I delivered to the team will be implemented in 2024 for first-half annual targeting. I had the opportunity to present my work and findings to domestic and international senior leadership in business, IT, and data science.


Over the summer of 2022, I worked as a Data Science Intern with the Municipal Securities Rulemaking Board (MSRB) in Washington, D.C. The MSRB is the regulator responsible for all municipal securities in the United States. It creates rules and regulations for the municipal market, which are then approved by the Securities and Exchange Commission (SEC). As part of the Data and Analytics team, I was responsible for identifying why bond filings were not reported within a consistent time frame. Using various machine learning models, I identified several variables that were strongly correlated with faster bond filing times. Using these variables, the team and I identified a root cause for the inconsistent reporting. These findings were used in a Request for Comment proposing a rule change regarding automated trade reporting technologies. The Request for Comment aimed to assess the sentiment of firms operating in the municipal bond market toward the proposed change of the bond trade filing time requirement from 15 minutes to 1 minute.