Kevin Bui | Data Science & ML Portfolio

I build data tools that turn messy information into something people can actually use.

Kevin Bui | Data Science & Machine Learning Intern

I like working at the point where engineering, analysis, and product usefulness meet: MCP services, geospatial pipelines, LLM systems, and dashboards that help people make clearer decisions.

About Me

I'm most energized by projects that start with raw, messy, real-world data and end with something useful, whether that's a cleaner pipeline, a smarter model workflow, or a visualization that makes the answer obvious.

At NextGen Federal Systems, I build Model Context Protocol services that ground LLMs with structured satellite, weather, and geospatial data, and I've been exploring how to make retrieval systems more secure and more dependable for sensitive document analysis.

At UW, I also support research operations by cleaning and coordinating data for a 500+ participant subject pool. I like the mix of engineering discipline and human usefulness that comes with that work: the data has to be right, but it also has to help someone else do their job better.

Quick Facts

Education B.S. Computer Science, Data Science track | University of Washington | 2023 to 2026
Previously A.A. Direct Transfer | South Puget Sound CC | GPA 3.95/4.0
Base Seattle, WA
Open to Summer 2026 internships and new-grad roles
Outside work I compete in the UW Husky Bowling League (stats in the Bowling Analytics Dashboard under Projects)
Email kevinabuicollege@gmail.com

What feels most like me

I tend to enjoy the in-between work: cleaning the weird dataset, connecting the API no one wants to touch, or translating technical output into something clear enough that another person can act on it.

Experience

Data Science & Machine Learning Intern

NextGen Federal Systems | Morgantown, WV

July 2025 to Present

Designed and deployed 3+ Model Context Protocol (MCP) services to augment LLM workflows with structured satellite, weather, and geospatial data, improving model grounding and reducing hallucinations through deterministic tool-based retrieval.
Built a production-ready NOAA GOES satellite data pipeline integrating 10+ configurable API parameters for dynamic feature selection and standardized JSON outputs for downstream ML analysis.
Engineered a geospatial data ingestion pipeline processing 1M+ USGS GNIS and NGA GNS records into SQLite, replacing probabilistic API lookups with authoritative dataset-driven entity resolution.
Developed and evaluated a secure Retrieval-Augmented Generation (RAG) architecture for 50MB+ of confidential documents, implementing table-of-contents (TOC)-based semantic chunking and embedding-driven retrieval.

Python LLMs MCP RAG SQLite Geospatial

Research Assistant, Subject Pool Coordinator

University of Washington | Seattle, WA

Sept 2025 to Present

Managed and optimized a subject pool database with 500+ participants, ensuring 99% data accuracy.
Extracted, cleaned, and analyzed data quarterly for multiple professors using Python, reducing data processing time by 20%.

Python Pandas Data Cleaning

Projects

Machine Learning | LLMs

MCP Services for LLM Workflows

Designed and deployed 3+ Model Context Protocol services to augment LLM workflows with structured satellite, weather, and geospatial data. Deterministic tool-based retrieval improves model grounding and reduces hallucinations on specialized queries.

Python MCP LLMs APIs

Machine Learning | RAG

Secure RAG Architecture for Confidential Docs

Developed and evaluated a secure Retrieval-Augmented Generation system for 50MB+ of confidential documents. Implemented TOC-based semantic chunking and embedding-driven retrieval to improve contextual precision in LLM-based analysis tasks.

Python RAG Embeddings LLMs
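
TOC-based chunking can be sketched in a few lines: split the text at the section headings listed in the document's table of contents, so each chunk lines up with one section instead of an arbitrary token window. This is a minimal illustration, not the system described above; it assumes headings appear verbatim on their own lines, and the function name `chunk_by_toc` and the sample text are hypothetical. Embedding and retrieval are left out.

```python
import re
from typing import Dict, List


def chunk_by_toc(text: str, toc_headings: List[str]) -> List[Dict[str, str]]:
    """Split `text` into one chunk per table-of-contents heading.

    Assumes each heading appears verbatim on its own line in the body.
    Headings that cannot be found are skipped.
    """
    # Record the character offset where each heading line starts.
    positions = []
    for heading in toc_headings:
        match = re.search(rf"^{re.escape(heading)}\s*$", text, flags=re.MULTILINE)
        if match:
            positions.append((match.start(), heading))
    positions.sort()

    chunks = []
    for i, (start, heading) in enumerate(positions):
        # A section runs until the next heading, or to the end of the document.
        end = positions[i + 1][0] if i + 1 < len(positions) else len(text)
        section = text[start:end]
        # Drop the heading line itself and keep only the section body.
        body = section.split("\n", 1)[1] if "\n" in section else ""
        chunks.append({"section": heading, "text": body.strip()})
    return chunks


doc = "1 Introduction\nThis report covers X.\n2 Methods\nWe did Y.\nMore Y.\n3 Results\nZ happened.\n"
chunks = chunk_by_toc(doc, ["1 Introduction", "2 Methods", "3 Results"])
```

Keeping the section title with each chunk means a retrieved passage can be reported alongside the section it came from, which is one plausible reason to prefer structure-aware chunks over fixed-size windows.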

Data Engineering

NOAA GOES Satellite Data Pipeline

Built a production-ready pipeline integrating 10+ configurable API parameters for the NOAA GOES satellite feed. Enabled dynamic feature selection and standardized JSON outputs consumed directly by downstream ML analysis.

Python REST APIs JSON Geospatial
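
A rough sketch of how configurable request parameters and a standardized JSON output might fit together: every option is checked against a whitelist before any request is built, and results are wrapped in one fixed envelope. The parameter names, allowed values, and envelope fields below are hypothetical placeholders rather than the actual NOAA GOES interface or this pipeline's schema, and the network call is omitted. The only external fact relied on is that the GOES-R Advanced Baseline Imager has 16 spectral bands.

```python
import json
from datetime import datetime
from typing import Any, Dict, List

# Hypothetical whitelist: a set of allowed values, or None for an ISO timestamp.
ALLOWED_PARAMS: Dict[str, Any] = {
    "satellite": {"goes-east", "goes-west"},
    "product": {"cloud_top_temp", "sea_surface_temp", "rainfall_rate"},
    "band": set(range(1, 17)),  # the GOES-R ABI has 16 spectral bands
    "start": None,
    "end": None,
}


def validate_params(params: Dict[str, Any]) -> Dict[str, Any]:
    """Reject unknown keys and invalid values before any request is made."""
    unknown = set(params) - set(ALLOWED_PARAMS)
    if unknown:
        raise ValueError(f"Unknown parameters: {sorted(unknown)}")
    for key, value in params.items():
        allowed = ALLOWED_PARAMS[key]
        if allowed is None:
            datetime.fromisoformat(value)  # raises ValueError if not an ISO-format timestamp
        elif value not in allowed:
            raise ValueError(f"Invalid value for {key!r}: {value!r}")
    return params


def to_standard_json(params: Dict[str, Any], records: List[Dict[str, Any]]) -> str:
    """Wrap raw records in a fixed envelope so downstream code sees one schema."""
    envelope = {
        "source": "NOAA GOES",
        "query": params,
        "count": len(records),
        "records": records,
    }
    return json.dumps(envelope, sort_keys=True)
```

Failing fast on a bad parameter keeps malformed queries out of the downstream ML step, and a single envelope means consumers never have to branch on which options were requested.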

Data Engineering

Geospatial Entity Resolution Pipeline

Engineered an ingestion pipeline processing 1M+ USGS GNIS and NGA GNS records into SQLite, replacing probabilistic API-based lookups with authoritative dataset-driven entity resolution for ambiguous geographic queries.

Python SQLite ETL Geospatial
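
The idea behind dataset-driven entity resolution can be shown with Python's built-in `sqlite3`: load gazetteer rows into an indexed table, then answer each lookup with a deterministic query, returning every candidate when a name is ambiguous rather than guessing. The table layout, column names, and sample rows are illustrative assumptions; the real GNIS and GNS files have their own, much richer schemas, and this is not the project's code.

```python
import sqlite3
from typing import List, Optional, Tuple

Row = Tuple[str, str, str, float, float]  # (name, region, source, lat, lon)


def build_gazetteer(rows: List[Row]) -> sqlite3.Connection:
    """Load gazetteer rows into an in-memory SQLite table with a name index."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE places (name TEXT, name_norm TEXT, region TEXT, "
        "source TEXT, lat REAL, lon REAL)"
    )
    conn.executemany(
        "INSERT INTO places VALUES (?, ?, ?, ?, ?, ?)",
        [(n, n.strip().lower(), r, s, lat, lon) for n, r, s, lat, lon in rows],
    )
    # Index the normalized name so exact lookups stay fast as the table grows.
    conn.execute("CREATE INDEX idx_places_name ON places (name_norm)")
    return conn


def resolve(conn: sqlite3.Connection, name: str, region: Optional[str] = None) -> List[Row]:
    """Return every case-insensitive exact match, optionally narrowed by region.

    Results are ordered deterministically, so the same query always returns
    the same answer; ambiguity shows up as multiple rows instead of a guess.
    """
    sql = "SELECT name, region, source, lat, lon FROM places WHERE name_norm = ?"
    params: List[object] = [name.strip().lower()]
    if region is not None:
        sql += " AND region = ?"
        params.append(region)
    sql += " ORDER BY region, source, lat, lon"
    return conn.execute(sql, params).fetchall()


conn = build_gazetteer([
    ("Springfield", "IL", "GNIS", 39.80, -89.64),
    ("Springfield", "MA", "GNIS", 42.10, -72.59),
    ("Paris", "FR", "GNS", 48.86, 2.35),
])
```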

Data Analytics

Supermarket Sales Analysis

Analyzed 10K+ transactional records using SQL and Python to identify revenue drivers, customer purchasing patterns, and inventory turnover trends. Designed interactive Tableau dashboards for sales growth, customer retention, and product profitability.

SQL Python Tableau Pandas
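
For a sense of the SQL behind a revenue-driver breakdown, the sketch below totals revenue and average invoice value per product line using Python's built-in `sqlite3`. The `sales` table, its columns, and the sample rows are invented for illustration and are not the project's dataset.

```python
import sqlite3
from typing import List, Tuple

# Revenue, invoice count, and average invoice value per product line.
REVENUE_BY_LINE = """
SELECT product_line,
       ROUND(SUM(unit_price * quantity), 2) AS revenue,
       COUNT(DISTINCT invoice_id) AS invoices,
       ROUND(SUM(unit_price * quantity) / COUNT(DISTINCT invoice_id), 2) AS avg_invoice
FROM sales
GROUP BY product_line
ORDER BY revenue DESC
"""


def revenue_by_product_line(rows: List[Tuple[str, str, float, int]]) -> List[tuple]:
    """Load (invoice_id, product_line, unit_price, quantity) rows and aggregate them."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE sales (invoice_id TEXT, product_line TEXT, "
        "unit_price REAL, quantity INTEGER)"
    )
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?, ?)", rows)
    result = conn.execute(REVENUE_BY_LINE).fetchall()
    conn.close()
    return result


summary = revenue_by_product_line([
    ("A1", "Food", 2.50, 4),
    ("A2", "Food", 1.00, 10),
    ("B1", "Electronics", 50.00, 1),
])
```

The same aggregated table is the kind of output that could then be exported to a BI tool such as Tableau for dashboarding.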

Data Science | Scraping

Pokemon Data Scraper

Developed a Python pipeline to scrape 1,000+ Pokemon records and transform unstructured HTML into analysis-ready datasets. Used Pandas for cleaning, normalization, and feature extraction, and stored the results in MongoDB.

Python Pandas MongoDB Web Scraping
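
Turning unstructured HTML into structured records can be sketched without third-party libraries using the standard library's `html.parser`. This assumes a hypothetical page where each entry is a table row of `<td>` cells (number, name, type); the project's actual pages, selectors, and MongoDB write step are not shown, and normalization here is limited to trimming whitespace, parsing the number, and lowercasing types.

```python
from html.parser import HTMLParser
from typing import Dict, List


class TableRowParser(HTMLParser):
    """Collect the text of <td> cells, grouped by <tr> row."""

    def __init__(self) -> None:
        super().__init__()
        self.rows: List[List[str]] = []
        self._row: List[str] = []
        self._in_cell = False

    def handle_starttag(self, tag, attrs):
        if tag == "tr":
            self._row = []
        elif tag == "td":
            self._in_cell = True
            self._row.append("")

    def handle_endtag(self, tag):
        if tag == "td":
            self._in_cell = False
        elif tag == "tr" and self._row:  # header rows of <th> cells stay empty and are skipped
            self.rows.append(self._row)

    def handle_data(self, data):
        if self._in_cell:
            self._row[-1] += data  # text may arrive in several pieces


def parse_entries(html: str) -> List[Dict[str, object]]:
    """Turn (number, name, type) rows into normalized dicts."""
    parser = TableRowParser()
    parser.feed(html)
    entries = []
    for cells in parser.rows:
        if len(cells) < 3:
            continue  # skip malformed rows
        number, name, types = (c.strip() for c in cells[:3])
        entries.append({
            "number": int(number.lstrip("#")),
            "name": name,
            "types": [t.strip().lower() for t in types.split("/") if t.strip()],
        })
    return entries


html = (
    "<table><tr><th>#</th><th>Name</th><th>Type</th></tr>"
    "<tr><td>#001</td><td> Bulbasaur </td><td>Grass / Poison</td></tr>"
    "<tr><td>#004</td><td>Charmander</td><td>Fire</td></tr></table>"
)
entries = parse_entries(html)
```

Records in this dict shape could then be loaded into a DataFrame for further cleaning or inserted into a document store.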

Data Analytics | D3

Bowling Analytics Dashboard

Personal performance tracker built with Chart.js and Firebase. Tracks scratch scores, strikes, spares, and ball speed across every game, with rolling averages, personal records, and an interactive live scorecard with real bowling scoring logic.

D3.js Chart.js JavaScript Firebase Personal Data
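
Standard ten-pin scoring is what makes a live scorecard nontrivial: a strike scores 10 plus the next two rolls, a spare scores 10 plus the next roll, and the tenth frame allows up to three rolls. The dashboard itself is JavaScript; this Python sketch assumes a complete game supplied as a flat list of pins knocked down per roll, and the `rolling_average` helper and its default window of 5 games are illustrative choices, not the dashboard's settings.

```python
from typing import List


def score_game(rolls: List[int]) -> int:
    """Return the total score for a complete ten-frame game.

    `rolls` is the flat sequence of pins knocked down on each roll,
    including any fill balls earned in the tenth frame.
    """
    total = 0
    i = 0  # index of the first roll in the current frame
    for _frame in range(10):
        if rolls[i] == 10:  # strike: 10 plus the next two rolls
            total += 10 + rolls[i + 1] + rolls[i + 2]
            i += 1
        elif rolls[i] + rolls[i + 1] == 10:  # spare: 10 plus the next roll
            total += 10 + rolls[i + 2]
            i += 2
        else:  # open frame: just the pins knocked down
            total += rolls[i] + rolls[i + 1]
            i += 2
    return total


def rolling_average(scores: List[int], window: int = 5) -> List[float]:
    """Average of each game and up to `window - 1` games before it."""
    averages = []
    for k in range(len(scores)):
        recent = scores[max(0, k - window + 1): k + 1]
        averages.append(sum(recent) / len(recent))
    return averages
```

For example, `score_game([10] * 12)` returns 300 (a perfect game), and twenty-one rolls of 5 (a spare in every frame) return 150.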

Skills & Tools

Programming Languages

Python Java JavaScript SQL

Machine Learning

NumPy scikit-learn Pandas LLMs RAG MCP

Data Visualization

Tableau Power BI Vega-Lite D3.js Observable

Data Engineering

MySQL SQLite MongoDB ETL Pipelines REST APIs

Tools & Platforms

VS Code PyCharm Eclipse Atom Overleaf Git

Relevant Coursework

Machine Learning Data Management Data Visualization Data Structures & Algorithms Statistics Artificial Intelligence

Get In Touch

I'm looking for Summer 2026 internships and new-grad roles in data science and ML. If you're building something thoughtful around data pipelines, LLM systems, analytics, or geospatial tools, I'd love to hear about it.

Send a Message

What feels most like me

Systems that connect data to real decisions

Practical, curious, and detail-heavy

Bowling, side dashboards, and overanalyzing performance trends
