Seattle-based | CS + Data Science at UW

I build data tools that turn messy information into something people can actually use.

Kevin Bui | Data Science & Machine Learning Intern

I like working where engineering, analysis, and product value meet: MCP services, geospatial pipelines, LLM systems, and dashboards that help people make clearer decisions.

About Me

I'm most energized by projects that start with raw, messy, real-world data and end with something useful, whether that's a cleaner pipeline, a smarter model workflow, or a visualization that makes the answer obvious.

At NextGen Federal Systems, I build Model Context Protocol (MCP) services that ground LLMs in structured satellite, weather, and geospatial data. I've also been exploring how to make retrieval systems more secure and dependable for sensitive document analysis.

At UW, I also support research operations by cleaning and coordinating data for a subject pool of 500+ participants. I like the mix of engineering discipline and practical value in that work: the data has to be right, but it also has to help someone else do their job better.

Quick Facts

  • Education: B.S. Computer Science, Data Science track | University of Washington | 2023 to 2026
  • Previously: A.A. Direct Transfer | South Puget Sound CC | GPA 3.95/4.0
  • Base: Seattle, WA
  • Open to: Summer 2026 internships and new-grad roles
  • Outside work: I compete in the UW Husky Bowling League | see the stats dashboard
  • Email: kevinabuicollege@gmail.com

What feels most like me

I tend to enjoy the in-between work: cleaning the weird dataset, connecting the API no one wants to touch, or translating technical output into something clear enough that another person can act on it.

What I Enjoy Building

Systems that connect data to real decisions

I'm drawn to projects that have both technical depth and visible usefulness: retrieval systems, pipelines, analytics tools, and data products with a clear downstream user.

How I Work

Practical, curious, and detail-heavy

I like understanding the full path from source data to final output. Usually that means asking a lot of questions, cleaning edge cases early, and trying to make the system understandable for the next person too.

Outside the Portfolio

Bowling, side dashboards, and overanalyzing performance trends

The bowling dashboard is probably the most personal project here. It's part sports tracker, part data toy, and part excuse to keep building visualizations around something I care about.

Experience

Data Science & Machine Learning Intern

NextGen Federal Systems | Morgantown, WV

July 2025 to Present
  • Designed and deployed 3+ Model Context Protocol (MCP) services to augment LLM workflows with structured satellite, weather, and geospatial data, improving model grounding and reducing hallucinations through deterministic tool-based retrieval.
  • Built a production-ready NOAA GOES satellite data pipeline integrating 10+ configurable API parameters for dynamic feature selection and standardized JSON outputs for downstream ML analysis.
  • Engineered a geospatial data ingestion pipeline processing 1M+ USGS GNIS and NGA GNS records into SQLite, replacing probabilistic API lookups with authoritative dataset-driven entity resolution.
  • Developed and evaluated a secure Retrieval-Augmented Generation (RAG) architecture for 50MB+ confidential documents, implementing TOC-based semantic chunking and embedding-driven retrieval.
Python, LLMs, MCP, RAG, SQLite, Geospatial

Research Assistant, Subject Pool Coordinator

University of Washington | Seattle, WA

Sept 2025 to Present
  • Managed and optimized a subject pool database with 500+ participants, ensuring 99% data accuracy.
  • Extracted, cleaned, and analyzed data quarterly for multiple professors using Python, reducing data processing time by 20%.
Python, Pandas, Data Cleaning

Projects

Machine Learning | LLMs

MCP Services for LLM Workflows

Designed and deployed 3+ Model Context Protocol services to augment LLM workflows with structured satellite, weather, and geospatial data. Deterministic tool-based retrieval improves model grounding and reduces hallucinations on specialized queries.

Machine Learning | RAG

Secure RAG Architecture for Confidential Docs

Developed and evaluated a secure Retrieval-Augmented Generation system for 50MB+ confidential documents. Implemented TOC-based semantic chunking and embedding-driven retrieval to increase contextual precision in LLM-based analysis tasks.
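To make the chunking idea concrete, here is a minimal sketch of TOC-based splitting, not the production code. It assumes the table of contents is available as a list of heading strings and that each heading appears on its own line in the body; the function name `chunk_by_toc` and the "Front matter" label are hypothetical. The embedding and retrieval steps are left out.

```python
def chunk_by_toc(text, toc_titles):
    """Split text into chunks at lines that exactly match a TOC heading.

    Assumption: headings appear on their own line, spelled as in the TOC.
    Returns a list of {"title": ..., "text": ...} dicts in document order.
    """
    titles = {t.strip() for t in toc_titles}
    chunks = []
    current_title = "Front matter"  # hypothetical label for text before the first heading
    current_lines = []

    def flush():
        # Only keep chunks that contain some non-blank text.
        body = "\n".join(current_lines).strip()
        if body:
            chunks.append({"title": current_title, "text": body})

    for line in text.splitlines():
        if line.strip() in titles:
            flush()
            current_title = line.strip()
            current_lines = []
        else:
            current_lines.append(line)
    flush()
    return chunks
```

Keeping the heading alongside each chunk means a retrieved passage can be traced back to the section it came from, which is the main appeal of splitting on document structure instead of fixed character counts.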

Data Engineering

NOAA GOES Satellite Data Pipeline

Built a production-ready pipeline integrating 10+ configurable API parameters for the NOAA GOES satellite feed. Enabled dynamic feature selection and standardized JSON outputs consumed directly by downstream ML analysis.

Data Engineering

Geospatial Entity Resolution Pipeline

Engineered an ingestion pipeline processing 1M+ USGS GNIS and NGA GNS records into SQLite, replacing probabilistic API-based lookups with authoritative dataset-driven entity resolution for ambiguous geographic queries.
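As a rough illustration of dataset-driven lookup, the sketch below loads a few gazetteer rows into SQLite and resolves a place name with an exact, indexed match. The schema, the `name_key` normalization, the `region` column, and the function names are all assumptions made for this example, and the sample rows are made up, not real GNIS or GNS records.

```python
import sqlite3

def build_gazetteer(rows):
    """Load (source, feature_id, name, region, lat, lon) tuples into an in-memory table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE places ("
        "source TEXT, feature_id TEXT, name TEXT, name_key TEXT, "
        "region TEXT, lat REAL, lon REAL)"
    )
    # Index the normalized name so lookups stay fast on large tables.
    conn.execute("CREATE INDEX idx_places_name_key ON places (name_key)")
    conn.executemany(
        "INSERT INTO places VALUES (?, ?, ?, ?, ?, ?, ?)",
        [
            (source, fid, name, name.strip().casefold(), region, lat, lon)
            for source, fid, name, region, lat, lon in rows
        ],
    )
    conn.commit()
    return conn

def resolve(conn, name, region=None):
    """Return every record whose normalized name matches, optionally within one region."""
    sql = (
        "SELECT source, feature_id, name, region, lat, lon "
        "FROM places WHERE name_key = ?"
    )
    params = [name.strip().casefold()]
    if region is not None:
        sql += " AND region = ?"
        params.append(region)
    sql += " ORDER BY source, feature_id"  # stable order: same query, same answer
    return conn.execute(sql, params).fetchall()

# Illustrative rows only.
sample = [
    ("GNIS", "a1", "Springfield", "IL", 39.80, -89.64),
    ("GNIS", "a2", "Springfield", "MO", 37.21, -93.29),
    ("GNS", "b1", "Paris", "FR", 48.86, 2.35),
]
conn = build_gazetteer(sample)
candidates = resolve(conn, "springfield")            # both Springfields
narrowed = resolve(conn, "Springfield", region="IL")  # one record
```

An ambiguous name returns every candidate instead of a single guess, so the caller decides how to disambiguate.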

Data Analytics

Supermarket Sales Analysis

Analyzed 10K+ transactional records using SQL and Python to identify revenue drivers, customer purchasing patterns, and inventory turnover trends. Designed interactive Tableau dashboards for sales growth, customer retention, and product profitability.

Data Science | Scraping

Pokémon Data Scraper

Developed a Python-based data pipeline to scrape, clean, and structure 1,000+ Pokémon records, transforming unstructured HTML into analysis-ready datasets. Performed data cleaning, normalization, and feature extraction with Pandas and stored the results in MongoDB.

Data Analytics | Chart.js

Bowling Analytics Dashboard

Personal performance tracker built with Chart.js and Firebase. Tracks scratch scores, strikes, spares, and ball speed across every game, with rolling averages, personal records, and an interactive live scorecard with real bowling scoring logic.
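For readers unfamiliar with ten-pin scoring, the standard rules can be sketched in a few lines. This is a Python illustration of those rules, not the dashboard's own code; the function name `score_game` and the flat list-of-rolls input format are assumptions, and the input is expected to be a complete, valid game.

```python
def score_game(rolls):
    """Score a complete ten-pin game from a flat list of pins knocked down per roll."""
    total = 0
    i = 0  # index of the first roll of the current frame
    for _ in range(10):
        if rolls[i] == 10:
            # Strike: 10 plus the next two rolls as a bonus.
            total += 10 + rolls[i + 1] + rolls[i + 2]
            i += 1
        elif rolls[i] + rolls[i + 1] == 10:
            # Spare: 10 plus the next roll as a bonus.
            total += 10 + rolls[i + 2]
            i += 2
        else:
            # Open frame: just the pins knocked down.
            total += rolls[i] + rolls[i + 1]
            i += 2
    return total
```

A live scorecard needs more than this, since it has to show partial scores while bonuses are still pending, but the frame-by-frame bonus logic is the same.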

Skills & Tools

Programming Languages

Python, Java, JavaScript, SQL

Machine Learning

NumPy, Scikit-learn, Pandas, LLMs, RAG, MCP

Data Visualization

Tableau, Power BI, Vega-Lite, D3.js, Observable

Data Engineering

MySQL, SQLite, MongoDB, ETL Pipelines, REST APIs

Tools & Platforms

VS Code, PyCharm, Eclipse, Atom, Overleaf, Git

Relevant Coursework

Machine Learning, Data Management, Data Visualization, Data Structures & Algorithms, Statistics, Artificial Intelligence

Get In Touch

I'm looking for Summer 2026 internships and new-grad roles in data science and ML. If you're building something thoughtful around data pipelines, LLM systems, analytics, or geospatial tools, I'd love to hear about it.

Send a Message