Advocating for open source | MS Data Science @ University of Washington, Seattle
Clean schemas and robust, vectorized pipelines beat complex, fragile architectures every single time. Good design is quiet and highly maintainable.
Deep focus on open formats like Parquet, Arrow, and Hudi. Your data belongs to your engineering stack, free from proprietary vendor lock-in.
Engineering starts with understanding users' requirements. We work backward from core user needs to build platforms that solve real problems.
Research Assistant, Health Sciences · Jan 2026 - Present · Seattle, WA
Building an end-to-end air quality data pipeline ingesting data from 5 heterogeneous instruments into a unified DuckDB + Parquet schema, enabling researchers to run statistical analyses without manual data wrangling. Designing a standardized ingestion layer that harmonizes multi-format sensor data with automated parsing, schema validation, and cloud storage on AWS S3.
Student Researcher, Bioengineering · Mar 2026 - Present · Seattle, WA
Contributing to AutoRELATE, a multimodal LLM research project evaluating AI-based assessment of clinical communication. Focusing on annotation pipelines and model evaluation workflows, and investigating how multimodal inputs can improve model performance for equitable, scalable clinician training tools.
Data Engineer · Aug 2023 - Aug 2025 · Bengaluru, India
Architected a cloud-native automated pricing engine for 30+ Huggies SKUs on Amazon India using Azure Data Factory, Databricks, and Data Lakes, contributing to a 2.14% revenue increase. Engineered Power BI reporting across 6 APAC markets, reducing manual requests by 80%. Optimized master data pipelines, cutting runtime by 56%, and led a zero-downtime production storage migration to Azure Blob Storage across 20+ Databricks notebooks.
Data Engineering Intern · Jan 2023 - Jul 2023 · Bengaluru, India
Developed anomaly detection models in PySpark for scalable log processing and built Power BI dashboards that surfaced 50+ security findings in 3 months. Analyzed 200+ phishing emails to design organization-wide training programs, and built MLflow-based monitoring pipelines for real-time alerts on data and prediction drift.
Student Researcher, MITACS Globalink · May 2022 - Aug 2022 · Gatineau, Canada
Awarded the MITACS Globalink Research Scholarship to develop a thermal imaging dataset for sports injury diagnosis. Collected and processed 20,000+ thermal images across 4 anatomical regions to support deep learning segmentation models.