Rithvik Sunku

San Francisco, CA · (510) 584-7415 · rithviksunku@berkeley.edu

I am a results-driven Data Engineer with a strong foundation in software engineering and data science. I hold a Master’s degree from the University of California, Berkeley, and a Bachelor of Arts in Computer Science. I currently work at McKinsey & Company, specializing in Advanced Industries and Digital Twin technologies, where I deliver data-driven solutions that connect operational technology with digital innovation. My academic and professional experience has given me expertise in algorithms, statistical analysis, database management, and machine learning, which I apply to complex industry problems.

Experience

Data Engineer

McKinsey & Company

Building core feature datasets at scale and deploying ML models across various clients.

March 2024 - Present

Data Engineer Intern

Tesla

Analyzed energy signals for 4000+ sites using PySpark and statistical testing, resulting in $180k in additional credit claims. Developed an API wrapper and data pipeline to collect data from various APIs, automating work that averaged 20 hours/week.

August 2023 - December 2023

Data Engineer Intern

McKinsey & Company

Built a multi-stage data pipeline with PySpark and EMR to process 4M rows of Iceberg tables for MVP infra. Designed 10+ row-, table-, and schema-level validations and handlers with Great Expectations to reduce outages by 40%. Spearheaded and implemented 3+ data models and their SQL transformations to cover 15% of all customer data.

June 2023 - August 2023

Software Engineer Intern

Moveworks

Reduced pipeline outages by 30 hours per month for a team of 15 Data Scientists and Engineers by developing a SQL data validation developer tool with custom Protobuf design, Superset dashboard logging, and an HTML analytic report. Cut the tool's run time by 40% by implementing connections to Athena, Snowflake, S3, and Postgres.

May 2022 - August 2022

Software Engineer Intern

Amazon Web Services

Tracked 10% of all Lambda function user activity and memory recommendations by designing an end-to-end automated ETL system on CDK infrastructure that reads internal API recommendation requests and processes them into a database. Increased detection of errors by 5% by automating request reads in CloudWatch and anomaly tracking in QuickSight.

May 2021 - August 2021

Education

University of California, Berkeley

Master of Science

Information and Data Science

Machine Learning Systems Engineering (FastAPI, Docker, Kubernetes), Machine Learning at Scale (Spark, Databricks), Computer Vision, Applied Machine Learning, Research Design for Data Scientists, Statistics for Data Science, Fundamentals of Data Engineering (AWS, SQL, Neo4J)

August 2022 - May 2024

University of California, Berkeley

Bachelor of Arts

Computer Science

Database Systems, Software Engineering, Data Structures and Algorithms, Analysis of Algorithms, Artificial Intelligence, Intro to Programming, Foundations of Data Science, Principles of Data Science, Data and Decisions, Probability, Statistics, Linear Algebra, Inference

August 2019 - May 2022

Skills

Top Languages & Tools

Workflow

Backend Software Engineering
Data Engineering ETL
Cross-Functional Teams
Agile Development & Scrum

Languages:

Python, SQL, Java, Kotlin, JavaScript (including Node.js), HTML, Ruby, R

Front-End Framework:

React.js

Data Analysis and Visualization:

Tableau, Power BI, Jupyter, Apache Superset

Version Control:

Git

API Development:

FastAPI, Flask

Data Libraries and Frameworks:

NumPy, Pandas, Matplotlib, Scikit-learn, TensorFlow, NLTK, OpenCV (cv2), Great Expectations

Data Management and Workflow:

Apache Airflow, Apache Spark

Projects

Real Estate Investment Platform

Developed a MERN stack application hosted on AWS for real estate investors, including user auth and deal calculator.

Sentiment Analysis Model API

Created an API serving a sentiment analysis model from Hugging Face, hosted on a configured Azure Kubernetes cluster.

EuroSAT Land Use Classification

Designed SVM and logistic regression classifiers for satellite images of land using LBP, HOG, HSV, and VGG16 features, reaching 91% accuracy.

Spotify Recommendation Algorithm

Constructed a mixture model with K-Means and GMMs on Spotify song features and user ratings to generate playlists.