Rithvik Sunku

San Francisco, CA (510) 584-7415 · rithviksunku@berkeley.edu

I am a dedicated and results-driven Data Engineer with a strong background in Software Engineering and Data Science. I am currently pursuing my Master's degree at the University of California, Berkeley, where I also earned a Bachelor of Arts in Data Science. My coursework has included Statistics and Probability, Analysis of Algorithms, Database Management, and Machine Learning.


Experience

Data Engineer Intern

Tesla

Analyzed energy signals for 4,000+ sites using PySpark and statistical testing, resulting in $180k higher credit claims. Developing an API wrapper and data pipeline to collect data from various APIs, automating work that averages 20 hours/week.

September 2023 - Present

Data Engineer Intern

McKinsey & Company

Built a multi-stage data pipeline with PySpark and EMR to process 4M rows of Iceberg tables for MVP infrastructure. Designed 10+ row-, table-, and schema-level validations and handlers with Great Expectations to reduce breakage by 40%. Spearheaded and implemented 3+ data models and their SQL transformations to cover 15% of all customer data.

June 2023 - August 2023

Data Engineer Intern

Moveworks

Reduced pipeline outages by 30 hours per month for a team of 15 Data Scientists and Engineers by developing a SQL data validation developer tool with custom Protobuf design, Superset dashboard logging, and an HTML analytic report. Cut the tool's run time by 40% by implementing connections to Athena, Snowflake, S3, and Postgres.

May 2022 - August 2022

Software Engineer Intern

Amazon Web Services

Tracked 10% of all Lambda function user activity and memory recommendations by designing an end-to-end automated ETL system on CDK infrastructure that reads internal API recommendation requests and processes them into a database. Increased error detection by 5% by automating request reads in CloudWatch and anomaly tracking in QuickSight.

May 2021 - August 2021

Education

University of California, Berkeley

Master of Science
Information and Data Science

Machine Learning Systems Engineering (FastAPI, Docker, Kubernetes), Machine Learning at Scale (Spark, Databricks), Computer Vision, Applied Machine Learning, Research Design for Data Scientists, Statistics for Data Science, Fundamentals of Data Engineering (AWS, SQL, Neo4J)
August 2022 - May 2024

University of California, Berkeley

Bachelor of Arts
Data Science

Database Systems, Software Engineering, Data Structures and Algorithms, Analysis of Algorithms, Artificial Intelligence, Intro to Programming, Foundations of Data Science, Principles of Data Science, Data and Decisions, Probability, Statistics, Linear Algebra, Inference
August 2019 - May 2022

Skills

Top Languages & Tools
Workflow
  • Backend Software Engineering
  • Data Engineering ETL
  • Cross Functional Teams
  • Agile Development & Scrum

Languages:

Python, SQL, Java, Kotlin, JavaScript (including Node.js), HTML, Ruby, R

Front-End Framework:

React.js

Data Analysis and Visualization:

Tableau, Power BI, Jupyter, Apache Superset

Version Control:

Git

API Development:

FastAPI, Flask

Data Libraries and Frameworks:

NumPy, Pandas, Matplotlib, scikit-learn, TensorFlow, NLTK, OpenCV (cv2), Great Expectations

Data Management and Workflow:

Apache Airflow, Apache Spark


Projects

Real Estate Investment Platform

Developed a MERN stack application hosted on AWS for real estate investors, including user auth and deal calculator.

Sentiment Analysis Model API

Created an API serving a Hugging Face sentiment analysis model, hosted on a configured Azure Kubernetes cluster.

EuroSat Land Use Classification

Designed SVM and logistic regression classifiers for satellite land-use images using LBP, HOG, HSV, and VGG16 features, reaching 91% accuracy.

Spotify Recommendation Algorithm

Constructed a mixture model with K-Means and GMMs from Spotify song features and user ratings to generate playlists.