Haoyang's Profile

Hi, my name is

Haoyang.

Solutions fuel my passion.

I'm a passionate software developer who uses modern technologies to solve problems and create value.

I'm also a music arrangement enthusiast, so don’t forget to check out my portfolio!

Resume

About Me

I am a second-year master’s student at Georgetown University majoring in computer science, with a current GPA of 3.96.

My interests focus on backend development and data engineering. I have implemented service backends, big data workflows, and applications during my past internships and projects. This work spans distributed components, machine learning algorithms, backend frameworks, and cloud services, in languages such as Python, Scala, and Java.

Here are a few technologies and concepts I've been working with:

Python
Java
C/C++
Amazon Web Services
SQL
Kubernetes
Docker
NoSQL
Git
Spring
Unix/Linux
Elasticsearch
Redis
Keras
PyTorch
Spark
Scala
Object-Oriented Programming (OOP)
Algorithms and Data Structures
Design Patterns

Experience

Research Assistant - Georgetown University McDonough School of Business

Jun 2022 - Nov 2022

Advised on and budgeted the cloud-based project workflow, cutting development cost by 50% and development time by 70% compared with training and tuning models from scratch.
Designed and built a serverless data processing workflow with AWS Lambda and AWS S3. Parallelized serverless function calls improved processing speed by 1000x over traditional cloud server instances at a similar cost and reduced data processing time by over 90%.

Algorithm Intern - Beijing Computing Center

Jun 2019 - Sep 2019

Helped build the backend of a proposal analysis system, implementing text similarity, text classification, and abstract generation algorithms, along with the corresponding RESTful API in Python with Flask and SQL ORM frameworks.
Refactored monolithic applications into a microservice- and component-based architecture, splitting components into Pods and containers for Kubernetes clusters.
Implemented an image correction algorithm that reduced color RMSE by 60%, which was integral to the product’s performance.

Research & Development Intern - Baidu

Jan 2020 - Aug 2020

Implemented and self-tested optimizations and manufacturer customizations for the Baidu mobile app for Android in Java, 80% of which were merged to master. The work included adapting notification push services for Chinese users, adjusting default page styles for different phone models, and adding or removing specific entries for preinstalled OEM versions.
Expanded the team's technology stack by evaluating Flutter, a new UI toolkit.

Education

2021 - 2023

Master of Science in Computer Science

Georgetown University

GPA: 3.96 out of 4.0

Selected courses: Database Management System, Computer Architecture, Grad. Algorithm, Gems of Theoretical Computer Science, Information Assurance, Deep Learning with Neural Networks, Statistical Machine Learning

2016 - 2020

Bachelor of Science in Computer Science and Technology

Beijing Jiaotong University

Selected courses: Object Oriented Programming & C++, Operating Systems, Computer Organization, Data Structure, Algorithm Design and Analysis, Artificial Intelligence, Database Systems

Projects

Java, Visitor Design Pattern, Git, SQL, Database Management System

Toy-DB: Implementation of Relational Database Management System

Implemented an RDBMS in Java using the Visitor design pattern and composite data structures, supporting nearly the full SQL syntax, including implicit joins, expression updates, and arbitrary expression evaluation in WHERE conditions.
Implemented cost-based and rule-based query optimization and integrity constraints, achieving sub-second response times on tables of up to one million records, which is close to commercial grade.

Python, Pandas, NumPy, Scikit-learn, SciPy, NetworkX, Plotly, MongoDB

Music Data Analysis

Scraped and collected feature data for over 500K songs and artists from sources including Spotify, Wikipedia, and allmusic.com, a collection larger than any public dataset, and stored it in a MongoDB NoSQL database. Handled missing-value imputation, outlier identification, and duplicate removal with statistical methods using Pandas, NumPy, and Scikit-learn.
Analyzed the influence of popular artists and the trends they drove using clustering, regression, ANOVA, classification, and network analysis with Scikit-learn, SciPy, and NetworkX. Used Matplotlib and Plotly for visualization.

Demo

MongoDB, Spark, Scala, PyTorch, Distributed System, Big Data

Prediction on Taxi Drivers’ Income Based on GPS Data

Built a high-quality, multi-dimensional dataset describing the behavior of taxi drivers in Qingdao from hundreds of gigabytes of raw GPS data, using a distributed MongoDB and Spark workflow; extracted features such as empty rate, work time, and profit from the spatio-temporal data.
Designed a new multi-input RNN model in PyTorch that takes human-related features, environment-related features, and income data as simultaneous inputs; on this dataset, it improved the RMSE for predicting drivers’ income by 8.3% compared with an LSTM.