Haoyang.
Solutions fuel my passion.
A passionate software developer. I use modern technologies to solve problems and create value.
Also a music arrangement enthusiast; don’t forget to check out my portfolio!
I am a second-year master’s student at Georgetown University majoring in computer science, with a current GPA of 3.96.
My interests focus on backend development and data engineering. I have implemented service backends, big data workflows, and applications during my past internships and projects. This work has involved distributed components, machine learning algorithms, backend frameworks, and cloud services, in multiple languages including Python, Scala, and Java.
Here are a few technologies and concepts I've been working with:
Advised on and budgeted the cloud-based project workflow, trimming development cost by 50% and time by 70% compared with training and tuning models from scratch.
Designed and built a serverless data processing workflow with AWS Lambda and AWS S3. Parallelized serverless function calls made processing 1000x faster than on traditional cloud server instances at a similar cost, and reduced data processing time by over 90%.
Helped build the backend of a proposal analysis system, implementing text similarity, text classification, and abstract generation algorithms, along with the corresponding RESTful API using Flask in Python and SQL ORM frameworks.
Refactored monolithic applications into a microservice- and component-based architecture, breaking components into Pods and containers for Kubernetes clusters.
Implemented an image correction algorithm that reduced color RMSE by 60%, which was integral to the product’s performance.
Completed optimizations and manufacturer customization requirements, with self-testing, for the Baidu Mobile App for Android in Java; 80% of this work was merged to master. The work included adapting notification push services for Chinese users, modifying default page styles for different phone models, and adding or removing specific entries for preinstalled OEM versions.
Expanded the team’s technology stack by evaluating the new UI toolkit Flutter.
Implemented an RDBMS in Java using the Visitor design pattern and composite data structures, supporting nearly full SQL syntax, including implicit joins, expression updates, and arbitrary expression evaluation in WHERE conditions.
Implemented cost-based and rule-based query optimization and integrity constraints, achieving sub-second response times on tables of up to one million records, which is close to commercial grade.
Scraped and collected feature data for over 500K songs and artists from sources including Spotify, Wikipedia, and allmusic.com, a collection larger than any public dataset, and stored it in a MongoDB NoSQL database. Handled missing value imputation, outlier identification, and duplicate removal with statistical methods using Pandas, NumPy, and scikit-learn.
Analyzed the influence of popular artists and the trends they drove using clustering, regression, ANOVA, classification, and network analysis with scikit-learn, SciPy, and NetworkX. Used Matplotlib and Plotly for visualization.
Built a high-quality, multi-dimensional dataset describing the behavior of taxi drivers in Qingdao from hundreds of gigabytes of raw GPS data, using a distributed workflow built on MongoDB and Spark; extracted features such as empty rate, work time, and profit from the spatiotemporal data.
Designed a new multi-input RNN model in PyTorch that takes human-related features, environment-related features, and income data as simultaneous inputs; on this dataset, it improved the RMSE for predicting drivers’ income by 8.3% compared with an LSTM.