* denotes equal contribution
An up-to-date list is available on Google Scholar.
PhD thesis
The Evolution of Cloud Data Architectures: Storage, Compute, and Migration. Gang Liao (2022). University of Maryland, College Park.
conference papers
2024
Flock: A Low-Cost Streaming Query Engine on FaaS Platforms
arXiv:2312.16735,
2024
In this paper, we present Flock, a cloud-native streaming query engine that leverages the on-demand elasticity of Function-as-a-Service (FaaS) platforms to perform real-time data analytics. Traditional server-centric deployments often suffer from resource under- or over-provisioning, leading to resource wastage or performance degradation. Flock addresses these issues by providing finer-grained elasticity that scales continuously to match demand on a per-query basis, and its billing is also fine-grained, with millisecond granularity, making it a low-cost solution for stream processing. Our approach, payload invocation, eliminates the need for both external storage services and a query coordinator in the data architecture. Our evaluation shows that Flock significantly outperforms state-of-the-art systems in terms of cost, especially on ARM processors, making it a promising solution for real-time data analytics on FaaS platforms.
SFVInt: Simple, Fast and Generic Variable-Length Integer Decoding using Bit Manipulation Instructions (PATENTED)
arXiv,
2024
2023
FileScale: Fast and Elastic Metadata Management for Distributed File Systems
In Proceedings of the 2023 ACM Symposium on Cloud Computing,
2023
Recent work has shown that distributed database systems are a promising solution for scaling metadata management in scalable file systems. This work has shown that systems that store metadata on a single machine, or over a shared-disk abstraction, struggle to scale performance to deployments including billions of files. In contrast, leveraging a scalable, shared-nothing, distributed system for metadata storage can achieve much higher levels of scalability, without giving up high availability guarantees. However, for low-scale deployments – where metadata can fit in memory on a single machine – these systems that store metadata in a distributed database typically perform an order of magnitude worse than systems that store metadata in memory on a single machine. This has limited the impact of these distributed database approaches, since they are only currently applicable to file systems of extreme scale.
This paper describes FileScale, a three-tier architecture that incorporates a distributed database system as part of a comprehensive approach to metadata management in distributed file systems. In contrast to previous approaches, the architecture described in the paper performs comparably to the single-machine architecture at a small scale, while enabling linear scalability as the file system metadata increases.
2021
BullFrog: Online Schema Evolution via Lazy Evaluation
In Proceedings of the 2021 ACM SIGMOD International Conference on Management of Data,
2021
This paper presents BullFrog, a relational DBMS that supports single-step, non-backwards-compatible schema migrations without downtime, and without advance warning.
When a schema migration is presented, BullFrog initiates a logical switch to the new schema, but physically migrates affected data lazily, as it is demanded by incoming transactions. BullFrog’s internal concurrency control algorithms and data structures enable concurrent processing of schema migration operations with post-migration transactions, while ensuring exactly-once migration of all old data into the physical layout required by the new schema.
BullFrog is implemented as an open source extension to PostgreSQL. Experiments using this prototype over a TPC-C based workload (supplemented to include schema migrations) show that BullFrog can achieve zero-downtime migration to non-trivial new schemas with near-invisible impact on transaction throughput and latency.
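The lazy-migration idea in the BullFrog abstract (switch to the new schema logically, then physically migrate each row the first time a transaction touches it, exactly once) can be illustrated with a toy in-memory model. This is only a sketch under simplifying assumptions and is not BullFrog's actual algorithm or data structures: `LazyMigrator`, `split_name`, and the single global lock are hypothetical names and choices made for illustration.

```python
import threading


def split_name(row):
    """Hypothetical migration: replace one 'name' column with 'first'/'last'."""
    first, last = row["name"].split(" ", 1)
    return {"first": first, "last": last}


class LazyMigrator:
    """Toy model of lazy schema migration (illustrative, not BullFrog's design).

    Rows start in the old layout and are moved to the new layout the first
    time they are read. A single global lock stands in for real concurrency
    control and guarantees each row is migrated exactly once.
    """

    def __init__(self, old_rows, transform):
        self._old = dict(old_rows)   # key -> row in the old schema
        self._new = {}               # key -> row in the new schema
        self._transform = transform
        self._lock = threading.Lock()
        self.migrations = 0          # number of physical row migrations

    def read(self, key):
        """Return the row in the new schema, migrating it on first access."""
        with self._lock:
            if key not in self._new and key in self._old:
                # Removing the row from the old store and inserting it into
                # the new one under the same lock makes migration exactly-once.
                self._new[key] = self._transform(self._old.pop(key))
                self.migrations += 1
            return self._new.get(key)

    def pending(self):
        """Number of rows still stored in the old layout."""
        with self._lock:
            return len(self._old)


if __name__ == "__main__":
    table = LazyMigrator(
        {1: {"name": "Ada Lovelace"}, 2: {"name": "Alan Turing"}},
        split_name,
    )
    # Several concurrent readers touch the same row; it is migrated only once.
    readers = [threading.Thread(target=table.read, args=(1,)) for _ in range(8)]
    for t in readers:
        t.start()
    for t in readers:
        t.join()
    print(table.read(1), table.migrations, table.pending())
```

Rows that are never read stay in the old layout in this sketch; a real system would also need a way to migrate the remaining rows eventually and a much finer-grained way to coordinate concurrent transactions.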