
Research Group of Distributed Computing and Systems
The "ABCD" of Our Research
We address real-world systems challenges and advance the state of the art. Our work is grounded in the fundamentals of distributed systems and supports a broad range of applications, including AI-driven systems, Blockchains, Cloud computing, and Data management.
At DISCOS, we design foundational components of distributed computing systems that improve performance, scalability, and availability. We rigorously prove their correctness, implement them in real-world systems (e.g., physically distributed servers or cloud environments), and evaluate their performance using standard benchmarks and real-world workloads.


Keywords: Data governance, Edge-Cloud DBMS, Agentic DBMS
R1: Cloud DBMS
Our research on cloud DBMS focuses on designing scalable, reliable, and cost-efficient data management infrastructure for real-world production systems. We study how a cloud DBMS can serve as a foundation for centralized data governance while supporting heterogeneous, distributed, and data-intensive workloads.
Our topics include, but are not limited to:
- Edge-Cloud DBMS for cost-efficient data ingestion
- Data quality and lean processing
- Cloud DBMS interoperability
- Agentic DBMS
We actively collaborate with Airbus Canada to develop a next-generation cloud DBMS with agentic technologies for Industry 4.0. Our work targets complex discrete manufacturing and directly supports the production ramp-up of the Airbus A220.

Keywords: ANN indexing, Cloud VDBMS, Distributed VDB search, KV caching, RAG
R2: Vector Database (VDB)
VDBs are becoming a foundational component of AI-driven data systems, enabling efficient similarity search over high-dimensional vector embeddings (typically generated by LLMs and other deep learning models). Our research focuses on system-level challenges, including:
- Approximate nearest neighbor (ANN) indexing and similarity search (e.g., HNSW, IVF, PQ, OPQ) for high accuracy and low latency across multi-modal data
- Distributed indexing and query processing, supporting sharding, replication, and parallel search at the scale of billions of vectors
- Cloud-native VDB architectures that provide elasticity, fault tolerance, and cost efficiency
In addition, we design LLM-centric middleware systems, including key-value caching and memory management layers, for large-scale AI inference and retrieval-augmented generation (RAG) pipelines.
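ANN indexes such as IVF trade a small amount of accuracy for lower latency, and that accuracy is commonly reported as recall@k against an exact (brute-force) search. The following is a minimal, pure-Python sketch of this trade-off, assuming cosine similarity and centroids supplied up front (a real IVF index would learn them, e.g. with k-means). The function names `exact_top_k`, `build_ivf`, `ivf_search`, and `recall_at_k` are hypothetical and do not correspond to any particular VDB's API.

```python
import heapq
import math
from typing import Dict, List, Sequence

Vector = Sequence[float]


def cosine(a: Vector, b: Vector) -> float:
    """Cosine similarity; returns 0.0 if either vector is all zeros."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def exact_top_k(query: Vector, vectors: List[Vector], k: int) -> List[int]:
    """Brute-force search: score every vector, O(n * d) per query."""
    scored = ((cosine(query, v), i) for i, v in enumerate(vectors))
    return [i for _, i in heapq.nlargest(k, scored)]


def build_ivf(vectors: List[Vector], centroids: List[Vector]) -> Dict[int, List[int]]:
    """Assign each vector id to the inverted list of its most similar centroid."""
    lists: Dict[int, List[int]] = {c: [] for c in range(len(centroids))}
    for i, v in enumerate(vectors):
        best = max(range(len(centroids)), key=lambda c: cosine(v, centroids[c]))
        lists[best].append(i)
    return lists


def ivf_search(query: Vector, vectors: List[Vector], centroids: List[Vector],
               lists: Dict[int, List[int]], k: int, nprobe: int = 1) -> List[int]:
    """Approximate search: scan only the nprobe inverted lists nearest the query."""
    probe = heapq.nlargest(nprobe, range(len(centroids)),
                           key=lambda c: cosine(query, centroids[c]))
    candidates = [i for c in probe for i in lists[c]]
    scored = ((cosine(query, vectors[i]), i) for i in candidates)
    return [i for _, i in heapq.nlargest(k, scored)]


def recall_at_k(approx_ids: List[int], exact_ids: List[int]) -> float:
    """Fraction of the exact top-k neighbours recovered by the approximate search."""
    return len(set(approx_ids) & set(exact_ids)) / len(exact_ids)


if __name__ == "__main__":
    vectors = [[1.0, 0.0], [0.9, 0.1], [0.0, 1.0], [0.1, 0.9], [-1.0, 0.0]]
    centroids = [[1.0, 0.0], [0.0, 1.0]]  # assumed given, e.g. from k-means
    lists = build_ivf(vectors, centroids)
    query = [1.0, 0.05]
    exact = exact_top_k(query, vectors, k=2)
    approx = ivf_search(query, vectors, centroids, lists, k=2, nprobe=1)
    print(exact, approx, recall_at_k(approx, exact))
```

Raising `nprobe` scans more inverted lists, which raises recall at the cost of latency; when `nprobe` equals the number of centroids, the search scans every vector, just like the exact search.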

Keywords: Data consistency, Consensus, Fault tolerance, Coordination, Security
R3: Consensus, Consistency, and Fault tolerance
Research on distributed systems fundamentals has always been a core strength of our group. Blockchains, cloud computing infrastructures, and distributed DBMS are intrinsically distributed and must operate under failures, asynchrony, and adversarial behavior. Our research studies and advances the following fundamentals of distributed systems:
- Data consistency and replication protocols in distributed DBMS
- Security of distributed computing, including adversarial and fault-prone environments
- Consensus algorithms for crash fault tolerance (CFT) and Byzantine fault tolerance (BFT)
- Coordination-as-a-Service, providing modular and scalable coordination primitives for cloud and blockchain systems
Our summary of BFT is used by Hyperledger Fabric as part of its official documentation to introduce the BFT problem and its design considerations. We are proud to be recognized by the world's largest and most widely deployed permissioned blockchain system.
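The replica counts behind the CFT/BFT distinction follow from a standard quorum-intersection argument: tolerating f crash faults needs n = 2f + 1 replicas with majority quorums of f + 1, while tolerating f Byzantine faults needs n = 3f + 1 replicas with quorums of 2f + 1, so that any two quorums share at least f + 1 replicas and therefore at least one correct replica. The sketch below checks this arithmetic. It is a textbook illustration, not one of our protocols, and the names `QuorumConfig`, `min_config`, `min_overlap`, and `is_safe_and_live` are hypothetical.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class QuorumConfig:
    byzantine: bool  # True for BFT, False for CFT
    f: int           # number of faulty replicas to tolerate
    n: int           # total number of replicas
    quorum: int      # replicas that must respond before a step commits


def min_config(f: int, byzantine: bool) -> QuorumConfig:
    """Smallest replica count and quorum size that tolerate f faults."""
    if f < 0:
        raise ValueError("f must be non-negative")
    if byzantine:
        return QuorumConfig(True, f, n=3 * f + 1, quorum=2 * f + 1)
    return QuorumConfig(False, f, n=2 * f + 1, quorum=f + 1)


def min_overlap(n: int, quorum: int) -> int:
    """Any two quorums of this size share at least 2q - n replicas (pigeonhole)."""
    return max(0, 2 * quorum - n)


def is_safe_and_live(cfg: QuorumConfig) -> bool:
    # Safety: CFT quorums must intersect; BFT quorums must share more than f
    # replicas, so at least one correct replica belongs to both.
    needed = cfg.f + 1 if cfg.byzantine else 1
    safe = min_overlap(cfg.n, cfg.quorum) >= needed
    # Liveness: a quorum is still reachable when f replicas stay silent.
    live = cfg.n - cfg.f >= cfg.quorum
    return safe and live


if __name__ == "__main__":
    for f in (1, 2):
        for byz in (False, True):
            cfg = min_config(f, byz)
            print(cfg, is_safe_and_live(cfg))
    # With only n = 3f replicas (here f = 1, n = 3), no BFT quorum size is
    # both safe and live.
    print(any(is_safe_and_live(QuorumConfig(True, 1, 3, q)) for q in range(1, 4)))
```

The final check illustrates why BFT needs the extra f replicas: with n = 3f, any quorum small enough to be reachable with f silent replicas is too small to guarantee a correct replica in the intersection.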

Keywords: Training influence, Auditable revenue-sharing, Decentralized GenAI data governance
R4: Data governance in GenAI
With the rapid adoption of GenAI systems, training and inference pipelines increasingly rely on large volumes of copyrighted, often proprietary, data, yet current systems lack transparent and enforceable governance mechanisms. Our research explores blockchain-based and other decentralized solutions that enable trustworthy, auditable, and incentive-compatible data governance in GenAI systems, including:
- Measurable training influence, quantifying how individual data inputs contribute to model training and inference outcomes
- Decentralized revenue-sharing mechanisms, enabling fair compensation between GenAI service providers and owners of copyrighted data
- Emerging legal and policy frameworks for GenAI data governance, bridging system design with regulatory requirements
Through this research, we aim to establish system-level foundations for responsible and sustainable GenAI, where data usage, value creation, and incentives are aligned across technical, economic, and societal dimensions.
