Zhichao Cao
Mail code: 8809
Campus: Tempe
Zhichao Cao is an assistant professor in the School of Computing and Augmented Intelligence at Arizona State University. He leads the Intelligent Data Infrastructure Lab (ASU-IDI), where he conducts research in the areas of data infrastructure for AI/ML (LLM-based system auto-optimization, LLM KV-cache offloading, LLM inference memory extension, vector databases, caching for CPU-GPU co-design, systems for GPU-SSD direct access, training/inference failure protection, and sustainable LLM serving), AI/ML for system optimizations (data system auto-tuning with LLMs, AI-driven system optimizations), database systems (e.g., key-value stores, graph databases, and time-series databases), storage systems (e.g., file systems, cloud storage, and deduplication systems), and next-generation data infrastructure (e.g., disaggregated infrastructure, computing-in-X, and wireless datacenters). His research interests also include the design and development of storage systems for new memory and storage technologies, such as SMR, IMR, NVM, CXL, RDMA, ZNS, and DNA. His research further encompasses big data systems, with a focus on the development of query engines for large-scale scientific computing in HPC.
Zhichao Cao has served as a Program Committee member of several top conferences, such as USENIX FAST, USENIX ATC, ACM SIGMOD, VLDB, IEEE ICDE, ACM HotStorage, IEEE ICDCS, and ICPP. He has served as Proceedings Co-Chair of ACM SIGMOD, Mentorship Program Co-Chair of USENIX FAST, Publicity Co-Chair of ACM HotStorage and MSST, and Virtual Chair of ACM HotStorage. He is a reviewer for top journals, including ACM Transactions on Storage, IEEE Transactions on Computers, IEEE Micro, and IEEE Transactions on Cloud Computing. He is a recipient of the NSF CAREER Award (2025), the ACM HPDC'25 Best Student Paper Award, and the ACM HotStorage'24 Best Paper Award, and he was recognized with the VLDB 2024 Distinguished Reviewer Award for his contributions to the community.
Prior to joining ASU, Zhichao Cao worked as a research scientist on the Facebook (Meta) RocksDB Team, where he contributed to storage and database research from 2018 to 2021. He earned his bachelor's degree in Automation from Tsinghua University in 2013 and his doctoral degree in Computer Science from the University of Minnesota, Twin Cities (advised by Prof. David H.C. Du), in 2020.
Funding:
NSF CAREER Award (2025): $730,000 (Single-PI)
NSF CSR Core (2024): $600,000 (Leading-PI)
OpenAI Researcher Access Program Credit Award 2024
Google Cloud Research Credit Award 2022
Ph.D. Students
- Chang Guo, 2022 - Present (B.S., Tsinghua University)
- Viraj Thakkar, 2023 - Present (M.S., Arizona State University)
- Qi Lin, 2024 - Present (M.S., UC Irvine)
- Zhenyu Zhang, 2024 - Present (M.S., Peking University)
- Zhenjie Sun, 2025 - Present (B.S., Shanghai Jiao Tong University and UMich, Ann Arbor)
- Jun Kong, 2025 - Present (B.S., Lanzhou University)
Education
- Ph.D. Computer Science, University of Minnesota, Twin Cities, 2020
- M.S. Computer Science, University of Minnesota, Twin Cities, 2019
- B.S. Automation, Tsinghua University, China, 2013
Work Experience
- Research Scientist, Facebook Oct. 2019 - Dec. 2021
- Research Collaborator, Facebook Sep. 2018 - Sep. 2019
- Research Intern, Facebook Jun. 2018 - Aug. 2018
- Research Intern, Veritas Jun. 2016 - Aug. 2016
- Research Intern, Hewlett-Packard (HPE) Jun. 2015 - Aug. 2015
- Research Intern, Hewlett-Packard (HPE) Jun. 2014 - Aug. 2014
Intelligent Data Infrastructure for AI: LLM-driven optimizations for storage systems, similarity search indexing, storage systems for direct GPU-SSD I/Os, LLM-serving optimizations (multi-level caching, offloading, and indexing), failure protection and recovery for LLM training and serving (bitflip protection, ECC, replication).
Key-Value Stores and NoSQL Databases: LSM-based key-value stores (RocksDB, LevelDB, HBase), graph databases (Nebula, Neo4j), indexing for storage and database systems (hybrid, R-tree, B-Tree, and ART), caching optimizations for key-value stores and databases (tiered, persistent, and disaggregated).
Disaggregated Data Infrastructure: Disaggregated storage systems, disaggregated memory (with CXL and RDMA), cloud storage, distributed object storage.
Storage Systems for Emerging and Sustainable Devices: Non-Volatile Memory (NVM), Shingled Magnetic Recording (SMR), Interlaced Magnetic Recording (IMR), Zoned Namespace SSDs (ZNS SSDs), and DNA storage.
[ICDE'26] Fei Shao, Jia Zou, Zhichao Cao, Xusheng Xiao. “PROGQL: A Provenance Graph Query System for Cyber Attack Investigation.” The 42nd IEEE International Conference on Data Engineering (ICDE), To Appear.
[EMNLP’25] Yuhang Chen, Zhen Tan, Ajay Kumar Jaiswal, Huaizhi Qu, Xinyu Zhao, Qi Lin, Yu Cheng, Andrew Kwong, Zhichao Cao, Tianlong Chen. “Bit-Flip Error Resilience in LLMs: A Comprehensive Analysis and Defense Framework.” The 2025 Conference on Empirical Methods in Natural Language Processing (EMNLP), Main Conference, To Appear.
[HPDC'25 (Best Student Paper Award!)] Chang Guo, Ning Yan, Lipeng Wan, Zhichao Cao. "LegoIndex: A Scalable and Modular Indexing Framework for Efficient Analysis of Extreme-Scale Particle Data". The 34th ACM International Symposium on High-Performance Parallel and Distributed Computing (HPDC), Research Track Full Paper, 2025.
[HotStorage’25] Chang Guo, Norbert Podhorszki, Greg Eisenhauer, Zhiwen Xie, Scott Klasky, Zhichao Cao. “Unlocking the Unusable: A Proactive Caching Framework for Reusing Partial Overlapped Data.” The 17th ACM Workshop on Hot Topics in Storage and File Systems (HotStorage), 2025.
[SIGMOD'25] Viraj Thakkar, Dongha Kim, Hokeun Kim, Zhichao Cao. "SHIELD: Encrypting Persistent Data of LSM-KVS from Monolithic to Disaggregated Storage". Proceedings of ACM Conference on Management of Data (SIGMOD), Research Track Full Paper, 2025.
[SOSP'24] Shushu Yi, Shaocong Sun, Li Peng, Yingbo Sun, Ming-Chang Yang, Zhichao Cao, Qiao Li, Myoungsoo Jung, Ke Zhou, Jie Zhang. "BIZA: Design of Self-Governing Block-Interface ZNS AFA for Endurance and Performance". The 30th ACM Symposium on Operating Systems Principles (SOSP 2024).
[TC'24] Yixun Wei, Zhichao Cao, David HC Du. “CPI: A Collaborative Partial Indexing Design for Large-Scale Deduplication Systems.” IEEE Transactions on Computers, November 2024.
[HotStorage’24 (Best Paper Award!)] Viraj Thakkar, Madhumitha Sukumar, Jiaxin Dai, Kaushiki Singh, Zhichao Cao. “Can Modern LLMs Tune and Configure LSM-based Key-Value Stores?” 16th ACM Workshop on Hot Topics in Storage and File Systems (HotStorage), 2024.
[HotStorage’24] Chongzhuo Yang, Zhang Cao, Chang Guo, Ming Zhao, Zhichao Cao. “Can ZNS SSDs be Better Storage Devices for Persistent Cache?” 16th ACM Workshop on Hot Topics in Storage and File Systems (HotStorage), 2024.
[MSST'24] Zhang Cao, Chang Guo, Ziyuan Lv, Anand Ananthabhotla, Zhichao Cao. "SAS-Cache: A Semantic-Aware Secondary Cache for LSM-based Key-Value Stores". The 38th International Conference on Massive Storage Systems and Technology (MSST), Research Track Full Paper, 2024.
[MSST'24] Gaoji Liu, Chongzhuo Yang, Qiaolin Yu, Chang Guo, Wen Xia, Zhichao Cao. "Prophet: Optimizing LSM-Based Key-Value Store on ZNS SSDs with File Lifetime Prediction and Compaction Compensation". The 38th International Conference on Massive Storage Systems and Technology (MSST), Research Track Full Paper, 2024.
[SIGMOD'24] Qiaolin Yu, Chang Guo, Jay Zhuang, Viraj Thakkar, Jianguo Wang, Zhichao Cao. "CaaS-LSM: Compaction-as-a-Service for LSM-based Key-Value Stores in Storage-Disaggregated Infrastructure". Proceedings of ACM Conference on Management of Data (SIGMOD), Research Track Full Paper, 2024.
[ICCD'23] Zhichao Cao, Hao Wen, Fenggang Wu, David H.C. Du. "SMRTS: A Performance and Cost-Effectiveness Optimized SSD-SMR Tiered File System with Data Deduplication". The 41st IEEE International Conference on Computer Design (ICCD) (Acceptance rate: 28%), Research Track Full Paper, 2023.
[ICCD'23] Hao Wen, Zhichao Cao, Bingzhe Li, David Du, Ayman Abouelwafa, Doug Voigt, Shiyong Liu, Jim Diehl, and Fenggang Wu. "K8sES: Optimizing Kubernetes with Enhanced Storage Service-Level Objectives". The 41st IEEE International Conference on Computer Design (ICCD) (Acceptance rate: 28%), Research Track Full Paper, 2023.
[TOS’22] Zhichao Cao, Huibing Dong, Yixun Wei, Shiyong Liu, and David H.C. Du. “IS-HBase: An In-Storage Computing Optimized HBase with I/O Offloading and Self-Adaptive Caching in Compute-Storage Disaggregated Infrastructure.” ACM Transactions on Storage, Volume 18, Issue 2, May 2022.
[TOS’22] Hiwot Tadese Kassa, Jason Akers, Mrinmoy Ghosh, Zhichao Cao, Vaibhav Gogte, Ronald Dreslinski. “Power-optimized Deployment of Key-value Stores Using Storage Class Memory.” ACM Transactions on Storage, Volume 18, Issue 2, May 2022.
[TOS’22] Xiongzi Ge, Zhichao Cao, and David H.C. Du. “HintStor: A Framework to Study I/O Hints in Heterogeneous Storage.” ACM Transactions on Storage, Volume 18, Issue 2, May 2022.
[ATC’21] Hiwot Tadese Kassa, Jason Akers, Mrinmoy Ghosh, Zhichao Cao, Vaibhav Gogte, Ronald Dreslinski. “Improving Performance of Flash Based Key-Value Stores Using Storage Class Memory as a Volatile Memory Extension.” 2021 USENIX Annual Technical Conference, 2021 (Acceptance rate: 64/341=19% as Full Paper).
[FAST’20] Zhichao Cao, Siying Dong, Sagar Vemuri, and David H.C. Du. “Characterizing, Modeling, and Benchmarking RocksDB Key-Value Workloads at Facebook.” 18th USENIX Conference on File and Storage Technologies, 2020 (Acceptance rate: 23/138=17% as Full Paper).
[FAST’19] Zhichao Cao, Shiyong Liu, Fenggang Wu, Guohua Wang, Bingzhe Li, and David H.C. Du. “Sliding Look-Back Window Assisted Data Chunk Rewriting for Improving Deduplication Restore Performance.” 17th USENIX Conference on File and Storage Technologies, 2019 (Acceptance rate: 26/145=18% as Full Paper).
[TOS’19] Zhichao Cao, Hao Wen, Xiongzi Ge, and David H.C. Du. “TDDFS: A Tier-aware Data Deduplication based File System.” ACM Transactions on Storage, 2019.
[FAST’18] Zhichao Cao, Hao Wen, Fenggang Wu, and David H.C. Du. “ALACC: Accelerating Restore Performance of Data Deduplication Systems Using Adaptive Look Ahead Window Assisted Chunk Caching.” 16th USENIX Conference on File and Storage Technologies, 2018 (Acceptance rate: 23/139=17% as Full Paper).
Posters and Work-in-Progress
[FAST’25] Chang Guo, Zhenyu Zhang, Zhichao Cao. "EverCache: A Multi-Tier KVCache Engine for High-Performance and High-Efficiency LLMs Inferencing". 23rd USENIX Conference on File and Storage Technologies, 2025.
[FAST’25] Jiajun Li, Chang Guo, Zhichao Cao. "AnyTier: An LSM-Managed Dynamic Data Tiering Framework with High Generality and Efficiency". 23rd USENIX Conference on File and Storage Technologies, 2025.
[FAST’25] Yibo Zhao, Viraj Thakkar, Zhichao Cao, Zaoxing Liu. "NetLSM: Enabling an In-Network Approach for Scheduling LSM-KVS Operations". 23rd USENIX Conference on File and Storage Technologies, 2025.
[FAST’24] Madhumitha Sukumar, Jiaxin Dai, Kaushiki Singh, Vikriti Lokegaonkar, Viraj Thakkar, Zhichao Cao. “LLM-assisted Automatic-Configuration and Tuning Framework for LSM-based Key-Value Stores.” 22nd USENIX Conference on File and Storage Technologies, 2024.
[FAST’23] Kritshekhar Jha, Ian Mcdonough, Alexander Sutila, Zhichao Cao, and Ming Zhao. “DM-ZCache: Zoned Namespace (ZNS) SSD based Caching.” 21st USENIX Conference on File and Storage Technologies, 2023.
[FAST’23] Jinghuan Yu, Yixun Wei, Zhichao Cao, David H.C. Du, and Chun Jason Xue. “Level-based Shard Migration in Distributed LSM KV Store.” 21st USENIX Conference on File and Storage Technologies, 2023.
[FAST’17] Zhichao Cao, Fenggang Wu, Hao Wen, and David H.C. Du. “Optismr: Restore-Performance Optimization for Deduplication Systems Using SMR Drives.” 15th USENIX Conference on File and Storage Technologies, 2017.
[FAST’17] Hao Wen, Zhichao Cao, Yang Zhang, and David H.C. Du. “Guaranteed QoS with Integrated Control for Networked Storage.” 15th USENIX Conference on File and Storage Technologies, 2017.
[SoCC’14] Xiongzi Ge, Zhichao Cao, and David H.C. Du. “OneStore: Integrating Local and Cloud Storage with Access Hints.” ACM Symposium on Cloud Computing, 2014.
Courses
2026 Spring
| Course Number | Course Title |
|---|---|
| CSE 792 | Research |
| CSE 799 | Dissertation |
| CSE 330 | Operating Systems |
| CSE 330 | Operating Systems |
2025 Fall
| Course Number | Course Title |
|---|---|
| CSE 599 | Thesis |
| CSE 792 | Research |
| CSE 795 | Continuing Registration |
| CSE 799 | Dissertation |
| CSE 580 | Practicum |
| CSE 584 | Internship |
| CSE 790 | Reading and Conference |
2025 Summer
| Course Number | Course Title |
|---|---|
| CSE 584 | Internship |
2025 Spring
| Course Number | Course Title |
|---|---|
| CSE 792 | Research |
| CSE 799 | Dissertation |
| CSE 792 | Research |
| CSE 330 | Operating Systems |
| CSE 330 | Operating Systems |
| CSE 593 | Applied Project |
2024 Fall
| Course Number | Course Title |
|---|---|
| CSE 792 | Research |
| CSE 799 | Dissertation |
| CSE 580 | Practicum |
| CSE 511 | Data Processing at Scale |
2024 Summer
| Course Number | Course Title |
|---|---|
| CSE 584 | Internship |
| CSE 792 | Research |
| CSE 584 | Internship |
| CSE 792 | Research |
| CSE 330 | Operating Systems |
| CSE 330 | Operating Systems |
| CSE 330 | Operating Systems |
| CSE 330 | Operating Systems |
2024 Spring
| Course Number | Course Title |
|---|---|
| CSE 599 | Thesis |
| CSE 792 | Research |
| CSE 580 | Practicum |
| CSE 330 | Operating Systems |
| CSE 330 | Operating Systems |
2023 Fall
| Course Number | Course Title |
|---|---|
| CSE 792 | Research |
| CSE 580 | Practicum |
| CSE 792 | Research |
| CSE 330 | Operating Systems |
| CSE 330 | Operating Systems |
| CSE 330 | Operating Systems |
| CSE 330 | Operating Systems |
| CSE 330 | Operating Systems |
2023 Summer
| Course Number | Course Title |
|---|---|
| CSE 584 | Internship |
| CSE 584 | Internship |
| CSE 599 | Thesis |
2023 Spring
| Course Number | Course Title |
|---|---|
| CSE 599 | Thesis |
| CSE 580 | Practicum |
| CSE 511 | Data Processing at Scale |
2022 Fall
| Course Number | Course Title |
|---|---|
| CSE 580 | Practicum |
| CSE 330 | Operating Systems |
| CSE 330 | Operating Systems |
| CSE 330 | Operating Systems |
| CSE 330 | Operating Systems |
| CSE 330 | Operating Systems |
2022 Spring
| Course Number | Course Title |
|---|---|
| CSE 511 | Data Processing at Scale |
- NSF CAREER Award, 2025
- ACM HPDC 2025 Best Student Paper Award, 2025
- Distinguished Reviewers Board of ACM Transactions on Database Systems, 2025
- ACM HotStorage 2024 Best Paper Award, 2024
- VLDB 2024 Distinguished Reviewer Award, 2024
Editorships
- Associate Editor: ACM Transactions on Storage
Conference Organizer
- Mentorship Program Co-Chair of USENIX FAST 2025
- Publicity Co-Chair of ACM HotStorage 2025
- Session Chair of USENIX ATC 2024
- Session Chair of ACM HotStorage 2024
- Publicity Co-Chair of MSST 2024
- Session Chair of IEEE ICCD 2023
- Session Chair of ACM SIGMOD 2023
- Proceedings Co-Chair of ACM SIGMOD 2023
- Virtual Chair of ACM HotStorage 2022
Technical Program Committees
- Program Committee of USENIX FAST 2027
- Program Committee of ACM SIGMOD 2027
- Program Committee of SC 2026
- Program Committee of USENIX FAST 2026
- Program Committee of ACM SIGMOD 2026
- Program Committee of IEEE ICDE 2026
- Program Committee of USENIX ATC 2025
- Program Committee of USENIX FAST 2025
- Program Committee of ACM SIGMOD 2025
- Program Committee of VLDB 2025
- Program Committee of IEEE ICDCS 2025
- Program Committee of ACM HotStorage 2025
- Program Committee of USENIX ATC 2024
- Program Committee of ACM SIGMOD 2024 (Demo track)
- Program Committee of VLDB 2024
- Program Committee of ACM HotStorage 2024
- Program Committee of ACM SYSTOR 2024
- Program Committee of ACM SIGMOD 2023
- Program Committee of ICPP 2023
- Program Committee of ACM HotStorage 2023
- Program Committee of IEEE NAS 2022
- Program Committee of ACM APSys 2022
Journal Reviewing
- Reviewer of ACM Transactions on Storage (TOS), 2022, 2023, 2024, 2025
- Reviewer of ACM Transactions on Database Systems (TODS), 2024, 2025
- Reviewer of IEEE Transactions on Computers (TC), 2023, 2024
- Reviewer of ACM Transactions on Architecture and Code Optimization (TACO), 2024, 2025
- Reviewer of IEEE Micro, 2024, 2025
- Reviewer of IEEE/ACM Transactions on Networking (TON), 2022, 2023, 2024
- Reviewer of IEEE Transactions on Computer-Aided Design of Integrated Circuits and Systems (TCAD), 2022
- Reviewer of International Journal of Future Generation Computer Systems, 2021
- Reviewer of IEEE Transactions on Cloud Computing, 2022
- Reviewer of Computer Communications, 2022
- Reviewer of IEEE Intelligent Systems, 2022
- Volunteer of International Conference on Parallel Processing (ICPP’14)