12–13 Mar 2026
Maison des Mines et des Ponts et Chaussées
Europe/Paris timezone

lustre-db: Scalable Metadata Analytics for Large-Scale Lustre Filesystems

13 Mar 2026, 09:25
25m
Maison des Mines et des Ponts et Chaussées

270 Rue Saint-Jacques, 75005 Paris
Presentation Session I: Operational Experiences and Aspects Session C

Speaker

Janos Zimmermann (German Climate Computing Center)

Description

We operate a 120 PiB Lustre filesystem with billions of inodes at the German Climate Computing Center (DKRZ). At the same time, climate and Earth system workflows are increasingly moving toward chunked, object-style formats such as Zarr. This shift enables scalable, cloud-aligned data access patterns, but it also dramatically increases inode counts because each chunk is stored as a separate file. As a result, traditional namespace traversals become slow, resource-intensive, and difficult to run continuously at scale.

We present lustre-db, a lightweight and scalable metadata analytics framework designed to persist and query the current state as well as the historical evolution of our Lustre filesystem. The system incrementally captures inode-level metadata changes and stores them in a columnar database (DuckDB), enabling efficient SQL-based analytics across billions of records.
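
The abstract does not give the schema or ingestion code, so the following is only a minimal sketch of the general idea: apply a stream of inode-level change records to a "current state" table while keeping every record in a "history" table, then answer usage questions with SQL. The record layout, the table names (`history`, `current`), the `/work/<project>/...` path convention, and the helper functions are all hypothetical. Python's built-in `sqlite3` is used here purely as a stand-in so the sketch runs without extra dependencies; lustre-db itself stores its data in DuckDB.

```python
import sqlite3

# Hypothetical change records: (seq, fid, op, path, size_bytes).
# Real Lustre changelog records look different; these are illustrative only.
CHANGES = [
    (1, "0x1", "CREATE", "/work/proj_a/data.zarr/0.0", 4096),
    (2, "0x2", "CREATE", "/work/proj_a/data.zarr/0.1", 8192),
    (3, "0x3", "CREATE", "/work/proj_b/out.nc", 1000),
    (4, "0x1", "UPDATE", "/work/proj_a/data.zarr/0.0", 5000),
    (5, "0x3", "UNLINK", "/work/proj_b/out.nc", 0),
]


def apply_changes(con, changes):
    """Append each record to `history` and fold it into `current`."""
    for seq, fid, op, path, size in changes:
        con.execute(
            "INSERT INTO history VALUES (?, ?, ?, ?, ?)",
            (seq, fid, op, path, size),
        )
        if op == "UNLINK":
            con.execute("DELETE FROM current WHERE fid = ?", (fid,))
        else:
            # Assumed layout: /work/<project>/..., so index 2 is the project.
            project = path.split("/")[2]
            con.execute(
                "INSERT OR REPLACE INTO current VALUES (?, ?, ?, ?)",
                (fid, project, path, size),
            )
    con.commit()


def build_db(changes):
    """Create an in-memory database and ingest the given change records."""
    con = sqlite3.connect(":memory:")
    con.execute(
        "CREATE TABLE history (seq INTEGER PRIMARY KEY, fid TEXT, op TEXT, "
        "path TEXT, size_bytes INTEGER)"
    )
    con.execute(
        "CREATE TABLE current (fid TEXT PRIMARY KEY, project TEXT, "
        "path TEXT, size_bytes INTEGER)"
    )
    apply_changes(con, changes)
    return con


def usage_by_project(con):
    """Return (project, inode_count, total_bytes) rows from the current state."""
    return con.execute(
        "SELECT project, COUNT(*), SUM(size_bytes) FROM current "
        "GROUP BY project ORDER BY project"
    ).fetchall()


if __name__ == "__main__":
    con = build_db(CHANGES)
    print(usage_by_project(con))  # → [('proj_a', 2, 13192)]
```

Because ingestion only touches the inodes named in the change records, the per-project summary can be refreshed without walking the namespace, and the `history` table still allows questions about past states.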

This talk introduces the architecture, data model, ingestion strategy, and performance characteristics of lustre-db in production, along with practical lessons learned from operating it at large scale.

Author

Janos Zimmermann (German Climate Computing Center)
