NISHANT RANJAN edited this page Dec 9, 2025 · 12 revisions

What is BlobFuse?

BlobFuse is an open-source virtual file system driver that enables seamless integration of Azure Blob Storage with Linux environments. It allows users to mount Azure storage account containers as a file system, making blob data accessible through standard Linux file operations. BlobFuse translates these operations into Azure Blob REST API calls, allowing your applications to leverage the scalability and durability of Azure Blob Storage.
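To make the translation concrete, here is a minimal Python sketch of how a few file operations could map onto Blob REST operations (Get Blob, Get Blob Properties, List Blobs, Put Blob, Delete Blob). It is not BlobFuse's actual implementation. The account and container names are hypothetical, and authentication, retries, caching, and block-based uploads of large files are deliberately left out.

```python
from urllib.parse import quote, urlencode

# Hypothetical account and container names, used only for illustration.
ACCOUNT = "myaccount"
CONTAINER = "mycontainer"
BASE = f"https://{ACCOUNT}.blob.core.windows.net/{CONTAINER}"


def to_rest_call(op, file_path, offset=0, length=0):
    """Map a simplified file operation to (HTTP method, URL, extra headers).

    Conceptual only: auth, retries, caching, and large-file uploads
    (Put Block / Put Block List) are not modeled.
    """
    name = file_path.strip("/")  # path inside the mount is used as the blob name
    blob_url = f"{BASE}/{quote(name)}"
    if op == "read":
        # Get Blob with a byte range; the end offset is inclusive.
        if length <= 0:
            raise ValueError("length must be positive")
        return "GET", blob_url, {"x-ms-range": f"bytes={offset}-{offset + length - 1}"}
    if op == "getattr":
        # Get Blob Properties (HEAD) returns size, last-modified time, etc.
        return "HEAD", blob_url, {}
    if op == "readdir":
        # List Blobs with a prefix and "/" delimiter emulates one directory level.
        params = {"restype": "container", "comp": "list"}
        if name:
            params["prefix"] = name + "/"
        params["delimiter"] = "/"
        return "GET", f"{BASE}?{urlencode(params)}", {}
    if op == "write":
        # Put Blob uploads a whole block blob in a single request.
        return "PUT", blob_url, {"x-ms-blob-type": "BlockBlob"}
    if op == "unlink":
        # Delete Blob removes the blob.
        return "DELETE", blob_url, {}
    raise ValueError(f"unsupported operation: {op}")
```

For example, `to_rest_call("read", "/data/train.csv", 0, 1024)` yields a ranged `GET` for the first 1024 bytes of the blob `data/train.csv`.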

BlobFuse provides several caching mechanisms (file, metadata, and attribute caching) to improve performance and reduce network traffic and the associated charges. Users can configure the cache location, size, and retention policy to suit their workload.
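The interplay between cache size and retention can be modeled with a small in-memory sketch: entries idle for longer than a timeout are dropped, and the least recently used entries are evicted once the total size exceeds a limit. This is a conceptual illustration only. The class and parameter names are hypothetical, and BlobFuse's real file cache keeps data on local disk at the configured cache path, with its actual behavior governed by its own configuration.

```python
import time
from collections import OrderedDict


class LocalFileCache:
    """Toy size-bounded, time-limited cache (hypothetical, kept in memory)."""

    def __init__(self, max_size_bytes, timeout_sec, fetch, clock=time.monotonic):
        self.max_size = max_size_bytes
        self.timeout = timeout_sec
        self.fetch = fetch            # callable(path) -> bytes; stands in for a blob download
        self.clock = clock
        self.entries = OrderedDict()  # path -> (data, last_access); least recent first
        self.used = 0

    def _drop(self, path):
        data, _ = self.entries.pop(path)
        self.used -= len(data)

    def _evict(self):
        now = self.clock()
        # Retention: drop entries idle for longer than the timeout.
        for path in [p for p, (_, t) in self.entries.items() if now - t > self.timeout]:
            self._drop(path)
        # Size limit: evict least recently used entries until under the cap.
        while self.used > self.max_size and self.entries:
            self._drop(next(iter(self.entries)))

    def read(self, path):
        self._evict()
        if path in self.entries:
            data, _ = self.entries.pop(path)  # hit: served locally, no network call
        else:
            data = self.fetch(path)           # miss: download from storage
            self.used += len(data)
        self.entries[path] = (data, self.clock())  # mark as most recently used
        self._evict()
        return data
```

A larger size limit keeps more of the working set local, while a longer timeout keeps rarely used files around longer; both trade local disk space for fewer downloads.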

Note: BlobFuse here refers to BlobFuse v2, as we are retiring BlobFuse v1 by September 2026.

Key use cases:

Model training and checkpointing for AI and ML - BlobFuse speeds up AI/ML workflows by providing fast, cached access to multi-petabyte datasets in Azure Blob Storage. It allows compute nodes (VMs, containers, AKS pods) to efficiently load training data and save model checkpoints. Preloading data with BlobFuse2 ensures quick access before training starts, helping optimize GPU usage. BlobFuse has been validated with distributed ML frameworks such as PyTorch and Ray for greater workflow portability.
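A simple way to approximate preloading from the application side is to read each file once before training. The sketch below walks a directory and reads every file in parallel, so that a mount with file caching enabled can pull the data onto local disk ahead of time. It assumes such caching is configured and that the directory (for example, a hypothetical `/mnt/blobfuse/train`) is already mounted; it is not BlobFuse2's built-in preload mechanism, and `warm_cache` is a hypothetical helper name.

```python
import os
from concurrent.futures import ThreadPoolExecutor


def _read_fully(path, chunk_size=4 * 1024 * 1024):
    """Read a file to the end in chunks and return the number of bytes read."""
    total = 0
    with open(path, "rb") as f:
        while chunk := f.read(chunk_size):
            total += len(chunk)
    return total


def warm_cache(root, workers=8):
    """Read every file under `root` once; returns the total bytes read."""
    paths = [os.path.join(d, name) for d, _, files in os.walk(root) for name in files]
    # Parallel reads overlap network latency when the data is not yet cached.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return sum(pool.map(_read_fully, paths))
```

A training script could call `warm_cache("/mnt/blobfuse/train")` before its first epoch and then open files with ordinary file I/O.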

High-Performance Computing (HPC) - Enables rapid, scalable access to Azure Blob Storage in HPC settings, supporting efficient data processing across domains such as:

  • Autonomous driving workloads (ADAS) using Azure Kubernetes Service (AKS), leveraging BlobFuse for large-scale simulation and model training data.

  • Hydrofoil simulations, where BlobFuse manages computational files and results for streamlined engineering analysis.

  • Genomic sequencing, benefiting from BlobFuse’s ability to handle large datasets and accelerate data sharing.

  • Gaming simulations, relying on quick data access with BlobFuse to boost parallel processing and scale complex scenarios.

Cloud-Native Workload Integration - BlobFuse is used as a persistent storage layer for containers and stateful workloads in Kubernetes through the Azure Blob Storage CSI driver. It lets applications share large files, model weights, or logs using Azure Blob Storage’s scalable capacity, and is well suited to read-write or read-only access shared across many pods (the ReadWriteMany and ReadOnlyMany access modes) in shared cluster scenarios.
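With the CSI driver, a workload typically requests storage through a PersistentVolumeClaim. The Python sketch below builds such a claim as a plain dictionary and prints it as JSON, which kubectl accepts as a manifest. It assumes the Azure Blob Storage CSI driver is enabled on an AKS cluster and that the built-in storage class `azureblob-fuse-premium` is available; confirm the class name with `kubectl get storageclass`, and treat the helper `blob_pvc`, the claim name, and the size as placeholders.

```python
import json


def blob_pvc(name, size="100Gi", read_only=False,
             storage_class="azureblob-fuse-premium"):
    """Build a PersistentVolumeClaim manifest for a BlobFuse-backed volume."""
    return {
        "apiVersion": "v1",
        "kind": "PersistentVolumeClaim",
        "metadata": {"name": name},
        "spec": {
            # Many pods can mount the same volume; choose the mode per workload.
            "accessModes": ["ReadOnlyMany" if read_only else "ReadWriteMany"],
            "storageClassName": storage_class,  # assumed AKS built-in class
            "resources": {"requests": {"storage": size}},
        },
    }


if __name__ == "__main__":
    # Example: python pvc.py | kubectl apply -f -
    print(json.dumps(blob_pvc("model-weights", read_only=True), indent=2))
```

Pods then reference the claim by name in their `volumes` section and mount it like any other persistent volume.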

Big Data Analytics/AI training data pre-processing - Enhances analytics workloads by integrating with tools like Hadoop and Spark for efficient data storage and retrieval. BlobFuse is also useful for preparing AI training data stored in blobs, such as cleaning, validating, and pre-processing it.
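As an illustration of pre-processing through the mount, the sketch below streams a CSV file, drops rows with missing required fields, and writes the remaining rows to a new file using only Python's standard `csv` module. The helper name `clean_csv`, the field names, and the paths (for example, a hypothetical `/mnt/blobfuse/raw/events.csv` as input and `/mnt/blobfuse/clean/events.csv` as output) are assumptions. Writing to a new file, rather than editing the input in place, keeps the raw data untouched.

```python
import csv


def clean_csv(src_path, dst_path, required_fields):
    """Copy rows from src to dst, skipping rows with an empty required field.

    Returns (rows_kept, rows_dropped).
    """
    kept = dropped = 0
    with open(src_path, newline="") as src, open(dst_path, "w", newline="") as dst:
        reader = csv.DictReader(src)
        writer = csv.DictWriter(dst, fieldnames=reader.fieldnames)
        writer.writeheader()
        for row in reader:
            # Validation rule: every required field must be present and non-blank.
            if all((row.get(f) or "").strip() for f in required_fields):
                writer.writerow(row)
                kept += 1
            else:
                dropped += 1
    return kept, dropped
```

The returned counts give a quick data-quality signal that can be logged per file.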

Data backup and archiving - BlobFuse streamlines the backup and archiving of large datasets by allowing them to be stored directly in Azure Blob Storage. It supports major backup tasks, such as Oracle RMAN database backups and enterprise system backups, and provides secure, scalable storage for surveillance video, reducing manual data management.
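For file-level backups, a job can write archives straight into a mounted container. The Python sketch below packs a directory into a timestamped `.tar.gz` under a destination directory; pointed at a BlobFuse mount (for example, a hypothetical `/mnt/blobfuse/backups`), the archive ends up stored as a blob. The helper name `archive_to_mount`, the paths, and the naming scheme are assumptions, and this is a generic archive job, not a replacement for database-aware tools such as RMAN.

```python
import os
import tarfile
import time


def archive_to_mount(src_dir, dest_dir, prefix="backup"):
    """Write a timestamped .tar.gz of src_dir into dest_dir and return its path."""
    stamp = time.strftime("%Y%m%d-%H%M%S", time.gmtime())  # UTC timestamp
    dest = os.path.join(dest_dir, f"{prefix}-{stamp}.tar.gz")
    with tarfile.open(dest, "w:gz") as tar:
        # arcname keeps paths in the archive relative to the source folder name.
        tar.add(src_dir, arcname=os.path.basename(os.path.normpath(src_dir)))
    return dest
```

Producing one large, sequentially written file per run is a simple pattern for backups, and the timestamp in the name keeps successive archives from overwriting each other (unless two runs start within the same second).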

About the BlobFuse2 open source project

BlobFuse2 is an open source project that uses the libfuse open source library (fuse3) to communicate with the Linux FUSE kernel module. BlobFuse2 implements file system operations by using the Azure Storage REST APIs.

The open source BlobFuse2 project is on GitHub:
