Module 1, Lesson 1:

FlashBlade: why we need a new category of storage

  • Challenge: an industry shift in unstructured data opened a compelling opportunity for a new storage approach, beyond the solutions that existed at the time of the talk. The speaker teases that traditional products never solved unstructured-data problems in the way modern big data applications require.
  • Goal: introduce FlashBlade before diving into hardware, framing it as a response to evolving workloads and the need for high scale with high performance.

Unstructured data challenges and evergreen storage: quick intro

  • Unstructured data history starts with file servers; the speaker notes a nuance: it’s not only “DAS” (direct-attached storage) but a broader evolution.
  • In the very early IT days, storage was directly attached to client systems. Over time, the idea of client-server storage emerged: data lived on a server, accessed by clients over the network (the file server concept).
  • Benefits of centralized file servers:
    • IT gains control of data, enabling backups and protection.
    • Security advantages: data protected with server-side controls; if a client PC is stolen, sensitive data is less exposed.
    • Data can be shared across many clients; data appears local to each client but is actually stored remotely.
  • Early networking partnerships: NFS as a protocol enabling shared file access across clients.
  • Scale-up NAS evolution popularized by NetApp: introduced the concept of a dedicated file server appliance (a “filer”).
    • Filers are optimized for file serving: faster, more secure, scalable for file workloads, using familiar file protocols; easier integration with existing file-serving workflows.
    • Filers typically cluster for high availability (e.g., at least two controllers, with failover if one fails).
  • Small environments vs. larger environments:
    • In small environments, a traditional file server or entry-level NAS is often enough.
    • In larger environments, a specialized NAS appliance provides value through consolidation and advanced file services (snapshots, clustering).
  • EMC Celerra as a practical example in pre-sales discussions: the value of consolidating six to eight file servers onto a single system.
  • Today, FlashArray has gained NAS capabilities: FlashArray can serve file data and block data on the same system; described as a unified system. Rationale: file-serving workloads remain popular for many customers.
  • The problem statement: if you grow beyond what a single scale-up NAS can handle, you run into a scale limit where you’re effectively operating one powerful controller plus a secondary for failover. You may need multiple NAS devices, leading to sprawl.
  • File virtualization attempts to solve scale-out needs by presenting a single NAS view while backing it with multiple underlying NAS devices. Approaches include:
    • DFS (Distributed File System) from Microsoft.
    • Acopia (a hardware/software solution for generic file virtualization).
    • Analogy to IBM SVC (SAN Volume Controller) but for file storage instead of block storage.
  • Limitations of virtualization approaches:
    • Admin complexity grows because you manage virtualization plus multiple NAS devices.
    • Scale-out performance is not always even; load balancing can be challenging.
    • The admin burden can be high, and some solutions struggle at very high scales.
  • Traditional scale-out NAS (Isilon/PowerScale) introduced integrated virtualization within the cluster:
    • Nodes provide storage, compute, networking, and memory.
    • To scale, you add nodes; nodes connect over a private, high-performance network.
    • Virtualization is built into the system, so clients get a simple view while the backend distributes data across nodes.
    • Load balancing in large clusters is handled within the cluster, often easing the admin burden.
  • Recap of the ladder of options before FlashBlade:
    • Small scale: direct-attached storage (DAS) or file server.
    • Reasonable scale: scale-up NAS (FlashArray with NAS capabilities or a similar product).
    • Very large scale: traditional scale-out NAS, or specialized virtualization appliances (DFS/Acopia).
    • Consolidation: Isilon/PowerScale and Qumulo-like approaches offer scale-out NAS with integrated virtualization.
  • The big question: why FlashBlade? The hint is that a new type of application emerged requiring both high scale and very high performance, which traditional approaches weren’t optimized for.
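The client-server file model described above can be made concrete with a minimal NFS mount; this is a generic config sketch, and the server name, export path, and mount point below are hypothetical, not from the talk.

```shell
# Mount a remote NFS export so its data appears local to the client
# (hypothetical server "filer01" and export "/export/shared"):
sudo mount -t nfs filer01:/export/shared /mnt/shared

# Equivalent persistent entry in /etc/fstab:
# filer01:/export/shared  /mnt/shared  nfs  defaults,_netdev  0  0
```

Every client that mounts the same export sees the same files, which is exactly the sharing and centralized-control benefit the notes attribute to file servers.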

The modern unstructured data landscape: different data needs, different architectures

  • Traditional architectures were built around different target workloads:
    • Backup and data warehousing: structured data, batch processing, random reads, scale-up architectures.
    • Data lakes: unstructured data, batch processing, mostly sequential processing, scale-out compute with scale-out storage in mind.
    • Streaming analytics: unstructured data, micro-batch and real-time processing; architecture emphasizes multi-dimensional scalability (scale up and scale out).
    • AI pipelines: extremely parallel workloads, real-time to near-real-time data access; GPUs become central, demanding massive parallelism and high data throughput.
  • AI and GPUs:
    • GPUs enable massive parallel computation; high value and cost require high utilization.
    • With big data AI workloads, you want storage that doesn’t bottleneck GPU compute and data scientists; FlashBlade is framed as a solution to keep GPUs busy by delivering data quickly and in parallel.
  • The takeaway: modern big data applications demand a single platform that can handle rapid data protection/restores, analytics, AI, data lakes, EDA, and DevOps pipelines, with a simple and scalable experience.
  • FlashBlade was designed to support these workloads in parallel, consolidating modern workloads onto one centralized platform while staying simple to manage.
  • The FlashBlade roadmap emphasized evergreen upgrades (see below) to avoid disruptive migrations and keep systems current with evolving workloads.
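The "keep GPUs busy" point above comes down to many concurrent readers pulling shards of one shared dataset. The following is a toy Python sketch of that access pattern, not FlashBlade's implementation; `read_shard` is a hypothetical stand-in for a read over NFS or S3.

```python
from concurrent.futures import ThreadPoolExecutor

def read_shard(shard_id: int) -> bytes:
    # Hypothetical stand-in for fetching one data shard from shared
    # storage; a real pipeline would read over NFS or an object API.
    return bytes([shard_id % 256]) * 1024  # 1 KiB per shard

def load_batch(shard_ids):
    # Many concurrent readers against one shared dataset: the access
    # pattern that keeps parallel compute (e.g., GPUs) fed with data.
    with ThreadPoolExecutor(max_workers=8) as pool:
        return list(pool.map(read_shard, shard_ids))

shards = load_batch(range(32))
print(len(shards), sum(len(s) for s in shards))  # → 32 32768
```

The storage side must sustain this fan-out without a single-controller bottleneck, which is the scale-out requirement the notes keep returning to.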

FlashBlade: core concepts and differentiators

  • Unified Fast File and Object (UFFO): the new category name for a platform that supports both file and object storage on a single system, aimed at modern big data applications.
    • Purpose: to consolidate file and object workloads that previously lived in silos, enabling high performance at scale with a simple management model.
  • Applications targeted by FlashBlade (examples mentioned):
    • Rapid data protection and rapid restore
    • Modern analytics (e.g., Splunk, Elastic)
    • AI workloads
    • Data lakes (healthcare imaging, enterprise imaging)
    • Electronic design automation (EDA) involving microchip design with huge transistor layouts
    • Software development and DevOps pipelines where fast builds and tests matter for CI/CD
    • Container-based applications and environments where parallelizable workloads scale out
  • Architecture goals:
    • Centralized, scalable storage for modern big data workloads
    • Parallel access from many clients for both file and object interfaces
    • High performance at scale, not just high capacity
    • Simple administration: designed so admins don't need to be storage experts; emphasis on ease of use for data scientists and developers
    • Evergreen model: continuous improvement without disruptive migrations or maintenance windows
  • Evergreen value proposition:
    • Day-one performance and capability, with ongoing improvements over time through non-disruptive upgrades
    • Features and enhancements delivered via subscription; customers pay for features as they are deployed
    • Continuous hardware and software evolution: faster memory, CPUs, networking, etc., with upgrades that can be applied while the system is running
  • The S and E product variants:
    • FlashBlade S (S-variant): performance-optimized at scale; higher cost but greater capacity flexibility
    • FlashBlade E (E-variant): lower cost per GB; lower peak performance but the same architecture and evergreen model
    • Both variants are managed through Pure1, run Purity//FB, and are evergreen, so upgrades apply to both without major migrations
  • Evergreen architecture and three architectural pillars:
    • Pillar 1: The system is an integrated hardware-and-software storage solution designed as a single product (not just software on generic hardware).
    • Pillar 2: Scale-out by adding blades; seamless horizontal scaling without complex re-architecting
    • Pillar 3: Simple, easy-to-use administration; the system is designed so admins without deep storage backgrounds can manage it; fully integrated storage stack
    • The speaker notes a missing bullet point on the slide; emphasizes the integrated nature and simplicity as core tenets
  • Why File and Object (instead of Block) for modern workloads:
    • Block storage is optimized for low latency to a single or few servers; not ideal when tens, hundreds, or thousands of clients access the same data set concurrently
    • File storage enables parallel access from many clients using traditional file protocols (e.g., NFS); Object storage offers parallel access and a RESTful API, better suited for application-centric access and cloud-like workflows
    • FlashBlade aims to deliver high performance for both file and object workloads simultaneously, addressing prior performance bottlenecks
  • Why not DAS, NAS, or object alone?
    • DAS benefits and limits: simple, but hard to scale; inefficient utilization at scale; difficult to manage in big data contexts
    • NAS/scale-up approaches: scalable, but limited by single-controller bottlenecks; growing beyond what one controller can handle leads to silos or sprawl
    • Traditional object storage and RESTful cloud-style stores: great for large-scale storage with flexible access patterns, but historically designed for moderate performance and object-level operations rather than the real-time, high-parallelism needs of AI, analytics, and data-intensive pipelines
    • The FlashBlade message: unify file and object workloads with an evergreen, scale-out, high-performance platform that avoids silos and migrations
  • How FlashBlade handles high-scale, high-performance requirements:
    • The platform is designed to be evergreen, with non-disruptive upgrades to both software and hardware components
    • High degree of parallelism: multiple clients accessing the same dataset concurrently without compromising performance
    • Data remains accessible and consistent during upgrades; no forced maintenance windows
  • Real-world implications and practical benefits:
    • Consolidation eliminates silos and reduces administrative overhead
    • Higher utilization of storage resources due to consolidation and parallel access
    • Investment protection through evergreen upgrades and continuous performance improvements
    • Simplified data management and governance for modern workloads, including AI and data science pipelines
  • Two important distinctions in deployment strategy:
    • FlashBlade S: higher performance, scale-out capability with more aggressive performance characteristics
    • FlashBlade E: lower cost per GB, lower peak performance but still robust for nearline workloads; both variants support the same software and evergreen model
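The file-versus-object contrast above can be illustrated with a toy model: object storage keeps a flat key namespace with per-object metadata, where "folders" are just key prefixes rather than filesystem directories. This is an illustrative sketch only; the class, keys, and metadata fields are hypothetical and unrelated to any real API.

```python
class ToyObjectStore:
    """Toy flat-namespace object store: key -> (data, metadata)."""

    def __init__(self):
        self._objects = {}  # no directories; the namespace is flat

    def put(self, key: str, data: bytes, **metadata):
        # Each object carries its own metadata, a core object-storage trait.
        self._objects[key] = (data, metadata)

    def get(self, key: str):
        return self._objects[key]

    def list_prefix(self, prefix: str):
        # "Folders" are emulated by filtering on key prefixes.
        return sorted(k for k in self._objects if k.startswith(prefix))

store = ToyObjectStore()
store.put("scans/2024/patient-001.dcm", b"...", modality="MRI", immutable=True)
data, meta = store.get("scans/2024/patient-001.dcm")
print(meta["modality"])                 # → MRI
print(store.list_prefix("scans/2024/")) # → ['scans/2024/patient-001.dcm']
```

Because every object is addressed by a flat key over a RESTful API, many clients can read and write in parallel without sharing filesystem state, which is why the notes pair object access with application-centric, cloud-like workflows.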

FlashBlade in context: Silo elimination, consolidation, and the new data era

  • The big picture: modern big data applications require both high performance and high scale; traditional architectures were designed for different workloads (e.g., media archives, basic file sharing, or batch data processing) and thus required workarounds (SSDs in older products, virtualization, etc.)
  • The consequence of legacy approaches:
    • DAS makes a comeback in some implementations due to hyper-converged strategies, spreading data across many servers and attempting to access data locally; this is complex, hard to manage, and often inefficient at scale
    • Sprawl emerges as teams create silos for different projects (data warehousing, AI, etc.) without a centralized platform to share data effectively
  • The FlashBlade promise in practice:
    • A single, scalable platform that can serve both file and object workloads with high performance
    • A replacement for many silos and bespoke setups with a unified, scalable architecture
    • An evergreen evolution path so the system remains current with the latest hardware and software capabilities without migrations
  • Applications cited as proof points for FlashBlade’s relevance:
    • Rapid restore for backups and disaster recovery scenarios
    • High-performance analytics and AI workflows (e.g., Splunk, Elastic, machine learning pipelines)
    • Data lakes and imaging (healthcare, enterprise) and EDA workflows
    • DevOps pipelines and CI/CD acceleration through faster builds and tests
  • The role of containers in modern workloads:
    • Containerized applications naturally align with FlashBlade’s scale-out architecture and parallel data access patterns
    • The platform’s design supports container-based workflows and microservices at scale

Summary and practical takeaways

  • FlashBlade represents a strategic shift from siloed, specialized storage tiers to a unified platform capable of handling modern big data workloads with high performance and high scale.
  • By combining file and object storage, FlashBlade addresses a broad set of use cases—from rapid data protection to AI and data lake workloads—without the need for migrations or complex re-architecting.
  • The evergreen model ensures that organizations remain competitive by continuously benefiting from hardware and software improvements over time, with non-disruptive upgrades.
  • The distinction between FlashBlade S and FlashBlade E offers a choice between maximum performance and lower cost per GB, while both target the same modern workloads on the same architecture.
  • In other words: FlashBlade aims to provide simplicity, consolidation, and continuous evolution in a single platform designed specifically for the demands of modern big data applications.

Key terms and concepts to remember

  • UFFO: Unified Fast File and Object storage category introduced by Pure.
  • FlashBlade S: performance-optimized at scale; higher cost, flexible capacity
  • FlashBlade E: economy option; nearline performance; lower cost per GB
  • Evergreen: non-disruptive upgrades and ongoing improvements over time
  • File virtualization (DFS, Acopia): attempts to present a single view of many underlying file systems
  • Isilon/PowerScale: traditional scale-out NAS with integrated virtualization
  • Data types/workloads: backups/restores, data warehousing, data lakes, streaming analytics, AI pipelines, EDA, and CI/CD pipelines
  • Block vs File vs Object: block is low-latency for single servers; file/object support parallel access for multi-client workloads; object uses RESTful APIs and flat namespaces with rich metadata and immutability
  • RESTful APIs, flat namespace, immutability, metadata: core attributes of object storage
  • Sprawl: proliferation of storage silos across projects or teams, reducing utilization and increasing management complexity
  • Celerra / Isilon / Qumulo: historical references to consolidation and scale-out NAS approaches
  • Three architectural pillars (summary): integrated system design, scale-out with blades, simple administration

Connections to broader concepts and real-world relevance

  • The evolution from DAS and file servers to NAS to scale-out NAS mirrors a broader theme in IT: centralization for control and efficiency vs. decentralization for local performance. FlashBlade represents a modern synthesis aimed at the needs of data-intensive, real-time workloads.
  • The shift to unified file and object aligns with trends in cloud-native and AI workloads, where applications expect RESTful interfaces, flat namespaces, and rich metadata for automated processing, search, and governance.
  • Evergreen upgrades reflect a broader industry move away from disruptive migrations toward continuous delivery models, improving TCO and reducing downtime risk for mission-critical workloads.
  • The discussion highlights practical implications about data governance, cost management, and vendor strategy: consolidation reduces silos, but organizations must consider licensing, upgrades, and total cost of ownership over multi-year horizons.
  • Ethical and practical considerations include ensuring data privacy, security, and compliance in a consolidated storage platform; the easier migration paths and upgrades should not bypass critical governance checks or introduce single points of failure through over-consolidation.