Second in a three-part series on the changing storage landscape.
Today I’ll jump back into my discussion of key storage technologies, turning my attention to what continues to be the heart and soul of the storage industry: SAN. (A quick note first: strictly speaking, the “SAN” is the network to which a disk array is connected, but I’ve found that many people refer to the disk array itself as “the SAN.”)
Let’s begin with shared external disk storage systems, aka Storage Area Network (SAN) Arrays
Despite the meteoric rise in unstructured data growth and file content, SAN arrays still account for a majority of the money spent worldwide on external disk storage. SAN connectivity options traditionally were based around Fibre Channel, but in recent years the broader adoption of 10GbE has firmly planted iSCSI as a mainstream choice. Fibre Channel over Ethernet (FCoE) rounds out the options, but has been a relatively slow starter in the market.
Independent of how you connect, SAN is all about the consolidation of multiple workloads into a shared pool of capacity. The way that capacity is used depends on the host that a given volume/LUN is presented to. The disk array itself simply presents the capacity. Any file system or OS requirement is driven by the host on the other side. The disk array itself is able to provide advanced data services such as multi-site replication and sub-LUN data tiering across different physical disk types. Applying these data services once at the array level can be much more efficient than managing them separately at the host level. This is particularly true in enterprise environments where you may have dozens if not hundreds of physical servers connected to the same disk array.
SAN growth, fueled by virtualization
One of the biggest drivers of SAN segment growth in the last 10 years has been virtualization. In addition to utilizing compute resources much more effectively, virtualization has provided the needed abstraction layer between hardware and workloads to provide workload mobility. Moving applications across physical hosts for maintenance, workload balancing, or disaster recovery requires shared storage. It will be interesting to see if hypervisor vendors follow the likes of MS Exchange and build more storage smarts into the hypervisor level. But for now, if you are running virtual servers, chances are that you are connecting to some sort of shared storage device.
One interesting riff on the SAN song and dance is that of the Virtual SAN Appliance (VSA): software that takes host-based DAS, pools that capacity together, and presents it back out to applications as shared capacity. This hybrid doesn’t fit exclusively in DAS or SAN, but is really a blending of the two. LeftHand Networks was one of the early pioneers in this space with its VSA solution. Originally positioned as a starter SAN for those not quite ready for dedicated hardware, it has recently found an entirely new market segment as large independent cloud providers look to build out shared infrastructure in a more cost-effective way. VMware has seen the appeal of this approach and recently started including a VSA within its vSphere product. While the VMware VSA is an early-generation product and therefore feature-limited compared to the LeftHand VSA, it is a good validation of this emerging space.
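To make the pooling idea concrete, here is a minimal sketch in Python. It is not how the LeftHand or VMware VSA actually works; the host names, the 1GB extent size, and the "place each extent on the host with the most free DAS" rule are all assumptions chosen for illustration. It only shows the basic shape: several hosts contribute local disk, and a volume is carved out of the combined pool.

```python
def carve_volume(das_free_gb, size_gb, extent_gb=1):
    """Carve one shared volume out of pooled host DAS (illustrative only).

    das_free_gb: dict of host name -> free local capacity in whole GB.
    Returns (layout, free_after), where layout maps each host to the GB
    of the volume it holds. The placement rule is a hypothetical one:
    each extent goes to the host with the most free capacity, with ties
    broken alphabetically by host name.
    """
    free = dict(das_free_gb)          # work on a copy, leave the input alone
    if size_gb > sum(free.values()):
        raise ValueError("pool does not have enough free capacity")

    layout = {host: 0 for host in free}
    remaining = size_gb
    while remaining > 0:
        # sorted() makes the tie-break deterministic (first name wins).
        host = max(sorted(free), key=free.get)
        chunk = min(extent_gb, remaining, free[host])
        free[host] -= chunk
        layout[host] += chunk
        remaining -= chunk
    return layout, free


if __name__ == "__main__":
    # Three hypothetical virtualization hosts with spare local disk.
    hosts = {"host-a": 4, "host-b": 2, "host-c": 2}
    layout, free_after = carve_volume(hosts, size_gb=5)
    print("volume layout:", layout)
    print("free after:   ", free_after)
```

The application sees a single 5GB volume; which host each piece physically lives on is the appliance's problem, not the application's. A real VSA would also have to protect against a host failing, which this sketch ignores.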
A solid case for SSD
Another emerging category of SAN is that of the solid-state disk (SSD) array. While functionally similar to traditional disk arrays in that they present shared capacity to multiple hosts, these solid state-optimized platforms are capable of reaching absurdly high IOPS with extremely low latency. For certain high-performance computing applications, these are a great pick. Mainstream computing environments, however, are more likely to benefit cost-wise from selecting storage that delivers robust data services and utilizes sub-LUN tiering (as discussed in this ESG white paper) to move “hot” data onto an SSD tier dynamically and only as needed, while still keeping the majority of blocks on large, low-cost disks. 3PAR was the first high-end disk system vendor in the market to ship sub-LUN tiering software to capitalize on the fact that 90% of the IO on most systems comes from less than 10% of the blocks. It’s this reality that makes me think that highly optimized SSD arrays are overkill for most and will continue to be a niche for a while to come.
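The tiering argument is easy to check with a little arithmetic. The Python sketch below is only an illustration of the idea, not any vendor's tiering algorithm: the function names, the block IDs, and the I/O counts are made up, and real arrays track heat over time windows rather than from a single snapshot. It ranks blocks by I/O count, puts the hottest 10% on an SSD tier, and reports how much of the total I/O that small tier would serve.

```python
def plan_tiers(io_counts, ssd_percent=10):
    """Split blocks into (ssd_blocks, hdd_blocks) by I/O heat.

    io_counts: dict of block id -> number of I/Os observed (hypothetical
    numbers in this example). The hottest ssd_percent of blocks go to SSD.
    """
    if not 0 <= ssd_percent <= 100:
        raise ValueError("ssd_percent must be between 0 and 100")
    # Hottest first; ties are broken by block id so the plan is deterministic.
    ranked = sorted(io_counts, key=lambda block: (-io_counts[block], block))
    ssd_slots = len(ranked) * ssd_percent // 100
    return set(ranked[:ssd_slots]), set(ranked[ssd_slots:])


def ssd_io_share(io_counts, ssd_blocks):
    """Fraction of all observed I/O that lands on the SSD tier."""
    total = sum(io_counts.values())
    if total == 0:
        return 0.0
    return sum(io_counts[block] for block in ssd_blocks) / total


if __name__ == "__main__":
    # 20 blocks: two very hot ones and eighteen mostly idle ones.
    io_counts = {block: 5 for block in range(20)}
    io_counts[0] = 450
    io_counts[1] = 450

    ssd, hdd = plan_tiers(io_counts)
    print("blocks on SSD:", sorted(ssd))
    print("blocks on HDD:", len(hdd))
    print("share of I/O served from SSD: {:.0%}".format(ssd_io_share(io_counts, ssd)))
```

With a skewed workload like this one, two blocks out of twenty absorb roughly nine-tenths of the I/O, which is why a small SSD tier in front of cheap disk can go a long way.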
Fast approaching on the SAN horizon: storage federation
One last topic worth touching on in the SAN category is that of storage federation. Much like VMware vMotion lets you move virtual machine workloads between hosts, storage federation is an emerging technology that lets you non-disruptively move data volumes natively between homogeneous storage systems. Like other array-based data services, federation can be a very efficient way to balance workloads, deal with maintenance, or even take the pain out of infrastructure refresh actions. Similar to how thin provisioning has become table stakes for disk systems nowadays, storage federation will become more and more important as data center managers seek to manage capacity at a singular persistent level versus managing individual disk arrays.
Next up: Network Attached Storage (NAS)
I heard an analyst recently quote a stat that by 2015, over 80% of storage capacity sold worldwide will be shipped in support of file-level or “unstructured” data. That is, file-based information that doesn’t fit neatly in a column-and-row database. Audio, video, graphics, and all of the other content we post about on Facebook and Twitter are good examples. If you can’t fit all of this neatly in a database, then you need a file system to make order out of the chaos and sit above the low-level disk operations.
NAS is essentially a dedicated appliance with a built-in file system to store these files and then present file shares to servers, applications and users. Most often we think about mapping to these systems using protocols such as SMB/CIFS (for Windows) or NFS (for Linux/Unix). In the past, I’ve heard many people try to plot NAS and SAN on a continuum, as if NAS were simply a less sophisticated storage device than a SAN or a steppingstone to SAN, but really they are completely different animals for completely different purposes. Just as SAN can range from entry-level departmental systems to large tier-1 arrays, the same is true with NAS. At one end, I have a 2TB Microsoft Home Server acting as a NAS device at home; at the other, web 2.0 companies run multiple petabytes of NAS storing millions of files for hundreds of thousands of users. SAN is SAN and NAS is NAS (unified will come later).
Originally, NAS filers emerged as companies realized that simply putting user shares and file data on random Windows servers or home-grown Linux-based NFS servers was likely not the best way to deal with all of this file growth. However, just as people originally were consolidating islands of files into a dedicated NAS filer, they are now looking to consolidate dozens of distributed NAS filers into something easier to manage. This is where scale-out clustered NAS has come to the party. These systems can physically store data on multiple underlying disk technologies but present a single large “namespace” or file system out to the network. The concept of a single namespace simplifies management and enables the policy-based movement of files underneath the covers for reasons of performance, capacity, or compliance. IT practitioners love this technology because a system admin can add capacity to a large pool of NAS just in time. This provides simplicity in management and access, while avoiding over-provisioning and captive storage.
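For readers who think better in code, here is a small Python sketch of the single-namespace idea. It is a toy model under stated assumptions, not a description of any real scale-out NAS product: the class name, node names, paths, sizes, and the "write to the node with the most free space" rule are all hypothetical. The point it illustrates is that clients keep using the same path while capacity is added and files are moved between nodes underneath.

```python
class SingleNamespace:
    """Toy model of one file namespace spread across several NAS nodes."""

    def __init__(self, nodes_free_gb):
        self.free_gb = dict(nodes_free_gb)   # node name -> free GB
        self.placement = {}                  # file path -> (node, size in GB)

    def add_node(self, node, capacity_gb):
        """Just-in-time expansion: new capacity joins the same namespace."""
        if node in self.free_gb:
            raise ValueError("node already in the cluster")
        self.free_gb[node] = capacity_gb

    def write(self, path, size_gb):
        """Store a new file on the node with the most free space."""
        if path in self.placement:
            raise ValueError("path already exists")
        node = max(sorted(self.free_gb), key=self.free_gb.get)
        if self.free_gb[node] < size_gb:
            raise ValueError("no single node can hold this file")
        self.free_gb[node] -= size_gb
        self.placement[path] = (node, size_gb)
        return node

    def move(self, path, target):
        """Policy-driven move: the node changes, the client path does not."""
        source, size_gb = self.placement[path]
        if target == source:
            return
        if self.free_gb[target] < size_gb:
            raise ValueError("target node is too full")
        self.free_gb[source] += size_gb
        self.free_gb[target] -= size_gb
        self.placement[path] = (target, size_gb)

    def locate(self, path):
        return self.placement[path][0]

    def total_free_gb(self):
        return sum(self.free_gb.values())


if __name__ == "__main__":
    ns = SingleNamespace({"node-a": 100, "node-b": 50})
    ns.write("/projects/video.mov", 80)
    ns.write("/home/report.doc", 30)
    ns.add_node("node-c", 200)                  # expand the pool
    ns.move("/projects/video.mov", "node-c")    # rebalance under the covers
    print(ns.locate("/projects/video.mov"), ns.total_free_gb())
```

Notice that nothing a user would see, such as the path `/projects/video.mov`, changes when the file moves or when a node is added. That is the management simplification the single namespace buys you.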
Where SAN and NAS come together
One example of where NAS and SAN come together is the advent of the NAS gateway. These appliance “heads” are processing nodes with an operating system layer. The NAS head is physically attached to back-end storage capacity that sits in a disk array on the SAN. This gateway manages the file system and organization of folders as well as presentation out to the network while the disk array manages the centralized data services we previously discussed. This scenario is a great solution for companies that have invested in a SAN and a disk array but want to increase the utilization of that asset by placing their unstructured files on some of the extra disk. Some might (and do) call this a “unified” storage approach, which sets the stage for the third and final part of this blog series. . . coming soon.
If you missed it, here’s the first blog in this series: The changing flavor of storage alphabet soup. DAS, NAS and SAN oh my