Wednesday, May 24, 2017

Thoughts on HyperConverged, and the Future of HyperConverged (Part 2)

So how did we get here? Where did HCI come from?

If we look back at the history of HCI, it seems to have evolved from the idea of using clustered, "whitebox" x86 servers to create a clustered storage system. There were a number of early entrants in the space, some dating back to 2006. Another vector was the "Virtual Storage Appliance," or VSA: software which ran in a VM, connected to the server's local hard disk drives, and presented that internal storage to the guest VMs over the internal IP network. The first VSA came from LeftHand Networks in 2007. But the real hyper-converged push started around 2009 with the founding of integrated HCI players Nutanix and SimpliVity.

We also have to look at where the HCI market is today. It is arguably dominated by three primary players: Nutanix; SimpliVity (now part of HPE); and VMware VSAN. They represent the lion's share of the HCI market, and we will come back to them.

If you look at the earlier clustered storage companies, they offered either a scale-out NAS (a kind of commodity alternative to Isilon), a scale-out block storage solution, or a scale-out unified storage solution. These early players came into existence when "grid computing" was the buzzterm of the day, and these architectures were also called "grid storage".

In 2009 Nutanix was founded. There were other virtual storage appliance start-ups, such as Virsto Software (which eventually became VMware VSAN), but it is fair to define the official beginning of the hyper-converged era as August 2011, when Nutanix emerged from stealth. The same month, VMware released vSphere 5, which included its first implementation of a VSA (vSphere Storage Appliance). SimpliVity would emerge from stealth one year later, in August 2012. VMware's VSA did not gain traction, and VMware announced its intent to acquire Virsto six months later, in February 2013, signaling VMware's serious interest in HCI.

As Nutanix and SimpliVity started to grow, and with VMware's very public acquisition of Virsto, and obvious plans to enter the HCI market, many of the earlier clustered storage vendors and virtual storage appliance vendors redefined themselves as hyper-converged players. Several new industry buzzterms were developed: "Server SAN"; "Virtual SAN"; and "Software Defined Storage", or "SDS".

Many of the early clustered storage system vendors redefined themselves as SDS or HCI players, moving their clustered storage software from bare-metal to run in VMs, and allowing their clustered storage software to run alongside guest VMs on the same server. VSA vendors added more sophisticated clustering, replication, and scalability to their products.

From this, it is fair to say modern HCI owes itself to three parents: Commodity clustered storage systems; virtual storage appliances; and purpose built integrated HCI systems.

To me, the most interesting thing is many of the earlier clustered storage or "grid storage" players had little to no success, but the HCI players saw significant early success. Part of this may have been how each targeted the market. Clustered/grid storage historically had been seen as targeting the high-performance and academic community for technical computing use cases. HCI targeted business organizations and VMware virtualization workloads.

But what cannot be dismissed is the reality that the early clustered storage systems did not provide the level of performance and reliability required for enterprise workloads. The early clustered storage systems were not designed for transactional, random I/O workloads. They were better suited for sequential I/O. The early HCI players focused on addressing write latency and random I/O with aggressive write and read caching. They also focused on ease of use and eliminating the need for storage administrators to provision storage to VMware administrators.

At this point it is interesting to note there were other players aggressively targeting VMware virtualized workloads. Tintri had come out of stealth five months before Nutanix with its VMware-optimized storage platform. It too targeted the VMware admin and sought to use its product to bypass the traditional storage management team in an organization.

So that is the history lesson and the end of Part 2.

Sunday, May 21, 2017

Thoughts on HyperConverged, and the Future of HyperConverged (Part 1)

Almost two years ago I made some observations on HyperConverged Infrastructure, and where I thought it needed to go to be successful. I posted these to Twitter at the time. I still stand by some of those observations; of others I am not as sure. But I have done a lot more thinking about the HCI phenomenon, and believe change is coming to HCI.

To this point, I recently saw an update of the Gartner Hype Cycle, which showed HCI at the "Peak of Inflated Expectations". I agree with this. The question is what comes next? Probably a vendor shake-out.

But another question to ask is "What comes after HCI?" The idea that HCI is the end-game for IT infrastructure is a naive assumption. There may be better architectures being worked on by start-ups as I write this.

These were my original observations on HCI:

HCI must support multiple hypervisors, and no hypervisor (i.e., Containers, Hadoop, Oracle RAC, etc.).

At the time, Microsoft was pushing Hyper-V very hard, and I thought Hyper-V was going to make significant penetration into the enterprise. At the same time, some organizations were experimenting with OpenStack and KVM. Today, looking back, VMware still dominates. Hyper-V exists mainly in on-prem Azure Stack deployments, and KVM struggles without a single brand behind it.

As for no-hypervisor HCI (my idea being a combination of OpenStack with Containers and an HCI filesystem embedded in Linux for something like Oracle RAC), this has yet to take off. There is a chance we could see something like it for OpenStack.

HCI must become all-flash for virtualized workloads.

For the most part, this has become true. And the reality is, All-Flash saved HCI, which probably would not have been able to keep up with the performance requirements of virtualized workloads in its hybrid form.

HCI filesystems must be or become flash aware (WAF, etc.).

HCI filesystems have been adapted for flash, but I do not believe they have reached a point that makes them comparable to All-Flash Arrays in reducing flash wear. They have been able to avoid this by using high Drive Writes Per Day (DWPD) SSDs in their caching tier to coalesce writes to low DWPD SSDs in their capacity tier. I see two problems with this approach. The first is that the use of a high DWPD SSD as a cache is a carry-over from the hybrid HCI filesystem architecture, where it provided a significant performance boost. When combined with an SSD capacity tier, it provides no performance boost, only a write wear mitigation benefit. The second is that high DWPD SSDs are not a high-volume part for SSD manufacturers, who would rather manufacture lower DWPD, higher capacity, higher revenue SSDs. Ultimately, high DWPD SSDs may fade away like SLC and eMLC SSDs did. If that happens, what will HCI vendors do?
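To make the endurance trade-off concrete, here is a minimal back-of-the-envelope sketch. The drive ratings are illustrative assumptions, not vendor specifications; the arithmetic is simply that a drive's rated endurance (TBW) is roughly DWPD x capacity x warranty days.

```python
# Rough SSD endurance arithmetic (illustrative numbers, not vendor specs):
# rated Total Bytes Written (TBW) is roughly DWPD x capacity x warranty days.
def endurance_tb(dwpd, capacity_tb, warranty_years=5):
    """Total terabytes written a drive is rated to absorb over its warranty."""
    return dwpd * capacity_tb * warranty_years * 365

# A small high-endurance cache SSD vs. a large low-endurance capacity SSD:
cache_tbw = endurance_tb(dwpd=10, capacity_tb=0.4)      # 10 DWPD, 400 GB
capacity_tbw = endurance_tb(dwpd=0.5, capacity_tb=4.0)  # 0.5 DWPD, 4 TB

print(f"cache tier rated TBW:    {cache_tbw:,.0f} TB")
print(f"capacity tier rated TBW: {capacity_tbw:,.0f} TB")
```

Under these assumed ratings, the small cache drive can absorb roughly twice the lifetime writes of a drive ten times its size, which is exactly the property HCI vendors lean on to shield the capacity tier.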

HCI must move to parity/erasure coding data protection and move away from mirroring/replication based data protection (RF2/RF3).

I believed this was necessary for All-Flash HCI due to the cost of flash, and the capacity of SSDs at the time. I am less sure of this now, at least as a $/GB requirement. I think parity/erasure coding will only be driven by availability requirements, and not $/GB requirements.
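The capacity math behind that judgment is simple. A rough sketch, using the RF2/RF3 mirroring terminology from the text and a hypothetical 4+2 erasure-coded stripe geometry:

```python
# Usable capacity as a fraction of raw, for mirroring vs. erasure coding.
# RF2/RF3 (2 or 3 full copies) follow the usage in the text; the 4+2
# stripe geometry below is an illustrative assumption.
def usable_fraction_mirror(copies):
    """Replication keeps `copies` full copies of every block."""
    return 1.0 / copies

def usable_fraction_ec(data, parity):
    """An erasure-coded stripe of `data` data + `parity` parity fragments."""
    return data / (data + parity)

print(f"RF2 usable:    {usable_fraction_mirror(2):.0%}")
print(f"RF3 usable:    {usable_fraction_mirror(3):.0%}")
print(f"4+2 EC usable: {usable_fraction_ec(4, 2):.0%}")
```

Moving from RF2 to a 4+2 stripe raises usable capacity from 50% to about 67% of raw while still tolerating two failures; the question is whether flash $/GB still makes that worth the rebuild and write-path complexity.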

HCI must support storage only nodes and compute only nodes for asymmetric scaling.

I believe this even more today. With All-Flash HCI, storage efficiencies (a.k.a., Data Reduction technologies) became critical. When you look at the Virtual Desktop (VDI) use case for HCI, deduplication means storage capacity does not grow linearly with VDI instances. In fact, it hardly grows at all. But what does grow is a need for write caching. If I invested in HCI for VDI, and deployed 200 VDI instances across 4 HCI nodes, and later decided to grow my VDI to 400 instances, I might need 4 more nodes of compute, but deduplication might mean I need only 10% more storage capacity, which I might already have on my existing nodes. I might need a caching SSD on each new node, but not 5 to 11 data drives.
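A back-of-the-envelope version of the VDI example above, with assumed (hypothetical) numbers: a deduplicated golden-image set shared by all desktops, plus a small slice of unique data per desktop.

```python
# Doubling VDI desktops doubles compute demand, but with deduplication
# storage capacity barely moves. All numbers here are assumptions for
# illustration, not measurements.
def storage_needed_gb(instances, shared_gb=500, unique_gb=0.25):
    """Shared (deduplicated) golden images plus unique data per desktop."""
    return shared_gb + instances * unique_gb

before = storage_needed_gb(200)  # the 4-node starting point
after = storage_needed_gb(400)   # after doubling the desktop count
print(f"compute grows 2.00x; storage grows {after / before:.2f}x")
```

Under these assumptions, compute scales linearly with desktops while storage grows about 9%, which is why symmetric-only node scaling forces you to buy capacity you will never use.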

The reverse holds true as well. If I assume a certain storage efficiency ratio, but due to adding workloads with different data types (say pre-compressed image files) my storage efficiency drops, today I have to add compute and hypervisor instances (and associated licenses) just to gain access to more storage capacity. If I could add a storage only node or two, it would provide flexibility. Also, it might offer the ability to introduce tiering between an all-SSD production tier, and a NL-SAS capacity tier.

This is the end of Part 1. Over the next several parts, I will dig much deeper into these thoughts, including thoughts on what comes after HCI.

Monday, April 24, 2017

The Big Payoff

The big payoff for driverless vehicles is with driverless trucks, not driverless cars, especially driverless "Ubers". By driverless trucks, I specifically mean long-haul trucks.

A typical long-haul trucker drives 10 hours a day, meaning the truck is idle the other 14. Some downtime is needed for refueling, weigh stations, etc., but it is reasonable to expect that driverless long-haul trucks will double the productivity of human-driven trucks, quite literally overnight.

There has been a shortage of people willing to work as long haul truckers, even given it pays a middle-class income without the need for excessive education or training. This has caused labor costs to rise.

There are currently over 1.5 million long-haul truckers and estimates are the need for long-haul truckers will approach 2 million in the next 5 years.

There are about 250,000 taxi and limo drivers, and they make less than long-haul truckers. Uber and Lyft have exposed that there is much greater demand for car services than originally expected, and the capital-less model of ride-sharing works well for that demand. The flood of ride-shares has depressed wages for both taxis and ride-shares. But more importantly, self-driving Ubers will be a capital-intensive model, with all the inflexibility of a taxi and none of the flexibility of a ride-sharing service.

The long-haul truck driver replacement market is a $100 billion addressable market, about 10 times that of the taxi driver replacement market.

Follow the money.

Tuesday, April 18, 2017

When Did Expertise Die?

I saw a recent Facebook post of Dr. Tom Nichols's commentary on PBS about The Death of Expertise.

For some unknown reason, Nichols blocked me on Twitter, so I cannot provide this opinion directly. That is his loss.

But Nichols accurately posits the rise of the public Internet has created the side effect of everyone thinking they individually are an expert. However, individuals believing themselves to be experts is only half of the equation. The other half is the discrediting of the true experts, and I believe that happened about a decade or more before the rise of the public Internet. There is a third point, which is the rise of the well-known pseudo-expert, and in some cases the celebrity pseudo-expert, such as Jenny McCarthy in the Anti-Vaxxer movement, and Rosie O'Donnell in the 9/11 Truther movement. Celebrity pseudo-experts provide credibility to lay pseudo-experts such as the producers of the original "Loose Change" 9/11 Truther film.

But back to the second point, the discrediting of true experts, or "when expertise died".

In 1989, while in college, I had a roommate who was a journalism major. At that time, they were teaching journalism students that expertise in a subject was inherently biasing, and that the opinions of an expert in a subject must be balanced with the opinion of someone who was not an expert in the subject.

He later worked on a story on management, and interviewed an expert in the subject, who happened to be a management professor I worked for as a graduate assistant. He had to then find rebuttal information not from another management professor, but from someone completely unrelated. To me, this was surreal, because I knew both the interviewer and interviewee, and had no reason to question the good intentions of either.

But later it all made sense to me. I grew up watching expert reporters: Jules Bergman, ABC's science reporter; and Irving R. Levine, NBC's economics reporter. I also noticed those expert reporters completely disappeared in the 1980s. Except for the doctors the networks use as medical correspondents and the aviation expert they bring in for airplane crashes, there are no expert reporters any more. I also remember every time in the 1980s we launched a Space Shuttle, the various national news anchors would state the Soviet Union's publicly stated opinion about the purpose of the mission, as if it was as valid as NASA's stated mission objectives, or as if NASA's stated mission was as invalid as the Soviets' opinion. This latter point goes straight to my original point about what my roommate was taught: NASA is an expert on its own space missions, so its opinion must be balanced. Was the Soviet statement credible? Was it valid? Was it simply propaganda? It didn't matter. Was NASA's statement credible? Was it valid? Was it simply propaganda? It didn't matter. To the media, the Soviet position was just as valid as NASA's position. Propagandists at the Kremlin were just as valid as rocket scientists in Houston.

From a purely pop-culture standpoint, I think we tended to believe Jules Bergman on science issues because his name sounded similar to science fiction writer Jules Verne's. I think we believed the bespectacled and bow-tied Irving R. Levine because he fit our visual of what a college economics professor should look like. They were journalists, and not scientists or economists, and they fit a persona, but they were experts in their field as far as journalism went. They had connections, they could get a meeting with the real experts, they had developed a working expertise on their subject, and they had credibility with the public. But they are gone now, and have been for about 40 years.

So I think before we blame the general public, driven by curiosity, and enabled by the Internet (be it WebMD, Wikipedia, or "FakeNews"), we need to consider nature abhors a vacuum, and realize the television media created a vacuum when it cut out those quirky expert reporters, and promoted skepticism and outright distrust of expertise.

Thursday, March 16, 2017

Everything I need to know about NetApp’s All-Flash Storage Portfolio I learned from watching College Football

Okay, silly title. I got the idea when Andy Grimes referred to NetApp’s all-flash storage portfolio as a “Triple Option”. To me, when I hear triple option, I think of the famous Wishbone triple option offense popular in college football in the 1970s and 1980s. And that got me to thinking of how NetApp’s flash portfolio had similarities to the old Wishbone offense.

The Wishbone triple-option is basically three running plays in one. The first option is the fullback dive play. This is an up the middle run with no lead blocker. It is up to the fullback to use his strength and power to make yardage. The second option is the quarterback running the ball. While most quarterbacks are not great runners, the real threat of the quarterback in running offenses is the play action pass, where a running play is faked, but the quarterback instead passes the ball. In today’s college football, while the Wishbone may have faded, option football remains popular, and many of the most exciting players are “dual-threat” quarterbacks who can both run well and pass well. But, back to the Wishbone. The third option is the halfback, an agile, quick running back who often depends more on his ability to cut, make moves, and change direction to make the play successful.

In considering this analogy, I wanted to find the right pictures or videos of Wishbone football to make the comparisons to NetApp’s flash portfolio, but found the older pictures and videos from the 1980s to not be that great. So I decided to take the three basic concepts: The powerful fullback, the dual-threat quarterback, and the agile halfback and look at more recent examples. I just happen to use examples from my alma mater, Auburn University, because I knew of a few plays that visually represent the comparisons I am about to make.

So first up is the fullback. The fullback is all about power, not finesse. The fullback position is not glamorous. The fullback has to have the strength to face the defense head-on. To me, the obvious comparison in the NetApp flash portfolio is the EF-Series. The EF is all about performance: low latency, high bandwidth, without extra bells and whistles which can slow other platforms down.

While I don’t have a good fullback example, I have a similar powerful running back demonstrating the comparison I am trying to make. Here we see Rudi Johnson on a power play breaking eight tackles and dragging defenders 70 yards for a touchdown in the 2000 Auburn-Wyoming game.

Rudi Johnson great 70 yard TD against Wyoming 2000

The next comparison is to the dual-threat quarterback. The dual-threat quarterback can run or pass with equal effectiveness. In NetApp’s flash portfolio, the obvious comparison is the All-Flash FAS (AFF), the only multi-protocol (SAN and NAS) all-flash storage array from a leading vendor. The multi-protocol capability of AFF (Fibre Channel, iSCSI, and FCoE SAN; NFS and SMB NAS) allows storage consolidation, and truly brings the all-flash data center to reality.

The play which best demonstrates the dual-threat quarterback’s potential is the run-pass option (RPO), where a quarterback rolls out and can either keep the ball and run with it, or pass it to a receiver if the receiver is open. Here we see Nick Marshall on an RPO play which tied the 2013 Iron Bowl with 33 seconds left in the game. The reason the play worked is Nick Marshall, a gifted runner, had already run for 99 yards including a touchdown.

2013 Iron Bowl: Marshall to Coates

That brings us to the halfback, also known as the tailback, or just the running back. For the sake of this discussion, and keeping with the original Wishbone concept, I will use the term halfback. The handful of teams who still run a variation of the Wishbone (Georgia Tech, Navy, Army, Air Force, and a few others), tend to use smaller, more agile athletes as halfbacks. These running backs usually get the ball on the outside, and leverage their agility to make the defenders miss. When I think of agility in flash storage, I think of SolidFire. Agility is a key feature of SolidFire. It scales with agility, provisions with agility, adapts with agility, and is the best storage for agile infrastructures like private clouds, especially private clouds using OpenStack. The best recent example I have seen of a running back leveraging agility to make a play is this run by Kerryon Johnson against Arkansas State.

Watch Kerryon Johnson's incredible touchdown against Arkansas State

So enough fun for now. But if you have a dedicated application needing performance acceleration, such as a performance critical database, NetApp’s EF-Series might be your tackle-breaking fullback powering through spaghetti code and getting the job completed despite the challenge. If you are looking to move to an all-flash data center and need consolidated flash storage to accelerate iSCSI MS-SQL databases and NFS VMware datastores on the same infrastructure, AFF is your dual-threat quarterback. And if you are looking to deploy a private cloud with the agility to grow with your workload, SolidFire is your agile halfback.