Choose your availability level

One of the benefits of “on prem” is that you can build for any availability level you desire at every part of the stack.

My Preferences

When I look back over the last 25 years of my career, at all of the various workloads and organizations I have worked for, and ask how I would run any of those workloads today, in 2025, my preferences are really obvious.

I want the highest quality of everything I can get my hands on: the best servers, best storage, best networking, best data centers, and don’t forget, the best internet connectivity. “Best” is very subjective of course; at the end of the day it comes down to the preferences of the organization and the workloads that are going to be run.

One of the driving factors behind using the best, especially at this point, is that most organizations simply don’t need that much equipment in order to do their stuff. When I built out forty racks of equipment early in my career two decades ago, that was a different era. Today the compute, memory, and storage of those forty racks fits in less than half of a rack (if I was pushing things to an extreme it may be as little as 4U of space with the right hardware!). So if you’re going to have far less physical infrastructure than you once had, it’s more important to be sure that infrastructure is as reliable as you can get.

Data Center

Build

If you are a very large org you could build your own facilities, though I think very few actually do that these days. Modern data centers are often extraordinarily large, with 250,000 square feet or more of floor space to host servers. Some facilities go as high as a million square feet or more. Two decades ago it wasn’t uncommon to have 10,000 to 50,000 square feet. Such facilities still exist today, but they are often the same ones from that time that just happen to still be in use. I assumed it was bigger, but I just checked and the first data center I ever used was only 27,000 square feet of raised floor. The data center I have used for the past fourteen years is literally 18 times larger.

Rent

When I say rent, I am really referring to co-location. There are literally thousands of co-location providers across the globe in practically every market you can imagine. There is a very wide variety of quality between providers and facilities, so take care to pick those that meet your requirements. The largest co-location providers in the world include Equinix, Digital Realty, and QTS Data Centers. By contrast, this website is hosted at a very small facility managed by Hurricane Electric.

Hyperscalers often rent

It’s important to note that, especially in new and smaller markets, it is very common for hyperscale companies to leverage co-location instead of building a data center from the ground up. Once they get big enough in a market they build their own facilities and move out, as that is more cost effective for them. Co-location allows them to come into a market quickly, just as IaaS allows their customers to come into a new market quickly by riding on top of the hyperscaler’s infrastructure.

Data center tiers

I am not aware of any formal enforcement; it seems to be more of an “honor system” as far as what tier a specific facility claims to be, so be careful.

Also, I assume that when the tier chart mentions power and cooling, it is referring to power and cooling for the facility itself. That does not mean the equipment inside the facility is all running on redundant power.

This list is kind of backwards: usually lower numbers indicate higher quality, but in this case it’s the opposite. Not sure why.

Tier 1

A data center with a single path for power and cooling, and no backup components. This tier has an expected uptime of 99.671% per year.

Tier 2

A data center with a single path for power and cooling, and some redundant and backup components. This tier offers an expected uptime of 99.741% per year.

Tier 3

A data center with multiple paths for power and cooling, and redundant systems that allow the staff to work on the setup without taking it offline. This tier has an expected uptime of 99.982% per year.

Tier 4

A completely fault-tolerant data center with redundancy for every component. This tier comes with an expected uptime of 99.995% per year.
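To put those percentages in perspective, here is a minimal sketch that converts each expected uptime figure into hours of downtime per year. It assumes a 365-day year and uses only the percentages quoted above; the function and variable names are made up for illustration.

```python
HOURS_PER_YEAR = 365 * 24  # 8760 hours, ignoring leap years (an assumption)

# Expected uptime percentages as quoted in the tier descriptions above.
TIER_UPTIME_PERCENT = {
    "Tier 1": 99.671,
    "Tier 2": 99.741,
    "Tier 3": 99.982,
    "Tier 4": 99.995,
}


def downtime_hours_per_year(uptime_percent: float) -> float:
    """Convert an uptime percentage into expected hours of downtime per year."""
    return (100.0 - uptime_percent) / 100.0 * HOURS_PER_YEAR


for tier, pct in TIER_UPTIME_PERCENT.items():
    hours = downtime_hours_per_year(pct)
    print(f"{tier}: {pct}% uptime is about {hours:.1f} hours of downtime per year")
```

Under those assumptions Tier 1 works out to roughly 28.8 hours per year and Tier 4 to roughly 26 minutes, which makes the gap between tiers easier to see than the raw percentages do.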

Hyperscale facilities

Honestly I am not certain where many hyperscale facilities fall in the tier chart. I would assume many are probably Tier 3, and I also assume in many cases power to individual servers is not redundant, even if the facility as a whole has redundant power. There are probably groups of servers each connected to different power feeds, so if there are two feeds for the whole facility and one goes down completely (power + UPS + generator failure), at most you lose half of the gear.
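The "lose half of the gear" reasoning can be sketched as a simple failure-domain calculation. This is only an illustration under stated assumptions: each server is attached to exactly one feed, and the feed names and server counts are hypothetical, not taken from any real facility.

```python
def fraction_lost(servers_per_feed: dict[str, int], failed_feeds: set[str]) -> float:
    """Fraction of all servers that go dark when the given feeds fail.

    Assumes every server is single-corded, i.e. attached to exactly one feed.
    """
    total = sum(servers_per_feed.values())
    lost = sum(count for feed, count in servers_per_feed.items() if feed in failed_feeds)
    return lost / total


# Hypothetical layout: two facility feeds, servers split evenly between them.
layout = {"feed_a": 500, "feed_b": 500}
print(fraction_lost(layout, {"feed_a"}))  # 0.5
```

With more feeds and the same even split, the share of gear lost to a single feed failure shrinks accordingly (one quarter with four feeds, and so on).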

My history

For the last fourteen years of my career I have been hosting at the QTS Atlanta Metro facility. When I moved in they were still building out the first data center there, which is the largest they have on that site. In the years since they have built 3 more data centers on site. The facility I am hosted at is nearly 1 million square feet in size with roughly 500,000 square feet of raised floor. They have had a perfect operational record as far as I am concerned, with not a single noticeable issue the entire time. The staff are also outstanding; I am super impressed with all of them. I recall at one point I had a phone call to complain about a new security procedure they had, and I spoke with the security manager. I started talking and he asked me again who I was. I told him my name and who I was with and he immediately recognized me. I was blown away, especially because I had not been on site in at least 18 months. When I do go on site I am usually there for 8-12 hours per day for at least a week, if not two weeks, at a time. As of late 2025 it has once again been roughly 18 months since I was last on site, and I will probably be on site again in mid 2026.

You might ask, do I have a disaster recovery facility? My answer is no. Realistically, in my experience, not only do most companies balk at the cost of such things (whether “on prem” or “in cloud”), in most situations such a facility is never warranted, as data center failures are exceptionally rare, at least in facilities designed like the one I host with. What IS important though is to protect your data. That means backups, and multiple primary storage systems in the event of primary storage failure (of which I have had my share). By contrast, disaster recovery is far more important if your primary site happens to be your corporate office.

In those 22 years I have only been hosted at a single facility that suffered a full facility failure, which was Fisher Plaza in Seattle, WA. The last outage I experienced there was in 2007, and I moved my employer out of that facility shortly after (my employer was already hosted there when I was hired). Two years later they had a fire in the electrical room causing $6.8M in damages and, I believe, over 40 hours of hard downtime. The facility ran on generator trucks for several months while they rebuilt the power systems.

For my personal hosting, I believe the facility this website is hosted in is actually a Tier 1 or Tier 2. It has no redundant power that I know of, and it has had power outages in the past (the last one was around 2020, I think). Given the cost, that is not a big deal to me. If my stuff goes down for a bit once every few years (outages are usually quite brief), it’s just my personal stuff, so I don’t sweat it.

Servers

There really are several different tiers of servers you can use, and you may end up using multiple tiers for different types of workloads. I consider tier 0 to basically be mainframe, and that is not relevant for this website.

Enterprise Grade / Tier 1

These are brands like HPE ProLiant, Dell PowerEdge, and Cisco UCS. They usually come with premium configurations, support, and tooling to help manage fleets of systems more easily. Those features do make them the most expensive of the classes on this list.

Web hosting grade / Tier 2

Not sure what else to call this group. One of the main players in this space is Supermicro, selling custom configurations to lots of mid-size or even large organizations. Small organizations can use their products as well of course. In my experience, the hardware quality of Supermicro is generally quite good, but the software, firmware, support and overall manageability of their platforms are less robust. They also come at a significant discount vs the enterprise grade gear.

At one point both HPE and Dell had classes of systems that fell into this category, but at the moment I can’t find those; perhaps they were discontinued. At one point I think Dell had a “DCS” group, where they would build custom stuff for you, but you had to buy 1,000 servers or more at a time or something. Perhaps HPE and Dell surrendered this market to the ODMs.

Composable Infrastructure / Tier 2

This is one of the newest types of infrastructure, which HPE has been trying to sell for the past several years, basically replacing the earlier blade systems with newer, more flexible blade systems. HPE has Synergy systems, and at one point at least Dell had partnered with Liqid for composable servers. I haven’t gotten the impression that this type of design has really taken off in the market relative to regular rack mount, and I don’t get the feeling that it has even taken the same market share that blade servers from HPE and Dell once had.

ODM Grade / Tier 3

This class is almost universal among hyperscale providers: entirely custom systems built from the ground up for each customer, with thousands to hundreds of thousands of systems being built at a time. These same companies build many systems for companies like Dell and HPE as well. This also includes Open Compute systems, which leverage a different rack mount standard than traditional stuff, being two inches wider and 4mm taller per “unit” of height. I say tier 3 in the title, but really they could be tier 2 as well; it all depends on how the customer instructs the ODM to design/build the systems.

Storage

Like servers, there are different classes of storage systems depending on your needs. I mainly cover block storage in this section as opposed to NAS.

Mission Critical / Tier 0

There are really only a couple of storage platforms in the world that fall into this category in my opinion: the Hitachi VSP platform (which HPE OEMs as the XP storage system) and, I believe, the Dell EMC PowerMax platform. These two platforms are also among the only ones with native mainframe connectivity. IBM also has their DS storage line, but that is really geared towards enterprise IBM environments and is not relevant to most organizations. I know that for many years Amazon deployed HP XP storage arrays like candy.

Enterprise Grade / Mid range / Tier 1

The most premium systems with the highest availability, ranging from 99.9999% to 100% data availability (for most customers). These systems have eaten a lot of the higher end of the market over the last two decades, but despite manufacturer claims, they still sit below the mission critical category above, even though they run countless mission critical workloads for customers across the globe (mission critical from the customer’s perspective, which may not compare in criticality to, say, a large bank or insurance company). There are many players in this tier; some of the largest include Dell/EMC, NetApp, Pure Storage, HPE, and IBM.
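For a sense of scale, the 99.9999% ("six nines") figure can be turned into seconds of unavailability per year. This is a rough sketch assuming a 365-day year; the function name is made up for illustration and the result says nothing about how any vendor actually measures availability.

```python
SECONDS_PER_YEAR = 365 * 24 * 60 * 60  # 31,536,000 seconds, ignoring leap years


def downtime_seconds_per_year(availability_percent: float) -> float:
    """Convert an availability percentage into expected seconds of downtime per year."""
    return (100.0 - availability_percent) / 100.0 * SECONDS_PER_YEAR


# Six nines leaves a budget of roughly half a minute per year.
print(f"{downtime_seconds_per_year(99.9999):.1f} seconds per year")
```

That comes out to about 31.5 seconds per year, compared with hours per year for the data center tier figures.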

I group “mid range” into this category as well, out of laziness perhaps. Mid range platforms are similar to enterprise ones (at this point, with flash; that wasn’t quite the case 15 years ago), just scaled down a bit, usually running the same software and providing similar availability in most situations.

Tier 2

I’m less in touch with this tier of storage, but some vendors I consider to be in this tier are Infortrend, Supermicro, Seagate, and Data Direct Networks (DDN). There are many others. They generally have less robust software and support than the higher tiers, as you might expect, but also a much lower cost.

ODM Tier / Tier 3

Like servers, ODM companies build custom storage systems for their hyperscale customers, who then load custom software on top to manage them. The software is just as important as the hardware, so I wouldn’t really consider ODM storage platforms to be comparable to Tier 2 on software alone (they could be from a hardware perspective).

My history

My SAN storage platform of choice over the past nineteen years has been 3PAR (later, HPE 3PAR), and I consider that platform to be tier 1.

My NAS (NFS) platform of choice at this point is Dell/EMC Isilon. I went through several different NAS platforms including Nexenta, HPE StoreEasy (Windows Storage Server 2012), FreeNAS, and Isilon SD Edge (the software-only version of Isilon). In the past I used Exanet and BlueArc, and obviously regular Linux NFS servers. All of these products had critical implementation issues rendering them unsuitable for use in my environment long term.

(Side note: I first deployed Nexenta in 2012, fully virtualized and supported by Nexenta (at the time). I wanted them because they would support a 2 node active/passive cluster inside of VMware using raw device maps and SCSI reservations. Initially it all worked fine: fail over, fail back, no problems. After a few months there was an issue, likely just a latency spike on the storage, which caused Nexenta to want to fail over. It did not do so correctly and went split brain, causing data corruption on one of the filesystems (the only one with ZFS dedupe enabled). Nexenta would kernel panic and reboot loop until I removed that volume. This happened two or three times. At one point I tried to actually recover data from the volume but could not (I used ZFS debug tools to scan and force repair, to no avail). Nexenta support’s only answer was “restore from backup”. Once I broke the cluster apart and ran standalone it worked fine, but of course with no high availability and no way to update the software without outages, it wasn’t a long term solution. FreeNAS was sort of similar: no high availability (you could, and I did, do ZFS replication, but that is nowhere near instant failover and failback). It worked fine standalone but had the same issues as standalone Nexenta.)

(Side note: for StoreEasy I actually had Microsoft make a custom Windows patch for me (for one of the dozen+ critical problems with that platform; I dealt with MS through HPE), which I never deployed out of fear of being the only one in the world to have it. That platform was so terrible, I should have taken a hint and returned it after HPE support took 8 hours on a call to set it up initially, a process that was supposed to take the customer 30 minutes. The breaking point for me was when a single file system went down on the Windows cluster and that took out the entire cluster, including all other file systems. The other file systems themselves were fine, but the cluster marked them all as bad and I could not bring them online without breaking the cluster apart entirely. The whole point of multiple file systems was that if one had an issue it should have no impact on the others.)

(Side note: for Isilon SD Edge, I was super excited to see that come out (it is long discontinued), but it absolutely imploded in basic internal testing (it didn’t even last 48 hours, unlike StoreEasy, which I tested for a few months). It could not handle a few hundred thousand small files on a 15k RPM Fibre Channel backed SAN. The CPU would just sit at 100% for hours at a time working through a simple file extraction. I engaged with Isilon techs and they confirmed it wasn’t going to work. I had a friend who worked in a higher level tech role at Isilon who explained the issue in more detail. Basically they needed their hardware appliances with SSD metadata acceleration (the key part) in order to do stuff right. I did a simple POC in one of their labs before pulling the trigger to buy the appliances: I just copied a ~20GB file full of small images that we had, extracted it, and watched the CPU usage. There was no impact at all with the metadata acceleration.)

The hardware-based Isilon platform by contrast has been running flawlessly for over eight years now. It’s really too bad (IMO) that HPE doesn’t have a better NAS offering on the low end (sub 50TB; they have partners that go to the 100s of TB and beyond). Some of their enterprise storage systems do have NAS capabilities, but I have seen enough about how those work to know that they are unsuitable for my use cases without even trying them. I really wish Exanet had gone to HP, along with 3PAR. Exanet could scale to an exabyte (in theory at least; I am sure they never tested that at the time) and up to 128 billion files, though it could also scale down to 10-100TB without an issue.

Hyperconverged (HCI)

(added this in early December 2025, wasn’t on the original site at launch)

HCI is “managed complexity”

I don’t really ever think about HCI, so this completely slipped my mind. But it is another option for those who wish to have a simpler way to manage integrated systems. HCI combines servers and storage into a single integrated platform, rather than relying on external storage for running your servers. HCI vendors also like to tout that they integrate networking as well, but they really don’t, and can’t. The HCI stack stops at the “server” level. Once the network traffic exits the server onto a real network, the convergence is gone. There is never (as far as I know) integration from HCI stacks to external physical switches, routers, load balancers, firewalls, etc. (some may opt to run “virtual” load balancers or firewalls).

(Side note: back around 2009/2010 there was some degree of integration that extended from the hypervisor out onto the network, but it required specific Cisco gear and specific Cisco virtual switches (which appear to only support vSphere 5 – 6.5). It was highly touted initially, given that the software that ran on the virtual switches was the same as on the external switches, so you could manage them more like a unified system (at least on the network layer).)

Several (in 2025, perhaps most?) HCI vendors also have their own software stacks, usually based on Linux and other open source software (always with lots of custom stuff bolted on top).

Nutanix

Nutanix is the largest independent player in this space, though there are many others that have come and gone over the years.

Dell/EMC VxRail

I have had brief exposure to Dell/EMC VxRail (which leverages VMware’s vSAN), and that experience was pretty terrible. The platform was set up years before I joined the company and it was just a crap experience on many levels. Quite possibly it was set up wrong, though they did have an external certified Dell VAR do the design at least. The office the system was in was being shut down, so the whole VxRail stack was retired well in advance of that.

verge.io

I see a lot of activity on LinkedIn from verge.io recently. One of their employees tried to claim to me in a comment that their system was not hyperconverged. I responded with data from their own website, and, well, they didn’t reply after that. Technically speaking, verge.io claims to be “ultra converged” (UCI). But they really are HCI at their core (that is, combined server + storage). They may have extra stuff on top of that, but the point is they don’t appear to support connecting to external storage systems.

Leave “legacy storage” behind

verge.io argues that companies should leave their “legacy” storage behind, a sentiment pretty much universally echoed throughout the HCI ecosystem. I believe the origin of this was mostly to drive more revenue to the HCI vendor, as they could sell the servers, software and storage (which often has license costs associated as well), rather than just the servers + software. More recently, though, I came to realize that software limitations on their part also prevent them from being able to properly support external storage (I touch a bit more on that below).

Nutanix’s external storage support

Nutanix introduced support for a limited set of external storage systems on their HCI platform in 2024 and 2025. Last I recall seeing (I can’t find the article), HCI market share was something around 30%, and I think the article made it sound like the HCI vendors were having trouble growing further, hence Nutanix’s introduction of support for external storage.

But don’t let Nutanix’s external storage support fool you too much. In brief discussions with one of their employees, they clarified that they currently only support two storage systems because they are not using them as “generic” storage, but rather for a “vVol-like experience”, meaning the storage systems need special software and powerful processors and memory to manage the integrations (which is the same as using VMware’s vVol). So you cannot hook up just any storage system to Nutanix thinking it will work, because the level of integration goes way beyond the storage protocols and MPIO.

My thoughts

Personally, I’ve never been a fan of HCI, for myself at least. I think it’s a fine option for many out there, but I don’t like the software complexity under the hood for the storage level stuff, or the tight integration with the hypervisor, which means the software will be changing more frequently. It is “simpler” from an administration standpoint for sure; the complexity I fear is in the underlying (hidden) software layers below. I have been working with storage in depth now for almost 20 years, and as in all other areas (networking, servers), I don’t like complexity.

I’ve never liked VMware’s vVols for the exact same reason. I was super excited about the concept of vVols in the year or two VMware was talking about it (I saw it as a direct response to technology from Tintri), before they got released. But once I dug into what was needed to use them (after they got released), even on a storage platform which was considered basically the “best supported” for the technology (at the time), it took me just a few minutes to “nope” out of the whole thing entirely. Fast forward to 2025, and vVols are now deprecated.

In case it wasn’t obvious, I don’t like Ceph for the exact same reason. AFAIK, Ceph has its origins in object storage, and I wouldn’t be surprised if it was quite good at that. But adapting object storage to block/file storage and running VMs on top is not something I would want to run myself; I don’t like that level of complexity. I once had a VP of Technology in 2010 who, for some reason, thought we could run VMs on top of Hadoop’s HDFS and eliminate the need for the SAN entirely. Talk about clueless… (at least with Ceph you can do that; you could not with HDFS).

With Broadcom’s outright attacks on the VMware customer base, HCI market share may very well go quite a bit higher as customers leave for other platforms. Virtually none of the HCI vendors (aside from Nutanix) have the ability to support connecting to external storage, as none of the open source virtualization stacks have figured out proper shared block storage, something that VMware has had for over sixteen years now (I came to this specific realization recently and it still blows my mind).

Networking

This refers primarily to Ethernet switching, unless otherwise specified.

Enterprise Grade / Tier 1

The biggest data center brands for switching include Cisco, Arista, and HPE (includes Juniper now).

The biggest data center brands for load balancing include F5 Networks, and Citrix Systems.

Tier 2

Once again, Supermicro is in this group for Ethernet switching. I know there are more but I am not sure who they are.

For load balancers, I’d say tier 2 includes companies like A10 Networks and Kemp.

ODM Tier / Tier 3

ODM, once again, usually means custom designs for their customers, who then run custom software on top of the equipment.

Software load balancers

Technically speaking, appliance load balancers from the likes of F5 and Citrix run mostly on software (they usually have SSL offload to dedicated chips), so when I say software load balancers, what I am really referring to is software-only products (not hardware appliances) like HAProxy, whatever load balancer comes with OpenStack, the VMware NSX load balancer, etc.

While F5 and Citrix both have software-only “virtual appliance” versions of their load balancing products, I don’t consider them to be in this tier, just because they are much more capable (and expensive) products by comparison.

My history

For Ethernet switching my preference has been Extreme Networks for pretty much my entire career, the last 25 years. I didn’t list them under Tier 1 vendors for data center because while I do believe they have high quality and can run data center stuff just fine, they clearly have retreated from the data center market over the past decade, choosing to focus more on campus and wireless. There were multiple times in the past where Extreme Networks held the title for “World’s Fastest Switch” (as with any title, it is always temporary). There was a time when even VMware’s own network was powered by 50,000 ports from Extreme Networks.

But I absolutely don’t discourage anyone from looking at Extreme for their data center needs. I use them in part due to ease of use, their protocol ESRP (comparison between VRRP and ESRP), and their virtual routing/switch capabilities(note this PDF is from 2004, here is a slightly newer one).

My data center needs are simple though, and the same type of equipment I deployed 15 years ago (both speed and features) works fine today, and I believe will continue to work fine for the next decade. 1/10/40GbE is more than adequate (for me, though Extreme does have products that go up to 100Gbps and 400Gbps). Software wise I am using the same software features as I was two decades ago, and there are no obvious signs that requirements will make me change to another strategy in the years ahead.

For load balancing, I have used Citrix Netscaler for almost fourteen years, and used F5 Networks BigIP/LTM for seven years prior to that. I currently run a pair of low end Netscalers as my main load balancers (low end as they are the second smallest model); at peak load they use about 4% of their CPU. The software license limits them to 700,000 HTTP requests per second and 5Gbps of throughput. The hardware (with a license upgrade) is rated for 900,000 HTTP requests per second and 10Gbps of throughput. This compares with their highest end unit (as of 2023) at 5.5 million requests per second and 200Gbps of throughput.

I don’t buy these for the scalability (at this point), since my needs are meager by comparison. I buy them for the features, instrumentation, and stability. My production load balancers route traffic for more than 90 different types/groups of systems, and my non production load balancers route traffic for more than 80 different types/groups of systems. If I had just one or two things I was load balancing then I probably wouldn’t be as dead set on these enterprise load balancing products as I am.

Side note: I started using Logic Monitor as my infrastructure monitoring platform of choice in 2014, and it presents the mountain of metrics that these load balancing platforms expose in a very easy to use manner, which makes it possible to have a “single pane of glass” for so much of the stuff that I do. It’s wonderful.

In both cases I can’t really imagine using different vendors than the ones I am used to.
