Public Cloud Myths

Myth: Public Cloud providers are required to build a new global footprint

It really surprises me how often I see something like this written out, when it’s so easy to disprove with real world examples. If you need a new global footprint spun up in the next 24 hours, then yes, you probably do need a public cloud provider, at least for that initial deployment. Most serious organizations don’t need to make such decisions on such tight time frames. One organization I was at needed a location spun up in Amsterdam in 2012, for example; we spent a good six months thinking about it before actually pulling the trigger (in that case it was to a data center with a tiny footprint, not even one full rack).

Most Global CDNs do not leverage public cloud for their edge services

I can’t realistically claim that no independent CDN out there uses some cloud for its edge (but I can’t recall the last time I came across a CDN endpoint with a cloud IP). Obviously I am referring to non hyperscale branded CDNs (the hyperscale branded ones include Azure Front Door and Amazon CloudFront, both operated by said cloud providers). But look at major CDNs like Akamai or Cloudflare, for example: all you need to do is find the IP of one of their endpoints, and use the WHOIS infrastructure to see who owns that IP. Even smaller CDNs like Fastly, CacheFly (which I have used for the past year), and Imperva (which I have used for the past 3 years) are similar.
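The WHOIS check described above can be sketched in a few lines of Python. This is only a rough illustration, not a polished tool: WHOIS is a plain text protocol over TCP port 43, and whois.arin.net is ARIN's public server. The list of "owner" field names is my assumption about what registries commonly return, and for addresses outside ARIN's region the reply is typically a referral to another registry, which this sketch does not follow.

```python
import socket

# Field names that commonly carry the owner in registry replies (an assumption,
# different registries label these differently).
OWNER_FIELDS = ("orgname", "org-name", "organization", "netname", "descr")


def whois_query(query: str, server: str = "whois.arin.net", timeout: float = 10.0) -> str:
    """Send one WHOIS query (plain text over TCP port 43) and return the reply."""
    with socket.create_connection((server, 43), timeout=timeout) as sock:
        sock.sendall(query.encode("ascii") + b"\r\n")
        chunks = []
        while True:
            data = sock.recv(4096)
            if not data:  # server closes the connection when it is done
                break
            chunks.append(data)
    return b"".join(chunks).decode("utf-8", errors="replace")


def owner_lines(whois_text: str) -> list[str]:
    """Pick out the non-empty lines that usually name the network's owner."""
    found = []
    for line in whois_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip().lower() in OWNER_FIELDS and value.strip():
            found.append(f"{key.strip()}: {value.strip()}")
    return found


def who_owns(hostname: str) -> list[str]:
    """Resolve a hostname to an IPv4 address and look up who owns that address."""
    ip = socket.gethostbyname(hostname)
    return owner_lines(whois_query(ip))
```

Calling `who_owns()` with the hostname of any CDN-fronted site (it needs network access) lists the organization fields for whatever address that name resolves to from where you run it.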

At my previous company for quite a while we used a CDN named Instart Logic. They were founded in 2010, right during the initial public cloud boom. By no means were they a major player; they were tiny by comparison. I specifically recall their senior folks mentioning in a conversation, sort of proudly, that they could and did offer pricing to customers that was significantly less than what Amazon CloudFront cost. Several years later Instart Logic ran out of money and their assets were acquired by Akamai. I think it was in fact Instart Logic that I recall as being the only independent CDN (that I came across) that had one or more public cloud endpoints (in distant countries).

It would not surprise me if in SOME MARKETS, especially remote markets, CDNs leveraged public cloud (assuming cloud providers exist there) in their very early days. If their traffic is very low and they don’t have a lot of customers yet, that would make some sense. Just as it makes sense for public cloud providers to leverage co-location services in those same kinds of markets to spin up their services faster than building (or buying) their own data center in that region.

Why do CDNs build infrastructure themselves? For much the same reasons I do, really: cost and control. But CDNs have another thing to worry about, which most customers don’t ever think about, and that is network routing. They need multiple internet providers, as well as control over their own BGP routes, to maximize network performance for their customers.

I believe the same holds true for most global DNS providers as well(for the same reasons).

I actually had an exchange recently on LinkedIn with an employee from Cloudflare. Initially I joked and said something like “You mean you don’t need to use public cloud for building a global network? That’s not what I heard..” The initial response was that Cloudflare does not leverage hyperscale services for any CDN services or CDN workers (relatively speaking, IMO, CDN workers are still a very new concept, with Cloudflare leading the industry in that area I believe, but I could be wrong). The latter bit actually kind of surprised me. I was assuming, until that point, that it was possible Cloudflare would leverage hyperscale IaaS to handle “burst” workloads. But they clarified again that they do not do that, everything is “on prem”.

Myth: The only alternative to public cloud is to build your own data centers

This myth was added December 2025, about a week after the site launched. In the event you saw the site before and didn’t notice this, it’s because it wasn’t here yet.

I was thinking a lot last night, and thought this would be a good myth to have as well. I cover it less directly in other spots on the site, but here I will be direct. This myth is inspired by a LinkedIn post that I replied to a few weeks ago from someone who claimed to be a former AWS employee, separated from the company just a couple of months prior (so they either quit, were fired, or were laid off around August 2025). I don’t know this person, but they put a pretty blunt rant in a LinkedIn post, and LinkedIn’s algorithm pushed it to me a few days after they wrote it. I say rant because their post literally started with the word “ENOUGH!” (all in caps).

Anyway, they went on to suggest customers keep using public cloud (in response to the big AWS October 2025 outage), and not try to build their own cloud, because it’s really difficult to do and you aren’t smart enough to do it. That part is fine, and I actually agree with it (my argument is most orgs don’t need their own cloud; utility computing is more than sufficient for 90%+ of workloads, and for super large organizations OpenStack or VMware Cloud Foundation may be good solutions if they “require” IaaS on prem). But then they went into the myth: they started talking about needing to build data centers (actual facilities), acquire the land for the data centers, and go through the construction process. Why do that when you can just use public cloud?

The reality of course is that the vast majority of organizations (95%+) don’t need to build data centers. Co-location services have been around for at least twenty five years, since long before public cloud was even a twinkle in anyone’s eye. There are thousands of service providers around the world, in almost every conceivable market, where customers can rent data center floor space or rack space to put equipment. In fact, hyperscale companies themselves often leverage co-location in new markets to get up to speed quicker. Even in existing markets: I recall touring a small co-location data center (a tier 3 facility, though it had a different owner at the time) north of Seattle fifteen years ago, and literally half of the facility was Amazon gear. That was in their home market! The server hosting this very site is at a co-location facility in Fremont, CA (it runs on servers owned by me personally and I pay the co-location bill). This server also runs email, DNS and other services for me.

Back to the person I responded to on LinkedIn. I gave them some info(this was before this site was an idea in my head), but their responses were somewhat nonsensical. We weren’t talking the same language. This person was obviously pretty deep in the cult as they could not form a coherent argument. Eventually they just stopped replying(which isn’t unusual when engaging with such people).

Myth: The only way to protect myself against another failure in us-east-1 is to go multi region or multi cloud

This is another example where I’m consistently surprised by the sheer volume of people and organizations echoing this nonsense, especially in 2025. I could see people saying this back in 2012, when us-east-1 had a power outage and was down for multiple days for several customers. The cloud market was still evolving then, whereas today I’d say it’s quite mature. The point is we got a very important data point back in 2012 (and more, smaller ones in the years since): you should not rely on a single cloud region for important stuff. I have very little doubt that any serious organization leveraging IaaS has technical people inside, the ones actually doing the work, who have thought about it. Maybe they’ve even had formal discussions with their organization on this very topic. Especially for larger customers, I have no doubt that such conversations have taken place at all of them at some point.

So then why do so many organizations have serious outages when us-east-1 goes down if this situation is common knowledge?

The answer is simple and obvious, something I raised fifteen years ago in my original blog post. Saying we need to go multi region or multi cloud is really easy. It is not easy to do. It’s also extremely expensive (as if public cloud wasn’t already extremely expensive). I am sure there are software packages that can make such availability between cloud providers and regions an easier process, however that will absolutely push the costs even higher (if such tools worked well and were cost effective, they would be in much wider use).

So these organizations are making a conscious decision to not go multi region or multi cloud, because it is too difficult/complex and too expensive. You really can’t blame them for it when some of Amazon’s own services like Alexa, Amazon Ring doorbells, and (I think I read) Amazon Prime had problems during the us-east-1 issues in October 2025. If Amazon themselves don’t do this for everything, then you know there are reasons why all these other customers are not doing it either.

For me the answer is simple, as I said on my blog fifteen years ago, all you have to do is look at their SLA to be able to rule them out for running anything that is mission critical to your organization.
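To put an SLA percentage into perspective, it helps to convert it into the downtime it permits. The percentages below are illustrative round numbers only, not any provider's actual SLA terms (read the real document for those, including what the remedy actually is when the target is missed).

```python
HOURS_PER_YEAR = 365 * 24  # 8760, ignoring leap years


def allowed_downtime_hours(sla_percent: float, period_hours: float = HOURS_PER_YEAR) -> float:
    """Hours of downtime an availability target still permits over a period."""
    return period_hours * (1 - sla_percent / 100)


if __name__ == "__main__":
    # Illustrative availability targets, not taken from any real SLA.
    for sla in (99.0, 99.9, 99.99, 99.999):
        hours = allowed_downtime_hours(sla)
        print(f"{sla}% -> {hours:.2f} hours/year ({hours * 60:.1f} minutes)")
```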

I moved my last organization out of AWS nearly fourteen years ago (with a 7 month ROI), and we have had 100% data center availability in the time since. In fact the only time in my career where I did not have 100% data center availability was in 2007; that facility was poorly designed and I moved my organization out within a couple of months.

Myth: Public cloud is just a bunch of computers operated by someone else

I see people say “cloud is someone else’s computer” or similar quite often. Technically that is true, the myth really is in the implied simplicity of the system. Running a computer system is easy. Running a cloud platform is hard. It is an enormous amount of complexity, with an enormous amount of automation(which is yet more complexity) riding on top to try to tame the complexity below. A massive software code base that is constantly evolving and obviously will have bugs like anything else.

Just be careful and don’t confuse Public Cloud IaaS as being in the same league of reliability(see “You can’t run data centers better…” below for more clarification on the term reliability) as a simpler managed hosting operation for example.

Myth: Hyperscale data centers are the same as any other data center

I can see how less technical folks can be ignorant on this topic. They hear the term data center, and well that’s a pretty generic term. Even the U.S. Government has an absurd definition of what makes a data center a data center.

“This has been a point of contention within the nation’s bureaucracy itself, with several redefinitions of the term changing the number of data centers calculated. Back in 2010, it only meant sites that were 500 square feet or more, and met stringent availability requirements – criteria that covered 2,094 data centers.

Over the following years, that definition was expanded to include almost anything with a server that provides services (whether in a production, test, staging, development, or any other environment). A room containing only print servers, routing equipment, switches, security devices (such as firewalls), or other telecommunication components, was not a data center.

By May 2018, the government said it had 12,062 data centers (as of August 2017) – although it did not create nearly 10,000 data centers in seven years, most were already there, just not covered by the previous definition. Others were there previously, but had not been included by the 24 government agencies subject to FITARA and DCOI, due to poor accounting practices. It is therefore not clear how many data centers the US government has actually added over the past decade.”

www.datacenterdynamics.com article from 2019

IMO – anything less than say ~5,000 square feet of space used to host IT equipment is better referred to as a “server room” (regardless of whatever type of services are hosted there, I suppose one exception may be if the facility is built to N+1 availability standards which for such a small place I think would be very unlikely).

The point is (and sorry, because I raise this in other areas on this same page) that true hyperscale data centers are intentionally built to lower quality standards for cost purposes, with the intention that you work around those lower quality conditions by distributing your data/applications across multiple data centers (availability zones) and regions. Losing a data center is like losing a rack for a smaller customer (my stuff would not survive a rack failure, though I’ve never had a rack failure that wasn’t also a data center failure; the last one was in 2007), or could be like losing a single server for a really small customer. Compare that to a world class tier 4 facility, which is built from the ground up to never go down (not that an outage is impossible, just extraordinarily unlikely; in my case, during the past fourteen years, not even the slightest blip indicating any service impact whatsoever).

Myth: You can’t run data centers better than the Hyperscale companies

I had people tell me that fifteen years ago. I didn’t believe it then. But if I am to be completely honest, fifteen years ago I lacked the data on both sides of that equation to properly answer it. I’ll also say I don’t run data centers, never have, never will. But people often use the term data center as a layman’s term for server/network/storage infrastructure, rather than being specific about operating the facility that such infrastructure is installed in. So when I refer to the term data center I am in fact NOT referring to actual facility operations of a data center.

Now that fifteen years have elapsed since my first blog post (and honestly since the first year I heard comments such as those), I can say today, in 2025, with absolute confidence that I can operate, and have operated, data centers far better than the public track record of any public cloud provider over the same period of time.

But this shouldn’t be a surprise to anyone technical really. If it is then you are missing the point about the architecture of hyperscale providers. In an ideal world, a customer would be distributed amongst several different regions (at a minimum) of a given hyperscale provider. Availability of that customer would not be dependent upon a single region(any more than my availability is dependent upon a single server), but on the entire system as a whole.

But we do not live in an ideal world, and we have fifteen years of hard evidence(customers going offline when us-east-1 goes down) now that most organizations do not operate in that ideal world either.

My systems are not built to survive a data center failure, they never have been. In large part because doing that is again, expensive. But more importantly, if you are deployed to a world class tier 4 data center, the likelihood of a full facility failure is exceptionally remote(I have twenty two years of data center experience), so much so that for most organizations(including every one I have worked for in the past 25 years) it is a waste of money to go that route. Now if you have the money and want to do it anyway, more power to you, that is great.

In my opinion, the whole concept of “Disaster Recovery” was really born out of “data centers”(aka “server rooms”) that were actually on site in corporate & government office buildings where availability is far less than a world class tier 4 data center. If your critical servers are in your office, well that is not a great place for them. One company I was at, ironically enough fifteen years ago was in this situation for their Microsoft Exchange system. Before I was even hired at the company, they had their Microsoft Exchange hosted on site in their office(despite having other critical systems in data centers). There was a giant wind storm, and the electricity for the entire city of Bellevue, WA (and others I think) went out for more than 24 hours. I was living there at the time and it happened to be the same city that this company’s HQ was in. They learned then, and moved their Exchange server(s) to their data center following the outage(or maybe during the outage I am not sure).

I’ll also toss this out there, a realization I came to about a decade ago:

For the vast majority of organizations, a disaster means only one thing – your data is gone and is not coming back. Outages, even extended multi day outages are a PITA for sure(I have personally worked through several), but not a disaster. It would certainly be nice to have protection against such extended outages, but reality is most organizations are unwilling to invest for that(in cloud or on prem).

cultofthe.cloud quote

Look at the recent (as of this post) massive, near PB scale data loss in South Korea. I saw so many comments attacking them for not having proper backups, when the original articles quoted officials explaining why they didn’t.

“The G-Drive couldn’t have a backup system due to its large capacity”

South Korea officials on why there was no backup

There was no backup because it was too big. What does that mean? It means they wanted to back it up but there was no budget allocated to do so.

As I mentioned in my blog post fifteen years ago, true hyperscale data centers are not built to the same standards as the facilities I use. That is intentional, in order to save costs (and their decision is a sound one), because availability is not determined by a single data center, or even a region, but by the system as a whole (all regions). Of course you have to have your applications deployed to the “system” rather than just a region, or a single availability zone. That is of course where the costs and complexity skyrocket (even further).

Another realization I had recently which it seems nobody else in the world has recognized yet(at least I haven’t seen signs of recognition) is regarding the recent large outage at AWS in October 2025. Bottom line is a single entry(technically, the lack of an entry in that case) in a DNS Zone hamstrung six data centers, rendering many services unavailable for an extended period of time. That is not a bug, that is a critical design flaw. How many other design flaws like that are there in their infrastructure? Same goes for Google, Microsoft and other hyperscale companies?

Such design flaws don’t exist in my environment because I don’t put such complexity into my systems. I have in fact had my VMware vCenter control plane that managed ~800 VMs fail and stay offline for over a week about a decade ago. The effect: zero impact to operations. It was a bit annoying, but nobody outside of my team knew or cared because it had no impact on anyone but my team(and me specifically). Recovery of the system was quite simple, but I spent a lot of time working with support out of paranoia regarding what may happen when I do recover the system(in the end, nothing bad happened, it was just paranoia, but I am a pessimist by nature).

You personally may not be able to run a data center as well as I can, but you don’t need to be me to get good results.

Myth: You need more staff to manage on prem environments

I cover this to some extent here, but I will repeat myself to call it out in the “myths” section of this website. This myth was added December 2025, about a week after the site launched. In the event you saw the site before and didn’t notice this, it’s because it wasn’t here yet.

On occasion, going back to the origins of “My cloud journey” as far back as 2010, and many times in comments since, I have seen less technical people show ignorance around staffing for on prem vs public cloud. To be clear, I do believe that the two situations (if done to their fullest potential) require very different skill sets.

My personal cloud journey had no staffing changes for on prem vs public cloud. But I don’t expect you to believe my story alone, so as an alternative I can offer that the CEO of 37signals specifically stated on his blog that they did not have to make any staffing changes at all as a result of moving out of cloud.

“[..]And crucially, we’ve been able to do this without changing the size of the operations team at all. Running our applications in the cloud just never provided the promised productivity gains to do with any smaller of a team anyway.”

quote from the CEO of 37signals

I firmly believe that if an organization has moderately competent staff operating public cloud IaaS (done to its fullest potential or close to it), that same staff will have the brainpower to pretty easily adapt to an on prem environment, if they choose to. On prem enterprise offerings are really built from the ground up to be easy to use, and high availability is often built in. You don’t need to design your application stack in a special way to account for unreliable infrastructure (you still can, as nothing is perfect). On prem enterprise offerings usually come with nice user interfaces as well. They may also have APIs, though those would be much less frequently used in most deployments. I’m sure if the staff has zero experience with basic networking concepts, servers etc, they may very well make some mistakes initially. Just as they would make mistakes initially doing stuff in public cloud. Vendors are often more than willing to provide free guidance, or even paid consulting if needed. Documentation is also generally better, and the quality of the products is higher, with a lower rate of change.

I’d also argue that for most organizations(90%+), on prem would really be a small environment. IT equipment is extraordinarily powerful these days, and you don’t need much of it to accomplish a lot. What used to take forty racks two decades ago can be done in less than half a rack today. In most situations, you can easily run two thousand virtual systems in a single rack without much effort. I believe that covers a large portion of the world’s organizations.

Public cloud IaaS on the other hand, if done to its fullest potential, almost universally requires additional complex automation that runs on top of the system in order to try to tame it (such as Terraform or other infrastructure-as-code offerings, things cloud people would immediately recognize), to provide for situations like recovery from system failures etc (vs on prem environments, where recovery from server failures is automatic and seamless). Server lifetimes in public cloud can often be measured in days or months, whereas on prem lifetimes can easily stretch to years or even a decade if desired.

You can try to operate public cloud like you would on prem, I believe most orgs do that, at least initially (certainly mine did, have read many stories over the years of other orgs doing the same). That will usually result in even higher costs, and more frustrations as your expectations will not be met(“you’re doing it wrong” – which I am happy to admit, though I was only following the lead of other people who had years more experience in AWS than me). My main point on the site here is regardless of whether you are “doing it right”, you can’t work around the fact that IaaS is broken-by-design (at least for cost and efficiency reasons).

Myth: You only pay for what you use with cloud

This myth was added December 2025, about a week after the site launched. In the event you saw the site before and didn’t notice this, it’s because it wasn’t here yet.

This is a commonly used myth. But unlike most of the others it’s not ENTIRELY a myth, just mostly. I will clarify that shortly. The point is that people say “pay for what you use” about “cloud” as a general purpose, all encompassing platform, without being specific; in the vast majority of cases they aren’t specific due to their ignorance of the situation.

Paying for what you use

Taking AWS specifically as an example, there are some services that are truly “pay for what you use”. The biggest one is their S3 object storage. You pay for the bytes that you store in it, for the number of times it is replicated, and for the bytes going in and out of the platform. Which is one of the reasons their costs are so high relative to some alternatives (disclaimer: I have never used that service). Another is probably SQS, their queuing platform. Another huge cloud thing for “paying for what you use” is “serverless”, which ironically makes cloud even less cost efficient than IaaS (even for a group inside Amazon themselves).

Paying for what you provision

For most orgs, I believe the bulk of their costs will be here. That includes things like EC2 (virtual machines), and RDS (managed database offering) being some of the biggest. Also that includes block storage such as EBS (which is tied into EC2), and probably other things (been so long since I used the services, I don’t feel a need to cite every service anyway).

Taking a super simple situation: you have a workload that can benefit from 8 CPU cores for a few seconds at a time, a dozen times per hour. Most customers would just make a VM that has 8 CPUs on it and be done with it. This is where the trap is. Your 8 CPUs are idle for 90%+ of the time, and you are paying for those 8 CPU cores regardless of their utilization. The capacity from those 8 CPU cores cannot be allocated to other resources in the meantime, leading to stranded capacity, which is the key point regarding Hyperscale IaaS: broken by design, something I first raised fifteen years ago.
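For anyone who wants to see the arithmetic, here is a small sketch of that 8 core example. The 5 second burst length is my stand-in for "a few seconds", and the 30 day month is just a convenient period; plug in your own numbers.

```python
SECONDS_PER_HOUR = 3600


def busy_fraction(burst_seconds: float, bursts_per_hour: int) -> float:
    """Fraction of each hour the workload actually needs its cores."""
    return (burst_seconds * bursts_per_hour) / SECONDS_PER_HOUR


def stranded_core_hours(cores: int, busy: float, hours: float) -> float:
    """Core-hours that are provisioned (and billed) but sit idle over the period."""
    return cores * hours * (1.0 - busy)


if __name__ == "__main__":
    # Assumed: each burst lasts 5 seconds, a dozen bursts per hour.
    busy = busy_fraction(burst_seconds=5, bursts_per_hour=12)
    hours = 24 * 30  # a 30 day month
    idle = stranded_core_hours(cores=8, busy=busy, hours=hours)
    print(f"busy {busy:.1%} of the time; {idle:.0f} of {8 * hours} core-hours stranded")
```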

Same goes for disk space, any disk space that is not being used, you are being charged for regardless(disk space is especially wasted on the “instance level” disks attached to the VMs). In a utility computing model, you allocate space, but that space is not consumed until it is written to, and the space is then freed automatically when the data is deleted.

Not only disk space, but disk I/O is wasted as well(for non technical people I/O signifies the actual performance you get out of the storage). When the storage for that VM is idle, you can’t leverage that capacity for any other systems, it is stranded.

Same goes for memory. Due to fixed instance sizes, you are forced into cookie cutter VM designs that do not offer precise levels of control over their capacity. The result is you are forced to choose the closest sized system to your requirements and just eat the wasted resources that come along with it, because almost certainly most workloads will not be able to fully leverage the CPU/Memory/Disk of every VM in the environment. With Utility Computing, resources are shared, so what one VM isn’t using, another VM can use in a safe manner.
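A quick sketch of the "closest sized system" problem. The instance shapes below are made up for illustration (they are not any provider's actual catalogue), but the pattern of CPU and memory doubling together is the cookie cutter design I am talking about.

```python
from typing import NamedTuple, Optional


class Shape(NamedTuple):
    name: str
    vcpus: int
    memory_gib: int


# Hypothetical fixed instance shapes, invented for this example.
CATALOGUE = [
    Shape("tiny", 2, 4),
    Shape("small", 2, 8),
    Shape("medium", 4, 16),
    Shape("large", 8, 32),
    Shape("xlarge", 16, 64),
]


def closest_fit(need_vcpus: int, need_memory_gib: int,
                catalogue: list[Shape] = CATALOGUE) -> Optional[Shape]:
    """Smallest shape that satisfies BOTH the CPU and the memory requirement."""
    fits = [s for s in catalogue
            if s.vcpus >= need_vcpus and s.memory_gib >= need_memory_gib]
    return min(fits, key=lambda s: (s.vcpus, s.memory_gib), default=None)


def waste(need_vcpus: int, need_memory_gib: int, shape: Shape) -> tuple[int, int]:
    """(vCPUs, GiB of memory) provisioned beyond what the workload needs."""
    return shape.vcpus - need_vcpus, shape.memory_gib - need_memory_gib


if __name__ == "__main__":
    # A workload needing 3 vCPUs and 20 GiB is pushed up to "large" by its memory.
    shape = closest_fit(3, 20)
    print(shape, waste(3, 20, shape))
```

In a shared resource pool the unused vCPUs and memory would be available to other VMs; with fixed shapes they are simply paid for.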

Put most simply, go provision an 8 CPU VM on any cloud service, fire it up, and don’t “use” it. Let it sit doing nothing. In a perfect world you’d be charged perhaps just a few cents for consuming a couple GB of disk space, and some tiny amount for consuming less than 1% of the CPU capacity and probably 2-3% of the memory capacity. But in the hyperscale IaaS world, you are charged the full price, regardless of whether or not you are using it. Just having it on is considered using it. That may sound strange, but I’m sure many admins can relate to having servers “on” while nobody is “using them” (perhaps those who were using them forgot to tell IT they don’t need that system anymore, or they left the organization). “Usage” for me implies actual resource utilization. Hence you are paying for what you provision, not for what you use.

Myth: I want to “burst” my workload to public cloud

This myth was added December 2025, about a week after the site launched. In the event you saw the site before and didn’t notice this, it’s because it wasn’t here yet.

I’ve heard this many times over the last decade and a half, even from technology leaders who I have worked for personally. They want to have their steady state “on prem” then be able to “burst” to a public cloud for large traffic events. On paper it sounds nice. But this strategy will not work in the vast majority of cases, and it is not public cloud specific. It often will not work for bursting to another “on prem” data center either for the exact same reason: latency.

Latency is a performance killer

When thinking about this myth, people never consider latency, which would kill such plans to burst almost dead in their tracks 90%+ of the time. There are exceptions to everything of course. One way you could work around this is if you specifically locate your “on prem” resources geographically near to your public cloud provider (or other “on prem” data center – say less than 10ms of network latency, could be more or less depending on your stack’s tolerance), which means within maybe 15-30 miles of the public cloud’s servers you intend to burst to(the fiber between your two facilities may have to travel some distance to a central location, so can’t rely on a basic map to determine distance). So in my case for example being hosted in Atlanta, GA, it would be ludicrous to expect to burst to a public cloud player located in Virginia, way too far away.
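To see why latency bites, remember that a single user request often makes many sequential calls to a database or cache, and every one of those calls pays the full round trip between the two sites. A back of the envelope sketch follows. The roughly 200 km per millisecond figure for light in fiber (about two thirds of the speed of light in a vacuum) is standard physics; the round trip counts are hypothetical. Note that propagation delay is only a floor: real fiber paths are longer than the map distance, and routers and other equipment add their own delay on top.

```python
FIBER_KM_PER_MS = 200.0  # light in fiber travels roughly 200 km per millisecond
KM_PER_MILE = 1.609344


def min_rtt_ms(fiber_path_miles: float) -> float:
    """Theoretical floor for round trip time over a fiber path of this length."""
    return 2 * fiber_path_miles * KM_PER_MILE / FIBER_KM_PER_MS


def added_request_ms(rtt_ms: float, sequential_round_trips: int) -> float:
    """Extra time per user request when each backend call crosses the link."""
    return rtt_ms * sequential_round_trips


if __name__ == "__main__":
    # Hypothetical app: 50 sequential backend calls per page view.
    for rtt in (0.5, 2, 10, 15):
        print(f"RTT {rtt} ms -> +{added_request_ms(rtt, 50):.0f} ms per request")
```

A round trip time that looks harmless on its own becomes very visible once it is multiplied by every backend call in a request, which is why how chatty your stack is matters as much as the raw distance.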

People almost universally overestimate their resource requirements

I’ve seen it time and again at least with people I have worked with. Big cloud aspirations, need cloud to scale, this and that.. when they literally have no idea how much resources are required to run their system. They just think I want to scale to 10X to 20X traffic, and to do that I think it would be best to burst to a cloud. Whereas in reality in most cases(all cases in my career actually) such 10-20X bursts in traffic amount to very little in additional infrastructure. Maybe at best a doubling of the server capacity. Yes, double capacity to handle 10-20X traffic, because most of the time the servers are way under utilized on normal traffic days. People seem to be thinking in CPU/memory capacities of two decades ago (single core CPUs, 2-4GB of memory per server), that’s the only way I think I can try to explain why they think the way they do.

Modern systems, even systems from a decade ago are obviously far more powerful than that. At my last org in 2014 I deployed a trio of 24-core/96GB memory systems with LXC running on bare metal for their main e-commerce stack. It was overkill for sure, but it was also super cheap especially taking into account the licensing for the e-commerce platform itself. Under normal workload the systems ran well under 10% CPU/memory. Under the highest load ever they probably topped out at 70-80% CPU, ran for about 6 years without a single issue, until the app stack was replaced with a new app stack.
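The capacity math behind this is simple enough to sketch. It assumes load scales linearly with traffic (real stacks rarely scale that cleanly, so treat it as a first estimate) and uses an 80% ceiling in line with the peak I mentioned above. The baseline utilization figures in the usage lines are illustrative.

```python
import math


def servers_needed(current_servers: int, baseline_util: float,
                   traffic_multiplier: float, max_util: float = 0.8) -> int:
    """Servers required to absorb a traffic burst without exceeding max_util.

    Assumes utilization grows linearly with traffic and never returns fewer
    servers than are already deployed.
    """
    required = current_servers * baseline_util * traffic_multiplier / max_util
    # round() guards against float noise such as 6.000000000000001 before ceil().
    return max(current_servers, math.ceil(round(required, 9)))


if __name__ == "__main__":
    print(servers_needed(3, 0.05, 10))  # 5% baseline, 10X traffic
    print(servers_needed(3, 0.08, 20))  # 8% baseline, 20X traffic
```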

If I had a stack with far more traffic, you can literally get servers with up to 384 cores in two sockets today (sixteen times the number of cores I had in 2014 with dual sockets), though I would go for fewer cores and distribute among a few additional systems depending on overall capacity requirements. Web server software has had “auto scaling” of web workers for literally over twenty five years now. No need for fancy Kubernetes stuff, just set the settings in the container in LXC and let it go. I have had near flawless operation of LXC on bare metal for eleven years now: simple, fast, easy to manage. Certainly avoid web server software that doesn’t have dynamic worker support (such as Unicorn, which is for Ruby). Speaking of Ruby, I remember deploying my first production Ruby stack in 2007. Originally I believe it used Apache and fastcgi, and that had some weaknesses. I came across mod_fcgid and deployed that instead, with great results. Not only could the workers scale up and down, they would auto restart after some time, which really helped in working around memory leaks. More recently a Ruby stack I deal with uses Puma, after half a decade of complaining about how bad Unicorn was. I’d also suggest using server software that can report internal metrics on worker utilization. Apache has had this for well over two decades with mod_status, and Puma has it as well (Unicorn did not, last I checked). I mention this because I was shocked how poor the internal metrics from nginx are by contrast, at least in the open source version; the commercial version may be much better but isn’t cheap!
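As an example of the kind of worker metrics I mean, Apache's mod_status has a machine readable view (append ?auto to the status URL) that reports fields such as BusyWorkers and IdleWorkers, one "Key: value" pair per line. A minimal sketch of turning that into a utilization number is below. The URL is an assumption (it depends on how mod_status is configured on your server), and note the ratio only covers workers that are currently running, not the maximum Apache is allowed to spawn.

```python
import urllib.request


def parse_status_auto(text: str) -> dict[str, str]:
    """Parse the 'Key: value' lines of mod_status's ?auto output."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def worker_utilization(text: str) -> float:
    """Busy workers as a fraction of all workers that are currently running."""
    fields = parse_status_auto(text)
    busy = int(fields["BusyWorkers"])
    idle = int(fields["IdleWorkers"])
    total = busy + idle
    return busy / total if total else 0.0


def fetch_status(url: str = "http://localhost/server-status?auto") -> str:
    """Fetch the status page. The default URL is an assumed, typical location."""
    with urllib.request.urlopen(url, timeout=5) as response:
        return response.read().decode("utf-8", errors="replace")
```

Feeding `worker_utilization(fetch_status())` into whatever monitoring you already have gives you a trend of how close the web tier actually gets to running out of workers.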

So the next time you hear someone suggest bursting to cloud (or anywhere else really), or perhaps think about it yourself, think hard about how many resources your app really uses. If you don’t need to get to thousands of cores for short periods of time, then bursting to cloud is not something to consider IMO.