The cloud has been touted as the panacea for all things data-related. Whether posited as the ideal locus for Artificial Intelligence deployments or renowned for streamlining the fundamentals of data management, cloud computing is considered the solution for nearly every aspect of the data landscape.
“Silicon Valley’s IT business is really good at telling stories and painting pictures in the early days that are attractive, and it gets people moving in a direction,” explained Stardog CEO Kendall Clark. “And then, just like any other complicated human thing, the costs come later, and the sober reality comes in later.”
For the cloud, the sober reality is that this paradigm has a number of points of concern that, unless properly addressed, become business flaws with the potential to undo any advantage gained from it. Specifically, these issues pertain to:
- Cost: It’s not uncommon for organizations’ costs to soar once they migrate to the cloud, despite its pay-per-use pricing model.
- Data Consolidation: The cloud reinforces the notion of consolidating data at the storage layer, which can prove deleterious in both the short and long term.
- Repatriation: The above factors and a number of other considerations are contributing to the movement towards data repatriation, in which organizations are actively trying to move their data from the cloud back to on-premises settings.
Collectively, these factors, along with the mounting traction for alternatives to the burgeoning costs, data replication, and regulatory consequences attendant to cloud computing, are contributing to a shift in sentiment. “You can start to sense the beginnings of, I won’t say backlash just yet, but a turn, a reconsideration, that the biggest companies on the planet ought to think twice before they headlong rush all their data sources into either their own cloud environment, or vendor systems’,” Clark observed.
Despite the abundance of solutions designed to monitor (and predict) cloud costs, it’s easy for costs to spiral when moving to the cloud. This is counterintuitive for the many organizations that considered cost one of the cloud’s advantages.
“If you listened very carefully, no one ever said the cloud was going to be cheaper,” Clark recalled. “They said very particularly for the customer that it was going to shift from CapEx to OpEx. That has value up to a point, but you can’t absorb any level of rising costs and justify it by saying, ‘it’s an OpEx, not a CapEx’. In the long run, enough OpEx growth and everybody’s out of business.”
The growth of operating expenses in the cloud stems from factors such as vendor lock-in, egress fees, and the proliferation of data (and of the tools required to manage them) that the cloud model supports.
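Clark’s point about unbounded OpEx growth can be made concrete with a little arithmetic. The sketch below is illustrative only: the function name `breakeven_month` and every figure passed to it are hypothetical, and it deliberately ignores the running costs an on-premises deployment would also incur. It simply finds the first month in which a growing monthly cloud bill has cumulatively matched a one-time capital outlay.

```python
from typing import Optional


def breakeven_month(
    capex: float,
    monthly_start: float,
    monthly_growth: float,
    horizon: int = 120,
) -> Optional[int]:
    """Return the first month in which cumulative OpEx reaches the CapEx figure.

    capex          -- one-time capital outlay (hypothetical figure)
    monthly_start  -- first month's cloud bill (hypothetical figure)
    monthly_growth -- fractional month-over-month growth of the bill, e.g. 0.05
    horizon        -- number of months to look ahead before giving up

    Returns None if the cumulative bill never reaches `capex` within `horizon`.
    """
    total = 0.0
    bill = monthly_start
    for month in range(1, horizon + 1):
        total += bill
        if total >= capex:
            return month
        bill *= 1 + monthly_growth  # next month's bill is larger
    return None


if __name__ == "__main__":
    # Hypothetical numbers chosen only to show the effect of growth.
    flat = breakeven_month(1_000_000, 20_000, 0.0)
    growing = breakeven_month(1_000_000, 20_000, 0.05)
    print(f"flat bill: month {flat}; bill growing 5% a month: month {growing}")
```

With these made-up inputs, a growing bill reaches the capital figure in roughly half the time a flat bill would, which is the dynamic the CapEx-to-OpEx framing tends to obscure.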
With so much data scattered throughout so many different cloud resources, it’s common practice for organizations to simply replicate relevant datasets for any sort of centralized analytics or application. The amassing of these data, their replication, and the numerous tools required for managing them boost expenses for cloud computing. “The most recent statistic I saw was that 85 percent of all businesses, irrespective of size and including the SMB market, have data in more than one cloud environment,” Clark revealed.
When organizations constantly replicate such data for centralized views or for the application and analytics use cases described above, they simply create more data, drive up the cost of the cloud, and run the risk of incurring regulatory penalties for data privacy and other violations. “This year the planet will create about 59 zettabytes of data, which is a lot, and 90 percent of that is replica data,” Clark mentioned. “Then you also put with that fact that network performance historically is trending down. There is no Moore’s Law for network performance, and all of that data has to be moved over some network if it’s going to be moved at all.” Such movement also exacerbates cloud costs.
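To put the figures Clark cites in perspective, the short sketch below splits a total data volume into replica and original data and derives the average number of copies per original. The function name `replica_breakdown` is hypothetical, and the calculation assumes that everything not counted as replica data is original data.

```python
def replica_breakdown(total_zb: float, replica_share: float):
    """Split a data volume into replica and original parts.

    total_zb      -- total data volume, in zettabytes
    replica_share -- fraction of the total that is replica data (0 <= share < 1)

    Returns (replica_zb, original_zb, copies_per_original), assuming all
    non-replica data is original.
    """
    if not 0 <= replica_share < 1:
        raise ValueError("replica_share must be in the range [0, 1)")
    replica_zb = total_zb * replica_share
    original_zb = total_zb - replica_zb
    copies_per_original = replica_zb / original_zb
    return replica_zb, original_zb, copies_per_original


if __name__ == "__main__":
    # 59 zettabytes and 90 percent are the figures quoted in the article.
    replica, original, copies = replica_breakdown(59, 0.90)
    print(f"replica: {replica:.1f} ZB, original: {original:.1f} ZB, "
          f"copies per original: {copies:.1f}")
```

Under that assumption, a 90 percent replica share works out to roughly nine copies for every original, each of which had to cross a network at least once.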
Organizations can reduce these costs on all fronts by not copying data and by minimizing their movement, simply linking sources together in a holistic data fabric instead. When implemented with data virtualization and knowledge graph capabilities, data fabrics can assemble whatever data are required for a query without physically consolidating them at the storage layer. In addition to decreasing the costs of the cloud, this approach also protects firms from regulatory transgressions and from other cloud caveats such as vendor lock-in, in which cloud platforms appropriate organizations’ metadata (and, by extension, their data) and/or make them pay enormous egress fees to remove data.
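The idea of answering a query without consolidating data can be sketched in a few lines. The example below is a toy illustration, not how any particular data fabric product works: two in-memory SQLite databases stand in for independent sources, and all table names, rows, and function names (`make_source`, `revenue_by_customer`) are hypothetical. It covers only the virtualization side of the idea (querying sources in place and moving just the rows a question needs) and does not attempt to model a knowledge graph.

```python
import sqlite3


def make_source(schema: str, insert: str, rows: list) -> sqlite3.Connection:
    """Create an in-memory database standing in for one independent source."""
    conn = sqlite3.connect(":memory:")
    conn.execute(schema)
    conn.executemany(insert, rows)
    conn.commit()
    return conn


# Two hypothetical sources; neither is ever copied into the other.
crm = make_source(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, region TEXT)",
    "INSERT INTO customers VALUES (?, ?, ?)",
    [(1, "Acme", "EU"), (2, "Globex", "US"), (3, "Initech", "EU")],
)
billing = make_source(
    "CREATE TABLE invoices (customer_id INTEGER, amount REAL)",
    "INSERT INTO invoices VALUES (?, ?)",
    [(1, 100.0), (1, 50.0), (2, 75.0), (3, 20.0)],
)


def revenue_by_customer(region: str) -> dict:
    """Total invoiced amount per customer in `region`, computed across sources.

    Each source is queried in place with a filter, so only the rows needed
    for this one question leave it; there is no central copy of either table.
    """
    customers = crm.execute(
        "SELECT id, name FROM customers WHERE region = ?", (region,)
    ).fetchall()
    if not customers:
        return {}

    ids = [customer_id for customer_id, _ in customers]
    placeholders = ",".join("?" for _ in ids)
    totals = dict(
        billing.execute(
            "SELECT customer_id, SUM(amount) FROM invoices "
            f"WHERE customer_id IN ({placeholders}) GROUP BY customer_id",
            ids,
        ).fetchall()
    )
    # Join the two partial results in memory, keyed by customer name.
    return {name: totals.get(customer_id, 0.0) for customer_id, name in customers}


if __name__ == "__main__":
    print(revenue_by_customer("EU"))
```

The design choice worth noting is that filtering and aggregation are pushed down to each source, so the only data that moves is the small result set needed to answer the question.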
“In an era with concerns for privacy and data control, being able to move the least amount of data that’s necessary to perform some operation or business process on behalf of your customer is a much better policy posture for all businesses than saying, ‘you need to give us permission to have access to any and all of your data, and we can’t even tell you what for, that’s just how our business runs’,” Clark maintained.
An Enterprise Data Fabric
Each of these considerations (increasing cloud costs, the perils of copying and consolidating data at the storage layer, and the regulatory risks of doing so) is increasing the momentum around data repatriation, in which “companies are moving datasets back from the cloud,” Clark said. Regardless of where data reside, organizations can unify them in an enterprise data fabric that practically eliminates data movement and its negative effects, yielding faster query results, improved operations, and far greater agility in the face of an uncertain future.
Jelani Harper is an editorial consultant servicing the information technology market. He specializes in data-driven applications focused on semantic technologies, data governance, and analytics.