
Making use of your organisation's data is essential for staying competitive in any industry. Having a modern data platform is a must, and it is important that management makes the right decisions for their investments from the start.

We generate ever-increasing amounts of data, with systems and tracking that cover almost all of our business processes. Being able to leverage that data via a modern data platform is therefore a priority for most organisations. This article covers:

•    What business leaders need to know about modern data platforms and their benefits
•    How you can get started with a modern data platform 

Modern data platform is a term that's thrown around with little clarity or consistency. Depending on the author, it can mean multiple things and cover various technologies. On top of that, you'll also hear terms like lakehouse, data mesh and fabric, which make it even more difficult to understand what one actually is. Therefore, we’ll explore a few clarifiers to define and explain a modern data platform in an understandable way.

Where are we coming from?

Historically, businesses have spent vast amounts of money and effort pulling data into a centralised data warehouse. Data from their various systems is ingested into the warehouse, structured into a usable form, then analysed in reports that show, for example, how the business runs or what its customers buy. Given the heavy cost and time investment, surely there’s now a better way?

On the other hand, isn't that what businesses still want to do: gather all their data into a usable format and get insights out of it? I'd definitely say so, for the most part.

However, the demands of the modern enterprise organisation require improvements to the way this is done. Centralised data warehouses grow into slow, expensive beasts which restrict innovation and the use of newer, higher-value technologies like machine learning or real-time analytics.

In addition, enterprise organisations often have large numbers of applications and data sources, with varying methods to extract data, obscure ownership and heavy governance among other challenges. Using a central data warehouse in these circumstances is a sure-fire way to slow progress.

What makes it a MODERN data platform?

The goal of a modern data platform is to leverage newer architectural concepts and technologies for the following benefits:
 
•    Lower the costs and improve the efficiency of data storage and data processing
•    Work with common and more “raw” types of data like unstructured or semi-structured data
•    Enable faster or real-time analytics
•    Make data more accessible to the people that need it
•    Reduce the manual effort required to perform common tasks
•    Leverage simpler development and low-code development
•    Automate and simplify management tasks like access control and performance monitoring
•    Enable innovation to happen faster
•    Improve security and resilience

What are some examples of these architectural concepts and new technologies? 

Public cloud platforms

Microsoft Azure, Amazon Web Services and Google Cloud Platform all offer a huge variety of cloud services which aim to provide the best in modern data technology. The platforms cover not just things like data storage and processing, but also security, data management, data governance and more: everything that is needed for a full data platform.
 
Each individual service which makes up the whole platform is designed to enable automation and interactivity, minimise complexity and accelerate development. In addition, new fields like machine learning are much more accessible, as is the ability to test these technologies with maximum pace and minimum cost.

Warehouse to lakehouse

A lakehouse is probably the most defining component of a modern data platform.

A data warehouse is full of structured data, that is, data in tables connected to other tables. Once data is in the warehouse and committed to an organised data model, it is simple to analyse using commonly available skills such as SQL (a query language used to interrogate structured data). However, data storage, processing and computation are expensive in enterprise-scale data warehouses due to the underlying technology.

A lakehouse takes the same concept of a data warehouse, structuring data and preparing it for analysis. However, it aims to achieve lower costs via:

•    Using cheap data lake storage (storing files is cheaper than tables of data)  
•    More efficient file formats (more efficient = smaller files and better file structures)
•    Reducing the number of areas data is stored in: data stays in the cheap data lake rather than being physically stored in an expensive warehouse
•    Employing more efficient data processing technologies using programming languages like Python (and PySpark)
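
As a rough illustration of the file format point, the sketch below writes the same records twice: once as plain row-oriented CSV, and once column by column with compression, which is the general idea behind lakehouse file formats such as Apache Parquet. The records, field names and helper functions are made up for this example, and the standard-library `zlib` compression is only a stand-in for what a real format does.

```python
import csv
import io
import json
import zlib

# Hypothetical sales records; real lake data would be far larger.
records = [
    {"order_id": i, "country": ["UK", "DK", "NO"][i % 3], "amount": 100 + (i % 50)}
    for i in range(10_000)
]

def to_row_csv(rows):
    """Row-oriented layout: every record is written out in full, uncompressed."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0]))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue().encode("utf-8")

def to_columnar(rows):
    """Column-oriented layout: each column is stored and compressed separately."""
    return {
        name: zlib.compress(json.dumps([row[name] for row in rows]).encode("utf-8"))
        for name in rows[0]
    }

row_bytes = to_row_csv(records)
columns = to_columnar(records)
columnar_size = sum(len(blob) for blob in columns.values())

# Summing one column only needs that column's bytes, not the whole file.
total_amount = sum(json.loads(zlib.decompress(columns["amount"])))

print(f"row-oriented CSV: {len(row_bytes):,} bytes")
print(f"columnar, compressed: {columnar_size:,} bytes")
print(f"total amount: {total_amount}")
```

Part of the saving here comes from compression and part from grouping similar values together, so treat the sizes as indicative rather than as a benchmark.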

Databricks is a great example of a platform that does this very effectively and can be deployed on each of the public cloud providers' platforms. Another prominent example is Microsoft Fabric, recently released as an all-in-one data platform solution.

From BI to AI 

Business intelligence tools like Power BI or Tableau can visualise data to give you information and insights about it, allowing you to make informed decisions. However, AI promises the ability to help you make better decisions or just to make the decision for you in the first place.

For example, a stock report in Power BI could tell you how much of a product you sold this year, enabling you to work out how much you might sell in January next year. A machine learning model built to produce a demand forecast may well be able to do this quicker and more accurately than a person ever could. As the forecast is typically developed using cloud services it can be visualised, distributed, integrated into applications and automated, driving accessibility and adoption. 
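
To make the idea concrete, here is a minimal sketch of a forecast. It fits a straight-line trend to twelve months of sales and extends it to the following January. The sales figures and function names are invented for illustration, and a simple trend line is only a stand-in for the machine learning model a data science team would actually build.

```python
from statistics import mean

# Hypothetical units sold per month this year (January to December).
monthly_sales = [120, 125, 131, 138, 140, 149, 152, 160, 163, 171, 175, 182]

def fit_trend(xs, ys):
    """Ordinary least squares fit of a straight line; returns (slope, intercept)."""
    x_bar, y_bar = mean(xs), mean(ys)
    slope = sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys)) / sum(
        (x - x_bar) ** 2 for x in xs
    )
    return slope, y_bar - slope * x_bar

def forecast_next_month(sales):
    """Extend the fitted trend one month past the end of the history."""
    months = list(range(1, len(sales) + 1))
    slope, intercept = fit_trend(months, sales)
    return slope * (len(sales) + 1) + intercept

january_forecast = forecast_next_month(monthly_sales)
print(f"Forecast for next January: {january_forecast:.0f} units")
```

A real demand forecast would also account for seasonality, promotions and other drivers, which is where machine learning earns its keep.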

Batch data to real-time data

Batch data typically means pulling a chunk of data out of a system on a regular interval. For example, exporting new customer details out of your CRM every hour to update your customer reporting.
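
One common way to do this is to remember when the last export ran and only pull records created since then. The sketch below assumes that approach; the customer records, field names and the `extract_batch` function are hypothetical, and in practice the rows would come from your CRM's export or API.

```python
from datetime import datetime

# Hypothetical CRM rows for illustration.
crm_customers = [
    {"id": 1, "name": "Acme Ltd", "created_at": datetime(2024, 1, 8, 9, 15)},
    {"id": 2, "name": "Globex", "created_at": datetime(2024, 1, 8, 9, 50)},
    {"id": 3, "name": "Initech", "created_at": datetime(2024, 1, 8, 10, 20)},
]

def extract_batch(rows, last_watermark):
    """Return rows created after the previous run, plus the new watermark."""
    new_rows = [row for row in rows if row["created_at"] > last_watermark]
    new_watermark = max(
        (row["created_at"] for row in new_rows), default=last_watermark
    )
    return new_rows, new_watermark

# The 10:00 run picks up everything created since the 09:00 run.
batch, watermark = extract_batch(crm_customers[:2], datetime(2024, 1, 8, 9, 0))

# The 11:00 run only picks up what arrived after the last watermark.
next_batch, watermark = extract_batch(crm_customers, watermark)

print([row["name"] for row in batch])
print([row["name"] for row in next_batch])
```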

For a large number of use cases an hour is too slow: business stakeholders want data to be refreshed every minute, if not instantly. For example, your finance team might be making frequent changes to your ERP data during month-end reporting. They’d like to see that data refreshed in your reports as they make each change, to keep up the flow of their work.

Additionally, a modern data platform should enable streaming of data for purposes such as Internet of Things (IoT) analytics. A prominent example here is that manufacturing organisations want live analysis of their machines to understand when they’re likely to break or produce lower quality products.

The speed at which the data is received enables them to address issues sooner and therefore reduce downtime or bad products. Now augment that real-time data with machine learning and a predictive maintenance use case and you can predict when machines will break ahead of time.
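
As a simplified sketch of live machine monitoring, the code below processes sensor readings one at a time and raises an alert when the rolling average drifts above a limit. The readings, window size, limit and the `monitor` function are all assumptions made for this example, and a fixed threshold is only a stand-in for a trained predictive maintenance model.

```python
from collections import deque

def monitor(readings, window=3, limit=80.0):
    """Yield (index, rolling average) whenever the average exceeds the limit."""
    recent = deque(maxlen=window)  # keeps only the latest `window` readings
    for index, value in enumerate(readings):
        recent.append(value)
        if len(recent) == window:
            average = sum(recent) / window
            if average > limit:
                yield index, average

# Hypothetical temperature readings arriving one at a time from a machine sensor.
temperatures = [70.0, 72.0, 71.0, 78.0, 85.0, 90.0, 88.0]

alerts = list(monitor(temperatures))
for index, average in alerts:
    print(f"reading {index}: rolling average {average:.1f} exceeds limit")
```

Because the function consumes readings as they arrive rather than waiting for a full export, the alert can fire as soon as the trend appears.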

Centralised data to data mesh

If everything is done centrally then as your organisation gets larger and your data platform follows, progress becomes slower as:

•    You need more governance and will experience more resource bottlenecks impacting time to value and innovation.
•    Your results will suffer as your developers will have less specific domain expertise (a domain being Finance, HR, etc.). Alternatively, you’ll need to use more of your business teams' time to support with domain expertise.

A data mesh could potentially be the way to go if you’re a large organisation with multiple large domains. In simple terms, a data mesh is concerned with technology, principles, domains and products. The goal is to link domains to each other using guiding principles, technology, and the concept of products.

Each domain builds its own data products which serve both its own purposes and are made accessible (where appropriate) to other domains in the organisation. For example, finance data products which deal with sales data will probably be of great use to marketing to understand the impact marketing is having on sales.

This sounds like a great idea on paper: streamlined governance, domain expertise to build effective solutions, modern ways of working, sharing of data and other benefits. However, it requires an extensive transformation for most organisations, which is of course expensive and time-consuming. It is therefore only suited to a limited range of organisations: large ones with existing maturity in data and cloud, or medium-sized ones getting started with cloud and prepared to invest significantly in data.

Recommendations to get started 

If you’re a business with limited or no cloud use:

  • Start with a data strategy!
  • Focus the technology on a centralised cloud lakehouse
  • Build your mandatory reporting first such as operational and financial data products, reporting to meet compliance requirements and so on
  • Identify domains which have high value use cases and build out use cases incrementally focusing on a limited number of domains. These use cases will primarily be using business intelligence tools like Power BI to begin, unless you have external support or budget to invest in a data science team as well
  • As your team scales you can support more domains in parallel and start investing in innovative technologies like AI, machine learning and IoT

If you’re a business that already has a cloud data warehouse:

  • Build a business case for the investment required to refactor your warehouse into a modern data platform. Answer questions like how quickly you will recoup the costs of the transformation via cost savings from the platform or new services that can be offered
  • Assuming a positive business case, build the modern data platform and migrate your data. This can be done incrementally per system or domain, in parallel with the next step, ensuring you start producing value from your new data platform sooner
  • With your platform ready to go you can either repoint your existing reporting to it or focus on building the missing components
  • Then you can follow the final two steps in the previous example to drive value from your data platform
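
For the business case, the payback question can be sketched as simple arithmetic: divide the upfront cost of the transformation by the monthly benefit it is expected to deliver. The figures and the `payback_months` function below are hypothetical, and a real business case would also consider running costs, risk and the time value of money.

```python
import math

def payback_months(upfront_cost, monthly_saving, monthly_new_revenue=0.0):
    """Months until cumulative benefits cover the upfront cost of the transformation."""
    monthly_benefit = monthly_saving + monthly_new_revenue
    if monthly_benefit <= 0:
        return None  # the investment never pays back on these assumptions
    return math.ceil(upfront_cost / monthly_benefit)

# Hypothetical figures for illustration only.
months = payback_months(
    upfront_cost=240_000, monthly_saving=12_000, monthly_new_revenue=8_000
)
print(f"Payback period: {months} months")
```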

A solid data strategy is the best start

The term modern data platform is a catch-all for a group of technologies, functionality and ambitions combined to form a single holistic solution for your data in the cloud. It can be difficult for business leaders to know where to start with such a muddled concept, let alone realise the potential of the concept.

Business leaders should look to understand the complexities and costs of a modern data platform and how these compare to the multitude of potential benefits. Ideally this starts with a data strategy. For businesses with more imminent needs, a good start is to identify valuable use cases and appropriate technologies, then design a solution architecture and roadmap.

If you lack the expertise or capacity within your organisation to take these steps, Columbus can support you along the journey into cloud and modern data platforms.
