You might have heard the announcement at Microsoft Build about a new service offering named Microsoft Fabric. In this article, I want to explain what Fabric is and why you should care. At the end, I will also share my two cents on where I see this offering fitting into the data analytics products and services market.
The message of Fabric is simplicity. To understand that, let’s look at how many products and services are available now in the Data Analytics world. Here are a few of them, just in the Microsoft toolset:
The list above does not include AI and Machine Learning, and it does not include databases. If we add those, it would easily come to more than 30 products. And that is just from one vendor, Microsoft. If you consider others, such as AWS and SAP, the list of products, services, and tools will easily exceed 100 items.
A landscape like this makes it complicated for a Data Analytics department to run a successful project. You have to spend a huge amount of time learning about these products, but that isn’t the whole story; each product and service comes with its own licensing plan, which makes things super complicated. We live in an age where ChatGPT and other AI tools scream for simplicity. This complex method is not the way.
The best way to understand Fabric is to understand its primary purpose: simplicity. The Microsoft team has invested in this new offering over the past two years and devised a way to simplify things. As the Data Analytics Lead of your organization, you don’t have to worry much about the technology; you can instead focus on the results. You don’t have to spend hours and hours figuring out how the licensing of Azure Synapse, Azure Data Factory, and Power BI works when they are combined. Fabric makes it much simpler.
I like the umbrella concept. Microsoft did it once in 2015 by bringing Power View, Power Query, and Power Pivot under an umbrella called Power BI. Power BI has been such a success that, for the past few years, it has consistently been at the top of Gartner’s Magic Quadrant for BI services.
Fabric is the Data Platform service offering of Microsoft for this new age. Fabric is an umbrella on top of Microsoft’s three main Data Analytics products: Power BI, Azure Data Factory, and Azure Synapse. However, it is easier to understand if you look at it by functionality or workload. Here is what is included in Fabric:
When you have all the above under one umbrella, you have one place to create and edit them, one structure to tie them together (workspaces), one setup for security and similar configuration, and a far simpler licensing plan that covers all of the above.
OneLake is a Data Lake technology that emphasizes being the ONE data lake. It is the storage for all the computing services mentioned above; they all store data in OneLake and read it from there. The idea behind using a Data Lake technology is that it covers both structured and unstructured data. OneLake automatically spans regions within one tenant, so there is no need to create a data lake for each region. One Data Lake is enough for all, hence the name OneLake.
Microsoft has invested in data integration technologies for a long time. Azure Data Factory is the successor technology to SSIS (SQL Server Integration Services) and has the power to transfer billions, even trillions, of rows of data. Recent enhancements in Power Query technology also bring Dataflow as a transformation engine that can now be used alongside Data Factory for a comprehensive Data Integration technology. Azure Data Factory is the ETL technology for the data professional, whereas Dataflow and Power Query are usually the technology for the data analyst. In Fabric, the Data Integration experience uses the best of both worlds and gives you the scalability and the transformation power in one place.
For data engineers, Synapse provides the ability to build the infrastructure using a Lakehouse (OneLake) and then pipelines to ingest the data into that structure. There will be connectors for various data sources, and the data will be stored in the Lakehouse as files or data tables, depending on the source type. The data can be brought into the Lakehouse using Shortcuts or the Data Integration methods mentioned in the previous section.
The Lakehouse is not just for storing the data but also for table management. Synapse helps you get better performance and management across the Lakehouse.
When you work with a large-scale data warehouse, Synapse gives you immense power to manage it. You can query the data with high performance using SQL technology combined with Apache Spark for big data. Azure Data Explorer (Kusto) can be used for interacting with this technology, and with Fabric, Kusto is now part of the overall experience. You won’t need a separate tool or editor for it.
Synapse provides an open and infinitely scalable data warehouse. As the data warehouse developer or admin, you don’t have to worry about providing resources to scale up or down. Everything is done for you automatically. Data is stored in the open Parquet file format. The difference between a Synapse Data Warehouse and a data warehouse created in Azure SQL DB, or even one built with Dataflow, is that Synapse is enterprise-ready and infinitely scalable.
Data Science projects are usually part of bigger data analytics work. That is why, in Microsoft Fabric, Data Science using Synapse is added as a workload. Data Science is not just a matter of using a single tool; it is a combination of features and tools used across the entire Microsoft Fabric. The process can include analyzing the data with Data Wrangler, building models and experiments with MLflow, training models, using Cognitive Services and large language models, and making predictions with PREDICT. Synapse ML supports all of these in Microsoft Fabric.
The ability to analyze real-time data using IoT Analytics and Log Analytics has been part of Microsoft’s offering for a long time. This ability is now part of Microsoft Fabric as the Synapse Real-time Analytics workload. Synapse Real-time Analytics works with event streaming technologies (such as IoT or Event Hubs, pipelines, etc.): it loads the data into KQL DB and the Lakehouse via mirroring, runs experiments on it with ML models, and finally uses Power BI to show the results.
It is hard not to have heard of Power BI in this age. Power BI is one of the most capable analytics technologies and can connect to a wide range of data sources. The analytical power of this service enables data analysts to do data preparation, data modeling, and calculations. The visualization engine of Power BI then presents the insights to the users. Power BI works by itself in many data analyst solutions. In many other solutions, however, it works with other technologies such as Excel and Power Platform, and with Fabric, it works with OneLake storage. The coupling of Power BI and OneLake is more than just another data source. It comes with a new connection type called DirectLake, which is faster than DirectQuery. I’ll explain this in another article.
Data Activator is a new tool offered as part of Fabric. It is a data-event-trigger system that helps automate actions based on data. For example, you might want a query to run when the value of a certain measure in a Power BI dataset goes above or below a particular amount. The idea of this service is to close the loop from insights to action.
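To make the idea of a data-event trigger a bit more concrete, here is a minimal sketch in plain Python. It is not Data Activator’s API or configuration; the names `ThresholdRule` and `evaluate`, the measure name, and the sample values are all hypothetical and only illustrate the "run an action when a measure goes above or below an amount" pattern described above.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, Optional


@dataclass
class ThresholdRule:
    """Hypothetical rule: triggered when a measure leaves the [low, high] band."""
    measure: str
    low: Optional[float] = None   # trigger when the value drops below this
    high: Optional[float] = None  # trigger when the value rises above this

    def is_triggered(self, value: float) -> bool:
        below = self.low is not None and value < self.low
        above = self.high is not None and value > self.high
        return below or above


def evaluate(rule: ThresholdRule,
             readings: Iterable[float],
             action: Callable[[str, float], None]) -> int:
    """Call `action` each time the measure newly crosses out of the band.

    Consecutive out-of-band readings fire only once, so the action is not
    repeated while the condition simply stays true. Returns the number of
    times the action was fired.
    """
    fired = 0
    was_triggered = False
    for value in readings:
        now_triggered = rule.is_triggered(value)
        if now_triggered and not was_triggered:
            action(rule.measure, value)
            fired += 1
        was_triggered = now_triggered
    return fired


# Illustrative usage with made-up values for a made-up measure.
alerts = []
rule = ThresholdRule(measure="Total Sales", low=50, high=150)
count = evaluate(rule, [100, 120, 95, 40, 35, 110, 160],
                 lambda name, value: alerts.append((name, value)))
```

In this sketch the action fires twice: once when the value drops to 40 and once when it rises to 160. The reading of 35 does not fire again because the measure was already outside the band. Whether a real trigger system should fire on every out-of-band reading or only on the crossing is a design choice; the crossing behavior is assumed here to avoid repeated actions.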
Purview provides a solution to help govern, protect, and manage the data estate. The Data Catalog of Purview will now be able to scan all Microsoft Fabric artifacts (not just Power BI). The Purview hub will be part of the Microsoft Fabric portal, and users can browse and search for artifacts. Information protection and sensitivity labels will be part of Microsoft Fabric elements to help protect the data as the organization needs.
With Microsoft Fabric providing an umbrella for all the workloads, you can expect to have end-to-end monitoring and a comprehensive audit log system.
I hear you; you might say, “All these products and services already exist (except one); what is the big deal about bringing them under one umbrella and calling it Fabric?”
Power BI itself was born like this. It was an umbrella over products that had already been used in Excel and SQL Server for some years, and yet you have seen how much it has grown; these days you can hardly ignore it if you are in the analytics space. One reason for that success is simplicity; the other is that it is built on top of powerful products such as Power Query, Power Pivot, and Power View.
The same can be said for Fabric. Fabric makes things simpler, so that licensing is no longer a super-complex subject. You have one environment to deal with all these products and services. Data engineers, data scientists, and data analysts work in a similar environment. The project’s focus goes where it should: succeeding in the analytical project rather than getting all these products to work together. Fabric is built from products with a proven track record of successful implementation: Power BI, Azure Data Factory, Dataflow, and Synapse are all products that every organization can fully rely on.
The integration of these tools is more than just an umbrella. Microsoft invested two years in coupling these products and services so that they work seamlessly. You don’t have to worry about switching environments to use each service.