My Takeaways from FabCon Europe 2024 - 1 Week Later
Intro
From September 24th to 27th I attended the 2024 European Fabric Community Conference (FabCon) in Stockholm. The first day was spent attending hands-on tutorials run by the Microsoft product team, with each of the remaining three days kicking off with a keynote followed by at least four slots for attending one of a number of breakout sessions. Alongside the standard breaks for networking and food, there was also a community booth, a Fabric developer challenge, an Ask the Experts stand, and a number of partner booths. I’d planned to share some of my experiences and takeaways, but I wanted to take a beat and reflect once things had settled. A couple of the points below overlap with my daily LinkedIn reflections, but I’ve tried to minimise this or add extra detail where relevant.
Community takeaways
First of all, a note on the community aspect of the conference. Prior to attending, I wasn’t sure exactly what the branding of a “community conference” would mean, and I must admit it felt a little different from the traditional tech or data conference. There was a dedicated effort to engage the community: from the “Fast with Fabric” developer challenge, focused on understanding how people are actually using the tool, to a constant search for product feedback to feed future development, it genuinely felt like Microsoft wanted community engagement. The community booth was busy throughout the entire conference, too.
- The Fabric community is massive and growing rapidly. Between the 3,300 attendees (across 61 countries), the massive Microsoft representation, 14,000+ Fabric customers, and the active forum users and user groups, there’s a huge amount going on in the Fabric space. One particular note here was that Fabric is on the same trajectory that PowerBI was at the equivalent point in its lifecycle - given the almost 400,000 PowerBI customers today, that implies a large targeted growth
- In terms of user growth, I noted two interesting things: Fabric has the fastest-growing certification Microsoft offers (the DP-600 Analytics Engineer certification), with 17,000+ people certified, and, from interacting with a number of attendees (especially during the tutorials), many users came from non-technical backgrounds and only picked up data skills when they needed to solve a specific problem (often using PowerBI) - a refreshing change from my background, where so many think data-first rather than starting from the business problem
- Spending time engaging with the community, especially in person, can be incredibly valuable. It’s hard to imagine anyone left the event without being at least a little more energised than beforehand. From resolving specific technical queries and validating design decisions and best practice, to simply understanding what other people are working on and how, there was a lot to gain from talking with the Microsoft staff and MVPs at the “Ask the Experts” booth, those leading sessions, and other attendees
- I’d extend the above to suggest that while you might not have access to so many product experts in person on a daily basis, the community forum and subreddit are great places to engage online. I had the pleasure of meeting some of the active Reddit members (picture below, sourced from this post)
Product and broader theme takeaways
- Effective data governance is crucial - alongside lots of discussion of how Fabric can meet governance needs in the era of AI, there was detailed coverage of both Fabric’s built-in governance features and using Fabric alongside Purview for extended data governance and security. I also noticed quite a number of partners from the governance and MDM space in the exhibition hall, including Profisee, Informatica, Semarchy, CluedIn, and more. In my daily LinkedIn updates, I called out one important quote: “Without proper governance and security, analytics initiatives are at risk of producing unreliable or compromised results”
- Fabric is intended to be an AI-embedded experience for developers and business users - it’s easy to read this as going all-in on AI, especially in today’s market, but I thought it was interesting that AI was discussed from all angles: from Copilot-driven development and consumption and generative AI solutions, to integrating OneLake data into custom-built AI applications and calling AI functions (e.g. classify, translate) directly from notebooks. This included covering key caveats, like generative AI not always being the right solution, and preparing and appropriately governing your data to support AI, all backed by great demos
- Power Hour was a thoroughly enjoyable experience, and one I highly recommend checking out if you’re not familiar with it. The energy on stage and the lighthearted, enjoyable nature of the various demos were a masterclass in storytelling through presentation and in having fun with data. It reiterated something I think most passionate data practitioners are conscious of: the value of your data is often determined by the narrative you can drive by using it effectively, and by how well you can explain it to the business in simple, understandable terms
- There was a real focus on ease of use and how Microsoft are trying to minimise the barrier to entry. This included the extension of Copilot features (in PowerBI, and building custom Copilots on Fabric data), PowerBI semantic model authoring via Desktop, changes to the UI, the ability to embed Real-Time Dashboards, and features easing implementation and migration - including Azure Data Factory pipelines supported directly in Fabric, sessions on migrating from Azure Synapse, and upcoming (in 2024) support for migrating existing SSIS jobs
- There were lots of great sessions and technical deep dives presenting architecture examples (e.g. connecting to Dataverse, implementing CI/CD, production lakehouse and warehouse solutions), as well as conversations with other attendees about other data technologies (Azure Data Factory, Databricks, and others). It was all a firm reminder of Werner Vogels’ Frugal Architect (Law 3): architecting is a series of tradeoffs. Don’t waste time chasing perfection; invest in resources aligned to business needs
- While ultimately the “proof is in the pudding” as far as listening to customer feedback goes, it felt clear that Microsoft want to factor user feedback into how they develop Fabric - the Fast with Fabric challenge was aimed squarely at gathering user feedback, Microsoft product leads were engaging with attendees to understand key sticking points, and I even had a conversation with Arun Ulag, Corporate VP of Azure Data, where the first thing he wanted to know was how I’m using Fabric and how it could be improved. It was also good to see the deep dive into data warehouse performance explain the trajectory of warehouse development and acknowledge why there were shortcomings at launch, tied to the significant effort of moving to the Delta format under the hood
My favourite feature announcements
Frankly, there were too many announcements to capture or list individually, though Arun’s blog and the September update blog cover most of what I can recall. Still, I wanted to call out the announcements I found most impressive, or that have previously come up in conversation as potential blockers to adoption (looking at you, Git integration!)
- Incremental refresh for Gen2 Dataflows - those working in data engineering will be more than familiar with implementing incremental refresh, but this brings the capability to low-code development through Gen2 Dataflows. That’s great for those who had it as a core requirement, and it should also reduce the consumption (and cost) of existing pipelines that currently perform full refreshes
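For anyone unfamiliar with the pattern that Dataflows Gen2 now automates, the core idea is a watermark-based load: only transfer rows newer than the latest value already landed. A minimal conceptual sketch (table and column names are hypothetical, and real implementations also handle updates and deletes):

```python
from datetime import datetime

def incremental_refresh(source_rows, target_rows, watermark_col="modified_at"):
    """Append only the source rows newer than the target's high-water mark."""
    # Current high-water mark: the latest value already loaded into the target.
    watermark = max((r[watermark_col] for r in target_rows), default=datetime.min)
    # Only rows changed since the last load are transferred -
    # this is what saves consumption versus a full refresh.
    new_rows = [r for r in source_rows if r[watermark_col] > watermark]
    target_rows.extend(new_rows)
    return len(new_rows)
```

Each run only pays for the delta, which is why replacing a scheduled full refresh with incremental refresh can cut capacity consumption significantly.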
- Copy jobs - think of copy jobs as a prebuilt packaged copy pipeline including incremental refresh capability. Put simply, copy jobs are the quickest and easiest way to rapidly automate data ingestion
- Tabular Model Definition Language (TMDL) for semantic models - coming soon is the ability to create new semantic models, or script out existing ones, as code - enabling versioning as well as consistent best practice (e.g. reusable measures). Alongside this, a TMDL view will be added to PowerBI
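What makes this good for versioning is that TMDL is plain, diff-friendly text. A rough illustration of the shape of a table definition (the table, measure, and column names are hypothetical, and the exact property syntax may differ from what ships):

```
table Sales

	measure 'Total Sales' = SUM(Sales[Amount])
		formatString: #,0

	column Amount
		dataType: decimal
		summarizeBy: sum
```

Because each measure and column is a small block of text, changes show up cleanly in Git diffs and shared definitions can be reused across models.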
- Git integration - though Git integration has existed for some time, it has always needed workarounds to be properly functional. During FabCon, it was announced that all core items will be covered by Git integration by the end of the year - the standout here being the inclusion of Dataflow Gen2 items
- The new Fabric Runtime 1.3 was released. It was quoted as achieving up to four times faster performance than traditional Spark on the TPC-DS 1TB benchmark, and is available at no additional cost
Best practice takeaways
- Focused effort on optimising your semantic model is important - though Direct Lake can add value and performance, a bad (or unoptimised) model outside Fabric will be a bad model in Fabric. Also, don’t use the default semantic model; create a dedicated semantic model instead
- V-Order is primarily intended to benefit read operations - think carefully about how you use it. The best practice advised was to apply V-Order to “gold” layer data feeding semantic models. Use the Delta Analyzer to examine the specific details of your tables
- Using Direct Lake is as important in Fabric as query folding is in PowerBI - Direct Lake can massively improve read performance. One example compared reading 2 billion rows via DirectQuery, taking 13 seconds on a P1 capacity, against 234ms for 1 billion rows via Direct Lake. The record counts differ by design: a 2 billion record read would have forced DirectQuery fallback
- A metadata-driven approach to building pipelines is best practice, but it’s not easy to tackle all at once - start small and gradually expand across the organisation
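The essence of the metadata-driven approach is that pipeline behaviour lives in a config table rather than being hard-coded per source, so adding a source means adding a row, not a pipeline. A minimal sketch (the source names, metadata fields, and load function are all hypothetical):

```python
# Each row describes one ingestion: where to read, where to land, how to load.
PIPELINE_METADATA = [
    {"source": "crm.customers", "target": "bronze.customers", "mode": "incremental"},
    {"source": "erp.orders",    "target": "bronze.orders",    "mode": "full"},
]

def run_pipelines(metadata, load_fn):
    """Drive one generic load routine from per-source metadata rows."""
    results = {}
    for row in metadata:
        # The same load function handles every source; behaviour varies
        # only through the metadata passed in, not through bespoke code.
        results[row["target"]] = load_fn(row["source"], row["target"], row["mode"])
    return results
```

Starting small here simply means seeding the metadata table with a handful of sources and one generic load routine, then growing the table as more of the organisation comes on board.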
- There’s a lot to tweak around Spark optimisation (more on that in another blog), but one key area of discussion was Spark session startup times. Two callouts here were to utilise starter pools and high concurrency mode. High concurrency mode enables multiple notebooks to share a single Spark session and can reduce startup time to as little as 5 seconds
Lastly, a quick shoutout for a couple of food recommendations. If you’re visiting Stockholm, do try and check out Berns (Asian fusion and Sushi) and Mr. Churros!