How to Secure Data Collaboration and User Privacy in the New World of Data Clean Rooms | Podcast #13

Tune in to episode 13 of AdTech | AlikeAudience, where Director of Business Development & Strategy Juan Baron and AlikeAudience Co-founder Bosco Lam talk about the role of data clean rooms and data collaboration in the AdTech marketing industry with business and technology journalist Duncan Craig.

Tune in to get intriguing insights on: 

  • Active vs. passive approach in managing users’ data privacy
  • Privacy Enhancing Technologies (PETs): Customer Data Platforms (CDPs) and Consent Management Platforms (CMPs)
  • Data clean rooms versus data warehousing
  • Interoperability in the context of data clean rooms
  • Balance of utility & fidelity in data collaboration
  • Market forecast for the AdTech marketing industry
  • Importance of data protection for AI-empowered businesses and the AdTech industry

Hello from AdTech | AlikeAudience

Duncan Craig: Hello and welcome to the AdTech with AlikeAudience podcast. This podcast is brought to you by AlikeAudience, the premium audience-targeting company with high-performing mobile audience segments. Every month we spotlight leading executives and marketers from industry-leading companies around the world. My name is Duncan Craig. I’ve been a business and technology journalist in the APAC region for a decade and have worked in AdTech content and comms since 2013. We aim to speak to as many interesting people as possible in the AdTech, digital marketing, and advertising industry across the world.

Guests for this Episode: Juan Baron and Bosco Lam

Duncan Craig: Today we have two guests on the podcast to talk about data clean rooms and data collaboration. 

Juan Baron is the Director of Business Development & Strategy for Media and Advertising at the sensitive-data collaboration platform Decentriq, which offers data clean rooms (DCRs) for large enterprises. Born and raised in Colombia, Juan’s experience spans over 20 years across AdTech agencies, publishers, and multiple startup exits in the US. He previously led digital transformation at the multinational Swiss media holding group Ringier AG. He now resides in Zurich, Switzerland, helping Decentriq expand its confidential-computing data clean room technology internationally. Yes, he’s gonna be teaching us a lot.

Joining us is Bosco Lam, the co-founder of AlikeAudience. Bosco is an Addressability Working Group member of the IAB Tech Lab in the US, has expertise in behavioural economics and consumer data, and is passionate about empowering marketers to reach their target customers by connecting data and media and developing privacy-safe audience targeting solutions.

Juan, Bosco, welcome to the podcast. 

Juan Baron: Thank you. 

Bosco Lam: Thank you.

Juan Baron’s entry into the industry of data clean room

Duncan Craig: Now today, Bosco is going to lead the conversation because we know that the data clean room space is fast-moving, and Bosco has an intense interest in the data collaboration world and how it works. So we’re going to hand it over to you, Bosco, to lead the discussion points. And I’ll chime in with a big future forecast question at the end, Bosco.

Bosco Lam: Thank you, Duncan. And it’s our pleasure to have you here with us today, Juan. So I would like to set the stage: everyone understands the value of data collaboration, but we also understand there are limits to doing so. Maybe, Juan, before we deep dive into the questions, tell us in a few words: how did you come all the way to Switzerland, and into the industry of data clean rooms?

Juan Baron: Oh my God. My story is ever-changing. So I was born and raised in Colombia. I got a scholarship to go to a school in the US. I lived for seven years in Atlanta. That’s how I discovered the AdTech industry before it was really known as AdTech. So back in the day, when there were still Overture and Ask Jeeves search engines. Then I went into the social media AdTech space. So I’ve always worked in platforms between Atlanta and New York and worked very closely with the agencies. 

And then I was hired by Ringier in Switzerland to come and lead the digital transformation through the publishing lens, where I built a very large team in one of the largest newsrooms in the country. So that’s why my experience is a little bit all over the place. Eventually, I was an entrepreneur for a few years, during COVID. And then I ended up at Decentriq, going back to AdTech, but a more sensible AdTech, because obviously data clean rooms are all about protecting user privacy. So the world of advertising is really changing very rapidly.

Active vs. passive approach in managing users’ data privacy

Bosco Lam: Wonderful. Thanks, Juan, for that. So before we came into this podcast, we briefly chatted about the existing practices, right, to handle, manage, and respect users’ data privacy. And we mentioned that there are two approaches. One is passive, which is bounded by legal constraints, versus the active approach that you mentioned, which we believe will be the future: a technological way of empowering data collaboration. Would you mind telling us more about comparing the passive and the active approaches?

Juan Baron: The good thing about GDPR is that it puts a framework around what you need to do and how to behave around consumer data. But I think, in general, companies have always wanted to collaborate, even in a passive way. The challenge with the previous ways of collaborating, without technologies like Decentriq, is that they were mostly led by legal, so you had lawyers handling all the oversight. And there are a lot of restrictions, because as an organization, even though you’re doing things compliantly, even with approval from your own legal team, the minute you hand over your data, you lose control. It doesn’t matter if you mask it or create synthetic data; you’re still losing control. And that is the key difference in the new world of data collaboration platforms like Decentriq, where we truly, fundamentally believe that every single data owner is treated equally in our platform, whether it’s a hospital group collaborating with a pharmaceutical company, or a bank collaborating with a publisher through the lens of advertising: the code is the contract. So the model that is run has to be vetted and approved by every single data owner, and that is the only thing allowed to run in the data clean rooms from Decentriq. In a way, we remove a lot of the legal burden, but we also embed everything that is legally sound, GDPR obviously, and all the things we built around it are built on privacy-first frameworks.

PET: Customer Data Platforms (CDPs) and Consent Management Platforms (CMPs)

Bosco Lam: Very interesting. So when we take a look at those PETs, or what we call Privacy Enhancing Technologies, the IAB actually lists such an array of tools, from consent management platforms to CDPs and data clean rooms. Would you mind walking us through, from a corporate point of view, how this array of tools comes together and works towards the requirements under GDPR or other legal constraints?

Juan Baron: Customer data platforms are mostly about building a 360-degree view of your customer. That is really at the core, the basics, of what a customer data platform is meant to be. It’s like the next generation of CRMs, as they call it. And these are very powerful platforms, so there’s a lot of rich first-party data living within these systems. The question is, how do you create value out of that data through the lens of a corporation? As for consent management platforms, GDPR was obviously a big tailwind for them, because in digital advertising on the open web, you have to have consent for tracking the user and for leveraging that individual user’s data for the purpose of processing it and using it with another third party.

So those two different data sources kind of gravitate and consolidate into one single data stream, right, because you need the consent of the individual, but you also need the other data points from the CDP. And then eventually you make that available in a clean room. Now, there are different ways, especially with Decentriq. The way we built Decentriq is that we are, at the core, just a data processor. We are not in a joint-controller role, because of the way we built the technology: it’s based on hardware, not on software encryption.

Each individual data owner has their own encryption keys; we have no way of accessing those keys. And we put all the trust, security, and privacy constraints on the data into these very special microprocessors developed by Intel, Nvidia, and AMD. So with everything that we do, the data owner doesn’t even need to trust Decentriq, in a way. We built on a mechanism called remote attestation, which is very prominent in the privacy-enhancing technology space, where the data owner can actually interact with the chip. We created the highway, the mechanism, for the data owner to communicate with the chip and validate what is actually happening in the clean room: what code has been run, who is running the code, each individual user. So it creates a kind of tamper-proof audit log. And it clears a lot of legal hurdles, compliance hurdles, information-security and IT-security hurdles, all the nuances around the due diligence of onboarding a customer. It just ticks all the boxes, because of the way the technology is built.
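The remote-attestation idea Juan describes can be sketched at a very high level. This is a toy illustration, not Decentriq's actual protocol: the function name, the use of a plain SHA-256 hash, and the sample query are all hypothetical stand-ins for a signed hardware quote.

```python
import hashlib

# Toy sketch of remote attestation: the data owner compares the code
# "measurement" the enclave reports against the measurement of the code
# they actually reviewed and approved.

def measurement(code: bytes) -> str:
    """Hash standing in for the enclave's code measurement."""
    return hashlib.sha256(code).hexdigest()

approved_code = b"SELECT brand, COUNT(*) FROM purchases GROUP BY brand"
expected = measurement(approved_code)

# In a real system the reported value would arrive inside a signed
# hardware quote from the chip; here we simulate an honest enclave.
reported = measurement(approved_code)

assert reported == expected, "enclave is not running the approved code"
print("attestation ok")
```

If the enclave were running anything other than the vetted code, the measurements would differ and the data owner would refuse to provision their decryption keys.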

Now, the IAB. Going back to your question around the IAB and PETs. The data clean room space is getting crowded, and not all data clean rooms are created equal. The truth is starting to surface, and it’s good that we’re starting to be compared to other data clean rooms. Once the compliance or information-security teams of any brand or publisher do the assessment, they’ll quickly realize that if they need to trust the data clean room, then it is not really a data clean room. That’s the simplest way to describe it. Yeah.

Data clean room vs. data warehousing

Bosco Lam: Interesting that you bring that up. Because from a layman’s perspective, a data clean room sounds like an abstract room that, you know, hosts the dataset. So let’s take one step back. A lot of our clients are actually data owners, and they ask: what is the difference between a data clean room and, for instance, CDPs and data warehouses? Because from their perspective, they have already set up their own cloud computation with their nice data lakes or data warehouses, so why do I need an extra room in order to collaborate? How would you approach that?

Juan Baron: Well, the answer is quite simple. If you’re a data owner with your own data warehousing instance, let’s say you’re using Snowflake inside GCP, and the other party is using Databricks on Azure, the Databricks-on-Azure side is not going to send its data over to Snowflake. You need neutral ground where each individual data owner can enable trusted collaboration, a neutral zone where you can collaborate and extract insights. That is why you need an independent data clean room. That is the simplest answer: you need something in the middle. In cheesy terms, you need a Switzerland of data. Let’s put it that way.

Bosco Lam: Interesting. That’s why you ended up in Switzerland, isn’t it?

Juan Baron: Yeah, exactly. Well, we can definitely claim we are the real Switzerland of data.

Interoperability in the context of data clean rooms

Bosco Lam: That’s good. Let’s deep dive into that, right? That is exactly the reality we encounter: let’s say a retailer is using clean room vendor A, and a brand comes with clean room vendor B. So how would that common ground be resolved? Do they need to adopt each other’s clean room vendor? Do they need a common protocol in order to work together? Would it defeat the purpose of having a common ground for collaboration if each of them has a different vendor?

Juan Baron: So this whole concept of interoperability is obviously an interesting topic. From a Decentriq point of view, we’re completely agnostic to the data input and output. One thing that makes Decentriq a little bit different is that we’re not in the data storage space; you don’t use Decentriq to store a copy of your entire data warehouse and then eventually make it available for collaboration. What you do with Decentriq is more like Snapchat: these are ephemeral data clean rooms.

So you send the data to the computation, and that’s it. It’s a single-purpose mini clean room, in a way, but you can create thousands and thousands of clean rooms for different types of computations, with anywhere from one to 1,000 different data collaborations in one single clean room. So when you talk about interoperability, take the example you’ve given: let’s pretend it’s a very large CPG brand, they have their own data clean room of choice, and they want to protect their own data. It becomes more of a negotiation between the particular retailer and the CPG brand over who’s going to be in charge of the computation and, most importantly, where the output, the insights, are going to go.

And for that particular reason, we are 100%, from the ground up, interoperable from the start. It all depends on how other clean rooms are structured and behave from an architectural point of view, whether they can actually ingest data from somebody else, compute on it, and then eventually export the results. But it comes down to whether the CPG brand trusts the vendor of that particular retailer’s data clean room, right? If we’re talking about highly sensitive data, maybe the CPG brand uses something like Decentriq, where you don’t even really need to trust Decentriq, and then the retailer can confidently upload their data into Decentriq just for the purpose of that particular computation.

Balance of Utility & Fidelity in data collaboration

Bosco Lam: Thank you, Juan, wonderful insights. Let’s also jump into another topic that I think a lot of our clients are interested in: the utility and the fidelity of data collaboration. In the past, I have actually worked with, you know, credit bureaus and credit card companies. Obviously, all this transaction-level data is so sensitive that we cannot disclose that Bosco Lam made a $5 coffee purchase at Starbucks in a particular location, right? That would obviously be beyond the red lines of what we can disclose. What we have done is model the data by indexing: whether this group of people is buying certain categories within the top 20% bracket of the population, right? We have been doing this indexing. Then we have other types of data, let’s say impression logs, right, where we work with DSPs.

And we retrieve impressions, counts, and IDs on each publisher. So we have quite different kinds of datasets when we talk about data collaboration. And I believe you have that experience with healthcare companies as well. How do we judge what level of pre-processing the data needs before we enter a data collaboration, and what utility we need from a data clean room perspective? Obviously, we can’t go through it line by line, but how abstract does the data need to be before we go into a collaboration?
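The indexing approach Bosco describes can be sketched roughly as follows. The function name and all the numbers are illustrative assumptions, not any vendor's actual methodology: the idea is simply to share a segment's purchase rate relative to the population's, instead of raw transactions.

```python
# An index compares a segment's category purchase rate to the overall
# population's rate: 100 means average, 150 means 50% over-indexed.

def purchase_index(segment_buyers: int, segment_size: int,
                   pop_buyers: int, pop_size: int) -> int:
    segment_rate = segment_buyers / segment_size
    population_rate = pop_buyers / pop_size
    return round(100 * segment_rate / population_rate)

# Illustrative numbers only: 30% of the segment buys the category,
# versus 20% of the overall population.
print(purchase_index(300, 1000, 20_000, 100_000))  # → 150
```

The trade-off Bosco raises is visible here: the index preserves utility for targeting while discarding the individual-level fidelity (who bought what, where) that would be too sensitive to share.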

Juan Baron: Let me respond with a question, because I would love to learn a little bit more. Before you had something like a data clean room, up to what point was the data being sent to you diluted, because it had to be aggregated for the purpose of privacy? Up to what point did you get data that was rolled up in ways that prevented you from getting more sophisticated insights, because the data owner was forced to aggregate or dilute the data so much that it was not as useful as you would have liked?

Bosco Lam: So imagine there is a retailer, right, and they have thousands of SKUs. A particular chocolate brand, a CPG, would like to work with them. But because the retailer carries multiple chocolate brands, they cannot tell competing brand A what kinds of insights they give brand B. And the approach, before the retailer had a clean room, was at best to aggregate at the chocolate-category level and to show an index of what the purchase insights would be for that particular chocolate brand.

Juan Baron: So in this particular case, we have data clean rooms. In particular, with Decentriq, there’s no need to pre-aggregate or roll up the data, because you can go down to the code, and the code for us is a SQL query or a Python or R model. The whole point of what we do at Decentriq is that we disclose these SQL statements or these Python and R models to the other party. That’s the whole point of making it very transparent. You can then safely upload all the raw data that you have, and based on the legal framework, the code becomes the contract. Then you can allow the analyst at the other participating party to run whatever model they want on your data. So for that particular example of the retailer with the chocolate brands, they should be able to upload all the granular data around all the different variations of chocolate brands that they offer, down to the SKU level.

Maybe they even want to upload transactional history based on loyalty-card data as well, down to the granular individual, but then allow the CPG brand to query that dataset. And it’s the retailer that approves the queries from the CPG brand. So it allows much more insights-driven collaboration, rather than something very dictated and pre-formatted. I usually tend to say that a data clean room like Decentriq enables more of an intimate data relationship.
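The "retailer approves the queries" workflow Juan describes might be sketched like this, using SQLite as a stand-in for a clean-room backend. All table names, column names, and the approval-by-string-comparison mechanism are hypothetical; this is not Decentriq's actual API.

```python
import sqlite3

# Toy clean room: the retailer uploads granular SKU-level purchase data,
# and only a query the retailer has reviewed and approved may run on it.

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE purchases (member_id TEXT, sku TEXT, brand TEXT, amount REAL)"
)
conn.executemany("INSERT INTO purchases VALUES (?, ?, ?, ?)", [
    ("m1", "choc-001", "BrandA", 3.5),
    ("m2", "choc-002", "BrandA", 4.0),
    ("m3", "choc-101", "BrandB", 2.5),
])

# "The code is the contract": this exact statement is what was vetted.
APPROVED_QUERY = """
    SELECT brand, COUNT(*) AS purchases, SUM(amount) AS revenue
    FROM purchases GROUP BY brand
"""

def run_approved(conn, query):
    """Reject anything that isn't the data owner's approved code."""
    if query != APPROVED_QUERY:
        raise PermissionError("query was not approved by the data owner")
    return conn.execute(query).fetchall()

# The CPG brand only ever sees aggregated brand-level rows; raw
# member_id values never leave the clean room.
print(run_approved(conn, APPROVED_QUERY))
```

In a real confidential-computing clean room the approved code would be pinned by the attested enclave rather than by a string comparison, but the contract-like gating is the same idea.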

Diverse range of use cases and Data collaborations

Bosco Lam: So from your experience, are there any differences between the finance and banking circle, healthcare, and, you know, CPG? Do they have different needs for granularity in their datasets? Since you manage all these data clean room adopters, can they go down to the most granular level without feeling insecure about exposing their datasets?

Juan Baron: At its core, Decentriq is a data science collaboration platform. Customers can run all kinds of crazy models with all the different libraries that exist for data science, and we support most of them. And if a customer needs a new one, we vet it to make sure it passes our security guarantees, and then we embed it in the platform.

Now, the use cases are what define the type of data that you need. For example, we have an ongoing project with multiple pharmaceutical companies where they provide transactional data. The whole point is that they run a monthly report among themselves, about 36 different pharmaceutical companies in Europe, a market-share analysis, a very simple use case, right? But imagine trying to do this before: you would have had to disclose the data to a central consultant. Now, with a click of a button, they just upload the data, click run, and get the results in seconds.

There is a different story around pharmaceutical companies collaborating with a hospital group. Because now what they’re doing is actually leveraging patient record data, and running models to better predict outcomes on some of the medicines that they’re testing, right? So the whole idea is to accelerate medical research. 

I’ll give you another example that we have. This is public: we have a case of sharing cyber-warfare data between the Swiss National Bank, the Swiss Stock Exchange, and other banks in Switzerland. They’re all sending email metadata and other transactional data into the clean room for the purpose of national defense, right? Because in Switzerland, we love to say that we export cheese and chocolate, but we also export a lot of banking; it’s a national product, a critical-infrastructure product. So cyber defense is quite key.

And then we come to the level of an advertiser collaborating with a publisher. You know, the more you give in terms of raw data, insights, and granularity into the clean room, the better off you will be. But we built all these privacy-enhancing features inside the clean room. So you can, for example, prevent the clean room from returning results computed on fewer than 10 underlying rows, and that can be hardwired into the clean room. Or, if you want to further enhance your security, you can create synthetic datasets inside the clean room. Those are just two features, but there are a lot of different things we can do inside the data clean room.
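The minimum-row-count safeguard Juan mentions can be sketched as a simple suppression rule. This is an illustrative toy with made-up numbers, not the actual clean-room implementation: result groups backed by too few records are dropped so small groups can't be used to single out individuals.

```python
# Suppress any aggregate group backed by fewer than MIN_ROWS records,
# a k-anonymity-style threshold hardwired into the clean room.

MIN_ROWS = 10

def suppress_small_groups(grouped_counts: dict, min_rows: int = MIN_ROWS) -> dict:
    """grouped_counts maps group key -> record count; drop small groups."""
    return {k: v for k, v in grouped_counts.items() if v >= min_rows}

counts = {"BrandA": 5200, "BrandB": 480, "BrandC": 7}  # illustrative only
print(suppress_small_groups(counts))  # BrandC (7 records) is suppressed
```

A real clean room would enforce this inside the attested computation rather than as a post-processing step, so no party ever sees the suppressed rows.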

PET first or Privacy Law first?

Bosco Lam: Great insights, Juan. Before we wrap up, I actually have a question. We, as DPOs, you know, Data Protection Officers, often have a hard time keeping up with innovation and technology, while privacy laws take time to take effect in the market. Which do you think is the driver, the momentum: privacy-enhancing technology first, or privacy law first?

Juan Baron: I don’t think one can go without the other, right? I mean, the good thing about GDPR is that it’s not really tied to technology; it’s a framework around processing data, so it’s very agnostic to the technology. They don’t care about confidential computing, they don’t care about secure multi-party computation (SMPC), they don’t care about PETs; it doesn’t matter to the DPO. It’s all about: where’s my data? How is it going to get processed? Describe the data flow to me, and who’s going to be on the receiving end? Then I can do a proper assessment. That is really the behavior of any true DPO. Obviously, we at Decentriq, or any other data clean room provider, would like to claim, yeah, we’re the most secure, we guarantee it, and everything else. But at the end of the day, it’s all about the data flow and the processing of the data. And I think it leans more on the legal side, to be honest, than on the PET side.

Market forecast for publishers

Duncan Craig: Juan, thank you, Duncan here. Wow, what a conversation! My mind is a little bit blown by your chip technology, the hardware framework. You talked about the need to have control over data, and also the ultimate goal, which is leveraging as much granular data as possible. But I’m going to hit you with two questions to wrap, and one for Bosco to wrap, if you don’t mind. It’s a fast-moving space, and there are a lot of players moving into the market. What’s it going to look like two years from now? That’s question one, at the macro level. And one at the micro level: how is it going to shake out in the advertiser-publisher relationship? What is the key issue going to be in that particular segment of the business?

Juan Baron: So, question number one: I think the market is still at a very early stage. What’s fascinating, from what we’ve seen, because we just talked about a lot of different industries, is this. Think about it this way: it’s like a new muscle that a lot of people didn’t know existed. And it’s a new muscle that needs training; it needs muscle memory. A lot of people have been talking about data collaboration, data is the new oil, whatever.

But now it’s really becoming true, in the sense that people are discovering that if I collaborate with partners through data, we can learn from each other, then eventually collaborate more closely and have a positive business impact. So that is the macro level. For the advertising sector, it’s very difficult to predict, because AdTech is always very difficult to predict. But we can confidently say that data clean rooms will play a very significant role in the advertising sector, for a couple of reasons. One is obviously the deprecation of third-party cookies, so there’s real signal loss. We hear this from advertisers, and even more from regulated brands, because their hands are already tied on what they can do with their data, and without third-party cookies, they’re completely blind now.

And the second thing around third-party cookie deprecation is that it also tremendously impacts publishers. The thing is, publishing has been almost kind of screwed for the past 15 to 20 years; a lot of the money has been siphoned away by big tech. And now publishers have what I like to call an almost once-in-a-lifetime opportunity. The pendulum is really swinging their way. They’re the ones who actually have a relationship with their readers, and they’re the ones who should be able to leverage that relationship with those readers together with key advertisers.

And the cool thing is that data clean rooms play a very critical role in that relationship, because they remove a lot of intermediaries. They allow the publisher, with that relationship with their reader, to leverage that data in connection with the sensitive data of a particular brand. It is not about open-web advertising anymore; through clean rooms, it is all about direct programmatic. Yeah.

Duncan Craig: You think publishers need to engage more tightly with data clean room providers? 

Juan Baron: It’s not that I think it; it’s what they’ve told me: that this is a critical leg of their strategy. That is what they’ve told me. Okay. They need to learn how to do it. Like I said, it’s a new muscle. There are a lot of questions: I don’t know how to do it, how do I go to market? Even from a sales perspective, for a publisher it’s a completely new way of selling a new product; it just never existed before. It’s a change-management scenario. It is going to take a few years for it to scale, but it’s starting already; we see it.

Importance of data protection for AI-empowered businesses and AdTech Industry

Duncan Craig: Thanks. Thanks, Juan. Bosco, finally for you. Do you agree with this scenario? Do you have additional beliefs and thoughts, and forecasts?

Bosco Lam: Yeah, absolutely. I think for any data owner, one key takeaway is to be prepared for both the passive and the active approaches to data protection, right? No matter how strict the user consents are, or how precise all the contractual wording is, once the data is out, that’s it: you can’t recover it, and you lose control right away. For the active approach, you have to think about how you would govern access rights: who can read what, at what level, and what insights can be applied in which destinations. And I think this is absolutely the future for any data owner who wants to collaborate.

One example is the lawsuit against OpenAI, which alleges that, without consent, they scraped a lot of public data for model training. And I think LLMs are inevitable for any company that really wants to tap into the AI game and strategy. To have any unique approach, you need your unique data to go into the model, to train it and get unique results. Obviously, you do not want this precise first-party data to be exposed to any other parties outside your control. So I may have stretched it a bit too far, but if you want to tap into an AI-empowered business, and you want to tap into this precise data that you have, or, you know, that your collaborators have, you need clean rooms, and you have to think about this active approach and how to secure your AI strategy.

And your next question was actually about the AdTech marketing industry. I would say that AdTech data collaboration has long been a black box: you know, how all these segments have been created, how data is being shared. We have the IDs and data labels, just like food ingredient labels. But no matter how precise the information disclosed to upstream or downstream partners is, we will really need a technology or a framework that shows how things actually work, not just a label on the package. Don’t get me wrong, having the label is the first step. But the next step is how to prove that we’re doing it accordingly, and to prove to other partners in the ecosystem that we do it the right way. So yeah, that’s my prediction, both the macro view and the AdTech-specific view.

Duncan Craig: Oh, thank you, Juan. And thank you, Bosco. A bit like Google scraping the internet with its AI technology to power its own products, I’m sure. I feel that we’ve only scratched the surface here. We could have spoken for another hour about law, data protection, and privacy. Juan, I think we’re going to have you back at some point.

Juan Baron: Okay. I would love to come. No worries. Thank you very much.

Subscribe and Stay Tuned!

Duncan Craig: Appreciate that. And just to wrap up, to our audience, thank you so much for listening. To find the show notes, the transcripts, and more information about AlikeAudience’s segment offerings, jump onto the website www.alikeaudience.com. And Juan, Bosco, that was fascinating. To our listeners, if you enjoyed this episode, don’t forget to hit subscribe, and leave us a review. We’ll catch you all in the next session. Thank you.

Book a Data Strategy Session