There was a question recently posted about how Zscaler trains AI models, and we want to provide accurate information on how we do so. Zscaler does not use customer data to train its AI models. Each customer owns the proprietary information and personal data (user names, email addresses, device IDs, etc.) in the Zscaler logs. We only use data or metadata that does not contain customer or personal data for AI model training.

Organizations want to safely unlock the value of artificial intelligence and machine learning, but they also need to ensure that this does not come at the expense of privacy, security, and compliance controls. This becomes particularly sensitive when we consider the potential of training AI on proprietary data or personal data.

The foundation of our architectural approach is data containment. Every customer’s tenant is self-contained: their data lives within their tenancy, under their control. Sensitive information never leaves that boundary. This is not just a principle; it is a design choice that governs how we build, scale, and deliver value, and it is how Zscaler ensures that customer data is never used to train an AI model beyond a given tenant.

Within that contained environment, customers can harness the power of their own data. Logs, transactions, and telemetry generated by their use of our platform are used to improve outcomes for their organization alone. Customers benefit directly from their own signals, whether for risk modeling, AI copilots, or policy enforcement, without trading away autonomy, privacy, or security.

Leveraging Data Responsibly

A common concern is whether preserving privacy limits the ability to benefit from large-scale insights. Here is where an important distinction comes in: personal data remains private, secured, and excluded from model training data, while metadata that does not contain proprietary information or personal data is used to enrich each tenant’s environment.

Think of it like water flowing through pipes: the content of the water belongs entirely to each customer, but the knowledge of how the water moves (its pressure, velocity, and patterns) can inform the system without ever extracting the water itself. Similarly, Zscaler’s platform can use traffic patterns, telemetry that does not contain personal data, and aggregated signals to strengthen AI models and improve the overall environment, while still enforcing the guarantee that sensitive data never leaves a customer’s tenancy.

Zscaler’s ability to learn from over half a trillion transactions per day leverages a network effect (technically, a logarithmic utility) without sacrificing customers’ privacy. Customers benefit from the sheer breadth of signals Zscaler processes because it allows us to recognize global threat trends and provide resilient, real-time defenses. At the same time, customer-specific data is never exposed outside of its respective tenancy.

Instead, Zscaler leverages the aggregate knowledge of signals across the platform, never tied to an individual customer’s data, to strengthen detection and modeling. Each tenant gains from this global intelligence while maintaining strict boundaries for its own data.

To re-emphasize: customers’ proprietary information and personal data in the Zscaler logs are never shared outside of the customer boundary.
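To make the distinction between personal data and non-identifying metadata concrete, here is a minimal Python sketch of the general pattern: identifying fields are dropped from each record, and only aggregated, non-identifying signals move forward. The field names, record structure, and functions are hypothetical illustrations, not Zscaler’s actual schema or pipeline.

```python
# Illustrative only: hypothetical field names and structures, not Zscaler's
# actual schema or pipeline. Shows the general pattern of excluding personal
# data from records before metadata is aggregated for model training.

from collections import Counter
from typing import Iterable

# Fields assumed (hypothetically) to identify a user, device, or tenant.
PERSONAL_FIELDS = {"user_name", "email", "device_id", "client_ip", "tenant_id"}

def to_training_metadata(log_record: dict) -> dict:
    """Keep only non-identifying metadata from a single log record."""
    return {k: v for k, v in log_record.items() if k not in PERSONAL_FIELDS}

def aggregate_signals(records: Iterable[dict]) -> Counter:
    """Aggregate coarse, non-identifying signals (e.g., threat categories)."""
    counts = Counter()
    for record in records:
        metadata = to_training_metadata(record)
        counts[metadata.get("threat_category", "none")] += 1
    return counts

if __name__ == "__main__":
    sample = [
        {"user_name": "alice", "email": "a@example.com", "device_id": "d-1",
         "threat_category": "phishing", "bytes_sent": 1024},
        {"user_name": "bob", "email": "b@example.com", "device_id": "d-2",
         "threat_category": "none", "bytes_sent": 2048},
    ]
    # Only aggregated, non-identifying counts reach the training pipeline.
    print(aggregate_signals(sample))
```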
A Core Security Principle Rooted in Shannon’s Information Theory

Our approach aligns closely with Shannon’s Information Theory, a topic I will delve into more in a future blog. Zscaler views data along a continuum that stretches from low-entropy, high-information states such as clear text, through progressively higher-entropy forms such as ciphertext produced by encryption, to the extreme of pure randomness. At Zscaler, our architectural principle begins with data control: sensitive classes of data, including customer data, never leave a tenant boundary in any form. Beyond that, we apply a disciplined progression toward maximum entropy wherever possible, ensuring that only the minimum necessary information is exposed.

Techniques such as anonymization, tokenization, de-identification, and other data strategies are not applied as afterthoughts but as deliberate mechanisms to elevate entropy while preserving just enough structure for essential operations, such as AI modeling and training at the platform level. This ensures that the system operates at the highest entropy state consistent with utility, minimizing information exposure while maximizing privacy, trust, and compliance. That is how Zscaler unlocks the value of artificial intelligence while still ensuring privacy and compliance for all customers.
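To illustrate the continuum in miniature, the toy Python sketch below estimates Shannon entropy from byte frequencies for clear text, a keyed-token stream, and purely random bytes. The HMAC-based tokenization is a generic pseudonymization example chosen for illustration, not Zscaler’s actual mechanism, and the sample data is invented.

```python
# Illustrative only: a toy measurement of the entropy continuum described
# above, using Shannon entropy estimated from byte frequencies. The keyed
# tokenization is a generic pseudonymization example, not Zscaler's mechanism.

import hashlib
import hmac
import math
import os
from collections import Counter

def entropy_bits_per_byte(data: bytes) -> float:
    """Estimate Shannon entropy H = -sum(p_i * log2 p_i) over byte frequencies."""
    counts = Counter(data)
    total = len(data)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())

def tokenize(identifier: str, key: bytes) -> bytes:
    """Deterministic keyed token: the same input always maps to the same token,
    preserving joinability while the content is not recoverable without the key."""
    return hmac.new(key, identifier.encode(), hashlib.sha256).digest()

if __name__ == "__main__":
    key = os.urandom(32)
    clear_text = b"user alice accessed finance-report.xlsx from device d-1 " * 20
    tokenized = tokenize("alice@example.com", key) * 20   # repeated token stream
    random_bytes = os.urandom(len(clear_text))            # the extreme of the continuum

    # Expected direction in this toy example: clear text (redundant, structured)
    # sits at the low-entropy end, the tokenized stream lands higher, and pure
    # randomness approaches the 8 bits/byte maximum.
    for label, blob in [("clear text", clear_text),
                        ("tokenized", tokenized),
                        ("random", random_bytes)]:
        print(f"{label:>10}: {entropy_bits_per_byte(blob):.2f} bits/byte")
```

The deterministic token in this sketch is one way to picture “preserving just enough structure”: records referring to the same identifier remain linkable for modeling, while the identifier’s content itself is replaced by higher-entropy material.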
