Building a Multi-Tenant AI Platform: Weam’s Data Isolation and Security Strategy

When you’re building an AI platform that serves multiple companies, you can’t just throw everyone’s data into the same bucket and hope for the best. Company A shouldn’t see Company B’s conversations, documents, or custom agents. Ever.

This sounds obvious, but getting it right is tricky. You need to think about isolation at every layer: database queries, file storage, API access, real-time connections, and even vector embeddings. Miss one spot and you’ve got a security nightmare.

Let me explain how we built Weam, a multi-tenant AI platform, starting from the database and moving up through the application stack.

The Foundation for a Multi-Tenant AI Platform

Here’s the core principle: every piece of data in Weam belongs to a company. Not a user, not a workspace, but a company. This companyId becomes your primary isolation boundary.

When you sign up for Weam, we create a company for you and assign it a companyId. That identifier is tagged on every user, brain, document, agent, and chat message. It’s not optional, and it’s not nullable.

    // MongoDB Schema Example
    const { Schema } = require('mongoose');
    const { ObjectId } = Schema.Types;

    const ChatMessageSchema = new Schema({
      content: { type: String, required: true },
      messageType: { type: String, enum: ['user', 'assistant', 'system'] },
      companyId: { type: ObjectId, required: true, index: true },
      createdBy: { type: ObjectId, required: true },
      session: { type: ObjectId, required: true },
      // ... other fields
    });

    // Critical: Index on companyId for query performance
    ChatMessageSchema.index({ companyId: 1, session: 1 });

Notice the index on companyId. Every query that touches this collection will filter by company, so you want that lookup to be fast.

Session-Based Access Control

We use iron-session for managing user sessions. When you log in, your session stores your companyId along with your user ID and role. This session data becomes the source of truth for every API request.

Note

iron-session is a lightweight session library, commonly used with Next.js, that stores encrypted session data in cookies. That makes it a good fit for server-side environments that have no external session store.

    // Session Structure
    {
      _id: "user-id-here",
      email: "user@company.com",
      roleCode: "USER",
      companyId: "company-id-here"
    }

Before any operation happens, we check the session. No session means no access. Wrong companyId means no access. It’s that simple.

Here’s what the middleware looks like:

    // Middleware for Protected Routes
    async function checkAccess(req, res, next) {
      const session = await req.session.get();

      if (!session || !session._id) {
        return res.status(401).json({ error: 'Unauthorized' });
      }

      // Attach company context to request
      req.user = {
        userId: session._id,
        companyId: session.companyId,
        role: session.roleCode
      };

      next();
    }

Every protected endpoint uses this middleware. No exceptions.
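The decision this middleware makes can also be written as a small pure function, which is easier to unit test than an Express-style handler. The sketch below is illustrative, not Weam's actual code: `buildRequestContext` is a hypothetical name, and unlike the middleware above it also rejects sessions that lack a companyId, which is an added assumption.

```javascript
// Hypothetical helper: turn a decoded session into request context, or null.
function buildRequestContext(session) {
  // No session, no user id, or no company: all three count as unauthenticated.
  if (!session || !session._id || !session.companyId) {
    return null;
  }
  return {
    userId: session._id,
    companyId: session.companyId,
    role: session.roleCode,
  };
}

// A complete session yields context; an incomplete one does not.
const ok = buildRequestContext({
  _id: 'user-1',
  email: 'user@company.com',
  roleCode: 'USER',
  companyId: 'company-1',
});
const missingCompany = buildRequestContext({ _id: 'user-1', roleCode: 'USER' });
console.log(ok, missingCompany);
```

A middleware built on this would respond with 401 whenever the helper returns null and attach the result to the request otherwise.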

Query-Level Isolation

The session gives you the companyId, but you still need to use it correctly in every query. This is where developers often mess up. They write a query that forgets to filter by company, and suddenly, there’s a data leak.

We enforce this pattern everywhere:

    // WRONG - Missing company filter
    const chats = await ChatMessage.find({ session: sessionId });

    // RIGHT - Always filter by company
    const chats = await ChatMessage.find({
      companyId: req.user.companyId,
      session: sessionId
    });

For extra safety, we built repository classes that automatically inject the companyId:

    class ChatRepository {
      constructor(companyId) {
        this.companyId = companyId;
      }

      async findMessages(sessionId) {
        return await ChatMessage.find({
          companyId: this.companyId,
          session: sessionId
        });
      }

      async createMessage(data) {
        return await ChatMessage.create({
          ...data,
          companyId: this.companyId
        });
      }
    }

    // Usage in route handler
    const chatRepo = new ChatRepository(req.user.companyId);
    const messages = await chatRepo.findMessages(sessionId);

Note

Repository classes are a clean-architecture practice: they wrap direct database access and enforce consistent data rules, such as always including the companyId filter.

This pattern makes it harder to accidentally write an unscoped query.
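The same idea can be reduced to one helper that any repository method calls before touching the database. This is a minimal sketch under stated assumptions: `scopedFilter` is a hypothetical name, not part of Mongoose or of Weam's codebase, and it only inspects top-level keys of the filter.

```javascript
// Hypothetical helper: merge a caller's filter with the tenant scope.
function scopedFilter(companyId, filter = {}) {
  if (!companyId) {
    throw new Error('companyId is required for every query');
  }
  // Refuse filters that carry their own companyId instead of silently overriding them.
  if (Object.prototype.hasOwnProperty.call(filter, 'companyId')) {
    throw new Error('companyId must come from the session, not the caller');
  }
  return { ...filter, companyId };
}

const filter = scopedFilter('company-1', { session: 'session-9' });
console.log(filter); // { session: 'session-9', companyId: 'company-1' }
```

Throwing on a caller-supplied companyId, rather than overwriting it, surfaces the bug during development. A companyId nested inside an operator such as `$or` would not be caught by this check.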

Vector Database Isolation

Documents in Weam get chunked and stored in Pinecone for semantic search. A shared vector index does not keep tenants apart on its own, so you need to handle that yourself. We do it with metadata filters.

When we store embeddings, we attach the companyId and agentId as metadata:

    // Storing Vectors with Metadata
    await pinecone.upsert({
      vectors: [{
        id: `chunk-${chunkId}`,
        values: embedding,
        metadata: {
          companyId: companyId,
          agentId: agentId,
          fileId: fileId,
          chunkIndex: index,
          content: chunkText
        }
      }]
    });

When querying, we filter by this metadata:

    // Querying with Company Isolation
    const results = await pinecone.query({
      vector: queryEmbedding,
      topK: 5,
      filter: {
        companyId: { $eq: req.user.companyId },
        agentId: { $eq: agentId }
      }
    });

Without that filter, you’d get results from other companies. That’s a major security problem.

Note

Vector databases like Pinecone store numerical embeddings of text for semantic search. In a shared index nothing separates tenants by default, so we rely on metadata filters to scope queries to each company. (Pinecone namespaces are another way to partition an index per tenant.)
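Because a forgotten filter is the failure mode here, it can help to build the filter in one place and refuse to build it without a company. The sketch below is an illustration only: `buildVectorFilter` is a hypothetical helper, and it assumes ids are stored as strings in vector metadata.

```javascript
// Hypothetical helper: build the metadata filter for a tenant-scoped vector query.
function buildVectorFilter(companyId, agentId) {
  if (!companyId) {
    throw new Error('Refusing to query vectors without a companyId');
  }
  // Assumption: ids were stored as strings when the vectors were upserted.
  const filter = { companyId: { $eq: String(companyId) } };
  // agentId narrows the search further but is optional in this sketch.
  if (agentId) {
    filter.agentId = { $eq: String(agentId) };
  }
  return filter;
}

const vectorFilter = buildVectorFilter('company-1', 'agent-7');
console.log(JSON.stringify(vectorFilter));
```

The returned object would be passed as the `filter` field of the query shown earlier, so a missing companyId fails loudly before any search runs.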

File Storage and Access

We use either MinIO or S3 for file storage. Files get organized by company in the bucket structure:

    bucket-name/
      company-abc123/
        files/
          document1.pdf
          document2.docx
      company-xyz789/
        files/
          report.pdf

When generating presigned URLs or serving files, we verify the requesting user’s companyId matches the file’s company:

    async function getFileUrl(fileId, req) {
      const file = await File.findOne({
        _id: fileId,
        companyId: req.user.companyId  // Verify ownership
      });

      if (!file) {
        throw new Error('File not found');
      }

      // Generate presigned URL
      return await s3.getSignedUrl('getObject', {
        Bucket: process.env.AWS_S3_BUCKET,
        Key: file.s3Key,
        Expires: 3600
      });
    }

No company check means no file access.
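The bucket layout above also allows a second, cheap check on the object key itself. The following is a sketch with hypothetical helper names (`buildObjectKey`, `keyBelongsToCompany`); it assumes keys always follow the company-{id}/files/{name} pattern shown in the tree.

```javascript
// Hypothetical helpers for the company-<id>/files/<name> layout.
function buildObjectKey(companyId, fileName) {
  // Reject names that could escape the company's prefix.
  if (!fileName || fileName.includes('/') || fileName.includes('..')) {
    throw new Error('Invalid file name');
  }
  return `company-${companyId}/files/${fileName}`;
}

function keyBelongsToCompany(key, companyId) {
  // The trailing slash keeps company-abc from matching company-abc123.
  return key.startsWith(`company-${companyId}/`);
}

const key = buildObjectKey('abc123', 'document1.pdf');
console.log(key); // company-abc123/files/document1.pdf
console.log(keyBelongsToCompany(key, 'xyz789')); // false
```

A prefix check like this complements the database ownership lookup rather than replacing it: it catches a record whose stored key points into another company's folder.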

Real-Time Isolation with Socket.IO

Chat responses stream over WebSockets using Socket.IO. When a user connects, we authenticate their socket connection and store the companyId in the socket’s metadata:

    io.use(async (socket, next) => {
      const session = await getSession(socket.request);

      if (!session || !session.companyId) {
        return next(new Error('Authentication failed'));
      }

      socket.companyId = session.companyId;
      socket.userId = session._id;
      next();
    });

When emitting events, we can filter by company:

    // Emit to all sockets in a company
    io.to(`company-${companyId}`).emit('notification', data);

    // Or just to a specific user in that company
    io.to(`user-${userId}`).emit('message', data);

This prevents cross-company event leakage in real-time communication.

Note

Each authenticated socket automatically joins a company-{id} room after validation, so messages stay scoped to that company.
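The room join itself is not shown in the snippets above, so here is one way it might look. This is a sketch, not Weam's actual handler: `joinTenantRooms` is a hypothetical name, and a stand-in socket object is used so the example runs without a Socket.IO server.

```javascript
// Hypothetical helper: room names follow the company-{id} / user-{id} pattern.
function joinTenantRooms(socket) {
  // socket.companyId and socket.userId were set by the auth middleware.
  socket.join(`company-${socket.companyId}`);
  socket.join(`user-${socket.userId}`);
}

// With a real server this would be wired up as:
//   io.on('connection', joinTenantRooms);

// Stand-in socket that records joins, so the sketch runs without Socket.IO.
const fakeSocket = {
  companyId: 'abc123',
  userId: 'u1',
  rooms: [],
  join(room) {
    this.rooms.push(room);
  },
};
joinTenantRooms(fakeSocket);
console.log(fakeSocket.rooms); // [ 'company-abc123', 'user-u1' ]
```

Deriving room names only from values set during authentication means a client cannot ask to join another company's room.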

Role-Based Access Within Companies

Multi-tenancy handles company isolation, but you also need role-based access control within each company. Weam has three roles: User, Manager, and Admin.

The check-access endpoint validates both company membership and role permissions:

    async function checkAccess(userId, resourceType, requiredRole) {
      const user = await User.findById(userId);

      if (!user) {
        return { allowed: false, reason: 'User not found' };
      }

      // Check if user has required role
      const hasPermission = hasRequiredPermission(user.role, requiredRole);

      return {
        allowed: hasPermission,
        companyId: user.companyId,
        role: user.role
      };
    }
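The snippet above calls `hasRequiredPermission` without showing it. One plausible shape is a simple role ranking, sketched below. The role codes `MANAGER` and `ADMIN` and the assumption that roles are strictly hierarchical are guesses for illustration; only `USER` appears in the session example.

```javascript
// Hypothetical role ranking: higher numbers include the permissions of lower ones.
const ROLE_RANK = new Map([
  ['USER', 1],
  ['MANAGER', 2],
  ['ADMIN', 3],
]);

function hasRequiredPermission(userRole, requiredRole) {
  const userRank = ROLE_RANK.get(userRole);
  const requiredRank = ROLE_RANK.get(requiredRole);
  // Unknown roles on either side fail closed.
  if (userRank === undefined || requiredRank === undefined) {
    return false;
  }
  return userRank >= requiredRank;
}

console.log(hasRequiredPermission('ADMIN', 'MANAGER')); // true
console.log(hasRequiredPermission('USER', 'MANAGER')); // false
console.log(hasRequiredPermission('GUEST', 'USER')); // false
```

Failing closed on unknown role codes means a typo in a route's required role denies access instead of granting it.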

Testing Multi-Tenancy

You can’t just assume your isolation works. You need to test it. Here’s what we test:

Cross-company data access attempts

Missing company filters

Session hijacking

Vector search leakage

    // Example test case
    describe('Multi-tenant isolation', () => {
      it('should not return data from other companies', async () => {
        const company1 = await createCompany();
        const company2 = await createCompany();

        const user1 = await createUser({ companyId: company1._id });
        const user2 = await createUser({ companyId: company2._id });

        const chat1 = await createChat({
          companyId: company1._id,
          createdBy: user1._id
        });

        // User2 should not see Company1's chat
        const result = await ChatMessage.find({
          companyId: company2._id  // User2's company
        });

        expect(result).not.toContainEqual(chat1);
      });
    });

Common Pitfalls

Here are the mistakes we’ve seen (and fixed):

Forgetting to filter aggregation pipelines: Aggregations need the companyId filter in the first stage:

    // WRONG
    const stats = await Message.aggregate([
      { $group: { _id: '$session', count: { $sum: 1 } } }
    ]);

    // RIGHT
    const stats = await Message.aggregate([
      { $match: { companyId: new ObjectId(companyId) } },
      { $group: { _id: '$session', count: { $sum: 1 } } }
    ]);
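One way to make the first-stage rule hard to forget is to build pipelines through a helper that always prepends the match. This is a minimal sketch: `scopedPipeline` is a hypothetical name, and it assumes the caller passes companyId in the type stored in the collection.

```javascript
// Hypothetical helper: force every pipeline to start with a tenant $match.
function scopedPipeline(companyId, stages = []) {
  if (!companyId) {
    throw new Error('companyId is required for aggregations');
  }
  // companyId must already have the type stored in the collection
  // (for example an ObjectId); this sketch does not convert it.
  return [{ $match: { companyId } }, ...stages];
}

const pipeline = scopedPipeline('company-1', [
  { $group: { _id: '$session', count: { $sum: 1 } } },
]);
console.log(JSON.stringify(pipeline[0])); // {"$match":{"companyId":"company-1"}}
```

The helper only scopes the collection the pipeline starts from. A `$lookup` stage that pulls from another collection would still need its own company condition.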

Using user-provided IDs without validation: Never trust user input for cross-references:

    // Validate that the resource belongs to the user's company
    const brain = await Brain.findOne({
      _id: req.body.brainId,
      companyId: req.user.companyId
    });

    if (!brain) {
      return res.status(403).json({ error: 'Access denied' });
    }

Leaking data in error messages: Don’t reveal whether resources exist in other companies:

    // BAD - Reveals that the resource exists
    if (!resource) {
      return res.status(404).json({ error: 'Resource not found' });
    }
    if (String(resource.companyId) !== String(req.user.companyId)) {
      return res.status(403).json({ error: 'Access denied' });
    }

    // GOOD - Same response for both cases
    // (compare as strings: an ObjectId is never === to the session's string id)
    if (!resource || String(resource.companyId) !== String(req.user.companyId)) {
      return res.status(404).json({ error: 'Resource not found' });
    }

Monitoring and Auditing

We log all access attempts with company context:

    logger.info('Data access', {
      userId: req.user.userId,
      companyId: req.user.companyId,
      resource: 'chat-messages',
      action: 'read',
      timestamp: new Date()
    });

This audit trail helps catch isolation bugs in production and provides compliance documentation.
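To keep audit records uniform across endpoints, the log payload can be built by a single function. The sketch below is illustrative: `buildAuditEntry` is a hypothetical helper, and the choice to reject entries without a companyId is an assumption, not something the logging call above does.

```javascript
// Hypothetical helper: build one audit record from the request context.
function buildAuditEntry(user, resource, action, now = new Date()) {
  if (!user || !user.companyId) {
    throw new Error('Audit entries need a company context');
  }
  return {
    userId: user.userId,
    companyId: user.companyId,
    resource,
    action,
    timestamp: now.toISOString(),
  };
}

const entry = buildAuditEntry(
  { userId: 'u1', companyId: 'abc123' },
  'chat-messages',
  'read',
  new Date('2024-01-01T00:00:00Z')
);
console.log(entry.timestamp); // 2024-01-01T00:00:00.000Z
```

In a route handler the result would be passed straight to the logger, for example `logger.info('Data access', buildAuditEntry(req.user, 'chat-messages', 'read'))`.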

The Bottom Line

Multi-tenancy isn’t something you bolt on later. It needs to be in your data model from day one. Every query, every file access, every WebSocket message needs company scoping.

The good news is that once you get the patterns right, they become second nature. Company-scoped repositories, metadata filtering in vector stores, and session-based access control give you a strong foundation.

Just remember: every new feature needs to answer the question “how does this respect company boundaries?” If you can’t answer that, you’re not ready to ship it.

Frequently Asked Questions

1. What does “multi-tenancy” mean in an AI platform?

Multi-tenancy is a software architecture where a single application serves multiple customers, called tenants, from a shared infrastructure. Each tenant’s data, users, and configurations are logically separated, even though they’re using the same underlying codebase and servers.

In SaaS or AI platforms like Weam, this architecture allows efficient scaling and centralized updates, but it introduces a major security responsibility: isolation.

2. Why is isolation important in multi-tenant AI systems?

If company data isn’t properly isolated in a multi-tenant system, it can lead to:

Compliance violations

Data leaks

Unauthorized access

File exposure

Vector leaks

3. How do you enforce isolation at the database layer?

Every collection carries a required, indexed companyId, and every query and aggregation pipeline filters on it, with repository classes injecting the filter automatically.

4. How does Weam implement company-level data isolation?

Weam tags every entity (users, agents, chats, documents) with a companyId.

5. How is user access controlled per company?

Through session-based authentication (iron-session), middleware checks, and request validation.

The post Building a Multi Tenant AI Platform: How Weam Handles Isolation and Security appeared first on Weam - AI For Digital Agency.
