Technical Guide

Customer Data in Google: Public S3 Bucket Disaster

Private files indexed by search engines; we secured storage and removed exposed data.

January 15, 2025

The problem

A security researcher contacted our client with screenshots showing customer invoices, contracts, and personal IDs appearing in Google search results. Anyone could download 8,400+ confidential documents with a simple Google search for "site:s3.amazonaws.com [company-name] filetype:pdf". The exposed data included social security numbers, bank statements, medical records, and signed contracts worth $3.2M. The client faced potential GDPR fines of up to €20 million or 4% of annual global turnover, whichever is higher.

How AI created this issue

The developer asked ChatGPT: "How do I upload files to S3 in Node.js?" ChatGPT provided this code:


// ChatGPT-generated S3 upload code
const AWS = require('aws-sdk');
const s3 = new AWS.S3();

async function uploadFile(file, filename) {
  const params = {
    Bucket: 'my-app-uploads',
    Key: filename,
    Body: file.buffer,
    ACL: 'public-read', // Makes file publicly accessible
    ContentType: file.mimetype
  };
  
  const result = await s3.upload(params).promise();
  return result.Location; // Returns public URL
}

// Store URL in database for easy access
const fileUrl = await uploadFile(req.file, 'invoice-12345.pdf');
await db.saveDocument({ url: fileUrl, userId: req.user.id });

ChatGPT included the 'public-read' ACL without explaining its security implications. The AI assumed the developer wanted public file sharing (common for profile pictures), but this upload path handled sensitive documents. Worse, the files were stored under predictable names with no access controls, so once a URL leaked, web crawlers could discover and index them.
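To make the exposure concrete, here is a minimal sketch of how a publicly readable object could be recognized from its ACL. It assumes the `Grants` array shape returned by S3's GetObjectAcl operation. The function name `hasPublicReadGrant` and the sample grant lists are illustrative and not part of the client's code.

```javascript
// URI that S3 uses in ACL grants to mean "anyone on the internet".
const ALL_USERS_URI = 'http://acs.amazonaws.com/groups/global/AllUsers';

// Permissions that let an anonymous caller download the object.
const READABLE = new Set(['READ', 'FULL_CONTROL']);

/**
 * Returns true when an ACL grant list exposes the object to everyone.
 * `grants` is assumed to have the shape of the `Grants` array in a
 * GetObjectAcl response: [{ Grantee: { Type, URI }, Permission }].
 */
function hasPublicReadGrant(grants) {
  return (grants || []).some(
    (grant) =>
      Boolean(grant.Grantee) &&
      grant.Grantee.URI === ALL_USERS_URI &&
      READABLE.has(grant.Permission)
  );
}

// Sample data (hypothetical): 'public-read' adds an AllUsers READ grant
// next to the owner's FULL_CONTROL grant.
const publicReadAcl = [
  { Grantee: { Type: 'CanonicalUser', ID: 'owner-id' }, Permission: 'FULL_CONTROL' },
  { Grantee: { Type: 'Group', URI: ALL_USERS_URI }, Permission: 'READ' },
];
const privateAcl = [
  { Grantee: { Type: 'CanonicalUser', ID: 'owner-id' }, Permission: 'FULL_CONTROL' },
];

console.log(hasPublicReadGrant(publicReadAcl)); // true
console.log(hasPublicReadGrant(privateAcl)); // false
```

In a real audit, the grant list would come from calling `getObjectAcl` on each object. The check above only covers ACLs and does not account for access granted through a bucket policy.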

The solution

  1. Emergency response: Immediately made all buckets private and implemented IP-based access restrictions to stop the bleeding
  2. Google removal: Submitted urgent removal requests for 8,400+ indexed URLs through Google Search Console
  3. Secure file access: Implemented pre-signed URLs with 15-minute expiration:
    
    // Secure S3 implementation
    const crypto = require('crypto');
    const AWS = require('aws-sdk');
    const s3 = new AWS.S3();

    async function uploadSecureFile(file, userId) {
      // Generate a unique, non-guessable object key
      const fileKey = `private/${userId}/${crypto.randomUUID()}-${Date.now()}`;
      
      const params = {
        Bucket: process.env.SECURE_BUCKET,
        Key: fileKey,
        Body: file.buffer,
        ContentType: file.mimetype, // keep the MIME type so downloads open correctly
        ServerSideEncryption: 'AES256',
        // No ACL: objects are private unless an ACL or policy grants access
      };
      
      await s3.upload(params).promise();
      
      // Store only the key, not the URL
      return fileKey;
    }
    
    // Generate temporary access URL when needed
    async function getSecureUrl(fileKey, userId) {
      // Verify user has permission (checkFilePermission is application-specific)
      const hasAccess = await checkFilePermission(fileKey, userId);
      if (!hasAccess) throw new Error('Access denied');
      
      // Generate pre-signed URL valid for 15 minutes (900 seconds)
      return s3.getSignedUrlPromise('getObject', {
        Bucket: process.env.SECURE_BUCKET,
        Key: fileKey,
        Expires: 900
      });
    }
  4. Bucket security audit: Implemented AWS Config rules to detect and flag publicly accessible buckets
  5. Access logging: Enabled CloudTrail logging (S3 data events) to track all file access attempts
  6. Data classification: Implemented automatic PII detection using Amazon Macie
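The `checkFilePermission` helper called in step 3 is not shown in the original code. One possible sketch follows, assuming ownership is encoded purely in the `private/<userId>/` key prefix that `uploadSecureFile` produces. The `ownsKey` helper is a hypothetical name, and a real system would probably also consult sharing rules stored in a database.

```javascript
/**
 * Hypothetical ownership test: true only when `fileKey` has the form
 * private/<userId>/<objectName> and <userId> matches the caller.
 */
function ownsKey(fileKey, userId) {
  if (typeof fileKey !== 'string' || userId === undefined || userId === null) {
    return false;
  }
  const owner = String(userId);
  if (owner === '') return false;

  const segments = fileKey.split('/');
  // Expect exactly three parts: "private", the owner's id, the object name.
  if (segments.length !== 3 || segments[0] !== 'private') return false;
  // Reject empty names and path tricks such as "." or "..".
  if (['', '.', '..'].includes(segments[2])) return false;

  return segments[1] === owner;
}

// Async wrapper with the signature used by getSecureUrl in step 3.
// Assumption: no database lookup is needed for this simple ownership model.
async function checkFilePermission(fileKey, userId) {
  return ownsKey(fileKey, userId);
}

console.log(ownsKey('private/42/3f2a-1700000000000', 42)); // true
console.log(ownsKey('private/42/3f2a-1700000000000', 7)); // false
console.log(ownsKey('private/42/../7/secret', 42)); // false
```

Checking the key structure strictly, and not with a simple `startsWith`, means keys with extra path segments are rejected instead of being matched against the wrong owner.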

The results

  • 100% of exposed files secured within 4 hours of discovery
  • Google removed all indexed files within 72 hours
  • No misuse of the exposed data reported; the exposure was caught before any known malicious access
  • Passed a security audit and achieved SOC 2 Type II compliance
  • Reduced AWS costs by 34% by implementing S3 Intelligent-Tiering for archived files
  • Avoided potential €20M GDPR fine and preserved customer trust

The incident taught the team that AI tools often default to convenience over security. They now review all AI-generated code for security implications and maintain a "secure by default" policy for all file storage.
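As one illustration of what a "secure by default" storage policy might look like in code, the sketch below builds an S3 Block Public Access configuration with all four protections enabled. It assumes an AWS SDK v2 client is passed in by the caller. The function names and the bucket name are hypothetical, and this is not necessarily the configuration the team deployed.

```javascript
/**
 * Builds the request for S3's PutPublicAccessBlock operation with every
 * protection switched on for the given bucket.
 */
function buildPublicAccessBlockParams(bucketName) {
  if (!bucketName) throw new Error('bucketName is required');
  return {
    Bucket: bucketName,
    PublicAccessBlockConfiguration: {
      BlockPublicAcls: true,       // reject requests that set a public ACL
      IgnorePublicAcls: true,      // ignore public ACLs that already exist
      BlockPublicPolicy: true,     // reject bucket policies that grant public access
      RestrictPublicBuckets: true, // restrict access to buckets that have public policies
    },
  };
}

// `s3` is assumed to be an AWS SDK v2 client, e.g. `new AWS.S3()`.
async function lockDownBucket(s3, bucketName) {
  const params = buildPublicAccessBlockParams(bucketName);
  await s3.putPublicAccessBlock(params).promise();
}

// Example with a placeholder bucket name.
const params = buildPublicAccessBlockParams('my-app-uploads');
console.log(params.PublicAccessBlockConfiguration.BlockPublicAcls); // true
```

With `BlockPublicAcls` enabled, an upload that specifies `ACL: 'public-read'` is rejected by S3, so a copy-pasted snippet like the original one fails loudly instead of silently exposing files.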
