Technical Guide

Site Down for 3 Days: Copilot Error-Handling Blackout

Zero try/catch meant tiny bugs crashed prod; we wrapped risky calls, added monitoring, and kept uptime.

January 15, 2025

The problem

An online learning platform went completely offline for 72 hours during final exam week. 50,000 students couldn't access course materials or submit assignments. The site crashed when a third-party API returned an unexpected null value in user profile data. One unhandled exception took down the entire Node.js application. The company faced $2.3M in refund demands and potential lawsuits from universities.

How AI created this issue

GitHub Copilot had generated all the API integration code without any error handling:


// Copilot-generated code - no error handling anywhere
app.get('/api/user/:id', async (req, res) => {
  const userId = req.params.id;
  
  // Fetch from multiple services
  const profile = await userService.getProfile(userId);
  const courses = await courseService.getUserCourses(userId);
  const grades = await gradeService.getGrades(userId);
  const notifications = await notificationService.getUnread(userId);
  
  // Copilot assumed these always work
  const response = {
    name: profile.data.fullName,
    email: profile.data.email,
    avatar: profile.data.avatar.url, // This crashed when avatar was null
    coursesCount: courses.items.length,
    averageGrade: grades.reduce((a, g) => a + g.score, 0) / grades.length,
    unreadCount: notifications.count
  };
  
  res.json(response);
});

// Service calls also had no error handling
async function getProfile(userId) {
  const response = await fetch(`${API_URL}/users/${userId}`);
  return response.json(); // No status check, no try/catch
}

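When the user profile API returned `{ data: { fullName: "John", email: "john@example.com", avatar: null } }`, the handler threw a TypeError while reading `profile.data.avatar.url`. Nothing caught the resulting promise rejection. Assuming Express 4 on Node.js 15 or later (the incident report does not state versions), this is expected: Express 4 does not forward rejections from async handlers to its error middleware, and Node.js exits on an unhandled rejection by default. One bad payload therefore killed the entire Node.js process. Copilot never suggested try/catch blocks, null checks, or graceful degradation.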
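
The failure mode is easy to reproduce in isolation. The sketch below uses the payload shape quoted above; `unsafeAvatarUrl` and `safeAvatarUrl` are hypothetical names introduced only for illustration, not functions from the original codebase.

```javascript
// Payload shape taken from the incident description.
const profile = {
  data: { fullName: 'John', email: 'john@example.com', avatar: null }
};

// Direct property access: reading .url on null throws a TypeError.
function unsafeAvatarUrl(p) {
  return p.data.avatar.url;
}

// Optional chaining short-circuits to undefined on null/undefined,
// and ?? supplies an explicit fallback value.
function safeAvatarUrl(p) {
  return p?.data?.avatar?.url ?? null;
}

let thrown = null;
try {
  unsafeAvatarUrl(profile);
} catch (err) {
  thrown = err;
}

console.log(thrown instanceof TypeError); // true
console.log(safeAvatarUrl(profile));      // null
```

Optional chaining only covers missing data. It does not replace try/catch around the network calls themselves.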

The solution

  1. Comprehensive error handling at every level:
    
    // Robust error handling implementation
    app.get('/api/user/:id', async (req, res) => {
      try {
        const userId = req.params.id;
        
        // Parallel fetches with individual error handling
        const [profile, courses, grades, notifications] = await Promise.allSettled([
          userService.getProfile(userId),
          courseService.getUserCourses(userId),
          gradeService.getGrades(userId),
          notificationService.getUnread(userId)
        ]);
        
        // Handle each result safely
        const response = {
          name: profile.status === 'fulfilled' ? profile.value?.data?.fullName || 'Unknown' : 'Unknown',
          email: profile.status === 'fulfilled' ? profile.value?.data?.email || '' : '',
          avatar: profile.status === 'fulfilled' ? profile.value?.data?.avatar?.url || null : null,
          coursesCount: courses.status === 'fulfilled' ? courses.value?.items?.length || 0 : 0,
          averageGrade: grades.status === 'fulfilled' && grades.value?.length > 0
            ? grades.value.reduce((a, g) => a + (g?.score || 0), 0) / grades.value.length
            : null,
          unreadCount: notifications.status === 'fulfilled' ? notifications.value?.count || 0 : 0
        };
        
        res.json(response);
      } catch (error) {
        logger.error('User API error:', error);
        res.status(500).json({
          error: 'Unable to fetch user data',
          requestId: req.id // assumes request-ID middleware populates req.id
        });
      }
    });
    
    // Service-level error handling
    async function getProfile(userId) {
      try {
        // The standard fetch API has no `timeout` option; abort via a signal instead
        const response = await fetch(`${API_URL}/users/${userId}`, {
          signal: AbortSignal.timeout(5000)
        });
        
        if (!response.ok) {
          throw new Error(`Profile API error: ${response.status}`);
        }
        
        return await response.json();
      } catch (error) {
        logger.error(`Failed to fetch profile for ${userId}:`, error);
        return null; // Graceful degradation
      }
    }
  2. Global error handlers: Added Express error middleware and process-level handlers
  3. Circuit breakers: Implemented circuit breakers for external API calls
  4. Health checks: Added comprehensive health check endpoints
  5. Graceful shutdown: Proper cleanup and connection draining
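
Items 2 and 5 are described above without code. One possible shape is sketched below, assuming an Express app and the `http.Server` returned by `app.listen()`. The names `asyncHandler`, `errorMiddleware`, `gracefulShutdown`, and `installProcessHandlers` are hypothetical and are not taken from the team's implementation.

```javascript
// Hypothetical helper: forwards errors from an async route handler to
// next(), so Express error middleware can handle them.
function asyncHandler(fn) {
  return (req, res, next) =>
    Promise.resolve()
      .then(() => fn(req, res, next))
      .catch(next);
}

// Express treats middleware with four parameters as an error handler.
// Register it after all routes: app.use(errorMiddleware)
function errorMiddleware(err, req, res, next) {
  console.error('Unhandled route error:', err);
  if (res.headersSent) {
    return next(err); // response already started; defer to Express's default handler
  }
  res.status(500).json({ error: 'Internal server error' });
}

// Stop accepting new connections, let in-flight requests finish, then exit.
function gracefulShutdown(server, { exitCode = 0, timeoutMs = 10000, exit = process.exit } = {}) {
  server.close(() => exit(exitCode));
  // Force an exit if draining hangs; unref() keeps this timer from
  // holding the process open by itself.
  setTimeout(() => exit(1), timeoutMs).unref();
}

// Process-level safety nets. After an uncaught exception the process state
// is unreliable, so log, drain, and exit with a non-zero code.
function installProcessHandlers(server, log = console.error) {
  process.on('unhandledRejection', (reason) => {
    log('Unhandled rejection:', reason);
  });
  process.on('uncaughtException', (err) => {
    log('Uncaught exception:', err);
    gracefulShutdown(server, { exitCode: 1 });
  });
  process.on('SIGTERM', () => gracefulShutdown(server));
}

// Wiring, assuming an Express `app`:
//   app.get('/api/user/:id', asyncHandler(async (req, res) => { /* ... */ }));
//   app.use(errorMiddleware);
//   const server = app.listen(3000);
//   installProcessHandlers(server);
```

Exiting after an uncaught exception relies on a process supervisor (a container orchestrator or process manager, for example) to restart the app, which is an assumption about the deployment.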

The results

  • Zero unplanned downtime in 8 months (was 3-4 outages/month)
  • 99.97% uptime achieved (up from 94.2%)
  • API errors handled gracefully - degraded functionality instead of crashes
  • Response time improved 23% with optimized error paths
  • Avoided a repeat of the $2.3M refund exposure by preventing another exam week outage
  • Error monitoring catches issues before they impact users

The team learned that AI-generated code assumes happy paths. Real production systems need defensive programming at every level. They now require error handling for all external calls and have adopted a "fail gracefully" philosophy. No single API failure should ever take down the entire application.
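
The circuit breakers mentioned in the solution are one way to enforce that last rule. The following is a minimal sketch, not the team's implementation: `CircuitBreaker` and `withFallback` are hypothetical names, the thresholds are arbitrary, and the half-open state does not limit concurrent trial calls as a production library would.

```javascript
// Minimal circuit breaker: after `failureThreshold` consecutive failures the
// circuit opens and calls are rejected immediately until `resetTimeoutMs`
// has elapsed, at which point a trial call is let through.
class CircuitBreaker {
  constructor(fn, { failureThreshold = 3, resetTimeoutMs = 10000, now = Date.now } = {}) {
    this.fn = fn;
    this.failureThreshold = failureThreshold;
    this.resetTimeoutMs = resetTimeoutMs;
    this.now = now; // injectable clock, mainly for testing
    this.failures = 0;
    this.state = 'closed';
    this.openedAt = 0;
  }

  async call(...args) {
    if (this.state === 'open') {
      if (this.now() - this.openedAt < this.resetTimeoutMs) {
        throw new Error('Circuit open: call skipped');
      }
      this.state = 'half-open'; // cool-down elapsed; allow a trial call
    }
    try {
      const result = await this.fn(...args);
      this.failures = 0;
      this.state = 'closed';
      return result;
    } catch (err) {
      this.failures += 1;
      if (this.state === 'half-open' || this.failures >= this.failureThreshold) {
        this.state = 'open';
        this.openedAt = this.now();
      }
      throw err;
    }
  }
}

// Graceful degradation: return a fallback instead of propagating the error.
async function withFallback(breaker, fallback, ...args) {
  try {
    return await breaker.call(...args);
  } catch {
    return fallback;
  }
}

// Usage, assuming the article's notificationService:
//   const breaker = new CircuitBreaker((id) => notificationService.getUnread(id));
//   const unread = await withFallback(breaker, { count: 0 }, userId);
```

While the circuit is open, the failing service is not called at all, so a slow or broken dependency stops consuming request time and the rest of the response can still be served.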
