How to Get PDF Attachments from Gmail Using JavaScript

Fetching email attachments with the Gmail API in JavaScript seems straightforward until you encounter complex, nested MIME structures. If your code is stuck in an infinite loop or fails to extract PDFs, here’s a step-by-step guide to fixing it, plus some practical improvements.

Nested MIME Structures Causing Infinite Loops

The original code uses a recursive function to search an email’s MIME parts for PDF attachments. However, emails often nest parts inside one another (for example, a multipart/alternative body inside a multipart/mixed container), and a recursive search over that tree can fail on unusually deep nesting or, if it revisits parts, loop indefinitely.
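Before changing any code, it can help to look at the part tree you are actually receiving. The sketch below walks a hand-written payload shaped like a Gmail API message fetched in full format (the part IDs, filenames, sizes, and attachment ID are invented for illustration) and returns one indented line per part. It uses an explicit stack rather than recursion, for the reasons discussed below.

```javascript
// Invented sample payload: a typical "plain text + HTML body with a PDF attached" email.
const samplePayload = {
  partId: '',
  mimeType: 'multipart/mixed',
  parts: [
    {
      partId: '0',
      mimeType: 'multipart/alternative',
      parts: [
        { partId: '0.0', mimeType: 'text/plain', filename: '', body: { size: 42 } },
        { partId: '0.1', mimeType: 'text/html', filename: '', body: { size: 180 } },
      ],
    },
    {
      partId: '1',
      mimeType: 'application/pdf',
      filename: 'invoice.pdf',
      body: { attachmentId: 'ATTACHMENT_ID_PLACEHOLDER', size: 20480 },
    },
  ],
};

// Returns one indented line per MIME part, in document order (depth-first, pre-order).
function describeMimeTree(root) {
  const lines = [];
  const stack = [{ part: root, depth: 0 }];
  while (stack.length > 0) {
    const { part, depth } = stack.pop();
    const name = part.filename ? ` (${part.filename})` : '';
    lines.push(`${'  '.repeat(depth)}${part.mimeType}${name}`);
    // Push children in reverse so the first child is popped (and listed) first.
    const children = part.parts || [];
    for (let i = children.length - 1; i >= 0; i--) {
      stack.push({ part: children[i], depth: depth + 1 });
    }
  }
  return lines;
}

console.log(describeMimeTree(samplePayload).join('\n'));
// multipart/mixed
//   multipart/alternative
//     text/plain
//     text/html
//   application/pdf (invoice.pdf)
```

Running the same function on email.payload from a real message is a quick way to see where the PDF sits before writing any extraction logic.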

Why It Fails:

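  1. Recursion Limitations: Every nesting level adds a stack frame, so extremely deep structures can exceed JavaScript’s call stack.
  2. Redundant Checks: A recursive helper that revisits parts it has already processed (for example, by recursing on the current part instead of its children) never terminates.
  3. Incomplete Traversal: The code stops at the first PDF it finds, so it misses any further PDFs, and that first match may be a duplicate or carry invalid data.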
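The first point is easy to demonstrate, although only with an artificially deep tree: real emails usually nest just a few levels, so treat this as a worst-case illustration rather than something a normal inbox will trigger. The buildNestedPayload helper and the depth of 100,000 are invented for the experiment. A naive recursive search exhausts the call stack, while the same search driven by an explicit stack does not:

```javascript
// Hypothetical helper: wraps a PDF leaf part in `depth` levels of multipart/mixed.
function buildNestedPayload(depth) {
  let node = {
    mimeType: 'application/pdf',
    filename: 'deep.pdf',
    body: { attachmentId: 'att-deep' },
  };
  for (let i = 0; i < depth; i++) {
    node = { mimeType: 'multipart/mixed', parts: [node] };
  }
  return node;
}

// Naive recursive search: one call-stack frame per nesting level.
function findPdfRecursive(part) {
  if (part.mimeType === 'application/pdf' && part.body?.attachmentId) return part;
  for (const child of part.parts || []) {
    const found = findPdfRecursive(child);
    if (found) return found;
  }
  return null;
}

// Same search with an explicit stack: depth grows an array on the heap, not the call stack.
function findPdfIterative(root) {
  const stack = [root];
  while (stack.length > 0) {
    const part = stack.pop();
    if (part.mimeType === 'application/pdf' && part.body?.attachmentId) return part;
    if (part.parts) stack.push(...part.parts);
  }
  return null;
}

const deepPayload = buildNestedPayload(100000);

let recursiveError = null;
try {
  findPdfRecursive(deepPayload);
} catch (error) {
  recursiveError = error;
}
console.log(recursiveError instanceof RangeError); // true ("Maximum call stack size exceeded")
console.log(findPdfIterative(deepPayload).filename); // deep.pdf
```

The explicit-stack version is depth-first and visits siblings in reverse order; for a yes/no search that does not matter, and a queue-based breadth-first loop works just as well.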

Iterative Traversal for MIME Parts

Replace the recursive approach with an iterative, breadth-first search (BFS) to reliably traverse nested parts without infinite loops.

Modified Code:

function findPdfPart(rootPart) {
  const queue = [rootPart]; // Use a queue for BFS traversal
  while (queue.length > 0) {
    const currentPart = queue.shift();
    // Check if current part is a PDF attachment
    if (
      currentPart.mimeType === 'application/pdf' &&
      currentPart.body?.attachmentId
    ) {
      return currentPart;
    }
    // Add subparts to the queue for further traversal
    if (currentPart.parts) {
      queue.push(...currentPart.parts);
    }
  }
  return null; // No PDF found
}

Key Changes:

  1. BFS Traversal: Processes parts level-by-level using a queue.
  2. No Recursion: Avoids stack overflow in deeply nested structures.
  3. Early Exit: Returns the first valid PDF found.
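To check findPdfPart against something closer to a real message, you can run it on a hand-built payload. The payload below is invented: it puts one PDF inside a nested multipart/mixed section and another at the top level, which shows what "first" means for a breadth-first search. The shallower top.pdf wins, whereas a depth-first search visiting children in order would have returned nested.pdf. A PDF-typed part without an attachmentId is skipped.

```javascript
// findPdfPart as defined above, repeated so this snippet runs on its own.
function findPdfPart(rootPart) {
  const queue = [rootPart]; // Use a queue for BFS traversal
  while (queue.length > 0) {
    const currentPart = queue.shift();
    if (currentPart.mimeType === 'application/pdf' && currentPart.body?.attachmentId) {
      return currentPart;
    }
    if (currentPart.parts) {
      queue.push(...currentPart.parts);
    }
  }
  return null; // No PDF found
}

// Invented payload: a nested multipart section listed before a top-level PDF.
const payload = {
  mimeType: 'multipart/mixed',
  parts: [
    {
      mimeType: 'multipart/mixed',
      parts: [
        { mimeType: 'text/plain', body: { size: 12 } },
        { mimeType: 'application/pdf', filename: 'nested.pdf', body: { attachmentId: 'id-nested' } },
      ],
    },
    { mimeType: 'application/pdf', filename: 'top.pdf', body: { attachmentId: 'id-top' } },
    // Labeled as a PDF but has no attachmentId, so the check above ignores it.
    { mimeType: 'application/pdf', filename: 'empty.pdf', body: { size: 0 } },
  ],
};

console.log(findPdfPart(payload).filename); // top.pdf
console.log(findPdfPart({ mimeType: 'text/plain', body: { size: 5 } })); // null
```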

Enhancements for Robustness

Handle Multiple Attachments

Modify the code to collect all PDFs in the email:

function findAllPdfParts(rootPart) {
  const queue = [rootPart];
  const pdfParts = [];
  while (queue.length > 0) {
    const currentPart = queue.shift();
    if (
      currentPart.mimeType === 'application/pdf' &&
      currentPart.body?.attachmentId
    ) {
      pdfParts.push(currentPart);
    }
    if (currentPart.parts) {
      queue.push(...currentPart.parts);
    }
  }
  return pdfParts;
}
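If you expect to handle other attachment types later, one option is to pass the match test in as a predicate, so collecting PDFs becomes one call among several. The names findAttachmentParts, isPdf, and isPdfZipOrText below are hypothetical, and the sample payload is invented. The traversal is the same breadth-first walk as findAllPdfParts, except that it advances a read index instead of calling shift():

```javascript
// Hypothetical generalisation: collect every attachment part (breadth-first) that satisfies `predicate`.
function findAttachmentParts(rootPart, predicate) {
  const queue = [rootPart];
  const matches = [];
  // A moving read index walks the queue without repeated Array.prototype.shift() calls.
  for (let head = 0; head < queue.length; head++) {
    const part = queue[head];
    if (part.body?.attachmentId && predicate(part)) {
      matches.push(part);
    }
    if (part.parts) {
      queue.push(...part.parts);
    }
  }
  return matches;
}

// Same matching rule as findAllPdfParts.
const isPdf = (part) => part.mimeType === 'application/pdf';
// A broader rule, e.g. for the ZIP/TXT support listed under Next Steps.
const isPdfZipOrText = (part) =>
  ['application/pdf', 'application/zip', 'text/plain'].includes(part.mimeType);

const payload = {
  mimeType: 'multipart/mixed',
  parts: [
    { mimeType: 'text/plain', body: { size: 30 } }, // message body text: no attachmentId
    { mimeType: 'application/pdf', filename: 'a.pdf', body: { attachmentId: 'id-a' } },
    {
      mimeType: 'multipart/mixed',
      parts: [
        { mimeType: 'application/zip', filename: 'b.zip', body: { attachmentId: 'id-b' } },
        { mimeType: 'application/pdf', filename: 'c.pdf', body: { attachmentId: 'id-c' } },
      ],
    },
  ],
};

console.log(findAttachmentParts(payload, isPdf).map((p) => p.filename)); // [ 'a.pdf', 'c.pdf' ]
console.log(findAttachmentParts(payload, isPdfZipOrText).map((p) => p.filename)); // [ 'a.pdf', 'b.zip', 'c.pdf' ]
```

Requiring an attachmentId keeps the message's own text/plain body out of the results even when the predicate accepts text/plain.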

Add Error Handling for Attachments

Check for valid attachment data before saving:

// Inside the try block:
const pdfParts = findAllPdfParts(email.payload);
if (pdfParts.length === 0) {
  console.log("No PDF attachments found.");
  return null;
}

for (const pdfPart of pdfParts) {
  try {
    // checkInbox is the original code's helper, assumed here to fetch one attachment's body
    const attachmentData = await checkInbox({
      token: accessToken,
      messageId: email.id,
      attachmentId: pdfPart.body.attachmentId,
    });
    
    if (!attachmentData?.data) {
      console.log(`Skipping invalid attachment: ${pdfPart.filename}`);
      continue;
    }
    
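    // If checkInbox returns the Gmail API response, attachmentData.data is base64url-encoded:
    // decode it, then save the file...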
  } catch (error) {
    console.error(`Failed to process ${pdfPart.filename}: ${error.message}`);
  }
}
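The loop relies on checkInbox to fetch each attachment, but its implementation is not shown. For reference, here is one possible shape for such a helper, calling the Gmail REST endpoint for users.messages.attachments.get directly with fetch. The name fetchAttachment, the fetchImpl parameter, and the use of the "me" user ID are assumptions for this sketch; it expects a valid OAuth access token with a Gmail read scope and a runtime with a global fetch (Node 18 or later).

```javascript
// Hypothetical stand-in for checkInbox: fetch one attachment body via the Gmail REST API.
// `fetchImpl` defaults to the global fetch and can be swapped out for offline testing.
async function fetchAttachment({ token, messageId, attachmentId, fetchImpl = globalThis.fetch }) {
  const url =
    'https://gmail.googleapis.com/gmail/v1/users/me/messages/' +
    `${encodeURIComponent(messageId)}/attachments/${encodeURIComponent(attachmentId)}`;
  const response = await fetchImpl(url, {
    headers: { Authorization: `Bearer ${token}` },
  });
  if (!response.ok) {
    throw new Error(`Gmail API returned HTTP ${response.status} for attachment ${attachmentId}`);
  }
  // The JSON body is a MessagePartBody ({ attachmentId, size, data }); data is base64url-encoded.
  return response.json();
}

// Offline usage with a fake fetch, so the sketch can be tried without credentials.
const fakeFetch = async () => ({
  ok: true,
  status: 200,
  json: async () => ({ size: 9, data: 'JVBERi0xLjQK' }),
});

fetchAttachment({ token: 'TOKEN', messageId: 'MSG_ID', attachmentId: 'ATT_ID', fetchImpl: fakeFetch })
  .then((body) => console.log(body.size, body.data)); // 9 JVBERi0xLjQK
```

Because it takes the same token, messageId, and attachmentId fields, it could replace the checkInbox call in the loop one-for-one, and an HTTP error ends up in the loop's existing catch block.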

Add Filename Deduplication

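Prevent overwriting files with identical names. A Date.now() prefix alone can repeat when several attachments are saved within the same millisecond, so build the name from the message ID and the part’s partId instead, and strip any directory components from the sender-supplied filename:

const path = require('path');
// basename() drops directory parts such as "../"; message ID + partId is unique per attachment
const filename = path.basename(pdfPart.filename || 'attachment.pdf');
const uniqueFilename = `${email.id}_${pdfPart.partId}_${filename}`;
const filePath = path.join(downloadPath, uniqueFilename);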
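With filePath in hand, the remaining step is turning the attachment data into bytes. The Gmail API encodes attachment bodies as base64url, which Buffer in recent Node.js versions can decode directly with the "base64url" encoding. The decodeGmailData and saveAttachment names are hypothetical, and the "wx" write flag is one way to make an accidental overwrite fail loudly instead of silently replacing an existing file:

```javascript
const fs = require('fs');
const os = require('os');
const path = require('path');

// Gmail returns attachment bytes as base64url (RFC 4648, section 5); Buffer decodes it directly.
function decodeGmailData(data) {
  return Buffer.from(data, 'base64url');
}

// Hypothetical helper: decode and write one attachment, refusing to overwrite an existing file.
function saveAttachment(data, filePath) {
  const bytes = decodeGmailData(data);
  // The 'wx' flag makes writeFileSync throw an EEXIST error instead of overwriting.
  fs.writeFileSync(filePath, bytes, { flag: 'wx' });
  return bytes.length;
}

// Usage: write "%PDF-1.4\n" (base64url "JVBERi0xLjQK") into a fresh temporary directory.
const dir = fs.mkdtempSync(path.join(os.tmpdir(), 'gmail-pdf-'));
const written = saveAttachment('JVBERi0xLjQK', path.join(dir, 'example.pdf'));
console.log(written); // 9
```

In the async loop above you could swap writeFileSync for fs.promises.writeFile with the same flag option to avoid blocking the event loop.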

Final Thoughts

  1. Avoid Recursion for MIME Traversal: Use BFS/DFS with a loop to handle arbitrary nesting depths.
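  2. Validate Attachment Data: A part labeled application/pdf is not guaranteed to carry valid content.
  3. Leverage Libraries: The Gmail API already returns the payload as a tree of parts, and the official googleapis client for Node.js handles authentication and request plumbing, so your own code only has to walk that tree.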

Next Steps:

  • Add support for ZIP/TXT attachments.
  • Integrate email filtering by date/sender.
  • Implement retries for failed downloads.
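For the retry item, a small wrapper with exponential backoff is often enough. This is a generic sketch, not a Gmail-specific API: withRetries, attempts, and baseDelayMs are hypothetical names and defaults, and in practice you would probably retry only transient failures such as HTTP 429 or 5xx responses rather than every error.

```javascript
// Hypothetical helper: retry an async operation with exponential backoff.
// `attempts` and `baseDelayMs` are illustrative defaults, not values taken from the Gmail docs.
async function withRetries(operation, { attempts = 3, baseDelayMs = 500 } = {}) {
  let lastError;
  for (let attempt = 1; attempt <= attempts; attempt++) {
    try {
      return await operation();
    } catch (error) {
      lastError = error;
      if (attempt < attempts) {
        // The delay doubles after each failure: baseDelayMs, 2 * baseDelayMs, 4 * baseDelayMs, ...
        const delay = baseDelayMs * 2 ** (attempt - 1);
        await new Promise((resolve) => setTimeout(resolve, delay));
      }
    }
  }
  // Every attempt failed: surface the last error to the caller.
  throw lastError;
}

// Usage: an operation that fails twice before succeeding.
let calls = 0;
const flaky = async () => {
  calls += 1;
  if (calls < 3) throw new Error(`transient failure ${calls}`);
  return 'ok';
};

withRetries(flaky, { attempts: 3, baseDelayMs: 10 }).then((result) => {
  console.log(result, calls); // ok 3
});
```

In the attachment loop it could wrap the existing fetch, for example: await withRetries(() => checkInbox({ token: accessToken, messageId: email.id, attachmentId: pdfPart.body.attachmentId })).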

By anchoring your code to iterative traversal and robust validation, you’ll reliably extract PDFs from even the most convoluted emails.
