Track Your S3 Uploads Like a Pro with Lambda and DynamoDB
Introduction
Managing large-scale data uploads to S3 often comes with an overlooked challenge: tracking the files that were uploaded and their details.
In a recent project, we realized that identifying which files had been uploaded and when was slowing us down. Instead of manually searching logs or relying on external tracking mechanisms, we decided to automate the process.
In this article, we’ll walk through how to build a serverless solution that automatically captures metadata about files uploaded to S3 (such as file name, size, and upload time) and logs it into a DynamoDB table. This keeps all your metadata at your fingertips, without the manual effort.
We’ll show you how to do this in the following steps:
- Step 1: Create an S3 bucket
- Step 2: Create a DynamoDB table
- Step 3: Create a Lambda function
- Step 4: Create an S3 Event Notification
- Step 5: Test the solution
Architecture
Here’s an architecture diagram of what we’ll implement: a file uploaded to the S3 bucket emits an event notification, which invokes a Lambda function that writes the file’s metadata to a DynamoDB table.
Steps:
Step 1: Create an S3 bucket
From the AWS Console, navigate to Amazon S3 and select “Create bucket”. For the bucket name, enter “my-file-upload-bucket-883783773” (bucket names are globally unique, so use your own suffix) and leave the default configuration settings as they are.
> Result:
A new S3 bucket is created to store all incoming files. We’ll discuss the S3 event notification later in this article.
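If you prefer to script this step, here is a minimal boto3 sketch that creates the same bucket. The region is an assumption for this example, so adjust it to your own:

```python
import boto3

# Region is an assumption for this sketch; change it to the one you use.
region = 'eu-west-1'
s3 = boto3.client('s3', region_name=region)

bucket_name = 'my-file-upload-bucket-883783773'  # must be globally unique

# Outside us-east-1 the bucket's region has to be stated explicitly;
# in us-east-1, omit CreateBucketConfiguration entirely.
s3.create_bucket(
    Bucket=bucket_name,
    CreateBucketConfiguration={'LocationConstraint': region}
)
```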
Step 2: Create a DynamoDB table
DynamoDB is a NoSQL database built for scalability, low-latency performance, and smooth integration with other AWS services. In this article, we’ll create a table called “S3FileMetadata” to store file metadata. The table will use fileID (String) as the partition key to uniquely identify each record.
From the AWS Console, navigate to the DynamoDB service and click on the ‘Create table’ button.
Next, input the table name “S3FileMetadata” and set the partition key to fileID (string). Leave the remaining settings as default.
> Result: ‘S3FileMetadata’ table created successfully.
Clicking on the table reveals that it is currently empty, with no items returned.
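For completeness, the same table can be created programmatically. This is a sketch that assumes on-demand (pay-per-request) billing, which is close enough to the console defaults for this demo:

```python
import boto3

dynamodb = boto3.client('dynamodb')

# Single partition key "fileID" of type String, as configured in the console above.
dynamodb.create_table(
    TableName='S3FileMetadata',
    KeySchema=[{'AttributeName': 'fileID', 'KeyType': 'HASH'}],
    AttributeDefinitions=[{'AttributeName': 'fileID', 'AttributeType': 'S'}],
    BillingMode='PAY_PER_REQUEST'  # assumption: on-demand capacity mode
)

# Block until the table is ACTIVE before writing to it.
dynamodb.get_waiter('table_exists').wait(TableName='S3FileMetadata')
```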
Step 3: Create a Lambda function
In this step, we will create a Lambda function that serves as the intermediary, processing events and writing data to DynamoDB. The function captures S3 event notifications, extracts metadata like file ID, file name, upload timestamp, and file size, and stores this information in DynamoDB.
Go to the Lambda service in the AWS Management Console, then click the “Create function” button and select “Author from scratch.”
Use “metadata-logger-lambda” as the function name, select a recent Python runtime (the handler below is written in Python), and keep the remaining defaults as shown below.
Next, configure the Lambda’s IAM role to allow reading from S3 and uploading records to DynamoDB, all while adhering to the principle of least privilege.
Click on “Permissions” and select “Create a new role from AWS policy templates.” For the role name, enter “metadata-lambda-execution-role” and choose the following policy templates:
- Simple microservice permissions
- Amazon S3 object read-only permissions
Finally, click the “Create function” button.
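If you would rather see the permissions spelled out than rely on the console templates, the inline policy below is an illustrative equivalent of what this solution actually needs (read objects from the bucket, put items into the table, write CloudWatch logs). It is a sketch, not the exact content of the templates, and the policy name is made up for this example:

```python
import json
import boto3

iam = boto3.client('iam')

# Illustrative least-privilege policy for this solution; in a real deployment,
# replace the wildcards in the DynamoDB ARN with your region and account ID.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::my-file-upload-bucket-883783773/*"
        },
        {
            "Effect": "Allow",
            "Action": ["dynamodb:PutItem"],
            "Resource": "arn:aws:dynamodb:*:*:table/S3FileMetadata"
        },
        {
            "Effect": "Allow",
            "Action": ["logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents"],
            "Resource": "*"
        }
    ]
}

# "metadata-lambda-inline-policy" is a hypothetical name used only for this sketch.
iam.put_role_policy(
    RoleName='metadata-lambda-execution-role',
    PolicyName='metadata-lambda-inline-policy',
    PolicyDocument=json.dumps(policy)
)
```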
> Lambda’s code: metadata-logger-lambda.py
import urllib.parse
from datetime import datetime

import boto3

s3 = boto3.client('s3')
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('S3FileMetadata')


def lambda_handler(event, context):
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        # Object keys arrive URL-encoded in S3 event notifications
        key = urllib.parse.unquote_plus(record['s3']['object']['key'])

        # Retrieve file metadata from S3
        response = s3.head_object(Bucket=bucket, Key=key)

        metadata = {
            'fileID': key,
            'fileName': key.split('/')[-1],
            'fileSize': response['ContentLength'],
            # Processing time; record['eventTime'] holds the actual upload time
            'uploadTimestamp': datetime.now().isoformat()
        }

        # Store metadata in DynamoDB
        table.put_item(Item=metadata)

    return {
        'statusCode': 200,
        'body': 'Metadata logged successfully!'
    }
> Result:
Your Lambda function should appear like this:
With this function in place, our solution can handle real-time metadata logging and ensure reliable tracking of S3 uploads.
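Before wiring up the trigger, you can sanity-check the handler with a hand-made event. The payload below mimics the shape of an S3 notification, containing only the fields the code parses; paste its JSON equivalent into the Lambda console’s Test tab, or call the handler locally. The bucket and key are placeholders and must point to an object that really exists, because the handler calls head_object:

```python
# Minimal event shaped like an S3 "object created" notification (only the fields
# the handler reads). Bucket and key are assumptions: use a real, existing object.
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "my-file-upload-bucket-883783773"},
                "object": {"key": "test_metadata_solution.txt"}
            }
        }
    ]
}

# With AWS credentials configured, the handler can be invoked directly:
# lambda_handler(sample_event, None)
```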
Step 4: Create an S3 Event Notification
In the “Function overview” section, click on “Add trigger.”
Next, select S3 as the trigger and provide the following details:
- Bucket: my-file-upload-bucket-883783773
- Event types: “All object create events”
> Result:
The S3 Event Notification trigger was successfully added, and the Lambda function is now ready to receive events from S3.
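The console handles this wiring for you, including the permission that allows S3 to invoke the function. If you ever need to reproduce it in code, a rough boto3 sketch looks like this (the statement ID is made up, and the function ARN is looked up from the function name):

```python
import boto3

s3 = boto3.client('s3')
lambda_client = boto3.client('lambda')

bucket = 'my-file-upload-bucket-883783773'
function_arn = lambda_client.get_function(
    FunctionName='metadata-logger-lambda')['Configuration']['FunctionArn']

# Allow S3 to invoke the function ("s3-invoke-metadata-logger" is a made-up statement ID).
lambda_client.add_permission(
    FunctionName='metadata-logger-lambda',
    StatementId='s3-invoke-metadata-logger',
    Action='lambda:InvokeFunction',
    Principal='s3.amazonaws.com',
    SourceArn=f'arn:aws:s3:::{bucket}'
)

# Send every object-created event in the bucket to the function.
s3.put_bucket_notification_configuration(
    Bucket=bucket,
    NotificationConfiguration={
        'LambdaFunctionConfigurations': [
            {
                'LambdaFunctionArn': function_arn,
                'Events': ['s3:ObjectCreated:*']
            }
        ]
    }
)
```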
Step 5: Testing the solution
Uploading a new file to S3 triggers the Lambda function, which extracts the metadata and writes it to the DynamoDB table. Finally, we will scan the DynamoDB table to verify that the solution works end to end.
For troubleshooting, it is highly recommended to check the Lambda function’s logs in CloudWatch Logs.
- Upload a new file to S3
> Create a new file named “test_metadata_solution.txt” with the content below and upload it to the bucket:
This is a file to test the metadata solution
- Scan the DynamoDB table for new items (a scripted version of both steps is sketched after this list).
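The same test can be scripted end to end. This sketch uploads the test file and then scans the table; the short sleep simply gives the asynchronous Lambda invocation time to finish:

```python
import time
import boto3

s3 = boto3.client('s3')
table = boto3.resource('dynamodb').Table('S3FileMetadata')

bucket = 'my-file-upload-bucket-883783773'
key = 'test_metadata_solution.txt'

# Uploading the object fires the "All object create events" notification.
s3.put_object(Bucket=bucket, Key=key,
              Body=b'This is a file to test the metadata solution')

# Give the asynchronous Lambda invocation a few seconds to run.
time.sleep(5)

# The new item should now show up in a scan.
for item in table.scan()['Items']:
    print(item)
```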
Conclusion
Tracking your S3 uploads effectively is essential for maintaining robust data pipelines and ensuring reliable metadata management. By combining the power of AWS Lambda and DynamoDB, you can seamlessly process S3 event notifications, extract critical metadata, and store it for future analysis. This solution demonstrates how to leverage serverless architecture to handle real-time events while adhering to best practices like the principle of least privilege and efficient resource configuration. With this setup, you can monitor and log S3 uploads like a pro, ensuring your system remains scalable, reliable, and cost-effective.