Track Your S3 Uploads Like a Pro with Lambda and DynamoDB
Introduction
Managing large-scale data uploads to S3 often comes with an overlooked challenge: tracking the files that were uploaded and their details.
In a recent project, we realized that identifying which files had been uploaded and when was slowing us down. Instead of manually searching logs or relying on external tracking mechanisms, we decided to automate the process.
In this article, we’ll walk through how to build a serverless solution that automatically captures metadata about files uploaded to S3 (such as file name, size, and upload time) and logs it into a DynamoDB table. This keeps all your metadata at your fingertips, without the manual effort.
We’ll show you how to do this in the following steps:
- Step 1: Create an S3 bucket
- Step 2: Create a DynamoDB table
- Step 3: Create a Lambda function
- Step 4: Create an S3 Event Notification
- Step 5: Test the solution
Architecture
Here’s an architecture diagram of what we’ll implement: a file uploaded to the S3 bucket emits an event notification, which invokes a Lambda function that writes the file’s metadata to a DynamoDB table.
Steps:
Step 1: Create an S3 bucket
From the AWS Console, navigate to Amazon S3 and select “Create bucket”. For the bucket name, enter “my-file-upload-bucket-883783773” (bucket names are globally unique, so use your own suffix) and leave the default configuration settings as they are.
> Result:
A new S3 bucket is created to store all incoming files. We’ll discuss the S3 event notification later in this article.
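If you prefer to script this step, here is a minimal boto3 sketch that creates the same bucket. The region is an assumption for this example, so adjust it to your own:

```python
import boto3

# Region is an assumption for this sketch; change it to the one you use.
region = 'eu-west-1'
s3 = boto3.client('s3', region_name=region)

bucket_name = 'my-file-upload-bucket-883783773'  # must be globally unique

# Outside us-east-1 the bucket's region has to be stated explicitly;
# in us-east-1, omit CreateBucketConfiguration entirely.
s3.create_bucket(
    Bucket=bucket_name,
    CreateBucketConfiguration={'LocationConstraint': region}
)
```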
Step 2: Create a DynamoDB table
DynamoDB is a NoSQL database built for scalability, low-latency performance, and smooth integration with other AWS services. In this article, we’ll create a table called “S3FileMetadata” to store file metadata. The table will use fileID (String) as the partition key to uniquely identify each record.
From the AWS Console, navigate to the DynamoDB service and click on the ‘Create table’ button.
Next, input the table name “S3FileMetadata” and set the partition key to fileID (string). Leave the remaining settings as default.
> Result: ‘S3FileMetadata’ table created successfully.
Clicking on the table reveals that it is currently empty, with no items returned.
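For completeness, the same table can be created programmatically. This is a sketch that assumes on-demand (pay-per-request) billing, which is close enough to the console defaults for this demo:

```python
import boto3

dynamodb = boto3.client('dynamodb')

# Single partition key "fileID" of type String, as configured in the console above.
dynamodb.create_table(
    TableName='S3FileMetadata',
    KeySchema=[{'AttributeName': 'fileID', 'KeyType': 'HASH'}],
    AttributeDefinitions=[{'AttributeName': 'fileID', 'AttributeType': 'S'}],
    BillingMode='PAY_PER_REQUEST'  # assumption: on-demand capacity mode
)

# Block until the table is ACTIVE before writing to it.
dynamodb.get_waiter('table_exists').wait(TableName='S3FileMetadata')
```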
Step 3: Create a Lambda function
In this step, we will create a Lambda function that serves as the intermediary, processing events and writing data to DynamoDB. The function captures S3 event notifications, extracts metadata like file ID, file name, upload timestamp, and file size, and stores this information in DynamoDB.
Go to the Lambda service in the AWS Management Console, then click the “Create function” button and select “Author from scratch.”
Use “metadata-logger-lambda” as the function name, select a recent Python runtime (the handler below is written in Python), and keep the remaining defaults as shown below.
Next, configure the Lambda’s IAM role to allow reading from S3 and uploading records to DynamoDB, all while adhering to the principle of least privilege.
Click on “Permissions” and select “Create a new role from AWS policy templates.” For the role name, enter “metadata-lambda-execution-role” and choose the following policy templates:
- Simple microservice permissions
- Amazon S3 object read-only permissions
Finally, click the “Create function” button.
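If you would rather see the permissions spelled out than rely on the console templates, the inline policy below is an illustrative equivalent of what this solution actually needs (read objects from the bucket, put items into the table, write CloudWatch logs). It is a sketch, not the exact content of the templates, and the policy name is made up for this example:

```python
import json
import boto3

iam = boto3.client('iam')

# Illustrative least-privilege policy for this solution; in a real deployment,
# replace the wildcards in the DynamoDB ARN with your region and account ID.
policy = {
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": ["s3:GetObject"],
            "Resource": "arn:aws:s3:::my-file-upload-bucket-883783773/*"
        },
        {
            "Effect": "Allow",
            "Action": ["dynamodb:PutItem"],
            "Resource": "arn:aws:dynamodb:*:*:table/S3FileMetadata"
        },
        {
            "Effect": "Allow",
            "Action": ["logs:CreateLogGroup", "logs:CreateLogStream", "logs:PutLogEvents"],
            "Resource": "*"
        }
    ]
}

# "metadata-lambda-inline-policy" is a hypothetical name used only for this sketch.
iam.put_role_policy(
    RoleName='metadata-lambda-execution-role',
    PolicyName='metadata-lambda-inline-policy',
    PolicyDocument=json.dumps(policy)
)
```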
> Lambda’s code: metadata-logger-lambda.py
import urllib.parse
from datetime import datetime

import boto3

s3 = boto3.client('s3')
dynamodb = boto3.resource('dynamodb')
table = dynamodb.Table('S3FileMetadata')


def lambda_handler(event, context):
    for record in event['Records']:
        bucket = record['s3']['bucket']['name']
        # Object keys arrive URL-encoded in S3 event notifications
        key = urllib.parse.unquote_plus(record['s3']['object']['key'])

        # Retrieve file metadata from S3
        response = s3.head_object(Bucket=bucket, Key=key)

        metadata = {
            'fileID': key,
            'fileName': key.split('/')[-1],
            'fileSize': response['ContentLength'],
            # Processing time; record['eventTime'] holds the actual upload time
            'uploadTimestamp': datetime.now().isoformat()
        }

        # Store metadata in DynamoDB
        table.put_item(Item=metadata)

    return {
        'statusCode': 200,
        'body': 'Metadata logged successfully!'
    }
> Result:
Your Lambda function should appear like this:
With this function in place, our solution can handle real-time metadata logging and ensure reliable tracking of S3 uploads.
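Before wiring up the trigger, you can sanity-check the handler with a hand-made event. The payload below mimics the shape of an S3 notification, containing only the fields the code parses; paste its JSON equivalent into the Lambda console’s Test tab, or call the handler locally. The bucket and key are placeholders and must point to an object that really exists, because the handler calls head_object:

```python
# Minimal event shaped like an S3 "object created" notification (only the fields
# the handler reads). Bucket and key are assumptions: use a real, existing object.
sample_event = {
    "Records": [
        {
            "s3": {
                "bucket": {"name": "my-file-upload-bucket-883783773"},
                "object": {"key": "test_metadata_solution.txt"}
            }
        }
    ]
}

# With AWS credentials configured, the handler can be invoked directly:
# lambda_handler(sample_event, None)
```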
Step 4: Create an S3 Event Notification
In the “Function overview” section, click on “Add trigger.”
Next, select S3 as the trigger and provide the following details:
- Bucket: my-file-upload-bucket-883783773
- Event types: “All object create events”
> Result:
The S3 Event Notification trigger was successfully added, and the Lambda function is now ready to receive events from S3.
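The console handles this wiring for you, including the permission that allows S3 to invoke the function. If you ever need to reproduce it in code, a rough boto3 sketch looks like this (the statement ID is made up, and the function ARN is looked up from the function name):

```python
import boto3

s3 = boto3.client('s3')
lambda_client = boto3.client('lambda')

bucket = 'my-file-upload-bucket-883783773'
function_arn = lambda_client.get_function(
    FunctionName='metadata-logger-lambda')['Configuration']['FunctionArn']

# Allow S3 to invoke the function ("s3-invoke-metadata-logger" is a made-up statement ID).
lambda_client.add_permission(
    FunctionName='metadata-logger-lambda',
    StatementId='s3-invoke-metadata-logger',
    Action='lambda:InvokeFunction',
    Principal='s3.amazonaws.com',
    SourceArn=f'arn:aws:s3:::{bucket}'
)

# Send every object-created event in the bucket to the function.
s3.put_bucket_notification_configuration(
    Bucket=bucket,
    NotificationConfiguration={
        'LambdaFunctionConfigurations': [
            {
                'LambdaFunctionArn': function_arn,
                'Events': ['s3:ObjectCreated:*']
            }
        ]
    }
)
```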
Step 5: Testing the solution
Uploading a new file to S3 triggers the Lambda function, which extracts the metadata and writes it to the DynamoDB table. Finally, we will scan the DynamoDB table to verify that the solution works end to end.
For troubleshooting, it is highly recommended to check the Lambda function’s logs in CloudWatch Logs.
- Upload a new file to S3
> Create a new file named “test_metadata_solution.txt” with the content below and upload it to the bucket:
This is a file to test the metadata solution
- Scan the DynamoDB table for new items (a scripted version of both steps is sketched after this list).
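The same test can be scripted end to end. This sketch uploads the test file and then scans the table; the short sleep simply gives the asynchronous Lambda invocation time to finish:

```python
import time
import boto3

s3 = boto3.client('s3')
table = boto3.resource('dynamodb').Table('S3FileMetadata')

bucket = 'my-file-upload-bucket-883783773'
key = 'test_metadata_solution.txt'

# Uploading the object fires the "All object create events" notification.
s3.put_object(Bucket=bucket, Key=key,
              Body=b'This is a file to test the metadata solution')

# Give the asynchronous Lambda invocation a few seconds to run.
time.sleep(5)

# The new item should now show up in a scan.
for item in table.scan()['Items']:
    print(item)
```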
Conclusion
Tracking your S3 uploads effectively is essential for maintaining robust data pipelines and ensuring reliable metadata management. By combining the power of AWS Lambda and DynamoDB, you can seamlessly process S3 event notifications, extract critical metadata, and store it for future analysis. This solution demonstrates how to leverage serverless architecture to handle real-time events while adhering to best practices like the principle of least privilege and efficient resource configuration. With this setup, you can monitor and log S3 uploads like a pro, ensuring your system remains scalable, reliable, and cost-effective.