Looking for a hypothetical AWS based architecture

To help frame what we will be working on, imagine encoding.com or renderrocket.com. We will be hosting a SaaS platform that takes some files from users, gets instructions, and then carries out the instructions by applying resource-intensive transformations to the files. Once done, we give the files back to the users either by URL, S3, or FTP.

The reason for the bounty is I want to shorten the time needed for us to get started on this; without knowing the AWS stack exactly, wading through the offerings is a bit overwhelming. We also want to consolidate on the AWS cloud to avoid ending up with a mess of mix-ins from a lot of specialty app service providers.

The basic idea is we run a Node.js application that hosts the SaaS functions:

- A user will go into the system and upload a big file (suppose video)
- We will use something like Fine Uploader to upload straight to S3 so we can keep the upload load off our app
- A message is added to a queue saying the file is uploaded and a job needs to be performed
- Some worker process that watches the queue all the time is now tasked with spinning up a machine and pushing a config file to the local app so it knows where the file is and what to do with it
- The server will process the file and then upload it to S3, FTP, or some other remote file location
- The server notifies some queue or process that it is done and the machine can be killed
- The process that got notified will send the user an email
- The server will post some logs somewhere, either on failure of a job or performance metrics for the job
- Failed jobs will result in the machine not getting killed (it will remain in a paused state); the administrator and the user are emailed about the issue
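The upload-then-enqueue step above could be sketched in Node.js roughly like this; the message fields and queue wiring are assumptions, not a prescribed format:

```javascript
// Hypothetical shape of the "file uploaded, job pending" queue message.
// Field names are assumptions, not a prescribed format.
function buildJobMessage(bucket, key, instructions) {
  return JSON.stringify({
    bucket,                               // S3 bucket the upload landed in
    key,                                  // object key of the uploaded file
    instructions,                         // e.g. { preset: '720p' }
    enqueuedAt: new Date().toISOString()  // useful for job-age metrics later
  });
}

const body = buildJobMessage('uploads', 'videos/raw.mp4', { preset: '720p' });
console.log(body);

// With the AWS SDK, this body would then be enqueued with something like:
//   new AWS.SQS().sendMessage({ QueueUrl: queueUrl, MessageBody: body }).promise()
```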

I have a lot of experience with iron.io for queues and workers, but I really want to look at what AWS has to offer for all of this. The only hard constraint I have here is that each job must spawn a full machine; at this time we can't run more than one job on a machine at a time.

The high-level architecture must leverage some of the best AWS has to offer for managing all of this stuff - we want to drink the Kool-Aid on this one.
The stack must be lean while idle and for the most part scale out as we get "jobs", meaning we pay AWS only when we have jobs, not for a ton of vertical scale while running with no or few jobs.
It must include some sort of system diagram so I can visualize what you're thinking for this system and how the parts communicate with each other.

My initial thought on logging was to use something like Elasticsearch/Logstash/Kibana - I know this isn't exactly the AWS Kool-Aid, but it's what I was thinking might be best for this sort of system. If there is an AWS equivalent, then please pitch it for us.

I know this isn't a typical bounty, so I will award 2 people (primary, and a secondary $50 bonus).

I was thinking that you could use Kubernetes to orchestrate your components, each component being built on top of Docker. This should take care of the scaling based on the workload. Unfortunately, I don't have enough experience with AWS to post a solution.
andijcr over 7 years ago
This was a really interesting read, thank you
apr over 7 years ago
awarded to alixaxel


2 Solutions

Winning solution

Looks like you're trying to reinvent the awesome AWS Lambda. Here's how it can work:

  1. upload files directly to S3
  2. AWS Lambda can listen for events on files that match certain conditions (such as specific buckets)
  3. once a condition is met, map it to a Lambda function written in Node.js
  4. the Node.js function can infer the rules from some sort of convention or fetch them from a remote source (a DB)
  5. once you have the appropriate rule, process the file accordingly and store the output somewhere (S3?)
  6. send an event of completion to a predefined webhook
  7. webhook endpoint is responsible for notifying the user of the processed file

Lambda's pricing model is really cool, and you don't have to worry about scaling, maintenance, or the queue.

This is really cool, I got the Lambda newsletter from Amazon and didn't give it much thought. Very nice to see this use case here fitting like a glove.
julianobs over 7 years ago

You could make use of AWS Elastic Beanstalk.

It is a service for deploying and scaling web applications and services developed with Java, .NET, PHP, Node.js, Python, Ruby, and Docker on familiar servers such as Apache, Nginx, Passenger, and IIS.

You can simply upload your code and Elastic Beanstalk automatically handles the deployment, from capacity provisioning, load balancing, auto-scaling to application health monitoring. At the same time, you retain full control over the AWS resources. If you decide you want to take over some (or all) of the elements of your infrastructure, you can do so seamlessly by using Elastic Beanstalk's management capabilities.

  • you can use S3 to host uploaded files
  • EB has a web server tier to process requests and a worker tier to run background jobs (video processing in your case)
  • EB uses Amazon SQS queues for job control
  • workers get messages from the queue; when a message is processed successfully, the SQS queue is told to remove it; otherwise the message is made available again for another processing attempt
  • there are dedicated dead-letter queues to hold messages that for some reason could not be successfully processed, for further analysis
  • EB has built-in CloudWatch monitoring metrics such as average CPU utilization, request count, and average latency
  • it can send e-mail notifications through Amazon Simple Notification Service (Amazon SNS) when application health changes or application servers are added or removed
  • there's access to server log files without needing to log in to the application servers

Here's a basic diagram:

worker diagram

Elastic Beanstalk details: http://aws.amazon.com/elasticbeanstalk/details/

Architectural Overview: http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/concepts.concepts.architecture.html

About EB environment tiers: http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/using-features-managing-env-tiers.html

Info on deployment of Node.js apps: http://docs.aws.amazon.com/elasticbeanstalk/latest/dg/create_deploy_nodejs.html

On Auto-scaling:

Dynamic scaling: http://docs.aws.amazon.com/AutoScaling/latest/DeveloperGuide/as-scale-based-on-demand.html

Also, about your constraint of one machine per job:

You could use on-demand spawning of machines, as described here: http://artsy.github.io/blog/2012/07/10/on-demand-jenkins-slaves-with-amazon-ec2/

I don't know how good it can be in terms of responsiveness.
Maybe you should have a look at using Docker with EB.
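For that one-machine-per-job constraint, here's a sketch of the request a queue watcher could build before launching a dedicated EC2 instance per job; the AMI id, instance type, and job fields are hypothetical placeholders:

```javascript
// Sketch of launching one dedicated EC2 machine per job, matching the
// one-job-per-machine constraint. All concrete values are placeholders.
function runInstanceParams(job) {
  return {
    ImageId: 'ami-12345678',      // hypothetical worker AMI with the app baked in
    InstanceType: 'c4.xlarge',
    MinCount: 1,
    MaxCount: 1,                  // hard constraint: one job per machine
    // Pass the job config via user data so the instance's local app knows
    // where the file is and what to do with it.
    UserData: Buffer.from(JSON.stringify(job)).toString('base64'),
    // Let the instance shut itself down (and terminate) when the job is done.
    InstanceInitiatedShutdownBehavior: 'terminate'
  };
}

const params = runInstanceParams({ bucket: 'uploads', key: 'videos/raw.mp4' });
console.log(params.InstanceType);

// With the AWS SDK, the queue watcher would then call something like:
//   new AWS.EC2().runInstances(params).promise()
```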

Here's some info:
