Hi, I have 3 different files (2 CSVs and 1 JSON) with student transcript grades from different schools.
The first school's data comes from a CSV with the following structure:
| firstname | lastname | topic | mark |
|---|---|---|---|
| Mark | Johnson | Math | A+ |
| John | Fisher | Art | B- |
The second school has a CSV file with the structure below:
| name | topic | mark |
|---|---|---|
| Peter | Music | A+ |
| Mary | Art | B- |
Finally, the third school's data is a JSON file with the structure below:
[
  {
    "firstname": "Peter",
    "lastname": "McCkaulay",
    "subject": "Mathematics",
    "grade": 49
  },
  {
    "first_name": "Mary",
    "last_name": "Jane",
    "subject": "Physics",
    "grade": ""
  },
  {
    "first_name": "Joseph",
    "last_name": "Brighton",
    "subject": "Soc. Studies",
    "grade": 89
  }
]
Can anyone please give me some recommendations on how to build an efficient ETL process on AWS that will let me process the data from all 3 schools and load it into an Amazon RDS database (PostgreSQL, MySQL, etc.) so I can run some analysis on it?
I know I could achieve this by loading the 3 files into S3, then creating a Lambda function to load the data into DynamoDB, and then loading that into RDS. Is that the best option, though?
Any help is appreciated.
You can build this workflow with AWS Step Functions, which can orchestrate the ETL operations on the data you describe. (If a data set is large enough that a Lambda function would time out while processing it, look at AWS Glue instead. However, given your use case and the data you describe, I doubt that is the case here, so Lambda should work.)
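To make the workflow idea more concrete, here is a rough sketch of a three-step state machine definition in Amazon States Language, built as a Python dict. The state names and the Lambda ARNs (account ID, region, function names) are hypothetical placeholders, not values from your setup, and a sequential extract/transform/load layout is only one possible design.

```python
import json

# Hypothetical Lambda ARNs: replace with the ARNs of your own functions.
EXTRACT_ARN = "arn:aws:lambda:us-east-1:123456789012:function:extract-grades"
TRANSFORM_ARN = "arn:aws:lambda:us-east-1:123456789012:function:normalize-grades"
LOAD_ARN = "arn:aws:lambda:us-east-1:123456789012:function:load-grades"


def build_state_machine_definition():
    """Return a sequential extract -> transform -> load workflow definition."""
    return {
        "Comment": "ETL sketch for school grade files (state names are assumptions)",
        "StartAt": "ExtractFromS3",
        "States": {
            # Each Task state invokes one Lambda function; its output
            # becomes the input of the state named in "Next".
            "ExtractFromS3": {
                "Type": "Task",
                "Resource": EXTRACT_ARN,
                "Next": "NormalizeRecords",
            },
            "NormalizeRecords": {
                "Type": "Task",
                "Resource": TRANSFORM_ARN,
                "Next": "LoadIntoRds",
            },
            # The last state ends the execution instead of naming a successor.
            "LoadIntoRds": {
                "Type": "Task",
                "Resource": LOAD_ARN,
                "End": True,
            },
        },
    }


# Serialize to the JSON text that Step Functions expects as a definition.
definition_json = json.dumps(build_state_machine_definition(), indent=2)
```

The resulting JSON string could then be pasted into the Step Functions console or passed as the `definition` argument when creating the state machine through the AWS SDK.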
You can use Lambda functions to perform the data operations and the AWS SDK to invoke AWS Service operations to meet your business requirements.
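As an illustration of the kind of data operation such a Lambda function might perform, the sketch below maps the three layouts from the question onto one common record shape. The target field names (`school`, `first_name`, `last_name`, `subject`, `grade`) and the helper names are assumptions made for this example. Grades are kept as text because the sources mix letter grades, numeric grades, and empty values; how to reconcile them is a decision for your analysis.

```python
import csv
import io
import json


def _clean(value):
    """Return a stripped string, or None for missing or empty values."""
    if value is None:
        return None
    text = str(value).strip()
    return text or None


def normalize_school1(csv_text):
    """School 1 CSV columns: firstname, lastname, topic, mark."""
    return [
        {
            "school": 1,
            "first_name": _clean(row.get("firstname")),
            "last_name": _clean(row.get("lastname")),
            "subject": _clean(row.get("topic")),
            "grade": _clean(row.get("mark")),
        }
        for row in csv.DictReader(io.StringIO(csv_text))
    ]


def normalize_school2(csv_text):
    """School 2 CSV columns: name, topic, mark (no last name in the source)."""
    return [
        {
            "school": 2,
            "first_name": _clean(row.get("name")),
            "last_name": None,
            "subject": _clean(row.get("topic")),
            "grade": _clean(row.get("mark")),
        }
        for row in csv.DictReader(io.StringIO(csv_text))
    ]


def normalize_school3(json_text):
    """School 3 JSON: accepts both firstname/first_name key spellings."""
    records = []
    for item in json.loads(json_text):
        records.append(
            {
                "school": 3,
                "first_name": _clean(item.get("first_name", item.get("firstname"))),
                "last_name": _clean(item.get("last_name", item.get("lastname"))),
                "subject": _clean(item.get("subject")),
                # Numeric grades become text; empty strings become None.
                "grade": _clean(item.get("grade")),
            }
        )
    return records
```

Once every source produces the same record shape, a single load step can insert the rows into one RDS table regardless of which school they came from.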
As an example of how to use Lambda and AWS Step Functions for this use case, see this AWS tutorial. It shows a similar use case that reads an Excel document located in an Amazon S3 bucket, extracts the data, and puts it into an Amazon DynamoDB table.

This AWS tutorial is implemented with the AWS SDK for Java; however, you can write the Lambda functions in any of the supported programming languages. It should point you in the right direction.
Creating an ETL workflow by using AWS Step Functions and the AWS SDK for Java