Running an AWS Glue Job in Scala Locally and Connecting to Redshift
While I was developing a data warehouse project, I found it hard to write code in the AWS Glue UI, so I started looking for a way to write it locally. The solution I found was to add the AWS Glue dependencies to the Scala data warehouse project, which let me call the AWS Glue functions locally.
This makes it much easier to write the code on my own machine :)
Today, I’m going to show you how I set it up and used it.
Here are the topics, in order:
- Create a new project in IntelliJ IDEA
- Configure dependencies & modules
- Use AWS Glue Functions
Create a new project in IntelliJ IDEA
Go to File -> New -> Project and follow the steps below.
After creating the project, right-click the project name -> New -> Module and add a new module.
Configure dependencies & modules
Add the following Gradle file to the main project and reload the Gradle project.
Add the following Gradle file to the job_scripts module and reload the Gradle project again.
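As a rough guide, here is a minimal sketch of what the job_scripts build file might contain. The versions below are assumptions (Glue 3.0 with Scala 2.12 and a 2.x Redshift JDBC driver), so adjust them to the Glue version you actually target. The Glue ETL library is not on Maven Central, so the AWS artifact repository has to be added explicitly.

```groovy
// job_scripts/build.gradle (illustrative sketch; all versions are assumptions)
plugins {
    id 'scala'
}

repositories {
    mavenCentral()
    // AWS publishes the Glue ETL library in its own Maven repository
    maven { url 'https://aws-glue-etl-artifacts.s3.amazonaws.com/release/' }
}

dependencies {
    // Scala standard library matching the assumed Glue version
    implementation 'org.scala-lang:scala-library:2.12.15'
    // AWS Glue ETL library (GlueContext, DynamicFrame, ...)
    implementation 'com.amazonaws:AWSGlueETL:3.0.0'
    // Amazon Redshift JDBC driver
    implementation 'com.amazon.redshift:redshift-jdbc42:2.1.0.9'
}
```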
We added the Redshift, AWS Glue, and Scala libraries to our project, so we can now use AWS Glue functions and connect to Redshift.
As you can see, we don’t have a scala directory yet.
Right-click on main -> New -> Directory
Choose the scala directory.
Use AWS Glue Functions
As you may remember, I wrote about how to create a crawler, transform data, and so on.
Now let’s assume we have a database like the one I created with an AWS Glue Crawler in my old post.
Let’s read the code.
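A minimal sketch of such a job could look like the following. The database, table, connection, and bucket names (`my_glue_database`, `my_table`, `my_redshift_connection`, `s3://my-temp-bucket/...`) are hypothetical placeholders, and running it locally assumes AWS credentials with access to the Glue Data Catalog, S3, and Redshift.

```scala
import com.amazonaws.services.glue.GlueContext
import com.amazonaws.services.glue.util.JsonOptions
import org.apache.spark.{SparkConf, SparkContext}

object GlueToRedshiftJob {
  def main(args: Array[String]): Unit = {
    // Run Spark in local mode so the job can be started from the IDE.
    val conf = new SparkConf().setAppName("glue-local-job").setMaster("local[*]")
    val sc = new SparkContext(conf)
    val glueContext = new GlueContext(sc)

    // Read the table that the crawler registered in the Glue Data Catalog.
    // Database and table names are placeholders.
    val source = glueContext
      .getCatalogSource(database = "my_glue_database", tableName = "my_table")
      .getDynamicFrame()

    // Write the data to Redshift through a Glue catalog connection.
    // Connection name, target table, database, and temp dir are placeholders.
    glueContext
      .getJDBCSink(
        catalogConnection = "my_redshift_connection",
        options = JsonOptions("""{"dbtable": "public.my_table", "database": "dev"}"""),
        redshiftTmpDir = "s3://my-temp-bucket/redshift-tmp/"
      )
      .writeDynamicFrame(source)

    sc.stop()
  }
}
```

The `setMaster("local[*]")` call is only there for local runs; on the Glue service the master is provided by the environment.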
As you can see in the code, we can:
- use Glue functions
- get the data from a Glue database table
- load the data into Redshift.
If you want to connect to Redshift with a JDBC connection instead:
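One way to do this is to write a Spark DataFrame through Spark's generic JDBC writer. This is a sketch under assumptions, not necessarily the setup used in the project: the cluster URL is a placeholder, the credentials are read from hypothetical environment variables `REDSHIFT_USER` and `REDSHIFT_PASSWORD`, and the driver class name assumes the 2.x Redshift JDBC driver is on the classpath.

```scala
import java.util.Properties
import org.apache.spark.sql.{DataFrame, SaveMode}

object RedshiftJdbcWriter {
  // Placeholder endpoint: replace with your cluster host, port, and database.
  private val url =
    "jdbc:redshift://my-cluster.example.us-east-1.redshift.amazonaws.com:5439/dev"

  /** Appends the rows of `df` to the given Redshift table over plain JDBC. */
  def write(df: DataFrame, table: String): Unit = {
    val props = new Properties()
    // Credentials come from environment variables (hypothetical names).
    props.setProperty("user", sys.env("REDSHIFT_USER"))
    props.setProperty("password", sys.env("REDSHIFT_PASSWORD"))
    // Driver class of the 2.x Redshift JDBC driver (assumed version).
    props.setProperty("driver", "com.amazon.redshift.Driver")

    df.write.mode(SaveMode.Append).jdbc(url, table, props)
  }
}
```

A Glue `DynamicFrame` can be converted with `toDF()` before being passed to `write`. Plain JDBC inserts are fine for small tables; for large loads, a sink that stages the data in S3 is usually the better fit.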