AWS Glue is a fully managed ETL (extract, transform, and load) service to catalog your data, clean it, enrich it, and move it reliably between various data stores. AWS Glue ETL jobs can interact with a variety of data sources inside and outside of the AWS environment. For optimal operation in a hybrid environment, AWS Glue might require additional network, firewall, or DNS configuration.

In this post, I describe a solution for transforming and moving data from an on-premises data store to Amazon S3 using AWS Glue that simulates a common data lake ingestion pipeline. AWS Glue can connect to Amazon S3 and data stores in a virtual private cloud (VPC) such as Amazon RDS, Amazon Redshift, or a database running on Amazon EC2. For more information, see Adding a Connection to Your Data Store. AWS Glue can also connect to a variety of on-premises JDBC data stores such as PostgreSQL, MySQL, Oracle, Microsoft SQL Server, and MariaDB.

AWS Glue ETL jobs can use Amazon S3, data stores in a VPC, or on-premises JDBC data stores as a source. AWS Glue jobs extract data, transform it, and load the resulting data back to S3, data stores in a VPC, or on-premises JDBC data stores as a target.

The following diagram shows the architecture of using AWS Glue in a hybrid environment, as described in this post. The solution uses JDBC connectivity using the elastic network interfaces (ENIs) in the Amazon VPC. The ENIs in the VPC help connect to the on-premises database server over a virtual private network (VPN) or AWS Direct Connect (DX).

The solution architecture illustrated in the diagram works as follows:

- Network connectivity exists between the Amazon VPC and the on-premises network using a virtual private network (VPN) or AWS Direct Connect (DX).
- The JDBC connection defines parameters for a data store, for example, the JDBC connection to the PostgreSQL server running on an on-premises network.
- AWS Glue creates elastic network interfaces (ENIs) in a VPC/private subnet. These network interfaces then provide network connectivity for AWS Glue through your VPC.
- Security groups attached to ENIs are configured by the selected JDBC connection.
- The number of ENIs depends on the number of data processing units (DPUs) selected for an AWS Glue ETL job. AWS Glue DPU instances communicate with each other and with your JDBC-compliant database using ENIs.
- AWS Glue can communicate with an on-premises data store over VPN or DX connectivity. Elastic network interfaces can access an EC2 database instance or an RDS instance in the same or different subnet using VPC-level routing. ENIs can also access a database instance in a different VPC within the same AWS Region or another Region using VPC peering.
- AWS Glue uses Amazon S3 to store ETL scripts and temporary files. S3 can also be a source and a target for the transformed data. Amazon S3 VPC endpoints (VPCe) provide access to S3, as described in Amazon VPC Endpoints for Amazon S3.
- Optionally, a NAT gateway or NAT instance set up in a public subnet provides access to the internet if an AWS Glue ETL job requires either access to AWS services using a public API or outgoing internet access.
- An AWS Glue crawler uses an S3 or JDBC connection to catalog the data source, and the AWS Glue ETL job uses S3 or JDBC connections as a source or target data store.

The following walkthrough first demonstrates the steps to prepare a JDBC connection for an on-premises data store. Then it shows how to perform ETL operations on sample data by using a JDBC connection with AWS Glue.

Prepare the JDBC connection for an on-premises data store

Follow these steps to set up the JDBC connection.

Step 1: Create a security group for AWS Glue ENIs in your VPC

To allow AWS Glue to communicate with its components, specify a security group with a self-referencing inbound rule for all TCP ports. It enables unfettered communication between the ENIs within a VPC/subnet and prevents incoming network access from other, unspecified sources. In this example, we call this security group glue-security-group. The security group attaches to AWS Glue elastic network interfaces in a specified VPC/subnet. By default, the security group allows all outbound traffic and is sufficient for AWS Glue requirements.
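
The self-referencing inbound rule for glue-security-group can also be created programmatically. The following is a minimal sketch, not part of the original walkthrough, assuming the boto3 SDK is installed and AWS credentials are configured. The helper names self_referencing_tcp_rule and create_glue_security_group, and the description string, are hypothetical and chosen only for illustration. The rule opens all TCP ports, but only to members of the same security group.

```python
from __future__ import annotations

from typing import Any


def self_referencing_tcp_rule(group_id: str) -> list[dict[str, Any]]:
    """Build an inbound rule that allows all TCP ports from the group itself."""
    return [
        {
            "IpProtocol": "tcp",
            "FromPort": 0,
            "ToPort": 65535,
            # The source is the security group itself, not a CIDR range,
            # so only ENIs that carry this group can reach each other.
            "UserIdGroupPairs": [{"GroupId": group_id}],
        }
    ]


def create_glue_security_group(vpc_id: str, name: str = "glue-security-group") -> str:
    """Create the group in the given VPC and attach the self-referencing rule.

    Assumption: boto3 is available and the caller has EC2 permissions.
    """
    import boto3  # imported here so the rule builder works without the SDK

    ec2 = boto3.client("ec2")
    group_id = ec2.create_security_group(
        GroupName=name,
        Description="Self-referencing group for AWS Glue ENIs",  # illustrative text
        VpcId=vpc_id,
    )["GroupId"]
    ec2.authorize_security_group_ingress(
        GroupId=group_id,
        IpPermissions=self_referencing_tcp_rule(group_id),
    )
    # No egress change is made: a new group allows all outbound traffic by default.
    return group_id
```

Calling create_glue_security_group with the ID of your own VPC returns the new group ID, which can then be selected when you define the JDBC connection. Keeping the rule builder separate from the API call makes it possible to inspect the rule before anything is created in the account.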