Draft
31 changes: 31 additions & 0 deletions lib/shortcuts/api.md
@@ -12,6 +12,9 @@ a Lambda permission.</p>
<dt><a href="#GlueDatabase">GlueDatabase</a></dt>
<dd><p>Create a Glue Database.</p>
</dd>
<dt><a href="#GlueIcebergTable">GlueIcebergTable</a></dt>
<dd><p>Create a Glue table backed by Apache Iceberg format on S3.</p>
</dd>
<dt><a href="#GlueJsonTable">GlueJsonTable</a></dt>
<dd><p>Create a Glue Table backed by line-delimited JSON files on S3.</p>
</dd>
@@ -202,6 +205,34 @@ const db = new cf.shortcuts.GlueDatabase({

module.exports = cf.merge(myTemplate, db);
```
<a name="GlueIcebergTable"></a>

## GlueIcebergTable
Create a Glue table backed by Apache Iceberg format on S3.

**Kind**: global class
<a name="new_GlueIcebergTable_new"></a>

### new GlueIcebergTable(options)

| Param | Type | Default | Description |
| --- | --- | --- | --- |
| options | <code>Object</code> | | Accepts the same options as cloudfriend's [`GlueTable`](https://github.com/mapbox/cloudfriend/blob/master/lib/shortcuts/glue-table.js), though the following attributes are required, hard-wired, or specific to Iceberg tables: |
| options.Location | <code>String</code> | | The physical location of the table. See [AWS documentation](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-glue-table-storagedescriptor.html#cfn-glue-table-storagedescriptor-location). |
| [options.TableType] | <code>String</code> | <code>&#x27;EXTERNAL_TABLE&#x27;</code> | Hard-wired by this shortcut. |
| [options.IcebergVersion] | <code>String</code> | <code>&#x27;2&#x27;</code> | The table version for the Iceberg table. See [AWS documentation](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-glue-table-iceberginput.html). |
| [options.EnableOptimizer] | <code>Boolean</code> | <code>false</code> | Whether to enable the snapshot retention optimizer for this Iceberg table. |
| [options.OptimizerRoleArn] | <code>String</code> | | The ARN of the IAM role for the retention optimizer to use. Required if EnableOptimizer is true. Can be the same role as CompactionRoleArn or OrphanFileDeletionRoleArn if multiple optimizers are enabled. See [AWS documentation](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-glue-tableoptimizer-tableoptimizerconfiguration.html). |
| [options.SnapshotRetentionPeriodInDays] | <code>Number</code> | <code>5</code> | The number of days to retain snapshots. See [AWS documentation](https://docs.aws.amazon.com/AWSCloudFormation/latest/TemplateReference/aws-properties-glue-tableoptimizer-icebergretentionconfiguration.html). |
| [options.NumberOfSnapshotsToRetain] | <code>Number</code> | <code>1</code> | The minimum number of snapshots to retain. See [AWS documentation](https://docs.aws.amazon.com/AWSCloudFormation/latest/TemplateReference/aws-properties-glue-tableoptimizer-icebergretentionconfiguration.html). |
| [options.CleanExpiredFiles] | <code>Boolean</code> | <code>true</code> | Whether to delete expired data files after expiring snapshots. See [AWS documentation](https://docs.aws.amazon.com/AWSCloudFormation/latest/TemplateReference/aws-properties-glue-tableoptimizer-icebergretentionconfiguration.html). |
| [options.EnableCompaction] | <code>Boolean</code> | <code>false</code> | Whether to enable the compaction optimizer for this Iceberg table. Note: CloudFormation does not support configuring the compaction strategy or thresholds, so the optimizer uses the AWS defaults (binpack strategy); to change them, use the AWS CLI or API. See [GitHub issue](https://github.com/aws-cloudformation/cloudformation-coverage-roadmap/issues/2257). |
| [options.CompactionRoleArn] | <code>String</code> | | The ARN of the IAM role for the compaction optimizer to use. Required if EnableCompaction is true. Can be the same role as OptimizerRoleArn or OrphanFileDeletionRoleArn if multiple optimizers are enabled. See [AWS documentation](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-glue-tableoptimizer-tableoptimizerconfiguration.html). |
| [options.EnableOrphanFileDeletion] | <code>Boolean</code> | <code>false</code> | Whether to enable the orphan file deletion optimizer for this Iceberg table. |
| [options.OrphanFileDeletionRoleArn] | <code>String</code> | | The ARN of the IAM role for the orphan file deletion optimizer to use. Required if EnableOrphanFileDeletion is true. Can be the same role as OptimizerRoleArn or CompactionRoleArn if multiple optimizers are enabled. See [AWS documentation](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-glue-tableoptimizer-tableoptimizerconfiguration.html). |
| [options.OrphanFileRetentionPeriodInDays] | <code>Number</code> | <code>3</code> | The number of days to retain orphan files before deleting them. See [AWS documentation](https://docs.aws.amazon.com/glue/latest/dg/enable-orphan-file-deletion.html). |
| [options.OrphanFileDeletionLocation] | <code>String</code> | | The S3 location to scan for orphan files. Defaults to the table location if not specified. See [AWS documentation](https://docs.aws.amazon.com/glue/latest/dg/enable-orphan-file-deletion.html). |
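The table does not list the logical IDs of the extra `AWS::Glue::TableOptimizer` resources. Based on the constructor code, each enabled optimizer is added as `<LogicalName>RetentionOptimizer`, `<LogicalName>CompactionOptimizer`, or `<LogicalName>OrphanFileDeletionOptimizer`, and each `Enable*` flag requires its matching role ARN. The sketch below is a hypothetical standalone helper (`optimizerLogicalIds` is not part of cloudfriend) that reproduces those rules, which may help when referencing the optimizers elsewhere in a template; its error message is illustrative rather than the shortcut's exact wording, and the role ARN is a placeholder.

```js
// Hypothetical helper, not part of cloudfriend: mirrors the constructor's
// optimizer rules (flag -> required role ARN -> logical ID suffix).
const OPTIMIZERS = [
  { flag: 'EnableOptimizer', role: 'OptimizerRoleArn', suffix: 'RetentionOptimizer' },
  { flag: 'EnableCompaction', role: 'CompactionRoleArn', suffix: 'CompactionOptimizer' },
  { flag: 'EnableOrphanFileDeletion', role: 'OrphanFileDeletionRoleArn', suffix: 'OrphanFileDeletionOptimizer' }
];

// Returns the logical IDs of the AWS::Glue::TableOptimizer resources that
// would be added next to options.LogicalName, throwing if a role is missing.
function optimizerLogicalIds(options) {
  return OPTIMIZERS
    .filter(({ flag }) => options[flag])
    .map(({ flag, role, suffix }) => {
      if (!options[role]) throw new Error(`${role} is required when ${flag} is true`);
      return `${options.LogicalName}${suffix}`;
    });
}

// Example: retention and orphan file deletion sharing one placeholder role.
const roleArn = 'arn:aws:iam::123456789012:role/example-glue-optimizer';
const ids = optimizerLogicalIds({
  LogicalName: 'MyTable',
  EnableOptimizer: true,
  OptimizerRoleArn: roleArn,
  EnableOrphanFileDeletion: true,
  OrphanFileDeletionRoleArn: roleArn
});
// ids is ['MyTableRetentionOptimizer', 'MyTableOrphanFileDeletionOptimizer']
```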

<a name="GlueJsonTable"></a>

## GlueJsonTable
207 changes: 207 additions & 0 deletions lib/shortcuts/glue-iceberg-table.js
@@ -0,0 +1,207 @@
'use strict';

const GlueTable = require('./glue-table');

/**
 * Create a Glue table backed by Apache Iceberg format on S3.
 *
 * @param {Object} options - Accepts the same options as cloudfriend's
 * [`GlueTable`](https://github.com/mapbox/cloudfriend/blob/master/lib/shortcuts/glue-table.js),
 * though the following attributes are required, hard-wired, or specific to
 * Iceberg tables:
 * @param {String} options.Location - The physical location of the table. See
 * [AWS
 * documentation](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-glue-table-storagedescriptor.html#cfn-glue-table-storagedescriptor-location).
 * @param {String} [options.TableType='EXTERNAL_TABLE'] - Hard-wired by this
 * shortcut.
 * @param {String} [options.IcebergVersion='2'] - The table version for the
 * Iceberg table. See [AWS
 * documentation](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-glue-table-iceberginput.html).
 * @param {Boolean} [options.EnableOptimizer=false] - Whether to enable the
 * snapshot retention optimizer for this Iceberg table.
 * @param {String} [options.OptimizerRoleArn=undefined] - The ARN of the IAM
 * role for the retention optimizer to use. Required if EnableOptimizer is
 * true. Can be the same role as CompactionRoleArn or OrphanFileDeletionRoleArn
 * if multiple optimizers are enabled. See [AWS
 * documentation](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-glue-tableoptimizer-tableoptimizerconfiguration.html).
 * @param {Number} [options.SnapshotRetentionPeriodInDays=5] - The number of
 * days to retain snapshots. See [AWS
 * documentation](https://docs.aws.amazon.com/AWSCloudFormation/latest/TemplateReference/aws-properties-glue-tableoptimizer-icebergretentionconfiguration.html).
 * @param {Number} [options.NumberOfSnapshotsToRetain=1] - The minimum number
 * of snapshots to retain. See [AWS
 * documentation](https://docs.aws.amazon.com/AWSCloudFormation/latest/TemplateReference/aws-properties-glue-tableoptimizer-icebergretentionconfiguration.html).
 * @param {Boolean} [options.CleanExpiredFiles=true] - Whether to delete
 * expired data files after expiring snapshots. See [AWS
 * documentation](https://docs.aws.amazon.com/AWSCloudFormation/latest/TemplateReference/aws-properties-glue-tableoptimizer-icebergretentionconfiguration.html).
 * @param {Boolean} [options.EnableCompaction=false] - Whether to enable the
 * compaction optimizer for this Iceberg table. Note: CloudFormation does not
 * support configuring the compaction strategy or thresholds, so the optimizer
 * uses the AWS defaults (binpack strategy); to change them, use the AWS CLI or
 * API. See [GitHub issue](https://github.com/aws-cloudformation/cloudformation-coverage-roadmap/issues/2257).
 * @param {String} [options.CompactionRoleArn=undefined] - The ARN of the IAM
 * role for the compaction optimizer to use. Required if EnableCompaction is
 * true. Can be the same role as OptimizerRoleArn or OrphanFileDeletionRoleArn
 * if multiple optimizers are enabled. See [AWS
 * documentation](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-glue-tableoptimizer-tableoptimizerconfiguration.html).
 * @param {Boolean} [options.EnableOrphanFileDeletion=false] - Whether to
 * enable the orphan file deletion optimizer for this Iceberg table.
 * @param {String} [options.OrphanFileDeletionRoleArn=undefined] - The ARN of
 * the IAM role for the orphan file deletion optimizer to use. Required if
 * EnableOrphanFileDeletion is true. Can be the same role as OptimizerRoleArn
 * or CompactionRoleArn if multiple optimizers are enabled. See [AWS
 * documentation](https://docs.aws.amazon.com/AWSCloudFormation/latest/UserGuide/aws-properties-glue-tableoptimizer-tableoptimizerconfiguration.html).
 * @param {Number} [options.OrphanFileRetentionPeriodInDays=3] - The number of
 * days to retain orphan files before deleting them. See [AWS
 * documentation](https://docs.aws.amazon.com/glue/latest/dg/enable-orphan-file-deletion.html).
 * @param {String} [options.OrphanFileDeletionLocation=undefined] - The S3
 * location to scan for orphan files. Defaults to the table location if not
 * specified. See [AWS
 * documentation](https://docs.aws.amazon.com/glue/latest/dg/enable-orphan-file-deletion.html).
 */
class GlueIcebergTable extends GlueTable {
  constructor(options) {
    if (!options) throw new Error('Options required');
    const {
      Location,
      IcebergVersion = '2',
      EnableOptimizer = false,
      OptimizerRoleArn,
      SnapshotRetentionPeriodInDays = 5,
      NumberOfSnapshotsToRetain = 1,
      CleanExpiredFiles = true,
      EnableCompaction = false,
      CompactionRoleArn,
      EnableOrphanFileDeletion = false,
      OrphanFileDeletionRoleArn,
      OrphanFileRetentionPeriodInDays = 3,
      OrphanFileDeletionLocation
    } = options;

    const required = [Location];
    if (required.some((variable) => !variable))
      throw new Error('You must provide a Location');

    if (EnableOptimizer && !OptimizerRoleArn)
      throw new Error('You must provide an OptimizerRoleArn when EnableOptimizer is true');

    if (EnableCompaction && !CompactionRoleArn)
      throw new Error('You must provide a CompactionRoleArn when EnableCompaction is true');

    if (EnableOrphanFileDeletion && !OrphanFileDeletionRoleArn)
      throw new Error('You must provide an OrphanFileDeletionRoleArn when EnableOrphanFileDeletion is true');

    super(
      Object.assign(
        {
          TableType: 'EXTERNAL_TABLE',
          Parameters: { EXTERNAL: 'TRUE' }
        },
        options
      )
    );

    const logicalName = options.LogicalName;
    this.Resources[logicalName].Properties.OpenTableFormatInput = {
      IcebergInput: {
        MetadataOperation: 'CREATE',
        Version: IcebergVersion
      }
    };

    // Optionally add TableOptimizer for configuring snapshot retention
    if (EnableOptimizer) {
      const optimizerLogicalName = `${logicalName}RetentionOptimizer`;
      this.Resources[optimizerLogicalName] = {
        Type: 'AWS::Glue::TableOptimizer',
        DependsOn: logicalName,
        Properties: {
          CatalogId: options.CatalogId || { Ref: 'AWS::AccountId' },
          DatabaseName: options.DatabaseName,
          TableName: options.Name,
          Type: 'retention',
          TableOptimizerConfiguration: {
            RoleArn: OptimizerRoleArn,
            Enabled: true,
            RetentionConfiguration: {
              IcebergConfiguration: {
                SnapshotRetentionPeriodInDays,
                NumberOfSnapshotsToRetain,
                CleanExpiredFiles
              }
            }
          }
        }
      };

      // Apply Condition to optimizer if specified on the table
      if (options.Condition) {
        this.Resources[optimizerLogicalName].Condition = options.Condition;
      }
    }

    // Optionally add TableOptimizer for compaction
    // NOTE: CloudFormation does not support CompactionConfiguration properties
    // (strategy, minInputFiles, deleteFileThreshold). Configure them via the
    // AWS CLI/API after stack creation; otherwise the AWS defaults apply.
    // See: https://github.com/aws-cloudformation/cloudformation-coverage-roadmap/issues/2257
    if (EnableCompaction) {
      const compactionLogicalName = `${logicalName}CompactionOptimizer`;
      this.Resources[compactionLogicalName] = {
        Type: 'AWS::Glue::TableOptimizer',
        DependsOn: logicalName,
        Properties: {
          CatalogId: options.CatalogId || { Ref: 'AWS::AccountId' },
          DatabaseName: options.DatabaseName,
          TableName: options.Name,
          Type: 'compaction',
          TableOptimizerConfiguration: {
            RoleArn: CompactionRoleArn,
            Enabled: true
          }
        }
      };

      // Apply Condition to compaction optimizer if specified on the table
      if (options.Condition) {
        this.Resources[compactionLogicalName].Condition = options.Condition;
      }
    }

    // Optionally add TableOptimizer for orphan file deletion
    if (EnableOrphanFileDeletion) {
      const orphanLogicalName = `${logicalName}OrphanFileDeletionOptimizer`;
      const icebergConfiguration = {
        OrphanFileRetentionPeriodInDays
      };

      // Only add Location if specified, otherwise it defaults to table location
      if (OrphanFileDeletionLocation) {
        icebergConfiguration.Location = OrphanFileDeletionLocation;
      }

      this.Resources[orphanLogicalName] = {
        Type: 'AWS::Glue::TableOptimizer',
        DependsOn: logicalName,
        Properties: {
          CatalogId: options.CatalogId || { Ref: 'AWS::AccountId' },
          DatabaseName: options.DatabaseName,
          TableName: options.Name,
          Type: 'orphan_file_deletion',
          TableOptimizerConfiguration: {
            RoleArn: OrphanFileDeletionRoleArn,
            Enabled: true,
            OrphanFileDeletionConfiguration: {
              IcebergConfiguration: icebergConfiguration
            }
          }
        }
      };

      // Apply Condition to orphan file deletion optimizer if specified on the table
      if (options.Condition) {
        this.Resources[orphanLogicalName].Condition = options.Condition;
      }
    }
  }
}

module.exports = GlueIcebergTable;
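The three optimizer blocks in the constructor differ only in the `Type` string, the role ARN, and an optional type-specific configuration object. A minimal sketch of a shared builder, assuming a hypothetical helper named `tableOptimizer` that is not part of this PR or of cloudfriend, could reduce the repetition while producing the same resource shape (the role ARN below is a placeholder):

```js
'use strict';

// Hypothetical helper (not in this PR or in cloudfriend): builds one
// AWS::Glue::TableOptimizer resource with the shape GlueIcebergTable uses.
function tableOptimizer(options, type, roleArn, extraConfig = {}) {
  const resource = {
    Type: 'AWS::Glue::TableOptimizer',
    DependsOn: options.LogicalName,
    Properties: {
      CatalogId: options.CatalogId || { Ref: 'AWS::AccountId' },
      DatabaseName: options.DatabaseName,
      TableName: options.Name,
      Type: type,
      // RoleArn and Enabled are common; extraConfig adds type-specific keys.
      TableOptimizerConfiguration: Object.assign({ RoleArn: roleArn, Enabled: true }, extraConfig)
    }
  };

  // Inherit the table's Condition so the optimizer is created only with it.
  if (options.Condition) resource.Condition = options.Condition;
  return resource;
}

const table = { LogicalName: 'MyTable', DatabaseName: 'my_database', Name: 'my_table', Condition: 'Always' };
const roleArn = 'arn:aws:iam::123456789012:role/example-glue-optimizer';

// Compaction takes no extra configuration through CloudFormation.
const compaction = tableOptimizer(table, 'compaction', roleArn);

// Retention carries the Iceberg snapshot settings (shortcut defaults shown).
const retention = tableOptimizer(table, 'retention', roleArn, {
  RetentionConfiguration: {
    IcebergConfiguration: {
      SnapshotRetentionPeriodInDays: 5,
      NumberOfSnapshotsToRetain: 1,
      CleanExpiredFiles: true
    }
  }
});
```

Passing an `OrphanFileDeletionConfiguration` object as `extraConfig` would cover the third optimizer type in the same way; whether the extra indirection is worth it for three call sites is a judgment call.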
1 change: 1 addition & 0 deletions lib/shortcuts/index.js
@@ -16,6 +16,7 @@ module.exports = {
  GlueJsonTable: require('./glue-json-table'),
  GlueOrcTable: require('./glue-orc-table'),
  GlueParquetTable: require('./glue-parquet-table'),
  GlueIcebergTable: require('./glue-iceberg-table'),
  GluePrestoView: require('./glue-presto-view'),
  GlueSparkView: require('./glue-spark-view'),
  hookshot: require('./hookshot'),
4 changes: 2 additions & 2 deletions requirements.dev.txt
@@ -1,2 +1,2 @@
-aws-sam-cli==1.142.1
-cfn-lint==1.36.1
+aws-sam-cli==1.149.0
+cfn-lint==1.41.0
50 changes: 50 additions & 0 deletions test/fixtures/shortcuts/glue-iceberg-table-defaults.json
@@ -0,0 +1,50 @@
{
  "AWSTemplateFormatVersion": "2010-09-09",
  "Metadata": {},
  "Parameters": {},
  "Rules": {},
  "Mappings": {},
  "Conditions": {},
  "Resources": {
    "MyTable": {
      "Type": "AWS::Glue::Table",
      "Properties": {
        "CatalogId": {
          "Ref": "AWS::AccountId"
        },
        "DatabaseName": "my_database",
        "TableInput": {
          "Description": {
            "Fn::Sub": "Created by the ${AWS::StackName} CloudFormation stack"
          },
          "Name": "my_table",
          "Parameters": {
            "EXTERNAL": "TRUE"
          },
          "PartitionKeys": [],
          "TableType": "EXTERNAL_TABLE",
          "StorageDescriptor": {
            "Columns": [
              {
                "Name": "column",
                "Type": "string"
              }
            ],
            "Compressed": false,
            "Location": "s3://fake/location",
            "NumberOfBuckets": 0,
            "SerdeInfo": {},
            "StoredAsSubDirectories": true
          }
        },
        "OpenTableFormatInput": {
          "IcebergInput": {
            "MetadataOperation": "CREATE",
            "Version": "2"
          }
        }
      }
    }
  },
  "Outputs": {}
}
60 changes: 60 additions & 0 deletions test/fixtures/shortcuts/glue-iceberg-table-no-defaults.json
@@ -0,0 +1,60 @@
{
  "AWSTemplateFormatVersion": "2010-09-09",
  "Metadata": {},
  "Parameters": {},
  "Rules": {},
  "Mappings": {},
  "Conditions": {
    "Always": {
      "Fn::Equals": [
        "1",
        "1"
      ]
    }
  },
  "Resources": {
    "AnotherThing": {
      "Type": "AWS::SNS::Topic"
    },
    "MyTable": {
      "Type": "AWS::Glue::Table",
      "Condition": "Always",
      "DependsOn": "AnotherThing",
      "Properties": {
        "CatalogId": "1234",
        "DatabaseName": "my_database",
        "TableInput": {
          "Description": "my_table description",
          "Name": "my_table",
          "Owner": "Team",
          "Parameters": {
            "table": "params"
          },
          "PartitionKeys": [],
          "Retention": 12,
          "TableType": "EXTERNAL_TABLE",
          "StorageDescriptor": {
            "Columns": [
              {
                "Name": "column",
                "Type": "string"
              }
            ],
            "Compressed": false,
            "Location": "s3://fake/location",
            "NumberOfBuckets": 0,
            "SerdeInfo": {},
            "StoredAsSubDirectories": true
          }
        },
        "OpenTableFormatInput": {
          "IcebergInput": {
            "MetadataOperation": "CREATE",
            "Version": "2"
          }
        }
      }
    }
  },
  "Outputs": {}
}
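The no-defaults fixture shows `"Parameters": { "table": "params" }` with no `EXTERNAL` key. That follows from the constructor's `Object.assign`, which lays caller options over the defaults shallowly: a caller-supplied `Parameters` object replaces `{ EXTERNAL: 'TRUE' }` wholesale, and by the same rule an explicit `options.TableType` would override `'EXTERNAL_TABLE'`. The sketch below reproduces that merge in isolation; the `mergedParams` variant is only one possible alternative if keeping `EXTERNAL` were desired, not something this PR does.

```js
'use strict';

// Defaults the shortcut passes to GlueTable (taken from glue-iceberg-table.js).
const defaults = { TableType: 'EXTERNAL_TABLE', Parameters: { EXTERNAL: 'TRUE' } };

// Caller options as in the no-defaults fixture (other keys omitted).
const options = { Parameters: { table: 'params' } };

// Object.assign is a shallow merge: options.Parameters replaces the default
// object entirely, so EXTERNAL: 'TRUE' is dropped.
const merged = Object.assign({}, defaults, options);
// merged.Parameters is { table: 'params' }; merged.TableType is 'EXTERNAL_TABLE'

// Illustrative alternative: merge the nested Parameters one level deeper.
const mergedParams = Object.assign({}, defaults.Parameters, options.Parameters);
// mergedParams is { EXTERNAL: 'TRUE', table: 'params' }
```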