16 changes: 16 additions & 0 deletions CHANGELOG.md
@@ -2,4 +2,20 @@

<!-- <START NEW CHANGELOG ENTRY> -->

## [Unreleased]

### Added
- AWS Athena database connection support
  - Added new database type `DB_ATHENA` (type 7)
  - Support for AWS credentials (access key, secret key)
  - Configurable region, workgroup, and S3 staging directory
  - Uses the pyathena library for optimal performance
  - Fallback to JDBC driver if pyathena is not available
  - Custom Athena icon for the UI
  - Dedicated connection form fields for Athena configuration

### Changed
- Updated database connection interface to support Athena-specific fields
- Enhanced UI to show Athena-specific input fields when Athena is selected

<!-- <END NEW CHANGELOG ENTRY> -->
179 changes: 179 additions & 0 deletions INSTALL_ATHENA.md
@@ -0,0 +1,179 @@
# AWS Athena Installation and Usage Guide

This guide explains how to use AWS Athena in jupyterlab-sql-explorer.

## Installation

### 1. Basic Installation

```bash
pip install jupyterlab-sql-explorer[athena]
```

### 2. Manual Installation (Optional)

If the above command does not work:

```bash
pip install jupyterlab-sql-explorer
# Quote the requirement so the shell does not treat ">" as a redirect
pip install "pyathena>=2.0.0"
```

## AWS Configuration

### 1. AWS Credentials

To connect to AWS Athena, you need the following information:

- **Access Key ID**: AWS IAM user's access key
- **Secret Access Key**: AWS IAM user's secret key
- **Region**: AWS region where Athena is located (e.g., ap-south-1, us-east-1)
- **Workgroup**: Athena workgroup (default: primary)
- **S3 Staging Directory**: The S3 bucket path where query results will be stored
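
As a rough illustration of how these five values fit together, the sketch below assembles them into a SQLAlchemy URL in the form that PyAthena documents for its `awsathena+rest` dialect. Whether the extension builds its connection this way is an assumption; `build_athena_url` and the `schema="default"` parameter are hypothetical names used only for this example, and the key values are placeholders.

```python
from urllib.parse import quote_plus


def build_athena_url(access_key, secret_key, region, s3_staging_dir,
                     workgroup="primary", schema="default"):
    """Assemble a SQLAlchemy URL for PyAthena's REST dialect (hypothetical helper)."""
    return (
        "awsathena+rest://"
        # Credentials and the S3 path are URL-escaped because they may
        # contain characters such as "/", "+" or ":".
        f"{quote_plus(access_key)}:{quote_plus(secret_key)}"
        f"@athena.{region}.amazonaws.com:443/{schema}"
        f"?s3_staging_dir={quote_plus(s3_staging_dir)}"
        f"&work_group={quote_plus(workgroup)}"
    )


url = build_athena_url(
    access_key="AKIAEXAMPLE",         # placeholder, not a real key
    secret_key="example/secret+key",  # placeholder with characters that need escaping
    region="ap-south-1",
    s3_staging_dir="s3://my-bucket/athena-results/",
)
print(url)
```

Escaping matters mostly for the secret key and the staging path; an unescaped `/` or `+` in either would change how the URL is parsed.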

### 2. IAM Permissions

Users connecting to Athena require the following permissions:

```json
{
  "Version": "2012-10-17",
  "Statement": [
    {
      "Effect": "Allow",
      "Action": [
        "athena:StartQueryExecution",
        "athena:GetQueryExecution",
        "athena:GetQueryResults",
        "athena:StopQueryExecution",
        "athena:GetWorkGroup",
        "athena:ListWorkGroups",
        "athena:ListDataCatalogs",
        "athena:GetDataCatalog",
        "athena:ListDatabases",
        "athena:GetDatabase",
        "athena:ListTableMetadata",
        "athena:GetTableMetadata"
      ],
      "Resource": "*"
    },
    {
      "Effect": "Allow",
      "Action": [
        "s3:GetBucketLocation",
        "s3:GetObject",
        "s3:ListBucket",
        "s3:PutObject"
      ],
      "Resource": [
        "arn:aws:s3:::your-staging-bucket",
        "arn:aws:s3:::your-staging-bucket/*"
      ]
    }
  ]
}
```
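
If you manage several staging buckets, the policy above can be generated rather than edited by hand. The following is a minimal sketch that reproduces the same statements for a given bucket name; `build_policy` is a hypothetical helper, not part of the extension or of any AWS SDK.

```python
import json

# The same Athena actions listed in the policy above.
ATHENA_ACTIONS = [
    "athena:StartQueryExecution",
    "athena:GetQueryExecution",
    "athena:GetQueryResults",
    "athena:StopQueryExecution",
    "athena:GetWorkGroup",
    "athena:ListWorkGroups",
    "athena:ListDataCatalogs",
    "athena:GetDataCatalog",
    "athena:ListDatabases",
    "athena:GetDatabase",
    "athena:ListTableMetadata",
    "athena:GetTableMetadata",
]

# The same S3 actions, scoped to the staging bucket only.
S3_ACTIONS = [
    "s3:GetBucketLocation",
    "s3:GetObject",
    "s3:ListBucket",
    "s3:PutObject",
]


def build_policy(staging_bucket):
    """Return the IAM policy document as a dict for one staging bucket."""
    return {
        "Version": "2012-10-17",
        "Statement": [
            {"Effect": "Allow", "Action": ATHENA_ACTIONS, "Resource": "*"},
            {
                "Effect": "Allow",
                "Action": S3_ACTIONS,
                "Resource": [
                    f"arn:aws:s3:::{staging_bucket}",    # the bucket itself
                    f"arn:aws:s3:::{staging_bucket}/*",  # every object in it
                ],
            },
        ],
    }


policy_json = json.dumps(build_policy("your-staging-bucket"), indent=2)
print(policy_json)
```

Serializing with `json.dumps` guarantees straight quotes and valid JSON, which is what the IAM console expects when a policy is pasted in.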

## Usage

### 1. Create a new connection

1. Open the SQL Explorer panel in JupyterLab
2. Click the “Add new database connection” button
3. Select “AWS Athena” as the database type
4. Enter the following information:
   - **Name**: Display name of the connection
   - **ID**: Unique ID of the connection
   - **Access Key**: AWS access key ID
   - **Secret Key**: AWS secret access key
   - **Region**: AWS region (e.g., ap-south-1)
   - **Workgroup**: Athena workgroup (default: primary)
   - **S3 Staging Dir**: S3 bucket path (e.g., s3://my-bucket/athena-results/)
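
A malformed S3 Staging Dir value is an easy mistake to make in this form. The sketch below shows one way to sanity-check the value before saving the connection by splitting it into bucket and prefix; `split_s3_uri` is a hypothetical helper, and the extension itself may validate the field differently or not at all.

```python
from urllib.parse import urlparse


def split_s3_uri(uri):
    """Return (bucket, prefix) for an s3:// URI, or raise ValueError."""
    parsed = urlparse(uri)
    # A usable staging path needs the s3 scheme and a bucket name.
    if parsed.scheme != "s3" or not parsed.netloc:
        raise ValueError(f"expected s3://bucket/prefix/, got {uri!r}")
    # The path keeps its leading "/", which is not part of the object prefix.
    return parsed.netloc, parsed.path.lstrip("/")


bucket, prefix = split_s3_uri("s3://my-bucket/athena-results/")
print(bucket, prefix)
```

The bucket name returned here is also the one to use in the S3 statement of the IAM policy.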

### 2. Test the connection

Once the connection has been successfully created:
1. Click the newly created Athena connection in the connection list
2. The database and table list will be displayed
3. Click on a table to view column information

### 3. Run SQL Query

1. Right-click on the connection and select “Open Sql Console”
2. Write and execute an SQL query
3. The results will be displayed in table form

## Example

### Basic Query Example

```sql
-- View the table list of a specific database
SHOW TABLES IN your_database;

-- View data from a specific table
SELECT * FROM your_database.your_table LIMIT 10;

-- Aggregate query
SELECT
    column1,
    COUNT(*) AS count
FROM your_database.your_table
GROUP BY column1
ORDER BY count DESC;
```
```
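
The same aggregate query can also be run from Python through any DB-API 2.0 connection, which is the interface pyathena implements. The sketch below uses the standard library's `sqlite3` as a stand-in for Athena so that it runs without AWS access; `run_query` is a hypothetical helper, and the table contents are made up for the example.

```python
import sqlite3


def run_query(connection, sql):
    """Execute sql on a DB-API 2.0 connection and return (column_names, rows)."""
    cursor = connection.cursor()
    try:
        cursor.execute(sql)
        # cursor.description holds one 7-item sequence per result column;
        # the first item is the column name.
        columns = [column[0] for column in cursor.description]
        return columns, cursor.fetchall()
    finally:
        cursor.close()


# sqlite3 stands in for an Athena connection in this example.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE your_table (column1 TEXT)")
conn.executemany("INSERT INTO your_table VALUES (?)", [("a",), ("b",), ("a",)])

columns, rows = run_query(
    conn,
    "SELECT column1, COUNT(*) AS row_count "
    "FROM your_table GROUP BY column1 ORDER BY row_count DESC",
)
print(columns)
print(rows)
```

With a real Athena connection, only the object passed as `connection` would change; the cursor calls are the ones defined by the DB-API specification.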

## Troubleshooting

### 1. Connection Error

- Verify that your AWS credentials are correct
- Verify that the region is correct
- Verify that the S3 staging directory exists and is accessible

### 2. Permission Error

- Verify that the IAM user has the necessary permissions
- Verify that you have access permissions for the Athena workgroup

### 3. pyathena installation error

```bash
# Upgrade pip
pip install --upgrade pip

# Reinstall pyathena
pip uninstall pyathena
pip install "pyathena>=2.0.0"
```
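
To confirm that the reinstall took effect in the environment JupyterLab actually uses, you can ask Python which version it sees. This is a small sketch using only the standard library; `installed_version` is a hypothetical helper name.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version string of a distribution, or None if absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


version = installed_version("pyathena")
if version is None:
    print("pyathena is not installed in this environment")
else:
    print("pyathena version:", version)
```

Run it from a notebook cell in the same JupyterLab instance; a terminal may be using a different Python environment than the Jupyter server.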

## Advanced settings

### 1. Using environment variables

You can set AWS credentials as environment variables:

```bash
export AWS_ACCESS_KEY_ID=your_access_key
export AWS_SECRET_ACCESS_KEY=your_secret_key
export AWS_DEFAULT_REGION=ap-south-1
```
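
The sketch below shows one way application code could pick these variables up and report which ones are missing. It is an illustration only: `athena_settings_from_env` is a hypothetical helper, and how the extension itself reads the environment is not specified here.

```python
import os


def athena_settings_from_env(env=None):
    """Collect Athena credentials from a mapping of environment variables."""
    env = os.environ if env is None else env
    required = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY")
    missing = [name for name in required if not env.get(name)]
    if missing:
        raise KeyError("missing environment variables: " + ", ".join(missing))
    return {
        "access_key": env["AWS_ACCESS_KEY_ID"],
        "secret_key": env["AWS_SECRET_ACCESS_KEY"],
        # Region is optional here; None means it is not set in the environment.
        "region": env.get("AWS_DEFAULT_REGION"),
    }


# A plain dict stands in for os.environ so the example runs anywhere.
example_env = {
    "AWS_ACCESS_KEY_ID": "your_access_key",
    "AWS_SECRET_ACCESS_KEY": "your_secret_key",
    "AWS_DEFAULT_REGION": "ap-south-1",
}
settings = athena_settings_from_env(example_env)
print(settings["region"])
```

Calling `athena_settings_from_env()` with no argument reads the real process environment instead of the example dict.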

### 2. AWS CLI settings

You can set credentials using the AWS CLI:

```bash
aws configure
```

Once configured, pyathena picks up these settings automatically: it uses boto3, which reads the credentials and region that `aws configure` stores under `~/.aws/`.
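
If a connection still fails after running `aws configure`, it can help to check what was actually written. `aws configure` stores keys in an INI-style file (`~/.aws/credentials`) with a `[default]` section. The sketch below parses such content from a string so it runs without touching your real files; `read_profile` is a hypothetical helper and the key values are placeholders.

```python
import configparser

# Example content in the layout that `aws configure` writes to ~/.aws/credentials.
SAMPLE_CREDENTIALS = """\
[default]
aws_access_key_id = AKIAEXAMPLE
aws_secret_access_key = examplesecret
"""


def read_profile(credentials_text, profile="default"):
    """Return the key/value pairs of one profile, or None if the profile is absent."""
    parser = configparser.ConfigParser()
    parser.read_string(credentials_text)
    if not parser.has_section(profile):
        return None
    return dict(parser[profile])


profile = read_profile(SAMPLE_CREDENTIALS)
# Print only the key names so that secrets never end up in notebook output.
print(sorted(profile))
```

To inspect your own configuration, pass the text of `~/.aws/credentials` instead of `SAMPLE_CREDENTIALS`.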

## Support

If you encounter any issues, check the following:

1. The pyathena library is installed correctly
2. Your AWS credentials are valid
3. Your network connection is working properly
4. The JupyterLab logs do not contain error messages
17 changes: 16 additions & 1 deletion README.md
@@ -8,7 +8,7 @@ jupyterlab-sql-explorer is an extensible JupyterLab plugin that enables users to

- Run SQL statements directly in JupyterLab and view the returned results.

- Support for multiple databases, including MySQL, PostgreSQL, Hive, SQLite, ORACLE, and more.
- Support for multiple databases, including MySQL, PostgreSQL, Hive, SQLite, ORACLE, AWS Athena, and more.

- Edit annotations for data objects and support for both local and shared modes. With jupyterlab-sql-explorer, users can add annotations to data objects such as tables and views. This feature is especially valuable for data analysts working in teams, as it facilitates collaboration and knowledge sharing around specific data assets.

@@ -49,6 +49,20 @@ Ensure that each team member follows the steps mentioned above to modify the con
- JupyterLab >= 4.0 : for JupyterLab 3.x please use version 0.1.x
- sqlalchemy >1.4

## Database Connections

### AWS Athena

To connect to AWS Athena, you need to provide the following information:

1. **Access Key ID**: Your AWS access key
2. **Secret Access Key**: Your AWS secret key
3. **Region**: AWS region (e.g., ap-south-1, us-east-1)
4. **Workgroup**: Athena workgroup (default: primary)
5. **S3 Staging Directory**: S3 bucket path for query results (e.g., s3://my-bucket/athena-results/)

The connection uses the pyathena library and falls back to the Athena JDBC driver if pyathena is not available.

## Install

To install the extension, execute:
@@ -61,6 +75,7 @@ or install with special database driver,

```bash
pip install jupyterlab-sql-explorer[hive]
pip install jupyterlab-sql-explorer[athena]
```

## Uninstall