Conversation

@alexrenz alexrenz commented Nov 1, 2024

Add support for running on Azure inside Snowflake.

Comment on lines +176 to +185
let mut locked = self.cached.lock().await;

match locked.as_ref() {
    Some(creds) => {
        if matches!(creds.as_ref(), AzureCredential::SASToken(pairs) if *pairs == new_pairs) {
            return Ok(Arc::clone(creds));
        }
    }
    _ => {}
}

@alexrenz alexrenz Nov 5, 2024

@andrebsguedes: I adapted this from the S3 code plus your variant, but I don't follow why we have this caching here.

Speaking about the S3 variant: the request to Snowflake in current_upload_info (which happens before the cache check) should be more expensive than creating an object_store::aws::AwsCredential (which happens after), right? I also don't have a good understanding of when object_store calls get_credential; maybe that is my problem.

@andrebsguedes:

Yeah, you are right that this caching here is very confusing, and my Azure version of this was pretty dumb, to be honest. Let me try to explain the rationale:

  • object_store calls this for every operation, even multiple times in some cases, so we want it to be fast.
  • current_upload_info is a cached call to fetch_upload_info, which means that most of the time calling it should just be a few atomic operations (to read the upload_info cache and perform the Arc::clone) and is thus fast.
  • So we call current_upload_info, but instead of cloning all the strings to build an AwsCredential every time, we cache the credential so that we can Arc::clone it most of the time.

Some critical analysis after looking at my S3 code again:

  • In reality, even if everything goes well, the extra contention from the lock may be far worse than the allocations for constructing AwsCredential.
  • To do better, we would have to employ cleverer tricks with arc_swap and use the address of the current_upload_info result to detect change instead of checking the contents.

TL;DR: I think this caching is not worth it in either S3 or Azure, and we should drop it for now.

@alexrenz:

Agreed, let's drop it, but in another PR. So I will keep this as is here.

@alexrenz alexrenz marked this pull request as ready for review November 5, 2024 09:46
@alexrenz alexrenz changed the title Snowflake Azure support Support Azure in Snowflake Nov 5, 2024
@andrebsguedes andrebsguedes left a comment

Looks awesome! Even more so when we take into account that this is your first PR in the project.

Some minor comments:


# object_store = { version = "0.10.1", features = ["azure", "aws"] }
# Pinned to a specific commit while waiting for upstream
object_store = { git = "https://github.com/andrebsguedes/arrow-rs.git", tag = "v0.10.2-beta1", features = ["azure", "aws", "experimental-azure-list-offset", "experimental-arbitrary-list-prefix"] }
hickory-resolver = "0.24"
@alexrenz:

@andrebsguedes Is there a way to say something like "use whatever version reqwest is using"?

@andrebsguedes:

Unfortunately not. We can add a test later that at least breaks if the versions diverge, but for this I don't think it is even needed, as reqwest does not care about the version of a custom resolver.

@andrebsguedes andrebsguedes left a comment

Looks great! I still have to go through the raicode tests, but it is very unlikely that the review there will cause any changes at this level, so I will approve this one already.


@alexrenz alexrenz merged commit 92c8ef5 into main Nov 22, 2024
4 checks passed
alexrenz added a commit to RelationalAI/RustyObjectStore.jl that referenced this pull request Nov 26, 2024
Add a few tests for SPCS Azure, as introduced by [object_store_ffi#28](RelationalAI/object_store_ffi#28)

---------

Co-authored-by: André Guedes <andre.guedes@relational.ai>