Support Azure in Snowflake #28
Conversation
let mut locked = self.cached.lock().await;

match locked.as_ref() {
    Some(creds) => {
        if matches!(creds.as_ref(), AzureCredential::SASToken(pairs) if *pairs == new_pairs) {
            return Ok(Arc::clone(creds));
        }
    }
    _ => {}
}
@andrebsguedes: I adapted this from the S3 code plus your variant, but I don't follow why we have this caching here.
Speaking about the S3 variant: the request to Snowflake in current_upload_info (which happens before the cache check) should be more expensive than creating an object_store::aws::AwsCredential (which happens after), right? I also don't have a good understanding of when object_store calls get_credential; maybe that is my problem.
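For context, the hook under discussion is object_store's credential provider. From memory of object_store 0.10.x it looks roughly like the sketch below; treat the exact bounds and return type as an assumption rather than a quote of the crate:

```rust
use std::sync::Arc;

// Rough sketch of object_store's CredentialProvider trait (~0.10.x, from memory).
#[async_trait::async_trait]
pub trait CredentialProvider: std::fmt::Debug + Send + Sync {
    /// The store-specific credential type, e.g. AwsCredential or AzureCredential.
    type Credential;

    /// Called by the store when it signs requests, potentially for every
    /// operation, so implementations are expected to be cheap on the hot path.
    async fn get_credential(&self) -> object_store::Result<Arc<Self::Credential>>;
}
```

Because the store may call get_credential on every operation, the question above is really about whether the cache check is cheaper than simply rebuilding the credential each time.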
Yeah, you are right that this caching here is very confusing, and my Azure version of it was pretty dumb to be honest. Let me try to explain the rationale:
- object_store calls this for every operation, even multiple times in some cases, so we want it to be fast.
- current_upload_info is a cached call to fetch_upload_info, which means that most of the time calling it is just a few atomic operations (reading the upload_info cache and performing the Arc::clone) and thus fast.
- So we call current_upload_info, but instead of cloning all the strings to build an AwsCredential we cache the credential so that most of the time we can just Arc::clone it.

Some critical analysis after looking at my S3 code again:
- In reality, even if everything goes well, the extra contention from the lock may be far worse than the allocations for constructing an AwsCredential.
- We would have to employ cleverer tricks with arc_swap and use the address of the current_upload_info result to detect change instead of comparing the contents (see the sketch below).

TL;DR: I think this caching is not worth it in either S3 or Azure, and we should drop it for now.
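A minimal sketch of that arc_swap idea, with made-up UploadInfo/Credential stand-ins for the crate's real types (names, fields, and wiring here are illustrative assumptions, not the PR's code):

```rust
use std::sync::Arc;
use arc_swap::ArcSwap;

// Hypothetical stand-ins for the crate's real types.
struct UploadInfo { sas_token: String }
struct Credential { token: String }

struct CredentialCache {
    // (upload info the credential was derived from, the derived credential)
    cached: ArcSwap<(Arc<UploadInfo>, Arc<Credential>)>,
}

impl CredentialCache {
    fn new(info: Arc<UploadInfo>) -> Self {
        let cred = Arc::new(Credential { token: info.sas_token.clone() });
        Self { cached: ArcSwap::new(Arc::new((info, cred))) }
    }

    /// `current` would be the result of current_upload_info().
    fn get(&self, current: Arc<UploadInfo>) -> Arc<Credential> {
        let cached = self.cached.load();
        // Pointer identity instead of comparing contents: if current_upload_info
        // still hands out the same Arc, the derived credential cannot have changed.
        if Arc::ptr_eq(&cached.0, &current) {
            return Arc::clone(&cached.1);
        }
        // Rebuild and publish without a lock; a race here only costs an extra rebuild.
        let fresh = Arc::new(Credential { token: current.sas_token.clone() });
        self.cached.store(Arc::new((current, Arc::clone(&fresh))));
        fresh
    }
}
```

Whether this actually beats rebuilding the credential on every call would still need measuring, which is part of why dropping the cache for now seems reasonable.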
Agree to drop it, but let's do that in another PR, so I will keep this as is here.
andrebsguedes left a comment:
Looks awesome! Even more so when we take into account that this is your first PR in the project.
Some minor comments, see the thread above.
# object_store = { version = "0.10.1", features = ["azure", "aws"] }
# Pinned to a specific commit while waiting for upstream
object_store = { git = "https://github.com/andrebsguedes/arrow-rs.git", tag = "v0.10.2-beta1", features = ["azure", "aws", "experimental-azure-list-offset", "experimental-arbitrary-list-prefix"] }
hickory-resolver = "0.24"
@andrebsguedes Is there a way to say something like "use whatever version reqwest is using"?
Unfortunately not. We can add a test later that at least breaks if the versions diverge, but I don't think it is even needed here, as reqwest does not care about the version of a custom resolver.
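The reason the version does not need to match is that reqwest exposes its own small resolver trait, so any DNS crate can be adapted to it. A minimal sketch of such an adapter, assuming reqwest's dns_resolver hook and hickory-resolver 0.24 (the adapter name and wiring are illustrative, not this crate's actual code):

```rust
use std::net::SocketAddr;
use std::sync::Arc;

use hickory_resolver::TokioAsyncResolver;
use reqwest::dns::{Addrs, Name, Resolve, Resolving};

// Hypothetical adapter; the real implementation in this crate may differ.
#[derive(Debug)]
struct HickoryDns {
    resolver: Arc<TokioAsyncResolver>,
}

impl Resolve for HickoryDns {
    fn resolve(&self, name: Name) -> Resolving {
        let resolver = Arc::clone(&self.resolver);
        Box::pin(async move {
            let lookup = resolver.lookup_ip(name.as_str()).await?;
            // reqwest only needs socket addresses; the connector fills in the real port.
            let addrs: Addrs = Box::new(lookup.into_iter().map(|ip| SocketAddr::new(ip, 0)));
            Ok(addrs)
        })
    }
}

// Usage sketch: only reqwest's own Resolve trait crosses the boundary,
// so the hickory-resolver version never has to line up with reqwest's.
// let client = reqwest::Client::builder()
//     .dns_resolver(Arc::new(HickoryDns { resolver }))
//     .build()?;
```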
andrebsguedes left a comment:
Looks great! I still have to go through the raicode tests, but it is very unlikely that the review there will cause any changes at this level, so I will approve this one already.
Add a few tests for SPCS Azure, as introduced by [object_store_ffi#28](RelationalAI/object_store_ffi#28)

Co-authored-by: André Guedes <andre.guedes@relational.ai>
Add support for running on Azure inside Snowflake.