@vustef
Collaborator

@vustef vustef commented Dec 24, 2025

I'm adding support for adls.sas-token.xxx properties. As noted in this ticket: apache#1442, this is needed for vended credentials for Azure.
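As a rough illustration of what a per-account lookup could look like, here is a minimal sketch. The prefix `adls.sas-token.` is taken from the description above; the helper name `find_sas_token` and the rule that the key suffix is either the bare account name or a host name starting with it are assumptions for illustration, not the code in this PR.

```rust
use std::collections::HashMap;

/// Property prefix as written in the description above (assumed exact spelling).
const SAS_TOKEN_PREFIX: &str = "adls.sas-token.";

/// Hypothetical helper: return the SAS token configured for `account`, if any.
/// A key matches when the part after the prefix is the account name itself or
/// a host name that starts with "<account>." (an assumption, not confirmed).
fn find_sas_token<'a>(props: &'a HashMap<String, String>, account: &str) -> Option<&'a str> {
    props.iter().find_map(|(key, value)| {
        let suffix = key.strip_prefix(SAS_TOKEN_PREFIX)?;
        let matches = suffix == account
            || suffix
                .strip_prefix(account)
                .is_some_and(|rest| rest.starts_with('.'));
        matches.then_some(value.as_str())
    })
}
```

Matching on a full label rather than a plain string prefix keeps an account named `myacc` from picking up the token of `myaccount`.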

Also, Azure storage addressing is a bit confusing. There are three independent axes that combine: the URL scheme (abfss/wasbs), the endpoint (.blob/.dfs), and the storage account type (blob storage with or without hierarchical namespace). The first is intended to select a driver, the second an endpoint, and the third is what is actually present on the server side. Because these semantics are unclear, nobody is sure what each one means, and different systems implement different things.
For example, Snowflake doesn't support writing to .blob endpoints for externally managed Iceberg tables. So when testing I create the table with wasbs://...blob.... However, Snowflake then keeps creating paths with abfss://...blob. iceberg-rust doesn't allow that, because it requires the driver (scheme) to match the endpoint. That seems to be a completely unnecessary restriction, so in this PR I'm removing it.
Further, iceberg-rust delegates to OpenDAL. Before it does that, it replaces .blob with .dfs in the https:// request that it forms, because iceberg-rust always uses the AzdlsBackend OpenDAL backend and never AzblobBackend.
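The endpoint rewrite described here can be sketched as follows. The function names `to_dfs_host` and `https_endpoint` are hypothetical and only illustrate the idea; this is not the actual iceberg-rust code.

```rust
/// Hypothetical helper: rewrite a `.blob.` host label to `.dfs.`.
/// Only the first occurrence is replaced; other hosts pass through unchanged.
fn to_dfs_host(host: &str) -> String {
    host.replacen(".blob.", ".dfs.", 1)
}

/// Hypothetical helper: build the https:// endpoint handed to the backend.
fn https_endpoint(host: &str) -> String {
    format!("https://{}", to_dfs_host(host))
}
```

With this sketch, both `myaccount.blob.core.windows.net` and `myaccount.dfs.core.windows.net` end up as the same `.dfs` endpoint.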

To summarize my understanding:

  1. abfss/wasbs only indicate which driver to use (a legacy from the Hadoop world). Most systems don't honour them in any way. They are just two forms of an Azure URL and can be used interchangeably.
  2. dfs/blob are different endpoints, requiring different HTTP headers in requests and responses.
  3. The actual storage can be either blob storage or ADLS Gen 2 (blob storage with hierarchical namespace, aka HNS).
  4. All of these can be mixed and matched, or at least that should be possible.
  5. There are some limitations when mixing them, though. E.g. soft delete mustn't be enabled if you use the .dfs endpoint. I think HNS is also incompatible with soft delete.
  6. OpenDAL supports both blob and azdls, but we only use azdls, so the iceberg-rust restriction of wasbs=>blob and abfss=>dfs is completely artificial.
  7. Snowflake is mistaken to always use abfss here, but it is what it is (it requires abfss here: https://docs.snowflake.com/user-guide/opencatalog/create-catalog#id1)
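To make the "mix and match" point concrete, here is a minimal sketch of a parser that accepts any scheme with any endpoint. It assumes the usual `<scheme>://<container>@<account>.<blob|dfs>.<suffix>/<path>` URL shape; the type and function names (`Endpoint`, `AzureLocation`, `parse_azure_url`) are hypothetical and not the iceberg-rust API.

```rust
#[derive(Debug, PartialEq)]
enum Endpoint {
    Blob,
    Dfs,
}

#[derive(Debug, PartialEq)]
struct AzureLocation {
    container: String,
    account: String,
    endpoint: Endpoint,
    path: String,
}

/// Hypothetical parser: the scheme is validated but does not constrain
/// which endpoint may follow it.
fn parse_azure_url(url: &str) -> Option<AzureLocation> {
    let (scheme, rest) = url.split_once("://")?;
    if !matches!(scheme, "abfs" | "abfss" | "wasb" | "wasbs") {
        return None;
    }
    // Everything up to the first '/' is the authority; the rest is the path.
    let (authority, path) = rest.split_once('/').unwrap_or((rest, ""));
    let (container, host) = authority.split_once('@')?;
    // Host labels: <account>.<blob|dfs>.<suffix>
    let mut labels = host.splitn(3, '.');
    let account = labels.next()?;
    let endpoint = match labels.next()? {
        "blob" => Endpoint::Blob,
        "dfs" => Endpoint::Dfs,
        _ => return None,
    };
    Some(AzureLocation {
        container: container.to_string(),
        account: account.to_string(),
        endpoint,
        path: path.to_string(),
    })
}
```

Under these assumptions, an `abfss://...blob...` path of the kind Snowflake produces parses the same way as a `wasbs://...blob...` one.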

The proper fix, though, would be to choose the OpenDAL backend based on the endpoint. If we did that, we could also support Azurite for tests. However, I still think wasbs/abfss are just artificial limitations that don't matter much. But so is the scheme (I think we should auto-recognize it from the endpoint), so it may very well be that, in the current architecture of iceberg-rust, we end up splitting into different OpenDAL backends based on scheme and only allow pairing abfss with .dfs endpoints. However, that would break the paths that Snowflake produces in Open Catalog, so I don't think this is a good direction.
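A minimal sketch of endpoint-based backend selection, as opposed to scheme-based selection, might look like this. The `Backend` enum and `backend_for_host` are hypothetical names used for illustration; the variants only mirror the two OpenDAL backends mentioned above and no OpenDAL types are used.

```rust
/// Hypothetical stand-in for the two OpenDAL backends discussed above.
#[derive(Debug, PartialEq, Clone, Copy)]
enum Backend {
    Azblob,
    Azdls,
}

/// Hypothetical selector: pick the backend from the second host label
/// (`<account>.<blob|dfs>.<suffix>`), ignoring the URL scheme entirely.
fn backend_for_host(host: &str) -> Option<Backend> {
    let mut labels = host.split('.');
    let _account = labels.next()?;
    match labels.next()? {
        "blob" => Some(Backend::Azblob),
        "dfs" => Some(Backend::Azdls),
        _ => None,
    }
}
```

Local emulator hosts such as Azurite's do not follow this host shape, so a real implementation would need an extra rule for them; this sketch simply returns `None` in that case.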

@vustef vustef requested a review from gbrgr December 24, 2025 15:21
Base automatically changed from vs-credentials to main December 26, 2025 10:38
@vustef vustef marked this pull request as ready for review December 26, 2025 10:53
@vustef vustef changed the title from "azdls.sas-token. prefix support" to "azdls.sas-token.<account> support for vended credentials" Dec 26, 2025
@vustef
Collaborator Author

vustef commented Dec 26, 2025

Merging this. Will welcome post-merge reviews after people are back from holidays.

@vustef vustef merged commit aec29ba into main Dec 26, 2025
14 checks passed
@vustef vustef deleted the vs-azdls-sas-token-prefix branch December 26, 2025 11:22