[Python][FS][Azure] Pickling SubTreeFileSystem(base_path, AzureFileSystem(...)) is lossy #49078

@Tom-Newton

Describe the bug, including details regarding any error messages, version, and platform.

Reproduce:

import pyarrow.fs

azure_fs = pyarrow.fs.AzureFileSystem(account_name="test", sas_token="test")
print(azure_fs.__reduce__())

subtree_fs = pyarrow.fs.SubTreeFileSystem("/tmp", azure_fs)
print(subtree_fs.base_fs.__reduce__())

Returns

(<cyfunction AzureFileSystem._reconstruct at 0x79d22010c940>, ({'account_name': 'test', 'account_key': '', 'blob_storage_authority': '.blob.core.windows.net', 'blob_storage_scheme': 'https', 'client_id': '', 'client_secret': '', 'dfs_storage_authority': '.dfs.core.windows.net', 'dfs_storage_scheme': 'https', 'sas_token': 'test', 'tenant_id': ''},))
(<cyfunction AzureFileSystem._reconstruct at 0x79d22010c940>, ({'account_name': 'test', 'account_key': '', 'blob_storage_authority': '.blob.core.windows.net', 'blob_storage_scheme': 'https', 'client_id': '', 'client_secret': '', 'dfs_storage_authority': '.dfs.core.windows.net', 'dfs_storage_scheme': 'https', 'sas_token': '', 'tenant_id': ''},))

Notice that sas_token is non-empty in the first result but empty in the second.

Cause:

The sas_token and a couple of the other values returned by AzureFileSystem.__reduce__ are read from self on the Python-side AzureFileSystem object. When a SubTreeFileSystem is constructed, the Python-side AzureFileSystem object is discarded and the SubTreeFileSystem holds only a pointer to the CAzureFileSystem. It is therefore not possible to reconstruct a Python-side AzureFileSystem that includes the attributes stored on self of the original AzureFileSystem.
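
The mechanism described above can be illustrated without pyarrow. The following is a minimal pure-Python sketch, not pyarrow's actual implementation: CoreFS, WrapperFS, SubTree, wrap, and round_trip are hypothetical stand-ins for CAzureFileSystem, the Python-side AzureFileSystem, SubTreeFileSystem, the re-wrapping step, and a pickle round trip respectively. It assumes that the wrapper keeps sas_token only on itself and that the subtree keeps only the core object.

```python
class CoreFS:
    """Hypothetical stand-in for the C++ CAzureFileSystem (no sas_token here)."""

    def __init__(self, account_name):
        self.account_name = account_name


class WrapperFS:
    """Hypothetical stand-in for the Python-side AzureFileSystem."""

    def __init__(self, account_name, sas_token=""):
        self.core = CoreFS(account_name)
        # Assumption: this value lives only on the Python wrapper.
        self.sas_token = sas_token

    @classmethod
    def wrap(cls, core):
        # Build a fresh wrapper around an existing core object.
        # The original wrapper's attributes are not available at this point.
        obj = cls.__new__(cls)
        obj.core = core
        obj.sas_token = ""
        return obj

    @staticmethod
    def _reconstruct(kwargs):
        return WrapperFS(**kwargs)

    def __reduce__(self):
        # sas_token is read from self, account_name from the core object.
        state = {
            "account_name": self.core.account_name,
            "sas_token": self.sas_token,
        }
        return WrapperFS._reconstruct, (state,)


class SubTree:
    """Hypothetical stand-in for SubTreeFileSystem."""

    def __init__(self, base_path, base_fs):
        self.base_path = base_path
        self._core = base_fs.core  # the Python wrapper itself is dropped

    @property
    def base_fs(self):
        return WrapperFS.wrap(self._core)


def round_trip(obj):
    # Apply __reduce__ by hand, roughly what pickle does on dumps + loads.
    func, args = obj.__reduce__()
    return func(*args)


fs = WrapperFS("test", sas_token="test")
sub = SubTree("/tmp", fs)

direct = round_trip(fs)
via_subtree = round_trip(sub.base_fs)
print(repr(direct.sas_token), repr(via_subtree.sas_token))
```

In this sketch the direct round trip keeps the token while the one that goes through the subtree's base_fs loses it, because the only place the token was stored was the wrapper that the subtree did not retain.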

Component(s)

Python
