problem with python get_file_properties #713

@peder45552

Description

Environment

Ubuntu 20.04 on both servers
python 3.8.10 on both servers

MS modules on dev
azure-core 1.28.0
azure-storage-blob 12.17.0
azure-storage-file-datalake 12.12.0

MS modules on prod
azure-core 1.25.0
azure-storage-blob 12.14.0b2
azure-storage-file-datalake 12.9.0b1
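
For comparing the two servers, a small script like the following could print the module versions in the same form on each host. It is only a sketch: the package list is taken from the lists above, and the function name installed_versions is made up for this example.

```python
from importlib import metadata
import platform

# Distribution names taken from the environment lists above.
PACKAGES = ("azure-core", "azure-storage-blob", "azure-storage-file-datalake")


def installed_versions(packages=PACKAGES):
    """Return {name: version} for each package, with None if it is not installed."""
    versions = {"python": platform.python_version()}
    for name in packages:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


if __name__ == "__main__":
    for name, version in installed_versions().items():
        print(f"{name}: {version}")
```

Running it on both servers and diffing the output shows whether the installed versions really match after an upgrade or a rollback.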

Background

We have a Linux prod server from which we upload files to Azure Data Lake using Python scripts.
We have the Microsoft module azure-storage-file-datalake installed, which pulled in azure-core and azure-storage-blob.
Our scripts use an adls class in which we wrapped our own functions around the MS functions. Our upload code checks
whether the file already exists and whether its size differs. We used the MS function get_file_properties for that check,
and then uploaded the file.
Sometime during fall 2022 our script started failing during the upload. It turned out that our function calling the MS function
get_file_properties timed out after 3+ minutes. We could not find any information on why that started happening. We ended up
rewriting the check (file exists, file size) as a function that uses MS get_paths, loops through the files until the target is
found (or not), and returns data about that file.
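
The get_paths workaround looks roughly like the sketch below. This is not our actual code: the function name find_path_by_listing and the returned dict are made up for illustration. It assumes file_system_client behaves like a FileSystemClient from azure-storage-file-datalake, and that each listed entry has name (full path from the file system root), is_directory, content_length and last_modified.

```python
from typing import Any, Optional


def find_path_by_listing(file_system_client: Any, file_path: str) -> Optional[dict]:
    """Return basic metadata for file_path, or None if it is not in the listing.

    file_system_client is assumed to expose get_paths(path=..., recursive=...)
    the way azure-storage-file-datalake's FileSystemClient does.
    """
    target = file_path.strip("/")
    # List only the parent directory (None means the file system root).
    parent = target.rsplit("/", 1)[0] if "/" in target else None
    for entry in file_system_client.get_paths(path=parent, recursive=False):
        # Assumption: entry.name is the full path from the file system root.
        if entry.name.strip("/") == target:
            return {
                "name": entry.name,
                "is_directory": bool(entry.is_directory),
                "size": entry.content_length,
                "last_modified": entry.last_modified,
            }
    return None
```

Listing only the parent folder keeps the loop short, but it still costs one listing per check, so it is slower than a single get_file_properties call on large folders.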

Problem

This year we needed to test (from the prod server) against our Azure dev Data Lake. We discovered that the function using get_paths
did not work against adls dev, but the MS function get_file_properties did.
Same code, same MS module versions.

We were able to get a dev Linux server, where we installed our software and the MS modules. On the dev server the MS function get_file_properties worked.
We noticed that the two servers had different versions of the MS modules. We wrote a handful of test scripts that run against adls prod and dev.
They test functions to:
get metadata for a file and print the file size
get metadata for a folder and print last_modified
(uses MS function get_file_properties)

list folders in a folder
list files in a folder
(uses MS get_paths, checks for file or dir, returns the object)

upload a file; this includes checking that the parent exists, whether the file exists, and its size

download a file; this includes checking that the file exists

These test scripts were run against adls prod and dev (2 different file systems).
All tests ran successfully on the Linux dev server.

We assumed the difference in MS module versions was the reason for our problem on the Linux prod server.
Since all tests were successful on Linux dev, we upgraded Linux prod: we installed the same test scripts
and upgraded the MS modules to the same versions.

On Linux prod, all test scripts failed against adls dev.
The test script that uploads files to adls prod worked after we changed the part of the code that uses the MS function get_file_properties.
The test that downloads from adls prod failed.

We ended up rolling back our software and the MS module versions on Linux prod.

How do we troubleshoot this?
It works on one Linux server with the same OS, the same Python, and the same tokens for adls prod and dev.
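
One check that might help narrow this down is whether both servers can reach the storage endpoints at the network level, independent of the MS modules. The sketch below uses only the standard library. It rests on an assumption that has not been verified here: that the Data Lake client may talk to both the dfs and the blob endpoint of the storage account, so a firewall or DNS difference on one of them could explain why only some calls hang. The host name pattern is the public Azure one, and the function names and "myaccount" are hypothetical.

```python
import socket
import time


def probe_tcp(host, port=443, timeout=10.0):
    """Try a plain TCP connect and report success, elapsed seconds and any error."""
    started = time.monotonic()
    try:
        with socket.create_connection((host, port), timeout=timeout):
            ok, error = True, None
    except OSError as exc:  # also covers DNS failures and timeouts
        ok, error = False, f"{type(exc).__name__}: {exc}"
    return {
        "host": host,
        "port": port,
        "ok": ok,
        "seconds": round(time.monotonic() - started, 3),
        "error": error,
    }


def storage_hosts(account_name):
    """Assumed public endpoint names for a storage account (dfs and blob)."""
    return [
        f"{account_name}.dfs.core.windows.net",
        f"{account_name}.blob.core.windows.net",
    ]


# Example use on each server ("myaccount" is a placeholder):
# for host in storage_hosts("myaccount"):
#     print(probe_tcp(host))
```

If one endpoint connects quickly on Linux dev but times out on Linux prod, the problem is more likely in the network path (firewall, proxy, DNS, private endpoint) than in the module versions.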
