Description
Environment
Ubuntu 20.04 on both servers
python 3.8.10 on both servers
MS modules on dev
azure-core 1.28.0
azure-storage-blob 12.17.0
azure-storage-file-datalake 12.12.0
MS modules on prod
azure-core 1.25.0
azure-storage-blob 12.14.0b2
azure-storage-file-datalake 12.9.0b1
Background
We have a Linux prod server from which we upload files into Azure Datalake using Python scripts.
We have the Microsoft module azure-storage-file-datalake installed, which pulled in azure-core and azure-storage-blob.
Our scripts use an adls class in which we wrap our functions around the MS functions. Our upload code checks
whether the file already exists and whether its size is different. We used the MS function get_file_properties for that,
and then uploaded the file.
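A minimal sketch of the existence and size check described above. It is not the actual adls class: the function name needs_upload and its parameters are hypothetical, and file_system_client is assumed to be an azure.storage.filedatalake FileSystemClient.

```python
import os


def needs_upload(file_system_client, remote_path, local_path):
    """Return True if remote_path is missing or differs in size from local_path.

    file_system_client is assumed to be an
    azure.storage.filedatalake.FileSystemClient (hypothetical wiring).
    """
    local_size = os.path.getsize(local_path)
    file_client = file_system_client.get_file_client(remote_path)

    # exists() returns False when the path is not found, instead of raising.
    if not file_client.exists():
        return True

    # get_file_properties() returns a FileProperties object; .size is in bytes.
    remote_size = file_client.get_file_properties().size
    return remote_size != local_size
```

Both exists() and get_file_properties() make a request to the service per file, so a hang in that request would stall this check.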
Sometime during fall 2022 our script started failing during the upload, and it turned out that our function using the MS function get_file_properties
timed out after 3+ minutes. We could not get any info on why that started happening. We ended up rewriting our code for checking
whether a file exists and getting its size with a function that uses MS get_paths, loops through all the files until the file is found (or not),
and returns data about that file.
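The get_paths workaround can be sketched roughly as follows. The function name find_path_properties and its parameters are hypothetical, and file_system_client is again assumed to be a FileSystemClient. The entries yielded by get_paths are PathProperties objects, which carry name, is_directory, content_length and last_modified.

```python
import posixpath


def find_path_properties(file_system_client, directory, file_name):
    """Scan one directory with get_paths() and return the matching file entry, or None."""
    directory = directory.strip("/")
    wanted = posixpath.join(directory, file_name)

    # recursive=False limits the listing to the direct children of `directory`.
    for path in file_system_client.get_paths(path=directory, recursive=False):
        # PathProperties.name is the path relative to the file system root.
        if path.name == wanted and not path.is_directory:
            return path
    return None
```

A caller would read the size from the returned entry's content_length. The cost grows with the number of entries in the directory, since the listing is scanned until the file is found.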
Problem
This year we needed to test (from the prod server) against our Azure dev Datalake. We discovered that the function using get_paths
did not work against adls dev, but the MS function get_file_properties did.
Same code, same MS module versions.
We were able to get a dev Linux server, installed our software, and installed the MS modules. On the dev server the MS function get_file_properties worked.
We noticed that the two servers had different versions of the MS modules. We wrote a handful of test scripts that check against adls prod and dev.
Tested functions for
get metadata for a file, and print the file size.
get metadata for a folder, and print last_modified
(used MS function get_file_properties)
list folders in a folder
list files in a folder
(used MS get_paths, check for file or dir, return object)
upload a file; this includes checking if the parent exists, if the file exists, and checking the size.
download a file; this includes checking if the file exists.
These test scripts were run against adls prod and dev (2 different file systems).
All tests ran successfully on the Linux dev server.
We assumed the difference in the MS module versions was the reason for our problem on the Linux prod server.
Since all tests were successful on Linux dev, we upgraded our Linux prod: installed the same test scripts and
upgraded the MS modules to the same versions.
The test scripts all failed on Linux prod against adls dev.
The test script to upload files to adls prod worked, after we changed the part of the code that uses the MS function get_file_properties.
The test to download from adls prod failed.
We ended up rolling back our software and rolling back the MS module versions on Linux prod.
How do we troubleshoot this?
It works on one Linux server with the same OS, same Python, and same tokens for adls prod and dev.
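One way to narrow this down is to dump the environment on both servers and diff the output, then turn on SDK logging for a failing call. The sketch below is only a starting point under stated assumptions: the helper names are hypothetical, and the extra packages in PACKAGES (requests, urllib3, certifi) are a guess at which transitive dependencies might differ between the servers, not a confirmed cause.

```python
import logging
import platform
import sys
from importlib import metadata

# The three MS modules from the Environment section, plus guessed HTTP-stack packages.
PACKAGES = [
    "azure-core",
    "azure-storage-blob",
    "azure-storage-file-datalake",
    "requests",
    "urllib3",
    "certifi",
]


def installed_versions(names=PACKAGES):
    """Map each distribution name to its installed version, or None if absent."""
    versions = {}
    for name in names:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            versions[name] = None
    return versions


def environment_report(names=PACKAGES):
    """Collect the interpreter details and package versions in one dict."""
    report = {"python": platform.python_version(), "executable": sys.executable}
    report.update(installed_versions(names))
    return report


def enable_azure_debug_logging():
    """Send DEBUG output from the SDK's "azure" logger to stderr."""
    logger = logging.getLogger("azure")
    logger.setLevel(logging.DEBUG)
    logger.addHandler(logging.StreamHandler(sys.stderr))
    return logger


if __name__ == "__main__":
    for key, value in sorted(environment_report().items()):
        print(f"{key}: {value}")
```

Running this on both servers and comparing the printed lines shows whether anything besides the three MS modules differs. For request-level detail, the Azure SDK clients also accept logging_enable=True, either when the client is constructed or on an individual call, which writes HTTP request and response information to the "azure" logger at DEBUG level.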