fix: accept any IO[bytes] object in convert_to_bytes() #4241
Closed
bittoby wants to merge 11 commits into Unstructured-IO:main
…ueError crash when partitioning files opened from a zip archive
@badGarnet Could you please review this PR? Thank you.
@badGarnet Please give me any feedback. Thanks!
@cragwolfe Could you please review this PR?
These test failures are not related to my changes. Please review my PR, @badGarnet @qued.
Closes: #4097
Problem
When a user opens a file from inside a zip archive and passes it directly into
partition(), the library crashes with ValueError: Invalid file-like object type.
Uploading a text file directly works fine. Uploading a zip containing that same text file
fails every time, even though the file inside the zip is perfectly readable.
Root Cause
The convert_to_bytes() function only accepted a fixed list of known file types.
Anything not on that list was immediately rejected with an error: no attempt to read it,
no fallback, just a crash.
The file object returned when opening a file from a zip archive is not on that list,
so it was always rejected. This was a flaw in the design of the function: it checked
what the file was instead of checking what it could do.
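The mismatch can be seen with the standard library alone. The snippet below is a small illustration, not code from this PR: it builds a zip in memory, opens a member, and compares an isinstance check against a read() capability check. The tuple of "known" types here is an assumption chosen for the demonstration and is not claimed to be the exact list inside convert_to_bytes().

```python
import io
import zipfile

# Build a small zip archive in memory so the example needs no files on disk.
buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("note.txt", "hello from inside the zip")

with zipfile.ZipFile(buf) as zf:
    with zf.open("note.txt") as member:
        # Assumed stand-in for a fixed allow-list of file types.
        is_known_type = isinstance(member, (io.BytesIO, io.BufferedReader))
        # Capability check: does the object expose a callable read()?
        can_read = callable(getattr(member, "read", None))
        data = member.read()

print(type(member).__name__, is_known_type, can_read)
```

The member is a zipfile.ZipExtFile: it fails the type check even though read() works and returns the file's bytes.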
Fix
Replaced the rigid type-checking approach with a simple capability check.
Before giving up, the function now asks: does this object support reading?
If yes, read it. This makes the function behave correctly for zip archive files
and any other standard readable file object that was previously unrecognised,
without changing how any of the existing accepted types are handled.
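As a rough sketch of the capability-check idea described above (and not the actual diff in this PR), a function along these lines would accept zip members. The name convert_to_bytes_sketch, the handling of BytesIO, and the extra guard on the return type of read() are all assumptions made for illustration; only the error message text comes from the description above.

```python
import io
import zipfile
from typing import IO, Union


def convert_to_bytes_sketch(file: Union[bytes, IO[bytes]]) -> bytes:
    """Hypothetical stand-in for convert_to_bytes(); not the library's code."""
    if isinstance(file, bytes):
        return file
    if isinstance(file, io.BytesIO):
        # Previously accepted type, handled the same way as before (assumed).
        return file.getvalue()
    # Capability check: before giving up, ask whether the object can be read.
    read = getattr(file, "read", None)
    if callable(read):
        data = read()
        if isinstance(data, bytes):
            return data
    raise ValueError("Invalid file-like object type")


# Usage: a member opened from an in-memory zip archive is accepted.
archive = io.BytesIO()
with zipfile.ZipFile(archive, "w") as zf:
    zf.writestr("note.txt", "hello")

with zipfile.ZipFile(archive) as zf, zf.open("note.txt") as member:
    result = convert_to_bytes_sketch(member)
```

Objects without a usable read(), or whose read() returns text rather than bytes, still raise the same ValueError, so the failure mode for genuinely unsupported inputs is unchanged in this sketch.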