Enable vertical text detection for rotated images #4328

Merged
vladimir-kivi-ds merged 16 commits into main from vk/enable-vertical-text-detection-for-rotated-pages
Apr 14, 2026

@vladimir-kivi-ds
Contributor

No description provided.

@vladimir-kivi-ds vladimir-kivi-ds self-assigned this Apr 8, 2026
@vladimir-kivi-ds
Contributor Author

vladimir-kivi-ds commented Apr 8, 2026

Collaborator

@badGarnet badGarnet left a comment

It would be good to have a behavior-change test, e.g., a rotated page partitioned with hi-res whose detected text groupings are now correct.
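
As a rough illustration of the kind of behavior-change test requested here, the sketch below partitions a rotated page with the `hi_res` strategy and checks that phrases expected to stay together land inside a single element. This is not the test added in this PR: the fixture path, the expected phrases, and both helper names are hypothetical, and only `partition_pdf(filename=..., strategy="hi_res")` is taken from the library.

```python
from pathlib import Path

# Hypothetical fixture and phrases; substitute a real rotated document.
ROTATED_PDF = Path("example-docs/rotated-page.pdf")
EXPECTED_PHRASES = ["Quarterly revenue summary"]


def phrases_not_kept_together(element_texts, expected_phrases):
    """Return the expected phrases that do not appear intact in any single element."""
    return [
        phrase
        for phrase in expected_phrases
        if not any(phrase in text for text in element_texts)
    ]


def rotated_page_grouping_failures(pdf_path=ROTATED_PDF, expected_phrases=EXPECTED_PHRASES):
    """Partition a rotated page with hi_res and report phrases that were split up."""
    # Imported lazily: requires unstructured with its hi_res dependencies installed.
    from unstructured.partition.pdf import partition_pdf

    elements = partition_pdf(filename=str(pdf_path), strategy="hi_res")
    return phrases_not_kept_together([el.text for el in elements], expected_phrases)
```

A pytest test could then assert that `rotated_page_grouping_failures()` returns an empty list, which would fail if the rotated text were fragmented across elements.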

@vladimir-kivi-ds
Contributor Author

> It would be good to have a behavior-change test, e.g., a rotated page partitioned with hi-res whose detected text groupings are now correct.

Something like this?

vladimir-kivi-ds and others added 10 commits April 9, 2026 03:21
…ures update (#4331)

This pull request includes updated ingest test fixtures.
Please review and merge if appropriate.

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Low Risk**
> Updates are limited to test fixture golden files (HTML/JSON) with
small text/id changes, so production behavior is unaffected; risk is
mainly around masking or legitimizing unintended extraction regressions.
> 
> **Overview**
> Updates ingest golden fixtures for `layout-parser-paper.pdf`
structured output in both HTML and JSON.
> 
> The expected extracted content changes slightly (author line character
corrections and an added trailing page number in a `ListItem`) and
corresponding `element_id`s are updated to match the new extraction
output.
> 
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit
191ba7e. Bugbot is set up for automated
code reviews on this repo. Configure
[here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

Co-authored-by: vladimir-kivi-ds <vladimir-kivi-ds@users.noreply.github.com>
… github.com:Unstructured-IO/unstructured into vk/enable-vertical-text-detection-for-rotated-pages
adlfs 2026.4.0 shipped a breaking auth change: the default `anon`
behavior flipped to False, so code that previously read public Azure
blobs anonymously now tries DefaultAzureCredential unless `anon=True`
is set explicitly. That matches the CI failure here.

This pins `adlfs==2026.2.0` to temporarily unblock changes that fail
when bumping to the latest version.
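
For callers that read public containers, passing `anon=True` explicitly removes the dependence on the library default described above. The following is a minimal sketch under that assumption; the helper names and the account name in the usage note are hypothetical, while `AzureBlobFileSystem` and its `account_name` and `anon` arguments come from adlfs.

```python
from typing import Any


def public_blob_storage_options(account_name: str) -> dict[str, Any]:
    """Build storage options for anonymous access to a public Azure container.

    anon=True is set explicitly so the result does not depend on the
    adlfs default, which this commit message reports as changed in 2026.4.0.
    """
    return {"account_name": account_name, "anon": True}


def open_public_container(account_name: str):
    """Create an adlfs filesystem for a public container."""
    # Imported lazily so the options helper above has no third-party dependency.
    from adlfs import AzureBlobFileSystem

    return AzureBlobFileSystem(**public_blob_storage_options(account_name))
```

For example, `open_public_container("examplepublicaccount")` would build a filesystem that skips credential lookup, assuming adlfs is installed and the account exposes a public container.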

<!-- CURSOR_SUMMARY -->
---

> [!NOTE]
> **Medium Risk**
> Pins `adlfs` to an older version to work around an Azure
public-container ingest regression, which can affect Azure connector
behavior at runtime. The rest is CI-only fixture update plumbing and is
low risk.
> 
> **Overview**
> Pins `adlfs` to `==2026.2.0` via `pyproject.toml` UV constraints (and
updates `uv.lock`) to avoid a regression affecting anonymous access to
public Azure blob containers.
> 
> Updates the `ingest-test-fixtures-update-pr` GitHub Actions workflow
to also generate and include `expected-structured-output-markdown`
fixtures in the auto-created PRs, and records this change in the
`CHANGELOG.md`.
> 
> <sup>Reviewed by [Cursor Bugbot](https://cursor.com/bugbot) for commit
88a7801. Bugbot is set up for automated
code reviews on this repo. Configure
[here](https://www.cursor.com/dashboard/bugbot).</sup>
<!-- /CURSOR_SUMMARY -->

---------

Co-authored-by: Vladimir Kirilenko <vladimir.kirilenko@deepsense.ai>
@vladimir-kivi-ds vladimir-kivi-ds added this pull request to the merge queue Apr 14, 2026
Merged via the queue into main with commit dfb1653 Apr 14, 2026
52 checks passed
@vladimir-kivi-ds vladimir-kivi-ds deleted the vk/enable-vertical-text-detection-for-rotated-pages branch April 14, 2026 01:22
Labels

None yet

Projects

None yet

3 participants