build(deps): update unstructured[all-docs] requirement from <1.0.0,>=0.18.31 to >=0.22.18,<1.0.0 #567

Open
dependabot[bot] wants to merge 1 commit into main from dependabot/uv/unstructured-all-docs--gte-0.22.18-and-lt-1.0.0

dependabot bot commented on behalf of github on Apr 13, 2026

Updates the requirements on unstructured[all-docs] to permit the latest version.
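As a rough illustration of what the new range `>=0.22.18,<1.0.0` admits, the sketch below compares plain dotted release numbers as integer tuples. The helper names are hypothetical, and the comparison deliberately ignores pre-releases and the other PEP 440 rules that a real resolver such as uv applies.

```python
def parse_release(text: str) -> tuple:
    """Parse a plain dotted release such as '0.22.18' into a tuple of ints."""
    return tuple(int(part) for part in text.split("."))


# Bounds taken from the updated requirement in this PR.
LOWER = parse_release("0.22.18")  # inclusive lower bound (>=)
UPPER = parse_release("1.0.0")    # exclusive upper bound (<)


def satisfies_new_requirement(version: str) -> bool:
    """Return True if a plain release number falls inside >=0.22.18,<1.0.0."""
    return LOWER <= parse_release(version) < UPPER


if __name__ == "__main__":
    for candidate in ("0.18.31", "0.22.18", "0.23.0", "1.0.0"):
        print(candidate, satisfies_new_requirement(candidate))
```

Under these assumptions the old floor, 0.18.31, no longer satisfies the requirement, while 0.22.18 and later 0.x releases do.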

Release notes

Sourced from unstructured[all-docs]'s releases.

0.22.18

What's Changed

Full Changelog: Unstructured-IO/unstructured@0.22.16...0.22.18

Changelog

Sourced from unstructured[all-docs]'s changelog.

0.22.18

Enhancements

  • Add page number support to v1 HTML parser: The v1 HTML parser now reads data-page-number attributes from ancestor elements and includes the page number in element metadata, consistent with the v2 parser behavior.
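The ancestor lookup described in this entry can be sketched with the standard-library HTML parser. This is not the library's implementation: the class and function names are hypothetical, and the only detail taken from the notes is that a `data-page-number` attribute on an ancestor applies to the text nested inside it.

```python
from html.parser import HTMLParser

# Tags that never have a closing tag, so they must not affect the stack.
VOID_TAGS = {"area", "base", "br", "col", "embed", "hr", "img",
             "input", "link", "meta", "source", "track", "wbr"}


class PageNumberTracker(HTMLParser):
    """Collect (text, page_number) pairs, inheriting the page from ancestors."""

    def __init__(self):
        super().__init__()
        self._stack = []  # page number in effect for each open tag
        self.texts = []   # collected (text, page_number) pairs

    def handle_starttag(self, tag, attrs):
        if tag in VOID_TAGS:
            return
        inherited = self._stack[-1] if self._stack else None
        value = dict(attrs).get("data-page-number")
        own = int(value) if value and value.isdigit() else None
        # A tag's own attribute wins; otherwise fall back to the ancestor's.
        self._stack.append(own if own is not None else inherited)

    def handle_endtag(self, tag):
        if tag in VOID_TAGS:
            return
        if self._stack:
            self._stack.pop()

    def handle_data(self, data):
        text = data.strip()
        if text:
            page = self._stack[-1] if self._stack else None
            self.texts.append((text, page))


def text_with_pages(html: str) -> list:
    """Return the (text, page_number) pairs found in an HTML string."""
    tracker = PageNumberTracker()
    tracker.feed(html)
    tracker.close()
    return tracker.texts
```

Text outside any element carrying the attribute is reported with a page number of `None` in this sketch.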

0.22.17

Fixes

  • Preserve semantic table headers across carried chunks: Carried rows in split table chunks now keep original header semantics (th stays th, including section header rows and wrapped header text), preventing header cells from degrading to data cells in continuation chunks.

0.22.16

Enhancements

  • Formula markdown export (element_to_md / elements_to_md): New keyword-only formula_markdown_style ("auto", "display_math", "plain"; default "auto"). In "auto", display math ($$ ... $$) is used only when the text looks like notation (heuristic score) and contains no $/$$ (avoids breaking Markdown and noisy OCR captions). "display_math" wraps whenever safe (still falls back to plain if $ would corrupt fences). "plain" emits text only. Optional normalize_formula (default True) maps common Unicode operators to LaTeX-like tokens; normalize_formula stays before keyword-only options so positional encoding / no_group_by_page callers are unchanged. Unicode is never mapped to \\sqrt{}. Module constants: FORMULA_MARKDOWN_AUTO, FORMULA_MARKDOWN_DISPLAY_MATH, FORMULA_MARKDOWN_PLAIN.

0.22.15

Security

  • security: fix(deps): upgrade vulnerable transitive dependencies [security]

0.22.14

Enhancements

  • Deduplicate PDF rendering: Remove _render_pdf_pages and delegate to unstructured-inference's convert_pdf_to_image (which already has lazy per-page rendering). Peak memory for path_only=True drops from O(n_pages) to O(1 page) — 97% reduction on a 100-page PDF. Bumps inference dep to >=1.6.2.

0.22.13

Enhancements

  • Speed up standardize_quotes: Replace loop-based character replacement with a single str.translate() call using a pre-computed translation table. Also fixes a pre-existing bug where left smart quotes were never normalized due to duplicate dictionary keys.
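The technique named here, one `str.translate()` call driven by a pre-computed table, looks roughly like the sketch below. The function name and the four-entry mapping are illustrative assumptions, not the library's actual table. For context on the bug mentioned above: a Python dict literal that repeats a key silently keeps only the last value, so a mapping built that way can lose entries without any error.

```python
# Illustrative, partial mapping; the library's real table is not shown in these notes.
_QUOTE_MAP = {
    "\u201c": '"',  # left double quotation mark
    "\u201d": '"',  # right double quotation mark
    "\u2018": "'",  # left single quotation mark
    "\u2019": "'",  # right single quotation mark
}

# Built once at import time, so each call does a single pass over the text.
_QUOTE_TABLE = str.maketrans(_QUOTE_MAP)


def normalize_quotes(text: str) -> str:
    """Replace smart quotes with ASCII quotes in one translate() pass."""
    return text.translate(_QUOTE_TABLE)


if __name__ == "__main__":
    print(normalize_quotes("\u201cIt\u2019s fine,\u201d she said."))
```

Using distinct escape sequences as keys, as above, makes an accidental duplicate easier to spot than visually similar literal quote characters.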

0.22.12

Fixes

  • Fix fast strategy silently skipping text in some PDFs: Certain PDF generators (e.g. Prince XML) embed font encoding data in a non-standard way that pdfminer.six does not handle, causing body text to be silently dropped while headings still extract correctly. Added a workaround that reads the embedded encoding data directly.

0.22.11

Enhancements

  • Exclude unused spaCy components: Exclude ner, lemmatizer, and attribute_ruler when loading en_core_web_sm, keeping parser for accurate sentence boundaries. Saves ~7 MiB peak memory.

0.22.10

Enhancements

  • Repeat table headers across continuation chunks: Add repeat_table_headers to basic/title chunking options and table chunking internals so leading header rows are detected once and carried forward when large tables spill across multiple chunks.
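The idea of carrying header rows into continuation chunks can be sketched independently of the library. Everything below is a hypothetical illustration: the function name is invented, and splitting by row count is a simplification chosen for the example, since the library's own sizing rules are not described in these notes.

```python
def chunk_table(rows: list, header_rows: int, max_rows: int) -> list:
    """Split table rows into chunks, repeating the leading header rows in each.

    rows:        list of rows, each row a list of cell strings
    header_rows: number of leading rows treated as the header
    max_rows:    maximum rows per chunk, header rows included
    """
    if max_rows <= header_rows:
        raise ValueError("max_rows must leave room for at least one body row")

    header, body = rows[:header_rows], rows[header_rows:]
    if not body:
        return [header] if header else []

    step = max_rows - header_rows  # body rows that fit beside the header
    return [header + body[i:i + step] for i in range(0, len(body), step)]


if __name__ == "__main__":
    table = [["Name", "Qty"], ["a", "1"], ["b", "2"], ["c", "3"]]
    for chunk in chunk_table(table, header_rows=1, max_rows=3):
        print(chunk)
```

The header is identified once and then prepended to every chunk, so a continuation chunk can be read without the chunk that precedes it.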

0.22.9

Enhancements

  • Isolate Table elements during chunking: Table and TableChunk elements are always placed in their own pre-chunk and are never merged with adjacent text into a CompositeElement, nor combined with neighboring pre-chunks when combine_text_under_n_chars is enabled. Shared helpers in unstructured.chunking.base centralize the table-isolation checks. Inter-chunk overlap (overlap + overlap_all) no longer carries narrative text into table pre-chunks or table tails into following text chunks.

... (truncated)

Commits
  • d299095 feat: add page number support to v1 html partition (#4327)
  • 615782a fix(chunking): preserve semantic headers in carried table chunks (#4313)
  • 264d569 feat: render Formula elements as $$ blocks with optional normalization (#4308)
  • 051b358 fix(deps): upgrade vulnerable transitive dependencies [security] (#4318)
  • affb9d6 refactor: deduplicate PDF rendering by delegating to unstructured-inference (...
  • 8929336 perf: speed up standardize_quotes with str.translate() (#4314)
  • 6ada488 fix: pdfminer drops extractable text (#4310)
  • a3172f8 mem: exclude unused spaCy pipeline components to reduce model memory (#4296)
  • b6cf510 feat(chunking): repeat table headers on continuation chunks (#4298)
  • 6360ef7 fix: isolate Table elements in pre-chunks (#4307)
  • Additional commits viewable in compare view

Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


Dependabot commands and options

You can trigger Dependabot actions by commenting on this PR:

  • @dependabot rebase will rebase this PR
  • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
  • @dependabot show <dependency name> ignore conditions will show all of the ignore conditions of the specified dependency
  • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
  • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)

Updates the requirements on [unstructured[all-docs]](https://github.com/Unstructured-IO/unstructured) to permit the latest version.
- [Release notes](https://github.com/Unstructured-IO/unstructured/releases)
- [Changelog](https://github.com/Unstructured-IO/unstructured/blob/main/CHANGELOG.md)
- [Commits](Unstructured-IO/unstructured@0.18.31...0.22.18)

---
updated-dependencies:
- dependency-name: unstructured[all-docs]
  dependency-version: 0.22.18
  dependency-type: direct:production
...

Signed-off-by: dependabot[bot] <support@github.com>
dependabot bot added the dependencies (Pull requests that update a dependency file) and python:uv (Pull requests that update python:uv code) labels on Apr 13, 2026