Merged
8 changes: 4 additions & 4 deletions .github/workflows/ci.yml
@@ -24,7 +24,7 @@ jobs:
- name: Set up Python ${{ env.PYTHON_VERSION }}
run: uv python install ${{ env.PYTHON_VERSION }}
- name: Install lint dependencies
-run: uv sync --only-group lint --frozen
+run: uv sync --only-group lint --locked
- name: Lint
run: make check

@@ -52,9 +52,9 @@ jobs:
- name: Install dependencies and run core tests
run: |
sudo apt-get update && sudo apt-get install --yes poppler-utils libreoffice
-uv sync --group test --frozen
+uv sync --group test --locked
make install-pandoc
-make install-nltk-models
+make install-nlp-models
sudo add-apt-repository -y ppa:alex-p/tesseract-ocr5
sudo apt-get install -y tesseract-ocr tesseract-ocr-kor
tesseract --version
@@ -114,6 +114,6 @@ jobs:
df -h
- name: Test Dockerfile
run: |
-uv sync --group test --frozen
+uv sync --group test --locked
make docker-build
make docker-test
2 changes: 1 addition & 1 deletion .github/workflows/docker-publish.yml
@@ -83,7 +83,7 @@ jobs:
- name: Set up Python ${{ env.PYTHON_VERSION }}
run: uv python install ${{ env.PYTHON_VERSION }}
- name: Install test dependencies
-run: uv sync --group test --frozen
+run: uv sync --group test --locked
- name: Test image
run: |
export DOCKER_IMAGE="$DOCKER_BUILD_REPOSITORY:${{ matrix.arch }}-$SHORT_SHA"
5 changes: 5 additions & 0 deletions CHANGELOG.md
@@ -1,3 +1,8 @@
+## 0.1.2
+* Bump all packages (refresh uv.lock), pulling `unstructured==0.22.12` which replaces NLTK with spaCy
+* Replace `download_nltk_packages` calls with spaCy model pre-download in Makefile, Dockerfile, and CI
+* Switch `uv sync --frozen` to `uv sync --locked` across Dockerfile, Makefile, and CI workflows
+
## 0.1.1
* Switch arm64 Docker build runner from custom `opensource-linux-arm64-4core` to GitHub-hosted `ubuntu-24.04-arm`
* Consolidate multiarch Docker manifest creation into a single `docker buildx imagetools create` call
4 changes: 2 additions & 2 deletions Dockerfile
@@ -61,7 +61,7 @@ ENV UV_PROJECT_ENVIRONMENT="${HOME}/.local"

COPY --chown=${NB_USER}:${NB_USER} pyproject.toml pyproject.toml
COPY --chown=${NB_USER}:${NB_USER} uv.lock uv.lock
-RUN uv sync --no-dev --no-install-project --frozen
+RUN uv sync --no-dev --no-install-project --locked

ARG PANDOC_VERSION="3.9"
RUN ARCH=$(uname -m) && \
@@ -71,7 +71,7 @@ RUN ARCH=$(uname -m) && \
cp /tmp/pandoc-${PANDOC_VERSION}/bin/pandoc /home/${USER}/.local/bin/ && \
rm -rf /tmp/pandoc*

-RUN ${PYTHON} -c "from unstructured.nlp.tokenize import download_nltk_packages; download_nltk_packages()" && \
+RUN ${PYTHON} -c "from unstructured.nlp.tokenize import _load_spacy_model; _load_spacy_model()" && \
${PYTHON} -c "from unstructured.partition.model_init import initialize; initialize()" && \
${PYTHON} -c "from unstructured_inference.models.tables import UnstructuredTableTransformerModel; model = UnstructuredTableTransformerModel(); model.initialize('microsoft/table-transformer-structure-recognition')"

12 changes: 6 additions & 6 deletions Makefile
@@ -13,23 +13,23 @@ help: Makefile

## install-base: installs minimum requirements to run the API
.PHONY: install-base
-install-base: install-base-packages install-nltk-models
+install-base: install-base-packages install-nlp-models

## install: installs all test and dev requirements
.PHONY: install
install: install-base install-test

.PHONY: install-base-packages
install-base-packages:
-uv sync --no-dev --frozen
+uv sync --no-dev --locked

.PHONY: install-test
install-test:
-uv sync --group test --frozen
+uv sync --group test --locked

-.PHONY: install-nltk-models
-install-nltk-models:
-uv run python -c "from unstructured.nlp.tokenize import download_nltk_packages; download_nltk_packages()"
+.PHONY: install-nlp-models
+install-nlp-models:
+uv run python -c "from unstructured.nlp.tokenize import _load_spacy_model; _load_spacy_model()"

## lock: regenerates uv.lock
.PHONY: lock
2 changes: 1 addition & 1 deletion prepline_general/api/__version__.py
@@ -1 +1 @@
-__version__ = "0.1.1" # pragma: no cover
+__version__ = "0.1.2" # pragma: no cover

Version bump without matching CHANGELOG entry breaks CI

High Severity

`__version__` was bumped to 0.1.2 but `CHANGELOG.md` still has 0.1.1 as its latest entry. The `make check-version` CI step (which runs `scripts/version-sync.sh -c`) extracts the latest version from `CHANGELOG.md` and verifies it matches `__version__.py`. This mismatch will cause the `check-version` step to fail, blocking CI.

Contributor Author

It's already updated you doofus
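
For reference, the kind of check discussed in this thread can be sketched in Python. This is an illustrative sketch only, not the contents of `scripts/version-sync.sh` (which this PR does not show). The function names, the regexes, and the assumption that the newest changelog entry is the first `## X.Y.Z` heading are all hypothetical.

```python
import re

# Hypothetical patterns; the real check lives in scripts/version-sync.sh,
# whose contents are not part of this PR.
CHANGELOG_HEADING = re.compile(r"^##\s+(\d+\.\d+\.\d+\S*)", re.MULTILINE)
VERSION_ASSIGNMENT = re.compile(r'^__version__\s*=\s*"([^"]+)"', re.MULTILINE)


def latest_changelog_version(changelog_text: str) -> str:
    """Return the version in the first '## X.Y.Z' heading (assumed newest-first)."""
    match = CHANGELOG_HEADING.search(changelog_text)
    if match is None:
        raise ValueError("no '## X.Y.Z' heading found in changelog")
    return match.group(1)


def declared_version(version_file_text: str) -> str:
    """Return the string assigned to __version__ in the version module."""
    match = VERSION_ASSIGNMENT.search(version_file_text)
    if match is None:
        raise ValueError("no __version__ assignment found")
    return match.group(1)


def versions_in_sync(changelog_text: str, version_file_text: str) -> bool:
    """True when the newest changelog entry matches __version__."""
    return latest_changelog_version(changelog_text) == declared_version(version_file_text)


# Inputs mirroring the state of this PR after the CHANGELOG.md change.
changelog = "## 0.1.2\n* Bump all packages\n\n## 0.1.1\n* Earlier entry\n"
version_file = '__version__ = "0.1.2"  # pragma: no cover\n'
print(versions_in_sync(changelog, version_file))  # True
```

With the `## 0.1.2` entry that this PR adds to `CHANGELOG.md`, a check of this shape passes. It would fail only if the changelog still started at `## 0.1.1`.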

2 changes: 1 addition & 1 deletion prepline_general/api/general.py
@@ -145,7 +145,7 @@ def partition_file_via_api(
if not request_url:
raise HTTPException(status_code=500, detail="Parallel mode enabled but no url set!")

-api_key = request.headers.get("unstructured-api-key", default="")
+api_key = request.headers.get("unstructured-api-key", "")
partition_kwargs["starting_page_number"] = (
partition_kwargs.get("starting_page_number", 1) + page_offset
)
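
The PR does not say why the `default=` keyword was dropped from the `headers.get(...)` call above. One plausible motivation, offered here as an assumption, is portability: the positional form works for any mapping, whereas the built-in `dict.get` rejects a keyword default at runtime. The sketch below shows that difference with plain dictionaries; `read_api_key` is a hypothetical helper, not a function in this repository, and `request.headers` in the real code is not a plain `dict`.

```python
from collections.abc import Mapping


def read_api_key(headers: Mapping[str, str]) -> str:
    """Read the API key header, falling back to an empty string."""
    # Positional default: accepted by dict and by other Mapping implementations.
    return headers.get("unstructured-api-key", "")


print(read_api_key({"unstructured-api-key": "secret"}))  # secret
print(repr(read_api_key({})))  # ''

# The keyword form is rejected by the built-in dict.
keyword_rejected = False
try:
    {}.get("unstructured-api-key", default="")
except TypeError:
    keyword_rejected = True
print(keyword_rejected)  # True
```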
1 change: 1 addition & 0 deletions pyproject.toml
@@ -6,6 +6,7 @@ requires-python = ">=3.12"
dependencies = [
"unstructured[all-docs] >=0.18.31, <1.0.0",
"fastapi >=0.128.4, <1.0.0",
+"python-multipart >=0.0.18",
"uvicorn >=0.40.0, <1.0.0",
"backoff >=2.2.1, <3.0.0",
"pandas >=3.0.0, <4.0.0",