36 changes: 36 additions & 0 deletions api-reference/workflow/destinations/filenet.mdx
Original file line number Diff line number Diff line change
@@ -0,0 +1,36 @@
---
title: FileNet
---

import FirstTimeAPIDestinationConnector from '/snippets/general-shared-text/first-time-api-destination-connector.mdx';

<FirstTimeAPIDestinationConnector />

Send processed data from Unstructured to {{filenet}}.

The requirements are as follows.

import FileNetPrerequisites from '/snippets/general-shared-text/filenet.mdx';

<FileNetPrerequisites />

To create an {{filenet}} destination connector, see the following examples.

import FileNetSDK from '/snippets/destination_connectors/filenet_sdk.mdx';
import FileNetAPIRESTCreate from '/snippets/destination_connectors/filenet_rest_create.mdx';

<CodeGroup>
<FileNetSDK />
<FileNetAPIRESTCreate />
</CodeGroup>

Replace the preceding placeholders as follows:

import FileNetAPIPlaceholders from '/snippets/general-shared-text/filenet-api-placeholders.mdx';

<FileNetAPIPlaceholders />

## Learn more

- <Icon icon="blog" />&nbsp;&nbsp;[Couchbase Integration in Unstructured Platform](https://unstructured.io/blog/couchbase-integration-in-unstructured-platform)

35 changes: 35 additions & 0 deletions api-reference/workflow/sources/filenet.mdx
@@ -0,0 +1,35 @@
---
title: FileNet
---

import FirstTimeAPISourceConnector from '/snippets/general-shared-text/first-time-api-source-connector.mdx';

<FirstTimeAPISourceConnector />

Ingest your files into Unstructured from {{filenet}}.

The requirements are as follows.

import FileNetPrerequisites from '/snippets/general-shared-text/filenet.mdx';

<FileNetPrerequisites />

To create an {{filenet}} source connector, see the following examples.

import FileNetSDK from '/snippets/source_connectors/filenet_sdk.mdx';
import FileNetAPIRESTCreate from '/snippets/source_connectors/filenet_rest_create.mdx';

<CodeGroup>
<FileNetSDK />
<FileNetAPIRESTCreate />
</CodeGroup>

Replace the preceding placeholders as follows:

import FileNetAPIPlaceholders from '/snippets/general-shared-text/filenet-api-placeholders.mdx';

<FileNetAPIPlaceholders />

## Learn more

- <Icon icon="blog" />&nbsp;&nbsp;[Couchbase Integration in Unstructured Platform](https://unstructured.io/blog/couchbase-integration-in-unstructured-platform)
9 changes: 9 additions & 0 deletions docs.json
@@ -49,6 +49,7 @@
"ui/sources/databricks-volumes",
"ui/sources/dropbox",
"ui/sources/elasticsearch",
"ui/sources/filenet",
"ui/sources/google-cloud",
"ui/sources/google-drive",
"ui/sources/jira",
@@ -80,6 +81,7 @@
"ui/destinations/delta-table",
"ui/destinations/databricks-delta-table",
"ui/destinations/elasticsearch",
"ui/destinations/filenet",
"ui/destinations/google-cloud",
"ui/destinations/ibm-watsonxdata",
"ui/destinations/kafka",
@@ -165,6 +167,7 @@
"api-reference/workflow/sources/databricks-volumes",
"api-reference/workflow/sources/dropbox",
"api-reference/workflow/sources/elasticsearch",
"api-reference/workflow/sources/filenet",
"api-reference/workflow/sources/google-cloud",
"api-reference/workflow/sources/google-drive",
"api-reference/workflow/sources/jira",
@@ -196,6 +199,7 @@
"api-reference/workflow/destinations/delta-table",
"api-reference/workflow/destinations/databricks-delta-table",
"api-reference/workflow/destinations/elasticsearch",
"api-reference/workflow/destinations/filenet",
"api-reference/workflow/destinations/google-cloud",
"api-reference/workflow/destinations/ibm-watsonxdata",
"api-reference/workflow/destinations/kafka",
@@ -433,6 +437,7 @@
"open-source/ingestion/source-connectors/discord",
"open-source/ingestion/source-connectors/dropbox",
"open-source/ingestion/source-connectors/elastic-search",
"open-source/ingestion/source-connectors/filenet",
"open-source/ingestion/source-connectors/github",
"open-source/ingestion/source-connectors/gitlab",
"open-source/ingestion/source-connectors/google-cloud-storage",
@@ -474,6 +479,7 @@
"open-source/ingestion/destination-connectors/dropbox",
"open-source/ingestion/destination-connectors/duckdb",
"open-source/ingestion/destination-connectors/elasticsearch",
"open-source/ingestion/destination-connectors/filenet",
"open-source/ingestion/destination-connectors/google-cloud-service",
"open-source/ingestion/destination-connectors/ibm-watsonxdata",
"open-source/ingestion/destination-connectors/kafka",
@@ -698,6 +704,9 @@
"tagId": "GTM-KJQHTZ6F"
}
},
"variables": {
"filenet": "IBM FileNet"
},
"redirects": [
{
"source": "/api-reference/api-services/accessing-unstructured-api",
25 changes: 25 additions & 0 deletions open-source/ingestion/destination-connectors/filenet.mdx
@@ -0,0 +1,25 @@
---
title: FileNet
---

import SharedFileNet from '/snippets/dc-shared-text/filenet-cli-api.mdx';

<SharedFileNet />

Now call the Unstructured Ingest CLI or the Unstructured Ingest Python library. The source connector can be any of the ones supported. This example uses the local source connector.

This example sends files to Unstructured for processing by default. To process files locally instead, see the instructions at the end of this page.

[//]: # (tech-review: need to verify these samples for sh and python)

import FileNetAPISh from '/snippets/destination_connectors/filenet.sh.mdx';
import FileNetAPIPyV2 from '/snippets/destination_connectors/filenet.v2.py.mdx';

<CodeGroup>
<FileNetAPISh />
<FileNetAPIPyV2 />
</CodeGroup>

import SharedPartitionByAPIOSS from '/snippets/ingest-configuration-shared/partition-by-api-oss.mdx';

<SharedPartitionByAPIOSS/>
24 changes: 24 additions & 0 deletions open-source/ingestion/source-connectors/filenet.mdx
@@ -0,0 +1,24 @@
---
title: FileNet
---

import SharedFileNet from '/snippets/sc-shared-text/filenet-cli-api.mdx';

<SharedFileNet/>

Now call the Unstructured Ingest CLI or the Unstructured Ingest Python library. The destination connector can be any of the supported ones. This example uses the local destination connector.

This example sends data to Unstructured for processing by default. To process files locally instead, see the instructions at the end of this page.

import FileNetSh from '/snippets/source_connectors/filenet.sh.mdx';
import FileNetPyV2 from '/snippets/source_connectors/filenet.v2.py.mdx';

<CodeGroup>
<FileNetSh />
<FileNetPyV2 />
</CodeGroup>


import SharedPartitionByAPIOSS from '/snippets/ingest-configuration-shared/partition-by-api-oss.mdx';

<SharedPartitionByAPIOSS/>
13 changes: 13 additions & 0 deletions snippets/dc-shared-text/filenet-cli-api.mdx
@@ -0,0 +1,13 @@
FileNet

{{filenet}} is an enterprise content management platform from IBM for storing and managing documents and other business content.

Batch process all your records to store structured outputs in {{filenet}}.

The requirements are as follows.

import FileNetShared from '/snippets/general-shared-text/filenet.mdx';
import FileNetSharedCLIAPI from '/snippets/general-shared-text/filenet-cli-api.mdx';

<FileNetShared />
<FileNetSharedCLIAPI />
29 changes: 29 additions & 0 deletions snippets/destination_connectors/filenet.sh.mdx
@@ -0,0 +1,29 @@
```bash CLI
#!/usr/bin/env bash

# Chunking and embedding are optional.

unstructured-ingest \
  local \
    --input-path "$LOCAL_FILE_INPUT_DIR" \
    --output-dir "$LOCAL_FILE_OUTPUT_DIR" \
    --strategy hi_res \
    --chunk-elements \
    --embedding-provider huggingface \
    --num-processes 2 \
    --verbose \
    --partition-by-api \
    --api-key "$UNSTRUCTURED_API_KEY" \
    --partition-endpoint "$UNSTRUCTURED_API_URL" \
    --additional-partition-args="{\"split_pdf_page\":\"true\", \"split_pdf_allow_failed\":\"true\", \"split_pdf_concurrency_level\": 15}" \
  filenet \
    --username "$FILENET_USERNAME" \
    --password "$FILENET_PASSWORD" \
    --server-url "$FILENET_SERVER_URL" \
    --object-store "$FILENET_OBJECT_STORE" \
    --folder-path "$FILENET_FOLDER_PATH" \
    --document-class "$FILENET_DOCUMENT_CLASS" \
    --recursive "$FILENET_RECURSIVE" \
    --num-processes 2 \
    --batch-size 80
```
56 changes: 56 additions & 0 deletions snippets/destination_connectors/filenet.v2.py.mdx
@@ -0,0 +1,56 @@
```python Python Ingest
import os

from unstructured_ingest.pipeline.pipeline import Pipeline
from unstructured_ingest.interfaces import ProcessorConfig

# NOTE: The FileNet class names below are assumed by analogy with other
# connectors (the draft still used the Couchbase classes). Verify them, and
# where each setting belongs, against the FileNet connector module.
from unstructured_ingest.processes.connectors.filenet import (
    FileNetAccessConfig,
    FileNetConnectionConfig,
    FileNetUploadStagerConfig,
    FileNetUploaderConfig
)
from unstructured_ingest.processes.connectors.local import (
    LocalIndexerConfig,
    LocalConnectionConfig,
    LocalDownloaderConfig
)
from unstructured_ingest.processes.partitioner import PartitionerConfig
from unstructured_ingest.processes.chunker import ChunkerConfig
from unstructured_ingest.processes.embedder import EmbedderConfig

# Chunking and embedding are optional.

if __name__ == "__main__":
    Pipeline.from_configs(
        context=ProcessorConfig(),
        indexer_config=LocalIndexerConfig(input_path=os.getenv("LOCAL_FILE_INPUT_DIR")),
        downloader_config=LocalDownloaderConfig(),
        source_connection_config=LocalConnectionConfig(),
        partitioner_config=PartitionerConfig(
            partition_by_api=True,
            api_key=os.getenv("UNSTRUCTURED_API_KEY"),
            partition_endpoint=os.getenv("UNSTRUCTURED_API_URL"),
            strategy="hi_res",
            additional_partition_args={
                "split_pdf_page": True,
                "split_pdf_allow_failed": True,
                "split_pdf_concurrency_level": 15
            }
        ),
        chunker_config=ChunkerConfig(chunking_strategy="by_title"),
        embedder_config=EmbedderConfig(embedding_provider="huggingface"),
        destination_connection_config=FileNetConnectionConfig(
            access_config=FileNetAccessConfig(
                password=os.getenv("FILENET_PASSWORD"),
            ),
            server_url=os.getenv("FILENET_SERVER_URL"),
            username=os.getenv("FILENET_USERNAME"),
            object_store=os.getenv("FILENET_OBJECT_STORE"),
            folder_path=os.getenv("FILENET_FOLDER_PATH"),
            document_class=os.getenv("FILENET_DOCUMENT_CLASS"),
            # Environment variables are strings, so convert to a boolean.
            recursive=os.getenv("FILENET_RECURSIVE", "false").lower() == "true"
        ),
        stager_config=FileNetUploadStagerConfig(),
        uploader_config=FileNetUploaderConfig(batch_size=100)
    ).run()
```
21 changes: 21 additions & 0 deletions snippets/destination_connectors/filenet_rest_create.mdx
@@ -0,0 +1,21 @@
```bash curl
curl --request 'POST' --location \
"$UNSTRUCTURED_API_URL/destinations" \
--header 'accept: application/json' \
--header "unstructured-api-key: $UNSTRUCTURED_API_KEY" \
--header 'content-type: application/json' \
--data \
'{
    "name": "<name>",
    "type": "filenet",
    "config": {
        "server_url": "<server-url>",
        "object_store": "<object-store>",
        "folder_path": "<folder-path>",
        "document_class": "<document-class>",
        "recursive": <true|false>,
        "username": "<username>",
        "password": "<password>"
    }
}'
```
29 changes: 29 additions & 0 deletions snippets/destination_connectors/filenet_sdk.mdx
@@ -0,0 +1,29 @@
```python Python SDK
import os

from unstructured_client import UnstructuredClient
from unstructured_client.models.operations import CreateDestinationRequest
from unstructured_client.models.shared import CreateDestinationConnector

with UnstructuredClient(api_key_auth=os.getenv("UNSTRUCTURED_API_KEY")) as client:
    response = client.destinations.create_destination(
        request=CreateDestinationRequest(
            create_destination_connector=CreateDestinationConnector(
                name="<name>",
                type="filenet",
                config={
                    "server_url": "<server-url>",
                    "object_store": "<object-store>",
                    "folder_path": "<folder-path>",
                    "document_class": "<document-class>",
                    # Python booleans are capitalized.
                    "recursive": <True|False>,
                    "username": "<username>",
                    "password": "<password>"
                }
            )
        )
    )

    print(response.destination_connector_information)
    # ...
```
8 changes: 8 additions & 0 deletions snippets/general-shared-text/filenet-api-placeholders.mdx
@@ -0,0 +1,8 @@
- `<name>` (_required_) - A unique name for this connector.
- `<server-url>` (_required_) - The base URL of your {{filenet}} server, containing both the IBM domain and your company's subdomain. For example, `https://<company-name>.automationcloud.ibm.com`.
- `<object-store>` (_required_) - The name of the object store to connect to within the Content Platform Engine.
- `<folder-path>` (_required_) - The path of the folder to connect to within the object store.
- `<document-class>` - The class of documents to include.
- `recursive` - Set to `true` (`True` in Python) to include documents contained in any subfolders; otherwise, `false`.
- `<username>` (_required_) - The username of the IBM Cloud Pak for Business Automation as a Service account to use.
- `<password>` (_required_) - The password for the corresponding username.
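
As a rough illustration of how these placeholders fit together, the following sketch assembles the `config` dictionary and checks that the required values are present before any request is sent. The helper name `build_filenet_config` is hypothetical and is not part of any Unstructured SDK. The `False` default for `recursive` is also an assumption. The key names mirror the `config` keys used in the connector creation examples.

```python
from typing import Any, Dict, Optional

# Settings marked as required in the placeholder list.
REQUIRED_FIELDS = ("server_url", "object_store", "folder_path", "username", "password")


def build_filenet_config(
    server_url: str,
    object_store: str,
    folder_path: str,
    username: str,
    password: str,
    document_class: Optional[str] = None,
    recursive: bool = False,  # Assumed default; not stated in the source.
) -> Dict[str, Any]:
    """Hypothetical helper: assemble the connector `config` dictionary."""
    config: Dict[str, Any] = {
        "server_url": server_url,
        "object_store": object_store,
        "folder_path": folder_path,
        "username": username,
        "password": password,
        "recursive": recursive,
    }
    # Fail early if any required value is empty.
    missing = [name for name in REQUIRED_FIELDS if not config[name]]
    if missing:
        raise ValueError("Missing required FileNet settings: " + ", ".join(missing))
    # Include the optional document class only when one is given.
    if document_class:
        config["document_class"] = document_class
    return config
```

The returned dictionary could then be passed as the `config` value when creating the connector.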
29 changes: 29 additions & 0 deletions snippets/general-shared-text/filenet-cli-api.mdx
@@ -0,0 +1,29 @@
The FileNet connector dependencies:

```bash CLI, Python
pip install "unstructured-ingest[filenet]"
```

import AdditionalIngestDependencies from '/snippets/general-shared-text/ingest-dependencies.mdx';

<AdditionalIngestDependencies />

These environment variables are required for the {{filenet}} connector:

[//]: # (tech-review: confirm the names of these FILENET variables)

- `FILENET_SERVER_URL` - The URL of your Content Platform Engine, represented by `--server-url` (CLI) or `server_url` (Python).
- `FILENET_OBJECT_STORE` - The name of the object store in the {{filenet}} server, represented by `--object-store` (CLI) or `object_store` (Python).
- `FILENET_FOLDER_PATH` - The path of the folder within the object store, represented by `--folder-path` (CLI) or `folder_path` (Python).
- `FILENET_DOCUMENT_CLASS` - The document class of documents contained in the folder, represented by `--document-class` (CLI) or `document_class` (Python).
- `FILENET_RECURSIVE` - `true` to include subfolders, represented by `--recursive` (CLI) or `recursive` (Python).
- `FILENET_USERNAME` - The username for the IBM Cloud Pak for Business Automation as a Service account, represented by `--username` (CLI) or `username` (Python).
- `FILENET_PASSWORD` - The password for the corresponding username, represented by `--password` (CLI) or `password` (Python).
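
Before running the CLI or Python examples, it can help to confirm that these variables are set. The following is a minimal sketch that uses only the Python standard library. The helper name `load_filenet_settings` is hypothetical, and the variable names are the ones listed above, which are still pending confirmation.

```python
import os
from typing import Dict, Mapping, Union

# Variable names as listed above; they are still pending confirmation.
FILENET_VARS = (
    "FILENET_SERVER_URL",
    "FILENET_OBJECT_STORE",
    "FILENET_FOLDER_PATH",
    "FILENET_DOCUMENT_CLASS",
    "FILENET_RECURSIVE",
    "FILENET_USERNAME",
    "FILENET_PASSWORD",
)


def load_filenet_settings(
    env: Mapping[str, str] = os.environ,
) -> Dict[str, Union[str, bool]]:
    """Hypothetical helper: read the FileNet settings and fail fast if any are unset."""
    missing = [name for name in FILENET_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    settings: Dict[str, Union[str, bool]] = {name: env[name] for name in FILENET_VARS}
    # Environment values are strings; only "true" (in any case) enables recursion.
    settings["FILENET_RECURSIVE"] = env["FILENET_RECURSIVE"].strip().lower() == "true"
    return settings
```

Calling `load_filenet_settings()` with no arguments reads from the current process environment. Passing a dictionary instead makes the check easy to exercise in tests.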

Additional available settings include:

[//]: # (tech-review: is this setting available for FileNet?)

- `--collection-id` (CLI) or `collection_id` in `CouchbaseDownloaderConfig` (Python) - Optional for the source connector. The
unique key of the ID field in the collection. The default is `id` if not otherwise specified.
[Learn more](https://docs.couchbase.com/server/current/learn/services-and-indexes/indexes/indexing-and-query-perf.html#introduction-document-keys).
10 changes: 10 additions & 0 deletions snippets/general-shared-text/filenet-platform.mdx
@@ -0,0 +1,10 @@
Fill in the following fields:

- **Name** (_required_): A unique name for this connector.
- **Server URL** (_required_): The base URL of your {{filenet}} server, containing both the IBM domain and your company's subdomain. For example, `https://<company-name>.automationcloud.ibm.com`.
- **Object Store** (_required_): The name of the object store to connect to within the server.
- **Folder Path** (_required_): The path of the folder to connect to within the object store.
- **Document Class**: The class of documents to include.
- **Recursive**: Select to include documents contained in any subfolders.
- **Username** (_required_): The username of the IBM Cloud Pak for Business Automation as a Service account to use.
- **Password** (_required_): The password for the corresponding username.