[HWORKS-2391] Dlthub docs #560
Pull request overview
Adds documentation for ingesting data into managed Feature Groups using dltHub, and documents the new CRM/Sales/Analytics and REST API Data Source connectors in the Hopsworks docs site.
Changes:
- Extends `mkdocs.yml` navigation to include new connector pages and the dltHub ingestion guide.
- Adds new how-to pages for CRM/Sales/Analytics and REST API Data Sources, plus a full dltHub ingestion workflow guide.
- Updates existing Feature Group/Data Source index & usage pages to surface the new ingestion workflow.
Reviewed changes
Copilot reviewed 7 out of 23 changed files in this pull request and generated 4 comments.
| File | Description |
|---|---|
| mkdocs.yml | Adds nav entries for the new Data Source connector docs and dltHub ingestion guide. |
| docs/user_guides/fs/feature_group/ingest_with_dlthub.md | New end-to-end guide for creating a managed Feature Group via dltHub ingestion (UI + API examples). |
| docs/user_guides/fs/feature_group/index.md | Adds a link to the new dltHub ingestion guide. |
| docs/user_guides/fs/data_source/usage.md | Adds a section describing ingestion into managed Feature Groups and links to the new guide. |
| docs/user_guides/fs/data_source/index.md | Lists the new CRM/Sales/Analytics and REST API Data Source connectors. |
| docs/user_guides/fs/data_source/creation/rest_api.md | New how-to for creating a REST API Data Source in the UI. |
| docs/user_guides/fs/data_source/creation/crm_sales_analytics.md | New how-to for creating a CRM/Sales/Analytics Data Source in the UI. |
| docs/assets/images/guides/fs/feature_group/dlthub_rest_page_number_pagination.png | Adds a screenshot used in the new ingestion guide. |
| from hopsworks_common.core import sink_job_configuration | ||
|
|
||
| fs = project.get_feature_store() | ||
| data_source = fs.get_data_source("my_sql_source").get_tables()[0] | ||
| data = data_source.get_data(use_cached=False) |
The Python example uses `project.get_feature_store()`, but `project` isn't defined in this code block (and there is no preceding login snippet on this page). Since Python fences are linted with snakeoil/ruff in this repo, this will likely fail with an undefined-name error; consider adding the minimal `import hopsworks` and `project = hopsworks.login(...)` setup (or otherwise making the snippet self-contained) before using `project`.
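To illustrate why the snippet would be flagged, here is a minimal sketch of an undefined-name check built on Python's standard `ast` module. It is only a rough approximation for flat, top-level documentation snippets and is not how ruff or snakeoil implement the rule; the helper name `undefined_names` and the two sample strings are hypothetical. The snippets are parsed, never executed, so no Hopsworks installation is needed.

```python
import ast
import builtins


def undefined_names(source):
    """Return top-level names that are read before any import or assignment binds them.

    Hypothetical helper: a rough stand-in for an undefined-name lint,
    valid only for flat scripts such as documentation snippets.
    """
    defined = set(dir(builtins))
    missing = []
    for stmt in ast.parse(source).body:
        # Check the names this statement reads before applying its own bindings.
        for node in ast.walk(stmt):
            if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Load):
                if node.id not in defined and node.id not in missing:
                    missing.append(node.id)
        # Record what the statement binds: imported names and assignment targets.
        for node in ast.walk(stmt):
            if isinstance(node, (ast.Import, ast.ImportFrom)):
                for alias in node.names:
                    defined.add((alias.asname or alias.name).split(".")[0])
            elif isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
                defined.add(node.id)
    return missing


# The snippet as it appears in the reviewed page (no login/setup lines).
snippet_without_login = '''
from hopsworks_common.core import sink_job_configuration

fs = project.get_feature_store()
data_source = fs.get_data_source("my_sql_source").get_tables()[0]
data = data_source.get_data(use_cached=False)
'''

# The same snippet with the setup lines the comment asks for.
snippet_with_login = "import hopsworks\nproject = hopsworks.login()\n" + snippet_without_login

print(undefined_names(snippet_without_login))  # ['project']
print(undefined_names(snippet_with_login))     # []
```

With the two setup lines prepended, every name is bound before it is read, which is the property the lint checks for.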
| ```python | ||
| from hopsworks_common.core import rest_endpoint, sink_job_configuration | ||
| from hsfs.core import data_source as ds | ||
|
|
This REST ingestion example also relies on an undefined `project` variable (used to obtain `fs`). To keep the snippet ruff-clean under snakeoil, include the login/setup lines in the code block (or make it explicit and suppress the undefined-name linting in a way that passes CI).
| ```python | |
| from hopsworks_common.core import rest_endpoint, sink_job_configuration | |
| from hsfs.core import data_source as ds | |
| ```python | |
| import hopsworks | |
| from hopsworks_common.core import rest_endpoint, sink_job_configuration | |
| from hsfs.core import data_source as ds | |
| project = hopsworks.login() |
| - Use the [Feature Group creation guide](create.md) to understand managed feature groups in more detail. | ||
| - Use the [External Feature Group guide](create_external.md) if you want to query the source in place without copying data into Hopsworks. | ||
| - Use the [Online Ingestion Observability guide](online_ingestion_observability.md) to monitor ingestion behavior for online-enabled feature groups. |
The three "Next Steps" links use relative file paths (e.g. `create.md`). This repo's docs style guide warns that these can break with mike versioning; prefer the mkdocs-autorefs style (`[text][heading-id]`) or another version-stable internal link format used in the project.
| - Use the [Feature Group creation guide](create.md) to understand managed feature groups in more detail. | |
| - Use the [External Feature Group guide](create_external.md) if you want to query the source in place without copying data into Hopsworks. | |
| - Use the [Online Ingestion Observability guide](online_ingestion_observability.md) to monitor ingestion behavior for online-enabled feature groups. | |
| - Use the [Feature Group creation guide][how-to-create-a-feature-group] to understand managed feature groups in more detail. | |
| - Use the [External Feature Group guide][how-to-create-an-external-feature-group] if you want to query the source in place without copying data into Hopsworks. | |
| - Use the [Online Ingestion Observability guide][online-ingestion-observability] to monitor ingestion behavior for online-enabled feature groups. |
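Relative `.md` links like the ones replaced above can be located mechanically before review. The following is a minimal sketch, assuming only inline Markdown links of the form `[text](path.md)` need to be found; the regular expression, the helper name `find_relative_md_links`, and the sample text are hypothetical and are not part of this repo's tooling.

```python
import re

# Inline Markdown links whose target ends in .md (optionally with a #fragment).
# Targets starting with http:// or https:// are skipped.
RELATIVE_MD_LINK = re.compile(
    r"\[([^\]]+)\]\((?!https?://)([^)\s]+\.md(?:#[^)\s]*)?)\)"
)


def find_relative_md_links(markdown):
    """Return (line_number, link_text, target) for each relative .md link."""
    hits = []
    for line_number, line in enumerate(markdown.splitlines(), start=1):
        for match in RELATIVE_MD_LINK.finditer(line):
            hits.append((line_number, match.group(1), match.group(2)))
    return hits


# Sample mixing a relative link, a reference-style link, and a parent-relative link.
sample = (
    "- Use the [Feature Group creation guide](create.md) to learn more.\n"
    "- Use the [External Feature Group guide][how-to-create-an-external-feature-group].\n"
    "- See [Ingest Data with dltHub](../feature_group/ingest_with_dlthub.md).\n"
)

for hit in find_relative_md_links(sample):
    print(hit)
```

Reference-style links such as `[text][heading-id]` are not matched, so a page that has been fully converted yields an empty list.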
| - Use full-load or incremental ingestion strategies. | ||
| - Build managed feature groups from SQL, CRM, or REST API sources. | ||
|
|
||
| For the full workflow, including schema selection, ingestion job configuration, loading strategies, and REST pagination, see [Ingest Data with dltHub](../feature_group/ingest_with_dlthub.md). |
This new link to the dltHub ingestion guide is a relative path (`../feature_group/ingest_with_dlthub.md`). The docs style guide recommends avoiding relative file links because they can break across mike-versioned docs; consider switching to the version-stable internal link format (mkdocs-autorefs / heading-id references).
| For the full workflow, including schema selection, ingestion job configuration, loading strategies, and REST pagination, see [Ingest Data with dltHub](../feature_group/ingest_with_dlthub.md). | |
| For the full workflow, including schema selection, ingestion job configuration, loading strategies, and REST pagination, see [Ingest Data with dltHub][ingest-data-with-dlthub]. |