diff --git a/src/current/_includes/molt/migration-create-sql-user.md b/src/current/_includes/molt/migration-create-sql-user.md index dd2e078e3a4..0430f9b4d02 100644 --- a/src/current/_includes/molt/migration-create-sql-user.md +++ b/src/current/_includes/molt/migration-create-sql-user.md @@ -45,7 +45,7 @@ ALTER DEFAULT PRIVILEGES IN SCHEMA public GRANT SELECT, INSERT, UPDATE, DELETE ON TABLES TO crdb_user; ~~~ -Depending on the MOLT Fetch [data load mode](#data-load-mode) you will use, grant the necessary privileges to run either [`IMPORT INTO`](#import-into-privileges) or [`COPY FROM`](#copy-from-privileges) on the target tables: +Depending on the MOLT Fetch [data load mode]({% link molt/molt-fetch.md %}#import-into-vs-copy-from) you will use, grant the necessary privileges to run either [`IMPORT INTO`](#import-into-privileges) or [`COPY FROM`](#copy-from-privileges) on the target tables: #### `IMPORT INTO` privileges diff --git a/src/current/_includes/molt/migration-schema-design-practices.md b/src/current/_includes/molt/migration-schema-design-practices.md index 644fad6de81..6df1dfa02a7 100644 --- a/src/current/_includes/molt/migration-schema-design-practices.md +++ b/src/current/_includes/molt/migration-schema-design-practices.md @@ -31,12 +31,12 @@ Convert the source table definitions into CockroachDB-compatible equivalents. Co ~~~ - - MOLT Fetch can automatically define matching CockroachDB tables using the {% if page.name != "migration-strategy.md" %}[`drop-on-target-and-recreate`](#table-handling-mode){% else %}[`drop-on-target-and-recreate`]({% link molt/molt-fetch.md %}#target-table-handling){% endif %} option. + - MOLT Fetch can automatically define matching CockroachDB tables using the {% if page.name != "migration-strategy.md" %}[`drop-on-target-and-recreate`](#table-handling-mode){% else %}[`drop-on-target-and-recreate`]({% link molt/molt-fetch.md %}#handle-target-tables){% endif %} option. - If you define the target tables manually, review how MOLT Fetch handles [type mismatches]({% link molt/molt-fetch.md %}#mismatch-handling). You can use the {% if page.name != "migration-strategy.md" %}[MOLT Schema Conversion Tool](#schema-conversion-tool){% else %}[MOLT Schema Conversion Tool]({% link cockroachcloud/migrations-page.md %}){% endif %} to create matching table definitions.
- - By default, table and column names are case-insensitive in MOLT Fetch. If using the [`--case-sensitive`]({% link molt/molt-fetch.md %}#global-flags) flag, schema, table, and column names must match Oracle's default uppercase identifiers. Use quoted names on the target to preserve case. For example, the following CockroachDB SQL statement will error: + - By default, table and column names are case-insensitive in MOLT Fetch. If using the [`--case-sensitive`]({% link molt/molt-fetch-commands-and-flags.md %}#global-flags) flag, schema, table, and column names must match Oracle's default uppercase identifiers. Use quoted names on the target to preserve case. For example, the following CockroachDB SQL statement will error: ~~~ sql CREATE TABLE co.stores (... store_id ...); @@ -57,6 +57,6 @@ Convert the source table definitions into CockroachDB-compatible equivalents. Co Avoid using sequential keys. To learn more about the performance issues that can result from their use, refer to the [guidance on indexing with sequential keys]({% link {{site.current_cloud_version}}/sql-faqs.md %}#how-do-i-generate-unique-slowly-increasing-sequential-numbers-in-cockroachdb). If a sequential key is necessary in your CockroachDB table, you must create it manually, after using [MOLT Fetch]({% link molt/molt-fetch.md %}) to load and replicate the data. {{site.data.alerts.end}} -- Review [Transformations]({% link molt/molt-fetch.md %}#transformations) to understand how computed columns and partitioned tables can be mapped to the target, and how target tables can be renamed. +- Review [Transformations]({% link molt/molt-fetch.md %}#define-transformations) to understand how computed columns and partitioned tables can be mapped to the target, and how target tables can be renamed. - By default on CockroachDB, `INT` is an alias for `INT8`, which creates 64-bit signed integers. PostgreSQL and MySQL default to 32-bit integers. Depending on your source database or application requirements, you may need to change the integer size to `4`. For more information, refer to [Considerations for 64-bit signed integers]({% link {{ site.current_cloud_version }}/int.md %}#considerations-for-64-bit-signed-integers). 
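As a minimal sketch of the two bullets above — quoting identifiers to preserve Oracle's uppercase names, and sizing integers explicitly — a matching target table might look like the following (schema, table, and column names are hypothetical):

~~~ sql
-- Hypothetical target table. Quoted identifiers preserve Oracle's
-- default uppercase names when --case-sensitive is in effect, and
-- INT4 matches a 32-bit source integer rather than relying on
-- CockroachDB's INT (INT8) default.
CREATE TABLE "CO"."STORES" (
  "STORE_ID" INT4 PRIMARY KEY,
  "STORE_NAME" STRING
);
~~~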
\ No newline at end of file diff --git a/src/current/_includes/molt/molt-drop-constraints-indexes.md b/src/current/_includes/molt/molt-drop-constraints-indexes.md index c360991eff5..7c98fc6bf54 100644 --- a/src/current/_includes/molt/molt-drop-constraints-indexes.md +++ b/src/current/_includes/molt/molt-drop-constraints-indexes.md @@ -1,11 +1,11 @@ To optimize data load performance, drop all non-`PRIMARY KEY` [constraints]({% link {{ site.current_cloud_version }}/alter-table.md %}#drop-constraint) and [indexes]({% link {{site.current_cloud_version}}/drop-index.md %}) on the target CockroachDB database before migrating: -{% if page.name == "molt-fetch.md" %} +{% if page.name == "molt-fetch-best-practices.md" %} - [`FOREIGN KEY`]({% link {{ site.current_cloud_version }}/foreign-key.md %}) - [`UNIQUE`]({% link {{ site.current_cloud_version }}/unique.md %}) - [Secondary indexes]({% link {{ site.current_cloud_version }}/schema-design-indexes.md %}) - [`CHECK`]({% link {{ site.current_cloud_version }}/check.md %}) - [`DEFAULT`]({% link {{ site.current_cloud_version }}/default-value.md %}) - - [`NOT NULL`]({% link {{ site.current_cloud_version }}/not-null.md %}) (you do not need to drop this constraint when using `drop-on-target-and-recreate` for [table handling](#target-table-handling)) + - [`NOT NULL`]({% link {{ site.current_cloud_version }}/not-null.md %}) (you do not need to drop this constraint when using `drop-on-target-and-recreate` for [table handling]({% link molt/molt-fetch.md %}#handle-target-tables)) {{site.data.alerts.callout_danger}} Do **not** drop [`PRIMARY KEY`]({% link {{ site.current_cloud_version }}/primary-key.md %}) constraints. diff --git a/src/current/_includes/molt/molt-limitations.md b/src/current/_includes/molt/molt-limitations.md index 4e41fb29b93..3f52071b137 100644 --- a/src/current/_includes/molt/molt-limitations.md +++ b/src/current/_includes/molt/molt-limitations.md @@ -12,7 +12,7 @@ - Oracle advises against `LONG RAW` columns and [recommends converting them to `BLOB`](https://www.orafaq.com/wiki/LONG_RAW#History). `LONG RAW` can only store binary values up to 2GB, and only one `LONG RAW` column per table is supported.
-- Only tables with [primary key]({% link {{ site.current_cloud_version }}/primary-key.md %}) types of [`INT`]({% link {{ site.current_cloud_version }}/int.md %}), [`FLOAT`]({% link {{ site.current_cloud_version }}/float.md %}), or [`UUID`]({% link {{ site.current_cloud_version }}/uuid.md %}) can be sharded with [`--export-concurrency`]({% link molt/molt-fetch.md %}#best-practices). +- Only tables with [primary key]({% link {{ site.current_cloud_version }}/primary-key.md %}) types of [`INT`]({% link {{ site.current_cloud_version }}/int.md %}), [`FLOAT`]({% link {{ site.current_cloud_version }}/float.md %}), or [`UUID`]({% link {{ site.current_cloud_version }}/uuid.md %}) can be sharded with [`--export-concurrency`]({% link molt/molt-fetch-best-practices.md %}#configure-the-source-database-and-connection). {% if page.name != "migrate-bulk-load.md" %} #### Replicator limitations @@ -37,5 +37,5 @@ - Running DDL on the source or target while replication is in progress can cause replication failures. - `TRUNCATE` operations on the source are not captured. Only `INSERT`, `UPDATE`, `UPSERT`, and `DELETE` events are replicated. -- Changes to virtual columns are not replicated automatically. To migrate these columns, you must define them explicitly with [transformation rules]({% link molt/molt-fetch.md %}#transformations). +- Changes to virtual columns are not replicated automatically. To migrate these columns, you must define them explicitly with [transformation rules]({% link molt/molt-fetch.md %}#define-transformations). {% endif %} \ No newline at end of file diff --git a/src/current/_includes/molt/molt-setup.md b/src/current/_includes/molt/molt-setup.md index 9b54d1dc5e6..2ed05806113 100644 --- a/src/current/_includes/molt/molt-setup.md +++ b/src/current/_includes/molt/molt-setup.md @@ -9,7 +9,7 @@ - Create a CockroachDB [{{ site.data.products.cloud }}]({% link cockroachcloud/create-your-cluster.md %}) or [{{ site.data.products.core }}]({% link {{ site.current_cloud_version }}/install-cockroachdb-mac.md %}) cluster. - Install the [MOLT (Migrate Off Legacy Technology)]({% link releases/molt.md %}#installation) tools. -- Review the [Fetch]({% link molt/molt-fetch.md %}#best-practices) and {% if page.name != "migrate-bulk-load.md" %}[Replicator]({% link molt/molt-replicator.md %}#best-practices){% endif %} best practices. +- Review the [Fetch]({% link molt/molt-fetch-best-practices.md %}) and {% if page.name != "migrate-bulk-load.md" %}[Replicator]({% link molt/molt-replicator-best-practices.md %}){% endif %} best practices. - Review [Migration Strategy]({% link molt/migration-strategy.md %}).
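As a minimal sketch of the pre-load cleanup described in the drop-constraints include above (table, constraint, and index names are hypothetical), dropping a foreign key and a secondary index on the target looks like:

~~~ sql
-- Hypothetical pre-load cleanup on the target CockroachDB database:
-- drop a foreign key and a secondary index, but keep the primary key.
ALTER TABLE stores DROP CONSTRAINT fk_stores_region;
DROP INDEX stores@stores_name_idx;
~~~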
diff --git a/src/current/_includes/releases/v24.2/feature-highlights-migrations.html b/src/current/_includes/releases/v24.2/feature-highlights-migrations.html index 7df377498f3..fac0c718033 100644 --- a/src/current/_includes/releases/v24.2/feature-highlights-migrations.html +++ b/src/current/_includes/releases/v24.2/feature-highlights-migrations.html @@ -31,7 +31,7 @@ MOLT Fetch transformation rules

- Column exclusion, computed columns, and partitioned tables are now supported in table migrations with MOLT Fetch. They are supported via a new transformations framework that allows the user to specify a JSON file with instructions on how MOLT Fetch should treat certain schemas, tables, or underlying columns. + Column exclusion, computed columns, and partitioned tables are now supported in table migrations with MOLT Fetch. They are supported via a new transformations framework that allows the user to specify a JSON file with instructions on how MOLT Fetch should treat certain schemas, tables, or underlying columns.

All★★ diff --git a/src/current/_includes/v23.1/sidebar-data/migrate.json b/src/current/_includes/v23.1/sidebar-data/migrate.json index 81d046ba2d9..fde9c2bf230 100644 --- a/src/current/_includes/v23.1/sidebar-data/migrate.json +++ b/src/current/_includes/v23.1/sidebar-data/migrate.json @@ -9,36 +9,42 @@ ] }, { - "title": "Migration Strategy", - "urls": [ - "/molt/migration-strategy.html" - ] - }, - { - "title": "Migration Flows", + "title": "Migration Considerations", "items": [ { - "title": "Bulk Load", + "title": "Overview", + "urls": [ + "/molt/migration-considerations.html" + ] + }, + { + "title": "Migration Granularity", + "urls": [ + "/molt/migration-considerations-phases.html" + ] + }, + { + "title": "Continuous Replication", "urls": [ - "/molt/migrate-bulk-load.html" + "/molt/migration-considerations-replication.html" ] }, { - "title": "Load and Replicate Separately", + "title": "Data Transformation Strategy", "urls": [ - "/molt/migrate-load-replicate.html" + "/molt/migration-considerations-transformation.html" ] }, { - "title": "Resume Replication", + "title": "Validation Strategy", "urls": [ - "/molt/migrate-resume-replication.html" + "/molt/migration-considerations-validation.html" ] }, { - "title": "Failback", + "title": "Rollback Plan", "urls": [ - "/molt/migrate-failback.html" + "/molt/migration-considerations-rollback.html" ] } ] @@ -48,14 +54,60 @@ "items": [ { "title": "Schema Conversion Tool", - "urls": [ - "/cockroachcloud/migrations-page.html" + "items": [ + { + "title": "Guide", + "urls": [ + "/cockroachcloud/migrations-page.html" + ] + }, + { + "title": "Type Mapping", + "urls": [ + "/molt/molt-type-mapping.html" + ] + } ] }, { "title": "Fetch", - "urls": [ - "/molt/molt-fetch.html" + "items": [ + { + "title": "Guide", + "urls": [ + "/molt/molt-fetch.html" + ] + }, + { + "title": "Installation", + "urls": [ + "/molt/molt-fetch-installation.html" + ] + }, + { + "title": "Commands and Flags", + "urls": [ + "/molt/molt-fetch-commands-and-flags.html" + ] + }, + { + "title": "Metrics", + "urls": [ + "/molt/molt-fetch-monitoring.html" + ] + }, + { + "title": "Best Practices", + "urls": [ + "/molt/molt-fetch-best-practices.html" + ] + }, + { + "title": "Troubleshooting", + "urls": [ + "/molt/molt-fetch-troubleshooting.html" + ] + } ] }, { @@ -68,7 +120,13 @@ ] }, { - "title": "Flags", + "title": "Installation", + "urls": [ + "/molt/molt-replicator-installation.html" + ] + }, + { + "title": "Commands and Flags", "urls": [ "/molt/replicator-flags.html" ] @@ -78,6 +136,18 @@ "urls": [ "/molt/replicator-metrics.html" ] + }, + { + "title": "Best Practices", + "urls": [ + "/molt/molt-replicator-best-practices.html" + ] + }, + { + "title": "Troubleshooting", + "urls": [ + "/molt/molt-replicator-troubleshooting.html" + ] } ] }, @@ -89,6 +159,41 @@ } ] }, + { + "title": "Common Migration Approaches", + "items": [ + { + "title": "Classic Bulk Load Migration", + "urls": [ + "/molt/migration-approach-classic-bulk-load.html" + ] + }, + { + "title": "Phased Bulk Load Migration", + "urls": [ + "/molt/migration-approach-phased-bulk-load.html" + ] + }, + { + "title": "Delta Migration", + "urls": [ + "/molt/migration-approach-delta.html" + ] + }, + { + "title": "Streaming Migration", + "urls": [ + "/molt/migration-approach-streaming.html" + ] + }, + { + "title": "Active-Active Migration", + "urls": [ + "/molt/migration-approach-active-active.html" + ] + } + ] + }, { "title": "Third-Party Migration Tools", "items": [ diff --git a/src/current/_includes/v23.2/sidebar-data/migrate.json 
b/src/current/_includes/v23.2/sidebar-data/migrate.json index 81d046ba2d9..fde9c2bf230 100644 --- a/src/current/_includes/v23.2/sidebar-data/migrate.json +++ b/src/current/_includes/v23.2/sidebar-data/migrate.json @@ -9,36 +9,42 @@ ] }, { - "title": "Migration Strategy", - "urls": [ - "/molt/migration-strategy.html" - ] - }, - { - "title": "Migration Flows", + "title": "Migration Considerations", "items": [ { - "title": "Bulk Load", + "title": "Overview", + "urls": [ + "/molt/migration-considerations.html" + ] + }, + { + "title": "Migration Granularity", + "urls": [ + "/molt/migration-considerations-phases.html" + ] + }, + { + "title": "Continuous Replication", "urls": [ - "/molt/migrate-bulk-load.html" + "/molt/migration-considerations-replication.html" ] }, { - "title": "Load and Replicate Separately", + "title": "Data Transformation Strategy", "urls": [ - "/molt/migrate-load-replicate.html" + "/molt/migration-considerations-transformation.html" ] }, { - "title": "Resume Replication", + "title": "Validation Strategy", "urls": [ - "/molt/migrate-resume-replication.html" + "/molt/migration-considerations-validation.html" ] }, { - "title": "Failback", + "title": "Rollback Plan", "urls": [ - "/molt/migrate-failback.html" + "/molt/migration-considerations-rollback.html" ] } ] @@ -48,14 +54,60 @@ "items": [ { "title": "Schema Conversion Tool", - "urls": [ - "/cockroachcloud/migrations-page.html" + "items": [ + { + "title": "Guide", + "urls": [ + "/cockroachcloud/migrations-page.html" + ] + }, + { + "title": "Type Mapping", + "urls": [ + "/molt/molt-type-mapping.html" + ] + } ] }, { "title": "Fetch", - "urls": [ - "/molt/molt-fetch.html" + "items": [ + { + "title": "Guide", + "urls": [ + "/molt/molt-fetch.html" + ] + }, + { + "title": "Installation", + "urls": [ + "/molt/molt-fetch-installation.html" + ] + }, + { + "title": "Commands and Flags", + "urls": [ + "/molt/molt-fetch-commands-and-flags.html" + ] + }, + { + "title": "Metrics", + "urls": [ + "/molt/molt-fetch-monitoring.html" + ] + }, + { + "title": "Best Practices", + "urls": [ + "/molt/molt-fetch-best-practices.html" + ] + }, + { + "title": "Troubleshooting", + "urls": [ + "/molt/molt-fetch-troubleshooting.html" + ] + } ] }, { @@ -68,7 +120,13 @@ ] }, { - "title": "Flags", + "title": "Installation", + "urls": [ + "/molt/molt-replicator-installation.html" + ] + }, + { + "title": "Commands and Flags", "urls": [ "/molt/replicator-flags.html" ] @@ -78,6 +136,18 @@ "urls": [ "/molt/replicator-metrics.html" ] + }, + { + "title": "Best Practices", + "urls": [ + "/molt/molt-replicator-best-practices.html" + ] + }, + { + "title": "Troubleshooting", + "urls": [ + "/molt/molt-replicator-troubleshooting.html" + ] } ] }, @@ -89,6 +159,41 @@ } ] }, + { + "title": "Common Migration Approaches", + "items": [ + { + "title": "Classic Bulk Load Migration", + "urls": [ + "/molt/migration-approach-classic-bulk-load.html" + ] + }, + { + "title": "Phased Bulk Load Migration", + "urls": [ + "/molt/migration-approach-phased-bulk-load.html" + ] + }, + { + "title": "Delta Migration", + "urls": [ + "/molt/migration-approach-delta.html" + ] + }, + { + "title": "Streaming Migration", + "urls": [ + "/molt/migration-approach-streaming.html" + ] + }, + { + "title": "Active-Active Migration", + "urls": [ + "/molt/migration-approach-active-active.html" + ] + } + ] + }, { "title": "Third-Party Migration Tools", "items": [ diff --git a/src/current/_includes/v24.1/sidebar-data/migrate.json b/src/current/_includes/v24.1/sidebar-data/migrate.json index 
81d046ba2d9..fde9c2bf230 100644 --- a/src/current/_includes/v24.1/sidebar-data/migrate.json +++ b/src/current/_includes/v24.1/sidebar-data/migrate.json @@ -9,36 +9,42 @@ ] }, { - "title": "Migration Strategy", - "urls": [ - "/molt/migration-strategy.html" - ] - }, - { - "title": "Migration Flows", + "title": "Migration Considerations", "items": [ { - "title": "Bulk Load", + "title": "Overview", + "urls": [ + "/molt/migration-considerations.html" + ] + }, + { + "title": "Migration Granularity", + "urls": [ + "/molt/migration-considerations-phases.html" + ] + }, + { + "title": "Continuous Replication", "urls": [ - "/molt/migrate-bulk-load.html" + "/molt/migration-considerations-replication.html" ] }, { - "title": "Load and Replicate Separately", + "title": "Data Transformation Strategy", "urls": [ - "/molt/migrate-load-replicate.html" + "/molt/migration-considerations-transformation.html" ] }, { - "title": "Resume Replication", + "title": "Validation Strategy", "urls": [ - "/molt/migrate-resume-replication.html" + "/molt/migration-considerations-validation.html" ] }, { - "title": "Failback", + "title": "Rollback Plan", "urls": [ - "/molt/migrate-failback.html" + "/molt/migration-considerations-rollback.html" ] } ] @@ -48,14 +54,60 @@ "items": [ { "title": "Schema Conversion Tool", - "urls": [ - "/cockroachcloud/migrations-page.html" + "items": [ + { + "title": "Guide", + "urls": [ + "/cockroachcloud/migrations-page.html" + ] + }, + { + "title": "Type Mapping", + "urls": [ + "/molt/molt-type-mapping.html" + ] + } ] }, { "title": "Fetch", - "urls": [ - "/molt/molt-fetch.html" + "items": [ + { + "title": "Guide", + "urls": [ + "/molt/molt-fetch.html" + ] + }, + { + "title": "Installation", + "urls": [ + "/molt/molt-fetch-installation.html" + ] + }, + { + "title": "Commands and Flags", + "urls": [ + "/molt/molt-fetch-commands-and-flags.html" + ] + }, + { + "title": "Metrics", + "urls": [ + "/molt/molt-fetch-monitoring.html" + ] + }, + { + "title": "Best Practices", + "urls": [ + "/molt/molt-fetch-best-practices.html" + ] + }, + { + "title": "Troubleshooting", + "urls": [ + "/molt/molt-fetch-troubleshooting.html" + ] + } ] }, { @@ -68,7 +120,13 @@ ] }, { - "title": "Flags", + "title": "Installation", + "urls": [ + "/molt/molt-replicator-installation.html" + ] + }, + { + "title": "Commands and Flags", "urls": [ "/molt/replicator-flags.html" ] @@ -78,6 +136,18 @@ "urls": [ "/molt/replicator-metrics.html" ] + }, + { + "title": "Best Practices", + "urls": [ + "/molt/molt-replicator-best-practices.html" + ] + }, + { + "title": "Troubleshooting", + "urls": [ + "/molt/molt-replicator-troubleshooting.html" + ] } ] }, @@ -89,6 +159,41 @@ } ] }, + { + "title": "Common Migration Approaches", + "items": [ + { + "title": "Classic Bulk Load Migration", + "urls": [ + "/molt/migration-approach-classic-bulk-load.html" + ] + }, + { + "title": "Phased Bulk Load Migration", + "urls": [ + "/molt/migration-approach-phased-bulk-load.html" + ] + }, + { + "title": "Delta Migration", + "urls": [ + "/molt/migration-approach-delta.html" + ] + }, + { + "title": "Streaming Migration", + "urls": [ + "/molt/migration-approach-streaming.html" + ] + }, + { + "title": "Active-Active Migration", + "urls": [ + "/molt/migration-approach-active-active.html" + ] + } + ] + }, { "title": "Third-Party Migration Tools", "items": [ diff --git a/src/current/_includes/v24.2/sidebar-data/migrate.json b/src/current/_includes/v24.2/sidebar-data/migrate.json index 81d046ba2d9..fde9c2bf230 100644 --- 
a/src/current/_includes/v24.2/sidebar-data/migrate.json +++ b/src/current/_includes/v24.2/sidebar-data/migrate.json @@ -9,36 +9,42 @@ ] }, { - "title": "Migration Strategy", - "urls": [ - "/molt/migration-strategy.html" - ] - }, - { - "title": "Migration Flows", + "title": "Migration Considerations", "items": [ { - "title": "Bulk Load", + "title": "Overview", + "urls": [ + "/molt/migration-considerations.html" + ] + }, + { + "title": "Migration Granularity", + "urls": [ + "/molt/migration-considerations-phases.html" + ] + }, + { + "title": "Continuous Replication", "urls": [ - "/molt/migrate-bulk-load.html" + "/molt/migration-considerations-replication.html" ] }, { - "title": "Load and Replicate Separately", + "title": "Data Transformation Strategy", "urls": [ - "/molt/migrate-load-replicate.html" + "/molt/migration-considerations-transformation.html" ] }, { - "title": "Resume Replication", + "title": "Validation Strategy", "urls": [ - "/molt/migrate-resume-replication.html" + "/molt/migration-considerations-validation.html" ] }, { - "title": "Failback", + "title": "Rollback Plan", "urls": [ - "/molt/migrate-failback.html" + "/molt/migration-considerations-rollback.html" ] } ] @@ -48,14 +54,60 @@ "items": [ { "title": "Schema Conversion Tool", - "urls": [ - "/cockroachcloud/migrations-page.html" + "items": [ + { + "title": "Guide", + "urls": [ + "/cockroachcloud/migrations-page.html" + ] + }, + { + "title": "Type Mapping", + "urls": [ + "/molt/molt-type-mapping.html" + ] + } ] }, { "title": "Fetch", - "urls": [ - "/molt/molt-fetch.html" + "items": [ + { + "title": "Guide", + "urls": [ + "/molt/molt-fetch.html" + ] + }, + { + "title": "Installation", + "urls": [ + "/molt/molt-fetch-installation.html" + ] + }, + { + "title": "Commands and Flags", + "urls": [ + "/molt/molt-fetch-commands-and-flags.html" + ] + }, + { + "title": "Metrics", + "urls": [ + "/molt/molt-fetch-monitoring.html" + ] + }, + { + "title": "Best Practices", + "urls": [ + "/molt/molt-fetch-best-practices.html" + ] + }, + { + "title": "Troubleshooting", + "urls": [ + "/molt/molt-fetch-troubleshooting.html" + ] + } ] }, { @@ -68,7 +120,13 @@ ] }, { - "title": "Flags", + "title": "Installation", + "urls": [ + "/molt/molt-replicator-installation.html" + ] + }, + { + "title": "Commands and Flags", "urls": [ "/molt/replicator-flags.html" ] @@ -78,6 +136,18 @@ "urls": [ "/molt/replicator-metrics.html" ] + }, + { + "title": "Best Practices", + "urls": [ + "/molt/molt-replicator-best-practices.html" + ] + }, + { + "title": "Troubleshooting", + "urls": [ + "/molt/molt-replicator-troubleshooting.html" + ] } ] }, @@ -89,6 +159,41 @@ } ] }, + { + "title": "Common Migration Approaches", + "items": [ + { + "title": "Classic Bulk Load Migration", + "urls": [ + "/molt/migration-approach-classic-bulk-load.html" + ] + }, + { + "title": "Phased Bulk Load Migration", + "urls": [ + "/molt/migration-approach-phased-bulk-load.html" + ] + }, + { + "title": "Delta Migration", + "urls": [ + "/molt/migration-approach-delta.html" + ] + }, + { + "title": "Streaming Migration", + "urls": [ + "/molt/migration-approach-streaming.html" + ] + }, + { + "title": "Active-Active Migration", + "urls": [ + "/molt/migration-approach-active-active.html" + ] + } + ] + }, { "title": "Third-Party Migration Tools", "items": [ diff --git a/src/current/_includes/v24.3/sidebar-data/migrate.json b/src/current/_includes/v24.3/sidebar-data/migrate.json index 81d046ba2d9..fde9c2bf230 100644 --- a/src/current/_includes/v24.3/sidebar-data/migrate.json +++ 
b/src/current/_includes/v24.3/sidebar-data/migrate.json @@ -9,36 +9,42 @@ ] }, { - "title": "Migration Strategy", - "urls": [ - "/molt/migration-strategy.html" - ] - }, - { - "title": "Migration Flows", + "title": "Migration Considerations", "items": [ { - "title": "Bulk Load", + "title": "Overview", + "urls": [ + "/molt/migration-considerations.html" + ] + }, + { + "title": "Migration Granularity", + "urls": [ + "/molt/migration-considerations-phases.html" + ] + }, + { + "title": "Continuous Replication", "urls": [ - "/molt/migrate-bulk-load.html" + "/molt/migration-considerations-replication.html" ] }, { - "title": "Load and Replicate Separately", + "title": "Data Transformation Strategy", "urls": [ - "/molt/migrate-load-replicate.html" + "/molt/migration-considerations-transformation.html" ] }, { - "title": "Resume Replication", + "title": "Validation Strategy", "urls": [ - "/molt/migrate-resume-replication.html" + "/molt/migration-considerations-validation.html" ] }, { - "title": "Failback", + "title": "Rollback Plan", "urls": [ - "/molt/migrate-failback.html" + "/molt/migration-considerations-rollback.html" ] } ] @@ -48,14 +54,60 @@ "items": [ { "title": "Schema Conversion Tool", - "urls": [ - "/cockroachcloud/migrations-page.html" + "items": [ + { + "title": "Guide", + "urls": [ + "/cockroachcloud/migrations-page.html" + ] + }, + { + "title": "Type Mapping", + "urls": [ + "/molt/molt-type-mapping.html" + ] + } ] }, { "title": "Fetch", - "urls": [ - "/molt/molt-fetch.html" + "items": [ + { + "title": "Guide", + "urls": [ + "/molt/molt-fetch.html" + ] + }, + { + "title": "Installation", + "urls": [ + "/molt/molt-fetch-installation.html" + ] + }, + { + "title": "Commands and Flags", + "urls": [ + "/molt/molt-fetch-commands-and-flags.html" + ] + }, + { + "title": "Metrics", + "urls": [ + "/molt/molt-fetch-monitoring.html" + ] + }, + { + "title": "Best Practices", + "urls": [ + "/molt/molt-fetch-best-practices.html" + ] + }, + { + "title": "Troubleshooting", + "urls": [ + "/molt/molt-fetch-troubleshooting.html" + ] + } ] }, { @@ -68,7 +120,13 @@ ] }, { - "title": "Flags", + "title": "Installation", + "urls": [ + "/molt/molt-replicator-installation.html" + ] + }, + { + "title": "Commands and Flags", "urls": [ "/molt/replicator-flags.html" ] @@ -78,6 +136,18 @@ "urls": [ "/molt/replicator-metrics.html" ] + }, + { + "title": "Best Practices", + "urls": [ + "/molt/molt-replicator-best-practices.html" + ] + }, + { + "title": "Troubleshooting", + "urls": [ + "/molt/molt-replicator-troubleshooting.html" + ] } ] }, @@ -89,6 +159,41 @@ } ] }, + { + "title": "Common Migration Approaches", + "items": [ + { + "title": "Classic Bulk Load Migration", + "urls": [ + "/molt/migration-approach-classic-bulk-load.html" + ] + }, + { + "title": "Phased Bulk Load Migration", + "urls": [ + "/molt/migration-approach-phased-bulk-load.html" + ] + }, + { + "title": "Delta Migration", + "urls": [ + "/molt/migration-approach-delta.html" + ] + }, + { + "title": "Streaming Migration", + "urls": [ + "/molt/migration-approach-streaming.html" + ] + }, + { + "title": "Active-Active Migration", + "urls": [ + "/molt/migration-approach-active-active.html" + ] + } + ] + }, { "title": "Third-Party Migration Tools", "items": [ diff --git a/src/current/_includes/v25.1/sidebar-data/migrate.json b/src/current/_includes/v25.1/sidebar-data/migrate.json index e6ba00a899c..fde9c2bf230 100644 --- a/src/current/_includes/v25.1/sidebar-data/migrate.json +++ b/src/current/_includes/v25.1/sidebar-data/migrate.json @@ -9,36 +9,42 @@ ] }, { - 
"title": "Migration Strategy", - "urls": [ - "/molt/migration-strategy.html" - ] - }, - { - "title": "Migration Flows", + "title": "Migration Considerations", "items": [ { - "title": "Bulk Load", + "title": "Overview", + "urls": [ + "/molt/migration-considerations.html" + ] + }, + { + "title": "Migration Granularity", + "urls": [ + "/molt/migration-considerations-phases.html" + ] + }, + { + "title": "Continuous Replication", "urls": [ - "/molt/migrate-bulk-load.html" + "/molt/migration-considerations-replication.html" ] }, { - "title": "Load and Replicate", + "title": "Data Transformation Strategy", "urls": [ - "/molt/migrate-load-replicate.html" + "/molt/migration-considerations-transformation.html" ] }, { - "title": "Resume Replication", + "title": "Validation Strategy", "urls": [ - "/molt/migrate-resume-replication.html" + "/molt/migration-considerations-validation.html" ] }, { - "title": "Failback", + "title": "Rollback Plan", "urls": [ - "/molt/migrate-failback.html" + "/molt/migration-considerations-rollback.html" ] } ] @@ -48,14 +54,60 @@ "items": [ { "title": "Schema Conversion Tool", - "urls": [ - "/cockroachcloud/migrations-page.html" + "items": [ + { + "title": "Guide", + "urls": [ + "/cockroachcloud/migrations-page.html" + ] + }, + { + "title": "Type Mapping", + "urls": [ + "/molt/molt-type-mapping.html" + ] + } ] }, { "title": "Fetch", - "urls": [ - "/molt/molt-fetch.html" + "items": [ + { + "title": "Guide", + "urls": [ + "/molt/molt-fetch.html" + ] + }, + { + "title": "Installation", + "urls": [ + "/molt/molt-fetch-installation.html" + ] + }, + { + "title": "Commands and Flags", + "urls": [ + "/molt/molt-fetch-commands-and-flags.html" + ] + }, + { + "title": "Metrics", + "urls": [ + "/molt/molt-fetch-monitoring.html" + ] + }, + { + "title": "Best Practices", + "urls": [ + "/molt/molt-fetch-best-practices.html" + ] + }, + { + "title": "Troubleshooting", + "urls": [ + "/molt/molt-fetch-troubleshooting.html" + ] + } ] }, { @@ -68,7 +120,13 @@ ] }, { - "title": "Flags", + "title": "Installation", + "urls": [ + "/molt/molt-replicator-installation.html" + ] + }, + { + "title": "Commands and Flags", "urls": [ "/molt/replicator-flags.html" ] @@ -78,6 +136,18 @@ "urls": [ "/molt/replicator-metrics.html" ] + }, + { + "title": "Best Practices", + "urls": [ + "/molt/molt-replicator-best-practices.html" + ] + }, + { + "title": "Troubleshooting", + "urls": [ + "/molt/molt-replicator-troubleshooting.html" + ] } ] }, @@ -89,6 +159,41 @@ } ] }, + { + "title": "Common Migration Approaches", + "items": [ + { + "title": "Classic Bulk Load Migration", + "urls": [ + "/molt/migration-approach-classic-bulk-load.html" + ] + }, + { + "title": "Phased Bulk Load Migration", + "urls": [ + "/molt/migration-approach-phased-bulk-load.html" + ] + }, + { + "title": "Delta Migration", + "urls": [ + "/molt/migration-approach-delta.html" + ] + }, + { + "title": "Streaming Migration", + "urls": [ + "/molt/migration-approach-streaming.html" + ] + }, + { + "title": "Active-Active Migration", + "urls": [ + "/molt/migration-approach-active-active.html" + ] + } + ] + }, { "title": "Third-Party Migration Tools", "items": [ diff --git a/src/current/_includes/v25.2/sidebar-data/migrate.json b/src/current/_includes/v25.2/sidebar-data/migrate.json index 7693e764268..fde9c2bf230 100644 --- a/src/current/_includes/v25.2/sidebar-data/migrate.json +++ b/src/current/_includes/v25.2/sidebar-data/migrate.json @@ -9,36 +9,42 @@ ] }, { - "title": "Migration Strategy", - "urls": [ - "/molt/migration-strategy.html" - ] - }, - { - 
"title": "Migration Flows", + "title": "Migration Considerations", "items": [ { - "title": "Bulk Load", + "title": "Overview", + "urls": [ + "/molt/migration-considerations.html" + ] + }, + { + "title": "Migration Granularity", + "urls": [ + "/molt/migration-considerations-phases.html" + ] + }, + { + "title": "Continuous Replication", "urls": [ - "/molt/migrate-bulk-load.html" + "/molt/migration-considerations-replication.html" ] }, { - "title": "Load and Replicate", + "title": "Data Transformation Strategy", "urls": [ - "/molt/migrate-load-replicate.html" + "/molt/migration-considerations-transformation.html" ] }, { - "title": "Resume Replication", + "title": "Validation Strategy", "urls": [ - "/molt/migrate-resume-replication.html" + "/molt/migration-considerations-validation.html" ] }, { - "title": "Failback", + "title": "Rollback Plan", "urls": [ - "/molt/migrate-failback.html" + "/molt/migration-considerations-rollback.html" ] } ] @@ -48,14 +54,60 @@ "items": [ { "title": "Schema Conversion Tool", - "urls": [ - "/cockroachcloud/migrations-page.html" + "items": [ + { + "title": "Guide", + "urls": [ + "/cockroachcloud/migrations-page.html" + ] + }, + { + "title": "Type Mapping", + "urls": [ + "/molt/molt-type-mapping.html" + ] + } ] }, { "title": "Fetch", - "urls": [ - "/molt/molt-fetch.html" + "items": [ + { + "title": "Guide", + "urls": [ + "/molt/molt-fetch.html" + ] + }, + { + "title": "Installation", + "urls": [ + "/molt/molt-fetch-installation.html" + ] + }, + { + "title": "Commands and Flags", + "urls": [ + "/molt/molt-fetch-commands-and-flags.html" + ] + }, + { + "title": "Metrics", + "urls": [ + "/molt/molt-fetch-monitoring.html" + ] + }, + { + "title": "Best Practices", + "urls": [ + "/molt/molt-fetch-best-practices.html" + ] + }, + { + "title": "Troubleshooting", + "urls": [ + "/molt/molt-fetch-troubleshooting.html" + ] + } ] }, { @@ -68,7 +120,13 @@ ] }, { - "title": "Flags", + "title": "Installation", + "urls": [ + "/molt/molt-replicator-installation.html" + ] + }, + { + "title": "Commands and Flags", "urls": [ "/molt/replicator-flags.html" ] @@ -78,6 +136,18 @@ "urls": [ "/molt/replicator-metrics.html" ] + }, + { + "title": "Best Practices", + "urls": [ + "/molt/molt-replicator-best-practices.html" + ] + }, + { + "title": "Troubleshooting", + "urls": [ + "/molt/molt-replicator-troubleshooting.html" + ] } ] }, @@ -89,6 +159,41 @@ } ] }, + { + "title": "Common Migration Approaches", + "items": [ + { + "title": "Classic Bulk Load Migration", + "urls": [ + "/molt/migration-approach-classic-bulk-load.html" + ] + }, + { + "title": "Phased Bulk Load Migration", + "urls": [ + "/molt/migration-approach-phased-bulk-load.html" + ] + }, + { + "title": "Delta Migration", + "urls": [ + "/molt/migration-approach-delta.html" + ] + }, + { + "title": "Streaming Migration", + "urls": [ + "/molt/migration-approach-streaming.html" + ] + }, + { + "title": "Active-Active Migration", + "urls": [ + "/molt/migration-approach-active-active.html" + ] + } + ] + }, { "title": "Third-Party Migration Tools", "items": [ @@ -172,4 +277,4 @@ ] } ] -} +} \ No newline at end of file diff --git a/src/current/_includes/v25.3/sidebar-data/migrate.json b/src/current/_includes/v25.3/sidebar-data/migrate.json index e6ba00a899c..fde9c2bf230 100644 --- a/src/current/_includes/v25.3/sidebar-data/migrate.json +++ b/src/current/_includes/v25.3/sidebar-data/migrate.json @@ -9,36 +9,42 @@ ] }, { - "title": "Migration Strategy", - "urls": [ - "/molt/migration-strategy.html" - ] - }, - { - "title": "Migration Flows", + 
"title": "Migration Considerations", "items": [ { - "title": "Bulk Load", + "title": "Overview", + "urls": [ + "/molt/migration-considerations.html" + ] + }, + { + "title": "Migration Granularity", + "urls": [ + "/molt/migration-considerations-phases.html" + ] + }, + { + "title": "Continuous Replication", "urls": [ - "/molt/migrate-bulk-load.html" + "/molt/migration-considerations-replication.html" ] }, { - "title": "Load and Replicate", + "title": "Data Transformation Strategy", "urls": [ - "/molt/migrate-load-replicate.html" + "/molt/migration-considerations-transformation.html" ] }, { - "title": "Resume Replication", + "title": "Validation Strategy", "urls": [ - "/molt/migrate-resume-replication.html" + "/molt/migration-considerations-validation.html" ] }, { - "title": "Failback", + "title": "Rollback Plan", "urls": [ - "/molt/migrate-failback.html" + "/molt/migration-considerations-rollback.html" ] } ] @@ -48,14 +54,60 @@ "items": [ { "title": "Schema Conversion Tool", - "urls": [ - "/cockroachcloud/migrations-page.html" + "items": [ + { + "title": "Guide", + "urls": [ + "/cockroachcloud/migrations-page.html" + ] + }, + { + "title": "Type Mapping", + "urls": [ + "/molt/molt-type-mapping.html" + ] + } ] }, { "title": "Fetch", - "urls": [ - "/molt/molt-fetch.html" + "items": [ + { + "title": "Guide", + "urls": [ + "/molt/molt-fetch.html" + ] + }, + { + "title": "Installation", + "urls": [ + "/molt/molt-fetch-installation.html" + ] + }, + { + "title": "Commands and Flags", + "urls": [ + "/molt/molt-fetch-commands-and-flags.html" + ] + }, + { + "title": "Metrics", + "urls": [ + "/molt/molt-fetch-monitoring.html" + ] + }, + { + "title": "Best Practices", + "urls": [ + "/molt/molt-fetch-best-practices.html" + ] + }, + { + "title": "Troubleshooting", + "urls": [ + "/molt/molt-fetch-troubleshooting.html" + ] + } ] }, { @@ -68,7 +120,13 @@ ] }, { - "title": "Flags", + "title": "Installation", + "urls": [ + "/molt/molt-replicator-installation.html" + ] + }, + { + "title": "Commands and Flags", "urls": [ "/molt/replicator-flags.html" ] @@ -78,6 +136,18 @@ "urls": [ "/molt/replicator-metrics.html" ] + }, + { + "title": "Best Practices", + "urls": [ + "/molt/molt-replicator-best-practices.html" + ] + }, + { + "title": "Troubleshooting", + "urls": [ + "/molt/molt-replicator-troubleshooting.html" + ] } ] }, @@ -89,6 +159,41 @@ } ] }, + { + "title": "Common Migration Approaches", + "items": [ + { + "title": "Classic Bulk Load Migration", + "urls": [ + "/molt/migration-approach-classic-bulk-load.html" + ] + }, + { + "title": "Phased Bulk Load Migration", + "urls": [ + "/molt/migration-approach-phased-bulk-load.html" + ] + }, + { + "title": "Delta Migration", + "urls": [ + "/molt/migration-approach-delta.html" + ] + }, + { + "title": "Streaming Migration", + "urls": [ + "/molt/migration-approach-streaming.html" + ] + }, + { + "title": "Active-Active Migration", + "urls": [ + "/molt/migration-approach-active-active.html" + ] + } + ] + }, { "title": "Third-Party Migration Tools", "items": [ diff --git a/src/current/_includes/v25.4/sidebar-data/migrate.json b/src/current/_includes/v25.4/sidebar-data/migrate.json index e6ba00a899c..fde9c2bf230 100644 --- a/src/current/_includes/v25.4/sidebar-data/migrate.json +++ b/src/current/_includes/v25.4/sidebar-data/migrate.json @@ -9,36 +9,42 @@ ] }, { - "title": "Migration Strategy", - "urls": [ - "/molt/migration-strategy.html" - ] - }, - { - "title": "Migration Flows", + "title": "Migration Considerations", "items": [ { - "title": "Bulk Load", + "title": 
"Overview", + "urls": [ + "/molt/migration-considerations.html" + ] + }, + { + "title": "Migration Granularity", + "urls": [ + "/molt/migration-considerations-phases.html" + ] + }, + { + "title": "Continuous Replication", "urls": [ - "/molt/migrate-bulk-load.html" + "/molt/migration-considerations-replication.html" ] }, { - "title": "Load and Replicate", + "title": "Data Transformation Strategy", "urls": [ - "/molt/migrate-load-replicate.html" + "/molt/migration-considerations-transformation.html" ] }, { - "title": "Resume Replication", + "title": "Validation Strategy", "urls": [ - "/molt/migrate-resume-replication.html" + "/molt/migration-considerations-validation.html" ] }, { - "title": "Failback", + "title": "Rollback Plan", "urls": [ - "/molt/migrate-failback.html" + "/molt/migration-considerations-rollback.html" ] } ] @@ -48,14 +54,60 @@ "items": [ { "title": "Schema Conversion Tool", - "urls": [ - "/cockroachcloud/migrations-page.html" + "items": [ + { + "title": "Guide", + "urls": [ + "/cockroachcloud/migrations-page.html" + ] + }, + { + "title": "Type Mapping", + "urls": [ + "/molt/molt-type-mapping.html" + ] + } ] }, { "title": "Fetch", - "urls": [ - "/molt/molt-fetch.html" + "items": [ + { + "title": "Guide", + "urls": [ + "/molt/molt-fetch.html" + ] + }, + { + "title": "Installation", + "urls": [ + "/molt/molt-fetch-installation.html" + ] + }, + { + "title": "Commands and Flags", + "urls": [ + "/molt/molt-fetch-commands-and-flags.html" + ] + }, + { + "title": "Metrics", + "urls": [ + "/molt/molt-fetch-monitoring.html" + ] + }, + { + "title": "Best Practices", + "urls": [ + "/molt/molt-fetch-best-practices.html" + ] + }, + { + "title": "Troubleshooting", + "urls": [ + "/molt/molt-fetch-troubleshooting.html" + ] + } ] }, { @@ -68,7 +120,13 @@ ] }, { - "title": "Flags", + "title": "Installation", + "urls": [ + "/molt/molt-replicator-installation.html" + ] + }, + { + "title": "Commands and Flags", "urls": [ "/molt/replicator-flags.html" ] @@ -78,6 +136,18 @@ "urls": [ "/molt/replicator-metrics.html" ] + }, + { + "title": "Best Practices", + "urls": [ + "/molt/molt-replicator-best-practices.html" + ] + }, + { + "title": "Troubleshooting", + "urls": [ + "/molt/molt-replicator-troubleshooting.html" + ] } ] }, @@ -89,6 +159,41 @@ } ] }, + { + "title": "Common Migration Approaches", + "items": [ + { + "title": "Classic Bulk Load Migration", + "urls": [ + "/molt/migration-approach-classic-bulk-load.html" + ] + }, + { + "title": "Phased Bulk Load Migration", + "urls": [ + "/molt/migration-approach-phased-bulk-load.html" + ] + }, + { + "title": "Delta Migration", + "urls": [ + "/molt/migration-approach-delta.html" + ] + }, + { + "title": "Streaming Migration", + "urls": [ + "/molt/migration-approach-streaming.html" + ] + }, + { + "title": "Active-Active Migration", + "urls": [ + "/molt/migration-approach-active-active.html" + ] + } + ] + }, { "title": "Third-Party Migration Tools", "items": [ diff --git a/src/current/_includes/v26.1/sidebar-data/migrate.json b/src/current/_includes/v26.1/sidebar-data/migrate.json index e6ba00a899c..fde9c2bf230 100644 --- a/src/current/_includes/v26.1/sidebar-data/migrate.json +++ b/src/current/_includes/v26.1/sidebar-data/migrate.json @@ -9,36 +9,42 @@ ] }, { - "title": "Migration Strategy", - "urls": [ - "/molt/migration-strategy.html" - ] - }, - { - "title": "Migration Flows", + "title": "Migration Considerations", "items": [ { - "title": "Bulk Load", + "title": "Overview", + "urls": [ + "/molt/migration-considerations.html" + ] + }, + { + "title": 
"Migration Granularity", + "urls": [ + "/molt/migration-considerations-phases.html" + ] + }, + { + "title": "Continuous Replication", "urls": [ - "/molt/migrate-bulk-load.html" + "/molt/migration-considerations-replication.html" ] }, { - "title": "Load and Replicate", + "title": "Data Transformation Strategy", "urls": [ - "/molt/migrate-load-replicate.html" + "/molt/migration-considerations-transformation.html" ] }, { - "title": "Resume Replication", + "title": "Validation Strategy", "urls": [ - "/molt/migrate-resume-replication.html" + "/molt/migration-considerations-validation.html" ] }, { - "title": "Failback", + "title": "Rollback Plan", "urls": [ - "/molt/migrate-failback.html" + "/molt/migration-considerations-rollback.html" ] } ] @@ -48,14 +54,60 @@ "items": [ { "title": "Schema Conversion Tool", - "urls": [ - "/cockroachcloud/migrations-page.html" + "items": [ + { + "title": "Guide", + "urls": [ + "/cockroachcloud/migrations-page.html" + ] + }, + { + "title": "Type Mapping", + "urls": [ + "/molt/molt-type-mapping.html" + ] + } ] }, { "title": "Fetch", - "urls": [ - "/molt/molt-fetch.html" + "items": [ + { + "title": "Guide", + "urls": [ + "/molt/molt-fetch.html" + ] + }, + { + "title": "Installation", + "urls": [ + "/molt/molt-fetch-installation.html" + ] + }, + { + "title": "Commands and Flags", + "urls": [ + "/molt/molt-fetch-commands-and-flags.html" + ] + }, + { + "title": "Metrics", + "urls": [ + "/molt/molt-fetch-monitoring.html" + ] + }, + { + "title": "Best Practices", + "urls": [ + "/molt/molt-fetch-best-practices.html" + ] + }, + { + "title": "Troubleshooting", + "urls": [ + "/molt/molt-fetch-troubleshooting.html" + ] + } ] }, { @@ -68,7 +120,13 @@ ] }, { - "title": "Flags", + "title": "Installation", + "urls": [ + "/molt/molt-replicator-installation.html" + ] + }, + { + "title": "Commands and Flags", "urls": [ "/molt/replicator-flags.html" ] @@ -78,6 +136,18 @@ "urls": [ "/molt/replicator-metrics.html" ] + }, + { + "title": "Best Practices", + "urls": [ + "/molt/molt-replicator-best-practices.html" + ] + }, + { + "title": "Troubleshooting", + "urls": [ + "/molt/molt-replicator-troubleshooting.html" + ] } ] }, @@ -89,6 +159,41 @@ } ] }, + { + "title": "Common Migration Approaches", + "items": [ + { + "title": "Classic Bulk Load Migration", + "urls": [ + "/molt/migration-approach-classic-bulk-load.html" + ] + }, + { + "title": "Phased Bulk Load Migration", + "urls": [ + "/molt/migration-approach-phased-bulk-load.html" + ] + }, + { + "title": "Delta Migration", + "urls": [ + "/molt/migration-approach-delta.html" + ] + }, + { + "title": "Streaming Migration", + "urls": [ + "/molt/migration-approach-streaming.html" + ] + }, + { + "title": "Active-Active Migration", + "urls": [ + "/molt/migration-approach-active-active.html" + ] + } + ] + }, { "title": "Third-Party Migration Tools", "items": [ diff --git a/src/current/advisories/a144650.md b/src/current/advisories/a144650.md index 969f5c24488..e8d4615b0b2 100644 --- a/src/current/advisories/a144650.md +++ b/src/current/advisories/a144650.md @@ -110,7 +110,7 @@ By default, MOLT Fetch uses [`IMPORT INTO`]({% link v25.1/import-into.md %}) to - If you ran MOLT Verify after completing your MOLT Fetch run, and Verify did not find mismatches, then MOLT Fetch was unaffected by this issue. -- If you did not run Verify after Fetch, analyze the exported files that exist in your configured [Fetch data path]({% link molt/molt-fetch.md %}#data-path) to determine the expected number of rows. 
Then follow steps 1-3 in the [`IMPORT`](#import) section. +- If you did not run Verify after Fetch, analyze the exported files that exist in your configured [Fetch data path]({% link molt/molt-fetch.md %}#define-intermediate-storage) to determine the expected number of rows. Then follow steps 1-3 in the [`IMPORT`](#import) section. #### Physical Cluster Replication diff --git a/src/current/images/molt/molt-fetch-flow-draft.jpg b/src/current/images/molt/molt-fetch-flow-draft.jpg new file mode 100644 index 00000000000..cc40ccfa73e Binary files /dev/null and b/src/current/images/molt/molt-fetch-flow-draft.jpg differ diff --git a/src/current/images/molt/molt_flows_1.svg b/src/current/images/molt/molt_flows_1.svg new file mode 100644 index 00000000000..c730a6f84ca --- /dev/null +++ b/src/current/images/molt/molt_flows_1.svg @@ -0,0 +1,856 @@ [856 added lines of SVG markup omitted] \ No newline at end of file diff --git a/src/current/images/molt/molt_flows_2.svg b/src/current/images/molt/molt_flows_2.svg new file mode 100644 index 00000000000..d965814883e --- /dev/null +++ b/src/current/images/molt/molt_flows_2.svg @@ -0,0 +1,699 @@ [699 added lines of SVG markup omitted] \ No newline at end of file diff --git a/src/current/images/molt/molt_flows_3.svg b/src/current/images/molt/molt_flows_3.svg new file mode 100644 index 00000000000..fa8a9d22f72 --- /dev/null +++ b/src/current/images/molt/molt_flows_3.svg @@ -0,0 +1,736 @@ [736 added lines of SVG markup omitted] \ No newline at end of file diff --git a/src/current/images/molt/molt_flows_4.svg b/src/current/images/molt/molt_flows_4.svg new file mode 100644 index 00000000000..34363f84aa3 --- /dev/null +++ b/src/current/images/molt/molt_flows_4.svg @@ -0,0 +1,885 @@ [885 added lines of SVG markup omitted] \ No newline at end of file diff --git a/src/current/molt/migrate-bulk-load.md b/src/current/molt/migrate-bulk-load.md index 3247c617175..058bcc5759f 100644 --- a/src/current/molt/migrate-bulk-load.md +++ b/src/current/molt/migrate-bulk-load.md @@ -9,13 +9,19 @@ Perform a one-time bulk load of source data into CockroachDB. {% include molt/crdb-to-crdb-migration.md %} +## Migration sequence + +
+[Figure: MOLT tooling overview]
+ {% include molt/molt-setup.md %} ## Start Fetch Perform the bulk load of the source data. -1. Run the [MOLT Fetch]({% link molt/molt-fetch.md %}) command to move the source data into CockroachDB. This example command passes the source and target connection strings [as environment variables](#secure-connections), writes [intermediate files](#intermediate-file-storage) to S3 storage, and uses the `truncate-if-exists` [table handling mode](#table-handling-mode) to truncate the target tables before loading data. It limits the migration to a single schema and filters for three specific tables. The [data load mode](#data-load-mode) defaults to `IMPORT INTO`. Include the `--ignore-replication-check` flag to skip replication checkpoint queries, which eliminates the need to configure the source database for logical replication. +1. Run the [MOLT Fetch]({% link molt/molt-fetch.md %}) command to move the source data into CockroachDB. This example command passes the source and target connection strings [as environment variables](#secure-connections), writes [intermediate files](#intermediate-file-storage) to S3 storage, and uses the `truncate-if-exists` [table handling mode](#table-handling-mode) to truncate the target tables before loading data. It limits the migration to a single schema and filters for three specific tables. The [data load mode]({% link molt/molt-fetch.md %}#import-into-vs-copy-from) defaults to `IMPORT INTO`. Include the `--ignore-replication-check` flag to skip replication checkpoint queries, which eliminates the need to configure the source database for logical replication.
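For illustration, such an invocation might look like the following sketch, where `$SOURCE` and `$TARGET` hold the connection strings and the bucket path, schema, and table names are placeholders:

~~~ shell
# Sketch of a bulk-load invocation: stage intermediate files in S3,
# truncate the target tables, and skip replication checkpoint queries.
# All filter and path values below are placeholders.
molt fetch \
  --source "$SOURCE" \
  --target "$TARGET" \
  --schema-filter 'migration_schema' \
  --table-filter 'employees|payments|orders' \
  --bucket-path 's3://migration-bucket/fetch' \
  --table-handling 'truncate-if-exists' \
  --ignore-replication-check
~~~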
{% include_cached copy-clipboard.html %} @@ -45,7 +51,7 @@ Perform the bulk load of the source data.
- The command assumes an Oracle Multitenant (CDB/PDB) source. `--source-cdb` specifies the container database (CDB) connection string. + The command assumes an Oracle Multitenant (CDB/PDB) source. [`--source-cdb`]({% link molt/molt-fetch-commands-and-flags.md %}#source-cdb) specifies the container database (CDB) connection string. {% include_cached copy-clipboard.html %} ~~~ shell diff --git a/src/current/molt/migrate-load-replicate.md b/src/current/molt/migrate-load-replicate.md index f1de67c8175..679f3f155a9 100644 --- a/src/current/molt/migrate-load-replicate.md +++ b/src/current/molt/migrate-load-replicate.md @@ -15,7 +15,7 @@ Perform an initial bulk load of the source data using [MOLT Fetch]({% link molt/ Perform the initial load of the source data. -1. Issue the [MOLT Fetch]({% link molt/molt-fetch.md %}) command to move the source data to CockroachDB. This example command passes the source and target connection strings [as environment variables](#secure-connections), writes [intermediate files](#intermediate-file-storage) to S3 storage, and uses the `truncate-if-exists` [table handling mode](#table-handling-mode) to truncate the target tables before loading data. It also limits the migration to a single schema and filters three specific tables to migrate. The [data load mode](#data-load-mode) defaults to `IMPORT INTO`. +1. Issue the [MOLT Fetch]({% link molt/molt-fetch.md %}) command to move the source data to CockroachDB. This example command passes the source and target connection strings [as environment variables](#secure-connections), writes [intermediate files](#intermediate-file-storage) to S3 storage, and uses the `truncate-if-exists` [table handling mode](#table-handling-mode) to truncate the target tables before loading data. It also limits the migration to a single schema and filters three specific tables to migrate. The [data load mode]({% link molt/molt-fetch.md %}#import-into-vs-copy-from) defaults to `IMPORT INTO`.
You **must** include `--pglogical-replication-slot-name` and `--pglogical-publication-and-slot-drop-and-recreate` to automatically create the publication and replication slot during the data load. @@ -47,7 +47,7 @@ Perform the initial load of the source data.
- The command assumes an Oracle Multitenant (CDB/PDB) source. `--source-cdb` specifies the container database (CDB) connection string.
+ The command assumes an Oracle Multitenant (CDB/PDB) source. [`--source-cdb`]({% link molt/molt-fetch-commands-and-flags.md %}#source-cdb) specifies the container database (CDB) connection string.

 {% include_cached copy-clipboard.html %}
 ~~~ shell
diff --git a/src/current/molt/migrate-to-cockroachdb.md b/src/current/molt/migrate-to-cockroachdb.md
index a64832549da..745de598299 100644
--- a/src/current/molt/migrate-to-cockroachdb.md
+++ b/src/current/molt/migrate-to-cockroachdb.md
@@ -5,14 +5,14 @@ toc: true
 docs_area: migrate
 ---

-MOLT Fetch supports various migration flows using [MOLT Fetch modes]({% link molt/molt-fetch.md %}#fetch-mode).
+MOLT Fetch supports various migration flows using [MOLT Fetch modes]({% link molt/molt-fetch.md %}#define-fetch-mode).

 {% include molt/crdb-to-crdb-migration.md %}

 | Migration flow | Mode | Description | Best for |
 |---------------------------------------------------------------------|------------------------------|---------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------|
-| [Bulk load]({% link molt/migrate-bulk-load.md %}) | `--mode data-load` | Perform a one-time bulk load of source data into CockroachDB. | Testing, migrations with [planned downtime]({% link molt/migration-strategy.md %}#approach-to-downtime) |
-| [Load and replicate]({% link molt/migrate-load-replicate.md %}) | MOLT Fetch + MOLT Replicator | Load source data using MOLT Fetch, then replicate subsequent changes using MOLT Replicator. | [Minimal downtime]({% link molt/migration-strategy.md %}#approach-to-downtime) migrations |
+| [Bulk load]({% link molt/migrate-bulk-load.md %}) | `--mode data-load` | Perform a one-time bulk load of source data into CockroachDB. | Testing, migrations with [planned downtime]({% link molt/migration-considerations.md %}#permissible-downtime) |
+| [Load and replicate]({% link molt/migrate-load-replicate.md %}) | MOLT Fetch + MOLT Replicator | Load source data using MOLT Fetch, then replicate subsequent changes using MOLT Replicator. | [Minimal downtime]({% link molt/migration-considerations.md %}#permissible-downtime) migrations |
 | [Resume replication]({% link molt/migrate-resume-replication.md %}) | `--mode replication-only` | Resume replication from a checkpoint after interruption. | Resuming interrupted migrations, post-load sync |
 | [Failback]({% link molt/migrate-failback.md %}) | `--mode failback` | Replicate changes from CockroachDB back to the source database. | [Rollback]({% link molt/migrate-failback.md %}) scenarios |

diff --git a/src/current/molt/migration-approach-active-active.md b/src/current/molt/migration-approach-active-active.md
new file mode 100644
index 00000000000..a63fd861a87
--- /dev/null
+++ b/src/current/molt/migration-approach-active-active.md
@@ -0,0 +1,8 @@
+---
+title: Active-Active Migration
+summary: Learn what an Active-Active Migration is, how it relates to the migration considerations, and how to perform it using MOLT tools.
+toc: true +docs_area: migrate +--- + +TBD \ No newline at end of file diff --git a/src/current/molt/migration-approach-classic-bulk-load.md b/src/current/molt/migration-approach-classic-bulk-load.md new file mode 100644 index 00000000000..f899e46a2ed --- /dev/null +++ b/src/current/molt/migration-approach-classic-bulk-load.md @@ -0,0 +1,8 @@ +--- +title: Classic Bulk Load Migration +summary: Learn what a Classic Bulk Load Migration is, how it relates to the migration considerations, and how to perform it using MOLT tools. +toc: true +docs_area: migrate +--- + +TBD \ No newline at end of file diff --git a/src/current/molt/migration-approach-delta.md b/src/current/molt/migration-approach-delta.md new file mode 100644 index 00000000000..227e1ee0cc2 --- /dev/null +++ b/src/current/molt/migration-approach-delta.md @@ -0,0 +1,8 @@ +--- +title: Delta Migration +summary: Learn what a Delta Migration is, how it relates to the migration considerations, and how to perform it using MOLT tools. +toc: true +docs_area: migrate +--- + +TBD \ No newline at end of file diff --git a/src/current/molt/migration-approach-phased-bulk-load.md b/src/current/molt/migration-approach-phased-bulk-load.md new file mode 100644 index 00000000000..3c2da438afd --- /dev/null +++ b/src/current/molt/migration-approach-phased-bulk-load.md @@ -0,0 +1,8 @@ +--- +title: Phased Bulk Load Migration +summary: Learn what a Phased Bulk Load Migration is, how it relates to the migration considerations, and how to perform it using MOLT tools. +toc: true +docs_area: migrate +--- + +TBD \ No newline at end of file diff --git a/src/current/molt/migration-approach-streaming.md b/src/current/molt/migration-approach-streaming.md new file mode 100644 index 00000000000..829d7b0fd15 --- /dev/null +++ b/src/current/molt/migration-approach-streaming.md @@ -0,0 +1,8 @@ +--- +title: Streaming Migration +summary: Learn what a Streaming Migration is, how it relates to the migration considerations, and how to perform it using MOLT tools. +toc: true +docs_area: migrate +--- + +TBD \ No newline at end of file diff --git a/src/current/molt/migration-considerations-cutover.md b/src/current/molt/migration-considerations-cutover.md new file mode 100644 index 00000000000..eac110f84d2 --- /dev/null +++ b/src/current/molt/migration-considerations-cutover.md @@ -0,0 +1,8 @@ +--- +title: Cutover Plan +summary: Learn about the different approaches to cutover, and how to think about this for your migration. +toc: true +docs_area: migrate +--- + +TBD \ No newline at end of file diff --git a/src/current/molt/migration-considerations-phases.md b/src/current/molt/migration-considerations-phases.md new file mode 100644 index 00000000000..9df80280da6 --- /dev/null +++ b/src/current/molt/migration-considerations-phases.md @@ -0,0 +1,101 @@ +--- +title: Migration Granularity +summary: Learn how to think about phased data migration, and whether or not to approach your migration in phases. +toc: true +docs_area: migrate +--- + +You may choose to migrate all of your data into a CockroachDB cluster at once. However, for larger data stores it's recommended that you migrate data in separate phases. This can help break the migration down into manageable slices, and it can help limit the effects of migration difficulties. + +This page explains when to choose each approach, how to define phases, and how to use MOLT tools effectively in either context. 
+
+In general:
+
+- Choose to migrate your data **all at once** if your data volume is modest, if you want to minimize migration complexity, or if you don't mind taking on a greater risk of something going wrong.
+
+- Choose a **phased migration** if your data volume is large, especially if you can naturally partition workload by tenant, service/domain, table/shard, geography, or time. A phased migration helps to reduce risk by limiting the workloads that would be adversely affected by a migration failure. It also helps to limit the downtime per phase, and allows the application to continue serving unaffected subsets of the data during the migration of a phase.
+
+## How to divide migrations into phases
+
+Here are some common ways to divide migrations:
+
+- **Per-tenant**: Multi-tenant apps route traffic and data per customer/tenant. Migrate a small cohort first (canary), then progressively larger cohorts. This aligns with access controls and isolates blast radius.
+
+- **Per-service/domain**: In microservice architectures, migrate data owned by a service or domain (e.g., billing, catalog) and route only that service to CockroachDB while others continue on the source. Requires clear data ownership and integration contracts.
+
+- **Per-table or shard**: Start with non-critical tables, large-but-isolated tables, or shard ranges. For monolith schemas, you can still phase by tables with few foreign-key dependencies and clear read/write paths.
+
+- **Per-region/market**: If traffic is regionally segmented, migrate one region/market at a time and validate latency, capacity, and routing rules before expanding.
+
+Tips for picking slices:
+
+- Prefer slices with clear routing keys (`tenant_id`, `region_id`) to simplify cutover and verification.
+
+- Start with lower-impact slices to exercise the migration process before migrating high-value cohorts.
+
+## Tradeoffs
+
+| | All at once | Phased |
+|---|---|---|
+| Downtime | A single downtime window, but it affects the whole database | Multiple short windows, each with limited impact |
+| Risk | Higher blast radius if issues surface post-cutover | Lower blast radius, issues confined to a slice |
+| Complexity | Simpler orchestration, enables a single cutover | More orchestration, repeated verify and cutover steps |
+| Validation | One-time, system-wide | Iterative per slice; faster feedback loops |
+| Timeline | Shorter migration time | Longer calendar time but safer path |
+| Best for | Small/medium datasets, simple integrations | Larger datasets, data with natural partitions or multiple tenants, risk-averse migrations |
+
+## Decision framework
+
+Use these questions to guide your approach:
+
+**How large is your dataset and how long will a full migration take?**
+If you can migrate the entire dataset within an acceptable downtime window, all-at-once is simpler. If the migration would take hours or days, phased migrations reduce the risk and downtime per phase.
+
+**Does your data have natural partitions?**
+If you can clearly partition by tenant, service, region, or table with minimal cross-dependencies, phased migration is well-suited. If your data is highly interconnected with complex foreign-key relationships, all-at-once may be easier.
+
+**What is your risk tolerance?**
+If a migration failure affecting the entire system is unacceptable, phased migration limits the blast radius. If you can afford to roll back the entire migration in case of issues, all-at-once is faster.
+
+**How much downtime can you afford per cutover?**
+Phased migrations spread downtime across multiple smaller windows, each affecting only a subset of users or services. All-at-once requires a single larger window affecting everyone.
+
+**What is your team's capacity for orchestration?**
+Phased migrations require repeated cycles of migration, validation, and cutover, with careful coordination of routing and monitoring. All-at-once is a single coordinated event.
+
+**Do you need to validate incrementally?**
+If you want fast feedback loops and the ability to adjust your migration strategy based on early phases, phased migration provides incremental validation. All-at-once validates everything once at the end.
+
+**Can you route traffic selectively?**
+Phased migrations require the ability to route specific tenants, services, or regions to CockroachDB while others remain on the source. If your application can't easily support this, all-at-once may be necessary.
+
+## MOLT toolkit support
+
+Phased and unphased migrations are both supported natively by MOLT.
+
+By default, [MOLT Fetch]({% link molt/molt-fetch.md %}) moves all data from the source database to CockroachDB. However, you can use the [`--schema-filter`]({% link molt/molt-fetch-commands-and-flags.md %}#schema-filter), [`--table-filter`]({% link molt/molt-fetch-commands-and-flags.md %}#table-filter), and [`--filter-path`]({% link molt/molt-fetch-commands-and-flags.md %}#filter-path) flags to selectively migrate data from the source to the target. Learn more about [schema and table selection]({% link molt/molt-fetch.md %}#schema-and-table-selection) and [selective data movement]({% link molt/molt-fetch.md %}#select-data-to-migrate), both of which can enable a phased migration.
+
+Similarly, you can use [MOLT Verify]({% link molt/molt-verify.md %})'s `--schema-filter` and `--table-filter` flags to run validation checks on subsets of the data in your source and target databases. In a phased migration, you will likely want to verify data at the end of each migration phase, rather than at the end of the entire migration.
+
+[MOLT Replicator]({% link molt/molt-replicator.md %}) replicates full tables by default. If you choose to combine phased migration with [continuous replication]({% link molt/migration-considerations-replication.md %}), you will either need to select phases that include whole tables, or else use [userscripts]({% link molt/replicator-flags.md %}#userscript) to select rows to replicate.
+
+## Example sequences
+
+#### Migrating all data at once
+
+<!-- Image: MOLT tooling overview -->
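+
+A minimal command sketch of an all-at-once load (the connection strings and bucket path are illustrative placeholders):
+
+{% include_cached copy-clipboard.html %}
+~~~ shell
+# Load every table from the source in a single MOLT Fetch run.
+molt fetch \
+--source "$SOURCE" \
+--target "$TARGET" \
+--bucket-path "s3://bucket/fetch" \
+--table-handling truncate-if-exists
+~~~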
+
+#### Phased migration
+
+<!-- Image: MOLT tooling overview -->
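+
+In a phased migration, a similar command runs once per phase, scoped by filters, and each phase is verified before moving on. A sketch of a single phase; the schema name and table list are illustrative:
+
+{% include_cached copy-clipboard.html %}
+~~~ shell
+# Load only this phase's slice of the source data.
+molt fetch \
+--source "$SOURCE" \
+--target "$TARGET" \
+--bucket-path "s3://bucket/fetch" \
+--table-handling truncate-if-exists \
+--schema-filter 'migration_schema' \
+--table-filter 'stores|payments|orders'
+
+# Verify the same slice before starting the next phase.
+molt verify \
+--source "$SOURCE" \
+--target "$TARGET" \
+--table-filter 'stores|payments|orders'
+~~~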
+
+## See also
+
+- [Migration Overview]({% link molt/migration-overview.md %})
+- [Migration Considerations]({% link molt/migration-considerations.md %})
+- [Continuous Replication]({% link molt/migration-considerations-replication.md %})
+- [MOLT Fetch]({% link molt/molt-fetch.md %})
diff --git a/src/current/molt/migration-considerations-replication.md b/src/current/molt/migration-considerations-replication.md
new file mode 100644
index 00000000000..ef7dd33d47e
--- /dev/null
+++ b/src/current/molt/migration-considerations-replication.md
@@ -0,0 +1,120 @@
+---
+title: Continuous Replication
+summary: Learn when and how to use continuous replication during data migration to minimize downtime and keep the target synchronized with the source.
+toc: true
+docs_area: migrate
+---
+
+Continuous replication can be used during a migration to keep a CockroachDB target cluster synchronized with a live source database. This is often used to minimize downtime at cutover. It can complement bulk data loading or be used independently.
+
+This page explains when to choose continuous replication, how to combine it with bulk loading, and how to use MOLT tools effectively for each approach.
+
+In general:
+
+- Choose to **bulk load only** if you can schedule a downtime window long enough to complete the entire data load and do not need to capture ongoing changes during migration.
+
+- Choose a **hybrid approach (bulk load + continuous replication)** when you need to minimize downtime and keep the target synchronized with ongoing source database changes until cutover.
+
+- You can choose **continuous replication only** for tables with transient data, or in other contexts where you only need to capture ongoing changes and are not concerned with migrating a large initial dataset.
+
+## Permissible downtime
+
+Downtime is the primary factor to consider in determining your migration's approach to continuous replication.
+
+If your migration can accommodate a window of **planned downtime** that's made known to your users in advance, a bulk load approach is simpler. A pure bulk load approach is well-suited for test or pre-production refreshes, or for migrations that can successfully move data within a planned downtime window.
+
+If your migration needs to **minimize downtime**, you will likely need to keep the source database live for as long as possible, continuing to allow write traffic to the source until cutover. In this case, an initial bulk load will need to be followed by a replication period, during which you stream incremental changes from the source to the target CockroachDB cluster. This is ideal for large datasets that are impractical to move within a narrow downtime window, or when you need validation time with a live, continuously synced target before switching traffic. The final downtime is minimized to a brief pause to let replication drain before switching traffic, with the pause length driven by write volume and observed replication lag.
+
+If you're migrating your data [in multiple phases]({% link molt/migration-considerations-phases.md %}), consider that each phase can have its own separate downtime window and cutover, and that migrating in phases can reduce the length of each individual downtime window.
+
+## Tradeoffs
+
+| | Bulk load only | Hybrid (bulk + replication) | Continuous replication only |
+|---|---|---|---|
+| **Downtime** | Requires full downtime for entire load | Minimal final downtime (brief pause to drain) | Minimal if resuming from checkpoint |
+| **Performance** | Fastest overall if window allows | Spreads work: bulk moves mass, replication handles ongoing changes | Depends on catch-up time from checkpoint |
+| **Complexity** | Fewer moving parts, simpler orchestration | Requires replication infrastructure and monitoring | Requires checkpoint management |
+| **Risk management** | Full commit at once; rollback more disruptive | Supports failback flows for rollback options | Lower risk when resuming known state |
+| **Cutover** | Traffic off until entire load completes | Traffic paused briefly while replication drains | Brief pause to verify sync |
+| **Timeline** | Shortest migration time if downtime permits | Longer preparation but safer path | Short catch-up phase |
+| **Best for** | Simple moves, test environments, scheduled maintenance | Production migrations, large datasets, high availability requirements | Recovery scenarios, post-load sync |
+
+## Decision framework
+
+Use these questions to guide your approach:
+
+**What downtime can you tolerate?**
+If you can't guarantee a window long enough for the full load, favor the hybrid approach to minimize downtime at cutover.
+
+**How large is the dataset and how fast can you bulk-load it?**
+If load time fits inside downtime, bulk-only is simplest. Otherwise, choose the hybrid approach.
+
+**How active is the source (write rate and burstiness)?**
+Higher write rates mean a longer final drain; this pushes you toward hybrid with close monitoring of replication lag before cutover.
+
+**Do you need a safety net?**
+If rollback is a requirement, design for replication and failback pathways, which the MOLT flow supports.
+
+**How much validation do you require pre-cutover?**
+Hybrid gives you time to validate a live, synchronized target before switching traffic.
+
+## MOLT toolkit support
+
+The MOLT toolkit provides two complementary tools for data migration: [MOLT Fetch]({% link molt/molt-fetch.md %}) for bulk loading the initial dataset, and [MOLT Replicator]({% link molt/molt-replicator.md %}) for continuous replication. These tools work independently or together depending on your chosen replication approach.
+
+### Bulk load only
+
+Use MOLT Fetch to export and load data to CockroachDB.
+
+For pure bulk migrations, set the [`--ignore-replication-check`]({% link molt/molt-fetch-commands-and-flags.md %}#ignore-replication-check) flag to skip gathering replication checkpoints. This simplifies the workflow when you don't need to track change positions for subsequent replication.
+
+MOLT Fetch supports both `IMPORT INTO` (default, for highest throughput with offline tables) and `COPY FROM` (for online tables) loading methods. Because a pure bulk load approach will likely involve substantial application downtime, you may benefit from using `IMPORT INTO`. In this case, do not use the [`--use-copy`]({% link molt/molt-fetch-commands-and-flags.md %}#use-copy) flag. Learn more about Fetch's [data load modes]({% link molt/molt-fetch.md %}#import-into-vs-copy-from).
+
+A migration that does not use continuous replication does not need MOLT Replicator.
+
+### Hybrid (bulk load + continuous replication)
+
+Use MOLT Fetch to export and load the initial dataset to CockroachDB. Then start MOLT Replicator to begin streaming changes from the source database to CockroachDB.
+
+When you run MOLT Fetch without [`--ignore-replication-check`]({% link molt/molt-fetch-commands-and-flags.md %}#ignore-replication-check), it emits a checkpoint value that marks the point in time when the bulk load snapshot was taken. After MOLT Fetch completes, the checkpoint is stored in the target database. MOLT Replicator then uses this checkpoint to begin streaming changes from exactly that point, ensuring no data is missed between the bulk load and continuous replication. Learn more about [replication checkpoints]({% link molt/molt-replicator.md %}#replication-checkpoints).
+
+MOLT Fetch supports both `IMPORT INTO` (default, for highest throughput with offline tables) and `COPY FROM` (for online tables) loading methods. Because a hybrid approach typically aims to minimize downtime, you may need to use `COPY FROM` if your tables remain online. In this case, use the [`--use-copy`]({% link molt/molt-fetch-commands-and-flags.md %}#use-copy) flag. Learn more about Fetch's [data load modes]({% link molt/molt-fetch.md %}#import-into-vs-copy-from).
+
+MOLT Replicator replicates full tables by default. If you choose to combine continuous replication with a [phased migration]({% link molt/migration-considerations-phases.md %}), you will either need to select phases that include whole tables, or else use [userscripts]({% link molt/replicator-flags.md %}#userscript) to select rows to replicate.
+
+MOLT Replicator can be stopped after cutover, or it can remain online to continue streaming changes indefinitely.
+
+### Continuous replication only
+
+If you're only interested in capturing recent changes, skip MOLT Fetch entirely and just use MOLT Replicator.
+
+## Example sequences
+
+#### Bulk load only
+
+<!-- Image: MOLT tooling overview -->
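+
+A minimal command sketch of a pure bulk load. The `data-load` mode is the default and is shown only for clarity; the connection strings and bucket path are illustrative placeholders:
+
+{% include_cached copy-clipboard.html %}
+~~~ shell
+# One-time load; skip replication checkpoints entirely.
+molt fetch \
+--source "$SOURCE" \
+--target "$TARGET" \
+--bucket-path "s3://bucket/fetch" \
+--mode data-load \
+--ignore-replication-check
+~~~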
+
+#### Hybrid approach
+
+<!-- Image: MOLT tooling overview -->
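+
+A minimal command sketch of the hybrid sequence, assuming a PostgreSQL source. The connection strings are placeholders, the Replicator subcommand varies by source database, and source-specific replication flags are omitted here:
+
+{% include_cached copy-clipboard.html %}
+~~~ shell
+# Initial load with COPY FROM so the source tables can stay online;
+# omitting --ignore-replication-check preserves the replication checkpoint.
+molt fetch \
+--source "$SOURCE" \
+--target "$TARGET" \
+--bucket-path "s3://bucket/fetch" \
+--use-copy
+
+# Stream subsequent source changes to CockroachDB from that checkpoint.
+replicator pglogical \
+--sourceConn "$SOURCE" \
+--targetConn "$TARGET" \
+--stagingSchema _replicator \
+--stagingCreateSchema
+~~~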
+ +## See also + +- [Migration Overview]({% link molt/migration-overview.md %}) +- [Migration Considerations]({% link molt/migration-considerations.md %}) +- [Migration Granularity]({% link molt/migration-considerations-phases.md %}) +- [MOLT Fetch]({% link molt/molt-fetch.md %}) +- [MOLT Replicator]({% link molt/molt-replicator.md %}) +- [MOLT Verify]({% link molt/molt-verify.md %}) diff --git a/src/current/molt/migration-considerations-rollback.md b/src/current/molt/migration-considerations-rollback.md new file mode 100644 index 00000000000..97379f0a521 --- /dev/null +++ b/src/current/molt/migration-considerations-rollback.md @@ -0,0 +1,97 @@ +--- +title: Rollback Plan +summary: Learn how to plan rollback options to limit risk and preserve data integrity during migration. +toc: true +docs_area: migrate +--- + +A rollback plan defines how you will undo or recover from a failed migration. A clear rollback strategy limits risk during migration, minimizes business impact, and preserves data integrity so that you can retry the migration with confidence. + +This page explains four common rollback options, their trade-offs, and how the MOLT toolkit supports each approach. + +In general: + +- **Manual reconciliation** is sufficient for low-risk systems or low-complexity migrations where automated rollback is not necessary. + +- Utilize **failback replication** to maintain synchronization between the CockroachDB cluster and the original source database after cutover to CockroachDB. + +- Utilize **bidirectional replication** (simultaneous forward and failback replication) to maximize database synchronization without requiring app changes, accepting the operational overhead of running two replication streams. + +- Choose a **dual-write** strategy for the fastest rollback with minimal orchestration, accepting higher application complexity during the trial window. + +## Why plan for rollback + +Many things can go wrong during a migration. Performance issues may surface under production load that didn't appear in testing. Application compatibility problems might emerge that require additional code changes. Data discrepancies could be discovered that necessitate investigation and remediation. In any of these scenarios, the ability to quickly and safely return to the source database is critical to minimizing business impact. + +Your rollback strategy should align with your migration's risk profile, downtime tolerance, and operational capabilities. High-stakes production migrations typically require faster rollback paths with minimal data loss, while test environments or low-traffic systems can tolerate simpler manual approaches. + +### Failback replication + +[Continuous (forward) replication]({% link molt/migration-considerations-replication.md %}), which serves to minimize downtime windows, keeps two databases in sync by replicating changes from the source to the target. In contrast, **failback replication** synchronizes data in the opposite direction, from the target back to the source. + +Failback replication is useful for rollback because it keeps the source database synchronized with writes that occur on CockroachDB after cutover. If problems emerge during your trial period and you need to roll back, the source database already has all the data that was written to CockroachDB. This enables a quick rollback without data loss. + +Failback and forward replication can be used simultaneously (**bidirectional replication**). 
This is especially useful if the source and the target databases can receive simultaneous but disparate write traffic. In that case, bidirectional replication is necessary to ensure that both databases stay in sync. It's also useful if downtime windows are long or if cutover is gradual, increasing the likelihood that the two databases receive independent writes.
+
+### Dual-write
+
+Failback replication requires an external replication system (like [MOLT Replicator]({% link molt/molt-replicator.md %})) to keep two databases synchronized. Alternatively, you can modify the application code itself to enable **dual-writes**, wherein the application writes to both the source database and CockroachDB during a trial window. If rollback is needed, you can then redirect traffic to the source without additional data movement.
+
+This enables faster rollback but increases application complexity, since you must manage two write paths.
+
+## Tradeoffs
+
+| | Manual reconciliation | Failback replication (on-demand) | Bidirectional replication | Dual-write |
+|---|---|---|---|---|
+| **Rollback speed (RTO)** | Slow | Moderate | Fast | Fast |
+| **Data loss risk (RPO)** | Medium-High | Low | Low | Low-Medium (app-dependent) |
+| **Synchronization mechanism** | None (backups/scripts) | Activate failback when needed | Continuous forward + failback | Application writes to both |
+| **Application changes** | None | None | None | Required |
+| **Operational complexity** | Low (tooling), High (manual) | Medium (runbook activation) | High (two replication streams) | High (app layer) |
+| **Overhead during trial** | Low | Low-Medium | High (two replication streams) | Medium (two write paths) |
+| **Best for** | Low-risk systems, simple migrations | Moderate RTO tolerance, lower ongoing cost | Strict RTO/RPO, long or complex cutovers | Short trials, resilient app teams |
+
+## Decision framework
+
+Use these questions to guide your rollback strategy:
+
+**How quickly do you need to roll back if problems occur?**
+If you need immediate rollback, choose dual-write or bidirectional replication. If you can tolerate some delay to activate failback replication, one-way failback replication is sufficient. For low-risk migrations with generous time windows, manual reconciliation may be acceptable.
+
+**How much data can you afford to lose during rollback?**
+If you cannot lose any data written after cutover, choose bidirectional replication or on-demand failback (both preserve all writes). Dual-write can also preserve data if implemented carefully. Manual reconciliation typically accepts some data loss.
+
+**Will writes occur to both databases during the trial period?**
+If traffic might split between source and target (e.g., during gradual cutover or in multi-region scenarios), bidirectional replication keeps both databases synchronized. If traffic cleanly shifts from source to target, on-demand failback or dual-write is sufficient.
+
+**Can you modify the application code?**
+If application changes are expensive or risky, use database-level replication (bidirectional or on-demand failback) instead of dual-write.
+
+**What is your team's operational capacity?**
+Bidirectional replication requires monitoring and managing two active replication streams. On-demand failback requires a tested runbook for activating failback quickly. Dual-write requires application-layer resilience and observability. Manual reconciliation has the lowest operational complexity.
+
+**What are your database capabilities?**
+Ensure your source database supports the change data capture requirements for the migration window. Verify that CockroachDB changefeeds can provide the necessary failback support for your environment.
+
+## MOLT toolkit support
+
+[MOLT Replicator]({% link molt/molt-replicator.md %}) uses change data capture to stream changes from one database to another. It's used for both [forward replication]({% link molt/migration-considerations-replication.md %}) and [failback replication](#failback-replication).
+
+To use MOLT Replicator in failback mode, run the [`replicator start`]({% link molt/replicator-flags.md %}#commands) command with its various [flags]({% link molt/replicator-flags.md %}).
+
+When enabling failback replication, the original source database becomes the replication target, and the original target CockroachDB cluster becomes the replication source. Use the `--sourceConn` flag to indicate the CockroachDB cluster, and use the `--targetConn` flag to indicate the PostgreSQL, MySQL, or Oracle database from which data is being migrated.
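+
+A minimal sketch of a failback invocation. The connection strings are illustrative placeholders, and additional required configuration (such as the changefeed setup on the CockroachDB side) is omitted:
+
+{% include_cached copy-clipboard.html %}
+~~~ shell
+# Failback: CockroachDB is now the replication source.
+replicator start \
+--sourceConn "$CRDB_CONN" \
+--targetConn "$ORIGINAL_SOURCE_CONN" \
+--stagingSchema _replicator \
+--stagingCreateSchema
+~~~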
+
+MOLT Replicator can be stopped after cutover, or it can remain online to continue streaming changes indefinitely. This can be useful for as long as you want the migration source database to serve as a backup to the new CockroachDB cluster.
+
+Rollback plans that do not utilize failback replication will require external tooling, or in the case of a dual-write strategy, changes to application code. You can still use [MOLT Verify]({% link molt/molt-verify.md %}) to ensure parity between the two databases.
+
+## See also
+
+- [Migration Overview]({% link molt/migration-overview.md %})
+- [Migration Considerations]({% link molt/migration-considerations.md %})
+- [Continuous Replication]({% link molt/migration-considerations-replication.md %})
+- [Validation Strategy]({% link molt/migration-considerations-validation.md %})
+- [MOLT Replicator]({% link molt/molt-replicator.md %})
+- [MOLT Fetch]({% link molt/molt-fetch.md %})
+- [MOLT Verify]({% link molt/molt-verify.md %})
+- [Migrate with Failback]({% link molt/migrate-failback.md %})
diff --git a/src/current/molt/migration-considerations-transformation.md b/src/current/molt/migration-considerations-transformation.md
new file mode 100644
index 00000000000..a83f2f00d80
--- /dev/null
+++ b/src/current/molt/migration-considerations-transformation.md
@@ -0,0 +1,88 @@
+---
+title: Data Transformation Strategy
+summary: Learn about the different approaches to applying data transformations during a migration and how to choose the right strategy for your use case.
+toc: true
+docs_area: migrate
+---
+
+Data transformations are applied to data as it moves from the source system to the target system. Transformations ensure that the data is compatible, consistent, and valuable in the destination. They are a key part of a migration to CockroachDB. When planning a migration, it's important to determine **what** transformations are necessary and **where** they need to occur.
+
+This page explains the types of transformations to expect, where they can be applied, and how these choices shape your use of MOLT tooling.
+
+## Common transformation types
+
+If the source and target schemas are not identical, some sort of transformation is likely to be necessary during a migration. The set of necessary transformations will depend on the differences between your source database schema and your target CockroachDB schema, as well as any data quality or formatting requirements for your application.
+
+- **Type mapping**: Align source types with CockroachDB types, especially for dialect-specific types.
+- **Format conversion**: Change the format or encoding of certain values to align with the target schema (for example, `2024-03-01T00:00:00Z` to `03/01/2024`).
+- **Field renaming**: Rename fields to fit target schemas or conventions.
+- **Primary key strategy**: Replace source sequences or auto-increment patterns with CockroachDB-friendly IDs (UUIDs, sequences).
+- **Table reshaping**: Consolidate partitioned tables, rename tables, or retarget to different schemas.
+- **Column changes**: Exclude deprecated columns, or map computed columns.
+- **Row filtering**: Move only a subset of rows by tenant, region, or timeframe.
+- **Null/default handling**: Replace, remove, or infer missing values.
+- **Constraints and indexes**: Drop non-primary-key constraints and secondary indexes before bulk load for performance, then recreate after.
+
+## Where to transform
+
+Transformations can occur in the source database, in the target database, or in flight (between the source and the target). Where to perform the transformations is largely determined by technical constraints, including the mutability of the source database and the choice of tooling.
+
+#### Transform in the source database
+
+Apply transformations directly on the source database before migrating data. This is only possible if the source database can be modified to accommodate the transformations suited for the target database.
+
+This provides the advantage of allowing ample time, before the downtime window, to perform the transformations, but it is often not possible due to technical constraints.
+
+#### Transform in the target database
+
+Apply transformations in the CockroachDB cluster after data has been loaded. For any transformations that occur in the target cluster, it's recommended that these occur before cutover, to ensure that live data complies with CockroachDB best practices. Transformations that occur before cutover may extend downtime.
+
+#### Transform in flight
+
+Apply transformations within the migration pipeline, between the source and target databases. This allows the source database to remain as it is, and it allows the target database to be designed using CockroachDB best practices. It also enables testability by separating transformations from either database.
+
+However, in-flight transformations may require more complex tooling. In-flight transformation is largely supported by the [MOLT toolkit](#molt-toolkit-support).
+
+## Decision framework
+
+Use these questions to guide your transformation strategy:
+
+- **What is your downtime tolerance?** Near-zero downtime pushes you toward in-flight transforms that apply consistently to bulk and streaming loads.
+- **Will transformation logic be reused post-cutover?** If you need ongoing sync or failback, prefer deterministic, version-controlled in-flight transformations.
+- **How complex are the transformations?** Simple schema reshaping favors MOLT Fetch transformations or target DDL. Complex value normalization or routing favors in-flight userscripts.
+- **Can you modify the source database?** Source-side transformations require permission and capacity to create views, staging tables, or run transformation queries.
+ +## MOLT toolkit support + +The MOLT toolkit provides functionality for implementing transformations at each stage of the migration pipeline. + +### MOLT Schema Conversion Tool + +While not a part of the transformation process itself, the [MOLT Schema Conversion Tool]({% link cockroachcloud/migrations-page.md %}) automates the creation of the target database schema based on the schema of the source database. This reduces downstream transformation pressure by addressing DDL incompatibilities upfront. + +### MOLT Fetch + +[MOLT Fetch]({% link molt/molt-fetch.md %}) supports transformations during a bulk data load: + +- **Row filtering**: [`--filter-path`]({% link molt/molt-fetch-commands-and-flags.md %}#filter-path) specifies a JSON file with table-to-SQL-predicate mappings evaluated in the source dialect before export. Ensure filtered columns are indexed for performance. +- **Schema shaping**: [`--transformations-file`]({% link molt/molt-fetch-commands-and-flags.md %}#transformations-file) defines table renames, n→1 merges (consolidate partitioned tables), and column exclusions. For n→1 merges, use [`--use-copy`]({% link molt/molt-fetch-commands-and-flags.md %}#use-copy) or [`--direct-copy`]({% link molt/molt-fetch-commands-and-flags.md %}#direct-copy) and pre-create the target table. +- **Type alignment**: [`--type-map-file`]({% link molt/molt-fetch-commands-and-flags.md %}#type-map-file) specifies explicit type mappings when auto-creating target tables. +- **Table lifecycle**: [`--table-handling`]({% link molt/molt-fetch-commands-and-flags.md %}#table-handling) controls whether to truncate, drop-and-recreate, or assume tables exist. + +### MOLT Replicator + +[MOLT Replicator]({% link molt/molt-replicator.md %}) uses TypeScript **userscripts** to implement in-flight transformations for continuous replication: + +- **Capabilities**: Transform or normalize values, route rows to different tables, enrich data, filter rows, merge partitioned sources. +- **Structure**: Userscripts export functions (`configureTargetTables`, `onRowUpsert`, `onRowDelete`) that process change data before commit to CockroachDB. +- **Staging schema**: Replicator uses a CockroachDB staging schema to store replication state and buffered mutations (`--stagingSchema` and `--stagingCreateSchema`). + +## See also + +- [Migration Overview]({% link molt/migration-overview.md %}) +- [Migration Considerations]({% link molt/migration-considerations.md %}) +- [Migration Granularity]({% link molt/migration-considerations-phases.md %}) +- [Continuous Replication]({% link molt/migration-considerations-replication.md %}) +- [MOLT Fetch]({% link molt/molt-fetch.md %}) +- [MOLT Replicator]({% link molt/molt-replicator.md %}) +- [MOLT Verify]({% link molt/molt-verify.md %}) diff --git a/src/current/molt/migration-considerations-validation.md b/src/current/molt/migration-considerations-validation.md new file mode 100644 index 00000000000..0a890fada67 --- /dev/null +++ b/src/current/molt/migration-considerations-validation.md @@ -0,0 +1,161 @@ +--- +title: Validation Strategy +summary: Learn when and how to validate data during migration to ensure correctness, completeness, and consistency. +toc: true +docs_area: migrate +--- + +Validation strategies are critical to ensuring a successful data migration. They're how you confirm that the right data has been moved correctly, is complete, and is usable in the new environment. A validation strategy is defined by **what** validations you want to run and **when** you want to run them. 
+
+This page explains how to think about different validation strategies and how to use MOLT tooling to enable validation.
+
+## What to validate
+
+Running any of the following validations can help you feel confident that the data in the CockroachDB cluster matches the data in the migration source database.
+
+- **Row Count Validation**: Ensure the number of records matches between source and target.
+
+- **Checksum/Hash Validation**: Compare hashed values of rows or columns to detect changes or corruption.
+
+- **Data Sampling**: Randomly sample and manually compare rows between systems.
+
+- **Column-Level Comparison**: Validate individual field values across systems.
+
+- **Business Rule Validation**: Apply domain rules to validate logic or derived values.
+
+- **Boundary Testing**: Ensure edge-case data (nulls, max values, etc.) is correctly migrated.
+
+- **Referential Integrity**: Validate that relationships (foreign keys) are intact in the target.
+
+- **Data Type Validation**: Confirm that fields conform to expected types/formats.
+
+- **Null/Default Value Checks**: Validate expected default values or NULLs post-migration.
+
+- **ETL Process Validation**: Check logs, counts, or errors from migration tools.
+
+- **Automated Testing**: Use scripts or tools to compare results and flag mismatches.
+
+The rigor of your validations (the set of validations you perform) will depend on your organization's risk tolerance and the complexity of the migration.
+
+## When to validate
+
+A migration can be a long process and, depending on the choices you make in designing it, a complex one. If the dataset is small or the migration is low in complexity, it may be sufficient to simply run validations when you're ready to cut over application traffic to CockroachDB. However, there are several opportunities to validate your data in advance of cutover.
+
+It's often useful to find natural checkpoints in your migration flow to run validations, and to increase the rigor of those validations as you approach cutover.
+
+If performing a migration [in phases]({% link molt/migration-considerations-phases.md %}), the checkpoints below can be considered in the context of each individual phase. A rigorous validation approach might choose to run validations after each phase, while a more risk-tolerant approach might choose to run them after all of the phases have been migrated but before cutover.
+
+#### Pre-migration (design and dry-run)
+
+Validate converted schema and resolve type mapping issues. Run a dry-run migration on test data and begin query validation to catch behavioral differences early.
+
+#### After a bulk data load
+
+Run comprehensive validations to confirm schema and row-level parity before re-adding constraints and indexes that were dropped to accelerate load.
+
+#### During continuous replication
+
+If using [continuous replication]({% link molt/migration-considerations-replication.md %}), run validation periodically to ensure the target converges with the source. Use live-aware validation to reduce false positives from in-flight changes. This gives you confidence that replication is working correctly.
+
+#### Before cutover
+
+Once replication has drained, run final validation on the complete cutover scope and verify critical application queries.
+
+#### Post-cutover confidence checks
+
+After traffic moves to CockroachDB, run targeted validation on critical tables and application smoke tests to confirm steady state.
+
+## Decision framework
+
+Use these questions to help you determine what validations you want to perform, and when you want to perform them:
+
+**What is your data volume and validation timeline?**
+Larger datasets require more validation time. Consider concurrency tuning, phased validation, or off-peak runs to fit within windows.
+
+**Is the source database active during migration?**
+Active sources require live-aware validation and continuous monitoring during replication. Plan for replication drain before final validation.
+
+**Are there intentional schema or type differences?**
+Expect validation to flag type conversions and collation differences. Decide upfront whether to accept conditional successes or redesign to enable strict parity.
+
+**Which tables are most critical?**
+Prioritize critical data (compliance, transactions, authentication) for comprehensive validation. Use targeted validation loops on high-churn tables during replication.
+
+**Do you have unsupported column types?**
+For types that cannot be compared automatically (e.g., geospatial), plan alternative checks like row counts or application-level validation.
+
+## MOLT toolkit support
+
+[MOLT Verify]({% link molt/molt-verify.md %}) performs structural and row-level comparison between the source database and the CockroachDB cluster. MOLT Verify performs the following verifications to ensure data integrity during a migration:
+
+- Table Verification: Check that the structure of tables is the same between the source database and the target database.
+
+- Column Definition Verification: Check that the column names, data types, constraints, nullability, and other attributes are the same between the source database and the target database.
+
+- Row Value Verification: Check that the actual data in the tables is the same between the source database and the target database.
+
+Validations beyond those supported by MOLT Verify would need to be run by a third-party tool, but could be run in tandem with MOLT Verify.
+
+If performing a [phased migration]({% link molt/migration-considerations-phases.md %}), you can use MOLT Verify's `--schema-filter` and `--table-filter` flags to limit validation to specific schemas or tables.
+
+If using [continuous replication]({% link molt/migration-considerations-replication.md %}), you can use MOLT Verify's `--continuous` and `--live` flags to enable continuous verification.
+
+Check MOLT Verify's [known limitations]({% link molt/molt-verify.md %}#known-limitations) to ensure the tool's suitability for your validation strategy.
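+
+For example, a live-aware verification pass that runs repeatedly during replication might look like the following sketch (the connection strings are illustrative placeholders):
+
+{% include_cached copy-clipboard.html %}
+~~~ shell
+# Re-check the target continuously, tolerating in-flight changes.
+molt verify \
+--source "$SOURCE" \
+--target "$TARGET" \
+--continuous \
+--live
+~~~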
+
+## See also
+
+- [Migration Overview]({% link molt/migration-overview.md %})
+- [Migration Considerations]({% link molt/migration-considerations.md %})
+- [Migration Granularity]({% link molt/migration-considerations-phases.md %})
+- [Continuous Replication]({% link molt/migration-considerations-replication.md %})
+- [Data Transformation Strategy]({% link molt/migration-considerations-transformation.md %})
+- [MOLT Verify]({% link molt/molt-verify.md %})
+- [MOLT Fetch]({% link molt/molt-fetch.md %})
+- [MOLT Replicator]({% link molt/molt-replicator.md %})
diff --git a/src/current/molt/migration-considerations.md b/src/current/molt/migration-considerations.md
new file mode 100644
index 00000000000..dcaa15853e1
--- /dev/null
+++ b/src/current/molt/migration-considerations.md
@@ -0,0 +1,73 @@
+---
+title: Migration Considerations
+summary: Learn what to consider when making high-level decisions about a migration.
+toc: true
+docs_area: migrate
+---
+
+When planning a migration to CockroachDB, you need to make several high-level decisions that will shape your migration approach. This page provides an overview of key migration variables and the factors that influence them. Each variable has multiple options, and the combination you choose will largely define your migration strategy.
+
+For detailed migration sequencing and tool usage, see [Migration Overview]({% link molt/migration-overview.md %}). For detailed planning guidance, see [Migration Strategy]({% link molt/migration-strategy.md %}).
+
+## Migration variables
+
+Learn more about each migration variable by clicking the links in the left-hand column.
+
+| Variable | Description |
+|---|---|
+| [**Migration granularity**]({% link molt/migration-considerations-phases.md %}) | Do you want to migrate all of your data at once, or do you want to split your data up into phases and migrate one phase at a time? |
+| [**Continuous replication**]({% link molt/migration-considerations-replication.md %}) | After the initial data load (or after the initial load of each phase), do you want to stream further changes to that data from the source to the target? |
+| [**Data transformation strategy**]({% link molt/migration-considerations-transformation.md %}) | If there are discrepancies between the source and target schema, how will you define those data transformations, and when will those transformations occur? |
+| [**Validation strategy**]({% link molt/migration-considerations-validation.md %}) | How and when will you verify that the data in CockroachDB matches the source database? |
+| [**Rollback plan**]({% link molt/migration-considerations-rollback.md %}) | What approach will you use to roll back the migration if issues arise during or after cutover? |
+
+The combination of these variables largely defines your migration approach. While you'll typically choose one primary option for each variable, some migrations may involve a hybrid approach depending on your specific requirements.
+
+## Factors to consider
+
+When deciding on the options for each migration variable, consider the following business and technical requirements:
+
+### Permissible downtime
+
+How much downtime can your application tolerate during the migration? This is one of the most critical factors in determining your migration approach, and it may influence your choices for [migration granularity]({% link molt/migration-considerations-phases.md %}), [continuous replication]({% link molt/migration-considerations-replication.md %}), and [cutover strategy]({% link molt/migration-considerations-cutover.md %}).
+
+- **Planned downtime** is made known to your users in advance. It involves taking the application offline, conducting the migration, and bringing the application back online on CockroachDB.
+
+    To succeed, you should estimate the amount of downtime required to migrate your data, and ideally schedule the downtime outside of peak hours. Scheduling downtime is easiest if your application traffic is "periodic", meaning that it varies by the time of day, day of week, or day of month.
+
+    If you can support planned downtime, you may want to migrate your data all at once, and _without_ continuous replication.
+
+- **Minimal downtime** affects as few customers as possible, ideally without impacting their regular usage. If your application is intentionally offline at certain times (e.g., outside business hours), you can migrate the data without users noticing. Alternatively, if your application's functionality is not time-sensitive (e.g., it sends batched messages or emails), you can queue requests while the system is offline and process them after completing the migration to CockroachDB.
+
+- **Near-zero downtime** is necessary for mission-critical applications. For these migrations, consider cutover strategies that keep applications online for as long as possible, and which utilize continuous replication.
+
+In addition to downtime duration, consider whether your application could support windows of **reduced functionality** in which some, but not all, application functionality is brought offline. For example, you can disable writes but not reads while you migrate the application data, and queue data to be written after completing the migration.
+
+### Migration timeframe and allowable complexity
+
+When do you need to complete the migration? How many team members can be allocated for this effort? How much complex orchestration can your team manage? These factors may influence your choices for [migration granularity]({% link molt/migration-considerations-phases.md %}), [continuous replication]({% link molt/migration-considerations-replication.md %}), and [cutover strategy]({% link molt/migration-considerations-cutover.md %}).
+
+- Migrations with a short timeline, or which cannot accommodate high complexity, may want to migrate data all at once, without utilizing continuous replication, and requiring manual reconciliation in the event of migration failure.
+
+- Migrations with a long timeline, or which can accommodate complexity, may want to migrate data in phases. If the migration requires minimal downtime, these migrations may also want to utilize continuous replication. If the migration is low in risk tolerance, these migrations may also want to enable failback.
+
+### Risk tolerance
+
+How much risk is your organization willing to accept during the migration? This may influence your choices for [migration granularity]({% link molt/migration-considerations-phases.md %}), [validation strategy]({% link molt/migration-considerations-validation.md %}), and [rollback plan]({% link molt/migration-considerations-rollback.md %}).
+
+- Risk-averse migrations should prefer a phased approach that limits the blast radius of any issues. Start with low-risk slices (e.g., a small cohort of tenants or a non-critical service), validate thoroughly, and progressively expand to higher-value workloads. These migrations may also prefer rollback plans that enable quick recovery in the event of migration issues.
+
+- For risk-tolerant migrations, it may be acceptable to migrate all of your data at once. Less stringent validation strategies and manual reconciliation in the event of a migration failure may also be acceptable.
+
+---
+
+The factors above are only a subset of what you'll want to consider when making decisions about your CockroachDB migration, alongside your specific business requirements and technical constraints. It's recommended that you document these decisions and the reasoning behind them as part of your [migration plan]({% link molt/migration-strategy.md %}#develop-a-migration-plan).
+
+## See also
+
+- [Migration Overview]({% link molt/migration-overview.md %})
+- [Migration Strategy]({% link molt/migration-strategy.md %})
+- [Migration Granularity]({% link molt/migration-considerations-phases.md %})
+- [MOLT Fetch]({% link molt/molt-fetch.md %})
+- [MOLT Replicator]({% link molt/molt-replicator.md %})
+- [MOLT Verify]({% link molt/molt-verify.md %})
diff --git a/src/current/molt/migration-overview.md b/src/current/molt/migration-overview.md
index 163dc26d8aa..0ad317a1346 100644
--- a/src/current/molt/migration-overview.md
+++ b/src/current/molt/migration-overview.md
@@ -5,34 +5,47 @@ toc: true
 docs_area: migrate
 ---

-The MOLT (Migrate Off Legacy Technology) toolkit enables safe, minimal-downtime database migrations to CockroachDB. MOLT combines schema transformation, distributed data load, continuous replication, and row-level validation into a highly configurable workflow that adapts to diverse production environments.
+A migration involves transferring data from a pre-existing **source** database to a **target** CockroachDB cluster. Migrating data is a complex, multi-step process, and a data migration can take many different forms depending on your specific business and technical constraints.
+
+Cockroach Labs provides a MOLT (Migrate Off Legacy Technology) toolkit to aid in migrations.

 This page provides an overview of the following:

 - Overall [migration sequence](#migration-sequence)
 - [MOLT tools](#molt-tools)
-- Supported [migration flows](#migration-flows)

 ## Migration sequence

-{{site.data.alerts.callout_success}}
-Before you begin the migration, review [Migration Strategy]({% link molt/migration-strategy.md %}).
-{{site.data.alerts.end}}
-
 A migration to CockroachDB generally follows this sequence:

-<!-- Image: MOLT tooling overview -->
+
+1. **Assess and discover**: Inventory the source database, flag unsupported features, make a migration plan.
+1. **Prepare the environment**: Configure networking, users and permissions, bucket locations, replication settings, and more.
+1. **Convert the source schema**: Generate CockroachDB-compatible [DDL]({% link {{ site.current_cloud_version }}/sql-statements.md %}#data-definition-statements). Apply the converted schema to the target database. Drop constraints and indexes to facilitate data load.
+1. **Load data into CockroachDB**: Bulk load the source data into the CockroachDB cluster.
+1. **Finalize target schema**: Recreate indexes or constraints on CockroachDB that you previously dropped to facilitate data load.
+1. **_(Optional)_ Replicate ongoing changes**: Keep CockroachDB in sync with the source. This may be necessary for migrations that minimize downtime.
+1. **Stop application traffic**: Limit user read/write traffic to the source database. _This begins application downtime._
+1. **Verify data consistency**: Confirm that the CockroachDB data is consistent with the source.
+1. **_(Optional)_ Enable failback**: Replicate data from the target back to the source, enabling a reversion to the source database in the event of migration failure.
+1. **Cut over application traffic**: Resume normal application use, with the CockroachDB cluster as the target database. _This ends application downtime._

-1. Prepare the source database: Configure users, permissions, and replication settings as needed.
-1. Convert the source schema: Use the [Schema Conversion Tool]({% link cockroachcloud/migrations-page.md %}) to generate CockroachDB-compatible [DDL]({% link {{ site.current_cloud_version }}/sql-statements.md %}#data-definition-statements). Apply the converted schema to the target database. Drop constraints and indexes to facilitate data load.
Load data into CockroachDB: Use [MOLT Fetch]({% link molt/molt-fetch.md %}) to bulk-ingest your source data. -1. (Optional) Verify consistency before replication: Use [MOLT Verify]({% link molt/molt-verify.md %}) to confirm that the data loaded into CockroachDB is consistent with the source. -1. Finalize target schema: Recreate indexes or constraints on CockroachDB that you previously dropped to facilitate data load. -1. Replicate ongoing changes: Enable continuous replication with [MOLT Replicator]({% link molt/molt-replicator.md %}) to keep CockroachDB in sync with the source. -1. Verify consistency before cutover: Use [MOLT Verify]({% link molt/molt-verify.md %}) to confirm that the CockroachDB data is consistent with the source. -1. Cut over to CockroachDB: Redirect application traffic to the CockroachDB cluster. +The MOLT (Migrate Off Legacy Technology) toolkit enables safe, minimal-downtime database migrations to CockroachDB. MOLT combines schema transformation, distributed data load, continuous replication, and row-level validation into a highly configurable workflow that adapts to diverse production environments. -For more details, refer to [Migration flows](#migration-flows). + +
+MOLT tooling overview +
## MOLT tools

@@ -87,11 +100,11 @@ The [MOLT Schema Conversion Tool]({% link cockroachcloud/migrations-page.md %})

[MOLT Fetch]({% link molt/molt-fetch.md %}) performs the initial data load to CockroachDB. It supports:

-- [Multiple migration flows](#migration-flows) via `IMPORT INTO` or `COPY FROM`.
-- Data movement via [cloud storage, local file servers, or direct copy]({% link molt/molt-fetch.md %}#data-path).
-- [Concurrent data export]({% link molt/molt-fetch.md %}#best-practices) from multiple source tables and shards.
-- [Schema transformation rules]({% link molt/molt-fetch.md %}#transformations).
-- After exporting data with `IMPORT INTO`, safe [continuation]({% link molt/molt-fetch.md %}#fetch-continuation) to retry failed or interrupted tasks from specific checkpoints.
+- Multiple migration flows via `IMPORT INTO` or `COPY FROM`.
+- Data movement via [cloud storage, local file servers, or direct copy]({% link molt/molt-fetch.md %}#define-intermediate-storage).
+- [Concurrent data export]({% link molt/molt-fetch-best-practices.md %}) from multiple source tables and shards.
+- [Schema transformation rules]({% link molt/molt-fetch.md %}#define-transformations).
+- After exporting data with `IMPORT INTO`, safe [continuation]({% link molt/molt-fetch.md %}#continue-molt-fetch-after-interruption) to retry failed or interrupted tasks from specific checkpoints.

### Replicator

@@ -100,7 +113,7 @@ The [MOLT Schema Conversion Tool]({% link cockroachcloud/migrations-page.md %})

- Continuous replication from source databases to CockroachDB.
- [Multiple consistency modes]({% link molt/molt-replicator.md %}#consistency-modes) for balancing throughput and transactional guarantees.
- Failback replication from CockroachDB back to source databases.
-- [Performance tuning]({% link molt/molt-replicator.md %}#optimize-performance) for high-throughput workloads.
+- [Performance tuning]({% link molt/molt-replicator-best-practices.md %}#optimize-performance) for high-throughput workloads.

### Verify

@@ -110,7 +123,11 @@ The [MOLT Schema Conversion Tool]({% link cockroachcloud/migrations-page.md %})

- Column definition.
- Row-level data.

-## Migration flows
+

+## Migration variables
+
+You must decide how you want your migration to handle each of the following variables. These decisions will depend on your specific business and technical considerations. The MOLT toolkit supports any set of decisions made for the [supported source databases](#molt-tools).
+
+### Migration granularity

-### Bulk load
+You may choose to migrate all of your data into a CockroachDB cluster at once. However, for larger data stores, it's recommended that you migrate data in separate phases. This breaks the migration into manageable slices and limits the impact of any migration difficulties. A sketch of this phased approach appears at the end of this section.

-For migrations that tolerate downtime, use MOLT Fetch in `data-load` mode to perform a one-time bulk load of source data into CockroachDB. Refer to [Bulk Load]({% link molt/migrate-bulk-load.md %}).
+### Continuous replication

-### Migrations with minimal downtime
+After the initial data is migrated from the source into CockroachDB, you may choose to continue streaming source changes to the target. This is important for migrations that aim to minimize application downtime, as the source database may need to continue receiving writes until application traffic is fully cut over to CockroachDB.
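+
+For example, the phased approach described under [Migration granularity](#migration-granularity) maps directly onto MOLT Fetch invocations: each phase can be a separate fetch task scoped to a subset of tables. The following is a minimal sketch; the connection strings, bucket paths, and table names are illustrative placeholders:
+
+~~~ shell
+# Phase 1: migrate one slice of tables.
+molt fetch \
+--source 'postgres://migration_user:password@source-host:5432/source_db' \
+--target 'postgres://crdb_user:password@crdb-host:26257/defaultdb?sslmode=verify-full' \
+--bucket-path 's3://migration-bucket/phase-1' \
+--table-filter 'users|accounts'
+
+# Phase 2: migrate the remaining tables in a later task.
+molt fetch \
+--source 'postgres://migration_user:password@source-host:5432/source_db' \
+--target 'postgres://crdb_user:password@crdb-host:26257/defaultdb?sslmode=verify-full' \
+--bucket-path 's3://migration-bucket/phase-2' \
+--table-filter 'orders|payments'
+~~~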
-To minimize downtime during migration, use MOLT Fetch for initial data loading followed by MOLT Replicator for continuous replication. Instead of loading all data during a planned downtime window, you can run an initial load followed by continuous replication. Writes are paused only briefly to allow replication to drain before the final cutover. The duration of this pause depends on the volume of write traffic and the replication lag between the source and CockroachDB.
+### Data transformation strategy

-Refer to [Load and Replicate]({% link molt/migrate-load-replicate.md %}) for detailed instructions.
+If there are discrepancies between the source and target schemas, you must define the rules that determine the necessary data transformations. These transformations can be applied in the source database, in flight, or in the target database.

-### Recovery and rollback strategies
+### Validation strategy

-If the migration is interrupted or cutover must be aborted, MOLT Replicator provides safe recovery options:
+There are several ways to verify that the data in the source and the target match. You must decide which validation checks you want to perform, and when in the migration process you want to perform them.
+
+### Rollback plan
+
+Until the migration is complete, a migration failure may lead you to roll back application traffic entirely to the source database. You may therefore need a way of keeping the source database up to date with new writes to the target. This is especially important for risk-averse migrations that aim to minimize downtime.
+
+---

-- Resume a previously interrupted replication stream. Refer to [Resume Replication]({% link molt/migrate-resume-replication.md %}).
-- Use failback mode to reverse the migration, synchronizing changes from CockroachDB back to the original source. This ensures data consistency on the source so that you can retry the migration later. Refer to [Migration Failback]({% link molt/migrate-failback.md %}).
+[Learn more about the different migration variables]({% link molt/migration-considerations.md %}), how to weigh the options for each variable, and how to use the MOLT toolkit to support each decision.

## See also

diff --git a/src/current/molt/migration-strategy.md b/src/current/molt/migration-strategy.md
index eb2b46f65f5..29fb8ecb51f 100644
--- a/src/current/molt/migration-strategy.md
+++ b/src/current/molt/migration-strategy.md
@@ -10,10 +10,9 @@ A successful migration to CockroachDB requires planning for downtime, applicatio

This page outlines key decisions, infrastructure considerations, and best practices for a resilient and repeatable high-level migration strategy:

- [Develop a migration plan](#develop-a-migration-plan).
-- Evaluate your [downtime approach](#approach-to-downtime).
- [Size the target CockroachDB cluster](#capacity-planning).
- Implement [application changes](#application-changes) to address necessary [schema changes](#schema-design-best-practices), [transaction contention](#handling-transaction-contention), and [unimplemented features](#unimplemented-features-and-syntax-incompatibilities).
-- [Prepare for migration](#prepare-for-migration) by running a [pre-mortem](#run-a-migration-pre-mortem), setting up [metrics](#set-up-monitoring-and-alerting), [loading test data](#load-test-data), [validating application queries](#validate-queries) for correctness and performance, performing a [migration dry run](#perform-a-dry-run), and reviewing your [cutover strategy](#cutover-strategy).
+- [Prepare for migration](#prepare-for-migration) by running a [pre-mortem](#run-a-migration-pre-mortem), setting up [metrics](#set-up-monitoring-and-alerting), [loading test data](#load-test-data), [validating application queries](#validate-queries) for correctness and performance, performing a [migration dry run](#perform-a-dry-run), and reviewing your cutover strategy.

{{site.data.alerts.callout_success}}
For help migrating to CockroachDB, contact our sales team.
@@ -31,20 +30,6 @@ Consider the following as you plan your migration:

Create a document summarizing the migration's purpose, technical details, and team members involved.

-## Approach to downtime
-
-It's important to fully [prepare the migration](#prepare-for-migration) in order to be certain that the migration can be completed successfully during the downtime window.
-
-- *Planned downtime* is made known to your users in advance. Once you have [prepared for the migration](#prepare-for-migration), you take the application offline, [conduct the migration]({% link molt/migration-overview.md %}), and bring the application back online on CockroachDB. To succeed, you should estimate the amount of downtime required to migrate your data, and ideally schedule the downtime outside of peak hours. Scheduling downtime is easiest if your application traffic is "periodic", meaning that it varies by the time of day, day of week, or day of month.
-
-  Migrations with planned downtime are only recommended if you can complete the bulk data load (e.g., using the MOLT Fetch [`data-load` mode]({% link molt/molt-fetch.md %}#fetch-mode)) within the downtime window. Otherwise, you can [minimize downtime using continuous replication]({% link molt/migration-overview.md %}#migrations-with-minimal-downtime).
-
-- *Minimal downtime* impacts as few customers as possible, ideally without impacting their regular usage. If your application is intentionally offline at certain times (e.g., outside business hours), you can migrate the data without users noticing. Alternatively, if your application's functionality is not time-sensitive (e.g., it sends batched messages or emails), you can queue requests while the system is offline and process them after completing the migration to CockroachDB.
-
-  MOLT enables [migrations with minimal downtime]({% link molt/migration-overview.md %}#migrations-with-minimal-downtime), using [MOLT Replicator]({% link molt/molt-replicator.md %}) for continuous replication of source changes to CockroachDB.
-
-- *Reduced functionality* takes some, but not all, application functionality offline. For example, you can disable writes but not reads while you migrate the application data, and queue data to be written after completing the migration.
-
## Capacity planning

To size the target CockroachDB cluster, consider your data volume and workload characteristics:
@@ -110,9 +95,9 @@ Based on the error budget you [defined in your migration plan](#develop-a-migrat

### Load test data

-It's useful to load test data into CockroachDB so that you can [test your application queries](#validate-queries). Refer to [Migration flows]({% link molt/migration-overview.md %}#migration-flows).
+It's useful to load test data into CockroachDB so that you can [test your application queries](#validate-queries).
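+
+For example, a test data load into a staging cluster can use the same shape of command as the eventual production load. The following is a minimal sketch; the connection strings, bucket path, and table filter are illustrative placeholders:
+
+~~~ shell
+molt fetch \
+--source 'postgres://migration_user:password@source-host:5432/source_db' \
+--target 'postgres://crdb_user:password@staging-crdb-host:26257/defaultdb?sslmode=verify-full' \
+--bucket-path 's3://migration-bucket/test-load' \
+--table-filter 'users|orders'
+~~~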
-MOLT Fetch [supports both `IMPORT INTO` and `COPY FROM`]({% link molt/molt-fetch.md %}#data-load-mode) for loading data into CockroachDB: +MOLT Fetch [supports both `IMPORT INTO` and `COPY FROM`]({% link molt/molt-fetch.md %}#import-into-vs-copy-from) for loading data into CockroachDB: - Use `IMPORT INTO` for maximum throughput when the target tables can be offline. For a bulk data migration, most users should use `IMPORT INTO` because the tables will be offline anyway, and `IMPORT INTO` can [perform the data import much faster]({% link {{ site.current_cloud_version }}/import-performance-best-practices.md %}) than `COPY FROM`. - Use `COPY FROM` (or `--direct-copy`) when the target must remain queryable during load. @@ -147,7 +132,7 @@ To further minimize potential surprises when you conduct the migration, practice Performing a dry run is highly recommended. In addition to demonstrating how long the migration may take, a dry run also helps to ensure that team members understand what they need to do during the migration, and that changes to the application are coordinated. -## Cutover strategy + ## See also diff --git a/src/current/molt/molt-fetch-best-practices.md b/src/current/molt/molt-fetch-best-practices.md new file mode 100644 index 00000000000..8c14342a3c2 --- /dev/null +++ b/src/current/molt/molt-fetch-best-practices.md @@ -0,0 +1,69 @@ +--- +title: MOLT Fetch Best Practices +summary: Learn best practices for using MOLT Fetch to migrate data to CockroachDB. +toc: true +docs_area: migrate +--- + +## Test and validate + +To verify that your connections and configuration work properly, run MOLT Fetch in a staging environment before migrating any data in production. Use a test or development environment that closely resembles production. + +## Configure the source database and connection + +- To prevent connections from terminating prematurely during the [data export phase]({% link molt/molt-fetch.md %}#data-export-phase), set the following to high values on the source database: + + - **Maximum allowed number of connections.** MOLT Fetch can export data across multiple connections. The number of connections it will create is the number of shards ([`--export-concurrency`]({% link molt/molt-fetch-commands-and-flags.md %}#global-flags)) multiplied by the number of tables ([`--table-concurrency`]({% link molt/molt-fetch-commands-and-flags.md %}#global-flags)) being exported concurrently. + + {{site.data.alerts.callout_info}} + With the default numerical range sharding, only tables with [primary key]({% link {{ site.current_cloud_version }}/primary-key.md %}) types of [`INT`]({% link {{ site.current_cloud_version }}/int.md %}), [`FLOAT`]({% link {{ site.current_cloud_version }}/float.md %}), or [`UUID`]({% link {{ site.current_cloud_version }}/uuid.md %}) can be sharded. PostgreSQL users can enable [`--use-stats-based-sharding`]({% link molt/molt-fetch-commands-and-flags.md %}#global-flags) to use statistics-based sharding for tables with primary keys of any data type. For details, refer to [Table sharding]({% link molt/molt-fetch.md %}#shard-tables-for-concurrent-export). 
+  {{site.data.alerts.end}}
+
+  - **Maximum lifetime of a connection.**
+
+- If a PostgreSQL database is set as a [source]({% link molt/molt-fetch.md %}#specify-source-and-target-databases), ensure that [`idle_in_transaction_session_timeout`](https://www.postgresql.org/docs/current/runtime-config-client.html#GUC-IDLE-IN-TRANSACTION-SESSION-TIMEOUT) on PostgreSQL is either disabled or set to a value longer than the duration of the [data export phase]({% link molt/molt-fetch.md %}#data-export-phase). Otherwise, the connection will be prematurely terminated. To estimate the time needed to export the PostgreSQL tables, you can perform a dry run and sum the value of [`molt_fetch_table_export_duration_ms`]({% link molt/molt-fetch-monitoring.md %}#metrics) for all exported tables.
+
+## Optimize performance
+
+- {% include molt/molt-drop-constraints-indexes.md %}
+
+- For PostgreSQL sources using [`--use-stats-based-sharding`]({% link molt/molt-fetch-commands-and-flags.md %}#global-flags), run [`ANALYZE`]({% link {{ site.current_cloud_version }}/create-statistics.md %}) on source tables before migration to ensure optimal shard distribution. This is especially important for large tables where even distribution can significantly improve export performance.
+
+- To prevent out-of-memory failures during `READ COMMITTED` [data export]({% link molt/molt-fetch.md %}#data-export-phase) of tables with large rows, estimate the amount of memory used to export a table:
+
+  ~~~
+  --row-batch-size * --export-concurrency * average size of the table rows
+  ~~~
+
+  If you are exporting more than one table at a time (i.e., [`--table-concurrency`]({% link molt/molt-fetch-commands-and-flags.md %}#global-flags) is set higher than `1`), add the estimated memory usage for the tables with the largest row sizes. For example, with the default `--row-batch-size` of `100000` and `--export-concurrency` of `4`, a table whose rows average 1 KiB can require roughly 400 MB of memory during export. Ensure that you have sufficient memory to run `molt fetch`, and adjust [`--row-batch-size`]({% link molt/molt-fetch-commands-and-flags.md %}#row-batch-size) accordingly. For details on how concurrency and sharding interact, refer to [Table sharding]({% link molt/molt-fetch.md %}#shard-tables-for-concurrent-export).
+
+- If a table in the source database is much larger than the other tables, [filter and export the largest table]({% link molt/molt-fetch.md %}#schema-and-table-selection) in its own `molt fetch` task. Repeat this for each of the largest tables. Then export the remaining tables in another task.
+
+- Ensure that the machine running MOLT Fetch is large enough to handle the amount of data being migrated. Fetch performance can sometimes be limited by available resources, but should always make progress. To identify possible resource constraints, observe the `molt_fetch_rows_exported` [metric]({% link molt/molt-fetch-monitoring.md %}#metrics) for decreases in the number of rows being processed. You can use the [sample Grafana dashboard](https://molt.cockroachdb.com/molt/cli/grafana_dashboard.json) to view metrics. For details on optimizing export performance through sharding, refer to [Table sharding]({% link molt/molt-fetch.md %}#shard-tables-for-concurrent-export).
+
+## Import and continuation handling
+
+- When using [`IMPORT INTO`]({% link molt/molt-fetch.md %}#import-into-vs-copy-from) during the [data import phase]({% link molt/molt-fetch.md %}#data-import-phase) to load tables into CockroachDB, if the fetch task terminates before the import job completes, the hanging import job on the target database will keep the table offline.
To make this table accessible again, [manually resume or cancel the job]({% link {{site.current_cloud_version}}/import-into.md %}#view-and-control-import-jobs). Then resume `molt fetch` using [continuation]({% link molt/molt-fetch.md %}#continue-molt-fetch-after-interruption), or restart the task from the beginning. + +## Security + +Cockroach Labs strongly recommends the following security practices. + +### Connection security + +{% include molt/molt-secure-connection-strings.md %} + +{{site.data.alerts.callout_info}} +By default, insecure connections (i.e., `sslmode=disable` on PostgreSQL; `sslmode` not set on MySQL) are disallowed. When using an insecure connection, `molt fetch` returns an error. To override this check, you can enable the [`--allow-tls-mode-disable`]({% link molt/molt-fetch-commands-and-flags.md %}#allow-tls-mode-disable) flag. Do this **only** when testing, or if a secure SSL/TLS connection to the source or target database is not possible. +{{site.data.alerts.end}} + +### Cloud storage security + +{% include molt/fetch-secure-cloud-storage.md %} + +## See also + +- [MOLT Fetch]({% link molt/molt-fetch.md %}) +- [Migration Overview]({% link molt/migration-overview.md %}) +- [MOLT Replicator]({% link molt/molt-replicator.md %}) +- [MOLT Verify]({% link molt/molt-verify.md %}) diff --git a/src/current/molt/molt-fetch-commands-and-flags.md b/src/current/molt/molt-fetch-commands-and-flags.md new file mode 100644 index 00000000000..1c3a61c2103 --- /dev/null +++ b/src/current/molt/molt-fetch-commands-and-flags.md @@ -0,0 +1,87 @@ +--- +title: MOLT Fetch Commands and Flags +summary: Reference documentation for MOLT Fetch commands and flags. +toc: true +docs_area: migrate +--- + +## Commands + +| Command | Usage | +|---------|---------------------------------------------------------------------------------------------------| +| `fetch` | Start the fetch task. This loads data from a source database to a target CockroachDB database. | + +### Subcommands + +| Command | Usage | +|--------------|----------------------------------------------------------------------| +| `tokens list` | List active [continuation tokens]({% link molt/molt-fetch.md %}#list-active-continuation-tokens). | + +## Flags + +### Global flags + +| Flag | Description | +|---------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| `--source` | (Required) Connection string used to connect to the Oracle PDB (in a CDB/PDB architecture) or to a standalone database (non‑CDB). For details, refer to [Source and target databases]({% link molt/molt-fetch.md %}#specify-source-and-target-databases). | +| `--source-cdb` | Connection string for the Oracle container database (CDB) when using a multitenant (CDB/PDB) architecture. Omit this flag on a non‑multitenant Oracle database. For details, refer to [Source and target databases]({% link molt/molt-fetch.md %}#specify-source-and-target-databases). | +| `--target` | (Required) Connection string for the target database. 
For details, refer to [Source and target databases]({% link molt/molt-fetch.md %}#specify-source-and-target-databases). | +| `--allow-tls-mode-disable` | Allow insecure connections to databases. Secure SSL/TLS connections should be used by default. This should be enabled **only** if secure SSL/TLS connections to the source or target database are not possible. | +| `--assume-role` | Service account to use for assume role authentication. `--use-implicit-auth` must be included. For example, `--assume-role='user-test@cluster-ephemeral.iam.gserviceaccount.com' --use-implicit-auth`. For details, refer to [Cloud Storage Authentication]({% link {{ site.current_cloud_version }}/cloud-storage-authentication.md %}). | +| `--bucket-path` | The path within the [cloud storage]({% link molt/molt-fetch.md %}#bucket-path) bucket where intermediate files are written (e.g., `'s3://bucket/path'` or `'gs://bucket/path'`). Only the URL path is used; query parameters (e.g., credentials) are ignored. To pass in query parameters, use the appropriate flags: `--assume-role`, `--import-region`, `--use-implicit-auth`. | +| `--case-sensitive` | Toggle case sensitivity when comparing table and column names on the source and target. To disable case sensitivity, set `--case-sensitive=false`. If `=` is **not** included (e.g., `--case-sensitive false`), the flag is interpreted as `--case-sensitive` (i.e., `--case-sensitive=true`).

**Default:** `false` | +| `--cleanup` | Whether to delete intermediate files after moving data using [cloud or local storage]({% link molt/molt-fetch.md %}#define-intermediate-storage). **Note:** Cleanup does not occur on [continuation]({% link molt/molt-fetch.md %}#continue-molt-fetch-after-interruption). | +| `--compression` | Compression method for data when using [`IMPORT INTO`]({% link molt/molt-fetch.md %}#import-into-vs-copy-from) (`gzip`/`none`).

**Default:** `gzip` | +| `--continuation-file-name` | Restart fetch at the specified filename if the task encounters an error. `--fetch-id` must be specified. For details, see [Fetch continuation]({% link molt/molt-fetch.md %}#continue-molt-fetch-after-interruption). | +| `--continuation-token` | Restart fetch at a specific table, using the specified continuation token, if the task encounters an error. `--fetch-id` must be specified. For details, see [Fetch continuation]({% link molt/molt-fetch.md %}#continue-molt-fetch-after-interruption). | +| `--crdb-pts-duration` | The duration for which each timestamp used in data export from a CockroachDB source is protected from garbage collection. This ensures that the data snapshot remains consistent. For example, if set to `24h`, each timestamp is protected for 24 hours from the initiation of the export job. This duration is extended at regular intervals specified in `--crdb-pts-refresh-interval`.

**Default:** `24h0m0s` | +| `--crdb-pts-refresh-interval` | The frequency at which the protected timestamp's validity is extended. This interval maintains protection of the data snapshot until data export from a CockroachDB source is completed. For example, if set to `10m`, the protected timestamp's expiration will be extended by the duration specified in `--crdb-pts-duration` (e.g., `24h`) every 10 minutes while export is not complete.

**Default:** `10m0s` | +| `--direct-copy` | Enables [direct copy]({% link molt/molt-fetch.md %}#direct-copy), which copies data directly from source to target without using an intermediate store. | +| `--export-concurrency` | Number of shards to export at a time per table, each on a dedicated thread. This controls how many shards are created for each individual table during the [data export phase]({% link molt/molt-fetch.md %}#data-export-phase) and is distinct from `--table-concurrency`, which controls how many tables are processed simultaneously. The total number of concurrent threads is the product of `--export-concurrency` and `--table-concurrency`. Tables can be sharded with a range-based or stats-based mechanism. For details, refer to [Table sharding]({% link molt/molt-fetch.md %}#shard-tables-for-concurrent-export).

**Default:** `4` | +| `--export-retry-max-attempts` | Maximum number of retry attempts for source export queries when connection failures occur. Only supported for PostgreSQL and CockroachDB sources.

**Default:** `3` | +| `--export-retry-max-duration` | Maximum total duration for retrying source export queries. If `0`, no time limit is enforced. Only supported for PostgreSQL and CockroachDB sources.

**Default:** `5m0s` | +| `--filter-path` | Path to a JSON file defining row-level filters for the [data import phase]({% link molt/molt-fetch.md %}#data-import-phase). Refer to [Selective data movement]({% link molt/molt-fetch.md %}#select-data-to-migrate). | +| `--fetch-id` | Restart fetch task corresponding to the specified ID. If `--continuation-file-name` or `--continuation-token` are not specified, fetch restarts for all failed tables. | +| `--flush-rows` | Number of rows before the source data is flushed to intermediate files. **Note:** If `--flush-size` is also specified, the fetch behavior is based on the flag whose criterion is met first. | +| `--flush-size` | Size (in bytes) before the source data is flushed to intermediate files. **Note:** If `--flush-rows` is also specified, the fetch behavior is based on the flag whose criterion is met first. | +| `--ignore-replication-check` | Skip querying for replication checkpoints such as `pg_current_wal_insert_lsn()` on PostgreSQL, `gtid_executed` on MySQL, and `CURRENT_SCN` on Oracle. This option is intended for use during bulk load migrations or when doing a one-time data export from a read replica. | +| `--import-batch-size` | The number of files to be imported at a time to the target database during the [data import phase]({% link molt/molt-fetch.md %}#data-import-phase). This applies only when using [`IMPORT INTO`]({% link molt/molt-fetch.md %}#import-into-vs-copy-from) for data movement. **Note:** Increasing this value can improve the performance of full-scan queries on the target database shortly after fetch completes, but very high values are not recommended. If any individual file in the import batch fails, you must [retry]({% link molt/molt-fetch.md %}#continue-molt-fetch-after-interruption) the entire batch.

**Default:** `1000` | +| `--import-region` | The region of the [cloud storage]({% link molt/molt-fetch.md %}#bucket-path) bucket. This applies only to [Amazon S3 buckets]({% link molt/molt-fetch.md %}#bucket-path). Set this flag only if you need to specify an `AWS_REGION` explicitly when using [`IMPORT INTO`]({% link molt/molt-fetch.md %}#import-into-vs-copy-from) for data movement. For example, `--import-region=ap-south-1`. | +| `--local-path` | The path within the [local file server]({% link molt/molt-fetch.md %}#local-path) where intermediate files are written (e.g., `data/migration/cockroach`). `--local-path-listen-addr` must be specified. | +| `--local-path-crdb-access-addr` | Address of a [local file server]({% link molt/molt-fetch.md %}#local-path) that is **publicly accessible**. This flag is only necessary if CockroachDB cannot reach the local address specified with `--local-path-listen-addr` (e.g., when moving data to a CockroachDB {{ site.data.products.cloud }} deployment). `--local-path` and `--local-path-listen-addr` must be specified.

**Default:** Value of `--local-path-listen-addr`. | +| `--local-path-listen-addr` | Write intermediate files to a [local file server]({% link molt/molt-fetch.md %}#local-path) at the specified address (e.g., `'localhost:3000'`). `--local-path` must be specified. | +| `--log-file` | Write messages to the specified log filename. If no filename is provided, messages write to `fetch-{datetime}.log`. If `"stdout"` is provided, messages write to `stdout`. | +| `--logging` | Level at which to log messages (`trace`/`debug`/`info`/`warn`/`error`/`fatal`/`panic`).

**Default:** `info` | +| `--metrics-listen-addr` | Address of the Prometheus metrics endpoint, which has the path `{address}/metrics`. For details on important metrics to monitor, refer to [Monitoring]({% link molt/molt-fetch-monitoring.md %}).

**Default:** `'127.0.0.1:3030'` | +| `--mode` | Configure the MOLT Fetch behavior: `data-load`, `export-only`, or `import-only`. For details, refer to [Fetch mode]({% link molt/molt-fetch.md %}#define-fetch-mode).

**Default:** `data-load` | +| `--non-interactive` | Run the fetch task without interactive prompts. This is recommended **only** when running `molt fetch` in an automated process (i.e., a job or continuous integration). | +| `--pprof-listen-addr` | Address of the pprof endpoint.

**Default:** `'127.0.0.1:3031'` | +| `--row-batch-size` | Number of rows per shard to export at a time. For details on sharding, refer to [Table sharding]({% link molt/molt-fetch.md %}#shard-tables-for-concurrent-export). See also [Best practices]({% link molt/molt-fetch-best-practices.md %}).

**Default:** `100000` | +| `--schema-filter` | Move schemas that match a specified [regular expression](https://wikipedia.org/wiki/Regular_expression).

**Default:** `'.*'` | +| `--skip-pk-check` | Skip primary-key matching to allow data load when source or target tables have missing or mismatched primary keys. Disables sharding and bypasses `--export-concurrency` and `--row-batch-size` settings. Refer to [Skip primary key matching]({% link molt/molt-fetch.md %}#skip-primary-key-matching).

**Default:** `false` | +| `--table-concurrency` | Number of tables to export at a time. The number of concurrent threads is the product of `--export-concurrency` and `--table-concurrency`.

**Default:** `4` | +| `--table-exclusion-filter` | Exclude tables that match a specified [POSIX regular expression](https://wikipedia.org/wiki/Regular_expression).

This value **cannot** be set to `'.*'`, which would cause every table to be excluded.

**Default:** Empty string | +| `--table-filter` | Move tables that match a specified [POSIX regular expression](https://wikipedia.org/wiki/Regular_expression).

**Default:** `'.*'` | +| `--table-handling` | How tables are initialized on the target database (`none`/`drop-on-target-and-recreate`/`truncate-if-exists`). For details, see [Target table handling]({% link molt/molt-fetch.md %}#handle-target-tables).

**Default:** `none` | +| `--transformations-file` | Path to a JSON file that defines transformations to be performed on the target schema during the fetch task. Refer to [Transformations]({% link molt/molt-fetch.md %}#define-transformations). | +| `--type-map-file` | Path to a JSON file that contains explicit type mappings for automatic schema creation, when enabled with `--table-handling drop-on-target-and-recreate`. For details on the JSON format and valid type mappings, see [type mapping]({% link molt/molt-fetch.md %}#type-mapping). | +| `--use-console-writer` | Use the console writer, which has cleaner log output but introduces more latency.

**Default:** `false` (log as structured JSON) | +| `--use-copy` | Use [`COPY FROM`]({% link molt/molt-fetch.md %}#import-into-vs-copy-from) to move data. This makes tables queryable during data load, but is slower than using `IMPORT INTO`. For details, refer to [Data movement]({% link molt/molt-fetch.md %}#import-into-vs-copy-from). | +| `--use-implicit-auth` | Use [implicit authentication]({% link {{ site.current_cloud_version }}/cloud-storage-authentication.md %}) for [cloud storage]({% link molt/molt-fetch.md %}#bucket-path) URIs. | +| `--use-stats-based-sharding` | Enable statistics-based sharding for PostgreSQL sources. This allows sharding of tables with primary keys of any data type and can create more evenly distributed shards compared to the default numerical range sharding. Requires PostgreSQL 11+ and access to `pg_stats`. For details, refer to [Table sharding]({% link molt/molt-fetch.md %}#shard-tables-for-concurrent-export). | + + +### `tokens list` flags + +| Flag | Description | +|-----------------------|---------------------------------------------------------------------------------------------------------------------------------------------| +| `--conn-string` | (Required) Connection string for the target database. For details, see [List active continuation tokens]({% link molt/molt-fetch.md %}#list-active-continuation-tokens). | +| `-n`, `--num-results` | Number of results to return. | + +## See also + +- [MOLT Fetch]({% link molt/molt-fetch.md %}) +- [MOLT Fetch Best Practices]({% link molt/molt-fetch-best-practices.md %}) +- [MOLT Fetch Monitoring]({% link molt/molt-fetch-monitoring.md %}) +- [MOLT Fetch Troubleshooting]({% link molt/molt-fetch-troubleshooting.md %}) +- [Migration Overview]({% link molt/migration-overview.md %}) diff --git a/src/current/molt/molt-fetch-installation.md b/src/current/molt/molt-fetch-installation.md new file mode 100644 index 00000000000..fee592f94a5 --- /dev/null +++ b/src/current/molt/molt-fetch-installation.md @@ -0,0 +1,52 @@ +--- +title: MOLT Fetch Installation +summary: Learn how to install MOLT Fetch and configure prerequisites for data migration. +toc: true +docs_area: migrate +--- + +## Prerequisites + +### Supported databases + +The following source databases are supported: + +- PostgreSQL 11-16 +- MySQL 5.7, 8.0 and later +- Oracle Database 19c (Enterprise Edition) and 21c (Express Edition) + +### Database configuration + +Ensure that the source and target schemas are identical, unless you enable automatic schema creation with the [`drop-on-target-and-recreate`]({% link molt/molt-fetch.md %}#handle-target-tables) option. If you are creating the target schema manually, review the behaviors in [Mismatch handling]({% link molt/molt-fetch.md %}#mismatch-handling). + +{{site.data.alerts.callout_info}} +MOLT Fetch does not support migrating sequences. If your source database contains sequences, refer to the [guidance on indexing with sequential keys]({% link {{site.current_cloud_version}}/sql-faqs.md %}#how-do-i-generate-unique-slowly-increasing-sequential-numbers-in-cockroachdb). If a sequential key is necessary in your CockroachDB table, you must create it manually. After using MOLT Fetch to load the data onto the target, but before cutover, make sure to update each sequence's current value using [`setval()`]({% link {{site.current_cloud_version}}/functions-and-operators.md %}#sequence-functions) so that new inserts continue from the correct point. 
+{{site.data.alerts.end}} + +If you plan to use cloud storage for the data migration, follow [Cloud storage security]({% link molt/molt-fetch-best-practices.md %}#cloud-storage-security) best practices. + +### User permissions + +The SQL user running MOLT Fetch requires specific privileges on both the source and target databases: + +| Database | Required Privileges | Details | +|--------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------| +| PostgreSQL source |
  • `CONNECT` on database.
  • `USAGE` on schema.
  • `SELECT` on tables to migrate.
| [Create PostgreSQL migration user]({% link molt/migrate-bulk-load.md %}#create-migration-user-on-source-database) | +| MySQL source |
  • `SELECT` on tables to migrate.
| [Create MySQL migration user]({% link molt/migrate-bulk-load.md %}?filters=mysql#create-migration-user-on-source-database) | +| Oracle source |
  • `CONNECT` and `CREATE SESSION`.
  • `SELECT` and `FLASHBACK` on tables to migrate.
  • `SELECT` on metadata views (`ALL_USERS`, `DBA_USERS`, `DBA_OBJECTS`, `DBA_SYNONYMS`, `DBA_TABLES`).
| [Create Oracle migration user]({% link molt/migrate-bulk-load.md %}?filters=oracle#create-migration-user-on-source-database) | +| CockroachDB target |
  • `ALL` on target database.
  • `CREATE` on schema.
  • `SELECT`, `INSERT`, `UPDATE`, `DELETE` on target tables.
  • For `IMPORT INTO`: `SELECT`, `INSERT`, `DROP` on target tables. Optionally `EXTERNALIOIMPLICITACCESS` for implicit cloud storage authentication.
  • For `COPY FROM`: `admin` role.
| [Create CockroachDB user]({% link molt/migrate-bulk-load.md %}#create-the-sql-user) | + +## Installation + +{% include molt/molt-install.md %} + +### Docker usage + +{% include molt/molt-docker.md %} + +## See also + +- [MOLT Fetch]({% link molt/molt-fetch.md %}) +- [MOLT Fetch Commands and Flags]({% link molt/molt-fetch-commands-and-flags.md %}) +- [MOLT Fetch Best Practices]({% link molt/molt-fetch-best-practices.md %}) +- [Migration Overview]({% link molt/migration-overview.md %}) diff --git a/src/current/molt/molt-fetch-monitoring.md b/src/current/molt/molt-fetch-monitoring.md new file mode 100644 index 00000000000..ab498e412b3 --- /dev/null +++ b/src/current/molt/molt-fetch-monitoring.md @@ -0,0 +1,31 @@ +--- +title: MOLT Fetch Metrics +summary: Learn how to monitor MOLT Fetch during data migration using Prometheus metrics. +toc: true +docs_area: migrate +--- + +## Metrics + +By default, MOLT Fetch exports [Prometheus](https://prometheus.io/) metrics at `127.0.0.1:3030/metrics`. You can configure this endpoint with the [`--metrics-listen-addr`]({% link molt/molt-fetch-commands-and-flags.md %}#metrics-listen-addr) flag. + +Cockroach Labs recommends monitoring the following metrics: + +| Metric Name | Description | +|---------------------------------------|-----------------------------------------------------------------------------------------------------------------------------| +| `molt_fetch_num_tables` | Number of tables that will be moved from the source. | +| `molt_fetch_num_task_errors` | Number of errors encountered by the fetch task. | +| `molt_fetch_overall_duration` | Duration (in seconds) of the fetch task. | +| `molt_fetch_rows_exported` | Number of rows that have been exported from a table. For example:
`molt_fetch_rows_exported{table="public.users"}` | +| `molt_fetch_rows_imported` | Number of rows that have been imported from a table. For example:
`molt_fetch_rows_imported{table="public.users"}` | +| `molt_fetch_table_export_duration_ms` | Duration (in milliseconds) of a table's export. For example:
`molt_fetch_table_export_duration_ms{table="public.users"}` | +| `molt_fetch_table_import_duration_ms` | Duration (in milliseconds) of a table's import. For example:
`molt_fetch_table_import_duration_ms{table="public.users"}` | + +You can also use the [sample Grafana dashboard](https://molt.cockroachdb.com/molt/cli/grafana_dashboard.json) to view the preceding metrics. + +## See also + +- [MOLT Fetch]({% link molt/molt-fetch.md %}) +- [MOLT Fetch Best Practices]({% link molt/molt-fetch-best-practices.md %}) +- [Migration Overview]({% link molt/migration-overview.md %}) +- [MOLT Replicator]({% link molt/molt-replicator.md %}) diff --git a/src/current/molt/molt-fetch-troubleshooting.md b/src/current/molt/molt-fetch-troubleshooting.md new file mode 100644 index 00000000000..59555ce78be --- /dev/null +++ b/src/current/molt/molt-fetch-troubleshooting.md @@ -0,0 +1,21 @@ +--- +title: MOLT Fetch Troubleshooting +summary: Troubleshoot common issues when using MOLT Fetch for data migration. +toc: true +docs_area: migrate +--- + +
+ + + +
+
+{% include molt/molt-troubleshooting-fetch.md %}
+
+## See also
+
+- [MOLT Fetch]({% link molt/molt-fetch.md %})
+- [Migration Overview]({% link molt/migration-overview.md %})
+- [MOLT Replicator]({% link molt/molt-replicator.md %})
+- [MOLT Verify]({% link molt/molt-verify.md %})
diff --git a/src/current/molt/molt-fetch.md b/src/current/molt/molt-fetch.md
index fe4318bf145..ed7c3497f15 100644
--- a/src/current/molt/molt-fetch.md
+++ b/src/current/molt/molt-fetch.md
@@ -7,158 +7,55 @@ docs_area: migrate

MOLT Fetch moves data from a source database into CockroachDB as part of a [database migration]({% link molt/migration-overview.md %}).

-MOLT Fetch uses [`IMPORT INTO`]({% link {{site.current_cloud_version}}/import-into.md %}) or [`COPY FROM`]({% link {{site.current_cloud_version}}/copy.md %}) to move the source data to cloud storage (Google Cloud Storage, Amazon S3, or Azure Blob Storage), a local file server, or local memory. Once the data is exported, MOLT Fetch loads the data into a target CockroachDB database. For details, refer to [Migration phases](#migration-phases).
+MOLT Fetch uses [`IMPORT INTO`]({% link {{site.current_cloud_version}}/import-into.md %}) or [`COPY FROM`]({% link {{site.current_cloud_version}}/copy.md %}) to move the source data to cloud storage (Google Cloud Storage, Amazon S3, or Azure Blob Storage), a local file server, or local memory. Once the data is exported, MOLT Fetch loads the data into a target CockroachDB database.

-## Terminology
+

-## Prerequisites
+## How it works

-### Supported databases
+MOLT Fetch operates in two distinct phases to move data from the source database to CockroachDB. The [data export phase](#data-export-phase) moves data to intermediate storage (either cloud storage or a local file server). The [data import phase](#data-import-phase) moves data from that intermediate storage to the CockroachDB cluster. For details on available modes, refer to [Define fetch mode](#define-fetch-mode).

-The following source databases are supported:
-
-- PostgreSQL 11-16
-- MySQL 5.7, 8.0 and later
-- Oracle Database 19c (Enterprise Edition) and 21c (Express Edition)
-
-### Database configuration
-
-Ensure that the source and target schemas are identical, unless you enable automatic schema creation with the [`drop-on-target-and-recreate`](#target-table-handling) option. If you are creating the target schema manually, review the behaviors in [Mismatch handling](#mismatch-handling).
-
-{{site.data.alerts.callout_info}}
-MOLT Fetch does not support migrating sequences. If your source database contains sequences, refer to the [guidance on indexing with sequential keys]({% link {{site.current_cloud_version}}/sql-faqs.md %}#how-do-i-generate-unique-slowly-increasing-sequential-numbers-in-cockroachdb). If a sequential key is necessary in your CockroachDB table, you must create it manually. After using MOLT Fetch to load the data onto the target, but before cutover, make sure to update each sequence's current value using [`setval()`]({% link {{site.current_cloud_version}}/functions-and-operators.md %}#sequence-functions) so that new inserts continue from the correct point.
-{{site.data.alerts.end}}
-
-If you plan to use cloud storage for the data migration, follow the steps in [Cloud storage security](#cloud-storage-security).
+MOLT Fetch flow draft +
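+
+The two phases can run as a single task (the default `data-load` mode) or separately. The following is a minimal sketch of splitting the phases with the `--mode` flag; all connection strings and the bucket path are illustrative placeholders:
+
+~~~ shell
+# Phase 1: export the source data to intermediate cloud storage only.
+molt fetch \
+--source 'postgres://migration_user:password@source-host:5432/source_db' \
+--target 'postgres://crdb_user:password@crdb-host:26257/defaultdb?sslmode=verify-full' \
+--bucket-path 's3://migration-bucket/fetch' \
+--mode export-only
+
+# Phase 2: import the previously exported files into CockroachDB.
+molt fetch \
+--source 'postgres://migration_user:password@source-host:5432/source_db' \
+--target 'postgres://crdb_user:password@crdb-host:26257/defaultdb?sslmode=verify-full' \
+--bucket-path 's3://migration-bucket/fetch' \
+--mode import-only
+~~~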
-### User permissions +### Data export phase -The SQL user running MOLT Fetch requires specific privileges on both the source and target databases: +In this first phase, MOLT Fetch connects to the source database and exports table data to intermediate storage. -| Database | Required Privileges | Details | -|--------------------|------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|------------------------------------------------------------------------------------------------------------------------------| -| PostgreSQL source |
  • `CONNECT` on database.
  • `USAGE` on schema.
  • `SELECT` on tables to migrate.
| [Create PostgreSQL migration user]({% link molt/migrate-bulk-load.md %}#create-migration-user-on-source-database) | -| MySQL source |
  • `SELECT` on tables to migrate.
| [Create MySQL migration user]({% link molt/migrate-bulk-load.md %}?filters=mysql#create-migration-user-on-source-database) | -| Oracle source |
  • `CONNECT` and `CREATE SESSION`.
  • `SELECT` and `FLASHBACK` on tables to migrate.
  • `SELECT` on metadata views (`ALL_USERS`, `DBA_USERS`, `DBA_OBJECTS`, `DBA_SYNONYMS`, `DBA_TABLES`).
| [Create Oracle migration user]({% link molt/migrate-bulk-load.md %}?filters=oracle#create-migration-user-on-source-database) | -| CockroachDB target |
  • `ALL` on target database.
  • `CREATE` on schema.
  • `SELECT`, `INSERT`, `UPDATE`, `DELETE` on target tables.
  • For `IMPORT INTO`: `SELECT`, `INSERT`, `DROP` on target tables. Optionally `EXTERNALIOIMPLICITACCESS` for implicit cloud storage authentication.
  • For `COPY FROM`: `admin` role.
| [Create CockroachDB user]({% link molt/migrate-bulk-load.md %}#create-the-sql-user) |
+- [**Selective data movement**](#select-data-to-migrate): By default, MOLT Fetch moves all data from the `--source` database to CockroachDB. If instead you want to move a subset of the available data, use the [`--schema-filter`]({% link molt/molt-fetch-commands-and-flags.md %}#schema-filter), [`--table-filter`]({% link molt/molt-fetch-commands-and-flags.md %}#table-filter), and [`--filter-path`]({% link molt/molt-fetch-commands-and-flags.md %}#filter-path) flags.

-## Installation
+- [**Table sharding for concurrent export**](#shard-tables-for-concurrent-export): Multiple tables and **table shards** can be exported simultaneously using [`--table-concurrency`]({% link molt/molt-fetch-commands-and-flags.md %}#table-concurrency) and [`--export-concurrency`]({% link molt/molt-fetch-commands-and-flags.md %}#export-concurrency), with large tables divided into shards for parallel processing.

-{% include molt/molt-install.md %}
+- [**Load into intermediate storage**](#define-intermediate-storage): Define whether data is written to cloud storage (Amazon S3, Google Cloud Storage, Azure Blob Storage), a local file server, or directly to CockroachDB memory. Intermediate storage enables [continuation after a MOLT Fetch failure](#continue-molt-fetch-after-interruption) by storing **continuation tokens**.

-### Docker usage
+### Data import phase

-{% include molt/molt-docker.md %}
+MOLT Fetch loads the exported data from intermediate storage to the target CockroachDB database.

-## Migration phases
+- [**`IMPORT INTO` vs. `COPY FROM`**](#import-into-vs-copy-from): This phase uses [`IMPORT INTO`]({% link {{site.current_cloud_version}}/import-into.md %}) (faster, tables offline during import) or [`COPY FROM`]({% link {{site.current_cloud_version}}/copy.md %}) (slower, tables remain queryable) to move data.

-MOLT Fetch operates in distinct phases to move data from source databases to CockroachDB. For details on available modes, refer to [Fetch mode](#fetch-mode).
+- [**Target table handling**](#handle-target-tables): Target tables can be automatically created, truncated, or left unchanged based on [`--table-handling`]({% link molt/molt-fetch-commands-and-flags.md %}#table-handling) settings.

-### Data export phase
+- [**Schema/table transformations**](#define-transformations): Use JSON to map computed columns from source to target, map partitioned tables to a single target table, rename tables on the target database, or rename database schemas.

-MOLT Fetch connects to the source database and exports table data to intermediate storage. Data is written to [cloud storage](#bucket-path) (Amazon S3, Google Cloud Storage, Azure Blob Storage), a [local file server](#local-path), or [directly to CockroachDB memory](#direct-copy). Multiple tables and table shards can be exported simultaneously using [`--table-concurrency`](#global-flags) and [`--export-concurrency`](#global-flags), with large tables divided into shards for parallel processing. For details, refer to:
+Refer to [the MOLT Fetch flags]({% link molt/molt-fetch-commands-and-flags.md %}#global-flags) to learn how to use any flag for the `molt fetch` command.
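+
+For example, a basic `data-load` run that performs both phases in one task might look like the following sketch (all values shown are illustrative placeholders):
+
+~~~ shell
+# Export from the source and import into CockroachDB in one task,
+# truncating any existing rows in the target tables first.
+molt fetch \
+--source 'postgres://migration_user:password@source-host:5432/source_db' \
+--target 'postgres://crdb_user:password@crdb-host:26257/defaultdb?sslmode=verify-full' \
+--bucket-path 's3://migration-bucket/fetch' \
+--table-handling truncate-if-exists
+~~~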
-- [Fetch mode](#fetch-mode) -- [Table sharding](#table-sharding) +## Run MOLT Fetch -### Data import phase +The following section describes how to use the [`molt fetch`]({% link molt/molt-fetch-commands-and-flags.md %}#commands) command and how to set its main [flags]({% link molt/molt-fetch-commands-and-flags.md %}#global-flags). -MOLT Fetch loads the exported data into the target CockroachDB database. The process uses [`IMPORT INTO`]({% link {{site.current_cloud_version}}/import-into.md %}) (faster, tables offline during import) or [`COPY FROM`]({% link {{site.current_cloud_version}}/copy.md %}) (slower, tables remain queryable) to move data. Data files are imported in configurable batches using [`--import-batch-size`](#global-flags), and target tables can be automatically created, truncated, or left unchanged based on [`--table-handling`](#global-flags) settings. For details, refer to: - -- [Data movement](#data-load-mode) -- [Target table handling](#target-table-handling) - -## Commands - -| Command | Usage | -|---------|---------------------------------------------------------------------------------------------------| -| `fetch` | Start the fetch task. This loads data from a source database to a target CockroachDB database. | - -### Subcommands - -| Command | Usage | -|--------------|----------------------------------------------------------------------| -| `tokens list` | List active [continuation tokens](#list-active-continuation-tokens). | - -## Flags - -### Global flags - -| Flag | Description | -|------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| `--source` | (Required) Connection string used to connect to the Oracle PDB (in a CDB/PDB architecture) or to a standalone database (non‑CDB). For details, refer to [Source and target databases](#source-and-target-databases). | -| `--source-cdb` | Connection string for the Oracle container database (CDB) when using a multitenant (CDB/PDB) architecture. Omit this flag on a non‑multitenant Oracle database. For details, refer to [Source and target databases](#source-and-target-databases). | -| `--target` | (Required) Connection string for the target database. For details, refer to [Source and target databases](#source-and-target-databases). | -| `--allow-tls-mode-disable` | Allow insecure connections to databases. Secure SSL/TLS connections should be used by default. This should be enabled **only** if secure SSL/TLS connections to the source or target database are not possible. | -| `--assume-role` | Service account to use for assume role authentication. `--use-implicit-auth` must be included. For example, `--assume-role='user-test@cluster-ephemeral.iam.gserviceaccount.com' --use-implicit-auth`. For details, refer to [Cloud Storage Authentication]({% link {{ site.current_cloud_version }}/cloud-storage-authentication.md %}). 
| -| `--bucket-path` | The path within the [cloud storage](#bucket-path) bucket where intermediate files are written (e.g., `'s3://bucket/path'` or `'gs://bucket/path'`). Only the URL path is used; query parameters (e.g., credentials) are ignored. To pass in query parameters, use the appropriate flags: `--assume-role`, `--import-region`, `--use-implicit-auth`. | -| `--case-sensitive` | Toggle case sensitivity when comparing table and column names on the source and target. To disable case sensitivity, set `--case-sensitive=false`. If `=` is **not** included (e.g., `--case-sensitive false`), the flag is interpreted as `--case-sensitive` (i.e., `--case-sensitive=true`).

**Default:** `false` | -| `--cleanup` | Whether to delete intermediate files after moving data using [cloud or local storage](#data-path). **Note:** Cleanup does not occur on [continuation](#fetch-continuation). | -| `--compression` | Compression method for data when using [`IMPORT INTO`](#data-load-mode) (`gzip`/`none`).

**Default:** `gzip` | -| `--continuation-file-name` | Restart fetch at the specified filename if the task encounters an error. `--fetch-id` must be specified. For details, see [Fetch continuation](#fetch-continuation). | -| `--continuation-token` | Restart fetch at a specific table, using the specified continuation token, if the task encounters an error. `--fetch-id` must be specified. For details, see [Fetch continuation](#fetch-continuation). | -| `--crdb-pts-duration` | The duration for which each timestamp used in data export from a CockroachDB source is protected from garbage collection. This ensures that the data snapshot remains consistent. For example, if set to `24h`, each timestamp is protected for 24 hours from the initiation of the export job. This duration is extended at regular intervals specified in `--crdb-pts-refresh-interval`.

**Default:** `24h0m0s` | -| `--crdb-pts-refresh-interval` | The frequency at which the protected timestamp's validity is extended. This interval maintains protection of the data snapshot until data export from a CockroachDB source is completed. For example, if set to `10m`, the protected timestamp's expiration will be extended by the duration specified in `--crdb-pts-duration` (e.g., `24h`) every 10 minutes while export is not complete.

**Default:** `10m0s` | -| `--direct-copy` | Enables [direct copy](#direct-copy), which copies data directly from source to target without using an intermediate store. | -| `--export-concurrency` | Number of shards to export at a time per table, each on a dedicated thread. This controls how many shards are created for each individual table during the [data export phase](#data-export-phase) and is distinct from `--table-concurrency`, which controls how many tables are processed simultaneously. The total number of concurrent threads is the product of `--export-concurrency` and `--table-concurrency`. Tables can be sharded with a range-based or stats-based mechanism. For details, refer to [Table sharding](#table-sharding).

**Default:** `4` | -| `--export-retry-max-attempts` | Maximum number of retry attempts for source export queries when connection failures occur. Only supported for PostgreSQL and CockroachDB sources.

**Default:** `3` | -| `--export-retry-max-duration` | Maximum total duration for retrying source export queries. If `0`, no time limit is enforced. Only supported for PostgreSQL and CockroachDB sources.

**Default:** `5m0s` | -| `--filter-path` | Path to a JSON file defining row-level filters for the [data import phase](#data-import-phase). Refer to [Selective data movement](#selective-data-movement). | -| `--fetch-id` | Restart fetch task corresponding to the specified ID. If `--continuation-file-name` or `--continuation-token` are not specified, fetch restarts for all failed tables. | -| `--flush-rows` | Number of rows before the source data is flushed to intermediate files. **Note:** If `--flush-size` is also specified, the fetch behavior is based on the flag whose criterion is met first. | -| `--flush-size` | Size (in bytes) before the source data is flushed to intermediate files. **Note:** If `--flush-rows` is also specified, the fetch behavior is based on the flag whose criterion is met first. | -| `--ignore-replication-check` | Skip querying for replication checkpoints such as `pg_current_wal_insert_lsn()` on PostgreSQL, `gtid_executed` on MySQL, and `CURRENT_SCN` on Oracle. This option is intended for use during bulk load migrations or when doing a one-time data export from a read replica. | -| `--import-batch-size` | The number of files to be imported at a time to the target database during the [data import phase](#data-import-phase). This applies only when using [`IMPORT INTO`](#data-load-mode) for data movement. **Note:** Increasing this value can improve the performance of full-scan queries on the target database shortly after fetch completes, but very high values are not recommended. If any individual file in the import batch fails, you must [retry](#fetch-continuation) the entire batch.

**Default:** `1000` | -| `--import-region` | The region of the [cloud storage](#bucket-path) bucket. This applies only to [Amazon S3 buckets](#bucket-path). Set this flag only if you need to specify an `AWS_REGION` explicitly when using [`IMPORT INTO`](#data-load-mode) for data movement. For example, `--import-region=ap-south-1`. | -| `--local-path` | The path within the [local file server](#local-path) where intermediate files are written (e.g., `data/migration/cockroach`). `--local-path-listen-addr` must be specified. | -| `--local-path-crdb-access-addr` | Address of a [local file server](#local-path) that is **publicly accessible**. This flag is only necessary if CockroachDB cannot reach the local address specified with `--local-path-listen-addr` (e.g., when moving data to a CockroachDB {{ site.data.products.cloud }} deployment). `--local-path` and `--local-path-listen-addr` must be specified.

**Default:** Value of `--local-path-listen-addr`. | -| `--local-path-listen-addr` | Write intermediate files to a [local file server](#local-path) at the specified address (e.g., `'localhost:3000'`). `--local-path` must be specified. | -| `--log-file` | Write messages to the specified log filename. If no filename is provided, messages write to `fetch-{datetime}.log`. If `"stdout"` is provided, messages write to `stdout`. | -| `--logging` | Level at which to log messages (`trace`/`debug`/`info`/`warn`/`error`/`fatal`/`panic`).

**Default:** `info` | -| `--metrics-listen-addr` | Address of the Prometheus metrics endpoint, which has the path `{address}/metrics`. For details on important metrics to monitor, refer to [Monitoring](#monitoring).

**Default:** `'127.0.0.1:3030'` | -| `--mode` | Configure the MOLT Fetch behavior: `data-load`, `export-only`, or `import-only`. For details, refer to [Fetch mode](#fetch-mode).

**Default:** `data-load` | -| `--non-interactive` | Run the fetch task without interactive prompts. This is recommended **only** when running `molt fetch` in an automated process (i.e., a job or continuous integration). | -| `--pglogical-replication-slot-name` | Name of a PostgreSQL replication slot that will be created before taking a snapshot of data. Must match the slot name specified with `--slotName` in the [MOLT Replicator command]({% link molt/molt-replicator.md %}#replication-checkpoints). For details, refer to [Load before replication](#load-before-replication). | -| `--pglogical-publication-and-slot-drop-and-recreate` | Drop the PostgreSQL publication and replication slot if they exist, then recreate them. Creates a publication named `molt_fetch` and the replication slot specified with `--pglogical-replication-slot-name`. For details, refer to [Load before replication](#load-before-replication).

**Default:** `false` | -| `--pprof-listen-addr` | Address of the pprof endpoint.

**Default:** `'127.0.0.1:3031'` | -| `--row-batch-size` | Number of rows per shard to export at a time. For details on sharding, refer to [Table sharding](#table-sharding). See also [Best practices](#best-practices).

**Default:** `100000` | -| `--schema-filter` | Move schemas that match a specified [regular expression](https://wikipedia.org/wiki/Regular_expression). Not used with MySQL sources. For Oracle sources, this filter is case-insensitive.

**Default:** `'.*'` | -| `--skip-pk-check` | Skip primary-key matching to allow data load when source or target tables have missing or mismatched primary keys. Disables sharding and bypasses `--export-concurrency` and `--row-batch-size` settings. Refer to [Skip primary key matching](#skip-primary-key-matching).

**Default:** `false` | -| `--table-concurrency` | Number of tables to export at a time. The number of concurrent threads is the product of `--export-concurrency` and `--table-concurrency`.

**Default:** `4` | -| `--table-exclusion-filter` | Exclude tables that match a specified [POSIX regular expression](https://wikipedia.org/wiki/Regular_expression).

This value **cannot** be set to `'.*'`, which would cause every table to be excluded.

**Default:** Empty string | -| `--table-filter` | Move tables that match a specified [POSIX regular expression](https://wikipedia.org/wiki/Regular_expression).

**Default:** `'.*'` | -| `--table-handling` | How tables are initialized on the target database (`none`/`drop-on-target-and-recreate`/`truncate-if-exists`). For details, see [Target table handling](#target-table-handling).

**Default:** `none` | -| `--transformations-file` | Path to a JSON file that defines transformations to be performed on the target schema during the fetch task. Refer to [Transformations](#transformations). | -| `--type-map-file` | Path to a JSON file that contains explicit type mappings for automatic schema creation, when enabled with `--table-handling drop-on-target-and-recreate`. For details on the JSON format and valid type mappings, see [type mapping](#type-mapping). | -| `--use-console-writer` | Use the console writer, which has cleaner log output but introduces more latency.

**Default:** `false` (log as structured JSON) | -| `--use-copy` | Use [`COPY FROM`](#data-load-mode) to move data. This makes tables queryable during data load, but is slower than using `IMPORT INTO`. For details, refer to [Data load mode](#data-load-mode). | -| `--use-implicit-auth` | Use [implicit authentication]({% link {{ site.current_cloud_version }}/cloud-storage-authentication.md %}) for [cloud storage](#bucket-path) URIs. | -| `--use-stats-based-sharding` | Enable statistics-based sharding for PostgreSQL sources. This allows sharding of tables with primary keys of any data type and can create more evenly distributed shards compared to the default numerical range sharding. Requires PostgreSQL 11+ and access to `pg_stats`. For details, refer to [Table sharding](#table-sharding). | - - -### `tokens list` flags - -| Flag | Description | -|-----------------------|---------------------------------------------------------------------------------------------------------------------------------------------| -| `--conn-string` | (Required) Connection string for the target database. For details, see [List active continuation tokens](#list-active-continuation-tokens). | -| `-n`, `--num-results` | Number of results to return. | - - -## Usage - -The following sections describe how to use the `molt fetch` [flags](#flags). - -### Source and target databases +### Specify source and target databases {{site.data.alerts.callout_success}} -Follow the recommendations in [Connection security](#connection-security). +Follow the recommendations in [Connection security]({% link molt/molt-fetch-best-practices.md %}#connection-security). {{site.data.alerts.end}} -`--source` specifies the connection string of the source database. +[`--source`]({% link molt/molt-fetch-commands-and-flags.md %}#source) specifies the connection string of the source database. PostgreSQL or CockroachDB connection string: @@ -181,7 +78,7 @@ Oracle connection string: --source 'oracle://{username}:{password}@{host}:{port}/{service_name}' ~~~ -For Oracle Multitenant databases, `--source-cdb` specifies the container database (CDB) connection. `--source` specifies the pluggable database (PDB): +For Oracle Multitenant databases, [`--source-cdb`]({% link molt/molt-fetch-commands-and-flags.md %}#source-cdb) specifies the container database (CDB) connection. [`--source`]({% link molt/molt-fetch-commands-and-flags.md %}#source) specifies the pluggable database (PDB): {% include_cached copy-clipboard.html %} ~~~ @@ -189,16 +86,16 @@ For Oracle Multitenant databases, `--source-cdb` specifies the container databas --source-cdb 'oracle://{username}:{password}@{host}:{port}/{cdb_service_name}' ~~~ -`--target` specifies the [CockroachDB connection string]({% link {{site.current_cloud_version}}/connection-parameters.md %}#connect-using-a-url): +[`--target`]({% link molt/molt-fetch-commands-and-flags.md %}#target) specifies the [CockroachDB connection string]({% link {{site.current_cloud_version}}/connection-parameters.md %}#connect-using-a-url): {% include_cached copy-clipboard.html %} ~~~ --target 'postgresql://{username}:{password}@{host}:{port}/{database}' ~~~ -### Fetch mode +### Define fetch mode -`--mode` specifies the MOLT Fetch behavior. +[`--mode`]({% link molt/molt-fetch-commands-and-flags.md %}#mode) specifies the MOLT Fetch behavior. 
`data-load` (default) instructs MOLT Fetch to load the source data into CockroachDB: @@ -221,29 +118,108 @@ For Oracle Multitenant databases, `--source-cdb` specifies the container databas --mode import-only ~~~ -### Data load mode +### Select data to migrate -MOLT Fetch can use either [`IMPORT INTO`]({% link {{site.current_cloud_version}}/import-into.md %}) or [`COPY FROM`]({% link {{site.current_cloud_version}}/copy.md %}) to load data into CockroachDB. +By default, MOLT Fetch moves all data from the [`--source`]({% link molt/molt-fetch-commands-and-flags.md %}#source) database to CockroachDB. Use the following flags to move a subset of data. -By default, MOLT Fetch uses `IMPORT INTO`: +#### Schema and table selection -- `IMPORT INTO` achieves the highest throughput, but [requires taking the CockroachDB tables **offline**]({% link {{site.current_cloud_version}}/import-into.md %}#considerations) to achieve its import speed. Tables are taken back online once an [import job]({% link {{site.current_cloud_version}}/import-into.md %}#view-and-control-import-jobs) completes successfully. See [Best practices](#best-practices). -- `IMPORT INTO` supports compression using the `--compression` flag, which reduces the amount of storage used. +[`--schema-filter`]({% link molt/molt-fetch-commands-and-flags.md %}#schema-filter) specifies a range of schema objects to move to CockroachDB, formatted as a POSIX regex string. For example, to move every table in the source database's `migration_schema` schema: -`--use-copy` configures MOLT Fetch to use `COPY FROM`: +{% include_cached copy-clipboard.html %} +~~~ +--schema-filter 'migration_schema' +~~~ -- `COPY FROM` enables your tables to remain online and accessible. However, it is slower than using [`IMPORT INTO`]({% link {{site.current_cloud_version}}/import-into.md %}). -- `COPY FROM` does not support compression. +{{site.data.alerts.callout_info}} +[`--schema-filter`]({% link molt/molt-fetch-commands-and-flags.md %}#schema-filter) does not apply to MySQL sources because MySQL tables belong directly to the database specified in the connection string, not to a separate schema. +{{site.data.alerts.end}} + +[`--table-filter`]({% link molt/molt-fetch-commands-and-flags.md %}#table-filter) and [`--table-exclusion-filter`]({% link molt/molt-fetch-commands-and-flags.md %}#table-exclusion-filter) specify tables to include and exclude from the migration, respectively, formatted as POSIX regex strings. For example, to move every source table that has "user" in the table name and exclude every source table that has "temp" in the table name: + +{% include_cached copy-clipboard.html %} +~~~ +--table-filter '.*user.*' --table-exclusion-filter '.*temp.*' +~~~ + +#### Row-level filtering + +Use [`--filter-path`]({% link molt/molt-fetch-commands-and-flags.md %}#filter-path) to specify the path to a JSON file that defines row-level filtering for data load. This enables you to move a subset of data in a table, rather than all data in the table. To apply row-level filters during replication, use [MOLT Replicator]({% link molt/molt-replicator.md %}) with userscripts. + +{% include_cached copy-clipboard.html %} +~~~ +--filter-path 'data-filter.json' +~~~ + +The JSON file should contain one or more entries in `filters`, each with a `resource_specifier` (`schema` and `table`) and a SQL expression `expr`. 
For example, the following file exports only rows from `migration_schema.t1` where `v > 100`: + +~~~ json +{ + "filters": [ + { + "resource_specifier": { + "schema": "migration_schema", + "table": "t1" + }, + "expr": "v > 100" + } + ] +} +~~~ + +`expr` is case-sensitive and must be valid in your source dialect. For example, when using Oracle as the source, quote all identifiers and escape embedded quotes: + +~~~ json +{ + "filters": [ + { + "resource_specifier": { + "schema": "C##FETCHORACLEFILTERTEST", + "table": "FILTERTBL" + }, + "expr": "ABS(\"X\") > 10 AND CEIL(\"X\") < 100 AND FLOOR(\"X\") > 0 AND ROUND(\"X\", 2) < 100.00 AND TRUNC(\"X\", 0) > 0 AND MOD(\"X\", 2) = 0 AND FLOOR(\"X\" / 3) > 1" + } + ] +} +~~~ {{site.data.alerts.callout_info}} -`COPY FROM` is also used for [direct copy](#direct-copy). +If the expression references columns that are not indexed, MOLT Fetch will emit a warning like: `filter expression 'v > 100' contains column 'v' which is not indexed. This may lead to performance issues.` {{site.data.alerts.end}} -### Table sharding +{% comment %} +#### `--filter-path` userscript for replication + +To use `--filter-path` with replication, create and save a TypeScript userscript (e.g., `filter-script.ts`). The following script ensures that only rows where `v > 100` are replicated to `defaultdb.migration_schema.t1`: + +{% include_cached copy-clipboard.html %} +~~~ ts +import * as api from "replicator@v1"; +function disp(doc, meta) { + if (Number(doc.v) > 100) { + return { "defaultdb.migration_schema.t1" : [ doc ] }; + } +} +// Always put target schema. +api.configureSource("defaultdb.migration_schema", { + deletesTo: disp, + dispatch: disp, +}); +~~~ + +Apply the userscript with the `--userscript` replication flag: + +{% include_cached copy-clipboard.html %} +~~~ +--userscript 'filter-script.ts' +~~~ +{% endcomment %} + +### Shard tables for concurrent export During the [data export phase](#data-export-phase), MOLT Fetch can divide large tables into multiple shards for concurrent export. -To control the number of shards created per table, use the `--export-concurrency` flag. For example: +To control the number of shards created per table, use the [`--export-concurrency`]({% link molt/molt-fetch-commands-and-flags.md %}#export-concurrency) flag. For example: {% include_cached copy-clipboard.html %} ~~~ @@ -251,14 +227,14 @@ To control the number of shards created per table, use the `--export-concurrency ~~~ {{site.data.alerts.callout_success}} -For performance considerations with concurrency settings, refer to [Best practices](#best-practices). +For performance considerations with concurrency settings, refer to [Best practices]({% link molt/molt-fetch-best-practices.md %}). {{site.data.alerts.end}} Two sharding mechanisms are available: - **Range-based sharding (default):** Tables are divided based on numerical ranges found in primary key values. Only tables with [`INT`]({% link {{ site.current_cloud_version }}/int.md %}), [`FLOAT`]({% link {{ site.current_cloud_version }}/float.md %}), or [`UUID`]({% link {{ site.current_cloud_version }}/uuid.md %}) primary keys can use range-based sharding. Tables with other primary key data types export as a single shard. -- **Stats-based sharding (PostgreSQL only):** Enable with [`--use-stats-based-sharding`](#global-flags) for PostgreSQL 11+ sources. Tables are divided by analyzing the [`pg_stats`](https://www.postgresql.org/docs/current/view-pg-stats.htm) view to create more evenly distributed shards, up to a maximum of 200 shards. 
Primary keys of any data type are supported. +- **Stats-based sharding (PostgreSQL only):** Enable with [`--use-stats-based-sharding`]({% link molt/molt-fetch-commands-and-flags.md %}#use-stats-based-sharding) for PostgreSQL 11+ sources. Tables are divided by analyzing the [`pg_stats`](https://www.postgresql.org/docs/current/view-pg-stats.html) view to create more evenly distributed shards, up to a maximum of 200 shards. Primary keys of any data type are supported. Stats-based sharding requires that the user has `SELECT` permissions on source tables and on each table's `pg_stats` view. The latter permission is automatically granted to users that can read the table. @@ -280,7 +256,7 @@ Large tables may take time to analyze, but `ANALYZE` can run in the background. Migration without running `ANALYZE` will still work, but shard distribution may be less even. {{site.data.alerts.end}} -When using `--use-stats-based-sharding`, monitor the log output for each table you want to migrate. +When using [`--use-stats-based-sharding`]({% link molt/molt-fetch-commands-and-flags.md %}#use-stats-based-sharding), monitor the log output for each table you want to migrate. If stats-based sharding is successful on a table, MOLT logs the following `INFO` message: @@ -294,25 +270,25 @@ If stats-based sharding fails on a table, MOLT logs the following `WARNING` mess Warning: failed to shard table {table_name} using stats based sharding: {reason_for_failure}, falling back to non stats based sharding ~~~ -The number of shards is dependent on the number of distinct values in the first primary key column of the table to be migrated. If this is different from the number of shards requested with `--export-concurrency`, MOLT logs the following `WARNING` and continues with the migration: +The number of shards depends on the number of distinct values in the first primary key column of the table to be migrated. If this is different from the number of shards requested with [`--export-concurrency`]({% link molt/molt-fetch-commands-and-flags.md %}#export-concurrency), MOLT logs the following `WARNING` and continues with the migration: ~~~ number of shards formed: {num_shards_formed} is not equal to number of shards requested: {num_shards_requested} for table {table_name} ~~~ -Because stats-based sharding analyzes the entire table, running `--use-stats-based-sharding` with [`--filter-path`](#global-flags) (refer to [Selective data movement](#selective-data-movement)) will cause imbalanced shards to form. +Because stats-based sharding analyzes the entire table, running [`--use-stats-based-sharding`]({% link molt/molt-fetch-commands-and-flags.md %}#use-stats-based-sharding) with [`--filter-path`]({% link molt/molt-fetch-commands-and-flags.md %}#filter-path) (refer to [Select data to migrate](#select-data-to-migrate)) will cause imbalanced shards to form. -### Data path +### Define intermediate storage MOLT Fetch can move the source data to CockroachDB via [cloud storage](#bucket-path), a [local file server](#local-path), or [directly](#direct-copy) without an intermediate store. #### Bucket path {{site.data.alerts.callout_success}} -Only the path specified in `--bucket-path` is used. Query parameters, such as credentials, are ignored. To authenticate cloud storage, follow the steps in [Secure cloud storage](#cloud-storage-security). +Only the path specified in [`--bucket-path`]({% link molt/molt-fetch-commands-and-flags.md %}#bucket-path) is used. Query parameters, such as credentials, are ignored. 
To authenticate cloud storage, follow the steps in [Secure cloud storage]({% link molt/molt-fetch-best-practices.md %}#cloud-storage-security). {{site.data.alerts.end}} -`--bucket-path` instructs MOLT Fetch to write intermediate files to a path within [Google Cloud Storage](https://cloud.google.com/storage/docs/buckets), [Amazon S3](https://aws.amazon.com/s3/), or [Azure Blob Storage](https://azure.microsoft.com/en-us/products/storage/blobs) to which you have the necessary permissions. Use additional [flags](#global-flags), shown in the following examples, to specify authentication or region parameters as required for bucket access. +[`--bucket-path`]({% link molt/molt-fetch-commands-and-flags.md %}#bucket-path) instructs MOLT Fetch to write intermediate files to a path within [Google Cloud Storage](https://cloud.google.com/storage/docs/buckets), [Amazon S3](https://aws.amazon.com/s3/), or [Azure Blob Storage](https://azure.microsoft.com/en-us/products/storage/blobs) to which you have the necessary permissions. Use additional [flags]({% link molt/molt-fetch-commands-and-flags.md %}#global-flags), shown in the following examples, to specify authentication or region parameters as required for bucket access. Connect to a Google Cloud Storage bucket with [implicit authentication]({% link {{ site.current_cloud_version }}/cloud-storage-authentication.md %}#google-cloud-storage-implicit) and [assume role]({% link {{ site.current_cloud_version }}/cloud-storage-authentication.md %}#set-up-google-cloud-storage-assume-role): @@ -332,7 +308,7 @@ Connect to an Amazon S3 bucket and explicitly specify the `ap_south-1` region: ~~~ {{site.data.alerts.callout_info}} -When `--import-region` is set, `IMPORT INTO` must be used for [data movement](#data-load-mode). +When [`--import-region`]({% link molt/molt-fetch-commands-and-flags.md %}#import-region) is set, `IMPORT INTO` must be used for [data movement](#import-into-vs-copy-from). {{site.data.alerts.end}} Connect to an Azure Blob Storage container with [implicit authentication]({% link {{ site.current_cloud_version }}/cloud-storage-authentication.md %}?filters=azure#azure-blob-storage-implicit-authentication): @@ -345,7 +321,7 @@ Connect to an Azure Blob Storage container with [implicit authentication]({% lin #### Local path -`--local-path` instructs MOLT Fetch to write intermediate files to a path within a [local file server]({% link {{site.current_cloud_version}}/use-a-local-file-server.md %}). `local-path-listen-addr` specifies the address of the local file server. For example: +[`--local-path`]({% link molt/molt-fetch-commands-and-flags.md %}#local-path) instructs MOLT Fetch to write intermediate files to a path within a [local file server]({% link {{site.current_cloud_version}}/use-a-local-file-server.md %}). [`--local-path-listen-addr`]({% link molt/molt-fetch-commands-and-flags.md %}#local-path-listen-addr) specifies the address of the local file server. For example: {% include_cached copy-clipboard.html %} ~~~ @@ -353,9 +329,9 @@ Connect to an Azure Blob Storage container with [implicit authentication]({% lin --local-path-listen-addr 'localhost:3000' ~~~ -In some cases, CockroachDB will not be able to use the local address specified by `--local-path-listen-addr`. This will depend on where CockroachDB is deployed, the runtime OS, and the source dialect. +In some cases, CockroachDB will not be able to use the local address specified by [`--local-path-listen-addr`]({% link molt/molt-fetch-commands-and-flags.md %}#local-path-listen-addr). 
This will depend on where CockroachDB is deployed, the runtime OS, and the source dialect. -For example, if you are migrating to CockroachDB {{ site.data.products.cloud }}, such that the {{ site.data.products.cloud }} cluster is in a different physical location than the machine running `molt fetch`, then CockroachDB cannot reach an address such as `localhost:3000`. In these situations, use `--local-path-crdb-access-addr` to specify an address for the local file server that is **publicly accessible**. For example: +For example, if you are migrating to CockroachDB {{ site.data.products.cloud }}, such that the {{ site.data.products.cloud }} cluster is in a different physical location than the machine running `molt fetch`, then CockroachDB cannot reach an address such as `localhost:3000`. In these situations, use [`--local-path-crdb-access-addr`]({% link molt/molt-fetch-commands-and-flags.md %}#local-path-crdb-access-addr) to specify an address for the local file server that is **publicly accessible**. For example: {% include_cached copy-clipboard.html %} ~~~ @@ -370,115 +346,38 @@ For example, if you are migrating to CockroachDB {{ site.data.products.cloud }}, #### Direct copy -`--direct-copy` specifies that MOLT Fetch should use `COPY FROM` to move the source data directly to CockroachDB without an intermediate store: +[`--direct-copy`]({% link molt/molt-fetch-commands-and-flags.md %}#direct-copy) specifies that MOLT Fetch should use `COPY FROM` to move the source data directly to CockroachDB without an intermediate store: -- Because the data is held in memory, the machine must have sufficient RAM for the data currently in flight: +- Because the data is held in memory, the machine must have sufficient RAM for the data currently in flight: ~~~ average size of each row * --row-batch-size * --export-concurrency * --table-concurrency ~~~ -- Direct copy does not support compression or [continuation](#fetch-continuation). -- The [`--use-copy`](#data-load-mode) flag is redundant with `--direct-copy`. - -### Schema and table selection - -By default, MOLT Fetch moves all data from the [`--source`](#source-and-target-databases) database to CockroachDB. Use the following flags to move a subset of data. - -`--schema-filter` specifies a range of schema objects to move to CockroachDB, formatted as a POSIX regex string. For example, to move every table in the source database's `migration_schema` schema: - -{% include_cached copy-clipboard.html %} -~~~ ---schema-filter 'migration_schema' -~~~ - -{{site.data.alerts.callout_info}} -`--schema-filter` does not apply to MySQL sources because MySQL tables belong directly to the database specified in the connection string, not to a separate schema. -{{site.data.alerts.end}} - -`--table-filter` and `--table-exclusion-filter` specify tables to include and exclude from the migration, respectively, formatted as POSIX regex strings. For example, to move every source table that has "user" in the table name and exclude every source table that has "temp" in the table name: - -{% include_cached copy-clipboard.html %} -~~~ ---table-filter '.*user.*' --table-exclusion-filter '.*temp.*' -~~~ - -### Selective data movement +- Direct copy does not support compression or [continuation](#continue-molt-fetch-after-interruption). +- The [`--use-copy`](#import-into-vs-copy-from) flag is redundant with [`--direct-copy`]({% link molt/molt-fetch-commands-and-flags.md %}#direct-copy). -Use `--filter-path` to specify the path to a JSON file that defines row-level filtering for data load. 
This enables you to move a subset of data in a table, rather than all data in the table. To apply row-level filters during replication, use [MOLT Replicator]({% link molt/molt-replicator.md %}) with userscripts. +### `IMPORT INTO` vs. `COPY FROM` -{% include_cached copy-clipboard.html %} -~~~ ---filter-path 'data-filter.json' -~~~ +MOLT Fetch can use either [`IMPORT INTO`]({% link {{site.current_cloud_version}}/import-into.md %}) or [`COPY FROM`]({% link {{site.current_cloud_version}}/copy.md %}) to load data into CockroachDB. -The JSON file should contain one or more entries in `filters`, each with a `resource_specifier` (`schema` and `table`) and a SQL expression `expr`. For example, the following example exports only rows from `migration_schema.t1` where `v > 100`: +By default, MOLT Fetch uses `IMPORT INTO`: -~~~ json -{ - "filters": [ - { - "resource_specifier": { - "schema": "migration_schema", - "table": "t1" - }, - "expr": "v > 100" - } - ] -} -~~~ +- `IMPORT INTO` achieves the highest throughput, but [requires taking the CockroachDB tables **offline**]({% link {{site.current_cloud_version}}/import-into.md %}#considerations) to achieve its import speed. Tables are taken back online once an [import job]({% link {{site.current_cloud_version}}/import-into.md %}#view-and-control-import-jobs) completes successfully. See [Best practices]({% link molt/molt-fetch-best-practices.md %}). +- `IMPORT INTO` supports compression using the [`--compression`]({% link molt/molt-fetch-commands-and-flags.md %}#compression) flag, which reduces the amount of storage used. -`expr` is case-sensitive and must be valid in your source dialect. For example, when using Oracle as the source, quote all identifiers and escape embedded quotes: +[`--use-copy`]({% link molt/molt-fetch-commands-and-flags.md %}#use-copy) configures MOLT Fetch to use `COPY FROM`: -~~~ json -{ - "filters": [ - { - "resource_specifier": { - "schema": "C##FETCHORACLEFILTERTEST", - "table": "FILTERTBL" - }, - "expr": "ABS(\"X\") > 10 AND CEIL(\"X\") < 100 AND FLOOR(\"X\") > 0 AND ROUND(\"X\", 2) < 100.00 AND TRUNC(\"X\", 0) > 0 AND MOD(\"X\", 2) = 0 AND FLOOR(\"X\" / 3) > 1" - } - ] -} -~~~ +- `COPY FROM` enables your tables to remain online and accessible. However, it is slower than using [`IMPORT INTO`]({% link {{site.current_cloud_version}}/import-into.md %}). +- `COPY FROM` does not support compression. {{site.data.alerts.callout_info}} -If the expression references columns that are not indexed, MOLT Fetch will emit a warning like: `filter expression ‘v > 100' contains column ‘v' which is not indexed. This may lead to performance issues.` +`COPY FROM` is also used for [direct copy](#direct-copy). {{site.data.alerts.end}} -{% comment %} -#### `--filter-path` userscript for replication - -To use `--filter-path` with replication, create and save a TypeScript userscript (e.g., `filter-script.ts`). The following script ensures that only rows where `v > 100` are replicated to `defaultdb.migration_schema.t1`: - -{% include_cached copy-clipboard.html %} -~~~ ts -import * as api from "replicator@v1"; -function disp(doc, meta) { - if (Number(doc.v) > 100) { - return { "defaultdb.migration_schema.t1" : [ doc ] }; - } -} -// Always put target schema. 
-api.configureSource("defaultdb.migration_schema", { - deletesTo: disp, - dispatch: disp, -}); -~~~ - -Apply the userscript with the `--userscript` replication flag: - -{% include_cached copy-clipboard.html %} -~~~ ---userscript 'filter-script.ts' -~~~ -{% endcomment %} +### Handle target tables -### Target table handling - -`--table-handling` defines how MOLT Fetch loads data on the CockroachDB tables that [match the selection](#schema-and-table-selection). +[`--table-handling`]({% link molt/molt-fetch-commands-and-flags.md %}#table-handling) defines how MOLT Fetch loads data on the CockroachDB tables that [match the selection](#schema-and-table-selection). To load the data without changing the existing data in the tables, use `none`: @@ -505,21 +404,21 @@ When using the `drop-on-target-and-recreate` option, MOLT Fetch creates a new Co #### Mismatch handling -If either [`none`](#target-table-handling) or [`truncate-if-exists`](#target-table-handling) is set, `molt fetch` loads data into the existing tables on the target CockroachDB database. If the target schema mismatches the source schema, `molt fetch` will exit early in certain cases, and will need to be re-run from the beginning. For details, refer to [Fetch exits early due to mismatches](#fetch-exits-early-due-to-mismatches). +If either [`none`](#handle-target-tables) or [`truncate-if-exists`](#handle-target-tables) is set, `molt fetch` loads data into the existing tables on the target CockroachDB database. If the target schema mismatches the source schema, `molt fetch` will exit early in certain cases, and will need to be re-run from the beginning. For details, refer to [Fetch exits early due to mismatches]({% link molt/molt-fetch-troubleshooting.md %}#fetch-exits-early-due-to-mismatches). {{site.data.alerts.callout_info}} -This does not apply when [`drop-on-target-and-recreate`](#target-table-handling) is specified, since this option automatically creates a compatible CockroachDB schema. +This does not apply when [`drop-on-target-and-recreate`](#handle-target-tables) is specified, since this option automatically creates a compatible CockroachDB schema. {{site.data.alerts.end}} #### Skip primary key matching -`--skip-pk-check` removes the [requirement that source and target tables share matching primary keys](#fetch-exits-early-due-to-mismatches) for data load. When this flag is set: +[`--skip-pk-check`]({% link molt/molt-fetch-commands-and-flags.md %}#skip-pk-check) removes the [requirement that source and target tables share matching primary keys]({% link molt/molt-fetch-troubleshooting.md %}#fetch-exits-early-due-to-mismatches) for data load. When this flag is set: - The data load proceeds even if the source or target table lacks a primary key, or if their primary key columns do not match. -- [Table sharding](#table-sharding) is disabled. Each table is exported in a single batch within one shard, bypassing `--export-concurrency` and `--row-batch-size`. As a result, memory usage and execution time may increase due to full table scans. +- [Table sharding](#shard-tables-for-concurrent-export) is disabled. Each table is exported in a single batch within one shard, bypassing [`--export-concurrency`]({% link molt/molt-fetch-commands-and-flags.md %}#export-concurrency) and [`--row-batch-size`]({% link molt/molt-fetch-commands-and-flags.md %}#row-batch-size). As a result, memory usage and execution time may increase due to full table scans. 
- If the source table contains duplicate rows but the target has [`PRIMARY KEY`]({% link {{ site.current_cloud_version }}/primary-key.md %}) or [`UNIQUE`]({% link {{ site.current_cloud_version }}/unique.md %}) constraints, duplicate rows are deduplicated during import. -When `--skip-pk-check` is set, all tables are treated as if they lack a primary key, and are thus exported in a single unsharded batch. To avoid performance issues, use this flag with `--table-filter` to target only tables **without** a primary key. +When [`--skip-pk-check`]({% link molt/molt-fetch-commands-and-flags.md %}#skip-pk-check) is set, all tables are treated as if they lack a primary key, and are thus exported in a single unsharded batch. To avoid performance issues, use this flag with [`--table-filter`]({% link molt/molt-fetch-commands-and-flags.md %}#table-filter) to target only tables **without** a primary key. For example: @@ -531,7 +430,7 @@ molt fetch \ --skip-pk-check ~~~ -Example log output when `--skip-pk-check` is enabled: +Example log output when [`--skip-pk-check`]({% link molt/molt-fetch-commands-and-flags.md %}#skip-pk-check) is enabled: ~~~json {"level":"info","message":"sharding is skipped for table public.nopktbl - flag skip-pk-check is specified and thus no PK for source table is specified"} @@ -539,65 +438,9 @@ Example log output when `--skip-pk-check` is enabled: #### Type mapping -If [`drop-on-target-and-recreate`](#target-table-handling) is set, MOLT Fetch automatically creates a CockroachDB schema that is compatible with the source data. The column types are determined as follows: - -- PostgreSQL types are mapped to existing CockroachDB [types]({% link {{site.current_cloud_version}}/data-types.md %}) that have the same [`OID`]({% link {{site.current_cloud_version}}/oid.md %}). 
-- The following MySQL types are mapped to corresponding CockroachDB types: - - | MySQL type | CockroachDB type | Notes | - |-----------------------------------------------------|-------------------------------------------------------------------------------------------|--------------------------------------------------------------| - | `CHAR`, `CHARACTER`, `VARCHAR`, `NCHAR`, `NVARCHAR` | [`VARCHAR`]({% link {{site.current_cloud_version}}/string.md %}) | Varying-length string; raises warning if BYTE semantics used | - | `TINYTEXT`, `TEXT`, `MEDIUMTEXT`, `LONGTEXT` | [`STRING`]({% link {{site.current_cloud_version}}/string.md %}) | Unlimited-length string | - | `GEOMETRY` | [`GEOMETRY`]({% link {{site.current_cloud_version}}/architecture/glossary.md %}#geometry) | Spatial type (PostGIS-style) | - | `LINESTRING` | [`LINESTRING`]({% link {{site.current_cloud_version}}/linestring.md %}) | Spatial type (PostGIS-style) | - | `POINT` | [`POINT`]({% link {{site.current_cloud_version}}/point.md %}) | Spatial type (PostGIS-style) | - | `POLYGON` | [`POLYGON`]({% link {{site.current_cloud_version}}/polygon.md %}) | Spatial type (PostGIS-style) | - | `MULTIPOINT` | [`MULTIPOINT`]({% link {{site.current_cloud_version}}/multipoint.md %}) | Spatial type (PostGIS-style) | - | `MULTILINESTRING` | [`MULTILINESTRING`]({% link {{site.current_cloud_version}}/multilinestring.md %}) | Spatial type (PostGIS-style) | - | `MULTIPOLYGON` | [`MULTIPOLYGON`]({% link {{site.current_cloud_version}}/multipolygon.md %}) | Spatial type (PostGIS-style) | - | `GEOMETRYCOLLECTION`, `GEOMCOLLECTION` | [`GEOMETRYCOLLECTION`]({% link {{site.current_cloud_version}}/geometrycollection.md %}) | Spatial type (PostGIS-style) | - | `JSON` | [`JSONB`]({% link {{site.current_cloud_version}}/jsonb.md %}) | CRDB's native JSON format | - | `TINYINT`, `INT1` | [`INT2`]({% link {{site.current_cloud_version}}/int.md %}) | 2-byte integer | - | `BLOB` | [`BYTES`]({% link {{site.current_cloud_version}}/bytes.md %}) | Binary data | - | `SMALLINT`, `INT2` | [`INT2`]({% link {{site.current_cloud_version}}/int.md %}) | 2-byte integer | - | `MEDIUMINT`, `INT`, `INTEGER`, `INT4` | [`INT4`]({% link {{site.current_cloud_version}}/int.md %}) | 4-byte integer | - | `BIGINT`, `INT8` | [`INT`]({% link {{site.current_cloud_version}}/int.md %}) | 8-byte integer | - | `FLOAT` | [`FLOAT4`]({% link {{site.current_cloud_version}}/float.md %}) | 32-bit float | - | `DOUBLE` | [`FLOAT`]({% link {{site.current_cloud_version}}/float.md %}) | 64-bit float | - | `DECIMAL`, `NUMERIC`, `REAL` | [`DECIMAL`]({% link {{site.current_cloud_version}}/decimal.md %}) | Validates scale ≤ precision; warns if precision > 19 | - | `BINARY`, `VARBINARY` | [`BYTES`]({% link {{site.current_cloud_version}}/bytes.md %}) | Binary data | - | `DATETIME` | [`TIMESTAMP`]({% link {{site.current_cloud_version}}/timestamp.md %}) | Date and time (no time zone) | - | `TIMESTAMP` | [`TIMESTAMPTZ`]({% link {{site.current_cloud_version}}/timestamp.md %}) | Date and time with time zone | - | `TIME` | [`TIME`]({% link {{site.current_cloud_version}}/time.md %}) | Time of day (no date) | - | `BIT` | [`VARBIT`]({% link {{site.current_cloud_version}}/bit.md %}) | Variable-length bit array | - | `DATE` | [`DATE`]({% link {{site.current_cloud_version}}/date.md %}) | Date only (no time) | - | `TINYBLOB`, `MEDIUMBLOB`, `LONGBLOB` | [`BYTES`]({% link {{site.current_cloud_version}}/bytes.md %}) | Binary data | - | `BOOL`, `BOOLEAN` | [`BOOL`]({% link {{site.current_cloud_version}}/bool.md %}) | Boolean | - -- 
The following Oracle types are mapped to CockroachDB types: - - | Oracle type(s) | CockroachDB type | Notes | - |---------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------| - | `NCHAR`, `CHAR`, `CHARACTER` | [`CHAR`]({% link {{site.current_cloud_version}}/string.md %})(n) or [`CHAR`]({% link {{site.current_cloud_version}}/string.md %}) | Fixed-length character; falls back to unbounded if length not specified | - | `VARCHAR`, `VARCHAR2`, `NVARCHAR2` | [`VARCHAR`]({% link {{site.current_cloud_version}}/string.md %})(n) or [`VARCHAR`]({% link {{site.current_cloud_version}}/string.md %}) | Varying-length string; raises warning if BYTE semantics used | - | `STRING` | [`STRING`]({% link {{site.current_cloud_version}}/string.md %}) | Unlimited-length string | - | `SMALLINT` | [`INT2`]({% link {{site.current_cloud_version}}/int.md %}) | 2-byte integer | - | `INTEGER`, `INT`, `SIMPLE_INTEGER` | [`INT4`]({% link {{site.current_cloud_version}}/int.md %}) | 4-byte integer | - | `LONG` | [`INT8`]({% link {{site.current_cloud_version}}/int.md %}) | 8-byte integer | - | `FLOAT`, `BINARY_FLOAT`, `REAL` | [`FLOAT4`]({% link {{site.current_cloud_version}}/float.md %}) | 32-bit float | - | `DOUBLE`, `BINARY_DOUBLE` | [`FLOAT8`]({% link {{site.current_cloud_version}}/float.md %}) | 64-bit float | - | `DEC`, `NUMBER`, `DECIMAL`, `NUMERIC` | [`DECIMAL`]({% link {{site.current_cloud_version}}/decimal.md %})(p, s) or [`DECIMAL`]({% link {{site.current_cloud_version}}/decimal.md %}) | Validates scale ≤ precision; warns if precision > 19 | - | `DATE` | [`DATE`]({% link {{site.current_cloud_version}}/date.md %}) | Date only (no time) | - | `BLOB`, `RAW`, `LONG RAW` | [`BYTES`]({% link {{site.current_cloud_version}}/bytes.md %}) | Binary data | - | `JSON` | [`JSONB`]({% link {{site.current_cloud_version}}/jsonb.md %}) | CRDB's native JSON format | - | `CLOB`, `NCLOB` | [`STRING`]({% link {{site.current_cloud_version}}/string.md %}) | Treated as large text | - | `BOOLEAN` | [`BOOL`]({% link {{site.current_cloud_version}}/bool.md %}) | Boolean | - | `TIMESTAMP` | [`TIMESTAMP`]({% link {{site.current_cloud_version}}/timestamp.md %}) or [`TIMESTAMPTZ`]({% link {{site.current_cloud_version}}/timestamp.md %}) | If `WITH TIME ZONE` → `TIMESTAMPTZ`, else `TIMESTAMP` | - | `ROWID`, `UROWID` | [`STRING`]({% link {{site.current_cloud_version}}/string.md %}) | Treated as opaque identifier | - | `SDO_GEOMETRY` | [`GEOMETRY`]({% link {{site.current_cloud_version}}/architecture/glossary.md %}#geometry) | Spatial type (PostGIS-style) | - | `XMLTYPE` | [`STRING`]({% link {{site.current_cloud_version}}/string.md %}) | Stored as text | - -- To override the default mappings for automatic schema creation, you can map source to target CockroachDB types explicitly. These are defined in the JSON file indicated by the `--type-map-file` flag. The allowable custom mappings are valid CockroachDB aliases, casts, and the following mappings specific to MOLT Fetch and [Verify]({% link molt/molt-verify.md %}): +If [`drop-on-target-and-recreate`](#handle-target-tables) is set, MOLT Fetch automatically creates a CockroachDB schema that is compatible with the source data. The column types are determined by [MOLT's default type mappings]({% link molt/molt-type-mapping.md %}). 
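+For example, here is a minimal sketch of a fetch invocation that lets MOLT Fetch create the target tables automatically; the connection strings and bucket path are placeholder values:
+
+{% include_cached copy-clipboard.html %}
+~~~ shell
+# Sketch only: connection strings and bucket path are placeholders.
+molt fetch \
+--source 'postgresql://{username}:{password}@{host}:{port}/{database}' \
+--target 'postgresql://{username}:{password}@{host}:{port}/{database}' \
+--bucket-path 's3://migration-bucket' \
+--table-handling drop-on-target-and-recreate
+~~~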
+ +- To override the default mappings for automatic schema creation, you can map source to target CockroachDB types explicitly. These are defined in the JSON file indicated by the [`--type-map-file`]({% link molt/molt-fetch-commands-and-flags.md %}#type-map-file) flag. The allowable custom mappings are valid CockroachDB aliases, casts, and the following mappings specific to MOLT Fetch and [Verify]({% link molt/molt-verify.md %}): - [`TIMESTAMP`]({% link {{site.current_cloud_version}}/timestamp.md %}) <> [`TIMESTAMPTZ`]({% link {{site.current_cloud_version}}/timestamp.md %}) - [`VARCHAR`]({% link {{site.current_cloud_version}}/string.md %}) <> [`UUID`]({% link {{site.current_cloud_version}}/uuid.md %}) @@ -607,7 +450,7 @@ If [`drop-on-target-and-recreate`](#target-table-handling) is set, MOLT Fetch au - [`JSONB`]({% link {{site.current_cloud_version}}/jsonb.md %}) <> [`TEXT`]({% link {{site.current_cloud_version}}/string.md %}) - [`INET`]({% link {{site.current_cloud_version}}/inet.md %}) <> [`TEXT`]({% link {{site.current_cloud_version}}/string.md %}) -`--type-map-file` specifies the path to the JSON file containing the explicit type mappings. For example: +[`--type-map-file`]({% link molt/molt-fetch-commands-and-flags.md %}#type-map-file) specifies the path to the JSON file containing the explicit type mappings. For example: {% include_cached copy-clipboard.html %} ~~~ @@ -641,7 +484,7 @@ The following JSON example defines two type mappings: - `source_type` specifies the source type to be mapped. - `crdb_type` specifies the target CockroachDB [type]({% link {{ site.current_cloud_version }}/data-types.md %}) to be mapped. -### Transformations +### Define transformations You can define transformation rules to be performed on the target database during the fetch task. These can be used to: @@ -650,7 +493,7 @@ You can define transformation rules to be performed on the target database durin - Rename tables on the target database. - Rename database schemas. -Transformation rules are defined in the JSON file indicated by the `--transformations-file` flag. For example: +Transformation rules are defined in the JSON file indicated by the [`--transformations-file`]({% link molt/molt-fetch-commands-and-flags.md %}#transformations-file) flag. For example: {% include_cached copy-clipboard.html %} ~~~ @@ -736,8 +579,8 @@ The following JSON example defines three transformation rules: rule `1` [maps co For n-to-1 mappings: - - Use [`--use-copy`](#data-load-mode) or [`--direct-copy`](#direct-copy) for data movement. - - Manually create the target table. Do not use [`--table-handling drop-on-target-and-recreate`](#target-table-handling). + - Use [`--use-copy`](#import-into-vs-copy-from) or [`--direct-copy`](#direct-copy) for data movement. + - Manually create the target table. Do not use [`--table-handling drop-on-target-and-recreate`](#handle-target-tables). [Example rule `2`](#transformation-rules-example) maps all table names with prefix `charges_part` to a single `charges` table on CockroachDB (an n-to-1 mapping). This assumes that all matching `charges_part.*` tables have the same table definition: @@ -782,7 +625,7 @@ Each rule is applied in the order it is defined. If two rules overlap, the later To verify that the logging shows that the computed columns are being created: -When running `molt fetch`, set `--logging debug` and look for `ALTER TABLE ... 
ADD COLUMN` statements with the `STORED` or `VIRTUAL` keywords in the log output: +When running `molt fetch`, set [`--logging`]({% link molt/molt-fetch-commands-and-flags.md %}#logging) to `debug` and look for `ALTER TABLE ... ADD COLUMN` statements with the `STORED` or `VIRTUAL` keywords in the log output: ~~~ json {"level":"debug","time":"2024-07-22T12:01:51-04:00","message":"running: ALTER TABLE IF EXISTS public.computed ADD COLUMN computed_col INT8 NOT NULL AS ((col1 + col2)) STORED"} @@ -804,7 +647,7 @@ SHOW CREATE TABLE computed; | ) ~~~ -### Fetch continuation +## Continue MOLT Fetch after interruption If MOLT Fetch fails while loading data into CockroachDB from intermediate files, it exits with an error message, fetch ID, and [continuation token](#list-active-continuation-tokens) for each table that failed to load on the target database. You can use this information to continue the task from the *continuation point* where it was interrupted. @@ -817,14 +660,14 @@ Continuation is only possible under the following conditions: Only one fetch ID and set of continuation tokens, each token corresponding to a table, are active at any time. See [List active continuation tokens](#list-active-continuation-tokens). {{site.data.alerts.end}} -To retry all data starting from the continuation point, reissue the `molt fetch` command and include the `--fetch-id`. +To retry all data starting from the continuation point, reissue the `molt fetch` command and include the [`--fetch-id`]({% link molt/molt-fetch-commands-and-flags.md %}#fetch-id). {% include_cached copy-clipboard.html %} ~~~ --fetch-id d44762e5-6f70-43f8-8e15-58b4de10a007 ~~~ -To retry a specific table that failed, include both `--fetch-id` and `--continuation-token`. The latter flag specifies a token string that corresponds to a specific table on the source database. A continuation token is written in the `molt fetch` output for each failed table. If the fetch task encounters a subsequent error, it generates a new token for each failed table. See [List active continuation tokens](#list-active-continuation-tokens). +To retry a specific table that failed, include both [`--fetch-id`]({% link molt/molt-fetch-commands-and-flags.md %}#fetch-id) and [`--continuation-token`]({% link molt/molt-fetch-commands-and-flags.md %}#continuation-token). The latter flag specifies a token string that corresponds to a specific table on the source database. A continuation token is written in the `molt fetch` output for each failed table. If the fetch task encounters a subsequent error, it generates a new token for each failed table. See [List active continuation tokens](#list-active-continuation-tokens). {{site.data.alerts.callout_info}} This will retry only the table that corresponds to the continuation token. If the fetch task succeeds, there may still be source data that is not yet loaded into CockroachDB. @@ -836,7 +679,7 @@ This will retry only the table that corresponds to the continuation token. If th --continuation-token 011762e5-6f70-43f8-8e15-58b4de10a007 ~~~ -To retry all data starting from a specific file, include both `--fetch-id` and `--continuation-file-name`. The latter flag specifies the filename of an intermediate file in [cloud or local storage](#data-path). All filenames are prepended with `part_` and have the `.csv.gz` or `.csv` extension, depending on compression type (gzip by default). 
For example: +To retry all data starting from a specific file, include both [`--fetch-id`]({% link molt/molt-fetch-commands-and-flags.md %}#fetch-id) and [`--continuation-file-name`]({% link molt/molt-fetch-commands-and-flags.md %}#continuation-file-name). The latter flag specifies the filename of an intermediate file in [cloud or local storage](#define-intermediate-storage). All filenames are prepended with `part_` and have the `.csv.gz` or `.csv` extension, depending on compression type (gzip by default). For example: {% include_cached copy-clipboard.html %} ~~~ @@ -848,9 +691,9 @@ To retry all data starting from a specific file, include both `--fetch-id` and ` Continuation is not possible when using [direct copy](#direct-copy). {{site.data.alerts.end}} -#### List active continuation tokens +### List active continuation tokens -To view all active continuation tokens, issue a `molt fetch tokens list` command along with `--conn-string`, which specifies the [connection string]({% link {{site.current_cloud_version}}/connection-parameters.md %}#connect-using-a-url) for the target CockroachDB database. For example: +To view all active continuation tokens, issue a `molt fetch tokens list` command along with [`--conn-string`]({% link molt/molt-fetch-commands-and-flags.md %}#conn-string), which specifies the [connection string]({% link {{site.current_cloud_version}}/connection-parameters.md %}#connect-using-a-url) for the target CockroachDB database. For example: {% include_cached copy-clipboard.html %} ~~~ shell @@ -867,9 +710,9 @@ molt fetch tokens list \ Continuation Tokens. ~~~ -### CDC cursor +## Enable replication -A change data capture (CDC) cursor is written to the output as `cdc_cursor` at the beginning and end of the fetch task. +A change data capture (CDC) cursor is written to the MOLT Fetch output as `cdc_cursor` at the beginning and end of the fetch task. For MySQL: @@ -887,35 +730,19 @@ Use the `cdc_cursor` value as the checkpoint for MySQL or Oracle replication wit You can also use the `cdc_cursor` value with an external change data capture (CDC) tool to continuously replicate subsequent changes from the source database to CockroachDB. -## Security - -Cockroach Labs strongly recommends the following security practices. - -### Connection security - -{% include molt/molt-secure-connection-strings.md %} - -{{site.data.alerts.callout_info}} -By default, insecure connections (i.e., `sslmode=disable` on PostgreSQL; `sslmode` not set on MySQL) are disallowed. When using an insecure connection, `molt fetch` returns an error. To override this check, you can enable the `--allow-tls-mode-disable` flag. Do this **only** when testing, or if a secure SSL/TLS connection to the source or target database is not possible. -{{site.data.alerts.end}} - -### Cloud storage security - -{% include molt/fetch-secure-cloud-storage.md %} - -## Common workflows +## Common uses ### Bulk data load +When migrating data to CockroachDB in a bulk load (without utilizing [continuous replication]({% link molt/migration-considerations-replication.md %}) to minimize system downtime), run the `molt fetch` command with the required flags, as shown below: +
-To perform a bulk data load migration from your source database to CockroachDB, run the `molt fetch` command with the required flags. - -Specify the source and target database connections. For connection string formats, refer to [Specify source and target databases](#specify-source-and-target-databases).
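+For example, a PostgreSQL source and a CockroachDB target might be specified together as follows (a sketch; every connection value is a placeholder):
+
+{% include_cached copy-clipboard.html %}
+~~~
+--source 'postgresql://{username}:{password}@{host}:{port}/{database}'
+--target 'postgresql://{username}:{password}@{host}:{port}/{database}'
+~~~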
{% include_cached copy-clipboard.html %} @@ -926,7 +753,7 @@ Specify the source and target database connections. For connection string format
-For Oracle Multitenant (CDB/PDB) sources, also include `--source-cdb` to specify the container database (CDB) connection string. +For Oracle Multitenant (CDB/PDB) sources, also include [`--source-cdb`]({% link molt/molt-fetch-commands-and-flags.md %}#source-cdb) to specify the container database (CDB) connection string. {% include_cached copy-clipboard.html %} ~~~ @@ -969,7 +796,7 @@ Optionally, filter the source data to migrate. By default, all schemas and table
-For Oracle sources, `--schema-filter` is case-insensitive. You can use either lowercase or uppercase: +For Oracle sources, [`--schema-filter`]({% link molt/molt-fetch-commands-and-flags.md %}#schema-filter) is case-insensitive. You can use either lowercase or uppercase: {% include_cached copy-clipboard.html %} ~~~ @@ -979,7 +806,7 @@ For Oracle sources, `--schema-filter` is case-insensitive. You can use either lo
-For MySQL sources, omit `--schema-filter` because MySQL tables belong directly to the database specified in the connection string, not to a separate schema. If needed, use `--table-filter` to select specific tables: +For MySQL sources, omit [`--schema-filter`]({% link molt/molt-fetch-commands-and-flags.md %}#schema-filter) because MySQL tables belong directly to the database specified in the connection string, not to a separate schema. If needed, use [`--table-filter`]({% link molt/molt-fetch-commands-and-flags.md %}#table-filter) to select specific tables: {% include_cached copy-clipboard.html %} ~~~ @@ -987,14 +814,14 @@ For MySQL sources, omit `--schema-filter` because MySQL tables belong directly t ~~~
-Specify how to handle target tables. By default, `--table-handling` is set to `none`, which loads data without changing existing data in the tables. For details, refer to [Target table handling](#target-table-handling): +Specify how to handle target tables. By default, [`--table-handling`]({% link molt/molt-fetch-commands-and-flags.md %}#table-handling) is set to `none`, which loads data without changing existing data in the tables. For details, refer to [Handle target tables](#handle-target-tables): {% include_cached copy-clipboard.html %} ~~~ --table-handling truncate-if-exists ~~~ -When performing a bulk load without subsequent replication, use `--ignore-replication-check` to skip querying for replication checkpoints (such as `pg_current_wal_insert_lsn()` on PostgreSQL, `gtid_executed` on MySQL, and `CURRENT_SCN` on Oracle). This is appropriate when: +When performing a bulk load without subsequent replication, use [`--ignore-replication-check`]({% link molt/molt-fetch-commands-and-flags.md %}#ignore-replication-check) to skip querying for replication checkpoints (such as `pg_current_wal_insert_lsn()` on PostgreSQL, `gtid_executed` on MySQL, and `CURRENT_SCN` on Oracle). This is appropriate when: - Performing a one-time data migration with no plan to replicate ongoing changes. - Exporting data from a read replica where replication checkpoints are unavailable. @@ -1004,7 +831,7 @@ When performing a bulk load without subsequent replication, use `--ignore-replic --ignore-replication-check ~~~ -At minimum, the `molt fetch` command should include the source, target, data path, and `--ignore-replication-check` flags: +At minimum, the `molt fetch` command should include the source, target, data path, and [`--ignore-replication-check`]({% link molt/molt-fetch-commands-and-flags.md %}#ignore-replication-check) flags: {% include_cached copy-clipboard.html %} ~~~ shell molt fetch \ @@ -1015,9 +842,16 @@ molt fetch \ --ignore-replication-check ~~~ -For detailed steps, refer to [Bulk load migration]({% link molt/migrate-bulk-load.md %}). +For detailed walkthroughs of migrations that use `molt fetch` in this way, refer to these common migration approaches: + +- [Classic Bulk Load Migration]({% link molt/migration-approach-classic-bulk-load.md %}) +- [Phased Bulk Load Migration]({% link molt/migration-approach-phased-bulk-load.md %}) + +### Initial bulk load (before replication) -### Load before replication +In a migration that uses [continuous replication]({% link molt/migration-considerations-replication.md %}), perform an initial data load before [setting up ongoing replication with MOLT Replicator]({% link molt/molt-replicator.md %}#forward-replication-after-initial-load). Run the `molt fetch` command without [`--ignore-replication-check`]({% link molt/molt-fetch-commands-and-flags.md %}#ignore-replication-check), as shown below:
@@ -1025,15 +859,13 @@ For detailed steps, refer to [Bulk load migration]({% link molt/migrate-bulk-loa
-To perform an initial data load before setting up ongoing replication with [MOLT Replicator]({% link molt/molt-replicator.md %}), run the `molt fetch` command without `--ignore-replication-check`. This captures replication checkpoints during the data load. - The workflow is the same as [Bulk data load](#bulk-data-load), except: -- Exclude `--ignore-replication-check`. MOLT Fetch will query and record replication checkpoints. +- Exclude [`--ignore-replication-check`]({% link molt/molt-fetch-commands-and-flags.md %}#ignore-replication-check). MOLT Fetch will query and record replication checkpoints.
- You must include `--pglogical-replication-slot-name` and `--pglogical-publication-and-slot-drop-and-recreate` to automatically create the publication and replication slot during the data load.
-- After the data load completes, check the [CDC cursor](#cdc-cursor) in the output for the checkpoint value to use with MOLT Replicator. +- After the data load completes, check the [CDC cursor](#enable-replication) in the output for the checkpoint value to use with MOLT Replicator. At minimum, the `molt fetch` command should include the source, target, and data path flags: @@ -1080,84 +912,18 @@ The output will include a `cdc_cursor` value at the end of the fetch task: Use this `cdc_cursor` value when starting MOLT Replicator to ensure replication begins from the correct position. For detailed steps, refer to [Load and replicate]({% link molt/migrate-load-replicate.md %}).
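+For example, one way to extract the final checkpoint from the fetch log; this sketch assumes the default structured JSON log output, a log file written with `--log-file`, and that `jq` is installed (`fetch.log` is a placeholder filename):
+
+{% include_cached copy-clipboard.html %}
+~~~ shell
+# Print the last cdc_cursor value that MOLT Fetch logged.
+jq -r 'select(.cdc_cursor != null) | .cdc_cursor' fetch.log | tail -n 1
+~~~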
-## Monitoring - -### Metrics - -By default, MOLT Fetch exports [Prometheus](https://prometheus.io/) metrics at `127.0.0.1:3030/metrics`. You can configure this endpoint with the `--metrics-listen-addr` [flag](#global-flags). - -Cockroach Labs recommends monitoring the following metrics: - -| Metric Name | Description | -|---------------------------------------|-----------------------------------------------------------------------------------------------------------------------------| -| `molt_fetch_num_tables` | Number of tables that will be moved from the source. | -| `molt_fetch_num_task_errors` | Number of errors encountered by the fetch task. | -| `molt_fetch_overall_duration` | Duration (in seconds) of the fetch task. | -| `molt_fetch_rows_exported` | Number of rows that have been exported from a table. For example:
`molt_fetch_rows_exported{table="public.users"}` | -| `molt_fetch_rows_imported` | Number of rows that have been imported from a table. For example:
`molt_fetch_rows_imported{table="public.users"}` | -| `molt_fetch_table_export_duration_ms` | Duration (in milliseconds) of a table's export. For example:
`molt_fetch_table_export_duration_ms{table="public.users"}` | -| `molt_fetch_table_import_duration_ms` | Duration (in milliseconds) of a table's import. For example:
`molt_fetch_table_import_duration_ms{table="public.users"}` | - -You can also use the [sample Grafana dashboard](https://molt.cockroachdb.com/molt/cli/grafana_dashboard.json) to view the preceding metrics. - -## Best practices - -### Test and validate - -To verify that your connections and configuration work properly, run MOLT Fetch in a staging environment before migrating any data in production. Use a test or development environment that closely resembles production. - -### Configure the source database and connection - -- To prevent connections from terminating prematurely during the [data export phase](#data-export-phase), set the following to high values on the source database: - - - **Maximum allowed number of connections.** MOLT Fetch can export data across multiple connections. The number of connections it will create is the number of shards ([`--export-concurrency`](#global-flags)) multiplied by the number of tables ([`--table-concurrency`](#global-flags)) being exported concurrently. - - {{site.data.alerts.callout_info}} - With the default numerical range sharding, only tables with [primary key]({% link {{ site.current_cloud_version }}/primary-key.md %}) types of [`INT`]({% link {{ site.current_cloud_version }}/int.md %}), [`FLOAT`]({% link {{ site.current_cloud_version }}/float.md %}), or [`UUID`]({% link {{ site.current_cloud_version }}/uuid.md %}) can be sharded. PostgreSQL users can enable [`--use-stats-based-sharding`](#global-flags) to use statistics-based sharding for tables with primary keys of any data type. For details, refer to [Table sharding](#table-sharding). - {{site.data.alerts.end}} - - - **Maximum lifetime of a connection.** - -- If a PostgreSQL database is set as a [source](#source-and-target-databases), ensure that [`idle_in_transaction_session_timeout`](https://www.postgresql.org/docs/current/runtime-config-client.html#GUC-IDLE-IN-TRANSACTION-SESSION-TIMEOUT) on PostgreSQL is either disabled or set to a value longer than the duration of the [data export phase](#data-export-phase). Otherwise, the connection will be prematurely terminated. To estimate the time needed to export the PostgreSQL tables, you can perform a dry run and sum the value of [`molt_fetch_table_export_duration_ms`](#monitoring) for all exported tables. - -### Optimize performance - -- {% include molt/molt-drop-constraints-indexes.md %} - -- For PostgreSQL sources using [`--use-stats-based-sharding`](#global-flags), run [`ANALYZE`]({% link {{ site.current_cloud_version }}/create-statistics.md %}) on source tables before migration to ensure optimal shard distribution. This is especially important for large tables where even distribution can significantly improve export performance. - -- To prevent memory outages during `READ COMMITTED` [data export](#data-export-phase) of tables with large rows, estimate the amount of memory used to export a table: - - ~~~ - --row-batch-size * --export-concurrency * average size of the table rows - ~~~ - - If you are exporting more than one table at a time (i.e., [`--table-concurrency`](#global-flags) is set higher than `1`), add the estimated memory usage for the tables with the largest row sizes. Ensure that you have sufficient memory to run `molt fetch`, and adjust `--row-batch-size` accordingly. For details on how concurrency and sharding interact, refer to [Table sharding](#table-sharding). 
- -- If a table in the source database is much larger than the other tables, [filter and export the largest table](#schema-and-table-selection) in its own `molt fetch` task. Repeat this for each of the largest tables. Then export the remaining tables in another task. - -- Ensure that the machine running MOLT Fetch is large enough to handle the amount of data being migrated. Fetch performance can sometimes be limited by available resources, but should always be making progress. To identify possible resource constraints, observe the `molt_fetch_rows_exported` [metric](#monitoring) for decreases in the number of rows being processed. You can use the [sample Grafana dashboard](https://molt.cockroachdb.com/molt/cli/grafana_dashboard.json) to view metrics. For details on optimizing export performance through sharding, refer to [Table sharding](#table-sharding). - -### Import and continuation handling - -- When using [`IMPORT INTO`](#data-load-mode) during the [data import phase](#data-import-phase) to load tables into CockroachDB, if the fetch task terminates before the import job completes, the hanging import job on the target database will keep the table offline. To make this table accessible again, [manually resume or cancel the job]({% link {{site.current_cloud_version}}/import-into.md %}#view-and-control-import-jobs). Then resume `molt fetch` using [continuation](#fetch-continuation), or restart the task from the beginning. - -## Troubleshooting - -
- - - -
+For detailed walkthroughs of migrations that use `molt fetch` in this way, refer to these common migration approaches: -{% include molt/molt-troubleshooting-fetch.md %} +- [Delta Migration]({% link molt/migration-approach-delta.md %}) +- [Streaming Migration]({% link molt/migration-approach-streaming.md %}) +- [Active-Active Migration]({% link molt/migration-approach-active-active.md %}) ## See also +- [MOLT Fetch Installation]({% link molt/molt-fetch-installation.md %}) +- [MOLT Fetch Commands and Flags]({% link molt/molt-fetch-commands-and-flags.md %}) +- [MOLT Fetch Metrics]({% link molt/molt-fetch-monitoring.md %}) +- [MOLT Fetch Best Practices]({% link molt/molt-fetch-best-practices.md %}) +- [MOLT Fetch Troubleshooting]({% link molt/molt-fetch-troubleshooting.md %}) - [Migration Overview]({% link molt/migration-overview.md %}) -- [Migration Strategy]({% link molt/migration-strategy.md %}) -- [MOLT Replicator]({% link molt/molt-replicator.md %}) -- [MOLT Verify]({% link molt/molt-verify.md %}) -- [Load and replicate]({% link molt/migrate-load-replicate.md %}) -- [Resume Replication]({% link molt/migrate-resume-replication.md %}) -- [Migration Failback]({% link molt/migrate-failback.md %}) \ No newline at end of file +- [MOLT Replicator]({% link molt/molt-replicator.md %}) \ No newline at end of file diff --git a/src/current/molt/molt-replicator-best-practices.md b/src/current/molt/molt-replicator-best-practices.md new file mode 100644 index 00000000000..138ac3f4916 --- /dev/null +++ b/src/current/molt/molt-replicator-best-practices.md @@ -0,0 +1,147 @@ +--- +title: MOLT Replicator Best Practices +summary: Learn best practices for using MOLT Replicator for continuous replication. +toc: true +docs_area: migrate +--- + +## Test and validate + +To verify that your connections and configuration work properly, run MOLT Replicator in a staging environment before replicating any data in production. Use a test or development environment that closely resembles production. + +## Optimize performance + +{% include molt/optimize-replicator-performance.md %} + +## Security + +Cockroach Labs **strongly** recommends the following: + +### Connection security and credentials + +{% include molt/molt-secure-connection-strings.md %} + +### CockroachDB changefeed security + +For failback scenarios, secure the connection from CockroachDB to MOLT Replicator using TLS certificates. Generate TLS certificates using self-signed certificates, certificate authorities like Let's Encrypt, or your organization's certificate management system. + +#### TLS from CockroachDB to Replicator + +Configure MOLT Replicator with server certificates using the [`--tlsCertificate`]({% link molt/replicator-flags.md %}#tls-certificate) and [`--tlsPrivateKey`]({% link molt/replicator-flags.md %}#tls-private-key) flags to specify the certificate and private key file paths. For example: + +{% include_cached copy-clipboard.html %} +~~~ shell +replicator start \ +--tlsCertificate ./certs/server.crt \ +--tlsPrivateKey ./certs/server.key \ +... +~~~ + +These server certificates must correspond to the client certificates specified in the changefeed webhook URL to ensure proper TLS handshake. 
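+
+For example, the changefeed webhook URL might take the following shape, where `client_cert`, `client_key`, and `ca_cert` carry the encoded client certificates described below (the host, port, and target database/schema path are illustrative):
+
+~~~
+webhook-https://replicator-host:30004/defaultdb/public?client_cert={encoded certificate}&client_key={encoded private key}&ca_cert={encoded CA certificate}
+~~~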
+ +Encode client certificates for changefeed webhook URLs: + +- Webhook URLs: Use both URL encoding and base64 encoding: `base64 -i ./client.crt | jq -R -r '@uri'` +- Non-webhook contexts: Use base64 encoding only: `base64 -w 0 ca.cert` + +#### JWT authentication + +You can use JSON Web Tokens (JWT) to authorize incoming changefeed connections and restrict writes to a subset of SQL databases or user-defined schemas in the target cluster. + +Replicator supports JWT claims that allow writes to specific databases, schemas, or all of them. JWT tokens must be signed using RSA or EC keys. HMAC and `None` signatures are automatically rejected. + +To configure JWT authentication: + +1. Add PEM-formatted public signing keys to the `_replicator.jwt_public_keys` table in the staging database. + +1. To revoke a specific token, add its `jti` value to the `_replicator.jwt_revoked_ids` table in the staging database. + +The Replicator process re-reads these tables every minute to pick up changes. + +To pass the JWT token from the changefeed to the Replicator webhook sink, use the [`webhook_auth_header` option]({% link {{ site.current_cloud_version }}/create-changefeed.md %}#options): + +{% include_cached copy-clipboard.html %} +~~~ sql +CREATE CHANGEFEED ... WITH webhook_auth_header='Bearer {token}'; +~~~ + +##### Token quickstart + +The following example uses `OpenSSL` to generate keys, but any PEM-encoded RSA or EC keys will work. + +{% include_cached copy-clipboard.html %} +~~~ shell +# Generate an EC private key using OpenSSL. +openssl ecparam -out ec.key -genkey -name prime256v1 + +# Write the public key components to a separate file. +openssl ec -in ec.key -pubout -out ec.pub + +# Upload the public key for all instances of Replicator to find it. +cockroach sql -e "INSERT INTO _replicator.jwt_public_keys (public_key) VALUES ('$(cat ec.pub)')" + +# Reload configuration, or wait one minute. +killall -HUP replicator + +# Generate a token which can write to the ycsb.public schema. +# The key can be decoded using the debugger at https://jwt.io. +# Add the contents of out.jwt to the CREATE CHANGEFEED command: +# WITH webhook_auth_header='Bearer {out.jwt}' +replicator make-jwt -k ec.key -a ycsb.public -o out.jwt +~~~ + +##### External JWT providers + +The `make-jwt` command also supports a [`--claim`]({% link molt/replicator-flags.md %}#claim) flag, which prints a JWT claim that can be signed by your existing JWT provider. The PEM-formatted public key or keys for that provider must be inserted into the `_replicator.jwt_public_keys` table. The `iss` (issuers) and `jti` (token id) fields will likely be specific to your auth provider, but the custom claim must be retained in its entirety. + +{{site.data.alerts.callout_success}} +You can repeat the [`-a`]({% link molt/replicator-flags.md %}#allow) flag to create a claim for multiple schemas. +{{site.data.alerts.end}} + +{% include_cached copy-clipboard.html %} +~~~ shell +replicator make-jwt -a 'database.schema' --claim +~~~ + +~~~json +{ + "iss": "replicator", + "jti": "d5ffa211-8d54-424b-819a-bc19af9202a5", + "https://github.com/cockroachdb/replicator": { + "schemas": [ + [ + "database", + "schema" + ] + ] + } +} +~~~ + +### Production considerations + +- Avoid [`--disableAuthentication`]({% link molt/replicator-flags.md %}#disable-authentication) and [`--tlsSelfSigned`]({% link molt/replicator-flags.md %}#tls-self-signed) flags in production environments. These flags should only be used for testing or development purposes.
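+
+As a sketch, a production changefeed that targets Replicator might combine the TLS-secured webhook URL with a JWT auth header and an explicit client timeout (the table name, host, port, and target path are illustrative):
+
+{% include_cached copy-clipboard.html %}
+~~~ sql
+CREATE CHANGEFEED FOR TABLE defaultdb.public.employees
+INTO 'webhook-https://replicator-host:30004/defaultdb/public?client_cert={encoded certificate}&client_key={encoded private key}&ca_cert={encoded CA certificate}'
+WITH updated, resolved = '1s', webhook_client_timeout = '10s', webhook_auth_header = 'Bearer {token}';
+~~~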
+ +### Supply chain security + +Use the `version` command to verify the integrity of your MOLT Replicator build and identify potential upstream vulnerabilities. + +{% include_cached copy-clipboard.html %} +~~~ shell +replicator version +~~~ + +The output includes: + +- Module name +- go.mod checksum +- Version + +Use this information to determine if your build may be subject to vulnerabilities from upstream packages. Cockroach Labs uses Dependabot to automatically upgrade Go modules, and the team regularly merges Dependabot updates to address security issues. + +## See also + +- [MOLT Replicator]({% link molt/molt-replicator.md %}) +- [MOLT Replicator Installation]({% link molt/molt-replicator-installation.md %}) +- [MOLT Replicator Flags]({% link molt/replicator-flags.md %}) +- [Migration Overview]({% link molt/migration-overview.md %}) diff --git a/src/current/molt/molt-replicator-installation.md b/src/current/molt/molt-replicator-installation.md new file mode 100644 index 00000000000..478ab30c704 --- /dev/null +++ b/src/current/molt/molt-replicator-installation.md @@ -0,0 +1,55 @@ +--- +title: MOLT Replicator Installation +summary: Learn how to install MOLT Replicator and configure prerequisites for continuous replication. +toc: true +docs_area: migrate +--- + +## Prerequisites + +### Supported databases + +MOLT Replicator supports the following source and target databases: + +- PostgreSQL 11-16 +- MySQL 5.7, 8.0 and later +- Oracle Database 19c (Enterprise Edition) and 21c (Express Edition) +- CockroachDB (all currently [supported versions]({% link releases/release-support-policy.md %}#supported-versions)) + +### Database configuration + +The source database must be configured for replication: + +| Database | Configuration Requirements | Details | +|-------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------| +| PostgreSQL source |
  • Enable logical replication by setting `wal_level = logical`.
| [Configure PostgreSQL for replication]({% link molt/migrate-load-replicate.md %}#configure-source-database-for-replication) | +| MySQL source |
  • Enable [global transaction identifiers (GTID)](https://dev.mysql.com/doc/refman/8.0/en/replication-options-gtids.html) and configure binary logging. Set `binlog-row-metadata` or `binlog-row-image` to `full`.
  • Configure sufficient binlog retention for migration duration.
| [Configure MySQL for replication]({% link molt/migrate-load-replicate.md %}?filters=mysql#configure-source-database-for-replication) | +| Oracle source |
  • Install [Oracle Instant Client]({% link molt/migrate-load-replicate.md %}?filters=oracle#oracle-instant-client).
  • [Enable `ARCHIVELOG` mode]({% link molt/migrate-load-replicate.md %}?filters=oracle#enable-archivelog-and-force-logging), supplemental logging for primary keys, and `FORCE LOGGING`.
  • [Create sentinel table]({% link molt/migrate-load-replicate.md %}#create-source-sentinel-table) (`REPLICATOR_SENTINEL`) in source schema.
  • Grant and verify [LogMiner privileges]({% link molt/migrate-load-replicate.md %}#grant-logminer-privileges).
| [Configure Oracle for replication]({% link molt/migrate-load-replicate.md %}?filters=oracle#configure-source-database-for-replication) | +| CockroachDB source (failback) |
  • [Enable rangefeeds]({% link {{ site.current_cloud_version }}/create-and-configure-changefeeds.md %}#enable-rangefeeds) (`kv.rangefeed.enabled = true`) (CockroachDB {{ site.data.products.core }} clusters only).
| [Configure CockroachDB for replication]({% link molt/migrate-failback.md %}#prepare-the-cockroachdb-cluster) | + +### User permissions + +The SQL user running MOLT Replicator requires specific privileges on both the source and target databases: + +| Database | Required Privileges | Details | +|------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| PostgreSQL source |
  • `SUPERUSER` role (recommended), or the following granular permissions:
  • `CREATE` and `SELECT` on database and tables to replicate.
  • Table ownership for adding tables to publications.
  • `LOGIN` and `REPLICATION` privileges to create replication slots and access replication data.
| [Create PostgreSQL migration user]({% link molt/migrate-load-replicate.md %}#create-migration-user-on-source-database) | +| MySQL source |
  • `SELECT` on tables to replicate.
  • `REPLICATION SLAVE` and `REPLICATION CLIENT` privileges for binlog access.
  • For [`--fetchMetadata`]({% link molt/replicator-flags.md %}#fetch-metadata), either `SELECT` on the source database or `PROCESS` globally.
| [Create MySQL migration user]({% link molt/migrate-load-replicate.md %}?filters=mysql#create-migration-user-on-source-database) | +| Oracle source |
  • `SELECT`, `INSERT`, `UPDATE` on `REPLICATOR_SENTINEL` table.
  • `SELECT` on `V$` views (`V$LOG`, `V$LOGFILE`, `V$LOGMNR_CONTENTS`, `V$ARCHIVED_LOG`, `V$LOG_HISTORY`).
  • `SELECT` on `SYS.V$LOGMNR_*` views (`SYS.V$LOGMNR_DICTIONARY`, `SYS.V$LOGMNR_LOGS`, `SYS.V$LOGMNR_PARAMETERS`, `SYS.V$LOGMNR_SESSION`).
  • `LOGMINING` privilege.
  • `EXECUTE` on `DBMS_LOGMNR`.
  • For Oracle Multitenant, the user must be a common user (prefixed with `C##`) with privileges granted on both CDB and PDB.
| [Create Oracle migration user]({% link molt/migrate-load-replicate.md %}?filters=oracle#create-migration-user-on-source-database)

[Create sentinel table]({% link molt/migrate-load-replicate.md %}?filters=oracle#create-source-sentinel-table)

[Grant LogMiner privileges]({% link molt/migrate-load-replicate.md %}?filters=oracle#grant-logminer-privileges) | +| CockroachDB target (forward replication) |
  • `ALL` on target database.
  • `CREATE` on schema.
  • `SELECT`, `INSERT`, `UPDATE`, `DELETE` on target tables.
  • `CREATEDB` privilege for creating staging schema.
| [Create CockroachDB user]({% link molt/migrate-load-replicate.md %}#create-the-sql-user) | +| PostgreSQL, MySQL, or Oracle target (failback) |
  • `SELECT`, `INSERT`, `UPDATE` on tables to fail back to.
  • For Oracle, `FLASHBACK` is also required.
| [Grant PostgreSQL user permissions]({% link molt/migrate-failback.md %}#grant-target-database-user-permissions)

[Grant MySQL user permissions]({% link molt/migrate-failback.md %}?filter=mysql#grant-target-database-user-permissions)

[Grant Oracle user permissions]({% link molt/migrate-failback.md %}?filter=oracle#grant-target-database-user-permissions) | + +## Installation + +{% include molt/molt-install.md %} + +### Docker usage + +{% include molt/molt-docker.md %} + +## See also + +- [MOLT Replicator]({% link molt/molt-replicator.md %}) +- [MOLT Replicator Flags]({% link molt/replicator-flags.md %}) +- [Load and Replicate]({% link molt/migrate-load-replicate.md %}) +- [Migration Overview]({% link molt/migration-overview.md %}) diff --git a/src/current/molt/molt-replicator-troubleshooting.md b/src/current/molt/molt-replicator-troubleshooting.md new file mode 100644 index 00000000000..e96472fc4dd --- /dev/null +++ b/src/current/molt/molt-replicator-troubleshooting.md @@ -0,0 +1,23 @@ +--- +title: MOLT Replicator Troubleshooting +summary: Troubleshoot common issues with MOLT Replicator during continuous replication. +toc: true +docs_area: migrate +--- + +
+ + + +
+ +{% include molt/molt-troubleshooting-replication.md %} + +{% include molt/molt-troubleshooting-failback.md %} + +## See also + +- [MOLT Replicator]({% link molt/molt-replicator.md %}) +- [MOLT Replicator Installation]({% link molt/molt-replicator-installation.md %}) +- [MOLT Replicator Flags]({% link molt/replicator-flags.md %}) +- [Migration Overview]({% link molt/migration-overview.md %}) diff --git a/src/current/molt/molt-replicator.md b/src/current/molt/molt-replicator.md index aa6d1dbe039..c375b6a74e5 100644 --- a/src/current/molt/molt-replicator.md +++ b/src/current/molt/molt-replicator.md @@ -13,101 +13,24 @@ MOLT Replicator consumes change data from PostgreSQL [logical replication](https - *Checkpoint*: The position in the source database's transaction log from which replication begins or resumes: LSN (PostgreSQL), GTID (MySQL), or SCN (Oracle). - *Staging database*: A CockroachDB database used by Replicator to store replication metadata, checkpoints, and buffered mutations. Specified with [`--stagingSchema`]({% link molt/replicator-flags.md %}#staging-schema) and automatically created with [`--stagingCreateSchema`]({% link molt/replicator-flags.md %}#staging-create-schema). For details, refer to [Staging database](#staging-database). -- *Forward replication*: Replicate changes from a source database (PostgreSQL, MySQL, or Oracle) to CockroachDB during a migration. For usage details, refer to [Forward replication with initial load](#forward-replication-with-initial-load). -- *Failback*: Replicate changes from CockroachDB back to the source database. Used for migration rollback or to maintain data consistency on the source during migration. For usage details, refer to [Failback to source database](#failback-to-source-database). - -## Prerequisites - -### Supported databases - -MOLT Replicator supports the following source and target databases: - -- PostgreSQL 11-16 -- MySQL 5.7, 8.0 and later -- Oracle Database 19c (Enterprise Edition) and 21c (Express Edition) -- CockroachDB (all currently [supported versions]({% link releases/release-support-policy.md %}#supported-versions)) - -### Database configuration - -The source database must be configured for replication: - -| Database | Configuration Requirements | Details | -|-------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------| -| PostgreSQL source |
  • Enable logical replication by setting `wal_level = logical`.
| [Configure PostgreSQL for replication]({% link molt/migrate-load-replicate.md %}#configure-source-database-for-replication) | -| MySQL source |
  • Enable [global transaction identifiers (GTID)](https://dev.mysql.com/doc/refman/8.0/en/replication-options-gtids.html) and configure binary logging. Set `binlog-row-metadata` or `binlog-row-image` to `full`.
  • Configure sufficient binlog retention for migration duration.
| [Configure MySQL for replication]({% link molt/migrate-load-replicate.md %}?filters=mysql#configure-source-database-for-replication) | -| Oracle source |
  • Install [Oracle Instant Client]({% link molt/migrate-load-replicate.md %}?filters=oracle#oracle-instant-client).
  • [Enable `ARCHIVELOG` mode]({% link molt/migrate-load-replicate.md %}?filters=oracle#enable-archivelog-and-force-logging), supplemental logging for primary keys, and `FORCE LOGGING`.
  • [Create sentinel table]({% link molt/migrate-load-replicate.md %}#create-source-sentinel-table) (`REPLICATOR_SENTINEL`) in source schema.
  • Grant and verify [LogMiner privileges]({% link molt/migrate-load-replicate.md %}#grant-logminer-privileges).
| [Configure Oracle for replication]({% link molt/migrate-load-replicate.md %}?filters=oracle#configure-source-database-for-replication) | -| CockroachDB source (failback) |
  • [Enable rangefeeds]({% link {{ site.current_cloud_version }}/create-and-configure-changefeeds.md %}#enable-rangefeeds) (`kv.rangefeed.enabled = true`) (CockroachDB {{ site.data.products.core }} clusters only).
| [Configure CockroachDB for replication]({% link molt/migrate-failback.md %}#prepare-the-cockroachdb-cluster) | - -### User permissions - -The SQL user running MOLT Replicator requires specific privileges on both the source and target databases: - -| Database | Required Privileges | Details | -|------------------------------------------------|----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------|---------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| PostgreSQL source |
  • `SUPERUSER` role (recommended), or the following granular permissions:
  • `CREATE` and `SELECT` on database and tables to replicate.
  • Table ownership for adding tables to publications.
  • `LOGIN` and `REPLICATION` privileges to create replication slots and access replication data.
| [Create PostgreSQL migration user]({% link molt/migrate-load-replicate.md %}#create-migration-user-on-source-database) | -| MySQL source |
  • `SELECT` on tables to replicate.
  • `REPLICATION SLAVE` and `REPLICATION CLIENT` privileges for binlog access.
  • For [`--fetchMetadata`]({% link molt/replicator-flags.md %}#fetch-metadata), either `SELECT` on the source database or `PROCESS` globally.
| [Create MySQL migration user]({% link molt/migrate-load-replicate.md %}?filters=mysql#create-migration-user-on-source-database) | -| Oracle source |
  • `SELECT`, `INSERT`, `UPDATE` on `REPLICATOR_SENTINEL` table.
  • `SELECT` on `V$` views (`V$LOG`, `V$LOGFILE`, `V$LOGMNR_CONTENTS`, `V$ARCHIVED_LOG`, `V$LOG_HISTORY`).
  • `SELECT` on `SYS.V$LOGMNR_*` views (`SYS.V$LOGMNR_DICTIONARY`, `SYS.V$LOGMNR_LOGS`, `SYS.V$LOGMNR_PARAMETERS`, `SYS.V$LOGMNR_SESSION`).
  • `LOGMINING` privilege.
  • `EXECUTE` on `DBMS_LOGMNR`.
  • For Oracle Multitenant, the user must be a common user (prefixed with `C##`) with privileges granted on both CDB and PDB.
| [Create Oracle migration user]({% link molt/migrate-load-replicate.md %}?filters=oracle#create-migration-user-on-source-database)

[Create sentinel table]({% link molt/migrate-load-replicate.md %}?filters=oracle#create-source-sentinel-table)

[Grant LogMiner privileges]({% link molt/migrate-load-replicate.md %}?filters=oracle#grant-logminer-privileges) | -| CockroachDB target (forward replication) |
  • `ALL` on target database.
  • `CREATE` on schema.
  • `SELECT`, `INSERT`, `UPDATE`, `DELETE` on target tables.
  • `CREATEDB` privilege for creating staging schema.
| [Create CockroachDB user]({% link molt/migrate-load-replicate.md %}#create-the-sql-user) | -| PostgreSQL, MySQL, or Oracle target (failback) |
  • `SELECT`, `INSERT`, `UPDATE` on tables to fail back to.
  • For Oracle, `FLASHBACK` is also required.
| [Grant PostgreSQL user permissions]({% link molt/migrate-failback.md %}#grant-target-database-user-permissions)

[Grant MySQL user permissions]({% link molt/migrate-failback.md %}?filter=mysql#grant-target-database-user-permissions)

[Grant Oracle user permissions]({% link molt/migrate-failback.md %}?filter=oracle#grant-target-database-user-permissions) | - -## Installation - -{% include molt/molt-install.md %} - -### Docker usage - -{% include molt/molt-docker.md %} +- *Forward replication*: Replicate changes from a source database (PostgreSQL, MySQL, or Oracle) to CockroachDB during a migration. For usage details, refer to [Forward replication (after initial load)](#forward-replication-after-initial-load). +- *Failback*: Replicate changes from CockroachDB back to the source database. Used for migration rollback or to maintain data consistency on the source during migration. For usage details, refer to [Failback replication](#failback-replication). ## How it works MOLT Replicator supports forward replication from PostgreSQL, MySQL, and Oracle, and failback from CockroachDB: -- PostgreSQL source ([`pglogical`](#commands)): MOLT Replicator uses [PostgreSQL logical replication](https://www.postgresql.org/docs/current/logical-replication.html), which is based on publications and replication slots. You create a publication for the target tables, and a slot marks consistent replication points. MOLT Replicator consumes this logical feed directly and applies the data in sorted batches to the target. - -- MySQL source ([`mylogical`](#commands)): MOLT Replicator relies on [MySQL GTID-based replication](https://dev.mysql.com/doc/refman/8.0/en/replication-gtids.html) to read change data from MySQL binlogs. It works with MySQL versions that support GTID-based replication and applies transactionally consistent feeds to the target. Binlog features that do not use GTIDs are not supported. - -- Oracle source ([`oraclelogminer`](#commands)): MOLT Replicator uses [Oracle LogMiner](https://docs.oracle.com/en/database/oracle/oracle-database/21/sutil/oracle-logminer-utility.html) to capture change data from Oracle redo logs. Both Oracle Multitenant (CDB/PDB) and single-tenant Oracle architectures are supported. Replicator periodically queries LogMiner-populated views and processes transactional data in ascending SCN windows for reliable throughput while maintaining consistency. - -- Failback from CockroachDB ([`start`](#commands)): MOLT Replicator acts as an HTTP webhook sink for a single CockroachDB changefeed. Replicator receives mutations from source cluster nodes, can optionally buffer them in a CockroachDB staging cluster, and then applies time-ordered transactional batches to the target database. Mutations are applied as [`UPSERT`]({% link {{ site.current_cloud_version }}/upsert.md %}) or [`DELETE`]({% link {{ site.current_cloud_version }}/delete.md %}) statements while respecting [foreign-key]({% link {{ site.current_cloud_version }}/foreign-key.md %}) and table dependencies. - -### Consistency modes - -MOLT Replicator supports three consistency modes for balancing throughput and transactional guarantees: - -1. *Consistent* (failback mode only, default for CockroachDB sources): Preserves per-row order and source transaction atomicity. Concurrent transactions are controlled by [`--parallelism`]({% link molt/replicator-flags.md %}#parallelism). - -1. *BestEffort* (failback mode only): Relaxes atomicity across tables that do not have foreign key constraints between them (maintains coherence within FK-connected groups). 
Enable with [`--bestEffortOnly`]({% link molt/replicator-flags.md %}#best-effort-only) or allow auto-entry via [`--bestEffortWindow`]({% link molt/replicator-flags.md %}#best-effort-window) set to a positive duration (such as `1s`). - - {{site.data.alerts.callout_info}} - For independent tables (with no foreign key constraints), BestEffort mode applies changes immediately as they arrive, without waiting for the resolved timestamp. This provides higher throughput for tables that have no relationships with other tables. - {{site.data.alerts.end}} - -1. *Immediate* (default for PostgreSQL, MySQL, and Oracle sources): Applies updates as they arrive to Replicator with no buffering or waiting for resolved timestamps. For CockroachDB sources, provides highest throughput but requires no foreign keys on the target schema. - -## Commands - -MOLT Replicator provides the following commands: +- PostgreSQL source ([`pglogical`]({% link molt/replicator-flags.md %}#commands)): MOLT Replicator uses [PostgreSQL logical replication](https://www.postgresql.org/docs/current/logical-replication.html), which is based on publications and replication slots. You create a publication for the target tables, and a slot marks consistent replication points. MOLT Replicator consumes this logical feed directly and applies the data in sorted batches to the target. -| Command | Description | -|------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| -| `pglogical` | Replicate from PostgreSQL source to CockroachDB target using logical replication. | -| `mylogical` | Replicate from MySQL source to CockroachDB target using GTID-based replication. | -| `oraclelogminer` | Replicate from Oracle source to CockroachDB target using Oracle LogMiner. | -| `start` | Replicate from CockroachDB source to PostgreSQL, MySQL, or Oracle target ([failback mode](#failback-to-source-database)). Requires a CockroachDB changefeed with rangefeeds enabled. | -| `make-jwt` | Generate JWT tokens for authorizing changefeed connections in failback scenarios. Supports signing tokens with RSA or EC keys, or generating claims for external JWT providers. For details, refer to [JWT authentication](#jwt-authentication). | -| `version` | Display version information and Go module dependencies with checksums. For details, refer to [Supply chain security](#supply-chain-security). | +- MySQL source ([`mylogical`]({% link molt/replicator-flags.md %}#commands)): MOLT Replicator relies on [MySQL GTID-based replication](https://dev.mysql.com/doc/refman/8.0/en/replication-gtids.html) to read change data from MySQL binlogs. It works with MySQL versions that support GTID-based replication and applies transactionally consistent feeds to the target. Binlog features that do not use GTIDs are not supported. -For command-specific flags and examples, refer to [Usage](#usage) and [Common workflows](#common-workflows). +- Oracle source ([`oraclelogminer`]({% link molt/replicator-flags.md %}#commands)): MOLT Replicator uses [Oracle LogMiner](https://docs.oracle.com/en/database/oracle/oracle-database/21/sutil/oracle-logminer-utility.html) to capture change data from Oracle redo logs. Both Oracle Multitenant (CDB/PDB) and single-tenant Oracle architectures are supported. 
Replicator periodically queries LogMiner-populated views and processes transactional data in ascending SCN windows for reliable throughput while maintaining consistency. -## Flags - -Refer to [Replicator Flags]({% link molt/replicator-flags.md %}). - -## Usage +- Failback from CockroachDB ([`start`]({% link molt/replicator-flags.md %}#commands)): MOLT Replicator acts as an HTTP webhook sink for a single CockroachDB changefeed. Replicator receives mutations from source cluster nodes, can optionally buffer them in a CockroachDB staging cluster, and then applies time-ordered transactional batches to the target database. Mutations are applied as [`UPSERT`]({% link {{ site.current_cloud_version }}/upsert.md %}) or [`DELETE`]({% link {{ site.current_cloud_version }}/delete.md %}) statements while respecting [foreign-key]({% link {{ site.current_cloud_version }}/foreign-key.md %}) and table dependencies. ### Replicator commands -MOLT Replicator provides four commands for different replication scenarios. For detailed workflows, refer to [Common workflows](#common-workflows). +MOLT Replicator provides four commands for different replication scenarios. For example commands, refer to [Common uses](#common-uses). Use `pglogical` to replicate from PostgreSQL to CockroachDB: @@ -140,7 +63,7 @@ replicator start ### Source connection strings {{site.data.alerts.callout_success}} -Follow the security recommendations in [Connection security and credentials](#connection-security-and-credentials). +Follow the security recommendations in [Connection security and credentials]({% link molt/molt-replicator-best-practices.md %}#connection-security-and-credentials). {{site.data.alerts.end}} [`--sourceConn`]({% link molt/replicator-flags.md %}#source-conn) specifies the connection string of the source database for forward replication. @@ -195,7 +118,7 @@ For failback, [`--stagingConn`]({% link molt/replicator-flags.md %}#staging-conn ~~~ {{site.data.alerts.callout_info}} -For failback, [`--targetConn`]({% link molt/replicator-flags.md %}#target-conn) specifies the original source database (PostgreSQL, MySQL, or Oracle). For details, refer to [Failback to source database](#failback-to-source-database). +For failback, [`--targetConn`]({% link molt/replicator-flags.md %}#target-conn) specifies the original source database (PostgreSQL, MySQL, or Oracle). For details, refer to [Failback replication](#failback-replication). 
{{site.data.alerts.end}} ### Replication checkpoints @@ -209,14 +132,14 @@ For PostgreSQL, use [`--slotName`]({% link molt/replicator-flags.md %}#slot-name --slotName molt_slot ~~~ -For MySQL, set [`--defaultGTIDSet`]({% link molt/replicator-flags.md %}#default-gtid-set) to the [`cdc_cursor` value]({% link molt/molt-fetch.md %}#cdc-cursor) from the MOLT Fetch output: +For MySQL, set [`--defaultGTIDSet`]({% link molt/replicator-flags.md %}#default-gtid-set) to the [`cdc_cursor` value]({% link molt/molt-fetch.md %}#enable-replication) from the MOLT Fetch output: {% include_cached copy-clipboard.html %} ~~~ --defaultGTIDSet '4c658ae6-e8ad-11ef-8449-0242ac140006:1-29' ~~~ -For Oracle, set [`--scn`]({% link molt/replicator-flags.md %}#scn) and [`--backfillFromSCN`]({% link molt/replicator-flags.md %}#backfill-from-scn) to the [`cdc_cursor` values]({% link molt/molt-fetch.md %}#cdc-cursor) from the MOLT Fetch output: +For Oracle, set [`--scn`]({% link molt/replicator-flags.md %}#scn) and [`--backfillFromSCN`]({% link molt/replicator-flags.md %}#backfill-from-scn) to the [`cdc_cursor` values]({% link molt/molt-fetch.md %}#enable-replication) from the MOLT Fetch output: {% include_cached copy-clipboard.html %} ~~~ @@ -241,135 +164,60 @@ The staging database is used to: - Maintain consistency for time-ordered transactional batches while respecting table dependencies. - Provide restart capabilities after failures. -## Security - -Cockroach Labs **strongly** recommends the following: - -### Connection security and credentials - -{% include molt/molt-secure-connection-strings.md %} - -### CockroachDB changefeed security - -For failback scenarios, secure the connection from CockroachDB to MOLT Replicator using TLS certificates. Generate TLS certificates using self-signed certificates, certificate authorities like Let's Encrypt, or your organization's certificate management system. - -#### TLS from CockroachDB to Replicator - -Configure MOLT Replicator with server certificates using the [`--tlsCertificate`]({% link molt/replicator-flags.md %}#tls-certificate) and [`--tlsPrivateKey`]({% link molt/replicator-flags.md %}#tls-private-key) flags to specify the certificate and private key file paths. For example: - -{% include_cached copy-clipboard.html %} -~~~ shell -replicator start \ ---tlsCertificate ./certs/server.crt \ ---tlsPrivateKey ./certs/server.key \ -... -~~~ - -These server certificates must correspond to the client certificates specified in the changefeed webhook URL to ensure proper TLS handshake. - -Encode client certificates for changefeed webhook URLs: - -- Webhook URLs: Use both URL encoding and base64 encoding: `base64 -i ./client.crt | jq -R -r '@uri'` -- Non-webhook contexts: Use base64 encoding only: `base64 -w 0 ca.cert` +### Consistency modes -#### JWT authentication +MOLT Replicator supports three consistency modes for balancing throughput and transactional guarantees: -You can use JSON Web Tokens (JWT) to authorize incoming changefeed connections and restrict writes to a subset of SQL databases or user-defined schemas in the target cluster. +1. *Consistent* (failback mode only, default for CockroachDB sources): Preserves per-row order and source transaction atomicity. Concurrent transactions are controlled by [`--parallelism`]({% link molt/replicator-flags.md %}#parallelism). -Replicator supports JWT claims that allow writes to specific databases, schemas, or all of them. JWT tokens must be signed using RSA or EC keys. HMAC and `None` signatures are automatically rejected. +1. 
*BestEffort* (failback mode only): Relaxes atomicity across tables that do not have foreign key constraints between them (maintains coherence within FK-connected groups). Enable with [`--bestEffortOnly`]({% link molt/replicator-flags.md %}#best-effort-only) or allow auto-entry via [`--bestEffortWindow`]({% link molt/replicator-flags.md %}#best-effort-window) set to a positive duration (such as `1s`). -To configure JWT authentication: + {{site.data.alerts.callout_info}} + For independent tables (with no foreign key constraints), BestEffort mode applies changes immediately as they arrive, without waiting for the resolved timestamp. This provides higher throughput for tables that have no relationships with other tables. + {{site.data.alerts.end}} -1. Add PEM-formatted public signing keys to the `_replicator.jwt_public_keys` table in the staging database. +1. *Immediate* (default for PostgreSQL, MySQL, and Oracle sources): Applies updates as they arrive to Replicator with no buffering or waiting for resolved timestamps. For CockroachDB sources, provides highest throughput but requires no foreign keys on the target schema. -1. To revoke a specific token, add its `jti` value to the `_replicator.jwt_revoked_ids` table in the staging database. +### Monitoring -The Replicator process re-reads these tables every minute to pick up changes. +#### Metrics -To pass the JWT token from the changefeed to the Replicator webhook sink, use the [`webhook_auth_header` option]({% link {{ site.current_cloud_version }}/create-changefeed.md %}#options): +MOLT Replicator metrics are not enabled by default. Enable Replicator metrics by specifying the [`--metricsAddr`]({% link molt/replicator-flags.md %}#metrics-addr) flag with a port (or `host:port`) when you start Replicator. This exposes Replicator metrics at `http://{host}:{port}/_/varz`. For example, the following flag exposes metrics on port `30005`: -{% include_cached copy-clipboard.html %} -~~~ sql -CREATE CHANGEFEED ... WITH webhook_auth_header='Bearer '; +~~~ +--metricsAddr :30005 ~~~ -##### Token quickstart - -The following example uses `OpenSSL` to generate keys, but any PEM-encoded RSA or EC keys will work. - -{% include_cached copy-clipboard.html %} -~~~ shell -# Generate an EC private key using OpenSSL. -openssl ecparam -out ec.key -genkey -name prime256v1 - -# Write the public key components to a separate file. -openssl ec -in ec.key -pubout -out ec.pub - -# Upload the public key for all instances of Replicator to find it. -cockroach sql -e "INSERT INTO _replicator.jwt_public_keys (public_key) VALUES ('$(cat ec.pub)')" - -# Reload configuration, or wait one minute. -killall -HUP replicator - -# Generate a token which can write to the ycsb.public schema. -# The key can be decoded using the debugger at https://jwt.io. -# Add the contents of out.jwt to the CREATE CHANGEFEED command: -# WITH webhook_auth_header='Bearer {out.jwt}' -replicator make-jwt -k ec.key -a ycsb.public -o out.jwt -~~~ +For guidelines on using and interpreting replication metrics, refer to [Replicator Metrics]({% link molt/replicator-metrics.md %}). -##### External JWT providers +#### Logging -The `make-jwt` command also supports a [`--claim`]({% link molt/replicator-flags.md %}#claim) flag, which prints a JWT claim that can be signed by your existing JWT provider. The PEM-formatted public key or keys for that provider must be inserted into the `_replicator.jwt_public_keys` table. 
The `iss` (issuers) and `jti` (token id) fields will likely be specific to your auth provider, but the custom claim must be retained in its entirety. +By default, MOLT Replicator writes two streams of logs: operational logs to `stdout` (including `warning`, `info`, `trace`, and some errors) and final errors to `stderr`. -{{site.data.alerts.callout_success}} -You can repeat the [`-a`]({% link molt/replicator-flags.md %}#allow) flag to create a claim for multiple schemas. -{{site.data.alerts.end}} +Redirect both streams to ensure all logs are captured for troubleshooting: {% include_cached copy-clipboard.html %} -~~~ shell -replicator make-jwt -a 'database.schema' --claim -~~~ - -~~~json -{ - "iss": "replicator", - "jti": "d5ffa211-8d54-424b-819a-bc19af9202a5", - "https://github.com/cockroachdb/replicator": { - "schemas": [ - [ - "database", - "schema" - ] - ] - } -} -~~~ - -### Production considerations - -- Avoid [`--disableAuthentication`]({% link molt/replicator-flags.md %}#disable-authentication) and [`--tlsSelfSigned`]({% link molt/replicator-flags.md %}#tls-self-signed) flags in production environments. These flags should only be used for testing or development purposes. +~~~shell +# Merge both streams to console +./replicator ... 2>&1 -### Supply chain security +# Redirect both streams to a file +./replicator ... > output.log 2>&1 -Use the `version` command to verify the integrity of your MOLT Replicator build and identify potential upstream vulnerabilities. +# Merge streams to console while saving to file +./replicator > >(tee replicator.log) 2>&1 -{% include_cached copy-clipboard.html %} -~~~ shell -replicator version +# Use logDestination flag to write all logs to a file +./replicator --logDestination replicator.log ... ~~~ -The output includes: - -- Module name -- go.mod checksum -- Version +Enable debug logging with [`-v`]({% link molt/replicator-flags.md %}#verbose). For more granularity and system insights, enable trace logging with [`-vv`]({% link molt/replicator-flags.md %}#verbose). Pay close attention to warning- and error-level logs, as these indicate when Replicator is misbehaving. -Use this information to determine if your build may be subject to vulnerabilities from upstream packages. Cockroach Labs uses Dependabot to automatically upgrade Go modules, and the team regularly merges Dependabot updates to address security issues. +## Common uses -## Common workflows +### Forward replication (after initial load) -### Forward replication with initial load +In a migration that utilizes [continuous replication]({% link molt/migration-considerations-replication.md %}), run the `replicator` command after [using MOLT Fetch to perform the initial data load]({% link molt/molt-fetch.md %}#initial-bulk-load-before-replication). Run the `replicator` command with the required flags, as shown below:
@@ -378,7 +226,7 @@ Use this information to determine if your build may be subject to vulnerabilitie
-To start replication after an [initial data load with MOLT Fetch]({% link molt/migrate-load-replicate.md %}#start-fetch), use the `pglogical` command: +To start replication after an initial data load with MOLT Fetch, use the `pglogical` command: {% include_cached copy-clipboard.html %} ~~~ shell @@ -387,7 +235,7 @@ replicator pglogical
-To start replication after an [initial data load with MOLT Fetch]({% link molt/migrate-load-replicate.md %}?filters=mysql#start-fetch), use the `mylogical` command: +To start replication after an initial data load with MOLT Fetch, use the `mylogical` command: {% include_cached copy-clipboard.html %} ~~~ shell @@ -396,7 +244,7 @@ replicator mylogical
-To start replication after an [initial data load with MOLT Fetch]({% link molt/migrate-load-replicate.md %}?filters=oracle#start-fetch), use the `oraclelogminer` command: +To start replication after an initial data load with MOLT Fetch, use the `oraclelogminer` command: {% include_cached copy-clipboard.html %} ~~~ shell @@ -438,7 +286,7 @@ Specify the target schema on CockroachDB with [`--targetSchema`]({% link molt/re To replicate from the correct position, specify the appropriate checkpoint value.
-Use [`--slotName`]({% link molt/replicator-flags.md %}#slot-name) to specify the slot [created during the data load]({% link molt/molt-fetch.md %}#load-before-replication), which automatically tracks the LSN (Log Sequence Number) checkpoint: +Use [`--slotName`]({% link molt/replicator-flags.md %}#slot-name) to specify the slot [created during the data load]({% link molt/molt-fetch.md %}#initial-bulk-load-before-replication), which automatically tracks the LSN (Log Sequence Number) checkpoint: {% include_cached copy-clipboard.html %} ~~~ @@ -487,7 +335,6 @@ replicator pglogical \ --stagingCreateSchema ~~~ -For detailed steps, refer to [Load and replicate]({% link molt/migrate-load-replicate.md %}#start-replicator).
@@ -502,7 +349,6 @@ replicator mylogical \ --stagingCreateSchema ~~~ -For detailed steps, refer to [Load and replicate]({% link molt/migrate-load-replicate.md %}?filters=mysql#start-replicator).
@@ -520,63 +366,17 @@ replicator oraclelogminer \ --stagingCreateSchema ~~~ -For detailed steps, refer to [Load and replicate]({% link molt/migrate-load-replicate.md %}?filters=oracle#start-replicator). -
- -### Resume after interruption - -
- - - -
- -When resuming replication after an interruption, MOLT Replicator automatically uses the stored checkpoint to resume from the correct position. - -Rerun the same `replicator` command used during [forward replication](#forward-replication-with-initial-load), specifying the same fully-qualified [`--stagingSchema`]({% link molt/replicator-flags.md %}#staging-schema) value as before. Omit [`--stagingCreateSchema`]({% link molt/replicator-flags.md %}#staging-create-schema) and any checkpoint flags. For example: - -
-{% include_cached copy-clipboard.html %} -~~~ shell -replicator pglogical \ ---sourceConn $SOURCE \ ---targetConn $TARGET \ ---slotName molt_slot \ ---stagingSchema defaultdb._replicator -~~~ - -For detailed steps, refer to [Resume replication]({% link molt/migrate-resume-replication.md %}).
-
-{% include_cached copy-clipboard.html %} -~~~ shell -replicator mylogical \ ---sourceConn $SOURCE \ ---targetConn $TARGET \ ---stagingSchema defaultdb._replicator -~~~ - -For detailed steps, refer to [Resume replication]({% link molt/migrate-resume-replication.md %}?filters=mysql). -
- -
-{% include_cached copy-clipboard.html %} -~~~ shell -replicator oraclelogminer \ ---sourceConn $SOURCE \ ---sourcePDBConn $SOURCE_PDB \ ---sourceSchema MIGRATION_USER \ ---targetConn $TARGET \ ---stagingSchema defaultdb._replicator -~~~ +For detailed walkthroughs of migrations that use `replicator` in this way, refer to these common migration approaches: -For detailed steps, refer to [Resume replication]({% link molt/migrate-resume-replication.md %}?filters=oracle). -
+- [Delta Migration]({% link molt/migration-approach-delta.md %}) +- [Streaming Migration]({% link molt/migration-approach-streaming.md %}) +- [Active-Active Migration]({% link molt/migration-approach-active-active.md %}) -### Failback to source database +### Failback replication -When replicating from CockroachDB back to the source database, MOLT Replicator acts as a webhook sink for a CockroachDB changefeed. +A migration that utilizes [failback replication]({% link molt/migration-considerations-rollback.md %}) replicates data from the CockroachDB cluster back to the source database. In this case, MOLT Replicator acts as a webhook sink for a CockroachDB changefeed. Use the `start` command for failback: @@ -599,7 +399,7 @@ Specify the CockroachDB connection string with [`--stagingConn`]({% link molt/re --stagingConn $STAGING ~~~ -Specify the staging database name with [`--stagingSchema`]({% link molt/replicator-flags.md %}#staging-schema) in fully-qualified `database.schema` format. This should be the same staging database created during [Forward replication with initial load](#forward-replication-with-initial-load): +Specify the staging database name with [`--stagingSchema`]({% link molt/replicator-flags.md %}#staging-schema) in fully-qualified `database.schema` format. This should be the same staging database created during [Forward replication with initial load](#forward-replication-after-initial-load): {% include_cached copy-clipboard.html %} ~~~ @@ -642,52 +442,13 @@ When [creating the CockroachDB changefeed]({% link molt/migrate-failback.md %}#c Explicitly set a default `10s` [`webhook_client_timeout`]({% link {{ site.current_cloud_version }}/create-changefeed.md %}#options) value in the `CREATE CHANGEFEED` statement. This value ensures that the webhook can report failures in inconsistent networking situations and make crash loops more visible. {{site.data.alerts.end}} -## Monitoring - -### Metrics - -MOLT Replicator metrics are not enabled by default. Enable Replicator metrics by specifying the [`--metricsAddr`]({% link molt/replicator-flags.md %}#metrics-addr) flag with a port (or `host:port`) when you start Replicator. This exposes Replicator metrics at `http://{host}:{port}/_/varz`. For example, the following flag exposes metrics on port `30005`: - -~~~ ---metricsAddr :30005 -~~~ - -For guidelines on using and interpreting replication metrics, refer to [Replicator Metrics]({% link molt/replicator-metrics.md %}). - -### Logging - -By default, MOLT Replicator writes two streams of logs: operational logs to `stdout` (including `warning`, `info`, `trace`, and some errors) and final errors to `stderr`. - -Redirect both streams to ensure all logs are captured for troubleshooting: - -{% include_cached copy-clipboard.html %} -~~~shell -# Merge both streams to console -./replicator ... 2>&1 - -# Redirect both streams to a file -./replicator ... > output.log 2>&1 - -# Merge streams to console while saving to file -./replicator > >(tee replicator.log) 2>&1 - -# Use logDestination flag to write all logs to a file -./replicator --logDestination replicator.log ... -~~~ - -Enable debug logging with [`-v`]({% link molt/replicator-flags.md %}#verbose). For more granularity and system insights, enable trace logging with [`-vv`]({% link molt/replicator-flags.md %}#verbose). Pay close attention to warning- and error-level logs, as these indicate when Replicator is misbehaving. 
- -## Best practices - -### Test and validate +For a detailed walkthrough of a migration that uses `replicator` in this way, refer to this common migration approach: -To verify that your connections and configuration work properly, run MOLT Replicator in a staging environment before replicating any data in production. Use a test or development environment that closely resembles production. +- [Active-Active Migration]({% link molt/migration-approach-active-active.md %}) -### Optimize performance +### Resuming after an interruption -{% include molt/optimize-replicator-performance.md %} - -## Troubleshooting +Whether you're using Replicator to perform [forward replication](#forward-replication-after-initial-load) or [failback replication](#failback-replication), an unexpected issue may cause replication to stop. Rerun the `replicator` command as shown below:
@@ -695,20 +456,52 @@ To verify that your connections and configuration work properly, run MOLT Replic
-{% include molt/molt-troubleshooting-replication.md %} +After an interruption, MOLT Replicator automatically uses the stored checkpoint to resume from the correct position. + +Rerun the same `replicator` command used during forward replication, specifying the same fully-qualified [`--stagingSchema`]({% link molt/replicator-flags.md %}#staging-schema) value as before. Omit [`--stagingCreateSchema`]({% link molt/replicator-flags.md %}#staging-create-schema) and any checkpoint flags. For example: + +
+{% include_cached copy-clipboard.html %} +~~~ shell +replicator pglogical \ +--sourceConn $SOURCE \ +--targetConn $TARGET \ +--slotName molt_slot \ +--stagingSchema defaultdb._replicator +~~~ + +
-{% include molt/molt-troubleshooting-failback.md %} +
+{% include_cached copy-clipboard.html %} +~~~ shell +replicator mylogical \ +--sourceConn $SOURCE \ +--targetConn $TARGET \ +--stagingSchema defaultdb._replicator +~~~ -## Examples +
-For detailed examples of using MOLT Replicator usage, refer to the migration workflow tutorials: +
+{% include_cached copy-clipboard.html %} +~~~ shell +replicator oraclelogminer \ +--sourceConn $SOURCE \ +--sourcePDBConn $SOURCE_PDB \ +--sourceSchema MIGRATION_USER \ +--targetConn $TARGET \ +--stagingSchema defaultdb._replicator +~~~ -- [Load and Replicate]({% link molt/migrate-load-replicate.md %}): Load data with MOLT Fetch and set up ongoing replication with MOLT Replicator. -- [Resume Replication]({% link molt/migrate-resume-replication.md %}): Resume replication after an interruption. -- [Migration failback]({% link molt/migrate-failback.md %}): Replicate changes from CockroachDB back to the initial source database. +
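+For contrast, the first `oraclelogminer` run would include the staging schema creation and checkpoint flags that are omitted when resuming. The following is a sketch only; the SCN value is a placeholder for the snapshot SCN from your initial data load:
+
+{% include_cached copy-clipboard.html %}
+~~~ shell
+replicator oraclelogminer \
+--sourceConn $SOURCE \
+--sourcePDBConn $SOURCE_PDB \
+--sourceSchema MIGRATION_USER \
+--targetConn $TARGET \
+--stagingSchema defaultdb._replicator \
+--stagingCreateSchema \
+--scn 26685444
+~~~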
## See also +- [MOLT Replicator Installation]({% link molt/molt-replicator-installation.md %}) +- [MOLT Replicator Flags]({% link molt/replicator-flags.md %}) +- [MOLT Replicator Best Practices]({% link molt/molt-replicator-best-practices.md %}) +- [MOLT Replicator Troubleshooting]({% link molt/molt-replicator-troubleshooting.md %}) - [Migration Overview]({% link molt/migration-overview.md %}) - [Migration Strategy]({% link molt/migration-strategy.md %}) - [MOLT Fetch]({% link molt/molt-fetch.md %}) \ No newline at end of file diff --git a/src/current/molt/molt-type-mapping.md b/src/current/molt/molt-type-mapping.md new file mode 100644 index 00000000000..219569dd0e1 --- /dev/null +++ b/src/current/molt/molt-type-mapping.md @@ -0,0 +1,69 @@ +--- +title: Type Mappings +summary: Learn what the default type mappings are when using the MOLT Schema Conversion Tool and MOLT Fetch. +toc: true +docs_area: migrate +--- + +The MOLT Schema Conversion Tool and [MOLT Fetch]({% link molt/molt-fetch.md %}#handle-target-tables) can be used to automatically generate schema for a CockroachDB cluster. By default, types are mapped from the source database to CockroachDB as follows: + +- PostgreSQL types are mapped to existing CockroachDB [types]({% link {{site.current_cloud_version}}/data-types.md %}) that have the same [`OID`]({% link {{site.current_cloud_version}}/oid.md %}). +- The following MySQL types are mapped to corresponding CockroachDB types: + + | MySQL type | CockroachDB type | Notes | + |-----------------------------------------------------|-------------------------------------------------------------------------------------------|--------------------------------------------------------------| + | `CHAR`, `CHARACTER`, `VARCHAR`, `NCHAR`, `NVARCHAR` | [`VARCHAR`]({% link {{site.current_cloud_version}}/string.md %}) | Varying-length string; raises warning if BYTE semantics used | + | `TINYTEXT`, `TEXT`, `MEDIUMTEXT`, `LONGTEXT` | [`STRING`]({% link {{site.current_cloud_version}}/string.md %}) | Unlimited-length string | + | `GEOMETRY` | [`GEOMETRY`]({% link {{site.current_cloud_version}}/architecture/glossary.md %}#geometry) | Spatial type (PostGIS-style) | + | `LINESTRING` | [`LINESTRING`]({% link {{site.current_cloud_version}}/linestring.md %}) | Spatial type (PostGIS-style) | + | `POINT` | [`POINT`]({% link {{site.current_cloud_version}}/point.md %}) | Spatial type (PostGIS-style) | + | `POLYGON` | [`POLYGON`]({% link {{site.current_cloud_version}}/polygon.md %}) | Spatial type (PostGIS-style) | + | `MULTIPOINT` | [`MULTIPOINT`]({% link {{site.current_cloud_version}}/multipoint.md %}) | Spatial type (PostGIS-style) | + | `MULTILINESTRING` | [`MULTILINESTRING`]({% link {{site.current_cloud_version}}/multilinestring.md %}) | Spatial type (PostGIS-style) | + | `MULTIPOLYGON` | [`MULTIPOLYGON`]({% link {{site.current_cloud_version}}/multipolygon.md %}) | Spatial type (PostGIS-style) | + | `GEOMETRYCOLLECTION`, `GEOMCOLLECTION` | [`GEOMETRYCOLLECTION`]({% link {{site.current_cloud_version}}/geometrycollection.md %}) | Spatial type (PostGIS-style) | + | `JSON` | [`JSONB`]({% link {{site.current_cloud_version}}/jsonb.md %}) | CRDB's native JSON format | + | `TINYINT`, `INT1` | [`INT2`]({% link {{site.current_cloud_version}}/int.md %}) | 2-byte integer | + | `BLOB` | [`BYTES`]({% link {{site.current_cloud_version}}/bytes.md %}) | Binary data | + | `SMALLINT`, `INT2` | [`INT2`]({% link {{site.current_cloud_version}}/int.md %}) | 2-byte integer | + | `MEDIUMINT`, `INT`, `INTEGER`, `INT4` | [`INT4`]({% 
link {{site.current_cloud_version}}/int.md %}) | 4-byte integer | + | `BIGINT`, `INT8` | [`INT`]({% link {{site.current_cloud_version}}/int.md %}) | 8-byte integer | + | `FLOAT` | [`FLOAT4`]({% link {{site.current_cloud_version}}/float.md %}) | 32-bit float | + | `DOUBLE` | [`FLOAT`]({% link {{site.current_cloud_version}}/float.md %}) | 64-bit float | + | `DECIMAL`, `NUMERIC`, `REAL` | [`DECIMAL`]({% link {{site.current_cloud_version}}/decimal.md %}) | Validates scale ≤ precision; warns if precision > 19 | + | `BINARY`, `VARBINARY` | [`BYTES`]({% link {{site.current_cloud_version}}/bytes.md %}) | Binary data | + | `DATETIME` | [`TIMESTAMP`]({% link {{site.current_cloud_version}}/timestamp.md %}) | Date and time (no time zone) | + | `TIMESTAMP` | [`TIMESTAMPTZ`]({% link {{site.current_cloud_version}}/timestamp.md %}) | Date and time with time zone | + | `TIME` | [`TIME`]({% link {{site.current_cloud_version}}/time.md %}) | Time of day (no date) | + | `BIT` | [`VARBIT`]({% link {{site.current_cloud_version}}/bit.md %}) | Variable-length bit array | + | `DATE` | [`DATE`]({% link {{site.current_cloud_version}}/date.md %}) | Date only (no time) | + | `TINYBLOB`, `MEDIUMBLOB`, `LONGBLOB` | [`BYTES`]({% link {{site.current_cloud_version}}/bytes.md %}) | Binary data | + | `BOOL`, `BOOLEAN` | [`BOOL`]({% link {{site.current_cloud_version}}/bool.md %}) | Boolean | + +- The following Oracle types are mapped to CockroachDB types: + + | Oracle type(s) | CockroachDB type | Notes | + |---------------------------------------|--------------------------------------------------------------------------------------------------------------------------------------------------|-------------------------------------------------------------------------| + | `NCHAR`, `CHAR`, `CHARACTER` | [`CHAR`]({% link {{site.current_cloud_version}}/string.md %})(n) or [`CHAR`]({% link {{site.current_cloud_version}}/string.md %}) | Fixed-length character; falls back to unbounded if length not specified | + | `VARCHAR`, `VARCHAR2`, `NVARCHAR2` | [`VARCHAR`]({% link {{site.current_cloud_version}}/string.md %})(n) or [`VARCHAR`]({% link {{site.current_cloud_version}}/string.md %}) | Varying-length string; raises warning if BYTE semantics used | + | `STRING` | [`STRING`]({% link {{site.current_cloud_version}}/string.md %}) | Unlimited-length string | + | `SMALLINT` | [`INT2`]({% link {{site.current_cloud_version}}/int.md %}) | 2-byte integer | + | `INTEGER`, `INT`, `SIMPLE_INTEGER` | [`INT4`]({% link {{site.current_cloud_version}}/int.md %}) | 4-byte integer | + | `LONG` | [`INT8`]({% link {{site.current_cloud_version}}/int.md %}) | 8-byte integer | + | `FLOAT`, `BINARY_FLOAT`, `REAL` | [`FLOAT4`]({% link {{site.current_cloud_version}}/float.md %}) | 32-bit float | + | `DOUBLE`, `BINARY_DOUBLE` | [`FLOAT8`]({% link {{site.current_cloud_version}}/float.md %}) | 64-bit float | + | `DEC`, `NUMBER`, `DECIMAL`, `NUMERIC` | [`DECIMAL`]({% link {{site.current_cloud_version}}/decimal.md %})(p, s) or [`DECIMAL`]({% link {{site.current_cloud_version}}/decimal.md %}) | Validates scale ≤ precision; warns if precision > 19 | + | `DATE` | [`DATE`]({% link {{site.current_cloud_version}}/date.md %}) | Date only (no time) | + | `BLOB`, `RAW`, `LONG RAW` | [`BYTES`]({% link {{site.current_cloud_version}}/bytes.md %}) | Binary data | + | `JSON` | [`JSONB`]({% link {{site.current_cloud_version}}/jsonb.md %}) | CRDB's native JSON format | + | `CLOB`, `NCLOB` | [`STRING`]({% link {{site.current_cloud_version}}/string.md %}) | Treated as large text | + | 
`BOOLEAN` | [`BOOL`]({% link {{site.current_cloud_version}}/bool.md %}) | Boolean | + | `TIMESTAMP` | [`TIMESTAMP`]({% link {{site.current_cloud_version}}/timestamp.md %}) or [`TIMESTAMPTZ`]({% link {{site.current_cloud_version}}/timestamp.md %}) | If `WITH TIME ZONE` → `TIMESTAMPTZ`, else `TIMESTAMP` | + | `ROWID`, `UROWID` | [`STRING`]({% link {{site.current_cloud_version}}/string.md %}) | Treated as opaque identifier | + | `SDO_GEOMETRY` | [`GEOMETRY`]({% link {{site.current_cloud_version}}/architecture/glossary.md %}#geometry) | Spatial type (PostGIS-style) | + | `XMLTYPE` | [`STRING`]({% link {{site.current_cloud_version}}/string.md %}) | Stored as text | + +## See also + +- [MOLT Fetch]({% link molt/molt-fetch.md %}) +- [MOLT Schema Conversion Tool]({% link cockroachcloud/migrations-page.md %}) \ No newline at end of file diff --git a/src/current/molt/replicator-flags.md b/src/current/molt/replicator-flags.md index ad6ec3b8664..f09ae8bf84e 100644 --- a/src/current/molt/replicator-flags.md +++ b/src/current/molt/replicator-flags.md @@ -5,7 +5,24 @@ toc: false docs_area: migrate --- -This page lists all available flags for the [MOLT Replicator commands]({% link molt/molt-replicator.md %}#commands): `start`, `pglogical`, `mylogical`, `oraclelogminer`, and `make-jwt`. +## Commands + +MOLT Replicator provides the following commands: + +| Command | Description | +|------------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| +| `pglogical` | Replicate from PostgreSQL source to CockroachDB target using logical replication. | +| `mylogical` | Replicate from MySQL source to CockroachDB target using GTID-based replication. | +| `oraclelogminer` | Replicate from Oracle source to CockroachDB target using Oracle LogMiner. | +| `start` | Replicate from CockroachDB source to PostgreSQL, MySQL, or Oracle target ([failback mode]({% link molt/molt-replicator.md %}#failback-replication)). Requires a CockroachDB changefeed with rangefeeds enabled. | +| `make-jwt` | Generate JWT tokens for authorizing changefeed connections in failback scenarios. Supports signing tokens with RSA or EC keys, or generating claims for external JWT providers. For details, refer to [JWT authentication]({% link molt/molt-replicator-best-practices.md %}#jwt-authentication). | +| `version` | Display version information and Go module dependencies with checksums. For details, refer to [Supply chain security]({% link molt/molt-replicator-best-practices.md %}#supply-chain-security). | + +For command-specific flags and examples, refer to MOLT Replicator's [How it works]({% link molt/molt-replicator.md %}#how-it-works) and [Common uses]({% link molt/molt-replicator.md %}#common-uses) documentation. + +## Flags + +This page lists all available flags for the [MOLT Replicator commands](#commands): `start`, `pglogical`, `mylogical`, `oraclelogminer`, and `make-jwt`. 
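+For example, a `pglogical` invocation might combine several of the flags below. This is a sketch only; the connection strings, slot name, and staging schema are placeholders:
+
+{% include_cached copy-clipboard.html %}
+~~~ shell
+replicator pglogical \
+--sourceConn $SOURCE \
+--targetConn $TARGET \
+--slotName molt_slot \
+--stagingSchema defaultdb._replicator \
+--schemaRefresh 5m \
+--metricsAddr :30005 \
+-v
+~~~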
| Flag | Commands | Type | Description | |---------------------------------------------------------------------------------------------|-----------------------------------------------------|------------|--------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------| @@ -53,7 +70,7 @@ This page lists all available flags for the [MOLT Replicator commands]({% link m | `--schemaRefresh` | `start`, `pglogical`, `mylogical`, `oraclelogminer` | `DURATION` | How often a watcher will refresh its schema. If this value is zero or negative, refresh behavior will be disabled.

**Default:** `1m0s` | | `--scn` | `oraclelogminer` | `INT` | **Required** the first time `replicator` is run. The snapshot System Change Number (SCN) from the initial data load, which provides a replication marker for streaming changes. | | `--scnWindowSize` | `oraclelogminer` | `INT` | The maximum size of SCN bounds per pull iteration from LogMiner. This helps prevent timeout errors when processing large SCN ranges. Set to `0` or a negative value to disable the cap.

**Default:** `3250` | -| `--slotName` | `pglogical` | `STRING` | **Required.** PostgreSQL replication slot name. Must match the slot name specified with `--pglogical-replication-slot-name` in the [MOLT Fetch command]({% link molt/molt-fetch.md %}#load-before-replication).

**Default:** `"replicator"` | +| `--slotName` | `pglogical` | `STRING` | **Required.** PostgreSQL replication slot name. Must match the slot name specified with `--pglogical-replication-slot-name` in the [MOLT Fetch command]({% link molt/molt-fetch.md %}#initial-bulk-load-before-replication).

**Default:** `"replicator"` | | `--sourceConn` | `pglogical`, `mylogical`, `oraclelogminer` | `STRING` | The source database's connection string. When replicating from Oracle, this is the connection string of the Oracle container database (CDB). | | `--sourcePDBConn` | `oraclelogminer` | `STRING` | Connection string for the Oracle pluggable database (PDB). Only required when using an [Oracle multitenant configuration](https://docs.oracle.com/en/database/oracle/oracle-database/21/cncpt/CDBs-and-PDBs.html). [`--sourceConn`](#source-conn) **must** be included. | | `--sourceSchema` | `oraclelogminer` | `STRING` | **Required.** Source schema name on Oracle where tables will be replicated from. | diff --git a/src/current/releases/molt.md b/src/current/releases/molt.md index accc7cdaa33..fda58ef22ca 100644 --- a/src/current/releases/molt.md +++ b/src/current/releases/molt.md @@ -99,9 +99,9 @@ Cockroach Labs recommends using the latest available version of each tool. Refer `molt` 1.3.1 is [available](#installation). -- MOLT Fetch now supports [sharding]({% link molt/molt-fetch.md %}#table-sharding) of primary keys of any data type on PostgreSQL 11+ sources. This can be enabled with the [`--use-stats-based-sharding`]({% link molt/molt-fetch.md %}#global-flags) flag. -- Added the [`--ignore-replication-check`]({% link molt/molt-fetch.md %}#global-flags) flag to allow data loads with planned downtime and no replication setup. The `--pglogical-ignore-wal-check` flag has been removed. -- Added the `--enableParallelApplies` [replication flag]({% link molt/molt-replicator.md %}#flags) to enable parallel application of independent table groups during replication. By default, applies are synchronous. When enabled, this increases throughput at the cost of increased target pool and memory usage. +- MOLT Fetch now supports [sharding]({% link molt/molt-fetch.md %}#shard-tables-for-concurrent-export) of primary keys of any data type on PostgreSQL 11+ sources. This can be enabled with the [`--use-stats-based-sharding`]({% link molt/molt-fetch-commands-and-flags.md %}#global-flags) flag. +- Added the [`--ignore-replication-check`]({% link molt/molt-fetch-commands-and-flags.md %}#global-flags) flag to allow data loads with planned downtime and no replication setup. The `--pglogical-ignore-wal-check` flag has been removed. +- Added the `--enableParallelApplies` [replication flag]({% link molt/replicator-flags.md %}) to enable parallel application of independent table groups during replication. By default, applies are synchronous. When enabled, this increases throughput at the cost of increased target pool and memory usage. - Improved cleanup logic for scheduled tasks to ensure progress reporting and prevent indefinite hangs. - Added parallelism gating to ensure the parallelism setting is smaller than the `targetMaxPoolSize`. This helps prevent a potential indefinite hang. - Added new metrics that track start and end times for progress reports (`core_progress_reports_started_count` and `core_progress_reports_ended_count`) and error reports (`core_error_reports_started_count` and `core_error_reports_ended_count`). These provide visibility into the core sequencer progress and help identify hangs in the applier and progress tracking pipeline. @@ -151,7 +151,7 @@ Cockroach Labs recommends using the latest available version of each tool. Refer `molt` 1.2.6 is [available](#installation). 
-- Fixed a bug in [`--direct-copy` mode]({% link molt/molt-fetch.md %}#direct-copy) that occurred when [`--case-sensitive`]({% link molt/molt-fetch.md %}#global-flags) was set to `false` (default). Previously, the `COPY` query could use incorrect column names in some cases during data transfer, causing errors. The query now uses the correct column names. +- Fixed a bug in [`--direct-copy` mode]({% link molt/molt-fetch.md %}#direct-copy) that occurred when [`--case-sensitive`]({% link molt/molt-fetch-commands-and-flags.md %}#global-flags) was set to `false` (default). Previously, the `COPY` query could use incorrect column names in some cases during data transfer, causing errors. The query now uses the correct column names. - Fixed a bug in how origin messages were handled during replication from PostgreSQL sources. This allows replication to successfully continue. - `ENUM` types can now be replicated from MySQL 8.0 sources. @@ -169,7 +169,7 @@ Cockroach Labs recommends using the latest available version of each tool. Refer - MOLT Fetch failback to CockroachDB is now disallowed. - MOLT Verify can now compare tables that are named differently on the source and target schemas. - The `molt` logging date format is now period-delimited for Windows compatibility. -- During replication, an index is now created on all tables by default, improving replication performance. Because index creation can cause the replication process to initialize more slowly, this behavior can be disabled using the `--stageDisableCreateTableReaderIndex` [replication flag]({% link molt/molt-replicator.md %}#flags). +- During replication, an index is now created on all tables by default, improving replication performance. Because index creation can cause the replication process to initialize more slowly, this behavior can be disabled using the `--stageDisableCreateTableReaderIndex` [replication flag]({% link molt/replicator-flags.md %}#stage-disable-create-table-reader-index). - Added a failback metric that tracks the time to write a source commit to the staging schema for a given mutation. - Added a failback metric that tracks the time to write a source commit to the target database for a given mutation. @@ -177,41 +177,41 @@ Cockroach Labs recommends using the latest available version of each tool. Refer `molt` 1.2.3 is [available](#installation). -- MOLT Fetch users can now set [`--table-concurrency`]({% link molt/molt-fetch.md %}#global-flags) and [`--export-concurrency`]({% link molt/molt-fetch.md %}#global-flags) to values greater than `1` for MySQL sources. -- MOLT Fetch now supports case-insensitive comparison of table and column names by default. Previously, case-sensitive comparisons could result in `no matching table on target` errors. To disable case-sensitive comparisons explicitly, set [`--case-sensitive=false`]({% link molt/molt-fetch.md %}#global-flags). If `=` is **not** included (e.g., `--case-sensitive false`), this is interpreted as `--case-sensitive` (i.e., `--case-sensitive=true`). +- MOLT Fetch users can now set [`--table-concurrency`]({% link molt/molt-fetch-commands-and-flags.md %}#global-flags) and [`--export-concurrency`]({% link molt/molt-fetch-commands-and-flags.md %}#global-flags) to values greater than `1` for MySQL sources. +- MOLT Fetch now supports case-insensitive comparison of table and column names by default. Previously, case-sensitive comparisons could result in `no matching table on target` errors. 
To disable case-sensitive comparisons explicitly, set [`--case-sensitive=false`]({% link molt/molt-fetch-commands-and-flags.md %}#global-flags). If `=` is **not** included (e.g., `--case-sensitive false`), this is interpreted as `--case-sensitive` (i.e., `--case-sensitive=true`). ### February 5, 2025 `molt` 1.2.2 is [available](#installation). -- Added an [`--import-region`]({% link molt/molt-fetch.md %}#global-flags) flag that is used to set the `AWS_REGION` query parameter explicitly in the [`s3` URL]({% link molt/molt-fetch.md %}#bucket-path). -- Fixed the [`truncate-if-exists`]({% link molt/molt-fetch.md %}#target-table-handling) schema mode for cases where there are uppercase table or schema names. +- Added an [`--import-region`]({% link molt/molt-fetch-commands-and-flags.md %}#global-flags) flag that is used to set the `AWS_REGION` query parameter explicitly in the [`s3` URL]({% link molt/molt-fetch.md %}#bucket-path). +- Fixed the [`truncate-if-exists`]({% link molt/molt-fetch.md %}#handle-target-tables) schema mode for cases where there are uppercase table or schema names. - Fixed an issue with unsigned `BIGINT` values overflowing in replication. -- Added a `--schemaRefresh` [replication flag]({% link molt/molt-replicator.md %}#flags) that is used to configure the schema watcher refresh delay in the replication phase. Previously, the refresh delay was set to a constant value of 1 minute. Set the flag as follows: `--replicator-flags "--schemaRefresh {value}"`. +- Added a `--schemaRefresh` [replication flag]({% link molt/replicator-flags.md %}#schema-refresh) that is used to configure the schema watcher refresh delay in the replication phase. Previously, the refresh delay was set to a constant value of 1 minute. Set the flag as follows: `--replicator-flags "--schemaRefresh {value}"`. ### December 13, 2024 `molt` 1.2.1 is [available](#installation). -- MOLT Fetch users now can use [`--assume-role`]({% link molt/molt-fetch.md %}#global-flags) to specify a service account for assume role authentication to cloud storage. `--assume-role` must be used with `--use-implicit-auth`, or the flag will be ignored. +- MOLT Fetch users now can use [`--assume-role`]({% link molt/molt-fetch-commands-and-flags.md %}#global-flags) to specify a service account for assume role authentication to cloud storage. `--assume-role` must be used with `--use-implicit-auth`, or the flag will be ignored. - MySQL 5.7 and later are now supported with MOLT Fetch replication modes. - Fetch replication mode now defaults to a less verbose `INFO` logging level. To specify `DEBUG` logging, pass in the `--replicator-flags '-v'` setting, or `--replicator-flags '-vv'` for trace logging. - MySQL columns of type `BIGINT UNSIGNED` or `SERIAL` are now auto-mapped to [`DECIMAL`]({% link {{ site.current_cloud_version }}/decimal.md %}) type in CockroachDB. MySQL regular `BIGINT` types are mapped to [`INT`]({% link {{ site.current_cloud_version }}/int.md %}) type in CockroachDB. -- The `pglogical` replication workflow was modified in order to enforce safer and simpler defaults for the [`data-load`]({% link molt/molt-fetch.md %}#fetch-mode), `data-load-and-replication`, and `replication-only` workflows for PostgreSQL sources. Fetch now ensures that the publication is created before the slot, and that `replication-only` defaults to using publications and slots created either in previous Fetch runs or manually. 
+- The `pglogical` replication workflow was modified to enforce safer and simpler defaults for the [`data-load`]({% link molt/molt-fetch.md %}#define-fetch-mode), `data-load-and-replication`, and `replication-only` workflows for PostgreSQL sources. Fetch now ensures that the publication is created before the slot, and that `replication-only` defaults to using publications and slots created either in previous Fetch runs or manually.
-- Fetch now logs the name of the staging database in the target CockroachDB cluster used to store metadata for [replication modes]({% link molt/molt-fetch.md %}#fetch-mode). +- Fetch now logs the name of the staging database in the target CockroachDB cluster used to store metadata for [replication modes]({% link molt/molt-fetch.md %}#define-fetch-mode). - String [primary keys]({% link {{ site.current_cloud_version }}/primary-key.md %}) that use `C` [collations]({% link {{ site.current_cloud_version }}/collate.md %}) on PostgreSQL can now be compared to the default `en_US.utf8` on CockroachDB. - MOLT is now distributed under the [Cockroach Labs Product License Agreement](https://www.cockroachlabs.com/cockroach-labs-product-license-agreement/), which is bundled with the binary. @@ -219,7 +219,7 @@ Cockroach Labs recommends using the latest available version of each tool. Refer `molt` 1.1.7 is [available](#installation). -- When a [Fetch transformation rule]({% link molt/molt-fetch.md %}#transformations) is used to rename a table or map partitioned tables, a script in the format `partitionTableScript.{timestamp}.ts` is now automatically generated to ensure that [replication]({% link molt/molt-fetch.md %}#fetch-mode) works properly if enabled. +- When a [Fetch transformation rule]({% link molt/molt-fetch.md %}#define-transformations) is used to rename a table or map partitioned tables, a script in the format `partitionTableScript.{timestamp}.ts` is now automatically generated to ensure that [replication]({% link molt/molt-fetch.md %}#define-fetch-mode) works properly if enabled. ### August 19, 2024 @@ -232,8 +232,8 @@ Cockroach Labs recommends using the latest available version of each tool. Refer `molt` 1.1.5 is [available](#installation). - **Deprecated** the `--ongoing-replication` flag in favor of `--mode data-load-and-replication`, using the new `--mode` flag. Users should replace all instances of `--ongoing-replication` with `--mode data-load-and-replication`. -- Fetch can now be run in an export-only mode by specifying [`--mode export-only`]({% link molt/molt-fetch.md %}#fetch-mode). This will export all the data in `csv` or `csv.gz` format to the specified cloud or local store. -- Fetch can now be run in an import-only mode by specifying [`--mode import-only`]({% link molt/molt-fetch.md %}#fetch-mode). This will load all data in the specified cloud or local store into the target CockroachDB database, effectively skipping the export data phase. +- Fetch can now be run in an export-only mode by specifying [`--mode export-only`]({% link molt/molt-fetch.md %}#define-fetch-mode). This will export all the data in `csv` or `csv.gz` format to the specified cloud or local store. +- Fetch can now be run in an import-only mode by specifying [`--mode import-only`]({% link molt/molt-fetch.md %}#define-fetch-mode). This will load all data in the specified cloud or local store into the target CockroachDB database, effectively skipping the export data phase. - Strings for the `--mode` flag are now word-separated by hyphens instead of underscores. For example, `replication-only` instead of `replication_only`. ### August 8, 2024 @@ -241,7 +241,7 @@ Cockroach Labs recommends using the latest available version of each tool. Refer `molt` 1.1.4 is [available](#installation). - Added a replication-only mode for Fetch that allows the user to run ongoing replication without schema creation or initial data load. 
This requires users to set `--mode replication_only` and `--replicator-flags` to specify the `defaultGTIDSet` ([MySQL](https://github.com/cockroachdb/replicator/wiki/MYLogical)) or `slotName` ([PostgreSQL](https://github.com/cockroachdb/replicator/wiki/PGLogical)). -- Partitioned tables can now be mapped to renamed tables on the target database, using the Fetch [transformations framework]({% link molt/molt-fetch.md %}#transformations). +- Partitioned tables can now be mapped to renamed tables on the target database, using the Fetch [transformations framework]({% link molt/molt-fetch.md %}#define-transformations). - Added a new `--metrics-scrape-interval` flag to allow users to specify their Prometheus scrape interval and apply a sleep at the end to allow for the final metrics to be scraped. - Previously, there was a mismatch between the errors logged in log lines and those recorded in the exceptions table when an `IMPORT INTO` or `COPY FROM` operation failed due to a non-PostgreSQL error. Now, all errors will lead to an exceptions table entry that allows the user to continue progress from a certain table's file. - Fixed a bug that will allow Fetch to properly determine a GTID if there are multiple `source_uuids`. @@ -251,8 +251,8 @@ Cockroach Labs recommends using the latest available version of each tool. Refer `molt` 1.1.3 is [available](#installation). - `'infinity'::timestamp` values can now be moved with Fetch. -- Fixed an issue where connections were not being closed immediately after sharding was completed. This could lead to errors if the [maximum number of connections]({% link molt/molt-fetch.md %}#best-practices) was set to a low value. -- Fetch users can now exclude specific tables from migration using the [`--table-exclusion-filter` flag]({% link molt/molt-fetch.md %}#global-flags). +- Fixed an issue where connections were not being closed immediately after sharding was completed. This could lead to errors if the [maximum number of connections]({% link molt/molt-fetch-best-practices.md %}#configure-the-source-database-and-connection) was set to a low value. +- Fetch users can now exclude specific tables from migration using the [`--table-exclusion-filter` flag]({% link molt/molt-fetch-commands-and-flags.md %}#global-flags). ### July 18, 2024 @@ -260,16 +260,16 @@ Cockroach Labs recommends using the latest available version of each tool. Refer - Fetch users can now specify columns to exclude from table migrations in order to migrate a subset of their data. This is supported in the schema creation, export, import, and direct copy phases. - Fetch now automatically maps a partitioned table from a PostgreSQL source to the target CockroachDB schema. -- Fetch now supports column exclusions and computed column mappings via a new [transformations framework]({% link molt/molt-fetch.md %}#transformations). -- The new Fetch [`--transformations-file`]({% link molt/molt-fetch.md %}#global-flags) flag specifies a JSON file for schema/table/column transformations, which has validation utilities built in. +- Fetch now supports column exclusions and computed column mappings via a new [transformations framework]({% link molt/molt-fetch.md %}#define-transformations). +- The new Fetch [`--transformations-file`]({% link molt/molt-fetch-commands-and-flags.md %}#global-flags) flag specifies a JSON file for schema/table/column transformations, which has validation utilities built in. ### July 10, 2024 `molt` 1.1.1 is [available](#installation). 
-- Fixed a bug that led to incorrect list continuation file behavior if a trailing slash was provided in [`--bucket-path`]({% link molt/molt-fetch.md %}#global-flags). +- Fixed a bug that led to incorrect list continuation file behavior if a trailing slash was provided in [`--bucket-path`]({% link molt/molt-fetch-commands-and-flags.md %}#global-flags). - Fixed a bug with extracting the filename from a failed import URL. Previously, an older filename was being used, which could result in duplicated data. Now, the filename that is used in import matches what is stored in the exceptions log table. -- Added a [`--use-implicit-auth`]({% link molt/molt-fetch.md %}#global-flags) flag that determines whether [implicit authentication]({% link {{ site.current_cloud_version }}/cloud-storage-authentication.md %}) is used for cloud storage import URIs. +- Added a [`--use-implicit-auth`]({% link molt/molt-fetch-commands-and-flags.md %}#global-flags) flag that determines whether [implicit authentication]({% link {{ site.current_cloud_version }}/cloud-storage-authentication.md %}) is used for cloud storage import URIs. ### July 8, 2024 @@ -284,14 +284,14 @@ Cockroach Labs recommends using the latest available version of each tool. Refer `molt` 1.0.0 is [available](#installation). -- Renamed the `--table-splits` flag to [`--concurrency-per-table`]({% link molt/molt-fetch.md %}#global-flags), which is more descriptive. -- Increased the default value of [`--import-batch-size`]({% link molt/molt-fetch.md %}#global-flags) to `1000`. This leads to better performance on the target post-migration. Each individual import job will take longer, since more data is now imported in each batch, but the sum total of all jobs should take the same (or less) time. +- Renamed the `--table-splits` flag to [`--concurrency-per-table`]({% link molt/molt-fetch-commands-and-flags.md %}#global-flags), which is more descriptive. +- Increased the default value of [`--import-batch-size`]({% link molt/molt-fetch-commands-and-flags.md %}#global-flags) to `1000`. This leads to better performance on the target post-migration. Each individual import job will take longer, since more data is now imported in each batch, but the sum total of all jobs should take the same (or less) time. ### May 29, 2024 `molt` 0.3.0 is [available](#installation). -- Added an [`--import-batch-size`]({% link molt/molt-fetch.md %}#global-flags) flag, which configures the number of files to be imported in each `IMPORT` job. +- Added an [`--import-batch-size`]({% link molt/molt-fetch-commands-and-flags.md %}#global-flags) flag, which configures the number of files to be imported in each `IMPORT` job. - In some cases on the previous version, binaries would not work due to how `molt` was being built. Updated the build method to use static linking, which creates binaries that should be more portable. - [`VARBIT`]({% link {{ site.current_cloud_version }}/bit.md %}) <> [`BOOL`]({% link {{ site.current_cloud_version }}/bool.md %}) conversion is now allowed for Fetch and Verify. The bit array is first converted to `UINT64`. A resulting `1` or `0` is converted to `true` or `false` accordingly. If the `UINT64` is another value, an error is emitted. @@ -300,7 +300,7 @@ Cockroach Labs recommends using the latest available version of each tool. Refer `molt` 0.2.1 is [available](#installation). - MOLT tools now enforce secure connections to databases as a default. The `--allow-tls-mode-disable` flag allows users to override that behavior if secure access is not possible. 
-- When using MySQL as a source, [`--table-concurrency`]({% link molt/molt-fetch.md %}#global-flags) and [`--export-concurrency`]({% link molt/molt-fetch.md %}#global-flags) are strictly set to `1`. +- When using MySQL as a source, [`--table-concurrency`]({% link molt/molt-fetch-commands-and-flags.md %}#global-flags) and [`--export-concurrency`]({% link molt/molt-fetch-commands-and-flags.md %}#global-flags) are strictly set to `1`. - Fixed a bug involving history retention for [`DECIMAL`]({% link {{ site.current_cloud_version }}/decimal.md %}) values. ### May 3, 2024 @@ -308,9 +308,9 @@ Cockroach Labs recommends using the latest available version of each tool. Refer `molt` 0.2.0 is [available](#installation). - Fetch now supports CockroachDB [multi-region tables]({% link {{ site.current_cloud_version }}/multiregion-overview.md %}). -- Fetch now supports continuous replication for PostgreSQL and MySQL source databases via the [`--ongoing-replication`]({% link molt/molt-fetch.md %}#global-flags) flag. When Fetch finishes the initial data load phase, it will start the replicator process as a subprocess, which runs indefinitely until the user ends the process with a `SIGTERM` (`ctrl-c`). -- Replicator flags for ([PostgreSQL](https://github.com/cockroachdb/replicator/wiki/PGLogical#postgresql-logical-replication) and [MySQL](https://github.com/cockroachdb/replicator/wiki/MYLogical#mysqlmariadb-replication)) are now supported, allowing users to further configure the [`--ongoing-replication`]({% link molt/molt-fetch.md %}#global-flags) mode for their use case. -- Added the [`--type-map-file`]({% link molt/molt-fetch.md %}#global-flags) flag, which enables custom type mapping for schema creation. +- Fetch now supports continuous replication for PostgreSQL and MySQL source databases via the [`--ongoing-replication`]({% link molt/molt-fetch-commands-and-flags.md %}#global-flags) flag. When Fetch finishes the initial data load phase, it will start the replicator process as a subprocess, which runs indefinitely until the user ends the process with a `SIGTERM` (`ctrl-c`). +- Replicator flags for ([PostgreSQL](https://github.com/cockroachdb/replicator/wiki/PGLogical#postgresql-logical-replication) and [MySQL](https://github.com/cockroachdb/replicator/wiki/MYLogical#mysqlmariadb-replication)) are now supported, allowing users to further configure the [`--ongoing-replication`]({% link molt/molt-fetch-commands-and-flags.md %}#global-flags) mode for their use case. +- Added the [`--type-map-file`]({% link molt/molt-fetch-commands-and-flags.md %}#global-flags) flag, which enables custom type mapping for schema creation. - Fixed a bug where primary key positions could be missed when creating a schema with multiple primary keys. - Added a default mode for MySQL sources that ensures consistency and does not leverage parallelism. New text is displayed that alerts the user and links to documentation in cases where fetching from MySQL might not be consistent. - Logging for continuation tokens is now omitted when data export does not successfully complete.