29 changes: 17 additions & 12 deletions antora.yml
Original file line number Diff line number Diff line change
@@ -8,19 +8,26 @@ nav:

asciidoc:
attributes:
#General attributes
company: 'DataStax'
support-url: 'https://www.ibm.com/mysupport/s/'
#Other product attributes
spark-reg: 'Apache Spark(TM)'
spark: 'Apache Spark'
spark-short: 'Spark'
hadoop-reg: 'Apache Hadoop(R)'
cass-reg: 'Apache Cassandra(R)'
cass: 'Apache Cassandra'
cass-short: 'Cassandra'
cql: 'Cassandra Query Language (CQL)'
cql-shell: 'CQL shell'
cql-console: 'CQL console'
cql-proxy-url: 'https://github.com/datastax/cql-proxy'
java-driver-url: 'https://github.com/apache/cassandra-java-driver'
dse: 'DataStax Enterprise (DSE)'
dse-short: 'DSE'
metrics-collector: 'DSE Metrics Collector'
hcd: 'Hyper-Converged Database (HCD)'
hcd-short: 'HCD'
mc: 'Mission Control'
#Astra DB attributes
astra-db: 'Astra DB'
astra: 'Astra'
db-serverless: 'Serverless (non-vector)'
@@ -31,14 +31,9 @@ asciidoc:
scb: 'Secure Connect Bundle (SCB)'
scb-short: 'SCB'
scb-brief: 'Secure Connect Bundle'
#Sideloader has a specific name in this repo. It is not identical to the one in the Serverless repo.
sstable-sideloader: '{astra-db} Sideloader'
#devops api attributes
devops-api: 'DevOps API'
devops-api-ref-url: 'xref:astra-api-docs:ROOT:attachment$devops-api/index.html'
#data api attributes
data-api: 'Data API'
#Migration docs attributes
product: 'Zero Downtime Migration'
product-short: 'ZDM'
product-proxy: 'ZDM Proxy'
@@ -48,9 +50,12 @@ asciidoc:
product-automation-repo: 'https://github.com/datastax/zdm-proxy-automation'
product-automation-shield: 'image:https://img.shields.io/github/v/release/datastax/zdm-proxy-automation?label=latest[alt="Latest zdm-proxy-automation release on GitHub",link="{product-automation-repo}/releases"]'
product-demo: 'ZDM Demo Client'
dsbulk-loader: 'DSBulk Loader'
dsbulk-loader-repo: 'https://github.com/datastax/dsbulk'
cass-migrator: 'Cassandra Data Migrator'
dsbulk: 'DataStax Bulk Loader (DSBulk)'
dsbulk-short: 'DSBulk'
dsbulk-repo: 'https://github.com/datastax/dsbulk'
dsbulk-migrator: 'DSBulk Migrator'
cass-migrator: 'Cassandra Data Migrator (CDM)'
cass-migrator-short: 'CDM'
cass-migrator-repo: 'https://github.com/datastax/cassandra-data-migrator'
cass-migrator-shield: 'image:https://img.shields.io/github/v/release/datastax/cassandra-data-migrator?label=latest[alt="Latest cassandra-data-migrator release on GitHub",link="{cass-migrator-repo}/packages"]'
sstable-sideloader: '{astra-db} Sideloader'
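Several of the attributes above are defined in terms of other attributes; for example, `sstable-sideloader` embeds `{astra-db}`, and the shield attributes embed the `*-repo` URLs. Asciidoctor resolves these nested references at render time. As a rough illustration of that substitution, with `sed` standing in for the resolver (this is not Asciidoctor's actual implementation):

```shell
# Minimal sketch of nested attribute substitution (not Asciidoctor's resolver).
astra_db='Astra DB'
sstable_sideloader='{astra-db} Sideloader'

# Replace the {astra-db} reference with its value, as the renderer would.
printf '%s\n' "$sstable_sideloader" | sed "s/{astra-db}/$astra_db/"
```

Because resolution is lazy, the order of attribute definitions within `asciidoc.attributes` does not matter, but every referenced attribute must be defined somewhere in the playbook or component descriptor.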
184 changes: 163 additions & 21 deletions local-preview-playbook.yml
@@ -60,34 +60,176 @@ asciidoc:
xrefstyle: short
# CUSTOM ATTRIBUTES
company: 'DataStax'
astra_db: 'Astra DB'
astra_stream: 'Astra Streaming'
astra-stream: 'Astra Streaming' # AIM: Once all instances of astra_stream are removed, keep only the astra-stream attribute.
astra_ui: 'Astra Portal'
astra_cli: 'Astra CLI'
astra-cli: 'Astra CLI' # AIM: Once all instances of astra_cli are removed, keep only the astra-cli attribute.
trust-center: 'IBM Trust Center'
trust-center-url: 'https://www.ibm.com/trust'
trust-center-link: '{trust-center-url}[{trust-center}]'
support-url: 'https://www.ibm.com/mysupport/s/'
dsbulk: 'DataStax Bulk Loader (DSBulk)'
dsbulk-short: 'DSBulk'
dsbulk-repo: 'https://github.com/datastax/dsbulk'
astra: 'Astra'
astra-db: 'Astra DB'
astra-ui: 'Astra Portal'
astra-url: 'https://astra.datastax.com'
astra-ui-link: '{astra-url}[{astra-ui}^]'
db-classic: 'Managed Cluster'
db-serverless: 'Serverless (non-vector)'
db-serverless-vector: 'Serverless (vector)'
scb: 'Secure Connect Bundle (SCB)'
scb-short: 'SCB'
scb-brief: 'Secure Connect Bundle'
astra-streaming-examples-repo: 'https://raw.githubusercontent.com/datastax/astra-streaming-examples/master'
luna-streaming-examples-repo: 'https://raw.githubusercontent.com/datastaxdevs/luna-streaming-examples/main'
support_url: 'https://www.ibm.com/mysupport/s/'
glossary-url: 'https://docs.datastax.com/en/glossary/docs/index.html#'
emoji-tada: "🎉"
emoji-rocket: "🚀"
emoji-smile: "&#128512;"
devops-api: 'DevOps API'
devops-api-ref-url: 'xref:astra-api-docs:ROOT:attachment$devops-api/index.html'
astra-cli: 'Astra CLI'
astra-stream: 'Astra Streaming'
starlight-kafka: 'Starlight for Kafka'
starlight-rabbitmq: 'Starlight for RabbitMQ'
astra-streaming-examples-repo: 'https://github.com/datastax/astra-streaming-examples'
sstable-sideloader: '{astra-db} Sideloader'
zdm: 'Zero Downtime Migration'
zdm-short: 'ZDM'
zdm-proxy: 'ZDM Proxy'
cass-migrator: 'Cassandra Data Migrator (CDM)'
cass-migrator-short: 'CDM'
hcd: 'Hyper-Converged Database (HCD)'
hcd-short: 'HCD'
dse: 'DataStax Enterprise (DSE)'
cassandra: 'Apache Cassandra(R)'
classic: 'classic'
classic_cap: 'Classic'
serverless: 'serverless'
serverless_cap: 'Serverless'
py-client-api-ref-url: 'xref:astra-api-docs:ROOT:attachment$python-client-1x/astrapy'
ts-client-api-ref-url: 'xref:astra-api-docs:ROOT:attachment$typescript-client-1x'
java-client-api-ref-url: 'xref:astra-api-docs:ROOT:attachment$java-client-1x'
dse-short: 'DSE'
metrics-collector: 'DSE Metrics Collector'
mc: 'Mission Control'
opscenter: 'DSE OpsCenter'
studio: 'DataStax Studio'
cass-reg: 'Apache Cassandra(R)'
cass: 'Apache Cassandra'
cass-short: 'Cassandra'
cql: 'Cassandra Query Language (CQL)'
cql-shell: 'CQL shell'
cql-console: 'CQL console'
cql-service: 'CQL Service'
pulsar-reg: 'Apache Pulsar(TM)'
pulsar: 'Apache Pulsar'
pulsar-short: 'Pulsar'
spark-reg: 'Apache Spark(TM)'
spark: 'Apache Spark'
spark-short: 'Spark'
spark-connect: 'Spark Connect'
spark-connector: 'Apache Cassandra Spark Connector'
spark-connector-short: 'Spark Connector'
kafka-reg: 'Apache Kafka(R)'
kafka: 'Apache Kafka'
kafka-short: 'Kafka'
kafka-connect: 'Kafka Connect'
kafka-connector: 'DataStax Apache Kafka Connector'
kafka-connector-short: 'Kafka Connector'
solr-reg: 'Apache Solr(TM)'
solr: 'Apache Solr'
solr-short: 'Solr'
lucene-reg: 'Apache Lucene(TM)'
lucene: 'Apache Lucene'
lucene-short: 'Lucene'
hadoop-reg: 'Apache Hadoop(R)'
hadoop: 'Apache Hadoop'
hadoop-short: 'Hadoop'
airflow-reg: 'Apache Airflow(R)'
airflow: 'Apache Airflow'
airflow-short: 'Airflow'
maven-reg: 'Apache Maven(TM)'
maven: 'Apache Maven'
maven-short: 'Maven'
flink-reg: 'Apache Flink(R)'
flink: 'Apache Flink'
flink-short: 'Flink'
beam-reg: 'Apache Beam(R)'
beam: 'Apache Beam'
beam-short: 'Beam'
geode-reg: 'Apache Geode(TM)'
geode: 'Apache Geode'
geode-short: 'Geode'
hbase-reg: 'Apache HBase(R)'
hbase: 'Apache HBase'
hbase-short: 'HBase'
kudu-reg: 'Apache Kudu(TM)'
kudu: 'Apache Kudu'
kudu-short: 'Kudu'
phoenix-reg: 'Apache Phoenix(TM)'
phoenix: 'Apache Phoenix'
phoenix-short: 'Phoenix'
zookeeper-reg: 'Apache ZooKeeper(TM)'
zookeeper: 'Apache ZooKeeper'
zookeeper-short: 'ZooKeeper'
asf: 'Apache Software Foundation (ASF)'
asf-short: 'ASF'
tinkerpop-reg: 'Apache TinkerPop(TM)'
tinkerpop: 'Apache TinkerPop'
tinkerpop-short: 'TinkerPop'
cloudstack-reg: 'Apache CloudStack(R)'
cloudstack: 'Apache CloudStack'
cloudstack-short: 'CloudStack'
tomcat-reg: 'Apache Tomcat(R)'
tomcat: 'Apache Tomcat'
tomcat-short: 'Tomcat'
ajp: 'Apache JServ Protocol (AJP)'
ajp-short: 'AJP'
activemq-reg: 'Apache ActiveMQ(R)'
activemq: 'Apache ActiveMQ'
activemq-short: 'ActiveMQ'
tomee-reg: 'Apache TomEE(TM)'
tomee: 'Apache TomEE'
tomee-short: 'TomEE'
bookkeeper-reg: 'Apache BookKeeper(TM)'
bookkeeper: 'Apache BookKeeper'
bookkeeper-short: 'BookKeeper'
groovy-reg: 'Apache Groovy(TM)'
groovy: 'Apache Groovy'
groovy-short: 'Groovy'
cpp-driver-url: 'https://github.com/datastax/cpp-driver'
csharp-driver-url: 'https://github.com/datastax/csharp-driver'
gocql-astra-url: 'https://github.com/datastax/gocql-astra'
go-driver-url: 'https://github.com/apache/cassandra-gocql-driver'
cql-proxy-url: 'https://github.com/datastax/cql-proxy'
java-driver-url: 'https://github.com/apache/cassandra-java-driver'
nodejs-driver-url: 'https://github.com/datastax/nodejs-driver'
python-driver-url: 'https://github.com/datastax/python-driver'
scala-driver-url: 'https://github.com/apache/cassandra-spark-connector'
cass-driver-cpp-shield: 'image:https://img.shields.io/github/v/tag/datastax/cpp-driver?label=latest[alt="Latest cpp-driver release on GitHub",link="{cpp-driver-url}/tags"]'
cass-driver-csharp-shield: 'image:https://img.shields.io/nuget/v/CassandraCSharpDriver?label=latest[alt="Latest CassandraCSharpDriver release on NuGet",link="https://www.nuget.org/packages/CassandraCSharpDriver"]'
cass-driver-go-shield: 'image:https://img.shields.io/github/v/tag/apache/cassandra-gocql-driver?label=latest%20gocql[alt="Latest gocql release on GitHub",link="{go-driver-url}/tags"]'
cass-driver-java-shield: 'image:https://img.shields.io/github/v/tag/apache/cassandra-java-driver?label=latest[alt="Latest cassandra-java-driver release on GitHub",link="{java-driver-url}/tags"]'
cass-driver-nodejs-shield: 'image:https://img.shields.io/github/v/tag/datastax/nodejs-driver?label=latest[alt="Latest nodejs-driver release on GitHub",link="{nodejs-driver-url}/tags"]'
cass-driver-python-shield: 'image:https://img.shields.io/github/v/tag/datastax/python-driver?label=latest[alt="Latest python-driver release on GitHub",link="{python-driver-url}/tags"]'
cass-driver-scala-shield: 'image:https://img.shields.io/github/v/tag/apache/cassandra-spark-connector?label=latest[alt="Latest cassandra-spark-connector release on GitHub",link="{scala-driver-url}/releases"]'
data-api: 'Data API'
csharp-client-api-ref-url: 'xref:astra-api-docs:ROOT:attachment$csharp-client'
py-client-api-ref-url-2x: 'xref:astra-api-docs:ROOT:attachment$python-client/astrapy'
ts-client-api-ref-url-2x: 'xref:astra-api-docs:ROOT:attachment$typescript-client'
java-client-api-ref-url-2x: 'xref:astra-api-docs:ROOT:attachment$java-client'
python-client-repo-url: 'https://github.com/datastax/astrapy'
typescript-client-repo-url: 'https://github.com/datastax/astra-db-ts'
typescript-client-examples-url: '{typescript-client-repo-url}/blob/v2.x/examples'
java-client-repo-url: 'https://github.com/datastax/astra-db-java'
csharp-client-repo-url: 'https://github.com/datastax/astra-db-csharp'
python-client-python-version: '3.8'
dataapi-java-client-shield: 'image:https://img.shields.io/maven-central/v/com.datastax.astra/astra-db-java.svg?label=latest[alt="Latest astra-db-java release on Maven Central",link="https://search.maven.org/artifact/com.datastax.astra/astra-db-java"]'
dataapi-python-client-shield: 'image:https://img.shields.io/github/v/tag/datastax/astrapy?label=latest[alt="Latest astrapy release on GitHub",link="{python-client-repo-url}/releases"]'
dataapi-typescript-client-shield: 'image:https://img.shields.io/github/v/tag/datastax/astra-db-ts?label=latest[alt="Latest astra-db-ts release on GitHub",link="{typescript-client-repo-url}/releases"]'
dataapi-csharp-client-shield: 'image:https://img.shields.io/github/v/tag/datastax/astra-db-csharp?label=latest[alt="Latest astra-db-csharp release on GitHub",link="{csharp-client-repo-url}/releases"]'
agent: 'DataStax Agent'
repair-service: 'Repair Service'
backup-service: 'Backup Service'
performance-service: 'Performance Service'
monitoring-service: 'OpsCenter Monitoring'
nodesync-service: 'NodeSync Service'
bestpractice-service: 'Best Practice Service'
capacity-service: 'Capacity Service'
lcm: 'Lifecycle Manager (LCM)'
lcm-short: 'LCM'
cr: 'custom resource (CR)'
cr-short: 'CR'
crd: 'custom resource definition (CRD)'
crd-short: 'CRD'
# Custom attributes only used in ragstack-ai
astra_db: 'Astra DB'
astra_ui: 'Astra Portal'
# Antora Atlas
primary-site-url: https://docs.datastax.com/en
primary-site-manifest-url: https://docs.datastax.com/en/site-manifest.json
2 changes: 1 addition & 1 deletion modules/ROOT/nav.adoc
@@ -42,5 +42,5 @@
* xref:ROOT:cassandra-data-migrator.adoc[]
* {cass-migrator-repo}/releases[{cass-migrator-short} release notes]

.{dsbulk-loader}
.{dsbulk}
* xref:dsbulk:overview:dsbulk-about.adoc[]
2 changes: 1 addition & 1 deletion modules/ROOT/pages/astra-migration-paths.adoc
@@ -9,7 +9,7 @@ If you have questions about migrating from a specific source to {astra-db}, cont
.Migration tool compatibility
[cols="2,1,1,1,1"]
|===
|Origin |{sstable-sideloader} |{cass-migrator} |{product-proxy} |{dsbulk-loader}
|Origin |{sstable-sideloader} |{cass-migrator} |{product-proxy} |{dsbulk}

|Aiven for {cass-short}
|icon:check[role="text-success",alt="Supported"]
40 changes: 20 additions & 20 deletions modules/ROOT/pages/cassandra-data-migrator.adoc
@@ -1,6 +1,6 @@
= Use {cass-migrator} with {product-proxy}
:navtitle: Use {cass-migrator}
:description: You can use {cass-migrator} ({cass-migrator-short}) for data migration and validation between {cass-reg}-based databases.
:navtitle: Use {cass-migrator-short}
:description: You can use {cass-migrator} for data migration and validation between {cass-reg}-based databases.
:page-aliases: cdm-parameters.adoc, ROOT:cdm-steps.adoc, ROOT:cdm-overview.adoc

{description}
@@ -51,41 +51,41 @@ The container's `assets` directory includes all required migration tools: `cassa
Install as a JAR file::
+
--
. Install Java 11 or later, which includes Spark binaries.
. Install Java 11 or later, which includes {spark-short} binaries.

. Install https://spark.apache.org/downloads.html[Apache Spark(TM)] version 3.5.x with Scala 2.13 and Hadoop 3.3 and later.
. Install https://spark.apache.org/downloads.html[{spark-reg}] version 3.5.x with Scala 2.13 and {hadoop-reg} 3.3 and later.
+
[tabs]
====
Single VM::
+
For one-off migrations, you can install the Spark binary on a single VM where you will run the {cass-migrator-short} job.
For one-off migrations, you can install the {spark-short} binary on a single VM where you will run the {cass-migrator-short} job.
+
. Get the Spark tarball from the Apache Spark archive.
. Get the {spark-reg} tarball from the {spark} archive.
+
[source,bash,subs="+quotes"]
----
wget https://archive.apache.org/dist/spark/spark-3.5.**PATCH**/spark-3.5.**PATCH**-bin-hadoop3-scala2.13.tgz
----
+
Replace `**PATCH**` with your Spark patch version.
Replace `**PATCH**` with your {spark-short} patch version.
+
. Change to the directory where you want install Spark, and then extract the tarball:
. Change to the directory where you want to install {spark-short}, and then extract the tarball:
+
[source,bash,subs="+quotes"]
----
tar -xvzf spark-3.5.**PATCH**-bin-hadoop3-scala2.13.tgz
----
+
Replace `**PATCH**` with your Spark patch version.
Replace `**PATCH**` with your {spark-short} patch version.

Spark cluster::
{spark-reg} cluster::
+
For large (several terabytes) migrations, complex migrations, and use of {cass-migrator-short} as a long-term data transfer utility, {company} recommends that you use a Spark cluster or Spark Serverless platform.
For large (several terabytes) migrations, complex migrations, and use of {cass-migrator-short} as a long-term data transfer utility, {company} recommends that you use a {spark} cluster or {spark-short} Serverless platform.
+
If you deploy CDM on a Spark cluster, you must modify your `spark-submit` commands as follows:
If you deploy {cass-migrator-short} on a {spark-short} cluster, you must modify your `spark-submit` commands as follows:
+
* Replace `--master "local[*]"` with the host and port for your Spark cluster, as in `--master "spark://**MASTER_HOST**:**PORT**"`.
* Replace `--master "local[*]"` with the host and port for your {spark-short} cluster, as in `--master "spark://**MASTER_HOST**:**PORT**"`.
* Remove parameters related to single-VM installations, such as `--driver-memory` and `--executor-memory`.
====
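To make the two bullets above concrete, a hedged before/after sketch of the `spark-submit` changes follows; the host, port, memory flags, and JAR version are placeholders, not values taken from this repository:

```shell
# Hypothetical sketch of adapting a single-VM spark-submit command for a cluster.
# MASTER_HOST, PORT, and VERSION are placeholders.

# Single VM (as generated for local runs):
#   spark-submit --properties-file cdm.properties \
#     --master "local[*]" --driver-memory 25G --executor-memory 25G \
#     --class com.datastax.cdm.job.Migrate cassandra-data-migrator-VERSION.jar

# Spark cluster: point --master at the cluster and drop the single-VM memory flags.
spark-submit --properties-file cdm.properties \
  --master "spark://MASTER_HOST:PORT" \
  --class com.datastax.cdm.job.Migrate cassandra-data-migrator-VERSION.jar
```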

@@ -167,7 +167,7 @@ You can also set `spark.cdm.schema.origin.keyspaceTable` in your properties file
* `**VERSION**`: Specify the full {cass-migrator-short} version that you installed, such as `5.2.1`.
--

Spark cluster::
{spark-reg} cluster::
+
--
[source,bash,subs="+quotes"]
@@ -188,7 +188,7 @@ Depending on where your properties file is stored, you might need to specify the
+
You can also set `spark.cdm.schema.origin.keyspaceTable` in your properties file using the same format of `**KEYSPACE_NAME**.**TABLE_NAME**`.

* `--master`: Provide the URL of your Spark cluster.
* `--master`: Provide the URL of your {spark-short} cluster.

* `**VERSION**`: Specify the full {cass-migrator-short} version that you installed, such as `5.2.1`.
--
@@ -236,7 +236,7 @@ You can also set `spark.cdm.schema.origin.keyspaceTable` in your properties file
* `**VERSION**`: Specify the full {cass-migrator-short} version that you installed, such as `5.2.1`.
--

Spark cluster::
{spark-reg} cluster::
+
--
[source,bash,subs="+quotes"]
@@ -257,7 +257,7 @@ Depending on where your properties file is stored, you might need to specify the
+
You can also set `spark.cdm.schema.origin.keyspaceTable` in your properties file using the same format of `**KEYSPACE_NAME**.**TABLE_NAME**`.

* `--master`: Provide the URL of your Spark cluster.
* `--master`: Provide the URL of your {spark-short} cluster.

* `**VERSION**`: Specify the full {cass-migrator-short} version that you installed, such as `5.2.1`.
--
@@ -331,15 +331,15 @@ Specifically, see the {cass-migrator-repo}/blob/main/src/resources/cdm-detailed.
.Java NoSuchMethodError
[%collapsible]
====
If you installed Spark as a JAR file, and your Spark and Scala versions aren't compatible with your installed version of {cass-migrator-short}, {cass-migrator-short} jobs can throw exceptions such a the following:
If you installed {spark-short} as a JAR file, and your {spark-short} and Scala versions aren't compatible with your installed version of {cass-migrator-short}, {cass-migrator-short} jobs can throw exceptions such as the following:

[source,console]
----
Exception in thread "main" java.lang.NoSuchMethodError: 'void scala.runtime.Statics.releaseFence()'
----

Make sure that your Spark binary is compatible with your {cass-migrator-short} version.
If you installed an earlier version of {cass-migrator-short}, you might need to install an earlier Spark binary.
Make sure that your {spark-short} binary is compatible with your {cass-migrator-short} version.
If you installed an earlier version of {cass-migrator-short}, you might need to install an earlier {spark-short} binary.
====
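One way to check compatibility, assuming `spark-submit` is on your `PATH`, is to print the versions your Spark binary was built with and compare them with what your {cass-migrator-short} release expects:

```shell
# Prints the Spark version and the Scala version it was compiled against;
# compare these with the versions your CDM release targets (e.g. Spark 3.5.x / Scala 2.13).
spark-submit --version
```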

.Rerun a failed or partially completed job