10 changes: 5 additions & 5 deletions docs/content.zh/docs/connectors/flink-sources/mongodb-cdc.md
@@ -238,7 +238,7 @@ MongoDB 的更改事件记录在消息之前没有更新。因此,我们只能
<td>String</td>
<td> MongoDB CDC 消费者可选的启动模式,
合法的模式为 "initial","latest-offset" 和 "timestamp"。
请查阅 <a href="#a-name-id-002-a">启动模式</a> 章节了解更多详细信息。</td>
请查阅 <a href="#启动模式">启动模式</a> 章节了解更多详细信息。</td>
</tr>
<tr>
<td>scan.startup.timestamp-millis</td>
@@ -423,7 +423,7 @@ CREATE TABLE products (

MongoDB CDC 连接器是一个 Flink Source 连接器,它将首先读取数据库快照,然后继续读取更改流事件,即使发生故障也能保证**精确一次(exactly-once)处理**。

### 启动模式<a name="启动模式" id="002" ></a>
### 启动模式

配置选项```scan.startup.mode```指定 MongoDB CDC 消费者的启动模式。有效枚举包括:
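
作为示意(其中表结构、`hosts`、库名与集合名均为假设的占位值),下面的 Flink SQL DDL 片段演示了如何以 "timestamp" 模式从指定时间戳开始消费变更流:

```sql
-- 示意配置:连接参数与表结构均为假设值
CREATE TABLE mongodb_products (
  _id STRING,
  name STRING,
  PRIMARY KEY (_id) NOT ENFORCED
) WITH (
  'connector' = 'mongodb-cdc',
  'hosts' = 'localhost:27017',
  'database' = 'mydb',
  'collection' = 'products',
  -- 跳过快照,从指定的毫秒时间戳开始读取变更流
  'scan.startup.mode' = 'timestamp',
  'scan.startup.timestamp-millis' = '1667232000000'
);
```

注意:"timestamp" 模式要求同时指定 `scan.startup.timestamp-millis`。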

@@ -544,7 +544,7 @@ $ ./bin/flink run \
--from-savepoint /tmp/flink-savepoints/savepoint-cca7bc-bb1e257f0dab \
./FlinkCDCExample.jar
```
**注意:** 请参考文档 [Restore the job from previous savepoint](https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/deployment/cli/#command-line-interface) 了解更多详细信息。
**注意:** 请参考文档 [Restore the job from previous savepoint](https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/deployment/cli/#command-line-interface) 了解更多详细信息。

### DataStream Source

@@ -694,7 +694,7 @@ CREATE TABLE mongodb_source (...) WITH (
----------------
[BSON](https://docs.mongodb.com/manual/reference/bson-types/) 是 **Binary JSON(二进制 JSON)**的缩写,是一种类似 JSON 的二进制编码序列化格式,用于在 MongoDB 中存储文档和进行远程过程调用。

[Flink SQL Data Type](https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/dev/table/types/) 类似于 SQL 标准的数据类型术语,该术语描述了表生态系统中值的逻辑类型。它可以用于声明操作的输入和/或输出类型。
[Flink SQL Data Type](https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/dev/table/types/) 类似于 SQL 标准的数据类型术语,该术语描述了表生态系统中值的逻辑类型。它可以用于声明操作的输入和/或输出类型。

为了使 Flink SQL 能够处理来自异构数据源的数据,异构数据源的数据类型需要统一转换为 Flink SQL 数据类型。

@@ -812,6 +812,6 @@ CREATE TABLE mongodb_source (...) WITH (
- [Replica set protocol](https://docs.mongodb.com/manual/reference/replica-configuration/#mongodb-rsconf-rsconf.protocolVersion)
- [Connection String Options](https://docs.mongodb.com/manual/reference/connection-string/#std-label-connections-connection-options)
- [BSON Types](https://docs.mongodb.com/manual/reference/bson-types/)
- [Flink DataTypes](https://nightlies.apache.org/flink/flink-docs-release-1.16/docs/dev/table/types/)
- [Flink DataTypes](https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/dev/table/types/)

{{< top >}}
20 changes: 10 additions & 10 deletions docs/content.zh/docs/connectors/flink-sources/mysql-cdc.md
@@ -109,7 +109,7 @@ mysql> FLUSH PRIVILEGES;
### 为每个 Reader 设置不同的 Server id

每个用于读取 binlog 的 MySQL 数据库客户端都应该有一个唯一的 id,称为 Server id。 MySQL 服务器将使用此 id 来维护网络连接和 binlog 位置。 因此,如果不同的作业共享相同的 Server id, 则可能导致从错误的 binlog 位置读取数据。
因此,建议通过为每个 Reader 设置不同的 Server id [SQL Hints](https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/sql/hints.html),
因此,建议通过 [SQL Hints](https://nightlies.apache.org/flink/flink-docs-release-1.20/zh/docs/dev/table/sql/queries/hints/) 为每个 Reader 设置不同的 Server id。
假设 Source 并行度为 4, 我们可以使用 `SELECT * FROM source_table /*+ OPTIONS('server-id'='5401-5404') */ ;` 来为 4 个 Source readers 中的每一个分配唯一的 Server id。


@@ -246,7 +246,7 @@ Flink SQL> SELECT * FROM orders;
(3)在快照读取之前,Source 不需要数据库锁权限。
如果希望 Source 并行运行,则每个并行 Readers 都应该具有唯一的 Server id,所以
Server id 必须是类似 `5400-6400` 的范围,并且该范围必须大于并行度。
请查阅 <a href="#a-name-id-001-a">增量快照读取</a> 章节了解更多详细信息。
请查阅 <a href="#增量快照读取">增量快照读取</a> 章节了解更多详细信息。
</td>
</tr>
<tr>
@@ -281,7 +281,7 @@ Flink SQL> SELECT * FROM orders;
<td>String</td>
<td> MySQL CDC 消费者可选的启动模式,
合法的模式为 "initial","earliest-offset","latest-offset","specific-offset","timestamp" 和 "snapshot"。
请查阅 <a href="#a-name-id-002-a">启动模式</a> 章节了解更多详细信息。</td>
请查阅 <a href="#启动模式">启动模式</a> 章节了解更多详细信息。</td>
</tr>
<tr>
<td>scan.startup.specific-offset.file</td>
@@ -578,7 +578,7 @@ CREATE TABLE products (
支持的特性
--------

### 增量快照读取<a name="增量快照读取" id="001" ></a>
### 增量快照读取

增量快照读取是一种读取表快照的新机制。与旧的快照机制相比,增量快照具有许多优点,包括:
* (1)在快照读取期间,Source 支持并发读取,
@@ -637,7 +637,7 @@ MySQL 集群中你监控的服务器出现故障后, 你只需将受监视的服
当 MySQL CDC Source 启动时,它并行读取表的快照,然后以单并行度的方式读取表的 binlog。

在快照阶段,快照会根据表的分块键和表行的大小切割成多个快照块。
快照块被分配给多个快照读取器。每个快照读取器使用 [区块读取算法](#snapshot-chunk-reading) 并将读取的数据发送到下游。
快照块被分配给多个快照读取器。每个快照读取器使用 [区块读取算法](#区块读取算法) 并将读取的数据发送到下游。
Source 会管理块的处理状态(完成或未完成),因此快照阶段的 Source 可以支持块级别的 checkpoint。
如果发生故障,Source 可以被恢复,并从最后一个已完成的块之后继续读取后续的块。

@@ -678,7 +678,7 @@ MySQL CDC Source 使用主键列将表划分为多个分片(chunk)。 默认
[uuid-def, +∞).
```

##### Chunk 读取算法
##### 区块读取算法

对于上面的示例 `MyTable`,如果 MySQL CDC Source 并行度设置为 4,MySQL CDC Source 将运行 4 个 Reader,每个 Reader 通过**偏移信号算法**
获取快照区块的最终一致输出。**偏移信号算法**简单描述如下:
@@ -699,7 +699,7 @@
MySQL CDC 连接器是一个 Flink Source 连接器,它将首先读取表快照块,然后继续读取 binlog,
无论是在快照阶段还是读取 binlog 阶段,即使任务出现故障,MySQL CDC 连接器在处理时都能保证**精确一次(exactly-once)**语义。

### 启动模式<a name="启动模式" id="002" ></a>
### 启动模式

配置选项```scan.startup.mode```指定 MySQL CDC 使用者的启动模式。有效枚举包括:
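
作为示意(表结构、连接参数以及 binlog 文件名与位点均为假设的占位值),下面的 DDL 片段演示了如何以 "specific-offset" 模式从特定 binlog 位点启动:

```sql
-- 示意配置:连接参数与 binlog 位点均为假设值
CREATE TABLE mysql_orders (
  id INT,
  name STRING,
  PRIMARY KEY (id) NOT ENFORCED
) WITH (
  'connector' = 'mysql-cdc',
  'hostname' = 'localhost',
  'port' = '3306',
  'username' = 'flinkuser',
  'password' = 'flinkpw',
  'database-name' = 'mydb',
  'table-name' = 'orders',
  -- 跳过快照,从指定的 binlog 文件与位置开始读取
  'scan.startup.mode' = 'specific-offset',
  'scan.startup.specific-offset.file' = 'mysql-bin.000003',
  'scan.startup.specific-offset.pos' = '4'
);
```

其中 `scan.startup.specific-offset.file` 与 `scan.startup.specific-offset.pos` 对应 MySQL `SHOW MASTER STATUS` 返回的 binlog 文件名与位置。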

@@ -837,7 +837,7 @@ $ ./bin/flink run \
--from-savepoint /tmp/flink-savepoints/savepoint-cca7bc-bb1e257f0dab \
./FlinkCDCExample.jar
```
**注意:** 请参考文档 [Restore the job from previous savepoint](https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/deployment/cli/#command-line-interface) 了解更多详细信息。
**注意:** 请参考文档 [Restore the job from previous savepoint](https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/deployment/cli/#command-line-interface) 了解更多详细信息。

### 关于无主键表

@@ -1136,14 +1136,14 @@ $ ./bin/flink run \
</td>
<td>
MySQL 中的空间数据类型将转换为具有固定 Json 格式的字符串。
请参考 MySQL <a href="#a-name-id-003-a">空间数据类型映射</a> 章节了解更多详细信息。
请参考 MySQL <a href="#空间数据类型映射">空间数据类型映射</a> 章节了解更多详细信息。
</td>
</tr>
</tbody>
</table>
</div>

### 空间数据类型映射<a name="空间数据类型映射" id="003"></a>
### 空间数据类型映射

MySQL中除`GEOMETRYCOLLECTION`之外的空间数据类型都会转换为 Json 字符串,格式固定,如:<br>
```json
@@ -831,14 +831,14 @@ CREATE TABLE products (
</td>
<td>
空间数据类型将转换为具有固定 Json 格式的字符串。
请参考 <a href="#a-name-id-003-a">空间数据类型映射</a> 章节了解更多详细信息。
请参考 <a href="#空间数据类型映射">空间数据类型映射</a> 章节了解更多详细信息。
</td>
</tr>
</tbody>
</table>
</div>

### 空间数据类型映射<a name="空间数据类型映射" id="003"></a>
### 空间数据类型映射

除`GEOMETRYCOLLECTION`之外的空间数据类型都会转换为 Json 字符串,格式固定,如:<br>
```json
@@ -617,7 +617,7 @@ $ ./bin/flink run \
--from-savepoint /tmp/flink-savepoints/savepoint-cca7bc-bb1e257f0dab \
./FlinkCDCExample.jar
```
**注意:** 请参考文档 [Restore the job from previous savepoint](https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/deployment/cli/#command-line-interface) 了解更多详细信息。
**注意:** 请参考文档 [Restore the job from previous savepoint](https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/deployment/cli/#command-line-interface) 了解更多详细信息。
### DataStream Source
4 changes: 2 additions & 2 deletions docs/content.zh/docs/connectors/flink-sources/postgres-cdc.md
@@ -222,7 +222,7 @@ Connector Options
(1) source can be parallel during snapshot reading,
(2) source can perform checkpoints in the chunk granularity during snapshot reading,
(3) source doesn't need to acquire global read lock (FLUSH TABLES WITH READ LOCK) before snapshot reading.
Please see <a href="#incremental-snapshot-reading ">Incremental Snapshot Reading</a>section for more detailed information.
Please see the <a href="#incremental-snapshot-reading-experimental">Incremental Snapshot Reading</a> section for more detailed information.
</td>
</tr>
<tr>
@@ -573,7 +573,7 @@ $ ./bin/flink run \
--from-savepoint /tmp/flink-savepoints/savepoint-cca7bc-bb1e257f0dab \
./FlinkCDCExample.jar
```
**注意:** 请参考文档 [Restore the job from previous savepoint](https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/deployment/cli/#command-line-interface) 了解更多详细信息。
**注意:** 请参考文档 [Restore the job from previous savepoint](https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/deployment/cli/#command-line-interface) 了解更多详细信息。

### DataStream Source

@@ -656,7 +656,7 @@ source:
</td>
<td>
MySQL 中的空间数据类型将转换为具有固定 Json 格式的字符串。
请参考 MySQL <a href="#a-name-id-003-a">空间数据类型映射</a> 章节了解更多详细信息。
请参考 MySQL <a href="#空间数据类型映射">空间数据类型映射</a> 章节了解更多详细信息。
</td>
</tr>
</tbody>
2 changes: 1 addition & 1 deletion docs/content.zh/docs/core-concept/data-pipeline.md
@@ -32,7 +32,7 @@ under the License.
为了描述 Data Pipeline,我们需要定义以下部分:
- [source]({{< ref "docs/core-concept/data-source" >}})
- [sink]({{< ref "docs/core-concept/data-sink" >}})
- [pipeline](#pipeline-configurations)
- [pipeline](#pipeline-配置)

下面是 Data Pipeline 的一些可选配置:
- [route]({{< ref "docs/core-concept/route" >}})
2 changes: 1 addition & 1 deletion docs/content.zh/docs/core-concept/schema-evolution.md
@@ -68,7 +68,7 @@ pipeline:

这是默认的架构演变行为。

> 注意:在此模式下,`TruncateTableEvent` 和 `DropTableEvent` 默认不会被发送到下游,以避免意外的数据丢失。这一行为可以通过配置 [Per-Event Type Control](#per-event-type-control) 调整。
> 注意:在此模式下,`TruncateTableEvent` 和 `DropTableEvent` 默认不会被发送到下游,以避免意外的数据丢失。这一行为可以通过配置 [按类型配置行为](#按类型配置行为) 调整。

### Ignore 模式

2 changes: 1 addition & 1 deletion docs/content.zh/docs/core-concept/transform.md
@@ -496,7 +496,7 @@ pipeline:
```
注意:
* `model-name` 是一个通用的必填参数,用于所有支持的模型,表示在 `projection` 或 `filter` 中调用的函数名称。
* `class-name` 是一个通用的必填参数,用于所有支持的模型,可用值可以在[所有支持的模型](#all-support-models)中找到。
* `class-name` 是一个通用的必填参数,用于所有支持的模型,可用值可以在[所有支持的模型](#所有支持的模型)中找到。
* `openai.model`,`openai.host`,`openai.apiKey` 和 `openai.chat.prompt` 是各个模型特有的可选参数。

如何使用一个 Embedding AI 模型:
6 changes: 3 additions & 3 deletions docs/content/docs/connectors/flink-sources/mongodb-cdc.md
@@ -567,7 +567,7 @@ $ ./bin/flink run \
--from-savepoint /tmp/flink-savepoints/savepoint-cca7bc-bb1e257f0dab \
./FlinkCDCExample.jar
```
**Note:** Please refer the doc [Restore the job from previous savepoint](https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/deployment/cli/#command-line-interface) for more details.
**Note:** Please refer to the doc [Restore the job from previous savepoint](https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/deployment/cli/#command-line-interface) for more details.

### DataStream Source

@@ -714,7 +714,7 @@ Data Type Mapping
----------------
[BSON](https://docs.mongodb.com/manual/reference/bson-types/) short for **Binary JSON** is a binary-encoded serialization of JSON-like format used to store documents and make remote procedure calls in MongoDB.

[Flink SQL Data Type](https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/dev/table/types/) is similar to the SQL standard’s data type terminology which describes the logical type of a value in the table ecosystem. It can be used to declare input and/or output types of operations.
[Flink SQL Data Type](https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/dev/table/types/) is similar to the SQL standard’s data type terminology which describes the logical type of a value in the table ecosystem. It can be used to declare input and/or output types of operations.

In order to enable Flink SQL to process data from heterogeneous data sources, the data types of heterogeneous data sources need to be uniformly converted to Flink SQL data types.

@@ -835,6 +835,6 @@ Reference
- [Connection String Options](https://docs.mongodb.com/manual/reference/connection-string/#std-label-connections-connection-options)
- [Document Pre- and Post-Images](https://www.mongodb.com/docs/v6.0/changeStreams/#change-streams-with-document-pre--and-post-images)
- [BSON Types](https://docs.mongodb.com/manual/reference/bson-types/)
- [Flink DataTypes](https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/dev/table/types/)
- [Flink DataTypes](https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/dev/table/types/)

{{< top >}}
6 changes: 3 additions & 3 deletions docs/content/docs/connectors/flink-sources/mysql-cdc.md
@@ -103,7 +103,7 @@ Notes
### Set a different SERVER ID for each reader

Every MySQL database client for reading binlog should have a unique id, called server id. The MySQL server will use this id to maintain the network connection and the binlog position. Therefore, if different jobs share the same server id, it may result in reading from the wrong binlog position.
Thus, it is recommended to set different server id for each reader via the [SQL Hints](https://ci.apache.org/projects/flink/flink-docs-release-1.11/dev/table/sql/hints.html),
Thus, it is recommended to set a different server id for each reader via [SQL Hints](https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/dev/table/sql/queries/hints/),
e.g. assuming the source parallelism is 4, then we can use `SELECT * FROM source_table /*+ OPTIONS('server-id'='5401-5404') */ ;` to assign unique server id for each of the 4 source readers.


@@ -658,7 +658,7 @@ The CDC job may restart fails in this case. So the heartbeat event will help upd
When the MySQL CDC source is started, it reads the snapshot of the table in parallel and then reads the binlog of the table with single parallelism.

In the snapshot phase, the snapshot is cut into multiple snapshot chunks according to the chunk key of the table and the size of the table rows.
Snapshot chunks is assigned to multiple snapshot readers. Each snapshot reader reads its received chunks with [chunk reading algorithm](#snapshot-chunk-reading) and send the read data to downstream.
Snapshot chunks are assigned to multiple snapshot readers. Each snapshot reader reads its received chunks with the [chunk reading algorithm](#chunk-reading-algorithm) and sends the read data downstream.
The source manages the process status (finished or not) of chunks, thus the source in the snapshot phase can support checkpoints at chunk level.
If a failure happens, the source can be restored and continue reading chunks from the last finished chunk.

@@ -862,7 +862,7 @@ $ ./bin/flink run \
--from-savepoint /tmp/flink-savepoints/savepoint-cca7bc-bb1e257f0dab \
./FlinkCDCExample.jar
```
**Note:** Please refer the doc [Restore the job from previous savepoint](https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/deployment/cli/#command-line-interface) for more details.
**Note:** Please refer to the doc [Restore the job from previous savepoint](https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/deployment/cli/#command-line-interface) for more details.

### Tables Without primary keys

@@ -854,7 +854,7 @@ Data Type Mapping
</td>
<td>
The spatial data types will be converted into STRING with a fixed Json format.
Please see <a href="#patial-data-types-mapping ">Spatial Data Types Mapping</a> section for more detailed information.
Please see the <a href="#spatial-data-types-mapping">Spatial Data Types Mapping</a> section for more detailed information.
</td>
</tr>
</tbody>
2 changes: 1 addition & 1 deletion docs/content/docs/connectors/flink-sources/oracle-cdc.md
@@ -618,7 +618,7 @@ $ ./bin/flink run \
--from-savepoint /tmp/flink-savepoints/savepoint-cca7bc-bb1e257f0dab \
./FlinkCDCExample.jar
```
**Note:** Please refer the doc [Restore the job from previous savepoint](https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/deployment/cli/#command-line-interface) for more details.
**Note:** Please refer to the doc [Restore the job from previous savepoint](https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/deployment/cli/#command-line-interface) for more details.
### DataStream Source
4 changes: 2 additions & 2 deletions docs/content/docs/connectors/flink-sources/postgres-cdc.md
@@ -219,7 +219,7 @@ SELECT * FROM shipments;
(1) source can be parallel during snapshot reading,
(2) source can perform checkpoints in the chunk granularity during snapshot reading,
(3) source doesn't need to acquire global read lock (FLUSH TABLES WITH READ LOCK) before snapshot reading.
Please see <a href="#incremental-snapshot-reading ">Incremental Snapshot Reading</a>section for more detailed information.
Please see the <a href="#incremental-snapshot-reading-experimental">Incremental Snapshot Reading</a> section for more detailed information.
</td>
</tr>
<tr>
@@ -574,7 +574,7 @@ $ ./bin/flink run \
--from-savepoint /tmp/flink-savepoints/savepoint-cca7bc-bb1e257f0dab \
./FlinkCDCExample.jar
```
**Note:** Please refer to the doc [Restore the job from previous savepoint](https://nightlies.apache.org/flink/flink-docs-release-1.17/docs/deployment/cli/#command-line-interface) for more details.
**Note:** Please refer to the doc [Restore the job from previous savepoint](https://nightlies.apache.org/flink/flink-docs-release-1.20/docs/deployment/cli/#command-line-interface) for more details.

### DataStream Source
