- We are recruiting big data platform development engineers. For more information about the position, add WeChat ID [ysqwhiletrue] or email your resume to [email protected].
- We use DingTalk to communicate. To join the communication group, search for group number [30537511] or scan the group QR code.
FlinkX is a distributed offline and real-time data synchronization framework based on Flink. It is widely used at 袋鼠云 (DTStack) and enables efficient data migration between many heterogeneous data sources.
Data sources are abstracted as Reader plugins and data targets as Writer plugins, so in principle FlinkX can synchronize data between any types of data source. Because the plugins form an ecosystem, each newly connected data source can immediately exchange data with all existing ones.
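The Reader/Writer split can be pictured with a minimal Java sketch. `SourceReader`, `TargetWriter`, `ListReader`, `CollectingWriter`, and `SyncJob` are hypothetical names, not FlinkX's actual plugin classes, and real plugins stream records through Flink rather than passing a list around. The only point illustrated is that the job depends on two interfaces, so any reader can be paired with any writer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

// Hypothetical plugin contracts that model the Reader/Writer split; FlinkX's real classes differ.
interface SourceReader {
    // Produces the records of one data source.
    List<Map<String, Object>> read();
}

interface TargetWriter {
    // Consumes records for one data target.
    void write(List<Map<String, Object>> records);
}

// In-memory source standing in for a real one such as MySQL or HDFS.
class ListReader implements SourceReader {
    private final List<Map<String, Object>> rows;

    ListReader(List<Map<String, Object>> rows) {
        this.rows = rows;
    }

    @Override
    public List<Map<String, Object>> read() {
        return rows;
    }
}

// In-memory target that collects everything it receives.
class CollectingWriter implements TargetWriter {
    final List<Map<String, Object>> received = new ArrayList<>();

    @Override
    public void write(List<Map<String, Object>> records) {
        received.addAll(records);
    }
}

class SyncJob {
    // The job depends only on the two interfaces, so any reader can be paired with any writer.
    static int run(SourceReader reader, TargetWriter writer) {
        List<Map<String, Object>> records = reader.read();
        writer.write(records);
        return records.size();
    }

    public static void main(String[] args) {
        SourceReader source = new ListReader(List.of(
                Map.of("id", 1, "name", "a"),
                Map.of("id", 2, "name", "b")));
        CollectingWriter target = new CollectingWriter();
        System.out.println(run(source, target) + " records synchronized");
    }
}
```

With N Reader plugins and M Writer plugins, this layout needs N + M implementations instead of N × M point-to-point connectors, which is why a newly added data source can exchange data with the existing ones.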
FlinkX can collect both static data, from sources such as MySQL and HDFS, and real-time change data, from sources such as MySQL binlog and Kafka. It currently provides the following features:
- Most plugins support concurrent reading and writing, which can greatly increase read and write throughput;
- Some plugins support failure recovery, which restores a task from the point where it failed and saves running time (see Failure Recovery);
- The Reader plugins for relational databases support interval polling, so they can continuously collect changing data (see Interval Polling);
- Some databases support Kerberos security authentication (see Kerberos);
- The reading speed of Reader plugins can be limited to reduce the impact on business databases;
- Dirty data can be saved when writing data;
- The maximum amount of dirty data can be limited;
- Multiple running modes are supported: Local, Standalone, Yarn Session, and Yarn Per.
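Concurrent reading works by giving each parallel channel (our name here for one reader subtask) a disjoint slice of the source. The sketch below assumes a numeric split key and splits rows by modulo; `ChannelSplitter`, `splitFilters`, `readInParallel`, and the `mod(...)` filter text are hypothetical, and FlinkX's actual splitting strategy may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.concurrent.ExecutionException;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

class ChannelSplitter {
    // Builds one hypothetical SQL filter per channel; together they select every row exactly once.
    static List<String> splitFilters(String splitKey, int channels) {
        List<String> filters = new ArrayList<>();
        for (int i = 0; i < channels; i++) {
            filters.add("mod(" + splitKey + ", " + channels + ") = " + i);
        }
        return filters;
    }

    // In-memory stand-in for running those filters concurrently:
    // channel i keeps the ids with id mod channels == i.
    static List<List<Long>> readInParallel(long[] ids, int channels) {
        ExecutorService pool = Executors.newFixedThreadPool(channels);
        try {
            List<Future<List<Long>>> futures = new ArrayList<>();
            for (int ch = 0; ch < channels; ch++) {
                final int channel = ch;
                futures.add(pool.submit(() -> {
                    List<Long> slice = new ArrayList<>();
                    for (long id : ids) {
                        if (Math.floorMod(id, (long) channels) == channel) {
                            slice.add(id);
                        }
                    }
                    return slice;
                }));
            }
            List<List<Long>> slices = new ArrayList<>();
            for (Future<List<Long>> f : futures) {
                slices.add(f.get()); // waits for the channel and surfaces its failure, if any
            }
            return slices;
        } catch (InterruptedException e) {
            Thread.currentThread().interrupt();
            throw new IllegalStateException("interrupted while waiting for channels", e);
        } catch (ExecutionException e) {
            throw new IllegalStateException("a channel failed", e.getCause());
        } finally {
            pool.shutdown();
        }
    }

    public static void main(String[] args) {
        System.out.println(splitFilters("id", 3));
        long[] ids = {1, 2, 3, 4, 5, 6, 7, 8, 9, 10};
        System.out.println(readInParallel(ids, 3));
    }
}
```

Because the slices are disjoint, the channels never read the same row twice and need no coordination while reading.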
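Interval polling and failure recovery can both be pictured as remembering the last value of a column whose values only increase. The sketch assumes such a column (called `id` here for illustration) and models a table as an in-memory list; `IncrementalPoller`, `poll`, and `position` are hypothetical names, and FlinkX's real readers query the database and store the position through Flink's checkpointing.

```java
import java.util.ArrayList;
import java.util.List;

class IncrementalPoller {
    // Stand-in for a table; each value plays the role of a hypothetical increasing column "id".
    private final List<Long> table;
    // Largest id emitted so far: the position a checkpoint would save for failure recovery.
    private long lastId;

    IncrementalPoller(List<Long> table, long startAfter) {
        this.table = table;
        this.lastId = startAfter;
    }

    // One polling round, the in-memory equivalent of "SELECT ... WHERE id > ? ORDER BY id".
    List<Long> poll() {
        List<Long> fresh = new ArrayList<>();
        for (long id : table) {
            if (id > lastId) {
                fresh.add(id);
            }
        }
        fresh.sort(null); // natural order, like ORDER BY id
        if (!fresh.isEmpty()) {
            lastId = fresh.get(fresh.size() - 1);
        }
        return fresh;
    }

    long position() {
        return lastId;
    }

    public static void main(String[] args) throws InterruptedException {
        List<Long> table = new ArrayList<>(List.of(1L, 2L, 3L));
        IncrementalPoller poller = new IncrementalPoller(table, 0L);
        System.out.println(poller.poll()); // first round collects the existing rows
        table.add(4L);                     // a new row arrives between rounds
        Thread.sleep(100);                 // stands in for the polling interval
        System.out.println(poller.poll()); // only the new row is collected
        // After a failure, a reader restarted from the saved position skips rows already sent.
        IncrementalPoller restored = new IncrementalPoller(table, poller.position());
        System.out.println(restored.poll());
    }
}
```

The approach only works if rows never arrive with a value below the saved position; a late row with a smaller `id` would be skipped.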
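Limiting a Reader's speed amounts to pacing how fast records are emitted. The following is a minimal sketch that spaces permits evenly and counts records rather than bytes; `SimpleRateLimiter` and its rate parameter are hypothetical and do not represent FlinkX's implementation (libraries such as Guava provide a production-grade `RateLimiter` built on the same idea).

```java
import java.util.concurrent.TimeUnit;

class SimpleRateLimiter {
    private final long intervalNanos; // time budget for one permit (one record here)
    private long nextFreeNanos;       // earliest time the next permit may be handed out

    SimpleRateLimiter(double permitsPerSecond) {
        if (permitsPerSecond <= 0) {
            throw new IllegalArgumentException("rate must be positive");
        }
        this.intervalNanos = (long) (TimeUnit.SECONDS.toNanos(1) / permitsPerSecond);
        this.nextFreeNanos = System.nanoTime();
    }

    // Blocks until the reserved slot arrives, then reserves the following one.
    synchronized void acquire() {
        long now = System.nanoTime();
        long waitNanos = nextFreeNanos - now;
        if (waitNanos > 0) {
            try {
                TimeUnit.NANOSECONDS.sleep(waitNanos);
            } catch (InterruptedException e) {
                Thread.currentThread().interrupt();
                throw new IllegalStateException("interrupted while throttled", e);
            }
        }
        nextFreeNanos = Math.max(nextFreeNanos, now) + intervalNanos;
    }

    public static void main(String[] args) {
        SimpleRateLimiter limiter = new SimpleRateLimiter(5.0); // at most 5 records per second
        long start = System.nanoTime();
        for (int record = 1; record <= 10; record++) {
            limiter.acquire(); // a reader would call this before emitting each record
            System.out.printf("record %d at %d ms%n", record,
                    TimeUnit.NANOSECONDS.toMillis(System.nanoTime() - start));
        }
    }
}
```

Scheduling each slot from the previous slot, rather than from the wake-up time, keeps small sleep inaccuracies from accumulating over a long read.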
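The two dirty-data features can be pictured as a wrapper around the writer: records that fail to write are saved instead of dropped, and the job stops once their count passes a maximum. `DirtyDataGuard`, `maxDirty`, and the string format of saved records are hypothetical; FlinkX's own dirty-data handling and its configuration may differ.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.function.Consumer;

class DirtyDataGuard {
    private final int maxDirty;                                  // hypothetical limit on dirty records
    private final List<String> dirtyRecords = new ArrayList<>(); // saved rather than silently dropped

    DirtyDataGuard(int maxDirty) {
        this.maxDirty = maxDirty;
    }

    // Writes one record; a failing record is saved as dirty data instead of failing the job at once.
    void write(String record, Consumer<String> sink) {
        try {
            sink.accept(record);
        } catch (RuntimeException e) {
            dirtyRecords.add(record + " -> " + e.getMessage());
            if (dirtyRecords.size() > maxDirty) {
                throw new IllegalStateException(
                        "dirty records (" + dirtyRecords.size() + ") exceed the limit of " + maxDirty);
            }
        }
    }

    List<String> dirty() {
        return dirtyRecords;
    }

    public static void main(String[] args) {
        // Hypothetical sink that only accepts numeric values.
        List<Long> table = new ArrayList<>();
        Consumer<String> sink = value -> table.add(Long.parseLong(value));
        DirtyDataGuard guard = new DirtyDataGuard(2);
        for (String value : List.of("1", "x", "2", "y", "3")) {
            guard.write(value, sink);
        }
        System.out.println("written: " + table + ", dirty: " + guard.dirty());
    }
}
```

Keeping the failed records, along with the error message, lets them be inspected and replayed later, while the limit stops a job whose input is mostly bad from running to completion.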
The following databases are currently supported:
| Mode | Database Type | Reader | Writer |
|---|---|---|---|
| Batch Synchronization | MySQL | doc | doc |
| | Oracle | doc | doc |
| | SqlServer | doc | doc |
| | PostgreSQL | doc | doc |
| | DB2 | doc | doc |
| | GBase | doc | doc |
| | ClickHouse | doc | doc |
| | PolarDB | doc | doc |
| | SAP Hana | doc | doc |
| | Teradata | doc | doc |
| | Phoenix | doc | doc |
| | 达梦 (Dameng) | doc | doc |
| | Greenplum | doc | doc |
| | KingBase | doc | doc |
| | Cassandra | doc | doc |
| | ODPS | doc | doc |
| | HBase | doc | doc |
| | MongoDB | doc | doc |
| | Kudu | doc | doc |
| | ElasticSearch | doc | doc |
| | FTP | doc | doc |
| | HDFS | doc | doc |
| | Carbondata | doc | doc |
| | Stream | doc | doc |
| | Redis | | doc |
| | Hive | | doc |
| Stream Synchronization | Kafka | doc | doc |
| | EMQX | doc | doc |
| | RestApi | | doc |
| | MySQL Binlog | doc | |
| | MongoDB Oplog | doc | |
| | PostgreSQL WAL | doc | |
| | Oracle LogMiner | doc | |
| | SqlServer CDC | doc | |
Under the hood, FlinkX relies on Flink: each data synchronization task is translated into a StreamGraph and executed on Flink. The basic principle is illustrated in the following figure:

[Figure: basic principle of FlinkX]
For more details, see the following documents:
- Quick Start
- General Configuration
- Statistics Metric
- Kerberos
- Questions
- Contribution
FlinkX is licensed under the Apache License 2.0. See the LICENSE file for details.