How to customize Flink data output to Kafka?
In Flink, data can be written to Kafka by passing a Kafka producer sink to the addSink() method. Here is an example:
import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.streaming.api.datastream.DataStream;
import org.apache.flink.streaming.api.environment.StreamExecutionEnvironment;
import org.apache.flink.streaming.connectors.kafka.FlinkKafkaProducer;

import java.util.Properties;

public class FlinkKafkaOutputExample {
    public static void main(String[] args) throws Exception {
        // Set up the execution environment
        StreamExecutionEnvironment env = StreamExecutionEnvironment.getExecutionEnvironment();
        // Create the input data stream
        DataStream<String> inputStream = env.fromElements("data1", "data2", "data3");
        // Define the Kafka connection information
        String kafkaBroker = "localhost:9092";
        String kafkaTopic = "output_topic";
        // Build the producer configuration
        Properties kafkaProps = new Properties();
        kafkaProps.setProperty("bootstrap.servers", kafkaBroker);
        // Create the Kafka producer (at-least-once delivery by default)
        FlinkKafkaProducer<String> kafkaSink = new FlinkKafkaProducer<>(
                kafkaTopic,
                new SimpleStringSchema(),
                kafkaProps);
        // Write the data stream to Kafka
        inputStream.addSink(kafkaSink);
        // Execute the job
        env.execute("Flink Kafka Output Example");
    }
}
In the code above, we first obtain a StreamExecutionEnvironment via StreamExecutionEnvironment.getExecutionEnvironment() and create an input data stream with the fromElements() method. Next, we define the Kafka connection information: the broker address and the output topic name. The broker address goes into a Properties object as the bootstrap.servers setting, which we pass together with the target topic and a SimpleStringSchema to the FlinkKafkaProducer constructor; this variant provides at-least-once delivery by default. Finally, we attach the producer to the stream with addSink() and start the job with env.execute().
It is important to note that FlinkKafkaProducer, used in the example above, has been deprecated since Flink 1.14. In newer Flink versions, the recommended replacement is the KafkaSink API from the flink-connector-kafka module, which is configured through a builder rather than constructor arguments and is attached with sinkTo() instead of addSink().
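As a minimal sketch of the newer API (assuming Flink 1.14 or later with the flink-connector-kafka dependency on the classpath, and reusing the inputStream from the example above), the same sink could be built like this:

import org.apache.flink.api.common.serialization.SimpleStringSchema;
import org.apache.flink.connector.base.DeliveryGuarantee;
import org.apache.flink.connector.kafka.sink.KafkaRecordSerializationSchema;
import org.apache.flink.connector.kafka.sink.KafkaSink;

// Build the sink with the builder API that replaces FlinkKafkaProducer
KafkaSink<String> sink = KafkaSink.<String>builder()
        .setBootstrapServers("localhost:9092")
        .setRecordSerializer(KafkaRecordSerializationSchema.builder()
                .setTopic("output_topic")
                .setValueSerializationSchema(new SimpleStringSchema())
                .build())
        .setDeliveryGuarantee(DeliveryGuarantee.AT_LEAST_ONCE)
        .build();

// New-style sinks are attached with sinkTo() instead of addSink()
inputStream.sinkTo(sink);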
Moreover, you can customize how records are serialized by implementing the SerializationSchema interface, or take full control of the produced record, including its key, target partition, and timestamp, by implementing the KafkaSerializationSchema interface; a sketch follows below. For further details, please refer to the official Flink documentation.
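For illustration, here is a minimal sketch of a custom KafkaSerializationSchema (the class name MyKafkaSerializationSchema is hypothetical) that also reuses the record value as the Kafka message key. Returning a ProducerRecord is what makes the key, partition, and timestamp controllable:

import org.apache.flink.streaming.connectors.kafka.KafkaSerializationSchema;
import org.apache.kafka.clients.producer.ProducerRecord;

import java.nio.charset.StandardCharsets;

// Hypothetical custom schema: serializes the value and reuses it as the message key
public class MyKafkaSerializationSchema implements KafkaSerializationSchema<String> {
    private final String topic;

    public MyKafkaSerializationSchema(String topic) {
        this.topic = topic;
    }

    @Override
    public ProducerRecord<byte[], byte[]> serialize(String element, Long timestamp) {
        byte[] bytes = element.getBytes(StandardCharsets.UTF_8);
        // A ProducerRecord lets you choose the topic, key, partition, and timestamp
        return new ProducerRecord<>(topic, bytes, bytes);
    }
}

A schema like this also works with the exactly-once constructor of FlinkKafkaProducer; note that exactly-once delivery additionally requires checkpointing to be enabled (env.enableCheckpointing(...)) and a transaction.timeout.ms that the broker accepts:

// Exactly-once producer built with the custom schema
FlinkKafkaProducer<String> exactlyOnceSink = new FlinkKafkaProducer<>(
        kafkaTopic,
        new MyKafkaSerializationSchema(kafkaTopic),
        kafkaProps,
        FlinkKafkaProducer.Semantic.EXACTLY_ONCE);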