Hadoop Serialization Guide
Hadoop serializes data with its own lightweight Writable mechanism rather than standard Java serialization; keys and values flowing through a MapReduce job must implement the Writable interface (keys additionally implement WritableComparable so they can be sorted). The specific steps are as follows:
- Create a class that implements the Writable interface to represent the data object that needs to be serialized. Writable is the interface Hadoop provides for serialization and deserialization; it declares the write() and readFields() methods.
```java
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;

import org.apache.hadoop.io.Writable;

public class MyData implements Writable {
    private String name;
    private int age;

    // Hadoop instantiates Writables reflectively,
    // so a public no-argument constructor is required.
    public MyData() {}

    // write(): serialize the object's fields to the byte stream.
    @Override
    public void write(DataOutput out) throws IOException {
        out.writeUTF(name);
        out.writeInt(age);
    }

    // readFields(): deserialize the fields from the byte stream,
    // reading them in exactly the order they were written.
    @Override
    public void readFields(DataInput in) throws IOException {
        name = in.readUTF();
        age = in.readInt();
    }

    // Getters and setters.
    public String getName() { return name; }
    public void setName(String name) { this.name = name; }
    public int getAge() { return age; }
    public void setAge(int age) { this.age = age; }
}
```
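A quick way to sanity-check that write() and readFields() agree is to round-trip the same field values through an in-memory byte stream. The sketch below uses only java.io (no Hadoop classes) and mirrors the writeUTF/writeInt and readUTF/readInt calls above; the class name RoundTripDemo is illustrative.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class RoundTripDemo {
    public static void main(String[] args) throws IOException {
        // Serialize: the same calls MyData.write() makes.
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        DataOutputStream out = new DataOutputStream(buffer);
        out.writeUTF("Alice");
        out.writeInt(30);
        out.close();

        // Deserialize: the same calls, in the same order,
        // that MyData.readFields() makes.
        DataInputStream in = new DataInputStream(
                new ByteArrayInputStream(buffer.toByteArray()));
        String name = in.readUTF();
        int age = in.readInt();

        System.out.println(name + " " + age); // prints "Alice 30"
    }
}
```

Reading fields in a different order than they were written is the most common Writable bug, and a round-trip like this catches it immediately.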
- Use the custom data type in a MapReduce program; Hadoop serializes it automatically when records are shuffled between the map and reduce phases.
```java
public static class MyMapper extends Mapper<LongWritable, Text, Text, MyData> {
    private final MyData myData = new MyData();

    @Override
    protected void map(LongWritable key, Text value, Context context)
            throws IOException, InterruptedException {
        // Populate the reusable myData object.
        myData.setName("Alice");
        myData.setAge(30);
        // Emit it; Hadoop serializes the value via MyData.write().
        context.write(new Text("key"), myData);
    }
}
```
```java
public static class MyReducer extends Reducer<Text, MyData, Text, Text> {
    @Override
    protected void reduce(Text key, Iterable<MyData> values, Context context)
            throws IOException, InterruptedException {
        // Iterate over the deserialized MyData objects for this key.
        for (MyData myData : values) {
            // Write out each object's fields.
            context.write(new Text(myData.getName()),
                          new Text(String.valueOf(myData.getAge())));
        }
    }
}
```
- In the main (driver) method, register the custom type as the map output value class so Hadoop knows how to serialize and deserialize it between the map and reduce phases. Note that the map output types differ from the final job output types here, so both pairs must be declared explicitly.
```java
job.setMapOutputKeyClass(Text.class);
job.setMapOutputValueClass(MyData.class);
job.setOutputKeyClass(Text.class);
job.setOutputValueClass(Text.class);
```
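Put together, a minimal driver might look like the following sketch. It assumes MyMapper and MyReducer are visible from the driver class; the class name MyDataDriver, the job name, and the use of command-line arguments for input/output paths are illustrative choices, not part of the original steps.

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyDataDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "mydata-demo");
        job.setJarByClass(MyDataDriver.class);

        job.setMapperClass(MyMapper.class);
        job.setReducerClass(MyReducer.class);

        // Map output types differ from the final output types,
        // so both pairs are declared explicitly.
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(MyData.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(Text.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));

        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}
```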
By following the steps above, custom data types can be serialized and deserialized in Hadoop MapReduce jobs.