Hadoop Serialization & Deserialization Guide
In Hadoop, serialization and deserialization are primarily achieved through the Writable and WritableComparable interfaces. The Writable interface defines the serialization contract, the write and readFields methods, that every serializable type must implement, while the WritableComparable interface extends Writable (and Comparable) to add the comparison method required for types used as keys.
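For reference, the contract each interface defines looks like this (simplified from the org.apache.hadoop.io package):

public interface Writable {
    void write(DataOutput out) throws IOException;     // serialize the object's state to the stream
    void readFields(DataInput in) throws IOException;  // restore the object's state from the stream
}

public interface WritableComparable<T> extends Writable, Comparable<T> {
    // No extra methods: compareTo comes from Comparable
}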
To implement serialization and deserialization for a custom type, follow the steps below:
- Create a class that implements the Writable interface. It should declare the fields to be serialized and implement the write and readFields methods, which perform serialization and deserialization respectively. A public no-argument constructor is also required, because Hadoop instantiates the class reflectively.
import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.Writable;

public class MyWritable implements Writable {
    private String field1;
    private int field2;

    // A public no-arg constructor is required; Hadoop creates instances reflectively before calling readFields.
    public MyWritable() {
    }

    public void setField1(String field1) { this.field1 = field1; }
    public void setField2(int field2) { this.field2 = field2; }
    public String getField1() { return field1; }
    public int getField2() { return field2; }

    @Override
    public void write(DataOutput out) throws IOException {
        // Serialization: write the fields in a fixed order
        out.writeUTF(field1);
        out.writeInt(field2);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        // Deserialization: read the fields back in the same order they were written
        field1 = in.readUTF();
        field2 = in.readInt();
    }
}
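As a quick sanity check outside of MapReduce, the write/readFields pair can be exercised against an in-memory byte stream. The RoundTripTest class below is an illustrative sketch, assuming the MyWritable class above (including its getters and setters):

import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.DataInputStream;
import java.io.DataOutputStream;
import java.io.IOException;

public class RoundTripTest {
    public static void main(String[] args) throws IOException {
        MyWritable original = new MyWritable();
        original.setField1("hello");
        original.setField2(42);

        // Serialize to an in-memory byte stream
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        original.write(new DataOutputStream(bytes));

        // Deserialize into a fresh instance
        MyWritable restored = new MyWritable();
        restored.readFields(new DataInputStream(new ByteArrayInputStream(bytes.toByteArray())));

        System.out.println(restored.getField1() + " / " + restored.getField2()); // prints: hello / 42
    }
}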
- Use this custom Writable class as a key or value type in the MapReduce program. Note that the Mapper and Reducer do not call write and readFields themselves; the framework invokes them automatically whenever records are serialized for the shuffle and deserialized on the reduce side.
public static class MyMapper extends Mapper<LongWritable, Text, Text, MyWritable> {
    private final Text outKey = new Text("key");
    private final MyWritable myWritable = new MyWritable();

    @Override
    public void map(LongWritable key, Text value, Context context) throws IOException, InterruptedException {
        // Parse a CSV line such as "abc,123" into the custom Writable
        String[] parts = value.toString().split(",");
        myWritable.setField1(parts[0]);
        myWritable.setField2(Integer.parseInt(parts[1]));
        // The framework serializes myWritable (via write) when it is emitted
        context.write(outKey, myWritable);
    }
}
public static class MyReducer extends Reducer<Text, MyWritable, Text, NullWritable> {
    @Override
    public void reduce(Text key, Iterable<MyWritable> values, Context context) throws IOException, InterruptedException {
        for (MyWritable value : values) {
            // Each value has already been deserialized by the framework (via readFields)
            String field1 = value.getField1();
            int field2 = value.getField2();
            // Perform further processing with field1 and field2 here
        }
    }
}
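For the job to use these classes, the driver must register MyWritable as the map output value class. A minimal driver sketch follows, assuming MyMapper and MyReducer are nested in (or imported into) the driver class; the MyJobDriver name and argument handling are illustrative:

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.Path;
import org.apache.hadoop.io.NullWritable;
import org.apache.hadoop.io.Text;
import org.apache.hadoop.mapreduce.Job;
import org.apache.hadoop.mapreduce.lib.input.FileInputFormat;
import org.apache.hadoop.mapreduce.lib.output.FileOutputFormat;

public class MyJobDriver {
    public static void main(String[] args) throws Exception {
        Job job = Job.getInstance(new Configuration(), "my-writable-job");
        job.setJarByClass(MyJobDriver.class);

        job.setMapperClass(MyMapper.class);
        job.setReducerClass(MyReducer.class);

        // The map output types differ from the final output types,
        // so they must be declared explicitly.
        job.setMapOutputKeyClass(Text.class);
        job.setMapOutputValueClass(MyWritable.class);
        job.setOutputKeyClass(Text.class);
        job.setOutputValueClass(NullWritable.class);

        FileInputFormat.addInputPath(job, new Path(args[0]));
        FileOutputFormat.setOutputPath(job, new Path(args[1]));
        System.exit(job.waitForCompletion(true) ? 0 : 1);
    }
}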
By implementing the Writable and WritableComparable interfaces, custom data types can be serialized and deserialized in Hadoop, allowing them to be stored and processed in MapReduce programs. Keep in mind that any type used as a key must implement WritableComparable, because keys are sorted during the shuffle phase.
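As an illustration, here is a minimal sketch of a key type implementing WritableComparable; the MyKey class and its single id field are hypothetical:

import java.io.DataInput;
import java.io.DataOutput;
import java.io.IOException;
import org.apache.hadoop.io.WritableComparable;

public class MyKey implements WritableComparable<MyKey> {
    private int id;

    public MyKey() {
    }

    @Override
    public void write(DataOutput out) throws IOException {
        out.writeInt(id);
    }

    @Override
    public void readFields(DataInput in) throws IOException {
        id = in.readInt();
    }

    @Override
    public int compareTo(MyKey other) {
        // Defines the sort order of keys during the shuffle
        return Integer.compare(this.id, other.id);
    }

    @Override
    public int hashCode() {
        // Used by the default HashPartitioner to assign keys to reducers
        return id;
    }

    @Override
    public boolean equals(Object o) {
        return o instanceof MyKey && ((MyKey) o).id == this.id;
    }
}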