How to detect a file's character encoding in Java

2 years ago

夏樹, 風

1 minute

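In Java, you can detect a file's character encoding with the UniversalDetector class from the juniversalchardet library. The library is not part of the JDK, so first add it to your project as a dependency. Then the following code reports the encoding it detects for a file.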

import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Paths;

import org.mozilla.universalchardet.UniversalDetector;

public class CharsetDetectorExample {
    public static void main(String[] args) {
        try {
            byte[] data = readFile("path/to/file"); // read the file contents into a byte array
            UniversalDetector detector = new UniversalDetector(null);
            detector.handleData(data, 0, data.length); // feed the bytes to the detector
            detector.dataEnd(); // signal end of input so the detector can settle on a result
            String charsetName = detector.getDetectedCharset(); // null if nothing was detected
            detector.reset(); // make the detector reusable
            System.out.println("Detected file encoding: " + charsetName);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    private static byte[] readFile(String filePath) throws IOException {
        // Files.readAllBytes reads the whole file; a single InputStream.read call may stop early
        return Files.readAllBytes(Paths.get(filePath));
    }
}

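The readFile method reads the file's contents into a byte array. A UniversalDetector object is then created, and the bytes are passed to its handleData method for analysis. After dataEnd signals the end of the input, getDetectedCharset returns the name of the detected encoding, or null if the detector could not identify one.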
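
Once a charset name has been obtained, a natural next step is to decode the bytes with it. The following is a minimal sketch of that step using only the JDK. The class DetectedCharsetDecoder and its resolve and decode methods are hypothetical helpers made up for illustration, not part of juniversalchardet, and the literal "UTF-8" in main stands in for whatever getDetectedCharset returned. The fallback charset is an assumption you would choose for your own data.

```java
import java.nio.charset.Charset;
import java.nio.charset.IllegalCharsetNameException;
import java.nio.charset.StandardCharsets;
import java.nio.charset.UnsupportedCharsetException;

// Hypothetical helper (not part of juniversalchardet): turns a detected
// charset name into decoded text, with a caller-supplied fallback.
final class DetectedCharsetDecoder {

    // Returns the Charset for detectedName, or fallback when the name is
    // null (nothing detected) or not supported by this JVM.
    static Charset resolve(String detectedName, Charset fallback) {
        if (detectedName == null) {
            return fallback;
        }
        try {
            return Charset.forName(detectedName);
        } catch (IllegalCharsetNameException | UnsupportedCharsetException e) {
            return fallback;
        }
    }

    // Decodes the raw file bytes using the resolved charset.
    static String decode(byte[] data, String detectedName, Charset fallback) {
        return new String(data, resolve(detectedName, fallback));
    }

    public static void main(String[] args) {
        byte[] data = "r\u00e9sum\u00e9".getBytes(StandardCharsets.UTF_8);
        // "UTF-8" stands in for the value returned by getDetectedCharset()
        System.out.println(decode(data, "UTF-8", StandardCharsets.ISO_8859_1));
        // A null name (no detection) resolves to the supplied fallback
        System.out.println(resolve(null, StandardCharsets.ISO_8859_1));
    }
}
```

Keeping the fallback explicit avoids silently depending on the platform default charset when detection fails, which is most likely for very short or pure-ASCII files.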