I'm parsing MP3 tags.
I don't know what encoding the artist string is in. For example:
Ïåñíÿ ïðî íàäåæäó - an example string in Russian, "Песня про надежду" ("A Song About Hope")
I use http://code.google.com/p/juniversalchardet/ to detect the encoding. Code:
String GetEncoding(String text) throws IOException {
        byte[] buf = new byte[4096];
        InputStream fis = new ByteArrayInputStream(text.getBytes());
        UniversalDetector detector = new UniversalDetector(null);
        int nread;
        while ((nread = fis.read(buf)) > 0 && !detector.isDone()) {
            detector.handleData(buf, 0, nread);
        }
        detector.dataEnd();
        String encoding = detector.getDetectedCharset();
        detector.reset();
        return encoding;
    }
and then convert with:
new String(text.getBytes(encoding), "cp1251");
but this doesn't work.
If I use UTF-16:
new String(text.getBytes("UTF-16"), "cp1251")
it returns "юя П е с н я п р о н а д е ж д у", and the "spaces" are not real space characters.
EDIT:
This is where I first read the bytes:
byte[] abyFrameData = new byte[iTagSize];
oID3DIS.readFully(abyFrameData);
ByteArrayInputStream oFrameBAIS = new ByteArrayInputStream(abyFrameData);
String s = new String(abyFrameData, "????");
Java strings are UTF-16 internally; text in any other encoding can only be held as a sequence of bytes. To decode character data correctly, you must supply the right encoding when you first create the string from those bytes. If you already have a corrupted string, it is too late.
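To see why it is too late, here is a small sketch that decodes the same bytes two ways. It assumes the tag really holds windows-1251 bytes, which fits the example: decoding the windows-1251 bytes of "Песня про надежду" as ISO-8859-1 yields exactly "Ïåñíÿ ïðî íàäåæäó". It also reproduces the UTF-16 experiment: re-encoding the garbled string as UTF-16 writes a BOM ($FE $FF) and a $00 high byte before each character, which windows-1251 displays as "ю", "я" and NUL characters. The class and method names are made up for illustration, and windows-1251 is not one of the charsets every Java platform must support, although mainstream JDKs include it.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class MojibakeDemo {
    // windows-1251 is not a guaranteed standard charset, but mainstream JDKs ship it
    static final Charset CP1251 = Charset.forName("windows-1251");

    /** Decodes raw tag bytes with the charset they were actually written in. */
    static String decodeRight(byte[] raw) {
        return new String(raw, CP1251);
    }

    /** Decodes the same bytes as ISO-8859-1, which turns Cyrillic into mojibake. */
    static String decodeWrong(byte[] raw) {
        return new String(raw, StandardCharsets.ISO_8859_1);
    }

    /** The question's experiment: encode a string as UTF-16, then read the bytes as windows-1251. */
    static String utf16ReadAsCp1251(String s) {
        return new String(s.getBytes(StandardCharsets.UTF_16), CP1251);
    }

    public static void main(String[] args) {
        // Hypothetical tag bytes: the title stored as windows-1251
        byte[] raw = "Песня про надежду".getBytes(CP1251);
        System.out.println(decodeRight(raw)); // value: Песня про надежду
        System.out.println(decodeWrong(raw)); // value: Ïåñíÿ ïðî íàäåæäó
        // "юя" comes from the BOM, then a NUL precedes every character
        System.out.println(utf16ReadAsCp1251(decodeWrong(raw)).startsWith("юя\0П")); // true
    }
}
```

Only the first decode is correct; once the bytes have been turned into the wrong characters, re-encoding them to UTF-16 just adds more noise.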
Assuming these are ID3 tags, the specification defines the encoding rules. For example, ID3v2.4.0 can restrict the permitted encodings through an extended header:
q - Text encoding restrictions
  0    No restrictions
  1    Strings are only encoded with ISO-8859-1 [ISO-8859-1] or UTF-8 [UTF-8].
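A minimal sketch of testing that flag, assuming the restrictions byte layout %ppqrrstt from the ID3v2.4.0 structure document (pp tag size, q text encoding, rr text field size, s image encoding, tt image size), which puts q at bit 5. The byte is only present when the extended header's tag-restrictions flag is set; reading the header itself is left out, and the names below are hypothetical.

```java
public class Id3Restrictions {
    // Assumed layout of the restrictions byte: %ppqrrstt
    // bit 5 is the single 'q' (text encoding restriction) flag
    static final int TEXT_ENCODING_BIT = 0x20;

    /** True when q = 1, i.e. strings may only be ISO-8859-1 or UTF-8. */
    static boolean onlyLatin1OrUtf8(int restrictions) {
        return (restrictions & TEXT_ENCODING_BIT) != 0;
    }

    public static void main(String[] args) {
        System.out.println(onlyLatin1OrUtf8(0x20)); // true
        System.out.println(onlyLatin1OrUtf8(0xDF)); // false: every bit except q is set
    }
}
```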
Encoding handling is defined further down the document:
If nothing else is said, strings, including numeric strings and URLs, are represented as ISO-8859-1 characters in the range $20 - $FF. Such strings are represented in frame descriptions as <text string>, or <full text string> if newlines are allowed. If nothing else is said newline character is forbidden. In ISO-8859-1 a newline is represented, when allowed, with $0A only.
Frames that allow different types of text encoding contains a text encoding description byte. Possible encodings:
  $00   ISO-8859-1 [ISO-8859-1]. Terminated with $00.
  $01   UTF-16 [UTF-16] encoded Unicode [UNICODE] with BOM. All strings in the same frame SHALL have the same byteorder. Terminated with $00 00.
  $02   UTF-16BE [UTF-16] encoded Unicode [UNICODE] without BOM. Terminated with $00 00.
  $03   UTF-8 [UTF-8] encoded Unicode [UNICODE]. Terminated with $00.
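One practical consequence of the two terminator widths: for $01 and $02 the $00 00 terminator must be searched for in whole 2-byte units, otherwise the zero high byte of one character and the zero low byte of the next can look like a terminator. A sketch of that scan over a byte array follows; the class and method names are hypothetical, and it assumes the scan starts at the first byte of the string.

```java
public class Id3Terminators {
    /** Terminator width for an ID3v2.4.0 encoding byte: 2 for $01/$02 (UTF-16), 1 for $00/$03. */
    static int terminatorLength(int encodingByte) {
        if (encodingByte < 0 || encodingByte > 3) {
            throw new IllegalArgumentException("Unknown text encoding byte: " + encodingByte);
        }
        return (encodingByte == 1 || encodingByte == 2) ? 2 : 1;
    }

    /**
     * Returns the index of the first terminator at or after 'from', stepping in whole
     * units so a UTF-16 terminator never straddles two characters; -1 if there is none.
     */
    static int indexOfTerminator(byte[] data, int from, int encodingByte) {
        int unit = terminatorLength(encodingByte);
        for (int i = from; i + unit <= data.length; i += unit) {
            boolean allZero = true;
            for (int j = 0; j < unit; j++) {
                if (data[i + j] != 0) {
                    allZero = false;
                    break;
                }
            }
            if (allZero) {
                return i;
            }
        }
        return -1;
    }

    public static void main(String[] args) {
        // "Hi" as $01 (UTF-16 with a little-endian BOM) followed by $00 00
        byte[] utf16 = { (byte) 0xFF, (byte) 0xFE, 'H', 0, 'i', 0, 0, 0 };
        System.out.println(indexOfTerminator(utf16, 0, 1)); // 6 (a byte-by-byte scan would stop at 5)
        byte[] latin1 = { 'H', 'i', 0 };
        System.out.println(indexOfTerminator(latin1, 0, 0)); // 2
    }
}
```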
Use transcoding classes like InputStreamReader or (more likely in this case) the String(byte[],Charset) constructor to decode the data. See also "Java: a rough guide to character encoding".
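Applied to the EDIT in the question, the "????" can be chosen from the frame's own encoding byte, and the String(byte[], int, int, Charset) overload then decodes just the text bytes. This sketch assumes abyFrameData holds a single text frame body with its header already consumed (one encoding byte followed by the text); if iTagSize covers the whole tag you would have to walk the frame headers first. It also treats the value as a single string, although ID3v2.4.0 text frames may hold several terminator-separated strings. TextFrameDecoder and decodeTextFrame are hypothetical names.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class TextFrameDecoder {
    // Index = ID3v2.4.0 text encoding byte
    private static final Charset[] CHARSETS = {
        StandardCharsets.ISO_8859_1, // $00
        StandardCharsets.UTF_16,     // $01, byte order taken from the BOM
        StandardCharsets.UTF_16BE,   // $02, no BOM
        StandardCharsets.UTF_8       // $03
    };

    /** Decodes a frame body laid out as: encoding byte, then the encoded text. */
    static String decodeTextFrame(byte[] frameData) {
        int enc = frameData[0] & 0xFF;
        if (enc >= CHARSETS.length) {
            throw new IllegalArgumentException("Unknown text encoding byte: " + enc);
        }
        int unit = (enc == 1 || enc == 2) ? 2 : 1;
        int end = frameData.length;
        // Drop one trailing terminator ($00 or $00 00) if present
        if (end - 1 >= unit && isZero(frameData, end - unit, unit)) {
            end -= unit;
        }
        // Skip the encoding byte at index 0 and decode only the text bytes
        return new String(frameData, 1, end - 1, CHARSETS[enc]);
    }

    private static boolean isZero(byte[] data, int from, int len) {
        for (int i = from; i < from + len; i++) {
            if (data[i] != 0) {
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        // Hypothetical frame body: $03 (UTF-8) + "Hi" + $00
        byte[] body = { 3, 'H', 'i', 0 };
        System.out.println(decodeTextFrame(body)); // Hi
    }
}
```

Note that ID3 has no code for windows-1251, so a tag that says $00 but actually contains windows-1251 bytes will still decode to mojibake like the string in the question.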
Parsing the string components of an ID3v2.4.0 data structure would look something like this:
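//untested code
// Needs java.io.*, java.nio.charset.Charset, java.nio.charset.StandardCharsets, java.util.Arrays
public String parseID3String(DataInputStream in) throws IOException {
  Charset[] encodings = { StandardCharsets.ISO_8859_1, StandardCharsets.UTF_16,
      StandardCharsets.UTF_16BE, StandardCharsets.UTF_8 };
  int encodingByte = in.readUnsignedByte(); // throws EOFException at end of stream
  if (encodingByte >= encodings.length) {
    throw new IOException("Unknown text encoding byte: " + encodingByte);
  }
  // UTF-16 ($01) and UTF-16BE ($02) end with $00 00; the others end with $00
  byte[] terminator = (encodingByte == 1 || encodingByte == 2) ? new byte[2] : new byte[1];
  byte[] buf = terminator.clone();
  ByteArrayOutputStream buffer = new ByteArrayOutputStream();
  while (true) {
    in.readFully(buf); // read one code unit (1 or 2 bytes) so UTF-16 stays aligned
    if (Arrays.equals(terminator, buf)) {
      break; // stop before the terminator so it is not decoded into the result
    }
    buffer.write(buf);
  }
  return new String(buffer.toByteArray(), encodings[encodingByte]);
}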