Monday, September 23, 2013

Converting a String to a byte array char by char

This post was last edited by skmbw on 2013-09-22 15:58:36
For example, String text = "阿是";
Converting it with byte[] bytes = text.getBytes(); gives the byte array
[-80, -94, -54, -57]
I want to do this through char instead, that is, traverse each character of the String and convert it into a byte array. How can I get the same array that text.getBytes() returns?
The encoding is GBK.
Example:

String text = "阿是";
byte[] bytes = text.getBytes();//[-80, -94, -54, -57]
byte[] abytes = new byte[text.length() * 2];
for (int i = 0; i < text.length(); i++) {
    char c = text.charAt(i);
    // Here, using c, what should I do to convert the String into a byte[]?

    //System.out.println((byte)((c + 0xA0)) );
    //System.out.println((byte)(0x96));
    //System.out.println((byte)(0xA0 + (c >> 6)));
    //System.out.println((byte)(0xa0 + (c & 0x3F)));
}


Could an expert please tell me how?
------ Solution --------------------------------------------

What is the problem here?

Let's wait for the OP to answer the question.

If you want to know right away, open a new thread offering 40 points and award them to me.

There's no problem here.... Did you get something wrong?....

Well, what I said wasn't accurate; written that way, it actually works.
------ Solution --------------------------------------------
Java has APIs for character-set conversion, including GBK. Go and study getBytes().
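As a minimal sketch of that suggestion: the no-argument getBytes() uses the platform default charset, so it only produces GBK when the default happens to be GBK. Passing the charset explicitly removes that dependency. This assumes the JRE ships the GBK charset (it is an extended charset, present in full JDK builds); the class name GbkGetBytes is hypothetical.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical helper: encodes a whole String as GBK in one call.
final class GbkGetBytes {
    // Charset.forName throws UnsupportedCharsetException if the JRE lacks GBK.
    static final Charset GBK = Charset.forName("GBK");

    static byte[] gbkBytes(String text) {
        return text.getBytes(GBK); // explicit charset, independent of the platform default
    }

    public static void main(String[] args) {
        // The two escapes are U+963F and U+662F, the question's example string.
        System.out.println(Arrays.toString(gbkBytes("\u963F\u662F"))); // [-80, -94, -54, -57]
    }
}
```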
------ Solution --------------------------------------------
I don't know why you have to convert through char, but if you must, see whether the following approach meets your requirements:

package study.string.length;

import java.io.UnsupportedEncodingException;

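// sun.io is an internal JDK package that was removed in JDK 8,
// so this class only compiles on JDK 7 and earlier.
import sun.io.CharToByteConverter;
import sun.io.MalformedInputException;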

public class StrLenght {

    public static void main(String[] args) throws UnsupportedEncodingException, MalformedInputException {
        String str = "a中";
        byte[] chars = str.getBytes("GBK"); // reference result to compare against
        for (int x = 0; x < chars.length; x++) {
            System.out.println(chars[x]);
        }

        print(str);
    }

    public static void print(String str) throws UnsupportedEncodingException, MalformedInputException {
        byte[] result = new byte[str.getBytes("GBK").length]; // exact size of the GBK encoding
        int p = 0;
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            if (c < 0x80) {
                // ASCII: in GBK the single byte equals the char value
                result[p++] = (byte) c;
            } else {
                char[] cs = new char[1];
                cs[0] = c;

                CharToByteConverter converter = CharToByteConverter.getConverter("GBK");
                byte[] br = converter.convertAll(cs);

                result[p++] = br[0];
                result[p++] = br[1];
            }

        }
        for (int x = 0; x < result.length; x++) {
            System.out.println(result[x]);
        }
    }

}
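On current JDKs the same per-char loop can be written with the public java.nio.charset.CharsetEncoder API instead of sun.io.CharToByteConverter. A minimal sketch, with the hypothetical class name PerCharGbk and assuming the GBK charset is available: it takes the one-byte shortcut only for ASCII (below 0x80), because a char with a zero high byte can still be two bytes in GBK (for example é, U+00E9).

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.util.Arrays;

// Hypothetical per-char GBK encoder that uses only public java.nio APIs.
final class PerCharGbk {
    static byte[] encode(String str) {
        // One encoder for the whole loop; encode(CharBuffer) resets it on every call.
        CharsetEncoder encoder = Charset.forName("GBK").newEncoder();
        ByteArrayOutputStream out = new ByteArrayOutputStream(str.length() * 2);
        for (int i = 0; i < str.length(); i++) {
            char c = str.charAt(i);
            if (c < 0x80) {
                out.write(c); // ASCII: the GBK byte equals the char value
                continue;
            }
            try {
                ByteBuffer bb = encoder.encode(CharBuffer.wrap(new char[] {c}));
                // copy only the encoded bytes, not the whole backing array
                out.write(bb.array(), bb.arrayOffset() + bb.position(), bb.remaining());
            } catch (CharacterCodingException e) {
                throw new IllegalArgumentException("cannot encode char at index " + i + " as GBK", e);
            }
        }
        return out.toByteArray();
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(encode("a\u4E2D"))); // [97, -42, -48]
    }
}
```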

------ Solution --------------------------------------------
If you really have to process it character by character, you can use CharBuffer.
But the simplest approach is still to use String; it generally won't go wrong.
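A minimal sketch of the CharBuffer route, assuming the GBK charset is available (CharBufferGbk is a hypothetical name): wrap the chars in a CharBuffer, run them through a CharsetEncoder into a ByteBuffer sized for the worst case, then copy out only the bytes actually written.

```java
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;
import java.nio.charset.CoderResult;
import java.util.Arrays;

// Hypothetical helper: pushes a CharBuffer through a CharsetEncoder into a ByteBuffer.
final class CharBufferGbk {
    static byte[] encode(char[] chars) {
        CharsetEncoder encoder = Charset.forName("GBK").newEncoder();
        CharBuffer in = CharBuffer.wrap(chars);
        // Worst case: every char needs maxBytesPerChar() bytes, so the output cannot overflow.
        ByteBuffer out = ByteBuffer.allocate((int) Math.ceil(chars.length * (double) encoder.maxBytesPerChar()));
        CoderResult result = encoder.encode(in, out, true); // true: no more input follows
        if (!result.isUnderflow()) {
            throw new IllegalArgumentException("cannot encode as GBK: " + result);
        }
        encoder.flush(out); // let the encoder write any pending bytes
        out.flip();         // switch the buffer from writing to reading
        byte[] bytes = new byte[out.remaining()];
        out.get(bytes);
        return bytes;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(encode(new char[] {'a', '\u963F', '\u662F'}))); // [97, -80, -94, -54, -57]
    }
}
```

Sizing the output with maxBytesPerChar() avoids having to handle a buffer-overflow result in the loop.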
------ For reference only ---------------------------------------
A char is two bytes, so the result will be different.
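To make the difference concrete, here is a small sketch (CharVsGbk is a hypothetical name, and GBK is assumed to be available): splitting a char into its high and low bytes yields its UTF-16 code unit, which has nothing to do with its GBK code. For 阿 (U+963F) the high byte is 0x96, the same value that appears in one of the question's commented-out lines.

```java
import java.nio.charset.Charset;
import java.util.Arrays;

// Hypothetical demo: the two bytes inside a char are its UTF-16 code unit, not its GBK code.
final class CharVsGbk {
    static byte[] charHalves(char c) {
        return new byte[] {(byte) (c >> 8), (byte) c}; // high byte, low byte
    }

    public static void main(String[] args) {
        char c = '\u963F'; // the first character of the question's example
        System.out.println(Arrays.toString(charHalves(c)));                                           // [-106, 63]
        System.out.println(Arrays.toString(String.valueOf(c).getBytes(Charset.forName("GBK")))); // [-80, -94]
    }
}
```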
------ For reference only ---------------------------------------
for (int i = 0; i < text.length(); i++) {
    char c = text.charAt(i);

If you are processing Chinese characters, text.length() will be a problem.
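A quick sketch of why the length matters (GbkLengths is a hypothetical name; GBK is assumed available): text.length() counts chars, not GBK bytes, so neither text.length() nor the question's text.length() * 2 gives the exact array size once ASCII and Chinese characters are mixed.

```java
import java.nio.charset.Charset;

// Hypothetical demo: char count and GBK byte count differ.
final class GbkLengths {
    static final Charset GBK = Charset.forName("GBK");

    static int gbkLength(String text) {
        return text.getBytes(GBK).length;
    }

    public static void main(String[] args) {
        String chinese = "\u963F\u662F"; // two Chinese characters
        String mixed = "a\u4E2D";        // one ASCII letter, one Chinese character
        System.out.println(chinese.length() + " chars, " + gbkLength(chinese) + " GBK bytes"); // 2 chars, 4 GBK bytes
        System.out.println(mixed.length() + " chars, " + gbkLength(mixed) + " GBK bytes");     // 2 chars, 3 GBK bytes
    }
}
```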
------ For reference only ---------------------------------------

What is the problem here?
------ For reference only ---------------------------------------

What is the problem here?

Let's wait for the OP to answer the question.

If you want to know right away, open a new thread offering 40 points and award them to me.
------ For reference only ---------------------------------------

What is the problem here?

Let's wait for the OP to answer the question.

If you want to know right away, open a new thread offering 40 points and award them to me.

There's no problem here.... Did you get something wrong?....
------ For reference only ---------------------------------------

I am indeed dealing with Chinese, and a char can represent a Chinese character. A Chinese character is converted into byte[] b = new byte[2]; a letter is converted into one byte. What is the problem? Please enlighten me.
------ For reference only ---------------------------------------

I am indeed dealing with Chinese, and a char can represent a Chinese character. A Chinese character is converted into byte[] b = new byte[2]; a letter is converted into one byte. What is the problem? Please enlighten me.

There is no problem,
but you cannot just new byte[2]; byte and char are different.
------ For reference only ---------------------------------------
I want to use bit operations to get better performance. If I were using the nio package, I would already know how to do it.
If converting to UTF-8, the following operations work:

(byte)(0xE0 + (chr >> 12));
(byte)(0x80 + ((chr >> 6) & 0x3F));
(byte)(0x80 + (chr & 0x3F));

In the code above, chr is a single character (this is the three-byte form, which is what Chinese characters need).
If the character is a letter, simply

(byte)chr;

is enough.
What confuses me now is GBK: how do I write it with bit operations?
Could some expert please enlighten us?
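The three lines above cover only the three-byte UTF-8 form (chars from 0x0800 upward). A minimal sketch of the complete bitwise UTF-8 encoder for single chars follows, assuming the input contains no surrogate pairs (supplementary characters need code-point handling and four bytes); Utf8Bits is a hypothetical name. It uses | where the post uses +, which gives the same result because the bit fields do not overlap.

```java
import java.io.ByteArrayOutputStream;

// Hypothetical bitwise UTF-8 encoder for BMP chars (no surrogate pairs).
final class Utf8Bits {
    static void encodeChar(char c, ByteArrayOutputStream out) {
        if (c < 0x80) {                          // 1 byte:  0xxxxxxx
            out.write(c);
        } else if (c < 0x800) {                  // 2 bytes: 110xxxxx 10xxxxxx
            out.write(0xC0 | (c >> 6));
            out.write(0x80 | (c & 0x3F));
        } else if (Character.isSurrogate(c)) {   // needs a code point, not a single char
            throw new IllegalArgumentException("surrogate chars are not handled in this sketch");
        } else {                                 // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            out.write(0xE0 | (c >> 12));
            out.write(0x80 | ((c >> 6) & 0x3F));
            out.write(0x80 | (c & 0x3F));
        }
    }

    static byte[] encode(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        for (int i = 0; i < s.length(); i++) {
            encodeChar(s.charAt(i), out);
        }
        return out.toByteArray();
    }
}
```

This works for UTF-8 only because UTF-8 is a fixed bit layout over the Unicode code point; GBK has no such layout.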
------ For reference only ---------------------------------------

I am indeed dealing with Chinese, and a char can represent a Chinese character. A Chinese character is converted into byte[] b = new byte[2]; a letter is converted into one byte. What is the problem? Please enlighten me.

There is no problem,
but you cannot just new byte[2]; byte and char are different.
There is a difference, which is why I want to use bit operations to turn the char into a byte[].
Please let us know.
------ For reference only ---------------------------------------

Ah, I have seen that; in the end it calls a native method. I only know Java.
------ For reference only ---------------------------------------

This is not a problem.

Please point me in the right direction, thanks.
------ For reference only ---------------------------------------

This uses Sun's private classes and methods. It is one way, too.
------ For reference only ---------------------------------------

This uses Sun's private classes and methods. It is one way, too.

Yes. The source code is quite complicated; take a look when you have time.
------ For reference only ---------------------------------------
I looked at the source of sun.nio.cs.ext.GBK18030.java in the JDK's charsets.jar, as well as sun.nio.cs.UTF_8.java in rt.jar. UTF-8 can be produced directly with bit arithmetic because Java uses Unicode internally, and UTF-8 follows a fixed rule that maps directly onto Unicode. GBK has no such fixed relationship with Unicode, so for the most part it cannot be converted with bit operations; GBK18030.java does it entirely by hand, using mapping tables.
I also extracted the code from GBK18030.java into a utility class, but its performance was not as good as simply using nio.
This is the nio way:

import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.Charset;
import java.util.Arrays;

public class GBKCharUtils {
    public static final Charset charset = Charset.forName("GBK");

    public static byte[] getBytes(char c) {
        CharBuffer charBuffer = CharBuffer.allocate(1);
        charBuffer.put(c);
        charBuffer.flip();
        return toArray(charset.encode(charBuffer));
    }

    public static byte[] getBytes(char[] chars) {
        CharBuffer charBuffer = CharBuffer.wrap(chars);
        return toArray(charset.encode(charBuffer));
    }

    // Copy only the encoded bytes: the buffer's backing array() may be longer
    // than the encoded result, which would leave trailing zero bytes.
    private static byte[] toArray(ByteBuffer byteBuffer) {
        byte[] bytes = new byte[byteBuffer.remaining()];
        byteBuffer.get(bytes);
        return bytes;
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(getBytes(new char[] {'c', '2', 'a'})));

        System.out.println(Arrays.toString(getBytes('雷')));

        System.out.println(Arrays.toString(getBytes(new char[] {'雷'})));
    }
}

So just use nio directly.
Here is the method extracted from GBK18030.java:

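public void encode(CharBuffer src, ByteBuffer dst) {
    //int hiByte = 0, loByte = 0;
    while (src.hasRemaining()) {
        char c = src.get();
        if (c >= 0x0000 && c <= 0x007F) {
            // ASCII: one byte with the same value as the char
            dst.put((byte)c);
        } else if (c <= 0xA4C6 || c >= 0xE000) {
            // encoderIndex1 and encoderIndex2 are tables from GBK18030.java (not shown here)
            int outByteVal = getGB18030(encoderIndex1, encoderIndex2, c);
            //hiByte = (outByteVal & 0xFF00) >> 8;
            //loByte = outByteVal & 0xFF;

            dst.put((byte)((outByteVal & 0xFF00) >> 8)); // high byte
            dst.put((byte)(outByteVal & 0xFF));          // low byte
        }
        // chars from 0xA4C7 to 0xDFFF match neither branch and are skipped in this excerpt
    }
}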

Here, getGB18030 performs the mapping between Unicode and GBK codes.
The JDK itself does not seem to include this source; you can download the OpenJDK source code to look at it.
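The mapping-table idea described above can be sketched without touching JDK internals: build a char-to-GBK-code table once, then encode each char with a lookup plus a shift-and-mask split into bytes. This is not what GBK18030.java does; the class name GbkTable is hypothetical, the table is filled from the JDK's own GBK encoder when the class loads (assuming GBK is available), and no performance claim is made.

```java
import java.io.ByteArrayOutputStream;
import java.nio.ByteBuffer;
import java.nio.CharBuffer;
import java.nio.charset.CharacterCodingException;
import java.nio.charset.Charset;
import java.nio.charset.CharsetEncoder;

// Hypothetical table-driven GBK encoder; the table is built from the JDK's GBK encoder.
final class GbkTable {
    // CODES[c] = GBK code of char c: one byte (<= 0xFF), two bytes (high << 8 | low), or -1.
    private static final int[] CODES = buildTable();

    private static int[] buildTable() {
        CharsetEncoder encoder = Charset.forName("GBK").newEncoder();
        int[] codes = new int[0x10000];
        for (int i = 0; i < codes.length; i++) {
            codes[i] = -1;
            char c = (char) i;
            if (Character.isSurrogate(c) || !encoder.canEncode(c)) {
                continue; // leave unmappable chars marked as -1
            }
            try {
                ByteBuffer bb = encoder.encode(CharBuffer.wrap(new char[] {c}));
                if (bb.remaining() == 1) {
                    codes[i] = bb.get() & 0xFF;
                } else if (bb.remaining() == 2) {
                    int high = bb.get() & 0xFF;
                    int low = bb.get() & 0xFF;
                    codes[i] = (high << 8) | low;
                }
            } catch (CharacterCodingException e) {
                // canEncode reported true, so this is not expected; keep -1
            }
        }
        return codes;
    }

    static byte[] encode(String s) {
        ByteArrayOutputStream out = new ByteArrayOutputStream(s.length() * 2);
        for (int i = 0; i < s.length(); i++) {
            int code = CODES[s.charAt(i)];
            if (code < 0) {
                throw new IllegalArgumentException("char at index " + i + " has no GBK code");
            }
            if (code <= 0xFF) {
                out.write(code);        // single-byte code
            } else {
                out.write(code >> 8);   // lead byte
                out.write(code & 0xFF); // trail byte
            }
        }
        return out.toByteArray();
    }
}
```

The bit operations only split a code that is already known; the table lookup is what replaces the missing arithmetic rule between Unicode and GBK.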
