
[JDK 1.8] String Source Code Analysis

String

Class declaration

public final class String
    implements java.io.Serializable, Comparable<String>, CharSequence
           

The first thing to notice is that String is an immutable class. It is also declared final, so it cannot be subclassed.

It implements the Serializable interface, as well as Comparable (which mainly means the compareTo method) and CharSequence (see the figure below).

[Figure: the CharSequence interface]
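To make the declaration a little more concrete, here is a minimal sketch of what the implemented interfaces buy you in practice. The class name DeclarationDemo and the helper countUpperCase are illustrative assumptions, not part of the JDK.

```java
class DeclarationDemo {
    // Hypothetical helper: accepts any CharSequence, so String and StringBuilder both work.
    static int countUpperCase(CharSequence cs) {
        int count = 0;
        for (int i = 0; i < cs.length(); i++) {
            if (Character.isUpperCase(cs.charAt(i))) {
                count++;
            }
        }
        return count;
    }

    public static void main(String[] args) {
        String s = "Hello";
        String upper = s.toUpperCase();   // returns a new String object
        System.out.println(s);            // the original String is unchanged
        System.out.println(upper);

        // Comparable<String>: compareTo gives lexicographic ordering.
        System.out.println("apple".compareTo("banana") < 0);

        // CharSequence: the same helper works for different implementations.
        System.out.println(countUpperCase(s));
        System.out.println(countUpperCase(new StringBuilder("ABc")));
    }
}
```

Because String is final and immutable, a method such as toUpperCase() has to hand back a new object rather than modify the receiver.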

Member variables

/** Backing storage for the characters. */
    private final char value[];

    /** Cached hash code. */
    private int hash; // Default to 0
           

Constructors

The constructors fall into three main groups: those that take a byte[], those that take a char[], and those that take a StringBuilder or StringBuffer.

byte[] constructors

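Among the byte[] constructors, the most important (or at least the most commonly used) are the ones that decode bytes using a named charset.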

public String(byte bytes[], String charsetName)
            throws UnsupportedEncodingException {
        this(bytes, 0, bytes.length, charsetName);
    }
    public String(byte bytes[], int offset, int length, String charsetName)
            throws UnsupportedEncodingException {
        if (charsetName == null)
            throw new NullPointerException("charsetName");
        checkBounds(bytes, offset, length);
        this.value = StringCoding.decode(charsetName, bytes, offset, length);
    }
           

Let's look at an example:

String test = "中文";
        String[] csn = new String[] {"ISO-8859-1", "GBK", "UTF-8"};
        for (int i = 0; i < csn.length; i++) {
            // Encode the original text with charset csn[i].
            byte[] bt = test.getBytes(csn[i]);
            for (int j = 0; j < csn.length; j++) {
                // Decode with csn[j], re-encode with csn[j], then decode with csn[i].
                String str = new String(bt, csn[j]);
                String res = new String(str.getBytes(csn[j]), csn[i]);
                System.out.print(res + "\t");
            }
            System.out.println();
        }
           

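The result is as follows (each row is the charset csn[i] used for the initial encoding, each column the intermediate charset csn[j]):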

            ISO-8859-1  GBK   UTF-8
ISO-8859-1  ??          ??    ??
GBK         中文        中文  锟斤拷锟斤拷
UTF-8       中文        中文  中文
           

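Why can none of the combinations in the ISO-8859-1 row be restored?

Because the ISO-8859-1 code table contains no Chinese characters, "中文".getBytes("ISO-8859-1") cannot produce a correct encoding of "中文" in ISO-8859-1. As the output shows, each unmappable character ends up as '?', so the original information is already lost and there is nothing left for new String() to restore.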
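The loss can be observed directly on the bytes. The following is a minimal sketch, assuming the StandardCharsets overloads (which avoid the checked UnsupportedEncodingException); the class name Latin1LossDemo and the helper encodeLatin1 are illustrative only.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

class Latin1LossDemo {
    // Hypothetical helper: encode text as ISO-8859-1.
    static byte[] encodeLatin1(String text) {
        return text.getBytes(StandardCharsets.ISO_8859_1);
    }

    public static void main(String[] args) {
        // "\u4e2d\u6587" is "中文", written with escapes so the source file encoding does not matter.
        byte[] bytes = encodeLatin1("\u4e2d\u6587");

        // Each unmappable character is replaced by the byte 0x3F ('?').
        System.out.println(Arrays.toString(bytes)); // prints [63, 63]

        // Decoding can only give back the question marks.
        System.out.println(new String(bytes, StandardCharsets.ISO_8859_1)); // prints ??
    }
}
```

Once the bytes are just two 0x3F values, no choice of charset in a later new String() call can recover the original characters.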

char[] and StringXxx constructors

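These are mainly applications of Arrays.copyOf(): the constructor copies the source characters into a new array, so later changes to the source do not affect the String.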
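The effect of that copy is easy to check from the outside. Here is a minimal sketch; the class name DefensiveCopyDemo and the helper buildThenMutate are illustrative assumptions.

```java
class DefensiveCopyDemo {
    // Hypothetical helper: builds a String from the array, then mutates the array.
    static String buildThenMutate(char[] source) {
        String s = new String(source); // the constructor copies the characters
        source[0] = 'X';               // change the caller's array afterwards
        return s;
    }

    public static void main(String[] args) {
        char[] chars = {'a', 'b', 'c'};
        String s = buildThenMutate(chars);
        System.out.println(s);                 // prints abc: the String kept its own copy
        System.out.println(new String(chars)); // prints Xbc: the array itself did change
    }
}
```

Without the copy, the String would share the caller's array and its immutability could be broken from outside.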

Key methods

hashCode()

public int hashCode() {
        int h = hash;
        // Compute only if not cached yet and the string is non-empty.
        if (h == 0 && value.length > 0) {
            char val[] = value;

            for (int i = 0; i < value.length; i++) {
                h = 31 * h + val[i];
            }
            hash = h;
        }
        return h;
    }
           

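This is simply the value of the formula

s[0]*31^(n-1) + s[1]*31^(n-2) + ... + s[n-1]

where s[i] is the i-th character and n is the length of the string, evaluated in int arithmetic (overflow wraps around).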
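As a quick check of the formula, the following minimal sketch recomputes it with Horner's rule and compares the result with String.hashCode(). The class name HashFormulaDemo and the helper polynomialHash are illustrative assumptions.

```java
class HashFormulaDemo {
    // Hypothetical helper: evaluates s[0]*31^(n-1) + ... + s[n-1] using Horner's rule.
    // int overflow is intentional and wraps around, as in String.hashCode().
    static int polynomialHash(String s) {
        int h = 0;
        for (int i = 0; i < s.length(); i++) {
            h = 31 * h + s.charAt(i);
        }
        return h;
    }

    public static void main(String[] args) {
        // For "ab": 'a' * 31 + 'b' = 97 * 31 + 98 = 3105
        System.out.println(polynomialHash("ab")); // prints 3105
        System.out.println("ab".hashCode());      // prints 3105
    }
}
```

Note that an empty string hashes to 0, which is also the "not yet computed" marker in the hash field, so its hash is simply recomputed (trivially) on every call.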

intern()

// Native method
    /**
     * Returns a canonical representation for the string object.
     * <p>
     * A pool of strings, initially empty, is maintained privately by the
     * class {@code String}.
     * <p>
     * When the intern method is invoked, if the pool already contains a
     * string equal to this {@code String} object as determined by
     * the {@link #equals(Object)} method, then the string from the pool is
     * returned. Otherwise, this {@code String} object is added to the
     * pool and a reference to this {@code String} object is returned.
     * <p>
     * It follows that for any two strings {@code s} and {@code t},
     * {@code s.intern() == t.intern()} is {@code true}
     * if and only if {@code s.equals(t)} is {@code true}.
     * <p>
     * All literal strings and string-valued constant expressions are
     * interned. String literals are defined in section 3.10.5 of the
     * <cite>The Java&trade; Language Specification</cite>.
     *
     * @return  a string that has the same contents as this string, but is
     *          guaranteed to be from a pool of unique strings.
     */
    public native String intern();
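The contract described in the Javadoc can be observed with reference comparisons. This is a minimal sketch; the class name InternDemo and the helper samePooledInstance are illustrative assumptions.

```java
class InternDemo {
    // Hypothetical helper: true when both strings resolve to the same pooled instance.
    static boolean samePooledInstance(String a, String b) {
        return a.intern() == b.intern();
    }

    public static void main(String[] args) {
        String literal = "hello";           // literals are interned automatically
        String copy = new String("hello");  // a distinct object with the same contents

        System.out.println(literal == copy);          // prints false: different objects
        System.out.println(literal == copy.intern()); // prints true: intern() returns the pooled one
        System.out.println(samePooledInstance(copy, literal)); // prints true
    }
}
```

This matches the documented rule that s.intern() == t.intern() holds exactly when s.equals(t) does.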