
Sorting Chinese strings by initial letter (pinyin) with pinyin4j

References:

https://www.cnblogs.com/zhangqie/p/9456401.html

https://blog.csdn.net/weixin_42311000/article/details/114711578

I combined the two posts above and improved on them. The result is as follows:

(1) Required dependency (Maven):

<dependency>
	<groupId>com.belerweb</groupId>
	<artifactId>pinyin4j</artifactId>
	<version>2.5.1</version>
</dependency>

(2) Test demo code:

import net.sourceforge.pinyin4j.PinyinHelper;

import java.text.CollationKey;
import java.text.Collator;
import java.util.*;

public class PinyinComparator implements Comparator<String> {
    // Chinese-locale collator; it compares the pinyin strings built by chineseToPinyin
    private final Collator collator = Collator.getInstance(Locale.CHINA);

    @Override
    public int compare(String o1, String o2) {
        CollationKey key1 = collator.getCollationKey(chineseToPinyin(o1));
        CollationKey key2 = collator.getCollationKey(chineseToPinyin(o2));
        return key1.compareTo(key2);
    }

    // Replace every Chinese character with its toneless pinyin; keep all other characters
    private static String chineseToPinyin(String chinese) {
        StringBuilder pinyin = new StringBuilder();
        for (char ch : chinese.toCharArray()) {
            // Readings with a trailing tone digit (e.g. "zhang1"), or null for non-Chinese chars
            String[] readings = PinyinHelper.toHanyuPinyinStringArray(ch);
            if (readings != null && readings.length > 0) {
                // Chinese character: use the first reading and drop the trailing tone digit
                pinyin.append(readings[0], 0, readings[0].length() - 1);
            } else {
                // No pinyin: a digit, letter or other symbol, kept as-is
                pinyin.append(ch);
            }
        }
        return pinyin.toString();
    }

    public static void main(String[] args) {
        String[] arr = {"关羽2", "关羽1", "张飞", "公孙瓒", "诸葛亮", "曹操", "刘备", "赵云", "微微", "哈哈", "哈", "怡情", "用友", "医院", "小米", "张三", "李四", "王五", "赵六", "JAVA", "java", "AVA", "php", "PHP", "123", "2", "234", "126", "011", "123", "$%$#", "哈哈A",
                "1哈哈A", "1哈哈b", "1哈哈a", "哈哈", "哈", "怡情"};
        List<String> list = Arrays.asList(arr);
        Collections.sort(list, new PinyinComparator());
        for (String str: list) {
            System.out.println(str);
        }
    }
}
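You can try the comparator logic without the pinyin4j jar. The sketch below replaces PinyinHelper with a small hard-coded lookup table. The class name PinyinComparatorSketch and the PINYIN map are hypothetical. The map covers only the four demo names, with the tone digit already removed. A real program still needs pinyin4j (or a similar dictionary) for full coverage. The sketch also calls Collator.compare directly, which gives the same ordering as comparing CollationKey objects. Keys mainly pay off when the same string is compared many times.

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Sketch: same idea as PinyinComparator, with a hypothetical lookup table
// standing in for PinyinHelper.toHanyuPinyinStringArray (demo characters only).
public class PinyinComparatorSketch implements java.util.Comparator<String> {
    // Hypothetical stand-in data: first reading of each character, tone digit removed
    private static final Map<Character, String> PINYIN = Map.of(
            '张', "zhang", '三', "san", '李', "li", '四', "si",
            '曹', "cao", '操', "cao", '刘', "liu", '备', "bei");

    private final Collator collator = Collator.getInstance(Locale.CHINA);

    // Replace each known Chinese character with its pinyin; keep every other char as-is
    static String toPinyin(String s) {
        StringBuilder sb = new StringBuilder();
        for (char ch : s.toCharArray()) {
            String py = PINYIN.get(ch);
            if (py != null) {
                sb.append(py);
            } else {
                sb.append(ch);
            }
        }
        return sb.toString();
    }

    @Override
    public int compare(String a, String b) {
        return collator.compare(toPinyin(a), toPinyin(b));
    }

    public static void main(String[] args) {
        List<String> names = new ArrayList<>(List.of("张三", "李四", "曹操", "刘备"));
        names.sort(new PinyinComparatorSketch());
        System.out.println(names); // [曹操, 李四, 刘备, 张三]
    }
}
```

The four names map to caocao, lisi, liubei and zhangsan, so the printed order matches their relative order in the output below.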

(3) Output:

$%$#
011
123
123
126
1哈哈a
1哈哈A
1哈哈b
2
234
AVA
曹操
公孙瓒
关羽1
关羽2
哈
哈
哈哈
哈哈
哈哈A
java
JAVA
李四
刘备
php
PHP
王五
微微
小米
怡情
怡情
医院
用友
张飞
张三
赵六
赵云
诸葛亮

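(4) Conclusion:

In this test, special characters sort first, then digits, then letters. Chinese strings are ordered by their pinyin and interleave alphabetically with Latin strings. For example, 曹操 (caocao) lands between AVA and 公孙瓒 (gongsunzan). Case only breaks ties between otherwise identical strings, and lowercase comes before uppercase (java before JAVA, 1哈哈a before 1哈哈A).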
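The lowercase-before-uppercase tie-break comes from java.text.Collator. Its default strength is TERTIARY, and case is a tertiary difference, so case matters only when two strings are otherwise equal. The sketch below illustrates this. CollatorStrengthDemo and sortWithStrength are hypothetical names. It sorts plain Latin strings only, so pinyin4j is not needed. Lowering the strength to Collator.SECONDARY makes case-only differences compare equal:

```java
import java.text.Collator;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;
import java.util.Locale;

// Sketch: how Collator strength controls the case tie-break seen in the demo output
public class CollatorStrengthDemo {

    // Return a sorted copy of words, using a Locale.CHINA collator at the given strength
    static List<String> sortWithStrength(List<String> words, int strength) {
        Collator collator = Collator.getInstance(Locale.CHINA); // new instance per call
        collator.setStrength(strength);
        List<String> sorted = new ArrayList<>(words);
        Collections.sort(sorted, collator); // stable: equal elements keep input order
        return sorted;
    }

    public static void main(String[] args) {
        List<String> words = List.of("JAVA", "java", "AVA");
        // TERTIARY (the default): case breaks ties, lowercase first
        System.out.println(sortWithStrength(words, Collator.TERTIARY));  // [AVA, java, JAVA]
        // SECONDARY: "java" and "JAVA" compare equal, so they stay in input order
        System.out.println(sortWithStrength(words, Collator.SECONDARY)); // [AVA, JAVA, java]
    }
}
```

Collections.sort is stable, so under SECONDARY strength JAVA and java keep their input order. Setting the same strength on the collator in PinyinComparator should therefore make the pinyin sort case-insensitive.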