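Basic statistics (including sorting), distribution/cumulative statistics, data features (correlation, periodicity, etc.), data mining (forming knowledge)
A set of data expresses one or more meanings
Summary - the process by which data is reduced to lossy features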
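Sorting data with the pandas library
The .sort_index() method sorts by index along the specified axis, ascending by default
.sort_index(axis=0, ascending=True)
The .sort_values() method sorts by value along the specified axis, ascending by default
Series.sort_values(axis=0, ascending=True)
DataFrame.sort_values(by, axis=0, ascending=True)
by: an index label, or list of index labels, used as the sort key (column labels when axis=0, row labels when axis=1)
NaN values are placed at the end of the result by default, whether the sort is ascending or descending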
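The signatures above can be sketched on a Series and with a list passed as by. This is only an illustrative sketch: the sample values and the names s, df, by_index, by_value and multi are hypothetical and do not come from the original notes.

```python
import pandas as pd

# Hypothetical sample data, chosen only for illustration
s = pd.Series([3, 1, 2], index=["c", "a", "b"])

by_index = s.sort_index()                  # order by index label: a, b, c
by_value = s.sort_values(ascending=False)  # order by value: 3, 2, 1

df = pd.DataFrame(
    {"x": [1, 1, 0], "y": [5, 7, 6]},
    index=["c", "a", "b"],
)
# "by" given as a list: sort on "x" ascending, break ties on "y" descending
multi = df.sort_values(by=["x", "y"], ascending=[True, False])

print(by_index)
print(by_value)
print(multi)
```

Passing a list to ascending alongside a list for by gives each sort key its own direction.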
Code example
# -*- coding: utf-8 -*-
# @File : pandas_sort.py
# @Date : 2018-05-20
# Sorting data with pandas
import pandas as pd
import numpy as np
# Prepare the data
df = pd.DataFrame(np.arange(20).reshape(4, 5), index=["c", "a", "d", "b"])
print(df)
"""
    0   1   2   3   4
c   0   1   2   3   4
a   5   6   7   8   9
d  10  11  12  13  14
b  15  16  17  18  19
"""
# Sort by index in ascending order; the default axis=0 sorts the row index
print(df.sort_index())
"""
    0   1   2   3   4
a   5   6   7   8   9
b  15  16  17  18  19
c   0   1   2   3   4
d  10  11  12  13  14
"""
# Sort by index in descending order
print(df.sort_index(ascending=False))
"""
    0   1   2   3   4
d  10  11  12  13  14
c   0   1   2   3   4
b  15  16  17  18  19
a   5   6   7   8   9
"""
# Sort along axis=1 (the column index), descending
print(df.sort_index(axis=1, ascending=False))
"""
    4   3   2   1   0
c   4   3   2   1   0
a   9   8   7   6   5
d  14  13  12  11  10
b  19  18  17  16  15
"""
# Sort by value: reorder the rows by the values in column 2, descending
print(df.sort_values(2, ascending=False))
"""
    0   1   2   3   4
b  15  16  17  18  19
d  10  11  12  13  14
a   5   6   7   8   9
c   0   1   2   3   4
"""
# Sort the columns: with axis=1 the sort key is a row label, here row "a", descending
print(df.sort_values("a", axis=1, ascending=False))
"""
    4   3   2   1   0
c   4   3   2   1   0
a   9   8   7   6   5
d  14  13  12  11  10
b  19  18  17  16  15
"""
# NaN values are placed at the end of the sort result
# a and b have different shapes, so a + b yields NaN where the labels do not align
a = pd.DataFrame(np.arange(12).reshape(3, 4), index=["a", "b", "c"])
b = pd.DataFrame(np.arange(20).reshape(4, 5), index=["a", "b", "c", "d"])
c = a + b
print(c)
"""
      0     1     2     3   4
a   0.0   2.0   4.0   6.0 NaN
b   9.0  11.0  13.0  15.0 NaN
c  18.0  20.0  22.0  24.0 NaN
d   NaN   NaN   NaN   NaN NaN
"""
# Descending sort on column 2: the NaN row (d) still comes last
print(c.sort_values(2, ascending=False))
"""
      0     1     2     3   4
c  18.0  20.0  22.0  24.0 NaN
b   9.0  11.0  13.0  15.0 NaN
a   0.0   2.0   4.0   6.0 NaN
d   NaN   NaN   NaN   NaN NaN
"""
# Ascending sort on column 2: the NaN row (d) is again last
print(c.sort_values(2, ascending=True))
"""
      0     1     2     3   4
a   0.0   2.0   4.0   6.0 NaN
b   9.0  11.0  13.0  15.0 NaN
c  18.0  20.0  22.0  24.0 NaN
d   NaN   NaN   NaN   NaN NaN
"""
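The NaN-last placement shown above is the default of sort_values, not a fixed rule: the na_position parameter, which the notes above do not cover, can move NaN values to the front. A minimal sketch, using hypothetical sample data and the hypothetical names nan_last and nan_first:

```python
import numpy as np
import pandas as pd

# Hypothetical sample data with one missing value in column "v"
df = pd.DataFrame({"v": [2.0, np.nan, 1.0]}, index=["a", "b", "c"])

nan_last = df.sort_values("v")                        # default: na_position="last"
nan_first = df.sort_values("v", na_position="first")  # NaN row moved to the top

print(nan_last)
print(nan_first)
```

In both calls the non-missing values keep their ascending order; only the position of the NaN row changes.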