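pandas objects are equipped with a set of common mathematical and statistical methods. Most of them are reductions or summary statistics: they extract a single value (such as sum or mean) from a Series, or a Series of values from the rows or columns of a DataFrame. Compared with the similar methods on NumPy arrays, they have built-in handling for missing data.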
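A minimal sketch of that difference, using made-up values chosen so the arithmetic is exact: a NumPy array propagates NaN through its reductions, while the matching Series methods skip NaN by default (skipna=True).

```python
import numpy as np
import pandas as pd

values = [1.5, np.nan, 0.5]  # illustrative data, not from the text

# Plain NumPy: a single NaN poisons the whole reduction
arr = np.array(values)
print(arr.sum())    # nan
print(arr.mean())   # nan

# pandas: NaN is excluded before reducing
s = pd.Series(values)
print(s.sum())      # 2.0
print(s.mean())     # 1.0
```

NumPy offers separate NaN-aware functions (np.nansum, np.nanmean) for the same purpose. pandas makes skipping the default instead.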

import numpy as np
from pandas import Series, DataFrame

df = DataFrame([[1.4, np.nan], [7.1, -4.5],
                [np.nan, np.nan], [0.75, -1.3]],
               index=['a', 'b', 'c', 'd'],
               columns=['one', 'two'])

Calling DataFrame's sum method returns a Series containing the column sums:

df.sum()

one    9.25
two   -5.80
dtype: float64
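To make "column sums" concrete, the sketch below rebuilds the same DataFrame and checks that df.sum() matches dropping the NaNs from each column and adding what remains. The hand-computed version is only an illustration, not how pandas implements the reduction.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1.4, np.nan], [7.1, -4.5],
                   [np.nan, np.nan], [0.75, -1.3]],
                  index=['a', 'b', 'c', 'd'],
                  columns=['one', 'two'])

col_totals = df.sum()  # default axis=0: one total per column

# The same totals by hand: drop NaN in each column, then add
manual = pd.Series({col: df[col].dropna().sum() for col in df.columns})

print(list(col_totals.index))         # ['one', 'two']
print(np.allclose(col_totals, manual))  # True
```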

Passing axis=1 sums across the columns instead, giving one total per row:

df.sum(axis=1)

a    1.40
b    2.60
c    0.00
d   -0.55
dtype: float64
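pandas also accepts the axis by name, so axis='columns' is interchangeable with axis=1 (and axis='index' with axis=0). The sketch below confirms this on the same data and shows that the all-NaN row 'c' sums to 0.0 under the default settings of recent pandas versions.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1.4, np.nan], [7.1, -4.5],
                   [np.nan, np.nan], [0.75, -1.3]],
                  index=['a', 'b', 'c', 'd'],
                  columns=['one', 'two'])

row_totals = df.sum(axis=1)
row_totals_named = df.sum(axis='columns')  # same reduction, named axis

print(row_totals.equals(row_totals_named))  # True
print(row_totals['c'])                       # 0.0 (row is entirely NaN)
```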

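NA values are excluded automatically. When an entire slice (row or column) is NA, mean still returns NaN, while sum returns 0 in recent pandas versions (controlled by the min_count option), which is why row c sums to 0.00 above. The skipna option disables the exclusion, so any NA in a slice makes its result NA: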

df.mean(axis=1, skipna=False)

a      NaN
b    1.300
c      NaN
d   -0.275
dtype: float64
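A short sketch contrasting the three behaviours on the same rows: the default mean, sum with skipna=False, and sum with min_count=1, which keeps skipping NaN but requires at least one real value before producing a number.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1.4, np.nan], [7.1, -4.5],
                   [np.nan, np.nan], [0.75, -1.3]],
                  index=['a', 'b', 'c', 'd'],
                  columns=['one', 'two'])

# Default skipna=True: row 'a' uses its one real value; all-NaN row 'c' is NaN
mean_default = df.mean(axis=1)

# skipna=False: any NaN in a row makes that row's total NaN ('a' and 'c')
sum_strict = df.sum(axis=1, skipna=False)

# min_count=1: NaN is still skipped, but row 'c' (zero real values) becomes NaN
sum_min1 = df.sum(axis=1, min_count=1)

print(mean_default)
print(sum_strict)
print(sum_min1)
```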

Some methods, like idxmin and idxmax, return indirect statistics, such as the index label at which the minimum or maximum value is attained:

df.idxmax()

one    b
two    d
dtype: object
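Because idxmin and idxmax return labels rather than values, they pair naturally with .loc. This sketch finds the minimum labels and then uses the maximum labels to look the extreme values back up; the dictionary comprehension is just one way to do that lookup.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1.4, np.nan], [7.1, -4.5],
                   [np.nan, np.nan], [0.75, -1.3]],
                  index=['a', 'b', 'c', 'd'],
                  columns=['one', 'two'])

min_labels = df.idxmin()  # row label of the smallest value in each column
max_labels = df.idxmax()  # row label of the largest value in each column

# Turn each (column, label) pair back into the value stored there
max_values = pd.Series({col: df.loc[label, col]
                        for col, label in max_labels.items()})

print(min_labels)   # one -> 'd', two -> 'b'
print(max_values)   # same numbers as df.max()
```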

Other methods are accumulations:

df.cumsum()

    one  two
a  1.40  NaN
b  8.50 -4.5
c   NaN  NaN
d  9.25 -5.8
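cumsum has siblings such as cummax and cumprod that follow the same rule: by default a NaN stays NaN in its own position and is skipped for later rows. Passing skipna=False instead lets the NaN propagate to every subsequent row. A sketch on the same data:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1.4, np.nan], [7.1, -4.5],
                   [np.nan, np.nan], [0.75, -1.3]],
                  index=['a', 'b', 'c', 'd'],
                  columns=['one', 'two'])

running_max = df.cummax()              # running maximum down each column
running_prod = df.cumprod()            # running product down each column
strict_sum = df.cumsum(skipna=False)   # first NaN poisons all later rows

print(running_max)
print(running_prod)
print(strict_sum)
```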

describe produces multiple summary statistics in one shot:

df.describe()

            one       two
count  3.000000  2.000000
mean   3.083333 -2.900000
std    3.493685  2.262742
min    0.750000 -4.500000
25%    1.075000 -3.700000
50%    1.400000 -2.900000
75%    4.250000 -2.100000
max    7.100000 -1.300000
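Each row of the describe output corresponds to an individual reduction (count, mean, sample standard deviation, quantiles, and so on). The sketch below checks a few rows against those methods and requests custom percentiles through the percentiles parameter.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame([[1.4, np.nan], [7.1, -4.5],
                   [np.nan, np.nan], [0.75, -1.3]],
                  index=['a', 'b', 'c', 'd'],
                  columns=['one', 'two'])

summary = df.describe()

# Rows of describe() match the standalone methods (all skip NaN)
print(np.allclose(summary.loc['count'], df.count()))
print(np.allclose(summary.loc['mean'], df.mean()))
print(np.allclose(summary.loc['std'], df.std()))        # sample std, ddof=1
print(np.allclose(summary.loc['25%'], df.quantile(0.25)))

# Ask for other percentiles instead of the default quartiles
wide = df.describe(percentiles=[0.1, 0.9])
print(wide)
```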

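On non-numeric data, describe produces alternative summary statistics (count, number of unique values, the most frequent value, and its frequency):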

obj = Series(['a', 'a', 'b', 'c'] * 4)
obj.describe()

count     16
unique     3
top        a
freq       8
dtype: object
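The unique, top, and freq fields can be reproduced with nunique and value_counts. This is only an illustration of what the fields mean, not a claim about how describe computes them internally.

```python
import pandas as pd

obj = pd.Series(['a', 'a', 'b', 'c'] * 4)

counts = obj.value_counts()  # a: 8, b: 4, c: 4, sorted by frequency

print(obj.nunique())     # 3  -> "unique"
print(counts.idxmax())   # a  -> "top"
print(counts.max())      # 8  -> "freq"
```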

