Changing the data type of columns in Pandas

2018-06-02 23:50:00

import pandas as pd
import numpy as np
a = [['a', '1.2', '4.2'], ['b', '70', '0.03'], ['x', '5', '0']]
df = pd.DataFrame(a)

df.dtypes

0    object
1    object
2    object
dtype: object

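A data frame (data.frame) is the most commonly used data structure for storing two-dimensional (relational) tables. All values within a column must share one data type, different columns may have the same or different types, and every column must have the same number of rows (length). Each column has a unique name, and computed columns can be added to an existing data frame.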

1 Specifying the type when creating a DataFrame

When creating a `DataFrame`, you can specify the type directly through the `dtype` parameter:

df = pd.DataFrame(data=np.arange(100).reshape((10,10)), dtype=np.int8) 
df.dtypes

0    int8
1    int8
2    int8
3    int8
4    int8
5    int8
6    int8
7    int8
8    int8
9    int8
dtype: object
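The `dtype` argument applies a single type to every column. When columns need different types, one option is to build the frame first and then pass a column-to-dtype mapping to `astype`. The following is a minimal sketch; the column names `id` and `score` are made up for illustration and do not come from the original example.

```python
import numpy as np
import pandas as pd

# One dtype for the whole frame, as in the example above.
# Column names "id" and "score" are hypothetical.
uniform = pd.DataFrame(
    np.arange(6).reshape((3, 2)), columns=["id", "score"], dtype=np.int8
)

# Per-column dtypes: pass a {column: dtype} mapping to astype.
mixed = uniform.astype({"id": np.int32, "score": np.float64})

print(uniform.dtypes)
print(mixed.dtypes)
```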

2 For a `Series`

s = pd.Series(['1', '2', '4.7', 'pandas', '10'])
s

0         1
1         2
2       4.7
3    pandas
4        10
dtype: object

Converting to numeric values with `to_numeric`

By default, it cannot handle a non-numeric string such as 'pandas' and raises an error:

pd.to_numeric(s) # or pd.to_numeric(s, errors='raise');

---------------------------------------------------------------------------

ValueError                                Traceback (most recent call last)

pandas/_libs/src/inference.pyx in pandas._libs.lib.maybe_convert_numeric()


ValueError: Unable to parse string "pandas"


During handling of the above exception, another exception occurred:


ValueError                                Traceback (most recent call last)

<ipython-input-24-12f1203e2645> in <module>()
----> 1 pd.to_numeric(s) # or pd.to_numeric(s, errors='raise');


C:\Program Files (x86)\Microsoft Visual Studio\Shared\Anaconda3_64\lib\site-packages\pandas\core\tools\numeric.py in to_numeric(arg, errors, downcast)
    131             coerce_numeric = False if errors in ('ignore', 'raise') else True
    132             values = lib.maybe_convert_numeric(values, set(),
--> 133                                                coerce_numeric=coerce_numeric)
    134 
    135     except Exception:


pandas/_libs/src/inference.pyx in pandas._libs.lib.maybe_convert_numeric()


ValueError: Unable to parse string "pandas" at position 3

Invalid values can be coerced to `NaN`, as follows:

pd.to_numeric(s, errors='coerce')

0     1.0
1     2.0
2     4.7
3     NaN
4    10.0
dtype: float64
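After coercing, it is often useful to know which inputs were silently turned into `NaN`. One way to find them, sketched here under the assumption that the input itself contains no missing values you want to flag, is to compare the coerced result with the original series:

```python
import pandas as pd

s = pd.Series(['1', '2', '4.7', 'pandas', '10'])
converted = pd.to_numeric(s, errors='coerce')

# Entries that were present in the input but became NaN failed to parse.
failed = s[converted.isna() & s.notna()]
print(failed)
```

Here `failed` keeps the original index, so the offending rows can be inspected or corrected before converting again.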

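A third option, when an invalid value is encountered, is to ignore the operation and return the input unchanged. Note that `errors='ignore'` is deprecated in recent pandas releases (2.2 and later), which recommend catching the exception explicitly instead: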

pd.to_numeric(s, errors='ignore')

0         1
1         2
2       4.7
3    pandas
4        10
dtype: object

3 For multiple columns or an entire DataFrame

Applying this operation to several columns one at a time is tedious, so you can use `DataFrame.apply` to process each column.

a = [['a', '1.2', '4.2'], ['b', '70', '0.03'], ['x', '5', '0']]
df = pd.DataFrame(a, columns=['col1','col2','col3'])
df

  col1 col2  col3
0    a  1.2   4.2
1    b   70  0.03
2    x    5     0

df[['col2','col3']] = df[['col2','col3']].apply(pd.to_numeric)
df.dtypes

col1     object
col2    float64
col3    float64
dtype: object

Here `col2` and `col3` now have the `float64` type, as required.
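Keyword arguments given to `DataFrame.apply` are forwarded to the applied function, so the `errors` option of `pd.to_numeric` can be used column by column. The sketch below uses made-up data in which one cell (`'n/a'`) cannot be parsed:

```python
import pandas as pd

# Hypothetical data: 'n/a' cannot be parsed as a number.
df = pd.DataFrame({'col1': ['a', 'b'],
                   'col2': ['1.5', 'n/a'],
                   'col3': ['3', '4']})
cols = ['col2', 'col3']

# errors='coerce' is passed on to pd.to_numeric for each selected column.
df[cols] = df[cols].apply(pd.to_numeric, errors='coerce')
print(df.dtypes)
print(df)
```

Each column gets the narrowest numeric type that fits its values, so a column with an unparseable cell becomes float (to hold `NaN`) while a clean integer column stays integer.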

`df.apply(pd.to_numeric, errors='ignore')`

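The function is applied to the entire `DataFrame`: columns that can be converted to a numeric type are converted, while columns that cannot (for example, because they contain non-numeric strings or dates) are left untouched.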
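The same "convert what you can, keep the rest" behaviour can also be written without relying on `errors='ignore'`, by catching the parsing error per column. This is only a sketch; `to_numeric_or_keep` is a hypothetical helper name, not a pandas function.

```python
import pandas as pd

def to_numeric_or_keep(column):
    """Hypothetical helper: return the column as numbers, or unchanged if parsing fails."""
    try:
        return pd.to_numeric(column)
    except (ValueError, TypeError):
        return column

a = [['a', '1.2', '4.2'], ['b', '70', '0.03'], ['x', '5', '0']]
df = pd.DataFrame(a, columns=['col1', 'col2', 'col3'])

# apply calls the helper once per column and reassembles the results.
converted = df.apply(to_numeric_or_keep)
print(converted.dtypes)
```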

In addition, `pd.to_datetime` and `pd.to_timedelta` convert data to datetimes and to time differences (timedeltas), respectively.
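As a brief illustration of these two functions, the sketch below parses a few made-up strings (the values are not from the original article) and adds the resulting durations to the dates:

```python
import pandas as pd

# Hypothetical inputs chosen for illustration.
dates = pd.to_datetime(pd.Series(['2018-06-02', '2018-06-03']))
durations = pd.to_timedelta(pd.Series(['1 days', '2 hours']))

print(dates.dtype)
print(durations.dtype)

# Datetime and timedelta columns support element-wise arithmetic.
shifted = dates + durations
print(shifted)
```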

Soft conversion: automatic type inference

The `infer_objects()` method converts the columns of a DataFrame that have object dtype to a more specific type.

df = pd.DataFrame({'a': [7, 1, 5], 'b': ['3','2','1']}, dtype='object')
df.dtypes

a    object
b    object
dtype: object

Then `infer_objects()` changes the type of column `'a'` to `int64`, while column `'b'`, which holds strings, is left unchanged:

df = df.infer_objects()
df.dtypes

a     int64
b    object
dtype: object
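Because `infer_objects()` only recognizes values that are already numbers, string columns such as `'b'` still need an explicit parse. A minimal sketch combining the two steps:

```python
import pandas as pd

df = pd.DataFrame({'a': [7, 1, 5], 'b': ['3', '2', '1']}, dtype='object')

# Soft conversion: picks up the integers stored in column 'a'.
df = df.infer_objects()

# Column 'b' holds strings, so parse it explicitly.
df['b'] = pd.to_numeric(df['b'])
print(df.dtypes)
```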

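Forced conversion with `astype`

To force columns to a specific type, use `astype`; for example, `df.astype(int)` converts to integers. The strings below contain decimals, so the example converts the two columns to float instead.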

a = [['a', '1.2', '4.2'], ['b', '70', '0.03'], ['x', '5', '0']]
df = pd.DataFrame(a, columns=['one', 'two', 'three'])
df.dtypes

one      object
two      object
three    object
dtype: object

df[['two', 'three']] = df[['two', 'three']].astype(float)
df.dtypes

one       object
two      float64
three    float64
dtype: object
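Unlike `to_numeric`, `astype` has no coercion mode for unparseable values, and converting decimal strings straight to `int` fails. The sketch below shows the failure and one workaround, going through `float` first; note that this truncates the fractional part, which may or may not be what you want.

```python
import pandas as pd

# Object dtype is requested explicitly so the example does not depend on
# the default string dtype of the installed pandas version.
s = pd.Series(['1.2', '70', '5'], dtype=object)

try:
    s.astype(int)  # '1.2' is not a valid integer literal
    failed = False
except ValueError as exc:
    failed = True
    print('astype(int) failed:', exc)

# Going through float first works; the fractional part is truncated.
as_int = s.astype(float).astype(int)
print(as_int.tolist())
```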

