Commands in Pandas which I have used or would like to use. This is not meant to be complete, but cover some common usage situations or some tricky or obscure syntax.

See also:

Imports

import numpy as np
import pandas as pd

Types

  • pd.Series
  • pd.GroupBy
  • pd.DataFrame

Create dataframe

Column-input

With data as a dict where key is column name and value is a list of values.

data = {"a" : [4 ,5, 6], "b" : [7, 8, 9], "c" : [10, 11, 12]}
df = pd.DataFrame(
    data,
    index=[1, 2, 3],
)

Row input

With data as a 2-D list - a list of rows where each row is a list. Columns must be named.

data = [[4, 7, 10], [5, 8, 11], [6, 9, 12]]
df = pd.DataFrame(
    data,
    indexes=[1, 2, 3],
    columns=['a', 'b', 'c'],
)

Select and filter

Get column.

df['foo']

df.foo

Get columns.

df[['foo', 'bar']]

Get columns of name matching pattern.

df.filter(regex='foo')

Select range of named columns, inclusive.

df.loc[:, 'foo':'bar']

Select individual columns by index.

df.iloc[:, [0, 2 ,5]]

Also filter by condition.

df.loc[df['a'] > 10, ['a','c']]

Summarize

Row count.

len(df)

Get pairs of values and frequency, sorted by frequency descending.

df['foo'].value_counts()

e.g.

df = pd.Series([2, 2, 5, 3, 4, np.nan])
df.value_counts()
# 2.0    2
# 4.0    1
# 3.0    1
# 5.0    1

Distinct values. Does not sort.

df['foo'].nunique()

e.g.

> pd.Series([2, 4, 3, 3], name='P').unique()
# => array([2, 4, 3], dtype=int64)

Basic descriptive statistics for each column

df.describe()

Functions

Calculations on a column or possibly dataframe.

  • .count()
  • .sum()
  • .min()
  • .max()
  • .mean()
  • .median()
  • .var()
  • .std()
  • .quantile([0.25,0.75]
  • .apply(function)

Group by

Get GroupBy object.

Column.

df.groupby(by="foo")

Index.

df.groupby(level="foo")

Size of group.

.size()

Plot

Histogram for each column.

df.plot.hist()

Scatter chart using pairs of points.

df.plot.scatter(x='foo', y='bar')

Read and write files

See pandas: How to Read and Write Files on Real Python blog.

CSV

$ pip install pandas

Write:

df.to_csv('data.csv')

Read:

df = pd.read_csv('data.csv', index_col=0)

Excel

$ pip install xlwt openpyxl xlsxwriter xlrd

Write:

df.to_excel('data.xlsx')