Coder Perfect

Get a list of the column headings in a Pandas DataFrame.

Problem

From a Pandas DataFrame, I’d like to extract a list of the column headers. I won’t know how many columns there will be or what they will be called because the DataFrame will be generated from user input.

If I’m given a DataFrame like this, for example:

>>> my_dataframe
    y  gdp  cap
0   1    2    5
1   2    3    9
2   8    7    2
3   3    4    7
4   6    7    7
5   4    8    3
6   8    2    8
7   9    9   10
8   6    6    4
9  10   10    7

I’d make a list like this:

>>> header_list
['y', 'gdp', 'cap']

Asked by natsuki_2002

Solution #1

You can get the column names as a list with:

list(my_dataframe.columns.values)

You can also use (as shown in EdChum’s answer):

list(my_dataframe)

Answered by Simeon Visser
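
As a minimal sketch (a reconstruction added here, not part of the answer), the snippet below rebuilds the DataFrame from the question and runs both calls. The variable name my_dataframe mirrors the question, the data is copied from the printed table, and from_values and from_frame are illustrative names.

```python
import pandas as pd

# Rebuild the DataFrame shown in the question.
my_dataframe = pd.DataFrame({
    "y":   [1, 2, 8, 3, 6, 4, 8, 9, 6, 10],
    "gdp": [2, 3, 7, 4, 7, 8, 2, 9, 6, 10],
    "cap": [5, 9, 2, 7, 7, 3, 8, 10, 4, 7],
})

# Both spellings walk over the column labels in their original order.
from_values = list(my_dataframe.columns.values)
from_frame = list(my_dataframe)

print(from_values)
print(from_frame)
```

Iterating over a DataFrame yields its column labels, which is why list(my_dataframe) works without touching .columns at all.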

Solution #2

There is a built-in method that is the fastest option:

my_dataframe.columns.values.tolist()

.columns returns an Index, .columns.values returns an array, and the array has a helper method, .tolist(), that returns a list.

If performance matters less to you, Index objects also provide a .tolist() method that you can call directly:

my_dataframe.columns.tolist()

It’s easy to see the difference in performance:

%timeit df.columns.tolist()
16.7 µs ± 317 ns per loop (mean ± std. dev. of 7 runs, 100000 loops each)

%timeit df.columns.values.tolist()
1.24 µs ± 12.3 ns per loop (mean ± std. dev. of 7 runs, 1000000 loops each)

You can also call list on the DataFrame directly, as follows:

list(df)

Answered by EdChum
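
The chain described in this answer can be sketched roughly as follows. This is an illustration only: the small DataFrame and the names columns_index, via_values and via_index are assumptions, not taken from the answer.

```python
import pandas as pd

# Small stand-in DataFrame using the column names from the question.
df = pd.DataFrame({"y": [1, 2], "gdp": [2, 3], "cap": [5, 9]})

# df.columns is a pandas Index that holds the column labels.
columns_index = df.columns

# Either route ends in a plain Python list of those labels.
via_values = df.columns.values.tolist()
via_index = df.columns.tolist()

print(type(columns_index).__name__)
print(via_values)
print(via_index)
```

Both results are ordinary lists, so they can be indexed, sliced, or passed to code that knows nothing about pandas.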

Solution #3

I ran some brief tests, and unsurprisingly, the built-in approach, which uses dataframe.columns.values.tolist(), is the quickest:

In [1]: %timeit [column for column in df]
1000 loops, best of 3: 81.6 µs per loop

In [2]: %timeit df.columns.values.tolist()
10000 loops, best of 3: 16.1 µs per loop

In [3]: %timeit list(df)
10000 loops, best of 3: 44.9 µs per loop

In [4]: %timeit list(df.columns.values)
10000 loops, best of 3: 38.4 µs per loop

(I still really like list(dataframe), though, so thanks EdChum!)

Answered by tegan
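
The %timeit magic only works inside IPython or Jupyter. To repeat a comparison like this in a plain Python script, something along these lines should work with the standard-library timeit module. It is a rough sketch: the DataFrame, the candidates dictionary and the loop count are assumptions, each call includes a small lambda overhead, and the absolute numbers will vary with the machine and the pandas version.

```python
import timeit

import pandas as pd

# Small stand-in DataFrame; the answer does not show the one it timed.
df = pd.DataFrame({"y": [1, 2], "gdp": [2, 3], "cap": [5, 9]})

# The four spellings timed in the answer above.
candidates = {
    "[column for column in df]": lambda: [column for column in df],
    "df.columns.values.tolist()": lambda: df.columns.values.tolist(),
    "list(df)": lambda: list(df),
    "list(df.columns.values)": lambda: list(df.columns.values),
}

loops = 10_000

# Total seconds for `loops` calls of each spelling.
timings = {
    label: timeit.timeit(func, number=loops)
    for label, func in candidates.items()
}

# Report fastest first, converted to microseconds per call.
for label, seconds in sorted(timings.items(), key=lambda item: item[1]):
    print(f"{label:<28} {seconds / loops * 1e6:8.2f} us per call")
```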

Solution #4

It gets even simpler (as of pandas 0.16.0):

df.columns.tolist()

will give you the column names as a plain list.

Answered by fixxxer

Solution #5

Unpacking generalizations (PEP 448) were introduced in Python 3.5. As a result, all of the operations below are possible.

df = pd.DataFrame('x', columns=['A', 'B', 'C'], index=range(5))
df

   A  B  C
0  x  x  x
1  x  x  x
2  x  x  x
3  x  x  x
4  x  x  x

If you want a list,

[*df]
# ['A', 'B', 'C']

Alternatively, if you want a set,

{*df}
# {'A', 'B', 'C'}

Alternatively, if you want a tuple, use

*df,  # Please note the trailing comma
# ('A', 'B', 'C')

Alternatively, if you wish to save the result,

*cols, = df  # A wild comma appears, again
cols
# ['A', 'B', 'C']

… if you’re the kind of person who converts coffee to typing sounds, well, this is going to consume your coffee more efficiently 😉

Visual Check

You can use iterable unpacking, which I’ve seen mentioned in other answers (no need for explicit loops).

print(*df)
A B C

print(*df, sep='\n')
A
B
C

If an operation can be done in a single line, don’t use an explicit for loop (list comprehensions are okay).

sorted(df) does not preserve the original order of the columns. To keep that order, use list(df) instead.
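
A small sketch of the ordering difference, using the column names from the question (the two-row data and the names in_frame_order and alphabetical are made up for illustration):

```python
import pandas as pd

# Column order is deliberately not alphabetical, as in the question.
df = pd.DataFrame({"y": [1, 2], "gdp": [2, 3], "cap": [5, 9]})

in_frame_order = list(df)   # keeps the order the columns were defined in
alphabetical = sorted(df)   # sorts the labels, discarding that order

print(in_frame_order)  # ['y', 'gdp', 'cap']
print(alphabetical)    # ['cap', 'gdp', 'y']
```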

Next, list(df.columns) and list(df.columns.values) are poor suggestions (as of the current version, v0.24). Both Index (returned from df.columns) and NumPy arrays (returned by df.columns.values) define a .tolist() method, which is faster and more idiomatic.

Finally, listification, i.e. list(df), should only be used as a concise alternative to the aforementioned methods on Python 3.4 or earlier, where extended unpacking is not available.
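
One possible practical difference, beyond speed, shows up when the column labels are not strings. The answer does not make this point itself, so treat the sketch below as an added illustration: it assumes a DataFrame with integer column labels (hypothetical data) and compares the element types each spelling produces.

```python
import numpy as np
import pandas as pd

# Integer column labels (hypothetical data) make the difference visible.
df = pd.DataFrame([[1, 2, 3]], columns=[10, 20, 30])

wrapped = list(df.columns.values)        # elements stay NumPy integer scalars
converted = df.columns.values.tolist()   # elements become built-in ints

print(type(wrapped[0]).__name__)
print(type(converted[0]).__name__)
```

NumPy scalars behave like Python numbers in most arithmetic, but code that checks for the exact built-in type, or serializes values (for example to JSON), may treat them differently.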

Answered by cs95

Post is based on https://stackoverflow.com/questions/19482970/get-a-list-from-pandas-dataframe-column-headers