Coder Perfect

Pandas read data from a table with no headers.


How can I use pandas to read a .csv file that has no headers when I only want a subset of the columns (for example, the 4th and 7th out of a total of 20 columns)? I can’t seem to get usecols to work.

Asked by user308827

Solution #1

To read a csv file without a header and only for specified columns, use the parameters header=None and usecols=[3,6] for the 4th and 7th columns:

df = pd.read_csv(file_path, header=None, usecols=[3,6])

See the docs
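For a self-contained check, the sketch below reads the 4th and 7th columns from a small headerless CSV. The 8-column sample data and the in-memory io.StringIO buffer are assumptions standing in for file_path.

```python
import io

import pandas as pd

# Hypothetical sample: 3 rows, 8 columns, no header row.
csv_text = (
    "a0,a1,a2,a3,a4,a5,a6,a7\n"
    "b0,b1,b2,b3,b4,b5,b6,b7\n"
    "c0,c1,c2,c3,c4,c5,c6,c7\n"
)

# header=None: the first line is data, not column names.
# usecols is zero-based, so [3, 6] selects the 4th and 7th columns.
df = pd.read_csv(io.StringIO(csv_text), header=None, usecols=[3, 6])

# The resulting column labels are the original integer positions.
print(df.columns.tolist())
print(df)
```

Because no names are given, the selected columns are addressed by their integer positions, for example df[3] and df[6].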

Answered by EdChum

Solution #2

The previous solutions are correct, but in my opinion adding the names parameter makes the result cleaner, and it should be the recommended approach, especially when the csv has no headers.

df = pd.read_csv(file_path, usecols=[3,6], names=['colA', 'colB'])

Alternatively, add header=None to make it explicit that the csv has no header row (both lines behave identically):

df = pd.read_csv(file_path, usecols=[3,6], names=['colA', 'colB'], header=None)

so that you can retrieve your data by name

# with `names` parameter
df['colA']
df['colB']

instead of by position

# without `names` parameter
df[3]
df[6]

When names are passed explicitly to read_csv, header behaves like None instead of 0, so header=None can be omitted whenever names is given.
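The equivalence can be checked with a minimal sketch. The two-row sample data and the variable names with_names, explicit, and no_names are assumptions made for illustration.

```python
import io

import pandas as pd

# Hypothetical headerless sample with 7 columns.
csv_text = "1,2,3,4,5,6,7\n8,9,10,11,12,13,14\n"

# names given, header omitted: the first line is still treated as data.
with_names = pd.read_csv(
    io.StringIO(csv_text), usecols=[3, 6], names=["colA", "colB"]
)

# Same call with header=None spelled out.
explicit = pd.read_csv(
    io.StringIO(csv_text), usecols=[3, 6], names=["colA", "colB"], header=None
)

# Without names, the columns keep their integer positions as labels.
no_names = pd.read_csv(io.StringIO(csv_text), usecols=[3, 6], header=None)

print(with_names.equals(explicit))   # do both calls yield the same frame?
print(with_names["colA"].tolist())   # access by name
print(no_names[3].tolist())          # access by position
```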

Answered by ch33hau

Solution #3

Pass header=None and usecols=[3,6] for the 4th and 7th columns, respectively.

Answered by Alex

Solution #4

According to the documentation at csv.html:

header : int, list of int, default ‘infer’
Row number(s) to use as the column names, and the start of the data. The default behavior is to infer the column names: if no names are passed, the behavior is identical to header=0 and column names are inferred from the first line of the file; if column names are passed explicitly, the behavior is identical to header=None. Explicitly pass header=0 to be able to replace existing names. The header can be a list of integers that specify row locations for a multi-index on the columns, e.g. [0,1,3]. Intervening rows that are not specified will be skipped (e.g. 2 in this example is skipped). Note that this parameter ignores commented lines and empty lines if skip_blank_lines=True, so header=0 denotes the first line of data rather than the first line of the file.

names : array-like, optional
List of column names to use. If the file contains a header row, you should explicitly pass header=0 to override the column names. Duplicates in this list are not allowed.

import pandas as pd
# header=0 marks the file's first row as a header, which `names` then replaces
columns = ['Day', 'PLMN', 'RNCname']
tempo = pd.read_csv("info.csv", sep=';', header=0, names=columns, index_col=False)
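To see why header=0 matters when the file already has a header row, the sketch below reads the same data with and without it. The semicolon-separated sample and the variable names replaced and kept_as_data are assumptions made for illustration.

```python
import io

import pandas as pd

# Hypothetical semicolon-separated data that already has a header row.
csv_text = "d;p;r\n2020-01-01;26201;RNC1\n2020-01-02;26202;RNC2\n"
columns = ["Day", "PLMN", "RNCname"]

# header=0: the first row is treated as a header and replaced by `names`.
replaced = pd.read_csv(io.StringIO(csv_text), sep=";", header=0, names=columns)

# No header argument: with `names` given, pandas behaves like header=None,
# so the file's own header row ends up as the first data row.
kept_as_data = pd.read_csv(io.StringIO(csv_text), sep=";", names=columns)

print(replaced)
print(kept_as_data)
```

In the second frame the old header values appear as ordinary data, which is usually not what is wanted when renaming columns.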

