Coder Perfect

In Pandas, how do you deal with SettingWithCopyWarning?

Problem

I just upgraded pandas from 0.11 to 0.13.0rc1. The application now emits many new warnings. One of them looks like this:

E:\FinReporter\FM_EXT.py:449: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
  quote_df['TVol']   = quote_df['TVol']/TVOL_SCALE

I want to know what exactly it means. Do I need to change something?

If I insist on using quote_df['TVol'] = quote_df['TVol']/TVOL_SCALE, how should I suppress the warning?

The function that triggers the warnings:

def _decode_stock_quote(list_of_150_stk_str):
    """decode the webpage and return dataframe"""

    from cStringIO import StringIO

    str_of_all = "".join(list_of_150_stk_str)

    quote_df = pd.read_csv(StringIO(str_of_all), sep=',', names=list('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefg')) #dtype={'A': object, 'B': object, 'C': np.float64}
    quote_df.rename(columns={'A':'STK', 'B':'TOpen', 'C':'TPCLOSE', 'D':'TPrice', 'E':'THigh', 'F':'TLow', 'I':'TVol', 'J':'TAmt', 'e':'TDate', 'f':'TTime'}, inplace=True)
    quote_df = quote_df.ix[:,[0,3,2,1,4,5,8,9,30,31]]
    quote_df['TClose'] = quote_df['TPrice']
    quote_df['RT']     = 100 * (quote_df['TPrice']/quote_df['TPCLOSE'] - 1)
    quote_df['TVol']   = quote_df['TVol']/TVOL_SCALE
    quote_df['TAmt']   = quote_df['TAmt']/TAMT_SCALE
    quote_df['STK_ID'] = quote_df['STK'].str.slice(13,19)
    quote_df['STK_Name'] = quote_df['STK'].str.slice(21,30)#.decode('gb2312')
    quote_df['TDate']  = quote_df.TDate.map(lambda x: x[0:4]+x[5:7]+x[8:10])

    return quote_df

More warning messages:

E:\FinReporter\FM_EXT.py:449: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
  quote_df['TVol']   = quote_df['TVol']/TVOL_SCALE
E:\FinReporter\FM_EXT.py:450: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
  quote_df['TAmt']   = quote_df['TAmt']/TAMT_SCALE
E:\FinReporter\FM_EXT.py:453: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
  quote_df['TDate']  = quote_df.TDate.map(lambda x: x[0:4]+x[5:7]+x[8:10])

Asked by bigbug

Solution #1

The SettingWithCopyWarning was created to flag potentially confusing “chained” assignments, such as the following, which does not always work as expected, particularly when the first selection returns a copy. [see GH5390 and GH5597 for background discussion.]

df[df['A'] > 2]['B'] = new_val  # new_val not set in df

The warning suggests rewriting it as follows:

df.loc[df['A'] > 2, 'B'] = new_val

However, this doesn't fit your usage, which is equivalent to:

df = df[df['A'] > 2]
df['B'] = new_val

While it’s clear that you don’t care whether writes make it back to the original frame (since you’re overwriting the reference to it), this pattern unfortunately cannot be distinguished from the first chained assignment example. Hence the (false positive) warning. The potential for false positives is addressed in the docs on indexing, if you'd like to read further. You can safely disable this warning with the following assignment.

import pandas as pd
pd.options.mode.chained_assignment = None  # default='warn'
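If switching the check off globally feels too broad, the same option can be limited to one block of code. Below is a minimal sketch using pandas' built-in pd.option_context. The frame and its values are made up for illustration, and the sketch assumes a pandas version in which the mode.chained_assignment option is available.

```python
import pandas as pd

# Illustrative data, not taken from the question.
df = pd.DataFrame({"A": [1, 3, 5], "B": [10, 20, 30]})

before = pd.get_option("mode.chained_assignment")

# The option is changed only inside the with-block and restored on exit.
with pd.option_context("mode.chained_assignment", None):
    inside = pd.get_option("mode.chained_assignment")
    subset = df[df["A"] > 2]
    subset["B"] = 0  # same pattern as in the question

after = pd.get_option("mode.chained_assignment")
print(before, inside, after)
```

The original df is not modified by the assignment to subset; only the setting of the check changes inside the block.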

Answered by Garrett

Solution #2

This answer explains what the warning means, how it can be silenced, and how to write code that avoids it.

Setup

import numpy as np
import pandas as pd

np.random.seed(0)
df = pd.DataFrame(np.random.choice(10, (3, 5)), columns=list('ABCDE'))
df
   A  B  C  D  E
0  5  0  3  3  7
1  9  3  5  2  4
2  7  6  8  8  1

It’s crucial to understand what this warning signifies and why it was issued in the first place before deciding how to handle it.

When filtering DataFrames, slicing or indexing a frame can return either a view or a copy, depending on the internal layout and various implementation details. A “view” is a window into the underlying data, so changing the view can change the original object. A “copy,” on the other hand, duplicates the original’s data, and changing the copy has no effect on the original.

As noted in other answers, SettingWithCopyWarning was created to flag “chained assignment” operations. Consider df from the setup above. Suppose you want to select all values in column “B” where the value in column “A” is greater than 5. pandas lets you do this in several ways, some more correct than others. For example,

df[df.A > 5]['B']

1    3
2    6
Name: B, dtype: int64

And,

df.loc[df.A > 5, 'B']

1    3
2    6
Name: B, dtype: int64

Both return the same result, so if you are only reading these values it makes no difference. So, what exactly is the problem? The issue with chained assignment is that it’s generally difficult to predict whether a view or a copy will be returned, which becomes a problem when you try to assign values back. Building on the previous example, consider how the interpreter executes this code:

df.loc[df.A > 5, 'B'] = 4
# becomes
df.loc.__setitem__((df.A > 5, 'B'), 4)

That is a single __setitem__ call, with nothing in between. Now consider this code instead:

df[df.A > 5]['B'] = 4
# becomes
df.__getitem__(df.A > 5).__setitem__('B', 4)

Now, whether the __setitem__ call has any effect on df depends on whether __getitem__ returned a view or a copy.
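The difference between the two call sequences can be made visible without pandas at all. The sketch below uses a toy class, Traced, which is purely hypothetical and only records which special methods Python invokes; it is not how pandas is implemented.

```python
class Traced:
    """Toy container that records which special methods get called."""

    def __init__(self, name, log):
        self.name = name
        self.log = log

    def __getitem__(self, key):
        self.log.append(f"{self.name}.__getitem__({key!r})")
        # Return a new object, just as a selection may return a copy.
        return Traced(f"{self.name}[{key!r}]", self.log)

    def __setitem__(self, key, value):
        self.log.append(f"{self.name}.__setitem__({key!r}, {value!r})")


log = []
obj = Traced("obj", log)

obj["rows", "B"] = 4   # one __setitem__ call on obj itself
obj["rows"]["B"] = 4   # __getitem__ on obj, then __setitem__ on the result

for entry in log:
    print(entry)
```

In the chained form, the object that receives __setitem__ is whatever __getitem__ returned, not obj. If that intermediate object is a copy, the write is lost.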

In general, you should use loc for label-based assignment and iloc for integer/positional assignment, because they are guaranteed to operate on the original. Additionally, use at and iat to set a single cell.

More information is available in the documentation.

Consider a simple operation on the “A” column of df. Selecting “A” and dividing by 2 will raise the warning, but the operation will work.

df2 = df[['A']]
df2['A'] /= 2
/Library/Frameworks/Python.framework/Versions/3.6/lib/python3.6/site-packages/IPython/__main__.py:1: SettingWithCopyWarning: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead

df2
     A
0  2.5
1  4.5
2  3.5

There are a couple of ways to silence this warning directly:

In the comments, @Peter Cotton came up with a good technique of non-intrusively altering the mode (adapted from this gist) by utilizing a context manager to set the mode only for as long as it’s needed, and then revert it to its original state when done.

The following is an example of how to use it:

# some code here
with ChainedAssignent():
    df2['A'] /= 2
# more code follows

Or, to raise an exception instead:

with ChainedAssignent(chained='raise'):
    df2['A'] /= 2

SettingWithCopyError: 
A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_indexer,col_indexer] = value instead
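The snippets above use ChainedAssignent without showing its definition. One way such a context manager could be written is sketched below. This is an assumption about its shape, not necessarily the code from the gist; the class name keeps the spelling used above, and the sketch assumes the mode.chained_assignment option exists in the installed pandas version.

```python
import pandas as pd


class ChainedAssignent:
    """Temporarily set pandas' mode.chained_assignment option."""

    def __init__(self, chained=None):
        acceptable = [None, "warn", "raise"]
        if chained not in acceptable:
            raise ValueError(f"chained must be one of {acceptable}")
        self.swcw = chained

    def __enter__(self):
        # Remember the current setting so it can be restored later.
        self.saved_swcw = pd.options.mode.chained_assignment
        pd.options.mode.chained_assignment = self.swcw
        return self

    def __exit__(self, *args):
        # Restore the saved setting, even if the block raised.
        pd.options.mode.chained_assignment = self.saved_swcw


original = pd.options.mode.chained_assignment
with ChainedAssignent():
    inside = pd.options.mode.chained_assignment
restored = pd.options.mode.chained_assignment
print(original, inside, restored)
```

Because the previous value is restored in __exit__, code outside the with-block keeps whatever behavior was configured before.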

Users frequently look for ways to suppress this warning without fully understanding why it was raised in the first place. This is an XY problem, in which users try to fix a problem “Y” that is actually a symptom of a deeper-rooted problem “X.” The rest of this answer walks through common situations in which the warning appears, along with solutions.

Suppose you want to set every value in column “A” that is greater than 5 to 1000. The wrong way to do this is:

df.A[df.A > 5] = 1000         # works, because df.A returns a view
df[df.A > 5]['A'] = 1000      # does not work
df.loc[df.A > 5]['A'] = 1000   # does not work

The right way, using loc:

df.loc[df.A > 5, 'A'] = 1000
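A quick way to see the difference is to run the chained form and the loc form against the same small frame. This is a minimal sketch with hand-written values; the warnings filter is only there to keep the output clean while the chained line runs.

```python
import warnings

import pandas as pd

df = pd.DataFrame({"A": [5, 9, 7], "B": [0, 3, 6]})

with warnings.catch_warnings():
    warnings.simplefilter("ignore")  # hide the chained-assignment warning here
    df[df.A > 5]["A"] = 1000         # writes to a temporary copy
after_chained = df["A"].tolist()
print(after_chained)                 # [5, 9, 7]

df.loc[df.A > 5, "A"] = 1000         # writes to df itself
after_loc = df["A"].tolist()
print(after_loc)                     # [5, 1000, 1000]
```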

To set a single cell, for example row 1 of column “D”, to 12345, you can use any of the following.

df.loc[1, 'D'] = 12345
df.iloc[1, 3] = 12345
df.at[1, 'D'] = 12345
df.iat[1, 3] = 12345

If the warning appears even though you assign through loc, it is most likely caused by code higher up in your pipeline. Did you create df2 from a larger frame, such as

df2 = df[df.A > 5]

? If so, pandas has flagged df2 as a slice of df, and assignments to df2 trigger the warning. To avoid this, make df2 an explicit copy:

df2 = df[df.A > 5].copy()
# Or,
# df2 = df.loc[df.A > 5, :]

Other modifications of df2 can raise the warning for the same reason: df2 was most likely created as a slice of another frame, such as

df2 = df[df.A > 5]

The solution is to either make a copy() of df or use loc, as before.
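The effect of the explicit copy can be checked directly. The following is a minimal sketch with hand-written values; it records any warnings raised during the assignment and confirms that the original frame is left untouched.

```python
import warnings

import pandas as pd

df = pd.DataFrame({"A": [5, 9, 7], "B": [0, 3, 6]})

df2 = df[df.A > 5].copy()   # df2 owns its data from here on

with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    df2["B"] = -1

copy_warnings = [
    w for w in caught if w.category.__name__ == "SettingWithCopyWarning"
]
print(len(copy_warnings))    # 0
print(df["B"].tolist())      # [0, 3, 6]
print(df2["B"].tolist())     # [-1, -1]
```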

Answered by cs95

Solution #3

The purpose of the SettingWithCopyWarning is to inform users (especially novice users) that they may be working with a copy rather than the original. False positives do exist (IOW if you know what you are doing it could be ok). One option, as suggested by @Garrett, is to simply turn off the (by default warn) warning.

Here’s another possibility:

In [1]: df = DataFrame(np.random.randn(5, 2), columns=list('AB'))

In [2]: dfa = df.ix[:, [1, 0]]

In [3]: dfa.is_copy
Out[3]: True

In [4]: dfa['A'] /= 2
/usr/local/bin/ipython:1: SettingWithCopyWarning: A value is trying to be set on a copy of a slice from a DataFrame.
Try using .loc[row_index,col_indexer] = value instead
  #!/usr/local/bin/python

For that object, you can set the is_copy flag to False, which effectively disables the check:

In [5]: dfa.is_copy = False

In [6]: dfa['A'] /= 2

If you copy explicitly, no further warning is raised:

In [7]: dfa = df.ix[:, [1, 0]].copy()

In [8]: dfa['A'] /= 2

The code the OP shows above, while legitimate and probably something I do as well, is technically a case for this warning rather than a false positive. Another way to avoid the warning is to do the selection through reindex, for example:

quote_df = quote_df.reindex(columns=['STK', ...])

Or,

quote_df = quote_df.reindex(['STK', ...], axis=1)  # v.0.21
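To illustrate the reindex approach end to end, here is a minimal sketch. The small quote_df and the value of TVOL_SCALE are stand-ins invented for the example; only the column names come from the question.

```python
import pandas as pd

# Hypothetical stand-in for quote_df with only a few columns.
quote_df = pd.DataFrame(
    {"STK": ["a", "b"], "TOpen": [1.0, 2.0], "TVol": [100.0, 200.0]}
)
TVOL_SCALE = 100  # assumed value, for illustration only

# reindex returns a new frame containing just the requested columns.
quote_df = quote_df.reindex(columns=["STK", "TVol"])
quote_df["TVol"] = quote_df["TVol"] / TVOL_SCALE

print(quote_df.columns.tolist())
print(quote_df["TVol"].tolist())
```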

Answered by Jeff

Solution #4

When you do something like this:

quote_df = quote_df.ix[:,[0,3,2,1,4,5,8,9,30,31]]

In this scenario, pandas.ix returns a new, standalone dataframe.

Any changes you make to the values in this dataframe will have no effect on the original dataframe.

This is what pandas is trying to warn you about.

The .ix object tries to do more than one thing, which is a definite code smell for anyone who has read anything about clean code.

Given this dataframe:

df = pd.DataFrame({"a": [1,2,3,4], "b": [1,1,2,2]})

Two behaviors:

dfcopy = df.ix[:,["a"]]
dfcopy.a.ix[0] = 2

The first behavior is that dfcopy has become a stand-alone dataframe. Changing it will have no effect on df.

df.ix[0, "a"] = 3

The second behavior modifies the original dataframe.

The pandas team realized that the .ix object was a bit smelly [speculatively], so they added two new objects to help with data access and assignment: .loc and .iloc.

.loc is faster, because it does not try to create a copy of the data.

.loc is designed to modify an existing dataframe in place, which is more memory efficient.

The behavior of .loc is predictable.

In your code example, you are loading a large file with many columns and then trimming it down to a smaller one.

The pd.read_csv function can do much of this for you and also make loading the file a lot faster.

So, rather than doing this,

quote_df = pd.read_csv(StringIO(str_of_all), sep=',', names=list('ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefg')) #dtype={'A': object, 'B': object, 'C': np.float64}
quote_df.rename(columns={'A':'STK', 'B':'TOpen', 'C':'TPCLOSE', 'D':'TPrice', 'E':'THigh', 'F':'TLow', 'I':'TVol', 'J':'TAmt', 'e':'TDate', 'f':'TTime'}, inplace=True)
quote_df = quote_df.ix[:,[0,3,2,1,4,5,8,9,30,31]]

Do this

# usecols ignores the order of its elements, so name the columns in file order
columns = ['STK', 'TOpen', 'TPCLOSE', 'TPrice', 'THigh', 'TLow', 'TVol', 'TAmt', 'TDate', 'TTime']
df = pd.read_csv(StringIO(str_of_all), sep=',', header=None, usecols=[0,1,2,3,4,5,8,9,30,31])
df.columns = columns

This will only read and name the columns that you are interested in. There’s no need to use the evil .ix object to do magical feats.
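One detail worth checking when selecting columns at read time: read_csv's usecols treats its argument as a set, so the resulting columns always come back in file order. The sketch below uses a tiny made-up CSV string (not the question's data) to show this; the column names are borrowed from the question only as labels.

```python
from io import StringIO

import pandas as pd

# Made-up data: 5 columns, no header row.
csv_text = "s1,1,2,3,x\ns2,4,5,6,y\n"

# Read only columns 0, 2 and 4, then name them in file order.
df = pd.read_csv(StringIO(csv_text), header=None, usecols=[0, 2, 4])
df.columns = ["STK", "TPCLOSE", "TDate"]

# Passing the same positions in a different order yields the same layout.
shuffled = pd.read_csv(StringIO(csv_text), header=None, usecols=[4, 0, 2])

print(df)
print(shuffled.columns.tolist())
```

If a different column order is needed afterwards, it can be applied with a plain column selection on the freshly read frame.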

Answered by firelynx

Solution #5

Here I answer the question directly. How do you deal with it?

After you slice, call .copy(deep=False). See pandas.DataFrame.copy.

Wait, doesn’t a slice return a copy? After all, isn’t that what the warning message is trying to say? Read the long answer:

import pandas as pd
df = pd.DataFrame({'x':[1,2,3]})

This gives a warning:

df0 = df[df.x>2]
df0['foo'] = 'bar'

This does not:

df1 = df[df.x>2].copy(deep=False)
df1['foo'] = 'bar'

Both df0 and df1 are DataFrame objects, but something about them is different that enables pandas to print the warning. Let’s have a look at what it is.

import inspect
slice= df[df.x>2]
slice_copy = df[df.x>2].copy(deep=False)
inspect.getmembers(slice)
inspect.getmembers(slice_copy)

Using your preferred diff tool, you’ll notice that, aside from a couple of addresses, the only significant difference is this:

|          | slice   | slice_copy |
| _is_copy | weakref | None       |

The method that decides whether to warn is DataFrame._check_setitem_copy, which checks _is_copy. So there you have it. Make a copy so that your DataFrame's _is_copy is None.

The warning suggests using .loc, but if you use .loc on a frame whose _is_copy is set, you will still get the same warning. Misleading? Yes. Annoying? Absolutely. Helpful? Potentially, when chained assignment is used. However, pandas cannot reliably detect chained assignment, so it prints the warning indiscriminately.

Answered by user443854

Post is based on https://stackoverflow.com/questions/20625582/how-to-deal-with-settingwithcopywarning-in-pandas