Delete rows in pandas which match your header

Question

I'm fairly new to pandas and have a question.

I read a table from an HTML page and set the header to match the table on the website.

 df = pd.read_html('http://www.espn.com/nhl/statistics/player/_/stat/points/sort/points/year/2015/seasontype/2', header = 1)

Now my DataFrame has a matching header, BUT some rows are copies of the header, as in the example below.

RK                  PLAYER  TEAM  GP   G   A  PTS  +/-  PIM  PTS/G  SOG 
1          Jamie Benn, LW   DAL  82  35  52   87    1   64   1.06  253   
2         John Tavares, C   NYI  82  38  48   86    5   46   1.05  278   
...      
10  Vladimir Tarasenko, RW   STL  77  37  36   73   27   31   0.95  264   
RK                  PLAYER  TEAM  GP   G   A  PTS  +/-  PIM  PTS/G  SOG 
14       Steven Stamkos, C    TB  82  43  29   72    2   49   0.88  268

I know that pandas can delete duplicate rows, but is it possible to delete rows that duplicate the header, or that match a specific row?

Hope you can help me out!

jezrael · Accepted Answer

You can use boolean indexing:

df = df[df.PLAYER != 'PLAYER']
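
To try this without fetching the ESPN page, here is a minimal sketch on a small hand-built frame. The sample rows and the names `cleaned`, `is_header_row` and `cleaned_full_match` are made up for illustration. The second variant is an assumption about what "duplicates of the header" could mean more generally: it drops only rows where every cell equals its own column name.

```python
import pandas as pd

# Hypothetical sample that mimics the scraped table: one row repeats the header.
columns = ["RK", "PLAYER", "TEAM", "PTS"]
rows = [
    ["1", "Jamie Benn, LW", "DAL", "87"],
    ["2", "John Tavares, C", "NYI", "86"],
    ["RK", "PLAYER", "TEAM", "PTS"],  # repeated header row
    ["14", "Steven Stamkos, C", "TB", "72"],
]
df = pd.DataFrame(rows, columns=columns)

# Boolean indexing: keep rows whose PLAYER cell is not the header text.
cleaned = df[df["PLAYER"] != "PLAYER"]

# Variant (assumption): compare every cell with its column name and
# drop a row only when all cells match.
is_header_row = df.eq(list(df.columns)).all(axis=1)
cleaned_full_match = df[~is_header_row]

print(cleaned)
```

Checking a single column is enough when the header text cannot appear there as real data. The full-row comparison is stricter and guards against dropping a legitimate row by accident.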

If you also need to remove rows with PP in the PLAYER column, use isin:

Note: I added [0] after read_html because it returns a list of DataFrames and you need to select the first item of the list:

import pandas as pd

df = pd.read_html('http://www.espn.com/nhl/statistics/player/_/stat/points/sort/points/year/2015/seasontype/2', header = 1)[0]
print (df)
     RK                  PLAYER  TEAM   GP    G    A  PTS  +/-  PIM  PTS/G  \
0     1          Jamie Benn, LW   DAL   82   35   52   87    1   64   1.06   
1     2         John Tavares, C   NYI   82   38   48   86    5   46   1.05   
2     3        Sidney Crosby, C   PIT   77   28   56   84    5   47   1.09   
3     4       Alex Ovechkin, LW   WSH   81   53   28   81   10   58   1.00   
4   NaN       Jakub Voracek, RW   PHI   82   22   59   81    1   78   0.99   
5     6    Nicklas Backstrom, C   WSH   82   18   60   78    5   40   0.95   
6     7         Tyler Seguin, C   DAL   71   37   40   77   -1   20   1.08   
7     8         Jiri Hudler, LW   CGY   78   31   45   76   17   14   0.97   
8   NaN        Daniel Sedin, LW   VAN   82   20   56   76    5   18   0.93   
9    10  Vladimir Tarasenko, RW   STL   77   37   36   73   27   31   0.95   
10  NaN                      PP    SH  NaN  NaN  NaN  NaN  NaN  NaN    NaN   
11   RK                  PLAYER  TEAM   GP    G    A  PTS  +/-  PIM  PTS/G   
12  NaN        Nick Foligno, LW   CBJ   79   31   42   73   16   50   0.92   
13  NaN        Claude Giroux, C   PHI   81   25   48   73   -3   36   0.90   
14  NaN         Henrik Sedin, C   VAN   82   18   55   73   11   22   0.89   
15   14       Steven Stamkos, C    TB   82   43   29   72    2   49   0.88   
...
...

mask = df['PLAYER'].isin(['PLAYER', 'PP'])
print (df[~mask])
     RK                  PLAYER TEAM  GP   G   A PTS  +/- PIM PTS/G  SOG  \
0     1          Jamie Benn, LW  DAL  82  35  52  87    1  64  1.06  253   
1     2         John Tavares, C  NYI  82  38  48  86    5  46  1.05  278   
2     3        Sidney Crosby, C  PIT  77  28  56  84    5  47  1.09  237   
3     4       Alex Ovechkin, LW  WSH  81  53  28  81   10  58  1.00  395   
4   NaN       Jakub Voracek, RW  PHI  82  22  59  81    1  78  0.99  221   
5     6    Nicklas Backstrom, C  WSH  82  18  60  78    5  40  0.95  153   
6     7         Tyler Seguin, C  DAL  71  37  40  77   -1  20  1.08  280   
7     8         Jiri Hudler, LW  CGY  78  31  45  76   17  14  0.97  158   
8   NaN        Daniel Sedin, LW  VAN  82  20  56  76    5  18  0.93  226   
9    10  Vladimir Tarasenko, RW  STL  77  37  36  73   27  31  0.95  264   
12  NaN        Nick Foligno, LW  CBJ  79  31  42  73   16  50  0.92  182   
13  NaN        Claude Giroux, C  PHI  81  25  48  73   -3  36  0.90  279   
14  NaN         Henrik Sedin, C  VAN  82  18  55  73   11  22  0.89  101   
15   14       Steven Stamkos, C   TB  82  43  29  72    2  49  0.88  268   
16  NaN        Tyler Johnson, C   TB  77  29  43  72   33  24  0.94  203   
17   16        Ryan Johansen, C  CBJ  82  26  45  71   -6  40  0.87  202   
18   17         Joe Pavelski, C   SJ  82  37  33  70   12  29  0.85  261   
19  NaN        Evgeni Malkin, C  PIT  69  28  42  70   -2  60  1.01  212   
20  NaN         Ryan Getzlaf, C  ANA  77  25  45  70   15  62  0.91  191   
21   20           Rick Nash, LW  NYR  79  42  27  69   29  36  0.87  304   
...
...
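
As a possible follow-up, the filtered frame still keeps the original index labels (note the jump from 9 to 12 above), and columns that contained header text are likely to be stored as strings. The sketch below applies the same isin mask to a small made-up frame, then renumbers the rows and converts the numeric columns. The sample data and the `numeric_cols` list are assumptions for illustration, not part of the original answer.

```python
import pandas as pd

# Hypothetical sample: a "PP" sub-header row and a repeated header row
# are mixed into the data, so every column holds strings.
df = pd.DataFrame(
    {
        "RK": ["1", "2", None, "RK", "14"],
        "PLAYER": ["Jamie Benn, LW", "John Tavares, C", "PP", "PLAYER", "Steven Stamkos, C"],
        "TEAM": ["DAL", "NYI", "SH", "TEAM", "TB"],
        "PTS": ["87", "86", None, "PTS", "72"],
    }
)

# Same mask as above: flag rows whose PLAYER cell is header text.
mask = df["PLAYER"].isin(["PLAYER", "PP"])

# Drop the flagged rows and renumber the index from 0.
cleaned = df[~mask].reset_index(drop=True)

# Convert the columns that should be numeric; values that cannot be
# parsed become NaN because of errors="coerce".
numeric_cols = ["RK", "PTS"]
cleaned[numeric_cols] = cleaned[numeric_cols].apply(pd.to_numeric, errors="coerce")

print(cleaned.dtypes)
```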
