@@ -1180,28 +1180,43 @@ takes as an argument the columns to use to identify duplicated rows.
 By default, the first observed row of a duplicate set is considered unique, but
 each method has a ``keep`` parameter to specify targets to be kept.
 
+- ``keep='first'`` (default): mark / drop duplicates except for the first occurrence.
+- ``keep='last'``: mark / drop duplicates except for the last occurrence.
+- ``keep=False``: mark / drop all duplicates.
+
 .. ipython:: python
 
-   df2 = pd.DataFrame({'a': ['one', 'one', 'two', 'three', 'two', 'one', 'six'],
-                       'b': ['x', 'y', 'y', 'x', 'y', 'x', 'x'],
-                       'c': np.random.randn(7)})
-   df2.duplicated(['a','b'])
-   df2.duplicated(['a','b'], keep='last')
-   df2.duplicated(['a','b'], keep=False)
-   df2.drop_duplicates(['a','b'])
-   df2.drop_duplicates(['a','b'], keep='last')
-   df2.drop_duplicates(['a','b'], keep=False)
+   df2 = pd.DataFrame({'a': ['one', 'one', 'two', 'two', 'two', 'three', 'four'],
+                       'b': ['x', 'y', 'x', 'y', 'x', 'x', 'x'],
+                       'c': np.random.randn(7)})
+   df2
+   df2.duplicated('a')
+   df2.duplicated('a', keep='last')
+   df2.duplicated('a', keep=False)
+   df2.drop_duplicates('a')
+   df2.drop_duplicates('a', keep='last')
+   df2.drop_duplicates('a', keep=False)
 
-An alternative way to drop duplicates on the index is ``.groupby(level=0)`` combined with ``first()`` or ``last()``.
+You can also pass a list of columns to identify duplicates.
 
 .. ipython:: python
 
-   df3 = df2.set_index('b')
-   df3
-   df3.groupby(level=0).first()
+   df2.duplicated(['a', 'b'])
+   df2.drop_duplicates(['a', 'b'])
+
+To drop duplicates by index value, use ``Index.duplicated``, then perform slicing.
+The same options are available in the ``keep`` parameter.
 
-   # a bit more verbose
-   df3.reset_index().drop_duplicates(subset='b', keep='first').set_index('b')
+.. ipython:: python
+
+   df3 = pd.DataFrame({'a': np.arange(6),
+                       'b': np.random.randn(6)},
+                      index=['a', 'a', 'b', 'c', 'b', 'a'])
+   df3
+   df3.index.duplicated()
+   df3[~df3.index.duplicated()]
+   df3[~df3.index.duplicated(keep='last')]
+   df3[~df3.index.duplicated(keep=False)]
 
 .. _indexing.dictionarylike:
 
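For anyone checking the new examples by hand, here is a minimal sketch of what the three ``keep`` options should produce. It is not part of the diff. It assumes the same ``a`` / ``b`` values and the same index labels as the added ``df2`` / ``df3`` examples, but swaps ``np.random.randn`` for fixed integers (an assumption made only so the results are reproducible); the variable names are illustrative.

```python
import pandas as pd

# Same 'a' / 'b' values as the added df2 example; 'c' uses fixed integers
# instead of np.random.randn (an assumption, for reproducibility only).
df = pd.DataFrame({'a': ['one', 'one', 'two', 'two', 'two', 'three', 'four'],
                   'b': ['x', 'y', 'x', 'y', 'x', 'x', 'x'],
                   'c': range(7)})

# Boolean masks: True marks a row that is treated as a duplicate.
first_mask = df.duplicated('a')               # keep='first' is the default
last_mask = df.duplicated('a', keep='last')
all_mask = df.duplicated('a', keep=False)
pair_mask = df.duplicated(['a', 'b'])         # duplicates judged on both columns

# drop_duplicates removes exactly the rows the matching mask marks True.
kept_first = df.drop_duplicates(subset='a')
kept_last = df.drop_duplicates(subset='a', keep='last')
kept_none = df.drop_duplicates(subset='a', keep=False)

# Same index labels as the added df3 example: filter rows by negating
# the mask returned by Index.duplicated.
df_idx = pd.DataFrame({'a': range(6)}, index=['a', 'a', 'b', 'c', 'b', 'a'])
idx_first = df_idx[~df_idx.index.duplicated()]
idx_last = df_idx[~df_idx.index.duplicated(keep='last')]
idx_none = df_idx[~df_idx.index.duplicated(keep=False)]

print(first_mask.tolist())
print(kept_last.index.tolist())
print(idx_none)
```

Note that ``keep=False`` marks every member of a duplicate set, so only values that occur exactly once survive the drop.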