@@ -1159,173 +1159,6 @@ The ``.dt`` accessor works for period and timedelta dtypes.
1159 1159
1160 1160 ``Series.dt`` will raise a ``TypeError`` if you access it with non-datetimelike values.
1161 1161
1162- .. _basics.string_methods:
1163-
1164- Vectorized string methods
1165- -------------------------
1166-
1167- Series is equipped (as of pandas 0.8.1) with a set of string processing methods
1168- that make it easy to operate on each element of the array. Perhaps most
1169- importantly, these methods exclude missing/NA values automatically. These are
1170- accessed via the Series's ``str`` attribute and generally have names matching
1171- the equivalent (scalar) built-in string methods:
1172-
1173- Splitting and Replacing Strings
1174- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1175-
1176- .. ipython:: python
1177-
1178- s = Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'])
1179- s.str.lower()
1180- s.str.upper()
1181- s.str.len()
1182-
1183- Methods like ``split`` return a Series of lists:
1184-
1185- .. ipython:: python
1186-
1187- s2 = Series(['a_b_c', 'c_d_e', np.nan, 'f_g_h'])
1188- s2.str.split('_')
1189-
1190- Elements in the split lists can be accessed using ``get`` or ``[]`` notation:
1191-
1192- .. ipython:: python
1193-
1194- s2.str.split('_').str.get(1)
1195- s2.str.split('_').str[1]
1196-
1197- Methods like ``replace`` and ``findall`` take regular expressions, too:
1198-
1199- .. ipython:: python
1200-
1201- s3 = Series(['A', 'B', 'C', 'Aaba', 'Baca',
1202- '', np.nan, 'CABA', 'dog', 'cat'])
1203- s3
1204- s3.str.replace('^.a|dog', 'XX-XX ', case=False)
1205-
1206- Extracting Substrings
1207- ~~~~~~~~~~~~~~~~~~~~~
1208-
1209- The method ``extract`` (introduced in version 0.13) accepts regular expressions
1210- with match groups. Extracting a regular expression with one group returns
1211- a Series of strings.
1212-
1213- .. ipython:: python
1214-
1215- Series(['a1', 'b2', 'c3']).str.extract(r'[ab](\d)')
1216-
1217- Elements that do not match return ``NaN``. Extracting a regular expression
1218- with more than one group returns a DataFrame with one column per group.
1219-
1220- .. ipython:: python
1221-
1222- Series(['a1', 'b2', 'c3']).str.extract(r'([ab])(\d)')
1223-
1224- Elements that do not match return a row filled with ``NaN``.
1225- Thus, a Series of messy strings can be "converted" into a
1226- like-indexed Series or DataFrame of cleaned-up or more useful strings,
1227- without necessitating ``get()`` to access tuples or ``re.match`` objects.
1228-
1229- The result's dtype is always object, even if no match is found and the result
1230- only contains ``NaN``.
1231-
1232- Named groups like
1233-
1234- .. ipython:: python
1235-
1236- Series(['a1', 'b2', 'c3']).str.extract(r'(?P<letter>[ab])(?P<digit>\d)')
1237-
1238- and optional groups like
1239-
1240- .. ipython:: python
1241-
1242- Series(['a1', 'b2', '3']).str.extract(r'(?P<letter>[ab])?(?P<digit>\d)')
1243-
1244- can also be used.
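
To see what the named and optional groups capture for each element, the same pattern can be run through the standard ``re`` module directly. This is only a sketch of the regular-expression behavior, not of how ``extract`` is implemented; in pandas the group names become column names, and a group that did not take part in the match shows up as ``NaN`` rather than ``None``.

```python
import re

# Same pattern as in the optional-group example: the letter is optional.
pattern = re.compile(r'(?P<letter>[ab])?(?P<digit>\d)')

# One dict per input string, keyed by group name.
rows = [pattern.match(value).groupdict() for value in ['a1', 'b2', '3']]

for row in rows:
    print(row)
# {'letter': 'a', 'digit': '1'}
# {'letter': 'b', 'digit': '2'}
# {'letter': None, 'digit': '3'}
```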
1245-
1246- Testing for Strings that Match or Contain a Pattern
1247- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1248-
1249- You can check whether elements contain a pattern:
1250-
1251- .. ipython:: python
1252-
1253- pattern = r'[a-z][0-9]'
1254- Series(['1', '2', '3a', '3b', '03c']).str.contains(pattern)
1255-
1256- or match a pattern:
1257-
1258-
1259- .. ipython:: python
1260-
1261- Series(['1', '2', '3a', '3b', '03c']).str.match(pattern, as_indexer=True)
1262-
1263- The distinction between ``match`` and ``contains`` is strictness: ``match``
1264- relies on strict ``re.match``, while ``contains`` relies on ``re.search``.
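
The difference is easiest to see on inputs where the pattern occurs somewhere other than the start of the string. The sketch below uses the standard ``re`` module on its own, with made-up sample values, to show the two underlying behaviors; it does not call pandas.

```python
import re

pattern = r'[a-z][0-9]'
values = ['a1', 'xa1', '1a']  # illustrative inputs, not from the examples above

# re.match only succeeds if the pattern matches at the start of the string.
matched = [bool(re.match(pattern, v)) for v in values]

# re.search succeeds if the pattern matches anywhere in the string.
contained = [bool(re.search(pattern, v)) for v in values]

print(matched)    # [True, False, False]
print(contained)  # [True, True, False]
```

Only ``'xa1'`` is treated differently: the pattern occurs inside it, but not at position 0.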
1265-
1266- .. warning::
1267-
1268- In previous versions, ``match`` was for *extracting* groups,
1269- returning a not-so-convenient Series of tuples. The new method ``extract``
1270- (described in the previous section) is now preferred.
1271-
1272- This old, deprecated behavior of ``match`` is still the default. As
1273- demonstrated above, use the new behavior by setting ``as_indexer=True``.
1274- In this mode, ``match`` is analogous to ``contains``, returning a boolean
1275- Series. The new behavior will become the default behavior in a future
1276- release.
1277-
1278- Methods like ``match``, ``contains``, ``startswith``, and ``endswith`` take
1279- an extra ``na`` argument so missing values can be considered True or False:
1280-
1281- .. ipython:: python
1282-
1283- s4 = Series(['A', 'B', 'C', 'Aaba', 'Baca', np.nan, 'CABA', 'dog', 'cat'])
1284- s4.str.contains('A', na=False)
1285-
1286- .. csv-table::
1287- :header: "Method", "Description"
1288- :widths: 20, 80
1289-
1290- ``cat``,Concatenate strings
1291- ``split``,Split strings on delimiter
1292- ``get``,Index into each element (retrieve i-th element)
1293- ``join``,Join strings in each element of the Series with passed separator
1294- ``contains``,Return boolean array if each string contains pattern/regex
1295- ``replace``,Replace occurrences of pattern/regex with some other string
1296- ``repeat``,Duplicate values (``s.str.repeat(3)`` equivalent to ``x * 3``)
1297- ``pad``,"Add whitespace to left, right, or both sides of strings"
1298- ``center``,Equivalent to ``pad(side='both')``
1299- ``wrap``,Split long strings into lines with length less than a given width
1300- ``slice``,Slice each string in the Series
1301- ``slice_replace``,Replace slice in each string with passed value
1302- ``count``,Count occurrences of pattern
1303- ``startswith``,Equivalent to ``str.startswith(pat)`` for each element
1304- ``endswith``,Equivalent to ``str.endswith(pat)`` for each element
1305- ``findall``,Compute list of all occurrences of pattern/regex for each string
1306- ``match``,"Call ``re.match`` on each element, returning matched groups as list"
1307- ``extract``,"Call ``re.match`` on each element, as ``match`` does, but return matched groups as strings for convenience."
1308- ``len``,Compute string lengths
1309- ``strip``,Equivalent to ``str.strip``
1310- ``rstrip``,Equivalent to ``str.rstrip``
1311- ``lstrip``,Equivalent to ``str.lstrip``
1312- ``lower``,Equivalent to ``str.lower``
1313- ``upper``,Equivalent to ``str.upper``
1314-
1315-
1316- Getting indicator variables from separated strings
1317- ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
1318-
1319- You can extract dummy variables from string columns,
1320- for example when the values are separated by a ``'|'``:
1321-
1322- .. ipython:: python
1323-
1324- s = pd.Series(['a', 'a|b', np.nan, 'a|c'])
1325- s.str.get_dummies(sep='|')
1326-
1327- See also :func:`~pandas.get_dummies`.
1328-
1329 1162 .. _basics.sorting:
1330 1163
1331 1164 Sorting by index and value