1- .. _ pandas_docstring :
1+ .. _ docstring :
22
3- ====================================
4- How to write a good pandas docstring
5- ====================================
3+ ======================
4+ pandas docstring guide
5+ ======================
66
77About docstrings and standards
88------------------------------
@@ -38,6 +38,10 @@ Next example gives an idea on how a docstring looks like:
3838 int
3939 The sum of `num1` and `num2`
4040
41+ See Also
42+ --------
43+ subtract : Subtract one integer from another.
44+
4145 Examples
4246 --------
4347 >>> add(2, 2)
@@ -56,11 +60,12 @@ The first conventions every Python docstring should follow are defined in
5660`PEP-257 <https://www.python.org/dev/peps/pep-0257/ >`_.
5761
5862As PEP-257 is quite open, some other standards exist on top of it. In the
59- case of pandas, the numpy docstring convention is followed. There are two main
60- documents that explain this convention :
63+ case of pandas, the numpy docstring convention is followed. This convention is
64+ explained in this document:
6165
62- - `Guide to NumPy/SciPy documentation <https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt >`_
6366- `numpydoc docstring guide <http://numpydoc.readthedocs.io/en/latest/format.html >`_
67+ (which is based on the original `Guide to NumPy/SciPy documentation
68+ <https://github.com/numpy/numpy/blob/master/doc/HOWTO_DOCUMENT.rst.txt> `_)
6469
6570numpydoc is a Sphinx extension to support the numpy docstring convention.
6671
@@ -75,9 +80,13 @@ about reStructuredText can be found in:
7580The rest of this document will summarize all the above guides, and will
7681provide additional convention specific to the pandas project.
7782
83+ .. _docstring.tutorial :
84+
7885Writing a docstring
7986-------------------
8087
88+ .. _docstring.general :
89+
8190General rules
8291~~~~~~~~~~~~~
8392
@@ -124,6 +133,8 @@ opening quotes (not in the next line). The closing quotes have their own line
124133 bar = 2
125134 return foo + bar
126135
136+ .. _docstring.short_summary :
137+
127138Section 1: Short summary
128139~~~~~~~~~~~~~~~~~~~~~~~~
129140
@@ -178,6 +189,8 @@ details.
178189 """
179190 pass
180191
192+ .. _docstring.extended_summary :
193+
181194Section 2: Extended summary
182195~~~~~~~~~~~~~~~~~~~~~~~~~~~
183196
@@ -203,6 +216,8 @@ every paragraph in the extended summary is finished by a dot.
203216 """
204217 pass
205218
219+ .. _docstring.parameters :
220+
206221Section 3: Parameters
207222~~~~~~~~~~~~~~~~~~~~~
208223
@@ -223,12 +238,19 @@ required to have a line with the parameter description, which is indented, and
223238can have multiple lines. The description must start with a capital letter, and
224239finish with a dot.
225240
241+ For keyword arguments with a default value, the default is listed in brackets
242+ at the end of the description (before the dot). The exact form of the
243+ description in this case is "Description of the arg (default is X).". In
244+ some cases it may be useful to explain what the default value means, which
245+ can be added after a comma: "Description of the arg (default is -1, which means
246+ all cpus).".
247+
226248**Good: **
227249
228250.. code-block :: python
229251
230252 class Series :
231- def plot (self , kind , ** kwargs ):
253+ def plot(self, kind, color='blue', **kwargs):
232254 """ Generate a plot.
233255
234256 Render the data in the Series as a matplotlib plot of the
@@ -238,6 +260,8 @@ finish with a dot.
238260 ----------
239261 kind : str
240262 Kind of matplotlib plot.
263+ color : str
264+ Color name or rgb code (default is 'blue').
241265 **kwargs
242266 These parameters will be passed to the matplotlib plotting
243267 function.
@@ -272,6 +296,8 @@ finish with a dot.
272296 """
273297 pass
274298
299+ .. _docstring.parameter_types :
300+
275301Parameter types
276302^^^^^^^^^^^^^^^
277303
@@ -281,6 +307,7 @@ directly:
281307- int
282308- float
283309- str
310+ - bool
284311
285312For complex types, define the subtypes:
286313
@@ -290,7 +317,8 @@ For complex types, define the subtypes:
290317- set of {str}
291318
292319In case there are just a set of values allowed, list them in curly brackets
293- and separated by commas (followed by a space):
320+ and separated by commas (followed by a space). If one of them is the default
321+ value of a keyword argument, it should be listed first:
294322
295323- {0, 10, 25}
296324- {'simple', 'advanced'}
@@ -306,10 +334,21 @@ If the type is in a package, the module must be also specified:
306334- numpy.ndarray
307335- scipy.sparse.coo_matrix
308336
309- If the type is a pandas type, also specify pandas:
337+ If the type is a pandas type, also specify pandas, except for Series and
338+ DataFrame:
339+
340+ - Series
341+ - DataFrame
342+ - pandas.Index
343+ - pandas.Categorical
344+ - pandas.SparseArray
310345
311- - pandas.Series
312- - pandas.DataFrame
346+ If the exact type is not relevant, but it must be compatible with a numpy
347+ array, array-like can be specified. If any type that can be iterated is
348+ accepted, iterable can be used:
349+
350+ - array-like
351+ - iterable
313352
314353If more than one type is accepted, separate them by commas, except the
315354last two types, that need to be separated by the word 'or':
@@ -321,6 +360,8 @@ last two types, that need to be separated by the word 'or':
321360If None is one of the accepted values, it always needs to be the last in
322361the list.
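
The two rules above (commas between types, "or" before the last one, and None always last) can be sketched as a small function. This is only an illustration: `format_accepted_types` is a hypothetical helper, not something provided by pandas or numpydoc.

```python
def format_accepted_types(types):
    """Join type names the way the convention above describes.

    Hypothetical helper for illustration only; not part of pandas or numpydoc.
    """
    # None, when accepted, always goes last.
    ordered = [t for t in types if t != "None"]
    if "None" in types:
        ordered.append("None")
    if len(ordered) <= 1:
        return "".join(ordered)
    # Commas between all types except the last two, which are joined by "or".
    return ", ".join(ordered[:-1]) + " or " + ordered[-1]


print(format_accepted_types(["int", "float"]))
print(format_accepted_types(["None", "str", "float", "int"]))
```

The second call moves None to the end even though it was passed first, which is the ordering a "Parameters" entry should show.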
323362
363+ .. _docstring.returns :
364+
324365Section 4: Returns or Yields
325366~~~~~~~~~~~~~~~~~~~~~~~~~~~~
326367
@@ -395,12 +436,14 @@ If the method yields its value:
395436 while True :
396437 yield random.random()
397438
439+ .. _docstring.see_also :
398440
399- Section 5: See also
441+ Section 5: See Also
400442~~~~~~~~~~~~~~~~~~~
401443
402444This is an optional section, used to let users know about pandas functionality
403- related to the one being documented.
445+ related to the one being documented. While optional, this section should exist
446+ in most cases, unless no related methods or functions can be found at all.
404447
405448An obvious example would be the `head() ` and `tail() ` methods. As `tail() ` does
406449the equivalent of `head() ` but at the end of the `Series ` or `DataFrame `
@@ -421,22 +464,30 @@ examples:
421464* `astype ` and `pandas.to_datetime `, as users may be reading the documentation
422465 of `astype ` to know how to cast as a date, and the way to do it is with
423466 `pandas.to_datetime `
467+ * `where ` is related to `numpy.where `, as its functionality is based on it
424468
425469When deciding what is related, you should mainly use your common sense and
426470think about what can be useful for the users reading the documentation,
427471especially the less experienced ones.
428472
473+ When relating to other libraries (mainly `numpy `), use the name of the module
474+ first (not an alias like `np `). If the function is in a module which is not
475+ the main one, like `scipy.sparse `, list the full module (e.g.
476+ `scipy.sparse.coo_matrix `).
477+
429478This section, like the previous ones, also has a header, "See Also" (note the
430479capital S and A). It is also followed by the line with hyphens, and preceded by a blank line.
431480
432481After the header, we will add a line for each related method or function,
433482followed by a space, a colon, another space, and a short description that
434- illustrated what this method or function does, and why is it relevant in
435- this context. The description must also finish with a dot.
483+ illustrates what this method or function does, why it is relevant in this
484+ context, and what the key differences are between the documented function and
485+ the one being referenced. The description must also finish with a dot.
436486
437487Note that in "Returns" and "Yields", the description is located on the line
438488following the type. But in this section it is located on the same
439- line, with a colon in between.
489+ line, with a colon in between. If the description does not fit on the same
490+ line, it can continue on the following lines, which must be indented.
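
As a rough illustration of a "See Also" entry whose description does not fit on one line, here is a minimal sketch on a plain Python function. Both `first_items` and `last_items` are hypothetical names used only for this example; they are not pandas functions.

```python
def first_items(values, n=5):
    """
    Return the first n items of a list.

    Parameters
    ----------
    values : list
        Items to take the subset from.
    n : int
        Number of items to return (default is 5).

    Returns
    -------
    list
        Subset of the original list with the first `n` items.

    See Also
    --------
    last_items : Return the last n items of a list. Unlike this function,
        it counts from the end of the list.
    """
    return values[:n]


print(first_items([1, 2, 3, 4, 5, 6], 2))
```

The second line of the `last_items` description is indented, so it is read as a continuation of the same entry rather than as a new one.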
440491
441492For example:
442493
@@ -449,9 +500,9 @@ For example:
449500 This function is mainly useful to preview the values of the
450501 Series without displaying the whole of it.
451502
452- Return
453- ------
454- pandas. Series
503+ Returns
504+ -------
505+ Series
455506 Subset of the original series with the 5 first values.
456507
457508 See Also
@@ -460,6 +511,8 @@ For example:
460511 """
461512 return self .iloc[:5 ]
462513
514+ .. _docstring.notes :
515+
463516Section 6: Notes
464517~~~~~~~~~~~~~~~~
465518
@@ -472,16 +525,18 @@ examples for the function.
472525
473526This section follows the same format as the extended summary section.
474527
528+ .. _docstring.examples :
529+
475530Section 7: Examples
476531~~~~~~~~~~~~~~~~~~~
477532
478533This is one of the most important sections of a docstring, even if it is
479534placed in the last position, as people often understand concepts better
480535through examples than through accurate explanations.
481536
482- Examples in docstrings are also unit tests, and besides illustrating the
483- usage of the function or method, they need to be valid Python code, that in a
484- deterministic way returns the presented output.
537+ Besides illustrating the usage of the function or method, examples in
538+ docstrings must be valid Python code that returns the presented output in a
539+ deterministic way, and that can be copied and run by users.
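
One way to check that an example really produces the output shown is to run it with the standard library `doctest` module. The following is a minimal sketch, not the project's official validation procedure: `add` mirrors the function used at the start of this guide, and `count_doctest_failures` is a hypothetical helper.

```python
import doctest


def add(num1, num2):
    """
    Add up two integer numbers.

    Examples
    --------
    >>> add(2, 2)
    4
    >>> add(25, 0)
    25
    """
    return num1 + num2


def count_doctest_failures(func):
    """Run the examples in ``func.__doc__`` and return (failed, attempted)."""
    parser = doctest.DocTestParser()
    # The examples only see the names passed in this globals dictionary.
    test = parser.get_doctest(
        func.__doc__, {func.__name__: func}, func.__name__, None, 0
    )
    runner = doctest.DocTestRunner(verbose=False)
    results = runner.run(test)
    return results.failed, results.attempted


print(count_doctest_failures(add))
```

An example that depends on random data or on the current date would fail this kind of check, which is why the output must be deterministic.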
485540
486541They are presented as a session in the Python terminal. `>>> ` is used to
487542present code. `... ` is used for code continuing from the previous line.
@@ -491,14 +546,21 @@ be added with blank lines before and after them.
491546
492547The way to present examples is as follows:
493548
494- 1. Import required libraries
549+ 1. Import required libraries (except ` numpy ` and ` pandas `)
495550
4965512. Create the data required for the example
497552
4985533. Show a very basic example that gives an idea of the most common use case
499554
500- 4. Add commented examples that illustrate how the parameters can be used for
501- extended functionality
555+ 4. Add examples with explanations that illustrate how the parameters can be
556+ used for extended functionality
557+
558+ .. note ::
559+ Which data should be used in examples is a topic still under discussion.
560+ We'll likely be importing a standard dataset from `pandas.io.samples `, but
561+ this still needs confirmation. You can work with the data from this pull
562+ request: https://github.com/pandas-dev/pandas/pull/19933/files but
563+ consider this could still change.
502564
503565A simple example could be:
504566
@@ -527,9 +589,8 @@ A simple example could be:
527589
528590 Examples
529591 --------
530- >>> import pandas
531- >>> s = pandas.Series(['Ant', 'Bear', 'Cow', 'Dog', 'Falcon',
532- ... 'Lion', 'Monkey', 'Rabbit', 'Zebra'])
592+ >>> s = pd.Series(['Ant', 'Bear', 'Cow', 'Dog', 'Falcon',
593+ ... 'Lion', 'Monkey', 'Rabbit', 'Zebra'])
533594 >>> s.head()
534595 0 Ant
535596 1 Bear
@@ -548,32 +609,25 @@ A simple example could be:
548609 """
549610 return self .iloc[:n]
550611
612+ .. _docstring.example_conventions :
613+
551614Conventions for the examples
552615^^^^^^^^^^^^^^^^^^^^^^^^^^^^
553616
554- .. note ::
555- numpydoc recommends avoiding "obvious" imports and importing them with
556- aliases, so for example `import numpy as np `. While this is now an standard
557- in the data ecosystem of Python, it doesn't seem a good practise, for the
558- next reasons:
559-
560- * The code is not executable anymore (as doctests for example)
617+ Code in examples is assumed to always start with these two lines which are not
618+ shown:
561619
562- * New users not familiar with the convention can't simply copy and run it
620+ .. code-block :: python
563621
564- * Users may use aliases (even if it is a bad Python practise except
565- in rare cases), but if maintainers want to use `pd ` instead of `pandas `,
566- why do not name the module `pd ` directly?
622+ import numpy as np
623+ import pandas as pd
567624
568- * As this is becoming more standard, there are an increasing number of
569- aliases in scientific Python code, including `np `, `pd `, `plt `, `sp `,
570- `pm `... which makes reading code harder
571625
572- All examples must start with the required imports , one per line (as
626+ Any other module used in the examples must be explicitly imported , one per line (as
573627recommended in `PEP-8 <https://www.python.org/dev/peps/pep-0008/#imports >`_)
574628and avoiding aliases. Avoid excessive imports, but if needed, imports from
575629the standard library go first, followed by third-party libraries (like
576- numpy) and importing pandas in the last place .
630+ matplotlib) .
577631
578632When illustrating examples with a single `Series ` use the name `s `, and if
579633illustrating with a single `DataFrame ` use the name `df `. If a set of
@@ -605,11 +659,9 @@ positional arguments `head(3)`.
605659
606660 Examples
607661 --------
608- >>> import numpy
609- >>> import pandas
610- >>> df = pandas.DataFrame([389., 24., 80.5, numpy.nan]
611- ... columns=('max_speed'),
612- ... index=['falcon', 'parrot', 'lion', 'monkey'])
662+ >>> df = pd.DataFrame([389., 24., 80.5, np.nan],
663+ ... columns=['max_speed'],
664+ ... index=['falcon', 'parrot', 'lion', 'monkey'])
613665 """
614666 pass
615667
@@ -622,15 +674,9 @@ positional arguments `head(3)`.
622674
623675 Examples
624676 --------
625- >>> import numpy
626- >>> import pandas
627- >>> df = pandas .DataFrame(numpy.random.randn(3, 3),
628- ... columns=('a', 'b', 'c'))
677+ >>> import numpy as np
678+ >>> import pandas as pd
679+ >>> df = pd.DataFrame(np.random.randn(3, 3),
680+ ... columns=('a', 'b', 'c'))
629681 """
630682 pass
631-
632- Once you finished the docstring
633- -------------------------------
634-
635- When you finished the changes to the docstring, go to the
636- :ref: `instructions to submit your changes <pandas_pr >` to continue.