@@ -1089,7 +1089,7 @@ The final prediction goes to the largest posterior. This is known as the
 Kernel density estimation
 *************************
 
-It is possible to estimate a continuous probability density function
+It is possible to estimate a continuous probability distribution
 from a fixed number of discrete samples.
 
 The basic idea is to smooth the data using `a kernel function such as a
@@ -1100,14 +1100,27 @@ which is called the *bandwidth*.
 
 .. testcode::
 
-   def kde_normal(sample, h):
-       "Create a continuous probability density function from a sample."
-       # Smooth the sample with a normal distribution kernel scaled by h.
-       kernel_h = NormalDist(0.0, h).pdf
-       n = len(sample)
+   from random import choice, random
+
+   def kde_normal(data, h):
+       "Create a continuous probability distribution from discrete samples."
+
+       # Smooth the data with a normal distribution kernel scaled by h.
+       K_h = NormalDist(0.0, h)
+
        def pdf(x):
-           return sum(kernel_h(x - x_i) for x_i in sample) / n
-       return pdf
+           'Probability density function. P(x <= X < x+dx) / dx'
+           return sum(K_h.pdf(x - x_i) for x_i in data) / len(data)
+
+       def cdf(x):
+           'Cumulative distribution function. P(X <= x)'
+           return sum(K_h.cdf(x - x_i) for x_i in data) / len(data)
+
+       def rand():
+           'Random selection from the probability distribution.'
+           return choice(data) + K_h.inv_cdf(random())
+
+       return pdf, cdf, rand
 
 `Wikipedia has an example
 <https://en.wikipedia.org/wiki/Kernel_density_estimation#Example>`_
@@ -1117,15 +1130,38 @@ a probability density function estimated from a small sample:
 .. doctest::
 
    >>> sample = [-2.1, -1.3, -0.4, 1.9, 5.1, 6.2]
-   >>> f_hat = kde_normal(sample, h=1.5)
+   >>> pdf, cdf, rand = kde_normal(sample, h=1.5)
    >>> xarr = [i/100 for i in range(-750, 1100)]
-   >>> yarr = [f_hat(x) for x in xarr]
+   >>> yarr = [pdf(x) for x in xarr]
 
 The points in ``xarr`` and ``yarr`` can be used to make a PDF plot:
 
 .. image:: kde_example.png
    :alt: Scatter plot of the estimated probability density function.
 
+`Resample <https://en.wikipedia.org/wiki/Resampling_(statistics)>`_
+the data to produce 100 new selections:
+
+.. doctest::
+
+   >>> new_selections = [rand() for i in range(100)]
+
+Determine the probability of a new selection being below ``2.0``:
+
+.. doctest::
+
+   >>> round(cdf(2.0), 4)
+   0.5794
+
+Add a new sample data point and find the new CDF at ``2.0``:
+
+.. doctest::
+
+   >>> sample.append(4.9)
+   >>> round(cdf(2.0), 4)
+   0.5005
+
+
 ..
    # This modelines must appear within the last ten lines of the file.
    kate: indent-width 3; remove-trailing-space on; replace-tabs on; encoding utf-8;
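The three closures returned by the new `kde_normal` should agree with one another: `cdf` is the running integral of `pdf`, and values drawn by `rand` should land at or below a point `x` about `cdf(x)` of the time (`rand` is a smoothed bootstrap: pick a data point, then add kernel noise). A minimal sketch of those cross-checks follows, assuming Python 3.8+ for `statistics.NormalDist`. It restates the estimator so it runs on its own. The `rng` parameter and the `midpoint_integral` helper are hypothetical additions for reproducibility and illustration, not part of the documented recipe. The kernel noise here is drawn with `Random.gauss(0.0, h)` instead of `K_h.inv_cdf(random())`; for a normal kernel the two have the same distribution, and `gauss` avoids the rare case where `random()` returns `0.0`, which `NormalDist.inv_cdf` rejects.

```python
from random import Random
from statistics import NormalDist


def kde_normal(data, h, rng=None):
    "Gaussian KDE of *data* with bandwidth *h*: returns (pdf, cdf, rand)."
    K_h = NormalDist(0.0, h)  # kernel: normal density with standard deviation h
    # Hypothetical parameter: an explicit Random instance makes draws reproducible.
    rng = rng if rng is not None else Random()

    def pdf(x):
        # Mean of the kernel densities centred on each data point.
        return sum(K_h.pdf(x - x_i) for x_i in data) / len(data)

    def cdf(x):
        # Mean of the kernel CDFs, i.e. the integral of pdf from -inf to x.
        return sum(K_h.cdf(x - x_i) for x_i in data) / len(data)

    def rand():
        # Smoothed bootstrap: choose a data point uniformly, add N(0, h) noise.
        return rng.choice(data) + rng.gauss(0.0, h)

    return pdf, cdf, rand


def midpoint_integral(f, a, b, steps=10_000):
    "Hypothetical helper: approximate the integral of f over [a, b] (midpoint rule)."
    dx = (b - a) / steps
    return sum(f(a + (i + 0.5) * dx) for i in range(steps)) * dx


sample = [-2.1, -1.3, -0.4, 1.9, 5.1, 6.2]
pdf, cdf, rand = kde_normal(sample, h=1.5, rng=Random(1234))

# 1. The estimated density integrates to (almost exactly) 1 over a wide interval.
total_mass = midpoint_integral(pdf, -15.0, 20.0)

# 2. cdf agrees with the numerical integral of pdf up to the same point.
mass_below_2 = midpoint_integral(pdf, -15.0, 2.0)

# 3. Resampled values fall at or below 2.0 roughly cdf(2.0) of the time.
draws = [rand() for _ in range(20_000)]
frac_below_2 = sum(x <= 2.0 for x in draws) / len(draws)

print(f"{total_mass:.4f} {mass_below_2:.4f} {cdf(2.0):.4f} {frac_below_2:.3f}")
```

With the sample and `h=1.5` from the doctests, `total_mass` should be very close to 1 and `mass_below_2` should match `cdf(2.0)`, which rounds to the doctest's 0.5794; `frac_below_2` depends on the seed but should typically sit within about a percentage point of it. Because the closures hold a reference to the list rather than a copy, appending to `sample` (as the final doctest does) changes all three functions at once.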