KS统计量表达为：

Clauset等人2009年发表了题为Power-law distributions in empirical data，提出了拟合幂律分布数据的思路1。这一思路使用最大似然的方法拟合，使用Kolmogorov-Smirnov检验统计量和似然率为基础来评估拟合的结果。

%matplotlib inline
import re
import sys
import powerlaw
import numpy as np
import matplotlib.cm as cm
import networkx as nx
import statsmodels.api as sm
from collections import Counter, defaultdict
import matplotlib.pyplot as plt

with open("/Users/Weibo Landscape/data/aggregate.txt", 'rb') as f:

len(dat)

741754

G=nx.DiGraph() # G = nx.DiGraph() # æåçœç»

for line in dat:
a, b = line.strip().split('\t')[:2]
if b:


nx.info(G)

'Name: \nType: DiGraph\nNumber of nodes: 151937\nNumber of edges: 279499\nAverage in degree:   1.8396\nAverage out degree:   1.8396'

import powerlaw

def plotPowerlaw(data,ax,col,xlab):
fit = powerlaw.Fit(data,xmin=1)
fit.plot_pdf(color = col, linewidth = 2)
fit = powerlaw.Fit(data)
fit.power_law.plot_pdf(color = col, linestyle = 'dotted', ax = ax)
a,x = (fit.power_law.alpha,fit.power_law.xmin)
minx,maxx=ax.get_xlim()
miny,maxy=ax.get_ylim()
ax.text(minx*5,maxy/10,r"$\alpha = %d \:\:, x_{min} = %d$" % (a,x), fontsize = 20)
ax.set_xlabel(xlab, fontsize = 20)
ax.set_ylabel('$P(k)$', fontsize = 20)


d = G.degree()
degree = d.values()

fig = plt.figure(figsize=(6, 6),facecolor='white')
cmap = cm.get_cmap('rainbow_r',6)

plotPowerlaw(degree,ax,cmap(6),'$k$')
plt.savefig('/Users/chengjun/çŸåºŠäºåæ­¥ç/Writing/Weibo Landscape/data/Figure2_powerlaw_PDF.png',
dpi = 400, bbox_inches="tight",transparent = True)

Calculating best minimal value for power law fit


fitted = powerlaw.Fit(degree, xmin = 3)

fitted.alpha, fitted.D

(2.3663242775904081, 0.034385401350230271)

python
help(fitted_pl.distribution_compare)


Help on method distribution_compare in module powerlaw:

distribution_compare(self, dist1, dist2, nested=None, **kwargs) method of powerlaw.Fit instance
Returns the loglikelihood ratio, and its p-value, between the two
distribution fits, assuming the candidate distributions are nested.

Parameters
----------
dist1 : string
Name of the first candidate distribution (ex. 'power_law')
dist2 : string
Name of the second candidate distribution (ex. 'exponential')
nested : bool or None, optional
Whether to assume the candidate distributions are nested versions
of each other. None assumes not unless the name of one distribution
is a substring of the other.

Returns
-------
R : float
Loglikelihood ratio of the two distributions' fit to the data. If
greater than 0, the first distribution is preferred. If less than
0, the second distribution is preferred.
p : float
Significance of R


R是拟合两统计分布的似然率， p值是R的显著性程度。



r1, p1 = fitted.distribution_compare('power_law', 'exponential', nested = True)
r2, p2 = fitted.distribution_compare('power_law', 'lognormal', nested = True)
r3, p3 = fitted.distribution_compare('power_law', 'truncated_power_law', nested = True)
r4, p4 = fitted.distribution_compare('power_law', 'stretched_exponential', nested = True)

print r1, p1
print r2, p2
print r3, p3
print r4, p4

Assuming nested distributions

25001.9789239 0.0
5.97249455121 0.000547945514698
2.61348956987e-05 0.99423151641
665.388668948 0.0


• 正态/高斯分布Normal distribution / Gaussian distribution
• 对数正态分布Log-normal distribution
• 指数/负指数分布Exponential distribution / Negative exponential distribution
• 泊松分布Poisson distribution
• 幂律分布Power law distribution
• 指数截断的幂律分布Power law with exponential cutoff
• 截断幂律Truncated power law
• 广延指数分布Stretched exponential distribution
• 漂移幂律Shifted power law

## 另外一个例子

from networkx.utils import powerlaw_sequence

pl_sequence = powerlaw_sequence(1000,exponent=2.5)

fitted_pl = powerlaw.Fit(pl_sequence)

fitted_pl.alpha

2.4627497555168394

fitted_pl.D

0.016086740927675369

r1, p1 = fitted_pl.distribution_compare('power_law', 'exponential', nested = True)
r2, p2 = fitted_pl.distribution_compare('power_law', 'lognormal', nested = True)
r3, p3 = fitted_pl.distribution_compare('power_law', 'truncated_power_law', nested = True)
r4, p4 = fitted_pl.distribution_compare('power_law', 'stretched_exponential', nested = True)

print r1, p1
print r2, p2
print r3, p3
print r4, p4


438.9629348 0.0
0.00543046554532 0.916998102815
-0.0013573718109 0.958446461888
5.13999579922 0.00134480559265


# 参考文献

1. A. Clauset, C.R. Shalizi, and M.E.J. Newman, “Power-law distributions in empirical data” SIAM Review 51(4), 661-703 (2009). (arXiv:0706.1062, doi:10.1137/070710111)

2. Alstott, J., Bullmore, E., & Plenz, D. (2014). Powerlaw: a Python package for analysis of heavy-tailed distributions. Plos One, 9(1), e85777.

3. stretched exponential function $f_\beta (t) = e^{ -t^\beta }$

4. 复杂性科学研究中常见分布律及数据拟合方法 http://blog.sciencenet.cn/blog-419840-601485.html

Tags:

Categories:

Updated: