Using Process Pools in Python

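Because of the global interpreter lock (GIL), threads in CPython cannot run Python code on several CPU cores at the same time. For CPU-bound programs, using multiple processes is therefore usually the better strategy, and a process pool makes it easy to create and manage those processes.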

apply_async example

import time
from multiprocessing import Pool


def task(x_y):
    # Unpack the single tuple argument into the two loop bounds.
    (x, y) = x_y
    res = []
    for i in range(x):
        for j in range(y):
            if i * j % 99980000 == 1:
                res.append(i)
    return res


if __name__ == "__main__":
    start = time.time()
    results = []
    x = [10000, 10000, 10000, 10000]
    y = [10000, 10000, 10000, 10000]
    x_y = list(zip(x, y))
    pool = Pool(4)
    # Submit all four tasks first so that they run in parallel.
    for i in x_y:
        result = pool.apply_async(task, (i,))
        results.append(result)
    # Collect the return values; get() blocks until each task has finished.
    for i in results:
        print(i.get())
    end = time.time()
    t = end - start
    print(t)  # 12.45 s

[Figure: CPU usage graph]

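Points to note:

  1. pool = Pool(4) creates a pool of four worker processes. Called with no argument, Pool() starts as many workers as the system reports CPUs.
  2. pool.apply_async takes the function to call as its first argument and that function's arguments as its second, packed in a tuple; (i, ) is a one-element tuple.
  3. i.get() blocks until its task has finished, so call it only after every task has been submitted. Calling it inside the submission loop would run the tasks one after another.
  4. x_y = list(zip(x,y)): when the function needs more than one value, combine the values into tuples like this so that each task receives a single argument.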
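
The points above can be tried out with a much smaller workload. The following is only a sketch: `scaled_sum` and `run_pool` are hypothetical names introduced for illustration, and the inputs are tiny so that the script finishes immediately. It also shows that the argument tuple given to apply_async is unpacked into separate parameters, so a two-parameter function can receive (x, y) directly.

```python
import os
from multiprocessing import Pool


def scaled_sum(x, y):
    # Hypothetical stand-in for task(): takes two separate arguments.
    return sum(i * y for i in range(x))


def run_pool(pairs, processes=None):
    # processes=None lets Pool choose its default number of workers.
    with Pool(processes) as pool:
        # Submit everything first; each tuple is unpacked into (x, y).
        pending = [pool.apply_async(scaled_sum, pair) for pair in pairs]
        # Only now block on the results, in submission order.
        return [p.get() for p in pending]


if __name__ == "__main__":
    print("CPUs reported:", os.cpu_count())
    print(run_pool([(4, 1), (4, 2), (4, 3)], processes=2))  # [6, 12, 18]
```

The with block terminates the pool on exit, which is safe here because every result has already been fetched inside it.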

map_async example

import time
from multiprocessing import Pool


def task(x_y):
    # Unpack the single tuple argument into the two loop bounds.
    (x, y) = x_y
    res = []
    for i in range(x):
        for j in range(y):
            if i * j % 99980000 == 1:
                res.append(i)
    return res


if __name__ == "__main__":
    start = time.time()
    x = [10000, 10000, 10000, 10000]
    y = [10000, 10000, 10000, 10000]
    x_y = list(zip(x, y))
    pool = Pool(4)  # create the process pool object
    result = pool.map_async(task, x_y)
    print(result.get())  # list of the task function's return values
    end = time.time()
    t = end - start
    print(t)  # 12.34 s

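Points to note:

  1. pool.map_async(task, x_y) takes the same task function as pool.apply_async, but no loop is needed: pass the whole list x_y, and each element is handed to task as its single argument.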
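
To make the calling convention concrete, here is a minimal sketch that runs the same pairs through map_async and through starmap_async, which unpacks each tuple into separate arguments. The functions `product`, `product_packed` and `run_both` are hypothetical helpers written for this illustration, not part of the example above.

```python
from multiprocessing import Pool


def product(x, y):
    # Hypothetical two-argument worker, used only for this illustration.
    return x * y


def product_packed(x_y):
    # Same work, but taking one packed tuple, as task() does.
    x, y = x_y
    return product(x, y)


def run_both(pairs, processes=2):
    with Pool(processes) as pool:
        # map_async passes each tuple as a single argument.
        packed = pool.map_async(product_packed, pairs)
        # starmap_async unpacks each tuple into (x, y).
        unpacked = pool.starmap_async(product, pairs)
        # Both calls return at once; get() blocks and keeps the input order.
        return packed.get(), unpacked.get()


if __name__ == "__main__":
    print(run_both([(2, 3), (4, 5), (6, 7)]))  # ([6, 20, 42], [6, 20, 42])
```

Either form gives the same results; the choice mostly depends on whether the worker function is more natural with one packed argument or with several separate ones.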

Plain sequential implementation

import time


def task(x_y):
    # Unpack the single tuple argument into the two loop bounds.
    (x, y) = x_y
    res = []
    for i in range(x):
        for j in range(y):
            if i * j % 99980000 == 1:
                res.append(i)
    return res


if __name__ == "__main__":
    start = time.time()
    x = [10000, 10000, 10000, 10000]
    y = [10000, 10000, 10000, 10000]
    x_y = list(zip(x, y))
    # Run the four tasks one after another in the current process.
    for i in x_y:
        print(task(i))
    end = time.time()
    t = end - start
    print(t)  # 47.97 s
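
As a rough comparison of the three timings quoted in the listings (12.45 s, 12.34 s and 47.97 s), the short sketch below computes the speedup over the sequential run and the efficiency per worker. The helper names `speedup` and `efficiency` are made up for this illustration, and the figures are single measurements, so they should be read as indicative only.

```python
def speedup(serial_seconds, parallel_seconds):
    # How many times faster the parallel run is than the sequential one.
    return serial_seconds / parallel_seconds


def efficiency(serial_seconds, parallel_seconds, workers):
    # Fraction of the ideal speedup (equal to the worker count) achieved.
    return speedup(serial_seconds, parallel_seconds) / workers


serial = 47.97  # sequential timing quoted above
timings = {"apply_async": 12.45, "map_async": 12.34}

for name, seconds in timings.items():
    s = speedup(serial, seconds)
    e = efficiency(serial, seconds, workers=4)
    print(f"{name}: speedup {s:.2f}x, efficiency {e:.0%}")
# apply_async: speedup 3.85x, efficiency 96%
# map_async: speedup 3.89x, efficiency 97%
```

With four workers the ideal speedup is 4x, so both pool variants come close to it on this workload.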