Stackless: a library for stackless programming

lc_wangchao

Blog category: Python

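Stackless is an enhanced version of Python, and we can think of it as an implementation of coroutines for Python. With Stackless we avoid the overhead of creating threads, and we can also program without locks. Compared with multithreaded or multiprocess services, Stackless makes large-scale concurrency easier to achieve. More importantly, code written with Stackless is easier to understand and more concise.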

To make this easier to follow, let's first look at the concept of coroutines.

Coroutines

Wikipedia explains them this way:

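Since coroutines are less widely known than subroutines, it is best to compare the two. A subroutine's start is its only entry point; once it exits it has finished executing, and an instance of a subroutine returns only once. A coroutine can call other coroutines via yield. Coroutines that transfer control via yield are not in a caller/callee relationship; they are symmetric peers. A coroutine's start is its first entry point, and within a coroutine, each point after a return is the next entry point. The lifetimes of subroutines follow last-in, first-out order (the last subroutine called is the first to return); by contrast, the lifetimes of coroutines are determined entirely by how they are used.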

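In other words, an ordinary function is entered once and returns once. A coroutine, unlike such a function, can give up control via yield at any point while it runs and resume later when certain conditions are met. In English, "yield" means to give way or surrender; here we read it as handing over the right to run, so that another coroutine can use the resources. Moreover, the coroutine that yields and the one that takes over are symmetric peers. Concretely, this means there is no call-stack relationship between coroutines.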

For example, coroutine code might look like this:

var q := new queue

Producer:

loop
    while q is not full
        create some new items
        add the items to q
    yield to consume

Consumer:

loop
    while q is not empty
        remove some items from q
        use the items
    yield to produce

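Ordinarily this style of programming would be a disaster: if the producer and consumer called each other recursively, they would eventually overflow the stack. With coroutines, however, yield transfers control instead of making a call. Stack frames therefore do not accumulate, and no stack overflow can occur.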
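Python generators are asymmetric coroutines: a generator can only yield back to whoever resumed it, not directly to another coroutine. This minimal sketch of the pseudocode above, in plain Python 3, therefore adds a small driver loop that resumes the producer and the consumer in turn. MAX_ITEMS, producer, consumer and run are illustrative names, not part of any library.

```python
from collections import deque

MAX_ITEMS = 3  # assumed queue capacity for this demo


def producer(q, log):
    n = 0
    while True:
        while len(q) < MAX_ITEMS:      # "while q is not full"
            n += 1
            q.append(n)                # "add the items to q"
            log.append(("produce", n))
        yield                          # "yield to consume": back to the driver


def consumer(q, log):
    while True:
        while q:                       # "while q is not empty"
            item = q.popleft()         # "remove some items from q"
            log.append(("consume", item))
        yield                          # "yield to produce": back to the driver


def run(rounds):
    q, log = deque(), []
    prod, cons = producer(q, log), consumer(q, log)
    for _ in range(rounds):
        next(prod)   # run the producer until its next yield
        next(cons)   # then the consumer until its next yield
    return log


if __name__ == "__main__":
    for event in run(2):
        print(*event)
```

At each yield, a coroutine returns to the driver instead of calling the other coroutine. As a result, run(10000) finishes without growing the call stack. A version in which producer() and consumer() called each other directly would hit Python's recursion limit long before that.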

Plain Python has a simple implementation of coroutines: the yield keyword.

The yield keyword

Let's start with a simple example of yield:

#!/usr/bin/python

def handler():
    i = 0
    while True:
        i = i + 1
        msg = yield i
        print msg

h = handler()
h.next()

while True:
    msg = raw_input()
    h.send(msg)

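In terms of behavior, a function containing yield is more like an iterator. Strictly speaking, calling it returns a generator object. h = handler() only creates an instance of that generator, and the function body does not start running until h.next() is called. That call executes the body up to the first yield, where it pauses and waits for the next call. When we then call h.send(msg), msg becomes the value of the yield expression, and the function continues until it reaches the next yield.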
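The example above is Python 2 (it uses h.next(), raw_input() and the print statement). Here is a sketch of the same protocol in Python 3. It uses scripted messages instead of keyboard input so that the values can be checked; the out list is an addition for illustration.

```python
def handler(out):
    i = 0
    while True:
        i = i + 1
        msg = yield i      # hand i to the caller, then pause until send()
        out.append(msg)    # Python 3 stand-in for "print msg"


out = []
h = handler(out)
first = next(h)            # runs the body up to the first yield; returns 1
replies = [h.send(m) for m in ["a", "b", "c"]]  # each send returns the next i
print(first, replies, out)  # 1 [2, 3, 4] ['a', 'b', 'c']
```

The first next(h) is required. Sending a non-None value to a generator that has not started yet raises TypeError, because there is no paused yield to receive it.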

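To developers new to yield, it looks like nothing more than an iterator. And even as an iterator, the conventional iterator protocol is already sufficient, so yield seems to add little.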

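It is true that anything yield can do can also be done by other means. Syntactically, however, yield provides something no other mechanism does: the ability to leave a function at any point and later resume exactly where it left off. With yield we can write non-blocking code in a blocking style, for example: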

def recvSocket(sock):
    while True:
        msg = yield 'recv'   # give up control until the scheduler sends data back
        if not msg:          # empty message: the connection was closed
            break
        print msg

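If we replaced yield with socket.recv(), this would be an ordinary blocking-IO implementation. Blocking IO is simple to write, but it is expensive: to serve many connections at once, each connection ties up its own thread or process while it waits. Using yield lets us combine the strengths of blocking and non-blocking IO. In the code above, when the function needs to read socket data, it gives up control and waits for network data to arrive. Once the socket becomes readable, the code is scheduled again and continues from where it stopped.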
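recvSocket() only works if something resumes it. Below is a minimal sketch of that support code. It assumes a fake network: a list of byte strings in which b'' means the peer closed the connection. A real event loop would wait on select, poll or epoll before resuming the generator. recv_socket and drive are hypothetical names.

```python
def recv_socket(received):
    while True:
        msg = yield 'recv'     # ask the driver for data and suspend
        if not msg:            # b'': the peer closed the connection
            break
        received.append(msg)


def drive(gen, chunks):
    """Resume gen once per chunk, as an event loop would when data arrives."""
    next(gen)                  # start the generator; it pauses at 'recv'
    try:
        for chunk in chunks:
            gen.send(chunk)    # resume it with the data that "arrived"
    except StopIteration:
        pass                   # the generator finished after seeing EOF


received = []
drive(recv_socket(received), [b'hello', b'world', b''])
print(received)  # [b'hello', b'world']
```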

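Of course, Python's native coroutine mechanism is quite rudimentary, so we have to write a lot of support code around it, such as a scheduler that decides which generator to resume next. Is there a more complete mechanism for coroutine programming? There is, and Stackless is a good choice.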

stackless

For details on Stackless, see this tutorial, which is thorough and very good. Below we only cover the key points.

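Stackless is an implementation of coroutines for Python. Taken literally, you can call it stack-free programming, which reflects the definition of coroutines as equal peers. The basic unit in Stackless is the tasklet, which we will call a micro-process (Stackless's own documentation calls them microthreads).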

Here is an example (taken from the tutorial above, as are some of the later examples):

Python 2.4.3 Stackless 3.1b3 060504 (#69, May  3 2006, 19:20:41) [MSC v.1310 32
bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import stackless
>>>
>>> def print_x(x):
...     print x
...
>>> stackless.tasklet(print_x)('one')
<stackless.tasklet object at 0x00A45870>
>>> stackless.tasklet(print_x)('two')
<stackless.tasklet object at 0x00A45A30>
>>> stackless.tasklet(print_x)('three')
<stackless.tasklet object at 0x00A45AB0>
>>>
>>> stackless.run()
one
two
three
>>>

When we call stackless.tasklet(print_x)('one'), print_x does not run yet. The tasklets are scheduled and executed only once stackless.run() is called.
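To make the "create now, run later" behavior concrete, here is a toy scheduler in plain Python 3, loosely modeled on stackless.tasklet() and stackless.run(). It is not Stackless. Tasks must be generator functions, and a bare yield stands in for stackless.schedule(). The names tasklet, run and run_queue are illustrative only.

```python
from collections import deque

run_queue = deque()   # tasks waiting for their turn


def tasklet(func):
    def bind(*args):
        run_queue.append(func(*args))   # create the generator but do not run it
    return bind


def run():
    while run_queue:
        task = run_queue.popleft()
        try:
            next(task)                  # run until the task yields...
        except StopIteration:
            continue                    # ...or finishes
        run_queue.append(task)          # it yielded: back of the queue


output = []

def print_x(x):
    output.append(x)
    yield                               # give the other tasks a turn
    output.append(x + " again")


tasklet(print_x)("one")
tasklet(print_x)("two")
tasklet(print_x)("three")
print(output)  # []  (nothing has run yet)
run()
print(output)
# ['one', 'two', 'three', 'one again', 'two again', 'three again']
```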

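A micro-process is not a process, and it is not even a thread. It is really a task (a function) in a task queue that runs inside a single thread. What makes this task special is that it can stop partway through and go to sleep, and then be woken and rescheduled once the conditions it is waiting for are met. Here is another example: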

Python 2.4.3 Stackless 3.1b3 060504 (#69, May  3 2006, 19:20:41) [MSC v.1310 32
bit (Intel)] on win32
Type "help", "copyright", "credits" or "license" for more information.
>>> import stackless
>>>
>>> channel = stackless.channel()
>>>
>>> def receiving_tasklet():
...     print "Recieving tasklet started"
...     print channel.receive()
...     print "Receiving tasklet finished"
...
>>> def sending_tasklet():
...     print "Sending tasklet started"
...     channel.send("send from sending_tasklet")
...     print "sending tasklet finished"
...
>>> def another_tasklet():
...     print "Just another tasklet in the scheduler"
...
>>> stackless.tasklet(receiving_tasklet)()
<stackless.tasklet object at 0x00A45B30>
>>> stackless.tasklet(sending_tasklet)()
<stackless.tasklet object at 0x00A45B70>
>>> stackless.tasklet(another_tasklet)()
<stackless.tasklet object at 0x00A45BF0>
>>>
>>> stackless.run()
Recieving tasklet started
Sending tasklet started
send from sending_tasklet
Receiving tasklet finished
Just another tasklet in the scheduler
sending tasklet finished
>>>

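Channels are the main way for tasks to communicate with each other. In the example above, calling channel.receive() makes the current task give up control, and the scheduler moves on to the other tasks in the queue. The waiting task is woken only when another task calls channel.send(); it then continues with the sent value as the result of receive(). With the channel's default settings, the sending task is put at the end of the queue to wait for its next turn, just as with stackless.schedule().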

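Incidentally, channel.receive(), channel.send() and stackless.schedule() can all make a task give up control, but they behave differently. A task blocked in channel.receive() is woken only after some task calls send() on that channel; otherwise it sleeps indefinitely. With channel.send() and stackless.schedule(), the task is simply placed at the back of the run queue and runs again later. For send() this holds only when a receiver is already waiting; if none is, send() blocks too until some task calls receive().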
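Below is a toy model of these channel rules, again written with plain Python 3 generators rather than Stackless. In this sketch, a task blocks by yielding a request tuple to the scheduler. The rules mirror the behavior described above: a woken receiver runs next, the sender moves to the back of the queue, and a send with no waiting receiver blocks. Channel, run and the request format are hypothetical. The scenario replays the receiving/sending/another tasklet example.

```python
from collections import deque


class Channel:
    """Toy channel: remembers which tasks are blocked on it."""
    def __init__(self):
        self.receivers = deque()   # tasks blocked in a receive
        self.senders = deque()     # (task, value) pairs blocked in a send


def run(tasks):
    ready = deque((task, None) for task in tasks)  # (task, value to resume with)
    while ready:
        task, value = ready.popleft()
        try:
            request = task.send(value)   # resume the task until its next yield
        except StopIteration:
            continue                     # the task has finished
        if request is None:              # bare yield: like stackless.schedule()
            ready.append((task, None))
        elif request[0] == "recv":
            ch = request[1]
            if ch.senders:               # a sender is already waiting
                sender, v = ch.senders.popleft()
                ready.appendleft((task, v))      # receiver continues at once
                ready.append((sender, None))     # sender goes to the back
            else:
                ch.receivers.append(task)        # sleep until someone sends
        else:                            # ("send", ch, value)
            _, ch, v = request
            if ch.receivers:
                receiver = ch.receivers.popleft()
                ready.appendleft((receiver, v))  # receiver runs next
                ready.append((task, None))       # sender goes to the back
            else:
                ch.senders.append((task, v))     # sleep until someone receives


log = []
channel = Channel()

def receiving_tasklet():
    log.append("Receiving tasklet started")
    msg = yield ("recv", channel)
    log.append(msg)
    log.append("Receiving tasklet finished")

def sending_tasklet():
    log.append("Sending tasklet started")
    yield ("send", channel, "send from sending_tasklet")
    log.append("Sending tasklet finished")

def another_tasklet():
    log.append("Just another tasklet in the scheduler")
    yield          # makes this a generator; reschedules it once

run([receiving_tasklet(), sending_tasklet(), another_tasklet()])
print("\n".join(log))
```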

What is the point of micro-processes?

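First, high concurrency. As mentioned, micro-processes are neither processes nor threads; they are only tasks running inside a single process. Creating one is therefore much cheaper than creating a process or a thread. Scheduling is also just a lightweight queue operation, so switching context between tasks costs far less than switching between processes or threads.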

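Second, no locks. Locks were introduced to synchronize threads, but they easily lead to deadlocks, and heavy use of locks greatly reduces a system's concurrency. In Stackless there are no competing threads and therefore no locks. While a task is running, it is guaranteed to be the only one touching shared resources, until it voluntarily gives up control.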
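This small sketch shows why cooperative scheduling removes the need for locks. It uses plain Python 3 generators as tasks; worker and run_all are illustrative names. A task can only lose control at its own yield, so a read-modify-write with no yield in the middle is never interrupted.

```python
from collections import deque

counter = {"value": 0}   # shared state, with no lock around it


def worker(n):
    for _ in range(n):
        current = counter["value"]      # read
        counter["value"] = current + 1  # modify and write; no yield in between
        yield                           # only here can another task run


def run_all(tasks):
    ready = deque(tasks)
    while ready:
        task = ready.popleft()
        try:
            next(task)
        except StopIteration:
            continue
        ready.append(task)


run_all([worker(1000) for _ in range(10)])
print(counter["value"])  # 10000
```

The guarantee only holds between yield points. Suppose a task yielded after reading counter["value"] and before writing it back. Another task could then update the counter in between, and one of the increments would be lost. The same caveat applies to Stackless code that calls schedule(), send() or receive() in the middle of an update.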

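To show how Stackless lets us program without locks, let's take the classic producer/consumer example. In this model, producers put products into a store and consumers take products out of it. The store may be empty, and it also has a maximum capacity. When the store is empty, a consumer's request is suspended until a producer has produced something. Likewise, when the store is full, a producer must wait until a consumer has taken something before adding its item. In ordinary multithreaded programming, threads run independently, so locks are needed to keep the shared variable (the store's inventory) consistent, which can easily lead to deadlock. In Stackless we can implement it like this: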

#!/usr/bin/python
import stackless

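class Store:
    def __init__(self, items=None, max=2):
        # Give each Store its own list; a mutable default argument
        # would be shared by every instance
        self.items = list(items) if items else []
        self.max = max                  # store capacity
        # pushChannel: producers waiting for space
        # popChannel: consumers waiting for items
        # preference = 1 favours the sender, so send() does not switch away
        self.pushChannel = stackless.channel()
        self.pushChannel.preference = 1
        self.popChannel = stackless.channel()
        self.popChannel.preference = 1

    def push(self, item):
        # Store full: sleep until a consumer frees a slot
        while len(self.items) >= self.max:
            self.pushChannel.receive()
        self.items.append(item)
        # balance < 0 means some consumer is blocked in receive(); wake one
        if self.popChannel.balance < 0:
            self.popChannel.send(None)

    def pop(self):
        # Store empty: sleep until a producer adds an item
        while len(self.items) <= 0:
            self.popChannel.receive()
        ret = self.items.pop()
        # Wake one blocked producer, if any
        if self.pushChannel.balance < 0:
            self.pushChannel.send(None)
        return ret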

    def status(self):
        return "%d items in store now."%len(self.items)


def consume(store):
    item = store.pop()
    print "Consume ",item
    print store.status()
    print '\n'

def produce(store,item):
    store.push(item)
    print "Produce ",item
    print store.status()
    print '\n'


def main():
    store = Store()
    stackless.tasklet(consume)(store)
    stackless.tasklet(consume)(store)
    stackless.tasklet(produce)(store, '1')
    stackless.tasklet(produce)(store, '2')
    stackless.tasklet(produce)(store, '3')
    stackless.tasklet(produce)(store, '4')
    stackless.tasklet(produce)(store, '5')
    stackless.tasklet(produce)(store, '6')
    stackless.tasklet(consume)(store)
    stackless.tasklet(consume)(store)
    stackless.run()

if __name__ == '__main__':
    main()

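With Stackless channels we can easily control how producers and consumers behave. When the Store is empty, a consume task blocks and waits for a producer. Once the producer has added an item, it reactivates the consume task through the channel. Because all tasks run in one thread, a task never has to worry about other tasks modifying its variables before it gives up control.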

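Note that in the code above we set each channel's preference to 1, which means "prefer the sender". A task that calls send() is therefore not blocked and keeps running, so the store status it prints afterwards is correct.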

Output:

Produce  1
1 items in store now.


Produce  2
2 items in store now.


Consume  2
1 items in store now.


Consume  1
0 items in store now.


Produce  3
1 items in store now.


Produce  4
2 items in store now.


Consume  4
1 items in store now.


Consume  3
0 items in store now.


Produce  5
1 items in store now.


Produce  6
2 items in store now.
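For comparison, here is a sketch of the same bounded store using Python 3's asyncio instead of Stackless. asyncio is a different library, swapped in here because it ships with the standard library. asyncio.Queue(maxsize=2) replaces Store, the channels and the balance checks. put() suspends the producer while the queue is full, and get() suspends the consumer while it is empty. The exact interleaving differs from the Stackless output above, so no output is claimed here.

```python
import asyncio


async def produce(store, item, log):
    await store.put(item)              # suspends while the store is full
    log.append(("produce", item, store.qsize()))


async def consume(store, log):
    item = await store.get()           # suspends while the store is empty
    log.append(("consume", item, store.qsize()))


async def main():
    store = asyncio.Queue(maxsize=2)   # plays the role of Store(max=2)
    log = []
    # Same task mix as main() above: 2 consumers, 6 producers, 2 consumers.
    # Everything runs on one thread, so no lock guards the shared queue.
    tasks = [consume(store, log), consume(store, log)]
    tasks += [produce(store, str(i), log) for i in range(1, 7)]
    tasks += [consume(store, log), consume(store, log)]
    await asyncio.gather(*tasks)
    return log


if __name__ == "__main__":
    for action, item, size in asyncio.run(main()):
        print(action, item, "-", size, "items in store now.")
```

One behavioral difference: Store.pop() uses list.pop(), so it hands out the newest item first, whereas asyncio.Queue is first-in, first-out. asyncio.LifoQueue would match the original's order.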

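By the way, locks are an inevitable consequence of preemptive scheduling. A process cannot guarantee that its code runs atomically at any given moment, so it needs locks to synchronize shared variables. In a general-purpose system this is necessary, because it keeps one badly designed program from bringing down the whole system. That is why most operating systems use preemptive multitasking, apart from a few exceptions such as Mac OS 9 and its predecessors.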

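Within a single program, however, preemptive task switching (multithreading) is not strictly necessary. A single program can take responsibility for its own behavior, and any risk affects only the program itself. If it is well designed, such a program can be far more maintainable and more concurrent than a multithreaded one.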

Socket operations with Stackless

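In socket programming, asynchronous I/O scales far better than synchronous I/O when many connections are open. But asynchronous I/O has its own problems, the biggest being that it is more complex to implement (for example with select, poll or epoll). Stackless neatly combines the strengths of both.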

To illustrate this, let's use Stackless to implement a client that makes several concurrent requests.

For now, the server is written in Node.js, just to get something working. (Node.js's task-queue model does feel quite similar in spirit to Stackless.)

var net = require('net');

var outbufs = ['#1 This is the 1st buf', '#2 This is the 2nd buf']
var bufpos = 0

var server = net.createServer(function (socket) {
    var times = 0;
    var buf = outbufs[bufpos]
    var timer = setInterval(function() {
        if(times >= 10) {
            clearInterval(timer)
            socket.end();
        }else {
            socket.write(buf + ' peice: ' + (times + 1));
            times = times + 1;
        }

    }, 1000)
    bufpos = (bufpos + 1) % 2
    socket.on('close', function() {
        clearInterval(timer)
    })
});

server.listen(1337, '127.0.0.1');

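The server listens on port 1337 and returns different content depending on the order in which requests arrive (the implementation above alternates between two buffers). The server also simulates network latency by sending one piece per second, ten pieces in total. Without the delay, responses could finish too quickly to tell whether the client's requests are really handled concurrently.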

The client looks like this:

import socket as stdsocket
import sys

useStackless = True

if useStackless:
    import stackless
    import stacklesssocket
    stacklesssocket.install()

def socketClientTest(host, port = 1337):
    sock = stdsocket.socket(stdsocket.AF_INET, stdsocket.SOCK_STREAM)  
    sock.connect((host, port)) 
    while True:
        data = sock.recv(50)
        if not data:
            break
        print data
    sock.close()         

if useStackless:
    stackless.tasklet(socketClientTest)('127.0.0.1')  
    stackless.tasklet(socketClientTest)('127.0.0.1')  
    stackless.run()
else:
    socketClientTest('127.0.0.1')
    socketClientTest('127.0.0.1')

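stacklesssocket is the official Stackless-based socket module. It turns socket operations from synchronous into asynchronous ones: while a socket call waits, only the calling tasklet is suspended, and the other tasklets keep running. It can be downloaded here.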

The useStackless flag selects whether to use the Stackless library. When it is False, the program makes ordinary synchronous socket requests.

With useStackless = False, the output is:

#1 This is the 1st buf peice: 1
#1 This is the 1st buf peice: 2
#1 This is the 1st buf peice: 3
#1 This is the 1st buf peice: 4
#1 This is the 1st buf peice: 5
#1 This is the 1st buf peice: 6
#1 This is the 1st buf peice: 7
#1 This is the 1st buf peice: 8
#1 This is the 1st buf peice: 9
#1 This is the 1st buf peice: 10
#2 This is the 2nd buf peice: 1
#2 This is the 2nd buf peice: 2
#2 This is the 2nd buf peice: 3
#2 This is the 2nd buf peice: 4
#2 This is the 2nd buf peice: 5
#2 This is the 2nd buf peice: 6
#2 This is the 2nd buf peice: 7
#2 This is the 2nd buf peice: 8
#2 This is the 2nd buf peice: 9
#2 This is the 2nd buf peice: 10

As you can see, the two requests run one after the other: the second starts only after the first has finished.

Next, set useStackless = True and run the program again:

#1 This is the 1st buf peice: 1
#2 This is the 2nd buf peice: 1
#2 This is the 2nd buf peice: 2
#1 This is the 1st buf peice: 2
#1 This is the 1st buf peice: 3
#2 This is the 2nd buf peice: 3
#2 This is the 2nd buf peice: 4
#1 This is the 1st buf peice: 4
#1 This is the 1st buf peice: 5
#2 This is the 2nd buf peice: 5
#2 This is the 2nd buf peice: 6
#1 This is the 1st buf peice: 6
#1 This is the 1st buf peice: 7
#2 This is the 2nd buf peice: 7
#2 This is the 2nd buf peice: 8
#1 This is the 1st buf peice: 8
#1 This is the 1st buf peice: 9
#2 This is the 2nd buf peice: 9
#1 This is the 1st buf peice: 10
#2 This is the 2nd buf peice: 10

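The two responses arrive interleaved, which means the requests are handled asynchronously and concurrently. We can also see that moving from synchronous to asynchronous I/O keeps the synchronous style of socket code; at most, we add a little Stackless scaffolding.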
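The same "synchronous-looking but concurrent" client can be sketched with Python 3's asyncio streams, used here as a substitute for stacklesssocket rather than part of Stackless. To keep the example self-contained, an in-process server stands in for the Node.js one; this server is an assumption of the sketch. It sends newline-terminated pieces, 3 of them 0.05 s apart instead of 10 pieces 1 s apart. Port 0 lets the operating system pick a free port.

```python
import asyncio

PIECES = 3    # the Node.js server sends 10 pieces
DELAY = 0.05  # seconds between pieces (the Node.js server waits 1 s)
BUFS = ["#1 This is the 1st buf", "#2 This is the 2nd buf"]


def make_handler():
    """Stand-in for the Node.js server: alternate BUFS per connection."""
    state = {"next": 0}

    async def handle(reader, writer):
        buf = BUFS[state["next"]]
        state["next"] = (state["next"] + 1) % len(BUFS)
        for i in range(1, PIECES + 1):
            await asyncio.sleep(DELAY)          # simulated network latency
            writer.write(f"{buf} piece: {i}\n".encode())
            await writer.drain()
        writer.close()
        await writer.wait_closed()

    return handle


async def socket_client_test(port, lines):
    # Same shape as socketClientTest(): connect, read until EOF, close
    reader, writer = await asyncio.open_connection("127.0.0.1", port)
    while True:
        data = await reader.readline()          # b"" once the server closes
        if not data:
            break
        lines.append(data.decode().rstrip("\n"))
    writer.close()
    await writer.wait_closed()


async def main():
    server = await asyncio.start_server(make_handler(), "127.0.0.1", 0)
    port = server.sockets[0].getsockname()[1]   # the port the OS picked
    lines = []
    try:
        # Plays the role of two stackless.tasklet(...) calls plus stackless.run()
        await asyncio.gather(socket_client_test(port, lines),
                             socket_client_test(port, lines))
    finally:
        server.close()
        await server.wait_closed()
    return lines


if __name__ == "__main__":
    for line in asyncio.run(main()):
        print(line)
```

Apart from the async/await keywords, socket_client_test() has the same shape as socketClientTest(): connect, read until EOF, close. As with Stackless, the concurrency comes from scheduling two copies of it, here via asyncio.gather().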

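That's about it for this article. As an implementation of coroutines, Stackless offers developer-friendly code and high concurrency. What I don't understand is why it never became as popular as Node.js. I suspect two reasons. First, it lacks strong backing, at least compared with Node.js. Second, there are still rather few libraries for it, which makes building even a moderately large system somewhat difficult. If it did become popular, though, it would certainly be a boon for developers.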

References

Stackless official homepage: http://www.stackless.com/


2012-09-10 15:47
Category: Programming languages