Writing and Using Iterators, Generator Functions, and Coroutine Functions

　　Iterators:

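　　　　An iterator exists to iterate. Iteration means repeating a step so that each round picks up where the previous one stopped; for an iterator, that comes down to producing results one item at a time.

　　　　Iterators give every data type a way to be traversed without relying on an index, so even types that have no index can be read item by item.

　　　　An iterator must itself be an iterable object. Python's built-in container types come with an __iter__ method, and calling it (directly or through iter()) returns an iterator.

　　　　Any object that has an __iter__ method is iterable. An iterator additionally has a __next__ method, which returns the next item.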

d = {"a":2,"b":8,"l":4,}
## Iterable: any object that has an __iter__ method is iterable.
H = d.__iter__()   # same as iter(d); the return value H is the iterator we want
print(H.__next__()) # the iterator has a __next__ method; each call returns the next key of d
print(H.__next__())
print(H.__next__())

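　　　　If next is called more times than there are items, a StopIteration exception is raised. It is better understood as an end-of-data signal than as an error.

　　　　Reading a dictionary's keys with a while loop and an iterator: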

d={'a':1,'b':2,'c':3}
i=iter(d)
while True:
    try:
        print(next(i))
    except StopIteration:
        break

　　　　Reading a list's values with a while loop and an iterator:

l=['a','b','c','d','e']
i=l.__iter__()
while True:
    try:
        print(next(i))
    except StopIteration:
        break

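　　Why use iterators:
    　　Advantages
    　　1. An iterator gives you a way to fetch values that does not depend on an index, so you can traverse iterables that have no index (dictionaries, sets, files).
    　　2. Compared with a list, an iterator is evaluated lazily and so uses less memory.

    　　Disadvantages
    　　1. You cannot get the length of an iterator, and it is less flexible than indexing into a list.
    　　2. An iterator is single-use: it only moves forward, never backward, and a value that has been taken cannot be taken again. If you exhaust an iterator with next and then run a for loop over it, the loop gets nothing.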
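
　　　　A minimal sketch of the single-use behaviour, using a small list chosen only for illustration: once next has consumed values, a later loop sees only what is left, and an exhausted iterator produces nothing.

```python
# Illustrative data; any iterable behaves the same way.
letters = ['a', 'b', 'c']
it = iter(letters)

first = next(it)           # consumes 'a'
rest = [x for x in it]     # the for clause picks up where next() stopped
again = list(it)           # the iterator is now exhausted

print(first)   # a
print(rest)    # ['b', 'c']
print(again)   # []

# The list itself is untouched; ask it for a fresh iterator to start over.
print(list(iter(letters)))  # ['a', 'b', 'c']
```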

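　　Checking for iterables and iterators: use isinstance with the Iterable and Iterator abstract base classes from the collections.abc module. Iterable tells you whether an object is iterable; Iterator tells you whether it is an iterator.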
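
　　A short sketch of that check. It assumes Python 3.3 or later, where these classes live in collections.abc (the old aliases in plain collections were removed in Python 3.10). The variable names are only illustrative.

```python
from collections.abc import Iterable, Iterator

d = {'a': 1, 'b': 2}
d_iter = iter(d)

# A dict is iterable but is not itself an iterator (it has no __next__).
dict_is_iterable = isinstance(d, Iterable)       # True
dict_is_iterator = isinstance(d, Iterator)       # False

# The object returned by iter(d) is both.
iter_is_iterable = isinstance(d_iter, Iterable)  # True
iter_is_iterator = isinstance(d_iter, Iterator)  # True

print(dict_is_iterable, dict_is_iterator, iter_is_iterable, iter_is_iterator)
```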

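　　Generator functions:

　　　　A generator function is a function whose body contains the yield keyword. Calling it returns a generator object.

　　　　A generator is itself an iterator. It gets its own name because it is an iterator built out of a function.

　　　　To confirm that a generator is an iterator, check it with isinstance against the Iterator class from the collections.abc module.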

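from collections.abc import Iterator

def test():
    print("first")
    yield 1  # yield 1 looks a lot like return 1

g = test()  # calling the function does not run its body; it returns a generator object
print(g)  # prints something like <generator object test at 0x...>
print(isinstance(g,Iterator))  # True
print(next(g))  # next is what actually runs the body: prints "first", then 1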

　　　　In a generator function, each yield hands out exactly one result per call to next.

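def countdown(n):
    print('start countdown')
    while n > 0:
        yield n  # pause here and hand n to the caller
        n-=1
    print('done')

g=countdown(5)  # get a countdown generator; the body has not started yet
print(g)

print(next(g))  # each next fetches one value
print(next(g))
print(next(g))
print(next(g))
print(next(g))
print(next(g))  # sixth call: prints 'done', then raises StopIteration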

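　　　　Once next has been called more times than there are values, a StopIteration exception is raised.

　　　　When next raises the exception, catch it and handle it.

　　　　Handling the exception in a while loop: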

def countdown(n):
    print('start countdown')
    while n > 0:
        yield n  # pause here and hand n to the caller
        n-=1
    print('done')

g=countdown(5)  # get a countdown generator
print(g)

while True:
    try:
        print(next(g))  # next raises StopIteration once the values run out
    except StopIteration:
        break

　　　　A for loop handles the exception for you:

def countdown(n):
    print('start countdown')
    while n > 0:
        yield n  # pause here and hand n to the caller
        n-=1
    print('done')

g=countdown(5)  # get a countdown generator
print(g)

for i in g:  # for calls iter(g) once, then next on every pass, and stops at StopIteration
    print(i)

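　　　　Difference between yield and return:

　　　　　　return can hand back a value only once, after which the function is finished for good; yield can hand back values many times.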

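　　　　What yield actually does:

　　　　　　yield turns the function into a generator, and a generator is an iterator. In effect, the __iter__ and __next__ methods are built into the function for you.

　　　　　　return hands back a value once, while yield can hand back values many times.

　　　　　　yield saves the function's state when it pauses, so the next call resumes exactly where it left off.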
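
　　　　To make the point about __iter__ and __next__ concrete, here is a rough sketch of what you would have to write by hand without yield. The class name Countdown is illustrative; it is meant to behave like the countdown generator function next to it, with the state that yield normally preserves stored on the instance.

```python
class Countdown:
    """Hand-written iterator that counts n, n-1, ..., 1 (illustrative name)."""

    def __init__(self, n):
        self.n = n  # state that yield would otherwise keep for us

    def __iter__(self):
        return self  # an iterator returns itself

    def __next__(self):
        if self.n <= 0:
            raise StopIteration  # the same end signal a finished generator raises
        value = self.n
        self.n -= 1
        return value


def countdown(n):
    # The generator version: yield supplies __iter__/__next__ and keeps n alive.
    while n > 0:
        yield n
        n -= 1


from_class = list(Countdown(3))
from_generator = list(countdown(3))
print(from_class, from_generator)  # [3, 2, 1] [3, 2, 1]
```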

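　　　　A use case for generators: following a log file as it grows, like tail -f

　　　　　　Open two Linux sessions (node001 and node002 below; both need access to the same /tmp/a.txt) and think of watching a log with tail -f /var/log/nginx/access.log.

　　　　　　On node001: touch /tmp/a.txt

　　　　　　On node001: tail -f /tmp/a.txt watches the file and prints any content that gets appended.

　　　　　　On node002: echo "hello" >> /tmp/a.txt

　　　　　　node001 now shows the content that node002 appended to the file.

　　　　　　Implementing tail with a generator comes down to reading the newly appended content and printing it.

　　　　　　On node001: vim tail.py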

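import time
def tail(file_path):
    with open(file_path,'r') as f:
        f.seek(0,2)  # jump to the end of the file so only new lines are read
        while True:
            line=f.readline()  # returns '' when nothing new has been written
            if not line:  # nothing read
                time.sleep(0.3)
                print("===>")  # debug marker: still waiting
                continue
            else: # got a line
                # print(line)
                yield line

g=tail('/tmp/a.txt')
print(next(g))  # returns as soon as one new line has been appended
#for line in g:
#   if "error" in line:
#       print(line)  # a for loop keeps listening indefinitely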
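　　　　　　On node001: python tail.py, which now waits for new lines.

　　　　　　On node002: echo "hello world" >> /tmp/a.txt

　　　　　　On node001: tail -f /tmp/a.txt | grep "error" # prints a line only when it contains error.

　　　　　　On node002: echo a line that contains error and it is printed on node001; echo a line without error and nothing is printed. The same filter can be built from generators: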

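import time
# definition phase
def tail(file_path):
    with open(file_path,'r') as f:
        f.seek(0,2)  # start at the end of the file
        while True:
            line=f.readline()  # returns '' when nothing new has been written
            if not line:  # nothing read
                time.sleep(0.3)
                continue
            else: # got a line
                # print(line)
                yield line
def grep(pattern,lines):
    for line in lines:
        if pattern in line:
            yield line
# call phase: this produces two generator objects, and nothing runs yet
g1 = tail("/tmp/a.txt")
g2 = grep("error",g1)

# consume the pipeline: each pass of the for loop calls next on g2, which pulls lines from g1
for i in g2:
    print(i)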
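
　　　　Because tail never finishes, the pipeline above is awkward to try out. The sketch below keeps the same grep generator but feeds it from a hypothetical finite source, fake_log, which is not part of the original. It is meant to show that the stages run lazily: the source produces lines only when the consumer asks for them.

```python
def fake_log():
    """Hypothetical stand-in for tail(): yields a few fixed lines."""
    for line in ["start\n", "error: disk full\n", "ok\n", "error: timeout\n"]:
        print("produced:", line.strip())
        yield line


def grep(pattern, lines):
    for line in lines:
        if pattern in line:
            yield line


matches = grep("error", fake_log())  # nothing has run yet
first_match = next(matches)          # pulls lines only until the first match
remaining = list(matches)            # drains the rest of the source

print(first_match.strip())  # error: disk full
print(remaining)            # ['error: timeout\n']
```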

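　　Coroutine functions:

　　　　If a function uses yield in its expression form, such as x = yield, the function is called a coroutine function. (These are generator-based coroutines, which are distinct from functions defined with async def.)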

def eater(name):
    print('%s start to eat food' %name)
    food_list=[]
    while True:
        food=yield food_list  # expression form of yield
        print('%s get %s ,to start eat' %(name,food))
        food_list.append(food)
    # print('done')

e=eater('George')
# print(e)
print(next(e))  # run to the first yield; prints []
print(e.send('dongpo pork'))  # send passes the value to yield, and yield assigns it to food
print(e.send('braised ribs'))
print(e.send('guo bao rou'))

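　　　　Differences between e.send and next(e):

　　　　　　1. If yield is used in expression form inside the function, you must call next(e) first (or, equivalently, e.send(None)) so that the function runs to its first yield.

　　　　　　2. Both resume the function from where it last paused. The difference is that send, while triggering the next stretch of execution, also passes a value in, and that value becomes the result of the yield expression.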
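
　　　　Both points can be checked with a tiny coroutine. The function echo below is an illustrative example, not part of the original: sending a real value before the first yield is reached raises TypeError, send(None) does the same job as next, and next is effectively a send of None.

```python
def echo():
    """Illustrative coroutine: yields back whatever was sent last."""
    received = None
    while True:
        received = yield received


e = echo()
try:
    e.send('too early')   # the generator has not reached a yield yet
except TypeError as exc:
    early_error = str(exc)

first = e.send(None)      # same effect as next(e): run to the first yield
second = e.send('hello')  # resumes, and 'hello' becomes the value of yield
third = next(e)           # resumes, and None becomes the value of yield

print(early_error)
print(first, second, third)  # None hello None
```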

　　　　Priming a coroutine function with a decorator:

def init(func):
    def wrapper(*args,**kwargs):
        res = func(*args,**kwargs)
        next(res)
        return res
    return wrapper

@init
def eater(name):
    print("%s start to eat" % name)
    food_list =[]
    while True:
        food = yield food_list
        print("%s eat %s" %(name,food))
        food_list.append(food)

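e = eater("hb")   # actually calls wrapper("hb"), which has already run next() for us
# no next(e) needed here: an extra next(e) would send None and record it as food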
print(e.send("123"))
print(e.send("123"))
print(e.send("123"))
print(e.send("123"))
print(e.send("123"))

　　Exercises:

　　Fetch web pages with a coroutine that waits at yield for the next URL

from urllib.request import urlopen
def get():
    while True:
        url = yield
        res = urlopen(url).read()
        print(res)

g = get()
next(g)
g.send("http://www.python.org")

　　Implement grep -rl with coroutine functions.

# grep -rl 'python' /etc
import os,time

def wrapper(func):
    def inner(*args,**kwargs):
        res = func(*args,**kwargs)
        next(res)
        return res
    return inner

@wrapper
def search(target):
    """
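    Find the path of every file under a directory.
    target is a generator object (the next stage).
    :return:
    """
    while True:
        dir_name = yield   # expression form of yield: the sent path lands in dir_name
        print('workshop search starts producing: file paths')
        time.sleep(2)
        g = os.walk(dir_name)
        for i in g:
            for j in i[-1]:
                file_path = '%s/%s' %(i[0],j)
                target.send(file_path)  # pass the path on, one at a time as each is found
                # print(file_path)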
@wrapper
def opener(target):
    """
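    Open a file and get its file handle.
    :return:
    """
    while True:
        file_path = yield  # receive a path
        print('workshop opener starts producing: file handle')
        time.sleep(2)
        with open(file_path) as f:   # open the file at the received path
            target.send((file_path,f))  # send takes one argument, so pack several values into a tuple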
@wrapper
def cat(target):
    """
    Read the file's contents.
    :return:
    """
    while True:
        file_path,f = yield   # receive the file handle, then read the file
        print('workshop cat starts producing: file contents')
        time.sleep(2)
        for line in f:  # read line by line
            target.send((file_path,line))

@wrapper
def grep(pattern,target):
    """
    Check whether a line contains the keyword.
    :return:
    """
    while True:
        file_path,line = yield
        print('workshop grep starts producing: keyword matches')
        time.sleep(2)
        if pattern in line:
            target.send(file_path)

@wrapper
def printer():
    """
    Print the path of each file that contains the keyword.
    :return:
    """
    while True:
        file_path = yield
        print('workshop printer starts producing: file path')
        time.sleep(2)
        print(file_path)

g = search(opener(cat(grep('python',printer()))))
# next(g)
g.send("/etc")
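
　　The pipeline above needs a real /etc and sleeps at every stage, so here is a smaller sketch of the same push-style design that can be run anywhere. It builds a throwaway directory with tempfile, and the names primed, match_file and collect are illustrative, not part of the original. Unlike the version above, it opens each file inside a single stage and sends a matching path only once, which is closer to what grep -l prints.

```python
import os
import tempfile
from functools import wraps


def primed(func):
    """Decorator: advance a new coroutine to its first yield."""
    @wraps(func)
    def inner(*args, **kwargs):
        gen = func(*args, **kwargs)
        next(gen)
        return gen
    return inner


@primed
def search(target):
    """Receive a directory, send every file path under it to target."""
    while True:
        dir_name = yield
        for root, _dirs, files in os.walk(dir_name):
            for name in files:
                target.send(os.path.join(root, name))


@primed
def match_file(pattern, target):
    """Receive a path, forward it once if any line contains pattern."""
    while True:
        file_path = yield
        with open(file_path, errors='ignore') as f:
            if any(pattern in line for line in f):
                target.send(file_path)


@primed
def collect(found):
    """Sink: append every received path to the list found."""
    while True:
        found.append((yield))


with tempfile.TemporaryDirectory() as tmp:
    # Two sample files; only a.txt contains the keyword.
    for name, text in [('a.txt', 'import python\n'), ('b.txt', 'nothing here\n')]:
        with open(os.path.join(tmp, name), 'w') as f:
            f.write(text)
    found = []
    search(match_file('python', collect(found))).send(tmp)
    names = sorted(os.path.basename(p) for p in found)

print(names)  # ['a.txt']
```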

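　　List comprehensions:

　　Syntax of a list comprehension: inside the square brackets there are one or more for clauses, each of which may have an if condition.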

[
expression for item1 in iterable1 if condition1
           for item2 in iterable2 if condition2
           .....
           for itemN in iterableN if conditionN

]

　　Example:

l=[1,2,3,4]
s='hello'

l1=[(num,s1) for num in l if num > 2 for s1 in s]
print(l1)

# the same result written as nested loops
l2=[]
for num in l:
    if num > 2:
        for s1 in s:
            t=(num,s1)
            l2.append(t)
print(l2)
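
　　A further sketch with illustrative data, in which each for clause carries its own if. Read the clauses left to right and they correspond to nested loops in the same order.

```python
nums = [1, 2, 3, 4]
word = 'hi'

# Comprehension: each for clause may carry its own if.
pairs = [(n, ch) for n in nums if n > 2 for ch in word if ch != 'h']

# The same thing written as nested loops.
expected = []
for n in nums:
    if n > 2:
        for ch in word:
            if ch != 'h':
                expected.append((n, ch))

print(pairs)              # [(3, 'i'), (4, 'i')]
print(pairs == expected)  # True
```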

　　A list comprehension can also collect the file paths that grep -rl needs to search.

import os
g=os.walk(r'C:\test')  # raw string, otherwise \t is read as a tab
file_path_list=[]
for i in g:
    # print(i)
    for j in i[-1]:
        file_path_list.append(os.path.join(i[0],j))

print(file_path_list)

g=os.walk(r'C:\test')
l1=[os.path.join(i[0],j) for i in g for j in i[-1]]
print(l1)

　　Generator expressions:

　　Syntax: a generator expression is written like a list comprehension, with [ ] replaced by ( ).

(
expression for item1 in iterable1 if condition1
           for item2 in iterable2 if condition2
           .....
           for itemN in iterableN if conditionN

)

　　Advantage: it saves memory, because only one value exists in memory at a time.
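
　　A rough way to see the saving, assuming sys.getsizeof is an acceptable yardstick (it measures only the container object itself, not the items it refers to). The size n is arbitrary.

```python
import sys

n = 100_000  # arbitrary size for the comparison
squares_list = [i * i for i in range(n)]  # all values exist at once
squares_gen = (i * i for i in range(n))   # values are produced on demand

list_bytes = sys.getsizeof(squares_list)  # grows with n
gen_bytes = sys.getsizeof(squares_gen)    # small and independent of n

same_total = sum(squares_gen) == sum(squares_list)

print(list_bytes > gen_bytes)  # True
print(same_total)              # True: same values either way
```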

　　Application: reading through a large file and processing it line by line.

g=('egg%s' %i for i in range(1000000000000000000000000000000000000))
print(g)
print(next(g))
print(next(g))
for i in g:  # never finishes in practice; interrupt with Ctrl+C
    print(i)

　　eg1: strip the leading and trailing whitespace from every line of a file.

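# Conventional approach. Drawback: with a large file, memory blows up.
f=open('a.txt')
l=[]
for line in f:
    line=line.strip()
    l.append(line)
print(l)


# With a list comprehension: this still holds every line in memory
f.seek(0)  # rewind; the loop above already consumed the file
l1=[line.strip() for line in f]
print(l1)


# With a generator expression: each next fetches one value, easing memory pressure
f.seek(0)  # rewind again
g=(line.strip() for line in f)
print(g)
print(next(g))  # raises StopIteration if a.txt is empty
f.close()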

　　eg2: compute the total price of the goods in a file.

　　The file b.txt (columns: name, unit price, quantity):

apple 10 3
Mercedes-G-AMG 3000000 2
Mac 30000 1
Porsche911 3000000 3

　　Computing the price:

money_l=[]
with open('b.txt') as f:
    for line in f:
        goods=line.split()
        res=float(goods[-1])*float(goods[-2])
        money_l.append(res)
print(money_l)
print(sum(money_l))

　　The same with a generator expression:

with open('b.txt') as f:
    g=(float(line.split()[-1])*float(line.split()[-2]) for line in f)
    print(sum(g))  # sum must run while the file is still open

　　eg3: simulate a database query

res=[]
with open('b.txt') as f:
    for line in f:
        # print(line)
        l=line.split()  # split into a list
        # print(l)
        d={"name":None,"price":None,"count":None}  # define the record layout
        d['name']=l[0]
        d['price']=l[1]
        d['count']=l[2]
        res.append(d)

print(res)

　　Simplified in a declarative style:

with open('b.txt') as f:
    res=(line.split() for line in f)
    print(res)
    dic_g=({'name':i[0],'price':i[1],'count':i[2]} for i in res)
    print(dic_g)
    apple_dic=next(dic_g)
    print(apple_dic['count'])
    next_dic=next(dic_g)  # the second record, not apple
    print(next_dic)

　　Selecting the goods whose unit price is above 10,000:

# select unit price > 10000
with open('b.txt') as f:
    res=(line.split() for line in f)
    # print(res)
    dic_g=({'name':i[0],'price':i[1],'count':i[2]} for i in res if float(i[1]) > 10000)
    print(dic_g)
    print(list(dic_g))
    # for i in dic_g:
    #     print(i)
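
　　A self-contained variant of the query, offered as a sketch: it keeps the sample rows of b.txt in a list (an assumption made so that it runs without the file) and converts price and count to numbers while building each record.

```python
# Sample rows copied from b.txt, kept in memory so the sketch runs anywhere.
lines = [
    "apple 10 3",
    "Mercedes-G-AMG 3000000 2",
    "Mac 30000 1",
    "Porsche911 3000000 3",
]

# Stage 1: split each line lazily. Stage 2: build typed records lazily.
rows = (line.split() for line in lines)
records = (
    {'name': name, 'price': float(price), 'count': int(count)}
    for name, price, count in rows
)

# Nothing has been parsed yet; this comprehension drives both stages.
expensive = [r['name'] for r in records if r['price'] > 10000]

# Total value of all goods: unit price times quantity, summed lazily.
total = sum(float(l.split()[1]) * int(l.split()[2]) for l in lines)

print(expensive)  # ['Mercedes-G-AMG', 'Mac', 'Porsche911']
print(total)      # 15030030.0
```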
