Python:
1. 'sep'.join(seq): returns a string formed by joining the elements of seq with the separator sep.
Turning a list back into a sentence:
training_ci = " ".join(seg_list)
>>> seq2 = "hello good boy doiido"
>>> print(':'.join(seq2))
h:e:l:l:o: :g:o:o:d: :b:o:y: :d:o:i:i:d:o
>>> seq3 = ('hello', 'good', 'boy', 'doiido')
>>> print(':'.join(seq3))
hello:good:boy:doiido
>>> seq4 = {'hello': 1, 'good': 2, 'boy': 3, 'doiido': 4}
>>> print(':'.join(seq4))  # joining a dict joins its keys (insertion order in Python 3.7+)
hello:good:boy:doiido
>>> import os
>>> os.path.join( '/hello/' , 'good/boy/' , 'doiido' )
'/hello/good/boy/doiido'
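One detail the examples above do not show is that join only accepts strings. The following is a minimal sketch of handling a list of numbers; the names `numbers`, `joined` and `letters` are illustrative and not part of the original notes.

```python
# join() raises TypeError if any element is not a string,
# so each element is converted with str() first.
numbers = [1, 2, 3]  # illustrative data

joined = ",".join(str(n) for n in numbers)
print(joined)  # 1,2,3

# An empty separator simply concatenates the pieces.
letters = "".join(["a", "b", "c"])
print(letters)  # abc
```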
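2. Using split
Splitting a string: slices the string at the given separator and returns the pieces as a list. This is the inverse of join above, and it makes the chain txt -> string -> list possible, which is what allows the conversion to a numpy array.
os.path.split(): splits a path into its directory part and its file name.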
training_ci = training_ci.split()
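As a rough illustration of the behaviours described above, the sketch below compares split() with and without an explicit separator and shows os.path.split. The sample string and path are made up for the example.

```python
import os.path

line = "hello good  boy"  # note the double space (illustrative data)

# With no argument, split() splits on runs of whitespace and drops empty strings.
no_sep = line.split()
print(no_sep)  # ['hello', 'good', 'boy']

# With an explicit separator, consecutive separators produce empty strings.
with_sep = line.split(" ")
print(with_sep)  # ['hello', 'good', '', 'boy']

# maxsplit limits how many splits are performed.
once = line.split(" ", 1)
print(once)  # ['hello', 'good  boy']

# os.path.split separates the last path component from the rest.
head, tail = os.path.split("/hello/good/boy/doiido.txt")
print(head, tail)  # /hello/good/boy doiido.txt
```

The no-argument form is the one used in these notes, which is why extra spaces in the joined string do no harm.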
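3. This leads to a word-segmentation pattern: txt -> string -> list -> numpy
import jieba
import numpy as np
def fenci(training_data):
    # Segment the text with jieba; cut() returns a generator of tokens
    seg_list = jieba.cut(training_data)
    # Join with spaces, then split again so whitespace tokens are dropped
    training_ci = " ".join(seg_list)
    training_ci = training_ci.split()
    # Convert to a 1-D numpy array
    training_ci = np.array(training_ci)
    training_ci = np.reshape(training_ci, [-1, ])
    return training_ci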
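4. Using Counter
Counter(): counts how many times each element occurs in the iterable passed in (a list, a string, etc.) and returns the counts in a dict-like object. If a dict is passed in, its values are taken as the counts.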
from collections import Counter
colors = ['red', 'blue', 'red', 'green', 'blue', 'blue']
c = Counter(colors)
print(dict(c))
Output: {'red': 2, 'blue': 3, 'green': 1}
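Beyond plain counting, a Counter differs from an ordinary dict in a few ways that are easy to miss. The sketch below shows some of them; the data and the variable names `c`, `d` and `total` are illustrative.

```python
from collections import Counter

c = Counter(['red', 'blue', 'red'])

# A key that was never seen returns 0 instead of raising KeyError.
missing = c['green']
print(missing)  # 0

# update() adds counts from another iterable.
c.update(['green', 'blue'])
print(c['blue'])  # 2

# Passing a mapping sets the counts directly rather than counting its keys.
d = Counter({'red': 5, 'blue': 1})
print(d['red'])  # 5

# Two Counters can be added together.
total = c + d
print(total['red'])  # 7
```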
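5. Using most_common() to solve the top-n problem
most_common() is a method of Counter: it returns the counts as well, but can be limited to the n most frequent elements, sorted from most to least frequent.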
from collections import Counter
user_counter = Counter("abbafafpskaag")
print(user_counter.most_common(3))
print(user_counter['a'])
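The same idea applies to words rather than characters, which is closer to the segmentation use case in these notes. A minimal sketch, using a made-up English sentence in place of real segmented text:

```python
from collections import Counter

sentence = "the cat and the dog and the bird"  # illustrative text
word_counts = Counter(sentence.split())

# most_common(n) returns (element, count) pairs, highest count first.
top_two = word_counts.most_common(2)
print(top_two)  # [('the', 3), ('and', 2)]

# With no argument, every element is returned in descending order of count.
everything = word_counts.most_common()
print(len(everything))  # 5
```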
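6. Using zip: accepts any number of sequences (including zero or one) as arguments and yields tuples. In Python 3 zip returns an iterator, so wrap it in list() to see the result.
x = [1, 2, 3]
y = [4, 5, 6]
z = [7, 8, 9]
xyz = list(zip(x, y, z))
print(xyz)
Result: [(1, 4, 7), (2, 5, 8), (3, 6, 9)]
x = [1, 2, 3]
y = [4, 5, 6, 7]
xy = list(zip(x, y))
print(xy)
Result: [(1, 4), (2, 5), (3, 6)]
x = [1, 2, 3]
x = list(zip(x))
print(x)
Result: [(1,), (2,), (3,)]
x = list(zip())
print(x)
Result: []
Supplement: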
>>> a = [1,2,3]
>>> b = [4,5,6]
>>> c = [4,5,6,7,8]
>>> zipped = list(zip(a,b))
>>> zipped
[(1, 4), (2, 5), (3, 6)]
>>> list(zip(a,c))
[(1, 4), (2, 5), (3, 6)]
>>> list(zip(*zipped))
[(1, 2, 3), (4, 5, 6)]
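Since zip silently truncates to the shortest input, it can help to see the alternative. The sketch below assumes Python 3 and uses itertools.zip_longest from the standard library; the lists `names` and `scores` are illustrative.

```python
from itertools import zip_longest

names = ['a', 'b', 'c']
scores = [1, 2]

# zip stops at the shortest input, so 'c' is dropped.
truncated = list(zip(names, scores))
print(truncated)  # [('a', 1), ('b', 2)]

# zip_longest pads the shorter input with fillvalue instead.
padded = list(zip_longest(names, scores, fillvalue=0))
print(padded)  # [('a', 1), ('b', 2), ('c', 0)]

# zip(*pairs) "unzips" a list of tuples back into separate tuples.
first, second = zip(*padded)
print(first, second)  # ('a', 'b', 'c') (1, 2, 0)
```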
7. Combining dict with zip
>>> dict()
{}
>>> dict(a='a', b='b', t='t')
{'a': 'a', 'b': 'b', 't': 't'}
>>> dict(zip(['one', 'two', 'three'], [1, 2, 3]))
{'one': 1, 'two': 2, 'three': 3}
>>> dict([('one', 1), ('two', 2), ('three', 3)])
{'one': 1, 'two': 2, 'three': 3}
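One way these pieces might fit together is building a word-to-index vocabulary from a token list. This is only a sketch: the sentence stands in for real segmented text, and the names `vocab`, `word_to_id` and `id_to_word` are hypothetical.

```python
from collections import Counter

words = "the cat and the dog and the bird".split()  # illustrative tokens

# Unique words ordered by frequency, most frequent first.
vocab = [w for w, _ in Counter(words).most_common()]

# Map each word to an integer index, then build the reverse mapping.
word_to_id = dict(zip(vocab, range(len(vocab))))
id_to_word = dict(zip(word_to_id.values(), word_to_id.keys()))

print(word_to_id['the'])  # 0
print(id_to_word[1])      # and
```

Ordering the vocabulary by frequency gives the most common words the smallest indices, though any stable ordering would work for the mapping itself.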