??有蛋白编码基因的核苷酸序列,想要转化成对应的氨基酸序列,可以利用Python的Biopython Module来实现。
- 提取核苷酸序列信息
??原始的保存着核苷酸序列的fasta文件里还有着序列的id等说明信息,我们只需要核苷酸序列,可以用for循环遍历每一行,将偶数行输出到一个新的文本文档中。
# opening the file
file1 = open('D:/.../PCGs/cytb/cytb.fas', 'r')
# creating another file to store even lines
file2 = open('D:/.../PCGs/cytb/cytb_no_label.fas', 'w')
# reading content of the files and writing even lines to another file
lines = file1.readlines()
for i in range(0, len(lines)):
if (i % 2 != 0):
file2.write(lines[i])
# closing the files
file1.close()
file2.close()
- 利用Biopython将CDS转为PEP
#importing the Biopython package
from Bio.Seq import Seq
from Bio.Alphabet import IUPAC
#opening the file
file1 = open('D:/.../PCGs/atp6/atp6_no_label.fas')
file2 = open('D:/.../PCGs/atp6/atp6_aa_no_label.fas', 'w')
#creating a list to store the nucleotide sequence of each row
dataMat = []
for line in file1.readlines():
curLine = line.strip().split(" ")
dataMat.append(curLine[:])
for i in dataMat[0:]:
#list to string
j = "".join(i)
coding_dna = Seq(j, IUPAC.unambiguous_dna)
pep = coding_dna.translate(table="Invertebrate Mitochondrial")
pep2 = str(pep)
print(pep2)
file2.write(pep2)
file2.write("\n")
file2.close()
- 将输出的PEP序列添加每条序列原始信息
??之前输出的PEP序列保存只有氨基酸排列信息,没有最初的序列id等说明信息了,可以再用下面的脚本添加上。
file1=open('D:/.../PCGs/nad6/nad6_aa_no_label.fas','r')
lines=[]
for line in file1:
lines.append(line)
file1.close()
file1=open('D:/.../PCGs/nad6/nad6_aa_no_label.fas','w')
lines.insert(0,'>td')
lines.insert(2,'>tj')
lines.insert(4,'>to')
lines.insert(6,'>tchi')
lines.insert(8,'>tcae')
lines.insert(10,'>tp')
lines.insert(12,'>ma')
s = '\n'.join(lines)
file1.write(s)
file1.close()
|