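1. Understanding the overall structure of the CPython source
Python has several interpreter implementations, but CPython is by far the most common, so this article focuses on the layout of the CPython source tree.
1.1. Downloading the CPython source and choosing a branch
CPython is open source, and its code can be cloned from GitHub as follows: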
git clone https://github.com/python/cpython
cd cpython
After cloning, list the branches:
? ~/Code/Read/cpython/ [main] git branch -a
* main
remotes/origin/3.10
remotes/origin/3.11
remotes/origin/3.7
remotes/origin/3.8
remotes/origin/3.9
remotes/origin/HEAD -> origin/main
remotes/origin/backport-c3648f4-3.11
remotes/origin/branch-v3.11.0
remotes/origin/gh-93963/remove-importlib-resources-abcs
remotes/origin/gh-98098/zipfile-packages
remotes/origin/main
remotes/origin/refactor-wait_for
remotes/origin/revert-96579-mdk-stick-to-python-syntax
remotes/origin/shared-testcase
? ~/Code/Read/cpython/ [main]
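As the listing shows, only main exists as a local branch after cloning; everything else is a remote-tracking branch. Suppose we want to read the Python 3.8 implementation. We can fetch and check out that branch as follows: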
? ~/Code/Read/cpython/ [main] git fetch origin 3.8
From https://github.com/python/cpython
* branch 3.8 -> FETCH_HEAD
? ~/Code/Read/cpython/ [main] git branch
* main
? ~/Code/Read/cpython/ [main] git checkout 3.8
Updating files: 100% (3549/3549), done.
branch '3.8' set up to track 'origin/3.8'.
Switched to a new branch '3.8'
? ~/Code/Read/cpython/ [3.8] git branch
* 3.8
main
? ~/Code/Read/cpython/ [3.8]
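The rest of this article uses this branch to describe roughly what each part of the CPython source does.
1.2. A brief overview of the source tree
The top-level folders have roughly the following purposes: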
cpython/
│
├── Doc ← Source for the documentation
├── Grammar ← The computer-readable language definition
├── Include ← The C header files
├── Lib ← Standard library modules written in Python
├── Mac ← macOS support files
├── Misc ← Miscellaneous files
├── Modules ← Standard Library Modules written in C
├── Objects ← Core types and the object model
├── Parser ← The Python parser source code
├── PC ← Windows-specific source and support files
├── PCbuild ← Windows build files (Visual Studio solution and projects)
├── Programs ← Source code for the python executable and other binaries
├── Python ← The CPython interpreter source code
└── Tools ← Standalone tools useful for building or extending Python
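There are many folders here, but only a few are consulted regularly. Those are described one by one below.
2. A brief tour of the CPython interpreter source
2.1. The implementation of the Python interpreter itself
The most important part is the implementation of the interpreter itself, which lives in the Python folder: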
? ~/Code/Read/cpython/Python/ [3.8] ls
Python-ast.c dtoa.c getcompiler.c modsupport.c pystrtod.c
README dup2.c getcopyright.c mysnprintf.c pythonrun.c
_warnings.c dynamic_annotations.c getopt.c mystrtoul.c pytime.c
asdl.c dynload_aix.c getplatform.c opcode_targets.h strdup.c
ast.c dynload_dl.c getversion.c pathconfig.c structmember.c
ast_opt.c dynload_hpux.c graminit.c peephole.c symtable.c
ast_unparse.c dynload_shlib.c hamt.c preconfig.c sysmodule.c
bltinmodule.c dynload_stub.c import.c pyarena.c thread.c
bootstrap_hash.c dynload_win.c importdl.c pyctype.c thread_nt.h
ceval.c errors.c importdl.h pyfpe.c thread_pthread.h
ceval_gil.h fileutils.c importlib.h pyhash.c traceback.c
clinic formatter_unicode.c importlib_external.h pylifecycle.c wordcode_helpers.h
codecs.c frozen.c importlib_zipimport.h pymath.c
compile.c frozenmain.c initconfig.c pystate.c
condvar.h future.c makeopcodetargets.py pystrcmp.c
context.c getargs.c marshal.c pystrhex.c
? ~/Code/Read/cpython/Python/ [3.8]
For example, the built-in functions we use every day are implemented in bltinmodule.c, which begins as follows:
? ~/Code/Read/cpython/Python/ [3.8] head bltinmodule.c
/* Built-in functions */
? ~/Code/Read/cpython/Python/ [3.8]
The getattr() function used for reflection, for instance, is implemented in bltinmodule.c as follows:
static PyObject *
builtin_getattr(PyObject *self, PyObject *const *args, Py_ssize_t nargs)
{
    PyObject *v, *name, *result;

    if (!_PyArg_CheckPositional("getattr", nargs, 2, 3))
        return NULL;

    v = args[0];
    name = args[1];
    if (!PyUnicode_Check(name)) {
        PyErr_SetString(PyExc_TypeError,
                        "getattr(): attribute name must be string");
        return NULL;
    }
    if (nargs > 2) {
        if (_PyObject_LookupAttr(v, name, &result) == 0) {
            PyObject *dflt = args[2];
            Py_INCREF(dflt);
            return dflt;
        }
    }
    else {
        result = PyObject_GetAttr(v, name);
    }
    return result;
}
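This function in turn calls PyObject_GetAttr(), so following the complete lookup means tracing the call chain level by level. We will not go into that here.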
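The control flow of builtin_getattr() can be mirrored in a few lines of Python. This is only an illustrative sketch, not how CPython itself works internally: getattr_sketch and _MISSING are hypothetical names, the two-argument built-in getattr() stands in for PyObject_GetAttr(), and catching AttributeError approximates the case where _PyObject_LookupAttr() returns 0.

```python
_MISSING = object()  # hypothetical sentinel meaning "no default was passed"


def getattr_sketch(obj, name, default=_MISSING):
    """Rough Python model of builtin_getattr() from bltinmodule.c."""
    # Mirrors the PyUnicode_Check(name) guard in the C code.
    if not isinstance(name, str):
        raise TypeError("getattr(): attribute name must be string")
    if default is not _MISSING:
        # Three-argument form: a missing attribute yields the default.
        try:
            return getattr(obj, name)
        except AttributeError:
            return default
    # Two-argument form: AttributeError propagates to the caller.
    return getattr(obj, name)


class Point:
    x = 1


print(getattr_sketch(Point(), "x"))
print(getattr_sketch(Point(), "y", None))
```

As in the C version, only a missing attribute triggers the default. Any other exception raised during the lookup still reaches the caller.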
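Another file that comes up often when studying bytecode is ceval.c, which holds the loop that executes compiled code. It also starts with a brief description of its purpose: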
? ~/Code/Read/cpython/Python/ [3.8] head ceval.c
/* Execute compiled code */
/* XXX TO DO:
XXX speed up searching for keywords by using a dictionary
XXX document it!
*/
/* enable more aggressive intra-module optimizations, where available */
? ~/Code/Read/cpython/Python/ [3.8]
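To get a feel for what ceval.c executes, the standard-library dis module can list the bytecode instructions of a function. The snippet below is a minimal sketch. The function add is a made-up example, and the exact opcode names depend on the Python version that runs it, so no particular output is shown.

```python
import dis


def add(a, b):
    return a + b


# Each instruction listed here is one step that the evaluation loop
# in ceval.c carries out when add() is called.
opnames = [ins.opname for ins in dis.get_instructions(add)]
print(opnames)
```

Searching ceval.c for one of the printed opcode names is a practical way to find the C code that handles that instruction.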
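2.2. CPython header files (the Include folder)
Much of CPython is written in C, and its core C header files are kept in the Include folder. If you only write Python code you will rarely need this folder, but if you write a C extension for Python you will certainly use its contents: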
? ~/Code/Read/cpython/Include/ [3.8] ls
Python-ast.h datetime.h iterobject.h py_curses.h pystrtod.h
Python.h descrobject.h listobject.h pyarena.h pythonrun.h
abstract.h dictobject.h longintrepr.h pycapsule.h pythread.h
asdl.h dtoa.h longobject.h pyctype.h pytime.h
ast.h dynamic_annotations.h marshal.h pydebug.h rangeobject.h
bitset.h enumobject.h memoryobject.h pydtrace.d setobject.h
bltinmodule.h errcode.h methodobject.h pydtrace.h sliceobject.h
boolobject.h eval.h modsupport.h pyerrors.h structmember.h
bytearrayobject.h fileobject.h moduleobject.h pyexpat.h structseq.h
bytes_methods.h fileutils.h namespaceobject.h pyfpe.h symtable.h
bytesobject.h floatobject.h node.h pyhash.h sysmodule.h
cellobject.h frameobject.h object.h pylifecycle.h token.h
ceval.h funcobject.h objimpl.h pymacconfig.h traceback.h
classobject.h genobject.h odictobject.h pymacro.h tracemalloc.h
code.h graminit.h opcode.h pymath.h tupleobject.h
codecs.h grammar.h osdefs.h pymem.h typeslots.h
compile.h import.h osmodule.h pyport.h ucnhash.h
complexobject.h internal parsetok.h pystate.h unicodeobject.h
context.h interpreteridobject.h patchlevel.h pystrcmp.h warnings.h
cpython intrcheck.h picklebufobject.h pystrhex.h weakrefobject.h
? ~/Code/Read/cpython/Include/ [3.8]
In Python everything is an object. The object-related header is object.h, which defines two especially important types, PyObject and PyVarObject:
typedef struct _object {
    _PyObject_HEAD_EXTRA
    Py_ssize_t ob_refcnt;
    struct _typeobject *ob_type;
} PyObject;

#define _PyObject_CAST(op) ((PyObject*)(op))

typedef struct {
    PyObject ob_base;
    Py_ssize_t ob_size;
} PyVarObject;
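The fields of these structs can be observed indirectly from Python code. The sketch below assumes it runs on CPython: sys.getrefcount() reports a value derived from ob_refcnt, type() returns the object that ob_type points to, and for a list len() reports the value stored in ob_size. The helper name refcount_delta is hypothetical.

```python
import sys


def refcount_delta():
    """Return how much the reference count grows after two new references."""
    obj = object()
    before = sys.getrefcount(obj)
    holder = [obj, obj]  # the list stores two more references to obj
    after = sys.getrefcount(obj)
    del holder
    return after - before


items = [10, 20, 30]
print(refcount_delta())                   # growth of ob_refcnt
print(type(items).__name__, len(items))   # ob_type and ob_size of a list
```

The absolute number returned by sys.getrefcount() includes temporary references made by the call itself, which is why the sketch compares two readings instead of relying on a single value.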
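2.3. Parsing the Python language (the Parser folder)
To understand how the Python language itself is parsed, that is, how lexical analysis and syntax analysis are implemented, look in the Parser folder: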
? ~/Code/Read/cpython/Parser/ [3.8] ls
Python.asdl asdl.py grammar1.c myreadline.c parser.c parsetok.c token.c tokenizer.h
acceler.c asdl_c.py listnode.c node.c parser.h pgen tokenizer.c
? ~/Code/Read/cpython/Parser/ [3.8]
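The two stages mentioned above, lexical analysis and syntax analysis, can be tried out from Python through the standard-library tokenize and ast modules. This is a sketch of the concepts only. These modules are a Python-level view and are not claimed to follow the same code paths as the C files in the Parser folder. The sample statement is made up.

```python
import ast
import io
import tokenize

source = "total = 1 + 2\n"

# Lexical analysis: split the source text into (token type, text) pairs.
tokens = [
    (tokenize.tok_name[tok.type], tok.string)
    for tok in tokenize.generate_tokens(io.StringIO(source).readline)
]

# Syntax analysis: build an abstract syntax tree from the same text.
tree = ast.parse(source)
statement = tree.body[0]

print(tokens[:5])
print(type(statement).__name__, type(statement.value).__name__)
```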
2.4. Implementation of built-in objects (the Objects folder)
Python has many built-in object types, and their implementations are stored in the Objects folder:
? ~/Code/Read/cpython/Objects/ [3.8] ls
README clinic frameobject.c moduleobject.c tupleobject.c
abstract.c codeobject.c funcobject.c namespaceobject.c typeobject.c
accu.c complexobject.c genobject.c object.c typeslots.inc
boolobject.c descrobject.c interpreteridobject.c obmalloc.c typeslots.py
bytearrayobject.c dict-common.h iterobject.c odictobject.c unicodectype.c
bytes_methods.c dictnotes.txt listobject.c picklebufobject.c unicodeobject.c
bytesobject.c dictobject.c listsort.txt rangeobject.c unicodetype_db.h
call.c enumobject.c lnotab_notes.txt setobject.c weakrefobject.c
capsule.c exceptions.c longobject.c sliceobject.c
cellobject.c fileobject.c memoryobject.c stringlib
classobject.c floatobject.c methodobject.c structseq.c
? ~/Code/Read/cpython/Objects/ [3.8]
For example, the list object is implemented in listobject.c:
? ~/Code/Read/cpython/Objects/ [3.8] head listobject.c
/* List object implementation */
? ~/Code/Read/cpython/Objects/ [3.8]
The dictionary object is implemented in dictobject.c:
? ~/Code/Read/cpython/Objects/ [3.8] head dictobject.c
/* Dictionary object implementation using a hash table */
/* The distribution includes a separate file, Objects/dictnotes.txt,
describing explorations into dictionary design and optimization.
It covers typical dictionary use patterns, the parameters for
tuning dictionaries, and several ideas for possible optimizations.
*/
/* PyDictKeysObject
? ~/Code/Read/cpython/Objects/ [3.8]
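3. The CPython standard library
Besides the interpreter itself, the most important part of CPython is the standard library; Python is, after all, a "batteries included" language. The standard library consists of two parts:
3.1. Standard library modules written in Python
We start with the standard library modules written in Python, which live in the Lib folder: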
? ~/Code/Read/cpython/Lib/ [3.8] ls
__future__.py cmd.py fractions.py multiprocessing runpy.py test
__phello__.foo.py code.py ftplib.py netrc.py sched.py textwrap.py
_bootlocale.py codecs.py functools.py nntplib.py secrets.py this.py
_collections_abc.py codeop.py genericpath.py ntpath.py selectors.py threading.py
_compat_pickle.py collections getopt.py nturl2path.py shelve.py timeit.py
_compression.py colorsys.py getpass.py numbers.py shlex.py tkinter
_dummy_thread.py compileall.py gettext.py opcode.py shutil.py token.py
_markupbase.py concurrent glob.py operator.py signal.py tokenize.py
_osx_support.py configparser.py gzip.py optparse.py site-packages trace.py
_py_abc.py contextlib.py hashlib.py os.py site.py traceback.py
_pydecimal.py contextvars.py heapq.py pathlib.py smtpd.py tracemalloc.py
_pyio.py copy.py hmac.py pdb.py smtplib.py tty.py
_sitebuiltins.py copyreg.py html pickle.py sndhdr.py turtle.py
_strptime.py crypt.py http pickletools.py socket.py turtledemo
_threading_local.py csv.py idlelib pipes.py socketserver.py types.py
_weakrefset.py ctypes imaplib.py pkgutil.py sqlite3 typing.py
abc.py curses imghdr.py platform.py sre_compile.py unittest
aifc.py dataclasses.py imp.py plistlib.py sre_constants.py urllib
antigravity.py datetime.py importlib poplib.py sre_parse.py uu.py
argparse.py dbm inspect.py posixpath.py ssl.py uuid.py
ast.py decimal.py io.py pprint.py stat.py venv
asynchat.py difflib.py ipaddress.py profile.py statistics.py warnings.py
asyncio dis.py json pstats.py string.py wave.py
asyncore.py distutils keyword.py pty.py stringprep.py weakref.py
base64.py doctest.py lib2to3 py_compile.py struct.py webbrowser.py
bdb.py dummy_threading.py linecache.py pyclbr.py subprocess.py wsgiref
binhex.py email locale.py pydoc.py sunau.py xdrlib.py
bisect.py encodings logging pydoc_data symbol.py xml
bz2.py ensurepip lzma.py queue.py symtable.py xmlrpc
cProfile.py enum.py mailbox.py quopri.py sysconfig.py zipapp.py
calendar.py filecmp.py mailcap.py random.py tabnanny.py zipfile.py
cgi.py fileinput.py mimetypes.py re.py tarfile.py zipimport.py
cgitb.py fnmatch.py modulefinder.py reprlib.py telnetlib.py
chunk.py formatter.py msilib rlcompleter.py tempfile.py
? ~/Code/Read/cpython/Lib/ [3.8]
For example, to see how Python's multiprocessing support is implemented, open the multiprocessing folder, which contains the following files:
? ~/Code/Read/cpython/Lib/multiprocessing/ [3.8] tree ./
./
├── __init__.py
├── connection.py
├── context.py
├── dummy
│ ├── __init__.py
│ └── connection.py
├── forkserver.py
├── heap.py
├── managers.py
├── pool.py
├── popen_fork.py
├── popen_forkserver.py
├── popen_spawn_posix.py
├── popen_spawn_win32.py
├── process.py
├── queues.py
├── reduction.py
├── resource_sharer.py
├── resource_tracker.py
├── shared_memory.py
├── sharedctypes.py
├── spawn.py
├── synchronize.py
└── util.py
1 directory, 23 files
? ~/Code/Read/cpython/Lib/multiprocessing/ [3.8]
3.2. Standard library modules written in C
The standard library modules written in C live in the Modules folder:
? ~/Code/Read/cpython/Modules/ [3.8] ls
README _pickle.c binascii.c pwdmodule.c
Setup _posixsubprocess.c cjkcodecs pyexpat.c
_abc.c _queuemodule.c clinic readline.c
_asynciomodule.c _randommodule.c cmathmodule.c resource.c
_bisectmodule.c _scproxy.c config.c.in rotatingtree.c
_blake2 _sha3 errnomodule.c rotatingtree.h
_bz2module.c _sqlite expat selectmodule.c
_codecsmodule.c _sre.c faulthandler.c sha1module.c
_collectionsmodule.c _ssl fcntlmodule.c sha256module.c
_contextvarsmodule.c _ssl.c gc_weakref.txt sha512module.c
_cryptmodule.c _ssl_data.h gcmodule.c signalmodule.c
_csv.c _ssl_data_111.h getaddrinfo.c socketmodule.c
_ctypes _ssl_data_300.h getbuildinfo.c socketmodule.h
_curses_panel.c _stat.c getnameinfo.c spwdmodule.c
_cursesmodule.c _statisticsmodule.c getpath.c sre.h
_datetimemodule.c _struct.c grpmodule.c sre_constants.h
_dbmmodule.c _testbuffer.c hashlib.h sre_lib.h
_decimal _testcapimodule.c hashtable.c symtablemodule.c
_elementtree.c _testimportmultiple.c hashtable.h syslogmodule.c
_functoolsmodule.c _testinternalcapi.c itertoolsmodule.c termios.c
_gdbmmodule.c _testmultiphase.c ld_so_aix.in testcapi_long.h
_hashopenssl.c _threadmodule.c main.c timemodule.c
_heapqmodule.c _tkinter.c makesetup tkappinit.c
_io _tracemalloc.c makexp_aix tkinter.h
_json.c _uuidmodule.c mathmodule.c unicodedata.c
_localemodule.c _weakref.c md5module.c unicodedata_db.h
_lsprof.c _winapi.c mmapmodule.c unicodename_db.h
_lzmamodule.c _xxsubinterpretersmodule.c nismodule.c winreparse.h
_math.c _xxtestfuzz ossaudiodev.c xxlimited.c
_math.h addrinfo.h overlapped.c xxmodule.c
_multiprocessing arraymodule.c parsermodule.c xxsubtype.c
_opcode.c atexitmodule.c posixmodule.c zlibmodule.c
_operator.c audioop.c posixmodule.h
? ~/Code/Read/cpython/Modules/ [3.8]
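When it is unclear whether a given module should be looked up under Lib or under Modules, the import system can give a hint. The following is a minimal sketch. module_origin is a hypothetical helper, and the result for a particular module may differ between builds, because whether a C module is compiled into the interpreter or built as a separate extension depends on the build configuration.

```python
import importlib.util


def module_origin(name):
    """Classify where a top-level module's implementation comes from."""
    spec = importlib.util.find_spec(name)
    if spec is None:
        return "not found"
    origin = spec.origin or ""
    if origin in ("built-in", "frozen"):
        return origin
    if origin.endswith(".py"):
        return "python source"
    return "extension module"


for name in ("json", "sys", "math"):
    print(name, "->", module_origin(name))
```

A result of "python source" points to a file under Lib, while "built-in" or "extension module" suggests looking for C code, typically under Modules or Python.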