
Jedi Development

Note

This documentation is for Jedi developers who want to improve Jedi itself but have no idea how Jedi works. If you want to use Jedi for your IDE, look at the plugin API.

Introduction

This page tries to address the fundamental demand for documentation of the Jedi internals. Understanding a dynamic language is a complex task, especially because type inference in Python can be very recursive. Therefore Jedi couldn’t avoid complexity. I know that simple is better than complex, but unfortunately understanding complex systems sometimes requires complex solutions.

Since I (David Halter) wrote most of the Jedi internals, I will also write most of this introduction, because no one else understands how Jedi works to the same level. That is exactly why this part of the documentation exists: to enable more people to edit the Jedi core.

In five chapters I’m trying to describe the internals of Jedi:

  • The Jedi Core
  • Core Extensions
  • Imports & Modules
  • Caching & Recursions
  • Helper Modules

Note

Testing is not documented here; it has its own section in the documentation.

The Jedi Core

The core of Jedi consists of three parts:

  • Parser
  • Python code evaluation
  • API

Most people are probably interested in code evaluation, because that’s where all the magic happens. I need to introduce the parser first, because jedi.evaluate uses it extensively.

Parser (parser/__init__.py)

The Parser tries to convert the available Python code into an easy-to-read format, something like an abstract syntax tree. The classes that represent this tree live in the jedi.parser.representation module.

The Python module tokenize is a very important part of the Parser, because it splits the code into different words (tokens). Sometimes it looks a bit messy. Sorry for that! You might ask now: “Why didn’t you use the ast module for this?” Well, ast does a very good job of understanding proper Python code, but it fails as soon as there’s a single line of broken code.
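To make the difference concrete, here is a small sketch that uses only the standard library’s ast and tokenize modules, not Jedi’s own parser. The helper names ast_parses and token_strings are made up for this example. ast rejects the half-typed module as a whole, while tokenize still returns every token, and a completion engine needs those tokens to keep working.

```python
import ast
import io
import tokenize

# Incomplete code, as it looks while the user is still typing.
broken = "import datetime\ndatetime.date.\nx = 1\n"


def ast_parses(source):
    """Return True if the ast module accepts the source."""
    try:
        ast.parse(source)
    except SyntaxError:
        return False
    return True


def token_strings(source):
    """Token strings from tokenize, skipping whitespace-only ones (NEWLINE, ENDMARKER)."""
    readline = io.StringIO(source).readline
    return [tok.string for tok in tokenize.generate_tokens(readline) if tok.string.strip()]


print(ast_parses(broken))     # False: one broken line rejects the whole module
print(token_strings(broken))  # ['import', 'datetime', 'datetime', '.', 'date', '.', 'x', '=', '1']
```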

There’s one important optimization you need to know about: Statements are not parsed completely. A Statement is just a representation of the tokens within it. This lowers memory usage and CPU time and reduces the complexity of the Parser (there’s another parser sitting inside Statement, which produces Array and Call).
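The idea of storing a Statement as raw tokens and only splitting it further when someone asks can be sketched as follows. LazyStatement, tokens_of and the comma-splitting “inner parser” are hypothetical simplifications, not Jedi’s Statement class. The sketch only shows that the work happens on first access and is cached afterwards.

```python
import io
import tokenize
from functools import cached_property


def tokens_of(source):
    """Token strings of one line of code, without NEWLINE/NL/ENDMARKER tokens."""
    readline = io.StringIO(source).readline
    skip = (tokenize.NEWLINE, tokenize.NL, tokenize.ENDMARKER)
    return [tok.string for tok in tokenize.generate_tokens(readline) if tok.type not in skip]


class LazyStatement:
    """Hypothetical Statement: the main parser only stores the raw tokens."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.parse_count = 0  # how often the inner parser really ran

    @cached_property
    def expression_list(self):
        """Inner parser, run on first access only: split on top-level commas."""
        self.parse_count += 1
        parts, current, depth = [], [], 0
        for tok in self.tokens:
            if tok in ("(", "[", "{"):
                depth += 1
            elif tok in (")", "]", "}"):
                depth -= 1
            if tok == "," and depth == 0:
                parts.append(current)
                current = []
            else:
                current.append(tok)
        parts.append(current)
        return parts


stmt = LazyStatement(tokens_of("a, f(b, c)\n"))
print(stmt.parse_count)      # 0: nothing parsed yet
print(stmt.expression_list)  # [['a'], ['f', '(', 'b', ',', 'c', ')']]
stmt.expression_list         # the second access is served from the cache
print(stmt.parse_count)      # 1
```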

Parser Representation (parser/representation.py)

If you know what an abstract syntax tree (AST) is, you’ll see that this module is pretty much that. The classes represent syntax elements such as Import and Function.

A very central class is Scope. The parser doesn’t use it directly; instead, Function, Class, Flow, etc. inherit from it. A Scope may have subscopes, imports and statements. The entire parser is based on scopes, because they also stand for indentation.

One special thing:

Array values are statements. That may seem odd at first, but if you think about it, it makes sense: [1, 2+33], for example, is an Array with two Statements inside. This is the easiest way to write a parser. The same applies to Param, which is used in function definitions.
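A rough model of this tree shape, written with dataclasses: Scope carries subscopes, imports and statements, Function and Class only inherit from it, and an Array simply holds Statements. All class and field names here are illustrative and much simpler than the real jedi.parser.representation classes.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Statement:
    """Hypothetical: a statement that only keeps its tokens."""
    tokens: List[str]


@dataclass
class Array:
    """Hypothetical: every value of an array is itself a Statement."""
    values: List[Statement]


@dataclass
class Scope:
    """Hypothetical base class shared by modules, classes, functions and flows."""
    name: str
    subscopes: List["Scope"] = field(default_factory=list)
    imports: List[str] = field(default_factory=list)
    statements: List[Statement] = field(default_factory=list)


class Function(Scope):
    """Inherits everything from Scope."""


class Class(Scope):
    """Inherits everything from Scope."""


def walk_scopes(scope, depth=0):
    """Yield (depth, name) for a scope and its subscopes; depth mirrors indentation."""
    yield depth, scope.name
    for sub in scope.subscopes:
        yield from walk_scopes(sub, depth + 1)


module = Scope("example.py", imports=["os"],
               subscopes=[Class("Foo", subscopes=[Function("bar")])])
print(list(walk_scopes(module)))  # [(0, 'example.py'), (1, 'Foo'), (2, 'bar')]

# [1, 2+33]: an Array holding two Statements.
array = Array([Statement(["1"]), Statement(["2", "+", "33"])])
print(len(array.values))  # 2
```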

The easiest way to play with this module is to use jedi.parser.Parser. Its module attribute holds an instance of SubModule:

>>> from jedi._compatibility import u
>>> from jedi.parser import Parser
>>> parser = Parser(u('import os'), 'example.py')
>>> submodule = parser.module
>>> submodule
<SubModule: example.py@1-1>

Every subclass of Scope, including SubModule, has an imports attribute, which holds the import statements in that scope. Check this out:

>>> submodule.imports
[<Import: import os @1,0>]

See also Scope.subscopes and Scope.statements.

Class inheritance diagram (image not included): SubModule, Class, Function, Lambda, Flow, ForFlow, Import, Statement, Param, Call, Array, Name, ListComprehension

Evaluation of Python code (evaluate/__init__.py)

Evaluation of Python code in Jedi is based on three assumptions:

  • Code is recursive (to weaken this assumption, the jedi.evaluate.dynamic module exists).
  • No magic is being used:
    • metaclasses
    • setattr() / __import__()
    • writing to globals(), locals(), object.__dict__
  • The programmer is not a total dick, e.g. like this :-)
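Why these assumptions matter can be shown with the standard library’s ast module. In this sketch (the helper name statically_visible_attributes is made up), a static reader of the class body only finds debug. setattr() adds verbose at runtime, and nothing in the class body reveals it.

```python
import ast

source = '''
class Config:
    debug = False

setattr(Config, "verbose", True)
'''


def statically_visible_attributes(source, class_name):
    """Names a static reader finds by looking at assignments in a class body."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef) and node.name == class_name:
            return {target.id
                    for stmt in node.body if isinstance(stmt, ast.Assign)
                    for target in stmt.targets if isinstance(target, ast.Name)}
    return set()


# Actually running the code shows what exists at runtime.
namespace = {}
exec(source, namespace)
runtime = {name for name in vars(namespace["Config"]) if not name.startswith("__")}

print(sorted(statically_visible_attributes(source, "Config")))  # ['debug']
print(sorted(runtime))                                           # ['debug', 'verbose']
```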

That said, there’s mainly one entry point in this module: eval_statement. This is where autocompletion starts. Everything you want to complete is either a Statement or some special name like class, which is easy to complete.

Therefore you need to understand what follows after eval_statement. Let’s look at an example:

import datetime
datetime.date.toda# <-- cursor here

First of all, this module doesn’t care about completion. It really just cares about datetime.date. At the end of the procedure, eval_statement will return the date class.

To visualize this (simplified):

  • eval_statement - <Statement: datetime.date>

    • Unpacking of the statement into [[<Call: datetime.date>]]
  • eval_expression_list, calls eval_call with <Call: datetime.date>

  • eval_call - searches the datetime name within the module.

This is exactly where it starts to get complicated, because now recursion kicks in. The statement has not been resolved fully, but now we need to resolve the datetime import. So it continues:

  • follow the import, which happens in the jedi.evaluate.imports module.
  • now the same eval_call as above calls follow_path to follow the second part of the statement, date.
  • After follow_path returns with the desired datetime.date class, the result is returned and the recursion finishes.

Now what would happen if we wanted datetime.date.foo.bar? Just two more calls to follow_path (which calls itself recursively). What if the imported module contained another Statement like this:

from foo import bar
Date = bar.baz

Well... you get it: just another eval_statement recursion. It’s really easy; it’s just that Python is not always that easy. To understand tuple assignments and different class scopes, a lot more code had to be written. And we’re still not talking about descriptors and nested list comprehensions, just the simple stuff.
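The recursion described above can be imitated with a toy evaluator. The function names mirror eval_statement, eval_call and follow_path from the text, but everything else is an assumption made for illustration: modules are plain dicts, imports and assignments are tagged tuples, and there is no Statement or Call parsing at all.

```python
# Hypothetical, heavily simplified world: modules and classes are dicts of names.
# An import is ("import", module), a from-import is ("import_from", module, name)
# and an assignment is ("assign", "dotted.expression").
MODULES = {
    "datetime": {"date": {"today": {}}},
    "foo": {"bar": {"baz": {"year": {}}}},
    "main": {
        "datetime": ("import", "datetime"),
        "bar": ("import_from", "foo", "bar"),
        "Date": ("assign", "bar.baz"),
    },
}


def eval_statement(module_name, expression):
    """Evaluate a dotted expression such as 'datetime.date' inside a module."""
    first, *rest = expression.split(".")
    namespace = eval_call(module_name, first)
    return follow_path(namespace, rest)


def eval_call(module_name, name):
    """Find a name in a module and resolve what it stands for."""
    definition = MODULES[module_name][name]
    if isinstance(definition, tuple):
        kind = definition[0]
        if kind == "import":
            return MODULES[definition[1]]                      # follow the import
        if kind == "import_from":
            return MODULES[definition[1]][definition[2]]
        if kind == "assign":
            return eval_statement(module_name, definition[1])  # another recursion
        raise ValueError(f"unknown definition kind: {kind}")
    return definition


def follow_path(namespace, path):
    """Resolve the remaining attribute names, one recursion per name."""
    if not path:
        return namespace
    return follow_path(namespace[path[0]], path[1:])


print(eval_statement("main", "datetime.date") is MODULES["datetime"]["date"])      # True
print(eval_statement("main", "Date.year") is MODULES["foo"]["bar"]["baz"]["year"])  # True
```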

So if you want to change something, write a test and then just change what you want. This module is covered by about 600 tests. Don’t be afraid to break something; the tests are good enough to catch it.

I need to mention now that this recursive approach is really good because it only evaluates what needs to be evaluated. All statements and modules that are not used are simply ignored. It’s a little bit similar to a backtracking algorithm.
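Demand-driven evaluation in miniature: each definition is a function that is only called when its name is looked up, and results are memoized. lazy_evaluator and the definitions are hypothetical. They only illustrate that unused names (here c and expensive) are never evaluated.

```python
def lazy_evaluator(definitions):
    """definitions maps a name to a function that computes its value via lookup()."""
    cache = {}
    evaluated = []

    def lookup(name):
        if name not in cache:
            evaluated.append(name)  # record which names were actually needed
            cache[name] = definitions[name](lookup)
        return cache[name]

    return lookup, evaluated


definitions = {
    "a": lambda lookup: lookup("b") + 1,
    "b": lambda lookup: 41,
    "c": lambda lookup: lookup("expensive"),          # never requested below
    "expensive": lambda lookup: sum(range(10 ** 6)),  # so this never runs either
}

lookup, evaluated = lazy_evaluator(definitions)
print(lookup("a"))  # 42
print(evaluated)    # ['a', 'b']
```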

Evaluation Representation (evaluate/representation.py)

As described in the jedi.parser.representation module, there’s a need for an AST-like module to represent the states of parsed modules.

But there are also structures in Python that need a little more than that. An Instance, for example, is only a Class before it is instantiated. This module represents such cases.
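The extra information an instance carries can be illustrated with the standard library’s ast module instead of Jedi’s classes: attributes assigned through self.<name> inside methods belong to instances, not to the class body. The helper class_and_instance_attributes is hypothetical and ignores inheritance, decorators and everything else a real Instance has to handle.

```python
import ast

source = '''
class Point:
    dims = 2

    def __init__(self, x, y):
        self.x = x
        self.y = y

    def norm(self):
        return (self.x ** 2 + self.y ** 2) ** 0.5
'''


def class_and_instance_attributes(source, class_name):
    """Return (names visible on the class, names visible on an instance)."""
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.ClassDef) and node.name == class_name:
            class_attrs, instance_only = set(), set()
            for stmt in node.body:
                if isinstance(stmt, ast.Assign):
                    class_attrs.update(t.id for t in stmt.targets if isinstance(t, ast.Name))
                elif isinstance(stmt, ast.FunctionDef):
                    class_attrs.add(stmt.name)
                    # Only `self.<name> = ...` (a Store context) creates an instance attribute.
                    for sub in ast.walk(stmt):
                        if (isinstance(sub, ast.Attribute) and isinstance(sub.ctx, ast.Store)
                                and isinstance(sub.value, ast.Name) and sub.value.id == "self"):
                            instance_only.add(sub.attr)
            return class_attrs, class_attrs | instance_only
    raise LookupError(class_name)


on_class, on_instance = class_and_instance_attributes(source, "Point")
print(sorted(on_class))     # ['__init__', 'dims', 'norm']
print(sorted(on_instance))  # ['__init__', 'dims', 'norm', 'x', 'y']
```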

So, why is there also a Class class here? Well, there are decorators and they change classes in Python 3.

Inheritance diagram (image not included): Executable, Instance, InstanceElement, Class, Function, FunctionExecution

Name resolution (evaluate/finder.py)

Searching for names with a given scope and name. This is very central to both Jedi and Python. Name resolution is quite complicated because of descriptors, __getattribute__, __getattr__, global, etc.
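A heavily simplified model of the lookup order, assuming scopes are given innermost first as (kind, names) pairs. It encodes one of Python’s less obvious rules: a class body is not visible from methods nested inside it. Descriptors, __getattr__ and global statements, which make the real finder complicated, are deliberately left out; find_name and BUILTINS are made-up names.

```python
BUILTINS = {"len": "builtin function", "int": "builtin class"}


def find_name(name, scope_chain):
    """Resolve `name` from the innermost scope outwards (a simplified LEGB lookup).

    scope_chain is a list of (kind, names) pairs ordered innermost first, where
    kind is "function", "class" or "module" and names maps a name to its definition.
    """
    for position, (kind, names) in enumerate(scope_chain):
        # As in Python, an enclosing class body is invisible to functions nested in it.
        if kind == "class" and position > 0:
            continue
        if name in names:
            return kind, names[name]
    if name in BUILTINS:
        return "builtins", BUILTINS[name]
    raise NameError(name)


# Scopes as seen from inside a method body.
method_scopes = [
    ("function", {"self": "param"}),
    ("class", {"x": "class attribute"}),
    ("module", {"x": "module global"}),
]
print(find_name("x", method_scopes))    # ('module', 'module global')
print(find_name("len", method_scopes))  # ('builtins', 'builtin function')
```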

Flow checks

Flow checks are not really mature. There’s only a check for isinstance: it checks whether a flow has the form if isinstance(a, type_or_tuple). Unfortunately, everything else is ignored (e.g. a == '' would be easy to check for, since it implies that a is a string). There’s big potential in these checks.
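Detecting that form is straightforward on a syntax tree. This sketch uses the standard library’s ast module, which, unlike Jedi’s parser, needs valid code. It collects the type names from if isinstance(name, T) or if isinstance(name, (T1, T2)) for each variable and ignores the a == '' case, just as described above. isinstance_narrowing is a hypothetical helper name.

```python
import ast


def isinstance_narrowing(source):
    """Map each variable to the type names from `if isinstance(var, T)` checks."""
    result = {}
    for node in ast.walk(ast.parse(source)):
        if not isinstance(node, ast.If):
            continue
        test = node.test
        if (isinstance(test, ast.Call) and isinstance(test.func, ast.Name)
                and test.func.id == "isinstance" and len(test.args) == 2
                and isinstance(test.args[0], ast.Name)):
            classinfo = test.args[1]
            # isinstance accepts a single type or a tuple of types.
            candidates = classinfo.elts if isinstance(classinfo, ast.Tuple) else [classinfo]
            names = [c.id for c in candidates if isinstance(c, ast.Name)]
            result.setdefault(test.args[0].id, []).extend(names)
    return result


source = """
if isinstance(a, int):
    pass
if isinstance(b, (str, bytes)):
    pass
if a == '':
    pass
"""
print(isinstance_narrowing(source))  # {'a': ['int'], 'b': ['str', 'bytes']}
```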

API (api.py and api_classes.py)

The API has been designed to be as easy to use as possible. The API documentation can be found here. The API itself contains little code that needs to be mentioned here. Generally I’m trying to be conservative with the API. I’d rather not add new API features if they are not necessary, because it’s much harder to deprecate stuff than to add it later.

Core Extensions

Core Extensions is a summary of the following topics:

  • Iterables & Dynamic Arrays
  • Parameter completion
  • Fast Parser
  • Docstrings
  • Refactoring

These topics are very important for understanding what Jedi does beyond the core, but they could be removed and Jedi would still work, only slower and without some features.

Iterables & Dynamic Arrays (evaluate/iterable.py)

To understand Python on a deeper level, Jedi needs to understand some of the dynamic features of Python; however, this is probably the most complicated part:

Contains all classes and functions to deal with lists, dicts, generators and iterators in general.

Array modifications

If the content of an array (set/list) is requested somewhere, the current module will be checked for appearances of arr.append, arr.insert, etc. If the arr name points to an actual array, the content will be added.

This can be really CPU-intensive, as you can imagine, because Jedi has to follow every append and check whether it’s the right array. However, this works pretty well, because in slow cases the recursion detector and other settings stop this process.

It is important to note that:

  1. Array modifications work only in the current module.
  2. Jedi only checks Array additions; list.pop, etc. are ignored.
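A sketch of the same idea on valid code, again using the standard library’s ast module rather than Jedi’s machinery. It collects the literal items of a list assigned to a given name plus the arguments of name.append(...) and name.insert(..., x). Like Jedi, it ignores pop and other removals. Unlike Jedi, it does not check which array a name actually points to. array_contents is a hypothetical helper, and ast.unparse needs Python 3.9+.

```python
import ast


def array_contents(source, name):
    """Expressions that end up in list `name`: literal items plus append/insert arguments."""
    items = []
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Assign) and isinstance(node.value, ast.List)
                and any(isinstance(t, ast.Name) and t.id == name for t in node.targets)):
            items.extend(node.value.elts)
        elif (isinstance(node, ast.Call) and isinstance(node.func, ast.Attribute)
              and isinstance(node.func.value, ast.Name) and node.func.value.id == name):
            if node.func.attr == "append" and len(node.args) == 1:
                items.append(node.args[0])
            elif node.func.attr == "insert" and len(node.args) == 2:
                items.append(node.args[1])
            # pop(), remove() and other removals are ignored on purpose.
    return [ast.unparse(item) for item in items]


source = """
arr = [1]
arr.append('a')
arr.insert(0, 2.0)
arr.pop()
other = []
other.append(3)
"""
print(array_contents(source, "arr"))  # ['1', "'a'", '2.0']
```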

Parameter completion (evaluate/dynamic.py)

One of the really important features of Jedi is to have an option to understand code like this:

def foo(bar):
    bar. # completion here
foo(1)

There’s no doubt whether bar is an int or not, but if there’s also a call like foo('str'), what would happen? Well, we’ll just show both, because that’s what a human would expect.

It works as follows:

  • Jedi sees a param.
  • It searches for function calls named foo.
  • It executes these calls and checks the input. This works with a ParamListener.
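The search for calls can be approximated on valid code with the standard library’s ast module. infer_param_types is a hypothetical helper. It only understands literal arguments via ast.literal_eval and skips everything else, where Jedi would evaluate the argument recursively. The example source leaves out the cursor line from above because ast cannot parse it.

```python
import ast


def infer_param_types(source, func_name, param_index=0):
    """Type names of literal arguments passed at `param_index` in calls to func_name."""
    types = set()
    for node in ast.walk(ast.parse(source)):
        if (isinstance(node, ast.Call) and isinstance(node.func, ast.Name)
                and node.func.id == func_name and len(node.args) > param_index):
            try:
                value = ast.literal_eval(node.args[param_index])
            except ValueError:
                continue  # not a literal; a real engine would evaluate it recursively
            types.add(type(value).__name__)
    return types


source = """
def foo(bar):
    return bar

foo(1)
foo('str')
foo(some_variable)
"""
print(sorted(infer_param_types(source, "foo")))  # ['int', 'str']
```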

Fast Parser (parser/fast.py)

Basically, a parser that is faster because it parses only parts of the code and, if anything changes, reparses only the changed parts. But because it’s not finished (and still not working as I want), I won’t document it any further.
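The general idea of reparsing only what changed can be sketched with a cache keyed by the text of each top-level block. ChunkCache is hypothetical and naive. It splits on lines that start at column 0, so a top-level else: would break it, and it uses ast.parse, which still fails on broken code. It is not meant to show how jedi.parser.fast works internally.

```python
import ast


class ChunkCache:
    """Hypothetical incremental parser: re-parse only top-level blocks whose text changed."""

    def __init__(self):
        self.cache = {}   # block text -> parsed ast.Module
        self.parses = 0   # number of real ast.parse calls

    @staticmethod
    def split(source):
        """Start a new block at every line that begins at column 0."""
        blocks, current = [], []
        for line in source.splitlines(keepends=True):
            if not line[0].isspace() and current:
                blocks.append("".join(current))
                current = []
            current.append(line)
        if current:
            blocks.append("".join(current))
        return blocks

    def parse(self, source):
        trees = []
        for block in self.split(source):
            if block not in self.cache:
                self.parses += 1
                self.cache[block] = ast.parse(block)
            trees.append(self.cache[block])
        return trees


parser = ChunkCache()
parser.parse("def a():\n    return 1\n\ndef b():\n    return 2\n")
print(parser.parses)  # 2: both blocks are new
parser.parse("def a():\n    return 1\n\ndef b():\n    return 3\n")
print(parser.parses)  # 3: only the edited block b was parsed again
```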

Docstrings (evaluate/docstrings.py)

Docstrings are another source of information for functions and classes. jedi.evaluate.dynamic tries to find all executions of functions, while the docstring parsing is much easier. There are two different types of docstrings that Jedi understands:

  • Sphinx
  • Epydoc

For example, the Sphinx annotation :type foo: str clearly states that the type of foo is str.

As an addition to parameter searching, this module also provides return annotations.
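Extracting such annotations is mostly pattern matching. The regular expressions below are assumptions that cover the simple Sphinx (:type x: T, :rtype: T) and Epydoc (@type x: T, @rtype: T) field forms, not Jedi’s actual patterns; docstring_types is a made-up name.

```python
import re

# Assumed patterns for simple Sphinx (":type x: T") and Epydoc ("@type x: T") fields.
PARAM_TYPE = re.compile(r"[:@]type\s+(\w+):\s*(.+)")
RETURN_TYPE = re.compile(r"[:@]rtype:\s*(.+)")


def docstring_types(docstring):
    """Return ({param: type}, return type or None) found in a docstring."""
    params = {name: type_.strip() for name, type_ in PARAM_TYPE.findall(docstring)}
    match = RETURN_TYPE.search(docstring)
    return params, (match.group(1).strip() if match else None)


sphinx_doc = """Repeat a greeting.

:type name: str
:type times: int
:rtype: list
"""
print(docstring_types(sphinx_doc))                     # ({'name': 'str', 'times': 'int'}, 'list')
print(docstring_types("@type n: int\n@rtype: float"))  # ({'n': 'int'}, 'float')
```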

Refactoring (evaluate/refactoring.py)

Introduces some basic refactoring functions to Jedi. This module is still at a very early stage of development and needs a lot of testing and improvement.

Warning

I won’t do much here, but if anyone wants to step in, please do. Refactoring is not one of my priorities.

It uses the Jedi API and currently supports the following functions (sometimes bug-prone):

  • rename
  • extract variable
  • inline variable
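For contrast, a purely lexical rename is easy to write with the standard library’s tokenize module. It replaces NAME tokens and leaves strings and comments alone. The rename helper is hypothetical and knows nothing about scopes, so it would also rename an unrelated local or an obj.x attribute with the same name. A correct rename has to find the actual references instead, which this sketch cannot do.

```python
import io
import tokenize


def rename(source, old, new):
    """Rename every NAME token `old` to `new`; strings and comments are left alone."""
    lines = source.splitlines(keepends=True)
    readline = io.StringIO(source).readline
    hits = [tok.start for tok in tokenize.generate_tokens(readline)
            if tok.type == tokenize.NAME and tok.string == old]
    # Replace from the end so earlier column offsets on the same line stay valid.
    for row, col in reversed(hits):
        line = lines[row - 1]
        lines[row - 1] = line[:col] + new + line[col + len(old):]
    return "".join(lines)


source = "x = 1\nprint(x, 'x')  # x\ny = x + 1\n"
print(rename(source, "x", "total"), end="")
# total = 1
# print(total, 'x')  # x
# y = total + 1
```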

Imports & Modules

Compiled Modules (evaluate/compiled.py)

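Imitates the parser representation for compiled modules (e.g. builtins and C extensions), which have no Python source that the parser could read.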
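Compiled modules have no source to parse, so their contents have to be discovered through introspection. Here is a minimal sketch with the standard library’s inspect module. The mapping onto “class”, “function” and “statement” is an illustrative assumption, not Jedi’s compiled representation.

```python
import inspect
import math


def compiled_names(module):
    """Describe the public names of a module by introspection instead of parsing."""
    kinds = {}
    for name, obj in vars(module).items():
        if name.startswith("_"):
            continue
        if inspect.isclass(obj):
            kinds[name] = "class"
        elif callable(obj):
            kinds[name] = "function"
        else:
            kinds[name] = "statement"  # plain values such as constants
    return kinds


kinds = compiled_names(math)  # on CPython, math is compiled and has no .py source
print(kinds["sqrt"], kinds["pi"])  # function statement
```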

Imports (evaluate/imports.py)

jedi.evaluate.imports is here to resolve import statements and return the modules/classes/functions/whatever they stand for. However, no actual importing is done: this module is about finding modules in the filesystem. This can be quite tricky sometimes, because Python imports are not always that simple.

This module uses imp for Python up to 3.2 and importlib from Python 3.3 on; choosing the correct implementation is delegated to _compatibility.

This module also supports import autocompletion, i.e. completing statements like from datetim (with the cursor at the end, this would return datetime).
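Both tasks, locating a module on disk without importing it and listing candidates for a prefix, can be sketched with importlib and pkgutil. These helpers are hypothetical and only handle top-level names. They also use importlib on every Python version, whereas the module described above falls back to imp on old versions (imp was removed in Python 3.12).

```python
import pkgutil
import sys
from importlib.machinery import PathFinder


def find_module_file(name):
    """Locate a top-level module on sys.path without executing it."""
    spec = PathFinder.find_spec(name)
    return spec.origin if spec is not None else None


def complete_import(prefix):
    """Top-level module names that could complete `import <prefix>`."""
    names = {info.name for info in pkgutil.iter_modules()}
    names.update(sys.builtin_module_names)  # modules compiled into the interpreter
    return sorted(n for n in names if n.startswith(prefix))


print(find_module_file("json"))                  # path of the json package's __init__.py
print("datetime" in complete_import("datetim"))  # True
```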

Caching & Recursions

Caching (cache.py)

This caching is very important for speed and memory optimizations. There’s nothing really spectacular, just some decorators. The following cache types are available:

  • module caching (load_parser and save_parser), which uses pickle and is really important to ensure low load times of modules like numpy.
  • time_cache can be used to cache something for just a limited time span, which can be useful if there’s user interaction and the user cannot react faster than a certain time.
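Here is a minimal version of the time_cache idea, assuming a per-argument cache with time.monotonic() as the clock. This decorator is a sketch, not the one in cache.py. The pickle-based module cache is not shown.

```python
import functools
import time


def time_cache(seconds):
    """Hypothetical decorator: reuse a result for `seconds` before recomputing it."""
    def decorator(func):
        cache = {}  # maps the call arguments to (timestamp, result)

        @functools.wraps(func)
        def wrapper(*args):
            now = time.monotonic()
            if args in cache:
                timestamp, value = cache[args]
                if now - timestamp < seconds:
                    return value  # still fresh: skip the expensive call
            value = func(*args)
            cache[args] = (now, value)
            return value
        return wrapper
    return decorator


calls = []


@time_cache(seconds=5)
def slow_lookup(name):
    calls.append(name)  # record every real (uncached) call
    return name.upper()


slow_lookup("os")
slow_lookup("os")
print(calls)  # ['os']: the second call within 5 seconds hit the cache
```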

This module is one of the reasons why Jedi is not thread-safe: it keeps the cache information in global variables. Some of these variables are cleaned after every API usage.

Recursions (recursion.py)

Recursion is Jedi’s recipe for conquering Python code. However, something must stop recursions from going mad. Some settings are here to make Jedi stop at the right time. You can read more about them here.

Besides jedi.evaluate.cache, this module also makes Jedi not thread-safe. Why? ExecutionRecursionDecorator uses class variables to count the function calls.
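A sketch makes it easy to see why class variables are unsafe here. RecursionLimit is a hypothetical stand-in for ExecutionRecursionDecorator. It counts the nesting depth in a class attribute shared by every call, so two threads evaluating at the same time would change the same counter.

```python
import functools


class RecursionLimit:
    """Hypothetical decorator: stop evaluating once the nesting depth gets too deep."""

    depth = 0      # class variable shared by every call: the reason this is not thread-safe
    max_depth = 3

    def __init__(self, func):
        self.func = func
        functools.update_wrapper(self, func)

    def __call__(self, *args):
        if RecursionLimit.depth >= RecursionLimit.max_depth:
            return []  # give up with "no results" instead of recursing further
        RecursionLimit.depth += 1
        try:
            return self.func(*args)
        finally:
            RecursionLimit.depth -= 1


@RecursionLimit
def evaluate(n):
    """Hypothetical evaluation that would recurse forever without the limit."""
    return [n] + evaluate(n + 1)


print(evaluate(0))           # [0, 1, 2]
print(RecursionLimit.depth)  # 0: the counter is reset after every call
```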

Helper Modules

Most other modules are not really central to how Jedi works. They all contain relevant code, but if you understand the modules above, you pretty much understand Jedi.

Python 2/3 compatibility (_compatibility.py)

This module was created to ensure compatibility with Python 2.6 through 3.3. Clearly there is a huge need to use syntax that works on all of these versions.
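The doctest near the top of this page imports u from jedi._compatibility. A helper of that kind might look like the following sketch. It is a reconstruction for illustration, assuming the goal is to return a text string on both Python 2 and Python 3, and it is not a copy of Jedi’s code.

```python
import sys

is_py3 = sys.version_info[0] >= 3


def u(string):
    """Return a text (unicode) string on both Python 2 and Python 3."""
    if is_py3:
        return str(string)
    # Only reached on Python 2, where the builtin name `unicode` exists.
    if not isinstance(string, unicode):  # noqa: F821
        return unicode(string, "UTF-8")  # noqa: F821
    return string


print(u("import os"))  # import os
```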