A demo plugin based Python code analyser

A few weeks back I wrote a analyser for Apache Thrift IDL in Python. We used it to enforce some code review guidelines. When we hooked it onto arcanist lint engine, we could give feedback to developers at the time they were proposing a code change. The thrift parsing was done using ptsd. The analyser was written as a single file which meant adding new rules meant changing the engine itself. I wanted to implement a plugin based architecture for it. However, I didn’t get around to do that because of other reasons.

Around the same time, I saw Nick Coghlan suggest straight.plugin to someone else. So finally, I got around to sit with it and the result is this post and a plugin based Python code analyser with an accompanying git repository.

The final result is a program powered by two plugins which will parse a module and print any rule violations that has been configured by the plugins:

2: Class hello: Class name not in CapWords
7: Class Nodocstring: No docstring found in class
10: Class Alongdocstring: Docstring is greater than 100 characters

Our analyser has two parts - the core engine and the plugins which can do various things with the code being analysed. For the demo analyser, we will be focussed on Python classes. We will ignore everything else. And as far as the plugins are concerned, they check if a certain condition or conditions are met by the class - in other words these are checkers.

Please note that I am using Python 3.6 for everything.

Core analyser engine

The core analyser engine uses the ast module to create an AST node. Checkout this PyCon 2018 talk - The AST and Me if you want to learn more.

Basically, we call the parse() function and we get back an AST Node object which we can then use to traverse through the various nodes using the walk function. Here’s the code for the engine at the time of writing:

# analyser/main.py
...
def analyse(file_path):
    with open(file_path) as f:
        root = ast.parse(f.read())
        for node in ast.walk(root):
            if isinstance(node, ast.ClassDef):
                check_class(node)
...

As we walk through the tree, we check if a node is a Python class via isinstance(node, ast.ClassDef). If it is, we call this function check_class which then invokes all the checks the analyser knows of. So, we can write the check_class function such that we have all the rules hard-coded in there or have a way to externally load the check rules. Externally loading the rules without having to change the core engine is where straight.plugin comes in.

This how the check_class function looks like at the time of writing:

# analyser/main.py
def check_class(node):
    # The call() function yields a function object
    # http://straightplugin.readthedocs.io/en/latest/api.html#straight.plugin.manager.PluginManager.call
    # Not sure why..
    for p in plugins.call('check_violation', node):
        if p:
            p()

plugins is a basically a plugin registry - straight.plugin calls it as PluginManager. The call method returns function objects corresponding to the function you specified. Here, I have specified check_violation which expects an argument to be passed, node. If it finds an valid object - i.e. it found the specified method, we call it.

How do we create the plugin registry? We use the load function:

from straight.plugin import load
plugins = load('analyser.extensions', subclasses=BaseClassCheck)
..

The load function is called with two parameters:

BaseClassCheck is implemented as follows:

# analyser/bases.py
class BaseClassCheck():

    @classmethod
    def check_violation(cls, node):
        raise NotImplementedError('Method not implemented')
    
    @classmethod
    def report_violation(cls, node, msg):
        print('{0}: Class {1}: {2}'.format(node.lineno, node.name, msg))

Any plugins for this engine is thus expected to subclass BaseClassCheck and implement the check_violation method.

The setup.py for the core engine looks as follows:

from setuptools import setup


setup(
    name='analyser',
    version='1.0',
    description='',
    long_description='',
    author='Amit Saha',
    author_email='a@a.com',
    install_requires=['straight.plugin'],
    packages=['analyser'],
    zip_safe=False,
)

Writing plugins

Our core engine is done, how do we write plugins? I was faced with this new thing called namespace packages. Looking at the docs, it made complete sense. Basically, you want your plugins to be able to shipped as different Python packages written by different people.

So, let’s do that now. There are two example plugins in the example_plugins sub-direcotry. Each is a Python package and has a directory structure as follows:

.
├── py_analyser_class_capwords
│   ├── analyser
│   │   └── extensions
│   │       └── capwords
│   │           └── __init__.py
│   └── setup.py

The only difference between the two is the final package name capwords for the above and docstring for the other. The key point above is the directory structure, analyser/extensions/capwords. The other plugin will have the directory structure analyser/extensions/docstring. This is what makes them both belong to the analyser.extensions namespace and hence discoverable by straight.plugin. The setup.py for the above plugin looks as follows:

from setuptools import setup


setup(
    name='analyser-class-capwords',
    version='1.0',
    description='',
    long_description='',
    author='Amit Saha',
    author_email='a@a.com',
    install_requires=['analyser'],
    packages=['analyser.extensions.capwords'],
    zip_safe=False,
)

In a practical scenario, we will have these packages elsewhere and will just pip install them and the effect will be the same.

Trying it all out

These are the things we will need to do:

But, that’s all very boring and I found the tox that I love - nox. So, there is a nox.py file, so if you install nox, you can just run nox from the root of the respository:

$ nox 
...
nox > python analyser/main.py ./module_under_test.py
2: Class hello: Class name not in CapWords
7: Class Nodocstring: No docstring found in class
10: Class Alongdocstring: Docstring is greater than 100 characters
nox > Session human_testing(python_version='3.6') was successful.
...

The last three lines of the output is the result of running the checks implemented by the plugins.

The nox.py file looks as follows:

import nox

@nox.session
@nox.parametrize('python_version', ['3.6'])
def human_testing(session, python_version):
    session.interpreter = 'python' + python_version
    session.run('pip', 'install', '.')
    session.run('pip', 'install', './example_plugins/py_analyser_class_capwords/')
    session.run('pip', 'install', './example_plugins/py_analyser_class_docstring/')
    session.run('python', 'analyser/main.py', './module_under_test.py')
 

Other learnings

Besides all the above things that I learned, I also learned something about the issubclass function. I was wondering why, the below comparisions was returning False:

issubclass(<class 'analyser.main.BaseClassCheck'>, <class '__main__.BaseClassCheck'>)
issubclass(<class 'analyser.extensions.capwords.CheckCapwords'> <class '__main__.BaseClassCheck'>)

And so basically, I moved BaseClassCheck from analyser/main.py to analyser/bases.py which meant the namespace of BaseClassCheck was always going to be the same.

Summary

We saw how we can use straight.plugin to implement a plug-in architecture in our programs. We also saw how we can use the ast module to parse Python source code and analyse them and finally we learned about nox.

The git repository has all the code we discussed in this post.