As with any substantial application package, the ASPIRE project needed a convenient way to specify configuration settings pertaining to different parts of the computational pipeline.
What follows is an outline of our attempts to tackle this configuration issue. Where a supplementary (and hopefully useful) nugget is provided, or a caveat discussed, I append a numbered note, like so: (n)
A brief background of ASPIRE
ASPIRE is a Python (3.6) package under development, which ingests Micrographs, the output of Cryo-Electron Microscopy (images that closely resemble television static), and comes up with a 3D reconstruction of the molecule. Read the excellent writeup on the ASPIRE page for a more comprehensive review of the package.
One of the early stages of processing in ASPIRE is APPLE Picking, a process that identifies regions in a micrograph which are very likely to be particles. See the APPLE Picker paper for full details of that algorithm.
Fig. 1: (a) A Micrograph; (b) Identified Particles
One of the key operations performed by APPLE-Picker is binary erosion of segments of the input image. This is an operation that takes in segments likely to contain particles (white) and a kernel, which is essentially a pixel template that determines the general shape/size of a candidate particle, and erodes away segments that do not conform to this shape.
For example, here is an instance of a segmented image, a kernel, and the corresponding eroded image.
Fig. 2: (a) Segmented areas of micrograph; (b) Kernel used for binary erosion; (c) Eroded segments
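To make the erosion step concrete, here is a minimal plain-Python sketch of binary erosion on a tiny image. It is only an illustration of the operation, not the APPLE-Picker implementation; the function name binary_erode and the toy data are made up for this example, and pixels outside the image are assumed to be 0.

```python
def binary_erode(image, kernel):
    """Erode a binary image (list of rows of 0/1) with a binary kernel.

    An output pixel is 1 only if every 1-pixel of the kernel, centred on
    that pixel, lands on a 1-pixel of the image. Pixels outside the image
    are treated as 0 (an assumption of this sketch).
    """
    rows, cols = len(image), len(image[0])
    k_rows, k_cols = len(kernel), len(kernel[0])
    r_off, c_off = k_rows // 2, k_cols // 2

    def fits(r, c):
        for i in range(k_rows):
            for j in range(k_cols):
                if not kernel[i][j]:
                    continue
                rr, cc = r + i - r_off, c + j - c_off
                inside = 0 <= rr < rows and 0 <= cc < cols
                if not inside or not image[rr][cc]:
                    return False
        return True

    return [[int(fits(r, c)) for c in range(cols)] for r in range(rows)]


# A 3x3 blob (large enough to hold the kernel) and one isolated pixel.
segments = [
    [1, 1, 1, 0, 0, 0],
    [1, 1, 1, 0, 1, 0],
    [1, 1, 1, 0, 0, 0],
]
kernel = [[1, 1, 1] for _ in range(3)]  # 3x3 square template

eroded = binary_erode(segments, kernel)
for row in eroded:
    print(row)
```

The blob that can accommodate the kernel survives (shrunk to its centre pixel), while the isolated pixel, which is too small to match the template, is eroded away.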
The size of the kernel guides the size of suspected artifacts that will be discarded, and gets passed on as the min_size and max_size arguments of the Picker class.
class Picker:
    def __init__(self, max_size=156, min_size=19, ...):
From the point of view of the Picker class (which is only ever used internally by the more public-facing Apple class), it is absolutely okay (and indeed desirable) to have sensible defaults like the ones above. The class can then be unit-tested in isolation with minimal fuss and without a lot of setup.
However, we would like to enable the users of ASPIRE to configure this min_size value, among others, in a declarative way, without anyone ever having to touch code. (1)
To start then, at the conclusion…
What do we want our final code to look like?
We would like to support code like the following, where a single import gives us access to a config object, from which we read whichever attributes are relevant to our situation:
from aspire import config

class Apple:
    def __init__(self, output_dir=None):
        self.min_size = config.apple.min_particle_size
        ...
        optimization_fn(atol=config.apple.tol.abs)
        ...
        # Keyword arguments, since Picker declares max_size before min_size
        self.picker = Picker(max_size=self.max_size, min_size=self.min_size, ...)
Notice in the fragment above that we would like to support arbitrarily-namespaced attributes (config.apple.min_particle_size or config.apple.tol.abs), namespaces that we can (if the need arises) maintain independently.
The application will drive itself by these package-level settings (the config). However, we would also like to be able to override these settings (in scripts or notebooks, for example). If we supported an apple.picker.gpu_enabled (boolean) configuration directive (default False), we might want to do:
from aspire import config
...
# run code with GPU disabled
apple.pick()
...
with config({'apple.picker.gpu_enabled': True}):
    # run code with GPU enabled
    apple.pick()
Notice above that we specify a dictionary as the override argument. We could instead decide to support keyword arguments, something like:
with config(gpu_enabled=True, precision='single'):
    ...
but that syntax would not allow us to override apple.picker.gpu_enabled, since a keyword name cannot contain the . operator:
with config(apple.picker.gpu_enabled=True)
^
SyntaxError: keyword can't be an expression
How do we get there?
Arbitrarily-nested structures should immediately bring JSON to mind. We could drive things off a config.json that looks like this:
...
"apple": {
    "max_particle_size": 156,
    "min_particle_size": 19,
    "minimum_overlap_amount": 7,
    "tol": {
        "abs": 1e-5,
        "rel": 1e-2
    }
    ...
}
...
This config.json resides right inside the package directory.
.
└── src
    ├── aspire
    │   ├── __init__.py
    │   └── config.json
The aspire/__init__.py reads in this config.json and initializes a module-level config object (of class Config): (2)
from importlib_resources import read_text
import aspire
from aspire.utils.config import Config
config = Config(read_text(aspire, 'config.json'))
The importlib_resources library leverages the powerful import mechanism already available in Python to load any resource (a file located in any importable package); in our case, the resource is config.json.
An aside on importlib_resources
In a scenario where we’re developing a distributable (and pip-installable) package, we can no longer assume that we will be able to locate the config.json file directly using facilities from the os.path module. Our whole package might be zipped up, or be available through a hitherto-uninvented package-loading mechanism. (3)
We thus rely on this excellent library to get a handle on the JSON string, using the read_text function in importlib_resources. The fact that importlib_resources is part of the standard library in Python 3.7 and above (as importlib.resources) should give us confidence in its longevity.
The read_text(aspire, 'config.json') line looks for config.json directly inside the aspire package, wherever that package happens to be found, which is exactly where the file lives in our setup.
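For readers on Python 3.9 or later, the standard library also offers importlib.resources.files for the same job. The sketch below is a self-contained illustration of that call, not ASPIRE code: it builds a throwaway package on disk (demo_pkg is a hypothetical stand-in for aspire) and reads a bundled config.json relative to the package rather than via os.path.

```python
import importlib
import json
import sys
import tempfile
from importlib.resources import files  # standard library, Python 3.9+
from pathlib import Path

# Build a throwaway package so the example runs anywhere.
# 'demo_pkg' is a hypothetical stand-in for the real 'aspire' package.
root = Path(tempfile.mkdtemp())
pkg_dir = root / "demo_pkg"
pkg_dir.mkdir()
(pkg_dir / "__init__.py").write_text("")
(pkg_dir / "config.json").write_text('{"apple": {"min_particle_size": 19}}')

# Make the new package importable.
sys.path.insert(0, str(root))
importlib.invalidate_caches()
demo_pkg = importlib.import_module("demo_pkg")

# Locate the resource through the import system, not through os.path.
json_string = files(demo_pkg).joinpath("config.json").read_text()
settings = json.loads(json_string)
print(settings["apple"]["min_particle_size"])
```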
importlib_resources locates our resources, but does not package them. Make sure that config.json is included in your distributable package by adding the appropriate package_data directive in setup.py. Notice also the install_requires entry needed to make things work:
setup(
    name='aspire',
    version='0.3.0',
    ...
    install_requires=[
        'importlib_resources>=1.0.2'
    ],
    package_data={'aspire': ['config.json']}
)
The Config class
The key to providing arbitrarily-nested attributes in our Config class is to utilize the types.SimpleNamespace class (available since Python 3.3). As the name suggests, this is a simple class that provides attribute access, with the added ability to initialize attributes while constructing the object:
from types import SimpleNamespace
o = SimpleNamespace(x=1, y=2)
print(o.x)
Another lesser-known feature that we use is the object_hook argument of the json.loads function. Here is what the docstring says about it:
``object_hook`` is an optional function that will be called with the
result of any object literal decode (a ``dict``). The return value of
``object_hook`` will be used instead of the ``dict``.
With SimpleNamespace and object_hook, we have the tools to do what we want. The aspire.utils.config.Config class looks like the following:
from types import SimpleNamespace
import json
...
class Config:
    def __init__(self, json_string):
        self.namespace = json.loads(
            json_string,
            object_hook=lambda d: SimpleNamespace(**d)
        )
    ...
The object_hook is triggered any time a dict is decoded during JSON parsing (either the top-level dict or an inner dict). We simply supply an object_hook that returns a SimpleNamespace instead of a dict, unpacking the dict into keyword arguments using the ** operator.
The namespace attribute of Config is now a SimpleNamespace (which in turn could contain other types, including other SimpleNamespace objects!)
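A small standalone experiment shows the hook at work. The JSON string and the to_namespace helper below are made up for illustration; the helper also records the keys of each dict it receives, so we can see that inner objects are converted before the objects that contain them.

```python
import json
from types import SimpleNamespace

json_string = (
    '{"apple": {"min_particle_size": 19, '
    '"tol": {"abs": 1e-5, "rel": 1e-2}}}'
)

calls = []  # keys of every dict handed to the hook, in call order


def to_namespace(d):
    # Called once per decoded JSON object, innermost objects first.
    calls.append(sorted(d))
    return SimpleNamespace(**d)


ns = json.loads(json_string, object_hook=to_namespace)

print(ns.apple.min_particle_size)
print(ns.apple.tol.abs)
print(calls)
```

Because the innermost dict is converted first, by the time the hook sees the "apple" dict, its "tol" value is already a SimpleNamespace, which is what makes the chained attribute access work.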
One small tweak to the class (relaying any attribute lookup that the Config object itself cannot satisfy to the namespace attribute) and we’re ready to go:
class Config:
    def __init__(self, json_string):
        self.namespace = json.loads(
            json_string,
            object_hook=lambda d: SimpleNamespace(**d)
        )
    ...
    def __getattr__(self, item):
        return getattr(self.namespace, item)
Overriding configuration values
To support the overriding mechanism we discussed earlier:
with config({'apple.picker.gpu_enabled': True}):
    apple.pick()
we return a context manager from the __call__ method of the Config class:
def __call__(self, override_dict):
    return self.ConfigContext(self, override_dict)
This ConfigContext class can be implemented as an inner class, because nothing other than the Config class needs to know about it:
from copy import deepcopy
...
class Config:
    class ConfigContext:
        def __init__(self, config, d):
            self._original_namespace = deepcopy(config.namespace)
            self.config = config
            for k, v in d.items():
                # TODO: Set attributes in config.namespace, but recursively.
                pass
        def __enter__(self):
            return self.config
        def __exit__(self, exc_type, exc_val, exc_tb):
            self.config.namespace = self._original_namespace
The ConfigContext class gets a handle on the Config object it was created from, stores a deepcopy of the original namespace attribute, modifies the live namespace (the TODO, which we haven’t yet implemented), and re-attaches the deep-copied original on __exit__.
A recursive setattr
What we now need as the final piece is a recursive version of setattr: something that can take {'apple.picker.gpu_enabled': True} and set config.apple.picker.gpu_enabled to True. We could roll our own implementation, but a very compact one is available on StackOverflow:
import functools

# https://stackoverflow.com/questions/31174295/getattr-and-setattr-on-nested-subobjects-chained-properties
def rsetattr(obj, attr, val):
    pre, _, post = attr.rpartition('.')
    return setattr(rgetattr(obj, pre) if pre else obj, post, val)

def rgetattr(obj, attr, *args):
    def _getattr(obj, attr):
        return getattr(obj, attr, *args)
    return functools.reduce(_getattr, [obj] + attr.split('.'))
These functions are a bit tricky to understand. Staring at the above code, we realize that the magic is happening in the line:
    setattr(rgetattr(obj, pre) if pre else obj, post, val)
Spelled out: split the multi-dotted attribute name at its last dot into pre (everything to the left) and post (the final name). If pre is non-empty, use rgetattr to fetch the nested object it refers to and set post on that object; otherwise set the attribute on obj as one would normally do. This has the effect of walking the path from the outside in and setting only the innermost attribute.
rgetattr delegates its heavy lifting to functools.reduce, which applies getattr once per path component and feeds each result into the next call; it is essentially a recursive approach to getattr, written as a fold.
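Putting the pieces together, a minimal end-to-end sketch might look like the following. It follows the structure described above but is an illustration rather than the ASPIRE source: rgetattr is simplified (no default-value support), the TODO is filled in with rsetattr, and the JSON string is invented for the example.

```python
import functools
import json
from copy import deepcopy
from types import SimpleNamespace


def rgetattr(obj, attr):
    # Fold getattr over the dotted path: ((obj.a).b).c
    return functools.reduce(getattr, attr.split('.'), obj)


def rsetattr(obj, attr, val):
    # Resolve everything left of the last dot, then set the final name.
    pre, _, post = attr.rpartition('.')
    setattr(rgetattr(obj, pre) if pre else obj, post, val)


class Config:
    class ConfigContext:
        def __init__(self, config, overrides):
            self.config = config
            # Keep a pristine copy to restore on exit.
            self._original_namespace = deepcopy(config.namespace)
            for dotted_name, value in overrides.items():
                rsetattr(config.namespace, dotted_name, value)

        def __enter__(self):
            return self.config

        def __exit__(self, exc_type, exc_val, exc_tb):
            self.config.namespace = self._original_namespace

    def __init__(self, json_string):
        self.namespace = json.loads(
            json_string,
            object_hook=lambda d: SimpleNamespace(**d)
        )

    def __getattr__(self, item):
        # Only reached when normal lookup fails, so 'namespace' itself
        # is found directly on the instance.
        return getattr(self.namespace, item)

    def __call__(self, override_dict):
        return self.ConfigContext(self, override_dict)


config = Config('{"apple": {"picker": {"gpu_enabled": false}}}')

before = config.apple.picker.gpu_enabled
with config({'apple.picker.gpu_enabled': True}):
    inside = config.apple.picker.gpu_enabled
after = config.apple.picker.gpu_enabled

print(before, inside, after)
```

Note that, as in the fragments above, the overrides take effect when config(...) is called rather than in __enter__, and the original values come back on __exit__ even if the body of the with block raises.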
Bonus features
1. Configure logging in config.json
Given that the purpose of config.json is to configure our package, why not configure logging using it too?
We add a new logging key to our config.json, something like:
{
    "logging": {
        "version": 1,
        "formatters": {
            "simple_formatter": {
                "format": "%(asctime)s %(message)s",
                "datefmt": "%Y/%m/%d %H:%M:%S"
            }
        },
        "handlers": {
            "console": {
                "class": "logging.StreamHandler",
                "formatter": "simple_formatter",
                "level": "DEBUG",
                "stream": "ext://sys.stdout"
            }
        },
        "loggers": {
            "aspire": {
                "level": "DEBUG",
                "handlers": ["console"]
            }
        }
    },
    ...
}
The Python logging module is capable of being configured using a dict, but not directly using a SimpleNamespace. No matter: we handle that as a special case in our Config class:
# Import the submodule explicitly: a bare "import logging" does not
# guarantee that logging.config is loaded
import logging.config
...
class Config:
    ...
    def __init__(self, json_string):
        d = json.loads(json_string)
        # The logging module supports configuration from a dictionary
        # using dictConfig, but not a SimpleNamespace,
        # so take care of that first
        if 'logging' in d:
            logging.config.dictConfig(d['logging'])
        else:
            logging.basicConfig(level=logging.INFO)
        # Now that logging is configured, reload the json, but now
        # with an object hook so we have cleaner access to keys
        # by way of (recursive) attributes
        self.namespace = json.loads(
            json_string,
            object_hook=lambda d: SimpleNamespace(**d)
        )
    ...
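If parsing the JSON string twice feels wasteful, one alternative (a sketch of a different design, not what ASPIRE does) is to parse once into plain dicts, hand the logging section to dictConfig, and only then convert the parsed structure into namespaces. The to_namespace helper and the trimmed-down JSON below are assumptions made for this example.

```python
import json
import logging
import logging.config
from types import SimpleNamespace

json_string = """
{
  "logging": {
    "version": 1,
    "disable_existing_loggers": false,
    "handlers": {
      "console": {
        "class": "logging.StreamHandler",
        "level": "DEBUG",
        "stream": "ext://sys.stdout"
      }
    },
    "loggers": {
      "aspire": {"level": "DEBUG", "handlers": ["console"]}
    }
  },
  "apple": {"min_particle_size": 19}
}
"""


def to_namespace(value):
    """Recursively turn dicts (including dicts inside lists) into namespaces."""
    if isinstance(value, dict):
        return SimpleNamespace(
            **{key: to_namespace(item) for key, item in value.items()}
        )
    if isinstance(value, list):
        return [to_namespace(item) for item in value]
    return value


d = json.loads(json_string)               # parse once, as plain dicts
logging.config.dictConfig(d["logging"])   # dictConfig needs a real dict
namespace = to_namespace(d)               # then convert for attribute access

logger = logging.getLogger("aspire")
logger.debug("min_particle_size=%s", namespace.apple.min_particle_size)
```

The trade-off is a few extra lines for the recursive helper in exchange for a single pass over the JSON text; for a small config file either approach is perfectly reasonable.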
2. Making config values available as arguments in scripts
It would be capital to allow our config.* values to be overridable in top-level scripts as well, so we could do:
python -m aspire apple experiment009.star --config.apple.min_particle_size 33
This is absolutely possible, and something that we have previously supported in ASPIRE. The key is to subclass argparse.ArgumentParser and add a new argument group in its constructor:
from argparse import ArgumentParser

class ConfigArgumentParser(ArgumentParser):
    def __init__(self, *args, **kwargs):
        ...
        self.add_argument_group('config')
        ...
However, the details can get somewhat tedious, so I’ll save that for another post.
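In the meantime, here is a rough sketch of the general idea, under stated assumptions: it is not ASPIRE's implementation, and the config_options parameter and the parse_overrides method are hypothetical names invented for this example. It registers one --config.* flag per known key and collects the supplied values into a dictionary of the shape that with config(...) expects.

```python
from argparse import ArgumentParser

CONFIG_PREFIX = 'config.'


class ConfigArgumentParser(ArgumentParser):
    """Hypothetical sketch: expose selected config keys as --config.* flags."""

    def __init__(self, *args, config_options=None, **kwargs):
        super().__init__(*args, **kwargs)
        group = self.add_argument_group('config')
        # config_options maps dotted names to a type, for example
        # {'apple.min_particle_size': int}. This parameter is an assumption.
        for dotted_name, value_type in (config_options or {}).items():
            group.add_argument(
                '--' + CONFIG_PREFIX + dotted_name,
                type=value_type,
                default=None,
            )

    def parse_overrides(self, argv=None):
        """Return (args, overrides); overrides only holds supplied flags."""
        args = self.parse_args(argv)
        overrides = {
            dest[len(CONFIG_PREFIX):]: value
            for dest, value in vars(args).items()
            if dest.startswith(CONFIG_PREFIX) and value is not None
        }
        return args, overrides


parser = ConfigArgumentParser(
    prog='aspire',
    config_options={'apple.min_particle_size': int},
)
parser.add_argument('starfile')

args, overrides = parser.parse_overrides(
    ['experiment009.star', '--config.apple.min_particle_size', '33']
)
print(overrides)
```

The resulting overrides dictionary could then be passed straight to with config(overrides): around the script's main body.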
1: We use git describe inside a somewhat elaborate get_full_version() function when generating error logs, which, in addition to providing the environment within which the error occurred, alerts us if the user has been messing with the code. But that’s a whole other discussion.
2: Some Python cognoscenti might arch their eyebrows at anything (beyond setting a __version__) being done in __init__.py, but an agreement on etiquette is something different from a taboo, and we feel this minimal setup is justified in our case.
3: There are ways to get around this; see the zip_safe flag in setuptools.