Extending the Pipeline – Writing PipelineElements
General notes
Every PipelineElement has to follow the rules
described here.
A PipelineElement has these attributes:
- out, a defaultdict(list) (if you access a key that is not yet in use, you get an empty list to use)
- inputs, a Types
- outputs, a Types

and this method:
- _run, which takes arbitrary keyword arguments and performs the actual action.
out is used to offer the output of this
PipelineElement to the rest of the Pipeline. You
can choose to ignore its defaultdict(list) nature and replace it with a
dict or with an object that behaves like a dict.
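The defaultdict(list) behaviour of out can be seen with the standard library alone. The key name times is only an illustration, not a name PenchY prescribes.

```python
from collections import defaultdict

# A fresh `out` as described above: a missing key yields an empty list.
out = defaultdict(list)
out['times'].append(42)   # no KeyError: 'times' is created on first access
out['times'].append(43)
print(dict(out))          # {'times': [42, 43]}

# Replacing it with a plain dict is allowed, but then every key
# has to be set explicitly before appending to it.
plain_out = {'times': [42, 43]}
print(plain_out == dict(out))  # True
```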
Everything you offer via out is part of the element's interface and has to be
described by outputs.
outputs can be set at class level, and thus apply to all instances of the
PipelineElement you define, or be overridden (or even
created) in each instance (typically in __init__ or _run, see below).
If you only offer the output out, which is a list of lists
of int, you would use this:
class YourElement(PipelineElement):
    ...
    outputs = Types(('out', list, list, int))
    ...
You can nest this type description arbitrarily deep, but this only makes sense
for collection types (such as lists or tuples).
For dict, you describe the types of the values; keys are not checked.
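To illustrate how a nested description such as ('out', list, list, int) can be read, here is a small checker. matches is a hypothetical helper written for this page, not part of PenchY, and PenchY's own Types implementation may check things differently.

```python
def matches(value, *types):
    """Hypothetical check of a value against a nested type description.

    The first type is checked on the value itself, the remaining types on
    its elements (for dicts: on its values, keys are ignored).
    """
    if not types:
        return True
    head, rest = types[0], types[1:]
    if not isinstance(value, head):
        return False
    if not rest:
        return True
    # For dicts only the values are described; other collections are iterated.
    items = value.values() if isinstance(value, dict) else value
    return all(matches(item, *rest) for item in items)


print(matches([[1, 2], [3]], list, list, int))  # True
print(matches([[1, 'x']], list, list, int))     # False: 'x' is not an int
print(matches({'a': 1, 2: 3}, dict, int))       # True: keys are not checked
```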
outputs is used to create the flow from a Source to a Sink if you don't
explicitly map the outputs to the inputs (that is, if you use the plain Source >> Sink
syntax).
In this case, outputs has to be set no later than the end of _run.
The types in outputs are only relevant at check time, when PenchY
statically examines whether Sinks and Sources fit together.
This means that if the outputs are only known after execution, you need not
bother setting the exact types:
a Types(('first', object), ('second', object), ...) is sufficient and works
just as well.
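The check-time comparison can be pictured roughly as follows. This is a simplified sketch, not PenchY's algorithm: fits is hypothetical, descriptions are modelled as plain dicts from names to type tuples, and treating (object,) as "unknown until execution, accept" is an assumption based on the paragraph above.

```python
def fits(source_outputs, sink_inputs):
    """Hypothetical check: does a Source offer everything a Sink needs?"""
    for name, wanted in sink_inputs.items():
        if name not in source_outputs:
            return False
        offered = source_outputs[name]
        if offered == (object,):
            continue  # exact type only known after _run: defer to execution
        if offered != wanted:
            return False
    return True


sink_inputs = {'times': (list, int)}
print(fits({'times': (list, int)}, sink_inputs))   # True
print(fits({'times': (object,)}, sink_inputs))     # True: placeholder type
print(fits({'values': (list, int)}, sink_inputs))  # False: name missing
print(fits({'times': (list, str)}, sink_inputs))   # False: types differ
```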
inputs are declared in the same way as outputs.
They are relevant at both check time and execution time.
At execution time, the actual values passed to _run are checked,
not the outputs descriptions of the Sources that passed them.
Because inputs are needed at the beginning of _run, they have to be set when the
element is initialized at the latest, that is, in the __init__ method.
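The execution-time check can be sketched with a hypothetical base class whose run method validates the actual keyword arguments before calling _run. SketchElement and Summer are illustrations only, and for brevity only the outermost type of each input is checked.

```python
class SketchElement:
    """Hypothetical stand-in for PipelineElement, not PenchY's code."""
    inputs = {}

    def run(self, **kwargs):
        # Check the actual values, not what the Source claimed to deliver.
        for name, types in self.inputs.items():
            if name not in kwargs:
                raise ValueError('missing input: %s' % name)
            if not isinstance(kwargs[name], types[0]):  # outermost type only
                raise TypeError('input %s is not a %s' % (name, types[0].__name__))
        return self._run(**kwargs)

    def _run(self, **kwargs):
        raise NotImplementedError


class Summer(SketchElement):
    def __init__(self):
        # inputs must be known before _run starts, so set them here at the latest
        self.inputs = {'values': (list, int)}

    def _run(self, **kwargs):
        return sum(kwargs['values'])


print(Summer().run(values=[1, 2, 3]))  # 6
```

Calling Summer().run(values=(1, 2)) raises a TypeError, because a tuple was passed where a list was declared.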
In addition to the types of inputs and outputs, you should describe
(e.g. in the class docstring) which inputs and outputs exist and
what they mean.
_run performs the execution of the element, and you are free to do what you
want here, with two exceptions:
- the signature is def _run(self, **kwargs) and nothing else (you may, however, rename kwargs)
- you set out to the values that are described in outputs
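Putting the rules together, a complete element might look like the sketch below. StubElement and stub_types are minimal stand-ins so the example runs without PenchY; in a real element you would derive from PenchY's PipelineElement and use its Types instead. Splitter and its input and output names are hypothetical.

```python
from collections import defaultdict


class StubElement:
    """Minimal stand-in for PipelineElement: only provides `out`."""
    def __init__(self):
        self.out = defaultdict(list)


def stub_types(*descriptions):
    """Stand-in for Types: maps each name to its nested type tuple."""
    return {d[0]: d[1:] for d in descriptions}


class Splitter(StubElement):
    """Hypothetical element that splits lines into words.

    Inputs:
    - lines: list of str, the lines to split

    Outputs:
    - words: list of list of str, the words of each line
    """
    inputs = stub_types(('lines', list, str))
    outputs = stub_types(('words', list, list, str))

    def _run(self, **kwargs):
        # The only accepted signature; just the name `kwargs` may change.
        for line in kwargs['lines']:
            self.out['words'].append(line.split())


s = Splitter()
s._run(lines=['a b', 'c'])
print(s.out['words'])  # [['a', 'b'], ['c']]
```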
If you define elements that are only intended for server use and require
additional libraries, you should not import these libraries at the top level but
at the filter level (that is, in __init__ or similar) to minimize the libraries a
client needs.
This is necessary because every client reads the complete job (and therefore the
complete job description language).
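A deferred import could look like this. ServerOnlyStatistics is a hypothetical filter, and the standard library module statistics merely stands in for a heavy third-party dependency so that the sketch runs anywhere.

```python
class ServerOnlyStatistics:
    """Hypothetical server-only filter with a deferred import.

    The module defining this class can be read without the dependency;
    it is only imported when the element is initialized.
    """

    def __init__(self):
        import statistics  # stands in for a heavy, server-only library
        self._statistics = statistics
        self.out = {}

    def _run(self, **kwargs):
        self.out['mean'] = self._statistics.mean(kwargs['values'])


f = ServerOnlyStatistics()
f._run(values=[1, 2, 3])
print(f.out['mean'])  # 2
```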
Workloads
A workload has these attributes (you may want to use properties instead):
- arguments, the arguments to execute the workload
- (optional) information_arguments, the arguments to gather information about the workload (version, etc.)
You don’t have to set out yourself as it will be set by the executing JVM.
The same goes for outputs, because they are set by
Workload and inherited (if you change them, you
have to provide a strict superset).
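As a sketch, a workload could expose its arguments as properties and extend the inherited outputs. StubWorkload stands in for PenchY's Workload, and its output names (stdout, stderr, exit_code) are assumptions, as are the jar name and flags of the hypothetical MyBenchmark.

```python
class StubWorkload:
    """Stand-in for Workload; these output names are assumed, not PenchY's."""
    outputs = {'stdout': (list, str),
               'stderr': (list, str),
               'exit_code': (list, int)}


class MyBenchmark(StubWorkload):
    """Hypothetical workload; jar name and flags are placeholders."""

    def __init__(self, name):
        self.name = name

    @property
    def arguments(self):
        return ['-jar', 'mybenchmark.jar', self.name]

    @property
    def information_arguments(self):
        return ['-jar', 'mybenchmark.jar', '--version']

    # Changing outputs: keep every inherited entry and add new ones,
    # so the result is a strict superset.
    outputs = dict(StubWorkload.outputs, iterations=(list, int))


print(MyBenchmark('fop').arguments)  # ['-jar', 'mybenchmark.jar', 'fop']
```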
Filters
A filter can be either a normal Filter or a
SystemFilter.
The latter is passed an additional argument called environment on
execution, which describes the execution environment of the SystemFilter (see
penchy.jobs.job.Job._build_environment()).
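Inside a SystemFilter, the environment arrives next to the regular inputs in the keyword arguments. The sketch below separates it first. SystemFilterSketch and CountWithEnvironment are hypothetical, and the dict passed as environment is a placeholder: its real contents are defined by Job._build_environment().

```python
class SystemFilterSketch:
    """Hypothetical stand-in for SystemFilter, only provides `out`."""
    def __init__(self):
        self.out = {}


class CountWithEnvironment(SystemFilterSketch):
    def _run(self, **kwargs):
        # Separate the extra argument so that kwargs holds only the inputs.
        environment = kwargs.pop('environment')
        self.out['count'] = len(kwargs['values'])
        self.out['environment'] = environment  # e.g. to locate files later


f = CountWithEnvironment()
f._run(values=[1, 2, 3], environment={'job': 'demo'})  # placeholder environment
print(f.out['count'])  # 3
```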
Tools
Agents
An Agent is a Tool that is invoked via the JVM’s
agent parameters (e.g. -agentlib).
Unlike a workload, it has to take care of its outputs and out itself.
An Agent has to provide these attributes (here, too, you might want to use properties):
- arguments, the arguments to execute the agent, that is, to include it in the JVM
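A hypothetical agent might therefore set up out and outputs itself and build its JVM arguments in a property. The -agentlib:<libname>=<options> form is the JVM's native agent syntax; the library name myagent and its interval option are placeholders.

```python
from collections import defaultdict


class MyAgent:
    """Hypothetical agent; library name and options are placeholders."""

    def __init__(self, interval):
        self.interval = interval
        # Unlike a workload, an agent sets up its own out and outputs.
        self.out = defaultdict(list)
        self.outputs = {'samples': (list, str)}

    @property
    def arguments(self):
        # Included in the JVM command line to load the agent.
        return ['-agentlib:myagent=interval=%d' % self.interval]


print(MyAgent(10).arguments)  # ['-agentlib:myagent=interval=10']
```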
WrappedJVM
A WrappedJVM is a PipelineElement as well as a
JVM.
You have to provide these attributes:
- cmdline, how to invoke the JVM with the wrapping (so that most of the JVM infrastructure can be reused)

and these methods:
- information, which returns information about the JVM (and its configuration)
Even though a WrappedJVM is a PipelineElement, you
must not define a _run method.
Whatever you do, you must behave like a JVM, so be
sure to look at how JVM is implemented.
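The shape of a WrappedJVM can be sketched with a stand-in JVM class. StubJVM models only a path and options and is not PenchY's JVM. TimedJVM is hypothetical: it wraps the JVM in GNU time (/usr/bin/time -v) by prefixing the inherited command line, and it deliberately defines no _run.

```python
class StubJVM:
    """Stand-in for PenchY's JVM: only path and options are modelled."""

    def __init__(self, path, options=''):
        self.path = path
        self.options = options.split()

    @property
    def cmdline(self):
        return [self.path] + self.options


class TimedJVM(StubJVM):
    """Hypothetical WrappedJVM running the JVM under /usr/bin/time."""

    @property
    def cmdline(self):
        # Prefix the plain JVM command line with the wrapper.
        return ['/usr/bin/time', '-v'] + super().cmdline

    def information(self):
        return {'wrapper': '/usr/bin/time -v', 'jvm': self.path}

    # No _run: the element is driven like any other JVM.


print(TimedJVM('java', '-Xmx1g').cmdline)
# ['/usr/bin/time', '-v', 'java', '-Xmx1g']
```

Reusing the parent's cmdline keeps the option handling in one place, which is the point of building on the JVM infrastructure.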