skdh.Pipeline

class skdh.Pipeline(load_kwargs=None, flatten_results=False)

Pipeline of multiple processing steps that are run sequentially, with some of each step's output passed on to later steps. Results from processing steps can be saved as local files and are also returned in a dictionary after processing.

Parameters:
load_kwargs : {None, dict}, optional

Dictionary of keyword arguments that are passed directly to the Pipeline.load() method. If None (default), no pipeline is loaded.

flatten_results : bool, optional

Flatten the results of the processing steps in the return dictionary. By default (False), results of a step will be stored under a key of the step’s class name. If True, all results will be on the same level, and an exception will be raised if keys would be overwritten.
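
The difference between the two result layouts can be illustrated with a minimal sketch. This is not skdh's implementation: `flatten_step_results`, the step name `OtherStep`, and the result keys are hypothetical, and the exception type is an assumption, since the text above only says that an exception is raised.

```python
def flatten_step_results(nested):
    """Merge per-step result dicts into one level, refusing to overwrite keys."""
    flat = {}
    for step_name, step_results in nested.items():
        for key, value in step_results.items():
            if key in flat:
                # Assumption: KeyError is used here only for illustration.
                raise KeyError(
                    f"'{key}' from step '{step_name}' would overwrite an existing key"
                )
            flat[key] = value
    return flat


# Layout comparable to flatten_results=False: one sub-dictionary per step.
nested = {
    "Gait": {"stride_time": [1.1, 1.2]},
    "OtherStep": {"duration": [412]},
}

# Layout comparable to flatten_results=True: all keys on the same level.
flat = flatten_step_results(nested)
```

In the nested layout, two steps may both report a key with the same name without conflict, which is why flattening has to check for collisions.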

Methods

add(process[, name, save_file, plot_file, ...])

Add a processing step to the pipeline.

load([file, yaml_str, process_raise, ...])

Load a previously saved pipeline from a file or YAML string.

run(**kwargs)

Run through the pipeline, sequentially processing steps.

save(file)

Save the pipeline to a file for consistent pipeline generation.

Examples

Load a pipeline saved in a file on instantiation, and raise an error instead of a warning if one of the processes cannot be loaded:

>>> from skdh import Pipeline
>>> pipe = Pipeline(
...     load_kwargs={"file": "example_pipeline.skdh", "process_raise": True})

add(process, name=None, save_file=None, plot_file=None, make_copy=True)

Add a processing step to the pipeline.

Parameters:
process : Process

Process class that forms the step to be run.

name : str, optional

Process name. Used to distinguish multiple instances of the same process if required. Output results will be stored under this name. If None, output results will be stored under the class name, with some mangling when the same process appears more than once in the pipeline.

save_file : {None, str}, optional

Optionally formattable path for the save file. If None (default), the results are not saved anywhere.

plot_file : {None, str}, optional

Optionally formattable path for the plotting output. If None (default), no plot is generated or saved.

make_copy : bool, optional

Create a shallow copy of process to add to the pipeline. This allows a single instance to be used in multiple pipelines while retaining custom save file names and other pipeline-specific attributes. Default is True.
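
The effect of copying can be sketched with the standard library alone. The class `ToyProcess` and the helper `add_step` below are hypothetical stand-ins, not part of skdh; they only illustrate why a shallow copy lets one instance carry different pipeline-specific attributes in different pipelines.

```python
import copy


class ToyProcess:
    """Hypothetical stand-in for a processing step."""

    def __init__(self, threshold):
        self.threshold = threshold
        self.save_file = None  # pipeline-specific attribute, set when added


def add_step(steps, process, save_file=None, make_copy=True):
    """Append a step to a list, optionally working on a shallow copy of it."""
    step = copy.copy(process) if make_copy else process
    step.save_file = save_file
    steps.append(step)


shared = ToyProcess(threshold=0.5)
pipeline_a, pipeline_b = [], []

# The same instance is added to two pipelines with different save files.
add_step(pipeline_a, shared, save_file="a_results.csv")
add_step(pipeline_b, shared, save_file="b_results.csv")
```

With `make_copy=False` in this sketch, both lists would hold the same object, and the second call would overwrite the save file set by the first.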

Notes

Some of the available parameters for results saving and plotting are:

  • date : the current date, expressed as YYYYMMDD.

  • name : the name of the process doing the analysis.

  • file : the name of the input file passed to the pipeline.

Note that if no file is passed to the pipeline, file is an empty string. Even if the first step of the pipeline does not use a file keyword, you can still specify it; it will be ignored for everything except this parameter.

>>> p = Pipeline()
>>> p.add(Gait(), save_file="{file}_gait_results.csv")
>>> p.run(accel=accel, time=time)
No file was passed in, so the resulting output file would be `_gait_results.csv`.

However, if the p.run call is instead:

>>> p.run(accel=accel, time=time, file="example_file.txt")
then the output would be `example_file_gait_results.csv`.
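
The substitution described in these notes behaves like ordinary Python string formatting. The following is a minimal sketch under stated assumptions, not the library's code: `format_save_path` is a hypothetical helper, and dropping the extension of the input file is inferred from the `example_file.txt` example above.

```python
from datetime import date
from pathlib import Path


def format_save_path(template, name, file="", today=None):
    """Fill the {date}, {name} and {file} fields of a path template."""
    today = today or date.today()
    return template.format(
        date=today.strftime("%Y%m%d"),  # YYYYMMDD
        name=name,
        # Assumption: only the file name without its extension is used.
        file=Path(file).stem if file else "",
    )


# No input file: the {file} field becomes an empty string.
no_file = format_save_path("{file}_gait_results.csv", name="Gait")

# With an input file: the file name without its extension is substituted.
with_file = format_save_path(
    "{file}_gait_results.csv", name="Gait", file="example_file.txt"
)

# Date and process name fields, using a fixed date for reproducibility.
dated = format_save_path(
    "{date}_{name}_results.csv", name="Gait", today=date(2021, 5, 18)
)
```

Fields that a template does not use are simply ignored by `str.format`, so a fixed file name such as `gait_results.csv` passes through unchanged.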

Examples

Add Gait and save the results to a fixed file name:

>>> from skdh import Pipeline
>>> from skdh.gait import Gait
>>> p = Pipeline()
>>> p.add(Gait(), save_file="gait_results.csv")

Add a binary file reader without saving its results, followed by gait processing with a variable file name:

>>> from skdh.io import ReadBin
>>> p = Pipeline()
>>> p.add(ReadBin(bases=0, periods=24), save_file=None)
>>> p.add(Gait(), save_file="{date}_{name}_results.csv")

If the date were, for example, May 18, 2021, then the results file would be 20210518_Gait_results.csv.

load(file=None, *, yaml_str=None, process_raise=False, noversion_raise=False, old_raise=False)

Load a previously saved pipeline from a file or YAML string.

Parameters:
file : {str, path-like}

File path to load the pipeline structure from.

yaml_str : str

YAML string of the pipeline. If provided, file is ignored.

process_raise : bool, optional

Raise an error if a process in file or yaml_str cannot be added to the pipeline. Default is False, which issues a warning instead.

noversion_raise : bool

Raise an error if no version is provided in the input data. Default is False, which issues a warning instead.

old_raise : bool

Raise an error if the version used to create the pipeline is old enough to potentially cause compatibility/functionality errors. Default is False, which issues a warning instead.
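
The three `*_raise` flags share one pattern: a problem is reported as a warning by default and as an error when the flag is True. A minimal sketch of that pattern for `process_raise` follows. `KNOWN_STEPS` and `collect_steps` are hypothetical, and the warning and exception types are assumptions rather than what skdh actually emits.

```python
import warnings

# Hypothetical registry of process names that can be loaded.
KNOWN_STEPS = {"Gait", "ReadBin"}


def collect_steps(requested, process_raise=False):
    """Keep the loadable step names; warn or raise for the others."""
    loaded = []
    for name in requested:
        if name not in KNOWN_STEPS:
            message = f"Process '{name}' could not be added to the pipeline"
            if process_raise:
                raise ValueError(message)  # assumed exception type
            warnings.warn(message, UserWarning)  # assumed warning type
            continue
        loaded.append(name)
    return loaded


# Default behaviour: the unknown step is skipped with a warning.
with warnings.catch_warnings(record=True) as caught:
    warnings.simplefilter("always")
    loaded = collect_steps(["ReadBin", "NotAStep", "Gait"])
```

Calling `collect_steps(["NotAStep"], process_raise=True)` in this sketch stops at the first unknown step instead of silently producing a shorter pipeline.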

run(**kwargs)

Run through the pipeline, sequentially processing steps. Inputs must be provided as keyword arguments.

Parameters:
kwargs

Any keyword arguments. These are passed to the first step of the pipeline, so they must contain at least what the first process expects.

Returns:
results : dict

Dictionary of the results of any steps of the pipeline that return results.
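
Sequential processing with data handed from step to step can be sketched as follows. This is a toy model under stated assumptions, not skdh's implementation: the classes `Doubler` and `Summer`, the method name `process`, and the two-part return value are all hypothetical.

```python
class Doubler:
    """Hypothetical step that transforms data but reports no results."""

    def process(self, *, values, **kwargs):
        # Returns (data passed on to later steps, results for the caller).
        return {"values": [2 * v for v in values]}, None


class Summer:
    """Hypothetical step that reports a result from the data it receives."""

    def process(self, *, values, **kwargs):
        return {}, {"total": sum(values)}


def run_steps(steps, **kwargs):
    """Run each step in order, forwarding updated keyword arguments."""
    results = {}
    for step in steps:
        passed_on, step_results = step.process(**kwargs)
        kwargs.update(passed_on)  # later steps see the updated data
        if step_results is not None:
            # Only steps that return results appear in the output.
            results[type(step).__name__] = step_results
    return results


results = run_steps([Doubler(), Summer()], values=[1, 2, 3])
```

Here `Summer` receives the doubled values produced by `Doubler`, and only `Summer` contributes an entry to the returned dictionary, stored under its class name.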

save(file)

Save the pipeline to a file for consistent pipeline generation.

Parameters:
file : {str, path-like}

File path to save the pipeline structure to.