Adding modules#
The goal of scikit-digital-health is to provide one package with a defined structure that allows easy pipeline generation, with multiple stages that may or may not depend on previous stages.
To that end, there are several pre-defined base classes, intended to be sub-classed, that help set up modules meant to interface directly with the pipeline infrastructure.
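For orientation, a finished module is meant to drop into a pipeline as one stage among others. Below is a minimal sketch assuming the standard skdh Pipeline interface and the hypothetical Preprocessing module built in this guide (check the pipeline documentation for the exact call signatures):

import numpy as np

from skdh import Pipeline
from skdh.preprocessing import Preprocessing  # hypothetical module added below

pipeline = Pipeline()
pipeline.add(Preprocessing())

# time in unix seconds, accel in 'g' - the units the pipeline stages expect
results = pipeline.run(time=np.arange(0, 100, 0.01), accel=np.zeros((10000, 3)))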
Add a new module#
Create a new module directory under src/skdh with the desired name. For this example, we will use preprocessing:

cd src/skdh
mkdir preprocessing
touch preprocessing/__init__.py
Create the module class that will be added to the pipeline. This should be in a file with the same name as the class, e.g. preprocessing.py. An example file is below:
# src/skdh/preprocessing/preprocessing.py
import ...

from skdh.base import BaseProcess, handle_process_returns  # import the base process class


class Preprocessing(BaseProcess):
    """
    Class to implement any processing steps

    NOTE that this is the docstring for the class! Placing it up here aids
    with clean autogenerated documentation.

    Parameters
    ----------
    attr1 : {None, int}, optional
        Attribute 1 description
    attr2 : {None, float}, optional
        Attribute 2 description
    ...
    """

    def __init__(self, attr1=None, attr2=None):
        # pass all arguments into super().__init__(), this will
        # autogenerate the __repr__ for the class
        super().__init__(
            attr1=attr1,
            attr2=attr2
        )

        # rest of setup required

    """
    Next, define the "predict" method, which does all the computation for the
    process. The call signature is fairly open, but should have the **kwargs
    option at the end.

    While **kwargs is important for the way the pipelines, etc. work, it
    clutters/can add confusion to the docstring. As such, the first line of
    the docstring should be the actual call signature for predict.

    Note that the * by itself indicates that the following arguments are
    keyword ONLY. For this example, therefore, gyro is an optional argument.
    Time and accel are treated as required arguments, even though they are
    passed by keyword.

    Note that the units for time, accel, and gyro should ALWAYS be unix
    seconds, 'g', and 'deg/s' to match the pipeline implementation, and what
    other processes are expecting.
    """
    @handle_process_returns(results_to_kwargs=False)
    def predict(self, time=None, accel=None, *, gyro=None, **kwargs):
        """
        predict(time, accel, *, gyro=None)

        Preprocess the time, accel, and gyroscope if provided

        Parameters
        ----------
        time : numpy.ndarray
            (N, ) array of unix timestamps, in seconds
        accel : numpy.ndarray
            (N, 3) array of acceleration, in units of 'g'
        gyro : numpy.ndarray, optional
            (N, 3) array of angular velocity, units of 'deg/s'
        ...
        """
        # run preprocessing
        accel = calibrate_accel(accel)

        # if you need something passed in that's not a default argument, but
        # might be useful, retrieve it from kwargs
        opt_item = kwargs.get('opt_item', None)

        """
        NOTE that returns MUST be dictionaries!!!

        Because we set `results_to_kwargs=False` in `@handle_process_returns`
        above, `preproc_dict` values will NOT be included for future steps to
        use in an easy way. If you want them to be included, set
        `results_to_kwargs=True`.

        If you need to return both results, and updates to inputs for future
        stages, you can use `results_to_kwargs=False`, and then set the return
        to `return results, updates` where keys in `results` will not be
        available for later steps, but keys in `updates` (which needs to be a
        dictionary) will be.
        """
        # collect results into a dictionary (illustrative key/value only)
        preproc_dict = {'calibration_done': True}

        return preproc_dict
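If a stage needs to both report results and update inputs for later stages, the pattern described in the comments above might look roughly like this (a minimal sketch; the result key and value are made up for illustration):

from skdh.base import BaseProcess, handle_process_returns


class Preprocessing(BaseProcess):
    ...

    @handle_process_returns(results_to_kwargs=False)
    def predict(self, time=None, accel=None, *, gyro=None, **kwargs):
        # ... processing ...
        # keys in `results` are NOT handed on to later pipeline stages
        results = {'calibration_error': 0.01}  # hypothetical result value
        # keys in `updates` ARE passed on to later stages as keyword arguments
        updates = {'accel': accel}
        return results, updates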
External file functions
If there is too much code to be contained inside the Preprocessing.predict method, there are a few suggested guidelines (a sketch of a possible layout follows the list):

- Generally avoid adding too many other functions to the main file (preprocessing.py).
- Individual functions (especially if they are fairly long) should ideally get their own file, with the file name matching that of the function inside.
- Functions with a common theme can be in one file, with the common name matching that of the file.
- A utility.py file might make sense for any functions that have general utility outside of the specific module (i.e. something from preprocessing/utility.py getting called from gait/gait.py).
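As a hypothetical illustration of that layout (the file names below are made up for this example):

src/skdh/preprocessing/
    __init__.py         # module imports (see the next step)
    preprocessing.py    # the Preprocessing class and its predict method
    calibrate_accel.py  # one long function in its own file
    filters.py          # several related filtering functions
    utility.py          # general-purpose helpers usable from other modules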
These are just suggestions in order to maintain some clarity with multiple functions split over multiple files. If you have a good reason to do something different, just try to keep it as clear as possible.
Make sure everything is set up/imported
Make sure all imports are handled in src/skdh/preprocessing/__init__.py, as well as adding preprocessing to src/skdh/__init__.py.
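As a minimal sketch (assuming the Preprocessing class from the example above), the two files might contain:

# src/skdh/preprocessing/__init__.py
from skdh.preprocessing.preprocessing import Preprocessing

__all__ = ['Preprocessing']

# src/skdh/__init__.py  (added alongside the existing module imports)
from skdh import preprocessing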
Make any additions to setup.py
If you don’t have any data files (any non-Python files that need to be distributed with the package), or any low-level (C, Cython, or Fortran) extensions, everything should be all set for the actual module!
If you have data files, find the def configuration function in setup.py, locate the DATA FILES section, and add any data files that you have:
# setup.py
...
    # DATA FILES
    # ========================
    config.add_data_files(
        ('skdh/gait/model', 'src/skdh/gait/model/final_features.json'),
        ('skdh/gait/model', 'src/skdh/gait/model/lgbm_gait_classifier_no-stairs.lgbm'),
        ('skdh/preprocessing/data', 'src/skdh/preprocessing/data/preprocessing_info.dat'),  # Added this file
    )
    # alternatively add this directory; any files/folders under this directory
    # will be added recursively
    config.add_data_dir('src/skdh/preprocessing/data')
    # ========================

    config.get_version('src/skdh/version.py')

    return config
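At runtime, a data file distributed this way can be loaded relative to the installed package. One common approach (not necessarily what skdh itself uses) is importlib.resources, shown here with the hypothetical preprocessing_info.dat from the example above:

from importlib import resources

# locate the packaged data file (Python 3.9+); the path is relative to the
# installed skdh package, not the source tree
data_file = resources.files('skdh.preprocessing') / 'data' / 'preprocessing_info.dat'
contents = data_file.read_bytes()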
If you have low-level extensions, find the EXTENSIONS section and add as required:
# setup.py
import os
from pathlib import Path
...

def configuration(parent_package='', top_path=None):
    ...
    # EXTENSIONS
    # ========================
    # Fortran code that is NOT being compiled with f2py - it is being
    # built as a fortran library that will be linked into C code
    config.add_library(
        'fcwa_convert',
        sources='src/skdh/read/_extensions/cwa_convert.f95'
    )
    # C code that contains the necessary CPython API calls to allow it to
    # be imported and used in python
    config.add_extension(
        'skdh/read/_extensions/cwa_convert',  # note the path WITHOUT src/
        sources='src/skdh/read/_extensions/cwa_convert.c',  # note the path WITH src/
        libraries=['fcwa_convert']  # link the previously built fortran library
    )
    # standard C code extension that does not use a fortran library.
    # Adding a Fortran extension follows the same syntax
    # (numpy will do the heavy lifting for whatever compilation is required)
    config.add_extension(
        'skdh/read/_extensions/bin_convert',
        sources='src/skdh/read/_extensions/bin_convert.c'
    )

    # dealing with Cython extensions
    if os.environ.get('CYTHONIZE', 'False') == 'True':
        # if the environment variable was set, generate .c files from the
        # cython .pyx files. This is not necessary, as the .c files are
        # distributed with the code, but is available as an option in the
        # off chance that the .c files are not up to date
        from Cython.Build import cythonize  # only import if needed, so Cython is not a hard build requirement

        for pyxf in list(Path('.').rglob('*/features/lib/_cython/*.pyx')):
            # create a c file from the cython file
            cythonize(str(pyxf), compiler_directives={'language_level': 3})

    # Either way, get a list of the cython .c files and add each
    # as an extension to be compiled
    for cf in list(Path('.').rglob('*/features/lib/_cython/*.c')):
        config.add_extension(
            str(Path(*cf.parts[1:]).with_suffix('')),
            sources=[str(cf)]
        )
    # ========================
    ...
    return config
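Once built, an extension added this way is importable like any other submodule; for the cwa_convert example above, code inside the package would use it roughly like this (the consuming file name is hypothetical):

# e.g. inside src/skdh/read/cwa.py
from skdh.read._extensions import cwa_convert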