Adding modules

The goal of scikit-digital-health is to have one package with a defined structure that allows for easy pipeline generation, with multiple stages that may or may not depend on previous stages. To that end, there are several pre-defined base classes, intended to be subclassed, that help with setting up modules that interface directly with the pipeline infrastructure.

Add a new module

  1. Create a new module directory under src/skdh with the desired name. For this example, we will use preprocessing:

    cd src/skdh
    mkdir preprocessing
    touch preprocessing/__init__.py
    
  2. Create the module class that will be added to the pipeline

    • This should be in a file with the same name as the class, e.g. preprocessing.py

    • An example file is below:

    # src/skdh/preprocessing/preprocessing.py
    import ...
    
    from skdh.base import BaseProcess, handle_process_returns  # import the base process class
    
    
    class Preprocessing(BaseProcess):
        """
        Class to implement any processing steps
    
        NOTE that this is the docstring for the class! Placing it up here aids
        with clean autogenerated documentation.
    
        Parameters
        ----------
        attr1 : {None, int}, optional
            Attribute 1 description
        attr2 : {None, float}, optional
            Attribute 2 description
    
        ...
        """
        def __init__(self, attr1=None, attr2=None):
            # pass all arguments into super().__init__(), this will
            # autogenerate the __repr__ for the class
            super().__init__(
                attr1=attr1,
                attr2=attr2
            )
            # rest of setup required
    
        """
        Next, define the "predict" method, which does all the computation for the process.
        The call of this is pretty open, but should have the **kwargs option at the end.
        While the **kwargs is important for the way the pipelines, etc work, it
        clutters/can add confusion to the docstring. As such, the first line of the
        docstring should be the actual call for predict
    
        Note that the * by itself indicates that the following arguments are key-word ONLY.
        For this example, therefore the gyro is an optional argument. Time and accel
        are treated as required arguments, even though they are key-word
    
        Note that the units for time, accel, and gyro should ALWAYS be unix seconds,
        'g', and 'deg/s' to match the pipeline implementation, and what other processes
        are expecting.
        """
        @handle_process_returns(results_to_kwargs=False)
        def predict(self, time=None, accel=None, *, gyro=None, **kwargs):
            """
            predict(time, accel, *, gyro=None)
    
            Preprocess the time, accel, and gyroscope if provided
    
            Parameters
            ----------
            time : numpy.ndarray
                (N, ) array of unix timestamps, in seconds
            accel : numpy.ndarray
                (N, 3) array of acceleration, in units of 'g'
            gyro : numpy.ndarray, optional
                (N, 3) array of angular velocity, units of 'deg/s'
    
            ...
            """
            # run preprocessing
            accel = calibrate_accel(accel)  # e.g. a function imported at the top of the file

            # if you need something passed in that's not a default argument, but
            # might be useful, retrieve it from kwargs
            opt_item = kwargs.get('opt_item', None)
    
            """
            NOTE that returns MUST be dictionaries!!!
    
            Because we set `results_to_kwargs=False` in `@handle_process_returns`
            above, `preproc_dict` values will NOT be included for future steps to
            use in an easy way. If you want them to be included, set `results_to_kwargs=True`
            If you need to return both results, and updates to inputs for future
            stages, you can use `results_to_kwargs=False`, and then set the return to
    
            `return results, updates`
    
            where keys in `results` will not be available for later steps, but
            keys in `updates` (which needs to be a dictionary) will be.
            """
            # package the results to return; the key name is just an example
            preproc_dict = {'accel_processed': accel}
            return preproc_dict
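
    • Once defined, the class can be used on its own, or as a stage in a pipeline. A minimal sketch is below (the dummy data and attribute value are purely illustrative):

    import numpy as np

    from skdh import Pipeline
    from skdh.preprocessing import Preprocessing

    # dummy data for illustration: ~50hz unix timestamps (in seconds) and
    # acceleration in units of 'g'
    time = np.arange(0, 60, 1 / 50)
    accel = np.random.default_rng().normal(0, 0.1, (time.size, 3))

    # standalone use
    res = Preprocessing(attr1=5).predict(time=time, accel=accel)

    # or as a stage in a pipeline
    pipeline = Pipeline()
    pipeline.add(Preprocessing(attr1=5))
    results = pipeline.run(time=time, accel=accel)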
    
  3. External file functions

    • If there is too much code to be contained inside the Preprocessing.predict method, there are a few suggested guidelines:

      • Generally avoid adding too many other functions to the main file (preprocessing.py).

      • Individual functions (especially if they are fairly long) should ideally get their own file, with the name matching that of the function inside.

      • Functions with a common theme can be grouped in one file, with the file named after that common theme.

      • A utility.py file might make sense for any functions that have general utility outside of the specific module (i.e. something from preprocessing/utility.py getting called from gait/gait.py).

    • These are just suggestions, in order to maintain some clarity with multiple functions split over multiple files. If you have a good reason to do something different, just try to keep it as clear as possible. A hypothetical layout following these guidelines is sketched below.
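
    • For example, a module following these guidelines might be laid out as follows (all file names besides __init__.py are hypothetical):

    src/skdh/preprocessing/
    ├── __init__.py        # module imports (see step 4)
    ├── preprocessing.py   # the Preprocessing class
    ├── calibration.py     # a longer standalone function, e.g. calibrate_accel
    └── utility.py         # general helpers that other modules might use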

  4. Make sure everything is set up/imported

    • make sure all imports are handled in src/skdh/preprocessing/__init__.py, as well as adding preprocessing to src/skdh/__init__.py; a sketch of both is below
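
    • A minimal sketch of what this might look like (matching the import style of the existing skdh modules):

    # src/skdh/preprocessing/__init__.py
    from skdh.preprocessing.preprocessing import Preprocessing

    __all__ = ["Preprocessing"]

    # src/skdh/__init__.py
    ...
    from skdh import preprocessing
    from skdh.preprocessing import Preprocessing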

  5. Make any additions to setup.py

    • If you don’t have any data files (any non-Python files that need to be distributed with the package), or any low-level (C, Cython, or Fortran) extensions, everything should be all set for the actual module!

    • If you have data files, find the configuration function in setup.py, locate the DATA FILES section, and add any data files that you have (loading them at runtime is sketched after the snippet):

    # setup.py
    ...
    
    # DATA FILES
    # ========================
    config.add_data_files(
        ('skdh/gait/model', 'src/skdh/gait/model/final_features.json'),
        ('skdh/gait/model', 'src/skdh/gait/model/lgbm_gait_classifier_no-stairs.lgbm'),
        ('skdh/preprocessing/data', 'src/skdh/preprocessing/data/preprocessing_info.dat')        # Added this file
    )
    
    # alternatively, add this directory; any files/folders under it will be
    # added recursively
    config.add_data_dir('src/skdh/preprocessing/data')
    # ========================
    
    config.get_version('src/skdh/version.py')
    
    return config
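
    • Data files added this way are installed alongside the package, so they can be located relative to the module at runtime. A minimal sketch, using the hypothetical data file added above:

    # e.g. inside src/skdh/preprocessing/preprocessing.py
    from pathlib import Path

    # the data directory is installed next to this file
    data_file = Path(__file__).parent / 'data' / 'preprocessing_info.dat'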
    
    • If you have low-level extensions, find the EXTENSIONS section and add as required (importing a built extension is sketched after the snippet):

    # setup.py
    ...
    def configuration(parent_package='', top_path=None):
        ...
        # EXTENSIONS
        # ========================
        # Fortran code that is NOT being compiled with f2py - it is being
        # built as a Fortran library that will be linked into the C code below
        config.add_library(
            'fcwa_convert',
            sources=['src/skdh/read/_extensions/cwa_convert.f95']
        )
    
        # C code that contains the necessary CPython API calls to allow it to
        # be imported and used in Python
        config.add_extension(
            'skdh/read/_extensions/cwa_convert',  # note the path WITHOUT src/
            sources=['src/skdh/read/_extensions/cwa_convert.c'],  # note the path WITH src/
            libraries=['fcwa_convert']  # link the previously built Fortran library
        )
    
        # standard C code extension that does not use a Fortran library.
        # Adding a Fortran extension follows the same syntax
        # (numpy will do the heavy lifting for whatever compilation is required)
        config.add_extension(
            'skdh/read/_extensions/bin_convert',
            sources=['src/skdh/read/_extensions/bin_convert.c']
        )
    
        # dealing with Cython extensions.
        if os.environ.get('CYTHONIZE', 'False') == 'True':
            # if the environment variable was set, generate .c files from
            # cython .pyx files. This is not necessary as the .c files
            # are distributed with the code, but is available as an option
            # in the off chance that the .c files are not up to date
            from Cython.Build import cythonize  # only import when needed, so Cython is not a hard requirement
    
            for pyxf in list(Path('.').rglob('*/features/lib/_cython/*.pyx')):
                cythonize(str(pyxf), compiler_directives={'language_level': 3})  # create a c file from the cython file
    
        # Either way, get a list of the cython .c files and add each
        # as an extension to be compiled
        for cf in list(Path('.').rglob('*/features/lib/_cython/*.c')):
            config.add_extension(
                str(Path(*cf.parts[1:]).with_suffix('')),
                sources=[str(cf)]
            )
    
        # ========================
        ...
    
        return config
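
    • Once built, a compiled extension imports like any other Python module. A short sketch using the cwa_convert example above (the attribute called on the extension is illustrative; use whatever names the extension actually exposes):

    # e.g. inside src/skdh/read/cwa.py
    from skdh.read._extensions import cwa_convert

    # call whatever the extension exposes, for example:
    # timestamps, raw_accel = cwa_convert.read(file, ...)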