Adding modules

The goal of scikit-digital-health is to have one package with a defined structure that allows for easy pipeline generation, with multiple stages that may or may not depend on previous stages. To that end, there are several pre-defined base classes, intended to be subclassed, that help with setting up modules that interface directly with the pipeline infrastructure.

Add a new module

  1. Create a new module directory under src/skdh with the desired name. For this example, we will use preprocessing:

    cd src/skdh
    mkdir preprocessing
    touch preprocessing/__init__.py
    
  2. Create the module class that will be added to the pipeline

    • This should be in a file with the same name as the class, e.g. preprocessing.py

    • An example file is below:

    # src/skdh/preprocessing/preprocessing.py
    import ...
    
    from skdh.base import BaseProcess, handle_process_returns  # import the base process class
    
    
    class Preprocessing(BaseProcess):
        """
        Class to implement any processing steps
    
        NOTE that this is the docstring for the class! Placing it up here aids
        with clean autogenerated documentation.
    
        Parameters
        ----------
        attr1 : {None, int}, optional
            Attribute 1 description
        attr2 : {None, float}, optional
            Attribute 2 description
    
        ...
        """
        def __init__(self, attr1=None, attr2=None):
            # pass all arguments into super().__init__(), this will
            # autogenerate the __repr__ for the class
            super().__init__(
                attr1=attr1,
                attr2=attr2
            )
            # rest of setup required
    
        """
        Next, define the "predict" method, which does all the computation for the process.
        The call of this is pretty open, but should have the **kwargs option at the end.
        While the **kwargs is important for the way the pipelines, etc work, it
        clutters/can add confusion to the docstring. As such, the first line of the
        docstring should be the actual call for predict
    
        Note that the * by itself indicates that the following arguments are key-word ONLY.
        For this example, therefore the gyro is an optional argument. Time and accel
        are treated as required arguments, even though they are key-word
    
        Note that the units for time, accel, and gyro should ALWAYS be unix seconds,
        'g', and 'deg/s' to match the pipeline implementation, and what other processes
        are expecting.
        """
        @handle_process_returns(results_to_kwargs=False)
        def predict(self, time=None, accel=None, *, gyro=None, **kwargs):
            """
            predict(time, accel, *, gyro=None)
    
            Preprocess the time, accel, and gyroscope if provided
    
            Parameters
            ----------
            time : numpy.ndarray
                (N, ) array of unix timestamps, in seconds
            accel : numpy.ndarray
                (N, 3) array of acceleration, in units of 'g'
            gyro : numpy.ndarray, optional
                (N, 3) array of angular velocity, units of 'deg/s'
    
            ...
            """
            # run preprocessing
            accel = calibrate_accel(accel)  # e.g. a function imported at the top of the file

            # if you need something passed in that's not a default argument, but
            # might be useful, retrieve it from kwargs
            opt_item = kwargs.get('opt_item', None)
    
            """
            NOTE that returns MUST be dictionaries!!!
    
            Because we set `results_to_kwargs=False` in `@handle_process_returns`
            above, `preproc_dict` values will NOT be included for future steps to
            use in an easy way. If you want them to be included, set `results_to_kwargs=True`
            If you need to return both results, and updates to inputs for future
            stages, you can use `results_to_kwargs=False`, and then set the return to
    
            `return results, updates`
    
            where keys in `results` will not be available for later steps, but
            keys in `updates` (which needs to be a dictionary) will be.
            """
            # package the results to return; the key name is just an example
            preproc_dict = {'accel_processed': accel}
            return preproc_dict
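
    • Once defined, the class can be used on its own, or as a stage in a pipeline. A minimal sketch is below (the dummy data and attribute value are purely illustrative):

    import numpy as np

    from skdh import Pipeline
    from skdh.preprocessing import Preprocessing

    # dummy data for illustration: ~50hz unix timestamps (in seconds) and
    # acceleration in units of 'g'
    time = np.arange(0, 60, 1 / 50)
    accel = np.random.default_rng().normal(0, 0.1, (time.size, 3))

    # standalone use
    res = Preprocessing(attr1=5).predict(time=time, accel=accel)

    # or as a stage in a pipeline
    pipeline = Pipeline()
    pipeline.add(Preprocessing(attr1=5))
    results = pipeline.run(time=time, accel=accel)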
    
  3. External file functions

    • If there is too much code to be contained inside the Preprocessing.predict method, there are a few suggested guidelines:

      • Generally avoid adding too many other functions to the main file (preprocessing.py).

      • Individual functions (especially if they are fairly long) should ideally get their own file, with the name matching that of the function inside.

      • Functions with a common theme can be grouped in one file, with the file named after that common theme.

      • A utility.py file might make sense for any functions that have general utility outside of the specific module (i.e. something from preprocessing/utility.py getting called from gait/gait.py).

    • These are just suggestions, in order to maintain some clarity with multiple functions split over multiple files. If you have a good reason to do something different, just try to keep it as clear as possible. A hypothetical layout following these guidelines is sketched below.
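
    • For example, a module following these guidelines might be laid out as follows (all file names besides __init__.py are hypothetical):

    src/skdh/preprocessing/
    ├── __init__.py        # module imports (see step 4)
    ├── preprocessing.py   # the Preprocessing class
    ├── calibration.py     # a longer standalone function, e.g. calibrate_accel
    └── utility.py         # general helpers that other modules might use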

  4. Make sure everything is set up/imported

    • make sure all imports are handled in src/skdh/preprocessing/__init__.py, as well as adding preprocessing to src/skdh/__init__.py; a sketch of both is below
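
    • A minimal sketch of what this might look like (matching the import style of the existing skdh modules):

    # src/skdh/preprocessing/__init__.py
    from skdh.preprocessing.preprocessing import Preprocessing

    __all__ = ["Preprocessing"]

    # src/skdh/__init__.py
    ...
    from skdh import preprocessing
    from skdh.preprocessing import Preprocessing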

  5. Make any additions to setup.py

    • If you don’t have any data files (any non-Python files that need to be distributed with the package), or any low-level (C, Cython, or Fortran) extensions, everything should be all set for the actual module!

    • If you have data files, find the configuration function in setup.py, locate the DATA FILES section, and add any data files that you have (loading them at runtime is sketched after the snippet):

    # setup.py
    ...
    
    # DATA FILES
    # ========================
    config.add_data_files(
        ('skdh/gait/model', 'src/skdh/gait/model/final_features.json'),
        ('skdh/gait/model', 'src/skdh/gait/model/lgbm_gait_classifier_no-stairs.lgbm'),
        ('skdh/preprocessing/data', 'src/skdh/preprocessing/data/preprocessing_info.dat')        # Added this file
    )
    
    # alternatively, add this directory; any files/folders under it will be
    # added recursively
    config.add_data_dir('src/skdh/preprocessing/data')
    # ========================
    
    config.get_version('src/skdh/version.py')
    
    return config
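
    • Data files added this way are installed alongside the package, so they can be located relative to the module at runtime. A minimal sketch, using the hypothetical data file added above:

    # e.g. inside src/skdh/preprocessing/preprocessing.py
    from pathlib import Path

    # the data directory is installed next to this file
    data_file = Path(__file__).parent / 'data' / 'preprocessing_info.dat'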
    
    • If you have low-level extensions, find the EXTENSIONS section and add as required (importing a built extension is sketched after the snippet):

    # setup.py
    ...
    def configuration(parent_package='', top_path=None):
        ...
        # EXTENSIONS
        # ========================
        # Fortran code that is NOT being compiled with f2py - it is being
        # built as a Fortran library that will be linked into the C code below
        config.add_library(
            'fcwa_convert',
            sources=['src/skdh/read/_extensions/cwa_convert.f95']
        )
    
        # C code that contains the necessary CPython API calls to allow it to
        # be imported and used in Python
        config.add_extension(
            'skdh/read/_extensions/cwa_convert',  # note the path WITHOUT src/
            sources=['src/skdh/read/_extensions/cwa_convert.c'],  # note the path WITH src/
            libraries=['fcwa_convert']  # link the previously built Fortran library
        )
    
        # standard C code extension that does not use a Fortran library.
        # Adding a Fortran extension follows the same syntax
        # (numpy will do the heavy lifting for whatever compilation is required)
        config.add_extension(
            'skdh/read/_extensions/bin_convert',
            sources=['src/skdh/read/_extensions/bin_convert.c']
        )
    
        # dealing with Cython extensions.
        if os.environ.get('CYTHONIZE', 'False') == 'True':
            # if the environment variable was set, generate .c files from
            # cython .pyx files. This is not necessary as the .c files
            # are distributed with the code, but is available as an option
            # in the off chance that the .c files are not up to date
            from Cython.Build import cythonize  # only import when needed, so Cython is not a hard requirement
    
            for pyxf in list(Path('.').rglob('*/features/lib/_cython/*.pyx')):
                cythonize(str(pyxf), compiler_directives={'language_level': 3})  # create a c file from the cython file
    
        # Either way, get a list of the cython .c files and add each
        # as an extension to be compiled
        for cf in list(Path('.').rglob('*/features/lib/_cython/*.c')):
            config.add_extension(
                str(Path(*cf.parts[1:]).with_suffix('')),
                sources=[str(cf)]
            )
    
        # ========================
        ...
    
        return config
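
    • Once built, a compiled extension imports like any other Python module. A short sketch using the cwa_convert example above (the attribute called on the extension is illustrative; use whatever names the extension actually exposes):

    # e.g. inside src/skdh/read/cwa.py
    from skdh.read._extensions import cwa_convert

    # call whatever the extension exposes, for example:
    # timestamps, raw_accel = cwa_convert.read(file, ...)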