.. _adding-modules:

##############
Adding modules
##############

The goal of ``scikit-digital-health`` is to provide a single package with a defined structure that makes it easy to build pipelines with multiple stages, which may or may not depend on previous stages.
To that end, there are several pre-defined base classes, intended to be subclassed, that help set up modules that interface directly with the pipeline infrastructure.
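
As a rough mental model of how stages can depend on each other, the following sketch chains two hypothetical steps through a toy pipeline. ``ToyPipeline``, ``toy_handle_returns``, ``Scale``, and ``Summarize`` are illustrative names assumed here; they are not the actual ``skdh`` pipeline or decorator, and the real return handling may differ in its details. Each step returns a results dictionary, optionally paired with a dictionary of updated inputs; only the updated inputs (plus the results, when ``results_to_kwargs=True``) are forwarded to later steps.

.. code:: python

    from functools import wraps


    def toy_handle_returns(results_to_kwargs=False):
        """Hypothetical stand-in for a return-handling decorator (NOT skdh's).

        Normalizes a step's return value into ``(inputs_for_next_step, results)``.
        """
        def decorator(func):
            @wraps(func)
            def wrapper(self, **kwargs):
                out = func(self, **kwargs)
                # a step may return `results` or `(results, updates)`
                if isinstance(out, tuple):
                    results, updates = out
                else:
                    results, updates = out, {}
                # updated inputs replace the originals for later steps
                next_inputs = {**kwargs, **updates}
                if results_to_kwargs:
                    # results are also made available to later steps
                    next_inputs.update(results)
                return next_inputs, results
            return wrapper
        return decorator


    class Scale:
        """Hypothetical step: doubles `accel` and passes it on as an updated input."""

        @toy_handle_returns(results_to_kwargs=False)
        def predict(self, accel=None, **kwargs):
            scaled = [a * 2 for a in accel]
            return {"n_samples": len(accel)}, {"accel": scaled}


    class Summarize:
        """Hypothetical step whose results are forwarded to later steps."""

        @toy_handle_returns(results_to_kwargs=True)
        def predict(self, accel=None, **kwargs):
            return {"accel_max": max(accel)}


    class ToyPipeline:
        """Runs steps in order, feeding each one the (possibly updated) inputs."""

        def __init__(self, steps):
            self.steps = steps

        def run(self, **inputs):
            all_results = {}
            for step in self.steps:
                inputs, results = step.predict(**inputs)
                all_results[type(step).__name__] = results
            return all_results, inputs


    results, final_inputs = ToyPipeline([Scale(), Summarize()]).run(accel=[1.0, 2.0, 3.0])

Here the ``n_samples`` result from ``Scale`` is kept only in the results, while its scaled ``accel`` replaces the original input, so ``Summarize`` computes its maximum from the scaled values and its ``accel_max`` result is forwarded as an input as well.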

Add a new module
================

1. Create a new module directory under ``src/skdh`` with the desired name. For this example, we will use ``preprocessing``::

    cd src/skdh
    mkdir preprocessing
    touch preprocessing/__init__.py

2. Create the module class that will be added to the pipeline

    * This should be in a file named after the class, e.g. ``preprocessing.py`` for the ``Preprocessing`` class.

    * An example file is below:

    .. code:: python

        # src/skdh/preprocessing/preprocessing.py
        # import any other packages required by the module here

        from skdh.base import BaseProcess, handle_process_returns  # import the base process class


        class Preprocessing(BaseProcess):
            """
            Class to implement any processing steps

            NOTE that this is the docstring for the class! Placing it here helps
            keep the autogenerated documentation clean.

            Parameters
            ----------
            attr1 : {None, int}, optional
                Attribute 1 description
            attr2 : {None, float}, optional
                Attribute 2 description
            
            ...
            """
            def __init__(self, attr1=None, attr2=None):
                # pass all arguments into super().__init__(); this will
                # autogenerate the __repr__ for the class
                super().__init__(
                    attr1=attr1,
                    attr2=attr2
                )
                # rest of setup required

            # Next, define the "predict" method, which does all the computation for
            # the process. Its signature is flexible, but it should end with **kwargs.
            # While **kwargs is important for how the pipelines pass data between
            # steps, it clutters the docstring and can cause confusion. As such, the
            # first line of the docstring should be the actual call signature of predict.
            #
            # Note that the bare * indicates that the following arguments are
            # keyword-only. In this example, gyro is therefore an optional
            # keyword-only argument. time and accel are treated as required
            # arguments, even though they have default values of None.
            #
            # The units for time, accel, and gyro should ALWAYS be unix seconds, 'g',
            # and 'deg/s', respectively, to match the pipeline implementation and
            # what other processes expect.
            @handle_process_returns(results_to_kwargs=False)
            def predict(self, time=None, accel=None, *, gyro=None, **kwargs):
                """
                predict(time, accel, *, gyro=None)

                Preprocess the time, accel, and gyroscope if provided

                Parameters
                ----------
                time : numpy.ndarray
                    (N, ) array of unix timestamps, in seconds
                accel : numpy.ndarray
                    (N, 3) array of acceleration, in units of 'g'
                gyro : numpy.ndarray, optional
                    (N, 3) array of angular velocity, in units of 'deg/s'
                
                ...
                """
                # run preprocessing. `calibrate_accel` is a placeholder for a helper
                # function defined elsewhere in the module (see step 3)
                accel = calibrate_accel(accel)

                # if you need something passed in that is not a default argument,
                # but might be useful, retrieve it from kwargs
                opt_item = kwargs.get('opt_item', None)

                # NOTE: returns MUST be dictionaries!
                #
                # Because `results_to_kwargs=False` is set in `@handle_process_returns`
                # above, the values in `preproc_dict` will NOT be passed on as inputs
                # to later steps. To pass them on, set `results_to_kwargs=True`.
                # If you need to return both results and updates to the inputs for
                # later steps, keep `results_to_kwargs=False` and instead use
                #
                #     return results, updates
                #
                # where keys in `results` are not available to later steps, but keys
                # in `updates` (which must also be a dictionary) are.
                preproc_dict = {'accel': accel}  # placeholder: fill with your results
                return preproc_dict

3. Helper functions in separate files

    * If there is too much code to fit comfortably inside the ``Preprocessing.predict`` method, follow these suggested guidelines:

        - Generally avoid adding many other functions to the main file (``preprocessing.py``).
        - Individual functions (especially long ones) should ideally get their own file, named after the function it contains.
        - Functions with a common theme can share one file, named after that theme.
        - A ``utility.py`` file might make sense for any functions that have general utility *outside* of the specific module (e.g. something from ``preprocessing/utility.py`` getting called from ``gait/gait.py``).

    * These are just suggestions to maintain some clarity when multiple functions are split over multiple files. If you have a good reason to do something different, just try to keep it as clear as possible.

4. Make sure everything is set up and imported

    * Make sure all imports are handled in ``src/skdh/preprocessing/__init__.py``, and add ``preprocessing`` to ``src/skdh/__init__.py``.

5. Make any additions to ``setup.py``

    * If you don't have any data files (any non-Python files that need to be distributed with the package) or any low-level (C, Cython, or Fortran) extensions, everything should already be set up for the module!

    * If you have data files, find the ``configuration`` function in ``setup.py``, locate the DATA FILES section, and add your data files:

    .. code:: python

        # setup.py
        ...
        def configuration(parent_package='', top_path=None):
            ...
            # DATA FILES
            # ========================
            config.add_data_files(
                ('skdh/gait/model', 'src/skdh/gait/model/final_features.json'),
                ('skdh/gait/model', 'src/skdh/gait/model/lgbm_gait_classifier_no-stairs.lgbm'),
                ('skdh/preprocessing/data', 'src/skdh/preprocessing/data/preprocessing_info.dat')  # added this file
            )

            # alternatively, add the whole directory; any files/folders under it will be added recursively
            config.add_data_dir('src/skdh/preprocessing/data')
            # ========================

            config.get_version('src/skdh/version.py')

            return config
    
    * If you have low-level extensions, find the EXTENSIONS section in the same function and add them as required:

    .. code:: python

        # setup.py
        import os
        from pathlib import Path
        ...
        def configuration(parent_package='', top_path=None):
            ...
            # EXTENSIONS
            # ========================
            # Fortran code that is NOT compiled with f2py; instead it is built
            # as a Fortran library that is linked into the C extension below
            config.add_library(
                'fcwa_convert', 
                sources='src/skdh/read/_extensions/cwa_convert.f95'
            )

            # C code that contains the necessary CPython API calls to allow it to 
            # be imported and used in python
            config.add_extension(
                'skdh/read/_extensions/cwa_convert',  # note the path WITHOUT src/
                sources='src/skdh/read/_extensions/cwa_convert.c',  # note the path WITH src/
                libraries=['fcwa_convert']  # link the previously built fortran library
            )

            # standard C code extension that does not use a fortran library. 
            # Adding a Fortran extension follows the same syntax 
            # (numpy will do the heavy lifting for whatever compilation is required)
            config.add_extension(
                'skdh/read/_extensions/bin_convert',
                sources='src/skdh/read/_extensions/bin_convert.c'
            )

            # dealing with Cython extensions. 
            if os.environ.get('CYTHONIZE', 'False') == 'True':
                # if the environment variable was set, generate .c files from 
                # cython .pyx files. This is not necessary as the .c files 
                # are distributed with the code, but is available as an option 
                # in the off chance that the .c files are not up to date
                from Cython.Build import cythonize  # import only when needed, so Cython is not a hard build requirement

                for pyxf in list(Path('.').rglob('*/features/lib/_cython/*.pyx')):
                    cythonize(str(pyxf), compiler_directives={'language_level': 3})  # create a c file from the cython file

            # Either way, get a list of the cython .c files and add each 
            # as an extension to be compiled
            for cf in list(Path('.').rglob('*/features/lib/_cython/*.c')):
                config.add_extension(
                    str(Path(*cf.parts[1:]).with_suffix('')),
                    sources=[str(cf)]
                )

            # ========================
            ...

            return config
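
The ``setup.py`` snippets above rely on one path convention: installed locations and extension names omit the leading ``src/``, while source paths include it. The helpers below are hypothetical (they do not exist in ``skdh`` or ``numpy.distutils``) and only illustrate that mapping. They use ``PurePosixPath`` so the output has forward slashes on every platform, and the ``example.c`` file name is made up for illustration.

.. code:: python

    from pathlib import PurePosixPath


    def data_file_entry(source_path, src_root="src"):
        """Hypothetical helper: build one ``(install_dir, source_path)`` tuple
        in the form passed to ``config.add_data_files``."""
        path = PurePosixPath(source_path)
        if path.parts[0] != src_root:
            raise ValueError(f"{source_path!r} is not under {src_root}/")
        # the install directory drops the leading `src/` and the file name
        install_dir = PurePosixPath(*path.parts[1:]).parent
        return str(install_dir), str(path)


    def extension_name(c_file, src_root="src"):
        """Hypothetical helper mirroring the Cython naming in ``setup.py``:
        drop the leading ``src/`` and the file suffix."""
        path = PurePosixPath(c_file)
        if path.parts[0] != src_root:
            raise ValueError(f"{c_file!r} is not under {src_root}/")
        return str(PurePosixPath(*path.parts[1:]).with_suffix(""))


    print(data_file_entry("src/skdh/preprocessing/data/preprocessing_info.dat"))
    # ('skdh/preprocessing/data', 'src/skdh/preprocessing/data/preprocessing_info.dat')
    print(extension_name("src/skdh/features/lib/_cython/example.c"))
    # skdh/features/lib/_cython/example

The first result matches the tuple added for ``preprocessing_info.dat`` in the DATA FILES example, and the second matches the name produced by ``str(Path(*cf.parts[1:]).with_suffix(''))`` in the Cython loop (on POSIX systems).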