.. _adding-modules:

##############
Adding modules
##############

The goal of ``scikit-digital-health`` is to provide one package with a defined structure that allows for easy
pipeline generation with multiple stages that may or may not depend on previous stages. To that end, there are
several predefined base classes intended to be subclassed, which help set up modules that interface directly
with the pipeline infrastructure.

Add a new module
================

1. Create a new module directory under ``src/skdh`` with the desired name. For this example, we will use ``preprocessing``::

     cd src/skdh
     mkdir preprocessing
     touch preprocessing/__init__.py

2. Create the module class that will be added to the pipeline

   * This should be in a file with the same name as the class, e.g. ``preprocessing.py``
   * An example file is below:

   .. code:: python

       # src/skdh/preprocessing/preprocessing.py
       import ...
       from skdh.base import BaseProcess, handle_process_returns  # import the base process class


       class Preprocessing(BaseProcess):
           """
           Class to implement any processing steps

           NOTE that this is the docstring for the class! Placing it up here aids
           with clean autogenerated documentation

           Parameters
           ----------
           attr1 : {None, int}, optional
               Attribute 1 description
           attr2 : {None, float}, optional
               Attribute 2 description
           ...
           """
           def __init__(self, attr1=None, attr2=None):
               # pass all arguments into super().__init__(), this will
               # autogenerate the __repr__ for the class
               super().__init__(
                   attr1=attr1,
                   attr2=attr2
               )

               # rest of setup required

           """
           Next, define the "predict" method, which does all the computation for
           the process. The signature is fairly open, but it should end with
           **kwargs.

           While **kwargs is important for the way the pipelines work, it clutters
           the docstring and can add confusion. As such, the first line of the
           docstring should be the actual call for predict.

           Note that the * by itself indicates that the arguments following it are
           keyword ONLY. In this example, gyro is therefore an optional,
           keyword-only argument. time and accel are treated as required
           arguments, even though they are declared as keyword arguments with a
           default of None.

           Note that the units for time, accel, and gyro should ALWAYS be
           unix seconds, 'g', and 'deg/s' to match the pipeline implementation
           and what other processes are expecting.
           """
           @handle_process_returns(results_to_kwargs=False)
           def predict(self, time=None, accel=None, *, gyro=None, **kwargs):
               """
               predict(time, accel, *, gyro=None)

               Preprocess the time, accel, and gyroscope if provided

               Parameters
               ----------
               time : numpy.ndarray
                   (N, ) array of unix timestamps, in seconds
               accel : numpy.ndarray
                   (N, 3) array of acceleration, in units of 'g'
               gyro : numpy.ndarray, optional
                   (N, 3) array of angular velocity, units of 'deg/s'
               ...
               """
               # run preprocessing
               accel = calibrate_accel(accel)

               # if you need something passed in that's not a default argument, but
               # might be useful, retrieve it from kwargs
               opt_item = kwargs.get('opt_item', None)

               """
               NOTE that returns MUST be dictionaries!!!

               Because we set `results_to_kwargs=False` in `@handle_process_returns`
               above, the `preproc_dict` values will NOT be readily available to
               later steps. If you want them to be, set `results_to_kwargs=True`.

               If you need to return both results and updates to the inputs of
               later stages, keep `results_to_kwargs=False` and change the return
               to `return results, updates`. Keys in `results` will not be
               available to later steps, but keys in `updates` (which must be a
               dictionary) will be.
               """
               return preproc_dict

3. External file functions

   * If there is too much code to be contained inside the ``Preprocessing.predict`` method, there are a few
     suggested guidelines:

     - Generally avoid adding too many other functions to the main file (``preprocessing.py``).
     - Individual functions (especially if they are fairly long) should ideally get their own file, with the
       file name matching that of the function inside.
     - Functions with a common theme can share one file, with the file named after that theme.
     - A ``utility.py`` file might make sense for any functions that have general utility *outside* of the
       specific module (i.e. something from ``preprocessing/utility.py`` getting called from ``gait/gait.py``).

   * These are just suggestions to maintain some clarity when multiple functions are split over multiple files.
     If you have a good reason to do something different, just try to keep it as clear as possible.

4. Make sure everything is set up and imported

   * Make sure all imports are handled in ``src/skdh/preprocessing/__init__.py``, and add ``preprocessing``
     to ``src/skdh/__init__.py``

5. Make any additions to ``setup.py``

   * If you don't have any data files (any non-Python files that need to be distributed with the package) or
     any low-level (C, Cython, or Fortran) extensions, everything should be all set for the actual module!
   * If you have data files, find the ``configuration`` function in ``setup.py``, locate the DATA FILES
     section, and add any data files that you have:

     .. code:: python

         # setup.py
         ...
         def configuration(parent_package='', top_path=None):
             ...
             # DATA FILES
             # ========================
             config.add_data_files(
                 ('skdh/gait/model', 'src/skdh/gait/model/final_features.json'),
                 ('skdh/gait/model', 'src/skdh/gait/model/lgbm_gait_classifier_no-stairs.lgbm'),
                 ('skdh/preprocessing/data', 'src/skdh/preprocessing/data/preprocessing_info.dat')  # Added this file
             )

             # alternatively add this directory; any files/folders under this
             # directory will be added recursively
             config.add_data_dir('src/skdh/preprocessing/data')
             # ========================

             config.get_version('src/skdh/version.py')

             return config

   * If you have low-level extensions, find the EXTENSIONS section and add as required:

     .. code:: python

         # setup.py
         ...
         def configuration(parent_package='', top_path=None):
             ...
             # EXTENSIONS
             # ========================
             # Fortran code that is NOT being compiled with f2py - it is being
             # built as a Fortran function that will be imported into C code
             config.add_library(
                 'fcwa_convert',
                 sources=['src/skdh/read/_extensions/cwa_convert.f95']
             )

             # C code that contains the necessary CPython API calls to allow it to
             # be imported and used in Python
             config.add_extension(
                 'skdh/read/_extensions/cwa_convert',  # note the path WITHOUT src/
                 sources=['src/skdh/read/_extensions/cwa_convert.c'],  # note the path WITH src/
                 libraries=['fcwa_convert']  # link the previously built Fortran library
             )

             # standard C code extension that does not use a Fortran library.
             # Adding a Fortran extension follows the same syntax
             # (numpy will do the heavy lifting for whatever compilation is required)
             config.add_extension(
                 'skdh/read/_extensions/bin_convert',
                 sources=['src/skdh/read/_extensions/bin_convert.c']
             )

             # dealing with Cython extensions.
             if os.environ.get('CYTHONIZE', 'False') == 'True':
                 # if the environment variable was set, generate .c files from
                 # Cython .pyx files. This is not necessary as the .c files
                 # are distributed with the code, but is available as an option
                 # in the off chance that the .c files are not up to date

                 # only import if needed, as Cython is otherwise not a requirement
                 from Cython.Build import cythonize

                 for pyxf in list(Path('.').rglob('*/features/lib/_cython/*.pyx')):
                     # create a .c file from the Cython file
                     cythonize(str(pyxf), compiler_directives={'language_level': 3})

             # Either way, get a list of the Cython .c files and add each
             # as an extension to be compiled
             for cf in list(Path('.').rglob('*/features/lib/_cython/*.c')):
                 config.add_extension(
                     str(Path(*cf.parts[1:]).with_suffix('')),
                     sources=[str(cf)]
                 )

             # ========================
             ...
             return config
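The extension name passed to ``config.add_extension`` in the Cython loop above is derived from the path of each generated ``.c`` file. The following is a minimal, standalone sketch of that derivation only. The helper ``extension_name`` and the file name ``example.c`` are hypothetical and used purely for illustration, and ``PurePosixPath`` is used here (rather than ``Path``) so the result does not depend on the operating system:

```python
from pathlib import PurePosixPath


def extension_name(c_file):
    """Return the extension name for a C source path.

    Mirrors ``str(Path(*cf.parts[1:]).with_suffix(''))`` from the loop above:
    the first path component (``src``) and the file suffix are dropped.
    """
    path = PurePosixPath(c_file)
    return str(PurePosixPath(*path.parts[1:]).with_suffix(""))


# hypothetical file name, for illustration only
name = extension_name("src/skdh/features/lib/_cython/example.c")
print(name)  # skdh/features/lib/_cython/example
```

This matches the convention noted for the other extensions: the extension name is given WITHOUT the leading ``src/``, while ``sources`` keeps the full path WITH ``src/``.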