skdh.io.ReadCSV

class skdh.io.ReadCSV(time_col_name, column_names, drop_duplicate_timestamps=False, trim_keys=None, trim_start_factor=None, fill_gaps=True, fill_value=None, gaps_error='raise', to_datetime_kwargs=None, raw_conversions=None, accel_in_g=True, g_value=9.81, read_csv_kwargs=None, ext_error='warn')

Read a comma-separated value (CSV) file into memory.

Parameters:
time_col_name : str

The name of the column containing timestamps.

column_names : dict

Dictionary of column names for different data types. See Notes.

drop_duplicate_timestamps : bool, optional

Drop duplicate timestamps before doing any timestamp handling or gap filling. Default is False.

trim_keys : {None, tuple}, optional

Trim keys provided in the predict method. Default (None) will not do any trimming. If provided, the tuple should be of the form (start_key, end_key). To trim only the start or only the end, provide None in place of the key you do not want to trim. Trim datetimes are assumed to be in the same timezone as the data (i.e., naive if the data is naive, or in the timezone provided).

trim_start_factor : int, optional

Factor of seconds to trim start time to. For example, if trim_start_factor=15, then the start time will be trimmed to the next multiple of 15 seconds. Default is None (no trimming).

fill_gaps : bool, optional

Fill any gaps in data streams. Default is True. If False and data gaps are detected, reading raises a ValueError.

fill_value : {None, dict}, optional

Dictionary with keys and values to fill data streams with. See Notes for default values if not provided.

gaps_error : {‘raise’, ‘warn’, ‘ignore’}, optional

Behavior if there are large gaps in the datastream after handling timestamps. Default is to raise an error. Changing this is NOT recommended unless the data is being read as part of a skdh.io.MultiReader call, in which case it will likely be re-sampled.

to_datetime_kwargs : dict, optional

Dictionary of keyword arguments for pandas.to_datetime.

raw_conversions : dict, optional

Conversions to apply to raw data, with keys matching those in column_names. Conversions are applied by dividing the raw data stream by the conversion factor provided. If left as None, no conversions are applied (i.e., a conversion factor of 1.0).

accel_in_g : bool, optional

Whether the acceleration values are in units of “g”. Default is True.

Deprecated since version 0.15.1: Use raw_conversions instead.

g_value : float, optional

Gravitational acceleration. Default is 9.81 m/s^2.

Deprecated since version 0.15.1: Use raw_conversions instead.

read_csv_kwargs : {None, dict}, optional

Dictionary of additional keyword arguments for pandas.read_csv.

ext_error : {“warn”, “raise”, “skip”}, optional

What to do if the file extension does not match the expected extension (.csv). Default is “warn”. “raise” raises a ValueError. “skip” skips the file reading altogether and attempts to continue with the pipeline.

Deprecated since version 0.14.0: bases and periods were removed in favor of having windowing be its own class, skdh.preprocessing.GetDayWindowIndices.
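As a rough illustration of two of the parameters above, the following sketch mimics the documented arithmetic: trim_start_factor moves the start time to the next multiple of the factor, and raw_conversions divides a raw stream by its factor. The helper names are hypothetical and this is not skdh's implementation. In particular, leaving a start time that already sits on a multiple unchanged is an assumption.

```python
import math


def next_multiple(start_time, factor):
    """Round a unix time (seconds) up to the next multiple of `factor` seconds."""
    # Assumption: a start time already on a multiple is returned unchanged.
    return math.ceil(start_time / factor) * factor


def apply_conversion(raw_stream, factor=None):
    """Divide every raw sample by `factor`; None means no conversion (factor 1.0)."""
    if factor is None:
        return list(raw_stream)
    return [value / factor for value in raw_stream]


# A recording starting 7 s past 12:00:00 is trimmed to 12:00:15 with a factor of 15.
start = 12 * 3600 + 7
trimmed_start = next_multiple(start, 15)

# Acceleration recorded in m/s^2 is converted to g by dividing by 9.81.
accel_g = apply_conversion([0.0, 4.905, 9.81], factor=9.81)
```

With the class itself, the equivalent settings would be passed as trim_start_factor=15 and raw_conversions={"accel": 9.81}.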

Methods

convert_timestamps(t)

Convert a timestamp or array of timestamps to a datetime object.

handle_timestamp_inconsistency_np(fill_dict, ...)

Handle any time gaps, or timestamps that are only down to the second.

predict(*, file)

Read the data from a comma-separated value (CSV) file.

save_results(results, file_name)

Save the results of the processing pipeline to a CSV file.

trim_data(trim_start_factor, start_key, ...)

Trim data based on either a start factor or start/end keys.

trim_data_factor(trim_start_factor, *, time, ...)

Trim raw data based on a factor of seconds.

trim_data_time(start_key, end_key, tz_name, ...)

Trim raw data based on provided date-times.

handle_gaps_error

Notes

For column_names, valid keys are:

  • accel

  • gyro

  • ecg

  • temperature

For a key, either strings or lists of strings are accepted. If multiple columns are provided for different axes, they are assumed to be in X, Y, Z order.
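A minimal sketch of a column_names dictionary following the rules above. The CSV header names (ax, ay, az, temp) and the check_column_names helper are hypothetical, used here only to make the accepted shapes concrete. skdh may validate its input differently.

```python
VALID_KEYS = {"accel", "gyro", "ecg", "temperature"}


def check_column_names(column_names):
    """Reject unknown keys and return every entry as a list of column names."""
    unknown = set(column_names) - VALID_KEYS
    if unknown:
        raise KeyError(f"Unsupported column_names keys: {sorted(unknown)}")
    return {
        key: [cols] if isinstance(cols, str) else list(cols)
        for key, cols in column_names.items()
    }


column_names = {
    "accel": ["ax", "ay", "az"],  # multiple axes, assumed to be in X, Y, Z order
    "temperature": "temp",        # a single column given as a plain string
}
normalized = check_column_names(column_names)
```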

To handle windowing, data gap filling, or timestamp interpolation when timestamps are only resolved to the second (e.g., ActiGraph CSV files), the time column is always first converted to a datetime64 Series via pandas.to_datetime. To make sure this conversion is applied correctly, pass any required keyword arguments through to_datetime_kwargs. This includes specifying the unit (e.g., s, ms, us, ns) if the timestamps are unix integers.
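For instance, if the time column held integer unix timestamps in milliseconds (an assumption for this sketch), the unit would be passed through to_datetime_kwargs. The preview_conversion helper is hypothetical and only shows what pandas.to_datetime does with those arguments.

```python
# Keyword arguments forwarded to pandas.to_datetime via to_datetime_kwargs.
# Assumption: the time column stores integer unix timestamps in milliseconds.
to_datetime_kwargs = {"unit": "ms"}


def preview_conversion(raw_values):
    """Show how pandas would interpret `raw_values` with the kwargs above."""
    import pandas as pd  # imported here so the dict can be built without pandas

    return pd.to_datetime(raw_values, **to_datetime_kwargs)
```

A time column stored as strings would more likely need the format keyword of pandas.to_datetime instead of unit.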

Default fill values are:

  • accel: numpy.array([0.0, 0.0, 1.0])

  • gyro: 0.0

  • temperature: 0.0

  • ecg: 0.0
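The defaults above could be written as a fill_value dictionary like the following sketch. Plain lists stand in for numpy.array so the snippet has no dependencies, and the temperature override is purely illustrative. Whether skdh merges a partial dictionary with its defaults is not stated here, so the merge below is an assumption.

```python
DEFAULT_FILL = {
    "accel": [0.0, 0.0, 1.0],  # documented default: 1.0 on the third axis
    "gyro": 0.0,
    "temperature": 0.0,
    "ecg": 0.0,
}

# Hypothetical override: fill temperature gaps with a different constant.
user_fill = {"temperature": 25.0}

# Assumed behavior: user-provided keys take precedence over the defaults.
fill_value = {**DEFAULT_FILL, **user_fill}
```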

handle_timestamp_inconsistency_np(fill_dict, time, fs, data)

Handle any time gaps, or timestamps that are only down to the second.

Parameters:
fill_dict : dict

Dictionary of fill values for columns identified in column_names. In cases where there are multiple columns for a datastream, the last will be filled with the value.

time : numpy.ndarray

Array of timestamps in unix seconds.

fs : {None, float}

Sampling frequency. If None, will be calculated from the data.

data : dict

Dictionary of data streams

Returns:
fs : float

Number of samples per second.

time : numpy.ndarray

Timestamp array with timestamps updated to be unique and gaps filled.

data : dict

Data dictionary with updated arrays with gaps filled if specified.
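To make the second-resolution case concrete, here is a simplified sketch that spreads repeated whole-second timestamps evenly across each second. It is not skdh's algorithm: the function name is hypothetical, the sampling frequency is estimated from the fullest second, and every second is assumed to start at its first sample.

```python
from collections import Counter


def spread_second_resolution(times):
    """Return (fs, new_times) with repeated whole-second stamps made unique."""
    counts = Counter(times)
    # Assumption: the most populated second is complete, so its count is fs.
    fs = max(counts.values())
    seen = Counter()
    new_times = []
    for t in times:
        # Offset the n-th sample within a second by n / fs seconds.
        new_times.append(t + seen[t] / fs)
        seen[t] += 1
    return fs, new_times


# Eight samples recorded at 4 Hz, but stamped only to the whole second.
fs, unique_times = spread_second_resolution([10, 10, 10, 10, 11, 11, 11, 11])
```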

predict(*, file)

Read the data from a comma-separated value (CSV) file.

Parameters:
file : {str, Path}

Path to the file to read.

tz_name : {None, str}, optional

IANA time-zone name for the recording location. If not provided, timestamps will represent local time naively. This means they will not account for any time changes due to Daylight Saving Time.

Returns:
data : dict

Dictionary of the time and acceleration data contained in the file. Time will be in unix seconds, and acceleration will be in units of “g”.

Raises:
ValueError

If the file name is not provided.
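A hedged usage sketch tying the pieces together. The file name, CSV header names, and time zone are placeholders, and skdh is imported inside the function so the configuration can be inspected without the package installed.

```python
from pathlib import Path

# Placeholder configuration for a hypothetical CSV with these header names.
TIME_COL = "timestamp"
COLUMN_NAMES = {"accel": ["ax", "ay", "az"]}


def read_recording(path, tz_name=None):
    """Read one CSV recording and return the dictionary produced by predict."""
    from skdh.io import ReadCSV  # requires skdh to be installed

    reader = ReadCSV(time_col_name=TIME_COL, column_names=COLUMN_NAMES)
    # predict takes keyword-only arguments; omitting file raises ValueError.
    return reader.predict(file=Path(path), tz_name=tz_name)


# Example call (not executed here because it needs a real file):
# data = read_recording("recording.csv", tz_name="America/New_York")
```

Passing tz_name as an IANA name makes the returned timestamps account for Daylight Saving Time changes. Leaving it as None keeps them as naive local time.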