skdh.io.ReadCSV

class skdh.io.ReadCSV(time_col_name, column_names, drop_duplicate_timestamps=False, trim_keys=None, trim_start_factor=None, fill_gaps=True, fill_value=None, gaps_error='raise', to_datetime_kwargs=None, raw_conversions=None, accel_in_g=True, g_value=9.81, read_csv_kwargs=None, ext_error='warn')

Read a comma-separated value (CSV) file into memory.

Parameters:

time_col_name (str): The name of the column containing timestamps.
column_names (dict): Dictionary of column names for different data types. See Notes.
drop_duplicate_timestamps (bool, optional): Drop duplicate timestamps before doing any timestamp handling or gap filling. Default is False.
trim_keys ({None, tuple}, optional): Trim keys provided in the predict method, given as a tuple of the form (start_key, end_key). Default (None) does no trimming. To trim only the start or only the end, pass None in place of the key you do not want to trim. Trim datetimes are assumed to be in the same timezone as the data (i.e. naive if the data is naive, otherwise in the timezone provided).
trim_start_factor (int, optional): Factor of seconds to trim the start time to. For example, if trim_start_factor=15, the start time is trimmed to the next multiple of 15 seconds. Default is None (no trimming).
fill_gaps (bool, optional): Fill any gaps in data streams. Default is True. If False and data gaps are detected, reading raises a ValueError.
fill_value ({None, dict}, optional): Dictionary with keys and values to fill data streams with. See Notes for default values if not provided.
gaps_error ({'raise', 'warn', 'ignore'}, optional): Behavior if there are large gaps in the datastream after handling timestamps. Default is to raise an error. Changing this is NOT recommended unless the data is being read as part of a skdh.io.MultiReader call, in which case it will likely be re-sampled.
to_datetime_kwargs (dict, optional): Dictionary of keyword arguments for pandas.to_datetime.
raw_conversions (dict, optional): Conversions to apply to raw data, with keys matching those in column_names. Conversions are applied by dividing the raw data stream by the conversion factor provided. If left as None, no conversions are applied (i.e. a conversion factor of 1.0).
accel_in_g (bool, optional): Whether the acceleration values are in units of "g". Default is True.

Deprecated since version 0.15.1: Use raw_conversions instead.
g_value (float, optional): Gravitational acceleration. Default is 9.81 m/s^2.

Deprecated since version 0.15.1: Use raw_conversions instead.
read_csv_kwargs ({None, dict}, optional): Dictionary of additional keyword arguments for pandas.read_csv.
ext_error ({"warn", "raise", "skip"}, optional): What to do if the file extension does not match the expected extension (.csv). Default is "warn". "raise" raises a ValueError. "skip" skips the file reading altogether and attempts to continue with the pipeline.

Deprecated since version 0.14.0: The bases and periods parameters were removed in favor of having windowing be its own class, skdh.preprocessing.GetDayWindowIndices.
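
The trim_start_factor description can be read as rounding the start time up to a multiple of the factor. The following is a minimal sketch of that arithmetic only, not the library's implementation; the helper name next_multiple is hypothetical, and it assumes that a start time already on a multiple is left unchanged.

```python
import math


def next_multiple(start_s: float, factor: int) -> int:
    """Round a unix-seconds start time up to a multiple of `factor` seconds.

    Assumption: a start time that already sits on a multiple is kept as-is.
    """
    return math.ceil(start_s / factor) * factor


# A recording that starts 7 seconds past a whole minute boundary region
start = 1_700_000_007.0
trimmed_start = next_multiple(start, 15)

# Samples with a timestamp earlier than `trimmed_start` would be dropped
print(trimmed_start)
```

Because 60 is a multiple of 15, multiples counted from the unix epoch line up with clock seconds :00, :15, :30, and :45.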

Methods

`convert_timestamps`(t)	Convert a timestamp/array of timestamps to a datetime object
`handle_timestamp_inconsistency_np`(fill_dict, ...)	Handle any time gaps, or timestamps that are only down to the second.
`predict`(*, file)	Read the data from a comma-separated value (CSV) file.
`save_results`(results, file_name)	Save the results of the processing pipeline to a csv file
`trim_data`(trim_start_factor, start_key, ...)	Trim data based on either a start factor or start/end keys
`trim_data_factor`(trim_start_factor, *, time, ...)	Trim raw data based on a factor of seconds.
`trim_data_time`(start_key, end_key, tz_name, ...)	Trim raw data based on provided date-times

handle_gaps_error

Notes

For column_names, valid keys are:

accel
gyro
ecg
temperature

For a key, either strings or lists of strings are accepted. If multiple columns are provided for different axes, they are assumed to be in X, Y, Z order.
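
As an illustration, the dictionaries below show one way column_names and raw_conversions might be laid out. The CSV header names (ax, ay, az, temp) and the helper as_column_list are hypothetical; only the dictionary keys and the divide-by-factor rule come from this page.

```python
# Keys accepted in column_names, as listed above
VALID_KEYS = {"accel", "gyro", "ecg", "temperature"}

# Hypothetical CSV header names; multi-axis streams are given in X, Y, Z order
column_names = {
    "accel": ["ax", "ay", "az"],
    "temperature": "temp",  # a single column may be given as a plain string
}

# Conversion factors use the same keys as column_names. Raw values are
# divided by the factor, so 9.81 converts m/s^2 into units of g.
raw_conversions = {"accel": 9.81}


def as_column_list(value):
    """Normalize a string or a list of strings to a list of column names."""
    return [value] if isinstance(value, str) else list(value)


columns = {key: as_column_list(val) for key, val in column_names.items()}

# Applying the conversion to one raw acceleration sample in m/s^2
raw_sample = [0.0, 0.0, 9.81]
sample_in_g = [v / raw_conversions["accel"] for v in raw_sample]
print(columns, sample_in_g)
```

Per the class signature, these dictionaries would be passed as the column_names and raw_conversions arguments of ReadCSV.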

In order to handle windowing, data gap filling, or timestamp interpolation when timestamps are only down to the second (as in ActiGraph CSV files), the time column is always first converted to a datetime64 Series via pandas.to_datetime. To make sure this conversion is applied correctly, pass any required keyword arguments through to_datetime_kwargs. This includes specifying the unit (e.g. s, ms, us, ns) if a unix timestamp integer is provided.
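
The effect of to_datetime_kwargs can be checked directly with pandas before reading a file. This sketch calls pandas.to_datetime on sample values standing in for a time column; the sample timestamps and the date format string are illustrative assumptions.

```python
import pandas as pd

# Integer unix timestamps in seconds need an explicit unit
raw_unix = pd.Series([1_700_000_000, 1_700_000_001, 1_700_000_002])
unix_kwargs = {"unit": "s"}
stamps_from_unix = pd.to_datetime(raw_unix, **unix_kwargs)

# String timestamps with a known layout can be given an explicit format
raw_text = pd.Series(["11/14/2023 22:13:20", "11/14/2023 22:13:21"])
text_kwargs = {"format": "%m/%d/%Y %H:%M:%S"}
stamps_from_text = pd.to_datetime(raw_text, **text_kwargs)

print(stamps_from_unix.iloc[0], stamps_from_text.iloc[0])
```

Whichever dictionary produces the expected datetimes here is the one to pass as to_datetime_kwargs.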

Default fill values are:

accel: numpy.array([0.0, 0.0, 1.0])
gyro: 0.0
temperature: 0.0
ecg: 0.0
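
To make the default acceleration fill value concrete, here is a minimal NumPy sketch of gap filling on a regular sampling grid. It is an illustration under stated assumptions, not the library's implementation: the function fill_gaps is hypothetical, timestamps are assumed sorted, and the sampling frequency is assumed known.

```python
import numpy as np


def fill_gaps(time, accel, fs, fill_value=(0.0, 0.0, 1.0)):
    """Place samples on a regular grid at `fs` Hz and fill missing rows."""
    # Index of each recorded sample on the regular grid
    idx = np.round((time - time[0]) * fs).astype(int)
    n = idx[-1] + 1

    # Regular time grid, and an output array pre-filled with the fill value
    full_time = time[0] + np.arange(n) / fs
    full_accel = np.tile(np.asarray(fill_value, dtype=float), (n, 1))

    # Copy the recorded samples into their grid positions
    full_accel[idx] = accel
    return full_time, full_accel


# 2 Hz data with a gap: the samples at 1.0 s and 1.5 s are missing
time = np.array([0.0, 0.5, 2.0, 2.5])
accel = np.array([[0.1, 0.2, 0.9]] * 4)

full_time, full_accel = fill_gaps(time, accel, fs=2.0)
print(full_time)
print(full_accel)
```

A fill value of [0.0, 0.0, 1.0] has a magnitude of 1 g, so the filled rows look like a stationary sensor.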

handle_timestamp_inconsistency_np(fill_dict, time, fs, data)

Handle any time gaps, or timestamps that are only down to the second.

Parameters:

fill_dict (dict): Dictionary of fill values for columns identified in column_names. In cases where there are multiple columns for a datastream, the last will be filled with the value.
time (numpy.ndarray): Array of timestamps in unix seconds.
fs ({None, float}): Sampling frequency. If None, will be calculated from the data.
data (dict): Dictionary of data streams.

Returns:

fs (float): Number of samples per second.
time (numpy.ndarray): Timestamp array with timestamps updated to be unique and gaps filled.
data (dict): Data dictionary with updated arrays, with gaps filled if specified.
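
For timestamps that are only down to the second, one possible way to make them unique is to spread the samples sharing each whole-second value evenly across that second. The sketch below illustrates this idea only and does not reproduce this method's internals; the function spread_within_seconds is hypothetical, and it assumes sorted timestamps and estimates fs as the median sample count per second.

```python
import numpy as np


def spread_within_seconds(time):
    """Spread samples sharing a whole-second timestamp across that second."""
    # Number of samples recorded under each distinct timestamp
    _, counts = np.unique(time, return_counts=True)
    fs = float(np.median(counts))  # estimated samples per second

    # Position of each sample within its second: 0, 1, ..., count - 1
    starts = np.cumsum(counts) - counts
    offsets = np.arange(time.size) - np.repeat(starts, counts)

    return fs, time + offsets / fs


# Eight samples at 4 Hz, stamped with second resolution only
time = np.array([10.0, 10.0, 10.0, 10.0, 11.0, 11.0, 11.0, 11.0])
fs, new_time = spread_within_seconds(time)
print(fs, new_time)
```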

predict(*, file)

Read the data from a comma-separated value (CSV) file.

Parameters:

file ({str, Path}): Path to the file to read.
tz_name ({None, str}, optional): IANA time-zone name for the recording location. If not provided, timestamps will represent local time naively. This means they will not account for any time changes due to Daylight Saving Time.

Returns:

data (dict): Dictionary of the time and acceleration data contained in the file. Time will be in unix seconds, and acceleration will be in units of "g".

Raises:

ValueError: If the file name is not provided.