skdh.io.ReadCSV
- class skdh.io.ReadCSV(time_col_name, column_names, drop_duplicate_timestamps=False, trim_keys=None, trim_start_factor=None, fill_gaps=True, fill_value=None, gaps_error='raise', to_datetime_kwargs=None, raw_conversions=None, accel_in_g=True, g_value=9.81, read_csv_kwargs=None, ext_error='warn')
Read a comma-separated value (CSV) file into memory.
- Parameters:
- time_col_name : str
The name of the column containing timestamps.
- column_names : dict
Dictionary of column names for different data types. See Notes.
- drop_duplicate_timestamps : bool, optional
Drop duplicate timestamps before doing any timestamp handling or gap filling. Default is False.
- trim_keys : {None, tuple}, optional
Trim keys provided in the predict method. Default (None) will not do any trimming. If provided, the tuple should be of the form (start_key, end_key). To trim only the start or only the end, provide None in place of the key you do not want to trim. Trim datetimes are assumed to be in the same timezone as the data (i.e. naive if the data is naive, or in the timezone provided).
- trim_start_factor : int, optional
Factor of seconds to trim start time to. For example, if trim_start_factor=15, then the start time will be trimmed to the next multiple of 15 seconds. Default is None (no trimming).
- fill_gaps : bool, optional
Fill any gaps in data streams. Default is True. If False and data gaps are detected, then the reading will raise a ValueError.
- fill_value : {None, dict}, optional
Dictionary with keys and values to fill data streams with. See Notes for default values if not provided.
- gaps_error : {‘raise’, ‘warn’, ‘ignore’}, optional
Behavior if there are large gaps in the datastream after handling timestamps. Default is to raise an error. Changing this is NOT recommended unless the data is being read as part of a skdh.io.MultiReader call, in which case it will likely be re-sampled.
- to_datetime_kwargs : dict, optional
Dictionary of keyword arguments for pandas.to_datetime.
- raw_conversions : dict, optional
Conversions to apply to raw data, with keys matching those in column_names. Conversions are applied by dividing the raw data stream by the conversion factor provided. If left as None, no conversions will be applied (i.e. a conversion factor of 1.0).
- accel_in_g : bool, optional
Whether the acceleration values are in units of “g”. Default is True.
Deprecated since version 0.15.1: Use raw_conversions instead.
- g_value : float, optional
Gravitational acceleration. Default is 9.81 m/s^2.
Deprecated since version 0.15.1: Use raw_conversions instead.
- read_csv_kwargs : {None, dict}, optional
Dictionary of additional keyword arguments for pandas.read_csv.
- ext_error : {“warn”, “raise”, “skip”}, optional
What to do if the file extension does not match the expected extension (.csv). Default is “warn”. “raise” raises a ValueError. “skip” skips the file reading altogether and attempts to continue with the pipeline.
Deprecated since version 0.14.0: bases and periods were removed in favor of having windowing be its own class, skdh.preprocessing.GetDayWindowIndices.
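The parameters above can be combined as in the following minimal sketch. The CSV column headers ("timestamp", "ax", "ay", "az") and the helper name read_csv_file are hypothetical, chosen only for illustration; the keyword names come from the class signature above. The skdh import is deferred so that the settings can be inspected without the package installed.

```python
from pathlib import Path

# Hypothetical CSV layout: a "timestamp" column holding unix seconds and
# three accelerometer columns "ax", "ay", "az" (all names are assumptions).
reader_kwargs = {
    "time_col_name": "timestamp",
    "column_names": {"accel": ["ax", "ay", "az"]},
    "to_datetime_kwargs": {"unit": "s"},  # forwarded to pandas.to_datetime
    "fill_gaps": True,                    # documented default
    "gaps_error": "raise",                # documented default
    "ext_error": "warn",                  # documented default
}


def read_csv_file(path):
    """Read one CSV file with skdh.io.ReadCSV using the settings above."""
    # Imported lazily so the configuration can be built without skdh installed.
    from skdh.io import ReadCSV

    reader = ReadCSV(**reader_kwargs)
    return reader.predict(file=Path(path))
```

Leaving fill_gaps, gaps_error, and ext_error at their documented defaults is equivalent to omitting them; they are spelled out here only to make the behavior explicit.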
Methods
- convert_timestamps(t)
Convert a timestamp or array of timestamps to a datetime object.
- handle_timestamp_inconsistency_np(fill_dict, ...)
Handle any time gaps, or timestamps that are only down to the second.
- predict(*, file)
Read the data from a comma-separated value (CSV) file.
- save_results(results, file_name)
Save the results of the processing pipeline to a CSV file.
- trim_data(trim_start_factor, start_key, ...)
Trim data based on either a start factor or start/end keys.
- trim_data_factor(trim_start_factor, *, time, ...)
Trim raw data based on a factor of seconds.
- trim_data_time(start_key, end_key, tz_name, ...)
Trim raw data based on provided date-times.
- handle_gaps_error
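The arithmetic behind trimming to a factor of seconds, as described for trim_start_factor, can be sketched as follows. This is one reading of "the next multiple", not the library's implementation: the helper next_multiple is hypothetical, and it assumes a start time that already falls on a multiple is left unchanged.

```python
import math


def next_multiple(start_unix_s, factor_s):
    """Round a start time (in seconds) up to the next multiple of factor_s.

    Assumption: a start time already on a multiple is kept as-is.
    """
    return math.ceil(start_unix_s / factor_s) * factor_s


# A recording that starts 43207 s after midnight (12:00:07), trimmed with a
# factor of 15 seconds, would begin at 43215 s (12:00:15).
start = 43207
trimmed = next_multiple(start, 15)

# At a hypothetical sampling rate of 50 Hz, this discards 8 s of samples.
samples_dropped_at_50hz = (trimmed - start) * 50
```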
Notes
For column_names, valid keys are:
accel
gyro
ecg
temperature
For a key, either strings or lists of strings are accepted. If multiple columns are provided for different axes, they are assumed to be in X, Y, Z order.
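As a sketch of the accepted forms, the dictionary below mixes a list of axis columns with a single string column, and pairs it with a raw_conversions dictionary that uses the same keys. The CSV headers and the helpers as_list and convert are hypothetical and exist only to illustrate the description; they are not part of skdh.

```python
# Hypothetical CSV headers; only the dictionary keys come from the list above.
column_names = {
    "accel": ["ax", "ay", "az"],  # three axes, given in X, Y, Z order
    "temperature": "temp_c",      # a single column given as a plain string
}

# Keys mirror those of column_names; each stream is divided by its factor.
# Assumption: the accelerometer columns are stored in m/s^2.
raw_conversions = {"accel": 9.81}


def as_list(cols):
    """Normalize a column spec to a list of column names (illustrative helper)."""
    return [cols] if isinstance(cols, str) else list(cols)


def convert(sample, factor=1.0):
    """Divide each raw value by the conversion factor (1.0 means no change)."""
    return [value / factor for value in sample]


flat_columns = {key: as_list(cols) for key, cols in column_names.items()}
accel_g = convert([0.0, 0.0, 9.81], raw_conversions["accel"])
```

With these inputs, a sample of 9.81 m/s^2 on the Z axis becomes 1.0 g after conversion.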
To handle windowing, data gap filling, or timestamp interpolation when timestamps are only resolved to the second (i.e. ActiGraph CSV files), the time column is always first converted to a datetime64 Series via pandas.to_datetime. To make sure this conversion is applied correctly, pass any required keyword arguments through to_datetime_kwargs. This includes specifying the unit (e.g. s, ms, us, ns) if the timestamps are unix-time integers.
Default fill values are:
accel: numpy.array([0.0, 0.0, 1.0])
gyro: 0.0
temperature: 0.0
ecg: 0.0
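The reason the unit matters for integer timestamps can be shown with a small sketch. It uses the standard-library datetime module rather than pandas to stay dependency-free, and the integer value is an arbitrary example; the final dictionary is what would be passed as to_datetime_kwargs for a column of unix seconds.

```python
from datetime import datetime, timezone

# An integer timestamp as it might appear in the time column (example value).
raw = 1_700_000_000

# The same integer read as seconds versus milliseconds since the unix epoch.
as_seconds = datetime.fromtimestamp(raw, tz=timezone.utc)
as_millis = datetime.fromtimestamp(raw / 1000, tz=timezone.utc)

# Reader setting for a column of unix seconds, forwarded to pandas.to_datetime.
to_datetime_kwargs = {"unit": "s"}
```

Read as seconds the value falls in November 2023; read as milliseconds it falls in January 1970, so an unspecified or wrong unit silently shifts the whole recording.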
- handle_timestamp_inconsistency_np(fill_dict, time, fs, data)
Handle any time gaps, or timestamps that are only down to the second.
- Parameters:
- fill_dict : dict
Dictionary of fill values for columns identified in column_names. In cases where there are multiple columns for a datastream, the last will be filled with the value.
- time : numpy.ndarray
Array of timestamps in unix seconds.
- fs : {None, float}
Sampling frequency. If None, will be calculated from the data.
- data : dict
Dictionary of data streams.
- Returns:
- fs : float
Number of samples per second.
- time : numpy.ndarray
Timestamp array, with timestamps updated to be unique and gaps filled.
- data : dict
Data dictionary with updated arrays, with gaps filled if specified.
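To illustrate what "timestamps that are only down to the second" implies, the sketch below spreads samples that share a whole-second timestamp evenly across that second so that every timestamp becomes unique. This is a simplified illustration in plain Python, not the method's actual algorithm; the function name and the even-spacing rule are assumptions.

```python
from itertools import groupby


def spread_second_timestamps(seconds):
    """Make second-resolution timestamps unique by even spacing within each second.

    Illustrative only: each run of identical timestamps is assumed to cover
    exactly one second of evenly sampled data.
    """
    unique = []
    for second, run in groupby(seconds):
        count = len(list(run))
        unique.extend(second + i / count for i in range(count))
    return unique


# Eight samples stamped with only two distinct seconds, i.e. 4 samples per second.
stamps = [0, 0, 0, 0, 1, 1, 1, 1]
spread = spread_second_timestamps(stamps)
estimated_fs = len(stamps) / len(set(stamps))
```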
- predict(*, file)
Read the data from a comma-separated value (CSV) file.
- Parameters:
- file : {str, Path}
Path to the file to read.
- tz_name : {None, str}, optional
IANA time-zone name for the recording location. If not provided, timestamps will represent local time naively. This means they will not account for any time changes due to Daylight Saving Time.
- Returns:
- data : dict
Dictionary of the time and acceleration data contained in the file. Time will be in unix seconds, and acceleration will be in units of “g”.
- Raises:
- ValueError
If the file name is not provided.
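A minimal usage sketch for predict follows. The result keys "time" and "accel", the CSV column headers, and both helper functions are assumptions made for illustration; the documentation above states only that the returned dictionary contains time in unix seconds and acceleration in units of g. Arguments are passed by keyword because the signature is keyword-only.

```python
def summarize(result):
    """Return (n_samples, duration_s) for a predict()-style result dictionary.

    Assumption: timestamps are stored under the key "time" in unix seconds.
    """
    time = result["time"]
    return len(time), float(time[-1] - time[0])


def read_and_summarize(path, tz_name=None):
    """Read a CSV file with ReadCSV and summarize the returned data."""
    # Imported lazily so summarize() can be used without skdh installed.
    from skdh.io import ReadCSV

    reader = ReadCSV(
        time_col_name="timestamp",                  # hypothetical header
        column_names={"accel": ["ax", "ay", "az"]}, # hypothetical headers
    )
    result = reader.predict(file=path, tz_name=tz_name)
    return summarize(result)


# Stand-in for a returned dictionary: three samples spanning one second.
example = {"time": [0.0, 0.5, 1.0], "accel": [[0.0, 0.0, 1.0]] * 3}
n_samples, duration_s = summarize(example)
```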