pyheartlib.data_beat

Module Contents

Classes

BeatData

Processes the provided ECG records and creates a dataset containing waveforms, features, and annotations.

class pyheartlib.data_beat.BeatData(base_path=None, win=[60, 120], num_pre_rr=10, num_post_rr=10, remove_bl=False, lowpass=False, cutoff=45, order=15, progress_bar=True, **kwargs)

Bases: pyheartlib.data.Data

Processes the provided ECG records and creates a dataset containing waveforms, features, and annotations.

Parameters:
  • base_path (str, optional) – Path of the main directory for storing the original and processed data, by default None

  • win (list, optional) – [Onset, Offset] of signal excerpts around the R-peaks, by default [60, 120]

  • num_pre_rr (int, optional) – Number of preceding R-peak locations to be included for each beat, by default 10

  • num_post_rr (int, optional) – Number of subsequent R-peak locations to be included for each beat, by default 10

  • remove_bl (bool, optional) – If True, the baseline wander is removed from the original signals prior to extracting excerpts, by default False

  • lowpass (bool, optional) – Whether to apply a low-pass filter to the original signals, by default False

  • cutoff (int, optional) – Cutoff frequency of the low-pass filter, by default 45

  • order (int, optional) – Order of the low-pass filter, by default 15

  • progress_bar (bool, optional) – Whether to display a progress bar, by default True

  • processors (list, optional) – Ordered list of functions’ names for preprocessing the raw signals. Each function takes a one-dimensional NumPy array as its input and returns an array of the same length.
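The contract for a processor (a one-dimensional NumPy array in, an array of the same length out) can be illustrated with a minimal sketch. The function names `remove_mean` and `moving_average` are hypothetical and not part of pyheartlib, and how the library resolves the names listed in `processors` is not shown here.

```python
import numpy as np


def remove_mean(signal):
    """Hypothetical processor: subtract the mean of a 1-D signal."""
    signal = np.asarray(signal, dtype=float)
    return signal - signal.mean()


def moving_average(signal, width=5):
    """Hypothetical processor: smooth a 1-D signal, keeping its length."""
    signal = np.asarray(signal, dtype=float)
    kernel = np.ones(width) / width
    # mode="same" keeps the output as long as the input (for len(signal) >= width)
    return np.convolve(signal, kernel, mode="same")


raw = np.arange(10.0)
# Processors are applied in order, each one feeding the next.
processed = moving_average(remove_mean(raw))
```

Both functions return an array with the same length as their input, which is the only property the parameter description requires.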

Examples

>>> from pyheartlib.data_beat import BeatData
>>> # BeatInfo and DS1 (a list of record IDs) are assumed to be defined already
>>> beatdata = BeatData(base_path="./data", win=[200, 200],
...                     remove_bl=False, lowpass=False,
...                     progress_bar=True)
>>> # create a BeatInfo object
>>> beatinfo = BeatInfo()
>>> # save the dataset file
>>> beatdata.save_dataset_inter(DS1[17:18], beatinfo, file="train.beat")
>>> # load the dataset from file
>>> train_ds = beatdata.load_data(file_name="train.beat")
File loaded from: ./data/train.beat
-Shape of "waveforms" is (2985, 400). Number of samples is 2985.
-Shape of "beat_feats" is (2985, 27). Number of samples is 2985.
-Shape of "labels" is (2985,). Number of samples is 2985.
               N  L  R  j  e  V  E    A  S  a  J  F  f  /  Q
train.beat  2601  0  0  0  0  1  0  383  0  0  0  0  0  0  0

make_frags(signal, r_locations=None, r_label=None)

Fragments one signal into beats and returns the signal excerpts and corresponding labels.

Parameters:
  • signal (list) – A list containing signal values.

  • r_locations (list) – A list containing the R-peak locations on the signal.

  • r_label (list) – A list containing the R-peak (beat) labels.

Returns:

signal_frags (numpy.ndarray) – A 2D array containing the extracted beat excerpts.

beat_types (list) – Contains the label of each beat excerpt.

r_locs (list) – For each beat, a list of the preceding R-peak locations, the beat's own R-peak location, and the subsequent R-peak locations. Can be used for HRV calculations.

s_idxs (list) – Contains the starting point of each extracted beat excerpt on the original signal. This is computed by subtracting the window onset from the R-peak location.

Return type:

Tuple
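The windowing described above can be sketched as follows. This is an illustration under stated assumptions, not the library's implementation: `extract_beats` is a hypothetical helper, each excerpt is assumed to span `[r - onset, r + offset)`, and beats whose window would leave the signal are assumed to be skipped.

```python
import numpy as np


def extract_beats(signal, r_locations, r_labels, win=(60, 120)):
    """Hypothetical sketch: cut one excerpt per R-peak from a 1-D signal."""
    onset, offset = win
    signal = np.asarray(signal)
    frags, labels, starts = [], [], []
    for r, label in zip(r_locations, r_labels):
        start, end = r - onset, r + offset  # start index = R-peak minus onset
        if start < 0 or end > len(signal):
            continue  # assumption: drop beats too close to the signal edges
        frags.append(signal[start:end])
        labels.append(label)
        starts.append(start)
    return np.array(frags), labels, starts


signal = np.arange(1000)
frags, labels, starts = extract_beats(
    signal, [30, 300, 700, 950], ["N", "N", "V", "A"]
)
```

With the default window, each excerpt is `onset + offset = 180` samples long, and the R-peak sits at column `onset` of its row.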

make_dataset(records, beatinfo_obj=None)

Creates a dataset from the provided records.

Parameters:
  • records (list) – A list containing record IDs.

  • beatinfo_obj (instance of BeatInfo) –

Returns:

Dictionary with keys:

“waveforms” (numpy.ndarray) – 2D array of beat waveforms.

“beat_feats” (pandas.DataFrame) – DataFrame of beats’ features.

“labels” (numpy.ndarray) – 1D array of beats’ labels.

Return type:

dict

beat_info_feat(data, beatinfo_obj)

Provides the computed features for all the beats.

Parameters:
  • data (dict) –

    Dictionary with keys:

    “waveform” (numpy.ndarray) – Array of waveforms (#waveforms, len_waveforms, #channels).

    “rpeak_locs” (list) – List of R-peak locations.

    “rec_ids” (list) – List of record IDs.

    “start_idxs” (list) – List of the start indexes of the waveforms on the raw signal.

    “labels” (list) – List of beat labels.

  • beatinfo_obj (Instance of BeatInfo) –

Returns:

features (list) – Contains feature dictionaries for all the beats.

labels (list) – Contains the corresponding beat labels.

Return type:

Tuple

save_dataset_inter(records, beatinfo_obj, file=None)

Creates a dataset from the given record IDs.

Parameters:
  • records (list) – List of record IDs.

  • beatinfo_obj (instance of BeatInfo) –

  • file (str) – Name of the file that will be saved.

save_dataset_intra(records, beatinfo_obj, split_ratio=0.3, file_prefix='intra')

Creates the dataset in an intra-patient way.

Parameters:
  • records (list) – List of record IDs.

  • beatinfo_obj (instance of BeatInfo) –

  • split_ratio (float, optional) – Ratio of test set, by default 0.3

  • file_prefix (str, optional) – Prefix for the file names to be saved, by default ‘intra’

save_dataset_single(record, beatinfo_obj, split_ratio=0.3, file=None)

Saves the signal fragments and their labels into a file for a single record.

Parameters:
  • record (str) – Record id.

  • beatinfo_obj (instance of BeatInfo) –

  • split_ratio (float, optional) – Ratio of test set, by default 0.3

  • file (str, optional) – Name of the file to be saved, by default None

load_data(file_name)

Loads a file containing a dataframe.

Parameters:

file_name (str) – File name. The final path is the join of the base path and the file name.

Returns:

Dataset with keys: “waveforms”, “beat_feats”, and “labels”.

Return type:

dict

report_stats(yds_list)

Counts the number of samples for each label type in the data.

Parameters:

yds_list (list) – List containing several label data sets, e.g. train, val, test.

Returns:

A list of dictionaries, one per data set. Keys are label types (symbols) and values are the counts of each symbol.

Return type:

list
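The return structure (one dictionary of symbol counts per label set) can be sketched with the standard library. The helper `count_labels` is hypothetical; unlike the table in the class example, it only lists symbols that actually occur.

```python
from collections import Counter


def count_labels(yds_list):
    """Hypothetical sketch: count label symbols in each label set."""
    return [dict(Counter(labels)) for labels in yds_list]


train_labels = ["N", "N", "V", "A", "N"]
test_labels = ["N", "V", "V"]
stats = count_labels([train_labels, test_labels])
```

Here `stats[0]` holds the counts for the first label set and `stats[1]` those for the second.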

report_stats_table(yds_list, name_list=[])

Returns the number of samples for each label type in the data.

Parameters:
  • yds_list (list) – List containing several label data sets, e.g. train, val, test.

  • name_list (list, optional) – A list of strings naming the label data sets, e.g. train, val, test, by default []

Returns:

A dataframe containing symbols and their counts.

Return type:

pandas.DataFrame

per_record_stats(rec_ids_list=None, cols=None)

Returns a dataframe containing the number of beats of each label type in each record.

Parameters:
  • rec_ids_list (list, optional) – List of record ids, by default None

  • cols (list, optional) – List of label classes, by default None

Returns:

DataFrame containing the count of each label type.

Return type:

pandas.DataFrame

slice_data(ds, labels)

Returns the data according to the provided annotation list.

Parameters:
  • ds (dict) – Dataset with keys: “waveforms”, “beat_feats”, and “labels”.

  • labels (list) – List of labels to be kept in the output.

Returns:

Dataset with keys: “waveforms”, “beat_feats”, and “labels”.

Return type:

dict
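Keeping only the beats with selected labels amounts to applying one boolean mask to every entry of the dataset. The sketch below assumes, for simplicity, that all three entries are NumPy arrays (the real “beat_feats” is described as a DataFrame), and `keep_labels` is a hypothetical helper.

```python
import numpy as np


def keep_labels(ds, labels):
    """Hypothetical sketch: keep only the rows whose label is in `labels`."""
    mask = np.isin(ds["labels"], labels)
    # The same mask is applied to every entry so rows stay aligned.
    return {key: np.asarray(value)[mask] for key, value in ds.items()}


ds = {
    "waveforms": np.zeros((4, 180)),
    "beat_feats": np.arange(8).reshape(4, 2),
    "labels": np.array(["N", "V", "N", "A"]),
}
sliced = keep_labels(ds, ["N", "A"])
```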

search_label(inp, sym='N')

Searches the provided data and returns the indexes for a particular label.

Parameters:
  • inp (dict or numpy.ndarray) – Input can be a dictionary having a ‘labels’ key, or a 1D numpy array containing labels.

  • sym (str, optional) – The label to be searched for in the dataset, by default ‘N’

Returns:

A list of indexes corresponding to the searched label.

Return type:

list

Raises:

TypeError – Input data must be a dictionary or a numpy array.
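The documented behaviour (accept a dictionary with a “labels” key or a 1D array, return matching indexes, raise `TypeError` otherwise) can be sketched as below. `find_label` is a hypothetical stand-in written for illustration, not the library's code.

```python
import numpy as np


def find_label(inp, sym="N"):
    """Hypothetical sketch: indexes of the entries equal to `sym`."""
    if isinstance(inp, dict):
        labels = np.asarray(inp["labels"])
    elif isinstance(inp, np.ndarray):
        labels = inp
    else:
        raise TypeError("Input data must be a dictionary or a numpy array.")
    return np.flatnonzero(labels == sym).tolist()


labels = np.array(["N", "V", "N", "A"])
normal_idx = find_label(labels)                    # search a bare array
ventricular_idx = find_label({"labels": labels}, sym="V")  # search a dataset dict
```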

clean_inf_nan(ds)

Cleans the dataset by removing samples (rows) with inf or nan in computed features.

Parameters:

ds (dict) – Dataset with keys: “waveforms”, “beat_feats”, and “labels”.

Returns:

Cleaned dataset with keys: “waveforms”, “beat_feats”, and “labels”.

Return type:

dict
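Removing rows with inf or nan in the features can be sketched with `numpy.isfinite`. The helper `drop_non_finite` is hypothetical, and the sketch assumes every dataset entry is a NumPy array with numeric features.

```python
import numpy as np


def drop_non_finite(ds):
    """Hypothetical sketch: drop samples whose features contain inf or nan."""
    feats = np.asarray(ds["beat_feats"], dtype=float)
    keep = np.isfinite(feats).all(axis=1)  # True where the whole row is finite
    return {key: np.asarray(value)[keep] for key, value in ds.items()}


ds = {
    "waveforms": np.zeros((4, 180)),
    "beat_feats": np.array([[1.0, 2.0], [np.nan, 1.0], [3.0, np.inf], [4.0, 5.0]]),
    "labels": np.array(["N", "V", "A", "N"]),
}
cleaned = drop_non_finite(ds)
```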

clean_IQR(ds, factor=1.5, return_indexes=False)

Cleans the dataset by removing outliers using the IQR method.

Parameters:
  • ds (dict) – Dataset with keys: “waveforms”, “beat_feats”, and “labels”.

  • factor (float, optional) – Parameter of IQR method, by default 1.5

  • return_indexes (bool, optional) – If True, returns the indexes of the outliers; otherwise returns the cleaned dataset, by default False

Returns:

Cleaned dataset with keys: “waveforms”, “beat_feats”, and “labels”, or indexes of outliers.

Return type:

dict or list
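The IQR rule flags a value as an outlier when it lies below Q1 − factor·IQR or above Q3 + factor·IQR, where IQR = Q3 − Q1. The sketch below applies this rule to every feature column and flags a row if any of its features is outside the fences. That per-column choice, and the helper name `iqr_outlier_rows`, are assumptions made for illustration and may differ from what the library does.

```python
import numpy as np


def iqr_outlier_rows(feats, factor=1.5):
    """Hypothetical sketch: indexes of rows with any feature outside the IQR fences."""
    feats = np.asarray(feats, dtype=float)
    q1 = np.percentile(feats, 25, axis=0)
    q3 = np.percentile(feats, 75, axis=0)
    iqr = q3 - q1
    low, high = q1 - factor * iqr, q3 + factor * iqr
    outside = (feats < low) | (feats > high)
    return np.flatnonzero(outside.any(axis=1)).tolist()


feats = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])
outliers = iqr_outlier_rows(feats)
```

For this single feature, Q1 = 2 and Q3 = 4, so the fences are −1 and 7 and only the last row is flagged.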

clean_IQR_class(ds, factor=1.5)

Cleans the dataset using the IQR method, applied to each class separately.

Parameters:
  • ds (dict) – Dataset with keys: “waveforms”, “beat_feats”, and “labels”.

  • factor (float, optional) – Parameter of IQR method, by default 1.5

Returns:

Cleaned dataset with keys: “waveforms”, “beat_feats”, and “labels”.

Return type:

dict

append_ds(ds1, ds2)

Appends two datasets.

Parameters:
  • ds1 (dict) – Dataset with keys: “waveforms”, “beat_feats”, and “labels”.

  • ds2 (dict) – Dataset with keys: “waveforms”, “beat_feats”, and “labels”.

Returns:

Dataset with keys: “waveforms”, “beat_feats”, and “labels”.

Return type:

dict
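Appending two datasets means concatenating each entry along the sample axis. The sketch below assumes both datasets hold NumPy arrays under the same keys (the real “beat_feats” is described as a DataFrame), and `concat_datasets` is a hypothetical helper.

```python
import numpy as np


def concat_datasets(ds1, ds2):
    """Hypothetical sketch: stack two datasets entry by entry along axis 0."""
    return {key: np.concatenate([ds1[key], ds2[key]], axis=0) for key in ds1}


ds1 = {
    "waveforms": np.zeros((2, 180)),
    "beat_feats": np.zeros((2, 3)),
    "labels": np.array(["N", "V"]),
}
ds2 = {
    "waveforms": np.ones((3, 180)),
    "beat_feats": np.ones((3, 3)),
    "labels": np.array(["A", "N", "N"]),
}
merged = concat_datasets(ds1, ds2)
```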