pyheartlib.data_rhythm

Module Contents

Classes

RhythmData

Processes ECG records to make a dataset holding records along with metadata about signal excerpts.

ECGSequence

Generates samples of data in batches.

Functions

load_dataset([file_path])

Loads the dataset.

class pyheartlib.data_rhythm.RhythmData(base_path=None, remove_bl=False, lowpass=False, cutoff=45, order=15, progress_bar=True, **kwargs)

Bases: pyheartlib.data.Data, pyheartlib.data.DataSeq

Processes ECG records to make a dataset holding records along with metadata about signal excerpts.

It has a method that generates metadata for signal excerpts using a sliding-window approach. For each excerpt of a signal, the onset, offset, and annotation are recorded. The metadata for an excerpt is a list structured as: [record_id, onset, offset, annotation]. The annotation for an excerpt is a single label. Example metadata for an excerpt: [10, 500, 800, 'AFIB'].

Parameters:
  • base_path (str, optional) – Path of the main directory for storing the original and processed data, by default None

  • remove_bl (bool, optional) – If True, the baseline wander is removed from the original signals prior to extracting excerpts, by default False

  • lowpass (bool, optional) – Whether or not to apply low-pass filter to the original signals, by default False

  • cutoff (int, optional) – Cutoff frequency of the low-pass filter, by default 45

  • order (int, optional) – Order of the low-pass filter, by default 15

  • progress_bar (bool, optional) – Whether to display a progress bar, by default True

  • processors (list, optional) – Ordered list of function names for preprocessing the raw signals. Each function takes a one-dimensional NumPy array as its input and returns an array of the same length.

Example

>>> from pyheartlib.data_rhythm import RhythmData
>>> # Make an instance of RhythmData
>>> rhythm_data = RhythmData(
...     base_path="data", remove_bl=False, lowpass=False,
...     progress_bar=False)
>>> # Define records
>>> train_set = [201, 203]
>>> # Create the dataset
>>> rhythm_data.save_dataset(
...     rec_list=train_set, file_name="train.arr", win_size=3600, stride=64
... )

full_annotate(record)

Returns a signal along with an annotation of the same length.

Parameters:

record (dict) – Record as a dictionary with keys: signal, r_locations, r_labels, rhythms, rhythms_locations.

Returns:

Two items: (signal, full_ann).

First element is the original signal (1D ndarray).

Second element is a list that has the same length as the original signal with rhythm types as its elements: ['(N', 'AFIB', 'AFIB', …].

Return type:

tuple
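
As a rough illustration of what a per-sample annotation looks like, the sketch below expands rhythm labels and their locations into a list as long as the signal. It assumes that each rhythm label holds from its location until the next rhythm location and that samples before the first location receive a placeholder. The names expand_rhythms and fill are hypothetical and not part of pyheartlib; the library may build full_ann differently.

```python
def expand_rhythms(signal_len, rhythms, rhythms_locations, fill="unlab"):
    """Hypothetical helper: one rhythm label per signal sample.

    Assumes each label applies from its location up to the next location.
    """
    full_ann = [fill] * signal_len
    # Append the signal length so the last rhythm runs to the end.
    bounds = list(rhythms_locations) + [signal_len]
    for label, start, stop in zip(rhythms, bounds[:-1], bounds[1:]):
        for i in range(start, min(stop, signal_len)):
            full_ann[i] = label
    return full_ann


# Toy record: 10 samples, normal rhythm from sample 0, AFIB from sample 6.
full_ann = expand_rhythms(10, ["(N", "AFIB"], [0, 6])
print(full_ann)
```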

gen_samples_info(annotated_records, win_size=30 * 360, stride=36, **kwargs)

Generates metadata for signal excerpts.

The metadata are generated using a sliding-window approach. For each excerpt of a signal, the onset, offset, and annotation are recorded. The metadata list for an excerpt is structured as: [record_id, onset, offset, annotation].

Parameters:
  • annotated_records (list) – List of records ([rec1_dict, …]). Each record is a dictionary with keys: signal, r_locations, r_labels, rhythms, rhythms_locations, full_ann.

  • win_size (int, optional) – Sliding window length, by default 30*360

  • stride (int, optional) – Stride of the sliding window, by default 36

Returns:

A nested list. Each inner list is structured as: [record_id, onset, offset, annotation]. E.g.: [[10, 500, 800, 'AFIB'], [10, 700, 900, '(N'], …]

Return type:

list
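
The sliding-window idea can be sketched in a few lines. This is a minimal illustration, not the library's implementation: the function name sliding_window_info is hypothetical, and the rule for choosing the single label of a window (here, the most frequent per-sample label) is an assumption, since the rule is not stated above.

```python
from collections import Counter


def sliding_window_info(record_id, full_ann, win_size, stride):
    """Hypothetical sketch: [record_id, onset, offset, annotation] per window.

    Assumption: the window label is the most frequent per-sample label.
    """
    samples_info = []
    # Only windows that fit entirely inside the signal are kept.
    for onset in range(0, len(full_ann) - win_size + 1, stride):
        offset = onset + win_size
        label = Counter(full_ann[onset:offset]).most_common(1)[0][0]
        samples_info.append([record_id, onset, offset, label])
    return samples_info


# Toy per-sample annotation: 5 normal samples followed by 7 AFIB samples.
full_ann = ["(N"] * 5 + ["AFIB"] * 7
info = sliding_window_info(10, full_ann, win_size=4, stride=4)
print(info)
```

A smaller stride yields more, heavily overlapping excerpts. Because only the four metadata fields are stored per excerpt, this stays cheap in memory.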

class pyheartlib.data_rhythm.ECGSequence(data, samples_info, class_labels=None, batch_size=128, raw=True, interval=36, shuffle=True, rri_output=True, rri_length=150)

Bases: tensorflow.keras.utils.Sequence

Generates samples of data in batches.

The excerpt for each sample is extracted based on the provided metadata. The use of metadata instead of excerpts has the advantage of reducing the RAM requirement, especially when numerous excerpts are required from the raw signals. By using metadata about the excerpts, they are extracted in batches whenever they are needed.
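
To make the memory argument concrete, here is a small stand-in that slices excerpts only when a batch is requested. It is a sketch of the indexing idea, not the ECGSequence implementation: the class name LazyExcerptBatches is hypothetical, signals are kept in a plain dict keyed by record id for simplicity, and counting a final partial batch in the length is an assumption.

```python
import math


class LazyExcerptBatches:
    """Hypothetical stand-in showing on-demand excerpt extraction."""

    def __init__(self, data, samples_info, batch_size):
        self.data = data                  # {record_id: signal}
        self.samples_info = samples_info  # [[record_id, onset, offset, ann], ...]
        self.batch_size = batch_size

    def __len__(self):
        # Number of batches, counting a final partial batch (assumption).
        return math.ceil(len(self.samples_info) / self.batch_size)

    def __getitem__(self, idx):
        lo = idx * self.batch_size
        chunk = self.samples_info[lo:lo + self.batch_size]
        # Excerpts are sliced from the stored signals only at this point.
        batch_x = [self.data[rec][on:off] for rec, on, off, _ in chunk]
        batch_y = [ann for _, _, _, ann in chunk]
        return batch_x, batch_y


signals = {10: list(range(100))}
samples_info = [[10, 0, 4, "(N"], [10, 2, 6, "(N"], [10, 4, 8, "AFIB"]]
batches = LazyExcerptBatches(signals, samples_info, batch_size=2)
batch_x, batch_y = batches[1]
print(len(batches), batch_x, batch_y)
```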

Parameters:
  • data (list) – A list containing a dictionary for each record: [rec1,rec2,….]. Each record is a dictionary with keys: signal, r_locations, r_labels, rhythms, rhythms_locations, full_ann.

  • samples_info (list) – A nested list of metadata for excerpts. For each excerpt, the metadata is structured as a list: [record_id, onset, offset, annotation]. E.g.: [[10, 500, 800, 'AFIB'], …].

  • class_labels (list, optional) – Classes as a list for converting the output annotations to integers such as: ["(N", "(VT"] => [0, 1], by default None

  • batch_size (int, optional) – Number of samples in each batch, by default 128

  • raw (bool, optional) – Whether to return the waveform or the computed features, by default True

  • interval (int, optional) – Interval for sub-segmenting the signal for waveform feature computation, by default 36

  • shuffle (bool, optional) – If True, after each epoch the samples are shuffled, by default True

  • rri_output (bool, optional) – Whether to return RR-intervals and their features. If False, only waveforms are returned, by default True

  • rri_length (int, optional) – Length of the output RR-intervals list. It is zero-padded on the right side, by default 150

Examples

>>> from pyheartlib.data_rhythm import ECGSequence
>>> trainseq = ECGSequence(
...     annotated_records,
...     samples_info,
...     class_labels=None,
...     batch_size=3,
...     raw=True,
...     interval=36,
...     shuffle=False,
...     rri_output=True,
...     rri_length=25
... )

Notes

Indexing an ECGSequence object, as in ECGSequence_object[BatchNo], returns a tuple containing two elements.

The first element (Batch_x) contains data samples and the second one (Batch_y) their associated annotation.

If rri_output is True, Batch_x is a list of NumPy arrays of Batch_wave, Batch_rri, Batch_rri_feat.

Batch_wave contains signal excerpts or their features, Batch_rri contains RR-intervals, and Batch_rri_feat contains RR-interval features.

If rri_output is False, Batch_x contains Batch_wave only.

If raw is False, Batch_wave has the shape of (Batch size, Number of channels, Number of sub-segments, Number of features), otherwise, it has the shape of (Batch size, Number of channels, Length of excerpt).

Batch_y has the shape of (Batch size, ).

__len__()

__getitem__(idx)

Returns:

Contains batch_x and batch_y.

If rri_output is True, batch_x is a list of NumPy arrays of batch_wave, batch_rri, batch_rri_feat. If rri_output is False, batch_x contains batch_wave only.

If raw is False, batch_wave has the shape of (batch_size, #channels, #sub-segments, #features), otherwise, it has the shape of (batch_size, #channels, wave_len).

batch_y has the shape of (batch_size, 1).

Return type:

tuple

on_epoch_end()

Shuffles the samples after each epoch if shuffle is True.

get_integer(ann)

Converts a text label to integer.
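
The mapping described for class_labels (["(N", "(VT"] => [0, 1]) amounts to taking the position of a label in the list. A minimal sketch, with label_to_int as a hypothetical name rather than the library's method:

```python
class_labels = ["(N", "(VT"]


def label_to_int(ann, class_labels):
    """Hypothetical helper: position of the label in class_labels."""
    return class_labels.index(ann)


print(label_to_int("(VT", class_labels))
```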

get_rri(rec_id, start, end)

Computes RR-intervals.
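
One plausible way to obtain a fixed-length RR-interval list for an excerpt is sketched below. It assumes that an RR-interval is the distance in samples between consecutive R-peak locations inside the excerpt, and it applies the right-side zero padding described for rri_length. The function rr_intervals is hypothetical, and the library's handling of excerpt boundaries and units may differ.

```python
def rr_intervals(r_locations, start, end, rri_length):
    """Hypothetical sketch: RR-intervals (in samples) inside [start, end).

    Assumption: an RR-interval is the gap between consecutive R-peaks.
    """
    peaks = [loc for loc in r_locations if start <= loc < end]
    rri = [b - a for a, b in zip(peaks[:-1], peaks[1:])]
    # Truncate if needed, then zero-pad on the right to a fixed length.
    rri = rri[:rri_length]
    return rri + [0] * (rri_length - len(rri))


rri = rr_intervals([50, 340, 650, 930, 1500], start=0, end=1000, rri_length=6)
print(rri)
```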

compute_rri_features(rri_array)

Computes some statistical features for RR-intervals.
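
The exact feature set is not listed here, so the following is only an illustration of the kind of statistics that could be computed over a zero-padded RR-interval list. The function rri_stats and the chosen features (mean, standard deviation, median, range) are assumptions and may not match what the library returns.

```python
import statistics


def rri_stats(rri):
    """Hypothetical feature set; the library's actual features may differ."""
    valid = [v for v in rri if v > 0]  # ignore the zero padding
    if not valid:
        return {"mean": 0.0, "std": 0.0, "median": 0.0, "range": 0.0}
    return {
        "mean": statistics.fmean(valid),
        "std": statistics.pstdev(valid),
        "median": float(statistics.median(valid)),
        "range": float(max(valid) - min(valid)),
    }


feats = rri_stats([290, 310, 280, 0, 0, 0])
print(feats)
```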

get_rri_features_names()

Returns the RR-interval feature names.

compute_wf_feats(seq)

Computes waveform features.
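
The interval parameter suggests that an excerpt is cut into sub-segments before features are computed, which is what gives the (sub-segments, features) part of the output shape when raw is False. The sketch below assumes consecutive, non-overlapping sub-segments of interval samples with any incomplete tail dropped, and uses max, min, and mean as placeholder features. Both the function subsegment_features and the feature choice are hypothetical.

```python
def subsegment_features(seq, interval):
    """Hypothetical sketch: per-sub-segment features of a 1D excerpt.

    Assumption: consecutive, non-overlapping pieces of `interval` samples;
    an incomplete tail is dropped.
    """
    n_segments = len(seq) // interval
    feats = []
    for k in range(n_segments):
        seg = seq[k * interval:(k + 1) * interval]
        feats.append([max(seg), min(seg), sum(seg) / len(seg)])
    return feats  # nested list with shape (n_segments, n_features)


feats = subsegment_features([0, 2, 4, 1, 1, 1, 9], interval=3)
print(feats)
```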

get_wf_feats_names()

Returns the waveform feature names.

pyheartlib.data_rhythm.load_dataset(file_path=None)

Loads the dataset.

Parameters:

file_path (str, optional) – Path of the dataset, by default None