Rpeak

This example shows how to use the RpeakData class to create a dataset. The class is similar to RhythmData, but instead of a single label per excerpt it provides a list of labels. Each element of this annotation list corresponds to a sub-segment of the excerpt. The ECGSequence class then uses the created dataset to generate batches of sample data.

This example is available on GitHub.

Install pyheartlib

First, pyheartlib needs to be installed.

try:
    import pyheartlib
    print(f'Pyheartlib version {pyheartlib.__version__} is already installed!')
except ModuleNotFoundError:
    print('Installing pyheartlib...')
    %pip install pyheartlib
    import pyheartlib
    print(f'Pyheartlib version {pyheartlib.__version__} is installed!')

Pyheartlib version 1.21.0 is already installed!

Download raw data

Pyheartlib supports the WFDB format. A popular dataset that uses this format is the “MIT-BIH Arrhythmia Database”. The code below downloads this dataset and stores it in the data directory.
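
The download code is not reproduced in this text. A minimal sketch of one way to fetch the records follows, assuming the separate wfdb package is installed (pip install wfdb) and using its dl_database function with the PhysioNet identifier "mitdb". The helper name download_mitdb is hypothetical, and pyheartlib may additionally expect a configuration file in the data directory, which this sketch does not create.

```python
from pathlib import Path


def download_mitdb(dl_dir="data"):
    """Download the MIT-BIH Arrhythmia Database into dl_dir (hypothetical helper)."""
    target = Path(dl_dir)
    # Skip the download when record files are already present.
    if target.is_dir() and any(target.glob("*.dat")):
        print(f"Raw data already present in {target}/")
        return target

    # Imported lazily so that wfdb is only required when a download is needed.
    import wfdb

    print("Downloading raw data...")
    # "mitdb" is the PhysioNet identifier of the MIT-BIH Arrhythmia Database.
    wfdb.dl_database("mitdb", dl_dir=str(target))
    return target


# Uncomment to run (requires network access):
# download_mitdb("data")
```

The call is left commented out so that the block can be pasted without triggering a download of the full database.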

Create dataset

To create a dataset with RpeakData, first import the class.

from pyheartlib.data_rpeak import RpeakData

The next step is to create an instance of pyheartlib.data_rpeak.RpeakData.

rpeak_data = RpeakData(
    base_path="data", remove_bl=False, lowpass=False, progress_bar=False
)

Descriptions of all the parameters can be found in the pyheartlib API documentation for RpeakData.

The save_dataset() method creates the dataset and saves it to a file.

rpeak_data.save_dataset(
    rec_list=train_set,
    file_name="train.rpeak",
    win_size=5 * 360,
    stride=360,
    interval=72,
)

Let’s create the dataset by running the next code block.

import numpy as np
import pandas as pd
from pyheartlib.data_rpeak import RpeakData

# Make an instance of the RpeakData
rpeak_data = RpeakData(
    base_path="data", remove_bl=False, lowpass=False, progress_bar=False
)

# Define records
train_set = [201, 203]

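# Create the dataset (MIT-BIH records are sampled at 360 Hz)
rpeak_data.save_dataset(
    rec_list=train_set,
    file_name="train.rpeak",
    win_size=5 * 360,  # Length of each excerpt in samples (5 s)
    stride=360,  # Step between consecutive excerpts in samples (1 s)
    interval=72,  # Length of each sub-segment in samples (0.2 s)
)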

File saved at: data/train.rpeak

Now that the dataset is ready, it can be loaded using the load_dataset() function.

from pyheartlib.data_rpeak import load_dataset
annotated_records, samples_info = load_dataset("data/train.rpeak")

Let’s load the data and count the number of samples for each class.

# Load the dataset
from pyheartlib.data_rpeak import load_dataset
annotated_records, samples_info = load_dataset("data/train.rpeak")

labels = []
for sample in samples_info:
    labels.append(sample[3])
df = pd.DataFrame(np.unique(labels, return_counts=True), index=["Label", "Count"])
print(df)

File loaded from: data/train.rpeak
           0    1   2  3      4   5     6    7   8
Label      0    A   F  J      N   Q     V    a   j
Count  65388  150  15  5  20718  20  3210  494  50
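
Because the annotation of each sample is itself a list, the counts above are taken over all sub-segment labels rather than over excerpts. An equivalent tally can be sketched with collections.Counter. The toy_samples_info entries below are made-up stand-ins with four elements each (record ID, onset, offset, annotation), not real data.

```python
from collections import Counter
from itertools import chain

# Made-up metadata with the layout [record ID, onset, offset, annotation].
toy_samples_info = [
    [0, 0, 360, [0, "N", 0, 0, "N"]],
    [0, 72, 432, ["N", 0, 0, "V", 0]],
]

# Flatten the per-excerpt annotation lists and count every label.
# Labels are converted to str so that the integer 0 and "N" can be sorted together.
label_counts = Counter(
    str(label) for label in chain.from_iterable(s[3] for s in toy_samples_info)
)

for label, count in sorted(label_counts.items()):
    print(label, count)
```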

The metadata of each sample is a list of the form [record ID, onset, offset, annotation], where annotation is the list of sub-segment labels.

# Metadata of a sample excerpt
# [record ID, onset, offset, annotation]
print(samples_info[102])

[0, 36720, 38520, [0, 0, 0, 0, 'N', 0, 'N', 0, 'N', 0, 0, 0, 'N', 0, 0, 'N', 0, 0, 'N', 0, 0, 0, 'N', 0, 0]]
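
The onset and offset printed above are consistent with excerpts that start at multiples of stride within a record and span win_size samples, and the 25 labels are consistent with one label per interval samples. The sketch below reproduces this arithmetic. The helpers excerpt_bounds and label_subsegments, the peak positions, and the rule that a beat label lands in the sub-segment containing its R-peak are illustrative assumptions, not pyheartlib's implementation.

```python
WIN_SIZE = 5 * 360  # Excerpt length in samples
STRIDE = 360  # Step between excerpt onsets in samples
INTERVAL = 72  # Sub-segment length in samples


def excerpt_bounds(index, win_size=WIN_SIZE, stride=STRIDE):
    """Return (onset, offset), assuming the onset is index * stride."""
    onset = index * stride
    return onset, onset + win_size


def label_subsegments(onset, peaks, win_size=WIN_SIZE, interval=INTERVAL):
    """Build an annotation list with one entry per sub-segment.

    peaks is a list of (sample position, label) pairs; sub-segments
    without a peak keep the label 0.
    """
    annotation = [0] * (win_size // interval)
    for position, label in peaks:
        if onset <= position < onset + win_size:
            annotation[(position - onset) // interval] = label
    return annotation


onset, offset = excerpt_bounds(102)
print(onset, offset)  # 36720 38520

# Hypothetical R-peak positions, chosen only for illustration.
toy_peaks = [(36720 + 300, "N"), (36720 + 900, "V"), (50000, "N")]
annotation = label_subsegments(onset, toy_peaks)
print(len(annotation), annotation)
```

The third toy peak lies outside the excerpt and is therefore ignored, so only two of the 25 entries carry a beat label.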

Generate data samples

In this section, the dataset that was created in the previous section will be used to generate batches of sample data. To accomplish this, an instance of pyheartlib.data_rpeak.ECGSequence must be created.

from pyheartlib.data_rpeak import ECGSequence

ecgseq = ECGSequence(
    annotated_records, samples_info, binary=False, batch_size=2, raw=True, interval=72
)

The ECGSequence takes the annotated_records (ECG records) and the samples_info (metadata) that were loaded previously.

The other parameters of ECGSequence are described in the pyheartlib API documentation.

Let’s generate a batch of data and examine the shapes and values of the samples.

# Generate data in batches using the ECGSequence

# Indexing an ECGSequence object with a batch number, as in
# ECGSequence_object[BatchNo], returns a tuple of two elements.

# The first element (Batch_x) contains the data samples and the
# second one (Batch_y) their associated annotations.

# Batch_x contains the signal excerpts or their features (Batch_wave).

# If `raw` is False, Batch_wave has the shape
# (Batch size, Number of channels, Number of sub-segments, Number of features);
# otherwise, it has the shape (Batch size, Number of channels, Length of excerpt).

# Batch_y has the shape (Batch size, Length of annotation list).

from pyheartlib.data_rpeak import ECGSequence

ecgseq = ECGSequence(
    annotated_records, samples_info, binary=False, batch_size=3, raw=True, interval=72
)
bt = 0  # Batch number
batch_x, batch_y = ecgseq[bt]
batch_annotation = batch_y  # Annotation
batch_wave = batch_x  # Excerpt

print("Length of annotation for each sample data:", len(batch_annotation[0]))
print("Length of each sample excerpt:", batch_wave.shape[2])
print("Batch_wave shape:", batch_wave.shape, ", Batch_annotation shape:", batch_annotation.shape)

Length of annotation for each sample data: 25
Length of each sample excerpt: 1800
Batch_wave shape: (3, 2, 1800) , Batch_annotation shape: (3, 25)
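
Indexing by batch number can be pictured with a small stand-in class. The sketch below is a hypothetical ToySequence written only to illustrate the pattern of slicing samples and labels by batch. It is not ECGSequence, and whether ECGSequence also returns a final partial batch is an assumption to check against its documentation.

```python
import math


class ToySequence:
    """Hypothetical batch container indexed by batch number."""

    def __init__(self, samples, labels, batch_size):
        self.samples = samples
        self.labels = labels
        self.batch_size = batch_size

    def __len__(self):
        # Number of batches, counting a final partial batch.
        return math.ceil(len(self.samples) / self.batch_size)

    def __getitem__(self, batch_no):
        # Slice out the samples and labels that belong to this batch.
        start = batch_no * self.batch_size
        stop = start + self.batch_size
        return self.samples[start:stop], self.labels[start:stop]


seq = ToySequence(list(range(7)), list("abcdefg"), batch_size=3)
for batch_no in range(len(seq)):
    batch_x, batch_y = seq[batch_no]
    print(batch_no, batch_x, batch_y)
```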

# Annotations of the samples in the batch
print(batch_annotation)

[['N' '0' '0' 'N' '0' 'N' '0' '0' 'N' '0' 'N' '0' '0' '0' 'N' '0' '0' 'N'
  '0' 'N' '0' '0' 'N' '0' 'N']
 ['0' 'V' '0' '0' 'N' '0' '0' '0' 'N' '0' '0' 'N' '0' 'N' '0' '0' 'N' '0'
  'V' '0' '0' 'N' '0' '0' '0']
 ['0' '0' 'N' '0' '0' '0' '0' 'N' '0' '0' '0' '0' '0' 'N' '0' '0' '0' '0'
  '0' 'N' '0' '0' '0' '0' '0']]
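
For R-peak detection, only the presence of a beat in each sub-segment may matter, not its type. A minimal sketch of collapsing string labels like those above into a 0/1 mask with NumPy follows. The array is a shortened, made-up example, and reading '0' as "no R-peak in this sub-segment" is an assumption based on the outputs shown here.

```python
import numpy as np

# Shortened, made-up batch of string labels; "0" is assumed to mean no R-peak.
toy_annotation = np.array(
    [
        ["N", "0", "0", "N", "0"],
        ["0", "V", "0", "0", "N"],
    ]
)

# 1 where a sub-segment carries any beat label, 0 elsewhere.
rpeak_mask = (toy_annotation != "0").astype(int)
print(rpeak_mask)

# Number of labeled beats in each excerpt.
print(rpeak_mask.sum(axis=1))
```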

# Plot an output excerpt of the batch
wf = batch_wave[0]
wf_ch1 = wf[0]
wf_ch2 = wf[1]
print(f'Length of each excerpt: {len(wf_ch1)}')

import matplotlib.pyplot as plt
plt.figure(figsize=(6, 3))
plt.plot(wf_ch1);
plt.plot(wf_ch2);
chs = rpeak_data.config["CHANNEL"]
plt.legend([chs[0], chs[1]], loc="upper right");

Length of each excerpt: 1800

[Figure: plot of the two channels of the first excerpt in the batch]

The code below generates a batch with the raw parameter set to False. In this case, the generated batch contains features computed for each sub-segment instead of the raw waveform.

# When raw is set to False
ecgseq = ECGSequence(
    annotated_records, samples_info, binary=False, batch_size=3, raw=False, interval=72
)

bt = 0  # Batch number
batch_x, batch_y = ecgseq[bt]
batch_annotation = batch_y  # Annotation
batch_wave = batch_x  # Features computed for each sub-segment

print("Length of annotation for each sample data:", len(batch_annotation[0]))
print("Batch_wave shape:", batch_wave.shape, ", Batch_annotation shape:", batch_annotation.shape)

Length of annotation for each sample data: 25
Batch_wave shape: (3, 2, 25, 14) , Batch_annotation shape: (3, 25)
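
The raw and feature shapes are related through the sub-segments: 1800 samples split into intervals of 72 give 25 sub-segments, each described by a feature vector. The sketch below illustrates that relationship on random data with four simple statistics per sub-segment. These statistics are placeholders, not the 14 features that pyheartlib computes.

```python
import numpy as np

batch_size, n_channels, win_size, interval = 3, 2, 1800, 72

# Random stand-in for a raw batch of shape (batch, channels, excerpt length).
rng = np.random.default_rng(0)
raw_batch = rng.standard_normal((batch_size, n_channels, win_size))

# Split each excerpt into consecutive sub-segments of `interval` samples.
n_sub = win_size // interval
segments = raw_batch.reshape(batch_size, n_channels, n_sub, interval)

# Placeholder features per sub-segment: min, max, mean, and standard deviation.
features = np.stack(
    [
        segments.min(axis=-1),
        segments.max(axis=-1),
        segments.mean(axis=-1),
        segments.std(axis=-1),
    ],
    axis=-1,
)

print(segments.shape)  # (3, 2, 25, 72)
print(features.shape)  # (3, 2, 25, 4)
```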