R-peak Detection

This example contains the code for training a deep learning model for the R-peak detection task. In this example, the electrocardiogram signals are segmented into ten-second ecxerpts. Each excerpt is associated with an annotation list of length 600, containing zeros and ones. A subsegment of an excerpt that contains an R-peak corresponds to 1 in the annotation list. To achieve this goal, the RpeakData class is used to create the required train, validation, and test datasets. Each dataset consists of signals and annotations, as well as metadata about the excerpts. The metadata for each excerpt includes the record ID, the onset and offset of the excerpt on the raw signal, and the annotation.

This example is availble on GitHub.

Install Pyheartlib

try:
    import pyheartlib
    print(f'Pyheartlib version {pyheartlib.__version__} is already installed!')
except ModuleNotFoundError:
    print('Installing pyheartlib...')
    %pip install pyheartlib
    import pyheartlib
    print(f'Pyheartlib version {pyheartlib.__version__} is installed!')

Pyheartlib version 1.21.0 is already installed!

Download raw data

This section downloads the raw data and store them in the data directory.

Create dataset

The raw dataset contains several ECG records, each of which is 30 minutes long. In order to create the required datasets, RpeakData class is used. The datasets comprise the signals and their annotations, as well as the metadata required to extract ten-second signal excerpts. RpeakData class utilizes a sliding window to determine the onset and offset of the signal excerpts. The length of the excerpts is determined by the win_size parameter, and the degree of overlap between excerpts is determined by the stride parameter. Here, 18 records are used for the train dataset, 4 for validation, and 22 for the test dataset.

import os

from pyheartlib.data_rpeak import ECGSequence, RpeakData
from pyheartlib.data_rpeak import load_dataset

data_dir = "data"
# Make an instance of the RpeakData
rpeak_data = RpeakData(
    base_path=data_dir, remove_bl=False, lowpass=False, progress_bar=False
)

# Define record names
train_set = rpeak_data.config["DS1"][:18]
validation_set = rpeak_data.config["DS1"][18:]
test_set = rpeak_data.config["DS2"]

# Create train dataset
# The win_size specifies the length of the excerpts
rpeak_data.save_dataset(
    rec_list=train_set,
    file_name="train.rpeak",
    win_size=10 * 360,
    stride=1440,
    interval=6,
)

# Create validation dataset
rpeak_data.save_dataset(
    rec_list=validation_set,
    file_name="validation.rpeak",
    win_size=10 * 360,
    stride=1440,
    interval=6,
)

# Create test dataset
rpeak_data.save_dataset(
    rec_list=test_set,
    file_name="test.rpeak",
    win_size=10 * 360,
    stride=1440,
    interval=6,
)

File saved at: data/train.rpeak
File saved at: data/validation.rpeak
File saved at: data/test.rpeak

# Verify the dataset

# Load train dataset
train_data = os.path.join(data_dir, "train.rpeak")
annotated_records, samples_info = load_dataset(train_data)
print("Number of samples in the train dataset:", str(len(samples_info)))
print("\nExample metadata for a data sample ([record ID,onset,offset,annotation]):")
print(samples_info[1])

File loaded from: data/train.rpeak
Number of samples in the train dataset: 8082

Example metadata for a data sample ([record ID,onset,offset,annotation]):
[0, 1440, 5040, [0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 'N', 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 'N', 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 'N', 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 'N', 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 'N', 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 'N', 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 'N', 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 'N', 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 'N', 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 'N', 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 'N', 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0]]

# Verify the dataset - continue

# Generate data in batch using he ECGSequence
trainseq = ECGSequence(
    annotated_records,
    samples_info,
    binary=True,
    batch_size=1,
    raw=True,
    interval=6,
)

bt = 0  # Batch number
batch_annotation = trainseq[bt][1]  # Annotation
batch_wave = trainseq[bt][0]  # Excerpt values


print("Length of annotation for each sample data:", len(batch_annotation[0]))
print("Length of each sample excerpt:", batch_wave.shape[2])
print(
    "Each excerpt is associated with an annotation of length 600, containing zeros and ones."
)

Length of annotation for each sample data: 600
Length of each sample excerpt: 3600
Each excerpt is associated with an annotation of length 600, containing zeros and ones.

Plot an excerpt.

Length of each excerpt: 3600

../_images/73017272685d99a0ea26b8360c5e92e04b2dbb940215f9e0cdf2a852ba72252d.png

Model architecture

The input to this example deep learning model is a signal excerpt of length 3600 (10 seconds with a sampling rate of 360), and the output is an array of length 600 containing probabilities of a binary classification. A 1 in the classification output indicates that there is an R-peak in the relevant subsegment, and a 0 indicates that there is no R-peak in that subsegment. Only the first channel (MLII) is used.

Show code cell content Hide code cell content

# Model architecture
import tensorflow as tf
from tensorflow.keras.layers import (
    LSTM,
    Activation,
    BatchNormalization,
    Bidirectional,
    Conv1D,
    Dense,
    Dropout,
    Flatten,
    Input,
    MaxPool1D,
)


def reg():
    return tf.keras.regularizers.l2(l=0.01)


def conv1d_block(
    inp,
    name=None,
    filters=64,
    kernel_size=64,
    bn=True,
    drate=0.30,
    pool_size=0,
    flatten=True,
    regularizer=None,
):
    output = Conv1D(
        filters=filters,
        kernel_size=kernel_size,
        strides=1,
        padding="same",
        activation=None,
        kernel_regularizer=regularizer,
    )(inp)
    if bn:
        output = BatchNormalization(axis=-1)(output)
    output = Activation("relu")(output)
    if drate > 0:
        output = Dropout(drate)(output)
    if pool_size > 0:
        output = MaxPool1D(pool_size=pool_size)(output)
    if flatten:
        output = Flatten()(output)
    return output


def model_arch(params_model):
    x_input_dim = int(params_model["x_input_dim"])
    num_classes = int(params_model["num_classes"])
    input1_layer = Input(shape=(None, x_input_dim), name="x_input_dim")
    out = tf.gather(input1_layer, tf.constant([0]), axis=1)
    out = tf.squeeze(input=out, axis=1)
    out = tf.expand_dims(out, axis=-1)
    out = BatchNormalization()(out)
    out = conv1d_block(
        out,
        name="block1",
        filters=16,
        kernel_size=6,
        bn=True,
        drate=0.2,
        pool_size=6,
        flatten=False,
    )
    out = Activation("relu")(out)
    out = Bidirectional(LSTM(32, return_sequences=True))(out)
    out = Dropout(0.2)(out)
    out = Bidirectional(LSTM(32, return_sequences=True))(out)
    out = Dropout(0.2)(out)
    out = Dense(64, activation="relu")(out)
    out = Dropout(0.2)(out)
    out = Dense(num_classes, activation="softmax")(out)
    return tf.keras.Model(inputs=input1_layer, outputs=out, name="model_rpeak")

Train model

The example model is trained in this section. The train dataset is used for training purposes, while the validation dataset is utilized for validation purposes. The loss function is the sparse categorical cross entropy, and the Adam optimizer is used for optimization. Each batch contains 32 samples, and the model is trained for three epochs.

Show code cell content Hide code cell content

import os

from pyheartlib.extra.utils import reset_seed
from pyheartlib.data_rpeak import load_dataset

os.environ["TF_CPP_MIN_LOG_LEVEL"] = "3"
import tensorflow as tf  # noqa: E402

data_dir = "data"
train_data = os.path.join(data_dir, "train.rpeak")
val_data = os.path.join(data_dir, "validation.rpeak")
mdl_checkpoint = os.path.join(os.getcwd(), "checkpoint/keras.exp")

interval_value = 6

reset_seed()

# Load train dataset
annotated_records, samples_info = load_dataset(train_data)
print("Train dataset loaded. Number of sampels:", str(len(samples_info)))

labels = []
for sample in samples_info:
    labels.append(sample[3])

params_model = {
    "x_input_dim": samples_info[0][2] - samples_info[0][1],  # win size
    "out_seq_len": len(labels[0]),
    "num_classes": 2,
    "regularizer": None,
}
params_train = {"batch_size": 32, "epochs": 3}

mymodel = model_arch(params_model)
opt = tf.keras.optimizers.Adam(
    learning_rate=0.005, beta_1=0.9, beta_2=0.999, epsilon=1e-07, name="Adam"
)
mymodel.compile(
    optimizer=opt,
    loss="sparse_categorical_crossentropy",
    metrics=["accuracy"],
    loss_weights=None,
)
early_stopping = tf.keras.callbacks.EarlyStopping(
    monitor="val_loss", min_delta=0, patience=8, verbose=1
)
reduce_lr = tf.keras.callbacks.ReduceLROnPlateau(
    monitor="val_loss",
    factor=0.9,
    patience=5,
    verbose=1,
    mode="auto",
    min_delta=0.0001,
    cooldown=0,
    min_lr=0.0001,
)
model_checkpoint = tf.keras.callbacks.ModelCheckpoint(
    mdl_checkpoint,
    monitor="val_loss",
    save_best_only=True,
    verbose=1,
)
import time  # noqa: E402

batch_sleep = tf.keras.callbacks.LambdaCallback(
    on_batch_end=lambda batch, logs: time.sleep(0.0001)
)

callbacks = [early_stopping, model_checkpoint, reduce_lr]
# callbacks = [batch_sleep]

from pyheartlib.data_rpeak import ECGSequence  # noqa: E402

trainseq = ECGSequence(
    annotated_records,
    samples_info,
    binary=True,
    batch_size=params_train["batch_size"],
    raw=True,
    interval=interval_value,
)

# Load validation dataset
annotated_records_val, samples_info_val = load_dataset(val_data)

validationseq = ECGSequence(
    annotated_records_val,
    samples_info_val,
    binary=True,
    batch_size=params_train["batch_size"],
    raw=True,
    interval=interval_value,
)

# Train model
if not Path("checkpoint").is_dir():
    model_history = mymodel.fit(
        x=trainseq,
        validation_data=validationseq,
        use_multiprocessing=True,
        # workers=2,
        epochs=params_train["epochs"],
        verbose=1,
        callbacks=callbacks,
    )
else:
    print("The model has been trained before!")

Inference

This section performs the inference on the test dataset. Classification report and confusion matrix are stored in a file. Additionally, several plots illustrating the prediction outcomes are saved.

Show code cell content Hide code cell content

# Inference --> result.txt
import os
import textwrap

import numpy as np
import tensorflow as tf
from sklearn.metrics import classification_report, confusion_matrix
from tqdm import tqdm

from pyheartlib.data_rpeak import load_dataset

data_dir = "data"
test_data = os.path.join(data_dir, "test.rpeak")
mdl_checkpoint = os.path.join(os.getcwd(), "checkpoint/keras.exp")
interval_value = 6

# Load trained model
loaded_model = tf.keras.models.load_model(mdl_checkpoint)

# Load test dataset
annotated_records_test, samples_info_test = load_dataset(test_data)
print("Train dataset loaded. Number of sampels:", str(len(samples_info_test)))

from pyheartlib.data_rpeak import ECGSequence  # noqa: E402

batch_size = 128
testseq = ECGSequence(
    annotated_records_test,
    samples_info_test,
    binary=True,
    batch_size=batch_size,
    raw=True,
    interval=interval_value,
    shuffle=False,
)
labels_true = []
labels_pred = []
for i in tqdm(range(round(len(samples_info_test) / batch_size))):
    samples = testseq[i]
    labels_true.extend(samples[1])
    probs = loaded_model.predict(samples[0], verbose=0)  # predict
    labels_pred.extend(list(np.argmax(probs, axis=-1)))
labels_true = np.array(labels_true)
labels_pred = np.array(labels_pred)
assert labels_true.shape == labels_pred.shape

# Print classification_report and confusion_matrix
true = labels_true.flatten()
pred = labels_pred.flatten()
# print(classification_report(true, pred))
# print(confusion_matrix(true, pred))
with open("./result.txt", "w") as f:
    title = (
        "************************************************************"
        "Without considering adjacency of ground truth and prediction"
    )
    print(textwrap.fill(title, width=60, replace_whitespace=False), file=f)
    print("\n", file=f)
    print(classification_report(true, pred), file=f)
    print("Confusion Matrix", file=f)
    print(confusion_matrix(true, pred), file=f)
    print("\n", file=f)


def post_tolerance(true, pred):
    # Give a tolerance for nearby segments (False positive)
    # when a False Positve is very adjacent to ground truth peak
    new_pred = np.copy(pred)
    for indx, _ in enumerate(pred):
        if pred[indx] == 1:
            if true[indx] != 1:
                if true[indx + 1] == 1:
                    new_pred[indx + 1] = 1
                    new_pred[indx] = 0
                elif true[indx - 1] == 1:
                    new_pred[indx - 1] = 1
                    new_pred[indx] = 0
                elif true[indx + 2] == 1:
                    new_pred[indx + 2] = 1
                    new_pred[indx] = 0
                elif true[indx - 2] == 1:
                    new_pred[indx - 2] = 1
                    new_pred[indx] = 0
                if true[indx + 3] == 1:
                    new_pred[indx + 3] = 1
                    new_pred[indx] = 0
                elif true[indx - 3] == 1:
                    new_pred[indx - 3] = 1
                    new_pred[indx] = 0
                if true[indx + 4] == 1:
                    new_pred[indx + 4] = 1
                    new_pred[indx] = 0
                elif true[indx - 4] == 1:
                    new_pred[indx - 4] = 1
                    new_pred[indx] = 0
                if true[indx + 5] == 1:
                    new_pred[indx + 5] = 1
                    new_pred[indx] = 0
                elif true[indx - 5] == 1:
                    new_pred[indx - 5] = 1
                    new_pred[indx] = 0
    return new_pred


# Print classification_report and confusion_matrix
new_pred = post_tolerance(true, pred)
pred = new_pred
# print(classification_report(true, pred))
# print(confusion_matrix(true, pred))
with open("./result.txt", "a") as f:
    title = (
        "************************************************************"
        "The results below were obtained by considering a very small"
        "  tolerance for the adjacency of ground truth and prediction"
    )
    print(textwrap.fill(title, width=60, replace_whitespace=False), file=f)
    print("\n", file=f)
    print(classification_report(true, pred), file=f)
    print("Confusion Matrix", file=f)
    print(confusion_matrix(true, pred), file=f)

File loaded from: data/test.rpeak
Train dataset loaded. Number of sampels: 8082

100%|██████████| 63/63 [01:40<00:00,  1.60s/it]

Results

Inference results on the test dataset.

************************************************************
Without considering adjacency of ground truth and prediction


              precision    recall  f1-score   support

           0       1.00      1.00      1.00   4733482
           1       0.95      0.94      0.94    104918

    accuracy                           1.00   4838400
   macro avg       0.97      0.97      0.97   4838400
weighted avg       1.00      1.00      1.00   4838400

Confusion Matrix
[[4728110    5372]
 [   6540   98378]]


************************************************************
The results below were obtained by considering a very small
tolerance for the adjacency of ground truth and prediction


              precision    recall  f1-score   support

           0       1.00      1.00      1.00   4733482
           1       1.00      0.98      0.99    104918

    accuracy                           1.00   4838400
   macro avg       1.00      0.99      0.99   4838400
weighted avg       1.00      1.00      1.00   4838400

Confusion Matrix
[[4733238     244]
 [   2559  102359]]

../_images/5e691026916594137628e3b03531764b923693c2a42823c16f98a9f46bb3a57d.png