Rpeak

This example shows how to use the RpeakData class to create a dataset. RpeakData is similar to the RhythmData class, but instead of providing a single label for each excerpt, it provides a list of labels. Each element of this annotation list corresponds to a sub-segment of the excerpt. The ECGSequence class then uses the created dataset to generate batches of sample data.
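The following plain-Python sketch illustrates per-sub-segment labels. It is not pyheartlib's implementation, and the function `subsegment_labels` and its arguments are hypothetical. It assumes an excerpt of `win_size` samples starting at `onset` is split into consecutive sub-segments of `interval` samples. Each sub-segment takes the symbol of the first R-peak annotation inside it, or 0 if it contains none.

```python
def subsegment_labels(onset, win_size, interval, rpeak_positions, rpeak_symbols):
    """Hypothetical helper: one label per sub-segment of an excerpt.

    A sub-segment gets the symbol of the first R-peak inside it, else 0.
    """
    n_sub = win_size // interval  # number of sub-segments in the excerpt
    labels = [0] * n_sub
    for pos, sym in zip(rpeak_positions, rpeak_symbols):
        rel = pos - onset  # position relative to the start of the excerpt
        if 0 <= rel < n_sub * interval:
            k = rel // interval  # index of the sub-segment containing the peak
            if labels[k] == 0:  # keep only the first R-peak of each sub-segment
                labels[k] = sym
    return labels


# Toy example: a 1800-sample excerpt split into 72-sample sub-segments
labels = subsegment_labels(
    onset=1000,
    win_size=1800,
    interval=72,
    rpeak_positions=[1100, 1400, 1750, 2900],  # 2900 lies outside the excerpt
    rpeak_symbols=["N", "V", "N", "N"],
)
print(len(labels))   # 25
print(labels[:11])   # [0, 'N', 0, 0, 0, 'V', 0, 0, 0, 0, 'N']
```

The rule for a sub-segment that contains more than one R-peak ("keep the first") is an arbitrary choice for the sketch. With 72-sample (0.2 s) sub-segments, this case should be rare.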

This example is available on GitHub.


Install pyheartlib

First, pyheartlib needs to be installed.

try:
    import pyheartlib
    print(f'Pyheartlib version {pyheartlib.__version__} is already installed!')
except ModuleNotFoundError:
    print('Installing pyheartlib...')
    %pip install pyheartlib
    import pyheartlib
    print(f'Pyheartlib version {pyheartlib.__version__} is installed!')
Pyheartlib version 1.21.0 is already installed!

Download raw data

Pyheartlib supports the WFDB format. A popular dataset that uses this format is the “MIT-BIH Arrhythmia Database”. The code below downloads this dataset and stores it in the data directory.

# Download the raw data and store it in the base data directory
from pathlib import Path
if not Path('data').is_dir():
    print('downloading raw data...')
    import io, zipfile
    from urllib.request import urlopen
    url = 'https://www.physionet.org/static/published-projects/mitdb/mit-bih-arrhythmia-database-1.0.0.zip'
    with urlopen(url) as rs:
        zipf = zipfile.ZipFile(io.BytesIO(rs.read()))
        zipf.extractall('data/')

    # Create the config file. For this example, it is downloaded from the original repository
    with urlopen("https://raw.githubusercontent.com/devnums/pyheartlib/main/src/pyheartlib/config.yaml") as file:
        content = file.read().decode()
    with open("data/config.yaml", 'w') as file:
        file.write(content)
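Each WFDB record in the MIT-BIH Arrhythmia Database consists of several files that share a record name, such as a `.hea` header, a `.dat` signal file, and a `.atr` annotation file. The hypothetical helper `list_wfdb_records` below is not part of pyheartlib. It finds record names by recursively searching a directory for header files, which can help when choosing the record IDs passed as `rec_list`. Because the search is recursive, it does not assume any particular folder layout after extraction.

```python
from pathlib import Path
import tempfile


def list_wfdb_records(root):
    """Hypothetical helper: sorted record names found via *.hea header files."""
    return sorted({p.stem for p in Path(root).rglob("*.hea")})


# Demonstration on a temporary directory filled with empty dummy files
with tempfile.TemporaryDirectory() as tmp:
    sub = Path(tmp) / "some_folder"
    sub.mkdir()
    for name in ["201", "203"]:
        for ext in [".hea", ".dat", ".atr"]:
            (sub / f"{name}{ext}").touch()
    records = list_wfdb_records(tmp)

print(records)  # ['201', '203']
```

After the download, `list_wfdb_records("data")` would list every header file in the tree, which may include files other than the main records. The names are returned as strings, whereas `train_set` in this example holds integers.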

Create dataset

To create a dataset with RpeakData, first import the class.

from pyheartlib.data_rpeak import RpeakData

The next step is to create an instance of pyheartlib.data_rpeak.RpeakData.

rpeak_data = RpeakData(
    base_path="data", remove_bl=False, lowpass=False, progress_bar=False
)

Descriptions of all the parameters can be found here.

The dataset is created with the save_dataset() method.

# train_set (a list of record IDs) is defined in the next code block
rpeak_data.save_dataset(
    rec_list=train_set,
    file_name="train.rpeak",
    win_size=5 * 360,
    stride=360,
    interval=72,
)

Let’s create the dataset by running the next code block.

import numpy as np
import pandas as pd
from pyheartlib.data_rpeak import RpeakData

# Make an instance of the RpeakData
rpeak_data = RpeakData(
    base_path="data", remove_bl=False, lowpass=False, progress_bar=False
)

# Define records
train_set = [201, 203]

# Create the dataset
# win_size: length of each excerpt, stride: step between consecutive excerpts,
# interval: length of each sub-segment (all values are in samples)
rpeak_data.save_dataset(
    rec_list=train_set,
    file_name="train.rpeak",
    win_size=5 * 360,
    stride=360,
    interval=72,
)
File saved at: data/train.rpeak
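The arguments of `save_dataset()` are sample counts. MIT-BIH signals are sampled at 360 Hz, so `win_size=5 * 360` gives 5-second excerpts and `stride=360` shifts the window by one second. An `interval` of 72 samples (0.2 s) yields 1800 / 72 = 25 sub-segments per excerpt. The sketch below does this arithmetic and counts the sliding windows in one record. It assumes a window starts every `stride` samples for as long as it fits completely inside the record. Whether pyheartlib handles the record edges exactly this way is an assumption. Each MIT-BIH record contains 650,000 samples.

```python
FS = 360  # MIT-BIH sampling frequency in Hz

win_size = 5 * FS  # excerpt length in samples (5 s)
stride = FS        # step between consecutive excerpts (1 s)
interval = 72      # sub-segment length in samples (0.2 s)

n_subsegments = win_size // interval  # length of each annotation list


def count_windows(record_len, win_size, stride):
    """Number of full windows when sliding by `stride` (assumed edge handling)."""
    if record_len < win_size:
        return 0
    return (record_len - win_size) // stride + 1


record_len = 650_000  # samples per MIT-BIH record
n_windows = count_windows(record_len, win_size, stride)
print(n_subsegments)  # 25
print(n_windows)      # 1801
```

For the two records in `train_set`, this gives 2 × 1801 = 3602 excerpts and 3602 × 25 = 90,050 sub-segment labels. That total agrees with the sum of the label counts printed further down, which supports the assumed edge handling.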

Now that the dataset is ready, it can be loaded using the load_dataset() function.

from pyheartlib.data_rpeak import load_dataset
annotated_records, samples_info = load_dataset("data/train.rpeak")

Let’s load the data and count how often each label occurs across all sub-segments.

# Load the dataset
from pyheartlib.data_rpeak import load_dataset
annotated_records, samples_info = load_dataset("data/train.rpeak")

labels = []
for sample in samples_info:
    labels.append(sample[3])
df = pd.DataFrame(np.unique(labels, return_counts=True), index=["Label", "Count"])
print(df)
File loaded from: data/train.rpeak
           0    1   2  3      4   5     6    7   8
Label      0    A   F  J      N   Q     V    a   j
Count  65388  150  15  5  20718  20  3210  494  50
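These counts are per sub-segment, not per excerpt. The label 0, which marks a sub-segment without an R-peak, is by far the most frequent. The short computation below reuses the printed numbers to show the total number of labels and the share of each one. This imbalance matters when training a classifier on this data.

```python
# Label counts copied from the table printed above
counts = {"0": 65388, "A": 150, "F": 15, "J": 5, "N": 20718,
          "Q": 20, "V": 3210, "a": 494, "j": 50}

total = sum(counts.values())     # number of sub-segment labels
with_peak = total - counts["0"]  # sub-segments that contain an R-peak
print(total, with_peak)          # 90050 24662

# Share of each label, most frequent first
for label, n in sorted(counts.items(), key=lambda kv: kv[1], reverse=True):
    print(f"{label}: {n / total:.2%}")
```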

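The metadata of each sample is a list: [record ID, onset, offset, annotation]. Here, onset and offset are the sample indices where the excerpt starts and ends, and annotation holds one label per sub-segment.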

# Metadata of a sample excerpt
# [record ID, onset, offset, annotation]
print(samples_info[102])
[0, 36720, 38520, [0, 0, 0, 0, 'N', 0, 'N', 0, 'N', 0, 0, 0, 'N', 0, 0, 'N', 0, 0, 'N', 0, 0, 0, 'N', 0, 0]]
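This sketch assumes that sub-segment k of an excerpt covers samples `onset + k * interval` through `onset + (k + 1) * interval - 1`. That layout is consistent with `offset - onset` being 1800 = 25 × 72. Under this assumption, the positions of the annotated sub-segments can be recovered from the metadata alone. The code below applies this to the sample printed above. It is plain Python and does not call pyheartlib.

```python
# Metadata of the sample printed above: [record ID, onset, offset, annotation]
sample = [0, 36720, 38520,
          [0, 0, 0, 0, 'N', 0, 'N', 0, 'N', 0, 0, 0, 'N', 0, 0,
           'N', 0, 0, 'N', 0, 0, 0, 'N', 0, 0]]
interval = 72  # same value that was passed to save_dataset()

rec_id, onset, offset, annotation = sample
assert offset - onset == len(annotation) * interval  # 1800 == 25 * 72

# Start sample (within the record) of each sub-segment that holds an R-peak
peaks = [(onset + k * interval, label)
         for k, label in enumerate(annotation) if label != 0]
print(peaks)
# [(37008, 'N'), (37152, 'N'), (37296, 'N'), (37584, 'N'),
#  (37800, 'N'), (38016, 'N'), (38304, 'N')]
```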

Generate data samples

In this section, the dataset created in the previous section is used to generate batches of sample data. To do this, create an instance of pyheartlib.data_rpeak.ECGSequence.

from pyheartlib.data_rpeak import ECGSequence

ecgseq = ECGSequence(
    annotated_records, samples_info, binary=False, batch_size=2, raw=True, interval=72
)

The ECGSequence takes the annotated_records (ECG records) and the samples_info (metadata) that were loaded previously.

Other parameters that ECGSequence takes can be found here.
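ECGSequence serves batches by index. The class below is a hypothetical sketch of index-based batching in plain Python, not pyheartlib's implementation. It returns the metadata and annotation lists of each batch rather than signal arrays. It also keeps a final partial batch, which is an assumption. Shuffling and feature computation are left out.

```python
import math


class ToyBatchSequence:
    """Hypothetical sketch: serve consecutive slices of samples_info as batches."""

    def __init__(self, samples_info, batch_size):
        self.samples_info = samples_info
        self.batch_size = batch_size

    def __len__(self):
        # Number of batches; a final partial batch is kept (an assumption)
        return math.ceil(len(self.samples_info) / self.batch_size)

    def __getitem__(self, idx):
        if not 0 <= idx < len(self):
            raise IndexError(idx)
        start = idx * self.batch_size
        batch = self.samples_info[start:start + self.batch_size]
        # Return the metadata and the annotation lists of this batch
        return batch, [s[3] for s in batch]


# Seven toy metadata entries in the [record ID, onset, offset, annotation] format
toy_info = [[0, i * 360, i * 360 + 1800, [0] * 25] for i in range(7)]
seq = ToyBatchSequence(toy_info, batch_size=3)
print(len(seq))   # 3
meta, ann = seq[2]
print(len(meta))  # 1
```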

Let’s generate a batch of data and examine the shapes and values of the samples.

# Generate data in batches using the ECGSequence

# Indexing an ECGSequence object as ECGSequence_object[BatchNo]
# returns a tuple containing two elements.

# The first element (Batch_x) contains the data samples and the
# second one (Batch_y) their associated annotations.

# Batch_x contains the signal excerpts or their features (Batch_wave).

# If `raw` is False, Batch_wave has the shape of
# (Batch size, Number of channels, Number of sub-segments, Number of features);
# otherwise, it has the shape of (Batch size, Number of channels, Length of excerpt).

# Batch_y has the shape of (Batch size, Length of annotation list).

from pyheartlib.data_rpeak import ECGSequence

ecgseq = ECGSequence(
    annotated_records, samples_info, binary=False, batch_size=3, raw=True, interval=72
)
bt = 0  # Batch number
batch_x, batch_y = ecgseq[bt]
batch_annotation = batch_y  # Annotation
batch_wave = batch_x  # Excerpt

print("Length of annotation for each sample data:", len(batch_annotation[0]))
print("Length of each sample excerpt:", batch_wave.shape[2])
print("Batch_wave shape:", batch_wave.shape, ", Batch_annotation shape:", batch_annotation.shape)
Length of annotation for each sample data: 25
Length of each sample excerpt: 1800
Batch_wave shape: (3, 2, 1800) , Batch_annotation shape: (3, 25)
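With `raw=True`, each excerpt has 1800 samples per channel, while its annotation list has 25 entries, one for each 72-sample sub-segment. Assuming the sub-segments are consecutive and non-overlapping, the raw batch can be reshaped so that each annotation lines up with its slice of the signal. So that the sketch runs on its own, it uses a random stand-in array, `demo_wave`, with the same shape as `batch_wave`. The separate name avoids overwriting the notebook's variable.

```python
import numpy as np

interval = 72
rng = np.random.default_rng(0)
demo_wave = rng.standard_normal((3, 2, 1800))  # stand-in for the raw batch

n_sub = demo_wave.shape[2] // interval  # 25 sub-segments
# (batch, channels, length) -> (batch, channels, sub-segments, samples per sub-segment)
segments = demo_wave.reshape(demo_wave.shape[0], demo_wave.shape[1], n_sub, interval)
print(segments.shape)  # (3, 2, 25, 72)

# segments[i, c, k] is the slice of channel c described by annotation k of sample i
assert np.array_equal(segments[0, 1, 3], demo_wave[0, 1, 3 * interval:4 * interval])
```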
# Annotations of the samples in the batch
print(batch_annotation)
[['N' '0' '0' 'N' '0' 'N' '0' '0' 'N' '0' 'N' '0' '0' '0' 'N' '0' '0' 'N'
  '0' 'N' '0' '0' 'N' '0' 'N']
 ['0' 'V' '0' '0' 'N' '0' '0' '0' 'N' '0' '0' 'N' '0' 'N' '0' '0' 'N' '0'
  'V' '0' '0' 'N' '0' '0' '0']
 ['0' '0' 'N' '0' '0' '0' '0' 'N' '0' '0' '0' '0' '0' 'N' '0' '0' '0' '0'
  '0' 'N' '0' '0' '0' '0' '0']]
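The annotations are returned as strings, with '0' marking sub-segments without an R-peak. Numeric targets are usually needed for training. The sketch below shows one possible conversion on a small stand-in array. It uses NumPy to build a binary R-peak mask and integer class codes. The label-to-index mapping is a hypothetical choice, not something pyheartlib defines.

```python
import numpy as np

# Stand-in for batch_annotation (string labels, '0' = no R-peak)
demo_annotation = np.array([["N", "0", "0", "V"],
                            ["0", "N", "0", "0"]])

# 1 where a sub-segment contains an R-peak, 0 otherwise
rpeak_mask = (demo_annotation != "0").astype(np.int64)
print(rpeak_mask.tolist())  # [[1, 0, 0, 1], [0, 1, 0, 0]]

# Integer codes from a hypothetical label-to-index mapping
label_to_index = {"0": 0, "N": 1, "V": 2}
codes = np.vectorize(label_to_index.get)(demo_annotation)
print(codes.tolist())  # [[1, 0, 0, 2], [0, 1, 0, 0]]
```

Any label missing from `label_to_index` would map to `None`. A real mapping should therefore cover every label in the dataset, such as those in the count table above.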
# Plot the first excerpt of the batch
wf = batch_wave[0]
wf_ch1 = wf[0]
wf_ch2 = wf[1]
print(f'Length of each excerpt: {len(wf_ch1)}')

import matplotlib.pyplot as plt
plt.figure(figsize=(6, 3))
plt.plot(wf_ch1);
plt.plot(wf_ch2);
chs = rpeak_data.config["CHANNEL"]
plt.legend([chs[0], chs[1]], loc="upper right");
Length of each excerpt: 1800
[Figure: both ECG channels of the first excerpt in the batch]

The code below generates a batch with raw set to False. In this case, the batch contains computed features instead of the raw waveform.

# When raw is set to False
ecgseq = ECGSequence(
    annotated_records, samples_info, binary=False, batch_size=3, raw=False, interval=72
)

bt = 0  # Batch number
batch_x, batch_y = ecgseq[bt]
batch_annotation = batch_y  # Annotation
batch_wave = batch_x  # Excerpt

print("Length of annotation for each sample data:", len(batch_annotation[0]))
print("Batch_wave shape:", batch_wave.shape, ", Batch_annotation shape:", batch_annotation.shape)
Length of annotation for each sample data: 25
Batch_wave shape: (3, 2, 25, 14) , Batch_annotation shape: (3, 25)
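With `raw=False`, each sub-segment of each channel is described by a feature vector (14 values in the output above), so the batch has the shape (3, 2, 25, 14). A model that makes one prediction per sub-segment typically expects input of shape (batch, time steps, features). The sketch below reaches that shape by moving the channel axis next to the feature axis and merging the two. It runs on a stand-in array of the same shape. This layout is an assumption about the downstream model, not a pyheartlib requirement.

```python
import numpy as np

# Stand-in with shape (batch, channels, sub-segments, features)
demo_features = np.arange(3 * 2 * 25 * 14, dtype=float).reshape(3, 2, 25, 14)

b, c, s, f = demo_features.shape
# -> (batch, sub-segments, channels, features)
per_step = demo_features.transpose(0, 2, 1, 3)
# Merge channels and features: (batch, sub-segments, channels * features)
sequence_input = per_step.reshape(b, s, c * f)
print(sequence_input.shape)  # (3, 25, 28)
```

For each time step, the first 14 values are the features of the first channel and the last 14 are those of the second channel.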