Getting Started

Requirements

This version (1.22.0) of the package was tested on:

Ubuntu: 20.04 | 22.04 & Python: 3.10 | 3.11 & Processor: x86_64
macOS: 12.6.9 | 13.6 & Python: 3.10 | 3.11 & Processor: x86_64

However, it may also be compatible with other systems.

Installation

The package can be installed with pip:

$ pip install pyheartlib

Data

Pyheartlib supports the WFDB format. Therefore, recordings in other formats must be converted before using this package.

The input data should be placed in a user-defined main data directory. This directory will hold the output datasets as well.

It is necessary to create a config.yaml file in the main data directory. This file contains some required fields regarding the original data, as described below.

config.yaml

DATA_DIR: Directory of the input data relative to the main data directory.
SAMPLING_RATE: Sampling rate of the original signals.
CHANNEL: Names of the signal channels.
BEAT_TYPES: List of heartbeat types (R-peak labels).
RHYTHM_TYPES: List of rhythm types.

Example

To try this package on the MIT-BIH Arrhythmia Database, the following commands can be run on Linux to download the data into a main data directory named data.

$ mkdir data
$ cd data
$ wget https://www.physionet.org/static/published-projects/mitdb/mit-bih-arrhythmia-database-1.0.0.zip
$ unzip mit-bih-arrhythmia-database-1.0.0.zip
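
On systems without wget or unzip, the same steps can be approximated with the Python standard library. The sketch below is not part of pyheartlib; the function name download_and_extract is hypothetical, and the call that downloads the archive is left commented out.

```python
import urllib.request
import zipfile
from pathlib import Path


def download_and_extract(url, data_dir="data"):
    """Download a zip archive into data_dir and unpack it there."""
    data_path = Path(data_dir)
    data_path.mkdir(parents=True, exist_ok=True)    # like `mkdir data`
    archive = data_path / url.rsplit("/", 1)[-1]    # keep the remote file name
    urllib.request.urlretrieve(url, str(archive))   # like `wget <url>`
    with zipfile.ZipFile(archive) as zf:            # like `unzip <file>`
        zf.extractall(data_path)
    return archive


# Usage (downloads the archive, so it is left commented out):
# download_and_extract(
#     "https://www.physionet.org/static/published-projects/mitdb/"
#     "mit-bih-arrhythmia-database-1.0.0.zip"
# )
```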

Below are the contents of the config.yaml file, which must be placed in the main data directory.

# file: data/config.yaml
DATA_DIR: "mit-bih-arrhythmia-database-1.0.0/"
SAMPLING_RATE: 360
CHANNEL: ['MLII', 'V1']
BEAT_TYPES: ['N', 'L', 'R', 'j', 'e', 'V', 'E', 'A', 'S', 'a', 'J', 'F', 'f', '/', 'Q']
RHYTHM_TYPES: ['(AB', '(AFIB', '(AFL', '(B', '(BII', '(IVR', '(N', '(NOD', '(P', '(PREX', '(SBR', '(SVTA', '(T', '(VFL', '(VT']
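
Before running the package, it can help to confirm that no field is missing from the configuration. The following is a minimal sketch, not pyheartlib's own validation: missing_fields is a hypothetical helper, and the configuration is written as a plain dictionary (for instance, as it might look after parsing config.yaml with a YAML library) so that the example has no third-party dependencies.

```python
# Field names taken from the config.yaml description above.
REQUIRED_FIELDS = (
    "DATA_DIR",
    "SAMPLING_RATE",
    "CHANNEL",
    "BEAT_TYPES",
    "RHYTHM_TYPES",
)


def missing_fields(config):
    """Return the required field names that are absent from config."""
    return [name for name in REQUIRED_FIELDS if name not in config]


# The lists are shortened here for readability.
example_config = {
    "DATA_DIR": "mit-bih-arrhythmia-database-1.0.0/",
    "SAMPLING_RATE": 360,
    "CHANNEL": ["MLII", "V1"],
    "BEAT_TYPES": ["N", "L", "R", "V", "A"],
    "RHYTHM_TYPES": ["(N", "(AFIB"],
}

print(missing_fields(example_config))           # []
print(missing_fields({"SAMPLING_RATE": 360}))   # the four other field names
```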

Usage

Pyheartlib contains three main classes for processing the original data and creating datasets.

BeatData

For the heartbeat analysis task, the BeatData class can be used to create a dataset. First, the class needs to be imported.

from pyheartlib.data_beat import BeatData

The next step is to create an instance of pyheartlib.data_beat.BeatData.

beatdata = BeatData(
    base_path="data",
    win=[200, 200],
    remove_bl=False,
    lowpass=False,
    progress_bar=False,
)

Descriptions of all the parameters of BeatData can be found in the API reference. For feature computation, an instance of pyheartlib.beat_info.BeatInfo has to be created.

from pyheartlib.beat_info import BeatInfo
beatinfo = BeatInfo()

BeatInfo includes some predefined features; however, custom features can also be defined. Each custom feature definition must follow the syntax below, and its name must start with F_:

def F_new_feature(self):
    return return_result

The return_result must be one of the following:

A real number.
A dictionary such as {"Feature_1": value_1, "Feature_2": value_2}. Each value can be a real number or a one-dimensional array, list, or tuple whose elements correspond to the channels.
A tuple or one-dimensional NumPy array whose elements correspond to the channels. For example, if F_feat_new() returns the tuple (element1, element2), the result is F_feat_new(CH1) for element1 and F_feat_new(CH2) for element2. The order of the channels is determined by the CHANNEL field in the config.yaml file.
A list. A returned list is kept as a list.
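
The return types listed above can be illustrated with a few toy feature functions. This is only a sketch under stated assumptions: ToyBeat and its waveforms attribute are hypothetical stand-ins for whatever pyheartlib passes as self, and label_per_channel merely mirrors the naming pattern described above rather than reproducing the library's internal code.

```python
class ToyBeat:
    """Hypothetical stand-in for the object passed as `self`."""

    def __init__(self, waveforms):
        # Hypothetical attribute: one list of samples per channel.
        self.waveforms = waveforms


def F_n_channels(self):
    # Case 1: a single real number.
    return float(len(self.waveforms))


def F_extrema(self):
    # Case 2: a dictionary whose values hold one element per channel.
    return {
        "Max": [max(w) for w in self.waveforms],
        "Min": [min(w) for w in self.waveforms],
    }


def F_range(self):
    # Case 3: a tuple with one element per channel.
    return tuple(max(w) - min(w) for w in self.waveforms)


def label_per_channel(name, values, channels):
    """Pair each per-channel value with a label of the form name(channel)."""
    return {f"{name}({ch})": v for ch, v in zip(channels, values)}


beat = ToyBeat([[0.0, 1.0, 0.5], [0.25, 0.5, -0.25]])
labeled = label_per_channel("F_range", F_range(beat), ["CH1", "CH2"])
print(labeled)  # {'F_range(CH1)': 1.0, 'F_range(CH2)': 0.75}
```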

Custom features can be added to the beatinfo object with the add_features() method. The list of all available features can be obtained with the available_features() method, and the features to compute can be selected with the select_features() method.

Descriptions of all the parameters of BeatInfo can be found in the API reference.

Finally, using the save_dataset_inter() method of BeatData, an inter-patient dataset can be created.

# The file will be saved in the base data directory.
beatdata.save_dataset_inter(["209", "215"], beatinfo, file="train.beat")
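
When separate inter-patient datasets are built for training and testing, the record lists should not share any record. The sketch below assumes that reading of "inter-patient" and assumes each record name identifies a distinct patient; overlapping_records is a hypothetical helper, and the test record names are illustrative only.

```python
def overlapping_records(train_records, test_records):
    """Return the sorted record names that appear in both lists."""
    return sorted(set(train_records) & set(test_records))


train_records = ["209", "215"]   # the records used in the call above
test_records = ["220", "223"]    # illustrative choice, not from the package

# An empty list means the two sets are disjoint.
print(overlapping_records(train_records, test_records))  # []
```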

Example

Heartbeat dataset

RhythmData

To create a dataset for arrhythmia classification, the RhythmData class can be used. This class can be imported using the code below.

from pyheartlib.data_rhythm import RhythmData

The next step is to create an instance of pyheartlib.data_rhythm.RhythmData.

rhythm_data = RhythmData(
    base_path="data", remove_bl=False, lowpass=False, progress_bar=False
)

Descriptions of all the parameters can be found in the API reference.

Using the save_dataset() method, the dataset will be created.

train_set = ["209", "215"]  # names of the records to include
rhythm_data.save_dataset(
    rec_list=train_set, file_name="train.arr", win_size=3600, stride=64
)
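
The effect of win_size and stride can be estimated with a little arithmetic. The sketch below assumes a plain sliding window that keeps only complete windows; how pyheartlib handles record boundaries and annotations is not shown here, and the record length of 650,000 samples is used only as an example value.

```python
def window_starts(signal_length, win_size, stride):
    """Start indices of all complete windows in a signal."""
    if win_size > signal_length:
        return []
    return list(range(0, signal_length - win_size + 1, stride))


fs = 360                      # SAMPLING_RATE from config.yaml
starts = window_starts(650_000, win_size=3600, stride=64)

print(3600 / fs)              # window duration in seconds: 10.0
print(len(starts))            # number of complete windows: 10101
print(starts[:3])             # [0, 64, 128]
```

A smaller stride produces more, heavily overlapping excerpts; a stride equal to win_size produces non-overlapping ones.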

The dataset can be loaded using the load_dataset() function.

from pyheartlib.data_rhythm import load_dataset
annotated_records, samples_info = load_dataset("data/train.arr")

The dataset created above can be used to generate batches of sample data. To do so, an instance of pyheartlib.data_rhythm.ECGSequence must be created.

from pyheartlib.data_rhythm import ECGSequence

trainseq = ECGSequence(
    annotated_records,
    samples_info,
    class_labels=None,
    batch_size=3,
    raw=True,
    interval=36,
    shuffle=False,
    rri_length=20
)

The ECGSequence takes the annotated_records (ECG records) and the samples_info (metadata) that were loaded previously. The other parameters of ECGSequence are described in the API reference.

Example

Arrhythmia dataset

RpeakData

To create a dataset with the RpeakData class, first it needs to be imported.

from pyheartlib.data_rpeak import RpeakData

The next step is to create an instance of pyheartlib.data_rpeak.RpeakData.

rpeak_data = RpeakData(
    base_path="data", remove_bl=False, lowpass=False, progress_bar=False
)

Descriptions of all the parameters can be found in the API reference.

Using the save_dataset() method, the dataset will be created.

train_set = ["209", "215"]  # names of the records to include
rpeak_data.save_dataset(
    rec_list=train_set,
    file_name="train.rpeak",
    win_size=5 * 360,
    stride=360,
    interval=72,
)
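
The values in the call above are given in samples, so converting them to seconds makes them easier to interpret. The sketch below assumes that interval is the length of consecutive sub-segments within each window; that reading is an assumption, and describe_windowing is a hypothetical helper.

```python
def describe_windowing(fs, win_size, stride, interval):
    """Express sample-based windowing parameters in seconds."""
    return {
        "window_seconds": win_size / fs,
        "stride_seconds": stride / fs,
        "interval_seconds": interval / fs,
        # Assumes the window is split into consecutive sub-segments.
        "intervals_per_window": win_size // interval,
    }


fs = 360  # SAMPLING_RATE from config.yaml
info = describe_windowing(fs, win_size=5 * fs, stride=fs, interval=72)
print(info)
# {'window_seconds': 5.0, 'stride_seconds': 1.0,
#  'interval_seconds': 0.2, 'intervals_per_window': 25}
```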

The dataset can be loaded using the load_dataset() function.

from pyheartlib.data_rpeak import load_dataset
annotated_records, samples_info = load_dataset("data/train.rpeak")

The dataset that was created can be used to generate batches of sample data. To accomplish this, an instance of pyheartlib.data_rpeak.ECGSequence must be created.

from pyheartlib.data_rpeak import ECGSequence

trainseq = ECGSequence(
    annotated_records, samples_info, binary=False, batch_size=2, raw=True, interval=72
)

The ECGSequence takes the annotated_records (ECG records) and the samples_info (metadata) that were loaded previously. The other parameters of ECGSequence are described in the API reference.

Examples

Optional Preprocessing

It is possible to remove noise from the raw signals by setting the remove_bl and/or lowpass parameters to True. If remove_bl is set to True, the baseline wander is removed by applying two median filters to the raw signal. If lowpass is set to True, a low-pass filter is applied to the signal.

It is also possible to define custom preprocessing steps to be applied to the signals. The processors parameter can be used to achieve this goal. This parameter takes a list of functions that are going to be applied to the signals according to their order of appearance in the list. Each function takes as its input a one-dimensional NumPy array and returns an array of the same length. Example:

# Multiplies the signal by 2
def custom_processor1(x):
    return 2 * x

# Adds 1 to the signal
def custom_processor2(x):
    return x + 1

The above steps can be applied to the raw signals.

# custom_processor1 is applied to the raw signal first, and custom_processor2 is applied after that.
custom_processors = [custom_processor1, custom_processor2]

rhythm_data = RhythmData(
    base_path="data",
    remove_bl=False,
    lowpass=False,
    progress_bar=False,
    processors=custom_processors,
)

The processors parameter allows for custom noise reduction techniques to be applied to the raw signals.
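
Because the processors are applied in list order, swapping two of them can change the result. The sketch below illustrates that chaining behavior; apply_processors is a hypothetical helper, not the library's internal code, and plain lists stand in for NumPy arrays so that the example runs without third-party packages.

```python
def apply_processors(signal, processors):
    """Apply each processor in order, checking that the length is preserved."""
    for process in processors:
        result = process(signal)
        if len(result) != len(signal):
            raise ValueError(f"{process.__name__} changed the signal length")
        signal = result
    return signal


# List-based stand-ins for custom_processor1 and custom_processor2.
def double(x):
    return [2 * v for v in x]


def add_one(x):
    return [v + 1 for v in x]


print(apply_processors([1, 2, 3], [double, add_one]))  # [3, 5, 7]
print(apply_processors([1, 2, 3], [add_one, double]))  # [4, 6, 8]
```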