pyheartlib.data_beat
Module Contents
Classes
BeatData – Processes the provided ECG records and creates a dataset containing waveforms, features, and annotations.
- class pyheartlib.data_beat.BeatData(base_path=None, win=[60, 120], num_pre_rr=10, num_post_rr=10, remove_bl=False, lowpass=False, cutoff=45, order=15, progress_bar=True, **kwargs)
Bases: pyheartlib.data.Data
Processes the provided ECG records and creates a dataset containing waveforms, features, and annotations.
- Parameters:
base_path (str, optional) – Path of the main directory for storing the original and processed data, by default None
win (list, optional) – [Onset, Offset] of signal excerpts around the R-peaks, by default [60, 120]
num_pre_rr (int, optional) – Number of preceding R-peak locations to be included for each beat, by default 10
num_post_rr (int, optional) – Number of subsequent R-peak locations to be included for each beat, by default 10
remove_bl (bool, optional) – If True, the baseline wander is removed from the original signals prior to extracting excerpts, by default False
lowpass (bool, optional) – Whether to apply a low-pass filter to the original signals, by default False
cutoff (int, optional) – Parameter of the low-pass filter, by default 45
order (int, optional) – Parameter of the low-pass filter, by default 15
progress_bar (bool, optional) – Whether to display a progress bar, by default True
processors (list, optional) – Ordered list of function names used to preprocess the raw signals. Each function takes a one-dimensional NumPy array as input and returns an array of the same length.
Examples
>>> beatdata = BeatData(base_path="./data", win=[200, 200],
...                     remove_bl=False, lowpass=False,
...                     progress_bar=True)
>>> # create a BeatInfo object
>>> beatinfo = BeatInfo()
>>> # save the dataset file
>>> beatdata.save_dataset_inter(DS1[17:18], beatinfo, file="train.beat")
>>> # load the dataset from file
>>> train_ds = beatdata.load_data(file_name="train.beat")
File loaded from: ./data/train.beat
-Shape of "waveforms" is (2985, 400). Number of samples is 2985.
-Shape of "beat_feats" is (2985, 27). Number of samples is 2985.
-Shape of "labels" is (2985,). Number of samples is 2985.
               N  L  R  j  e  V  E    A  S  a  J  F  f  /  Q
train.beat  2601  0  0  0  0  1  0  383  0  0  0  0  0  0  0
- make_frags(signal, r_locations=None, r_label=None)
Fragments one signal into beats and returns the signal excerpts and corresponding labels.
- Parameters:
signal (list) – A list containing the signal values.
r_locations (list) – A list containing the R-peak locations on the signal.
r_label (list) – A list containing the R-peak (beat) labels.
- Returns:
- signal_frags (numpy.ndarray)
A 2D array containing the extracted beat excerpts.
- beat_types (list)
Contains the corresponding label of each beat excerpt.
- r_locs (list)
A list containing, for each beat, a list of the preceding R-peak locations, the beat's own R-peak location, and the subsequent R-peak locations. Can be used for HRV calculations.
- s_idxs (list)
Contains the starting point of each extracted beat excerpt on the original signal. This is computed by subtracting the window onset from the R-peak location.
- Return type:
Tuple
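The fragmentation described above can be sketched in a few lines of NumPy. This is an illustrative reimplementation, not pyheartlib's code: the function name `fragment_beats`, the rule of skipping beats that lack enough neighbouring R-peaks, and the rule of skipping windows that run past the signal edges are all assumptions made for the sketch.

```python
import numpy as np


def fragment_beats(signal, r_locations, r_labels, win=(60, 120),
                   num_pre_rr=1, num_post_rr=1):
    """Cut fixed-length excerpts around R-peaks (hypothetical helper)."""
    onset, offset = win
    signal = np.asarray(signal, dtype=float)
    frags, beat_types, r_locs, s_idxs = [], [], [], []
    # Assumption: only beats with enough neighbours on both sides are used.
    for i in range(num_pre_rr, len(r_locations) - num_post_rr):
        r = r_locations[i]
        start, end = r - onset, r + offset
        # Assumption: windows that fall outside the signal are skipped.
        if start < 0 or end > len(signal):
            continue
        frags.append(signal[start:end])
        beat_types.append(r_labels[i])
        # Preceding R-peaks, the beat's own R-peak, subsequent R-peaks.
        r_locs.append(list(r_locations[i - num_pre_rr:i + num_post_rr + 1]))
        # Start index = R-peak location minus window onset.
        s_idxs.append(start)
    frags = np.array(frags).reshape(len(frags), onset + offset)
    return frags, beat_types, r_locs, s_idxs


# Synthetic signal whose value equals its sample index.
signal = np.arange(1000)
r_locations = [100, 300, 500, 700, 900]
r_labels = ["N", "N", "V", "N", "N"]
frags, types, r_locs, s_idxs = fragment_beats(
    signal, r_locations, r_labels, win=(60, 120)
)
print(frags.shape, types, s_idxs)
```

Each row of `frags` has `onset + offset` samples, which matches the class example where `win=[200, 200]` yields waveforms of length 400.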
- make_dataset(records, beatinfo_obj=None)
Creates a dataset from the provided records.
- Parameters:
records (list) – A list containing record IDs.
beatinfo_obj (BeatInfo) – Instance of BeatInfo.
- Returns:
Dictionary with keys:
- 'waveforms' (numpy.ndarray)
2D array of beat waveforms.
- 'beat_feats' (pd.DataFrame)
DataFrame of the beats' features.
- 'labels' (numpy.ndarray)
1D array of the beats' labels.
- Return type:
dict
- beat_info_feat(data, beatinfo_obj)
Provides the computed features for all the beats.
- Parameters:
data (dict) –
Dictionary with keys:
- 'waveform' (numpy.ndarray)
Array of waveforms with shape (#waveforms, len_waveforms, #channels).
- 'rpeak_locs' (list)
List of R-peak locations.
- 'rec_ids' (list)
List of record IDs.
- 'start_idxs' (list)
List of the starting indexes of the waveforms on the raw signal.
- 'labels' (list)
List of beat labels.
beatinfo_obj (BeatInfo) – Instance of BeatInfo.
- Returns:
- features (list)
Contains feature dictionaries for all the beats.
- labels (list)
Contains the corresponding beat labels.
- Return type:
Tuple
- save_dataset_inter(records, beatinfo_obj, file=None)
Creates a dataset from the given record IDs.
- Parameters:
records (list) – List of record IDs.
beatinfo_obj (BeatInfo) – Instance of BeatInfo.
file (str) – Name of the file that will be saved.
- save_dataset_intra(records, beatinfo_obj, split_ratio=0.3, file_prefix='intra')
Creates the dataset in an intra-patient way.
- Parameters:
records (list, optional) – List of record IDs.
beatinfo_obj (BeatInfo) – Instance of BeatInfo.
split_ratio (float, optional) – Ratio of test set, by default 0.3
file_prefix (str, optional) – Prefix for the file names to be saved, by default ‘intra’
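In an intra-patient scheme, beats from the same record appear in both the training and the test set, whereas an inter-patient scheme keeps whole records on one side. The sketch below illustrates the idea only. How pyheartlib actually divides each record (chronologically, randomly, or otherwise) is not stated here, so the chronological cut, the helper name `split_record`, and the record IDs are assumptions.

```python
def split_record(beats, split_ratio=0.3):
    """Split one record's beats into (train, test) parts (hypothetical helper).

    Assumption: the last `split_ratio` share of the beats forms the test part.
    """
    n_test = int(round(len(beats) * split_ratio))
    cut = len(beats) - n_test
    return beats[:cut], beats[cut:]


# Hypothetical records: record ID -> list of beat indexes.
records = {"100": list(range(10)), "101": list(range(20))}

train, test = [], []
for rec_id, beats in records.items():
    train_part, test_part = split_record(beats, split_ratio=0.3)
    train += [(rec_id, b) for b in train_part]
    test += [(rec_id, b) for b in test_part]

# Every record contributes beats to both sets.
print(len(train), len(test))
```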
- save_dataset_single(record, beatinfo_obj, split_ratio=0.3, file=None)
Saves the signal fragments and their labels into a file for a single record.
- Parameters:
record (str) – Record ID.
beatinfo_obj (BeatInfo) – Instance of BeatInfo.
split_ratio (float, optional) – Ratio of test set, by default 0.3
file (str, optional) – Name of the file to be saved, by default None
- load_data(file_name)
Loads a file containing a dataframe.
- Parameters:
file_name (str) – File name. The final path is the join of the base path and the file name.
- Returns:
Dataset with keys: “waveforms”, “beat_feats”, and “labels”.
- Return type:
dict
- report_stats(yds_list)
Counts the number of samples for each label type in the data.
- Parameters:
yds_list (list) – List containing several label data sets, e.g. train, val, test.
- Returns:
A list of dictionaries, one per data set. Keys are label types (symbols) and values are the counts of each symbol.
- Return type:
list
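The returned structure, one dictionary of symbol counts per label set, can be illustrated with the standard library. This is a sketch of the idea rather than the library's implementation. The name `count_labels` is hypothetical, and symbols that never occur are simply absent here, which may differ from what pyheartlib reports.

```python
from collections import Counter


def count_labels(label_sets):
    """Return one {symbol: count} dictionary per label set (hypothetical helper)."""
    return [dict(Counter(labels)) for labels in label_sets]


train_labels = ["N", "N", "V", "A", "N"]
test_labels = ["N", "V", "V"]
stats = count_labels([train_labels, test_labels])
print(stats)  # [{'N': 3, 'V': 1, 'A': 1}, {'N': 1, 'V': 2}]
```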
- report_stats_table(yds_list, name_list=[])
Returns the number of samples for each label type in the data.
- Parameters:
yds_list (list) – List containing several label data sets, e.g. train, val, test.
name_list (list, optional) – List of names for the label data sets, e.g. train, val, test, by default []
- Returns:
A dataframe containing symbols and their counts.
- Return type:
pandas.DataFrame
- per_record_stats(rec_ids_list=None, cols=None)
Returns a dataframe containing the number of beats of each label type in each record.
- Parameters:
rec_ids_list (list, optional) – List of record IDs, by default None
cols (list) – List of label classes, by default None
- Returns:
Contains the count of each label type.
- Return type:
pandas.DataFrame
- slice_data(ds, labels)
Returns the data according to the provided annotation list.
- Parameters:
ds (dict) – Dataset with keys: “waveforms”, “beat_feats”, and “labels”.
labels (list) – List of labels to be kept in the output.
- Returns:
Dataset with keys: “waveforms”, “beat_feats”, and “labels”.
- Return type:
dict
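Filtering a dataset by label amounts to building one boolean mask from the labels and applying it to every entry. A minimal sketch, assuming all three entries are row-aligned; `keep_labels` is a hypothetical name, and `beat_feats` is represented as a plain NumPy array here instead of a DataFrame to keep the example small.

```python
import numpy as np


def keep_labels(ds, labels):
    """Keep only the rows whose label is in `labels` (hypothetical helper)."""
    mask = np.isin(ds["labels"], labels)
    # The same mask is applied to every entry so the rows stay aligned.
    return {key: value[mask] for key, value in ds.items()}


ds = {
    "waveforms": np.arange(12).reshape(4, 3),
    "beat_feats": np.arange(8).reshape(4, 2),  # stand-in for the DataFrame
    "labels": np.array(["N", "V", "A", "N"]),
}
sliced = keep_labels(ds, ["N", "A"])
print(sliced["labels"])
```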
- search_label(inp, sym='N')
Searches the provided data and returns the indexes for a particular label.
- Parameters:
inp (dict or numpy.ndarray) – Input can be a dictionary having a ‘labels’ key, or a 1D numpy array containing labels.
sym (str, optional) – The label to be searched for in the dataset, by default ‘N’
- Returns:
A list of indexes corresponding to the searched label.
- Return type:
list
- Raises:
TypeError – Input data must be a dictionary or a numpy array.
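The documented behaviour (accept a dictionary or a 1D array, return matching indexes, raise TypeError otherwise) can be mirrored as follows. This is a sketch under those stated rules, not the library's source, and `find_label` is a hypothetical name.

```python
import numpy as np


def find_label(inp, sym="N"):
    """Return the indexes at which the label equals `sym` (hypothetical helper)."""
    if isinstance(inp, dict):
        labels = np.asarray(inp["labels"])
    elif isinstance(inp, np.ndarray):
        labels = inp
    else:
        raise TypeError("Input data must be a dictionary or a numpy array.")
    return np.where(labels == sym)[0].tolist()


ds = {"labels": np.array(["N", "V", "N", "A"])}
idx_from_dict = find_label(ds, sym="N")
idx_from_array = find_label(ds["labels"], sym="V")
print(idx_from_dict, idx_from_array)  # [0, 2] [1]
```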
- clean_inf_nan(ds)
Cleans the dataset by removing samples (rows) with inf or nan in computed features.
- Parameters:
ds (dict) – Dataset with keys: “waveforms”, “beat_feats”, and “labels”.
- Returns:
Cleaned dataset with keys: “waveforms”, “beat_feats”, and “labels”.
- Return type:
dict
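Removing rows with inf or NaN features can be sketched with `numpy.isfinite`. The helper name `drop_non_finite` is hypothetical, and `beat_feats` is again a plain NumPy array standing in for the DataFrame.

```python
import numpy as np


def drop_non_finite(ds):
    """Drop samples whose feature row contains inf or NaN (hypothetical helper)."""
    feats = np.asarray(ds["beat_feats"], dtype=float)
    # A row is kept only if every feature in it is finite.
    keep = np.isfinite(feats).all(axis=1)
    return {key: np.asarray(value)[keep] for key, value in ds.items()}


ds = {
    "waveforms": np.zeros((4, 5)),
    "beat_feats": np.array([[1.0, 2.0], [np.nan, 3.0], [4.0, np.inf], [5.0, 6.0]]),
    "labels": np.array(["N", "V", "A", "N"]),
}
cleaned = drop_non_finite(ds)
print(cleaned["labels"])
```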
- clean_IQR(ds, factor=1.5, return_indexes=False)
Cleans the dataset by removing outliers using the IQR method.
- Parameters:
ds (dict) – Dataset with keys: “waveforms”, “beat_feats”, and “labels”.
factor (float, optional) – Parameter of the IQR method, by default 1.5
return_indexes (bool, optional) – If True, returns the indexes of the outliers; otherwise returns the cleaned dataset, by default False
- Returns:
Cleaned dataset with keys: “waveforms”, “beat_feats”, and “labels”, or indexes of outliers.
- Return type:
dict or list
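For reference, the usual IQR rule flags a value as an outlier when it lies below Q1 - factor * IQR or above Q3 + factor * IQR, where IQR = Q3 - Q1. The sketch below applies that rule per feature column and flags a row if any of its features falls outside the fences. Whether pyheartlib combines features in exactly this way is an assumption, and `iqr_outlier_rows` is a hypothetical name.

```python
import numpy as np


def iqr_outlier_rows(feats, factor=1.5):
    """Return the indexes of rows with at least one feature outside the IQR fences."""
    feats = np.asarray(feats, dtype=float)
    q1 = np.percentile(feats, 25, axis=0)
    q3 = np.percentile(feats, 75, axis=0)
    iqr = q3 - q1
    lower, upper = q1 - factor * iqr, q3 + factor * iqr
    outside = (feats < lower) | (feats > upper)
    return np.where(outside.any(axis=1))[0].tolist()


# One feature column: Q1 = 2, Q3 = 4, so the fences are [-1, 7].
feats = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])
outliers = iqr_outlier_rows(feats)
print(outliers)  # [4]
```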
- clean_IQR_class(ds, factor=1.5)
Cleans the dataset using the IQR method, applied to every class separately.
- Parameters:
ds (dict) – Dataset with keys: “waveforms”, “beat_feats”, and “labels”.
factor (float, optional) – Parameter of IQR method, by default 1.5
- Returns:
Cleaned dataset with keys: “waveforms”, “beat_feats”, and “labels”.
- Return type:
dict
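Applying the IQR rule per class matters when classes occupy different feature ranges: a value that is extreme within its own class can look unremarkable against the pooled data. A minimal sketch of that effect, with hypothetical helper names and the assumption that a row is kept only if all of its features lie inside the fences:

```python
import numpy as np


def iqr_keep_mask(feats, factor=1.5):
    """Boolean mask of rows whose features all lie inside the IQR fences."""
    q1, q3 = np.percentile(feats, [25, 75], axis=0)
    iqr = q3 - q1
    inside = (feats >= q1 - factor * iqr) & (feats <= q3 + factor * iqr)
    return inside.all(axis=1)


def clean_per_class(feats, labels, factor=1.5):
    """Apply the IQR rule to the rows of each label separately."""
    feats = np.asarray(feats, dtype=float)
    labels = np.asarray(labels)
    keep = np.zeros(len(labels), dtype=bool)
    for sym in np.unique(labels):
        rows = np.where(labels == sym)[0]
        keep[rows] = iqr_keep_mask(feats[rows], factor)
    return keep


feats = np.array([[1], [2], [3], [4], [100], [50], [51], [52], [53], [54]], dtype=float)
labels = np.array(["N"] * 5 + ["V"] * 5)

global_keep = iqr_keep_mask(feats)           # pooled data: 100 is not flagged
class_keep = clean_per_class(feats, labels)  # within class "N": 100 is flagged
print(global_keep.sum(), class_keep.sum())
```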
- append_ds(ds1, ds2)
Appends two datasets.
- Parameters:
ds1 (dict) – Dataset with keys: “waveforms”, “beat_feats”, and “labels”.
ds2 (dict) – Dataset with keys: “waveforms”, “beat_feats”, and “labels”.
- Returns:
Dataset with keys: “waveforms”, “beat_feats”, and “labels”.
- Return type:
dict
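Appending two datasets means concatenating each entry along the sample axis. A sketch assuming both datasets share the same keys and per-sample shapes; `concat_datasets` is a hypothetical name, and `beat_feats` is a NumPy array standing in for the DataFrame.

```python
import numpy as np


def concat_datasets(ds1, ds2):
    """Concatenate two datasets entry by entry along axis 0 (hypothetical helper)."""
    return {key: np.concatenate([ds1[key], ds2[key]], axis=0) for key in ds1}


ds1 = {
    "waveforms": np.zeros((2, 4)),
    "beat_feats": np.zeros((2, 3)),
    "labels": np.array(["N", "V"]),
}
ds2 = {
    "waveforms": np.ones((3, 4)),
    "beat_feats": np.ones((3, 3)),
    "labels": np.array(["A", "N", "N"]),
}
merged = concat_datasets(ds1, ds2)
print(merged["waveforms"].shape, merged["labels"])
```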