How can we predict blood pressure continuously, so that doctors can react quickly when a patient suffers a heart attack? How can we speed up the response when the endangered person is cycling or running at that moment? How can we avoid the conventional way of measuring blood pressure, where the patient has to breathe smoothly and stay in the correct position? A Neural Network (NN) comes to our rescue and helps us solve this nonlinear regression problem. To implement any type of NN, we need an API.

I decided to prepare a prototype with Python and Keras. Under the hood, Keras supports engines such as TensorFlow, Theano and more. Knowing that the University of Montreal would no longer maintain Theano, I decided to rely on the popular TensorFlow as the Keras backend. Keep in mind that Keras can also be used with an external backend.

Medical data sets

Once I had chosen the API, I needed a source of medical data. Where should we go for medical data sets? After a few hours of research, I can outright recommend a website where you can find various samples of medical data. You can combine the parameters of the human body that you are interested in and generate a full CSV file with anonymized example patients. To predict blood pressure, we need a data set that includes parameters such as systolic blood pressure (SBP) and diastolic blood pressure (DBP). The data set used for the prediction process has approximately 5000 rows, divided into three sets: 80% training, 10% validation and 10% test. Without a doubt, we can call this shallow learning rather than deep learning, due to the lack of patients with big data sets. A sample of the data can be found in the table below. To keep things simple, in this post I will show how to predict a single parameter, for instance SBP.
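The chronological 80/10/10 split described above can be computed up front. A minimal sketch (the series here is a stand-in for the real SBP readings):

```python
import numpy

# Illustrative series standing in for ~5000 rows of SBP readings.
rows = numpy.arange(5000, dtype='float32')

# 80% training, 10% validation, 10% test, kept in chronological order
# (shuffling would break the time-series structure).
train_end = int(len(rows) * 0.8)
validation_end = int(len(rows) * 0.9)

train = rows[:train_end]
validation = rows[train_end:validation_end]
test = rows[validation_end:]
```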

Preparing data

First of all, let's have a look at which libraries will be used in our Python script.

import numpy
import matplotlib.pyplot as plt
from pandas import read_csv
import math
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import Dense
from tensorflow.keras.layers import LSTM
from tensorflow.keras.callbacks import EarlyStopping
from sklearn.preprocessing import MinMaxScaler
from sklearn.metrics import mean_squared_error

Afterwards, all values must be normalized. This is a required preprocessing step before feeding the data into the model. Moreover, we need to divide the data set in the proportions described earlier. The train_size variable appearing in the code covers 90% of the roughly 4500 rows in this file, and the remaining 10% is used for validation. The next step is to convert the dataframe into a NumPy array, which is required to fit the data into the model. The conversion method takes a window-size parameter that defines how many rows are needed to predict a single row. In this form the data is ready for the model fitting and validation process. Below you can find a piece of code that loads the data and transforms the data set into a matrix shape readable by the NN (link to the repository).

def prepare_dataset(scaler, window_size):
    raw_dataframe = read_csv('PredictionDataSet-training&validation.csv',
                             usecols=[0], engine='python')
    raw_dataset = raw_dataframe.values.astype('float32')

    normalized_dataset = scaler.fit_transform(raw_dataset)
    rows_amount = int(len(normalized_dataset))
    train_size = int(rows_amount * 0.9)
    validation_size = rows_amount - train_size
    train = normalized_dataset[0:train_size, :]
    validation = normalized_dataset[train_size:rows_amount, :]

    # reshape into X=t and Y=t+window_size
    trainX, trainY = convert_dataset(train, window_size)
    bp_to_validate, bp_original = convert_dataset(validation, window_size)

    # reshape input to be [samples, time steps, features]
    trainX = numpy.reshape(trainX, (trainX.shape[0], 1, trainX.shape[1]))
    bp_to_validate = numpy.reshape(
        bp_to_validate, (bp_to_validate.shape[0], 1, bp_to_validate.shape[1]))
    return bp_to_validate, bp_original, trainX, trainY
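The convert_dataset helper is not listed above. A minimal sketch consistent with the reshaping that follows (each row of X holds window_size consecutive values, and Y holds the value that comes next) could look like this:

```python
import numpy

def convert_dataset(dataset, window_size):
    # Slide a window of `window_size` consecutive values over the series:
    # each window becomes one input sample (X) and the value immediately
    # following it becomes the target (Y).
    dataX, dataY = [], []
    for i in range(len(dataset) - window_size):
        dataX.append(dataset[i:i + window_size, 0])
        dataY.append(dataset[i + window_size, 0])
    return numpy.array(dataX), numpy.array(dataY)
```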

Training model

Once you have prepared the data using the method described in the preceding subsection, you can define the architecture of the model. Following common conventions for designing neural network models, I prepared an experiment for the prediction process. The model uses a long short-term memory (LSTM) layer, which is an extension of a recurrent neural network (more about LSTM).

The whole network includes 512 neurons, approximately 10% of the data set size. The presented piece of code also contains a parameter named callbacks, which can hold different kinds of callback methods. One of them is EarlyStopping, which helps the NN avoid overfitting. You can save the model using the model.save() command, which keeps a copy of the model in a format readable by deep learning pipelines using Apache Spark. Once the model is trained and saved, the Python script verifies it using the validation data set. Below you can find the code of the main method, which includes the parameters used to train the model.

def main():
    window_size = 100
    scaler = MinMaxScaler(feature_range=(0, 1))
    bp_to_validate, bp_original, trainX, trainY = prepare_dataset(
        scaler, window_size)

    callback_early_stopping = EarlyStopping(
        monitor='loss', patience=10, verbose=1)
    callbacks = [callback_early_stopping]

    model = Sequential()
    model.add(LSTM(1, input_shape=(1, window_size),
                   activation='linear', return_sequences=True))
    model.add(LSTM(512, activation='sigmoid'))
    model.add(Dense(1, activation='linear'))
    model.compile(loss='mean_squared_error', optimizer='adam')
    model.fit(trainX, trainY, epochs=100, batch_size=16,
              verbose=1, callbacks=callbacks)

    bp_validated = model.predict(bp_to_validate)
    draw_results(bp_original, bp_validated, scaler)
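Saving and restoring the trained model is a one-liner in Keras. A sketch with a tiny stand-in model (the filename is illustrative; in the post this would be the trained LSTM network):

```python
from tensorflow.keras.models import Sequential, load_model
from tensorflow.keras.layers import Dense

# Tiny stand-in model; in the post this would be the trained LSTM network.
model = Sequential([Dense(1, input_shape=(4,), activation='linear')])
model.compile(loss='mean_squared_error', optimizer='adam')

# Persist architecture, weights and optimizer state in one file.
model.save('sbp_model.h5')

# Restore later, e.g. inside another pipeline, without retraining.
restored = load_model('sbp_model.h5')
```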


Root-mean-square error (RMSE) is a common way to measure the correctness of the results. To estimate model accuracy, we need to restore the data rows from their normalized form to the original scale. It is good practice to exclude the warm-up state from the measurement (the first 10% of rows, marked as a dark grey area on the graph below). Taking the above into account, we can produce graphical output using one of the most popular libraries, matplotlib. The method presented below is responsible for the output.

def draw_results(bp_original, bp_validated, scaler):
    bp_validated_inversed = scaler.inverse_transform(bp_validated)
    bp_original_inversed_values = scaler.inverse_transform([bp_original])[0]

    warmup_steps = int(len(bp_validated_inversed) * 0.1)

    rmse_of_validation_set = math.sqrt(mean_squared_error(
        bp_original_inversed_values[warmup_steps:], bp_validated_inversed[warmup_steps:, 0]))
    print('Validation RMSE: %.2f' % rmse_of_validation_set)

    plt.style.use("seaborn")
    plt.axvspan(0, warmup_steps, facecolor='black', alpha=0.15)
    plt.plot(bp_original_inversed_values, color='grey', label='original')
    plt.plot(bp_validated_inversed, color='yellowgreen', label='validated')
    plt.title('VALIDATION OF SYSTOLIC BLOOD PRESSURE', fontweight='bold')
    plt.legend()
    plt.show()

Below you can find an example graph of SBP drawn by matplotlib on the basis of the validation data set.

And here is the same model applied to the test data set:


The results look promising, with a reasonable level of RMSE, as visible on the graph. Future experiments will require replacing MinMaxScaler with a Z-score method and using data sets with a longer history, which would allow a deeper architecture that could be used in a real situation in the clinic. The generated model is ready to be attached to your Databricks job. This solution could be used, for example, in an ambulance.
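Under scikit-learn, swapping MinMaxScaler for Z-score normalization amounts to replacing it with StandardScaler, which centres each column to zero mean and unit variance. A minimal sketch with illustrative SBP-like values:

```python
import numpy
from sklearn.preprocessing import StandardScaler

# Illustrative SBP-like values; real input would be the full data set.
raw = numpy.array([[110.0], [120.0], [130.0], [140.0]])

# StandardScaler applies the classic Z-score: (x - mean) / std.
scaler = StandardScaler()
normalized = scaler.fit_transform(raw)

# inverse_transform restores the original scale, just like MinMaxScaler.
restored = scaler.inverse_transform(normalized)
```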

This post is just a shallow dive into regression. If you are experienced with the architecture mentioned above and have suggestions on how to refine the presented solution, don't hesitate to share them in a comment below.