Health-GPS

Global Health Policy Simulation model

View the Project on GitHub imperialCHEPI/healthgps

Software Architecture

The Health-GPS software architecture adopts a modular design approach to provide the building blocks necessary to compose the microsimulation, which is written in modern, standard ISO C++20 using object-oriented principles. The software application is open source, cross-platform, and comprises four main components:

Health-GPS Components
Health-GPS Microsimulation Components

These components, along with the physical data storage, are the minimum package required to deploy and use the Health-GPS microsimulation. All input data processing, model parameter fitting, and results analysis procedures are carried out outside the microsimulation using tools such as R, Julia and Python, which are well suited to data wrangling, statistical analysis, and machine learning.

The Health-GPS framework adopts a modular design to specify the building blocks necessary to compose the overall system; several modules and sub-model types are required, as shown below.

Health-GPS Modules
High-level Architecture of the Health-GPS Framework

The simulation engine clock and event scheduling are based on the Discrete Event System Specification (DEVS) and provided by the ADEVS library. The simulation results are streamed asynchronously to the outside world via the message bus instance provided to the model by the host application during initialisation.

The software architecture defines interfaces for modules, sub-models, and external communication; these abstractions provide decoupling, reuse, and flexibility for composing the microsimulation to answer different research questions. All modules share a common interface, as shown below, to enable dynamic registration of different module versions using a factory pattern, which also makes the underlying data infrastructure and user inputs available for module instance creation.

Health-GPS Common Module Interface
Simulation Module Common Interface

The simulation module type enumeration is used by the microsimulation engine to request the correct simulation module type implementation from the Module Factory. The factory uses the same enumeration to register the module builder functions, which each concrete implementation of a simulation module interface must define, as shown below. The RuntimeContext data structure, used as a parameter, is created by the simulation engine to store the virtual population and to share the policy scenario, run-time information (e.g., current run number and time), and common infrastructure such as the random number generator, message bus and model settings with all modules during calls via the public interface.

Health-GPS Module Factory
Module Factory Class Diagram with Concrete Builder Function example
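The registration and lookup described above can be sketched as follows. This is a minimal illustration of the factory pattern, not the Health-GPS declarations: ModuleType, ModuleFactory, register_builder, create, DemoDiseaseModule and the empty Repository and ModelInput stand-ins are all hypothetical names chosen for the example.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Hypothetical names throughout; not the actual Health-GPS declarations.
enum class ModuleType { RiskFactor, Disease };

struct Repository {}; // stand-in for the backend data access
struct ModelInput {}; // stand-in for the user inputs

class SimulationModule {
  public:
    virtual ~SimulationModule() = default;
    virtual ModuleType type() const noexcept = 0;
    virtual std::string name() const = 0;
};

class DemoDiseaseModule final : public SimulationModule {
  public:
    ModuleType type() const noexcept override { return ModuleType::Disease; }
    std::string name() const override { return "Disease"; }
};

class ModuleFactory {
  public:
    using Builder =
        std::function<std::unique_ptr<SimulationModule>(Repository &, const ModelInput &)>;

    explicit ModuleFactory(Repository &repository) : repository_{repository} {}

    // The same enumeration is used as the key for registration and lookup.
    void register_builder(ModuleType type, Builder builder) {
        builders_[type] = std::move(builder);
    }

    std::unique_ptr<SimulationModule> create(ModuleType type, const ModelInput &input) const {
        auto it = builders_.find(type);
        if (it == builders_.end()) {
            throw std::invalid_argument("no builder registered for this module type");
        }
        // The builder receives both the data repository and the user inputs.
        return it->second(repository_, input);
    }

  private:
    Repository &repository_;
    std::map<ModuleType, Builder> builders_;
};

// Builder function that a concrete module implementation would provide.
std::unique_ptr<SimulationModule> build_disease_module(Repository &, const ModelInput &) {
    return std::make_unique<DemoDiseaseModule>();
}
```

In this sketch the host would call register_builder(ModuleType::Disease, build_disease_module) once, after which the engine can request create(ModuleType::Disease, input) without knowing the concrete class.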

The module builder functions have access to both the model inputs and the backend data storage (Repository) when requested to create the respective simulation module instance. The Repository interface shown below provides read-only access to external datasets loaded via configuration to parameterise the risk factor module, and exposes the Datastore interface implementation instance. The builders registered with the factory can retrieve the required raw data, then reshape and combine it to create the respective module parameter definitions and instance.

Health-GPS Repository
Data Repository Interface Diagram

This minimalistic design exposes the available datasets to the factory module builder functions, which are ultimately responsible for creating fully working module instances when requested. The requirement to evaluate baseline and intervention scenarios together, as discussed later, provides an opportunity to extend the design to cache read-only module parametrisation and share it between the two scenarios. This optimisation would minimise backend I/O and memory usage, at the expense of introducing coupling between the repository and the module parameters.

The backend data storage interface shown below defines the contract: the Data API abstraction gives Health-GPS a uniform, easy-to-use data access layer over the physical backend storage, with strong typing and fully agnostic of the implementation.
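One possible shape for the caching extension mentioned above is a small thread-safe cache that loads each read-only definition once and hands out shared, immutable pointers to both scenarios. This is only a sketch of the idea, not part of the current design: DefinitionCache, Loader and get are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <map>
#include <memory>
#include <mutex>
#include <string>
#include <utility>

// Hypothetical cache for read-only module definitions, keyed by name.
template <typename Definition> class DefinitionCache {
  public:
    using Loader = std::function<Definition(const std::string &)>;

    explicit DefinitionCache(Loader loader) : loader_{std::move(loader)} {}

    // Returns the cached definition, loading it from the backend only once.
    std::shared_ptr<const Definition> get(const std::string &key) {
        std::lock_guard<std::mutex> lock{mutex_};
        auto it = cache_.find(key);
        if (it != cache_.end()) {
            return it->second; // shared by baseline and intervention
        }
        auto value = std::make_shared<const Definition>(loader_(key));
        cache_.emplace(key, value);
        return value;
    }

  private:
    Loader loader_;
    std::mutex mutex_;
    std::map<std::string, std::shared_ptr<const Definition>> cache_;
};
```

Because the cached objects are const and reference counted, both scenarios can read them concurrently, which is where the saving in backend I/O and memory would come from. The coupling noted above shows up in the fact that the cache must know the definition type.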

Health-GPS Data API
Backend Data API Interface

See Data Model for the backend data model definition.

To take the virtual population through time, the simulation modules have different requirements, and consequently the simulation module interface has been extended with new properties and operations to satisfy the different modules as shown below.

Health-GPS Extended Module Interface
Extended Simulation Module Common Interface

All modules must initialise the virtual population at the beginning of the simulation and update the respective properties at each subsequent simulated time step until the simulation ends. Modules providing additional functionality to the simulation algorithm, such as population trends and disease indicators, have specific extensions added to their interfaces.

The two host modules, risk factor and disease, are special containers for similar sub-models and are responsible for managing their creation, ownership and execution order when requested by the simulation engine or other modules.
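The initialise-then-update contract can be illustrated with a toy module. The sketch below assumes a drastically reduced RuntimeContext and Person, and the method names initialise_population and update_population as well as the AgeingModule class are illustrative rather than the project's actual interface.

```cpp
#include <cassert>
#include <string>
#include <vector>

// Illustrative stand-ins; the real Health-GPS types carry far more state.
struct Person {
    unsigned age{0};
};

struct RuntimeContext {
    int time_now{0};
    std::vector<Person> population;
};

class SimulationModule {
  public:
    virtual ~SimulationModule() = default;
    virtual std::string name() const = 0;
    // Called once at the start of each simulation run.
    virtual void initialise_population(RuntimeContext &context) = 0;
    // Called once per simulated time step until the run ends.
    virtual void update_population(RuntimeContext &context) = 0;
};

// Toy module: assigns a starting age, then ages everyone by one year per step.
class AgeingModule final : public SimulationModule {
  public:
    explicit AgeingModule(unsigned start_age) : start_age_{start_age} {}

    std::string name() const override { return "Ageing"; }

    void initialise_population(RuntimeContext &context) override {
        for (auto &person : context.population) {
            person.age = start_age_;
        }
    }

    void update_population(RuntimeContext &context) override {
        for (auto &person : context.population) {
            ++person.age;
        }
    }

  private:
    unsigned start_age_;
};
```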

The risk factor module hosts hierarchical linear models that are defined and fitted to data by the user and provided to the host application in the configuration file. The hierarchical linear model interface below defines the two types of models previously described, Static and Dynamic: the first is used to initialise the population, and the second to update the population’s dynamics.

Health-GPS Hierarchical Models
Hierarchical Linear Model Common Interface

The disease module hosts multiple instances of disease models from known groups, configured for different disease definitions. The disease model public interface is shown below. Diseases are uniquely identified by type, and two groups of diseases are currently modelled: others and cancers. The first represents general noncommunicable diseases, and the second represents types of cancer.

Health-GPS Disease Models
Disease Model Common Interface

The main difference between the two groups of diseases lies in the internal modelling. Both groups’ definitions are country-based and include disease rates (prevalence, incidence, mortality and remission) by age and gender, and relative risks for diseases and risk factors. However, cancer detection, mortality and remission are modelled differently from the others group and require an additional set of parameter data to be provided as part of the definition.

Virtual Population

All modules act on a virtual population of entities, individuals, or actors, that are the centre of the microsimulation model. The Health-GPS population is dynamic and changes over time, with births, deaths and immigration being the events affecting the population size. The entire population is stored in a C++ standard library vector<T> for dynamic memory management and exception safety; the vector is protected by a thin wrapper for easy access.

Below are the class diagrams for the thin Population wrapper, the virtual Person data structure and associated types as used to represent individuals within the simulated virtual population.

Health-GPS Virtual Population
Virtual Population’s Entity definition
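A thin wrapper of this kind might look like the sketch below. The member names (size, current_active_size, add_newborns) and the reduced Person are assumptions made for illustration; the class diagram above is the authoritative description.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Reduced, hypothetical Person: just enough state for the wrapper example.
struct Person {
    std::size_t id{0};
    unsigned age{0};
    bool alive{true};
    bool is_active() const noexcept { return alive; }
};

// Thin wrapper that owns the vector and controls how it grows.
class Population {
  public:
    explicit Population(std::size_t initial_size) {
        people_.reserve(initial_size);
        add_newborns(initial_size);
    }

    std::size_t size() const noexcept { return people_.size(); }

    // Counts only the individuals still taking part in the simulation.
    std::size_t current_active_size() const {
        return static_cast<std::size_t>(
            std::count_if(people_.begin(), people_.end(),
                          [](const Person &p) { return p.is_active(); }));
    }

    // Bounds-checked access; throws std::out_of_range on a bad index.
    Person &operator[](std::size_t index) { return people_.at(index); }

    // Births grow the population; each newcomer gets the next unique id.
    void add_newborns(std::size_t count) {
        for (std::size_t i = 0; i < count; ++i) {
            Person baby;
            baby.id = next_id_++;
            people_.push_back(baby);
        }
    }

    auto begin() noexcept { return people_.begin(); }
    auto end() noexcept { return people_.end(); }

  private:
    std::vector<Person> people_;
    std::size_t next_id_{1};
};
```

Keeping the vector private means identifier assignment and growth happen in one place, while begin and end still allow range-based loops over the individuals.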

Each individual is given a unique identifier number at creation and contains trivial properties and flags, plus dynamic properties representing disease and risk factor values, as follows:

The Disease type stores the disease’s status and timing values. The status is either Active, indicating that the individual currently has the disease, or Free, indicating that the individual had the disease but is now free of it (remission). The start_time value holds the simulation time at which the individual was diagnosed with the disease, and time_since_onset holds the onset time of the disease at diagnosis, used for cancers only.

Helper methods are provided for easy access. For instance, the is_active method checks all the status flags to indicate whether an individual is currently in the simulation, and the get_risk_factor_value method handles the complex access to the dynamic properties.
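The sketch below shows one way these pieces could fit together. The names Disease, start_time, time_since_onset, is_active and get_risk_factor_value come from the text; everything else is an assumption for illustration, including the specific flags checked, the string keys, the special handling of "age" and the use of -1 for "not applicable".

```cpp
#include <cassert>
#include <cstddef>
#include <map>
#include <stdexcept>
#include <string>

enum class DiseaseStatus { Free, Active };

struct Disease {
    DiseaseStatus status{DiseaseStatus::Free};
    int start_time{0};        // simulation time of diagnosis
    int time_since_onset{-1}; // cancers only; -1 means not applicable (assumption)
};

struct Person {
    std::size_t id{0};
    unsigned age{0};
    bool is_alive{true};      // assumed status flag
    bool has_emigrated{false}; // assumed status flag
    std::map<std::string, double> risk_factors;
    std::map<std::string, Disease> diseases;

    // An individual takes part in the simulation only if no flag rules them out.
    bool is_active() const noexcept { return is_alive && !has_emigrated; }

    // Single access point for values stored in different places.
    double get_risk_factor_value(const std::string &key) const {
        if (key == "age") { // assumption: trivial properties are exposed by key
            return static_cast<double>(age);
        }
        auto it = risk_factors.find(key);
        if (it == risk_factors.end()) {
            throw std::out_of_range("unknown risk factor: " + key);
        }
        return it->second;
    }

    bool has_active_disease(const std::string &key) const {
        auto it = diseases.find(key);
        return it != diseases.end() && it->second.status == DiseaseStatus::Active;
    }
};
```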

Simulation Engine

The simulation engine is responsible for managing the simulation clock, events scheduling and simulation execution order. The Health-GPS engine is based on the Discrete Event System Specification (DEVS) and provided by the ADEVS library. DEVS formalism is a system-theoretic concept that represents inputs, states, and outputs, like a state machine, but with the critical addition of a time-advance function. This combination enables DEVS models to represent discrete event systems, agent-based systems as well as hybrids with continuous components in a straightforward manner. Health-GPS describes a discrete event dynamic system, which can be advantageously represented as DEVS models for greater generality, clarity, and flexibility.

The DEVS model has been abstracted behind the Simulation base class, which defines the Health-GPS simulation engine shown below. It adds extensions to support this specific workflow and stores the SimulationDefinition data structure containing the minimum information required to create a simulation. The pseudorandom number generator functionality required by the model is defined by the RandomBitGenerator interface, so algorithms for generating sequences of numbers can be implemented and easily used by the simulation engine at runtime.

Health-GPS Engine Interface
Health-GPS Simulation Engine Interface
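To show how a generator abstraction of this kind can remain usable with the standard library distributions, the sketch below defines an abstract interface with the members that a uniform random bit generator needs (result_type, min, max and the call operator). Only the name RandomBitGenerator comes from the text; the members shown, the MersenneTwister class and roll_die are assumptions.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical shape of the interface; only its name is taken from the text.
class RandomBitGenerator {
  public:
    using result_type = std::uint32_t;

    virtual ~RandomBitGenerator() = default;
    virtual result_type operator()() = 0;
    virtual void seed(result_type value) = 0;

    // Required so that <random> distributions accept the interface type.
    static constexpr result_type min() noexcept { return 0u; }
    static constexpr result_type max() noexcept { return 0xFFFFFFFFu; }
};

// One interchangeable algorithm behind the interface: the 32-bit Mersenne Twister.
class MersenneTwister final : public RandomBitGenerator {
  public:
    explicit MersenneTwister(result_type value) : engine_(value) {}

    result_type operator()() override { return static_cast<result_type>(engine_()); }

    void seed(result_type value) override { engine_.seed(value); }

  private:
    std::mt19937 engine_;
};

// Client code depends on the interface only, not on the concrete algorithm.
int roll_die(RandomBitGenerator &generator) {
    std::uniform_int_distribution<int> die(1, 6);
    return die(generator);
}
```

Swapping in a different algorithm then only requires another class derived from the interface; callers such as roll_die are unaffected.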

The default implementation of the Simulation interface, the simulation engine, is provided by the HealthGPS class shown below. The simulation engine requests the module factory to create all the required module type instances during construction. The engine owns the module instances created by the factory and has full control over the duration of each run and time step advancements during its lifespan. The RuntimeContext data structure is created by the Health-GPS engine during construction to store the virtual population, and share common infrastructure, runtime information, and model settings with modules via calls to the public interface.

Health-GPS Engine
Health-GPS Simulation Engine

The engine communicates with the outside world to update on progress, report errors, and publish simulation results via messaging. The EventAggregator and EventSubscriber interfaces define the functionality that a concrete Message Bus implementation must provide, as shown below. Four types of messages are currently defined; new messages can easily be added by extending the EventMessage interface. A single message is broadcast to one or more subscribers.

Health-GPS Engine
Health-GPS Message Bus Interface

A concrete message bus instance is provided to the Health-GPS engine during construction and exposed by the RuntimeContext via the publish method to streamline usage. Messages can be sent synchronously or asynchronously, depending on specific requirements. For example, error messages need immediate attention before continuing and are usually synchronous, while notification messages are typically queued before processing and may be asynchronous.

The simulation engine internal workflow shown below highlights the main algorithms executed during the Health-GPS engine instance lifecycle, experiment scenario evaluation, and a single simulation run respectively. The number of simulation replications or runs required by an experiment is controlled externally by the simulation executive, which uses the same engine instance to complete multiple runs of the same scenario.
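The difference between the two delivery modes can be sketched with a very small in-process bus. MessageBus, subscribe, publish_async and the two-value EventType are hypothetical and much simpler than the EventAggregator and EventSubscriber interfaces in the diagram; the sketch also assumes the bus outlives any pending asynchronous publish.

```cpp
#include <cassert>
#include <functional>
#include <future>
#include <mutex>
#include <string>
#include <utility>
#include <vector>

// Two illustrative message kinds; the real model defines its own set.
enum class EventType { Info, Error };

struct EventMessage {
    EventType type;
    std::string text;
};

class MessageBus {
  public:
    using Handler = std::function<void(const EventMessage &)>;

    void subscribe(EventType type, Handler handler) {
        std::lock_guard<std::mutex> lock{mutex_};
        subscribers_.emplace_back(type, std::move(handler));
    }

    // Synchronous: returns only after every subscriber has handled the message.
    void publish(const EventMessage &message) const {
        for (const auto &handler : handlers_for(message.type)) {
            handler(message);
        }
    }

    // Asynchronous: delivery runs on another thread; the caller continues at once.
    std::future<void> publish_async(EventMessage message) const {
        return std::async(std::launch::async,
                          [this, message = std::move(message)] { publish(message); });
    }

  private:
    // Copy the matching handlers so they run without holding the lock.
    std::vector<Handler> handlers_for(EventType type) const {
        std::lock_guard<std::mutex> lock{mutex_};
        std::vector<Handler> matches;
        for (const auto &[subscribed_type, handler] : subscribers_) {
            if (subscribed_type == type) {
                matches.push_back(handler);
            }
        }
        return matches;
    }

    mutable std::mutex mutex_;
    std::vector<std::pair<EventType, Handler>> subscribers_;
};
```

With this shape, an error would go through publish so that the caller waits for it to be handled, while progress notifications could use publish_async and keep the returned future only if completion matters.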

Health-GPS Engine Workflow
Health-GPS Simulation Engine Workflow

The simulation engine algorithm is responsible for the order of execution of each module and for providing the parameters for each call. The same random number generator instance is used throughout the simulation. The generator is initialised during the Initialise step, and users can optionally provide a fixed seed to initialise the generator for reproducibility. This initialisation is sufficient for experiments with only a baseline scenario; for experiments in which baseline and intervention scenarios are evaluated together, the random number generator is reset before each run by the simulation executive to ensure reproducibility. The simulation engine consists of two main algorithms, which initialise and update the virtual population over time respectively.
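The seeding scheme can be illustrated with the standard library alone. In the sketch below a master generator, optionally fixed by the user, draws one seed per run, and each scenario resets its own generator with that run seed so that paired runs see the same random stream. The function names make_run_seeds and scenario_draws are hypothetical, and std::mt19937 is used only as a convenient stand-in for the model's generator.

```cpp
#include <cassert>
#include <cstddef>
#include <optional>
#include <random>
#include <vector>

// Draws one seed per run from a master generator. A fixed seed makes the
// whole experiment repeatable; otherwise the master seed is non-deterministic.
std::vector<unsigned int> make_run_seeds(std::optional<unsigned int> fixed_seed,
                                         std::size_t runs) {
    const unsigned int master_seed = fixed_seed ? *fixed_seed : std::random_device{}();
    std::mt19937 master(master_seed);
    std::vector<unsigned int> seeds(runs);
    for (auto &seed : seeds) {
        seed = static_cast<unsigned int>(master());
    }
    return seeds;
}

// Stand-in for one scenario run: resets the generator, then consumes it.
std::vector<int> scenario_draws(unsigned int run_seed, std::size_t count) {
    std::mt19937 generator(run_seed); // reset before the run
    std::uniform_int_distribution<int> draw(0, 99);
    std::vector<int> values(count);
    for (auto &value : values) {
        value = draw(generator);
    }
    return values;
}
```

Calling scenario_draws twice with the same run seed, once for the baseline and once for the intervention, yields identical sequences, which is the property the reset before each run is meant to guarantee.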

Initialise Population

Creating and initialising the virtual population is the first step of a simulation run. The sequence diagram below shows the events and order of activation of the different modules by the Health-GPS algorithm.

Health-GPS Initialise Population
Initialise Population Algorithm (Sequence Diagram #1)

The algorithm sets the simulation world clock, in years, to the user-defined start time, and uses the simulation module interface to request each module to initialise the relevant properties of the virtual population individuals.

Update Population

The update population algorithm is the heart of the Health-GPS microsimulation. The sequence of events and order of activation for the different modules during each time step update of the simulation is shown below. The algorithm moves the simulation clock, in years, forwards until the user-defined end time is reached, at which point the algorithm terminates, finishing the run.

Health-GPS Update Population
Update Population Algorithm (Sequence Diagram #2)
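Stripped of the DEVS machinery, the two algorithms amount to an initialisation pass at the start time followed by a yearly loop. The sketch below is a plain loop rather than an ADEVS model, and it assumes that the end time is inclusive; run_once and Step are hypothetical names.

```cpp
#include <cassert>
#include <functional>
#include <vector>

// A step receives the current simulation time, in years.
using Step = std::function<void(int)>;

// Runs one simulation: initialise at start_time, then advance the clock one
// year at a time, calling every update step, until stop_time is reached.
// Returns the number of update time steps executed.
int run_once(int start_time, int stop_time, const std::vector<Step> &initialise_steps,
             const std::vector<Step> &update_steps) {
    int time_now = start_time;
    for (const auto &step : initialise_steps) {
        step(time_now);
    }

    int steps_taken = 0;
    while (time_now < stop_time) {
        ++time_now; // move the world clock forwards by one year
        for (const auto &step : update_steps) {
            step(time_now); // modules are activated in a fixed order
        }
        ++steps_taken;
    }
    return steps_taken;
}
```

The order of the entries in update_steps plays the role of the module activation order in the sequence diagram.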

The empty squares in the Scenario timeline highlight data synchronisation points between the baseline and intervention scenarios, see next section for details.

Policy Scenarios

Each simulation instance must have a single policy scenario associated with it as part of the experiment; the common scenario interface definition is shown below. Two types of scenarios are supported: Baseline, which defines the trends in the observed population against which outcomes are measured; and Intervention, for policies designed to change the observed trends during a specific time frame.

Health-GPS Policy Scenarios Interface
Policy Scenario Common Interface

A policy scenario is not a typical module. Its implementation consists of a single function that is invoked with the information it needs to act: the individual, the current time, the risk factor key, and the risk factor value. The baseline scenario implementation simply loops back the risk factor value without making any change, independent of the arguments, while an intervention scenario definition might consist of external data, complex rules, and a specific time frame in which to act upon invocation.

The current modelling of intervention scenarios requires data synchronisation between the baseline and intervention scenarios at multiple points, resulting in pairs of simulations being evaluated at a time instead of independently. The figure below illustrates the present solution, which is based on shared memory and works on a single machine. It creates a unidirectional channel to send messages asynchronously from the baseline to the intervention scenario. At the receiving end, the channel is synchronous, blocking until the message arrives or a pre-defined time expires, in which case the experiment is forced to terminate.
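The single-function contract can be sketched as below. The apply signature mirrors the four inputs listed above, but its exact form, the ScalingIntervention class and its rule (scale one risk factor within a time window) are illustrative assumptions, not a Health-GPS policy.

```cpp
#include <cassert>
#include <string>
#include <utility>

struct Person {
    unsigned age{0}; // reduced stand-in for the virtual individual
};

class Scenario {
  public:
    virtual ~Scenario() = default;
    // Returns the (possibly modified) risk factor value for this individual.
    virtual double apply(const Person &person, int time, const std::string &risk_factor_key,
                         double value) const = 0;
};

// Baseline: loops the value back unchanged, whatever the arguments.
class BaselineScenario final : public Scenario {
  public:
    double apply(const Person &, int, const std::string &, double value) const override {
        return value;
    }
};

// Hypothetical intervention: scales one risk factor inside a time window.
class ScalingIntervention final : public Scenario {
  public:
    ScalingIntervention(std::string key, int start_time, int finish_time, double factor)
        : key_{std::move(key)}, start_time_{start_time}, finish_time_{finish_time},
          factor_{factor} {}

    double apply(const Person &, int time, const std::string &risk_factor_key,
                 double value) const override {
        const bool in_window = time >= start_time_ && time <= finish_time_;
        return (in_window && risk_factor_key == key_) ? value * factor_ : value;
    }

  private:
    std::string key_;
    int start_time_;
    int finish_time_;
    double factor_;
};
```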

Health-GPS Policy Scenarios Sync
Policy Scenario’s Data Synchronisation Mechanism
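A channel with these semantics (non-blocking send, blocking receive with a timeout) can be built from a queue, a mutex and a condition variable. The sketch below is a generic illustration under that assumption; Channel, send, try_receive and receive_from_baseline are hypothetical names, and the five-second timeout is arbitrary.

```cpp
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <mutex>
#include <optional>
#include <queue>
#include <thread>
#include <utility>

// One-way, in-memory channel shared by two threads on the same machine.
template <typename T> class Channel {
  public:
    // Sender side: enqueue and return immediately.
    void send(T message) {
        {
            std::lock_guard<std::mutex> lock{mutex_};
            queue_.push(std::move(message));
        }
        ready_.notify_one();
    }

    // Receiver side: block until a message arrives or the timeout expires.
    // An empty result signals the timeout, letting the caller abort the run.
    std::optional<T> try_receive(std::chrono::milliseconds timeout) {
        std::unique_lock<std::mutex> lock{mutex_};
        if (!ready_.wait_for(lock, timeout, [this] { return !queue_.empty(); })) {
            return std::nullopt;
        }
        T message = std::move(queue_.front());
        queue_.pop();
        return message;
    }

  private:
    std::mutex mutex_;
    std::condition_variable ready_;
    std::queue<T> queue_;
};

// Usage sketch: the baseline thread sends, the intervention side waits.
int receive_from_baseline() {
    Channel<int> channel;
    std::thread baseline([&channel] { channel.send(42); });
    auto message = channel.try_receive(std::chrono::seconds(5));
    baseline.join();
    return message.value_or(-1);
}
```

The predicate passed to wait_for guards against spurious wake-ups and also covers the case where the message was sent before the receiver started waiting.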

An alternative to this design is to use a message broker, e.g., RabbitMQ, or a distributed event streaming platform such as Apache Kafka to distribute random generator seeds and messages over a network of computers running in pairs to scale up the model’s virtual population size and throughput. This solution would allow the data from a single baseline scenario to be shared by multiple intervention scenarios, markedly reducing the requirement for simulation pair coupling.

Simulation Executive

The simulation executive creates the simulation running environment, instructs the simulation engine to evaluate the experiment scenarios for a pre-defined number of runs, manages master seed generation, notifies progress, and handles experiment cancellation. The ModelRunner class shown below implements the Health-GPS simulation executive.

Health-GPS Runner Class Diagram
Simulation Executive Class Diagram

Two modes of evaluating a simulation experiment are provided by the simulation executive, using the run function with overloaded parameters. The two paths of execution are illustrated below: the first simulates no-intervention, baseline-scenario-only experiments, while the second simulates intervention experiments with baseline and intervention scenarios evaluated as a pair, with data synchronisation as described above.

Health-GPS Simulation Runner
Health-GPS Simulation Executive Activity Diagram

Experiment scenarios are evaluated in parallel using multiple threads; however, the need to exchange data between scenarios creates an indirect synchronisation with a small overhead. The ADEVS executive, the Simulator class, is used inside each thread loop to execute the respective experiment scenario. The simulation executive communicates with the outside world via messages, ideally sharing the same message bus instance with the simulation engine, indicating the start and finish of the experiment and notifying errors and cancellation.

The message bus mechanism decouples the sender from the receiver. Typically, one or more event monitors subscribe to messages, then receive, queue, and process them at their own pace and on their own thread. Common activities are displaying on screen, streaming over the internet, summarising results and/or logging to file.

Deployment

The various components of the Health-GPS ecosystem can be deployed to multiple computing platforms. The four components are packaged together into the host application executable, which is purpose built for each target platform as shown below. The backend data storage is platform independent, but must be available, accessible, and properly configured for the application to work correctly at runtime.

Health-GPS Deployment
Health-GPS Deployment Package

The version of the libraries required by the application at runtime depends on the compiler used to build the Health-GPS executable. The source code is portable across compilers supporting the C++20 standard; however, the resulting binaries are platform dependent and must be built, tested, and deployed accordingly for the model to work as expected.

See Data Model and Developer Guide for detailed information on the backend data storage and the implementation of the various interfaces respectively.