A lot of emphasis has been put on project governance, information governance and, more recently, data governance, especially with the increase in regulatory oversight such as the GDPR, which forces organisations to be more disciplined to ensure they are not at risk of a breach.
With the advances of machine learning and artificial intelligence coupled with the use of big data, it becomes paramount to consider analytics models governance as part of your governance, risk and compliance (GRC) regime. This article presents some of the key components of a model governance framework and explores their importance.
Why analytics model governance is important
The main reason why we implement governance in a given area is to reduce or mitigate risks. In the case of analytics models, we may want to reduce the risks of:
- models gaining insights that invade individuals’ privacy or come close to a privacy breach
- models not being updated and producing incorrect results due to obsolescence
- models based on flawed technical assumptions
- models not being sufficiently documented
- models failing to undergo robust enough testing
- models not being compliant with regulations
Those risks, if not adequately treated, could translate into issues such as:
- increased costs due to the impact of non-compliance, for example regulatory fines
- customer dissatisfaction
- management disillusionment.
The absence of any of the governance components described in this article would increase our risk. Take, for example, a lack of version control, which would prevent a smooth rollback in case of an error and delay any fix; or a lack of proper documentation, which would increase the risk of regulator dissatisfaction.
Operating model and ownership
Defining an operating model is a core aspect of governance; this should include roles and responsibilities, starting with the ownership of artefacts. It would make sense for the model owner to make the final decision on analytics modelling. This is not necessarily the analyst or scientist developing the model. The model owner would typically be a business executive who will benefit from, or be highly impacted by, the model outcome.
Ownership in this context is about accountability, while the responsibility for building the model may lie with the analyst or scientist. RACI (responsible, accountable, consulted, informed) matrices would assist organisations in clearly defining the scope of responsibilities and the operating model for analytics modelling governance. At the risk of stating the obvious, it should be noted that the word ‘model’ in an operating model context has an organisational connotation, while, as we know, the word ‘model’ in the analytics context has a mathematical connotation.
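A RACI matrix can be captured in something as simple as a lookup table. The following is a minimal sketch; the roles, activities and assignments shown are illustrative assumptions, not a prescribed standard.

```python
# Illustrative RACI matrix for analytics model governance.
# Roles and activities are assumptions for this sketch.
RACI = {
    "develop model":        {"data scientist": "R", "model owner": "A", "risk officer": "C", "it ops": "I"},
    "approve model":        {"data scientist": "C", "model owner": "A", "risk officer": "R", "it ops": "I"},
    "deploy to production": {"data scientist": "C", "model owner": "A", "risk officer": "I", "it ops": "R"},
}

def accountable_for(activity: str) -> str:
    """Return the single role accountable (A) for a given activity."""
    accountable = [role for role, code in RACI[activity].items() if code == "A"]
    if len(accountable) != 1:
        raise ValueError("exactly one role must be accountable per activity")
    return accountable[0]
```

A check like `accountable_for` encodes the RACI convention that each activity has exactly one accountable role, which is precisely where ownership disputes tend to arise.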
The linkage with data governance
One of the main flawed assumptions an analyst may make in building a model, regardless of the industry where it is applied, is to believe the underlying data can be trusted, i.e. that the data has an acceptable level of quality.
Although dealing with data quality is outside the scope of this paper, it is important to highlight that there are several ‘dimensions’ to be considered in a data quality framework, including accuracy, timeliness, consistency and integrity. To complete this point, cleaning up the input data for an analytical model should not be part of model development itself but a prerequisite to it. The operating model and RACI matrices discussed earlier become relevant in this case.
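A pre-modelling quality gate could be sketched as below. This is an assumption of how such a check might look: the field names (`value`, `as_of`) and the two dimensions checked (completeness and timeliness) are illustrative, not a full data quality framework.

```python
from datetime import date, timedelta

def quality_gate(rows, today, max_age_days=30):
    """Illustrative pre-modelling quality gate. Checks two example
    dimensions: completeness (no missing values) and timeliness
    (records no older than a cutoff). Field names are assumptions."""
    issues = []
    for i, row in enumerate(rows):
        if row.get("value") is None:
            issues.append((i, "missing value"))   # completeness
        if today - row["as_of"] > timedelta(days=max_age_days):
            issues.append((i, "stale record"))    # timeliness
    return issues
```

Running such a gate before model development, and assigning its failures back through the RACI matrix, keeps data clean-up a prerequisite rather than part of the model itself.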
Analytical model artefacts
There are several artefacts that constitute a well-governed analytical model.
Business case and target benefits: The business case will tell the story of what the problem is and how we want to tackle it, while the target benefits will allow us to go back and validate whether the model achieved its objective long after the relevant analyst has potentially moved on (for example, 12, 24 or 36 months later).
Regulatory requirements: Specific regulatory requirements the model would need to comply with. Note that privacy and ethics, are beyond the scope of this paper, given that entire books could be written on those matters. By the same token, a FATE model (fair, accountable, transparent, ethical) as part of the organisation’s governance framework would be a great start.
Stakeholders register: This is a list of stakeholders and approvers of the model.
Algorithm or code: This is the set of statistical, mathematical or heuristic rules, formulas or sequences that receives certain inputs and yields an output from which an insight can be gained. This typically takes the shape of programming code.
Model architecture or design: this artefact is a high-level view of input sources and technology being used to produce an output.
Model scope and assumptions: every analytical model will assume something. Model assumptions may be required in terms of scope, model parameters, data samples, data distributions, access to the data and the model itself and any special conditions that make the model work — or not work.
Logical access definition: This essentially describes who would have access to the model, the input data and the model output. Take, for example, the pharmaceutical industry: there will be models that are for general consumption in the organisation, while others might be confidential or business sensitive and may require a different level of access.
Data samples: Every analytical model would require a data sample; this artefact would highlight the characteristics and description of the sample data being used. It would answer questions such as: Where is the data sample? Where are the datasets coming from? Are there any data sovereignty conditions of use? Are there any risks of the data being updated or deleted?
Test strategy: The analytical model must be tested before its results are used for serious decision-making; therefore, we would need to think about the tests being performed on the analytical model, the team or members performing the tests, and whether the tests were passed at 100 per cent or any concessions were made.
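The pass/fail aspect of a test strategy can be made concrete with a small acceptance harness. The sketch below is an assumption of one possible shape: `model` is any callable, and the held-out cases and tolerance would be agreed with the model owner rather than chosen by the analyst.

```python
def acceptance_pass_rate(model, cases, tolerance=0.05):
    """Illustrative acceptance test: run the model over held-out cases
    and return the fraction that fall within an agreed tolerance.
    The tolerance and cases are assumptions to be set per model."""
    passed = sum(
        1 for inputs, expected in cases
        if abs(model(inputs) - expected) <= tolerance
    )
    return passed / len(cases)
```

Recording the pass rate (and any concessions when it falls short of 100 per cent) gives the test strategy an auditable artefact rather than an informal sign-off.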
Model benchmarks: From a performance perspective, we would need to describe any models used as a benchmark, what kind of results we obtained, as well as the pros and cons of the new model.
Model configuration: Configuration file(s) clearly indicating the location of the input data as well as the location of the model outputs.
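A model configuration might look like the following sketch. The format (JSON), field names and storage locations are assumptions for illustration; the point is that input and output locations live in configuration, not hard-coded in the model.

```python
import json

# Illustrative model configuration; paths and names are hypothetical.
CONFIG_TEXT = json.dumps({
    "model_name": "churn_risk",
    "version": "1.2.0",
    "input_data": "s3://analytics/input/churn/",
    "output_location": "s3://analytics/output/churn/",
    "parameters": {"threshold": 0.7},
})

def load_config(text: str) -> dict:
    """Parse a configuration file and verify the data locations are present."""
    cfg = json.loads(text)
    for key in ("input_data", "output_location"):
        if key not in cfg:
            raise KeyError(f"configuration missing required key: {key}")
    return cfg
```

Validating the configuration on load means a missing or mislocated dataset fails fast, before the model produces outputs from the wrong inputs.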
Source control: It goes without saying that all the artefacts above must be placed into a source control system, so that we can track any changes or versions of the model.
Model development framework: Governance is integral to the overall model development lifecycle, as reflected in model development methodologies such as the Cross-Industry Standard Process for Data Mining (CRISP-DM).
Workflow: The governance of the analytical model involves multiple steps and artefacts; a workflow will allow us to establish an end-to-end process by stitching those items together using the RACI matrices as a guide.
Monitoring and calibration: Model calibration is about collecting measures from the environment being modelled over a period of time, then comparing those measures with the derived results from the model, which has been set up to represent the underlying environment.
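The comparison at the heart of calibration can be expressed very simply. The sketch below uses mean absolute error as the comparison metric; that choice, and the idea of an alert threshold, are assumptions for illustration rather than a prescribed monitoring standard.

```python
def calibration_drift(observed, predicted):
    """Illustrative calibration check: mean absolute error between
    measures collected from the environment and the model's derived
    results over the same period. Alert thresholds are set per model."""
    if len(observed) != len(predicted):
        raise ValueError("observed and predicted series must align")
    return sum(abs(o - p) for o, p in zip(observed, predicted)) / len(observed)
```

Tracking this figure over successive periods is one way to detect the obsolescence risk raised at the start of this paper: a drifting score signals the model no longer represents its environment.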
Modelling oversight: Analytical models should be fit for purpose; however, there will be cases where a more rigorous level of governance is required (for example, Health and Banking industries) given the potential impact to the public. In those cases, an oversight function in the organisation will be imperative to verify that models perform at a required level or standard, or that they comply with a given legislation.
Operationalisation: It is necessary to differentiate between an analytical model in development and one in production. Teams would first need to define what ‘production’ means, and what the requirements and process are to promote a model from development into production. A production model will be expected to take inputs and produce outputs on a timely basis, and to have some level of support.
The final product
As an analyst or data scientist, you would like to ship your model as a product, wouldn’t you? One way to see a model being shipped is as a ‘package’ containing all the related artefacts mentioned above. For each iteration or release, we could also include:
- release notes indicating a description of the model, what it is, what it does
- author and contact details
- trackable versions of the package.
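The shape of such a package can be sketched as a simple record. The field names, and the minimum artefact set checked below, are assumptions mirroring the artefacts described in this article, not a standard manifest format.

```python
from dataclasses import dataclass, field

@dataclass
class ModelPackage:
    """Illustrative shape of a shipped model 'package'; fields mirror
    the release contents described in this article."""
    name: str
    version: str
    author: str
    contact: str
    release_notes: str
    artefacts: list = field(default_factory=list)

# An assumed minimum artefact set; each organisation would define its own.
REQUIRED_ARTEFACTS = {"business case", "test strategy", "model configuration"}

def missing_artefacts(pkg: ModelPackage) -> set:
    """Return required artefacts not yet included in the package."""
    return REQUIRED_ARTEFACTS - set(pkg.artefacts)
```

A completeness check like `missing_artefacts` makes the package itself enforce the governance regime: a release with gaps is visible before it ships.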
This package containing all the artefacts described earlier, paired with a version control mechanism, will assist you in mitigating the risks and potential issues described at the beginning of this paper.
As can be seen, the governance of analytical models will assist us in reducing risks and improving our decision-making and insights. At the same time, this regime could be too heavy for some organisations.
At the end of the day, analytics model governance that is fit for purpose will add value to the organisation, while governance that is too dense for a company will not. This will be a worthwhile prioritisation and balancing exercise for the chief analytics officer, head of analytics or head of data science in your organisation.