Big data, privacy and information governance: Incorporating an ethical-based assessment
A key concern about the IoT and big data is the challenge they present to maintaining the privacy of personal information, particularly when analytics and profiling are involved.
Good information governance enables organisations to meet their regulatory obligations in respect of compliance, as well as achieving an effective ethical-based approach in line with strategic objectives.
An ethical-based approach will ensure that the processing of personal information in big data analytics is carried out in a fair, transparent, responsible and ethical manner.
As the law lags behind rapid technological innovation, particularly in big data, artificial intelligence (AI), machine learning and the Internet of Things (IoT), there is increasing awareness and discussion of the need for an ethical-based approach to data analytics.
This article considers why an ethical-based approach can build trust and transparency with consumers and citizens, and why it should be part of good information governance as a means of maximising the value of information derived from data analytics while minimising risks.
Big data describes the large volumes of data held by corporations and governments. Using analytics technology tools, insights and knowledge can be derived from the data. These insights can then be used to make informed decisions, for example, in the development of new or improved products or services providing a competitive advantage and ultimately delivering results to the bottom line.
In other words, there is big value in the information that can be extracted from data collected and stored by both the corporate and government sectors. The Australian Government, as part of the National Innovation and Science Agenda, recognises data as ‘a strategic national resource that holds considerable value for growing the economy, improving service delivery and transforming policy outcomes’.1
IoT, AI and big data analytics
A large contributor to the exponential increase in data held by organisations is the growth of the Internet of Things (IoT). Cisco has estimated that by 2021 the total volume of data generated by the IoT will reach 3.3 ZB per year (‘the gigabyte equivalent of all movies ever made will cross global IP networks every minute’), and that global IP networks will support more than 27 billion devices connected to the internet (up from 16.3 billion in 2015).2
The IoT is the interconnection of devices to the internet and involves the transmission of data between the internet and those devices, for example smart devices in factories, healthcare and smart buildings. Increasingly, it involves connecting devices (‘things’) with people in various aspects of their lives, for example through fitness trackers, medical devices and Google Maps. The IoT enables the seamless integration of our devices, and with them our data, with the internet.
The UK Information Commissioner’s Office report, Big data, artificial intelligence, machine learning and data protection (ICO Report), describes the connection between big data, AI and machine learning as follows: ‘AI can be seen as a key to unlocking the value of big data; and machine learning is one of the technical mechanisms that underpins and facilitates AI. The combination of all three concepts can be called “big data analytics”.’3
Privacy challenges for big data
A key concern about the IoT and big data is the challenge they present to maintaining the privacy of personal information, particularly when analytics and profiling are involved. This is because the personal data collected may include not only data that individuals have consciously provided, but also personal data that is recorded automatically (e.g. through the tracking of online cookies), derived from other data or inferred through data analytics.
A comprehensive list of privacy challenges in big data was identified in the International Working Group on Data Protection in Telecommunications report, Big data and privacy: Privacy principles under pressure in the age of big data analytics (Big data and privacy report), as follows:4
Re-use of data for new purposes: organisations that use collected personal data as a basis for predictive analysis must ensure that the analysis is compatible with the original purpose for collecting the data.
Data maximisation: where the value of data is linked to potential future uses, this can challenge the privacy principle that the processing of data must be adequate, relevant and not excessive for the purposes defined and stated at the time of collection.
Lack of transparency: a lack of openness about how data is compiled and used may subject consumers to decisions they do not understand or have no control over, for example in relation to the use of data by data brokers and analysis companies.
Compilation of data may uncover sensitive information: a compilation of bits and pieces of information, none of which may be sensitive in itself, may generate a sensitive result, such as the ability to predict a person’s health.
Risk of re-identification: through the compilation of data from several sources, individuals may become identifiable from data sets that appear to be anonymous at first sight.
Security implications: additional infrastructure layers are needed to process big data, and encrypting large data sets is challenging; data breaches may have more severe consequences when large data sets are involved; and organisations that acquire and maintain large sets of personal data must be responsible stewards of that information.
Incorrect data: the risk that decisions are based on inaccurate information, particularly when information is obtained from online sources rather than official registries (for example, in the case of credit agencies).
Power imbalance: between those that gather the data (large organisations and states) and individuals.
Determinism and discrimination: algorithms are not neutral but reflect choices about, among other things, data, connections, inferences, interpretations and thresholds for inclusion that advance a specific purpose. Big data may therefore consolidate existing prejudices and stereotyping, as well as reinforce social exclusion and stratification.
Chilling effect: people will restrict and limit their behaviour if they know or think that they might be under surveillance.
Echo chambers: personalised advertising, search results and news items may result in people being exposed only to content that confirms their own attitudes and values.
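The risk of re-identification can be illustrated with k-anonymity, a widely used privacy measure that is not itself discussed in the Big data and privacy report. A data set is k-anonymous if every combination of quasi-identifiers (attributes such as postcode, birth year and gender that are not names but can be matched against other sources) is shared by at least k records. The Python sketch below uses invented records and assumes which fields count as quasi-identifiers; a result of k = 1 means at least one person is unique in the data set and could be re-identified by linking it with another data source.

```python
from collections import Counter

# Hypothetical "anonymised" records: names removed, but quasi-identifiers remain.
RECORDS = [
    {"postcode": "3000", "birth_year": 1980, "gender": "F", "condition": "asthma"},
    {"postcode": "3000", "birth_year": 1980, "gender": "F", "condition": "diabetes"},
    {"postcode": "3052", "birth_year": 1975, "gender": "M", "condition": "asthma"},
    {"postcode": "3052", "birth_year": 1975, "gender": "M", "condition": "none"},
    {"postcode": "3121", "birth_year": 1990, "gender": "F", "condition": "depression"},
]

# Assumed quasi-identifiers: fields that could be matched against another
# data source, such as a public register or a marketing list.
QUASI_IDENTIFIERS = ("postcode", "birth_year", "gender")


def _group_sizes(records, quasi_identifiers):
    """Count how many records share each combination of quasi-identifier values."""
    return Counter(tuple(r[q] for q in quasi_identifiers) for r in records)


def k_anonymity(records, quasi_identifiers):
    """Return k: the size of the smallest group of indistinguishable records."""
    return min(_group_sizes(records, quasi_identifiers).values())


def unique_records(records, quasi_identifiers):
    """Return records whose quasi-identifier combination appears only once."""
    sizes = _group_sizes(records, quasi_identifiers)
    return [r for r in records if sizes[tuple(r[q] for q in quasi_identifiers)] == 1]


if __name__ == "__main__":
    print("k =", k_anonymity(RECORDS, QUASI_IDENTIFIERS))  # k = 1
    for record in unique_records(RECORDS, QUASI_IDENTIFIERS):
        print("re-identifiable:", record)
```

Raising k usually means generalising values (for example, reporting birth decade instead of birth year) or suppressing rare records. k-anonymity alone does not protect against every linkage or inference attack, so it is best treated as one input to an assessment rather than a guarantee of anonymity.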
Big data: Transparency and trust
Lack of transparency, identified in the Big data and privacy report above, and trust are key issues for consumers when it comes to providing personal information to organisations, and addressing them can provide an important competitive differentiator. The ICO Report refers to ICO-commissioned research showing that ‘the more people trust businesses with their personal data, the more appealing they find new product offers such as smart thermostats and telematics devices in cars’.
A Harvard Business Review article, ‘Customer data: designing for transparency and trust’, cited numerous studies that have ‘found that transparency about the use and protection of consumers’ data reinforces trust’.5 To assess this effect, the authors carried out their own international survey to understand consumers’ attitudes about data and what they expected in return for providing it. The authors concluded:6
‘A firm that is considered untrustworthy will find it difficult or impossible to collect certain types of data, regardless of the value offered in exchange. Highly trusted firms, on the other hand, may be able to collect it simply by asking, because customers are satisfied with past benefits received and confident the company will guard their data. In practical terms, this means that if two firms offer the same value in exchange for certain data, the firm with the higher trust will find customers more willing to share.’
Concern about the use of personal information is borne out by the Office of the Australian Information Commissioner’s (OAIC) longitudinal surveys into community attitudes to privacy and by the Consumer Policy Research Centre’s work, which reveal that Australians are increasingly aware of and concerned about the use of their personal information. The former Australian Privacy Commissioner, Timothy Pilgrim, stated in 2015:7
‘The majority of Australians — 60 percent — have decided not to deal with an organisation due to concerns about how their personal information will be used. And significantly, 97 percent of Australians don’t like their personal information to be used for a secondary purpose. This is critical to big data. Because big data projects will often involve secondary use of data.’
The Australian Community Attitudes to Privacy Survey 2017 revealed that 79 per cent of respondents were uncomfortable with businesses sharing their personal information with other organisations, and 49 per cent were uncomfortable with government sharing their personal information with other government agencies, a figure that fell to 40 per cent where it was disclosed that the information was being used for research or policy-making purposes.8 The Consumer Policy Research Centre’s 2018 survey revealed that at least two-thirds of respondents were uncomfortable with most types of information being shared with third parties, and that 95 per cent wanted businesses to give them options to opt out of the collection of certain types of information about them, of how it can be used and/or of what can be shared with others.9
Apart from the legal requirements to collect data in accordance with applicable privacy legislation, the Australian survey results and research referred to above reinforce the importance of explaining, at the time data is collected, how personal information will be used and the purpose of any secondary use. In May 2019, the Consumer Policy Research Centre published its report ‘A Day in the Life of Data’, stating that ‘opaque business practices and undisclosed data sharing arrangements do not encourage trust.’10 It identifies transparency and the implementation of minimum protection standards as key to building trust, because they support informed consent and enable consumers to identify the benefits for themselves as well as broader societal benefits.11
How can big data be used in ways that respect the privacy of individuals?
An approach that aims to build trust and is based on concepts such as fairness and respect goes beyond legal compliance and supports an ethical-based approach. Ethical considerations in relation to the processing of personal information in big data are currently very topical and have been the subject of increasing global discussion and development.
In January 2017, the Consultative Committee of the Council of Europe’s Convention 108 issued Guidelines on the protection of individuals with regard to the processing of personal data in a world of Big Data.12 The first principle of the Guidelines is the ethical and socially aware use of data. It stipulates that, in processing personal data, controllers should adequately take into account the likely impact of the processing and its broader ethical and social implications, and that the processing should not conflict with the ethical values commonly accepted in the relevant community or communities or prejudice societal interests, values and norms, including the protection of human rights.
This year, governments and organisations around the world have issued a number of papers and guidelines on ethics in AI, including ‘The Ethics Guidelines for Trustworthy Artificial Intelligence (AI)’, published in April 2019 by the High-Level Expert Group on Artificial Intelligence, an independent expert group set up by the European Commission;13 and ‘Artificial Intelligence: Australia’s Ethics Framework’, a discussion paper developed by CSIRO’s Data61 and released in April 2019 to inform the Government’s approach to AI ethics in Australia.14 In May 2019, the OECD adopted its Principles on Artificial Intelligence, the first international standards agreed by governments for the responsible stewardship of trustworthy AI, which include recommendations for public policy and Principles to be applied to AI developments around the world.15 The Principles ‘promote AI that is innovative and trustworthy and that respects human rights and democratic values’.16
In the United States, the Information Accountability Foundation (IAF) has been working on a big data ethics initiative since 2014. The IAF’s goal is ‘to achieve effective information governance systems to facilitate information-driven innovation while protecting individuals’ rights to privacy and autonomy.’17 The IAF sets out its ethical data use principles in the paper ‘Decisioning process, risk-benefits analysis tool for data intensive initiatives — Achieving legal, fair and just use of data & appropriate individual engagement’.18 The ethical principles it articulates are that data use should be beneficial; fair, respectful and just; transparent and autonomy protecting; and performed with appropriate accountability, including a redress provision. These are core ethical principles that are also found in other ethics guidelines being published. The considerations for each of these ethical principles as proposed by the IAF are set out in the table below.
BENEFICIAL
Uses of data should provide benefits and value to individual users of the product or service. While the focus should be on the individual, benefits may also be accrued at a higher level, such as groups of individuals and even society as a whole.
Where a data use has a potential impact on individual(s), the benefit should be defined and assessed against potential risks this use might create.
Where data use does not impact an individual, risks, such as adequately protecting the data, should be identified.
Once all risks are identified, appropriate ways to mitigate these risks should be implemented.
FAIR, RESPECTFUL AND JUST
The use of data should be viewed by the reasonable individual as consistent, fair and respectful.
Data use should support the value of human dignity – that individuals have an innate right to be valued, respected and to receive ethical treatment. Human dignity goes beyond individual autonomy to interests such as better health and education.
Entities should assess data use against inadvertent, inappropriate bias or labelling that may have an impact on reputation or the potential to be viewed as discriminatory by individual(s).
Data should be used consistent with the ethical values of the entity.
The least data-intensive processing should be used to effectively meet the data processing objectives.
TRANSPARENT AND AUTONOMY PROTECTION (Engagement and participation)
As part of the dignity value, entities should always take steps to be transparent about their use of data. Proprietary processes may be protected but not at the expense of transparency about substantive uses.
Dignity also means providing individuals and users appropriate and meaningful engagement and control over uses of data that impact them.
ACCOUNTABILITY AND REDRESS PROVISION
Entities are accountable for their use of data to meet legal requirements and should be accountable for using data consistent with the principles of Beneficial, Fair, Respectful & Just and Transparent & Autonomous Protection. They should stand ready to demonstrate the soundness of their accountability processes to those entities that oversee them.
Individuals and users should always have the ability to question the use of data that impacts them and to challenge situations where use is not consistent with the core principles of the entity.
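The data minimisation consideration in the table (using the least data-intensive processing that meets the objective), together with the earlier point that new uses must be compatible with the original purpose, can also be enforced within a data pipeline. The Python sketch below is a hypothetical illustration: the purposes, the field names and the `ALLOWED_FIELDS` mapping are assumptions, and in practice such a mapping would be approved through the organisation’s impact assessment process rather than chosen by developers.

```python
# Hypothetical data scopes: each purpose stated at collection maps to the
# minimum set of fields it needs (assumed values, for illustration only).
ALLOWED_FIELDS = {
    "delivery": {"customer_id", "street_address", "postcode"},
    "sales_trend_analysis": {"postcode", "product_category", "purchase_month"},
}


class PurposeNotApproved(Exception):
    """Raised when data is requested for a purpose with no approved data scope."""


def minimise(record: dict, purpose: str) -> dict:
    """Return a copy of the record containing only the fields approved for the purpose."""
    if purpose not in ALLOWED_FIELDS:
        # A new (secondary) purpose needs its own assessment before any data is released.
        raise PurposeNotApproved(f"no approved data scope for purpose {purpose!r}")
    allowed = ALLOWED_FIELDS[purpose]
    return {name: value for name, value in record.items() if name in allowed}


if __name__ == "__main__":
    customer = {
        "customer_id": 1042,
        "name": "A. Citizen",
        "date_of_birth": "1980-02-14",
        "street_address": "1 Example St",
        "postcode": "3000",
        "product_category": "garden",
        "purchase_month": "2019-05",
    }
    print(minimise(customer, "sales_trend_analysis"))
    # {'postcode': '3000', 'product_category': 'garden', 'purchase_month': '2019-05'}
    try:
        minimise(customer, "credit_scoring")
    except PurposeNotApproved as err:
        print("blocked:", err)
```

Failing closed on an unknown purpose means a secondary use cannot proceed silently; it has to be assessed and approved first, which is the point at which the ethical considerations in the table can be applied.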
Ethical framework considerations
The need for an ethical-based approach and the formality of the process will depend on the context, such as the size of the data set or sets and the amount of personal data to be processed within them. Whether a formal ethical framework or an ethics committee should be established will depend on the type and recurrence of data analytics being carried out and will vary across industries as well as government departments.
A comprehensive approach would include an internal ethics committee for the collection of data and data analytics. The ICO Report points out that ‘a large organisation may have its own board of ethics, which could ensure that its ethical principles are applied, and could make assessments of difficult issues such as the balance between legitimate interests and privacy rights’.19 Universities have a long history of research ethics committees, and there are examples of medical and health organisations taking an ethics-based approach to the collection of data and data analytics. For example, researchers at the Centre for Epidemiology and Biostatistics at the Melbourne School of Population and Global Health at the University of Melbourne developed the Guidelines for the Ethical Use of Digital Data in Human Research.20
In Europe, the Council of Europe’s ‘Guidelines on the protection of individuals with regard to the processing of personal data in a world of Big Data’ recommend the use of ethics committees, where the use of big data is assessed as likely to have a high impact on ethical values, as a means of identifying the specific ethical values to be safeguarded in the use of data. The Guidelines at 1.3 provide that the ‘ethics committee should be an independent body composed by members selected for their competence, experience and professional qualities and performing their duties impartially and objectively.’21
As the ICO Report points out, an important issue ‘is the organisational relationship between the ethics board and employees with responsibilities for data and analytics, such as the chief data officer and the data protection officer’.22 This highlights the importance of an overarching information governance framework with appropriate formal structures, including robust policies and organisational mechanisms to ensure appropriate leadership and a multidisciplinary approach. To both maximise the opportunities and value from data and minimise the risks, a collaborative approach supported by formal mechanisms that facilitate effective working groups is essential. This will involve collaboration, through working groups overseen by a steering group (in large organisations), between those who manage the data; those responsible for its protection, that is, the chief privacy officer and legal team; those performing the data analytics; and those using the information obtained to ultimately derive value from it.
One of the key challenges facing organisations in the information age is the ability to set and implement an organisational strategic approach to information governance to effectively maximise the value of information through data analytics while minimising its risks. These include the costs arising from:
data breach, particularly in respect of personal information
other breaches of privacy or other applicable legislation
compliance with mandatory data breach notification requirements
responding to regulators
responding to and dealing with adverse publicity arising from perceived misuse of consumer information
an actual or perceived loss of consumer trust and its negative impacts, such as ongoing damage to reputation and/or a negative impact on the bottom line.
Information governance, which includes the governance of data and information across organisational silos, and the strategic management of data analytics initiatives need to be aligned with overall strategic objectives and led from the top down; that is, they require board and C-suite leadership. Good information governance enables organisations to meet their regulatory obligations in respect of privacy laws and record retention and disposal requirements, as well as to achieve an effective ethical-based approach and self-regulation in accordance with overall organisational strategic objectives. An ethical-based approach should be embedded into the organisation so that employees working on data initiatives have a clear mandate to apply guiding ethical principles or an ethical framework, supported by clear data governance (that is, the rules around the use, access, accuracy and availability of data) and compliance with privacy and records regulation.
The way in which an ethical-based approach may be embedded into an organisation will vary according to the types of data initiatives being undertaken, the volume of personal information involved or the potential to identify personal information through data sharing or combining data sets.
The types of ethical-based approaches include:
Ethical value or policy statement for data initiatives, which can be used as a reference point for employees in any data initiatives to guide decision making and data impact assessments;
Ethical frameworks and/or checklists for use in data initiatives — see for example: Data Impact Assessments including ethical aspects below; and Part B in the Guidelines for the Ethical Use of Digital Data in Human Research, which sets out five categories of ethical issues and guiding questions when conducting research involving digital data covering: consent, privacy, ownership, data governance, and data sharing;23 and
Ethics committees or internal ethics boards.
Data privacy impact assessments
As the IAF paper sets out, whether the project is a core product review, the broader use of information or a big data analytics project, an assessment process is required to address the legal, ethical, fair and other implications of information use. Privacy Impact Assessments (PIAs) are required to be undertaken by APP entities under the Privacy Act 1988, and Data Protection Impact Assessments (DIAs) by organisations that are controllers under Article 35 of the European Union’s General Data Protection Regulation (GDPR). Pursuant to APP 1, reasonable steps must be taken to implement practices, procedures and systems that will ensure compliance with the APPs. The OAIC explains that a PIA is ‘a systematic assessment of a project that identifies the impact that the project might have on the privacy of individuals, and sets out recommendations for managing, minimising or eliminating that impact’.24 The OAIC’s guide sets out in detail the matters to be considered, assessed and addressed, and provides a recommended report format.
The IAF paper proposes a Comprehensive Data Impact Assessment (CDIA), as set out in the figure below, for data intensive initiatives to adequately and systematically (operationally) determine interests and impact to stakeholders, including specifically the individual. The CDIA below is designed to analyse ‘whether the individuals’ interests, those of society and those of processors are assessed in a manner that demonstrates legal, fair and just use of data and are risks identified and appropriately mitigated’. The figure below is helpful in showing how an ethical-based assessment component can be added into a PIA/DIA.
Assessment of ethical aspects
The IAF set out the following factors for consideration when assessing the ethical aspects of a big data initiative:
What aspect of collection/acquisition/processing/analysis or use of the insights could be considered unfair to the individual or to society?
Is the collection/acquisition/processing/analysis or use of the insights done in a way that is respectful to the individual?
Has the minimum possible amount of data been used?
Does the company have a legitimate interest in the processing of the data and the use of the insights?
After all mitigations have been applied, what is the residual risk to all stakeholders, particularly the individual — have the benefits and risks been effectively balanced?
Have the interests, expectations and rights of individuals been effectively addressed?
What additional, contextual based participation and choice (meaningful) with the individual should be considered?
Is there an effective redress option for the individual impacted? Has this use of data been transparent?
After considering all the above factors, is the project a ‘go’, ‘no-go’ or should some aspect be recalibrated to reduce the residual risks?
The challenge in relation to the above questions lies in careful consideration of the detail. While the reasons or justifications for undertaking the data analytics may have apparently obvious benefits to those driving the data analytics project(s), further consideration may highlight ethical issues and unintended consequences leading to negative outcomes, including significant reputational risk. For example, it is important to know whether a project benefits everyone whose data is being processed and, if not, whether that matters. In a corporate context, benefiting only customers of high value to the organisation may be an acceptable risk, although reputational issues will need to be considered. In other situations, discrimination or determinism of individuals as a result of outcomes from data analytics projects will be unacceptable for both corporations and governments. Further and more detailed consideration may reduce the risk of unintended consequences and harms, particularly the risks of determinism and discrimination. To ensure that appropriate consideration is given, accurate assessments are made and risk mitigation steps are taken, it is key that relevant stakeholders are involved and that robust policies meeting organisational needs are in place to guide them.
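One way to make the IAF questions above part of a repeatable PIA/DIA workflow is to record the answer to each question and derive a provisional outcome for stakeholders to review. The Python sketch below is a hypothetical illustration, not the IAF’s tool: the field names, the residual-risk scale of 1 (low) to 5 (severe) and the decision thresholds are all assumptions, and the result is a prompt for discussion rather than a substitute for the careful consideration of detail described above.

```python
from dataclasses import dataclass, field
from enum import Enum


class Outcome(Enum):
    GO = "go"
    NO_GO = "no-go"
    RECALIBRATE = "recalibrate"


@dataclass
class EthicalAssessment:
    """Hypothetical record of answers to IAF-style ethical assessment questions."""
    project: str
    unfair_aspects: list = field(default_factory=list)  # aspects that could be unfair to individuals or society
    respectful: bool = True            # is processing respectful to the individual?
    minimum_data_used: bool = True     # has the minimum possible amount of data been used?
    legitimate_interest: bool = True   # does the organisation have a legitimate interest?
    residual_risk: int = 1             # assumed scale after mitigations: 1 (low) to 5 (severe)
    interests_addressed: bool = True   # interests, expectations and rights addressed?
    meaningful_choice: bool = True     # contextual participation and choice offered?
    redress_available: bool = True     # effective redress option for impacted individuals?
    transparent: bool = True           # has this use of data been transparent?

    def open_issues(self) -> list:
        """Return a description of every answer that still needs to be addressed."""
        checks = {
            "processing may not be respectful to individuals": self.respectful,
            "more than the minimum data is used": self.minimum_data_used,
            "individuals' interests not effectively addressed": self.interests_addressed,
            "no meaningful participation or choice": self.meaningful_choice,
            "no effective redress option": self.redress_available,
            "use of data is not transparent": self.transparent,
        }
        issues = [issue for issue, satisfied in checks.items() if not satisfied]
        return issues + [f"potentially unfair: {aspect}" for aspect in self.unfair_aspects]

    def decide(self) -> Outcome:
        """Derive a provisional go / no-go / recalibrate outcome (assumed thresholds)."""
        if not self.legitimate_interest or self.residual_risk >= 5:
            return Outcome.NO_GO
        if self.open_issues() or self.residual_risk >= 3:
            return Outcome.RECALIBRATE
        return Outcome.GO


if __name__ == "__main__":
    campaign = EthicalAssessment(
        project="targeted offers to high-value customers",  # hypothetical project
        unfair_aspects=["benefits only customers of high value to the organisation"],
        minimum_data_used=False,
        residual_risk=2,
    )
    print(campaign.decide().value)  # recalibrate
    for issue in campaign.open_issues():
        print("-", issue)
```

Keeping the decision rules in one place makes it easier to show how an outcome was reached, which supports the accountability principle; the thresholds themselves are policy choices for an ethics committee or steering group, not for the code.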
Organisations should implement strong information governance frameworks including privacy policies and PIAs/DIAs that include an ethical-based approach where data analytics of big data involving personal information is being undertaken.
A unified information governance framework enables organisations to take a strategic approach to both maximise the value of information derived from data analytics as well as minimise the risks arising from the costs of legal and regulatory privacy compliance, costs arising from data breaches and/or responding to regulators, as well as costs arising from loss of reputation, particularly where there is a breach of trust with consumers.
The type of ethical-based approach (for example, an ethical value or policy statement, an ethics committee, or formal data impact assessments that include an ethics assessment) will depend on the types of big data initiatives being undertaken, the volume of personal information involved or the potential for individuals to be identified.
An ethical-based approach will ensure that the processing of personal information in big data analytics is carried out in a fair, transparent, responsible and ethical manner. The benefits of an ethical-based approach and data impact assessments include the ability to build trust and transparency with consumers and citizens, ultimately delivering long term benefits to both the consumer and the organisation, and citizens and their government.
Material published in Governance Directions is copyright and may not be reproduced without permission. The views expressed therein are those of the author and not of Governance Institute of Australia. All views and opinions are provided as general commentary only and should not be relied upon in place of specific accounting, legal or other professional advice.