Data Ethics: What is it, and why does it matter?


Today we are producing more data than ever before. It is a non-stop process; data is being produced 24 hours a day, 7 days a week and 365 days a year, by individuals, organisations and machines.

EVERY SECOND [1], about 90,000 Google searches are made, 4,000 photos are uploaded to Facebook, 9,300 Tweets are sent, 1,050 photos are uploaded to Instagram, 88,000 YouTube videos are viewed, 5,300 Skype calls are made, 3,000,000 emails are sent and about 110,000 GB of Internet traffic is generated.

Every digital device we possess (our smartphone, tablet, smartwatch, health and activity monitor, computer, smart TV, even some of our fridges) is generating data, with or without our knowledge, and this data is being stored somewhere. Some organisations create data as part of their processes, while others collect data available from different locations and sources to build their own collections and databases, to be used or commercialised.

All this activity raises questions that need to be considered. It also creates space for the development of data ethics, which looks at issues concerning the use of data. If the ethical issues of data usage are misunderstood or ignored, the result can be damage and lost opportunities for the data science industry. Conversely, overemphasising individual rights may lead to inflexible regulations, delaying the progress of new technology that could add value to organisations or to our lives. [2]

What happens with all this data?

Why is it so important?

Why should we care?

What data can and cannot be collected, stored or shared?

Who should have access to it?

Is the data safe?

Do we have legislation that controls the data collection, storage and usage?

What does ethics have to do with data?

Do we need a code of ethics?

To better answer some of these questions, we may need to review and interpret the work of a few earlier philosophers and their theories regarding society and human behaviour, and then translate these theories into today’s terms to see how they relate to data science and data ethics.

Early philosophers

(please refer to the Infographic “Why does (data) ethics matter?”)

In the infographic, there is a brief explanation of the theories and identification of the philosophers.

Balanced and fair

Virtue Ethics suggested that a person should act with dignity, by doing the right thing.

The expectation today is that the organisation or individual managing the data acts appropriately and does not compromise the “data owner”. Extreme solutions are avoided in favour of moderate action.

Human society needs authority and protection

The Social Contract claimed that human society needs a form of authority to set some rules and ensure protection and freedom.

This suggests that legislation created by the authority would ensure that data is used appropriately, maintaining individual privacy. Data would not be used to cause harm without a particularly good reason.

Universal law

Kantian Ethics introduced the idea that an individual should act in the way they would expect everyone else to act. Data would be used fairly, setting the expectation that others behave similarly, and individuals and organisations would neither take advantage of ignorance nor compromise privacy. Everyone would follow the same universal set of laws. This can be quite restrictive, as it suggests that a rule cannot be broken regardless of the outcome.

Actions aimed at obtaining the best results

Utilitarianism took a slightly different view: to benefit the greatest number of people, a harmful action may be justified, and it then becomes a new rule. It holds that an existing rule may be broken if doing so benefits the majority. This theory served as a basis for Consequentialism, where the end justifies the means.

Mixing it up and the basic principles

When these theories are analysed together, elements of each can serve as a base for the principles of a code of ethics for data science. Such a code is intended to create a standard of behaviour among professionals. These principles need to be looked at from a macro perspective and not at an individual level.

The starting principle is that individuals and organisations will do the right thing when working with data (this comprises producing, recording, processing, analysing, utilising and sharing data).

A regulatory authority provides the basic legislation, offering a certain level of protection to individuals and organisations, setting the minimum compliance based on the law.

The creation of common standards would help with an efficient code of ethics. These standards need to, at least, incorporate the points below:

· Privacy

· Transparency

· Management

· Regulation

· Fairness

Individual privacy must be preserved: any personally identifiable information needs to be removed as soon as possible, since the intention is never to expose individuals’ identities. Privacy and confidentiality are not unconditional, however, and need to be weighed against the individual’s best interests. [3]
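
As a rough illustration of removing personally identifiable information, the Python sketch below drops direct identifiers from a record and keeps only a keyed hash so that rows about the same person can still be linked. The field names (`name`, `email`, `phone`), the `subject_key` field and the `pseudonymise` function are hypothetical, chosen for this example only. Keyed hashing is pseudonymisation rather than full anonymisation, so this is a sketch under those assumptions and not a complete privacy solution.

```python
import hashlib
import hmac

# Hypothetical list of direct identifiers; a real dataset would define its own.
DIRECT_IDENTIFIERS = {"name", "email", "phone"}


def pseudonymise(record: dict, secret_key: bytes, id_field: str = "email") -> dict:
    """Return a copy of `record` with direct identifiers removed.

    A keyed SHA-256 hash (HMAC) of `id_field` is stored as 'subject_key',
    so records about the same person can be linked without storing who
    that person is. The secret key must be kept separate from the data.
    """
    # Keep every field that is not a direct identifier.
    cleaned = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
    # Replace the identifier with a keyed hash of its value.
    digest = hmac.new(secret_key, str(record[id_field]).encode("utf-8"), hashlib.sha256)
    cleaned["subject_key"] = digest.hexdigest()
    return cleaned


if __name__ == "__main__":
    raw = {"name": "Jane Doe", "email": "jane@example.com", "phone": "555-0100", "age": 34}
    print(pseudonymise(raw, secret_key=b"keep-this-key-out-of-the-dataset"))
```

Using a keyed hash rather than a plain hash means someone who holds only the dataset cannot simply hash a list of known email addresses to recover identities.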

Sharing data, or parts of it, needs to be done carefully. With emerging technologies and the wide availability of data, it is possible to re-identify data that has been shared, with consequences for the individuals concerned.
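
One common way to gauge re-identification risk before sharing, which this article does not itself prescribe, is a k-anonymity check: every combination of quasi-identifiers (attributes such as postcode or birth year that can be matched against other datasets) should be shared by at least k records. The sketch below is a minimal version of that idea; the function names and the sample fields are assumptions made for illustration.

```python
from collections import Counter


def smallest_group_size(rows, quasi_identifiers):
    """Size of the smallest group of rows sharing the same quasi-identifier values.

    Returns 0 when there are no rows.
    """
    groups = Counter(tuple(row[q] for q in quasi_identifiers) for row in rows)
    return min(groups.values()) if groups else 0


def is_k_anonymous(rows, quasi_identifiers, k):
    """True if every quasi-identifier combination occurs in at least k rows."""
    return smallest_group_size(rows, quasi_identifiers) >= k


if __name__ == "__main__":
    # Hypothetical sample data, with names already removed.
    rows = [
        {"postcode": "2000", "birth_year": 1980, "diagnosis": "A"},
        {"postcode": "2000", "birth_year": 1980, "diagnosis": "B"},
        {"postcode": "2007", "birth_year": 1975, "diagnosis": "C"},
    ]
    # The third row is unique on (postcode, birth_year), so it could be
    # matched against another dataset that contains the same attributes.
    print(is_k_anonymous(rows, ["postcode", "birth_year"], k=2))
```

A failed check suggests generalising the quasi-identifiers (for example, sharing only part of the postcode) or withholding the rare rows before the data is shared.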

Organisations and individuals should not collect data without a valid purpose, which reduces the risk of unnecessary exposure. Only relevant data should be kept; non-relevant data can be eliminated, avoiding extra risk. Likewise, data should be stored only for as long as necessary; after that, it could become a liability.
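
The two ideas in this paragraph, keeping only relevant data and keeping it only for as long as necessary, can be sketched as a small filter. The `apply_retention` function, the `collected_at` field and the one-year limit below are assumptions for illustration; the appropriate fields and retention period depend on the purpose of the data and on the applicable legislation.

```python
from datetime import datetime, timedelta, timezone


def apply_retention(records, relevant_fields, max_age, now=None):
    """Drop records older than `max_age` and strip non-relevant fields.

    Each record is assumed to carry a timezone-aware 'collected_at' datetime.
    """
    now = now or datetime.now(timezone.utc)
    kept = []
    for record in records:
        # Storage limitation: skip anything held longer than allowed.
        if now - record["collected_at"] > max_age:
            continue
        # Data minimisation: keep only the fields needed for the stated purpose.
        kept.append({
            key: value
            for key, value in record.items()
            if key in relevant_fields or key == "collected_at"
        })
    return kept


if __name__ == "__main__":
    records = [
        {"postcode": "2000", "browser": "Firefox",
         "collected_at": datetime(2021, 5, 1, tzinfo=timezone.utc)},
        {"postcode": "2007", "browser": "Safari",
         "collected_at": datetime(2019, 1, 1, tzinfo=timezone.utc)},
    ]
    print(apply_retention(records, {"postcode"}, timedelta(days=365),
                          now=datetime(2021, 6, 1, tzinfo=timezone.utc)))
```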

Data should be collected and used carefully and considerately so as not to expose, harm or segregate any individual or group of individuals. Morally questionable use of data should be avoided. Not everything that can be done with consumer data should be done.

Legislation is to be followed consistently, and organisations should aim to go above and beyond it, improving upon it where possible. Legislation tends to fall behind technological advances and most of the time is striving to catch up or making corrections to reduce risk. Organisations cannot wait for regulators to create legislation to deal with issues; once an issue is identified, it needs to be dealt with promptly and the organisation’s processes amended to avoid its recurrence.

The code of data ethics needs to be communicated across the organisation and be mandatory for all its members. The code must be reviewed and adjusted regularly as technology evolves and legislation changes.

Stored data must be secured so that no unauthorised access can occur; such access would constitute a data breach, exposing both the organisation and individuals. The highest standards of security need to be maintained to keep the stored data safe.

The origins of data and the tools and methods used to collect or create it need to be transparent and auditable. This process should also be well documented and updated regularly.

One could argue that, because a code of data ethics is important, it needs to be looked at from a macro perspective (macro ethics) rather than as a more detailed and individually tailored code of ethics. Another important consideration is the ontology [4] of the data: data is all connected, with endless relationships and dependencies. Even individual datasets can be interconnected using the correct tools and the right algorithms.

Every day, more organisations and individuals rely on data to inform their decisions. Data is rapidly becoming an essential part of business and will soon be reported as an asset by organisations. Data ecosystems are gaining momentum within organisations, enabling them to better comprehend their customers’ interests and consumption habits.

Wrapping Up

With the increasing number of connected digital devices, cheap storage, rapid technological advances and the lack of regulations to ensure the safe and fair usage of data, we urgently need a code of ethics for data science and data analytics to be created and publicly adopted by organisations. This will enable the creation of collaborative data ecosystems where data can be fairly and safely exchanged, bringing innovation and sustainable value creation for businesses. At the same time, it will provide a better customer experience while protecting individuals’ privacy.



[2] Floridi L, Taddeo M. 2016 What is data ethics? Phil. Trans. R. Soc. A 374:






Cristiano Kochenborger

UTS Student 570100 Data Ethics and Regulation - Session 1 2021