Protect against inadvertent privacy breaches

There has been a raft of data breaches over the past few months.

Some of those were due to poor controls and/or significant effort by hackers.

But some recent breaches have been rather inadvertent: they occurred despite controls being in place, involved no significant effort by malicious actors, and carried risks that were not easy to identify at the outset.

An example of an “inadvertent breach”

Consider the travel card data privacy breach in Victoria, Australia, reported in this article in August 2019.

Data was released for use in a public competition, a “datathon”, after being de-identified.  By itself, the released data did not constitute a breach.  The risk that materialised related to “re-identification”: combined with other information sources, the data could be used to identify individuals.

With the myki data, this included:

  • the ability to link social media data with the myki data to identify certain individuals
  • the ability to combine known information – travel journeys or travel patterns – to identify particular individuals
    (e.g. find your own data -> find others that travel with you -> see where else they travel; a sketch of this pattern follows the list)
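
To make that pattern concrete, here is a minimal sketch in Python (using pandas). All card IDs, timestamps, stops and column names are invented purely for illustration – the released myki data was structured differently – but the three steps mirror the pattern described above.

```python
# Hypothetical illustration of a linkage-style re-identification attack.
# All card IDs, timestamps and stops are invented; the real myki data differed.
import pandas as pd

# De-identified touch-on records: card IDs are pseudonyms, not names.
trips = pd.DataFrame([
    ("card_A", "2018-07-02 08:01", "Flinders St"),
    ("card_A", "2018-07-02 17:32", "Parliament"),
    ("card_A", "2018-07-03 07:58", "Flinders St"),
    ("card_B", "2018-07-02 08:01", "Flinders St"),
    ("card_B", "2018-07-02 17:32", "Parliament"),
    ("card_B", "2018-07-07 21:15", "Richmond"),
], columns=["card_id", "touch_on", "stop"])

# Step 1: find your own card by matching trips you know you took.
my_known_trips = {("2018-07-02 08:01", "Flinders St"),
                  ("2018-07-02 17:32", "Parliament"),
                  ("2018-07-03 07:58", "Flinders St")}
trip_keys = trips[["touch_on", "stop"]].apply(tuple, axis=1)
my_card = trips[trip_keys.isin(my_known_trips)]["card_id"].value_counts().idxmax()

# Step 2: find cards that travel with you (same touch-on time, same stop).
my_trips = trips.loc[trips["card_id"] == my_card, ["touch_on", "stop"]]
together = trips.merge(my_trips, on=["touch_on", "stop"])
companions = together.loc[together["card_id"] != my_card, "card_id"].unique()

# Step 3: see where else those companions travel - journeys the attacker had
# no prior knowledge of, which is where the privacy harm arises.
shared = set(map(tuple, my_trips.values))
companion_trips = trips[trips["card_id"].isin(companions)]
unknown = companion_trips[
    ~companion_trips[["touch_on", "stop"]].apply(tuple, axis=1).isin(shared)]
print(unknown)  # card_B's previously unknown Richmond journey
```

Nothing in the sketch requires names or account details: pseudonymous identifiers plus a handful of known journeys are enough.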


There are other examples like this, both within and outside the public sector.

This article in Science Daily says “Re-identifying anonymised data is how journalists exposed Donald Trump’s 1985-94 tax returns in May 2019.”

According to data derived from OAIC (Office of the Australian Information Commissioner) reports, 21 “unauthorised disclosure (failure to redact)” breaches were reported between 1 July 2018 and 30 June 2019.


What makes this different from other breaches we hear about?

With typical breaches, the risks are almost immediately apparent.  These incidents were a bit different: the risk only emerged once the released data could be combined with other sources.

That’s not an excuse though. The investigation into the myki incident suggested that more could have been done.


How can we minimise the risk?

In an ideal world, we’d eliminate the risk.  But that could mean that we don’t share the data at all.

This article in The Guardian says “anonymising data is practically impossible for any complex dataset”.

So let’s consider how to minimise the risk instead. This is not straightforward, but it is not a new area either.

Here are three examples of guidelines and frameworks that can help – newest first:

1. (Aug 2019)

OVIC (the Office of the Victorian Information Commissioner) wrote about the investigation in this blog article (amongst other resources that they provided).

They provide five lessons learnt in that article; all are relevant, and the article is easy to read.

I found the fourth item to be of particular interest: “PIAs [privacy impact assessments], if done incorrectly, can create a false sense of security”. We explore this further in this blog article.

2. (Sep 2017)

The De-Identification Decision-Making Framework was developed through a collaboration between various Australian government agencies and departments.

It is a fairly extensive document that focuses on practical, operational advice.

Interestingly, there is an acknowledgement that “De-identification is not an exact science and, even using the De-Identification Decision-Making Framework (DDF) at this level, you will not be able to avoid the need for complex judgement calls.”

3. (Mar 2010)

Microsoft – A Guide to Data Governance for Privacy, Confidentiality, and Compliance.

Almost a decade old, but still in use, and for good reason.

There are five parts, each self-contained:

  1. The case for data governance
  2. People and Process
  3. Managing Technological Risk
  4. A Capability Maturity Model
  5. Moving to Cloud Computing

Importantly, one of the introductory suggestions is that organisations should consider:

“Augmenting approaches that focus on mere compliance ‘with the letter of the law’ by implementing and enforcing” … “measures that go beyond mere compliance with the letter of regulations and standards.”


There are many others, but a combination of these would provide a good starting point.

The clear message is that we should consider the specifics of the situation and the associated risks:

  1. Impact assessments won’t work if they are not completed properly. They should not just be an exercise in checking boxes on a template; rather, the risks and associated treatments should be carefully considered.
  2. There may be a need to make complex judgement calls, which may require expert advice – especially for “the more technical risk analysis and control activities” outlined in the second document.
  3. Go beyond traditional cookie-cutter compliance approaches.
  4. Consider alternative ways to protect data. One example is differential privacy, as used by Uber, which has released an open source project for it, and by Google, which rolled out an open source version of its library in September 2019 (a simplified sketch of the idea follows this list).
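
On the fourth point, here is a minimal sketch of the idea behind differential privacy – the classic Laplace mechanism applied to a counting query. It is not code from the Uber or Google libraries mentioned above; the records, query and epsilon value are made up for illustration.

```python
# Minimal sketch of the Laplace mechanism behind differential privacy.
# Illustration only - not code from the Uber or Google libraries mentioned above.
import numpy as np

def noisy_count(records, predicate, epsilon):
    """Return an epsilon-differentially-private count of matching records.

    A counting query has sensitivity 1 (adding or removing one person changes
    the true count by at most 1), so Laplace noise with scale 1/epsilon suffices.
    """
    true_count = sum(1 for r in records if predicate(r))
    return true_count + np.random.laplace(loc=0.0, scale=1.0 / epsilon)

# Hypothetical query: how many travellers touched on at "Richmond"?
records = [{"stop": "Richmond"}, {"stop": "Flinders St"}, {"stop": "Richmond"}]
print(noisy_count(records, lambda r: r["stop"] == "Richmond", epsilon=0.5))
```

The point of the noise is that any single individual’s presence or absence changes the published answer only slightly, which limits what re-identification attempts can learn from it.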


How are you protecting your customers and citizens?