At the recent CSO Perspectives Roadshow I was on a panel with the esteemed David Lacey, who suggested that, just like Asimov's laws of robotics, we need some clear maxims for the security and privacy management of big data.
Well firstly, let's recap what Big Data is before I attempt to draft these laws. Big Data is essentially the set of techniques for curating and analysing large, complex datasets that are beyond the capability of most normal Database Management Systems and data warehouses. These datasets are often accessed by a wide range of researchers, scientists and (shock horror) marketeers to gather new insights into customers and problems. For example, diverse datasets about the physical environment could be analysed to identify unexpected impacts of climate change. The study of pedestrian and motor vehicle traffic patterns from smartphone navigation data could be used to improve the "livability" of cities. Many applications and websites use big data for "you bought X, you might also like to buy Y" tailored marketing.
So, with that in mind, I offer you Hackling's Laws of Big Data:
1. Collect the data legally
2. Anonymise and de-identify the data to preserve the privacy of individuals, ethnic/religious groups etc. before it is ingested into the big data dataset. For example:
a) Year of birth is OK for demographics. Day and month of birth aren't.
b) Postcode is OK for demographics. Street number and address aren't.
c) Anonymised location history is OK; personalised location history is an invasion of privacy.
d) Use of identifiers such as phone number, Social Security Number or Tax File Number should be prohibited to impede data matching and unintended use.
3. Prohibit data matching to re-identify individuals and ethnic/religious groups contractually, by using "end user license agreements" and business partner contracts.
4. Log access to investigate misuse of the data.
5. Prosecute misuse of the data.
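To make the second law a little more concrete, here is a minimal Python sketch of how examples a), b) and d) might be applied to a single record before ingestion. The field names (`date_of_birth`, `street_number`, `street_address`, `phone_number` and so on) and the `deidentify` function are my own assumptions for illustration, not a reference to any particular product or schema.

```python
from datetime import date

# Hypothetical field names for the direct identifiers in example d).
DIRECT_IDENTIFIERS = {"phone_number", "social_security_number", "tax_file_number"}

# Hypothetical field names for the address detail in example b).
ADDRESS_DETAIL = {"street_number", "street_address"}


def deidentify(record: dict) -> dict:
    """Return a copy of the record reduced to demographic-level detail."""
    # d) drop direct identifiers, b) drop street-level address detail
    out = {
        key: value
        for key, value in record.items()
        if key not in DIRECT_IDENTIFIERS and key not in ADDRESS_DETAIL
    }
    # a) keep only the year of birth, discarding day and month
    dob = out.pop("date_of_birth", None)
    if dob is not None:
        out["year_of_birth"] = dob.year
    return out


# Illustrative record; every value here is made up.
sample = {
    "date_of_birth": date(1980, 5, 17),
    "street_number": "12",
    "street_address": "Example Street",
    "postcode": "2000",
    "phone_number": "555-0100",
    "purchase": "X",
}
clean = deidentify(sample)
print(clean)
```

Stripping fields like this only covers the ingestion step. It does not by itself stop someone re-identifying people by matching against other datasets, which is why the third law tackles that contractually.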