It’s My Data, Why Can’t I Have It?
Data Rights
In his 1597 writings, Sir Francis Bacon said, “Knowledge [itself] is power.” From that perspective, we find ourselves today on an amazing power acquisition trajectory. Technological developments are making it possible to measure, collect, store, and process ever-increasing quantities of data, which, together with developments in AI and machine learning, allow us to derive new insights and gain new knowledge at an astounding rate. Entire new businesses have been built, and more are being imagined, that leverage data and the knowledge it creates. For a sense of this power acquisition, one need only look at today’s top five S&P 500 companies, all of which have vast amounts of data and analytics at their core. It is often stated that 90% of the data stored today was created in the last two years, and that the same will be true next year thanks to massive growth. In its 2017 marketing trends report, IBM estimated that we generate 2.5 quintillion bytes (2.5 exabytes) of data each day! The quantity of global digital data is now measured in zettabytes (1 zettabyte is 1,000 exabytes), and with advancements in technology and the increasing rate of data generation, our largest metric prefix, the yotta (1,000 zetta), will soon not suffice for measuring stored data.
Much of this data, of course, is data each of us creates directly and indirectly every day. Examples of direct data generation include our digital pictures and recordings, Twitter, Facebook, and other social media posts, bank transactions, AI personal assistant requests, online searches, mobile device use, wearables, online shopping, and what seems to be an endless litany of texts, app usage, emails, and media streaming. Examples of indirect data generation, created as a byproduct of our activities, include city, traffic, and store camera recordings, sensors tracking our motion and activities, smart home devices, healthcare monitors, car GPS, and digital exhaust: the data generated by systems as they handle, collect, store, transmit, and process the data we generate.
The other day I was reading a post by a colleague that asked the question: Would You Give Up Your Data for Science? It’s a great question, based on the premise that the more data each of us shares, the more we will learn about ourselves and each other. It started me thinking about the data each of us generates, and I began to wonder. Suppose for a moment that you were not concerned about privacy or the misuse of your data (ridiculous, I know, but just for a moment). Then:
If you wanted to give your data up for science, could you do it?
How much of the data that you generate directly and indirectly do you really own?
How much of the data you generate do you actually store and have access to over time?
I have seen only rough estimates of how much data each of us generates on a daily basis, but as the examples above show, it seems clear that we actually own and have access to a very small and diminishing share of that data. One could certainly leave behind all the digital photos, recordings, and digitized documents (emails, PDF files, etc.) that have been dutifully stored. However, even for directly generated data, I am often not the only “owner.” For example, I may own the searches I make, but so do the search engine and its partners, and they are building hugely successful businesses on that data. And even when I am aware of the data, say my searches, I do not necessarily have a good way of keeping track of it over time (although the search engine does!), so I would not have access to most of it and could not leave it behind. In the case of indirect data, there is a great deal that I am likely not even aware of and do not own or have access to, for example, security camera or biometric scanner recordings. So even dedicated lifeloggers would have difficulty leaving behind most of their data, since much of it lies beyond their knowledge and ownership.
The bulk of the data each of us generates is owned and stored by a multiplicity of private and public organizations and institutions, including my flashlight app, which wants my geolocation simply to shine a light! There are efforts underway both in the U.S. and abroad related to data rights and ethics, such as EU legislation going into effect next year that requires companies to obtain unambiguous consent from users to collect and use personal information, and the White House report Consumer Data Privacy in a Networked World. To date, the main focus appears to be on ensuring that companies have our permission to collect data and that they use it ethically once they have it, both important areas that need attention. But once permission is given, I may still not have access to the data and will likely lose ownership, as highlighted in recent Senate hearings about the Equifax breach, where senators asked why consumers shouldn’t have power over the data that these companies collect on them. The difficulty in knowing about, owning, and having access to all our data will only increase as we deploy more sensors and smart devices.
So while I possess my organs and have the right to leave them behind for science, I have neither the ability nor the right to leave behind most of my data for the benefit of science.
Beyond the question of my data legacy, there is the bigger question of my data rights. I don’t believe we can address the former without first addressing the latter.