The new gold rush: Big Data. Your data

What is it? How does it affect us? What do corporations do with it?

We cannot see it, but it controls what we see. It cannot be touched, but it certainly has to do with whatever comes into our hands. We cannot hear it, but it does influence what we listen to. We can neither taste it nor smell it, but it leads us to taste and smell specific things. Being omnipresent, data is already considered the oil of the 21st century but, unlike the much-valued crude oil, data is not only renewable: it also multiplies exponentially. According to the European Union, 1,700 billion bytes of data are generated every minute. But where does it all come from, and why is it being talked about so much now?

Whoever believes it was technology that made “data” a big issue is wrong. From the time we get up until we go to bed, we ourselves are data sources. Actually, we produce data even in our sleep. Everything in our everyday behavior traces trends and patterns. This phenomenon is not new at all; the difference is that we now have the technology to gather all that content and use it to shape our daily lives, simply because we feed those billions of pieces of information into our computers, mobile phones, smartwatches, tablets, and so on.

Big Data, literally, is nothing more than the accumulation of this immense amount of “data.” Data is Big because of its volume: so much of it is generated every second that conventional systems and tools have not been able to handle it. The speed at which data moves, continually changing as it does, and the variety of information that is constantly being created add to its complexity and, hence, to the difficulty of our attempts to understand it, if not master it.

Big Data is therefore not a data recording system, nor a piece of software, nor a tool in itself, nor a “cool” process that techies work on. It is what we call these immense databases made up of compiled information that comes from each and every one of us. However, if we were to give an official definition, we could use that of the Foundation of Urgent Spanish, which refers to Big Data as the “denomination with which one refers to a set of data that, because of its amount, variety and the speed at which it needs to be processed, exceeds the capabilities of usual computer systems.”

What is Big Data made of?

Variety is one of the main characteristics of Big Data. It is also precisely what makes Big Data so difficult to handle, because we are constantly producing information about very different aspects of our lives which, in turn, creates different “archives”: geolocation, photos, voice notes, vital signs, music or video downloads, shopping, fashion preferences, and opinions (basically, everything you “like” on social networks).

All these elements, combined with variables such as frequency, can become useful knowledge. They make it possible to draw a fairly accurate picture of who we are, what interests us, what we dislike, where we go, and even whether we have any health problems. In addition, it is not only our own devices that are gathering such information.

Mario Tascón and Arantza Collaut co-authored the book Big Data and the ‘Internet of Things’: What Is Behind It and How Is It Going to Change Us? (Catarata, 2016). In it, they explain how all this information “can come from the activity of a company as well as from regular citizens in their relationship with public administration, their daily work or their conversations on social networks, but it also comes from weather stations, traffic sensors deployed by the government or cars driving on roads.”

The two sides of the coin

At the same time as we produce data, we subordinate ourselves to it and, ultimately, to those who are able to manage it (remember there are data farms out there). Some large companies and certain governments already have the capacity to operate on and manage this kind (and amount) of data; therefore, they can (and do) exert an undeniable power over society.

Google, of course, is the master of this discipline. Their business is based on data. Starting with your browsing preferences (there’s a lot of data there), they are able to establish direct relationships between user preferences and the advertisers that can provide them with matching products, whatever they may be. That is why Google will show you some suggestions (and ads) instead of others.

This is basically the reason why data can “control” where we go, what we see, what we eat, how we dress and so forth. But it is also clear that we are the ones who, consciously or unconsciously, are constantly providing all that information.

“Companies that are very successful now know where their customers are and, perhaps more importantly, what they do and where they go. They know what’s going on as it is still happening and they allow that information to guide their strategy and participate in their decision-making process,” explained Bernard Marr in his book Big Data: The Use of Big Data, Analysis and Smart Parameters to Make Better Decisions and Increase Performance (TEELL, 2016).

Big Data as prediction

Beyond the clear interests of the market and the economic value data might have, its management does not have to be synonymous with a dystopian future in which society is a slave to the market. Detecting patterns and trends on a global scale has benefits because it allows predictions to be made.

Luciano Sáez Ayerra, president of the Spanish Society of Health Informatics, says — in Tascón and Collaut’s aforementioned book — that “data can allow the health industry and the researchers involved in it to have large amounts of real and verified information. The value of such information would be enormous in terms of generation of knowledge that can be used to improve health care.”

Also, when it comes to research, “it is all about extracting value from data,” claims Josep Maria Argimon, director of the Agency for Health Quality and Assessment of Catalonia (Spain), on the Mobile Health Global website. “Health risk factors, related to habits or a given lifestyle, such as whether a person smokes more or less, drinks more or less alcohol, has been admitted to a hospital for a specific disease, and so on” are some of the data that, once stripped of personally identifying information, would allow researchers to develop prevention patterns, Argimon explains.

Being able to act in a timely manner when it comes to health issues saves lives. And in education, data analysis helps reduce academic failure. “Big Data allows us to know our students better, their study habits and what works best for them when facing learning processes in order to offer more personalized itineraries adapted to each student,” says Julià Minguillón, professor of Computer Science, Multimedia and Telecommunications Studies at the Universitat Oberta de Catalunya (UOC), in an article published on the institution’s website.

Teresa Sancho, a professor at the same institution, adds in the same article that “being able to react on the spot, not only when the course is finished, giving everyone what they need, is definitely a good thing.”

In fact, that’s what Big Data is about: giving everyone what they need. It obviously applies to health and education, but also to banking, the fashion industry, sports, energy sources and so on. Managing data makes it possible to design strategies and actions aimed, quite simply, at optimizing processes. In a word, it’s a matter of efficiency.

Privacy and anonymity

To what extent is data anonymous? To what extent is privacy preserved? Those are the questions users must ask the moment they generously provide their data. In fact, users are not always aware that this is what they are doing and, even when they are, most of the time they do not really know what this process of data-giving entails.

Tascón and Collaut’s book also explains how legislation usually applies general data protection principles to each sector (be it business, research, education and so on). However, in the very same book, Iñaki Pariente de la Prada, the director of the Basque Data Protection Agency, points out that updating rules and regulations is important because “the current one was made when we still had no internet. We are now living in a world in which everyone walks down the street with several devices on.”

So what happens when data is so abundant that it is impossible to get consent from everyone, all the time, while it is being produced? The data must be made anonymous, so that it cannot be linked to specific individuals or used to identify them. This is a basic principle of privacy, identity protection and respect for private property (which, yes, includes data).
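As a rough illustration of what stripping identifying information can look like in practice, here is a minimal Python sketch. The field names, the sample record and the secret key are all invented for this example and do not come from any of the sources quoted here. Strictly speaking, it shows pseudonymization plus generalization rather than full anonymization: direct identifiers are dropped, the user ID is replaced by a keyed hash, and the exact age is blurred into a ten-year band. Combinations of the remaining attributes can still single people out, so real projects need considerably more care than this.

```python
import hashlib
import hmac

# Hypothetical field names, chosen only for this illustration.
DIRECT_IDENTIFIERS = {"name", "email", "phone"}

# Assumption: in a real system this key would be stored apart from the data.
SECRET_KEY = b"replace-with-a-secret-kept-apart-from-the-data"


def pseudonym(user_id: str, key: bytes = SECRET_KEY) -> str:
    """Return a stable token that cannot be recomputed without the key."""
    digest = hmac.new(key, user_id.encode("utf-8"), hashlib.sha256)
    return digest.hexdigest()[:16]


def age_band(age: int, width: int = 10) -> str:
    """Generalize an exact age into a band such as '30-39'."""
    low = (age // width) * width
    return f"{low}-{low + width - 1}"


def deidentify(record: dict) -> dict:
    """Drop direct identifiers, tokenize the ID and blur the age."""
    dropped = DIRECT_IDENTIFIERS | {"user_id", "age"}
    cleaned = {k: v for k, v in record.items() if k not in dropped}
    cleaned["token"] = pseudonym(record["user_id"])
    cleaned["age_band"] = age_band(record["age"])
    return cleaned


record = {
    "user_id": "u-1029",
    "name": "Jane Doe",
    "email": "jane@example.com",
    "age": 37,
    "smoker": False,
}
clean = deidentify(record)
print(clean)
```

The same user always receives the same token, so researchers can still follow patterns over time without ever seeing a name or an e-mail address.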

In addition, the ownership of such data is one of the aspects that generates the most controversy, and one on which most current legislation has not yet established clear norms. There is no specific law concerning Big Data but, according to Pariente de la Prada, administrations and companies worldwide are indeed obliged to publish a declaration explaining whether or not they are going to respect certain privacy policies, and how they will do so, when engaging in massive data analysis.

“The only thing that is clear is that the company that provides the means to build a database has a series of rights over it. By contrast, in relation to the ownership of information generated from Big Data there are no explicit rules, so one will necessarily have to follow general principles of law, as well as what ends up being established in the contractual relations between the different parties involved,” explained Alejandro Sánchez del Campo, a lawyer specializing in technology, on the FIDE blog of the Spanish newspaper El Confidencial.

From Big Data to Smart Data

“Data is a treasure when there is knowledge in it. Profiting from such ‘gold’ will depend a lot on the questions we ask ourselves, and how imaginative and skillful we are,” says Teresa Sancho, from the UOC.

What Sancho points out is, indeed, key, and is one of the many elements Marr highlights in his book. Alongside quantity, speed and diversity, veracity is the fourth key element of Big Data, since the mass of data that is constantly generated can lead to erroneous conclusions if it is misinterpreted or not organized properly.

Therefore, strategies, technical resources and new professional profiles are needed to make the most of Big Data. “Whether we like it or not, whether we are prepared for it or not, the future will be Big Data. Our ability to harness that power with intelligence, common sense, and usefulness will make it valuable SMART Data,” Marr affirms.