An Introduction for Critical Big Data Studies

Acoustic Space #12 - Peer-reviewed Journal for Transdisciplinary Research on Art, 
Science, Technology and Society, "Techno-Ecologies 2", Media Art Histories, Rasa 
Smite, Armin Medosch and Raitis Smits (Eds.), Riga, 2014, pp. 262-273

In contrast to the affirmative and romantic accounts of the 1990s and early 2000s, which celebrated the role of the Internet in participatory democracy, it is obvious today that control is growing, governed through big data. Although the term “big data” is often used in popular media, business, computer science and the computer industry with rather optimistic and vague descriptions, big data also raises critical problems.

Mayer-Schönberger and Cukier (2013) consider big data as “a revolution that will transform how we live, work and think.” But Manovich (2011) points out that “even researchers working inside the largest social media companies can’t simply access all the data collected by different services in a company.” Access to the political, economic and cultural production lines is strictly restricted. Massive transactional databases of web searches, applications, sensor data and mobile phone records are only within reach of the accredited actors of media, finance, government, pharmaceutical and other industry giants, which offer the infrastructure to manufacture big data. The scandal around the United States’ PRISM program has also recently proved how government agencies gathered information by disregarding the privacy of citizens and increased their control. Under the guise of security and anti-terrorism, the National Security Agency (NSA) collected, via the net giants, the personal data of millions of citizens worldwide: a massive, menacing and indiscriminate collection. According to Johnson et al. (2013), these companies include Microsoft in 2007, Yahoo! in 2008, Google in 2009, Facebook in 2009, Paltalk in 2009, YouTube in 2010, AOL in 2011, Skype in 2011 and Apple in 2012.

Appraisal and promotion of big data via contemporary cloud platforms also comprises extinction claims about certain disciplines and fundamental concepts. Some argue, like Savage and Burrows (2007), that big data is a subset of a new “knowing capitalism”, in which companies will be able to conduct empirical sociology much better than sociologists themselves. But the problem seems to be more complicated than that. Such an approval of death (of a discipline along with its scholars) coincides with what Braidotti (2013, 122) calls “contemporary necro-politics”, the politics of death on a global and regional scale. Indeed, except for the sociologists hired by these corporations, we don’t know “What Facebook Knows” and with whom it shares its analyses and data sets. Since large data sets and analysis tools are only available to dominant actors, such as big corporations, institutions and governments, the control of curating datafacts calls for the participatory performance of democratic mechanisms.

In his recent talk at Transmediale 2014, Bratton (2014) also emphasized that the Black Stack, the new nomos of the Internet, stages the death of the (human) User by bringing the multiplication and proliferation of other kinds of nonhuman Users – including sensors, financial algorithms, and robots. So in a way, it becomes almost impossible to talk about the privileged position of the citizen as a human subject. Contemporary Cloud platforms are displacing, if not also replacing, traditional core functions of the states, and demonstrating, for good and ill, new spatial and temporal models of politics and publics (Bratton, 2014). For example, for cloud platforms, “the terms of participation are not mandatory, and because of this their social contracts are more extractive than constitutional. The Cloud Polis draws revenue from the cognitive capital of its Users, who trade attention and microeconomic compliance in exchange for global infrastructure services, and in turn, it provides each of them with an active discrete online identity and the license to use this infrastructure”. That is why cloud platforms become intrinsically totalitarian, just like the state, as they centralize the terms of participation according to rigid but universal protocols, even as they decentralize like markets, coordinating economies not through the superimposition of fixed plans but through interoperable and emergent interaction. In other words, if cloud platforms are becoming the third form, next to states and markets, of centralizing and coordinating through fixed protocols, we can only talk about “alter-totalitarian” governmentalities.

But “The Future of Internet Freedom” becomes a concern in particular for the net giants. According to a recent New York Times article by Eric E. Schmidt, Google’s executive chairman, and Jared Cohen, the director of Google Ideas, “Over the next decade, approximately five billion people will become connected to the Internet. The biggest increases will be in societies that, according to the human rights group Freedom House, are severely censored.” Russia, China, Iran, Turkey and Pakistan are among those countries where the control over big data becomes a political conflict between Internet giants and governments.

Schmidt and Cohen further argue, “The mechanisms of repression are diverse. One is “deep packet inspection” hardware, which allows authorities to track every unencrypted email sent, website visited and blog post published. When objectionable activities are detected, access to specific sites or services is blocked or redirected. And if all else fails, the entire Internet can be slowed for target users or communities. In other cases, like in Ukraine, sites are taken offline with distributed-denial-of-service attacks, which overwhelm a server with digital requests, or else the routing system of the national Internet system is tampered with to make entire sites mysteriously unreachable. Entire categories of content can be blocked or degraded en masse; in Iran, we hear that all encrypted connections are periodically severed and reset automatically”.
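The reason “deep packet inspection” works at all on unencrypted traffic is that the content travels in plain text, so a middlebox can simply match it against a blocklist. The following is a hypothetical minimal sketch (not any vendor's actual implementation; the hostname is invented) of that matching step:

```python
# Illustrative sketch of why unencrypted traffic is trivial to inspect:
# an unencrypted HTTP request exposes its target host in plain text,
# so any middlebox on the path can match it against a blocklist.
# Hypothetical example; not a real DPI product's logic.

BLOCKLIST = {b"example-blocked-site.org"}  # invented hostname for illustration

def inspect_packet(payload: bytes) -> str:
    """Return 'BLOCK' if the payload's Host header is on the blocklist."""
    for line in payload.split(b"\r\n"):
        if line.lower().startswith(b"host:"):
            host = line.split(b":", 1)[1].strip().lower()
            if host in BLOCKLIST:
                return "BLOCK"
    return "PASS"

request = b"GET / HTTP/1.1\r\nHost: example-blocked-site.org\r\n\r\n"
print(inspect_packet(request))  # prints "BLOCK"
```

An encrypted connection hides the payload from such inspection, which is why, as the quote notes, censors resort to severing and resetting encrypted connections wholesale rather than filtering them selectively.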

Big data is big business. The technologies of repression are a multibillion-dollar industry. Countries like Turkey outsource their surveillance and censorship operations to digital mercenaries such as the U.K.’s Gamma Group, Germany’s Trovicor, Italy’s HackingTeam, France’s Amesys, and Blue Coat Systems, based in the United States. According to a 2013 report by Reporters Without Borders, “National governments are increasingly purchasing surveillance devices manufactured by a small number of corporate suppliers and using them to control dissidents, spy on journalists, and violate human rights.”

The control over data is also established via regulatory mechanisms and judicial instruments. Emerging legislative measures and Internet laws are passed in many countries (e.g. Law No. 5651 on Regulating Broadcasting in the Internet and Fighting Against Crimes Committed through Internet Broadcasting in Turkey) so that big-data-based mass surveillance, access restrictions and censorship via Internet Service Providers can be executed on a legal basis. However, many people are not aware of the multiplicity of agents and algorithms around personal data collection, of the storing of their data for future use, of the possible uses of their data, or of the dimensions of the profitable personal data economy (Boyd and Crawford 2012).

As hegemony, in the Gramscian sense, ultimately relies on consented coercion, it is not only users but also institutions, governments and market actors that give their consent and will to become the manufacturers and labor of big data. For instance, in order to survive within the subordinating power of the financial elite, which gradually coerces them to conform to its rules, some governments sell the data of their citizens – indirectly, via another company – and present their countries as open laboratories for several industries, as Andorra does, presented as a “Nation as Living Lab.” According to Travis Rich, an MIT Media Lab researcher and the director of technology of the Smart Country program in Andorra, “The goal is to develop a platform that is sourced with data from the telecom, energy provider, installed sensor infrastructures, the bus company, tourist buying habits, store directories, store inventories, restaurants, etc., and to make it available to researchers, entrepreneurs, and companies as a tool for understanding and experimenting with new technologies and ideas.”

This is part of a larger effort to investigate ways of anonymizing data that flows through the API (Application Programming Interface) before it ever reaches an ISP (Internet Service Provider) – in this case, an Andorra Telecom server – in a way that would make it useless to any other entity trying to track and collect data from it, including governments. Once again, we see that Democracy and Development are remediated. Still, who can access that data and for what purposes it will be used and processed remains a big question due to the lack of transparency and of self-governed watchdog infrastructures.
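To make the anonymization step concrete: one common technique is pseudonymization, replacing direct identifiers with salted hashes before the record leaves the client, so the downstream server never sees the raw identity. The following is only an illustrative sketch under that assumption; the actual method used in the Andorra program is not described in the source, and the field names are invented:

```python
# Sketch of pseudonymization: direct identifiers are replaced with
# salted SHA-256 digests before transmission, so downstream servers
# never receive the raw values. Hypothetical illustration only; field
# names and the technique choice are assumptions, not the Andorra design.
import hashlib
import os

SALT = os.urandom(16)  # kept client-side; makes digests unlinkable across deployments

def pseudonymize(record: dict, identifier_keys: set) -> dict:
    """Replace identifying fields with truncated salted SHA-256 digests."""
    out = {}
    for key, value in record.items():
        if key in identifier_keys:
            digest = hashlib.sha256(SALT + str(value).encode()).hexdigest()
            out[key] = digest[:16]  # opaque 16-character pseudonym
        else:
            out[key] = value  # non-identifying fields pass through unchanged
    return out

record = {"phone": "+376 123456", "cell_tower": "AD-07", "timestamp": "2014-02-01T10:00"}
safe = pseudonymize(record, {"phone"})
# safe["phone"] is now an opaque token; location and time pass through
```

Note that hashing identifiers is weak anonymization on its own: quasi-identifiers such as location and timestamp can still re-identify individuals, which is one reason the transparency question raised above persists even where such pipelines exist.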

Can governments collect the data of their citizens and sell it to others? A 2013 scandal around the illegal trade of health data in Turkey reveals the “data war” between big corporations and the state. After centralizing the health insurance system in one institutional form, the government made it mandatory, via a software program, for all pharmacies to collect and register the medicine records of all Turkish citizens. Despite the objections based on privacy as well as the Supreme Court’s decision, the data of 75 million citizens, collected over the previous five years, were sold without a tender process to a small company called DATAMED. Before this, the pharmaceutical industry had no access to these data sources. In this way, via an intermediary agency, the government sold the data of its citizens to big corporations. The political connections of the owners of DATAMED also revealed that the struggle over big data is strongly related to the ownership of capital.

Datafacts / Data-failures

Despite the claims of objective truth in big data, the selection, preservation, maintenance, collecting, processing and archiving of datafacts rest on an interpretative basis. Focusing on the increasing surveillance of daily online lives by major corporations who mine this data to sell to others for commercial profit, Jamie Allen and David Gautier (2014) developed Critical Infrastructure, an artistic project about big data use, and note, “Avoiding too much interpretation is the key.” Messiness and randomness are traded off against precision and objectivity in big data. With the aim of extracting the patterns and predicting the propensities of the large sum, some data can be ignored, excluded and disregarded. In this case, the propagative power of the small can be underrated. Big data analysis alone can lead to exaggerated or distorted results presented as “pure facts” to be used in decision-making, in evaluation and in the definition of users’ rights, access, benefits and restrictions. In return for lowering the standards and criteria, one can sometimes access more data but also get more error.

Failure is not a problem in scientific knowledge production. Instead, as Latour and Woolgar (1986) emphasize in Laboratory Life: The Social Construction of Scientific Facts, a typical experiment produces only inconclusive data that is attributed to failure of the apparatus or experimental method. The point of having failure is to learn how to make the subjective decision of what data to keep and what data to throw out. For untrained observers, the entire process resembles not an unbiased search for truth and accuracy but a mechanism for “ignoring” some data. This failure corresponds to a failure to deal with content and ends up extending the culture of ignorance.

Google Flu Trends’ performance in the United States during the 2009 Influenza Virus A (H1N1) pandemic is one of the best-known failures of big data analysis. Google predicted that 11% of the US had flu instead of the actual figure of 4.5-4.8%. Although in this case the error percentage is indeed high, and a “3 per cent margin of error” is usually treated as a normal threshold, it is within this 3 per cent that the excluded subcategories would lead to major transformation. The predictions based on Twitter messages during Hurricane Sandy are another case: given the way the data was presented, there was a huge data gap from communities unrepresented in the Twitter sphere.
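The scale of the Google Flu Trends overshoot is easy to compute from the figures cited above:

```python
# Relative error of the Google Flu Trends estimate, using the figures
# cited in the text: predicted 11% prevalence vs. an actual 4.5-4.8%.
predicted = 11.0
actual_low, actual_high = 4.5, 4.8

overshoot_low = predicted / actual_high   # smallest overestimation factor
overshoot_high = predicted / actual_low   # largest overestimation factor
print(f"GFT overestimated prevalence by a factor of "
      f"{overshoot_low:.1f}-{overshoot_high:.1f}")
# prints: GFT overestimated prevalence by a factor of 2.3-2.4
```

An absolute error of more than six percentage points, i.e. more than double the actual figure, is far beyond the conventional 3 per cent margin of error.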

Big Data vs. Small Data / Imitation

Even when the integrated data accessed through APIs cannot be processed ‘properly’ and is qualified as ‘useless’, with big data we still often hear that ‘size matters’. The reproduction of this familiar, gendered and hierarchical discourse is embedded especially in popular accounts. Mayer-Schönberger and Cukier (2013, 6) argue, “Big data refers to things one can do at a larger scale that cannot be done at a smaller one, to extract new insights, or create new forms of value, in ways that change markets, organizations, the relationship between citizens and governments, and more.” Indeed, although the amount of stored information grows four times faster than the world economy, some argue that “Big Data is notable not because of its size, but because of its relationality to other data” (Boyd and Crawford 2011).

The account of size is also augmented with what Wang (2013) calls “thick data,” another problematic analogy drawing on Geertz’s anthropological notion of “thick description,” that is, a way “to make sense of the seemingly infinite array of data in the world” (Stoller, 2013). Wang (2013) applauds what big data needs: “Thick Data is the best method for mapping unknown territory. When organizations want to know what they do not already know, they need Thick Data because it gives something that Big Data explicitly does not—inspiration.”

Apart from this rather mystical approach to big data, in their recent paper, When Size Matters: The Complementarity of Big and Small Data in Future Social Science, Blok and Pederson (2013) draw a distinction between Big (computational, quantitative) and Small (embodied, qualitative) Data. Following what Latour et al. (2012) proposed as an “intermediary level” in The Whole is Always Smaller Than Its Parts: A Digital Test of Gabriel Tarde’s Monads, they suggest using these bifurcated research methodologies complementarily. But in Tarde’s view, the “smallest entities are always richer in difference and complexity” (Latour 2001, 82). If one reads this without a comprehensive reading of Tarde, it becomes possible to draw a conclusion in favor of small data. However, for Tarde, it is not big or small data but the imitative and contagious recurrence that is at stake.

So from the perspective of critical big data studies, it should be noted that, rather than focusing on big and/or small data, it is more important to notice that the very recurrence of this micro/macro distinction is itself an imitation of desires and beliefs in the empirical, which is also an extension of the totalitarian imperial ideology. This is one of the most important problematic aspects of big data: while talking about revolutionary change, the power and authority of dominant modern thinking is embedded merely in some newer versions of the already expired. While web data mining and crowd-sourced tracking systems are becoming the ingredients of surveillance-based research, Valleron, an epidemiologist running a monitoring network, says in Nature, “The new systems depend too much on old existing ones to be able to live without them.” These “beams of imitation” (des faisceaux digitations) (Tarde 1895, 207) indicate the exact similarities within the mass of beliefs and desires. For this reason, big data is also a fabricated social, which is only made visible.

In other words, the current status of big data is strongly linked to representative samples and their competence for producing objective truth. Introduced and popularized as the hot methodological research norm of our times, “data mining becomes a shallow version of neo-empiricism” (Braidotti 2013, 3) which serves the exploitative and profit-minded pursuit of capital. Analysis of large data sets is subjected to prediction and prevention measures. And this “dataist” approach operates as a plug-in for an updated totalitarian ideology of neo-imperial hegemony.

Tarde gave importance to both archeology and statistics by insisting on extracting the imitational and repetitive aspects. For Tarde, the object of knowledge is not static but truly dynamic. He provides a kind of methodology for statistics that can perhaps be used in critical big data studies: “1st determining the imitative power of each invention, in a given time and country; 2nd showing the favorable or adverse effects resulting from the imitation of each of them.” (Tarde 1890, 170) Therefore, for Tarde, after determining similarities among different body units, what is at stake is the propagation and imitation of a single one, no matter how big or small, as difference or invention. Then its propagation impact, passing from one to another, comes under focus. In uncovering the archeology of the trends of imitation, what should be considered is not size but how the dynamic contagion flows and how the power of transforming the other is used.

Fertile Data / Infertile Data

Knowing through a mass of data means having not whole but only partial information about others. For this reason, information leaks lead to an inevitable collaboration among the owners of data-capital, accompanied by mutual debt relations. That is, since each owner only possesses some information, by way of collaborative lending the owners of data-capital aim to make profit by correlating, predicting and controlling a wider spectrum.

But in terms of media ecology, not all data is fertile for the sake of market growth or control. Douglas Merrill, Google’s former chief information officer and the founder of ZestFinance – a startup that leverages big data to provide credit scoring information – notes, “We feel like all data is credit data, we just don’t know how to use it yet.” That is, even when the tracked and collected data is infertile, it is considered an investment. The fecundity of data, its potential reproductive capacity, is therefore taken for granted: data is treated not as a useless value but as investment capital.

In Merrill’s account, credit data refers to the potentiality of pure data as raw material. This has the potential to change our views about contemporary political economy. On the one hand, in The Making of The Indebted Man, Lazzarato (2012, 7) argues, “The debtor-creditor relationship intensifies mechanisms of exploitation and domination at every level of society.” Credit is one of the most effective instruments of exploitation man has managed to create, since certain people, by producing credit, are able to appropriate the labor and wealth of others (Ardant 1976, 320, cited in Lazzarato 2012, 20-21). In a similar fashion, Griziotti (2014) emphasizes that “a mercenary political class, subject to a financial élite that forces it to privatize welfare, no longer has any margin for exercising their antiquated social democratic mediations; a new strategy of control over life and society, based on technological subjection and the generation of debt, has replaced it.”

From a contrary perspective, one can also argue that the interaction of the owner and the lender of data marks an inverse economy of sorts. Today the credit-givers and credit-receivers are reversed. If data is capital, it is also government and market actors that seek credit-data, the capital. Without the data provided by users, the algorithms of a web site cannot function, and hence cannot exist. In this way, the epistemological approval of the transcendental power of control authorities is not imitated; as Tarde argues (cited in Latour 2012, 14), “To be part of a whole is no longer to ‘enter into’ a higher entity or to ‘obey’ a dispatcher (no matter if this dispatcher is a corporate body, a sui generis society, or an emergent structure), but for any given monad it is to lend part of itself to other monads without either of them losing their multiple identities.” However, since the control over data is invisible and the emphasis is usually given to authority and control, neither users nor materials are recognized as agencies with the prospective power they possess.

Friction / Noise

Despite the growing demand for accessing and processing large volumes of data in nanoseconds, tracking, collecting, analyzing and correlating data can be multi-staged, slow and expanded in time. Within this process, there is also a large amount of messiness, uncertainty and noise among the high-frequency flows of information that investors need to use in order to make sound, long-term predictions and decisions. Different interdependent financial species constantly process market noise in an attempt to reduce it as much as possible to relevant information.

Yet noise is not always a problem. Sometimes the subsequent decisions and market orders generate more noise for the other participants. In this case, bigger actors manage to thrive by exploiting information gradients that slower market actors are unable to access. In this way, it becomes more efficient to make short-term predictions rather than taking risks on long-term investments. In this regard, Shleifer and Summers (1990, 23) note that while noise traders with different models could in principle cancel each other out, in practice “many trading strategies based on pseudo-signals, noise, and popular models are correlated, leading to aggregate demand shifts. The reason for this is that judgment biases afflicting investors in processing information tend to be the same. Subjects in psychological experiments tend to make the same mistake; they do not make random mistakes. Many of these persistent mistakes are relevant for financial markets.”
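Shleifer and Summers' point, that correlated judgment biases do not cancel out the way independent random mistakes do, can be shown with a toy simulation (an illustrative sketch of the statistical argument, not their actual model; the trader count and bias size are arbitrary):

```python
# Toy illustration of the Shleifer-Summers argument: independent random
# mistakes largely cancel in aggregate, while a shared (correlated) bias
# shifts aggregate demand. Arbitrary parameters; not their actual model.
import random

random.seed(42)
N = 100_000  # number of noise traders

# Case 1: independent mistakes, each uniform in [-1, +1]
independent = [random.uniform(-1, 1) for _ in range(N)]
mean_independent = sum(independent) / N  # close to 0: mistakes cancel

# Case 2: the same mistakes plus a shared bias of +0.5, as when every
# trader misreads the same pseudo-signal in the same direction
correlated = [e + 0.5 for e in independent]
mean_correlated = sum(correlated) / N  # close to 0.5: aggregate demand shift

print(f"mean error, independent: {mean_independent:+.3f}")
print(f"mean error, correlated:  {mean_correlated:+.3f}")
```

The independent errors average out to nearly zero, while the shared bias survives aggregation intact, which is exactly why correlated pseudo-signals move markets.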

Therefore, distorting the flows of data-capital leads to the control of judgment biases, decision-making mechanisms and choices of production. In their study of high frequency trading, Wilkins and Dragos (2013) note, “As long as there is enough disparity and enough heterogeneity in the market, high-frequency traders can profit from the underlying friction and produce more noise. It is precisely this persistent inefficiency of markets that informs heterodox economics.”

Profiting from friction and producing more noise are important aspects for examining how contemporary media operates along with the hegemonic actors of neo-liberal control. Friction generates errors and noise. Although the smooth flow of transnational capital is at odds with friction, today big-data-oriented control agencies create various forms of friction and noise in society with regard to conflicts, pollution, surveillance, privacy and transparency. These physical dynamics of the Earth also shed light on a more materialistic approach for critical big data studies. Rather than using ecology as a mere analogy for the political-economic environment of media (Dutton et al 2010), one can return to the fact that “the resources and materials gathered from geological depths enable our media technologies to function” (Parikka 2013). Therefore, within critical big data studies, friction and noise call for a more materialistic approach, attentive, for example, to the energy consumption, turbulence and heat production caused by the operation of big data and data centers.


Today, access to and control of massive transactional databases are only open to the accredited actors of media, finance, government, pharmaceutical and other industry giants, which provide the necessary infrastructure to manufacture big data. The lack of transparency and the restriction of access are limiting the impact of participatory democracy. For this reason, rather than focusing on the recursive loops of the small data/big data distinction, this paper suggests exploring how the dynamic imitative and inventive flows are used within the relationality of big data. Besides, the operative performance of friction, noise, errors and failures should be questioned by working with the related works of artists and hacktivists that increase critical awareness about the operations of big data. In this way, instead of drowning in the generated noise and becoming depleted by the frictions manufactured by the controlling agencies, new tools and methodologies for studying big data with critical insight can be explored and experimented with further.


Allen, Jamie., Gautier, David. 2014. Critical Infrastructure, Transmediale/art and digital culture, No. 2, p.10

Blok, Anders., Pederson, Morten. 2013. When Size Matters. The Complementarity of Big and Small Data in Future Social Science. Bohr Conference 2013, An Open World, Copenhagen, 4-6 December 2013, Retrieved January 16, 2014, Available at:

Boyd, Danah., Crawford, Kate, 2011. Six provocations for big data, Retrieved on November 25, 2013, Available at:

Boyd, Danah., Crawford, Kate, 2012. “Critical questions for big data,” Information, Communication & Society 15(5): 662-679

Braidotti, Rosi. 2013. The Posthuman. Polity: Cambridge

Cirio, Paolo., Ludovico, Alessandro. 2012. The Hacking Monopolism Trilogy. Retrieved on November 29, 2013, Available at:

Clawson, Trevor. 2014. Small Players in a Big Data World, Retrieved on February 6, 2014, Available at:

Dutton, William H., Dopatka, Anna., Hills, Michael., Law, Ginette., Nash, Victoria. 2010. Freedom of Connection – Freedom of Expression: The Changing Legal and Regulatory Ecology Shaping the Internet. Oxford Internet Institute, University of Oxford. A report prepared for UNESCO’s Division for Freedom of Expression, Democracy and Peace.

Edwards, Paul. 2010. A Vast Machine: Computer Models, Climate Data, and the Politics of Global Warming. MIT Press, Massachusetts

Edwards, Paul N., S. Mayernik, Matthew, S., Batcheller, Archer L., Bowker, Geoffrey C. Borgman, Christine L. 2011. “Science friction: Data, metadata, and collaboration,” Social Studies of Science 41 (5): 667 – 690

Fuchs, Christian. 2010. “New imperialism. Information and media imperialism,” Global Media and Communications 6.1 (Apr 2010): 33-60.

Glanz, James. 2012. “The Cloud Factories: Power, Pollution and the Internet,” The New York Times, Retrieved on February 7, 2014. Available at:

Griziotti, Giorgio. 2014. Biorank: algorithms and transformation in the bios of cognitive capitalism, Retrieved on February 6, 2014, Available at:

Johnson, Kevin., Martin, Scott., O’Donnell, Jayne., Winter, Michael. 2013. “Reports: NSA Siphons Data from 9 Major Net Firms,” USA Today. Retrieved January 16, 2014

Kroker, Arthur., Kroker, Marilouise. 2008. City of Transformation: Paul Virilio in Obama’s America, Retrieved on February 6, 2014, Available at:

Latour, Bruno., Woolgar, Steve. 1986. Laboratory Life: The Social Construction of Scientific Facts. Princeton University Press: Princeton

Latour, Bruno. 2001. “Gabriel Tarde and the End of the Social”, in Patrick Joyce (ed.) The Social in Question: New Bearings in History and the Social Sciences, Routledge, London, pp. 117-132. Retrieved on January 27, 2014, Also Available at:

Latour, Bruno., Jensen, Pablo., Venturini, Tommaso., Grauwin, Sébastian., Boullier, Dominique. 2012. “The whole is always smaller than its parts: a digital test of Gabriel Tarde’s monads,” British Journal of Sociology 63(4): 590-615. Retrieved January 2, 2014, Also available at:

Lazzarato, Maurizio. 2012. The Making of The Indebted Man, MIT Press, Massachusetts

Manovich, Lev. 2011. Trending: The Promises and the challenges of big social data, Retrieved on January 7, 2014, Available at:

Mayer-Schönberger, Viktor., Cukier, Kenneth. 2013. Big Data: A Revolution That Will Transform How We Live, Work and Think, John Murray: London

Miharbi, Ali. 2010. Detect, Bite, Slam. A thesis submitted in partial fulfillment of the requirements for the degree of Master of Fine Arts at Virginia Commonwealth University. Retrieved on February 4, 2014. Available at:

Parikka, Jussi. 2013. The Geology of Media. Retrieved on February 7, 2014. Available at:

Savage, Mike., Burrows, Roger. 2007. The Coming Crisis of Empirical Sociology, Sociology 41(5): 885-899.

Shleifer, Andrei., Summers, Lawrence H. 1990. “The Noise Trader Approach to Finance,” The Journal of Economic Perspectives, Vol. 4, No. 2. (Spring, 1990), pp. 19-33

Stein, Gabe. 2013. State-Owned Gold Mine: What Happens When Governments Sell Data?, Retrieved on February 1, 2014, Available at:

Stoller, Paul. 2013. “Big Data, Thick Description and Political Expediency,” Huffington Post, Retrieved on June 16, 2013, Available at:

Tarde, Gabriel. 1890. Les lois de l’imitation: Étude sociologique, Paris: Alcan [Translated by Elsie Clews Parsons in 1903 and published as The Laws of Imitation, with an introduction by Franklin H. Giddings, New York: Henry Holt & Co.; reprinted 1962, Gloucester, MA: Peter Smith; online at

Tarde, Gabriel. 1895. Essais et mélanges sociologiques, Paris: Maloine.

Wang, Tricia. 2013. Big Data Needs Thick Data, Ethnography Matters, Retrieved on December 27, 2013, Available at:

Wilkins, Inigo. Dragos, Bogdan. 2013. Destructive Destruction?: An Ecological Study of High Frequency Trading, Retrieved on February 6, 2014, Available at: