Big Data (although I think the adjective “big” is already falling short) has been around for a while now and is growing at a tremendous pace. By 2020, an estimated 40 zettabytes of data will be stored digitally, the equivalent of about 6,080 million years of watching HD video (NVTC, 2017). If we assume that you are Colombian, you can expect to live about 78 years: if you were born in Colombia in 2016, your life expectancy at birth is 75.4 years if you are a man and 81.1 years if you are a woman (Fernández, 2017), so to keep things simple I take the average of the two. At that rate, you would need almost 78 million lifetimes to watch all those videos. In this post, I am going to tell you how Big Data is measured and why it is growing exponentially.
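The back-of-the-envelope arithmetic in that paragraph can be checked with a few lines of Python. This is only a sketch: the constants are the figures quoted above, and rounding the average life expectancy to 78 years follows the post's own simplification.

```python
# Figures quoted in the post (NVTC, 2017; Fernández, 2017).
HD_VIEWING_YEARS = 6_080_000_000   # years of HD video equivalent to 40 ZB
LIFE_EXPECTANCY_MEN = 75.4         # Colombia, born in 2016
LIFE_EXPECTANCY_WOMEN = 81.1

# Simple average of the two life expectancies, rounded to whole years as in the post.
average_life = round((LIFE_EXPECTANCY_MEN + LIFE_EXPECTANCY_WOMEN) / 2)

# How many average lifetimes it would take to watch all that video.
lifetimes_needed = HD_VIEWING_YEARS / average_life

print(f"Average life: {average_life} years")
print(f"Lifetimes needed: {lifetimes_needed / 1e6:.1f} million")
```

The unrounded average is 78.25 years; using it instead of 78 barely changes the result (still about 77.7 million lifetimes).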
Figure: Big Data infographic. Source: IBM (n.d.).
Let us begin, then, with volume. To measure it, we have to talk about bytes. What is a byte? A byte is a unit of measure for storing and processing information; one byte is roughly what it takes to store a single letter (one character in a simple encoding such as ASCII). You are probably familiar with some of the units that build on the byte. A kilobyte (KB) is 1024 bytes; a page written in Word takes up about 30 KB. Next comes the megabyte (1 MB is 1024 KB), roughly the size of a textbook. It is followed by the gigabyte (1 GB is 1024 MB), the unit we usually look at when we buy a USB drive or a cell phone.
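Because each unit is 1024 times the previous one, converting a raw byte count into a readable size is just repeated division. The following is a minimal Python sketch; `human_readable` is a hypothetical helper written for this post, not part of any library, and the 64 GB USB drive in the examples is an assumed size.

```python
UNITS = ["bytes", "KB", "MB", "GB", "TB", "PB", "EB", "ZB", "YB"]

def human_readable(num_bytes: int) -> str:
    """Express a byte count in the largest 1024-based unit where it is at least 1."""
    value = float(num_bytes)
    index = 0
    # Divide by 1024 until the value fits the current unit or we run out of units.
    while value >= 1024 and index < len(UNITS) - 1:
        value /= 1024
        index += 1
    if index == 0:
        return f"{num_bytes} byte" + ("" if num_bytes == 1 else "s")
    return f"{value:.1f} {UNITS[index]}"

print(human_readable(1))               # a single letter
print(human_readable(30 * 1024))       # roughly a page written in Word
print(human_readable(1024 ** 2))       # roughly a textbook
print(human_readable(64 * 1024 ** 3))  # an assumed 64 GB USB drive
```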
A more recent one is the terabyte (1 TB is 1024 GB), the storage capacity that most laptops offer today. My computer has 1 TB of storage, and although I keep saving information on it, it seems it will never fill up. Then come the petabyte (1 PB is 1024 TB), the exabyte (1 EB is 1024 PB), the zettabyte (1 ZB is 1024 EB) and the yottabyte (1 YB is 1024 ZB), followed by two informal, non-standard names: the brontobyte (1 BB is 1024 YB) and the geopbyte (1 geopbyte is 1024 BB).
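Written as powers, the whole ladder is easier to take in: each step multiplies by 1024, which is 2^10. The Python sketch below assumes the 1024-based convention used in this post. Strictly speaking, the IEC reserves names such as kibibyte (KiB) and mebibyte (MiB) for powers of 1024, while SI prefixes such as kilo and mega mean powers of 1000; brontobyte and geopbyte belong to neither standard. The `bytes_in` helper is hypothetical.

```python
# Units listed in the post, each 1024 times the previous one.
# Brontobyte and geopbyte are informal names, not official SI or IEC units.
LADDER = ["kilobyte", "megabyte", "gigabyte", "terabyte", "petabyte",
          "exabyte", "zettabyte", "yottabyte", "brontobyte", "geopbyte"]

def bytes_in(unit: str) -> int:
    """Number of bytes in one unit, using the 1024-based convention."""
    return 1024 ** (LADDER.index(unit) + 1)

for power, unit in enumerate(LADDER, start=1):
    print(f"1 {unit:<10} = 1024^{power:<2} = 2^{10 * power} bytes")

# The 40 zettabytes expected for 2020, expressed in bytes.
print(f"40 ZB = {40 * bytes_in('zettabyte'):.3e} bytes")
```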
Let’s look at some examples of these units. In 2008, Google processed 20 petabytes of information every day (Dontha, 2017). An exabyte is equivalent to 250 million DVDs. All the movies made throughout the history of cinema add up to approximately 500,000 films, so even if we had access to every movie and as many DVDs as we wanted, what would we do with the remaining 249.5 million DVDs (Säisä, 2013)? So, in terms of volume, what do we mean when we talk about Big Data? Once we reach tens of terabytes or more, we can say that we are talking about Big Data.
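The DVD comparison is simple subtraction, and dividing the other way shows what capacity per disc the figure implies. This Python sketch assumes one film per DVD and, for the capacity check, a decimal exabyte of 10^18 bytes; both are simplifications made for illustration.

```python
EXABYTE_DVDS = 250_000_000   # DVDs in one exabyte (Säisä, 2013)
FILMS_EVER_MADE = 500_000    # approximate number of films in history

# Assumption: each film fills exactly one DVD.
leftover = EXABYTE_DVDS - FILMS_EVER_MADE
print(f"DVDs left over: {leftover:,}")

# Capacity per DVD implied by the figure, assuming a decimal exabyte (10**18 bytes).
per_dvd_gb = 10**18 / EXABYTE_DVDS / 10**9
print(f"Implied capacity per DVD: {per_dvd_gb:.1f} GB")
```

The implied 4 GB per disc is close to the 4.7 GB of a single-layer DVD, so the comparison holds up as an order-of-magnitude figure.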
And why does it grow so fast? On the one hand, technological developments have generated a great variety of data; on the other hand, more and more people in the world have access to these technologies. In 1998 (20 years ago), Google handled an average of 9,800 searches per day; in 2012, it handled an average of 5,134 million (SAP, 2014; SAP is a German multinational that designs software for all types of organizations). In other words, there were approximately 524,000 times more searches in 2012 than in 1998, and that happened in only 14 years!
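To see why this kind of growth is called exponential, we can ask what steady yearly growth rate would produce the same jump. The Python sketch below assumes constant compound growth between 1998 and 2012, which is a simplification: real growth was not that smooth, and the figures are only the two averages quoted above.

```python
import math

SEARCHES_1998 = 9_800            # average Google searches per day in 1998 (SAP)
SEARCHES_2012 = 5_134_000_000    # average Google searches per day in 2012 (SAP)
YEARS = 2012 - 1998

ratio = SEARCHES_2012 / SEARCHES_1998

# If growth had been steady, searches would multiply by this factor every year.
annual_factor = ratio ** (1 / YEARS)

# Time needed to double at that steady rate, in years.
doubling_time = math.log(2) / math.log(annual_factor)

print(f"Growth 1998-2012: {ratio:,.0f}x")
print(f"Equivalent annual growth factor: {annual_factor:.2f}x")
print(f"Equivalent doubling time: {doubling_time * 12:.1f} months")
```

Under that assumption, daily searches would have multiplied by roughly 2.56 every year, doubling about every nine months.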
What kind of technological developments? For example, the cell phone, the internet or the iPod. If you are a centennial (born in 1995 or later), you probably have no idea what a beeper or a fax machine is; nowadays, almost everyone has a cell phone. In fact, it is estimated that there are 6.8 billion cell lines in the world, and since there are roughly 7.6 billion of us, that works out to about 89 cell lines for every 100 people.
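Dividing the two estimates gives the number of cell lines per 100 people. Note that this counts lines, not owners: one person can have several lines, so it does not say exactly how many people own a phone.

```python
CELL_LINES = 6.8e9   # estimated cell lines worldwide
POPULATION = 7.6e9   # approximate world population

# Lines per 100 people (not the share of people who own a phone).
lines_per_100 = CELL_LINES / POPULATION * 100
print(f"Cell lines per 100 people: {lines_per_100:.0f}")
```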
What do we store on our cell phones? All kinds of data: images, videos, sounds (music) and text messages, and if we also have a data plan or Wi-Fi access, we can watch videos on YouTube, post content on Facebook or send Tweets. (It is not that we had no photographs, videos or “text messages” before these developments; we simply stored, shared and interacted with them differently. For example, we took photos with a film camera, they were stored on a roll that we had to have developed, and we kept the prints in a physical album.) It turns out that 30 billion pieces of content are shared on Facebook every month, 400 million Tweets are sent every day, and 4 billion hours of video are watched on YouTube every month (IBM, n.d.). I do not know about you, but those figures do not fit in my head.
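One way to make those figures easier to grasp is to convert them into rates per second. This Python sketch uses the three numbers quoted above and assumes a 30-day month for the monthly figures.

```python
SECONDS_PER_DAY = 24 * 60 * 60
SECONDS_PER_MONTH = 30 * SECONDS_PER_DAY   # assumption: a 30-day month

facebook_items_per_s = 30e9 / SECONDS_PER_MONTH   # pieces of content shared
tweets_per_s = 400e6 / SECONDS_PER_DAY            # Tweets sent
youtube_hours_per_s = 4e9 / SECONDS_PER_MONTH     # hours of video watched

print(f"Facebook: {facebook_items_per_s:,.0f} pieces of content shared per second")
print(f"Twitter: {tweets_per_s:,.0f} Tweets sent per second")
print(f"YouTube: {youtube_hours_per_s:,.0f} hours of video watched per second")
```

Even per second, the numbers stay large: thousands of Tweets and more than a thousand hours of YouTube video every second.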
There is another important feature: velocity, that is, how much time is needed to store and analyze the information. Once again, technological developments allow us to have more information in real time. In fact, there are about 2.5 network connections per person on Earth (IBM, n.d.). If you own a modern car, an example of velocity is just a few steps away: modern cars have around 100 sensors, which means that you can know in real time the fuel level and the pressure of your tires, among many other things, and all that information is stored and processed in milliseconds.
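To make the idea of velocity a bit more concrete, here is a minimal Python sketch of processing one batch of car sensor readings as it arrives. Everything in it is hypothetical: the sensor names, the safe ranges in `LIMITS` and the batch of 100 readings are invented for illustration and do not come from any real vehicle or automotive API. It keeps the latest value per sensor, flags readings outside their range and times the work with `time.perf_counter`.

```python
from __future__ import annotations

import time
from dataclasses import dataclass

@dataclass
class Reading:
    sensor: str   # hypothetical sensor name
    value: float

# Hypothetical safe ranges; real vehicles define their own thresholds.
LIMITS = {
    "fuel_level_pct": (10.0, 100.0),
    "tire_front_left_psi": (30.0, 36.0),
    "tire_front_right_psi": (30.0, 36.0),
}

def process_batch(batch: list[Reading]) -> tuple[dict[str, float], list[str]]:
    """Keep the latest value per sensor and flag out-of-range readings."""
    latest: dict[str, float] = {}
    alerts: list[str] = []
    for r in batch:
        latest[r.sensor] = r.value
        # Sensors without a defined range are never flagged.
        low, high = LIMITS.get(r.sensor, (float("-inf"), float("inf")))
        if not low <= r.value <= high:
            alerts.append(f"{r.sensor} out of range: {r.value}")
    return latest, alerts

# 97 generic readings plus 3 with defined limits: 100 sensors in total.
batch = [Reading(f"sensor_{i:03d}", float(i)) for i in range(97)]
batch += [Reading("fuel_level_pct", 42.0),
          Reading("tire_front_left_psi", 27.5),
          Reading("tire_front_right_psi", 33.0)]

start = time.perf_counter()
latest, alerts = process_batch(batch)
elapsed_ms = (time.perf_counter() - start) * 1000

print(f"{len(latest)} sensors processed in {elapsed_ms:.3f} ms")
print(alerts)
```

On an ordinary laptop a batch this small should typically take well under a millisecond, though the exact timing depends on the machine. A real system would receive such batches continuously rather than once.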
If you noticed, three words kept coming up in this post: variety, volume and velocity. Those are the three basic Vs that characterize Big Data. I will talk about the other characteristics (value and veracity), and about the opportunities and challenges that Big Data has brought, in the next post.
- Dontha, R. (2017, January 13). Who came up with the name Big Data? Available at https://www.datasciencecentral.com/profiles/blogs/who-came-up-with-the-name-big-data
- Fernández, C. F. (2017, September 14). Los hombres colombianos vivirán 75,4 años y las mujeres 81,1 años [Colombian men will live 75.4 years and women 81.1 years]. El Tiempo. Available at http://www.eltiempo.com/vida/salud/esperanza-de-vida-en-los-hombres-y-mujeres-de-colombia-130840
- IBM (n.d.). The Four V’s of Big Data. Available at http://www.ibmbigdatahub.com/infographic/four-vs-big-data
- NVTC, Northern Virginia Technology Council (2017). Data Analytics. Available at http://blog.nvtc.org/index.php/nvtc-publishes-2017-data-analytics-infographic/
- Säisä, L. (2013, August 9). Big Data and privacy aspects. Available at http://saisa.eu/blogs/Guidance/?p=1274
- SAP (2013). Big Data is affecting people everywhere. Available at https://visual.ly/community/infographic/technology/big-data-affecting-people-everywhere