A database is a collection of information organized in such a way that a computer program can quickly select desired pieces of data. Traditional databases are organized by fields, records and files. Everything in these databases is data, and data is made up of distinct pieces of information.
Every day about 2.5 quintillion bytes of data are created, and roughly 90 percent of the data in the world today was produced within the past two years. Our capacity to generate data has never been greater. In 2012, a debate between Barack Obama and Mitt Romney triggered about 10 million tweets within 2 hours. The well-known photo-sharing site Flickr faced a similar problem: it receives about 1.8 million photographs every day. Assuming each photograph is about 2 MB, that requires approximately 3.6 TB of storage per day. These situations demonstrate the rise of Big Data applications, where data collection has grown tremendously.
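The Flickr estimate above can be reproduced with a few lines of arithmetic. This is only a sketch: the photo count and photo size come from the text, while the decimal unit conversion (1 TB = 1,000,000 MB) and the yearly projection are assumptions added for illustration.

```python
# Figures taken from the text: 1.8 million photos per day, about 2 MB each.
PHOTOS_PER_DAY = 1_800_000
PHOTO_SIZE_MB = 2  # assumed average size of one photo

# Assumption: decimal units (1 TB = 1,000,000 MB), which the 3.6 TB figure implies.
MB_PER_TB = 1_000_000


def daily_storage_tb(photos_per_day: int, photo_size_mb: float) -> float:
    """Return the storage needed per day, in terabytes."""
    return photos_per_day * photo_size_mb / MB_PER_TB


daily_tb = daily_storage_tb(PHOTOS_PER_DAY, PHOTO_SIZE_MB)

# Illustrative projection only: a constant upload rate over 365 days (1 PB = 1,000 TB).
yearly_pb = daily_tb * 365 / 1000

print(f"Per day:  {daily_tb:.1f} TB")
print(f"Per year: {yearly_pb:.2f} PB")
```

At a constant rate, a single site of this kind would pass one petabyte of new photos in under a year, which is why such workloads outgrow a single machine.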
The term Big Data is used for collections of data sets so large and complex that they are very difficult to process using on-hand DBMS tools or traditional data processing applications. Such data cannot be handled in the usual way. The challenges include capturing, storing, searching, sharing and visualizing the data.
Big data usually amounts to petabytes or exabytes, consisting of trillions of records about people. The data does not belong to one particular source. It comes from many sources, such as customer contact centers, the web, sales, mobile data, social media and so on. The data is not well structured: it is mostly loosely structured and is often incomplete and inaccessible.
Nowadays scientists encounter limitations because of large data sets in many areas, such as genomics, connectomics, physics simulations, meteorology, and biological research. These limitations also affect internet search, business informatics and finance.
Digital data is now everywhere: in every sector, in every economy, and in every organization and user of digital technology. Big data matters to leaders across every sector, and consumers of products and services stand to benefit from its application.
Big data starts with large-volume, heterogeneous, autonomous sources with distributed and decentralized control, and seeks to explore complex and evolving relationships among data. These characteristics make discovering useful knowledge from Big Data an extreme challenge. Autonomous data sources with distributed and decentralized control are a main characteristic of Big Data applications. Being autonomous, each data source is able to generate and collect information without involving any centralized control.
The amount of data that needs to be stored has grown across the global economy. Consider a company: every day it gains customers and suppliers, so its operational activities and data transactions also increase on a large scale. All over the world, many devices connect to the network every second, such as mobile phones, smart energy meters, automobiles, and industrial machines. These devices sense data, create data, and communicate data over the Internet.
Furthermore, social media sites, smartphones, and other consumer devices, including PCs and laptops, all contribute data.
Beyond this, multimedia content has played a major role in the rapid growth of big data. Each second of high-definition video, and every image and audio file, takes up a large amount of data. Altogether, a considerable amount of big data is available.
In typical data mining systems, the mining procedures require computationally intensive computing units for data analysis and comparisons. A computing platform therefore needs efficient access to at least two types of resources: data and computing processors.
For small-scale data mining tasks, a single desktop computer, which contains a hard disk and CPU processors, is sufficient to fulfill the data mining goals. Indeed, many data mining algorithms are designed for this type of problem setting. For Big Data mining, the data scale is far beyond the capacity that a single personal computer can handle, so a typical Big Data processing framework relies on cluster computers with a high-performance computing platform. A data mining task is deployed by running parallel programming tools, such as MapReduce or Enterprise Control Language, on a large number of computing nodes.
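To give a feel for the MapReduce programming model mentioned above, here is a minimal single-process sketch in Python that counts words. It is not the API of Hadoop or any other framework: the function names (`map_phase`, `shuffle_phase`, `reduce_phase`, `word_count`) are hypothetical, and the three phases run one after another on one machine, whereas a real framework would spread them across many computing nodes.

```python
from collections import defaultdict
from typing import Dict, Iterable, Iterator, List, Tuple


def map_phase(document: str) -> Iterator[Tuple[str, int]]:
    """Map: emit a (word, 1) pair for every word in one input split."""
    for word in document.lower().split():
        yield word, 1


def shuffle_phase(pairs: Iterable[Tuple[str, int]]) -> Dict[str, List[int]]:
    """Shuffle: group all emitted values by their key."""
    groups: Dict[str, List[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: List[int]) -> Tuple[str, int]:
    """Reduce: combine all values for one key into a single count."""
    return key, sum(values)


def word_count(documents: List[str]) -> Dict[str, int]:
    # On a real cluster each split would be mapped on a different node;
    # here the splits are processed sequentially in a single process.
    mapped = (pair for doc in documents for pair in map_phase(doc))
    grouped = shuffle_phase(mapped)
    return dict(reduce_phase(key, values) for key, values in grouped.items())


if __name__ == "__main__":
    splits = ["big data is big", "data mining needs data"]
    print(word_count(splits))
```

Because the map step looks at one split at a time and the reduce step looks at one key at a time, both can be run in parallel on separate nodes. That independence is what lets this model scale to data that no single computer can hold.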