Wednesday, January 22, 2014

Big Data Mining

A database is a collection of information organized so that a computer program can quickly select the pieces of data it needs. In traditional systems, databases are organized into fields, records, and files. Everything in such a database is data, and data are distinct pieces of information.
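To make these terms concrete, here is a minimal sketch using Python's built-in sqlite3 module. The table name `customers`, its columns, and the sample rows are hypothetical and chosen only for illustration. Each column is a field, each row is a record, and an in-memory database stands in for the database file.

```python
import sqlite3

# An in-memory database stands in for the database file on disk.
conn = sqlite3.connect(":memory:")

# Hypothetical table: each column is a field, each row is a record.
conn.execute("CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, city TEXT)")
conn.executemany(
    "INSERT INTO customers (name, city) VALUES (?, ?)",
    [("Alice", "Paris"), ("Bob", "Tokyo"), ("Carol", "Paris")],
)

# A program can quickly select only the pieces of data it needs.
rows = conn.execute(
    "SELECT name FROM customers WHERE city = ? ORDER BY name", ("Paris",)
).fetchall()
print(rows)  # [('Alice',), ('Carol',)]
```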



Every day, 2.5 quintillion bytes of data are created, and more than 90 percent of the data in the world today was produced within the past two years. Our capacity to generate data has never been so great. In 2012, the debate between Barack Obama and Mitt Romney triggered about 10 million tweets within 2 hours. The well-known photo-sharing site Flickr faced a similar problem: it receives about 1.8 million photographs every day, and if each photo is about 2 MB, that requires approximately 3.6 TB of storage per day. These situations demonstrate the rise of Big Data applications, where data collection has grown tremendously.
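The Flickr storage figure is simple arithmetic, sketched below under the stated assumption that each photo averages 2 MB and using decimal units (1 TB = 1,000,000 MB). The yearly total is only an illustrative extrapolation that assumes the daily rate stays constant; it is not a figure reported for Flickr.

```python
# Figures quoted in the text: photos received per day and an assumed average size.
photos_per_day = 1_800_000
avg_photo_mb = 2  # assumption: average photo size in megabytes

# Decimal units: 1 TB = 1,000,000 MB.
daily_mb = photos_per_day * avg_photo_mb
daily_tb = daily_mb / 1_000_000
print(f"{daily_tb} TB per day")  # 3.6 TB per day

# Illustrative extrapolation only: assumes the daily rate is constant for a year.
print(f"{daily_tb * 365:.0f} TB per year")  # 1314 TB per year
```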

The term Big Data refers to collections of data sets so large and complex that they are very difficult to process with on-hand database management tools or traditional data-processing applications. Such data cannot simply be handled in the usual way: capturing, storing, searching, sharing, and visualizing it all pose serious challenges.

Big data usually amounts to petabytes or exabytes, consisting of trillions of records about people. These data do not belong to a single source; they come from many sources, such as customer contact centers, the web, sales, mobile data, and social media. They are not well structured: most of them are loosely structured, and they are often incomplete and inaccessible.
Nowadays, scientists encounter limitations caused by large data sets in many areas, such as genomics, connectomics, physics simulations, meteorology, and biological research. These limitations also affect internet search, business informatics, and finance.

Digital data is now everywhere: in every sector, in every economy, in every organization, and with every user of digital technology. Big data matters to leaders in every sector, and consumers of products and services stand to benefit from its application.

Big Data starts with large-volume, heterogeneous, autonomous sources under distributed and decentralized control, and seeks to explore complex and evolving relationships among data. These characteristics make discovering useful knowledge from Big Data an extreme challenge. Autonomous data sources with distributed and decentralized control are a main characteristic of Big Data applications: being autonomous, each data source can generate and collect information without any centralized control.
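A small sketch can show what heterogeneous, autonomous sources mean in practice. Below, two hypothetical sources (a web log and a call center) each record data in their own format with no central coordination, and one record is incomplete. The shared schema, field names, and mapping tables are assumptions made up for this illustration; real integration pipelines are far more involved.

```python
from typing import Any

# Hypothetical records from two autonomous sources that never agreed on a schema.
web_log = [{"user": "u1", "page": "/home", "ts": "2014-01-20T10:00"}]
call_center = [
    {"customer_id": "u1", "topic": "billing"},
    {"customer_id": "u2"},  # incomplete record: no topic
]

# Assumed per-source mappings from local field names to a shared schema.
FIELD_MAPS = {
    "web": {"user": "customer", "page": "activity", "ts": "time"},
    "call_center": {"customer_id": "customer", "topic": "activity"},
}
SHARED_FIELDS = ("source", "customer", "activity", "time")


def normalize(source: str, record: dict[str, Any]) -> dict[str, Any]:
    """Rename fields to the shared schema; fields a source lacks become None."""
    mapping = FIELD_MAPS[source]
    mapped = {mapping[k]: v for k, v in record.items() if k in mapping}
    out = {field: mapped.get(field) for field in SHARED_FIELDS}
    out["source"] = source
    return out


merged = [normalize("web", r) for r in web_log] + [
    normalize("call_center", r) for r in call_center
]
for row in merged:
    print(row)
```

The missing values stay explicit as `None` rather than being guessed, which keeps the incompleteness of the original sources visible to later mining steps.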

The amount of data that must be stored in the global economy keeps growing. A company, for example, gains customers and suppliers every day, so its operational activities and data transactions also grow on a large scale. All over the world, devices such as mobile phones, smart energy meters, automobiles, and industrial machines are connecting to networks every second; these devices sense data, create data, and communicate data over the Internet.



Furthermore, social media sites, smartphones, and other consumer devices, including PCs and laptops, all contribute data around the world.

Beyond this, multimedia content has played a major role in the rapid growth of big data. Each second of high-definition video, like every image and audio file, takes up a large amount of storage. Altogether, a considerable amount of big data is available.
In typical data mining systems, the mining procedures require computationally intensive computing units for data analysis and comparisons. A computing platform therefore needs efficient access to at least two types of resources: data and computing processors.
For small-scale data mining tasks, a single desktop computer with a hard disk and CPU is sufficient to fulfill the data mining goals; indeed, many data mining algorithms are designed for this type of problem setting. For Big Data mining, however, the data scale is far beyond what a single personal computer can handle, so a typical Big Data processing framework relies on a cluster of computers with a high-performance computing platform. The data mining task is deployed by running parallel programming tools, such as MapReduce or Enterprise Control Language (ECL), on a large number of computing nodes.
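To illustrate the MapReduce programming model mentioned above, here is a minimal single-machine sketch of the classic word-count example. It is not Hadoop or any real MapReduce API: the function names are hypothetical, and a thread pool stands in for the cluster nodes that would run the map and reduce tasks in a real deployment.

```python
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor
from itertools import chain


def map_phase(split: str) -> list[tuple[str, int]]:
    """Map: emit (word, 1) for every word in one input split."""
    return [(word.lower(), 1) for word in split.split()]


def shuffle(pairs) -> dict[str, list[int]]:
    """Shuffle: group all intermediate values by key."""
    groups: dict[str, list[int]] = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups


def reduce_phase(key: str, values: list[int]) -> tuple[str, int]:
    """Reduce: combine all values collected for one key."""
    return key, sum(values)


def word_count(splits: list[str], workers: int = 4) -> dict[str, int]:
    # Threads stand in for cluster nodes; each map task handles one split.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        mapped = pool.map(map_phase, splits)
        groups = shuffle(chain.from_iterable(mapped))
        reduced = pool.map(lambda kv: reduce_phase(*kv), groups.items())
        return dict(reduced)


splits = ["big data mining", "mining big data sets", "data"]
print(sorted(word_count(splits).items()))
# [('big', 2), ('data', 3), ('mining', 2), ('sets', 1)]
```

The point of the model is that map and reduce tasks are independent of one another, so a framework can spread them over many machines; only the shuffle step needs to move data between nodes.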
