# Big Data Technologies and Applications


### Borko Furht · Flavio Villanustre

All rights are reserved by the Publisher, whether the whole or part of the material is concerned, specifically the rights of translation, reprinting, reuse of illustrations, recitation, broadcasting, reproduction on microfilms or in any other physical way, and transmission or information storage and retrieval, electronic adaptation, computer software, or by similar or dissimilar methodology now known or hereafter developed. The publisher, the authors and the editors are safe to assume that the advice and information in this book are believed to be true and accurate at the date of publication.

### Preface

The objective of this book is to introduce the basic concepts of big data computing and then to describe the total solution to big data problems developed by LexisNexis Risk Solutions. Part I, on Big Data Technologies, includes chapters dealing with an introduction to big data concepts and techniques, big data analytics and related platforms, and visualization and deep learning techniques for big data.

### Acknowledgments

More details about big data analytics techniques can be found in [2, 4] as well as in the chapter in this book on “Big Data Analytics.”

Clustering Algorithms for Big Data. Clustering algorithms are developed to analyze large volumes of data, with the main objective of categorizing data into clusters based on specific metrics. The authors proposed the categorization of clustering algorithms into five categories. The clustering algorithms were evaluated for big data applications with respect to the three Vs defined earlier; the results of the evaluation are given in [5], and the authors proposed candidate clustering algorithms for big data that meet the criteria relating to the three Vs.
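None of the surveyed algorithms is reproduced here, but the centroid-based family the evaluation covers can be illustrated with a minimal mini-batch k-means sketch. This is our own simplified code, not taken from [5]; it processes a data stream in small batches so the full data volume never has to fit in memory, which is the property that makes this family attractive for big data:

```python
def minibatch_kmeans(stream, k, dim, batch_size=100, n_batches=50):
    # Seed the centroids with the first k points drawn from the stream.
    centroids = [list(next(stream)) for _ in range(k)]
    counts = [1] * k
    for _ in range(n_batches):
        for _ in range(batch_size):
            x = next(stream)
            # Assign the point to its nearest centroid (squared Euclidean distance).
            j = min(range(k),
                    key=lambda c: sum((x[d] - centroids[c][d]) ** 2
                                      for d in range(dim)))
            counts[j] += 1
            # Move the winning centroid toward the point with a shrinking step size.
            step = 1.0 / counts[j]
            for d in range(dim):
                centroids[j][d] += step * (x[d] - centroids[j][d])
    return centroids
```

The shrinking per-centroid step size is what lets a single pass over batches approximate the batch k-means solution without revisiting old points.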

### Big Data Industries

Fig. 1.5 Big data growth (Source: Reuters 2012)

Challenges and Opportunities with Big Data. In 2012, a group of prominent researchers from leading US universities, including UC Santa Barbara, UC Berkeley, MIT, Cornell University, University of Michigan, Columbia University, Stanford University and a few others, as well as researchers from leading companies including Microsoft, HP, Google, IBM, and Yahoo!, created a white paper on this topic [6]. The prediction is that a big database of every student’s academic performance can be created, and this data can then be used to design the most effective approaches to education, starting from reading, writing, and math, to advanced college-level courses [6].

### Multiple species flocking

Sampling and compression are two representative data reduction methods for big data analytics, because reducing the size of the data makes the analytics computationally less expensive, and thus faster, especially for data coming into the system [75]. How many instances need to be selected for the data mining method is another research issue [76], because it will affect the performance of the sampling method in most cases. The trends in machine learning studies for big data analytics are twofold: one attempts to make machine learning algorithms run on parallel platforms, such as Radoop [128], Mahout [86], and PIMRU [123]; the other is to redesign machine learning algorithms to make them suitable for parallel computing or for parallel computing environments, such as neural network algorithms for GPU [125] and an ant-based algorithm for grid computing [126].
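The instance-selection question can be made concrete with reservoir sampling, a classic technique (not specific to the cited works) that maintains a uniform random sample of a fixed size n from a stream whose total length is unknown in advance; a minimal sketch:

```python
import random

def reservoir_sample(stream, n, seed=None):
    """Keep a uniform random sample of n items from a stream of unknown length."""
    rng = random.Random(seed)
    reservoir = []
    for i, item in enumerate(stream):
        if i < n:
            # Fill the reservoir with the first n items.
            reservoir.append(item)
        else:
            # Keep the i-th item with probability n / (i + 1),
            # evicting a uniformly chosen current occupant.
            j = rng.randint(0, i)
            if j < n:
                reservoir[j] = item
    return reservoir
```

Each item in the stream is examined exactly once, so the memory cost is fixed at n regardless of how much data arrives.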

### Output the Result of Big Data Analysis

That parallel computing and cloud computing technologies have a strong impact on big data analytics can also be recognized as follows: (1) most big data analytics frameworks and platforms use Hadoop and Hadoop-related technologies to design their solutions; and (2) most mining algorithms for big data analysis have been designed for parallel computing via software or hardware, or designed for Map-Reduce-based platforms. From the pragmatic perspective, big data analytics is indeed useful and has many possibilities which can help us more accurately understand the so-called “things.” However, the situation in most studies of big data analytics is that they argue that the results of big data are valuable, but the business models of most big data analytics are not clear.

### Security Issues

Although security has to be tightened for big data analytics before it can gather more data from everywhere, the fact is that, until now, there are still not many studies focusing on the security issues of big data analytics. As long as porting data mining algorithms to Hadoop is inevitable, making the data mining algorithms work on a map-reduce architecture is the very first thing to do to apply traditional data mining methods to big data analytics.
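What "making a data mining algorithm work on a map-reduce architecture" means can be sketched in plain Python: the map phase emits key-value pairs, a shuffle groups them by key, and the reduce phase aggregates each group. The mined quantity below is item frequency, the first step of algorithms such as Apriori; the function names are our own, and a real deployment would distribute the map and reduce tasks across a Hadoop cluster rather than run them in one process:

```python
from collections import defaultdict
from itertools import chain

def map_phase(record):
    # Map side: emit an (item, 1) pair for every item in one transaction.
    for item in record:
        yield (item, 1)

def shuffle(pairs):
    # Group the emitted values by key, as the map-reduce framework would.
    groups = defaultdict(list)
    for key, value in pairs:
        groups[key].append(value)
    return groups

def reduce_phase(key, values):
    # Reduce side: aggregate all values emitted for one key.
    return key, sum(values)

def item_frequencies(transactions):
    pairs = chain.from_iterable(map_phase(t) for t in transactions)
    return dict(reduce_phase(k, v) for k, v in shuffle(pairs).items())
```

For example, `item_frequencies([["milk", "bread"], ["milk"], ["bread", "eggs"]])` returns `{"milk": 2, "bread": 2, "eggs": 1}`. The point of the decomposition is that map and reduce are both embarrassingly parallel; only the shuffle requires coordination.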

### Noise, Outliers, Incomplete and Inconsistent Data

Although big data analytics is a new age for data analysis, because several solutions adopt classical ways to analyze the data, the open issues of traditional data mining algorithms also exist in these new systems. This situation may occur because the load on different compute nodes may differ during the data mining process, or because the convergence speeds differ for the same data mining algorithm.

### Privacy Issues

In the same way, $D_T = \{(x_{T1}, y_{T1}), \ldots, (x_{Tn}, y_{Tn})\}$ is defined as the target domain data, where $x_{Ti} \in X_T$ is the $i$-th data instance of $D_T$ and $y_{Ti} \in Y_T$ is the corresponding class label for $x_{Ti}$. The method of [74], referred to as Heterogeneous Spectral Mapping (HeMap), addresses the specific transfer learning scenario where the input feature space is different between the source and target ($X_S \neq X_T$), the marginal distribution is different between the source and the target ($P(X_S) \neq P(X_T)$), and the output space is different between the source and the target ($Y_S \neq Y_T$).

### Negative Transfer

[98] discusses the concept of negative transfer in transfer learning and claims that the source domain needs to be sufficiently related to the target domain; otherwise, the attempt to transfer knowledge from the source can have a negative impact on the target learner. Using spectral graph theory [55] on the model transfer graph, a transfer function is derived that maintains the geometry of the model transfer graph and is used in the final target learner to determine the level of transfer. The approach of [99] is tested along with a handpicked method, where the source domains are manually selected to be related to the target; an average method, which uses all available sources; and a baseline method that does not use transfer learning.

### References

1. Witten IH, Frank E. Data mining: practical machine learning tools and techniques. 3rd ed.

According to IDC [11], data volumes have grown exponentially, and by 2020 the number of digital bits will be comparable to the number of stars in the universe. Organizations and social media generate enormous amounts of data every day and traditionally represent it in formats consistent with poorly structured databases: web blogs, text documents, or machine code, such as geospatial data that may be collected in various stores even outside of a company/organization [24].

### Current activity in the field of Big Data visualization is focused on the invention of tools that allow a person to produce quick and effective results when working with large volumes of data

We identify important steps for the research agenda to implement this. This paper covers various issues and topics, but there are three main directions of this survey. The rest of the paper is organized as follows: the first section provides a definition of Big Data and looks at currently used methods for Big Data processing and their specifications. The mass distribution of the technology, and innovative models that utilize these different kinds of devices and services, appeared to be a starting point for the penetration of Big Data into almost all areas of human activity, including the commercial sector and public administration [27].

### However, the next step, presenting a system with the addition of a time dimension, appeared as a significant breakthrough

The main Gestalt principles, such as the law of proximity (a collection of objects forming a group), the law of similarity (objects are grouped perceptually if they are similar to each other), symmetry (people tend to perceive objects as symmetrical shapes), closure (our mind tends to close up objects that are not complete) and the figure-ground law (prominent and recessed roles of visual objects), should be taken into account in Big Data visualization. However, the only option to present a final image is to move around it, and thus navigation inside the model seems to be another influential issue [168]. From the Big Data visualization point of view, scaling is a significant issue, mainly caused by multidimensional systems where there is a need to delve into a branch of information in order to obtain some specific value or knowledge.


Empirical studies have demonstrated that data representations obtained from stacking up nonlinear feature extractors (as in Deep Learning) often yield better machine learning results, e.g., improved classification modeling [9], better quality of generated samples by generative probabilistic models [10], and the invariant property of data representations [11]. In addition to analyzing massive volumes of data, Big Data Analytics poses other unique challenges for machine learning and data analysis, including format variation of the raw data, fast-moving streaming data, trustworthiness of the data analysis, highly distributed input sources, noisy and poor-quality data, high dimensionality, scalability of algorithms, imbalanced input data, unsupervised and un-categorized data, limited supervised/labeled data, etc.
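The "stacking up nonlinear feature extractors" idea can be shown structurally in a few lines of NumPy. The sketch below is our own illustration with untrained random weights, so it demonstrates only the architecture (each layer maps the previous representation through an affine transform and a nonlinearity) rather than learned representations:

```python
import numpy as np

def tanh_layer(x, w, b):
    # One nonlinear feature extractor: affine map followed by tanh.
    return np.tanh(x @ w + b)

def stack_layers(x, sizes, seed=0):
    """Push data through a stack of nonlinear layers; return every representation."""
    rng = np.random.default_rng(seed)
    reps = [x]
    for d_in, d_out in zip(sizes, sizes[1:]):
        # Random (untrained) weights, scaled to keep activations in range.
        w = rng.normal(scale=1.0 / np.sqrt(d_in), size=(d_in, d_out))
        b = np.zeros(d_out)
        reps.append(tanh_layer(reps[-1], w, b))
    return reps
```

Calling `stack_layers(x, [64, 32, 16, 8])` on an input of shape `(n, 64)` yields representations of shapes `(n, 64)`, `(n, 32)`, `(n, 16)`, and `(n, 8)`: each level is a more compact re-description of the one below, which is the hierarchy the chapter describes. In a real Deep Learning system the weights would be learned, e.g. by layer-wise pretraining or backpropagation.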

### These algorithms are largely motivated by the ﬁeld of artiﬁcial intelligence, which has the general goal of emulating the human brain’s ability to observe, analyze

A key concept underlying Deep Learning methods is distributed representations of the data, in which a large number of possible configurations of the abstract features of the input data are feasible, allowing for a compact representation of each sample and leading to a richer generalization. The final representation of the data constructed by the deep learning algorithm (the output of the final layer) provides useful information which can be used as features in building classifiers, or even for data indexing and other applications that are more efficient when using abstract representations of the data rather than high-dimensional sensory data.
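As a toy illustration of that last point (our own code, not from the chapter): once the final-layer representation is available, even a very simple classifier such as nearest-centroid can operate in the learned feature space:

```python
import numpy as np

def fit_centroids(features, labels):
    # One centroid per class, computed in the (learned) feature space.
    return {c: features[labels == c].mean(axis=0) for c in np.unique(labels)}

def predict(features, centroids):
    # Assign each sample to the class whose centroid is nearest.
    classes = sorted(centroids)
    dists = np.stack([np.linalg.norm(features - centroids[c], axis=1)
                      for c in classes])
    return np.array([classes[i] for i in dists.argmin(axis=0)])
```

The simplicity is the point: if the abstract representation separates the classes well, little modeling capacity is needed on top of it, whereas the same classifier applied to raw high-dimensional sensory data would usually perform poorly.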