Fig. 5. Four examples of statistical models for data integration. (A)
Voting system. Each circle represents one data set and has one vote. Gray
numbers indicate total votes. Data that are confirmed by multiple data sets
have multiple votes. In this example, there are three data sets; thus, three
is the maximum number of possible votes. (B) Support vector machine.
Blue circles indicate positives in the training set and yellow squares
represent negatives. In this example, there are two attributes (as represented
by the x- and y-axes) for each data point. The data are
plotted based on the values of these attributes. A function f maps the
data points into a space in which they become linearly separable. The
training set is used to derive the hyperplane (red line) that
separates positives from negatives. (C) Decision tree. In this
hypothetical tree, the goal is to classify the input items into two
categories, X and Y, which are denoted as blue circles and yellow squares,
respectively. The category of each item is hidden, but we know the values of
its three attributes (A,B,C). We use a set of conditions (represented by pink
diamonds) to evaluate these attributes. Based on their values, we separate the
items into subsets. The separation continues until each item reaches a leaf
node (green box), which gives its final category. (D) Bayesian
network. In Bayesian networks, nodes represent variables and edges represent
variable dependencies. Here, each node represents a Boolean variable, the
value of which is denoted as true (T) or false (F) in conditional probability
tables. The edges indicate that the value of B depends on the value of A
and that the value of D depends on the values of both A and C. The
conditional probability tables quantify these dependencies. For example, the
probability of B being true is 0.8 if A is true; the probability drops to 0.4
if A is false. This network enables us to infer the probability of one
variable from the observed values of others; for example, the probability of
A being true given that B is true.
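
The inference mentioned for panel D can be sketched with Bayes' rule. This is a minimal illustration, not the method used to build the figure: the conditional probabilities 0.8 and 0.4 come from the caption, but the prior probability of A is not stated there, so `p_a` is a hypothetical input, and the function name is likewise an assumption.

```python
def posterior_a_given_b(p_a, p_b_given_a=0.8, p_b_given_not_a=0.4):
    """Return P(A=T | B=T) by Bayes' rule.

    p_a is an assumed prior P(A=T); the caption does not give it.
    The two conditional probabilities default to the values in panel D.
    """
    if not 0.0 <= p_a <= 1.0:
        raise ValueError("p_a must be a probability in [0, 1]")
    # Joint probabilities of B=T with each value of A.
    joint_a_b = p_b_given_a * p_a
    joint_not_a_b = p_b_given_not_a * (1.0 - p_a)
    # Marginal P(B=T) by the law of total probability.
    p_b = joint_a_b + joint_not_a_b
    if p_b == 0.0:
        raise ValueError("P(B=T) is zero; the posterior is undefined")
    return joint_a_b / p_b


if __name__ == "__main__":
    # Illustrative prior only: A is assumed equally likely to be true or false.
    print(round(posterior_a_given_b(0.5), 3))
```

With the assumed prior of 0.5, observing that B is true raises the probability of A from 0.5 to about 0.667, because B is twice as likely when A is true as when A is false.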