Figure 5


Fig. 5. Four examples of statistical models for data integration. (A) Voting system. Each circle represents one data set and has one vote. Gray numbers indicate total votes. Data points confirmed by multiple data sets receive multiple votes. In this example, there are three data sets; thus, three is the maximum number of possible votes. (B) Support vector machine. Blue circles indicate positives in the training set and yellow squares represent negatives. In this example, there are two attributes (represented by the x- and y-axes) for each data point, and the data are plotted according to the values of these attributes. A function f is used to transform the data points so that they become linearly separable. The training set is used to derive the separating hyperplane, here a line (red), that divides positives from negatives. (C) Decision tree. In this hypothetical tree, the goal is to classify the input items into two categories, X and Y, which are denoted as blue circles and yellow squares, respectively. The category of each item is hidden, but we know the values of its three attributes (A, B, C). We use a set of conditions (represented by pink diamonds) to evaluate these attributes and, based on their values, separate the items into subsets. The separation continues until each item reaches a leaf node (green box), which gives its final outcome. (D) Bayesian network. In Bayesian networks, nodes represent variables and edges represent dependencies between variables. Here, each node represents a Boolean variable, the value of which is denoted as true (T) or false (F) in the conditional probability tables. The edges indicate that the value of B depends on the value of A and that the value of D depends on the values of both A and C. The conditional probability tables quantify these dependencies. For example, the probability of B being true is 0.8 if A is true; the probability drops to 0.4 if A is false. This network enables us to derive probabilities from different attribute values; for example, the probability of A being true given that B is true.
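
As a worked illustration of the last point in panel D, the sketch below applies Bayes' rule to the A -> B edge. The conditional probabilities 0.8 and 0.4 come from the caption. The prior probability of A is not stated there, so the value 0.5 is purely an assumption made for illustration, and the function name `posterior_a_given_b` is hypothetical.

```python
def posterior_a_given_b(p_b_given_a, p_b_given_not_a, prior_a):
    """Return P(A=T | B=T) for a two-node network A -> B using Bayes' rule."""
    # Marginal probability that B is true, summed over both values of A.
    p_b = p_b_given_a * prior_a + p_b_given_not_a * (1.0 - prior_a)
    # Bayes' rule: P(A|B) = P(B|A) * P(A) / P(B).
    return p_b_given_a * prior_a / p_b


# 0.8 and 0.4 are the caption's table entries; prior_a=0.5 is an assumed value.
p = posterior_a_given_b(0.8, 0.4, prior_a=0.5)
print(round(p, 3))  # 0.667
```

Under that assumed prior, observing that B is true raises the probability of A being true from 0.5 to about 0.67. A different prior would give a different posterior.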




