This section establishes a new form of graphical probabilistic representation called an Experiment Graphical Model (EGM).

An EGM is similar to a Probabilistic Graphical Model (PGM). They differ in that the nodes in an EGM are Experiment Results, whereas the nodes in a PGM are Random Variables. An Experiment Result carries all the information of a Random Variable, but adds one more piece of information – the number of samples. This additional information has significant implications:

- experiments with a common sample space and procedure can be combined
- the number of trials in the experiment can be used as an indicator of the significance of an experiment

**The Motivation**

It is frequently the case that we have results from experiments drawn from the same sample space using the same technique, but with different numbers of trials. We want to combine the results of these experiments while respecting the relative significance represented by the trial counts. Bayes' Theorem gives the ability to encode relationships between Random Variables and is commonly used for such combinations. But Bayes' Theorem is derived from probabilities, and probabilities do not include the sample count. So Bayes has no access to the sample count and cannot weigh the relative significance of the respective sample counts. This limitation places an unnecessary handicap on applications where Bayes is applied.

- Given the set-theoretic definition of probability (Kolmogorov, 1933): p(A=a1) = nA1 / n, where nA1 is the number of trials in which A=a1 occurred and n is the total number of trials.
- Bayes' Theorem is p(A=a1|B=b1) = p(B=b1|A=a1) * p(A=a1) / p(B=b1).
- Bayes chooses to use the probability p(A=a1) instead of the more expressive pair [nA1, n] in its definition.

In summary: Bayes uses Probability instead of the more expressive count vector [nA1, …, nAn]. Bayes does not have visibility into the number of events – the "n". It has thrown away the cardinality of the experiment (the number of trials).
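A minimal sketch of the point above (the numbers are illustrative, not from the text): two experiments over the same sample space, represented as (success count, trial count) pairs. Once each is reduced to a probability, the trial counts are gone and any combination must treat the experiments as equally significant; keeping the counts lets the larger experiment dominate naturally.

```python
# Two hypothetical experiments on the same sample space, as (count, trials).
exp_a = (8, 10)      # 8 occurrences in 10 trials   -> p = 0.8
exp_b = (200, 1000)  # 200 occurrences in 1000 trials -> p = 0.2

# Probabilities alone discard n:
p_a = exp_a[0] / exp_a[1]
p_b = exp_b[0] / exp_b[1]
naive_avg = (p_a + p_b) / 2  # 0.5 – treats both experiments as equally significant

# Combining the raw counts preserves relative significance:
combined = (exp_a[0] + exp_b[0], exp_a[1] + exp_b[1])  # (208, 1010)
p_combined = combined[0] / combined[1]  # ~0.206 – dominated by the larger experiment
```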

**Example 1:**

I move from Boston to Irvine. After a month I have experienced 2 earthquakes. One morning I am awoken by a loud noise and shaking. I am extremely confident it is an earthquake. Dr Daphne Koller also lives in that same building and experiences the same loud noise and shaking. She has experienced many earthquakes and is quite confident it is something else.

Given a traditional probabilistic representation, those evaluations might be Chris: p(quake) = 0.8 and Dr Koller: p(quake) = 0.2. But Dr Koller has been living in the area much longer and has more experience detecting earthquakes. She has experienced (say) 934 earthquakes where I have experienced 2. How do we 'weigh' degree of experience when combining these? Typically a Bayes network would have random variable nodes "Chris" and "Dr Koller," and a child node with weights (conditional probabilities) to combine them. The problem is that the degrees of confidence that were already part of the original experiments ("2" and "934") were thrown away in computing the probabilities. In other words – first we discard "n" after computing p(quake), then we have to create something to replace it – the weight in the child node. It would be preferable to preserve the "n" and use it to weigh the relative significance in the child node.
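One way to sketch this preservation of "n" (the weighting scheme here – a linear pool weighted by experience count – is an assumption for illustration, not a method specified in the text):

```python
# Each observer's judgment, kept together with the experience count that backs it.
# Numbers are from the example above; the pooling rule is an illustrative assumption.
observers = {
    "Chris":     {"p_quake": 0.8, "n": 2},
    "Dr Koller": {"p_quake": 0.2, "n": 934},
}

# Pool the probabilities weighted by each observer's trial count "n",
# instead of inventing separate weights in a child node.
total_n = sum(o["n"] for o in observers.values())
p_pooled = sum(o["p_quake"] * o["n"] for o in observers.values()) / total_n
# ~0.201 – Dr Koller's far larger experience dominates, as it should
```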

**Example 2:**

An aeronautic airspeed sensor is one of several such sensors in an airplane control system. The sensor expresses its readings in a probabilistic manner P(S=sn), where sn is speed in knots and S takes values in {20, 50, 100, 200, 500}. Internally, the sensor has access to the temperature and humidity at the sensing point. It may have a heating element and could detect the state of that element. An inherent design element (aperture size) affects its performance. A model internal to the sensor factors these parameters into determining the most likely airspeed. The output of the sensor is the five values P(S=sn).

The problem here is that the sensor can only express a prediction of airspeed. It cannot express its confidence in that prediction. So, if one sensor is subject to freezing under low-temperature/high-humidity conditions, it cannot express its low confidence when it detects those conditions. Conversely, a second sensor optimized for low-temperature/high-humidity conditions would want to express its strong confidence under those conditions.

We desire a means for the sensor to express not just its airspeed prediction, but also its confidence in that prediction. To do this, we have the sensor express its output as an Experiment Result N(S=sn), where N(S=sn) is the number of occurrences of the event S=sn in an experiment equivalent to the sensor's current state. That equivalent experiment would have the same Procedure and Sample Space.

Note the probability P(S=sn) can be easily recovered as N(S=sn) / card(N), where card(N) = sum over n of N(S=sn).

*(In the context of the System Controller, where we have multiple sensors: if we assume the experiments from those sensors are performed under the same conditions, using the same Procedure, then their Experiment Results can be combined.)*

In this manner – details of the sensor model are encapsulated within the sensor and not exposed to the system. The sensor is given a means to express its level of confidence to the system. A common and consistent means allows the system to combine results from different sensors.
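The sensor interface described above can be sketched as follows (the count values are invented for illustration; the combination rule – event-by-event addition of counts – is the one the text proposes for sensors sharing a Sample Space and Procedure):

```python
# Shared sample space of airspeeds in knots, from the example.
SPEEDS = (20, 50, 100, 200, 500)

# Each sensor reports counts N(S=sn) rather than probabilities P(S=sn).
# A confident sensor emits large counts; an unsure one (e.g. one that
# suspects icing) emits small counts with the same relative shape.
sensor_a = {20: 1, 50: 2, 100: 4, 200: 2, 500: 1}         # low confidence, n = 10
sensor_b = {20: 10, 50: 50, 100: 800, 200: 120, 500: 20}  # high confidence, n = 1000

def combine(*results):
    """Sum counts event-by-event; valid when the sensors share
    the same Sample Space and Procedure."""
    return {s: sum(r[s] for r in results) for s in SPEEDS}

def probabilities(counts):
    """Recover P(S=sn) = N(S=sn) / card(N)."""
    n = sum(counts.values())
    return {s: c / n for s, c in counts.items()}

combined = combine(sensor_a, sensor_b)
p = probabilities(combined)  # dominated by the confident sensor
```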

**The Plan**

- Define an Experiment Result as a triple: Sample Space, Procedure, and Collection of Trials.
- Note that an Experiment Result establishes a Probability Function, where P(A=a1) = N(A=a1) / N(Sample Space).
- Note that Experiment Results can be combined if they share a Sample Space and Procedure. Each Experiment Result has its own Collection of Trials.
- Express Bayes' Theorem in set-theoretic notation, specifically identifying experiments that share a Sample Space and Procedure but have different Numbers of Trials.
- Establish the Experiment Graphical Model (EGM) – the equivalent of a Probabilistic Graphical Model, except that it adds a new kind of node – an "Addition" node – whose prerequisite is that its inputs share a Sample Space and Procedure.
- Demonstrate (prove) that any Neural Network can be rewritten as (is) an EGM.
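The first three steps of the plan can be sketched in code (class and method names are illustrative, not from the text): an Experiment Result holds the triple, addition is permitted only when Sample Space and Procedure match, and the probability function falls out of the counts.

```python
from collections import Counter

class ExperimentResult:
    """Sketch of an Experiment Result: (Sample Space, Procedure, trial counts)."""

    def __init__(self, sample_space, procedure, counts):
        assert set(counts) <= set(sample_space), "counts must lie in the sample space"
        self.sample_space = frozenset(sample_space)
        self.procedure = procedure
        self.counts = Counter(counts)

    def __add__(self, other):
        # Prerequisite for an EGM "Addition" node: shared Sample Space and Procedure.
        if (self.sample_space, self.procedure) != (other.sample_space, other.procedure):
            raise ValueError("cannot combine: Sample Space / Procedure differ")
        return ExperimentResult(self.sample_space, self.procedure,
                                self.counts + other.counts)

    def probability(self, event):
        # P(A=a1) = N(A=a1) / N(Sample Space)
        return self.counts[event] / sum(self.counts.values())

# Usage: two experiments with the same Sample Space and Procedure combine,
# and the larger experiment dominates the resulting probability.
r1 = ExperimentResult(("a1", "a2"), "proc", {"a1": 2, "a2": 8})
r2 = ExperimentResult(("a1", "a2"), "proc", {"a1": 90, "a2": 10})
r3 = r1 + r2  # counts: a1 -> 92, a2 -> 18
```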
