Untitled

To form code that can show how machine learning can be used to correctly give a prediction we can turn to a wide-known computing problem, the pima Indian diabetes problem. This is a database of over 700 patients with contains symptoms they have and if the subject is diabetic or not. These symptoms range from plasma glucose concentration to triceps skin fold thickness. We have a class variable which tells us if they are diabetic or not, 0 showing a false outcome (Does not have diabetes) then 1 showing us a true outcome (Also known as positive outcome – tells us they do in-fact have diabetes). All attributes are numeric meaning we can take the attributes, and statically predict if a new patient will see an onset of diabetes by comparing with previous symptoms. We use machine learning for this as no two patients will be the same, their attributes (symptoms) will range differently meaning we cannot use simple predicting to guess.

Navies Bayes Classifier is a mathematical technique which is based on the Bayesian Theorem and is suited to predicting outcomes when the dimensionality of the inputs is high. In this situation, our inputs can range from 0 to 300 and we have over 10 inputs which predict the onset of diabetes. Bayesian theorem is derived from a joint probability distribution over two events, this gives us the probability that an event would occur given a second event occurs.

P(A ┤|  B)=  (P(A ∩B))/(P(B))

Simplified, this is the probability of A and B occurring, divided by the probability of just P occurring. Navies Bayes is derived from the joint probability distribution. There are many ways to display the formula (Normal, Lognormal, Poisson and Gamma) although the calculation for navies Bayes does not work the same, as the formula to calculate it can be referred to as an gaussian probability density function  :

1/(σ √2π)  exp⁡[〖-(x- μ)〗^2/2σ]

Where μ = mean, σ = Standard Deviation

By using this, we can calculate the mean and standard deviation of a group, then substitute in our mean and standard deviation to calculate our navies Bayes outcome.

If our outcome from the gaussian predictor shows a strong correlation between the patients previous medically factors, we can assume that the patient will have a onset of diabetes. A strong correlation could be factored as a probability of over 0.75.  To visually see the correlations, we can take our original data from the pima Indians diabetes database, and plot different types of graphs, allowing us to see factors that cause the onset of diabetes with ease.