In this example we build a Sequential model with DenseLayers to model the Pima diabetes dataset, predicting whether individuals will be diagnosed with diabetes based on the other diagnostic measurements included in the dataset.
This dataset is originally from the United States National Institute of Diabetes and Digestive and Kidney Diseases. All patients here are females at least 21 years old of Pima Indian heritage.
Source:
Pima Indian Diabetes Database, https://www.kaggle.com/uciml/pima-indians-diabetes-database. Data originally published in:
• Smith, J.W., Everhart, J.E., Dickson, W.C., Knowler, W.C., & Johannes, R.S. (1988). Using the ADAP learning algorithm to forecast the onset of diabetes mellitus. In Proceedings of the Symposium on Computer Applications and Medical Care (pp. 261-265). IEEE Computer Society Press.
We first load the data into a DataFrame:
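The loading step can be sketched in plain Python, used here only as an illustration and not as the worksheet's own commands. The sketch assumes the Kaggle CSV layout (eight measurement columns followed by Outcome); the embedded rows are illustrative sample values in that layout, and `load_rows` is a hypothetical helper name.

```python
import csv
import io

# Illustrative rows in the column layout of the Kaggle CSV (assumption:
# the real file uses the same header, with Outcome as the last column).
SAMPLE_CSV = """Pregnancies,Glucose,BloodPressure,SkinThickness,Insulin,BMI,DiabetesPedigreeFunction,Age,Outcome
6,148,72,35,0,33.6,0.627,50,1
1,85,66,29,0,26.6,0.351,31,0
8,183,64,0,0,23.3,0.672,32,1
1,89,66,23,94,28.1,0.167,21,0
"""


def load_rows(text):
    """Parse CSV text into a header list and a list of numeric rows."""
    reader = csv.reader(io.StringIO(text))
    header = next(reader)
    # Skip blank records and convert every field to float.
    rows = [[float(value) for value in record] for record in reader if record]
    return header, rows


header, rows = load_rows(SAMPLE_CSV)
print(header[-1], len(rows))
```

To read the real file instead, the same function could be given the contents of the downloaded CSV.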
As before, we divide the dataset into training and test data. We are attempting to predict the last column, Outcome.
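As a rough illustration of this split, the following Python sketch shuffles the rows, holds out a fraction for testing, and separates the last column as the label. The function name `train_test_split`, the 25% test fraction, and the synthetic rows are all assumptions made for the example, not values taken from the worksheet.

```python
import random


def train_test_split(rows, test_fraction=0.25, seed=0):
    """Shuffle rows and split them; the last column of each row is the label."""
    shuffled = list(rows)
    random.Random(seed).shuffle(shuffled)  # seeded for reproducibility
    n_test = max(1, int(round(len(shuffled) * test_fraction)))
    test, train = shuffled[:n_test], shuffled[n_test:]

    def split_xy(part):
        # Features are all columns but the last; the label is the last column.
        return [row[:-1] for row in part], [row[-1] for row in part]

    return split_xy(train), split_xy(test)


# Synthetic rows: two feature columns followed by a 0/1 label column.
rows = [[float(i), float(i % 3), float(i % 2)] for i in range(20)]
(train_x, train_y), (test_x, test_y) = train_test_split(rows)
print(len(train_x), len(test_x))
```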
We define a neural network using the Sequential command, which stacks one or more neural network layers so that the output of each layer becomes the input of the next.
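The idea of stacking layers can be sketched in a few lines of Python. The `Dense` and `Sequential` classes below are hypothetical stand-ins written for this illustration; they are not the DeepLearning package's commands, and the weights are arbitrary hand-picked numbers.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def relu(z):
    return max(0.0, z)


class Dense:
    """A fully connected layer: activation(W x + b)."""

    def __init__(self, weights, biases, activation):
        self.weights = weights  # one row of input weights per output unit
        self.biases = biases
        self.activation = activation

    def __call__(self, x):
        return [
            self.activation(sum(w * xi for w, xi in zip(row, x)) + b)
            for row, b in zip(self.weights, self.biases)
        ]


class Sequential:
    """Apply layers in order, feeding each output to the next layer."""

    def __init__(self, layers):
        self.layers = list(layers)

    def __call__(self, x):
        for layer in self.layers:
            x = layer(x)
        return x


# Two inputs -> two hidden units (ReLU) -> one output unit (sigmoid).
model = Sequential([
    Dense([[0.5, -0.5], [1.0, 1.0]], [0.0, -1.0], relu),
    Dense([[1.0, -1.0]], [0.0], sigmoid),
])
out = model([2.0, 1.0])
print(out)
```

A sigmoid output unit is a common choice for a binary target such as Outcome, because its value can be read as a probability.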
We can train the model on the training data with the Fit command:
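What a fitting step does can be illustrated with a deliberately simplified stand-in: a single sigmoid unit trained by batch gradient descent on the cross-entropy loss. This is not how the Fit command is implemented; the function `fit`, its default epochs and learning rate, and the toy data are all assumptions made for the sketch.

```python
import math


def sigmoid(z):
    # Clamp the argument so math.exp cannot overflow.
    z = max(-60.0, min(60.0, z))
    return 1.0 / (1.0 + math.exp(-z))


def fit(features, labels, epochs=500, learning_rate=0.5):
    """Fit one sigmoid unit by batch gradient descent on cross-entropy."""
    n, d = len(features), len(features[0])
    weights, bias = [0.0] * d, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * d, 0.0
        for x, y in zip(features, labels):
            # For cross-entropy with a sigmoid output, the gradient with
            # respect to the pre-activation is (prediction - label).
            error = sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias) - y
            for j in range(d):
                grad_w[j] += error * x[j] / n
            grad_b += error / n
        weights = [w - learning_rate * g for w, g in zip(weights, grad_w)]
        bias -= learning_rate * grad_b
    return weights, bias


def predict(weights, bias, x):
    return sigmoid(sum(w * xi for w, xi in zip(weights, x)) + bias)


# Toy hand-made data: the label is 1 when the single feature is large.
features = [[0.0], [0.2], [0.4], [0.6], [0.8], [1.0]]
labels = [0, 0, 0, 1, 1, 1]
weights, bias = fit(features, labels)
print(predict(weights, bias, [0.0]), predict(weights, bias, [1.0]))
```

A multi-layer network is trained on the same principle, with the gradients propagated back through each layer.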
We now have a trained model whose accuracy we can test against the test data or against any new data. The accuracy is not especially high but nevertheless useful for predictive purposes.
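Accuracy here means the fraction of cases whose predicted class matches the observed one. A minimal Python sketch, assuming the model outputs a probability per individual and that 0.5 is used as the decision threshold (both assumptions for illustration; the probabilities below are made up):

```python
def accuracy(probabilities, labels, threshold=0.5):
    """Fraction of predictions that match the observed 0/1 labels."""
    predicted = [1 if p >= threshold else 0 for p in probabilities]
    correct = sum(1 for p, y in zip(predicted, labels) if p == y)
    return correct / len(labels)


probabilities = [0.9, 0.2, 0.6, 0.4]  # hypothetical model outputs
labels = [1, 0, 0, 1]                 # hypothetical observed outcomes
print(accuracy(probabilities, labels))  # 0.5
```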
To get a sense of how this model behaves on individuals within the dataset, we can take a slice from the test data and compare the observed and predicted outcomes.
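Such a side-by-side comparison might look like the following Python sketch. The helper `compare_slice`, the 0.5 threshold, and the observed and predicted values are hypothetical and chosen only to show the shape of the comparison.

```python
def compare_slice(observed, probabilities, start, stop, threshold=0.5):
    """Return (index, observed, predicted, match) for rows start..stop-1."""
    rows = []
    for index in range(start, stop):
        predicted = 1 if probabilities[index] >= threshold else 0
        rows.append((index, observed[index], predicted,
                     observed[index] == predicted))
    return rows


observed = [1, 0, 0, 1, 1, 0]                   # hypothetical test outcomes
probabilities = [0.8, 0.3, 0.7, 0.2, 0.9, 0.1]  # hypothetical model outputs
comparison = compare_slice(observed, probabilities, 1, 5)
for index, obs, pred, match in comparison:
    print(index, obs, pred, "ok" if match else "miss")
```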