Getting started with Facets
In this section, we will install Facets in Python, using Jupyter Notebook on Google Colaboratory.
We will then retrieve the training and testing datasets. Finally, we will read the data files.
The data files are the training and testing datasets from Chapter 1, Explaining Artificial Intelligence with Python. This way, we are in a situation in which we know the subject and can analyze the data without having to spend time understanding what it means.
Let's first install Facets on Google Colaboratory.
Installing Facets on Google Colaboratory
Open Facets.ipynb. The first cell contains the installation command:
# @title Install the facets-overview pip package.
!pip install facets-overview
The installation may be lost when the virtual machine (VM) is restarted; if so, running the cell will install Facets again. If Facets is already installed, the following message is displayed:
Requirement already satisfied:
The program will now retrieve the datasets.
Retrieving the datasets
The program retrieves the datasets from GitHub or Google Drive.
To import the data from GitHub, set the import option to repository = "github".
To read the data from Google Drive, set the option to repository = "google".
In this section, we will activate GitHub and import the data:
# @title Importing data {display-mode: "form"}
# Set repository to "github" (default) to read the data from GitHub
# Set repository to "google" to read the data from Google
import os
from google.colab import drive
# Set repository to "github" to read the data from GitHub
# Set repository to "google" to read the data from Google
repository = "github"
if repository == "github":
    !curl -L https://raw.githubusercontent.com/PacktPublishing/Hands-On-Explainable-AI-XAI-with-Python/master/Chapter03/DLH_train.csv --output "DLH_train.csv"
    !curl -L https://raw.githubusercontent.com/PacktPublishing/Hands-On-Explainable-AI-XAI-with-Python/master/Chapter03/DLH_test.csv --output "DLH_test.csv"
The data is now accessible to our runtime. We will set the path for each file:
# Setting the path for each file
dtrain = "/content/DLH_train.csv"
dtest = "/content/DLH_test.csv"
print(dtrain, dtest)
You can perform the same actions with Google Drive:
if repository == "google":
    # Mounting the drive. If it is not mounted, a prompt
    # will provide instructions
    drive.mount('/content/drive')
    # Setting the path for each file
    dtrain = '/content/drive/My Drive/XAI/Chapter03/DLH_Train.csv'
    dtest = '/content/drive/My Drive/XAI/Chapter03/DLH_Test.csv'
    print(dtrain, dtest)
We have installed Facets and can access the files. We will now read the files.
Reading the data files
In this section, we will use pandas to read the data files and load them into DataFrames.
We will first import pandas and define the features:
# Loading Denis Rothman research training and testing data
# into DataFrames
import pandas as pd
features = ["colored_sputum", "cough", "fever", "headache", "days",
"france", "chicago", "class"]
The data files contain no headers, so we will use our features array to define the column names for the training data:
train_data = pd.read_csv(dtrain, names=features, sep=r'\s*,\s*',
engine='python', na_values="?")
The program then reads the testing data file into a DataFrame:
test_data = pd.read_csv(dtest, names=features, sep=r'\s*,\s*',
skiprows=[0], engine='python', na_values="?")
Having read the data into DataFrames, we can now implement feature statistics for our datasets.