Hands-On Explainable AI (XAI) with Python

Getting started with Facets

In this section, we will install Facets in Python, using Jupyter Notebook on Google Colaboratory.

We will then retrieve the training and testing datasets. Finally, we will read the data files.

The data files are the training and testing datasets from Chapter 1, Explaining Artificial Intelligence with Python. This way, we already know the subject and can analyze the data without spending time understanding what it means.

Let's first install Facets on Google Colaboratory.

Installing Facets on Google Colaboratory

Open Facets.ipynb. The first cell contains the installation command:

# @title Install the facets-overview pip package.
!pip install facets-overview

The installation may be lost when the virtual machine (VM) is restarted. If that happens, run this cell again to reinstall the package. If Facets is already installed, the following message is displayed:

Requirement already satisfied:
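
If you want the notebook to rerun cleanly after a VM restart without editing anything, you can guard the installation with an import check. The following is a minimal sketch that is not part of the book's notebook; it assumes the facets-overview pip package exposes the facets_overview module:

# Sketch (not in the notebook): reinstall facets-overview only if
# the facets_overview module is missing from this VM
try:
    import facets_overview  # presence check only
except ImportError:
    !pip install facets-overview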

The program will now retrieve the datasets.

Retrieving the datasets

The program retrieves the datasets from GitHub or Google Drive.

To import the data from GitHub, set the import option to repository = "github".

To read the data from Google Drive, set the option to repository = "google".

In this section, we will use the GitHub option and import the data:

# @title Importing data { display-mode: "form" }
# Set repository to "github" (default) to read the data from GitHub
# Set repository to "google" to read the data from Google Drive
import os
from google.colab import drive
# Set repository to "github" to read the data from GitHub
# Set repository to "google" to read the data from Google
repository = "github"
if repository == "github":
  !curl -L https://raw.githubusercontent.com/PacktPublishing/Hands-On-Explainable-AI-XAI-with-Python/master/Chapter03/DLH_train.csv --output "DLH_train.csv"
  !curl -L https://raw.githubusercontent.com/PacktPublishing/Hands-On-Explainable-AI-XAI-with-Python/master/Chapter03/DLH_test.csv --output "DLH_test.csv"

The data is now accessible to our runtime. We will set the path for each file; these lines continue the if repository == "github" block:

  # Setting the path for each file
  dtrain = "/content/DLH_train.csv"
  dtest = "/content/DLH_test.csv"
  print(dtrain, dtest)
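
Before going further, you can optionally confirm that both files reached the runtime. This check is not part of the book's notebook:

# Optional check (not in the notebook): make sure both downloaded
# files exist and are not empty
for f in (dtrain, dtest):
    if not os.path.isfile(f) or os.path.getsize(f) == 0:
        raise FileNotFoundError("Missing or empty file: " + f)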

You can perform the same actions with Google Drive:

if repository == "google":
  # Mounting the drive. If it is not mounted, a prompt
  # will provide instructions
  drive.mount('/content/drive')
  # Setting the path for each file
  dtrain = '/content/drive/My Drive/XAI/Chapter03/DLH_Train.csv'
  dtest = '/content/drive/My Drive/XAI/Chapter03/DLH_Test.csv'
  print(dtrain, dtest)

We have installed Facets and can access the files. We will now read the files.

Reading the data files

In this section, we will use pandas to read the data files and load them into DataFrames.

We will first import pandas and define the features:

# Loading Denis Rothman research training and testing data
# into DataFrames
import pandas as pd
features = ["colored_sputum", "cough", "fever", "headache", "days",
            "france", "chicago", "class"]

The data files contain no headers, so we will use our features list to define the column names for the training data:

train_data = pd.read_csv(dtrain, names=features, sep=r'\s*,\s*',
                         engine='python', na_values="?")
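
At this point, a quick preview can confirm that the column names were applied as intended. This preview is not part of the book's notebook:

# Quick preview (not in the notebook): first rows and inferred dtypes
# of the training DataFrame
print(train_data.head())
print(train_data.dtypes)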

The program now reads the testing data file into a DataFrame. Note that skiprows=[0] skips the first line of the test file:

test_data = pd.read_csv(dtest, names=features, sep=r'\s*,\s*',
                        skiprows=[0], engine='python', na_values="?")
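
A short check, not in the book's notebook, confirms that both DataFrames loaded with the same columns as the features list:

# Check (not in the notebook) that train and test share the columns
# defined in features
assert list(train_data.columns) == features
assert list(test_data.columns) == features
print("train:", train_data.shape, "test:", test_data.shape)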

Having read the data into DataFrames, we can now implement feature statistics for our datasets.
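
As a preview of that step, the following sketch follows the usage pattern documented in the Facets README: it builds the protocol buffer of feature statistics that the Facets Overview visualization consumes. Treat it as a sketch under those assumptions rather than the book's exact code:

# Sketch of the next step, based on the Facets README: generate the
# feature statistics protocol buffer for both DataFrames
import base64
from facets_overview.generic_feature_statistics_generator import (
    GenericFeatureStatisticsGenerator,
)

gfsg = GenericFeatureStatisticsGenerator()
proto = gfsg.ProtoFromDataFrames([
    {'name': 'train', 'table': train_data},
    {'name': 'test', 'table': test_data}])
protostr = base64.b64encode(proto.SerializeToString()).decode("utf-8")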