Homework exercise 2:
Tricking neural networks: adversarial examples.

We load a neural network that is trained with keras to recognise images.

In [ ]:
## You should have installed PIL, see other notebook!
In [ ]:
import numpy as np
from keras import backend as K
from PIL import Image
from keras.preprocessing import image
from keras.applications import inception_v3

# Load pre-trained image recognition model
model = inception_v3.InceptionV3()

Load an image of a giant panda and transform it to be usable for the neural network.

In [ ]:
# Load the image file and convert it to a numpy array
img = image.load_img("./data/panda.jpg", target_size=(299, 299))
input_image = image.img_to_array(img)

# Scale the image so all pixel intensities are between [-1, 1] as the model expects
input_image /= 255.
input_image -= 0.5
input_image *= 2.

# Add a 4th dimension for batch size (as Keras expects)
input_image = np.expand_dims(input_image, axis=0)

# Show the image
img
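
The scaling above maps pixel intensities from [0, 255] to [-1, 1], and the same arithmetic run backwards recovers the pixels. The following is a minimal NumPy sketch of that round trip. The helper names `to_model_range` and `to_pixel_range` are hypothetical and not part of Keras.

```python
import numpy as np

def to_model_range(pixels):
    """Map pixel intensities from [0, 255] to [-1, 1]."""
    return (pixels / 255.0 - 0.5) * 2.0

def to_pixel_range(scaled):
    """Inverse mapping from [-1, 1] back to [0, 255]."""
    return (scaled / 2.0 + 0.5) * 255.0

# Three sample intensities: black, mid-grey, white
pixels = np.array([0.0, 127.5, 255.0], dtype=np.float32)
scaled = to_model_range(pixels)
restored = to_pixel_range(scaled)

print(scaled)    # values in [-1, 1]
print(restored)  # back in [0, 255]
```

The inverse mapping is what is needed later, before a modified image can be saved to disk.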

Let's see what the neural network sees.

In [ ]:
# Run the image through the neural network
predictions = model.predict(input_image)

# Convert the predictions into text and print them
predicted_classes = inception_v3.decode_predictions(predictions, top=1)
imagenet_id, name, confidence = predicted_classes[0][0]
print("This is a {} with {:.4}% confidence!".format(name, confidence * 100))
To trick the neural network, implement the algorithm below in a loop until the desired outcome is reached:

1. Feed `model` the photo that should be hacked.
2. Check `model`'s prediction and evaluate the distance of the prediction from the desired answer.
3. Tweak the photo using backpropagation to make the final prediction closer to the desired answer.
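
To illustrate the three steps without the full Inception model, here is a rough sketch on a toy problem. It assumes a small linear softmax classifier with random weights `W` as a stand-in for the network, so the gradient with respect to the input can be written by hand. The names `predict` and `target_gradient` and all sizes are made up for this illustration, and the sketch leaves out the limit on how much the input may change.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical stand-in for the network: a linear softmax classifier
n_features, n_classes = 20, 5
W = 0.3 * rng.normal(size=(n_classes, n_features))

def predict(x):
    """Return class probabilities for input vector x."""
    logits = W @ x
    e = np.exp(logits - logits.max())
    return e / e.sum()

def target_gradient(x, target):
    """Gradient of the target-class probability with respect to the input x."""
    p = predict(x)
    # d p_t / d x = p_t * (W_t - sum_k p_k * W_k)
    return p[target] * (W[target] - p @ W)

x = rng.normal(size=n_features)                # the "photo" to be hacked
predicted = int(np.argmax(predict(x)))
target = (predicted + 1) % n_classes           # any class other than the predicted one

hacked = x.copy()
learning_rate = 0.1
cost = predict(hacked)[target]                 # step 1 and 2: feed the input, check the prediction
steps = 0
while cost < 0.80 and steps < 50_000:
    # step 3: nudge the input in the direction that raises the target probability
    hacked += learning_rate * target_gradient(hacked, target)
    cost = predict(hacked)[target]
    steps += 1

print("target class probability: {:.4f} after {} steps".format(cost, steps))
```

In the exercise itself, the hand-written gradient is replaced by the gradient that Keras computes through the network by backpropagation.
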
In [ ]:
# Get the reference to the first and last layer of the model
model_input_layer = model.layers[0].input
model_output_layer = model.layers[-1].output

# Try turning the panda into a tennis ball (https://gist.github.com/ageitgey/4e1342c10a71981d0b491e1b8227328b)
object_type_to_fake = 852 # tennis_ball

# Pre-calculate the maximum allowed change to the image. Larger allowed changes produce results faster, but make the hacked image look fake.
max_change_above = input_image + 0.01
max_change_below = input_image - 0.01

# Create a copy of the input image to modify
hacked_image = np.copy(input_image)

# How much to update the hacked image in each iteration
learning_rate = 0.1

# Define the cost function, i.e. the predicted likelihood of the target class for the first image in the batch
cost_function = model_output_layer[0, object_type_to_fake]

# Use Keras to calculate the gradient of the cost function with respect to the input image
gradient_function = K.gradients(cost_function, model_input_layer)[0]

# Create a Keras function that returns the cost and the gradient for a given input image
grab_cost_and_gradients_from_model = K.function([model_input_layer, K.learning_phase()], [cost_function, gradient_function])

Implement the training algorithm. The goal is to fool the network into predicting the target class with at least 80% confidence. In each iteration, get the cost and gradients for the hacked image (pass 0 as the learning phase, i.e. inference mode) and move the hacked image one step closer towards the target class. Make sure that the image does not change too much. You can use `np.clip` for this.
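
As a hint for the clipping step, the following small sketch shows how `np.clip` can keep a modified array within a fixed distance of the original. The three values and the variable names `original`, `stepped` and `bounded` are invented for illustration. The second clip to [-1, 1] assumes that the model input should stay in the range used during preprocessing.

```python
import numpy as np

original = np.array([0.20, -0.50, 0.995])
max_change = 0.01  # same bound as used for max_change_above / max_change_below

# Hypothetical result of one unconstrained gradient step
stepped = np.array([0.25, -0.505, 1.02])

# First keep every value within +/- max_change of the original...
bounded = np.clip(stepped, original - max_change, original + max_change)
# ...then keep it inside the assumed valid input range [-1, 1]
bounded = np.clip(bounded, -1.0, 1.0)

print(bounded)
```

The first value is pulled back to the upper bound, the second is already within bounds and stays as it is, and the third is limited by the valid input range.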

In [ ]:
### Implement the training algorithm here
In [ ]:
# Undo the [-1, 1] scaling and save the hacked image
img = hacked_image[0]
img /= 2.
img += 0.5
img *= 255.

im = image.array_to_img(img)
im.save("./data/my-tennis.png")
In [ ]:
# Load the hacked image file and convert it to a numpy array
# (the cell above saves to "./data/my-tennis.png"; point the path there to check your own result)
img = image.load_img("./data/tennis.png", target_size=(299, 299))
input_image = image.img_to_array(img)

# Scale the image so all pixel intensities are between [-1, 1] as the model expects
input_image /= 255.
input_image -= 0.5
input_image *= 2.

# Add a 4th dimension for batch size (as Keras expects)
input_image = np.expand_dims(input_image, axis=0)

# Show the image
img
In [ ]:
# Run the image through the neural network
predictions = model.predict(input_image)

# Convert the predictions into text and print them
predicted_classes = inception_v3.decode_predictions(predictions, top=1)
imagenet_id, name, confidence = predicted_classes[0][0]
print("This is a {} with {:.4}% confidence!".format(name, confidence * 100))