Some Reinforcement Learning: The Greedy and Explore-Exploit Algorithms for the Multi-Armed Bandit Framework in Python

In this article, the multi-armed bandit problem and a few algorithms for solving it are going to be discussed. This problem appeared as a lab assignment in the edX course DAT257x: Reinforcement Learning Explained by Microsoft. The problem description is taken from the assignment itself.

The Problem Statement and Some Theory

Given a set of actions with unknown reward distributions, the goal is to maximize the cumulative reward by taking the actions sequentially, one action at each time step, and obtaining a reward immediately. This is the traditional explore-exploit problem in reinforcement learning: in order to find the optimal action, one needs to explore all the actions, but not too much; at the same time, one needs to exploit the best action found so far by exploring.

The following figure defines the problem mathematically and shows the exploration-exploitation dilemma in the general setting of reinforcement learning problems with sequential decisions under incomplete information.

f0
The following figure shows a motivating application of the multi-armed bandit problem in drug discovery. Given a set of experimental drugs (each of which can be considered an arm in the bandit framework) to be applied to a set of patients sequentially, with a reward of 1 if a patient survives after the application of the drug and 0 otherwise, the goal is to save as many patients as we can.

f4.png
The following figures show the naive and a few variants of the greedy algorithms for maximizing the cumulative rewards.

  • The Naive Round-Robin algorithm simply chooses every action once to complete a round and then repeats the rounds. Obviously it’s far from optimal, since it explores too much and exploits too little.
  • The greedy algorithm tries to choose the arm with the maximum average reward so far, with the drawback that it may lock on to a sub-optimal action forever.
  • The epsilon-greedy and optimistic-greedy algorithms are variants of the greedy algorithm that try to recover from this drawback. Epsilon-greedy chooses an action uniformly at random with probability epsilon, whereas
    the optimistic-greedy algorithm initializes the estimated reward for each action to a high value, in order to prevent locking on to a sub-optimal action.

f1

The following code, from the github repository of the same course, shows how the basic bandit framework can be defined:

import numpy as np
import sys

### Interface
class Environment(object):

    def reset(self):
        raise NotImplementedError('Inheriting classes must override reset.')

    def actions(self):
        raise NotImplementedError('Inheriting classes must override actions.')

    def step(self):
        raise NotImplementedError('Inheriting classes must override step.')

class ActionSpace(object):

    def __init__(self, actions):
        self.actions = actions
        self.n = len(actions)

### BanditEnv Environment

class BanditEnv(Environment):

    def __init__(self, num_actions=10, distribution="bernoulli", evaluation_seed=387):
        super(BanditEnv, self).__init__()

        self.action_space = ActionSpace(range(num_actions))
        self.distribution = distribution

        np.random.seed(evaluation_seed)

        self.reward_parameters = None
        if distribution == "bernoulli":
            self.reward_parameters = np.random.rand(num_actions)
        elif distribution == "normal":
            self.reward_parameters = (np.random.randn(num_actions),
                                      np.random.rand(num_actions))
        elif distribution == "heavy-tail":
            self.reward_parameters = np.random.rand(num_actions)
        else:
            print("Please use a supported reward distribution")
            sys.exit(0)

        if distribution != "normal":
            self.optimal_arm = np.argmax(self.reward_parameters)
        else:
            self.optimal_arm = np.argmax(self.reward_parameters[0])

    def reset(self):
        self.is_reset = True
        return None

    def compute_gap(self, action):
        if self.distribution != "normal":
            gap = np.absolute(self.reward_parameters[self.optimal_arm] -
                              self.reward_parameters[action])
        else:
            gap = np.absolute(self.reward_parameters[0][self.optimal_arm] -
                              self.reward_parameters[0][action])
        return gap

    def step(self, action):
        self.is_reset = False

        valid_action = True
        if action is None or action < 0 or action >= self.action_space.n:
            print("Algorithm chose an invalid action; reset reward to -inf")
            reward = float("-inf")
            gap = float("inf")
            valid_action = False

        if self.distribution == "bernoulli":
            if valid_action:
                reward = np.random.binomial(1, self.reward_parameters[action])
                gap = (self.reward_parameters[self.optimal_arm] -
                       self.reward_parameters[action])
        elif self.distribution == "normal":
            if valid_action:
                reward = (self.reward_parameters[0][action] +
                          self.reward_parameters[1][action] * np.random.randn())
                gap = (self.reward_parameters[0][self.optimal_arm] -
                       self.reward_parameters[0][action])
        elif self.distribution == "heavy-tail":
            if valid_action:
                reward = self.reward_parameters[action] + np.random.standard_cauchy()
                gap = (self.reward_parameters[self.optimal_arm] -
                       self.reward_parameters[action])  # HACK to compute expected gap
        else:
            print("Please use a supported reward distribution")
            sys.exit(0)

        return (None, reward, self.is_reset, '')

# Policy interface
class Policy:
    # num_actions: (int) Number of arms [indexed by 0 ... num_actions-1]
    def __init__(self, num_actions):
        self.num_actions = num_actions

    def act(self):
        pass

    def feedback(self, action, reward):
        pass

Now, in order to implement an algorithm we just need to extend (inherit from) the Policy base class (interface) and implement the functions act() and feedback() for that algorithm (policy).
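For example, a minimal ε-greedy policy built on top of this interface might look like the following sketch (the class name and the running-average bookkeeping are my own choices, not part of the course code):

import numpy as np

class EpsilonGreedy(Policy):
    def __init__(self, num_actions, epsilon=0.1):
        Policy.__init__(self, num_actions)
        self.epsilon = epsilon
        self.counts = np.zeros(num_actions)            # number of pulls per arm
        self.value_estimates = np.zeros(num_actions)   # running average reward per arm

    def act(self):
        # explore uniformly at random with probability epsilon, otherwise exploit the best estimate so far
        if np.random.rand() < self.epsilon:
            return np.random.randint(self.num_actions)
        return int(np.argmax(self.value_estimates))

    def feedback(self, action, reward):
        # incremental update of the sample-mean reward of the chosen arm
        self.counts[action] += 1
        self.value_estimates[action] += (reward - self.value_estimates[action]) / self.counts[action]

With epsilon = 0 this reduces to the plain greedy policy, and initializing value_estimates to a large constant R instead of 0 gives the optimistic-greedy variant.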

In order to theoretically analyze the greedy algorithms and find algorithms with better performance guarantees, let’s define the regret as the gap between the total expected reward obtained by the optimal policy (assuming the reward distributions are known) and the cumulative reward obtained by the actions chosen by the algorithm, as shown in the following figure. Hence, maximizing the cumulative reward is equivalent to minimizing the regret.

Given that greedy exploits too much and epsilon-greedy explores too much, it can be shown that all these greedy variants have regret that grows linearly in the number of timesteps T.

Also, it was proven by Lai and Robbins that the lower bound on the regret is logarithmic in the number of timesteps T.
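In symbols (this is the standard definition), if μ_i denotes the unknown mean reward of arm i, μ* = max_i μ_i, and a_t is the arm chosen at timestep t, the expected regret after T timesteps is

\rho(T) \;=\; T\,\mu^{*} \;-\; \sum_{t=1}^{T} \mathbb{E}\left[\mu_{a_t}\right],

so a linear-regret algorithm keeps paying a constant per-step gap, while a logarithmic-regret algorithm chooses a sub-optimal arm only O(log T) times.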

 

f2


  • The next figure shows the reward distributions for a 5-armed bandit framework.
  • If we take an action A_i, with probability p_i we shall get a reward of 1 and with probability 1-p_i we shall get a reward of 0.
  • Hence, each arm A_i has a reward distributed as a Bernoulli random variable with some parameter p_i, i.e., R_i ~ B(p_i), i=0..4.
  • We first generate the parameters of the distribution corresponding to each arm randomly.
  • As we can see, arm 2 has the highest p_i, so if we choose arm 2, we have the highest probability of getting a reward of 1 at any timestep.
  • Obviously, arm 2 is the arm/action with the most promising reward, and an optimal policy should choose that arm at all timesteps.

 

gr

Now, let’s assume that we don’t know p_i values and we use the Greedy and the Naive Round-Robin algorithms to maximize the cumulative rewards over 10000 timesteps.

As can be seen from the next animations and figures

  1. The Greedy algorithm locks on to arm/action 0 and never finds the optimal action.
  2. The Round-Robin algorithm chooses the actions uniformly and also never finds the optimal action.
  3. Both Greedy and Round-Robin have linear regret (w.r.t. timesteps).
  4. In this case the greedy algorithm, even while stuck on a sub-optimal action, still performs relatively better than round-robin.

Greedy

 

Round-Robin



 

  • The next figure shows the reward distributions for a 10-armed bandit framework.
  • If we take an action A_i, with probability p_i we shall get a reward of 1 and with probability 1-p_i we shall get a reward of 0.
  • Hence, each arm A_i has a reward distributed as a Bernoulli random variable with some parameter p_i, i.e., R_i ~ B(p_i), i=0..9.
  • We first generate the parameters of the distribution corresponding to each arm randomly.
  • As we can see, arm 6 has the highest p_i, so if we choose arm 6, we have the highest probability of getting a reward of 1 at any timestep.
  • Obviously, arm 6 is the arm/action with the most promising reward, and an optimal policy should choose that arm at all timesteps.

 

egr.png

Again, let’s assume that we don’t know p_i values and we use the Epsilon-Greedy (with different values of the hyper-parameter ε) and the Optimistic-Greedy (with different values of the hyper-parameter R) algorithms to maximize the cumulative rewards over 10000 timesteps.

As can be seen from the next animations and figures

  1. The Epsilon-Greedy algorithm behaves exactly like Greedy when ε = 0 and behaves purely randomly when ε = 1. For both of these cases, the ε-greedy algorithm has linear regret.
  2. ε = 0.1 and ε = 0.15 find the optimal arm 6 eventually and have sub-linear regret.
  3. The Optimistic-Greedy algorithm behaves exactly like Greedy when R = 0 and behaves randomly when R = 10000. For both of these cases, the optimistic-greedy algorithm has linear regret.
  4. R = 3 and R = 5 find the optimal arm 6 eventually and have sub-linear regret (w.r.t. timesteps).

Epsilon-Greedy

ε = 0

og_0

egcrg_1

ε = 0.05

eg_.05

egcrg_.05

ε = 0.1

eg_.10

egcrg_.1

ε = 0.15

eg_.15

egcrg_.15

ε = 1.0

eg_1

egcrg_1

 

Optimistic Greedy

ogcr

R = 0

og_0

ogcrg0

R = 1

og_1

ogcrg1.png

 

R = 3

og_3.gif

ogcrg3.png

R = 5

og_5.gif

ogcrg5.png

R = 10000

og_10000.gif

ogcrg10000.png

The next figure shows two algorithms (UCB and Bayesian Thompson-Beta Posterior Sampling) that achieve logarithmic regret.

f3


  • The next figure shows the reward distributions for a 10-armed bandit framework.
  • If we take an action A_i, with probability p_i we shall get a reward of 1 and with probability 1-p_i we shall get a reward of 0.
  • Hence, each arm A_i has a reward distributed as a Bernoulli random variable with some parameter p_i, i.e., R_i ~ B(p_i), i=0..9.
  • We first generate the parameters of the distribution corresponding to each arm randomly.
  • As we can see, arm 4 has the highest p_i, so if we choose arm 4, we have the highest probability of getting a reward of 1 at any timestep.
  • Obviously, arm 4 is the arm/action with the most promising reward, and an optimal policy should choose that arm at all timesteps.

tr

Again, let’s assume that we don’t know p_i values, we  implement and use the UCB1 and Thompson-Beta algorithms to maximize the cumulative rewards over 10000 timesteps.

As can be seen from the next animations and figures

  1. Both algorithms find the optimal arm 4 pretty quickly, without much exploration.
  2. Both algorithms achieve logarithmic regret when the (unknown) reward distribution is Bernoulli.
  3. For posterior sampling with Thompson-Beta, each arm’s reward probability is sampled from its Beta posterior (the conjugate prior for the unknown Bernoulli-distributed rewards), starting from the non-informative flat prior.
  4. The posterior is R_i ~ Beta(a_i, b_i), where a_i and b_i are (one plus) the number of times a reward of 1 and a reward of 0, respectively, were obtained when arm A_i was drawn, as shown in the figure.
  5. Thompson sampling with the Beta posterior gives linear regret when the (unknown) reward distribution is normal.

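A minimal sketch of the two policies, written against the same Policy interface as above, is shown below (the class names and bookkeeping are mine; the exploration bonus sqrt(2 ln t / n_i) and the Beta(1, 1) flat prior are the standard textbook choices):

import numpy as np

class UCB1(Policy):
    def __init__(self, num_actions):
        Policy.__init__(self, num_actions)
        self.counts = np.zeros(num_actions)
        self.value_estimates = np.zeros(num_actions)
        self.t = 0

    def act(self):
        self.t += 1
        # pull each arm once first
        if np.any(self.counts == 0):
            return int(np.argmin(self.counts))
        # upper confidence bound: estimated mean + exploration bonus
        ucb = self.value_estimates + np.sqrt(2 * np.log(self.t) / self.counts)
        return int(np.argmax(ucb))

    def feedback(self, action, reward):
        self.counts[action] += 1
        self.value_estimates[action] += (reward - self.value_estimates[action]) / self.counts[action]

class ThompsonBeta(Policy):
    def __init__(self, num_actions):
        Policy.__init__(self, num_actions)
        self.successes = np.ones(num_actions)   # Beta prior parameter a = 1 (flat prior)
        self.failures = np.ones(num_actions)    # Beta prior parameter b = 1

    def act(self):
        # sample a plausible mean reward for each arm from its Beta posterior, act greedily on the samples
        samples = np.random.beta(self.successes, self.failures)
        return int(np.argmax(samples))

    def feedback(self, action, reward):
        if reward > 0:
            self.successes[action] += 1
        else:
            self.failures[action] += 1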
UCB

ucb.gif

ucr

ucrg

Thompson Beta

posterior_thomsonbeta000000

posterior_thomsonbeta010000

tb

tbpost

tcr

tcrg

Thompson Normal

tcrgN

tcrN

Graph-Based Image Segmentation in Python

In this article, an implementation of an efficient graph-based image segmentation technique will be described. The algorithm was proposed by Felzenszwalb et al. from MIT. The slides on this paper can be found at the Stanford Vision Lab. The algorithm is closely related to Kruskal’s algorithm for constructing a minimum spanning tree of a graph, as stated by the authors, and hence can be implemented to run in O(m log m) time, where m is the number of edges in the graph.


Problem Definition and the basic idea (from the paper)



  • Let G = (V, E) be an undirected graph with vertices vi ∈ V, the set of elements to be segmented, and edges (vi, vj ) ∈ E corresponding to pairs of neighboring vertices. Each edge (vi, vj ) ∈ E has a corresponding weight w((vi, vj)), which is a non-negative measure of the dissimilarity between neighboring elements vi and vj.

  • In the case of image segmentation, the elements in V are pixels and the weight of an edge is some measure of the dissimilarity between the two pixels connected by that edge (e.g., the difference in intensity, color, motion, location or some other local attribute).


  • Particularly for the implementation described here, an edge weight function based on the absolute intensity difference (in the YIQ space) between the pixels connected by an edge is used: w((vi, vj)) = |I(pi) − I(pj)|.


  • In the graph-based approach, a segmentation S is a partition of V into components
    such that each component (or region) C ∈ S corresponds to a connected component
    in a graph G′ = (V, E′), where E′ ⊆ E.


  • In other words, any segmentation is induced by a subset of the edges in E. There are different ways to measure the quality of a segmentation but in general we want the elements in a component to be similar, and elements in different components to be dissimilar.


  • This means that edges between two vertices in the same component should have relatively low weights, and edges between vertices in different components should have higher weights.



  • The next figure shows the steps in the algorithm. The algorithm is very similar to Kruskal’s algorithm for computing the MST for an undirected graph.


f17.png


  • The threshold function τ controls the degree to which the difference between two
    components must be greater than their internal differences in order for there to be
    evidence of a boundary between them.


  • For small components, Int(C) is not a good estimate of the local characteristics of the data. In the extreme case, when |C| = 1, Int(C) = 0. Therefore, a threshold function based on the size of the component, τ(C) = k/|C|, needs to be used, where |C| denotes the size of C and k is some constant parameter.


  • That is, for small components we require stronger evidence for a boundary. In practice k sets a scale of observation, in that a larger k causes a preference for larger components.


  • In general, a Gaussian filter is used to smooth the image slightly before computing the edge weights, in order to compensate for digitization artifacts. We always use a Gaussian with σ = 0.8, which does not produce any visible change to the image but helps remove artifacts.


  • The following python code shows how to create the graph.


import numpy as np
from scipy import signal
import matplotlib.image as mpimg
from skimage.color import rgb2yiq   # one way to get an RGB -> YIQ conversion

def ij2id(i, j, w, h):
    # linearize the pixel coordinates (i, j) of a w x h image into a single node id
    return i * h + j

def gaussian_kernel(k, s=0.5):
    # generate a (2k+1)x(2k+1) gaussian kernel with mean=0 and sigma = s
    probs = [np.exp(-z * z / (2 * s * s)) / np.sqrt(2 * np.pi * s * s) for z in range(-k, k + 1)]
    return np.outer(probs, probs)

def create_graph(imfile, k=1., sigma=0.8, sz=1):
    # create the pixel graph with edge weights as dissimilarities
    rgb = mpimg.imread(imfile)[:, :, :3]
    gauss_kernel = gaussian_kernel(sz, sigma)
    for i in range(3):
        rgb[:, :, i] = signal.convolve2d(rgb[:, :, i], gauss_kernel, boundary='symm', mode='same')
    yuv = rgb2yiq(rgb)
    (w, h) = yuv.shape[:2]
    edges = {}
    for i in range(yuv.shape[0]):
        for j in range(yuv.shape[1]):
            # compute edge weights between the node (i, j) and its neighboring pixel nodes
            for i1 in range(i - 1, i + 2):
                for j1 in range(j - 1, j + 2):
                    if i1 == i and j1 == j:
                        continue
                    if i1 >= 0 and i1 < w and j1 >= 0 and j1 < h:
                        wt = np.abs(yuv[i, j, 0] - yuv[i1, j1, 0])
                        n1, n2 = ij2id(i, j, w, h), ij2id(i1, j1, w, h)
                        edges[n1, n2] = edges[n2, n1] = wt
    return edges
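The segmentation itself then follows the Kruskal-like procedure from the algorithm figure above: process the edges in order of increasing weight and merge the two components joined by an edge whenever the edge weight does not exceed both components' internal differences plus the τ(C) = k/|C| term. The following union-find based sketch is only an illustrative outline under those definitions (the DisjointSet helper and the function names are mine, not the exact code used for the results below):

class DisjointSet:
    # union-find with per-component size and internal difference Int(C)
    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n
        self.int_diff = [0.0] * n   # largest edge weight in the component's MST so far

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]   # path halving
            x = self.parent[x]
        return x

    def union(self, x, y, w):
        x, y = self.find(x), self.find(y)
        if self.size[x] < self.size[y]:
            x, y = y, x
        self.parent[y] = x
        self.size[x] += self.size[y]
        self.int_diff[x] = max(self.int_diff[x], self.int_diff[y], w)

def segment(edges, num_nodes, k=0.005):
    # edges: {(n1, n2): weight} as produced by create_graph above
    ds = DisjointSet(num_nodes)
    for (n1, n2), w in sorted(edges.items(), key=lambda e: e[1]):
        c1, c2 = ds.find(n1), ds.find(n2)
        if c1 == c2:
            continue
        # merge only if the connecting edge is no larger than both internal differences plus k/|C|
        if w <= min(ds.int_diff[c1] + k / ds.size[c1],
                    ds.int_diff[c2] + k / ds.size[c2]):
            ds.union(c1, c2, w)
    return ds   # the component (region) of pixel node p is ds.find(p)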

 


Some Results


  • The images are taken from the paper itself or from the internet. The following figures and animations show the result of segmentation as a result of iterative merging of the components (by choosing least weight edges), depending on the internal difference of the components.

  • Although in the paper the author described the best value of the parameter k to be around 300, in this implementation the pixel RGB values are normalized (to values between 0 and 1) and then converted to YIQ, and the YIQ intensities are used for computing the weights (which are typically very small), so the value of k that works best in this scenario is 0.001-0.01.

  • As we can see from the results below, the higher the value of the parameter k, the larger the size of the final components and the smaller the number of components in the result.

  • The minimum spanning tree creation is also shown; the red edges in the figures are the edges chosen by the algorithm to merge the components.


Input Image



            player.png

Output Images for two different values of the parameter k

player_k_0.001.gif

player_k_0.01.gif

out_ 0.001player.png

out_ 0.010player.png


The forest created after a few iterations


mst1


Input Image


hill

Output Images for two different values of the parameter k

hill_k_0.01

out_ 0.010hill.png

hill_k_0.001

out_ 0.001hill.png


The forest created after a few iterations


mst3.png


Input Image


parrot

Output Segmented Images

parrot_k_0.001out_ 0.001parrot_Dark2.pngout_ 0.001parrot_hot.png



Input Image


road2

Segmented Output Images

road2_k_0.01
out_ 0.001road2_Set1.png



Input Image


road


Output Images for two different values of the parameter k


road_k_0.001

out_ 0.001road.png
road_k_0.01out_ 0.010road.png


The forest created after a few iterations


mst2.png

 


  Input Image (UMBC)


             umbc.png


Segmented Output Images


umbc_k_0.001out_ 0.001umbc.png


    Input Image


nj.png



Segmented Output Image with k=0.001



out_ 0.001nj.png


    Input Image


hand


Segmented Output Image with k=0.001


ring_k_0.001



    Input Image


flowers2

Segmented Output Image with k=0.001

out_ 0.001flowers2.png


    Input Image (liver)


liver

Segmented Output Images 

out_ 0.001liver.pngout_ 0.010liver.png



    Input Image


building

Segmented Output Images with different values of k

building_k_0.01

out_ 0.001building.png out_ 0.010building.png


    Input Image


frame056

Segmented Output Image

out_ 0.001frame056.png


    Input Image


vic

Segmented Output Images for different k

vic


 


Implementing Lucas-Kanade Optical Flow algorithm in Python

In this article an implementation of the Lucas-Kanade optical flow algorithm is going to be described. This problem appeared as an assignment in  a computer vision course from UCSD. The inputs will be sequences of images (subsequent frames from a video) and the algorithm will output an optical flow field (u, v) and trace the motion of the moving objects. The problem description is taken from the assignment itself.

 


Problem Statement



Single-Scale Optical Flow

  • Let’s implement the single-scale Lucas-Kanade optical flow algorithm. This involves finding the motion (u, v) that minimizes the sum-squared error of the brightness constancy equations for each pixel in a window.  The algorithm will be implemented as a function with the following inputs:

     def optical_flow(I1, I2, window_size, tau) # returns (u, v)

  • Here, u and v are the x and y components of the optical flow, I1 and I2 are two images taken at times t = 1 and t = 2 respectively, and window_size is a 1 × 2 vector storing the width and height of the window used during flow computation.
  • In addition to these inputs, a threshold τ should be added, such that if τ is larger than the smallest eigenvalue of A’A, then the optical flow at that position should not be computed. Recall that the optical flow is only valid in regions where

f18.png
has rank 2, which is what the threshold is checking. A typical value for τ is 0.01.

  • We should experiment with different window sizes and find out the tradeoffs associated with using a small vs. a large window size.
  • The following figure describes the algorithm, which considers an n×n (n ≥ 3) window around each pixel and solves a least-squares problem to find the best flow vector for that pixel.

f19.png
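Written out for a window containing the pixels q_1, …, q_n, the stacked brightness-constancy equations give the standard Lucas-Kanade least-squares system

A = \begin{bmatrix} I_x(q_1) & I_y(q_1) \\ \vdots & \vdots \\ I_x(q_n) & I_y(q_n) \end{bmatrix}, \qquad
b = -\begin{bmatrix} I_t(q_1) \\ \vdots \\ I_t(q_n) \end{bmatrix}, \qquad
\begin{bmatrix} u \\ v \end{bmatrix} = (A^{T} A)^{-1} A^{T} b,

which is well-posed only where A^T A has rank 2, i.e., where both of its eigenvalues are above the threshold τ.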

  • The following code-snippet shows how the algorithm is implemented in python for a gray-level image.
import numpy as np
from scipy import signal

def optical_flow(I1g, I2g, window_size, tau=1e-2):

    kernel_x = np.array([[-1., 1.], [-1., 1.]])
    kernel_y = np.array([[-1., -1.], [1., 1.]])
    kernel_t = np.array([[1., 1.], [1., 1.]])  # *.25
    w = int(window_size / 2)  # window_size is odd, all the pixels with offset in between [-w, w] are inside the window
    I1g = I1g / 255.  # normalize pixels
    I2g = I2g / 255.  # normalize pixels
    # Implement Lucas-Kanade
    # for each point, calculate I_x, I_y, I_t
    mode = 'same'
    fx = signal.convolve2d(I1g, kernel_x, boundary='symm', mode=mode)
    fy = signal.convolve2d(I1g, kernel_y, boundary='symm', mode=mode)
    ft = (signal.convolve2d(I2g, kernel_t, boundary='symm', mode=mode) +
          signal.convolve2d(I1g, -kernel_t, boundary='symm', mode=mode))
    u = np.zeros(I1g.shape)
    v = np.zeros(I1g.shape)
    # within window window_size * window_size
    for i in range(w, I1g.shape[0] - w):
        for j in range(w, I1g.shape[1] - w):
            Ix = fx[i-w:i+w+1, j-w:j+w+1].flatten()
            Iy = fy[i-w:i+w+1, j-w:j+w+1].flatten()
            It = ft[i-w:i+w+1, j-w:j+w+1].flatten()
            # one way to fill in the placeholders from the assignment:
            b = -It                      # b from the brightness constancy equations
            A = np.vstack((Ix, Iy)).T    # A: spatial gradients stacked over the window
            # skip the pixel if the smallest eigenvalue of A'A is below the threshold tau
            if np.min(np.linalg.eigvalsh(A.T @ A)) < tau:
                continue
            nu = np.linalg.pinv(A) @ b   # least-squares solution for the flow at this pixel
            u[i, j] = nu[0]
            v[i, j] = nu[1]

    return (u, v)

 


Some Results


  • The following figures and animations show the results of the algorithm on a few image sequences. Some of these input image sequences / videos are from the course and some are collected from the internet.
  • As can be seen, the algorithm performs best if the motion of the moving object(s) between consecutive frames is slow. If, on the contrary, the motion is large, the algorithm fails, and we should then use the multiple-scale (image pyramid) version of Lucas-Kanade.
  • Finally, with a small window size the algorithm captures subtle motions but misses large motions; with a large window size it is the other way around.


Input Sequences

sphere

shpere_cmap_15

Output Optical Flow with different window sizes

window size = 15

shpere_opt_15

window size = 21

shpere_opt_21

 



Input Sequences
rubic

Output Optical Flow
rubic_opt

rubic_cmap



Input Sequences (hamburg taxi)
taxi

taxi_cmap

Output Optical Flow
taxi_opt


Input Sequences
box

box_cmap

Output Optical Flow
box_opt


Input Sequences
seq

seq_cmap

Output Optical Flow
seq_opt


Input Sequences    fount3.gif

fount_cmap

Output Optical Flow
fount_opt


Input Sequences
corridor

Output Optical Flow
corridor_optc


Input Sequences
synth

synth_cmap

Output Optical Flow
synth_opt


Input Sequences
cars1

Output Optical Flow
cars1_opt
cars1_cmap


Input Sequences
cars2

Output Optical Flow
cars2_opt

Output Optical Flow
cars2_opt2
cars2_cmap



Input Sequences

carsh.gif

cars3_cmap

Output Optical Flow with window size 45
cars3_opt.gif

Output Optical Flow with window size 10

cars3_opt2_10
Output Optical Flow with window size 25
cars3_opt2_25
Output Optical Flow with window size 45
cars3_opt2_45


 


Image Colorization Using Optimization in Python

This article is inspired by this SIGGRAPH paper by Levin et al., for which they took this patent; the paper was referred to in the course CS1114 from Cornell. This method is also discussed in the Coursera online image processing course by Northwestern University. Some part of the problem description is taken from the paper itself. Also, one can refer to the implementation provided by the authors in matlab, the following link and the following python implementation in github.

 

The Problem

Colorization is a computer-assisted process of adding color to a monochrome image or movie. In the paper the authors presented an optimization-based colorization method that is based on a simple premise: neighboring pixels in space-time that have similar intensities should have similar colors.

This premise is formulated using a quadratic cost function  and as an optimization problem. In this approach an artist only needs to annotate the image with a few color scribbles, and the indicated colors are automatically propagated in both space and time to produce a fully colorized image or sequence.

In this article the optimization problem formulation and the way to solve it to obtain the automatically colored image will be described for the images only.

 

The Algorithm

The YUV/YIQ color space is chosen to work in; this is commonly used in video, where Y is the monochromatic luminance channel, which we will refer to simply as intensity, while U and V are the chrominance channels, encoding the color.

The algorithm is given as input an intensity volume Y(x,y,t) and outputs two color volumes U(x,y,t) and V(x,y,t). To simplify notation, boldface letters (e.g. r, s) are used to denote (x, y, t) triplets. Thus, Y(r) is the intensity of a particular pixel.

Now, one needs to impose the constraint that two neighboring pixels r, s should have similar colors if their intensities are similar. Thus, the problem is formulated as the following optimization problem, which aims to minimize the difference between the color U(r) at pixel r and the weighted average of the colors at neighboring pixels, where w(r,s) is a weighting function that sums to one, large when Y(r) is similar to Y(s), and small when the two intensities are different.

f12
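In the notation of the paper, the cost that is minimized (its equation 1) and the Gaussian weighting function suggested by the authors are

J(U) = \sum_{\mathbf{r}} \Big( U(\mathbf{r}) - \sum_{\mathbf{s} \in N(\mathbf{r})} w_{\mathbf{r}\mathbf{s}} \, U(\mathbf{s}) \Big)^{2}, \qquad
w_{\mathbf{r}\mathbf{s}} \propto \exp\!\Big( -\frac{(Y(\mathbf{r}) - Y(\mathbf{s}))^{2}}{2\sigma_{\mathbf{r}}^{2}} \Big),

where σ_r is computed from the intensities in a window around r; the same cost is minimized independently for the V channel.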

When the intensity is constant the color should be constant, and when the intensity is an edge the color should also be an edge. Since the cost functions are quadratic and
the constraints are linear, this optimization problem yields a large, sparse system of linear equations, which may be solved using a number of standard methods.

As discussed in the paper, this algorithm is closely related to algorithms proposed for other tasks in image processing. In image segmentation algorithms based on normalized cuts [Shi and Malik 1997], one attempts to find the second smallest eigenvector of the matrix D − W, where W is an n_pixels × n_pixels matrix whose elements are the pairwise affinities between pixels (i.e., the (r,s) entry of the matrix is w(r,s)) and D is a diagonal matrix whose diagonal elements are the sum of the affinities (in this case it is always 1). The second smallest eigenvector of any symmetric matrix A is a unit norm vector x that minimizes x^{T}Ax and is orthogonal to the first eigenvector. By direct inspection, the quadratic form minimized by normalized cuts is exactly the cost function J, that is, x^{T}(D−W)x = J(x). Thus, this algorithm minimizes the same cost function but under different constraints. In image denoising algorithms based on anisotropic diffusion [Perona and Malik 1989; Tang et al. 2001] one often minimizes a function similar to equation 1, but the function is applied to the image intensity as well.

The following figures show an original gray-scale image and the marked image with color-scribbles that are going to be used to compute the output colored image.

Original Gray-scale Image Input
baby

Gray-scale image Input Marked with Color-Scribbles 
baby_marked

Implementation of the Algorithm

Here are the the steps for the algorithm:

  1. Convert both the original gray-scale image and the marked image (marked with color scribbles for a few pixels by the artist) to from RGB to YUV / YIQ color space.
  2. Compute the difference image from the marked image and the gray-scale image. The pixels that differ are going to be pinned: they will appear unchanged in the output and will be used directly as stand-alone constraints in the minimization problem.
  3. We need to compute the weight matrix W that depends on the similarities in the neighbor intensities for a pixel from the original gray-scale image.
  4. The optimization problem finally boils down to solving the system of linear equations of the form WU = b that has a closed form least-square solution
    U = W^{+}b = {(W^{T}W)}^{-1}W^{T}b. Same thing is to be repeated for the V channel too.
  5. However, the W matrix is going to be very large and sparse, hence sparse-matrix based implementations must be used to obtain an efficient solution. In this python implementation in github, the scipy sparse lil_matrix was used when constructing the sparse matrices, which is quite slow; we can construct the more efficient scipy csc_matrix right away, by using a dictionary to store the weights initially, which is much faster. The python code in the next figure shows my implementation for computing the weight matrix W.
  6. Once W is computed it’s just a matter of obtaining the least-square solution, by computing the pseudo-inverse, which can be more efficiently computed with LU factorization and a sparse LU solver, as in this  python implementation in github.
  7. Once the solution of the optimization problem is obtained in YUV / YIQ space, it needs to be converted back to RGB. The following formula is used for conversion.rgbyiq.png
import numpy as np
import scipy.sparse as sp
from collections import defaultdict

def compute_W(Y, colored):
    # Y: the intensity (luminance) channel; colored: the scribbled pixel coordinates
    (w, h) = Y.shape
    W = defaultdict(float)
    for i in range(w):
        for j in range(h):
            if not (i, j) in colored:  # pixel (i, j) is not in a color scribble
                (N, sigma) = get_nbrs(Y, i, j, w, h)  # neighbors of r = (i, j) and local std-dev (helper)
                Z = 0.
                id1 = ij2id(i, j, w, h)  # linearized node id (helper)
                for (i1, j1) in N:  # s = (i1, j1)
                    id2 = ij2id(i1, j1, w, h)
                    W[id1, id2] = np.exp(-(Y[i, j] - Y[i1, j1])**2 / (sigma**2)) if sigma > 0 else 0.
                    Z += W[id1, id2]
                if Z > 0:
                    for (i1, j1) in N:  # normalize the weights; off-diagonal entries of W are -w(r,s)
                        id2 = ij2id(i1, j1, w, h)
                        W[id1, id2] /= -Z
    for i in range(w):
        for j in range(h):
            id = ij2id(i, j, w, h)
            W[id, id] = 1.

    rows, cols = zip(*(W.keys()))
    data = list(W.values())
    return sp.csc_matrix((data, (rows, cols)), shape=(w*h, w*h))  # the sparse weight matrix W
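Given W, step 6 above is just a sparse linear solve per chrominance channel. A minimal sketch using scipy's sparse LU factorization is given below (here colored is assumed to map each scribbled pixel (i, j) to its scribbled U or V value, and ij2id is the same linearization helper used in compute_W; both are assumptions about the surrounding helper code):

import numpy as np
from scipy.sparse.linalg import splu

def colorize_channel(W, colored, w, h):
    # right-hand side: pinned (scribbled) pixels keep their chrominance, all other entries are 0
    b = np.zeros(w * h)
    for (i, j), value in colored.items():
        b[ij2id(i, j, w, h)] = value
    # rows of W are: 1 on the diagonal, minus the normalized weights on the neighbors
    # (and only the diagonal for scribbled pixels), so solving W x = b propagates the scribbled colors
    lu = splu(W.tocsc())
    return lu.solve(b).reshape(w, h)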

 

Results

The following images and animations show the results obtained with the optimization algorithm. Most of the following images are taken from the paper itself.

Original Gray-scale Image Input                 Gray-scale image Input Marked
baby

baby_marked

Color image Output
baby_col.png


The next animations show how the incremental scribbling results in better and better color images.

Original Gray-scale Image Input 

monaco_gray

As can be seen from the following animation, the different parts of the building get colored as more and more color-hints are scribbled / annotated.

Gray-scale image Input Marked
house_marked

Color image Output

house_col


Original Gray-scale Image Input 

waterfall_gray

Gray-scale image Input Marked
water_marked

Color image Output
water_col


Original Gray-scale Image Input (me)
me3_gray

Gray-scale image Input Marked incrementally
me_marked

Color image Output
me_colored


Original Gray-scale Image Input
roses_gray

Gray-scale image Input Marked
roses_gray_marked
Color image Output
roses_col


Original Gray-scale Image Input

madurga_gray

Gray-scale image Input Marked

madurga_gray_marked

Color image Output

madurga_gray_col


 


Interactive Image Segmentation with Graph-Cut in Python

In this article, interactive image segmentation with graph-cut is going to be discussed, and it will be used to segment the source object from the background in an image. This segmentation technique was proposed by Boykov and Jolly in this paper.

Problem Statement: Interactive graph-cut segmentation

Let’s implement “intelligent paint” interactive segmentation tool using graph cuts algorithm on a weighted image grid. Our task will be to separate the foreground object from the background in an image.

Since it can sometimes be difficult to automatically define what’s foreground and what’s background for an image, the user is going to help us with a few interactive scribble lines, using which our algorithm will identify the foreground and the background; after that it will be the algorithm’s job to obtain a complete segmentation of the foreground from the background image.

The following figures show how an input image gets scribbling from a user with two different colors (which is also going to be input to our algorithm) and the ideal segmented image output.

Scribbled Input Image                                       Expected Segmented Output Image
bj

 

The Graph-Cut Algorithm

The following describes how the segmentation problem is transformed into a graph-cut problem:

  1. Let’s first define the Directed Graph G = (V, E) as follows:
    1. Each of the pixels in the image is going to be a vertex in the graph. There will be another couple of special terminal vertices: a source vertex (corresponds to the foreground object) and a sink vertex (corresponds to the background object in the image). Hence, |V(G)| = width x height + 2.
    2. Next, let’s defines the edges of the graph. As obvious, there is going to be two types of edges: terminal (edges that connect the terminal nodes to the non-terminal nodes) and non-terminal (edges that connect the non-terminal nodes only).
    3. There will be a directed edge from the terminal node source to each of non-terminal nodes in the graph. Similarly,  a directed edge will be there from each non-terminal node (pixel) to the other terminal node sink. These are going to be all the terminal edges and hence, |E_T(G)| =  2 x width x height.
    4. Each of the non-terminal nodes (pixels) is going to be connected by edges with the nodes corresponding to its neighboring pixels (defined by the 4 or 8 neighborhood of a pixel). Hence, |E_N(G)| = |Nbd| x width x height.
  2. Now let’s describe how to compute the edge weights in this graph.
    1. In order to compute the terminal edge weights, we need to estimate the feature distributions first, i.e., starting with the assumption that each of the nodes corresponding to the scribbled pixels have the probability 1.0 (since we want the solution to respect the regional hard constraints marked by the user-seeds / scribbles) to be in foreground or background object in the image (distinguished by the scribble color, e.g.), we have to compute the probability that a node belongs to the foreground (or background) for all the other non-terminal nodes.
    2. The simplest way to compute P_F and P_B is to first fit a couple of Gaussian distributions to the scribbles by computing the parameters (μ, ∑) with MLE from the scribbled pixel intensities, and then computing the (class-conditional) probabilities from the individual pdfs (followed by a normalization) for each of the pixels, as shown in the next figures. The following code fragment shows how the pdfs are computed.

      import numpy as np
      import matplotlib.image as mpimg
      from collections import defaultdict

      def compute_pdfs(imfile, imfile_scrib):
          '''
          # Compute foreground and background pdfs from the
          # input image and the image with user scribbles
          '''
          rgb = mpimg.imread(imfile)[:,:,:3]
          yuv = rgb2yiq(rgb)                                 # rgb2yiq: RGB -> YIQ conversion helper
          rgb_s = mpimg.imread(imfile_scrib)[:,:,:3]
          yuv_s = rgb2yiq(rgb_s)
          # find the scribble pixels
          scribbles = find_marked_locations(rgb, rgb_s)      # helper returning the (i, j) of scribbled pixels
          imageo = np.zeros(yuv.shape)
          # separately store background and foreground scribble pixels in the dictionary comps
          comps = defaultdict(lambda: np.array([]).reshape(0, 3))
          for (i, j) in scribbles:
              imageo[i,j,:] = rgb_s[i,j,:]
              # scribble color as key of comps
              comps[tuple(imageo[i,j,:])] = np.vstack([comps[tuple(imageo[i,j,:])], yuv[i,j,:]])
          mu, Sigma = {}, {}
          # compute MLE parameters for the Gaussians
          for c in comps:
              mu[c] = np.mean(comps[c], axis=0)
              Sigma[c] = np.cov(comps[c].T)
          return (mu, Sigma)

      3. In order to compute the non-terminal edge weights, we need to compute the similarities between a pixel node and the nodes corresponding to its neighboring pixels, e.g., with the formula shown in the next figures (e.g., how similar the neighborhood pixels are in RGB / YIQ space).

  3. Now that the underlying graph is defined, the segmentation of the foreground from the background image boils down to computing the min-cut in the graph or equivalently computing the max-flow (the dual problem) from the source to sink.
  4. The intuition is that the min-cut solution will keep the pixels with high probabilities to belong to the side of the source (foreground) node and likewise the background pixels on the other side of the cut near the sink (background) node, since it’s going to respect the (relatively) high-weight edges (by not going through the highly-similar pixels).
  5. There are several standard algorithms, e.g., Ford-Fulkerson (finding an augmenting path, with O(E·max|f|) time complexity) or Edmonds-Karp (using BFS to find the shortest augmenting path, with O(V·E²) time complexity), to solve the max-flow problem in time polynomial in V and E. Here we are going to use an implementation (with pymaxflow) based on Vladimir Kolmogorov’s algorithm, which is shown empirically to run faster on some images; a minimal sketch of how the graph can be set up with this library is shown right after this list.
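As an illustration, a minimal sketch of setting up the grid graph and cutting it with the PyMaxflow library might look like the following (prob_fg, prob_bg, pairwise_weights and lambda_ are hypothetical inputs standing for the per-pixel class probabilities, the neighborhood similarities and a smoothness weight described above; the -log capacities are one common convention, not necessarily the exact one used for the results below):

import numpy as np
import maxflow

def segment_maxflow(prob_fg, prob_bg, pairwise_weights, lambda_=1.0):
    # prob_fg / prob_bg: per-pixel foreground / background probabilities (H x W arrays)
    # pairwise_weights: per-pixel similarity to neighbors, used for the non-terminal edges
    g = maxflow.Graph[float]()
    nodes = g.add_grid_nodes(prob_fg.shape)
    # non-terminal (neighborhood) edges: high capacity between similar pixels
    g.add_grid_edges(nodes, weights=lambda_ * pairwise_weights, symmetric=True)
    # terminal edges: source (foreground) and sink (background) capacities from the fitted densities
    eps = 1e-10
    g.add_grid_tedges(nodes, -np.log(prob_bg + eps), -np.log(prob_fg + eps))
    g.maxflow()
    # boolean mask: True for the pixels that end up on the sink side of the min-cut
    return g.get_grid_segments(nodes)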

 

f13

f15f14

f16

Results

The following figures / animations show the interactive-segmentation results (computed probability densities, subset of the flow-graph & min-cut, final segmented image) on a few images,  some of them taken from the above-mentioned courses / videos, some of them taken from Berkeley Vision dataset.


Input Image
deer

Input Image with Scribbles
deer_scribed

Fitted Densities from Color Scribbles
gaussian_deer.png

maxflow_deer
deer_seg

A Tiny Sub-graph with Min-Cut

flow_deer.png


Input Image
eagle

Input Image with Scribbleseagle_scribed

Fitted Densities from Color Scribblesgaussian_eagle

maxflow_eagle

A Tiny Sub-graph with Min-Cut flow_eagle


Input Image (liver)                                             
liver

Input Image with Scribblesliver_scribed

maxflow_liver


Input Image
panda

Input Image with Scribblespanda_scribed

maxflow_panda


Input Image
bunny

Input Image with Scribblesbunny_scribed

Fitted Densities from Color Scribbles
gaussian_bunny

maxflow_bunny

seg_bunny

A Tiny Sub-graph of the flow-graph with Min-Cut
flow


Input Image
flowers

Input Image with Scribbles
flowers_scribedmaxflow_flowers

A Tiny Sub-graph of the flow-graph with Min-Cut

flow


Input Image                                                         Input Image with Scribbles
me3   me3_scribed

maxflow_me3


Input Imagerose

Input Image with Scribbles
rose_scribed

Fitted Densities from Color Scribbles
gaussian_rose

maxflow_rose



Input Image
whale

Input Image with Scribbleswhale_scribed

Fitted Densities from Color Scribblesgaussian_whale

A Tiny Sub-graph of the flow-graph with Min-Cut

flow_whale
maxflow_whale
bgc_whale


Input Image (UMBC Campus Map)umbc_campus

Input Image with Scribbles
umbc_campus_scribed

maxflow_umbc_campus


Input Imagehorses

Input Image with Scribbles
horses_scribed

maxflow_horses

A Tiny Sub-graph of the flow-graph with Min-Cut (with blue foreground nodes)

flow_horses


Changing the background of an image (obtained using graph-cut segmentation) with another image’s background with cut & paste

The following figures / animations show how the background of a given image can be replaced by a new background image using cut & paste (by copying the foreground pixels of the original image onto the new background), once the foreground in the original image has been identified by segmentation.

Original Input Image
horses

New Background
sky.png

bg_1.0000horses.png

bgc_horses


Some Computational Photography: Image Quilting (Texture Synthesis) with Dynamic Programming and Texture Transfer in Python

The following problems appeared as a programming assignment in the Computational Photography course (CS445) at UIUC. The description of the problem is taken from the assignment itself. In this assignment, a python implementation of the problems will be described instead of matlab, as expected in the course.

 

The Problems

  • The goal of this assignment is to implement the image quilting algorithm for
    texture synthesis and transfer, described in this SIGGRAPH 2001 paper by Efros
    and Freeman.
  • Texture synthesis is the creation of a larger texture image from a small sample.
  • Texture transfer is giving an object the appearance of having the
    same texture as a sample while preserving its basic shape.
  • For texture synthesis, the main idea is to sample patches and lay them down in overlapping patterns, such that the overlapping regions are similar.
  • The overlapping regions may not match exactly, which will result in noticeable edge artifacts. To fix this, we need to compute a path along pixels with similar intensities through the overlapping region and use it to select from which overlapping patch to draw each pixel.
  • Texture transfer is achieved by encouraging sampled patches to have similar appearance to a given target image, as well as matching overlapping regions of already sampled patches.
  • In this project, we need to apply important techniques such as template matching, finding seams, and masking. These techniques are also useful for image stitching, image completion, image retargeting and blending.

 

Randomly Sampled Texture

First let’s randomly samples square patches of size patchsize from sample in order to create an output image of size outsize. Start from the upper-left corner, and tile samples until the image is full. If the patches don’t fit evenly into the output image, we can leave black borders at the edges. This is the simplest but least effective method. Save a result from a sample image to compare to the next two methods.

 

Overlapping Patches

Let’s start by sampling a random patch for the upper-left corner. Then sample new patches to overlap with existing ones. For example, the second patch along the top row will overlap by patchsize pixels in the vertical direction and overlap pixels in the horizontal direction. Patches in the first column will overlap by patchsize pixels in the horizontal direction and overlap pixels in the vertical direction. Other patches will have two overlapping regions (on the top and left) which should both be taken into account. Once the cost of each patch has been computed, randomly choose on patch whose cost is
less than a threshold determined by some tolerance value.

As described in the paper, the size of the block is the only parameter controlled by the user and it depends on the properties of a given texture; the block must be big enough to capture the relevant structures in the texture, but small enough so that the interaction between these structures is left up to the algorithm. The overlap size is taken to be one-sixth of the block size (B/6) in general.

 

Seam Finding

Next we need to find the min-cost contiguous path from the left to the right side of the patch according to the cost. The cost of a path through each pixel is the squared difference (summed over RGB for color images) between the output image and the newly sampled patch. We use dynamic programming to find the min-cost path, as in the sketch below.
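A minimal dynamic-programming sketch for this seam search is given below; it returns, for each column of the overlap's cost matrix, the row of the min-cost contiguous left-to-right path (the function name and the convention that the path moves by at most one row per column are my choices):

import numpy as np

def min_cost_seam(cost):
    # cost: (rows x cols) squared-difference cost over the overlap region
    rows, cols = cost.shape
    dp = cost.astype(float)
    # dp[i, j] = cheapest cost of a contiguous path from column 0 to the pixel (i, j)
    for j in range(1, cols):
        for i in range(rows):
            lo, hi = max(i - 1, 0), min(i + 2, rows)
            dp[i, j] += dp[lo:hi, j - 1].min()   # best reachable predecessor in the previous column
    # backtrack from the cheapest cell in the last column
    seam = np.zeros(cols, dtype=int)
    seam[-1] = int(np.argmin(dp[:, -1]))
    for j in range(cols - 2, -1, -1):
        i = seam[j + 1]
        lo, hi = max(i - 1, 0), min(i + 2, rows)
        seam[j] = lo + int(np.argmin(dp[lo:hi, j]))
    return seam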

The following figure describes the algorithm to be implemented for image quilting.

f20.png

Texture Transfer

The final task is to implement texture transfer, based on the quilt implementation, for creating a texture sample that is guided by a pair of sample/target correspondence images (section 3 of the paper). The main difference between this function and the quilt function is that there is an additional cost term based on the difference between the sampled source patch and the target patch at the location to be filled.

Image quilting (texture synthesis) results

The following figures and animations show the results of the outputs obtained with the quilting algorithm. The input texture images are mostly taken from the paper .

Input sample Texture
q2.png

100 sampled blocks of a fixed size (e.g. 50×50) from the input sample
samples_q2.png

The next animation shows how the large output texture gets created (100 times larger than the input sample texture) with the quilting algorithm.

quilt.gif

Output Texture (10×10 times larger than the input) created with texture synthesis (quilting)
quilt_q2.png

 

Input Texture

q6.png

Output Texture (25 times larger than the input) created with texture synthesis (quilting) with the minimum cost seams (showed as red lines) computed with dynamic programming

quilt_boundary_q6

Output Texture (25 times larger than the input) created with quilting
quilt_q6

 

Input Texture

q1

Output Texture (25 times larger than the input) created with quilting

quilt_q1_.png

Input Texture

q3

Output Texture (25 times larger than the input) created with quilting

quilt_q3.png

Input Texture

q4

Output Texture (12 times larger than the input) created with quilting

quilt_q4.png

Input Texture

q5

Output Texture (25 times larger than the input) created with quilting

quilt_q5.png

Input Texture

q7

Output Texture (25 times larger than the input) created with quilting

quilt_q7_.png

Input Texture

q9

Output Texture (36 times larger than the input) created with quilting

quilt_q9.png

Input Texture

q11

Output Texture (9 times larger than the input) created with quilting

quilt_q11_.png

Input Texture

q12

Output Texture (25 times larger than the input) created with quilting

quilt_q12.png

Input Texture

q13

Output Texture (9 times larger than the input) created with quilting along with the min-cost seams (shown as red lines) computed with dynamic programming 

quilt_boundary_q13

Output Texture (9 times larger than the input) created with quilting

quilt_q13

 

Texture transfer results

The following figures and animations show how the texture from an input image can be transferred to the target image using the above-mentioned simple modification of the quilting algorithm. Again, some of the images are taken from the paper.

Input Texture (milk)

q14.png

Target Image

qt1.png

Output Image after Texture Transfer

text_tran_q14

 

Input Texture (milk)

q14.png

Target Image

mona

Output Image after Texture Transfer

text_tran_q14_

 

The following figure show the output image obtained when a few textures were transferred to my image.

Target Image (me)

me1.png

 

Input Texture (fire)

fire.png

Output Image after Texture Transfer  (with small block size)

text_tran_fire

 

 

Input Texture (cloud)

cloud1.png

Output Image after Texture Transfer  (with small block size)

text_tran_cloud1

 

Input Texture (toast)
toast.png

Output Image after Texture Transfer (with small block size)
text_tran_toast

 

Input Texture
q7.png

Output Image after Texture Transfer  (with small block size)

text_tran_q7

 

Input Texture

draw.png

Output Image after Texture Transfer (with small block size)
text_tran_draw

 

The following animation shows how the milk texture gets transferred to the target image with the quilting algorithm with the modified code.

Input Texture

q14

Target Image (me)

ttmea

 

The next figures and animations show the output images obtained after the milk texture gets transferred to the target image of mine, for different block sizes of the samples (shown in red). As can be seen from the following outputs, the target image gets more and more prominent as the sampling block size gets smaller and smaller.

text_tran_036_q14

text_tran_804_q14

ttme

