This post will introduce a new feature for binary blobs like connected components in a text.

The feature is called *patch histogram* and it's the histogram of 3x3
patches of black and white pixels. We collect all 3x3 patches and count
their frequency.

3x3 patch for a binary image contains $2^9 = 512$ different combinations. For each of these combinations, we assign a number. I wrote the implementation in Python and here is a lookup table that converts all possible 3x3 patches to their ids.

```
patch_histogram_dict = {}
for f in range(0, 512):
i22 = f & 1
i21 = (f >> 1) & 1
i20 = (f >> 2) & 1
i12 = (f >> 3) & 1
i11 = (f >> 4) & 1
i10 = (f >> 5) & 1
i02 = (f >> 6) & 1
i01 = (f >> 7) & 1
i00 = (f >> 8) & 1
patch_histogram_dict[ma(i00, i01, i02, i10, i11, i12, i20, i21, i22)] = f
```

I was looking a *skeleton feature* that can be applied to components
after medial axis transform. However, we will use binarized connected
component images directly in this version and take a look at skeletal
version later.

Above, we used a function `ma`

to convert a set of numbers to a tuple of
tuples. It's simple as

```
def ma(i00, i01, i02, i10, i11, i12, i20, i21, i22):
return ((i00, i01, i02),
(i10, i11, i12),
(i20, i21, i22))
```

The first step in feature generation is marking each pixel of the image
with its id. Then a simple `np.histogram`

call generates the histogram.

```
for i in range(1, rows - 1):
for j in range(1, cols - 1):
s = framed[(i-1):(i+2), (j-1):(j+2)]
marks[i-1, j-1] = patch_dict[ma(s[0, 0], s[0, 1], s[0, 2],
s[1, 0], s[1, 1], s[1, 2],
s[2, 0], s[2, 1], s[2, 2])]
```

=framed= is a copy of the original image with a 1 pixel frame, so that
boundary pixel are also counted without much checking. `patch_dict`

is a
parameter that the function receives. Its default value is the
dictionary that we made above, but any kind of patch dictionaries can be
used.

The feature is a call to `numpy.histogram`

as follows

```
histogram = numpy.histogram(marked_points,
hist_range,
density=True,
bins=bins)
```

=marked~points~= are the result of above pixel counting loop.
`hist_range`

is the minimum and maximum value for the patch dictionary.
`density`

parameter is True, so that the histogram is normalized to 1
and feature become size invariant. `bins`

is a parameter we can set
freely, but by default it's equal to the range of elements.

The comparison can be done with standard histogram comparison methods, like EMD or Chi Square distance metrics.