Tom SF Haines edited this page Mar 31, 2016 · 2 revisions

Homography

Overview

A simple library of functions for constructing homographies and then distorting images with them. It has the usual set of translate/rotate/scale, which can be combined using ndarray.dot, plus a function that generates the transform from 4 pairs of coordinates. There are also helper methods for working out the exact size of image required to contain another image after it has been through a given homography. A transform method then applies a homography (B-Spline interpolation, degree 0-5 inclusive). Be warned that you give it the homography that converts output coordinates to input coordinates, so you will have to invert a matrix constructed to go the other way.

Also includes some additional methods for querying arbitrary locations in an image with B-Spline interpolation - just made sense to include them here so they can share the B-Spline code. There is also a Gaussian blur implementation (n-dimensional, with support for derivatives and missing data handling) that got shoved in here.

Be warned that homographies are constructed to apply to vectors [x, y, w], to be consistent with everyone else, but then the arrays are indexed [y, x] - this makes things a touch confusing at points.
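
As a rough illustration of the two warnings above (composition order and the [x, y, w] versus [y, x] conventions), here is a minimal sketch in plain numpy. The two matrices are written by hand as stand-ins for what scale(2) and translate([10, 5]) are described as returning; they are an assumption, not output of hg.py.

```python
import numpy as np

# Hand-written 3x3 matrices standing in for scale(2) and translate([10, 5]).
# These are assumptions based on the descriptions, not values taken from hg.py.
scale2 = np.array([[2, 0, 0], [0, 2, 0], [0, 0, 1]], dtype=np.float32)
shift = np.array([[1, 0, 10], [0, 1, 5], [0, 0, 1]], dtype=np.float32)

# With ndarray.dot the right-most matrix is applied first:
# this scales by 2 and then shifts by (10, 5).
forward = shift.dot(scale2)

# Homographies act on [x, y, w] vectors...
x, y = 3.0, 4.0
p = forward.dot(np.array([x, y, 1.0]))
p = p / p[2]  # normalise so that w == 1

# ...but image arrays are indexed [y, x].
image = np.zeros((32, 32), dtype=np.float32)
image[int(p[1]), int(p[0])] = 1.0

# transform(...) expects output -> input, so invert the forward matrix.
backward = np.linalg.inv(forward)
```

Here `forward` maps (3, 4) to (16, 13), and `backward` is the matrix you would hand to transform.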

Contains the following key files:

hg.py - File for a user to import; just a bunch of functions.

test_*.py - Some test scripts. All take at least an image as input.

readme.txt - This file, which is included in the documentation.

make_doc.py - Builds the documentation.


Functions

translate(offset) Returns the homography for offsetting an input by a given amount - offset must be interpretable as a 2-element vector. The returned matrix multiplied by a vector (x, y, 1) gives the new vector (x+offset[0], y+offset[1], 1).

rotate(angle) Returns the homography for rotating by the given angle, in radians, anticlockwise around the origin.

scale(amount) Scales everything to be the given amount times bigger (or smaller if <1).
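
The three constructors above can be sketched as follows. These are stand-ins written from the descriptions on this page, assuming the standard 3x3 matrices; they are not the code in hg.py. The example composes them to rotate about a point other than the origin.

```python
import numpy as np

# Stand-ins written from the descriptions; NOT the library's implementation.
def translate(offset):
    hg = np.eye(3, dtype=np.float32)
    hg[0, 2] = offset[0]
    hg[1, 2] = offset[1]
    return hg

def rotate(angle):
    c, s = np.cos(angle), np.sin(angle)
    # Anticlockwise with x to the right and y up; in an image array,
    # where y points down, the same matrix looks clockwise on screen.
    return np.array([[c, -s, 0], [s, c, 0], [0, 0, 1]], dtype=np.float32)

def scale(amount):
    hg = np.eye(3, dtype=np.float32)
    hg[0, 0] = amount
    hg[1, 1] = amount
    return hg

# Quarter turn about (8, 8): move the centre to the origin, rotate, move back.
centre = np.array([8.0, 8.0])
about_centre = translate(centre).dot(rotate(np.pi / 2)).dot(translate(-centre))

# The point one unit to the right of the centre ends up one unit along +y.
q = about_centre.dot(np.array([9.0, 8.0, 1.0]))
```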

match(source, dest) Calculates a 2D homography that converts from the source coordinates to the dest coordinates. Both are (4,2) data matrices of 4 x,y coordinates. Returns a 3x3 matrix that, when multiplied by the homogeneous versions of source, gets you to dest.
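
One common way to compute such a matrix is the direct linear transform, sketched below in plain numpy. The name match_sketch is hypothetical and the method is an assumption; this page does not say how match solves the problem internally.

```python
import numpy as np

def match_sketch(source, dest):
    """Direct linear transform for 4 point pairs (hypothetical stand-in for match)."""
    source = np.asarray(source, dtype=np.float64)
    dest = np.asarray(dest, dtype=np.float64)

    # Each pair (x, y) -> (u, v) contributes two linear constraints on the
    # nine entries of the homography.
    rows = []
    for (x, y), (u, v) in zip(source, dest):
        rows.append([x, y, 1, 0, 0, 0, -u * x, -u * y, -u])
        rows.append([0, 0, 0, x, y, 1, -v * x, -v * y, -v])

    # The solution is the null vector of the 8x9 system: the right singular
    # vector belonging to the smallest singular value.
    _, _, vt = np.linalg.svd(np.array(rows))
    hg = vt[-1].reshape(3, 3)

    # Fix the arbitrary scale (assumes the bottom-right entry is not zero).
    return hg / hg[2, 2]

source = np.array([[0, 0], [1, 0], [1, 1], [0, 1]], dtype=np.float64)
dest = np.array([[10, 20], [30, 22], [28, 45], [8, 40]], dtype=np.float64)
hg = match_sketch(source, dest)

# Check: push the homogeneous source points through and divide by w.
moved = np.column_stack([source, np.ones(4)]).dot(hg.T)
moved = moved[:, :2] / moved[:, 2:3]
```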

bounds(hg, lower, upper) Given a homography and a rectangle, as a lower/upper pair of coordinates (x, y order), this returns the axis-aligned rectangle (as the tuple (lower, upper)) that contains the rectangle after the provided homography has been applied.
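
The idea can be sketched by transforming the four corners and taking their minimum and maximum. bounds_sketch is a hypothetical name, and the sketch assumes w stays positive over the rectangle, so that the corners are enough.

```python
import numpy as np

def bounds_sketch(hg, lower, upper):
    """Axis-aligned box around a transformed rectangle (hypothetical stand-in)."""
    corners = np.array([[lower[0], lower[1], 1.0],
                        [upper[0], lower[1], 1.0],
                        [upper[0], upper[1], 1.0],
                        [lower[0], upper[1], 1.0]])
    moved = corners.dot(np.asarray(hg, dtype=np.float64).T)
    moved = moved[:, :2] / moved[:, 2:3]  # divide by w
    return moved.min(axis=0), moved.max(axis=0)

# An exact quarter turn anticlockwise, written by hand to avoid rounding.
quarter_turn = np.array([[0.0, -1.0, 0.0],
                         [1.0, 0.0, 0.0],
                         [0.0, 0.0, 1.0]])
low, high = bounds_sketch(quarter_turn, (0, 0), (4, 2))
```

For this input the 4 by 2 rectangle becomes a 2 by 4 one spanning x from -2 to 0 and y from 0 to 4.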

fit(hg, shape) Given a homography and the shape of the image it is to be applied to (in height, width order - consistent with image arrays, not the homography!), this returns the tuple (hg, shape): a replacement homography and the output shape, such that the image after transformation is not clipped. Effectively offsets the homography so there are no negative coordinates and increases the shape as required.
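
A minimal sketch of that behaviour follows. fit_sketch is a hypothetical name, and two details are assumptions: that the homography is the forward (input to output) one, and that the image rectangle runs from pixel 0 to pixel size-1 on each axis.

```python
import numpy as np

def fit_sketch(hg, shape):
    """Offset hg and grow the shape so nothing is clipped (hypothetical stand-in)."""
    height, width = shape  # image order: height first
    hg = np.asarray(hg, dtype=np.float64)

    # Transform the corner pixels; coordinates are in x, y order.
    corners = np.array([[0, 0, 1], [width - 1, 0, 1],
                        [width - 1, height - 1, 1], [0, height - 1, 1]],
                       dtype=np.float64)
    moved = corners.dot(hg.T)
    moved = moved[:, :2] / moved[:, 2:3]
    low, high = moved.min(axis=0), moved.max(axis=0)

    # Shift so the smallest coordinate lands on zero.
    shift = np.eye(3)
    shift[0, 2] = -low[0]
    shift[1, 2] = -low[1]

    new_shape = (int(np.ceil(high[1] - low[1])) + 1,
                 int(np.ceil(high[0] - low[0])) + 1)
    return shift.dot(hg), new_shape

quarter_turn = np.array([[0.0, -1.0, 0.0],
                         [1.0, 0.0, 0.0],
                         [0.0, 0.0, 1.0]])
new_hg, new_shape = fit_sketch(quarter_turn, (3, 5))
```

Because transform expects the output to input direction, a matrix obtained this way would still need inverting before use there.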

scaling(hg, lower, upper, divisions = 100) Given a rectangle this returns the tuple (minimum, maximum) giving the range of scaling encountered by the homography. Works by effectively dividing the given rectangle (lower and upper) into the given number of divisions (defaults to 100) and finding their lengths before and after the transform, so the scaling factor for each can be found and the minimum/maximum determined. It assumes the homography doesn't have singularities, and hence only evaluates the 8 corner edges for efficiency.
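
One reading of "the 8 corner edges" is the two short border segments that meet at each of the four corners, each one division long. The sketch below follows that reading; it is an interpretation of the description, not the library's code, and scaling_sketch and apply_hg are hypothetical names.

```python
import numpy as np

def apply_hg(hg, pts):
    """Apply a homography to an (n, 2) array of x, y points."""
    pts = np.asarray(pts, dtype=np.float64)
    moved = np.column_stack([pts, np.ones(len(pts))]).dot(np.asarray(hg, dtype=np.float64).T)
    return moved[:, :2] / moved[:, 2:3]

def scaling_sketch(hg, lower, upper, divisions=100):
    lower = np.asarray(lower, dtype=np.float64)
    upper = np.asarray(upper, dtype=np.float64)
    step = (upper - lower) / divisions  # length of one division per axis

    ratios = []
    for cx, sx in ((lower[0], step[0]), (upper[0], -step[0])):
        for cy, sy in ((lower[1], step[1]), (upper[1], -step[1])):
            corner = np.array([cx, cy])
            # Two short segments leave each corner, one along x and one along y.
            for delta in (np.array([sx, 0.0]), np.array([0.0, sy])):
                a, b = apply_hg(hg, np.array([corner, corner + delta]))
                ratios.append(np.linalg.norm(b - a) / np.linalg.norm(delta))
    return min(ratios), max(ratios)

# Stretch x by 2 and squash y by half.
stretch = np.diag([2.0, 0.5, 1.0])
smallest, largest = scaling_sketch(stretch, (0, 0), (10, 10))
```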

fillmasked(?) Given a dictionary representing an image, fills in all values outside the mask with the colour of the closest valid pixel, measured with Manhattan distance. Primarily a method used internally by transform(...) to avoid the complexity of handling a mask, but exposed in case it is useful elsewhere. A no-op if called on an image that has no mask. The image is a set of numpy arrays indexed by channel names, all 2D and with the same size, all float32 except for the mask, which is uint8 where non-zero means valid.
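
A nearest-valid-pixel fill under Manhattan distance can be sketched with a breadth-first flood from all valid pixels at once, stepping only up, down, left and right. fillmasked_sketch is a hypothetical name; the approach, the tie-breaking, and leaving the mask itself untouched are all assumptions rather than documented behaviour.

```python
from collections import deque
import numpy as np

def fillmasked_sketch(image):
    """Fill invalid pixels in place from the closest valid one (hypothetical stand-in)."""
    if 'mask' not in image:
        return  # no mask: nothing to do

    mask = image['mask']
    height, width = mask.shape
    filled = mask != 0  # a fresh boolean array; the mask is not modified

    # Start from every valid pixel; 4-connected steps give Manhattan distance.
    queue = deque(zip(*np.nonzero(filled)))
    while queue:
        y, x = queue.popleft()
        for ny, nx in ((y - 1, x), (y + 1, x), (y, x - 1), (y, x + 1)):
            if 0 <= ny < height and 0 <= nx < width and not filled[ny, nx]:
                filled[ny, nx] = True
                for name, channel in image.items():
                    if name != 'mask':
                        channel[ny, nx] = channel[y, x]
                queue.append((ny, nx))

image = {
    'grey': np.array([[1, 9, 9, 9, 9, 5]], dtype=np.float32),
    'mask': np.array([[1, 0, 0, 0, 0, 1]], dtype=np.uint8),
}
fillmasked_sketch(image)
```

After the call the `grey` row reads 1, 1, 1, 5, 5, 5: each invalid pixel took the value of whichever end was closer.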

transform(?) Given a dictionary representing an image, returns a new dictionary of the image having been transformed by a provided homography. Note that you typically think of homographies as going from source to target - this expects the inverse. You should also provide the width and height of the output image, though they default to the same as the input image if not provided. Parameters are: hg - the homography to apply; each output pixel coordinate is multiplied by it to get the source coordinate. image - a dictionary of channels, each a float32 2D numpy array of the same size, indexed [y,x]; it can also include a 'mask' channel, uint8, that is nonzero when a pixel is valid. An optional height. An optional width. An optional degree of the polynomial, which can be 0-5 and defaults to 3 (cubic). The return value is a new image dictionary, which will always contain a 'mask' channel indicating which pixels are valid. Note that if the input has a mask the call will make changes to the original image, but only in the areas marked as invalid by the mask.
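
To make the output to input convention concrete, here is a minimal nearest-neighbour sketch (roughly what degree 0 would mean) in plain numpy. transform_nearest is a hypothetical name; it does not reproduce the library's B-spline code, and unlike the real function it never touches the input image.

```python
import numpy as np

def transform_nearest(hg, image, height=None, width=None):
    """Inverse-mapped warp with nearest-neighbour lookup (hypothetical stand-in)."""
    first = next(v for k, v in image.items() if k != 'mask')
    in_h, in_w = first.shape
    height = in_h if height is None else height
    width = in_w if width is None else width

    # Every OUTPUT pixel, as [x, y, 1] columns.
    ys, xs = np.mgrid[0:height, 0:width]
    out_pts = np.stack([xs.ravel(), ys.ravel(), np.ones(xs.size)])

    # hg takes output coordinates to source coordinates.
    src = np.asarray(hg, dtype=np.float64).dot(out_pts)
    sx = np.rint(src[0] / src[2]).astype(int)
    sy = np.rint(src[1] / src[2]).astype(int)
    inside = (sx >= 0) & (sx < in_w) & (sy >= 0) & (sy < in_h)

    result = {}
    for name, channel in image.items():
        if name == 'mask':
            continue
        out = np.zeros(height * width, dtype=np.float32)
        out[inside] = channel[sy[inside], sx[inside]]  # arrays are indexed [y, x]
        result[name] = out.reshape(height, width)

    # The output always carries a mask: valid where a valid source pixel was found.
    valid = inside.copy()
    if 'mask' in image:
        valid[inside] &= image['mask'][sy[inside], sx[inside]] != 0
    result['mask'] = valid.astype(np.uint8).reshape(height, width)
    return result

# Forward motion: shift right by 2 and down by 1. Pass its inverse.
forward = np.array([[1.0, 0.0, 2.0], [0.0, 1.0, 1.0], [0.0, 0.0, 1.0]])
source = {'grey': np.zeros((4, 4), dtype=np.float32)}
source['grey'][0, 0] = 7.0
warped = transform_nearest(np.linalg.inv(forward), source)
```

The bright pixel at [0, 0] lands at [1, 2] in `warped['grey']`, and `warped['mask']` is zero wherever the source lookup fell outside the input.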

sample(?) Lets you sample a specified set of locations in an image. Takes parameters (image, locations, degree). image is a dictionary of 2D float32 arrays indexed [y,x], all the same size, to be sampled. Can also include a 'mask' of uint8 where nonzero means valid. locations is a list of coordinates in the image to evaluate, as a 2D float32 numpy array with x in column 0 and y in column 1. degree is the optional degree of the B-spline to use - defaults to 3 (cubic; must be 0-5). It returns a dictionary of 1D float32 numpy arrays indexed [location] of all the evaluations, one per input image channel. Note that any coordinates that land outside the image will be evaluated using repetition of border pixels - no mask is generated. Also note that if there is a mask it will make changes to the original image, but only in the areas marked as invalid by the mask.
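
The input and output layout can be illustrated with a degree 1 stand-in: a degree 1 B-spline is ordinary linear interpolation, and clamping coordinates to the image reproduces border repetition. sample_linear is a hypothetical name and only covers this one degree; it ignores any mask.

```python
import numpy as np

def sample_linear(image, locations):
    """Bilinear lookup with border repetition (hypothetical degree-1 stand-in)."""
    locations = np.asarray(locations, dtype=np.float32)
    x, y = locations[:, 0], locations[:, 1]  # x in column 0, y in column 1

    result = {}
    for name, channel in image.items():
        if name == 'mask':
            continue
        h, w = channel.shape
        # Clamping repeats the border pixels for out-of-range coordinates.
        cx = np.clip(x, 0, w - 1)
        cy = np.clip(y, 0, h - 1)
        x0 = np.floor(cx).astype(int)
        y0 = np.floor(cy).astype(int)
        x1 = np.minimum(x0 + 1, w - 1)
        y1 = np.minimum(y0 + 1, h - 1)
        fx, fy = cx - x0, cy - y0

        top = channel[y0, x0] * (1 - fx) + channel[y0, x1] * fx
        bottom = channel[y1, x0] * (1 - fx) + channel[y1, x1] * fx
        result[name] = (top * (1 - fy) + bottom * fy).astype(np.float32)
    return result

image = {'grey': np.array([[0, 1], [2, 3]], dtype=np.float32)}
locations = np.array([[0.5, 0.5], [1.0, 0.0], [-3.0, 1.0], [0.0, 0.5]], dtype=np.float32)
values = sample_linear(image, locations)
```

The result has one 1D array per channel, indexed by location; the third location lies outside the image and picks up the border value.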

offsets(?) Slightly strange - lets you sample a specified set of offsets around each point in an array of coordinates. For extracting the values required by features that need this kind of thing. Takes parameters (image, points, offsets, degree). image is a dictionary of 2D float32 arrays indexed [y,x], all the same size, to be sampled. Can also include a 'mask' of uint8 where nonzero means valid. points is a list of points in the image to evaluate, as another 2D float32 numpy array with x in column 0 and y in column 1. offsets is a 2D float32 numpy array, of offsets from an origin pixel, the x axis in column 0, the y axis in column 1. Note that you can get the order the wrong way around with the only consequence being the indexing order of the returned matrices. degree is the optional degree of the B-spline to use - defaults to 3 (cubic; must be 0-5). It then returns a dictionary of float32 numpy arrays indexed [point, offset] of all the relevant evaluations, one per input image channel. Note that any coordinates that land outside the image will be evaluated using repetition of border pixels - no mask is generated. Also note that if there is a mask it will make changes to the original image, but only in the areas marked as invalid by the mask.
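
The [point, offset] indexing of the result can be pictured with numpy broadcasting. The sketch below only builds the coordinates that would be evaluated; it is an illustration of the layout, not the library's code.

```python
import numpy as np

points = np.array([[10, 20], [30, 40]], dtype=np.float32)         # x, y per point
offsets = np.array([[-1, 0], [0, 0], [1, 0]], dtype=np.float32)   # x, y per offset

# Broadcasting gives every point/offset combination:
# coords[p, o] is the x, y location for point p and offset o.
coords = points[:, None, :] + offsets[None, :, :]

# Flattened, the same coordinates form an (n, 2) list of locations;
# per-location results reshape back to [point, offset].
flat = coords.reshape(-1, 2)
back = flat[:, 0].reshape(len(points), len(offsets))
```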

rotsets(?) Same as offsets, except it makes rather more sense, as each location also has an orientation (given as cos(angle), sin(angle) - direction of x-axis) which is applied to the offsets before evaluation. In other words, this is for extracting feature vectors from images that estimate a rotation before sampling. Takes parameters (image, points, rotations, offsets, degree). image is a dictionary of 2D float32 arrays indexed [y,x], all the same size, to be sampled. Can also include a 'mask' of uint8 where nonzero means valid. points is a list of points in the image to evaluate, as another 2D float32 numpy array with x in column 0 and y in column 1. rotations is a 2D float32 array, aligned with the points and giving nx in column 0 and ny in column 1. These should be the unit length direction of the x axis - you can think of it as nx=cos(angle), ny=sin(angle). Note that scaling these vectors will have the expected effect, so you can have per-point scales as well as per-point angles. offsets is a 2D float32 numpy array, of offsets from an origin pixel, the x axis in column 0, the y axis in column 1. degree is the optional degree of the B-spline to use - defaults to 3 (cubic; must be 0-5). It then returns a dictionary of float32 numpy arrays indexed [point, offset] of all the relevant evaluations, one per input image channel. Note that any coordinates that land outside the image will be evaluated using repetition of border pixels - no mask is generated. Also note that if there is a mask it will make changes to the original image, but only in the areas marked as invalid by the mask.
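
The per-point rotation can be sketched as below. It assumes the local y axis is the x axis direction turned a quarter turn anticlockwise, i.e. (-ny, nx); the page only specifies the x axis, so that handedness is an assumption, and rotated_coords is a hypothetical name.

```python
import numpy as np

def rotated_coords(points, rotations, offsets):
    """Coordinates for each point/offset pair after per-point rotation (sketch)."""
    nx = rotations[:, 0][:, None]  # x axis direction, per point
    ny = rotations[:, 1][:, None]
    ox = offsets[:, 0][None, :]    # offsets, shared by all points
    oy = offsets[:, 1][None, :]

    # Local x axis is (nx, ny); local y axis is assumed to be (-ny, nx).
    x = points[:, 0][:, None] + ox * nx - oy * ny
    y = points[:, 1][:, None] + ox * ny + oy * nx
    return np.stack([x, y], axis=-1)  # indexed [point, offset, xy]

points = np.array([[5, 5], [5, 5]], dtype=np.float32)
rotations = np.array([[0, 1], [2, 0]], dtype=np.float32)  # quarter turn; scale by 2
offsets = np.array([[1, 0], [0, 1]], dtype=np.float32)
coords = rotated_coords(points, rotations, offsets)
```

The second rotation vector has length 2, which doubles the offsets for that point, matching the note that scaling these vectors gives per-point scales.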

Gaussian(?) Does a Gaussian blur on an n-dimensional numpy array of type float32. Takes the following arguments, in the following order or with keywords: {data : An nd numpy array of values - can contain inf and NaN, which will be ignored; out : An array identical in shape and type to data, which will be overwritten with the output. Can in fact be the same array as data; sd : 1D array giving the standard deviation for each dimension, so its length must match the number of dimensions of data, in shape order, which typically means [y sd, x sd]. Type must be float32, and it will handle values of zero correctly with a no-op. If not provided it defaults to sqrt(2) for all values; derivative : An optional integer array, matching up with the sd array, whose length matches the number of dimensions. A value of 0 means to use the normal Gaussian for that dimension, a value of 1 its derivative, a value of -1 its mirrored derivative. Also supports 2/-2 for the second derivative etc., up to the 6th derivative. It rarely makes sense to have more than one non-zero value; quality : Number of standard deviations out to go - defaults to 4; weight : An array, the same shape as the input/output and of float32 type, into which the weights will be written - should be 1 in all cases.}. A little different from most implementations because it drops values that fall outside the array and values that are not finite, and renormalises the output accordingly - in other words, it does the correct thing for data that contains gaps.
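
The renormalisation idea can be sketched in one dimension: sum only the kernel taps that land on finite, in-range samples, then divide by the kernel weight actually used. blur_1d is a hypothetical name and a deliberately slow illustration; it does not cover derivatives or more than one dimension.

```python
import numpy as np

def blur_1d(data, sd=np.sqrt(2.0), quality=4.0):
    """Gaussian blur that skips gaps and renormalises (hypothetical 1D sketch)."""
    data = np.asarray(data, dtype=np.float32)
    radius = int(np.ceil(sd * quality))  # how many samples out to go
    taps = np.arange(-radius, radius + 1)
    kernel = np.exp(-0.5 * (taps / sd) ** 2)

    finite = np.isfinite(data)
    values = np.where(finite, data, 0.0)  # keep inf/NaN out of the sums

    out = np.empty_like(data)
    for i in range(len(data)):
        idx = i + taps
        ok = (idx >= 0) & (idx < len(data))  # drop taps outside the array
        ok[ok] &= finite[idx[ok]]            # drop taps on non-finite samples
        weight = kernel[ok].sum()
        # Renormalise by the weight actually used; NaN if nothing was usable.
        out[i] = (kernel[ok] * values[idx[ok]]).sum() / weight if weight > 0 else np.nan
    return out

gappy = np.array([2.0, 2.0, np.nan, 2.0, np.inf, 2.0], dtype=np.float32)
smooth = blur_1d(gappy)
```

A constant signal with holes stays constant, including at the ends of the array, which a plain zero-padded convolution would darken.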
