Developers Planet

ghost February 2016

Count number of intervals containing another interval?

Given two lists each containing N intervals (subset of the number line), each interval with form of a start point and endpoint. How many pairs of these intervals from one list contains intervals from another list?

For example:

If list A is `{(1,7), (2,9)}` and list B is `{(3,6), (5,8)}`

Then the count of pairs where A has an interval that contains an interval in B would be 3 pairs:

``````(1,7),(3,6)
(2,9)(3,6)
(2,9)(5,8)
``````

The goal is to shoot for O(n log n).

My algorithm currently is to sort by x-coordinate first and take that as one list. Then sort the list by y-coordinate and count the inversions between the two lists. But my question is why does this work? Any insight would be appreciated.

The way I have I'm currently visualizing is in the following geometric fashion (where each intersection of the lines is a count of the num inversion):

Note: I'm not sure how to go about checking inversions in a list of list. Just trying to get an approach that would give O(n log n). If there are any other approaches happy to hear suggestions.

Baj Mile February 2016

If you decide to try also the tree/gird approach I will explain how this may work. For your task you do not need 2D but a one-dimensional interval map or even grid. Lets choose the grid because it is more clear.

Suppose that your pairs are integers from 1 to 100. Then you can afford to have one array with size 100. Each cell from the array contains empty set (ordered list). See the picture below:

Now we start adding the intervals in the grid. We add 1 in all grid sells between 1,7 and 2 in all grid cells between 2,9 (1,2 are IDs which we increase by 1 with each inserted interval, insert in this way is inefficient but this can be fixed).

Now how we check for the intervals from B? We just get each ID from the first cell and check if it is also in the second cell. Since the cells are sets the checks takes O(log n). We need n O(log n) operations in the worst case to check the overlapping intervals count that one interval from B has inside A.

This can be extended to use interval map instead of grid (if the numbers are not small integers). Also if you have fixed number of intervals in A, and no memory requirements, O(logN) can become O(1) if we replace the sets with arrays for example.

MichaĆ Komorowski February 2016

I'll answer the first question about why the solution with inversion works. Firstly, I'll clarify one thing. You shouldn't count all inversions (intersections of lines) but only these that occur between an element from A list and an element from B list. In your example there is no difference but let's assume that `A = {(1,7), (2,5)}` and `B = {(3,6), (5,8)}`. If we visualize these case as in your example there would be 2 intersection but there is only 1 pair we are looking for i.e. (1,7), (3,6).

Now let's assume that we have 2 intervals: `I1=(x1,y1)` and `I2=(x2,y2)`. `I2` is included in `I1`. It means that `x1 <= x2` and `y1 >= y2`. Now, if you sort the list of intervals by x then `I1` will always be before `I2`. Analogically if you sort the list of intervals by y then `I1` will always be after `I2`. It also beans that if we connect `I1`, `I2` in the first list with `I1`, `I2` in the second list then the lines must cross.

However, let's assume that `x1 <= x2` and `y1 < y2`. Now `I1` will be before `I2` in the first and in the second list. If we connect `I1`, `I2` in the first list with `I1`, `I2` in the second list then the lines will never cross. The same situation is if `x1 > x2` and `y1 >= y2`

Here is the visualization of these cases: