Sunday, April 10, 2016

Mapping a continuous range to a discrete value in Cassandra

In mathematics and computer programs, discretization concerns the process of transferring continuous functions, models, and equations into discrete counterparts.

In my case I want to map a continuous integer range to a discrete value.
For example consider the following maping
[100 .. 299] --> 100
[300 .. 799] --> 300
[800 .. 999] --> 800

I created the following discretization table in Cassandra:

CREATE TABLE range_mapping (
   k int, 
   lower int,
   upper int,
   mapped_value int,
   PRIMARY KEY (k, lower, upper)
);

Problem is that I can't use a query like select mapped_value from range_mapping where k=0 and ? between lower and uppein Cassandra since there is no support for non-EQ relations on two clustering columns in the same query.
Issuing a query like
select * from range_mapping where k = 0 and lower <= 150 and upper >= 150 allow filtering;
returns an error stating "Clustering column "upper" cannot be restricted (preceding column "lower" is restricted by a non-EQ relation)"

The solution I found was using a combination of a clustering column and a secondary index.
I was inspired by this answer on SO.
I removed the ‘upper’ column from the PK so it is no longer a clustering column, and I added a secondary index over it. This assumes there is no overlap in the continuous ranges so having only 'lower' column in the PK provides uniqueness.
I had to add a ‘dummy’ column with a constant value to be able to use an non-eq operator on the ‘upper’ column.
Now that it’s not a clustering column and it has an index I can use the following table to map continuous ranges to a discrete values.

CREATE TABLE range_mapping (
   k int, 
   lower int,
   upper int,
   dummy int,
   mapped_value int,
   PRIMARY KEY (k, lower)
);
CREATE INDEX upper_index on range_mapping(upper);
CREATE INDEX dummy_index on range_mapping(dummy);

Put in some data:

INSERT INTO range_mapping (k, dummy, lower, upper, mapped_value) VALUES (0, 0, 0, 99, 0);
INSERT INTO range_mapping (k, dummy, lower, upper, mapped_value) VALUES (0, 0, 100, 199, 100);
INSERT INTO range_mapping (k, dummy, lower, upper, mapped_value) VALUES (0, 0, 200, 299, 200);

Now my updated query works as expected:

select mapped_value from range_mapping where k = 0 and dummy = 0 and lower <= 150 and upper >= 150 allow filtering;

returns the value '100'.