Recently I have been using Cassandra for one of my projects, and one of the needs is to iterate over all columns of a row. Each column holds an individual piece of data, of a type identified by the row id, and the set of columns keeps changing, so I can’t simply use a fixed set of known column names. Using the setRange call on a SliceQuery with a large count is also not an option, since Cassandra will try to load the entire set of columns into memory. Instead I’ve written this iterator, which takes a query on which the row key and column family have already been set, and loads columns as they are requested. By default it loads 100 columns at a time. You could make it take the count as a parameter and so on, but this works for me for now.
The one ‘problem’ with this is the removal of the last column of each batch, to ensure that there are no duplicates while still having a start point for the next query. This is because each column is independent, so you cannot ask a column what its next neighbour is and start the next query from there. If anybody has a tip to make it more elegant, I’d love to hear it.
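To make the approach concrete, here is a minimal sketch of such an iterator against Hector’s SliceQuery API. The class name ColumnIterator is mine, and the fetchMore/origSize/start names mirror the discussion below rather than the original code, so treat it as an illustration of the paging idea rather than the exact implementation.

import java.util.Iterator;
import java.util.List;

import me.prettyprint.hector.api.beans.HColumn;
import me.prettyprint.hector.api.query.SliceQuery;

public class ColumnIterator<K, N, V> implements Iterator<HColumn<N, V>> {

    private static final int COUNT = 100; // columns fetched per query

    private final SliceQuery<K, N, V> query; // row key and column family already set by the caller
    private Iterator<HColumn<N, V>> current; // columns of the most recently fetched page
    private N start;                         // column name to start the next query from
    private boolean exhausted;               // set once a query returns fewer than COUNT columns

    public ColumnIterator(SliceQuery<K, N, V> query) {
        this.query = query;
        this.start = null; // null means "from the beginning of the row"
    }

    public boolean hasNext() {
        if (current == null || (!current.hasNext() && !exhausted)) {
            fetchMore();
        }
        return current.hasNext();
    }

    public HColumn<N, V> next() {
        return current.next();
    }

    private void fetchMore() {
        query.setRange(start, null, false, COUNT);
        List<HColumn<N, V>> columns = query.execute().get().getColumns();
        int origSize = columns.size();

        if (origSize >= COUNT) {
            // A full page, so there may be more columns. Remember the last
            // column's name as the start of the next query, and drop it from
            // this page so it is not returned twice (the next range query is
            // inclusive of its start column).
            start = columns.get(origSize - 1).getName();
            columns = columns.subList(0, origSize - 1);
        } else {
            // A short page: this was the last query we need to run.
            exhausted = true;
        }
        current = columns.iterator();
    }

    public void remove() {
        throw new UnsupportedOperationException();
    }
}

The ‘problem’ mentioned above lives in fetchMore(): the last column of a full page is removed from the results, but its name is kept as the start of the next query, which begins at that column again.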
Comment: Nice post.

Comment: Isn’t there a little bug here? Why don’t you update the "start" local variable in the fetchMore() method?

Reply: I am updating it if ‘origSize >= count’. This condition is true whenever count columns were returned, in which case we have to run another query, as there may be more columns. Otherwise we know there are no more columns, so we don’t need to run the query again.
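To round this out, here is a hypothetical way the sketched iterator could be wired up; the keyspace, the column family name and the row key below are placeholders, not values from the original post.

import me.prettyprint.cassandra.serializers.StringSerializer;
import me.prettyprint.hector.api.Keyspace;
import me.prettyprint.hector.api.beans.HColumn;
import me.prettyprint.hector.api.factory.HFactory;
import me.prettyprint.hector.api.query.SliceQuery;

public class ColumnIteratorExample {

    // Dumps every column of the given row, fetching 100 columns per query.
    public static void dumpRow(Keyspace keyspace, String rowKey) {
        SliceQuery<String, String, String> query = HFactory.createSliceQuery(
                keyspace, StringSerializer.get(), StringSerializer.get(), StringSerializer.get());
        query.setColumnFamily("MyColumnFamily"); // placeholder column family
        query.setKey(rowKey);

        ColumnIterator<String, String, String> columns =
                new ColumnIterator<String, String, String>(query);
        while (columns.hasNext()) {
            HColumn<String, String> column = columns.next();
            System.out.println(column.getName() + " = " + column.getValue());
        }
    }
}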