DoubleSparseJLT

template<typename RegType, template<std::size_t> class HashType, template<typename> class PtrType, std::size_t RangeSize, std::size_t ReplicationCount>
class DoubleSparseJLT

A functor implementing a CountSketch-based sparse JLT on each side of a matrix of registers.

Implements CountSketch using a ReplicationCount number of pairs of hash functions. Currently assumes that both sides of the matrix are to be projected to the same dimension.

[0] M. Charikar, K. Chen, M. Farach-Colton. Finding frequent items in data streams. Theoretical Computer Science. 2004. https://edoliberty.github.io/datamining2011aFiles/FindingFrequentItemsInDataStreams.pdf

[1] K. Clarkson, D. Woodruff. Low-rank approximation in input sparsity time. Journal of the ACM. 2017. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.746.6409&rep=rep1&type=pdf

Template Parameters:
  • RegType – The type of register over which the functor operates

  • HashType – The hash functor type to use to define CountSketch random mappings.

  • PtrType – The type of shared pointer used to wrap individual sketch functors.

  • RangeSize – The power-of-two embedding dimension.

  • ReplicationCount – The number of replicated CountSketch transforms to use.

Public Functions

inline DoubleSparseJLT(row_transform_ptr_type row_transform_ptr, col_transform_ptr_type col_transform_ptr)

Construct a new DoubleSparseJLT Functor object by initializing hash functors.

Depending on the hash functor to be used, the effective embedding dimension (returned by this->size()) may be rounded up to the next power of two. Each insert is hashed to one row and one column location for each combination of ReplicationCount register replicas in the rows and columns of the two-sided sketch. E.g., an insert to a sketch with whose row and columns feature 4 replications will result in updating 16 total indices in the matrix data structure.

The implementation currently assumes that the row and column hashes use the same functional form, meaning that the underlying Matrix data structure will always be square.

Note

This behavior may change in the future.

Template Parameters:

Args – type(s) of additional hash parameters

Parameters:
  • row_transform_ptr – shared pointer to the row transform.

  • col_transform_ptr – shared pointer to the column transform.

template<typename ContainerType, typename ...ItemArgs>
inline constexpr void operator()(ContainerType &registers, const ItemArgs&... item_args) const

Update a register container with an observation.

Template Parameters:
  • ContainerType – The type of the register data structure.

  • ItemArgs – Types of stream item observables.

Parameters:
  • registers – The register container.

  • item_args – Stream item observables.

inline constexpr std::uint64_t seed() const

Get the random seed.

Public Static Functions

static inline constexpr std::size_t range_size()

Get the maximum number of range values returnable by the register hash function.

This is equivalent to the number of registers in each replicated tile in passed containers.

Returns:

constexpr std::size_t The range size.

static inline constexpr std::size_t replication_count()

Get the number of replicated CountSketch transforms.

Returns:

constexpr std::size_t The replication count.

static inline constexpr std::size_t size()

Get the total number of addressable registers across all hash functions.

This is equivalent to the range size times the number of replicated tiles.

Returns:

constexpr std::size_t The range size.

static inline constexpr std::string name()

Return a description of the transform type.

Returns:

std::string “DoubleSparseJLT”

static inline constexpr std::string full_name()

Return a description of the fully-qualified transform type.

Returns:

std::string Transform description, e.g. “DoubleSparseJLT using

MulAddShift hashes and 4 byte registers”

Public Static Attributes

static constexpr RegType scaling_factor = row_transform_type::scaling_factor * col_transform_type::scaling_factor

Get the scaling factor to be used for projections.

Return:

constexpr std::size_t The replication count.

Friends

inline friend constexpr bool operator==(const self_type &lhs, const self_type &rhs)

Check for equality between two DoubleSparseJLTs.

Parameters:
  • lhs – The left-hand functor.

  • rhs – The right-hand functor.

Returns:

true The seeds and range sizes agree.

Returns:

false The seeds or range sizes disagree.

inline friend constexpr bool operator!=(const self_type &lhs, const self_type &rhs)

Check for inequality between two DoubleSparseJLTs.

Parameters:
  • lhs – The left-hand functor.

  • rhs – The right-hand functor.

Returns:

true The seeds or range sizes disagree.

Returns:

false The seeds and range sizes agree.

inline friend std::ostream &operator<<(std::ostream &os, const self_type &func)

Serialize a transform to human-readable output stream.

Prints the space-separated range size and seed.

Parameters:
  • os – The output stream.

  • func – The functor object.

Returns:

std::ostream& The new stream state.