SparseJLT

template<typename RegType, template<std::size_t> class HashType, std::size_t RangeSize, std::size_t ReplicationCount>
class SparseJLT

A functor implementing a CountSketch-based sparse JLT on a collection of registers.

good reference for operator declarations: https://stackoverflow.com/questions/4421706/what-are-the-basic-rules-and-idioms-for-operator-overloading

Implements CountSketch using a ReplicationCount number of pairs of hash functions.

[0] M. Charikar, K. Chen, M. Farach-Colton. Finding frequent items in data streams. Theoretical Computer Science. 2004. https://edoliberty.github.io/datamining2011aFiles/FindingFrequentItemsInDataStreams.pdf

[1] K. Clarkson, D. Woodruff. Low-rank approximation in input sparsity time. Journal of the ACM. 2017. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.746.6409&rep=rep1&type=pdf

Template Parameters:
  • RegType – The type of register over which the functor operates

  • HashType – The hash functor type to use to define CountSketch random mappings.

  • RangeSize – The power-of-two embedding dimension.

  • ReplicationCount – The number of replicated CountSketch transforms to use.

Public Functions

template<typename ...Args>
inline SparseJLT(std::uint64_t seed, const Args&... args)

Construct a new SparseJLT Functor object by initializing hash functors.

Depending on the hash functor to be used, the effective embedding dimension (returned by this->size()) may be rounded up to the next power of two. Each insert is hashed to one register location per ReplicationCount register replica.

Template Parameters:

Args – type(s) of additional hash parameters

Parameters:
  • seed – The random seed.

  • args – Any additional parameters required by the hash functions.

template<typename ContainerType, typename ...ItemArgs>
inline constexpr void operator()(ContainerType &registers, const ItemArgs&... item_args) const

Update a register container with an observation.

Template Parameters:
  • ContainerType – The type of the register data structure.

  • ItemArgs – Types of stream item observables.

Parameters:
  • registers – The register container.

  • item_args – Stream item observables.

inline constexpr std::uint64_t seed() const

Get the random seed.

Public Static Functions

static inline constexpr std::size_t range_size()

Get the maximum number of range values returnable by the register hash function.

This is equivalent to the number of registers in each replicated tile in passed containers.

Returns:

constexpr std::size_t The range size.

static inline constexpr std::size_t replication_count()

Get the number of replicated CountSketch transforms.

Returns:

constexpr std::size_t The replication count.

static inline constexpr std::size_t size()

Get the total number of addressable registers across all hash functions.

This is equivalent to the range size times the number of replicated tiles.

Returns:

constexpr std::size_t The range size.

static inline constexpr std::string name()

Return a description of the transform type.

Returns:

std::string “SparseJLT”

static inline constexpr std::string full_name()

Return a description of the fully-qualified transform type.

Returns:

std::string Transform description, e.g. “SparseJLT using

MulAddShift hashes and 4 byte registers”

Public Static Attributes

static constexpr RegType scaling_factor = std::sqrt((RegType)replication_count())

Get the scaling factor to be used for projections.

Return:

constexpr std::size_t The replication count.

Friends

inline friend constexpr bool operator==(const self_type &lhs, const self_type &rhs)

Check for equality between two SparseJLTs.

Parameters:
  • lhs – The left-hand functor.

  • rhs – The right-hand functor.

Returns:

true The seeds and range sizes agree.

Returns:

false The seeds or range sizes disagree.

inline friend constexpr bool operator!=(const self_type &lhs, const self_type &rhs)

Check for inequality between two SparseJLTs.

Parameters:
  • lhs – The left-hand functor.

  • rhs – The right-hand functor.

Returns:

true The seeds or range sizes disagree.

Returns:

false The seeds and range sizes agree.

inline friend std::ostream &operator<<(std::ostream &os, const self_type &func)

Serialize a transform to human-readable output stream.

Prints the space-separated range size and seed.

Parameters:
  • os – The output stream.

  • func – The functor object.

Returns:

std::ostream& The new stream state.