DoubleSparseJLT
-
template<typename RegType, template<std::size_t> class HashType, template<typename> class PtrType, std::size_t RangeSize, std::size_t ReplicationCount>
class DoubleSparseJLT A functor implementing a CountSketch-based sparse JLT on each side of a matrix of registers.
Implements CountSketch using a
ReplicationCountnumber of pairs of hash functions. Currently assumes that both sides of the matrix are to be projected to the same dimension.[0] M. Charikar, K. Chen, M. Farach-Colton. Finding frequent items in data streams. Theoretical Computer Science. 2004. https://edoliberty.github.io/datamining2011aFiles/FindingFrequentItemsInDataStreams.pdf
[1] K. Clarkson, D. Woodruff. Low-rank approximation in input sparsity time. Journal of the ACM. 2017. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.746.6409&rep=rep1&type=pdf
- Template Parameters:
RegType – The type of register over which the functor operates
HashType – The hash functor type to use to define CountSketch random mappings.
PtrType – The type of shared pointer used to wrap individual sketch functors.
RangeSize – The power-of-two embedding dimension.
ReplicationCount – The number of replicated CountSketch transforms to use.
Public Functions
-
inline DoubleSparseJLT(row_transform_ptr_type row_transform_ptr, col_transform_ptr_type col_transform_ptr)
Construct a new DoubleSparseJLT Functor object by initializing hash functors.
Depending on the hash functor to be used, the effective embedding dimension (returned by
this->size()) may be rounded up to the next power of two. Each insert is hashed to one row and one column location for each combination of ReplicationCount register replicas in the rows and columns of the two-sided sketch. E.g., an insert to a sketch with whose row and columns feature 4 replications will result in updating 16 total indices in the matrix data structure.The implementation currently assumes that the row and column hashes use the same functional form, meaning that the underlying Matrix data structure will always be square.
Note
This behavior may change in the future.
- Template Parameters:
Args – type(s) of additional hash parameters
- Parameters:
row_transform_ptr – shared pointer to the row transform.
col_transform_ptr – shared pointer to the column transform.
-
template<typename ContainerType, typename ...ItemArgs>
inline constexpr void operator()(ContainerType ®isters, const ItemArgs&... item_args) const Update a register container with an observation.
- Template Parameters:
ContainerType – The type of the register data structure.
ItemArgs – Types of stream item observables.
- Parameters:
registers – The register container.
item_args – Stream item observables.
-
inline constexpr std::uint64_t seed() const
Get the random seed.
Public Static Functions
-
static inline constexpr std::size_t range_size()
Get the maximum number of range values returnable by the register hash function.
This is equivalent to the number of registers in each replicated tile in passed containers.
- Returns:
constexpr std::size_t The range size.
-
static inline constexpr std::size_t replication_count()
Get the number of replicated CountSketch transforms.
- Returns:
constexpr std::size_t The replication count.
-
static inline constexpr std::size_t size()
Get the total number of addressable registers across all hash functions.
This is equivalent to the range size times the number of replicated tiles.
- Returns:
constexpr std::size_t The range size.
-
static inline constexpr std::string name()
Return a description of the transform type.
- Returns:
std::string “DoubleSparseJLT”
-
static inline constexpr std::string full_name()
Return a description of the fully-qualified transform type.
- Returns:
std::string Transform description, e.g. “DoubleSparseJLT using
MulAddShift hashes and 4 byte registers”
Public Static Attributes
Friends
-
inline friend constexpr bool operator==(const self_type &lhs, const self_type &rhs)
Check for equality between two DoubleSparseJLTs.
- Parameters:
lhs – The left-hand functor.
rhs – The right-hand functor.
- Returns:
true The seeds and range sizes agree.
- Returns:
false The seeds or range sizes disagree.
-
inline friend constexpr bool operator!=(const self_type &lhs, const self_type &rhs)
Check for inequality between two DoubleSparseJLTs.
- Parameters:
lhs – The left-hand functor.
rhs – The right-hand functor.
- Returns:
true The seeds or range sizes disagree.
- Returns:
false The seeds and range sizes agree.
-
inline friend std::ostream &operator<<(std::ostream &os, const self_type &func)
Serialize a transform to human-readable output stream.
Prints the space-separated range size and seed.
- Parameters:
os – The output stream.
func – The functor object.
- Returns:
std::ostream& The new stream state.