SparseJLT
-
template<typename RegType, template<std::size_t> class HashType, std::size_t RangeSize, std::size_t ReplicationCount>
class SparseJLT A functor implementing a CountSketch-based sparse JLT on a collection of registers.
good reference for operator declarations: https://stackoverflow.com/questions/4421706/what-are-the-basic-rules-and-idioms-for-operator-overloading
Implements CountSketch using a
ReplicationCountnumber of pairs of hash functions.[0] M. Charikar, K. Chen, M. Farach-Colton. Finding frequent items in data streams. Theoretical Computer Science. 2004. https://edoliberty.github.io/datamining2011aFiles/FindingFrequentItemsInDataStreams.pdf
[1] K. Clarkson, D. Woodruff. Low-rank approximation in input sparsity time. Journal of the ACM. 2017. http://citeseerx.ist.psu.edu/viewdoc/download?doi=10.1.1.746.6409&rep=rep1&type=pdf
- Template Parameters:
RegType – The type of register over which the functor operates
HashType – The hash functor type to use to define CountSketch random mappings.
RangeSize – The power-of-two embedding dimension.
ReplicationCount – The number of replicated CountSketch transforms to use.
Public Functions
-
template<typename ...Args>
inline SparseJLT(std::uint64_t seed, const Args&... args) Construct a new SparseJLT Functor object by initializing hash functors.
Depending on the hash functor to be used, the effective embedding dimension (returned by
this->size()) may be rounded up to the next power of two. Each insert is hashed to one register location per ReplicationCount register replica.- Template Parameters:
Args – type(s) of additional hash parameters
- Parameters:
seed – The random seed.
args – Any additional parameters required by the hash functions.
-
template<typename ContainerType, typename ...ItemArgs>
inline constexpr void operator()(ContainerType ®isters, const ItemArgs&... item_args) const Update a register container with an observation.
- Template Parameters:
ContainerType – The type of the register data structure.
ItemArgs – Types of stream item observables.
- Parameters:
registers – The register container.
item_args – Stream item observables.
-
inline constexpr std::uint64_t seed() const
Get the random seed.
Public Static Functions
-
static inline constexpr std::size_t range_size()
Get the maximum number of range values returnable by the register hash function.
This is equivalent to the number of registers in each replicated tile in passed containers.
- Returns:
constexpr std::size_t The range size.
-
static inline constexpr std::size_t replication_count()
Get the number of replicated CountSketch transforms.
- Returns:
constexpr std::size_t The replication count.
-
static inline constexpr std::size_t size()
Get the total number of addressable registers across all hash functions.
This is equivalent to the range size times the number of replicated tiles.
- Returns:
constexpr std::size_t The range size.
-
static inline constexpr std::string name()
Return a description of the transform type.
- Returns:
std::string “SparseJLT”
-
static inline constexpr std::string full_name()
Return a description of the fully-qualified transform type.
- Returns:
std::string Transform description, e.g. “SparseJLT using
MulAddShift hashes and 4 byte registers”
Public Static Attributes
-
static constexpr RegType scaling_factor = std::sqrt((RegType)replication_count())
Get the scaling factor to be used for projections.
- Return:
constexpr std::size_t The replication count.
Friends
-
inline friend constexpr bool operator==(const self_type &lhs, const self_type &rhs)
Check for equality between two SparseJLTs.
- Parameters:
lhs – The left-hand functor.
rhs – The right-hand functor.
- Returns:
true The seeds and range sizes agree.
- Returns:
false The seeds or range sizes disagree.
-
inline friend constexpr bool operator!=(const self_type &lhs, const self_type &rhs)
Check for inequality between two SparseJLTs.
- Parameters:
lhs – The left-hand functor.
rhs – The right-hand functor.
- Returns:
true The seeds or range sizes disagree.
- Returns:
false The seeds and range sizes agree.
-
inline friend std::ostream &operator<<(std::ostream &os, const self_type &func)
Serialize a transform to human-readable output stream.
Prints the space-separated range size and seed.
- Parameters:
os – The output stream.
func – The functor object.
- Returns:
std::ostream& The new stream state.