PrevNext
Rare
 0/2

(Optional) Hashmaps

Authors: Darren Yao, Benjamin Qi

Contributors: Neo Wang, Nathan Gong

Maintaining collections of distinct elements with hashing.

Warning!

You can almost always use ordered sets and maps instead, but it's good to know that these exist.

Resources
IUSACO

module is based off this

Hashing

Hashing refers to assigning a unique code to every variable/object which allows insertions, deletions, and searches in O(1)\mathcal{O}(1) time, albeit with a high constant factor, as hashing requires a large constant number of operations. However, as the name implies, elements are not ordered in any meaningful way, so traversals of an unordered set will return elements in some arbitrary order.

Custom Hashing

C++

There is no built in method for hashing pairs or vectors. Namely, unordered_set<vector<int>> does not work. In this case, we can use an sorted map (which supports all of the functions used in the code above) or declare our own hash function.

Resources
Mark Nelson

How to create user-defined hash function for unordered_map.

The link provides an example of hashing pairs of strings, as well as other data structures like pairs.

#include <bits/stdc++.h>
using namespace std;
typedef pair<int, int> pi;
#define f first
#define s second
struct hashPi {
size_t operator()(const pi &p) const { return p.f ^ p.s; }
};
int main() { unordered_map<pi, int, hashPi> um; }
#include <bits/stdc++.h>
using namespace std;
typedef pair<int, int> pi;
#define f first
#define s second
namespace std {
template <> struct hash<pi> {
size_t operator()(const pi &p) const { return p.f ^ p.s; }
};
} // namespace std
int main() { unordered_map<pi, int> um; }

Java

Java has its own hash functions for pre-defined objects like ArrayList's. However, a custom hash function is still needed for user-defined objects. In order to create one, we can implement the hashCode method.

Additionally, in order for HashSet's and HashMap's to work with a custom class, we must also implement the equals method.

import java.io.*;
import java.util.*;
public class HashTest {
public static void main(String[] args) {
// uses custom hash function in class Pair
Set<Pair> set = new HashSet<>();
}
static class Pair {

However, this hash function is quite bad; if we insert (0,0),(1,1),(2,2)(0,0), (1,1), (2,2) \ldots then they will all be mapped to the same bucket (so it would easily be hacked in a Codeforces contest).

C++

Java

In Java, an easy way to hash custom objects is to use the builtin Objects.hash() method. This method takes in multiple objects and uses them to create a hash code. However, this is easily hackable, as the code below shows.

import java.io.*;
import java.util.*;
public class HashTest {
public static void main(String[] args) {
// uses custom hash function in class Pair
Set<Pair> set = new HashSet<>();
for (int i = 0; i < 100000; ++i) {
// Pair p = new Pair(i, 0); // ~0.2s on ide.usaco.guide
Pair p = new Pair(i, -31 * i); // >5s

Python

A better way to hash pairs would be to use polynomial hashing with a randomized base (as described in this module).

Anti-Hash Tests

The built-in hashing algorithm for integers in C++ is vulnerable to pathological tests, causing abnormally slow runtimes. We describe the issue below and how to fix it. Java users are not affected (see this comment).

Should I worry about anti-hash tests in USACO?

No - historically, no USACO problem has included an anti-hash test. However, these sorts of tests often appear in Codeforces, especially in educational rounds, where open hacking is allowed.

Resources
CF

Explanation of this problem and how to fix it.

Hack Case Generator

In order to make your unordered_map unhackable, the hash function HH you select must have the following property: it is hard to find integers xx and yy such that xyx\neq y but H(x)H(y)(modP)H(x)\equiv H(y) \pmod{P}, where PP is the prime modulus corresponding to the number of buckets in your unordered map. Note that:

  • The default hash function, H(x)=xH(x)=x, obviously does not satisfy this property, because you can simply choose x=0x=0 and y=Py=P.
  • If open hacking is allowed, then any deterministic hash function does not satisfy this property, because a hacker could generate a test case specifically to break your solution. You should introduce randomness to ensure that your hash function changes from run to run.

Here is a drop-in replacement for unordered_map that seems to work well in practice:

Resources
Benq (from KACTL)
struct chash {
// any random-ish large odd number will do
const uint64_t C = uint64_t(2e18 * PI) + 71;
// random 32-bit number
const uint32_t RANDOM = chrono::steady_clock::now().time_since_epoch().count();
size_t operator()(uint64_t x) const {
// see https://gcc.gnu.org/onlinedocs/gcc/Other-Builtins.html
return __builtin_bswap64((x ^ RANDOM) * C);
}
};
template <class K, class V> using cmap = unordered_map<K, V, chash>;
// example usage: cmap<int, int>

A Faster HashMap in C++

In C++, it's usually faster to use gp_hash_table<K, V> instead of unordered_map<K, V>. Read / writes are much faster than unordered_map. The documentation is rather confusing, so I'll just summarize the most useful functions here. If you need a replacement for unordered_set<K>, use gp_hash_table<K, null_type>.

Resources
CF

Introduces gp_hash_table

GCC

documentation

Benq (from KACTL)

Custom Hashing

Like unordered_map, gp_hash_table is vulernable to hash collisions if a custom hash function isn't used. We recommend using the same custom hash we used for unordered map above:

#include <ext/pb_ds/assoc_container.hpp>
using namespace __gnu_pbds;
gp_hash_table<int, int, chash> table;

Resizing

Unordered map has reserve. Calling this function before inserting any elements can result in a constant factor speedup.

We can modify the declaration of gp_hash_table so that it supports the resize function, which operates similarly.

template <class K, class V>
using ht =
gp_hash_table<K, V, hash<K>, equal_to<K>, direct_mask_range_hashing<>,
linear_probe_fn<>,
hash_standard_resize_policy<hash_exponential_size_policy<>,
hash_load_check_resize_trigger<>, true>>;

These are the same template arguments as the default gp_hash_table, except false has been changed to true. This modification allows us to change the actual size of the hash table.

int main() {
ht<int, null_type> g;
g.resize(5);
cout << g.get_actual_size() << "\n"; // 8
cout << g.size() << "\n"; // 0
}

When calling g.resize(x), x is rounded up to the nearest power of 2. Then the actual size of g is changed to be equal to x (unless x < g.size(), in which case an error is thrown).

Resources
GCC

documentation

Furthermore, if we construct g with the following arguments:

ht<int, null_type> g({}, {}, {}, {}, {1 << 16});

then the actual size of g is always at least 1<<16 (regardless of calls to resize). The last argument must be a power of 2 (or else errors will be thrown).

Solving 3SUM

Focus Problem – try your best to solve this problem before continuing!

View Internal Solution

Since all the values are quite small, you can use an array instead of a hashmap. But if you didn't read the constraints carefully enough, you're in luck!

Warning!

This code passes in 1180ms when initial capacity is specified, and TLEs when it is not.

#include <bits/stdc++.h> // see C++ Tips & Tricks
using namespace std;
using ll = long long;
using vi = vector<int>;
#define pb push_back
#define rsz resize
#define all(x) begin(x), end(x)
#define sz(x) (int)(x).size()

Problems

StatusSourceProblem NameDifficultyTags
CSESNormal
Show TagsSet

Module Progress:

Join the USACO Forum!

Stuck on a problem, or don't understand a module? Join the USACO Forum and get help from other competitive programmers!

PrevNext