Mastering Hashing: Finding Duplicates, HashMaps, and HashSets

If you've ever been curious about how search engines deliver results so swiftly or how databases handle vast amounts of data so effectively, the answer often lies in hashing. This robust technique is a staple in computer science, particularly in problem-solving and system design.

In this issue of the Nullpointer Club newsletter, we cover the basics of hashing, how to use HashMaps and HashSets, and how to detect duplicates efficiently. We also walk through common hashing questions and share tips to help you upskill.

What is Hashing?

At its core, hashing transforms data of any size into a fixed-size value (a hash code) that can be stored and looked up efficiently. The transformation is performed by a hash function, which always maps the same input to the same hash code. Hash codes are not guaranteed to be unique: two different inputs can produce the same code, which is called a collision. With a good hash function, lookups, insertions, and deletions in a hash table run in constant time (O(1)) on average, which makes hashing a common choice for improving performance.
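To make the idea concrete, here is a minimal sketch of how a hash code can be reduced to a position in a table. The helper name `bucket_index` and the bucket count of 8 are assumptions for illustration only; real hash tables use more elaborate schemes.

```python
def bucket_index(key, num_buckets=8):
    """Map a key to a bucket by reducing its hash code modulo the bucket count."""
    return hash(key) % num_buckets

# Small integers hash to themselves in CPython, so these results are stable.
print(bucket_index(5))   # 5
print(bucket_index(10))  # 2
print(bucket_index(13))  # 5 (different key, same bucket as 5: a collision)
```

Note that Python randomizes string hashes per process, so the bucket for a string key can change between runs.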

Common uses of hashing include:

  • Fast data retrieval (e.g., caches, databases, hash tables)

  • Duplicate detection (e.g., checking if an element already exists)

  • Cryptography (e.g., password hashing, data integrity checks)

  • Load balancing and distributed systems

Understanding HashMaps and HashSets

1. HashMap:

A HashMap (called a dictionary, or dict, in Python) is a data structure that stores key-value pairs and allows fast lookups, insertions, and deletions by key.

Operations:

  • Insert a key-value pair → O(1) on average

  • Retrieve a value using a key → O(1) on average

  • Remove a key → O(1) on average

Common Use Cases:

  • Caching frequently accessed data

  • Counting occurrences of elements

  • Implementing adjacency lists in graphs

Example:

hash_map = {}
hash_map["name"] = "Alice"
print(hash_map["name"])  # Output: Alice
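The "counting occurrences" use case listed above can be sketched with a plain dictionary. The function name `count_occurrences` is a hypothetical helper, not something from a library.

```python
def count_occurrences(items):
    """Return a dict mapping each item to the number of times it appears."""
    counts = {}
    for item in items:
        # get() returns 0 the first time we see an item
        counts[item] = counts.get(item, 0) + 1
    return counts

print(count_occurrences(["a", "b", "a", "c", "a"]))  # {'a': 3, 'b': 1, 'c': 1}
```

Python's standard library also offers `collections.Counter`, which does the same job in one call.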

2. HashSet:

A HashSet is a data structure that stores unique elements without maintaining any particular order.

Operations:

  • Insert an element → O(1) on average

  • Check if an element exists → O(1) on average

  • Remove an element → O(1) on average

Common Use Cases:

  • Removing duplicates from a list

  • Checking membership efficiently

  • Performing set operations like unions and intersections

Example:

hash_set = set()
hash_set.add(5)
hash_set.add(10)
hash_set.add(5)  # Duplicate, won't be added
print(sorted(hash_set))  # Output: [5, 10] (sets are unordered, so sort for a stable display)
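The other use cases above (removing duplicates, membership checks, unions and intersections) can be sketched the same way. The helper name `remove_duplicates` and the sample values are assumptions for illustration.

```python
def remove_duplicates(items):
    """Return the unique items in sorted order (a set alone has no defined order)."""
    return sorted(set(items))

a = {1, 2, 3}
b = {3, 4}

print(remove_duplicates([3, 1, 3, 2, 1]))  # [1, 2, 3]
print(2 in a)                              # True
print(sorted(a | b))                       # [1, 2, 3, 4] (union)
print(sorted(a & b))                       # [3] (intersection)
```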

Finding Duplicates Using Hashing

A common problem in coding interviews is detecting duplicate elements in a list efficiently. Instead of comparing every pair with a nested loop (O(n²) time), we can use a HashSet to solve it in O(n) time on average, at the cost of O(n) extra space.

Example:

def find_duplicates(arr):
    seen = set()
    duplicates = set()
    for num in arr:
        if num in seen:
            duplicates.add(num)
        else:
            seen.add(num)
    return duplicates

arr = [1, 2, 3, 4, 2, 5, 3]
print(find_duplicates(arr))  # Output: {2, 3}
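If you only need to know whether any duplicate exists, a variant that stops at the first repeat avoids scanning the rest of the list. This is a sketch under that assumption; `has_duplicate` is a hypothetical name, and the elements must be hashable.

```python
def has_duplicate(arr):
    """Return True as soon as a repeated element is found, else False."""
    seen = set()
    for num in arr:
        if num in seen:
            return True  # early exit: no need to look further
        seen.add(num)
    return False

print(has_duplicate([1, 2, 3, 4, 2, 5, 3]))  # True
print(has_duplicate([1, 2, 3]))              # False
```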

Common Hashing Questions

If you want to sharpen your skills, here are some hashing-related problems you should master:

  1. Find the first non-repeating character in a string
    Use a HashMap to store character frequencies and return the first one with count 1.

  2. Check if two arrays have common elements
    Store elements of one array in a HashSet and check for membership in the second array.

  3. Find the most frequent element in an array
    Use a HashMap to count occurrences and track the max count.

  4. Two Sum Problem
    Use a HashMap to check if (target - current number) exists before inserting the current number.

  5. Group Anagrams
    Use a HashMap with sorted strings as keys and lists of words as values.
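The approaches described above can be sketched as follows. These are minimal illustrations, not the only valid solutions; all function names are hypothetical, and returning `None` when nothing is found is an assumption.

```python
from collections import Counter, defaultdict

def first_non_repeating(s):
    """Return the first character that appears exactly once, or None."""
    counts = Counter(s)
    for ch in s:
        if counts[ch] == 1:
            return ch
    return None

def have_common_element(a, b):
    """Store one array in a set, then test membership for the other."""
    seen = set(a)
    return any(x in seen for x in b)

def most_frequent(arr):
    """Count occurrences while tracking the element with the highest count."""
    counts = {}
    best, best_count = None, 0
    for x in arr:
        counts[x] = counts.get(x, 0) + 1
        if counts[x] > best_count:
            best, best_count = x, counts[x]
    return best

def two_sum(nums, target):
    """Return indices of two numbers that add up to target, or None."""
    index_of = {}
    for i, n in enumerate(nums):
        # Check for the complement before inserting the current number
        if target - n in index_of:
            return index_of[target - n], i
        index_of[n] = i
    return None

def group_anagrams(words):
    """Group words whose sorted letters are identical."""
    groups = defaultdict(list)
    for w in words:
        groups["".join(sorted(w))].append(w)
    return list(groups.values())

print(first_non_repeating("swiss"))          # w
print(have_common_element([1, 2, 3], [4, 3]))  # True
print(most_frequent([1, 3, 3, 2, 3, 1]))     # 3
print(two_sum([2, 7, 11, 15], 9))            # (0, 1)
print(group_anagrams(["eat", "tea", "tan", "ate", "nat", "bat"]))
# [['eat', 'tea', 'ate'], ['tan', 'nat'], ['bat']]
```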

Tips to Ace Hashing Questions

  • Know your data structures – Understand when to use HashMaps vs. HashSets.

  • Be mindful of hash collisions – When many keys land in the same bucket, operations that are O(1) on average can degrade toward O(n) in the worst case.

  • Optimize space usage – Hashing can use extra memory, so consider alternative approaches when necessary.

  • Practice common problems – The best way to master hashing is through hands-on problem-solving on platforms like LeetCode, CodeSignal, and HackerRank.
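To see what a collision-heavy workload looks like, here is a small sketch using a deliberately bad key type. `BadKey` is a hypothetical class whose hash is constant, so every instance collides; it exists only to illustrate the tip above.

```python
class BadKey:
    """Hypothetical key type whose hash is constant, so every instance collides."""

    def __init__(self, value):
        self.value = value

    def __hash__(self):
        return 1  # every key gets the same hash code

    def __eq__(self, other):
        return isinstance(other, BadKey) and self.value == other.value

table = {BadKey(i): i for i in range(100)}

print(len(table))         # 100: all keys are kept despite colliding
print(table[BadKey(42)])  # 42: lookups stay correct, they just need more comparisons
```

The dictionary still returns correct results because it falls back on equality checks, but each lookup may have to compare against many colliding keys instead of jumping straight to the right one.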

Hashing makes searching and storing data far more efficient, and it is at the heart of many interview problems. With a good grasp of HashMaps, HashSets, and duplicate detection, you'll have skills that matter for both system design and everyday programming.

We hope you found this guide helpful! Do you have any favorite hashing problems or challenges you'd like to talk about? Just reply to this email, and let’s learn together!


- The Nullpointer Club Team
