Dark Mode On/Off

Interactive Learning

C Language course

GO Lang course

Learn JavaScript

Learn HTML

Learn CSS

C Language

C Tutorial

C Programs (100+)

C Compiler

Execute C programs online.

C++ Language

C++ Tutorial

Standard Template Library

C++ Programs (100+)

C++ Compiler

Execute C++ programs online.

Python

Python Tutorial

Python Projects

Python Programs

Python How Tos

Numpy Module

Matplotlib Module

Tkinter Module

Network Programming with Python

Learn Web Scraping

What is Hashing?

Technology #Hashing#Data Structures#Algorithms

Hashing is a technique used for storing and retrieving information as quickly as possible. It allows performing optimal searches and is useful in implementing hash tables.

As part of Hashing a string is transformed into a shorter, fixed-length value(or key) which represents the original string. In general, when we store strings in a database and then have to search those strings, character by character comparison is carried out on each string until a match is found, this is a very cumbersome process.

But what if we generate a unique 4 digit number for each string using some conversion function and then whenever we are asked to search for any string, we can convert the string to be searched into the 4 digit number and then look for that number. This way, the search will be very fast.

Hashing provides a way to perform insert, delete and search operation in O(1) average time, which although can reach O(n) in the worst cases. When we use hashing, we use a function to generate a hash code for the data to be inserted and then whenever we have to search that data, we again use the same function to generate the unique hash code and can easily find the data.

Why Hashing?

Let's take an example to understand the need for hashing. Let's say we have a table in our database in which we store names of students like:

John Wick
John Rambo
Tony Stark
Steve Martin
Nicki Lauda
John Travolta, etc.

If the table has 1 million such name entries, then whenever we have to search for any name, names are searched by matching characters one by one which is time-consuming. But what if we create a function which can produce a unique 7 digit number for every name. For example,

John Wick - 0089761
John Rambo - 0089651
Tony Stark - 0912321, etc

Now, if someone has to search for 'Tony Stark', rather than searching strings, we can convert the name into numeric 7 digit code and can then look for that code which would be multifold faster than string search.

The table which stores the keys and their respective hashed value is termed as a hash table, and the function which is used to generate the hash keys is called a hash function.

Hashing is not only useful for searching and indexing but this technique is also used for encryption.

Hash Function

It is used to transform a string into a hash key. Ideally, the hash function should map each string to a unique hash key, but it is indeed very difficult to achieve it practically.

One way to create such a perfect hash function is to have the size of the hash table so large that it can accommodate all the hash keys in it. But then there will be no point of calling it hashing, and memory will be wasted too as most of the keys aren't required frequently.

Our goal is to choose a hash function that quickly calculates a hash value, evenly distributes the keys into the hash table and causes a minimum number of collisions.

What is a Collision?

The condition in which for two records the same hash key is generated.

Collision Resolution Techniques

Finding an alternate location for the hashed key is called collision resolution. There are a number of such techniques:

1. Direct Chaining: Array of linked list implementation.

Separate chaining

2. Open Addressing: Array-based implementation.

Linear Probing
Quadratic Probing
Double Hashing

Separate Chaining (Open Hashing)

When two or more string hash to the same location(hash key), we can append them into a singly linked list (called chain) pointed by the hash key.

It is called open hashing because it uses extra memory to resolve collision.

Separate chaining in hashing collision prevention

Linear Probing

Search the hash table sequentially starting from the original hash location. If we reach the end of the table, wrap up from last to the first location using the following function :

Rehash(datastring) = (id + stepsize) % TableSize

The major disadvantage of linear probing is the clustering of keys together in a consecutive pattern. This results in one part being very dense, while other parts having few items. It causes long probe searches.

Quadratic Probing

Start from the original hash location, and check for locations id + 12, id + 22, id + 32,.. and so on.

Wrap around the table if the end is reached.

The function is:

Rehash(key) = (id + k2) % TableSize

Clustering may also occur in the case of quadratic probing.

Double Hashing

The interval between probes is calculated using another hash function, hence the name double hashing.

It reduces clustering effectively.

Rehash(key) = (id + k * ID) % tableSize

where ID is the value calculated by the first hash function.

C TUTORIAL

C PROGRAMS

INTERVIEW TESTS

EXECUTE CODE

C++ TUTORIAL

C++ PROGRAMS

INTERVIEW TESTS

EXECUTE CODE

PYTHON TUTORIAL

PYTHON HOW TOS

INTERVIEW TESTS

EXECUTE CODE

JAVA TUTORIAL

JAVA CODE EXAMPLES

SPRING TUTORIAL

MORE IN JAVA

COMPUTER ARCHITECTURE

COMPUTER NETWORK

OPERATING SYSTEM

DBMS & SQL

PL/SQL

MongoDB

EXECUTE SQL

ANDROID DEVELOPMENT

GO LANGUAGE

LINUX

DOCKER

HTML TAGS (A to Z)

CSS REFERENCES

SASS/SCSS

KOTLIN

GAME DEVELOPMENT

PHP

GIT GUIDE

JAVASCRIPT

ADVANCED DSA

What is Hashing?

Table of Contents

Why Hashing?

Hash Function

What is a Collision?

Collision Resolution Techniques

Separate Chaining (Open Hashing)

Linear Probing

Quadratic Probing

Double Hashing

You may also like:

IF YOU LIKE IT, THEN SHARE IT

RELATED POSTS