Data Modelling in MongoDB

As explained in the earlier lessons, data in MongoDB is schema-less, which means there is no need of defining a structure for the data before insertion. Since, MongoDB is a document based database, any document within the same collection is not mandatory to have same set of fields or structure. This helps in easily mapping the documents with the entity objects. In general, the documents in a collection of MongoDB will always share the same data structure(recommended for best performance, not mandatory).

The vital factor or challenge in modelling the data in a given database is load balancing and hence ensuring the performance aspect effectively. It is a mandate while modelling the data to consider the complete usage of data (CRUD operations) along with how the data will be inherited as well.

There are 2 ways in which the relationships between the data can be established in MongoDB:

  • Reference Documents
  • Embedded Documents

Referenced Documents

This is one of the ways to implement the relationship between data stored in different collections. In this, a reference to the data in one collection will be used in connecting the data between the collections.

Consider 2 collections books and authors as shown below:

{
   title: "Java in action",
   author: "author1",
   language: "English",
   publisher: {
              name: "My publications",
              founded: 1990,
              location: "SF"
            }
}

{
   title: "Hibernate in action",
   author: "author2",
   language: "English",
   publisher: {
              name: "My publications",
              founded: 1990,
              location: "SF"
            }
}

In the above example, the publisher data is repeated. In order to avoid this repetition, we can add references of the book to the publisher data instead of using entire data of publisher in every book entry, as shown below:

{
   name: "My Publciations",
   founded: 1980,
   location: "CA",
   books: [111222333, 444555666, ..]
}

{
    _id: 111222333,
    title: "Java in action",
    author: "author1",
    language: "English"
}

{
   _id: 444555666,
   title: "Hibernate in action",
   author: "author2",
   language: "English"
}

This can be done the other way round as well, where in one can reference the publisher id in the books data, your choice.


Embedded Documents

In this, one collection will be embedded into another collection. Consider 2 collections student and address. Let us see how the address can be embedded into student collection. Below is an example embedding single address to student data.

{
   _id: 123,
   name: "Student1"
}

{
   _studentId: 123,
   street: "123 Street",
   city: "Bangalore",
   state: "KA"
}

{
   _studentId: 123,
   street: "456 Street",
   city: "Punjab",
   state: "HR"
}

Embedding multiple addresses can also be done. See the below example :

{
   _id: 123,
   name: "Student1"
   addresses: [
      	{
             	street: "123 Street",
   		city: "Bangalore",
   		state: "KA"
        },

        {
   		street: "456 Street",
   		city: "Punjab",
   		state: "HR"
        }
    ]
}

Example of Data Modelling in MongoDB

Let us consider a simple example of building a student database in a college. Assume there are 3 models – Student, Address and Course. In a typical RDBMS database, these 3 models will be translated into 3 tables as shown below:

Data Modelling in RDBMS

Hence, from the above model, if a student details has to be added, then entries should be made in all the 3 tables!!

Let us see, how the same data can be modelled in MongoDB. In MongoDB, schema design will have only collection one Student and will have the below structure.

{
    _id: 123,
    firstName: 'Test',
    lastName: 'Student',
    address :[{
            City: 'Bangalore',
            State: 'Karnataka',
            Country: 'India'
        }
    ],
    Course: 'MCA'
}

In MongoDB, data related to all the 3 models will be shown under one Collection !!

MongoDB provides with multiple ways of modelling your data. Now you know how to do that.

NOTE : Fieldnames in a collection like firstName and lastName etc in above examples also use memory, may 10-20 bytes or so. But when the dataset is very large, this can add up to a lot of memory. Hence it is adviced, for large datasets, use short fieldnames to store data in collections, like fname instead of firstName.