
Understanding the Chat SDK data schema

bensmiley edited this page Mar 24, 2017 · 4 revisions

Summary

This article talks about the following:

  • The difference between SQL and no-SQL databases
  • An explanation of how relationships are formed in a no-SQL database
  • The motivation behind the Chat SDK schema
  • Core Chat SDK entities
  • A full annotated example of the Chat SDK schema
  • How data is updated in real-time using Firebase

Relational database vs no-SQL database

Firebase uses a no-SQL database to store data on the backend. In this document, I'll explain the data schema used by the Chat SDK and some of the design decisions that led us to that data layout.

There are a number of differences between relational and no-SQL databases that are important to understand.

Relational databases consist of a series of tables that are linked together using keys.

In the example above, there is a one-to-many relationship between School and Student: each school has many students, and each student belongs to one school. Relational databases have the following properties:

  • Rigid structure
  • Pre-determined schema
  • Queried and manipulated using SQL
  • Relationships are defined at the database level
  • Data duplication avoided at all costs
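To make the key-based link concrete, here is a minimal sketch that models two tables as arrays of rows. The table, column and school names (`schools`, `students`, `schoolId`, "Hillside") are assumptions for illustration, since the original example isn't reproduced here. The lookup matches a foreign key against a primary key, which is what a SQL `JOIN` does for you.

```javascript
// Hypothetical School and Student tables, modelled as arrays of rows.
// Each student row stores a foreign key (schoolId) pointing at a school row.
const schools = [
    { id: 1, name: "Hillside" },
    { id: 2, name: "Riverside" },
];

const students = [
    { id: 10, name: "Alice", schoolId: 1 },
    { id: 11, name: "Bob", schoolId: 1 },
    { id: 12, name: "Carol", schoolId: 2 },
];

// Roughly: SELECT s.name FROM students s JOIN schools sc ON s.schoolId = sc.id WHERE sc.name = ?
function studentsAtSchool(schoolName) {
    const school = schools.find((sc) => sc.name === schoolName);
    if (!school) return [];
    return students
        .filter((s) => s.schoolId === school.id)
        .map((s) => s.name);
}

console.log(studentsAtSchool("Hillside")); // [ 'Alice', 'Bob' ]
```

Each fact (a school's name) is stored exactly once; the relationship lives entirely in the `schoolId` column.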

A no-SQL database looks more like a big JSON object. Data is stored in a branch-like structure where every item is either the sibling or the child of another item.

No-SQL databases have the following properties:

  • Flexible structure
  • No pre-determined schema
  • Queried and manipulated using code
  • Relationships are defined by the app's business logic
  • Data duplication tolerated when it makes processing easier
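For contrast, the same school data might be stored in a no-SQL tree like the sketch below. The layout is illustrative, not taken from the original article. The one-to-many relationship becomes parent/child nesting, and the school name is duplicated onto each student so a student can be displayed without a second lookup.

```javascript
// Hypothetical no-SQL layout of the School/Student data as one nested object.
// Students live under their school, so the one-to-many link is implied by nesting.
const db = {
    schools: {
        1: {
            name: "Hillside",
            students: {
                10: { name: "Alice", schoolName: "Hillside" }, // duplicated on purpose
                11: { name: "Bob", schoolName: "Hillside" },
            },
        },
    },
};

// There is no query language: the app walks the tree in code.
function studentNames(schoolId) {
    const school = db.schools[schoolId];
    // A missing branch simply means "no data"; there is no schema to violate.
    if (!school || !school.students) return [];
    return Object.values(school.students).map((s) => s.name);
}

console.log(studentNames(1)); // [ 'Alice', 'Bob' ]
```

The trade-off is visible in `schoolName`: renaming the school now means updating every student as well, which the app's code must take care of.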

The main reason to use a no-SQL database is that it is much easier to scale out across many servers. Synchronising data between multiple servers presents fewer technical challenges than it does for a relational database.

Chat SDK Schema

To start off, let's look at a simplified example of the Chat SDK's schema and then build up gradually to a full example.

The Chat SDK has three main entities:

  • User
  • Thread
  • Message

There is a many-to-many relationship between user and thread and a one-to-many relationship between thread and message.

users: {
    1: {
        name: "John"
    },
    2: {
        name: "Simon"
    }
},

threads: {
    1: {
        name: "Group Chat",
        messages: {
            1: {
                text: "Hey Guys!"
            },
            2: {
                text: "What's up?"
            }
        }
    }
}

This example has two main branches: users and threads. Each of these branches contains a number of sub-branches, which are indexed by an id.

Inside the threads branch we have one thread (id = 1) which has name and messages properties. The messages property contains two message objects which are indexed by their id.

You can also see the one-to-many relationship between the threads and messages. The messages are defined directly as a child property of the thread object.
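Because the messages are children of the thread, reading or adding messages only touches the thread's own branch. Below is a minimal sketch over a plain-object copy of the data above. The functions `messageTexts` and `addMessage` are hypothetical, and the sequential ids are an assumption made for readability (Firebase itself generates unique keys with `push()`).

```javascript
// In-memory copy of the threads branch from the example above.
const threads = {
    1: {
        name: "Group Chat",
        messages: {
            1: { text: "Hey Guys!" },
            2: { text: "What's up?" },
        },
    },
};

// Read: a thread's messages are just the children of threads/<id>/messages.
function messageTexts(threadId) {
    const messages = threads[threadId]?.messages ?? {};
    return Object.values(messages).map((m) => m.text);
}

// Write: adding a message means adding a new child under the same branch.
// Sequential ids are a simplification; real Firebase code would usually use push() keys.
function addMessage(threadId, text) {
    if (!threads[threadId].messages) threads[threadId].messages = {};
    const messages = threads[threadId].messages;
    const ids = Object.keys(messages).map(Number);
    const nextId = ids.length ? Math.max(...ids) + 1 : 1;
    messages[nextId] = { text };
    return nextId;
}

addMessage(1, "Not much!");
console.log(messageTexts(1));
```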

One characteristic of a no-SQL database is that each piece of data has its own unique path. For example, the name of thread 1 has the path threads/1/name, and the text of message 2 has the path threads/1/messages/2/text. This is very important for understanding how Firebase observers work.

A pattern emerges in the path structure. Usually it looks like:

entity-type/entity-id/entity-property

These paths are important because they allow us to make relationships between data in our code.
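The path pattern can be expressed directly in code. Here is a minimal sketch of two hypothetical helpers (neither is a Firebase API): `entityPath` builds a path from its parts, and `valueAtPath` walks a plain object one segment at a time. It returns `null` for a missing location, mirroring what Firebase reports for an empty path.

```javascript
// Build a path following the entity-type/entity-id/entity-property pattern.
function entityPath(type, id, ...props) {
    return [type, id, ...props].join("/");
}

// Follow a slash-separated path through nested objects.
// Returns null as soon as a segment is missing.
function valueAtPath(root, path) {
    let node = root;
    for (const segment of path.split("/")) {
        if (node === null || typeof node !== "object" || !(segment in node)) {
            return null;
        }
        node = node[segment];
    }
    return node;
}

const data = {
    threads: {
        1: {
            name: "Group Chat",
            messages: { 1: { text: "Hey Guys!" }, 2: { text: "What's up?" } },
        },
    },
};

console.log(entityPath("threads", 1, "name"));                    // threads/1/name
console.log(valueAtPath(data, entityPath("threads", 1, "name"))); // Group Chat
console.log(valueAtPath(data, "threads/1/messages/2/text"));       // What's up?
```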

Many-to-Many relationships

In the example above, we can see that it's very easy to define one-to-many or parent-to-child relationships using the no-SQL database.

Defining many-to-many relationships is also possible.

users: {
    1: {
        name: "John",
        threads: [
            1,
            2
        ]
    },
    2: {
        name: "Simon",
        threads: [
            1,
            2
        ]
    },
    3: {
        name: "Jack"
    }
},

threads: {
    1: {
        name: "Group Chat",
        messages: {
            1: {
                text: "Hey Guys!"
            },
            2: {
                text: "What's up?"
            }
        },
        users: [
            1,
            2
        ]
    },
    2: {
        name: "Private Chat",
        messages: {
            1: {
                text: "Hey Guys!"
            }
        },
        users: [
            1,
            2
        ]
    }
}

This is similar to the previous example, but now each user has an array listing the threads they are a member of, and each thread has a list of its users. This relationship isn't enforced by the database, but it does make it easier for us to create and enforce the relationship in our code.
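Since the database doesn't enforce the relationship, the app has to keep both sides in step. Here is a minimal sketch of a hypothetical `addUserToThread` function over a plain object. It writes the link in both directions and avoids duplicate ids. In a real Firebase app, both writes would ideally go into a single multi-path update so they succeed or fail together.

```javascript
// Plain-object copy of part of the data above.
const data = {
    users: {
        1: { name: "John", threads: [1, 2] },
        3: { name: "Jack" }, // no threads path yet
    },
    threads: {
        2: { name: "Private Chat", users: [1, 2] },
    },
};

// Append an id to an array property, creating the array if it is missing.
function link(entity, key, id) {
    if (!entity[key]) entity[key] = [];
    if (!entity[key].includes(id)) entity[key].push(id);
}

// Hypothetical helper: record membership on both the user and the thread.
function addUserToThread(db, userId, threadId) {
    const user = db.users[userId];
    const thread = db.threads[threadId];
    if (!user || !thread) throw new Error("Unknown user or thread");
    link(user, "threads", threadId);
    link(thread, "users", userId);
}

addUserToThread(data, 3, 2);
console.log(data.users[3].threads); // [ 2 ]
console.log(data.threads[2].users); // [ 1, 2, 3 ]
```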

Note
Jack's user object doesn't have a threads path because he's not a member of any thread. This means that his data schema is actually different from the other users'. In a no-SQL database this isn't a problem: in our code, when the threads path is null, we assume that the user isn't a member of any threads.
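In code, that assumption usually comes down to treating a missing value as an empty list. A minimal sketch, using a hypothetical `threadIdsOf` function:

```javascript
// A user whose threads path is absent (null or undefined) is in no threads.
function threadIdsOf(user) {
    return user && Array.isArray(user.threads) ? user.threads : [];
}

const john = { name: "John", threads: [1, 2] };
const jack = { name: "Jack" };

console.log(threadIdsOf(john)); // [ 1, 2 ]
console.log(threadIdsOf(jack)); // []
```

Handling the missing case in one place means the rest of the code can always loop over an array.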

A practical example

Imagine that we wanted to show a list of the users that are in a particular thread. First we would use our code to query the database to get a particular thread object. We could request the path threads/2. This would return the following:

{
    name: "Private Chat",
    messages: {
        1: {
            text: "Hey Guys!"
        }
    },
    users: [
        1,
        2
    ]
}

Now to get a list of the users' names we would need to loop over the users property:

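import { getDatabase, ref, get } from "firebase/database";

// Assumes the Firebase app is already initialised and `thread` holds the object above
const db = getDatabase();
for (const userId of thread.users) {
    const snapshot = await get(ref(db, "users/" + userId));
    console.log(snapshot.val().name);
}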

Since we have a list of the user ids, we can easily get the user names from the database. The array of user ids acts like a set of links that make it easy for us to traverse the database in code.
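The loop above fetches users one at a time. Because the lookups are independent, they can also be issued concurrently. The sketch below assumes a hypothetical `fetchPath(path)` function that returns a promise of the value at a path (in a real app this could be a thin wrapper around Firebase's `get`). Here it is backed by an in-memory object so the example runs on its own.

```javascript
// In-memory stand-in for the database, so the sketch runs without Firebase.
const store = {
    users: { 1: { name: "John" }, 2: { name: "Simon" } },
    threads: { 2: { name: "Private Chat", users: [1, 2] } },
};

// Hypothetical async accessor: resolves to the value at a path, or null if empty.
async function fetchPath(path) {
    let node = store;
    for (const segment of path.split("/")) {
        node = node?.[segment];
    }
    return node ?? null;
}

// Resolve the thread's user ids ("links") into names, fetching users concurrently.
async function userNamesInThread(threadId) {
    const thread = await fetchPath("threads/" + threadId);
    if (!thread || !thread.users) return [];
    const users = await Promise.all(
        thread.users.map((userId) => fetchPath("users/" + userId)),
    );
    // Skip ids that point at users who no longer exist.
    return users.filter((u) => u !== null).map((u) => u.name);
}

userNamesInThread(2).then((names) => console.log(names)); // [ 'John', 'Simon' ]
```

`Promise.all` keeps the results in the same order as the ids, so the names line up with the thread's user list.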

In the Chat SDK, most entities look like this:

entity-type: {
    [entity id]: {
        meta: {
            // Entity meta data
        },
        children: {
            // Child entities
        },
        connections: [
            // Ids of connected entities
        ]
    },
    2 ...
}
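To see how the template applies, here is the Group Chat thread rearranged into the meta / children / connections shape, with a hypothetical helper that reads each section. Placing the messages under `children.messages` is an assumption for illustration; the exact property names used by the Chat SDK may differ.

```javascript
// Thread 1 from the earlier example, arranged as meta / children / connections.
const db = {
    users: {
        1: { meta: { name: "John" } },
        2: { meta: { name: "Simon" } },
    },
    threads: {
        1: {
            meta: { name: "Group Chat" },
            children: {
                messages: { 1: { text: "Hey Guys!" }, 2: { text: "What's up?" } },
            },
            connections: [1, 2], // ids of the users in this thread
        },
    },
};

// Hypothetical helper: summarise a thread by reading each of the three sections.
function describeThread(threadId) {
    const thread = db.threads[threadId];
    return {
        name: thread.meta.name,
        messageCount: Object.keys(thread.children.messages).length,
        members: thread.connections.map((id) => db.users[id].meta.name),
    };
}

console.log(describeThread(1));
```

Meta data is read directly, children are counted in place, and connections are followed back to the users branch.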

Full example

Below is a screenshot from the database of an active Chat SDK installation.

Here you can see a full example of the data schema used by the Chat SDK.

How the chat uses the data in real-time

Let's look at how the Chat SDK uses this data to provide real-time instant messaging.

First, it's important to understand how Firebase works. The Firebase SDK lets us work with the data stored in the real-time database using four kinds of operation:

  1. Set data
  2. Update data
  3. Delete data
  4. Observe data

The first three operations are very standard and don't really affect the real-time operation of the chat. They would exist even if we weren't using a real-time database.
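The difference between the first three operations is easiest to see on a small in-memory tree. The sketch below is not the Firebase API. It mimics the documented behaviour in a simplified way: set replaces everything at a path, update only overwrites the listed children, and delete removes the path. The function names are hypothetical.

```javascript
// Hypothetical in-memory tree standing in for the real-time database.
const root = {};

// Return the parent object of the last path segment, creating branches as needed.
function parentOf(path) {
    const segments = path.split("/");
    const key = segments.pop();
    let node = root;
    for (const segment of segments) {
        if (typeof node[segment] !== "object" || node[segment] === null) {
            node[segment] = {};
        }
        node = node[segment];
    }
    return { node, key };
}

// Set: replace whatever is at the path.
function setData(path, value) {
    const { node, key } = parentOf(path);
    node[key] = value;
}

// Update: overwrite only the listed children, keeping the others.
function updateData(path, changes) {
    const { node, key } = parentOf(path);
    node[key] = { ...(node[key] ?? {}), ...changes };
}

// Delete: remove the path entirely.
function deleteData(path) {
    const { node, key } = parentOf(path);
    delete node[key];
}

setData("users/1", { name: "John", status: "online" });
updateData("users/1", { status: "away" }); // name is kept
setData("users/2", { name: "Simon" });
setData("users/2", { status: "online" });  // name is gone: set replaced the object
deleteData("users/2");
console.log(JSON.stringify(root)); // {"users":{"1":{"name":"John","status":"away"}}}
```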

However, the observe data operation is interesting. With a traditional request/response server, we would have to request data (and request it again to check for changes) rather than observe it.

The observe data operation allows us to tell Firebase to notify us, via a callback, when something happens to the data at a certain path.

Note:
Remember that, as discussed earlier, every piece of data has its own unique path. This is important when we talk about observers.

There are a number of different observers that are available:

  1. Child Added
  2. Child Removed
  3. Child Changed
  4. Child Moved
  5. Value Changed
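To illustrate the idea of observing a path, here is a deliberately small stand-in that supports two of the observer types, child added and child removed, on a single level of the tree. The `ObservableStore` class is hypothetical and written for this article; it is not how the Firebase SDK is implemented. As with Firebase's child added event, a new observer is first called once for every child that already exists.

```javascript
class ObservableStore {
    constructor() {
        this.children = new Map();  // path -> object of child values
        this.observers = new Map(); // "path|event" -> array of callbacks
    }

    // Register a callback for "child_added" or "child_removed" at a path.
    observe(path, event, callback) {
        const key = path + "|" + event;
        if (!this.observers.has(key)) this.observers.set(key, []);
        this.observers.get(key).push(callback);
        // A new child added observer first sees the children that already exist.
        if (event === "child_added") {
            for (const [childKey, value] of Object.entries(this.children.get(path) ?? {})) {
                callback(childKey, value);
            }
        }
    }

    notify(path, event, childKey, value) {
        for (const callback of this.observers.get(path + "|" + event) ?? []) {
            callback(childKey, value);
        }
    }

    addChild(path, childKey, value) {
        if (!this.children.has(path)) this.children.set(path, {});
        this.children.get(path)[childKey] = value;
        this.notify(path, "child_added", childKey, value);
    }

    removeChild(path, childKey) {
        const siblings = this.children.get(path);
        if (!siblings || !(childKey in siblings)) return;
        const value = siblings[childKey];
        delete siblings[childKey];
        this.notify(path, "child_removed", childKey, value);
    }
}

const store = new ObservableStore();
store.addChild("threads/1/messages", "1", { text: "Hey Guys!" });
store.observe("threads/1/messages", "child_added", (id, message) => {
    console.log("added", id, message.text);
});
store.addChild("threads/1/messages", "2", { text: "What's up?" });
// Logs "added 1 Hey Guys!" (replayed), then "added 2 What's up?"
```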

For example, if we wanted to be notified when a new message had been added to thread 1, we would do the following:

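import { getDatabase, ref, onChildAdded } from "firebase/database";
onChildAdded(ref(getDatabase(), "threads/1/messages"), (snapshot) => {
    // Called once for each existing message, then whenever a child is added
    const message = snapshot.val();
});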

This snippet demonstrates the principle we are talking about. Firebase starts watching the thread's messages path. When a new message is added, it notifies the app through the callback and provides the message's JSON data. The app can then take the necessary steps to add the message to its local database and update the user interface.

When the Chat SDK first starts up, it will add a number of observers.

  1. Add a value observer to the current user to make sure the user's profile information is up to date
  2. Add a child added observer to the user/threads path so that the app is updated when a new thread is added
  3. When a new thread is added, a value observer will be added to the thread to get the thread's details
  4. A child added observer will be added to the thread/messages path to update the app when a new message arrives
  5. A child added / removed observer will be added to the thread/users path so we're updated when a user joins or leaves the thread
  6. For each user that joins the thread, we add a value observer to their user profile area to get their information (name, photo, etc.)

After these observers are added, the state of the chat will always be synchronized with the changing state of the real-time data.
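The startup sequence above can be sketched as code. The example assumes a hypothetical `observe(path, event, callback)` registry, not the Firebase API. The in-memory version here records callbacks so that events can be fired by hand, and ids are read from child values as in the array layout used earlier. In the real SDK these would be Firebase listeners, and the exact paths may differ from the shorthand used in the list.

```javascript
// Hypothetical observer registry standing in for Firebase listeners.
const registry = new Map(); // "path|event" -> array of callbacks

function observe(path, event, callback) {
    const key = path + "|" + event;
    if (!registry.has(key)) registry.set(key, []);
    registry.get(key).push(callback);
}

// Simulate the backend reporting an event (callbacks receive child key and value).
function fire(path, event, childKey, value) {
    for (const callback of registry.get(path + "|" + event) ?? []) callback(childKey, value);
}

// Steps 1 to 6 from the list above.
function startChat(currentUserId, ui) {
    const userPath = "users/" + currentUserId;
    observe(userPath, "value", (_, profile) => ui.profile(profile)); // 1
    observe(userPath + "/threads", "child_added", (_, threadId) => { // 2
        const threadPath = "threads/" + threadId;
        observe(threadPath, "value", (_, details) => ui.thread(threadId, details)); // 3
        observe(threadPath + "/messages", "child_added", (_, msg) => ui.message(threadId, msg)); // 4
        observe(threadPath + "/users", "child_added", (_, userId) => { // 5 (join)
            observe("users/" + userId, "value", (_, user) => ui.member(threadId, user)); // 6
        });
        observe(threadPath + "/users", "child_removed", (_, userId) => ui.left(threadId, userId)); // 5 (leave)
    });
}

const log = [];
const ui = {
    profile: (p) => log.push("profile " + p.name),
    thread: (id, t) => log.push("thread " + t.name),
    message: (id, m) => log.push("message " + m.text),
    member: (id, u) => log.push("member " + u.name),
    left: (id, userId) => log.push("left " + userId),
};

startChat(1, ui);
fire("users/1", "value", null, { name: "John" });
fire("users/1/threads", "child_added", "0", 2);
fire("threads/2", "value", null, { name: "Private Chat" });
fire("threads/2/users", "child_added", "1", 2);
fire("users/2", "value", null, { name: "Simon" });
fire("threads/2/messages", "child_added", "1", { text: "Hey Guys!" });
console.log(log);
```

Each observer registers the next layer only once the data it depends on arrives, which is why the app never needs to poll.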