1

All things database

by Shivam Tripathi

2

Poll

How many questions were solved in the last year?

4-5 Billion

7-9 Billion

10-12 Billion

13-14 Billion

3

Scale of operations

  • Around 75 million questions are solved every day.

  • Last year, around 1 billion players solved more than 14 billion questions.

  • 1 million = 10 lakh; 1 billion = 100 crore.
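
As a rough back-of-the-envelope check, the figures above can be turned into average rates. This is only a sketch: it assumes load is spread evenly over the day, which understates peak traffic, and the variable names are illustrative.

```python
SECONDS_PER_DAY = 24 * 60 * 60

questions_per_day = 75_000_000        # "around 75 million" per day
questions_last_year = 14_000_000_000  # "more than 14 billion" last year

# Average rate if the daily load were spread evenly over 24 hours.
avg_per_second = questions_per_day / SECONDS_PER_DAY

# Daily average implied by last year's total.
avg_per_day_last_year = questions_last_year / 365

# Indian numbering system units.
LAKH = 100_000
CRORE = 10_000_000

print(f"{avg_per_second:.0f} questions/second on average")
print(f"{avg_per_day_last_year / 1e6:.1f} million questions/day averaged over last year")
print(f"1 million = {1_000_000 // LAKH} lakh, 1 billion = {1_000_000_000 // CRORE} crore")
```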

4

What will we be discussing today?

  • How we ensure high availability, latency on the order of milliseconds, and horizontal scaling of data stores

  • How data is stored to enable an optimal user experience

  • Different forms of data and the lifecycle of a game

  • Different types of data stores and the type of data they store

  • Important entities and their schema design (and the reasons behind it)

5

System Design: Quizizz

  • The core operations of Quizizz can be roughly broken down into two parts:

    • Content creation

    • Content usage (in the form of games)

6

Entity comparison: Quiz vs Game

  • A quiz is a collection of questions and related metadata.

  • A user can "run" a quiz to start a game.

  • A user, after signing up, can choose to create new content.

  • In computer science terms, a quiz is like a "program" and a game is like a "process": a program in execution.

7

Entity: Quiz and Questions

  • Internally, lessons and quizzes are the same entity. They are differentiated by a type.

  • A quiz contains questions. Each question carries its type, the correct answer, display formatting, etc., and optionally a list of options (for MCQ/MSQ).
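
A minimal sketch of how such a schema could look, assuming one shared entity for quizzes and lessons. All class and field names here (`Question`, `Quiz`, `content_type`, `display_format`) are hypothetical and are not taken from the actual Quizizz schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Question:
    question_type: str                   # e.g. "MCQ" or "MSQ" (illustrative values)
    text: str
    correct_answer: object               # shape depends on question_type
    options: Optional[List[str]] = None  # only present for MCQ/MSQ
    display_format: dict = field(default_factory=dict)


@dataclass
class Quiz:
    name: str
    content_type: str                    # "quiz" or "lesson": same entity, different type
    questions: List[Question] = field(default_factory=list)


quiz = Quiz(
    name="Fractions",
    content_type="quiz",
    questions=[
        Question("MCQ", "1/2 + 1/4 = ?", correct_answer=1,
                 options=["1/2", "3/4", "1"]),
    ],
)
lesson = Quiz(name="Intro to fractions", content_type="lesson")
```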

8

Entity: Game

  • A game is a quiz in execution.

  • A user can start a game from their own quiz or, optionally, from someone else's quiz.

  • Quizizz has millions of user-generated quizzes on many topics. Teachers can search them, optionally clone one to make changes, and start a game from it.

9

Challenge

Suppose a game was started from a quiz, and the owner of the quiz then modified it (changed the questions, the correct answer, etc.).

Would that affect the consistency of the experience for all players?

If not, how do we handle this situation?

10

Entity: Quiz Version

  • We decouple quiz metadata from the core fields of a quiz.

  • A game depends only on a specific version of a quiz.

  • Any change to an existing quiz publishes a new version.

  • The old version continues to exist, giving a consistent experience to a running game.
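
The versioning idea can be sketched with plain dictionaries standing in for the real collections. This is an illustration under assumptions, not the actual implementation: the names `publish_version`, `start_game`, and `questions_for_game` are hypothetical.

```python
import itertools

_version_ids = itertools.count(1)
quiz_versions = {}  # version_id -> immutable tuple of questions
quizzes = {}        # quiz_id -> {"name": ..., "current_version": version_id}
games = {}          # game_id -> {"quiz_id": ..., "version_id": ...}


def publish_version(quiz_id, questions):
    """Store the questions as a new immutable version and point the quiz at it."""
    version_id = next(_version_ids)
    quiz_versions[version_id] = tuple(questions)
    quizzes.setdefault(quiz_id, {"name": quiz_id})["current_version"] = version_id
    return version_id


def start_game(game_id, quiz_id):
    """Pin the game to whichever version is current at start time."""
    games[game_id] = {
        "quiz_id": quiz_id,
        "version_id": quizzes[quiz_id]["current_version"],
    }


def questions_for_game(game_id):
    return quiz_versions[games[game_id]["version_id"]]


publish_version("quiz-1", ["2 + 2 = ?"])
start_game("game-1", "quiz-1")
publish_version("quiz-1", ["2 + 3 = ?"])  # the owner edits the quiz mid-game
start_game("game-2", "quiz-1")
```

Here `game-1` keeps reading the questions it started with, while `game-2` sees the edited version.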

11

Primary Datastore: Mongo

- Stores all user-related information

- Stores all content information

- Stores all game information except responses

- Follows a PSA (Primary-Secondary-Arbiter) replica set architecture for high availability

- Disks are regularly backed up via snapshots

12

Responses Datastore: Sharded Mongo

  • Responses are stored in a sharded MongoDB cluster.

  • In sharding, the data is split across multiple shards, each of which stores a part of it.

  • Horizontally scaled, stateless mongos routers sit on top of these shards and route traffic to the correct destination.

  • Connections must be sticky; we use service discovery.
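
To illustrate what "routing to the correct shard" means, here is a deliberately simplified sketch of hash-based routing on a shard key. It is not how MongoDB implements it (MongoDB maps ranges of shard key values to chunks and tracks them in config servers); the shard names and the choice of `game_id` as the key are assumptions.

```python
import hashlib

SHARDS = ["shard-a", "shard-b", "shard-c"]  # hypothetical shard names


def shard_for(game_id: str, shards=SHARDS) -> str:
    """Pick a shard deterministically from a hash of the shard key."""
    digest = hashlib.md5(game_id.encode("utf-8")).digest()
    bucket = int.from_bytes(digest[:8], "big") % len(shards)
    return shards[bucket]


# Any stateless router computes the same answer for the same key,
# which is why routers can be scaled horizontally.
for game_id in ["game-1", "game-2", "game-3"]:
    print(game_id, "->", shard_for(game_id))
```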

13

Game Microservice and Game Redis

  • The number of writes on the system is very high (75M+ per day).

  • A large share of these writes is concentrated in a short time span (American hours).

  • Game Redis is a horizontally scaled Redis cluster that serves as an intermediary before we bulk-write data back into Mongo.

  • Write-back cache vs write-through cache.
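
The difference between the two caching strategies can be sketched with in-memory stand-ins. The classes below are illustrative only (`Database` is a stub that counts write round trips, not a Mongo client): a write-through cache writes to the database on every update, while a write-back cache buffers updates and flushes them in bulk later.

```python
class Database:
    """Stand-in for the backing store; counts write round trips."""

    def __init__(self):
        self.rows = {}
        self.write_calls = 0

    def bulk_write(self, items):
        self.rows.update(items)
        self.write_calls += 1


class WriteThroughCache:
    def __init__(self, db):
        self.db, self.cache = db, {}

    def put(self, key, value):
        self.cache[key] = value
        self.db.bulk_write({key: value})  # every update hits the database


class WriteBackCache:
    def __init__(self, db):
        self.db, self.cache, self.dirty = db, {}, set()

    def put(self, key, value):
        self.cache[key] = value
        self.dirty.add(key)               # remember what still needs persisting

    def flush(self):
        if self.dirty:
            self.db.bulk_write({k: self.cache[k] for k in self.dirty})
            self.dirty.clear()


through_db, back_db = Database(), Database()
through, back = WriteThroughCache(through_db), WriteBackCache(back_db)
for i in range(1000):
    through.put(f"response:{i}", i)
    back.put(f"response:{i}", i)
back.flush()
print(through_db.write_calls, back_db.write_calls)
```

Both databases end up with the same rows, but the write-back variant reaches that state with a single bulk write. The trade-off is that data not yet flushed is lost if the cache fails.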

14

Game Ecosystem

  • A game exists in two data stores: Redis and Mongo. In Redis a game is called a "room" (after socket server rooms, which allow real-time communication); in Mongo it is simply called a "game". The ID of a room is called a hash (roomHash); the ID of a game is its BSON _id, or gameId.

  • A running game is always in Redis. A completed game may be in Mongo (depending on the cleanup cycle).

  • Apart from the room, Redis also stores data about players, playersResponses, questions (from the quiz version the game was started from), powerups, the game code to game ID mapping, etc.

15

Game Types and some features

  • Live game

  • Async game (homework)

  • Solo game

  • Teacher paced

All of these types exist for lessons as well.

Additionally, there are some special features: games with an infinite deadline, reopening a game that is already over, reassigning scores in a running or finished game, etc.

16

Cleanups: Live Game

Every few hours or days, we attempt to move different types of games from Redis to Mongo.

  1. Live game

A live game can be over (everyone finished), stopped (manually by the teacher), or expired (automatically, if it has been open for more than 8 hours). A cleanup runs every 6 hours to move such games into Mongo. The maximum time a live game (lesson or quiz) or teacher-paced game can stay in Redis is therefore 14 hours: up to 8 hours until it expires, plus up to 6 hours until the next cleanup run.

If a game is stopped or completed, it is immediately written to Mongo but not removed from Redis. Removal from Redis is handled only by the cron jobs.
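
The live-game rules can be sketched as a small eligibility check. The `Room` class, the state names, and the function names are hypothetical; only the 8-hour expiry and the 6-hour cleanup interval come from the slide.

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

EXPIRY = timedelta(hours=8)            # a live game expires after being open this long
CLEANUP_INTERVAL = timedelta(hours=6)  # how often the cleanup cron runs

# Worst case: the game expires just after a cleanup run and waits for the next one.
MAX_TIME_IN_REDIS = EXPIRY + CLEANUP_INTERVAL


@dataclass
class Room:
    room_hash: str
    state: str           # "running", "over", or "stopped" (illustrative values)
    opened_at: datetime


def is_expired(room: Room, now: datetime) -> bool:
    return room.state == "running" and now - room.opened_at > EXPIRY


def eligible_for_cleanup(room: Room, now: datetime) -> bool:
    """True if the cron may remove this room from Redis."""
    return room.state in ("over", "stopped") or is_expired(room, now)
```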

17

Cleanups: Async games

Async games are cleaned up twice a day, when traffic is lowest. This includes all expired or stopped games.

To support async games with an infinite deadline, we move async games with no activity in the last week into Mongo (with their state still set to running). We move them back into Redis as soon as there is any new activity in the game. Because this can cause race conditions, always fetch the room through the code path that takes a lock and performs the repopulation under that lock!

18

Cleanup: Solo Games

Solo games can be completed, stopped, or moved.

Solo games with no activity in the last 24 hours are also moved into Mongo temporarily.

Solo games that have been running for longer than a week are closed and moved into Mongo permanently.
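
These two thresholds can be sketched as a single decision function. The function name and the returned labels are hypothetical; the 24-hour and one-week limits come from the slide.

```python
from datetime import datetime, timedelta

INACTIVITY_LIMIT = timedelta(hours=24)
MAX_RUNNING_TIME = timedelta(weeks=1)


def solo_cleanup_action(started_at: datetime, last_activity_at: datetime,
                        now: datetime) -> str:
    """Decide what the cleanup should do with a running solo game."""
    if now - started_at > MAX_RUNNING_TIME:
        return "close_and_move_permanently"
    if now - last_activity_at > INACTIVITY_LIMIT:
        return "move_temporarily"
    return "keep_in_redis"
```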

19

Other Redis stores

  • Solo Redis: used purely for solo games. A master-slave Redis managed by Sentinel.

  • Cache Redis: the system-wide caching Redis. A distributed cluster.

  • Student Redis: stores game-related information for each student, e.g. running games and some metadata about completed games. This is also a master-slave setup behind Sentinel.

20

Other data stores: Elasticsearch

  • We use Elasticsearch to store all search and paginated information.

  • Indexes:

    • quizzes2: public search

    • quizizzpersonal: "my quizzes" listing and data keyed by quizId_userId (likes, tagged, etc.)

    • reports: report listing (state, totalplayers)

    • studentprofile: completed games grouped on quizId_userId -> [game_id]
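
The quizId_userId -> [game_id] grouping can be sketched as follows. This only illustrates the composite key; it does not use Elasticsearch, and the sample IDs and the `profile_key` helper are made up.

```python
from collections import defaultdict


def profile_key(quiz_id: str, user_id: str) -> str:
    """Composite key in the quizId_userId form described above."""
    return f"{quiz_id}_{user_id}"


# (quiz_id, user_id, game_id) tuples for completed games; sample data only.
completed = [
    ("quizA", "user1", "game1"),
    ("quizA", "user1", "game2"),
    ("quizB", "user1", "game3"),
]

grouped = defaultdict(list)
for quiz_id, user_id, game_id in completed:
    grouped[profile_key(quiz_id, user_id)].append(game_id)
```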

21

Other Datastores: BigQuery

  • This contains all the analytics events generated by user activity.

  • It also contains other analytics-relevant data and serves as the data source for insight tools such as Metabase.

  • It also contains all bugs streamed from backend services.

22

Other datastore: Dynamo

  • This contains data about the deployment systems, e.g. component IDs, deployment metadata, etc.

23

Miscellaneous

Apart from all of the above, we also use AWS S3 as an object store for media files.

AWS CloudFront also acts as an edge cache for a number of different requirements.

24

Questions?

25

Thank You!
