Elasticsearch and the Elastic stack

Presentation

Professional Development

Hard

Created by

Sanil Khurana

62 Slides • 0 Questions

1

media

ELASTICSEARCH

2

THE PROBLEM OF SEARCH!

WHY IT'S SUPER DIFFICULT, AND WHY IT'S SUPER IMPORTANT

media

3

Search is often one of the core features of most platforms

4

SEARCH AT QUIZIZZ

media

5

But let's take a moment to talk about the problem statement of search

6

​Problem statement

​FROM

media

7

​Problem statement

​TO

media

8

media

9

But it's not very easy to do

10

Imagine searching through millions, if not billions, of documents when...

11

TYPOS

media

12

media

13

WHEN you may want custom logic, like showing out-of-stock items at the end, or sorting by ratings

media
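As a rough illustration of the kind of custom logic mentioned above, here is a minimal sketch in plain Python (not Elasticsearch query syntax). The `Product` class, its fields, and the sample catalog are hypothetical and exist only for this example. It pushes out-of-stock items to the end and sorts the rest by rating.

```python
from dataclasses import dataclass


@dataclass
class Product:
    # Hypothetical document shape, used only for this illustration.
    name: str
    rating: float
    in_stock: bool


def rank(products):
    # Sort key: in-stock items first (False sorts before True),
    # then higher ratings first within each group.
    return sorted(products, key=lambda p: (not p.in_stock, -p.rating))


catalog = [
    Product("red mug", 4.1, True),
    Product("blue mug", 4.9, False),
    Product("green mug", 4.6, True),
]
ranked_names = [p.name for p in rank(catalog)]
print(ranked_names)
```

The best-rated item still lands last because it is out of stock, which is exactly the kind of business rule a plain text match cannot express on its own.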

14

RELEVANCE

media

15

ELASTICSEARCH AND THE ELASTIC STACK

AND HOW IT ALL FITS TOGETHER

media

16

INTRODUCING Elasticsearch!

media

17

​INTRODUCING Elasticsearch!

Elasticsearch is a document-oriented database

media

18

​INTRODUCING Elasticsearch!

At first glance it may seem very similar to MongoDB

But it's actually a really powerful full-text search engine

media

19

​INTRODUCING Elasticsearch!

We'll come to why it is an extremely powerful search engine soon.

For now, understand that it analyzes every single document and maintains a list of every unique word that appears in any document, along with the documents that contain each word

23

​INTRODUCING Elasticsearch!

Apart from all this,

​1. It is extremely scalable, promising almost linear scaling*

​2. Supports a lot of integrations with various tools

3. Can do search (among other things) really well

*This has some conditions, more on it later

24

​Where does Kibana fit in?

Kibana is a tool built by the same company, Elastic, and it works mostly as a dashboarding tool

media
media

26

What is Logstash?

Logstash is another tool by Elastic that helps ingest data into ES (and other data stores)

It allows us to parse data, transform it, and load it into ES

media
media
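To make the parse, transform, load idea concrete, here is a minimal Python sketch. It is not Logstash configuration. The log format, the field names, and the `app-logs` index name are assumptions made up for this example. The final step only builds the newline-delimited body that Elasticsearch's `_bulk` endpoint accepts and does not send it anywhere.

```python
import json
import re

# Assumed log format for this sketch: "<timestamp> <LEVEL> <message>"
LOG_PATTERN = re.compile(r"(?P<ts>\S+) (?P<level>[A-Z]+) (?P<message>.*)")


def parse(line):
    # Parse: pull structured fields out of a raw line, or None if it does not match.
    match = LOG_PATTERN.match(line)
    return match.groupdict() if match else None


def transform(event):
    # Transform: rename and normalize fields.
    return {
        "@timestamp": event["ts"],
        "level": event["level"].lower(),
        "message": event["message"].strip(),
    }


def to_bulk_body(events, index_name):
    # Load: one action line followed by one document line per event.
    lines = []
    for event in events:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        lines.append(json.dumps(event))
    return "\n".join(lines) + "\n"


raw_lines = [
    "2024-01-01T10:00:00Z ERROR payment failed ",
    "not a log line",
    "2024-01-01T10:00:05Z INFO user logged in",
]
events = [transform(e) for e in map(parse, raw_lines) if e is not None]
bulk_body = to_bulk_body(events, "app-logs")
print(bulk_body)
```

Lines that do not match the pattern are simply dropped here. A real pipeline would more likely tag them and route them somewhere for inspection.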

27

​How it all fits together

media

28

OPERATIONS ON ELASTICSEARCH

media

29

Architecture

INVERTED INDICES AND FULL TEXT SEARCH

media

30

How does Elasticsearch index data?

Elasticsearch essentially builds an inverted index, which maps every unique word across all documents to the documents that contain it

media
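The idea of an inverted index can be sketched in a few lines of Python. This is a toy version under simple assumptions (lowercasing and whitespace tokenization only), not how Elasticsearch implements it internally. The sample documents and function names are made up for the example.

```python
from collections import defaultdict


def tokenize(text):
    # Simplistic analysis step: lowercase and split on whitespace.
    return text.lower().split()


def build_inverted_index(docs):
    # Map every unique term to the set of document ids that contain it.
    index = defaultdict(set)
    for doc_id, text in docs.items():
        for term in tokenize(text):
            index[term].add(doc_id)
    return index


def search(index, query):
    # Return ids of documents containing every query term (AND semantics).
    postings = [index.get(term, set()) for term in tokenize(query)]
    return set.intersection(*postings) if postings else set()


docs = {
    1: "the quick brown fox",
    2: "the lazy dog",
    3: "quick brown dogs",
}
index = build_inverted_index(docs)
print(search(index, "quick brown"))
```

Note that a lookup only touches the postings of the query terms, so the cost of a search does not depend on reading every document. Also note that "dog" and "dogs" are different terms in this toy version; a real analyzer would typically normalize them.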

31

​Score

The score mainly considers two factors:

how many times the search term appears in the document (more occurrences raise the score), and

how frequently the term appears in other documents (the more common the term is overall, the less each match counts)

media
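The two factors above correspond to term frequency and inverse document frequency. The following is a deliberately simplified TF-IDF sketch, not Elasticsearch's actual scoring: its default similarity is BM25, which builds on the same two signals and also accounts for field length. The corpus, the smoothing in the `idf` formula, and the function name are assumptions for illustration.

```python
import math


def tf_idf(term, doc_tokens, all_docs_tokens):
    # Term frequency: how often the term occurs in this document.
    tf = doc_tokens.count(term)
    # Document frequency: how many documents contain the term at all.
    docs_with_term = sum(1 for tokens in all_docs_tokens if term in tokens)
    # Inverse document frequency (smoothed): rarer terms get a larger weight.
    idf = math.log(1 + len(all_docs_tokens) / (1 + docs_with_term))
    return tf * idf


corpus = [
    "elasticsearch is a search engine".split(),
    "kibana is a dashboard for elasticsearch".split(),
    "logstash is a pipeline".split(),
]
common = tf_idf("is", corpus[0], corpus)
rare = tf_idf("search", corpus[0], corpus)
print(common, rare)
```

Both terms occur once in the first document, but "search" appears in only one document while "is" appears in all three, so "search" contributes more to the score.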

32

Architecture... AGAIN

​Nodes, Shards, Clusters

media

33

Let's walk through a scenario:

Assume we are a small startup

34

​Implement a simple search

Architecture | Nodes, Shards, Clusters

media

35

​IT WORKS!

Architecture | Nodes, Shards, Clusters

media

36

Architecture | Nodes, Shards, Clusters

media

37

​We are too successful....

Architecture | Nodes, Shards, Clusters

media

40

Let's look at the problems

​Problems with our current Architecture

All the data is stored in a single node, which leads to a single point of failure

Indexing is expensive: for every document, ES analyzes the full text and maintains an inverted index of every unique word, much of it in memory. With limited resources, the single node ends up carrying too much load.

Since all the requests are handled by that single node with limited CPU and memory, users get slow responses

41

​A BETTER ARCHITECTURE

Architecture | Nodes, Shards, Clusters

media

​We add another node to the architecture, which can handle half the requests and store half the data!​

42

​A BETTER ARCHITECTURE

Architecture | Nodes, Shards, Clusters

To establish some terminology: we group multiple documents into a shard, and a single node can hold multiple shards

media
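One way to picture how documents end up in shards is a hash of the document id taken modulo the number of primary shards. The sketch below uses `zlib.crc32` purely as a stand-in hash; it is an assumption for illustration and not the hash function Elasticsearch uses. The document ids and the shard count are also made up.

```python
import zlib


def shard_for(doc_id, num_primary_shards):
    # Deterministic routing: the same id always maps to the same shard.
    return zlib.crc32(doc_id.encode("utf-8")) % num_primary_shards


doc_ids = [f"quiz-{n}" for n in range(1000)]
placement = {}
for doc_id in doc_ids:
    placement.setdefault(shard_for(doc_id, 4), []).append(doc_id)

for shard, ids in sorted(placement.items()):
    print(f"shard {shard}: {len(ids)} documents")
```

Because the shard number depends on the shard count, changing the count would move most documents to a different shard. That is one intuition for why the number of primary shards is normally fixed when an index is created.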

43

​Everything seems great...

Architecture | Nodes, Shards, Clusters

media

44

​BUT EVENTUALLY SOMETHING FAILS

Architecture | Nodes, Shards, Clusters

media

45

AND WE LOSE ONE NODE, ALONG WITH ALL ITS DATA

Architecture | Nodes, Shards, Clusters

media
media

46

​BUILDING A MORE ROBUST ARCHITECTURE

Architecture | Nodes, Shards, Clusters

media

We add replica shards, which contain the same documents as their primary shards.

We also ensure that a replica shard and its associated primary never live on the same node, so losing one node never loses both copies
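The placement rule can be sketched as follows. This is a toy round-robin allocator under simple assumptions (one replica per primary, at least two nodes), not Elasticsearch's actual allocation algorithm. The node names and function names are hypothetical.

```python
def allocate(num_shards, nodes):
    # Place each primary on one node and its replica on the next node,
    # so the two copies of a shard never share a node.
    if len(nodes) < 2:
        raise ValueError("need at least two nodes to separate primaries from replicas")
    allocation = {node: [] for node in nodes}
    for shard in range(num_shards):
        primary_node = nodes[shard % len(nodes)]
        replica_node = nodes[(shard + 1) % len(nodes)]
        allocation[primary_node].append(("primary", shard))
        allocation[replica_node].append(("replica", shard))
    return allocation


def surviving_shards(allocation, failed_node):
    # Shards that still have at least one copy after a node is lost.
    return {
        shard
        for node, copies in allocation.items()
        if node != failed_node
        for _, shard in copies
    }


allocation = allocate(4, ["node-1", "node-2"])
print(allocation)
print(surviving_shards(allocation, "node-1"))
```

With this layout, losing either node still leaves one copy of every shard on the other node.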

47

​COST OF USING ELASTICSEARCH?

media

48

What we know so far...

1. It is extremely scalable, promising almost linear scaling*

2. Supports a lot of integrations with various tools

3. Can do search (among other things) really well

*This has some conditions, more on it later

49

But what does it cost?

1. It is heavy. Many of the index data structures are kept in memory (JVM heap and the operating system's file-system cache), which means it requires a lot of memory to serve requests. This also limits its suitability as a primary DB.

2. It is difficult to manage. The layers of abstraction (shards, indices, clusters) come with the responsibility of managing them. Many settings, such as the type of an existing field in a mapping, are time-consuming or even impossible to change without reindexing everything

50

But what does it cost?

3. Adding new documents can be time-consuming. ES has to analyze every word, and new documents only become visible after an index refresh, so it takes a little time before they are searchable.

4. Querying is not straightforward. Any query that involves multiple shards may require many network hops and compute on other nodes, and may therefore be slow
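The cost of multi-shard queries is easier to see in a scatter-gather sketch: the query is sent to every shard, each shard returns its own top hits, and a coordinating step merges them. The code below is a toy model in one process, with term counts standing in for a real relevance score. The shard contents and function names are assumptions for illustration.

```python
import heapq


def search_shard(shard_docs, term, k):
    # Each shard scores its own documents and returns only its local top k.
    hits = [
        (text.lower().split().count(term), doc_id)
        for doc_id, text in shard_docs.items()
    ]
    hits = [hit for hit in hits if hit[0] > 0]
    return heapq.nlargest(k, hits)


def scatter_gather(shards, term, k):
    # Scatter: query every shard. In a real cluster each call is a network round trip.
    partial = []
    for shard_docs in shards:
        partial.extend(search_shard(shard_docs, term, k))
    # Gather: merge the per-shard results into a global top k.
    return [doc_id for _, doc_id in heapq.nlargest(k, partial)]


shards = [
    {"a": "search search search", "b": "search"},
    {"c": "search search", "d": "kibana"},
]
top = scatter_gather(shards, "search", 2)
print(top)
```

The overall response can only be assembled once the slowest shard has answered, which is one reason queries that fan out to many shards tend to be slower.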

51

​A common architecture

media

52

A FEW POINTS I MISSED...

media

53

​What I missed

The inverted index is just one type of index; ES uses other index structures for data types other than text (for example, numeric and geo fields)

54

​What I missed

There are other tools in the Elastic Stack that I didn't include in this discussion

media

55

​What I missed

Different types of nodes, like master-eligible nodes and data nodes. (Replicas are copies of shards, not a type of node.)

56

​SEARCH IN QUIZIZZ!

media

57

media

58

media

59

AMAZON OpenSearch

Our ES clusters are hosted on Amazon OpenSearch Service

media

60

AMAZON OpenSearch

Our ES clusters sit behind a search service

media

61

AMAZON OpenSearch

​This also provides a hosted Kibana environment

media
media
