Benchmarking Elasticsearch: 1 Million Writes per Sec

Siddharth Kothari
Jun 2, 2017

Elasticsearch is one of the most widely used distributed systems out there; ranked #11 amongst all database systems and the #1 search engine. At Appbase.io, we offer a hosted streaming Elasticsearch API for building great search, geolocation and feed experiences.

We first wrote on the topic of scaling writes in the real world in 2015, benchmarking over 100,000 writes per second on an 18-node cluster made up of C4.2xlarge EC2 machines.

In the two years since,

  1. Elasticsearch has had two major version releases — 2.x and 5.x, with v6.0.0 available today as an alpha release.
  2. Elasticsearch now has a comprehensive macro benchmarking suite for measuring different performance metrics in the Rally project.
  3. Docker containers are emerging as the new standard for distributing software, even stateful ones like database systems.

For the first time, we are announcing 📢 a preview version of appbase.io that runs as a container within a server environment of your choosing.

In building this container version, we ended up designing an automated test suite on top of Rally for benchmarking indexing rates, in order to

a.) measure our streaming latency numbers and,

b.) better understand how an appbase.io streaming proxy would affect the Elasticsearch cluster's throughput.

Benchmarking: 3, 2, 1 .. GO

When we ran a similar test in 2015, our calculations indicated that it would take a 150-node cluster to hit a sustained 1 Million writes per second. After all, it took Cassandra, a write-optimized database system, some 300 nodes to do the same in 2014.

Boy, we were in for a big surprise!

Image: Median indexing rate for ES v5.4 against the geo points dataset of 180MM records.

Scaling to 1 Million writes per second required a modest setup of 15 m4.2xlarge instances running a 15-node Elasticsearch cluster in Docker containers.

We observed near-perfect linear scalability of writes as we scaled the number of cluster nodes. In the rest of the post, we will cover the test setup, the dataset used, and the cluster settings that we used to hit these numbers. We also ran a series of other benchmarks for comparison, which we will go over as well.

The Test

We use the current production version of Elasticsearch, v5.4, for running the actual indexing cluster.

We use Rally as our benchmarking framework; Elasticsearch itself uses Rally for tracking a variety of performance metrics, and benchmarks.elastic.co is a good place to start.

Image: Test tracks on esrally, captured from benchmarks.elastic.co.

Rally is set up on a benchmark master node of the C4.8xlarge instance type, which uses 96 client threads to generate requests to the Elasticsearch cluster.

The track we picked for benchmarking the setup is Geopoint, which consists of 60.8 Million POIs from PlanetOSM.

We triplicated this dataset to get to 180 Million points, so that the write throughput could stabilize when running on a 15-node setup.

The average data size is 40 bytes, and the mapping for the data looks like:

{
  "type": {
    "dynamic": "strict",
    "properties": {
      "location": {
        "type": "geo_point"
      }
    }
  }
}

and an individual data record to be indexed looks like:

{"location": [-0.1485188, 51.5250666]}

We have created an automated suite on top of Rally (sketched below) that:

  1. uses docker-machine to spin up EC2 instances and runs Elasticsearch in them,
  2. uses docker-machine to set up an EC2 instance for the benchmark master and runs Rally against the cluster,
  3. has a set of switches to customize configurations, repeat the benchmarks, parallelize the tests and print metadata reports on the dataset.
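As a rough illustration of what steps 1 and 2 automate, here is a hedged sketch, assuming docker-machine and esrally are installed; the machine names, host list and loop are placeholders rather than our exact suite:

# Sketch of the orchestration the suite performs. The flags shown (the amazonec2
# driver, --pipeline=benchmark-only) are real, but the script itself is illustrative.
import subprocess

def create_node(name):
    # Provision an EC2 instance of the same type used in this post via docker-machine.
    subprocess.run(["docker-machine", "create", "--driver", "amazonec2",
                    "--amazonec2-instance-type", "m4.2xlarge", name], check=True)

def start_elasticsearch(name):
    # Start an Elasticsearch 5.4 container on that machine. A real deployment also
    # needs heap (ES_JAVA_OPTS) and cluster discovery settings.
    subprocess.run(["docker-machine", "ssh", name,
                    "docker run -d -p 9200:9200 -p 9300:9300 "
                    "docker.elastic.co/elasticsearch/elasticsearch:5.4.0"], check=True)

def run_rally(target_hosts):
    # Run Rally against the existing cluster; benchmark-only tells Rally not to
    # provision Elasticsearch itself. The number of client threads comes from the
    # track's challenge configuration.
    subprocess.run(["esrally", "--track=geopoint",
                    "--target-hosts=" + target_hosts,
                    "--pipeline=benchmark-only"], check=True)

for i in range(15):
    node = "es-node-%d" % i
    create_node(node)
    start_elasticsearch(node)

run_rally("10.0.0.1:9200,10.0.0.2:9200")  # placeholder host list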

As far as tuning Elasticsearch itself goes, we used the default heap size (2 GB in v5.x), set the number of primary shards equal to the number of nodes in the cluster, and used a replication factor of 1, i.e. one replica copy for each primary shard.
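For reference, here is a minimal sketch of creating an index with those settings through the Python client; the index name is hypothetical, and the heap size is a node-level JVM setting rather than something set in this call:

# Index-level settings described above: primary shards equal to the node count
# (15 here) and one replica per primary. The heap size is configured per node via
# jvm.options / ES_JAVA_OPTS, not through this API call.
from elasticsearch import Elasticsearch

es = Elasticsearch(["http://localhost:9200"])

es.indices.create(index="osmgeopoints", body={   # hypothetical index name
    "settings": {
        "number_of_shards": 15,
        "number_of_replicas": 1
    },
    "mappings": {
        "type": {
            "dynamic": "strict",
            "properties": {
                "location": {"type": "geo_point"}
            }
        }
    }
})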

Comparisons

We also ran some comparison benchmarks to find out how we would fare when indexing a dataset of a different size and a more complex structure, or when using a different major version of Elasticsearch.

Elasticsearch v5.4 vs. v2.4

We use the same Geopoint dataset for measuring the throughput difference between the two major versions of Elasticsearch.

Image: Median indexing rate difference between ES v5.4 and ES v2.4

As the number of nodes in the cluster increases, Elasticsearch v5.4 outperforms v2.4 by a 20% margin. But overall, ES v2 does a solid job. If you are already using ES v2 in production, you might just want to wait a bit longer for a generally available ES v6.x release before making the switch ;).

A different dataset: NYC Taxis

Next, we take a very different dataset that is highly structured in comparison: 165M taxi rides performed by Yellow and Green Cabs in NYC in December 2015. The average size of a document in this dataset is 489 bytes (~12x the size of Geopoint).

In the graph below, we compare the NYC Taxis benchmark (ES v5.4) against the Geopoint benchmark (ES v5.4, referencing the original run).

Image: Comparing indexing rates b/w Geopoints dataset and NYC taxis.

When indexing this dataset, we observe 2–3x lower throughput than with Geopoint. However, the linear scalability still holds. It seems that with 36 m4.2xlarge nodes, we would have hit 1M writes per second.

We also compared against the Geonames dataset with similar results.

Announcement 📢: appbase.io software in preview

Coming to the reason we started throughput benchmarking in the first place: we are psyched to announce the availability of appbase.io, the hosted streaming DB built on Elasticsearch, as a Docker container that you can take anywhere with you.

“If software is eating the world, it needs a nice place to stay.”

A preview version is available now on Docker Hub; it works with both Elasticsearch v5 and v6. Stay tuned for a full announcement blog post with examples in the coming week.

You can also use this as a hosted service via the appbase.io clusters page.

Why use this?

  • appbase.io lets you run the entire Elasticsearch Query DSL interactively with streaming responses: realtime chat, dashboards, feeds, and search experiences can all be built using the expressive query API of Elasticsearch.
  • appbase.io offers open-source tools and UI component kits for accessing data streams, all of these become readily available when you use appbase.io in a container.

We just gave a talk on Data Streams with Elasticsearch at datalayer.com that better explains the need and use cases for this.

Benchmark Numbers

In the spirit of the post, we want to talk about the two benchmarking numbers for appbase.io that are really important:

  1. Indexing throughput hit when proxying Elasticsearch queries via appbase.io. Turns out, it is not that bad.
Image: Adding appbase.io to an ES cluster doesn’t hurt the throughput.

The indexing rate of an ES cluster with appbase.io is within 1% of the indexing rate of an ES cluster without appbase.io.

2. Number of concurrent subscribers we can stream data to within 1s of latency. Latency here is measured as the time it takes for appbase.io to publish a message (40 bytes) to all the subscriber channels, and does not include time spent on the network.

We observe that we can easily scale to 100,000 concurrent subscriber channels while maintaining our goal of keeping the publish broadcast latency just a smidge under 1s on a single C4.8xlarge instance.

This works well for 99.9% of web services. It is also possible to scale subscribers beyond this count (hint: it involves using multiple containers with a shared state), although you probably wouldn’t need it.

If you like seeing pretty visualizations and real examples, subscribe to this publication as we will be doing plenty of those going forward.
