Daniel Lacko · 17.12.2021 · 15 minutes
This is the first part of a 3-part series on the Amazon OpenSearch service. This part is introductory and covers the necessary theory. If you wish to go straight to deploying, please continue with the upcoming second part of the series.
In the first part, we will take a look at:
Amazon OpenSearch service is a solution for searching, analyzing, and visualizing your data. The data is indexed into performant data structures that enable fast and easy search and filter capabilities. You can further work with the data to build monitoring systems, anomaly detection, alarms, and visualizations, and put everything into dashboards. OpenSearch is a community-driven open-source fork of ElasticSearch and Kibana. As such, this service still offers an option to deploy either ElasticSearch or OpenSearch. It is up to you. At the time of writing this post, these versions are available:
If you opt for ElasticSearch, there is also an option to migrate to OpenSearch later. Your choice should also take these notes into consideration:
To have a better idea of what it can look like working with OpenSearch, imagine you need to track the performance metrics of your AWS Lambda function. AWS Lambda includes these metrics in the REPORT statement of lambda invocation logs. With the right filter, we can focus on these statements:
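Extracting these metrics from a REPORT statement can be sketched with a small parser. The regular expression below targets the standard fields of a Lambda REPORT line (Duration, Billed Duration, Memory Size, Max Memory Used); the sample request ID and values are made up for illustration:

```python
import re

# A Lambda REPORT line looks like this (sample values for illustration):
sample = ("REPORT RequestId: 8f507cfc-example Duration: 102.25 ms "
          "Billed Duration: 103 ms Memory Size: 128 MB Max Memory Used: 72 MB")

REPORT_RE = re.compile(
    r"Duration: (?P<duration_ms>[\d.]+) ms\s+"
    r"Billed Duration: (?P<billed_ms>\d+) ms\s+"
    r"Memory Size: (?P<memory_mb>\d+) MB\s+"
    r"Max Memory Used: (?P<used_mb>\d+) MB"
)

def parse_report(line: str):
    """Extract performance metrics from a Lambda REPORT log line.

    Returns a dict of floats, or None if the line is not a REPORT line.
    """
    m = REPORT_RE.search(line)
    if not m:
        return None
    return {k: float(v) for k, v in m.groupdict().items()}

print(parse_report(sample))
```

A filter like this is typically applied before indexing, so only the metric fields (and not the raw log text) end up in OpenSearch.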
Continuing this effort, we can create visualizations from these metrics and put them on a dashboard:
These metrics are also a great candidate for anomaly detection. Once you have the anomaly detection in place, you can view the anomaly history:
Anomaly detection depends on the availability of data. In this image you can see that we are missing some data points:
For those who are completely unaware of the story, the 'Amazon OpenSearch service' was formerly named 'Amazon ElasticSearch service'. The AWS service carried the ElasticSearch name from its release in 2015 until this year. Long story short, there are two sides to this story: Amazon and Elastic. Elastic did not like the way it was treated, from both a legal and an open-source point of view. Hence, it changed the licensing of its products, and Amazon responded by creating a fork of ElasticSearch 7.10 - Amazon OpenSearch. This is also the reason why you won't get any further updates on Amazon ElasticSearch. If you would like to read more on the story, check the Elastic.co blog.
In general, these parameters have the highest impact on the cluster performance:
Availability zones - Deployments using only 1 Availability Zone are prone to outages and possibly even to complete loss of data, caused by events like AWS outages or natural disasters. A single Availability Zone also means higher latency for people/resources accessing the OpenSearch cluster from a more remote location. Deployments using 2 or 3 Availability Zones run in data centers isolated from each other, which makes them resilient to single-data-center outages and natural disasters. Latency is also lower, as the deployment covers a wider geographical area.
Master nodes - Master nodes offload cluster management and maintenance tasks. They don't hold any data. Having dedicated master nodes increases the stability of the cluster. You can deploy a single master node, but then OpenSearch has no backup master, and in case of failure your cluster will be down. The recommended number of master nodes is 3, and you should always choose an odd number. Only one master node is active at a time; the rest are idle standbys that take over in case of failure. You are charged for the idle nodes as well, even though their only purpose is to back up the active master.
Master node instance type - The master node instance type depends on the number of nodes (master + data), indices, and shards. The more data nodes you have, the larger the master instance type should be. AWS recommends the following:
+------------+----------------------------------------------------+
| Node count | Recommended minimum dedicated master instance type |
+------------+----------------------------------------------------+
| 1-10       | m5.large.search OR m6g.large.search                |
+------------+----------------------------------------------------+
| 10-30      | c5.xlarge.search OR c6g.xlarge.search              |
+------------+----------------------------------------------------+
| 30-75      | c5.2xlarge.search OR c6g.2xlarge.search            |
+------------+----------------------------------------------------+
| 75-200     | r5.4xlarge.search OR r6g.4xlarge.search            |
+------------+----------------------------------------------------+
Data nodes - Data nodes hold the data and execute operations like indexing and querying. You should always have at least 2 data nodes for high availability. The number of nodes depends on the total storage you need and the maximum storage allowed per instance type. You can start with the minimal number of data nodes, at the smallest instance type, that can hold your data for your retention period.
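The data node count can be sketched with simple arithmetic. The 1.45 multiplier below is the rule of thumb from AWS's sizing guidance (it covers indexing and OS/service overhead on top of the raw source data); the workload numbers are hypothetical:

```python
import math

def min_data_nodes(daily_source_gb: float, retention_days: int,
                   replicas: int, per_node_storage_gb: float) -> int:
    """Estimate the minimum number of data nodes from raw data volume.

    Rule of thumb from the AWS sizing guidance:
    source data x (1 + replicas) x 1.45 ~= minimum storage requirement.
    """
    source_gb = daily_source_gb * retention_days
    storage_gb = source_gb * (1 + replicas) * 1.45
    # At least 2 nodes for high availability, regardless of the storage math.
    return max(2, math.ceil(storage_gb / per_node_storage_gb))

# Hypothetical workload: 20 GB/day, 14-day retention, 1 replica,
# 512 GB of EBS storage per data node.
print(min_data_nodes(20, 14, 1, 512))
```

Treat the result as a starting point only; re-check it against real CloudWatch storage metrics once the cluster is running.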
Data node instance type - The data node instance type dictates the vCPU count, the amount of RAM, and the maximum amount of storage the node can hold. Finding a well-balanced configuration is more or less trial and error. If you are running out of space but your CPU load is stable, it's better to just add another data node of the same type. If you are having performance issues and your CPU load is constantly high, it's time to scale up to a larger instance type. Use CloudWatch monitoring to review CPU/memory/storage usage.
Warm/Cold storage nodes - Storage nodes are used for read-only data: warm storage for frequently accessed data, cold storage for infrequently accessed data. Warm storage uses S3 in the backend, with a caching layer on top for reduced latency. Cold storage also uses S3, but with no compute attached. Storage nodes can only be deployed to a cluster with dedicated master nodes. When you query data in warm storage, the data is moved from Amazon S3 to local storage, and processing it requires compute power. If you encounter high latency while querying warm storage, you need to scale the warm tier out or up. A good approach is to test the performance of warm storage with an example data set from your workload while monitoring the warm storage node metrics.
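The parameters above come together in the domain's cluster configuration. The sketch below uses the field names of the ClusterConfig block accepted by the AWS OpenSearch API (e.g. via the boto3 `create_domain` call); the instance types and counts are hypothetical example values, not a recommendation:

```python
# Sketch of a ClusterConfig block in the shape the AWS OpenSearch
# create_domain API accepts; all values here are hypothetical examples.
cluster_config = {
    "ZoneAwarenessEnabled": True,                  # spread nodes across AZs
    "ZoneAwarenessConfig": {"AvailabilityZoneCount": 3},
    "DedicatedMasterEnabled": True,
    "DedicatedMasterType": "m5.large.search",      # fits the 1-10 node row above
    "DedicatedMasterCount": 3,                     # odd count: one active + standbys
    "InstanceType": "r5.large.search",             # data node type
    "InstanceCount": 6,                            # 2 data nodes per AZ
    "WarmEnabled": True,                           # warm tier for read-only data
    "WarmType": "ultrawarm1.medium.search",
    "WarmCount": 2,
}
print(cluster_config)
```

Note that the data node count is a multiple of the Availability Zone count, so the nodes distribute evenly across zones.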
It's an industry standard to develop a project in multiple stages, and your architecture should reflect your use case and requirements for each of them. The closer you get to production, the more scaled out/up the resources are, the higher the usage and traffic, the more data is generated, and the less tolerable failure and downtime become. In the next few sections, we will describe each stage. Of course, everything depends on the type and scale of the project you are running; some companies might run a production cluster whose requirements would not match even your development cluster. Let's consider these 4 stages:
OpenSearch usage:
Architecture recommendations:
OpenSearch usage:
Architecture recommendations:
If your staging environment is available to your customers as an open alpha/beta, you might consider applying the production stage recommendations; otherwise, you can use the development stage recommendations. In the end, it depends on the usage/traffic and data throughput.
OpenSearch usage:
Architecture recommendations:
Amazon OpenSearch service incorporates a fork of the Kibana dashboard. Through this dashboard, you can access the data stored in OpenSearch and run queries and visualizations on it. The question is: how do you want to access this dashboard? You have 2 options for accessibility:
Authentication is also one of the things you have to configure right from the start. This part of the configuration is solely up to you. You have these options:
You are charged based on 4 metrics:
Amazon OpenSearch is not a cheap service, and if your workloads generate a large amount of logs/data, that will translate directly into the requirements of the OpenSearch cluster you will need.
When it comes to cost estimation, you need to have a good understanding of your cluster requirements, data throughput, and data retention.
Data throughput can be tricky: OpenSearch stores data with an overhead on top of the raw size. The best way to estimate throughput is to deploy a test cluster for a short period, feed your data into it, and calculate the average. The minimal cluster you can deploy is this:
Once you have all the numbers, you can use the AWS cost calculator.
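The estimate itself is plain arithmetic over the charged metrics: instance hours (data and master nodes), EBS storage, and data transfer. The sketch below is a hypothetical helper; all rates are placeholders, so substitute the real per-region prices from the AWS cost calculator:

```python
def monthly_estimate(data_nodes: int, node_hourly_usd: float,
                     master_nodes: int, master_hourly_usd: float,
                     ebs_gb: float, ebs_gb_month_usd: float,
                     transfer_gb: float, transfer_gb_usd: float) -> float:
    """Rough monthly cost: instance hours + EBS storage + data transfer.

    All rates are placeholders -- look up the real per-region prices
    in the AWS cost calculator before relying on the result.
    """
    hours = 730  # average hours per month
    instances = (data_nodes * node_hourly_usd
                 + master_nodes * master_hourly_usd) * hours
    storage = ebs_gb * ebs_gb_month_usd
    transfer = transfer_gb * transfer_gb_usd
    return round(instances + storage + transfer, 2)

# Hypothetical cluster: 3 data nodes, 3 master nodes, 1.5 TB of EBS,
# 200 GB of outbound transfer, with made-up hourly/GB rates.
print(monthly_estimate(3, 0.167, 3, 0.142, 1536, 0.135, 200, 0.09))
```

Even with placeholder rates, a helper like this makes it easy to compare configurations (e.g. scaling out data nodes versus scaling up the instance type) before committing to one.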
This part should give you an idea of what OpenSearch is, what you can do with it, its history with ElasticSearch, the options for deploying the service, how the pricing works, and how to create a cost estimate.
The second part will be mostly practical, as we will guide you through the deployment of a publicly accessible OpenSearch cluster.