Amazon Managed Streaming for Apache Kafka (Amazon MSK) is an event streaming platform that you can use to build asynchronous applications by decoupling producers and consumers. Monitoring the various Amazon MSK metrics is critical for the efficient operation of production workloads. Amazon MSK gathers Apache Kafka metrics and sends them to Amazon CloudWatch, where you can view them. You can also monitor Amazon MSK with Prometheus, an open-source monitoring application. Many of our customers use open-source monitoring tools like Prometheus and Grafana, but doing so in a self-managed environment comes with its own challenges regarding manageability, availability, and security.
In this post, we show how you can build an AWS Cloud native monitoring platform for Amazon MSK using the fully managed, highly available, scalable, and secure services Amazon Managed Service for Prometheus and Amazon Managed Grafana for better operational insights.
Why is Kafka monitoring critical?
As a critical component of the IT infrastructure, Amazon MSK clusters need to have their operations and efficiency tracked. Amazon MSK metrics help you monitor critical tasks while operating applications. You can not only troubleshoot problems that have already occurred, but also discover anomalous behavior patterns and prevent problems from occurring in the first place.
Some customers currently use various third-party monitoring solutions like lenses.io, AppDynamics, Splunk, and others to monitor Amazon MSK operational metrics. In the context of cloud computing, customers are looking for an AWS Cloud native service that offers equivalent or better capabilities but with the added advantage of being highly scalable, available, secure, and fully managed.
Amazon MSK clusters emit a very large number of metrics via JMX, many of which can be useful for tuning the performance of your cluster, producers, and consumers. However, that large volume makes monitoring more complex. By default, Amazon MSK clusters come with CloudWatch monitoring of your essential metrics. You can extend your monitoring capabilities by using open monitoring with Prometheus. This feature enables you to scrape a Prometheus-friendly API to gather all the JMX metrics and work with the data in Prometheus.
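To make the idea of a Prometheus-friendly API more concrete, the following minimal Python sketch parses a few lines of the Prometheus text exposition format, which is what a scrape of an exporter returns. The metric names and label values in the sample are hypothetical and are not taken from an actual MSK broker, and the parser deliberately ignores optional timestamps and other details of the format.

```python
import re

# One sample line: metric name, optional {labels}, then a numeric value.
_SAMPLE = re.compile(r'^([a-zA-Z_:][a-zA-Z0-9_:]*)(?:\{(.*)\})?\s+(\S+)$')
# One label pair inside the braces: key="value" (escaped quotes allowed).
_LABEL = re.compile(r'([a-zA-Z_][a-zA-Z0-9_]*)="((?:[^"\\]|\\.)*)"')

# Hypothetical scrape output; real broker metrics have different names.
SAMPLE_SCRAPE = """\
# HELP example_bytes_in_total Illustrative counter
# TYPE example_bytes_in_total counter
example_bytes_in_total{topic="orders",broker="1"} 1024
example_up 1
"""


def parse_samples(text):
    """Return a list of (name, labels, value) tuples from exposition text.

    Comment lines (# HELP, # TYPE) and blank lines are skipped. Lines this
    sketch does not understand, such as samples with timestamps, are ignored.
    """
    samples = []
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        match = _SAMPLE.match(line)
        if match is None:
            continue
        name, raw_labels, value = match.groups()
        labels = dict(_LABEL.findall(raw_labels or ""))
        samples.append((name, labels, float(value)))
    return samples


if __name__ == "__main__":
    for name, labels, value in parse_samples(SAMPLE_SCRAPE):
        print(name, labels, value)
```

In practice the Prometheus server does this parsing for you; the sketch only illustrates the shape of the data that ends up in the workspace.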
This solution provides a simple observability platform for Amazon MSK, along with much-needed insights into various critical operational metrics, which yields the following organizational benefits for your IT operations or application teams:
- You can quickly drill down to various Amazon MSK components (broker level, topic level, or cluster level) and identify issues that need investigation
- You can investigate Amazon MSK issues after the event using the historical data in Amazon Managed Service for Prometheus
- You can shorten or eliminate long calls that waste time questioning business users on Amazon MSK issues
In this post, we set up Amazon Managed Service for Prometheus, Amazon Managed Grafana, and a Prometheus server running as a container on Amazon Elastic Compute Cloud (Amazon EC2) to provide a fully managed monitoring solution for Amazon MSK.
The solution provides an easy-to-configure dashboard in Amazon Managed Grafana for various critical operational metrics, as demonstrated in the following video.
Solution overview
Amazon Managed Service for Prometheus reduces the heavy lifting required to get started with monitoring applications across Amazon MSK, Amazon Elastic Kubernetes Service (Amazon EKS), Amazon Elastic Container Service (Amazon ECS), and AWS Fargate, as well as self-managed Kubernetes clusters. The service also seamlessly integrates with Amazon Managed Grafana to simplify data visualization, team management, authentication, and authorization.
Grafana empowers you to create dashboards and alerts from multiple sources such as an Amazon Managed Service for Prometheus workspace, CloudWatch, AWS X-Ray, Amazon OpenSearch Service, Amazon Redshift, and Amazon Timestream.
The following diagram demonstrates the solution architecture. This solution deploys a Prometheus server running as a container within Amazon EC2, which constantly scrapes metrics from the MSK brokers and remote writes the metrics to an Amazon Managed Service for Prometheus workspace. As of this writing, Amazon Managed Service for Prometheus is not able to scrape the metrics directly, so a Prometheus server is necessary to do so. We use Amazon Managed Grafana to query and visualize the operational metrics for the Amazon MSK platform.
The following are the high-level steps to deploy the solution:
- Create an EC2 key pair.
- Configure your Amazon MSK cluster and associated resources. We demonstrate how to configure an existing Amazon MSK cluster or create a new one.
- Option A: Modify an existing Amazon MSK cluster
- Option B: Create a new Amazon MSK cluster
- Enable AWS IAM Identity Center (successor to AWS Single Sign-On), if not enabled.
- Configure Amazon Managed Grafana and Amazon Managed Service for Prometheus.
- Configure Prometheus and start the service.
- Configure the data sources in Amazon Managed Grafana.
- Import the Grafana dashboard.
Prerequisites
- You download three CloudFormation template files along with the Prometheus configuration file (prometheus.yml), the targets.json file (you need this to update the MSK broker DNS later on), and three JSON files for creating a dashboard within Amazon Managed Grafana.
- Make sure that internet access is allowed so that the Prometheus server can download the Prometheus Docker image.
1. Create an EC2 key pair
To create your EC2 key pair, complete the following steps:
- On the Amazon EC2 console, under Network & Security in the navigation pane, choose Key Pairs.
- Choose Create key pair.
- For Name, enter DemoMSKKeyPair.
- For Key pair type, select RSA.
- For Private key file format, choose the format in which to save the private key:
- To save the private key in a format that can be used with OpenSSH, select .pem.
- To save the private key in a format that can be used with PuTTY, select .ppk.
The private key file is automatically downloaded by your browser. The base file name is the name that you specified as the name of your key pair, and the file name extension is determined by the file format that you chose.
- Save the private key file in a safe place.
2. Configure your Amazon MSK cluster and associated resources
Use one of the following options to configure an existing Amazon MSK cluster or create a new one.
2.a Modify an existing Amazon MSK cluster
If you want to create a new Amazon MSK cluster for this solution, skip to section 2.b Create a new Amazon MSK cluster. Otherwise, complete the steps in this section to modify an existing cluster.
Validate cluster monitoring settings
We must enable enhanced partition-level monitoring (available at an additional cost) and open monitoring with Prometheus. Note that open monitoring with Prometheus is only available for provisioned mode clusters.
- Sign in to the account that contains the Amazon MSK cluster that you want to monitor.
- Open your Amazon MSK cluster.
- On the Properties tab, navigate to Monitoring metrics.
- Check the monitoring level for Amazon CloudWatch metrics for this cluster, and choose Edit to edit the cluster.
- Select Enhanced partition-level monitoring.
- Check the monitoring label for Open monitoring with Prometheus, and choose Edit to edit the cluster.
- Select Enable open monitoring with Prometheus.
- Under Prometheus exporters, select JMX Exporter and Node Exporter.
- Under Broker log delivery, select Deliver to Amazon CloudWatch Logs.
- For Log group, enter your log group for Amazon MSK.
- Choose Save changes.
Deploy CloudFormation stack
Now we deploy the CloudFormation stack Prometheus_Cloudformation.yml that we downloaded earlier.
- On the AWS CloudFormation console, choose Stacks in the navigation pane.
- Choose Create stack.
- For Prepare template, select Template is ready.
- For Template source, select Upload a template file.
- Upload the Prometheus_Cloudformation.yml file, then choose Next.
- For Stack name, enter Prometheus.
- VPCID – Provide the VPC ID where your Amazon MSK cluster is deployed (mandatory)
- VPCCIdr – Provide the VPC CIDR where your Amazon MSK cluster is deployed (mandatory)
- SubnetID – Provide the ID of any one of the subnets where your existing Amazon MSK cluster is deployed (mandatory)
- MSKClusterName – Provide the name of your existing Amazon MSK cluster
- Leave Cloud9InstanceType, KeyName, and LatestAmiId as default.
- Choose Next.
- On the Review page, select I acknowledge that AWS CloudFormation might create IAM resources.
- Choose Create stack.
You’re redirected to the AWS CloudFormation console, and can see the status as CREATE_IN_PROGRESS. Wait until the status changes to CREATE_COMPLETE.
- On the stack’s Outputs tab, note the values for the following keys (if you don’t see anything under the Outputs tab, choose the refresh icon):
PrometheusInstancePrivateIP
PrometheusSecurityGroupId
Update the Amazon MSK cluster security group
Complete the following steps to update the security group of the existing Amazon MSK cluster to allow communication from the Kafka client and Prometheus server:
- On the Amazon MSK console, navigate to your Amazon MSK cluster.
- On the Properties tab, under Network settings, open the security group.
- Choose Edit inbound rules.
- Choose Add rule and create your rule with the following parameters:
- Type – Custom TCP
- Port range – 11001–11002
- Source – The Prometheus server security group ID
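Once the inbound rule is in place, you may want to confirm from the Prometheus server that the exporter ports are reachable. The following Python sketch is one way to do that with only the standard library. The mapping of port 11001 to the JMX Exporter and 11002 to the Node Exporter follows the Amazon MSK open monitoring documentation, and the broker host name in the usage example is a placeholder, not a real endpoint.

```python
import socket

# Ports opened by the inbound rule. The exporter-to-port mapping is taken
# from the Amazon MSK open monitoring documentation.
EXPORTER_PORTS = {"jmx_exporter": 11001, "node_exporter": 11002}


def port_is_open(host, port, timeout=3.0):
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS failures.
        return False


def check_brokers(brokers, ports=EXPORTER_PORTS, timeout=3.0):
    """Map each 'host:port' string to a reachability flag."""
    results = {}
    for host in brokers:
        for port in ports.values():
            results[f"{host}:{port}"] = port_is_open(host, port, timeout)
    return results


if __name__ == "__main__":
    # Placeholder host name; replace it with your broker endpoints.
    for target, reachable in check_brokers(["b-1.example.internal"]).items():
        print(target, "reachable" if reachable else "NOT reachable")
```

A result of NOT reachable usually points to the security group rule or to open monitoring not being enabled on the cluster, although this check alone cannot tell the two apart.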
Set up your AWS Cloud9 environment
To configure your AWS Cloud9 environment, complete the following steps:
- On the AWS Cloud9 console, choose Environments in the navigation pane.
- Select Cloud9EC2Bastion and choose Open in Cloud9.
- Close the Welcome tab and open a new terminal tab.
- Create an SSH key file with the contents from the private key file DemoMSKKeyPair using the following command:
- Run the following command to list the newly created key file:
- Open the file, enter the contents of the private key file DemoMSKKeyPair, then save the file.
- Change the permissions of the file using the following command:
- Log in to the Prometheus server using this key file and the private IP noted earlier:
- When you’re logged in, check if the Docker service is up and running using the following command:
- To exit the server, enter exit and press Enter.
2.b Create a new Amazon MSK cluster
If you don’t have an Amazon MSK cluster running in your environment, or you don’t want to use an existing cluster for this solution, complete the steps in this section.
As part of these steps, your cluster will have the following properties:
Deploy CloudFormation stack
Complete the following steps to deploy the CloudFormation stack MSKResource_Cloudformation.yml:
- On the AWS CloudFormation console, choose Stacks in the navigation pane.
- Choose Create stack.
- For Prepare template, select Template is ready.
- For Template source, select Upload a template file.
- Upload the MSKResource_Cloudformation.yml file, then choose Next.
- For Stack name, enter MSKDemo.
- Network Configuration – Generic (mandatory)
- Stack to be deployed in NEW VPC? (true/false) – If false, you MUST provide VPCCidr and the other details under the Existing VPC section (default is true)
- VPCCidr – Default is 10.0.0.0/16 for a new VPC. You can use any valid value for your environment. If deploying in an existing VPC, provide the CIDR of that VPC
- Network Configuration – For New VPC
- PrivateSubnetMSKOneCidr (Default is 10.0.1.0/24)
- PrivateSubnetMSKTwoCidr (Default is 10.0.2.0/24)
- PrivateSubnetMSKThreeCidr (Default is 10.0.3.0/24)
- PublicOneCidr (Default is 10.0.0.0/24)
- Network Configuration – For Existing VPC (you need at least 4 subnets)
- VpcId – Provide the value if you are using an existing VPC to deploy the resources; otherwise leave it blank (default)
- SubnetID1 – Any one of the existing subnets from the given VPCID
- SubnetID2 – Any one of the existing subnets from the given VPCID
- SubnetID3 – Any one of the existing subnets from the given VPCID
- PublicSubnetID – Any one of the existing subnets from the given VPCID
- Leave the remaining parameters as default and choose Next.
- On the Review page, select I acknowledge that AWS CloudFormation might create IAM resources.
- Choose Create stack.
You’re redirected to the AWS CloudFormation console, and can see the status as CREATE_IN_PROGRESS. Wait until the status changes to CREATE_COMPLETE.
- On the stack’s Outputs tab, note the values for the following (if you don’t see anything under the Outputs tab, choose the refresh icon):
KafkaClientPrivateIP
PrometheusInstancePrivateIP
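If you change the default CIDR values for this stack, it can be worth sanity-checking them before you choose Create stack, because every subnet has to sit inside the VPC range and subnets must not overlap each other. The following Python sketch performs that check with the standard ipaddress module. The parameter names and default values mirror the list above; the helper function itself is hypothetical and is not part of the CloudFormation template.

```python
import ipaddress

# Default values listed for the new-VPC option of the stack.
DEFAULT_VPC_CIDR = "10.0.0.0/16"
DEFAULT_SUBNETS = {
    "PrivateSubnetMSKOneCidr": "10.0.1.0/24",
    "PrivateSubnetMSKTwoCidr": "10.0.2.0/24",
    "PrivateSubnetMSKThreeCidr": "10.0.3.0/24",
    "PublicOneCidr": "10.0.0.0/24",
}


def validate_subnets(vpc_cidr, subnet_cidrs):
    """Return a list of problems; an empty list means the layout is consistent."""
    vpc = ipaddress.ip_network(vpc_cidr)
    subnets = {name: ipaddress.ip_network(cidr) for name, cidr in subnet_cidrs.items()}
    problems = []
    # Every subnet must be fully contained in the VPC range.
    for name, net in subnets.items():
        if not net.subnet_of(vpc):
            problems.append(f"{name} ({net}) is outside the VPC range {vpc}")
    # No two subnets may share addresses.
    names = list(subnets)
    for index, first in enumerate(names):
        for second in names[index + 1:]:
            if subnets[first].overlaps(subnets[second]):
                problems.append(f"{first} overlaps {second}")
    return problems


if __name__ == "__main__":
    issues = validate_subnets(DEFAULT_VPC_CIDR, DEFAULT_SUBNETS)
    print("OK" if not issues else "\n".join(issues))
```

CloudFormation reports invalid combinations as well, but only after the stack has started creating resources, so a local check can save a rollback.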
Set up your AWS Cloud9 environment
Follow the steps outlined in the previous section to configure your AWS Cloud9 environment.
Retrieve the cluster broker list
To get your MSK cluster broker list, complete the following steps:
- On the Amazon MSK console, navigate to your cluster.
- In the Cluster summary section, choose View client information.
- In the Bootstrap servers section, copy the private endpoint.
You need this value to perform some operations later, such as creating an MSK topic, producing sample messages, and consuming those sample messages.
- Choose Done.
- On the Properties tab, in the Brokers details section, note the endpoints listed.
These need to be updated in the targets.json file (used for Prometheus configuration in a later step).
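As a rough illustration of what updating targets.json involves, the following Python sketch turns a list of broker endpoints into the JSON structure that Prometheus file-based service discovery reads. The job labels "jmx" and "node" and the ports 11001 and 11002 follow the example in the Amazon MSK open monitoring documentation; the targets.json file in the repo may use different labels, so treat the layout as an assumption. The broker host names are placeholders.

```python
import json


def build_targets(broker_endpoints, jmx_port=11001, node_port=11002):
    """Build file_sd target groups for the JMX and Node exporters.

    Any client port on an endpoint (for example ':9094') is dropped,
    because the exporters listen on their own ports.
    """
    hosts = [
        endpoint.strip().split(":")[0]
        for endpoint in broker_endpoints
        if endpoint.strip()
    ]
    return [
        {"labels": {"job": "jmx"}, "targets": [f"{host}:{jmx_port}" for host in hosts]},
        {"labels": {"job": "node"}, "targets": [f"{host}:{node_port}" for host in hosts]},
    ]


if __name__ == "__main__":
    # Placeholder endpoints; use the values from the Brokers details section.
    brokers = ["b-1.example.internal:9094", "b-2.example.internal:9094"]
    print(json.dumps(build_targets(brokers), indent=2))
```

Generating the file this way avoids copy-and-paste mistakes when a cluster has many brokers, but editing the downloaded file by hand works just as well for a small cluster.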
3. Enable IAM Identity Center
Before you deploy the CloudFormation stack for Amazon Managed Service for Prometheus and Amazon Managed Grafana, make sure to enable IAM Identity Center.
If you don’t use IAM Identity Center, you can alternatively set up user authentication via SAML. For more information, refer to Using SAML with your Amazon Managed Grafana workspace.
If IAM Identity Center is currently enabled and configured in another Region, you don’t need to enable it in your current Region.
Complete the following steps to enable IAM Identity Center:
- On the IAM Identity Center console, under Enable IAM Identity Center, choose Enable.
- Choose Create AWS organization.
4. Configure Amazon Managed Grafana and Amazon Managed Service for Prometheus
Complete the steps in this section to set up Amazon Managed Service for Prometheus and Amazon Managed Grafana.
Deploy CloudFormation template
Complete the following steps to deploy the CloudFormation stack AMG_AMP_Cloudformation.yml:
- On the AWS CloudFormation console, choose Stacks in the navigation pane.
- Choose Create stack.
- For Prepare template, select Template is ready.
- For Template source, select Upload a template file.
- Upload the AMG_AMP_Cloudformation.yml file, then choose Next.
- For Stack name, enter ManagedPrometheusAndGrafanaStack, then choose Next.
- On the Review page, select I acknowledge that AWS CloudFormation might create IAM resources.
- Choose Create stack.
You’re redirected to the AWS CloudFormation console, and can see the status as CREATE_IN_PROGRESS. Wait until the status changes to CREATE_COMPLETE.
- On the stack’s Outputs tab, note the values for the following (if you don’t see anything under the Outputs tab, choose the refresh icon):
- GrafanaWorkspaceURL – This is the Amazon Managed Grafana URL
- PrometheusEndpointWriteURL – This is the Amazon Managed Service for Prometheus write endpoint URL
Create a user for Amazon Managed Grafana
Complete the following steps to create a user for Amazon Managed Grafana:
- On the IAM Identity Center console, choose Users in the navigation pane.
- Choose Add user.
- For Username, enter grafana-admin.
- Enter and confirm your email address to receive a confirmation email.
- Skip the optional steps, then choose Add user.
A success message appears at the top of the console.
- In the confirmation email, choose Accept invitation and set your user password.
- On the Amazon Managed Grafana console, choose Workspaces in the navigation pane.
- Open the workspace Amazon-Managed-Grafana.
- Make a note of the Grafana workspace URL.
You use this URL to log in to view your Grafana dashboards.
- On the Authentication tab, choose Assign new user or group.
- Select the user you created earlier and choose Assign users and groups.
- On the Action menu, choose what kind of user to make it: admin, editor, or viewer.
Note that your Grafana workspace needs at least one admin user.
- Navigate to the Grafana URL you copied earlier in your browser.
- Choose Sign in with AWS IAM Identity Center.
- Log in with your IAM Identity Center credentials.
5. Configure Prometheus and start the service
When you cloned the GitHub repo, you downloaded two configuration files: prometheus.yml and targets.json. In this section, we configure these two files.
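For prometheus.yml, the values that change are the remote write URL and the Region used for request signing. The following Python sketch derives the Region from an Amazon Managed Service for Prometheus write URL and prints a remote_write block that uses the sigv4 setting of Prometheus. It assumes the standard endpoint form (aps-workspaces.REGION.amazonaws.com), the workspace ID shown is a placeholder, and the prometheus.yml in the repo may be laid out differently, so use the output only as a reference for what to look for.

```python
import re
from urllib.parse import urlparse

# Host name pattern of an Amazon Managed Service for Prometheus endpoint.
_HOST = re.compile(r"^aps-workspaces\.([a-z0-9-]+)\.amazonaws\.com$")


def region_from_write_url(url):
    """Extract the AWS Region from a remote write URL, or raise ValueError."""
    host = urlparse(url).hostname or ""
    match = _HOST.match(host)
    if match is None:
        raise ValueError(f"not a recognized remote write endpoint: {url!r}")
    return match.group(1)


def render_remote_write(url):
    """Render a remote_write block that signs requests for the URL's Region."""
    region = region_from_write_url(url)
    return (
        "remote_write:\n"
        f"  - url: {url}\n"
        "    sigv4:\n"
        f"      region: {region}\n"
    )


if __name__ == "__main__":
    # Placeholder workspace ID; use the PrometheusEndpointWriteURL output.
    example_url = (
        "https://aps-workspaces.us-east-1.amazonaws.com"
        "/workspaces/ws-EXAMPLE/api/v1/remote_write"
    )
    print(render_remote_write(example_url))
```

Deriving the Region from the URL keeps the two values consistent, which matters because a Region mismatch typically causes the signed remote write requests to be rejected.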
- Use any IDE (Visual Studio Code or Notepad++) to open prometheus.yml.
- In the remote_write section, update the remote write URL and Region.
- Use any IDE to open targets.json.
- Update the targets with the broker endpoints you obtained earlier.
- In your AWS Cloud9 environment, choose File, then Upload Local Files.
- Choose Select Files and upload targets.json and prometheus.yml from your local machine.
- In the AWS Cloud9 environment, run the following commands using the key file you created earlier:
- Copy targets.json to the Prometheus server:
- Copy prometheus.yml to the Prometheus server:
- SSH into the Prometheus server and start the container service for Prometheus:
- Start the Prometheus container:
- Check if the Docker service is running:
6. Configure data sources in Amazon Managed Grafana
To configure your data sources, complete the following steps:
- Log in to the Amazon Managed Grafana URL.
- Choose AWS Data Services in the navigation pane, then choose Data Sources.
- For Service, choose Amazon Managed Service for Prometheus.
- For Region, choose your Region.
The correct resource ID is populated automatically.
- Select your resource ID and choose Add 1 data source.
- Choose Go to settings.
- For Name, enter Amazon Managed Prometheus and enable Default.
The URL is automatically populated.
- Leave everything else as default.
- Choose Save & Test.
If everything is correct, the message Data source is working appears.
Now we configure CloudWatch as a data source.
- Choose AWS Data Services, then choose Data source.
- For Services, choose CloudWatch.
- For Region, choose your correct Region.
- Choose Add data source.
- Select the CloudWatch data source and choose Go to settings.
- For Name, enter AmazonMSK-CloudWatch.
- Choose Save & Test.
7. Import the Grafana dashboard
You can use the following preconfigured dashboards, which are available to download from the GitHub repo:
- Kafka Metrics
- MSK Cluster Overview
- AWS MSK – Kafka Cluster-CloudWatch
To import your dashboard, complete the following steps:
- In Amazon Managed Grafana, choose the plus sign in the navigation pane.
- Choose Import.
- Choose Upload JSON file.
- Choose the dashboard you downloaded.
- Choose Load.
The following screenshot shows your loaded dashboard.
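If an imported dashboard shows empty panels, a common cause is a mismatch between the data source names configured earlier and the names the dashboard JSON refers to. The following Python sketch lists every data source reference in a dashboard document so that you can compare them. It relies on the general Grafana convention that panels and queries carry a "datasource" field holding either a name or an object with "uid" and "type" keys; the dashboards in the repo were not inspected for this sketch, and the sample document is hypothetical.

```python
import json


def datasource_refs(node, found=None):
    """Recursively collect every 'datasource' reference in a dashboard document."""
    if found is None:
        found = set()
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "datasource":
                if isinstance(value, dict):
                    # Object form, for example {"type": ..., "uid": ...}.
                    name = value.get("uid") or value.get("type")
                else:
                    # Plain string form, or None when the default is used.
                    name = value
                if name:
                    found.add(str(name))
            else:
                datasource_refs(value, found)
    elif isinstance(node, list):
        for item in node:
            datasource_refs(item, found)
    return found


if __name__ == "__main__":
    # Hypothetical dashboard fragment; load a downloaded JSON file instead,
    # for example with json.load(open(path)).
    sample = json.loads("""
    {"panels": [
        {"datasource": "Amazon Managed Prometheus", "targets": [{"expr": "up"}]},
        {"datasource": {"type": "cloudwatch", "uid": "AmazonMSK-CloudWatch"}},
        {"datasource": null}
    ]}
    """)
    print(sorted(datasource_refs(sample)))
```

If the printed names differ from Amazon Managed Prometheus and AmazonMSK-CloudWatch, either rename the data sources in Grafana or select the matching data source on each panel after import.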
Generate sample data in Amazon MSK (Optional – when you create a new Amazon MSK cluster)
To generate sample data in Amazon MSK, complete the following steps:
- In your AWS Cloud9 environment, log in to the Kafka client.
- Set the broker endpoint variable:
- Run the following command to create a topic called TLSTestTopic60:
- Still logged in to the Kafka client, run the following command to start the producer service:
- Open a new terminal from within your AWS Cloud9 environment and log in to the Kafka client instance.
- Set the broker endpoint variable:
- Now you can start the consumer service and see the incoming messages:
- Press CTRL+C to stop the producer/consumer service.
Kafka metrics dashboards on Amazon Managed Grafana
You can now view your Kafka metrics dashboards on Amazon Managed Grafana:
- Cluster overall health – Configured using Amazon Managed Service for Prometheus as the data source
- Amazon MSK cluster overview – Configured using Amazon Managed Service for Prometheus as the data source:
- Critical metrics
- Cluster throughput (broker-level metrics)
- Kafka cluster operation metrics – Configured using CloudWatch as the data source
Clean up
You will continue to incur costs until you delete the infrastructure that you created for this post. Delete the CloudFormation stacks you used to create the respective resources.
If you used an existing cluster, make sure to remove the inbound rules you added to the security group (otherwise the stack deletion will fail).
- On the Amazon MSK console, navigate to your existing cluster.
- On the Properties tab, in the Networking settings section, open the security group you applied.
- Choose Edit inbound rules.
- Choose Delete to remove the rules you added.
- Choose Save rules.
Now you can delete your CloudFormation stacks.
- On the AWS CloudFormation console, choose Stacks in the navigation pane.
- Select ManagedPrometheusAndGrafanaStack and choose Delete.
- If you used an existing Amazon MSK cluster, delete the stack Prometheus.
- If you created a new Amazon MSK cluster, delete the stack MSKDemo.
Conclusion
This post showed how you can deploy a fully managed, highly available, scalable, and secure monitoring system for Amazon MSK using Amazon Managed Service for Prometheus and Amazon Managed Grafana, and use Grafana dashboards to gain deep insights into various operational metrics. Although this post only discussed using Amazon Managed Service for Prometheus and CloudWatch as the data sources in Amazon Managed Grafana, you can enable various other data sources, such as AWS IoT SiteWise, AWS X-Ray, Amazon Redshift, and Amazon Athena, and build a dashboard on top of those metrics. You can use these managed services for monitoring any number of Amazon MSK platforms. Metrics are available to query in Amazon Managed Grafana or Amazon Managed Service for Prometheus in near-real time.
You can use this post as prescriptive guidance to deploy an observability solution for a new or an existing Amazon MSK cluster, identify the metrics that are important for your applications, and then create a dashboard using Amazon Managed Grafana and Prometheus.
About the Authors
Anand Mandilwar is an Enterprise Solutions Architect at AWS. He works with enterprise customers, helping them innovate and transform their business in AWS. He is passionate about automation around cloud operations, infrastructure provisioning, and cloud optimization. He also likes Python programming. In his spare time, he enjoys honing his photography skills, especially in portrait and landscape photography.
Ajit Puthiyavettle is a Solution Architect working with enterprise clients, architecting solutions to achieve business outcomes. He is passionate about solving customer challenges with innovative solutions. His experience is in leading DevOps and security teams for enterprise and SaaS (Software as a Service) companies. Recently he has focused on helping customers with security, ML, and HCLS workloads.