Ingest, transform, and deliver events published by Amazon Security Lake to Amazon OpenSearch Service

With the recent introduction of Amazon Security Lake, it has never been simpler to access all your security-related data in one place. Whether it’s findings from AWS Security Hub, DNS query data from Amazon Route 53, network events such as VPC Flow Logs, or third-party integrations provided by partners such as Barracuda Email Protection, Cisco Firepower Management Center, or Okta identity logs, you now have a centralized environment in which you can correlate events and findings using a broad range of tools in the AWS and partner ecosystem.

Security Lake automatically centralizes security data from cloud, on-premises, and custom sources into a purpose-built data lake stored in your account. With Security Lake, you can get a more complete understanding of your security data across your entire organization. You can also improve the protection of your workloads, applications, and data. Security Lake has adopted the Open Cybersecurity Schema Framework (OCSF), an open standard. With OCSF support, the service can normalize and combine security data from AWS and a broad range of enterprise security data sources.

When it comes to near-real-time analysis of data as it arrives in Security Lake and responding to security events your company cares about, Amazon OpenSearch Service provides the necessary tooling to help you make sense of the data found in Security Lake.

OpenSearch Service is a fully managed and scalable log analytics framework that is used by customers to ingest, store, and visualize data. Customers use OpenSearch Service for a diverse set of data workloads, including healthcare data, financial transactions information, application performance data, observability data, and much more. Additionally, customers use the managed service for its ingest performance, scalability, low query latency, and ability to analyze large datasets.

This post shows you how to ingest, transform, and deliver Security Lake data to OpenSearch Service for use by your SecOps teams. We also walk you through how to use a series of prebuilt visualizations to view events across multiple AWS data sources provided by Security Lake.

Understanding the event data found in Security Lake

Security Lake stores the normalized OCSF security events in Apache Parquet format, an optimized columnar data storage format with efficient data compression and enhanced performance to handle complex data in bulk. Parquet format is a foundational format in the Apache Hadoop ecosystem and is integrated into AWS services such as Amazon Redshift Spectrum, AWS Glue, Amazon Athena, and Amazon EMR. It’s a portable columnar format, future proofed to support additional encodings as technology develops, and it has library support across a broad set of languages like Python, Java, and Go. And the best part is that Apache Parquet is open source!

The intent of OCSF is to provide a common language for data scientists and analysts that work with threat detection and investigation. With a diverse set of sources, you can build a complete view of your security posture on AWS using Security Lake and OpenSearch Service.

Understanding the event architecture for Security Lake

Security Lake provides a subscriber framework to provide access to the data stored in Amazon S3. Services such as Amazon Athena and Amazon SageMaker use query access. The solution in this post uses data access to respond to events generated by Security Lake.

When you subscribe for data access, events arrive via Amazon Simple Queue Service (Amazon SQS). Each SQS event contains a notification object that has a “pointer”: data used to create a URL to the Parquet object on Amazon S3. Your subscriber processes the event, parses the data found in the object, and transforms it to whatever format makes sense for your implementation.
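To make the “pointer” idea concrete, here is a minimal Python sketch of how a subscriber might turn one SQS message body into S3 object locations. It assumes the body follows the standard Amazon S3 event notification shape (`Records[].s3.bucket.name` and `Records[].s3.object.key`); the exact envelope your subscriber receives may differ, and the function name is ours, not part of the solution’s Java code.

```python
import json
from typing import List
from urllib.parse import unquote_plus


def s3_urls_from_sqs_body(body: str) -> List[str]:
    """Return an s3:// URL for every object referenced in one SQS message body.

    Assumption: the body is JSON in the standard Amazon S3 event notification
    shape. Adjust the lookups if your subscriber receives a different envelope.
    """
    message = json.loads(body)
    urls = []
    for record in message.get("Records", []):
        bucket = record["s3"]["bucket"]["name"]
        # Object keys in S3 event notifications are URL-encoded.
        key = unquote_plus(record["s3"]["object"]["key"])
        urls.append(f"s3://{bucket}/{key}")
    return urls
```

A subscriber would call this once per received message and then fetch each Parquet object it names.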

The solution we provide in this post uses a subscriber for data access. Let’s drill down into what the implementation looks like so that you understand how it works.

Solution overview

The high-level architecture for integrating Security Lake with OpenSearch Service is as follows.

The workflow contains the following steps:

  1. Security Lake persists Parquet formatted data into an S3 bucket as determined by the administrator of Security Lake.
  2. A notification is placed in Amazon SQS that describes the key to get access to the object.
  3. Java code in an AWS Lambda function reads the SQS notification and prepares to read the object described in the notification.
  4. Java code uses Hadoop, Parquet, and Avro libraries to retrieve the object from Amazon S3 and transform the records in the Parquet object into JSON documents for indexing in your OpenSearch Service domain.
  5. The documents are gathered and then sent to your OpenSearch Service domain, where index templates map the structure into a schema optimized for Security Lake logs in OCSF format.
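Steps 4 and 5 can be sketched in a few lines. The solution itself does this in Java; the Python below is only an illustration of the idea, assuming the records have already been read from Parquet into dictionaries. It groups documents into batches and serializes each batch in the newline-delimited format that the OpenSearch `_bulk` API expects. The index name and helper names are hypothetical.

```python
import json
from typing import Any, Dict, Iterable, Iterator, List


def batched(docs: Iterable[Dict[str, Any]], size: int) -> Iterator[List[Dict[str, Any]]]:
    """Yield lists of at most `size` documents."""
    batch: List[Dict[str, Any]] = []
    for doc in docs:
        batch.append(doc)
        if len(batch) == size:
            yield batch
            batch = []
    if batch:
        yield batch


def to_bulk_payload(index_name: str, docs: Iterable[Dict[str, Any]]) -> str:
    """Serialize documents as an NDJSON body for the _bulk API.

    Each document is preceded by an action line, and the body must end
    with a newline.
    """
    lines = []
    for doc in docs:
        lines.append(json.dumps({"index": {"_index": index_name}}))
        # default=str keeps non-JSON types (for example datetimes) from failing.
        lines.append(json.dumps(doc, default=str))
    if not lines:
        return ""
    return "\n".join(lines) + "\n"
```

Each payload would then be sent in a single POST to the domain’s `_bulk` endpoint, which is far cheaper than indexing one document per request.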

Steps 1–2 are managed by Security Lake; steps 3–5 are managed by the customer. The shaded components are your responsibility. The subscriber implementation for this solution uses Lambda and OpenSearch Service, and these resources are managed by you.

If you are evaluating this as a solution for your business, remember that Lambda has a 15-minute maximum execution time at the time of this writing. Security Lake can produce objects of up to 256 MB, so this solution may not be effective for your company’s needs at large scale. Various levers in Lambda affect the cost of the solution for log delivery, so make cost-conscious decisions when evaluating sample solutions. This Lambda-based implementation is suitable for smaller companies, where the volume of CloudTrail and VPC Flow Logs is low enough that the cost to transform and deliver logs to Amazon OpenSearch Service stays budget friendly.

Now that you have some context, let’s start building the implementation for OpenSearch Service!


Creation of Security Lake for your AWS accounts is a prerequisite for building this solution. Security Lake integrates with an AWS Organizations account to enable the offering for selected accounts in the organization. For a single AWS account that doesn’t use Organizations, you can enable Security Lake without the need for Organizations. You must have administrative access to perform these operations. For multiple accounts, it’s suggested that you delegate the Security Lake activities to another account in your organization. For more information about enabling Security Lake in your accounts, review Getting started.

Additionally, you may need to take the provided template and adjust it to your specific environment. The sample solution relies on access to a public S3 bucket hosted for this blog, so egress rules and permissions modifications may be required if you use S3 endpoints.

This solution assumes that you’re using a domain deployed in a VPC. Additionally, it assumes that you have fine-grained access controls enabled on the domain to prevent unauthorized access to data you store as part of the integration with Security Lake. VPC-deployed domains are privately routable and have no access to the public internet by design. If you want to access your domain in a more public setting, you need to create an NGINX proxy to broker requests between public and private settings.

The remaining sections in this post are focused on how to create the integration with OpenSearch Service.

Create the subscriber

To create your subscriber, complete the following steps:

  1. On the Security Lake console, choose Subscribers in the navigation pane.
  2. Choose Create subscriber.
  3. Under Subscriber details, enter a meaningful name and description.
  4. Under Log and event sources, specify what the subscriber is authorized to ingest. For this post, we select All log and event sources.
  5. For Data access method, select S3.
  6. Under Subscriber credentials, provide the account ID and an external ID for the AWS account you want to provide access to.
  7. For Notification details, select SQS queue.
  8. Choose Create when you are finished filling in the form.

It will take a minute or so to initialize the subscriber framework, such as the SQS integration and the permission generated so that you can access the data from another AWS account. When the status changes from Creating to Created, you have access to the subscriber endpoint on Amazon SQS.

  1. Save the following values found in the subscriber Details section:
    1. AWS role ID
    2. External ID
    3. Subscription endpoint

Use AWS CloudFormation to provision Lambda integration between the two services

An AWS CloudFormation template takes care of a large portion of the setup for the integration. It creates the necessary components to read the data from Security Lake, transform it into JSON, and then index it into your OpenSearch Service domain. The template also provides the necessary AWS Identity and Access Management (IAM) roles for integration, the tooling to create an S3 bucket for the Java JAR file used in the solution by Lambda, and a small Amazon Elastic Compute Cloud (Amazon EC2) instance to facilitate the provisioning of templates in your OpenSearch Service domain.

To deploy your resources, complete the following steps:

  1. On the AWS CloudFormation console, create a new stack.
  2. For Prepare template, select Template is ready.
  3. Specify your template source as Amazon S3 URL.

You can either save the template to your local drive or copy the link for use on the AWS CloudFormation console. In this example, we use the template URL that points to a template stored on Amazon S3. You can either use the URL on Amazon S3 or install it from your device.

  1. Choose Next.
  2. Enter a name for your stack. For this post, we name the stack blog-lambda. Start populating your parameters based on the values you copied from Security Lake and OpenSearch Service. Ensure that the endpoint for the OpenSearch domain has a forward slash / at the end of the URL that you copy from OpenSearch Service.
  3. Populate the parameters with values you have saved or copied from OpenSearch Service and Security Lake, then choose Next.
  4. Select Preserve successfully provisioned resources to preserve the resources in case the stack rolls back so you can debug the issues.
  5. Scroll to the bottom of the page and choose Next.
  6. On the summary page, select the check box that acknowledges IAM resources will be created and used in this template.
  7. Choose Submit.

The stack will take a couple of minutes to deploy.

  1. After the stack has deployed, navigate to the Outputs tab for the stack you created.
  2. Save the CommandProxyInstanceID for executing scripts and save the two role ARNs to use in the role mappings step.

You need to associate the IAM roles for the tooling instance and the Lambda function with OpenSearch Service security roles so that the processes can work with the cluster and the resources within.

Provision role mappings for integrations with OpenSearch Service

With the template-generated IAM roles, you need to map the roles using role mapping to the predefined all_access role in your OpenSearch Service cluster. You should evaluate your specific use of any roles and ensure they are aligned with your company’s requirements.

  1. In OpenSearch Dashboards, choose Security in the navigation pane.
  2. Choose Roles in the navigation pane and look up the all_access role.
  3. On the role details page, on the Mapped users tab, choose Manage mapping.
  4. Add the two IAM roles found in the outputs of the CloudFormation template, then choose Map.

Provision the index templates used for OCSF format in OpenSearch Service

Index templates have been provided as part of the initial setup. These templates are crucial to the format of the data so that ingestion is efficient and tuned for aggregations and visualizations. Data that comes from Security Lake is transformed into a JSON format, and this format is based directly on the OCSF standard.

For example, each OCSF category has a common Base Event class that contains multiple objects. A Cloud object represents details like the cloud provider. An Enrichment object carries enrichment data with a common structure across events but different values depending on the event. Other structures are more complex, with inner objects that themselves have more inner objects, such as the Metadata object, which is still part of the Base Event class. The Base Event class is the foundation for all categories in OCSF and helps you with the effort of correlating events written into Security Lake and analyzed in OpenSearch.

OpenSearch is technically schema-less. You don’t have to define a schema up front. The OpenSearch engine will try to guess the data types and the mappings found in the data coming from Security Lake. This is known as dynamic mapping. The OpenSearch engine also provides you with the option to predefine the data you are indexing. This is known as explicit mapping. Using explicit mappings to identify your data source types and how they are stored at time of ingestion is key to getting high-volume ingest performance for time-centric data indexed at heavy load.

In summary, the mapping templates use composable templates. In this construct, the solution establishes an efficient schema for the OCSF standard and gives you the capability to correlate events and specialize on specific categories in the OCSF standard.
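As a rough illustration of how composable templates fit together, the following Python sketch builds two request bodies: a component template holding explicit mappings for a few Base Event fields, and an index template that composes it and adds a category-specific field. The template names, index pattern, and the small subset of fields are hypothetical choices for this example; the templates shipped with the solution are more extensive.

```python
import json

# Hypothetical names, for illustration only.
BASE_COMPONENT = "ocsf_base_event_example"
INDEX_TEMPLATE = "ocsf_dns_activity_example"

# Body for: PUT _component_template/ocsf_base_event_example
# Explicit mappings for a small subset of OCSF Base Event attributes.
base_component_body = {
    "template": {
        "mappings": {
            "properties": {
                "time": {"type": "date"},
                "severity_id": {"type": "integer"},
                "cloud": {
                    "properties": {
                        "provider": {"type": "keyword"},
                        "region": {"type": "keyword"},
                    }
                },
                "metadata": {
                    "properties": {
                        "product": {"properties": {"name": {"type": "keyword"}}}
                    }
                },
            }
        }
    }
}

# Body for: PUT _index_template/ocsf_dns_activity_example
# Reuses the shared component and adds one DNS-specific field.
index_template_body = {
    "index_patterns": ["ocsf-example-dns-*"],
    "composed_of": [BASE_COMPONENT],
    "priority": 100,
    "template": {
        "mappings": {
            "properties": {
                "query": {"properties": {"hostname": {"type": "keyword"}}}
            }
        }
    },
}

if __name__ == "__main__":
    print(json.dumps(index_template_body, indent=2))
```

Because every category template lists the same base component in `composed_of`, the shared fields are mapped identically in every index, which is what makes cross-category correlation practical.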

You load the templates using the tools proxy created by your CloudFormation template.

  1. On the stack’s Outputs tab, find the parameter CommandProxyInstanceID.

We use that value to find the instance in AWS Systems Manager.

  1. On the Systems Manager console, choose Fleet Manager in the navigation pane.
  2. Locate and select your managed node.
  3. On the Node actions menu, choose Start terminal session.
  4. When you’re connected to the instance, run the following commands:
    . /usr/share/es-scripts/ | grep -o '{"acknowledged":true}' | wc -l

You should see a final result of 42 occurrences of {"acknowledged":true}, which demonstrates the commands being sent were successful. Ignore the warnings you see for migration. The warnings don’t affect the scripts and as of this writing can’t be muted.
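If you capture the script output to a file instead of piping it through `grep` and `wc`, the same check is easy to reproduce. The sketch below is a hypothetical helper, not part of the provided scripts; it counts non-overlapping occurrences of the acknowledgement string, which is what `grep -o ... | wc -l` reports.

```python
ACK = '{"acknowledged":true}'


def count_acknowledged(output: str) -> int:
    """Count how many times the acknowledgement appears in captured output."""
    return output.count(ACK)


def all_templates_loaded(output: str, expected: int = 42) -> bool:
    """True when the output holds the expected number of acknowledgements."""
    return count_acknowledged(output) == expected
```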

  1. Navigate to Dev Tools in OpenSearch Dashboards and run the following command:

This confirms that the scripts were successful.

Install index patterns, visualizations, and dashboards for the solution

For this solution, we prepackaged a few visualizations so that you can make sense of your data. Download the visualizations to your local desktop, then complete the following steps:

  1. In OpenSearch Dashboards, navigate to Stack Management and Saved Objects.
  2. Choose Import.
  3. Choose the file from your local device, select your import options, and choose Import.

You will see numerous objects that you imported. You can use the visualizations after you start importing data.

Enable the Lambda function to start processing events into OpenSearch Service

The final step is to go into the configuration of the Lambda function and enable the triggers so that the data can be read from the subscriber framework in Security Lake. The trigger is currently disabled; you need to enable it and save the config. You will notice the function is throttled, which is by design: the templates need to be in the OpenSearch cluster before any data arrives so that the data indexes in the desired format.

  1. On the Lambda console, navigate to your function.
  2. On the Configurations tab, in the Triggers section, select your SQS trigger and choose Edit.
  3. Select Activate trigger and save the setting.
  4. Choose Edit concurrency.
  5. Configure your concurrency and choose Save.

Enable the function by setting the concurrency setting to 1. You can adjust the setting as needed for your environment.
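If you prefer to script these console steps, the sketch below shows one way to do it with the AWS SDK for Python. It assumes you pass in a boto3 Lambda client along with the function name and the UUID of the SQS event source mapping; those values come from your own stack, and the helper name is ours. The two client calls, `update_event_source_mapping` and `put_function_concurrency`, are the Lambda API operations behind the Activate trigger and Edit concurrency actions.

```python
def enable_processing(lambda_client, function_name: str, mapping_uuid: str,
                      concurrency: int = 1) -> None:
    """Activate the SQS trigger, then lift the throttle on the function.

    `lambda_client` is expected to behave like boto3.client("lambda").
    """
    # Equivalent to selecting "Activate trigger" on the SQS trigger.
    lambda_client.update_event_source_mapping(UUID=mapping_uuid, Enabled=True)
    # Equivalent to "Edit concurrency": reserve N concurrent executions.
    lambda_client.put_function_concurrency(
        FunctionName=function_name,
        ReservedConcurrentExecutions=concurrency,
    )
```

With boto3 installed and credentials configured, a call might look like `enable_processing(boto3.client("lambda"), "my-function-name", "my-mapping-uuid")`, where both string values are placeholders for your own resources.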

You can review the Amazon CloudWatch logs on the CloudWatch console to confirm the function is working.

You should see startup messages and other event information that indicates logs are being processed. The provided JAR file is set for information-level logging. If you need to debug any problems, there is a verbose debug version of the JAR file you can use. Your JAR file options are:

If you choose to deploy the debug version, the verbosity of the code will show some error-level details in the Hadoop libraries. To be clear, Hadoop code will display lots of exceptions in debug mode because it tests environment settings and looks for things that aren’t provisioned in your Lambda environment, like a Hadoop metrics collector. Most of these startup errors are not fatal and can be ignored.

Visualize the data

Now that you have data flowing into OpenSearch Service from Security Lake via Lambda, it’s time to put those imported visualizations to work. In OpenSearch Dashboards, navigate to the Dashboards page.

You will see four primary dashboards, each aligned to the OCSF category it supports. The four supported visualization categories are for DNS activity, security findings, network activity, and AWS CloudTrail using the Cloud API.

Security findings

The findings dashboard is a series of high-level summary information that you use for visual inspection of AWS Security Hub findings in a time window specified by you in the dashboard filters. Many of the encapsulated visualizations give “filter on click” capabilities so you can narrow your discoveries. The following screenshot shows an example.

The Finding Velocity visualization shows findings over time based on severity. The Finding Severity visualization shows which “findings” have passed or failed, and the Findings table visualization is a tabular view with actual counts. Your goal is to be near zero in all the categories except informational findings.

Network activity

The network traffic dashboard provides an overview for all your accounts in the organization that are enabled for Security Lake. The following example is monitoring 260 AWS accounts, and this dashboard summarizes the top accounts with network activities. Aggregate traffic, top accounts generating traffic, and top accounts with the most activity are found in the first section of the visualizations.

Additionally, the top accounts are summarized by allow and deny actions for connections. In the visualization below, there are fields that you can use to drill down into other visualizations. Some of these visualizations have links to third-party websites that may or may not be allowed in your company. You can edit the links in the Saved objects in the Stack Management plugin.

For drill downs, you can choose the account ID to get a summary by account. The list of egress and ingress traffic within a single AWS account is sorted by the volume of bytes transferred between any given two IP addresses.

Finally, if you choose the IP addresses, you’ll be redirected to Project Honey Pot, where you can see if the IP address is a threat or not.

DNS activity

The DNS activity dashboard shows you the requestors for DNS queries in your AWS accounts. Again, this is a summary view of all the events in a time window.

The first visualization in the dashboard shows DNS activity in aggregate across the top five active accounts. Of the 260 accounts in this example, four are active. The next visualization breaks the resolves down by the requesting service or host, and the final visualization breaks out the requestors by account, VPC ID, and instance ID for those queries run by your solutions.

API activity

The final dashboard gives an overview of API activity via CloudTrail across all your accounts. It summarizes things like API call velocity, operations by service, top operations, and other summary information.

The first visualization in the dashboard gives you an idea of which services are receiving the most requests. You sometimes need to understand where to focus the majority of your threat discovery efforts based on which services may be consumed differently over time. Next, there are heat maps that break down API activity by region and service, which give you an idea of what type of API calls are most prevalent in the accounts you are monitoring.

As you scroll down on the form, more details present themselves, such as the top five services with API activity and the top API operations for the organization you are monitoring.


Security Lake integration with OpenSearch Service is easy to achieve by following the steps outlined in this post. Security Lake data is transformed from Parquet to JSON, making it readable and simple to query. Enable your SecOps teams to identify and investigate potential security threats by analyzing Security Lake data in OpenSearch Service. The provided visualizations and dashboards can help to navigate the data, identify trends, and rapidly detect any potential security issues in your organization.

As next steps, we recommend using the above framework and associated templates, which provide you with easy steps to visualize your Security Lake data using OpenSearch Service.

In a series of follow-up posts, we will review the source code and walk through published examples of the Lambda ingestion framework in the AWS Samples GitHub repo. The framework can be modified for use in containers to help address companies that have longer processing times for large files published in Security Lake. Additionally, we will discuss how to detect and respond to security events using example implementations that use OpenSearch plugins such as Security Analytics, Alerting, and Anomaly Detection, all available in Amazon OpenSearch Service.

About the authors

Kevin Fallis (@AWSCodeWarrior) is a Principal AWS Specialist Search Solutions Architect. His passion at AWS is to help customers leverage the correct mix of AWS services to achieve success for their business goals. His after-work activities include family, DIY projects, carpentry, playing drums, and all things music.

Jimish Shah is a Senior Product Manager at AWS with 15+ years of experience bringing products to market in log analytics, cybersecurity, and IP video streaming. He’s passionate about launching products that offer delightful customer experiences and solve complex customer problems. In his free time, he enjoys exploring cafes, hiking, and taking long walks.

Ross Warren is a Senior Product SA at AWS for Amazon Security Lake based in Northern Virginia. Prior to his work at AWS, Ross’ areas of focus included cyber threat hunting and security operations. When he is not talking about AWS, he likes to spend time with his family, bake bread, make sawdust, and enjoy time outside.