Releases: airbnb/streamalert
v2.0.0
StreamAlert Release v2.0.0
New Features
Alert Merging
This release brings new support for merging similar alerts. The rate at which alerts can be triggered is occasionally unpredictable and can be a potential pain for analysts downstream. Combing through a massive number of alerts becomes tiresome and could easily result in missing the needle in the haystack. We have introduced an alert merging feature that allows rules to dictate how the numerous alerts generated by a rule in a given timeframe are merged together into a single, unified alert. This feature has proven to greatly reduce alert overload. See the rules page of the documentation for how to configure alert merging. See also: #612, #640, #642, #666
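For illustration, a rule that opts into merging might look like the sketch below. The merge_by_keys and merge_window_mins keyword arguments and the import path are assumptions based on the rules documentation, so consult the docs for the exact interface:

# Hypothetical sketch of a rule that opts into alert merging.
# The merge_by_keys / merge_window_mins arguments and the import path are
# assumptions; see the rules documentation for the supported options.
from stream_alert.shared.rule import rule  # import path may differ by version

@rule(logs=['cloudtrail:events'],
      outputs=['slack:security'],
      merge_by_keys=['userIdentity'],   # merge alerts that share this key
      merge_window_mins=15)             # within a 15 minute window
def access_denied(rec):
    """Fires on AccessDenied errors; similar alerts are merged into one."""
    return rec.get('errorCode') == 'AccessDenied'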
Rule Baking/Staging
Release 2.0.0 introduces a new feature we’ve appropriately deemed Rule Baking. Often, new rules have not been battle tested and need to be tuned over time to prevent alert fatigue. This feature introduces a staging period for new rules, along with a daily email digest. The digest includes statistics on alerts that have fired for staged rules, allowing analysts to get metrics on new rules before they go into production. As an added bonus, any rule, new or old, can be toggled into and out of staging on-demand without any code changes.
SNS & SQS Alerting Outputs
This release introduces support for sending alerts to AWS SNS (Simple Notification Service) and SQS (Simple Queue Service). SNS is ideal for sending email or text messages when alerts occur, while SQS better enables downstream and reliable alert processing by other tools.
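As a rough illustration, a rule can then reference the new outputs by their service:descriptor names, just like the other rule examples in these notes. The descriptors below are placeholders for whatever you configure with the output new CLI command:

# Hypothetical rule fanning alerts out to SNS and SQS outputs; the
# 'security-notifications' and 'alert-queue' descriptors are placeholders.
@rule(logs=['osquery:differential'],
      outputs=['aws-sns:security-notifications', 'aws-sqs:alert-queue'])
def new_mounts(rec):
    return rec.get('action') == 'added'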
Slack App
This release adds a new StreamAlert App for Slack, which collects logs from the Slack API and sends them to StreamAlert for rule processing and alerting.
Salesforce App
Salesforce is a commonly used CRM service, and the latest version of its API (at the time of this writing) supports 42 types of event logs; auditing and alerting on these events is no simple task. The new StreamAlert App will collect Console, Login, LoginAs, Report and ReportAs event logs and send them to StreamAlert for rule processing and alerting.
Carbon Black Hash Banning
Thanks to @fusionrace for adding support for automated banning of potentially malicious files via the new Carbon Black alert output. This feature enables rules to specify md5 hashes which should be banned, and the output handles the rest.
Lookup Tables via S3
StreamAlert now supports reading files from S3 that contain lookup information to be used within rules. These JSON files can be gzip compressed, and will be refreshed every few minutes to ensure the AWS Lambda container always has an up-to-date intelligence set.
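The underlying mechanic is roughly the following sketch. This is not StreamAlert's helper API, just an illustration of loading and caching a gzip-compressed JSON lookup file from S3 with boto3; the bucket and key names are placeholders:

import gzip
import io
import json
import time

import boto3

_CACHE = {'data': None, 'refreshed_at': 0}
_REFRESH_SECONDS = 600  # refresh roughly every ten minutes

def load_lookup(bucket='my-lookup-bucket', key='intel/lookup.json.gz'):
    """Fetch a gzipped JSON lookup file from S3, caching it between invocations."""
    if _CACHE['data'] is None or time.time() - _CACHE['refreshed_at'] > _REFRESH_SECONDS:
        body = boto3.client('s3').get_object(Bucket=bucket, Key=key)['Body'].read()
        _CACHE['data'] = json.loads(gzip.GzipFile(fileobj=io.BytesIO(body)).read())
        _CACHE['refreshed_at'] = time.time()
    return _CACHE['data']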
Log Parsing Features
Occasionally, logs can be a JSON object containing an embedded CSV log of interest. This is the case with AWS RDS logs that are captured in a CloudWatch Logs log group and forwarded on to another destination with a CloudWatch Logs subscription. This version of StreamAlert introduces the ability to extract embedded CSV from a JSON object and parse it accordingly. It also adds support for encoded JSON within another JSON object. Various other added benefits were introduced as well; see also: #744, #745
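Conceptually, the extraction works like this short sketch (illustrative only; the field names are made up and this is not the classifier's actual code):

import csv
import io
import json

# A CloudWatch Logs style wrapper whose 'message' field holds a CSV record.
raw = '{"logStream": "rds-instance-1", "message": "2018-01-01 00:00:00,admin,10.0.0.1,QUERY"}'

outer = json.loads(raw)
for row in csv.reader(io.StringIO(outer['message'])):
    timestamp, user, source_ip, action = row
    # Each extracted CSV row can now be classified against a schema and run through rules.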
To view a list of all of the new features, see here. Many of these features are sparsely documented at the moment, but look out for many updates to documentation in the coming weeks!
Improvements
Terraform Consolidation
A huge thanks to @austinbyers for refactoring the redundant Terraform Lambda code into a centralized module. The consolidation has enabled us to build new Lambda functions faster and reduced the amount of tests we need to write. The new module is easily adaptable and most of StreamAlert’s Lambda functions have already been ported to leverage it. See also: #638
Alert Processor Consolidation
As part of the move toward alert merging, ‘clustered’ alert processors became unnecessary and overly complex. This change removes multiple alert processors (one for each cluster) in favor of a single processor for all alerts. The architecture of StreamAlert has become more intuitive and simplified thanks to this change.
Rule Test Templates
Up until now, writing tests for rules in StreamAlert required a copy of a 'real' log that would classify properly in the rules engine. This can be a little arduous if your rule only tests one or two conditions. This release introduces new functionality where a user can supply a partial test event, and the testing framework will 'intelligently' fill in the rest of the fields with default data.
Firehose Optimizations
Batches of records sent to AWS Firehose for historical data retention will now retry only the records that failed to send, instead of the entire batch. Additionally, the data will only be serialized to JSON once at the start of sending, instead of on every attempt. The client has also been tuned with connect and read timeouts to reduce latency when sending batches. See also: #649, #736
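The behavior is along these lines; a simplified sketch built on boto3's put_record_batch response, not StreamAlert's actual implementation:

import json
import boto3

firehose = boto3.client('firehose')

def send_batch(stream_name, records):
    """Serialize once, then retry only the records Firehose reports as failed."""
    # Serialize a single time up front rather than on every retry attempt.
    entries = [{'Data': json.dumps(rec) + '\n'} for rec in records]
    for _ in range(3):  # bounded number of retries
        resp = firehose.put_record_batch(DeliveryStreamName=stream_name, Records=entries)
        if resp['FailedPutCount'] == 0:
            return
        # Keep only the entries whose responses carry an ErrorCode.
        entries = [entry for entry, result in zip(entries, resp['RequestResponses'])
                   if result.get('ErrorCode')]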
Threat Intel Improvements
This release introduces the ability to exclude specific indicators of compromise from the Threat Intel lookups. For instance, IP addresses that are known to be good could be omitted when querying the threat intel DynamoDB table. This can speed up rule processing and reduce cost. The Threat Intel Downloader has also been updated to support filtering out bad or unactionable indicators at the source, before they even land in DynamoDB. See also: #684
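As an illustration of the exclusion idea (not the actual configuration format), known-good networks can be filtered out before any DynamoDB lookup is made:

from ipaddress import ip_address, ip_network

# Placeholder known-good networks that should never be sent to the
# threat intel DynamoDB table for lookup.
EXCLUDED_NETWORKS = [ip_network(u'10.0.0.0/8'), ip_network(u'192.168.0.0/16')]

def iocs_to_lookup(ip_strings):
    """Drop known-good IPs so fewer (and cheaper) DynamoDB queries are made."""
    return [ip for ip in ip_strings
            if not any(ip_address(ip) in net for net in EXCLUDED_NETWORKS)]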
General Improvements
- #604 - Speeding up Lambda function deploys by using the -auto-approve flag with terraform.
- #652 - Reducing the number of API requests made to PagerDuty during PagerDuty incident creation. This change prevents potential throttling by the API.
- #653 - More efficient rule helper functions
- #691 - Backoff (aka retry) decorators are used fairly heavily throughout StreamAlert. This release centralizes the backoff handlers to be reused and reduces the amount of duplicate code & tests.
- #707 - Almost every new component added to StreamAlert needs to load the configuration from disk. This release introduces shared logic for config loading that removes the need to write yet another load_config function.
- #725 - Alert processor status logging is now greatly simplified. Those writing new alert outputs no longer have to reason about success or failure logging.
- #784 - S3 server-side encryption adds a layer of protection to data in S3 buckets, and release 2.0.0 introduces this as the default for StreamAlert.
- #804 - Deduplicating gsuite events, thanks to @stoggi.
- #806 - Updating Athena partition refresh function to use SQS as a source. See also: #808, #810, #815
- #818 - Logger instantiation consolidated to the shared module.
To view the complete list of all of the improvements in v2.0.0, including many not mentioned above, see here.
Bug Fixes
- #623 - Python absolute import warning within CloudWatch Logs during Lambda invocations. Fixes: #267
- #657 - Long-standing and sporadic 'Too many open files' bug that was never easily tracked down. Fixes: #587
- #667 - Large amounts of memory utilized during data normalization.
- #683 - TooManyUpdates error when saving an SSM parameter in StreamAlert Apps. Fixes: #682
- #688 - Unsupported types in a log would not cause the log's classification to fail. Fixes: #676
- #731 - Slack messages could be split on each character if no newline character existed, causing hundreds of messages to be sent to Slack.
- #732 - Shredding the Lambda tmp directory before downloading S3 objects to prevent disk space from filling up.
- #735 - Catching KeyError during classification that could occur if a log is misclassified as the wr...
v1.6.1
New Features
Github Output Support
This release now supports Github as an alerting output. Sending alerts to a github output will now create an Issue in the specified Github.com repository. A huge thanks to @patrickod for this contribution!
Komand Output Support
Also new to this release is support for Komand as an alerting output. This allows Komand to carry out specific actions when alerts are triggered and further expands StreamAlert’s integration with security orchestration tools. A huge thanks to @0xdabbad00 for this contribution!
Improvements
Notes Support in PagerDuty Incidents
The PagerDuty Incidents alerting output now supports adding notes to Incidents created in PagerDuty. This is accomplished by adding a note to a record's context within a rule.
S3 Payload Error Handling
Improved handling of S3 payloads, including skipping files of zero size and checking for IOError-related issues when downloading objects.
Bug Fixes
- #597 - Fixed Firehose 'Connection reset by peer' bug filed in #478
- #598 - Fixed TypeError when deleting messages from SQS
- #614 - Fixed merging alerts into incidents in the PagerDuty Incidents output
- #624 - Fixed issue with required sub-key checking
- #627 - Catching request timeouts in alerting outputs
Updated CloudTrail Events log schema and new schema for Carbon Black Audit logs.
v1.6.0
New Features
Threat Intelligence
This release includes a new (beta) threat intelligence feature to enable analysis and identification of suspicious activity in your infrastructure based on IP address, domain and file hash indicators.
StreamAlert compares these indicators (stored in a DynamoDB table) to incoming data in real-time, and generates an alert if any matches are found.
To complement this feature, this release also includes a Threat Intel Downloader, a Lambda function that collects the latest IP address, domain and file hash indicators mentioned above and updates the DynamoDB table. Currently, the Threat Intel Downloader supports fetching data from Anomali's ThreatStream API.
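Conceptually, the real-time comparison looks like the sketch below; the table and attribute names are placeholders, not StreamAlert's schema. Candidate indicators extracted from a record are batch-read from DynamoDB and any hits are returned:

import boto3

def find_ioc_matches(indicators, table_name='threat_intel_iocs'):
    """Return the subset of indicators (IPs, domains, hashes) present in DynamoDB."""
    client = boto3.client('dynamodb')
    keys = [{'ioc_value': {'S': value}} for value in set(indicators)]
    resp = client.batch_get_item(RequestItems={table_name: {'Keys': keys}})
    return [item['ioc_value']['S'] for item in resp['Responses'].get(table_name, [])]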
StreamAlert Apps
StreamAlert Apps enable you to easily retrieve data from any service with a RESTful API to send to StreamAlert for processing and alerting. The infrastructure is serverless, the configuration takes minutes, and the deployment is automated via Terraform.
Read more about this feature in our blog post, or learn how to get started with StreamAlert Apps in the documentation.
This release includes several apps, with more coming in future releases:
- Duo Admin & Auth Logs
- OneLogin Events
- GSuite Admin Reports
- Admin, Calendar, Drive, GPlus, Groups, Login, Mobile, Rules, SAML, Token
- Box Admin Events
Historical Search of Data
As announced in the last release (v1.5.0), StreamAlert can be configured to search generated alerts with AWS Athena.
This feature has been extended to support delivery of all incoming logs into Amazon S3 via AWS Firehose, which can then be searched with AWS Athena in the streamalert database. This allows users to query data over long periods of time and perform statistics, joins, and other analysis.
The StreamAlert CLI also manages the setup, creation, and provisioning of data tables and required AWS infrastructure. To get started, check out our Athena setup instructions.
PagerDuty Events API v2 and Incidents API Output Support
StreamAlert now includes support for two new PagerDuty API outputs:
- PagerDuty Events API (v2) - Be sure to upgrade your outputs, as the Events API V1 has been deprecated.
- PagerDuty Incidents API - This allows for the usage of more advanced PagerDuty features, such as assigning an incident to a specific user or setting a priority, directly from within a rule.
Improvements
Local Rule Testing Enhancements
Rule test events can now be configured to indicate which rules they should trigger, as well as the log schema the event corresponds to. The CLI also now reports on hard-to-diagnose errors related to rule tests. See the documentation for more information on the new test event structure.
Rule Helpers for Finding Key Items
StreamAlert now includes rule helper functions which help you recursively find key-values in records without worrying about the schema or nesting.
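A simplified version of the idea (not the exact helper shipped with StreamAlert) is shown below:

def fetch_key_values(record, target_key):
    """Recursively collect every value stored under target_key in a nested record."""
    found = []
    if isinstance(record, dict):
        for key, value in record.items():
            if key == target_key:
                found.append(value)
            found.extend(fetch_key_values(value, target_key))
    elif isinstance(record, list):
        for item in record:
            found.extend(fetch_key_values(item, target_key))
    return found

# Example: fetch_key_values(rec, 'ipAddress') returns all nested 'ipAddress' values.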
Security Linting via Bandit
Bandit is a Python scanner which checks for common security issues in Python source code. The project has now been updated to run bandit on the StreamAlert source as part of the CI pipeline.
User-configurable Kinesis Shard-Level Metrics
Kinesis shard-level metrics, via enhanced monitoring, will now be disabled by default with the optional ability to configure specific metrics to log. This will greatly reduce AWS costs for end-users.
Core Infrastructure Improvements
- Global Alerts Firehose - Enables high throughput delivery of alerts to S3
- Optional Kinesis Modules - Adds modularity to StreamAlert clusters, making Kinesis optional. This fully enables a purely S3 based cluster, where Kinesis is not necessary to deliver data into Lambda.
- Cross-Account CloudTrail - Supports receiving CloudTrail data from multiple AWS accounts into the StreamAlert CloudTrail module.
- S3 Event Filtering - Added support for suffix and prefix filtering of event notifications of objects in S3
Bug Fixes
- #339 - Fix for nested data normalization
- #361 - Classifier TypeError fix when casting to list/dict
- #367, #381 - Fixed various bugs related to data normalization
- #393, #564 - Faster unit tests
- #449, #431 - Fixing various CLI bugs
- #453 - Fixed json parser bug related to json path
- #456 - Fixed classifier type conversion of nested values
- #548 - Fixed bug with total records metric
- #578 - Athena Partition Refresh KeyError bug fix
Updates and fixes to various Carbon Black and CloudTrail log schemas.
Shout-outs
Special thanks to @armtash and @javefang for their external contributions!
v1.5.0
New Features
Historical Search of Alerts
StreamAlert now supports historical searching of alerts! To enable this functionality, follow the steps outlined in the docs.
Once setup is complete, ensure your rules are sending alerts to the default S3 bucket created by StreamAlert:
Example conf/outputs.json config:
{
"aws-s3": {
"main": "<my-prefix>.streamalerts"
}
}
Example rule:
@rule(logs=['cloudtrail:events'],
outputs=['aws-s3:main'])
def test_cloudtrail_rule(rec):
return rec['region'] == 'us-west-2'
To search alerts, open AWS Athena and run desired SELECT statements on the alerts table in the newly created streamalert database.
Optionally, a dt partition can be specified to limit results to the nearest hour.
For more information on SQL syntax and options, see the Athena Language Reference.
Enhanced Metrics with Alarms
To gain a better understanding of your StreamAlert deployment, detailed metrics have been added for failed log parsing (FailedParses), total records processed (TotalRecords), total triggered alerts (TriggeredAlerts), and more.
Custom metrics can be enabled or disabled using the python manage.py metrics command, for either aggregate or per-cluster metrics.
Alarms can also be configured using the python manage.py create-alarm command. For more information on metrics setup, click the link in the header above.
Easy Schema Validation
Previously, in order to verify that a newly added schema was working as expected, a rule had to be created.
The new python manage.py validate-schemas command removes the need to create a rule to test a schema.
After you have created a schema and added a test event in tests/integration/rules, the schema can be verified by running:
$ python manage.py validate-schemas --test-files <rule_file_name.json>
Data Normalization
It is common for multiple logs to have similar fields, but with different key names.
Examples include src_ip, source_ip, client_ip, remote_address, remote_ip, dst_ip, etc.
What if you wanted to write a single rule that analyzed all IP addresses found in your logs? With data normalization, you can!
By normalizing schema keys, rules can be simplified and consolidated.
Let’s walk through an example, using two example schemas:
{
"system:logs": {
"parser": "json",
"schema": {
"date": "string",
"client_ip": "string", # represents an ip address
"message": "string",
"name": "string"
}
},
"web:logs": {
"parser": "json",
"schema": {
"error_code": "string",
"filename": "string",
"src_ip": "string", # also represents an ip address
"name": "string"
}
}
}
The field names to be normalized are declared in conf/types.json. In this case, we will normalize the IP-related fields.
{
"system": {
"sourceAddress": ["client_ip"]
},
"web": {
"sourceAddress": ["src_ip"]
}
}
Note the usage of CEF format. For examples, see the provided conf/types.json in the repository.
When writing rules, you can use the special keyword argument datatypes to ensure that the rule applies to all logs with this normalized field:
from helpers.base import fetch_values_by_datatype, in_network
@rule(datatypes=['sourceAddress'],
outputs=['aws-s3:main'])
def trusted_ip_check(rec):
# Verify that a system IP is within the trusted CIDR set
ip_addresses = fetch_values_by_datatype(rec, 'sourceAddress')
trusted_cidrs = {'10.0.100.0/24', '10.1.200.0/24'}
return not all(in_network(ip, trusted_cidrs) for ip in ip_addresses)
Note: Rules can still be restricted to specific log types by using the logs constraint.
Two other large benefits of data normalization:
- It paves the way for our threat intelligence integration (soon!)
- It lets you swap out system, networking or security products without having to change your rules. Vendor or product agnostic rules for the win!
Improvements
Remove SNS Between Lambda Functions
The removal of SNS has simplified inter-service communication and increased reliability in alert delivery between Lambda functions.
Consolidated S3 Alerts Buckets
Alert delivery has been consolidated to a single S3 bucket to enable historical searching of alerts.
CLI Rename and Added Help Strings
The stream_alert_cli.py command line tool has been renamed to manage.py.
To get started with the new CLI:
$ python manage.py --help
$ python manage.py <subcommand> --help
Bug Fixes
#223 - Fix nested rule directory import errors
#250 - Massive Pylint cleanup
#274 - Prevent the alert processor from running without a valid config
#284 - Raise exception if output credentials could not be encrypted
#300, #315 - VPC flow log, CarbonBlack, Osquery schema fixes and additional support
#297 - GitHub schema fixes
1.4.0
New Features
Community Rules
To encourage collaboration and contribution of StreamAlert rules from the community, the rules directory has been reorganized:
|------- rules/
| |------- community/
| |------- default/
When contributing public rules, rule files should be placed within a named subdirectory under the community folder. An example is the cloudtrail rules in rules/community/cloudtrail.
For rules internal to your organization, the default folder is a great starting point. Any number of subdirectories can be created under this directory. Remember to always place a blank __init__.py in new subdirectories so they are picked up by rule processor imports.
Matchers and helpers have also been reorganized into their own respective directories:
|------- conf/
|------- docs/
|------- helpers/
|------- matchers/
|------- rules/
|------- stream_alert/
|------- stream_alert_cli/
|------- terraform/
|------- test/
Be sure to update rules and matchers referencing helpers based on this new structure.
JSON Cluster Templates
StreamAlert’s supporting AWS infrastructure is managed by a set of Terraform modules. Each module controls a piece of StreamAlert. An example is the monitoring module, used to create metric alarms and alert administrators when Lambda errors or throttles occur.
To give users full control over which modules and settings they would like, clusters have been refactored into independent JSON files:
# conf/clusters/production.json
{
"id": "production",
"region": "us-west-2",
"modules": {
"stream_alert": {
"alert_processor": {
"timeout": 25,
"memory": 128,
"current_version": "$LATEST"
},
"rule_processor": {
"timeout": 10,
"memory": 256,
"current_version": "$LATEST"
}
},
"cloudwatch_monitoring": {
"enabled": true
},
"kinesis": {
"streams": {
"shards": 1,
"retention": 24
},
"firehose": {
"enabled": true,
"s3_bucket_suffix": "streamalert.results"
}
},
"kinesis_events": {
"enabled": true
}
},
"outputs": {
"kinesis": [
"username",
"access_key_id",
"secret_key"
]
}
}
For more information on setup, check out https://www.streamalert.io/clusters.html
Alert Processor VPC Support
AWS VPC (Virtual Private Cloud) allows users or organizations to run virtual machines in a logically segmented environment. To support delivery of StreamAlerts to internal resources (such as EC2 instances), the alert processor may now be configured to access resources inside a VPC:
# conf/clusters/<cluster-name>.json
{
"alert_processor": {
"vpc_config": {
"subnet_ids": ["subnet-id-1"],
"security_group_ids": ["security-group-id-1"]
}
}
}
Note: When making this change, you must explicitly destroy and then re-create the alert processor:
$ cd terraform
$ terraform destroy -target=module.stream_alert_<cluster-name>.aws_lambda_function.streamalert_alert_processor
Then, run:
$ python stream_alert_cli.py terraform build
Alert Live Testing
To better validate StreamAlert’s end-to-end functionality, testing has been reworked to support sending alerts from a local StreamAlert repo. With a local set of valid AWS credentials, it is possible to use configured rule tests to dispatch alerts to configured outputs (such as Slack or PagerDuty).
This functionality is provided through the StreamAlertCLI tool, with the new command line argument live-test:
$ python stream_alert_cli.py live-test --cluster <cluster_name>
For normal use cases, you are unlikely to want (or need) to test the full ruleset, as this could result in a high volume of alerts to outputs. To test specific rules, use the --rules argument followed by a space-delimited list of rule names to test:
$ python stream_alert_cli.py live-test --cluster <cluster_name> --rules <rule_name_01> <rule_name_02>
Bug Fixes
#129 - Cluster aware SNS inputs
#166 - Apply optional top level keys to nested JSON records
#168 - Fix the handler import path for the alert_processor
#183 - Lambda traceback due to PagerDuty errors
#201 - Updated IAM permissions for streamalert user
#202 - Handle errors when Terraform is not installed
#206, #209 - Schema updates to osquery and carbonblack:watchlist.hit.binary
1.3.0
New Features
New Schema Options
Log schemas now support list, boolean, and float types for more accurate schemas (#77). As records are parsed by the rule_processor, fields will now be cast into these new types to be referenced by rules.
Example Schema:
"carbonblack:feed.storage.hit.process": {
"schema": {
"sensor_id": "integer",
"report_score": "integer",
"from_feed_search": "boolean",
"feed_id": "integer",
"ioc_type": "string",
"ioc_attr": {},
"docs": [],
"group": "string",
"server_name": "string",
"hostname": "string",
"feed_name": "string",
"cb_server": "string",
"timestamp": "float",
"process_guid": "string",
"interface_ip": "string",
"type": "string"
},
"parser": "json"
}
Example rule:
@rule(logs=['carbonblack:feed.storage.hit.process'],
matchers=[],
outputs=['slack:soc', 'pagerduty:soc'])
def cb_storage_hit_process(rec):
"""This event occurs when an intelligence feed indicator matches a new process upon ingest. """
return (
rec['from_feed_search'] == True and
len(rec['docs']) > 1
)
Additionally, to handle logs with optional keys, a new parser option optional_top_level_keys has been added (#95). At a minimum, an incoming record must contain the keys defined in the schema, and if any of the defined optional_top_level_keys do not exist, an empty default value (per the defined type) will be added to the parsed record. This is to ensure rules do not reference keys that may not exist and subsequently result in an exception.
Example Schema:
"github:enterprise": {
"schema": {
"@timestamp": "string",
"@version": "integer",
"host": "string",
"message": "string",
"port": "integer",
"received_at": "string",
"tags": []
},
"parser": "json",
"configuration": {
"optional_top_level_keys": {
"logsource": "string",
"pid": "integer",
"program": "string",
"timestamp": "string"
}
}
}
This schema supports the following logs:
[
{
"message": "github_audit message",
"@version": "1",
"@timestamp": "2015-05-20T20:00:36.731Z",
"host": "10.0.0.1",
"port": 59310,
"tags": [],
"received_at": "2015-05-20T20:00:36.731Z",
"timestamp": "May 20 20:00:36",
"logsource": "github",
"program": "github_audit"
},
{
"message": "github_audit message",
"@version": "1",
"@timestamp": "2015-05-20T20:00:36.731Z",
"host": "10.0.0.1",
"port": 59310,
"pid": 1599,
"tags": [],
"received_at": "2015-05-20T20:00:36.731Z",
"timestamp": "May 20 20:00:36",
"logsource": "github",
"program": "github_audit"
}
]
Disable Rules
To quickly disable rules without deleting them, a new decorator (@disable
) has been added (#75). Note: This decorator must be right above the @rule
decorator with no spaces:
Example rule:
rule = StreamRules.rule
disable = StreamRules.disable()
@disable
@rule(logs=['carbonblack:feed.storage.hit.process'],
matchers=[],
outputs=['slack:soc', 'pagerduty:soc'])
def cb_storage_hit_process(rec):
"""This event occurs when an intelligence feed indicator matches a new process upon ingest. """
return (
rec['from_feed_search'] == True and
len(rec['docs']) > 1
)
When @disable is being used, make sure to update the integration test to not expect an alert to trigger:
{
"records": [
{
"data": {...},
"description": "CB Feed Storage Hit Process should not trigger an alert",
"trigger": false,
"source": "my_s3_bucket",
"service": "s3"
}
]
}
Slack Message Format
Messages sent to Slack outputs are now formatted using mrkdwn styling, and sent as a series of attachments (#135).
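For reference, a Slack attachment payload using mrkdwn styling looks roughly like the following. This is an illustration of the Slack attachment format, not StreamAlert's exact code, and the rule name and record are made up:

import json

# Illustrative Slack webhook payload: the alert is sent as an attachment
# with mrkdwn enabled so *bold* and `code` markup render properly.
payload = {
    'attachments': [{
        'fallback': 'StreamAlert Rule Triggered: sample_csv_rule',
        'color': '#b22222',
        'title': 'StreamAlert Rule Triggered: sample_csv_rule',
        'text': '*record:*\n```{"host": "test-host-2"}```',
        'mrkdwn_in': ['text'],
    }]
}
print(json.dumps(payload))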
Modular Outputs
Adding new outputs for supported services is now as easy as running:
$ python stream_alert_cli.py output new --service slack
This will create a new Slack integration. Prompts will then walk through entering any information required for the service. The currently supported services as of this release are: AWS Lambda, AWS S3, Pagerduty, Phantom, and Slack.
As an added bonus, these changes allow rules to send alerts to multiple configured outputs per service. For example, a rule could previously only send to one 'destination' in Slack, but can now send to multiple configured webhooks per service. To send to different integrations in Slack, a user would simply add them to the rule, like so:
@rule(logs=['carbonblack:feed.storage.hit.binary'],
matchers=[],
outputs=['slack:alerts_channel', 'slack:direct_message', 'pagerduty:corp_alerts'])
def cb_feed_storage_hit_binary_virustotal(rec):
"""Identify binaries that match against the virustotal feed"""
return (
rec['type'] == 'feed.storage.hit.binary' and
rec['feed_name'] == 'virustotal'
)
The StreamAlert output classes have also been refactored to easily enable the addition of new output services (#97). The documentation has been updated to demonstrate this new extensibility along with providing a walkthrough of how to implement a new service to send alerts to.
Support SNS inputs and S3/Lambda Outputs
To promote Serverless Service Oriented Architectures, StreamAlert now has the ability to accept input from arbitrary AWS SNS topics (#118/#119) and invoke arbitrary AWS Lambda functions as an output (#110).
To enable StreamAlert to accept input from SNS topics, modify the conf/inputs.json file, and terraform will automatically handle subscribing to the topic(s).
Example of adding an SNS input:
{
"aws-sns": {
"our_sns_input": "arn:aws:sns:us-east-1:012345678912:sns-topic-name"
}
}
As stated in the Modular Outputs section above, users can add AWS Lambda functions that they would like to utilize as outputs via the stream_alert_cli.py tool. This is accomplished by simply running the following command and following the prompts:
$ python stream_alert_cli.py output new --service aws-lambda
Example:
$ python stream_alert_cli.py output new --service aws-lambda
StreamAlertCLI [INFO]: Issues? Report here: https://github.com/airbnb/streamalert/issues
Please supply a short and unique descriptor for this Lambda function configuration
(ie: abbreviated name): external-lambda-function
Please supply the AWS arn, with the optional qualifier, that represents the Lambda function
to use for this configuration (ie: arn:aws:lambda:aws-region:acct-id:function:output_function:qualifier):
arn:aws:lambda:us-east-1:012345678912:function:my_function:Production
StreamAlertCLI [INFO]: Successfully saved 'external-lambda-function' output configuration
for service 'aws-lambda'
StreamAlertCLI [INFO]: Completed
Bug Fixes
#126, #137, #147, #161 - StreamAlert performance improvements thanks to @ryandeivert!
#100 - Check Slack message size before sending, and appropriately split long messages.
#79 - Does not upload the Lambda deployment package if pip fails to install dependencies.
1.2.0
New Features
VPC Flow Log Support
AWS VPC Flow Logs is a feature that enables you to capture information about the network traffic going to and from network interfaces in your VPC. This network flow is represented as (srcaddr, dstaddr, srcport, dstport, and protocol). Potential use cases for these logs include network traffic analysis, ACL auditing, and more.
StreamAlert now formally supports the setup, ingestion, and analysis of these logs. Follow the instructions below to get set up in minutes!
Add the following to your cluster(s) .tf file located in the terraform/ directory:
module "flow_logs_cluster_name_here" {
source = "modules/tf_stream_alert_flow_logs"
destination_stream_arn = "${module.kinesis_cluster_name_here.arn}"
targets = "${var.flow_log_settings["cluster_name_here"]}"
region = "${lookup(var.clusters, "cluster_name_here")}"
flow_log_group_name = "${var.prefix}_cluster_name_here_stream_alert_flow_logs"
}
In variables.json, define the specific VPC, Subnet, or ENI IDs to capture flow logs from:
{
"flow_log_settings": {
"vpcs": ["vpc-id"],
"subnets": ["public-subnet-id"],
"enis": ["eni-id"]
}
}
Apply these changes:
$ ./stream_alert_cli.py terraform build
To configure StreamAlert to process these logs, follow the instructions here to add the flow_log type in conf/logs.json and conf/sources.json.
Finally, deploy the new version of the AWS Lambda function:
$ ./stream_alert_cli.py lambda deploy --env staging --func alert
If no Cloudwatch alarms are triggered, deploy to production:
$ ./stream_alert_cli.py lambda deploy --env production --func alert
Nested Record Support
It is common for applications (Cloudwatch, Inspec, and more) to output a single line JSON object. Previously, StreamAlert treated each line as an individual payload. This meant nested JSON objects were treated as one payload. With this release, StreamAlert now detects nested records, and parses them as individual payloads to be processed by rules.
As an example, let's look at the following log (prettified for this example):
{
"Records": [
{
"eventVersion": "1",
"eventID": "1",
"eventTime": "10:45:35 PM UTC",
"eventType": "1",
"request": "aws lambda list-functions",
"awsRegion": "us-east-1"
},
{
"eventVersion": "1",
"eventID": "2",
"eventTime": "11:45:35 PM UTC",
"eventType": "2",
"request": "aws lambda delete-function",
"awsRegion": "us-east-1"
}
]
}
When defining a schema for a nested log type like this, a hint named records must be specified with a JSONPath-RW selector pointing to the nested records:
"nested_log_type": {
"parser": "json",
"schema": {
"eventVersion": "string",
"eventID": "string",
"eventTime": "string",
"eventType": "string",
"request": "string",
"awsRegion": "string"
},
"hints" : {
"records": "Records[*]"
}
}
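Under the hood, the records hint is a JSONPath-RW expression; extracting the nested records looks roughly like this sketch using the jsonpath_rw library (not the parser's exact code):

import json
from jsonpath_rw import parse

# A shortened version of the single-line JSON log shown above.
raw_log = '{"Records": [{"eventID": "1"}, {"eventID": "2"}]}'

# The 'records' hint is compiled into a JSONPath expression, and every match
# becomes its own payload to classify and run through the rules engine.
matches = parse('Records[*]').find(json.loads(raw_log))
nested_records = [match.value for match in matches]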
Overhauled Integration Testing
Rule testing is a crucial part of writing safe, effective rules. With the new integration testing framework, rule fixtures (example logs) are defined in test/integration/rules, and have the following structure:
{
"records": [
{
"data": "Jan 01 2017,1487095529,test-host-2,this is test data for rules,cluster 5",
"description": "host is test-host-2",
"trigger": true,
"source": "prefix_cluster1_stream_alert_kinesis",
"service": "kinesis"
}
]
}
Each record includes a log to test (the data key), along with metadata (description, source, service), and a desired outcome of the test (whether or not it should trigger an alert).
For this example, the following rule will be tested:
@rule(logs=['csv_log'],
matchers=[],
outputs=['s3'])
def sample_csv_rule(rec):
return rec['host'] == 'test-host-2'
To run tests against this rule, use the following helper script:
$ ./test/scripts/integration_test_kinesis.sh
sample_csv_rule
test: host is test-host-2 [Pass]
For additional examples, check out Rules Testing.
Simpler Rules and Matcher Declaration
Previously, rules and matchers required a name argument as well as a function name. This has been simplified, and now you only need to define the name in one place:
Before:
@matcher('prod')
def prod(rec):
return rec['environment'] == 'prod'
@rule('invalid_subnet',
logs=['osquery'],
matchers=['prod'],
outputs=['pagerduty'])
def invalid_subnet(rec):
return True
After:
@matcher()
def prod(rec): # matcher name `prod`
return rec['environment'] == 'prod'
@rule(logs=['osquery'],
matchers=['prod'],
outputs=['pagerduty'])
def invalid_subnet(rec): # rule name `invalid_subnet`
return True
External Alert Handling
To accommodate users with existing incident management and alerting infrastructure, a new flag has been added to return a list of generated alerts (instead of handling them with StreamAlert Outputs).
This option is enabled by passing return_alerts=True to the StreamAlert initializer in the main.py function handler:
from stream_alert.handler import StreamAlert
def handler(event, context):
"""Main Lambda handler function"""
alerts = StreamAlert(return_alerts=True).run(event, context)
# custom workflow goes here
Bug Fixes
1.1.0
New Features
- Modular Parser Classes: Adding new parsers is now simplified and straightforward. To start, add a new Parser class in stream_alert/parsers.py with the following structure:
@parser
class NewParserName(ParserBase):
# the name of the new parser to be called in the conf/logs.json
__parserid__ = 'new-parser-name'
def parser(self):
# these attributes are automatically set on initialization
data = self.data
options = self.options
schema = self.schema
# parser logic goes here
# optionally, you can define helper methods in this
# class to make parsing easier/cleaner
# return a parsed dictionary
return parsed_payload
- Custom CSV Delimiters: Specify a custom delimiter for CSV log types:
"csv_log": {
"schema": {
"date": "string",
...
},
"parser": "csv",
"delimiter": "|",
"hints": {}
}
- Default Delimiters: When declaring CSV or KV log types, if you are using the built-in defaults (',' for csv, 'k=v' for kv), you can omit these settings from your config.