Introduction
Elastic Beanstalk
AWS Elastic Beanstalk is an easy-to-use service for deploying and scaling web applications and services developed with Java, .NET, PHP, Node.js, Python, Ruby, Go, and Docker on familiar servers such as Apache, Nginx, Passenger, and IIS.
You can simply upload your code and Elastic Beanstalk automatically handles the deployment, from capacity provisioning, load balancing, auto-scaling to application health monitoring. At the same time, you retain full control over the AWS resources powering your application and can access the underlying resources at any time.
Fluentd
Fluentd is an open source data collector for a unified logging layer. Fluentd allows you to unify data collection and consumption for better use and understanding of data.
- Fluentd can be used to tail access/error logs and transport them reliably to remote systems.
- Fluentd can “grep” for events and send out alerts.
- Fluentd can function as middleware to enable asynchronous, scalable logging for user action events.
Kinesis Streams
Amazon Kinesis Data Streams (KDS) is a massively scalable and durable real-time data streaming service.
KDS can continuously capture gigabytes of data per second from hundreds of thousands of sources such as website clickstreams, database event streams, financial transactions, social media feeds, IT logs, and location-tracking events.
Kinesis Firehose
Amazon Kinesis Data Firehose is a fully managed service for delivering real-time streaming data to destinations such as Amazon Simple Storage Service (Amazon S3), Amazon Redshift, Amazon Elasticsearch Service (Amazon ES), and Splunk.
You configure your data producers to send data to Kinesis Data Firehose, and it automatically delivers the data to the destination that you specified.
S3
Amazon Simple Storage Service (Amazon S3) is an object storage service that offers industry-leading scalability, data availability, security, and performance.
This means customers of all sizes and industries can use it to store and protect any amount of data for a range of use cases, such as websites, mobile applications, backup and restore, archive, enterprise applications, IoT devices, and big data analytics.
Mechanism
Here we will show how to use Fluentd, installed on Elastic Beanstalk instances, to ship Tomcat logs to a Kinesis data stream and then deliver them to S3 using Kinesis Data Firehose.
Kinesis Stream
Let’s create a new Kinesis data stream in the N. Virginia (us-east-1) region. It will receive our log events from Fluentd and retain them for 24 hours.
Kinesis stream name: aws-eb-fluentd-kinesis-stream
Number of shards: 3 (to have higher throughput)
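As a rough sanity check on the shard count: Kinesis Data Streams documents a write limit of 1 MB per second or 1,000 records per second per shard, so 3 shards give roughly 3 MB/s or 3,000 records/s of write capacity. The Python sketch below turns that into arithmetic. The helper names and the load figures in it are hypothetical, not measurements from this setup.

```python
import math

# Documented per-shard write limits for Kinesis Data Streams (provisioned shards).
WRITE_MB_PER_SEC_PER_SHARD = 1.0
WRITE_RECORDS_PER_SEC_PER_SHARD = 1000


def shards_needed(mb_per_sec: float, records_per_sec: float) -> int:
    """Smallest shard count that covers both the byte rate and the record rate."""
    by_bytes = math.ceil(mb_per_sec / WRITE_MB_PER_SEC_PER_SHARD)
    by_records = math.ceil(records_per_sec / WRITE_RECORDS_PER_SEC_PER_SHARD)
    return max(1, by_bytes, by_records)


def write_capacity(shards: int) -> tuple:
    """Aggregate write capacity (MB/s, records/s) of a stream with `shards` shards."""
    return shards * WRITE_MB_PER_SEC_PER_SHARD, shards * WRITE_RECORDS_PER_SEC_PER_SHARD


if __name__ == "__main__":
    # Hypothetical load: 2.5 MB/s spread over 1,800 records/s.
    print(shards_needed(2.5, 1800))  # 3
    print(write_capacity(3))         # (3.0, 3000)
```

Reads are limited separately (2 MB per second per shard), which matters once Firehose and any other consumers read from the same stream.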
Fluentd
We will use td-agent, the stable distribution of Fluentd. At a high level, the steps are as follows.
Installation
Install td-agent for Amazon Linux 1:
curl -L https://toolbelt.treasuredata.com/sh/install-amazon1-td-agent3.sh | sh
Install the Amazon Kinesis plugin, which we will use to publish the logs to the Kinesis stream:
sudo td-agent-gem install fluent-plugin-kinesis --no-document --minimal-deps --no-suggestions --conservative
Configuration
The configuration file allows the user to control the input and output behavior of Fluentd by (1) selecting input and output plugins and (2) specifying the plugin parameters. The file is required for Fluentd to operate properly.
For td-agent, the configuration file must be placed at /etc/td-agent/td-agent.conf
Let’s define a sample configuration file that publishes Catalina logs to the Kinesis stream.
Source Directive
Fluentd’s input sources are enabled by selecting and configuring the desired input plugins using source directives.
<source>
  @type tail
  tag catalina.errors
  path /var/log/tomcat8/catalina.out
  pos_file /var/log/td-agent/tmp/catalina.out.pos
  <parse>
    @type multiline
    format_firstline /\d{4}-\d{1,2}-\d{1,2}/
    format1 /^(?<time>\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2}.\d{1,3})[ ]{1,}(?<level>[^\s]+)[ ]{1,}\d{1,}[ ]{1,}---[ ]{1,}\[(?<thread>.*)\] (?<message>(.|\n)*)/
  </parse>
  @log_level error
</source>
Params | Value | Description |
---|---|---|
@type | tail | The tail input plugin, which lets Fluentd read events from the tail of text files |
tag | catalina.errors | Tag attached to events from this source; Fluentd’s internal routing engine uses it to route events to matching <match> directives |
path | /var/log/tomcat8/catalina.out | Path(s) of the text files to read |
pos_file | /var/log/td-agent/tmp/catalina.out.pos | File in which Fluentd records the last read position, so that reading resumes from there after a restart. A single pos_file can track positions for multiple paths |
parse (directive) | multiline | Format of the log; in_tail uses a parser plugin to parse each log event |
@log_level | error | Plugin-specific log level that overrides the global log level for this input plugin |
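The pos_file row is easiest to understand with a small model. The Python sketch below is not how in_tail is implemented (it ignores log rotation, inode tracking and the real pos_file format); it only illustrates the idea of persisting a read offset so that a restart resumes from the last complete line instead of re-reading the whole file. The function name read_new_lines is hypothetical.

```python
import os
import tempfile


def read_new_lines(log_path: str, pos_path: str) -> list:
    """Return complete lines appended to log_path since the offset stored in pos_path."""
    offset = 0
    if os.path.exists(pos_path):
        with open(pos_path) as f:
            offset = int(f.read().strip() or 0)

    with open(log_path, "rb") as f:
        f.seek(offset)
        data = f.read()

    # Consume only up to the last newline; a partially written line is
    # left for the next call, since the writer may still be appending to it.
    end = data.rfind(b"\n") + 1
    with open(pos_path, "w") as f:
        f.write(str(offset + end))  # persist the position for the next run
    return data[:end].decode("utf-8").splitlines()


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        log = os.path.join(tmp, "catalina.out")
        pos = os.path.join(tmp, "catalina.out.pos")
        with open(log, "w") as f:
            f.write("line 1\nline 2\n")
        print(read_new_lines(log, pos))  # ['line 1', 'line 2']
        with open(log, "a") as f:
            f.write("line 3\npartial")
        print(read_new_lines(log, pos))  # ['line 3']
```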
Parse Directive
Let’s explore the parse directive in more detail.
<parse>
  @type multiline
  format_firstline /\d{4}-\d{1,2}-\d{1,2}/
  format1 /^(?<time>\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2}.\d{1,3})[ ]{1,}(?<level>[^\s]+)[ ]{1,}\d{1,}[ ]{1,}---[ ]{1,}\[(?<thread>.*)\] (?<message>(.|\n)*)/
</parse>
Params | Value | Description |
---|---|---|
@type | multiline | Multiline parser plugin parses multiline logs |
format_firstline | regex | Regex for detecting start line of multiline log |
format1 | regex | Regex for matching multiline log |
Below is a sample Catalina log entry from the application that we will deploy to Beanstalk later:
2020-03-18 01:21:58.104 ERROR 3660 --- [nio-8080-exec-6] com.eb.RestController : Runtime error occurred
java.lang.ArithmeticException: / by zero
at com.eb.RestController.error(RestController.java:23) ~[classes/:na]
at sun.reflect.GeneratedMethodAccessor42.invoke(Unknown Source) ~[na:na]
at sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_232]
at java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_232]
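The regex splits this multiline log into named groups: time (2020-03-18 01:21:58.104), level (ERROR), thread (nio-8080-exec-6), and message, which holds the rest of the first line together with all the stack trace lines.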
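To check the two regexes before deploying, they can be exercised outside Fluentd. The sketch below is a Python translation, which is an assumption of this example: Ruby's (?<name>...) named groups become (?P<name>...), and the grouping loop only approximates the multiline parser by starting a new event at every line that matches format_firstline and appending all other lines to the current event. The second sample line (INFO ... Started) is made up for the demonstration.

```python
import re

# Python translations of format_firstline and format1 from td-agent.conf.
FIRSTLINE = re.compile(r"\d{4}-\d{1,2}-\d{1,2}")
FORMAT1 = re.compile(
    r"^(?P<time>\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2}.\d{1,3})[ ]{1,}"
    r"(?P<level>[^\s]+)[ ]{1,}\d{1,}[ ]{1,}---[ ]{1,}"
    r"\[(?P<thread>.*)\] (?P<message>(.|\n)*)"
)

SAMPLE = (
    "2020-03-18 01:21:58.104 ERROR 3660 --- [nio-8080-exec-6] "
    "com.eb.RestController : Runtime error occurred\n"
    "java.lang.ArithmeticException: / by zero\n"
    "\tat com.eb.RestController.error(RestController.java:23) ~[classes/:na]\n"
    "\tat java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_232]\n"
    "2020-03-18 01:22:00.001  INFO 3660 --- [main] com.eb.App : Started\n"
)


def group_events(lines):
    """Group raw lines into events, each starting at a FIRSTLINE match."""
    events, current = [], None
    for line in lines:
        if FIRSTLINE.search(line):
            if current is not None:
                events.append("\n".join(current))
            current = [line]
        elif current is not None:
            current.append(line)  # continuation line, e.g. a stack frame
        # Lines seen before the first FIRSTLINE match are dropped here.
    if current is not None:
        events.append("\n".join(current))
    return events


def parse(event):
    """Apply FORMAT1; return the named groups as a dict, or None if it does not match."""
    match = FORMAT1.match(event)
    return match.groupdict() if match else None


if __name__ == "__main__":
    for event in group_events(SAMPLE.splitlines()):
        record = parse(event)
        print(record["level"], record["thread"], record["time"])
```

One behavioural difference worth knowing: this sketch emits the last event as soon as the input ends, whereas, as we understand the in_tail documentation, in_tail keeps the final multiline event buffered until the next first line arrives unless multiline_flush_interval is set on the source. With a quiet application, the last error can therefore wait in td-agent for a while.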
Match Directive
The “match” directive looks for events with matching tags and processes them. The most common use of the match directive is to output events to other systems (for this reason, the plugins that correspond to the match directive are called “output plugins”).
<match catalina.errors>
  @type kinesis_streams
  stream_name aws-eb-fluentd-kinesis-stream
  region us-east-1
  <buffer>
    chunk_limit_size 1m
    flush_interval 10s
    flush_thread_count 2
  </buffer>
</match>
The match directive applies to events whose tag matches the pattern catalina.errors and sends them to the output destination.
Params | Value | Description |
---|---|---|
@type | kinesis_streams | Kinesis plugin to output logs to stream |
stream_name | aws-eb-fluentd-kinesis-stream | Name of the stream to put data |
region | us-east-1 | AWS region of your stream |
chunk_limit_size | 1m | Maximum size of each chunk; events are written into a chunk until it reaches this size |
flush_interval | 10s | Interval at which chunks are flushed (written) to the output |
flush_thread_count | 2 | Number of output plugin threads used to flush chunks in parallel |
https://github.com/awslabs/aws-fluent-plugin-kinesis
https://docs.fluentd.org/configuration/buffer-section
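To get a feel for how chunk_limit_size and flush_interval interact, the Python sketch below models a single output buffer. It is a deliberate simplification, not Fluentd's implementation: a chunk is enqueued either when the next event would push it past chunk_limit_size or once it has been open for flush_interval seconds, while retries, queue limits, chunk_full_threshold and the flush threads themselves are ignored. The helper names and the event sizes and timings are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Chunk:
    opened_at: float
    events: int = 0
    size: int = 0


def simulate_buffer(events, chunk_limit_size=1_048_576, flush_interval=10):
    """Return (enqueue_time, event_count, byte_count) for each chunk written out.

    events: (arrival_time_seconds, event_size_bytes) pairs, sorted by time.
    Defaults mirror chunk_limit_size 1m and flush_interval 10s from the config.
    """
    enqueued = []
    chunk = None
    for t, size in events:
        # Interval flush: the chunk has been open for flush_interval seconds.
        if chunk is not None and t >= chunk.opened_at + flush_interval:
            enqueued.append((chunk.opened_at + flush_interval, chunk.events, chunk.size))
            chunk = None
        # Size limit: this event would overflow the chunk, so close it first.
        if chunk is not None and chunk.size + size > chunk_limit_size:
            enqueued.append((t, chunk.events, chunk.size))
            chunk = None
        if chunk is None:
            chunk = Chunk(opened_at=t)
        chunk.events += 1
        chunk.size += size
    if chunk is not None:
        enqueued.append((chunk.opened_at + flush_interval, chunk.events, chunk.size))
    return enqueued


if __name__ == "__main__":
    # Hypothetical traffic: four 300 KB events in quick succession, then a small one.
    traffic = [(0, 300_000), (1, 300_000), (2, 300_000), (3, 300_000), (20, 1_000)]
    for enqueue_time, count, size in simulate_buffer(traffic):
        print(f"t={enqueue_time}s: {count} events, {size} bytes")
```

With this traffic the fourth event does not fit, so the first chunk leaves early at t=3 with three events, while the later chunks are flushed by the 10-second interval.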
System Directive
We set the global logging level to trace for troubleshooting and to see the activities Fluentd performs. Setting it to trace in a production environment is not recommended.
<system>
  log_level trace
</system>
Complete Configuration
The complete configuration file is as follows:
<system>
  log_level trace
</system>
<source>
  @type tail
  tag catalina.errors
  path /var/log/tomcat8/catalina.out
  pos_file /var/log/td-agent/tmp/catalina.out.pos
  <parse>
    @type multiline
    format_firstline /\d{4}-\d{1,2}-\d{1,2}/
    format1 /^(?<time>\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2}.\d{1,3})[ ]{1,}(?<level>[^\s]+)[ ]{1,}\d{1,}[ ]{1,}---[ ]{1,}\[(?<thread>.*)\] (?<message>(.|\n)*)/
  </parse>
  @log_level error
</source>
<match catalina.errors>
  @type kinesis_streams
  stream_name aws-eb-fluentd-kinesis-stream
  region us-east-1
  <buffer>
    chunk_limit_size 1m
    flush_interval 10s
    flush_thread_count 2
  </buffer>
</match>
Apply Configuration
We then restart td-agent to apply the configuration changes:
/etc/init.d/td-agent restart
Beanstalk
Our goal is to tail the Catalina logs and publish them to the Kinesis stream using Fluentd.
For this tutorial, we use a sample Spring Boot application (aws-eb-fluentd-kinesis-app) that exposes a REST API and generates application logs.
ebextensions
You can add AWS Elastic Beanstalk configuration files (.ebextensions) to your web application’s source code to configure your environment and customize the AWS resources that it contains.
We use .ebextensions to install and configure Fluentd.
We create three config files that cover the td-agent installation, the Kinesis plugin installation, and the Fluentd configuration covered earlier.
The configuration files are available in the .ebextensions folder of the repo.
0-td-agent-gen-config.config
files:
  "/etc/td-agent/td-agent.conf":
    owner: root
    group: root
    content: |
      <system>
        log_level trace
      </system>
      <source>
        @type tail
        tag catalina.errors
        path /var/log/tomcat8/catalina.out
        pos_file /var/log/td-agent/tmp/catalina.out.pos
        <parse>
          @type multiline
          format_firstline /\d{4}-\d{1,2}-\d{1,2}/
          format1 /^(?<time>\d{4}-\d{1,2}-\d{1,2} \d{1,2}:\d{1,2}:\d{1,2}.\d{1,3})[ ]{1,}(?<level>[^\s]+)[ ]{1,}\d{1,}[ ]{1,}---[ ]{1,}\[(?<thread>.*)\] (?<message>(.|\n)*)/
        </parse>
        @log_level error
      </source>
      <match catalina.errors>
        @type kinesis_streams
        stream_name aws-eb-fluentd-kinesis-stream
        region us-east-1
        <buffer>
          chunk_limit_size 1m
          flush_interval 10s
          flush_thread_count 2
        </buffer>
      </match>
1-td-agent-install.config
# errors get logged to /var/log/cfn-init.log. See also /var/log/eb-tools.log
commands:
  01-command:
    command: curl -L https://toolbelt.treasuredata.com/sh/install-amazon1-td-agent3.sh | sh
2-td-agent-fluent-kinesis-plugin-install.config
# errors get logged to /var/log/cfn-init.log. See also /var/log/eb-tools.log
commands:
  01-command:
    command: sudo /usr/sbin/td-agent-gem install fluent-plugin-kinesis --no-document --minimal-deps --no-suggestions --conservative
  02-command:
    command: /etc/init.d/td-agent restart
Within each configuration file, Elastic Beanstalk processes keys in a fixed order; for the keys used here, the order is:
[1] files
[2] commands
https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/customize-containers-ec2.html
The following operations are performed when the .ebextensions configuration files are executed:
[1] Write the Fluentd configuration file to /etc/td-agent/td-agent.conf, owned by the root user.
[2] Install the Fluentd td-agent for Amazon Linux 1.
[3] Install the Fluentd Kinesis plugin.
[4] Restart td-agent to apply the configuration changes.
Note: .ebextensions config files are executed as the root user on the EC2 instance.
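The numeric prefixes in the three file names are what make this sequence work. As we understand the Elastic Beanstalk documentation, configuration files are processed in alphabetical order of their names, files is processed before commands within each file, and commands run in alphabetical order of their names. The Python sketch below reproduces the resulting plan for this tutorial, assuming plain lexicographic string ordering; execution_plan is a hypothetical helper.

```python
# Keys used by this tutorial's config files, in the order Elastic Beanstalk
# processes them (other keys such as packages or services are omitted).
KEY_ORDER = ("files", "commands")

TUTORIAL = {
    "2-td-agent-fluent-kinesis-plugin-install.config": {"commands": ["02-command", "01-command"]},
    "0-td-agent-gen-config.config": {"files": ["/etc/td-agent/td-agent.conf"]},
    "1-td-agent-install.config": {"commands": ["01-command"]},
}


def execution_plan(configs):
    """Flatten {file name: {key: [entries]}} into (file, key, entry) steps in run order."""
    plan = []
    for name in sorted(configs):  # configuration files: alphabetical order
        for key in KEY_ORDER:
            entries = configs[name].get(key, [])
            if key == "commands":
                entries = sorted(entries)  # commands: alphabetical order by name
            plan.extend((name, key, entry) for entry in entries)
    return plan


if __name__ == "__main__":
    for step in execution_plan(TUTORIAL):
        print(*step)
```

A consequence of string ordering: a file named 10-extra.config would run before 2-td-agent-fluent-kinesis-plugin-install.config, so zero-padded prefixes such as 01- and 02- are safer once there are more than ten files.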
Deployment
We now have a Spring Boot application that exposes a REST API and includes .ebextensions files that configure Fluentd to publish logs to the Kinesis stream. Let’s deploy this application to Elastic Beanstalk.
[1] Download the war file from github releases https://github.com/HarshadRanganathan/aws-eb-fluentd-kinesis-app/releases/download/v0.1/aws-eb-fluentd-kinesis-app-0.1.war
[2] Create a new application in EB console https://console.aws.amazon.com/elasticbeanstalk/home?region=us-east-1#/createNewApplication by giving an application name.
[3] We then create a web server environment under the application as follows:
Environment name: aws-eb-fluentd-kinesis-app
Preconfigured platform: Tomcat
Application code: Upload the WAR file you downloaded and give it a version label
Under Configure more options, choose the High availability preset and configure the VPC, subnets, load balancer, virtual machine key pair, etc. (EB application creation is beyond the scope of this guide)
Importantly, the IAM instance profile that you configure for your EB environment must have permissions to list and write to Kinesis streams.
If you use the default aws-elasticbeanstalk-ec2-role, it should already have Kinesis access.
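If you attach your own policy instead, a statement along the lines of the one generated below covers describing and writing to the stream. The exact action list is an assumption derived from the "list and write" requirement above, so check it against the fluent-plugin-kinesis README before relying on it; kinesis_producer_policy is a hypothetical helper and 123456789012 is a placeholder account ID.

```python
import json


def kinesis_producer_policy(region: str, account_id: str, stream_name: str) -> dict:
    """IAM policy document allowing an instance profile to describe and write to one stream."""
    stream_arn = f"arn:aws:kinesis:{region}:{account_id}:stream/{stream_name}"
    return {
        "Version": "2012-10-17",
        "Statement": [
            {
                "Effect": "Allow",
                # Assumed action set: stream metadata plus single and batch writes.
                "Action": [
                    "kinesis:DescribeStream",
                    "kinesis:ListShards",
                    "kinesis:PutRecord",
                    "kinesis:PutRecords",
                ],
                "Resource": stream_arn,
            }
        ],
    }


if __name__ == "__main__":
    policy = kinesis_producer_policy("us-east-1", "123456789012", "aws-eb-fluentd-kinesis-stream")
    print(json.dumps(policy, indent=2))
```

Scoping Resource to the single stream ARN, rather than "*", keeps the instance profile from writing to unrelated streams.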
Verification
Let’s check whether our Fluentd configuration is working as expected and pushing logs to the Kinesis stream.
EC2
The first step is to check whether the .ebextensions configuration updates were applied successfully. SSH into an EC2 instance of the EB environment and perform the checks below.
Check the following log files for errors and confirm that the commands were executed:
$ view /var/log/cfn-init.log
2020-03-19 19:46:50,762 [INFO] -----------------------Starting build-----------------------
2020-03-19 19:46:50,771 [INFO] Running configSets: Infra-EmbeddedPreBuild
2020-03-19 19:46:50,774 [INFO] Running configSet Infra-EmbeddedPreBuild
2020-03-19 19:46:50,778 [INFO] Running config prebuild_0_test
2020-03-19 19:46:50,783 [INFO] Running config prebuild_1_test
2020-03-19 19:46:53,074 [INFO] Command 01-command succeeded
2020-03-19 19:46:53,078 [INFO] Running config prebuild_2_test
2020-03-19 19:46:53,474 [INFO] Command 01-command succeeded
2020-03-19 19:46:57,641 [INFO] Command 02-command succeeded
2020-03-19 19:46:57,642 [INFO] ConfigSets completed
2020-03-19 19:46:57,643 [INFO] -----------------------Build complete-----------------------
$ view /var/log/eb-tools.log
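On an instance with many deployments these logs get long, so a small script can pull out just the command results. The Python sketch below assumes the line layout shown in cfn-init.log above (timestamp, [LEVEL], message). It only recognises the "Command <name> succeeded" wording that appears in that log and treats any [ERROR] line as a problem, without assuming how failures are worded; summarize_cfn_init is a hypothetical helper.

```python
import re

LINE = re.compile(r"^(?P<ts>\S+ \S+) \[(?P<level>[A-Z]+)\] (?P<msg>.*)$")
CONFIG = re.compile(r"Running config (?P<name>\S+)")
SUCCEEDED = re.compile(r"Command (?P<name>\S+) succeeded")

SAMPLE_LOG = """\
2020-03-19 19:46:50,778 [INFO] Running config prebuild_0_test
2020-03-19 19:46:50,783 [INFO] Running config prebuild_1_test
2020-03-19 19:46:53,074 [INFO] Command 01-command succeeded
2020-03-19 19:46:53,078 [INFO] Running config prebuild_2_test
2020-03-19 19:46:53,474 [INFO] Command 01-command succeeded
2020-03-19 19:46:57,641 [INFO] Command 02-command succeeded
2020-03-19 19:46:57,642 [INFO] ConfigSets completed
"""


def summarize_cfn_init(text):
    """Return ([(config, command) for succeeded commands], [messages of ERROR lines])."""
    succeeded, errors = [], []
    config = None
    for raw in text.splitlines():
        line = LINE.match(raw)
        if line is None:
            continue  # e.g. continuation lines of multi-line messages
        msg = line["msg"]
        if line["level"] == "ERROR":
            errors.append(msg)
        config_match = CONFIG.match(msg)
        if config_match:
            config = config_match["name"]  # remember which config the commands belong to
        command_match = SUCCEEDED.match(msg)
        if command_match:
            succeeded.append((config, command_match["name"]))
    return succeeded, errors


if __name__ == "__main__":
    print(summarize_cfn_init(SAMPLE_LOG))
```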
Next we check if the fluentd processes are running:
$ ps w -C ruby -C td-agent --no-heading
7673 ? Sl 0:00 /opt/td-agent/embedded/bin/ruby /usr/sbin/td-agent --log /var/log/td-agent/td-agent.log --use-v1-config --group td-agent --daemon /var/run/td-agent/td-agent.pid
7678 ? Sl 0:01 /opt/td-agent/embedded/bin/ruby -Eascii-8bit:ascii-8bit /usr/sbin/td-agent --log /var/log/td-agent/td-agent.log --use-v1-config --group td-agent --daemon /var/run
8242 ? Ssl 0:00 puma 2.11.1 (tcp://127.0.0.1:22221) [healthd]
Let’s check the Fluentd log file in /var/log/td-agent/ to see whether things are working as expected:
$ view /var/log/td-agent/td-agent.log
2020-03-18 01:14:59 +0000 [info]: parsing config file is succeeded path="/etc/td-agent/td-agent.conf"
2020-03-18 01:14:59 +0000 [info]: starting fluentd-1.3.3 pid=3347 ruby="2.4.5"
2020-03-18 01:14:59 +0000 [info]: spawn command to main: cmdline=["/opt/td-agent/embedded/bin/ruby", "-Eascii-8bit:ascii-8bit", "/usr/sbin/td-agent", "--log", "/var/log/td-agent/td-agent.log", "--use-v1-config", "--group", "td-agent", "--daemon", "/var/run/td-agent/td-agent.pid", "--under-supervisor"]
2020-03-18 01:15:01 +0000 [info]: #0 fluentd worker is now running worker=0
2020-03-18 01:15:01 +0000 [debug]: #0 enqueue_thread actually running
2020-03-18 01:15:01 +0000 [trace]: #0 enqueueing all chunks in buffer instance=70262055861420
2020-03-18 01:15:01 +0000 [debug]: #0 flush_thread actually running
2020-03-18 01:15:01 +0000 [debug]: #0 flush_thread actually running
2020-03-18 01:15:02 +0000 [trace]: #0 enqueueing all chunks in buffer instance=70262055861420
2020-03-18 01:15:03 +0000 [trace]: #0 enqueueing all chunks in buffer instance=70262055861420
We see that the fluentd worker is running.
Let’s generate a few app logs and see whether they show up in the Kinesis stream.
Send a couple of requests to the Beanstalk REST API (using the load balancer DNS name, the EB URL, or localhost), as shown below, to generate success and error logs.
$ curl --location --request GET 'http://aws-eb-fluentd-kinesis-app.eba-ttuxbwiu.us-east-1.elasticbeanstalk.com/helloworld'
{"message": "Hello World"}
$ curl --location --request GET 'http://aws-eb-fluentd-kinesis-app.eba-ttuxbwiu.us-east-1.elasticbeanstalk.com/err'
{"error": "Runtime Error"}
The /err request triggers an ArithmeticException.
The Fluentd log shows that the events are written to chunks and that the chunks are flushed periodically:
2020-03-18 00:16:11 +0000 [trace]: #0 enqueueing all chunks in buffer instance=69920775650060
2020-03-18 00:16:11 +0000 [trace]: #0 adding metadata instance=69920775650060 metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=nil, tag=nil, variables=nil>
2020-03-18 00:16:22 +0000 [trace]: #0 writing events into buffer instance=69920775650060 metadata_size=1
2020-03-18 00:16:24 +0000 [debug]: #0 Finish writing chunk
2020-03-18 00:16:24 +0000 [trace]: #0 write operation done, committing chunk="5a115f5a72697091e7722abaa8815591"
2020-03-18 00:16:24 +0000 [trace]: #0 committing write operation to a chunk chunk="5a115f5a72697091e7722abaa8815591" delayed=false
2020-03-18 00:16:24 +0000 [trace]: #0 purging a chunk instance=69920775650060 chunk_id="5a115f5a72697091e7722abaa8815591" metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=nil, tag=nil, variables=nil>
2020-03-18 00:16:24 +0000 [trace]: #0 chunk purged instance=69920775650060 chunk_id="5a115f5a72697091e7722abaa8815591" metadata=#<struct Fluent::Plugin::Buffer::Metadata timekey=nil, tag=nil, variables=nil>
2020-03-18 00:16:24 +0000 [trace]: #0 done to commit a chunk chunk="5a115f5a72697091e7722abaa8815591"
2020-03-18 00:16:31 +0000 [trace]: #0 trying flush for a chunk chunk="5a115f6440a601d5529d0fb2b5b50e16"
2020-03-18 00:16:31 +0000 [trace]: #0 adding write count instance=69920770766840
2020-03-18 00:16:31 +0000 [trace]: #0 executing sync write chunk="5a115f6440a601d5529d0fb2b5b50e16"
2020-03-18 00:16:31 +0000 [debug]: #0 Write chunk 5a115f6440a601d5529d0fb2b5b50e16 / 4 records / 31 KB
2020-03-18 00:16:31 +0000 [debug]: #0 Finish writing chunk
Now let’s check the Kinesis stream to see whether our log events were received.
Kinesis Stream
We use the AWS CLI to verify that the log events are available in the Kinesis stream.
First, get the list of shard IDs for the stream:
$ aws kinesis list-shards --stream-name aws-eb-fluentd-kinesis-stream
{
"Shards": [
{
"ShardId": "shardId-000000000000",
"HashKeyRange": {
"StartingHashKey": "0",
"EndingHashKey": "113427455640312821154458202477256070484"
},
"SequenceNumberRange": {
"StartingSequenceNumber": "49605210697203319534060347009940518023387210197078376450"
}
},
...
]
}
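The HashKeyRange values come from splitting the 128-bit hash key space across the three shards: Kinesis maps each record's partition key to an integer with MD5 and stores the record in the shard whose range contains that integer. The Python sketch below assumes the even split used when the stream was created (it reproduces the EndingHashKey of shardId-000000000000 shown above) and UTF-8 encoding of the partition key; even_hash_ranges and shard_for_partition_key are hypothetical helpers.

```python
import hashlib

HASH_SPACE = 2 ** 128  # MD5 yields 128-bit hash keys: 0 .. 2**128 - 1


def even_hash_ranges(shard_count: int) -> list:
    """Split the hash key space into shard_count contiguous (start, end) ranges."""
    step = HASH_SPACE // shard_count
    ranges = []
    for i in range(shard_count):
        start = i * step
        # The last shard absorbs the remainder so the ranges cover the whole space.
        end = HASH_SPACE - 1 if i == shard_count - 1 else start + step - 1
        ranges.append((start, end))
    return ranges


def shard_for_partition_key(partition_key: str, ranges: list) -> int:
    """Index of the range that contains the MD5 hash key of partition_key."""
    hash_key = int(hashlib.md5(partition_key.encode("utf-8")).hexdigest(), 16)
    for index, (start, end) in enumerate(ranges):
        if start <= hash_key <= end:
            return index
    raise ValueError("hash key is not covered by any range")


if __name__ == "__main__":
    ranges = even_hash_ranges(3)
    print(ranges[0])  # (0, 113427455640312821154458202477256070484)
    print(shard_for_partition_key("b70af396c6b80edc88c33c24451e422d", ranges))
```

The PartitionKey in the record returned by get-records looks random. As far as we can tell from the plugin's README, kinesis_streams generates a random partition key when partition_key is not set, which spreads events across all shards; that is why every shard has to be read to find a given event.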
We then need a shard iterator for each of the shards we configured. TRIM_HORIZON makes the iterator start at the oldest record still retained in the shard.
$ aws kinesis get-shard-iterator --stream-name aws-eb-fluentd-kinesis-stream --shard-iterator-type TRIM_HORIZON --shard-id shardId-000000000000
{
"ShardIterator": "AAAAAAAAAAHgjXtAKjm8UpZMMP6XYzk5rThlgKNFRG78/ZeeFu/+muRJvKdez6ZJQ5EmMd2UXt2ikhXrmvLKg5vL32mSrWYEwJWy+wVbM02/UhAkrX1dXToTkIMA7FRn0pyrHtGER791k1CwOJCq+dCAsmo5vJsgbDFslvKMvsE36QxK0zTmWKsSX4qr5w6NUSG09cQkDZlF2Rr8CvIGVn7vF1HVPjP+T0AR60yJVfAmx+OYvi74TA=="
}
Once we have the iterator, we fetch the records from the shard:
$ aws kinesis get-records --shard-iterator AAAAAAAAAAHgjXtAKjm8UpZMMP6XYzk5rThlgKNFRG78/ZeeFu/+muRJvKdez6ZJQ5EmMd2UXt2ikhXrmvLKg5vL32mSrWYEwJWy+wVbM02/UhAkrX1dXToTkIMA7FRn0pyrHtGER791k1CwOJCq+dCAsmo5vJsgbDFslvKMvsE36QxK0zTmWKsSX4qr5w6NUSG09cQkDZlF2Rr8CvIGVn7vF1HVPjP+T0AR60yJVfAmx+OYvi74TA==
{
"SequenceNumber": "49605210697203319534060386749927772490963059369909944322",
"ApproximateArrivalTimestamp": 1584490674.415,
"Data": "{"level":"ERROR","thread":"nio-8080-exec-9","message":"com.eb.RestController                    : Runtime error occurred\n\njava.lang.ArithmeticException: / by zero\n\tat com.eb.RestController.error(RestController.java:23) ~[classes/:na]\n\tat sun.reflect.GeneratedMethodAccessor43.invoke(Unknown Source) ~[na:na]\n\tat sun.reflect.DelegatingMethodAccessorImpl.invoke(DelegatingMethodAccessorImpl.java:43) ~[na:1.8.0_232]\n\tat java.lang.reflect.Method.invoke(Method.java:498) ~[na:1.8.0_232]\n\tat org.springframework.web.method.support.InvocableHandlerMethod.doInvoke(InvocableHandlerMethod.java:189) [spring-web-5.1.6.RELEASE.jar:5.1.6.RELEASE]\n\tat org.springframework.web.method.support.InvocableHandlerMethod.invokeForRequest(InvocableHandlerMethod.java:138) [spring-web-5.1.6.RELEASE.jar:5.1.6.RELEASE]\n\tat org.springframework.web.servlet.mvc.method.annotation.ServletInvocableHandlerMethod.invokeAndHandle(ServletInvocableHandlerMethod.java:102) [spring-webmvc-5.1.6.RELEASE.jar:5.1.6.RELEASE]\n\tat org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.invokeHandlerMethod(RequestMappingHandlerAdapter.java:892) [spring-webmvc-5.1.6.RELEASE.jar:5.1.6.RELEASE]\n\tat org.springframework.web.servlet.mvc.method.annotation.RequestMappingHandlerAdapter.handleInternal(RequestMappingHandlerAdapter.java:797) [spring-webmvc-5.1.6.RELEASE.jar:5.1.6.RELEASE]\n\tat org.springframework.web.servlet.mvc.method.AbstractHandlerMethodAdapter.handle(AbstractHandlerMethodAdapter.java:87) [spring-webmvc-5.1.6.RELEASE.jar:5.1.6.RELEASE]\n\tat org.springframework.web.servlet.DispatcherServlet.doDispatch(DispatcherServlet.java:1038) [spring-webmvc-5.1.6.RELEASE.jar:5.1.6.RELEASE]\n\tat org.springframework.web.servlet.DispatcherServlet.doService(DispatcherServlet.java:942) [spring-webmvc-5.1.6.RELEASE.jar:5.1.6.RELEASE]\n\tat org.springframework.web.servlet.FrameworkServlet.processRequest(FrameworkServlet.java:1005) 
[spring-webmvc-5.1.6.RELEASE.jar:5.1.6.RELEASE]\n\tat org.springframework.web.servlet.FrameworkServlet.doGet(FrameworkServlet.java:897) [spring-webmvc-5.1.6.RELEASE.jar:5.1.6.RELEASE]\n\tat javax.servlet.http.HttpServlet.service(HttpServlet.java:634) [tomcat8-servlet-3.1-api.jar:na]\n\tat org.springframework.web.servlet.FrameworkServlet.service(FrameworkServlet.java:882) [spring-webmvc-5.1.6.RELEASE.jar:5.1.6.RELEASE]\n\tat javax.servlet.http.HttpServlet.service(HttpServlet.java:741) [tomcat8-servlet-3.1-api.jar:na]\n\tat org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:231) [catalina.jar:8.5.50]\n\tat org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) [catalina.jar:8.5.50]\n\tat org.apache.tomcat.websocket.server.WsFilter.doFilter(WsFilter.java:52) [tomcat-websocket.jar:8.5.50]\n\tat org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193) [catalina.jar:8.5.50]\n\tat org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) [catalina.jar:8.5.50]\n\tat org.springframework.web.filter.RequestContextFilter.doFilterInternal(RequestContextFilter.java:99) [spring-web-5.1.6.RELEASE.jar:5.1.6.RELEASE]\n\tat org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107) [spring-web-5.1.6.RELEASE.jar:5.1.6.RELEASE]\n\tat org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193) [catalina.jar:8.5.50]\n\tat org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) [catalina.jar:8.5.50]\n\tat org.springframework.web.filter.FormContentFilter.doFilterInternal(FormContentFilter.java:92) [spring-web-5.1.6.RELEASE.jar:5.1.6.RELEASE]\n\tat org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107) [spring-web-5.1.6.RELEASE.jar:5.1.6.RELEASE]\n\tat 
org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193) [catalina.jar:8.5.50]\n\tat org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) [catalina.jar:8.5.50]\n\tat org.springframework.web.filter.HiddenHttpMethodFilter.doFilterInternal(HiddenHttpMethodFilter.java:93) [spring-web-5.1.6.RELEASE.jar:5.1.6.RELEASE]\n\tat org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107) [spring-web-5.1.6.RELEASE.jar:5.1.6.RELEASE]\n\tat org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193) [catalina.jar:8.5.50]\n\tat org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) [catalina.jar:8.5.50]\n\tat org.springframework.boot.web.servlet.support.ErrorPageFilter.doFilter(ErrorPageFilter.java:130) [spring-boot-2.1.4.RELEASE.jar:2.1.4.RELEASE]\n\tat org.springframework.boot.web.servlet.support.ErrorPageFilter.access$000(ErrorPageFilter.java:66) [spring-boot-2.1.4.RELEASE.jar:2.1.4.RELEASE]\n\tat org.springframework.boot.web.servlet.support.ErrorPageFilter$1.doFilterInternal(ErrorPageFilter.java:105) [spring-boot-2.1.4.RELEASE.jar:2.1.4.RELEASE]\n\tat org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107) [spring-web-5.1.6.RELEASE.jar:5.1.6.RELEASE]\n\tat org.springframework.boot.web.servlet.support.ErrorPageFilter.doFilter(ErrorPageFilter.java:123) [spring-boot-2.1.4.RELEASE.jar:2.1.4.RELEASE]\n\tat org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193) [catalina.jar:8.5.50]\n\tat org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) [catalina.jar:8.5.50]\n\tat org.springframework.web.filter.CharacterEncodingFilter.doFilterInternal(CharacterEncodingFilter.java:200) [spring-web-5.1.6.RELEASE.jar:5.1.6.RELEASE]\n\tat 
org.springframework.web.filter.OncePerRequestFilter.doFilter(OncePerRequestFilter.java:107) [spring-web-5.1.6.RELEASE.jar:5.1.6.RELEASE]\n\tat org.apache.catalina.core.ApplicationFilterChain.internalDoFilter(ApplicationFilterChain.java:193) [catalina.jar:8.5.50]\n\tat org.apache.catalina.core.ApplicationFilterChain.doFilter(ApplicationFilterChain.java:166) [catalina.jar:8.5.50]\n\tat org.apache.catalina.core.StandardWrapperValve.invoke(StandardWrapperValve.java:199) [catalina.jar:8.5.50]\n\tat org.apache.catalina.core.StandardContextValve.invoke(StandardContextValve.java:96) [catalina.jar:8.5.50]\n\tat org.apache.catalina.authenticator.AuthenticatorBase.invoke(AuthenticatorBase.java:543) [catalina.jar:8.5.50]\n\tat org.apache.catalina.core.StandardHostValve.invoke(StandardHostValve.java:139) [catalina.jar:8.5.50]\n\tat org.apache.catalina.valves.ErrorReportValve.invoke(ErrorReportValve.java:81) [catalina.jar:8.5.50]\n\tat org.apache.catalina.valves.RemoteIpValve.invoke(RemoteIpValve.java:747) [catalina.jar:8.5.50]\n\tat org.apache.catalina.valves.AbstractAccessLogValve.invoke(AbstractAccessLogValve.java:678) [catalina.jar:8.5.50]\n\tat org.apache.catalina.core.StandardEngineValve.invoke(StandardEngineValve.java:87) [catalina.jar:8.5.50]\n\tat org.apache.catalina.connector.CoyoteAdapter.service(CoyoteAdapter.java:343) [catalina.jar:8.5.50]\n\tat org.apache.coyote.http11.Http11Processor.service(Http11Processor.java:609) [tomcat-coyote.jar:8.5.50]\n\tat org.apache.coyote.AbstractProcessorLight.process(AbstractProcessorLight.java:65) [tomcat-coyote.jar:8.5.50]\n\tat org.apache.coyote.AbstractProtocol$ConnectionHandler.process(AbstractProtocol.java:810) [tomcat-coyote.jar:8.5.50]\n\tat org.apache.tomcat.util.net.NioEndpoint$SocketProcessor.doRun(NioEndpoint.java:1623) [tomcat-coyote.jar:8.5.50]\n\tat org.apache.tomcat.util.net.SocketProcessorBase.run(SocketProcessorBase.java:49) [tomcat-coyote.jar:8.5.50]\n\tat 
java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1149) [na:1.8.0_232]\n\tat java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:624) [na:1.8.0_232]\n\tat org.apache.tomcat.util.threads.TaskThread$WrappingRunnable.run(TaskThread.java:61) [tomcat-util.jar:8.5.50]\n\tat java.lang.Thread.run(Thread.java:748) [na:1.8.0_232]\n"}
",
"PartitionKey": "b70af396c6b80edc88c33c24451e422d"
}
The Data field is returned Base64-encoded. Decoding it yields JSON like the following:
{
"level": "ERROR",
"thread": "nio-8080-exec-9",
"message": "com.eb.RestController : Runtime error occurred\n\njava.lang.ArithmeticException: / by zero\n\tat com.eb.RestController.error(RestController.java:23) ...."
}
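Decoding can be scripted as well. The Python sketch below assumes the CLI's JSON output, where each entry of the Records array carries its Data Base64-encoded; decode_cli_record and decode_get_records_output are hypothetical helpers, and the sample record is built in place rather than copied from the stream.

```python
import base64
import json


def decode_cli_record(record: dict) -> dict:
    """Decode the Base64 Data of one record from `aws kinesis get-records` output."""
    payload = base64.b64decode(record["Data"]).decode("utf-8")
    return json.loads(payload)  # surrounding whitespace such as a trailing "\n" is allowed


def decode_get_records_output(cli_output: str) -> list:
    """Decode every record in the full JSON printed by get-records."""
    return [decode_cli_record(r) for r in json.loads(cli_output)["Records"]]


if __name__ == "__main__":
    event = {"level": "ERROR", "thread": "nio-8080-exec-9",
             "message": "com.eb.RestController : Runtime error occurred"}
    data = base64.b64encode((json.dumps(event) + "\n").encode("utf-8")).decode("ascii")
    output = json.dumps({"Records": [{"Data": data, "PartitionKey": "b70af396c6b80edc88c33c24451e422d"}]})
    print(decode_get_records_output(output)[0]["level"])  # ERROR
```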
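The named groups defined in the regex are emitted as fields. The time group does not appear because the parser uses it as the event timestamp rather than keeping it in the record.
We see that the log events are being published to the Kinesis stream as expected.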
Now, let’s configure Firehose to write the data to S3.
Kinesis Firehose
The next step is to create a Kinesis Data Firehose delivery stream that reads from the Kinesis stream we set up earlier and writes to S3.
Delivery stream name: aws-eb-fluentd-s3-firehose
Choose a source: aws-eb-fluentd-kinesis-stream (we choose kinesis stream as an input source here)
Transform: - (we are not planning to do any data transformations)
Destination: Amazon S3 (we choose Amazon S3 as our destination)
Specify an S3 bucket and prefix to use for writing the data files, e.g. S3 bucket: eb-catalina-logs
Once it is configured, send a number of requests to the Beanstalk application so that Firehose writes the data to S3.
Note that Firehose buffers incoming records and delivers them to S3 once 1 MB of data has accumulated or 60 seconds have passed, whichever comes first.
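The buffering condition above can be made concrete with a small model. The Python sketch below is a simplification, not Firehose's internal algorithm: a buffer is delivered as soon as it holds at least the size threshold, or once interval_s seconds have passed since its first record arrived. It assumes 1 MB means 1 MiB (1,048,576 bytes); firehose_deliveries is a hypothetical helper, and the arrival times and sizes are made up.

```python
def firehose_deliveries(arrivals, size_bytes=1_048_576, interval_s=60):
    """Return (delivery_time, record_count, byte_count) for each delivered buffer.

    arrivals: (arrival_time_seconds, record_size_bytes) pairs, sorted by time.
    """
    deliveries = []
    start, count, total = None, 0, 0
    for t, size in arrivals:
        # Time condition: the open buffer has been waiting interval_s seconds.
        if start is not None and t >= start + interval_s:
            deliveries.append((start + interval_s, count, total))
            start, count, total = None, 0, 0
        if start is None:
            start = t
        count += 1
        total += size
        # Size condition: the buffer has reached the size threshold.
        if total >= size_bytes:
            deliveries.append((t, count, total))
            start, count, total = None, 0, 0
    if start is not None:
        deliveries.append((start + interval_s, count, total))  # flushed by the timer
    return deliveries


if __name__ == "__main__":
    # Hypothetical traffic: three 400 KB records, then a single small record later.
    print(firehose_deliveries([(0, 400_000), (10, 400_000), (20, 400_000), (100, 1_000)]))
    # [(20, 3, 1200000), (160, 1, 1000)]
```

In practice, a light test load typically shows up in S3 about a minute after the first buffered request, because the time condition fires long before the size condition is reached.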
Conclusion
We have now seen how to use Fluentd and Kinesis to write log events to S3.
[1] We use Fluentd because it has a large ecosystem of community-contributed input plugins and libraries, and on the output side it can send data not only to Kinesis but to multiple destinations such as Amazon S3 and local file storage.
[2] We use Kinesis streams to buffer the log events.
[3] We use Firehose because it can deliver data to multiple destinations such as S3, Splunk, Elasticsearch, and Redshift.