Diagnostic Interrupt - A way to debug and perform root cause analysis of unresponsive or unreachable AWS EC2 instance





Before 15 August 2019, it was very difficult to debug and do root cause analysis of an unresponsive or unreachable AWS EC2 instance.

Generally, the operating system gets crashed and rebooted when a kernel panic(in case of Linux) or stop error(in case of Windows) is triggered.
The operating system can be configured to perform diagnostics tasks on kernel panic or stop error such as generating memory dump files for root cause analysis and debugging.

On 15 August 2019, Amazon introduced a simple API to trigger a kernel panic by sending diagnostic Interrupt to an unresponsive instance, which in turn can direct the operating system to perform tasks like creating a crash dump, loading the secondary kernel, or obtaining a call trace.

Some concepts about operating system interrupt

An interrupt in the operating system is the highest priority signal send from hardware or software to the processor to process.
There are two types of interrupts that occur in the operating system:
  • Hardware Interrupt
  • Software Interrupt
In the context of EC2 debugging, we will talk only about Hardware Interrupts.
Hardware Interrupts are the signals generated by I/O actions from hardware, for e.g. pressing a key from keyboard generates hardware interrupt. Hardware interrupts can be further divided into two categories:
  • Maskable Interrupts
  • Non-Maskable Interrupts
Maskable Interrupts are the interrupts that can be delayed by the processor to process.
Non-Maskable Interrupts are the highest priority interrupts that the processor has to address immediately.
Now that we have got some understanding of what the operating system interrupts are, let's dive deep into how diagnostic interrupts(Non-Maskable Interrupts) could be sent to make the operating system perform some diagnostic tasks, like creating a crash dump.

Diagnostic Interrupt

It is Non-Maskable hardware interrupt sent to trigger kernel panic in unresponsive or unreachable EC2 instance configured to perform some diagnostic tasks, like creating crash dumps.

Send Diagnostic Interrupt to EC2 instance(Ubuntu) to generate a crash dump

  • Before sending the diagnostic interrupt to ec2 instance, ec2 machine must be configured to perform the required diagnostic task.
  • Configure Ubuntu EC2 instance to generate crash dump on kernel panic.
  • Once the EC2 instance is configured, use following AWS CLI command to send diagnostic interrupt:
  • By default crash dump file is saved to /var/crash/

Thanks for reading this blog. 
Hope you got some useful information about EC2 RCA and debugging approach.
If you liked this information, please share this knowledge with others.




Comments

Popular posts from this blog

Rclone: Sync files from ftp server to AWS S3 bucket

Query S3 bucket using AWS Athena service