- Interrupts and exceptions (x86)
- Interrupts and exceptions (Linux)
- Deferrable work
What is an interrupt?¶
An interrupt is an event that alters the normal execution flow of a program and can be generated by hardware devices or even by the CPU itself.
Interrupts can be grouped into two categories based on the source of the interrupt:
- synchronous, generated by executing an instruction
- asynchronous, generated by an external event
- can be ignored
- signalled via INT pin
- cannot be ignored
- signalled via NMI pin
Synchronous interrupts, usually named exceptions, handle conditions detected by the processor itself in the course of executing an instruction. Divide by zero or a system call are examples of exceptions.
Asynchronous interrupts, usually named interrupts, are external events generated by I/O devices. For example a network card generates an interrupts to signal that a packet has arrived.
There are two sources for exceptions:
- processor detected
- int n
Processor detected exceptions are raised when an abornmal condition is detected while executing an instruction.
A fault is a type of exception that is reported before the execution of the instruction and can be usually corrected. The saved EIP is the address of the instruction that caused the fault, so after the fault is corrected the program can re-execute the faulty instruction. (e.g page fault).
A trap is a type of exception that is reported after the execution of the instruction in which the exception was detected. The saved EIP is the address of the instruction after the instuction that caused the trap. (e.g debug trap).
Programmable Interrupt Controller¶
A device supporting interrupts has an output pin used for signalling an Interrupt ReQuest. IRQ pins are connected to a device named Programmable Interrupt Controller (PIC) which is connected to CPU’s INTR pin.
A PIC usually has a set of ports used to exchange information with the CPU. When a device connected to one of the PIC’s IRQ lines needs CPU attention the following flow happens:
- device raises an interrupt on the corresponding IRQn pin
- PIC converts the IRQ into a vector number and writes it to a port for CPU to read
- PIC raises an interrupt on CPU INTR pin
- PIC waits for CPU to acknowledge an interrupt
- CPU handles the interrupt
Will see later how the CPU handles the interrupt. Important to notice is that by design PIC won’t raise another interrupt until the CPU acknowledged the current interrupt.
Each IRQ line can be individually disabled. This allows simplifying design by making sure that interrupt handlers are always executed serially.
Advanced Programmable Interrupt Controller¶
With multicore systems, each core has a local APIC used to process interrupts from locally connected devices like timers or thermals sensors.
I/O APIC is used to distribute IRQ from external devices to CPU cores.
After discussing the hardware, now let’s see how the processor handles an interrupt.
In order to synchronize access to shared data between the interrupt handler and other potential concurrent activities such as driver initialization or driver data processing, it is often required to enable and disable interrupts in a controlled fashion.
This can be accomplished at several levels:
- at the device level
- by programming the device control registers
- at the PIC level
- PIC can be programmed to disable a given IRQ line
- at the CPU level; for example, on x86 one can use the following instructions:
- cli (CLear Interrupt flag)
- sti (SeT Interrupt flag)
Architecture specific interrupt handling in Linux¶
In this section we will discuss how Linux handles interrupts for the x86 architecture.
Interrupt Descriptor Table¶
The interrupt descriptor table (IDT) associates each interrupt or exception identifier with a descriptor for the instructions that service the associated event. We will name the identifier as vector number and the associated instructions as interrupt/exception handler.
An IDT has the following characteristics:
- it is used as a jump table by the CPU when a given vector is triggered
- it is an array of 256 x 8 bytes entries
- may reside anywhere in physical memory
- processor locates IDT by the means of IDTR
Below we can find Linux IRQ vector layout. The first 32 entries are reserved for exceptions, vector 128 is used for sycall interface and the rest are used mostly for hardware interrupts handlers.
On x86 an IDT entry has 8 bytes and it is named gate. There can be 3 types of gates:
- interrupt gate, holds the address of an interupt or exception handler. Jumping to the handler disables maskable interrupts (IF flag is cleared).
- trap gates, similar with an interrupt gate but it does not disable maskable interrupts while jumping to interupt/exception handler.
- task gates (not used in Linux)
Lets have a look at several fields of an IDT entry:
- segment selector, index into GDT/LDT to find the start of the code segment where the interupt handlers resides
- offset, offset inside the code segment
- T, represents the type of gate
- DPL, minimum privilege required for using the segments content.
Interrupt handler address¶
In order to find the interrupt handler address we first need to find the start address of the code segment where interrupt handler resides. For this we use the segment selector to index into GDT/LDT where we can find the corresponding segment descriptor. This will provide the start address kept in the ‘base’ field. Using base address and the offset we can now go at the start of the interrupt handler.
Stack of interrupt handler¶
Similar with control transfer to a normal function, a control transfer to an interrupt or exception handler uses the stack to store the information needed for returning to the interrupted code.
As can be seen in the figure below, an interrupt pushes the EFLAGS register before saving the address of the interrupted instruction. Certain types of exceptions also cause an error code to be pushed on the stack to help debug the exception.
Handling an interrupt request¶
After an interrupt request has been generated the processor runs a sequence of events that eventually ends up with running the kernel interrupt handler:
CPU checks the current privilege level
if need to change privilege level
- change stack with the one associated with new privilege
- save old stack information on the new stack
save EFLAGS, CS, EIP on stack
save error code on stack in case of an abort
execute the kernel interrupt handler
Returning from an interrupt handler¶
Most architectures offers special instructions to clean-up the stack and resume the execution after the interrupt handler has been executed. On x86 IRET is used to return from an interrupt handler. IRET is similar with RET except that IRET increments ESP by extra four bytes (because of the flags on stack) and moves the saved flags into EFLAGS register.
To resume the execution after an interrupt the following sequence is used (x86):
- pop the eror code (in case of an abort)
- call IRET
- pops values from the stack and restore the following register: CS, EIP, EFLAGS
- if privilege level changed returns to the old stack and old privilege level
Generic interrupt handling in Linux¶
In Linux the interrupt handling is done in three phases: critical, immediate and deferred.
In the first phase the kernel will run the generic interrupt handler that determines the interrupt number, the interrupt handler for this particular interrupt and the interrupt controller. At this point any timing critical actions will also be performed (e.g. acknowledge the interrupt at the interrupt controller level). Local processor interrupts are disabled for the duration of this phase and continue to be disabled in the next phase.
In the second phase all of the device drivers handler associated with this interrupt will be executed . At the end of this phase the interrupt controller’s “end of interrupt” method is called to allow the interrupt controller to reassert this interrupt. The local processor interrupts are enabled at this point.
|||Note that it is possible that one interrupt is associated with multiple devices and in this case it is said that the interrupt is shared. Usually, when using shared interrupts it is the responsibility of the device driver to determine if the interrupt is target to it’s device or not.|
Finally, in the last phase of interrupt handling interrupt context deferrable actions will be run. These are also sometimes known as “bottom half” of the interrupt (the upper half being the part of the interrupt handling that runs with interrupts disabled). At this point interrupts are enabled on the local processor.
Nested interrupts and exceptions¶
Nesting interrupts is permitted on many architectures. Some architectures define interrupt levels that allow preemption of an interrupt only if the pending interrupt has a greater priority then the current (settable) level (e.g see ARM’s priority mask).
In order to support as many architectures as possible, Linux has a more restrictive interrupt nesting implementation:
- an exception (e.g. page fault, system call) can not preempt an interrupt; if that occurs it is considered a bug
- an interrupt can preempt an exception or other interrupts; however, only one level of interrupt nesting is allowed
The diagram below shows the possible nesting scenarios:
While an interrupt is handled (from the time the CPU jumps to the interrupt handler until the interrupt handler returns - e.g. IRET is issued) it is said that code runs in “interrupt context”.
Code that runs in interrupt context has the following properties:
- it runs as a result of an IRQ (not of an exception)
- there is no well defined process context associated
- not allowed to trigger a context switch (no sleep, schedule, or user memory access)
Deferrable actions are used to run callback functions at a later time. If deferrable actions scheduled from an interrupt handler, the associated callback function will run after the interrupt handler has completed.
There are two large categories of deferrable actions: those that run in interrupt context and those that run in process context.
The purpose of interrupt context deferrable actions is to avoid doing too much work in the interrupt handler function. Running for too long with interrupts disabled can have undesired effects such as increased latency or poor system performance due to missing other interrupts (e.g. dropping network packets because the CPU did not react in time to dequeue packets from the network interface and the network card buffer is full).
In Linux there are three types of deferrable actions:
- runs in interrupt context
- statically allocated
- same handler may run in parallel on multiple cores
- runs in interrupt context
- can be dynamically allocated
- same handler runs are serialized
- run in process context
Deferrable actions have APIs to: initialize an instance, activate or schedule the action and mask/disable and unmask/enable the execution of the callback function. The later is used for synchronization purposes between the callback function and other contexts.
Soft IRQs is the term used for the low level mechanism that implements deferring work from interrupt handlers but that still runs in interrupt context.
Soft IRQ APIs:
Once activated, the callback function
- after an interrupt handler or
- from the ksoftirqd kernel thread
Since softirqs can reschedule themselves or other interrupts can occur that
reschedules them, they can potentially lead to (temporary) process starvation if
checks are not put into place. Currently, the Linux kernel does not allow
running soft irqs for more than
MAX_SOFTIRQ_TIME or rescheduling for
MAX_SOFTIRQ_RESTART consecutive times.
Once these limits are reached a special kernel thread, ksoftirqd is wake-up and all of the rest of pending soft irqs will be run from the context of this kernel thread.
Soft irqs usage is restricted, they are use by a handful of subsystems that have low latency requirements. For 4.19 this is the full list of soft irqs:
Tasklets are a dynamic type (not limited to a fixed number) of deferred work running in interrupt context.
Tasklets are implemented on top of two dedicated softirqs:
Tasklets are also serialized, i.e. the same tasklet can only execute on one processor.
Workqueues are a type of deferred work that runs in process context.
They are implemented on top of kernel threads.
Timers are implemented on top of the