Device Drivers notes

User and Kernel space

When you write device drivers, it’s important to distinguish between “user space” and “kernel space”.

Kernel space
Linux (which is a kernel) manages the machine's hardware in a simple and efficient manner, offering the user a simple and uniform programming interface.
In the same way, the kernel, and in particular its device drivers, form a bridge or interface between the end-user/programmer and the hardware.
Any subroutines or functions forming part of the kernel (modules and device drivers, for example) are considered to be part of kernel space.

User space
End-user programs, like the UNIX shell or GUI-based applications (kpresenter, for example), are part of user space.
These applications need to interact with the system's hardware.
However, they don’t do so directly, but through functions that the kernel supports.

Interfacing functions between user space and kernel space

The kernel offers several subroutines or functions in user space, which allow the end-user application programmer to interact with the hardware.
Usually, in UNIX or Linux systems, this dialogue is performed through the same functions or subroutines that are used to read and write files.
The reason for this is that in Unix, devices are seen, from the point of view of the user, as files.
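
As a minimal user-space sketch of this file-like view, the function below reads bytes from a device node with the ordinary open/read/close calls. The helper name read_device_bytes is hypothetical, and /dev/zero appears in the usage note only because it exists on practically every Linux system.

```c
#include <fcntl.h>
#include <stddef.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical helper: read up to n bytes from a device node, exactly as
 * one would from a regular file. Returns the byte count, or -1 on error. */
ssize_t read_device_bytes(const char *path, void *buf, size_t n)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return -1;

    /* The kernel routes this call to the read method of the driver
     * that owns the device node. */
    ssize_t got = read(fd, buf, n);
    close(fd);
    return got;
}
```

Calling read_device_bytes("/dev/zero", buf, 4) should fill buf with four zero bytes, because the driver behind /dev/zero supplies zeros on every read.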

On the other hand, in kernel space Linux also offers several functions or subroutines to perform the low level interactions directly with the hardware, and allow the transfer of information from kernel to user space.

User space (apps) <--> Kernel space (drivers and modules) <--> Hardware

insmod - used to install a module in the kernel.
Example: # insmod nothing.ko

lsmod - used to list the modules installed in the system.

rmmod - used to remove a module from the kernel.

modprobe

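-- The modprobe utility adds loadable modules to the Linux kernel. You can also list and remove modules using the modprobe command.

modprobe -l displays all available modules, as shown below (the -l option belongs to the older module-init-tools version of modprobe; newer kmod-based versions no longer provide it).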

$ modprobe -l | less
kernel/arch/x86/kernel/cpu/mcheck/mce-inject.ko
kernel/arch/x86/kernel/cpu/cpufreq/e_powersaver.ko
kernel/arch/x86/kernel/cpu/cpufreq/p4-clockmod.ko
kernel/arch/x86/kernel/msr.ko
kernel/arch/x86/kernel/cpuid.ko
kernel/arch/x86/kernel/apm.ko
kernel/arch/x86/kernel/scx200.ko
kernel/arch/x86/kernel/microcode.ko
kernel/arch/x86/crypto/aes-i586.ko
kernel/arch/x86/crypto/twofish-i586.ko

The following example loads the vmhgfs module into the Linux kernel:

 sudo modprobe vmhgfs

Use the -r option to unload a module from the kernel:

 sudo modprobe -r vmhgfs

Classes of Devices and Modules

The Linux way of looking at devices distinguishes between three fundamental device types. Each module usually implements one of these types, and thus is classifiable as a char module, a block module, or a network module.

Character devices

--A character (char) device is one that can be accessed as a stream of bytes (like a file).
--  The only relevant difference between a char device and a regular file is that you can always move back and forth in the regular file, whereas most char devices are just data channels.

Block devices

A block device is a device (e.g., a disk) that can host a filesystem. In most Unix systems, a block device can only handle I/O operations that transfer one or more whole blocks, which are usually 512 bytes (or a larger power of two) in length.

Linux, instead, allows the application to read and write a block device like a char device—it permits the transfer of any number of bytes at a time. As a result, block and char devices differ only in the way data is managed internally by the kernel, and thus in the kernel/driver software interface.

Network interfaces

Any network transaction is made through an interface, that is, a device that is able to exchange data with other hosts. Usually, an interface is a hardware device, but it might also be a pure software device, like the loopback interface. A network interface is in charge of sending and receiving data packets, driven by the network subsystem of the kernel, without knowing how individual transactions map to the actual packets being transmitted. Many network connections (especially those using TCP) are stream-oriented, but network devices are, usually, designed around the transmission and receipt of packets.

The “Hello world” driver: loading and removing the driver in kernel space

When a module device driver is loaded into the kernel, some preliminary tasks are usually performed like resetting the device, reserving RAM, reserving interrupts, and reserving input/output ports, etc.

These tasks are performed, in kernel space, by two functions that the module must define and register through module_init and module_exit; they correspond to the user space commands insmod and rmmod, which are used when installing or removing a module.
To sum up, insmod causes the function registered with module_init to run, and rmmod causes the function registered with module_exit to run.


#include <linux/init.h>
#include <linux/module.h>
#include <linux/kernel.h>

MODULE_LICENSE("Dual BSD/GPL");

static int hello_init(void) {
  printk("<1> Hello world!\n");
  return 0;
}

static void hello_exit(void) {
  printk("<1> Bye, cruel world\n");
}

module_init(hello_init);
module_exit(hello_exit);

In order for the functions to be identified as the loading and removing functions, they have to be passed as parameters to module_init and module_exit.

The printk function has also been introduced. It is very similar to the well known printf, apart from the fact that it only works inside the kernel.
The <1> prefix gives the message a high priority (the lower the number, the higher the priority); it is the level that the KERN_ALERT macro stands for. Writing printk(KERN_ALERT "Hello world!\n") is the more portable form, since recent kernels no longer interpret a literal "<1>" in the format string.
In this way, besides getting the message in the kernel system log files, you should also receive this message in the system console.

When the module is loaded or removed, the messages that were written in the printk statement will be displayed in the system console.
If these messages do not appear in the console, you can view them by issuing the dmesg command or by looking at the system log file with cat /var/log/syslog.

In UNIX and Linux, devices are accessed from user space in exactly the same way as files are accessed. These device files are normally located in the /dev directory.

-To link device files with a kernel module, two numbers are used: a major number and a minor number. The major number is the one the kernel uses to link a file with its driver.
The minor number is for the driver's internal use.

-The major number identifies the driver associated with the device.
-The minor number is used by the kernel to determine exactly which device is being referred to.

To achieve this, a file (which will be used to access the device driver) must be created, by typing the following command as root:

# mknod /dev/memory c 60 0

In the above, c means that a char device is to be created, 60 is the major number and 0 is the minor number.
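
To make the major/minor pair concrete, here is a small user-space sketch that reads the numbers back from an existing device file with stat. The helper name char_device_numbers is hypothetical; major and minor are the glibc macros from <sys/sysmacros.h>, which is an assumption about the C library in use.

```c
#include <sys/stat.h>
#include <sys/sysmacros.h>
#include <sys/types.h>

/* Hypothetical helper: if path names a char device, store its major and
 * minor numbers and return 0; otherwise return -1. */
int char_device_numbers(const char *path, unsigned *maj_out, unsigned *min_out)
{
    struct stat st;

    if (stat(path, &st) != 0 || !S_ISCHR(st.st_mode))
        return -1;

    /* For device files, st_rdev packs the major and minor numbers. */
    *maj_out = major(st.st_rdev);
    *min_out = minor(st.st_rdev);
    return 0;
}
```

For a node created with the mknod command above, this helper would be expected to report 60 and 0.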

Within the driver, in order to link it with its corresponding /dev file in kernel space, the register_chrdev function is used.
It is called with three arguments: the major number, a string holding the module name, and a file_operations structure that links each file call with the driver function that implements it.

int memory_init(void) {
  int result;

  /* Registering device */
  result = register_chrdev(memory_major, "memory", &memory_fops);
  if (result < 0) {
    printk(
      "<1>memory: cannot obtain major number %d\n", memory_major);
    return result;
  }

  /* Allocating memory for the buffer */
  memory_buffer = kmalloc(1, GFP_KERNEL);
  if (!memory_buffer) {
    result = -ENOMEM;
    goto fail;
  }
  memset(memory_buffer, 0, 1);

  printk("<1>Inserting memory module\n");
  return 0;

  fail:
    memory_exit();
    return result;
}

If you pass a major number of 0 to register_chrdev, the return value will be the dynamically allocated major number. The downside is that you can't make a device file in advance, since you don't know what the major number will be. There are three ways to deal with this. First, the driver itself can print the newly assigned number and we can make the device file by hand. Second, the newly registered device will have an entry in /proc/devices, and we can either make the device file by hand or write a shell script that reads the file and makes the device file. Third, we can have our driver make the device file using the mknod system call after a successful registration and remove it during the call to cleanup_module.
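
The second approach can be sketched in user-space C rather than as a shell script. The function below scans text in the /proc/devices layout (a number followed by a driver name on each line) and returns the major number for a given name. The helper name find_major is hypothetical; the caller is assumed to have opened /proc/devices with fopen.

```c
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: scan /proc/devices-style lines for a driver name.
 * Returns the first matching major number, or -1 if the name is absent. */
int find_major(FILE *fp, const char *name)
{
    char line[128];

    while (fgets(line, sizeof line, fp)) {
        int num;
        char dev[64];

        /* Header lines such as "Character devices:" and blank lines do
         * not parse as "<number> <name>" and are skipped. */
        if (sscanf(line, "%d %63s", &num, dev) == 2 && strcmp(dev, name) == 0)
            return num;
    }
    return -1;
}
```

The returned number could then be passed to mknod in place of a hard-coded major number such as 60.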


Also, note the use of the kmalloc function. This function is used for memory allocation of the buffer in the device driver which resides in kernel space.
Its use is very similar to the well known malloc function. Finally, if registering the major number or allocating the memory fails, the module acts accordingly.


To remove the module cleanly, the memory_exit function needs to call unregister_chrdev. This frees the major number for the kernel.

void memory_exit(void) {
  /* Freeing the major number */
  unregister_chrdev(memory_major, "memory");

  /* Freeing buffer memory */
  if (memory_buffer) {
    kfree(memory_buffer);
  }

  printk("<1>Removing memory module\n");

}

Open

The kernel space function that corresponds to opening a file in user space (fopen) is the open member of the file_operations structure passed to register_chrdev.
In this case, it is the memory_open function. It takes as arguments an inode structure (a data structure that holds information about a file or directory on disk), which carries the major and minor numbers to the kernel, and a file structure with information about the operations that can be performed on the file.

int memory_open(struct inode *inode, struct file *filp) {

  /* Success */
  return 0;
}

Close

The corresponding function for closing a file in user space (fclose) is the release member of the file_operations structure passed to register_chrdev.
In this particular case, it is the function memory_release, which has as arguments an inode structure and a file structure, just like before.

When a file is closed, it’s usually necessary to free the used memory and any variables related to the opening of the device.


int memory_release(struct inode *inode, struct file *filp) {

  /* Success */
  return 0;
}

Read

To read a device with the user function fread or similar, the read member of the file_operations structure passed to register_chrdev is used.
This time, it is the function memory_read. Its arguments are: a file structure; a buffer (buf), which the driver fills and from which the user space function (fread) will read; a counter with the number of bytes to transfer (count), which has the same value as the usual counter in the user space function (fread);
and finally, the position of where to start reading the file (f_pos).


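ssize_t memory_read(struct file *filp, char __user *buf,
                    size_t count, loff_t *f_pos) {

  /* Only one byte is stored, so any position past 0 is end of file */
  if (*f_pos != 0)
    return 0;

  /* Transferring data to user space; a nonzero return value is the
     number of bytes that could not be copied */
  if (copy_to_user(buf, memory_buffer, 1))
    return -EFAULT;

  /* Advancing the reading position */
  *f_pos += 1;
  return 1;
}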

Write

To write to a device with the user function fwrite or similar, the write member of the file_operations structure passed to register_chrdev is used.
In this particular example, it is the function memory_write, which has the following arguments: a file structure; buf, a buffer holding the data that the user space function (fwrite) wants to write;
count, a counter with the number of bytes to transfer, which has the same value as the usual counter in the user space function (fwrite); and finally, f_pos, the position of where to start writing in the file.

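ssize_t memory_write(struct file *filp, const char __user *buf,
                     size_t count, loff_t *f_pos) {

  const char __user *tmp;

  /* Nothing to store for an empty write */
  if (count == 0)
    return 0;

  tmp = buf + count - 1;  /* only the last character is written to the device */
  if (copy_from_user(memory_buffer, tmp, 1))
    return -EFAULT;
  return 1;
}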
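
The read and write semantics of this one-byte device can be checked in a plain user-space model, without loading a module. The sketch below is not the driver itself: model_buffer, model_write and model_read are hypothetical names that mirror memory_buffer, memory_write and memory_read, with ordinary assignments standing in for copy_from_user and copy_to_user.

```c
#include <stddef.h>

static char model_buffer;  /* stands in for the driver's one-byte buffer */

/* Model of memory_write: keep only the last byte of the request and
 * report that one byte was consumed. */
long model_write(const char *buf, size_t count)
{
    if (count == 0)
        return 0;
    model_buffer = buf[count - 1];
    return 1;
}

/* Model of memory_read: deliver the stored byte once, then report end
 * of file (0) for any later position. */
long model_read(char *buf, size_t count, long *pos)
{
    if (count == 0 || *pos != 0)
        return 0;
    buf[0] = model_buffer;
    *pos += 1;
    return 1;
}
```

Writing "abc" and then reading yields the single byte 'c', and a second read at the advanced position returns 0, which user space interprets as end of file.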

Communicating with hardware

I/O Ports and I/O Memory

Every peripheral device is controlled by writing and reading its registers.
Most of the time a device has several registers, and they are accessed at consecutive addresses, either in the memory address space or in the I/O address space.

At the hardware level, there is no conceptual difference between memory regions and I/O regions: both of them are accessed by asserting electrical signals on the address bus and control bus (i.e., the read and write signals) and by reading from or writing to the data bus.

Memory Barriers

 Memory barriers are used to provide control over the order of memory accesses.
This is necessary sometimes because optimizations performed by the compiler and hardware can cause memory to be accessed in a different order than intended by the developer.

A memory barrier affects instructions that access memory in two ways:

-- it provides control over the order in which memory access instructions are performed, and
-- it provides control over when memory access instructions will complete.

Memory access instructions, such as loads and stores, typically take longer to execute than other instructions. Therefore, compilers use registers to hold frequently used values and processors use high speed caches to hold the most frequently used memory locations.
Another common optimization is for compilers and processors to rearrange the order that instructions are executed so that the processor does not have to wait for memory accesses to complete.
This can result in memory being accessed in a different order than specified in the source code.
 While this typically will not cause a problem in a single thread of execution, it can cause a problem if the location can also be accessed from another processor or device.

 As mentioned above, both compilers and processors can optimize the execution of instructions in a way that necessitates the use of a memory barrier.
 A memory barrier that affects both the compiler and the processor is a hardware memory barrier, and a memory barrier that only affects the compiler is a software memory barrier.

In addition to hardware and software memory barriers, a memory barrier can be restricted to memory reads, memory writes, or both. A memory barrier that affects both reads and writes is a full memory barrier.


mb()

#include <asm/system.h>   /* <asm/barrier.h> on kernels 3.4 and later */
void mb(void);
This function inserts a hardware memory barrier that prevents any memory access from being moved and executed on the other side of the barrier. It guarantees that any memory access initiated before the memory barrier will be complete before passing the barrier, and all subsequent memory accesses will be executed after the barrier.

rmb()

#include <asm/system.h>   /* <asm/barrier.h> on kernels 3.4 and later */
void rmb(void);
This function inserts a hardware memory barrier that prevents any memory read access from being moved and executed on the other side of the barrier. It guarantees that any memory read access initiated before the memory barrier will be complete before passing the barrier, and all subsequent memory read accesses will be executed after the barrier.

wmb()

#include <asm/system.h>   /* <asm/barrier.h> on kernels 3.4 and later */
void wmb(void);
This function inserts a hardware memory barrier that prevents any memory write access from being moved and executed on the other side of the barrier. It guarantees that any memory write access initiated before the memory barrier will be complete before passing the barrier, and all subsequent memory write accesses will be executed after the barrier.

barrier()

#include <linux/kernel.h>
void barrier(void);
This function inserts a software memory barrier that affects the compiler code generation, but it does not affect the hardware's execution of instructions. The compiler will save to memory any modified values that it has loaded in registers, and it will reread all values from memory the next time they are needed.
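
The typical use of a write barrier, finishing the stores that describe a request before the store that starts it, can be illustrated in user space. The sketch below is only an analogy: mb(), rmb() and wmb() exist only inside the kernel, so C11 fences from <stdatomic.h> are used as stand-ins, and struct model_regs with its data and go fields is a hypothetical device model rather than real hardware.

```c
#include <stdatomic.h>
#include <stdint.h>

/* Hypothetical device model: a data register and a "go" register. */
struct model_regs {
    _Atomic uint32_t data;
    _Atomic uint32_t go;
};

/* Producer side: publish the data, then ring the doorbell. */
void model_start_transfer(struct model_regs *r, uint32_t value)
{
    atomic_store_explicit(&r->data, value, memory_order_relaxed);
    /* Stand-in for wmb(): the store above is ordered before the store below. */
    atomic_thread_fence(memory_order_release);
    atomic_store_explicit(&r->go, 1, memory_order_relaxed);
}

/* Consumer side: returns 1 and stores the data in *out once go is set. */
int model_poll(struct model_regs *r, uint32_t *out)
{
    if (atomic_load_explicit(&r->go, memory_order_relaxed) == 0)
        return 0;
    /* Stand-in for rmb(): the load above is ordered before the load below. */
    atomic_thread_fence(memory_order_acquire);
    *out = atomic_load_explicit(&r->data, memory_order_relaxed);
    return 1;
}
```

Without the two fences, another processor that sees go set would have no guarantee of also seeing the new value of data. In a driver, wmb() and rmb() play the roles that the release and acquire fences play here.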

Using I/O Ports

I/O ports are the means by which drivers communicate with many devices, at least part of the time.

I/O Port Allocation

The kernel provides a registration interface that allows your driver to claim the ports it needs. The core function in that interface is request_region:

#include <linux/ioport.h>
struct resource *request_region(unsigned long first, unsigned long n,
                                const char *name);

This function tells the kernel that you would like to make use of n ports, starting with first. The name parameter should be the name of your device. The return value is non-NULL if the allocation succeeds.
If you get NULL back from request_region, you will not be able to use the desired ports.

All port allocations show up in /proc/ioports.

When you are done with a set of I/O ports (at module unload time, perhaps), they should be returned to the system with:

void release_region(unsigned long start, unsigned long n);

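There is also a function that allows your driver to check whether a given set of I/O ports is available. Its answer can be out of date by the time the ports are actually requested, so calling request_region and checking its return value is the safer approach: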

int check_region(unsigned long first, unsigned long n);

After a driver has requested the range of I/O ports it needs to use in its activities, it must read and/or write to those ports. To this end, most hardware differentiates between 8-bit, 16-bit, and 32-bit ports.

The Linux kernel headers (specifically, the architecture-dependent header <asm/io.h>) define the following inline functions to access I/O ports:

unsigned inb(unsigned port);
void outb(unsigned char byte, unsigned port);
Read or write byte ports (eight bits wide). The port argument is defined as unsigned long for some platforms and unsigned short for others. The return type of inb is also different across architectures.

unsigned inw(unsigned port);
void outw(unsigned short word, unsigned port);
These functions access 16-bit ports (one word wide); they are not available when compiling for the S390 platform, which supports only byte I/O.

unsigned inl(unsigned port);
void outl(unsigned longword, unsigned port);
These functions access 32-bit ports. longword is declared as either unsigned long or unsigned int, according to the platform. Like word I/O, "long" I/O is not available on S390.

Using I/O Memory

-- Despite the popularity of I/O ports in the x86 world, the main mechanism used to communicate with devices is through memory-mapped registers and device memory.

-- I/O memory is simply a region of RAM-like locations that the device makes available to the processor over the bus.
This memory can be used for a number of purposes, such as holding video data or Ethernet packets, as well as implementing device registers that behave just like I/O ports.

I/O Memory Allocation and Mapping

--I/O memory regions must be allocated prior to use. The interface for allocation of memory regions (defined in <linux/ioport.h>) is:

struct resource *request_mem_region(unsigned long start, unsigned long len,
                                    char *name);

This function allocates a memory region of len bytes, starting at start. If all goes well, a non-NULL pointer is returned; otherwise the return value is NULL. All I/O memory allocations are listed in /proc/iomem.

Memory regions should be freed when no longer needed:

void release_mem_region(unsigned long start, unsigned long len);

There is also an old function for checking I/O memory region availability:

int check_mem_region(unsigned long start, unsigned long len);

Allocation of I/O memory is not the only required step before that memory may be accessed. You must also ensure that this I/O memory has been made accessible to the kernel.
Getting at I/O memory is not just a matter of dereferencing a pointer; on many systems, I/O memory is not directly accessible in this way at all.
So a mapping must be set up first. This is the role of the ioremap function.

Once equipped with ioremap (and iounmap), a device driver can access any I/O memory address, whether or not it is directly mapped to virtual address space.
Remember, though, that the addresses returned from ioremap should not be dereferenced directly; instead, accessor functions provided by the kernel should be used.


Accessing I/O Memory

On some platforms, you may get away with using the return value from ioremap as a pointer.
Such use is not portable, and, increasingly, the kernel developers have been working to eliminate any such use. The proper way of getting at I/O memory is via a set of functions (defined via <asm/io.h>) provided for that purpose.

To read from I/O memory, use one of the following:

unsigned int ioread8(void *addr);
unsigned int ioread16(void *addr);
unsigned int ioread32(void *addr);

Here, addr should be an address obtained from ioremap (perhaps with an integer offset); the return value is what was read from the given I/O memory.

There is a similar set of functions for writing to I/O memory:

void iowrite8(u8 value, void *addr);
void iowrite16(u16 value, void *addr);
void iowrite32(u32 value, void *addr);

Interrupts

-- An interrupt is simply a signal that the hardware can send when it wants the processor's attention.
Linux handles interrupts in much the same way that it handles signals in user space.

--For the most part, a driver need only register a handler for its device's interrupts, and handle them properly when they arrive.

Installing an interrupt handler

-- Interrupt lines are a precious and often limited resource.
-- The kernel keeps a registry of interrupt lines, similar to the registry of I/O ports. A module is expected to request an interrupt channel (or IRQ, for interrupt request) before using it and to release it when finished.
-- Modules are also expected to be able to share interrupt lines with other drivers.

The interrupt registration interface:

int request_irq(unsigned int irq,
                irqreturn_t (*handler)(int, void *, struct pt_regs *),
                unsigned long flags,
                const char *dev_name,
                void *dev_id);

void free_irq(unsigned int irq, void *dev_id);

-- The value returned from request_irq to the requesting function is either 0 to indicate success or a negative error code.

unsigned int irq
The interrupt number being requested.

irqreturn_t (*handler)(int, void *, struct pt_regs *)
The pointer to the handling function being installed. The arguments to this function and its return value are discussed under Handler Arguments and Return Value.

unsigned long flags
As you might expect, a bit mask of options (described below) related to interrupt management.

const char *dev_name
The string passed to request_irq is used in /proc/interrupts to show the owner of the interrupt (see Proc Interface).

void *dev_id
Pointer used for shared interrupt lines. It is a unique identifier that is used when the interrupt line is freed and that may also be used by the driver to point to its own private data area (to identify which device is interrupting). If the interrupt is not shared, dev_id can be set to NULL, but it is a good idea anyway to use this item to point to the device structure. A practical use for dev_id appears under Handler Arguments and Return Value.

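The bits that can be set in flags are as follows (these are the 2.6-era names; later kernels use an IRQF_ prefix, for example IRQF_SHARED in place of SA_SHIRQ):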

SA_INTERRUPT
When set, this indicates a "fast" interrupt handler. Fast handlers are executed with interrupts disabled on the current processor.

SA_SHIRQ
This bit signals that the interrupt can be shared between devices.

--The interrupt handler can be installed either at driver initialization or when the device is first opened.
Although installing the interrupt handler from within the module's initialization function might sound like a good idea, it often isn't,
especially if your device does not share interrupts.
Because the number of interrupt lines is limited, you don't want to waste them.

--If a module requests an IRQ at initialization, it prevents any other driver from using the interrupt, even if the device holding it is never used.
Requesting the interrupt at device open, on the other hand, allows some sharing of resources.

-- The correct place to call request_irq is when the device is first opened, before the hardware is instructed to generate interrupts. The place to call free_irq is the last time the device is closed, after the hardware is told not to interrupt the processor any more.
The disadvantage of this technique is that you need to keep a per-device open count so that you know when interrupts can be disabled.

Example:

if (short_irq >= 0) {
    result = request_irq(short_irq, short_interrupt,
            SA_INTERRUPT, "short", NULL);
    if (result) {
        printk(KERN_INFO "short: can't get assigned irq %i\n",
                short_irq);
        short_irq = -1;
    }
    else { /* actually enable it -- assume this *is* a parallel port */
        outb(0x10,short_base+2);
    }
}
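
The per-device open count mentioned above can be sketched as a small user-space model. Everything here is hypothetical: struct irq_owner, model_open and model_release are illustration names, and the irq_held flag marks the points where a real driver would call request_irq and free_irq. A real driver would also need locking around the counter.

```c
#include <stdbool.h>

/* Hypothetical per-device state. */
struct irq_owner {
    int  open_count;  /* how many times the device is currently open */
    bool irq_held;    /* true between "request_irq" and "free_irq" */
};

/* First open claims the interrupt line; later opens only count. */
int model_open(struct irq_owner *d)
{
    if (d->open_count == 0)
        d->irq_held = true;   /* a driver would call request_irq here */
    d->open_count++;
    return 0;
}

/* Last close gives the interrupt line back. */
void model_release(struct irq_owner *d)
{
    if (d->open_count > 0 && --d->open_count == 0)
        d->irq_held = false;  /* a driver would call free_irq here */
}
```

With two opens followed by two closes, the line is held from the first open until the second close and is free again afterwards.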

Proc Interface

-- Whenever a hardware interrupt reaches the processor, an internal counter is incremented, providing a way to check whether the device is working as expected.
--Reported interrupts are shown in /proc/interrupts.
-- The /proc/interrupts display shows how many interrupts have been delivered to each CPU on the system.

root@montalcino:/bike/corbet/write/ldd3/src/short# more /proc/interrupts
           CPU0       CPU1
  0:    4848108         34    IO-APIC-edge  timer
  2:          0          0          XT-PIC  cascade
  8:          3          1    IO-APIC-edge  rtc
 10:       4335          1   IO-APIC-level  aic7xxx
 11:       8903          0   IO-APIC-level  uhci_hcd
 12:         49          1    IO-APIC-edge  i8042
NMI:          0          0
LOC:    4848187    4848186
ERR:          0
MIS:          0

Autodetecting the IRQ Number

-- One of the most challenging problems for a driver at initialization time can be how to determine which IRQ line is going to be used by the device.
The driver needs the information in order to correctly install the handler.

-- Some devices are more advanced in design and simply "announce" which interrupt they're going to use.
In this case, the driver retrieves the interrupt number by reading a status byte from one of the device's I/O ports or PCI configuration space.

-- When the target device is one that has the ability to tell the driver which interrupt it is going to use, autodetecting the IRQ number just means probing the device.

Probing IRQ

-- The Linux kernel offers a low-level facility for probing the interrupt number. It works only for nonshared interrupts.

Functions

unsigned long probe_irq_on(void);

This function returns a bit mask of unassigned interrupts. The driver must preserve the returned bit mask, and pass it to probe_irq_off later.
 After this call, the driver should arrange for its device to generate at least one interrupt.

int probe_irq_off(unsigned long);

After the device has requested an interrupt, the driver calls this function, passing as its argument the bit mask previously returned by probe_irq_on. probe_irq_off returns the number of the interrupt that was issued after probe_irq_on.
 If no interrupts occurred, 0 is returned (therefore, IRQ 0 can't be probed for, but no custom device can use it on any of the supported architectures anyway).
If more than one interrupt occurred (ambiguous detection), probe_irq_off returns a negative value.

-- The programmer should be careful to enable interrupts on the device after the call to probe_irq_on and to disable them before calling probe_irq_off.
Additionally, you must remember to service the pending interrupt in your device after probe_irq_off.
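
The three possible outcomes of probe_irq_off described above can be modelled in user space. The function below is not the kernel routine: model_probe_result is a hypothetical name, its argument is an assumed bit mask of the lines that fired during the probe, and returning minus the lowest line seen in the ambiguous case is a choice of this model (the text above only promises a negative value).

```c
#include <limits.h>

/* Model of the probe_irq_off result: 0 if no line fired, the line number
 * if exactly one fired, a negative value if several fired. */
int model_probe_result(unsigned long fired)
{
    int found = 0;
    int count = 0;
    int bits = (int)(sizeof fired * CHAR_BIT);

    /* Line 0 is skipped: a result of 0 is reserved for "nothing seen". */
    for (int irq = 1; irq < bits; irq++) {
        if (fired & (1UL << irq)) {
            if (count == 0)
                found = irq;
            count++;
        }
    }

    if (count == 0)
        return 0;
    if (count > 1)
        return -found;   /* ambiguous detection */
    return found;
}
```

A caller treats 0 and negative results as a failed probe and a positive result as the IRQ number to use.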

Probing example:


int count = 0;
do {
    unsigned long mask;

    mask = probe_irq_on();
    outb_p(0x10,short_base+2); /* enable reporting */
    outb_p(0x00,short_base);   /* clear the bit */
    outb_p(0xFF,short_base);   /* set the bit: interrupt! */
    outb_p(0x00,short_base+2); /* disable reporting */
    udelay(5);  /* give it some time */
    short_irq = probe_irq_off(mask);

    if (short_irq == 0) { /* none of them? */
        printk(KERN_INFO "short: no irq reported by probe\n");
        short_irq = -1;
    }
    /*
     * if more than one line has been activated, the result is
     * negative. We should service the interrupt (no need for lpt port)
     * and loop over again. Loop at most five times, then give up
     */
} while (short_irq < 0 && count++ < 5);
if (short_irq < 0)
    printk("short: probe failed %i times, giving up\n", count);

Note the use of udelay before calling probe_irq_off. Depending on the speed of your processor, you may have to wait for a brief period to give the interrupt time to actually be delivered.

Probing might be a lengthy task.
While this is not true for short, probing a frame grabber, for example, requires a delay of at least 20 ms (which is ages for the processor), and other devices might take even longer.
Therefore, it's best to probe for the interrupt line only once, at module initialization, independently of whether you install the handler at device open (as you should) or within the initialization function (which is not recommended).

Fast and Slow Handlers

-- Older kernels distinguished between fast and slow interrupts. Fast interrupts were those that could be handled very quickly, whereas handling slow interrupts took significantly longer.
Slow interrupts could be sufficiently demanding of the processor that it was worthwhile to reenable interrupts while they were being handled.

-- In modern kernels, most of the differences between fast and slow interrupts have disappeared.
There remains only one: fast interrupts (those requested with the SA_INTERRUPT flag) are executed with all other interrupts disabled on the current processor.
Note that other processors can still handle interrupts, although you will never see two processors handling the same IRQ at the same time.

-- On modern systems, SA_INTERRUPT is intended only for use in a few, specific situations such as timer interrupts.
Unless you have a strong reason to run your interrupt handler with other interrupts disabled, you should not use SA_INTERRUPT.

Implementation of interrupt handler

-- Actually, there's nothing unusual about a handler—it's ordinary C code.
The only peculiarity is that a handler runs at interrupt time and, therefore, suffers some restrictions on what it can do.

-- A handler can't transfer data to or from user space, because it doesn't execute in the context of a process.
Handlers also cannot do anything that would sleep, such as calling wait_event, allocating memory with anything other than GFP_ATOMIC, or locking a semaphore.
Finally, handlers cannot call schedule.

-- The role of an interrupt handler is to give feedback to its device about interrupt reception and to read or write data according to the meaning of the interrupt being serviced.
The first step usually consists of clearing a bit on the interface board; most hardware devices won't generate other interrupts until their "interrupt-pending" bit has been cleared.

-- A typical task for an interrupt handler is awakening processes sleeping on the device if the interrupt signals the event they're waiting for,
such as the arrival of new data.

-- The programmer should be careful to write a routine that executes in a minimum amount of time, independent of its being a fast or slow handler.
If a long computation needs to be performed, the best approach is to use a tasklet or workqueue to schedule computation at a safer time.

Handler Arguments and Return Value

-- Three arguments are passed to an interrupt handler: irq, dev_id, and regs.
-- The interrupt number (int irq) is useful as information you may print in your log messages.
-- You usually pass a pointer to your device data structure in dev_id, so a driver that manages several instances of the same device doesn't need any extra code in the interrupt handler to find out which device is in charge of the current interrupt event.

Example:
static irqreturn_t sample_interrupt(int irq, void *dev_id, struct pt_regs
                             *regs)
{
    struct sample_dev *dev = dev_id;

    /* now `dev' points to the right hardware item */
    /* .... */

    return IRQ_HANDLED;
}

-- The last argument, struct pt_regs *regs, is rarely used. It holds a snapshot of the processor's context before the processor entered interrupt code.
The registers can be used for monitoring and debugging.

-- Interrupt handlers should return a value indicating whether there was actually an interrupt to handle. If the handler found that its device did, indeed, need attention, it should return IRQ_HANDLED;
otherwise the return value should be IRQ_NONE.

-- The return value can also be generated with the macro IRQ_RETVAL(handled), where handled is nonzero if you were able to handle the interrupt. The return value is used by the kernel to detect and suppress spurious interrupts.
If your device gives you no way to tell whether it really interrupted, you should return IRQ_HANDLED.
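
The return-value rules above can be modeled in ordinary user-space C. This is only a sketch, not kernel code: struct sample_dev's fields, the SAMPLE_IRQ_PENDING bit, and the local definitions of irqreturn_t, IRQ_NONE, IRQ_HANDLED and IRQ_RETVAL are stand-ins invented for illustration (a real driver gets them from <linux/interrupt.h>).

```c
#include <stddef.h>
#include <stdint.h>

/* Stand-ins for the kernel's definitions; assumptions for this model only. */
typedef int irqreturn_t;
#define IRQ_NONE        0
#define IRQ_HANDLED     1
#define IRQ_RETVAL(x)   ((x) ? IRQ_HANDLED : IRQ_NONE)

struct pt_regs;                      /* opaque here; the model never reads it */

/* Hypothetical device: bit 0 of 'status' models the interrupt-pending bit. */
#define SAMPLE_IRQ_PENDING 0x01u

struct sample_dev {
    uint8_t status;                  /* models a hardware status register */
    unsigned long events;            /* interrupts serviced so far */
};

/* dev_id selects the device instance, so no lookup code is needed. */
irqreturn_t sample_interrupt(int irq, void *dev_id, struct pt_regs *regs)
{
    struct sample_dev *dev = dev_id;
    int handled = 0;

    (void)irq;                       /* only useful for log messages */
    (void)regs;                      /* rarely used */

    if (dev->status & SAMPLE_IRQ_PENDING) {
        /* Acknowledge the device first by clearing its pending bit. */
        dev->status &= (uint8_t)~SAMPLE_IRQ_PENDING;
        dev->events++;
        handled = 1;
    }
    return IRQ_RETVAL(handled);
}
```

Calling sample_interrupt on a device whose pending bit is set returns IRQ_HANDLED and clears the bit; calling it again, or on a device that did not interrupt, returns IRQ_NONE.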

Enable and disable interrupts

-- There are times when a device driver must block the delivery of interrupts for a (hopefully short) period of time.
Often, interrupts must be blocked while holding a spinlock to avoid deadlocking the system.

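-- Sometimes a driver needs to disable delivery of one specific interrupt line. The kernel offers three functions for this purpose, but they should be used sparingly: among other things, you cannot disable shared interrupt lines, and, on modern systems, shared interrupts are the norm.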

void disable_irq(int irq);
void disable_irq_nosync(int irq);
void enable_irq(int irq);

-- Calling any of these functions may update the mask for the specified irq in the programmable interrupt controller (PIC), thus disabling or enabling the specified IRQ across all processors.

-- disable_irq not only disables the given interrupt but also waits for a currently executing interrupt handler, if any, to complete. disable_irq_nosync differs in that it returns immediately.

Disabling all interrupts

-- It is possible to turn off all interrupt handling on the current processor with either of the following two functions.

void local_irq_save(unsigned long flags);
void local_irq_disable(void);

-- A call to local_irq_save disables interrupt delivery on the current processor after saving the current interrupt state into flags.
Note that flags is passed directly, not by pointer. local_irq_disable shuts off local interrupt delivery without saving the state.

-- Turning interrupts back on is accomplished with:

void local_irq_restore(unsigned long flags);
void local_irq_enable(void);

-- The first version restores that state which was stored into flags by local_irq_save, while local_irq_enable enables interrupts unconditionally.
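
The difference between the save/restore pair and the disable/enable pair matters when a function may be called with interrupts already off. The following user-space model is a sketch under stated assumptions: cpu_irqs_enabled and the model_* macros are invented stand-ins for the processor's interrupt flag and for the kernel macros, and nothing here touches real interrupts. The macro form also shows why flags is passed directly rather than by pointer.

```c
#include <stdbool.h>

/* Models the current CPU's interrupt-enable flag; not a real kernel variable. */
bool cpu_irqs_enabled = true;

/* Stand-ins for local_irq_save / local_irq_restore / disable / enable. */
#define model_local_irq_save(flags) \
    do { (flags) = cpu_irqs_enabled; cpu_irqs_enabled = false; } while (0)
#define model_local_irq_restore(flags) \
    do { cpu_irqs_enabled = ((flags) != 0); } while (0)
#define model_local_irq_disable()  do { cpu_irqs_enabled = false; } while (0)
#define model_local_irq_enable()   do { cpu_irqs_enabled = true; } while (0)

/* Safe in any context: puts back whatever state the caller had. */
void helper_with_save(void)
{
    unsigned long flags;

    model_local_irq_save(flags);
    /* ... critical section ... */
    model_local_irq_restore(flags);
}

/* Only safe if the caller is known to have interrupts enabled. */
void helper_with_enable(void)
{
    model_local_irq_disable();
    /* ... critical section ... */
    model_local_irq_enable();      /* turns interrupts on unconditionally */
}

/* Runs a helper with interrupts already off and reports the state it leaves. */
bool caller_state_after(void (*helper)(void))
{
    bool state;

    model_local_irq_disable();     /* the caller already has interrupts off */
    helper();
    state = cpu_irqs_enabled;
    model_local_irq_enable();      /* reset the model for the next run */
    return state;
}
```

In this model caller_state_after(helper_with_save) reports that interrupts are still off, while caller_state_after(helper_with_enable) reports that the helper switched them back on behind the caller's back.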

Top and bottom halves

-- One of the main problems with interrupt handling is how to perform lengthy tasks within a handler.
Often a substantial amount of work must be done in response to a device interrupt, but interrupt handlers need to finish up quickly and not keep interrupts blocked for long.

-- Linux (along with many other systems) resolves this problem by splitting the interrupt handler into two halves.
The so-called top half is the routine that actually responds to the interrupt—the one you register with request_irq.

-- The bottom half is a routine that is scheduled by the top half to be executed later, at a safer time.

-- The big difference between the top-half handler and the bottom half is that all interrupts are enabled during execution of the bottom half—that's why it runs at a safer time.

-- In the typical scenario, the top half saves device data to a device-specific buffer, schedules its bottom half, and exits: this operation is very fast.

-- The bottom half then performs whatever other work is required, such as awakening processes, starting up another I/O operation, and so on.
This setup permits the top half to service a new interrupt while the bottom half is still working.

-- As an example, when a network interface reports the arrival of a new packet, the handler just retrieves the data and pushes it up to the protocol layer;
actual processing of the packet is performed in a bottom half.

Tasklets

-- Tasklets are special functions that may be scheduled to run, in software interrupt context, at a system-determined safe time.
-- No tasklet ever runs in parallel with itself, since it runs only once, but tasklets can run in parallel with other tasklets on SMP systems.
-- Tasklets are also guaranteed to run on the same CPU as the function that first schedules them.
Therefore, an interrupt handler can be sure that a tasklet does not begin executing before the handler has completed.
However, another interrupt can certainly be delivered while the tasklet is running, so locking between the tasklet and the interrupt handler may still be required.

Tasklets must be declared with the DECLARE_TASKLET macro:
DECLARE_TASKLET(name, function, data);

name is the name to be given to the tasklet, function is the function that is called to execute the tasklet (it takes one unsigned long argument and returns void), and data is an unsigned long value to be passed to the tasklet function.

-- The function tasklet_schedule is used to schedule a tasklet for running.

Sample interrupt routine

irqreturn_t short_tl_interrupt(int irq, void *dev_id, struct pt_regs *regs)
{
    do_gettimeofday((struct timeval *) tv_head); /* cast to stop 'volatile' warning */
    short_incr_tv(&tv_head);
    tasklet_schedule(&short_tasklet);
    short_wq_count++; /* record that an interrupt arrived */
    return IRQ_HANDLED;
}

The actual tasklet routine, short_do_tasklet, will be executed shortly (so to speak) at the system's convenience. As mentioned earlier, this routine performs the bulk of the work of handling the interrupt; it looks like this:

void short_do_tasklet(unsigned long);
DECLARE_TASKLET(short_tasklet, short_do_tasklet, 0);

void short_do_tasklet (unsigned long unused)
{
    int savecount = short_wq_count, written;
    short_wq_count = 0; /* we have already been removed from the queue */
    /*
     * The bottom half reads the tv array, filled by the top half,
     * and prints it to the circular text buffer, which is then consumed
     * by reading processes
     */

    /* First write the number of interrupts that occurred before this bh */
    written = sprintf((char *)short_head,"bh after %6i\n",savecount);
    short_incr_bp(&short_head, written);

    /*
     * Then, write the time values. Write exactly 16 bytes at a time,
     * so it aligns with PAGE_SIZE
     */

    do {
        written = sprintf((char *)short_head,"%08u.%06u\n",
                (unsigned int)(tv_tail->tv_sec % 100000000),
                (unsigned int)(tv_tail->tv_usec));
        short_incr_bp(&short_head, written);
        short_incr_tv(&tv_tail);
    } while (tv_tail != tv_head);

    wake_up_interruptible(&short_queue); /* awake any reading process */
}
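
The code above calls short_incr_tv without showing it. The sketch below is a user-space guess at how such a helper could advance a pointer through a circular array, together with a toy top half and bottom half that use it. The names sample_tv, tv_data, NR_TIMEVAL and the sample_* functions are assumptions made for illustration and are not taken from these notes.

```c
#include <stddef.h>

#define NR_TIMEVAL 4                   /* hypothetical capacity, tiny for the demo */

struct sample_tv {                     /* stand-in for struct timeval */
    long tv_sec;
    long tv_usec;
};

struct sample_tv tv_data[NR_TIMEVAL];  /* the circular array */
struct sample_tv *tv_head = tv_data;   /* next slot the top half fills */
struct sample_tv *tv_tail = tv_data;   /* next slot the bottom half reads */

/* Advance a pointer by one slot, wrapping at the end of the array. */
void sample_incr_tv(struct sample_tv **tvp)
{
    if (*tvp == tv_data + NR_TIMEVAL - 1)
        *tvp = tv_data;
    else
        (*tvp)++;
}

/* Top half: record one timestamp and advance the head. */
void sample_top_half(long sec, long usec)
{
    tv_head->tv_sec = sec;
    tv_head->tv_usec = usec;
    sample_incr_tv(&tv_head);
}

/* Bottom half: consume every entry between tail and head.
 * Adds the seconds values to *sum_sec and returns the entry count. */
int sample_bottom_half(long *sum_sec)
{
    int n = 0;

    while (tv_tail != tv_head) {
        *sum_sec += tv_tail->tv_sec;
        sample_incr_tv(&tv_tail);
        n++;
    }
    return n;
}
```

The model uses a while loop where the tasklet above uses do/while, because the model may be drained when empty. It shares one limitation with any head/tail scheme of this kind: if NR_TIMEVAL entries arrive before the bottom half runs, head catches up with tail and the buffer looks empty.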

Workqueue

-- Workqueues invoke a function at some future time in the context of a special worker process.
-- Since the workqueue function runs in process context, it can sleep if need be. You cannot, however, copy data into user space from a workqueue, because the worker process does not have access to any other process's address space.
-- We do need a work_struct structure, which is declared and initialized with the following:

static struct work_struct short_wq;

    /* this line is in short_init(  ) */
    INIT_WORK(&short_wq, (void (*)(void *)) short_do_tasklet, NULL);

Interrupt handler that schedules its bottom half through a workqueue

irqreturn_t short_wq_interrupt(int irq, void *dev_id, struct pt_regs *regs)
{
    /* Grab the current time information. */
    do_gettimeofday((struct timeval *) tv_head);
    short_incr_tv(&tv_head);

    /* Queue the bh. Don't worry about multiple enqueueing */
    schedule_work(&short_wq);

    short_wq_count++; /* record that an interrupt arrived */
    return IRQ_HANDLED;
}
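
The comment "Don't worry about multiple enqueueing" and the short_wq_count counter go together: a work item that is already queued is not queued a second time, so one bottom-half run may have to account for several interrupts. The following user-space model sketches that behaviour. struct model_work, its pending flag and all model_* names are assumptions for illustration; the real kernel keeps this state inside work_struct.

```c
#include <stdbool.h>

struct model_work {
    bool pending;                  /* set while queued, cleared before running */
    void (*func)(void);
};

int irq_count;                     /* plays the role of short_wq_count */
int last_batch;                    /* interrupts covered by the latest run */
int runs;                          /* how many times the bottom half ran */

/* Bottom half: consume the counter, as short_do_tasklet does. */
void model_bottom_half(void)
{
    last_batch = irq_count;
    irq_count = 0;
    runs++;
}

struct model_work model_wq = { false, model_bottom_half };

/* Returns true if the work was newly queued, false if already pending. */
bool model_schedule_work(struct model_work *w)
{
    if (w->pending)
        return false;
    w->pending = true;
    return true;
}

/* Top half: queue the bottom half and count the interrupt. */
void model_interrupt(void)
{
    model_schedule_work(&model_wq);
    irq_count++;
}

/* What the worker does later: clear pending, then call the function. */
void model_run_pending(struct model_work *w)
{
    if (!w->pending)
        return;
    w->pending = false;
    w->func();
}
```

Three calls to model_interrupt followed by one model_run_pending produce a single bottom-half run that sees a count of three, which is the situation the "bh after" line in the sample output reports.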

Shared interrupts

The Linux kernel supports interrupt sharing on all buses, even those (such as the ISA bus) where sharing has traditionally not been supported. Device drivers for the 2.6 kernel should be written to work with shared interrupts if the target hardware can support that mode of operation.

Installing a Shared Handler

Shared interrupts are installed through request_irq just like nonshared ones, but there are two differences:

-- The SA_SHIRQ bit (named IRQF_SHARED in later kernels) must be specified in the flags argument when requesting the interrupt.

-- The dev_id argument must be unique. Any pointer into the module's address space will do, but dev_id definitely cannot be set to NULL.

request_irq succeeds if one of the following is true:

1-The interrupt line is free.

2-All handlers already registered for that line have also specified that the IRQ is to be shared.

Whenever two or more drivers are sharing an interrupt line and the hardware interrupts the processor on that line, the kernel invokes every handler registered for that interrupt, passing each its own dev_id. Therefore, a shared handler must be able to recognize its own interrupts and should quickly exit when its own device has not interrupted. Be sure to return IRQ_NONE whenever your handler is called and finds that the device is not interrupting.

No probing function is available for shared handlers. The standard probing mechanism works if the line being used is free, but if the line is already held by another driver with sharing capabilities, the probe fails, even if your driver would have worked perfectly.

Releasing the handler is performed in the normal way, using free_irq. Here the dev_id argument is used to select the correct handler to release from the list of shared handlers for the interrupt. That's why the dev_id pointer must be unique.

A driver using a shared handler needs to be careful about one more thing: it can't play with enable_irq or disable_irq. If it does, things might go haywire for other devices sharing the line; disabling another device's interrupts for even a short time may create latencies that are problematic for that device and its user.

Running the handler

When the kernel receives an interrupt, all the registered handlers are invoked. A shared handler must be able to distinguish between interrupts that it needs to handle and interrupts generated by other devices.
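
The rules for installing, running, and releasing shared handlers can be tied together in a small user-space model. It is a sketch under loud assumptions: the actions table, MAX_ACTIONS, struct model_dev and every model_* function are invented here, and the kernel's real bookkeeping is more involved. The shared argument stands in for the SA_SHIRQ flag.

```c
#include <stddef.h>

typedef int irqreturn_t;             /* stand-ins for the kernel definitions */
#define IRQ_NONE    0
#define IRQ_HANDLED 1

typedef irqreturn_t (*model_handler_t)(int irq, void *dev_id);

#define MAX_ACTIONS 4                /* arbitrary limit for the model */

struct model_action {
    model_handler_t handler;
    void *dev_id;
    int shared;                      /* models the SA_SHIRQ flag */
};

struct model_action actions[MAX_ACTIONS];   /* handlers on one IRQ line */
int nr_actions;

/* Returns 0 on success, -1 if the line is busy or the request is invalid. */
int model_request_irq(model_handler_t h, void *dev_id, int shared)
{
    int i;

    if (shared && dev_id == NULL)
        return -1;                   /* shared handlers need a non-NULL dev_id */
    if (nr_actions == MAX_ACTIONS)
        return -1;
    for (i = 0; i < nr_actions; i++)
        if (!shared || !actions[i].shared)
            return -1;               /* someone wants the line exclusively */
    actions[nr_actions].handler = h;
    actions[nr_actions].dev_id = dev_id;
    actions[nr_actions].shared = shared;
    nr_actions++;
    return 0;
}

/* Removes the entry whose dev_id matches; hence dev_id must be unique. */
void model_free_irq(void *dev_id)
{
    int i;

    for (i = 0; i < nr_actions; i++) {
        if (actions[i].dev_id == dev_id) {
            actions[i] = actions[nr_actions - 1];
            nr_actions--;
            return;
        }
    }
}

/* Calls every registered handler; returns how many claimed the interrupt. */
int model_dispatch(int irq)
{
    int i, handled = 0;

    for (i = 0; i < nr_actions; i++)
        if (actions[i].handler(irq, actions[i].dev_id) == IRQ_HANDLED)
            handled++;
    return handled;
}

/* Hypothetical device: 'pending' models its interrupt-pending bit. */
struct model_dev { int pending; int serviced; };

irqreturn_t model_dev_interrupt(int irq, void *dev_id)
{
    struct model_dev *dev = dev_id;

    (void)irq;
    if (!dev->pending)
        return IRQ_NONE;             /* not ours: exit quickly */
    dev->pending = 0;
    dev->serviced++;
    return IRQ_HANDLED;
}
```

With two devices registered as shared, a dispatch while only the second one is pending calls both handlers, but only the second returns IRQ_HANDLED. A later non-shared request for the same line fails, and model_free_irq picks the entry to remove purely by its dev_id.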


The /proc Interface and Shared Interrupts

Installing shared handlers in the system doesn't affect /proc/stat, which doesn't even know about handlers. However, /proc/interrupts changes slightly.

All the handlers installed for the same interrupt number appear on the same line of /proc/interrupts. The following output (from an x86_64 system) shows how shared interrupt handlers are displayed:

           CPU0
  0:  892335412         XT-PIC  timer
  1:     453971         XT-PIC  i8042
  2:          0         XT-PIC  cascade
  5:          0         XT-PIC  libata, ehci_hcd
  8:          0         XT-PIC  rtc
  9:          0         XT-PIC  acpi
 10:   11365067         XT-PIC  ide2, uhci_hcd, uhci_hcd, SysKonnect SK-98xx, EMU10K1
 11:    4391962         XT-PIC  uhci_hcd, uhci_hcd
 12:        224         XT-PIC  i8042
 14:    2787721         XT-PIC  ide0
 15:     203048         XT-PIC  ide1
NMI:      41234
LOC:  892193503
ERR:        102
MIS:          0

This system has several shared interrupt lines. IRQ 5 is used for the serial ATA (libata) and USB 2.0 (ehci_hcd) controllers; IRQ 10 has several devices, including an IDE controller, two USB controllers, an Ethernet interface, and a sound card; and IRQ 11 also is used by two USB controllers.