16.1. Registration
Block drivers, like char drivers, must use a set of registration interfaces to make their devices available to the kernel. The concepts are similar, but the details of block device registration are all different. You have a whole new set of data structures and device operations to learn.
16.1.1. Block Driver Registration
The first step taken by most block drivers is to register themselves with the kernel. The function for this task is register_blkdev (which is declared in):
int register_blkdev(unsigned int major, const char *name);
The arguments are the major number that your device will be using and the associated name (which the kernel will display in /proc/devices). If major is passed as 0, the kernel allocates a new major number and returns it to the caller.
As always, a negative return value from register_blkdev indicates that an error has occurred.
The corresponding function for canceling a block driver registration is:
int unregister_blkdev(unsigned int major, const char *name);
Here, the arguments must match those passed to register_blkdev, or the function returns -EINVAL and does not unregister anything.
In the 2.6 kernel, the call to register_blkdev is entirely optional. The functions performed by register_blkdev have been decreasing over time; the only tasks performed by this call at this point are (1) allocating a dynamic major number if requested, and (2) creating an entry in /proc/devices. In future kernels, register_blkdev may be removed altogether. Meanwhile, however, most drivers still call it; it's traditional.
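The calling convention described above can be sketched as a small user-space model. This is not kernel code: the toy_* names, the table size, the downward search for a free slot, and the choice to return 0 when a specific major is registered successfully are all assumptions made for illustration.

```c
#include <assert.h>
#include <errno.h>
#include <string.h>

#define TOY_MAX_MAJOR 256            /* hypothetical table size */

/* One slot per major number; an empty name marks a free slot. */
static char toy_names[TOY_MAX_MAJOR][16];

/* Model of register_blkdev: major == 0 requests a dynamic number. */
int toy_register_blkdev(unsigned int major, const char *name)
{
    assert(name != NULL);
    if (major >= TOY_MAX_MAJOR)
        return -EINVAL;
    if (major == 0) {
        /* Search downward for a free slot (assumed policy). */
        for (major = TOY_MAX_MAJOR - 1; major > 0; major--)
            if (toy_names[major][0] == '\0')
                break;
        if (major == 0)
            return -EBUSY;           /* every slot is taken */
        strncpy(toy_names[major], name, sizeof(toy_names[major]) - 1);
        return (int)major;           /* dynamic: return the new number */
    }
    if (toy_names[major][0] != '\0')
        return -EBUSY;               /* requested major already in use */
    strncpy(toy_names[major], name, sizeof(toy_names[major]) - 1);
    return 0;                        /* static: assumed to return 0 */
}

/* Model of unregister_blkdev: both arguments must match the registration. */
int toy_unregister_blkdev(unsigned int major, const char *name)
{
    assert(name != NULL);
    if (major == 0 || major >= TOY_MAX_MAJOR ||
        toy_names[major][0] == '\0' ||
        strcmp(toy_names[major], name) != 0)
        return -EINVAL;              /* mismatch: nothing is unregistered */
    toy_names[major][0] = '\0';
    return 0;
}
```

In this model, a caller that passes 0 must save the returned value as its major number, while any negative result signals failure in either mode.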
16.1.2. Disk Registration
While register_blkdev can be used to obtain a major number, it does not make any disk drives available to the system. There is a separate registration interface that you must use to manage individual drives. Using this interface requires familiarity with a pair of new structures, so that is where we start.
16.1.2.1 Block device operations
Char devices make their operations available to the system by way of the file_operations structure. A similar structure is used with block devices; it is struct block_device_operations, which is declared in. The following is a brief overview of the fields found in this structure; we revisit them in more detail when we get into the details of the sbull driver:
int (*open)(struct inode *inode, struct file *filp);
int (*release)(struct inode *inode, struct file *filp);
Functions that work just like their char driver equivalents; they are called whenever the device is opened and closed. A block driver might respond to an open call by spinning up the device, locking the door (for removable media), etc. If you lock media into the device, you should certainly unlock it in the release method.
int (*ioctl)(struct inode *inode, struct file *filp, unsigned int cmd,
unsigned long arg);
Method that implements the ioctl system call. The block layer first intercepts a large number of standard requests, however; so most block driver ioctl methods are fairly short.
int (*media_changed) (struct gendisk *gd);
Method called by the kernel to check whether the user has changed the media in the drive, returning a nonzero value if so. Obviously, this method is only applicable to drives that support removable media (and that are smart enough to make a "media changed" flag available to the driver); it can be omitted in other cases.
The struct gendisk argument is how the kernel represents a single disk; we will be looking at that structure in the next section.
int (*revalidate_disk) (struct gendisk *gd);
The revalidate_disk method is called in response to a media change; it gives the driver a chance to perform whatever work is required to make the new media ready for use. The function returns an int value, but that value is ignored by the kernel.
struct module *owner;
A pointer to the module that owns this structure; it should usually be initialized to THIS_MODULE.
Attentive readers may have noticed an interesting omission from this list: there are no functions that actually read or write data. In the block I/O subsystem, these operations are handled by the request function, which deserves a large section of its own and is discussed later in the chapter. Before we can talk about servicing requests, we must complete our discussion of disk registration.
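To make the shape of such an operations table concrete, here is a user-space sketch, not kernel code. The toy_disk and toy_block_ops types are hypothetical stand-ins for the kernel structures; the sketch only shows open and release pairing a door lock with an unlock, and an optional method being left unset.

```c
#include <assert.h>

/* Hypothetical stand-in for a drive with removable media. */
struct toy_disk {
    int users;        /* number of current openers */
    int door_locked;  /* nonzero while media is locked in */
};

/* Hypothetical stand-in for struct block_device_operations. */
struct toy_block_ops {
    int (*open)(struct toy_disk *d);
    int (*release)(struct toy_disk *d);
    int (*media_changed)(struct toy_disk *d);  /* optional, may be NULL */
};

static int toy_open(struct toy_disk *d)
{
    if (d->users++ == 0)
        d->door_locked = 1;   /* first opener locks the media in */
    return 0;
}

static int toy_release(struct toy_disk *d)
{
    assert(d->users > 0);
    if (--d->users == 0)
        d->door_locked = 0;   /* last closer unlocks it again */
    return 0;
}

static const struct toy_block_ops toy_ops = {
    .open    = toy_open,
    .release = toy_release,
    /* .media_changed is omitted: this model has fixed media */
};

/* Callers must tolerate a missing optional method. */
int toy_check_media(const struct toy_block_ops *ops, struct toy_disk *d)
{
    return ops->media_changed ? ops->media_changed(d) : 0;
}
```

Note that nothing in the table reads or writes data, which mirrors the omission pointed out above.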
16.1.2.2 The gendisk structure
struct gendisk (declared in) is the kernel's representation of an individual disk device. In fact, the kernel also uses gendisk structures to represent partitions, but driver authors need not be aware of that. There are several fields in struct gendisk that must be initialized by a block driver:
int major;
int first_minor;
int minors;
Fields that describe the device number(s) used by the disk. A drive must use at least one minor number. If your drive is to be partitionable, however (and most should be), you want to allocate one minor number for each possible partition as well. A common value for minors is 16, which allows for the "full disk" device and 15 partitions. Some disk drivers use 64 minor numbers for each device.
char disk_name[32];
Field that should be set to the name of the disk device. It shows up in /proc/partitions and sysfs.
struct block_device_operations *fops;
Set of device operations from the previous section.
struct request_queue *queue;
Structure used by the kernel to manage I/O requests for this device; we examine it in Section 16.3.
int flags;
A (little-used) set of flags describing the state of the drive. If your device has removable media, you should set GENHD_FL_REMOVABLE. CD-ROM drives can set GENHD_FL_CD. If, for some reason, you do not want partition information to show up in /proc/partitions, set GENHD_FL_SUPPRESS_PARTITION_INFO.
sector_t capacity;
The capacity of this drive, in 512-byte sectors. The sector_t type can be 64 bits wide. Drivers should not set this field directly; instead, pass the number of sectors to set_capacity.
void *private_data;
Block drivers may use this field for a pointer to their own internal data.
The kernel provides a small set of functions for working with gendisk structures. We introduce them here, then see how sbull uses them to make its disk devices available to the system.
struct gendisk is a dynamically allocated structure that requires special kernel manipulation to be initialized; drivers cannot allocate the structure on their own. Instead, you must call:
struct gendisk *alloc_disk(int minors);
The minors argument should be the number of minor numbers this disk uses; note that you cannot change the minors field later and expect things to work properly.
When a disk is no longer needed, it should be freed with:
void del_gendisk(struct gendisk *gd);
A gendisk is a reference-counted structure (it contains a kobject). There are get_disk and put_disk functions available to manipulate the reference count, but drivers should never need to do that. Normally, the call to del_gendisk removes the final reference to a gendisk, but there are no guarantees of that. Thus, it is possible that the structure could continue to exist (and your methods could be called) after a call to del_gendisk. If you delete the structure when there are no users (that is, after the final release or in your module cleanup function), however, you can be sure that you will not hear from it again.
Allocating a gendisk structure does not make the disk available to the system. To do that, you must initialize the structure and call add_disk:
void add_disk(struct gendisk *gd);
Keep one important thing in mind here: as soon as you call add_disk, the disk is "live" and its methods can be called at any time. In fact, the first such calls will probably happen even before add_disk returns; the kernel will read the first few blocks in an attempt to find a partition table. So you should not call add_disk until your driver is completely initialized and ready to respond to requests on that disk.
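The ordering constraint can be illustrated with a user-space sketch in which the "add" step immediately issues a read, imitating the partition-table probe. All toy_* names are hypothetical and the model is deliberately simplified; it only shows why every field must be set before the disk goes live.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for a gendisk with a request handler. */
struct toy_gendisk {
    void (*request)(struct toy_gendisk *gd, unsigned long sector);
    void *private_data;
    int live;
};

/* Hypothetical driver state. */
struct toy_dev {
    unsigned long reads;   /* number of requests served so far */
};

static void toy_request(struct toy_gendisk *gd, unsigned long sector)
{
    struct toy_dev *dev = gd->private_data;

    assert(dev != NULL);   /* fails if the disk went live too early */
    (void)sector;          /* the model does not move any data */
    dev->reads++;
}

/* Model of add_disk: the first request arrives before it returns. */
void toy_add_disk(struct toy_gendisk *gd)
{
    gd->live = 1;
    gd->request(gd, 0);    /* imitates the partition-table read */
}

/* Initialize everything first, then make the disk live as the last step. */
void toy_setup(struct toy_gendisk *gd, struct toy_dev *dev)
{
    dev->reads = 0;
    gd->request = toy_request;
    gd->private_data = dev;
    toy_add_disk(gd);
}
```

If toy_add_disk were called before private_data was set, the handler's assertion would fail on the very first request, which is the user-space analogue of a driver being called before it is ready.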
16.1.3. Initialization in sbull
It is time to get down to some examples. The sbull driver (available from O'Reilly's FTP site with the rest of the example source) implements a set of in-memory virtual disk drives. For each drive, sbull allocates (with vmalloc, for simplicity) an array of memory; it then makes that array available via block operations. The sbull driver can be tested by partitioning the virtual device, building filesystems on it, and mounting it in the system hierarchy.
Like our other example drivers, sbull allows a major number to be specified at compile or module load time. If no number is specified, one is allocated dynamically. Since a call to register_blkdev is required for dynamic allocation, sbull does so:
sbull_major = register_blkdev(sbull_major, "sbull");
if (sbull_major <= 0) {
    printk(KERN_WARNING "sbull: unable to get major number\n");
    return -EBUSY;
}
Also, like the other virtual devices we have presented in this book, the sbull device is described by an internal structure:
struct sbull_dev {
    int size;                    /* Device size in bytes */
    u8 *data;                    /* The data array */
    short users;                 /* How many users */
    short media_change;          /* Flag a media change? */
    spinlock_t lock;             /* For mutual exclusion */
    struct request_queue *queue; /* The device request queue */
    struct gendisk *gd;          /* The gendisk structure */
    struct timer_list timer;     /* For simulated media changes */
};
Several steps are required to initialize this structure and make the associated device available to the system. We start with basic initialization and allocation of the underlying memory:
memset(dev, 0, sizeof(struct sbull_dev));
dev->size = nsectors*hardsect_size;
dev->data = vmalloc(dev->size);
if (dev->data == NULL) {
    printk(KERN_NOTICE "vmalloc failure.\n");
    return;
}
spin_lock_init(&dev->lock);
It's important to allocate and initialize a spinlock before the next step, which is the allocation of the request queue. We look at this process in more detail when we get to request processing; for now, suffice it to say that the necessary call is:
dev->queue = blk_init_queue(sbull_request, &dev->lock);
Here, sbull_request is our request function, the function that actually performs block read and write requests. When we allocate a request queue, we must provide a spinlock that controls access to that queue. The lock is provided by the driver rather than the general parts of the kernel because, often, the request queue and other driver data structures fall within the same critical section; they tend to be accessed together. As with any function that allocates memory, blk_init_queue can fail, so you must check the return value before continuing.
Once we have our device memory and request queue in place, we can allocate, initialize, and install the corresponding gendisk structure. The code that does this work is:
dev->gd = alloc_disk(SBULL_MINORS);
if (!dev->gd) {
    printk(KERN_NOTICE "alloc_disk failure\n");
    goto out_vfree;
}
dev->gd->major = sbull_major;
dev->gd->first_minor = which*SBULL_MINORS;
dev->gd->fops = &sbull_ops;
dev->gd->queue = dev->queue;
dev->gd->private_data = dev;
snprintf(dev->gd->disk_name, 32, "sbull%c", which + 'a');
set_capacity(dev->gd, nsectors*(hardsect_size/KERNEL_SECTOR_SIZE));
add_disk(dev->gd);
Here, SBULL_MINORS is the number of minor numbers each sbull device supports. When we set the first minor number for each device, we must take into account all of the numbers taken by prior devices. The name of the disk is set such that the first one is sbulla, the second sbullb, and so on. User space can then add partition numbers so that the third partition on the second device might be /dev/sbullb3.
Once everything is set up, we finish with a call to add_disk. Chances are that several of our methods will have been called for that disk by the time add_disk returns, so we take care to make that call the very last step in the initialization of our device.
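The minor-number and naming arithmetic can be checked in isolation with a short user-space sketch. The value 16 for SBULL_MINORS is an assumption here (the text above does not state it), and the helper names are hypothetical.

```c
#include <assert.h>
#include <stdio.h>

#define SBULL_MINORS 16   /* assumed value: whole disk plus 15 partitions */

/* First minor of device number `which` (0 for sbulla, 1 for sbullb, ...). */
int sbull_first_minor(int which)
{
    return which * SBULL_MINORS;
}

/* Minor number of partition `part`; part 0 is the whole disk. */
int sbull_partition_minor(int which, int part)
{
    assert(part >= 0 && part < SBULL_MINORS);
    return sbull_first_minor(which) + part;
}

/* Build the node name that user space would see under /dev. */
void sbull_node_name(char *buf, size_t len, int which, int part)
{
    if (part == 0)
        snprintf(buf, len, "sbull%c", which + 'a');
    else
        snprintf(buf, len, "sbull%c%d", which + 'a', part);
}
```

Under that assumption, the third partition on the second device has which = 1 and part = 3, giving the name sbullb3 and minor number 19.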
16.1.4. A Note on Sector Sizes
As we have mentioned before, the kernel treats every disk as a linear array of 512-byte sectors. Not all hardware uses that sector size, however. Getting a device with a different sector size to work is not particularly hard; it is just a matter of taking care of a few details. The sbull device exports a hardsect_size parameter that can be used to change the "hardware" sector size of the device; by looking at its implementation, you can see how to add this sort of support to your own drivers.
The first of those details is to inform the kernel of the sector size your device supports. The hardware sector size is a parameter in the request queue, rather than in the gendisk structure. This size is set with a call to blk_queue_hardsect_size immediately after the queue is allocated:
blk_queue_hardsect_size(dev->queue, hardsect_size);
Once that is done, the kernel adheres to your device's hardware sector size. All I/O requests are properly aligned at the beginning of a hardware sector, and the length of each request is an integral number of sectors. You must remember, however, that the kernel always expresses itself in 512-byte sectors; thus, it is necessary to translate all sector numbers accordingly. So, for example, when sbull sets the capacity of the device in its gendisk structure, the call looks like:
set_capacity(dev->gd, nsectors*(hardsect_size/KERNEL_SECTOR_SIZE));
KERNEL_SECTOR_SIZE is a locally-defined constant that we use to scale between the kernel's 512-byte sectors and whatever size we have been told to use. This sort of calculation pops up frequently as we look at the sbull request processing logic.
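The scaling is easy to verify outside the kernel. The following user-space sketch uses hypothetical helper names and assumes the hardware sector size is a whole multiple of 512 bytes.

```c
#include <assert.h>

#define KERNEL_SECTOR_SIZE 512   /* the kernel's fixed sector unit */

/* Convert a count of hardware sectors into 512-byte kernel sectors. */
unsigned long long hw_to_kernel_sectors(unsigned long long nsectors,
                                        unsigned int hardsect_size)
{
    /* Assumption: the hardware size is a multiple of the kernel size. */
    assert(hardsect_size >= KERNEL_SECTOR_SIZE);
    assert(hardsect_size % KERNEL_SECTOR_SIZE == 0);
    return nsectors * (hardsect_size / KERNEL_SECTOR_SIZE);
}

/* Convert a kernel sector number into a byte offset in the data array. */
unsigned long long kernel_sector_to_offset(unsigned long long sector)
{
    return sector * KERNEL_SECTOR_SIZE;
}
```

For example, a device with 1024 hardware sectors of 2048 bytes each reports a capacity of 4096 kernel sectors, while the same device with 512-byte sectors reports 1024.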