aboutsummaryrefslogtreecommitdiffstats
Commit message (Collapse)AuthorAgeFilesLines
...
| * | | | | | | | | cfq-iosched: Add documentation about idlingVivek Goyal2011-08-051-0/+71
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There are always questions about why CFQ is idling on various conditions. Recent ones is Christoph asking again why to idle on REQ_NOIDLE. His assertion is that XFS is relying more and more on workqueues and is concerned that CFQ idling on IO from every workqueue will impact XFS badly. So he suggested that I add some more documentation about CFQ idling and that can provide more clarity on the topic and also gives an opprotunity to poke a hole in theory and lead to improvements. So here is my attempt at that. Any comments are welcome. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * | | | | | | | | block: Make rq_affinity = 1 work as expectedTao Ma2011-08-051-5/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Commit 5757a6d76c introduced a new rq_affinity = 2 so as to make the request completed in the __make_request cpu. But it makes the old rq_affinity = 1 not work any more. The root cause is that if the 'cpu' and 'req->cpu' is in the same group and cpu != req->cpu, ccpu will be the same as group_cpu, so the completion will be excuted in the 'cpu' not 'group_cpu'. This patch fix problem by simpling removing group_cpu and the codes are more explicit now. If ccpu == cpu, we complete in cpu, otherwise we raise_blk_irq to ccpu. Cc: Christoph Hellwig <hch@infradead.org> Cc: Roland Dreier <roland@purestorage.com> Cc: Dan Williams <dan.j.williams@intel.com> Cc: Jens Axboe <jaxboe@fusionio.com> Signed-off-by: Tao Ma <boyu.mt@taobao.com> Reviewed-by: Shaohua Li <shaohua.li@intel.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * | | | | | | | | block: swim3: fix unterminated of_device_id tableAxel Lin2011-08-031-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | of_device_id structures need a NULL terminating entry, add it. Signed-off-by: Axel Lin <axel.lin@gmail.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * | | | | | | | | block/genhd.c: remove useless cast in diskstats_show()Herbert Poetzl2011-08-021-4/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Remove the (unsigned long long) cast in diskstats_show() and adjusts the seq_printf() format string to 'unsigned long' diskstats_show() uses part_stat_read() to get the stats, which either accesses the specified field in the struct disk_stats directly (non SMP) or sums up the per CPU values in a variable of the same type as the field, so in any case the result will have the same type and range as the specified field which for all disk_stats entries is unsigned long Also, for unsigned long ranges the output of %lu should be identical to the one of %llu, so no change in the actual proc entry contents. Signed-off-by: Herbert Poetzl <herbert@13thfloor.at> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * | | | | | | | | drivers/cdrom/cdrom.c: relax check on dvd manufacturer valueAndrew Morton2011-08-021-1/+7
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The report has an ISO which has a very long manufacturer ID. It seems that Linux is wrong, not the ISO maker. Relax the check for the length of this field: emit a warning and truncate the incoming data to 2048 bytes rather than rejecting the entire thing. dvd_manufact.value isn't null-terminated. I'm not even sure if it's a string. The kernel doesn't apepar to use it anyway. Addresses https://bugzilla.kernel.org/show_bug.cgi?id=39062 Reported-by: <ale.goujon@gmail.com> Tested-by: <ale.goujon@gmail.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * | | | | | | | | drivers/block/drbd/drbd_nl.c: use bitmap_parse instead of __bitmap_parseH Hartley Sweeten2011-08-021-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The buffer 'sc.cpu_mask' is a kernel buffer. If bitmap_parse is used instead of __bitmap_parse the extra parameter that indicates a kernel buffer is not needed. Signed-off-by: H Hartley Sweeten <hsweeten@visionengravers.com> Cc: Lars Ellenberg <drbd-dev@lists.linbit.com> Cc: Philipp Reisner <philipp.reisner@linbit.com> Cc: Jens Axboe <axboe@kernel.dk> Signed-off-by: Andrew Morton <akpm@linux-foundation.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * | | | | | | | | bsg-lib: add module.h includeJens Axboe2011-08-021-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Due to conflicts with the moduleh tree in linux-next, we run into an include file mess. We really need export.h in that tree, but if we add module.h locally then the issue is easier to resolve. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * | | | | | | | | cfq-iosched: Reduce linked group count upon group destructionVivek Goyal2011-08-021-0/+3
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | FQ keeps track of number of groups which are linked on blkcg->blkg_list. This is useful to avoid races between queue exit and cgroup exit code paths. So if at the request queue exit time linked group count is not zero, that means there are some group out there which is yet to be deleted under rcu read period and queue exit code should wait for on rcu period. In my previous patch I forgot to decrease the number of group count. So in current form, we nr_blkcg_linked_grps is always non-zero and we will always wait one rcu period (if BLK_CGROUP=y). The side effect of this is that it can increase boot time. I am surprised, nobody complained so far. Signed-off-by: Vivek Goyal <vgoyal@redhat.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * | | | | | | | | blk-throttle: correctly determine sync bioShaohua Li2011-08-011-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | read request is always sync. Using rw_is_sync() to determine if a bio is sync. Signed-off-by: Shaohua Li <shaohua.li@intel.com> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * | | | | | | | | loop: fix deadlock when sysfs and LOOP_CLR_FD race against each otherKay Sievers2011-07-311-2/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | LOOP_CLR_FD takes lo->lo_ctl_mutex and tries to remove the loop sysfs files. Sysfs calls show() and waits for lo->lo_ctl_mutex. LOOP_CLR_FD waits for show() to finish to remove the sysfs file. cat /sys/class/block/loop0/loop/backing_file mutex_lock_nested+0x176/0x350 ? loop_attr_do_show_backing_file+0x2f/0xd0 [loop] ? loop_attr_do_show_backing_file+0x2f/0xd0 [loop] loop_attr_do_show_backing_file+0x2f/0xd0 [loop] dev_attr_show+0x1b/0x60 ? sysfs_read_file+0x86/0x1a0 ? __get_free_pages+0x12/0x50 sysfs_read_file+0xaf/0x1a0 ioctl(LOOP_CLR_FD): wait_for_common+0x12c/0x180 ? try_to_wake_up+0x2a0/0x2a0 wait_for_completion+0x18/0x20 sysfs_deactivate+0x178/0x180 ? sysfs_addrm_finish+0x43/0x70 ? sysfs_addrm_start+0x1d/0x20 sysfs_addrm_finish+0x43/0x70 sysfs_hash_and_remove+0x85/0xa0 sysfs_remove_group+0x59/0x100 loop_clr_fd+0x1dc/0x3f0 [loop] lo_ioctl+0x223/0x7a0 [loop] Instead of taking the lo_ctl_mutex from sysfs code, take the inner lo->lo_lock, to protect the access to the backing_file data. Thanks to Tejun for help debugging and finding a solution. Cc: Milan Broz <mbroz@redhat.com> Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Cc: stable@kernel.org Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * | | | | | | | | loop: add BLK_DEV_LOOP_MIN_COUNT=%i to allow distros 0 pre-allocated loop ↵Kay Sievers2011-07-313-20/+31
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | devices Instead of unconditionally creating a fixed number of dead loop devices which need to be investigated by storage handling services, even when they are never used, we allow distros start with 0 loop devices and have losetup(8) and similar switch to the dynamic /dev/loop-control interface instead of searching /dev/loop%i for free devices. Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * | | | | | | | | loop: add management interface for on-demand device allocationKay Sievers2011-07-313-4/+121
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Loop devices today have a fixed pre-allocated number of usually 8. The number can only be changed at module init time. To find a free device to use, /dev/loop%i needs to be scanned, and all devices need to be opened until a free one is possibly found. This adds a new /dev/loop-control device node, that allows to dynamically find or allocate a free device, and to add and remove loop devices from the running system: LOOP_CTL_ADD adds a specific device. Arg is the number of the device. It returns the device i or a negative error code. LOOP_CTL_REMOVE removes a specific device, Arg is the number the device. It returns the device i or a negative error code. LOOP_CTL_GET_FREE finds the next unbound device or allocates a new one. No arg is given. It returns the device i or a negative error code. The loop kernel module gets automatically loaded when /dev/loop-control is accessed the first time. The alias specified in the module, instructs udev to create this 'dead' device node, even when the module is not loaded. Example: cfd = open("/dev/loop-control", O_RDWR); # add a new specific loop device err = ioctl(cfd, LOOP_CTL_ADD, devnr); # remove a specific loop device err = ioctl(cfd, LOOP_CTL_REMOVE, devnr); # find or allocate a free loop device to use devnr = ioctl(cfd, LOOP_CTL_GET_FREE); sprintf(loopname, "/dev/loop%i", devnr); ffd = open("backing-file", O_RDWR); lfd = open(loopname, O_RDWR); err = ioctl(lfd, LOOP_SET_FD, ffd); Cc: Tejun Heo <tj@kernel.org> Cc: Karel Zak <kzak@redhat.com> Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * | | | | | | | | loop: replace linked list of allocated devices with an idr indexKay Sievers2011-07-312-73/+80
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Replace the linked list, that keeps track of allocated devices, with an idr index to allow a more efficient lookup of devices. Cc: Tejun Heo <tj@kernel.org> Signed-off-by: Kay Sievers <kay.sievers@vrfy.org> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
| * | | | | | | | | block: add bsg helper libraryMike Christie2011-07-315-0/+385
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | This moves the FC classes bsg code to the block layer and makes it a lib so that other classes like iscsi and SAS can use it. It is helpful because working with the request queue, bios, creating scatterlists, etc are a pain that the LLD does not have to worry about with normal IOs and should not have to worry about for bsg requests. Signed-off-by: Mike Christie <michaelc@cs.wisc.edu> Signed-off-by: Jens Axboe <jaxboe@fusionio.com>
* | | | | | | | | | Merge branch 'for-linus' of ↵Linus Torvalds2011-08-199-153/+454
|\ \ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6 * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jbarnes/pci-2.6: PCI: OF: Don't crash when bridge parent is NULL. PCI: export pcie_bus_configure_settings symbol PCI: code and comments cleanup PCI: make cardbus-bridge resources optional PCI: make SRIOV resources optional PCI : ability to relocate assigned pci-resources PCI: honor child buses add_size in hot plug configuration PCI: Set PCI-E Max Payload Size on fabric
| * | | | | | | | | | PCI: OF: Don't crash when bridge parent is NULL.David Daney2011-08-191-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | In pcibios_get_phb_of_node(), we will crash while booting if bus->bridge->parent is NULL. Check for this case and avoid dereferencing the NULL pointer. Signed-off-by: David Daney <david.daney@cavium.com> Acked-by: Benjamin Herrenschmidt <benh@kernel.crashing.org> Acked-by: Grant Likely <grant.likely@secretlab.ca> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
| * | | | | | | | | | PCI: export pcie_bus_configure_settings symbolJon Mason2011-08-021-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | pcie_bus_configure_settings needs to be exported if the PCI hotplug driver is being compiled as a module. Reported-by: Stephen Rothwell <sfr@canb.auug.org.au> Signed-off-by: Jon Mason <mason@myri.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
| * | | | | | | | | | PCI: code and comments cleanupRam Pai2011-08-011-55/+55
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | a) adjust_resource_sorted() is now called reassign_resource_sorted() b) nice-to-have is now called optional c) add_list is now called realloc_list. Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
| * | | | | | | | | | PCI: make cardbus-bridge resources optionalRam Pai2011-08-012-9/+36
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Allocate resources to cardbus bridge only after all other genuine resources requests are satisfied. Dont retry if resource allocation for cardbus-bridges fail. Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
| * | | | | | | | | | PCI: make SRIOV resources optionalYinghai Lu2011-08-011-0/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | From: Yinghai Lu <yinghai@kernel.org> Allocate resources to SRIOV BARs only after all other required resource-requests are satisfied. Dont retry if resource allocation for SRIOV BARs fail. Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Yinghai Lu <yinghai@kernel.org> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
| * | | | | | | | | | PCI : ability to relocate assigned pci-resourcesRam Pai2011-08-013-67/+113
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently pci-bridges are allocated enough resources to satisfy their immediate requirements. Any additional resource-requests fail if additional free space, contiguous to the one already allocated, is not available. This behavior is not reasonable since sufficient contiguous resources, that can satisfy the request, are available at a different location. This patch provides the ability to expand and relocate a allocated resource. v2: Changelog: Fixed size calculation in pci_reassign_resource() v3: Changelog : Split this patch. The resource.c changes are already upstream. All the pci driver changes are in here. Signed-off-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
| * | | | | | | | | | PCI: honor child buses add_size in hot plug configurationYinghai Lu2011-08-011-0/+26
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git commit c8adf9a3e873eddaaec11ac410a99ef6b9656938 "PCI: pre-allocate additional resources to devices only after successful allocation of essential resources." fails to take into consideration the optional-resources needed by children devices while calculating the optional-resource needed by the bridge. This can be a problem on some setup. For example, if a hotplug bridge has 8 children hotplug bridges, the bridge should have enough resources to accomodate the hotplug requirements for each of its children hotplug bridges. Currently this is not the case. This patch fixes the problem. Signed-off-by: Yinghai Lu <yinghai@kernel.org> Reviewed-by: Ram Pai <linuxram@us.ibm.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
| * | | | | | | | | | PCI: Set PCI-E Max Payload Size on fabricJon Mason2011-08-015-45/+236
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On a given PCI-E fabric, each device, bridge, and root port can have a different PCI-E maximum payload size. There is a sizable performance boost for having the largest possible maximum payload size on each PCI-E device. However, if improperly configured, fatal bus errors can occur. Thus, it is important to ensure that PCI-E payloads sends by a device are never larger than the MPS setting of all devices on the way to the destination. This can be achieved two ways: - A conservative approach is to use the smallest common denominator of the entire tree below a root complex for every device on that fabric. This means for example that having a 128 bytes MPS USB controller on one leg of a switch will dramatically reduce performances of a video card or 10GE adapter on another leg of that same switch. It also means that any hierarchy supporting hotplug slots (including expresscard or thunderbolt I suppose, dbl check that) will have to be entirely clamped to 128 bytes since we cannot predict what will be plugged into those slots, and we cannot change the MPS on a "live" system. - A more optimal way is possible, if it falls within a couple of constraints: * The top-level host bridge will never generate packets larger than the smallest TLP (or if it can be controlled independently from its MPS at least) * The device will never generate packets larger than MPS (which can be configured via MRRS) * No support of direct PCI-E <-> PCI-E transfers between devices without some additional code to specifically deal with that case Then we can use an approach that basically ignores downstream requests and focuses exclusively on upstream requests. In that case, all we need to care about is that a device MPS is no larger than its parent MPS, which allows us to keep all switches/bridges to the max MPS supported by their parent and eventually the PHB. In this case, your USB controller would no longer "starve" your 10GE Ethernet and your hotplug slots won't affect your global MPS. Additionally, the hotplugged devices themselves can be configured to a larger MPS up to the value configured in the hotplug bridge. To choose between the two available options, two PCI kernel boot args have been added to the PCI calls. "pcie_bus_safe" will provide the former behavior, while "pcie_bus_perf" will perform the latter behavior. By default, the latter behavior is used. NOTE: due to the location of the enablement, each arch will need to add calls to this function. This patch only enables x86. This patch includes a number of changes recommended by Benjamin Herrenschmidt. Tested-by: Jordan_Hargrave@dell.com Signed-off-by: Jon Mason <mason@myri.com> Signed-off-by: Jesse Barnes <jbarnes@virtuousgeek.org>
* | | | | | | | | | | Merge git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparcLinus Torvalds2011-08-181-2/+2
|\ \ \ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc: sparc: fix array bounds error setting up PCIC NMI trap
| * | | | | | | | | | | sparc: fix array bounds error setting up PCIC NMI trapIan Campbell2011-08-181-2/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | CC arch/sparc/kernel/pcic.o arch/sparc/kernel/pcic.c: In function 'pcic_probe': arch/sparc/kernel/pcic.c:359:33: error: array subscript is above array bounds [-Werror=array-bounds] arch/sparc/kernel/pcic.c:359:8: error: array subscript is above array bounds [-Werror=array-bounds] arch/sparc/kernel/pcic.c:360:33: error: array subscript is above array bounds [-Werror=array-bounds] arch/sparc/kernel/pcic.c:360:8: error: array subscript is above array bounds [-Werror=array-bounds] arch/sparc/kernel/pcic.c:361:33: error: array subscript is above array bounds [-Werror=array-bounds] arch/sparc/kernel/pcic.c:361:8: error: array subscript is above array bounds [-Werror=array-bounds] cc1: all warnings being treated as errors I'm not particularly familiar with sparc but t_nmi (defined in head_32.S via the TRAP_ENTRY macro) and pcic_nmi_trap_patch (defined in entry.S) both appear to be 4 instructions long and I presume from the usage that instructions are int sized. Signed-off-by: Ian Campbell <ian.campbell@citrix.com> Cc: "David S. Miller" <davem@davemloft.net> Cc: sparclinux@vger.kernel.org Reviewed-by: Sam Ravnborg <sam@ravnborg.org> Signed-off-by: David S. Miller <davem@davemloft.net>
* | | | | | | | | | | | Merge branch 'upstream-linus' of ↵Linus Torvalds2011-08-186-9/+288
|\ \ \ \ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev * 'upstream-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/jgarzik/libata-dev: drivers/ata/sata_dwc_460ex.c: add missing kfree ata: Add iMX pata support pata_via: disable ATAPI DMA on AVERATEC 3200 [libata] sata_sil: fix used-uninit warning
| * | | | | | | | | | | | drivers/ata/sata_dwc_460ex.c: add missing kfreeJulia Lawall2011-08-181-8/+6
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, error handling code in this function calls the function sata_dwc_port_stop, but this function has essentially no effect if hsdevp has not been stored in ap, which is the case throughout this function. The only effect is to print a debugging message including ap->print_id. The code is rewritten to not call sata_dwc_port_stop, but instead to jump to a local label that prints the original error message and the print_id information. In the case where hsdevp has been already allocated (but not yet stored in ap), this value is freed as well. A simplified version of the semantic match that finds this problem is as follows: (http://coccinelle.lip6.fr/) // <smpl> @exists@ local idexpression x; statement S,S1; expression E; identifier fl; expression *ptr != NULL; @@ x = \(kmalloc\|kzalloc\|kcalloc\)(...); ... if (x == NULL) S <... when != x when != if (...) { <+...kfree(x)...+> } when any when != true x == NULL x->fl ...> ( if (x == NULL) S1 | if (...) { ... when != x when forall ( return \(0\|<+...x...+>\|ptr\); | * return ...; ) } ) // </smpl> Signed-off-by: Julia Lawall <julia@diku.dk> Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
| * | | | | | | | | | | | ata: Add iMX pata supportArnaud Patard (Rtp)2011-08-183-0/+263
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Add basic support for pata on iMX. It has been tested only on imx51. SDMA support will probably be added later so this version supports only PIO. v2: - enable only when needed IORDY - use dev_get_drvdata v3: - add missing clk_put() calls - use platform_get_irq() - fix resume code to avoid disabling IORDY on resume v4: - Remove EXPERIMENTAL and switch to depends on ARCH_MXC - Use devm_kzalloc() - make clock a must-have - Use only 1 ioremap Signed-off-by: Arnaud Patard <arnaud.patard@rtp-net.org> Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
| * | | | | | | | | | | | pata_via: disable ATAPI DMA on AVERATEC 3200Tejun Heo2011-08-181-0/+18
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | On AVERATEC 3200, pata_via causes memory corruption with ATAPI DMA, which often leads to random kernel oops. The cause of the problem is not well understood yet and only small subset of machines using the controller seem affected. Blacklist ATAPI DMA on the machine. Signed-off-by: Tejun Heo <tj@kernel.org> Bugzilla: https://bugzilla.kernel.org/show_bug.cgi?id=11426 Reported-and-tested-by: Jim Bray <jimsantelmo@gmail.com> Cc: Alan Cox <alan@linux.intel.com> Cc: stable@kernel.org Signed-off-by: Jeff Garzik <jgarzik@pobox.com>
| * | | | | | | | | | | | [libata] sata_sil: fix used-uninit warningJeff Garzik2011-08-181-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Init 'serror' to silence the following warning: drivers/ata/sata_sil.c: In function ‘sil_interrupt’: drivers/ata/sata_sil.c:453:14: warning: ‘serror’ may be used uninitialized in this function [-Wuninitialized] This is not a 'can never happen' but is nonetheless extremely unlikely. The easiest and cleanest warning fix is simply to init the var, rather than worry about marking the var uninit-ok. Signed-off-by: Jeff Garzik <jgarzik@redhat.com>
* | | | | | | | | | | | | Merge branch 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/linux-nfsLinus Torvalds2011-08-186-40/+43
|\ \ \ \ \ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | * 'bugfixes' of git://git.linux-nfs.org/projects/trondmy/linux-nfs: NFSv4.1: Return NFS4ERR_BADSESSION to callbacks during session resets NFSv4.1: Fix the callback 'highest_used_slotid' behaviour pnfs-obj: Fix the comp_index != 0 case pnfs-obj: Bug when we are running out of bio nfs: add missing prefetch.h include
| * | | | | | | | | | | | | NFSv4.1: Return NFS4ERR_BADSESSION to callbacks during session resetsTrond Myklebust2011-08-041-0/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | If the client is in the process of resetting the session when it receives a callback, then returning NFS4ERR_DELAY may cause a deadlock with the DESTROY_SESSION call. Basically, if the client returns NFS4ERR_DELAY in response to the CB_SEQUENCE call, then the server is entitled to believe that the client is busy because it is already processing that call. In that case, the server is perfectly entitled to respond with a NFS4ERR_BACK_CHAN_BUSY to any DESTROY_SESSION call. Fix this by having the client reply with a NFS4ERR_BADSESSION in response to the callback if it is resetting the session. Cc: stable@kernel.org [2.6.38+] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * | | | | | | | | | | | | NFSv4.1: Fix the callback 'highest_used_slotid' behaviourTrond Myklebust2011-08-043-24/+22
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Currently, there is no guarantee that we will call nfs4_cb_take_slot() even though nfs4_callback_compound() will consistently call nfs4_cb_free_slot() provided the cb_process_state has set the 'clp' field. The result is that we can trigger the BUG_ON() upon the next call to nfs4_cb_take_slot(). This patch fixes the above problem by using the slot id that was taken in the CB_SEQUENCE operation as a flag for whether or not we need to call nfs4_cb_free_slot(). It also fixes an atomicity problem: we need to set tbl->highest_used_slotid atomically with the check for NFS4_SESSION_DRAINING, otherwise we end up racing with the various tests in nfs4_begin_drain_session(). Cc: stable@kernel.org [2.6.38+] Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * | | | | | | | | | | | | pnfs-obj: Fix the comp_index != 0 caseBoaz Harrosh2011-08-042-9/+10
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | There were bugs in the case of partial layout where olo_comp_index is not zero. This used to work and was tested but one of the later cleanup SQUASHMEs broke it and was not tested since. Also add a dprint that specify those received layout parameters. Everything else was already printed. [Needed in v3.0] CC: Stable Tree <stable@kernel.org> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * | | | | | | | | | | | | pnfs-obj: Bug when we are running out of bioBoaz Harrosh2011-08-041-7/+5
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When we have a situation that the number of pages we want to encode is bigger then the size of the bio. (Which can currently happen only when all IO is going to a single device .e.g group_width==1) then the IO is submitted short and we report back only the amount of bytes we actually wrote/read and all is fine. BUT ... There was a bug that the current length counter was advanced before the fail to add the extra page, and we come to a situation that the CDB length was one-page longer then the actual bio size, which is of course rejected by the osd-target. While here also fix the bio size calculation, in the case that we received more then one group of devices. CC: Stable Tree <stable@kernel.org> Signed-off-by: Boaz Harrosh <bharrosh@panasas.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
| * | | | | | | | | | | | | nfs: add missing prefetch.h includeHeiko Carstens2011-08-041-0/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Fix this compile error on s390: CC [M] fs/nfs/blocklayout/blocklayout.o fs/nfs/blocklayout/blocklayout.c: In function 'bl_end_io_read': fs/nfs/blocklayout/blocklayout.c:201:4: error: implicit declaration of function 'prefetchw' Introduced with 9549ec01 "pnfsblock: bl_read_pagelist". Cc: Fred Isaman <iisaman@citi.umich.edu> Signed-off-by: Heiko Carstens <heiko.carstens@de.ibm.com> Signed-off-by: Trond Myklebust <Trond.Myklebust@netapp.com>
* | | | | | | | | | | | | | Merge branch 'for-linus' of ↵Linus Torvalds2011-08-189-43/+183
|\ \ \ \ \ \ \ \ \ \ \ \ \ \ | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/mason/btrfs-unstable: Btrfs: set i_size properly when fallocating and we already btrfs: unlock on error in btrfs_file_llseek() btrfs: btrfs_permission's RO check shouldn't apply to device nodes Btrfs: truncate pages from clone ioctl target range Btrfs: fix uninitialized sync_pending Btrfs: fix wrong free space information btrfs: memory leak in btrfs_add_inode_defrag() Btrfs: use plain page_address() in header fields setget functions Btrfs: forced readonly when btrfs_drop_snapshot() fails Btrfs: check if there is enough space for balancing smarter Btrfs: fix a bug of balance on full multi-disk partitions Btrfs: fix an oops of log replay Btrfs: detect wether a device supports discard Btrfs: force unplugs when switching from high to regular priority bios
| * \ \ \ \ \ \ \ \ \ \ \ \ \ Merge branch 'btrfs-3.0' into for-linusChris Mason2011-08-188-35/+167
| |\ \ \ \ \ \ \ \ \ \ \ \ \ \
| | * | | | | | | | | | | | | | Btrfs: set i_size properly when fallocating and we alreadyJosef Bacik2011-08-181-0/+14
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | xfstests exposed a problem with preallocate when it fallocates a range that already has an extent. We don't set the new i_size properly because we see that we already have an extent. This isn't right and we should update i_size if the space already exists. With this patch we now pass xfstests 075. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
| | * | | | | | | | | | | | | | Btrfs: truncate pages from clone ioctl target rangeSage Weil2011-08-161-0/+4
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We need to truncate page cache pages for the clone ioctl target range or else we'll confuse ourselves to no end. If the old data was cached, we used to still see it (until remount). If the page was partially updated we used to get a mix of old and new data. Signed-off-by: Sage Weil <sage@newdream.net> Signed-off-by: Chris Mason <chris.mason@oracle.com>
| | * | | | | | | | | | | | | | Btrfs: fix uninitialized sync_pendingMiao Xie2011-08-161-1/+1
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | sync_pending is uninitialized before it be used, fix it. Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
| | * | | | | | | | | | | | | | Btrfs: fix wrong free space informationMiao Xie2011-08-161-5/+11
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Btrfs subtracted the size of the allocated space twice when it allocated the space from the bitmap in the cluster, it broke the free space information and led to oops finally. And this patch also fixes the bug that ctl->free_space was subtracted without lock. Reported-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Miao Xie <miaox@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
| | * | | | | | | | | | | | | | btrfs: memory leak in btrfs_add_inode_defrag()Dan Carpenter2011-08-161-0/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We don't use the defrag struct on this path. Signed-off-by: Dan Carpenter <error27@gmail.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
| | * | | | | | | | | | | | | | Btrfs: use plain page_address() in header fields setget functionsLi Zefan2011-08-161-4/+2
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We've stopped using highmem for extent buffers. Signed-off-by: Li Zefan <lizf@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
| | * | | | | | | | | | | | | | Btrfs: forced readonly when btrfs_drop_snapshot() failsTsutomu Itoh2011-08-162-10/+16
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | The filesystem turns readonly instead of returning the error to the caller when detected error in btrfs_drop_snapshot(). and, because the caller doesn't check the error, the function type is changed to 'void'. Signed-off-by: Tsutomu Itoh <t-itoh@jp.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
| | * | | | | | | | | | | | | | Btrfs: check if there is enough space for balancing smarterliubo2011-08-161-6/+35
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When checking if there is enough space for balancing a block group, since we do not take raid types into consideration, we do not account corrent amounts of space that we needed. This makes us do some extra work before we get ENOSPC. Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
| | * | | | | | | | | | | | | | Btrfs: fix a bug of balance on full multi-disk partitionsliubo2011-08-161-4/+13
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When balancing, we'll first try to shrink devices for some space, but if it is working on a full multi-disk partition with raid protection, we may encounter a bug, that is, while shrinking, total_bytes may be less than bytes_used, and btrfs may allocate a dev extent that accesses out of device's bounds. Then we will not be able to write or read the data which stores at the end of the device, and get the followings: device fsid 0939f071-7ea3-46c8-95df-f176d773bfb6 devid 1 transid 10 /dev/sdb5 Btrfs detected SSD devices, enabling SSD mode btrfs: relocating block group 476315648 flags 9 btrfs: found 4 extents attempt to access beyond end of device sdb5: rw=145, want=546176, limit=546147 attempt to access beyond end of device sdb5: rw=145, want=546304, limit=546147 attempt to access beyond end of device sdb5: rw=145, want=546432, limit=546147 attempt to access beyond end of device sdb5: rw=145, want=546560, limit=546147 attempt to access beyond end of device Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
| | * | | | | | | | | | | | | | Btrfs: fix an oops of log replayliubo2011-08-161-4/+24
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | When btrfs recovers from a crash, it may hit the oops below: ------------[ cut here ]------------ kernel BUG at fs/btrfs/inode.c:4580! [...] RIP: 0010:[<ffffffffa03df251>] [<ffffffffa03df251>] btrfs_add_link+0x161/0x1c0 [btrfs] [...] Call Trace: [<ffffffffa03e7b31>] ? btrfs_inode_ref_index+0x31/0x80 [btrfs] [<ffffffffa04054e9>] add_inode_ref+0x319/0x3f0 [btrfs] [<ffffffffa0407087>] replay_one_buffer+0x2c7/0x390 [btrfs] [<ffffffffa040444a>] walk_down_log_tree+0x32a/0x480 [btrfs] [<ffffffffa0404695>] walk_log_tree+0xf5/0x240 [btrfs] [<ffffffffa0406cc0>] btrfs_recover_log_trees+0x250/0x350 [btrfs] [<ffffffffa0406dc0>] ? btrfs_recover_log_trees+0x350/0x350 [btrfs] [<ffffffffa03d18b2>] open_ctree+0x1442/0x17d0 [btrfs] [...] This comes from that while replaying an inode ref item, we forget to check those old conflicting DIR_ITEM and DIR_INDEX items in fs/file tree, then we will come to conflict corners which lead to BUG_ON(). Signed-off-by: Liu Bo <liubo2009@cn.fujitsu.com> Tested-by: Andy Lutomirski <luto@mit.edu> Signed-off-by: Chris Mason <chris.mason@oracle.com>
| | * | | | | | | | | | | | | | Btrfs: detect wether a device supports discardJosef Bacik2011-08-163-2/+29
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | We have a problem where if a user specifies discard but doesn't actually support it we will return EOPNOTSUPP from btrfs_discard_extent. This is a problem because this gets called (in a fashion) from the tree log recovery code, which has a nice little BUG_ON(ret) after it, which causes us to fail the tree log replay. So instead detect wether our devices support discard when we're adding them and then don't issue discards if we know that the device doesn't support it. And just for good measure set ret = 0 in btrfs_issue_discard just in case we still get EOPNOTSUPP so we don't screw anybody up like this again. Thanks, Signed-off-by: Josef Bacik <josef@redhat.com> Signed-off-by: Chris Mason <chris.mason@oracle.com>
| | * | | | | | | | | | | | | | Btrfs: force unplugs when switching from high to regular priority biosChris Mason2011-08-051-0/+17
| | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | | Btrfs does bio submissions from a worker thread, and each device has a list of high priority bios and regular priority bios. Synchronous writes go to the high priority thread while async writes go to regular list. This commit brings back an explicit unplug any time we switch from high to regular priority, which makes it easier for the block layer to give us low latencies. Signed-off-by: Chris Mason <chris.mason@oracle.com>