    USB: EHCI: support running URB giveback in tasklet context · 428aac8a
    Ming Lei authored
    All 4 transfer types work well on the EHCI HCD after switching to
    running URB giveback in tasklet context, so mark all EHCI HCD drivers
    as supporting it.
    
    Also we don't need to release ehci->lock during URB giveback any more.
    
    From the test results below on 3 machines (2 ARM and one x86), the
    time consumed by the EHCI interrupt handler dropped significantly
    without performance loss.
    
    1 test description
    1.1 mass storage performance test:
    - run the command below 10 times and compute the average performance
    
        dd if=/dev/sdN iflag=direct of=/dev/null bs=200M count=1
    
    - two USB mass storage devices:
    A: SanDisk Extreme USB 3.0 16G (used in test case 1 & case 2)
    B: Kingston DataTraveler G2 4GB (only used in test case 2)
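The averaging step in 1.1 could be scripted roughly as follows. This is a minimal sketch, not the script used for the numbers in this commit: the function names are hypothetical, the device path is whatever stands in for /dev/sdN, and the parser assumes the summary line format printed by GNU dd under LC_ALL=C.

```python
import os
import re
import subprocess

# Assumed GNU dd summary line on stderr, e.g.
# "209715200 bytes (210 MB, 200 MiB) copied, 8.29 s, 25.3 MB/s"
DD_SUMMARY_RE = re.compile(r"(\d+) bytes.*copied, ([0-9.eE+-]+) s")


def dd_throughput_mb_s(stderr_text):
    """Compute MB/s (10**6 bytes per second) from dd's summary line."""
    m = DD_SUMMARY_RE.search(stderr_text)
    if m is None:
        raise ValueError("no dd summary line found")
    return int(m.group(1)) / float(m.group(2)) / 1e6


def average_read_speed(device, bs="200M", count=1, runs=10):
    """Run the direct-I/O read `runs` times and return the mean MB/s."""
    cmd = ["dd", f"if={device}", "iflag=direct", "of=/dev/null",
           f"bs={bs}", f"count={count}"]
    env = dict(os.environ, LC_ALL="C")  # keep dd's output format predictable
    speeds = []
    for _ in range(runs):
        result = subprocess.run(cmd, env=env, stderr=subprocess.PIPE,
                                text=True, check=True)
        speeds.append(dd_throughput_mb_s(result.stderr))
    return sum(speeds) / len(speeds)
```

Calling average_read_speed() with bs="4K" and count=4000 would correspond to the small-transfer read in test case 5; reading a raw block device normally requires root.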
    
    1.2 uvc function test:
    - run the simple capture program at the link below
    
       http://kernel.ubuntu.com/~ming/up/capture.c
    
    - capture format is 640*480, which results in High Bandwidth mode on
    the uvc device: Z-Star 0x0ac8/0x3450
    
    - on T410(x86) laptop, also use guvcview to watch video capture/playback
    
    1.3 about test case2 and test case4
    - the two devices involved are tested concurrently using the above test items
    
    1.4 how to compute irq time(the time consumed by ehci_irq)
    - use trace points of irq:irq_handler_entry and irq:irq_handler_exit
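The pairing of entry and exit events described in 1.4 can be sketched as below. This is an illustration under assumptions, not the tooling used for this commit: it assumes ftrace-style or "perf script"-style text lines carrying a "[cpu]" field, a "seconds.microseconds:" timestamp, and the irq= / name= fields of the two trace points; the function names and the "ehci_hcd" handler-name prefix are hypothetical choices.

```python
import re

# Assumed line layout: "... [cpu] flags timestamp: event: irq=N name=..."
LINE_RE = re.compile(
    r"\[(?P<cpu>\d+)\].*?(?P<ts>\d+\.\d+):\s+"
    r"(?:irq:)?(?P<event>irq_handler_entry|irq_handler_exit):\s+"
    r"irq=(?P<irq>\d+)(?:\s+name=(?P<name>\S+))?"
)


def irq_times_us(lines, handler="ehci_hcd"):
    """Return the duration in microseconds of each matching handler run."""
    pending = {}    # (cpu, irq) -> entry timestamp in seconds
    durations = []
    for line in lines:
        m = LINE_RE.search(line)
        if m is None:
            continue
        key = (m["cpu"], m["irq"])
        ts = float(m["ts"])
        if m["event"] == "irq_handler_entry":
            # Only entry events carry the handler name.
            if m["name"] and m["name"].startswith(handler):
                pending[key] = ts
        elif key in pending:
            durations.append((ts - pending.pop(key)) * 1e6)
    return durations


def summarize(durations):
    """Return (avg, max) in microseconds, or None when there are no samples."""
    if not durations:
        return None
    return sum(durations) / len(durations), max(durations)
```

Events are keyed by (cpu, irq) so that an exit is only matched with the entry recorded on the same CPU for the same interrupt line. If the interrupt line is shared, the exit event does not name the handler, so this sketch could pair the wrong exit with an entry.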
    
    1.5 kernel
    3.10.0-rc3-next-20130528
    
    1.6 test machines
    Pandaboard A1: ARM Cortex-A9 dual core
    Arndale board: ARM Cortex-A15 dual core
    T410: i5 CPU 2.67GHz quad core
    
    2 test result
    2.1 test case1: single mass storage device performance test
    --------------------------------------------------------------------
    		upstream 		| patched
    		perf(MB/s)+irq time(us)	| perf(MB/s)+irq time(us)
    --------------------------------------------------------------------
    Pandaboard A1:  25.280(avg:145,max:772)	| 25.540(avg:14, max:75)
    Arndale board:  29.700(avg:33, max:129)	| 29.700(avg:10,  max:50)
    T410: 		34.430(avg:17, max:154*)| 34.660(avg:12, max:155)
    ---------------------------------------------------------------------
    
    2.2 test case2: two mass storage devices' performance test
    --------------------------------------------------------------------
    		upstream 			| patched
    		perf(MB/s)+irq time(us)		| perf(MB/s)+irq time(us)
    --------------------------------------------------------------------
    Pandaboard A1:  15.840/15.580(avg:158,max:1216)	| 16.500/16.160(avg:15,max:139)
    Arndale board:  17.370/16.220(avg:33, max:234)	| 17.480/16.200(avg:11, max:91)
    T410: 		21.180/19.820(avg:18, max:160)	| 21.220/19.880(avg:11, max:149)
    ---------------------------------------------------------------------
    
    2.3 test case3: one uvc streaming test
    - the uvc device works well (on x86, luvcview can be used too and gives
    the same result as uvc capture)
    --------------------------------------------------------------------
    		upstream 		| patched
    		irq time(us)		| irq time(us)
    --------------------------------------------------------------------
    Pandaboard A1:  (avg:445, max:873)	| (avg:33, max:44)
    Arndale board:  (avg:316, max:630)	| (avg:20, max:27)
    T410: 		(avg:39,  max:107)	| (avg:10, max:65)
    ---------------------------------------------------------------------
    
    2.4 test case4: one uvc streaming plus one mass storage device test
    --------------------------------------------------------------------
    		upstream 		| patched
    		perf(MB/s)+irq time(us)	| perf(MB/s)+irq time(us)
    --------------------------------------------------------------------
    Pandaboard A1:  20.340(avg:259,max:1704)| 20.390(avg:24, max:101)
    Arndale board:  23.460(avg:124,max:726)	| 23.370(avg:15, max:52)
    T410: 		28.520(avg:27, max:169)	| 28.630(avg:13, max:160)
    ---------------------------------------------------------------------
    
    2.5 test case5: read single mass storage device with small transfer
    - run the command below 10 times and compute the average speed
    
     dd if=/dev/sdN iflag=direct of=/dev/null bs=4K count=4000
    
    1), test device A:
    --------------------------------------------------------------------
    		upstream 		| patched
    		perf(MB/s)+irq time(us)	| perf(MB/s)+irq time(us)
    --------------------------------------------------------------------
    Pandaboard A1:  6.5(avg:21, max:64)	| 6.5(avg:10, max:24)
    Arndale board:  8.13(avg:12, max:23)	| 8.06(avg:7,  max:17)
    T410: 		6.66(avg:13, max:131)   | 6.84(avg:11, max:149)
    ---------------------------------------------------------------------
    
    2), test device B:
    --------------------------------------------------------------------
    		upstream 		| patched
    		perf(MB/s)+irq time(us)	| perf(MB/s)+irq time(us)
    --------------------------------------------------------------------
    Pandaboard A1:  5.5(avg:21,max:43)	| 5.49(avg:10, max:24)
    Arndale board:  5.9(avg:12, max:22)	| 5.9(avg:7, max:17)
    T410: 		5.48(avg:13, max:155)	| 5.48(avg:7, max:140)
    ---------------------------------------------------------------------
    
    * On T410, reading the EHCI status register in ehci_irq sometimes takes
    more than 100us; the problem has been reported at the link:
    
    	http://marc.info/?t=137065867300001&r=1&w=2

    Acked-by: Alan Stern <stern@rowland.harvard.edu>
    Signed-off-by: Ming Lei <ming.lei@canonical.com>
    Signed-off-by: Greg Kroah-Hartman <gregkh@linuxfoundation.org>