- 22 Dec, 2012 9 commits
-
-
Frank Schaefer authored
This function will be used to uninitialize USB bulk transfers, too. Also rename the local variable isoc_bufs to usb_bufs. Signed-off-by: Frank Schäfer <fschaefer.oss@googlemail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
-
Frank Schaefer authored
em28xx_irq_callback can be used for isoc and bulk transfers. Signed-off-by: Frank Schäfer <fschaefer.oss@googlemail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
-
Frank Schaefer authored
It isn't used anymore and uses constants which no longer exist. Signed-off-by: Frank Schäfer <fschaefer.oss@googlemail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
-
Frank Schaefer authored
Also rename the corresponding field isoc_ctl in struct em28xx to usb_ctl. We will use this struct for USB bulk transfers, too. Signed-off-by: Frank Schäfer <fschaefer.oss@googlemail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
-
Frank Schaefer authored
It will be used for USB bulk transfers, too. Signed-off-by: Frank Schäfer <fschaefer.oss@googlemail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
-
Frank Schaefer authored
Rename EM28XX_NUM_PACKETS to EM28XX_NUM_ISOC_PACKETS and EM28XX_DVB_MAX_PACKETS to EM28XX_DVB_NUM_ISOC_PACKETS to clarify that these values are used only for isoc usb transfers. Also use the term num_packets instead of max_packets, as this is how these values are used and called in struct urb. Signed-off-by: Frank Schäfer <fschaefer.oss@googlemail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
-
Frank Schaefer authored
Signed-off-by: Frank Schäfer <fschaefer.oss@googlemail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
-
Frank Schaefer authored
em28xx_copy_video uses a wrong offset for the target buffer when copying the data from an USB isoc packet. This happens only for the second and all following lines in the packet. The reason why this bug doesn't cause image corruption with my test device (SilverCrest Webcam 1.3 MPix) is, that this device never sends any packets that cross the end of a line. I don't know if all devices behave like this, so this patch should be considered for stable. With the upcoming patches to add support for USB bulk transfers, em28xx_copy_video will be called once per URB, which will always trigger this bug. Signed-off-by: Frank Schäfer <fschaefer.oss@googlemail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
-
Frank Schaefer authored
When em28xx_ir_init() fails due to an configuration error, it frees the memory of struct em28xx_IR *ir, but doesn't set the corresponding pointer in the device struct to NULL. On device removal, em28xx_ir_fini() gets called, which then calls rc_unregister_device() with a pointer to freed memory. Fixes bug 26572 (http://bugzilla.kernel.org/show_bug.cgi?id=26572) Signed-off-by: Frank Schäfer <fschaefer.oss@googlemail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
-
- 21 Dec, 2012 31 commits
-
-
Mauro Carvalho Chehab authored
Newer em28xx chipsets (em2874 and upper) are capable of supporting RC6 codes, on both mode 0 (command mode, 16 bits payload size, similar to RC5, also called "Philips mode") and mode 6a (OEM command mode, with offers a few alternatives with regards to the payload size). I don't have any mode 6a control ATM to test it, so, I opted to add support only to mode 0. After this patch, adding support to mode 6a should not be hard. Tested with a Philips television remote controller. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
-
Mauro Carvalho Chehab authored
By disabling the NEC parity check, it is possible to handle all 3 NEC protocol variants (32, 24 or 16 bits). Change the driver in order to handle all of them. Unfortunately, em2860/em2863 provide only 16 bits for the IR scancode, even when NEC parity is disabled. So, this change should affect only em2874 and newer devices, with provides up to 32 bits for the scancode. Tested with one NEC-16, one NEC-24 and one RC5 IR. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
-
Mauro Carvalho Chehab authored
Those two tuners may also be needed. Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
-
Jonathan McDowell authored
I noticed that the EM28XX DVB driver doesn't auto select all of the appropriate DVB tuner modules required. In particular I needed DVB_LGDT3305 for my a340, but it looks like DVB_MT352 + DVB_S5H1409 were missing as well. Signed-Off-by: Jonathan McDowell <noodles@earth.li> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
-
YAMANE Toshiaki authored
fixed below checkpatch warnings. - WARNING: Prefer netdev_warn(netdev, ... then dev_warn(dev, ... then pr_warn(... to printk(KERN_WARNING ... - WARNING: Prefer netdev_err(netdev, ... then dev_err(dev, ... then pr_err(... to printk(KERN_ERR ... Signed-off-by: YAMANE Toshiaki <yamanetoshi@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
-
YAMANE Toshiaki authored
fixed below checkpatch warnings. - WARNING: Prefer netdev_info(netdev, ... then dev_info(dev, ... then pr_info(... to printk(KERN_INFO ... - WARNING: Prefer netdev_err(netdev, ... then dev_err(dev, ... then pr_err(... to printk(KERN_ERR ... and add pr_fmt. Signed-off-by: YAMANE Toshiaki <yamanetoshi@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
-
YAMANE Toshiaki authored
fixed below checkpatch warnings. - WARNING: Prefer netdev_warn(netdev, ... then dev_warn(dev, ... then pr_warn(... to printk(KERN_WARNING ... - WARNING: Prefer netdev_err(netdev, ... then dev_err(dev, ... then pr_err(... to printk(KERN_ERR ... - WARNING: Prefer netdev_info(netdev, ... then dev_info(dev, ... then pr_info(... to printk(KERN_INFO ... and add pr_fmt. Signed-off-by: YAMANE Toshiaki <yamanetoshi@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
-
YAMANE Toshiaki authored
fixed below checkpatch warnings. - WARNING: Prefer netdev_warn(netdev, ... then dev_warn(dev, ... then pr_warn(... to printk(KERN_WARNING ... - WARNING: Prefer netdev_notice(netdev, ... then dev_notice(dev, ... then pr_notice(... to printk(KERN_NOTICE ... - WARNING: Prefer netdev_info(netdev, ... then dev_info(dev, ... then pr_info(... to printk(KERN_INFO ... and add pr_fmt. Signed-off-by: YAMANE Toshiaki <yamanetoshi@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
-
YAMANE Toshiaki authored
fixed below checkpatch warnings. - WARNING: Prefer netdev_err(netdev, ... then dev_err(dev, ... then pr_err(... to printk(KERN_ERR ... - WARNING: Prefer netdev_info(netdev, ... then dev_info(dev, ... then pr_info(... to printk(KERN_INFO ... and add pr_fmt. Signed-off-by: YAMANE Toshiaki <yamanetoshi@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
-
YAMANE Toshiaki authored
fixed below checkpatch warnings. - WARNING: Prefer netdev_err(netdev, ... then dev_err(dev, ... then pr_err(... to printk(KERN_ERR ... - WARNING: Prefer netdev_info(netdev, ... then dev_info(dev, ... then pr_info(... to printk(KERN_INFO ... and add pr_fmt. Signed-off-by: YAMANE Toshiaki <yamanetoshi@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
-
YAMANE Toshiaki authored
fixed below checkpatch warnings. - WARNING: Prefer netdev_info(netdev, ... then dev_info(dev, ... then pr_info(... to printk(KERN_INFO ... - WARNING: Prefer netdev_warn(netdev, ... then dev_warn(dev, ... then pr_warn(... to printk(KERN_WARNING ... - WARNING: Prefer netdev_err(netdev, ... then dev_err(dev, ... then pr_err(... to printk(KERN_ERR ... and add pr_fmt. Signed-off-by: YAMANE Toshiaki <yamanetoshi@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
-
YAMANE Toshiaki authored
fixed below checkpatch warning. - WARNING: Prefer netdev_warn(netdev, ... then dev_warn(dev, ... then pr_warn(... to printk(KERN_WARNING ... Signed-off-by: YAMANE Toshiaki <yamanetoshi@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
-
YAMANE Toshiaki authored
fixed below checkpatch warning. - WARNING: Prefer netdev_info(netdev, ... then dev_info(dev, ... then pr_info(... to printk(KERN_INFO ... - WARNING: Prefer netdev_err(netdev, ... then dev_err(dev, ... then pr_err(... to printk(KERN_ERR ... Signed-off-by: YAMANE Toshiaki <yamanetoshi@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
-
YAMANE Toshiaki authored
fixed below checkpatch warning. - WARNING: Prefer netdev_err(netdev, ... then dev_err(dev, ... then pr_err(... to printk(KERN_ERR ... - WARNING: Prefer netdev_dbg(netdev, ... then dev_dbg(dev, ... then pr_debug(... to printk(KERN_DEBUG ... Signed-off-by: YAMANE Toshiaki <yamanetoshi@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
-
YAMANE Toshiaki authored
fixed below checkpatch error. - ERROR: that open brace { should be on the previous line Signed-off-by: YAMANE Toshiaki <yamanetoshi@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
-
YAMANE Toshiaki authored
fixed below checkpatch warning. - WARNING: Prefer netdev_dbg(netdev, ... then dev_dbg(dev, ... then pr_debug(... to printk(KERN_DEBUG ... - WARNING: Prefer netdev_err(netdev, ... then dev_err(dev, ... then pr_err(... to printk(KERN_ERR ... Signed-off-by: YAMANE Toshiaki <yamanetoshi@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
-
YAMANE Toshiaki authored
fixed below checkpatch warning. - WARNING: Prefer netdev_info(netdev, ... then dev_info(dev, ... then pr_info(... to printk(KERN_INFO ... Signed-off-by: YAMANE Toshiaki <yamanetoshi@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
-
YAMANE Toshiaki authored
fixed below checkpatch warning. - WARNING: Prefer netdev_dbg(netdev, ... then dev_dbg(dev, ... then pr_debug(... to printk(KERN_DEBUG ... - WARNING: Prefer netdev_err(netdev, ... then dev_err(dev, ... then pr_err(... to printk(KERN_ERR ... Signed-off-by: YAMANE Toshiaki <yamanetoshi@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
-
YAMANE Toshiaki authored
fixed below checkpatch error. - ERROR: that open brace { should be on the previous line Signed-off-by: YAMANE Toshiaki <yamanetoshi@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
-
Julian Scheel authored
dvb_unregister_frontend has to be called before detach. Otherwise the unregister call will segfault. This made tm6000-dvb module unload unusable. Signed-off-by: Julian Scheel <julian@jusst.de> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
-
YAMANE Toshiaki authored
fixed below checkpatch warnings. - WARNING: Prefer netdev_err(netdev, ... then dev_err(dev, ... then pr_err(... to printk(KERN_ERR ... - WARNING: Prefer netdev_dbg(netdev, ... then dev_dbg(dev, ... then pr_debug(... to printk(KERN_DEBUG ... Signed-off-by: YAMANE Toshiaki <yamanetoshi@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
-
YAMANE Toshiaki authored
fixed below checkpatch warnings. - WARNING: Prefer netdev_err(netdev, ... then dev_err(dev, ... then pr_err(... to printk(KERN_ERR ... - WARNING: Prefer netdev_info(netdev, ... then dev_info(dev, ... then pr_info(... to printk(KERN_INFO ... Signed-off-by: YAMANE Toshiaki <yamanetoshi@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
-
YAMANE Toshiaki authored
fixed below checkpatch warning. - WARNING: Prefer netdev_info(netdev, ... then dev_info(dev, ... then pr_info(... to printk(KERN_INFO ... - WARNING: Prefer netdev_dbg(netdev, ... then dev_dbg(dev, ... then pr_debug(... to printk(KERN_DEBUG ... - WARNING: Prefer netdev_err(netdev, ... then dev_err(dev, ... then pr_err(... to printk(KERN_ERR ... Signed-off-by: YAMANE Toshiaki <yamanetoshi@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
-
YAMANE Toshiaki authored
fixed below checkpatch warning. - WARNING: Prefer netdev_info(netdev, ... then dev_info(dev, ... then pr_info(... to printk(KERN_INFO ... Signed-off-by: YAMANE Toshiaki <yamanetoshi@gmail.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
-
Matthijs Kooijman authored
This should fix a potential race condition, when the irq handler triggers while rc_register_device is still setting up the rdev->raw device. This crash has not been observed in practice, but there should be a very small window where it could occur. Since ir_raw_event_store_with_filter checks if rdev->raw is not NULL before using it, this bug is not triggered if the request_irq triggers a pending irq directly (since rdev->raw will still be NULL then). This commit was tested on nuvoton-cir only. Cc: Jarod Wilson <jarod@redhat.com> Cc: Maxim Levitsky <maximlevitsky@gmail.com> Cc: David Härdeman <david@hardeman.nu> Signed-off-by: Matthijs Kooijman <matthijs@stdin.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
-
Matthijs Kooijman authored
This fixes a problem in fintek-cir and nuvoton-cir where the irq handler would trigger during module load before the rdev member was set, causing a NULL pointer crash. It seems this crash is very reproducible (just bombard the receiver with IR signals during module load), probably because when request_irq is called, any pending intterupt is handled immediately, before request_irq returns and rdev can be set. This same crash was supposed to be fixed by commit 9ef449c6 ("[media] rc: Postpone ISR registration"), but the crash was still observed on the nuvoton-cir driver. This commit was tested on nuvoton-cir only. Cc: Jarod Wilson <jarod@redhat.com> Signed-off-by: Matthijs Kooijman <matthijs@stdin.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
-
Matthijs Kooijman authored
Before, labels were simply numbered. Now, the labels are named after the cleanup action they'll perform (first), based on how the winbond-cir driver does it. This makes the code a bit more clear and makes changes in the ordering of labels easier to review. This change is applied only to the rc drivers that do significant cleanup in their probe functions: ati-remote, ene-ir, fintek-cir, gpio-ir-recv, ite-cir, nuvoton-cir. This commit should not change any code, it just renames goto labels. [mchehab@redhat.com: removed changes at gpio-ir-recv.c, due to merge conflicts] Signed-off-by: Matthijs Kooijman <matthijs@stdin.nl> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
-
Kirill Smelkov authored
precalculate_line() is not very high on profile, but it calls expensive gen_twopix(), so let's polish it too: call gen_twopix() only once for every color bar and then distribute the result. before: # cmdline : /home/kirr/local/perf/bin/perf record -g -a sleep 20 # # Samples: 46K of event 'cycles' # Event count (approx.): 15574200568 # # Overhead Command Shared Object # ........ ............... .................... # 27.99% rawv libc-2.13.so [.] __memcpy_ssse3 23.29% vivi-* [kernel.kallsyms] [k] memcpy 10.30% Xorg [unknown] [.] 0xa75c98f8 5.34% vivi-* [vivi] [k] gen_text.constprop.6 4.61% rawv [vivi] [k] gen_twopix 2.64% rawv [vivi] [k] precalculate_line 1.37% swapper [kernel.kallsyms] [k] read_hpet after: # cmdline : /home/kirr/local/perf/bin/perf record -g -a sleep 20 # # Samples: 45K of event 'cycles' # Event count (approx.): 15561769214 # # Overhead Command Shared Object # ........ ............... .................... # 30.73% rawv libc-2.13.so [.] __memcpy_ssse3 26.78% vivi-* [kernel.kallsyms] [k] memcpy 10.68% Xorg [unknown] [.] 0xa73015e9 5.55% vivi-* [vivi] [k] gen_text.constprop.6 1.36% swapper [kernel.kallsyms] [k] read_hpet 0.96% Xorg [kernel.kallsyms] [k] read_hpet ... 0.16% rawv [vivi] [k] precalculate_line ... 0.14% rawv [vivi] [k] gen_twopix (i.e. gen_twopix and precalculate_line overheads are almost gone) Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru> Acked-by: Hans Verkuil <hans.verkuil@cisco.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
-
Kirill Smelkov authored
The "dev->mvcount % wmax" thing was showing high in profiles (we do it for each line which ~ 500 per frame) ? 000010c0 <vivi_fillbuff>: ... 0,39 ? 70:???mov 0x3ff4(%edi),%esi 0,22 ? 76:? mov 0x2a0(%edi),%eax 0,30 ? ? mov -0x84(%ebp),%ebx 0,35 ? ? mov %eax,%edx 0,04 ? ? mov -0x7c(%ebp),%ecx 0,35 ? ? sar $0x1f,%edx 0,44 ? ? idivl -0x7c(%ebp) 21,68 ? ? imul %esi,%ecx 0,70 ? ? imul %esi,%ebx 0,52 ? ? add -0x88(%ebp),%ebx 1,65 ? ? mov %ebx,%eax 0,22 ? ? imul %edx,%esi 0,04 ? ? lea 0x3f4(%edi,%esi,1),%edx 2,18 ? ?? call vivi_fillbuff+0xa6 0,74 ? ? addl $0x1,-0x80(%ebp) 62,69 ? ? mov -0x7c(%ebp),%edx 1,18 ? ? mov -0x80(%ebp),%ecx 0,35 ? ? add %edx,-0x84(%ebp) 0,61 ? ? cmp %ecx,-0x8c(%ebp) 0,22 ? ???jne 70 so since all variables stay the same for all iterations let's move computations out of the loop: the abovementioned division and "width*pixelsize" too before: # cmdline : /home/kirr/local/perf/bin/perf record -g -a sleep 20 # # Samples: 49K of event 'cycles' # Event count (approx.): 16475832370 # # Overhead Command Shared Object # ........ ............... ...................... # 29.07% rawv libc-2.13.so [.] __memcpy_ssse3 20.57% vivi-* [kernel.kallsyms] [k] memcpy 10.20% Xorg [unknown] [.] 0xa7301494 5.16% vivi-* [vivi] [k] gen_text.constprop.6 4.43% rawv [vivi] [k] gen_twopix 4.36% vivi-* [vivi] [k] vivi_fillbuff 2.42% rawv [vivi] [k] precalculate_line 1.33% swapper [kernel.kallsyms] [k] read_hpet after: # cmdline : /home/kirr/local/perf/bin/perf record -g -a sleep 20 # # Samples: 46K of event 'cycles' # Event count (approx.): 15574200568 # # Overhead Command Shared Object # ........ ............... .................... # 27.99% rawv libc-2.13.so [.] __memcpy_ssse3 23.29% vivi-* [kernel.kallsyms] [k] memcpy 10.30% Xorg [unknown] [.] 0xa75c98f8 5.34% vivi-* [vivi] [k] gen_text.constprop.6 4.61% rawv [vivi] [k] gen_twopix 2.64% rawv [vivi] [k] precalculate_line 1.37% swapper [kernel.kallsyms] [k] read_hpet 0.79% Xorg [kernel.kallsyms] [k] read_hpet 0.64% Xorg [kernel.kallsyms] [k] unix_poll 0.45% Xorg [kernel.kallsyms] [k] fget_light 0.43% rawv libxcb.so.1.1.0 [.] 0x0000aae9 0.40% runsv [kernel.kallsyms] [k] ext2_try_to_allocate 0.36% Xorg [kernel.kallsyms] [k] _raw_spin_lock_irqsave 0.31% vivi-* [vivi] [k] vivi_fillbuff (i.e. vivi_fillbuff own overhead is almost gone) Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru> Acked-by: Hans Verkuil <hans.verkuil@cisco.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
-
Kirill Smelkov authored
Though dev->line[] is u8 array we work with it as with u16, u24 or u32 pixels, and also pass it to memcpy() and it's better to align it to at least 4. Before the patch, on x86 offsetof(vivi_dev, line) was 1003 and after patch it is 1004. There is slight performance increase, but I think is is slight, only because we start copying not from line[0]: ---- 8< ---- drivers/media/platform/vivi.c static void vivi_fillbuff(struct vivi_dev *dev, struct vivi_buffer *buf) { ... for (h = 0; h < hmax; h++) memcpy(vbuf + h * wmax * dev->pixelsize, dev->line + (dev->mv_count % wmax) * dev->pixelsize, wmax * dev->pixelsize); before: # cmdline : /home/kirr/local/perf/bin/perf record -g -a sleep 20 # # Samples: 49K of event 'cycles' # Event count (approx.): 16799780016 # # Overhead Command Shared Object # ........ ............... .................... # 27.51% rawv libc-2.13.so [.] __memcpy_ssse3 23.77% vivi-* [kernel.kallsyms] [k] memcpy 9.96% Xorg [unknown] [.] 0xa76f5e12 4.94% vivi-* [vivi] [k] gen_text.constprop.6 4.44% rawv [vivi] [k] gen_twopix 3.17% vivi-* [vivi] [k] vivi_fillbuff 2.45% rawv [vivi] [k] precalculate_line 1.20% swapper [kernel.kallsyms] [k] read_hpet 23.77% vivi-* [kernel.kallsyms] [k] memcpy | --- memcpy | |--99.28%-- vivi_fillbuff | vivi_thread | kthread | ret_from_kernel_thread --0.72%-- [...] after: # cmdline : /home/kirr/local/perf/bin/perf record -g -a sleep 20 # # Samples: 49K of event 'cycles' # Event count (approx.): 16475832370 # # Overhead Command Shared Object # ........ ............... ...................... # 29.07% rawv libc-2.13.so [.] __memcpy_ssse3 20.57% vivi-* [kernel.kallsyms] [k] memcpy 10.20% Xorg [unknown] [.] 0xa7301494 5.16% vivi-* [vivi] [k] gen_text.constprop.6 4.43% rawv [vivi] [k] gen_twopix 4.36% vivi-* [vivi] [k] vivi_fillbuff 2.42% rawv [vivi] [k] precalculate_line 1.33% swapper [kernel.kallsyms] [k] read_hpet Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru> Acked-by: Hans Verkuil <hans.verkuil@cisco.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
-
Kirill Smelkov authored
I've noticed that vivi takes a lot of CPU to produce its frames. For example for 8 devices and 8 simple programs running, where each captures YUY2 640x480 and displays it to X via SDL, profile timing is as follows: # cmdline : /home/kirr/local/perf/bin/perf record -g -a sleep 20 # Samples: 82K of event 'cycles' # Event count (approx.): 31551930117 # # Overhead Command Shared Object Symbol # ........ ............... .................... # 49.48% vivi-* [vivi] [k] gen_twopix 10.79% vivi-* [kernel.kallsyms] [k] memcpy 10.02% rawv libc-2.13.so [.] __memcpy_ssse3 8.35% vivi-* [vivi] [k] gen_text.constprop.6 5.06% Xorg [unknown] [.] 0xa73015f8 2.32% rawv [vivi] [k] gen_twopix 1.22% rawv [vivi] [k] precalculate_line 1.20% vivi-* [vivi] [k] vivi_fillbuff (rawv is display program, vivi-* is a combination of vivi-000 through vivi-007) so a lot of time is spent in gen_twopix() which as the follwing call-graph profile shows ... 49.48% vivi-* [vivi] [k] gen_twopix | --- gen_twopix | |--96.30%-- gen_text.constprop.6 | vivi_fillbuff | vivi_thread | kthread | ret_from_kernel_thread | --3.70%-- vivi_fillbuff vivi_thread kthread ret_from_kernel_thread ... is called mostly from gen_text(). If we'll look at gen_text(), in the inner loop, we'll see if (chr & (1 << (7 - i))) gen_twopix(dev, pos + j * dev->pixelsize, WHITE, (x+y) & 1); else gen_twopix(dev, pos + j * dev->pixelsize, TEXT_BLACK, (x+y) & 1); which calls gen_twopix() for every character pixel, and that is very expensive, because gen_twopix() branches several times. Now, let's note, that we operate on only two colors - WHITE and TEXT_BLACK, and that pixel for that colors could be precomputed and gen_twopix() moved out of the inner loop. Also note, that for black and white colors even/odd does not make a difference for all supported pixel formats, so we could stop doing that `odd` gen_twopix() parameter game. So the first thing we are doing here is 1) moving gen_twopix() calls out of gen_text() into vivi_fillbuff(), to pregenerate black and white colors, just before printing starts. what we have next is that gen_text's font rendering loop, even with gen_twopix() calls moved out, was inefficient and branchy, so let's 2) rewrite gen_text() loop so it uses less variables + unroll char horizontal-rendering loop + instantiate 3 code paths for pixelsizes 2,3 and 4 so that in all inner loops we don't have to branch or make indirections (*). Done all above reworks, for gen_text() we get nice, non-branchy streamlined code (showing loop for pixelsize=2): ? cmp $0x2,%eax ? ? jne 26 ? mov -0x18(%ebp),%eax ? mov -0x20(%ebp),%edi ? imul -0x20(%ebp),%eax ? movzwl 0x3ffc(%ebx),%esi 0,08 ? movzwl 0x4000(%ebx),%ecx 0,04 ? add %edi,%edi ? mov 0x0,%ebx 0,51 ? mov %edi,-0x1c(%ebp) ? mov %ebx,-0x14(%ebp) ? movl $0x0,-0x10(%ebp) ? lea 0x20(%edx,%eax,2),%eax ? mov %eax,-0x18(%ebp) ? xchg %ax,%ax 0,04 ? a0: mov 0x8(%ebp),%ebx ? mov -0x18(%ebp),%eax 0,04 ? movzbl (%ebx),%edx 0,16 ? test %dl,%dl 0,04 ? ? je 128 0,08 ? lea 0x0(%esi),%esi 1,61 ? b0:???shl $0x4,%edx 1,02 ? ? mov -0x14(%ebp),%edi 2,04 ? ? add -0x10(%ebp),%edx 2,24 ? ? lea 0x1(%ebx),%ebx 0,27 ? ? movzbl (%edi,%edx,1),%edx 9,92 ? ? mov %esi,%edi 0,39 ? ? test %dl,%dl 2,04 ? ? cmovns %ecx,%edi 4,63 ? ? test $0x40,%dl 0,55 ? ? mov %di,(%eax) 3,76 ? ? mov %esi,%edi 0,71 ? ? cmove %ecx,%edi 3,41 ? ? test $0x20,%dl 0,75 ? ? mov %di,0x2(%eax) 2,43 ? ? mov %esi,%edi 0,59 ? ? cmove %ecx,%edi 4,59 ? ? test $0x10,%dl 0,67 ? ? mov %di,0x4(%eax) 2,55 ? ? mov %esi,%edi 0,78 ? ? cmove %ecx,%edi 4,31 ? ? test $0x8,%dl 0,67 ? ? mov %di,0x6(%eax) 5,76 ? ? mov %esi,%edi 1,80 ? ? cmove %ecx,%edi 4,20 ? ? test $0x4,%dl 0,86 ? ? mov %di,0x8(%eax) 2,98 ? ? mov %esi,%edi 1,37 ? ? cmove %ecx,%edi 4,67 ? ? test $0x2,%dl 0,20 ? ? mov %di,0xa(%eax) 2,78 ? ? mov %esi,%edi 0,75 ? ? cmove %ecx,%edi 3,92 ? ? and $0x1,%edx 0,75 ? ? mov %esi,%edx 2,59 ? ? mov %di,0xc(%eax) 0,59 ? ? cmove %ecx,%edx 3,10 ? ? mov %dx,0xe(%eax) 2,39 ? ? add $0x10,%eax 0,51 ? ? movzbl (%ebx),%edx 2,86 ? ? test %dl,%dl 2,31 ? ???jne b0 0,04 ?128: addl $0x1,-0x10(%ebp) 4,00 ? mov -0x1c(%ebp),%eax 0,04 ? add %eax,-0x18(%ebp) 0,08 ? cmpl $0x10,-0x10(%ebp) ? ? jne a0 which almost goes away from the profile: # cmdline : /home/kirr/local/perf/bin/perf record -g -a sleep 20 # Samples: 49K of event 'cycles' # Event count (approx.): 16799780016 # # Overhead Command Shared Object Symbol # ........ ............... .................... # 27.51% rawv libc-2.13.so [.] __memcpy_ssse3 23.77% vivi-* [kernel.kallsyms] [k] memcpy 9.96% Xorg [unknown] [.] 0xa76f5e12 4.94% vivi-* [vivi] [k] gen_text.constprop.6 4.44% rawv [vivi] [k] gen_twopix 3.17% vivi-* [vivi] [k] vivi_fillbuff 2.45% rawv [vivi] [k] precalculate_line 1.20% swapper [kernel.kallsyms] [k] read_hpet i.e. gen_twopix() overhead dropped from 49% to 4% and gen_text() loops from ~8% to ~4%, and overal cycles count dropped from 31551930117 to 16799780016 which is ~1.9x whole workload speedup. (*) for RGB24 rendering I've introduced x24, which could be thought as synthetic u24 for simplifying the code. That's done because for memcpy used for conditional assignment, gcc generates suboptimal code with more indirections. Fortunately, in C struct assignment is builtin and that's all we need from pixeltype for font rendering. Signed-off-by: Kirill Smelkov <kirr@mns.spb.ru> Acked-by: Hans Verkuil <hans.verkuil@cisco.com> Signed-off-by: Mauro Carvalho Chehab <mchehab@redhat.com>
-