1. 17 Jun, 2014 11 commits
    • device_cgroup: rework device access check and exception checking · ea2cadec
      Aristeu Rozanski authored
      commit 79d71974 upstream.
      
      Whenever a device file is opened and checked against current device
      cgroup rules, it uses the same function (may_access()) as when a new
      exception rule is added by writing devices.{allow,deny}. And in both
      cases, the algorithm is the same, regardless of the behavior.
      
      The first problem is that a device access check is treated the same as
      a rule check. Consider the following structure:
      
      	A	(default behavior: allow, exceptions disallow access)
      	 \
      	  B	(default behavior: allow, exceptions disallow access)
      
      A new exception is added to B by writing devices.deny:
      
      	c 12:34 rw
      
      When checking if that exception is allowed in may_access():
      
      	if (dev_cgroup->behavior == DEVCG_DEFAULT_ALLOW) {
      		if (behavior == DEVCG_DEFAULT_ALLOW) {
      			/* the exception will deny access to certain devices */
      			return true;
      
      This is fine: since B is not getting more privileges than A, the rule
      is accepted.
      
      Now, consider it's a device file open check and the process belongs to
      cgroup B. The access will be generated as:
      
      	behavior: allow
      	exception: c 12:34 rw
      
      The very same chunk of code will allow it, even if there's an explicit
      exception saying otherwise.
      
      A simple test case:
      
      	# mkdir new_group
      	# cd new_group
      	# echo $$ >tasks
      	# echo "c 1:3 w" >devices.deny
      	# echo >/dev/null
      	# echo $?
      	0
      
      This is a serious bug and was introduced in
      
      	c39a2a30 devcg: prepare may_access() for hierarchy support
      
      To solve this problem, the device file open function was split from the
      new exception check.
      
      The second problem is how exceptions are processed by may_access(). The
      first part of that function tries to match fully with an existing
      exception:
      
      	list_for_each_entry_rcu(ex, &dev_cgroup->exceptions, list) {
      		if ((refex->type & DEV_BLOCK) && !(ex->type & DEV_BLOCK))
      			continue;
      		if ((refex->type & DEV_CHAR) && !(ex->type & DEV_CHAR))
      			continue;
      		if (ex->major != ~0 && ex->major != refex->major)
      			continue;
      		if (ex->minor != ~0 && ex->minor != refex->minor)
      			continue;
      		if (refex->access & (~ex->access))
      			continue;
      		match = true;
      		break;
      	}
      
      That means the new exception must be contained in an existing one to
      be considered a match:
      
      	New exception		Existing	match?	notes
      	b 12:34 rwm		b 12:34 rwm	yes
      	b 12:34 r		b *:34 rw	yes
      	b 12:34 rw		b 12:34 w	no	extra "r"
      	b *:34 rw		b 12:34 rw	no	too broad "*"
      	b *:34 rw		b *:34 rwm	yes
      
      Which is fine in some cases. Consider:
      
      	A	(default behavior: deny, exceptions allow access)
      	 \
      	  B	(default behavior: deny, exceptions allow access)
      
      In this case the full match makes sense: the new exception cannot add
      more access than the parent allows.
      
      But this doesn't always work. Consider:
      
      	A	(default behavior: allow, exceptions disallow access)
      	 \
      	  B	(default behavior: deny, exceptions allow access)
      
      In this case, a new exception in B shouldn't match any of the exceptions
      in A, after all you can't allow something that was forbidden by A. But
      consider this scenario:
      
      	New exception	Existing in A	match?	outcome
      	b 12:34 rw	b 12:34 r	no	exception is accepted
      
      Because the new exception has "w" as extra, it doesn't match, so it'll
      be added to B's exception list.
      
      The same problem can happen during a file access check. Consider a
      cgroup with allow as default behavior:
      
      	Access		Exception	match?
      	b 12:34 rw	b 12:34 r	no
      
      In this case, the access doesn't fully match any of the exceptions in
      the cgroup, so it is not denied, even though a match is required here
      because exceptions disallow access.
      
      To solve this problem, two new functions were created to match an
      exception either fully or partially. In the example above, a partial
      check will be performed and it'll produce a match since at least
      "b 12:34 r" from "b 12:34 rw" access matches.
      
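The difference between the two kinds of match can be sketched in userspace C. This is a simplified illustration, not the kernel code: the struct layout, the `ACC_*` and `ANY` names, and the `match_full()`/`match_partial()` function names are assumptions made for this example, and the device type is compared by equality rather than as a bitmask.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

/* Hypothetical userspace mirror of an exception entry (names are assumptions). */
enum { DEV_BLOCK = 1, DEV_CHAR = 2 };
enum { ACC_MKNOD = 1, ACC_READ = 2, ACC_WRITE = 4 };
#define ANY (~0u)	/* wildcard major/minor, written "*" in the rules */

struct exception {
	unsigned type;		/* DEV_BLOCK or DEV_CHAR */
	unsigned major, minor;	/* ANY means "*" */
	unsigned access;	/* bitmask of ACC_* */
};

/* Full match: refex must be entirely contained in ex. */
static bool match_full(const struct exception *ex, const struct exception *refex)
{
	assert(ex != NULL && refex != NULL);
	if (ex->type != refex->type)
		return false;
	if (ex->major != ANY && ex->major != refex->major)
		return false;
	if (ex->minor != ANY && ex->minor != refex->minor)
		return false;
	/* any access bit in refex that ex lacks breaks containment */
	return (refex->access & ~ex->access) == 0;
}

/* Partial match: ex and refex overlap in device and in at least one access bit. */
static bool match_partial(const struct exception *ex, const struct exception *refex)
{
	assert(ex != NULL && refex != NULL);
	if (ex->type != refex->type)
		return false;
	/* a wildcard on either side overlaps with any value */
	if (ex->major != ANY && refex->major != ANY && ex->major != refex->major)
		return false;
	if (ex->minor != ANY && refex->minor != ANY && ex->minor != refex->minor)
		return false;
	return (refex->access & ex->access) != 0;
}
```

With an existing exception "b 12:34 r" and an access "b 12:34 rw", `match_full()` reports no match (the extra "w"), while `match_partial()` reports a match because the "r" part overlaps.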
      Cc: cgroups@vger.kernel.org
      Cc: Tejun Heo <tj@kernel.org>
      Cc: Serge Hallyn <serge.hallyn@canonical.com>
      Cc: Li Zefan <lizefan@huawei.com>
      Signed-off-by: Aristeu Rozanski <arozansk@redhat.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    • spi: core: Ignore unsupported Dual/Quad Transfer Mode bits · 1735c34b
      Geert Uytterhoeven authored
      commit 83596fbe upstream.
      
      The availability of SPI Dual or Quad Transfer Mode as indicated by the
      "spi-tx-bus-width" and "spi-rx-bus-width" properties in the device tree is
      a hardware property of the SPI master, SPI slave, and board wiring.  Hence
      the SPI core should not reject an SPI slave because an SPI master driver
      doesn't (yet) support Dual or Quad Transfer Mode.
      
      Change the lack of Dual or Quad Transfer Mode support in the SPI master
      driver from an error condition to a warning condition, and ignore the
      unsupported mode bits, falling back to Single Transfer Mode, to avoid
      breakages when running old kernels with new device trees.
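The fallback described above can be sketched as a small bit-masking helper. This is an illustration only: the `MODE_*` flag values and the `sanitize_mode()` name are hypothetical and do not correspond to the SPI core's actual definitions.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Hypothetical mode flags for this sketch; not the kernel's actual values. */
#define MODE_TX_DUAL 0x1u
#define MODE_TX_QUAD 0x2u
#define MODE_RX_DUAL 0x4u
#define MODE_RX_QUAD 0x8u
#define MODE_WIDE_MASK (MODE_TX_DUAL | MODE_TX_QUAD | MODE_RX_DUAL | MODE_RX_QUAD)

/*
 * Drop Dual/Quad bits the master does not support and warn, instead of
 * rejecting the device. Returns the mode that will actually be used.
 */
static uint32_t sanitize_mode(uint32_t requested, uint32_t master_supported)
{
	uint32_t unsupported = requested & ~master_supported & MODE_WIDE_MASK;

	if (unsupported) {
		fprintf(stderr, "warning: ignoring unsupported mode bits 0x%x\n",
			(unsigned)unsupported);
		requested &= ~unsupported;	/* fall back to single transfer mode */
	}
	/* no unsupported Dual/Quad bit may survive */
	assert((requested & ~master_supported & MODE_WIDE_MASK) == 0);
	return requested;
}
```

Only the Dual/Quad bits are masked here; any other mode bit is passed through unchanged.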
      
      Fixes: f477b7fb (spi: DUAL and QUAD support)
      Signed-off-by: Geert Uytterhoeven <geert+renesas@glider.be>
      Signed-off-by: Mark Brown <broonie@linaro.org>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    • workqueue: fix a possible race condition between rescuer and pwq-release · afa532b6
      Lai Jiangshan authored
      commit 77668c8b upstream.
      
      There is a race condition between rescuer_thread() and
      pwq_unbound_release_workfn().
      
      Even after a pwq is scheduled for rescue, the associated work items
      may be consumed by any worker.  If all of them are consumed before the
      rescuer gets to them and the pwq's base ref was put due to attribute
      change, the pwq may be released while still being linked on
      @wq->maydays list making the rescuer dereference already freed pwq
      later.
      
      Make send_mayday() pin the target pwq until the rescuer is done with
      it.
      
      tj: Updated comment and patch description.
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    • workqueue: make rescuer_thread() empty wq->maydays list before exiting · 437042eb
      Lai Jiangshan authored
      commit 4d595b86 upstream.
      
      After a @pwq is scheduled for emergency execution, other workers may
      consume the affected work items before the rescuer gets to them.  This
      means that a workqueue may have pwqs queued on @wq->maydays list
      while not having any work item pending or in-flight.  If
      destroy_workqueue() executes in such condition, the rescuer may exit
      without emptying @wq->maydays.
      
      This currently doesn't cause any actual harm.  destroy_workqueue() can
      safely destroy all the involved data structures whether @wq->maydays
      is populated or not as nobody accesses the list once the rescuer exits.
      
      However, this is nasty and makes future development difficult.  Let's
      update rescuer_thread() so that it empties @wq->maydays after seeing
      should_stop to guarantee that the list is empty on rescuer exit.
      
      tj: Updated comment and patch description.
      Signed-off-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    • ARM: orion5x: fix target ID for crypto SRAM window · 21556235
      Thomas Petazzoni authored
      commit 1cc9d481 upstream.
      
      In commit 4ca2c040 ('ARM: orion5x: Move to ID based window creation'),
      the mach-orion5x code was changed to use the new mvebu-mbus API.
      However, in the process, a mistake was made on the crypto SRAM window
      target ID: it should have been 0x9 (verified in the datasheet) and not
      0x0.
      Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Acked-by: Sebastian Hesselbarth <sebastian.hesselbarth@gmail.com>
      Link: https://lkml.kernel.org/r/1397400006-4315-2-git-send-email-thomas.petazzoni@free-electrons.com
      Fixes: 4ca2c040 ('ARM: orion5x: Move to ID based window creation')
      Signed-off-by: Jason Cooper <jason@lakedaemon.net>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    • memory: mvebu-devbus: fix the conversion of the bus width · 03ad5dd8
      Thomas Petazzoni authored
      commit ce965c3d upstream.
      
      According to the Armada 370 and Armada XP datasheets, the part of the
      Device Bus register that configures the bus width should contain 0 for
      an 8-bit bus width, and 1 for a 16-bit bus width (other values are
      unsupported/reserved).
      
      However, the current conversion done in the driver to convert from a
      bus width in bits to the value expected by the register leads to
      setting the register to 1 for an 8-bit bus, and 2 for a 16-bit bus.
      
      This mistake was compensated for by a mistake in the existing Device
      Tree files for Armada 370/XP platforms: they were declaring an 8-bit
      bus width, while the hardware in fact uses a 16-bit bus width.
      
      This commit fixes that by adjusting the conversion logic.
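The encoding described above can be sketched as follows. The function names are hypothetical, and the old behaviour is modelled as `bits / 8` only because that reproduces the values quoted in this message (1 for 8 bits, 2 for 16 bits); the driver's actual expression is not shown here.

```c
#include <assert.h>
#include <stddef.h>

/*
 * Convert a bus width in bits to the register encoding from the datasheet:
 * 0 for 8 bits, 1 for 16 bits. Returns 0 on success, -1 for any other width.
 */
static int bus_width_to_reg(unsigned int bits, unsigned int *reg_val)
{
	assert(reg_val != NULL);
	switch (bits) {
	case 8:
		*reg_val = 0;
		return 0;
	case 16:
		*reg_val = 1;
		return 0;
	default:
		return -1;	/* unsupported/reserved; *reg_val is left untouched */
	}
}

/* Assumed model of the buggy conversion: 8 -> 1, 16 -> 2. */
static unsigned int bus_width_to_reg_old(unsigned int bits)
{
	return bits / 8;
}
```

Under this model, a Device Tree that wrongly declared 8 bits produced the register value 1, which happens to be the correct encoding for the 16-bit bus the hardware really uses.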
      
      This patch fixes a bug that was introduced in 3edad321 ('drivers:
      memory: Introduce Marvell EBU Device Bus driver'), which was merged in
      v3.11.
      Signed-off-by: Thomas Petazzoni <thomas.petazzoni@free-electrons.com>
      Link: https://lkml.kernel.org/r/1397489361-5833-2-git-send-email-thomas.petazzoni@free-electrons.com
      Fixes: 3edad321 ('drivers: memory: Introduce Marvell EBU Device Bus driver')
      Acked-by: Ezequiel Garcia <ezequiel.garcia@free-electrons.com>
      Acked-by: Gregory CLEMENT <gregory.clement@free-electrons.com>
      Signed-off-by: Jason Cooper <jason@lakedaemon.net>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    • [media] fc2580: fix tuning failure on 32-bit arch · b40eb04d
      Antti Palosaari authored
      commit 8845cc64 upstream.
      
      There were some frequency calculation overflows which caused tuning
      failure on 32-bit architectures. Use 64-bit numbers where needed in
      order to avoid calculation overflows.
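The class of bug can be illustrated with a generic scaling helper. This is a sketch under assumptions: the function names and the sample values are hypothetical and are not taken from the fc2580 driver.

```c
#include <assert.h>
#include <stdint.h>

/* 32-bit intermediate: the product wraps modulo 2^32 before the division. */
static uint32_t scale32(uint32_t freq_hz, uint32_t mult, uint32_t div)
{
	assert(div != 0);
	return (freq_hz * mult) / div;
}

/*
 * 64-bit intermediate: widen one operand first so the product cannot wrap.
 * The final quotient is assumed to fit in 32 bits.
 */
static uint32_t scale64(uint32_t freq_hz, uint32_t mult, uint32_t div)
{
	assert(div != 0);
	return (uint32_t)(((uint64_t)freq_hz * mult) / div);
}
```

For example, with a 500 MHz frequency, a multiplier of 16 and a divisor of 16, the product 8,000,000,000 exceeds 2^32, so `scale32()` returns a wrong value while `scale64()` returns 500,000,000.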
      
      Thanks to the Finnish person, who asked to remain anonymous, for
      reporting, testing and suggesting the fix.
      Signed-off-by: Antti Palosaari <crope@iki.fi>
      Signed-off-by: Mauro Carvalho Chehab <m.chehab@samsung.com>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    • workqueue: fix bugs in wq_update_unbound_numa() failure path · 47f445e1
      Daeseok Youn authored
      commit 77f300b1 upstream.
      
      wq_update_unbound_numa() failure path has the following two bugs.
      
      - alloc_unbound_pwq() is called without holding wq->mutex; however, if
        the allocation fails, it jumps to out_unlock which tries to unlock
        wq->mutex.
      
      - The function should switch to dfl_pwq on failure but didn't do so
        after alloc_unbound_pwq() failure.
      
      Fix it by regrabbing wq->mutex and jumping to use_dfl_pwq on
      alloc_unbound_pwq() failure.
      Signed-off-by: Daeseok Youn <daeseok.youn@gmail.com>
      Acked-by: Lai Jiangshan <laijs@cn.fujitsu.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Fixes: 4c16bd32 ("workqueue: implement NUMA affinity for unbound workqueues")
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    • percpu: make pcpu_alloc_chunk() use pcpu_mem_free() instead of kfree() · ed0781a6
      Jianyu Zhan authored
      commit 5a838c3b upstream.
      
      pcpu_chunk_struct_size = sizeof(struct pcpu_chunk) +
      	BITS_TO_LONGS(pcpu_unit_pages) * sizeof(unsigned long)
      
      It could hardly ever be bigger than PAGE_SIZE, even on a large-scale
      machine, but for consistency with its counterpart pcpu_mem_zalloc(),
      use pcpu_mem_free() instead.
      
      Commit b4916cb1 ("percpu: make pcpu_free_chunk() use
      pcpu_mem_free() instead of kfree()") addressed this problem, but
      missed this one.
      
      tj: commit message updated
      Signed-off-by: Jianyu Zhan <nasa4836@gmail.com>
      Signed-off-by: Tejun Heo <tj@kernel.org>
      Fixes: 099a19d9 ("percpu: allow limited allocation before slab is online")
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    • ima: audit log files opened with O_DIRECT flag · 46a91e71
      Mimi Zohar authored
      commit f9b2a735 upstream.
      
      Files are measured or appraised based on the IMA policy.  When a
      file, in policy, is opened with the O_DIRECT flag, a deadlock
      occurs.
      
      The first attempt at resolving this lockdep temporarily removed the
      O_DIRECT flag and restored it, after calculating the hash.  The
      second attempt introduced the O_DIRECT_HAVELOCK flag. Based on this
      flag, do_blockdev_direct_IO() would skip taking the i_mutex a second
      time.  The third attempt, by Dmitry Kasatkin, resolves the i_mutex
      locking issue, by re-introducing the IMA mutex, but uncovered
      another problem.  Reading a file with the O_DIRECT flag set writes
      directly to userspace pages.  A second patch allocates user-space-like
      memory.  This works for all IMA hooks, except ima_file_free(),
      which is called on __fput() to recalculate the file hash.
      
      Until this last issue is addressed, do not 'collect' the
      measurement for measuring, appraising, or auditing files opened
      with the O_DIRECT flag set.  Based on policy, permit or deny file
      access.  This patch defines a new IMA policy rule option named
      'permit_directio'.  Policy rules could be defined, based on LSM
      or other criteria, to permit specific applications to open files
      with the O_DIRECT flag set.
      
      Changelog v1:
      - permit or deny file access based on IMA policy rules
      Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
      Acked-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
      Cc: Tim Gardner <tim.gardner@canonical.com>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
    • ima: introduce ima_kernel_read() · 9c71c8f7
      Dmitry Kasatkin authored
      commit 0430e49b upstream.
      
      Commit 8aac6270 "move exit_task_namespaces() outside of exit_notify"
      introduced a kernel oops, present since kernel v3.10, which happens when
      Apparmor and IMA-appraisal are enabled at the same time.
      
      ----------------------------------------------------------------------
      [  106.750167] BUG: unable to handle kernel NULL pointer dereference at
      0000000000000018
      [  106.750221] IP: [<ffffffff811ec7da>] our_mnt+0x1a/0x30
      [  106.750241] PGD 0
      [  106.750254] Oops: 0000 [#1] SMP
      [  106.750272] Modules linked in: cuse parport_pc ppdev bnep rfcomm
      bluetooth rpcsec_gss_krb5 nfsd auth_rpcgss nfs_acl nfs lockd sunrpc
      fscache dm_crypt intel_rapl x86_pkg_temp_thermal intel_powerclamp
      kvm_intel snd_hda_codec_hdmi kvm crct10dif_pclmul crc32_pclmul
      ghash_clmulni_intel aesni_intel aes_x86_64 glue_helper lrw gf128mul
      ablk_helper cryptd snd_hda_codec_realtek dcdbas snd_hda_intel
      snd_hda_codec snd_hwdep snd_pcm snd_page_alloc snd_seq_midi
      snd_seq_midi_event snd_rawmidi psmouse snd_seq microcode serio_raw
      snd_timer snd_seq_device snd soundcore video lpc_ich coretemp mac_hid lp
      parport mei_me mei nbd hid_generic e1000e usbhid ahci ptp hid libahci
      pps_core
      [  106.750658] CPU: 6 PID: 1394 Comm: mysqld Not tainted 3.13.0-rc7-kds+ #15
      [  106.750673] Hardware name: Dell Inc. OptiPlex 9010/0M9KCM, BIOS A08
      09/19/2012
      [  106.750689] task: ffff8800de804920 ti: ffff880400fca000 task.ti:
      ffff880400fca000
      [  106.750704] RIP: 0010:[<ffffffff811ec7da>]  [<ffffffff811ec7da>]
      our_mnt+0x1a/0x30
      [  106.750725] RSP: 0018:ffff880400fcba60  EFLAGS: 00010286
      [  106.750738] RAX: 0000000000000000 RBX: 0000000000000100 RCX:
      ffff8800d51523e7
      [  106.750764] RDX: ffffffffffffffea RSI: ffff880400fcba34 RDI:
      ffff880402d20020
      [  106.750791] RBP: ffff880400fcbae0 R08: 0000000000000000 R09:
      0000000000000001
      [  106.750817] R10: 0000000000000000 R11: 0000000000000001 R12:
      ffff8800d5152300
      [  106.750844] R13: ffff8803eb8df510 R14: ffff880400fcbb28 R15:
      ffff8800d51523e7
      [  106.750871] FS:  0000000000000000(0000) GS:ffff88040d200000(0000)
      knlGS:0000000000000000
      [  106.750910] CS:  0010 DS: 0000 ES: 0000 CR0: 0000000080050033
      [  106.750935] CR2: 0000000000000018 CR3: 0000000001c0e000 CR4:
      00000000001407e0
      [  106.750962] Stack:
      [  106.750981]  ffffffff813434eb ffff880400fcbb20 ffff880400fcbb18
      0000000000000000
      [  106.751037]  ffff8800de804920 ffffffff8101b9b9 0001800000000000
      0000000000000100
      [  106.751093]  0000010000000000 0000000000000002 000000000000000e
      ffff8803eb8df500
      [  106.751149] Call Trace:
      [  106.751172]  [<ffffffff813434eb>] ? aa_path_name+0x2ab/0x430
      [  106.751199]  [<ffffffff8101b9b9>] ? sched_clock+0x9/0x10
      [  106.751225]  [<ffffffff8134a68d>] aa_path_perm+0x7d/0x170
      [  106.751250]  [<ffffffff8101b945>] ? native_sched_clock+0x15/0x80
      [  106.751276]  [<ffffffff8134aa73>] aa_file_perm+0x33/0x40
      [  106.751301]  [<ffffffff81348c5e>] common_file_perm+0x8e/0xb0
      [  106.751327]  [<ffffffff81348d78>] apparmor_file_permission+0x18/0x20
      [  106.751355]  [<ffffffff8130c853>] security_file_permission+0x23/0xa0
      [  106.751382]  [<ffffffff811c77a2>] rw_verify_area+0x52/0xe0
      [  106.751407]  [<ffffffff811c789d>] vfs_read+0x6d/0x170
      [  106.751432]  [<ffffffff811cda31>] kernel_read+0x41/0x60
      [  106.751457]  [<ffffffff8134fd45>] ima_calc_file_hash+0x225/0x280
      [  106.751483]  [<ffffffff8134fb52>] ? ima_calc_file_hash+0x32/0x280
      [  106.751509]  [<ffffffff8135022d>] ima_collect_measurement+0x9d/0x160
      [  106.751536]  [<ffffffff810b552d>] ? trace_hardirqs_on+0xd/0x10
      [  106.751562]  [<ffffffff8134f07c>] ? ima_file_free+0x6c/0xd0
      [  106.751587]  [<ffffffff81352824>] ima_update_xattr+0x34/0x60
      [  106.751612]  [<ffffffff8134f0d0>] ima_file_free+0xc0/0xd0
      [  106.751637]  [<ffffffff811c9635>] __fput+0xd5/0x300
      [  106.751662]  [<ffffffff811c98ae>] ____fput+0xe/0x10
      [  106.751687]  [<ffffffff81086774>] task_work_run+0xc4/0xe0
      [  106.751712]  [<ffffffff81066fad>] do_exit+0x2bd/0xa90
      [  106.751738]  [<ffffffff8173c958>] ? retint_swapgs+0x13/0x1b
      [  106.751763]  [<ffffffff8106780c>] do_group_exit+0x4c/0xc0
      [  106.751788]  [<ffffffff81067894>] SyS_exit_group+0x14/0x20
      [  106.751814]  [<ffffffff8174522d>] system_call_fastpath+0x1a/0x1f
      [  106.751839] Code: c3 0f 1f 44 00 00 55 48 89 e5 e8 22 fe ff ff 5d c3
      0f 1f 44 00 00 55 65 48 8b 04 25 c0 c9 00 00 48 8b 80 28 06 00 00 48 89
      e5 5d <48> 8b 40 18 48 39 87 c0 00 00 00 0f 94 c0 c3 0f 1f 80 00 00 00
      [  106.752185] RIP  [<ffffffff811ec7da>] our_mnt+0x1a/0x30
      [  106.752214]  RSP <ffff880400fcba60>
      [  106.752236] CR2: 0000000000000018
      [  106.752258] ---[ end trace 3c520748b4732721 ]---
      ----------------------------------------------------------------------
      
      The reason for the oops is that IMA-appraisal uses kernel_read() when a
      file is closed. kernel_read() honors the LSM security hook, which calls
      the Apparmor handler, which uses current->nsproxy->mnt_ns. The 'guilty'
      commit changed the order of the cleanup code so that nsproxy->mnt_ns was
      no longer available to Apparmor.
      
      Discussion about the issue with Al Viro and Eric W. Biederman suggested
      that kernel_read() is too high-level for IMA. Besides security checking,
      another issue identified is mandatory locking: kernel_read() honors it
      as well, which might prevent IMA from calculating the necessary hash.
      It was suggested to use a simplified version of the function without
      security and locking checks.
      
      This patch introduces a special version, ima_kernel_read(), which skips
      the security and mandatory locking checks. This prevents the kernel oops.
      Signed-off-by: Dmitry Kasatkin <d.kasatkin@samsung.com>
      Suggested-by: Eric W. Biederman <ebiederm@xmission.com>
      Signed-off-by: Mimi Zohar <zohar@linux.vnet.ibm.com>
      Cc: Tim Gardner <tim.gardner@canonical.com>
      Signed-off-by: Kamal Mostafa <kamal@canonical.com>
  2. 13 Jun, 2014 29 commits