- 01 Aug, 2016 6 commits
-
-
Xin Long authored
Prior to this patch, sctp defined TCP_CLOSING as SCTP_SS_CLOSING. TCP_CLOSING is such a special sk state in TCP that inet common codes even exclude it. For instance, inet_accept thinks the accept sk's state never be TCP_CLOSING, or it will give a WARN_ON. TCP works well with that while SCTP may trigger the call trace, as CLOSING state in SCTP has different meaning from TCP. This fix is to change to use TCP_CLOSE_WAIT as SCTP_SS_CLOSING, instead of TCP_CLOSING. Some side-effects could be expected, regardless of not being used before. inet_accept will accept it now. I did all the func_tests in lksctp-tools and ran sctp codnomicon fuzzer tests against this patch, no regression or failure found. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Alexander Stein authored
There are KSZ8721 PHYs with phy_id 0x00221619. In order to detect them as PHY_ID_KSZ8001 compatible while staying different to PHY_ID_KSZ9021 ignore the last two bits when matching PHY_ID Signed-off-by: Alexander Stein <alexanders83@web.de> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Chunhao Lin says: ==================== r8169: fix 3 runtime pm related issues. v2: use "struct device *d = &tp->pci_dev->dev" instead of "struct pci_dev *pdev = tp->pci_dev" v1: This series of patches fix 3 runtime pm related issues that are listed below. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Chun-Hao Lin authored
When there is no AC power, NIC may not work after changing mac address. Please refer to following link. http://www.spinics.net/lists/netdev/msg356572.html This issue is caused by runtime power management. When there is no AC power, if we put NIC down (ifconfig down), the driver will be in runtime suspend state and hardware will be put into D3 state. During this time, driver cannot access hardware regisers. So if you set new mac address during this time, it will not be set to hardware. After resume, NIC will keep using the old mac address and the network will not work normally. In this patch I add detecting runtime pm status when setting mac address. If driver is in runtime suspend state, it will skip setting mac address, keep the new mac address, and set the new mac address during runtime resume. Signed-off-by: Chunhao Lin <hau@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Chun-Hao Lin authored
Not to call rtl8169_update_counters() to dump tally counter when driver is in runtime suspend state. Calling rtl8169_update_counters() in runtime suspend state will produce warning message "rtl_counters_cond == 1 (loop: 1000, delay: 10)". Signed-off-by: Chunhao Lin <hau@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Chun-Hao Lin authored
NIC will be put into D3 state during runtime suspend state. When set or get hardware wol setting, driver will write or read hardware registers. If we set or get hardware wol setting in runtime suspend state, because NIC will in D3 state, the hardware registers read by driver will return all 0xff. That will let driver thinking register flag is not toggled and then prints the warning message "rtl_counters_cond == 1 (loop: 1000, delay: 10)" to kernel log. For fixing this issue, add checking driver's pm runtime status in rtl8169_get_wol() and rtl8169_set_wol(). Signed-off-by: Chunhao Lin <hau@realtek.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 31 Jul, 2016 26 commits
-
-
Florian Fainelli authored
In case we cannot complete bcm_sf2_sw_setup() for any reason, and we go to the out_unmap label, but the MDIO bus has not been registered yet, we will hit the BUG condition in drivers/net/phy/mdio_bus.c about the bus not being registered. Fix this by dedicating a specific lable for when we fail after the MDIO bus has been successfully registered. Fixes: 461cd1b0 ("net: dsa: bcm_sf2: Register our slave MDIO bus") Signed-off-by: Florian Fainelli <f.fainelli@gmail.com> Reviewed-by: Vivien Didelot <vivien.didelot@savoirfairelinux.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Colin Ian King authored
trivial fix to spelling mistake in printk message Signed-off-by: Colin Ian King <colin.king@canonical.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Xin Long authored
Commit 141ddefc ("sctp: change sk state to CLOSED instead of CLOSING in sctp_sock_migrate") changed sk state to CLOSED if the assoc is closed when sctp_accept clones a new sk. If there is still data in sk receive queue, users will not be able to read it any more, as sctp_recvmsg returns directly if sk state is CLOSED. This patch is to add CLOSED state check in sctp_recvmsg to allow reading data from TCP-style sk with CLOSED state as what TCP does. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Xin Long authored
Prior to this patch, once sctp received SHUTDOWN or shutdown with RD, sk->sk_shutdown would be set with RCV_SHUTDOWN, and all events would be dropped in sctp_ulpq_tail_event(). It would cause: 1. some notifications couldn't be received by users. like SCTP_SHUTDOWN_COMP generated by sctp_sf_do_4_C(). 2. sctp would also never trigger sk_data_ready when the association was closed, making it harder to identify the end of the association by calling recvmsg() and getting an EOF. It was not convenient for kernel users. The check here should be stopping delivering DATA chunks after receiving SHUTDOWN, and stopping delivering ANY chunks after sctp_close(). So this patch is to allow notifications to enqueue into receive queue even if sk->sk_shutdown is set to RCV_SHUTDOWN in sctp_ulpq_tail_event, but if sk->sk_shutdown == RCV_SHUTDOWN | SEND_SHUTDOWN, it drops all events. Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Xin Long authored
sctp needs to queue auth chunk back when we know that we are going to generate another segment. But commit f1533cce ("sctp: fix panic when sending auth chunks") requeues the last chunk processed which is probably not the auth chunk. It causes panic when calculating the MAC in sctp_auth_calculate_hmac(), as the incorrect offset of the auth chunk in skb->data. This fix is to requeue it by using packet->auth. Fixes: f1533cce ("sctp: fix panic when sending auth chunks") Signed-off-by: Xin Long <lucien.xin@gmail.com> Acked-by: Marcelo Ricardo Leitner <marcelo.leitner@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Soheil Hassas Yeganeh authored
tcp_select_initial_window() intends to advertise a window scaling for the maximum possible window size. To do so, it considers the maximum of net.ipv4.tcp_rmem[2] and net.core.rmem_max as the only possible upper-bounds. However, users with CAP_NET_ADMIN can use SO_RCVBUFFORCE to set the socket's receive buffer size to values larger than net.ipv4.tcp_rmem[2] and net.core.rmem_max. Thus, SO_RCVBUFFORCE is effectively ignored by tcp_select_initial_window(). To fix this, consider the maximum of net.ipv4.tcp_rmem[2], net.core.rmem_max and socket's initial buffer space. Fixes: b0573dea ("[NET]: Introduce SO_{SND,RCV}BUFFORCE socket options") Signed-off-by: Soheil Hassas Yeganeh <soheil@google.com> Suggested-by: Neal Cardwell <ncardwell@google.com> Acked-by: Neal Cardwell <ncardwell@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Sabrina Dubroca says: ==================== macsec: reference counting fixes Patch 1 adds explicit reference counting on RXSCs, instead of the current implicit reference counting using the RXSA's refcount. Patch 2 fixes possible kernel panics during module unload caused by an RCU callback that schedules another RCU callback, which the rcu_barrier() added in b196c22a ("macsec: add rcu_barrier() on module exit") didn't protect against. Patch 3 fixes a refcounting issue with the underlying device for a macsec device when link creation fails. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Sabrina Dubroca authored
When creation of a macsec device fails because an identical device already exists on this link, the current code decrements the refcnt on the parent link (in ->destructor for the macsec device), but it had not been incremented yet. Move the dev_hold(parent_link) call earlier during macsec device creation. Fixes: c09440f7 ("macsec: introduce IEEE 802.1AE driver") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Sabrina Dubroca authored
Following the previous patch, RXSCs are held and properly refcounted in the RX path (instead of being implicitly held by their SA), so the SA doesn't need to hold a reference on its parent RXSC. This also avoids panics on module unload caused by the double layer of RCU callbacks (call_rcu frees the RXSA, which puts the final reference on the RXSC and allows to free it in its own call_rcu) that commit b196c22a ("macsec: add rcu_barrier() on module exit") didn't protect against. There were also some refcounting bugs in macsec_add_rxsa where I didn't put the reference on the RXSC on the error paths, which would lead to memory leaks. Fixes: c09440f7 ("macsec: introduce IEEE 802.1AE driver") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Sabrina Dubroca authored
Currently, we lookup the RXSC without taking a reference on it. The RXSA holds a reference on the RXSC, but the SA and SC could still both disappear before we take a reference on the SA. Take a reference on the RXSC in macsec_handle_frame. Fixes: c09440f7 ("macsec: introduce IEEE 802.1AE driver") Signed-off-by: Sabrina Dubroca <sd@queasysnail.net> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Grygorii Strashko says: ==================== drivers: net: cpsw: fix driver loading/unloading This series fixes set of isssues observed when CPSW driver module is unloaded/loaded: 1) rmmod: deadlock in cpdma_ctlr_destroy 2) rmmod: L3 back-trace and crash if all net interfaces are down, because CPSW can be powerred down by PM runtime in this case. 3) insmod: mdio device is not recreated on next insmod - need to use of_platform_depopulate() in cpsw_remove(). 4) rmmod: system crash on omap_device removal Tested on: am437x-idk, am57xx-beagle-x15 Changes in v2: - build warning fixed - added fix for correct omap_device removal Link on v1: https://lkml.org/lkml/2016/7/22/240 ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Grygorii Strashko authored
Below call chain causes system crash when OMAP device is removed by calling of_platform_depopulate()/device_del(): device_del() - blocking_notifier_call_chain(&dev->bus->p->bus_notifier, BUS_NOTIFY_DEL_DEVICE, dev); - _omap_device_notifier_call() - omap_device_delete() - od->pdev->archdata.od = NULL; kfree(od->hwmods); kfree(od); - bus_remove_device() - device_release_driver() - __device_release_driver() - pm_runtime_get_sync() - _od_runtime_resume() - omap_hwmod_enable() <- OOPS od's delted already Backtrace: Unable to handle kernel NULL pointer dereference at virtual address 0000000d pgd = eb100000 [0000000d] *pgd=ad6e1831, *pte=00000000, *ppte=00000000 Internal error: Oops: 17 [#1] PREEMPT SMP ARM CPU: 1 PID: 1273 Comm: modprobe Not tainted 4.4.15-rt19-00115-ge4d3cd3-dirty #68 Hardware name: Generic DRA74X (Flattened Device Tree) task: eb1ee800 ti: ec962000 task.ti: ec962000 PC is at omap_device_enable+0x10/0x90 LR is at _od_runtime_resume+0x10/0x24 [...] [<c00299dc>] (omap_device_enable) from [<c0029a6c>] (_od_runtime_resume+0x10/0x24) [<c0029a6c>] (_od_runtime_resume) from [<c04ad404>] (__rpm_callback+0x20/0x34) [<c04ad404>] (__rpm_callback) from [<c04ad438>] (rpm_callback+0x20/0x80) [<c04ad438>] (rpm_callback) from [<c04aee28>] (rpm_resume+0x48c/0x964) [<c04aee28>] (rpm_resume) from [<c04af360>] (__pm_runtime_resume+0x60/0x88) [<c04af360>] (__pm_runtime_resume) from [<c04a4974>] (__device_release_driver+0x30/0x100) [<c04a4974>] (__device_release_driver) from [<c04a4a60>] (device_release_driver+0x1c/0x28) [<c04a4a60>] (device_release_driver) from [<c04a38c0>] (bus_remove_device+0xec/0x144) [<c04a38c0>] (bus_remove_device) from [<c04a0764>] (device_del+0x10c/0x210) [<c04a0764>] (device_del) from [<c04a67b0>] (platform_device_del+0x18/0x84) [<c04a67b0>] (platform_device_del) from [<c04a6828>] (platform_device_unregister+0xc/0x20) [<c04a6828>] (platform_device_unregister) from [<c05adcfc>] (of_platform_device_destroy+0x8c/0x90) [<c05adcfc>] (of_platform_device_destroy) from [<c04a02f0>] (device_for_each_child+0x4c/0x78) [<c04a02f0>] (device_for_each_child) from [<c05adc5c>] (of_platform_depopulate+0x30/0x44) [<c05adc5c>] (of_platform_depopulate) from [<bf123920>] (cpsw_remove+0x68/0xf4 [ti_cpsw]) [<bf123920>] (cpsw_remove [ti_cpsw]) from [<c04a68d8>] (platform_drv_remove+0x24/0x3c) [<c04a68d8>] (platform_drv_remove) from [<c04a49c8>] (__device_release_driver+0x84/0x100) [<c04a49c8>] (__device_release_driver) from [<c04a4b20>] (driver_detach+0xac/0xb0) [<c04a4b20>] (driver_detach) from [<c04a3be8>] (bus_remove_driver+0x60/0xd4) [<c04a3be8>] (bus_remove_driver) from [<c00d9870>] (SyS_delete_module+0x184/0x20c) [<c00d9870>] (SyS_delete_module) from [<c0010540>] (ret_fast_syscall+0x0/0x1c) Code: e3500000 e92d4070 1590630c 01a06000 (e5d6300d) Hence, fix it by using BUS_NOTIFY_REMOVED_DEVICE event for OMAP device deletion which is sent when DD has finished processing of device deletion. Cc: Tony Lindgren <tony@atomide.com> Cc: Tero Kristo <t-kristo@ti.com> Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Grygorii Strashko authored
Use of_platform_depopulate() in cpsw_remove() instead of of_device_unregister(), because CSPW child devices will not be recreated otherwise on next insmod. of_platform_depopulate() is correct way now as it will ensure that all steps done in of_platform_populate() are reverted, including cleaning up of OF_POPULATED flag. Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Reviewed-by: Mugunthan V N <mugunthanvnm@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Grygorii Strashko authored
The L3 error will be generated and system will crash during unloading of CPSW driver if CPSW is used as module and ethX devices are down. This happens because CPSW can be power off by PM runtime now when ethX devices are down. Hence, ensure that CPSW powered up by PM runtime before performing any deinitialization actions which require CPSW registers access. In case of PM runtime error just leave cpsw_remove() as we can't do anything anymore. Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Reviewed-by: Mugunthan V N <mugunthanvnm@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Grygorii Strashko authored
Fix deadlock in cpdma_ctlr_destroy() which is triggered now on cpsw module removal: cpsw_remove() - cpdma_ctlr_destroy() - spin_lock_irqsave(&ctlr->lock, flags) - cpdma_ctlr_stop() - spin_lock_irqsave(&ctlr->lock, flags); - cpdma_chan_destroy() - spin_lock_irqsave(&ctlr->lock, flags); The issue has not been observed before because CPDMA channels have been destroyed manually by CPSW until commit d941ebe8 ("net: ethernet: ti: cpsw: use destroy ctlr to destroy channels") was merged. Signed-off-by: Grygorii Strashko <grygorii.strashko@ti.com> Reviewed-by: Mugunthan V N <mugunthanvnm@ti.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Wei Yongjun authored
Using list_move() instead of list_del() + list_add(). Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Hariprasad Shenai authored
The commit 637d3e99 ("cxgb4: Discard the packet if the length is greater than mtu") introduced a regression in the VLAN interface performance when Tx VLAN offload is disabled. Check if skb is tagged, regardless of whether it is hardware accelerated or not. Presently we were checking only for hardware acclereated one, which caused performance to drop to ~0.17Mbps on a 10GbE adapter for VLAN interface, when tx vlan offload is turned off using ethtool. The ethernet head length calculation was going wrong in this case, and driver ended up dropping packets. Fixes: 637d3e99 ("cxgb4: Discard the packet if the length is greater than mtu") Signed-off-by: Hariprasad Shenai <hariprasad@chelsio.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Wei Yongjun authored
There is a error message within devm_ioremap_resource already, so remove the dev_err call to avoid redundant error message. Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com> Acked-By: Iyappan Subramanian <isubramanian@apm.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Wei Yongjun authored
In the error handling case of nla_nest_start() failed read_unlock_bh() is called to unlock a lock that had not been taken yet. sparse warns about the context imbalance as the following: net/tipc/monitor.c:799:23: warning: context imbalance in '__tipc_nl_add_monitor' - different lock contexts for basic block Fixes: cf6f7e1d ('tipc: dump monitor attributes') Signed-off-by: Wei Yongjun <weiyj.lk@gmail.com> Acked-by: Ying Xue <ying.xue@windriver.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Yuval Mintz says: ==================== qed*: Small fixes series This contains several small [and straight-forward] fixes to qed* drivers. Please consider applying this to `net'. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yuval Mintz authored
Each PF/VF has a limited number of vlan filters for configuration purposes; This information is passed to qede and is used to prevent over-usage - once a vlan is to be configured and no filter credit is available, the driver would switch into working in vlan-promisc mode. Problem is the credit pool is shared by both PFs and VFs, and currently PFs aren't deducting the filters that are reserved for their VFs from their quota, which may lead to some vlan filters failing unknowingly due to lack of credit. Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yuval Mintz authored
Driver uses reverse logic when checking if minimum bandwidth configuration applied, causing it to configure the guarantee only on the first hw-function. Fixes: a0d26d5a ("qed*: Don't reset statistics on inner reload") Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yuval Mintz authored
Adding the necessary logic to prevet statistics reset on inner-reload introduced a bug, and now statistics are reset only when re-probing the driver. Fixes: a0d26d5a ("qed*: Don't reset statistics on inner reload") Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yuval Mintz authored
Before requesting the firmware to start Rx queues, driver goes and sets the queue producer in the device to 0. But while the producer is 32-bit, the driver currently clears 64 bits, effectively zeroing an additional CID's producer as well. Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yuval Mintz authored
Driver has reverse logic for checking the result of the spoof-checking configuration. As a result, it would log that the configuration failed [even though it succeeded], and will no longer do anything when requested to remove the configuration, as it's accounting of the feature will be incorrect. Fixes: 6ddc7608 ("qed*: IOV support spoof-checking") Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Yuval Mintz authored
As part of ndo_vlan_rx_kill_vid() implementation, qede is requesting firmware to remove the vlan filter. This currently happens even if the vlan wasn't previously added [In case device ran out of vlan credits]. For PFs this doesn't cause any issues as the firmware would simply ignore the removal request. But for VFs their parent PF is holding an accounting of the configured vlans, and such a request would cause the PF to fail the VF's removal request. Simply fix this for both PFs & VFs and don't remove filters that were not previously added. Fixes: 7c1bfcad ("qede: Add vlan filtering offload support") Signed-off-by: Yuval Mintz <Yuval.Mintz@qlogic.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
- 30 Jul, 2016 2 commits
-
-
git://git.infradead.org/users/pcmoore/auditLinus Torvalds authored
Pull audit updates from Paul Moore: "Six audit patches for 4.8. There are a couple of style and minor whitespace tweaks for the logs, as well as a minor fixup to catch errors on user filter rules, however the major improvements are a fix to the s390 syscall argument masking code (reviewed by the nice s390 folks), some consolidation around the exclude filtering (less code, always a win), and a double-fetch fix for recording the execve arguments" * 'stable-4.8' of git://git.infradead.org/users/pcmoore/audit: audit: fix a double fetch in audit_log_single_execve_arg() audit: fix whitespace in CWD record audit: add fields to exclude filter by reusing user filter s390: ensure that syscall arguments are properly masked on s390 audit: fix some horrible switch statement style crimes audit: fixup: log on errors from filter user rules
-
git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-securityLinus Torvalds authored
Pull security subsystem updates from James Morris: "Highlights: - TPM core and driver updates/fixes - IPv6 security labeling (CALIPSO) - Lots of Apparmor fixes - Seccomp: remove 2-phase API, close hole where ptrace can change syscall #" * 'next' of git://git.kernel.org/pub/scm/linux/kernel/git/jmorris/linux-security: (156 commits) apparmor: fix SECURITY_APPARMOR_HASH_DEFAULT parameter handling tpm: Add TPM 2.0 support to the Nuvoton i2c driver (NPCT6xx family) tpm: Factor out common startup code tpm: use devm_add_action_or_reset tpm2_i2c_nuvoton: add irq validity check tpm: read burstcount from TPM_STS in one 32-bit transaction tpm: fix byte-order for the value read by tpm2_get_tpm_pt tpm_tis_core: convert max timeouts from msec to jiffies apparmor: fix arg_size computation for when setprocattr is null terminated apparmor: fix oops, validate buffer size in apparmor_setprocattr() apparmor: do not expose kernel stack apparmor: fix module parameters can be changed after policy is locked apparmor: fix oops in profile_unpack() when policy_db is not present apparmor: don't check for vmalloc_addr if kvzalloc() failed apparmor: add missing id bounds check on dfa verification apparmor: allow SYS_CAP_RESOURCE to be sufficient to prlimit another task apparmor: use list_next_entry instead of list_entry_next apparmor: fix refcount race when finding a child profile apparmor: fix ref count leak when profile sha1 hash is read apparmor: check that xindex is in trans_table bounds ...
-
- 29 Jul, 2016 6 commits
-
-
git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespaceLinus Torvalds authored
Pull userns vfs updates from Eric Biederman: "This tree contains some very long awaited work on generalizing the user namespace support for mounting filesystems to include filesystems with a backing store. The real world target is fuse but the goal is to update the vfs to allow any filesystem to be supported. This patchset is based on a lot of code review and testing to approach that goal. While looking at what is needed to support the fuse filesystem it became clear that there were things like xattrs for security modules that needed special treatment. That the resolution of those concerns would not be fuse specific. That sorting out these general issues made most sense at the generic level, where the right people could be drawn into the conversation, and the issues could be solved for everyone. At a high level what this patchset does a couple of simple things: - Add a user namespace owner (s_user_ns) to struct super_block. - Teach the vfs to handle filesystem uids and gids not mapping into to kuids and kgids and being reported as INVALID_UID and INVALID_GID in vfs data structures. By assigning a user namespace owner filesystems that are mounted with only user namespace privilege can be detected. This allows security modules and the like to know which mounts may not be trusted. This also allows the set of uids and gids that are communicated to the filesystem to be capped at the set of kuids and kgids that are in the owning user namespace of the filesystem. One of the crazier corner casees this handles is the case of inodes whose i_uid or i_gid are not mapped into the vfs. Most of the code simply doesn't care but it is easy to confuse the inode writeback path so no operation that could cause an inode write-back is permitted for such inodes (aka only reads are allowed). This set of changes starts out by cleaning up the code paths involved in user namespace permirted mounts. Then when things are clean enough adds code that cleanly sets s_user_ns. Then additional restrictions are added that are possible now that the filesystem superblock contains owner information. These changes should not affect anyone in practice, but there are some parts of these restrictions that are changes in behavior. - Andy's restriction on suid executables that does not honor the suid bit when the path is from another mount namespace (think /proc/[pid]/fd/) or when the filesystem was mounted by a less privileged user. - The replacement of the user namespace implicit setting of MNT_NODEV with implicitly setting SB_I_NODEV on the filesystem superblock instead. Using SB_I_NODEV is a stronger form that happens to make this state user invisible. The user visibility can be managed but it caused problems when it was introduced from applications reasonably expecting mount flags to be what they were set to. There is a little bit of work remaining before it is safe to support mounting filesystems with backing store in user namespaces, beyond what is in this set of changes. - Verifying the mounter has permission to read/write the block device during mount. - Teaching the integrity modules IMA and EVM to handle filesystems mounted with only user namespace root and to reduce trust in their security xattrs accordingly. - Capturing the mounters credentials and using that for permission checks in d_automount and the like. (Given that overlayfs already does this, and we need the work in d_automount it make sense to generalize this case). Furthermore there are a few changes that are on the wishlist: - Get all filesystems supporting posix acls using the generic posix acls so that posix_acl_fix_xattr_from_user and posix_acl_fix_xattr_to_user may be removed. [Maintainability] - Reducing the permission checks in places such as remount to allow the superblock owner to perform them. - Allowing the superblock owner to chown files with unmapped uids and gids to something that is mapped so the files may be treated normally. I am not considering even obvious relaxations of permission checks until it is clear there are no more corner cases that need to be locked down and handled generically. Many thanks to Seth Forshee who kept this code alive, and putting up with me rewriting substantial portions of what he did to handle more corner cases, and for his diligent testing and reviewing of my changes" * 'for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/ebiederm/user-namespace: (30 commits) fs: Call d_automount with the filesystems creds fs: Update i_[ug]id_(read|write) to translate relative to s_user_ns evm: Translate user/group ids relative to s_user_ns when computing HMAC dquot: For now explicitly don't support filesystems outside of init_user_ns quota: Handle quota data stored in s_user_ns in quota_setxquota quota: Ensure qids map to the filesystem vfs: Don't create inodes with a uid or gid unknown to the vfs vfs: Don't modify inodes with a uid or gid unknown to the vfs cred: Reject inodes with invalid ids in set_create_file_as() fs: Check for invalid i_uid in may_follow_link() vfs: Verify acls are valid within superblock's s_user_ns. userns: Handle -1 in k[ug]id_has_mapping when !CONFIG_USER_NS fs: Refuse uid/gid changes which don't map into s_user_ns selinux: Add support for unprivileged mounts from user namespaces Smack: Handle labels consistently in untrusted mounts Smack: Add support for unprivileged mounts from user namespaces fs: Treat foreign mounts as nosuid fs: Limit file caps to the user namespace of the super block userns: Remove the now unnecessary FS_USERNS_DEV_MOUNT flag userns: Remove implicit MNT_NODEV fragility. ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pmLinus Torvalds authored
Pull power management fix from Rafael Wysocki: "Fix a nasty (and really hard to debug) memory corruption during resume from hibernation on x86-64 (that leads to a kernel panic most of the time) due to the use of a stale stack pointer value in FRAME_BEGIN (Josh Poimboeuf)" * tag 'pm-urgent-4.8-rc1' of git://git.kernel.org/pub/scm/linux/kernel/git/rafael/linux-pm: x86/power/64: Fix hibernation return address corruption
-
git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroupLinus Torvalds authored
Pull more cgroup updates from Tejun Heo: "I forgot to include the patches which got applied to for-4.7-fixes late during last cycle. Eric's three patches fix bugs introduced with the namespace support" * 'for-4.7-fixes' of git://git.kernel.org/pub/scm/linux/kernel/git/tj/cgroup: cgroupns: Only allow creation of hierarchies in the initial cgroup namespace cgroupns: Close race between cgroup_post_fork and copy_cgroup_ns cgroupns: Fix the locking in copy_cgroup_ns
-
git://git.kernel.org/pub/scm/linux/kernel/git/tip/tipLinus Torvalds authored
Pull smp hotplug updates from Thomas Gleixner: "This is the next part of the hotplug rework. - Convert all notifiers with a priority assigned - Convert all CPU_STARTING/DYING notifiers The final removal of the STARTING/DYING infrastructure will happen when the merge window closes. Another 700 hundred line of unpenetrable maze gone :)" * 'smp-hotplug-for-linus' of git://git.kernel.org/pub/scm/linux/kernel/git/tip/tip: (70 commits) timers/core: Correct callback order during CPU hot plug leds/trigger/cpu: Move from CPU_STARTING to ONLINE level powerpc/numa: Convert to hotplug state machine arm/perf: Fix hotplug state machine conversion irqchip/armada: Avoid unused function warnings ARC/time: Convert to hotplug state machine clocksource/atlas7: Convert to hotplug state machine clocksource/armada-370-xp: Convert to hotplug state machine clocksource/exynos_mct: Convert to hotplug state machine clocksource/arm_global_timer: Convert to hotplug state machine rcu: Convert rcutree to hotplug state machine KVM/arm/arm64/vgic-new: Convert to hotplug state machine smp/cfd: Convert core to hotplug state machine x86/x2apic: Convert to CPU hotplug state machine profile: Convert to hotplug state machine timers/core: Convert to hotplug state machine hrtimer: Convert to hotplug state machine x86/tboot: Convert to hotplug state machine arm64/armv8 deprecated: Convert to hotplug state machine hwtracing/coresight-etm4x: Convert to hotplug state machine ...
-
git://git.kernel.org/pub/scm/linux/kernel/git/davem/ideLinus Torvalds authored
Pull IDE updates from David Miller: "Just a couple small bug fixes, nothing overly exciting in here" * git://git.kernel.org/pub/scm/linux/kernel/git/davem/ide: ide: missing break statement in set_timings_mdma() ide: hpt366: fix incorrect mask when checking at cmd_high_time ide-tape: fix misprint in failure handling in idetape_init() cmd640: add __init attribute
-
git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparcLinus Torvalds authored
Pull sparc updates from David Miller: 1) Double spin lock bug in sunhv serial driver, from Dan Carpenter. 2) Use correct RSS estimate when determining whether to grow the huge TSB or not, from Mike Kravetz. 3) Don't use full three level page tables for hugepages, PMD level is sufficient. From Nitin Gupta. 4) Mask out extraneous bits from TSB_TAG_ACCESS register, we only want the address bits. * git://git.kernel.org/pub/scm/linux/kernel/git/davem/sparc: sparc64: Trim page tables for 8M hugepages sparc64 mm: Fix base TSB sizing when hugetlb pages are used sparc: serial: sunhv: fix a double lock bug sparc32: off by ones in BUG_ON() sparc: Don't leak context bits into thread->fault_address
-