- 19 Feb, 2017 13 commits
-
Benjamin Poirier authored
The following message is logged from time to time when using i40e: "NOHZ: local_softirq_pending 08". i40e may schedule napi from a workqueue. Afterwards, softirqs are not run in a deterministic time frame. The problem is the same as what was described in commit ec13ee80 ("virtio_net: invoke softirqs after __napi_schedule") and this patch applies the same fix to i40e. Signed-off-by: Benjamin Poirier <bpoirier@suse.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Carolyn Wyborny authored
Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Jacob Keller authored
Fix, or rather, avoid a sparse warning caused by the fact that csum_replace_by_diff expects to receive a __wsum value. Since the calculation appears to work, simply typecast the passed paylen value to __wsum to avoid the warning. This seems pretty fishy since __wsum was obviously annotated as a separate type on purpose, so this throws the entire calculation into question. Since it currently appears to behave as expected, the typecast is probably safe. Change-ID: I4fdc5cddd589abc16098176e8a61127e761488f4 Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Harshitha Ramamurthy authored
There exists an intermittent bug which causes the 'Link Detected' field reported by the 'ethtool <iface>' command to be 'Yes' when in fact there is no link. This patch fixes the problem by enabling temporary link polling when i40e_get_link_status returns an error. The driver then remembers that an admin queue command failed and keeps polling until the function returns successfully. Change-Id: I64c69b008db4017b8729f3fc27b8f65c8fe2eaa0 Signed-off-by: Harshitha Ramamurthy <harshitha.ramamurthy@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Jacob Keller authored
This ensures that the pvid which is stored in __le16 format is converted to the CPU format. This will fix comparison issues on Big Endian platforms. Change-ID: I92c80d1315dc2a0f9f095d5a0c48d461beb052ed Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
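As a rough userspace illustration of why the conversion matters, the sketch below decodes a little-endian 16-bit field byte by byte, so the result is the same on any host. It does not use the kernel's le16_to_cpu(); the helper names le16_bytes_to_cpu and raw_reinterpret and the sample values are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical stand-in for a little-endian-to-CPU conversion: decode a
 * 16-bit little-endian field into host byte order, whatever the host is. */
static uint16_t le16_bytes_to_cpu(const uint8_t bytes[2])
{
	return (uint16_t)(bytes[0] | ((uint16_t)bytes[1] << 8));
}

/* What a comparison without conversion effectively does: reinterpret the
 * raw storage as a host-order integer. This matches the decoded value only
 * on little-endian CPUs. */
static uint16_t raw_reinterpret(const uint8_t bytes[2])
{
	uint16_t v;

	memcpy(&v, bytes, sizeof(v));
	return v;
}
```

On a little-endian host both helpers agree; on a big-endian host raw_reinterpret() returns the byte-swapped value, which is the kind of mismatch a comparison against a CPU-order VLAN ID would hit.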
-
Jacob Keller authored
On Big Endian platforms we would calculate the wrong switch id since we did not properly convert the le16 value into CPU format. Caught by sparse. Change-ID: I69a2f9fa064a0a91691f7d0e6fcc206adceb8e36 Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Alan Brady authored
This patch refactors the '%*ph' printk format specifier to instead use the print_hex_dump function, as recommended by the '%*ph' documentation. This produces better/more standardized output. Change-ID: Id56700b4e8abc40ff8c04bc8379e7df04cb4d6fd Signed-off-by: Alan Brady <alan.brady@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Carolyn Wyborny authored
This patch fixes a bug introduced with the addition of the per queue ITR feature support in ethtool. With that addition, functions were added which converted the ITR settings to binary values. The IS_ENABLED macros that run on those values check whether a bit is set or not, and with the value being binary, the bit check always returned ITR disabled, which prevented any updating of the ITR rate. This patch fixes the problem by changing the functions to return the current ITR value instead and renaming them to better reflect their function. These functions now provide a value which will be accurately assessed and update the ITR as intended. Change-ID: I14f1d088d052e27f652aaa3113e186415ddea1fc Signed-off-by: Carolyn Wyborny <carolyn.wyborny@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
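The failure mode described above can be sketched in a few lines. This is an illustration only: the flag value, the macro name ITR_IS_DYNAMIC and both helper functions are assumptions made for the example, not the driver's actual definitions.

```c
#include <stdint.h>

/* Hypothetical flag bit marking an ITR setting as enabled/dynamic. */
#define ITR_DYNAMIC_FLAG 0x8000u
#define ITR_IS_DYNAMIC(x) (((x) & ITR_DYNAMIC_FLAG) != 0)

/* Buggy shape: collapses the setting to 0 or 1 before the caller tests it,
 * so the flag bit is gone by the time the macro looks for it. */
static uint16_t itr_as_binary(uint16_t setting)
{
	return ITR_IS_DYNAMIC(setting) ? 1 : 0;
}

/* Fixed shape: hands back the current ITR value unchanged, so the macro
 * still sees the flag bit. */
static uint16_t itr_current(uint16_t setting)
{
	return setting;
}
```

With a setting of 0x8000 | 50, ITR_IS_DYNAMIC(itr_as_binary(setting)) is false even though the flag is set, while ITR_IS_DYNAMIC(itr_current(setting)) is true.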
-
Mitch Williams authored
Add a comment to reduce confusion. Change-ID: I3d5819c0f3f5174680442ae54398a073d4a61f4f Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Mitch Williams authored
When i40evf_remove() calls netdev close, the device doesn't actually close; it schedules the work for the watchdog to perform. Since we're stopping the watchdog, this work doesn't get done. However, we're resetting the part, so we can free resources after the reset request has gone through. This plugs a memory leak. Change-ID: Id5335dcaf76ce00d2a4c3d26e9faf711d7f051cf Signed-off-by: Mitch Williams <mitch.a.williams@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Jacob Keller authored
This call is made just prior to running i40e_link_event. In i40e_link_event, we set hw->phy.get_link_info to true just prior to calling i40e_get_link_status, which conveniently runs i40e_update_link_info for us. Thus, we are running i40e_update_link_info twice, which seems like something we don't need to do... Change-ID: I36467a570f44b7546d218c99e134ff97c2709315 Signed-off-by: Jacob Keller <jacob.e.keller@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Joshua Hay authored
This patch adds a call to the mac_address_write admin q function during power down to update the PRTPM_SAH/SAL registers with the MC_MAG_EN bit thus enabling multicast magic packet wakeup. A FW workaround is needed to write the multicast magic wake up enable bit in the PRTPM_SAH register. The FW expects the mac address write admin q cmd to be called first with one of the WRITE_TYPE_LAA flags and then with the multicast relevant flags. *Note: This solution only works for X722 devices currently. A PFR will clear the previously mentioned bit by default, but X722 has support for a WOL_PRESERVE_ON_PFR flag which prevents the bit from being cleared. Once other devices support this flag, this solution should work as well. Change-ID: I51bd5b8535bd9051c2676e27c999c1657f786827 Signed-off-by: Joshua Hay <joshua.a.hay@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
Alan Brady authored
There exists a bug in which the driver is unable to exit overflow promiscuous mode after having added "too many" mac filters. It is expected that after triggering overflow promiscuous, removing the failed/extra filters should then disable overflow promiscuous mode. The bug exists because we were intentionally skipping the sync_vsi_filter path in cases where we were removing failed filters, since they shouldn't have been added to the firmware in the first place. However, we still need to go through the sync_vsi_filter code path to determine whether or not it is ok to exit overflow promiscuous mode. This patch fixes the bug by making sure we go through the sync_vsi_filter path in cases of failed filters. Change-ID: I634d249ca3e5fa50729553137c295e73e7722143 Signed-off-by: Alan Brady <alan.brady@intel.com> Tested-by: Andrew Bowers <andrewx.bowers@intel.com> Signed-off-by: Jeff Kirsher <jeffrey.t.kirsher@intel.com>
-
- 17 Feb, 2017 27 commits
-
Eric Dumazet authored
sk_page_frag_refill() allocates either a compound page or an order-0 page. We can use page_ref_inc(), which is slightly faster than get_page(). Signed-off-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Cui, Cheng authored
tcp: accommodate sequence number to a peer's shrunk receive window caused by precision loss in window scaling Prevent sending out a left-shifted sequence number from a Linux sender in response to a peer's shrunk receive-window caused by losing least significant bits in window-scaling. Cc: "David S. Miller" <davem@davemloft.net> Cc: Alexey Kuznetsov <kuznet@ms2.inr.ac.ru> Cc: James Morris <jmorris@namei.org> Cc: Hideaki YOSHIFUJI <yoshfuji@linux-ipv6.org> Cc: Patrick McHardy <kaber@trash.net> Signed-off-by: Cheng Cui <Cheng.Cui@netapp.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
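The precision loss mentioned above comes from the window being carried on the wire as a right-shifted value. The sketch below shows only that arithmetic, not the patch itself; the function names are hypothetical.

```c
#include <stdint.h>

/* Window as the peer can express it on the wire: the low `wscale` bits of
 * the real window are dropped by the right shift and come back as zeros. */
static uint32_t advertised_window(uint32_t real_window, unsigned int wscale)
{
	return (real_window >> wscale) << wscale;
}

/* Bytes of window lost to scaling precision. */
static uint32_t window_precision_loss(uint32_t real_window, unsigned int wscale)
{
	return real_window - advertised_window(real_window, wscale);
}
```

For example, with a scale factor of 7 a real window of 1000 bytes is seen by the sender as 896 bytes, so the right edge of the window appears to move left by 104 bytes even though the receiver did not intend to shrink it.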
-
David S. Miller authored
Edward Cree says: ==================== sfc: misc. fixes Three largely unrelated fixes to increase robustness in rare edge cases. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Peter Dunning authored
efx_start_all can return without initialising queues as a reset is pending. This means that when netif_device_attach is called, the kernel can start sending traffic without having an initialised TX queue to send to. This patch avoids this by not calling netif_device_attach if there is a pending reset. Fixes: e283546c ("sfc:On MCDI timeout, issue an FLR (and mark MCDI to fail-fast)") Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Bert Kenward authored
If the hw doesn't think they exist, we should defer to its authority. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Jon Cooper authored
On EF10, hardware filter IDs are 13 bits, but in some places we store 32-bit "full filter IDs" in which higher order bits encode the filter match-priority. This could cause a filter to have a full filter ID of 0xffff, which is also the value EFX_EF10_FILTER_ID_INVALID which we use in 16-bit "short" filter IDs (without match-priority bits). This would occur if the hardware filter ID was 0x1fff and the match-priority was 7. Unfortunately, some code that checks for EFX_EF10_FILTER_ID_INVALID can be called on full filter IDs, and will WARN_ON if this ever happens. So, since we have plenty of spare bits in the full filter ID, this patch shifts the priority bits left one bit when constructing the full filter IDs, ensuring that the 0x2000 bit of a full filter ID will always be 0 and thus no full filter ID can ever equal EFX_EF10_FILTER_ID_INVALID. This patch also replaces open-coded full<->short filter ID conversions with calls to functions, thus keeping the definition of the full filter ID format in one place. Signed-off-by: Edward Cree <ecree@solarflare.com> Signed-off-by: David S. Miller <davem@davemloft.net>
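The collision and the fix can be checked with a small worked example. The bit widths and the invalid value are taken from the description above; the macro and function names are hypothetical and are not the driver's actual helpers.

```c
#include <stdint.h>

#define HW_FILTER_ID_BITS 13
#define HW_FILTER_ID_MASK ((1u << HW_FILTER_ID_BITS) - 1) /* 0x1fff */
#define FILTER_ID_INVALID 0xffffu /* value named in the description above */

/* Old layout: priority sits directly above the 13-bit hardware ID. */
static uint32_t full_id_old(unsigned int pri, uint32_t hw_id)
{
	return ((uint32_t)pri << HW_FILTER_ID_BITS) | (hw_id & HW_FILTER_ID_MASK);
}

/* New layout: priority shifted one extra bit, so bit 0x2000 is always 0. */
static uint32_t full_id_new(unsigned int pri, uint32_t hw_id)
{
	return ((uint32_t)pri << (HW_FILTER_ID_BITS + 1)) |
	       (hw_id & HW_FILTER_ID_MASK);
}

/* Conversions back from the new full ID, kept next to the constructor so
 * the format is defined in one place. */
static uint32_t full_id_to_hw_id(uint32_t full_id)
{
	return full_id & HW_FILTER_ID_MASK;
}

static unsigned int full_id_to_pri(uint32_t full_id)
{
	return full_id >> (HW_FILTER_ID_BITS + 1);
}
```

With priority 7 and hardware ID 0x1fff, the old layout yields 0xffff, which equals the invalid marker, while the new layout yields 0x1dfff, which cannot.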
-
Arnd Bergmann authored
I got a warning about broken code on ARM64 with 64K pages: drivers/net/vmxnet3/vmxnet3_drv.c: In function 'vmxnet3_rq_init': drivers/net/vmxnet3/vmxnet3_drv.c:1679:29: error: large integer implicitly truncated to unsigned type [-Werror=overflow] rq->buf_info[0][i].len = PAGE_SIZE; 'len' here is a 16-bit integer, so this clearly won't work. I don't think this driver is used much on anything other than x86, so there is no need to fix this properly and we can work around it with a Kconfig dependency to forbid known-broken configurations. qemu in theory supports it on other architectures too, but presumably only for compatibility with x86 guests that also run on vmware. CONFIG_PAGE_SIZE_64KB is used on hexagon, mips, sh and tile, the other symbols are architecture-specific names for the same thing. Signed-off-by: Arnd Bergmann <arnd@arndb.de> Signed-off-by: David S. Miller <davem@davemloft.net>
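The truncation behind the warning is easy to reproduce outside the kernel. In this sketch the struct and function names are hypothetical; only the 16-bit width of 'len' and the 64K page size come from the message above.

```c
#include <stdint.h>

/* Hypothetical buffer descriptor with a 16-bit length field. */
struct buf_info {
	uint16_t len;
};

/* Returns 1 if page_size survives being stored in the 16-bit field,
 * 0 if it is truncated. */
static int page_size_fits(uint32_t page_size)
{
	struct buf_info info;

	/* Explicit cast: the same truncation the compiler warns about. */
	info.len = (uint16_t)page_size;
	return info.len == page_size;
}
```

A 4096-byte page fits, but 65536 wraps to 0 in a 16-bit field, so every buffer would be recorded with a zero length.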
-
Valentin Longchamp authored
It is required to build it as a module. Signed-off-by: Valentin Longchamp <valentin.longchamp@keymile.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Zhu Yanjun authored
In the function rds_ib_xmit_atomic, if ib_ring is not allocated successfully, it is not necessary to unalloc it. Cc: Joe Jin <joe.jin@oracle.com> Cc: Junxiao Bi <junxiao.bi@oracle.com> Signed-off-by: Zhu Yanjun <yanjun.zhu@oracle.com> Acked-by: Santosh Shilimkar <santosh.shilimkar@oracle.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Simon Horman authored
Use PCI_DEVICE_ID_NETRONOME_NFP*, defined in linux/pci_ids.h, rather than replicating the same values in the NFP driver. Signed-off-by: Simon Horman <simon.horman@netronome.com> Acked-by: Jakub Kicinski <jakub.kicinski@netronome.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Gao Feng authored
The qdisc_stab_lock is used in qdisc_get_stab and qdisc_put_stab. These two functions are invoked in qdisc_create, qdisc_change, and qdisc_destroy, which run fully under RTNL. RTNL therefore already ensures that only one caller can access qdisc_stab_list at a time, so qdisc_stab_lock is now unnecessary. Signed-off-by: Gao Feng <fgao@ikuai8.com> Acked-by: Eric Dumazet <edumazet@google.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David Howells authored
Change module filename from af-rxrpc.ko to rxrpc.ko so as to be consistent with the other protocol drivers. Also adjust the documentation to reflect this. Further, there is no longer a standalone rxkad module, as it has been merged into the rxrpc core, so get rid of references to that. Reported-by: Marc Dionne <marc.dionne@auristor.com> Signed-off-by: David Howells <dhowells@redhat.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Simon Xiao authored
Return the correct tx_errors stats in netvsc. Reviewed-by: Haiyang Zhang <haiyangz@microsoft.com> Signed-off-by: Simon Xiao <sixiao@microsoft.com> Signed-off-by: Stephen Hemminger <sthemmin@microsoft.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Daniel Borkmann authored
When allocating rtnl dump messages, struct ifla_port_vsi is never dumped, so we can save header plus payload in rtnl_port_size(). In fact, attribute IFLA_PORT_VSI_TYPE and struct ifla_port_vsi are not used anywhere in the kernel. We only need to keep the nla policy should applications in user space be filling this out. Same NLA_BINARY issue exists as was fixed in 364d5716 ("rtnetlink: ifla_vf_policy: fix misuses of NLA_BINARY") and others, but then again IFLA_PORT_VSI_TYPE is not used anywhere, so just add a comment that it's unused. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Philippe Reynes authored
The ethtool API {get|set}_settings is deprecated. We move this driver to the new API {get|set}_link_ksettings. As I don't have the hardware, I'd be very pleased if someone could test this patch. Signed-off-by: Philippe Reynes <tremyfr@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Philippe Reynes authored
The ethtool API {get|set}_settings is deprecated. We move this driver to the new API {get|set}_link_ksettings. As I don't have the hardware, I'd be very pleased if someone could test this patch. Signed-off-by: Philippe Reynes <tremyfr@gmail.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Roopa Prabhu authored
added_by_external_learn fdb entries are added and expired by external entities like a switchdev driver or external controllers. Ageing is already disabled for such entries. Hence, don't indicate expiry for such fdb entries. CC: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> CC: Jiri Pirko <jiri@resnulli.us> CC: Ido Schimmel <idosch@mellanox.com> Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Reviewed-by: Ido Schimmel <idosch@mellanox.com> Tested-by: Ido Schimmel <idosch@mellanox.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Daniel Borkmann says: ==================== Misc BPF improvements This last series for this window adds various misc improvements to BPF: one is to mark registered map and prog types as __ro_after_init, another is to remove cBPF stubs in eBPF JITs and move the stub to the core, and the last, also improving JITs, is to make generated images visible to the kernel and kallsyms, so they can be seen in traces. For details, please have a look at the individual patches. Thanks a lot! ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Daniel Borkmann authored
A long-standing issue with JITed programs is that stack traces from function tracing check whether a given address is kernel code through {__,}kernel_text_address(), which checks for code in core kernel, modules and dynamically allocated ftrace trampolines. But what is still missing is BPF JITed programs (interpreted programs are not an issue as __bpf_prog_run() will be attributed to them), thus when a stack trace is triggered, the code walking the stack won't see any of the JITed ones. The same holds for address correlation done from user space via reading /proc/kallsyms. This is read by tools like perf, but the latter is also useful for permanent live tracing with eBPF itself in combination with stack maps when other eBPF types are part of the callchain. See the offwaketime example on dumping stack from a map. This work tries to tackle that issue by making the addresses and symbols known to the kernel. The lookup from *kernel_text_address() is implemented through a latched RB tree that can be read under RCU in the fast path and that is also shared for symbol/size/offset lookup for a specific given address in kallsyms. The slow-path iteration through all symbols in the seq file is done via an RCU list, which holds a tiny fraction of all exported ksyms, usually below 0.1 percent. Function symbols are exported as bpf_prog_<tag>, in order to aid debugging and attribution. This facility is currently enabled for root-only when bpf_jit_kallsyms is set to 1, and disabled if hardening is active in any mode. The rationale behind this is that a lot of systems still ship with world read permissions on kallsyms, thus addresses should not get suddenly exposed for them. If that situation gets much better in future, we always have the option to change the default on this. Likewise, unprivileged programs are not allowed to add entries there either, but that is less of a concern as most such program types relevant in this context are for root-only anyway. 
If enabled, call graphs and stack traces will then show a correct attribution; one example is illustrated below, where the trace is now visible in tooling such as perf script --kallsyms=/proc/kallsyms and friends.
Before:
  7fff8166889d bpf_clone_redirect+0x80007f0020ed (/lib/modules/4.9.0-rc8+/build/vmlinux)
  f5d80 __sendmsg_nocancel+0xffff006451f1a007 (/usr/lib64/libc-2.18.so)
After:
  7fff816688b7 bpf_clone_redirect+0x80007f002107 (/lib/modules/4.9.0-rc8+/build/vmlinux)
  7fffa0575728 bpf_prog_33c45a467c9e061a+0x8000600020fb (/lib/modules/4.9.0-rc8+/build/vmlinux)
  7fffa07ef1fc cls_bpf_classify+0x8000600020dc (/lib/modules/4.9.0-rc8+/build/vmlinux)
  7fff81678b68 tc_classify+0x80007f002078 (/lib/modules/4.9.0-rc8+/build/vmlinux)
  7fff8164d40b __netif_receive_skb_core+0x80007f0025fb (/lib/modules/4.9.0-rc8+/build/vmlinux)
  7fff8164d718 __netif_receive_skb+0x80007f002018 (/lib/modules/4.9.0-rc8+/build/vmlinux)
  7fff8164e565 process_backlog+0x80007f002095 (/lib/modules/4.9.0-rc8+/build/vmlinux)
  7fff8164dc71 net_rx_action+0x80007f002231 (/lib/modules/4.9.0-rc8+/build/vmlinux)
  7fff81767461 __softirqentry_text_start+0x80007f0020d1 (/lib/modules/4.9.0-rc8+/build/vmlinux)
  7fff817658ac do_softirq_own_stack+0x80007f00201c (/lib/modules/4.9.0-rc8+/build/vmlinux)
  7fff810a2c20 do_softirq+0x80007f002050 (/lib/modules/4.9.0-rc8+/build/vmlinux)
  7fff810a2cb5 __local_bh_enable_ip+0x80007f002085 (/lib/modules/4.9.0-rc8+/build/vmlinux)
  7fff8168d452 ip_finish_output2+0x80007f002152 (/lib/modules/4.9.0-rc8+/build/vmlinux)
  7fff8168ea3d ip_finish_output+0x80007f00217d (/lib/modules/4.9.0-rc8+/build/vmlinux)
  7fff8168f2af ip_output+0x80007f00203f (/lib/modules/4.9.0-rc8+/build/vmlinux)
  [...]
  7fff81005854 do_syscall_64+0x80007f002054 (/lib/modules/4.9.0-rc8+/build/vmlinux)
  7fff817649eb return_from_SYSCALL_64+0x80007f002000 (/lib/modules/4.9.0-rc8+/build/vmlinux)
  f5d80 __sendmsg_nocancel+0xffff01c484812007 (/usr/lib64/libc-2.18.so)
Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Cc: linux-kernel@vger.kernel.org Signed-off-by: David S. Miller <davem@davemloft.net>
-
Daniel Borkmann authored
Remove the dummy bpf_jit_compile() stubs for eBPF JITs and make that a single __weak function in the core that can be overridden similarly to the eBPF one. Also remove stale pr_err() mentions of bpf_jit_compile. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Daniel Borkmann authored
All map types and prog types are registered to the BPF core through bpf_register_map_type() and bpf_register_prog_type() during init and remain unchanged thereafter. As by design we don't (and never will) have any pluggable code that can register to that at any later point in time, let's mark all the existing bpf_{map,prog}_type_list objects in the tree as __ro_after_init, so they can be moved to a read-only section from then onwards. Signed-off-by: Daniel Borkmann <daniel@iogearbox.net> Acked-by: Alexei Starovoitov <ast@kernel.org> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Roopa Prabhu authored
Fixes: efa5356b ("bridge: per vlan dst_metadata netlink support") Signed-off-by: Roopa Prabhu <roopa@cumulusnetworks.com> Reviewed-by: Nikolay Aleksandrov <nikolay@cumulusnetworks.com> Signed-off-by: David S. Miller <davem@davemloft.net>
-
Tobias Klauser authored
After commit 34a5102c ("net: bgmac: allocate struct bgmac just once & don't copy it") the mac_addr member of struct bgmac is no longer necessary to pass the MAC address to bgmac_enet_probe(). Instead it can directly be stored in netdev->dev_addr. Also use eth_hw_addr_random() instead of eth_random_addr() in case a random MAC is needed. This will make sure netdev->addr_assign_type will be properly set. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Acked-by: Jon Mason <jon.mason@broadcom.com> Signed-off-by: David S. Miller <davem@davemloft.net>
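The difference between only generating a random address and also recording how it was assigned can be sketched in userspace as follows. Everything here is a simplified model: struct fake_netdev, the enum values and both functions are hypothetical stand-ins, and rand() is used purely for illustration.

```c
#include <stdint.h>
#include <stdlib.h>

#define MAC_LEN 6

/* Hypothetical mirror of the address-assignment bookkeeping. */
enum addr_assign { ADDR_PERM = 0, ADDR_RANDOM = 1 };

struct fake_netdev {
	uint8_t dev_addr[MAC_LEN];
	enum addr_assign addr_assign_type;
};

/* Fill addr with random bytes, then make it a valid locally administered
 * unicast address: clear the multicast bit, set the local-admin bit. */
static void make_random_mac(uint8_t addr[MAC_LEN])
{
	for (int i = 0; i < MAC_LEN; i++)
		addr[i] = (uint8_t)(rand() & 0xff);
	addr[0] &= 0xfe; /* unicast */
	addr[0] |= 0x02; /* locally administered */
}

/* Helper that both sets the address and records that it is random, so the
 * two steps cannot drift apart in individual drivers. */
static void fake_hw_addr_random(struct fake_netdev *dev)
{
	make_random_mac(dev->dev_addr);
	dev->addr_assign_type = ADDR_RANDOM;
}
```

A driver that open-codes only the first step leaves addr_assign_type at its default, so user space cannot tell that the address is not the permanent hardware one.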
-
David S. Miller authored
Merge tag 'wireless-drivers-next-for-davem-2017-02-16' of git://git.kernel.org/pub/scm/linux/kernel/git/kvalo/wireless-drivers-next
Kalle Valo says:
====================
wireless-drivers-next patches for 4.11
Mostly small fixes, not really any new features.
Major changes:
ath10k
* when trying older firmware versions don't confuse user with error messages
ath9k
* fix crash in AP mode (regression)
* fix relayfs crash (regression)
* fix initialisation with AR9340 and AR9550
====================
Signed-off-by: David S. Miller <davem@davemloft.net>
-
Tobias Klauser authored
Use eth_hw_addr_random() to set a random dev_addr and update addr_assign_type instead of open-coding it. Signed-off-by: Tobias Klauser <tklauser@distanz.ch> Signed-off-by: David S. Miller <davem@davemloft.net>
-
David S. Miller authored
Herbert Xu says: ==================== rhashtable: Handle table allocation failure during insertion v2 - Added Ack to patch 2. Fixed RCU annotation in code path executed by rehasher by using rht_dereference_bucket. v1 - This series tackles the problem of table allocation failures during insertion. The issue is that we cannot vmalloc during insertion. This series deals with this by introducing nested tables. The first two patches remove manual hash table walks which cannot work on a nested table. The final patch introduces nested tables. I've tested this with test_rhashtable and it appears to work. ==================== Signed-off-by: David S. Miller <davem@davemloft.net>
-
Herbert Xu authored
This patch adds code that handles GFP_ATOMIC kmalloc failure on insertion. As we cannot use vmalloc, we solve it by making our hash table nested. That is, we allocate single pages at each level and reach our desired table size by nesting them. When a nested table is created, only a single page is allocated at the top-level. Lower levels are allocated on demand during insertion. Therefore for each insertion to succeed, only two (non-consecutive) pages are needed. After a nested table is created, a rehash will be scheduled in order to switch to a vmalloced table as soon as possible. Also, the rehash code will never rehash into a nested table. If we detect a nested table during a rehash, the rehash will be aborted and a new rehash will be scheduled. Signed-off-by: Herbert Xu <herbert@gondor.apana.org.au> Signed-off-by: David S. Miller <davem@davemloft.net>
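The nesting idea described above can be sketched as a two-level table whose leaves are allocated on demand. This is a simplified userspace model, not the rhashtable code: the names, the fixed two levels, the slot count and the use of calloc() in place of page allocation are all assumptions.

```c
#include <stdint.h>
#include <stdlib.h>

/* Slots per "page": e.g. a 4096-byte page of 8-byte pointers (assumption). */
#define SLOTS_PER_PAGE 512u

typedef void *bucket_t;

/* Top level: one page worth of pointers to leaf pages. */
struct nested_table {
	bucket_t *leaves[SLOTS_PER_PAGE];
};

static struct nested_table *nested_table_alloc(void)
{
	return calloc(1, sizeof(struct nested_table));
}

/* Lookup path: never allocates; returns NULL if the leaf does not exist. */
static bucket_t *nested_bucket(struct nested_table *t, uint32_t hash)
{
	uint32_t idx = hash % (SLOTS_PER_PAGE * SLOTS_PER_PAGE);
	bucket_t *leaf = t->leaves[idx / SLOTS_PER_PAGE];

	return leaf ? &leaf[idx % SLOTS_PER_PAGE] : NULL;
}

/* Insertion path: allocates the single missing leaf page on demand. */
static bucket_t *nested_bucket_insert(struct nested_table *t, uint32_t hash)
{
	uint32_t idx = hash % (SLOTS_PER_PAGE * SLOTS_PER_PAGE);
	bucket_t **leafp = &t->leaves[idx / SLOTS_PER_PAGE];

	if (!*leafp) {
		*leafp = calloc(SLOTS_PER_PAGE, sizeof(bucket_t));
		if (!*leafp)
			return NULL;
	}
	return &(*leafp)[idx % SLOTS_PER_PAGE];
}

static void nested_table_free(struct nested_table *t)
{
	if (!t)
		return;
	for (uint32_t i = 0; i < SLOTS_PER_PAGE; i++)
		free(t->leaves[i]);
	free(t);
}
```

In this model a first insertion touches at most two small allocations (the top level and one leaf), which mirrors the "only two (non-consecutive) pages" property, while a flat table of the same capacity would need one large contiguous allocation.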
-