1. 07 Oct, 2014 1 commit
  2. 05 Oct, 2014 10 commits
    • David S. Miller's avatar
      sparc64: Kill unnecessary tables and increase MAX_BANKS. · d195b71b
      David S. Miller authored
      swapper_low_pmd_dir and swapper_pud_dir are actually completely
      useless and unnecessary.
      
      We just need swapper_pg_dir[].  Naturally the other page table chunks
      will be allocated on an as-needed basis.  Since the kernel actually
      accesses these tables in the PAGE_OFFSET view, there is not even a TLB
      locality advantage of placing them in the kernel image.
      
      Use the hard coded vmlinux.ld.S slot for swapper_pg_dir which is
      naturally page aligned.
      
      Increase MAX_BANKS to 1024 in order to handle heavily fragmented
      virtual guests.
      
      Even with this MAX_BANKS increase, the kernel is 20K+ smaller.
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Acked-by: default avatarBob Picco <bob.picco@oracle.com>
      d195b71b
    • bob picco's avatar
      sparc64: sparse irq · ee6a9333
      bob picco authored
      This patch attempts to do a few things. The highlights are: 1) enable
      SPARSE_IRQ unconditionally, 2) kills off !SPARSE_IRQ code 3) allocates
      ivector_table at boot time and 4) default to cookie only VIRQ mechanism
      for supported firmware. The first firmware with cookie only support for
      me appears on T5. You can optionally force the HV firmware to not cookie
      only mode which is the sysino support.
      
      The sysino is a deprecated HV mechanism according to the most recent
      SPARC Virtual Machine Specification. HV_GRP_INTR is what controls the
      cookie/sysino firmware versioning.
      
      The history of this interface is:
      
      1) Major version 1.0 only supported sysino based interrupt interfaces.
      
      2) Major version 2.0 added cookie based VIRQs, however due to the fact
         that OSs were using the VIRQs without negoatiating major version
         2.0 (Linux and Solaris are both guilty), the VIRQs calls were
         allowed even with major version 1.0
      
         To complicate things even further, the VIRQ interfaces were only
         actually hooked up in the hypervisor for LDC interrupt sources.
         VIRQ calls on other device types would result in HV_EINVAL errors.
      
         So effectively, major version 2.0 is unusable.
      
      3) Major version 3.0 was created to signal use of VIRQs and the fact
         that the hypervisor has these calls hooked up for all interrupt
         sources, not just those for LDC devices.
      
      A new boot option is provided should cookie only HV support have issues.
      hvirq - this is the version for HV_GRP_INTR. This is related to HV API
      versioning.  The code attempts major=3 first by default. The option can
      be used to override this default.
      
      I've tested with SPARSE_IRQ on T5-8, M7-4 and T4-X and Jalap?no.
      Signed-off-by: default avatarBob Picco <bob.picco@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ee6a9333
    • David S. Miller's avatar
      sparc64: Adjust vmalloc region size based upon available virtual address bits. · bb4e6e85
      David S. Miller authored
      In order to accomodate embedded per-cpu allocation with large numbers
      of cpus and numa nodes, we have to use as much virtual address space
      as possible for the vmalloc region.  Otherwise we can get things like:
      
      PERCPU: max_distance=0x380001c10000 too large for vmalloc space 0xff00000000
      
      So, once we select a value for PAGE_OFFSET, derive the size of the
      vmalloc region based upon that.
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Acked-by: default avatarBob Picco <bob.picco@oracle.com>
      bb4e6e85
    • David S. Miller's avatar
      sparc64: Increase MAX_PHYS_ADDRESS_BITS to 53. · 7c0fa0f2
      David S. Miller authored
      Make sure, at compile time, that the kernel can properly support
      whatever MAX_PHYS_ADDRESS_BITS is defined to.
      
      On M7 chips, use a max_phys_bits value of 49.
      
      Based upon a patch by Bob Picco.
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Acked-by: default avatarBob Picco <bob.picco@oracle.com>
      7c0fa0f2
    • David S. Miller's avatar
      sparc64: Use kernel page tables for vmemmap. · c06240c7
      David S. Miller authored
      For sparse memory configurations, the vmemmap array behaves terribly
      and it takes up an inordinate amount of space in the BSS section of
      the kernel image unconditionally.
      
      Just build huge PMDs and look them up just like we do for TLB misses
      in the vmalloc area.
      
      Kernel BSS shrinks by about 2MB.
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Acked-by: default avatarBob Picco <bob.picco@oracle.com>
      c06240c7
    • David S. Miller's avatar
      sparc64: Fix physical memory management regressions with large max_phys_bits. · 0dd5b7b0
      David S. Miller authored
      If max_phys_bits needs to be > 43 (f.e. for T4 chips), things like
      DEBUG_PAGEALLOC stop working because the 3-level page tables only
      can cover up to 43 bits.
      
      Another problem is that when we increased MAX_PHYS_ADDRESS_BITS up to
      47, several statically allocated tables became enormous.
      
      Compounding this is that we will need to support up to 49 bits of
      physical addressing for M7 chips.
      
      The two tables in question are sparc64_valid_addr_bitmap and
      kpte_linear_bitmap.
      
      The first holds a bitmap, with 1 bit for each 4MB chunk of physical
      memory, indicating whether that chunk actually exists in the machine
      and is valid.
      
      The second table is a set of 2-bit values which tell how large of a
      mapping (4MB, 256MB, 2GB, 16GB, respectively) we can use at each 256MB
      chunk of ram in the system.
      
      These tables are huge and take up an enormous amount of the BSS
      section of the sparc64 kernel image.  Specifically, the
      sparc64_valid_addr_bitmap is 4MB, and the kpte_linear_bitmap is 128K.
      
      So let's solve the space wastage and the DEBUG_PAGEALLOC problem
      at the same time, by using the kernel page tables (as designed) to
      manage this information.
      
      We have to keep using large mappings when DEBUG_PAGEALLOC is disabled,
      and we do this by encoding huge PMDs and PUDs.
      
      On a T4-2 with 256GB of ram the kernel page table takes up 16K with
      DEBUG_PAGEALLOC disabled and 256MB with it enabled.  Furthermore, this
      memory is dynamically allocated at run time rather than coded
      statically into the kernel image.
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Acked-by: default avatarBob Picco <bob.picco@oracle.com>
      0dd5b7b0
    • David S. Miller's avatar
      sparc64: Adjust KTSB assembler to support larger physical addresses. · 8c82dc0e
      David S. Miller authored
      As currently coded the KTSB accesses in the kernel only support up to
      47 bits of physical addressing.
      
      Adjust the instruction and patching sequence in order to support
      arbitrary 64 bits addresses.
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Acked-by: default avatarBob Picco <bob.picco@oracle.com>
      8c82dc0e
    • David S. Miller's avatar
      sparc64: Define VA hole at run time, rather than at compile time. · 4397bed0
      David S. Miller authored
      Now that we use 4-level page tables, we can provide up to 53-bits of
      virtual address space to the user.
      
      Adjust the VA hole based upon the capabilities of the cpu type probed.
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Acked-by: default avatarBob Picco <bob.picco@oracle.com>
      4397bed0
    • David S. Miller's avatar
      sparc64: Switch to 4-level page tables. · ac55c768
      David S. Miller authored
      This has become necessary with chips that support more than 43-bits
      of physical addressing.
      
      Based almost entirely upon a patch by Bob Picco.
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      Acked-by: default avatarBob Picco <bob.picco@oracle.com>
      ac55c768
    • David S. Miller's avatar
      sparc64: Fix reversed start/end in flush_tlb_kernel_range() · 473ad7f4
      David S. Miller authored
      When we have to split up a flush request into multiple pieces
      (in order to avoid the firmware range) we don't specify the
      arguments in the right order for the second piece.
      
      Fix the order, or else we get hangs as the code tries to
      flush "a lot" of entries and we get lockups like this:
      
      [ 4422.981276] NMI watchdog: BUG: soft lockup - CPU#12 stuck for 23s! [expect:117032]
      [ 4422.996130] Modules linked in: ipv6 loop usb_storage igb ptp sg sr_mod ehci_pci ehci_hcd pps_core n2_rng rng_core
      [ 4423.016617] CPU: 12 PID: 117032 Comm: expect Not tainted 3.17.0-rc4+ #1608
      [ 4423.030331] task: fff8003cc730e220 ti: fff8003d99d54000 task.ti: fff8003d99d54000
      [ 4423.045282] TSTATE: 0000000011001602 TPC: 00000000004521e8 TNPC: 00000000004521ec Y: 00000000    Not tainted
      [ 4423.064905] TPC: <__flush_tlb_kernel_range+0x28/0x40>
      [ 4423.074964] g0: 000000000052fd10 g1: 00000001295a8000 g2: ffffff7176ffc000 g3: 0000000000002000
      [ 4423.092324] g4: fff8003cc730e220 g5: fff8003dfedcc000 g6: fff8003d99d54000 g7: 0000000000000006
      [ 4423.109687] o0: 0000000000000000 o1: 0000000000000000 o2: 0000000000000003 o3: 00000000f0000000
      [ 4423.127058] o4: 0000000000000080 o5: 00000001295a8000 sp: fff8003d99d56d01 ret_pc: 000000000052ff54
      [ 4423.145121] RPC: <__purge_vmap_area_lazy+0x314/0x3a0>
      [ 4423.155185] l0: 0000000000000000 l1: 0000000000000000 l2: 0000000000a38040 l3: 0000000000000000
      [ 4423.172559] l4: fff8003dae8965e0 l5: ffffffffffffffff l6: 0000000000000000 l7: 00000000f7e2b138
      [ 4423.189913] i0: fff8003d99d576a0 i1: fff8003d99d576a8 i2: fff8003d99d575e8 i3: 0000000000000000
      [ 4423.207284] i4: 0000000000008008 i5: fff8003d99d575c8 i6: fff8003d99d56df1 i7: 0000000000530c24
      [ 4423.224640] I7: <free_vmap_area_noflush+0x64/0x80>
      [ 4423.234193] Call Trace:
      [ 4423.239051]  [0000000000530c24] free_vmap_area_noflush+0x64/0x80
      [ 4423.251029]  [0000000000531a7c] remove_vm_area+0x5c/0x80
      [ 4423.261628]  [0000000000531b80] __vunmap+0x20/0x120
      [ 4423.271352]  [000000000071cf18] n_tty_close+0x18/0x40
      [ 4423.281423]  [00000000007222b0] tty_ldisc_close+0x30/0x60
      [ 4423.292183]  [00000000007225a4] tty_ldisc_reinit+0x24/0xa0
      [ 4423.303120]  [0000000000722ab4] tty_ldisc_hangup+0xd4/0x1e0
      [ 4423.314232]  [0000000000719aa0] __tty_hangup+0x280/0x3c0
      [ 4423.324835]  [0000000000724cb4] pty_close+0x134/0x1a0
      [ 4423.334905]  [000000000071aa24] tty_release+0x104/0x500
      [ 4423.345316]  [00000000005511d0] __fput+0x90/0x1e0
      [ 4423.354701]  [000000000047fa54] task_work_run+0x94/0xe0
      [ 4423.365126]  [0000000000404b44] __handle_signal+0xc/0x2c
      
      Fixes: 4ca9a237 ("sparc64: Guard against flushing openfirmware mappings.")
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      473ad7f4
  3. 30 Sep, 2014 6 commits
    • Sowmini Varadhan's avatar
      sparc64: Add vio_set_intr() to enable/disable Rx interrupts · ca605b7d
      Sowmini Varadhan authored
      The vio_set_intr() API should be used by VIO consumers to enable/disable
      Rx interrupts to facilitate deferred processing in softirq/bottom-half
      context.
      Signed-off-by: default avatarSowmini Varadhan <sowmini.varadhan@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      ca605b7d
    • Dwight Engen's avatar
      vio: fix reuse of vio_dring slot · d0aedcd4
      Dwight Engen authored
      vio_dring_avail() will allow use of every dring entry, but when the last
      entry is allocated then dr->prod == dr->cons which is indistinguishable from
      the ring empty condition. This causes the next allocation to reuse an entry.
      When this happens in sunvdc, the server side vds driver begins nack'ing the
      messages and ends up resetting the ldc channel. This problem does not effect
      sunvnet since it checks for < 2.
      
      The fix here is to just never allocate the very last dring slot so that full
      and empty are not the same condition. The request start path was changed to
      check for the ring being full a bit earlier, and to stop the blk_queue if
      there is no space left. The blk_queue will be restarted once the ring is
      only half full again. The number of ring entries was increased to 512 which
      matches the sunvnet and Solaris vdc drivers, and greatly reduces the
      frequency of hitting the ring full condition and the associated blk_queue
      stop/starting. The checks in sunvent were adjusted to account for
      vio_dring_avail() returning 1 less.
      
      Orabug: 19441666
      OraBZ: 14983
      Signed-off-by: default avatarDwight Engen <dwight.engen@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      d0aedcd4
    • Dwight Engen's avatar
      sunvdc: limit each sg segment to a page · 5eed69ff
      Dwight Engen authored
      ldc_map_sg() could fail its check that the number of pages referred to
      by the sg scatterlist was <= the number of cookies.
      
      This fixes the issue by doing a similar thing to the xen-blkfront driver,
      ensuring that the scatterlist will only ever contain a segment count <=
      port->ring_cookies, and each segment will be page aligned, and <= page
      size. This ensures that the scatterlist is always mappable.
      
      Orabug: 19347817
      OraBZ: 15945
      Signed-off-by: default avatarDwight Engen <dwight.engen@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      5eed69ff
    • Allen Pais's avatar
      sunvdc: compute vdisk geometry from capacity · de5b73f0
      Allen Pais authored
      The LDom diskserver doesn't return reliable geometry data. In addition,
      the types for all fields in the vio_disk_geom are u16, which were being
      truncated in the cast into the u8's of the Linux struct hd_geometry.
      
      Modify vdc_getgeo() to compute the geometry from the disk's capacity in a
      manner consistent with xen-blkfront::blkif_getgeo().
      Signed-off-by: default avatarDwight Engen <dwight.engen@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      de5b73f0
    • Allen Pais's avatar
      sunvdc: add cdrom and v1.1 protocol support · 9bce2182
      Allen Pais authored
      Interpret the media type from v1.1 protocol to support CDROM/DVD.
      
      For v1.0 protocol, a disk's size continues to be calculated from the
      geometry returned by the vdisk server. The geometry returned by the server
      can be less than the actual number of sectors available in the backing
      image/device due to the rounding in the division used to compute the
      geometry in the vdisk server.
      
      In v1.1 protocol a disk's actual size in sectors is returned during the
      handshake. Use this size when v1.1 protocol is negotiated. Since this size
      will always be larger than the former geometry computed size, disks created
      under v1.0 will be forwards compatible to v1.1, but not vice versa.
      Signed-off-by: default avatarDwight Engen <dwight.engen@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      9bce2182
    • David L Stevens's avatar
      sparc: VIO protocol version 1.6 · 163a4e74
      David L Stevens authored
      Add VIO protocol version 1.6 interfaces.
      Signed-off-by: default avatarDavid L Stevens <david.stevens@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      163a4e74
  4. 27 Sep, 2014 1 commit
  5. 17 Sep, 2014 5 commits
    • Sowmini Varadhan's avatar
      sparc64: Move request_irq() from ldc_bind() to ldc_alloc() · c21c4ab0
      Sowmini Varadhan authored
      The request_irq() needs to be done from ldc_alloc()
      to avoid the following (caught by lockdep)
      
       [00000000004a0738] __might_sleep+0xf8/0x120
       [000000000058bea4] kmem_cache_alloc_trace+0x184/0x2c0
       [00000000004faf80] request_threaded_irq+0x80/0x160
       [000000000044f71c] ldc_bind+0x7c/0x220
       [0000000000452454] vio_port_up+0x54/0xe0
       [00000000101f6778] probe_disk+0x38/0x220 [sunvdc]
       [00000000101f6b8c] vdc_port_probe+0x22c/0x300 [sunvdc]
       [0000000000451a88] vio_device_probe+0x48/0x60
       [000000000074c56c] really_probe+0x6c/0x300
       [000000000074c83c] driver_probe_device+0x3c/0xa0
       [000000000074c92c] __driver_attach+0x8c/0xa0
       [000000000074a6ec] bus_for_each_dev+0x6c/0xa0
       [000000000074c1dc] driver_attach+0x1c/0x40
       [000000000074b0fc] bus_add_driver+0xbc/0x280
      Signed-off-by: default avatarSowmini Varadhan <sowmini.varadhan@oracle.com>
      Acked-by: default avatarDwight Engen <dwight.engen@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      c21c4ab0
    • bob picco's avatar
      sparc64: T5 PMU · 05aa1651
      bob picco authored
      The T5 (niagara5) has different PCR related HV fast trap values and a new
      HV API Group. This patch utilizes these and shares when possible with niagara4.
      
      We use the same sparc_pmu niagara4_pmu. Should there be new effort to
      obtain the MCU perf statistics then this would have to be changed.
      
      Cc: sparclinux@vger.kernel.org
      Signed-off-by: default avatarBob Picco <bob.picco@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      05aa1651
    • bob picco's avatar
      sparc64: mem boot option correction · 7c21d533
      bob picco authored
      The "mem" boot option can result in many unexpected consequences. This patch
      attempts to prevent boot hangs which have been experienced on T4-4 and T5-8.
      Basically the boot loader allocates vmlinuz and initrd higher in available
      OBP physical memory. For example, on a 2Tb T5-8 it isn't possible to boot
      with mem=20G.
      
      The patch utilizes memblock to avoid reserved regions and trim memory which
      is only free. Other improvements are possible for a multi-node machine.
      
      This is a snippet of the boot log with mem=20G on T5-8 with the patch applied:
      MEMBLOCK configuration:	<- before memory reduction
       memory size = 0x1ffad6ce000 reserved size = 0xa1adf44
       memory.cnt  = 0xb
       memory[0x0]    [0x00000030400000-0x00003fdde47fff], 0x3fada48000 bytes
       memory[0x1]    [0x00003fdde4e000-0x00003fdde4ffff], 0x2000 bytes
       memory[0x2]    [0x00080000000000-0x00083fffffffff], 0x4000000000 bytes
       memory[0x3]    [0x00100000000000-0x00103fffffffff], 0x4000000000 bytes
       memory[0x4]    [0x00180000000000-0x00183fffffffff], 0x4000000000 bytes
       memory[0x5]    [0x00200000000000-0x00203fffffffff], 0x4000000000 bytes
       memory[0x6]    [0x00280000000000-0x00283fffffffff], 0x4000000000 bytes
       memory[0x7]    [0x00300000000000-0x00303fffffffff], 0x4000000000 bytes
       memory[0x8]    [0x00380000000000-0x00383fffc71fff], 0x3fffc72000 bytes
       memory[0x9]    [0x00383fffc92000-0x00383fffca1fff], 0x10000 bytes
       memory[0xa]    [0x00383fffcb4000-0x00383fffcb5fff], 0x2000 bytes
       reserved.cnt  = 0x2
       reserved[0x0]  [0x00380000000000-0x0038000117e7f8], 0x117e7f9 bytes
       reserved[0x1]  [0x00380004000000-0x0038000d02f74a], 0x902f74b bytes
      ...
      MEMBLOCK configuration:	<- after reduction of memory
       memory size = 0x50a1adf44 reserved size = 0xa1adf44
       memory.cnt  = 0x4
       memory[0x0]    [0x00380000000000-0x0038000117e7f8], 0x117e7f9 bytes
       memory[0x1]    [0x00380004000000-0x0038050d01d74a], 0x50901d74b bytes
       memory[0x2]    [0x00383fffc92000-0x00383fffca1fff], 0x10000 bytes
       memory[0x3]    [0x00383fffcb4000-0x00383fffcb5fff], 0x2000 bytes
       reserved.cnt  = 0x2
       reserved[0x0]  [0x00380000000000-0x0038000117e7f8], 0x117e7f9 bytes
       reserved[0x1]  [0x00380004000000-0x0038000d02f74a], 0x902f74b bytes
      ...
      Early memory node ranges
        node   7: [mem 0x380000000000-0x38000117dfff]
        node   7: [mem 0x380004000000-0x380f0d01bfff]
        node   7: [mem 0x383fffc92000-0x383fffca1fff]
        node   7: [mem 0x383fffcb4000-0x383fffcb5fff]
      Could not find start_pfn for node 0
      Could not find start_pfn for node 1
      Could not find start_pfn for node 2
      Could not find start_pfn for node 3
      Could not find start_pfn for node 4
      Could not find start_pfn for node 5
      Could not find start_pfn for node 6
      .
      
      The patch was tested on T4-1, T5-8 and Jalap?no.
      
      Cc: sparclinux@vger.kernel.org
      Signed-off-by: default avatarBob Picco <bob.picco@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      7c21d533
    • bob picco's avatar
      sparc64: find_node adjustment · 3dee9df5
      bob picco authored
      We have seen an issue with guest boot into LDOM that causes early boot failures
      because of no matching rules for node identitity of the memory. I analyzed this
      on my T4 and concluded there might not be a solution. I saw the issue in
      mainline too when booting into the control/primary domain - with guests
      configured.  Note, this could be a firmware bug on some older machines.
      
      I'll provide a full explanation of the issues below. Should we not find a
      matching BEST latency group for a real address (RA) then we will assume node 0.
      On the T4-2 here with the information provided I can't see an alternative.
      
      Technically the LDOM shown below should match the MBLOCK to the
      favorable latency group. However other factors must be considered too. Were
      the memory controllers configured "fine" grained interleave or "coarse"
      grain interleaved -  T4. Also should a "group" MD node be considered a NUMA
      node?
      
      There has to be at least one Machine Description (MD) "group" and hence one
      NUMA node. The group can have one or more latency groups (lg) - more than one
      memory controller. The current code chooses the smallest latency as the most
      favorable per group. The latency and lg information is in MLGROUP below.
      MBLOCK is the base and size of the RAs for the machine as fetched from OBP
      /memory "available" property. My machine has one MBLOCK but more would be
      possible - with holes?
      
      For a T4-2 the following information has been gathered:
      with LDOM guest
      MEMBLOCK configuration:
       memory size = 0x27f870000
       memory.cnt  = 0x3
       memory[0x0]    [0x00000020400000-0x0000029fc67fff], 0x27f868000 bytes
       memory[0x1]    [0x0000029fd8a000-0x0000029fd8bfff], 0x2000 bytes
       memory[0x2]    [0x0000029fd92000-0x0000029fd97fff], 0x6000 bytes
       reserved.cnt  = 0x2
       reserved[0x0]  [0x00000020800000-0x000000216c15c0], 0xec15c1 bytes
       reserved[0x1]  [0x00000024800000-0x0000002c180c1e], 0x7980c1f bytes
      MBLOCK[0]: base[20000000] size[280000000] offset[0]
      (note: "base" and "size" reported in "MBLOCK" encompass the "memory[X]" values)
      (note: (RA + offset) & mask = val is the formula to detect a match for the
      memory controller. should there be no match for find_node node, a return
      value of -1 resulted for the node - BAD)
      
      There is one group. It has these forward links
      MLGROUP[1]: node[545] latency[1f7e8] match[200000000] mask[200000000]
      MLGROUP[2]: node[54d] latency[2de60] match[0] mask[200000000]
      NUMA NODE[0]: node[545] mask[200000000] val[200000000] (latency[1f7e8])
      (note: "val" is the best lg's (smallest latency) "match")
      
      no LDOM guest - bare metal
      MEMBLOCK configuration:
       memory size = 0xfdf2d0000
       memory.cnt  = 0x3
       memory[0x0]    [0x00000020400000-0x00000fff6adfff], 0xfdf2ae000 bytes
       memory[0x1]    [0x00000fff6d2000-0x00000fff6e7fff], 0x16000 bytes
       memory[0x2]    [0x00000fff766000-0x00000fff771fff], 0xc000 bytes
       reserved.cnt  = 0x2
       reserved[0x0]  [0x00000020800000-0x00000021a04580], 0x1204581 bytes
       reserved[0x1]  [0x00000024800000-0x0000002c7d29fc], 0x7fd29fd bytes
      MBLOCK[0]: base[20000000] size[fe0000000] offset[0]
      
      there are two groups
      group node[16d5]
      MLGROUP[0]: node[1765] latency[1f7e8] match[0] mask[200000000]
      MLGROUP[3]: node[177d] latency[2de60] match[200000000] mask[200000000]
      NUMA NODE[0]: node[1765] mask[200000000] val[0] (latency[1f7e8])
      group node[171d]
      MLGROUP[2]: node[1775] latency[2de60] match[0] mask[200000000]
      MLGROUP[1]: node[176d] latency[1f7e8] match[200000000] mask[200000000]
      NUMA NODE[1]: node[176d] mask[200000000] val[200000000] (latency[1f7e8])
      (note: for this two "group" bare metal machine, 1/2 memory is in group one's
      lg and 1/2 memory is in group two's lg).
      
      Cc: sparclinux@vger.kernel.org
      Signed-off-by: default avatarBob Picco <bob.picco@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      3dee9df5
    • bob picco's avatar
      sparc64: sun4v TLB error power off events · 4ccb9272
      bob picco authored
      We've witnessed a few TLB events causing the machine to power off because
      of prom_halt. In one case it was some nfs related area during rmmod. Another
      was an mmapper of /dev/mem. A more recent one is an ITLB issue with
      a bad pagesize which could be a hardware bug. Bugs happen but we should
      attempt to not power off the machine and/or hang it when possible.
      
      This is a DTLB error from an mmapper of /dev/mem:
      [root@sparcie ~]# SUN4V-DTLB: Error at TPC[fffff80100903e6c], tl 1
      SUN4V-DTLB: TPC<0xfffff80100903e6c>
      SUN4V-DTLB: O7[fffff801081979d0]
      SUN4V-DTLB: O7<0xfffff801081979d0>
      SUN4V-DTLB: vaddr[fffff80100000000] ctx[1250] pte[98000000000f0610] error[2]
      .
      
      This is recent mainline for ITLB:
      [ 3708.179864] SUN4V-ITLB: TPC<0xfffffc010071cefc>
      [ 3708.188866] SUN4V-ITLB: O7[fffffc010071cee8]
      [ 3708.197377] SUN4V-ITLB: O7<0xfffffc010071cee8>
      [ 3708.206539] SUN4V-ITLB: vaddr[e0003] ctx[1a3c] pte[2900000dcc800eeb] error[4]
      .
      
      Normally sun4v_itlb_error_report() and sun4v_dtlb_error_report() would call
      prom_halt() and drop us to OF command prompt "ok". This isn't the case for
      LDOMs and the machine powers off.
      
      For the HV reported error of HV_ENORADDR for HV HV_MMU_MAP_ADDR_TRAP we cause
      a SIGBUS error by qualifying it within do_sparc64_fault() for fault code mask
      of FAULT_CODE_BAD_RA. This is done when trap level (%tl) is less or equal
      one("1"). Otherwise, for %tl > 1,  we proceed eventually to die_if_kernel().
      
      The logic of this patch was partially inspired by David Miller's feedback.
      
      Power off of large sparc64 machines is painful. Plus die_if_kernel provides
      more context. A reset sequence isn't a brief period on large sparc64 but
      better than power-off/power-on sequence.
      
      Cc: sparclinux@vger.kernel.org
      Signed-off-by: default avatarBob Picco <bob.picco@oracle.com>
      Signed-off-by: default avatarDavid S. Miller <davem@davemloft.net>
      4ccb9272
  6. 10 Sep, 2014 1 commit
  7. 09 Sep, 2014 12 commits
  8. 08 Sep, 2014 4 commits
    • Linus Torvalds's avatar
      Merge branch 'for_linus_urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4 · 8c68face
      Linus Torvalds authored
      Pull ext4 bugfix from Ted Ts'o.
      
      [ Hmm.  It's possible we should make kfree() aware of error pointers,
        and use IS_ERR_OR_NULL rather than a NULL check.  But in the meantime
        this is obviously the right fix.  - Linus ]
      
      * 'for_linus_urgent' of git://git.kernel.org/pub/scm/linux/kernel/git/tytso/ext4:
        ext4: avoid trying to kfree an ERR_PTR pointer
      8c68face
    • Linus Torvalds's avatar
      Merge branch 'for-3.17' of git://linux-nfs.org/~bfields/linux · 861b7102
      Linus Torvalds authored
      Pull nfsd bugfixes from Bruce Fields:
       "A couple minor nfsd bugfixes"
      
      * 'for-3.17' of git://linux-nfs.org/~bfields/linux:
        lockd: fix rpcbind crash on lockd startup failure
        nfsd4: fix rd_dircount enforcement
      861b7102
    • J. Bruce Fields's avatar
      lockd: fix rpcbind crash on lockd startup failure · 7c17705e
      J. Bruce Fields authored
      Nikita Yuschenko reported that booting a kernel with init=/bin/sh and
      then nfs mounting without portmap or rpcbind running using a busybox
      mount resulted in:
      
        # mount -t nfs 10.30.130.21:/opt /mnt
        svc: failed to register lockdv1 RPC service (errno 111).
        lockd_up: makesock failed, error=-111
        Unable to handle kernel paging request for data at address 0x00000030
        Faulting instruction address: 0xc055e65c
        Oops: Kernel access of bad area, sig: 11 [#1]
        MPC85xx CDS
        Modules linked in:
        CPU: 0 PID: 1338 Comm: mount Not tainted 3.10.44.cge #117
        task: cf29cea0 ti: cf35c000 task.ti: cf35c000
        NIP: c055e65c LR: c0566490 CTR: c055e648
        REGS: cf35dad0 TRAP: 0300   Not tainted  (3.10.44.cge)
        MSR: 00029000 <CE,EE,ME>  CR: 22442488  XER: 20000000
        DEAR: 00000030, ESR: 00000000
      
        GPR00: c05606f4 cf35db80 cf29cea0 cf0ded80 cf0dedb8 00000001 1dec3086
        00000000
        GPR08: 00000000 c07b1640 00000007 1dec3086 22442482 100b9758 00000000
        10090ae8
        GPR16: 00000000 000186a5 00000000 00000000 100c3018 bfa46edc 100b0000
        bfa46ef0
        GPR24: cf386ae0 c07834f0 00000000 c0565f88 00000001 cf0dedb8 00000000
        cf0ded80
        NIP [c055e65c] call_start+0x14/0x34
        LR [c0566490] __rpc_execute+0x70/0x250
        Call Trace:
        [cf35db80] [00000080] 0x80 (unreliable)
        [cf35dbb0] [c05606f4] rpc_run_task+0x9c/0xc4
        [cf35dbc0] [c0560840] rpc_call_sync+0x50/0xb8
        [cf35dbf0] [c056ee90] rpcb_register_call+0x54/0x84
        [cf35dc10] [c056f24c] rpcb_register+0xf8/0x10c
        [cf35dc70] [c0569e18] svc_unregister.isra.23+0x100/0x108
        [cf35dc90] [c0569e38] svc_rpcb_cleanup+0x18/0x30
        [cf35dca0] [c0198c5c] lockd_up+0x1dc/0x2e0
        [cf35dcd0] [c0195348] nlmclnt_init+0x2c/0xc8
        [cf35dcf0] [c015bb5c] nfs_start_lockd+0x98/0xec
        [cf35dd20] [c015ce6c] nfs_create_server+0x1e8/0x3f4
        [cf35dd90] [c0171590] nfs3_create_server+0x10/0x44
        [cf35dda0] [c016528c] nfs_try_mount+0x158/0x1e4
        [cf35de20] [c01670d0] nfs_fs_mount+0x434/0x8c8
        [cf35de70] [c00cd3bc] mount_fs+0x20/0xbc
        [cf35de90] [c00e4f88] vfs_kern_mount+0x50/0x104
        [cf35dec0] [c00e6e0c] do_mount+0x1d0/0x8e0
        [cf35df10] [c00e75ac] SyS_mount+0x90/0xd0
        [cf35df40] [c000ccf4] ret_from_syscall+0x0/0x3c
      
      The addition of svc_shutdown_net() resulted in two calls to
      svc_rpcb_cleanup(); the second is no longer necessary and crashes when
      it calls rpcb_register_call with clnt=NULL.
      Reported-by: default avatarNikita Yushchenko <nyushchenko@dev.rtsoft.ru>
      Fixes: 679b033d "lockd: ensure we tear down any live sockets when socket creation fails during lockd_up"
      Cc: stable@vger.kernel.org
      Acked-by: default avatarJeff Layton <jlayton@primarydata.com>
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      7c17705e
    • J. Bruce Fields's avatar
      nfsd4: fix rd_dircount enforcement · aee37764
      J. Bruce Fields authored
      Commit 3b299709 "nfsd4: enforce rd_dircount" totally misunderstood
      rd_dircount; it refers to total non-attribute bytes returned, not number
      of directory entries returned.
      
      Bring the code into agreement with RFC 3530 section 14.2.24.
      
      Cc: stable@vger.kernel.org
      Fixes: 3b299709 "nfsd4: enforce rd_dircount"
      Signed-off-by: default avatarJ. Bruce Fields <bfields@redhat.com>
      aee37764