1. 23 Aug, 2019 1 commit
    • Lyude Paul's avatar
      drm/nouveau: Don't retry infinitely when receiving no data on i2c over AUX · c358ebf5
      Lyude Paul authored
      While I had thought I had fixed this issue in:
      
      commit 342406e4 ("drm/nouveau/i2c: Disable i2c bus access after
      ->fini()")
      
      It turns out that while I did fix the error messages I was seeing on my
      P50 when trying to access i2c busses with the GPU in runtime suspend, I
      accidentally had missed one important detail that was mentioned on the
      bug report this commit was supposed to fix: that the CPU would only lock
      up when trying to access i2c busses _on connected devices_ _while the
      GPU is not in runtime suspend_. Whoops. That definitely explains why I
      was not able to get my machine to hang with i2c bus interactions until
      now, as plugging my P50 into it's dock with an HDMI monitor connected
      allowed me to finally reproduce this locally.
      
      Now that I have managed to reproduce this issue properly, it looks like
      the problem is much simpler then it looks. It turns out that some
      connected devices, such as MST laptop docks, will actually ACK i2c reads
      even if no data was actually read:
      
      [  275.063043] nouveau 0000:01:00.0: i2c: aux 000a: 1: 0000004c 1
      [  275.063447] nouveau 0000:01:00.0: i2c: aux 000a: 00 01101000 10040000
      [  275.063759] nouveau 0000:01:00.0: i2c: aux 000a: rd 00000001
      [  275.064024] nouveau 0000:01:00.0: i2c: aux 000a: rd 00000000
      [  275.064285] nouveau 0000:01:00.0: i2c: aux 000a: rd 00000000
      [  275.064594] nouveau 0000:01:00.0: i2c: aux 000a: rd 00000000
      
      Because we don't handle the situation of i2c ack without any data, we
      end up entering an infinite loop in nvkm_i2c_aux_i2c_xfer() since the
      value of cnt always remains at 0. This finally properly explains how
      this could result in a CPU hang like the ones observed in the
      aforementioned commit.
      
      So, fix this by retrying transactions if no data is written or received,
      and give up and fail the transaction if we continue to not write or
      receive any data after 32 retries.
      Signed-off-by: default avatarLyude Paul <lyude@redhat.com>
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarBen Skeggs <bskeggs@redhat.com>
      c358ebf5
  2. 19 Jul, 2019 14 commits
    • Ben Skeggs's avatar
      drm/nouveau/secboot/gp102-: remove WAR for SEC2 RTOS start bug · 4d352dbd
      Ben Skeggs authored
      Appears to be fixed by "flcn/gp102-: improve implementation of
      bind_context() on SEC2/GSP".
      
      Tested on GP10[24678] and GV100.
      Signed-off-by: default avatarBen Skeggs <bskeggs@redhat.com>
      4d352dbd
    • Ben Skeggs's avatar
      drm/nouveau/flcn/gp102-: improve implementation of bind_context() on SEC2/GSP · 5210e967
      Ben Skeggs authored
      Fixes various issues encountered while attempting to initialise ACR.
      Signed-off-by: default avatarBen Skeggs <bskeggs@redhat.com>
      5210e967
    • Yongxin Liu's avatar
      drm/nouveau: fix memory leak in nouveau_conn_reset() · 09b90e2f
      Yongxin Liu authored
      In nouveau_conn_reset(), if connector->state is true,
      __drm_atomic_helper_connector_destroy_state() will be called,
      but the memory pointed by asyc isn't freed. Memory leak happens
      in the following function __drm_atomic_helper_connector_reset(),
      where newly allocated asyc->state will be assigned to connector->state.
      
      So using nouveau_conn_atomic_destroy_state() instead of
      __drm_atomic_helper_connector_destroy_state to free the "old" asyc.
      
      Here the is the log showing memory leak.
      
      unreferenced object 0xffff8c5480483c80 (size 192):
        comm "kworker/0:2", pid 188, jiffies 4294695279 (age 53.179s)
        hex dump (first 32 bytes):
          00 f0 ba 7b 54 8c ff ff 00 00 00 00 00 00 00 00  ...{T...........
          00 00 00 00 00 00 00 00 00 00 00 00 00 00 00 00  ................
        backtrace:
          [<000000005005c0d0>] kmem_cache_alloc_trace+0x195/0x2c0
          [<00000000a122baed>] nouveau_conn_reset+0x25/0xc0 [nouveau]
          [<000000004fd189a2>] nouveau_connector_create+0x3a7/0x610 [nouveau]
          [<00000000c73343a8>] nv50_display_create+0x343/0x980 [nouveau]
          [<000000002e2b03c3>] nouveau_display_create+0x51f/0x660 [nouveau]
          [<00000000c924699b>] nouveau_drm_device_init+0x182/0x7f0 [nouveau]
          [<00000000cc029436>] nouveau_drm_probe+0x20c/0x2c0 [nouveau]
          [<000000007e961c3e>] local_pci_probe+0x47/0xa0
          [<00000000da14d569>] work_for_cpu_fn+0x1a/0x30
          [<0000000028da4805>] process_one_work+0x27c/0x660
          [<000000001d415b04>] worker_thread+0x22b/0x3f0
          [<0000000003b69f1f>] kthread+0x12f/0x150
          [<00000000c94c29b7>] ret_from_fork+0x3a/0x50
      Signed-off-by: default avatarYongxin Liu <yongxin.liu@windriver.com>
      Signed-off-by: default avatarBen Skeggs <bskeggs@redhat.com>
      09b90e2f
    • Ralph Campbell's avatar
      drm/nouveau/dmem: missing mutex_lock in error path · d304654b
      Ralph Campbell authored
      In nouveau_dmem_pages_alloc(), the drm->dmem->mutex is unlocked before
      calling nouveau_dmem_chunk_alloc() as shown when CONFIG_PROVE_LOCKING
      is enabled:
      
      [ 1294.871933] =====================================
      [ 1294.876656] WARNING: bad unlock balance detected!
      [ 1294.881375] 5.2.0-rc3+ #5 Not tainted
      [ 1294.885048] -------------------------------------
      [ 1294.889773] test-malloc-vra/6299 is trying to release lock (&drm->dmem->mutex) at:
      [ 1294.897482] [<ffffffffa01a220f>] nouveau_dmem_migrate_alloc_and_copy+0x79f/0xbf0 [nouveau]
      [ 1294.905782] but there are no more locks to release!
      [ 1294.910690]
      [ 1294.910690] other info that might help us debug this:
      [ 1294.917249] 1 lock held by test-malloc-vra/6299:
      [ 1294.921881]  #0: 0000000016e10454 (&mm->mmap_sem#2){++++}, at: nouveau_svmm_bind+0x142/0x210 [nouveau]
      [ 1294.931313]
      [ 1294.931313] stack backtrace:
      [ 1294.935702] CPU: 4 PID: 6299 Comm: test-malloc-vra Not tainted 5.2.0-rc3+ #5
      [ 1294.942786] Hardware name: ASUS X299-A/PRIME X299-A, BIOS 1401 05/21/2018
      [ 1294.949590] Call Trace:
      [ 1294.952059]  dump_stack+0x7c/0xc0
      [ 1294.955469]  ? nouveau_dmem_migrate_alloc_and_copy+0x79f/0xbf0 [nouveau]
      [ 1294.962213]  print_unlock_imbalance_bug.cold.52+0xca/0xcf
      [ 1294.967641]  lock_release+0x306/0x380
      [ 1294.971383]  ? nouveau_dmem_migrate_alloc_and_copy+0x79f/0xbf0 [nouveau]
      [ 1294.978089]  ? lock_downgrade+0x2d0/0x2d0
      [ 1294.982121]  ? find_held_lock+0xac/0xd0
      [ 1294.985979]  __mutex_unlock_slowpath+0x8f/0x3f0
      [ 1294.990540]  ? wait_for_completion+0x230/0x230
      [ 1294.995002]  ? rwlock_bug.part.2+0x60/0x60
      [ 1294.999197]  nouveau_dmem_migrate_alloc_and_copy+0x79f/0xbf0 [nouveau]
      [ 1295.005751]  ? page_mapping+0x98/0x110
      [ 1295.009511]  migrate_vma+0xa74/0x1090
      [ 1295.013186]  ? move_to_new_page+0x480/0x480
      [ 1295.017400]  ? __kmalloc+0x153/0x300
      [ 1295.021052]  ? nouveau_dmem_migrate_vma+0xd8/0x1e0 [nouveau]
      [ 1295.026796]  nouveau_dmem_migrate_vma+0x157/0x1e0 [nouveau]
      [ 1295.032466]  ? nouveau_dmem_init+0x490/0x490 [nouveau]
      [ 1295.037612]  ? vmacache_find+0xc2/0x110
      [ 1295.041537]  nouveau_svmm_bind+0x1b4/0x210 [nouveau]
      [ 1295.046583]  ? nouveau_svm_fault+0x13e0/0x13e0 [nouveau]
      [ 1295.051912]  drm_ioctl_kernel+0x14d/0x1a0
      [ 1295.055930]  ? drm_setversion+0x330/0x330
      [ 1295.059971]  drm_ioctl+0x308/0x530
      [ 1295.063384]  ? drm_version+0x150/0x150
      [ 1295.067153]  ? find_held_lock+0xac/0xd0
      [ 1295.070996]  ? __pm_runtime_resume+0x3f/0xa0
      [ 1295.075285]  ? mark_held_locks+0x29/0xa0
      [ 1295.079230]  ? _raw_spin_unlock_irqrestore+0x3c/0x50
      [ 1295.084232]  ? lockdep_hardirqs_on+0x17d/0x250
      [ 1295.088768]  nouveau_drm_ioctl+0x9a/0x100 [nouveau]
      [ 1295.093661]  do_vfs_ioctl+0x137/0x9a0
      [ 1295.097341]  ? ioctl_preallocate+0x140/0x140
      [ 1295.101623]  ? match_held_lock+0x1b/0x230
      [ 1295.105646]  ? match_held_lock+0x1b/0x230
      [ 1295.109660]  ? find_held_lock+0xac/0xd0
      [ 1295.113512]  ? __do_page_fault+0x324/0x630
      [ 1295.117617]  ? lock_downgrade+0x2d0/0x2d0
      [ 1295.121648]  ? mark_held_locks+0x79/0xa0
      [ 1295.125583]  ? handle_mm_fault+0x352/0x430
      [ 1295.129687]  ksys_ioctl+0x60/0x90
      [ 1295.133020]  ? mark_held_locks+0x29/0xa0
      [ 1295.136964]  __x64_sys_ioctl+0x3d/0x50
      [ 1295.140726]  do_syscall_64+0x68/0x250
      [ 1295.144400]  entry_SYSCALL_64_after_hwframe+0x49/0xbe
      [ 1295.149465] RIP: 0033:0x7f1a3495809b
      [ 1295.153053] Code: 0f 1e fa 48 8b 05 ed bd 0c 00 64 c7 00 26 00 00 00 48 c7 c0 ff ff ff ff c3 66 0f 1f 44 00 00 f3 0f 1e fa b8 10 00 00 00 0f 05 <48> 3d 01 f0 ff ff 73 01 c3 48 8b 0d bd bd 0c 00 f7 d8 64 89 01 48
      [ 1295.171850] RSP: 002b:00007ffef7ed1358 EFLAGS: 00000246 ORIG_RAX: 0000000000000010
      [ 1295.179451] RAX: ffffffffffffffda RBX: 00007ffef7ed1628 RCX: 00007f1a3495809b
      [ 1295.186601] RDX: 00007ffef7ed13b0 RSI: 0000000040406449 RDI: 0000000000000004
      [ 1295.193759] RBP: 00007ffef7ed13b0 R08: 0000000000000000 R09: 000000000157e770
      [ 1295.200917] R10: 000000000151c010 R11: 0000000000000246 R12: 0000000040406449
      [ 1295.208083] R13: 0000000000000004 R14: 0000000000000000 R15: 0000000000000000
      
      Reacquire the lock before continuing to the next page.
      Signed-off-by: default avatarRalph Campbell <rcampbell@nvidia.com>
      Signed-off-by: default avatarBen Skeggs <bskeggs@redhat.com>
      d304654b
    • Karol Herbst's avatar
      drm/nouveau/hwmon: return EINVAL if the GPU is powered down for sensors reads · 68bf8b57
      Karol Herbst authored
      fixes bogus values userspace gets from hwmon while the GPU is powered down
      Signed-off-by: default avatarKarol Herbst <kherbst@redhat.com>
      Reviewed-by: default avatarRhys Kidd <rhyskidd@gmail.com>
      Signed-off-by: default avatarBen Skeggs <bskeggs@redhat.com>
      68bf8b57
    • Ben Skeggs's avatar
      drm/nouveau: fix bogus GPL-2 license header · b0f84a84
      Ben Skeggs authored
      The bulk SPDX addition made all these files into GPL-2.0 licensed files.
      However the remainder of the project is MIT-licensed, these files
      were simply missing the boiler plate and got caught up in the global update.
      
      Fixes: 96ac6d43 (treewide: Add SPDX license identifier - Kbuild)
      Signed-off-by: default avatarBen Skeggs <bskeggs@redhat.com>
      b0f84a84
    • Ilia Mirkin's avatar
      drm/nouveau: fix bogus GPL-2 license header · b7019ac5
      Ilia Mirkin authored
      The bulk SPDX addition made all these files into GPL-2.0 licensed files.
      However the remainder of the project is MIT-licensed, these files
      (primarily header files) were simply missing the boiler plate and got
      caught up in the global update.
      
      Fixes: b2441318 (License cleanup: add SPDX GPL-2.0 license identifier to files with no license)
      Signed-off-by: default avatarIlia Mirkin <imirkin@alum.mit.edu>
      Acked-by: default avatarEmil Velikov <emil.l.velikov@gmail.com>
      Acked-by: default avatarKarol Herbst <kherbst@redhat.com>
      Signed-off-by: default avatarBen Skeggs <bskeggs@redhat.com>
      b7019ac5
    • Lyude Paul's avatar
      drm/nouveau/i2c: Enable i2c pads & busses during preinit · 7cb95eee
      Lyude Paul authored
      It turns out that while disabling i2c bus access from software when the
      GPU is suspended was a step in the right direction with:
      
      commit 342406e4 ("drm/nouveau/i2c: Disable i2c bus access after
      ->fini()")
      
      We also ended up accidentally breaking the vbios init scripts on some
      older Tesla GPUs, as apparently said scripts can actually use the i2c
      bus. Since these scripts are executed before initializing any
      subdevices, we end up failing to acquire access to the i2c bus which has
      left a number of cards with their fan controllers uninitialized. Luckily
      this doesn't break hardware - it just means the fan gets stuck at 100%.
      
      This also means that we've always been using our i2c busses before
      initializing them during the init scripts for older GPUs, we just didn't
      notice it until we started preventing them from being used until init.
      It's pretty impressive this never caused us any issues before!
      
      So, fix this by initializing our i2c pad and busses during subdev
      pre-init. We skip initializing aux busses during pre-init, as those are
      guaranteed to only ever be used by nouveau for DP aux transactions.
      Signed-off-by: default avatarLyude Paul <lyude@redhat.com>
      Tested-by: default avatarMarc Meledandri <m.meledandri@gmail.com>
      Fixes: 342406e4 ("drm/nouveau/i2c: Disable i2c bus access after ->fini()")
      Cc: stable@vger.kernel.org
      Signed-off-by: default avatarBen Skeggs <bskeggs@redhat.com>
      7cb95eee
    • Ben Skeggs's avatar
      drm/nouveau/disp/tu102-: wire up scdc parameter setter · 3485b7b5
      Ben Skeggs authored
      Regs seem valid here still, and tested on TU116.
      Signed-off-by: default avatarBen Skeggs <bskeggs@redhat.com>
      3485b7b5
    • Ben Skeggs's avatar
      drm/nouveau/core: recognise TU116 chipset · 75dec321
      Ben Skeggs authored
      Modesetting only, still waiting on ACR/GR firmware from NVIDIA for Turing
      graphics/compute bring-up.
      
      Each subsystem was compared with traces, along with various tests to check
      that things generally work as they should, and appears compatible enough
      with the current TU117 code to enable support.
      Signed-off-by: default avatarBen Skeggs <bskeggs@redhat.com>
      75dec321
    • Ben Skeggs's avatar
      drm/nouveau/kms: disallow dual-link harder if hdmi connection detected · d1084184
      Ben Skeggs authored
      The fallthrough cases (pre-Fermi) would accidentally allow dual-link pixel
      clocks even where they shouldn't be.  This leads to a high resolution HDMI
      displays, connected via a DVI->HDMI adapter, to fail on the original NV50.
      Signed-off-by: default avatarBen Skeggs <bskeggs@redhat.com>
      d1084184
    • Ilia Mirkin's avatar
      drm/nouveau/disp/nv50-: fix center/aspect-corrected scaling · 533f4752
      Ilia Mirkin authored
      Previously center scaling would get scaling applied to it (when it was
      only supposed to center the image), and aspect-corrected scaling did not
      always correctly pick whether to reduce width or height for a particular
      combination of inputs/outputs.
      
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110660Signed-off-by: default avatarIlia Mirkin <imirkin@alum.mit.edu>
      Signed-off-by: default avatarBen Skeggs <bskeggs@redhat.com>
      533f4752
    • Ilia Mirkin's avatar
      drm/nouveau/disp/nv50-: force scaler for any non-default LVDS/eDP modes · f8d6211a
      Ilia Mirkin authored
      Higher layers tend to add a lot of modes not actually in the EDID, such
      as the standard DMT modes. Changing this would be extremely intrusive to
      everyone, so just force the scaler more often. There are no practical
      cases we're aware of where a LVDS/eDP panel has multiple resolutions
      exposed, and i915 already does it this way.
      
      Bugzilla: https://bugs.freedesktop.org/show_bug.cgi?id=110660Signed-off-by: default avatarIlia Mirkin <imirkin@alum.mit.edu>
      Signed-off-by: default avatarBen Skeggs <bskeggs@redhat.com>
      f8d6211a
    • Timo Wiren's avatar
      drm/nouveau/mcp89/mmu: Use mcp77_mmu_new instead of g84_mmu_new on MCP89. · bb2b4074
      Timo Wiren authored
      Fix a crash or broken depth testing in all OpenGL applications that use the
      depth buffer on MCP89 (GeForce 320M) seen on a MacBook Pro Late 2010.
      
      The bug is tracked in https://bugs.freedesktop.org/show_bug.cgi?id=108500Signed-off-by: default avatarTimo Wiren <timo.wiren@gmail.com>
      Signed-off-by: default avatarBen Skeggs <bskeggs@redhat.com>
      bb2b4074
  3. 15 Jul, 2019 3 commits
  4. 11 Jul, 2019 2 commits
  5. 09 Jul, 2019 4 commits
  6. 08 Jul, 2019 13 commits
  7. 05 Jul, 2019 3 commits