• Kirill Smelkov's avatar
    bigfile/pagemap: Fix non-leaf page iteration · ee9bcd00
    Kirill Smelkov authored
    Since the beginning of pagemap (45af76e6 "bigfile/pagemap: specialized
    {} uint64 -> void * mapping") we had a bug sitting in
    __pagemap_for_each_leaftab() (non-leaf iterating logic behind
    pagemap_for_each):
    
    After entry to stack-down was found, we did not updated tailv[l]
    accordingly. Thus if there are non-adjacent entries an entry could be
    e.g. emitted many times:
    
         l 3  __down 0x7f79da1ee000
         tailv[4]: 0x7f79da1ee000
          -> tailv[4] 0x7f79da1ee000  __down 0x7f79da1ed000
    
         l 4  __down 0x7f79da1ed000
         tailv[5]: 0x7f79da1ed000
         h 5  l 5  leaftab: 0x7f79da1ed000      <--
          lvl 5  idx 169  page 0x55aa
        ok 9 - pagemap_for_each(0) == 21930
    
         l 5  __down (nil)
         tailv[4]: 0x7f79da1ee008
          -> tailv[4] 0x7f79da1ee008  __down 0x7f79da1ed000
    
         l 4  __down 0x7f79da1ed000
         tailv[5]: 0x7f79da1ed000
         h 5  l 5  leaftab: 0x7f79da1ed000      <--
          lvl 5  idx 169  page 0x55aa
        not ok 10 - pagemap_for_each(1) == 140724106500272
    
    And many-time-emitted entries are not only incorrect, but can also lead
    to not-handled segmentation faults in e.g. fileh_close():
    
        https://lab.nexedi.com/nexedi/wendelin.core/blob/v0.6-1-gb0b2c52/bigfile/virtmem.c#L179
    
        /* drop all pages (dirty or not) associated with this fileh */
        pagemap_for_each(page, &fileh->pagemap) {
            /* it's an error to close fileh to mapping of which an access is
             * currently being done in another thread */
            BUG_ON(page->state == PAGE_LOADING);
            page_drop_memory(page);
            list_del(&page->lru);                           <-- HERE
            bzero(page, sizeof(*page)); /* just in case */
            free(page);
        }
    
    ( because after first bzero of a page, the page is all 0 bytes including
      page->lru{.next,.prev} so on the second time when the same page is
      emitted by pagemap_for_each, list_del(&page->lru) will try to set
      page->lru.next = ... which will segfault. )
    
    So fix it by properly updating tailv[l] while we scan/iterate current level.
    
    NOTE
    
    This applies only to non-leaf pagemap levels, as leaf level is scanned
    with separate loop in pagemap_for_each. That's why we probably did not
    noticed this earlier - up until now our usual workloads was to change
    data in adjacent batches and that means adjacent pages.
    
    Though today @Tyagov was playing with wendelin.core in some other way and
    it uncovered the bug.
    ee9bcd00
pagemap.h 12.5 KB