• Guilhem Bichot's avatar
    Fix for Bug#57932 "query with avg returns incorrect results": · 39b0af1e
    Guilhem Bichot authored
    when there was one NULL value, AVG(DISTINCT) could forget about other values.
    See commit comment of item_sum.cc.
    
    mysql-test/r/func_group.result:
      before the code fix, both SELECTs would return NULL
    sql/item_sum.cc:
      Assume we are executing "SELECT AVG([DISTINCT] some_field) FROM some_table".
      and some_field is the single field of some_table for simplicity.
      Each time a row is processed (evaluate_join_record()->
      end_send_group()->update_sum_func()) an aggregator is notified,
      which itself notifies an Item_sum_avg.
      Without DISTINCT, this Item_sum_avg immediately increments its
      internal "sum of values" and "count of values" (the latter being
      Item_sum_avg::count). The count is incremented only if the row's value
      is not NULL (in Item_sum_avg::add()), per AVG() semantices. This row's value
      is available in args[0] of Item_sum_avg ("args[0]" stands for
      "the first argument of the item": it's an Item_field which automatically
      receives the row's value when a row is read from the table).
      bool Item_sum_avg::add()
      {
        if (Item_sum_sum::add()) << calculates the sum (ignores NULL)
          return TRUE;
        if (!args[0]->null_value)<<if added value is not NULL
          count++;       <<increment "count"
        return FALSE;
      }
      and everything works.
      With DISTINCT, when a row is processed by evaluate_join_record(),
      Item_sum_avg does no immediate computation, rather stores
      the row's value in a tree (to throw the value away if it is a duplicate
      of previous value, otherwise to remember all
      distinct values). It's only when it's time to send the average to the
      user (at end of the query:
      sub_select(end_of_records=true)->end_send_group()->
      select_send->send_data()->Protocol::send_result_set_row()->
      Item::send()->Item_sum_avg->val_str()), that we iterate over the tree,
      compute the sum and count: for this, for each element of the tree,
      Item_sum_avg::add() is called and has the same two steps as before:
      * Item_sum_sum::add() updates the sum (finding the tree element's value
      correctly, and determining correctly its NULLness - look for "arg_is_null"
      in that function)
      * the "if (!args[0]->null_value)" test right after, breaks: it uses args[0],
      which isn't the tree's element but rather the value for the last row
      processed by evaluate_join_record(). So if that last row was NULL,
      "count" stays 0 for each row, and AVG() then returns NULL (count==0 =>
      NULL, per AVG() semantics).
      The fix is to let the aggregator tell whether the value
      it just saw was NULL. The aggregator knows where to get the info
      thanks to virtual functions. Item_sum_sum::add() now asks
      the aggregator. Item_sum_avg() also asks the aggregator
      and then knows it shouldn't increment "count".
    sql/item_sum.h:
      Aggregator can now tell about value/NULLness of just-aggregated value
    39b0af1e
item_sum.cc 86.7 KB