    Introduce analyze_sample_percentage variable · f0773b78
    Vicențiu Ciorbaru authored
    The variable controls what percentage of a table's rows ANALYZE TABLE
    samples.
    
    If ANALYZE TABLE with histogram collection is too slow, one can reduce the
    time taken by setting analyze_sample_percentage to a lower value, so that
    only that percentage of the total number of rows is sampled.
    Setting it to 0 will use a formula to compute how many rows to sample:
    
    The number of rows collected has a floor of 50000 and increases
    logarithmically with a coefficient of 4096. The coefficient is
    chosen so that we expect an error of less than 3% in our estimations
    according to the paper:
    "Random Sampling for Histogram Construction: How much is enough?"
    – Surajit Chaudhuri, Rajeev Motwani, Vivek Narasayya, ACM SIGMOD, 1998.
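
The description above leaves the exact expression open, so the following is only a minimal sketch of one possible reading, not the code in sql_statistics.cc. It assumes the target is 4096 * ln(total_rows), raised to the 50000 floor and never larger than the table itself. The name sample_rows_sketch is hypothetical.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Hypothetical helper for illustration; not the function used by the server.
// Assumption: target = max(50000, 4096 * ln(total_rows)), and a table with
// fewer rows than the target is simply read in full.
std::uint64_t sample_rows_sketch(std::uint64_t total_rows)
{
  if (total_rows == 0)
    return 0;

  // Logarithmic growth with the coefficient named in the commit message.
  const double log_term = 4096.0 * std::log(static_cast<double>(total_rows));

  // Apply the 50000-row floor.
  const double target = std::max(50000.0, log_term);

  // Never ask for more rows than the table holds.
  return std::min(total_rows, static_cast<std::uint64_t>(target));
}
```

Under this reading a table of a billion rows would still be sampled at well under 100000 rows, which is where the time saving over a full scan would come from.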
    
    The drawback of sampling is that the avg_frequency value is computed
    imprecisely and will yield a smaller number than the real one.
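
The underestimate can be illustrated with a small sketch. It assumes avg_frequency is the number of rows seen divided by the number of distinct values seen, and that it is computed directly on the sampled rows without scaling back to the table size. Both are assumptions made for illustration, and all names here are hypothetical.

```cpp
#include <cstddef>
#include <set>
#include <vector>

// Assumption: avg_frequency = rows seen / distinct values seen.
double avg_frequency_sketch(const std::vector<int>& values)
{
  if (values.empty())
    return 0.0;
  const std::set<int> distinct(values.begin(), values.end());
  return static_cast<double>(values.size()) /
         static_cast<double>(distinct.size());
}

// Deterministic stand-in for random sampling: keep every step-th row.
std::vector<int> every_nth(const std::vector<int>& values, std::size_t step)
{
  std::vector<int> out;
  if (step == 0)
    return out;
  for (std::size_t i = 0; i < values.size(); i += step)
    out.push_back(values[i]);
  return out;
}

// Hypothetical column: 1000 rows, 10 distinct values, 100 rows per value.
std::vector<int> make_column()
{
  std::vector<int> column;
  for (int i = 0; i < 1000; ++i)
    column.push_back(i / 100);
  return column;
}
```

On the full column the sketch gives 1000 / 10 = 100. On a 10% sample that still sees all ten values it gives 100 / 10 = 10, because each value appears fewer times in the sample while the distinct count stays the same.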
sql_statistics.cc 123 KB