CHANGELOG · dd35c3ddf6dce7a69cc116fe6165dad68b8e9251 · Boxiang Sun / gitlab-ce · GitLab

Improve AutolinkFilter#text_parse performance · dd35c3dd

Yorick Peterse authored Aug 02, 2016

By using clever XPath queries we can significantly improve the
performance of this method. The actual improvement depends on the
number of links in the input, but in my tests the new implementation is
usually around 8 times faster than the old one. This was measured using
the following benchmark:

    require 'benchmark/ips'

    text = '<p>' + Note.select("string_agg(note, '') AS note").limit(50).take[:note] + '</p>'
    document = Nokogiri::HTML.fragment(text)
    filter = Banzai::Filter::AutolinkFilter.new(document, autolink: true)

    puts "Input size: #{(text.bytesize.to_f / 1024 / 1024).round(2)} MB"

    filter.rinku_parse

    Benchmark.ips(time: 15) do |bench|
      bench.report 'text_parse' do
        filter.text_parse
      end

      bench.report 'text_parse_fast' do
        filter.text_parse_fast
      end

      bench.compare!
    end

Here the "text_parse_fast" method is the new implementation and
"text_parse" the old one. The input size was around 180 MB. Running this
benchmark outputs the following:

    Input size: 181.16 MB
    Calculating -------------------------------------
              text_parse     1.000  i/100ms
         text_parse_fast     9.000  i/100ms
    -------------------------------------------------
              text_parse     13.021  (±15.4%) i/s -    188.000
         text_parse_fast    112.741  (± 3.5%) i/s -      1.692k

    Comparison:
         text_parse_fast:      112.7 i/s
              text_parse:       13.0 i/s - 8.66x slower
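
As a quick sanity check, the reported slowdown is simply the ratio of the
two throughput figures above. The variable names in this sketch are
illustrative and not part of the benchmark itself:

```ruby
# Iterations per second, copied from the benchmark output above.
old_ips = 13.021   # text_parse (old implementation)
new_ips = 112.741  # text_parse_fast (new implementation)

# Ratio of the two throughputs, rounded to two decimals.
speedup = (new_ips / old_ips).round(2)

puts "text_parse_fast is #{speedup}x faster than text_parse"
```

The ratio rounds to 8.66, the figure reported by `bench.compare!`, which is
where the "around 8 times faster" in the summary comes from.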

Again, production timings may (and most likely will) vary depending on
the input being processed.

To find the state of this project's repository at the time of any of these versions, check out the tags.
