    Improve AutolinkFilter#text_parse performance · dd35c3dd
    Yorick Peterse authored
    By using clever XPath queries we can significantly improve the
    performance of this method. The actual improvement depends on the
    number of links in the input, but in my tests the new implementation is
    usually around 8 times faster than the old one. This was measured using
    the following benchmark:
    
        require 'benchmark/ips'
    
        text = '<p>' + Note.select("string_agg(note, '') AS note").limit(50).take[:note] + '</p>'
        document = Nokogiri::HTML.fragment(text)
        filter = Banzai::Filter::AutolinkFilter.new(document, autolink: true)
    
        puts "Input size: #{(text.bytesize.to_f / 1024 / 1024).round(2)} MB"
    
        filter.rinku_parse
    
        Benchmark.ips(time: 15) do |bench|
          bench.report 'text_parse' do
            filter.text_parse
          end
    
          bench.report 'text_parse_fast' do
            filter.text_parse_fast
          end
    
          bench.compare!
        end
    
    Here the "text_parse_fast" method is the new implementation and
    "text_parse" the old one. The input size was around 180 MB. Running this
    benchmark outputs the following:
    
        Input size: 181.16 MB
        Calculating -------------------------------------
                  text_parse     1.000  i/100ms
             text_parse_fast     9.000  i/100ms
        -------------------------------------------------
                  text_parse     13.021  (±15.4%) i/s -    188.000
             text_parse_fast    112.741  (± 3.5%) i/s -      1.692k
    
        Comparison:
             text_parse_fast:      112.7 i/s
                  text_parse:       13.0 i/s - 8.66x slower
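
The comparison figure can be reproduced from the two iterations-per-second values reported above. The sketch below is only a cross-check of that arithmetic. The constant and method names are made up for illustration and are not part of the benchmark or of benchmark-ips.

```ruby
# Iterations per second, copied from the benchmark output above.
OLD_IPS = 13.021   # text_parse
NEW_IPS = 112.741  # text_parse_fast

# Average wall-clock time of a single call, in milliseconds.
def ms_per_iteration(ips)
  1000.0 / ips
end

# Ratio of the two rates; this is the "x slower" figure in the comparison.
speedup = NEW_IPS / OLD_IPS

puts format('speedup:         %.2fx', speedup)
puts format('text_parse:      %.2f ms per call', ms_per_iteration(OLD_IPS))
puts format('text_parse_fast: %.2f ms per call', ms_per_iteration(NEW_IPS))
```

The ratio rounds to 8.66, matching the comparison line, so "around 8 times faster" is a slightly conservative summary of this particular run.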
    
    Again, production timings may (and most likely will) vary depending
    on the input being processed.
autolink_filter.rb 3.44 KB