Commit 0bc443e3 authored by Yorick Peterse's avatar Yorick Peterse

Handle encoding in non-binary Blob instances

gitlab_git 10.6.4 relies on Rugged marking blobs as binary or not,
instead of relying on Linguist. Linguist in turn would mark text blobs
as binary whenever they would contain byte sequences that could not be
encoded using UTF-8.

However, marking such blobs as binary is not correct. If one pushes a
Markdown document with invalid character sequences it's still a text
based Markdown document and not some random binary blob.

This commit overwrites Blob#data so it automatically converts text-based
content to UTF-8 (the encoding we use everywhere else) while taking care
of replacing any invalid sequences with the UTF-8 replacement character.
The data of binary blobs is left as-is.
parent 9980f52c
......@@ -22,6 +22,18 @@ class Blob < SimpleDelegator
new(blob)
end
# Returns the data of the blob.
#
# If the blob is a text based blob the content is converted to UTF-8 and any
# invalid byte sequences are replaced.
def data
if binary?
super
else
@data ||= super.encode(Encoding::UTF_8, invalid: :replace, undef: :replace)
end
end
def no_highlighting?
size && size > 1.megabyte
end
......
# encoding: utf-8
require 'rails_helper'
describe Blob do
......@@ -7,6 +8,25 @@ describe Blob do
end
end
describe '#data' do
context 'using a binary blob' do
it 'returns the data as-is' do
data = "\n\xFF\xB9\xC3"
blob = described_class.new(double(binary?: true, data: data))
expect(blob.data).to eq(data)
end
end
context 'using a text blob' do
it 'converts the data to UTF-8' do
blob = described_class.new(double(binary?: false, data: "\n\xFF\xB9\xC3"))
expect(blob.data).to eq("\n���")
end
end
end
describe '#svg?' do
it 'is falsey when not text' do
git_blob = double(text?: false)
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment