Commit da419833 authored by Stan Hu's avatar Stan Hu

Fix LFS not working with S3 specific-storage settings

https://gitlab.com/gitlab-org/gitlab/-/merge_requests/48269 enabled LFS
clients to use chunked transfers via `Transfer-Encoding: chunked`.

However, in some cases, this breaks uploads to AWS S3 if
specific-storage settings are used.  We were able to reproduce this
problem with Vagrant, but not with VMs on AWS or GCP.

When direct uploads are used, GitLab will only generate pre-signed,
multipart URLs if the `Content-Length` parameter is sent by the
client. Previously when chunked transfers were not supported, this
header was always available and thus was hard-coded in the LFS storage
controller.

When this header is no longer available, the Workhorse `BodyUploader`
attempts to transfer the file with an S3 PutObject API call with
`Transfer-Encoding: chunked`.  This will return a `501 Not Implemented`
error since this header is not supported; S3 has a different mechanism
to chunked transfers. Note that Google Cloud Storage works fine in this
manner.

Now that `Content-Length` is not always available, we have a few
options:

1. Disable LFS chunked transfers.
2. Re-enable request buffering in NGINX.
3. Modify Workhorse to tell us whether `Content-Length` was sent.
4. Be pessimistic and always generate multipart URLs and use a max
length with the size of file.

Option 1 is not optimal because we still want to support LFS chunked
transfers, especially for GitLab.com where Cloudflare will reject files
over 5 GB.

Option 2 is not desirable either because it causes NGINX to store a
large temporary file and delays the transfer to object storage.

Option 3 is a slightly preferable option, but it involves modifying
Workhorse as well. We should consider this as a follow-up issue.

To fix the immediate problem, we implement option 4. Note that using
consolidated object storage settings avoids this problem because
Workhorse handles the upload natively with the AWS SDK and doesn't need
presigned URLs.

Relates to https://gitlab.com/gitlab-org/gitlab-workhorse/-/issues/292
parent c095fcc7
...@@ -21,7 +21,17 @@ module Repositories ...@@ -21,7 +21,17 @@ module Repositories
def upload_authorize def upload_authorize
set_workhorse_internal_api_content_type set_workhorse_internal_api_content_type
authorized = LfsObjectUploader.workhorse_authorize(has_length: true) # We don't actually know whether Workhorse received an LFS upload
# request with a Content-Length header or `Transfer-Encoding:
# chunked`. Since we don't know, we need to be pessimistic and
# set `has_length` to `false` so that multipart uploads will be
# used for AWS. Otherwise, AWS will respond with `501 NOT IMPLEMENTED`
# error because a PutObject request with `Transfer-Encoding: chunked`
# is not supported.
#
# This is only an issue with object storage-specific settings, not
# with consolidated object storage settings.
authorized = LfsObjectUploader.workhorse_authorize(has_length: false, maximum_size: size)
authorized.merge!(LfsOid: oid, LfsSize: size) authorized.merge!(LfsOid: oid, LfsSize: size)
render json: authorized render json: authorized
......
---
title: Fix LFS not working with S3 specific-storage settings
merge_request: 52296
author:
type: fixed
...@@ -11,23 +11,75 @@ RSpec.describe Repositories::LfsStorageController do ...@@ -11,23 +11,75 @@ RSpec.describe Repositories::LfsStorageController do
let_it_be(:pat) { create(:personal_access_token, user: user, scopes: ['write_repository']) } let_it_be(:pat) { create(:personal_access_token, user: user, scopes: ['write_repository']) }
let(:lfs_enabled) { true } let(:lfs_enabled) { true }
let(:params) do
{
repository_path: "#{project.full_path}.git",
oid: '6b9765d3888aaec789e8c309eb05b05c3a87895d6ad70d2264bd7270fff665ac',
size: '6725030'
}
end
before do before do
stub_config(lfs: { enabled: lfs_enabled }) stub_config(lfs: { enabled: lfs_enabled })
end end
describe 'PUT #upload_finalize' do describe 'PUT #upload_authorize' do
let(:headers) { workhorse_internal_api_request_header } let(:headers) { workhorse_internal_api_request_header }
let(:extra_headers) { {} } let(:extra_headers) { {} }
let(:uploaded_file) { temp_file }
let(:params) do before do
{ request.headers.merge!(extra_headers)
repository_path: "#{project.full_path}.git", request.headers.merge!(headers)
oid: '6b9765d3888aaec789e8c309eb05b05c3a87895d6ad70d2264bd7270fff665ac', end
size: '6725030'
} subject do
put :upload_authorize, params: params
end
context 'with unauthorized roles' do
where(:user_role, :expected_status) do
:guest | :forbidden
:anonymous | :unauthorized
end
with_them do
let(:extra_headers) do
if user_role == :anonymous
{}
else
{ 'HTTP_AUTHORIZATION' => ActionController::HttpAuthentication::Basic.encode_credentials(user.username, pat.token) }
end
end
before do
project.send("add_#{user_role}", user) unless user_role == :anonymous
end
it_behaves_like 'returning response status', params[:expected_status]
end
end
context 'with at least developer role' do
let(:extra_headers) { { 'HTTP_AUTHORIZATION' => ActionController::HttpAuthentication::Basic.encode_credentials(user.username, pat.token) } }
before do
project.add_developer(user)
end
it 'sets Workhorse with a max limit' do
expect(LfsObjectUploader).to receive(:workhorse_authorize).with(has_length: false, maximum_size: params[:size].to_i).and_call_original
subject
expect(response).to have_gitlab_http_status(:ok)
end
end end
end
describe 'PUT #upload_finalize' do
let(:headers) { workhorse_internal_api_request_header }
let(:extra_headers) { {} }
let(:uploaded_file) { temp_file }
before do before do
request.headers.merge!(extra_headers) request.headers.merge!(extra_headers)
......
...@@ -750,7 +750,7 @@ RSpec.describe 'Git LFS API and storage' do ...@@ -750,7 +750,7 @@ RSpec.describe 'Git LFS API and storage' do
expect(json_response['RemoteObject']).to have_key('GetURL') expect(json_response['RemoteObject']).to have_key('GetURL')
expect(json_response['RemoteObject']).to have_key('StoreURL') expect(json_response['RemoteObject']).to have_key('StoreURL')
expect(json_response['RemoteObject']).to have_key('DeleteURL') expect(json_response['RemoteObject']).to have_key('DeleteURL')
expect(json_response['RemoteObject']).not_to have_key('MultipartUpload') expect(json_response['RemoteObject']).to have_key('MultipartUpload')
expect(json_response['LfsOid']).to eq(sample_oid) expect(json_response['LfsOid']).to eq(sample_oid)
expect(json_response['LfsSize']).to eq(sample_size) expect(json_response['LfsSize']).to eq(sample_size)
end end
......
Markdown is supported
0%
or
You are about to add 0 people to the discussion. Proceed with caution.
Finish editing this message first!
Please register or to comment