Skip to content
Projects
Groups
Snippets
Help
Loading...
Help
Support
Keyboard shortcuts
?
Submit feedback
Contribute to GitLab
Sign in / Register
Toggle navigation
K
klaus_wendelin
Project overview
Project overview
Details
Activity
Releases
Repository
Repository
Files
Commits
Branches
Tags
Contributors
Graph
Compare
Issues
0
Issues
0
List
Boards
Labels
Milestones
Merge Requests
0
Merge Requests
0
CI / CD
CI / CD
Pipelines
Jobs
Schedules
Analytics
Analytics
CI / CD
Repository
Value Stream
Wiki
Wiki
Snippets
Snippets
Members
Members
Collapse sidebar
Close sidebar
Activity
Graph
Create a new issue
Jobs
Commits
Issue Boards
Open sidebar
Eteri
klaus_wendelin
Commits
452dea3a
Commit
452dea3a
authored
Feb 26, 2021
by
Roque
Browse files
Options
Browse Files
Download
Email Patches
Plain Diff
erp5_wendelin_data_lake_ingestion: fix check md5 script
parent
c2cf0c14
Changes
1
Show whitespace changes
Inline
Side-by-side
Showing
1 changed file
with
19 additions
and
6 deletions
+19
-6
bt5/erp5_wendelin_data_lake_ingestion/SkinTemplateItem/portal_skins/erp5_wendelin_data_lake/DataSet_checkMd5DataStreamList.py
...erp5_wendelin_data_lake/DataSet_checkMd5DataStreamList.py
+19
-6
No files found.
bt5/erp5_wendelin_data_lake_ingestion/SkinTemplateItem/portal_skins/erp5_wendelin_data_lake/DataSet_checkMd5DataStreamList.py
View file @
452dea3a
"""
Script to check that a filesystem md5sum of a folder (uploaded to file_system_checksum File)
is properly uploaded to Wendelin Data Lake.
Script to check that a data set is properly uploaded
to Wendelin Data Lake.
How to use it: create a file_system_checksum file containing md5sum
values of all dataset files uploaded with the following format:
Format of is the same as md5sum's output:
<md5_sum> <filename.extension>
It can be generated in the original data set folder outside wendelin by doing md5sum * > output.txt
"""
import
os.path
data
=
str
(
context
.
file_system_checksum
).
strip
()
lines
=
data
.
split
(
"
\
n
"
)
print
"Total files = "
,
len
(
lines
)
print
check_result
=
True
for
line
in
lines
[:]:
md5_checksum
=
line
[:
32
].
strip
()
full_filename
=
line
[
32
:].
strip
()
# check Data stream for this hash exists
filename
,
extension
=
full_filename
.
split
(
"."
)
filename
,
extension
=
os
.
path
.
splitext
(
full_filename
)
extension
=
extension
[
1
:]
reference
=
"%s/%s/%s"
%
(
data_set_reference
,
filename
,
extension
)
catalog_kw
=
{
"portal_type"
:
"Data Stream"
,
"reference"
:
reference
}
data_stream
=
context
.
portal_catalog
.
getResultValue
(
**
catalog_kw
)
if
data_stream
is
None
:
print
"[NOT FOUND]"
,
reference
check_result
=
False
else
:
is_upload_ok
=
(
data_stream
.
getVersion
()
==
md5_checksum
)
print
md5_checksum
,
filename
,
data_stream
is
not
None
,
is_upload_ok
if
not
is_upload_ok
:
check_result
=
False
print
if
check_result
:
print
"[OK] Data set correctly uploaded"
else
:
print
"[ERROR] Data set was not correctly uploaded"
return
printed
Write
Preview
Markdown
is supported
0%
Try again
or
attach a new file
Attach a file
Cancel
You are about to add
0
people
to the discussion. Proceed with caution.
Finish editing this message first!
Cancel
Please
register
or
sign in
to comment