Commit 452dea3a authored by Roque

erp5_wendelin_data_lake_ingestion: fix check md5 script

parent c2cf0c14
Script to check that a filesystem md5sum of a folder (uploaded to file_system_checksum File)
is properly uploaded to Wendelin Data Lake.
Script to check that a data set is properly uploaded
to Wendelin Data Lake.
How to use it: create a file_system_checksum file containing md5sum
values of all dataset files uploaded with the following format:
The format is the same as md5sum's output:
<md5_sum> <filename.extension>
It can be generated in the original data set folder, outside Wendelin, by running: md5sum * > output.txt
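For readers without the md5sum command at hand, the checksum file described above could be produced roughly as follows. This is a minimal sketch in plain Python 3 and is not part of the commit; the function name `md5sum_lines` is hypothetical, and skipping directories and dot-files is an assumption meant to approximate what the shell glob `*` would pass to md5sum.

```python
import hashlib
import os


def md5sum_lines(folder):
    """Return one '<md5_sum>  <filename>' line per regular file in folder."""
    lines = []
    for name in sorted(os.listdir(folder)):
        path = os.path.join(folder, name)
        # Assumption: ignore sub-directories and dot-files, as `md5sum *`
        # would not produce a checksum line for them either.
        if name.startswith(".") or not os.path.isfile(path):
            continue
        digest = hashlib.md5()
        with open(path, "rb") as handle:
            # Read in chunks so large data set files are not loaded at once.
            for chunk in iter(lambda: handle.read(65536), b""):
                digest.update(chunk)
        lines.append("%s  %s" % (digest.hexdigest(), name))
    return lines
```

Joining the returned lines with newlines gives text that could be stored as the file_system_checksum file; the script only relies on the first 32 characters being the digest and the remainder being the file name.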
import os.path
data = str(context.file_system_checksum).strip()
lines = data.split("\n")
print "Total files = ", len(lines)
check_result = True
for line in lines[:]:
  # an md5 hex digest is 32 characters long; the rest of the line is the file name
  md5_checksum = line[:32].strip()
  full_filename = line[32:].strip()
  # check Data stream for this hash exists
  # os.path.splitext splits on the last dot only, so file names with several
  # dots (or none) no longer raise as unpacking full_filename.split(".") did
  filename, extension = os.path.splitext(full_filename)
  extension = extension[1:]
  # data_set_reference is not defined in this script; presumably a script parameter
  reference = "%s/%s/%s" %(data_set_reference, filename, extension)
  catalog_kw = {"portal_type": "Data Stream",
                "reference": reference}
  data_stream = context.portal_catalog.getResultValue(**catalog_kw)
  if data_stream is None:
    print "[NOT FOUND]", reference
    check_result = False
  else:
    # the Data Stream version is compared against the md5 checksum
    is_upload_ok = (data_stream.getVersion()==md5_checksum)
    print md5_checksum, filename, data_stream is not None, is_upload_ok
    if not is_upload_ok:
      check_result = False
if check_result:
  print "[OK] Data set correctly uploaded"
else:
  print "[ERROR] Data set was not correctly uploaded"
return printed
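The line parsing and reference construction used by the script can be tried outside ERP5. The sketch below is a standalone Python 3 illustration, not code from the commit: the helper name `parse_checksum_line` and the sample data set reference are hypothetical, and the catalog lookup is left out because it needs a running Wendelin instance.

```python
import os.path


def parse_checksum_line(line, data_set_reference):
    """Split one md5sum output line into (md5_checksum, reference)."""
    md5_checksum = line[:32].strip()   # hex digest occupies the first 32 characters
    full_filename = line[32:].strip()  # remainder of the line is the file name
    # splitext keeps every dot except the last one in the file name part
    filename, extension = os.path.splitext(full_filename)
    extension = extension[1:]          # drop the leading "."
    reference = "%s/%s/%s" % (data_set_reference, filename, extension)
    return md5_checksum, reference


if __name__ == "__main__":
    sample = "b1946ac92492d2347c6235b8d2611184  sample.2019.csv"
    print(parse_checksum_line(sample, "my-dataset"))
```

For a name such as sample.2019.csv, `full_filename.split(".")` yields three parts and cannot be unpacked into two variables, whereas `os.path.splitext` returns the pair `("sample.2019", ".csv")`. A name without any dot gives an empty extension, so the reference ends with a trailing slash.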