Skip to content
Projects
Groups
Snippets
Help
Loading...
Help
Support
Keyboard shortcuts
?
Submit feedback
Contribute to GitLab
Sign in / Register
Toggle navigation
W
wendelin
Project overview
Project overview
Details
Activity
Releases
Repository
Repository
Files
Commits
Branches
Tags
Contributors
Graph
Compare
Issues
0
Issues
0
List
Boards
Labels
Milestones
Merge Requests
0
Merge Requests
0
CI / CD
CI / CD
Pipelines
Jobs
Schedules
Analytics
Analytics
CI / CD
Repository
Value Stream
Wiki
Wiki
Snippets
Snippets
Members
Members
Collapse sidebar
Close sidebar
Activity
Graph
Create a new issue
Jobs
Commits
Issue Boards
Open sidebar
Nikola Balog
wendelin
Commits
558e96dc
Commit
558e96dc
authored
Jun 24, 2020
by
Ivan Tyagov
Browse files
Options
Browse Files
Download
Plain Diff
Data lake changes
See merge request
nexedi/wendelin!45
parents
28a9e89e
4bd0aa64
Changes
44
Show whitespace changes
Inline
Side-by-side
Showing
44 changed files
with
595 additions
and
245 deletions
+595
-245
bt5/erp5_wendelin_data_lake_ingestion/PathTemplateItem/portal_callables/DataIngestionLine_writeEbulkIngestionToDataStream.py
...bles/DataIngestionLine_writeEbulkIngestionToDataStream.py
+41
-0
bt5/erp5_wendelin_data_lake_ingestion/PathTemplateItem/portal_callables/IngestionPolicy_parseEbulkIngestionTag.py
...ortal_callables/IngestionPolicy_parseEbulkIngestionTag.py
+3
-3
bt5/erp5_wendelin_data_lake_ingestion/PathTemplateItem/portal_ingestion_policies/default_ebulk.xml
...hTemplateItem/portal_ingestion_policies/default_ebulk.xml
+246
-0
bt5/erp5_wendelin_data_lake_ingestion/PathTemplateItem/portal_ingestion_policies/wendelin_embulk.xml
...emplateItem/portal_ingestion_policies/wendelin_embulk.xml
+5
-5
bt5/erp5_wendelin_data_lake_ingestion/SkinTemplateItem/portal_skins/erp5_ingestion_reference_utils.xml
...plateItem/portal_skins/erp5_ingestion_reference_utils.xml
+0
-26
bt5/erp5_wendelin_data_lake_ingestion/SkinTemplateItem/portal_skins/erp5_ingestion_reference_utils/getIngestionConstantsJson.py
...p5_ingestion_reference_utils/getIngestionConstantsJson.py
+0
-13
bt5/erp5_wendelin_data_lake_ingestion/SkinTemplateItem/portal_skins/erp5_wendelin_data_lake/DataLake_executeDataAnalysisList.py
...p5_wendelin_data_lake/DataLake_executeDataAnalysisList.py
+1
-1
bt5/erp5_wendelin_data_lake_ingestion/SkinTemplateItem/portal_skins/erp5_wendelin_data_lake/DataLake_stopIngestionList.py
...ins/erp5_wendelin_data_lake/DataLake_stopIngestionList.py
+6
-6
bt5/erp5_wendelin_data_lake_ingestion/SkinTemplateItem/portal_skins/erp5_wendelin_data_lake/ERP5Site_checkIngestionReferenceExists.py
...delin_data_lake/ERP5Site_checkIngestionReferenceExists.py
+4
-4
bt5/erp5_wendelin_data_lake_ingestion/SkinTemplateItem/portal_skins/erp5_wendelin_data_lake/ERP5Site_checkIngestionReferenceExists.xml
...elin_data_lake/ERP5Site_checkIngestionReferenceExists.xml
+1
-1
bt5/erp5_wendelin_data_lake_ingestion/SkinTemplateItem/portal_skins/erp5_wendelin_data_lake/ERP5Site_checkReferenceInvalidated.py
..._wendelin_data_lake/ERP5Site_checkReferenceInvalidated.py
+1
-1
bt5/erp5_wendelin_data_lake_ingestion/SkinTemplateItem/portal_skins/erp5_wendelin_data_lake/ERP5Site_checkReferenceInvalidated.xml
...wendelin_data_lake/ERP5Site_checkReferenceInvalidated.xml
+1
-1
bt5/erp5_wendelin_data_lake_ingestion/SkinTemplateItem/portal_skins/erp5_wendelin_data_lake/ERP5Site_getDataStreamChunk.py
...ns/erp5_wendelin_data_lake/ERP5Site_getDataStreamChunk.py
+0
-0
bt5/erp5_wendelin_data_lake_ingestion/SkinTemplateItem/portal_skins/erp5_wendelin_data_lake/ERP5Site_getDataStreamChunk.xml
...s/erp5_wendelin_data_lake/ERP5Site_getDataStreamChunk.xml
+1
-1
bt5/erp5_wendelin_data_lake_ingestion/SkinTemplateItem/portal_skins/erp5_wendelin_data_lake/ERP5Site_getDataStreamList.py
...ins/erp5_wendelin_data_lake/ERP5Site_getDataStreamList.py
+36
-0
bt5/erp5_wendelin_data_lake_ingestion/SkinTemplateItem/portal_skins/erp5_wendelin_data_lake/ERP5Site_getDataStreamList.xml
...ns/erp5_wendelin_data_lake/ERP5Site_getDataStreamList.xml
+1
-1
bt5/erp5_wendelin_data_lake_ingestion/SkinTemplateItem/portal_skins/erp5_wendelin_data_lake/ERP5Site_getIngestionConstantsJson.py
..._wendelin_data_lake/ERP5Site_getIngestionConstantsJson.py
+13
-0
bt5/erp5_wendelin_data_lake_ingestion/SkinTemplateItem/portal_skins/erp5_wendelin_data_lake/ERP5Site_getIngestionConstantsJson.xml
...wendelin_data_lake/ERP5Site_getIngestionConstantsJson.xml
+1
-1
bt5/erp5_wendelin_data_lake_ingestion/SkinTemplateItem/portal_skins/erp5_wendelin_data_lake/ERP5Site_getIngestionReferenceDictionary.py
...lin_data_lake/ERP5Site_getIngestionReferenceDictionary.py
+0
-0
bt5/erp5_wendelin_data_lake_ingestion/SkinTemplateItem/portal_skins/erp5_wendelin_data_lake/ERP5Site_getIngestionReferenceDictionary.xml
...in_data_lake/ERP5Site_getIngestionReferenceDictionary.xml
+1
-1
bt5/erp5_wendelin_data_lake_ingestion/SkinTemplateItem/portal_skins/erp5_wendelin_data_lake/ERP5Site_invalidateIngestionObjects.py
...wendelin_data_lake/ERP5Site_invalidateIngestionObjects.py
+1
-1
bt5/erp5_wendelin_data_lake_ingestion/SkinTemplateItem/portal_skins/erp5_wendelin_data_lake/ERP5Site_invalidateReference.py
...s/erp5_wendelin_data_lake/ERP5Site_invalidateReference.py
+1
-1
bt5/erp5_wendelin_data_lake_ingestion/SkinTemplateItem/portal_skins/erp5_wendelin_data_lake/ERP5Site_invalidateReference.xml
.../erp5_wendelin_data_lake/ERP5Site_invalidateReference.xml
+1
-1
bt5/erp5_wendelin_data_lake_ingestion/SkinTemplateItem/portal_skins/erp5_wendelin_data_lake/ERP5Site_invalidateSplitIngestions.py
..._wendelin_data_lake/ERP5Site_invalidateSplitIngestions.py
+10
-14
bt5/erp5_wendelin_data_lake_ingestion/SkinTemplateItem/portal_skins/erp5_wendelin_data_lake/ERP5Site_renameIngestion.py
...skins/erp5_wendelin_data_lake/ERP5Site_renameIngestion.py
+34
-0
bt5/erp5_wendelin_data_lake_ingestion/SkinTemplateItem/portal_skins/erp5_wendelin_data_lake/ERP5Site_renameIngestion.xml
...kins/erp5_wendelin_data_lake/ERP5Site_renameIngestion.xml
+62
-0
bt5/erp5_wendelin_data_lake_ingestion/SkinTemplateItem/portal_skins/erp5_wendelin_data_lake/ERP5Site_revalidateReference.py
...s/erp5_wendelin_data_lake/ERP5Site_revalidateReference.py
+1
-1
bt5/erp5_wendelin_data_lake_ingestion/SkinTemplateItem/portal_skins/erp5_wendelin_data_lake/ERP5Site_revalidateReference.xml
.../erp5_wendelin_data_lake/ERP5Site_revalidateReference.xml
+1
-1
bt5/erp5_wendelin_data_lake_ingestion/SkinTemplateItem/portal_skins/erp5_wendelin_data_lake/ERP5Site_stopIngestionList.py
...ins/erp5_wendelin_data_lake/ERP5Site_stopIngestionList.py
+50
-56
bt5/erp5_wendelin_data_lake_ingestion/SkinTemplateItem/portal_skins/erp5_wendelin_data_lake/ERP5_getDescriptorHTMLContent.py
.../erp5_wendelin_data_lake/ERP5_getDescriptorHTMLContent.py
+0
-0
bt5/erp5_wendelin_data_lake_ingestion/SkinTemplateItem/portal_skins/erp5_wendelin_data_lake/ERP5_getDescriptorHTMLContent.xml
...erp5_wendelin_data_lake/ERP5_getDescriptorHTMLContent.xml
+1
-1
bt5/erp5_wendelin_data_lake_ingestion/SkinTemplateItem/portal_skins/erp5_wendelin_data_lake/IngestionPolicy_getIngestionOperationAndParameterDictEbulk.py
...stionPolicy_getIngestionOperationAndParameterDictEbulk.py
+8
-11
bt5/erp5_wendelin_data_lake_ingestion/SkinTemplateItem/portal_skins/erp5_wendelin_data_lake/getDataStreamList.py
...portal_skins/erp5_wendelin_data_lake/getDataStreamList.py
+0
-38
bt5/erp5_wendelin_data_lake_ingestion/TestTemplateItem/portal_components/test.erp5.testDataLakeIngestion.py
...Item/portal_components/test.erp5.testDataLakeIngestion.py
+50
-43
bt5/erp5_wendelin_data_lake_ingestion/TestTemplateItem/portal_components/test.erp5.testDataLakeIngestion.xml
...tem/portal_components/test.erp5.testDataLakeIngestion.xml
+2
-3
bt5/erp5_wendelin_data_lake_ingestion/bt/template_keep_last_workflow_history_only_path_list
...ion/bt/template_keep_last_workflow_history_only_path_list
+1
-0
bt5/erp5_wendelin_data_lake_ingestion/bt/template_keep_workflow_path_list
...n_data_lake_ingestion/bt/template_keep_workflow_path_list
+1
-0
bt5/erp5_wendelin_data_lake_ingestion/bt/template_path_list
bt5/erp5_wendelin_data_lake_ingestion/bt/template_path_list
+1
-0
bt5/erp5_wendelin_data_lake_ingestion/bt/template_skin_id_list
...rp5_wendelin_data_lake_ingestion/bt/template_skin_id_list
+0
-1
bt5/erp5_wendelin_data_lake_ui/PathTemplateItem/web_page_module/fif_gadget_erp5_page_file_js.js
...plateItem/web_page_module/fif_gadget_erp5_page_file_js.js
+1
-1
bt5/erp5_wendelin_data_lake_ui/PathTemplateItem/web_page_module/fif_gadget_erp5_page_file_js.xml
...lateItem/web_page_module/fif_gadget_erp5_page_file_js.xml
+2
-2
bt5/erp5_wendelin_data_lake_ui/PathTemplateItem/web_page_module/gadget_fif_page_list_dataset_js.js
...teItem/web_page_module/gadget_fif_page_list_dataset_js.js
+1
-1
bt5/erp5_wendelin_data_lake_ui/PathTemplateItem/web_page_module/gadget_fif_page_list_dataset_js.xml
...eItem/web_page_module/gadget_fif_page_list_dataset_js.xml
+2
-2
bt5/erp5_wendelin_data_lake_ui/PathTemplateItem/web_page_module/gadget_fif_page_list_file_js.xml
...lateItem/web_page_module/gadget_fif_page_list_file_js.xml
+2
-2
No files found.
bt5/erp5_wendelin_data_lake_ingestion/PathTemplateItem/portal_callables/DataIngestionLine_writeEbulkIngestionToDataStream.py
View file @
558e96dc
import
hashlib
import
base64
import
base64
from
Products.ZSQLCatalog.SQLCatalog
import
Query
CHUNK_SIZE
=
200000
def
getHash
(
data_stream
):
hash_md5
=
hashlib
.
md5
()
data_stream_chunk
=
None
n_chunk
=
0
chunk_size
=
CHUNK_SIZE
while
True
:
start_offset
=
n_chunk
*
chunk_size
end_offset
=
n_chunk
*
chunk_size
+
chunk_size
try
:
data_stream_chunk
=
''
.
join
(
data_stream
.
readChunkList
(
start_offset
,
end_offset
))
except
Exception
:
# data stream is empty
data_stream_chunk
=
""
hash_md5
.
update
(
data_stream_chunk
)
if
data_stream_chunk
==
""
:
break
n_chunk
+=
1
return
hash_md5
.
hexdigest
()
decoded
=
base64
.
b64decode
(
data_chunk
)
decoded
=
base64
.
b64decode
(
data_chunk
)
data_stream
.
appendData
(
decoded
)
data_stream
.
appendData
(
decoded
)
data_stream
.
setVersion
(
getHash
(
data_stream
))
portal
=
context
.
getPortalObject
()
portal_catalog
=
portal
.
portal_catalog
reference_end_split
=
portal
.
ERP5Site_getIngestionReferenceDictionary
()[
"split_end_suffix"
]
#if last chunk of split ingestion -> validate all related data streams and publish the current one:
if
data_stream
.
getId
().
endswith
(
reference_end_split
):
query
=
Query
(
portal_type
=
"Data Stream"
,
reference
=
data_stream
.
getReference
(),
validation_state
=
"draft"
)
split_ingestion_data_stream_list
=
portal_catalog
(
query
=
query
,
sort_on
=
((
'creation_date'
,
'ascending'
),))
#full_file_size = 0
for
chunk_data_stream
in
split_ingestion_data_stream_list
:
#full_file_size += chunk_data_stream.getSize()
if
chunk_data_stream
.
getValidationState
()
!=
"validated"
:
chunk_data_stream
.
validate
()
if
data_stream
.
getValidationState
()
!=
"validated"
:
data_stream
.
validate
()
data_stream
.
publish
()
bt5/erp5_wendelin_data_lake_ingestion/PathTemplateItem/portal_callables/IngestionPolicy_parseEbulkIngestionTag.py
View file @
558e96dc
portal
=
context
.
getPortalObject
()
portal
=
context
.
getPortalObject
()
reference_separator
=
portal
.
getIngestionReferenceDictionary
()[
"reference_separator"
]
reference_separator
=
portal
.
ERP5Site_
getIngestionReferenceDictionary
()[
"reference_separator"
]
reference_length
=
portal
.
getIngestionReferenceDictionary
()[
"reference_length"
]
reference_length
=
portal
.
ERP5Site_
getIngestionReferenceDictionary
()[
"reference_length"
]
invalid_chars
=
portal
.
getIngestionReferenceDictionary
()[
"invalid_chars"
]
invalid_chars
=
portal
.
ERP5Site_
getIngestionReferenceDictionary
()[
"invalid_chars"
]
record
=
reference
.
rsplit
(
reference_separator
)
record
=
reference
.
rsplit
(
reference_separator
)
length
=
len
(
record
)
length
=
len
(
record
)
...
...
bt5/erp5_wendelin_data_lake_ingestion/PathTemplateItem/portal_ingestion_policies/default_ebulk.xml
0 → 100644
View file @
558e96dc
<?xml version="1.0"?>
<ZopeData>
<record
id=
"1"
aka=
"AAAAAAAAAAE="
>
<pickle>
<global
name=
"Ingestion Policy"
module=
"erp5.portal_type"
/>
</pickle>
<pickle>
<dictionary>
<item>
<key>
<string>
_Access_contents_information_Permission
</string>
</key>
<value>
<tuple>
<string>
Anonymous
</string>
<string>
Assignee
</string>
<string>
Assignor
</string>
<string>
Associate
</string>
<string>
Auditor
</string>
<string>
Manager
</string>
</tuple>
</value>
</item>
<item>
<key>
<string>
_Add_portal_content_Permission
</string>
</key>
<value>
<tuple>
<string>
Assignee
</string>
<string>
Assignor
</string>
<string>
Associate
</string>
<string>
Manager
</string>
</tuple>
</value>
</item>
<item>
<key>
<string>
_Modify_portal_content_Permission
</string>
</key>
<value>
<tuple>
<string>
Assignee
</string>
<string>
Assignor
</string>
<string>
Associate
</string>
<string>
Manager
</string>
</tuple>
</value>
</item>
<item>
<key>
<string>
_View_Permission
</string>
</key>
<value>
<tuple>
<string>
Anonymous
</string>
<string>
Assignee
</string>
<string>
Assignor
</string>
<string>
Associate
</string>
<string>
Auditor
</string>
<string>
Manager
</string>
</tuple>
</value>
</item>
<item>
<key>
<string>
data_operation_script_id
</string>
</key>
<value>
<string>
IngestionPolicy_getIngestionOperationAndParameterDictEbulk
</string>
</value>
</item>
<item>
<key>
<string>
default_reference
</string>
</key>
<value>
<string>
default_ebulk
</string>
</value>
</item>
<item>
<key>
<string>
description
</string>
</key>
<value>
<string>
Handles ingestion of raw files bytes sent to us from ebulk.
</string>
</value>
</item>
<item>
<key>
<string>
id
</string>
</key>
<value>
<string>
default_ebulk
</string>
</value>
</item>
<item>
<key>
<string>
portal_type
</string>
</key>
<value>
<string>
Ingestion Policy
</string>
</value>
</item>
<item>
<key>
<string>
script_id
</string>
</key>
<value>
<string>
IngestionPolicy_parseEbulkIngestionTag
</string>
</value>
</item>
<item>
<key>
<string>
title
</string>
</key>
<value>
<string>
Default Ebulk Ingestion Policy
</string>
</value>
</item>
<item>
<key>
<string>
version
</string>
</key>
<value>
<string>
001
</string>
</value>
</item>
<item>
<key>
<string>
workflow_history
</string>
</key>
<value>
<persistent>
<string
encoding=
"base64"
>
AAAAAAAAAAI=
</string>
</persistent>
</value>
</item>
</dictionary>
</pickle>
</record>
<record
id=
"2"
aka=
"AAAAAAAAAAI="
>
<pickle>
<global
name=
"PersistentMapping"
module=
"Persistence.mapping"
/>
</pickle>
<pickle>
<dictionary>
<item>
<key>
<string>
data
</string>
</key>
<value>
<dictionary>
<item>
<key>
<string>
edit_workflow
</string>
</key>
<value>
<persistent>
<string
encoding=
"base64"
>
AAAAAAAAAAM=
</string>
</persistent>
</value>
</item>
<item>
<key>
<string>
validation_workflow
</string>
</key>
<value>
<persistent>
<string
encoding=
"base64"
>
AAAAAAAAAAQ=
</string>
</persistent>
</value>
</item>
</dictionary>
</value>
</item>
</dictionary>
</pickle>
</record>
<record
id=
"3"
aka=
"AAAAAAAAAAM="
>
<pickle>
<global
name=
"WorkflowHistoryList"
module=
"Products.ERP5Type.Workflow"
/>
</pickle>
<pickle>
<dictionary>
<item>
<key>
<string>
_log
</string>
</key>
<value>
<list>
<dictionary>
<item>
<key>
<string>
action
</string>
</key>
<value>
<string>
edit
</string>
</value>
</item>
<item>
<key>
<string>
actor
</string>
</key>
<value>
<string>
zope
</string>
</value>
</item>
<item>
<key>
<string>
comment
</string>
</key>
<value>
<none/>
</value>
</item>
<item>
<key>
<string>
error_message
</string>
</key>
<value>
<string></string>
</value>
</item>
<item>
<key>
<string>
serial
</string>
</key>
<value>
<string>
984.990.49194.19609
</string>
</value>
</item>
<item>
<key>
<string>
state
</string>
</key>
<value>
<string>
current
</string>
</value>
</item>
<item>
<key>
<string>
time
</string>
</key>
<value>
<object>
<klass>
<global
name=
"DateTime"
module=
"DateTime.DateTime"
/>
</klass>
<tuple>
<none/>
</tuple>
<state>
<tuple>
<float>
1589986625.99
</float>
<string>
UTC
</string>
</tuple>
</state>
</object>
</value>
</item>
</dictionary>
</list>
</value>
</item>
</dictionary>
</pickle>
</record>
<record
id=
"4"
aka=
"AAAAAAAAAAQ="
>
<pickle>
<global
name=
"WorkflowHistoryList"
module=
"Products.ERP5Type.Workflow"
/>
</pickle>
<pickle>
<dictionary>
<item>
<key>
<string>
_log
</string>
</key>
<value>
<list>
<dictionary>
<item>
<key>
<string>
action
</string>
</key>
<value>
<string>
validate
</string>
</value>
</item>
<item>
<key>
<string>
actor
</string>
</key>
<value>
<string>
zope
</string>
</value>
</item>
<item>
<key>
<string>
comment
</string>
</key>
<value>
<string></string>
</value>
</item>
<item>
<key>
<string>
error_message
</string>
</key>
<value>
<string></string>
</value>
</item>
<item>
<key>
<string>
time
</string>
</key>
<value>
<object>
<klass>
<global
name=
"DateTime"
module=
"DateTime.DateTime"
/>
</klass>
<tuple>
<none/>
</tuple>
<state>
<tuple>
<float>
1589898672.1
</float>
<string>
UTC
</string>
</tuple>
</state>
</object>
</value>
</item>
<item>
<key>
<string>
validation_state
</string>
</key>
<value>
<string>
validated
</string>
</value>
</item>
</dictionary>
</list>
</value>
</item>
</dictionary>
</pickle>
</record>
</ZopeData>
bt5/erp5_wendelin_data_lake_ingestion/PathTemplateItem/portal_ingestion_policies/wendelin_embulk.xml
View file @
558e96dc
...
@@ -58,11 +58,11 @@
...
@@ -58,11 +58,11 @@
</item>
</item>
<item>
<item>
<key>
<string>
default_reference
</string>
</key>
<key>
<string>
default_reference
</string>
</key>
<value>
<string>
wendelin_e
m
bulk
</string>
</value>
<value>
<string>
wendelin_ebulk
</string>
</value>
</item>
</item>
<item>
<item>
<key>
<string>
description
</string>
</key>
<key>
<string>
description
</string>
</key>
<value>
<string>
Handles ingestion of raw files bytes sent to us from em
bulk.
</string>
</value>
<value>
<string>
[OBSOLETE - Kept for old ebulk clients] Handles ingestion of raw files bytes sent to us from e
bulk.
</string>
</value>
</item>
</item>
<item>
<item>
<key>
<string>
id
</string>
</key>
<key>
<string>
id
</string>
</key>
...
@@ -78,7 +78,7 @@
...
@@ -78,7 +78,7 @@
</item>
</item>
<item>
<item>
<key>
<string>
title
</string>
</key>
<key>
<string>
title
</string>
</key>
<value>
<string>
Wendelin Em
bulk Ingestion Policy
</string>
</value>
<value>
<string>
[OBSOLETE] Wendelin E
bulk Ingestion Policy
</string>
</value>
</item>
</item>
<item>
<item>
<key>
<string>
version
</string>
</key>
<key>
<string>
version
</string>
</key>
...
@@ -152,7 +152,7 @@
...
@@ -152,7 +152,7 @@
</item>
</item>
<item>
<item>
<key>
<string>
serial
</string>
</key>
<key>
<string>
serial
</string>
</key>
<value>
<string>
983.
12768.14725.372
0
</string>
</value>
<value>
<string>
983.
63603.62266.826
0
</string>
</value>
</item>
</item>
<item>
<item>
<key>
<string>
state
</string>
</key>
<key>
<string>
state
</string>
</key>
...
@@ -170,7 +170,7 @@
...
@@ -170,7 +170,7 @@
</tuple>
</tuple>
<state>
<state>
<tuple>
<tuple>
<float>
158
6946559.16
</float>
<float>
158
9986571.48
</float>
<string>
UTC
</string>
<string>
UTC
</string>
</tuple>
</tuple>
</state>
</state>
...
...
bt5/erp5_wendelin_data_lake_ingestion/SkinTemplateItem/portal_skins/erp5_ingestion_reference_utils.xml
deleted
100644 → 0
View file @
28a9e89e
<?xml version="1.0"?>
<ZopeData>
<record
id=
"1"
aka=
"AAAAAAAAAAE="
>
<pickle>
<global
name=
"Folder"
module=
"OFS.Folder"
/>
</pickle>
<pickle>
<dictionary>
<item>
<key>
<string>
_objects
</string>
</key>
<value>
<tuple/>
</value>
</item>
<item>
<key>
<string>
id
</string>
</key>
<value>
<string>
erp5_ingestion_reference_utils
</string>
</value>
</item>
<item>
<key>
<string>
title
</string>
</key>
<value>
<string></string>
</value>
</item>
</dictionary>
</pickle>
</record>
</ZopeData>
bt5/erp5_wendelin_data_lake_ingestion/SkinTemplateItem/portal_skins/erp5_ingestion_reference_utils/getIngestionConstantsJson.py
deleted
100644 → 0
View file @
28a9e89e
import
json
portal
=
context
.
getPortalObject
()
dict
=
{
'invalid_suffix'
:
portal
.
getIngestionReferenceDictionary
()[
'invalid_suffix'
],
'split_end_suffix'
:
portal
.
getIngestionReferenceDictionary
()[
'split_end_suffix'
],
'single_end_suffix'
:
portal
.
getIngestionReferenceDictionary
()[
'single_end_suffix'
],
'split_first_suffix'
:
portal
.
getIngestionReferenceDictionary
()[
'split_first_suffix'
],
'none_extension'
:
portal
.
getIngestionReferenceDictionary
()[
'none_extension'
],
'reference_separator'
:
portal
.
getIngestionReferenceDictionary
()[
'reference_separator'
],
'complex_files_extensions'
:
portal
.
getIngestionReferenceDictionary
()[
'complex_files_extensions'
],
'reference_length'
:
portal
.
getIngestionReferenceDictionary
()[
'reference_length'
],
'invalid_chars'
:
portal
.
getIngestionReferenceDictionary
()[
'invalid_chars'
],
}
return
json
.
dumps
(
dict
)
bt5/erp5_wendelin_data_lake_ingestion/SkinTemplateItem/portal_skins/erp5_wendelin_data_lake/DataLake_executeDataAnalysisList.py
View file @
558e96dc
from
Products.ERP5Type.Log
import
log
from
Products.ERP5Type.Log
import
log
portal
=
context
.
getPortalObject
()
portal
=
context
.
getPortalObject
()
portal_catalog
=
portal
.
portal_catalog
portal_catalog
=
portal
.
portal_catalog
complex_files
=
portal
.
getIngestionReferenceDictionary
()[
"complex_files_extensions"
]
complex_files
=
portal
.
ERP5Site_
getIngestionReferenceDictionary
()[
"complex_files_extensions"
]
for
data_analysis
in
portal_catalog
(
portal_type
=
"Data Analysis"
,
for
data_analysis
in
portal_catalog
(
portal_type
=
"Data Analysis"
,
simulation_state
=
"planned"
):
simulation_state
=
"planned"
):
...
...
bt5/erp5_wendelin_data_lake_ingestion/SkinTemplateItem/portal_skins/erp5_wendelin_data_lake/DataLake_stopIngestionList.py
View file @
558e96dc
...
@@ -40,15 +40,15 @@ def isInterruptedAbandonedSplitIngestion(reference):
...
@@ -40,15 +40,15 @@ def isInterruptedAbandonedSplitIngestion(reference):
portal
=
context
.
getPortalObject
()
portal
=
context
.
getPortalObject
()
portal_catalog
=
portal
.
portal_catalog
portal_catalog
=
portal
.
portal_catalog
reference_end_single
=
portal
.
getIngestionReferenceDictionary
()[
"single_end_suffix"
]
reference_end_single
=
portal
.
ERP5Site_
getIngestionReferenceDictionary
()[
"single_end_suffix"
]
reference_first_split
=
portal
.
getIngestionReferenceDictionary
()[
"split_first_suffix"
]
reference_first_split
=
portal
.
ERP5Site_
getIngestionReferenceDictionary
()[
"split_first_suffix"
]
reference_end_split
=
portal
.
getIngestionReferenceDictionary
()[
"split_end_suffix"
]
reference_end_split
=
portal
.
ERP5Site_
getIngestionReferenceDictionary
()[
"split_end_suffix"
]
# stop single started ingestion (not split files)
# stop single started ingestion (not split files)
for
data_ingestion
in
portal_catalog
(
portal_type
=
"Data Ingestion"
,
for
data_ingestion
in
portal_catalog
(
portal_type
=
"Data Ingestion"
,
simulation_state
=
"started"
,
simulation_state
=
"started"
,
id
=
"%"
+
reference_end_single
):
id
=
"%"
+
reference_end_single
):
if
not
portal
.
Is
ReferenceInvalidated
(
data_ingestion
):
if
not
portal
.
ERP5Site_check
ReferenceInvalidated
(
data_ingestion
):
related_split_ingestions
=
portal_catalog
(
portal_type
=
"Data Ingestion"
,
related_split_ingestions
=
portal_catalog
(
portal_type
=
"Data Ingestion"
,
reference
=
data_ingestion
.
getReference
())
reference
=
data_ingestion
.
getReference
())
if
len
(
related_split_ingestions
)
==
1
:
if
len
(
related_split_ingestions
)
==
1
:
...
@@ -67,7 +67,7 @@ for data_ingestion in portal_catalog(portal_type = "Data Ingestion",
...
@@ -67,7 +67,7 @@ for data_ingestion in portal_catalog(portal_type = "Data Ingestion",
for
data_ingestion
in
portal_catalog
(
portal_type
=
"Data Ingestion"
,
for
data_ingestion
in
portal_catalog
(
portal_type
=
"Data Ingestion"
,
simulation_state
=
"started"
,
simulation_state
=
"started"
,
id
=
"%"
+
reference_first_split
):
id
=
"%"
+
reference_first_split
):
if
not
portal
.
Is
ReferenceInvalidated
(
data_ingestion
):
if
not
portal
.
ERP5Site_check
ReferenceInvalidated
(
data_ingestion
):
if
isInterruptedAbandonedSplitIngestion
(
data_ingestion
.
getReference
()):
if
isInterruptedAbandonedSplitIngestion
(
data_ingestion
.
getReference
()):
portal
.
ERP5Site_invalidateSplitIngestions
(
data_ingestion
.
getReference
(),
success
=
False
)
portal
.
ERP5Site_invalidateSplitIngestions
(
data_ingestion
.
getReference
(),
success
=
False
)
else
:
else
:
...
@@ -102,7 +102,7 @@ for data_ingestion in portal_catalog(portal_type = "Data Ingestion",
...
@@ -102,7 +102,7 @@ for data_ingestion in portal_catalog(portal_type = "Data Ingestion",
if
ingestion
.
getSimulationState
()
==
"started"
:
if
ingestion
.
getSimulationState
()
==
"started"
:
ingestion
.
stop
()
ingestion
.
stop
()
else
:
else
:
portal
.
I
nvalidateReference
(
ingestion
)
portal
.
ERP5Site_i
nvalidateReference
(
ingestion
)
ingestion
.
deliver
()
ingestion
.
deliver
()
except
Exception
as
e
:
except
Exception
as
e
:
context
.
logEntry
(
"ERROR appending split data streams for ingestion: %s - reference: %s."
%
(
data_ingestion
.
getId
(),
data_ingestion
.
getReference
()))
context
.
logEntry
(
"ERROR appending split data streams for ingestion: %s - reference: %s."
%
(
data_ingestion
.
getId
(),
data_ingestion
.
getReference
()))
...
...
bt5/erp5_wendelin_data_lake_ingestion/SkinTemplateItem/portal_skins/erp5_wendelin_data_lake/
i
ngestionReferenceExists.py
→
bt5/erp5_wendelin_data_lake_ingestion/SkinTemplateItem/portal_skins/erp5_wendelin_data_lake/
ERP5Site_checkI
ngestionReferenceExists.py
View file @
558e96dc
...
@@ -10,9 +10,9 @@ TRUE = "TRUE"
...
@@ -10,9 +10,9 @@ TRUE = "TRUE"
portal
=
context
.
getPortalObject
()
portal
=
context
.
getPortalObject
()
portal_catalog
=
portal
.
portal_catalog
portal_catalog
=
portal
.
portal_catalog
reference_separator
=
portal
.
getIngestionReferenceDictionary
()[
"reference_separator"
]
reference_separator
=
portal
.
ERP5Site_
getIngestionReferenceDictionary
()[
"reference_separator"
]
reference_end_single
=
portal
.
getIngestionReferenceDictionary
()[
"single_end_suffix"
]
reference_end_single
=
portal
.
ERP5Site_
getIngestionReferenceDictionary
()[
"single_end_suffix"
]
reference_end_split
=
portal
.
getIngestionReferenceDictionary
()[
"split_end_suffix"
]
reference_end_split
=
portal
.
ERP5Site_
getIngestionReferenceDictionary
()[
"split_end_suffix"
]
# remove supplier and eof from reference
# remove supplier and eof from reference
data_ingestion_reference
=
reference_separator
.
join
(
reference
.
split
(
reference_separator
)[
1
:
-
3
])
data_ingestion_reference
=
reference_separator
.
join
(
reference
.
split
(
reference_separator
)[
1
:
-
3
])
...
@@ -20,7 +20,7 @@ EOF = reference.split(reference_separator)[-3]
...
@@ -20,7 +20,7 @@ EOF = reference.split(reference_separator)[-3]
size
=
reference
.
split
(
reference_separator
)[
-
2
]
size
=
reference
.
split
(
reference_separator
)[
-
2
]
if
data_ingestion_reference
is
""
:
if
data_ingestion_reference
is
""
:
context
.
logEntry
(
"[ERROR] Data Ingestion reference parameter for
i
ngestionReferenceExists script is not well formated"
)
context
.
logEntry
(
"[ERROR] Data Ingestion reference parameter for
ERP5Site_checkI
ngestionReferenceExists script is not well formated"
)
raise
ValueError
(
"Data Ingestion reference is not well formated"
)
raise
ValueError
(
"Data Ingestion reference is not well formated"
)
# check if there are started ingestions for this reference
# check if there are started ingestions for this reference
...
...
bt5/erp5_wendelin_data_lake_ingestion/SkinTemplateItem/portal_skins/erp5_wendelin_data_lake/
i
ngestionReferenceExists.xml
→
bt5/erp5_wendelin_data_lake_ingestion/SkinTemplateItem/portal_skins/erp5_wendelin_data_lake/
ERP5Site_checkI
ngestionReferenceExists.xml
View file @
558e96dc
...
@@ -54,7 +54,7 @@
...
@@ -54,7 +54,7 @@
</item>
</item>
<item>
<item>
<key>
<string>
id
</string>
</key>
<key>
<string>
id
</string>
</key>
<value>
<string>
i
ngestionReferenceExists
</string>
</value>
<value>
<string>
ERP5Site_checkI
ngestionReferenceExists
</string>
</value>
</item>
</item>
</dictionary>
</dictionary>
</pickle>
</pickle>
...
...
bt5/erp5_wendelin_data_lake_ingestion/SkinTemplateItem/portal_skins/erp5_
ingestion_reference_utils/Is
ReferenceInvalidated.py
→
bt5/erp5_wendelin_data_lake_ingestion/SkinTemplateItem/portal_skins/erp5_
wendelin_data_lake/ERP5Site_check
ReferenceInvalidated.py
View file @
558e96dc
portal
=
context
.
getPortalObject
()
portal
=
context
.
getPortalObject
()
INVALID_SUFFIX
=
portal
.
getIngestionReferenceDictionary
()[
"invalid_suffix"
]
INVALID_SUFFIX
=
portal
.
ERP5Site_
getIngestionReferenceDictionary
()[
"invalid_suffix"
]
return
document
.
getReference
().
endswith
(
INVALID_SUFFIX
)
return
document
.
getReference
().
endswith
(
INVALID_SUFFIX
)
bt5/erp5_wendelin_data_lake_ingestion/SkinTemplateItem/portal_skins/erp5_
ingestion_reference_utils/InvalidateReference
.xml
→
bt5/erp5_wendelin_data_lake_ingestion/SkinTemplateItem/portal_skins/erp5_
wendelin_data_lake/ERP5Site_checkReferenceInvalidated
.xml
View file @
558e96dc
...
@@ -54,7 +54,7 @@
...
@@ -54,7 +54,7 @@
</item>
</item>
<item>
<item>
<key>
<string>
id
</string>
</key>
<key>
<string>
id
</string>
</key>
<value>
<string>
InvalidateReference
</string>
</value>
<value>
<string>
ERP5Site_checkReferenceInvalidated
</string>
</value>
</item>
</item>
</dictionary>
</dictionary>
</pickle>
</pickle>
...
...
bt5/erp5_wendelin_data_lake_ingestion/SkinTemplateItem/portal_skins/erp5_wendelin_data_lake/getDataStreamChunk.py
→
bt5/erp5_wendelin_data_lake_ingestion/SkinTemplateItem/portal_skins/erp5_wendelin_data_lake/
ERP5Site_
getDataStreamChunk.py
View file @
558e96dc
File moved
bt5/erp5_wendelin_data_lake_ingestion/SkinTemplateItem/portal_skins/erp5_wendelin_data_lake/getDataStreamChunk.xml
→
bt5/erp5_wendelin_data_lake_ingestion/SkinTemplateItem/portal_skins/erp5_wendelin_data_lake/
ERP5Site_
getDataStreamChunk.xml
View file @
558e96dc
...
@@ -54,7 +54,7 @@
...
@@ -54,7 +54,7 @@
</item>
</item>
<item>
<item>
<key>
<string>
id
</string>
</key>
<key>
<string>
id
</string>
</key>
<value>
<string>
getDataStreamChunk
</string>
</value>
<value>
<string>
ERP5Site_
getDataStreamChunk
</string>
</value>
</item>
</item>
</dictionary>
</dictionary>
</pickle>
</pickle>
...
...
bt5/erp5_wendelin_data_lake_ingestion/SkinTemplateItem/portal_skins/erp5_wendelin_data_lake/ERP5Site_getDataStreamList.py
0 → 100644
View file @
558e96dc
"""
This script is called from ebulk client to get list of Data Streams for a Data set.
"""
import
json
from
Products.ERP5Type.Log
import
log
portal
=
context
.
getPortalObject
()
try
:
data_set
=
portal
.
data_set_module
.
get
(
data_set_reference
)
if
data_set
is
None
or
portal
.
ERP5Site_checkReferenceInvalidated
(
data_set
):
return
{
"status_code"
:
0
,
"result"
:
[]
}
except
Exception
as
e
:
# fails because unauthorized access
log
(
"Unauthorized access to getDataStreamList: "
+
str
(
e
))
return
{
"status_code"
:
1
,
"error_message"
:
"401 - Unauthorized access. Please check your user credentials and try again."
}
data_stream_dict
=
{}
for
stream
in
data_set
.
DataSet_getDataStreamList
():
if
not
portal
.
ERP5Site_checkReferenceInvalidated
(
stream
)
and
stream
.
getValidationState
()
!=
"draft"
:
data_stream_info_dict
=
{
'id'
:
'data_stream_module/'
+
stream
.
getId
(),
'size'
:
stream
.
getSize
(),
'hash'
:
stream
.
getVersion
()
}
if
stream
.
getReference
()
in
data_stream_dict
:
data_stream_dict
[
stream
.
getReference
()][
'data-stream-list'
].
append
(
data_stream_info_dict
)
data_stream_dict
[
stream
.
getReference
()][
'large-hash'
]
=
data_stream_dict
[
stream
.
getReference
()][
'large-hash'
]
+
str
(
stream
.
getVersion
())
data_stream_dict
[
stream
.
getReference
()][
'full-size'
]
=
int
(
data_stream_dict
[
stream
.
getReference
()][
'full-size'
])
+
int
(
stream
.
getSize
())
else
:
data_stream_dict
[
stream
.
getReference
()]
=
{
'data-stream-list'
:
[
data_stream_info_dict
],
'id'
:
'data_stream_module/'
+
stream
.
getId
(),
'reference'
:
stream
.
getReference
(),
'large-hash'
:
stream
.
getVersion
(),
'full-size'
:
stream
.
getSize
()
}
result_dict
=
{
'status_code'
:
0
,
'result'
:
data_stream_dict
.
values
()}
return
json
.
dumps
(
result_dict
)
bt5/erp5_wendelin_data_lake_ingestion/SkinTemplateItem/portal_skins/erp5_wendelin_data_lake/getDataStreamList.xml
→
bt5/erp5_wendelin_data_lake_ingestion/SkinTemplateItem/portal_skins/erp5_wendelin_data_lake/
ERP5Site_
getDataStreamList.xml
View file @
558e96dc
...
@@ -54,7 +54,7 @@
...
@@ -54,7 +54,7 @@
</item>
</item>
<item>
<item>
<key>
<string>
id
</string>
</key>
<key>
<string>
id
</string>
</key>
<value>
<string>
getDataStreamList
</string>
</value>
<value>
<string>
ERP5Site_
getDataStreamList
</string>
</value>
</item>
</item>
</dictionary>
</dictionary>
</pickle>
</pickle>
...
...
bt5/erp5_wendelin_data_lake_ingestion/SkinTemplateItem/portal_skins/erp5_wendelin_data_lake/ERP5Site_getIngestionConstantsJson.py
0 → 100644
View file @
558e96dc
import
json
portal
=
context
.
getPortalObject
()
dict
=
{
'invalid_suffix'
:
portal
.
ERP5Site_getIngestionReferenceDictionary
()[
'invalid_suffix'
],
'split_end_suffix'
:
portal
.
ERP5Site_getIngestionReferenceDictionary
()[
'split_end_suffix'
],
'single_end_suffix'
:
portal
.
ERP5Site_getIngestionReferenceDictionary
()[
'single_end_suffix'
],
'split_first_suffix'
:
portal
.
ERP5Site_getIngestionReferenceDictionary
()[
'split_first_suffix'
],
'none_extension'
:
portal
.
ERP5Site_getIngestionReferenceDictionary
()[
'none_extension'
],
'reference_separator'
:
portal
.
ERP5Site_getIngestionReferenceDictionary
()[
'reference_separator'
],
'complex_files_extensions'
:
portal
.
ERP5Site_getIngestionReferenceDictionary
()[
'complex_files_extensions'
],
'reference_length'
:
portal
.
ERP5Site_getIngestionReferenceDictionary
()[
'reference_length'
],
'invalid_chars'
:
portal
.
ERP5Site_getIngestionReferenceDictionary
()[
'invalid_chars'
],
}
return
json
.
dumps
(
dict
)
bt5/erp5_wendelin_data_lake_ingestion/SkinTemplateItem/portal_skins/erp5_
ingestion_reference_utils/getIngestionReferenceDictionary
.xml
→
bt5/erp5_wendelin_data_lake_ingestion/SkinTemplateItem/portal_skins/erp5_
wendelin_data_lake/ERP5Site_getIngestionConstantsJson
.xml
View file @
558e96dc
...
@@ -54,7 +54,7 @@
...
@@ -54,7 +54,7 @@
</item>
</item>
<item>
<item>
<key>
<string>
id
</string>
</key>
<key>
<string>
id
</string>
</key>
<value>
<string>
getIngestionReferenceDictionary
</string>
</value>
<value>
<string>
ERP5Site_getIngestionConstantsJson
</string>
</value>
</item>
</item>
</dictionary>
</dictionary>
</pickle>
</pickle>
...
...
bt5/erp5_wendelin_data_lake_ingestion/SkinTemplateItem/portal_skins/erp5_
ingestion_reference_utils/
getIngestionReferenceDictionary.py
→
bt5/erp5_wendelin_data_lake_ingestion/SkinTemplateItem/portal_skins/erp5_
wendelin_data_lake/ERP5Site_
getIngestionReferenceDictionary.py
View file @
558e96dc
File moved
bt5/erp5_wendelin_data_lake_ingestion/SkinTemplateItem/portal_skins/erp5_
ingestion_reference_utils/getIngestionConstantsJson
.xml
→
bt5/erp5_wendelin_data_lake_ingestion/SkinTemplateItem/portal_skins/erp5_
wendelin_data_lake/ERP5Site_getIngestionReferenceDictionary
.xml
View file @
558e96dc
...
@@ -54,7 +54,7 @@
...
@@ -54,7 +54,7 @@
</item>
</item>
<item>
<item>
<key>
<string>
id
</string>
</key>
<key>
<string>
id
</string>
</key>
<value>
<string>
getIngestionConstantsJson
</string>
</value>
<value>
<string>
ERP5Site_getIngestionReferenceDictionary
</string>
</value>
</item>
</item>
</dictionary>
</dictionary>
</pickle>
</pickle>
...
...
bt5/erp5_wendelin_data_lake_ingestion/SkinTemplateItem/portal_skins/erp5_wendelin_data_lake/ERP5Site_invalidateIngestionObjects.py
View file @
558e96dc
...
@@ -13,7 +13,7 @@ kw_dict = {"query": portal_type_query,
...
@@ -13,7 +13,7 @@ kw_dict = {"query": portal_type_query,
"reference"
:
reference
}
"reference"
:
reference
}
for
document
in
portal_catalog
(
**
kw_dict
):
for
document
in
portal_catalog
(
**
kw_dict
):
portal
.
I
nvalidateReference
(
document
)
portal
.
ERP5Site_i
nvalidateReference
(
document
)
try
:
try
:
document
.
invalidate
()
document
.
invalidate
()
except
:
except
:
...
...
bt5/erp5_wendelin_data_lake_ingestion/SkinTemplateItem/portal_skins/erp5_
ingestion_reference_utils/I
nvalidateReference.py
→
bt5/erp5_wendelin_data_lake_ingestion/SkinTemplateItem/portal_skins/erp5_
wendelin_data_lake/ERP5Site_i
nvalidateReference.py
View file @
558e96dc
portal
=
context
.
getPortalObject
()
portal
=
context
.
getPortalObject
()
INVALID_SUFFIX
=
portal
.
getIngestionReferenceDictionary
()[
"invalid_suffix"
]
INVALID_SUFFIX
=
portal
.
ERP5Site_
getIngestionReferenceDictionary
()[
"invalid_suffix"
]
try
:
try
:
if
not
document
.
getReference
().
endswith
(
INVALID_SUFFIX
):
if
not
document
.
getReference
().
endswith
(
INVALID_SUFFIX
):
...
...
bt5/erp5_wendelin_data_lake_ingestion/SkinTemplateItem/portal_skins/erp5_
ingestion_reference_utils/IsReferenceInvalidated
.xml
→
bt5/erp5_wendelin_data_lake_ingestion/SkinTemplateItem/portal_skins/erp5_
wendelin_data_lake/ERP5Site_invalidateReference
.xml
View file @
558e96dc
...
@@ -54,7 +54,7 @@
...
@@ -54,7 +54,7 @@
</item>
</item>
<item>
<item>
<key>
<string>
id
</string>
</key>
<key>
<string>
id
</string>
</key>
<value>
<string>
IsReferenceInvalidated
</string>
</value>
<value>
<string>
ERP5Site_invalidateReference
</string>
</value>
</item>
</item>
</dictionary>
</dictionary>
</pickle>
</pickle>
...
...
bt5/erp5_wendelin_data_lake_ingestion/SkinTemplateItem/portal_skins/erp5_wendelin_data_lake/ERP5Site_invalidateSplitIngestions.py
View file @
558e96dc
from
Products.ZSQLCatalog.SQLCatalog
import
Query
,
SimpleQuery
,
ComplexQuery
portal
=
context
.
getPortalObject
()
portal
=
context
.
getPortalObject
()
portal_catalog
=
portal
.
portal_catalog
portal_catalog
=
portal
.
portal_catalog
...
@@ -14,34 +12,32 @@ try:
...
@@ -14,34 +12,32 @@ try:
data_ingestion
=
portal_catalog
.
getResultValue
(
data_ingestion
=
portal_catalog
.
getResultValue
(
portal_type
=
'Data Ingestion'
,
portal_type
=
'Data Ingestion'
,
id
=
data_stream
.
getId
())
id
=
data_stream
.
getId
())
portal
.
I
nvalidateReference
(
data_stream
)
portal
.
ERP5Site_i
nvalidateReference
(
data_stream
)
data_stream
.
invalidate
()
data_stream
.
invalidate
()
if
not
portal
.
Is
ReferenceInvalidated
(
data_ingestion
):
if
not
portal
.
ERP5Site_check
ReferenceInvalidated
(
data_ingestion
):
portal
.
I
nvalidateReference
(
data_ingestion
)
portal
.
ERP5Site_i
nvalidateReference
(
data_ingestion
)
data_an
=
portal_catalog
.
getResultValue
(
data_an
=
portal_catalog
.
getResultValue
(
portal_type
=
'Data Analysis'
,
portal_type
=
'Data Analysis'
,
id
=
data_stream
.
getId
())
id
=
data_stream
.
getId
())
if
data_an
!=
None
:
if
data_an
!=
None
:
portal
.
I
nvalidateReference
(
data_an
)
portal
.
ERP5Site_i
nvalidateReference
(
data_an
)
data_array
=
portal_catalog
.
getResultValue
(
data_array
=
portal_catalog
.
getResultValue
(
portal_type
=
'Data Array'
,
portal_type
=
'Data Array'
,
id
=
data_stream
.
getId
())
id
=
data_stream
.
getId
())
if
data_array
!=
None
:
if
data_array
!=
None
:
portal
.
I
nvalidateReference
(
data_array
)
portal
.
ERP5Site_i
nvalidateReference
(
data_array
)
data_array
.
invalidate
()
data_array
.
invalidate
()
else
:
# split ingestion interrumped and restarted
else
:
# split ingestion interrumped and restarted
# invalidate draft datastreams and old started data ingestions
# invalidate draft datastreams and old started data ingestions
for
data_ingestion
in
portal_catalog
(
portal_type
=
"Data Ingestion"
,
for
data_ingestion
in
portal_catalog
(
portal_type
=
"Data Ingestion"
,
simulation_state
=
"started"
,
simulation_state
=
"started"
,
reference
=
reference
):
reference
=
reference
):
if
not
portal
.
Is
ReferenceInvalidated
(
data_ingestion
):
if
not
portal
.
ERP5Site_check
ReferenceInvalidated
(
data_ingestion
):
portal
.
I
nvalidateReference
(
data_ingestion
)
portal
.
ERP5Site_i
nvalidateReference
(
data_ingestion
)
data_ingestion
.
deliver
()
data_ingestion
.
deliver
()
for
data_stream
in
portal_catalog
(
portal_type
=
"Data Stream"
,
for
data_stream
in
portal_catalog
(
portal_type
=
"Data Stream"
,
validation_state
=
"draft"
,
reference
=
reference
):
reference
=
reference
):
if
not
portal
.
Is
ReferenceInvalidated
(
data_stream
):
if
not
portal
.
ERP5Site_check
ReferenceInvalidated
(
data_stream
):
portal
.
I
nvalidateReference
(
data_stream
)
portal
.
ERP5Site_i
nvalidateReference
(
data_stream
)
except
Exception
as
e
:
except
Exception
as
e
:
context
.
logEntry
(
"ERROR in ERP5Site_invalidateSplitIngestions: "
+
str
(
e
))
context
.
logEntry
(
"ERROR in ERP5Site_invalidateSplitIngestions: "
+
str
(
e
))
pass
bt5/erp5_wendelin_data_lake_ingestion/SkinTemplateItem/portal_skins/erp5_wendelin_data_lake/ERP5Site_renameIngestion.py
0 → 100644
View file @
558e96dc
portal
=
context
.
getPortalObject
()
portal_catalog
=
portal
.
portal_catalog
reference_separator
=
portal
.
ERP5Site_getIngestionReferenceDictionary
()[
"reference_separator"
]
none_extension
=
portal
.
ERP5Site_getIngestionReferenceDictionary
()[
"none_extension"
]
# check new reference
data_ingestions
=
portal_catalog
(
portal_type
=
"Data Ingestion"
,
reference
=
new_reference
)
if
len
(
data_ingestions
)
>
0
:
raise
"Error renaming: new reference '%s' already exists."
%
new_reference
# rename data ingestions
data_ingestions
=
portal_catalog
(
portal_type
=
"Data Ingestion"
,
reference
=
reference
)
if
len
(
data_ingestions
)
==
0
:
raise
"Error renaming: could not find any data ingestion with reference '%s'."
%
reference
data_ingestion_title
=
reference_separator
.
join
(
new_reference
.
split
(
reference_separator
)[
1
:
-
1
])
for
data_ingestion
in
data_ingestions
:
data_ingestion
.
setReference
(
new_reference
)
data_ingestion
.
setTitle
(
data_ingestion_title
)
extension
=
new_reference
.
split
(
reference_separator
)[
-
1
]
data_stream_title
=
"%s%s"
%
(
data_ingestion_title
,
"."
+
extension
if
extension
!=
none_extension
else
""
)
# rename data streams
data_streams
=
portal_catalog
(
portal_type
=
"Data Stream"
,
reference
=
reference
)
for
data_stream
in
data_streams
:
data_stream
.
setReference
(
new_reference
)
data_stream
.
setTitle
(
data_stream_title
)
# rename data analysis
data_analysises
=
portal_catalog
(
portal_type
=
"Data Analysis"
,
reference
=
reference
)
for
data_analysis
in
data_analysises
:
data_analysis
.
setReference
(
new_reference
)
# rename data arrays
data_arrays
=
portal_catalog
(
portal_type
=
"Data Array"
,
reference
=
reference
)
for
data_array
in
data_arrays
:
data_array
.
setReference
(
new_reference
)
bt5/erp5_wendelin_data_lake_ingestion/SkinTemplateItem/portal_skins/erp5_wendelin_data_lake/ERP5Site_renameIngestion.xml
0 → 100644
View file @
558e96dc
<?xml version="1.0"?>
<ZopeData>
<record
id=
"1"
aka=
"AAAAAAAAAAE="
>
<pickle>
<global
name=
"PythonScript"
module=
"Products.PythonScripts.PythonScript"
/>
</pickle>
<pickle>
<dictionary>
<item>
<key>
<string>
Script_magic
</string>
</key>
<value>
<int>
3
</int>
</value>
</item>
<item>
<key>
<string>
_bind_names
</string>
</key>
<value>
<object>
<klass>
<global
name=
"NameAssignments"
module=
"Shared.DC.Scripts.Bindings"
/>
</klass>
<tuple/>
<state>
<dictionary>
<item>
<key>
<string>
_asgns
</string>
</key>
<value>
<dictionary>
<item>
<key>
<string>
name_container
</string>
</key>
<value>
<string>
container
</string>
</value>
</item>
<item>
<key>
<string>
name_context
</string>
</key>
<value>
<string>
context
</string>
</value>
</item>
<item>
<key>
<string>
name_m_self
</string>
</key>
<value>
<string>
script
</string>
</value>
</item>
<item>
<key>
<string>
name_subpath
</string>
</key>
<value>
<string>
traverse_subpath
</string>
</value>
</item>
</dictionary>
</value>
</item>
</dictionary>
</state>
</object>
</value>
</item>
<item>
<key>
<string>
_params
</string>
</key>
<value>
<string>
reference, new_reference
</string>
</value>
</item>
<item>
<key>
<string>
id
</string>
</key>
<value>
<string>
ERP5Site_renameIngestion
</string>
</value>
</item>
</dictionary>
</pickle>
</record>
</ZopeData>
bt5/erp5_wendelin_data_lake_ingestion/SkinTemplateItem/portal_skins/erp5_
ingestion_reference_utils/R
evalidateReference.py
→
bt5/erp5_wendelin_data_lake_ingestion/SkinTemplateItem/portal_skins/erp5_
wendelin_data_lake/ERP5Site_r
evalidateReference.py
View file @
558e96dc
portal
=
context
.
getPortalObject
()
portal
=
context
.
getPortalObject
()
INVALID_SUFFIX
=
portal
.
getIngestionReferenceDictionary
()[
"invalid_suffix"
]
INVALID_SUFFIX
=
portal
.
ERP5Site_
getIngestionReferenceDictionary
()[
"invalid_suffix"
]
try
:
try
:
if
document
.
getReference
().
endswith
(
INVALID_SUFFIX
):
if
document
.
getReference
().
endswith
(
INVALID_SUFFIX
):
...
...
bt5/erp5_wendelin_data_lake_ingestion/SkinTemplateItem/portal_skins/erp5_
ingestion_reference_utils/R
evalidateReference.xml
→
bt5/erp5_wendelin_data_lake_ingestion/SkinTemplateItem/portal_skins/erp5_
wendelin_data_lake/ERP5Site_r
evalidateReference.xml
View file @
558e96dc
...
@@ -54,7 +54,7 @@
...
@@ -54,7 +54,7 @@
</item>
</item>
<item>
<item>
<key>
<string>
id
</string>
</key>
<key>
<string>
id
</string>
</key>
<value>
<string>
R
evalidateReference
</string>
</value>
<value>
<string>
ERP5Site_r
evalidateReference
</string>
</value>
</item>
</item>
</dictionary>
</dictionary>
</pickle>
</pickle>
...
...
bt5/erp5_wendelin_data_lake_ingestion/SkinTemplateItem/portal_skins/erp5_wendelin_data_lake/ERP5Site_stopIngestionList.py
View file @
558e96dc
from
Products.ERP5Type.Log
import
log
from
Products.ZSQLCatalog.SQLCatalog
import
Query
,
SimpleQuery
import
hashlib
import
hashlib
CHUNK_SIZE
=
200000
CHUNK_SIZE
=
200000
...
@@ -14,7 +12,7 @@ def getHash(data_stream):
...
@@ -14,7 +12,7 @@ def getHash(data_stream):
end_offset
=
n_chunk
*
chunk_size
+
chunk_size
end_offset
=
n_chunk
*
chunk_size
+
chunk_size
try
:
try
:
data_stream_chunk
=
''
.
join
(
data_stream
.
readChunkList
(
start_offset
,
end_offset
))
data_stream_chunk
=
''
.
join
(
data_stream
.
readChunkList
(
start_offset
,
end_offset
))
except
:
except
Exception
:
# data stream is empty
# data stream is empty
data_stream_chunk
=
""
data_stream_chunk
=
""
hash_md5
.
update
(
data_stream_chunk
)
hash_md5
.
update
(
data_stream_chunk
)
...
@@ -22,9 +20,18 @@ def getHash(data_stream):
...
@@ -22,9 +20,18 @@ def getHash(data_stream):
n_chunk
+=
1
n_chunk
+=
1
return
hash_md5
.
hexdigest
()
return
hash_md5
.
hexdigest
()
def
isFinishedSplitIngestion
(
reference
):
#check if all chunks of a split file were ingested
#that is if EOF chunk was ingested
reference_end_split
=
portal
.
ERP5Site_getIngestionReferenceDictionary
()[
"split_end_suffix"
]
eof_ingestion
=
portal_catalog
(
portal_type
=
"Data Ingestion"
,
simulation_state
=
"started"
,
reference
=
reference
,
id
=
"%"
+
reference_end_split
)
return
len
(
eof_ingestion
)
==
1
def
isInterruptedAbandonedSplitIngestion
(
reference
):
def
isInterruptedAbandonedSplitIngestion
(
reference
):
from
DateTime
import
DateTime
from
DateTime
import
DateTime
now
=
DateTime
()
day_hours
=
1.0
/
24
/
60
*
60
*
24
day_hours
=
1.0
/
24
/
60
*
60
*
24
# started split data ingestions for reference
# started split data ingestions for reference
catalog_kw
=
{
'portal_type'
:
'Data Ingestion'
,
catalog_kw
=
{
'portal_type'
:
'Data Ingestion'
,
...
@@ -40,70 +47,57 @@ def isInterruptedAbandonedSplitIngestion(reference):
...
@@ -40,70 +47,57 @@ def isInterruptedAbandonedSplitIngestion(reference):
portal
=
context
.
getPortalObject
()
portal
=
context
.
getPortalObject
()
portal_catalog
=
portal
.
portal_catalog
portal_catalog
=
portal
.
portal_catalog
reference_end_single
=
portal
.
getIngestionReferenceDictionary
()[
"single_end_suffix"
]
reference_end_single
=
portal
.
ERP5Site_
getIngestionReferenceDictionary
()[
"single_end_suffix"
]
reference_first_split
=
portal
.
getIngestionReferenceDictionary
()[
"split_first_suffix"
]
reference_first_split
=
portal
.
ERP5Site_
getIngestionReferenceDictionary
()[
"split_first_suffix"
]
reference_end_split
=
portal
.
getIngestionReferenceDictionary
()[
"split_end_suffix"
]
reference_end_split
=
portal
.
ERP5Site_
getIngestionReferenceDictionary
()[
"split_end_suffix"
]
# stop single started ingestion (not split files)
# stop single started ingestion (not split files)
for
data_ingestion
in
portal_catalog
(
portal_type
=
"Data Ingestion"
,
for
data_ingestion
in
portal_catalog
(
portal_type
=
"Data Ingestion"
,
simulation_state
=
"started"
,
simulation_state
=
"started"
,
id
=
"%"
+
reference_end_single
):
id
=
"%"
+
reference_end_single
):
if
not
portal
.
Is
ReferenceInvalidated
(
data_ingestion
):
if
not
portal
.
ERP5Site_check
ReferenceInvalidated
(
data_ingestion
):
related_split_ingestions
=
portal_catalog
(
portal_type
=
"Data Ingestion"
,
related_split_ingestions
=
portal_catalog
(
portal_type
=
"Data Ingestion"
,
reference
=
data_ingestion
.
getReference
())
reference
=
data_ingestion
.
getReference
())
if
len
(
related_split_ingestions
)
==
1
:
if
len
(
related_split_ingestions
)
==
1
:
try
:
data_stream
=
portal_catalog
.
getResultValue
(
data_stream
=
portal_catalog
.
getResultValue
(
portal_type
=
'Data Stream'
,
portal_type
=
'Data Stream'
,
reference
=
data_ingestion
.
getReference
())
reference
=
data_ingestion
.
getReference
())
if
data_stream
is
not
None
:
if
data_stream
is
not
None
:
if
data_stream
.
getVersion
()
is
None
:
hash_value
=
getHash
(
data_stream
)
hash_value
=
getHash
(
data_stream
)
data_stream
.
setVersion
(
hash_value
)
data_stream
.
setVersion
(
hash_value
)
if
data_stream
.
getValidationState
()
!=
"validat
ed"
:
if
data_stream
.
getValidationState
()
!=
"validated"
and
data_stream
.
getValidationState
()
!=
"publish
ed"
:
data_stream
.
validate
()
data_stream
.
validate
()
if
data_stream
.
getValidationState
()
!=
"published"
:
data_stream
.
publish
()
if
data_ingestion
.
getSimulationState
()
==
"started"
:
if
data_ingestion
.
getSimulationState
()
==
"started"
:
data_ingestion
.
stop
()
data_ingestion
.
stop
()
except
Exception
as
e
:
context
.
log
(
"ERROR stoping single ingestion: %s - reference: %s."
%
(
data_ingestion
.
getId
(),
data_ingestion
.
getReference
()))
context
.
log
(
e
)
else
:
data_ingestion
.
deliver
()
#
append
split ingestions
#
handle
split ingestions
for
data_ingestion
in
portal_catalog
(
portal_type
=
"Data Ingestion"
,
for
data_ingestion
in
portal_catalog
(
portal_type
=
"Data Ingestion"
,
simulation_state
=
"started"
,
simulation_state
=
"started"
,
id
=
"%"
+
reference_first_split
):
id
=
"%"
+
reference_first_split
):
if
not
portal
.
IsReferenceInvalidated
(
data_ingestion
):
if
not
portal
.
ERP5Site_checkReferenceInvalidated
(
data_ingestion
):
if
isInterruptedAbandonedSplitIngestion
(
data_ingestion
.
getReference
()):
if
isFinishedSplitIngestion
(
data_ingestion
.
getReference
()):
portal
.
ERP5Site_invalidateSplitIngestions
(
data_ingestion
.
getReference
(),
success
=
False
)
else
:
try
:
try
:
last_data_stream_id
=
""
query
=
Query
(
portal_type
=
"Data Stream"
,
reference
=
data_ingestion
.
getReference
(),
validation_state
=
"draft"
)
result_list
=
portal_catalog
(
query
=
query
,
sort_on
=
((
'creation_date'
,
'ascending'
),))
full_data_stream
=
None
for
data_stream
in
result_list
:
log
(
''
.
join
([
"Data stream for split ingestion: "
,
data_stream
.
getId
()]))
if
data_stream
.
getId
()
==
data_ingestion
.
getId
():
log
(
"It is base data stream"
)
full_data_stream
=
data_stream
else
:
log
(
"It is not base data stream, it is a part"
)
if
full_data_stream
!=
None
:
log
(
"appending content to base data stream..."
)
full_data_stream
.
appendData
(
data_stream
.
getData
())
last_data_stream_id
=
data_stream
.
getId
()
portal
.
data_stream_module
.
deleteContent
(
data_stream
.
getId
())
if
last_data_stream_id
.
endswith
(
reference_end_split
):
portal
.
ERP5Site_invalidateSplitIngestions
(
data_ingestion
.
getReference
(),
success
=
True
)
hash
=
getHash
(
full_data_stream
)
full_data_stream
.
setVersion
(
hash
)
if
full_data_stream
.
getValidationState
()
!=
"validated"
:
full_data_stream
.
validate
()
related_split_ingestions
=
portal_catalog
(
portal_type
=
"Data Ingestion"
,
related_split_ingestions
=
portal_catalog
(
portal_type
=
"Data Ingestion"
,
simulation_state
=
"started"
,
simulation_state
=
"started"
,
reference
=
data_ingestion
.
getReference
())
reference
=
data_ingestion
.
getReference
())
for
ingestion
in
related_split_ingestions
:
for
ingestion
in
related_split_ingestions
:
if
ingestion
.
getId
()
==
full_data_stream
.
getId
(
):
if
ingestion
.
getId
().
endswith
(
reference_end_split
):
if
ingestion
.
getSimulationState
()
==
"started"
:
if
ingestion
.
getSimulationState
()
==
"started"
:
ingestion
.
stop
()
ingestion
.
stop
()
else
:
else
:
portal
.
InvalidateReference
(
ingestion
)
ingestion
.
deliver
()
ingestion
.
deliver
()
except
Exception
as
e
:
except
Exception
as
e
:
context
.
logEntry
(
"ERROR appending split data streams for ingestion: %s - reference: %s."
%
(
data_ingestion
.
getId
(),
data_ingestion
.
getReference
()))
context
.
log
(
"ERROR handling split data streams for ingestion: %s - reference: %s."
%
(
data_ingestion
.
getId
(),
data_ingestion
.
getReference
()))
context
.
logEntry
(
e
)
context
.
log
(
e
)
else
:
if
isInterruptedAbandonedSplitIngestion
(
data_ingestion
.
getReference
()):
portal
.
ERP5Site_invalidateSplitIngestions
(
data_ingestion
.
getReference
(),
success
=
False
)
bt5/erp5_wendelin_data_lake_ingestion/SkinTemplateItem/portal_skins/erp5_wendelin_data_lake/getDescriptorHTMLContent.py
→
bt5/erp5_wendelin_data_lake_ingestion/SkinTemplateItem/portal_skins/erp5_wendelin_data_lake/
ERP5_
getDescriptorHTMLContent.py
View file @
558e96dc
File moved
bt5/erp5_wendelin_data_lake_ingestion/SkinTemplateItem/portal_skins/erp5_wendelin_data_lake/getDescriptorHTMLContent.xml
→
bt5/erp5_wendelin_data_lake_ingestion/SkinTemplateItem/portal_skins/erp5_wendelin_data_lake/
ERP5_
getDescriptorHTMLContent.xml
View file @
558e96dc
...
@@ -54,7 +54,7 @@
...
@@ -54,7 +54,7 @@
</item>
</item>
<item>
<item>
<key>
<string>
id
</string>
</key>
<key>
<string>
id
</string>
</key>
<value>
<string>
getDescriptorHTMLContent
</string>
</value>
<value>
<string>
ERP5_
getDescriptorHTMLContent
</string>
</value>
</item>
</item>
</dictionary>
</dictionary>
</pickle>
</pickle>
...
...
bt5/erp5_wendelin_data_lake_ingestion/SkinTemplateItem/portal_skins/erp5_wendelin_data_lake/IngestionPolicy_getIngestionOperationAndParameterDictEbulk.py
View file @
558e96dc
...
@@ -6,9 +6,9 @@ now_string = now.strftime('%Y%m%d-%H%M%S-%f')[:-3]
...
@@ -6,9 +6,9 @@ now_string = now.strftime('%Y%m%d-%H%M%S-%f')[:-3]
portal
=
context
.
getPortalObject
()
portal
=
context
.
getPortalObject
()
portal_catalog
=
portal
.
portal_catalog
portal_catalog
=
portal
.
portal_catalog
reference_separator
=
portal
.
getIngestionReferenceDictionary
()[
"reference_separator"
]
reference_separator
=
portal
.
ERP5Site_
getIngestionReferenceDictionary
()[
"reference_separator"
]
reference_end_single
=
portal
.
getIngestionReferenceDictionary
()[
"single_end_suffix"
]
reference_end_single
=
portal
.
ERP5Site_
getIngestionReferenceDictionary
()[
"single_end_suffix"
]
none_extension
=
portal
.
getIngestionReferenceDictionary
()[
"none_extension"
]
none_extension
=
portal
.
ERP5Site_
getIngestionReferenceDictionary
()[
"none_extension"
]
# remove supplier, eof, size and hash from reference
# remove supplier, eof, size and hash from reference
reference
=
reference_separator
.
join
(
reference
.
split
(
reference_separator
)[
1
:
-
3
])
reference
=
reference_separator
.
join
(
reference
.
split
(
reference_separator
)[
1
:
-
3
])
...
@@ -20,8 +20,6 @@ supplier = movement_dict.get('supplier', None)
...
@@ -20,8 +20,6 @@ supplier = movement_dict.get('supplier', None)
extension
=
movement_dict
.
get
(
'extension'
,
None
)
extension
=
movement_dict
.
get
(
'extension'
,
None
)
dataset_reference
=
movement_dict
.
get
(
'dataset_reference'
,
None
)
dataset_reference
=
movement_dict
.
get
(
'dataset_reference'
,
None
)
data_ingestion_id
=
'%s_%s_%s_%s'
%
(
supplier
,
dataset_reference
,
now_string
,
eof
)
data_ingestion_id
=
'%s_%s_%s_%s'
%
(
supplier
,
dataset_reference
,
now_string
,
eof
)
size
=
movement_dict
.
get
(
'size'
,
None
)
if
movement_dict
.
get
(
'size'
,
None
)
!=
""
else
None
hash_value
=
movement_dict
.
get
(
'hash'
,
None
)
if
movement_dict
.
get
(
'hash'
,
None
)
!=
""
else
None
# search for applicable data ingestion
# search for applicable data ingestion
data_ingestion
=
portal_catalog
.
getResultValue
(
data_ingestion
=
portal_catalog
.
getResultValue
(
...
@@ -85,12 +83,9 @@ for supply_line in composed.objectValues(portal_type = 'Data Supply Line'):
...
@@ -85,12 +83,9 @@ for supply_line in composed.objectValues(portal_type = 'Data Supply Line'):
input_line
.
setAggregateSet
(
input_line
.
setAggregateSet
(
input_line
.
getAggregateList
()
+
operation_line
.
getAggregateList
())
input_line
.
getAggregateList
()
+
operation_line
.
getAggregateList
())
if
hash_value
is
None
or
eof
!=
reference_end_single
:
# do not set hash if split, calculate when append
hash_value
=
""
data_stream
=
portal
.
data_stream_module
.
newContent
(
data_stream
=
portal
.
data_stream_module
.
newContent
(
portal_type
=
"Data Stream"
,
portal_type
=
"Data Stream"
,
id
=
data_ingestion_id
,
id
=
data_ingestion_id
,
version
=
hash_value
,
title
=
"%s%s"
%
(
data_ingestion
.
getTitle
(),
"."
+
extension
if
extension
!=
none_extension
else
""
),
title
=
"%s%s"
%
(
data_ingestion
.
getTitle
(),
"."
+
extension
if
extension
!=
none_extension
else
""
),
reference
=
data_ingestion_reference
)
reference
=
data_ingestion_reference
)
...
@@ -109,10 +104,10 @@ if dataset_reference is not None:
...
@@ -109,10 +104,10 @@ if dataset_reference is not None:
# when a data set is uploaded from ebulk this means that "validation" is done at client side
# when a data set is uploaded from ebulk this means that "validation" is done at client side
# thus set set accordingly
# thus set set accordingly
data_set
.
validate
()
data_set
.
validate
()
except
:
except
Exception
:
data_set
=
portal
.
data_set_module
.
get
(
dataset_reference
)
data_set
=
portal
.
data_set_module
.
get
(
dataset_reference
)
if
portal
.
Is
ReferenceInvalidated
(
data_set
):
if
portal
.
ERP5Site_check
ReferenceInvalidated
(
data_set
):
portal
.
R
evalidateReference
(
data_set
)
portal
.
ERP5Site_r
evalidateReference
(
data_set
)
if
data_set
.
getValidationState
()
==
"invalidated"
:
if
data_set
.
getValidationState
()
==
"invalidated"
:
data_set
.
validate
()
data_set
.
validate
()
input_line
.
setDefaultAggregateValue
(
data_set
)
input_line
.
setDefaultAggregateValue
(
data_set
)
...
@@ -122,7 +117,9 @@ data_ingestion.start()
...
@@ -122,7 +117,9 @@ data_ingestion.start()
data_operation
=
operation_line
.
getResourceValue
()
data_operation
=
operation_line
.
getResourceValue
()
data_stream
=
input_line
.
getAggregateDataStreamValue
()
data_stream
=
input_line
.
getAggregateDataStreamValue
()
# if not split (one single ingestion) validate and publish the data stream
if
eof
==
reference_end_single
:
if
eof
==
reference_end_single
:
data_stream
.
validate
()
data_stream
.
validate
()
data_stream
.
publish
()
return
data_operation
,
{
'data_stream'
:
data_stream
}
return
data_operation
,
{
'data_stream'
:
data_stream
}
bt5/erp5_wendelin_data_lake_ingestion/SkinTemplateItem/portal_skins/erp5_wendelin_data_lake/getDataStreamList.py
deleted
100644 → 0
View file @
28a9e89e
"""
This script is called from ebulk client to get list of Data Streams for a
Data set.
"""
import
re
import
json
from
Products.ERP5Type.Log
import
log
from
Products.ZSQLCatalog.SQLCatalog
import
Query
,
SimpleQuery
,
ComplexQuery
portal
=
context
.
getPortalObject
()
portal_catalog
=
portal
.
portal_catalog
reference_separator
=
portal
.
getIngestionReferenceDictionary
()[
"reference_separator"
]
try
:
data_set
=
portal
.
data_set_module
.
get
(
data_set_reference
)
if
data_set
is
None
or
portal
.
IsReferenceInvalidated
(
data_set
):
return
{
"status_code"
:
0
,
"result"
:
[]
}
except
Exception
as
e
:
# fails because unauthorized access
log
(
"Unauthorized access to getDataStreamList."
)
return
{
"status_code"
:
1
,
"error_message"
:
"401 - Unauthorized access. Please check your user credentials and try again."
}
data_set
=
portal
.
data_set_module
.
get
(
data_set_reference
)
if
data_set
is
None
:
return
[]
data_stream_list
=
[]
for
stream
in
data_set
.
DataSet_getDataStreamList
():
if
stream
.
getVersion
()
==
""
:
return
{
"status_code"
:
2
,
"result"
:
[]
}
data_stream_list
.
append
({
'id'
:
'data_stream_module/'
+
stream
.
getId
(),
'reference'
:
stream
.
getReference
(),
'size'
:
stream
.
getSize
(),
'hash'
:
stream
.
getVersion
()
})
dict
=
{
'status_code'
:
0
,
'result'
:
data_stream_list
}
return
json
.
dumps
(
dict
)
bt5/erp5_wendelin_data_lake_ingestion/TestTemplateItem/portal_components/test.erp5.testDataLakeIngestion.py
View file @
558e96dc
...
@@ -27,10 +27,10 @@ class TestDataIngestion(SecurityTestCase):
...
@@ -27,10 +27,10 @@ class TestDataIngestion(SecurityTestCase):
return
"DataIngestionTest"
return
"DataIngestionTest"
def
afterSetUp
(
self
):
def
afterSetUp
(
self
):
self
.
assertEqual
(
self
.
REFERENCE_SEPARATOR
,
self
.
portal
.
getIngestionReferenceDictionary
()[
"reference_separator"
])
self
.
assertEqual
(
self
.
REFERENCE_SEPARATOR
,
self
.
portal
.
ERP5Site_
getIngestionReferenceDictionary
()[
"reference_separator"
])
self
.
assertEqual
(
self
.
INVALID
,
self
.
portal
.
getIngestionReferenceDictionary
()[
"invalid_suffix"
])
self
.
assertEqual
(
self
.
INVALID
,
self
.
portal
.
ERP5Site_
getIngestionReferenceDictionary
()[
"invalid_suffix"
])
self
.
assertEqual
(
self
.
EOF
,
self
.
REFERENCE_SEPARATOR
+
self
.
portal
.
getIngestionReferenceDictionary
()[
"split_end_suffix"
])
self
.
assertEqual
(
self
.
EOF
,
self
.
REFERENCE_SEPARATOR
+
self
.
portal
.
ERP5Site_
getIngestionReferenceDictionary
()[
"split_end_suffix"
])
self
.
assertEqual
(
self
.
PART_1
,
self
.
REFERENCE_SEPARATOR
+
self
.
portal
.
getIngestionReferenceDictionary
()[
"split_first_suffix"
])
self
.
assertEqual
(
self
.
PART_1
,
self
.
REFERENCE_SEPARATOR
+
self
.
portal
.
ERP5Site_
getIngestionReferenceDictionary
()[
"split_first_suffix"
])
def
getRandomReference
(
self
):
def
getRandomReference
(
self
):
random_string
=
''
.
join
([
random
.
choice
(
string
.
ascii_letters
+
string
.
digits
)
for
_
in
xrange
(
10
)])
random_string
=
''
.
join
([
random
.
choice
(
string
.
ascii_letters
+
string
.
digits
)
for
_
in
xrange
(
10
)])
...
@@ -70,6 +70,12 @@ class TestDataIngestion(SecurityTestCase):
...
@@ -70,6 +70,12 @@ class TestDataIngestion(SecurityTestCase):
reference
=
reference
)
reference
=
reference
)
return
data_stream
return
data_stream
def
getDataStreamChunkList
(
self
,
reference
):
data_stream_list
=
self
.
portal
.
portal_catalog
(
portal_type
=
'Data Stream'
,
reference
=
reference
)
return
data_stream_list
def
ingestRequest
(
self
,
reference
,
eof
,
data_chunk
,
ingestion_policy
):
def
ingestRequest
(
self
,
reference
,
eof
,
data_chunk
,
ingestion_policy
):
encoded_data_chunk
=
base64
.
b64encode
(
data_chunk
)
encoded_data_chunk
=
base64
.
b64encode
(
data_chunk
)
request
=
self
.
portal
.
REQUEST
request
=
self
.
portal
.
REQUEST
...
@@ -84,13 +90,10 @@ class TestDataIngestion(SecurityTestCase):
...
@@ -84,13 +90,10 @@ class TestDataIngestion(SecurityTestCase):
def
ingest
(
self
,
data_chunk
,
reference
,
extension
,
eof
,
randomize_ingestion_reference
=
False
):
def
ingest
(
self
,
data_chunk
,
reference
,
extension
,
eof
,
randomize_ingestion_reference
=
False
):
ingestion_reference
=
self
.
getIngestionReference
(
reference
,
extension
,
randomize_ingestion_reference
)
ingestion_reference
=
self
.
getIngestionReference
(
reference
,
extension
,
randomize_ingestion_reference
)
# use default ebulk policy
# use default ebulk policy
ingestion_policy
=
self
.
portal
.
portal_ingestion_policies
.
wendelin_embulk
ingestion_policy
=
self
.
portal
.
portal_ingestion_policies
.
default_ebulk
self
.
ingestRequest
(
ingestion_reference
,
eof
,
data_chunk
,
ingestion_policy
)
self
.
ingestRequest
(
ingestion_reference
,
eof
,
data_chunk
,
ingestion_policy
)
_
,
ingestion_reference
=
self
.
sanitizeReference
(
ingestion_reference
)
_
,
ingestion_reference
=
self
.
sanitizeReference
(
ingestion_reference
)
return
ingestion_reference
return
ingestion_reference
def
stepIngest
(
self
,
extension
,
delimiter
,
randomize_ingestion_reference
=
False
):
def
stepIngest
(
self
,
extension
,
delimiter
,
randomize_ingestion_reference
=
False
):
...
@@ -108,7 +111,6 @@ class TestDataIngestion(SecurityTestCase):
...
@@ -108,7 +111,6 @@ class TestDataIngestion(SecurityTestCase):
chunk
.
append
(
line
)
chunk
.
append
(
line
)
else
:
else
:
break
break
ingestion_reference
=
self
.
ingest
(
data_chunk
,
reference
,
extension
,
self
.
SINGLE_INGESTION_END
,
randomize_ingestion_reference
=
randomize_ingestion_reference
)
ingestion_reference
=
self
.
ingest
(
data_chunk
,
reference
,
extension
,
self
.
SINGLE_INGESTION_END
,
randomize_ingestion_reference
=
randomize_ingestion_reference
)
if
os
.
path
.
exists
(
file_name
):
if
os
.
path
.
exists
(
file_name
):
...
@@ -127,8 +129,9 @@ class TestDataIngestion(SecurityTestCase):
...
@@ -127,8 +129,9 @@ class TestDataIngestion(SecurityTestCase):
data_stream_data
=
data_stream
.
getData
()
data_stream_data
=
data_stream
.
getData
()
self
.
assertEqual
(
data_chunk
,
data_stream_data
)
self
.
assertEqual
(
data_chunk
,
data_stream_data
)
# check Data Stream and Data Set are validated
# check Data Set is validated and Data Stream is published
self
.
assertEqual
(
'validated'
,
data_stream
.
getValidationState
())
self
.
assertEqual
(
'validated'
,
data_set
.
getValidationState
())
self
.
assertEqual
(
'published'
,
data_stream
.
getValidationState
())
return
data_set
,
[
data_stream
]
return
data_set
,
[
data_stream
]
...
@@ -140,7 +143,7 @@ class TestDataIngestion(SecurityTestCase):
...
@@ -140,7 +143,7 @@ class TestDataIngestion(SecurityTestCase):
def
test_02_DefaultSplitIngestion
(
self
):
def
test_02_DefaultSplitIngestion
(
self
):
"""
"""
Test multiple uploads from ebulk end up in
same Data Stream concatenated
Test multiple uploads from ebulk end up in
multiple Data Streams
(in case of large file upload when ebluk by default splits file to 50MBs
(in case of large file upload when ebluk by default splits file to 50MBs
chunks).
chunks).
"""
"""
...
@@ -152,7 +155,6 @@ class TestDataIngestion(SecurityTestCase):
...
@@ -152,7 +155,6 @@ class TestDataIngestion(SecurityTestCase):
for
_
in
xrange
(
250
)])
for
_
in
xrange
(
250
)])
data_chunk_4
=
''
.
join
([
random
.
choice
(
string
.
ascii_letters
+
string
.
digits
)
\
data_chunk_4
=
''
.
join
([
random
.
choice
(
string
.
ascii_letters
+
string
.
digits
)
\
for
_
in
xrange
(
250
)])
for
_
in
xrange
(
250
)])
data_chunk
=
data_chunk_1
+
data_chunk_2
+
data_chunk_3
+
data_chunk_4
reference
=
self
.
getRandomReference
()
reference
=
self
.
getRandomReference
()
...
@@ -172,13 +174,20 @@ class TestDataIngestion(SecurityTestCase):
...
@@ -172,13 +174,20 @@ class TestDataIngestion(SecurityTestCase):
time
.
sleep
(
1
)
time
.
sleep
(
1
)
self
.
tic
()
self
.
tic
()
# call explicitly alarm so all 4 Data Streams
can be concatenated to one
# call explicitly alarm so all 4 Data Streams
are validated and published
self
.
portal
.
portal_alarms
.
wendelin_
data_lake_handle_analysis
.
Alarm_dataLakeH
andleAnalysis
()
self
.
portal
.
portal_alarms
.
wendelin_
handle_analysis
.
Alarm_h
andleAnalysis
()
self
.
tic
()
self
.
tic
()
# check resulting Data Stream
# check resulting Data Streams
data_stream
=
self
.
getDataStream
(
ingestion_reference
)
data_stream_list
=
self
.
getDataStreamChunkList
(
ingestion_reference
)
self
.
assertEqual
(
data_chunk
,
data_stream
.
getData
())
#one data stream per chunk
self
.
assertEqual
(
len
(
data_stream_list
),
4
)
#last datastream (EOF) published, the rest validated
for
stream
in
data_stream_list
:
if
stream
.
getId
().
endswith
(
self
.
EOF
.
replace
(
self
.
REFERENCE_SEPARATOR
,
""
)):
self
.
assertEqual
(
'published'
,
stream
.
getValidationState
())
else
:
self
.
assertEqual
(
'validated'
,
stream
.
getValidationState
())
def
test_03_DefaultWendelinConfigurationExistency
(
self
):
def
test_03_DefaultWendelinConfigurationExistency
(
self
):
"""
"""
...
@@ -186,7 +195,7 @@ class TestDataIngestion(SecurityTestCase):
...
@@ -186,7 +195,7 @@ class TestDataIngestion(SecurityTestCase):
"""
"""
# test default ebuk ingestion exists
# test default ebuk ingestion exists
self
.
assertNotEqual
(
None
,
self
.
assertNotEqual
(
None
,
getattr
(
self
.
portal
.
portal_ingestion_policies
,
"
wendelin_em
bulk"
,
None
))
getattr
(
self
.
portal
.
portal_ingestion_policies
,
"
default_e
bulk"
,
None
))
self
.
assertNotEqual
(
None
,
self
.
assertNotEqual
(
None
,
getattr
(
self
.
portal
.
data_supply_module
,
"embulk"
,
None
))
getattr
(
self
.
portal
.
data_supply_module
,
"embulk"
,
None
))
...
@@ -200,10 +209,8 @@ class TestDataIngestion(SecurityTestCase):
...
@@ -200,10 +209,8 @@ class TestDataIngestion(SecurityTestCase):
# check data relation between Data Set and Data Streams work
# check data relation between Data Set and Data Streams work
self
.
assertSameSet
(
data_stream_list
,
data_set
.
DataSet_getDataStreamList
())
self
.
assertSameSet
(
data_stream_list
,
data_set
.
DataSet_getDataStreamList
())
# publish data set and have all Data Streams publsihed automatically
# check data set and all Data Streams states
data_set
.
publish
()
self
.
assertEqual
(
'validated'
,
data_set
.
getValidationState
())
self
.
tic
()
self
.
assertEqual
(
'published'
,
data_set
.
getValidationState
())
self
.
assertSameSet
([
'published'
for
x
in
data_stream_list
],
self
.
assertSameSet
([
'published'
for
x
in
data_stream_list
],
[
x
.
getValidationState
()
for
x
in
data_stream_list
])
[
x
.
getValidationState
()
for
x
in
data_stream_list
])
...
...
bt5/erp5_wendelin_data_lake_ingestion/TestTemplateItem/portal_components/test.erp5.testDataLakeIngestion.xml
View file @
558e96dc
...
@@ -46,9 +46,8 @@
...
@@ -46,9 +46,8 @@
<key>
<string>
text_content_warning_message
</string>
</key>
<key>
<string>
text_content_warning_message
</string>
</key>
<value>
<value>
<tuple>
<tuple>
<string>
W: 88, 4: Unused variable \'ingestion_id\' (unused-variable)
</string>
<string>
W:102, 34: Unused variable \'i\' (unused-variable)
</string>
<string>
W: 95, 34: Unused variable \'i\' (unused-variable)
</string>
<string>
W:102, 76: Unused variable \'j\' (unused-variable)
</string>
<string>
W: 95, 76: Unused variable \'j\' (unused-variable)
</string>
</tuple>
</tuple>
</value>
</value>
</item>
</item>
...
...
bt5/erp5_wendelin_data_lake_ingestion/bt/template_keep_last_workflow_history_only_path_list
View file @
558e96dc
...
@@ -6,6 +6,7 @@ data_product_module/fif_descriptor
...
@@ -6,6 +6,7 @@ data_product_module/fif_descriptor
data_supply_module/embulk
data_supply_module/embulk
data_supply_module/embulk/**
data_supply_module/embulk/**
portal_ingestion_policies/wendelin_embulk
portal_ingestion_policies/wendelin_embulk
portal_ingestion_policies/default_ebulk
portal_categories/function/**
portal_categories/function/**
portal_categories/use/**
portal_categories/use/**
portal_alarms/wendelin_data_lake_handle_analysis
portal_alarms/wendelin_data_lake_handle_analysis
...
...
bt5/erp5_wendelin_data_lake_ingestion/bt/template_keep_workflow_path_list
View file @
558e96dc
...
@@ -6,4 +6,5 @@ data_product_module/fif_descriptor
...
@@ -6,4 +6,5 @@ data_product_module/fif_descriptor
data_supply_module/embulk
data_supply_module/embulk
data_supply_module/embulk/**
data_supply_module/embulk/**
portal_ingestion_policies/wendelin_embulk
portal_ingestion_policies/wendelin_embulk
portal_ingestion_policies/default_ebulk
portal_categories/use/**
portal_categories/use/**
\ No newline at end of file
bt5/erp5_wendelin_data_lake_ingestion/bt/template_path_list
View file @
558e96dc
...
@@ -10,4 +10,5 @@ portal_alarms/wendelin_data_lake_handle_analysis/**
...
@@ -10,4 +10,5 @@ portal_alarms/wendelin_data_lake_handle_analysis/**
portal_callables/DataIngestionLine_writeEbulkIngestionToDataStream
portal_callables/DataIngestionLine_writeEbulkIngestionToDataStream
portal_callables/IngestionPolicy_parseEbulkIngestionTag
portal_callables/IngestionPolicy_parseEbulkIngestionTag
portal_categories/use/**
portal_categories/use/**
portal_ingestion_policies/default_ebulk
portal_ingestion_policies/wendelin_embulk
portal_ingestion_policies/wendelin_embulk
\ No newline at end of file
bt5/erp5_wendelin_data_lake_ingestion/bt/template_skin_id_list
View file @
558e96dc
erp5_ingestion_reference_utils
erp5_wendelin_data_lake
erp5_wendelin_data_lake
\ No newline at end of file
bt5/erp5_wendelin_data_lake_ui/PathTemplateItem/web_page_module/fif_gadget_erp5_page_file_js.js
View file @
558e96dc
...
@@ -112,7 +112,7 @@
...
@@ -112,7 +112,7 @@
});
});
})
})
.
declareMethod
(
"
getDescriptorContent
"
,
function
(
descriptorReference
)
{
.
declareMethod
(
"
getDescriptorContent
"
,
function
(
descriptorReference
)
{
var
url
=
"
/
erp5/
getDescriptorHTMLContent?reference=
"
+
descriptorReference
,
var
url
=
"
/
ERP5_
getDescriptorHTMLContent?reference=
"
+
descriptorReference
,
xmlHttp
=
new
XMLHttpRequest
();
xmlHttp
=
new
XMLHttpRequest
();
try
{
try
{
xmlHttp
.
open
(
"
GET
"
,
url
,
false
);
xmlHttp
.
open
(
"
GET
"
,
url
,
false
);
...
...
bt5/erp5_wendelin_data_lake_ui/PathTemplateItem/web_page_module/fif_gadget_erp5_page_file_js.xml
View file @
558e96dc
...
@@ -238,7 +238,7 @@
...
@@ -238,7 +238,7 @@
</item>
</item>
<item>
<item>
<key>
<string>
serial
</string>
</key>
<key>
<string>
serial
</string>
</key>
<value>
<string>
9
69.21927.24503.19865
</string>
</value>
<value>
<string>
9
84.982.33861.30037
</string>
</value>
</item>
</item>
<item>
<item>
<key>
<string>
state
</string>
</key>
<key>
<string>
state
</string>
</key>
...
@@ -256,7 +256,7 @@
...
@@ -256,7 +256,7 @@
</tuple>
</tuple>
<state>
<state>
<tuple>
<tuple>
<float>
15
33297268.38
</float>
<float>
15
89986197.54
</float>
<string>
UTC
</string>
<string>
UTC
</string>
</tuple>
</tuple>
</state>
</state>
...
...
bt5/erp5_wendelin_data_lake_ui/PathTemplateItem/web_page_module/gadget_fif_page_list_dataset_js.js
View file @
558e96dc
...
@@ -60,7 +60,7 @@
...
@@ -60,7 +60,7 @@
"
key
"
:
"
field_listbox
"
,
"
key
"
:
"
field_listbox
"
,
"
lines
"
:
15
,
"
lines
"
:
15
,
"
list_method
"
:
"
portal_catalog
"
,
"
list_method
"
:
"
portal_catalog
"
,
"
query
"
:
"
urn:jio:allDocs?query=portal_type%3A%22Data+Set%22+AND+validation_state%3A%22
publish
ed%22+AND+NOT+reference%3A%22%25_invalid%22
"
,
"
query
"
:
"
urn:jio:allDocs?query=portal_type%3A%22Data+Set%22+AND+validation_state%3A%22
validat
ed%22+AND+NOT+reference%3A%22%25_invalid%22
"
,
"
portal_type
"
:
[],
"
portal_type
"
:
[],
"
search_column_list
"
:
column_list
,
"
search_column_list
"
:
column_list
,
"
sort_column_list
"
:
column_list
,
"
sort_column_list
"
:
column_list
,
...
...
bt5/erp5_wendelin_data_lake_ui/PathTemplateItem/web_page_module/gadget_fif_page_list_dataset_js.xml
View file @
558e96dc
...
@@ -236,7 +236,7 @@
...
@@ -236,7 +236,7 @@
</item>
</item>
<item>
<item>
<key>
<string>
serial
</string>
</key>
<key>
<string>
serial
</string>
</key>
<value>
<string>
98
2.48449.25246.62327
</string>
</value>
<value>
<string>
98
3.63603.62266.8260
</string>
</value>
</item>
</item>
<item>
<item>
<key>
<string>
state
</string>
</key>
<key>
<string>
state
</string>
</key>
...
@@ -254,7 +254,7 @@
...
@@ -254,7 +254,7 @@
</tuple>
</tuple>
<state>
<state>
<tuple>
<tuple>
<float>
158
7730634.22
</float>
<float>
158
9984604.57
</float>
<string>
UTC
</string>
<string>
UTC
</string>
</tuple>
</tuple>
</state>
</state>
...
...
bt5/erp5_wendelin_data_lake_ui/PathTemplateItem/web_page_module/gadget_fif_page_list_file_js.xml
View file @
558e96dc
...
@@ -236,7 +236,7 @@
...
@@ -236,7 +236,7 @@
</item>
</item>
<item>
<item>
<key>
<string>
serial
</string>
</key>
<key>
<string>
serial
</string>
</key>
<value>
<string>
98
2.48449.25246.62327
</string>
</value>
<value>
<string>
98
4.971.49023.47684
</string>
</value>
</item>
</item>
<item>
<item>
<key>
<string>
state
</string>
</key>
<key>
<string>
state
</string>
</key>
...
@@ -254,7 +254,7 @@
...
@@ -254,7 +254,7 @@
</tuple>
</tuple>
<state>
<state>
<tuple>
<tuple>
<float>
15
87730667.12
</float>
<float>
15
91034166.31
</float>
<string>
UTC
</string>
<string>
UTC
</string>
</tuple>
</tuple>
</state>
</state>
...
...
Write
Preview
Markdown
is supported
0%
Try again
or
attach a new file
Attach a file
Cancel
You are about to add
0
people
to the discussion. Proceed with caution.
Finish editing this message first!
Cancel
Please
register
or
sign in
to comment