1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
# -*- coding: utf-8 -*-
##############################################################################
#
# Copyright (c) 2009 Nexedi SA and Contributors. All Rights Reserved.
# Nicolas Delaby <nicolas@nexedi.com>
#
# WARNING: This program as such is intended to be used by professional
# programmers who take the whole responsability of assessing all potential
# consequences resulting from its eventual inadequacies and bugs
# End users who are looking for a ready-to-use solution with commercial
# garantees and support are strongly adviced to contract a Free Software
# Service Company
#
# This program is Free Software; you can redistribute it and/or
# modify it under the terms of the GNU General Public License
# as published by the Free Software Foundation; either version 2
# of the License, or (at your option) any later version.
#
# This program is distributed in the hope that it will be useful,
# but WITHOUT ANY WARRANTY; without even the implied warranty of
# MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the
# GNU General Public License for more details.
#
# You should have received a copy of the GNU General Public License
# along with this program; if not, write to the Free Software
# Foundation, Inc., 59 Temple Place - Suite 330, Boston, MA 02111-1307, USA.
#
##############################################################################
"""
Products.ERP5.interfaces.document
"""
from zope.interface import Interface
class IDocument(Interface):
"""Document is an abstract class with all methods related to document
management in ERP5. This includes searchable text, explicit relations,
implicit relations, metadata, versions, languages, etc.
Documents may either store their content directly or cache content
which is retrieved from a specified URL. The second case if often
referred as "External Document". Standalone "External Documents" may
be created by specifying a URL to the contribution tool which is in
charge of initiating the download process and selecting the appropriate
document type. Groups of "External Documents" may also be generated from
so-called "External Source" (refer to ExternalSource class for more
information).
External Documents may be downloaded once or updated at regular interval.
The later can be useful to update the content of an external source.
Previous versions may be stored in place or kept in a separate file.
This feature is known as the crawling API. It is mostly implemented
in ContributionTool with wrappers in the Document class. It can be useful
for create a small search engine.
There are currently two types of Document subclasses:
* File for binary file based documents. File has subclasses such as Image,
OOoDocument, PDFDocument, etc. to implement specific conversion methods.
* TextDocument for text based documents. TextDocument has subclasses such
as Wiki to implement specific methods.
TextDocument itself has a subclass (XSLTDocument) which provides
XSLT based analysis and transformation of XML content based on XSLT
templates.
Conversion should be achieved through the convert method and other methods
of the conversion API (convertToBaseFormat, etc.).
Moreover, any Document subclass must ne able to convert documents to text
(asText method) and HTML (asHTML method). Text is required for full text
indexing. HTML is required for crawling.
Instances can be created directly, or via portal_contributions tool which
manages document ingestion process whereby a file can be uploaded by http
or sent in by email or dropped in by webdav or in some other way as yet
unknown. The ingestion process has the following steps:
(1) portal type detection
(2) object creation and upload of data
(3) metadata discovery (optionally with conversion of data to another format)
(4) other possible actions to finalise the ingestion (ex. by assigning
a reference)
This class handles (3) and calls a ZMI script to do (4).
Metadata can be drawn from various sources:
input - data supplied with http request or set on the object during (2) (e.g.
discovered from email text)
filename - data which might be encoded in filename
user_login - information about user who is contributing the file
content - data which might be derived from document content
If a certain property is defined in more than one source, it is set according to
preference order returned by a script
Document_getPreferredDocumentMetadataDiscoveryOrderList
(or any type-based version since discovery is type dependent)
Methods for discovering metadata are:
getPropertyDictFromInput
getPropertyDictFromFileName
getPropertyDictFromUserLogin
getPropertyDictFromContent
Methods for processing content are implemented either in Document class
or in Base class:
getSearchableReferenceList (Base)
getSearchableText (Base)
index_html (overriden in Document subclasses)
Methods for handling relations are implemented either in Document class
or in Base class:
getImplicitSuccessorValueList (Base)
getImplicitPredecessorValueList (Base)
getImplicitSimilarValueList (Base)
getSimilarCloudValueList (Document)
Implicit relations consist in finding document references inside
searchable text (ex. INV-23456) and deducting relations from that.
Two customisable methods required. One to find a list of implicit references
inside the content (getSearchableReferenceList) and one to convert a given
document reference into a list of reference strings which could be present
in other content (asSearchableReferenceList).
document.getSearchableReferenceList() returns
[
{'reference':' INV-12367'},
{'reference': 'INV-1112', 'version':'012}',
{'reference': 'AB-CC-DRK', 'version':'011', 'language': 'en'}
]
The Document class behaviour can be extended / customized through scripts
(which are type-based so can be adjusted per portal type).
* Document_getPropertyDictFromUserLogin - finds a user (by user_login or
from session) and returns properties which should be set on the document
* Document_getPropertyDictFromContent - analyzes document content and returns
properties which should be set on the document
* Base_getImplicitSuccessorValueList - finds appropriate all documents
referenced in the current content
* Base_getImplicitPredecessorValueList - finds document predecessors based on
the document coordinates (can use only complete coordinates, or also partial)
* Document_getPreferredDocumentMetadataDiscoveryOrderList - returns an order
in which metadata should be set/overwritten
* Document_finishIngestion - called by portal_activities after all the ingestion
is completed (and after document has been converted, so text_content
is available if the document has it)
* Document_getNewRevision - calculates revision number which should be set
on this document. Implementation depends on revision numbering policy which
can be very different. Interaction workflow should call setNewRevision method.
* Document_populateContent - analyses the document content and produces
subcontent based on it (ex. images, news, etc.). This scripts can
involve for example an XSLT transformation to process XML.
Subcontent: documents may include subcontent (files, images, etc.)
so that publication of rich content can be path independent. Subcontent
can also be used to help the rendering in HTML of complex documents
such as ODF documents.
Consistency checking:
Default implementation uses DocumentReferenceConstraint to check if the
reference/language/version triplet is unique. Additional constraints
can be added if necessary.
NOTE: Document.py supports a notion of revision which is very specific.
The underlying concept is that, as soon as a document has a reference,
the association of (reference, version, language) must be unique accross
the whole system. This means that a given document in a given version in a
given language is unique. The underlying idea is similar to the one in a Wiki
system in which each page is unique and acts the the atom of collaboration.
In the case of ERP5, if a team collaborates on a Text document written with
an offline word processor, all updates should be placed inside the same object.
A Contribution will thus modify an existing document, if allowed from security
point of view, and increase the revision number. Same goes for properties
(title). Each change generates a new revision.
"""
def convert(format, **kw):
"""Call a wrapped function with CachingMethod and
return always converted result: ie. tuple(content_type, data)
format - the format specied in the form of an extension
string (ex. jpeg, html, text, txt, etc.)
if format is None means that we do not want to change the format
and return the raw data or a resized image in same format.
**kw - can be various things - e.g. resolution
"""
def isSupportBaseDataConversion():
"""This is a public interface to check a document that is support conversion
to base format and can be overridden in subclasses.
"""