
Overview

The Advanced Text Extraction Workflow extracts various metadata properties from input documents. This page describes those properties and how to configure them. 

The standard processing feedback fields are also populated, as described in Configure the AIE Schema.

Native Metadata Properties

The native metadata properties are listed in the configuration file <installation_dir>\conf\advancedtextextraction\advancedtextextraction-metadata.xml. This file maintains the mapping between native metadata properties and valid AIE field names.

This mapping is necessary because the native metadata properties do not follow any particular naming conventions, so their names may be invalid under the AIE field naming scheme.
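
The idea of mapping native property names to valid field names can be sketched as follows. This is only an illustration: the native property names, the naming rule in `VALID_FIELD_NAME`, and the choice to skip unmapped properties are all assumptions, and the sketch does not read or reproduce the format of advancedtextextraction-metadata.xml.

```python
import re

# Hypothetical native property names (assumed for illustration).
# The target field names are taken from the tables on this page.
NATIVE_TO_FIELD = {
    "Last Saved By": "lastmodifier",
    "Number of Pages": "numpages",
    "Creation Date": "creationdate",
}

# Assumed naming rule, for illustration only: lowercase letters,
# digits, and underscores, starting with a letter.
VALID_FIELD_NAME = re.compile(r"^[a-z][a-z0-9_]*$")


def map_properties(native_props):
    """Return a dict keyed by mapped field names.

    Properties without a mapping entry are skipped (an assumption,
    not documented AIE behavior).
    """
    mapped = {}
    for name, value in native_props.items():
        field = NATIVE_TO_FIELD.get(name)
        if field is not None:
            mapped[field] = value
    return mapped


doc = {"Last Saved By": "jsmith", "Number of Pages": "12", "X-Unknown": "?"}
fields = map_properties(doc)
```

In this sketch a native name such as "Last Saved By" fails the assumed naming rule, while every mapped name passes it.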

Many of the fields to which the metadata properties are mapped are included in the AIE schema defined for the Advanced Text Extraction module. This schema can be found in <installation_dir>\conf\advancedtextextraction\advancedtextextraction-schema.xml.

When working on your AIE project, you can specify exactly which fields you want in your own schema, under <project_dir>\conf\advancedtextextraction\advancedtextextraction-schema.xml.

It is highly recommended that you identify and use only the subset of fields that your application needs, to minimize the effect of a large schema on AIE performance.

Supported Metadata Properties

The sections below elaborate on two sets of metadata:

  • The "common" set of field names that apply to all document types. These are system-level fields which should not be removed from your schema.
  • Metadata properties (and respective fields) that are specific to particular supported document types. Choose your own subset of these fields by editing your schema.
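
The selection rule described above (always keep the common fields, add only the type-specific fields you need) can be sketched as follows. The helper `select_schema_fields` is hypothetical and works on plain field names; it does not edit the schema XML, and the common field names are the ones listed in the Common Metadata table.

```python
# Common, system-level field names from the Common Metadata table.
COMMON_FIELDS = frozenset({
    "ancestorids", "childpath", "childtype", "doctype", "fileext",
    "filename", "mimetype", "parentdoctype", "parentid",
    "parentmimetype", "sourcepath", "sourceuri", "text",
})


def select_schema_fields(available, wanted):
    """Return the common fields plus the requested type-specific fields.

    Raises ValueError if a requested field is not in `available`.
    """
    unknown = set(wanted) - set(available)
    if unknown:
        raise ValueError(f"not in the module schema: {sorted(unknown)}")
    return sorted(COMMON_FIELDS | set(wanted))


# Example: a small, made-up selection of available fields.
available = COMMON_FIELDS | {"author", "title", "numpages", "albumtitle"}
chosen = select_schema_fields(available, ["author", "title"])
```

Here `chosen` contains all common fields plus author and title, and omits numpages and albumtitle because they were not requested.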

Common Metadata

The following table summarizes the "common" properties extracted by the Advanced Text Extraction Workflow of the Attivio Intelligence Engine (AIE):

Field name

Data type

Description

Notes

ancestorids

String

The list of ancestor document IDs (stored on a child document, the outermost ancestor first and the immediate parent last)

This is useful for tracking the full parent/child lineage of document hierarchies, for example, a Word document within a zip archive that is attached to an email message. The ancestorids field for the Word document starts with the ID of the email message document and ends with the ID of the zip archive document, thus preserving the hierarchy of relationships.
See also parentid.

childpath

String

The name or path of a child document (by which it is referred to in the parent document).

See also childtype.

childtype

String

The type of child document: attachment, entry, embedding.

See also childpath.

doctype

String

The document type

The supported document types are summarized in the advancedtextextraction-doctypes.xml file. The 'type' attribute value is stored as the doctype field value.
The "doctype" is a user-friendly name for the document type, e.g. "Microsoft Word 2007 Document".
See also parentdoctype.

fileext

String

The extension of the original filename (if any)

 

filename

String

The short filename of the document

 

mimetype

String

The MIME type of the document

The supported document types are summarized in the doc-types-config.xml file. The 'mimetype' attribute value is stored as the mimetype field value.
The "mimetype" is an industry-standard MIME type value, e.g. "application/vnd.ms-word" for a Word document.
See also parentmimetype.

parentdoctype

String

The parent document type (if any)

The supported document types are summarized in the doc-types-config.xml file. The 'parenttype' attribute value is stored as the parentdoctype field value.
The "parentdoctype" is a common, umbrella name for document types that belong together in one group, e.g. "Word".
See also doctype.

parentid

String

The ID of the parent document (if any)

Holds the ID of the parent document, if any. In contrast to ancestorids, this allows for efficient querying of parent/child relationships on a single level of a document hierarchy. For a given sub-document, it allows you to quickly find its parent, and for a parent document, it allows you to quickly find all of its child documents.
See also ancestorids.

parentmimetype

String

The parent MIME type (if any)

The supported document types are summarized in the doc-types-config.xml file. The 'parentmimetype' attribute value is stored as the parentmimetype field value.
The "parentmimetype" is a "parent", umbrella MIME type which applies across several document types, e.g. "application/vnd.ms-office" for Microsoft Office types which include Word, Excel, PowerPoint, etc.
See also mimetype.

sourcepath

String

The original filepath of the document (if any)

 

sourceuri

String

The original URI of the document (if any)

 

text

Text

The text content extracted from the document

 
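
The relationship between ancestorids and parentid described in the table above can be sketched with the email, zip archive, and Word document example. The `Doc` class, the document IDs, and the in-memory lookups are hypothetical; in particular, modeling ancestorids as a Python list is an assumption about how the multiple IDs are stored.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class Doc:
    id: str
    parentid: Optional[str] = None
    ancestorids: List[str] = field(default_factory=list)


def make_child(parent: Doc, child_id: str) -> Doc:
    # Outermost ancestor first, immediate parent last.
    return Doc(id=child_id, parentid=parent.id,
               ancestorids=parent.ancestorids + [parent.id])


def children_of(docs, parent_id):
    # Single-level lookup: uses parentid only.
    return [d.id for d in docs if d.parentid == parent_id]


def descendants_of(docs, ancestor_id):
    # Full-lineage lookup: uses ancestorids.
    return [d.id for d in docs if ancestor_id in d.ancestorids]


email = Doc(id="email-1")
archive = make_child(email, "zip-1")
word = make_child(archive, "word-1")
docs = [email, archive, word]
```

In this sketch the Word document carries both ancestor IDs, email first and archive last, while its parentid holds only the archive ID. `children_of` finds direct children only, whereas `descendants_of` also finds documents nested further down.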

Type-Specific Properties

The following table summarizes all of the field names that are supported by AIE for documents going through the Advanced Text Extraction Workflow.

Note that AIE maps the extracted metadata properties to a consistent, normalized set of Attivio fields, because the native properties do not follow any particular pattern or naming scheme.

Field name

Data type

Description

Version added in

abstract

String

the document abstract

 

acceptlanguage

String

MIME/email-related property

AIE 3.0

account

String

the account information

 

actualwork

Long

the actual work (typically, for an Outlook task)

 

address

String

the address (e.g. for an Outlook contact)

 

albumtitle

String

the album title (e.g. for an MP3 file)

 

alternaterecipientallowed

Boolean

the alternate recipient allowed value (for an email message)

 

anniversary

String

the anniversary (typically for an Outlook contact)

 

application

String

the name of the application that was used to create the document

 

appversion

String

the version of the application that was used to create the document

 

assistant

String

the assistant's name

 

assistantphonenum

String

the person's assistant's telephone number (typically for an Outlook contact)

 

attachment

String

attachment

AIE 3.0

attachments

String

the list of attachments in an email message

 

attendees

String

the list of attendees (e.g. for an Outlook meeting request)

 

attrhidden

Boolean

the 'attr hidden' header value (for a MIME message)

 

attrreadonly

Boolean

the 'attr read-only' header value (for a MIME message)

 

attrsystem

Boolean

the 'attr system' header value (for a MIME message)

 

author

String

the document's author or authors

 

authorization

String

the authorization

 

autoforwarded

String

the 'auto-forwarded' header value (for an email message)

 

backupdate

Date

the document backup date

 

basefilelocation

String

the document base file location

 

bcc

String

the blind carbon copy field in an email message

 

billinginfo

String

the billing information (typically, for an Outlook task)

 

billto

String

bill to

 

birthday

String

the birthday (typically for an Outlook contact)

 

businessaddress

String

the business address (typically for an Outlook contact)

 

businessaddresscity

String

the contact's business address city (typically for an Outlook contact)

 

businessaddresscountry

String

the contact's business address country (typically for an Outlook contact)

 

businessaddresspobox

String

the contact's business address P.O. Box (typically for an Outlook contact)

 

businessaddresspostalcode

String

the contact's business address postal code (typically for an Outlook contact)

 

businessaddressstate

String

the contact's business address state (typically for an Outlook contact)

 

businessaddressstreet

String

the contact's business address street (typically for an Outlook contact)

 

businesscity

String

the person's business city (typically for an Outlook contact)

AIE 3.0

businesscountry

String

the person's business country (typically for an Outlook contact)

AIE 3.0

businessfaxnum

String

the person's business fax number (typically for an Outlook contact)

 

businessphonenum

String

the person's business telephone number (typically for an Outlook contact)

 

businessphonenum2

String

the person's alternative telephone number (typically for an Outlook contact)

 

businesspostalcode

String

the person's business postal code (typically for an Outlook contact)

AIE 3.0

businessstate

String

the person's business state (typically for an Outlook contact)

AIE 3.0

businessstreet

String

the person's business street (typically for an Outlook contact)

AIE 3.0

businessstreet2

String

the person's alternative business street (typically for an Outlook contact)

AIE 3.0

callbackphonenum

String

the person's callback telephone number (typically for an Outlook contact)

 

carphonenum

String

the person's car telephone number (typically for an Outlook contact)

 

cat

String

the document category

 

cc

String

the carbon copy field in an email message

 

ccme

Boolean

the 'CC me' property (for a MIME message)

 

checkedby

String

the name of the person that checked the document

 

client

String

the client

 

clientsubmittime

Date

the email client message submit time

 

comments

Text

comments

 

company

String

the company name

 

companyphonenum

String

the company telephone number (typically for an Outlook contact)

 

completeddate

Date

the completion date

 

contacts

String

the list of contacts (e.g. for an Outlook task)

 

contentbase

String

MIME/email-related property

AIE 3.0

contentlanguage

String

MIME/email-related property

AIE 3.0

contentlocation

String

MIME/email-related property

AIE 3.0

contenttransferencoding

String

MIME/email-related property

AIE 3.0

contenttype

String

the content type

 

conversationindex

String

the conversation index for a MIME message

 

conversationtopic

String

the conversation topic for a MIME message

 

creationdate

Date

the date the document was created

 

creatorentryid

String

the 'creator entry ID' header field (for an email message)

 

date

Date

the date on which the document was last modified

 

days

Integer

the number of days assigned (typically, for an Outlook task)

 

deferreddeliverytime

String

the 'deferred delivery time' header value

 

deleteaftersubmit

Boolean

the 'delete after submit' header value (for an email message)

 

department

String

the department

 

description

Text

the description

 

destination

String

the destination

 

displayasemail

String

the 'display as' value for a person's email address (typically for an Outlook contact)

 

displayasemail2

String

the 'display as' value for a person's alternative email address (typically for an Outlook contact)

 

displayasemail3

String

the 'display as' value for a person's second alternative email address (typically for an Outlook contact)

 

disposition

String

the disposition

 

division

String

the division

 

docnumber

Integer

the document's number

 

docrevnumber

String

the document revision number (typically a number but may be a string literal, depending on the versioning scheme)

 

docsecurity

Integer

the document security value (typically on MS Office documents)

 

doctype

String

the document type

 

domainkeysignature

String

MIME/email-related property

AIE 3.0

duedate

Date

the due date (typically, for an Outlook task)

 

editminutes

Integer

the number of minutes the document was last edited for

 

editor

String

the document editor

 

email

String

the person's email (typically for an Outlook contact)

 

email2

String

the person's alternative email address (typically for an Outlook contact)

 

email3

String

the person's second alternative email address (typically for an Outlook contact)

 

entryid

String

entry ID

AIE 3.0

entrytype

String

the entry type

 

expirationdate

Date

the email message's expiration date

 

familyname

String

the person's family name (typically for an Outlook contact)

AIE 3.0

fileas

String

the 'file as' property

 

firstname

String

the person's first name (typically for an Outlook contact)

AIE 3.0

flagstatus

String

the flag status (for email messages)

 

flagsts

Long

the flag sts value

 

footers

Text

the document footers

 

forwardto

String

the 'forward to' value (e.g. in an email message)

 

fullname

String

the full name (typically for Outlook contacts)

 

gender

String

the gender (typically for an Outlook contact)

 

group

String

the group

 

headers

Text

the document headers

 

headingpairs

String

the heading pairs

 

homeaddress

String

the home address (typically for an Outlook contact)

 

homeaddresscity

String

the person's home address city (typically for an Outlook contact)

 

homeaddresscountry

String

the person's home address country (typically for an Outlook contact)

 

homeaddresspobox

String

the person's home address P.O. Box (typically for an Outlook contact)

 

homeaddresspostalcode

String

the person's home address postal code (typically for an Outlook contact)

 

homeaddressstate

String

the person's home address state (typically for an Outlook contact)

 

homeaddressstreet

String

the person's home address street (typically for an Outlook contact)

 

homefaxnum

String

the person's home fax number (typically for an Outlook contact)

 

homephone

String

the person's home phone number

AIE 3.0

homephonenum

String

the person's home telephone number (typically for an Outlook contact)

 

homephonenum2

String

the person's alternative home telephone number (typically for an Outlook contact)

 

hours

Integer

the number of hours assigned (typically, for an Outlook task)

 

imaddress

String

the instant messenger address (typically for an Outlook contact)

 

importance

String

the importance (typically for Outlook tasks)

 

inetmailoverrideformat

Long

the 'Internet mail override format' header value (for a MIME message)

 

injectioninfo

String

MIME/email-related property

AIE 3.0

internetarticlenumber

Long

the Internet article number (for a MIME message)

 

internetcpid

Long

the 'Internet CPID' header value (for a MIME message)

 

internetfreebusyaddress

String

the Internet free busy address

 

internetmessageid

String

the 'Internet message ID' header value (for a MIME message)

 

isdnphonenum

String

the contact's ISDN telephone number (typically for an Outlook contact)

 

jobtitle

String

the job title (typically for Outlook contacts)

 

keywords

String

the keywords

 

language

String

the language

 

lastmodifier

String

the name of the person who last saved the document

 

lastmodifierentryid

String

the last modifier's entry ID (for an email message)

 

lastprinteddate

Date

the date the document was last printed

 

latestdeliverytime

Date

the latest delivery time (for an email message)

 

leadperformer

String

the lead performer (e.g. on an MP3 file)

 

lines

String

MIME/email-related property

AIE 3.0

linksdirty

String

the 'links dirty' flag, typically in MS Office documents

 

linksuptodate

String

the 'links are up-to-date' flag, typically in MS Office documents

 

location

String

the location

 

mailstatus

Long

the mail status

AIE 3.0

mailstop

String

the mail stop

 

manager

String

the manager

 

matter

String

the matter

 

messageflag

String

the message flag (for email messages)

 

messageid

String

MIME/email-related property

AIE 3.0

messagelocaleid

Long

the 'message locale ID' header value (for an email message)

 

middlename

String

the person's middle name (typically for an Outlook contact)

AIE 3.0

mileage

String

the mileage (typically, for an Outlook task)

 

mimeversion

String

MIME/email-related property

AIE 3.0

minutes

Integer

the number of minutes assigned (typically, for an Outlook task)

 

mobilephonenum

String

the person's mobile telephone number (typically for an Outlook contact)

 

msgclass

String

the message class (for a MIME message)

 

msgcodepage

Long

the 'message codepage' header value (for a MIME message)

 

msgeditorformat

Long

the 'message editor format' header value (for a MIME message)

 

msgflag

Long

the message flag value

 

name

String

the name value (e.g. for an Outlook task)

 

newsgroups

String

the list of newsgroups (for newsgroup postings)

 

nickname

String

the nickname (typically for an Outlook contact)

 

nntppostingdate

Date

MIME/email-related property

AIE 3.0

nntppostinghost

String

MIME/email-related property

AIE 3.0

normalizedsubject

String

normalized subject

AIE 3.0

ntsecuritydescriptor

String

the NT security descriptor (for an email message)

 

numchars

Integer

the number of characters in the document

 

numcharswithspaces

Integer

the number of characters in the document, including spaces

 

numhiddenslides

Integer

the number of hidden slides in the document (e.g. in a PowerPoint presentation)

 

numlines

Integer

the number of lines in the document

 

nummmclips

Integer

the number of multimedia clips in the document (e.g. in a PowerPoint presentation)

 

numnotes

Integer

the number of notes in the document

 

numpages

Integer

the number of pages in the document

 

numparagraphs

Integer

the number of paragraphs in the document

 

numslidenotes

Integer

the number of slide notes in the document (e.g. in a PowerPoint presentation)

 

numslides

Integer

the number of slides in the document (e.g. in a PowerPoint presentation)

 

numwords

Integer

the number of words in the document

 

office

String

the office

 

operator

String

the operator

 

optionalattendees

String

the list of optional attendees

 

organization

String

MIME/email-related property

AIE 3.0

originatordeliveryreportrequested

Boolean

the 'originator delivery report requested' header value

 

otheraddress

String

the other address (typically for an Outlook contact)

 

otheraddresscity

String

the person's other address city (typically for an Outlook contact)

 

otheraddresscountry

String

the person's other address country (typically for an Outlook contact)

 

otheraddresspobox

String

the person's other address P.O. Box (typically for an Outlook contact)

 

otheraddresspostalcode

String

the person's other postal code (typically for an Outlook contact)

 

otheraddressstate

String

the person's other state (typically for an Outlook contact)

 

otheraddressstreet

String

the person's other address street (typically for an Outlook contact)

 

otherfaxnum

String

the person's other fax number (typically for an Outlook contact)

 

otherphonenum

String

the person's other telephone number (typically for an Outlook contact)

 

owner

String

the owner

 

pagerphonenum

String

the person's pager telephone number (typically for an Outlook contact)

 

path

String

MIME/email-related property

AIE 3.0

percentcomplete

String

the percent complete (typically, for an Outlook task)

 

personalhomepage

String

the contact's personal home page (typically for an Outlook contact)

 

presentationformat

String

the presentation format (e.g. for a PowerPoint presentation)

 

primaryphonenum

String

the person's primary telephone number (typically for an Outlook contact)

 

priority

Long

the priority (for an email message)

 

profession

String

the person's profession (typically for an Outlook contact)

 

profileconnectflags

Long

the 'profile connect flags' header value (for a MIME message)

 

progid

String

MIME/email-related property

AIE 3.0

project

String

the project

 

purpose

String

the purpose

 

radiophonenum

String

the person's radio telephone number (typically for an Outlook contact)

 

rcvdbyflags

Long

the 'rcvd by flags' header value (for a MIME message)

 

rcvdrepresentingaddrtype

String

the 'rcvd representing addrtype' header value

 

rcvdrepresentingemailaddress

String

the 'rcvd representing email address' header value

 

rcvdrepresentingentryid

String

the 'rcvd representing entry ID' header value

 

rcvdrepresentingflags

Long

the 'rcvd representing flags' header value (for a MIME message)

 

rcvdrepresentingname

String

the 'rcvd representing name' header value

 

rcvdrepresentingsearchkey

String

the 'rcvd representing search key' header value

 

readreceiptrequested

Boolean

the 'mail read receipt requested' header value

 

received

String

the 'received' property

 

receivedbyaddrtype

String

the 'received by addrtype' header value

 

receivedbyemailaddress

String

the 'received by email address' header value

 

receivedbyentryid

String

the 'received by entry ID' header value

 

receivedbyname

String

the 'received by name' header value

 

receivedbysearchkey

String

the 'received by search key' header value

 

receiveddate

Date

the date on which the email message was received

 

receivedfrom

String

the value of the 'received from' header for email messages

 

recipientreassignmentprohibited

String

the 'recipient reassignment prohibited' header value

 

recordedby

String

the name of the person who recorded the information contained in the document

 

recordeddate

Date

the date on which the information contained in the document was recorded

 

reference

String

the reference

 

remindertopic

String

the reminder topic

 

replyrequested

String

the 'reply requested' header value (for an email message)

 

replytime

String

the 'reply time' header value (for an email message)

 

reporttag

String

the 'report tag' header value (for an email message)

 

requiredattendees

String

the list of required attendees

 

responserequested

String

the 'response requested' header value (for an email message)

 

returnpath

String

MIME/email-related property

AIE 3.0

revisionnotes

Text

the revision notes

 

rtfbody

String

rtf body

AIE 3.0

rtfembeddedbody

String

the 'RTF embedded body' property

 

rtfinsync

Boolean

the 'RTF in sync' header value

 

rtfsyncbodycount

String

the 'RTF sync body count' header value

 

rtfsyncbodycrc

String

the 'RTF sync body crc' header value

 

rtfsyncbodytag

String

the 'RTF sync body tag' header value

 

rtfsyncprefixcount

String

the 'RTF sync prefix count' header value

 

rtfsynctrailingcount

String

the 'RTF sync trailing count' header value

 

scalecrop

String

the scale crop

 

searchkey

String

the 'search key' header value

 

section

String

the section

 

senderaddrtype

String

the 'sender addrtype' header value

 

senderemailaddress

String

the 'sender email address' header value

 

senderentryid

String

the 'sender entry ID' header value

 

senderflags

Long

the 'sender flags' header value (for a MIME message)

 

sendername

String

the 'sender name' header value

 

sendersearchkey

String

the 'sender search key' header value

 

sensitivity

String

the sensitivity (typically for Outlook messages)

 

sentdate

Date

the date on which the message was sent (typically for email messages)

AIE 3.0

sentonbehalfof

String

the 'sent on behalf of' header value (for email messages)

 

sentrepresentingaddrtype

String

the 'sent representing addrtype' header value

 

sentrepresentingemailaddress

String

the 'sent representing email address' header value

 

sentrepresentingentryid

String

the 'sent representing entry ID' header value

 

sentrepresentingflags

Long

the 'sent representing flags' header value (for a MIME message)

 

sentrepresentingname

String

sent representing name

AIE 3.0

sentrepresentingsearchkey

String

the 'sent representing search key' header value

 

shareddoc

String

whether the document is shared

 

size

Long

the size of the document in bytes

 

sourcemodifieddate

Date

the source modified date (typically on PDF documents)

 

spouse

String

the spouse's name (typically for an Outlook contact)

 

startdate

Date

the start date (typically, for an Outlook task)

 

status

String

the status

 

submissionid

String

the submission ID (for a MIME message)

 

submittime

String

the submit time of an email message

 

telexphonenum

String

the person's telex telephone number (typically for an Outlook contact)

 

threadindex

String

MIME/email-related property

AIE 3.0

template

Text

the document template

 

threadtopic

String

MIME/email-related property

AIE 3.0

title

String

the document's title

 

titleofparts

String

the title of parts

 

to

String

the list of email message recipients ('to')

 

totalwork

Long

the total work (typically, for an Outlook task)

 

tracknumber

Integer

the track number (e.g. on an MP3 file)

 

transportmessageheaders

Text

the 'transport message headers' value

 

trustsender

Long

the 'trust sender' header value (for an email message)

 

ttyttdphonenum

String

the person's TTY/TDD telephone number (typically for an Outlook contact)

 

typist

String

the typist's name

 

useragent

String

MIME/email-related property

AIE 3.0

versiondate

Date

the version date

 

versionnotes

Text

the version notes

 

versionnumber

String

the version number (typically a number but may be a string literal, depending on the versioning scheme)

 

watermark_text

Text

the watermark text

 

webpage

String

the webpage

AIE 3.0

webpageaddress

String

the Web page address (typically for Outlook contacts)

 

weeks

Integer

the number of weeks assigned (typically, for an Outlook task)

 

workphone

String

the person's work phone number

AIE 3.0

xaccountkey

String

MIME/email-related property

AIE 3.0

xcomplaintsto

String

MIME/email-related property

AIE 3.0

xfolder

String

MIME/email-related property

AIE 3.0

xhttpuseragent

String

MIME/email-related property

AIE 3.0

xmimeole

String

MIME/email-related property

AIE 3.0

xmozillastatus

String

MIME/email-related property

AIE 3.0

xmozillastatus2

String

MIME/email-related property

AIE 3.0

xref

String

MIME/email-related property

AIE 3.0

xtrace

String

MIME/email-related property

AIE 3.0

xuidl

String

MIME/email-related property

AIE 3.0

year

Integer

the year

 
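
As an illustration of how the "Data type" column might be used by client code, the sketch below converts raw string values into Python values for a handful of fields from the table. The `coerce` helper is hypothetical, the raw input formats (ISO 8601 dates, "true"/"false" booleans) are assumptions, and nothing here describes how AIE itself parses values.

```python
from datetime import datetime

# Data types copied from the table above for a few fields.
FIELD_TYPES = {
    "author": "String",
    "comments": "Text",
    "numpages": "Integer",
    "size": "Long",
    "ccme": "Boolean",
    "creationdate": "Date",
}


def coerce(field, raw):
    """Convert a raw string into a Python value based on the field's type."""
    kind = FIELD_TYPES[field]
    if kind in ("Integer", "Long"):
        return int(raw)
    if kind == "Boolean":
        return raw.strip().lower() == "true"  # assumed textual form
    if kind == "Date":
        return datetime.fromisoformat(raw)  # assumed ISO 8601 input
    return raw  # String and Text are kept as text


record = {
    "author": coerce("author", "J. Smith"),
    "numpages": coerce("numpages", "12"),
    "ccme": coerce("ccme", "True"),
    "creationdate": coerce("creationdate", "2010-03-15T09:30:00"),
}
```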
