The Freshness field expression is used for scoring "newer" documents higher than older documents.
This field expression is usually used as a Field Boost as part of Relevancy Configuration.
Freshness can use a decay-based algorithm that scores documents according to recency, or a table-based configuration that specifies multiple date ranges, each with its own quadratic function for producing a freshness score.
Decay Based Freshness
The default algorithm for calculating the freshness score for a given document is:
freshness = 1 / pow( abs(center - freshnessValue(fieldName, doc)) + 1, decay)
Where:
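- decay is a double (default 0.085)
- center is the center date used for computing document recency (by default, the current time)
- freshnessValue(fieldName, doc) = the indexed freshness value for fieldName in doc, i.e. the value of the date field specified in the relevancy model.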
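The formula above can be sketched in a few lines of Python. This is only an illustration, not the product's implementation: the function name and the sample center date are hypothetical, and both dates are assumed to be expressed in epoch seconds (consistent with the half-life formula, which is stated in seconds).

```python
def decay_freshness(center_seconds: float, doc_seconds: float, decay: float = 0.085) -> float:
    """Evaluate freshness = 1 / pow(abs(center - freshnessValue) + 1, decay).

    Assumption: both arguments are epoch seconds, so the distance is in seconds.
    """
    distance = abs(center_seconds - doc_seconds)
    return 1.0 / pow(distance + 1.0, decay)


# Hypothetical center date (epoch seconds), chosen only for illustration.
center = 1_700_000_000

# Print the score for documents 0 s, 1 hour, 1 day, and 1 week from the center.
for age in (0, 3600, 86400, 604800):
    print(age, round(decay_freshness(center, center - age), 4))
```

A document dated exactly at the center has a distance of zero and therefore scores 1.0; with a positive decay, the score shrinks as the distance grows.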
The decay accelerates or decelerates the change in score as documents approach the center date. A decay of zero results in a boost of one for every document, which effectively removes freshness as a boost factor (every document scores relevancy score + 1). Decay values greater than zero increase the relevancy score of newer documents; the higher the decay value, the faster the boost decreases for older documents. Decay values less than zero result in higher relevancy scores for older documents and effectively "penalize" newer documents with lower boosts.
To calculate the desired decay factor for a given half-life (in seconds), the following formula can be used:
log(2) / log(abs(halflife)+1)
Sample values:
Half Life | Decay Value |
---|---|
1 hr | 0.085 (this is the default decay value) |
6 hr | 0.06945 |
12 hr | 0.06494 |
1 day | 0.06098 |
1 week | 0.05206 |
1 month | 0.047 |
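The sample values in the table can be reproduced with a short Python sketch of the half-life formula. The function name is hypothetical, and "1 month" is assumed here to mean 30 days.

```python
import math


def decay_for_half_life(half_life_seconds: float) -> float:
    """Decay factor for a half-life in seconds: log(2) / log(abs(halflife) + 1)."""
    return math.log(2) / math.log(abs(half_life_seconds) + 1)


# Half-lives from the table, converted to seconds (1 month assumed to be 30 days).
half_lives = [
    ("1 hr", 3600),
    ("6 hr", 6 * 3600),
    ("12 hr", 12 * 3600),
    ("1 day", 86400),
    ("1 week", 7 * 86400),
    ("1 month", 30 * 86400),
]

for label, seconds in half_lives:
    print(f"{label}: {decay_for_half_life(seconds):.5f}")
```

With a decay computed this way, a document whose distance from the center equals the half-life receives a freshness score of 0.5 under the decay formula.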
The freshness score will be a number between 0.0 and 1.0 for positive decay values, 0.0 and -1.0 for negative ones.
FRESHNESS(<field>[, decay=<decay>][, center=<centerDate>][, centerResolution=<centerResolution>][, default=<defaultDate>])
- <field> - A field expression that returns a date value.
- <decay> - Specify the decay value for freshness computation. (default=0.085)
- <centerDate> - The center date used for computing document recency. (default = current time)
- <centerResolution> - resolution for center date, one of MILLISECONDS, SECONDS, MINUTES, HOURS, or DAYS. (default = HOURS)
- <defaultDate> - the default date to use for documents that do not have a value for the <field>
FRESHNESS(date, decay=0.085)
Table Based Freshness
Alternatively, freshness boosts may be manually specified in a table for different date ranges rather than using the decay function.
A freshness table has a center date (by default the system time) that its rows are relative to. Each row is specified as a max-delta time from the center, in the given java.util.concurrent.TimeUnit. The score is determined by the row's quadratic (a), linear (b), and constant (c) values and the document's freshness distance from the center (x):
freshness = a*x^2 + b*x + c
The corresponding score is normalized against the maximum allowed table values resulting in a score of 0 to 1.
FRESHNESS(<field>[, center=<centerDate>][, centerResolution=<centerResolution>][, default=<defaultDate>][, RANGE(delta=<delta>[, units=<deltaUnits>][, constant=<constant>][,linear=<linear>][,quadratic=<quadratic>])]...)
- <field> - A field expression that returns a date value.
- <centerDate> - The center date used for computing document recency. (default = current time)
- <centerResolution> - resolution for center date, one of MILLISECONDS, SECONDS, MINUTES, HOURS, or DAYS. (default = HOURS)
- <defaultDate> - the default date to use for documents that do not have a value for the <field>
- <delta> - The max age for a range.
- <deltaUnits> - Time units for <delta> attribute for a range (default = SECONDS)
- <constant> - The constant coefficient for scoring documents in a range. (default = 0.0)
- <linear> - The linear coefficient for scoring documents in a range. (default = 0.0)
- <quadratic> - The quadratic coefficient for scoring documents in a range. (default = 0.0)
FRESHNESS(date, RANGE(delta=1, units=DAYS, constant=1.0), RANGE(delta=7, units=DAYS, constant=1, linear=-0.1))
This example gives a boost of 1 for any document dated within the past 24 hours, and a boost that declines linearly by 0.1 for each day of distance from the center for documents up to 7 days old.
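The two-range example can be traced with a small Python sketch of the row lookup and the polynomial a*x^2 + b*x + c. Several details here are assumptions rather than documented behavior: the Range class and table_score function are hypothetical names, x is measured in the same units as delta (days), the first row whose delta covers x is the one applied, a document outside every row gets no score, and the normalization step is left out.

```python
from dataclasses import dataclass
from typing import List, Optional


@dataclass
class Range:
    delta: float            # max distance from the center covered by this row
    constant: float = 0.0   # c
    linear: float = 0.0     # b
    quadratic: float = 0.0  # a


def table_score(rows: List[Range], x: float) -> Optional[float]:
    """Raw (un-normalized) score a*x^2 + b*x + c for distance x.

    Assumption: the row with the smallest delta >= x applies; returns None
    when x lies beyond every row.
    """
    for row in sorted(rows, key=lambda r: r.delta):
        if x <= row.delta:
            return row.quadratic * x * x + row.linear * x + row.constant
    return None


# Rows mirroring the example expression, with distances in days.
rows = [Range(delta=1, constant=1.0), Range(delta=7, constant=1.0, linear=-0.1)]

for days in (0.5, 3, 7, 8):
    print(days, table_score(rows, days))
```

Under these assumptions, a document half a day old falls in the first row and gets the constant score, while a three-day-old document falls in the second row and is reduced by 0.1 per day of distance.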
Freshness Field Date Format
The field specified for use by the freshness boost must be a date field defined in the schema.
Handling and formatting of date fields is described in the transformer ingest field class ParseDate. In summary, the ISO 8601 date format is assumed and tested for first; the method then backs off to ISO 8601 without the "T", then to the XML Gregorian spec, and finally to epoch time in seconds.
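The back-off order described above can be illustrated with a Python sketch. This is not the ParseDate implementation: the function name and the exact format patterns are hypothetical, and values without an explicit offset are assumed to be UTC.

```python
from datetime import datetime, timezone

# Hypothetical patterns, tried in order; ParseDate's real patterns may differ.
_FORMATS = (
    "%Y-%m-%dT%H:%M:%S",    # ISO 8601
    "%Y-%m-%d %H:%M:%S",    # ISO 8601 without the "T"
    "%Y-%m-%dT%H:%M:%S%z",  # XML Gregorian style with a UTC offset
)


def parse_date(text: str) -> datetime:
    """Try each format in turn, then fall back to epoch seconds."""
    for fmt in _FORMATS:
        try:
            parsed = datetime.strptime(text, fmt)
        except ValueError:
            continue  # this format did not match; back off to the next one
        # Assumption: values without an offset are treated as UTC.
        return parsed if parsed.tzinfo else parsed.replace(tzinfo=timezone.utc)
    # Final fallback: interpret the text as epoch seconds (raises ValueError if not an integer).
    return datetime.fromtimestamp(int(text), tz=timezone.utc)


for sample in ("2020-01-02T03:04:05", "2020-01-02 03:04:05", "2020-01-02T03:04:05+00:00", "1577934245"):
    print(sample, "->", parse_date(sample).isoformat())
```

A value that matches none of the patterns and is not an integer raises an error in this sketch, which is the situation a custom date format in the schema is meant to address.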
To specify a different date format, you should modify the schema for the date field as noted in the Schema and Field Properties article, Document Transformer properties section.
For example:
<field name="date" type="date" indexed="true" stored="true" sort="true" default="NOW">
  <properties>
    <property name="workflow.date.format" value="MM/dd/yyyy"/>
  </properties>
</field>