Overview
Field Expressions transform field values during search evaluation. The output of this evaluation can then be used for display, sorting, boosting, filtering, joining and faceting. User Defined Fields can be added to the system via the server SDK, providing the ability to implement custom Field Expressions.
Create A User Defined Field
User Defined Fields are implemented using the com.attivio.sdk.server.udf.UserDefinedFieldEvaluator interface.
Implementing a user defined field requires implementing two methods:
- configure() - initializes the user defined field implementation, including:
  - type checking arguments
  - initializing evaluation using query level context information
  - applying runtime configuration via named parameters specified on the request
- evaluate() - performs the evaluation of a user defined field
  - input values may be null, single value or multi value
  - output value can be null, single value or multi value
The following classes/interfaces are part of the User Defined Field server sdk:
com.attivio.sdk.server.udf.UserDefinedFieldEvaluator - Interface that must be implemented for user defined fields.
com.attivio.sdk.server.udf.UserDefinedFieldContext - Provides query level context information to the configure() method
com.attivio.sdk.server.udf.ArgumentInfo - Provides argument type information to the configure() method
com.attivio.sdk.server.udf.InputValue - Access to per-row argument values for the evaluate() method. Each InputValue may be null, single value or multi value.
com.attivio.sdk.server.udf.OutputValue - Result output interface. The evaluate() method should write all output values (if any) to the provided OutputValue instance.
com.attivio.sdk.search.fields.UserDefinedField - A generic FieldExpression used to model user defined fields in the QueryRequest.
See the examples below for more in-depth coverage of creating user defined fields.
Configure A User Defined Field
User Defined Fields are configured as part of the schema. See here for documentation on configuring user defined fields.
Example
<schema name="default">
  <fields>
    ...
  </fields>
  <udfs>
    <udf name="testFunction" type="string" class="com.attivio.examples.TestFunction"/>
  </udfs>
</schema>
REST Syntax
The REST syntax for specifying a user defined field is:
UDF(<functionName>[, <arg>[, <arg>]...][, <parameterName>=<parameterValue>[, <parameterName>=<parameterValue>]...])
- <functionName> - the name of the requested user defined field as defined in the schema.
- <arg> - any other Field Expression. Any number of arguments can be specified, separated by commas.
- <parameterName> - the name of a runtime configuration parameter
- <parameterValue> - the value for a runtime configuration parameter. String values should be wrapped in double quotes.
Example
# UDF named stringJoin that uses the title and text field along with the constant string "constantString" as input arguments,
# with a string parameter named "separator" with a value of "."
UDF(stringJoin, title, text, "constantString", separator=".") AS STRINGJOIN
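The grammar above can also be assembled programmatically. The following is a minimal sketch in plain Java, not part of the Attivio SDK: the class UdfExpressionBuilder and its methods are hypothetical names introduced for illustration. It assumes that arguments are passed in already formatted as Field Expressions, and it does not attempt to escape double quotes embedded in string parameter values, since the source does not describe an escaping rule.

```java
import java.util.ArrayList;
import java.util.List;

/** Hypothetical helper (not an Attivio class) that assembles a UDF(...) REST expression. */
final class UdfExpressionBuilder {
  private final String functionName;
  private final List<String> args = new ArrayList<>();
  private final List<String> params = new ArrayList<>();

  UdfExpressionBuilder(String functionName) {
    this.functionName = functionName;
  }

  /** Adds an argument that is already a valid Field Expression (a field name, a quoted constant, ...). */
  UdfExpressionBuilder arg(String fieldExpression) {
    args.add(fieldExpression);
    return this;
  }

  /** Adds a string parameter; string values are wrapped in double quotes. */
  UdfExpressionBuilder param(String name, String value) {
    params.add(name + "=\"" + value + "\"");
    return this;
  }

  /** Adds a numeric parameter; numbers are written without quotes. */
  UdfExpressionBuilder param(String name, long value) {
    params.add(name + "=" + value);
    return this;
  }

  /** Emits the function name first, then all arguments, then all named parameters. */
  String build() {
    StringBuilder sb = new StringBuilder("UDF(").append(functionName);
    for (String a : args) {
      sb.append(", ").append(a);
    }
    for (String p : params) {
      sb.append(", ").append(p);
    }
    return sb.append(")").toString();
  }
}
```

Keeping arguments and parameters in separate lists means the output always places the arguments before the named parameters, as the syntax requires, regardless of the order in which the builder methods are called.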
Java API
User Defined Fields can be used in the Java API via the UserDefinedField Field Expression.
This field expression takes a name referencing a configured user defined field, an arbitrary number of FieldExpressions to use as arguments and an arbitrary number of named parameters.
User Defined Fields can be used anywhere in the QueryRequest that accepts a FieldExpression as an argument.
Example
QueryRequest request;
...
// Request all fields
request.addField("*");

// Request a UDF for display purposes
UserDefinedField udf = new UserDefinedField("stringJoin");
udf.addArgument( new StoredField("title") );
udf.addArgument( new StoredField("text") );
udf.addArgument( new ConstantValue("constantString") );
udf.setParameter("separator", ".");
udf.setAlias("STRINGJOIN"); // return output in field named "STRINGJOIN"
request.addField(udf); // add our udf to the query request

// Alternatively, rest syntax could be used directly
// request.addField("UDF(stringJoin, title, text, \"constantString\", separator=\".\") AS STRINGJOIN");
...
Using A User Defined Field
User defined fields can be used anywhere a Field Expression can be used.
Using User Defined Fields from SQL
USER_DEFINED_<TYPE>(name, [arg1[, arg2]...][, USER_DEFINED_PARAM(k1, v1)[, USER_DEFINED_PARAM(k2, v2)]...])
SELECT title from books WHERE USER_DEFINED_STRING('myudf', title, USER_DEFINED_PARAM('separator', ',')) = 'Boston'
The above example uses the myudf user defined field in a where clause. It uses a single field expression argument - the title field. It also uses a single parameter called separator with the ',' value. The document is selected if myudf returns Boston.
SELECT USER_DEFINED_STRING('myudf', title, USER_DEFINED_PARAM('separator', ',')) AS myname FROM books
The above example uses the same myudf user defined field as a selected item.
The following Attivio return types are supported:
- USER_DEFINED_STRING - Should be used when the custom function returns string or text values. It can also be used to convert all other returned types to string.
- USER_DEFINED_INT - Should be used when the custom function returns integer values.
- USER_DEFINED_NUMERIC - Should be used when the custom function returns float, long or double values.
- USER_DEFINED_DATE - Should be used when the custom function returns date values.
Note the following:
- The user defined field name passed to USER_DEFINED_<TYPE> must be the same name defined in the schema.
- The first USER_DEFINED_PARAM argument (the UDF parameter name) must be a constant string.
- The second USER_DEFINED_PARAM argument (the UDF parameter value) must be a constant string, number, boolean or date.
Note
The default maximum number of items passed to the USER_DEFINED function, including the name, arguments and parameters, is 10. That value can be increased by changing the sqlsdk.server.udf.maxargs Attivio property.
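The SQL call shape and the default item limit can be sketched in plain Java as follows. SqlUdfCallSketch is a hypothetical helper, not an Attivio class. It assumes that each USER_DEFINED_PARAM counts as one item toward the limit, that parameter values are string constants, and that single quotes inside constants are escaped by doubling them (the usual SQL convention, which the source does not confirm for this dialect).

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;

/** Hypothetical helper (not an Attivio class) that builds a USER_DEFINED_STRING(...) call. */
final class SqlUdfCallSketch {
  /** Default limit stated in the documentation; configurable via sqlsdk.server.udf.maxargs. */
  static final int DEFAULT_MAX_ITEMS = 10;

  static String userDefinedString(String udfName, List<String> args, Map<String, String> params) {
    // Assumption: the name, each argument and each USER_DEFINED_PARAM count as one item each.
    int items = 1 + args.size() + params.size();
    if (items > DEFAULT_MAX_ITEMS) {
      throw new IllegalArgumentException("Too many items for USER_DEFINED function: " + items);
    }
    List<String> parts = new ArrayList<>();
    parts.add(quote(udfName));   // must match the UDF name defined in the schema
    parts.addAll(args);          // arguments are passed through as SQL expressions
    for (Map.Entry<String, String> p : params.entrySet()) {
      parts.add("USER_DEFINED_PARAM(" + quote(p.getKey()) + ", " + quote(p.getValue()) + ")");
    }
    return "USER_DEFINED_STRING(" + String.join(", ", parts) + ")";
  }

  /** Wraps a constant in single quotes, doubling embedded quotes (assumed escaping rule). */
  private static String quote(String s) {
    return "'" + s.replace("'", "''") + "'";
  }
}
```

Checking the item count on the client side is only a convenience; the server remains the authority on the configured limit.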
Examples
Below are examples that illustrate specific aspects of the User Defined Field server sdk.
Argument Type Checking
The configure() method should perform all argument validation, including type checking and argument count validation. Implementations should throw IllegalArgumentException if too few or too many arguments are provided, or if the argument types are not what is expected.
package com.attivio.examples.udf;

import com.attivio.sdk.server.udf.*;
import com.attivio.sdk.search.fields.UserDefinedField;
import com.attivio.sdk.schema.SchemaField;

/**
 * Example implementation showing argument type checking.
 *
 * Evaluation method will extract the suffix of a string field, using an integer field to specify the suffix length.
 *
 * Schema Definition: {@literal <udf name="stringSuffix" type="string" class="com.attivio.examples.udf.StringSuffix"/>}
 * REST Example: UDF(stringSuffix, title, suffixlength_i)
 */
public class StringSuffix implements UserDefinedFieldEvaluator {

  @Override
  public void configure(UserDefinedFieldContext context, UserDefinedField input, ArgumentInfo[] args) {
    if (args.length != 2) {
      // Throw exception if incorrect number of arguments are provided
      throw new IllegalArgumentException("Expected 2 arguments");
    } else if (args[0].getType() != SchemaField.Type.STRING) {
      // Throw exception if first argument does not have the correct type
      throw new IllegalArgumentException("Expected first argument to be a String");
    } else if (args[1].getType() != SchemaField.Type.INTEGER) {
      // Throw exception if second argument does not have the correct type
      throw new IllegalArgumentException("Expected second argument to be an integer");
    }
  }

  @Override
  public void evaluate(OutputValue output, InputValue[] args) {
    // get the length of the suffix, assume single value field or null
    int length = (args[1].size() == 1) ? Math.max(args[1].getInteger(0), 0) : 0;

    // apply substring to all string values in first argument
    for (int i = 0; i < args[0].size(); ++i) {
      String input = args[0].getString(i);
      if (length < input.length()) {
        output.addString(input.substring(input.length() - length));
      } else {
        output.addString(input);
      }
    }
  }
}
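To make the suffix logic in the example above easier to follow, here is the same per-value computation isolated as a plain Java method, without any SDK types. SuffixSketch is a hypothetical name used only for this illustration.

```java
/** Stand-alone illustration of the per-value logic used by the StringSuffix example. */
final class SuffixSketch {
  /** Returns the last requestedLength characters of input, or the whole input if it is shorter. */
  static String suffix(String input, int requestedLength) {
    int length = Math.max(requestedLength, 0); // negative lengths are clamped to 0
    if (length < input.length()) {
      return input.substring(input.length() - length);
    }
    return input;
  }
}
```

For the value "Boston", a length of 3 yields "ton", a length of 10 yields "Boston", and a length of 0 yields an empty string. Because the example treats a null or multi value length argument as 0, every input value produces an empty string in that case.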
Query Context
Query level context information is passed to the configure() method via the UserDefinedFieldContext interface. This provides access to the locale, time zone and other settings specified for the query requesting the user defined field. This information can be used to initialize the user defined field implementation.
package com.attivio.examples.udf;

import java.util.Locale;

import com.attivio.sdk.server.udf.*;
import com.attivio.sdk.search.fields.UserDefinedField;
import com.attivio.sdk.schema.SchemaField;

/**
 * Locale sensitive toLowerCase()
 *
 * NOTE: the default lower case field expression is already sensitive to locale,
 * this example is provided to show how to use query specified locale during UDF evaluation.
 *
 * Schema Definition: {@literal <udf name="localeToLowerCase" type="string" class="com.attivio.examples.udf.LocaleToLowerCase"/>}
 * REST Example: UDF(localeToLowerCase, title)
 */
public class LocaleToLowerCase implements UserDefinedFieldEvaluator {

  Locale locale;

  @Override
  public void configure(UserDefinedFieldContext context, UserDefinedField input, ArgumentInfo[] args) {
    // get locale from context, assume ENGLISH if not specified
    locale = (context.getLocale() != null) ? context.getLocale() : Locale.ENGLISH;

    if (args.length != 1) {
      throw new IllegalArgumentException("expected 1 argument");
    } else if (args[0].getType() != SchemaField.Type.STRING) {
      throw new IllegalArgumentException("Expected string argument");
    }
  }

  @Override
  public void evaluate(OutputValue output, InputValue[] args) {
    // apply locale sensitive toLowerCase() to all input values
    for (int i = 0; i < args[0].size(); ++i) {
      output.addString( args[0].getString(i).toLowerCase(locale) );
    }
  }
}
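The reason the query locale matters can be seen with the standard Java library alone. The sketch below uses a hypothetical class, LocaleLowerCaseSketch, and no SDK types. It applies the same fallback to Locale.ENGLISH as the example above and then lower-cases a value.

```java
import java.util.Locale;

/** Stand-alone illustration of locale sensitive lower-casing with an ENGLISH fallback. */
final class LocaleLowerCaseSketch {
  static String lower(String value, Locale queryLocale) {
    // Same fallback as the example above: assume ENGLISH when no locale is specified
    Locale effective = (queryLocale != null) ? queryLocale : Locale.ENGLISH;
    return value.toLowerCase(effective);
  }
}
```

With no locale, "TITLE" becomes "title". With a Turkish locale (Locale.forLanguageTag("tr")), the capital I maps to the dotless ı (U+0131), so the result is "tıtle". A UDF that ignored the query locale would therefore return a different string than the user expects.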
Named Parameters
User Defined Fields can be passed named parameters in order to support runtime configuration. Named parameters will be available on the UserDefinedField instance passed to the configure() method.
package com.attivio.examples.udf;

import com.attivio.sdk.server.udf.*;
import com.attivio.sdk.search.fields.UserDefinedField;
import com.attivio.sdk.schema.SchemaField;

/**
 * Use named parameter to specify radix used for parsing integers
 *
 * Schema Definition: {@literal <udf name="radixParseInteger" type="integer" class="com.attivio.examples.udf.RadixParseInteger"/>}
 * REST Example: UDF(radixParseInteger, hex_s, radix=16)
 */
public class RadixParseInteger implements UserDefinedFieldEvaluator {

  int radix = 10;

  @Override
  public void configure(UserDefinedFieldContext context, UserDefinedField input, ArgumentInfo[] args) {
    if (args.length != 1) {
      throw new IllegalArgumentException("Expected 1 argument");
    } else if (args[0].getType() != SchemaField.Type.STRING) {
      throw new IllegalArgumentException("Expected string argument");
    } else {
      // Get the runtime specified radix from the UserDefinedField request, using 10 by default if not specified/not a number
      radix = input.getParameter("radix", 10);
    }
  }

  @Override
  public void evaluate(OutputValue output, InputValue[] args) {
    for (int i = 0; i < args[0].size(); ++i) {
      try {
        output.addInteger( Integer.parseInt(args[0].getString(i), radix) );
      } catch (NumberFormatException e) {
        // failed to parse, dropping value
      }
    }
  }
}
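The parse-and-drop behavior of the example above can be tried outside the SDK. The following sketch uses a hypothetical class, RadixParseSketch, and plain Java lists in place of InputValue and OutputValue.

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-alone illustration of parsing with a runtime radix and dropping unparseable values. */
final class RadixParseSketch {
  static List<Integer> parseAll(List<String> values, int radix) {
    List<Integer> parsed = new ArrayList<>();
    for (String value : values) {
      try {
        parsed.add(Integer.parseInt(value, radix));
      } catch (NumberFormatException e) {
        // failed to parse, dropping value (mirrors the example above)
      }
    }
    return parsed;
  }
}
```

With radix 16, the inputs "ff", "10" and "zz" produce 255 and 16, and "zz" is dropped. Integer.parseInt also throws NumberFormatException when the radix itself is outside the range Character.MIN_RADIX to Character.MAX_RADIX, so an invalid radix parameter would silently drop every value. One option, not shown in the original example, is to validate the radix in configure() and throw IllegalArgumentException there.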
Null Values
Any InputValue may be null during evaluation. The result of a user defined field can also be null, which is achieved by not adding any result to the OutputValue. As a defensive practice, implementations should check the number of values for each InputValue before performing any evaluation.
package com.attivio.examples.udf;

import com.attivio.sdk.server.udf.*;
import com.attivio.sdk.search.fields.UserDefinedField;
import com.attivio.sdk.schema.SchemaField;

/**
 * Example UDF that will echo the first value for the first non-null input argument. If all arguments are null, the output will be null
 *
 * Schema Definition: {@literal <udf name="firstInteger" type="integer" class="com.attivio.examples.udf.FirstInteger"/>}
 * REST Example: UDF(firstInteger, intfield1_i, intfield2_i)
 */
public class FirstInteger implements UserDefinedFieldEvaluator {

  @Override
  public void configure(UserDefinedFieldContext context, UserDefinedField input, ArgumentInfo[] args) {
    for (ArgumentInfo arg : args) {
      if (arg.getType() != SchemaField.Type.INTEGER) {
        throw new IllegalArgumentException("Expected integer argument");
      }
    }
  }

  @Override
  public void evaluate(OutputValue output, InputValue[] args) {
    for (InputValue arg : args) {
      if (arg.size() != 0) {
        output.addInteger( arg.getInteger(0) );
        return;
      }
    }
    // if we got here, all input values were null,
    // not adding any output in this case in order to produce a null output value
  }
}
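The null handling in the example above can be modeled with plain Java collections. In this sketch, which uses the hypothetical class FirstValueSketch, an empty list stands in for a null InputValue (size 0) and an empty Optional stands in for a null output. Both mappings are assumptions made for illustration.

```java
import java.util.List;
import java.util.Optional;

/** Stand-alone illustration of "first value of the first non-null argument". */
final class FirstValueSketch {
  static Optional<Integer> first(List<List<Integer>> args) {
    for (List<Integer> arg : args) {
      if (!arg.isEmpty()) {           // size check before reading, as recommended above
        return Optional.of(arg.get(0));
      }
    }
    return Optional.empty();          // all arguments were null: produce no output
  }
}
```

If the first argument is empty and the second holds 7 and 8, the result is 7. If every argument is empty, nothing is produced, which corresponds to a null output value.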
Multi Value Fields
Any InputValue may contain multiple values during evaluation. Multi value output can also be generated by writing multiple output values to the OutputValue.
package com.attivio.examples.udf;

import com.attivio.sdk.server.udf.*;
import com.attivio.sdk.search.fields.UserDefinedField;
import com.attivio.sdk.schema.SchemaField;

/**
 * Example multi-value function that produces Cartesian product of multi-value input arguments.
 *
 * Schema Definition: {@literal <udf name="cartesianAddInteger" type="integer" class="com.attivio.examples.udf.CartesianAddInteger"/>}
 * REST Example: UDF(cartesianAddInteger, intfield1_i, intfield2_i)
 */
public class CartesianAddInteger implements UserDefinedFieldEvaluator {

  @Override
  public void configure(UserDefinedFieldContext context, UserDefinedField input, ArgumentInfo[] args) {
    if (args.length != 2) {
      throw new IllegalArgumentException("2 arguments required");
    } else if (args[0].getType() != SchemaField.Type.INTEGER) {
      throw new IllegalArgumentException("integer argument required");
    } else if (args[1].getType() != SchemaField.Type.INTEGER) {
      throw new IllegalArgumentException("integer argument required");
    }
  }

  @Override
  public void evaluate(OutputValue output, InputValue[] args) {
    for (int i = 0; i < args[0].size(); ++i) {
      for (int j = 0; j < args[1].size(); ++j) {
        output.addInteger( args[0].getInteger(i) + args[1].getInteger(j) );
      }
    }
  }
}
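To see what the nested loops in the example above produce, the same iteration order can be reproduced over plain Java lists. CartesianAddSketch is a hypothetical class used only for this illustration.

```java
import java.util.ArrayList;
import java.util.List;

/** Stand-alone illustration of the Cartesian sum computed by the example above. */
final class CartesianAddSketch {
  static List<Integer> cartesianAdd(List<Integer> left, List<Integer> right) {
    List<Integer> sums = new ArrayList<>();
    for (int l : left) {        // outer loop: values of the first argument
      for (int r : right) {     // inner loop: values of the second argument
        sums.add(l + r);
      }
    }
    return sums;
  }
}
```

For the inputs [1, 2] and [10, 20], the output is [11, 21, 12, 22]. The output holds one value per pair, so its size is the product of the two input sizes. If either input has no values, the loops never add anything and the result is empty, which corresponds to a null output value.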
Warnings
- Implementations of the evaluate() method should not throw exceptions: if evaluate() throws, the entire query fails. SDK users should catch any exceptions and output an appropriate result (or null) in the event of a runtime error. All type and argument checking should be done in the configure() method.
- Calling JNI functions from implementations can result in segfaults. A segfault will bring down the JVM and can result in an index and/or search outage.
  - To reduce the likelihood of an indexing outage caused by a segfault in a JNI call, it is recommended to disable search for index writers.
  - See here for configuring this setting.
- Implementations of the evaluate() method should not have any side effects. No shared state should be modified that will have an effect on future calls to evaluate().
  - If this property is not maintained, you may get unexpected results.
  - For instance, if the same UDF is used for both sorting and display, different values may be used for sorting and displaying the same document, resulting in inconsistencies that are difficult to identify.
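The side-effect warning above can be illustrated without the SDK. The sketch below contrasts a stateful evaluator with a pure one, using java.util.function.IntUnaryOperator as a stand-in for evaluate(). The class names are hypothetical. The two calls on the same input represent evaluating the same document once for sorting and once for display.

```java
import java.util.function.IntUnaryOperator;

/** Stand-alone illustration of why evaluate() should not modify shared state. */
final class SideEffectSketch {

  /** Stateful: the result depends on how many times the evaluator has been called. */
  static final class CountingEvaluator implements IntUnaryOperator {
    private int calls = 0;

    @Override
    public int applyAsInt(int value) {
      calls++;                 // side effect that leaks into later calls
      return value + calls;
    }
  }

  /** Pure: the result depends only on the input value. */
  static final class PureEvaluator implements IntUnaryOperator {
    @Override
    public int applyAsInt(int value) {
      return value + 1;
    }
  }
}
```

Calling the stateful evaluator twice with the input 5 returns 6 and then 7, so the value used for sorting would differ from the value displayed. The pure evaluator returns 6 both times.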