Overview
This document details how users register signals to be tracked in their Attivio Platform project.
What is a signal?
A signal is an action in the search application that could potentially be used to refine future search results. An administrator can choose the signal type when creating a signal, but the default signal is a search result click signal: a user searches the application and clicks on one of the search results. The signal created from this click contains information such as the query, the relevancy model used, the relevancy feature vector, the query timestamp, and other fields. Accumulated signals are used to train relevancy models. Relevancy models are applied to queries based on user, group, and other criteria, which optimizes what a user sees when they enter a given query.
What is signal type and signal strength?
The signal type is just a descriptor of the type of signal. Signals will not be compared to signals of other types when being run through the relevancy model. We have two types that are statically defined as "click" and "rating" corresponding to signals created as the result of clicking on a search result and signals created as a result of rating a search result. While we have these signal types statically defined, users can register signals with any type that they would like depending on what actions they would like to affect how their relevancy models are trained.
Signal strength quantifies how much weight a given signal should get. The higher the strength, the more that signal is weighted when training the relevancy model. For instance, for a rating signal, a higher rating yields a higher signal strength. Signal strength can be set to 0 to nullify bad signals. Strength values are relative only to other strengths of the same signal type. That is, a strength of 1 for a signal type with a strength range of 0-1 will not be given the same weight within its type as a strength of 1 for a signal type with a strength range of 0-100.
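To make the "relative within its type" point concrete, the sketch below expresses a strength as a fraction of its type's range. It is only an illustration of the arithmetic: the class and method names are hypothetical, the ranges are the ones from the example above, and it does not claim to reproduce how Attivio weights signals internally.

```java
/** Illustrative only: compares strengths by their position within each signal type's own range. */
final class SignalStrengthExample {

    /** Returns where strength falls within [min, max], as a fraction from 0.0 to 1.0. */
    static double relativeStrength(double strength, double min, double max) {
        if (max <= min) {
            throw new IllegalArgumentException("max must be greater than min");
        }
        if (strength < min || strength > max) {
            throw new IllegalArgumentException("strength is outside the range");
        }
        return (strength - min) / (max - min);
    }

    public static void main(String[] args) {
        // A strength of 1 in a type whose strengths range from 0 to 1 is the strongest possible signal.
        System.out.println(relativeStrength(1, 0, 1));
        // The same strength of 1 in a type whose strengths range from 0 to 100 is a very weak signal.
        System.out.println(relativeStrength(1, 0, 100));
    }
}
```

The same raw value lands at opposite ends of the scale depending on the range used for its type, which is why strengths should be chosen consistently within a type.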
How can I track signals?
The signal tracking API is available for both Java and REST. The Java classes are in our SDK and can be used by including the SDK JAR libraries on the classpath of any Java project that needs to interface with the signal tracking feature. The REST API can be accessed by sending HTTP requests to any node running in your system. See the sections below for more details.
Java API
To interface with Attivio using Java, you first need to include our SDK JAR libraries on your classpath. The SDK JAR libraries are located in the Attivio Platform installation directory at:
<install_directory>/sdk/java/client/lib/
This will allow you to use our service factory to access the signal tracking API. For this you will need to enter the host and port on which ZooKeeper is running in Attivio Platform as well as the project name. Below is a sample program that will query the index and add a signal as though a user clicked on a result:
    package com.test;

    import com.attivio.service.Platform;
    import com.attivio.service.ServiceFactory;
    // Imports for Query, QueryString, QueryRequest, QueryResponse, SearchClient,
    // SignalTrackingApi, and Signal are also required; take them from the SDK JARs.

    public class TestTest {

      public static void main(String[] args) throws Exception {
        // initialization needed to access services using the service factory
        System.setProperty("AIE_ZOOKEEPER", "localhost:16980");
        Platform.instance.setProjectName("sampleproject");

        // create a query request to get all documents with a table of orders
        Query query = new QueryString("table:orders");
        QueryRequest request = new QueryRequest(query);

        // query against the index
        QueryResponse response = ServiceFactory.getService(SearchClient.class).search(request);

        // retrieve the signal tracking API from the service factory
        SignalTrackingApi signalTrackingApi = ServiceFactory.getService(SignalTrackingApi.class);

        // create the signal on the first result document and add it to the signals being tracked
        // this is simulating a user click on the first result of the query
        Signal signal = Signal.getSignal(response, 1);
        signalTrackingApi.addSignal(signal);

        // create the signal on the second document and add it to the signals being tracked
        // this is simulating a user rating on the second result of the query with three stars
        signal = Signal.getSignal(response, 2, Signal.RATING, 3);
        signalTrackingApi.addSignal(signal);
      }
    }
REST API
To add signals using REST requests, you need to have an Attivio node running. The root of all of Attivio's REST API endpoints is http://<host>:<port>/rest.
To add a signal you need to make an HTTP POST request to the following endpoint
http://<host>:<port>/rest/signals/add
with a JSON object with the following parameters:
docId - the ID of the document for which the signal is being added
principal - the Attivio principal, formatted as <realmId>:<principalId>:<principalName>
docOrdinal - the ordinal of the document in the search result
featureVector - the feature vector associated with the document
query - the query string that returned the document
locale - the locale of the query
relevancyModelName - the name of the relevancy model associated with the query
relevancyModelVersion - the version of the relevancy model associated with the query
relevancyModelNames - the names of all relevancy models associated with the query request
queryTimestamp - the time the query was entered
signalType - the type of the signal
signalStrength - the strength of the signal
Below is a sample curl request to add a signal to Attivio:
    curl -X POST \
      http://localhost:17000/rest/signals/add \
      -H 'cache-control: no-cache' \
      -H 'content-type: application/json' \
      -d '{
        "principal": "default:user1:aieadmin",
        "docId": "doc1",
        "docOrdinal": 1,
        "featureVector": "score=0.16604789,freshness=0.16604789,phrase_title=0.0,phrase_anchortext=0.0,title=0.0,anchortext=0.0",
        "query": "*",
        "locale": "en",
        "relevancyModelName": "default",
        "relevancyModelVersion": 0,
        "relevancyModelNames": [ "default", "noop" ],
        "queryTimestamp": 1491847672751,
        "signalTimestamp": 1491847672879,
        "type": "click",
        "weight": 1
      }'
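The same request can be sketched in Java with the JDK's built-in HTTP client (Java 11 or later). This is a minimal sketch, not an Attivio-provided client: the class and method names are hypothetical, and the host, port, field names, and values are copied from the curl sample above, so adjust them for your project.

```java
import java.net.URI;
import java.net.http.HttpClient;
import java.net.http.HttpRequest;
import java.net.http.HttpResponse;

/** Sketch of posting a signal with the JDK HTTP client; all values are sample data. */
final class AddSignalRequestExample {

    /** Builds (but does not send) a POST request for the signals/add endpoint. */
    static HttpRequest buildAddSignalRequest(String baseUrl, String signalJson) {
        return HttpRequest.newBuilder(URI.create(baseUrl + "/rest/signals/add"))
                .header("content-type", "application/json")
                .POST(HttpRequest.BodyPublishers.ofString(signalJson))
                .build();
    }

    public static void main(String[] args) throws Exception {
        // Field names and values below mirror the curl sample.
        String signalJson = String.join("",
                "{",
                "\"principal\": \"default:user1:aieadmin\",",
                "\"docId\": \"doc1\",",
                "\"docOrdinal\": 1,",
                "\"featureVector\": \"score=0.16604789,freshness=0.16604789,phrase_title=0.0,",
                "phrase_anchortext=0.0,title=0.0,anchortext=0.0\",",
                "\"query\": \"*\",",
                "\"locale\": \"en\",",
                "\"relevancyModelName\": \"default\",",
                "\"relevancyModelVersion\": 0,",
                "\"relevancyModelNames\": [\"default\", \"noop\"],",
                "\"queryTimestamp\": 1491847672751,",
                "\"signalTimestamp\": 1491847672879,",
                "\"type\": \"click\",",
                "\"weight\": 1",
                "}");

        // Pass the node's base URL (for example http://localhost:17000) as the first
        // argument to actually send the request; without it the request is only printed.
        String baseUrl = args.length > 0 ? args[0] : "http://localhost:17000";
        HttpRequest request = buildAddSignalRequest(baseUrl, signalJson);
        System.out.println(request.method() + " " + request.uri());

        if (args.length > 0) {
            HttpResponse<String> response = HttpClient.newHttpClient()
                    .send(request, HttpResponse.BodyHandlers.ofString());
            System.out.println("HTTP status: " + response.statusCode());
        }
    }
}
```

Building the request separately from sending it keeps the example usable without a running node.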
Other endpoints include:
GET http://<host>:<port>/rest/signals/relevancyModels
This requires no input and will return a JSON list of all relevancy model names associated with all signals being tracked.
GET http://<host>:<port>/rest/signals/signalTypes
This requires no input and will return a JSON list of all unique signal types associated with all signals being tracked.
GET http://<host>:<port>/rest/signals/modelHistogram
This will return a JSON object that maps each document ordinal to the number of signals associated with that ordinal for a relevancy model. The model is specified with the required URL parameters modelName and modelVersion.
GET http://<host>:<port>/rest/signals
This will return a JSON list of all signals being tracked. Signals will be modeled as JSON objects that are in the same form as those in the payload of the signals/add endpoint above. Signals can be filtered using the following optional URL parameters:
modelName - relevancy model name
startTime - the earliest time at which the query was made for the signal
endTime - the latest time at which the query was made for the signal
signalType - type of the signal; multiple signalType fields may be entered to filter on more than one signal type
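As a rough illustration of combining these filters, the sketch below assembles the query string for the signals endpoint, repeating signalType once per type. The class and method names are hypothetical, and it assumes startTime and endTime are epoch-millisecond values like queryTimestamp in the curl sample, which this document does not state explicitly.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.ArrayList;
import java.util.List;

/** Sketch of building a filtered GET URL for the signals endpoint. */
final class SignalQueryUrlExample {

    /** Any filter may be null (or an empty list) to leave it out of the URL. */
    static String buildSignalsUrl(String baseUrl, String modelName,
                                  Long startTime, Long endTime, List<String> signalTypes) {
        List<String> params = new ArrayList<>();
        if (modelName != null) {
            params.add("modelName=" + encode(modelName));
        }
        // Assumption: times are epoch milliseconds, as in the queryTimestamp sample value.
        if (startTime != null) {
            params.add("startTime=" + startTime);
        }
        if (endTime != null) {
            params.add("endTime=" + endTime);
        }
        // Repeat the signalType parameter to filter on more than one type.
        if (signalTypes != null) {
            for (String type : signalTypes) {
                params.add("signalType=" + encode(type));
            }
        }
        String url = baseUrl + "/rest/signals";
        return params.isEmpty() ? url : url + "?" + String.join("&", params);
    }

    private static String encode(String value) {
        return URLEncoder.encode(value, StandardCharsets.UTF_8);
    }

    public static void main(String[] args) {
        System.out.println(buildSignalsUrl("http://localhost:17000", "default",
                1491847672000L, null, List.of("click", "rating")));
    }
}
```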
Command Line Tool
The uploadSignals command line tool can be used to bulk upload signal data. This tool is especially useful for uploading signal data for "golden set" queries indicating a supervised set of ranked documents for given queries. This tool will automatically resolve feature vectors for specified documents according to the given query. Any external signal data that does not include feature vectors can be uploaded using this tool.
Project Properties
The following project properties are available to configure the signal tracking API. To change from the defaults, put the properties in the attivio.core-app.properties file located in <install_dir>/conf/properties/core-app.
name | description | default |
---|---|---|
signal.flush.timeout | The delay in milliseconds between refreshes of the signal store for non-UI-originated access requests to the store. There should be no need to change this unless the relevancy models for the project need to be trained more frequently than every 3 hours | 3600000 |
signal.flush.ui.timeout | The delay in milliseconds between refreshes of the signal store for UI-originated access requests to the store. Currently the only UI-originated request to which this applies is the getModelHistogram method | 60000 |
signal.flush.delay | The delay in seconds after starting the application that the signal information will be automatically centralized | 86400 |
signal.flush.period | The amount of time in seconds between automatic centralization of signal data after the initial delay (configured with signal.flush.delay) | 86400 |
signal.flush.batch.size | The number of signals to have in a batch while we sort/centralize the data. If there is enough memory available in the application, increasing this value may slightly improve performance of signal centralization/access | 10000 |
signal.purge.period | The amount of time in milliseconds from being created that a signal will be stored in Attivio. Once signals are older than this they will be deleted. | 3888000000 |
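Because the properties above mix milliseconds and seconds, it can help to convert the defaults into explicit durations before changing them. The sketch below does that with java.time.Duration. The class and method names are hypothetical, and the values and units are taken directly from the table.

```java
import java.time.Duration;
import java.util.LinkedHashMap;
import java.util.Map;

/** Converts the documented defaults to Duration values so their units are explicit. */
final class SignalPropertyDefaultsExample {

    /** Returns the time-based defaults from the table, keyed by property name. */
    static Map<String, Duration> timeDefaults() {
        Map<String, Duration> defaults = new LinkedHashMap<>();
        // These properties are documented in milliseconds.
        defaults.put("signal.flush.timeout", Duration.ofMillis(3_600_000L));
        defaults.put("signal.flush.ui.timeout", Duration.ofMillis(60_000L));
        defaults.put("signal.purge.period", Duration.ofMillis(3_888_000_000L));
        // These properties are documented in seconds.
        defaults.put("signal.flush.delay", Duration.ofSeconds(86_400L));
        defaults.put("signal.flush.period", Duration.ofSeconds(86_400L));
        return defaults;
    }

    public static void main(String[] args) {
        // Print every default in the same unit for easy comparison.
        timeDefaults().forEach((name, value) ->
                System.out.println(name + " = " + value.toMinutes() + " minutes"));
    }
}
```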