Required Modules
These features require the inclusion of the searchanalytics module when you run createproject to create the project directories.
Overview
The Search Analytics dashboard is a tool aimed at application administrators to help them better understand their end users by analyzing the searches those users perform and how they respond to the documents those searches return.
Search Analytics combines data from the search history on the Attivio index with signals data submitted by applications that display the results of those searches. It can then correlate these two sets of data to provide useful insights into your users' behavior and your system's performance, and to provide guidance on how to improve the quality of users' search experience.
Tracking Signals
Part of the functionality provided by the Search Analytics dashboard includes analyzing the behavior of your users when they click on the documents in the search results. The dashboard uses this information to determine which documents are being chosen, what their position in the search results was, etc., so you can evaluate things such as how effectively your machine-learning relevancy models are performing. Signals are required to provide the Top Clicks and Average Click Position panels on the dashboard and also to enable exporting raw signal data in CSV format.
For Search Analytics to analyze signals, they must be sent into the Attivio Platform. The application your users perform searches with must be integrated with Attivio's Signal Tracking API. (If you are using Attivio's Search UI application, or a custom application built using the Attivio SUIT library, user clicks should already be tracked; if you are using your own custom application, you will need to add signal tracking to it explicitly.) You can learn more about signals in particular, and about using Attivio's Machine Learning Relevancy in general, by following the Machine Learning Relevancy Quick Start.
Filtering the Dashboard's Data
The Search Analytics dashboard lets you decide which data you want to see and filter the results that it shows you accordingly. You can specify a time range, a time resolution, and which Search Profiles and Relevancy Models you would like to see data for. (The discussions of the individual panels on the dashboard, below, describe how the filters affect them.)
Filtering by Time Ranges
You can specify time ranges by choosing an interval from the control at the top of the page. If you choose one of the pre-defined intervals, the Search Analytics dashboard will show you data for that range of time, up to now. For example, choosing the option "30d" will show the previous 30 days' worth of data, ending with today's. The pre-defined intervals are for the previous 12 hours, the previous day, the previous week, the previous month (30 days), the previous 90 days, the previous 6 months (180 days), and the previous year. In addition to these, you can choose a custom range by clicking the "Custom…" option and selecting starting and ending dates in the pop-up that appears. Finally, you can narrow down the range currently being displayed, essentially zooming in on part of it, by dragging within the timeline chart at the top of the page.
Note that the time range that is in effect is listed explicitly at the top of the page, above the Search Timeline panel.
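As a rough illustration of how a pre-defined interval maps onto a concrete time window, the following Python sketch subtracts the interval's length from the current time and keeps only the timestamps inside the result. Only the "30d" label comes from the dashboard; the other labels in `INTERVALS` and the helpers `time_range` and `in_range` are hypothetical, and the sketch assumes the window simply ends at the current moment (the dashboard may align ranges to day boundaries differently).

```python
from datetime import datetime, timedelta

# Hypothetical labels for the pre-defined intervals. Only "30d" is named in
# the dashboard documentation; the others are illustrative stand-ins.
INTERVALS = {
    "12h": timedelta(hours=12),
    "1d": timedelta(days=1),
    "7d": timedelta(days=7),
    "30d": timedelta(days=30),
    "90d": timedelta(days=90),
    "180d": timedelta(days=180),
    "1y": timedelta(days=365),
}


def time_range(label, now):
    """Return (start, end) for a pre-defined interval that ends at `now`."""
    return now - INTERVALS[label], now


def in_range(timestamps, start, end):
    """Keep only the timestamps that fall within [start, end]."""
    return [ts for ts in timestamps if start <= ts <= end]


if __name__ == "__main__":
    # Print the window the "30d" option would cover right now.
    print(time_range("30d", datetime.now()))
```

For instance, `time_range("30d", datetime(2024, 3, 31, 12, 0))` gives a window starting at `datetime(2024, 3, 1, 12, 0)`.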
Specifying Time Resolutions
When showing certain charts, the Search Analytics dashboard aggregates the data into "buckets" representing subsets of the full time range in order to provide cumulative counts or averages over those subsets. You can specify the size of these subsets using the resolution menu, labeled By. The options in the menu vary based on the length of the selected time range. For example, if you are viewing the previous 30 days, you can choose Day, Week, or Month.
If you are looking at the previous 7 days' data and you choose the Day resolution, for example, the values in the timeline at the top of the dashboard will show the total number of queries made on each day (the blue bars) and the average response time over all queries executed on a given day (the points on the yellow line). If you change the resolution to Hour, each blue bar from the previous chart would be replaced by 24 separate bars, the totals of which would add up to the previous bar's value. In the case of the average response time, because it is an average value, the points will represent the average of response times within each hour-long range instead of a single day-long range.
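The difference between summed counts and recomputed averages can be seen in a small Python sketch. It is not the dashboard's implementation: `timeline` and `bucket_key` are hypothetical helpers, only the hour and day resolutions are handled, and each query is assumed to be a `(timestamp, response_ms)` pair.

```python
from collections import defaultdict
from datetime import datetime


def bucket_key(ts, resolution):
    """Truncate a timestamp to the start of the bucket it falls in."""
    if resolution == "hour":
        return ts.replace(minute=0, second=0, microsecond=0)
    if resolution == "day":
        return ts.replace(hour=0, minute=0, second=0, microsecond=0)
    raise ValueError(f"unsupported resolution: {resolution}")


def timeline(queries, resolution):
    """Aggregate (timestamp, response_ms) pairs into per-bucket statistics.

    Returns {bucket_start: (query_count, average_response_ms)}, ordered by time.
    Counts are summed per bucket; averages are recomputed from each bucket's
    own queries rather than derived from other buckets.
    """
    counts = defaultdict(int)
    totals = defaultdict(float)
    for ts, response_ms in queries:
        key = bucket_key(ts, resolution)
        counts[key] += 1
        totals[key] += response_ms
    return {key: (counts[key], totals[key] / counts[key]) for key in sorted(counts)}


if __name__ == "__main__":
    sample = [(datetime(2024, 1, 1, 9, 15), 100.0), (datetime(2024, 1, 1, 14, 5), 600.0)]
    print(timeline(sample, "day"))
```

For example, given queries at 9:15 (100 ms), 9:45 (200 ms), and 14:05 (600 ms) on the same day, the day resolution yields one bucket of 3 queries averaging 300 ms, while the hour resolution yields a bucket of 2 queries at 150 ms and a bucket of 1 query at 600 ms. The counts add up to 3, but the daily average is not the mean of the two hourly averages (375 ms), because each bucket's average is computed from its own queries.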
Filtering on Search Profiles
You may have multiple search profiles defined in your system that are used when different groups of users make queries or that are used by different applications that you've built. You can narrow down the set of data shown in the dashboard to just those searches involving particular search profiles by choosing them from the Search Profile menu at the top of the page.
By default, queries and signals made with all search profiles are shown—the results are unfiltered. If you choose one or more profiles from the list, only queries made using those profiles will be visible in the data. Click Select All to return to the unfiltered state. Note that there is a special option at the bottom of the list for queries made with no search profile specified, which may be the case for some of your queries and signals.
Filtering on Relevancy Models
You may have multiple machine-learning relevancy models defined in your system that are used when different groups of users make queries or that are used by different applications that you've built. You can narrow down the set of data shown in the dashboard to just those searches involving particular relevancy models by choosing them from the Relevancy Model menu at the top of the page.
By default, queries and signals made with all relevancy models are shown—the results are unfiltered. If you choose one or more models from the list, only queries made using those models will be visible in the data. Click Select All to return to the unfiltered state. Note that there is a special option at the bottom of the list for queries made with no relevancy model specified, which may be the case for some of your queries and signals.
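The Search Profile and Relevancy Model menus work the same way, so one hypothetical helper can model both. This Python sketch assumes rows shaped like those in the raw query export (using its `searchProfile` and `appliedRelevancyModel` column names), treats an empty selection as "Select All", uses `None` for the special "none specified" option, and assumes the two menus combine by applying one filter after the other. It illustrates the filtering behavior and is not the dashboard's code.

```python
def apply_menu_filter(rows, field, selected):
    """Keep rows whose `field` value is in `selected`.

    An empty `selected` set models "Select All" (no filtering). Put None in
    `selected` to keep rows where the field is missing or blank, mirroring
    the special "none specified" option at the bottom of each menu.
    """
    if not selected:
        return list(rows)
    # Blank strings (as a CSV export would produce) are normalized to None.
    return [row for row in rows if (row.get(field) or None) in selected]


if __name__ == "__main__":
    rows = [
        {"query": "a", "searchProfile": "support", "appliedRelevancyModel": "m1"},
        {"query": "b", "searchProfile": "", "appliedRelevancyModel": "m2"},
    ]
    by_profile = apply_menu_filter(rows, "searchProfile", {"support", None})
    print(apply_menu_filter(by_profile, "appliedRelevancyModel", {"m2"}))
```

Chaining the two calls keeps only the rows that satisfy both menus.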
Query Count
In the top-left corner of the dashboard is the total number of queries. This number represents the number of queries that meet the filter criteria; in other words, the pool of queries being analyzed to provide the data on the page.
All of the filtering options except resolution affect the query count.
Search Timeline
At the top of the dashboard is a chart showing overall search performance over time. It actually consists of two charts overlaid on top of one another: a bar chart showing the query volume and a line chart showing the average response time for those queries. If you hover over any point on the chart, you will see the specifics about the data at that point in time.
All of the filtering options affect the search timeline.
Query Volume
The blue bar chart shows how many queries were made in each subset of the time range displayed. For example, when showing the previous 30 days' worth of data by day, you will see the number of queries made today, those made yesterday, those the day before yesterday, etc., going back 30 days. Changing the resolution will change the size of these subsets so you could, for example, set the resolution to "week" and see how many queries were made in each of the previous 4 or 5 weeks. The left side shows the scale for these bars, in absolute numbers of queries.
You can use this information to see how much, and when, users are querying your system. This might help you plan upgrades to your system's processing power or correlate surges in volume with external triggers such as your company's press releases or world events.
Average Response Time
The yellow line chart shows how quickly users' queries were answered. The values it displays are in milliseconds, according to the scale on the right side of the chart; lower numbers are, of course, better. It is normal for the bar chart and the line chart to have similar shapes, since handling more queries simultaneously can affect your system's performance.
Queries With No Results
The Queries With No Results panel shows just that: queries your users made that returned zero results. It lists the query they entered and the number of times that query was made. They are sorted so the most frequently made queries are at the top of the list.
All of the filtering options except resolution affect the Queries With No Results panel.
Queries vs. Terms
Since knowing the terms that were in the query can sometimes be helpful, you can toggle between showing the actual query and the terms that the Attivio engine extracted from it when it was processed. For example, a query for "water pollution" could be reduced to the two terms "water" and "pollution." This shows you not only how the query combining the terms performed but also how the individual terms did: if queries for "air pollution" and "noise pollution" were also made and also returned no results, showing terms instead of queries will group all of these together.
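To see how the two views differ, the following Python sketch tallies zero-result queries either by query text or by term, working from rows shaped like the raw query export (its `query`, `terms`, and `returnedDocs` columns). It assumes the `terms` value is the comma-separated list of engine-extracted terms that the export documents; the helper name `no_result_counts` is hypothetical, and the dashboard's own grouping may differ in detail.

```python
from collections import Counter


def no_result_counts(rows, by_terms=False):
    """Count zero-result queries by query text, or by term if `by_terms`.

    Returns (value, count) pairs, most frequent first.
    """
    counts = Counter()
    for row in rows:
        if int(row["returnedDocs"]) != 0:
            continue  # only queries that found nothing are of interest here
        if by_terms:
            # Each extracted term of the query is counted once.
            counts.update(t.strip() for t in row["terms"].split(",") if t.strip())
        else:
            counts[row["query"]] += 1
    return counts.most_common()


if __name__ == "__main__":
    rows = [
        {"query": "water pollution", "terms": "water,pollution", "returnedDocs": "0"},
        {"query": "air pollution", "terms": "air,pollution", "returnedDocs": "0"},
    ]
    print(no_result_counts(rows, by_terms=True))
```

With "water pollution", "air pollution", and "noise pollution" each failing once, the query view shows three entries with a count of 1, while the term view puts "pollution" at the top with a count of 3.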
Expanding the Results
Clicking the More… link at the bottom of the panel will expand it to take up the entire dashboard below the timeline. In this state, you can see beyond the first 10 items shown in the collapsed state. At the bottom of the expanded panel is a control that lets you page through the data. In addition to the list of queries on the left-hand side of the expanded panel, the right-hand side of the panel contains more detail about each query. Clicking a query selects it and shows its details: the users who made this query and other searches related to it.
Clicking the Close button in the top-right corner of the expanded panel will shrink it down so the other panels are visible again. As a shortcut, you can click an individual query in the collapsed panel to both expand the panel and see that query's details.
Top Searches
The Top Searches panel shows those queries most commonly executed by your users. It lists the query they entered and the number of times that query was made. It also lists the average response time for that particular query over all of these executions. They are sorted so the most frequently made queries are at the top of the list.
All of the filtering options except resolution affect the Top Searches panel.
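The Top Searches figures (execution count plus average response time per query) can be approximated from raw query rows. This Python sketch is illustrative only: `top_searches` is a hypothetical helper, it reads the export's `query` and `responseTime` columns, and its default `limit` of 10 mirrors the number of items shown in the collapsed panel.

```python
from collections import defaultdict


def top_searches(rows, limit=10):
    """Return (query, count, avg_response_ms) tuples, most frequent first."""
    counts = defaultdict(int)
    total_ms = defaultdict(float)
    for row in rows:
        query = row["query"]
        counts[query] += 1
        total_ms[query] += float(row["responseTime"])
    ranked = sorted(counts, key=lambda q: counts[q], reverse=True)
    # The average covers every execution of that exact query string.
    return [(q, counts[q], total_ms[q] / counts[q]) for q in ranked[:limit]]


if __name__ == "__main__":
    rows = [{"query": "pollution", "responseTime": "100"}, {"query": "water", "responseTime": "50"}]
    print(top_searches(rows))
```

For example, three executions of "pollution" taking 100, 200, and 300 ms rank above a single "water" query and report an average of 200 ms.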
Queries vs. Terms
Since sometimes knowing the terms that were in the query can be helpful, you can toggle between showing the actual query and the terms that the Attivio engine extracted from it when it was processed. For example, if a particular query was for "water pollution," it could be reduced to the two terms "water" and "pollution." This can help you understand not only how the query combining the terms has performed but also how the individual terms have, so if queries for "air pollution" and "noise pollution" were also made, you could see that a lot of users were looking for "pollution" as they'd all be grouped together.
Expanding the Results
Clicking the More… link at the bottom of the panel will expand it to take up the entire dashboard below the timeline. In this state, you can see beyond the first 10 items shown in the collapsed state. At the bottom of the expanded panel is a control that lets you page through the data. In addition to the list of queries on the left-hand side of the expanded panel, the right-hand side of the panel contains more detail about each query. Clicking a query selects it and shows its details: the average number of documents the query returned, documents related to this query, similar queries (i.e., users who made this query also made these other queries), the users who made the query, and the history of the average response time for this particular query.
Clicking the Close button in the top-right corner of the expanded panel will shrink it down so the other panels are visible again. As a shortcut, you can click an individual query in the collapsed panel to both expand the panel and see that query's details.
Top Users
The Top Users panel shows which users executed the most searches. It lists each user's name and the number of queries they made. They are sorted so the most active users are at the top of the list.
All of the filtering options except resolution affect the Top Users panel.
Expanding the Results
Clicking the More… link at the bottom of the panel will expand it to take up the entire dashboard below the timeline. In this state, you can see beyond the first 10 items shown in the collapsed state. (Note that the link is only available if more than 10 users have made queries in your system and there is additional data to show.) At the bottom of the expanded panel is a control that lets you page through the data.
Clicking the Close button in the top-right corner of the expanded panel will shrink it down so the other panels are visible again.
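The Top Users counts, together with the rule that the More… link only appears when more than 10 users have made queries, can be sketched in Python. The helper `top_users` is hypothetical; it reads the raw query export's `principalDomain`, `principalID`, and `principalName` columns, and keying on domain plus ID is an assumption made so that two users who share a display name are not merged.

```python
from collections import Counter


def top_users(rows, limit=10):
    """Return the `limit` most active users and whether a More... link applies."""
    counts = Counter()
    names = {}
    for row in rows:
        # Key on domain + ID (an assumption) rather than on the display name.
        key = (row["principalDomain"], row["principalID"])
        counts[key] += 1
        names[key] = row["principalName"]
    ranked = [(names[key], n) for key, n in counts.most_common(limit)]
    # More data exists only if there are more distinct users than are shown.
    return ranked, len(counts) > limit


if __name__ == "__main__":
    rows = [{"principalDomain": "corp", "principalID": "u1", "principalName": "alice"}]
    print(top_users(rows))
```

A result of `([("alice", 2), ("bob", 1)], False)` would mean the collapsed panel already shows every user, so no More… link is needed.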
Top Clicks
The Top Clicks panel shows which documents users clicked on most frequently. It lists the document's title and the number of times users clicked it. They are sorted so the most-clicked documents are at the top of the list.
All of the filtering options except resolution affect the Top Clicks panel.
Your system must have signal data in order for the Top Clicks panel to show anything.
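A minimal Python sketch of the Top Clicks tally, assuming signal rows shaped like the raw signal export (its `type` and `docID` columns). The panel shows document titles, but the export carries only document IDs, so this hypothetical `top_clicks` helper counts by ID; resolving titles would require a separate lookup that is not shown. Signals other than "click" (such as "like") are ignored, as on the dashboard.

```python
from collections import Counter


def top_clicks(signals, limit=10):
    """Count "click" signals per document ID, most-clicked first."""
    clicks = Counter(s["docID"] for s in signals if s["type"] == "click")
    return clicks.most_common(limit)


if __name__ == "__main__":
    signals = [{"type": "click", "docID": "d1"}, {"type": "like", "docID": "d1"}]
    print(top_clicks(signals))
```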
Expanding the Results
Clicking the More… link at the bottom of the panel will expand it to take up the entire dashboard below the timeline. In this state, you can see beyond the first 10 items shown in the collapsed state. At the bottom of the expanded panel is a control that lets you page through the data. In addition to the list of documents on the left-hand side of the expanded panel, the right-hand side of the panel contains more detail about each document. Clicking a document selects it and shows its details: the clicks it received over time, its average position in the search results when clicked (again, over time), what queries were made to return this document, and which users clicked on it.
Clicking the Close button in the top-right corner of the expanded panel will shrink it down so the other panels are visible again. As a shortcut, you can click an individual document in the collapsed panel to both expand the panel and see that document's details.
Average Click Position
This panel shows the overall average position of documents the users clicked on, over time. You can use this to evaluate the effectiveness of your machine-learning relevancy models over time. Note that lower values for click position are better since they indicate that users are finding the documents they're looking for at the top of the results being returned and that your machine-learning relevancy models are effective.
All of the filtering options affect the Average Click Position panel.
Your system must have signal data in order for the Average Click Position panel to show anything.
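The average click position per day can be approximated from raw signal rows. In this Python sketch, the helper name is hypothetical, timestamps are assumed to be ISO 8601 strings that `datetime.fromisoformat` can parse, and clicks are bucketed by `signalTimestamp` (the dashboard might use `queryTimestamp` instead). Whether `docOrdinal` starts at 0 or 1 does not change the comparison between days.

```python
from collections import defaultdict
from datetime import datetime


def avg_click_position_by_day(signals):
    """Average docOrdinal of "click" signals, per calendar day of the click."""
    positions = defaultdict(list)
    for s in signals:
        if s["type"] != "click":
            continue  # other signal types have no click position
        day = datetime.fromisoformat(s["signalTimestamp"]).date()
        positions[day].append(int(s["docOrdinal"]))
    return {day: sum(p) / len(p) for day, p in sorted(positions.items())}


if __name__ == "__main__":
    signals = [{"type": "click", "signalTimestamp": "2024-01-01T09:16:00", "docOrdinal": "3"}]
    print(avg_click_position_by_day(signals))
```

A day whose average drops (for example, from 2.0 to 1.0) suggests users found what they clicked closer to the top of the results.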
Exporting Data
You can export data about your system from the Search Analytics dashboard using the commands in the Export menu at the top of the page. Search Analytics lets you export its analyzed data in Microsoft Excel format. It also lets you download the raw data in your system in CSV format, for both queries made and signals added.
Exporting Data to Excel
To export data in Microsoft Excel format, choose Microsoft Excel from the Export menu. Search Analytics will generate and download the document to the standard download directory on your computer (as configured in your web browser).
All of the filtering options affect the Excel export.
The exported file is an Excel workbook containing multiple individual worksheets, as follows:
Overview
This worksheet contains general information about the system: how many queries are in the data set, what the filtering criteria are (time range, resolution, search profiles, and relevancy models), the user who exported the file, and the time it was done. There is also a link to the server which will show the dashboard with the same filtering criteria as were used to produce the file.
Query Volume
This worksheet is analogous to the Search Timeline panel on the dashboard page. It contains the following data, grouped according to the chosen resolution: the number of queries made, the average response time for these queries, the average number of documents returned by those queries, and the average position of documents users clicked.
Top Searches
This worksheet is analogous to the Top Searches panel on the dashboard page. It lists the most-executed queries along with the number of times each was executed and the average response time for each query.
Top Queries with No Results
This worksheet is analogous to the Queries With No Results panel on the dashboard page. It lists the most-executed queries that returned zero documents, along with the number of times each was run.
Top Clicks
This worksheet is analogous to the Top Clicks panel on the dashboard page. It lists the most-clicked documents in the system, both their ID and title, as well as the number of clicks each one got. Documents which have an associated URI are shown as hyperlinks—clicking one will open its URI.
Top Users
This worksheet is analogous to the Top Users panel on the dashboard page. It contains details about the most prolific users in your system and how many searches they have performed.
Note that the data on any given worksheet in the Excel file is limited to the top 5,000 data points; if you need to see data beyond this, you can use either or both of the raw export options, described below.
Exporting Raw Query Data
To get all of the raw query data from your system, choose Raw Query Data (CSV) from the Export menu. This is all of the data that Search Analytics keeps about your users' queries. This command produces a file containing comma-separated values with each row representing a single query. (Note that the queries in the system will be those that were allowed by the filtering rules configured for your system; see Search Analytics Architecture and Configuration for details.)
The document's first row contains the names of the columns, as follows:
- serializedQuery is the full query as parsed by the Attivio Platform, which may have been modified by the system (via query workflows, for example)
- query is the original, unmodified query
- queryLanguage is the language used to parse the query, if any ("simple" or "advanced")
- date is the time the query was executed
- terms is a comma-separated list of the terms that the query contained
- searchProfile is the search profile applied to the query, if any
- principalDomain is the domain of the user who performed the query
- principalID is the ID of the user who performed the query
- principalName is the name of the user who performed the query
- responseTime is the time (in milliseconds) that it took to execute the query
- returnedDocs is the number of documents found by the query
- appliedRelevancyModel is the machine-learning relevancy model that was used to rank the query's results, if any
This CSV file contains all of the queries made during the time range specified on the dashboard page. The resolution, search profile, and relevancy model filters do not affect this export.
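If you want to analyze the raw query export yourself, Python's standard `csv` module is enough to load it. This sketch checks that the documented column names are present, converts `responseTime` and `returnedDocs` to numbers, and splits `terms` into a list. The helper `load_query_rows` is hypothetical, and it assumes the file's `terms` field is quoted whenever it contains commas, as standard CSV requires.

```python
import csv
import io

# Column names as documented for the Raw Query Data (CSV) export.
QUERY_COLUMNS = [
    "serializedQuery", "query", "queryLanguage", "date", "terms",
    "searchProfile", "principalDomain", "principalID", "principalName",
    "responseTime", "returnedDocs", "appliedRelevancyModel",
]


def load_query_rows(text):
    """Parse raw query CSV text into dicts, checking the documented header."""
    reader = csv.DictReader(io.StringIO(text))
    missing = set(QUERY_COLUMNS) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"missing columns: {sorted(missing)}")
    rows = []
    for row in reader:
        row["responseTime"] = float(row["responseTime"])  # milliseconds
        row["returnedDocs"] = int(row["returnedDocs"])
        # Split the comma-separated term list, dropping empty entries.
        row["terms"] = [t.strip() for t in row["terms"].split(",") if t.strip()]
        rows.append(row)
    return rows
```

To read an exported file, pass it the file's contents, for example `load_query_rows(open(path, newline="", encoding="utf-8").read())`; the UTF-8 encoding is an assumption about the export. Optional columns such as `searchProfile` and `appliedRelevancyModel` come through as empty strings when no value was recorded.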
Exporting Raw Signal Data
To get all of the raw signal data from your system, choose Raw Signal Data (CSV) from the Export menu. This is all of the data that Search Analytics keeps about the signals sent to your system to track your users' clicks and other actions. Note that whereas the Search Analytics dashboard only provides analyses of "click" signals, this export may contain other types of signals added by your search application (for example, the Attivio Search UI application will add a "like" signal when users rate a document with 1-5 stars). This command produces a file containing comma-separated values with each row representing a single signal. The document's first row contains the names of the columns, as follows:
- type is the type of signal (the Search Analytics dashboard only considers "click" signals but you may want to perform your own analysis on other signal types)
- query is the query that was executed to provide the result set containing the document the signal applies to
- docID is the ID of the document the signal applies to
- principalDomain is the domain of the user who performed the query
- principalID is the ID of the user who performed the query
- principalName is the name of the user who performed the query
- signalTimestamp is the time the signal was added (for example, the time the user clicked the document)
- queryTimestamp is the time the query was executed
- docOrdinal is the document's position within the search results (that is, the "click position" in the case of click-type signals)
- relevancyModelName is the name of the machine-learning relevancy model that was used to rank the search results
- relevancyModelVersion is the version of the machine-learning relevancy model used
- locale is the locale that was used for the query
- labels are any additional data provided by the application submitting the signal
This CSV file contains all of the signals added during the time range specified on the dashboard page. The resolution, search profile, and relevancy model filters do not affect this export.
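One analysis the raw signal export makes possible is comparing click positions across relevancy models. This Python sketch, with the hypothetical helper `click_position_by_model`, averages `docOrdinal` for "click" signals grouped by `relevancyModelName` and `relevancyModelVersion`, mapping blank model fields to `None` so that clicks on results ranked without a model form their own group. As with the Average Click Position panel, a lower average suggests that model placed the clicked documents nearer the top.

```python
import csv
import io
from collections import defaultdict


def click_position_by_model(text):
    """Average docOrdinal of "click" signals per (model name, model version)."""
    positions = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        if row["type"] != "click":
            continue  # e.g. "like" signals carry no click position
        key = (row["relevancyModelName"] or None, row["relevancyModelVersion"] or None)
        positions[key].append(int(row["docOrdinal"]))
    return {key: sum(p) / len(p) for key, p in positions.items()}
```

Model versions are kept as strings, exactly as they appear in the file.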