Lucene FieldCache and the title field

andreas1 · April 3, 2018, 5:37am

Dear community,

I need to get the title of a page in a BoostingStrategy. I'm able to get it by accessing the reader directly. However, when I read the title the recommended way, via the FieldCache, I get back just a single word. That word is sometimes related to the title, but it is not the title itself. What am I missing?

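// subReader is the segment's AtomicReader; doc is a top-level doc ID,
// so it is made segment-local by subtracting the segment's docBase
BytesRef titleBuffer = new BytesRef();
FieldCache.DEFAULT.getTerms(subReader, "title").get(doc - subReaderContext.docBase, titleBuffer);
String title = titleBuffer.utf8ToString();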

Thank you.

Andreas
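The question's snippet converts a top-level document number into a segment-local one by subtracting subReaderContext.docBase. The following stdlib-only Java sketch illustrates only that arithmetic. It assumes a hypothetical index of three segments with made-up sizes. Each segment's docBase is the total size of the segments before it. A binary search picks the last segment whose docBase is not greater than the doc ID, and the local ID is the difference. This is similar in spirit to Lucene's ReaderUtil.subIndex, but it is not that code. DocBaseDemo and its methods are illustrative names, not Lucene API.

```java
import java.util.Arrays;

public class DocBaseDemo {
    // Hypothetical segment sizes (maxDoc per segment), not taken from a real index.
    // Assumes no empty segments, otherwise two segments would share a docBase.
    static final int[] SEGMENT_SIZES = {100, 50, 200};

    // docBase of each segment = sum of the sizes of all earlier segments
    static int[] docBases(int[] sizes) {
        int[] bases = new int[sizes.length];
        int running = 0;
        for (int i = 0; i < sizes.length; i++) {
            bases[i] = running;
            running += sizes[i];
        }
        return bases;
    }

    // Binary search for the last segment whose docBase <= n
    static int segmentOf(int n, int[] bases) {
        int lo = 0, hi = bases.length - 1;
        while (lo < hi) {
            int mid = (lo + hi + 1) >>> 1;
            if (bases[mid] <= n) {
                lo = mid;
            } else {
                hi = mid - 1;
            }
        }
        return lo;
    }

    public static void main(String[] args) {
        int[] bases = docBases(SEGMENT_SIZES);
        System.out.println("docBases = " + Arrays.toString(bases));
        for (int doc : new int[] {0, 99, 100, 149, 150, 349}) {
            int seg = segmentOf(doc, bases);
            int local = doc - bases[seg]; // same idea as doc - subReaderContext.docBase
            System.out.println("top-level doc " + doc + " -> segment " + seg + ", local doc " + local);
        }
    }
}
```

With these sizes the docBases are [0, 100, 150]. Top-level doc 100 therefore falls in segment 1 as local doc 0, and doc 349 falls in segment 2 as local doc 199.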

Panos · April 3, 2018, 10:25pm

Hi, I'm not sure whether this helps with your problem, but I remember having this class somewhere for printing all fields:

import java.io.IOException;
import java.util.Map;

import org.apache.lucene.index.AtomicReader;
import org.apache.lucene.index.Fields;
import org.apache.lucene.index.IndexReader;
import org.apache.lucene.index.MultiFields;
import org.apache.lucene.search.FieldCache;
import org.apache.lucene.util.BytesRef;

import com.atlassian.confluence.search.service.SearchQueryParameters;
import com.atlassian.confluence.search.v2.lucene.boosting.BoostingStrategy;
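
public class BoostBySpaceAndLanguageStrategy implements BoostingStrategy {
	private final FieldCache cache = FieldCache.DEFAULT;

	// Debugging helper: prints every indexed field of the document as seen
	// through the FieldCache, then returns the score unchanged.
	@Override
	public float boost(IndexReader reader, int doc, float score)
			throws IOException {
		// Assumes Confluence passes a per-segment (atomic) reader, so doc is segment-local
		AtomicReader aReader = (AtomicReader) reader;
		BytesRef fieldRef = new BytesRef(); // local buffer, safe if boost() runs concurrently
		try {
			Fields fields = MultiFields.getFields(reader);
			if (fields == null) {
				return score; // the reader has no indexed fields
			}
			for (String field : fields) {
				// The FieldCache uninverts the indexed terms, so a tokenized field shows only one term
				cache.getTermsIndex(aReader, field).get(doc, fieldRef);
				System.out.println(field + "\t" + fieldRef.utf8ToString());
			}
		} catch (Exception e) {
			e.printStackTrace();
		}
		return score;
	}
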
	@Override
	public float boost(IndexReader indexReader,
			SearchQueryParameters searchQueryParameters, int doc, float v)
			throws IOException {
		return boost(indexReader, doc, v);
	}

	@Override
	public float boost(IndexReader arg0, Map<String, Object> arg1, int doc,
			float score) throws IOException {
		return score;
	}
}

I extract the title a bit differently. Could that be the cause?
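The symptom in the original question, a single word that is related to the title, fits how the FieldCache works. The cache builds its per-document values by uninverting the indexed terms, which suits fields with a single untokenized term per document. A tokenized field such as title holds several terms per document, so only one of them can end up in the cache.

The toy model below is stdlib-only Java, not Lucene's implementation. The documents are made up, and the field names are borrowed from this thread. It indexes the same titles once tokenized and once untokenized, then uninverts each field by walking the terms in sorted order. Letting a later term overwrite an earlier one for the same document is this toy's own choice. It is not a claim about which term Lucene keeps.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;
import java.util.TreeSet;

public class UninvertDemo {
    // Toy inverted index for one field: term -> sorted set of doc IDs (postings)
    static TreeMap<String, TreeSet<Integer>> index(List<String> values, boolean tokenize) {
        TreeMap<String, TreeSet<Integer>> postings = new TreeMap<>();
        for (int doc = 0; doc < values.size(); doc++) {
            String value = values.get(doc);
            // A tokenized field is split into lowercase words; an untokenized one is kept whole
            String[] terms = tokenize ? value.toLowerCase().split("\\s+") : new String[] {value};
            for (String term : terms) {
                postings.computeIfAbsent(term, t -> new TreeSet<>()).add(doc);
            }
        }
        return postings;
    }

    // Uninvert: walk the terms in sorted order and keep one term per document.
    // In this toy, a later term simply overwrites an earlier one for the same document.
    static String[] uninvert(TreeMap<String, TreeSet<Integer>> postings, int maxDoc) {
        String[] perDoc = new String[maxDoc];
        for (Map.Entry<String, TreeSet<Integer>> e : postings.entrySet()) {
            for (int doc : e.getValue()) {
                perDoc[doc] = e.getKey();
            }
        }
        return perDoc;
    }

    public static void main(String[] args) {
        List<String> titles = Arrays.asList("Release Notes 2018", "Team Meeting Agenda");
        String[] tokenized = uninvert(index(titles, true), titles.size());
        String[] untokenized = uninvert(index(titles, false), titles.size());
        for (int doc = 0; doc < titles.size(); doc++) {
            System.out.println("title                    -> " + tokenized[doc]);
            System.out.println("content-name-untokenized -> " + untokenized[doc]);
        }
    }
}
```

Running main prints a single word for title ("release", then "team"). For content-name-untokenized it prints the full titles, which matches what Andreas observes.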

andreas1 · April 7, 2018, 6:36am

Dear Panos,

Thank you very much for your help. Your approach shows different field names than going through reader.document(), which is interesting. I was then able to use the field “content-name-untokenized”, which contains the full title.

Kind regards

Andreas