CQL Searching From Container Expansion

jcarter · May 14, 2021, 8:58pm

CQL has a handy macro field that you can use to find all pages with a particular macro based on its key. For example, macro = toc finds all objects with a Table of Contents macro.

There is another plugin (Scroll Versions) which registers a CustomContentEntityObject that I can query for with queries like scrollVersion = "1.2.3".

The CustomContentEntityObjects have a container which refers to an actual Confluence page object. I’d like to be able to query for the CustomContentEntityObjects from Scroll, but query on the connected page objects. I think that something like this should be possible with Expansions. For example:
scrollVersion = "1.2.3" and container.macro = toc should return all the pages which have a Table of Contents macro, provided I pass the right expansions to the CQLSearchService.

So far, I haven’t been able to make that work. The only elegant workaround I can think of would be to add my own custom CQL function to manage the relationship between the two entities. What I’ve actually done (don’t judge me) is do the CQL search and then filter the results by parsing the body content for each object and seeing if the macro in question is in there. It’s ugly, it won’t scale, but it works for my immediate case. I’d love a better idea.

Any better ideas?

dmorrow · May 17, 2021, 11:46pm

Hi @jcarter ,

I just moved this to the Confluence Server category, but I’m not an expert in Confluence Server, so I’ll reach out to the team to see if they can help.

Regards,
Dugald

rlau · May 19, 2021, 7:21am

Hi @jcarter,

Expansions themselves do not affect the results of the CQL query; they are applied after the fact. They enrich the search results by adding or removing information on each result, but they can’t filter out results (which is what you did manually in your workaround). If you have another look at the JavaDoc of Expansion, it should make more sense.

If you also take a look at the documentation for the methods in CQLSearchService, expansions are referred to as “expanding on the result”, implying that it happens after the CQL search has been done.

This is why you are unable to use container.macro = toc in your CQL query: the base CQL handlers have no knowledge of expansion syntax or how to handle it. It is a common misconception, though.
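
To make that ordering concrete, here is a minimal, self-contained Java sketch. It is not the Confluence API: Result, search and expand are hypothetical stand-ins invented for illustration. It only shows that the query decides which results come back, and that expansion decorates those results afterwards without changing which ones are present.

```java
import java.util.List;
import java.util.Map;
import java.util.function.Predicate;
import java.util.stream.Collectors;

// Hypothetical model (not Confluence classes): expansion runs after filtering.
class ExpansionOrderSketch {

    // A search hit; containerTitle stays null until the hit is "expanded".
    static final class Result {
        final String id;
        final String scrollVersion;
        final String containerTitle;

        Result(String id, String scrollVersion, String containerTitle) {
            this.id = id;
            this.scrollVersion = scrollVersion;
            this.containerTitle = containerTitle;
        }
    }

    // Step 1: the query alone decides which results are returned.
    static List<Result> search(List<Result> index, Predicate<Result> query) {
        return index.stream().filter(query).collect(Collectors.toList());
    }

    // Step 2: expansion enriches each hit but never adds or removes hits.
    static List<Result> expand(List<Result> hits, Map<String, String> containerTitles) {
        return hits.stream()
                .map(hit -> new Result(hit.id, hit.scrollVersion, containerTitles.get(hit.id)))
                .collect(Collectors.toList());
    }
}
```

In this model, anything you want to filter on has to be visible to the query in step 1, which is why a condition on the container cannot be smuggled in through step 2.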

We are working on a sample solution to your problem and will follow up with an update. Your idea of a CQL function would be fine, provided you also implement the Extractor2 module to extract, for each Scroll content type, a FieldDescriptor representing the container macro type.

Kind Regards,
Richard

jcarter · May 19, 2021, 2:41pm

Thanks, @rlau! That’s a good shout on the search extractor as well. That may be more efficient than a CQL function in my case.

rlau · June 25, 2021, 7:35am

Hi @jcarter ,

I have a solution for you! Ideally, we would have used a query-time join on documents in the content index to search by the container attributes, but there is currently no support for that (hopefully we can add this in the future). Instead, I went with the approach of duplicating the container macro value in the child documents in the content index.
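
As a rough illustration of that index-time duplication, here is a self-contained Java sketch using plain maps as stand-ins for index documents. None of this is the Confluence index API: the class, the method names and the map-based document shape are assumptions made for the example. Only the field names scrollVersion and containerMacros come from this thread.

```java
import java.util.Collections;
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

// Hypothetical in-memory model of index-time denormalization; not the Confluence index.
class DenormalizedIndexSketch {

    // Builds one child document: field name -> set of indexed values.
    static Map<String, Set<String>> indexChild(String scrollVersion, Set<String> containerMacros) {
        Map<String, Set<String>> doc = new HashMap<>();
        doc.put("scrollVersion", Collections.singleton(scrollVersion));
        // Copy the container's macro names onto the child so no join is needed at query time.
        doc.put("containerMacros", new HashSet<>(containerMacros));
        return doc;
    }

    // Plain field/value matching, roughly what a single term query does.
    static boolean matches(Map<String, Set<String>> doc, String field, String value) {
        return doc.getOrDefault(field, Collections.emptySet()).contains(value);
    }
}
```

A search for children at version 1.2.3 whose container has a Table of Contents macro then becomes two matches calls on the same document, one per field. The usual trade-off with this kind of duplication is that the copied values reflect the container as it was when the child was last indexed.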

The full source code is integrated into the confluence-devrel-plugin.

Some explanation has been provided below:

Implementing the Extractor2 API:

public class ContainerMacroExtractor implements Extractor2 {

    private final static String SCROLL_PLUGIN_KEY = "com.k15t.scroll.scroll-platform:scroll-search-proxy-content-type";
    public final static String CONTAINER_MACROS_FIELD_NAME = "containerMacros";

    private final Logger log = LoggerFactory.getLogger(ContainerMacroExtractor.class);

    private XhtmlContent xhtmlContent;
    private MacroManager macroManager;

    public ContainerMacroExtractor(@ComponentImport XhtmlContent xhtmlContent, @ComponentImport MacroManager macroManager) {
        this.xhtmlContent = checkNotNull(xhtmlContent);
        this.macroManager = checkNotNull(macroManager);
    }

I made sure to import the XhtmlContent and MacroManager components so I could find out which macros exist within a body of text. Then, since I wanted to add a new field to the content index for each child content object (which would be of type scroll-search-proxy-content-type), I provided an implementation of extractFields:

    @Override
    public Collection<FieldDescriptor> extractFields(Object searchable) {
        ImmutableList.Builder<FieldDescriptor> resultBuilder = ImmutableList.builder();
        if (searchable instanceof CustomContentEntityObject) {
            CustomContentEntityObject customEntity = (CustomContentEntityObject) searchable;

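            // Null-safe key check; also skip entities that have no container to read macros from.
            if (SCROLL_PLUGIN_KEY.equals(customEntity.getPluginModuleKey()) && customEntity.getContainer() != null) {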
                ContentEntityObject container = customEntity.getContainer();

                MacroCollector collector = new MacroCollector(macroManager);
                BodyType bodyType = container.getBodyContent().getBodyType();
                if (bodyType.equals(XHTML)) {
                    processXhtml(container, collector);
                } else if (bodyType.equals(WIKI)) {
                    collector.processPotentialWikiMacro(container.getBodyAsString());
                }
                Function<String, FieldDescriptor> toContainerMacroFieldDescriptor = macroName -> new StringFieldDescriptor(CONTAINER_MACROS_FIELD_NAME, macroName, FieldDescriptor.Store.NO);
                collector.getMacroNames().stream().map(toContainerMacroFieldDescriptor).forEach(resultBuilder::add);
            }
        }
        return resultBuilder.build();
    }

    private void processXhtml(final ContentEntityObject searchableCeo, final MacroDefinitionHandler macroUsageCollector) {
        DefaultConversionContext context = new DefaultConversionContext(searchableCeo.toPageContext());
        try {
            xhtmlContent.handleMacroDefinitions(searchableCeo.getBodyAsString(), context, macroUsageCollector);
        } catch (XhtmlException ex) {
            log.warn("Failed to extract macro usages on entity [{}] : {}", searchableCeo.getId(), ex.getMessage());
            // Pass the exception as the trailing argument so the stack trace is logged at debug level.
            log.debug("Failed to extract macro usages on entity [{}]", searchableCeo.getId(), ex);
        }
    }

The MacroCollector implementation is here:

class MacroCollector implements MacroDefinitionHandler, WikiContentHandler {

    private Set<String> macroNames;
    private MacroManager macroManager;

    public MacroCollector(MacroManager macroManager) {
        this.macroNames = new HashSet<>();
        this.macroManager = checkNotNull(macroManager);
    }

    @Override
    public void handle(MacroDefinition macroDefinition) {
        macroNames.add(macroDefinition.getName());
        if (macroDefinition.getName().equals(UnmigratedBlockWikiMarkupMacro.MACRO_NAME)) {
            processPotentialWikiMacro(macroDefinition.getBodyText());
        }
    }

    @Override
    public void handleMacro(StringBuffer stringBuffer, MacroTag macroTag, String body) {
        macroNames.add(macroTag.command);
        processPotentialWikiMacro(body);
    }

    @Override
    public void handleText(StringBuffer stringBuffer, String s) {
        //Do Nothing
    }

    public Set<String> getMacroNames() {
        return macroNames;
    }

    protected void processPotentialWikiMacro(String wiki) {
        WikiMarkupParser parser = new WikiMarkupParser(macroManager, this);
        parser.parse(wiki);
    }
}

Yes, the process of gathering the macro information is quite messy. We do not store this information explicitly in the database so we can’t get it immediately from ContentEntityObject. Nor do we provide an abstraction of this macro collection process in a component which can be reused. This is in fact a duplication of logic from MacroExtractor which we have identified as behaviour we might need to refactor in the future.

The only step left is to provide a BaseFieldHandler implementation to be able to use CQL to search on that newly added field in the content index.

See: Adding a field to CQL to learn how to hook up a BaseFieldHandler to a field.
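
To show the role the field handler plays, here is a self-contained Java sketch of the idea. It deliberately does not use the real Confluence classes: FieldHandlerSketch, its Operator enum and the map-based document are hypothetical, and the CQL-facing name containerMacro is an assumption. Only the index field name containerMacros is taken from the extractor above. The sketch validates the operator and turns a field, operator and value into a match against the indexed field.

```java
import java.util.Collections;
import java.util.EnumSet;
import java.util.Map;
import java.util.Set;
import java.util.function.Predicate;

// Hypothetical stand-in for a CQL field handler; the real one would extend BaseFieldHandler.
class FieldHandlerSketch {

    enum Operator { EQUALS, NOT_EQUALS, CONTAINS }

    static final String CQL_FIELD = "containerMacro";    // assumed CQL-facing field name
    static final String INDEX_FIELD = "containerMacros"; // index field written by the extractor

    // Translates "containerMacro <op> value" into a predicate over an indexed document.
    static Predicate<Map<String, Set<String>>> build(Operator op, String value) {
        // Reject operators that make no sense for an exact-match field.
        if (!EnumSet.of(Operator.EQUALS, Operator.NOT_EQUALS).contains(op)) {
            throw new IllegalArgumentException("Unsupported operator for " + CQL_FIELD + ": " + op);
        }
        Predicate<Map<String, Set<String>>> term =
                doc -> doc.getOrDefault(INDEX_FIELD, Collections.emptySet()).contains(value);
        return op == Operator.EQUALS ? term : term.negate();
    }
}
```

The real handler builds a search query object rather than a predicate, so treat this only as a picture of the mapping, and follow the linked guide for the actual wiring.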

For most container attributes you may want to duplicate in the index and search by on the children, it’s generally a lot simpler than this macro example.

If you have any further questions, please do not hesitate to reply.

Kind Regards,
Richard

jcarter · July 12, 2021, 6:08pm

Wow, @rlau, that is so far above and beyond anything I would have expected. I won’t have the chance to kick the tires on this too soon, but I’ll let you know how it goes when I do!