Issues getting XML Parsing Libraries Working

I’m developing a Confluence Data Center plugin using Java and am having trouble instantiating a DocumentBuilderFactory to parse the XML storage format. When I call this from within my plugin:

DocumentBuilder builder = DocumentBuilderFactory.newInstance().newDocumentBuilder();

I get this error:

[INFO] [talledLocalContainer] java.lang.ClassCastException: class org.apache.xerces.jaxp.DocumentBuilderFactoryImpl cannot be cast to class javax.xml.parsers.DocumentBuilderFactory (org.apache.xerces.jaxp.DocumentBuilderFactoryImpl is in unnamed module of loader org.apache.catalina.loader.ParallelWebappClassLoader @37d61cea; javax.xml.parsers.DocumentBuilderFactory is in unnamed module of loader org.apache.felix.framework.BundleWiringImpl$BundleClassLoader @4e30df9f)
[INFO] [talledLocalContainer]   at javax.xml.parsers.DocumentBuilderFactory.newInstance(Unknown Source)
<stack trace trimmed for clarity>

From what I understand, this appears to be some kind of OSGi issue: the implementing class from Apache Xerces is provided outside my plugin (the trace shows it in Tomcat’s ParallelWebappClassLoader, i.e. the parent classloader), but I need to do something to make it available to my bundle’s classloader. I’ve tried a few things in the <Import-Package> section of the instructions in my confluence-maven-plugin configuration, but can’t seem to get it working.

Any tips? How do I get XML parsing working in a plugin?
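The ClassCastException message already names two different loaders. To see where the copy of javax.xml.parsers.DocumentBuilderFactory that your bundle links against actually comes from, you can log its defining classloader and code source. A minimal sketch; ClassOriginReport is a hypothetical helper, not part of any Atlassian API, and in a plugin you would log the result rather than print it:

```java
import java.security.CodeSource;
import javax.xml.parsers.DocumentBuilderFactory;

public final class ClassOriginReport {
    private ClassOriginReport() {}

    /** Describes which classloader defined a class and, if known, where it was loaded from. */
    public static String describe(Class<?> type) {
        ClassLoader loader = type.getClassLoader();
        // A null loader means the class was defined by the bootstrap loader (JDK classes).
        String loaderName = (loader == null) ? "bootstrap" : loader.toString();
        CodeSource source = type.getProtectionDomain().getCodeSource();
        String location = (source == null || source.getLocation() == null)
                ? "unknown location"
                : source.getLocation().toString();
        return type.getName() + " defined by " + loaderName + " from " + location;
    }

    public static void main(String[] args) {
        // Hypothetical usage: call this from inside the plugin and log the output.
        System.out.println(describe(DocumentBuilderFactory.class));
        System.out.println("context classloader: " + Thread.currentThread().getContextClassLoader());
    }
}
```

Roughly: if the API class reports a jar inside your plugin rather than the JDK, the bundle is carrying its own copy of the javax.xml.parsers API.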


Hi Kashev,

I found some interesting things to look at in the source code of javax.xml.parsers.DocumentBuilderFactory. Apparently the class

org.apache.xerces.jaxp.DocumentBuilderFactoryImpl

is a fallback class name, used when no other implementation can be found on the classpath.

The authors of DocumentBuilderFactory also note that you can troubleshoot the lookup by starting the JVM with the following flag: -Djaxp.debug=1
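Besides -Djaxp.debug=1 (which generally needs to be set when the JVM starts), you can compare in code what the standard lookup returns with the JDK's built-in implementation. A sketch assuming Java 9 or later, where DocumentBuilderFactory.newDefaultInstance() exists; FactoryLookupCheck is a hypothetical name. Note that if your bundle embeds an older copy of the javax.xml.parsers API (for example from xml-apis), that method may be missing from that copy.

```java
import javax.xml.parsers.DocumentBuilderFactory;

public final class FactoryLookupCheck {
    private FactoryLookupCheck() {}

    /** Implementation chosen by the standard lookup (system property, jaxp.properties, ServiceLoader, then fallback). */
    public static String lookedUpImplementation() {
        return DocumentBuilderFactory.newInstance().getClass().getName();
    }

    /** The JDK's built-in implementation, which skips the lookup entirely (Java 9+). */
    public static String builtInImplementation() {
        return DocumentBuilderFactory.newDefaultInstance().getClass().getName();
    }

    public static void main(String[] args) {
        System.out.println("lookup:   " + lookedUpImplementation());
        System.out.println("built-in: " + builtInImplementation());
    }
}
```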

Other than this, our bundle has access to this class by default without any issues.
Below are the instructions from the Atlassian Maven plugin configuration in our pom:

<instructions>
  <Import-Package>
    *;version="0";resolution:=optional
  </Import-Package>
  <Atlassian-Plugin-Key>${atlassian.plugin.key}</Atlassian-Plugin-Key>
  <Atlassian-Scan-Folders>META-INF/plugin-descriptors
  </Atlassian-Scan-Folders>
  <Spring-Context>*</Spring-Context>
  <Export-Package />
</instructions>

Cheers,
Elias
Kantega SSO


Hello hello

When using XML libraries in Confluence:

  1. you need to ensure their usage is secure
  2. you should use the atlassian-secure-xml dependency
<dependency>
    <groupId>com.atlassian.security</groupId>
    <artifactId>atlassian-secure-xml</artifactId>
    <version>3.2.11</version>
</dependency>
...
SecureXmlParserFactory.newDocumentBuilder();
SecureXmlParserFactory.newXmlInputFactory();
SecureXmlParserFactory.newXmlReader();
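For context on what "secure usage" means here, the sketch below hardens a plain JDK DocumentBuilderFactory against XXE and entity-expansion attacks using standard JAXP/Xerces feature flags. It illustrates the idea only and is not the actual implementation of atlassian-secure-xml; HardenedXml is a hypothetical name.

```java
import java.io.ByteArrayInputStream;
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.xml.sax.SAXException;

public final class HardenedXml {
    private HardenedXml() {}

    public static DocumentBuilder newDocumentBuilder() throws ParserConfigurationException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        // Reject any DOCTYPE outright: blocks XXE and "billion laughs" style expansion.
        factory.setFeature("http://apache.org/xml/features/disallow-doctype-decl", true);
        // Defence in depth in case the DOCTYPE rule is ever relaxed.
        factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
        factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
        factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        factory.setXIncludeAware(false);
        factory.setExpandEntityReferences(false);
        factory.setNamespaceAware(true);
        return factory.newDocumentBuilder();
    }

    public static Document parse(String xml)
            throws ParserConfigurationException, SAXException, IOException {
        return newDocumentBuilder().parse(new ByteArrayInputStream(xml.getBytes(StandardCharsets.UTF_8)));
    }
}
```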

Also, the plugin instructions described in @EliasBrattliSorensen’s answer should do the trick.

Here is one of my fun memories of dealing with XML libraries in Confluence: "How we stopped vulnerable code from landing in production" on the Atlassian Developer Blog.

Cheers
Hasnae
former Confluence person


Hi @viqueen-hasnae, thank you so much for the tips! We had gotten dom4j working in a prototype thanks to @EliasBrattliSorensen’s tips, but after reading your article we will probably switch to the library you’re suggesting, since it protects against external entity expansion by default.

We’re trying to use an XML parsing library specifically for parsing Confluence storage format. As a former Confluence person, do you have any specific tips for that? Issues I’m running into trying to parse it as XML:

  • Having trouble finding an authoritative, up-to-date schema/DTD
  • Have to include entities for things like em dashes, which I’ve been doing by including http://www.w3.org/TR/xhtml1/DTD/xhtml-lat1.ent, which isn’t very secure, as you point out.
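The entity problem in the second bullet can be handled without fetching anything over the network: declare only the named entities you need in an internal DTD subset, and bind the ac:/ri: prefixes on a wrapper element. A minimal sketch under stated assumptions: StorageFormatParser is hypothetical, the entity list is hand-picked, and the two namespace URIs are placeholders (the parser only needs some binding for the prefixes). Unlike a DOCTYPE-rejecting parser, this accepts an internal subset, so it only makes sense for the wrapper you generate yourself; external DTD and entity loading stays off.

```java
import java.io.StringReader;
import java.util.LinkedHashMap;
import java.util.Map;
import javax.xml.XMLConstants;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.xml.sax.InputSource;

public final class StorageFormatParser {
    // Placeholder namespace URIs (assumption): any consistent URIs work for parsing.
    public static final String AC_NS = "http://atlassian.com/content";
    public static final String RI_NS = "http://atlassian.com/resource/identifier";

    // Hand-picked XHTML named entities mapped to code points; extend as needed.
    private static final Map<String, Integer> ENTITIES = new LinkedHashMap<>();
    static {
        ENTITIES.put("nbsp", 160);
        ENTITIES.put("ndash", 8211);
        ENTITIES.put("mdash", 8212);
        ENTITIES.put("hellip", 8230);
    }

    private StorageFormatParser() {}

    public static Document parse(String storageFragment) throws Exception {
        // Internal DTD subset only: <!ENTITY mdash "&#8212;"> etc.
        StringBuilder dtd = new StringBuilder("<!DOCTYPE root [");
        ENTITIES.forEach((name, codePoint) ->
                dtd.append("<!ENTITY ").append(name).append(" \"&#").append(codePoint).append(";\">"));
        dtd.append("]>");
        String wrapped = dtd + "<root xmlns:ac=\"" + AC_NS + "\" xmlns:ri=\"" + RI_NS + "\">"
                + storageFragment + "</root>";

        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        factory.setNamespaceAware(true);
        factory.setFeature(XMLConstants.FEATURE_SECURE_PROCESSING, true);
        // Never fetch external DTDs or external entities.
        factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        factory.setFeature("http://xml.org/sax/features/external-general-entities", false);
        factory.setFeature("http://xml.org/sax/features/external-parameter-entities", false);
        factory.setXIncludeAware(false);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(wrapped)));
    }
}
```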

Am I on the right track, or is there a much more normal way to parse the storage format?

Thanks,
Kashev

Hi @kashev

Yeah, I remember all the fun of dealing with the Confluence storage format!

The schema/DTD are indeed missing; you should just refer to the "Confluence Storage Format" page of the Confluence Data Center and Server 8.0 documentation.

API-wise, we use XhtmlContent (see the Atlassian Confluence 7.4.9 API docs).

You would use it as follows:

@Path("/stuff")
public class MyStuff {

    private final ContentService contentService;
    private final XhtmlContent xhtmlContent;

    @Autowired
    public MyStuff(
            @ComponentImport final ContentService contentService,
            @ComponentImport final XhtmlContent xhtmlContent
    ) {
        this.contentService = contentService;
        this.xhtmlContent = xhtmlContent;
    }

    @GET
    @Path("/{id}/extract")
    @Produces("application/json")
    public Response extractThings(@PathParam("id") final long contentId) {
        // notFound(...) is a helper (not shown here) that returns a Supplier of an exception
        final String contentBody = contentService.find(ExpansionsParser.parse("body.storage")).withId(ContentId.of(contentId)).fetch()
                .map(content -> content.getBody().get(ContentRepresentation.STORAGE).getValue())
                .orElseThrow(notFound("content not found for id: " + contentId));

        // one entry per macro found in the storage format
        final List<Map<String, Object>> macroDefinitions = new ArrayList<>();

        try {
            xhtmlContent.handleXhtmlElements(
                    contentBody,
                    new DefaultConversionContext(new RenderContext()),
                    ImmutableList.of(
                            // define your custom element visitors
                            new XhtmlVisitor() {
                                @Override
                                public boolean handle(XMLEvent xmlEvent, ConversionContext conversionContext) {
                                    return false;
                                }
                            }
                    )
            );
            // the most popular usage is the macro definition handler
            xhtmlContent.handleMacroDefinitions(
                    contentBody,
                    new DefaultConversionContext(new RenderContext()),
                    (macro) -> macroDefinitions.add(
                            ImmutableMap.<String, Object>builder()
                                    .put("name", macro.getName())
                                    .put("macroId", macro.getMacroIdentifier().orElse(MacroId.fromString("none")).getId())
                                    .put("parameters", macro.getParameters())
                                    .put("body.text", macro.getBodyText())
                                    .put("body.type", macro.getBodyType())
                                    .put("valid", macro.isValid())
                                    .put("schema.version", macro.getSchemaVersion())
                                    .build()
                    )
            );
        } catch (XhtmlException exception) {
            // log it and handle it
            return Response.serverError().build();
        }
        return Response.ok(macroDefinitions).build();
    }
}

Here is an example usage of com.atlassian.confluence.xhtml.api.XhtmlContent that extracts the macros out of a page's storage format: the commit "extract list of content macros" (viqueen/atlassian-devbox@f999e5e on GitHub).

I hope that helps


Hi again @viqueen-hasnae ,

We’ve circled back to trying to get XhtmlContent working in our plugin and are having some issues just getting the import working. I am using Atlassian Spring Scanner 2, @ComponentImport annotations on Confluence-provided classes, and can’t get a plugin with an XhtmlContent as a member to load:

[INFO] [talledLocalContainer] org.springframework.beans.factory.BeanCreationException: Error creating bean with name 'xhtmlContent': FactoryBean threw exception on object creation; nested exception is java.lang.IllegalArgumentException: javax.xml.stream.XMLStreamException referenced from a method is not visible from class loader
[INFO] [talledLocalContainer]   at org.springframework.beans.factory.support.FactoryBeanRegistrySupport.doGetObjectFromFactoryBean(FactoryBeanRegistrySupport.java:176)
[INFO] [talledLocalContainer]   at org.springframework.beans.factory.support.FactoryBeanRegistrySupport.getObjectFromFactoryBean(FactoryBeanRegistrySupport.java:101)
[INFO] [talledLocalContainer]   at org.springframework.beans.factory.support.AbstractBeanFactory.getObjectForBeanInstance(AbstractBeanFactory.java:1884)
[INFO] [talledLocalContainer]   at org.springframework.beans.factory.support.AbstractAutowireCapableBeanFactory.getObjectForBeanInstance(AbstractAutowireCapableBeanFactory.java:1284)
[INFO] [talledLocalContainer]   at org.springframework.beans.factory.support.AbstractBeanFactory.doGetBean(AbstractBeanFactory.java:267)
[INFO] [talledLocalContainer]   at org.springframework.beans.factory.support.AbstractBeanFactory.getBean(AbstractBeanFactory.java:208)
[INFO] [talledLocalContainer]   at org.springframework.beans.factory.support.DefaultListableBeanFactory.preInstantiateSingletons(DefaultListableBeanFactory.java:948)
[INFO] [talledLocalContainer]   at org.springframework.context.support.AbstractApplicationContext.finishBeanFactoryInitialization(AbstractApplicationContext.java:918)
[INFO] [talledLocalContainer]   at org.eclipse.gemini.blueprint.context.support.AbstractDelegatedExecutionApplicationContext.access$1600(AbstractDelegatedExecutionApplicationContext.java:57)
[INFO] [talledLocalContainer]   at org.eclipse.gemini.blueprint.context.support.AbstractDelegatedExecutionApplicationContext$4.run(AbstractDelegatedExecutionApplicationContext.java:322)
[INFO] [talledLocalContainer]   at org.eclipse.gemini.blueprint.util.internal.PrivilegedUtils.executeWithCustomTCCL(PrivilegedUtils.java:85)
[INFO] [talledLocalContainer]   at org.eclipse.gemini.blueprint.context.support.AbstractDelegatedExecutionApplicationContext.completeRefresh(AbstractDelegatedExecutionApplicationContext.java:287)
[INFO] [talledLocalContainer]   at org.eclipse.gemini.blueprint.extender.internal.dependencies.startup.DependencyWaiterApplicationContextExecutor$CompleteRefreshTask.run(DependencyWaiterApplicationContextExecutor.java:137)
[INFO] [talledLocalContainer]   at java.base/java.util.concurrent.ThreadPoolExecutor.runWorker(ThreadPoolExecutor.java:1128)
[INFO] [talledLocalContainer]   at java.base/java.util.concurrent.ThreadPoolExecutor$Worker.run(ThreadPoolExecutor.java:628)
[INFO] [talledLocalContainer]   at java.base/java.lang.Thread.run(Thread.java:829)
[INFO] [talledLocalContainer] Caused by: java.lang.IllegalArgumentException: javax.xml.stream.XMLStreamException referenced from a method is not visible from class loader
[INFO] [talledLocalContainer]   at java.base/java.lang.reflect.Proxy$ProxyBuilder.ensureVisible(Proxy.java:858)
[INFO] [talledLocalContainer]   at java.base/java.lang.reflect.Proxy$ProxyBuilder.validateProxyInterfaces(Proxy.java:700)
[INFO] [talledLocalContainer]   at java.base/java.lang.reflect.Proxy$ProxyBuilder.<init>(Proxy.java:627)
[INFO] [talledLocalContainer]   at java.base/java.lang.reflect.Proxy.lambda$getProxyConstructor$1(Proxy.java:426)
[INFO] [talledLocalContainer]   at java.base/jdk.internal.loader.AbstractClassLoaderValue$Memoizer.get(AbstractClassLoaderValue.java:329)
[INFO] [talledLocalContainer]   at java.base/jdk.internal.loader.AbstractClassLoaderValue.computeIfAbsent(AbstractClassLoaderValue.java:205)
[INFO] [talledLocalContainer]   at java.base/java.lang.reflect.Proxy.getProxyConstructor(Proxy.java:424)
[INFO] [talledLocalContainer]   at java.base/java.lang.reflect.Proxy.newProxyInstance(Proxy.java:1006)
[INFO] [talledLocalContainer]   at org.springframework.aop.framework.JdkDynamicAopProxy.getProxy(JdkDynamicAopProxy.java:126)
[INFO] [talledLocalContainer]   at org.springframework.aop.framework.ProxyFactory.getProxy(ProxyFactory.java:110)
[INFO] [talledLocalContainer]   at org.eclipse.gemini.blueprint.service.util.internal.aop.ProxyUtils.createProxy(ProxyUtils.java:68)
[INFO] [talledLocalContainer]   at org.eclipse.gemini.blueprint.service.util.internal.aop.ProxyUtils.createProxy(ProxyUtils.java:37)
[INFO] [talledLocalContainer]   at org.eclipse.gemini.blueprint.service.importer.support.AbstractServiceProxyCreator.createServiceProxy(AbstractServiceProxyCreator.java:105)
[INFO] [talledLocalContainer]   at org.eclipse.gemini.blueprint.service.importer.support.OsgiServiceProxyFactoryBean.createProxy(OsgiServiceProxyFactoryBean.java:176)
[INFO] [talledLocalContainer]   at org.eclipse.gemini.blueprint.service.importer.support.AbstractServiceImporterProxyFactoryBean.getObject(AbstractServiceImporterProxyFactoryBean.java:95)
[INFO] [talledLocalContainer]   at org.eclipse.gemini.blueprint.service.importer.support.OsgiServiceProxyFactoryBean.getObject(OsgiServiceProxyFactoryBean.java:122)
[INFO] [talledLocalContainer]   at org.springframework.beans.factory.support.FactoryBeanRegistrySupport.doGetObjectFromFactoryBean(FactoryBeanRegistrySupport.java:169)
[INFO] [talledLocalContainer]   ... 15 more

I’m sure this is some kind of OSGi issue, but I’m not sure why this is happening.
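The "Caused by" in this trace comes from a check java.lang.reflect.Proxy runs before creating the service proxy (Proxy$ProxyBuilder.ensureVisible): for every type referenced by the proxied interface's methods, resolving that type's name through the proxy's classloader must return the very same Class object. A sketch that mirrors that check so you can test a suspect type against a loader directly; ProxyVisibilityCheck is hypothetical, and which loader Gemini Blueprint actually uses for the proxy is an assumption here (your bundle's own loader is a reasonable first thing to try).

```java
import javax.xml.stream.XMLStreamException;

public final class ProxyVisibilityCheck {
    private ProxyVisibilityCheck() {}

    /**
     * True if resolving type's name through loader yields the very same Class object,
     * which is the condition Proxy requires for every type referenced by a proxied interface.
     */
    public static boolean isVisible(Class<?> type, ClassLoader loader) {
        try {
            return Class.forName(type.getName(), false, loader) == type;
        } catch (ClassNotFoundException | LinkageError e) {
            return false;
        }
    }

    public static void main(String[] args) {
        // Hypothetical usage: run inside the plugin so getClassLoader() is the bundle's loader.
        ClassLoader loader = ProxyVisibilityCheck.class.getClassLoader();
        System.out.println("XMLStreamException visible: " + isVisible(XMLStreamException.class, loader));
    }
}
```

A false result for a type that definitely exists means the loader resolves that name to a different copy of the class, which is the duplicate-API situation.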

Code that I’m using:

import com.atlassian.confluence.xhtml.api.XhtmlContent;
import com.atlassian.plugin.spring.scanner.annotation.imports.ComponentImport;
import lombok.extern.slf4j.Slf4j;
import org.springframework.beans.factory.annotation.Autowired;
import javax.ws.rs.Consumes;
import javax.ws.rs.GET;
import javax.ws.rs.Path;
import javax.ws.rs.Produces;
import javax.ws.rs.core.MediaType;

@Path("test")
@Consumes(MediaType.APPLICATION_JSON)
@Produces(MediaType.APPLICATION_JSON)
@Slf4j
public class TestResource {
    private final XhtmlContent xhtmlContent;

    @Autowired
    public TestResource(@ComponentImport final XhtmlContent xhtmlContent) {
        this.xhtmlContent = xhtmlContent;
    }

    @GET
    public String test() {
        log.error("\n\nXhtmlContent: {}\n", xhtmlContent);
        return "string";
    }
}

Also tried to add the following to my POM, with and without the provided scope:

        <dependency>
            <groupId>javax.xml.stream</groupId>
            <artifactId>stax-api</artifactId>
            <version>1.0-2</version>
            <scope>provided</scope>
        </dependency>

Any tips you could give me would be very appreciated!


Hello hello @kashev

It has been a while since I last worked on Confluence plugins, but from what I can see, my plugin that uses XhtmlContent still works with newer versions of Confluence.

My setup is as follows:

  • I use the confluence-plugins-platform-pom to manage my Confluence dependency versions
  • I also use Spring Scanner 2
  • my plugin configuration has an Import-Package of *

In code, that translates to the following:

pom.xml

<properties>
    <amps.version>8.2.0</amps.version>
    <atlassian.plugin.key>${project.groupId}.${project.artifactId}</atlassian.plugin.key>
    <atlassian.spring.scanner.version>2.2.0</atlassian.spring.scanner.version>
    <confluence.version>7.18.2</confluence.version>
    <confluence.data.version>${confluence.version}</confluence.data.version>
</properties>
<dependencyManagement>
    <dependencies>
        <dependency>
            <groupId>com.atlassian.confluence</groupId>
            <artifactId>confluence-plugins-platform-pom</artifactId>
            <version>${confluence.version}</version>
            <type>pom</type>
            <scope>import</scope>
        </dependency>
    </dependencies>
</dependencyManagement>
<dependencies>
    <dependency>
        <groupId>com.atlassian.confluence</groupId>
        <artifactId>confluence</artifactId>
        <scope>provided</scope>
    </dependency>
    <dependency>
        <groupId>com.atlassian.plugin</groupId>
        <artifactId>atlassian-spring-scanner-annotation</artifactId>
        <scope>provided</scope>
    </dependency>
</dependencies>

<build>
        <plugins>
            <plugin>
                <groupId>com.atlassian.maven.plugins</groupId>
                <artifactId>confluence-maven-plugin</artifactId>
                <version>${amps.version}</version>
                <extensions>true</extensions>
                <configuration>
                    <productVersion>${confluence.version}</productVersion>
                    <productDataVersion>${confluence.data.version}</productDataVersion>
                    <enableQuickReload>true</enableQuickReload>
                    <containerId>${containerId}</containerId>
                    <jvmArgs>${jvm.args}</jvmArgs>
                    <server>localhost</server>
                    <skipRestDocGeneration>true</skipRestDocGeneration>

                    <instructions>
                        <Atlassian-Plugin-Key>${atlassian.plugin.key}</Atlassian-Plugin-Key>

                        <Export-Package>
                        </Export-Package>

                        <Import-Package>
                            org.springframework.osgi.*;resolution:="optional",
                            org.eclipse.gemini.blueprint.*;resolution:="optional",
                            *
                        </Import-Package>

                        <!-- Ensure plugin is spring powered -->
                        <Spring-Context>*</Spring-Context>
                    </instructions>
                </configuration>
            </plugin>
            <plugin>
                <groupId>com.atlassian.plugin</groupId>
                <artifactId>atlassian-spring-scanner-maven-plugin</artifactId>
                <version>${atlassian.spring.scanner.version}</version>
                <executions>
                    <execution>
                        <goals>
                            <goal>atlassian-spring-scanner</goal>
                        </goals>
                        <phase>process-classes</phase>
                    </execution>
                </executions>
                <configuration>
                    <verbose>false</verbose>
                </configuration>
            </plugin>
        </plugins>
    </build>

src/main/resources/META-INF/spring/plugin-context.xml

<?xml version="1.0" encoding="UTF-8"?>
<beans xmlns="http://www.springframework.org/schema/beans"
       xmlns:xsi="http://www.w3.org/2001/XMLSchema-instance"
       xmlns:atlassian-scanner="http://www.atlassian.com/schema/atlassian-scanner/2"
       xsi:schemaLocation="http://www.springframework.org/schema/beans
        http://www.springframework.org/schema/beans/spring-beans.xsd
        http://www.atlassian.com/schema/atlassian-scanner/2
        http://www.atlassian.com/schema/atlassian-scanner/2/atlassian-scanner.xsd">
    <atlassian-scanner:scan-indexes/>
</beans>

Can you take a look at atlassian-devbox/confluence-devbox (main branch of viqueen/atlassian-devbox on GitHub)? It comes with the bare minimum to work with REST endpoints in Confluence, and it orchestrates different services from Confluence’s API.

Also, which JDK version are you running Confluence with? Which AMPS version? Which Confluence version? Okay, tell me all the versions lols


Hi @viqueen-hasnae !

Thanks to your help, I determined that the issue was caused by the pom of my shared library pulling in another copy of xml-apis, which I was able to exclude with a Maven exclusion. XML libraries are weird (see: Xerces hell).
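For anyone hitting the same thing: besides `mvn dependency:tree -Dincludes=xml-apis:xml-apis`, you can list every copy of an API class visible on a plain classpath (for example from a unit test in the shared library). A minimal sketch; DuplicateClassFinder is a hypothetical name, and inside an OSGi bundle resource lookup follows the bundle's wiring rules, so results there may differ.

```java
import java.io.IOException;
import java.net.URL;
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

public final class DuplicateClassFinder {
    private DuplicateClassFinder() {}

    /** Lists every location from which the loader (or its parents) could load the named top-level class. */
    public static List<URL> findCopies(ClassLoader loader, String className) throws IOException {
        String resource = className.replace('.', '/') + ".class";
        return new ArrayList<>(Collections.list(loader.getResources(resource)));
    }

    public static void main(String[] args) throws IOException {
        ClassLoader loader = Thread.currentThread().getContextClassLoader();
        for (URL url : findCopies(loader, "javax.xml.parsers.DocumentBuilderFactory")) {
            // On JDK 11 expect a jrt:/java.xml/... entry; an extra jar: entry hints at a bundled xml-apis copy.
            System.out.println(url);
        }
    }
}
```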

Thank you so much for your help!

Versions For Posterity

JDK Version (running on an Intel MacBook)

➤ java -version
openjdk version "11.0.12" 2021-07-20
OpenJDK Runtime Environment Homebrew (build 11.0.12+0)
OpenJDK 64-Bit Server VM Homebrew (build 11.0.12+0, mixed mode)

AMPS version is 8.2.3 – I notice now that that’s relatively old, though looking through the release notes I would hope that that’s not the issue.