Wikidata Logging Wall


I'm trying to set up my own Wikidata endpoint, for fun and profit. It requires a lot of resources (do not even try without at least two SSDs in RAID 0...), but the tools are easily accessible, mostly documented, and there are a few scripts ready to be executed.

There are exceptions. The one that drove me mad for days was the default logging behavior. The runBlazegraph.sh script, which essentially fires up the main graph instance to be populated with a Wikidata dump, generates a stream of detailed debugging output that slows down every single operation. And when you handle more than 1TB of data, you are going to execute lots of operations: slowing them down means days or weeks of waiting...

I tried (and failed) with different Java parameters passed on the command line, in an attempt to control the log4j behavior and raise the logging level, but in the end the friendly Wikimedia community on IRC (#wikimedia-search on Libera.Chat) suggested I dive into logback, which is the actual logging framework used within Wikidata's own version of Blazegraph.
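For context, logback (unlike log4j) locates its configuration through the `logback.configurationFile` system property, so a direct invocation would look roughly like this (the jar name and paths here are illustrative placeholders, not the actual Blazegraph ones):

```shell
# logback reads its configuration location from this system property;
# jar name and paths below are placeholders for illustration only
java -Dlogback.configurationFile=/path/to/mylogback.xml \
     -jar blazegraph.jar
```

In practice the wrapper script builds its own Java command line, which is why I needed the approach that follows.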

Here is the logback template used for the main Wikidata instance, meant to be filled in by some Ruby script probably buried in the same repository. From it I've crafted my own plain XML version:

<?xml version="1.0" encoding="UTF-8"?>
<configuration scan="true" scanPeriod="5 minutes" packagingData="false">

    <!-- ugly trick to ensure ${HOSTNAME} is evaluated -->
    <property scope="context" name="hostname" value="${HOSTNAME}" />

    <evaluator name="SYNTAX_ERROR">
        <!-- This has to work in class loaders that don't have org.openrdf.query.MalformedQueryException too -->
        <expression>
            throwable != null &amp;&amp; throwable.getCause() != null &amp;&amp;
            throwable instanceof java.util.concurrent.ExecutionException &amp;&amp;
            throwable.getCause().getClass().getName().equals("org.openrdf.query.MalformedQueryException")
        </expression>
    </evaluator>
    <evaluator name="QUERY_TIMEOUT">
        <expression>
            throwable != null &amp;&amp; throwable instanceof java.util.concurrent.TimeoutException
        </expression>
    </evaluator>

    <!--
        File based logs:
        * rolling every day or when size > 100MB
    -->
    <appender name="file" class="ch.qos.logback.core.rolling.RollingFileAppender">
        <file>/var/log/wdqs/main.log</file>
        <rollingPolicy class="ch.qos.logback.core.rolling.SizeAndTimeBasedRollingPolicy">
            <!-- daily rollover -->
            <fileNamePattern>/var/log/wdqs/main.%d{yyyy-MM-dd}.%i.log.gz</fileNamePattern>
            <maxFileSize>100MB</maxFileSize>
            <maxHistory>30</maxHistory>
        </rollingPolicy>
        <filter class="org.wikidata.query.rdf.common.log.PerLoggerThrottler" />
        <encoder class="ch.qos.logback.classic.encoder.PatternLayoutEncoder">
            <pattern>%date %-5level %logger{0} - %message%n</pattern>
            <outputPatternAsHeader>true</outputPatternAsHeader>
        </encoder>
    </appender>
    <appender name="async-file" class="ch.qos.logback.classic.AsyncAppender">
        <neverBlock>true</neverBlock>
        <appender-ref ref="file" />
    </appender>

    <!--
        Console based logs:
        * per logger / message throttling is enabled
        * limited to 10 messages per second
        * level => ERROR
    -->
    <appender name="stdout" class="ch.qos.logback.core.ConsoleAppender">
        <filter class="org.wikidata.query.rdf.common.log.PerLoggerThrottler" />
        <filter class="org.wikidata.query.rdf.common.log.RateLimitFilter">
            <bucketCapacity>10</bucketCapacity>
            <refillIntervalInMillis>1000</refillIntervalInMillis>
        </filter>
        <filter class="ch.qos.logback.classic.filter.ThresholdFilter">
            <level>error</level>
        </filter>
        <encoder class="ch.qos.logback.classic.encoder.PatternLayoutEncoder">
            <pattern>%date %-5level %logger{0} - %message%n</pattern>
            <outputPatternAsHeader>true</outputPatternAsHeader>
        </encoder>
    </appender>
    <appender name="async-stdout" class="ch.qos.logback.classic.AsyncAppender">
        <neverBlock>true</neverBlock>
        <appender-ref ref="stdout" />
    </appender>

    <root level="info">
        <appender-ref ref="async-file"/>
        <appender-ref ref="async-stdout"/>
    </root>

    <logger name="org.wikidata.query.rdf" level="info"/>
    <logger name="org.wikidata.query.rdf.blazegraph.inline.literal.AbstractMultiTypeExtension" level="error"/>
    <logger name="com.bigdata" level="warn"/>
    <logger name="com.bigdata.util.concurrent.Haltable" level="off"/>
    <logger name="com.bigdata.rdf.internal.LexiconConfiguration" level="off"/> <!-- disabled temp. ref: T207643 -->

</configuration>
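A tip if your configuration file seems to be ignored: logback can report which configuration it actually loaded. Setting `debug="true"` on the root element makes it print its internal status messages (config file location, appender errors) at startup. A minimal sketch, with the appenders from the config above elided:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<!-- debug="true" prints logback's own status messages to the console,
     including which configuration file it found -->
<configuration debug="true" scan="true" scanPeriod="5 minutes">
    <root level="info"/>
</configuration>
```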

I've passed this to the main script:

LOG_CONFIG=mylogback.xml bash ./runBlazegraph.sh

and... finally, my Blazegraph process no longer outputs noisy (and slow) debugging details.