Rework janino scanner to make it lighter in mem and faster #7190

Open
rmannibucau wants to merge 1 commit into apache:main from rmannibucau:dev/rework-janino-function-scanner
rmannibucau commented May 29, 2026

Some figures from a small JMH benchmark, with only the prefiltering optimization applied. We could still filter jars on the fly to avoid loading them all into memory, but once the jars are prefiltered that has less impact. In terms of memory, we can save dozens of megabytes (~50 MB on a test app).

| Metric | guiceClassPath | xbeanFinder | Improvement |
|---|---|---|---|
| Throughput (ops/ms) | 0.069 | 0.378 | +448% (5.48× higher) |
| Average Time (ms/op) | 14.251 | 2.624 | 81.6% faster |
| Sample Average (ms/op) | 14.447 | 2.583 | 82.1% faster |
| p50 (ms) | 14.107 | 2.556 | 81.9% faster |
| p90 (ms) | 15.734 | 2.654 | 83.1% faster |
| p95 (ms) | 16.386 | 2.736 | 83.3% faster |
| p99 (ms) | 19.360 | 3.014 | 84.4% faster |
| p99.9 (ms) | 26.313 | 6.582 | 75.0% faster |
| Max (ms) | 26.313 | 6.668 | 74.7% faster |
| Single Shot (ms/op) | 15.813 | 3.087 | 80.5% faster |
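
For readers checking the Improvement column, it follows directly from the two measured columns. A minimal sketch of the arithmetic, using figures copied from the table above (the class and method names are illustrative only and are not part of this PR):

```java
import java.util.Locale;

public class ImprovementMath {

  /** Factor by which throughput increased (higher is better). */
  public static double speedup(double baselineOpsPerMs, double candidateOpsPerMs) {
    return candidateOpsPerMs / baselineOpsPerMs;
  }

  /** Percentage reduction in time per operation (lower time is better). */
  public static double timeReductionPercent(double baselineMs, double candidateMs) {
    return (1.0 - candidateMs / baselineMs) * 100.0;
  }

  public static void main(String[] args) {
    // Throughput row: guiceClassPath = 0.069 ops/ms, xbeanFinder = 0.378 ops/ms
    System.out.printf(Locale.ROOT, "throughput: %.2fx%n", speedup(0.069, 0.378));
    // Average Time row: guiceClassPath = 14.251 ms/op, xbeanFinder = 2.624 ms/op
    System.out.printf(
        Locale.ROOT, "avg time: %.1f%% faster%n", timeReductionPercent(14.251, 2.624));
  }
}
```

The throughput factor of about 5.48 corresponds to the +448% in the table, since (5.48 - 1) × 100 ≈ 448.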

The benchmark code:

```java
package bench;

import com.google.common.reflect.ClassPath;
import java.io.IOException;
import java.lang.reflect.Method;
import java.util.Arrays;
import java.util.Collection;
import java.util.Set;
import java.util.concurrent.TimeUnit;
import java.util.stream.Collectors;
import java.util.stream.Stream;
import org.apache.hop.pipeline.transforms.janino.function.JaninoFunction;
import org.apache.hop.pipeline.transforms.janino.scanner.ClassLoaderScanner;
import org.openjdk.jmh.annotations.Benchmark;
import org.openjdk.jmh.annotations.BenchmarkMode;
import org.openjdk.jmh.annotations.Fork;
import org.openjdk.jmh.annotations.Measurement;
import org.openjdk.jmh.annotations.Mode;
import org.openjdk.jmh.annotations.OutputTimeUnit;
import org.openjdk.jmh.annotations.Scope;
import org.openjdk.jmh.annotations.Setup;
import org.openjdk.jmh.annotations.State;
import org.openjdk.jmh.annotations.Warmup;
import org.openjdk.jmh.infra.Blackhole;

@BenchmarkMode(Mode.All)
@OutputTimeUnit(TimeUnit.MILLISECONDS)
@Warmup(iterations = 2, time = 2)
@Measurement(iterations = 3, time = 2)
@Fork(0)
@State(Scope.Thread)
public class ScannerBenchmark {

  private static final String PACKAGE = "org.apache.hop.pipeline.transforms.janino.function";

  private ClassLoaderScanner scanner;
  private ClassLoader classLoader;

  @Setup
  public void setup() {
    scanner = new ClassLoaderScanner();
    classLoader = Thread.currentThread().getContextClassLoader();
  }

  @Benchmark
  public void xbeanFinder(Blackhole bh) throws IOException {
    Collection<Method> methods =
        scanner.findMethodsWithAnnotationInPackage(
            classLoader, PACKAGE, JaninoFunction.class);
    bh.consume(methods);
  }

  @SuppressWarnings("deprecation")
  @Benchmark
  public void guiceClassPath(Blackhole bh) throws IOException {
    var classes = findAllClassesUsingGoogleGuice(classLoader, PACKAGE);
    var methods =
        classes.stream()
            .flatMap(c -> Arrays.stream(c.getDeclaredMethods()))
            .filter(m -> m.isAnnotationPresent(JaninoFunction.class))
            .collect(Collectors.toList());
    bh.consume(methods);
  }

  @Deprecated // shouldn't be public, kept for legacy and external usage
  public Set<Class<?>> findAllClassesUsingGoogleGuice(ClassLoader classLoader, String packageName)
      throws IOException {
    return ClassPath.from(classLoader).getAllClasses().stream()
        .filter(clazz -> clazz.getPackageName().contains(packageName))
        .flatMap(
            clazz -> {
              try {
                return Stream.of(clazz.load());
              } catch (Exception | Error e) {
                return Stream.empty();
              }
            })
        .collect(Collectors.toSet());
  }
}
```

WARNING: there are three known differences with Guava scanning:

  1. Manifest classpath entries are not respected. Note that this value is often broken anyway and is unlikely to be relied on at runtime.
  2. Escaping of spaces can differ slightly, but this stems from an old JRE bug (URL).
  3. Jars are filtered by name prefix (using ClassLoaderScanner.ignored-jars.txt), so the name of a jar now matters.
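
To illustrate the third point, here is a minimal sketch of prefix-based jar filtering. It is not the implementation from this PR: the class name, the example prefixes, the jar names, and the matching rule (the jar's file name starts with a listed prefix) are all assumptions made for illustration, and the actual format of ClassLoaderScanner.ignored-jars.txt is not shown in this description.

```java
import java.util.List;
import java.util.stream.Collectors;

/** Hypothetical helper: skips jars whose file name starts with an ignored prefix. */
public class JarPrefixFilter {

  private final List<String> ignoredPrefixes;

  public JarPrefixFilter(List<String> ignoredPrefixes) {
    this.ignoredPrefixes = List.copyOf(ignoredPrefixes);
  }

  /** Returns the last path segment, e.g. "foo.jar" for "/lib/foo.jar". */
  static String fileName(String path) {
    int slash = Math.max(path.lastIndexOf('/'), path.lastIndexOf('\\'));
    return path.substring(slash + 1);
  }

  /** True when the jar's file name starts with one of the ignored prefixes. */
  public boolean isIgnored(String jarPath) {
    String name = fileName(jarPath);
    return ignoredPrefixes.stream().anyMatch(name::startsWith);
  }

  /** Keeps only the jars that should still be scanned. */
  public List<String> keep(List<String> jarPaths) {
    return jarPaths.stream().filter(p -> !isIgnored(p)).collect(Collectors.toList());
  }

  public static void main(String[] args) {
    // Assumed prefixes and jar names, for illustration only.
    JarPrefixFilter filter = new JarPrefixFilter(List.of("guava-", "commons-"));
    List<String> jars =
        List.of("/lib/guava-33.0.jar", "/lib/hop-janino.jar", "/lib/my-commons-fork.jar");
    System.out.println(filter.keep(jars));
  }
}
```

In this sketch, "my-commons-fork.jar" is still scanned because its name does not start with "commons-", while a jar containing annotated functions would be silently skipped if it were renamed to match an ignored prefix. That is the sense in which jar names become sensitive.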

@rmannibucau rmannibucau force-pushed the dev/rework-janino-function-scanner branch 4 times, most recently from 7a7a450 to 833ad77 Compare May 29, 2026 16:10
@rmannibucau rmannibucau force-pushed the dev/rework-janino-function-scanner branch from 833ad77 to 33ed904 Compare May 29, 2026 16:13
hansva commented May 31, 2026

awesome work! will review and merge
