JNet performance tips
This article covers performance tips for JNet, explains the reasons behind each recommendation, and provides benchmarks to help you decide when to apply them.
Reduce CLR-JVM™ boundary invocations
The library optimizes boundary invocations where possible, but you should avoid excessive method calls across the CLR-JVM™ boundary when performance is a concern. Consider the following code (available in the tests folder):
```csharp
const int execution = 10000;
Java.Util.ArrayList<int> alist = new Java.Util.ArrayList<int>();
for (int i = 0; i < execution; i++)
{
    alist.Add(i);
}
```
This code creates a Java.Util.ArrayList<int> and fills it in a loop, crossing the CLR-JVM™ boundary on every Add call. The cumulative cost of those boundary crossings can be significant
when performance matters.
JNet helper class
The specific example above can be optimized using JNetHelper, which builds a java.util.List
of primitive types from a CLR primitive array in a single boundary-efficient operation.
Array transfer
The optimized approach uses JNetHelper.ListFrom to transfer the entire array at once:
```csharp
const int execution = 10000;
int[] tmpArray = new int[execution];
for (int i = 0; i < execution; i++) tmpArray[i] = i; // populate to match the loop-based example
var tmpJList = JNetHelper.ListFrom(tmpArray);
Java.Util.ArrayList<int> alist = new Java.Util.ArrayList<int>(tmpJList);
```
This transfers the primitive array using only a few boundary invocations. The list is constructed entirely within the JVM™ and then returned to the CLR.
Transfer via java.nio.*Buffer
The same result can be achieved using a `java.nio.*Buffer` (an `IntBuffer` in the examples below) via other overloads of `JNetHelper.ListFrom`. This approach uses shared memory to move data from the CLR to the JVM™:
```csharp
const int execution = 10000;
int[] tmpArray = new int[execution];
for (int i = 0; i < execution; i++) tmpArray[i] = i; // populate to match the loop-based example
var intBuffer = IntBuffer.From(tmpArray, false, false);
var tmpJList = JNetHelper.ListFrom(intBuffer);
Java.Util.ArrayList<int> alist = new Java.Util.ArrayList<int>(tmpJList);
```
or:
```csharp
const int execution = 10000;
int[] tmpArray = new int[execution];
for (int i = 0; i < execution; i++) tmpArray[i] = i; // populate to match the loop-based example
var tmpJList = JNetHelper.ListFrom(tmpArray, true);
Java.Util.ArrayList<int> alist = new Java.Util.ArrayList<int>(tmpJList);
```
The difference between the two is:
- In the first example, the `IntBuffer` is allocated explicitly and can be reused: for instance, refilled with different data and sent to the JVM™ again without creating a new buffer.
- In the second example, an `IntBuffer` is allocated internally by `JNetHelper.ListFrom` on each invocation.
Performance comparison and tips
The JNetTest project (available in the tests folder) measures timing across all approaches. Using the loop-based example as a baseline:
- The `JNetHelper` array transfer approach is approximately 100× faster.
- The `JNetHelper` `java.nio.*Buffer` approach can reach 140× faster.
- Building a `System.Collections.Generic.List<int>` entirely in the CLR can be 1000× faster.
Tip
Three key principles follow from these results:
- When possible, avoid invoking JVM™ methods from the CLR in a tight loop.
- Collapse multiple boundary invocations into as few as possible, just as `JNetHelper` does.
- If a task can be completed entirely in the CLR or entirely in the JVM™, keep it there until a boundary crossing is strictly necessary.
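The batching principle is independent of JNet. A minimal Java sketch using plain `java.nio` (not JNet's API) shows the shape of the optimization: many tiny calls versus one bulk call moving the same data.

```java
import java.nio.ByteBuffer;
import java.util.Arrays;

public class BatchingSketch {
    public static void main(String[] args) {
        byte[] data = new byte[10_000];
        for (int i = 0; i < data.length; i++) data[i] = (byte) i;

        // Many small calls: one put() per element,
        // analogous to one boundary crossing per Add.
        ByteBuffer perElement = ByteBuffer.allocate(data.length);
        for (byte b : data) perElement.put(b);

        // One bulk call: the whole array moves in a single operation.
        ByteBuffer bulk = ByteBuffer.allocate(data.length);
        bulk.put(data);

        System.out.println(Arrays.equals(perElement.array(), bulk.array())); // true
    }
}
```

Both buffers end up identical; the difference is purely in how many calls it took to fill them, which is exactly the cost that multiplies when each call crosses a runtime boundary.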
Memory transfer at CLR-JVM™ boundary
JNet provides APIs to manage data exchange at the CLR-JVM™ boundary using
java.nio.ByteBuffer.
The section above shows one use of these APIs
via JNetHelper. However, java.nio.ByteBuffer can be constructed directly from many CLR input types:
- An `IntPtr`: a native memory pointer, e.g. from COM, unmanaged allocations, or interop scenarios.
- A `System.IO.MemoryStream`: e.g. the output of a JSON serialization step.
- An array of primitive types (`byte`, `short`, `int`, `long`, and so on): e.g. data read from disk or a network socket.
Without ByteBuffer, transferring a System.IO.MemoryStream to the JVM™ involves multiple copies:
1. Extract the data from `System.IO.MemoryStream` into a `byte[]` in the CLR.
2. Allocate the `byte[]` and copy the stream content into it.
3. Transfer the `byte[]` to the JVM™, which requires:
   a. Allocating a new array in the JVM™ of the same length.
   b. Copying the memory from CLR to JVM™.
4. Depending on the JVM™ implementation, step 3b may involve an additional temporary copy at the JNI boundary.
5. The JVM™ can then use the data.
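The stream-to-array copy is easy to reproduce on the JVM™ side alone. As a Java analogue of the `MemoryStream` extraction step, `java.io.ByteArrayOutputStream.toByteArray()` allocates a fresh copy on every call:

```java
import java.io.ByteArrayOutputStream;

public class CopySketch {
    public static void main(String[] args) {
        ByteArrayOutputStream stream = new ByteArrayOutputStream();
        stream.write(42);

        byte[] first = stream.toByteArray();  // copy #1: fresh array allocated
        byte[] second = stream.toByteArray(); // copy #2: another fresh array

        System.out.println(first != second);       // true: distinct allocations
        System.out.println(first[0] == second[0]); // true: same content
    }
}
```

Every extraction like this adds one allocation plus one full memory copy before the boundary transfer even begins.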
With ByteBuffer, the same transfer becomes:
1. A `ByteBuffer` is created directly from the `System.IO.MemoryStream`; no copy is made yet.
2. The `ByteBuffer` reference is passed to the JVM™; the memory is not moved.
3. The JVM™ accesses the CLR memory directly using `get()` or `get(int index)`.
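The "no copy" property mirrors the sharing semantics of plain `java.nio`: a buffer created with `ByteBuffer.wrap` is a view over its backing array, so writes through one are visible through the other. This sketch illustrates the sharing behavior only, not JNet's cross-runtime plumbing:

```java
import java.nio.ByteBuffer;

public class SharedMemorySketch {
    public static void main(String[] args) {
        byte[] backing = new byte[]{1, 2, 3, 4};

        // wrap() does not copy: the buffer is a view over the same memory.
        ByteBuffer view = ByteBuffer.wrap(backing);

        backing[2] = 99;                 // mutate through the array...
        System.out.println(view.get(2)); // ...and the buffer sees it: 99
    }
}
```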
Important
If the JVM™ code needs a `byte[]`, prefer `get(byte[] dst, int offset, int length)` over `get(byte[] dst)`: the former copies in blocks, while the latter copies byte-by-byte.
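In plain `java.nio` terms, the bulk overload drains a region of the buffer into an existing array in a single call:

```java
import java.nio.ByteBuffer;

public class BulkGetSketch {
    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.wrap(new byte[]{10, 20, 30, 40, 50});

        byte[] dst = new byte[3];
        buffer.get(dst, 0, dst.length); // one bulk read of three bytes into dst

        System.out.println(dst[0] + " " + dst[1] + " " + dst[2]); // 10 20 30
    }
}
```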
The impact of array creation
Whether data arrives via JNI array transfer or java.nio.ByteBuffer, if the receiving side needs
to allocate a new primitive array, both the JVM™ and the CLR are heavily impacted by the
allocation and subsequent garbage collection of that array.
Note
The allocation cost is the same regardless of whether the array originates from a JNI boundary
transfer or from a java.nio.ByteBuffer read.
The JNetByteBufferTest project (available in the tests folder) benchmarks the following cases:
1. Transfer using a `byte[]`: used as the baseline.
2. Transfer using `java.nio.ByteBuffer`, allocating a new `byte[]` on each read.
3. Transfer using `java.nio.ByteBuffer`, reusing a previously allocated `byte[]` on each read.
4. Transfer using `java.nio.ByteBuffer` without reading the data: measures pure pointer transfer.
Tests run in both directions (CLR → JVM™ and JVM™ → CLR), repeated many times across different array lengths. Key findings:
- Cases 1 and 2: raw `byte[]` outperforms `java.nio.ByteBuffer` for small arrays in both directions, because the cost of allocating a new `byte[]` on each read offsets the transfer savings.
- Case 3: reusing the pre-allocated `byte[]` gives a 4–5× improvement over `byte[]` transfer in most scenarios. For very small payloads transferred from JVM™ to CLR, `byte[]` still edges ahead.
- Case 4: not directly comparable to the others; useful for measuring the cost of the `ByteBuffer` pointer handoff itself, with no data read overhead.
Tip
If your code only needs sparse or indexed access to transferred memory, use java.nio.ByteBuffer
and read individual elements via get(int index) — no array allocation, no memory copy,
maximum throughput.
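Indexed access in plain `java.nio` is the absolute `get(int index)` overload, which reads one element without allocating anything and without disturbing the buffer's position:

```java
import java.nio.ByteBuffer;

public class IndexedAccessSketch {
    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.wrap(new byte[]{5, 6, 7, 8});

        // Absolute get(int index): one element, no array allocation,
        // and the buffer's position is left untouched.
        byte third = buffer.get(2);

        System.out.println(third);             // 7
        System.out.println(buffer.position()); // 0
    }
}
```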
Performance and tips
Memory allocation is a key performance factor because both the JVM™ and the CLR must find, track, and eventually garbage-collect every allocated array.
Tip
Where possible, reuse previously allocated arrays rather than creating new ones on each iteration. This reduces allocation pressure on both the JVM™ GC and the CLR GC.
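In plain `java.nio` terms, the reuse pattern is: allocate the destination array once, then refill it on every pass with a bulk `get`, rewinding the buffer between reads.

```java
import java.nio.ByteBuffer;

public class ReuseSketch {
    public static void main(String[] args) {
        ByteBuffer buffer = ByteBuffer.wrap(new byte[]{1, 2, 3, 4});

        byte[] dst = new byte[4];               // allocated once, outside the loop
        for (int pass = 0; pass < 3; pass++) {
            buffer.rewind();                    // reset position for the next read
            buffer.get(dst, 0, dst.length);     // refill the same array: no new allocation
        }

        System.out.println(dst[3]); // 4
    }
}
```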
Execute iterations in parallel
The Performance comparison and tips section recommends
minimizing CLR-JVM™ boundary crossings, and JNetHelper addresses this for primitives.
However, in many real-world scenarios you must iterate over non-primitive objects. To illustrate,
consider the following snippet:
```csharp
ArrayList<Java.Lang.String> alist = GetAnArrayListOfString();
foreach (Java.Lang.String item in alist)
{
    // EXPENSIVE OPERATION OVER item
}
```
With a standard iterator, the CLR requests the next Java.Lang.String via Java.Lang.Iterable,
executes the expensive operation, then requests the next item — the JVM™ is idle for the entire
duration of the operation. This means:
- Each EXPENSIVE OPERATION must wait for the next item to be fetched from the JVM™.
- While the EXPENSIVE OPERATION runs, the JVM™ sits idle.
The WithPrefetch extension method solves this by fetching the next item from the JVM™ in
parallel while the current operation is still running:
```csharp
ArrayList<Java.Lang.String> alist = GetAnArrayListOfString();
foreach (Java.Lang.String item in alist.WithPrefetch())
{
    // EXPENSIVE OPERATION OVER item
}
```
For a further improvement, combine WithPrefetch with WithThread, which offloads the prefetch
to a dedicated native thread:
```csharp
ArrayList<Java.Lang.String> alist = GetAnArrayListOfString();
foreach (Java.Lang.String item in alist.WithPrefetch().WithThread())
{
    // EXPENSIVE OPERATION OVER item
}
```
WithThread creates an external native thread responsible for driving the prefetch loop,
keeping the fetch pipeline fully independent from the CLR thread executing the operation.
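The prefetch idea can be sketched outside JNet as an iterator wrapper that requests the next element on a background thread while the caller processes the current one. The `PrefetchIterator` class below is illustrative only, not JNet's `WithPrefetch` implementation:

```java
import java.util.Iterator;
import java.util.List;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Illustrative sketch only: not JNet's actual implementation.
class PrefetchIterator<T> implements Iterator<T> {
    private final Iterator<T> inner;
    private final ExecutorService fetcher = Executors.newSingleThreadExecutor();
    private Future<T> next; // the element currently being fetched in the background

    PrefetchIterator(Iterator<T> inner) {
        this.inner = inner;
        advance();
    }

    private void advance() {
        next = inner.hasNext() ? fetcher.submit(inner::next) : null;
        if (next == null) fetcher.shutdown();
    }

    @Override public boolean hasNext() { return next != null; }

    @Override public T next() {
        try {
            T current = next.get(); // wait for the background fetch to complete
            advance();              // immediately start fetching the following item
            return current;
        } catch (Exception e) {
            throw new RuntimeException(e);
        }
    }
}

public class PrefetchSketch {
    public static void main(String[] args) {
        StringBuilder out = new StringBuilder();
        Iterator<String> it = new PrefetchIterator<>(List.of("a", "b", "c").iterator());
        while (it.hasNext()) out.append(it.next()); // expensive work would go here
        System.out.println(out); // abc
    }
}
```

While the caller works on `current`, the executor thread is already fetching the following item, so the fetch latency overlaps with the expensive per-item operation instead of adding to it.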
Tip
Use WithPrefetch and WithThread when the number of items is large and the per-item
operation is expensive. For short iterations or cheap operations, the overhead of allocating
the native thread and managing the second iterator may exceed the gains from parallelism.