This is a mirror of the official site: http://jasper-net.blogspot.com/

Using the Tika Java Library In Your .Net Application With IKVM

Wednesday, April 27, 2011
This may sound scary and heretical but did you know it is possible to leverage Java libraries from .Net applications with no TCP sockets or web services getting caught in the crossfire? Let me introduce you to IKVM, which is frankly magic:

IKVM.NET is an implementation of Java for Mono and the Microsoft .NET Framework. It includes the following components:
  • A Java Virtual Machine implemented in .NET
  • A .NET implementation of the Java class libraries
  • Tools that enable Java and .NET interoperability

Using IKVM we have been able to successfully integrate our Dovetail Seeker search application with the Tika text extraction library, which is implemented in Java. With Tika we can easily pull text out of rich documents in many supported formats. Why Tika? Because there is nothing comparable to Tika in the .Net world.
This post will review how we integrated with Tika. If you like code, you can find this example in a repo up on GitHub.

Compiling a Jar Into An Assembly

First, we need to get our hands on the latest version of Tika. I downloaded and built the Tika source using Maven as instructed. The result was a few jar files. The one we are interested in is tika-app-x.x.jar, which has everything we need bundled into one useful container.
Next up, we need to convert the jar we’ve built into a .Net assembly. Do this using ikvmc.exe.

tika\build>ikvmc.exe -target:library tika-app-0.7.jar

Unfortunately, you will see tons of troublesome-looking warnings, but the end result is a .Net assembly wrapping the Java jar, which you can reference in your projects.
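
If you rebuild the assembly whenever a new Tika version comes out, it may help to generate the command line instead of typing it. What follows is only a sketch: ikvmc_command is a made-up helper name, and it assumes you want the assembly named after the jar. The -out: option is ikvmc's switch for naming the output file explicitly.

```shell
#!/bin/sh
# Sketch only: build the ikvmc command line for a given jar.
# ikvmc_command is a hypothetical helper, not part of IKVM.

ikvmc_command() {
    jar="$1"
    dll="${jar%.jar}.dll"   # tika-app-0.7.jar -> tika-app-0.7.dll
    # -target:library asks for a DLL; -out: names the output file explicitly.
    printf 'ikvmc.exe -target:library -out:%s %s\n' "$dll" "$jar"
}

# Print the command rather than running it, so it can be reviewed first.
ikvmc_command "tika-app-0.7.jar"
```

The script only prints the command. Paste the printed line into a prompt in the folder that holds the jar, as in the example above.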

Using Tika From .Net

IKVM is pretty transparent. You simply reference the Tika app assembly and your .Net code is talking to Java types. It is a bit weird at first, as you have Java versions of types and .Net versions. Next you’ll want to make sure that all the dependent IKVM runtime assemblies are included with your project. Using Reflector I found that the Tika app assembly referenced a lot of IKVM assemblies which do not appear to be used. I had to figure out through trial and error which assemblies were not being touched by the rich document extractions being done. If need be, you could simply include all of the referenced IKVM assemblies with your application. Below I have done the work for you and eliminated the references to all the IKVM assemblies which do not appear to be in play.
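
This is not the trimmed list. If you would rather skip the trial and error, the fallback of shipping every IKVM assembly could be scripted roughly like this. It is a sketch under assumptions: copy_ikvm_assemblies and both folder arguments are hypothetical, and it assumes the runtime assemblies in your IKVM folder match the pattern IKVM.*.dll (for example IKVM.Runtime.dll and the IKVM.OpenJDK.* assemblies).

```shell
#!/bin/sh
# Sketch only: copy every IKVM runtime assembly next to the application.
# copy_ikvm_assemblies and both directory arguments are hypothetical.

copy_ikvm_assemblies() {
    ikvm_bin="$1"   # folder holding the IKVM binaries (assumed layout)
    app_bin="$2"    # your application's output folder
    mkdir -p "$app_bin" || return 1
    count=0
    for dll in "$ikvm_bin"/IKVM.*.dll; do
        [ -f "$dll" ] || continue   # skip the unexpanded pattern if nothing matched
        cp "$dll" "$app_bin"/ || return 1
        count=$((count + 1))
    done
    echo "$count"   # number of assemblies copied
}

# Example call (paths are placeholders):
# copy_ikvm_assemblies ./ikvm/bin ./MyApp/bin/Release
```

Copying everything costs some disk space, but it avoids a missing-assembly error at run time for a document format you did not test.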
