I am trying to figure out how to run the Clinical Document Pipeline from Java. I have a set of clinical documents as plain text files. I want to parse these documents and extract, for each document doc_ID, the list of CUIs it contains along with each CUI's frequency freq. I spent several days installing cTAKES and looking for a solution. I narrowed it down to ClinicalPipelineWithUmls.java, which takes a test text and runs SimplePipeline with an AnalysisEngineDescription. Here is part of the code:
String documentText = "Text of document to test goes here, such as the following. No edema, some soreness, denies pain.";
InputStream inStream = InputStreamCollectionReader.convertToByteArrayInputStream(documentText);
CollectionReader collectionReader = InputStreamCollectionReader.getCollectionReader(inStream);
AnalysisEngineDescription pipelineIncludingUmlsDictionaries = AnalysisEngineFactory.createAnalysisEngineDescription(
        "desc/analysis_engine/AggregatePlaintextUMLSProcessor");
AnalysisEngineDescription xWriter = AnalysisEngineFactory.createPrimitiveDescription(
        XWriter.class,
        XWriter.PARAM_OUTPUT_DIRECTORY_NAME,
        AssertionConst.evalOutputDir,
        XWriter.PARAM_XML_SCHEME_NAME,
        XWriter.XMI,
        XWriter.PARAM_FILE_NAMER_CLASS_NAME,
        CtakesFileNamer.class.getName());
SimplePipeline.runPipeline(collectionReader, pipelineIncludingUmlsDictionaries, xWriter);
System.out.println("Done at " + new Date());
The problem is that it cannot find "InputStreamCollectionReader". I searched for it, but with no success so far. Could you please give me a hint or point me in the right direction? Thanks for any help!
Is there any particular reason why you want to use InputStreamCollectionReader? If not, there are examples of how to use TextReader here.
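For the per-document CUI counts the question asks for, the tallying step itself can be sketched independently of cTAKES. The snippet below is only a minimal sketch: it assumes the CUIs have already been pulled out of each document's annotations into a list of strings, and the class name CuiFrequency, the method countCuis, the document IDs, and the CUI values are all hypothetical placeholders rather than anything from the cTAKES API.

```java
import java.util.Arrays;
import java.util.List;
import java.util.Map;
import java.util.TreeMap;

// Hypothetical helper, not part of cTAKES: tallies CUIs per document.
class CuiFrequency {

    // Counts how often each CUI occurs in one document's list of extracted CUIs.
    // A TreeMap keeps the CUIs in sorted order so the output is stable.
    static Map<String, Integer> countCuis(List<String> cuis) {
        Map<String, Integer> freq = new TreeMap<>();
        for (String cui : cuis) {
            freq.merge(cui, 1, Integer::sum);
        }
        return freq;
    }

    public static void main(String[] args) {
        // Placeholder input: in a real run these lists would be filled from
        // the annotations the pipeline produces for each document.
        Map<String, List<String>> cuisByDoc = new TreeMap<>();
        cuisByDoc.put("doc_1", Arrays.asList("C0013604", "C0030193", "C0013604"));
        cuisByDoc.put("doc_2", Arrays.asList("C0030193"));

        // Print one tab-separated line per (doc_ID, CUI, freq) triple.
        for (Map.Entry<String, List<String>> doc : cuisByDoc.entrySet()) {
            for (Map.Entry<String, Integer> e : countCuis(doc.getValue()).entrySet()) {
                System.out.println(doc.getKey() + "\t" + e.getKey() + "\t" + e.getValue());
            }
        }
    }
}
```

Keeping the counting separate from the pipeline means it works the same way whichever collection reader ends up supplying the documents.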