OpenOfficeParser.java from Kneobase at Krugle


/*
 * Created on 07/10/2004
 *
 */
package com.kneobase.extractors.parser;

import java.io.IOException;
import java.io.InputStream;
import java.text.ParseException;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;

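/**
 * Parses OpenOffice documents by locating the content.xml entry of the
 * ZIP archive and handing it to a TaggedParser.
 *
 * @author Ernesto De Santis
 */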
public class OpenOfficeParser extends A_StringParser {

    public ParsedBody parse(InputStream is) throws IOException, ParseException {
        I_BodyParser xmlParser = new TaggedParser();
        return xmlParser.parse(getContent(is));
    }
    
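    /**
     * Returns the ZIP stream positioned at the start of the content.xml entry.
     */
    private InputStream getContent(InputStream is) throws IOException {
        ZipInputStream zipInputStream = new ZipInputStream(is);

        // Fail fast instead of handing an exhausted stream to the XML parser.
        if (getContentZipEntry(zipInputStream) == null) {
            throw new IOException("content.xml not found in archive");
        }

        return zipInputStream;
    }

    /**
     * Advances the stream to the content.xml entry; returns null if there is none.
     */
    private ZipEntry getContentZipEntry(ZipInputStream zipInputStream) throws IOException {
        ZipEntry zipEntry;
        do {
            zipEntry = zipInputStream.getNextEntry();
        } while (zipEntry != null && !zipEntry.getName().equals("content.xml"));

        return zipEntry;
    }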
}
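
The entry-seeking technique used by the parser can be tried in isolation. The following is a minimal sketch, not part of Kneobase: the class name ContentXmlDemo, both method names, and the sample archive contents (including the placeholder mimetype string) are assumptions made for illustration. It builds a small ZIP in memory, walks the entries with ZipInputStream until content.xml is reached, and reads that entry as text.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.InputStream;
import java.nio.charset.StandardCharsets;
import java.util.zip.ZipEntry;
import java.util.zip.ZipInputStream;
import java.util.zip.ZipOutputStream;

// Hypothetical demo class; none of these names come from Kneobase.
public class ContentXmlDemo {

    // Builds a tiny in-memory ZIP with two entries, loosely mimicking
    // the layout of an OpenOffice file (contents are placeholders).
    static byte[] buildSampleArchive() throws IOException {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (ZipOutputStream zip = new ZipOutputStream(bytes)) {
            zip.putNextEntry(new ZipEntry("mimetype"));
            zip.write("application/x-sample".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();

            zip.putNextEntry(new ZipEntry("content.xml"));
            zip.write("<doc>hello</doc>".getBytes(StandardCharsets.UTF_8));
            zip.closeEntry();
        }
        return bytes.toByteArray();
    }

    // Walks the archive entry by entry; once content.xml is the current
    // entry, reads from the ZIP stream until that entry is exhausted.
    static String readContentXml(InputStream is) throws IOException {
        try (ZipInputStream zip = new ZipInputStream(is)) {
            ZipEntry entry;
            while ((entry = zip.getNextEntry()) != null) {
                if ("content.xml".equals(entry.getName())) {
                    ByteArrayOutputStream out = new ByteArrayOutputStream();
                    byte[] buffer = new byte[4096];
                    int n;
                    // read() returns -1 at the end of the current entry.
                    while ((n = zip.read(buffer)) != -1) {
                        out.write(buffer, 0, n);
                    }
                    return new String(out.toByteArray(), StandardCharsets.UTF_8);
                }
            }
        }
        throw new IOException("content.xml not found in archive");
    }

    public static void main(String[] args) throws IOException {
        InputStream in = new ByteArrayInputStream(buildSampleArchive());
        System.out.println(readContentXml(in)); // prints <doc>hello</doc>
    }
}
```

Because ZipInputStream reads only up to the end of the current entry, a consumer that receives the positioned stream sees just the bytes of content.xml, which is presumably why the parser above can pass the stream straight to its XML parser.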


Kneobase

Kneobase is an enterprise search engine built on the Lucene search engine and the Spring framework. It performs full-text search across many different content sources. It is highly adaptable out of the box and has a pluggable architecture.

Project homepage: http://sourceforge.net/projects/kneobase
Programming language(s): Java, XML
License: other

  A_StringParser.java
  ExcelPOIParser.java
  ExcelParser.java
  HtmlJTidyParser.java
  HtmlParser.java
  OpenOfficeParser.java
  PdfBoxParser.java
  PdfParser.java
  PlainParser.java
  PptPOIParser.java
  PptParser.java
  RtfParser.java
  TaggedParser.java
  WordPOIParser.java
  WordParser.java
  WordTextMiningParser.java
  XmlParser.java
  XmlSAXParser.java