Suggest a topic or lightning talk for the Information On Demand 2012 Technical Unconference

Data extraction from unstructured documents while maintaining logical document composition

Data extraction tools that operate on unstructured documents, such as Microsoft Office and IBM Symphony files, ignore the logical composition expressed by document creators. The logical composition of a document refers to its header, footer, body, custom properties, ...

The logical composition is essential when operating on the extracted data. For example, when I am classifying a document, it may matter whether a string of text is found in the header, the footer, or the body.
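
To make the classification example concrete, here is a minimal sketch in Python. The names ExtractedDocument, SECTION_WEIGHTS and score_term, and the weight values themselves, are hypothetical and chosen only for illustration; they do not come from any existing extraction tool.

```python
from dataclasses import dataclass, field


@dataclass
class ExtractedDocument:
    """Extracted text kept separate per logical part (hypothetical structure)."""
    header: str = ""
    body: str = ""
    footer: str = ""
    custom_properties: dict = field(default_factory=dict)


# Illustrative weights only: a match in the header counts more than one in the body.
SECTION_WEIGHTS = {"header": 3.0, "body": 1.0, "footer": 0.5}


def score_term(doc: ExtractedDocument, term: str) -> float:
    """Sum case-insensitive occurrences of term, weighted by the part they appear in."""
    needle = term.lower()
    score = 0.0
    for section, weight in SECTION_WEIGHTS.items():
        text = getattr(doc, section).lower()
        score += weight * text.count(needle)
    return score
```

A classifier could compare such a score against a threshold. The point is only that the score cannot be computed once the extractor has flattened header, body and footer into a single string.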

The idea here is to do data extraction while preserving logical document composition.
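
As one possible illustration of the idea, the sketch below reads a .docx file (which is a ZIP archive of XML parts) with only the Python standard library and keeps header, body and footer text apart. It assumes the usual part names (word/document.xml, word/header*.xml, word/footer*.xml) and the common WordprocessingML namespace; extract_by_part is a hypothetical helper name, and custom properties and other formats such as IBM Symphony are not handled.

```python
import re
import zipfile
import xml.etree.ElementTree as ET

# Namespace of WordprocessingML text runs (assumed; other variants are not handled).
W_NS = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"


def _part_text(xml_bytes: bytes) -> str:
    """Join the text of every w:t element found in one XML part."""
    root = ET.fromstring(xml_bytes)
    return " ".join(t.text for t in root.iter(W_NS + "t") if t.text)


def extract_by_part(docx_file) -> dict:
    """Return extracted text keyed by logical part instead of one flat string."""
    parts = {"header": [], "body": [], "footer": []}
    with zipfile.ZipFile(docx_file) as zf:
        for name in sorted(zf.namelist()):
            if name == "word/document.xml":
                parts["body"].append(_part_text(zf.read(name)))
            elif re.fullmatch(r"word/header\d*\.xml", name):
                parts["header"].append(_part_text(zf.read(name)))
            elif re.fullmatch(r"word/footer\d*\.xml", name):
                parts["footer"].append(_part_text(zf.read(name)))
    return {part: " ".join(texts) for part, texts in parts.items()}
```

Calling extract_by_part("report.docx") (the file name is just an example) would give a dictionary with "header", "body" and "footer" keys, so later steps such as classification can still tell where a string came from.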

13 votes