I need a more efficient way to locate the paragraphs or nuggets that matter to me. What I’d like is a kind of local document index or repository that lets me keep some standing queries and easily locate the sections in my documents that talk about them. Here’s an example:
- I’d like to load in 10 large PDF files, each of say 100 pages. Each PDF contains English text, formatted very nicely into paragraphs and sections.
- I’d like to specify that I am interested in “blogging platforms”, “weaknesses in Ruby”, and “localization and internationalization”.
- Ideally, I’d then look at a list of the sections that include, or seem related to, the words and phrases I specified. Each entry would show the section of text, the name of the document, and other related information.
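To make the example concrete, here is a rough Python sketch of the behavior I have in mind. It assumes the text has already been extracted from the PDFs into plain strings with blank lines between paragraphs, and it only does naive word-overlap matching, so it would miss sections that are merely "related to" a query. The names `search`, `Hit`, `STOPWORDS`, and `min_score` are made up for illustration and do not come from any existing tool.

```python
import re
from dataclasses import dataclass

# Hypothetical stopword list; a real tool would use a fuller one.
STOPWORDS = {"a", "an", "and", "in", "of", "or", "the", "to"}


@dataclass
class Hit:
    query: str         # the standing query that matched
    document: str      # name of the source document
    paragraph_no: int  # 1-based paragraph position in that document
    score: float       # fraction of query terms found in the paragraph
    text: str          # the paragraph itself


def tokenize(text):
    """Lowercase the text and return its set of alphanumeric words."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def search(documents, queries, min_score=0.5):
    """Match every standing query against every paragraph.

    `documents` maps a document name to its already-extracted plain text.
    Paragraphs are assumed to be separated by blank lines.
    """
    hits = []
    for name, text in documents.items():
        paragraphs = [p.strip() for p in re.split(r"\n\s*\n", text) if p.strip()]
        for number, paragraph in enumerate(paragraphs, start=1):
            words = tokenize(paragraph)
            for query in queries:
                terms = tokenize(query) - STOPWORDS
                if not terms:
                    continue
                score = len(terms & words) / len(terms)
                if score >= min_score:
                    hits.append(Hit(query, name, number, score, paragraph))
    # Best matches first, then by document name and position.
    hits.sort(key=lambda h: (-h.score, h.document, h.paragraph_no))
    return hits


if __name__ == "__main__":
    docs = {
        "ruby-notes.txt": "Ruby is pleasant to write.\n\nKnown weaknesses in Ruby include raw speed.",
        "cms-survey.txt": "Blogging platforms vary widely in features.",
    }
    standing_queries = ["blogging platforms", "weaknesses in Ruby"]
    for hit in search(docs, standing_queries):
        print(f"{hit.score:.2f}  {hit.document}  para {hit.paragraph_no}  [{hit.query}]")
        print("    " + hit.text)
```

What I am really after is something that does this at scale, keeps the index around between sessions, and is smarter about relevance than counting shared words.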
I am sure something like this exists. I would call it something like document indexing, document comprehension or structured searching.
Any suggested leads or ideas?