ABSTRACT

Text mining, also known as knowledge discovery from text and as document information mining, refers to the process of extracting interesting patterns from very large text corpora for the purpose of discovering knowledge. Text mining is an interdisciplinary field involving information retrieval, text understanding, information extraction, clustering, categorization, visualization, database technology, machine learning, and data mining. Regarded by many as the next wave of knowledge discovery, text mining has very high commercial value. This paper presents a general framework for text mining consisting of two stages: text refining, which transforms unstructured text documents into an intermediate form, and knowledge distillation, which deduces patterns or knowledge from the intermediate form. I then explain two text refining methods, information retrieval and information extraction. Next, I survey different document representation methods and algorithms, compare them, and note some of their advantages and limitations. I then survey the state-of-the-art text mining approaches, products, and applications, aligning them by their text refining and knowledge distillation functions as well as by the intermediate form they adopt. Finally, I highlight the upcoming challenges of text mining and the opportunities it offers, and give a short conclusion.

1 INTRODUCTION

Text mining, also known as text data mining [25] or knowledge discovery from textual databases [19], is an emerging technology for analyzing large collections of unstructured documents for the purpose of extracting interesting and non-trivial patterns or knowledge. It can be envisaged as a leap from data mining or knowledge discovery from structured databases [17, 58]. As the most natural form of storing and exchanging information is written words, text mining has very high commercial potential. In fact, a recent study indicated that 80% of a company's information was contained in text documents such as emails, memos, customer correspondence, and reports. The ability to distil this untapped source of