ON THE USE OF SIDE INFORMATION FOR MINING TEXT DATA
Keywords: Text Mining, Side Information, COATES, Clustering, Data Mining.
In many text mining applications, side information is available along with the text documents. This side
information may take the form of links in the documents, web logs that record user access behavior, provenance
information, the links of a document, or any other non-textual attributes embedded in the text document. Such
attributes may contain a large amount of information that is useful for clustering. However, clustering becomes more
difficult when some of this information is noisy. In such cases it is risky to merge side information into the mining
process, because it can either improve the quality of the representation used for mining or add noise to it. A
principled approach is therefore needed so that the mining process uses side information in a way that maximizes its
advantage. This work proposes an efficient algorithm that combines a classical partitioning algorithm with
probabilistic models to create an effective clustering approach. The clustering approach is then extended to the
classification problem and applied to real data sets, which shows the advantages of using such an approach.