Implementation of Automatic Wrapper Adaptation System Using DOM Tree for Web Mining

!!!! Bi-Annual Double Blind Peer Reviewed Refereed Journal !!!!

!!!! Open Access Journal !!!!


A. A. Tekale - Department of Computer Engineering, ZES’S DCOER, Pune

Dr. Rajesh Prasad - Department of Computer Engineering, ZES’S DCOER, Pune

S. S. Nandgaonkar - Department of Computer Engineering, VPCOE, Baramati, Pune


Extracting precise information from Web sites is a useful way to obtain structured data from unstructured or semi-structured sources, and the resulting data supports further intelligent processing. Wrappers are common information extraction systems that transform largely unstructured information into structured data. The method presented in this paper is intended for extracting Web data. Some existing techniques require manually prepared training data, while others need no manual intervention. A wrapper generated for one site cannot be applied directly to a new site, even if the domain is the same. Some methods extract only the data attributes specified in the wrapper, but unseen Web pages may contain additional attributes that need to be identified. Automatically adapting the information extraction knowledge to a new, unseen site while also discovering previously unseen attributes is a challenging task. The proposed system learns information extraction knowledge for a new Web site automatically and also discovers new attributes.
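
To make the wrapper problem concrete, the following is a minimal illustrative sketch, not the system described in this paper. It assumes a wrapper can be modelled as a mapping from attribute names to fixed paths in the DOM tree. The page snippets, the attribute names, and the function `apply_wrapper` are all hypothetical. The sketch shows a wrapper built for one site extracting a structured record there, then returning nothing on a second site from the same domain whose DOM layout differs and which carries an extra attribute.

```python
import xml.etree.ElementTree as ET

# Hypothetical wrapper for "site A": attribute name -> path in the DOM tree.
WRAPPER_SITE_A = {
    "title": "./body/div/h1",
    "price": "./body/div/span",
}

# Hypothetical, well-formed page snippets from two sites in the same domain.
PAGE_SITE_A = (
    "<html><body><div>"
    "<h1>Data Mining Basics</h1><span>$30</span>"
    "</div></body></html>"
)
PAGE_SITE_B = (
    "<html><body><table><tr>"
    "<td>Web Mining Primer</td><td>$25</td><td>Paperback</td>"
    "</tr></table></body></html>"
)


def apply_wrapper(page, wrapper):
    """Extract one record by following each attribute's fixed DOM path."""
    root = ET.fromstring(page)
    record = {}
    for attribute, path in wrapper.items():
        node = root.find(path)
        # A path that does not exist in this page's DOM tree yields None.
        if node is not None and node.text:
            record[attribute] = node.text.strip()
        else:
            record[attribute] = None
    return record


record_a = apply_wrapper(PAGE_SITE_A, WRAPPER_SITE_A)
record_b = apply_wrapper(PAGE_SITE_B, WRAPPER_SITE_A)
print(record_a)
print(record_b)
```

On the second page every path misses, and the third cell (a binding type the wrapper never named) is ignored. These are the two gaps the abstract identifies: adapting extraction knowledge to a new site and discovering previously unseen attributes.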
