org.htmlcleaner
Class HtmlTokenizer

java.lang.Object
  extended by org.htmlcleaner.HtmlTokenizer

public abstract class HtmlTokenizer
extends Object

Main HTML tokenizer.

Its task is to parse HTML and produce a list of valid tokens: open tag tokens, end tag tokens, content (text), and comments. As soon as a new item is added to the token list, the cleaner is invoked to clean the current list at its end.
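
To make the four token kinds concrete, here is a minimal, standalone sketch of splitting markup into open tag, end tag, text, and comment tokens. It is not HtmlCleaner's implementation and does not use its API: the names SimpleTokenizerSketch, Kind, Token, and tokenize are hypothetical, attributes are discarded, and DOCTYPE declarations, scripts, and the cleaning step performed after each token is added are all left out.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical illustration only; this is not org.htmlcleaner.HtmlTokenizer.
final class SimpleTokenizerSketch {

    /** The four token kinds named in the class description. */
    public enum Kind { OPEN_TAG, END_TAG, TEXT, COMMENT }

    /** A token: its kind plus the tag name, text, or comment body. */
    public static final class Token {
        public final Kind kind;
        public final String value;

        Token(Kind kind, String value) {
            this.kind = kind;
            this.value = value;
        }

        @Override
        public String toString() {
            return kind + "(" + value + ")";
        }
    }

    /** Scans the input left to right and appends one token per construct. */
    public static List<Token> tokenize(String html) {
        List<Token> tokens = new ArrayList<>();
        int pos = 0;
        while (pos < html.length()) {
            if (html.startsWith("<!--", pos)) {
                // Comment: everything up to "-->", or the rest if unterminated.
                int end = html.indexOf("-->", pos + 4);
                if (end < 0) {
                    end = html.length();
                }
                tokens.add(new Token(Kind.COMMENT, html.substring(pos + 4, end)));
                pos = Math.min(end + 3, html.length());
            } else if (html.charAt(pos) == '<' && html.indexOf('>', pos) > pos) {
                // Tag: a leading '/' marks an end tag; otherwise keep the name only.
                int end = html.indexOf('>', pos);
                String inside = html.substring(pos + 1, end).trim();
                if (inside.startsWith("/")) {
                    tokens.add(new Token(Kind.END_TAG, inside.substring(1).trim()));
                } else {
                    String name = inside.split("[\\s/]+")[0];
                    tokens.add(new Token(Kind.OPEN_TAG, name));
                }
                pos = end + 1;
            } else {
                // Text: runs until the next '<' or the end of input.
                int next = html.indexOf('<', pos + 1);
                if (next < 0) {
                    next = html.length();
                }
                tokens.add(new Token(Kind.TEXT, html.substring(pos, next)));
                pos = next;
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        for (Token t : tokenize("<p>Hello <b>world</b><!-- note --></p>")) {
            System.out.println(t);
        }
    }
}
```

A real tokenizer reads from a Reader rather than a String and must cope with malformed markup; the sketch only shows how a flat token list of the four kinds can be built.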


Constructor Summary
HtmlTokenizer(Reader reader, CleanerProperties props, CleanerTransformations transformations, ITagInfoProvider tagInfoProvider)
          Constructor: creates an instance of the parser with the specified content.
 
Method Summary
 DoctypeToken getDocType()
           
 
Methods inherited from class java.lang.Object
clone, equals, finalize, getClass, hashCode, notify, notifyAll, toString, wait, wait, wait
 

Constructor Detail

HtmlTokenizer

public HtmlTokenizer(Reader reader,
                     CleanerProperties props,
                     CleanerTransformations transformations,
                     ITagInfoProvider tagInfoProvider)
              throws IOException
Constructor: creates an instance of the parser with the specified content.

Parameters:
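reader - reader supplying the HTML content to parse
props - cleaner properties
transformations - cleaner transformations
tagInfoProvider - tag info provider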
Throws:
IOException

Method Detail

getDocType

public DoctypeToken getDocType()


Copyright © 2006-2011. All Rights Reserved.