HTML parser based on HTMLEditorKit.ParserCallback






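import java.io.InputStreamReader;
import java.io.Reader;
import java.net.URL;
import java.nio.charset.StandardCharsets;

import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

public class Main {
  public static void main(String[] args) throws Exception {
    if (args.length != 1) {
      System.err.println("Usage: java Main <url>");
      return;
    }
    URL url = new URL(args[0]);
    // openStream() avoids the unchecked cast of getContent() to InputStream.
    // UTF-8 is assumed here; the page may actually use a different encoding.
    try (Reader reader = new InputStreamReader(url.openStream(), StandardCharsets.UTF_8)) {
      // true = ignore a charset declared inside the document instead of
      // aborting the parse with a ChangedCharSetException.
      new ParserDelegator().parse(reader, new HTMLParse(), true);
    }
  }
}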

// Prints one line per parser event:
// "+" start tag, "*" simple (empty) tag, "-" end tag, plain text as-is.
class HTMLParse extends HTMLEditorKit.ParserCallback {
  @Override
  public void handleText(char[] data, int pos) {
    // println(char[]) prints the characters, not the array reference.
    System.out.println(data);
  }

  @Override
  public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
    System.out.println("+" + t);
  }

  @Override
  public void handleSimpleTag(HTML.Tag t, MutableAttributeSet a, int pos) {
    System.out.println("*" + t);
  }

  @Override
  public void handleEndTag(HTML.Tag t, int pos) {
    System.out.println("-" + t);
  }
}
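
To try the callback without a network connection, the same parser can be fed from a string. The following is a minimal sketch, not part of the original example: the class name StringParseDemo and its parse helper are hypothetical, and the events are collected into a list rather than printed so they can be inspected afterwards.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;

import javax.swing.text.MutableAttributeSet;
import javax.swing.text.html.HTML;
import javax.swing.text.html.HTMLEditorKit;
import javax.swing.text.html.parser.ParserDelegator;

// Hypothetical demo class: parses HTML held in a String.
class StringParseDemo {

  // Parses the given HTML and returns the callback events in the order
  // the parser reported them, using the same "+", "*", "-" prefixes.
  static List<String> parse(String html) throws IOException {
    final List<String> events = new ArrayList<>();
    HTMLEditorKit.ParserCallback callback = new HTMLEditorKit.ParserCallback() {
      @Override
      public void handleText(char[] data, int pos) {
        events.add(new String(data));
      }

      @Override
      public void handleStartTag(HTML.Tag t, MutableAttributeSet a, int pos) {
        events.add("+" + t);
      }

      @Override
      public void handleSimpleTag(HTML.Tag t, MutableAttributeSet a, int pos) {
        events.add("*" + t);
      }

      @Override
      public void handleEndTag(HTML.Tag t, int pos) {
        events.add("-" + t);
      }
    };
    // A StringReader has no byte encoding, so charset declarations are ignored.
    new ParserDelegator().parse(new StringReader(html), callback, true);
    return events;
  }

  public static void main(String[] args) throws IOException {
    for (String event : parse("<p>Hello<br>world</p>")) {
      System.out.println(event);
    }
  }
}
```

The parser may also report tags it infers from the document structure (such as html, head and body), so the output can contain more events than the tags written in the input.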







