This is a mirror of official site: http://jasper-net.blogspot.com/

HTML DOM Using .Net

Sunday, June 6, 2010
The software development landscape has changed significantly with the proliferation of Web technologies. As a result, most applications now have some form of connectivity or integration with another application, web service, web application, or remote database.

This article therefore focuses on one specific area: HTML content and the DOM. In doing so, it investigates two approaches available in .Net that can be used to combine the two for a practical purpose.

The examples provided are based on .Net code and libraries. However, the concepts remain the same, because HTML and the DOM are independent of any programming language. This article is by no means exhaustive, but references are provided for those seeking more in-depth coverage.

Background

According to W3C [1], HTML is the publishing language for the World Wide Web. This basically means that HTML is the language that is used to display content in your web browser when you visit any website.

HTML (Hyper Text Markup Language) is a markup language in which predefined tags instruct the browser how content should appear. For example, in <h1>This is a heading</h1>, the heading tag tells the browser that the text “This is a heading” should be displayed in bold and larger than the rest of the text on the web page. Different tags are used for different purposes. These tags are defined by the W3C in its language specifications. At the time of writing, the latest specification for HTML is HTML 4.01 [2]. The purpose of a specification is to state how a language should be used, i.e. the recommendations of the creators of the language. The HTML 4.01 specification by the W3C recommends how HTML should be used in your websites and what the language is supposed to do. There is also XHTML 1.0 [3], a reformulation of HTML 4 in XML, which was designed with the intent to harness and integrate the power of XML in web pages.

DOM (Document Object Model) is an interface that allows applications to dynamically access the content, structure and style of documents. It is not restricted to a specific platform or language [4]. The W3C has defined several levels of DOM (Level 1 through Level 3) and several modules for each level (e.g. Core, XML and HTML; 14 modules altogether). An implementation (application, agent, library, API, SDK, etc.) is said to conform to a certain DOM level or module if it supports all the interfaces for that module and the associated semantics [5] [6].
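
To make the idea of accessing content and structure through the DOM more concrete, here is a minimal C# sketch. It is an illustration only, not code from the original article. It uses System.Xml.XmlDocument, the DOM implementation that ships with .Net, and assumes the markup is well-formed XML with no DOCTYPE; real-world HTML often is not. The sample markup, the class name DomDemo and the helper FirstElementText are hypothetical.

```csharp
using System;
using System.Xml;

static class DomDemo
{
    // Hypothetical helper: returns the text of the first element with the
    // given tag name, or null when the document has no such element.
    public static string FirstElementText(string xhtml, string tagName)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xhtml); // throws XmlException if the markup is not well-formed
        XmlNodeList matches = doc.GetElementsByTagName(tagName);
        return matches.Count > 0 ? matches[0].InnerText : null;
    }

    public static void Main()
    {
        // Sample markup (an assumption for this sketch); it must be well-formed XML.
        const string xhtml =
            "<html><body>" +
            "<h1>This is a heading</h1>" +
            "<p class=\"intro\">First paragraph.</p>" +
            "</body></html>";

        // Content: read the text inside the first <h1> element.
        Console.WriteLine(FirstElementText(xhtml, "h1")); // This is a heading

        var doc = new XmlDocument();
        doc.LoadXml(xhtml);

        // Structure: list the element names found directly under <body>.
        XmlElement body = doc.DocumentElement["body"];
        foreach (XmlNode child in body.ChildNodes)
            Console.WriteLine(child.Name); // h1, then p

        // Modification: change an attribute and serialize the element again.
        var paragraph = (XmlElement)doc.GetElementsByTagName("p")[0];
        paragraph.SetAttribute("class", "lead");
        Console.WriteLine(paragraph.OuterXml);
    }
}
```

The same calls (GetElementsByTagName, ChildNodes, SetAttribute) have counterparts in other DOM implementations, which is what makes the model independent of platform and language.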

Approach

The steps that will be taken to demonstrate how HTML DOM can be used in .Net are:

Step 1. Retrieve the HTML Content
Step 2. Process the HTML Content using DOM
Step 3. Make use of the processed HTML Content
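
As a rough sketch of how these three steps could fit together in C# (this is not the article's own code, and the two approaches it investigates may differ), the example below retrieves a page with HttpClient, extracts link targets through XmlDocument, and prints them. It assumes the page is well-formed XHTML with no DOCTYPE or named entities such as &nbsp;; ordinary HTML generally needs a more tolerant parser. The names HtmlDomPipeline, Retrieve and ExtractLinks are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Net.Http;
using System.Xml;

static class HtmlDomPipeline
{
    // Step 1: retrieve the HTML content as a string.
    public static string Retrieve(string url)
    {
        using (var client = new HttpClient())
        {
            // Blocking on the task keeps the sketch short; real code would use await.
            return client.GetStringAsync(url).GetAwaiter().GetResult();
        }
    }

    // Step 2: process the content using the DOM.
    // Assumption: the markup is well-formed; LoadXml throws XmlException otherwise.
    public static List<string> ExtractLinks(string xhtml)
    {
        var doc = new XmlDocument();
        doc.LoadXml(xhtml);

        var links = new List<string>();
        foreach (XmlElement anchor in doc.GetElementsByTagName("a"))
        {
            string href = anchor.GetAttribute("href"); // empty string when absent
            if (href.Length > 0)
                links.Add(href);
        }
        return links;
    }

    // Step 3: make use of the processed content (here, simply print it).
    public static void Main(string[] args)
    {
        // With no URL argument, fall back to an inline sample document.
        string xhtml = args.Length > 0
            ? Retrieve(args[0])
            : "<html><body><a href=\"http://www.w3.org/\">W3C</a><a>no href</a></body></html>";

        foreach (string link in ExtractLinks(xhtml))
            Console.WriteLine(link);
    }
}
```

Keeping retrieval separate from processing means the DOM step can be exercised on a string literal without any network access.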
The following are the details of these steps.

Read more: Codeproject
