a guest
Mar 26th, 2019
XML DOM
- API: called XML DOM
- XML: was the standard markup language for exchanging information between computers over a network. It is no longer as dominant, largely because of JSON.
- What is XML?
The Extensible Markup Language (XML) is a markup language that defines a set of rules for encoding documents in a format that is both human-readable and machine-readable.
- XML is self-describing
- XML is designed to store and transport data
- XML separates data from presentation
- XML tags are not predefined
- XML is platform independent

XML Terminology
- Tags have names: sample, color
- Start tags: <sample>, <color>
- End tags: </sample>, </color>
- An XML element starts and ends with matching tag names:
<color>Purple</color>
- Tag nesting constructs a hierarchy (tree data structure)
- Elements can have attributes: <sample id="5309">

Figure 1. Example code, Most Popular Color Survey:
<sample id="5309">
  <male>
    <name>John</name>
    <color>Yellow</color>
  </male>
  <male>
    <name>Micheal</name>
    <color>Purple</color>
  </male>
</sample>
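
The terminology above (elements, nesting, attributes) can be seen by loading Figure 1 into a parser. This is a minimal sketch using Python's xml.dom.minidom, the same module the notes use at the end; the SAMPLE string and the summarize helper are names made up for this illustration.

```python
import xml.dom.minidom

# The Figure 1 survey document, inlined as a string for the example.
SAMPLE = """<sample id="5309">
<male><name>John</name><color>Yellow</color></male>
<male><name>Micheal</name><color>Purple</color></male>
</sample>"""

def summarize(xml_text):
    """Return (root tag name, id attribute, list of (name, color) pairs)."""
    root = xml.dom.minidom.parseString(xml_text).documentElement
    pairs = []
    for male in root.getElementsByTagName("male"):
        # Each <name>/<color> element holds a single text node.
        name = male.getElementsByTagName("name")[0].firstChild.nodeValue
        color = male.getElementsByTagName("color")[0].firstChild.nodeValue
        pairs.append((name, color))
    return root.tagName, root.getAttribute("id"), pairs

print(summarize(SAMPLE))
# ('sample', '5309', [('John', 'Yellow'), ('Micheal', 'Purple')])
```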
Well-Formed XML Documents
- At its base level, a well-formed document requires that:
1. Content be defined.
2. Content be delimited with a beginning and end tag
3. Content be properly nested (parents within roots, children within parents)

- To be a well-formed document, rules must be established about the declaration and treatment of entities. Tags are case sensitive, and attribute values are delimited with quotation marks. Empty elements have rules established. Overlapping tags invalidate a document. Ideally, a well-formed document conforms to the design goals of XML. Other key syntax rules provided in the specification include:

1. It contains only properly encoded legal Unicode characters.

2. None of the special syntax characters such as < and & appear except when performing their markup-delineation roles.

3. The begin, end, and empty-element tags that delimit the elements are correctly nested, with none missing and none overlapping.

4. The element tags are case-sensitive; the beginning and end tags must match exactly. Tag names cannot contain any of the characters !"#$%&'()*+,/;<=>?@[\]^`{|}~, nor a space character, and cannot start with -, ., or a numeric digit.

5. There is a single "root" element that contains all the other elements.

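Several of the rules above can be checked by handing small fragments to an XML parser and seeing whether it rejects them. This is a rough sketch, assuming Python's xml.dom.minidom (which reports malformed input as xml.parsers.expat.ExpatError); is_well_formed and the test fragments are made up for the example.

```python
import xml.dom.minidom
from xml.parsers.expat import ExpatError

def is_well_formed(xml_text):
    """Return True if the parser accepts xml_text as well-formed XML."""
    try:
        xml.dom.minidom.parseString(xml_text)
        return True
    except ExpatError:
        return False

# Each fragment is labeled with the rule it follows or breaks.
CASES = [
    ("matching tags",        "<color>Purple</color>"),
    ("overlapping tags",     "<a><b></a></b>"),
    ("case mismatch",        "<Color>Purple</color>"),
    ("two root elements",    "<a/><b/>"),
    ("unquoted attribute",   "<sample id=5309></sample>"),
    ("bare & in content",    "<note>Tom & Jerry</note>"),
    ("escaped & in content", "<note>Tom &amp; Jerry</note>"),
]

for label, text in CASES:
    print("%-22s %s" % (label, is_well_formed(text)))
```

Only the first and last fragments are accepted; every other one breaks a rule from the list and is rejected.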
What is XHTML?
- XHTML is a standard for HTML that follows all the well-formedness rules of XML. The important differences from HTML are:
1. XHTML elements must be properly nested
2. XHTML elements must always be closed
3. XHTML elements must be in lowercase
4. XHTML documents must have one root element
5. Attribute names must be in lowercase
6. Attribute values must be quoted
- Since XHTML documents are well formed, they can be parsed using standard XML parsers, unlike HTML, which requires a lenient HTML-specific parser.

Case Study - From HTML to XHTML:
$ java -jar tagsoup-1.2.1.jar --files sample.html
src: sample.html dst: sample.xhtml

<html>
<body>
<h3>Sample</h3>
<table id="5309">
<tr><td>John</td><td>Yellow</td></tr>
<tr><td>Micheal</td><td>Purple</td></tr>
</table>
</body>
</html>
id - attribute
John, Micheal - text
td - element
tr - element

What is DOM?
- The Document Object Model (DOM) is a platform- and language-neutral application programming interface (API) for XML documents. It defines the logical structure of documents and the way a document is accessed and manipulated.
- With the Document Object Model, programmers can build documents, navigate their structure, and add, modify, or delete elements and content.
- DOM is a logical model that may be implemented in any convenient manner. Most implementations use a logical structure that is very much like a tree.
- The name "Document Object Model" was chosen because it is an "object model" in the traditional object-oriented design sense. The model encompasses not only the structure of a document, but also the behavior of a document and the objects of which it is composed.

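The build / navigate / add / modify / delete cycle described above might look like the following. This is only a sketch, assuming Python's xml.dom.minidom as the DOM implementation; the add_male helper is hypothetical and exists only to keep the example short.

```python
import xml.dom.minidom

# Build: start from an empty document and add a root element.
doc = xml.dom.minidom.Document()
sample = doc.createElement("sample")
sample.setAttribute("id", "5309")
doc.appendChild(sample)

def add_male(name, color):
    """Append <male><name>..</name><color>..</color></male> to the root."""
    male = doc.createElement("male")
    for tag, text in (("name", name), ("color", color)):
        child = doc.createElement(tag)
        child.appendChild(doc.createTextNode(text))
        male.appendChild(child)
    sample.appendChild(male)
    return male

# Add: two survey entries.
john = add_male("John", "Yellow")
micheal = add_male("Micheal", "Purple")

# Navigate and modify: change the text inside John's <color>.
john.getElementsByTagName("color")[0].firstChild.data = "Green"

# Delete: remove the second entry from the tree.
sample.removeChild(micheal)

print(sample.toxml())
# <sample id="5309"><male><name>John</name><color>Green</color></male></sample>
```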
API: DOM
The Document Object Model is a platform- and language-neutral application programming interface for XML documents.
- Interface NodeList
- Interface Node
- Interface Element
- Interface Document

A NodeList represents a sequence of nodes:
- length - the number of nodes in the list
- item(i) - returns the node at index i

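A short sketch of length and item(i) in use, assuming Python's xml.dom.minidom, where getElementsByTagName returns a NodeList. The document string and the node_list_texts helper are made up for the example.

```python
import xml.dom.minidom

doc = xml.dom.minidom.parseString(
    "<sample><color>Yellow</color><color>Purple</color></sample>")
colors = doc.getElementsByTagName("color")  # a NodeList

def node_list_texts(node_list):
    """Collect the text of each node using only length and item(i)."""
    texts = []
    for i in range(node_list.length):
        texts.append(node_list.item(i).firstChild.nodeValue)
    return texts

print(colors.length)            # 2
print(node_list_texts(colors))  # ['Yellow', 'Purple']
print(colors.item(5))           # None (index out of range)
```

In minidom a NodeList also behaves like a Python list, so a plain for loop over it works as well.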
Interface Node
- All of the components of an XML document are subclasses of Node.
- DOMString nodeName
- DOMString nodeValue
- Node parentNode
- boolean hasChildNodes()
- NodeList childNodes
- nodeType: Node.ELEMENT_NODE | Node.ATTRIBUTE_NODE | Node.TEXT_NODE | Node.DOCUMENT_NODE

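The Node members above are enough to walk a whole tree. The sketch below assumes Python's xml.dom.minidom and the node type constants from xml.dom.Node; describe and TYPE_NAMES are names introduced only for this illustration.

```python
import xml.dom.minidom
from xml.dom import Node

# Readable labels for the four node types listed above.
TYPE_NAMES = {
    Node.ELEMENT_NODE: "ELEMENT_NODE",
    Node.ATTRIBUTE_NODE: "ATTRIBUTE_NODE",
    Node.TEXT_NODE: "TEXT_NODE",
    Node.DOCUMENT_NODE: "DOCUMENT_NODE",
}

def describe(node, depth=0, lines=None):
    """Return one indented line per node: type, nodeName, nodeValue."""
    if lines is None:
        lines = []
    label = TYPE_NAMES.get(node.nodeType, str(node.nodeType))
    lines.append("%s%s %s %r" % ("  " * depth, label,
                                 node.nodeName, node.nodeValue))
    if node.hasChildNodes():
        for child in node.childNodes:
            describe(child, depth + 1, lines)
    return lines

doc = xml.dom.minidom.parseString(
    '<sample id="5309"><color>Purple</color></sample>')
for line in describe(doc):
    print(line)
# DOCUMENT_NODE #document None
#   ELEMENT_NODE sample None
#     ELEMENT_NODE color None
#       TEXT_NODE #text 'Purple'
```

The id attribute does not show up in the walk: attributes are nodes, but they are not children of their element, so childNodes never reaches them.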
Interface Element
- Element is a subclass of Node
- NodeList getElementsByTagName(tagName)
- boolean hasAttribute(attributeName)
- DOMString getAttribute(attributeName)

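A small sketch of the three Element methods, again assuming Python's xml.dom.minidom. The one-row table is a cut-down version of the case study document, and the "class" attribute is deliberately one that does not exist.

```python
import xml.dom.minidom

doc = xml.dom.minidom.parseString(
    '<table id="5309"><tr><td>John</td><td>Yellow</td></tr></table>')
table = doc.documentElement

print(table.hasAttribute("id"))             # True
print(table.getAttribute("id"))             # 5309
print(table.hasAttribute("class"))          # False
print(repr(table.getAttribute("class")))    # '' (missing attribute)

# getElementsByTagName searches all descendants, not just direct children.
print(len(table.getElementsByTagName("td")))  # 2
```

Because getAttribute returns an empty string for a missing attribute, hasAttribute is the way to tell "absent" apart from "present but empty".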
Interface Document
- A Document represents an entire XML document, including its constituent elements, attributes, comments, etc.
- Element documentElement
- Element getElementById(elementId)
- NodeList getElementsByTagName(tagName)
documentElement is the one and only root element of the document.

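A brief sketch of the Document members, assuming Python's xml.dom.minidom; the inline document is a shortened copy of the survey example.

```python
import xml.dom.minidom

doc = xml.dom.minidom.parseString(
    '<sample id="5309"><male><name>John</name></male>'
    '<male><name>Micheal</name></male></sample>')

# documentElement is the single root element; its parent is the Document.
root = doc.documentElement
print(root.tagName)             # sample
print(root.parentNode is doc)   # True

# Called on the Document, getElementsByTagName searches the whole tree.
names = [n.firstChild.nodeValue for n in doc.getElementsByTagName("name")]
print(names)                    # ['John', 'Micheal']
```

As far as I can tell, getElementById in minidom only finds elements whose attribute has been declared as an ID (for example through a DTD), so an attribute that is merely named "id" should not be expected to match on its own.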
From XHTML to CSV (comma-separated values)
$ python xhtmlToCsv.py sample.xhtml
John,Yellow
Micheal,Purple
Python code:
import sys
import xml.dom.minidom

# Parse the XHTML file named on the command line.
document = xml.dom.minidom.parse(sys.argv[1])
tableElements = document.getElementsByTagName('table')

# Emit one CSV line per row of the first table.
for tr in tableElements[0].getElementsByTagName('tr'):
    data = []
    for td in tr.getElementsByTagName('td'):
        for node in td.childNodes:
            # Collect only the text inside each cell.
            if node.nodeType == node.TEXT_NODE:
                data.append(node.nodeValue)
    print(','.join(data))