
Untitled
By: a guest on
May 27th, 2012 | syntax:
None | size: 1.00 KB | hits: 27 | expires: Never
Parsing contents of paragraph elements with Nokogiri
<p>
<font size="5" face="Arial, Helvetica, sans-serif" color="#00CCAA" class="">
<font color="#AAFF33" class="">
October 10, 1990 - Maybe a Title
</font>-
<font size="4" class="">
Some long text here.
<font color="#66CC00" class="">
<a href="SourceTitle/date.pdf">[Blah Blah, October 27, 1982 p. 2</a>
]
</font>.
More content.
<font color="#00FF33" class="">[Another Source, 1971, issue 01/4]
</font>.
</font>
<font size="5" face="Arial, Helvetica, sans-serif" color="#00CCAA" class="">
<font color="#AAFF33" class=""><font size="4" color="#00CCAA" class="">
Another fantastic article.
<a href="SourceTitle/Date.pdf">[Some Source, October 4, p.6]</a>
</font>
</font>
</font>
</font>
</p>
>> doc.xpath('//p').each do |node|
.. puts node.xpath("font[@size='5']/font").first.content.strip
.. end #=> 0
October 10, 1990 - Maybe a Title