Untitled

By: a guest on Mar 12th, 2013  |  syntax: Python  |  size: 5.33 KB  |  views: 29  |  expires: Never
#!/usr/bin/python
# -*- coding: utf-8 -*-
#

import sys
from lxml import etree


data3 = '''
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xmlns:og="http://opengraphprotocol.org/schema/" xmlns:fb="http://www.facebook.com/2008/fbml">
<head>
<meta charset="UTF-8">
        <title>The 20 Most Despicable Things Gordon Ramsay Has Said and Done, Ranked -- Grub Street New York</title>

        <link rel="alternate" type="application/rss+xml" title="RSS 2.0" href="http://feedproxy.google.com/nymag/grubstreet" />



        <meta name="Headline" content="The 20 Most Despicable Things Gordon Ramsay Has Said and Done, Ranked" />
        <meta name="keywords" content="april bloomfield, el gordo, frank bruni, gordon ramsay, lawsuits, lists, marcus samuelsson, mario batali, shitlist, spotted pig, sued" />

        <meta name="description" content="Racism, fat-shaming, and vegetarian trickery." />

        <meta name="Byline" content="Sierra Tishgart" />
        <meta name="Type_of_Feature" content="" />
        <meta name="Issue_Date" content="March  8, 2013 12:50 PM" />
        <meta name="related_stories" content="The 20 Most Despicable Things Gordon Ramsay Has Said and Done, Ranked" />
        <meta name="document_type" content="Blog" />
        <meta name="category" content="Lists" />

        <link rel="image_src" href="http://pixel.nymag.com/imgs/daily/grub/2013/03/08/08-gorgon-ramsay.o.jpg/a_146x97.jpg" />
        <link rel="canonical" href="http://newyork.grubstreet.com/2013/03/20-despicable-things-gordon-ramsay.html" id="canonical" />

        <script>
                var canonicalUrl = "http://newyork.grubstreet.com/2013/03/20-despicable-things-gordon-ramsay.html";
        </script>



        <meta name="content.tags.primary" content=";network - Grub Street,;city - New York City,;tag - lists" />
        <meta name="content.tags" content=";tag - april bloomfield,;tag - el gordo,;tag - frank bruni,;tag - gordon ramsay,;tag - lawsuits,;tag - marcus samuelsson,;tag - mario batali,;tag - shitlist,;tag - spotted pig,;tag - sued" />
        <meta name="content.hierarchy" content="New York City:Grub Street" />
        <meta name="content.type" content="Blog" />
        <meta name="content.subtype" content="Blog Entry" />


        <meta property="fb:app_id" content="206283005644" />
        <meta property="og:title" content="The 20 Most Despicable Things Gordon Ramsay Has Said and Done, Ranked" />
        <meta property="og:description" content="Racism, fat-shaming, and vegetarian trickery." />
        <meta property="og:image" content="http://pixel.nymag.com/imgs/daily/grub/2013/03/08/08-gorgon-ramsay.o.jpg/a_146x97.jpg"/>
        <meta property="og:url" content="http://newyork.grubstreet.com/2013/03/20-despicable-things-gordon-ramsay.html" />
        <meta property="og:type" content="article" />
        <meta property="og:site_name" content="Grub Street New York" />






        <meta name="viewport" content="width=1020">

<link type="text/css" rel="stylesheet" href="http://cache.nymag.com/css/screen/grubstreet/grubstreet-core.css" media="all" />
<link type="text/css" rel="stylesheet" href="http://cache.nymag.com/css/screen/section/daily/slideshow.css" media="all" />
<link type="text/css" rel="stylesheet" href="http://cache.nymag.com/css/screen/echo.css" media="all" />
<link type="text/css" rel="stylesheet" href="http://cache.nymag.com/css/screen/loginRegister.css" media="all" />
<link rel="stylesheet" href="http://cache.nymag.com/css/screen/advertising.css" media="all" />
<link rel="shortcut icon" href="http://images.nymag.com/gfx/grubst/favicon.ico" />

<style type="text/css">
#adsplashtop,#pushdown {padding:5px 5px;}
#pushdown {border-top:1px solid #737373}
</style>











        <!--[if IE 6]>
        <link rel="stylesheet" href="http://cache.nymag.com/css/screen/grubstreet/win-ie6.css" type="text/css" media="screen, projection" />
<![endif]-->

<!--[if IE 7]>
        <link rel="stylesheet" href="http://cache.nymag.com/css/screen/grubstreet/win-ie7.css" type="text/css" media="screen, projection" />
<![endif]-->




        <script type="text/javascript">
                var NYM = {};
                NYM.config = {};
                NYM.config.membership = {
                        "service":"nym"
                };
                NYM.config.advertising = {
                        "sitename":"nym.grubstreet"
                };

        </script>




<script type="text/javascript">
        var date = 'March 12, 2013 12:42:38';
        var currDate=new Date(date);
        var GRUBST = {};
        if (!NYM) {
                var NYM = {};
                NYM.config = {};
                NYM.config.membership = {
                        "service":"nym"
                };
                NYM.config.advertising = {
                         "sitename":"nym.grubstreet"
                };
        }
</script>
<script type="text/javascript" src="http://cache.nymag.com/scripts/modernizr-1.7.min.js"></script>
<script type="text/javascript" src="http://ajax.googleapis.com/ajax/libs/jquery/1.4.2/jquery.min.js"></script>
<script type="text/javascript" src="http://cache.nymag.com/scripts/jquery-ui-1.8.2.custom.min.js"></script>
<script type="text/javascript" src="http://cache.nymag.com/scripts/ad_manager.js"></script>
<script type="text/javascript" src="http://cache.nymag.com/js/2/global.js"></script>
<script type="text/javascript" src="http://cache.nymag.com/scripts/skinTakeover.js"></script>
<script type="text/javascript" src="http://cache.nymag.com/scripts/grubstreet-controls.js"></scr
'''
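

n = 0
# Parse data3 over and over with no exit condition. Each pass prints the
# iteration number once, then once more per matching <meta property=...> element.
while True:
    n += 1

    tree = etree.HTML(data3)
    m = tree.xpath("//meta[@property]")

    # Single-argument print calls behave the same on Python 2 and Python 3.
    print('- %d' % n)
    for i in m:
        print(n)
        # print(i.attrib['property'], i.attrib['content'])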
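
The commented-out print in the loop above hints at reading each tag's property and content attributes. A minimal sketch of that lookup follows, assuming lxml is installed. The name `extract_properties`, the shortened `SAMPLE` document, and the three-pass bound are illustrative assumptions, not part of the original paste.

```python
from lxml import etree

# Shortened stand-in for data3 (assumption: only a few of the original tags).
SAMPLE = '''
<html>
<head>
<meta name="category" content="Lists" />
<meta property="og:type" content="article" />
<meta property="og:site_name" content="Grub Street New York" />
</head>
</html>
'''


def extract_properties(html_text):
    """Map each <meta property="..."> value to its content attribute."""
    tree = etree.HTML(html_text)
    result = {}
    for meta in tree.xpath("//meta[@property]"):
        # .get() returns None instead of raising when an attribute is missing.
        result[meta.get("property")] = meta.get("content")
    return result


if __name__ == "__main__":
    # Bounded loop (hypothetical limit of 3 passes) instead of `while True`.
    for n in range(1, 4):
        props = extract_properties(SAMPLE)
        print('- %d' % n)
        for key in sorted(props):
            print('%s = %s' % (key, props[key]))
```

Only tags carrying a `property` attribute match the XPath, so the `name="category"` tag is skipped.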