cjjmccray

Creator-less Government PDF Files - 1

May 10th, 2012
  1. UK Government PDF Files - Authors, Creators and Publishers - take #1
  2. --------------------------------------------------------------------
  3.  
  4. Some notes on a brief survey of PDF files on the UK Government's Cabinet Office webserver.
  5.  
  6. From URL:
  7. http://www.cabinetoffice.gov.uk/sites/default/files/resources/PMcombinedaprjunfinal.pdf
  8.  
  9. we can download and view that PDF file, and extract some information about it - the output from ls -l, and the fields: Author, Producer, Creator, Created, Modified and Format from the PDF file.
  10.  
  11. -rw-rw-r-- 1 chrism chrism 115421 2012-04-02 00:19 PMcombinedaprjunfinal.pdf
  12. Author: <none>
  13. Producer: <none>
  14. Creator: <none>
  15. Created: Fri 09 Dec 2011 16:46:18 GMT
  16. Modified: Fri 09 Dec 2011 16:46:18 GMT
  17. Format: PDF-1.5
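
The Format line above comes straight from the header at the start of the file. As a minimal Python sketch (the helper name pdf_format is made up for illustration, and it only looks at the header), this reads the version from the first bytes. It does not attempt Author, Producer or Creator: those live in the PDF's document information dictionary and really need a proper PDF parser.

```python
import re
from typing import Optional


def pdf_format(data: bytes) -> Optional[str]:
    """Return the header version, e.g. 'PDF-1.5', or None if no header is found.

    Hypothetical helper: it only searches the first 1024 bytes for '%PDF-x.y'.
    """
    match = re.search(rb"%PDF-(\d\.\d)", data[:1024])
    if match is None:
        return None
    return "PDF-" + match.group(1).decode("ascii")


# Stand-in bytes for the start of a PDF file; a real check would read them
# from the downloaded file opened in binary mode.
sample = b"%PDF-1.5\n%\xe2\xe3\xcf\xd3\n"
print("Format:", pdf_format(sample) or "<none>")
```

To run it against the real download, read the first 1024 bytes of PMcombinedaprjunfinal.pdf in binary mode and pass them in.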
  18. ------------------------------------------------------------
  19. Returning to the URL, the file-part is the section after the last '/' and is:
  20. PMcombinedaprjunfinal.pdf
  21.  
  22. Remove the file-part from the end to get:
  23. http://www.cabinetoffice.gov.uk/sites/default/files/resources/
  24.  
  25. which is a folder/directory on a web-server. This URL breaks down as follows:
  26.  
  27. http - the protocol to use to retrieve the data, in this case HyperText Transfer Protocol
  28.  
  29. :// - separates the protocol from the rest of the URL
  30.  
  31. www.cabinetoffice.gov.uk - the server "www" on sub-domain "cabinetoffice" of domain "gov" in country-domain "uk" (note that there may be just one physical machine responding to many servers and domains, and the precise data structure cannot be determined just from this information - there is a list of .gov.uk domains at: https://update.cabinetoffice.gov.uk/resource-library/list-govuk-domain-names)
  32.  
  33. then a more familiar (particularly to Unix and Linux users) folder/directory structure:
  34.  
  35. /sites/default/files/resources/ - folder 'resources', located in parent folder 'files', located in parent folder 'default', located in folder 'sites' which is in the root folder of the site.
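
The same dissection can be sketched in Python using only the standard library. This is just an illustration of the breakdown above; the variable names are my own.

```python
import posixpath
from urllib.parse import urlsplit

url = ("http://www.cabinetoffice.gov.uk/sites/default/files/resources/"
       "PMcombinedaprjunfinal.pdf")

parts = urlsplit(url)

# The file-part is everything after the last '/' in the path.
folder_path, file_part = posixpath.split(parts.path)

# Rebuild the URL of the containing folder, with its trailing '/'.
folder_url = f"{parts.scheme}://{parts.netloc}{folder_path}/"

# Host name labels, left to right: server, sub-domain, domain, country-domain.
labels = parts.hostname.split(".")

# Folder names from the site root down to the folder holding the file.
folders = [name for name in folder_path.split("/") if name]

print("protocol:  ", parts.scheme)
print("host parts:", labels)
print("folders:   ", folders)
print("file-part: ", file_part)
print("folder URL:", folder_url)
```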
  36.  
  37. Some web servers allow "virtual directory listing", where you can browse into a folder such as this and see its contents. For computer security reasons, however, many servers are set to display an error message or redirect to a different page altogether. The Cabinet Office's system silently redirects to the Cabinet Office home page.
  38. ------------------------------------------------------------
  39.  
  40. So try hitting Google with a rather odd search string - the full folder on the Cabinet Office's web-server:
  41.  
  42. http://www.cabinetoffice.gov.uk/sites/default/files/resources/
  43.  
  44. Yes, it's a bit weird to search for a URL like this, but what should be found are links in pages others have written - links to pages served from that folder, or files in that folder, with the file names on the end of them.
  45.  
  46. It is the files we're interested in - especially PDF files. Here are the results from the first page of the search (at the time of writing, 0430h BST 11-May-2012):
  47.  
  48. Results:
  49. #1 Link to page "The Coalition: our programme for government - Cabinet Office" - ignored.
  50. #2 [PDF] Giving | Green Paper - Cabinet Office: http://www.cabinetoffice.gov.uk/sites/default/files/resources/Giving-Green-Paper.pdf
  51. #3 [PDF] The Compact - Cabinet Office: http://www.cabinetoffice.gov.uk/sites/default/files/resources/The%20Compact.pdf
  52. #4 [PDF] National Risk Register of Civil Emergencies - Cabinet Office: https://update.cabinetoffice.gov.uk/sites/default/files/resources/CO_NationalRiskRegister_2012_acc.pdf
  53. #5 Link to page "Guidance on Transparency | Cabinet Office" - ignored.
  54. #6 Link to page "The UK Cyber Security Strategy - Cabinet Office" - ignored.
  55. #7 Link to page "National Recovery Guidance - Additional Document ... - Cabinet Office" - ignored.
  56. #8 [PDF] The Ripple Effect: The nature and impact of the children and youn...: http://www.ncvo-vol.org.uk/sites/default/files/ripple-effect.pdf
  57. #9 Link to page "Working with Parliamentary Counsel | Deputy Prime ... - Cabinet Office" - ignored.
  58. #10 Link to page "Social Mobility Review documentation | Deputy ... - Cabinet Office" - ignored.
  59.  
  60. Note: #8 is hosted on a different site (ncvo-vol.org.uk). It was included in the Google results because the PDF document contains a link to a file or other document stored in that Cabinet Office web server folder. As it is itself a PDF file, it was also included in this trawl for sample PDF files.
  61.  
  62. At this stage we have four PDF files to look at. Not a particularly big sample, but this is just to explore the variation in the Author, Producer and Creator fields.
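
As a rough Python sketch of that selection step, here are the four PDF links from the results, with the local file name each one downloads to and whether it sits on a Cabinet Office host. The helper names local_name and on_cabinet_office are my own, and the host check is only a simple suffix test.

```python
import posixpath
from urllib.parse import unquote, urlsplit

# PDF hits copied from the first page of results (result number, URL).
hits = [
    (2, "http://www.cabinetoffice.gov.uk/sites/default/files/resources/Giving-Green-Paper.pdf"),
    (3, "http://www.cabinetoffice.gov.uk/sites/default/files/resources/The%20Compact.pdf"),
    (4, "https://update.cabinetoffice.gov.uk/sites/default/files/resources/CO_NationalRiskRegister_2012_acc.pdf"),
    (8, "http://www.ncvo-vol.org.uk/sites/default/files/ripple-effect.pdf"),
]


def local_name(url: str) -> str:
    """File-part of the URL with %-escapes decoded (e.g. %20 becomes a space)."""
    return unquote(posixpath.basename(urlsplit(url).path))


def on_cabinet_office(url: str) -> bool:
    """True if the host is cabinetoffice.gov.uk or one of its sub-domains."""
    host = urlsplit(url).hostname or ""
    return host == "cabinetoffice.gov.uk" or host.endswith(".cabinetoffice.gov.uk")


for number, url in hits:
    where = "Cabinet Office" if on_cabinet_office(url) else "elsewhere"
    print(f"#{number} {local_name(url)} ({where})")
```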
  63. ------------------------------------------------------------
  64.  
  65. Here are the four files again, all downloaded with wget (and their outputs from ls -l), and with the fields: Author, Producer, Creator, Created, Modified and Format from the PDF files.
  66.  
  67. #2 http://www.cabinetoffice.gov.uk/sites/default/files/resources/Giving-Green-Paper.pdf
  68. -rw-rw-r-- 1 chrism chrism 516246 2012-04-02 00:03 Giving-Green-Paper.pdf
  69. Author: HM Government
  70. Producer: Adobe PDF Library 9.0
  71. Creator: Adobe InDesign CS4 (6.0.5)
  72. Created: Thu 23 Dec 2010 15:24:32 GMT
  73. Modified: Fri 07 Jan 2011 08:37:40 GMT
  74. Format: PDF-1.7
  75.  
  76. #3 http://www.cabinetoffice.gov.uk/sites/default/files/resources/The%20Compact.pdf
  77. -rw-rw-r-- 1 chrism chrism 295239 2012-04-02 00:29 The Compact.pdf
  78. Author: <none>
  79. Producer: Adobe PDF Library 9.0
  80. Creator: Adobe InDesign CS4 (6.0.6)
  81. Created: Wed 22 Dec 2010 10:21:47 GMT
  82. Modified: Wed 22 Dec 2010 10:21:48 GMT
  83. Format: PDF-1.4
  84.  
  85. #4 https://update.cabinetoffice.gov.uk/sites/default/files/resources/CO_NationalRiskRegister_2012_acc.pdf
  86. -rw-rw-r-- 1 chrism chrism 971546 2012-04-01 23:55 CO_NationalRiskRegister_2012_acc.pdf
  87. Author: Cabinet Office
  88. Producer: Adobe PDF Library 9.9
  89. Creator: Adobe InDesign CS5 (7.0)
  90. Created: Fri 10 Feb 2012 16:29:55 GMT
  91. Modified: Fri 17 Feb 2012 11:35:56 GMT
  92. Format: PDF-1.6
  93.  
  94. #8 http://www.ncvo-vol.org.uk/sites/default/files/ripple-effect.pdf
  95. -rw-rw-r-- 1 chrism chrism 2414087 2011-12-07 11:49 ripple-effect.pdf
  96. Author: National Children’s Bureau, National Council for Voluntary Organisations
  97. Producer: Adobe PDF Library 9.9
  98. Creator: Adobe InDesign CS5 (7.0.4)
  99. Created: Mon 03 Oct 2011 08:49:29 BST
  100. Modified: Tue 04 Oct 2011 16:21:12 BST
  101. Format: PDF-1.6
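
To make the comparison easier to see, here is a small Python sketch that tabulates which of the five files have empty Author, Producer or Creator fields. The values are typed in by hand from the listings in these notes (None stands for <none>); nothing here reads the PDF files themselves.

```python
# Field values copied from the listings in these notes; None means "<none>".
records = {
    "PMcombinedaprjunfinal.pdf": {
        "Author": None, "Producer": None, "Creator": None},
    "Giving-Green-Paper.pdf": {
        "Author": "HM Government",
        "Producer": "Adobe PDF Library 9.0",
        "Creator": "Adobe InDesign CS4 (6.0.5)"},
    "The Compact.pdf": {
        "Author": None,
        "Producer": "Adobe PDF Library 9.0",
        "Creator": "Adobe InDesign CS4 (6.0.6)"},
    "CO_NationalRiskRegister_2012_acc.pdf": {
        "Author": "Cabinet Office",
        "Producer": "Adobe PDF Library 9.9",
        "Creator": "Adobe InDesign CS5 (7.0)"},
    "ripple-effect.pdf": {
        "Author": "National Children’s Bureau, National Council for Voluntary Organisations",
        "Producer": "Adobe PDF Library 9.9",
        "Creator": "Adobe InDesign CS5 (7.0.4)"},
}


def missing_fields(record: dict) -> list:
    """Names of the fields that are empty in one file's record."""
    return [name for name, value in record.items() if value is None]


for filename, record in records.items():
    missing = missing_fields(record)
    print(f"{filename}: missing {', '.join(missing) if missing else 'nothing'}")
```

Only PMcombinedaprjunfinal.pdf comes out with all three fields empty; The Compact.pdf lacks just the Author.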
  102. ------------------------------------------------------------
  103.  
  104. Hmmm.... still not happy with this. The four PDF files found by Google are all professionally produced Government pamphlets and documents. The file under scrutiny here - PMcombinedaprjunfinal.pdf - looks like it has been produced using a "print to PDF" printer driver from a program such as Microsoft Word or Microsoft Excel. My own experience of using these is that they normally turn out PDF files with the Producer and/or Creator fields filled in with details of the specific piece of software used to produce them.
  105.  
  106. Continuing the survey of the Cabinet Office website might turn up other such documents. It could also be broadened to include some of the other Ministerial meetings lists, to see whether those have also been produced using an anonymised "print to PDF" system, or whether the file under scrutiny - PMcombinedaprjunfinal.pdf - is alone in this regard.
  107.  
  108. One other thing that's noticeable: the first three of the files - all from the same Cabinet Office webserver that we're interested in - were all modified on the webserver overnight from 1st April to 2nd April 2012. This leads me to suspect the entire folder was restored from backup or transferred from another place at that time, and that in the restore/transfer process, the original file date and time stamps were not preserved.
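
A quick Python check of that timing, using the ls -l times typed in by hand from the listings in these notes. It assumes, as the reasoning here does, that those local times mirror the modification times reported by the web server.

```python
from datetime import datetime

# Modification times as shown by ls -l for the three Cabinet Office files.
listing = {
    "Giving-Green-Paper.pdf": "2012-04-02 00:03",
    "The Compact.pdf": "2012-04-02 00:29",
    "CO_NationalRiskRegister_2012_acc.pdf": "2012-04-01 23:55",
}

stamps = sorted(datetime.strptime(text, "%Y-%m-%d %H:%M")
                for text in listing.values())
earliest, latest = stamps[0], stamps[-1]
spread = latest - earliest

# The file under scrutiny, PMcombinedaprjunfinal.pdf, from its own ls -l line.
under_scrutiny = datetime(2012, 4, 2, 0, 19)
in_window = earliest <= under_scrutiny <= latest

print("earliest:", earliest)
print("latest:  ", latest)
print("spread:  ", spread)
print("file under scrutiny inside the same window:", in_window)
```

All three fall within about half an hour of each other around midnight, and the file under scrutiny lands inside the same window, which fits the restore-or-transfer guess.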
  109.  
  110. Dammit... Hanlon's razor has struck again.
  111. http://en.wikipedia.org/wiki/Hanlon's_razor