ZIP File Format Specification version 6.3.4

  1. File: APPNOTE.TXT - .ZIP File Format Specification
  2. Version: 6.3.4
  3. Status: Final - replaces version 6.3.3
  4. Revised: October 1, 2014
  5. Copyright (c) 1989 - 2014 PKWARE Inc., All Rights Reserved.
  6.  
  7. 1.0 Introduction
  8. ---------------
  9.  
  10. 1.1 Purpose
  11. -----------
  12.  
  13. 1.1.1 This specification is intended to define a cross-platform,
  14. interoperable file storage and transfer format. Since its
  15. first publication in 1989, PKWARE, Inc. ("PKWARE") has remained
  16. committed to ensuring the interoperability of the .ZIP file
  17. format through periodic publication and maintenance of this
  18. specification. We trust that all .ZIP compatible vendors and
  19. application developers that use and benefit from this format
  20. will share and support this commitment to interoperability.
  21.  
  22. 1.2 Scope
  23. ---------
  24.  
  25. 1.2.1 ZIP is one of the most widely used compressed file formats. It is
  26. universally used to aggregate, compress, and encrypt files into a single
  27. interoperable container. No specific use or application need is
  28. defined by this format and no specific implementation guidance is
  29. provided. This document provides details on the storage format for
  30. creating ZIP files. Information is provided on the records and
  31. fields that describe what a ZIP file is.
  32.  
  33. 1.3 Trademarks
  34. --------------
  35.  
  36. 1.3.1 PKWARE, PKZIP, SecureZIP, and PKSFX are registered trademarks of
  37. PKWARE, Inc. in the United States and elsewhere. PKPatchMaker,
  38. Deflate64, and ZIP64 are trademarks of PKWARE, Inc. Other marks
  39. referenced within this document appear for identification
  40. purposes only and are the property of their respective owners.
  41.  
  42.  
  43. 1.4 Permitted Use
  44. -----------------
  45.  
  46. 1.4.1 This document, "APPNOTE.TXT - .ZIP File Format Specification" is the
  47. exclusive property of PKWARE. Use of the information contained in this
  48. document is permitted solely for the purpose of creating products,
  49. programs and processes that read and write files in the ZIP format
  50. subject to the terms and conditions herein.
  51.  
  52. 1.4.2 Use of the content of this document within other publications is
  53. permitted only through reference to this document. Any reproduction
  54. or distribution of this document in whole or in part without prior
  55. written permission from PKWARE is strictly prohibited.
  56.  
  57. 1.4.3 Certain technological components provided in this document are the
  58. patented proprietary technology of PKWARE and as such require a
  59. separate, executed license agreement from PKWARE. Applicable
  60. components are marked with the following, or similar, statement:
  61. 'Refer to the section in this document entitled "Incorporating
  62. PKWARE Proprietary Technology into Your Product" for more information'.
  63.  
  64. 1.5 Contacting PKWARE
  65. ---------------------
  66.  
  67. 1.5.1 If you have questions on this format, its use, or licensing, or if you
  68. wish to report defects, request changes or additions, please contact:
  69.  
  70. PKWARE, Inc.
  71. 201 E. Pittsburgh Avenue, Suite 400
  72. Milwaukee, WI 53204
  73. +1-414-289-9788
  74. +1-414-289-9789 FAX
  75. zipformat@pkware.com
  76.  
  77. 1.5.2 Information about this format and copies of this document are publicly
  78. available at:
  79.  
  80. http://www.pkware.com/appnote
  81.  
  82. 1.6 Disclaimer
  83. --------------
  84.  
  85. 1.6.1 Although PKWARE will attempt to supply current and accurate
  86. information relating to its file formats, algorithms, and the
  87. subject programs, the possibility of error or omission cannot
  88. be eliminated. PKWARE therefore expressly disclaims any warranty
  89. that the information contained in the associated materials relating
  90. to the subject programs and/or the format of the files created or
  91. accessed by the subject programs and/or the algorithms used by
  92. the subject programs, or any other matter, is current, correct or
  93. accurate as delivered. Any risk of damage due to any possible
  94. inaccurate information is assumed by the user of the information.
  95. Furthermore, the information relating to the subject programs
  96. and/or the file formats created or accessed by the subject
  97. programs and/or the algorithms used by the subject programs is
  98. subject to change without notice.
  99.  
  100. 2.0 Revisions
  101. --------------
  102.  
  103. 2.1 Document Status
  104. --------------------
  105.  
  106. 2.1.1 If the STATUS of this file is marked as DRAFT, the content
  107. defines proposed revisions to this specification which may consist
  108. of changes to the ZIP format itself, or that may consist of other
  109. content changes to this document. Versions of this document and
  110. the format in DRAFT form may be subject to modification prior to
  111. publication STATUS of FINAL. DRAFT versions are published periodically
  112. to provide notification to the ZIP community of pending changes and to
  113. provide opportunity for review and comment.
  114.  
  115. 2.1.2 Versions of this document having a STATUS of FINAL are
  116. considered to be in the final form for that version of the document
  117. and are not subject to further change until a new, higher version
  118. numbered document is published. Newer versions of this format
  119. specification are intended to remain interoperable with all prior
  120. versions whenever technically possible.
  121.  
  122. 2.2 Change Log
  123. --------------
  124.  
  125. Version  Change Description                            Date
  126. -------  ------------------                            ----------
  127. 5.2      -Single Password Symmetric Encryption         07/16/2003
  128.           storage
  129. 
  130. 6.1.0    -Smartcard compatibility                      01/20/2004
  131.          -Documentation on certificate storage
  132. 
  133. 6.2.0    -Introduction of Central Directory            04/26/2004
  134.           Encryption for encrypting metadata
  135.          -Added OS X to Version Made By values
  136. 
  137. 6.2.1    -Added Extra Field placeholder for            04/01/2005
  138.           POSZIP using ID 0x4690
  139. 
  140.          -Clarified size field on
  141.           "zip64 end of central directory record"
  142. 
  143. 6.2.2    -Documented Final Feature Specification       01/06/2006
  144.           for Strong Encryption
  145. 
  146.          -Clarifications and typographical
  147.           corrections
  148. 
  149. 6.3.0    -Added tape positioning storage               09/29/2006
  150.           parameters
  151. 
  152.          -Expanded list of supported hash algorithms
  153. 
  154.          -Expanded list of supported compression
  155.           algorithms
  156. 
  157.          -Expanded list of supported encryption
  158.           algorithms
  159. 
  160.          -Added option for Unicode filename
  161.           storage
  162. 
  163.          -Clarifications for consistent use
  164.           of Data Descriptor records
  165. 
  166.          -Added additional "Extra Field"
  167.           definitions
  168. 
  169. 6.3.1    -Corrected standard hash values for           04/11/2007
  170.           SHA-256/384/512
  171. 
  172. 6.3.2    -Added compression method 97                  09/28/2007
  173. 
  174.          -Documented InfoZIP "Extra Field"
  175.           values for UTF-8 file name and
  176.           file comment storage
  177. 
  178. 6.3.3    -Formatting changes to support                09/01/2012
  179.           easier referencing of this APPNOTE
  180.           from other documents and standards
  181. 
  182. 6.3.4    -Address change                               10/01/2014
  183.  
  184.  
  185. 3.0 Notations
  186. -------------
  187.  
  188. 3.1 Use of the term MUST or SHALL indicates a required element.
  189.  
  190. 3.2 MAY NOT or SHALL NOT indicates an element is prohibited from use.
  191.  
  192. 3.3 SHOULD indicates a RECOMMENDED element.
  193.  
  194. 3.4 SHOULD NOT indicates an element NOT RECOMMENDED for use.
  195.  
  196. 3.5 MAY indicates an OPTIONAL element.
  197.  
  198.  
  199. 4.0 ZIP Files
  200. -------------
  201.  
  202. 4.1 What is a ZIP file
  203. ----------------------
  204.  
  205. 4.1.1 ZIP files MAY be identified by the standard .ZIP file extension
  206. although use of a file extension is not required. Use of the
  207. extension .ZIPX is also recognized and MAY be used for ZIP files.
  208. Other common file extensions using the ZIP format include .JAR, .WAR,
  209. .DOCX, .XLSX, .PPTX, .ODT, .ODS, .ODP and others. Programs reading or
  210. writing ZIP files SHOULD rely on internal record signatures described
  211. in this document to identify files in this format.
  212.  
  213. 4.1.2 ZIP files SHOULD contain at least one file and MAY contain
  214. multiple files.
  215.  
  216. 4.1.3 Data compression MAY be used to reduce the size of files
  217. placed into a ZIP file, but is not required. This format supports the
  218. use of multiple data compression algorithms. When compression is used,
  219. one of the documented compression algorithms MUST be used. Implementors
  220. are advised to experiment with their data to determine which of the
  221. available algorithms provides the best compression for their needs.
  222. Compression method 8 (Deflate) is the method used by default by most
  223. ZIP compatible application programs.
  224.  
  225.  
  226. 4.1.4 Data encryption MAY be used to protect files within a ZIP file.
  227. Keying methods supported for encryption within this format include
  228. passwords and public/private keys. Either MAY be used individually
  229. or in combination. Encryption MAY be applied to individual files.
  230. Additional security MAY be used through the encryption of ZIP file
  231. metadata stored within the Central Directory. See the section on the
  232. Strong Encryption Specification for information. Refer to the section
  233. in this document entitled "Incorporating PKWARE Proprietary Technology
  234. into Your Product" for more information.
  235.  
  236. 4.1.5 Data integrity MUST be provided for each file using CRC32.
  237.  
  238. 4.1.6 Additional data integrity MAY be included through the use of
  239. digital signatures. Individual files MAY be signed with one or more
  240. digital signatures. The Central Directory, if signed, MUST use a
  241. single signature.
  242.  
  243. 4.1.7 Files MAY be placed within a ZIP file uncompressed or stored.
  244. The term "stored" as used in the context of this document means the file
  245. is copied into the ZIP file uncompressed.
  246.  
  247. 4.1.8 Each data file placed into a ZIP file MAY be compressed, stored,
  248. encrypted or digitally signed independent of how other data files in the
  249. same ZIP file are archived.
  250.  
  251. 4.1.9 ZIP files MAY be streamed, split into segments (on fixed or on
  252. removable media) or "self-extracting". Self-extracting ZIP
  253. files MUST include extraction code for a target platform within
  254. the ZIP file.
  255.  
  256. 4.1.10 Extensibility is provided for platform or application specific
  257. needs through extra data fields that MAY be defined for custom
  258. purposes. Extra data definitions MUST NOT conflict with existing
  259. documented record definitions.
  260.  
  261. 4.1.11 Common uses for ZIP MAY also include the use of manifest files.
  262. Manifest files store application specific information within a file stored
  263. within the ZIP file. This manifest file SHOULD be the first file in the
  264. ZIP file. This specification does not provide any information or guidance on
  265. the use of manifest files within ZIP files. Refer to the application developer
  266. for information on using manifest files and for any additional profile
  267. information on using ZIP within an application.
  268.  
  269. 4.1.12 ZIP files MAY be placed within other ZIP files.
  270.  
  271. 4.2 ZIP Metadata
  272. ----------------
  273.  
  274. 4.2.1 ZIP files are identified by metadata consisting of defined record types
  275. containing the storage information necessary for maintaining the files
  276. placed into a ZIP file. Each record type MUST be identified using a header
  277. signature that identifies the record type. Signature values begin with the
  278. two byte constant marker of 0x4b50, representing the characters "PK".
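
As a rough illustration of 4.2.1, the sketch below checks for the "PK" marker and maps a four-byte signature to a record name. It assumes Python and its standard struct module. The function name identify_record and the SIGNATURES table are illustrative choices and not part of this specification; the signature values are the ones listed with the record layouts in section 4.3.

```python
import struct

# Signature values as listed with the record layouts in section 4.3.
SIGNATURES = {
    0x04034B50: "local file header",
    0x08064B50: "archive extra data record",
    0x02014B50: "central directory header",
    0x05054B50: "digital signature",
    0x06064B50: "zip64 end of central directory record",
    0x07064B50: "zip64 end of central directory locator",
    0x06054B50: "end of central directory record",
}


def identify_record(buf, offset=0):
    """Return the record name found at `offset`, or None if unrecognized."""
    if len(buf) < offset + 4:
        return None
    # Stored little-endian, the marker 0x4b50 appears on disk as b"PK".
    if buf[offset:offset + 2] != b"PK":
        return None
    (signature,) = struct.unpack_from("<I", buf, offset)
    return SIGNATURES.get(signature)
```

A reader might call identify_record on the first bytes of a candidate file, or at an offset taken from the central directory, before attempting to parse the record that follows.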
  279.  
  280.  
  281. 4.3 General Format of a .ZIP file
  282. ---------------------------------
  283.  
  284. 4.3.1 A ZIP file MUST contain an "end of central directory record". A ZIP
  285. file containing only an "end of central directory record" is considered an
  286. empty ZIP file. Files may be added or replaced within a ZIP file, or deleted.
  287. A ZIP file MUST have only one "end of central directory record". Other
  288. records defined in this specification MAY be used as needed to support
  289. storage requirements for individual ZIP files.
  290.  
  291. 4.3.2 Each file placed into a ZIP file MUST be preceded by a "local
  292. file header" record for that file. Each "local file header" MUST be
  293. accompanied by a corresponding "central directory header" record within
  294. the central directory section of the ZIP file.
  295.  
  296. 4.3.3 Files MAY be stored in arbitrary order within a ZIP file. A ZIP
  297. file MAY span multiple volumes or it MAY be split into user-defined
  298. segment sizes. All values MUST be stored in little-endian byte order unless
  299. otherwise specified in this document for a specific data element.
  300.  
  301. 4.3.4 Compression MUST NOT be applied to a "local file header", an "encryption
  302. header", or an "end of central directory record". Individual "central
  303. directory records" MUST NOT be compressed, but the aggregate of all central
  304. directory records MAY be compressed.
  305.  
  306. 4.3.5 File data MAY be followed by a "data descriptor" for the file. Data
  307. descriptors are used to facilitate ZIP file streaming.
  308.  
  309.  
  310. 4.3.6 Overall .ZIP file format:
  311.  
  312. [local file header 1]
  313. [encryption header 1]
  314. [file data 1]
  315. [data descriptor 1]
  316. .
  317. .
  318. .
  319. [local file header n]
  320. [encryption header n]
  321. [file data n]
  322. [data descriptor n]
  323. [archive decryption header]
  324. [archive extra data record]
  325. [central directory header 1]
  326. .
  327. .
  328. .
  329. [central directory header n]
  330. [zip64 end of central directory record]
  331. [zip64 end of central directory locator]
  332. [end of central directory record]
  333.  
  334.  
  335. 4.3.7 Local file header:
  336.  
  337. local file header signature         4 bytes  (0x04034b50)
  338. version needed to extract           2 bytes
  339. general purpose bit flag            2 bytes
  340. compression method                  2 bytes
  341. last mod file time                  2 bytes
  342. last mod file date                  2 bytes
  343. crc-32                              4 bytes
  344. compressed size                     4 bytes
  345. uncompressed size                   4 bytes
  346. file name length                    2 bytes
  347. extra field length                  2 bytes
  348. 
  349. file name                           (variable size)
  350. extra field                         (variable size)
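
The layout in 4.3.7 can be read with a fixed-size unpack followed by two variable-length slices. The following is a minimal sketch, assuming Python's struct module and an archive already loaded into a bytes object. The names parse_local_header and LOCAL_HEADER, and the dictionary keys, are hypothetical. The sketch does not handle ZIP64 sizes, encryption headers, or data descriptors.

```python
import struct

# Fixed 30-byte portion of the local file header (4.3.7),
# little-endian as required by 4.3.3.
LOCAL_HEADER = struct.Struct("<IHHHHHIIIHH")
LOCAL_SIGNATURE = 0x04034B50


def parse_local_header(buf, offset=0):
    """Return (fields, data_offset) for the local file header at `offset`."""
    (signature, version_needed, flags, method, mod_time, mod_date,
     crc32, compressed_size, uncompressed_size,
     name_len, extra_len) = LOCAL_HEADER.unpack_from(buf, offset)
    if signature != LOCAL_SIGNATURE:
        raise ValueError("not a local file header")
    name_start = offset + LOCAL_HEADER.size
    extra_start = name_start + name_len
    data_start = extra_start + extra_len
    fields = {
        "version_needed": version_needed,
        "flags": flags,
        "method": method,
        "mod_time": mod_time,
        "mod_date": mod_date,
        "crc32": crc32,
        "compressed_size": compressed_size,
        "uncompressed_size": uncompressed_size,
        "file_name": bytes(buf[name_start:extra_start]),
        "extra_field": bytes(buf[extra_start:data_start]),
    }
    return fields, data_start
```

The returned data_start is where an encryption header (if any) or the file data begins, so a caller can slice compressed_size bytes from that point when the sizes in the header are populated.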
  351.  
  352. 4.3.8 File data
  353.  
  354. Immediately following the local header for a file
  355. SHOULD be placed the compressed or stored data for the file.
  356. If the file is encrypted, the encryption header for the file
  357. SHOULD be placed after the local header and before the file
  358. data. The series of [local file header][encryption header]
  359. [file data][data descriptor] repeats for each file in the
  360. .ZIP archive.
  361.  
  362. Zero-byte files, directories, and other file types that
  363. contain no content MUST NOT include file data.
  364.  
  365. 4.3.9 Data descriptor:
  366.  
  367. crc-32                              4 bytes
  368. compressed size                     4 bytes
  369. uncompressed size                   4 bytes
  370.  
  371. 4.3.9.1 This descriptor MUST exist if bit 3 of the general
  372. purpose bit flag is set (see below). It is byte aligned
  373. and immediately follows the last byte of compressed data.
  374. This descriptor SHOULD be used only when it was not possible to
  375. seek in the output .ZIP file, e.g., when the output .ZIP file
  376. was standard output or a non-seekable device. For ZIP64(tm) format
  377. archives, the compressed and uncompressed sizes are 8 bytes each.
  378.  
  379. 4.3.9.2 When compressing files, compressed and uncompressed sizes
  380. should be stored in ZIP64 format (as 8 byte values) when a
  381. file's size exceeds 0xFFFFFFFF. However ZIP64 format may be
  382. used regardless of the size of a file. When extracting, if
  383. the zip64 extended information extra field is present for
  384. the file the compressed and uncompressed sizes will be 8
  385. byte values.
  386.  
  387. 4.3.9.3 Although not originally assigned a signature, the value
  388. 0x08074b50 has commonly been adopted as a signature value
  389. for the data descriptor record. Implementers should be
  390. aware that ZIP files may be encountered with or without this
  391. signature marking data descriptors and SHOULD account for
  392. either case when reading ZIP files to ensure compatibility.
  393.  
  394. 4.3.9.4 When writing ZIP files, implementors SHOULD include the
  395. signature value marking the data descriptor record. When
  396. the signature is used, the fields currently defined for
  397. the data descriptor record will immediately follow the
  398. signature.
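
Since 4.3.9.3 asks readers to accept data descriptors with or without the 0x08074b50 signature, one possible approach is to peek at the first four bytes and skip them when they match. This is only a sketch under stated assumptions: the function name parse_data_descriptor and its zip64 parameter are hypothetical, and the caller is assumed to already know where the compressed data ends. A CRC-32 that happens to equal the signature value would be misread, so a more careful reader would cross-check against the central directory.

```python
import struct

DESCRIPTOR_SIGNATURE = 0x08074B50


def parse_data_descriptor(buf, offset, zip64=False):
    """Parse a data descriptor that may or may not start with the signature.

    Returns (crc32, compressed_size, uncompressed_size, next_offset).
    """
    (first,) = struct.unpack_from("<I", buf, offset)
    if first == DESCRIPTOR_SIGNATURE:
        # Heuristic: treat a leading 0x08074b50 as the optional signature.
        offset += 4
    # Sizes are 8 bytes each for ZIP64 archives, 4 bytes otherwise (4.3.9.1).
    fmt = "<IQQ" if zip64 else "<III"
    crc32, compressed_size, uncompressed_size = struct.unpack_from(fmt, buf, offset)
    return crc32, compressed_size, uncompressed_size, offset + struct.calcsize(fmt)
```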
  399.  
  400. 4.3.9.5 An extensible data descriptor will be released in a
  401. future version of this APPNOTE. This new record is intended to
  402. resolve conflicts with the use of this record going forward,
  403. and to provide better support for streamed file processing.
  404.  
  405. 4.3.9.6 When the Central Directory Encryption method is used,
  406. the data descriptor record is not required, but MAY be used.
  407. If present, and bit 3 of the general purpose bit field is set to
  408. indicate its presence, the values in fields of the data descriptor
  409. record MUST be set to binary zeros. See the section on the Strong
  410. Encryption Specification for information. Refer to the section in
  411. this document entitled "Incorporating PKWARE Proprietary Technology
  412. into Your Product" for more information.
  413.  
  414.  
  415. 4.3.10 Archive decryption header:
  416.  
  417. 4.3.10.1 The Archive Decryption Header is introduced in version 6.2
  418. of the ZIP format specification. This record exists in support
  419. of the Central Directory Encryption Feature implemented as part of
  420. the Strong Encryption Specification as described in this document.
  421. When the Central Directory Structure is encrypted, this decryption
  422. header MUST precede the encrypted data segment.
  423.  
  424. 4.3.10.2 The encrypted data segment SHALL consist of the Archive
  425. extra data record (if present) and the encrypted Central Directory
  426. Structure data. The format of this data record is identical to the
  427. Decryption header record preceding compressed file data. If the
  428. central directory structure is encrypted, the location of the start of
  429. this data record is determined using the Start of Central Directory
  430. field in the Zip64 End of Central Directory record. See the
  431. section on the Strong Encryption Specification for information
  432. on the fields used in the Archive Decryption Header record.
  433. Refer to the section in this document entitled "Incorporating
  434. PKWARE Proprietary Technology into Your Product" for more information.
  435.  
  436.  
  437. 4.3.11 Archive extra data record:
  438.  
  439. archive extra data signature        4 bytes  (0x08064b50)
  440. extra field length                  4 bytes
  441. extra field data                    (variable size)
  442.  
  443. 4.3.11.1 The Archive Extra Data Record is introduced in version 6.2
  444. of the ZIP format specification. This record MAY be used in support
  445. of the Central Directory Encryption Feature implemented as part of
  446. the Strong Encryption Specification as described in this document.
  447. When present, this record MUST immediately precede the central
  448. directory data structure.
  449.  
  450. 4.3.11.2 The size of this data record SHALL be included in the
  451. Size of the Central Directory field in the End of Central
  452. Directory record. If the central directory structure is compressed,
  453. but not encrypted, the location of the start of this data record is
  454. determined using the Start of Central Directory field in the Zip64
  455. End of Central Directory record. Refer to the section in this document
  456. entitled "Incorporating PKWARE Proprietary Technology into Your
  457. Product" for more information.
  458.  
  459. 4.3.12 Central directory structure:
  460.  
  461. [central directory header 1]
  462. .
  463. .
  464. .
  465. [central directory header n]
  466. [digital signature]
  467.  
  468. File header:
  469.  
  470. central file header signature       4 bytes  (0x02014b50)
  471. version made by                     2 bytes
  472. version needed to extract           2 bytes
  473. general purpose bit flag            2 bytes
  474. compression method                  2 bytes
  475. last mod file time                  2 bytes
  476. last mod file date                  2 bytes
  477. crc-32                              4 bytes
  478. compressed size                     4 bytes
  479. uncompressed size                   4 bytes
  480. file name length                    2 bytes
  481. extra field length                  2 bytes
  482. file comment length                 2 bytes
  483. disk number start                   2 bytes
  484. internal file attributes            2 bytes
  485. external file attributes            4 bytes
  486. relative offset of local header     4 bytes
  487. 
  488. file name                           (variable size)
  489. extra field                         (variable size)
  490. file comment                        (variable size)
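
To show how the central directory file header is laid out in practice, the sketch below walks a run of consecutive headers. It assumes Python's struct module, an unencrypted and uncompressed central directory held in memory, and a starting offset and entry count obtained elsewhere (for example from the end of central directory record). The name iter_central_directory and the dictionary keys are hypothetical.

```python
import struct

# Fixed 46-byte portion of the central directory file header (4.3.12).
CENTRAL_HEADER = struct.Struct("<I6H3I5H2I")
CENTRAL_SIGNATURE = 0x02014B50


def iter_central_directory(buf, offset, count):
    """Yield one dict per central directory file header."""
    for _ in range(count):
        (signature, made_by, version_needed, flags, method, mod_time, mod_date,
         crc32, compressed_size, uncompressed_size,
         name_len, extra_len, comment_len, disk_start, internal_attrs,
         external_attrs, local_offset) = CENTRAL_HEADER.unpack_from(buf, offset)
        if signature != CENTRAL_SIGNATURE:
            raise ValueError("not a central directory file header")
        name_start = offset + CENTRAL_HEADER.size
        extra_start = name_start + name_len
        comment_start = extra_start + extra_len
        next_offset = comment_start + comment_len
        yield {
            "version_made_by": made_by,
            "version_needed": version_needed,
            "flags": flags,
            "method": method,
            "crc32": crc32,
            "compressed_size": compressed_size,
            "uncompressed_size": uncompressed_size,
            "disk_start": disk_start,
            "internal_attrs": internal_attrs,
            "external_attrs": external_attrs,
            "local_header_offset": local_offset,
            "file_name": bytes(buf[name_start:extra_start]),
            "extra_field": bytes(buf[extra_start:comment_start]),
            "file_comment": bytes(buf[comment_start:next_offset]),
        }
        # The next header starts right after the three variable-size fields.
        offset = next_offset
```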
  491.  
  492. 4.3.13 Digital signature:
  493.  
  494. header signature                    4 bytes  (0x05054b50)
  495. size of data                        2 bytes
  496. signature data                      (variable size)
  497.  
  498. With the introduction of the Central Directory Encryption
  499. feature in version 6.2 of this specification, the Central
  500. Directory Structure MAY be stored both compressed and encrypted.
  501. Although not required, it is assumed when encrypting the
  502. Central Directory Structure, that it will be compressed
  503. for greater storage efficiency. Information on the
  504. Central Directory Encryption feature can be found in the section
  505. describing the Strong Encryption Specification. The Digital
  506. Signature record will be neither compressed nor encrypted.
  507.  
  508. 4.3.14 Zip64 end of central directory record
  509.  
  510. zip64 end of central dir
  511.   signature                         4 bytes  (0x06064b50)
  512. size of zip64 end of central
  513.   directory record                  8 bytes
  514. version made by                     2 bytes
  515. version needed to extract           2 bytes
  516. number of this disk                 4 bytes
  517. number of the disk with the
  518.   start of the central directory    4 bytes
  519. total number of entries in the
  520.   central directory on this disk    8 bytes
  521. total number of entries in the
  522.   central directory                 8 bytes
  523. size of the central directory       8 bytes
  524. offset of start of central
  525.   directory with respect to
  526.   the starting disk number          8 bytes
  527. zip64 extensible data sector        (variable size)
  528.  
  529. 4.3.14.1 The value stored into the "size of zip64 end of central
  530. directory record" should be the size of the remaining
  531. record and should not include the leading 12 bytes.
  532.  
  533. Size = SizeOfFixedFields + SizeOfVariableData - 12.
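
As a worked instance of the formula in 4.3.14.1: the fixed fields of the Version 1 record total 56 bytes, so with an empty extensible data sector the stored size is 56 + 0 - 12 = 44. The short sketch below computes the same value from the field widths, assuming Python's struct module; the name zip64_eocd_size_field is hypothetical.

```python
import struct

# Fixed fields of the Version 1 zip64 end of central directory record:
# signature, size, version made by, version needed, two disk numbers,
# two entry counts, central directory size and offset.
ZIP64_EOCD_FIXED = struct.Struct("<IQHHIIQQQQ")


def zip64_eocd_size_field(extensible_data=b""):
    """Value to store in "size of zip64 end of central directory record"."""
    # The leading 12 bytes (4-byte signature + 8-byte size field) are excluded.
    return ZIP64_EOCD_FIXED.size + len(extensible_data) - 12
```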
  534.  
  535. 4.3.14.2 The above record structure defines Version 1 of the
  536. zip64 end of central directory record. Version 1 was
  537. implemented in versions of this specification preceding
  538. 6.2 in support of the ZIP64 large file feature. The
  539. introduction of the Central Directory Encryption feature
  540. implemented in version 6.2 as part of the Strong Encryption
  541. Specification defines Version 2 of this record structure.
  542. Refer to the section describing the Strong Encryption
  543. Specification for details on the version 2 format for
  544. this record. Refer to the section in this document entitled
  545. "Incorporating PKWARE Proprietary Technology into Your Product"
  546. for more information applicable to use of Version 2 of this
  547. record.
  548.  
  549. 4.3.14.3 Special purpose data MAY reside in the zip64 extensible
  550. data sector field following either a V1 or V2 version of this
  551. record. To ensure identification of this special purpose data
  552. it must include an identifying header block consisting of the
  553. following:
  554.  
  555. Header ID - 2 bytes
  556. Data Size - 4 bytes
  557.  
  558. The Header ID field indicates the type of data that is in the
  559. data block that follows.
  560.  
  561. Data Size identifies the number of bytes that follow for this
  562. data block type.
  563.  
  564. 4.3.14.4 Multiple special purpose data blocks MAY be present.
  565. Each MUST be preceded by a Header ID and Data Size field. Current
  566. mappings of Header ID values supported in this field are as
  567. defined in APPENDIX C.
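
The Header ID / Data Size framing described in 4.3.14.3 and 4.3.14.4 can be walked with a simple loop. The following sketch assumes Python's struct module and that the extensible data sector has already been sliced out of the record; the name iter_extensible_blocks is hypothetical, and the sketch does not interpret any particular Header ID.

```python
import struct

# Each block starts with a 2-byte Header ID and a 4-byte Data Size.
BLOCK_HEADER = struct.Struct("<HI")


def iter_extensible_blocks(sector):
    """Yield (header_id, data) for each block in the extensible data sector."""
    offset = 0
    while offset < len(sector):
        if offset + BLOCK_HEADER.size > len(sector):
            raise ValueError("truncated block header")
        header_id, data_size = BLOCK_HEADER.unpack_from(sector, offset)
        start = offset + BLOCK_HEADER.size
        end = start + data_size
        if end > len(sector):
            raise ValueError("block data runs past end of sector")
        yield header_id, bytes(sector[start:end])
        offset = end
```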
  568.  
  569. 4.3.15 Zip64 end of central directory locator
  570.  
  571. zip64 end of central dir locator
  572.   signature                         4 bytes  (0x07064b50)
  573. number of the disk with the
  574.   start of the zip64 end of
  575.   central directory                 4 bytes
  576. relative offset of the zip64
  577.   end of central directory record   8 bytes
  578. total number of disks               4 bytes
  579.  
  580. 4.3.16 End of central directory record:
  581.  
  582. end of central dir signature        4 bytes  (0x06054b50)
  583. number of this disk                 2 bytes
  584. number of the disk with the
  585.   start of the central directory    2 bytes
  586. total number of entries in the
  587.   central directory on this disk    2 bytes
  588. total number of entries in
  589.   the central directory             2 bytes
  590. size of the central directory       4 bytes
  591. offset of start of central
  592.   directory with respect to
  593.   the starting disk number          4 bytes
  594. .ZIP file comment length            2 bytes
  595. .ZIP file comment                   (variable size)
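
This specification gives the layout of the end of central directory record but does not prescribe how a reader should locate it. A common approach, sketched below as an assumption and not as a requirement of this document, is to search backwards from the end of the file for the signature and accept the first candidate whose comment length reaches exactly to the end of the data. The sketch assumes Python's struct module and a single-segment archive held in memory; find_eocd and the dictionary keys are hypothetical names.

```python
import struct

EOCD = struct.Struct("<IHHHHIIH")   # 22 fixed bytes (4.3.16)
EOCD_MAGIC = b"PK\x05\x06"          # 0x06054b50 stored little-endian
MAX_COMMENT = 0xFFFF                # comment length is a 2-byte field


def find_eocd(buf):
    """Return (offset, fields) for the end of central directory record."""
    lowest = max(0, len(buf) - EOCD.size - MAX_COMMENT)
    pos = buf.rfind(EOCD_MAGIC, lowest)
    while pos != -1:
        if pos + EOCD.size <= len(buf):
            (_, disk, cd_disk, entries_here, entries_total,
             cd_size, cd_offset, comment_len) = EOCD.unpack_from(buf, pos)
            # Accept only a candidate whose comment ends at the end of the data.
            if pos + EOCD.size + comment_len == len(buf):
                return pos, {
                    "disk": disk,
                    "cd_start_disk": cd_disk,
                    "entries_on_disk": entries_here,
                    "total_entries": entries_total,
                    "cd_size": cd_size,
                    "cd_offset": cd_offset,
                    "comment": bytes(buf[pos + EOCD.size:]),
                }
        # Keep searching strictly before the rejected candidate.
        pos = buf.rfind(EOCD_MAGIC, lowest, pos + 3)
    raise ValueError("end of central directory record not found")
```

The comment-length check guards against a signature-like byte sequence inside the comment itself; stricter readers may add further validation, such as checking that the central directory offset and size fit inside the file.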
  596.  
  597. 4.4 Explanation of fields
  598. --------------------------
  599.  
  600. 4.4.1 General notes on fields
  601.  
  602. 4.4.1.1 All fields unless otherwise noted are unsigned and stored
  603. in Intel low-byte:high-byte, low-word:high-word order.
  604.  
  605. 4.4.1.2 String fields are not null terminated, since the length
  606. is given explicitly.
  607.  
  608. 4.4.1.3 The entries in the central directory may not necessarily
  609. be in the same order that files appear in the .ZIP file.
  610.  
  611. 4.4.1.4 If one of the fields in the end of central directory
  612. record is too small to hold required data, the field should be
  613. set to -1 (0xFFFF or 0xFFFFFFFF) and the ZIP64 format record
  614. should be created.
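
Because the fields are unsigned, the "-1" in 4.4.1.4 means the all-ones value for the field width (0xFFFF for 2-byte fields, 0xFFFFFFFF for 4-byte fields). A writer might choose the stored value along the lines of the sketch below. The function name eocd_field is hypothetical, and treating a value exactly equal to the all-ones marker as needing ZIP64 is a conservative assumption of this sketch, not something this section states.

```python
def eocd_field(actual, width_bytes):
    """Return (stored_value, needs_zip64) for an end of central directory field."""
    sentinel = (1 << (8 * width_bytes)) - 1  # 0xFFFF or 0xFFFFFFFF
    # Assumption: a value equal to the sentinel is also routed to ZIP64,
    # since a reader could not tell it apart from the marker.
    if actual >= sentinel:
        return sentinel, True
    return actual, False
```

For example, an archive with 70000 entries would store 0xFFFF in the 2-byte entry count and carry the real count in the zip64 end of central directory record.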
  615.  
  616. 4.4.1.5 The end of central directory record and the Zip64 end
  617. of central directory locator record MUST reside on the same
  618. disk when splitting or spanning an archive.
  619.  
  620. 4.4.2 version made by (2 bytes)
  621.  
  622. 4.4.2.1 The upper byte indicates the compatibility of the file
  623. attribute information. If the external file attributes
  624. are compatible with MS-DOS and can be read by PKZIP for
  625. DOS version 2.04g then this value will be zero. If these
  626. attributes are not compatible, then this value will
  627. identify the host system on which the attributes are
  628. compatible. Software can use this information to determine
  629. the line record format for text files etc.
  630.  
  631. 4.4.2.2 The current mappings are:
  632.  
  633. 0 - MS-DOS and OS/2 (FAT / VFAT / FAT32 file systems)
  634. 1 - Amiga                     2 - OpenVMS
  635. 3 - UNIX                      4 - VM/CMS
  636. 5 - Atari ST                  6 - OS/2 H.P.F.S.
  637. 7 - Macintosh                 8 - Z-System
  638. 9 - CP/M                      10 - Windows NTFS
  639. 11 - MVS (OS/390 - Z/OS)      12 - VSE
  640. 13 - Acorn Risc               14 - VFAT
  641. 15 - alternate MVS            16 - BeOS
  642. 17 - Tandem                   18 - OS/400
  643. 19 - OS X (Darwin)            20 thru 255 - unused
  644.  
  645. 4.4.2.3 The lower byte indicates the ZIP specification version
  646. (the version of this document) supported by the software
  647. used to encode the file. The value/10 indicates the major
  648. version number, and the value mod 10 is the minor version
  649. number.
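
The two bytes of "version made by" can be split as described in 4.4.2.1 through 4.4.2.3. The sketch below is illustrative only; the name decode_version_made_by and the HOST_SYSTEMS table are hypothetical, with the table contents copied from the mappings in 4.4.2.2.

```python
# Host system mappings from 4.4.2.2; values 20 through 255 are unused.
HOST_SYSTEMS = {
    0: "MS-DOS and OS/2 (FAT / VFAT / FAT32 file systems)",
    1: "Amiga", 2: "OpenVMS", 3: "UNIX", 4: "VM/CMS",
    5: "Atari ST", 6: "OS/2 H.P.F.S.", 7: "Macintosh", 8: "Z-System",
    9: "CP/M", 10: "Windows NTFS", 11: "MVS (OS/390 - Z/OS)", 12: "VSE",
    13: "Acorn Risc", 14: "VFAT", 15: "alternate MVS", 16: "BeOS",
    17: "Tandem", 18: "OS/400", 19: "OS X (Darwin)",
}


def decode_version_made_by(value):
    """Return (host_system, major, minor) for a 2-byte "version made by"."""
    host = value >> 8            # upper byte: attribute compatibility
    spec_version = value & 0xFF  # lower byte: specification version
    # value/10 is the major version, value mod 10 the minor version.
    return HOST_SYSTEMS.get(host, "unused"), spec_version // 10, spec_version % 10
```

For instance, a stored value of 0x031E decodes to host system 3 (UNIX) and specification version 3.0, since the lower byte 0x1E is 30.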
  650.  
  651. 4.4.3 version needed to extract (2 bytes)
  652.  
  653. 4.4.3.1 The minimum supported ZIP specification version needed
  654. to extract the file, mapped as above. This value is based on
  655. the specific format features a ZIP program MUST support to
  656. be able to extract the file. If multiple features are
  657. applied to a file, the minimum version MUST be set to the
  658. feature having the highest value. New features or feature
  659. changes affecting the published format specification will be
  660. implemented using higher version numbers than the last
  661. published value to avoid conflict.
  662.  
  663. 4.4.3.2 Current minimum feature versions are as defined below:
  664.  
  665. 1.0 - Default value
  666. 1.1 - File is a volume label
  667. 2.0 - File is a folder (directory)
  668. 2.0 - File is compressed using Deflate compression
  669. 2.0 - File is encrypted using traditional PKWARE encryption
  670. 2.1 - File is compressed using Deflate64(tm)
  671. 2.5 - File is compressed using PKWARE DCL Implode
  672. 2.7 - File is a patch data set
  673. 4.5 - File uses ZIP64 format extensions
  674. 4.6 - File is compressed using BZIP2 compression*
  675. 5.0 - File is encrypted using DES
  676. 5.0 - File is encrypted using 3DES
  677. 5.0 - File is encrypted using original RC2 encryption
  678. 5.0 - File is encrypted using RC4 encryption
  679. 5.1 - File is encrypted using AES encryption
  680. 5.1 - File is encrypted using corrected RC2 encryption**
  681. 5.2 - File is encrypted using corrected RC2-64 encryption**
  682. 6.1 - File is encrypted using non-OAEP key wrapping***
  683. 6.2 - Central directory encryption
  684. 6.3 - File is compressed using LZMA
  685. 6.3 - File is compressed using PPMd+
  686. 6.3 - File is encrypted using Blowfish
  687. 6.3 - File is encrypted using Twofish
  688.  
  689. 4.4.3.3 Notes on version needed to extract
  690.  
  691. * Early 7.x (pre-7.2) versions of PKZIP incorrectly set the
  692. version needed to extract for BZIP2 compression to be 50
  693. when it should have been 46.
  694.  
  695. ** Refer to the section on Strong Encryption Specification
  696. for additional information regarding RC2 corrections.
  697.  
  698. *** Certificate encryption using non-OAEP key wrapping is the
  699. intended mode of operation for all versions beginning with 6.1.
  700. Support for OAEP key wrapping MUST only be used for
  701. backward compatibility when sending ZIP files to be opened by
  702. versions of PKZIP older than 6.1 (5.0 or 6.0).
  703.  
  704. + Files compressed using PPMd MUST set the version
  705. needed to extract field to 6.3, however, not all ZIP
  706. programs enforce this and may be unable to decompress
  707. data files compressed using PPMd if this value is set.
  708.  
  709. When using ZIP64 extensions, the corresponding value in the
  710. zip64 end of central directory record MUST also be set.
  711. This field should be set appropriately to indicate whether
  712. Version 1 or Version 2 format is in use.
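
The rule in 4.4.3.1 that the field takes the highest value among the features applied can be expressed as a maximum over a lookup table. In the sketch below the feature keys are hypothetical labels invented for illustration; the numeric values come from the list in 4.4.3.2, written in the stored form (major * 10 + minor) described in 4.4.2.3.

```python
# Keys are hypothetical labels; values are the minimum versions from
# 4.4.3.2, stored as major * 10 + minor (for example, 4.5 is stored as 45).
FEATURE_VERSIONS = {
    "default": 10,
    "volume_label": 11,
    "directory": 20,
    "deflate": 20,
    "traditional_encryption": 20,
    "deflate64": 21,
    "dcl_implode": 25,
    "patch_data": 27,
    "zip64": 45,
    "bzip2": 46,
    "des": 50,
    "3des": 50,
    "rc2_original": 50,
    "rc4": 50,
    "aes": 51,
    "rc2_corrected": 51,
    "rc2_64_corrected": 52,
    "non_oaep_key_wrapping": 61,
    "central_directory_encryption": 62,
    "lzma": 63,
    "ppmd": 63,
    "blowfish": 63,
    "twofish": 63,
}


def version_needed_to_extract(features):
    """Return the stored "version needed to extract" for a set of features."""
    versions = [FEATURE_VERSIONS["default"]]
    versions.extend(FEATURE_VERSIONS[name] for name in features)
    # With several features applied, the highest requirement wins.
    return max(versions)
```

A Deflate-compressed entry that also uses ZIP64 extensions would therefore store 45, and adding AES encryption would raise it to 51.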
  713.  
  714.  
  715. 4.4.4 general purpose bit flag: (2 bytes)
  716.  
  717. Bit 0: If set, indicates that the file is encrypted.
  718.  
  719. (For Method 6 - Imploding)
  720. Bit 1: If the compression method used was type 6,
  721. Imploding, then this bit, if set, indicates
  722. an 8K sliding dictionary was used. If clear,
  723. then a 4K sliding dictionary was used.
  724.  
  725. Bit 2: If the compression method used was type 6,
  726. Imploding, then this bit, if set, indicates
  727. 3 Shannon-Fano trees were used to encode the
  728. sliding dictionary output. If clear, then 2
  729. Shannon-Fano trees were used.
  730.  
  731. (For Methods 8 and 9 - Deflating)
  732. Bit 2  Bit 1
  733.   0      0    Normal (-en) compression option was used.
  734.   0      1    Maximum (-exx/-ex) compression option was used.
  735.   1      0    Fast (-ef) compression option was used.
  736.   1      1    Super Fast (-es) compression option was used.
  737.  
  738. (For Method 14 - LZMA)
  739. Bit 1: If the compression method used was type 14,
  740. LZMA, then this bit, if set, indicates
  741. an end-of-stream (EOS) marker is used to
  742. mark the end of the compressed data stream.
  743. If clear, then an EOS marker is not present
  744. and the compressed data size must be known
  745. to extract.
  746.  
  747. Note: Bits 1 and 2 are undefined if the compression
  748. method is any other.
  749.  
  750. Bit 3: If this bit is set, the fields crc-32, compressed
  751. size and uncompressed size are set to zero in the
  752. local header. The correct values are put in the
  753. data descriptor immediately following the compressed
  754. data. (Note: PKZIP version 2.04g for DOS only
  755. recognizes this bit for method 8 compression; newer
  756. versions of PKZIP recognize this bit for any
  757. compression method.)
  758.  
  759. Bit 4: Reserved for use with method 8, for enhanced
  760. deflating.
  761.  
  762. Bit 5: If this bit is set, this indicates that the file is
  763. compressed patched data. (Note: Requires PKZIP
  764. version 2.70 or greater)
  765.  
  766. Bit 6: Strong encryption. If this bit is set, you MUST
  767. set the version needed to extract value to at least
  768. 50 and you MUST also set bit 0. If AES encryption
  769. is used, the version needed to extract value MUST
  770. be at least 51. See the section describing the Strong
  771. Encryption Specification for details. Refer to the
  772. section in this document entitled "Incorporating PKWARE
  773. Proprietary Technology into Your Product" for more
  774. information.
  775.  
  776. Bit 7: Currently unused.
  777.  
  778. Bit 8: Currently unused.
  779.  
  780. Bit 9: Currently unused.
  781.  
  782. Bit 10: Currently unused.
  783.  
  784. Bit 11: Language encoding flag (EFS). If this bit is set,
  785. the filename and comment fields for this file
  786. MUST be encoded using UTF-8. (see APPENDIX D)
  787.  
  788. Bit 12: Reserved by PKWARE for enhanced compression.
  789.  
  790. Bit 13: Set when encrypting the Central Directory to indicate
  791. selected data values in the Local Header are masked to
  792. hide their actual values. See the section describing
  793. the Strong Encryption Specification for details. Refer
  794. to the section in this document entitled "Incorporating
  795. PKWARE Proprietary Technology into Your Product" for
  796. more information.
  797.  
  798. Bit 14: Reserved by PKWARE.
  799.  
  800. Bit 15: Reserved by PKWARE.
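
The bit assignments above can be illustrated with a short decoder. The
following Python sketch is not part of the format definition. The function
name decode_general_purpose_flags and the dictionary keys are hypothetical
and chosen only for illustration. Bits 1 and 2 are interpreted according to
the compression method and are left uninterpreted for any other method.

```python
def decode_general_purpose_flags(flags: int, method: int) -> dict:
    """Decode the 2 byte general purpose bit flag of one entry (sketch)."""
    info = {
        "encrypted": bool(flags & 0x0001),            # bit 0
        "data_descriptor": bool(flags & 0x0008),      # bit 3
        "patched_data": bool(flags & 0x0020),         # bit 5
        "strong_encryption": bool(flags & 0x0040),    # bit 6
        "utf8_names": bool(flags & 0x0800),           # bit 11 (EFS)
        "local_header_masked": bool(flags & 0x2000),  # bit 13
    }
    bit1 = bool(flags & 0x0002)
    bit2 = bool(flags & 0x0004)
    if method == 6:                       # Imploding
        info["dictionary_size"] = 8192 if bit1 else 4096
        info["shannon_fano_trees"] = 3 if bit2 else 2
    elif method in (8, 9):                # Deflating, keyed as (bit 2, bit 1)
        options = {
            (False, False): "normal",
            (False, True): "maximum",
            (True, False): "fast",
            (True, True): "super fast",
        }
        info["deflate_option"] = options[(bit2, bit1)]
    elif method == 14:                    # LZMA
        info["lzma_eos_marker"] = bit1
    return info
```

A reader following the rule for bit 6 would additionally check that
"encrypted" is set whenever "strong_encryption" is set.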
  801.  
  802. 4.4.5 compression method: (2 bytes)
  803.  
  804. 0 - The file is stored (no compression)
  805. 1 - The file is Shrunk
  806. 2 - The file is Reduced with compression factor 1
  807. 3 - The file is Reduced with compression factor 2
  808. 4 - The file is Reduced with compression factor 3
  809. 5 - The file is Reduced with compression factor 4
  810. 6 - The file is Imploded
  811. 7 - Reserved for Tokenizing compression algorithm
  812. 8 - The file is Deflated
  813. 9 - Enhanced Deflating using Deflate64(tm)
  814. 10 - PKWARE Data Compression Library Imploding (old IBM TERSE)
  815. 11 - Reserved by PKWARE
  816. 12 - File is compressed using BZIP2 algorithm
  817. 13 - Reserved by PKWARE
  818. 14 - LZMA (EFS)
  819. 15 - Reserved by PKWARE
  820. 16 - Reserved by PKWARE
  821. 17 - Reserved by PKWARE
  822. 18 - File is compressed using IBM TERSE (new)
  823. 19 - IBM LZ77 z Architecture (PFS)
  824. 97 - WavPack compressed data
  825. 98 - PPMd version I, Rev 1
  826.  
  827.  
  828. 4.4.6 date and time fields: (2 bytes each)
  829.  
  830. The date and time are encoded in standard MS-DOS format.
  831. If input came from standard input, the date and time are
  832. those at which compression was started for this data.
  833. If encrypting the central directory and general purpose bit
  834. flag 13 is set indicating masking, the value stored in the
  835. Local Header will be zero.
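
The text above refers to the standard MS-DOS format without listing its
bit layout. The Python sketch below assumes the usual MS-DOS packing: the
date holds the day in bits 0-4, the month in bits 5-8 and the year minus
1980 in bits 9-15; the time holds seconds divided by two in bits 0-4,
minutes in bits 5-10 and hours in bits 11-15. The function names are
hypothetical.

```python
import datetime

def dos_to_datetime(dos_date: int, dos_time: int) -> datetime.datetime:
    """Convert the 2 byte date and 2 byte time fields to a datetime."""
    day = dos_date & 0x1F
    month = (dos_date >> 5) & 0x0F
    year = ((dos_date >> 9) & 0x7F) + 1980
    second = (dos_time & 0x1F) * 2        # stored with 2 second resolution
    minute = (dos_time >> 5) & 0x3F
    hour = (dos_time >> 11) & 0x1F
    # A zero date (for example a masked Local Header) has month 0 and
    # day 0, so datetime() raises ValueError for it.
    return datetime.datetime(year, month, day, hour, minute, second)

def datetime_to_dos(dt: datetime.datetime) -> tuple:
    """Pack a datetime (year 1980 or later) into (dos_date, dos_time)."""
    dos_date = ((dt.year - 1980) << 9) | (dt.month << 5) | dt.day
    dos_time = (dt.hour << 11) | (dt.minute << 5) | (dt.second // 2)
    return dos_date, dos_time
```

Odd seconds are rounded down when packing, so a round trip can differ from
the original time by one second.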
  836.  
  837. 4.4.7 CRC-32: (4 bytes)
  838.  
  839. The CRC-32 algorithm was generously contributed by
  840. David Schwaderer and can be found in his excellent
  841. book "C Programmers Guide to NetBIOS" published by
  842. Howard W. Sams & Co. Inc. The 'magic number' for
  843. the CRC is 0xdebb20e3. The proper CRC pre and post
  844. conditioning is used, meaning that the CRC register
  845. is pre-conditioned with all ones (a starting value
  846. of 0xffffffff) and the value is post-conditioned by
  847. taking the one's complement of the CRC residual.
  848. If bit 3 of the general purpose flag is set, this
  849. field is set to zero in the local header and the correct
  850. value is put in the data descriptor and in the central
  851. directory. When encrypting the central directory, if the
  852. local header is not in ZIP64 format and general purpose
  853. bit flag 13 is set indicating masking, the value stored
  854. in the Local Header will be zero.
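
The pre and post conditioning described above can be sketched as a
bit-at-a-time computation in Python. The polynomial constant 0xEDB88320
(the reflected form of the common CRC-32 polynomial) is not stated in the
text above and is an assumption of this sketch, as are the function names.
The result is compared with zlib.crc32 from the Python standard library.

```python
import zlib

POLY = 0xEDB88320  # assumption: reflected CRC-32 polynomial, not given above

def crc32_register(data: bytes, reg: int = 0xFFFFFFFF) -> int:
    """Run data through the CRC register, pre-conditioned with all ones."""
    for byte in data:
        reg ^= byte
        for _ in range(8):
            reg = (reg >> 1) ^ (POLY if reg & 1 else 0)
    return reg

def crc32(data: bytes) -> int:
    """Post-condition by taking the one's complement of the residual."""
    return crc32_register(data) ^ 0xFFFFFFFF

payload = b"123456789"
value = crc32(payload)
assert value == zlib.crc32(payload)

# Feeding the data followed by its own CRC (low byte first) through the
# register leaves the 'magic number' as the residual.
residual = crc32_register(payload + value.to_bytes(4, "little"))
assert residual == 0xDEBB20E3
```

Production code would normally use a table-driven routine or zlib.crc32;
the loop above is written for clarity rather than speed.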
  855.  
  856. 4.4.8 compressed size: (4 bytes)
  857. 4.4.9 uncompressed size: (4 bytes)
  858.  
  859. The size of the file compressed (4.4.8) and uncompressed
  860. (4.4.9), respectively. When a decryption header is present it
  861. will be placed in front of the file data and the value of the
  862. compressed file size will include the bytes of the decryption
  863. header. If bit 3 of the general purpose bit flag is set,
  864. these fields are set to zero in the local header and the
  865. correct values are put in the data descriptor and
  866. in the central directory. If an archive is in ZIP64 format
  867. and the value in this field is 0xFFFFFFFF, the size will be
  868. in the corresponding 8 byte ZIP64 extended information
  869. extra field. When encrypting the central directory, if the
  870. local header is not in ZIP64 format and general purpose bit
  871. flag 13 is set indicating masking, the value stored for the
  872. uncompressed size in the Local Header will be zero.
  873.  
  874. 4.4.10 file name length: (2 bytes)
  875. 4.4.11 extra field length: (2 bytes)
  876. 4.4.12 file comment length: (2 bytes)
  877.  
  878. The length of the file name, extra field, and comment
  879. fields respectively. The combined length of any
  880. directory record and these three fields should not
  881. generally exceed 65,535 bytes. If input came from standard
  882. input, the file name length is set to zero.
  883.  
  884.  
  885. 4.4.13 disk number start: (2 bytes)
  886.  
  887. The number of the disk on which this file begins. If an
  888. archive is in ZIP64 format and the value in this field is
  889. 0xFFFF, the disk number will be in the corresponding 4 byte zip64
  890. extended information extra field.
  891.  
  892. 4.4.14 internal file attributes: (2 bytes)
  893.  
  894. Bits 1 and 2 are reserved for use by PKWARE.
  895.  
  896. 4.4.14.1 The lowest bit of this field indicates, if set,
  897. that the file is apparently an ASCII or text file. If not
  898. set, that the file apparently contains binary data.
  899. The remaining bits are unused in version 1.0.
  900.  
  901. 4.4.14.2 The 0x0002 bit of this field indicates, if set, that
  902. a 4 byte variable record length control field precedes each
  903. logical record indicating the length of the record. The
  904. record length control field is stored in little-endian byte
  905. order. This flag is independent of text control characters,
  906. and if used in conjunction with text data, includes any
  907. control characters in the total length of the record. This
  908. value is provided for mainframe data transfer support.
  909.  
  910. 4.4.15 external file attributes: (4 bytes)
  911.  
  912. The mapping of the external attributes is
  913. host-system dependent (see 'version made by'). For
  914. MS-DOS, the low order byte is the MS-DOS directory
  915. attribute byte. If input came from standard input, this
  916. field is set to zero.
  917.  
  918. 4.4.16 relative offset of local header: (4 bytes)
  919.  
  920. This is the offset from the start of the first disk on
  921. which this file appears, to where the local header should
  922. be found. If an archive is in ZIP64 format and the value
  923. in this field is 0xFFFFFFFF, the offset will be in the
  924. corresponding 8 byte zip64 extended information extra field.
  925.  
  926. 4.4.17 file name: (Variable)
  927.  
  928. 4.4.17.1 The name of the file, with optional relative path.
  929. The path stored MUST NOT contain a drive or
  930. device letter, or a leading slash. All slashes
  931. MUST be forward slashes '/' as opposed to
  932. backwards slashes '\' for compatibility with Amiga
  933. and UNIX file systems etc. If input came from standard
  934. input, there is no file name field.
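
The naming rules in 4.4.17.1 can be checked mechanically. The Python
sketch below is illustrative only; the function name check_entry_name and
the wording of the reported problems are hypothetical, and the drive test
covers only the single-letter "X:" form.

```python
def check_entry_name(name: str) -> list:
    """Return a list of rule violations for a stored file name (sketch)."""
    problems = []
    if name.startswith("/"):
        problems.append("leading slash")
    if "\\" in name:
        problems.append("backslash")
    # Assumption: a drive or device letter looks like "C:" at the start.
    if len(name) >= 2 and name[0].isalpha() and name[1] == ":":
        problems.append("drive or device letter")
    return problems
```

An empty list means the name passed these particular checks; it does not
prove that the name is safe to extract on a given file system.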
  935.  
  936. 4.4.17.2 If using the Central Directory Encryption Feature and
  937. general purpose bit flag 13 is set indicating masking, the file
  938. name stored in the Local Header will not be the actual file name.
  939. A masking value consisting of a unique hexadecimal value will
  940. be stored. This value will be sequentially incremented for each
  941. file in the archive. See the section on the Strong Encryption
  942. Specification for details on retrieving the encrypted file name.
  943. Refer to the section in this document entitled "Incorporating PKWARE
  944. Proprietary Technology into Your Product" for more information.
  945.  
  946.  
  947. 4.4.18 file comment: (Variable)
  948.  
  949. The comment for this file.
  950.  
  951. 4.4.19 number of this disk: (2 bytes)
  952.  
  953. The number of this disk, which contains central
  954. directory end record. If an archive is in ZIP64 format
  955. and the value in this field is 0xFFFF, the disk number will
  956. be in the corresponding 4 byte zip64 end of central
  957. directory field.
  958.  
  959.  
  960. 4.4.20 number of the disk with the start of the central
  961. directory: (2 bytes)
  962.  
  963. The number of the disk on which the central
  964. directory starts. If an archive is in ZIP64 format
  965. and the value in this field is 0xFFFF, the disk number will
  966. be in the corresponding 4 byte zip64 end of central
  967. directory field.
  968.  
  969. 4.4.21 total number of entries in the central dir on
  970. this disk: (2 bytes)
  971.  
  972. The number of central directory entries on this disk.
  973. If an archive is in ZIP64 format and the value in
  974. this field is 0xFFFF, the count will be in the
  975. corresponding 8 byte zip64 end of central
  976. directory field.
  977.  
  978. 4.4.22 total number of entries in the central dir: (2 bytes)
  979.  
  980. The total number of files in the .ZIP file. If an
  981. archive is in ZIP64 format and the value in this field
  982. is 0xFFFF, the count will be in the corresponding 8 byte
  983. zip64 end of central directory field.
  984.  
  985. 4.4.23 size of the central directory: (4 bytes)
  986.  
  987. The size (in bytes) of the entire central directory.
  988. If an archive is in ZIP64 format and the value in
  989. this field is 0xFFFFFFFF, the size will be in the
  990. corresponding 8 byte zip64 end of central
  991. directory field.
  992.  
  993. 4.4.24 offset of start of central directory with respect to
  994. the starting disk number: (4 bytes)
  995.  
  996. Offset of the start of the central directory on the
  997. disk on which the central directory starts. If an
  998. archive is in ZIP64 format and the value in this
  999. field is 0xFFFFFFFF, the offset will be in the
  1000. corresponding 8 byte zip64 end of central
  1001. directory field.
  1002.  
  1003. 4.4.25 .ZIP file comment length: (2 bytes)
  1004.  
  1005. The length of the comment for this .ZIP file.
  1006.  
  1007. 4.4.26 .ZIP file comment: (Variable)
  1008.  
  1009. The comment for this .ZIP file. ZIP file comment data
  1010. is stored unsecured. No encryption or data authentication
  1011. is applied to this area at this time. Confidential information
  1012. should not be stored in this section.
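
Fields 4.4.19 through 4.4.26 together make up the end of central
directory record. The Python sketch below reads them in that order. The
record signature 0x06054b50 and the placement of the signature before the
first field come from the record layout given elsewhere in this
specification, not from the field descriptions above. The function name
and dictionary keys are hypothetical.

```python
import struct

EOCD_SIGNATURE = 0x06054B50
EOCD_FORMAT = "<IHHHHIIH"                  # signature plus 7 fixed fields
EOCD_SIZE = struct.calcsize(EOCD_FORMAT)   # 22 bytes

def parse_eocd(buf: bytes) -> dict:
    """Locate and decode the end of central directory record (sketch)."""
    # Simplification: take the last occurrence of the signature bytes.
    # A comment that happens to contain them would defeat this search.
    pos = buf.rfind(struct.pack("<I", EOCD_SIGNATURE))
    if pos < 0 or pos + EOCD_SIZE > len(buf):
        raise ValueError("end of central directory record not found")
    (_, disk, cd_disk, on_disk, total,
     cd_size, cd_offset, comment_len) = struct.unpack_from(EOCD_FORMAT, buf, pos)
    start = pos + EOCD_SIZE
    return {
        "disk_number": disk,
        "cd_start_disk": cd_disk,
        "entries_on_disk": on_disk,
        "entries_total": total,
        "cd_size": cd_size,
        "cd_offset": cd_offset,
        "comment": buf[start:start + comment_len],
        # Any saturated field means the real value lives in the zip64
        # end of central directory record.
        "needs_zip64": 0xFFFF in (disk, cd_disk, on_disk, total)
                       or 0xFFFFFFFF in (cd_size, cd_offset),
    }
```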
  1013.  
  1014. 4.4.27 zip64 extensible data sector (variable size)
  1015.  
  1016. (currently reserved for use by PKWARE)
  1017.  
  1018.  
  1019. 4.4.28 extra field: (Variable)
  1020.  
  1021. This SHOULD be used for storage expansion. If additional
  1022. information needs to be stored within a ZIP file for special
  1023. application or platform needs, it SHOULD be stored here.
  1024. Programs supporting earlier versions of this specification can
  1025. then safely skip the file, and find the next file or header.
  1026. This field will be 0 length in version 1.0.
  1027.  
  1028. Existing extra fields are defined in the section
  1029. Extensible data fields that follows.
  1030.  
  1031. 4.5 Extensible data fields
  1032. --------------------------
  1033.  
  1034. 4.5.1 In order to allow different programs and different types
  1035. of information to be stored in the 'extra' field in .ZIP
  1036. files, the following structure MUST be used for all
  1037. programs storing data in this field:
  1038.  
  1039. header1+data1 + header2+data2 . . .
  1040.  
  1041. Each header should consist of:
  1042.  
  1043. Header ID - 2 bytes
  1044. Data Size - 2 bytes
  1045.  
  1046. Note: all fields stored in Intel low-byte/high-byte order.
  1047.  
  1048. The Header ID field indicates the type of data that is in
  1049. the following data block.
  1050.  
  1051. Header IDs of 0 thru 31 are reserved for use by PKWARE.
  1052. The remaining IDs can be used by third party vendors for
  1053. proprietary usage.
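
The header1+data1 + header2+data2 structure can be walked with a small
loop. The Python sketch below yields each Header ID with its data block.
The function name iter_extra_fields is hypothetical, and the decision to
ignore fewer than four trailing bytes is an assumption of this sketch.

```python
import struct

def iter_extra_fields(extra: bytes):
    """Yield (header_id, data) for each block in an extra field."""
    pos = 0
    while pos + 4 <= len(extra):
        # Header ID and Data Size, both 2 bytes, low byte first.
        header_id, size = struct.unpack_from("<HH", extra, pos)
        pos += 4
        if pos + size > len(extra):
            raise ValueError("data block overruns the extra field")
        yield header_id, extra[pos:pos + size]
        pos += size
```

A program that only understands some Header IDs can iterate over all
blocks and skip the ones it does not recognize.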
  1054.  
  1055. 4.5.2 The current Header ID mappings defined by PKWARE are:
  1056.  
  1057. 0x0001    Zip64 extended information extra field
  1058. 0x0007    AV Info
  1059. 0x0008    Reserved for extended language encoding data (PFS)
  1060.           (see APPENDIX D)
  1061. 0x0009    OS/2
  1062. 0x000a    NTFS
  1063. 0x000c    OpenVMS
  1064. 0x000d    UNIX
  1065. 0x000e    Reserved for file stream and fork descriptors
  1066. 0x000f    Patch Descriptor
  1067. 0x0014    PKCS#7 Store for X.509 Certificates
  1068. 0x0015    X.509 Certificate ID and Signature for
  1069.           individual file
  1070. 0x0016    X.509 Certificate ID for Central Directory
  1071. 0x0017    Strong Encryption Header
  1072. 0x0018    Record Management Controls
  1073. 0x0019    PKCS#7 Encryption Recipient Certificate List
  1074. 0x0065    IBM S/390 (Z390), AS/400 (I400) attributes
  1075.           - uncompressed
  1076. 0x0066    Reserved for IBM S/390 (Z390), AS/400 (I400)
  1077.           attributes - compressed
  1078. 0x4690    POSZIP 4690 (reserved)
  1079.  
  1080.  
  1081. 4.5.3 -Zip64 Extended Information Extra Field (0x0001):
  1082.  
  1083. The following is the layout of the zip64 extended
  1084. information "extra" block. If one of the size or
  1085. offset fields in the Local or Central directory
  1086. record is too small to hold the required data,
  1087. a Zip64 extended information record is created.
  1088. The order of the fields in the zip64 extended
  1089. information record is fixed, but the fields MUST
  1090. only appear if the corresponding Local or Central
  1091. directory record field is set to 0xFFFF or 0xFFFFFFFF.
  1092.  
  1093. Note: all fields stored in Intel low-byte/high-byte order.
  1094.  
  1095. Value             Size      Description
  1096. -----             ----      -----------
  1097. (ZIP64) 0x0001    2 bytes   Tag for this "extra" block type
  1098. Size              2 bytes   Size of this "extra" block
  1099. Original
  1100. Size              8 bytes   Original uncompressed file size
  1101. Compressed
  1102. Size              8 bytes   Size of compressed data
  1103. Relative Header
  1104. Offset            8 bytes   Offset of local header record
  1105. Disk Start
  1106. Number            4 bytes   Number of the disk on which
  1107.                             this file starts
  1108.  
  1109. This entry in the Local header MUST include BOTH original
  1110. and compressed file size fields. If encrypting the
  1111. central directory and bit 13 of the general purpose bit
  1112. flag is set indicating masking, the value stored in the
  1113. Local Header for the original file size will be zero.
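
Because a field appears in this block only when the corresponding record
field is saturated, a reader has to consult the fixed record while
decoding. The Python sketch below assumes the caller passes the block data
that follows the Tag and Size fields, together with the four values read
from the directory record. The function name and argument names are
hypothetical.

```python
import struct

def parse_zip64_extra(data: bytes, uncompressed_size: int,
                      compressed_size: int, header_offset: int,
                      disk_start: int) -> tuple:
    """Replace saturated record values with those from the zip64 block."""
    pos = 0

    def take(fmt: str, width: int) -> int:
        nonlocal pos
        if pos + width > len(data):
            raise ValueError("zip64 block is shorter than the record implies")
        (value,) = struct.unpack_from(fmt, data, pos)
        pos += width
        return value

    # The order is fixed: original size, compressed size, offset, disk.
    if uncompressed_size == 0xFFFFFFFF:
        uncompressed_size = take("<Q", 8)
    if compressed_size == 0xFFFFFFFF:
        compressed_size = take("<Q", 8)
    if header_offset == 0xFFFFFFFF:
        header_offset = take("<Q", 8)
    if disk_start == 0xFFFF:
        disk_start = take("<I", 4)
    return uncompressed_size, compressed_size, header_offset, disk_start
```

For a Local header, which carries no offset or disk number field, a caller
could pass 0 for the last two arguments so that only the two sizes are
read.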
  1114.  
  1115.  
  1116. 4.5.4 -OS/2 Extra Field (0x0009):
  1117.  
  1118. The following is the layout of the OS/2 attributes "extra"
  1119. block. (Last Revision 09/05/95)
  1120.  
  1121. Note: all fields stored in Intel low-byte/high-byte order.
  1122.  
  1123. Value Size Description
  1124. ----- ---- -----------
  1125. (OS/2) 0x0009 2 bytes Tag for this "extra" block type
  1126. TSize 2 bytes Size for the following data block
  1127. BSize 4 bytes Uncompressed Block Size
  1128. CType 2 bytes Compression type
  1129. EACRC 4 bytes CRC value for uncompressed block
  1130. (var) variable Compressed block
  1131.  
  1132. The OS/2 extended attribute structure (FEA2LIST) is
  1133. compressed and then stored in its entirety within this
  1134. structure. There will only ever be one "block" of data in
  1135. VarFields[].
  1136.  
  1137. 4.5.5 -NTFS Extra Field (0x000a):
  1138.  
  1139. The following is the layout of the NTFS attributes
  1140. "extra" block. (Note: At this time the Mtime, Atime
  1141. and Ctime values MAY be used on any WIN32 system.)
  1142.  
  1143. Note: all fields stored in Intel low-byte/high-byte order.
  1144.  
  1145. Value Size Description
  1146. ----- ---- -----------
  1147. (NTFS) 0x000a 2 bytes Tag for this "extra" block type
  1148. TSize 2 bytes Size of the total "extra" block
  1149. Reserved 4 bytes Reserved for future use
  1150. Tag1 2 bytes NTFS attribute tag value #1
  1151. Size1 2 bytes Size of attribute #1, in bytes
  1152. (var) Size1 Attribute #1 data
  1153. .
  1154. .
  1155. .
  1156. TagN 2 bytes NTFS attribute tag value #N
  1157. SizeN 2 bytes Size of attribute #N, in bytes
  1158. (var) SizeN Attribute #N data
  1159.  
  1160. For NTFS, values for Tag1 through TagN are as follows:
  1161. (currently only one set of attributes is defined for NTFS)
  1162.  
  1163. Tag Size Description
  1164. ----- ---- -----------
  1165. 0x0001 2 bytes Tag for attribute #1
  1166. Size1 2 bytes Size of attribute #1, in bytes
  1167. Mtime 8 bytes File last modification time
  1168. Atime 8 bytes File last access time
  1169. Ctime 8 bytes File creation time
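
The layout above does not state how the three 8 byte times are encoded.
The Python sketch below assumes they follow the Win32 FILETIME convention,
that is, a count of 100-nanosecond intervals since January 1, 1601 (UTC).
That encoding, the function names and the dictionary keys are assumptions
of this sketch. The input is the block data following the tag and TSize
fields.

```python
import datetime
import struct

FILETIME_EPOCH = datetime.datetime(1601, 1, 1, tzinfo=datetime.timezone.utc)

def filetime_to_datetime(ft: int) -> datetime.datetime:
    """Convert 100 ns ticks since 1601 to a datetime (assumed encoding)."""
    return FILETIME_EPOCH + datetime.timedelta(microseconds=ft // 10)

def parse_ntfs_times(data: bytes) -> dict:
    """Return mtime, atime and ctime from attribute tag 0x0001, if present."""
    times = {}
    pos = 4                                  # skip the 4 byte Reserved field
    while pos + 4 <= len(data):
        tag, size = struct.unpack_from("<HH", data, pos)
        pos += 4
        if pos + size > len(data):
            raise ValueError("attribute overruns the block")
        if tag == 0x0001 and size >= 24:
            raw = struct.unpack_from("<QQQ", data, pos)
            times = dict(zip(("mtime", "atime", "ctime"),
                             map(filetime_to_datetime, raw)))
        pos += size                          # unknown tags are skipped
    return times
```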
  1170.  
  1171. 4.5.6 -OpenVMS Extra Field (0x000c):
  1172.  
  1173. The following is the layout of the OpenVMS attributes
  1174. "extra" block.
  1175.  
  1176. Note: all fields stored in Intel low-byte/high-byte order.
  1177.  
  1178. Value Size Description
  1179. ----- ---- -----------
  1180. (VMS) 0x000c 2 bytes Tag for this "extra" block type
  1181. TSize 2 bytes Size of the total "extra" block
  1182. CRC 4 bytes 32-bit CRC for remainder of the block
  1183. Tag1 2 bytes OpenVMS attribute tag value #1
  1184. Size1 2 bytes Size of attribute #1, in bytes
  1185. (var) Size1 Attribute #1 data
  1186. .
  1187. .
  1188. .
  1189. TagN 2 bytes OpenVMS attribute tag value #N
  1190. SizeN 2 bytes Size of attribute #N, in bytes
  1191. (var) SizeN Attribute #N data
  1192.  
  1193. OpenVMS Extra Field Rules:
  1194.  
  1195. 4.5.6.1. There will be one or more attributes present, which
  1196. will each be preceded by the above TagX & SizeX values.
  1197. These values are identical to the ATR$C_XXXX and ATR$S_XXXX
  1198. constants which are defined in ATR.H under OpenVMS C. Neither
  1199. of these values will ever be zero.
  1200.  
  1201. 4.5.6.2. No word alignment or padding is performed.
  1202.  
  1203. 4.5.6.3. A well-behaved PKZIP/OpenVMS program should never produce
  1204. more than one sub-block with the same TagX value. Also, there will
  1205. never be more than one "extra" block of type 0x000c in a particular
  1206. directory record.
  1207.  
  1208. 4.5.7 -UNIX Extra Field (0x000d):
  1209.  
  1210. The following is the layout of the UNIX "extra" block.
  1211. Note: all fields are stored in Intel low-byte/high-byte
  1212. order.
  1213.  
  1214. Value Size Description
  1215. ----- ---- -----------
  1216. (UNIX) 0x000d 2 bytes Tag for this "extra" block type
  1217. TSize 2 bytes Size for the following data block
  1218. Atime 4 bytes File last access time
  1219. Mtime 4 bytes File last modification time
  1220. Uid 2 bytes File user ID
  1221. Gid 2 bytes File group ID
  1222. (var) variable Variable length data field
  1223.  
  1224. The variable length data field will contain file type
  1225. specific data. Currently the only values allowed are
  1226. the original "linked to" file names for hard or symbolic
  1227. links, and the major and minor device node numbers for
  1228. character and block device nodes. Since device nodes
  1229. cannot be either symbolic or hard links, only one set of
  1230. variable length data is stored. Link files will have the
  1231. name of the original file stored. This name is NOT NULL
  1232. terminated. Its size can be determined by checking TSize -
  1233. 12. Device entries will have eight bytes stored as two 4
  1234. byte entries (in little endian format). The first entry
  1235. will be the major device number, and the second the minor
  1236. device number.
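
The fixed part of this block is 12 bytes long, which is why the length of
the variable data is TSize - 12. The Python sketch below splits the block
data (the bytes following the tag and TSize fields) accordingly. The
function names and dictionary keys are hypothetical, and the caller is
assumed to know from other attributes whether the entry is a link or a
device node.

```python
import struct

def parse_unix_extra(data: bytes) -> dict:
    """Decode the PKWARE UNIX extra block data (sketch)."""
    if len(data) < 12:
        raise ValueError("UNIX extra block is shorter than 12 bytes")
    atime, mtime, uid, gid = struct.unpack_from("<IIHH", data, 0)
    return {
        "atime": atime,
        "mtime": mtime,
        "uid": uid,
        "gid": gid,
        "variable": data[12:],   # link target (not NULL terminated) or device numbers
    }

def device_numbers(variable: bytes) -> tuple:
    """Split the 8 byte variable data of a device node into (major, minor)."""
    major, minor = struct.unpack("<II", variable)
    return major, minor
```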
  1237.  
  1238. 4.5.8 -PATCH Descriptor Extra Field (0x000f):
  1239.  
  1240. 4.5.8.1 The following is the layout of the Patch Descriptor
  1241. "extra" block.
  1242.  
  1243. Note: all fields stored in Intel low-byte/high-byte order.
  1244.  
  1245. Value Size Description
  1246. ----- ---- -----------
  1247. (Patch) 0x000f 2 bytes Tag for this "extra" block type
  1248. TSize 2 bytes Size of the total "extra" block
  1249. Version 2 bytes Version of the descriptor
  1250. Flags 4 bytes Actions and reactions (see below)
  1251. OldSize 4 bytes Size of the file about to be patched
  1252. OldCRC 4 bytes 32-bit CRC of the file to be patched
  1253. NewSize 4 bytes Size of the resulting file
  1254. NewCRC 4 bytes 32-bit CRC of the resulting file
  1255.  
  1256. 4.5.8.2 Actions and reactions
  1257.  
  1258. Bits Description
  1259. ---- ----------------
  1260. 0 Use for auto detection
  1261. 1 Treat as a self-patch
  1262. 2-3 RESERVED
  1263. 4-5 Action (see below)
  1264. 6-7 RESERVED
  1265. 8-9 Reaction (see below) to absent file
  1266. 10-11 Reaction (see below) to newer file
  1267. 12-13 Reaction (see below) to unknown file
  1268. 14-15 RESERVED
  1269. 16-31 RESERVED
  1270.  
  1271. 4.5.8.2.1 Actions
  1272.  
  1273. Action Value
  1274. ------ -----
  1275. none 0
  1276. add 1
  1277. delete 2
  1278. patch 3
  1279.  
  1280. 4.5.8.2.2 Reactions
  1281.  
  1282. Reaction Value
  1283. -------- -----
  1284. ask 0
  1285. skip 1
  1286. ignore 2
  1287. fail 3
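
The Flags field packs several two-bit values. As an illustration, the
Python sketch below extracts them using the bit positions and value tables
given above. The function name and dictionary keys are hypothetical.

```python
ACTIONS = ("none", "add", "delete", "patch")      # values 0..3
REACTIONS = ("ask", "skip", "ignore", "fail")     # values 0..3

def decode_patch_flags(flags: int) -> dict:
    """Decode the 4 byte Flags field of the Patch Descriptor (sketch)."""
    return {
        "auto_detect": bool(flags & 0x0001),          # bit 0
        "self_patch": bool(flags & 0x0002),           # bit 1
        "action": ACTIONS[(flags >> 4) & 0x3],        # bits 4-5
        "on_absent": REACTIONS[(flags >> 8) & 0x3],   # bits 8-9
        "on_newer": REACTIONS[(flags >> 10) & 0x3],   # bits 10-11
        "on_unknown": REACTIONS[(flags >> 12) & 0x3], # bits 12-13
    }
```

Reserved bits are ignored by this sketch rather than validated.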
  1288.  
  1289. 4.5.8.3 Patch support is provided by PKPatchMaker(tm) technology
  1290. and is covered under U.S. Patents and Patents Pending. The use or
  1291. implementation in a product of certain technological aspects set
  1292. forth in the current APPNOTE, including those with regard to
  1293. strong encryption or patching requires a license from PKWARE.
  1294. Refer to the section in this document entitled "Incorporating
  1295. PKWARE Proprietary Technology into Your Product" for more
  1296. information.
  1297.  
  1298. 4.5.9 -PKCS#7 Store for X.509 Certificates (0x0014):
  1299.  
  1300. This field MUST contain information about each of the certificates
  1301. files may be signed with. When the Central Directory Encryption
  1302. feature is enabled for a ZIP file, this record will appear in
  1303. the Archive Extra Data Record, otherwise it will appear in the
  1304. first central directory record and will be ignored in any
  1305. other record.
  1306.  
  1307.  
  1308. Note: all fields stored in Intel low-byte/high-byte order.
  1309.  
  1310. Value Size Description
  1311. ----- ---- -----------
  1312. (Store) 0x0014 2 bytes Tag for this "extra" block type
  1313. TSize 2 bytes Size of the store data
  1314. TData TSize Data about the store
  1315.  
  1316.  
  1317. 4.5.10 -X.509 Certificate ID and Signature for individual file (0x0015):
  1318.  
  1319. This field contains the information about which certificate in
  1320. the PKCS#7 store was used to sign a particular file. It also
  1321. contains the signature data. This field can appear multiple
  1322. times, but can only appear once per certificate.
  1323.  
  1324. Note: all fields stored in Intel low-byte/high-byte order.
  1325.  
  1326. Value Size Description
  1327. ----- ---- -----------
  1328. (CID) 0x0015 2 bytes Tag for this "extra" block type
  1329. TSize 2 bytes Size of data that follows
  1330. TData TSize Signature Data
  1331.  
  1332. 4.5.11 -X.509 Certificate ID and Signature for central directory (0x0016):
  1333.  
  1334. This field contains the information about which certificate in
  1335. the PKCS#7 store was used to sign the central directory structure.
  1336. When the Central Directory Encryption feature is enabled for a
  1337. ZIP file, this record will appear in the Archive Extra Data Record,
  1338. otherwise it will appear in the first central directory record.
  1339.  
  1340. Note: all fields stored in Intel low-byte/high-byte order.
  1341.  
  1342. Value Size Description
  1343. ----- ---- -----------
  1344. (CDID) 0x0016 2 bytes Tag for this "extra" block type
  1345. TSize 2 bytes Size of data that follows
  1346. TData TSize Data
  1347.  
  1348. 4.5.12 -Strong Encryption Header (0x0017):
  1349.  
  1350. Value     Size     Description
  1351. -----     ----     -----------
  1352. 0x0017    2 bytes  Tag for this "extra" block type
  1353. TSize     2 bytes  Size of data that follows
  1354. Format    2 bytes  Format definition for this record
  1355. AlgID     2 bytes  Encryption algorithm identifier
  1356. Bitlen    2 bytes  Bit length of encryption key
  1357. Flags     2 bytes  Processing flags
  1358. CertData  TSize-8  Certificate decryption extra field data
  1359.                    (refer to the explanation for CertData
  1360.                    in the section describing the
  1361.                    Certificate Processing Method under
  1362.                    the Strong Encryption Specification)
  1363.  
  1364. See the section describing the Strong Encryption Specification
  1365. for details. Refer to the section in this document entitled
  1366. "Incorporating PKWARE Proprietary Technology into Your Product"
  1367. for more information.
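
Only the field layout of this header is sketched here, not any part of the
encryption processing. The Python fragment below splits the data that
follows the tag and TSize fields into its four 2 byte values and the
remaining CertData, assuming the low-byte/high-byte order used by the other
extra fields. The function name and dictionary keys are hypothetical.

```python
import struct

def parse_strong_encryption_header(data: bytes) -> dict:
    """Split the Strong Encryption Header data into its fields (sketch)."""
    if len(data) < 8:
        raise ValueError("header data is shorter than the 8 fixed bytes")
    fmt, alg_id, bitlen, flags = struct.unpack_from("<HHHH", data, 0)
    return {
        "format": fmt,
        "alg_id": alg_id,
        "bitlen": bitlen,
        "flags": flags,
        "cert_data": data[8:],   # TSize - 8 bytes
    }
```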
  1368.  
  1369. 4.5.13 -Record Management Controls (0x0018):
  1370.  
  1371. Value Size Description
  1372. ----- ---- -----------
  1373. (Rec-CTL) 0x0018 2 bytes Tag for this "extra" block type
  1374. CSize 2 bytes Size of total extra block data
  1375. Tag1 2 bytes Record control attribute 1
  1376. Size1 2 bytes Size of attribute 1, in bytes
  1377. Data1 Size1 Attribute 1 data
  1378. .
  1379. .
  1380. .
  1381. TagN 2 bytes Record control attribute N
  1382. SizeN 2 bytes Size of attribute N, in bytes
  1383. DataN SizeN Attribute N data
  1384.  
  1385.  
  1386. 4.5.14 -PKCS#7 Encryption Recipient Certificate List (0x0019):
  1387.  
  1388. This field MAY contain information about each of the certificates
  1389. used in encryption processing and it can be used to identify who is
  1390. allowed to decrypt encrypted files. This field should only appear
  1391. in the archive extra data record. This field is not required and
  1392. serves only to aid archive modifications by preserving public
  1393. encryption key data. Individual security requirements may dictate
  1394. that this data be omitted to deter information exposure.
  1395.  
  1396. Note: all fields stored in Intel low-byte/high-byte order.
  1397.  
  1398. Value Size Description
  1399. ----- ---- -----------
  1400. (CStore) 0x0019 2 bytes Tag for this "extra" block type
  1401. TSize 2 bytes Size of the store data
  1402. TData TSize Data about the store
  1403.  
  1404. TData:
  1405.  
  1406. Value Size Description
  1407. ----- ---- -----------
  1408. Version 2 bytes Format version number - must be 0x0001 at this time
  1409. CStore (var) PKCS#7 data blob
  1410.  
  1411. See the section describing the Strong Encryption Specification
  1412. for details. Refer to the section in this document entitled
  1413. "Incorporating PKWARE Proprietary Technology into Your Product"
  1414. for more information.
  1415.  
  1416. 4.5.15 -MVS Extra Field (0x0065):
  1417.  
  1418. The following is the layout of the MVS "extra" block.
  1419. Note: Some fields are stored in Big Endian format.
  1420. All text is in EBCDIC format unless otherwise specified.
  1421.  
  1422. Value Size Description
  1423. ----- ---- -----------
  1424. (MVS) 0x0065 2 bytes Tag for this "extra" block type
  1425. TSize 2 bytes Size for the following data block
  1426. ID 4 bytes EBCDIC "Z390" 0xE9F3F9F0 or
  1427. "T4MV" for TargetFour
  1428. (var) TSize-4 Attribute data (see APPENDIX B)
  1429.  
  1430.  
  1431. 4.5.16 -OS/400 Extra Field (0x0065):
  1432.  
  1433. The following is the layout of the OS/400 "extra" block.
  1434. Note: Some fields are stored in Big Endian format.
  1435. All text is in EBCDIC format unless otherwise specified.
  1436.  
  1437. Value Size Description
  1438. ----- ---- -----------
  1439. (OS400) 0x0065 2 bytes Tag for this "extra" block type
  1440. TSize 2 bytes Size for the following data block
  1441. ID 4 bytes EBCDIC "I400" 0xC9F4F0F0 or
  1442. "T4MV" for TargetFour
  1443. (var) TSize-4 Attribute data (see APPENDIX A)
  1444.  
  1445. 4.6 Third Party Mappings
  1446. ------------------------
  1447.  
  1448. 4.6.1 Third party mappings commonly used are:
  1449.  
  1450. 0x07c8 Macintosh
  1451. 0x2605 ZipIt Macintosh
  1452. 0x2705 ZipIt Macintosh 1.3.5+
  1453. 0x2805 ZipIt Macintosh 1.3.5+
  1454. 0x334d Info-ZIP Macintosh
  1455. 0x4341 Acorn/SparkFS
  1456. 0x4453 Windows NT security descriptor (binary ACL)
  1457. 0x4704 VM/CMS
  1458. 0x470f MVS
  1459. 0x4b46 FWKCS MD5 (see below)
  1460. 0x4c41 OS/2 access control list (text ACL)
  1461. 0x4d49 Info-ZIP OpenVMS
  1462. 0x4f4c Xceed original location extra field
  1463. 0x5356 AOS/VS (ACL)
  1464. 0x5455 extended timestamp
  1465. 0x554e Xceed unicode extra field
  1466. 0x5855 Info-ZIP UNIX (original, also OS/2, NT, etc)
  1467. 0x6375 Info-ZIP Unicode Comment Extra Field
  1468. 0x6542 BeOS/BeBox
  1469. 0x7075 Info-ZIP Unicode Path Extra Field
  1470. 0x756e ASi UNIX
  1471. 0x7855 Info-ZIP UNIX (new)
  1472. 0xa220 Microsoft Open Packaging Growth Hint
  1473. 0xfd4a SMS/QDOS
  1474.  
  1475. Detailed descriptions of Extra Fields defined by third
  1476. party mappings will be documented as information on
  1477. these data structures is made available to PKWARE.
  1478. PKWARE does not guarantee the accuracy of any published
  1479. third party data.
  1480.  
  1481. 4.6.2 Third-party Extra Fields must include a Header ID using
  1482. the format defined in the section of this document
  1483. titled Extensible Data Fields (section 4.5).
  1484.  
  1485. The Data Size field indicates the size of the following
  1486. data block. Programs can use this value to skip to the
  1487. next header block, passing over any data blocks that are
  1488. not of interest.
  1489.  
  1490. Note: As stated above, the size of the entire .ZIP file
  1491. header, including the file name, comment, and extra
  1492. field should not exceed 64K in size.
  1493.  
  1494. 4.6.3 In case two different programs should appropriate the same
  1495. Header ID value, each
  1496. program SHOULD place a unique signature of at least two bytes in
  1497. size (and preferably 4 bytes or bigger) at the start of
  1498. each data area. Every program SHOULD verify that its
  1499. unique signature is present, in addition to the Header ID
  1500. value being correct, before assuming that it is a block of
  1501. known type.
  1502.  
  1503. Third-party Mappings:
  1504.  
  1505. 4.6.4 -ZipIt Macintosh Extra Field (long) (0x2605):
  1506.  
  1507. The following is the layout of the ZipIt extra block
  1508. for Macintosh. The local-header and central-header versions
  1509. are identical. This block must be present if the file is
  1510. stored MacBinary-encoded and it should not be used if the file
  1511. is not stored MacBinary-encoded.
  1512.  
  1513.           Value         Size        Description
  1514.           -----         ----        -----------
  1515.   (Mac2)  0x2605        Short       tag for this extra block type
  1516.           TSize         Short       total data size for this block
  1517.           "ZPIT"        beLong      extra-field signature
  1518.           FnLen         Byte        length of FileName
  1519.           FileName      variable    full Macintosh filename
  1520.           FileType      Byte[4]     four-byte Mac file type string
  1521.           Creator       Byte[4]     four-byte Mac creator string
  1522.  
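
The layout above, combined with the signature check recommended in 4.6.3, could be read roughly as follows. This is a sketch only: `parse_zipit_long` is a hypothetical helper, and it expects just the data area of the block (the bytes after the Header ID and TSize).

```python
def parse_zipit_long(data: bytes):
    """Parse the data area of a 0x2605 block; return None if it is not ZipIt's."""
    # Verify the "ZPIT" signature before trusting the rest of the block.
    if len(data) < 5 or data[:4] != b"ZPIT":
        return None
    fn_len = data[4]                     # FnLen: one byte
    name_end = 5 + fn_len
    if len(data) < name_end + 8:
        raise ValueError("truncated ZipIt extra field")
    return {
        "filename": data[5:name_end],                 # raw Macintosh filename bytes
        "file_type": data[name_end:name_end + 4],     # four-byte Mac file type
        "creator": data[name_end + 4:name_end + 8],   # four-byte Mac creator
    }
```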
  1523.  
  1524. 4.6.5 -ZipIt Macintosh Extra Field (short, for files) (0x2705):
  1525.  
  1526. The following is the layout of a shortened variant of the
  1527. ZipIt extra block for Macintosh (without "full name" entry).
  1528. This variant is used by ZipIt 1.3.5 and newer for entries of
  1529. files (not directories) that do not have a MacBinary encoded
  1530. file. The local-header and central-header versions are identical.
  1531.  
  1532.           Value         Size        Description
  1533.           -----         ----        -----------
  1534.   (Mac2b) 0x2705        Short       tag for this extra block type
  1535.           TSize         Short       total data size for this block (12)
  1536.           "ZPIT"        beLong      extra-field signature
  1537.           FileType      Byte[4]     four-byte Mac file type string
  1538.           Creator       Byte[4]     four-byte Mac creator string
  1539.           fdFlags       beShort     attributes from FInfo.fdFlags,
  1540.                                     may be omitted
  1541.           0x0000        beShort     reserved, may be omitted
  1542.  
  1543.  
  1544. 4.6.6 -ZipIt Macintosh Extra Field (short, for directories) (0x2805):
  1545.  
  1546. The following is the layout of a shortened variant of the
  1547. ZipIt extra block for Macintosh used only for directory
  1548. entries. This variant is used by ZipIt 1.3.5 and newer to
  1549. save some optional Mac-specific information about directories.
  1550. The local-header and central-header versions are identical.
  1551.  
  1552.           Value         Size        Description
  1553.           -----         ----        -----------
  1554.   (Mac2c) 0x2805        Short       tag for this extra block type
  1555.           TSize         Short       total data size for this block (12)
  1556.           "ZPIT"        beLong      extra-field signature
  1557.           frFlags       beShort     attributes from DInfo.frFlags, may
  1558.                                     be omitted
  1559.           View          beShort     ZipIt view flag, may be omitted
  1560.  
  1561.  
  1562. The View field specifies ZipIt-internal settings as follows:
  1563.  
  1564. Bits of the Flags:
  1565.     bit 0           if set, the folder is shown expanded (open)
  1566.                     when the archive contents are viewed in ZipIt.
  1567.     bits 1-15       reserved, zero;
  1568.  
  1569.  
  1570. 4.6.7 -FWKCS MD5 Extra Field (0x4b46):
  1571.  
  1572. The FWKCS Contents_Signature System, used in
  1573. automatically identifying files independent of file name,
  1574. optionally adds and uses an extra field to support the
  1575. rapid creation of an enhanced contents_signature:
  1576.  
  1577. Header ID = 0x4b46
  1578. Data Size = 0x0013
  1579. Preface = 'M','D','5'
  1580. followed by 16 bytes containing the uncompressed file's
  1581. 128-bit MD5 hash(1), low byte first.
  1582.  
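
As an illustration of the layout above, the following sketch builds the 0x4b46 block for a given uncompressed file. It assumes that "low byte first" corresponds to the byte order in which standard MD5 implementations emit the digest, and that the Header ID and Data Size are little-endian shorts. The name `build_fwkcs_md5_field` is hypothetical.

```python
import hashlib
import struct

def build_fwkcs_md5_field(uncompressed: bytes) -> bytes:
    """Return a complete 0x4b46 extra field block for the given file data."""
    digest = hashlib.md5(uncompressed).digest()   # 16 bytes
    data = b"MD5" + digest                        # Preface + hash = 19 bytes (0x0013)
    return struct.pack("<HH", 0x4B46, len(data)) + data
```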
  1583. When FWKCS revises a .ZIP file central directory to add
  1584. this extra field for a file, it also replaces the
  1585. central directory entry for that file's uncompressed
  1586. file length with a measured value.
  1587.  
  1588. FWKCS provides an option to strip this extra field, if
  1589. present, from a .ZIP file central directory. In adding
  1590. this extra field, FWKCS preserves .ZIP file Authenticity
  1591. Verification; if stripping this extra field, FWKCS
  1592. preserves all versions of AV through PKZIP version 2.04g.
  1593.  
  1594. FWKCS, and FWKCS Contents_Signature System, are
  1595. trademarks of Frederick W. Kantor.
  1596.  
  1597. (1) R. Rivest, RFC1321.TXT, MIT Laboratory for Computer
  1598. Science and RSA Data Security, Inc., April 1992.
  1599. ll.76-77: "The MD5 algorithm is being placed in the
  1600. public domain for review and possible adoption as a
  1601. standard."
  1602.  
  1603.  
  1604. 4.6.8 -Info-ZIP Unicode Comment Extra Field (0x6375):
  1605.  
  1606. Stores the UTF-8 version of the file comment as stored in the
  1607. central directory header. (Last Revision 20070912)
  1608.  
  1609.          Value         Size        Description
  1610.          -----         ----        -----------
  1611.  (UCom)  0x6375        Short       tag for this extra block type ("uc")
  1612.          TSize         Short       total data size for this block
  1613.          Version       1 byte      version of this extra field, currently 1
  1614.          ComCRC32      4 bytes     Comment Field CRC32 Checksum
  1615.          UnicodeCom    Variable    UTF-8 version of the entry comment
  1616.  
  1617. Currently Version is set to the number 1. If there is a need
  1618. to change this field, the version will be incremented. Changes
  1619. may not be backward compatible so this extra field should not be
  1620. used if the version is not recognized.
  1621.  
  1622. The ComCRC32 is the standard zip CRC32 checksum of the File Comment
  1623. field in the central directory header. This is used to verify that
  1624. the comment field has not changed since the Unicode Comment extra field
  1625. was created. This can happen if a utility changes the File Comment
  1626. field but does not update the UTF-8 Comment extra field. If the CRC
  1627. check fails, this Unicode Comment extra field should be ignored and
  1628. the File Comment field in the header should be used instead.
  1629.  
  1630. The UnicodeCom field is the UTF-8 version of the File Comment field
  1631. in the header. As UnicodeCom is defined to be UTF-8, no UTF-8 byte
  1632. order mark (BOM) is used. The length of this field is determined by
  1633. subtracting the size of the previous fields from TSize. If both the
  1634. File Name and Comment fields are UTF-8, the new General Purpose Bit
  1635. Flag, bit 11 (Language encoding flag (EFS)), can be used to indicate
  1636. both the header File Name and Comment fields are UTF-8 and, in this
  1637. case, the Unicode Path and Unicode Comment extra fields are not
  1638. needed and should not be created. Note that, for backward
  1639. compatibility, bit 11 should only be used if the native character set
  1640. of the paths and comments being zipped up is already UTF-8. It is
  1641. expected that the same file comment storage method, either general
  1642. purpose bit 11 or extra fields, be used in both the Local and Central
  1643. Directory Header for a file.
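
A writer might assemble this block along the following lines. This is a sketch under stated assumptions: multi-byte values are little-endian, `header_comment` is the exact byte string stored in the central directory File Comment field, and `build_unicode_comment_field` is a hypothetical name.

```python
import struct
import zlib

def build_unicode_comment_field(header_comment: bytes, comment: str) -> bytes:
    """Return a complete 0x6375 block for the given header comment bytes."""
    # ComCRC32 covers the File Comment field as stored in the header,
    # not the UTF-8 text carried in this block.
    crc = zlib.crc32(header_comment) & 0xFFFFFFFF
    data = struct.pack("<BI", 1, crc) + comment.encode("utf-8")   # Version = 1
    return struct.pack("<HH", 0x6375, len(data)) + data
```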
  1644.  
  1645.  
  1646. 4.6.9 -Info-ZIP Unicode Path Extra Field (0x7075):
  1647.  
  1648. Stores the UTF-8 version of the file name field as stored in the
  1649. local header and central directory header. (Last Revision 20070912)
  1650.  
  1651.          Value         Size        Description
  1652.          -----         ----        -----------
  1653.  (UPath) 0x7075        Short       tag for this extra block type ("up")
  1654.          TSize         Short       total data size for this block
  1655.          Version       1 byte      version of this extra field, currently 1
  1656.          NameCRC32     4 bytes     File Name Field CRC32 Checksum
  1657.          UnicodeName   Variable    UTF-8 version of the entry File Name
  1658.  
  1659. Currently Version is set to the number 1. If there is a need
  1660. to change this field, the version will be incremented. Changes
  1661. may not be backward compatible so this extra field should not be
  1662. used if the version is not recognized.
  1663.  
  1664. The NameCRC32 is the standard zip CRC32 checksum of the File Name
  1665. field in the header. This is used to verify that the header
  1666. File Name field has not changed since the Unicode Path extra field
  1667. was created. This can happen if a utility renames the File Name but
  1668. does not update the UTF-8 path extra field. If the CRC check fails,
  1669. this UTF-8 Path Extra Field should be ignored and the File Name field
  1670. in the header should be used instead.
  1671.  
  1672. The UnicodeName is the UTF-8 version of the contents of the File Name
  1673. field in the header. As UnicodeName is defined to be UTF-8, no UTF-8
  1674. byte order mark (BOM) is used. The length of this field is determined
  1675. by subtracting the size of the previous fields from TSize. If both
  1676. the File Name and Comment fields are UTF-8, the new General Purpose
  1677. Bit Flag, bit 11 (Language encoding flag (EFS)), can be used to
  1678. indicate that both the header File Name and Comment fields are UTF-8
  1679. and, in this case, the Unicode Path and Unicode Comment extra fields
  1680. are not needed and should not be created. Note that, for backward
  1681. compatibility, bit 11 should only be used if the native character set
  1682. of the paths and comments being zipped up is already UTF-8. It is
  1683. expected that the same file name storage method, either general
  1684. purpose bit 11 or extra fields, be used in both the Local and Central
  1685. Directory Header for a file.
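
The reader-side rules above (check the version, check the CRC, otherwise fall back to the header name) can be sketched as follows. The function name `resolve_name` and the `cp437` fallback encoding are assumptions made for this example; `up_data` is the data area of the 0x7075 block.

```python
import struct
import zlib

def resolve_name(header_name: bytes, up_data: bytes,
                 fallback_encoding: str = "cp437") -> str:
    """Pick the UTF-8 name from a Unicode Path block if it is still valid."""
    if len(up_data) >= 5:
        version, name_crc = struct.unpack_from("<BI", up_data, 0)
        # Use the block only if the version is known and the header
        # File Name has not changed since the block was written.
        if version == 1 and name_crc == (zlib.crc32(header_name) & 0xFFFFFFFF):
            return up_data[5:].decode("utf-8")   # length = TSize - 5
    return header_name.decode(fallback_encoding)
```

The same pattern applies to the Unicode Comment block, with the File Comment field in place of the File Name.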
  1686.  
  1687.  
  1688. 4.6.10 -Microsoft Open Packaging Growth Hint (0xa220):
  1689.  
  1690.          Value         Size        Description
  1691.          -----         ----        -----------
  1692.          0xa220        Short       tag for this extra block type
  1693.          TSize         Short       size of Sig + PadVal + Padding
  1694.          Sig           Short       verification signature (A028)
  1695.          PadVal        Short       Initial padding value
  1696.          Padding       variable    filled with NULL characters
  1697.  
  1698. 4.7 Manifest Files
  1699. ------------------
  1700.  
  1701. 4.7.1 Applications using ZIP files may have a need for additional
  1702. information that must be included with the files placed into
  1703. a ZIP file. Application specific information that cannot be
  1704. stored using the defined ZIP storage records SHOULD be stored
  1705. using the extensible Extra Field convention defined in this
  1706. document. However, some applications may use a manifest
  1707. file as a means for storing additional information. One
  1708. example is the META-INF/MANIFEST.MF file used in ZIP formatted
  1709. files having the .JAR extension (JAR files).
  1710.  
  1711. 4.7.2 A manifest file is a file created for the application process
  1712. that requires this information. A manifest file MAY be of any
  1713. file type required by the defining application process. It is
  1714. placed within the same ZIP file as files to which this information
  1715. applies. By convention, this file is typically the first file placed
  1716. into the ZIP file and it may include a defined directory path.
  1717.  
  1718. 4.7.3 Manifest files may be compressed or encrypted as needed for
  1719. application processing of the files inside the ZIP files.
  1720.  
  1721. Manifest files are outside of the scope of this specification.
  1722.  
  1723.  
  1724. 5.0 Explanation of compression methods
  1725. --------------------------------------
  1726.  
  1727.  
  1728. 5.1 UnShrinking - Method 1
  1729. --------------------------
  1730.  
  1731. 5.1.1 Shrinking is a Dynamic Ziv-Lempel-Welch compression algorithm
  1732. with partial clearing. The initial code size is 9 bits, and the
  1733. maximum code size is 13 bits. Shrinking differs from conventional
  1734. Dynamic Ziv-Lempel-Welch implementations in several respects:
  1735.  
  1736. 5.1.2 The code size is controlled by the compressor, and is
  1737. not automatically increased when codes larger than the current
  1738. code size are created (but not necessarily used). When
  1739. the decompressor encounters the code sequence 256
  1740. (decimal) followed by 1, it should increase the code size
  1741. read from the input stream to the next bit size. No
  1742. blocking of the codes is performed, so the next code at
  1743. the increased size should be read from the input stream
  1744. immediately after where the previous code at the smaller
  1745. bit size was read. Again, the decompressor should not
  1746. increase the code size used until the sequence 256,1 is
  1747. encountered.
  1748.  
  1749. 5.1.3 When the table becomes full, total clearing is not
  1750. performed. Rather, when the compressor emits the code
  1751. sequence 256,2 (decimal), the decompressor should clear
  1752. all leaf nodes from the Ziv-Lempel tree, and continue to
  1753. use the current code size. The nodes that are cleared
  1754. from the Ziv-Lempel tree are then re-used, with the lowest
  1755. code value re-used first, and the highest code value
  1756. re-used last. The compressor can emit the sequence 256,2
  1757. at any time.
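
The partial clearing step can be pictured with the sketch below. It assumes the decompressor keeps a table mapping each in-use code to its prefix (parent) code, and that only codes above the control code 256 live in that table. Both the representation and the name `partial_clear` are hypothetical and are not taken from this specification.

```python
def partial_clear(parent: dict) -> list:
    """Remove all leaf codes from the table; return them in re-use order.

    parent maps each in-use code to the code of its prefix string.
    """
    # A code is a leaf when no other in-use code has it as a prefix.
    has_child = set(parent.values())
    leaves = sorted(code for code in parent if code not in has_child)
    for code in leaves:
        del parent[code]
    # Freed codes are handed out again lowest value first.
    return leaves
```

Note that only codes that are leaves at the moment of the 256,2 sequence are cleared. Codes that become leaves as a result stay in the table.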
  1758.  
  1759. 5.2 Expanding - Methods 2-5
  1760. ---------------------------
  1761.  
  1762. 5.2.1 The Reducing algorithm is actually a combination of two
  1763. distinct algorithms. The first algorithm compresses repeated
  1764. byte sequences, and the second algorithm takes the compressed
  1765. stream from the first algorithm and applies a probabilistic
  1766. compression method.
  1767.  
  1768. 5.2.2 The probabilistic compression stores an array of 'follower
  1769. sets' S(j), for j=0 to 255, corresponding to each possible
  1770. ASCII character. Each set contains between 0 and 32
  1771. characters, to be denoted as S(j)[0],...,S(j)[m], where m<32.
  1772. The sets are stored at the beginning of the data area for a
  1773. Reduced file, in reverse order, with S(255) first, and S(0)
  1774. last.
  1775.  
  1776. 5.2.3 The sets are encoded as { N(j), S(j)[0],...,S(j)[N(j)-1] },
  1777. where N(j) is the size of set S(j). N(j) can be 0, in which
  1778. case the follower set for S(j) is empty. Each N(j) value is
  1779. encoded in 6 bits, followed by N(j) eight bit character values
  1780. corresponding to S(j)[0] to S(j)[N(j)-1] respectively. If
  1781. N(j) is 0, then no values for S(j) are stored, and the value
  1782. for N(j-1) immediately follows.
  1783.  
  1784. 5.2.4 The compressed data stream immediately follows the follower
  1785. sets. It can be interpreted for the probabilistic decompression
  1786. as follows:
  1787.  
  1788. let Last-Character <- 0.
  1789. loop until done
  1790.     if the follower set S(Last-Character) is empty then
  1791.         read 8 bits from the input stream, and copy this
  1792.         value to the output stream.
  1793.     otherwise if the follower set S(Last-Character) is non-empty then
  1794.         read 1 bit from the input stream.
  1795.         if this bit is not zero then
  1796.             read 8 bits from the input stream, and copy this
  1797.             value to the output stream.
  1798.         otherwise if this bit is zero then
  1799.             read B(N(Last-Character)) bits from the input
  1800.             stream, and assign this value to I.
  1801.             Copy the value of S(Last-Character)[I] to the
  1802.             output stream.
  1803. 
  1804.     assign the last value placed on the output stream to
  1805.     Last-Character.
  1806. end loop
  1807.  
  1808. B(N(j)) is defined as the minimal number of bits required to
  1809. encode the value N(j)-1.
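
The follower-set layout of 5.2.2 and 5.2.3 and the loop above can be sketched as shown below. Several points are assumptions of this example rather than statements of the specification: bits are taken from each byte starting at the least significant bit, at least one bit is read for the index even when N(j) is 1, and the loop stops after a caller-supplied number of bytes (`out_len`). All names are hypothetical.

```python
class BitReader:
    """Reads bit fields from a byte string, least significant bit first (assumed)."""
    def __init__(self, data: bytes):
        self.data, self.pos = data, 0          # pos counts bits consumed

    def read(self, n: int) -> int:
        value = 0
        for i in range(n):
            byte = self.data[self.pos >> 3]
            value |= ((byte >> (self.pos & 7)) & 1) << i
            self.pos += 1
        return value

def follower_index_bits(n: int) -> int:
    """B(N): bits needed to encode N-1 (assumed to be at least 1)."""
    return max(1, (n - 1).bit_length())

def read_follower_sets(bits: BitReader) -> list:
    """Read S(255) down to S(0); each set is N(j) in 6 bits, then N(j) bytes."""
    sets = [None] * 256
    for j in range(255, -1, -1):
        n = bits.read(6)
        sets[j] = [bits.read(8) for _ in range(n)]
    return sets

def probabilistic_decode(bits: BitReader, sets: list, out_len: int) -> bytes:
    out = bytearray()
    last = 0
    while len(out) < out_len:
        followers = sets[last]
        if not followers or bits.read(1):
            ch = bits.read(8)                  # literal byte
        else:
            ch = followers[bits.read(follower_index_bits(len(followers)))]
        out.append(ch)
        last = ch
    return bytes(out)
```

The output of this stage is still the intermediate stream and has to be expanded as described in 5.2.5.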
  1810.  
  1811. 5.2.5 The decompressed stream from above can then be expanded to
  1812. re-create the original file as follows:
  1813.  
  1814. let State <- 0.
  1815. 
  1816. loop until done
  1817.     read 8 bits from the input stream into C.
  1818.     case State of
  1819.         0: if C is not equal to DLE (144 decimal) then
  1820.                copy C to the output stream.
  1821.            otherwise if C is equal to DLE then
  1822.                let State <- 1.
  1823. 
  1824.         1: if C is non-zero then
  1825.                let V <- C.
  1826.                let Len <- L(V)
  1827.                let State <- F(Len).
  1828.            otherwise if C is zero then
  1829.                copy the value 144 (decimal) to the output stream.
  1830.                let State <- 0
  1831. 
  1832.         2: let Len <- Len + C
  1833.            let State <- 3.
  1834. 
  1835.         3: move backwards D(V,C) bytes in the output stream
  1836.            (if this position is before the start of the output
  1837.            stream, then assume that all the data before the
  1838.            start of the output stream is filled with zeros).
  1839.            copy Len+3 bytes from this position to the output stream.
  1840.            let State <- 0.
  1841.     end case
  1842. end loop
  1843.  
  1844. The functions F, L, and D are dependent on the 'compression
  1845. factor', 1 through 4, and are defined as follows:
  1846. 
  1847. For compression factor 1:
  1848.     L(X) equals the lower 7 bits of X.
  1849.     F(X) equals 2 if X equals 127 otherwise F(X) equals 3.
  1850.     D(X,Y) equals the (upper 1 bit of X) * 256 + Y + 1.
  1851. For compression factor 2:
  1852.     L(X) equals the lower 6 bits of X.
  1853.     F(X) equals 2 if X equals 63 otherwise F(X) equals 3.
  1854.     D(X,Y) equals the (upper 2 bits of X) * 256 + Y + 1.
  1855. For compression factor 3:
  1856.     L(X) equals the lower 5 bits of X.
  1857.     F(X) equals 2 if X equals 31 otherwise F(X) equals 3.
  1858.     D(X,Y) equals the (upper 3 bits of X) * 256 + Y + 1.
  1859. For compression factor 4:
  1860.     L(X) equals the lower 4 bits of X.
  1861.     F(X) equals 2 if X equals 15 otherwise F(X) equals 3.
  1862.     D(X,Y) equals the (upper 4 bits of X) * 256 + Y + 1.
  1863.  
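
The state machine of 5.2.5 and the four definitions of F, L, and D collapse into one routine once the factor is expressed as a bit width. The sketch below is one possible reading: `expand` is a hypothetical name and the input is assumed to be the complete intermediate stream produced by the probabilistic stage.

```python
DLE = 144

def expand(data: bytes, factor: int) -> bytes:
    """Undo the repeated-sequence stage for compression factor 1 through 4."""
    low_bits = 8 - factor                  # factor 1 -> 7 length bits, 4 -> 4
    low_mask = (1 << low_bits) - 1
    out = bytearray()
    state, v, length = 0, 0, 0
    for c in data:
        if state == 0:
            if c != DLE:
                out.append(c)
            else:
                state = 1
        elif state == 1:
            if c != 0:
                v = c
                length = v & low_mask                    # L(V)
                state = 2 if length == low_mask else 3   # F(Len)
            else:
                out.append(DLE)                          # escaped literal 144
                state = 0
        elif state == 2:
            length += c
            state = 3
        else:
            distance = (v >> low_bits) * 256 + c + 1     # D(V, C)
            start = len(out) - distance
            # Copy byte by byte so overlapping matches work; positions
            # before the start of the output read as zero.
            for k in range(length + 3):
                src = start + k
                out.append(out[src] if src >= 0 else 0)
            state = 0
    return bytes(out)
```

For example, with factor 4 the input bytes `61 62 63 90 01 02` (hex) expand to `abcabca`: the match has Len 1, so 4 bytes are copied from 3 bytes back.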
  1864. 5.3 Imploding - Method 6
  1865. ------------------------
  1866.  
  1867. 5.3.1 The Imploding algorithm is actually a combination of two
  1868. distinct algorithms. The first algorithm compresses repeated byte
  1869. sequences using a sliding dictionary. The second algorithm is
  1870. used to compress the encoding of the sliding dictionary output,
  1871. using multiple Shannon-Fano trees.
  1872.  
  1873. 5.3.2 The Imploding algorithm can use a 4K or 8K sliding dictionary
  1874. size. The dictionary size used can be determined by bit 1 in the
  1875. general purpose flag word; a 0 bit indicates a 4K dictionary
  1876. while a 1 bit indicates an 8K dictionary.
  1877.  
  1878. 5.3.3 The Shannon-Fano trees are stored at the start of the
  1879. compressed file. The number of trees stored is defined by bit 2 in
  1880. the general purpose flag word; a 0 bit indicates two trees stored,
  1881. a 1 bit indicates three trees are stored. If 3 trees are stored,
  1882. the first Shannon-Fano tree represents the encoding of the
  1883. Literal characters, the second tree represents the encoding of
  1884. the Length information, the third represents the encoding of the
  1885. Distance information. When 2 Shannon-Fano trees are stored, the
  1886. Length tree is stored first, followed by the Distance tree.
  1887.  
  1888. 5.3.4 The Literal Shannon-Fano tree, if present, is used to represent
  1889. the entire ASCII character set, and contains 256 values. This
  1890. tree is used to compress any data not compressed by the sliding
  1891. dictionary algorithm. When this tree is present, the Minimum
  1892. Match Length for the sliding dictionary is 3. If this tree is
  1893. not present, the Minimum Match Length is 2.
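
Sections 5.3.2 through 5.3.4 derive three decoder parameters from two bits of the general purpose flag word. A small sketch, with bit 0 taken as the least significant bit and `implode_parameters` as a hypothetical name:

```python
def implode_parameters(flags: int) -> dict:
    """Derive Implode settings from the general purpose bit flag word."""
    has_literal_tree = bool(flags & 0x0004)          # bit 2: three trees stored
    return {
        "dictionary_size": 8192 if flags & 0x0002 else 4096,   # bit 1
        "tree_count": 3 if has_literal_tree else 2,
        "min_match_length": 3 if has_literal_tree else 2,
    }
```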
  1894.  
  1895. 5.3.5 The Length Shannon-Fano tree is used to compress the Length
  1896. part of the (length,distance) pairs from the sliding dictionary
  1897. output. The Length tree contains 64 values, ranging from the
  1898. Minimum Match Length, to 63 plus the Minimum Match Length.
  1899.  
  1900. 5.3.6 The Distance Shannon-Fano tree is used to compress the Distance
  1901. part of the (length,distance) pairs from the sliding dictionary
  1902. output. The Distance tree contains 64 values, ranging from 0 to
  1903. 63, representing the upper 6 bits of the distance value. The
  1904. distance values themselves will be between 0 and the sliding
  1905. dictionary size, either 4K or 8K.
  1906.  
  1907. 5.3.7 The Shannon-Fano trees themselves are stored in a compressed
  1908. format. The first byte of the tree data represents the number of
  1909. bytes of data representing the (compressed) Shannon-Fano tree
  1910. minus 1. The remaining bytes represent the Shannon-Fano tree
  1911. data encoded as:
  1912.  
  1913. High 4 bits: Number of values at this bit length + 1. (1 - 16)
  1914. Low 4 bits: Bit Length needed to represent value + 1. (1 - 16)
  1915.  
  1916. 5.3.8 The Shannon-Fano codes can be constructed from the bit lengths
  1917. using the following algorithm:
  1918.  
  1919. 1) Sort the Bit Lengths in ascending order, while retaining the
  1920. order of the original lengths stored in the file.
  1921.  
  1922. 2) Generate the Shannon-Fano trees:
  1923.  
  1924. Code <- 0
  1925. CodeIncrement <- 0
  1926. LastBitLength <- 0
  1927. i <- number of Shannon-Fano codes - 1 (either 255 or 63)
  1928.  
  1929. loop while i >= 0
  1930.     Code = Code + CodeIncrement
  1931.     if BitLength(i) <> LastBitLength then
  1932.         LastBitLength=BitLength(i)
  1933.         CodeIncrement = 1 shifted left (16 - LastBitLength)
  1934.     ShannonCode(i) = Code
  1935.     i <- i - 1
  1936. end loop
  1937.  
  1938. 3) Reverse the order of all the bits in the above ShannonCode()
  1939. vector, so that the most significant bit becomes the least
  1940. significant bit. For example, the value 0x1234 (hex) would
  1941. become 0x2C48 (hex).
  1942.  
  1943. 4) Restore the order of Shannon-Fano codes as originally stored
  1944. within the file.
  1945.  
  1946. Example:
  1947.  
  1948. This example will show the encoding of a Shannon-Fano tree
  1949. of size 8. Notice that the actual Shannon-Fano trees used
  1950. for Imploding are either 64 or 256 entries in size.
  1951.  
  1952. Example: 0x02, 0x42, 0x01, 0x13
  1953.  
  1954. The first byte indicates 3 values in this table. Decoding the
  1955. bytes:
  1956. 0x42 = 5 codes of 3 bits long
  1957. 0x01 = 1 code of 2 bits long
  1958. 0x13 = 2 codes of 4 bits long
  1959.  
  1960. This would generate the original bit length array of:
  1961. (3, 3, 3, 3, 3, 2, 4, 4)
  1962.  
  1963. There are 8 codes in this table for the values 0 thru 7. Using
  1964. the algorithm to obtain the Shannon-Fano codes produces:
  1965.  
  1966.                                   Reversed   Order      Original
  1967. Val   Sorted   Constructed Code   Value      Restored   Length
  1968. ---   ------   ----------------   --------   --------   --------
  1969. 0:    2        1100000000000000   11         101        3
  1970. 1:    3        1010000000000000   101        001        3
  1971. 2:    3        1000000000000000   001        110        3
  1972. 3:    3        0110000000000000   110        010        3
  1973. 4:    3        0100000000000000   010        100        3
  1974. 5:    3        0010000000000000   100        11         2
  1975. 6:    4        0001000000000000   1000       1000       4
  1976. 7:    4        0000000000000000   0000       0000       4
  1977.  
  1978. The values in the Val, Order Restored and Original Length columns
  1979. now represent the Shannon-Fano encoding tree that can be used for
  1980. decoding the Shannon-Fano encoded data. How to parse the
  1981. variable length Shannon-Fano values from the data stream is beyond
  1982. the scope of this document. (See the references listed at the end of
  1983. this document for more information.) However, traditional decoding
  1984. schemes used for Huffman variable length decoding, such as the
  1985. Greenlaw algorithm, can be successfully applied.
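
The tree encoding of 5.3.7 and the four construction steps of 5.3.8 can be checked against the example with a short sketch. The function names are hypothetical, and each code is returned as an integer whose low bits (as many as the code's bit length) hold the "Order Restored" value.

```python
def read_bit_lengths(tree_data: bytes) -> list:
    """Decode the compressed tree bytes into one bit length per value."""
    count = tree_data[0] + 1                   # first byte = number of bytes - 1
    lengths = []
    for b in tree_data[1:1 + count]:
        # High nibble + 1 = how many values, low nibble + 1 = their bit length.
        lengths += [(b & 0x0F) + 1] * ((b >> 4) + 1)
    return lengths

def shannon_fano_codes(lengths: list) -> list:
    """Build the codes from bit lengths, indexed by original value."""
    # Step 1: stable sort by bit length keeps the original order among ties.
    order = sorted(range(len(lengths)), key=lambda v: lengths[v])
    codes = [0] * len(lengths)
    code = increment = last_length = 0
    # Step 2: walk the sorted list from the last entry down to the first.
    for value in reversed(order):
        code += increment
        if lengths[value] != last_length:
            last_length = lengths[value]
            increment = 1 << (16 - last_length)
        # Steps 3 and 4: reverse the 16 bits and store under the original value.
        codes[value] = int(format(code, "016b")[::-1], 2)
    return codes
```

Running this on the example bytes 0x02, 0x42, 0x01, 0x13 gives the bit lengths (3, 3, 3, 3, 3, 2, 4, 4) and the codes 101, 001, 110, 010, 100, 11, 1000, 0000, matching the table.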
  1986.  
  1987. 5.3.9 The compressed data stream begins immediately after the
  1988. compressed Shannon-Fano data. The compressed data stream can be
  1989. interpreted as follows:
  1990.  
  1991. loop until done
  1992.     read 1 bit from input stream.
  1993. 
  1994.     if this bit is non-zero then (encoded data is literal data)
  1995.         if Literal Shannon-Fano tree is present
  1996.             read and decode character using Literal Shannon-Fano tree.
  1997.         otherwise
  1998.             read 8 bits from input stream.
  1999.         copy character to the output stream.
  2000.     otherwise (encoded data is sliding dictionary match)
  2001.         if 8K dictionary size
  2002.             read 7 bits for offset Distance (lower 7 bits of offset).
  2003.         otherwise
  2004.             read 6 bits for offset Distance (lower 6 bits of offset).
  2005. 
  2006.         using the Distance Shannon-Fano tree, read and decode the
  2007.         upper 6 bits of the Distance value.
  2008. 
  2009.         using the Length Shannon-Fano tree, read and decode
  2010.         the Length value.
  2011. 
  2012.         Length <- Length + Minimum Match Length
  2013. 
  2014.         if Length = 63 + Minimum Match Length
  2015.             read 8 bits from the input stream,
  2016.             add this value to Length.
  2017. 
  2018.         move backwards Distance+1 bytes in the output stream, and
  2019.         copy Length characters from this position to the output
  2020.         stream. (if this position is before the start of the output
  2021.         stream, then assume that all the data before the start of
  2022.         the output stream is filled with zeros).
  2023. end loop
  2024.  
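
The match branch of the loop above combines several separately decoded values. The sketch below isolates that arithmetic and the zero-filled copy. It assumes the bit fields and tree symbols have already been read, and both function names are hypothetical.

```python
def imploded_match(low_bits: int, high_bits: int, length_code: int,
                   extra: int, big_dictionary: bool, min_match: int):
    """Return (bytes to move back, bytes to copy) for one match."""
    shift = 7 if big_dictionary else 6         # 8K dictionary uses 7 low bits
    distance = (high_bits << shift) | low_bits
    length = length_code + min_match
    if length == 63 + min_match:
        length += extra                        # the additional 8-bit value
    return distance + 1, length

def copy_match(out: bytearray, back: int, length: int) -> None:
    """Append length bytes starting back bytes before the end of out."""
    start = len(out) - back
    for k in range(length):
        src = start + k
        # Data before the start of the output stream counts as zeros.
        out.append(out[src] if src >= 0 else 0)
```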
  2025. 5.4 Tokenizing - Method 7
  2026. -------------------------
  2027.  
  2028. 5.4.1 This method is not used by PKZIP.
  2029.  
  2030. 5.5 Deflating - Method 8
  2031. ------------------------
  2032.  
  2033. 5.5.1 The Deflate algorithm is similar to the Implode algorithm using
  2034. a sliding dictionary of up to 32K with secondary compression
  2035. from Huffman/Shannon-Fano codes.
  2036.  
  2037. 5.5.2 The compressed data is stored in blocks with a header describing
  2038. the block and the Huffman codes used in the data block. The header
  2039. format is as follows:
  2040.  
  2041. Bit 0: Last Block bit     This bit is set to 1 if this is the last
  2042.                           compressed block in the data.
  2043. Bits 1-2: Block type
  2044.     00 (0) - Block is stored - All stored data is byte aligned.
  2045.              Skip bits until next byte, then next word = block
  2046.              length, followed by the ones' complement of the block
  2047.              length word. Remaining data in block is the stored
  2048.              data.
  2049. 
  2050.     01 (1) - Use fixed Huffman codes for literal and distance codes.
  2051.              Lit Code    Bits             Dist Code   Bits
  2052.              ---------   ----             ---------   ----
  2053.                0 - 143    8                 0 - 31     5
  2054.              144 - 255    9
  2055.              256 - 279    7
  2056.              280 - 287    8
  2057. 
  2058.              Literal codes 286-287 and distance codes 30-31 are
  2059.              never used but participate in the Huffman construction.
  2060. 
  2061.     10 (2) - Dynamic Huffman codes. (See expanding Huffman codes)
  2062. 
  2063.     11 (3) - Reserved - Flag an "Error in compressed data" if seen.
  2064.  
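
A sketch of how a reader might interpret the three header bits and set up the fixed code lengths listed above. It assumes the three bits have already been extracted into an integer with bit 0 in the least significant position. The function names are hypothetical.

```python
def parse_block_header(first_bits: int):
    """Return (is_last_block, block_type) from the first three bits of a block."""
    is_last = bool(first_bits & 1)             # bit 0
    block_type = (first_bits >> 1) & 0b11      # bits 1-2
    if block_type == 3:
        raise ValueError("Error in compressed data: reserved block type")
    return is_last, block_type

def fixed_code_lengths():
    """Bit lengths for block type 1, indexed by literal and distance code."""
    literal = [8] * 144 + [9] * 112 + [7] * 24 + [8] * 8    # codes 0 - 287
    distance = [5] * 32                                     # codes 0 - 31
    return literal, distance
```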
  2065. 5.5.3 Expanding Huffman Codes
  2066.  
  2067. If the data block is stored with dynamic Huffman codes, the Huffman
  2068. codes are sent in the following compressed format:
  2069.  
  2070. 5 Bits: # of Literal codes sent - 257 (257 - 286)
  2071.         All other codes are never sent.
  2072. 5 Bits: # of Dist codes - 1 (1 - 32)
  2073. 4 Bits: # of Bit Length codes - 4 (4 - 19)
  2074.  
  2075. The Huffman codes are sent as bit lengths and the codes are built as
  2076. described in the implode algorithm. The bit lengths themselves are
  2077. compressed with Huffman codes. There are 19 bit length codes:
  2078.  
  2079.  0 - 15: Represent bit lengths of 0 - 15
  2080.      16: Copy the previous bit length 3 - 6 times.
  2081.          The next 2 bits indicate repeat length (0 = 3, ... ,3 = 6)
  2082.          Example: Codes 8, 16 (+2 bits 11), 16 (+2 bits 10) will
  2083.          expand to 12 bit lengths of 8 (1 + 6 + 5)
  2084.      17: Repeat a bit length of 0 for 3 - 10 times. (3 bits of length)
  2085.      18: Repeat a bit length of 0 for 11 - 138 times (7 bits of length)
  2086.  
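
The run-length rules for codes 16, 17, and 18 can be sketched as follows. The input is assumed to be a list of already decoded (code, extra bits value) pairs, and `expand_code_lengths` is a hypothetical name.

```python
def expand_code_lengths(symbols) -> list:
    """Expand (code, extra) pairs into a flat list of bit lengths."""
    lengths = []
    for code, extra in symbols:
        if code <= 15:
            lengths.append(code)                       # literal bit length
        elif code == 16:
            if not lengths:
                raise ValueError("repeat code with no previous bit length")
            lengths += [lengths[-1]] * (3 + extra)     # 2 extra bits: 3 - 6
        elif code == 17:
            lengths += [0] * (3 + extra)               # 3 extra bits: 3 - 10
        elif code == 18:
            lengths += [0] * (11 + extra)              # 7 extra bits: 11 - 138
        else:
            raise ValueError("invalid bit length code")
    return lengths
```

With the example from the list above, `[(8, 0), (16, 3), (16, 2)]` expands to twelve bit lengths of 8.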
  2087. The lengths of the bit length codes are sent packed 3 bits per value
  2088. (0 - 7) in the following order:
  2089.  
  2090. 16, 17, 18, 0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15
  2091.  
  2092. The Huffman codes should be built as described in the Implode algorithm
  2093. except codes are assigned starting at the shortest bit length, i.e. the
  2094. shortest code should be all 0's rather than all 1's. Also, codes with
  2095. a bit length of zero do not participate in the tree construction. The
  2096. codes are then used to decode the bit lengths for the literal and
  2097. distance tables.
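
The two steps described above (placing the 3-bit values in the listed order, then building codes with the shortest code all zeros) are sketched below. Instead of the sort-based Implode procedure, this example uses the counting method given in RFC 1951, which yields the same kind of assignment. Codes are returned as integers read most significant bit first, and the function names are hypothetical.

```python
BIT_LENGTH_ORDER = [16, 17, 18, 0, 8, 7, 9, 6, 10, 5, 11, 4, 12, 3, 13, 2, 14, 1, 15]

def place_bit_length_code_lengths(sent: list) -> list:
    """Map the 3-bit values, in transmission order, onto codes 0 - 18."""
    lengths = [0] * 19
    for symbol, n in zip(BIT_LENGTH_ORDER, sent):
        lengths[symbol] = n                    # codes not sent keep length 0
    return lengths

def huffman_codes(lengths: list) -> list:
    """Assign codes from bit lengths; entries with length 0 get None."""
    max_len = max(lengths, default=0)
    count = [0] * (max_len + 1)
    for n in lengths:
        if n:
            count[n] += 1                      # zero lengths do not participate
    next_code = [0] * (max_len + 1)
    code = 0
    for bits in range(1, max_len + 1):
        code = (code + count[bits - 1]) << 1   # first code of each bit length
        next_code[bits] = code
    codes = []
    for n in lengths:
        if n:
            codes.append(next_code[n])
            next_code[n] += 1
        else:
            codes.append(None)
    return codes
```

For the bit lengths (3, 3, 3, 3, 3, 2, 4, 4) this produces 010, 011, 100, 101, 110, 00, 1110, 1111.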
  2098.  
  2099. The bit lengths for the literal tables are sent first with the number
  2100. of entries sent described by the 5 bits sent earlier. There are up
  2101. to 286 literal characters; the first 256 represent the respective 8
  2102. bit character, code 256 represents the End-Of-Block code, the remaining
  2103. 29 codes represent copy lengths of 3 thru 258. There are up to 30
  2104. distance codes representing distances from 1 thru 32k as described
  2105. below.
  2106.  
  2107. Length Codes
  2108. ------------
  2109.      Extra                Extra                Extra                Extra
  2110. Code Bits  Length    Code Bits  Lengths   Code Bits  Lengths   Code Bits  Length(s)
  2111. ---- ----  ------    ---- ----  -------   ---- ----  -------   ---- ----  ---------
  2112. 257   0    3         265   1    11,12     273   3    35-42     281   5    131-162
  2113. 258   0    4         266   1    13,14     274   3    43-50     282   5    163-194
  2114. 259   0    5         267   1    15,16     275   3    51-58     283   5    195-226
  2115. 260   0    6         268   1    17,18     276   3    59-66     284   5    227-257
  2116. 261   0    7         269   2    19-22     277   4    67-82     285   0    258
  2117. 262   0    8         270   2    23-26     278   4    83-98
  2118. 263   0    9         271   2    27-30     279   4    99-114
  2119. 264   0    10        272   2    31-34     280   4    115-130
  2120.  
  2121. Distance Codes
  2122. --------------
  2123.      Extra                  Extra                  Extra                  Extra
  2124. Code Bits  Dist        Code Bits  Dist        Code Bits  Distance    Code Bits  Distance
  2125. ---- ----  ----        ---- ----  ------      ---- ----  --------    ---- ----  --------
  2126.   0    0   1             8    3   17-24        16    7   257-384      24   11   4097-6144
  2127.   1    0   2             9    3   25-32        17    7   385-512      25   11   6145-8192
  2128.   2    0   3            10    4   33-48        18    8   513-768      26   12   8193-12288
  2129.   3    0   4            11    4   49-64        19    8   769-1024     27   12   12289-16384
  2130.   4    1   5,6          12    5   65-96        20    9   1025-1536    28   13   16385-24576
  2131.   5    1   7,8          13    5   97-128       21    9   1537-2048    29   13   24577-32768
  2132.   6    2   9-12         14    6   129-192      22   10   2049-3072
  2133.   7    2   13-16        15    6   193-256      23   10   3073-4096
  2134.  
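
Both tables follow a regular pattern: each code covers 2^(extra bits) consecutive values starting where the previous code ended. The sketch below rebuilds them from the extra-bit counts alone, with code 285 entered as the single exception. All names are hypothetical.

```python
def _build(first_code: int, first_value: int, extra_bits: list) -> dict:
    """Map code -> (base value, extra bit count) for consecutive ranges."""
    table, value = {}, first_value
    for i, extra in enumerate(extra_bits):
        table[first_code + i] = (value, extra)
        value += 1 << extra
    return table

# Length codes 257 - 284; code 285 is the fixed length 258 with no extra bits.
LENGTHS = _build(257, 3, [0] * 8 + [1] * 4 + [2] * 4 + [3] * 4 + [4] * 4 + [5] * 4)
LENGTHS[285] = (258, 0)

# Distance codes 0 - 29: four codes with 0 extra bits, then two codes per count.
DISTANCES = _build(0, 1, [0] * 4 + [b for b in range(1, 14) for _ in range(2)])

def match_length(code: int, extra_value: int) -> int:
    return LENGTHS[code][0] + extra_value

def match_distance(code: int, extra_value: int) -> int:
    return DISTANCES[code][0] + extra_value
```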
  2135. 5.5.4 The compressed data stream begins immediately after the
  2136. compressed header data. The compressed data stream can be
  2137. interpreted as follows:
  2138.  
  2139. do
  2140.     read header from input stream.
  2141. 
  2142.     if stored block
  2143.         skip bits until byte aligned
  2144.         read count and 1's complement of count
  2145.         copy count bytes data block
  2146.     otherwise
  2147.         loop until end of block code sent
  2148.             decode literal character from input stream
  2149.             if literal < 256
  2150.                 copy character to the output stream
  2151.             otherwise
  2152.                 if literal = end of block
  2153.                     break from loop
  2154.                 otherwise
  2155.                     decode distance from input stream
  2156. 
  2157.                     move backwards distance bytes in the output stream, and
  2158.                     copy length characters from this position to the output
  2159.                     stream.
  2160.         end loop
  2161. while not last block
  2162. 
  2163. if data descriptor exists
  2164.     skip bits until byte aligned
  2165.     read crc and sizes
  2166. endif
  2167.  
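
For experimentation, a stream in this format can be decoded with an existing library rather than a hand-written loop. The sketch below uses Python's standard `zlib` module with a negative window size, which selects a raw Deflate stream with no zlib or gzip wrapper. It is an illustration and not part of the format definition; `inflate_raw` is a hypothetical name.

```python
import zlib

def inflate_raw(data: bytes) -> bytes:
    """Decompress a raw Deflate stream as stored for method 8."""
    return zlib.decompress(data, wbits=-15)

# A hand-built final stored block: header byte 0x01 (last block, type 00),
# the count 5, its 1's complement 0xFFFA, then the 5 stored bytes.
stored_block = bytes([0x01, 0x05, 0x00, 0xFA, 0xFF]) + b"hello"
```

Passing `stored_block` to `inflate_raw` returns the five stored bytes, which follows the stored-block branch of the loop above.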
  2168. 5.6 Enhanced Deflating - Method 9
  2169. ---------------------------------
  2170.  
  2171. 5.6.1 The Enhanced Deflating algorithm is similar to Deflate but uses
  2172. a sliding dictionary of up to 64K. Deflate64(tm) is supported
  2173. by the Deflate extractor.
  2174.  
  2175. 5.7 BZIP2 - Method 12
  2176. ---------------------
  2177.  
  2178. 5.7.1 BZIP2 is an open-source data compression algorithm developed by
  2179. Julian Seward. Information and source code for this algorithm
  2180. can be found on the internet.
  2181.  
  2182. 5.8 LZMA - Method 14
  2183. ---------------------
  2184.  
  2185. 5.8.1 LZMA is a block-oriented, general purpose data compression
  2186. algorithm developed and maintained by Igor Pavlov. It is a derivative
  2187. of LZ77 that utilizes Markov chains and a range coder. Information and
  2188. source code for this algorithm can be found on the internet. Consult
  2189. with the author of this algorithm for information on terms or
  2190. restrictions on use.
  2191.  
  2192. Support for LZMA within the ZIP format is defined as follows:
  2193.  
  2194. 5.8.2 The Compression method field within the ZIP Local and Central
  2195. Header records will be set to the value 14 to indicate data was
  2196. compressed using LZMA.
  2197.  
  2198. 5.8.3 The Version needed to extract field within the ZIP Local and
  2199. Central Header records will be set to 6.3 to indicate the minimum
  2200. ZIP format version supporting this feature.
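
As an illustration of 5.8.2 and 5.8.3, the sketch below uses Python's standard zipfile module to write one LZMA-compressed member and then reads the compression method and version needed to extract back from the central header. The helper names are hypothetical, and the example assumes the Python build includes the lzma module. zipfile exposes the version field as an integer, so 6.3 appears as 63.

```python
import io
import zipfile


def write_lzma_member(name: str, payload: bytes) -> bytes:
    """Return the bytes of a one-member ZIP archive compressed with method 14."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", compression=zipfile.ZIP_LZMA) as archive:
        archive.writestr(name, payload)
    return buffer.getvalue()


def describe_member(archive_bytes: bytes, name: str):
    """Return (compression method, version needed to extract) for one member."""
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        info = archive.getinfo(name)
        return info.compress_type, info.extract_version
```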
  2201.  
  2202. 5.8.4 File data compressed using the LZMA algorithm must be placed
  2203. immediately following the Local Header for the file. If a standard
  2204. ZIP encryption header is required, it will follow the Local Header
  2205. and will precede the LZMA compressed file data segment. The location
  2206. of LZMA compressed data segment within the ZIP format will be as shown:
  2207.  
  2208. [local header file 1]
  2209. [encryption header file 1]
  2210. [LZMA compressed data segment for file 1]
  2211. [data descriptor 1]
  2212. [local header file 2]
  2213.  
  2214. 5.8.5 The encryption header and data descriptor records may
  2215. be conditionally present. The LZMA Compressed Data Segment
  2216. will consist of an LZMA Properties Header followed by the
  2217. LZMA Compressed Data as shown:
  2218.  
  2219. [LZMA properties header for file 1]
  2220. [LZMA compressed data for file 1]
  2221.  
  2222. 5.8.6 The LZMA Compressed Data will be stored as provided by the
  2223. LZMA compression library. Compressed size, uncompressed size and
  2224. other file characteristics about the file being compressed must be
  2225. stored in standard ZIP storage format.
  2226.  
  2227. 5.8.7 The LZMA Properties Header will store specific data required
  2228. to decompress the LZMA compressed Data. This data is set by the
  2229. LZMA compression engine using the function WriteCoderProperties()
  2230. as documented within the LZMA SDK.
  2231.  
  2232. 5.8.8 Storage fields for the property information within the LZMA
  2233. Properties Header are as follows:
  2234.  
  2235. LZMA Version Information  2 bytes
  2236. LZMA Properties Size      2 bytes
  2237. LZMA Properties Data      variable, defined by "LZMA Properties Size"
  2238.  
  2239. 5.8.8.1 LZMA Version Information - this field identifies which version
  2240. of the LZMA SDK was used to compress a file. The first byte will
  2241. store the major version number of the LZMA SDK and the second
  2242. byte will store the minor number.
  2243.  
  2244. 5.8.8.2 LZMA Properties Size - this field defines the size of the
  2245. remaining property data. Typically this size should be determined by
  2246. the version of the SDK. This size field is included as a convenience
  2247. and to help avoid any ambiguity should it arise in the future due
  2248. to changes in this compression algorithm.
  2249.  
  2250. 5.8.8.3 LZMA Properties Data - this variable sized field records the
  2251. required values for the decompressor as defined by the LZMA SDK.
  2252. The data stored in this field should be obtained using the
  2253. WriteCoderProperties() in the version of the SDK defined by
  2254. the "LZMA Version Information" field.
  2255.  
  2256. 5.8.8.4 The layout of the "LZMA Properties Data" field is a function of
  2257. the LZMA compression algorithm. It is possible that this layout may be
  2258. changed by the author over time. The data layout in version 4.3 of the
  2259. LZMA SDK defines a 5 byte array that uses 4 bytes to store the dictionary
  2260. size in little-endian order. This is preceded by a single packed byte as
  2261. the first element of the array that contains the following fields:
  2262.  
  2263. PosStateBits
  2264. LiteralPosStateBits
  2265. LiteralContextBits
  2266.  
  2267. Refer to the LZMA documentation for a more detailed explanation of
  2268. these fields.
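
A minimal sketch of reading the LZMA Properties Header follows. It handles only the 5 byte layout described in 5.8.8.4. Two things are assumptions rather than statements of this specification: the Properties Size field is read as little-endian, and the packed byte is decoded with the formula (PosStateBits * 5 + LiteralPosStateBits) * 9 + LiteralContextBits taken from the LZMA SDK documentation. All names in the code are illustrative.

```python
import struct
from typing import NamedTuple


class LzmaProperties(NamedTuple):
    sdk_major: int
    sdk_minor: int
    lc: int          # LiteralContextBits
    lp: int          # LiteralPosStateBits
    pb: int          # PosStateBits
    dict_size: int
    header_len: int  # bytes consumed; the LZMA compressed data starts here


def parse_lzma_properties_header(segment: bytes) -> LzmaProperties:
    major, minor, props_size = struct.unpack_from("<BBH", segment, 0)
    props = segment[4:4 + props_size]
    if props_size != 5 or len(props) != 5:
        raise ValueError("this sketch only handles the 5 byte properties layout")
    packed, dict_size = struct.unpack("<BI", props)
    if packed >= 9 * 5 * 5:
        raise ValueError("packed properties byte out of range")
    # Assumed LZMA SDK packing: packed = (pb * 5 + lp) * 9 + lc
    lc = packed % 9
    lp = (packed // 9) % 5
    pb = packed // 45
    return LzmaProperties(major, minor, lc, lp, pb, dict_size, 4 + props_size)
```

For example, a packed byte of 0x5D decodes to LiteralContextBits 3, LiteralPosStateBits 0 and PosStateBits 2 under this formula.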
  2269.  
  2270. 5.8.9 Data compressed with method 14, LZMA, may include an end-of-stream
  2271. (EOS) marker ending the compressed data stream. This marker is not
  2272. required, but its use is highly recommended to facilitate processing
  2273. and implementers should include the EOS marker whenever possible.
  2274. When the EOS marker is used, general purpose bit 1 must be set. If
  2275. general purpose bit 1 is not set, the EOS marker is not present.
  2276.  
  2277. 5.9 WavPack - Method 97
  2278. -----------------------
  2279.  
  2280. 5.9.1 Information describing the use of compression method 97 is
  2281. provided by WinZIP International, LLC. This method relies on the
  2282. open source WavPack audio compression utility developed by David Bryant.
  2283. Information on WavPack is available at www.wavpack.com. Please consult
  2284. with the author of this algorithm for information on terms and
  2285. restrictions on use.
  2286.  
  2287. 5.9.2 WavPack data for a file begins immediately after the end of the
  2288. local header data. This data is the output from WavPack compression
  2289. routines. Within the ZIP file, the use of WavPack compression is
  2290. indicated by setting the compression method field to a value of 97
  2291. in both the local header and the central directory header. The Version
  2292. needed to extract and version made by fields use the same values as are
  2293. used for data compressed using the Deflate algorithm.
  2294.  
  2295. 5.9.3 An implementation note for storing digital sample data when using
  2296. WavPack compression within ZIP files is that all of the bytes of
  2297. the sample data should be compressed. This includes any unused
  2298. bits up to the byte boundary. An example is a 2 byte sample that
  2299. uses only 12 bits for the sample data with 4 unused bits. If only
  2300. 12 bits are passed as the sample size to the WavPack routines, the 4
  2301. unused bits will be set to 0 on extraction regardless of their original
  2302. state. To avoid this, the full 16 bits of the sample data size
  2303. should be provided.
  2304.  
  2305. 5.10 PPMd - Method 98
  2306. ---------------------
  2307.  
  2308. 5.10.1 PPMd is a data compression algorithm developed by Dmitry Shkarin
  2309. which includes a carryless rangecoder developed by Dmitry Subbotin.
  2310. This algorithm is based on prediction by partial matching (PPM) over
  2311. multiple order contexts. Information and source code for this algorithm
  2312. can be found on the internet. Consult with the author of this
  2313. algorithm for information on terms or restrictions on use.
  2314.  
  2315. 5.10.2 Support for PPMd within the ZIP format currently is provided only
  2316. for version I, revision 1 of the algorithm. Storage requirements
  2317. for using this algorithm are as follows:
  2318.  
  2319. 5.10.3 Parameters needed to control the algorithm are stored in the two
  2320. bytes immediately preceding the compressed data. These bytes are
  2321. used to store the following fields:
  2322.  
  2323. Model order - sets the maximum model order, default is 8, possible
  2324. values are from 2 to 16 inclusive
  2325.  
  2326. Sub-allocator size - sets the size of sub-allocator in MB, default is 50,
  2327. possible values are from 1MB to 256MB inclusive
  2328.  
  2329. Model restoration method - sets the method used to restart context
  2330. model at memory insufficiency, values are:
  2331.  
  2332. 0 - restarts model from scratch - default
  2333. 1 - cut off model - decreases performance by as much as 2x
  2334. 2 - freeze context tree - not recommended
  2335.  
  2336. 5.10.4 An example for packing these fields into the 2 byte storage field is
  2337. illustrated below. These values are stored in Intel low-byte/high-byte
  2338. order.
  2339.  
  2340. wPPMd = (Model order - 1) +
  2341. ((Sub-allocator size - 1) << 4) +
  2342. (Model restoration method << 12)
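
The packing formula above can be sketched in Python as shown below, together with the inverse operation a reader would need. The function names and the range checks are illustrative; the bit widths used when unpacking (4 bits, 8 bits, 4 bits) follow from the value ranges given in 5.10.3.

```python
import struct


def pack_ppmd_params(order: int = 8, mem_mb: int = 50, restore: int = 0) -> bytes:
    """Pack the PPMd parameters into the 2 byte field, low byte first."""
    if not 2 <= order <= 16:
        raise ValueError("model order must be in 2..16")
    if not 1 <= mem_mb <= 256:
        raise ValueError("sub-allocator size must be in 1..256 MB")
    if restore not in (0, 1, 2):
        raise ValueError("model restoration method must be 0, 1 or 2")
    word = (order - 1) + ((mem_mb - 1) << 4) + (restore << 12)
    return struct.pack("<H", word)


def unpack_ppmd_params(raw: bytes):
    """Return (model order, sub-allocator size in MB, restoration method)."""
    (word,) = struct.unpack("<H", raw[:2])
    return (word & 0x0F) + 1, ((word >> 4) & 0xFF) + 1, word >> 12
```

With the default values (order 8, 50 MB, method 0) the packed word is 0x0317, stored as the bytes 17 03.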
  2343.  
  2344.  
  2345. 6.0 Traditional PKWARE Encryption
  2346. ----------------------------------
  2347.  
  2348. 6.0.1 The following information discusses the decryption steps
  2349. required to support traditional PKWARE encryption. This
  2350. form of encryption is considered weak by today's standards
  2351. and its use is recommended only for situations with
  2352. low security needs or for compatibility with older .ZIP
  2353. applications.
  2354.  
  2355. 6.1 Traditional PKWARE Decryption
  2356. ---------------------------------
  2357.  
  2358. 6.1.1 PKWARE is grateful to Mr. Roger Schlafly for his expert
  2359. contribution towards the development of PKWARE's traditional
  2360. encryption.
  2361.  
  2362. 6.1.2 PKZIP encrypts the compressed data stream. Encrypted files
  2363. must be decrypted before they can be extracted to their original
  2364. form.
  2365.  
  2366. 6.1.3 Each encrypted file has an extra 12 bytes stored at the start
  2367. of the data area defining the encryption header for that file. The
  2368. encryption header is originally set to random values, and then
  2369. itself encrypted, using three, 32-bit keys. The key values are
  2370. initialized using the supplied encryption password. After each byte
  2371. is encrypted, the keys are then updated using pseudo-random number
  2372. generation techniques in combination with the same CRC-32 algorithm
  2373. used in PKZIP and described elsewhere in this document.
  2374.  
  2375. 6.1.4 The following are the basic steps required to decrypt a file:
  2376.  
  2377. 1) Initialize the three 32-bit keys with the password.
  2378. 2) Read and decrypt the 12-byte encryption header, further
  2379. initializing the encryption keys.
  2380. 3) Read and decrypt the compressed data stream using the
  2381. encryption keys.
  2382.  
  2383. 6.1.5 Initializing the encryption keys
  2384.  
  2385. Key(0) <- 305419896
  2386. Key(1) <- 591751049
  2387. Key(2) <- 878082192
  2388.  
  2389. loop for i <- 0 to length(password)-1
  2390.     update_keys(password(i))
  2391. end loop
  2392. 
  2393. Where update_keys() is defined as:
  2394. 
  2395. update_keys(char):
  2396.     Key(0) <- crc32(Key(0),char)
  2397.     Key(1) <- Key(1) + (Key(0) & 000000ffH)
  2398.     Key(1) <- Key(1) * 134775813 + 1
  2399.     Key(2) <- crc32(Key(2),Key(1) >> 24)
  2400. end update_keys
  2401.  
  2402. Where crc32(old_crc,char) is a routine that given a CRC value and a
  2403. character, returns an updated CRC value after applying the CRC-32
  2404. algorithm described elsewhere in this document.
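
The key initialization can be sketched in Python as follows. Because the keys are 32-bit values, the sketch masks every addition and multiplication to 32 bits. The crc32 step is written as the single-byte table update of the standard CRC-32 (reflected polynomial 0xEDB88320) with no initial or final inversion, which is the usual reading of crc32(old_crc,char) here. The function names mirror the pseudo-code, but the list-based key layout is only an illustration.

```python
def make_crc_table():
    """Build the 256-entry table for the reflected CRC-32 polynomial."""
    table = []
    for n in range(256):
        c = n
        for _ in range(8):
            c = (c >> 1) ^ 0xEDB88320 if c & 1 else c >> 1
        table.append(c)
    return table


CRC_TABLE = make_crc_table()


def crc32_update(old_crc: int, byte: int) -> int:
    """Advance a raw CRC-32 value by one byte (no pre- or post-inversion)."""
    return CRC_TABLE[(old_crc ^ byte) & 0xFF] ^ (old_crc >> 8)


def update_keys(keys: list, byte: int) -> None:
    keys[0] = crc32_update(keys[0], byte)
    keys[1] = (keys[1] + (keys[0] & 0xFF)) & 0xFFFFFFFF
    keys[1] = (keys[1] * 134775813 + 1) & 0xFFFFFFFF
    keys[2] = crc32_update(keys[2], keys[1] >> 24)


def init_keys(password: bytes) -> list:
    keys = [305419896, 591751049, 878082192]  # 0x12345678, 0x23456789, 0x34567890
    for byte in password:
        update_keys(keys, byte)
    return keys
```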
  2405.  
  2406. 6.1.6 Decrypting the encryption header
  2407.  
  2408. The purpose of this step is to further initialize the encryption
  2409. keys, based on random data, to render a plaintext attack on the
  2410. data ineffective.
  2411.  
  2412. Read the 12-byte encryption header into Buffer, in locations
  2413. Buffer(0) thru Buffer(11).
  2414.  
  2415. loop for i <- 0 to 11
  2416.     C <- Buffer(i) ^ decrypt_byte()
  2417.     update_keys(C)
  2418.     Buffer(i) <- C
  2419. end loop
  2420. 
  2421. Where decrypt_byte() is defined as:
  2422. 
  2423. unsigned char decrypt_byte()
  2424.     local unsigned short temp
  2425.     temp <- Key(2) | 2
  2426.     decrypt_byte <- (temp * (temp ^ 1)) >> 8
  2427. end decrypt_byte
  2428.  
  2429. After the header is decrypted, the last 1 or 2 bytes in Buffer
  2430. should be the high-order word/byte of the CRC for the file being
  2431. decrypted, stored in Intel low-byte/high-byte order. Versions of
  2432. PKZIP prior to 2.0 used a 2 byte CRC check; a 1 byte CRC check is
  2433. used on versions after 2.0. This can be used to test if the password
  2434. supplied is correct or not.
  2435.  
  2436. 6.1.7 Decrypting the compressed data stream
  2437.  
  2438. The compressed data stream can be decrypted as follows:
  2439.  
  2440. loop until done
  2441.     read a character into C
  2442.     Temp <- C ^ decrypt_byte()
  2443.     update_keys(Temp)
  2444.     output Temp
  2445. end loop
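
Putting the three decryption steps together, a hedged Python sketch might look like the following. The class and function names are hypothetical. The crc32 step uses zlib.crc32, which inverts the running value on entry and exit, so both inversions are undone to obtain the raw single-byte update. Only the 1 byte CRC check is shown.

```python
import zlib


class TraditionalDecrypter:
    """Keeps the three 32-bit keys and decrypts bytes one at a time."""

    def __init__(self, password: bytes):
        self.key0, self.key1, self.key2 = 305419896, 591751049, 878082192
        for byte in password:
            self._update_keys(byte)

    @staticmethod
    def _crc32(old_crc: int, byte: int) -> int:
        # Undo zlib's entry and exit inversion to get the raw CRC-32 update.
        return zlib.crc32(bytes([byte]), old_crc ^ 0xFFFFFFFF) ^ 0xFFFFFFFF

    def _update_keys(self, byte: int) -> None:
        self.key0 = self._crc32(self.key0, byte)
        self.key1 = (self.key1 + (self.key0 & 0xFF)) & 0xFFFFFFFF
        self.key1 = (self.key1 * 134775813 + 1) & 0xFFFFFFFF
        self.key2 = self._crc32(self.key2, self.key1 >> 24)

    def _decrypt_byte(self) -> int:
        temp = (self.key2 | 2) & 0xFFFF          # unsigned short
        return ((temp * (temp ^ 1)) >> 8) & 0xFF  # unsigned char

    def decrypt(self, data: bytes) -> bytes:
        out = bytearray()
        for c in data:
            plain = c ^ self._decrypt_byte()
            self._update_keys(plain)
            out.append(plain)
        return bytes(out)


def decrypt_member(password: bytes, encrypted: bytes, crc_high_byte: int) -> bytes:
    """Decrypt the 12-byte header, check its last byte, then decrypt the data."""
    decrypter = TraditionalDecrypter(password)
    header = decrypter.decrypt(encrypted[:12])
    if header[11] != crc_high_byte:
        raise ValueError("password check byte does not match")
    return decrypter.decrypt(encrypted[12:])
```

The result of decrypt_member() is still the compressed data stream; it must be decompressed with the method recorded for the file. A matching check byte does not prove the password is correct, since a wrong password passes the 1 byte test by chance about once in 256 attempts.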
  2446.  
  2447.  
  2448. 7.0 Strong Encryption Specification
  2449. -----------------------------------
  2450.  
  2451. 7.0.1 Portions of the Strong Encryption technology defined in this
  2452. specification are covered under patents and pending patent applications.
  2453. Refer to the section in this document entitled "Incorporating
  2454. PKWARE Proprietary Technology into Your Product" for more information.
  2455.  
  2456. 7.1 Strong Encryption Overview
  2457. ------------------------------
  2458.  
  2459. 7.1.1 Version 5.x of this specification introduced support for strong
  2460. encryption algorithms. These algorithms can be used with either
  2461. a password or an X.509v3 digital certificate to encrypt each file.
  2462. This format specification supports either password or certificate
  2463. based encryption to meet the security needs of today, to enable
  2464. interoperability between users within both PKI and non-PKI
  2465. environments, and to ensure interoperability between different
  2466. computing platforms that are running a ZIP program.
  2467.  
  2468. 7.1.2 Password based encryption is the most common form of encryption
  2469. people are familiar with. However, inherent weaknesses with
  2470. passwords (e.g. susceptibility to dictionary/brute force attack)
  2471. as well as password management and support issues make certificate
  2472. based encryption a more secure and scalable option. Industry
  2473. efforts and support are defining and moving towards more advanced
  2474. security solutions built around X.509v3 digital certificates and
  2475. Public Key Infrastructures (PKI) because of the greater scalability,
  2476. administrative options, and more robust security over traditional
  2477. password based encryption.
  2478.  
  2479. 7.1.3 Most standard encryption algorithms are supported with this
  2480. specification. Reference implementations for many of these
  2481. algorithms are available from either commercial or open source
  2482. distributors. Readily available cryptographic toolkits make
  2483. implementation of the encryption features straightforward.
  2484. This document is not intended to provide a treatise on data
  2485. encryption principles or theory. Its purpose is to document the
  2486. data structures required for implementing interoperable data
  2487. encryption within the .ZIP format. It is strongly recommended that
  2488. you have a good understanding of data encryption before reading
  2489. further.
  2490.  
  2491. 7.1.4 The algorithms introduced in Version 5.0 of this specification
  2492. include:
  2493.  
  2494. RC2 40 bit, 64 bit, and 128 bit
  2495. RC4 40 bit, 64 bit, and 128 bit
  2496. DES
  2497. 3DES 112 bit and 168 bit
  2498.  
  2499. Version 5.1 adds support for the following:
  2500.  
  2501. AES 128 bit, 192 bit, and 256 bit
  2502.  
  2503.  
  2504. 7.1.5 Version 6.1 introduces encryption data changes to support
  2505. interoperability with Smartcard and USB Token certificate storage
  2506. methods which do not support the OAEP strengthening standard.
  2507.  
  2508. 7.1.6 Version 6.2 introduces support for encrypting metadata by compressing
  2509. and encrypting the central directory data structure to reduce information
  2510. leakage. Information leakage can occur in legacy ZIP applications
  2511. through exposure of information about a file even though that file is
  2512. stored encrypted. The information exposed consists of file
  2513. characteristics stored within the records and fields defined by this
  2514. specification. This includes data such as a file's name, its original
  2515. size, timestamp and CRC32 value.
  2516.  
  2517. 7.1.7 Version 6.3 introduces support for encrypting data using the Blowfish
  2518. and Twofish algorithms. These are symmetric block ciphers developed
  2519. by Bruce Schneier. Blowfish supports using a variable length key from
  2520. 32 to 448 bits. Block size is 64 bits. Implementations should use 16
  2521. rounds and the only mode supported within ZIP files is CBC. Twofish
  2522. supports key sizes 128, 192 and 256 bits. Block size is 128 bits.
  2523. Implementations should use 16 rounds and the only mode supported within
  2524. ZIP files is CBC. Information and source code for both Blowfish and
  2525. Twofish algorithms can be found on the internet. Consult with the author
  2526. of these algorithms for information on terms or restrictions on use.
  2527.  
  2528. 7.1.8 Central Directory Encryption provides greater protection against
  2529. information leakage by encrypting the Central Directory structure and
  2530. by masking key values that are replicated in the unencrypted Local
  2531. Header. ZIP compatible programs that cannot interpret an encrypted
  2532. Central Directory structure cannot rely on the data in the corresponding
  2533. Local Header for decompression information.
  2534.  
  2535. 7.1.9 Extra Field records that may contain information about a file that should
  2536. not be exposed should not be stored in the Local Header and should only
  2537. be written to the Central Directory where they can be encrypted. This
  2538. design currently does not support streaming. Information in the End of
  2539. Central Directory record, the Zip64 End of Central Directory Locator,
  2540. and the Zip64 End of Central Directory record is not encrypted. Access
  2541. to view data on files within a ZIP file with an encrypted Central Directory
  2542. requires the appropriate password or private key for decryption prior to
  2543. viewing any files, or any information about the files, in the archive.
  2544.  
  2545. 7.1.10 Older ZIP compatible programs not familiar with the Central Directory
  2546. Encryption feature will no longer be able to recognize the Central
  2547. Directory and may assume the ZIP file is corrupt. Programs that
  2548. attempt streaming access using Local Headers will see invalid
  2549. information for each file. Central Directory Encryption need not be
  2550. used for every ZIP file. Its use is recommended for greater security.
  2551. ZIP files not using Central Directory Encryption should operate as
  2552. in the past.
  2553.  
  2554. 7.1.11 This strong encryption feature specification is intended to provide for
  2555. scalable, cross-platform encryption needs ranging from simple password
  2556. encryption to authenticated public/private key encryption.
  2557.  
  2558. 7.1.12 Encryption provides data confidentiality and privacy. It is
  2559. recommended that you combine X.509 digital signing with encryption
  2560. to add authentication and non-repudiation.
  2561.  
  2562.  
  2563. 7.2 Single Password Symmetric Encryption Method
  2564. -----------------------------------------------
  2565.  
  2566. 7.2.1 The Single Password Symmetric Encryption Method using strong
  2567. encryption algorithms operates similarly to the traditional
  2568. PKWARE encryption defined in this format. Additional data
  2569. structures are added to support the processing needs of the
  2570. strong algorithms.
  2571.  
  2572. The Strong Encryption data structures are:
  2573.  
  2574. 7.2.2 General Purpose Bits - Bits 0 and 6 of the General Purpose bit
  2575. flag in both local and central header records. Both bits set
  2576. indicates strong encryption. Bit 13, when set, indicates the Central
  2577. Directory is encrypted and that selected fields in the Local Header
  2578. are masked to hide their actual value.
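
A reader can test these bits with a few masks, as in the minimal sketch below. The function name and the dictionary keys are illustrative only.

```python
def describe_encryption(flags: int) -> dict:
    """Interpret bits 0, 6 and 13 of the general purpose bit flag."""
    encrypted = bool(flags & (1 << 0))
    strong = encrypted and bool(flags & (1 << 6))  # both bits set
    return {
        "encrypted": encrypted,
        "strong_encryption": strong,
        "central_directory_encrypted": bool(flags & (1 << 13)),
    }
```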
  2579.  
  2580.  
  2581. 7.2.3 Extra Field 0x0017 in central header only.
  2582.  
  2583. Fields to consider in this record are:
  2584.  
  2585. 7.2.3.1 Format - the data format identifier for this record. The only
  2586. value allowed at this time is the integer value 2.
  2587.  
  2588. 7.2.3.2 AlgId - integer identifier of the encryption algorithm from the
  2589. following range
  2590.  
  2591. 0x6601 - DES
  2592. 0x6602 - RC2 (version needed to extract < 5.2)
  2593. 0x6603 - 3DES 168
  2594. 0x6609 - 3DES 112
  2595. 0x660E - AES 128
  2596. 0x660F - AES 192
  2597. 0x6610 - AES 256
  2598. 0x6702 - RC2 (version needed to extract >= 5.2)
  2599. 0x6720 - Blowfish
  2600. 0x6721 - Twofish
  2601. 0x6801 - RC4
  2602. 0xFFFF - Unknown algorithm
  2603.  
  2604. 7.2.3.3 Bitlen - Explicit bit length of key
  2605.  
  2606. 32 - 448 bits
  2607.  
  2608. 7.2.3.4 Flags - Processing flags needed for decryption
  2609.  
  2610. 0x0001 - Password is required to decrypt
  2611. 0x0002 - Certificates only
  2612. 0x0003 - Password or certificate required to decrypt
  2613.  
  2614. Values > 0x0003 reserved for certificate processing
  2615.  
  2616.  
  2617. 7.2.4 Decryption header record preceding compressed file data.
  2618.  
  2619. -Decryption Header:
  2620.  
  2621. Value      Size     Description
  2622. -----      ----     -----------
  2623. IVSize     2 bytes  Size of initialization vector (IV)
  2624. IVData     IVSize   Initialization vector for this file
  2625. Size       4 bytes  Size of remaining decryption header data
  2626. Format     2 bytes  Format definition for this record
  2627. AlgID      2 bytes  Encryption algorithm identifier
  2628. Bitlen     2 bytes  Bit length of encryption key
  2629. Flags      2 bytes  Processing flags
  2630. ErdSize    2 bytes  Size of Encrypted Random Data
  2631. ErdData    ErdSize  Encrypted Random Data
  2632. Reserved1  4 bytes  Reserved certificate processing data
  2633. Reserved2  (var)    Reserved for certificate processing data
  2634. VSize      2 bytes  Size of password validation data
  2635. VData      VSize-4  Password validation data
  2636. VCRC32     4 bytes  Standard ZIP CRC32 of password validation data
  2637.  
  2638. 7.2.4.1 IVData - The size of the IV should match the algorithm block size.
  2639. The IVData can be completely random data. If the size of
  2640. the randomly generated data does not match the block size
  2641. it should be padded with zeros or truncated as
  2642. necessary. If IVSize is 0, then IV = CRC32 + Uncompressed
  2643. File Size (as a 64 bit little-endian, unsigned integer value).
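
The IV handling can be sketched as below. Fitting random data to the block size follows directly from the text. The IVSize = 0 case is an interpretation, not something this specification spells out: the sketch reads "+" as concatenation, stores the CRC32 as 4 little-endian bytes followed by the uncompressed size as 8 little-endian bytes, and then fits the result to the block size. Both function names are hypothetical.

```python
import struct


def fit_iv(random_bytes: bytes, block_size: int) -> bytes:
    """Pad with zero bytes or truncate so the IV matches the cipher block size."""
    return random_bytes[:block_size].ljust(block_size, b"\x00")


def implicit_iv(crc32: int, uncompressed_size: int, block_size: int) -> bytes:
    """Assumed reading of the IVSize = 0 rule; verify against real archives."""
    # Assumption: CRC32 (4 bytes) concatenated with the size (8 bytes), both little-endian.
    return fit_iv(struct.pack("<IQ", crc32, uncompressed_size), block_size)
```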
  2644.  
  2645. 7.2.4.2 Format - the data format identifier for this record. The only
  2646. value allowed at this time is the integer value 3.
  2647.  
  2648. 7.2.4.3 AlgId - integer identifier of the encryption algorithm from the
  2649. following range
  2650.  
  2651. 0x6601 - DES
  2652. 0x6602 - RC2 (version needed to extract < 5.2)
  2653. 0x6603 - 3DES 168
  2654. 0x6609 - 3DES 112
  2655. 0x660E - AES 128
  2656. 0x660F - AES 192
  2657. 0x6610 - AES 256
  2658. 0x6702 - RC2 (version needed to extract >= 5.2)
  2659. 0x6720 - Blowfish
  2660. 0x6721 - Twofish
  2661. 0x6801 - RC4
  2662. 0xFFFF - Unknown algorithm
  2663.  
  2664. 7.2.4.4 Bitlen - Explicit bit length of key
  2665.  
  2666. 32 - 448 bits
  2667.  
  2668. 7.2.4.5 Flags - Processing flags needed for decryption
  2669.  
  2670. 0x0001 - Password is required to decrypt
  2671. 0x0002 - Certificates only
  2672. 0x0003 - Password or certificate required to decrypt
  2673.  
  2674. Values > 0x0003 reserved for certificate processing
  2675.  
  2676. 7.2.4.6 ErdData - Encrypted random data is used to store random data that
  2677. is used to generate a file session key for encrypting
  2678. each file. SHA1 is used to calculate hash data used to
  2679. derive keys. File session keys are derived from a master
  2680. session key generated from the user-supplied password.
  2681. If the Flags field in the decryption header contains
  2682. the value 0x4000, then the ErdData field must be
  2683. decrypted using 3DES. If the value 0x4000 is not set,
  2684. then the ErdData field must be decrypted using AlgId.
  2685.  
  2686.  
  2687. 7.2.4.7 Reserved1 - Reserved for certificate processing, if value is
  2688. zero, then Reserved2 data is absent. See the explanation
  2689. under the Certificate Processing Method for details on
  2690. this data structure.
  2691.  
  2692. 7.2.4.8 Reserved2 - If present, the size of the Reserved2 data structure
  2693. is located by skipping the first 4 bytes of this field
  2694. and using the next 2 bytes as the remaining size. See
  2695. the explanation under the Certificate Processing Method
  2696. for details on this data structure.
  2697.  
  2698. 7.2.4.9 VSize - This size value will always include the 4 bytes of the
  2699. VCRC32 data and will be greater than 4 bytes.
  2700.  
  2701. 7.2.4.10 VData - Random data for password validation. This data is VSize
  2702. in length and VSize must be a multiple of the encryption
  2703. block size. VCRC32 is a checksum value of VData.
  2704. VData and VCRC32 are stored encrypted and start the
  2705. stream of encrypted data for a file.
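
A minimal sketch of walking the Decryption Header in Python follows. It assumes the fields are tightly packed in little-endian order, as elsewhere in the ZIP format, and it does not handle certificate data: a non-zero Reserved1 raises an error. The validation field returned here is the still-encrypted VData followed by VCRC32 (VSize bytes in total). All names are illustrative.

```python
import struct
from typing import NamedTuple


class DecryptionHeader(NamedTuple):
    iv: bytes
    remaining_size: int  # the Size field, returned as stored
    format: int
    alg_id: int
    bitlen: int
    flags: int
    erd: bytes
    validation: bytes    # encrypted VData + VCRC32
    length: int          # total bytes consumed from buf


def parse_decryption_header(buf: bytes) -> DecryptionHeader:
    (iv_size,) = struct.unpack_from("<H", buf, 0)
    pos = 2
    iv = buf[pos:pos + iv_size]
    pos += iv_size
    size, fmt, alg_id, bitlen, flags, erd_size = struct.unpack_from("<IHHHHH", buf, pos)
    pos += 14
    erd = buf[pos:pos + erd_size]
    pos += erd_size
    (reserved1,) = struct.unpack_from("<I", buf, pos)
    pos += 4
    if reserved1 != 0:
        raise NotImplementedError("certificate data (Reserved2) is not handled here")
    (v_size,) = struct.unpack_from("<H", buf, pos)
    pos += 2
    if v_size <= 4:
        raise ValueError("VSize must be greater than 4")
    validation = buf[pos:pos + v_size]
    pos += v_size
    return DecryptionHeader(iv, size, fmt, alg_id, bitlen, flags, erd, validation, pos)
```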
  2706.  
  2707.  
  2708. 7.2.5 Useful Tips
  2709.  
  2710. 7.2.5.1 Strong Encryption is always applied to a file after compression. The
  2711. block oriented algorithms all operate in Cipher Block Chaining (CBC)
  2712. mode. The block size used for AES encryption is 16. All other block
  2713. algorithms use a block size of 8. Two IDs are defined for RC2 to
  2714. account for a discrepancy found in the implementation of the RC2
  2715. algorithm in the cryptographic library on Windows XP SP1 and all
  2716. earlier versions of Windows. It is recommended that zero length files
  2717. not be encrypted; however, programs should be prepared to extract them
  2718. if they are found within a ZIP file.
  2719.  
  2720. 7.2.5.2 A pseudo-code representation of the encryption process is as follows:
  2721.  
  2722. Password = GetUserPassword()
  2723. MasterSessionKey = DeriveKey(SHA1(Password))
  2724. RD = CryptographicStrengthRandomData()
  2725. For Each File
  2726.     IV = CryptographicStrengthRandomData()
  2727.     VData = CryptographicStrengthRandomData()
  2728.     VCRC32 = CRC32(VData)
  2729.     FileSessionKey = DeriveKey(SHA1(IV + RD))
  2730.     ErdData = Encrypt(RD,MasterSessionKey,IV)
  2731.     Encrypt(VData + VCRC32 + FileData, FileSessionKey,IV)
  2732. Done
  2733.  
  2734. 7.2.5.3 The function names and parameter requirements will depend on
  2735. the choice of the cryptographic toolkit selected. Almost any
  2736. toolkit supporting the reference implementations for each
  2737. algorithm can be used. The RSA BSAFE(r), OpenSSL, and Microsoft
  2738. CryptoAPI libraries are all known to work well.
  2739.  
  2740.  
  2741. 7.3 Single Password - Central Directory Encryption
  2742. --------------------------------------------------
  2743.  
  2744. 7.3.1 Central Directory Encryption is achieved within the .ZIP format by
  2745. encrypting the Central Directory structure. This encapsulates the metadata
  2746. most often used for processing .ZIP files. Additional metadata is stored for
  2747. redundancy in the Local Header for each file. The process of concealing
  2748. metadata by encrypting the Central Directory does not protect the data within
  2749. the Local Header. To avoid information leakage from the exposed metadata
  2750. in the Local Header, the fields containing information about a file are masked.
  2751.  
  2752. 7.3.2 Local Header
  2753.  
  2754. Masking replaces the true content of the fields for a file in the Local
  2755. Header with false information. When masked, the Local Header is not
  2756. suitable for streaming access and the options for data recovery of damaged
  2757. archives are reduced. Extra Data fields that may contain confidential
  2758. data should not be stored within the Local Header. The value set into
  2759. the Version needed to extract field should be the correct value needed to
  2760. extract the file without regard to Central Directory Encryption. The fields
  2761. within the Local Header targeted for masking when the Central Directory is
  2762. encrypted are:
  2763.  
  2764. Field Name                  Mask Value
  2765. ------------------          ---------------------------
  2766. compression method          0
  2767. last mod file time          0
  2768. last mod file date          0
  2769. crc-32                      0
  2770. compressed size             0
  2771. uncompressed size           0
  2772. file name (variable size)   Base 16 value from the
  2773.                             range 1 - 0xFFFFFFFFFFFFFFFF
  2774.                             represented as a string whose
  2775.                             size will be set into the
  2776.                             file name length field
  2777.  
  2778. The Base 16 value assigned as a masked file name is simply a sequentially
  2779. incremented value for each file starting with 1 for the first file.
  2780. Modifications to a ZIP file may cause different values to be stored for
  2781. each file. For compatibility, the file name field in the Local Header
  2782. should never be left blank. As of Version 6.2 of this specification,
  2783. the Compression Method and Compressed Size fields are not yet masked.
  2784. Fields having a value of 0xFFFF or 0xFFFFFFFF for the ZIP64 format
  2785. should not be masked.
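
A minimal sketch of producing the masked values for one Local Header is shown below. It leaves the Compression Method and Compressed Size alone, following the note above that these are not yet masked, and it does not handle the ZIP64 exception for fields holding 0xFFFF or 0xFFFFFFFF. The use of uppercase hexadecimal digits without a prefix is an assumption, since the text only says "Base 16 value"; the function name and dictionary keys are hypothetical.

```python
def masked_local_header_fields(entry_number: int) -> dict:
    """Return masked Local Header values for the Nth file (numbering starts at 1)."""
    if not 1 <= entry_number <= 0xFFFFFFFFFFFFFFFF:
        raise ValueError("entry number out of range")
    # Assumption: uppercase hex digits, no "0x" prefix, no zero padding.
    name = format(entry_number, "X").encode("ascii")
    return {
        "last_mod_file_time": 0,
        "last_mod_file_date": 0,
        "crc32": 0,
        "uncompressed_size": 0,
        "file_name": name,
        "file_name_length": len(name),
    }
```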
  2786.  
  2787. 7.3.3 Encrypting the Central Directory
  2788.  
  2789. Encryption of the Central Directory does not include encryption of the
  2790. Central Directory Signature data, the Zip64 End of Central Directory
  2791. record, the Zip64 End of Central Directory Locator, or the End
  2792. of Central Directory record. The ZIP file comment data is never
  2793. encrypted.
  2794.  
  2795. Before encrypting the Central Directory, it may optionally be compressed.
  2796. Compression is not required, but for storage efficiency it is assumed
  2797. this structure will be compressed before encrypting. Similarly, this
  2798. specification supports compressing the Central Directory without
  2799. requiring that it also be encrypted. Early implementations of this
  2800. feature will assume the encryption method applied to files matches the
  2801. encryption applied to the Central Directory.
  2802.  
  2803. Encryption of the Central Directory is done in a manner similar to
  2804. that of file encryption. The encrypted data is preceded by a
  2805. decryption header. The decryption header is known as the Archive
  2806. Decryption Header. The fields of this record are identical to
  2807. the decryption header preceding each encrypted file. The location
  2808. of the Archive Decryption Header is determined by the value in the
  2809. Start of the Central Directory field in the Zip64 End of Central
  2810. Directory record. When the Central Directory is encrypted, the
  2811. Zip64 End of Central Directory record will always be present.
  2812.  
  2813. The layout of the Zip64 End of Central Directory record for all
  2814. versions starting with 6.2 of this specification will follow the
  2815. Version 2 format. The Version 2 format is as follows:
  2816.  
  2817. The leading fixed size fields within the Version 1 format for this
  2818. record remain unchanged. The record signature for both Version 1
  2819. and Version 2 will be 0x06064b50. Immediately following the last
  2820. byte of the field known as the Offset of Start of Central
  2821. Directory With Respect to the Starting Disk Number will begin the
  2822. new fields defining Version 2 of this record.
  2823.  
  2824. 7.3.4 New fields for Version 2
  2825.  
  2826. Note: all fields stored in Intel low-byte/high-byte order.
  2827.  
  2828. Value               Size        Description
  2829. -----               ----        -----------
  2830. Compression Method  2 bytes     Method used to compress the
  2831.                                 Central Directory
  2832. Compressed Size     8 bytes     Size of the compressed data
  2833. Original Size       8 bytes     Original uncompressed size
  2834. AlgId               2 bytes     Encryption algorithm ID
  2835. BitLen              2 bytes     Encryption key length
  2836. Flags               2 bytes     Encryption flags
  2837. HashID              2 bytes     Hash algorithm identifier
  2838. Hash Length         2 bytes     Length of hash data
  2839. Hash Data           (variable)  Hash data
  2840.  
  2841. The Compression Method accepts the same range of values as the
  2842. corresponding field in the Central Header.
  2843.  
  2844. The Compressed Size and Original Size values will not include the
  2845. data of the Central Directory Signature which is compressed or
  2846. encrypted.
  2847.  
  2848. The AlgId, BitLen, and Flags fields accept the same range of values
  2849. as the corresponding fields within the 0x0017 record.
  2850.  
  2851. Hash ID identifies the algorithm used to hash the Central Directory
  2852. data. This data does not have to be hashed, in which case the
  2853. values for both the HashID and Hash Length will be 0. Possible
  2854. values for HashID are:
  2855.  
  2856. Value Algorithm
  2857. ------ ---------
  2858. 0x0000 none
  2859. 0x0001 CRC32
  2860. 0x8003 MD5
  2861. 0x8004 SHA1
  2862. 0x8007 RIPEMD160
  2863. 0x800C SHA256
  2864. 0x800D SHA384
  2865. 0x800E SHA512
  2866.  
  2867. 7.3.5 When the Central Directory data is signed, the same hash algorithm
  2868. used to hash the Central Directory for signing should be used.
  2869. This is recommended for processing efficiency; however, it is
  2870. permissible for any of the above algorithms to be used independent
  2871. of the signing process.
  2872.  
  2873. The Hash Data will contain the hash data for the Central Directory.
  2874. The length of this data will vary depending on the algorithm used.
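The Version 2 field layout above can be read with a short routine. The following Python sketch is illustrative only and is not part of the format definition. The function name parse_v2_fields and the returned dictionary keys are hypothetical, and the caller is assumed to have already located the first byte after the Offset of Start of Central Directory field.

```python
import struct

# Fixed-size Version 2 fields in little-endian ("Intel") byte order:
# Compression Method, Compressed Size, Original Size, AlgId, BitLen,
# Flags, HashID, Hash Length.
V2_FIXED = struct.Struct("<HQQHHHHH")

# HashID values taken from the table in 7.3.4.
HASH_IDS = {
    0x0000: "none",
    0x0001: "CRC32",
    0x8003: "MD5",
    0x8004: "SHA1",
    0x8007: "RIPEMD160",
    0x800C: "SHA256",
    0x800D: "SHA384",
    0x800E: "SHA512",
}


def parse_v2_fields(buf, offset=0):
    """Parse the Version 2 fields that start at buf[offset] (hypothetical helper)."""
    if len(buf) - offset < V2_FIXED.size:
        raise ValueError("buffer too short for the fixed Version 2 fields")
    (method, compressed_size, original_size, alg_id, bit_len, flags,
     hash_id, hash_length) = V2_FIXED.unpack_from(buf, offset)
    start = offset + V2_FIXED.size
    # Hash Data is variable length; its size is given by Hash Length.
    hash_data = bytes(buf[start:start + hash_length])
    if len(hash_data) != hash_length:
        raise ValueError("buffer too short for Hash Data")
    return {
        "compression_method": method,
        "compressed_size": compressed_size,
        "original_size": original_size,
        "alg_id": alg_id,
        "bit_len": bit_len,
        "flags": flags,
        "hash_id": hash_id,
        "hash_name": HASH_IDS.get(hash_id, "unknown"),
        "hash_data": hash_data,
    }
```

When the Central Directory is not hashed, both HashID and Hash Length are 0, so this sketch returns an empty hash_data value.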
  2875.  
  2876. The Version Needed to Extract should be set to 62.
  2877.  
  2878. The value for the Total Number of Entries on the Current Disk will
  2879. be 0. These records will no longer support random access when
  2880. encrypting the Central Directory.
  2881.  
  2882. 7.3.6 When the Central Directory is compressed and/or encrypted, the
  2883. End of Central Directory record will store the value 0xFFFFFFFF
  2884. as the value for the Total Number of Entries in the Central
  2885. Directory. The value stored in the Total Number of Entries in
  2886. the Central Directory on this Disk field will be 0. The actual
  2887. values will be stored in the equivalent fields of the Zip64
  2888. End of Central Directory record.
  2889.  
  2890. 7.3.7 Decrypting and decompressing the Central Directory is accomplished
  2891. in the same manner as decrypting and decompressing a file.
  2892.  
  2893. 7.4 Certificate Processing Method
  2894. ---------------------------------
  2895.  
  2896. The Certificate Processing Method for ZIP file encryption
  2897. defines the following additional data fields:
  2898.  
  2899. 7.4.1 Certificate Flag Values
  2900.  
  2901. Additional processing flags that can be present in the Flags field of both
  2902. the 0x0017 field of the central directory Extra Field and the Decryption
  2903. header record preceding compressed file data are:
  2904.  
  2905. 0x0007 - reserved for future use
  2906. 0x000F - reserved for future use
  2907. 0x0100 - Indicates non-OAEP key wrapping was used. If this
  2908. field is set, the version needed to extract must
  2909. be at least 61. This means OAEP key wrapping is not
  2910. used when generating a Master Session Key using
  2911. ErdData.
  2912. 0x4000 - ErdData must be decrypted using 3DES-168, otherwise use the
  2913. same algorithm used for encrypting the file contents.
  2914. 0x8000 - reserved for future use
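A reader might test the two defined flag bits as sketched below. This is a minimal illustration, not a required implementation. The constant names and the function describe_certificate_flags are hypothetical, and the reserved values are ignored.

```python
FLAG_NON_OAEP = 0x0100      # non-OAEP key wrapping was used
FLAG_ERD_3DES_168 = 0x4000  # ErdData must be decrypted using 3DES-168


def describe_certificate_flags(flags, version_needed):
    """Report the certificate flags set in a Flags field (hypothetical helper)."""
    non_oaep = bool(flags & FLAG_NON_OAEP)
    # The 0x0100 flag requires a version needed to extract of at least 61.
    if non_oaep and version_needed < 61:
        raise ValueError("non-OAEP flag requires version needed to extract >= 61")
    return {
        "non_oaep_key_wrapping": non_oaep,
        # When this is False, ErdData uses the same algorithm as the file contents.
        "erd_uses_3des_168": bool(flags & FLAG_ERD_3DES_168),
    }
```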
  2915.  
  2916.  
  2917. 7.4.2 CertData - Extra Field 0x0017 record certificate data structure
  2918.  
  2919. The data structure used to store certificate data within the section
  2920. of the Extra Field defined by the CertData field of the 0x0017
  2921. record is as shown:
  2922.  
  2923. Value     Size     Description
  2924. -----     ----     -----------
  2925. RCount    4 bytes  Number of recipients.
  2926. HashAlg   2 bytes  Hash algorithm identifier
  2927. HSize     2 bytes  Hash size
  2928. SRList    (var)    Simple list of recipients' hashed public keys
  2929.  
  2930.  
  2931. RCount This defines the number of intended recipients whose
  2932. public keys were used for encryption. This identifies
  2933. the number of elements in the SRList.
  2934.  
  2935. HashAlg This defines the hash algorithm used to calculate
  2936. the public key hash of each public key used
  2937. for encryption. This field currently supports
  2938. only the following value for SHA-1
  2939.  
  2940. 0x8004 - SHA1
  2941.  
  2942. HSize This defines the size of a hashed public key.
  2943.  
  2944. SRList This is a variable length list of the hashed
  2945. public keys for each intended recipient. Each
  2946. element in this list is HSize. The total size of
  2947. SRList is determined using RCount * HSize.
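The CertData layout can be walked as in the following Python sketch. It assumes the fields are stored in little-endian order like the other records in this specification. The function name parse_certdata is hypothetical.

```python
import struct

# RCount (4 bytes), HashAlg (2 bytes), HSize (2 bytes); little-endian assumed.
CERTDATA_HEADER = struct.Struct("<IHH")


def parse_certdata(buf):
    """Split CertData into its header values and the SRList entries."""
    if len(buf) < CERTDATA_HEADER.size:
        raise ValueError("buffer too short for CertData header")
    rcount, hash_alg, hsize = CERTDATA_HEADER.unpack_from(buf, 0)
    # The total size of SRList is RCount * HSize.
    total = rcount * hsize
    body = bytes(buf[CERTDATA_HEADER.size:CERTDATA_HEADER.size + total])
    if len(body) != total:
        raise ValueError("buffer too short for SRList")
    sr_list = [body[i * hsize:(i + 1) * hsize] for i in range(rcount)]
    return rcount, hash_alg, hsize, sr_list
```

For SHA-1 (HashAlg 0x8004) each SRList element would be 20 bytes, so two recipients occupy 40 bytes after the 8-byte header.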
  2948.  
  2949.  
  2950. 7.4.3 Reserved1 - Certificate Decryption Header Reserved1 Data
  2951.  
  2952. Value     Size     Description
  2953. -----     ----     -----------
  2954. RCount    4 bytes  Number of recipients.
  2955.  
  2956. RCount This defines the number of intended recipients whose
  2957. public keys were used for encryption. This defines
  2958. the number of elements in the REList field defined below.
  2959.  
  2960.  
  2961. 7.4.4 Reserved2 - Certificate Decryption Header Reserved2 Data Structures
  2962.  
  2963.  
  2964. Value     Size     Description
  2965. -----     ----     -----------
  2966. HashAlg   2 bytes  Hash algorithm identifier
  2967. HSize     2 bytes  Hash size
  2968. REList    (var)    List of recipient data elements
  2969.  
  2970.  
  2971. HashAlg This defines the hash algorithm used to calculate
  2972. the public key hash of each public key used
  2973. for encryption. This field currently supports
  2974. only the following value for SHA-1
  2975.  
  2976. 0x8004 - SHA1
  2977.  
  2978. HSize This defines the size of a hashed public key
  2979. defined in REHData.
  2980.  
  2981. REList This is a variable length list of recipient data.
  2982. Each element in this list consists of a Recipient
  2983. Element data structure as follows:
  2984.  
  2985.  
  2986. Recipient Element (REList) Data Structure:
  2987.  
  2988. Value     Size     Description
  2989. -----     ----     -----------
  2990. RESize    2 bytes  Size of REHData + REKData
  2991. REHData   HSize    Hash of recipient's public key
  2992. REKData   (var)    Simple key blob
  2993.  
  2994.  
  2995. RESize This defines the size of an individual REList
  2996. element. This value is the combined size of the
  2997. REHData field + REKData field. REHData is defined by
  2998. HSize. REKData is variable and can be calculated
  2999. for each REList element using RESize and HSize.
  3000.  
  3001. REHData Hashed public key for this recipient.
  3002.  
  3003. REKData Simple Key Blob. The format of this data structure
  3004. is identical to that defined in the Microsoft
  3005. CryptoAPI and generated using the CryptExportKey()
  3006. function. The version of the Simple Key Blob
  3007. supported at this time is 0x02 as defined by
  3008. Microsoft.
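Since REKData has no explicit length field, its size is derived from RESize and HSize. The following Python sketch shows one way to walk the Reserved2 data under that rule. It assumes little-endian storage and that RCount has already been read from the Reserved1 data. The function name parse_reserved2 is hypothetical, and the Simple Key Blob is returned as opaque bytes.

```python
import struct


def parse_reserved2(buf, rcount):
    """Return (HashAlg, HSize, [(REHData, REKData), ...]) from Reserved2 data."""
    if len(buf) < 4:
        raise ValueError("buffer too short for HashAlg and HSize")
    hash_alg, hsize = struct.unpack_from("<HH", buf, 0)
    pos = 4
    recipients = []
    for _ in range(rcount):
        if pos + 2 > len(buf):
            raise ValueError("buffer too short for RESize")
        (resize,) = struct.unpack_from("<H", buf, pos)
        pos += 2
        # RESize covers REHData (HSize bytes) plus REKData (the remainder).
        if resize < hsize or pos + resize > len(buf):
            raise ValueError("invalid RESize")
        rehdata = bytes(buf[pos:pos + hsize])
        rekdata = bytes(buf[pos + hsize:pos + resize])
        recipients.append((rehdata, rekdata))
        pos += resize
    return hash_alg, hsize, recipients
```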
  3009.  
  3010. 7.5 Certificate Processing - Central Directory Encryption
  3011. ---------------------------------------------------------
  3012.  
  3013. 7.5.1 Central Directory Encryption using Digital Certificates will
  3014. operate in a manner similar to that of Single Password Central
  3015. Directory Encryption. This record will only be present when there
  3016. is data to place into it. Currently, data is placed into this
  3017. record when digital certificates are used for either encrypting
  3018. or signing the files within a ZIP file. When only password
  3019. encryption is used with no certificate encryption or digital
  3020. signing, this record is not currently needed. When present, this
  3021. record will appear before the start of the actual Central Directory
  3022. data structure and will be located immediately after the Archive
  3023. Decryption Header if the Central Directory is encrypted.
  3024.  
  3025. 7.5.2 The Archive Extra Data record will be used to store the following
  3026. information. Additional data may be added in future versions.
  3027.  
  3028. Extra Data Fields:
  3029.  
  3030. 0x0014 - PKCS#7 Store for X.509 Certificates
  3031. 0x0016 - X.509 Certificate ID and Signature for central directory
  3032. 0x0019 - PKCS#7 Encryption Recipient Certificate List
  3033.  
  3034. The 0x0014 and 0x0016 Extra Data records would otherwise be
  3035. located in the first record of the Central Directory for digital
  3036. certificate processing. When encrypting or compressing the Central
  3037. Directory, the 0x0014 and 0x0016 records must be located in the
  3038. Archive Extra Data record and they should not remain in the first
  3039. Central Directory record. The Archive Extra Data record will also
  3040. be used to store the 0x0019 data.
  3041.  
  3042. 7.5.3 When present, the size of the Archive Extra Data record will be
  3043. included in the size of the Central Directory. The data of the
  3044. Archive Extra Data record will also be compressed and encrypted
  3045. along with the Central Directory data structure.
  3046.  
  3047. 7.6 Certificate Processing Differences
  3048. --------------------------------------
  3049.  
  3050. 7.6.1 The Certificate Processing Method of encryption differs from the
  3051. Single Password Symmetric Encryption Method as follows. Instead
  3052. of using a user-defined password to generate a master session key,
  3053. cryptographically random data is used. The key material is then
  3054. wrapped using standard key-wrapping techniques. This key material
  3055. is wrapped using the public key of each recipient that will need
  3056. to decrypt the file using their corresponding private key.
  3057.  
  3058. 7.6.2 This specification currently assumes digital certificates will follow
  3059. the X.509 V3 format for 1024 bit and higher RSA format digital
  3060. certificates. Implementation of this Certificate Processing Method
  3061. requires supporting logic for key access and management. This logic
  3062. is outside the scope of this specification.
  3063.  
  3064. 7.7 OAEP Processing with Certificate-based Encryption
  3065. -----------------------------------------------------
  3066.  
  3067. 7.7.1 OAEP stands for Optimal Asymmetric Encryption Padding. It is a
  3068. strengthening technique used for small encoded items such as decryption
  3069. keys. This is commonly applied in cryptographic key-wrapping techniques
  3070. and is supported by PKCS #1. Versions 5.0 and 6.0 of this specification
  3071. were designed to support OAEP key-wrapping for certificate-based
  3072. decryption keys for additional security.
  3073.  
  3074. 7.7.2 Support for private keys stored on Smartcards or Tokens introduced
  3075. a conflict with this OAEP logic. Most card and token products do
  3076. not support the additional strengthening applied to OAEP key-wrapped
  3077. data. In order to resolve this conflict, versions 6.1 and above of this
  3078. specification will no longer support OAEP when encrypting using
  3079. digital certificates.
  3080.  
  3081. 7.7.3 Versions of PKZIP available during initial development of the
  3082. certificate processing method set a value of 61 into the
  3083. version needed to extract field for a file. This indicates that
  3084. non-OAEP key wrapping is used. This affects certificate encryption
  3085. only, and password encryption functions should not be affected by
  3086. this value. This means values of 61 may be found on files encrypted
  3087. with certificates only, or on files encrypted with both password
  3088. encryption and certificate encryption. Files encrypted with both
  3089. methods can safely be decrypted using the password methods documented.
  3090.  
  3091. 8.0 Splitting and Spanning ZIP files
  3092. -------------------------------------
  3093.  
  3094. 8.1 Spanned ZIP files
  3095.  
  3096. 8.1.1 Spanning is the process of segmenting a ZIP file across
  3097. multiple removable media. This support has typically only
  3098. been provided for DOS formatted floppy diskettes.
  3099.  
  3100. 8.2 Split ZIP files
  3101.  
  3102. 8.2.1 File splitting is a newer derivation of spanning.
  3103. Splitting follows the same segmentation process as
  3104. spanning, however, it does not require writing each
  3105. segment to a unique removable medium and instead supports
  3106. placing all pieces onto local or non-removable locations
  3107. such as file systems, local drives, folders, etc.
  3108.  
  3109. 8.3 File Naming Differences
  3110.  
  3111. 8.3.1 A key difference between spanned and split ZIP files is
  3112. that all pieces of a spanned ZIP file have the same name.
  3113. Since each piece is written to a separate volume, no name
  3114. collisions occur and each segment can reuse the original
  3115. .ZIP file name given to the archive.
  3116.  
  3117. 8.3.2 Sequence ordering for DOS spanned archives uses the DOS
  3118. volume label to determine segment numbers. Volume labels
  3119. for each segment are written using the form PKBACK#xxx,
  3120. where xxx is the segment number written as a decimal
  3121. value from 001 - nnn.
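As an illustration, the volume label form above could be produced and read back as follows. The function names are hypothetical. The sketch assumes a minimum of three decimal digits, as the 001 - nnn range suggests, and tolerates whitespace around the digits when parsing.

```python
LABEL_PREFIX = "PKBACK#"


def spanned_volume_label(segment_number):
    """Build the volume label for a spanned segment (numbering starts at 1)."""
    if segment_number < 1:
        raise ValueError("segment numbers start at 1")
    return f"{LABEL_PREFIX}{segment_number:03d}"


def parse_spanned_volume_label(label):
    """Return the segment number encoded in a PKBACK# volume label."""
    if not label.startswith(LABEL_PREFIX):
        raise ValueError("not a PKBACK# volume label")
    digits = label[len(LABEL_PREFIX):].strip()
    if not (digits.isascii() and digits.isdigit()):
        raise ValueError("segment number is not decimal")
    return int(digits)
```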
  3122.  
  3123. 8.3.3 Split ZIP files are typically written to the same location
  3124. and are subject to name collisions if the spanned name
  3125. format is used since each segment will reside on the same
  3126. drive. To avoid name collisions, split archives are named
  3127. as follows.
  3128.  
  3129. Segment 1 = filename.z01
  3130. Segment n-1 = filename.z(n-1)
  3131. Segment n = filename.zip
  3132.  
  3133. 8.3.4 The .ZIP extension is used on the last segment to support
  3134. quickly reading the central directory. The segment number
  3135. n should be a decimal value.
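The naming rule for split archives can be sketched as follows. The function name split_segment_names is hypothetical, and the two-digit minimum width is an assumption inferred from the .z01 example above.

```python
def split_segment_names(base, segment_count):
    """Return the segment file names of a split archive, in segment order."""
    if segment_count < 1:
        raise ValueError("a split archive has at least one segment")
    # Segments 1 .. n-1 use the .zNN form with a decimal segment number.
    names = [f"{base}.z{i:02d}" for i in range(1, segment_count)]
    # The last segment keeps the .zip extension so the central directory
    # can be found quickly.
    names.append(f"{base}.zip")
    return names
```

For example, split_segment_names("backup", 3) yields backup.z01, backup.z02, and backup.zip.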
  3136.  
  3137. 8.4 Spanned Self-extracting ZIP Files
  3138.  
  3139. 8.4.1 Spanned ZIP files may be PKSFX Self-extracting ZIP files.
  3140. PKSFX files may also be split, however, in this case
  3141. the first segment must be named filename.exe. The first
  3142. segment of a split PKSFX archive must be large enough to
  3143. include the entire executable program.
  3144.  
  3145. 8.5 Capacities and Markers
  3146.  
  3147. 8.5.1 Capacities for split archives are as follows:
  3148.  
  3149. Maximum number of segments = 4,294,967,295 - 1
  3150. Maximum .ZIP segment size = 4,294,967,295 bytes
  3151. Minimum segment size = 64K
  3152. Maximum PKSFX segment size = 2,147,483,647 bytes
  3153.  
  3154. 8.5.2 Segment sizes may differ; however, by convention, all
  3155. segment sizes should be the same with the exception of the
  3156. last, which may be smaller. Local and central directory
  3157. header records must never be split across a segment boundary.
  3158. When writing a header record, if the number of bytes remaining
  3159. within a segment is less than the size of the header record,
  3160. end the current segment and write the header at the start
  3161. of the next segment. The central directory may span segment
  3162. boundaries, but no single record in the central directory
  3163. should be split across segments.
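The boundary rule for header records can be expressed as a small placement check. This Python sketch is illustrative only. The function name place_header and its parameters are hypothetical, and a real writer would also have to handle file data, which may cross segment boundaries.

```python
def place_header(segment_index, bytes_used, segment_size, header_size):
    """Return (segment_index, offset) at which a header record should start."""
    if header_size > segment_size:
        raise ValueError("header record is larger than a segment")
    remaining = segment_size - bytes_used
    if remaining < header_size:
        # Not enough room left: end the current segment and write the
        # header at the start of the next one.
        return segment_index + 1, 0
    return segment_index, bytes_used
```

With 64K segments, a 46-byte header requested when only 36 bytes remain is moved to offset 0 of the next segment.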
  3164.  
  3165. 8.5.3 Spanned/Split archives created using PKZIP for Windows
  3166. (V2.50 or greater), PKZIP Command Line (V2.50 or greater),
  3167. or PKZIP Explorer will include a special spanning
  3168. signature as the first 4 bytes of the first segment of
  3169. the archive. This signature (0x08074b50) will be
  3170. followed immediately by the local header signature for
  3171. the first file in the archive.
  3172.  
  3173. 8.5.4 A special spanning marker may also appear in spanned/split
  3174. archives if the spanning or splitting process starts but
  3175. only requires one segment. In this case the 0x08074b50
  3176. signature will be replaced with the temporary spanning
  3177. marker signature of 0x30304b50. Split archives can
  3178. only be uncompressed by other versions of PKZIP that
  3179. know how to create a split archive.
  3180.  
  3181. 8.5.5 The signature value 0x08074b50 is also used by some
  3182. ZIP implementations as a marker for the Data Descriptor
  3183. record. Conflict in this alternate assignment can be
  3184. avoided by checking the position of the signature
  3185. within the ZIP file to determine the use for which it
  3186. is intended.
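One way to apply the position rule is to inspect only the first 4 bytes of the first segment. In the following Python sketch the function name classify_leading_signature is hypothetical. The local file header signature 0x04034b50 is defined elsewhere in this specification. The same 0x08074b50 value found after file data would instead indicate a Data Descriptor record.

```python
import struct

SPANNING_SIGNATURE = 0x08074B50      # also used as a Data Descriptor marker
TEMP_SPANNING_MARKER = 0x30304B50    # only one segment was required
LOCAL_HEADER_SIGNATURE = 0x04034B50


def classify_leading_signature(first_segment_start):
    """Classify the first 4 bytes of the first segment of an archive."""
    if len(first_segment_start) < 4:
        raise ValueError("need at least 4 bytes")
    (signature,) = struct.unpack_from("<I", first_segment_start, 0)
    if signature == SPANNING_SIGNATURE:
        # At offset 0 of the first segment this is the spanning signature.
        return "spanning signature"
    if signature == TEMP_SPANNING_MARKER:
        return "temporary spanning marker"
    if signature == LOCAL_HEADER_SIGNATURE:
        return "local file header"
    return "unknown"
```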
  3187.  
  3188. 9.0 Change Process
  3189. ------------------
  3190.  
  3191. 9.1 In order for the .ZIP file format to remain a viable technology, this
  3192. specification should be considered as open for periodic review and
  3193. revision. Although this format was originally designed with a
  3194. certain level of extensibility, not all changes in technology
  3195. (present or future) were or will be necessarily considered in its
  3196. design.
  3197.  
  3198. 9.2 If your application requires new definitions to the
  3199. extensible sections in this format, or if you would like to
  3200. submit new data structures or new capabilities, please forward
  3201. your request to zipformat@pkware.com. All submissions will be
  3202. reviewed by the ZIP File Specification Committee for possible
  3203. inclusion into future versions of this specification.
  3204.  
  3205. 9.3 Periodic revisions to this specification will be published as
  3206. DRAFT or as FINAL status to ensure interoperability. We encourage
  3207. comments and feedback that may help improve clarity or content.
  3208.  
  3209.  
  3210. 10.0 Incorporating PKWARE Proprietary Technology into Your Product
  3211. ------------------------------------------------------------------
  3212.  
  3213. 10.1 The Use or Implementation in a product of APPNOTE technological
  3214. components pertaining to either strong encryption or patching requires
  3215. a separate, executed license agreement from PKWARE. Please contact
  3216. PKWARE at zipformat@pkware.com or +1-414-289-9788 with regard to
  3217. acquiring such a license.
  3218.  
  3219. 10.2 Additional information regarding PKWARE proprietary technology is
  3220. available at http://www.pkware.com/appnote.
  3221.  
  3222. 11.0 Acknowledgements
  3223. ---------------------
  3224.  
  3225. In addition to the above mentioned contributors to PKZIP and PKUNZIP,
  3226. PKWARE would like to extend special thanks to Robert Mahoney for
  3227. suggesting the extension .ZIP for this software.
  3228.  
  3229. 12.0 References
  3230. ---------------
  3231.  
  3232. Fiala, Edward R., and Greene, Daniel H., "Data compression with
  3233. finite windows", Communications of the ACM, Volume 32, Number 4,
  3234. April 1989, pages 490-505.
  3235.  
  3236. Held, Gilbert, "Data Compression, Techniques and Applications,
  3237. Hardware and Software Considerations", John Wiley & Sons, 1987.
  3238.  
  3239. Huffman, D.A., "A method for the construction of minimum-redundancy
  3240. codes", Proceedings of the IRE, Volume 40, Number 9, September 1952,
  3241. pages 1098-1101.
  3242.  
  3243. Nelson, Mark, "LZW Data Compression", Dr. Dobb's Journal, Volume 14,
  3244. Number 10, October 1989, pages 29-37.
  3245.  
  3246. Nelson, Mark, "The Data Compression Book", M&T Books, 1991.
  3247.  
  3248. Storer, James A., "Data Compression, Methods and Theory",
  3249. Computer Science Press, 1988.
  3250.  
  3251. Welch, Terry, "A Technique for High-Performance Data Compression",
  3252. IEEE Computer, Volume 17, Number 6, June 1984, pages 8-19.
  3253.  
  3254. Ziv, J. and Lempel, A., "A universal algorithm for sequential data
  3255. compression", IEEE Transactions on Information Theory,
  3256. Volume 23, Number 3, May 1977, pages 337-343.
  3257.  
  3258. Ziv, J. and Lempel, A., "Compression of individual sequences via
  3259. variable-rate coding", IEEE Transactions on Information Theory,
  3260. Volume 24, Number 5, September 1978, pages 530-536.
  3261.  
  3262.  
  3263. APPENDIX A - AS/400 Extra Field (0x0065) Attribute Definitions
  3264. --------------------------------------------------------------
  3265.  
  3266. A.1 Field Definition Structure:
  3267.  
  3268. a. field length including length 2 bytes
  3269. b. field code 2 bytes
  3270. c. data x bytes
  3271.  
  3272. A.2 Field Code Description
  3273.  
  3274. 4001 Source type i.e. CLP etc
  3275. 4002 The text description of the library
  3276. 4003 The text description of the file
  3277. 4004 The text description of the member
  3278. 4005 x'F0' or 0 is PF-DTA, x'F1' or 1 is PF_SRC
  3279. 4007 Database Type Code 1 byte
  3280. 4008 Database file and fields definition
  3281. 4009 GZIP file type 2 bytes
  3282. 400B IFS code page 2 bytes
  3283. 400C IFS Creation Time 4 bytes
  3284. 400D IFS Access Time 4 bytes
  3285. 400E IFS Modification time 4 bytes
  3286. 005C Length of the records in the file 2 bytes
  3287. 0068 GZIP two words 8 bytes
  3288.  
  3289. APPENDIX B - z/OS Extra Field (0x0065) Attribute Definitions
  3290. ------------------------------------------------------------
  3291.  
  3292. B.1 Field Definition Structure:
  3293.  
  3294. a. field length including length 2 bytes
  3295. b. field code 2 bytes
  3296. c. data x bytes
  3297.  
  3298. B.2 Field Code Description
  3299.  
  3300. 0001 File Type 2 bytes
  3301. 0002 NonVSAM Record Format 1 byte
  3302. 0003 Reserved
  3303. 0004 NonVSAM Block Size 2 bytes Big Endian
  3304. 0005 Primary Space Allocation 3 bytes Big Endian
  3305. 0006 Secondary Space Allocation 3 bytes Big Endian
  3306. 0007 Space Allocation Type 1 byte flag
  3307. 0008 Modification Date Retired with PKZIP 5.0 +
  3308. 0009 Expiration Date Retired with PKZIP 5.0 +
  3309. 000A PDS Directory Block Allocation 3 bytes Big Endian binary value
  3310. 000B NonVSAM Volume List variable
  3311. 000C UNIT Reference Retired with PKZIP 5.0 +
  3312. 000D DF/SMS Management Class 8 bytes EBCDIC Text Value
  3313. 000E DF/SMS Storage Class 8 bytes EBCDIC Text Value
  3314. 000F DF/SMS Data Class 8 bytes EBCDIC Text Value
  3315. 0010 PDS/PDSE Member Info. 30 bytes
  3316. 0011 VSAM sub-filetype 2 bytes
  3317. 0012 VSAM LRECL 13 bytes EBCDIC "(num_avg num_max)"
  3318. 0013 VSAM Cluster Name Retired with PKZIP 5.0 +
  3319. 0014 VSAM KSDS Key Information 13 bytes EBCDIC "(num_length num_position)"
  3320. 0015 VSAM Average LRECL 5 bytes EBCDIC num_value padded with blanks
  3321. 0016 VSAM Maximum LRECL 5 bytes EBCDIC num_value padded with blanks
  3322. 0017 VSAM KSDS Key Length 5 bytes EBCDIC num_value padded with blanks
  3323. 0018 VSAM KSDS Key Position 5 bytes EBCDIC num_value padded with blanks
  3324. 0019 VSAM Data Name 1-44 bytes EBCDIC text string
  3325. 001A VSAM KSDS Index Name 1-44 bytes EBCDIC text string
  3326. 001B VSAM Catalog Name 1-44 bytes EBCDIC text string
  3327. 001C VSAM Data Space Type 9 bytes EBCDIC text string
  3328. 001D VSAM Data Space Primary 9 bytes EBCDIC num_value left-justified
  3329. 001E VSAM Data Space Secondary 9 bytes EBCDIC num_value left-justified
  3330. 001F VSAM Data Volume List variable EBCDIC text list of 6-character Volume IDs
  3331. 0020 VSAM Data Buffer Space 8 bytes EBCDIC num_value left-justified
  3332. 0021 VSAM Data CISIZE 5 bytes EBCDIC num_value left-justified
  3333. 0022 VSAM Erase Flag 1 byte flag
  3334. 0023 VSAM Free CI % 3 bytes EBCDIC num_value left-justified
  3335. 0024 VSAM Free CA % 3 bytes EBCDIC num_value left-justified
  3336. 0025 VSAM Index Volume List variable EBCDIC text list of 6-character Volume IDs
  3337. 0026 VSAM Ordered Flag 1 byte flag
  3338. 0027 VSAM REUSE Flag 1 byte flag
  3339. 0028 VSAM SPANNED Flag 1 byte flag
  3340. 0029 VSAM Recovery Flag 1 byte flag
  3341. 002A VSAM WRITECHK Flag 1 byte flag
  3342. 002B VSAM Cluster/Data SHROPTS 3 bytes EBCDIC "n,y"
  3343. 002C VSAM Index SHROPTS 3 bytes EBCDIC "n,y"
  3344. 002D VSAM Index Space Type 9 bytes EBCDIC text string
  3345. 002E VSAM Index Space Primary 9 bytes EBCDIC num_value left-justified
  3346. 002F VSAM Index Space Secondary 9 bytes EBCDIC num_value left-justified
  3347. 0030 VSAM Index CISIZE 5 bytes EBCDIC num_value left-justified
  3348. 0031 VSAM Index IMBED 1 byte flag
  3349. 0032 VSAM Index Ordered Flag 1 byte flag
  3350. 0033 VSAM REPLICATE Flag 1 byte flag
  3351. 0034 VSAM Index REUSE Flag 1 byte flag
  3352. 0035 VSAM Index WRITECHK Flag 1 byte flag Retired with PKZIP 5.0 +
  3353. 0036 VSAM Owner 8 bytes EBCDIC text string
  3354. 0037 VSAM Index Owner 8 bytes EBCDIC text string
  3355. 0038 Reserved
  3356. 0039 Reserved
  3357. 003A Reserved
  3358. 003B Reserved
  3359. 003C Reserved
  3360. 003D Reserved
  3361. 003E Reserved
  3362. 003F Reserved
  3363. 0040 Reserved
  3364. 0041 Reserved
  3365. 0042 Reserved
  3366. 0043 Reserved
  3367. 0044 Reserved
  3368. 0045 Reserved
  3369. 0046 Reserved
  3370. 0047 Reserved
  3371. 0048 Reserved
  3372. 0049 Reserved
  3373. 004A Reserved
  3374. 004B Reserved
  3375. 004C Reserved
  3376. 004D Reserved
  3377. 004E Reserved
  3378. 004F Reserved
  3379. 0050 Reserved
  3380. 0051 Reserved
  3381. 0052 Reserved
  3382. 0053 Reserved
  3383. 0054 Reserved
  3384. 0055 Reserved
  3385. 0056 Reserved
  3386. 0057 Reserved
  3387. 0058 PDS/PDSE Member TTR Info. 6 bytes Big Endian
  3388. 0059 PDS 1st LMOD Text TTR 3 bytes Big Endian
  3389. 005A PDS LMOD EP Rec # 4 bytes Big Endian
  3390. 005B Reserved
  3391. 005C Max Length of records 2 bytes Big Endian
  3392. 005D PDSE Flag 1 byte flag
  3393. 005E Reserved
  3394. 005F Reserved
  3395. 0060 Reserved
  3396. 0061 Reserved
  3397. 0062 Reserved
  3398. 0063 Reserved
  3399. 0064 Reserved
  3400. 0065 Last Date Referenced 4 bytes Packed Hex "yyyymmdd"
  3401. 0066 Date Created 4 bytes Packed Hex "yyyymmdd"
  3402. 0068 GZIP two words 8 bytes
  3403. 0071 Extended NOTE Location 12 bytes Big Endian
  3404. 0072 Archive device UNIT 6 bytes EBCDIC
  3405. 0073 Archive 1st Volume 6 bytes EBCDIC
  3406. 0074 Archive 1st VOL File Seq# 2 bytes Binary
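The date fields 0065 and 0066 are described as 4 bytes of Packed Hex "yyyymmdd". The following Python sketch assumes this means each half-byte holds one decimal digit, so the bytes 20 14 10 01 represent October 1, 2014. That reading is an assumption, and the function name decode_packed_date is hypothetical.

```python
import datetime


def decode_packed_date(field):
    """Decode a 4-byte packed "yyyymmdd" value into a date."""
    if len(field) != 4:
        raise ValueError("packed date must be exactly 4 bytes")
    digits = bytes(field).hex()  # one character per half-byte
    if not digits.isdigit():
        raise ValueError("packed date contains a non-decimal digit")
    year, month, day = int(digits[:4]), int(digits[4:6]), int(digits[6:8])
    # datetime.date raises ValueError for impossible dates such as month 13.
    return datetime.date(year, month, day)
```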
  3407.  
  3408. APPENDIX C - Zip64 Extensible Data Sector Mappings
  3409. ---------------------------------------------------
  3410.  
  3411. -Z390 Extra Field:
  3412.  
  3413. The following is the general layout of the attributes for the
  3414. ZIP 64 "extra" block for extended tape operations.
  3415.  
  3416. Note: some fields stored in Big Endian format. All text is
  3417. in EBCDIC format unless otherwise specified.
  3418.  
  3419. Value          Size     Description
  3420. -----          ----     -----------
  3421. (Z390) 0x0065  2 bytes  Tag for this "extra" block type
  3422. Size           4 bytes  Size for the following data block
  3423. Tag            4 bytes  EBCDIC "Z390"
  3424. Length71       2 bytes  Big Endian
  3425. Subcode71      2 bytes  Enote type code
  3426. FMEPos         1 byte
  3427. Length72       2 bytes  Big Endian
  3428. Subcode72      2 bytes  Unit type code
  3429. Unit           1 byte   Unit
  3430. Length73       2 bytes  Big Endian
  3431. Subcode73      2 bytes  Volume1 type code
  3432. FirstVol       1 byte   Volume
  3433. Length74       2 bytes  Big Endian
  3434. Subcode74      2 bytes  FirstVol file sequence
  3435. FileSeq        2 bytes  Sequence
  3436.  
  3437. APPENDIX D - Language Encoding (EFS)
  3438. ------------------------------------
  3439.  
  3440. D.1 The ZIP format has historically supported only the original IBM PC character
  3441. encoding set, commonly referred to as IBM Code Page 437. This limits storing
  3442. file name characters to only those within the original MS-DOS range of values
  3443. and does not properly support file names in other character encodings, or
  3444. languages. To address this limitation, this specification will support the
  3445. following change.
  3446.  
  3447. D.2 If general purpose bit 11 is unset, the file name and comment should conform
  3448. to the original ZIP character encoding. If general purpose bit 11 is set, the
  3449. filename and comment must support The Unicode Standard, Version 4.1.0 or
  3450. greater using the character encoding form defined by the UTF-8 storage
  3451. specification. The Unicode Standard is published by The Unicode
  3452. Consortium (www.unicode.org). UTF-8 encoded data stored within ZIP files
  3453. is expected to not include a byte order mark (BOM).
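A reader can select the decoding for the file name and comment fields from general purpose bit 11, as in this minimal Python sketch. The function name decode_zip_text is hypothetical. Python's built-in "cp437" codec is used for the original IBM Code Page 437 encoding.

```python
UTF8_FLAG = 0x0800  # general purpose bit 11


def decode_zip_text(raw, general_purpose_flags):
    """Decode a file name or comment field according to bit 11."""
    if general_purpose_flags & UTF8_FLAG:
        # Bit 11 set: UTF-8, expected to be stored without a byte order mark.
        return raw.decode("utf-8")
    # Bit 11 unset: original ZIP character encoding.
    return raw.decode("cp437")
```

The byte 0x82 decodes to a lowercase e with an acute accent under Code Page 437, while the same character in UTF-8 is stored as the two bytes 0xC3 0xA9.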
  3454.  
  3455. D.3 Applications may choose to supplement this file name storage through the use
  3456. of the 0x0008 Extra Field. Storage for this optional field is currently
  3457. undefined, however it will be used to allow storing extended information
  3458. on source or target encoding that may further assist applications with file
  3459. name, or file content encoding tasks. Please contact PKWARE with any
  3460. requirements on how this field should be used.
  3461.  
  3462. D.4 The 0x0008 Extra Field storage may be used with either setting for general
  3463. purpose bit 11. Examples of the intended usage for this field are to store
  3464. whether "modified-UTF-8" (JAVA) is used, or UTF-8-MAC. Similarly, other
  3465. commonly used character encoding (code page) designations can be indicated
  3466. through this field. Formalized values for use of the 0x0008 record remain
  3467. undefined at this time. The definition for the layout of the 0x0008 field
  3468. will be published when available. Use of the 0x0008 Extra Field provides
  3469. for storing data within a ZIP file in an encoding other than IBM Code
  3470. Page 437 or UTF-8.
  3471.  
  3472. D.5 General purpose bit 11 will not imply any encoding of file content or
  3473. password. Values defining character encoding for file content or
  3474. password must be stored within the 0x0008 Extended Language Encoding
  3475. Extra Field.
  3476.  
  3477. D.6 Ed Gordon of the Info-ZIP group has defined a pair of "extra field" records
  3478. that can be used to store UTF-8 file name and file comment fields. These
  3479. records can be used for cases when the general purpose bit 11 method
  3480. for storing UTF-8 data in the standard file name and comment fields is
  3481. not desirable. A common case for this alternate method is if backward
  3482. compatibility with older programs is required.
  3483.  
  3484. D.7 Definitions for the record structure of these fields are included above
  3485. in the section on 3rd party mappings for "extra field" records. These
  3486. records are identified by Header ID's 0x6375 (Info-ZIP Unicode Comment
  3487. Extra Field) and 0x7075 (Info-ZIP Unicode Path Extra Field).
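The following Python sketch shows how a reader might recover a UTF-8 file name from the 0x7075 record. The layout used here (a 1-byte Version equal to 1, a 4-byte NameCRC32 of the standard file name field, then the UTF-8 name) is recalled from the 3rd party mappings section referenced above and should be checked against that section. The function name unicode_path_from_extra is hypothetical.

```python
import struct
import zlib

UNICODE_PATH_ID = 0x7075  # Info-ZIP Unicode Path Extra Field


def unicode_path_from_extra(extra, header_name):
    """Return the UTF-8 path stored in the extra field, or None if unusable."""
    pos = 0
    while pos + 4 <= len(extra):
        header_id, size = struct.unpack_from("<HH", extra, pos)
        data = bytes(extra[pos + 4:pos + 4 + size])
        pos += 4 + size
        if header_id != UNICODE_PATH_ID or len(data) != size or size < 5:
            continue
        version, name_crc = struct.unpack_from("<BI", data, 0)
        # Ignore the record if the standard name changed after it was written.
        if version != 1 or name_crc != (zlib.crc32(header_name) & 0xFFFFFFFF):
            continue
        return data[5:].decode("utf-8")
    return None
```

A reader that finds no usable 0x7075 record would fall back to the standard file name field and general purpose bit 11.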
  3488.  
  3489. D.8 The choice of which storage method to use when writing a ZIP file is left
  3490. to the implementation. Developers should expect that a ZIP file may
  3491. contain either method and should provide support for reading data in
  3492. either format. Use of general purpose bit 11 reduces storage requirements
  3493. for file name data by not requiring additional "extra field" data for
  3494. each file, but can result in older ZIP programs not being able to extract
  3495. files. Use of the 0x6375 and 0x7075 records will result in a ZIP file
  3496. that should always be readable by older ZIP programs, but requires more
  3497. storage per file to write file name and/or file comment fields.