Untitled

a guest
Feb 25th, 2017
  1. background on 2011 data analysis capabilities of IC, GCHQ TOP SECRET Strap 1 COMINT
  2. semi-supervised learning (unsupervised, deep learning appears to be in future) - active learning, weak labels
  3. random forest classifiers starting from base decision trees
  4. 'truthing' - judgments, computer based (without HITs)
  5. missing value imputation - Missing Not At Random (MNAR)
  6. positive only classifiers (learn payphones without negative examples)
  7. semi-synthetic data (e.g. steganography), natural error
  8. hadoop
  9. distillery (a streams architecture, IBM SPL - streams processing language)
  10. graph processing - SANDIA Questia
  11. SIGINT background (so, training for audience)
  12. Intelligence cycle - collection, processing, analysis, reporting, target development
  13. computer network operations (CNO) - attack, exploit, defend, counter - skills, tradecraft, mindset
  14. collection: special source collection (network intercept) - CNE, CNA, CND, C-CNE
  15. content databases (relational), QFDs, DISTILLERY, 'the cloud' (MAP REDUCE)
  16. single internet link is a 'bearer' (as 10 G bearer)
  17. metadata and content
  18. target discovery: contact chaining from 'seed selector' (HUMINT supplied), 2-out neighbourhood
  19. behaviour matching an MO on a large target dataset
  20. closed loop detection (buy special phones and only communicate with each other)
  21. DNI - passive SIGINT: ... CNO - attack and defend - exploit the network - address intelligence needs - DNI
  22. novel* Hui-Walter method of independent tests
  23. novel* CLASP scores in arrival process correlation
  24. examples of classification: logo recog, spam detect, protocol class, stego detect, genre class (2-17), website class (4 classes), payphone class
  25. arrival process correlation
  26. novel* GeoFusion, Akamai dataset
  27. LLNL - Latent Class Models (LCA)
  28. novel* Treebeard is a technique for interpreting random forest output
  29. interpreting information flow in graph cascades - information flows
  30.  
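The 2-out neighbourhood in item 18 is every node within two hops of the seed. A minimal sketch as a breadth-first search, assuming the graph fits in memory as an adjacency dict; the function name `k_out_neighbourhood` and the toy node labels are illustrative and not from the source.

```python
from collections import deque


def k_out_neighbourhood(adjacency, seed, k=2):
    """Return {node: hop distance} for every node within k hops of seed."""
    distance = {seed: 0}
    queue = deque([seed])
    while queue:
        node = queue.popleft()
        # Do not expand nodes that already sit on the k-hop boundary.
        if distance[node] == k:
            continue
        for neighbour in adjacency.get(node, ()):
            if neighbour not in distance:
                distance[neighbour] = distance[node] + 1
                queue.append(neighbour)
    return distance


# Toy undirected graph (hypothetical labels): a - b - d - e, plus a - c.
toy_graph = {
    "a": ["b", "c"],
    "b": ["a", "d"],
    "c": ["a"],
    "d": ["b", "e"],
    "e": ["d"],
}
two_out = k_out_neighbourhood(toy_graph, "a", k=2)
```

Node "e" is three hops from "a", so it falls outside the 2-out neighbourhood.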
  31. sources: E, I, W (external, internal, website)
  32. temporal correlation between stochastic processes
  33. cross media: A calls B to start IM convo
  34. novel* NHPP (non-homogeneous Poisson processes)
  35. model the rate function
  36. novel* PCG statistic used
  37. streaming implementation of PCG algorithm
  38. Enron, SKB datasets were used
  39. SKB used for circulation of extremist media
  40. ROC curves used for algorithm evaluation
  41. novel* forest fire algorithm
  42. novel* Bollobás-Janson-Riordan family of graphs
  43. EDA on streams (pulling out global properties of data, broad-brush visualisations)
  44. stream: not enough memory to hold it all, only get to see data once - situational awareness, real time tipping
  45. novel* streaming algorithms: The 2009 Information Processing SCAMP at La Jolla
  46. EDA on streams: graph problems with no subsetting allowed, visualisation, modelling and outlier detection, profiling and correlation
  47. novel* event data is a hypergraph
  48. collocation graphs, network graphs, semantic graphs (concepts)
  49. novel* cliques, motifs, trusses are looked at (related to cores and cliques)
  50. closed loop detection
  51. CLASP https://pdfs.semanticscholar.org/9966/a1e8a67e1534a4b0377e23a303e43eeed13d.pdf
  52. CROUCHING SQUIRREL bot net detection
  53. DISCOVER archives for learning
  54. EPR end product report
  55. CPC, RPC, OPC are processing centres for SIGINT
  56. LI, PRESTON lawful intercept
  57. QFD query focused dataset
  58. NGE next generation events (filtered output post PPF/MVR and TERRAIN)
  59. BROAD OAK GCHQ's targeting database - has selectors
  60. PPF packet processing framework
  61. MVR massive volume reduction
  62. ECIs their details cover network data collection
  63. COI, CHORDAL
  64. TERRAIN takes PPF/MVR output and does sessionalization
  65. FKB Flexible-survey KB - 15 minute probes on each bearer, to map what is collectible
  66. BOTNET, C&C, C2, DDOS
  67. PRIME TIME Detica-developed streaming analysis -> p-value for waiting time for transactions, exponential model
  68. novel* HIDDEN OTTER temporal chains in communication data
  69. -> back haul, TOR networks, botnets
  70. reinvention of remit chains algorithm, for Hadoop
  71. BAKER's DOZEN near phone numbers that display causal behaviour
  72. improves on CLASP
  73. external: hashtags on twitter, spread of disease through contacts
  74. passing of files between implanted machines
  75. information cascades and diffusion
  76. block modelling
  77. LID hierarchical information store
  78. SAGA item similarity to set similarity
  79. SALTY OTTER missing edges in graph
  80. FIVE ALIVE large dump available
  81. data on covert infrastructure for exfiltrating data from CNE implants
  82. suspected Conficker infections (based on signatures collected, behavioural analysis)
  83. BIRCH graph algorithm (data clustering)
  84. CHART BREAKER vertex scores may calculate centrality around a target; is there concept drift? 'relationship scores' from email hypergraph
  85. GRINNING ROACH visualisation tool for SIGINT events
  86. PIRATE CAREBEAR visualisation tool for SIGINT events
  87. TDI target detection identifiers
  88. AUTO ASSOC same user or machine (think, anon id and machine id, ANID, MUID)
  89. INSTINCT data mining for counter-terror
  90. KACHINA with Sandia labs - collaboration point
  91. HRMAP HTTP request map
  92. SQUEAL ALERTS Squeal is a signature-based system for detecting electronic attacks; SQUEAL HITS
  93. SUN STORM, GOLD MINE, HAGER AWEL, woody and buzz are specific hadoop clusters
  94.  
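Item 40 says ROC curves were used for algorithm evaluation. A minimal sketch of how such a curve and its area can be computed from classifier scores and binary truth labels; the function names and the toy scores are assumptions for illustration, and nothing here reproduces the evaluation in the source.

```python
def roc_points(scores, labels):
    """Sweep a threshold from high to low; return (FPR, TPR) points."""
    positives = sum(labels)
    negatives = len(labels) - positives
    ranked = sorted(zip(scores, labels), key=lambda pair: -pair[0])
    points = [(0.0, 0.0)]
    true_pos = false_pos = 0
    i = 0
    while i < len(ranked):
        threshold = ranked[i][0]
        # Items with tied scores cross the threshold together.
        while i < len(ranked) and ranked[i][0] == threshold:
            if ranked[i][1]:
                true_pos += 1
            else:
                false_pos += 1
            i += 1
        points.append((false_pos / negatives, true_pos / positives))
    return points


def area_under_curve(points):
    """Trapezoidal area under a list of (x, y) points sorted by x."""
    return sum(
        (x1 - x0) * (y0 + y1) / 2
        for (x0, y0), (x1, y1) in zip(points, points[1:])
    )


# Toy data: higher score means "more likely positive".
toy_scores = [0.9, 0.8, 0.7, 0.6, 0.55, 0.4]
toy_labels = [1, 1, 0, 1, 0, 0]
curve = roc_points(toy_scores, toy_labels)
auc = area_under_curve(curve)
```

On the toy data 8 of the 9 positive/negative pairs are ranked correctly, so the area is 8/9.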
  95. novel* Flajolet et al.’s HyperLogLog sketches
  96. counting triangles (3-cliques)
  97. probabilistic counting of k-cliques
  98. hierarchical clustering
  99. novel* clique percolation with multiple labels per node
  100. finding graph invariants in a stream
  101. centrality measures around a target
  102. centrality or betweenness in a stream
  103. graph distance distribution and how it varies with the pizza threshold (bears on hop distance for contact chaining)
  104. some work comparing SIGINT and billing records
  105. dashboarding for attack events (2012 olympics)
  106. time series modelling - bundle R in DISTILLERY
  107. novel* outlier - LLNL Gaussian mixture model, particle filters, identify newness
  108. This question has also been posed to the 2011 NSASAG
  109. pull finite chunk from stream for change detection metrics or offline analysis (how to choose window size?) - variance blows up with too little data; too much risks concept drift
  110. ANOVA across window sizes?
  111. can we correlate 'business profiles' of nodes to detect situational awareness of a DDOS
  112. find behaviour that matches a model (MO fishing)
  113. novel* CPD analysis and tensor decomp -> extract multivariate factors as rank 1 tensors
  114. relates to link prediction (missing data)
  115. e.g., target disposes of phone and gets another - can we find?
  116. e.g. 'box participating in DDOS, or beaconing a C2 server'
  117. streaming botnet sampling by adaptive sampling
  118. giant component -> remaining components should be examined
  119. 6.2.4 centrality measures
  120. eigenvector centrality measures (Google PageRank)
  121. personalized PageRank
  122. novel* KL-relative PageRank
  123.  
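Item 96 mentions counting triangles (3-cliques). A minimal exact, in-memory sketch using neighbour-set intersection; the notes are about streaming and probabilistic counting, which this does not attempt, and the function name and toy edge list are illustrative assumptions.

```python
def count_triangles(edges):
    """Count triangles in an undirected simple graph given as (u, v) pairs."""
    adjacency = {}
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        adjacency.setdefault(u, set()).add(v)
        adjacency.setdefault(v, set()).add(u)

    total = 0
    for u, neighbours in adjacency.items():
        for v in neighbours:
            if u < v:
                # Count each triangle once by requiring u < v < w.
                total += sum(1 for w in neighbours & adjacency[v] if w > v)
    return total


# Two triangles sharing node 3: {1, 2, 3} and {3, 4, 5}.
toy_edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (3, 5)]
triangle_count = count_triangles(toy_edges)
```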
  124.  
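Items 120-121 mention PageRank and its personalized variant. A minimal power-iteration sketch of personalized PageRank, where the random walk restarts at a single seed node; the damping value 0.85, the fixed iteration count, and the toy graph are assumptions, and the KL-relative variant in item 122 is not covered.

```python
def personalized_pagerank(adjacency, seed, damping=0.85, iterations=100):
    """Power iteration with all restart mass placed on `seed`.

    `adjacency` maps each node to its out-neighbours; every neighbour
    must also appear as a key.
    """
    rank = {node: (1.0 if node == seed else 0.0) for node in adjacency}
    for _ in range(iterations):
        updated = {node: 0.0 for node in adjacency}
        for node, out_links in adjacency.items():
            if out_links:
                share = damping * rank[node] / len(out_links)
                for target in out_links:
                    updated[target] += share
            else:
                # Dangling node: send its walking mass back to the seed.
                updated[seed] += damping * rank[node]
        # Restart step: total mass is 1, so (1 - damping) returns to the seed.
        updated[seed] += 1.0 - damping
        rank = updated
    return rank


# Toy directed graph (hypothetical labels); "d" has no in-links.
toy_links = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["a"]}
ranks = personalized_pagerank(toy_links, seed="a")
```

Nodes the walk cannot reach from the seed, such as "d" here, keep a score of zero, which is what distinguishes the personalized score from global PageRank.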
  125. SILVER LINING work package within TDB provides hadoop
  126. SILVER LIBRARY utilities for map reduce
  127. VALHALLA a windows desktop work environment
  128. MOUNT MCKINLEY linux cluster 652 nodes
  129. SEPANG another cluster
  130. scp used to transfer data from cluster to analysis location
  131.  
  132.  
  133. SALAMANCA TOP SECRET STRAP2 CHORDAL
  134. it is a DB for storing call data
  135. goes to SUNSTORM cloud and BHDIST DISTILLERY and others
  136. between country calls better, but some in country pretty good coverage
  137. data sets in folder by date, with unique event id
  138. FIVE ALIVE is a QFD name refers to 5-tuple (IPs, ports, protocol); flow size and direction may be present
  139. SKB signature knowledge database
  140. format: date time src_IP dst_IP frag_# IP_ID len protocol_# src_port dst_port
  141. seq_# ack_# file_offset file_type file_signature src_geo dst_geo
  142. BLACKHOLE SKB in QFD and this; blacktools used to extract
  143. 1 week available; if more is needed, use blacktools
  144. arrival process format: name, number of events, space separated times; scored <name1> <name2>
  145. novel* detailed desc of event analysis using CLASP
  146.  
  147. SOLID INK telephony events from 2007 (billing records)
  148. FLUID INK as seen from GCHQ
  149. format for both: timestamp user 1 user 2 number
  150.  
  151. OPEN SOURCE: ENRON, US Flights (1987-2008), Wikipedia graph
  152. TOP SECRET STRAP2 UKEO: select websites by Radicalism and Extremism
  153. target dictionary (telephony and C2C) is delivered to DISTILLERY once a day
  154.  
  155. covert infrastructure
  156. GCHQ has knowledge of, and collection from, CNE accesses owned by foreign intelligence agencies. This is done without their permission and is known as fourth party collection
  157. LUCKY STRIKE a database
  158. SPIKY ROCK old dataset and source code
  159.  
  160. novel* akamai edgescape data set used
  161.  
  162. IP data sets for better location
  163. INJUNCTION
  164. PSYCHIC SALMON
  165. RAGING BULLFROG
  166. ROBOTIC FISH
  167. TIMID TOAD
  168.  
  169. CARBON COPY
  170. CASK situational awareness for olympics
  171. GRINNING ROACH
  172. MAMBA
  173. WHITERAVEN