- background on 2011 data analysis capabilities of IC, GCHQ; TOP SECRET STRAP1 COMINT
- semi-supervised learning (unsupervised, deep learning appears to be in future) - active learning, weak labels
- random forest classifiers starting from base decision trees
- 'truthing' - judgments, computer based (without HITs)
- missing value imputation - Missing Not At Random (MNAR)
- positive only classifiers (learn payphones without negative examples)
- semi-synthetic data (e.g. steganography), natural error
- hadoop
- distillery (a streams architecture, IBM SPL - streams processing language)
- graph processing - SANDIA Questia
- SIGINT background (so, training for audience)
- Intelligence cycle - collection, processing, analysis, reporting, target development
- computer network operations (CNO) - attack, exploit, defend, counter - skills, tradecraft, mindset
- collection: special source collection (network intercept) - CNE, CNA, CND, C-CNE
- content databases (relational), QFDs, DISTILLERY, 'the cloud' (MAP REDUCE)
- single internet link is a 'bearer' (as 10 G bearer)
- metadata and content
- target discovery: contact chaining from 'seed selector' (HUMINT supplied), 2-out neighbourhood
- behaviour matching an MO on a large target dataset
- closed loop detection (buy special phones and only communicate with each other)
- DNI - passive SIGINT: ... CNO - attack and defend - exploit the network - address intelligence needs - DNI
- novel* Hui-Walter method of independent tests
- novel* CLASP scores in arrival process correlation
- examples of classification: logo recog, spam detect, protocol class, stego detect, genre class (2-17), website class (4 classes), payphone class
- arrival process correlation
- novel* GeoFusion, Akamai dataset
- LLNL - Latent Class Models (LCA)
- novel* Treebeard is a technique for interpreting random forest output
- interpreting information flow in graph cascades - information flows
- sources: E, I, W (external, internal, website)
- temporal correlation between stochastic processes
- cross media: A calls B to start IM convo
- novel* NHPP (non-homogeneous Poisson processes)
- model the rate function
- novel* PCG statistic used
- streaming implementation of PCG algorithm
- Enron, SKB datasets were used
- SKB used for circulation of extremist media
- ROC curves used for algorithm evaluation
- novel* forest fire algorithm
- novel* Bollobás-Janson-Riordan family of graphs
- EDA on streams (pulling out global properties of data, broad-brush visualisations)
- stream: not enough memory to hold it all, only get to see data once - situational awareness, real time tipping
- novel* streaming algorithms: The 2009 Information Processing SCAMP at La Jolla
- EDA on streams: graph problems with no subsetting allowed, visualisation, modelling and outlier detection, profiling and correlation
- novel* event data is a hypergraph
- collocation graphs, network graphs, semantic graphs (concepts)
- novel* cliques, motifs, trusses are looked at (related to cores and cliques)
- closed loop detection
- CLASP https://pdfs.semanticscholar.org/9966/a1e8a67e1534a4b0377e23a303e43eeed13d.pdf
- CROUCHING SQUIRREL bot net detection
- DISCOVER archives for learning
- EPR end product report
- CPC, RPC, OPC are processing centres for SIGINT
- LI, PRESTON lawful intercept
- QFD query focused dataset
- NGE next generation events (filtered output post PPF/MVR and TERRAIN)
- BROAD OAK GCHQ's targeting database - has selectors
- PPF packet processing framework
- MVR massive volume reduction
- ECIs their details cover network data collection
- COI, CHORDAL
- TERRAIN takes PPF/MVR output and does sessionalization
- FKB Flexible-survey KB - 15 minute probes on each bearer, to map what is collectible
- BOTNET, C&C, C2, DDOS
- PRIME TIME Detica-developed streaming analysis -> p-value for waiting time for transactions, exponential model
- novel* HIDDEN OTTER temporal chains in communication data
- -> back haul, TOR networks, botnets
- reinvention of remit chains algorithm, for Hadoop
- BAKER's DOZEN near phone numbers that display causal behaviour
- improves on CLASP
- external: hashtags on twitter, spread of disease through contacts
- passing of files between implanted machines
- information cascades and diffusion
- block modelling
- LID hierarchical information store
- SAGA item similarity to set similarity
- SALTY OTTER missing edges in graph
- FIVE ALIVE large dump available
- data on covert infrastructure for exfiltrating data from CNE implants
- suspected Conficker infections (based on signatures collected, behavioural analysis)
- BIRCH graph algorithm (data clustering)
- CHART BREAKER vertex scores may calculate centrality around a target; is there concept drift? 'relationship scores' from email hypergraph
- GRINNING ROACH visualisation tool for SIGINT events
- PIRATE CAREBEAR visualisation tool for SIGINT events
- TDI target detection identifiers
- AUTO ASSOC same user or machine (think, anon id and machine id, ANID, MUID)
- INSTINCT data mining for counter-terror
- KACHINA with Sandia labs - collaboration point
- HRMAP HTTP request map
- SQUEAL ALERTS Squeal is a signature-based system for detecting electronic attacks; SQUEAL HITS
- SUN STORM, GOLD MINE, HAGER AWEL, woody and buzz are specific hadoop clusters
- novel* Flajolet et al.’s HyperLogLog sketches
- counting triangles (3-cliques)
- probabilistic counting of k-cliques
- hierarchical clustering
- novel* clique percolation with multiple labels per node
- finding graph invariants in a stream
- centrality measures around a target
- centrality or between-ness in a stream
- graph distance distribution and how it varies with the pizza threshold (bears on hop distance for contact chaining)
- some work comparing SIGINT and billing records
- dashboarding for attack events (2012 olympics)
- time series modelling - bundle R in DISTILLERY
- novel* outlier - LLNL gaussian mixture model, particle filters, identify newness
- This question has also been posed to the 2011 NSASAG
- pull finite chunk from stream for change detection metrics or offline analysis (how to choose window size?) - variance blows up with too little data; too much risks concept drift
- ANOVA across window sizes?
- can we correlate 'business profiles' of nodes to detect situational awareness of a DDOS
- find behaviour that matches a model (MO fishing)
- novel* CPD analysis and tensor decomp -> extract multivariate factors as rank 1 tensors
- relates to link prediction (missing data)
- e.g., target disposes of phone and gets another - can we find?
- e.g. 'box participating in DDOS, or beaconing a C2 server'
- streaming botnet sampling by adaptive sampling
- giant component -> remaining components should be examined
- 6.2.4 centrality measures
- eigenvector centrality measures (Google PageRank)
- personalized PageRank
- novel* KL-relative PageRank
- SILVER LINING work package within TDB provides hadoop
- SILVER LIBRARY utilities for map reduce
- VALHALLA a windows desktop work environment
- MOUNT MCKINLEY linux cluster 652 nodes
- SEPANG another cluster
- scp used to transfer data from cluster to analysis location
- SALAMANCA TOP SECRET STRAP2 CHORDAL
- it is a DB for storing call data
- goes to SUNSTORM cloud and BHDIST DISTILLERY and others
- between country calls better, but some in country pretty good coverage
- data sets in folder by date, with unique event id
- FIVE ALIVE is a QFD; the name refers to the 5-tuple (IPs, ports, protocol); flow size and direction may be present
- SKB signature knowledge database
- format: date time src_IP dst_IP frag_# IP_ID len protocol_# src_port dst_port
- seq_# ack_# file_offset file_type file_signature src_geo dst_geo
- BLACKHOLE SKB in QFD and this; blacktools used to extract
- 1 week available; for more, use blacktools
- arrival process format: name, number of events, space separated times; scored <name1> <name2>
- novel* detailed desc of event analysis using CLASP
- SOLID INK telephony events from 2007 (billing records)
- FLUID INK as seen from GCHQ
- format for both: timestamp user 1 user 2 number
- OPEN SOURCE: ENRON, US Flights (1987-2008), Wikipedia graph
- TOP SECRET STRAP2 UKEO: select websites by Radicalism and Extremism
- target dictionary (telephony and C2C) is delivered to DISTILLERY once a day
- covert infrastructure
- GCHQ has knowledge of, and collection from, CNE accesses owned by foreign intelligence agencies. This is done without their permission and is known as fourth party collection
- LUCKY STRIKE a database
- SPIKY ROCK old dataset and source code
- novel* akamai edgescape data set used
- IP data sets for better location
- INJUNCTION
- PSYCHIC SALMON
- RAGING BULLFROG
- ROBOTIC FISH
- TIMID TOAD
- CARBON COPY
- CASK situational awareness for olympics
- GRINNING ROACH
- MAMBA
- WHITERAVEN
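
One item in these notes, Flajolet et al.'s HyperLogLog sketches, lends itself to a short illustration. The following is a minimal sketch in Python and is not taken from the source material: the class name HyperLogLog, the parameter p, and the choice of SHA-1 as the hash function are all assumptions made for this example. It estimates the number of distinct items seen in a stream in a single pass using a fixed number of small registers, which is the property that matters when the data cannot all be held in memory.

```python
import hashlib
import math


class HyperLogLog:
    """Minimal HyperLogLog cardinality sketch (illustrative, hypothetical names)."""

    def __init__(self, p=10):
        self.p = p                      # number of hash bits used as register index
        self.m = 1 << p                 # number of registers
        self.registers = [0] * self.m
        # Bias-correction constant from Flajolet et al., valid for m >= 128.
        self.alpha = 0.7213 / (1 + 1.079 / self.m)

    def add(self, item):
        # 64-bit hash taken from SHA-1 (an arbitrary choice for this sketch).
        digest = hashlib.sha1(str(item).encode()).digest()
        h = int.from_bytes(digest[:8], "big")
        width = 64 - self.p
        idx = h >> width                    # top p bits select a register
        rest = h & ((1 << width) - 1)       # remaining bits
        # Rank = position of the leftmost 1-bit in the remaining bits
        # (width + 1 if they are all zero).
        rank = width - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def estimate(self):
        # Normalised harmonic mean of 2^register over all registers.
        raw = self.alpha * self.m ** 2 / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * self.m and zeros:
            # Small-range correction: fall back to linear counting.
            return self.m * math.log(self.m / zeros)
        return raw


if __name__ == "__main__":
    sketch = HyperLogLog(p=10)
    for i in range(10000):
        sketch.add(f"id-{i}")
        sketch.add(f"id-{i}")   # duplicates do not change the estimate
    print(round(sketch.estimate()))
```

With p=10 the sketch keeps 1024 registers, and the standard error given by Flajolet et al. is roughly 1.04/sqrt(1024), about 3%. Raising p trades memory for accuracy.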