Diaphora 3.0 checkpoint at end of March 2023

MISC: Increased Diaphora version to 3.0.0.
API: Add support for the first 'to be exposed' function, used to get the callgraph percent difference.
CORE: Added CodeCut support to find anonymous compilation units.
CORE: Added IDAMagicStrings to try to find compilation unit names.
CORE: Added support to find and export Compilation Units.
CORE: Coalesce contiguous named compilation units using the minimum and maximum address.
CORE: Do not directly add matches to choosers, instead, work with internal Python dict objects and process them when the diffing session is done.
CORE: More refactoring to properly support multimatches and find the best matches.
CORE: Set a name to compilation units when enough matches indicate the name of the compilation unit using IDAMagicStrings.
DATABASE: Added proper indices and fine-tuning of SQL heuristics and queries.
DATABASE: Moved tables and indices definitions to a different file.
DIFF: Support for handling multiple matches by showing them in a different chooser.
DOC: Documented all functions and members in diaphora.py.
EXPORT: Add the `func_id` field to the `instructions` table.
EXPORT: Also consider data references to functions from functions as code references.
GUI: Add support for Python logging facilities.
GUI: Added environment variable DIAPHORA_LOG_PRINT to print to stdout instead of using Python logging facilities.
GUI: Enable, by default, slow heuristics only for databases of ~1,000 functions at most.
GUI: Renamed 'Experimental' to 'Enable Speed Ups', as the old 'experimental' heuristics are either upgraded to 'normal' or removed.
HEUR: Add a filter for a minimum of 3 instructions for heuristic 'Same address, nodes, edges and mnemonics'.
HEUR: Add support for speed ups (internally called 'dirty heuristics') when symbols-stripped matching or patch diffing is detected.
HEUR: Added a default ORDER BY clause to order by compilation unit when there is a named compilation unit.
HEUR: Added a minimum ratio of 0.35 for heuristic 'Pseudo-code fuzzy AST hash'.
HEUR: Added a minimum ratio of 0.5 for heuristic 'Pseudo-code fuzzy (normal)'.
HEUR: Added heuristic 'Same rare basic block mnemonics list'.
HEUR: Added heuristic 'Local affinity' to find matches in function gaps.
HEUR: Added heuristic 'Same anonymous compilation unit function match'.
HEUR: Added heuristic 'Same compilation unit'.
HEUR: Added heuristic 'Same named compilation unit function match'.
HEUR: Added heuristic type HEUR_TYPE_RATIO_MAX_TRUSTED. Results with a bad similarity ratio are assigned to the 'Partial' tab regardless of the calculated ratio.
HEUR: Added self-explanatory new heuristic 'Same rare assembly instruction'.
HEUR: Added support to find matches by diffing the assembly and pseudo-code of previously known good matches.
HEUR: All heuristics now select the same fields by calling `diaphora_heuristics.get_query_fields()` to retrieve the fields.
HEUR: Allow the heuristic 'Same rare constant' to match functions with at least 3 basic blocks.
HEUR: Always consider functions matching by name the best match, regardless of the ratio that another match might produce.
HEUR: Changed heuristic 'Same nodes, edges and strongly connected components' to 'Same nodes, edges, loops and strongly connected components'. Now loops are also considered for matching.
HEUR: Changed heuristic 'Similar pseudo-code and names' to only consider results with a similarity ratio higher than 0.579.
HEUR: Consider matches only for symbol names that have at least 4 characters for heuristic 'Callee found finding matches'.
HEUR: Consider the first match for heuristic 'Local affinity' the best match.
HEUR: First proper working version (hopefully) of the support for multimatches.
HEUR: Increased the number of decimal places used for comparison ratios to 7.
HEUR: Increased the queries timeout to 5 minutes.
HEUR: Marked heuristic 'Same rare constant' as slow.
HEUR: Moved heuristic 'Brute force' to the unreliable category.
HEUR: Moved heuristic 'Same graph' to the unreliable category.
HEUR: Moved heuristic 'Nodes, edges, complexity and mnemonics with small differences' to the slow ones.
HEUR: Moved the 'Experimental' heuristics to the 'Partial' category.
HEUR: Order by address the functions for heuristic 'Local affinity', as compilers usually put functions in the same order in binaries.
HEUR: Relax the heuristic 'Same rare constant' to allow good matches with a bad similarity ratio to appear in the 'Partial' tab.
HEUR: Removed heuristic 'Bytes hash and names'.
HEUR: Removed heuristic 'Strongly connected components SPP and names'.
HEUR: Removed heuristics that were not finding anything, namely, 'All or most attributes', 'Same address, nodes, edges and primes (re-ordered instructions)', 'Strongly connected components small-primes-product' and 'Callgraph match'.
HEUR: Removed old wrong and buggy heuristic 'Call address sequence'.
HEUR: Removed the slow flag from heuristics 'Switch structures', 'Pseudo-code fuzzy XXX' and 'Same graph'.
HEUR: Removed unreliable heuristic 'Bytes sum'.
HEUR: Rewrite heuristics 'Same rare KOKA hash' and 'Same rare MD-Index' to use the WITH clause, which makes queries much more readable and maintainable.
HEUR: Run slow heuristics at the very end of the diffing process, after the other heuristics.
HEUR: Run the only 2 remaining 'unreliable' heuristics at the very end of the diffing process.
HEUR: The DISTINCT and/or the ORDER BY clauses have been removed in some SQL heuristics because they were causing some queries to never finish, triggering SQLite memory errors.
HEUR: Use difflib.unified_diff instead of ndiff because the latter is way too slow to be called hundreds of thousands of times.
HEUR: When diffing matches to find callees, ignore those matches that differ by more than 75% in the number of basic blocks.
MISC: Added a 'Diaphora:' prefix for log messages.
MISC: Change the text for the 'Call Address sequence' heuristic to show which initial results the matches are based on.
MISC: Fixed some minor typos in the sources.
MISC: Multiple little refactorizations here and there.
MISC: Renamed heuristic 'Same cleaned up assembly' to 'Same cleaned assembly'.
BUG: Added the n-th fix to try not to leak cursors at all ever.
BUG: All parallel calls to add_matches_from_query_ratio_max() were wrong.
BUG: Always use the internal dicts for handling matches, never use the choosers except for adding the results at the end.
BUG: Commit every transaction that must be committed.
BUG: Do not analyze the databases each time a diff is started.
BUG: Do not consider IDA's auto-generated names for the 'Same RVA' heuristic.
BUG: Do not crash when there is no chooser (it's None) given for a specific category.
BUG: Do not directly call 'sqlite3.connect()', instead call a wrapper that does whatever initialization is required.
BUG: Handling timeouts in threads was horribly wrong because there was no code to handle the timeout inside the thread...
BUG: Hopefully final fix for issue #5.
BUG: If a reverser selected File -> Save As from the menu, Diaphora would fail to find the .til file and it would crash.
BUG: Instruction level import support was very wrong, even with typos.
BUG: Multiple instances of functions leaking cursors were fixed.
BUG: Regular expression pattern in `get_cmp_asm` wasn't properly escaped.
BUG: Removing items from choosers in IDA was broken.
BUG: Some SQL queries were not able to properly execute due to huge B-TREEs being created by SQLite when diffing huge databases.
BUG: Some comparisons (pseudocode and graph) were being shown, wrongly, in a different order than the others.
BUG: Some heuristics were using a wrong SQL expression to filter functions starting with the 'sub_' prefix.
BUG: The check to determine if Diaphora should continue finding more callees diffing previous results was wrong.
BUG: The environment variables for multiple items were not being properly handled.
BUG: The logic to handle unmatched choosers was the wrong way around, which was pretty confusing.
BUG: The members `get_cmp_asm` and `get_cmp_pseudo` were being called hundreds of thousands of times for no reason when diffing.
BUG: There were still many places where Diaphora could leak cursors in diaphora_ida.py.
BUG: When calling the function `check_ratio()` always convert, internally, to float the values of the MD-Indices.
BUG: Workaround implemented for the IDA bug 'max non-trivial tinfo_t count has been reached' that might be triggered when exporting immense databases.
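
The HEUR entry above about replacing `ndiff` with `difflib.unified_diff` can be illustrated with a minimal sketch. This is not Diaphora's actual code: the helper name `changed_line_count` and the sample assembly listings are hypothetical, and only standard-library `difflib` calls are used. `ndiff` also computes intraline hints (the `? ` lines) for similar lines, which is extra work that a plain count of changed lines does not need.

```python
import difflib


def changed_line_count(lines_a, lines_b):
    """Count the added and removed lines between two listings.

    Hypothetical helper for illustration only; not part of Diaphora.
    """
    changed = 0
    # n=0 drops context lines; lineterm="" because the inputs have no newlines.
    diff = difflib.unified_diff(lines_a, lines_b, lineterm="", n=0)
    for index, line in enumerate(diff):
        if index < 2:
            # The first two lines are the '---' and '+++' file headers.
            continue
        if line.startswith("@@"):
            # Hunk header, not a content line.
            continue
        if line.startswith(("+", "-")):
            changed += 1
    return changed


# Made-up sample inputs.
old_asm = ["push ebp", "mov ebp, esp", "xor eax, eax", "pop ebp", "ret"]
new_asm = ["push ebp", "mov ebp, esp", "mov eax, 1", "pop ebp", "ret"]

result = changed_line_count(old_asm, new_asm)
print(result)
```

When the two listings are identical, `unified_diff` yields nothing at all, so the helper returns 0 without inspecting any line.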