joxeankoret

Diaphora 3.0 checkpoint at end of March 2023

Mar 27th, 2023 (edited)
  1. MISC: Increased Diaphora version to 3.0.0.
  2. API: Add support for the first 'to be exposed' function, used to get the callgraph percent difference.
  3. CORE: Added CodeCut support to find anonymous compilation units.
  4. CORE: Added IDAMagicStrings to try to find compilation unit names.
  5. CORE: Added support to find and export Compilation Units.
  6. CORE: Coalesce contiguous named compilation units using the minimum and maximum address.
  7. CORE: Do not directly add matches to choosers, instead, work with internal Python dict objects and process them when the diffing session is done.
  8. CORE: More refactoring to properly support multimatches and find the best matches.
  9. CORE: Set a name to compilation units when enough matches indicate the name of the compilation unit using IDAMagicStrings.
  10. DATABASE: Added proper indices and fine-tuned SQL heuristics and queries.
  11. DATABASE: Moved tables and indices definitions to a different file.
  12. DIFF: Support for handling multiple matches by showing them in a different chooser.
  13. DOC: Documented all functions and members in diaphora.py.
  14. EXPORT: Add the `func_id` field to the `instructions` table.
  15. EXPORT: Also consider data references from functions to functions as code references.
  16. GUI: Add support for Python logging facilities.
  17. GUI: Added environment variable DIAPHORA_LOG_PRINT to print to stdout instead of using Python logging facilities.
  18. GUI: Enable, by default, slow heuristics only for databases of ~1,000 functions at most.
  19. GUI: Renamed 'Experimental' to 'Enable Speed Ups', as the old 'experimental' heuristics are either upgraded to 'normal' or removed.
  20. HEUR: Add a filter for a minimum of 3 instructions for heuristic 'Same address, nodes, edges and mnemonics'.
  21. HEUR: Add support for speed ups (internally called 'dirty heuristics') for detected symbols stripped matching and patch diffing.
  22. HEUR: Added a default ORDER BY clause to order by compilation unit when there is a named compilation unit.
  23. HEUR: Added a minimum ratio of 0.35 for heuristic 'Pseudo-code fuzzy AST hash'.
  24. HEUR: Added a minimum ratio of 0.5 for heuristic 'Pseudo-code fuzzy (normal)'.
  25. HEUR: Added heuristic 'Same rare basic block mnemonics list'.
  26. HEUR: Added heuristic 'Local affinity' to find matches in function gaps.
  27. HEUR: Added heuristic 'Same anonymous compilation unit function match'.
  28. HEUR: Added heuristic 'Same compilation unit'.
  29. HEUR: Added heuristic 'Same named compilation unit function match'.
  30. HEUR: Added heuristic type HEUR_TYPE_RATIO_MAX_TRUSTED. Results with a bad similarity ratio are assigned to the 'Partial' tab regardless of the calculated ratio.
  31. HEUR: Added self-explanatory new heuristic 'Same rare assembly instruction'.
  32. HEUR: Added support to find matches by diffing the assembly and pseudo-code of previously known good matches.
  33. HEUR: All heuristics now select the same fields by calling `diaphora_heuristics.get_query_fields()` to retrieve the fields.
  34. HEUR: Allow the heuristic 'Same rare constant' to match functions with at least 3 basic blocks.
  35. HEUR: Always consider functions matching by name the best match, regardless of the ratio that another match might produce.
  36. HEUR: Changed heuristic 'Same nodes, edges and strongly connected components' to 'Same nodes, edges, loops and strongly connected components'. Now loops are also considered for matching.
  37. HEUR: Changed heuristic 'Similar pseudo-code and names' to only consider results with a similarity ratio higher than 0.579.
  38. HEUR: Consider matches only for symbol names that have at least 4 characters for heuristic 'Callee found finding matches'.
  39. HEUR: For heuristic 'Local affinity', consider the first match for a function the best match.
  40. HEUR: First proper working version (hopefully) of the support for multimatches.
  41. HEUR: Increased the number of decimal numbers (7) used for comparison ratios.
  42. HEUR: Increased the queries timeout to 5 minutes.
  43. HEUR: Marked heuristic 'Same rare constant' as slow.
  44. HEUR: Moved heuristic 'Brute force' to the unreliable category.
  45. HEUR: Moved heuristic 'Same graph' to the unreliable category.
  46. HEUR: Moved heuristic 'Nodes, edges, complexity and mnemonics with small differences' to the slow ones.
  47. HEUR: Moved the 'Experimental' heuristics to the 'Partial' category.
  48. HEUR: Order by address the functions for heuristic 'Local affinity', as compilers usually put functions in the same order in binaries.
  49. HEUR: Relax the heuristic 'Same rare constant' to allow good matches with a bad similarity ratio to appear in the 'Partial' tab.
  50. HEUR: Removed heuristic 'Bytes hash and names'.
  51. HEUR: Removed heuristic 'Strongly connected components SPP and names'.
  52. HEUR: Removed heuristics that were not finding anything, namely, 'All or most attributes', 'Same address, nodes, edges and primes (re-ordered instructions)', 'Strongly connected components small-primes-product' and 'Callgraph match'.
  53. HEUR: Removed old wrong and buggy heuristic 'Call address sequence'.
  54. HEUR: Removed the slow flag from heuristics 'Switch structures', 'Pseudo-code fuzzy XXX' and 'Same graph'.
  55. HEUR: Removed unreliable heuristic 'Bytes sum'.
  56. HEUR: Rewrite heuristics 'Same rare KOKA hash' and 'Same rare MD-Index' to use the WITH clause that makes queries much more readable and maintainable.
  57. HEUR: Run slow heuristics at the very end of the diffing process, after the other heuristics.
  58. HEUR: Run the only 2 remaining 'unreliable' heuristics at the very end of the diffing process.
  59. HEUR: The DISTINCT and/or the ORDER BY clauses have been removed in some SQL heuristics because they were causing some queries to never finish triggering SQLite memory errors.
  60. HEUR: Use difflib.unified_diff instead of ndiff because the latter is way too slow to be called hundreds of thousands of times.
  61. HEUR: When diffing matches to find callees, ignore those matches that differ by more than 75% in the number of basic blocks.
  62. MISC: Added a 'Diaphora:' prefix for log messages.
  63. MISC: Change the text for the 'Call Address sequence' heuristic to show which initial results the matches are based on.
  64. MISC: Fixed some minor typos in the sources.
  65. MISC: Multiple small refactorings here and there.
  66. MISC: Renamed heuristic 'Same cleaned up assembly' to 'Same cleaned assembly'.
  67. BUG: Added the n-th fix to try not to leak cursors at all ever.
  68. BUG: All parallel calls to add_matches_from_query_ratio_max() were wrong.
  69. BUG: Always use the internal dicts for handling matches, never use the choosers except for adding the results at the end.
  70. BUG: Commit every transaction that must be committed.
  71. BUG: Do not analyze the databases each time a diff is started.
  72. BUG: Do not consider IDA's auto-generated names for the 'Same RVA' heuristic.
  73. BUG: Do not crash when there is no chooser (it's None) given for a specific category.
  74. BUG: Do not directly call 'sqlite3.connect()', instead call a wrapper that does whatever initialization is required.
  75. BUG: Handling timeouts in threads was horribly wrong because there was no code to handle the timeout inside the thread...
  76. BUG: Hopefully final fix for issue #5.
  77. BUG: If a reverser selected File -> Save As from the menu Diaphora would fail to find the .til file and it would crash.
  78. BUG: Instruction level import support was very wrong, even with typos.
  79. BUG: Multiple instances of functions leaking cursors were fixed.
  80. BUG: Regular expression pattern in `get_cmp_asm` wasn't properly escaped.
  81. BUG: Removing items from choosers in IDA was broken.
  82. BUG: Some SQL queries were not able to properly execute due to huge B-TREEs being created by SQLite when diffing huge databases.
  83. BUG: Some comparisons (pseudocode and graph) were being shown, wrongly, in a different order than the others.
  84. BUG: Some heuristics were using a wrong SQL expression to filter functions starting with the 'sub_' prefix.
  85. BUG: The check to determine if Diaphora should continue finding more callees diffing previous results was wrong.
  86. BUG: The environment variables for multiple items were not being properly handled.
  87. BUG: The logic to handle unmatched choosers was inverted (the other way around), which was pretty confusing.
  88. BUG: The members `get_cmp_asm` and `get_cmp_pseudo` were being called hundreds of thousands of times for no reason when diffing.
  89. BUG: There were still many places where Diaphora could leak cursors in diaphora_ida.py.
  90. BUG: When calling the function `check_ratio()` always convert, internally, to float the values of the MD-Indices.
  91. BUG: Workaround implemented for the IDA bug 'max non-trivial tinfo_t count has been reached' that might be triggered when exporting immense databases.
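
Item 60 replaces difflib.ndiff with difflib.unified_diff for speed. The snippet below is only a rough sketch of how a similarity ratio could be derived from a unified diff; it is not Diaphora's actual code. The helper name `unified_ratio` and the scoring formula are assumptions made for illustration. The 7 decimal digits follow item 41.

```python
import difflib
from itertools import islice


def unified_ratio(a_lines, b_lines):
    """Hypothetical helper: similarity in [0, 1] computed from a unified diff.

    Counts added and removed lines and relates them to the total number
    of lines on both sides. This is an illustrative formula only.
    """
    total = len(a_lines) + len(b_lines)
    if total == 0:
        # Two empty listings are treated as identical.
        return 1.0

    # n=0 drops context lines, so only hunk headers and changes are yielded.
    diff = difflib.unified_diff(a_lines, b_lines, lineterm="", n=0)

    changed = 0
    # The first two yielded lines are the '---' and '+++' file headers;
    # an empty diff yields nothing, so skipping them is always safe.
    for line in islice(diff, 2, None):
        if line[0] in "+-":  # hunk headers start with '@' and are ignored
            changed += 1

    # Keep 7 decimal digits, as the changelog mentions for comparison ratios.
    return round(1.0 - changed / total, 7)


if __name__ == "__main__":
    old = ["push ebp", "mov ebp, esp", "xor eax, eax", "ret"]
    new = ["push ebp", "mov ebp, esp", "mov eax, 1", "ret"]
    print(unified_ratio(old, new))
```

unified_diff only emits the changed hunks, while ndiff also computes intraline differences for every pair of similar lines, which is where most of its extra cost comes from.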