T3RRYT3RR0R

ASCII string filter v3.1

Dec 5th, 2021 (edited)
@Echo off

 For /f "tokens=4 delims=: " %%G in ('CHCP')Do Set "Restore_Codepage=CHCP %%G > nul"
 Set "Return[Len]=" & Set "Return[String]=" & Set "{input}=" & Set "Modified="

 Setlocal DISABLEDelayedExpansion

REM the label marker ":#" is used within this script to delimit help output.
:#
:# ========================= ASCII string filter v3.1 by T3RRY ======================
Rem - This script iterates over an input string character by character and tests
Rem   each character against a whitelist of printable ASCII characters, with
Rem   successful matches used to build a new string containing only printable
Rem   ASCII characters.
Rem - Execution time increases as string length increases. Each character in the
Rem   string is tested against a whitelist containing 96 printable ASCII characters.
:#
:# Usage: Filepath <"String"> [ /P | /R ] | [ -? | /? | -help ]
:#
:# Rem to use from another batch file:
:# For /f delims^= %%G in ('FilePath "string"')Do Echo(%%G
:#
:# Accepts input String via doublequoted argument - reads %* and trims " /P" or " /R"
:# switches if present
:# - No escaping of characters in the argument is required
:# - If unbalanced doublequotes exist in the string all doublequotes will be removed.
:#
:# Use Switch /P to preserve original spaces
:#  - Default behaviour is to remove all double spaces from the string.
:#    Errorlevels:
:#    0 : String contained only printable ASCII characters; Return[String]
:#        contains the original input string.
:#   -1 : String contained non-ASCII or nonprintable ASCII characters;
:#         Return[String] contains only printable ASCII characters
:#         from the input string.
:#
:# Use Switch /R to reject input containing non-ASCII characters
:#  - Errorlevel 0 : string contains only printable ASCII characters
:#  - Errorlevel 1 or greater: string contains one or more characters that are
:#     not printable ASCII characters. The errorlevel corresponds to
:#     the 1-indexed position of the first non-ASCII character encountered.
:#     Note: the presence of TAB literals in the string will result
:#     in an incorrect position being reported.
:#
::::::::::::::::::::::::::::::::::
Rem Version changes 11/Dec/2021 :
Rem - Added TAB to ASCII printable characters. Handled via substitution. See help for more info.
Rem - Script now differentiates between original paired spaces and paired spaces
Rem   resulting from removal of non-ASCII characters.
::::::::::::::::::::::::::::::::::
Rem Version changes 09/Dec/2021 :
Rem - Changed input method to handle cases where quoted args contain
Rem   standard delims within quotes IE: "string "substring=text""
Rem - Implemented negative errorlevel return: -1 to flag if
Rem   the input string has been modified. 0 unmodified -1 modified.
::::::::::::::::::::::::::::::::::
Rem Version changes 08/Dec/2021 :
Rem - Added Help Switches -? /? and -help
Rem - Added switch: /R
Rem   - Reject strings containing non-ASCII characters. Default: Strip non-ASCII
Rem     characters from the string.
Rem     Note: this switch does not define Return[Len] or Return[String]
::::::::::::::::::::::::::::::::::
Rem Version changes 07/Dec/2021 :
Rem - Rewritten for much faster performance - NOTE:
Rem   - Added Switch: /P
Rem    - Preserve all whitespace. Default: multiple spaces truncated to single.
Rem - Renamed variable for returning String : Return[String]
Rem - Added variable Return[Len] to return 0 indexed string length.
Rem - Corrected handling of completely non-ASCII strings to return empty / 0 Len
Rem ** Utilize alternate data stream to store variable containing printable ASCII
Rem    characters so the variable only needs to be generated on first execution.
Rem     ** Requires this batch file to be run from an NTFS drive.
:# =================================================================================

Set LF=^


%= Empty lines above required =%
For /F eol^=^%LF%%LF%^ delims^= %%A in ('forfiles /p "%~dp0." /m "%~nx0" /c "cmd /c echo(0x09"') do Set "TAB=%%A"

 Set "ASCII= !"
 2> nul (
  more < "%~f0:ASCII.dat" > nul || (
   Setlocal EnableDelayedExpansion
   For /l %%i in (34 1 126) Do (
    Cmd /c Exit %%i
    Set "ASCII=!ASCII!!=ExitCodeAscii!"
   )
   >"%~f0:ASCII.dat" (Echo(Set ^^"ASCII=!ASCII!")
   ENDLOCAL
 ))

 Set "ASCII="
 For /f "delims=" %%G in ('More ^< "%~f0:ASCII.dat"')Do %%G
 If not Defined ASCII (
  2> nul (
   Powershell.exe -c "Remove-item -path '%~nx0' -Stream '*'"
  )
  1>&2 Echo(An error has occurred. Ensure "%~nx0" is located on an NTFS drive.
  Pause
  ENDLOCAL
  Exit /b 1
 )

 Rem Maximum stringlength to support. Modify here to propagate to RemoveChar loop and Return[Len]
REM maximum 1015 chars due to input reading method.
 Set "SupportLength=1015"
 Set "{input}="

::====================================================================================================
Rem :: input capture method is a modified version of Dave Benham's method:
Rem :: https://www.dostips.com/forum/viewtopic.php?t=4288#p23980
SETLOCAL EnableDelayedExpansion
 1>"%~f0:Params.dat" <"%~f0:Params.dat" (
  SETLOCAL DisableExtensions
  Set prompt=#
  Echo on
  For %%a in (%%a) do rem . %*.
  Echo off
  ENDLOCAL
  Set /p "{input}="
  Set /p "{input}="
  Set "{input}=!{input}:~7,-2!"
 @Rem duplicate {input} for the purpose of counting doublequotes.
  Set "count=!{input}:~7,-2!"
 ) || (
  1>&2 Echo(%~nx0 requires an NTFS drive system to function as intended.
  CMD /C Exit -1073741510
 ) || Goto:Eof

::====================================================================================================

Rem the below line can be used to remove the alternate data streams this file creates.
Rem Powershell -c "Remove-item -path '%~nx0' -Stream '*'"
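Rem To inspect the streams before removing them, the two commands below can be run
Rem from the script's directory. A sketch only: "ASCII.dat" is the whitelist stream
Rem written earlier in this script, and both lines stay commented out so they never
Rem run as part of the filter.
Rem Dir /R "%~nx0"
Rem Powershell -c "Get-Content -path '%~nx0' -Stream 'ASCII.dat'"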

 CHCP 65001 > nul
 If not defined {input} (
  Echo(Demo:
 Rem escaped for definition in DelayedExpansion environment
  Set "{input}=this is [    ] a demo) * ^! &^=| ^! <. ~ ^^ & %% ▒ ╔ § ♣ This"
  Set {input}
 )

REM handle help switches; the original codepage is restored before exiting.

 Set {input} | %SystemRoot%\System32\Findstr.exe /Xli "{input}=\/? {input}=-? {input}=-help" > nul && (
  Setlocal EnableDelayedExpansion
  For /f "tokens=2* delims=#" %%G in ('%SystemRoot%\System32\Findstr.exe /blic:":# " "%~f0"')Do (
   Set "Usage=%%G"
   Echo(!Usage:Filepath=%~f0!
  )
  ENDLOCAL & ENDLOCAL
  %Restore_Codepage%
  Exit /b 0
 )

REM substitute doublequotes in {input} clone 'count'; count substring in string;
REM assess if count is even; If false; Remove doublequotes from string.

 Set Div="is=#", "1/(is<<9)"
 Set "{DQ}=0"
 Set ^"count=!count:"={DQ}!"
 2> nul Set "null=%count:{DQ}=" & Set /A {DQ}+=1& set "null=%"
 Set /A !Div:#={DQ} %% 2! 2> nul || Set ^"{input}=!{input}:"=!"

REM handle nonhelp switches /R and /P [ mutually exclusive; only enacted if switch terminates commandline input. ]

 Set "ASCIISwitch[R]="
 Set "ASCIISwitch[P]="
 If defined {input} (
  Set {input} | %SystemRoot%\System32\findstr.exe /Eli "\/P \/R" > nul && (
   If /I "!{input}:~-3!"==" /P" (
    Set "{input}=!{input}:~0,-3!"
    Set "ASCIISwitch[P]=true"
   ) Else If /I "!{input}:~-3!"==" /R" (
    Set "{input}=!{input}:~0,-3!"
    Set "ASCIISwitch[R]=true"
 )))

Rem Remove outer doublequotes from input argument if not already removed due to unbalanced quoting.

 If .^%{input}:~0,1%^%{input}:~-1%. == ."". Set "{input}=!{input}:~1,-1!"

Rem Substitute TAB
 For /f "delims=" %%G in ("!TAB!")Do Set "{input}=!{input}:%%G={TAB}!"

Rem Substitute Paired spaces prior to character removal
 If not defined ASCIISwitch[R] Set "{input}=!{input}:  ={2xSp}!"

Rem RemoveChar loop - iterate over input character by character; Compare against each character in whitelist
Rem Appends ASCII Whitelist characters to New string unless /R switch used, in which case non-ASCII characters
Rem  trigger an exit of the script with a positive errorlevel indicating the string is not ASCII.
Rem  The return value is the 1-indexed position of the first non-ASCII character encountered.

 Set "end=" & Set "New="
 For /l %%i in (0 1 %SupportLength%)Do If not "!{input}:~%%i,1!"=="" (
  Set "Char=!{input}:~%%i,1!"
  Set "ISAscii="
  For /l %%c in (0 1 94)Do If not "!ASCII:~%%c,1!" == "" (
   Set "C_Char=!ASCII:~%%c,1!"
   if "!Char!"=="!C_Char!" (
    Set "New=!New!!Char!"
    Set "ISAscii=true"
  ))
  If Not Defined ISAscii (
   Set "Modified=true"
   If Defined ASCIISwitch[R] (
    Endlocal & Endlocal &  %Restore_Codepage%
    For /f "delims=" %%G in ('Set /A %%i+1')Do Exit /b %%G
 )))

Rem strip new Paired spaces from string if switch /P not used.

 Set "{Input}=!New!"
 If not Defined ASCIISwitch[P] (
  For /l %%i in (0 1 9)Do if defined {Input} Set "{Input}=!{Input}:  = !"
 )

Rem reinsert original paired spaces and Tab:
 If defined {input} (
  Set "{input}=!{input}:{2xSp}=  !"
  Set "{input}=!{input}:{TAB}=%TAB%!"
 )

 If defined {input} (
  Echo(!{input}!
  For /l %%i in (0 1 %SupportLength%)Do If not defined Return[Len] If "!{input}:~%%i,1!"=="" Set "Return[Len]=%%i"
 ) Else (
  Set "Return[Len]=0"
  Set "Return[String]="
 )

 For /f "Delims=" %%G in ("!{Input}!")Do (
  ENDLOCAL & ENDLOCAL & Set "Return[Len]=%Return[Len]%" & Set "modified=%modified%" & Set "Return[string]=%%G"
 )
 %Restore_Codepage%

If not defined modified Exit /B 0
Exit /b -1
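
Rem ---------------------------------------------------------------------------------
Rem Example caller. A minimal sketch, not part of the script logic: every line is
Rem commented out, and execution never reaches this point. It assumes this file is
Rem saved as "ASCIIfilter.bat" (a hypothetical name) beside the calling batch file.
Rem The Result and Position variables are illustrative names only.
Rem
Rem  @Echo off
Rem  Rem Default mode: the filtered string is echoed and returned in Return[String].
Rem  Call "%~dp0ASCIIfilter.bat" "some text to filter"
Rem  Set "Result=%Errorlevel%"
Rem  If "%Result%"=="-1" Echo Characters were removed; Return[Len] is %Return[Len]%
Rem
Rem  Rem /R mode: nothing is echoed; the errorlevel is the position of the first
Rem  Rem character that is not printable ASCII, or 0 if there is none.
Rem  Call "%~dp0ASCIIfilter.bat" "some text to check" /R
Rem  Set "Position=%Errorlevel%"
Rem  If %Position% GTR 0 (Echo Rejected at position %Position%) Else Echo Accepted
Rem
Rem Copying the errorlevel into a variable on its own line, straight after the Call,
Rem avoids reading a value that was expanded before the call returned.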