dmilicev

detab.c

Nov 14th, 2019
/*

    detab.c     by  Dragan Milicev


Removes trailing whitespace (space characters and tabs) from the end of each
line in a file and replaces the remaining tabs with space characters.
Tab stops are respected.

-----------------------------------------------------------------------------
Run from command line:
detab detab.c 4
-----------------------------------------------------------------------------
OUTPUT:

 In the input file "detab.c" are:

   16171 characters

     409 rows

     673 tabs that have been replaced with space

       0 spaces taken from the ends of rows

       0 removed tabs from the ends of rows

       0 removed whitespaces (space characters and tabs) from the ends of rows

 The output file is: "detab.c.txt"

Press any key to continue . . .
-----------------------------------------------------------------------------


Version with fgets() and fputs() and the functions
trim_string_end()
detab_string()
detab_file().

The trim_string_end() function removes whitespace (spaces and tabs)
from the end of string s and updates the counters
num_deleted_spaces,
num_replaced_tabs,
num_deleted_whiteness.
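
As an illustration only, the trimming step could be sketched as below.
trim_keep_newline() is a hypothetical name that does not appear in detab.c.
The sketch assumes that s holds at most one '\n', at its very end (as a line
read by fgets() does), and it returns a single count instead of updating the
three counters.

```c
#include <assert.h>
#include <string.h>

// Hypothetical helper: removes spaces and tabs from the end of s,
// keeps one trailing '\n' if there is one, and returns the number
// of characters that were removed.
size_t trim_keep_newline(char s[]) {
    size_t n;
    size_t removed = 0;
    int had_newline = 0;

    assert(s != NULL);
    n = strlen(s);

    if (n > 0 && s[n - 1] == '\n') {    // set the newline aside first
        had_newline = 1;
        n--;
    }
    while (n > 0 && (s[n - 1] == ' ' || s[n - 1] == '\t')) {
        n--;                            // drop one trailing space or tab
        removed++;
    }
    if (had_newline)
        s[n++] = '\n';                  // put the newline back
    s[n] = '\0';
    return removed;
}
```

Setting the newline aside before the loop matters: otherwise the loop would
see '\n' as the last character and stop without trimming anything.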

main() processes the program arguments (FileName.extension and tabWidth)
and passes the names of the input and output files and tabWidth
to the detab_file() function,
together with pointers to the counters of characters, lines and tabs.

A tab moves the cursor from its current position to the next tab stop.

tabWidth is the distance between two adjacent tab stops,
expressed as a number of space characters.
Most often tabWidth = 4, but it can be 2, 4, 6, 8, or any other positive value.

A tab that falls at index i of the expanded line
is replaced by the number of space characters given by the formula

    nb = tabWidth-(i%tabWidth)

which reaches the next tab stop.
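
The formula can be checked with a small helper. This is only a sketch:
spaces_to_next_stop() is a hypothetical name, i is assumed to be a 0-based
index, and tab_width is assumed to be positive.

```c
#include <assert.h>

// Hypothetical helper: number of spaces that replace a tab found at
// 0-based column index i when tab stops are tab_width columns apart.
int spaces_to_next_stop(int i, int tab_width) {
    assert(i >= 0 && tab_width > 0);
    return tab_width - (i % tab_width);
}
```

For example, spaces_to_next_stop(5, 4) is 3, because index 5 is three
columns before the tab stop at index 8.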

If tabWidth = 4, the tab stops are at these column indexes:

          1         2         3         4         5         6         7         8
012345678901234567890123456789012345678901234567890123456789012345678901234567890
    4   8   12  16  20  24  28  32  36  40  44  48  52  56  60  64  68  72  76  80 ... index

The remainder operator % for non-negative x and positive y:
x%y = x ,                                       if x<y
x%y = 0 ,                                       if x=y
x%y = the remainder of dividing x by y,         if x>y

A tab at index i is replaced with nb = tabWidth-(i%tabWidth) space characters
to reach the next tab stop:
index  nb                                                  next tab stop
   0:  tabWidth-(i%tabWidth) = 4-( 0%4) = 4-0 = 4 spaces   index  4
   1:  tabWidth-(i%tabWidth) = 4-( 1%4) = 4-1 = 3 spaces   index  4
   2:  tabWidth-(i%tabWidth) = 4-( 2%4) = 4-2 = 2 spaces   index  4
   3:  tabWidth-(i%tabWidth) = 4-( 3%4) = 4-3 = 1 space    index  4

   4:  tabWidth-(i%tabWidth) = 4-( 4%4) = 4-0 = 4 spaces   index  8
   5:  tabWidth-(i%tabWidth) = 4-( 5%4) = 4-1 = 3 spaces   index  8
   6:  tabWidth-(i%tabWidth) = 4-( 6%4) = 4-2 = 2 spaces   index  8
   7:  tabWidth-(i%tabWidth) = 4-( 7%4) = 4-3 = 1 space    index  8

   8:  tabWidth-(i%tabWidth) = 4-( 8%4) = 4-0 = 4 spaces   index 12
   9:  tabWidth-(i%tabWidth) = 4-( 9%4) = 4-1 = 3 spaces   index 12
  10:  tabWidth-(i%tabWidth) = 4-(10%4) = 4-2 = 2 spaces   index 12
  11:  tabWidth-(i%tabWidth) = 4-(11%4) = 4-3 = 1 space    index 12

  12:  tabWidth-(i%tabWidth) = 4-(12%4) = 4-0 = 4 spaces   index 16
  13:  tabWidth-(i%tabWidth) = 4-(13%4) = 4-1 = 3 spaces   index 16
  14:  tabWidth-(i%tabWidth) = 4-(14%4) = 4-2 = 2 spaces   index 16
  15:  tabWidth-(i%tabWidth) = 4-(15%4) = 4-3 = 1 space    index 16
...
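
The table can be reproduced by a bounded expansion routine. The following is
a sketch under assumptions, not the detab_string() used in this file:
expand_tabs() and its parameters are hypothetical, and it writes to a separate
output buffer whose size is supplied by the caller.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

// Hypothetical sketch: copies in to out and replaces every tab with
// tab_width-(col%tab_width) spaces. out_size is the capacity of out,
// including the terminating '\0'. Returns 0 on success and -1 if the
// expanded text does not fit.
int expand_tabs(const char *in, char *out, size_t out_size, int tab_width) {
    size_t col = 0;                     // index in out, equal to the current column

    assert(in != NULL && out != NULL && tab_width > 0);
    if (out_size == 0)
        return -1;

    for (; *in != '\0'; in++) {
        // a tab needs nb cells to reach the next stop, any other character one
        size_t nb = (*in == '\t') ? (size_t)tab_width - col % (size_t)tab_width : 1;

        if (col + nb >= out_size)       // keep one cell free for the final '\0'
            return -1;

        if (*in == '\t')
            memset(out + col, ' ', nb); // fill up to the next tab stop
        else
            out[col] = *in;             // copy the character unchanged
        col += nb;
    }
    out[col] = '\0';
    return 0;
}
```

With tab_width = 4, the input "a\tb" becomes "a   b": the tab sits at index 1
and is replaced by 3 spaces, as in the table row for index 1.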

An example of a program call:
detab FileName 4

In the text file FileName
tabs are replaced by space characters
and the result is written to the file FileName.txt.

If tabWidth is not specified or tabWidth = 0, no tabs are replaced;
the program only prints its description and exits.
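
The program reads tabWidth with atoi(), which returns 0 for text that is not
a number. A stricter check could look like the sketch below. parse_tab_width()
is a hypothetical helper, not part of detab.c, and the choice of -1 as the
error value is an assumption made for this example.

```c
#include <assert.h>
#include <errno.h>
#include <limits.h>
#include <stdlib.h>

// Hypothetical helper: converts text such as "4" to a positive tab width.
// Returns the width, or -1 if text is not a positive decimal integer
// that fits in an int.
int parse_tab_width(const char *text) {
    char *end = NULL;
    long value;

    assert(text != NULL);
    errno = 0;
    value = strtol(text, &end, 10);

    if (end == text || *end != '\0')    // no digits at all, or trailing junk
        return -1;
    if (errno == ERANGE || value <= 0 || value > INT_MAX)
        return -1;
    return (int)value;
}
```

Unlike atoi(), this rejects inputs such as "4x" or an empty string instead of
silently treating them as a number.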

*/

#include <stdio.h>
#include <stdlib.h>                 // for exit(), atoi(), system()
#include <string.h>                 // for strlen(), strcpy(), strcat()

#define MAX_STRING_LEN 1024         // maximum length of one row in a file


// Removes whitespace (spaces and tabs) from the end of string s.
// String s may contain at most one newline character '\n',
// and only at its very end;
// that newline character is kept at the end of the trimmed string.
void trim_string_end( char s[],
                      int *num_deleted_spaces,
                      int *num_replaced_tabs,
                      int *num_deleted_whiteness){

    int n = (int)strlen(s);         // n is the length of string s (final character '\0' is not counted)
                                    // the last character of the string has index n-1
    int has_newline = 0;            // 1 if s ends with '\n' (fgets() keeps it)

    if (n > 0 && s[n-1]=='\n'){     // set the newline aside, otherwise the loop below
        has_newline = 1;            // would stop at '\n' and never reach the whitespace
        s[--n]='\0';
    }

    while(n > 0){                   // from the end of the string to the beginning, to the left

        if  (s[n-1]==' '){          // if the last character is a space
            (*num_deleted_spaces)++;
            (*num_deleted_whiteness)++;
            s[n-1]='\0';            // delete it
        }else if (s[n-1]=='\t'){    // if the last character is a tab
            (*num_replaced_tabs)++;
            (*num_deleted_whiteness)++;
            s[n-1]='\0';            // delete it
        }else                       // if not space or tab
            break;                  // break while loop

        n--;                        // pass to the next character to the left
    }

    if (has_newline){               // put the newline back after the trimmed text
        s[n]='\n';
        s[n+1]='\0';
    }
} // trim_string_end()


// detab_string() replaces the tabs in string s with the appropriate number of space characters.
// Returns the number of replaced tabs num_tab.
// tabWidth is the distance between tab stops expressed by the number of space characters.
// s must point to a buffer of at least MAX_STRING_LEN characters,
// because the expanded string is written back into s.
int detab_string(char s[], int tabWidth){
    char as[MAX_STRING_LEN];        // auxiliary string
    int num_tab = 0;                // counter for tabs
    int len = 0;                    // length of string s after the tabs are expanded
    int nb = 0;                     // number of space characters that replace the tab at index i
    int i = 0;                      // i is the index for the main string s
    int j = 0;                      // j is the index for auxiliary string as
    int k;                          // counter for loop

    if( tabWidth <= 0 )             // nothing to replace, and i%tabWidth must not divide by zero
        return 0;

// Copy s to as, then read from as and write to s.
// After the tabs are replaced with space characters,
// string s is usually longer than the initial one.
// So let's first count the tabs in string s and calculate its new length,
// then check whether the new length still fits in MAX_STRING_LEN characters.
// s and as are fixed-size arrays, not memory obtained from malloc(),
// so they must not be passed to realloc(); a row that does not fit is reported as an error.

    while(s[j]!='\0'){              // we read s from start to end

        if( s[j++] == '\t'){        // if the character read is a tab
            num_tab++;              // we count the tabs
            len += tabWidth-(len%tabWidth);     // the tab advances to the next tab stop
        }else
            len++;                  // every other character takes one position
    }

    j = 0;                          // reset loop counter

    if( len + 1 > MAX_STRING_LEN ){ // +1 for '\0'
        fprintf(stderr, "\n\n row with expanded tabs is longer than %d characters ! \n\n",
                MAX_STRING_LEN-1 );
        exit(EXIT_FAILURE);
    }

    strcpy(as,s);                   // Now, it is safe to copy s to as.

    while(as[j]!='\0'){             // we read as from start to end

        if( as[j] == '\t') {        // if the character read from as[j] is a tab

            // calculate the number of space characters that replace this tab
            nb = tabWidth-(i%tabWidth);

            for( k=0; k<nb; k++ )   // in s we replace that '\t' with nb space characters
                s[i++]=' ';         // for each space character, we increment the index i of s

            j++;                    // go to the next character in as
        }else                       // all other characters from as that aren't tabs
            s[i++]=as[j++];         // we place unchanged in s
    }
    s[i]='\0';                      // end the string s

    return num_tab;                 // returns the number of tabs replaced in string s

} // int detab_string()


/*
// I found a shorter detab() function on the internet:
void detab( char* in, char* out, int tabWidth, size_t max_len ) {
    size_t i = 0;

    while (*in && i < max_len - 1) {

        if (*in == '\t') {
            in++;
            out[i++] = ' ';

            while (i % tabWidth && i < max_len - 1) {
                out[i++] = ' ';
            }
        } else {
            out[i++] = *in++;
        }
    }

    out[i] = 0;
}
*/


// detab_file() opens the input and output files, performs processing, and closes both files.
// Removes whitespace (space characters and tabs) from the end of rows in the input file,
// replaces the remaining tabs with the appropriate number of space characters
// and saves the result to a new output file.
// Calls the trim_string_end() and detab_string() functions.
void detab_file(char input_file_name[],
                char output_file_name[],
                int tabWidth,
                int *num_characters,
                int *num_rows,
                int *num_tabs,
                int *num_deleted_spaces,
                int *num_replaced_tabs,
                int *num_deleted_whiteness ){

    FILE* pointer_input_file;
    FILE* pointer_output_file;
    char row[MAX_STRING_LEN];       // one line of text
    size_t len;                     // length of the row as read by fgets()

    // We open the input text file for reading
    if( (pointer_input_file = fopen(input_file_name, "r")) == NULL ) {
        fprintf(stderr, "\n\n Error opening input file %s ! \n\n", input_file_name );
        exit(EXIT_FAILURE);         // interrupt program execution
    }
    // We open the output text file for writing
    if( (pointer_output_file = fopen(output_file_name, "w")) == NULL ) {
        fprintf(stderr, "\n\n Error opening output file %s ! \n\n", output_file_name );
        exit(EXIT_FAILURE);         // interrupt program execution
    }
    // loads one row at a time from the input file and processes it
    while( (fgets(row, MAX_STRING_LEN, pointer_input_file) ) != NULL ) {
        len = strlen(row);
        // check maximum row length from file:
        // fgets() reads at most MAX_STRING_LEN-1 characters, so a full buffer
        // without '\n' at its end means that the row did not fit
        if ( len == (size_t)(MAX_STRING_LEN-1) && row[len-1] != '\n' ){
            fprintf(stderr, "\n\n %8d. row is longer than %d characters ! \n\n",
                    (*num_rows)+1, MAX_STRING_LEN-2 );
            exit(EXIT_FAILURE);     // interrupt program execution
        }
        (*num_characters)+=(int)len;    // We loaded 1 row and increased the total number of characters
                                    // (counted before trimming; fgets() also loads and counts '\n')
        // remove the whitespace from the end of the row string
        trim_string_end( row,
                         num_deleted_spaces,
                         num_replaced_tabs,
                         num_deleted_whiteness);
        (*num_rows)++;              // We loaded 1 row and increased the total number of rows
        (*num_tabs) += detab_string(row,tabWidth);  // counts the total number of tabs
        fputs(row,pointer_output_file);             // writes the processed row to the output file
    }

    // Close the input file
    if( ( fclose(pointer_input_file) ) == EOF ) {
        fprintf(stderr, "\n\n Error closing input file %s ! \n\n", input_file_name);
        exit(EXIT_FAILURE);         // interrupt program execution
    }
//    else
//        printf("\n\n File \"%s\" has been successfully closed. \n\n", input_file_name );

    // Close the output file
    if( ( fclose(pointer_output_file) ) == EOF ) {
        fprintf(stderr, "\n\n Error closing output file %s ! \n\n", output_file_name);
        exit(EXIT_FAILURE);         // interrupt program execution
    }
//    else
//        printf("\n\n File \"%s\" has been successfully closed. \n\n", output_file_name );

} // void detab_file()



int main(int argc, char *argv[]) {

    char input_file_name[MAX_STRING_LEN];
    char output_file_name[MAX_STRING_LEN];
    int num_characters = 0;         // Total number of characters in the file
    int num_rows = 0;               // Total number of rows in the file
    int num_tabs = 0;               // Total number of tabs in the file
    int num_deleted_spaces = 0;     // Total number of spaces removed from the ends of the file rows
    int num_replaced_tabs = 0;      // Total number of tabs removed from the ends of the file rows
    int num_deleted_whiteness = 0;  // Total number of whitespace characters removed from the ends of the file rows
    int tabWidth=0;                 // distance between tab stops (expressed as a number of space characters)

// number of command line arguments when calling the program
// detab                , argc = 1, only the name of the program itself
// detab file.txt       , argc = 2, the name of the program itself and the name of the input file
// detab file.txt 4     , argc = 3, the name of the program itself, the name of the input file, and tabWidth

// Replacing the tabs with space characters is only done if argc = 3 and tabWidth > 0,
// otherwise only the program description is printed.
// printf("\n argc = %d \n ", argc);
// argv[0] is the name of the program itself,   (detab.exe)
// argv[1] is the name of the input file,       (filename.extension)
// argv[2] is tabWidth,                         (4)

    if( argc == 3 ) {               // if all arguments are specified

        tabWidth = atoi( argv[2] ); // converting string argv[2] to integer gets tabWidth

        if ( tabWidth<0 ) {         // reject a negative tabWidth (0 prints the description below)
            fprintf(stderr, "\n ERROR: tabWidth = %d is not a positive number ! \n\n", tabWidth);
            system("PAUSE");        // pause closing the screen (Windows command)
            exit(EXIT_FAILURE);     // interrupt program execution
        }

        // check if strcpy() and strcat() have enough space in memory
        if ( strlen(argv[1]) > MAX_STRING_LEN - 5 ) {   // -5 for .txt and '\0' for output_file_name
            fprintf(stderr, "\n ERROR: The name of the input file \"%s\" is longer than %d characters ! \n\n",
                    argv[1], MAX_STRING_LEN-5);
            system("PAUSE");        // pause closing the screen
            exit(EXIT_FAILURE);     // interrupt program execution
        }
                                    // form the names of the input and output files
        strcpy(input_file_name,argv[1]);            // from argv[1] we make the name of the input file
        strcpy(output_file_name,input_file_name);   // the name of the output file is the name of the input file
        strcat(output_file_name,".txt");            // with the addition of .txt
    }

    // print the program description unless both FileName and tabWidth are correctly specified
    if( argc != 3 || tabWidth<=0 ) {
        printf("\n DETAB FileName 4 \n"
               "\n in the text file FileName replaces the tabs with the appropriate \n"
               "\n number of space characters and result is recorded in FileName.txt \n\n");
        system("PAUSE");            // pause closing the screen
        exit(EXIT_FAILURE);         // interrupt program execution
    }

    // call the function with its arguments
    detab_file( input_file_name,
                output_file_name,
                tabWidth,
                &num_characters,
                &num_rows,
                &num_tabs,
                &num_deleted_spaces,
                &num_replaced_tabs,
                &num_deleted_whiteness );

    // Display the results of processing the input file:
    printf("\n In the input file \"%s\" are: \n\n", input_file_name );
    printf("%8d characters \n\n", num_characters);  // The total number of characters in the input file
    printf("%8d rows \n\n", num_rows );             // The total number of rows in the input file
    printf("%8d tabs that have been replaced with space \n\n", num_tabs );
    printf("%8d spaces taken from the ends of rows \n\n", num_deleted_spaces );
    printf("%8d removed tabs from the ends of rows \n\n", num_replaced_tabs );
    printf("%8d removed whitespaces (space characters and tabs) from the ends of rows \n\n", num_deleted_whiteness);
    printf(" The output file is: \"%s\" \n\n",output_file_name);

    system("PAUSE");                // pause closing the screen

    return 0;                       // return 0, program successfully completed

} // int main()