Find lines that contain greater than specified number of characters in Unix

When working with files in Unix or Linux, sometimes you need to find the files that are greater than a specified number of characters. Unix or Linux systems are often used as servers for applications in the corporate and enterprise. Some system interfaces require that files with data be sent from one server to another for processing. As a result, some files could be improperly formatted or could contain bad data which requires manual intervention.

awk 'length $(0)'>42  datafile.dat

In the command above, awk is the program used.

  • "$(0)" says to pull the entire line.
  • "42" is the number that you are filtering on. In the example, only lines that are greater than 42 characters will be displayed.
  • "datafile.dat" is the name of the file that you want to be checked.

Example

I had to use this when a production issue occurred for one of the applications that I supported. The data file that the application processed had a specified line length and format. If the length or the format was incorrect, then it knew that there was a problem with the file.

From the user perspective, it was difficult to find the offending lines in this file with over 100K lines. Using the awk command above, was able to find all of the offending lines in a matter of seconds or minutes.

Updated: 2022-04-14 | Posted: 2015-10-30
Author: Kenny Robinson, @almostengr