Filter text of an office file like .docx .xlsx .odt .ods on the command line with the free Open Source tool Swiss File Knife
sfk ofilter in.xlsx -+pattern
filter and change text lines, from standard input, or from file(s).
input lines may have a maximum length of 4000 characters.
line selection options
-+pat1 -+pat2 include lines containing pat1 OR pat2
-and+pat1 -and+pat2 include lines containing pat1 AND pat2
in any order.
"-+pat1*pat2" include lines containing pat1 AND pat2
in the given order.
-ls+pat1 include lines starting with pat1
-le+pat1 -le+pat2 include lines ending with pat1 OR pat2
"-ls+pat1*pat2" include lines starting with pat1 and containing pat2
-!pat1 -!pat2 exclude lines containing pat1 OR pat2
-ls!pat1 exclude lines starting with pat1
-le!pat1 -le!pat2 exclude lines ending with pat1 or pat2
-no-empty-lines exclude empty lines
-no-blank-lines exclude lines containing just whitespace
-inc[lude] p1 to p2 include only lines within blocks surrounded by
boundary lines containing patterns p1 or p2
-inc- p1 to p2 same, but exclude boundary lines on output
-cut[-] p1 to p2 remove block of lines from p1 until p2
-inc[-] "*" to p1 include all from text start until marker
-cut[-] p1 to "*" cut all from marker line until end of text
-head=n read only first n lines of text files
-tail=n read only last n lines of text files
(up to a limit of 100000 bytes from file end)
-line=n read only nth line from input
-skipfirst=n skip first n lines. warns on hard wrap.
-force accept hard wrapped lines with -skipfirst
-nocheck with inc, cut: ignore endings without a start
-addmark txt with inc, cut: insert txt after every block
-context=n select n lines of context around hit lines
-precon=5:blue select context before or after hit lines,
-postcon=5:cyan:--- in blue or cyan, with separator "---".
-unique [-case] if same line occurs twice, keep only first.
default is case insensitive text comparison.
-global-unique when filtering multiple files in one command,
then -unique applies to lines in the same file, and -global-unique
applies across all files. this will cache the text of all files in
memory and should not be used with very large files.
-keep pattern after -unique: make an exception for lines
containing the given pattern, and keep them even if redundant.
-keep-empty, -keep-blank always keep empty or whitespace lines.
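As a rough illustration of how the include and exclude options above combine, here is a minimal Python sketch. It is not sfk's implementation: the names wildcard_to_regex and select_lines are hypothetical, and the order in which the OR group, the AND group and the excludes are applied is an assumption made for this example.

```python
import re

def wildcard_to_regex(pattern):
    """Translate a pattern with * and ? wildcards into a regex.
    Matching is case-insensitive, as the help text gives for the default."""
    parts = []
    for ch in pattern:
        if ch == "*":
            parts.append(".*")      # any run of characters
        elif ch == "?":
            parts.append(".")       # exactly one character
        else:
            parts.append(re.escape(ch))
    return re.compile("".join(parts), re.IGNORECASE)

def select_lines(lines, include_any=(), include_all=(), exclude_any=()):
    """include_any ~ -+pat1 -+pat2, include_all ~ -and+pat1 -and+pat2,
    exclude_any ~ -!pat1 -!pat2 (assumed combination order)."""
    inc_any = [wildcard_to_regex(p) for p in include_any]
    inc_all = [wildcard_to_regex(p) for p in include_all]
    exc_any = [wildcard_to_regex(p) for p in exclude_any]
    result = []
    for line in lines:
        if inc_any and not any(r.search(line) for r in inc_any):
            continue    # none of the OR patterns found
        if not all(r.search(line) for r in inc_all):
            continue    # at least one AND pattern missing
        if any(r.search(line) for r in exc_any):
            continue    # an exclude pattern found
        result.append(line)
    return result

sample = ["apple pie", "banana split", "Apple and banana", "cherry"]
print(select_lines(sample, include_any=["apple*banana"]))
# prints ['Apple and banana'] because "apple" must come before "banana"
```

The "pat1*pat2" form differs from two -and+ options only in that the order of the two parts matters.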
text processing options
applied after line selection options only.
-rep[lace] _src_dest_
replace string src by dest. first character is separator character (e.g. _).
src is case-insensitive. to select case-sensitive search, say -case.
-lsrep[lace], -lerep[lace]
same as -replace, but replaces only once at line start or line end.
-high[light] color pattern : highlight matching parts within lines.
color: red = dark red, Red = bright red, green, blue,
yellow, cyan, magenta, default.
pattern: e.g. "GET * HTTP/"
type "sfk help colors" for more about colors.
-lshigh[light], -lehigh[light]
same as -highlight, but only at line start or line end.
-sep[arate] "; " -form "$col1 mytext $[-0n.nq]col2 ..."
break every line into columns separated by any character listed after -sep,
then reformat the text according to a user-defined mask similar to printf.
when leaving out -sep, the whole line is packed into column 1. if -spat was
specified, then -form also supports slash patterns like \t.
google for "printf syntax" to get more details. example:
-form "$40col1 $-3.5col2 $05qline $(10.10qcount+1000)"
reformat column 1 as right-ordered with at least 40 chars, column 2 left-
ordered with at least 3 and a maximum of 5 chars, then add the input line
number, "q"uoted, right justified with 5 digits, prefixed by zeros,
then the output line number plus 1000 within quotes. NOTE: some examples
may not work in an sfk script, see section "common errors" below.
adding values so far only works with (q)line and (q)count.
-tabform "$col1 mytext ..."
split and reformat columns of tab separated csv data.
-stabform "$col3\t$col2\t$col1"
reorder three tab separated columns, creating tabbed output
using 's'lash patterns like \t
-utabform "#col1 mytext ..."
same as -tabform but using unix style syntax, to create scripts
that run without changes on Windows and Linux.
-uform "#40col1 #-3.5col2 #05qline"
same as -form but using unix style syntax. short for filter -upat.
-trim removes blanks and tab characters at line start and end.
use -ltrim or -rtrim to trim line start or end only.
-blocksep " " = treat blocks of whitespace as single whitespace separator.
-join[lines] join output lines, do not print linefeeds.
-wrap[=n] wrap output lines near console width [or at column n].
set SFK_CONFIG=columns:n to define or override the console width.
-toiso[=c] converts UTF-8 text to ISO-8859-1. some chars beyond
the 8 bit code range will be reduced to something similar, but
most of them are changed to a dot '.', or character c.
-toutf converts ISO-8859-1 text to UTF-8. if this is done with UTF-8
input text then existing UTF-8 sequences will be destroyed!
-tolower or -toupper converts a-z to lower- or uppercase.
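The -replace parameter format described above (first character is the separator, then src, then dest) can be sketched in Python as follows. This is only an illustration under assumptions: parse_replace_spec and apply_replace are hypothetical names, src is treated as literal text without wildcards, and a closing separator is assumed to be required.

```python
import re

def parse_replace_spec(spec):
    """Split a spec like "_src_dest_" into (src, dest).
    The first character of the spec is taken as the separator."""
    if len(spec) < 2:
        raise ValueError("spec too short: %r" % spec)
    sep = spec[0]
    parts = spec.split(sep)         # "_src_dest_" -> ["", "src", "dest", ""]
    if len(parts) != 4 or parts[3] != "" or parts[1] == "":
        raise ValueError("expected <sep>src<sep>dest<sep>: %r" % spec)
    return parts[1], parts[2]

def apply_replace(line, spec, case_sensitive=False):
    """Replace every occurrence of src by dest in one line.
    Case-insensitive unless case_sensitive is set (compare -case)."""
    src, dest = parse_replace_spec(spec)
    flags = 0 if case_sensitive else re.IGNORECASE
    # A function as replacement keeps backslashes in dest literal.
    return re.sub(re.escape(src), lambda m: dest, line, flags=flags)

print(apply_replace("Foo and foo", "_foo_bar_"))
# prints: bar and bar
print(apply_replace("Foo and foo", "/foo/bar/", case_sensitive=True))
# prints: Foo and bar
```

Choosing a separator that does not occur in src or dest, such as / or _, keeps the spec unambiguous.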
conditional text processing
-[ls/le]where pattern -replace | -highlight | -sep ... -form
replace, highlight or reformat lines matching the given pattern.
all lines that do not match the pattern stay unchanged.
-within pattern -replace _from_to_
replace text in a part of the line matching the given pattern.
the rest of the line text stays unchanged.
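The idea behind -where, changing only lines that match a pattern and passing all others through unchanged, might look like this in Python. It is a simplified sketch: replace_where is a hypothetical helper, the where pattern is treated as a plain case-insensitive substring without wildcards, and -within is not modelled.

```python
import re

def replace_where(lines, where, src, dest):
    """Replace src by dest, but only in lines containing `where`.
    Lines without `where` are copied to the output unchanged."""
    result = []
    for line in lines:
        if where.lower() in line.lower():
            line = re.sub(re.escape(src), lambda m: dest, line,
                          flags=re.IGNORECASE)
        result.append(line)
    return result

log = ["GET /a HTTP/1.1", "POST /a HTTP/1.1"]
print(replace_where(log, "GET", "/a", "/b"))
# prints ['GET /b HTTP/1.1', 'POST /a HTTP/1.1']
```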
pattern support
wildcards * and ? are active by default. add -lit[eral] to disable.
slash patterns are NOT active by default. add -spat to use \t \q etc.
if you need the wildcard * but ALSO want to find/replace '*' characters:
add -spat, then specify \* or \? to find/replace '*' or '?' characters.
instead of typing "sfk filter -spat -rep" all the time, you may use the
short form "sfk filt -srep". the same applies for -(s)sep, -(s)form etc.
unified syntax
since sfk 1.5.4 you can also use -: -ls: -le: under windows.
filter ... -uform or filter -upat ... -form uses # instead of $.
sfk variables versus -tabform
with -upat under windows, or sfk for linux, both filter -tabform
and sfk variables use the syntax #(name) to insert values.
to solve this, variable parsing is not strict and may keep
undefined variable names as is.
quoted multi line parameters are supported in scripts
using full trim. type "sfk script" for details.
further options
-case compare case sensitive. default is case insensitive.
for further options see: sfk help nocase
-lit[eral] treat wildcards * and ? as normal chars (read more above).
-arc XE: include content of .zip .jar .tar etc. archives
as deep as possible, including nested archives.
XD: demo will read first 1000 bytes of each entry.
-qarc quick read top level archives but not nested ones.
-utfout keep raw UTF-8 encoding on output, to use it
with further commands requiring UTF-8 data.
-verbose show names of all files which are currently scanned.
with wfilter: tell current proxy settings, if any.
-write do not print output to console but overwrite input file(s).
only files with actual text changes will be rewritten.
this function may be used only with plain ASCII files, not with
binaries like .doc, .xls. see also "sfk replace".
-write -to msk do not overwrite input files, but save according to mask msk,
e.g. tmp\$file . saves only changed files. say -writeall
to write all files, including those without changes.
-memlimit=mb when using -write, output is cached in memory, which is limited
to 300 mb. use this option to extend, e.g. -memlimit=400
-yes -write simulates by default. add -yes to really write changes.
-snap detect snapfiles and list subfile names having text matches.
-snapwithnames same as -snap, but include subfile names in filtering.
-nofile[names] do not list filenames, do not indent text lines.
-subnames with ofilter: insert .xlsx sheet subfile names.
-count, -cnt precede all result lines by output line counter
-lnum precede all result lines by input line number
-hidden include hidden and system files.
-noinfo do not warn on line selection combined with -write.
-noop \" no operation, take the \" parameter but do nothing.
may help if your (windows) shell miscounts quotations.
-hitfiles if another command follows (e.g. +run or +ffilter),
pass a list of files containing at least one hit.
-nocconv disable umlaut and accent character conversions during
output to console. "sfk help opt" for details.
-justrc print no output, just set return code on matching lines.
-upat unix style syntax with -form, using # instead of $
-timeout=n with wfilt: wait up to n msec for web data.
list of possible input sources
from stdin: type x.txt | sfk filter -+pattern
from single input file: sfk filter x.txt -+pattern
text from chained command: sfk list mydir .txt +filter -+pattern
from many files, directly: sfk filter -+pattern -dir mydir -file .txt
from many files, by chain: sfk list mydir .txt +filefilter -+pattern
in general, whenever you need to make sure that file contents (not the
file names) are processed, prefer to say "filefilter" or "ffilt".
web access support
searching the word "html" in an http URL can be done like:
sfk filter http://192.168.1.100/ -+html
sfk filter http://.100/ -+html
sfk wfilt .100 -+html
sfk web .100 +filt -+html
return codes for batch files
0 normal execution, no matching lines found.
1 normal execution, matching lines found.
with -write: returns rc 1 only if any changes were written.
>1 major error occurred. see "sfk help opt" for error handling options.
common errors
when using filter -form within sfk scripts, expressions like $10.10col1
may collide with script parameters $1 $2 $3. to solve this, use brackets
like $(10.10col1), or "sfk label ... -prefix=%", or -uform.
aliases
sfk ... +getcol n get column n of whitespace separated text.
same as +filter -blocksep " " -form $coln
sfk ... +tabcol n get column n of tab separated text.
same as +filter -stabform $coln
see also
--- open source commands ---
sfk xfind search wildcard text in plain text files
sfk ofind search in office files .docx .xlsx .ods
sfk xfindbin search wildcard text in text/binary files
sfk xhexfind search in text/binary with hex dump output
sfk extract extract wildcard data from text/binary files
sfk filter filter and edit text with simple wildcards
sfk find search fixed text in text files
sfk findbin search fixed text in text/binary files
sfk hexfind search fixed text in binary files
sfk replace replace fixed text in text/binary files
--- freeware commands ---
sfk view GUI tool to search text as you type
--- xe commercial commands ---
sfk replace replace fixed text with high performance
sfk xreplace replace wildcard text in text/binary files
sfk help xe about SFK XE and xreplace with SFK Expressions.
sfk getvar fast single line lookup in multi line variable
sfk difflines show different lines between two files
sfk help unicode about wide character conversion functions
beware of Shell Command Characters.
to find or replace text containing spaces or special characters like <>|!&?*
you must add quotes "" around parameters or the shell will destroy your command.
it splits the command into parts and gives SFK only one part, causing errors.
therefore -replace _ _ _ must be written like: -replace "_ _ _"
within a .bat or .cmd file the percent % must be escaped like %% even
within quoted strings: sfk echo -spat "percent %% is a percent \x25"
see also
sfk filter for more filtering examples
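To see why -replace _ _ _ needs quotes (see "beware of Shell Command Characters" above), the following Python sketch uses the standard shlex module to show how a POSIX-style shell would split the two command lines into parameters. It is an approximation: shlex models POSIX word splitting only, not the Windows command shell, and in.txt is a made-up file name.

```python
import shlex

# Without quotes the shell hands sfk three separate "_" parameters.
unquoted = shlex.split('sfk filter in.txt -replace _ _ _')

# With quotes the whole "_ _ _" arrives as one parameter.
quoted = shlex.split('sfk filter in.txt -replace "_ _ _"')

print(unquoted)
# prints ['sfk', 'filter', 'in.txt', '-replace', '_', '_', '_']
print(quoted)
# prints ['sfk', 'filter', 'in.txt', '-replace', '_ _ _']
```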
examples
sfk ofilter in.docx -+foo
get all lines from in.docx containing foo
sfk ofilt in.xlsx -+apple -stabform $col2\t$col3
get table rows containing 'apple',
then use only columns 2 and 3.
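What the last example does to each table row can be approximated in Python as shown below. This is a sketch under assumptions, not sfk's behaviour in detail: tab_columns and filter_rows are hypothetical names, rows are assumed to arrive as tab separated text lines, and a missing column is assumed to yield an empty string.

```python
def tab_columns(line, wanted):
    """Pick 1-based columns from a tab separated line and join them
    with tabs again, similar in spirit to -stabform "$col2\\t$col3"."""
    cols = line.rstrip("\n").split("\t")
    picked = [cols[i - 1] if 0 < i <= len(cols) else "" for i in wanted]
    return "\t".join(picked)

def filter_rows(rows, word, wanted):
    """Keep rows containing `word` (case-insensitive), then reduce each
    kept row to the wanted columns."""
    return [tab_columns(r, wanted) for r in rows if word.lower() in r.lower()]

rows = ["1\tapple\tred", "2\tpear\tgreen", "3\tApple\tyellow"]
print(filter_rows(rows, "apple", [2, 3]))
# prints ['apple\tred', 'Apple\tyellow']
```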