While building an auditing tool for my network, I found that WMIC output appears with a space between each character when it is appended to a file that also contains regular echoed text. For example,
This:
@echo off
echo Foo >> "C:\test.txt"
wmic CPU Get AddressWidth >> "C:\test.txt"
wmic CPU Get Description >> "C:\test.txt"
Returns this:
Foo
A d d r e s s W i d t h
6 4
D e s c r i p t i o n
I n t e l 6 4 F a m i l y 6 M o d e l 6 9 S t e p p i n g 1
If I comment out (rem) the echo Foo line, the output is formatted correctly, since the file then contains only one kind of output:
AddressWidth
64
Description
Intel64 Family 6 Model 69 Stepping 1
I've read that this is because WMIC writes its output as Unicode, while standard batch commands write ANSI. Can both be made to share a common format? Can someone please explain in more depth the different encodings, why WMIC would use a different one, and any other factors that contribute to this output? I've found some bread crumbs, but nothing concrete.
Answer
Pipe the output from wmic through more:
wmic CPU Get AddressWidth |more >> "C:\test.txt"
Edit for some more background: the issue you see is due to wmic output being Unicode, encoded as UTF-16. This means that each character (or, more correctly, most of them) is encoded in two bytes. wmic also puts a so-called BOM (byte order mark) at the beginning of the output. See the byte content below:
FF FE 44 00 65 00 73 00-63 00 72 00 69 00 70 00 ..D.e.s.c.r.i.p.
Those first two bytes (FF FE) specify endianness for UTF-16 and allow data processing tools to recognize the encoding [being UTF-16 little endian].
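To make the byte layout concrete, here is a minimal sketch in Python (not batch, chosen only because it makes the bytes easy to inspect). It assumes the text "Description" as a stand-in for a line of wmic output; the helper name hex_dump is hypothetical.

```python
import codecs

# "Description" stands in for a line of wmic output (illustrative only).
text = "Description"

# UTF-16 little endian: the BOM (FF FE) followed by two bytes per character.
data = codecs.BOM_UTF16_LE + text.encode("utf-16-le")


def hex_dump(raw: bytes, count: int = 16) -> str:
    """Return the first `count` bytes as space-separated uppercase hex."""
    return " ".join(f"{b:02X}" for b in raw[:count])


print(hex_dump(data))
# FF FE 44 00 65 00 73 00 63 00 72 00 69 00 70 00

# The generic "utf-16" codec reads the BOM to pick the byte order, then drops it.
print(data.decode("utf-16"))
# Description
```

The first 16 bytes match the dump above: the BOM, then each ASCII letter followed by a zero high-order byte.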
Evidently the type command performs this check and, if it finds the BOM, recognizes the encoding correctly.
On the other hand, if you first echo text and then append wmic output, there is no BOM at the beginning and you can see the inconsistent encoding:
74 65 78 74 20 0D 0A 44-00 65 00 73 00 63 00 72 text ..D.e.s.c.r
If you put that file through type, it cannot infer how to interpret the bytes. Most likely it assumes a single-byte ('ANSI') encoding, and the result is a space for each non-printable character (the zeros, which are in fact the high-order bytes of the two-byte character encoding).
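The spaced-out text can be reproduced with a small Python sketch. It is only an approximation of what type appears to do: cp1252 is an assumed ANSI code page, and replacing each NUL with a space is an assumption about how the console renders it.

```python
# Single-byte text written by echo, followed by UTF-16LE text with no BOM,
# mirroring the mixed byte dump shown earlier. "cp1252" is an assumed code page.
ansi_part = "text \r\n".encode("cp1252")
utf16_part = "Description".encode("utf-16-le")  # no BOM
mixed = ansi_part + utf16_part

# A reader that assumes one byte per character sees a NUL after every letter.
as_single_byte = mixed.decode("cp1252")

# Rendering each NUL as a space gives the familiar spaced-out output.
shown = as_single_byte.replace("\x00", " ")
print(shown)
```

The second printed line reads "D e s c r i p t i o n", the same pattern as in the question.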
The more command handles more (pun intended) cases and produces correct output for basic ASCII characters, which is why it is commonly used as a hack for this purpose.
One additional note: some editors (Notepad being the simplest example) will properly display a UTF-16 encoded file if it is consistent, even without a BOM. There is a way to force echo to produce Unicode output (but beware, it does not produce a BOM): running under cmd /u causes the output of internal commands to be Unicode.
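As a rough illustration of why a consistent file is readable even without a BOM, the Python sketch below builds both parts in UTF-16LE. It assumes the echoed line and the appended wmic bytes are both BOM-less UTF-16LE; the sample strings are taken from the question.

```python
# Both parts in UTF-16LE with no BOM: an echoed line (as assumed for cmd /u)
# followed by text standing in for appended wmic output.
echo_part = "Foo \r\n".encode("utf-16-le")
wmic_part = "AddressWidth\r\n64\r\n".encode("utf-16-le")
consistent = echo_part + wmic_part

# With no BOM the reader must be told, or guess, the byte order.
decoded = consistent.decode("utf-16-le")
print(decoded)

# Every byte pair maps to one character, so no stray NULs are left over.
print("\x00" in decoded)
```

Because the encoding is the same from the first byte to the last, a reader that guesses UTF-16LE decodes the whole file cleanly.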
I can't really say why cmd unicode support is so limited (or as most would say - broken...) - probably historical/compatibility issues.
Last thing - if you need better Unicode support (among many other benefits), I would recommend migrating to PowerShell.