Saturday, December 21, 2019

windows - Combine Batch/WMIC + ANSI/UNICODE Output formatting



While creating an auditing tool for my network, I'm finding that WMIC output is written with a space between each character whenever it is appended to a file that already contains regular echoed text. For example,



This:



@echo off
echo Foo >> "C:\test.txt"
wmic CPU Get AddressWidth >> "C:\test.txt"

wmic CPU Get Description >> "C:\test.txt"


Returns this:



Foo 
A d d r e s s W i d t h

6 4


D e s c r i p t i o n

I n t e l 6 4 F a m i l y 6 M o d e l 6 9 S t e p p i n g 1


If I remove (rem) the echo Foo line, the output is formatted nicely since there is only one output type:



AddressWidth  
64
Description

Intel64 Family 6 Model 69 Stepping 1


I'm reading that this is because WMIC outputs to UNICODE, while standard batch commands output to ANSI. Can both be joined to share a common format? Can someone please explain in more depth the different format types, why WMIC would output to a different type, and/or any other contributing factors to this output? I've found some bread crumbs, but nothing concrete.


Answer



Pipe the output from Wmic through more:
wmic CPU Get AddressWidth |more >> "C:\test.txt"



Edit for some more background: the issue you see is due to wmic output being Unicode, specifically UTF-16. This means that each character (or, more correctly, most of them) is encoded in two bytes. wmic also puts a so-called BOM (byte order mark) at the beginning of its output. See the byte content below:



FF FE 44 00 65 00 73 00-63 00 72 00 69 00 70 00 ..D.e.s.c.r.i.p.




Those first two bytes (FF FE) specify the endianness for UTF-16 and allow data processing tools to recognize the encoding (here, UTF-16 little endian).
The type command evidently performs this check: if it finds the BOM, it recognizes the encoding properly.
On the other hand, if you first echo text and then append wmic output, there is no BOM at the beginning of the file and the encoding is inconsistent:

74 65 78 74 20 0D 0A 44-00 65 00 73 00 63 00 72 text ..D.e.s.c.r
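The two byte dumps above can be reproduced outside of cmd. The following is a minimal Python sketch, used here only for illustration. The hexdump helper and the variable names are made up for this example, and it assumes that wmic's redirected output is UTF-16 little endian, as the first dump suggests:

```python
import codecs


def hexdump(data: bytes) -> str:
    """Return the bytes as space-separated uppercase hex pairs."""
    return " ".join(f"{b:02X}" for b in data)


# wmic on its own: BOM (FF FE) followed by UTF-16 little-endian text.
wmic_only = codecs.BOM_UTF16_LE + "Descrip".encode("utf-16-le")

# echo first, then wmic appended: single-byte text, followed by
# UTF-16 data, with no BOM at the start of the file.
mixed = b"text \r\n" + "Descr".encode("utf-16-le")

print(hexdump(wmic_only))
print(hexdump(mixed[:16]))
```

Both printed lines should match the hex columns of the dumps above, apart from the hyphen that the dump format puts after the eighth byte.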



If you put this file through type, it cannot infer how to interpret it. Most likely it assumes a single-byte ('ANSI') encoding, which results in spaces being produced for non-printable characters (the zeros, which are in fact the high-order bytes of the two-byte character encoding).
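To see where the extra spaces come from, here is a small Python sketch of that single-byte interpretation. It is an illustration only, not a description of how type is implemented. cp1252 stands in for 'ANSI' as an assumption, and the replacement of NUL by a space mimics what the console displays:

```python
# Bytes as they would sit in the file: "Foo " written by echo,
# then UTF-16 little-endian text appended by wmic (no BOM at file start).
data = b"Foo \r\n" + "AddressWidth".encode("utf-16-le")

# Assumption: the reader treats every byte as one character.
as_ansi = data.decode("cp1252")

# The zero high-order bytes become NUL characters, which a console
# typically renders as blanks.
shown = as_ansi.replace("\x00", " ")
print(shown)
```

The second printed line reads A d d r e s s W i d t h, which matches the spaced-out output in the question.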



more handles more (pun intended) cases and produces correct output for basic ASCII characters, which is why it's commonly used as a hack for this purpose.



One additional note: some editors (Notepad being the simplest example) will properly display a UTF-16 encoded file if it is consistent, even without a BOM. There is also a way to force echo to produce Unicode output (but beware that it does not produce a BOM): running cmd /u causes the output of internal commands to be Unicode.
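The role of the BOM for a consistently encoded file can be sketched in Python. This is an illustration under the assumption that the BOM-less bytes are UTF-16 little endian; the variable names are made up for the example:

```python
import codecs

text = "Foo\r\n"

# Assumption: this models consistent UTF-16 little-endian output without a BOM.
no_bom = text.encode("utf-16-le")
with_bom = codecs.BOM_UTF16_LE + no_bom

# Naming the byte order explicitly works even though there is no BOM.
decoded_no_bom = no_bom.decode("utf-16-le")

# The generic "utf-16" codec reads the BOM to pick the byte order and strips it.
decoded_with_bom = with_bom.decode("utf-16")

# "utf-16-le" does not treat the BOM specially, so it survives as U+FEFF.
decoded_keep_bom = with_bom.decode("utf-16-le")

print(repr(decoded_no_bom), repr(decoded_with_bom), repr(decoded_keep_bom))
```

In other words, a reader that already knows the encoding does not need the BOM; the BOM only matters to tools that have to guess.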



I can't really say why cmd's Unicode support is so limited (or, as most would say, broken...) - probably historical/compatibility issues.




Last thing - if you need better Unicode support (among many other benefits), I would recommend migrating to PowerShell.

