VolgaCTF 2015 Quals - Captcha (150pts) writeup

The challenge description was: "We've got a rather strange png file. Very strange png. Something isn't right about it..." A PNG file showing what seemed to be the letter 'i' was provided.

Let's have a quick look at this file:

mrt:~/ctf/volga/stego/captcha$ xxd capthca.png | less
0000000: 8950 4e47 0d0a 1a0a 0000 000d 4948 4452 .PNG........IHDR
0000010: 0000 0100 0000 0100 0802 0000 00d3 103f ...............?
0000020: 3100 0002 df49 4441 5478 9ced db31 0e03 1....IDATx...1..
0000030: 210c 00c1 10e5 ff5f 76ca 34d7 2250 76a6 !......_v.4."Pv.
0000040: bad2 cdca 7012 6b66 5e50 f53e 3d00 9c24 ....p.kf^P.>=..$
0000050: 00d2 0440 9a00 4813 0069 0220 4d00 a409 ...@..H..i. M...
...
0000310: 4945 4e44 ae42 6082 8950 4e47 0d0a 1a0a IEND.B`..PNG....
0000320: 0000 000d 4948 4452 0000 0100 0000 0100 ....IHDR........
0000330: 0802 0000 00d3 103f 3100 0003 2a49 4441 .......?1...*IDA
0000340: 5478 9ced dbc1 6ac2 5014 45d1 a6f4 ff7f Tx....j.P.E.....
0000350: 399d 151b 24d8 62ee 13f7 5a33 d181 0377 9...$.b...Z3...w
0000360: 729e e2b6 effb 0754 7dae 7e03 b092 0048 r......T}.~....H
...
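
The first 33 bytes of the dump are the PNG signature (89 50 4e 47 0d 0a 1a 0a) followed by the IHDR chunk: a 4-byte length (13), the type "IHDR", 13 bytes of header data and a 4-byte CRC. As a small sketch, Python's struct module can decode them. `parse_ihdr` is a hypothetical helper written for this illustration, and the bytes are copied by hand from the dump. The image starting at offset 0x318 repeats exactly the same header.

```python
import struct

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def parse_ihdr(data: bytes) -> dict:
    """Decode the IHDR chunk that must directly follow the PNG signature."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("missing PNG signature")
    # Every chunk starts with a 4-byte big-endian length and a 4-byte type
    length, chunk_type = struct.unpack(">I4s", data[8:16])
    if chunk_type != b"IHDR" or length != 13:
        raise ValueError("first chunk is not a 13-byte IHDR")
    # IHDR data: width, height (4 bytes each), then five 1-byte fields
    fields = struct.unpack(">IIBBBBB", data[16:29])
    names = ("width", "height", "bit_depth", "color_type",
             "compression", "filter", "interlace")
    return dict(zip(names, fields))


# Signature + IHDR of capthca.png, copied from the xxd dump (offsets 0x00-0x20)
HEADER = bytes.fromhex(
    "89504e470d0a1a0a0000000d49484452"
    "00000100000001000802000000d3103f31"
)

print(parse_ihdr(HEADER))  # width 256, height 256, bit depth 8, color type 2
```

Color type 2 means truecolor RGB, so each embedded image is a 256x256, 8-bit RGB picture.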

After looking at the content of the file, we notice that multiple PNG files have been concatenated together. Counting the PNG headers:

mrt:~/ctf/volga/stego/captcha$ grep -c -a PNG capthca.png
1892

1892! That's quite a number. We have a couple of options here: write a script that extracts each image from its header, .PNG (89504e47), to its end marker, IEND.B`. (49454e44ae426082), or use a tool such as foremost, which will extract all the files for us. I used the latter:
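
The scripted option could look roughly like the sketch below. It assumes that the 8-byte IEND trailer (the type "IEND" plus its fixed CRC ae426082) never occurs by chance inside compressed image data; walking the chunks by their length fields would be more robust. `split_pngs` and `carve` are hypothetical names, and unlike foremost the output files are numbered sequentially. The number of images returned is also a more direct count than grep -c, which counts matching lines rather than matches.

```python
import os
import sys

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"
# IEND carries no data, so its type and CRC are always the same 8 bytes
IEND_TRAILER = b"IEND\xaeB`\x82"


def split_pngs(data):
    """Return every PNG in data, from its signature through the IEND trailer."""
    images = []
    start = data.find(PNG_SIGNATURE)
    while start != -1:
        end = data.find(IEND_TRAILER, start)
        if end == -1:
            break  # truncated image: no IEND after this signature
        end += len(IEND_TRAILER)
        images.append(data[start:end])
        start = data.find(PNG_SIGNATURE, end)
    return images


def carve(path, out_dir="carved"):
    """Write each embedded PNG to out_dir as 0000.png, 0001.png, ..."""
    with open(path, "rb") as fh:
        images = split_pngs(fh.read())
    os.makedirs(out_dir, exist_ok=True)
    for index, image in enumerate(images):
        with open(os.path.join(out_dir, "%04d.png" % index), "wb") as out:
            out.write(image)
    return len(images)


if __name__ == "__main__" and len(sys.argv) > 1:
    print(carve(sys.argv[1]), "images extracted")
```

Run against capthca.png (e.g. `python3 carve.py capthca.png`, with a script name of your choosing), it should report the same count as grep if the assumption above holds.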

mrt:~/ctf/volga/stego/captcha$ foremost capthca.png
Processing: capthca.png
|*|

mrt:~/ctf/volga/stego/captcha$ cd output

mrt:~/ctf/volga/stego/captcha/output$ ls
total 184
drwxr-xr-- 3 mrt mrt 4096 May 4 02:25 .
drwxr-xr-x 4 mrt mrt 4096 May 4 02:25 ..
-rw-r--r-- 1 mrt mrt 113055 May 4 02:25 audit.txt
drwxr-xr-- 2 mrt mrt 65536 May 4 02:25 png

mrt:~/ctf/volga/stego/captcha/output$ ls png/
total 7636
drwxr-xr-- 2 mrt mrt 65536 May 4 02:25 .
drwxr-xr-- 3 mrt mrt 4096 May 4 02:25 ..
-rw-r--r-- 1 mrt mrt 792 May 4 02:25 00000000.png
-rw-r--r-- 1 mrt mrt 867 May 4 02:25 00000001.png
-rw-r--r-- 1 mrt mrt 859 May 4 02:25 00000003.png
-rw-r--r-- 1 mrt mrt 916 May 4 02:25 00000004.png
-rw-r--r-- 1 mrt mrt 849 May 4 02:25 00000006.png
-rw-r--r-- 1 mrt mrt 856 May 4 02:25 00000008.png
...
-rw-r--r-- 1 mrt mrt 781 May 4 02:25 00003165.png
-rw-r--r-- 1 mrt mrt 781 May 4 02:25 00003166.png
-rw-r--r-- 1 mrt mrt 792 May 4 02:25 00003168.png

Despite the gaps in the filenames, we do have 1892 files successfully extracted. Each of these files also shows a single letter, so we either have 1892 letters to write down or we can let an OCR program such as tesseract do the job for us. The problem was that my version couldn't properly recognize a single letter on its own in an image, so to get more accurate results I ended up writing a small script that crops the pictures and groups them into chunks of 20 letters. Cropping kept the combined images small and made the process much faster: each picture was originally 256x256, and we only needed a 40x40 pixel region around the letter.
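
The gaps come from how foremost names its output: each file is named after its starting offset in the input, counted in blocks of its default 512-byte block size. Assuming the images sit back to back in capthca.png, the sizes of the first six files in the listing above reproduce their names exactly. `foremost_names` is a hypothetical helper written only for this check.

```python
def foremost_names(sizes, block_size=512):
    """Predict foremost-style names for files stored back to back."""
    names = []
    offset = 0
    for size in sizes:
        # name = starting offset in blocks, zero-padded to 8 digits
        names.append("%08d.png" % (offset // block_size))
        offset += size
    return names


# Sizes of the first six files in the listing
print(foremost_names([792, 867, 859, 916, 849, 856]))
# ['00000000.png', '00000001.png', '00000003.png', '00000004.png', '00000006.png', '00000008.png']
```

For example, the third image starts at byte 792 + 867 = 1659, and 1659 // 512 = 3, hence 00000003.png.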

Here is the final Python script:

#!/usr/bin/env python3
import glob
import os
import subprocess
import sys

CHUNK_SIZE = 20  # letters per chunk picture

files = sorted(glob.glob('*.png'))
os.makedirs('cropped', exist_ok=True)
os.makedirs('chunks', exist_ok=True)

# crop each 256x256 picture to the 40x40 area holding the letter (ImageMagick convert)
for f in files:
    subprocess.call(['convert', f, '-crop', '40x40+110+114', 'cropped/cropped_' + f])

# group letters by 20 into 'chunk' pictures; +append places them side by side
# and the last chunk simply holds whatever letters are left over
for c, start in enumerate(range(0, len(files), CHUNK_SIZE)):
    group = ['cropped/cropped_' + f for f in files[start:start + CHUNK_SIZE]]
    out = 'chunks/chunk-' + str(c).zfill(4) + '.png'
    # log to stderr so stdout only carries the OCR text
    print('convert ' + ' '.join(group) + ' +append ' + out, file=sys.stderr)
    subprocess.call(['convert'] + group + ['+append', out])

# run tesseract on each chunk (using the b64 character set config for the final version)
for f in sorted(glob.glob('chunks/chunk-*.png')):
    subprocess.call(['tesseract', f, 'stdout', 'b64'])

After running the script and redirecting its output to a text file, I ended up with the following content:

iVBORwOKGgoAAAANSUhE
UgAAARAAAACDCAI AAADK
7deAAAAAXNSRO| Ars4c
GQAAAARnQU1BAAij wv8
YQUAAAAJthchAADs MA
AA7DAcdquQAAAUf SURB
VHhe7dhthowEEVR1sWC
WA+rYTNdTOthTSj D6NH
Eije37Z8kgaGV6T9PQB
YBmBAQQEBhAQGEBAYAAB
gQEEBAYQEBhAQGAAAYEB
BAQGEBAYQEBgAAGBAQQE
BhAQGEBAYAABgQEEBAYQ
EBhAQGAAAYEBBAQGEBAY
QEBgAAGBAQQEBhAQGEBA
YAABgQEEBAYQEBhAQGAA
AYEBBAQGEBAYQEBgAAGB
AQQEBhAQGEBAYAABgQEE
QmD+XM+n0+|8/ZPvB62k
css3/53b5TQ4/dejva
7zd9iOk7mw9j7/vzb3w9
MOU9pi26|quetf+2JKn
JzGIJstO30nBOb7NV+A
szaemAGn283JDe09qu
jr3u6Y|P|TDvaxCut+J
quoPyle8T4MAdeYZf
yDpov37Zdjf5WbD5be0
O3t0t8/x|w++Omstvym
7aj|Jq71|ou3st1tg/v
|X3/SRO+X69135V9Xc1m
sv6|TXTt28x801xkbny
NLuOTBsert8049YfNjOz
FW+rrG5qSu3Gyn1ZLdyq
3FpW87CVY8uBGe1VX062
N507th5MkzZs7onmqZ
reTf59Nu/aPefD/p0p67
szNa8GPhymzfdsziZf
+/UanapKOthdTmePj
9+Gw60Ra576+2yqu5dZ
aKaMzoVN3WS7HGw6WbE/
3Dbi6/oC1Vpgmm3vuu3D
8Wbd+cMuHsCV2eWo|zE3
chb+NXHvtqa7Wm6bcbd
p0m+7km9mq0l5$aUS7dl
Uu5jA21Zb6|+1|G7dGPJ
OoncNuNd3apWOK/YTxts
/LCXQ8s/YUZH6Mbisf1j
e+LcOx6tOhDL7CGsMucm
TnobdeHm+YTe9erbfu
Mu7rntSr2fpzNiOdv|y0
sz72EBb1vaN33eX/+g
maYyOdw1bup3nWy61wyG
4jbbSH3sO3sz/ka2/q
43TmMu6b7k84NCGzOQen
dxMPeuinRan7pr2m6
r93|21on//|PtstH|t2
97Gthw3qud57chr5x
E8PjFW7upky3i8mthWY
Rx8ebiu02upJngb6Ybi
scvchwaV3bzc|JBntv
xu8chanXXNpctmw1Dj
WFnulJWUombfWDWpGax/
xKa7HfPA/c515jfaJx3v
ZM3Uij|59Jche9+u29f
Lm6equU37+|w002q8mj
VhEij bstthCWByY3v
0v8m|VOVB4fnqCvZVXU4
yTQt1PrQu|st4XHiTuB
uT+sw7N/7rfR8h9um/p5
xj0m6x/Yuw6F7iDhe1PG
QSMTSSKeJC/TflibQTPS
ifZN85916me7quunSp
9i 3Hx8d9PPSprCd7NMP
sfijuCrvOC|anNkaq
V+X|NwYm7db8qP9K8ads
9fCATO/8xx2f65N5ectL
OQjOBYFJu3/66ysEBgCB
AQQEBhAQGEBAYAABgQEE
BAYQEBhAQGAAAYEBBAQG
EBAYQEBgAAGBAQQEBhAQ
GEBAYAABgQEEBAYQEBhA
QGAAAYEBBAQGEBAYQEBg
AAGBAQQEBhAQGEBAYAAB
gQEEBAYQEBhAQGAAAYEB
BAQGEBAYQEBgAAGBAQQE
BhAQGEBAYAABgQEEBAYQ
EBhAQGAAAYEBBAQGEBAY
QEBgAAGBAQQEBhAQGEBA
YAABgQEEBAYQEBhAQGAA
AYEBBAQGEBAYQEBgAAGB
AQQEBhAQGEBAYAABgQEE
BAYQEBhAQGAAAYEBBAQG
EBAYQEBgAAGBAQQEBI j 2
8fEX4sz1GGZYi cAAAAA
SUVORKSCYI

Apart from a few stray characters, this looks a lot like base64 encoding, so let's try decoding it:

mrt:~/ctf/volga/stego/captcha/output/png$ cat output.txt | base64 --decode | xxd | less
base64: invalid input
0000000: 8950 4e47 038a 1a0a 0000 000d 4948 4452 .PNG........IHDR
0000010: 0000 0110 0000 0083 0802 ..........

It's actually another PNG file. base64 gives up with "invalid input" at the first character outside its alphabet (here the space tesseract inserted in the second line), and tesseract made a few recognition errors as well (I can't complain, though; it did most of the job). At this point the best solution I had was to manually check every chunk of 20 letters against its picture and slowly fix each line. It took a while, and I hated how similar I (uppercase i) and l (lowercase L) looked, the uppercase i being only a couple of pixels wider. I may have cursed a bit in the process, but I ended up with the following base64:
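
Part of that manual pass can be narrowed down automatically. A minimal sketch, assuming the OCR output holds one 20-letter chunk per non-blank line (only the last chunk may be shorter): flag every line that contains characters outside the base64 alphabet or has the wrong length. It cannot catch look-alike substitutions that stay inside the alphabet, such as O for 0 or l for I, so those still need eyeballing. `suspicious_lines` is a hypothetical name.

```python
import re
import sys

CHUNK_SIZE = 20  # letters per chunk picture in the OCR script
# Standard base64 alphabet plus '=' padding
BASE64_LINE = re.compile(r"[A-Za-z0-9+/=]*")


def suspicious_lines(lines, chunk_size=CHUNK_SIZE):
    """Return (chunk index, text) for lines that cannot be a clean base64 chunk."""
    flagged = []
    last = len(lines) - 1
    for index, line in enumerate(lines):
        text = line.rstrip("\r\n")
        bad_chars = BASE64_LINE.fullmatch(text) is None
        # every chunk holds exactly chunk_size letters except possibly the last
        bad_length = len(text) != chunk_size and index != last
        if bad_chars or bad_length:
            flagged.append((index, text))
    return flagged


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as fh:
        # skip blank lines so index i should match chunks/chunk-000i.png
        lines = [line for line in fh if line.strip()]
    for index, text in suspicious_lines(lines):
        print(index, text)
```

On the first lines above it flags "UgAAARAAAACDCAI AAADK" and "7deAAAAAXNSRO| Ars4c" but lets "iVBORwOKGgoAAAANSUhE" through, even though its O should be a 0.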

iVBORw0KGgoAAAANSUhE
UgAAARAAAACDCAIAAADK
7dMbAAAAAXNSR0IArs4c
6QAAAARnQU1BAACxjwv8
YQUAAAAJcEhZcwAADsMA
AA7DAcdvqGQAAAUfSURB
VHhe7dhtYtowEEVR1sWC
WA+rYTNdTOqRhTSjD6NH
EmjSe37Z8kgaGV6T9PQB
YBmBAQQEBhAQGEBAYAAB
gQEEBAYQEBhAQGAAAYEB
BAQGEBAYQEBgAAGBAQQE
BhAQGEBAYAABgQEEBAYQ
EBhAQGAAAYEBBAQGEBAY
QEBgAAGBAQQEBhAQGEBA
YAABgQEEBAYQEBhAQGAA
AYEBBAQGEBAYQEBgAAGB
AQQEBhAQGEBAYAABgQEE
QmD+XM+n0+l8/ZPvB6zk
css3/53b5TQ4/fSdjMvf
7zd9iOk7mw9j7/vzb3w9
MOU9piZ6lqRvetf+2JKn
Jz6lJsDvO30nBOb7NV+A
LzjaemAGn283JDe09qVp
jr3u6YlPITDvNvxCut+J
vuBoPyIwP8T4MATmdYZf
yDcGpv37Zdjf5WbD5be0
O3t0t8/xIw++OmGjsvym
7ajlJq71lou3svP1tg/v
lX3/SR0+X691s5V9Xc1m
sv6ITXTt28x8O1xkbyDf
NLuOTBsert8049YfNjOz
FW+rrG5qSu3Gyn1ZLdyq
3FpW87CVY8uBGe1VX062
N507tKf5Mkz2s7oVxmqZ
reTf5gNu/aPefD/p0p67
zkKNa8GPhymzfdsJviZf
+/UnbGapKOVhkdTNfmPj
9+Gw60Ra576+2yqsU5dZ
aKaMzoVN3WS7HGw6WbE/
3Dbi6/oC1Vpgmm3vuu3D
8Wbd+cMuHsCV2eWolzE3
cdLb+NXHvtqa7Wm6bcbd
pOm+7km9mq0/5SaUS7dl
Uu5jA21Zb6l+1IG7dGPJ
0oncNuNd3apW0K/YTxts
/LCXQ8s/YUZH6Mbisf1j
e+LcOx6tOhDL7C6sMucm
TnobdxBHm+YTe9rMdbfu
Mu7rntSr2fpzNiOdvly0
zdT72EBb1pvXN33eX/+g
maYyOdw1bup3nWy61wyG
4jbbSH3sO3zWd/wNk2/q
43TmMu6b7k84NC6z0Qen
dxMPeusXiRuOa7pxN2m6
r93l21ozW//IPtstHlt2
97GBtqw3qXed57vScr5x
E8PjFW7upky3i8mmhXWY
Rx8ebiuQ2upJgWmb6Ybi
scvjcGwbLV3bzcIJBntv
xu8vcBMnvXXNpctmw1Dj
WFnuIJWUombfWDWpGax/
xKa7HfPA/c515jfaJx3v
ZM3UijI59JjWce9+uz9f
Lm6eeqJU37+lw00zq8mj
VhEKmjbsVmhq5CWByY3v
0v8mlVOVB4fnqCvZVXU4
yTQt1PrQuls0j4XHiTuB
uT+sw7N/7rfR8h9um/p5
xj0m6x/Yuw6F7iDhe1PG
Q5MTsSKeJC/TfIibQTPS
ifZNS5916fGm7qCbunSp
9i3Hx8d9PPSpwOCd7NMP
sfjJvuCrvOClgXnNkbDq
V+XlNwYm7db8qP9K8ads
9fCAT0/8xx2f65N5ectL
O9j0BYFJu3/66ysEBgCB
AQQEBhAQGEBAYAABgQEE
BAYQEBhAQGAAAYEBBAQG
EBAYQEBgAAGBAQQEBhAQ
GEBAYAABgQEEBAYQEBhA
QGAAAYEBBAQGEBAYQEBg
AAGBAQQEBhAQGEBAYAAB
gQEEBAYQEBhAQGAAAYEB
BAQGEBAYQEBgAAGBAQQE
BhAQGEBAYAABgQEEBAYQ
EBhAQGAAAYEBBAQGEBAY
QEBgAAGBAQQEBhAQGEBA
YAABgQEEBAYQEBhAQGAA
AYEBBAQGEBAYQEBgAAGB
AQQEBhAQGEBAYAABgQEE
BAYQEBhAQGAAAYEBBAQG
EBAYQEBgAAGBAQQEBlj2
8fEX4wPz1G62YicAAAAA
SUVORK5CYII=
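
Comparing the start of the first line before and after correction shows how one look-alike character broke the decode shown earlier. Each base64 character carries 6 bits, so the O/0 mix-up only damages the 3-byte group its 4-character block decodes to, turning the signature's 0d 0a into 03 8a. The fragments below are the first 11 characters of that line, with '=' padding added by hand so they decode on their own.

```python
import base64

ocr_line = "iVBORwOKGgo="    # tesseract output: letter O
fixed_line = "iVBORw0KGgo="  # corrected: digit 0

print(base64.b64decode(ocr_line).hex(" "))    # 89 50 4e 47 03 8a 1a 0a
print(base64.b64decode(fixed_line).hex(" "))  # 89 50 4e 47 0d 0a 1a 0a
```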

Decoding it back to raw data:

mrt:~/ctf/volga/stego/captcha/output$ cat b64.txt | base64 --decode > image.png
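
A decoded file can still hide OCR slips that happen to be valid base64. PNG offers a cheap check: every chunk ends with a CRC-32 computed over its type and data, so walking the chunks of image.png and recomputing each CRC with zlib.crc32 shows which chunk, if any, is still damaged. This is a minimal sketch with a hypothetical `check_chunks` helper; a mismatch in the large IDAT chunk only tells you the error is somewhere inside that chunk.

```python
import struct
import sys
import zlib

PNG_SIGNATURE = b"\x89PNG\r\n\x1a\n"


def check_chunks(data):
    """Walk the PNG chunks and return (type, offset, crc_ok) for each one."""
    if not data.startswith(PNG_SIGNATURE):
        raise ValueError("missing PNG signature")
    results = []
    pos = len(PNG_SIGNATURE)
    while pos + 8 <= len(data):
        length, chunk_type = struct.unpack(">I4s", data[pos:pos + 8])
        body = data[pos + 8:pos + 8 + length]
        stored = data[pos + 8 + length:pos + 12 + length]
        # The CRC covers the chunk type and data, not the length field
        computed = zlib.crc32(chunk_type + body) & 0xFFFFFFFF
        ok = len(stored) == 4 and struct.unpack(">I", stored)[0] == computed
        results.append((chunk_type.decode("latin-1"), pos, ok))
        if chunk_type == b"IEND":
            break
        pos += 12 + length  # length + type + data + CRC
    return results


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1], "rb") as fh:
        for chunk_type, offset, ok in check_chunks(fh.read()):
            print("%s at offset %d: %s" % (chunk_type, offset, "CRC ok" if ok else "CRC MISMATCH"))
```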

The resulting image.png looked like this:

[Image 01: the decoded image.png, displaying the flag]

We got our flag:

{That_is_incredible_you_have_past!}