This week-long class is designed to teach students the features of C++ that are most commonly encountered in binaries, their implementation details, and how to cope with them in Hex-Rays while reverse engineering. The class also covers Hex-Rays as a tool in great detail.
With practice, after completing this course, students should be able to produce databases such as these:
Rolf Rolles has been reverse engineering since 1997. In the meantime, he has published many technical articles about IDA and Hex-Rays, reverse engineering tool development, and deobfuscation.
He has worked in every major security-related area for reverse engineering: malware analysis, IDS signature development, tool development (as the lead developer of BinDiff from 2004-2006), training, copy protections, and vulnerability analysis.
These days, he runs Mobius Strip Reverse Engineering, teaching training classes and developing forthcoming products for automated binary analysis.
C++ reverse engineering is an uncommon skill and topic. The few writeups on C++ binaries that exist are usually light on details. Roughly every year since the late 1990s, a handful of scattered tutorials have been published on virtual functions, inheritance, exception handling, and/or the standard template library (STL). However, since important parts of C++ are not standardized, and hence are implemented differently between compilers and platforms, these publications generally age poorly. No cohesive, comprehensive materials on C++ reverse engineering have emerged in public.
C++ is a huge, complex, and rapidly evolving language with unique features. Former C++ programmers who return after a hiatus struggle to reacclimate themselves to major features introduced in the meantime. Owing to its complexity and its limitations, hobbyists tend to choose languages other than C++. Owing to its niche specialization in high-performance applications, and its rapid evolution, few programmers who are not employed professionally as C++ developers can justify the time investment of keeping up to date with the language.
Binary Literacy 1, the predecessor to this class, contained a module on C++. However, every time we taught the material, we found that students – even the ones most excited to learn about C++ – struggled with it. Upon discussion and reflection, the fundamental issue was that students were generally unfamiliar with C++ as a language, and particularly, how C++ programmers use its features to develop real software. A student who doesn’t understand why programmers use virtual functions or templates, or what role multiple inheritance plays in software design, has little use for details of their implementation; they will struggle when encountering these constructs in binaries. These observations led to the design philosophy for this course. Students are not assumed to have experience programming in C++.
Most features of C++ that are not in C came about because of common situations in software development for which C offered poor solutions. For every feature of C++ that we cover, we discuss the limitations of C that led to the introduction of that feature, and we show examples of using it in the course of developing real software. Our blog entry about STL template type reconstruction shows an example of this educational approach.
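To make the "limitation of C, then the C++ feature" pattern concrete, here is a minimal sketch of our own (it is not taken from the course materials, and every name in it is hypothetical). The first half dispatches through a hand-maintained table of function pointers, as a C programmer would; the second half expresses the same design with a virtual function, where the compiler builds and installs the table instead.

```cpp
// --- C-style: dispatch through a hand-maintained table of function pointers ---
// All names here (CShape, CShapeOps, kSquareOps, ...) are hypothetical.
struct CShape;  // forward declaration so the table can refer to it

struct CShapeOps {
    double (*area)(const CShape *self);  // one slot per "method"
};

struct CShape {
    const CShapeOps *ops;  // the programmer must remember to initialize this
    double side;
};

static double c_square_area(const CShape *self) {
    return self->side * self->side;
}

static const CShapeOps kSquareOps = { c_square_area };

double area_c_style(const CShape &s) {
    return s.ops->area(&s);  // explicit indirect call through the table
}

// --- C++: the compiler generates and maintains the equivalent table ---
struct Shape {
    virtual ~Shape() = default;
    virtual double area() const = 0;
};

struct Square : Shape {
    explicit Square(double s) : side(s) {}
    double area() const override { return side * side; }
    double side;
};

double area_cpp_style(const Shape &s) {
    return s.area();  // the indirect call is implicit in the source
}
```

Calling `area_c_style` on a `CShape` initialized as `{ &kSquareOps, 3.0 }` and `area_cpp_style` on `Square(3.0)` computes the same value. The practical difference is who is responsible for keeping the table and the object in sync, and in a decompiler the two versions tend to look much alike.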
Most of the binaries, and the primary coverage, shall be drawn from Microsoft Visual C++ binaries compiled for Windows. Where other platforms or compilers differ substantially (such as for virtual function tables and multiple inheritance), we shall discuss those differences.
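As a small illustration of why multiple inheritance deserves separate attention, the sketch below (our own, with hypothetical class names) measures how far a pointer moves when a derived object is viewed through its second base class. Object layout is implementation-defined, so the code reports the offset rather than assuming a particular value. On mainstream compilers we would expect it to be nonzero for classes like these, but that is an observation about those compilers, not a guarantee of the language.

```cpp
#include <cstddef>

// Two unrelated polymorphic bases, each with its own data member.
struct Reader {
    virtual ~Reader() = default;
    virtual int read() { return 1; }
    int rstate = 0;
};

struct Writer {
    virtual ~Writer() = default;
    virtual int write() { return 2; }
    int wstate = 0;
};

// A class that inherits from both and overrides a function from each.
struct File : Reader, Writer {
    int read() override { return 10; }
    int write() override { return 20; }
};

// Returns the byte distance between the start of a File object and the
// Writer subobject inside it, as laid out by the current compiler.
std::ptrdiff_t second_base_offset() {
    File f;
    Writer *w = &f;  // implicit derived-to-base conversion may adjust the pointer
    return reinterpret_cast<const char *>(w) -
           reinterpret_cast<const char *>(&f);
}

// Converts back from the second base to the derived class; the compiler
// undoes whatever adjustment it applied above.
File *writer_to_file(Writer *w) {
    return static_cast<File *>(w);
}
```

When the offset is nonzero, every conversion between `File *` and `Writer *` compiles to a pointer adjustment, and calls to `write()` made through a `Writer *` must arrive at `File::write` with a suitably adjusted `this`. Those adjustments are the kind of detail that shows up in a decompiler listing and that differs between compilers.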
Elements of software design in C
Structures
Classes
Miscellaneous topics
Inheritance
Virtual functions
Multiple inheritance
As discussed above, C++ is a huge language and gains new features every few years. A week-long class is not enough to cover even all common features of C++ circa 2003, let alone in the 2020s.
Also as discussed, the goal of the course is to instill in students specific, practical skills for reverse engineering C++ binaries. For that reason, we opted to cover the features most common in real-world binaries in depth, rather than take time away from the core material for a superficial treatment of other features.
Therefore, the material on C++ templates and the STL has been removed, and we will not cover exception handling or virtual inheritance. As for more modern features, if they are not listed in the syllabus above, they will not be covered. Perhaps the future holds a Binary Literacy 3 to cover additional topics.