* [Qemu-devel] [PATCH v2 0/3] Fix exceptions handling for MIPS and i386
@ 2015-06-17 12:41 Pavel Dovgalyuk
  2015-06-17 12:42 ` [Qemu-devel] [PATCH v2 1/3] softmmu: add helper function to pass through retaddr Pavel Dovgalyuk
                   ` (4 more replies)
  0 siblings, 5 replies; 29+ messages in thread
From: Pavel Dovgalyuk @ 2015-06-17 12:41 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini, rth7680, leon.alrae, aurelien, pavel.dovgaluk

This set of patches fixes exception handling for MIPS and i386 targets.
These targets contain instructions whose exception handling breaks correct
execution in icount/TCG mode (MIPS) and in regular TCG mode (i386).

Incorrect execution on i386 is caused by exceptions raised by the MMU
functions. MMU helper functions are called both from generated code and from
other helper functions. In both cases they read the caller's return address
in order to restore the virtual CPU state.

When an MMU helper is called from another helper function
(e.g. helper_maskmov_xmm) through a cpu_st* function, the return address
points into that helper rather than into the TB. As a result, the CPU state
cannot be restored when an MMU fault occurs.

This bug can occur when a maskmov instruction is located in the middle of a
translation block.

Execution sequence for this example:

TB start:
PC1: instr1
     instr2
PC2: maskmov <page fault>
     <page fault processing>
PC1: instr1
     instr2
     maskmov

At the start of TB execution the guest PC points to instr1. When the page
fault occurs, QEMU tries to restore the guest PC (which should be equal to
PC2): it reads the host PC from the call stack and checks whether it points
into the TB. The bug in the ldst helpers yields a host PC that is not located
within the TB, so QEMU cannot recover the guest PC and it stays at PC1. After
the page fault is processed, QEMU restarts the TB and executes instr1 and
instr2 a second time, because the guest PC was not recovered.
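
A rough sketch of the failing path (abridged and simplified, not the exact
QEMU sources):

    /* target-i386/ops_sse.h, before this series (simplified) */
    void helper_maskmov_xmm(CPUX86State *env, ..., target_ulong a0)
    {
        ...
        cpu_stb_data(env, a0 + i, d->B(i));   /* TLB miss -> helper_stb_mmu() */
    }

    /* softmmu_template.h: helper_stb_mmu() recovers its caller's return
     * address with GETRA().  The caller here is helper_maskmov_xmm itself,
     * so that address is not inside any TB and cpu_restore_state() cannot
     * map it back to the guest PC of the maskmov instruction. */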

Bugs in the MIPS helper functions do not break execution in regular TCG mode,
because the PC is updated before calling the functions that can raise an
exception. The icount value, however, cannot be updated in advance this way,
so exceptions make execution in icount mode non-deterministic.
In icount mode every translation block looks as follows:

if icount < n then exit
icount -= n
instr1
instr2
...
instrn
exit

When one of these instructions raises an exception, icount has to be
restored: the number of instructions actually executed should be subtracted
from icount instead of the initial n.
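
A minimal sketch of the intended accounting (illustrative pseudocode only,
assuming the faulting instruction is re-executed after the fault is handled,
as for a TLB miss; n is the block size and k the index of the faulting
instruction):

    icount -= n;            /* TB prologue: charge all n instructions up front */
    /* instr k faults; instructions k..n did not complete */
    icount += n - (k - 1);  /* only the k-1 completed instructions stay charged */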

The tlb_fill function passes retaddr to raise_exception, which allows the
current instruction within the TB to be identified and the correct icount to
be computed.

When an exception is triggered by some other path (e.g. by a call to an
exception-raising helper embedded into the TB), the PC is not passed as
retaddr and the correct icount is not recovered. In such cases icount is
decreased by a value equal to the size of the whole TB.

This behavior leads to incorrect virtual clock values and to
non-deterministic execution of the code.

These patches pass a pointer into the translation block code to the exception
handlers, which allows the PC and icount values to be restored correctly.
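
The fix follows a single pattern throughout the series; e.g. in
target-i386/fpu_helper.c (patch 3 below):

    -    cpu_stq_data(env, ptr, temp.l.lower);
    +    cpu_stq_data_ra(env, ptr, temp.l.lower, retaddr);

where retaddr is GETPC() captured in the top-level helper and threaded down,
so that the MMU fault path can map the host return address back to the TB.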

v2 changes:
 * Added softmmu functions to pass the TB return address into memory operation handlers
 * Fixed memory operations handling for MIPS
 * Disabled updates of the PC that are overridden with cpu_restore_state
 * Fixed memory operations and exceptions invoked by i386 helpers

---

Pavel Dovgalyuk (3):
      softmmu: add helper function to pass through retaddr
      target-mips: exceptions handling in icount mode
      target-i386: fix memory operations in helpers


 include/exec/cpu_ldst_template.h |   42 ++
 include/exec/exec-all.h          |   27 +
 softmmu_template.h               |   18 +
 target-i386/cc_helper.c          |    2 
 target-i386/cpu.h                |    5 
 target-i386/excp_helper.c        |   23 +
 target-i386/fpu_helper.c         |  146 ++++----
 target-i386/helper.c             |    4 
 target-i386/int_helper.c         |   32 +-
 target-i386/mem_helper.c         |   39 +-
 target-i386/misc_helper.c        |   12 -
 target-i386/ops_sse.h            |    2 
 target-i386/seg_helper.c         |  712 +++++++++++++++++++-------------------
 target-i386/svm_helper.c         |    4 
 target-i386/translate.c          |   25 -
 target-mips/cpu.h                |   28 +
 target-mips/helper.h             |    1 
 target-mips/msa_helper.c         |    5 
 target-mips/op_helper.c          |  183 +++++-----
 target-mips/translate.c          |   46 +-
 20 files changed, 726 insertions(+), 630 deletions(-)

-- 
Pavel Dovgalyuk


* [Qemu-devel] [PATCH v2 1/3] softmmu: add helper function to pass through retaddr
  2015-06-17 12:41 [Qemu-devel] [PATCH v2 0/3] Fix exceptions handling for MIPS and i386 Pavel Dovgalyuk
@ 2015-06-17 12:42 ` Pavel Dovgalyuk
  2015-06-17 12:53   ` Paolo Bonzini
  2015-06-17 12:42 ` [Qemu-devel] [PATCH v2 2/3] target-mips: exceptions handling in icount mode Pavel Dovgalyuk
                   ` (3 subsequent siblings)
  4 siblings, 1 reply; 29+ messages in thread
From: Pavel Dovgalyuk @ 2015-06-17 12:42 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini, rth7680, leon.alrae, aurelien, pavel.dovgaluk

This patch introduces several helpers that pass a return address pointing
into the TB. A correct return address allows the guest PC and icount to be
restored properly. These functions should be used when helpers embedded into
a TB invoke memory operations.
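
Intended usage looks roughly like the following (helper_demo_load is a
made-up name for illustration):

    /* GETPC() yields the return address inside the calling TB; forwarding
     * it through the _ra variant lets cpu_restore_state() recover the
     * guest PC and icount if the access faults. */
    uint32_t helper_demo_load(CPUArchState *env, target_ulong addr)
    {
        return cpu_ldl_data_ra(env, addr, GETPC());
    }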

Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
---
 include/exec/cpu_ldst_template.h |   42 +++++++++++++++++++++++++++++++-------
 include/exec/exec-all.h          |   27 ++++++++++++++++++++++++
 softmmu_template.h               |   18 ++++++++++++++++
 3 files changed, 79 insertions(+), 8 deletions(-)

diff --git a/include/exec/cpu_ldst_template.h b/include/exec/cpu_ldst_template.h
index 95ab750..1847816 100644
--- a/include/exec/cpu_ldst_template.h
+++ b/include/exec/cpu_ldst_template.h
@@ -62,7 +62,9 @@
 /* generic load/store macros */
 
 static inline RES_TYPE
-glue(glue(cpu_ld, USUFFIX), MEMSUFFIX)(CPUArchState *env, target_ulong ptr)
+glue(glue(glue(cpu_ld, USUFFIX), MEMSUFFIX), _ra)(CPUArchState *env,
+                                                  target_ulong ptr,
+                                                  uintptr_t retaddr)
 {
     int page_index;
     RES_TYPE res;
@@ -74,7 +76,8 @@ glue(glue(cpu_ld, USUFFIX), MEMSUFFIX)(CPUArchState *env, target_ulong ptr)
     mmu_idx = CPU_MMU_INDEX;
     if (unlikely(env->tlb_table[mmu_idx][page_index].ADDR_READ !=
                  (addr & (TARGET_PAGE_MASK | (DATA_SIZE - 1))))) {
-        res = glue(glue(helper_ld, SUFFIX), MMUSUFFIX)(env, addr, mmu_idx);
+        res = glue(glue(helper_call_ld, SUFFIX), MMUSUFFIX)(env, addr,
+                                                            mmu_idx, retaddr);
     } else {
         uintptr_t hostaddr = addr + env->tlb_table[mmu_idx][page_index].addend;
         res = glue(glue(ld, USUFFIX), _p)((uint8_t *)hostaddr);
@@ -82,9 +85,17 @@ glue(glue(cpu_ld, USUFFIX), MEMSUFFIX)(CPUArchState *env, target_ulong ptr)
     return res;
 }
 
+static inline RES_TYPE
+glue(glue(cpu_ld, USUFFIX), MEMSUFFIX)(CPUArchState *env, target_ulong ptr)
+{
+    return glue(glue(glue(cpu_ld, USUFFIX), MEMSUFFIX), _ra)(env, ptr, 0);
+}
+
 #if DATA_SIZE <= 2
 static inline int
-glue(glue(cpu_lds, SUFFIX), MEMSUFFIX)(CPUArchState *env, target_ulong ptr)
+glue(glue(glue(cpu_lds, SUFFIX), MEMSUFFIX), _ra)(CPUArchState *env,
+                                                  target_ulong ptr,
+                                                  uintptr_t retaddr)
 {
     int res, page_index;
     target_ulong addr;
@@ -95,14 +106,20 @@ glue(glue(cpu_lds, SUFFIX), MEMSUFFIX)(CPUArchState *env, target_ulong ptr)
     mmu_idx = CPU_MMU_INDEX;
     if (unlikely(env->tlb_table[mmu_idx][page_index].ADDR_READ !=
                  (addr & (TARGET_PAGE_MASK | (DATA_SIZE - 1))))) {
-        res = (DATA_STYPE)glue(glue(helper_ld, SUFFIX),
-                               MMUSUFFIX)(env, addr, mmu_idx);
+        res = (DATA_STYPE)glue(glue(helper_call_ld, SUFFIX),
+                               MMUSUFFIX)(env, addr, mmu_idx, retaddr);
     } else {
         uintptr_t hostaddr = addr + env->tlb_table[mmu_idx][page_index].addend;
         res = glue(glue(lds, SUFFIX), _p)((uint8_t *)hostaddr);
     }
     return res;
 }
+
+static inline int
+glue(glue(cpu_lds, SUFFIX), MEMSUFFIX)(CPUArchState *env, target_ulong ptr)
+{
+    return glue(glue(glue(cpu_lds, SUFFIX), MEMSUFFIX), _ra)(env, ptr, 0);
+}
 #endif
 
 #ifndef SOFTMMU_CODE_ACCESS
@@ -110,8 +127,9 @@ glue(glue(cpu_lds, SUFFIX), MEMSUFFIX)(CPUArchState *env, target_ulong ptr)
 /* generic store macro */
 
 static inline void
-glue(glue(cpu_st, SUFFIX), MEMSUFFIX)(CPUArchState *env, target_ulong ptr,
-                                      RES_TYPE v)
+glue(glue(glue(cpu_st, SUFFIX), MEMSUFFIX), _ra)(CPUArchState *env,
+                                                 target_ulong ptr,
+                                                 RES_TYPE v, uintptr_t retaddr)
 {
     int page_index;
     target_ulong addr;
@@ -122,13 +140,21 @@ glue(glue(cpu_st, SUFFIX), MEMSUFFIX)(CPUArchState *env, target_ulong ptr,
     mmu_idx = CPU_MMU_INDEX;
     if (unlikely(env->tlb_table[mmu_idx][page_index].addr_write !=
                  (addr & (TARGET_PAGE_MASK | (DATA_SIZE - 1))))) {
-        glue(glue(helper_st, SUFFIX), MMUSUFFIX)(env, addr, v, mmu_idx);
+        glue(glue(helper_call_st, SUFFIX), MMUSUFFIX)(env, addr, v, mmu_idx,
+                                                      retaddr);
     } else {
         uintptr_t hostaddr = addr + env->tlb_table[mmu_idx][page_index].addend;
         glue(glue(st, SUFFIX), _p)((uint8_t *)hostaddr, v);
     }
 }
 
+static inline void
+glue(glue(cpu_st, SUFFIX), MEMSUFFIX)(CPUArchState *env, target_ulong ptr,
+                                      RES_TYPE v)
+{
+    glue(glue(glue(cpu_st, SUFFIX), MEMSUFFIX), _ra)(env, ptr, v, 0);
+}
+
 #endif /* !SOFTMMU_CODE_ACCESS */
 
 #undef RES_TYPE
diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
index 856e698..b3aefde 100644
--- a/include/exec/exec-all.h
+++ b/include/exec/exec-all.h
@@ -350,6 +350,33 @@ struct MemoryRegion *iotlb_to_region(CPUState *cpu,
 void tlb_fill(CPUState *cpu, target_ulong addr, int is_write, int mmu_idx,
               uintptr_t retaddr);
 
+uint8_t helper_call_ldb_cmmu(CPUArchState *env, target_ulong addr,
+                             int mmu_idx, uintptr_t retaddr);
+uint16_t helper_call_ldw_cmmu(CPUArchState *env, target_ulong addr,
+                              int mmu_idx, uintptr_t retaddr);
+uint32_t helper_call_ldl_cmmu(CPUArchState *env, target_ulong addr,
+                              int mmu_idx, uintptr_t retaddr);
+uint64_t helper_call_ldq_cmmu(CPUArchState *env, target_ulong addr,
+                              int mmu_idx, uintptr_t retaddr);
+
+uint8_t helper_call_ldb_mmu(CPUArchState *env, target_ulong addr,
+                            int mmu_idx, uintptr_t retaddr);
+uint16_t helper_call_ldw_mmu(CPUArchState *env, target_ulong addr,
+                             int mmu_idx, uintptr_t retaddr);
+uint32_t helper_call_ldl_mmu(CPUArchState *env, target_ulong addr,
+                             int mmu_idx, uintptr_t retaddr);
+uint64_t helper_call_ldq_mmu(CPUArchState *env, target_ulong addr,
+                             int mmu_idx, uintptr_t retaddr);
+
+void helper_call_stb_mmu(CPUArchState *env, target_ulong addr,
+                         uint8_t val, int mmu_idx, uintptr_t retaddr);
+void helper_call_stw_mmu(CPUArchState *env, target_ulong addr,
+                         uint16_t val, int mmu_idx, uintptr_t retaddr);
+void helper_call_stl_mmu(CPUArchState *env, target_ulong addr,
+                         uint32_t val, int mmu_idx, uintptr_t retaddr);
+void helper_call_stq_mmu(CPUArchState *env, target_ulong addr,
+                         uint64_t val, int mmu_idx, uintptr_t retaddr);
+
 #endif
 
 #if defined(CONFIG_USER_ONLY)
diff --git a/softmmu_template.h b/softmmu_template.h
index 39f571b..7d267b4 100644
--- a/softmmu_template.h
+++ b/softmmu_template.h
@@ -343,6 +343,15 @@ glue(glue(helper_ld, SUFFIX), MMUSUFFIX)(CPUArchState *env, target_ulong addr,
     return helper_te_ld_name (env, addr, oi, GETRA());
 }
 
+DATA_TYPE
+glue(glue(helper_call_ld, SUFFIX), MMUSUFFIX)(CPUArchState *env,
+                                              target_ulong addr,
+                                              int mmu_idx,
+                                              uintptr_t retaddr)
+{
+    return helper_te_ld_name(env, addr, mmu_idx, retaddr);
+}
+
 #ifndef SOFTMMU_CODE_ACCESS
 
 /* Provide signed versions of the load routines as well.  We can of course
@@ -548,6 +557,15 @@ glue(glue(helper_st, SUFFIX), MMUSUFFIX)(CPUArchState *env, target_ulong addr,
     helper_te_st_name(env, addr, val, oi, GETRA());
 }
 
+void
+glue(glue(helper_call_st, SUFFIX), MMUSUFFIX)(CPUArchState *env,
+                                              target_ulong addr,
+                                              DATA_TYPE val, int mmu_idx,
+                                              uintptr_t retaddr)
+{
+    helper_te_st_name(env, addr, val, mmu_idx, retaddr);
+}
+
 #endif /* !defined(SOFTMMU_CODE_ACCESS) */
 
 #undef READ_ACCESS_TYPE


* [Qemu-devel] [PATCH v2 2/3] target-mips: exceptions handling in icount mode
  2015-06-17 12:41 [Qemu-devel] [PATCH v2 0/3] Fix exceptions handling for MIPS and i386 Pavel Dovgalyuk
  2015-06-17 12:42 ` [Qemu-devel] [PATCH v2 1/3] softmmu: add helper function to pass through retaddr Pavel Dovgalyuk
@ 2015-06-17 12:42 ` Pavel Dovgalyuk
  2015-06-17 13:05   ` Aurelien Jarno
  2015-06-17 12:42 ` [Qemu-devel] [PATCH v2 3/3] target-i386: fix memory operations in helpers Pavel Dovgalyuk
                   ` (2 subsequent siblings)
  4 siblings, 1 reply; 29+ messages in thread
From: Pavel Dovgalyuk @ 2015-06-17 12:42 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini, rth7680, leon.alrae, aurelien, pavel.dovgaluk

This patch fixes exception handling for MIPS.
Instructions can generate several types of exceptions.
When an exception is raised, it interrupts execution of the current
translation block. The existing exception handling implementation does not
correctly restore icount for the instruction that caused the exception; in
most cases icount is decreased by a value equal to the size of the TB.
This patch passes a pointer to the translation block internals to the
exception handlers, which allows the icount value to be restored correctly.
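
The recurring change in op_helper.c, taken from the diff below, has the form:

    -        helper_raise_exception(env, EXCP_RI);
    +        do_raise_exception(env, EXCP_RI, GETPC());

so that the exception path receives the host return address inside the TB and
cpu_restore_state() can fix up both the guest PC and icount.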

Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
---
 target-mips/cpu.h        |   28 +++++++
 target-mips/helper.h     |    1 
 target-mips/msa_helper.c |    5 +
 target-mips/op_helper.c  |  183 ++++++++++++++++++++++------------------------
 target-mips/translate.c  |   46 ++++++------
 5 files changed, 141 insertions(+), 122 deletions(-)

diff --git a/target-mips/cpu.h b/target-mips/cpu.h
index f9d2b4c..70ba39a 100644
--- a/target-mips/cpu.h
+++ b/target-mips/cpu.h
@@ -1015,4 +1015,32 @@ static inline void cpu_mips_store_cause(CPUMIPSState *env, target_ulong val)
 }
 #endif
 
+static inline void QEMU_NORETURN do_raise_exception_err(CPUMIPSState *env,
+                                                        uint32_t exception,
+                                                        int error_code,
+                                                        uintptr_t pc)
+{
+    CPUState *cs = CPU(mips_env_get_cpu(env));
+
+    if (exception < EXCP_SC) {
+        qemu_log("%s: %d %d\n", __func__, exception, error_code);
+    }
+    cs->exception_index = exception;
+    env->error_code = error_code;
+
+    if (pc) {
+        /* now we have a real cpu fault */
+        cpu_restore_state(cs, pc);
+    }
+
+    cpu_loop_exit(cs);
+}
+
+static inline void QEMU_NORETURN do_raise_exception(CPUMIPSState *env,
+                                                    uint32_t exception,
+                                                    uintptr_t pc)
+{
+    do_raise_exception_err(env, exception, 0, pc);
+}
+
 #endif /* !defined (__MIPS_CPU_H__) */
diff --git a/target-mips/helper.h b/target-mips/helper.h
index 3bd0b02..50c699e 100644
--- a/target-mips/helper.h
+++ b/target-mips/helper.h
@@ -1,5 +1,6 @@
 DEF_HELPER_3(raise_exception_err, noreturn, env, i32, int)
 DEF_HELPER_2(raise_exception, noreturn, env, i32)
+DEF_HELPER_1(raise_exception_debug, noreturn, env)
 
 #ifdef TARGET_MIPS64
 DEF_HELPER_4(sdl, void, env, tl, tl, int)
diff --git a/target-mips/msa_helper.c b/target-mips/msa_helper.c
index 26ffdc7..f7bc710 100644
--- a/target-mips/msa_helper.c
+++ b/target-mips/msa_helper.c
@@ -1352,7 +1352,7 @@ void helper_msa_ctcmsa(CPUMIPSState *env, target_ulong elm, uint32_t cd)
         /* check exception */
         if ((GET_FP_ENABLE(env->active_tc.msacsr) | FP_UNIMPLEMENTED)
             & GET_FP_CAUSE(env->active_tc.msacsr)) {
-            helper_raise_exception(env, EXCP_MSAFPE);
+            do_raise_exception(env, EXCP_MSAFPE, GETPC());
         }
         break;
     }
@@ -1512,7 +1512,8 @@ static inline void check_msacsr_cause(CPUMIPSState *env)
         UPDATE_FP_FLAGS(env->active_tc.msacsr,
                 GET_FP_CAUSE(env->active_tc.msacsr));
     } else {
-        helper_raise_exception(env, EXCP_MSAFPE);
+        /* Will work only when check_msacsr_cause is actually inlined */
+        do_raise_exception(env, EXCP_MSAFPE, GETPC());
     }
 }
 
diff --git a/target-mips/op_helper.c b/target-mips/op_helper.c
index 73a8e45..2815c60 100644
--- a/target-mips/op_helper.c
+++ b/target-mips/op_helper.c
@@ -30,41 +30,23 @@ static inline void cpu_mips_tlb_flush (CPUMIPSState *env, int flush_global);
 /*****************************************************************************/
 /* Exceptions processing helpers */
 
-static inline void QEMU_NORETURN do_raise_exception_err(CPUMIPSState *env,
-                                                        uint32_t exception,
-                                                        int error_code,
-                                                        uintptr_t pc)
+void helper_raise_exception_err(CPUMIPSState *env, uint32_t exception,
+                                int error_code)
 {
-    CPUState *cs = CPU(mips_env_get_cpu(env));
-
-    if (exception < EXCP_SC) {
-        qemu_log("%s: %d %d\n", __func__, exception, error_code);
-    }
-    cs->exception_index = exception;
-    env->error_code = error_code;
-
-    if (pc) {
-        /* now we have a real cpu fault */
-        cpu_restore_state(cs, pc);
-    }
-
-    cpu_loop_exit(cs);
+    do_raise_exception_err(env, exception, error_code, GETPC());
 }
 
-static inline void QEMU_NORETURN do_raise_exception(CPUMIPSState *env,
-                                                    uint32_t exception,
-                                                    uintptr_t pc)
+void helper_raise_exception(CPUMIPSState *env, uint32_t exception)
 {
-    do_raise_exception_err(env, exception, 0, pc);
+    do_raise_exception(env, exception, GETPC());
 }
 
-void helper_raise_exception_err(CPUMIPSState *env, uint32_t exception,
-                                int error_code)
+void helper_raise_exception_debug(CPUMIPSState *env)
 {
-    do_raise_exception_err(env, exception, error_code, 0);
+    do_raise_exception(env, EXCP_DEBUG, 0);
 }
 
-void helper_raise_exception(CPUMIPSState *env, uint32_t exception)
+static void raise_exception(CPUMIPSState *env, uint32_t exception)
 {
     do_raise_exception(env, exception, 0);
 }
@@ -72,21 +54,21 @@ void helper_raise_exception(CPUMIPSState *env, uint32_t exception)
 #if defined(CONFIG_USER_ONLY)
 #define HELPER_LD(name, insn, type)                                     \
 static inline type do_##name(CPUMIPSState *env, target_ulong addr,      \
-                             int mem_idx)                               \
+                             int mem_idx, uintptr_t retaddr)            \
 {                                                                       \
-    return (type) cpu_##insn##_data(env, addr);                         \
+    return (type) cpu_##insn##_data_ra(env, addr, retaddr);             \
 }
 #else
 #define HELPER_LD(name, insn, type)                                     \
 static inline type do_##name(CPUMIPSState *env, target_ulong addr,      \
-                             int mem_idx)                               \
+                             int mem_idx, uintptr_t retaddr)            \
 {                                                                       \
     switch (mem_idx)                                                    \
     {                                                                   \
-    case 0: return (type) cpu_##insn##_kernel(env, addr); break;        \
-    case 1: return (type) cpu_##insn##_super(env, addr); break;         \
+    case 0: return (type) cpu_##insn##_kernel_ra(env, addr, retaddr);   \
+    case 1: return (type) cpu_##insn##_super_ra(env, addr, retaddr);    \
     default:                                                            \
-    case 2: return (type) cpu_##insn##_user(env, addr); break;          \
+    case 2: return (type) cpu_##insn##_user_ra(env, addr, retaddr);     \
     }                                                                   \
 }
 #endif
@@ -106,14 +88,14 @@ static inline void do_##name(CPUMIPSState *env, target_ulong addr,      \
 #else
 #define HELPER_ST(name, insn, type)                                     \
 static inline void do_##name(CPUMIPSState *env, target_ulong addr,      \
-                             type val, int mem_idx)                     \
+                             type val, int mem_idx, uintptr_t retaddr)  \
 {                                                                       \
     switch (mem_idx)                                                    \
     {                                                                   \
-    case 0: cpu_##insn##_kernel(env, addr, val); break;                 \
-    case 1: cpu_##insn##_super(env, addr, val); break;                  \
+    case 0: cpu_##insn##_kernel_ra(env, addr, val, retaddr); break;     \
+    case 1: cpu_##insn##_super_ra(env, addr, val, retaddr); break;      \
     default:                                                            \
-    case 2: cpu_##insn##_user(env, addr, val); break;                   \
+    case 2: cpu_##insn##_user_ra(env, addr, val, retaddr); break;       \
     }                                                                   \
 }
 #endif
@@ -291,14 +273,19 @@ target_ulong helper_bitswap(target_ulong rt)
 
 static inline hwaddr do_translate_address(CPUMIPSState *env,
                                                       target_ulong address,
-                                                      int rw)
+                                                      int rw, uintptr_t retaddr)
 {
     hwaddr lladdr;
+    CPUState *cs = CPU(mips_env_get_cpu(env));
 
     lladdr = cpu_mips_translate_address(env, address, rw);
 
     if (lladdr == -1LL) {
-        cpu_loop_exit(CPU(mips_env_get_cpu(env)));
+        if (retaddr) {
+            /* now we have a real cpu fault */
+            cpu_restore_state(cs, retaddr);
+        }
+        cpu_loop_exit(cs);
     } else {
         return lladdr;
     }
@@ -309,10 +296,10 @@ target_ulong helper_##name(CPUMIPSState *env, target_ulong arg, int mem_idx)  \
 {                                                                             \
     if (arg & almask) {                                                       \
         env->CP0_BadVAddr = arg;                                              \
-        helper_raise_exception(env, EXCP_AdEL);                               \
+        do_raise_exception(env, EXCP_AdEL, GETPC());                          \
     }                                                                         \
-    env->lladdr = do_translate_address(env, arg, 0);                          \
-    env->llval = do_##insn(env, arg, mem_idx);                                \
+    env->lladdr = do_translate_address(env, arg, 0, GETPC());                 \
+    env->llval = do_##insn(env, arg, mem_idx, GETPC());                       \
     return env->llval;                                                        \
 }
 HELPER_LD_ATOMIC(ll, lw, 0x3)
@@ -329,12 +316,12 @@ target_ulong helper_##name(CPUMIPSState *env, target_ulong arg1,              \
                                                                               \
     if (arg2 & almask) {                                                      \
         env->CP0_BadVAddr = arg2;                                             \
-        helper_raise_exception(env, EXCP_AdES);                               \
+        do_raise_exception(env, EXCP_AdES, GETPC());                          \
     }                                                                         \
-    if (do_translate_address(env, arg2, 1) == env->lladdr) {                  \
-        tmp = do_##ld_insn(env, arg2, mem_idx);                               \
+    if (do_translate_address(env, arg2, 1, GETPC()) == env->lladdr) {         \
+        tmp = do_##ld_insn(env, arg2, mem_idx, GETPC());                      \
         if (tmp == env->llval) {                                              \
-            do_##st_insn(env, arg2, arg1, mem_idx);                           \
+            do_##st_insn(env, arg2, arg1, mem_idx, GETPC());                  \
             return 1;                                                         \
         }                                                                     \
     }                                                                         \
@@ -358,31 +345,31 @@ HELPER_ST_ATOMIC(scd, ld, sd, 0x7)
 void helper_swl(CPUMIPSState *env, target_ulong arg1, target_ulong arg2,
                 int mem_idx)
 {
-    do_sb(env, arg2, (uint8_t)(arg1 >> 24), mem_idx);
+    do_sb(env, arg2, (uint8_t)(arg1 >> 24), mem_idx, GETPC());
 
     if (GET_LMASK(arg2) <= 2)
-        do_sb(env, GET_OFFSET(arg2, 1), (uint8_t)(arg1 >> 16), mem_idx);
+        do_sb(env, GET_OFFSET(arg2, 1), (uint8_t)(arg1 >> 16), mem_idx, GETPC());
 
     if (GET_LMASK(arg2) <= 1)
-        do_sb(env, GET_OFFSET(arg2, 2), (uint8_t)(arg1 >> 8), mem_idx);
+        do_sb(env, GET_OFFSET(arg2, 2), (uint8_t)(arg1 >> 8), mem_idx, GETPC());
 
     if (GET_LMASK(arg2) == 0)
-        do_sb(env, GET_OFFSET(arg2, 3), (uint8_t)arg1, mem_idx);
+        do_sb(env, GET_OFFSET(arg2, 3), (uint8_t)arg1, mem_idx, GETPC());
 }
 
 void helper_swr(CPUMIPSState *env, target_ulong arg1, target_ulong arg2,
                 int mem_idx)
 {
-    do_sb(env, arg2, (uint8_t)arg1, mem_idx);
+    do_sb(env, arg2, (uint8_t)arg1, mem_idx, GETPC());
 
     if (GET_LMASK(arg2) >= 1)
-        do_sb(env, GET_OFFSET(arg2, -1), (uint8_t)(arg1 >> 8), mem_idx);
+        do_sb(env, GET_OFFSET(arg2, -1), (uint8_t)(arg1 >> 8), mem_idx, GETPC());
 
     if (GET_LMASK(arg2) >= 2)
-        do_sb(env, GET_OFFSET(arg2, -2), (uint8_t)(arg1 >> 16), mem_idx);
+        do_sb(env, GET_OFFSET(arg2, -2), (uint8_t)(arg1 >> 16), mem_idx, GETPC());
 
     if (GET_LMASK(arg2) == 3)
-        do_sb(env, GET_OFFSET(arg2, -3), (uint8_t)(arg1 >> 24), mem_idx);
+        do_sb(env, GET_OFFSET(arg2, -3), (uint8_t)(arg1 >> 24), mem_idx, GETPC());
 }
 
 #if defined(TARGET_MIPS64)
@@ -398,55 +385,55 @@ void helper_swr(CPUMIPSState *env, target_ulong arg1, target_ulong arg2,
 void helper_sdl(CPUMIPSState *env, target_ulong arg1, target_ulong arg2,
                 int mem_idx)
 {
-    do_sb(env, arg2, (uint8_t)(arg1 >> 56), mem_idx);
+    do_sb(env, arg2, (uint8_t)(arg1 >> 56), mem_idx, GETPC());
 
     if (GET_LMASK64(arg2) <= 6)
-        do_sb(env, GET_OFFSET(arg2, 1), (uint8_t)(arg1 >> 48), mem_idx);
+        do_sb(env, GET_OFFSET(arg2, 1), (uint8_t)(arg1 >> 48), mem_idx, GETPC());
 
     if (GET_LMASK64(arg2) <= 5)
-        do_sb(env, GET_OFFSET(arg2, 2), (uint8_t)(arg1 >> 40), mem_idx);
+        do_sb(env, GET_OFFSET(arg2, 2), (uint8_t)(arg1 >> 40), mem_idx, GETPC());
 
     if (GET_LMASK64(arg2) <= 4)
-        do_sb(env, GET_OFFSET(arg2, 3), (uint8_t)(arg1 >> 32), mem_idx);
+        do_sb(env, GET_OFFSET(arg2, 3), (uint8_t)(arg1 >> 32), mem_idx, GETPC());
 
     if (GET_LMASK64(arg2) <= 3)
-        do_sb(env, GET_OFFSET(arg2, 4), (uint8_t)(arg1 >> 24), mem_idx);
+        do_sb(env, GET_OFFSET(arg2, 4), (uint8_t)(arg1 >> 24), mem_idx, GETPC());
 
     if (GET_LMASK64(arg2) <= 2)
-        do_sb(env, GET_OFFSET(arg2, 5), (uint8_t)(arg1 >> 16), mem_idx);
+        do_sb(env, GET_OFFSET(arg2, 5), (uint8_t)(arg1 >> 16), mem_idx, GETPC());
 
     if (GET_LMASK64(arg2) <= 1)
-        do_sb(env, GET_OFFSET(arg2, 6), (uint8_t)(arg1 >> 8), mem_idx);
+        do_sb(env, GET_OFFSET(arg2, 6), (uint8_t)(arg1 >> 8), mem_idx, GETPC());
 
     if (GET_LMASK64(arg2) <= 0)
-        do_sb(env, GET_OFFSET(arg2, 7), (uint8_t)arg1, mem_idx);
+        do_sb(env, GET_OFFSET(arg2, 7), (uint8_t)arg1, mem_idx, GETPC());
 }
 
 void helper_sdr(CPUMIPSState *env, target_ulong arg1, target_ulong arg2,
                 int mem_idx)
 {
-    do_sb(env, arg2, (uint8_t)arg1, mem_idx);
+    do_sb(env, arg2, (uint8_t)arg1, mem_idx, GETPC());
 
     if (GET_LMASK64(arg2) >= 1)
-        do_sb(env, GET_OFFSET(arg2, -1), (uint8_t)(arg1 >> 8), mem_idx);
+        do_sb(env, GET_OFFSET(arg2, -1), (uint8_t)(arg1 >> 8), mem_idx, GETPC());
 
     if (GET_LMASK64(arg2) >= 2)
-        do_sb(env, GET_OFFSET(arg2, -2), (uint8_t)(arg1 >> 16), mem_idx);
+        do_sb(env, GET_OFFSET(arg2, -2), (uint8_t)(arg1 >> 16), mem_idx, GETPC());
 
     if (GET_LMASK64(arg2) >= 3)
-        do_sb(env, GET_OFFSET(arg2, -3), (uint8_t)(arg1 >> 24), mem_idx);
+        do_sb(env, GET_OFFSET(arg2, -3), (uint8_t)(arg1 >> 24), mem_idx, GETPC());
 
     if (GET_LMASK64(arg2) >= 4)
-        do_sb(env, GET_OFFSET(arg2, -4), (uint8_t)(arg1 >> 32), mem_idx);
+        do_sb(env, GET_OFFSET(arg2, -4), (uint8_t)(arg1 >> 32), mem_idx, GETPC());
 
     if (GET_LMASK64(arg2) >= 5)
-        do_sb(env, GET_OFFSET(arg2, -5), (uint8_t)(arg1 >> 40), mem_idx);
+        do_sb(env, GET_OFFSET(arg2, -5), (uint8_t)(arg1 >> 40), mem_idx, GETPC());
 
     if (GET_LMASK64(arg2) >= 6)
-        do_sb(env, GET_OFFSET(arg2, -6), (uint8_t)(arg1 >> 48), mem_idx);
+        do_sb(env, GET_OFFSET(arg2, -6), (uint8_t)(arg1 >> 48), mem_idx, GETPC());
 
     if (GET_LMASK64(arg2) == 7)
-        do_sb(env, GET_OFFSET(arg2, -7), (uint8_t)(arg1 >> 56), mem_idx);
+        do_sb(env, GET_OFFSET(arg2, -7), (uint8_t)(arg1 >> 56), mem_idx, GETPC());
 }
 #endif /* TARGET_MIPS64 */
 
@@ -463,13 +450,13 @@ void helper_lwm(CPUMIPSState *env, target_ulong addr, target_ulong reglist,
 
         for (i = 0; i < base_reglist; i++) {
             env->active_tc.gpr[multiple_regs[i]] =
-                (target_long)do_lw(env, addr, mem_idx);
+                (target_long)do_lw(env, addr, mem_idx, GETPC());
             addr += 4;
         }
     }
 
     if (do_r31) {
-        env->active_tc.gpr[31] = (target_long)do_lw(env, addr, mem_idx);
+        env->active_tc.gpr[31] = (target_long)do_lw(env, addr, mem_idx, GETPC());
     }
 }
 
@@ -483,13 +470,13 @@ void helper_swm(CPUMIPSState *env, target_ulong addr, target_ulong reglist,
         target_ulong i;
 
         for (i = 0; i < base_reglist; i++) {
-            do_sw(env, addr, env->active_tc.gpr[multiple_regs[i]], mem_idx);
+            do_sw(env, addr, env->active_tc.gpr[multiple_regs[i]], mem_idx, GETPC());
             addr += 4;
         }
     }
 
     if (do_r31) {
-        do_sw(env, addr, env->active_tc.gpr[31], mem_idx);
+        do_sw(env, addr, env->active_tc.gpr[31], mem_idx, GETPC());
     }
 }
 
@@ -504,7 +491,7 @@ void helper_ldm(CPUMIPSState *env, target_ulong addr, target_ulong reglist,
         target_ulong i;
 
         for (i = 0; i < base_reglist; i++) {
-            env->active_tc.gpr[multiple_regs[i]] = do_ld(env, addr, mem_idx);
+            env->active_tc.gpr[multiple_regs[i]] = do_ld(env, addr, mem_idx, GETPC());
             addr += 8;
         }
     }
@@ -524,13 +511,13 @@ void helper_sdm(CPUMIPSState *env, target_ulong addr, target_ulong reglist,
         target_ulong i;
 
         for (i = 0; i < base_reglist; i++) {
-            do_sd(env, addr, env->active_tc.gpr[multiple_regs[i]], mem_idx);
+            do_sd(env, addr, env->active_tc.gpr[multiple_regs[i]], mem_idx, GETPC());
             addr += 8;
         }
     }
 
     if (do_r31) {
-        do_sd(env, addr, env->active_tc.gpr[31], mem_idx);
+        do_sd(env, addr, env->active_tc.gpr[31], mem_idx, GETPC());
     }
 }
 #endif
@@ -1787,13 +1774,13 @@ target_ulong helper_yield(CPUMIPSState *env, target_ulong arg)
                 env->active_tc.CP0_TCStatus & (1 << CP0TCSt_DT)) {
                 env->CP0_VPEControl &= ~(0x7 << CP0VPECo_EXCPT);
                 env->CP0_VPEControl |= 4 << CP0VPECo_EXCPT;
-                helper_raise_exception(env, EXCP_THREAD);
+                do_raise_exception(env, EXCP_THREAD, GETPC());
             }
         }
     } else if (arg1 == 0) {
         if (0 /* TODO: TC underflow */) {
             env->CP0_VPEControl &= ~(0x7 << CP0VPECo_EXCPT);
-            helper_raise_exception(env, EXCP_THREAD);
+            do_raise_exception(env, EXCP_THREAD, GETPC());
         } else {
             // TODO: Deallocate TC
         }
@@ -1801,7 +1788,7 @@ target_ulong helper_yield(CPUMIPSState *env, target_ulong arg)
         /* Yield qualifier inputs not implemented. */
         env->CP0_VPEControl &= ~(0x7 << CP0VPECo_EXCPT);
         env->CP0_VPEControl |= 2 << CP0VPECo_EXCPT;
-        helper_raise_exception(env, EXCP_THREAD);
+        do_raise_exception(env, EXCP_THREAD, GETPC());
     }
     return env->CP0_YQMask;
 }
@@ -2131,7 +2118,7 @@ target_ulong helper_rdhwr_cpunum(CPUMIPSState *env)
         (env->CP0_HWREna & (1 << 0)))
         return env->CP0_EBase & 0x3ff;
     else
-        helper_raise_exception(env, EXCP_RI);
+        do_raise_exception(env, EXCP_RI, GETPC());
 
     return 0;
 }
@@ -2142,7 +2129,7 @@ target_ulong helper_rdhwr_synci_step(CPUMIPSState *env)
         (env->CP0_HWREna & (1 << 1)))
         return env->SYNCI_Step;
     else
-        helper_raise_exception(env, EXCP_RI);
+        do_raise_exception(env, EXCP_RI, GETPC());
 
     return 0;
 }
@@ -2153,7 +2140,7 @@ target_ulong helper_rdhwr_cc(CPUMIPSState *env)
         (env->CP0_HWREna & (1 << 2)))
         return env->CP0_Count;
     else
-        helper_raise_exception(env, EXCP_RI);
+        do_raise_exception(env, EXCP_RI, GETPC());
 
     return 0;
 }
@@ -2164,7 +2151,7 @@ target_ulong helper_rdhwr_ccres(CPUMIPSState *env)
         (env->CP0_HWREna & (1 << 3)))
         return env->CCRes;
     else
-        helper_raise_exception(env, EXCP_RI);
+        do_raise_exception(env, EXCP_RI, GETPC());
 
     return 0;
 }
@@ -2201,7 +2188,9 @@ void helper_wait(CPUMIPSState *env)
 
     cs->halted = 1;
     cpu_reset_interrupt(cs, CPU_INTERRUPT_WAKE);
-    helper_raise_exception(env, EXCP_HLT);
+    /* Last instruction in the block, PC was updated before
+       - no need to recover PC and icount */
+    raise_exception(env, EXCP_HLT);
 }
 
 #if !defined(CONFIG_USER_ONLY)
@@ -2262,9 +2251,9 @@ void mips_cpu_unassigned_access(CPUState *cs, hwaddr addr,
     }
 
     if (is_exec) {
-        helper_raise_exception(env, EXCP_IBE);
+        raise_exception(env, EXCP_IBE);
     } else {
-        helper_raise_exception(env, EXCP_DBE);
+        raise_exception(env, EXCP_DBE);
     }
 }
 #endif /* !CONFIG_USER_ONLY */
@@ -2299,7 +2288,7 @@ target_ulong helper_cfc1(CPUMIPSState *env, uint32_t reg)
                 arg1 = (int32_t)
                        ((env->CP0_Status & (1  << CP0St_FR)) >> CP0St_FR);
             } else {
-                helper_raise_exception(env, EXCP_RI);
+                do_raise_exception(env, EXCP_RI, GETPC());
             }
         }
         break;
@@ -2332,7 +2321,7 @@ void helper_ctc1(CPUMIPSState *env, target_ulong arg1, uint32_t fs, uint32_t rt)
             env->CP0_Status &= ~(1 << CP0St_FR);
             compute_hflags(env);
         } else {
-            helper_raise_exception(env, EXCP_RI);
+            do_raise_exception(env, EXCP_RI, GETPC());
         }
         break;
     case 4:
@@ -2344,7 +2333,7 @@ void helper_ctc1(CPUMIPSState *env, target_ulong arg1, uint32_t fs, uint32_t rt)
             env->CP0_Status |= (1 << CP0St_FR);
             compute_hflags(env);
         } else {
-            helper_raise_exception(env, EXCP_RI);
+            do_raise_exception(env, EXCP_RI, GETPC());
         }
         break;
     case 25:
@@ -3569,25 +3558,25 @@ void helper_msa_ld_df(CPUMIPSState *env, uint32_t df, uint32_t wd, uint32_t rs,
     case DF_BYTE:
         for (i = 0; i < DF_ELEMENTS(DF_BYTE); i++) {
             pwd->b[i] = do_lbu(env, addr + (i << DF_BYTE),
-                                env->hflags & MIPS_HFLAG_KSU);
+                                env->hflags & MIPS_HFLAG_KSU, GETPC());
         }
         break;
     case DF_HALF:
         for (i = 0; i < DF_ELEMENTS(DF_HALF); i++) {
             pwd->h[i] = do_lhu(env, addr + (i << DF_HALF),
-                                env->hflags & MIPS_HFLAG_KSU);
+                                env->hflags & MIPS_HFLAG_KSU, GETPC());
         }
         break;
     case DF_WORD:
         for (i = 0; i < DF_ELEMENTS(DF_WORD); i++) {
             pwd->w[i] = do_lw(env, addr + (i << DF_WORD),
-                                env->hflags & MIPS_HFLAG_KSU);
+                                env->hflags & MIPS_HFLAG_KSU, GETPC());
         }
         break;
     case DF_DOUBLE:
         for (i = 0; i < DF_ELEMENTS(DF_DOUBLE); i++) {
             pwd->d[i] = do_ld(env, addr + (i << DF_DOUBLE),
-                                env->hflags & MIPS_HFLAG_KSU);
+                                env->hflags & MIPS_HFLAG_KSU, GETPC());
         }
         break;
     }
@@ -3604,25 +3593,25 @@ void helper_msa_st_df(CPUMIPSState *env, uint32_t df, uint32_t wd, uint32_t rs,
     case DF_BYTE:
         for (i = 0; i < DF_ELEMENTS(DF_BYTE); i++) {
             do_sb(env, addr + (i << DF_BYTE), pwd->b[i],
-                    env->hflags & MIPS_HFLAG_KSU);
+                    env->hflags & MIPS_HFLAG_KSU, GETPC());
         }
         break;
     case DF_HALF:
         for (i = 0; i < DF_ELEMENTS(DF_HALF); i++) {
             do_sh(env, addr + (i << DF_HALF), pwd->h[i],
-                    env->hflags & MIPS_HFLAG_KSU);
+                    env->hflags & MIPS_HFLAG_KSU, GETPC());
         }
         break;
     case DF_WORD:
         for (i = 0; i < DF_ELEMENTS(DF_WORD); i++) {
             do_sw(env, addr + (i << DF_WORD), pwd->w[i],
-                    env->hflags & MIPS_HFLAG_KSU);
+                    env->hflags & MIPS_HFLAG_KSU, GETPC());
         }
         break;
     case DF_DOUBLE:
         for (i = 0; i < DF_ELEMENTS(DF_DOUBLE); i++) {
             do_sd(env, addr + (i << DF_DOUBLE), pwd->d[i],
-                    env->hflags & MIPS_HFLAG_KSU);
+                    env->hflags & MIPS_HFLAG_KSU, GETPC());
         }
         break;
     }
diff --git a/target-mips/translate.c b/target-mips/translate.c
index fd063a2..0de9244 100644
--- a/target-mips/translate.c
+++ b/target-mips/translate.c
@@ -1673,7 +1673,7 @@ generate_exception_err (DisasContext *ctx, int excp, int err)
 {
     TCGv_i32 texcp = tcg_const_i32(excp);
     TCGv_i32 terr = tcg_const_i32(err);
-    save_cpu_state(ctx, 1);
+    save_cpu_state(ctx, 0);
     gen_helper_raise_exception_err(cpu_env, texcp, terr);
     tcg_temp_free_i32(terr);
     tcg_temp_free_i32(texcp);
@@ -1682,7 +1682,7 @@ generate_exception_err (DisasContext *ctx, int excp, int err)
 static inline void
 generate_exception (DisasContext *ctx, int excp)
 {
-    save_cpu_state(ctx, 1);
+    save_cpu_state(ctx, 0);
     gen_helper_0e0i(raise_exception, excp);
 }
 
@@ -2092,7 +2092,7 @@ static void gen_ld(DisasContext *ctx, uint32_t opc,
         break;
     case OPC_LLD:
     case R6_OPC_LLD:
-        save_cpu_state(ctx, 1);
+        save_cpu_state(ctx, 0);
         op_ld_lld(t0, t0, ctx);
         gen_store_gpr(t0, rt);
         opn = "lld";
@@ -2227,7 +2227,7 @@ static void gen_ld(DisasContext *ctx, uint32_t opc,
         break;
     case OPC_LL:
     case R6_OPC_LL:
-        save_cpu_state(ctx, 1);
+        save_cpu_state(ctx, 0);
         op_ld_ll(t0, t0, ctx);
         gen_store_gpr(t0, rt);
         opn = "ll";
@@ -2255,12 +2255,12 @@ static void gen_st (DisasContext *ctx, uint32_t opc, int rt,
         opn = "sd";
         break;
     case OPC_SDL:
-        save_cpu_state(ctx, 1);
+        save_cpu_state(ctx, 0);
         gen_helper_0e2i(sdl, t1, t0, ctx->mem_idx);
         opn = "sdl";
         break;
     case OPC_SDR:
-        save_cpu_state(ctx, 1);
+        save_cpu_state(ctx, 0);
         gen_helper_0e2i(sdr, t1, t0, ctx->mem_idx);
         opn = "sdr";
         break;
@@ -2278,12 +2278,12 @@ static void gen_st (DisasContext *ctx, uint32_t opc, int rt,
         opn = "sb";
         break;
     case OPC_SWL:
-        save_cpu_state(ctx, 1);
+        save_cpu_state(ctx, 0);
         gen_helper_0e2i(swl, t1, t0, ctx->mem_idx);
         opn = "swl";
         break;
     case OPC_SWR:
-        save_cpu_state(ctx, 1);
+        save_cpu_state(ctx, 0);
         gen_helper_0e2i(swr, t1, t0, ctx->mem_idx);
         opn = "swr";
         break;
@@ -2315,14 +2315,14 @@ static void gen_st_cond (DisasContext *ctx, uint32_t opc, int rt,
 #if defined(TARGET_MIPS64)
     case OPC_SCD:
     case R6_OPC_SCD:
-        save_cpu_state(ctx, 1);
+        save_cpu_state(ctx, 0);
         op_st_scd(t1, t0, rt, ctx);
         opn = "scd";
         break;
 #endif
     case OPC_SC:
     case R6_OPC_SC:
-        save_cpu_state(ctx, 1);
+        save_cpu_state(ctx, 0);
         op_st_sc(t1, t0, rt, ctx);
         opn = "sc";
         break;
@@ -4386,7 +4386,7 @@ static inline void gen_goto_tb(DisasContext *ctx, int n, target_ulong dest)
         gen_save_pc(dest);
         if (ctx->singlestep_enabled) {
             save_cpu_state(ctx, 0);
-            gen_helper_0e0i(raise_exception, EXCP_DEBUG);
+            gen_helper_raise_exception_debug(cpu_env);
         }
         tcg_gen_exit_tb(0);
     }
@@ -7768,7 +7768,7 @@ static void gen_mttr(CPUMIPSState *env, DisasContext *ctx, int rd, int rt,
         break;
     case 3:
         /* XXX: For now we support only a single FPU context. */
-        save_cpu_state(ctx, 1);
+        save_cpu_state(ctx, 0);
         {
             TCGv_i32 fs_tmp = tcg_const_i32(rd);
 
@@ -8371,7 +8371,7 @@ static void gen_cp1 (DisasContext *ctx, uint32_t opc, int rt, int fs)
         break;
     case OPC_CTC1:
         gen_load_gpr(t0, rt);
-        save_cpu_state(ctx, 1);
+        save_cpu_state(ctx, 0);
         {
             TCGv_i32 fs_tmp = tcg_const_i32(fs);
 
@@ -10487,22 +10487,22 @@ static void gen_rdhwr(DisasContext *ctx, int rt, int rd)
 
     switch (rd) {
     case 0:
-        save_cpu_state(ctx, 1);
+        save_cpu_state(ctx, 0);
         gen_helper_rdhwr_cpunum(t0, cpu_env);
         gen_store_gpr(t0, rt);
         break;
     case 1:
-        save_cpu_state(ctx, 1);
+        save_cpu_state(ctx, 0);
         gen_helper_rdhwr_synci_step(t0, cpu_env);
         gen_store_gpr(t0, rt);
         break;
     case 2:
-        save_cpu_state(ctx, 1);
+        save_cpu_state(ctx, 0);
         gen_helper_rdhwr_cc(t0, cpu_env);
         gen_store_gpr(t0, rt);
         break;
     case 3:
-        save_cpu_state(ctx, 1);
+        save_cpu_state(ctx, 0);
         gen_helper_rdhwr_ccres(t0, cpu_env);
         gen_store_gpr(t0, rt);
         break;
@@ -10602,7 +10602,7 @@ static void gen_branch(DisasContext *ctx, int insn_bytes)
             }
             if (ctx->singlestep_enabled) {
                 save_cpu_state(ctx, 0);
-                gen_helper_0e0i(raise_exception, EXCP_DEBUG);
+                gen_helper_raise_exception_debug(cpu_env);
             }
             tcg_gen_exit_tb(0);
             break;
@@ -17321,7 +17321,7 @@ static void decode_opc_special3(CPUMIPSState *env, DisasContext *ctx)
         {
             TCGv t0 = tcg_temp_new();
 
-            save_cpu_state(ctx, 1);
+            save_cpu_state(ctx, 0);
             gen_load_gpr(t0, rs);
             gen_helper_yield(t0, cpu_env, t0);
             gen_store_gpr(t0, rd);
@@ -18414,14 +18414,14 @@ static void gen_msa(CPUMIPSState *env, DisasContext *ctx)
             case OPC_LD_H:
             case OPC_LD_W:
             case OPC_LD_D:
-                save_cpu_state(ctx, 1);
+                save_cpu_state(ctx, 0);
                 gen_helper_msa_ld_df(cpu_env, tdf, twd, trs, ts10);
                 break;
             case OPC_ST_B:
             case OPC_ST_H:
             case OPC_ST_W:
             case OPC_ST_D:
-                save_cpu_state(ctx, 1);
+                save_cpu_state(ctx, 0);
                 gen_helper_msa_st_df(cpu_env, tdf, twd, trs, ts10);
                 break;
             }
@@ -19155,7 +19155,7 @@ gen_intermediate_code_internal(MIPSCPU *cpu, TranslationBlock *tb,
                 if (bp->pc == ctx.pc) {
                     save_cpu_state(&ctx, 1);
                     ctx.bstate = BS_BRANCH;
-                    gen_helper_0e0i(raise_exception, EXCP_DEBUG);
+                    gen_helper_raise_exception_debug(cpu_env);
                     /* Include the breakpoint location or the tb won't
                      * be flushed when it must be.  */
                     ctx.pc += 4;
@@ -19239,7 +19239,7 @@ gen_intermediate_code_internal(MIPSCPU *cpu, TranslationBlock *tb,
     }
     if (cs->singlestep_enabled && ctx.bstate != BS_BRANCH) {
         save_cpu_state(&ctx, ctx.bstate != BS_EXCP);
-        gen_helper_0e0i(raise_exception, EXCP_DEBUG);
+        gen_helper_raise_exception_debug(cpu_env);
     } else {
         switch (ctx.bstate) {
         case BS_STOP:


* [Qemu-devel] [PATCH v2 3/3] target-i386: fix memory operations in helpers
  2015-06-17 12:41 [Qemu-devel] [PATCH v2 0/3] Fix exceptions handling for MIPS and i386 Pavel Dovgalyuk
  2015-06-17 12:42 ` [Qemu-devel] [PATCH v2 1/3] softmmu: add helper function to pass through retaddr Pavel Dovgalyuk
  2015-06-17 12:42 ` [Qemu-devel] [PATCH v2 2/3] target-mips: exceptions handling in icount mode Pavel Dovgalyuk
@ 2015-06-17 12:42 ` Pavel Dovgalyuk
  2015-06-17 13:27   ` Aurelien Jarno
  2015-06-17 13:24 ` [Qemu-devel] [PATCH v2 0/3] Fix exceptions handling for MIPS and i386 Aurelien Jarno
  2015-06-17 14:19 ` Aurelien Jarno
  4 siblings, 1 reply; 29+ messages in thread
From: Pavel Dovgalyuk @ 2015-06-17 12:42 UTC (permalink / raw)
  To: qemu-devel; +Cc: pbonzini, rth7680, leon.alrae, aurelien, pavel.dovgaluk

This patch passes the TB return address into the softmmu functions that are
invoked from target helpers. This allows the PC and icount to be recovered
correctly while handling MMU faults.
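
One convention worth noting (see excp_helper.c below): a retaddr of 0 makes
raise_interrupt2() skip cpu_restore_state(), i.e. the caller is expected to
have the PC already up to date; the plain helper uses exactly that:

    void helper_raise_exception(CPUX86State *env, int exception_index)
    {
        raise_exception(env, exception_index, 0);
    }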

Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
---
 target-i386/cc_helper.c   |    2 
 target-i386/cpu.h         |    5 
 target-i386/excp_helper.c |   23 +
 target-i386/fpu_helper.c  |  146 +++++----
 target-i386/helper.c      |    4 
 target-i386/int_helper.c  |   32 +-
 target-i386/mem_helper.c  |   39 +-
 target-i386/misc_helper.c |   12 -
 target-i386/ops_sse.h     |    2 
 target-i386/seg_helper.c  |  712 +++++++++++++++++++++++----------------------
 target-i386/svm_helper.c  |    4 
 target-i386/translate.c   |   25 --
 12 files changed, 506 insertions(+), 500 deletions(-)

diff --git a/target-i386/cc_helper.c b/target-i386/cc_helper.c
index ecbf0ec..4b523db 100644
--- a/target-i386/cc_helper.c
+++ b/target-i386/cc_helper.c
@@ -378,7 +378,7 @@ void helper_sti_vm(CPUX86State *env)
 {
     env->eflags |= VIF_MASK;
     if (env->eflags & VIP_MASK) {
-        raise_exception(env, EXCP0D_GPF);
+        raise_exception(env, EXCP0D_GPF, GETPC());
     }
 }
 #endif
diff --git a/target-i386/cpu.h b/target-i386/cpu.h
index 4ee12ca..cd70c17 100644
--- a/target-i386/cpu.h
+++ b/target-i386/cpu.h
@@ -1249,9 +1249,10 @@ void cpu_x86_inject_mce(Monitor *mon, X86CPU *cpu, int bank,
                         uint64_t misc, int flags);
 
 /* excp_helper.c */
-void QEMU_NORETURN raise_exception(CPUX86State *env, int exception_index);
+void QEMU_NORETURN raise_exception(CPUX86State *env, int exception_index,
+                                   uintptr_t retaddr);
 void QEMU_NORETURN raise_exception_err(CPUX86State *env, int exception_index,
-                                       int error_code);
+                                       int error_code, uintptr_t retaddr);
 void QEMU_NORETURN raise_interrupt(CPUX86State *nenv, int intno, int is_int,
                                    int error_code, int next_eip_addend);
 
diff --git a/target-i386/excp_helper.c b/target-i386/excp_helper.c
index 99fca84..48be348 100644
--- a/target-i386/excp_helper.c
+++ b/target-i386/excp_helper.c
@@ -23,10 +23,10 @@
 #include "exec/helper-proto.h"
 
 #if 0
-#define raise_exception_err(env, a, b)                                  \
+#define raise_exception_err(env, a, b, c)                               \
     do {                                                                \
         qemu_log("raise_exception line=%d\n", __LINE__);                \
-        (raise_exception_err)(env, a, b);                               \
+        (raise_exception_err)(env, a, b, c);                            \
     } while (0)
 #endif
 
@@ -37,7 +37,7 @@ void helper_raise_interrupt(CPUX86State *env, int intno, int next_eip_addend)
 
 void helper_raise_exception(CPUX86State *env, int exception_index)
 {
-    raise_exception(env, exception_index);
+    raise_exception(env, exception_index, 0);
 }
 
 /*
@@ -92,7 +92,8 @@ static int check_exception(CPUX86State *env, int intno, int *error_code)
  */
 static void QEMU_NORETURN raise_interrupt2(CPUX86State *env, int intno,
                                            int is_int, int error_code,
-                                           int next_eip_addend)
+                                           int next_eip_addend,
+                                           uintptr_t retaddr)
 {
     CPUState *cs = CPU(x86_env_get_cpu(env));
 
@@ -108,6 +109,10 @@ static void QEMU_NORETURN raise_interrupt2(CPUX86State *env, int intno,
     env->error_code = error_code;
     env->exception_is_int = is_int;
     env->exception_next_eip = env->eip + next_eip_addend;
+    if (retaddr) {
+        /* now we have a real cpu fault */
+        cpu_restore_state(cs, retaddr);
+    }
     cpu_loop_exit(cs);
 }
 
@@ -116,16 +121,16 @@ static void QEMU_NORETURN raise_interrupt2(CPUX86State *env, int intno,
 void QEMU_NORETURN raise_interrupt(CPUX86State *env, int intno, int is_int,
                                    int error_code, int next_eip_addend)
 {
-    raise_interrupt2(env, intno, is_int, error_code, next_eip_addend);
+    raise_interrupt2(env, intno, is_int, error_code, next_eip_addend, 0);
 }
 
 void raise_exception_err(CPUX86State *env, int exception_index,
-                         int error_code)
+                         int error_code, uintptr_t retaddr)
 {
-    raise_interrupt2(env, exception_index, 0, error_code, 0);
+    raise_interrupt2(env, exception_index, 0, error_code, 0, retaddr);
 }
 
-void raise_exception(CPUX86State *env, int exception_index)
+void raise_exception(CPUX86State *env, int exception_index, uintptr_t retaddr)
 {
-    raise_interrupt2(env, exception_index, 0, 0, 0);
+    raise_interrupt2(env, exception_index, 0, 0, 0, retaddr);
 }
diff --git a/target-i386/fpu_helper.c b/target-i386/fpu_helper.c
index 30d34d5..1ad2ff7 100644
--- a/target-i386/fpu_helper.c
+++ b/target-i386/fpu_helper.c
@@ -68,22 +68,24 @@ static inline void fpop(CPUX86State *env)
     env->fpstt = (env->fpstt + 1) & 7;
 }
 
-static inline floatx80 helper_fldt(CPUX86State *env, target_ulong ptr)
+static inline floatx80 helper_fldt(CPUX86State *env, target_ulong ptr,
+                                   uintptr_t retaddr)
 {
     CPU_LDoubleU temp;
 
-    temp.l.lower = cpu_ldq_data(env, ptr);
-    temp.l.upper = cpu_lduw_data(env, ptr + 8);
+    temp.l.lower = cpu_ldq_data_ra(env, ptr, retaddr);
+    temp.l.upper = cpu_lduw_data_ra(env, ptr + 8, retaddr);
     return temp.d;
 }
 
-static inline void helper_fstt(CPUX86State *env, floatx80 f, target_ulong ptr)
+static inline void helper_fstt(CPUX86State *env, floatx80 f, target_ulong ptr,
+                               uintptr_t retaddr)
 {
     CPU_LDoubleU temp;
 
     temp.d = f;
-    cpu_stq_data(env, ptr, temp.l.lower);
-    cpu_stw_data(env, ptr + 8, temp.l.upper);
+    cpu_stq_data_ra(env, ptr, temp.l.lower, retaddr);
+    cpu_stw_data_ra(env, ptr + 8, temp.l.upper, retaddr);
 }
 
 /* x87 FPU helpers */
@@ -126,10 +128,10 @@ static inline floatx80 helper_fdiv(CPUX86State *env, floatx80 a, floatx80 b)
     return floatx80_div(a, b, &env->fp_status);
 }
 
-static void fpu_raise_exception(CPUX86State *env)
+static void fpu_raise_exception(CPUX86State *env, uintptr_t retaddr)
 {
     if (env->cr[0] & CR0_NE_MASK) {
-        raise_exception(env, EXCP10_COPR);
+        raise_exception(env, EXCP10_COPR, retaddr);
     }
 #if !defined(CONFIG_USER_ONLY)
     else {
@@ -314,14 +316,14 @@ void helper_fldt_ST0(CPUX86State *env, target_ulong ptr)
     int new_fpstt;
 
     new_fpstt = (env->fpstt - 1) & 7;
-    env->fpregs[new_fpstt].d = helper_fldt(env, ptr);
+    env->fpregs[new_fpstt].d = helper_fldt(env, ptr, GETPC());
     env->fpstt = new_fpstt;
     env->fptags[new_fpstt] = 0; /* validate stack entry */
 }
 
 void helper_fstt_ST0(CPUX86State *env, target_ulong ptr)
 {
-    helper_fstt(env, ST0, ptr);
+    helper_fstt(env, ST0, ptr, GETPC());
 }
 
 void helper_fpush(CPUX86State *env)
@@ -604,7 +606,7 @@ void helper_fclex(CPUX86State *env)
 void helper_fwait(CPUX86State *env)
 {
     if (env->fpus & FPUS_SE) {
-        fpu_raise_exception(env);
+        fpu_raise_exception(env, GETPC());
     }
 }
 
@@ -634,11 +636,11 @@ void helper_fbld_ST0(CPUX86State *env, target_ulong ptr)
 
     val = 0;
     for (i = 8; i >= 0; i--) {
-        v = cpu_ldub_data(env, ptr + i);
+        v = cpu_ldub_data_ra(env, ptr + i, GETPC());
         val = (val * 100) + ((v >> 4) * 10) + (v & 0xf);
     }
     tmp = int64_to_floatx80(val, &env->fp_status);
-    if (cpu_ldub_data(env, ptr + 9) & 0x80) {
+    if (cpu_ldub_data_ra(env, ptr + 9, GETPC()) & 0x80) {
         tmp = floatx80_chs(tmp);
     }
     fpush(env);
@@ -655,10 +657,10 @@ void helper_fbst_ST0(CPUX86State *env, target_ulong ptr)
     mem_ref = ptr;
     mem_end = mem_ref + 9;
     if (val < 0) {
-        cpu_stb_data(env, mem_end, 0x80);
+        cpu_stb_data_ra(env, mem_end, 0x80, GETPC());
         val = -val;
     } else {
-        cpu_stb_data(env, mem_end, 0x00);
+        cpu_stb_data_ra(env, mem_end, 0x00, GETPC());
     }
     while (mem_ref < mem_end) {
         if (val == 0) {
@@ -667,10 +669,10 @@ void helper_fbst_ST0(CPUX86State *env, target_ulong ptr)
         v = val % 100;
         val = val / 100;
         v = ((v / 10) << 4) | (v % 10);
-        cpu_stb_data(env, mem_ref++, v);
+        cpu_stb_data_ra(env, mem_ref++, v, GETPC());
     }
     while (mem_ref < mem_end) {
-        cpu_stb_data(env, mem_ref++, 0);
+        cpu_stb_data_ra(env, mem_ref++, 0, GETPC());
     }
 }
 
@@ -978,7 +980,8 @@ void helper_fxam_ST0(CPUX86State *env)
     }
 }
 
-void helper_fstenv(CPUX86State *env, target_ulong ptr, int data32)
+static void do_fstenv(CPUX86State *env, target_ulong ptr, int data32,
+                      uintptr_t retaddr)
 {
     int fpus, fptag, exp, i;
     uint64_t mant;
@@ -1006,37 +1009,43 @@ void helper_fstenv(CPUX86State *env, target_ulong ptr, int data32)
     }
     if (data32) {
         /* 32 bit */
-        cpu_stl_data(env, ptr, env->fpuc);
-        cpu_stl_data(env, ptr + 4, fpus);
-        cpu_stl_data(env, ptr + 8, fptag);
-        cpu_stl_data(env, ptr + 12, 0); /* fpip */
-        cpu_stl_data(env, ptr + 16, 0); /* fpcs */
-        cpu_stl_data(env, ptr + 20, 0); /* fpoo */
-        cpu_stl_data(env, ptr + 24, 0); /* fpos */
+        cpu_stl_data_ra(env, ptr, env->fpuc, retaddr);
+        cpu_stl_data_ra(env, ptr + 4, fpus, retaddr);
+        cpu_stl_data_ra(env, ptr + 8, fptag, retaddr);
+        cpu_stl_data_ra(env, ptr + 12, 0, retaddr); /* fpip */
+        cpu_stl_data_ra(env, ptr + 16, 0, retaddr); /* fpcs */
+        cpu_stl_data_ra(env, ptr + 20, 0, retaddr); /* fpoo */
+        cpu_stl_data_ra(env, ptr + 24, 0, retaddr); /* fpos */
     } else {
         /* 16 bit */
-        cpu_stw_data(env, ptr, env->fpuc);
-        cpu_stw_data(env, ptr + 2, fpus);
-        cpu_stw_data(env, ptr + 4, fptag);
-        cpu_stw_data(env, ptr + 6, 0);
-        cpu_stw_data(env, ptr + 8, 0);
-        cpu_stw_data(env, ptr + 10, 0);
-        cpu_stw_data(env, ptr + 12, 0);
+        cpu_stw_data_ra(env, ptr, env->fpuc, retaddr);
+        cpu_stw_data_ra(env, ptr + 2, fpus, retaddr);
+        cpu_stw_data_ra(env, ptr + 4, fptag, retaddr);
+        cpu_stw_data_ra(env, ptr + 6, 0, retaddr);
+        cpu_stw_data_ra(env, ptr + 8, 0, retaddr);
+        cpu_stw_data_ra(env, ptr + 10, 0, retaddr);
+        cpu_stw_data_ra(env, ptr + 12, 0, retaddr);
     }
 }
 
-void helper_fldenv(CPUX86State *env, target_ulong ptr, int data32)
+void helper_fstenv(CPUX86State *env, target_ulong ptr, int data32)
+{
+    do_fstenv(env, ptr, data32, GETPC());
+}
+
+static void do_fldenv(CPUX86State *env, target_ulong ptr, int data32,
+                      uintptr_t retaddr)
 {
     int i, fpus, fptag;
 
     if (data32) {
-        cpu_set_fpuc(env, cpu_lduw_data(env, ptr));
-        fpus = cpu_lduw_data(env, ptr + 4);
-        fptag = cpu_lduw_data(env, ptr + 8);
+        cpu_set_fpuc(env, cpu_lduw_data_ra(env, ptr, retaddr));
+        fpus = cpu_lduw_data_ra(env, ptr + 4, retaddr);
+        fptag = cpu_lduw_data_ra(env, ptr + 8, retaddr);
     } else {
-        cpu_set_fpuc(env, cpu_lduw_data(env, ptr));
-        fpus = cpu_lduw_data(env, ptr + 2);
-        fptag = cpu_lduw_data(env, ptr + 4);
+        cpu_set_fpuc(env, cpu_lduw_data_ra(env, ptr, retaddr));
+        fpus = cpu_lduw_data_ra(env, ptr + 2, retaddr);
+        fptag = cpu_lduw_data_ra(env, ptr + 4, retaddr);
     }
     env->fpstt = (fpus >> 11) & 7;
     env->fpus = fpus & ~0x3800;
@@ -1046,17 +1055,22 @@ void helper_fldenv(CPUX86State *env, target_ulong ptr, int data32)
     }
 }
 
+void helper_fldenv(CPUX86State *env, target_ulong ptr, int data32)
+{
+    do_fldenv(env, ptr, data32, GETPC());
+}
+
 void helper_fsave(CPUX86State *env, target_ulong ptr, int data32)
 {
     floatx80 tmp;
     int i;
 
-    helper_fstenv(env, ptr, data32);
+    do_fstenv(env, ptr, data32, GETPC());
 
     ptr += (14 << data32);
     for (i = 0; i < 8; i++) {
         tmp = ST(i);
-        helper_fstt(env, tmp, ptr);
+        helper_fstt(env, tmp, ptr, GETPC());
         ptr += 10;
     }
 
@@ -1079,11 +1093,11 @@ void helper_frstor(CPUX86State *env, target_ulong ptr, int data32)
     floatx80 tmp;
     int i;
 
-    helper_fldenv(env, ptr, data32);
+    do_fldenv(env, ptr, data32, GETPC());
     ptr += (14 << data32);
 
     for (i = 0; i < 8; i++) {
-        tmp = helper_fldt(env, ptr);
+        tmp = helper_fldt(env, ptr, GETPC());
         ST(i) = tmp;
         ptr += 10;
     }
@@ -1109,7 +1123,7 @@ void helper_fxsave(CPUX86State *env, target_ulong ptr, int data64)
 
     /* The operand must be 16 byte aligned */
     if (ptr & 0xf) {
-        raise_exception(env, EXCP0D_GPF);
+        raise_exception(env, EXCP0D_GPF, GETPC());
     }
 
     fpus = (env->fpus & ~0x3800) | (env->fpstt & 0x7) << 11;
@@ -1117,33 +1131,33 @@ void helper_fxsave(CPUX86State *env, target_ulong ptr, int data64)
     for (i = 0; i < 8; i++) {
         fptag |= (env->fptags[i] << i);
     }
-    cpu_stw_data(env, ptr, env->fpuc);
-    cpu_stw_data(env, ptr + 2, fpus);
-    cpu_stw_data(env, ptr + 4, fptag ^ 0xff);
+    cpu_stw_data_ra(env, ptr, env->fpuc, GETPC());
+    cpu_stw_data_ra(env, ptr + 2, fpus, GETPC());
+    cpu_stw_data_ra(env, ptr + 4, fptag ^ 0xff, GETPC());
 #ifdef TARGET_X86_64
     if (data64) {
-        cpu_stq_data(env, ptr + 0x08, 0); /* rip */
-        cpu_stq_data(env, ptr + 0x10, 0); /* rdp */
+        cpu_stq_data_ra(env, ptr + 0x08, 0, GETPC()); /* rip */
+        cpu_stq_data_ra(env, ptr + 0x10, 0, GETPC()); /* rdp */
     } else
 #endif
     {
-        cpu_stl_data(env, ptr + 0x08, 0); /* eip */
-        cpu_stl_data(env, ptr + 0x0c, 0); /* sel  */
-        cpu_stl_data(env, ptr + 0x10, 0); /* dp */
-        cpu_stl_data(env, ptr + 0x14, 0); /* sel  */
+        cpu_stl_data_ra(env, ptr + 0x08, 0, GETPC()); /* eip */
+        cpu_stl_data_ra(env, ptr + 0x0c, 0, GETPC()); /* sel  */
+        cpu_stl_data_ra(env, ptr + 0x10, 0, GETPC()); /* dp */
+        cpu_stl_data_ra(env, ptr + 0x14, 0, GETPC()); /* sel  */
     }
 
     addr = ptr + 0x20;
     for (i = 0; i < 8; i++) {
         tmp = ST(i);
-        helper_fstt(env, tmp, addr);
+        helper_fstt(env, tmp, addr, GETPC());
         addr += 16;
     }
 
     if (env->cr[4] & CR4_OSFXSR_MASK) {
         /* XXX: finish it */
-        cpu_stl_data(env, ptr + 0x18, env->mxcsr); /* mxcsr */
-        cpu_stl_data(env, ptr + 0x1c, 0x0000ffff); /* mxcsr_mask */
+        cpu_stl_data_ra(env, ptr + 0x18, env->mxcsr, GETPC()); /* mxcsr */
+        cpu_stl_data_ra(env, ptr + 0x1c, 0x0000ffff, GETPC()); /* mxcsr_mask */
         if (env->hflags & HF_CS64_MASK) {
             nb_xmm_regs = 16;
         } else {
@@ -1155,8 +1169,8 @@ void helper_fxsave(CPUX86State *env, target_ulong ptr, int data64)
             || (env->hflags & HF_CPL_MASK)
             || !(env->hflags & HF_LMA_MASK)) {
             for (i = 0; i < nb_xmm_regs; i++) {
-                cpu_stq_data(env, addr, env->xmm_regs[i].XMM_Q(0));
-                cpu_stq_data(env, addr + 8, env->xmm_regs[i].XMM_Q(1));
+                cpu_stq_data_ra(env, addr, env->xmm_regs[i].XMM_Q(0), GETPC());
+                cpu_stq_data_ra(env, addr + 8, env->xmm_regs[i].XMM_Q(1), GETPC());
                 addr += 16;
             }
         }
@@ -1171,12 +1185,12 @@ void helper_fxrstor(CPUX86State *env, target_ulong ptr, int data64)
 
     /* The operand must be 16 byte aligned */
     if (ptr & 0xf) {
-        raise_exception(env, EXCP0D_GPF);
+        raise_exception(env, EXCP0D_GPF, GETPC());
     }
 
-    cpu_set_fpuc(env, cpu_lduw_data(env, ptr));
-    fpus = cpu_lduw_data(env, ptr + 2);
-    fptag = cpu_lduw_data(env, ptr + 4);
+    cpu_set_fpuc(env, cpu_lduw_data_ra(env, ptr, GETPC()));
+    fpus = cpu_lduw_data_ra(env, ptr + 2, GETPC());
+    fptag = cpu_lduw_data_ra(env, ptr + 4, GETPC());
     env->fpstt = (fpus >> 11) & 7;
     env->fpus = fpus & ~0x3800;
     fptag ^= 0xff;
@@ -1186,14 +1200,14 @@ void helper_fxrstor(CPUX86State *env, target_ulong ptr, int data64)
 
     addr = ptr + 0x20;
     for (i = 0; i < 8; i++) {
-        tmp = helper_fldt(env, addr);
+        tmp = helper_fldt(env, addr, GETPC());
         ST(i) = tmp;
         addr += 16;
     }
 
     if (env->cr[4] & CR4_OSFXSR_MASK) {
         /* XXX: finish it */
-        cpu_set_mxcsr(env, cpu_ldl_data(env, ptr + 0x18));
+        cpu_set_mxcsr(env, cpu_ldl_data_ra(env, ptr + 0x18, GETPC()));
         /* cpu_ldl_data(env, ptr + 0x1c); */
         if (env->hflags & HF_CS64_MASK) {
             nb_xmm_regs = 16;
@@ -1206,8 +1220,8 @@ void helper_fxrstor(CPUX86State *env, target_ulong ptr, int data64)
             || (env->hflags & HF_CPL_MASK)
             || !(env->hflags & HF_LMA_MASK)) {
             for (i = 0; i < nb_xmm_regs; i++) {
-                env->xmm_regs[i].XMM_Q(0) = cpu_ldq_data(env, addr);
-                env->xmm_regs[i].XMM_Q(1) = cpu_ldq_data(env, addr + 8);
+                env->xmm_regs[i].XMM_Q(0) = cpu_ldq_data_ra(env, addr, GETPC());
+                env->xmm_regs[i].XMM_Q(1) = cpu_ldq_data_ra(env, addr + 8, GETPC());
                 addr += 16;
             }
         }
diff --git a/target-i386/helper.c b/target-i386/helper.c
index 4f1ddf7..8fdc884 100644
--- a/target-i386/helper.c
+++ b/target-i386/helper.c
@@ -1014,7 +1014,7 @@ void breakpoint_handler(CPUState *cs)
         if (cs->watchpoint_hit->flags & BP_CPU) {
             cs->watchpoint_hit = NULL;
             if (check_hw_breakpoints(env, false)) {
-                raise_exception(env, EXCP01_DB);
+                raise_exception(env, EXCP01_DB, 0);
             } else {
                 cpu_resume_from_signal(cs, NULL);
             }
@@ -1024,7 +1024,7 @@ void breakpoint_handler(CPUState *cs)
             if (bp->pc == env->eip) {
                 if (bp->flags & BP_CPU) {
                     check_hw_breakpoints(env, true);
-                    raise_exception(env, EXCP01_DB);
+                    raise_exception(env, EXCP01_DB, 0);
                 }
                 break;
             }
diff --git a/target-i386/int_helper.c b/target-i386/int_helper.c
index b0d78e6..eacce79 100644
--- a/target-i386/int_helper.c
+++ b/target-i386/int_helper.c
@@ -48,11 +48,11 @@ void helper_divb_AL(CPUX86State *env, target_ulong t0)
     num = (env->regs[R_EAX] & 0xffff);
     den = (t0 & 0xff);
     if (den == 0) {
-        raise_exception(env, EXCP00_DIVZ);
+        raise_exception(env, EXCP00_DIVZ, GETPC());
     }
     q = (num / den);
     if (q > 0xff) {
-        raise_exception(env, EXCP00_DIVZ);
+        raise_exception(env, EXCP00_DIVZ, GETPC());
     }
     q &= 0xff;
     r = (num % den) & 0xff;
@@ -66,11 +66,11 @@ void helper_idivb_AL(CPUX86State *env, target_ulong t0)
     num = (int16_t)env->regs[R_EAX];
     den = (int8_t)t0;
     if (den == 0) {
-        raise_exception(env, EXCP00_DIVZ);
+        raise_exception(env, EXCP00_DIVZ, GETPC());
     }
     q = (num / den);
     if (q != (int8_t)q) {
-        raise_exception(env, EXCP00_DIVZ);
+        raise_exception(env, EXCP00_DIVZ, GETPC());
     }
     q &= 0xff;
     r = (num % den) & 0xff;
@@ -84,11 +84,11 @@ void helper_divw_AX(CPUX86State *env, target_ulong t0)
     num = (env->regs[R_EAX] & 0xffff) | ((env->regs[R_EDX] & 0xffff) << 16);
     den = (t0 & 0xffff);
     if (den == 0) {
-        raise_exception(env, EXCP00_DIVZ);
+        raise_exception(env, EXCP00_DIVZ, GETPC());
     }
     q = (num / den);
     if (q > 0xffff) {
-        raise_exception(env, EXCP00_DIVZ);
+        raise_exception(env, EXCP00_DIVZ, GETPC());
     }
     q &= 0xffff;
     r = (num % den) & 0xffff;
@@ -103,11 +103,11 @@ void helper_idivw_AX(CPUX86State *env, target_ulong t0)
     num = (env->regs[R_EAX] & 0xffff) | ((env->regs[R_EDX] & 0xffff) << 16);
     den = (int16_t)t0;
     if (den == 0) {
-        raise_exception(env, EXCP00_DIVZ);
+        raise_exception(env, EXCP00_DIVZ, GETPC());
     }
     q = (num / den);
     if (q != (int16_t)q) {
-        raise_exception(env, EXCP00_DIVZ);
+        raise_exception(env, EXCP00_DIVZ, GETPC());
     }
     q &= 0xffff;
     r = (num % den) & 0xffff;
@@ -123,12 +123,12 @@ void helper_divl_EAX(CPUX86State *env, target_ulong t0)
     num = ((uint32_t)env->regs[R_EAX]) | ((uint64_t)((uint32_t)env->regs[R_EDX]) << 32);
     den = t0;
     if (den == 0) {
-        raise_exception(env, EXCP00_DIVZ);
+        raise_exception(env, EXCP00_DIVZ, GETPC());
     }
     q = (num / den);
     r = (num % den);
     if (q > 0xffffffff) {
-        raise_exception(env, EXCP00_DIVZ);
+        raise_exception(env, EXCP00_DIVZ, GETPC());
     }
     env->regs[R_EAX] = (uint32_t)q;
     env->regs[R_EDX] = (uint32_t)r;
@@ -142,12 +142,12 @@ void helper_idivl_EAX(CPUX86State *env, target_ulong t0)
     num = ((uint32_t)env->regs[R_EAX]) | ((uint64_t)((uint32_t)env->regs[R_EDX]) << 32);
     den = t0;
     if (den == 0) {
-        raise_exception(env, EXCP00_DIVZ);
+        raise_exception(env, EXCP00_DIVZ, GETPC());
     }
     q = (num / den);
     r = (num % den);
     if (q != (int32_t)q) {
-        raise_exception(env, EXCP00_DIVZ);
+        raise_exception(env, EXCP00_DIVZ, GETPC());
     }
     env->regs[R_EAX] = (uint32_t)q;
     env->regs[R_EDX] = (uint32_t)r;
@@ -379,12 +379,12 @@ void helper_divq_EAX(CPUX86State *env, target_ulong t0)
     uint64_t r0, r1;
 
     if (t0 == 0) {
-        raise_exception(env, EXCP00_DIVZ);
+        raise_exception(env, EXCP00_DIVZ, GETPC());
     }
     r0 = env->regs[R_EAX];
     r1 = env->regs[R_EDX];
     if (div64(&r0, &r1, t0)) {
-        raise_exception(env, EXCP00_DIVZ);
+        raise_exception(env, EXCP00_DIVZ, GETPC());
     }
     env->regs[R_EAX] = r0;
     env->regs[R_EDX] = r1;
@@ -395,12 +395,12 @@ void helper_idivq_EAX(CPUX86State *env, target_ulong t0)
     uint64_t r0, r1;
 
     if (t0 == 0) {
-        raise_exception(env, EXCP00_DIVZ);
+        raise_exception(env, EXCP00_DIVZ, GETPC());
     }
     r0 = env->regs[R_EAX];
     r1 = env->regs[R_EDX];
     if (idiv64(&r0, &r1, t0)) {
-        raise_exception(env, EXCP00_DIVZ);
+        raise_exception(env, EXCP00_DIVZ, GETPC());
     }
     env->regs[R_EAX] = r0;
     env->regs[R_EDX] = r1;
diff --git a/target-i386/mem_helper.c b/target-i386/mem_helper.c
index 1aec8a5..182935b 100644
--- a/target-i386/mem_helper.c
+++ b/target-i386/mem_helper.c
@@ -41,13 +41,14 @@ void helper_cmpxchg8b(CPUX86State *env, target_ulong a0)
     int eflags;
 
     eflags = cpu_cc_compute_all(env, CC_OP);
-    d = cpu_ldq_data(env, a0);
+    d = cpu_ldq_data_ra(env, a0, GETPC());
     if (d == (((uint64_t)env->regs[R_EDX] << 32) | (uint32_t)env->regs[R_EAX])) {
-        cpu_stq_data(env, a0, ((uint64_t)env->regs[R_ECX] << 32) | (uint32_t)env->regs[R_EBX]);
+        cpu_stq_data_ra(env, a0, ((uint64_t)env->regs[R_ECX] << 32)
+                                  | (uint32_t)env->regs[R_EBX], GETPC());
         eflags |= CC_Z;
     } else {
         /* always do the store */
-        cpu_stq_data(env, a0, d);
+        cpu_stq_data_ra(env, a0, d, GETPC());
         env->regs[R_EDX] = (uint32_t)(d >> 32);
         env->regs[R_EAX] = (uint32_t)d;
         eflags &= ~CC_Z;
@@ -62,19 +63,19 @@ void helper_cmpxchg16b(CPUX86State *env, target_ulong a0)
     int eflags;
 
     if ((a0 & 0xf) != 0) {
-        raise_exception(env, EXCP0D_GPF);
+        raise_exception(env, EXCP0D_GPF, GETPC());
     }
     eflags = cpu_cc_compute_all(env, CC_OP);
-    d0 = cpu_ldq_data(env, a0);
-    d1 = cpu_ldq_data(env, a0 + 8);
+    d0 = cpu_ldq_data_ra(env, a0, GETPC());
+    d1 = cpu_ldq_data_ra(env, a0 + 8, GETPC());
     if (d0 == env->regs[R_EAX] && d1 == env->regs[R_EDX]) {
-        cpu_stq_data(env, a0, env->regs[R_EBX]);
-        cpu_stq_data(env, a0 + 8, env->regs[R_ECX]);
+        cpu_stq_data_ra(env, a0, env->regs[R_EBX], GETPC());
+        cpu_stq_data_ra(env, a0 + 8, env->regs[R_ECX], GETPC());
         eflags |= CC_Z;
     } else {
         /* always do the store */
-        cpu_stq_data(env, a0, d0);
-        cpu_stq_data(env, a0 + 8, d1);
+        cpu_stq_data_ra(env, a0, d0, GETPC());
+        cpu_stq_data_ra(env, a0 + 8, d1, GETPC());
         env->regs[R_EDX] = d1;
         env->regs[R_EAX] = d0;
         eflags &= ~CC_Z;
@@ -87,11 +88,11 @@ void helper_boundw(CPUX86State *env, target_ulong a0, int v)
 {
     int low, high;
 
-    low = cpu_ldsw_data(env, a0);
-    high = cpu_ldsw_data(env, a0 + 2);
+    low = cpu_ldsw_data_ra(env, a0, GETPC());
+    high = cpu_ldsw_data_ra(env, a0 + 2, GETPC());
     v = (int16_t)v;
     if (v < low || v > high) {
-        raise_exception(env, EXCP05_BOUND);
+        raise_exception(env, EXCP05_BOUND, GETPC());
     }
 }
 
@@ -99,10 +100,10 @@ void helper_boundl(CPUX86State *env, target_ulong a0, int v)
 {
     int low, high;
 
-    low = cpu_ldl_data(env, a0);
-    high = cpu_ldl_data(env, a0 + 4);
+    low = cpu_ldl_data_ra(env, a0, GETPC());
+    high = cpu_ldl_data_ra(env, a0 + 4, GETPC());
     if (v < low || v > high) {
-        raise_exception(env, EXCP05_BOUND);
+        raise_exception(env, EXCP05_BOUND, GETPC());
     }
 }
 
@@ -122,11 +123,7 @@ void tlb_fill(CPUState *cs, target_ulong addr, int is_write, int mmu_idx,
         X86CPU *cpu = X86_CPU(cs);
         CPUX86State *env = &cpu->env;
 
-        if (retaddr) {
-            /* now we have a real cpu fault */
-            cpu_restore_state(cs, retaddr);
-        }
-        raise_exception_err(env, cs->exception_index, env->error_code);
+        raise_exception_err(env, cs->exception_index, env->error_code, retaddr);
     }
 }
 #endif
diff --git a/target-i386/misc_helper.c b/target-i386/misc_helper.c
index 4aaf1e4..cb1ea8d 100644
--- a/target-i386/misc_helper.c
+++ b/target-i386/misc_helper.c
@@ -68,7 +68,7 @@ void helper_single_step(CPUX86State *env)
     check_hw_breakpoints(env, true);
     env->dr[6] |= DR6_BS;
 #endif
-    raise_exception(env, EXCP01_DB);
+    raise_exception(env, EXCP01_DB, 0);
 }
 
 void helper_cpuid(CPUX86State *env)
@@ -187,7 +187,7 @@ void helper_rdtsc(CPUX86State *env)
     uint64_t val;
 
     if ((env->cr[4] & CR4_TSD_MASK) && ((env->hflags & HF_CPL_MASK) != 0)) {
-        raise_exception(env, EXCP0D_GPF);
+        raise_exception(env, EXCP0D_GPF, GETPC());
     }
     cpu_svm_check_intercept_param(env, SVM_EXIT_RDTSC, 0);
 
@@ -205,13 +205,13 @@ void helper_rdtscp(CPUX86State *env)
 void helper_rdpmc(CPUX86State *env)
 {
     if ((env->cr[4] & CR4_PCE_MASK) && ((env->hflags & HF_CPL_MASK) != 0)) {
-        raise_exception(env, EXCP0D_GPF);
+        raise_exception(env, EXCP0D_GPF, GETPC());
     }
     cpu_svm_check_intercept_param(env, SVM_EXIT_RDPMC, 0);
 
     /* currently unimplemented */
     qemu_log_mask(LOG_UNIMP, "x86: unimplemented rdpmc\n");
-    raise_exception_err(env, EXCP06_ILLOP, 0);
+    raise_exception_err(env, EXCP06_ILLOP, 0, GETPC());
 }
 
 #if defined(CONFIG_USER_ONLY)
@@ -556,7 +556,7 @@ void helper_hlt(CPUX86State *env, int next_eip_addend)
 void helper_monitor(CPUX86State *env, target_ulong ptr)
 {
     if ((uint32_t)env->regs[R_ECX] != 0) {
-        raise_exception(env, EXCP0D_GPF);
+        raise_exception(env, EXCP0D_GPF, GETPC());
     }
     /* XXX: store address? */
     cpu_svm_check_intercept_param(env, SVM_EXIT_MONITOR, 0);
@@ -568,7 +568,7 @@ void helper_mwait(CPUX86State *env, int next_eip_addend)
     X86CPU *cpu;
 
     if ((uint32_t)env->regs[R_ECX] != 0) {
-        raise_exception(env, EXCP0D_GPF);
+        raise_exception(env, EXCP0D_GPF, GETPC());
     }
     cpu_svm_check_intercept_param(env, SVM_EXIT_MWAIT, 0);
     env->eip += next_eip_addend;
diff --git a/target-i386/ops_sse.h b/target-i386/ops_sse.h
index 0765073..8c8e53b 100644
--- a/target-i386/ops_sse.h
+++ b/target-i386/ops_sse.h
@@ -483,7 +483,7 @@ void glue(helper_maskmov, SUFFIX)(CPUX86State *env, Reg *d, Reg *s,
 
     for (i = 0; i < (8 << SHIFT); i++) {
         if (s->B(i) & 0x80) {
-            cpu_stb_data(env, a0 + i, d->B(i));
+            cpu_stb_data_ra(env, a0 + i, d->B(i), GETPC());
         }
     }
 }
diff --git a/target-i386/seg_helper.c b/target-i386/seg_helper.c
index 92a49b3..9c3a77d 100644
--- a/target-i386/seg_helper.c
+++ b/target-i386/seg_helper.c
@@ -68,7 +68,8 @@
 
 /* return non zero if error */
 static inline int load_segment(CPUX86State *env, uint32_t *e1_ptr,
-                               uint32_t *e2_ptr, int selector)
+                               uint32_t *e2_ptr, int selector,
+                               uintptr_t retaddr)
 {
     SegmentCache *dt;
     int index;
@@ -84,8 +85,8 @@ static inline int load_segment(CPUX86State *env, uint32_t *e1_ptr,
         return -1;
     }
     ptr = dt->base + index;
-    *e1_ptr = cpu_ldl_kernel(env, ptr);
-    *e2_ptr = cpu_ldl_kernel(env, ptr + 4);
+    *e1_ptr = cpu_ldl_kernel_ra(env, ptr, retaddr);
+    *e2_ptr = cpu_ldl_kernel_ra(env, ptr + 4, retaddr);
     return 0;
 }
 
@@ -124,7 +125,8 @@ static inline void load_seg_vm(CPUX86State *env, int seg, int selector)
 }
 
 static inline void get_ss_esp_from_tss(CPUX86State *env, uint32_t *ss_ptr,
-                                       uint32_t *esp_ptr, int dpl)
+                                       uint32_t *esp_ptr, int dpl,
+                                       uintptr_t retaddr)
 {
     X86CPU *cpu = x86_env_get_cpu(env);
     int type, index, shift;
@@ -153,60 +155,61 @@ static inline void get_ss_esp_from_tss(CPUX86State *env, uint32_t *ss_ptr,
     shift = type >> 3;
     index = (dpl * 4 + 2) << shift;
     if (index + (4 << shift) - 1 > env->tr.limit) {
-        raise_exception_err(env, EXCP0A_TSS, env->tr.selector & 0xfffc);
+        raise_exception_err(env, EXCP0A_TSS, env->tr.selector & 0xfffc, retaddr);
     }
     if (shift == 0) {
-        *esp_ptr = cpu_lduw_kernel(env, env->tr.base + index);
-        *ss_ptr = cpu_lduw_kernel(env, env->tr.base + index + 2);
+        *esp_ptr = cpu_lduw_kernel_ra(env, env->tr.base + index, retaddr);
+        *ss_ptr = cpu_lduw_kernel_ra(env, env->tr.base + index + 2, retaddr);
     } else {
-        *esp_ptr = cpu_ldl_kernel(env, env->tr.base + index);
-        *ss_ptr = cpu_lduw_kernel(env, env->tr.base + index + 4);
+        *esp_ptr = cpu_ldl_kernel_ra(env, env->tr.base + index, retaddr);
+        *ss_ptr = cpu_lduw_kernel_ra(env, env->tr.base + index + 4, retaddr);
     }
 }
 
-static void tss_load_seg(CPUX86State *env, int seg_reg, int selector, int cpl)
+static void tss_load_seg(CPUX86State *env, int seg_reg, int selector, int cpl,
+                         uintptr_t retaddr)
 {
     uint32_t e1, e2;
     int rpl, dpl;
 
     if ((selector & 0xfffc) != 0) {
-        if (load_segment(env, &e1, &e2, selector) != 0) {
-            raise_exception_err(env, EXCP0A_TSS, selector & 0xfffc);
+        if (load_segment(env, &e1, &e2, selector, retaddr) != 0) {
+            raise_exception_err(env, EXCP0A_TSS, selector & 0xfffc, retaddr);
         }
         if (!(e2 & DESC_S_MASK)) {
-            raise_exception_err(env, EXCP0A_TSS, selector & 0xfffc);
+            raise_exception_err(env, EXCP0A_TSS, selector & 0xfffc, retaddr);
         }
         rpl = selector & 3;
         dpl = (e2 >> DESC_DPL_SHIFT) & 3;
         if (seg_reg == R_CS) {
             if (!(e2 & DESC_CS_MASK)) {
-                raise_exception_err(env, EXCP0A_TSS, selector & 0xfffc);
+                raise_exception_err(env, EXCP0A_TSS, selector & 0xfffc, retaddr);
             }
             if (dpl != rpl) {
-                raise_exception_err(env, EXCP0A_TSS, selector & 0xfffc);
+                raise_exception_err(env, EXCP0A_TSS, selector & 0xfffc, retaddr);
             }
         } else if (seg_reg == R_SS) {
             /* SS must be writable data */
             if ((e2 & DESC_CS_MASK) || !(e2 & DESC_W_MASK)) {
-                raise_exception_err(env, EXCP0A_TSS, selector & 0xfffc);
+                raise_exception_err(env, EXCP0A_TSS, selector & 0xfffc, retaddr);
             }
             if (dpl != cpl || dpl != rpl) {
-                raise_exception_err(env, EXCP0A_TSS, selector & 0xfffc);
+                raise_exception_err(env, EXCP0A_TSS, selector & 0xfffc, retaddr);
             }
         } else {
             /* not readable code */
             if ((e2 & DESC_CS_MASK) && !(e2 & DESC_R_MASK)) {
-                raise_exception_err(env, EXCP0A_TSS, selector & 0xfffc);
+                raise_exception_err(env, EXCP0A_TSS, selector & 0xfffc, retaddr);
             }
             /* if data or non conforming code, checks the rights */
             if (((e2 >> DESC_TYPE_SHIFT) & 0xf) < 12) {
                 if (dpl < cpl || dpl < rpl) {
-                    raise_exception_err(env, EXCP0A_TSS, selector & 0xfffc);
+                    raise_exception_err(env, EXCP0A_TSS, selector & 0xfffc, retaddr);
                 }
             }
         }
         if (!(e2 & DESC_P_MASK)) {
-            raise_exception_err(env, EXCP0B_NOSEG, selector & 0xfffc);
+            raise_exception_err(env, EXCP0B_NOSEG, selector & 0xfffc, retaddr);
         }
         cpu_x86_load_seg_cache(env, seg_reg, selector,
                                get_seg_base(e1, e2),
@@ -214,7 +217,7 @@ static void tss_load_seg(CPUX86State *env, int seg_reg, int selector, int cpl)
                                e2);
     } else {
         if (seg_reg == R_SS || seg_reg == R_CS) {
-            raise_exception_err(env, EXCP0A_TSS, selector & 0xfffc);
+            raise_exception_err(env, EXCP0A_TSS, selector & 0xfffc, retaddr);
         }
     }
 }
@@ -226,7 +229,7 @@ static void tss_load_seg(CPUX86State *env, int seg_reg, int selector, int cpl)
 /* XXX: restore CPU state in registers (PowerPC case) */
 static void switch_tss(CPUX86State *env, int tss_selector,
                        uint32_t e1, uint32_t e2, int source,
-                       uint32_t next_eip)
+                       uint32_t next_eip, uintptr_t retaddr)
 {
     int tss_limit, tss_limit_max, type, old_tss_limit_max, old_type, v1, v2, i;
     target_ulong tss_base;
@@ -244,26 +247,26 @@ static void switch_tss(CPUX86State *env, int tss_selector,
     /* if task gate, we read the TSS segment and we load it */
     if (type == 5) {
         if (!(e2 & DESC_P_MASK)) {
-            raise_exception_err(env, EXCP0B_NOSEG, tss_selector & 0xfffc);
+            raise_exception_err(env, EXCP0B_NOSEG, tss_selector & 0xfffc, retaddr);
         }
         tss_selector = e1 >> 16;
         if (tss_selector & 4) {
-            raise_exception_err(env, EXCP0A_TSS, tss_selector & 0xfffc);
+            raise_exception_err(env, EXCP0A_TSS, tss_selector & 0xfffc, retaddr);
         }
-        if (load_segment(env, &e1, &e2, tss_selector) != 0) {
-            raise_exception_err(env, EXCP0D_GPF, tss_selector & 0xfffc);
+        if (load_segment(env, &e1, &e2, tss_selector, retaddr) != 0) {
+            raise_exception_err(env, EXCP0D_GPF, tss_selector & 0xfffc, retaddr);
         }
         if (e2 & DESC_S_MASK) {
-            raise_exception_err(env, EXCP0D_GPF, tss_selector & 0xfffc);
+            raise_exception_err(env, EXCP0D_GPF, tss_selector & 0xfffc, retaddr);
         }
         type = (e2 >> DESC_TYPE_SHIFT) & 0xf;
         if ((type & 7) != 1) {
-            raise_exception_err(env, EXCP0D_GPF, tss_selector & 0xfffc);
+            raise_exception_err(env, EXCP0D_GPF, tss_selector & 0xfffc, retaddr);
         }
     }
 
     if (!(e2 & DESC_P_MASK)) {
-        raise_exception_err(env, EXCP0B_NOSEG, tss_selector & 0xfffc);
+        raise_exception_err(env, EXCP0B_NOSEG, tss_selector & 0xfffc, retaddr);
     }
 
     if (type & 8) {
@@ -275,7 +278,7 @@ static void switch_tss(CPUX86State *env, int tss_selector,
     tss_base = get_seg_base(e1, e2);
     if ((tss_selector & 4) != 0 ||
         tss_limit < tss_limit_max) {
-        raise_exception_err(env, EXCP0A_TSS, tss_selector & 0xfffc);
+        raise_exception_err(env, EXCP0A_TSS, tss_selector & 0xfffc, retaddr);
     }
     old_type = (env->tr.flags >> DESC_TYPE_SHIFT) & 0xf;
     if (old_type & 8) {
@@ -287,30 +290,33 @@ static void switch_tss(CPUX86State *env, int tss_selector,
     /* read all the registers from the new TSS */
     if (type & 8) {
         /* 32 bit */
-        new_cr3 = cpu_ldl_kernel(env, tss_base + 0x1c);
-        new_eip = cpu_ldl_kernel(env, tss_base + 0x20);
-        new_eflags = cpu_ldl_kernel(env, tss_base + 0x24);
+        new_cr3 = cpu_ldl_kernel_ra(env, tss_base + 0x1c, retaddr);
+        new_eip = cpu_ldl_kernel_ra(env, tss_base + 0x20, retaddr);
+        new_eflags = cpu_ldl_kernel_ra(env, tss_base + 0x24, retaddr);
         for (i = 0; i < 8; i++) {
-            new_regs[i] = cpu_ldl_kernel(env, tss_base + (0x28 + i * 4));
+            new_regs[i] = cpu_ldl_kernel_ra(env, tss_base + (0x28 + i * 4),
+                                            retaddr);
         }
         for (i = 0; i < 6; i++) {
-            new_segs[i] = cpu_lduw_kernel(env, tss_base + (0x48 + i * 4));
+            new_segs[i] = cpu_lduw_kernel_ra(env, tss_base + (0x48 + i * 4),
+                                             retaddr);
         }
-        new_ldt = cpu_lduw_kernel(env, tss_base + 0x60);
-        new_trap = cpu_ldl_kernel(env, tss_base + 0x64);
+        new_ldt = cpu_lduw_kernel_ra(env, tss_base + 0x60, retaddr);
+        new_trap = cpu_ldl_kernel_ra(env, tss_base + 0x64, retaddr);
     } else {
         /* 16 bit */
         new_cr3 = 0;
-        new_eip = cpu_lduw_kernel(env, tss_base + 0x0e);
-        new_eflags = cpu_lduw_kernel(env, tss_base + 0x10);
+        new_eip = cpu_lduw_kernel_ra(env, tss_base + 0x0e, retaddr);
+        new_eflags = cpu_lduw_kernel_ra(env, tss_base + 0x10, retaddr);
         for (i = 0; i < 8; i++) {
-            new_regs[i] = cpu_lduw_kernel(env, tss_base + (0x12 + i * 2)) |
-                0xffff0000;
+            new_regs[i] = cpu_lduw_kernel_ra(env, tss_base + (0x12 + i * 2),
+                                             retaddr) | 0xffff0000;
         }
         for (i = 0; i < 4; i++) {
-            new_segs[i] = cpu_lduw_kernel(env, tss_base + (0x22 + i * 4));
+            new_segs[i] = cpu_lduw_kernel_ra(env, tss_base + (0x22 + i * 4),
+                                             retaddr);
         }
-        new_ldt = cpu_lduw_kernel(env, tss_base + 0x2a);
+        new_ldt = cpu_lduw_kernel_ra(env, tss_base + 0x2a, retaddr);
         new_segs[R_FS] = 0;
         new_segs[R_GS] = 0;
         new_trap = 0;
@@ -325,10 +331,10 @@ static void switch_tss(CPUX86State *env, int tss_selector,
     /* XXX: it can still fail in some cases, so a bigger hack is
        necessary to valid the TLB after having done the accesses */
 
-    v1 = cpu_ldub_kernel(env, env->tr.base);
-    v2 = cpu_ldub_kernel(env, env->tr.base + old_tss_limit_max);
-    cpu_stb_kernel(env, env->tr.base, v1);
-    cpu_stb_kernel(env, env->tr.base + old_tss_limit_max, v2);
+    v1 = cpu_ldub_kernel_ra(env, env->tr.base, retaddr);
+    v2 = cpu_ldub_kernel_ra(env, env->tr.base + old_tss_limit_max, retaddr);
+    cpu_stb_kernel_ra(env, env->tr.base, v1, retaddr);
+    cpu_stb_kernel_ra(env, env->tr.base + old_tss_limit_max, v2, retaddr);
 
     /* clear busy bit (it is restartable) */
     if (source == SWITCH_TSS_JMP || source == SWITCH_TSS_IRET) {
@@ -336,9 +342,9 @@ static void switch_tss(CPUX86State *env, int tss_selector,
         uint32_t e2;
 
         ptr = env->gdt.base + (env->tr.selector & ~7);
-        e2 = cpu_ldl_kernel(env, ptr + 4);
+        e2 = cpu_ldl_kernel_ra(env, ptr + 4, retaddr);
         e2 &= ~DESC_TSS_BUSY_MASK;
-        cpu_stl_kernel(env, ptr + 4, e2);
+        cpu_stl_kernel_ra(env, ptr + 4, e2, retaddr);
     }
     old_eflags = cpu_compute_eflags(env);
     if (source == SWITCH_TSS_IRET) {
@@ -348,35 +354,35 @@ static void switch_tss(CPUX86State *env, int tss_selector,
     /* save the current state in the old TSS */
     if (type & 8) {
         /* 32 bit */
-        cpu_stl_kernel(env, env->tr.base + 0x20, next_eip);
-        cpu_stl_kernel(env, env->tr.base + 0x24, old_eflags);
-        cpu_stl_kernel(env, env->tr.base + (0x28 + 0 * 4), env->regs[R_EAX]);
-        cpu_stl_kernel(env, env->tr.base + (0x28 + 1 * 4), env->regs[R_ECX]);
-        cpu_stl_kernel(env, env->tr.base + (0x28 + 2 * 4), env->regs[R_EDX]);
-        cpu_stl_kernel(env, env->tr.base + (0x28 + 3 * 4), env->regs[R_EBX]);
-        cpu_stl_kernel(env, env->tr.base + (0x28 + 4 * 4), env->regs[R_ESP]);
-        cpu_stl_kernel(env, env->tr.base + (0x28 + 5 * 4), env->regs[R_EBP]);
-        cpu_stl_kernel(env, env->tr.base + (0x28 + 6 * 4), env->regs[R_ESI]);
-        cpu_stl_kernel(env, env->tr.base + (0x28 + 7 * 4), env->regs[R_EDI]);
+        cpu_stl_kernel_ra(env, env->tr.base + 0x20, next_eip, retaddr);
+        cpu_stl_kernel_ra(env, env->tr.base + 0x24, old_eflags, retaddr);
+        cpu_stl_kernel_ra(env, env->tr.base + (0x28 + 0 * 4), env->regs[R_EAX], retaddr);
+        cpu_stl_kernel_ra(env, env->tr.base + (0x28 + 1 * 4), env->regs[R_ECX], retaddr);
+        cpu_stl_kernel_ra(env, env->tr.base + (0x28 + 2 * 4), env->regs[R_EDX], retaddr);
+        cpu_stl_kernel_ra(env, env->tr.base + (0x28 + 3 * 4), env->regs[R_EBX], retaddr);
+        cpu_stl_kernel_ra(env, env->tr.base + (0x28 + 4 * 4), env->regs[R_ESP], retaddr);
+        cpu_stl_kernel_ra(env, env->tr.base + (0x28 + 5 * 4), env->regs[R_EBP], retaddr);
+        cpu_stl_kernel_ra(env, env->tr.base + (0x28 + 6 * 4), env->regs[R_ESI], retaddr);
+        cpu_stl_kernel_ra(env, env->tr.base + (0x28 + 7 * 4), env->regs[R_EDI], retaddr);
         for (i = 0; i < 6; i++) {
-            cpu_stw_kernel(env, env->tr.base + (0x48 + i * 4),
-                           env->segs[i].selector);
+            cpu_stw_kernel_ra(env, env->tr.base + (0x48 + i * 4),
+                              env->segs[i].selector, retaddr);
         }
     } else {
         /* 16 bit */
-        cpu_stw_kernel(env, env->tr.base + 0x0e, next_eip);
-        cpu_stw_kernel(env, env->tr.base + 0x10, old_eflags);
-        cpu_stw_kernel(env, env->tr.base + (0x12 + 0 * 2), env->regs[R_EAX]);
-        cpu_stw_kernel(env, env->tr.base + (0x12 + 1 * 2), env->regs[R_ECX]);
-        cpu_stw_kernel(env, env->tr.base + (0x12 + 2 * 2), env->regs[R_EDX]);
-        cpu_stw_kernel(env, env->tr.base + (0x12 + 3 * 2), env->regs[R_EBX]);
-        cpu_stw_kernel(env, env->tr.base + (0x12 + 4 * 2), env->regs[R_ESP]);
-        cpu_stw_kernel(env, env->tr.base + (0x12 + 5 * 2), env->regs[R_EBP]);
-        cpu_stw_kernel(env, env->tr.base + (0x12 + 6 * 2), env->regs[R_ESI]);
-        cpu_stw_kernel(env, env->tr.base + (0x12 + 7 * 2), env->regs[R_EDI]);
+        cpu_stw_kernel_ra(env, env->tr.base + 0x0e, next_eip, retaddr);
+        cpu_stw_kernel_ra(env, env->tr.base + 0x10, old_eflags, retaddr);
+        cpu_stw_kernel_ra(env, env->tr.base + (0x12 + 0 * 2), env->regs[R_EAX], retaddr);
+        cpu_stw_kernel_ra(env, env->tr.base + (0x12 + 1 * 2), env->regs[R_ECX], retaddr);
+        cpu_stw_kernel_ra(env, env->tr.base + (0x12 + 2 * 2), env->regs[R_EDX], retaddr);
+        cpu_stw_kernel_ra(env, env->tr.base + (0x12 + 3 * 2), env->regs[R_EBX], retaddr);
+        cpu_stw_kernel_ra(env, env->tr.base + (0x12 + 4 * 2), env->regs[R_ESP], retaddr);
+        cpu_stw_kernel_ra(env, env->tr.base + (0x12 + 5 * 2), env->regs[R_EBP], retaddr);
+        cpu_stw_kernel_ra(env, env->tr.base + (0x12 + 6 * 2), env->regs[R_ESI], retaddr);
+        cpu_stw_kernel_ra(env, env->tr.base + (0x12 + 7 * 2), env->regs[R_EDI], retaddr);
         for (i = 0; i < 4; i++) {
-            cpu_stw_kernel(env, env->tr.base + (0x22 + i * 4),
-                           env->segs[i].selector);
+            cpu_stw_kernel_ra(env, env->tr.base + (0x22 + i * 4),
+                              env->segs[i].selector, retaddr);
         }
     }
 
@@ -384,7 +390,7 @@ static void switch_tss(CPUX86State *env, int tss_selector,
        context */
 
     if (source == SWITCH_TSS_CALL) {
-        cpu_stw_kernel(env, tss_base, env->tr.selector);
+        cpu_stw_kernel_ra(env, tss_base, env->tr.selector, retaddr);
         new_eflags |= NT_MASK;
     }
 
@@ -394,9 +400,9 @@ static void switch_tss(CPUX86State *env, int tss_selector,
         uint32_t e2;
 
         ptr = env->gdt.base + (tss_selector & ~7);
-        e2 = cpu_ldl_kernel(env, ptr + 4);
+        e2 = cpu_ldl_kernel_ra(env, ptr + 4, retaddr);
         e2 |= DESC_TSS_BUSY_MASK;
-        cpu_stl_kernel(env, ptr + 4, e2);
+        cpu_stl_kernel_ra(env, ptr + 4, e2, retaddr);
     }
 
     /* set the new CPU state */
@@ -448,23 +454,23 @@ static void switch_tss(CPUX86State *env, int tss_selector,
 
     /* load the LDT */
     if (new_ldt & 4) {
-        raise_exception_err(env, EXCP0A_TSS, new_ldt & 0xfffc);
+        raise_exception_err(env, EXCP0A_TSS, new_ldt & 0xfffc, retaddr);
     }
 
     if ((new_ldt & 0xfffc) != 0) {
         dt = &env->gdt;
         index = new_ldt & ~7;
         if ((index + 7) > dt->limit) {
-            raise_exception_err(env, EXCP0A_TSS, new_ldt & 0xfffc);
+            raise_exception_err(env, EXCP0A_TSS, new_ldt & 0xfffc, retaddr);
         }
         ptr = dt->base + index;
-        e1 = cpu_ldl_kernel(env, ptr);
-        e2 = cpu_ldl_kernel(env, ptr + 4);
+        e1 = cpu_ldl_kernel_ra(env, ptr, retaddr);
+        e2 = cpu_ldl_kernel_ra(env, ptr + 4, retaddr);
         if ((e2 & DESC_S_MASK) || ((e2 >> DESC_TYPE_SHIFT) & 0xf) != 2) {
-            raise_exception_err(env, EXCP0A_TSS, new_ldt & 0xfffc);
+            raise_exception_err(env, EXCP0A_TSS, new_ldt & 0xfffc, retaddr);
         }
         if (!(e2 & DESC_P_MASK)) {
-            raise_exception_err(env, EXCP0A_TSS, new_ldt & 0xfffc);
+            raise_exception_err(env, EXCP0A_TSS, new_ldt & 0xfffc, retaddr);
         }
         load_seg_cache_raw_dt(&env->ldt, e1, e2);
     }
@@ -472,18 +478,18 @@ static void switch_tss(CPUX86State *env, int tss_selector,
     /* load the segments */
     if (!(new_eflags & VM_MASK)) {
         int cpl = new_segs[R_CS] & 3;
-        tss_load_seg(env, R_CS, new_segs[R_CS], cpl);
-        tss_load_seg(env, R_SS, new_segs[R_SS], cpl);
-        tss_load_seg(env, R_ES, new_segs[R_ES], cpl);
-        tss_load_seg(env, R_DS, new_segs[R_DS], cpl);
-        tss_load_seg(env, R_FS, new_segs[R_FS], cpl);
-        tss_load_seg(env, R_GS, new_segs[R_GS], cpl);
+        tss_load_seg(env, R_CS, new_segs[R_CS], cpl, retaddr);
+        tss_load_seg(env, R_SS, new_segs[R_SS], cpl, retaddr);
+        tss_load_seg(env, R_ES, new_segs[R_ES], cpl, retaddr);
+        tss_load_seg(env, R_DS, new_segs[R_DS], cpl, retaddr);
+        tss_load_seg(env, R_FS, new_segs[R_FS], cpl, retaddr);
+        tss_load_seg(env, R_GS, new_segs[R_GS], cpl, retaddr);
     }
 
     /* check that env->eip is in the CS segment limits */
     if (new_eip > env->segs[R_CS].limit) {
         /* XXX: different exception if CALL? */
-        raise_exception_err(env, EXCP0D_GPF, 0);
+        raise_exception_err(env, EXCP0D_GPF, 0, retaddr);
     }
 
 #ifndef CONFIG_USER_ONLY
@@ -549,27 +555,27 @@ static int exception_has_error_code(int intno)
 #define SEG_ADDL(ssp, sp, sp_mask) ((uint32_t)((ssp) + (sp & (sp_mask))))
 
 /* XXX: add a is_user flag to have proper security support */
-#define PUSHW(ssp, sp, sp_mask, val)                             \
+#define PUSHW(ssp, sp, sp_mask, val, ra)                         \
     {                                                            \
         sp -= 2;                                                 \
-        cpu_stw_kernel(env, (ssp) + (sp & (sp_mask)), (val));    \
+        cpu_stw_kernel_ra(env, (ssp) + (sp & (sp_mask)), (val), ra); \
     }
 
-#define PUSHL(ssp, sp, sp_mask, val)                                    \
+#define PUSHL(ssp, sp, sp_mask, val, ra)                                \
     {                                                                   \
         sp -= 4;                                                        \
-        cpu_stl_kernel(env, SEG_ADDL(ssp, sp, sp_mask), (uint32_t)(val)); \
+        cpu_stl_kernel_ra(env, SEG_ADDL(ssp, sp, sp_mask), (uint32_t)(val), ra); \
     }
 
-#define POPW(ssp, sp, sp_mask, val)                              \
+#define POPW(ssp, sp, sp_mask, val, ra)                          \
     {                                                            \
-        val = cpu_lduw_kernel(env, (ssp) + (sp & (sp_mask)));    \
+        val = cpu_lduw_kernel_ra(env, (ssp) + (sp & (sp_mask)), ra); \
         sp += 2;                                                 \
     }
 
-#define POPL(ssp, sp, sp_mask, val)                                     \
+#define POPL(ssp, sp, sp_mask, val, ra)                                 \
     {                                                                   \
-        val = (uint32_t)cpu_ldl_kernel(env, SEG_ADDL(ssp, sp, sp_mask)); \
+        val = (uint32_t)cpu_ldl_kernel_ra(env, SEG_ADDL(ssp, sp, sp_mask), ra); \
         sp += 4;                                                        \
     }
 
@@ -598,7 +604,7 @@ static void do_interrupt_protected(CPUX86State *env, int intno, int is_int,
 
     dt = &env->idt;
     if (intno * 8 + 7 > dt->limit) {
-        raise_exception_err(env, EXCP0D_GPF, intno * 8 + 2);
+        raise_exception_err(env, EXCP0D_GPF, intno * 8 + 2, 0);
     }
     ptr = dt->base + intno * 8;
     e1 = cpu_ldl_kernel(env, ptr);
@@ -609,9 +615,9 @@ static void do_interrupt_protected(CPUX86State *env, int intno, int is_int,
     case 5: /* task gate */
         /* must do that check here to return the correct error code */
         if (!(e2 & DESC_P_MASK)) {
-            raise_exception_err(env, EXCP0B_NOSEG, intno * 8 + 2);
+            raise_exception_err(env, EXCP0B_NOSEG, intno * 8 + 2, 0);
         }
-        switch_tss(env, intno * 8, e1, e2, SWITCH_TSS_CALL, old_eip);
+        switch_tss(env, intno * 8, e1, e2, SWITCH_TSS_CALL, old_eip, 0);
         if (has_error_code) {
             int type;
             uint32_t mask;
@@ -640,60 +646,60 @@ static void do_interrupt_protected(CPUX86State *env, int intno, int is_int,
     case 15: /* 386 trap gate */
         break;
     default:
-        raise_exception_err(env, EXCP0D_GPF, intno * 8 + 2);
+        raise_exception_err(env, EXCP0D_GPF, intno * 8 + 2, 0);
         break;
     }
     dpl = (e2 >> DESC_DPL_SHIFT) & 3;
     cpl = env->hflags & HF_CPL_MASK;
     /* check privilege if software int */
     if (is_int && dpl < cpl) {
-        raise_exception_err(env, EXCP0D_GPF, intno * 8 + 2);
+        raise_exception_err(env, EXCP0D_GPF, intno * 8 + 2, 0);
     }
     /* check valid bit */
     if (!(e2 & DESC_P_MASK)) {
-        raise_exception_err(env, EXCP0B_NOSEG, intno * 8 + 2);
+        raise_exception_err(env, EXCP0B_NOSEG, intno * 8 + 2, 0);
     }
     selector = e1 >> 16;
     offset = (e2 & 0xffff0000) | (e1 & 0x0000ffff);
     if ((selector & 0xfffc) == 0) {
-        raise_exception_err(env, EXCP0D_GPF, 0);
+        raise_exception_err(env, EXCP0D_GPF, 0, 0);
     }
-    if (load_segment(env, &e1, &e2, selector) != 0) {
-        raise_exception_err(env, EXCP0D_GPF, selector & 0xfffc);
+    if (load_segment(env, &e1, &e2, selector, 0) != 0) {
+        raise_exception_err(env, EXCP0D_GPF, selector & 0xfffc, 0);
     }
     if (!(e2 & DESC_S_MASK) || !(e2 & (DESC_CS_MASK))) {
-        raise_exception_err(env, EXCP0D_GPF, selector & 0xfffc);
+        raise_exception_err(env, EXCP0D_GPF, selector & 0xfffc, 0);
     }
     dpl = (e2 >> DESC_DPL_SHIFT) & 3;
     if (dpl > cpl) {
-        raise_exception_err(env, EXCP0D_GPF, selector & 0xfffc);
+        raise_exception_err(env, EXCP0D_GPF, selector & 0xfffc, 0);
     }
     if (!(e2 & DESC_P_MASK)) {
-        raise_exception_err(env, EXCP0B_NOSEG, selector & 0xfffc);
+        raise_exception_err(env, EXCP0B_NOSEG, selector & 0xfffc, 0);
     }
     if (!(e2 & DESC_C_MASK) && dpl < cpl) {
         /* to inner privilege */
-        get_ss_esp_from_tss(env, &ss, &esp, dpl);
+        get_ss_esp_from_tss(env, &ss, &esp, dpl, 0);
         if ((ss & 0xfffc) == 0) {
-            raise_exception_err(env, EXCP0A_TSS, ss & 0xfffc);
+            raise_exception_err(env, EXCP0A_TSS, ss & 0xfffc, 0);
         }
         if ((ss & 3) != dpl) {
-            raise_exception_err(env, EXCP0A_TSS, ss & 0xfffc);
+            raise_exception_err(env, EXCP0A_TSS, ss & 0xfffc, 0);
         }
-        if (load_segment(env, &ss_e1, &ss_e2, ss) != 0) {
-            raise_exception_err(env, EXCP0A_TSS, ss & 0xfffc);
+        if (load_segment(env, &ss_e1, &ss_e2, ss, 0) != 0) {
+            raise_exception_err(env, EXCP0A_TSS, ss & 0xfffc, 0);
         }
         ss_dpl = (ss_e2 >> DESC_DPL_SHIFT) & 3;
         if (ss_dpl != dpl) {
-            raise_exception_err(env, EXCP0A_TSS, ss & 0xfffc);
+            raise_exception_err(env, EXCP0A_TSS, ss & 0xfffc, 0);
         }
         if (!(ss_e2 & DESC_S_MASK) ||
             (ss_e2 & DESC_CS_MASK) ||
             !(ss_e2 & DESC_W_MASK)) {
-            raise_exception_err(env, EXCP0A_TSS, ss & 0xfffc);
+            raise_exception_err(env, EXCP0A_TSS, ss & 0xfffc, 0);
         }
         if (!(ss_e2 & DESC_P_MASK)) {
-            raise_exception_err(env, EXCP0A_TSS, ss & 0xfffc);
+            raise_exception_err(env, EXCP0A_TSS, ss & 0xfffc, 0);
         }
         new_stack = 1;
         sp_mask = get_sp_mask(ss_e2);
@@ -701,7 +707,7 @@ static void do_interrupt_protected(CPUX86State *env, int intno, int is_int,
     } else if ((e2 & DESC_C_MASK) || dpl == cpl) {
         /* to same privilege */
         if (vm86) {
-            raise_exception_err(env, EXCP0D_GPF, selector & 0xfffc);
+            raise_exception_err(env, EXCP0D_GPF, selector & 0xfffc, 0);
         }
         new_stack = 0;
         sp_mask = get_sp_mask(env->segs[R_SS].flags);
@@ -709,7 +715,7 @@ static void do_interrupt_protected(CPUX86State *env, int intno, int is_int,
         esp = env->regs[R_ESP];
         dpl = cpl;
     } else {
-        raise_exception_err(env, EXCP0D_GPF, selector & 0xfffc);
+        raise_exception_err(env, EXCP0D_GPF, selector & 0xfffc, 0);
         new_stack = 0; /* avoid warning */
         sp_mask = 0; /* avoid warning */
         ssp = 0; /* avoid warning */
@@ -729,36 +735,36 @@ static void do_interrupt_protected(CPUX86State *env, int intno, int is_int,
     if (shift == 1) {
         if (new_stack) {
             if (vm86) {
-                PUSHL(ssp, esp, sp_mask, env->segs[R_GS].selector);
-                PUSHL(ssp, esp, sp_mask, env->segs[R_FS].selector);
-                PUSHL(ssp, esp, sp_mask, env->segs[R_DS].selector);
-                PUSHL(ssp, esp, sp_mask, env->segs[R_ES].selector);
+                PUSHL(ssp, esp, sp_mask, env->segs[R_GS].selector, 0);
+                PUSHL(ssp, esp, sp_mask, env->segs[R_FS].selector, 0);
+                PUSHL(ssp, esp, sp_mask, env->segs[R_DS].selector, 0);
+                PUSHL(ssp, esp, sp_mask, env->segs[R_ES].selector, 0);
             }
-            PUSHL(ssp, esp, sp_mask, env->segs[R_SS].selector);
-            PUSHL(ssp, esp, sp_mask, env->regs[R_ESP]);
+            PUSHL(ssp, esp, sp_mask, env->segs[R_SS].selector, 0);
+            PUSHL(ssp, esp, sp_mask, env->regs[R_ESP], 0);
         }
-        PUSHL(ssp, esp, sp_mask, cpu_compute_eflags(env));
-        PUSHL(ssp, esp, sp_mask, env->segs[R_CS].selector);
-        PUSHL(ssp, esp, sp_mask, old_eip);
+        PUSHL(ssp, esp, sp_mask, cpu_compute_eflags(env), 0);
+        PUSHL(ssp, esp, sp_mask, env->segs[R_CS].selector, 0);
+        PUSHL(ssp, esp, sp_mask, old_eip, 0);
         if (has_error_code) {
-            PUSHL(ssp, esp, sp_mask, error_code);
+            PUSHL(ssp, esp, sp_mask, error_code, 0);
         }
     } else {
         if (new_stack) {
             if (vm86) {
-                PUSHW(ssp, esp, sp_mask, env->segs[R_GS].selector);
-                PUSHW(ssp, esp, sp_mask, env->segs[R_FS].selector);
-                PUSHW(ssp, esp, sp_mask, env->segs[R_DS].selector);
-                PUSHW(ssp, esp, sp_mask, env->segs[R_ES].selector);
+                PUSHW(ssp, esp, sp_mask, env->segs[R_GS].selector, 0);
+                PUSHW(ssp, esp, sp_mask, env->segs[R_FS].selector, 0);
+                PUSHW(ssp, esp, sp_mask, env->segs[R_DS].selector, 0);
+                PUSHW(ssp, esp, sp_mask, env->segs[R_ES].selector, 0);
             }
-            PUSHW(ssp, esp, sp_mask, env->segs[R_SS].selector);
-            PUSHW(ssp, esp, sp_mask, env->regs[R_ESP]);
+            PUSHW(ssp, esp, sp_mask, env->segs[R_SS].selector, 0);
+            PUSHW(ssp, esp, sp_mask, env->regs[R_ESP], 0);
         }
-        PUSHW(ssp, esp, sp_mask, cpu_compute_eflags(env));
-        PUSHW(ssp, esp, sp_mask, env->segs[R_CS].selector);
-        PUSHW(ssp, esp, sp_mask, old_eip);
+        PUSHW(ssp, esp, sp_mask, cpu_compute_eflags(env), 0);
+        PUSHW(ssp, esp, sp_mask, env->segs[R_CS].selector, 0);
+        PUSHW(ssp, esp, sp_mask, old_eip, 0);
         if (has_error_code) {
-            PUSHW(ssp, esp, sp_mask, error_code);
+            PUSHW(ssp, esp, sp_mask, error_code, 0);
         }
     }
 
@@ -791,15 +797,15 @@ static void do_interrupt_protected(CPUX86State *env, int intno, int is_int,
 
 #ifdef TARGET_X86_64
 
-#define PUSHQ(sp, val)                          \
+#define PUSHQ(sp, val, ra)                      \
     {                                           \
         sp -= 8;                                \
-        cpu_stq_kernel(env, sp, (val));         \
+        cpu_stq_kernel_ra(env, sp, (val), ra);  \
     }
 
-#define POPQ(sp, val)                           \
+#define POPQ(sp, val, ra)                       \
     {                                           \
-        val = cpu_ldq_kernel(env, sp);          \
+        val = cpu_ldq_kernel_ra(env, sp, ra);   \
         sp += 8;                                \
     }
 
@@ -818,7 +824,7 @@ static inline target_ulong get_rsp_from_tss(CPUX86State *env, int level)
     }
     index = 8 * level + 4;
     if ((index + 7) > env->tr.limit) {
-        raise_exception_err(env, EXCP0A_TSS, env->tr.selector & 0xfffc);
+        raise_exception_err(env, EXCP0A_TSS, env->tr.selector & 0xfffc, 0);
     }
     return cpu_ldq_kernel(env, env->tr.base + index);
 }
@@ -846,7 +852,7 @@ static void do_interrupt64(CPUX86State *env, int intno, int is_int,
 
     dt = &env->idt;
     if (intno * 16 + 15 > dt->limit) {
-        raise_exception_err(env, EXCP0D_GPF, intno * 16 + 2);
+        raise_exception_err(env, EXCP0D_GPF, intno * 16 + 2, 0);
     }
     ptr = dt->base + intno * 16;
     e1 = cpu_ldl_kernel(env, ptr);
@@ -859,41 +865,41 @@ static void do_interrupt64(CPUX86State *env, int intno, int is_int,
     case 15: /* 386 trap gate */
         break;
     default:
-        raise_exception_err(env, EXCP0D_GPF, intno * 16 + 2);
+        raise_exception_err(env, EXCP0D_GPF, intno * 16 + 2, 0);
         break;
     }
     dpl = (e2 >> DESC_DPL_SHIFT) & 3;
     cpl = env->hflags & HF_CPL_MASK;
     /* check privilege if software int */
     if (is_int && dpl < cpl) {
-        raise_exception_err(env, EXCP0D_GPF, intno * 16 + 2);
+        raise_exception_err(env, EXCP0D_GPF, intno * 16 + 2, 0);
     }
     /* check valid bit */
     if (!(e2 & DESC_P_MASK)) {
-        raise_exception_err(env, EXCP0B_NOSEG, intno * 16 + 2);
+        raise_exception_err(env, EXCP0B_NOSEG, intno * 16 + 2, 0);
     }
     selector = e1 >> 16;
     offset = ((target_ulong)e3 << 32) | (e2 & 0xffff0000) | (e1 & 0x0000ffff);
     ist = e2 & 7;
     if ((selector & 0xfffc) == 0) {
-        raise_exception_err(env, EXCP0D_GPF, 0);
+        raise_exception_err(env, EXCP0D_GPF, 0, 0);
     }
 
-    if (load_segment(env, &e1, &e2, selector) != 0) {
-        raise_exception_err(env, EXCP0D_GPF, selector & 0xfffc);
+    if (load_segment(env, &e1, &e2, selector, 0) != 0) {
+        raise_exception_err(env, EXCP0D_GPF, selector & 0xfffc, 0);
     }
     if (!(e2 & DESC_S_MASK) || !(e2 & (DESC_CS_MASK))) {
-        raise_exception_err(env, EXCP0D_GPF, selector & 0xfffc);
+        raise_exception_err(env, EXCP0D_GPF, selector & 0xfffc, 0);
     }
     dpl = (e2 >> DESC_DPL_SHIFT) & 3;
     if (dpl > cpl) {
-        raise_exception_err(env, EXCP0D_GPF, selector & 0xfffc);
+        raise_exception_err(env, EXCP0D_GPF, selector & 0xfffc, 0);
     }
     if (!(e2 & DESC_P_MASK)) {
-        raise_exception_err(env, EXCP0B_NOSEG, selector & 0xfffc);
+        raise_exception_err(env, EXCP0B_NOSEG, selector & 0xfffc, 0);
     }
     if (!(e2 & DESC_L_MASK) || (e2 & DESC_B_MASK)) {
-        raise_exception_err(env, EXCP0D_GPF, selector & 0xfffc);
+        raise_exception_err(env, EXCP0D_GPF, selector & 0xfffc, 0);
     }
     if ((!(e2 & DESC_C_MASK) && dpl < cpl) || ist != 0) {
         /* to inner privilege */
@@ -903,25 +909,25 @@ static void do_interrupt64(CPUX86State *env, int intno, int is_int,
     } else if ((e2 & DESC_C_MASK) || dpl == cpl) {
         /* to same privilege */
         if (env->eflags & VM_MASK) {
-            raise_exception_err(env, EXCP0D_GPF, selector & 0xfffc);
+            raise_exception_err(env, EXCP0D_GPF, selector & 0xfffc, 0);
         }
         new_stack = 0;
         esp = env->regs[R_ESP];
         dpl = cpl;
     } else {
-        raise_exception_err(env, EXCP0D_GPF, selector & 0xfffc);
+        raise_exception_err(env, EXCP0D_GPF, selector & 0xfffc, 0);
         new_stack = 0; /* avoid warning */
         esp = 0; /* avoid warning */
     }
     esp &= ~0xfLL; /* align stack */
 
-    PUSHQ(esp, env->segs[R_SS].selector);
-    PUSHQ(esp, env->regs[R_ESP]);
-    PUSHQ(esp, cpu_compute_eflags(env));
-    PUSHQ(esp, env->segs[R_CS].selector);
-    PUSHQ(esp, old_eip);
+    PUSHQ(esp, env->segs[R_SS].selector, 0);
+    PUSHQ(esp, env->regs[R_ESP], 0);
+    PUSHQ(esp, cpu_compute_eflags(env), 0);
+    PUSHQ(esp, env->segs[R_CS].selector, 0);
+    PUSHQ(esp, old_eip, 0);
     if (has_error_code) {
-        PUSHQ(esp, error_code);
+        PUSHQ(esp, error_code, 0);
     }
 
     /* interrupt gate clear IF mask */
@@ -961,7 +967,7 @@ void helper_syscall(CPUX86State *env, int next_eip_addend)
     int selector;
 
     if (!(env->efer & MSR_EFER_SCE)) {
-        raise_exception_err(env, EXCP06_ILLOP, 0);
+        raise_exception_err(env, EXCP06_ILLOP, 0, GETPC());
     }
     selector = (env->star >> 32) & 0xffff;
     if (env->hflags & HF_LMA_MASK) {
@@ -1016,11 +1022,11 @@ void helper_sysret(CPUX86State *env, int dflag)
     int cpl, selector;
 
     if (!(env->efer & MSR_EFER_SCE)) {
-        raise_exception_err(env, EXCP06_ILLOP, 0);
+        raise_exception_err(env, EXCP06_ILLOP, 0, GETPC());
     }
     cpl = env->hflags & HF_CPL_MASK;
     if (!(env->cr[0] & CR0_PE_MASK) || cpl != 0) {
-        raise_exception_err(env, EXCP0D_GPF, 0);
+        raise_exception_err(env, EXCP0D_GPF, 0, GETPC());
     }
     selector = (env->star >> 48) & 0xffff;
     if (env->hflags & HF_LMA_MASK) {
@@ -1078,7 +1084,7 @@ static void do_interrupt_real(CPUX86State *env, int intno, int is_int,
     /* real mode (simpler!) */
     dt = &env->idt;
     if (intno * 4 + 3 > dt->limit) {
-        raise_exception_err(env, EXCP0D_GPF, intno * 8 + 2);
+        raise_exception_err(env, EXCP0D_GPF, intno * 8 + 2, 0);
     }
     ptr = dt->base + intno * 4;
     offset = cpu_lduw_kernel(env, ptr);
@@ -1092,9 +1098,9 @@ static void do_interrupt_real(CPUX86State *env, int intno, int is_int,
     }
     old_cs = env->segs[R_CS].selector;
     /* XXX: use SS segment size? */
-    PUSHW(ssp, esp, 0xffff, cpu_compute_eflags(env));
-    PUSHW(ssp, esp, 0xffff, old_cs);
-    PUSHW(ssp, esp, 0xffff, old_eip);
+    PUSHW(ssp, esp, 0xffff, cpu_compute_eflags(env), 0);
+    PUSHW(ssp, esp, 0xffff, old_cs, 0);
+    PUSHW(ssp, esp, 0xffff, old_eip, 0);
 
     /* update processor state */
     env->regs[R_ESP] = (env->regs[R_ESP] & ~0xffff) | (esp & 0xffff);
@@ -1127,7 +1133,7 @@ static void do_interrupt_user(CPUX86State *env, int intno, int is_int,
     cpl = env->hflags & HF_CPL_MASK;
     /* check privilege if software int */
     if (is_int && dpl < cpl) {
-        raise_exception_err(env, EXCP0D_GPF, (intno << shift) + 2);
+        raise_exception_err(env, EXCP0D_GPF, (intno << shift) + 2, 0);
     }
 
     /* Since we emulate only user space, we cannot do more than
@@ -1372,22 +1378,26 @@ void helper_enter_level(CPUX86State *env, int level, int data32,
         while (--level) {
             esp -= 4;
             ebp -= 4;
-            cpu_stl_data(env, ssp + (esp & esp_mask),
-                         cpu_ldl_data(env, ssp + (ebp & esp_mask)));
+            cpu_stl_data_ra(env, ssp + (esp & esp_mask),
+                            cpu_ldl_data_ra(env, ssp + (ebp & esp_mask),
+                                            GETPC()),
+                            GETPC());
         }
         esp -= 4;
-        cpu_stl_data(env, ssp + (esp & esp_mask), t1);
+        cpu_stl_data_ra(env, ssp + (esp & esp_mask), t1, GETPC());
     } else {
         /* 16 bit */
         esp -= 2;
         while (--level) {
             esp -= 2;
             ebp -= 2;
-            cpu_stw_data(env, ssp + (esp & esp_mask),
-                         cpu_lduw_data(env, ssp + (ebp & esp_mask)));
+            cpu_stw_data_ra(env, ssp + (esp & esp_mask),
+                            cpu_lduw_data_ra(env, ssp + (ebp & esp_mask),
+                                             GETPC()),
+                            GETPC());
         }
         esp -= 2;
-        cpu_stw_data(env, ssp + (esp & esp_mask), t1);
+        cpu_stw_data_ra(env, ssp + (esp & esp_mask), t1, GETPC());
     }
 }
 
@@ -1406,20 +1416,22 @@ void helper_enter64_level(CPUX86State *env, int level, int data64,
         while (--level) {
             esp -= 8;
             ebp -= 8;
-            cpu_stq_data(env, esp, cpu_ldq_data(env, ebp));
+            cpu_stq_data_ra(env, esp, cpu_ldq_data_ra(env, ebp, GETPC()),
+                            GETPC());
         }
         esp -= 8;
-        cpu_stq_data(env, esp, t1);
+        cpu_stq_data_ra(env, esp, t1, GETPC());
     } else {
         /* 16 bit */
         esp -= 2;
         while (--level) {
             esp -= 2;
             ebp -= 2;
-            cpu_stw_data(env, esp, cpu_lduw_data(env, ebp));
+            cpu_stw_data_ra(env, esp, cpu_lduw_data_ra(env, ebp, GETPC()),
+                            GETPC());
         }
         esp -= 2;
-        cpu_stw_data(env, esp, t1);
+        cpu_stw_data_ra(env, esp, t1, GETPC());
     }
 }
 #endif
@@ -1438,7 +1450,7 @@ void helper_lldt(CPUX86State *env, int selector)
         env->ldt.limit = 0;
     } else {
         if (selector & 0x4) {
-            raise_exception_err(env, EXCP0D_GPF, selector & 0xfffc);
+            raise_exception_err(env, EXCP0D_GPF, selector & 0xfffc, GETPC());
         }
         dt = &env->gdt;
         index = selector & ~7;
@@ -1451,22 +1463,22 @@ void helper_lldt(CPUX86State *env, int selector)
             entry_limit = 7;
         }
         if ((index + entry_limit) > dt->limit) {
-            raise_exception_err(env, EXCP0D_GPF, selector & 0xfffc);
+            raise_exception_err(env, EXCP0D_GPF, selector & 0xfffc, GETPC());
         }
         ptr = dt->base + index;
-        e1 = cpu_ldl_kernel(env, ptr);
-        e2 = cpu_ldl_kernel(env, ptr + 4);
+        e1 = cpu_ldl_kernel_ra(env, ptr, GETPC());
+        e2 = cpu_ldl_kernel_ra(env, ptr + 4, GETPC());
         if ((e2 & DESC_S_MASK) || ((e2 >> DESC_TYPE_SHIFT) & 0xf) != 2) {
-            raise_exception_err(env, EXCP0D_GPF, selector & 0xfffc);
+            raise_exception_err(env, EXCP0D_GPF, selector & 0xfffc, GETPC());
         }
         if (!(e2 & DESC_P_MASK)) {
-            raise_exception_err(env, EXCP0B_NOSEG, selector & 0xfffc);
+            raise_exception_err(env, EXCP0B_NOSEG, selector & 0xfffc, GETPC());
         }
 #ifdef TARGET_X86_64
         if (env->hflags & HF_LMA_MASK) {
             uint32_t e3;
 
-            e3 = cpu_ldl_kernel(env, ptr + 8);
+            e3 = cpu_ldl_kernel_ra(env, ptr + 8, GETPC());
             load_seg_cache_raw_dt(&env->ldt, e1, e2);
             env->ldt.base |= (target_ulong)e3 << 32;
         } else
@@ -1493,7 +1505,7 @@ void helper_ltr(CPUX86State *env, int selector)
         env->tr.flags = 0;
     } else {
         if (selector & 0x4) {
-            raise_exception_err(env, EXCP0D_GPF, selector & 0xfffc);
+            raise_exception_err(env, EXCP0D_GPF, selector & 0xfffc, GETPC());
         }
         dt = &env->gdt;
         index = selector & ~7;
@@ -1506,27 +1518,27 @@ void helper_ltr(CPUX86State *env, int selector)
             entry_limit = 7;
         }
         if ((index + entry_limit) > dt->limit) {
-            raise_exception_err(env, EXCP0D_GPF, selector & 0xfffc);
+            raise_exception_err(env, EXCP0D_GPF, selector & 0xfffc, GETPC());
         }
         ptr = dt->base + index;
-        e1 = cpu_ldl_kernel(env, ptr);
-        e2 = cpu_ldl_kernel(env, ptr + 4);
+        e1 = cpu_ldl_kernel_ra(env, ptr, GETPC());
+        e2 = cpu_ldl_kernel_ra(env, ptr + 4, GETPC());
         type = (e2 >> DESC_TYPE_SHIFT) & 0xf;
         if ((e2 & DESC_S_MASK) ||
             (type != 1 && type != 9)) {
-            raise_exception_err(env, EXCP0D_GPF, selector & 0xfffc);
+            raise_exception_err(env, EXCP0D_GPF, selector & 0xfffc, GETPC());
         }
         if (!(e2 & DESC_P_MASK)) {
-            raise_exception_err(env, EXCP0B_NOSEG, selector & 0xfffc);
+            raise_exception_err(env, EXCP0B_NOSEG, selector & 0xfffc, GETPC());
         }
 #ifdef TARGET_X86_64
         if (env->hflags & HF_LMA_MASK) {
             uint32_t e3, e4;
 
-            e3 = cpu_ldl_kernel(env, ptr + 8);
-            e4 = cpu_ldl_kernel(env, ptr + 12);
+            e3 = cpu_ldl_kernel_ra(env, ptr + 8, GETPC());
+            e4 = cpu_ldl_kernel_ra(env, ptr + 12, GETPC());
             if ((e4 >> DESC_TYPE_SHIFT) & 0xf) {
-                raise_exception_err(env, EXCP0D_GPF, selector & 0xfffc);
+                raise_exception_err(env, EXCP0D_GPF, selector & 0xfffc, GETPC());
             }
             load_seg_cache_raw_dt(&env->tr, e1, e2);
             env->tr.base |= (target_ulong)e3 << 32;
@@ -1536,7 +1548,7 @@ void helper_ltr(CPUX86State *env, int selector)
             load_seg_cache_raw_dt(&env->tr, e1, e2);
         }
         e2 |= DESC_TSS_BUSY_MASK;
-        cpu_stl_kernel(env, ptr + 4, e2);
+        cpu_stl_kernel_ra(env, ptr + 4, e2, GETPC());
     }
     env->tr.selector = selector;
 }
@@ -1559,7 +1571,7 @@ void helper_load_seg(CPUX86State *env, int seg_reg, int selector)
             && (!(env->hflags & HF_CS64_MASK) || cpl == 3)
 #endif
             ) {
-            raise_exception_err(env, EXCP0D_GPF, 0);
+            raise_exception_err(env, EXCP0D_GPF, 0, GETPC());
         }
         cpu_x86_load_seg_cache(env, seg_reg, selector, 0, 0, 0);
     } else {
@@ -1571,51 +1583,51 @@ void helper_load_seg(CPUX86State *env, int seg_reg, int selector)
         }
         index = selector & ~7;
         if ((index + 7) > dt->limit) {
-            raise_exception_err(env, EXCP0D_GPF, selector & 0xfffc);
+            raise_exception_err(env, EXCP0D_GPF, selector & 0xfffc, GETPC());
         }
         ptr = dt->base + index;
-        e1 = cpu_ldl_kernel(env, ptr);
-        e2 = cpu_ldl_kernel(env, ptr + 4);
+        e1 = cpu_ldl_kernel_ra(env, ptr, GETPC());
+        e2 = cpu_ldl_kernel_ra(env, ptr + 4, GETPC());
 
         if (!(e2 & DESC_S_MASK)) {
-            raise_exception_err(env, EXCP0D_GPF, selector & 0xfffc);
+            raise_exception_err(env, EXCP0D_GPF, selector & 0xfffc, GETPC());
         }
         rpl = selector & 3;
         dpl = (e2 >> DESC_DPL_SHIFT) & 3;
         if (seg_reg == R_SS) {
             /* must be writable segment */
             if ((e2 & DESC_CS_MASK) || !(e2 & DESC_W_MASK)) {
-                raise_exception_err(env, EXCP0D_GPF, selector & 0xfffc);
+                raise_exception_err(env, EXCP0D_GPF, selector & 0xfffc, GETPC());
             }
             if (rpl != cpl || dpl != cpl) {
-                raise_exception_err(env, EXCP0D_GPF, selector & 0xfffc);
+                raise_exception_err(env, EXCP0D_GPF, selector & 0xfffc, GETPC());
             }
         } else {
             /* must be readable segment */
             if ((e2 & (DESC_CS_MASK | DESC_R_MASK)) == DESC_CS_MASK) {
-                raise_exception_err(env, EXCP0D_GPF, selector & 0xfffc);
+                raise_exception_err(env, EXCP0D_GPF, selector & 0xfffc, GETPC());
             }
 
             if (!(e2 & DESC_CS_MASK) || !(e2 & DESC_C_MASK)) {
                 /* if not conforming code, test rights */
                 if (dpl < cpl || dpl < rpl) {
-                    raise_exception_err(env, EXCP0D_GPF, selector & 0xfffc);
+                    raise_exception_err(env, EXCP0D_GPF, selector & 0xfffc, GETPC());
                 }
             }
         }
 
         if (!(e2 & DESC_P_MASK)) {
             if (seg_reg == R_SS) {
-                raise_exception_err(env, EXCP0C_STACK, selector & 0xfffc);
+                raise_exception_err(env, EXCP0C_STACK, selector & 0xfffc, GETPC());
             } else {
-                raise_exception_err(env, EXCP0B_NOSEG, selector & 0xfffc);
+                raise_exception_err(env, EXCP0B_NOSEG, selector & 0xfffc, GETPC());
             }
         }
 
         /* set the access bit if not already set */
         if (!(e2 & DESC_A_MASK)) {
             e2 |= DESC_A_MASK;
-            cpu_stl_kernel(env, ptr + 4, e2);
+            cpu_stl_kernel_ra(env, ptr + 4, e2, GETPC());
         }
 
         cpu_x86_load_seg_cache(env, seg_reg, selector,
@@ -1638,39 +1650,39 @@ void helper_ljmp_protected(CPUX86State *env, int new_cs, target_ulong new_eip,
     target_ulong next_eip;
 
     if ((new_cs & 0xfffc) == 0) {
-        raise_exception_err(env, EXCP0D_GPF, 0);
+        raise_exception_err(env, EXCP0D_GPF, 0, GETPC());
     }
-    if (load_segment(env, &e1, &e2, new_cs) != 0) {
-        raise_exception_err(env, EXCP0D_GPF, new_cs & 0xfffc);
+    if (load_segment(env, &e1, &e2, new_cs, GETPC()) != 0) {
+        raise_exception_err(env, EXCP0D_GPF, new_cs & 0xfffc, GETPC());
     }
     cpl = env->hflags & HF_CPL_MASK;
     if (e2 & DESC_S_MASK) {
         if (!(e2 & DESC_CS_MASK)) {
-            raise_exception_err(env, EXCP0D_GPF, new_cs & 0xfffc);
+            raise_exception_err(env, EXCP0D_GPF, new_cs & 0xfffc, GETPC());
         }
         dpl = (e2 >> DESC_DPL_SHIFT) & 3;
         if (e2 & DESC_C_MASK) {
             /* conforming code segment */
             if (dpl > cpl) {
-                raise_exception_err(env, EXCP0D_GPF, new_cs & 0xfffc);
+                raise_exception_err(env, EXCP0D_GPF, new_cs & 0xfffc, GETPC());
             }
         } else {
             /* non conforming code segment */
             rpl = new_cs & 3;
             if (rpl > cpl) {
-                raise_exception_err(env, EXCP0D_GPF, new_cs & 0xfffc);
+                raise_exception_err(env, EXCP0D_GPF, new_cs & 0xfffc, GETPC());
             }
             if (dpl != cpl) {
-                raise_exception_err(env, EXCP0D_GPF, new_cs & 0xfffc);
+                raise_exception_err(env, EXCP0D_GPF, new_cs & 0xfffc, GETPC());
             }
         }
         if (!(e2 & DESC_P_MASK)) {
-            raise_exception_err(env, EXCP0B_NOSEG, new_cs & 0xfffc);
+            raise_exception_err(env, EXCP0B_NOSEG, new_cs & 0xfffc, GETPC());
         }
         limit = get_seg_limit(e1, e2);
         if (new_eip > limit &&
             !(env->hflags & HF_LMA_MASK) && !(e2 & DESC_L_MASK)) {
-            raise_exception_err(env, EXCP0D_GPF, new_cs & 0xfffc);
+            raise_exception_err(env, EXCP0D_GPF, new_cs & 0xfffc, GETPC());
         }
         cpu_x86_load_seg_cache(env, R_CS, (new_cs & 0xfffc) | cpl,
                        get_seg_base(e1, e2), limit, e2);
@@ -1686,50 +1698,50 @@ void helper_ljmp_protected(CPUX86State *env, int new_cs, target_ulong new_eip,
         case 9: /* 386 TSS */
         case 5: /* task gate */
             if (dpl < cpl || dpl < rpl) {
-                raise_exception_err(env, EXCP0D_GPF, new_cs & 0xfffc);
+                raise_exception_err(env, EXCP0D_GPF, new_cs & 0xfffc, GETPC());
             }
             next_eip = env->eip + next_eip_addend;
-            switch_tss(env, new_cs, e1, e2, SWITCH_TSS_JMP, next_eip);
+            switch_tss(env, new_cs, e1, e2, SWITCH_TSS_JMP, next_eip, GETPC());
             break;
         case 4: /* 286 call gate */
         case 12: /* 386 call gate */
             if ((dpl < cpl) || (dpl < rpl)) {
-                raise_exception_err(env, EXCP0D_GPF, new_cs & 0xfffc);
+                raise_exception_err(env, EXCP0D_GPF, new_cs & 0xfffc, GETPC());
             }
             if (!(e2 & DESC_P_MASK)) {
-                raise_exception_err(env, EXCP0B_NOSEG, new_cs & 0xfffc);
+                raise_exception_err(env, EXCP0B_NOSEG, new_cs & 0xfffc, GETPC());
             }
             gate_cs = e1 >> 16;
             new_eip = (e1 & 0xffff);
             if (type == 12) {
                 new_eip |= (e2 & 0xffff0000);
             }
-            if (load_segment(env, &e1, &e2, gate_cs) != 0) {
-                raise_exception_err(env, EXCP0D_GPF, gate_cs & 0xfffc);
+            if (load_segment(env, &e1, &e2, gate_cs, GETPC()) != 0) {
+                raise_exception_err(env, EXCP0D_GPF, gate_cs & 0xfffc, GETPC());
             }
             dpl = (e2 >> DESC_DPL_SHIFT) & 3;
             /* must be code segment */
             if (((e2 & (DESC_S_MASK | DESC_CS_MASK)) !=
                  (DESC_S_MASK | DESC_CS_MASK))) {
-                raise_exception_err(env, EXCP0D_GPF, gate_cs & 0xfffc);
+                raise_exception_err(env, EXCP0D_GPF, gate_cs & 0xfffc, GETPC());
             }
             if (((e2 & DESC_C_MASK) && (dpl > cpl)) ||
                 (!(e2 & DESC_C_MASK) && (dpl != cpl))) {
-                raise_exception_err(env, EXCP0D_GPF, gate_cs & 0xfffc);
+                raise_exception_err(env, EXCP0D_GPF, gate_cs & 0xfffc, GETPC());
             }
             if (!(e2 & DESC_P_MASK)) {
-                raise_exception_err(env, EXCP0D_GPF, gate_cs & 0xfffc);
+                raise_exception_err(env, EXCP0D_GPF, gate_cs & 0xfffc, GETPC());
             }
             limit = get_seg_limit(e1, e2);
             if (new_eip > limit) {
-                raise_exception_err(env, EXCP0D_GPF, 0);
+                raise_exception_err(env, EXCP0D_GPF, 0, GETPC());
             }
             cpu_x86_load_seg_cache(env, R_CS, (gate_cs & 0xfffc) | cpl,
                                    get_seg_base(e1, e2), limit, e2);
             env->eip = new_eip;
             break;
         default:
-            raise_exception_err(env, EXCP0D_GPF, new_cs & 0xfffc);
+            raise_exception_err(env, EXCP0D_GPF, new_cs & 0xfffc, GETPC());
             break;
         }
     }
@@ -1748,11 +1760,11 @@ void helper_lcall_real(CPUX86State *env, int new_cs, target_ulong new_eip1,
     esp_mask = get_sp_mask(env->segs[R_SS].flags);
     ssp = env->segs[R_SS].base;
     if (shift) {
-        PUSHL(ssp, esp, esp_mask, env->segs[R_CS].selector);
-        PUSHL(ssp, esp, esp_mask, next_eip);
+        PUSHL(ssp, esp, esp_mask, env->segs[R_CS].selector, GETPC());
+        PUSHL(ssp, esp, esp_mask, next_eip, GETPC());
     } else {
-        PUSHW(ssp, esp, esp_mask, env->segs[R_CS].selector);
-        PUSHW(ssp, esp, esp_mask, next_eip);
+        PUSHW(ssp, esp, esp_mask, env->segs[R_CS].selector, GETPC());
+        PUSHW(ssp, esp, esp_mask, next_eip, GETPC());
     }
 
     SET_ESP(esp, esp_mask);
@@ -1775,35 +1787,35 @@ void helper_lcall_protected(CPUX86State *env, int new_cs, target_ulong new_eip,
     LOG_PCALL("lcall %04x:%08x s=%d\n", new_cs, (uint32_t)new_eip, shift);
     LOG_PCALL_STATE(CPU(x86_env_get_cpu(env)));
     if ((new_cs & 0xfffc) == 0) {
-        raise_exception_err(env, EXCP0D_GPF, 0);
+        raise_exception_err(env, EXCP0D_GPF, 0, GETPC());
     }
-    if (load_segment(env, &e1, &e2, new_cs) != 0) {
-        raise_exception_err(env, EXCP0D_GPF, new_cs & 0xfffc);
+    if (load_segment(env, &e1, &e2, new_cs, GETPC()) != 0) {
+        raise_exception_err(env, EXCP0D_GPF, new_cs & 0xfffc, GETPC());
     }
     cpl = env->hflags & HF_CPL_MASK;
     LOG_PCALL("desc=%08x:%08x\n", e1, e2);
     if (e2 & DESC_S_MASK) {
         if (!(e2 & DESC_CS_MASK)) {
-            raise_exception_err(env, EXCP0D_GPF, new_cs & 0xfffc);
+            raise_exception_err(env, EXCP0D_GPF, new_cs & 0xfffc, GETPC());
         }
         dpl = (e2 >> DESC_DPL_SHIFT) & 3;
         if (e2 & DESC_C_MASK) {
             /* conforming code segment */
             if (dpl > cpl) {
-                raise_exception_err(env, EXCP0D_GPF, new_cs & 0xfffc);
+                raise_exception_err(env, EXCP0D_GPF, new_cs & 0xfffc, GETPC());
             }
         } else {
             /* non conforming code segment */
             rpl = new_cs & 3;
             if (rpl > cpl) {
-                raise_exception_err(env, EXCP0D_GPF, new_cs & 0xfffc);
+                raise_exception_err(env, EXCP0D_GPF, new_cs & 0xfffc, GETPC());
             }
             if (dpl != cpl) {
-                raise_exception_err(env, EXCP0D_GPF, new_cs & 0xfffc);
+                raise_exception_err(env, EXCP0D_GPF, new_cs & 0xfffc, GETPC());
             }
         }
         if (!(e2 & DESC_P_MASK)) {
-            raise_exception_err(env, EXCP0B_NOSEG, new_cs & 0xfffc);
+            raise_exception_err(env, EXCP0B_NOSEG, new_cs & 0xfffc, GETPC());
         }
 
 #ifdef TARGET_X86_64
@@ -1813,8 +1825,8 @@ void helper_lcall_protected(CPUX86State *env, int new_cs, target_ulong new_eip,
 
             /* 64 bit case */
             rsp = env->regs[R_ESP];
-            PUSHQ(rsp, env->segs[R_CS].selector);
-            PUSHQ(rsp, next_eip);
+            PUSHQ(rsp, env->segs[R_CS].selector, GETPC());
+            PUSHQ(rsp, next_eip, GETPC());
             /* from this point, not restartable */
             env->regs[R_ESP] = rsp;
             cpu_x86_load_seg_cache(env, R_CS, (new_cs & 0xfffc) | cpl,
@@ -1828,16 +1840,16 @@ void helper_lcall_protected(CPUX86State *env, int new_cs, target_ulong new_eip,
             sp_mask = get_sp_mask(env->segs[R_SS].flags);
             ssp = env->segs[R_SS].base;
             if (shift) {
-                PUSHL(ssp, sp, sp_mask, env->segs[R_CS].selector);
-                PUSHL(ssp, sp, sp_mask, next_eip);
+                PUSHL(ssp, sp, sp_mask, env->segs[R_CS].selector, GETPC());
+                PUSHL(ssp, sp, sp_mask, next_eip, GETPC());
             } else {
-                PUSHW(ssp, sp, sp_mask, env->segs[R_CS].selector);
-                PUSHW(ssp, sp, sp_mask, next_eip);
+                PUSHW(ssp, sp, sp_mask, env->segs[R_CS].selector, GETPC());
+                PUSHW(ssp, sp, sp_mask, next_eip, GETPC());
             }
 
             limit = get_seg_limit(e1, e2);
             if (new_eip > limit) {
-                raise_exception_err(env, EXCP0D_GPF, new_cs & 0xfffc);
+                raise_exception_err(env, EXCP0D_GPF, new_cs & 0xfffc, GETPC());
             }
             /* from this point, not restartable */
             SET_ESP(sp, sp_mask);
@@ -1855,73 +1867,73 @@ void helper_lcall_protected(CPUX86State *env, int new_cs, target_ulong new_eip,
         case 9: /* available 386 TSS */
         case 5: /* task gate */
             if (dpl < cpl || dpl < rpl) {
-                raise_exception_err(env, EXCP0D_GPF, new_cs & 0xfffc);
+                raise_exception_err(env, EXCP0D_GPF, new_cs & 0xfffc, GETPC());
             }
-            switch_tss(env, new_cs, e1, e2, SWITCH_TSS_CALL, next_eip);
+            switch_tss(env, new_cs, e1, e2, SWITCH_TSS_CALL, next_eip, GETPC());
             return;
         case 4: /* 286 call gate */
         case 12: /* 386 call gate */
             break;
         default:
-            raise_exception_err(env, EXCP0D_GPF, new_cs & 0xfffc);
+            raise_exception_err(env, EXCP0D_GPF, new_cs & 0xfffc, GETPC());
             break;
         }
         shift = type >> 3;
 
         if (dpl < cpl || dpl < rpl) {
-            raise_exception_err(env, EXCP0D_GPF, new_cs & 0xfffc);
+            raise_exception_err(env, EXCP0D_GPF, new_cs & 0xfffc, GETPC());
         }
         /* check valid bit */
         if (!(e2 & DESC_P_MASK)) {
-            raise_exception_err(env, EXCP0B_NOSEG,  new_cs & 0xfffc);
+            raise_exception_err(env, EXCP0B_NOSEG,  new_cs & 0xfffc, GETPC());
         }
         selector = e1 >> 16;
         offset = (e2 & 0xffff0000) | (e1 & 0x0000ffff);
         param_count = e2 & 0x1f;
         if ((selector & 0xfffc) == 0) {
-            raise_exception_err(env, EXCP0D_GPF, 0);
+            raise_exception_err(env, EXCP0D_GPF, 0, GETPC());
         }
 
-        if (load_segment(env, &e1, &e2, selector) != 0) {
-            raise_exception_err(env, EXCP0D_GPF, selector & 0xfffc);
+        if (load_segment(env, &e1, &e2, selector, GETPC()) != 0) {
+            raise_exception_err(env, EXCP0D_GPF, selector & 0xfffc, GETPC());
         }
         if (!(e2 & DESC_S_MASK) || !(e2 & (DESC_CS_MASK))) {
-            raise_exception_err(env, EXCP0D_GPF, selector & 0xfffc);
+            raise_exception_err(env, EXCP0D_GPF, selector & 0xfffc, GETPC());
         }
         dpl = (e2 >> DESC_DPL_SHIFT) & 3;
         if (dpl > cpl) {
-            raise_exception_err(env, EXCP0D_GPF, selector & 0xfffc);
+            raise_exception_err(env, EXCP0D_GPF, selector & 0xfffc, GETPC());
         }
         if (!(e2 & DESC_P_MASK)) {
-            raise_exception_err(env, EXCP0B_NOSEG, selector & 0xfffc);
+            raise_exception_err(env, EXCP0B_NOSEG, selector & 0xfffc, GETPC());
         }
 
         if (!(e2 & DESC_C_MASK) && dpl < cpl) {
             /* to inner privilege */
-            get_ss_esp_from_tss(env, &ss, &sp, dpl);
+            get_ss_esp_from_tss(env, &ss, &sp, dpl, GETPC());
             LOG_PCALL("new ss:esp=%04x:%08x param_count=%d env->regs[R_ESP]="
                       TARGET_FMT_lx "\n", ss, sp, param_count,
                       env->regs[R_ESP]);
             if ((ss & 0xfffc) == 0) {
-                raise_exception_err(env, EXCP0A_TSS, ss & 0xfffc);
+                raise_exception_err(env, EXCP0A_TSS, ss & 0xfffc, GETPC());
             }
             if ((ss & 3) != dpl) {
-                raise_exception_err(env, EXCP0A_TSS, ss & 0xfffc);
+                raise_exception_err(env, EXCP0A_TSS, ss & 0xfffc, GETPC());
             }
-            if (load_segment(env, &ss_e1, &ss_e2, ss) != 0) {
-                raise_exception_err(env, EXCP0A_TSS, ss & 0xfffc);
+            if (load_segment(env, &ss_e1, &ss_e2, ss, GETPC()) != 0) {
+                raise_exception_err(env, EXCP0A_TSS, ss & 0xfffc, GETPC());
             }
             ss_dpl = (ss_e2 >> DESC_DPL_SHIFT) & 3;
             if (ss_dpl != dpl) {
-                raise_exception_err(env, EXCP0A_TSS, ss & 0xfffc);
+                raise_exception_err(env, EXCP0A_TSS, ss & 0xfffc, GETPC());
             }
             if (!(ss_e2 & DESC_S_MASK) ||
                 (ss_e2 & DESC_CS_MASK) ||
                 !(ss_e2 & DESC_W_MASK)) {
-                raise_exception_err(env, EXCP0A_TSS, ss & 0xfffc);
+                raise_exception_err(env, EXCP0A_TSS, ss & 0xfffc, GETPC());
             }
             if (!(ss_e2 & DESC_P_MASK)) {
-                raise_exception_err(env, EXCP0A_TSS, ss & 0xfffc);
+                raise_exception_err(env, EXCP0A_TSS, ss & 0xfffc, GETPC());
             }
 
             /* push_size = ((param_count * 2) + 8) << shift; */
@@ -1932,22 +1944,22 @@ void helper_lcall_protected(CPUX86State *env, int new_cs, target_ulong new_eip,
             sp_mask = get_sp_mask(ss_e2);
             ssp = get_seg_base(ss_e1, ss_e2);
             if (shift) {
-                PUSHL(ssp, sp, sp_mask, env->segs[R_SS].selector);
-                PUSHL(ssp, sp, sp_mask, env->regs[R_ESP]);
+                PUSHL(ssp, sp, sp_mask, env->segs[R_SS].selector, GETPC());
+                PUSHL(ssp, sp, sp_mask, env->regs[R_ESP], GETPC());
                 for (i = param_count - 1; i >= 0; i--) {
-                    val = cpu_ldl_kernel(env, old_ssp +
-                                         ((env->regs[R_ESP] + i * 4) &
-                                          old_sp_mask));
-                    PUSHL(ssp, sp, sp_mask, val);
+                    val = cpu_ldl_kernel_ra(env, old_ssp +
+                                            ((env->regs[R_ESP] + i * 4) &
+                                             old_sp_mask), GETPC());
+                    PUSHL(ssp, sp, sp_mask, val, GETPC());
                 }
             } else {
-                PUSHW(ssp, sp, sp_mask, env->segs[R_SS].selector);
-                PUSHW(ssp, sp, sp_mask, env->regs[R_ESP]);
+                PUSHW(ssp, sp, sp_mask, env->segs[R_SS].selector, GETPC());
+                PUSHW(ssp, sp, sp_mask, env->regs[R_ESP], GETPC());
                 for (i = param_count - 1; i >= 0; i--) {
-                    val = cpu_lduw_kernel(env, old_ssp +
-                                          ((env->regs[R_ESP] + i * 2) &
-                                           old_sp_mask));
-                    PUSHW(ssp, sp, sp_mask, val);
+                    val = cpu_lduw_kernel_ra(env, old_ssp +
+                                             ((env->regs[R_ESP] + i * 2) &
+                                              old_sp_mask), GETPC());
+                    PUSHW(ssp, sp, sp_mask, val, GETPC());
                 }
             }
             new_stack = 1;
@@ -1961,11 +1973,11 @@ void helper_lcall_protected(CPUX86State *env, int new_cs, target_ulong new_eip,
         }
 
         if (shift) {
-            PUSHL(ssp, sp, sp_mask, env->segs[R_CS].selector);
-            PUSHL(ssp, sp, sp_mask, next_eip);
+            PUSHL(ssp, sp, sp_mask, env->segs[R_CS].selector, GETPC());
+            PUSHL(ssp, sp, sp_mask, next_eip, GETPC());
         } else {
-            PUSHW(ssp, sp, sp_mask, env->segs[R_CS].selector);
-            PUSHW(ssp, sp, sp_mask, next_eip);
+            PUSHW(ssp, sp, sp_mask, env->segs[R_CS].selector, GETPC());
+            PUSHW(ssp, sp, sp_mask, next_eip, GETPC());
         }
 
         /* from this point, not restartable */
@@ -2000,15 +2012,15 @@ void helper_iret_real(CPUX86State *env, int shift)
     ssp = env->segs[R_SS].base;
     if (shift == 1) {
         /* 32 bits */
-        POPL(ssp, sp, sp_mask, new_eip);
-        POPL(ssp, sp, sp_mask, new_cs);
+        POPL(ssp, sp, sp_mask, new_eip, GETPC());
+        POPL(ssp, sp, sp_mask, new_cs, GETPC());
         new_cs &= 0xffff;
-        POPL(ssp, sp, sp_mask, new_eflags);
+        POPL(ssp, sp, sp_mask, new_eflags, GETPC());
     } else {
         /* 16 bits */
-        POPW(ssp, sp, sp_mask, new_eip);
-        POPW(ssp, sp, sp_mask, new_cs);
-        POPW(ssp, sp, sp_mask, new_eflags);
+        POPW(ssp, sp, sp_mask, new_eip, GETPC());
+        POPW(ssp, sp, sp_mask, new_cs, GETPC());
+        POPW(ssp, sp, sp_mask, new_eflags, GETPC());
     }
     env->regs[R_ESP] = (env->regs[R_ESP] & ~sp_mask) | (sp & sp_mask);
     env->segs[R_CS].selector = new_cs;
@@ -2053,7 +2065,8 @@ static inline void validate_seg(CPUX86State *env, int seg_reg, int cpl)
 
 /* protected mode iret */
 static inline void helper_ret_protected(CPUX86State *env, int shift,
-                                        int is_iret, int addend)
+                                        int is_iret, int addend,
+                                        uintptr_t retaddr)
 {
     uint32_t new_cs, new_eflags, new_ss;
     uint32_t new_es, new_ds, new_fs, new_gs;
@@ -2074,32 +2087,32 @@ static inline void helper_ret_protected(CPUX86State *env, int shift,
     new_eflags = 0; /* avoid warning */
 #ifdef TARGET_X86_64
     if (shift == 2) {
-        POPQ(sp, new_eip);
-        POPQ(sp, new_cs);
+        POPQ(sp, new_eip, retaddr);
+        POPQ(sp, new_cs, retaddr);
         new_cs &= 0xffff;
         if (is_iret) {
-            POPQ(sp, new_eflags);
+            POPQ(sp, new_eflags, retaddr);
         }
     } else
 #endif
     {
         if (shift == 1) {
             /* 32 bits */
-            POPL(ssp, sp, sp_mask, new_eip);
-            POPL(ssp, sp, sp_mask, new_cs);
+            POPL(ssp, sp, sp_mask, new_eip, retaddr);
+            POPL(ssp, sp, sp_mask, new_cs, retaddr);
             new_cs &= 0xffff;
             if (is_iret) {
-                POPL(ssp, sp, sp_mask, new_eflags);
+                POPL(ssp, sp, sp_mask, new_eflags, retaddr);
                 if (new_eflags & VM_MASK) {
                     goto return_to_vm86;
                 }
             }
         } else {
             /* 16 bits */
-            POPW(ssp, sp, sp_mask, new_eip);
-            POPW(ssp, sp, sp_mask, new_cs);
+            POPW(ssp, sp, sp_mask, new_eip, retaddr);
+            POPW(ssp, sp, sp_mask, new_cs, retaddr);
             if (is_iret) {
-                POPW(ssp, sp, sp_mask, new_eflags);
+                POPW(ssp, sp, sp_mask, new_eflags, retaddr);
             }
         }
     }
@@ -2107,32 +2120,32 @@ static inline void helper_ret_protected(CPUX86State *env, int shift,
               new_cs, new_eip, shift, addend);
     LOG_PCALL_STATE(CPU(x86_env_get_cpu(env)));
     if ((new_cs & 0xfffc) == 0) {
-        raise_exception_err(env, EXCP0D_GPF, new_cs & 0xfffc);
+        raise_exception_err(env, EXCP0D_GPF, new_cs & 0xfffc, retaddr);
     }
-    if (load_segment(env, &e1, &e2, new_cs) != 0) {
-        raise_exception_err(env, EXCP0D_GPF, new_cs & 0xfffc);
+    if (load_segment(env, &e1, &e2, new_cs, retaddr) != 0) {
+        raise_exception_err(env, EXCP0D_GPF, new_cs & 0xfffc, retaddr);
     }
     if (!(e2 & DESC_S_MASK) ||
         !(e2 & DESC_CS_MASK)) {
-        raise_exception_err(env, EXCP0D_GPF, new_cs & 0xfffc);
+        raise_exception_err(env, EXCP0D_GPF, new_cs & 0xfffc, retaddr);
     }
     cpl = env->hflags & HF_CPL_MASK;
     rpl = new_cs & 3;
     if (rpl < cpl) {
-        raise_exception_err(env, EXCP0D_GPF, new_cs & 0xfffc);
+        raise_exception_err(env, EXCP0D_GPF, new_cs & 0xfffc, retaddr);
     }
     dpl = (e2 >> DESC_DPL_SHIFT) & 3;
     if (e2 & DESC_C_MASK) {
         if (dpl > rpl) {
-            raise_exception_err(env, EXCP0D_GPF, new_cs & 0xfffc);
+            raise_exception_err(env, EXCP0D_GPF, new_cs & 0xfffc, retaddr);
         }
     } else {
         if (dpl != rpl) {
-            raise_exception_err(env, EXCP0D_GPF, new_cs & 0xfffc);
+            raise_exception_err(env, EXCP0D_GPF, new_cs & 0xfffc, retaddr);
         }
     }
     if (!(e2 & DESC_P_MASK)) {
-        raise_exception_err(env, EXCP0B_NOSEG, new_cs & 0xfffc);
+        raise_exception_err(env, EXCP0B_NOSEG, new_cs & 0xfffc, retaddr);
     }
 
     sp += addend;
@@ -2147,21 +2160,21 @@ static inline void helper_ret_protected(CPUX86State *env, int shift,
         /* return to different privilege level */
 #ifdef TARGET_X86_64
         if (shift == 2) {
-            POPQ(sp, new_esp);
-            POPQ(sp, new_ss);
+            POPQ(sp, new_esp, retaddr);
+            POPQ(sp, new_ss, retaddr);
             new_ss &= 0xffff;
         } else
 #endif
         {
             if (shift == 1) {
                 /* 32 bits */
-                POPL(ssp, sp, sp_mask, new_esp);
-                POPL(ssp, sp, sp_mask, new_ss);
+                POPL(ssp, sp, sp_mask, new_esp, retaddr);
+                POPL(ssp, sp, sp_mask, new_ss, retaddr);
                 new_ss &= 0xffff;
             } else {
                 /* 16 bits */
-                POPW(ssp, sp, sp_mask, new_esp);
-                POPW(ssp, sp, sp_mask, new_ss);
+                POPW(ssp, sp, sp_mask, new_esp, retaddr);
+                POPW(ssp, sp, sp_mask, new_ss, retaddr);
             }
         }
         LOG_PCALL("new ss:esp=%04x:" TARGET_FMT_lx "\n",
@@ -2180,26 +2193,26 @@ static inline void helper_ret_protected(CPUX86State *env, int shift,
             } else
 #endif
             {
-                raise_exception_err(env, EXCP0D_GPF, 0);
+                raise_exception_err(env, EXCP0D_GPF, 0, retaddr);
             }
         } else {
             if ((new_ss & 3) != rpl) {
-                raise_exception_err(env, EXCP0D_GPF, new_ss & 0xfffc);
+                raise_exception_err(env, EXCP0D_GPF, new_ss & 0xfffc, retaddr);
             }
-            if (load_segment(env, &ss_e1, &ss_e2, new_ss) != 0) {
-                raise_exception_err(env, EXCP0D_GPF, new_ss & 0xfffc);
+            if (load_segment(env, &ss_e1, &ss_e2, new_ss, retaddr) != 0) {
+                raise_exception_err(env, EXCP0D_GPF, new_ss & 0xfffc, retaddr);
             }
             if (!(ss_e2 & DESC_S_MASK) ||
                 (ss_e2 & DESC_CS_MASK) ||
                 !(ss_e2 & DESC_W_MASK)) {
-                raise_exception_err(env, EXCP0D_GPF, new_ss & 0xfffc);
+                raise_exception_err(env, EXCP0D_GPF, new_ss & 0xfffc, retaddr);
             }
             dpl = (ss_e2 >> DESC_DPL_SHIFT) & 3;
             if (dpl != rpl) {
-                raise_exception_err(env, EXCP0D_GPF, new_ss & 0xfffc);
+                raise_exception_err(env, EXCP0D_GPF, new_ss & 0xfffc, retaddr);
             }
             if (!(ss_e2 & DESC_P_MASK)) {
-                raise_exception_err(env, EXCP0B_NOSEG, new_ss & 0xfffc);
+                raise_exception_err(env, EXCP0B_NOSEG, new_ss & 0xfffc, retaddr);
             }
             cpu_x86_load_seg_cache(env, R_SS, new_ss,
                                    get_seg_base(ss_e1, ss_e2),
@@ -2249,12 +2262,12 @@ static inline void helper_ret_protected(CPUX86State *env, int shift,
     return;
 
  return_to_vm86:
-    POPL(ssp, sp, sp_mask, new_esp);
-    POPL(ssp, sp, sp_mask, new_ss);
-    POPL(ssp, sp, sp_mask, new_es);
-    POPL(ssp, sp, sp_mask, new_ds);
-    POPL(ssp, sp, sp_mask, new_fs);
-    POPL(ssp, sp, sp_mask, new_gs);
+    POPL(ssp, sp, sp_mask, new_esp, retaddr);
+    POPL(ssp, sp, sp_mask, new_ss, retaddr);
+    POPL(ssp, sp, sp_mask, new_es, retaddr);
+    POPL(ssp, sp, sp_mask, new_ds, retaddr);
+    POPL(ssp, sp, sp_mask, new_fs, retaddr);
+    POPL(ssp, sp, sp_mask, new_gs, retaddr);
 
     /* modify processor state */
     cpu_load_eflags(env, new_eflags, TF_MASK | AC_MASK | ID_MASK |
@@ -2280,37 +2293,37 @@ void helper_iret_protected(CPUX86State *env, int shift, int next_eip)
     if (env->eflags & NT_MASK) {
 #ifdef TARGET_X86_64
         if (env->hflags & HF_LMA_MASK) {
-            raise_exception_err(env, EXCP0D_GPF, 0);
+            raise_exception_err(env, EXCP0D_GPF, 0, GETPC());
         }
 #endif
-        tss_selector = cpu_lduw_kernel(env, env->tr.base + 0);
+        tss_selector = cpu_lduw_kernel_ra(env, env->tr.base + 0, GETPC());
         if (tss_selector & 4) {
-            raise_exception_err(env, EXCP0A_TSS, tss_selector & 0xfffc);
+            raise_exception_err(env, EXCP0A_TSS, tss_selector & 0xfffc, GETPC());
         }
-        if (load_segment(env, &e1, &e2, tss_selector) != 0) {
-            raise_exception_err(env, EXCP0A_TSS, tss_selector & 0xfffc);
+        if (load_segment(env, &e1, &e2, tss_selector, GETPC()) != 0) {
+            raise_exception_err(env, EXCP0A_TSS, tss_selector & 0xfffc, GETPC());
         }
         type = (e2 >> DESC_TYPE_SHIFT) & 0x17;
         /* NOTE: we check both segment and busy TSS */
         if (type != 3) {
-            raise_exception_err(env, EXCP0A_TSS, tss_selector & 0xfffc);
+            raise_exception_err(env, EXCP0A_TSS, tss_selector & 0xfffc, GETPC());
         }
-        switch_tss(env, tss_selector, e1, e2, SWITCH_TSS_IRET, next_eip);
+        switch_tss(env, tss_selector, e1, e2, SWITCH_TSS_IRET, next_eip, GETPC());
     } else {
-        helper_ret_protected(env, shift, 1, 0);
+        helper_ret_protected(env, shift, 1, 0, GETPC());
     }
     env->hflags2 &= ~HF2_NMI_MASK;
 }
 
 void helper_lret_protected(CPUX86State *env, int shift, int addend)
 {
-    helper_ret_protected(env, shift, 0, addend);
+    helper_ret_protected(env, shift, 0, addend, GETPC());
 }
 
 void helper_sysenter(CPUX86State *env)
 {
     if (env->sysenter_cs == 0) {
-        raise_exception_err(env, EXCP0D_GPF, 0);
+        raise_exception_err(env, EXCP0D_GPF, 0, GETPC());
     }
     env->eflags &= ~(VM_MASK | IF_MASK | RF_MASK);
 
@@ -2346,7 +2359,7 @@ void helper_sysexit(CPUX86State *env, int dflag)
 
     cpl = env->hflags & HF_CPL_MASK;
     if (env->sysenter_cs == 0 || cpl != 0) {
-        raise_exception_err(env, EXCP0D_GPF, 0);
+        raise_exception_err(env, EXCP0D_GPF, 0, GETPC());
     }
 #ifdef TARGET_X86_64
     if (dflag == 2) {
@@ -2390,7 +2403,7 @@ target_ulong helper_lsl(CPUX86State *env, target_ulong selector1)
     if ((selector & 0xfffc) == 0) {
         goto fail;
     }
-    if (load_segment(env, &e1, &e2, selector) != 0) {
+    if (load_segment(env, &e1, &e2, selector, GETPC()) != 0) {
         goto fail;
     }
     rpl = selector & 3;
@@ -2437,7 +2450,7 @@ target_ulong helper_lar(CPUX86State *env, target_ulong selector1)
     if ((selector & 0xfffc) == 0) {
         goto fail;
     }
-    if (load_segment(env, &e1, &e2, selector) != 0) {
+    if (load_segment(env, &e1, &e2, selector, GETPC()) != 0) {
         goto fail;
     }
     rpl = selector & 3;
@@ -2486,7 +2499,7 @@ void helper_verr(CPUX86State *env, target_ulong selector1)
     if ((selector & 0xfffc) == 0) {
         goto fail;
     }
-    if (load_segment(env, &e1, &e2, selector) != 0) {
+    if (load_segment(env, &e1, &e2, selector, GETPC()) != 0) {
         goto fail;
     }
     if (!(e2 & DESC_S_MASK)) {
@@ -2524,7 +2537,7 @@ void helper_verw(CPUX86State *env, target_ulong selector1)
     if ((selector & 0xfffc) == 0) {
         goto fail;
     }
-    if (load_segment(env, &e1, &e2, selector) != 0) {
+    if (load_segment(env, &e1, &e2, selector, GETPC()) != 0) {
         goto fail;
     }
     if (!(e2 & DESC_S_MASK)) {
@@ -2565,7 +2578,8 @@ void cpu_x86_load_seg(CPUX86State *env, int seg_reg, int selector)
 #endif
 
 /* check if Port I/O is allowed in TSS */
-static inline void check_io(CPUX86State *env, int addr, int size)
+static inline void check_io(CPUX86State *env, int addr, int size,
+                            uintptr_t retaddr)
 {
     int io_offset, val, mask;
 
@@ -2575,33 +2589,33 @@ static inline void check_io(CPUX86State *env, int addr, int size)
         env->tr.limit < 103) {
         goto fail;
     }
-    io_offset = cpu_lduw_kernel(env, env->tr.base + 0x66);
+    io_offset = cpu_lduw_kernel_ra(env, env->tr.base + 0x66, retaddr);
     io_offset += (addr >> 3);
     /* Note: the check needs two bytes */
     if ((io_offset + 1) > env->tr.limit) {
         goto fail;
     }
-    val = cpu_lduw_kernel(env, env->tr.base + io_offset);
+    val = cpu_lduw_kernel_ra(env, env->tr.base + io_offset, retaddr);
     val >>= (addr & 7);
     mask = (1 << size) - 1;
     /* all bits must be zero to allow the I/O */
     if ((val & mask) != 0) {
     fail:
-        raise_exception_err(env, EXCP0D_GPF, 0);
+        raise_exception_err(env, EXCP0D_GPF, 0, retaddr);
     }
 }
 
 void helper_check_iob(CPUX86State *env, uint32_t t0)
 {
-    check_io(env, t0, 1);
+    check_io(env, t0, 1, GETPC());
 }
 
 void helper_check_iow(CPUX86State *env, uint32_t t0)
 {
-    check_io(env, t0, 2);
+    check_io(env, t0, 2, GETPC());
 }
 
 void helper_check_iol(CPUX86State *env, uint32_t t0)
 {
-    check_io(env, t0, 4);
+    check_io(env, t0, 4, GETPC());
 }
diff --git a/target-i386/svm_helper.c b/target-i386/svm_helper.c
index 429d029..f5c6c13 100644
--- a/target-i386/svm_helper.c
+++ b/target-i386/svm_helper.c
@@ -354,7 +354,7 @@ void helper_vmrun(CPUX86State *env, int aflag, int next_eip_addend)
 void helper_vmmcall(CPUX86State *env)
 {
     cpu_svm_check_intercept_param(env, SVM_EXIT_VMMCALL, 0);
-    raise_exception(env, EXCP06_ILLOP);
+    raise_exception(env, EXCP06_ILLOP, GETPC());
 }
 
 void helper_vmload(CPUX86State *env, int aflag)
@@ -457,7 +457,7 @@ void helper_skinit(CPUX86State *env)
 {
     cpu_svm_check_intercept_param(env, SVM_EXIT_SKINIT, 0);
     /* XXX: not implemented */
-    raise_exception(env, EXCP06_ILLOP);
+    raise_exception(env, EXCP06_ILLOP, GETPC());
 }
 
 void helper_invlpga(CPUX86State *env, int aflag)
diff --git a/target-i386/translate.c b/target-i386/translate.c
index 305ce50..9238e1b 100644
--- a/target-i386/translate.c
+++ b/target-i386/translate.c
@@ -2303,7 +2303,6 @@ static void gen_movl_seg_T0(DisasContext *s, int seg_reg, target_ulong cur_eip)
     if (s->pe && !s->vm86) {
         /* XXX: optimize by finding processor state dynamically */
         gen_update_cc_op(s);
-        gen_jmp_im(cur_eip);
         tcg_gen_trunc_tl_i32(cpu_tmp2_i32, cpu_T[0]);
         gen_helper_load_seg(cpu_env, tcg_const_i32(seg_reg), cpu_tmp2_i32);
         /* abort translation because the addseg value may change or
@@ -4842,21 +4841,17 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
         case 6: /* div */
             switch(ot) {
             case MO_8:
-                gen_jmp_im(pc_start - s->cs_base);
                 gen_helper_divb_AL(cpu_env, cpu_T[0]);
                 break;
             case MO_16:
-                gen_jmp_im(pc_start - s->cs_base);
                 gen_helper_divw_AX(cpu_env, cpu_T[0]);
                 break;
             default:
             case MO_32:
-                gen_jmp_im(pc_start - s->cs_base);
                 gen_helper_divl_EAX(cpu_env, cpu_T[0]);
                 break;
 #ifdef TARGET_X86_64
             case MO_64:
-                gen_jmp_im(pc_start - s->cs_base);
                 gen_helper_divq_EAX(cpu_env, cpu_T[0]);
                 break;
 #endif
@@ -4865,21 +4860,17 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
         case 7: /* idiv */
             switch(ot) {
             case MO_8:
-                gen_jmp_im(pc_start - s->cs_base);
                 gen_helper_idivb_AL(cpu_env, cpu_T[0]);
                 break;
             case MO_16:
-                gen_jmp_im(pc_start - s->cs_base);
                 gen_helper_idivw_AX(cpu_env, cpu_T[0]);
                 break;
             default:
             case MO_32:
-                gen_jmp_im(pc_start - s->cs_base);
                 gen_helper_idivl_EAX(cpu_env, cpu_T[0]);
                 break;
 #ifdef TARGET_X86_64
             case MO_64:
-                gen_jmp_im(pc_start - s->cs_base);
                 gen_helper_idivq_EAX(cpu_env, cpu_T[0]);
                 break;
 #endif
@@ -5212,7 +5203,6 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
         if (dflag == MO_64) {
             if (!(s->cpuid_ext_features & CPUID_EXT_CX16))
                 goto illegal_op;
-            gen_jmp_im(pc_start - s->cs_base);
             gen_update_cc_op(s);
             gen_lea_modrm(env, s, modrm);
             gen_helper_cmpxchg16b(cpu_env, cpu_A0);
@@ -5221,7 +5211,6 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
         {
             if (!(s->cpuid_features & CPUID_CX8))
                 goto illegal_op;
-            gen_jmp_im(pc_start - s->cs_base);
             gen_update_cc_op(s);
             gen_lea_modrm(env, s, modrm);
             gen_helper_cmpxchg8b(cpu_env, cpu_A0);
@@ -5838,7 +5827,6 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
                 break;
             case 0x0c: /* fldenv mem */
                 gen_update_cc_op(s);
-                gen_jmp_im(pc_start - s->cs_base);
                 gen_helper_fldenv(cpu_env, cpu_A0, tcg_const_i32(dflag - 1));
                 break;
             case 0x0d: /* fldcw mem */
@@ -5848,7 +5836,6 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
                 break;
             case 0x0e: /* fnstenv mem */
                 gen_update_cc_op(s);
-                gen_jmp_im(pc_start - s->cs_base);
                 gen_helper_fstenv(cpu_env, cpu_A0, tcg_const_i32(dflag - 1));
                 break;
             case 0x0f: /* fnstcw mem */
@@ -5858,23 +5845,19 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
                 break;
             case 0x1d: /* fldt mem */
                 gen_update_cc_op(s);
-                gen_jmp_im(pc_start - s->cs_base);
                 gen_helper_fldt_ST0(cpu_env, cpu_A0);
                 break;
             case 0x1f: /* fstpt mem */
                 gen_update_cc_op(s);
-                gen_jmp_im(pc_start - s->cs_base);
                 gen_helper_fstt_ST0(cpu_env, cpu_A0);
                 gen_helper_fpop(cpu_env);
                 break;
             case 0x2c: /* frstor mem */
                 gen_update_cc_op(s);
-                gen_jmp_im(pc_start - s->cs_base);
                 gen_helper_frstor(cpu_env, cpu_A0, tcg_const_i32(dflag - 1));
                 break;
             case 0x2e: /* fnsave mem */
                 gen_update_cc_op(s);
-                gen_jmp_im(pc_start - s->cs_base);
                 gen_helper_fsave(cpu_env, cpu_A0, tcg_const_i32(dflag - 1));
                 break;
             case 0x2f: /* fnstsw mem */
@@ -5889,7 +5872,6 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
                 break;
             case 0x3e: /* fbstp */
                 gen_update_cc_op(s);
-                gen_jmp_im(pc_start - s->cs_base);
                 gen_helper_fbst_ST0(cpu_env, cpu_A0);
                 gen_helper_fpop(cpu_env);
                 break;
@@ -5925,7 +5907,6 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
                 case 0: /* fnop */
                     /* check exceptions (FreeBSD FPU probe) */
                     gen_update_cc_op(s);
-                    gen_jmp_im(pc_start - s->cs_base);
                     gen_helper_fwait(cpu_env);
                     break;
                 default:
@@ -6896,7 +6877,6 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
             gen_exception(s, EXCP07_PREX, pc_start - s->cs_base);
         } else {
             gen_update_cc_op(s);
-            gen_jmp_im(pc_start - s->cs_base);
             gen_helper_fwait(cpu_env);
         }
         break;
@@ -6980,7 +6960,6 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
             goto illegal_op;
         gen_op_mov_v_reg(ot, cpu_T[0], reg);
         gen_lea_modrm(env, s, modrm);
-        gen_jmp_im(pc_start - s->cs_base);
         tcg_gen_trunc_tl_i32(cpu_tmp2_i32, cpu_T[0]);
         if (ot == MO_16) {
             gen_helper_boundw(cpu_env, cpu_A0, cpu_tmp2_i32);
@@ -7172,7 +7151,6 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
             } else {
                 gen_svm_check_intercept(s, pc_start, SVM_EXIT_LDTR_WRITE);
                 gen_ldst_modrm(env, s, modrm, MO_16, OR_TMP0, 0);
-                gen_jmp_im(pc_start - s->cs_base);
                 tcg_gen_trunc_tl_i32(cpu_tmp2_i32, cpu_T[0]);
                 gen_helper_lldt(cpu_env, cpu_tmp2_i32);
             }
@@ -7193,7 +7171,6 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
             } else {
                 gen_svm_check_intercept(s, pc_start, SVM_EXIT_TR_WRITE);
                 gen_ldst_modrm(env, s, modrm, MO_16, OR_TMP0, 0);
-                gen_jmp_im(pc_start - s->cs_base);
                 tcg_gen_trunc_tl_i32(cpu_tmp2_i32, cpu_T[0]);
                 gen_helper_ltr(cpu_env, cpu_tmp2_i32);
             }
@@ -7727,7 +7704,6 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
             }
             gen_lea_modrm(env, s, modrm);
             gen_update_cc_op(s);
-            gen_jmp_im(pc_start - s->cs_base);
             gen_helper_fxsave(cpu_env, cpu_A0, tcg_const_i32(dflag == MO_64));
             break;
         case 1: /* fxrstor */
@@ -7740,7 +7716,6 @@ static target_ulong disas_insn(CPUX86State *env, DisasContext *s,
             }
             gen_lea_modrm(env, s, modrm);
             gen_update_cc_op(s);
-            gen_jmp_im(pc_start - s->cs_base);
             gen_helper_fxrstor(cpu_env, cpu_A0, tcg_const_i32(dflag == MO_64));
             break;
         case 2: /* ldmxcsr */


* Re: [Qemu-devel] [PATCH v2 1/3] softmmu: add helper function to pass through retaddr
  2015-06-17 12:42 ` [Qemu-devel] [PATCH v2 1/3] softmmu: add helper function to pass through retaddr Pavel Dovgalyuk
@ 2015-06-17 12:53   ` Paolo Bonzini
  2015-06-18  5:17     ` Pavel Dovgaluk
  2015-06-18  9:24     ` Pavel Dovgaluk
  0 siblings, 2 replies; 29+ messages in thread
From: Paolo Bonzini @ 2015-06-17 12:53 UTC (permalink / raw)
  To: Pavel Dovgalyuk, qemu-devel; +Cc: rth7680, leon.alrae, aurelien



On 17/06/2015 14:42, Pavel Dovgalyuk wrote:
> This patch introduces several helpers to pass the return address,
> which points into the TB. A correct return address allows the guest PC and
> icount to be restored correctly. These functions should be used when
> helpers embedded into a TB invoke memory operations.
> 
> Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
> ---
>  include/exec/cpu_ldst_template.h |   42 +++++++++++++++++++++++++++++++-------
>  include/exec/exec-all.h          |   27 ++++++++++++++++++++++++
>  softmmu_template.h               |   18 ++++++++++++++++
>  3 files changed, 79 insertions(+), 8 deletions(-)
> 
> diff --git a/include/exec/cpu_ldst_template.h b/include/exec/cpu_ldst_template.h
> index 95ab750..1847816 100644
> --- a/include/exec/cpu_ldst_template.h
> +++ b/include/exec/cpu_ldst_template.h
> @@ -62,7 +62,9 @@
>  /* generic load/store macros */
>  
>  static inline RES_TYPE
> -glue(glue(cpu_ld, USUFFIX), MEMSUFFIX)(CPUArchState *env, target_ulong ptr)
> +glue(glue(glue(cpu_ld, USUFFIX), MEMSUFFIX), _ra)(CPUArchState *env,
> +                                                  target_ulong ptr,
> +                                                  uintptr_t retaddr)

Would it make sense to call these helper_cpu_ld##USUFFIX##MEMSUFFIX?

> diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
> index 856e698..b3aefde 100644
> --- a/include/exec/exec-all.h
> +++ b/include/exec/exec-all.h
> @@ -350,6 +350,33 @@ struct MemoryRegion *iotlb_to_region(CPUState *cpu,
>  void tlb_fill(CPUState *cpu, target_ulong addr, int is_write, int mmu_idx,
>                uintptr_t retaddr);
>  
> +uint8_t helper_call_ldb_cmmu(CPUArchState *env, target_ulong addr,
> +                             int mmu_idx, uintptr_t retaddr);

Here we already have helper_ret_ldb_cmmu, so the new function is only
needed if DATA_SIZE != 1.

> +uint16_t helper_call_ldw_cmmu(CPUArchState *env, target_ulong addr,
> +                              int mmu_idx, uintptr_t retaddr);

What about helper_ret_ldw_cmmu for consistency with the DATA_SIZE == 1 case?
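Concretely, the prototypes would look something like this (a sketch only; I am
assuming the existing byte helper has the same signature as the proposed
helper_call_ variant, and the second name is just the one suggested above):

    /* DATA_SIZE == 1: already exists and already carries retaddr */
    uint8_t helper_ret_ldb_cmmu(CPUArchState *env, target_ulong addr,
                                int mmu_idx, uintptr_t retaddr);

    /* DATA_SIZE == 2: suggested name for consistency (assumption) */
    uint16_t helper_ret_ldw_cmmu(CPUArchState *env, target_ulong addr,
                                 int mmu_idx, uintptr_t retaddr);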

Paolo


* Re: [Qemu-devel] [PATCH v2 2/3] target-mips: exceptions handling in icount mode
  2015-06-17 12:42 ` [Qemu-devel] [PATCH v2 2/3] target-mips: exceptions handling in icount mode Pavel Dovgalyuk
@ 2015-06-17 13:05   ` Aurelien Jarno
  0 siblings, 0 replies; 29+ messages in thread
From: Aurelien Jarno @ 2015-06-17 13:05 UTC (permalink / raw)
  To: Pavel Dovgalyuk; +Cc: pbonzini, rth7680, leon.alrae, qemu-devel

On 2015-06-17 15:42, Pavel Dovgalyuk wrote:
> This patch fixes exception handling in MIPS.
> Instructions generate several types of exceptions.
> When exception is generated, it breaks the execution of the current translation
> block. Implementation of the exceptions handling does not correctly
> restore icount for the instruction which caused the exception. In most cases
> icount will be decreased by the value equal to the size of TB.
> This patch passes pointer to the translation block internals to the exception
> handler. It allows correct restoring of the icount value.
> 
> Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
> ---
>  target-mips/cpu.h        |   28 +++++++
>  target-mips/helper.h     |    1 
>  target-mips/msa_helper.c |    5 +
>  target-mips/op_helper.c  |  183 ++++++++++++++++++++++------------------------
>  target-mips/translate.c  |   46 ++++++------
>  5 files changed, 141 insertions(+), 122 deletions(-)

[ snip ]

> diff --git a/target-mips/op_helper.c b/target-mips/op_helper.c
> index 73a8e45..2815c60 100644
> --- a/target-mips/op_helper.c
> +++ b/target-mips/op_helper.c
> @@ -30,41 +30,23 @@ static inline void cpu_mips_tlb_flush (CPUMIPSState *env, int flush_global);
>  /*****************************************************************************/
>  /* Exceptions processing helpers */
>  
> -static inline void QEMU_NORETURN do_raise_exception_err(CPUMIPSState *env,
> -                                                        uint32_t exception,
> -                                                        int error_code,
> -                                                        uintptr_t pc)
> +void helper_raise_exception_err(CPUMIPSState *env, uint32_t exception,
> +                                int error_code)
>  {
> -    CPUState *cs = CPU(mips_env_get_cpu(env));
> -
> -    if (exception < EXCP_SC) {
> -        qemu_log("%s: %d %d\n", __func__, exception, error_code);
> -    }
> -    cs->exception_index = exception;
> -    env->error_code = error_code;
> -
> -    if (pc) {
> -        /* now we have a real cpu fault */
> -        cpu_restore_state(cs, pc);
> -    }
> -
> -    cpu_loop_exit(cs);
> +    do_raise_exception_err(env, exception, error_code, GETPC());
>  }
>  
> -static inline void QEMU_NORETURN do_raise_exception(CPUMIPSState *env,
> -                                                    uint32_t exception,
> -                                                    uintptr_t pc)
> +void helper_raise_exception(CPUMIPSState *env, uint32_t exception)
>  {
> -    do_raise_exception_err(env, exception, 0, pc);
> +    do_raise_exception(env, exception, GETPC());
>  }

raise_exception is used to implement the SYSCALL instruction on MIPS.
With this change, this means that each TB containing a syscall will have
to be translated at least twice, probably more. That is not acceptable
performance-wise.
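
For comparison, a rough sketch of the translate-time alternative (the wrapper
name is invented for illustration; the calls are the ones from
target-mips/translate.c, quoted from memory rather than copied): since SYSCALL
always traps, the translator can save the exact state itself, so the helper
never needs GETPC() or retranslation:

    /* Sketch: state is saved at translate time, no retranslation on the trap */
    static void gen_syscall_exception(DisasContext *ctx)
    {
        TCGv_i32 texcp = tcg_const_i32(EXCP_SYSCALL);

        save_cpu_state(ctx, 1);                   /* exact PC/branch state */
        gen_helper_raise_exception(cpu_env, texcp);
        tcg_temp_free_i32(texcp);
        ctx->bstate = BS_STOP;                    /* nothing runs after the trap */
    }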

> diff --git a/target-mips/translate.c b/target-mips/translate.c
> index fd063a2..0de9244 100644
> --- a/target-mips/translate.c
> +++ b/target-mips/translate.c
> @@ -1673,7 +1673,7 @@ generate_exception_err (DisasContext *ctx, int excp, int err)
>  {
>      TCGv_i32 texcp = tcg_const_i32(excp);
>      TCGv_i32 terr = tcg_const_i32(err);
> -    save_cpu_state(ctx, 1);
> +    save_cpu_state(ctx, 0);

If retranslation is used, you don't even need to save the branch status;
restore_state_to_opc can restore it.

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                 http://www.aurel32.net


* Re: [Qemu-devel] [PATCH v2 0/3] Fix exceptions handling for MIPS and i386
  2015-06-17 12:41 [Qemu-devel] [PATCH v2 0/3] Fix exceptions handling for MIPS and i386 Pavel Dovgalyuk
                   ` (2 preceding siblings ...)
  2015-06-17 12:42 ` [Qemu-devel] [PATCH v2 3/3] target-i386: fix memory operations in helpers Pavel Dovgalyuk
@ 2015-06-17 13:24 ` Aurelien Jarno
  2015-06-18  6:18   ` Pavel Dovgaluk
  2015-06-17 14:19 ` Aurelien Jarno
  4 siblings, 1 reply; 29+ messages in thread
From: Aurelien Jarno @ 2015-06-17 13:24 UTC (permalink / raw)
  To: Pavel Dovgalyuk; +Cc: pbonzini, rth7680, leon.alrae, qemu-devel

On 2015-06-17 15:41, Pavel Dovgalyuk wrote:
> This set of patches fixes exception handling for MIPS and i386 targets.
> These targets contain instructions that break correct execution in 
> icount/TCG modes (MIPS) and in regular TCG mode (i386).

Just to be clear, this is not something specific to MIPS and i386.
Every target that calls cpu_loop_exit without doing a retranslation is
affected by the icount bug.

> Incorrect execution for i386 is caused by exceptions raised by MMU functions.
> MMU helper functions are called from generated code and from other helper
> functions. In both cases they try to get the function's return address for
> restoring the virtual CPU state.
> 
> When an MMU helper is called from some other helper function
> (like helper_maskmov_xmm) through a cpu_st* function, the return address
> will point to that helper. That is why the CPU state cannot be restored in
> the case of an MMU fault.
> 
> This bug can occur when a maskmov instruction is located in the middle of a
> translation block.
> 
> Execution sequence for this example:
> 
> TB start:
> PC1: instr1
>      instr2
> PC2: maskmov <page fault>
>      <page fault processing>
> PC1: instr1
>      instr2
>      maskmov
> 
> At the start of TB execution the guest PC points to instr1. When the page
> fault occurs, QEMU tries to restore the guest PC (which should be equal to
> PC2). It reads the host PC from the call stack and checks whether it points
> into the TB. A bug in the ldst helpers' implementation yields an incorrect
> host PC that is not located within the TB. That is why QEMU cannot recover
> the guest PC and it remains the same (PC1). After page fault processing QEMU
> restarts the TB and executes instr1 and instr2 a second time, because the
> guest PC was not recovered.

Also, just to be clear, the approach you propose is only one way to fix the
issue. The other way (which is used for all similar instructions) is to emit
the following just before the helper call:
    gen_update_cc_op(s);
    gen_jmp_im(pc_start - s->cs_base);
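
In context the pattern sits directly before the helper call, e.g. (the helper
name below is just a placeholder, not a specific call site):

    gen_update_cc_op(s);                 /* make cc_op explicit */
    gen_jmp_im(pc_start - s->cs_base);   /* commit the exact guest EIP */
    gen_helper_faulting_op(cpu_env);     /* placeholder for a helper that may fault */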
		
> Bugs in MIPS helper functions do not break execution in regular TCG mode,
> because the PC value is updated before calling the functions that can raise
> an exception. But the icount value cannot be updated this way. Therefore
> exceptions make execution in icount mode non-deterministic.
> In icount mode every translation block looks as follows:
> 
> if icount < n then exit
> icount -= n
> instr1
> instr2
> ...
> instrn
> exit
> 
> When one of these instructions initiates an exception, icount should be
> restored and the adjusted number of instructions should be subtracted from
> icount instead of the initial n.
> 
> tlb_fill function passes retaddr to raise_exception, which allows restoring
> current instructions in TB and correct icount calculation.
> 
> When an exception is triggered by another function (e.g. by embedding a call
> to an exception-raising helper into the TB), the PC is not passed as retaddr
> and the correct icount is not recovered. In such cases icount will be
> decreased by a value equal to the size of the TB.
> 
> This behavior leads to incorrect values of the virtual clock and
> non-deterministic execution of the code.
> 
> These patches pass a pointer to the translation block code to the exception
> handler. This allows correct restoration of the PC and icount values.

While I think it's the correct approach for load/stores or FPU exceptions, I
am not convinced we should do that in all cases. Retranslation has a
cost, and when the exception is likely to occur, it's better to save the
CPU state instead of going for a retranslation. We actually had to roll
back such a change on SH4, as it caused some performance issues. See
commit 1012740098d0307ce5d957ebbe9a7f020da7f574.

One way to fix that would be to reduce the cost of retranslation, for
example by using a kind of exception table that we generate at the same
time as the TB.

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH v2 3/3] target-i386: fix memory operations in helpers
  2015-06-17 12:42 ` [Qemu-devel] [PATCH v2 3/3] target-i386: fix memory operations in helpers Pavel Dovgalyuk
@ 2015-06-17 13:27   ` Aurelien Jarno
  0 siblings, 0 replies; 29+ messages in thread
From: Aurelien Jarno @ 2015-06-17 13:27 UTC (permalink / raw)
  To: Pavel Dovgalyuk; +Cc: pbonzini, rth7680, leon.alrae, qemu-devel

On 2015-06-17 15:42, Pavel Dovgalyuk wrote:
> This patch passes TB return address into softmmu functions that are
> invoked from target helpers. This allows correct PC and icount recovering
> while handling MMU faults.
> 
> Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
> ---
>  target-i386/cc_helper.c   |    2 
>  target-i386/cpu.h         |    5 
>  target-i386/excp_helper.c |   23 +
>  target-i386/fpu_helper.c  |  146 +++++----
>  target-i386/helper.c      |    4 
>  target-i386/int_helper.c  |   32 +-
>  target-i386/mem_helper.c  |   39 +-
>  target-i386/misc_helper.c |   12 -
>  target-i386/ops_sse.h     |    2 
>  target-i386/seg_helper.c  |  712 +++++++++++++++++++++++----------------------
>  target-i386/svm_helper.c  |    4 
>  target-i386/translate.c   |   25 --
>  12 files changed, 506 insertions(+), 500 deletions(-)

[ snip ]

> diff --git a/target-i386/excp_helper.c b/target-i386/excp_helper.c
> index 99fca84..48be348 100644
> --- a/target-i386/excp_helper.c
> +++ b/target-i386/excp_helper.c
> @@ -108,6 +109,10 @@ static void QEMU_NORETURN raise_interrupt2(CPUX86State *env, int intno,
>      env->error_code = error_code;
>      env->exception_is_int = is_int;
>      env->exception_next_eip = env->eip + next_eip_addend;
> +    if (retaddr) {
> +        /* now we have a real cpu fault */
> +        cpu_restore_state(cs, retaddr);
> +    }
>      cpu_loop_exit(cs);
>  }

If we have to add this pattern to all targets, it's probably better to
add to the core code a cpu_loop_exit variant which takes a return address
as an argument. This also has the advantage that we know all code has
been converted once the old cpu_loop_exit can be removed.
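
Something along the lines of this sketch (the name and exact signature are
just a suggestion, reusing the cpu_restore_state()/cpu_loop_exit() calls from
the hunk above):

    void QEMU_NORETURN cpu_loop_exit_restore(CPUState *cpu, uintptr_t retaddr)
    {
        if (retaddr) {
            /* recover the guest PC (and icount) from the host return address */
            cpu_restore_state(cpu, retaddr);
        }
        cpu_loop_exit(cpu);
    }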

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH v2 0/3] Fix exceptions handling for MIPS and i386
  2015-06-17 12:41 [Qemu-devel] [PATCH v2 0/3] Fix exceptions handling for MIPS and i386 Pavel Dovgalyuk
                   ` (3 preceding siblings ...)
  2015-06-17 13:24 ` [Qemu-devel] [PATCH v2 0/3] Fix exceptions handling for MIPS and i386 Aurelien Jarno
@ 2015-06-17 14:19 ` Aurelien Jarno
  2015-06-18  7:12   ` Pavel Dovgaluk
       [not found]   ` <55826f70.2215370a.4634.ffff91b2SMTPIN_ADDED_BROKEN@mx.google.com>
  4 siblings, 2 replies; 29+ messages in thread
From: Aurelien Jarno @ 2015-06-17 14:19 UTC (permalink / raw)
  To: Pavel Dovgalyuk; +Cc: pbonzini, rth7680, leon.alrae, qemu-devel

On 2015-06-17 15:41, Pavel Dovgalyuk wrote:
> In icount mode every translation block looks as follows:
> 
> if icount < n then exit
> icount -= n
> instr1
> instr2
> ...
> instrn
> exit
> 
> When one of these instructions initiates an exception, icount should be 
> restored and adjusted number of instructions should be subtracted from icount
> instead of initial n.
> 
> tlb_fill function passes retaddr to raise_exception, which allows restoring
> current instructions in TB and correct icount calculation.
> 
> When exception triggered with other function (e.g. by embedding call to 
> exception raising helper into TB), then PC is not passed as retaddr and
> correct icount is not recovered. In such cases icount will be decreased 
> by the value equal to the size of TB.

Looking at how icount works, I see it's basically a variable in the CPU
state (icount_decr.u16.low), which is already accessed from the TB.
Couldn't we adjust it using additional code before generating an
exception, when in icount mode?

For example, for MIPS we could add some code before generate_exception
which uses the value from s->gen_opc_icount[j] to adjust
the variable icount_decr.u16.low.
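
A rough sketch of such a fixup (untested; the function name and the
insns_left parameter are only illustrative, the offsets follow what
gen-icount.h already does):

    static void gen_icount_fixup(DisasContext *ctx, int insns_left)
    {
        TCGv_i32 count = tcg_temp_new_i32();

        /* icount_decr.u16.low was pre-decremented by the whole TB's count,
           so add back the instructions that will not be executed */
        tcg_gen_ld16u_i32(count, cpu_env,
                          -ENV_OFFSET + offsetof(CPUState, icount_decr.u16.low));
        tcg_gen_addi_i32(count, count, insns_left);
        tcg_gen_st16_i32(count, cpu_env,
                         -ENV_OFFSET + offsetof(CPUState, icount_decr.u16.low));
        tcg_temp_free_i32(count);
    }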

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/3] softmmu: add helper function to pass through retaddr
  2015-06-17 12:53   ` Paolo Bonzini
@ 2015-06-18  5:17     ` Pavel Dovgaluk
  2015-06-18  8:16       ` Paolo Bonzini
  2015-06-18  9:24     ` Pavel Dovgaluk
  1 sibling, 1 reply; 29+ messages in thread
From: Pavel Dovgaluk @ 2015-06-18  5:17 UTC (permalink / raw)
  To: 'Paolo Bonzini', qemu-devel; +Cc: rth7680, leon.alrae, aurelien

> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> On 17/06/2015 14:42, Pavel Dovgalyuk wrote:
> > This patch introduces several helpers to pass return address
> > which points to the TB. Correct return address allows correct
> > restoring of the guest PC and icount. These functions should be used when
> > helpers embedded into TB invoke memory operations.
> >
> > Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
> > ---
> >  include/exec/cpu_ldst_template.h |   42 +++++++++++++++++++++++++++++++-------
> >  include/exec/exec-all.h          |   27 ++++++++++++++++++++++++
> >  softmmu_template.h               |   18 ++++++++++++++++
> >  3 files changed, 79 insertions(+), 8 deletions(-)
> >
> > diff --git a/include/exec/cpu_ldst_template.h b/include/exec/cpu_ldst_template.h
> > index 95ab750..1847816 100644
> > --- a/include/exec/cpu_ldst_template.h
> > +++ b/include/exec/cpu_ldst_template.h
> > @@ -62,7 +62,9 @@
> >  /* generic load/store macros */
> >
> >  static inline RES_TYPE
> > -glue(glue(cpu_ld, USUFFIX), MEMSUFFIX)(CPUArchState *env, target_ulong ptr)
> > +glue(glue(glue(cpu_ld, USUFFIX), MEMSUFFIX), _ra)(CPUArchState *env,
> > +                                                  target_ulong ptr,
> > +                                                  uintptr_t retaddr)
> 
> Would it make sense to call these helper_cpu_ld##USUFFIX##MEMSUFFIX?

I don't want to use the 'helper' prefix, because helper functions are
usually called directly from the TB.

Pavel Dovgalyuk

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH v2 0/3] Fix exceptions handling for MIPS and i386
  2015-06-17 13:24 ` [Qemu-devel] [PATCH v2 0/3] Fix exceptions handling for MIPS and i386 Aurelien Jarno
@ 2015-06-18  6:18   ` Pavel Dovgaluk
  0 siblings, 0 replies; 29+ messages in thread
From: Pavel Dovgaluk @ 2015-06-18  6:18 UTC (permalink / raw)
  To: 'Aurelien Jarno'; +Cc: pbonzini, rth7680, leon.alrae, qemu-devel

> From: Aurelien Jarno [mailto:aurelien@aurel32.net]
> On 2015-06-17 15:41, Pavel Dovgalyuk wrote:
> > This set of patches fixes exception handling for MIPS and i386 targets.
> > These targets contain instructions that break correct execution in
> > icount/TCG modes (MIPS) and in regular TCG mode (i386).
> 
> Just to be clear, this is not something specific to MIPS and i386.
> Every target which call cpu_loop_exit without doing a retranslation is
> affected by the icount bug.

True. But I haven't checked other platforms yet.

> > Incorrect execution for i386 is causes by exceptions raised by MMU functions.
> > MMU helper functions are called from generated code and other helper
> > functions. In both cases they try to get function's return address for
> > restoring virtual CPU state.
> >
> > When MMU helper is called from some other helper function
> > (like helper_maskmov_xmm) through cpu_st* function, the return address
> > will point to that helper. That is why CPU state cannot be restored in
> > the case of MMU fault.
> >
> > This bug can occur when maskmov instruction is located in the middle of the
> > translation block.
> >
> > Execution sequence for this example:
> >
> > TB start:
> > PC1: instr1
> >      instr2
> > PC2: maskmov <page fault>
> >      <page fault processing>
> > PC1: instr1
> >      instr2
> >      maskmov
> >
> > At the start of TB execution guest PC points to instr1. When page fault occurs
> > QEMU tries to restore guest PC (which should be equal to PC2). It reads host PC
> > from the call stack and checks whether it points to TB or not. Bug in ldst
> > helpers implementation provides incorrect host PC, which is not located within
> > the TB. That's why QEMU cannot recover guest PC and it remains the same (PC1).
> > After page fault processing QEMU restarts TB and executes instr1 and instr2
> > for the second time, because guest PC was not recovered.
> 
> Also just to be clear, the way you propose is just one way to fix the
> issue. The other way (which is used for all similar instructions) is to
> call just before the helper:
>     gen_update_cc_op(s);
>     gen_jmp_im(pc_start - s->cs_base);

This would work only for regular execution, not for the icount case.

> > Bugs in MIPS helper functions do not break the execution in regular TCG mode,
> > because PC value is updated before calling the functions that can raise
> > an exception. But icount value cannot be updated this way. Therefore
> > exceptions make execution in icount mode non-determinisic.
> > In icount mode every translation block looks as follows:
> >
> > if icount < n then exit
> > icount -= n
> > instr1
> > instr2
> > ...
> > instrn
> > exit
> >
> > When one of these instructions initiates an exception, icount should be
> > restored and adjusted number of instructions should be subtracted from icount
> > instead of initial n.
> >
> > tlb_fill function passes retaddr to raise_exception, which allows restoring
> > current instructions in TB and correct icount calculation.
> >
> > When exception triggered with other function (e.g. by embedding call to
> > exception raising helper into TB), then PC is not passed as retaddr and
> > correct icount is not recovered. In such cases icount will be decreased
> > by the value equal to the size of TB.
> >
> > This behavior leads to incorrect values of virtual clock and
> > non-deterministic execution of the code.
> >
> > These patches passes pointer to the translation block code to the exception
> > handler. It allows correct restoring of PC and icount values.
> 
> While I think it's the correct way for load/stores or FPU exceptions, I
> am not convinced we should do that in all cases. Retranslation has a
> cost, and when the exception is likely to occur, it's better to save the
> CPU state instead of going for a retranslation. Actually we had to
> rollback such a change on SH4, as it has some performances issues. See
> commit 1012740098d0307ce5d957ebbe9a7f020da7f574.

OK, I'll double-check the patch to make translation stop when an exception
is likely to occur after the current instruction.

> One way to fix that would be to reduce the cost of retranslation, for
> example by using a kind of exception table that we generate at the same
> time of the TB.

What exactly do you mean?

Pavel Dovgalyuk

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH v2 0/3] Fix exceptions handling for MIPS and i386
  2015-06-17 14:19 ` Aurelien Jarno
@ 2015-06-18  7:12   ` Pavel Dovgaluk
  2015-06-18  8:16     ` Aurelien Jarno
       [not found]   ` <55826f70.2215370a.4634.ffff91b2SMTPIN_ADDED_BROKEN@mx.google.com>
  1 sibling, 1 reply; 29+ messages in thread
From: Pavel Dovgaluk @ 2015-06-18  7:12 UTC (permalink / raw)
  To: 'Aurelien Jarno'; +Cc: pbonzini, rth7680, leon.alrae, qemu-devel

> From: Aurelien Jarno [mailto:aurelien@aurel32.net]
> On 2015-06-17 15:41, Pavel Dovgalyuk wrote:
> > In icount mode every translation block looks as follows:
> >
> > if icount < n then exit
> > icount -= n
> > instr1
> > instr2
> > ...
> > instrn
> > exit
> >
> > When one of these instructions initiates an exception, icount should be
> > restored and adjusted number of instructions should be subtracted from icount
> > instead of initial n.
> >
> > tlb_fill function passes retaddr to raise_exception, which allows restoring
> > current instructions in TB and correct icount calculation.
> >
> > When exception triggered with other function (e.g. by embedding call to
> > exception raising helper into TB), then PC is not passed as retaddr and
> > correct icount is not recovered. In such cases icount will be decreased
> > by the value equal to the size of TB.
> 
> Looking at how icount work, I see it's basically a variable in the CPU
> state (icount_decr.u16.low), which is already accessed from the TB.
> Couldn't we adjust it using additional code before generating an
> exception, when in icount mode.
> 
> For example for MIPS, we can add some code before generate_exception
> which use the value from s->gen_opc_icount[j] to adjust
> the variable icount_decr.u16.low.

It is possible, but it will incur additional overhead, because we will
have to update icount every time an exception might be generated.
We'll have to update the icount value before and after every helper call
that can cause an exception:

icount -= n
...
instr_k
icount += n - k
helper
icount -= n - k
...

And this overhead will slow down the code even if no exception occurs.

Pavel Dovgalyuk

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH v2 0/3] Fix exceptions handling for MIPS and i386
       [not found]   ` <55826f70.2215370a.4634.ffff91b2SMTPIN_ADDED_BROKEN@mx.google.com>
@ 2015-06-18  7:51     ` Peter Maydell
  2015-06-18  7:56       ` Pavel Dovgaluk
  0 siblings, 1 reply; 29+ messages in thread
From: Peter Maydell @ 2015-06-18  7:51 UTC (permalink / raw)
  To: Pavel Dovgaluk
  Cc: Paolo Bonzini, Richard Henderson, Leon Alrae, QEMU Developers,
	Aurelien Jarno

On 18 June 2015 at 08:12, Pavel Dovgaluk <Pavel.Dovgaluk@ispras.ru> wrote:
>> From: Aurelien Jarno [mailto:aurelien@aurel32.net]
>> Looking at how icount work, I see it's basically a variable in the CPU
>> state (icount_decr.u16.low), which is already accessed from the TB.
>> Couldn't we adjust it using additional code before generating an
>> exception, when in icount mode.
>>
>> For example for MIPS, we can add some code before generate_exception
>> which use the value from s->gen_opc_icount[j] to adjust
>> the variable icount_decr.u16.low.
>
> It is possible, but it will incur additional overhead, because we will
> have to update icount every time the exception might be generated.
> We'll have to update icount value before and after every helper call,
> that can cause an exception:
>
> icount -= n
> ...
> instr_k
> icount += n - k
> helper
> icount -= n - k
> ...
>
> And this overhead will slowdown the code even if no exception occur.

Right, this is a tradeoff: in some cases it's faster to assume
no exception and handle state resync by doing a retranslate.
In some cases it's faster to assume there will be an exception
and do a manual sync. Guest load/store is obviously in the
first category. Guest doing an instruction which always takes
an exception (like syscall insns) is in the second category.
For other cases there's a choice. We need to support both
approaches; obviously you can argue for any particular case
whether it should be approach 1 or approach 2.

thanks
-- PMM

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH v2 0/3] Fix exceptions handling for MIPS and i386
  2015-06-18  7:51     ` Peter Maydell
@ 2015-06-18  7:56       ` Pavel Dovgaluk
  0 siblings, 0 replies; 29+ messages in thread
From: Pavel Dovgaluk @ 2015-06-18  7:56 UTC (permalink / raw)
  To: 'Peter Maydell'
  Cc: 'Paolo Bonzini', 'Richard Henderson',
	'Leon Alrae', 'QEMU Developers',
	'Aurelien Jarno'

> From: Peter Maydell [mailto:peter.maydell@linaro.org]
> On 18 June 2015 at 08:12, Pavel Dovgaluk <Pavel.Dovgaluk@ispras.ru> wrote:
> >> From: Aurelien Jarno [mailto:aurelien@aurel32.net]
> >> Looking at how icount work, I see it's basically a variable in the CPU
> >> state (icount_decr.u16.low), which is already accessed from the TB.
> >> Couldn't we adjust it using additional code before generating an
> >> exception, when in icount mode.
> >>
> >> For example for MIPS, we can add some code before generate_exception
> >> which use the value from s->gen_opc_icount[j] to adjust
> >> the variable icount_decr.u16.low.
> >
> > It is possible, but it will incur additional overhead, because we will
> > have to update icount every time the exception might be generated.
> > We'll have to update icount value before and after every helper call,
> > that can cause an exception:
> >
> > icount -= n
> > ...
> > instr_k
> > icount += n - k
> > helper
> > icount -= n - k
> > ...
> >
> > And this overhead will slowdown the code even if no exception occur.
> 
> Right, this is a tradeoff: in some cases it's faster to assume
> no exception and handle state resync by doing a retranslate.
> In some cases it's faster to assume there will be an exception
> and do a manual sync. Guest load/store is obviously in the
> first category. Guest doing an instruction which always takes
> an exception (like syscall insns) is in the second category.
> For other cases there's a choice. We need to support both
> approaches; obviously you can argue for any particular case
> whether it should be approach 1 or approach 2.

Syscall and non-implemented instructions are in a third category - they
always take an exception. In this case translation should simply be
stopped, without any additional actions.

By the way, I implemented this 'third category' approach for MIPS
and measured the performance. It does not show any performance degradation
when compared to the original, unfixed version.
All other exception-generating helpers and instructions use approach 1.
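
For reference, the 'third category' handling is essentially this (a MIPS-style
sketch using the existing generate_exception translator helper; the exact code
in target-mips/translate.c differs in details):

    case OPC_SYSCALL:
        /* generate_exception saves the CPU state and raises the exception,
           so nothing after this point in the TB can execute */
        generate_exception(ctx, EXCP_SYSCALL);
        ctx->bstate = BS_STOP;   /* end the translation block here */
        break;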

Pavel Dovgalyuk

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/3] softmmu: add helper function to pass through retaddr
  2015-06-18  5:17     ` Pavel Dovgaluk
@ 2015-06-18  8:16       ` Paolo Bonzini
  2015-06-18  8:20         ` Aurelien Jarno
  0 siblings, 1 reply; 29+ messages in thread
From: Paolo Bonzini @ 2015-06-18  8:16 UTC (permalink / raw)
  To: Pavel Dovgaluk, qemu-devel; +Cc: rth7680, leon.alrae, aurelien



On 18/06/2015 07:17, Pavel Dovgaluk wrote:
>>> > >
>>> > >  static inline RES_TYPE
>>> > > -glue(glue(cpu_ld, USUFFIX), MEMSUFFIX)(CPUArchState *env, target_ulong ptr)
>>> > > +glue(glue(glue(cpu_ld, USUFFIX), MEMSUFFIX), _ra)(CPUArchState *env,
>>> > > +                                                  target_ulong ptr,
>>> > > +                                                  uintptr_t retaddr)
>> > 
>> > Would it make sense to call these helper_cpu_ld##USUFFIX##MEMSUFFIX?
> I don't want to use 'helper' prefix, because helper functions are
> usually called directly from TB.

True, but in the end these have the same functionality as helpers; they
are just called indirectly from other helpers.

Paolo

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH v2 0/3] Fix exceptions handling for MIPS and i386
  2015-06-18  7:12   ` Pavel Dovgaluk
@ 2015-06-18  8:16     ` Aurelien Jarno
  2015-06-18  8:58       ` Pavel Dovgaluk
  2015-06-18  9:08       ` Aurelien Jarno
  0 siblings, 2 replies; 29+ messages in thread
From: Aurelien Jarno @ 2015-06-18  8:16 UTC (permalink / raw)
  To: Pavel Dovgaluk; +Cc: pbonzini, rth7680, leon.alrae, qemu-devel

On 2015-06-18 10:12, Pavel Dovgaluk wrote:
> > From: Aurelien Jarno [mailto:aurelien@aurel32.net]
> > On 2015-06-17 15:41, Pavel Dovgalyuk wrote:
> > > In icount mode every translation block looks as follows:
> > >
> > > if icount < n then exit
> > > icount -= n
> > > instr1
> > > instr2
> > > ...
> > > instrn
> > > exit
> > >
> > > When one of these instructions initiates an exception, icount should be
> > > restored and adjusted number of instructions should be subtracted from icount
> > > instead of initial n.
> > >
> > > tlb_fill function passes retaddr to raise_exception, which allows restoring
> > > current instructions in TB and correct icount calculation.
> > >
> > > When exception triggered with other function (e.g. by embedding call to
> > > exception raising helper into TB), then PC is not passed as retaddr and
> > > correct icount is not recovered. In such cases icount will be decreased
> > > by the value equal to the size of TB.
> > 
> > Looking at how icount work, I see it's basically a variable in the CPU
> > state (icount_decr.u16.low), which is already accessed from the TB.
> > Couldn't we adjust it using additional code before generating an
> > exception, when in icount mode.
> > 
> > For example for MIPS, we can add some code before generate_exception
> > which use the value from s->gen_opc_icount[j] to adjust
> > the variable icount_decr.u16.low.
> 
> It is possible, but it will incur additional overhead, because we will 
> have to update icount every time the exception might be generated.
> We'll have to update icount value before and after every helper call, 
> that can cause an exception:
> 
> icount -= n
> ...
> instr_k
> icount += n - k
> helper
> icount -= n - k
> ...
> 
> And this overhead will slowdown the code even if no exception occur.

That's where I might disagree. Retranslation seems a very good idea on
paper, but in practice it doesn't seem to always bring the performance
improvement it should. In addition it seems to be highly dependent on the
target. Just to give some numbers, on MIPS (as your patch originally
concerns this architecture), 40% of code generation is actually due to
retranslation. The problem is that over time we have improved code
generation a lot (liveness analysis, better register allocation, constant
propagation, ...) and thus increased the code generation time. While that
clearly has some benefits when the code is actually executed, it's not the
case when the code is simply retranslated. In short, we now spend more
time than before finding the CPU state corresponding to an exception.

A simple way to show that is to apply the patch below, which disables
retranslation and saves the CPU state before each instruction:

diff --git a/target-mips/translate.c b/target-mips/translate.c
index 1d128ee..5238d71 100644
--- a/target-mips/translate.c
+++ b/target-mips/translate.c
@@ -19435,6 +19435,7 @@ gen_intermediate_code_internal(MIPSCPU *cpu, TranslationBlock *tb,
     LOG_DISAS("\ntb %p idx %d hflags %04x\n", tb, ctx.mem_idx, ctx.hflags);
     gen_tb_start(tb);
     while (ctx.bstate == BS_NONE) {
+        save_cpu_state(&ctx, 1);
         if (unlikely(!QTAILQ_EMPTY(&cs->breakpoints))) {
             QTAILQ_FOREACH(bp, &cs->breakpoints, entry) {
                 if (bp->pc == ctx.pc) {
diff --git a/translate-all.c b/translate-all.c
index b6b0e1c..3d4c017 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -212,6 +212,8 @@ static int cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
     int64_t ti;
 #endif
 
+    return -1;
+
 #ifdef CONFIG_PROFILER
     ti = profile_getclock();
 #endif

On an x86 host, this patch brings a 5% boot time improvement for a MIPS
guest. One of the reasons is that the TCG code generator has good
knowledge of which TCG ops or helpers can trigger an exception, so it can
optimize out part of the instructions saving the CPU state. I guess that
host CPUs have also evolved over time, now being superscalar and
out-of-order, so that saving the CPU state can be done "in the
background". Also it's just a quick and dirty patch; we can probably do
even better.

All of that is to say that I am worried about the performance impact of
adding more paths through the retranslation code, especially on MIPS, as
it seems to be costly. That said, I haven't really looked in detail at
other targets or hosts.

Now to come back to your patches: we might want to simply fix icount
first, even if it has some performance impact, and deal with the
retranslation issue separately, as it concerns more than just icount.

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/3] softmmu: add helper function to pass through retaddr
  2015-06-18  8:16       ` Paolo Bonzini
@ 2015-06-18  8:20         ` Aurelien Jarno
  0 siblings, 0 replies; 29+ messages in thread
From: Aurelien Jarno @ 2015-06-18  8:20 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: rth7680, leon.alrae, qemu-devel, Pavel Dovgaluk

On 2015-06-18 10:16, Paolo Bonzini wrote:
> 
> 
> On 18/06/2015 07:17, Pavel Dovgaluk wrote:
> >>> > >
> >>> > >  static inline RES_TYPE
> >>> > > -glue(glue(cpu_ld, USUFFIX), MEMSUFFIX)(CPUArchState *env, target_ulong ptr)
> >>> > > +glue(glue(glue(cpu_ld, USUFFIX), MEMSUFFIX), _ra)(CPUArchState *env,
> >>> > > +                                                  target_ulong ptr,
> >>> > > +                                                  uintptr_t retaddr)
> >> > 
> >> > Would it make sense to call these helper_cpu_ld##USUFFIX##MEMSUFFIX?
> > I don't want to use 'helper' prefix, because helper functions are
> > usually called directly from TB.
> 
> True, but in the end these have the same functionality as helpers, just
> they're indirectly called from other helpers.

Not fully. The idea is that the helpers are non-inline functions handling
the slow path. The cpu_ld##USUFFIX##MEMSUFFIX functions are inline
functions handling the fast path, and they call the helpers for the slow
path. That allows GCC, for example, to optimize the fast path when the
cpu_ld functions are used in a loop.
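
Heavily abridged, the split looks like this (the real code in
include/exec/cpu_ldst_template.h is macro-generated and the details of the
TLB check and helper signature differ; the _ra form matches the variant
added by patch 1/3):

    static inline uint32_t cpu_ldl_data_ra(CPUArchState *env, target_ulong ptr,
                                           uintptr_t retaddr)
    {
        int mmu_idx = cpu_mmu_index(env);
        int idx = (ptr >> TARGET_PAGE_BITS) & (CPU_TLB_SIZE - 1);
        CPUTLBEntry *e = &env->tlb_table[mmu_idx][idx];

        if (unlikely(e->addr_read != (ptr & (TARGET_PAGE_MASK | 3)))) {
            /* TLB miss: out-of-line slow-path helper, which may call
               tlb_fill() and fault using the forwarded retaddr */
            return helper_ret_ldul_mmu(env, ptr, mmu_idx, retaddr);
        }
        /* TLB hit: a plain host load that GCC can keep inline */
        return ldl_p((void *)(uintptr_t)(ptr + e->addend));
    }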

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH v2 0/3] Fix exceptions handling for MIPS and i386
  2015-06-18  8:16     ` Aurelien Jarno
@ 2015-06-18  8:58       ` Pavel Dovgaluk
  2015-06-18  9:08       ` Aurelien Jarno
  1 sibling, 0 replies; 29+ messages in thread
From: Pavel Dovgaluk @ 2015-06-18  8:58 UTC (permalink / raw)
  To: 'Aurelien Jarno'; +Cc: pbonzini, rth7680, leon.alrae, qemu-devel

> From: Aurelien Jarno [mailto:aurelien@aurel32.net]
> On 2015-06-18 10:12, Pavel Dovgaluk wrote:
> > > From: Aurelien Jarno [mailto:aurelien@aurel32.net]
> > > On 2015-06-17 15:41, Pavel Dovgalyuk wrote:
> > > > In icount mode every translation block looks as follows:
> > > >
> > > > if icount < n then exit
> > > > icount -= n
> > > > instr1
> > > > instr2
> > > > ...
> > > > instrn
> > > > exit
> > > >
> > > > When one of these instructions initiates an exception, icount should be
> > > > restored and adjusted number of instructions should be subtracted from icount
> > > > instead of initial n.
> > > >
> > > > tlb_fill function passes retaddr to raise_exception, which allows restoring
> > > > current instructions in TB and correct icount calculation.
> > > >
> > > > When exception triggered with other function (e.g. by embedding call to
> > > > exception raising helper into TB), then PC is not passed as retaddr and
> > > > correct icount is not recovered. In such cases icount will be decreased
> > > > by the value equal to the size of TB.
> > >
> > > Looking at how icount work, I see it's basically a variable in the CPU
> > > state (icount_decr.u16.low), which is already accessed from the TB.
> > > Couldn't we adjust it using additional code before generating an
> > > exception, when in icount mode.
> > >
> > > For example for MIPS, we can add some code before generate_exception
> > > which use the value from s->gen_opc_icount[j] to adjust
> > > the variable icount_decr.u16.low.
> >
> > It is possible, but it will incur additional overhead, because we will
> > have to update icount every time the exception might be generated.
> > We'll have to update icount value before and after every helper call,
> > that can cause an exception:
> >
> > icount -= n
> > ...
> > instr_k
> > icount += n - k
> > helper
> > icount -= n - k
> > ...
> >
> > And this overhead will slowdown the code even if no exception occur.
> 
> That's where I might disagree. Retranslation seems a very good idea on
> the paper, but in practice it doesn't seems to always bring the
> performance improvement it should. In addition it seems to be highly
> dependent on the target. Just to give some numbers, on MIPS (as your
> patch originally concerns this architecture), 40% of code generation is
> actually due to retranslation. The problem is that over the time we have
> improved a lot the code generation (liveness analysis, better register
> allocation, constant propagation, ...) and thus we have increased the
> code generation time. While it clearly has some benefits when this code
> is actually executed, it's not the case when the code is simply
> retranslated. In short we spend more time to find the CPU state
> corresponding to an exception than before.
> 
...
> 
> All of that to say that I am worried for the performances to see more
> paths through the retranslation code, especially on MIPS as it seems to
> be costly. That said I haven't really look in details at other targets,
> nor hosts.

I fixed syscalls and exceptions that occur unconditionally,
and removed redundant calls to save_cpu_state. Then I measured the performance
without enabling icount, and Linux boots even faster than with the original version.
I'll submit this version for review soon.

> Now to come back about your patches, we might want to simply fix icount
> first, even if it has some performance impact, and deal with the
> retranslation issue separately, as it concerns more than just icount.

Pavel Dovgalyuk

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH v2 0/3] Fix exceptions handling for MIPS and i386
  2015-06-18  8:16     ` Aurelien Jarno
  2015-06-18  8:58       ` Pavel Dovgaluk
@ 2015-06-18  9:08       ` Aurelien Jarno
  2015-06-18  9:29         ` Paolo Bonzini
  1 sibling, 1 reply; 29+ messages in thread
From: Aurelien Jarno @ 2015-06-18  9:08 UTC (permalink / raw)
  To: Pavel Dovgaluk; +Cc: pbonzini, rth7680, leon.alrae, qemu-devel

On 2015-06-18 10:16, Aurelien Jarno wrote:
> On x86, this patch brings a 5% boot time improvement on MIPS. One of the
> reason is that the TCG code generator has a good knowledge about which
> TCG ops or helpers can trigger an exception, so it can optimize out part
> of the instructions saving the CPU state. I guess that the host CPUs have
> also evolved over the time, now being superscalar and out-of-order so
> that saving the CPU state can be done "in background". Also it's just a
> quick and dirty patch, we can probably even do better.
> 
> All of that to say that I am worried for the performances to see more
> paths through the retranslation code, especially on MIPS as it seems to
> be costly. That said I haven't really look in details at other targets,
> nor hosts.

For an i386 guest, still on an x86 host, I get a 4% slower boot time by
not using retranslation (see patch below). This is not that much
compared to the complexity retranslation brings us.

diff --git a/target-i386/translate.c b/target-i386/translate.c
index 58b1959..de65bba 100644
--- a/target-i386/translate.c
+++ b/target-i386/translate.c
@@ -8001,6 +8001,9 @@ static inline void gen_intermediate_code_internal(X86CPU *cpu,
 
     gen_tb_start(tb);
     for(;;) {
+        gen_update_cc_op(dc);
+        gen_jmp_im(pc_ptr - dc->cs_base);
+
         if (unlikely(!QTAILQ_EMPTY(&cs->breakpoints))) {
             QTAILQ_FOREACH(bp, &cs->breakpoints, entry) {
                 if (bp->pc == pc_ptr &&
diff --git a/translate-all.c b/translate-all.c
index b6b0e1c..3d4c017 100644
--- a/translate-all.c
+++ b/translate-all.c
@@ -212,6 +212,8 @@ static int cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
     int64_t ti;
 #endif
 
+    return -1;
+
 #ifdef CONFIG_PROFILER
     ti = profile_getclock();
 #endif

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply related	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/3] softmmu: add helper function to pass through retaddr
  2015-06-17 12:53   ` Paolo Bonzini
  2015-06-18  5:17     ` Pavel Dovgaluk
@ 2015-06-18  9:24     ` Pavel Dovgaluk
  2015-06-18  9:30       ` Paolo Bonzini
  1 sibling, 1 reply; 29+ messages in thread
From: Pavel Dovgaluk @ 2015-06-18  9:24 UTC (permalink / raw)
  To: 'Paolo Bonzini', qemu-devel; +Cc: rth7680, leon.alrae, aurelien

> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> On 17/06/2015 14:42, Pavel Dovgalyuk wrote:
> > This patch introduces several helpers to pass return address
> > which points to the TB. Correct return address allows correct
> > restoring of the guest PC and icount. These functions should be used when
> > helpers embedded into TB invoke memory operations.
> >
> > Signed-off-by: Pavel Dovgalyuk <pavel.dovgaluk@ispras.ru>
> > ---
> >  include/exec/cpu_ldst_template.h |   42 +++++++++++++++++++++++++++++++-------
> >  include/exec/exec-all.h          |   27 ++++++++++++++++++++++++
> >  softmmu_template.h               |   18 ++++++++++++++++
> >  3 files changed, 79 insertions(+), 8 deletions(-)
> >
> > diff --git a/include/exec/cpu_ldst_template.h b/include/exec/cpu_ldst_template.h
> > index 95ab750..1847816 100644
> > --- a/include/exec/cpu_ldst_template.h
> > +++ b/include/exec/cpu_ldst_template.h
> > @@ -62,7 +62,9 @@
> >  /* generic load/store macros */
> >
> >  static inline RES_TYPE
> > -glue(glue(cpu_ld, USUFFIX), MEMSUFFIX)(CPUArchState *env, target_ulong ptr)
> > +glue(glue(glue(cpu_ld, USUFFIX), MEMSUFFIX), _ra)(CPUArchState *env,
> > +                                                  target_ulong ptr,
> > +                                                  uintptr_t retaddr)
> 
> Would it make sense to call these helper_cpu_ld##USUFFIX##MEMSUFFIX?
> 
> > diff --git a/include/exec/exec-all.h b/include/exec/exec-all.h
> > index 856e698..b3aefde 100644
> > --- a/include/exec/exec-all.h
> > +++ b/include/exec/exec-all.h
> > @@ -350,6 +350,33 @@ struct MemoryRegion *iotlb_to_region(CPUState *cpu,
> >  void tlb_fill(CPUState *cpu, target_ulong addr, int is_write, int mmu_idx,
> >                uintptr_t retaddr);
> >
> > +uint8_t helper_call_ldb_cmmu(CPUArchState *env, target_ulong addr,
> > +                             int mmu_idx, uintptr_t retaddr);
> 
> Here we already have helper_ret_ldb_cmmu, so the new function is only
> needed if DATA_SIZE != 1.
> 
> > +uint16_t helper_call_ldw_cmmu(CPUArchState *env, target_ulong addr,
> > +                              int mmu_idx, uintptr_t retaddr);
> 
> What about helper_ret_ldw_cmmu for consistency with the DATA_SIZE == 1 case?

tcg.h breaks these definitions:

/* Temporary aliases until backends are converted.  */
#ifdef TARGET_WORDS_BIGENDIAN
# define helper_ret_ldsw_mmu  helper_be_ldsw_mmu
# define helper_ret_lduw_mmu  helper_be_lduw_mmu
# define helper_ret_ldsl_mmu  helper_be_ldsl_mmu
# define helper_ret_ldul_mmu  helper_be_ldul_mmu
# define helper_ret_ldq_mmu   helper_be_ldq_mmu
# define helper_ret_stw_mmu   helper_be_stw_mmu
# define helper_ret_stl_mmu   helper_be_stl_mmu
# define helper_ret_stq_mmu   helper_be_stq_mmu
#else

Pavel Dovgalyuk

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH v2 0/3] Fix exceptions handling for MIPS and i386
  2015-06-18  9:08       ` Aurelien Jarno
@ 2015-06-18  9:29         ` Paolo Bonzini
  2015-06-18  9:42           ` Aurelien Jarno
  0 siblings, 1 reply; 29+ messages in thread
From: Paolo Bonzini @ 2015-06-18  9:29 UTC (permalink / raw)
  To: Aurelien Jarno, Pavel Dovgaluk; +Cc: rth7680, leon.alrae, qemu-devel

On 18/06/2015 11:08, Aurelien Jarno wrote:
> For an i386 guest still on an x86 host, I get a 4% slower boot time by
> not using retranslation (see patch below). This is not that much
> compared to the complexity retranslation bring us.

QEMU could just always compute and store the restore_state information.
TCG needs to help fill it in (a new TCG opcode?), but it should be easy.

Paolo

> diff --git a/target-i386/translate.c b/target-i386/translate.c
> index 58b1959..de65bba 100644
> --- a/target-i386/translate.c
> +++ b/target-i386/translate.c
> @@ -8001,6 +8001,9 @@ static inline void gen_intermediate_code_internal(X86CPU *cpu,
>  
>      gen_tb_start(tb);
>      for(;;) {
> +        gen_update_cc_op(dc);
> +        gen_jmp_im(pc_ptr - dc->cs_base);
> +
>          if (unlikely(!QTAILQ_EMPTY(&cs->breakpoints))) {
>              QTAILQ_FOREACH(bp, &cs->breakpoints, entry) {
>                  if (bp->pc == pc_ptr &&
> diff --git a/translate-all.c b/translate-all.c
> index b6b0e1c..3d4c017 100644
> --- a/translate-all.c
> +++ b/translate-all.c
> @@ -212,6 +212,8 @@ static int cpu_restore_state_from_tb(CPUState *cpu, TranslationBlock *tb,
>      int64_t ti;
>  #endif
>  
> +    return -1;
> +
>  #ifdef CONFIG_PROFILER
>      ti = profile_getclock();
>  #endif

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/3] softmmu: add helper function to pass through retaddr
  2015-06-18  9:24     ` Pavel Dovgaluk
@ 2015-06-18  9:30       ` Paolo Bonzini
  2015-06-18  9:33         ` Pavel Dovgaluk
  0 siblings, 1 reply; 29+ messages in thread
From: Paolo Bonzini @ 2015-06-18  9:30 UTC (permalink / raw)
  To: Pavel Dovgaluk, qemu-devel; +Cc: rth7680, leon.alrae, aurelien



On 18/06/2015 11:24, Pavel Dovgaluk wrote:
>>> > > +uint16_t helper_call_ldw_cmmu(CPUArchState *env, target_ulong addr,
>>> > > +                              int mmu_idx, uintptr_t retaddr);
>> > 
>> > What about helper_ret_ldw_cmmu for consistency with the DATA_SIZE == 1 case?
> tcg.h breaks these definitions:
> 
> /* Temporary aliases until backends are converted.  */
> #ifdef TARGET_WORDS_BIGENDIAN
> # define helper_ret_ldsw_mmu  helper_be_ldsw_mmu
> # define helper_ret_lduw_mmu  helper_be_lduw_mmu
> # define helper_ret_ldsl_mmu  helper_be_ldsl_mmu
> # define helper_ret_ldul_mmu  helper_be_ldul_mmu
> # define helper_ret_ldq_mmu   helper_be_ldq_mmu
> # define helper_ret_stw_mmu   helper_be_stw_mmu
> # define helper_ret_stl_mmu   helper_be_stl_mmu
> # define helper_ret_stq_mmu   helper_be_stq_mmu
> #else

Isn't this exactly the same as your helper_call_ldw_cmmu?

Paolo

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/3] softmmu: add helper function to pass through retaddr
  2015-06-18  9:30       ` Paolo Bonzini
@ 2015-06-18  9:33         ` Pavel Dovgaluk
  2015-06-18  9:35           ` Paolo Bonzini
  0 siblings, 1 reply; 29+ messages in thread
From: Pavel Dovgaluk @ 2015-06-18  9:33 UTC (permalink / raw)
  To: 'Paolo Bonzini', qemu-devel; +Cc: rth7680, leon.alrae, aurelien

> From: Paolo Bonzini [mailto:pbonzini@redhat.com]
> On 18/06/2015 11:24, Pavel Dovgaluk wrote:
> >>> > > +uint16_t helper_call_ldw_cmmu(CPUArchState *env, target_ulong addr,
> >>> > > +                              int mmu_idx, uintptr_t retaddr);
> >> >
> >> > What about helper_ret_ldw_cmmu for consistency with the DATA_SIZE == 1 case?
> > tcg.h breaks these definitions:
> >
> > /* Temporary aliases until backends are converted.  */
> > #ifdef TARGET_WORDS_BIGENDIAN
> > # define helper_ret_ldsw_mmu  helper_be_ldsw_mmu
> > # define helper_ret_lduw_mmu  helper_be_lduw_mmu
> > # define helper_ret_ldsl_mmu  helper_be_ldsl_mmu
> > # define helper_ret_ldul_mmu  helper_be_ldul_mmu
> > # define helper_ret_ldq_mmu   helper_be_ldq_mmu
> > # define helper_ret_stw_mmu   helper_be_stw_mmu
> > # define helper_ret_stl_mmu   helper_be_stl_mmu
> > # define helper_ret_stq_mmu   helper_be_stq_mmu
> > #else
> 
> Isn't this exactly the same as your helper_call_ldw_cmmu?

Yes, but I can't compile it yet.

Pavel Dovgalyuk

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH v2 1/3] softmmu: add helper function to pass through retaddr
  2015-06-18  9:33         ` Pavel Dovgaluk
@ 2015-06-18  9:35           ` Paolo Bonzini
  0 siblings, 0 replies; 29+ messages in thread
From: Paolo Bonzini @ 2015-06-18  9:35 UTC (permalink / raw)
  To: Pavel Dovgaluk, qemu-devel; +Cc: rth7680, leon.alrae, aurelien



On 18/06/2015 11:33, Pavel Dovgaluk wrote:
> > > /* Temporary aliases until backends are converted.  */
> > > #ifdef TARGET_WORDS_BIGENDIAN
> > > # define helper_ret_ldsw_mmu  helper_be_ldsw_mmu
> > > # define helper_ret_lduw_mmu  helper_be_lduw_mmu
> > > # define helper_ret_ldsl_mmu  helper_be_ldsl_mmu
> > > # define helper_ret_ldul_mmu  helper_be_ldul_mmu
> > > # define helper_ret_ldq_mmu   helper_be_ldq_mmu
> > > # define helper_ret_stw_mmu   helper_be_stw_mmu
> > > # define helper_ret_stl_mmu   helper_be_stl_mmu
> > > # define helper_ret_stq_mmu   helper_be_stq_mmu
> > > #else
> > 
> > Isn't this exactly the same as your helper_call_ldw_cmmu?
> 
> Yes, but I can't compile it yet.

I'm not sure what the problem is.  Can you just move this part of
tcg/tcg.h to another header file?

Paolo

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH v2 0/3] Fix exceptions handling for MIPS and i386
  2015-06-18  9:29         ` Paolo Bonzini
@ 2015-06-18  9:42           ` Aurelien Jarno
  2015-06-18 10:02             ` Paolo Bonzini
  0 siblings, 1 reply; 29+ messages in thread
From: Aurelien Jarno @ 2015-06-18  9:42 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: rth7680, leon.alrae, qemu-devel, Pavel Dovgaluk

On 2015-06-18 11:29, Paolo Bonzini wrote:
> On 18/06/2015 11:08, Aurelien Jarno wrote:
> > For an i386 guest still on an x86 host, I get a 4% slower boot time by
> > not using retranslation (see patch below). This is not that much
> > compared to the complexity retranslation bring us.
> 
> QEMU could just always compute and store the restore_state information.
>  TCG needs to help filling it in (a new TCG opcode?), but it should be easy.

Yes, that was another approach I had in mind (I called it an exception
table in my other mail), but it requires a bit more work than just
saving the CPU state all the time. The problem is that the state
information we want to save varies from target to target. Going
through a TCG opcode means we can use the liveness analysis pass to save
the minimum amount of data.

That said, I would like to push the idea of always saving the CPU state
a bit further, to see if we can keep the same performance. There are
still improvements to make, by removing more code on the core side (like
the call to tb_find_pc, which is now useless), or on the target side by
checking/improving helper flags. We might save the CPU state too often
if a helper doesn't declare that it doesn't touch globals.
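
For reference, that's the kind of declaration I mean (the helper names are
made up; the flags are the existing TCG_CALL_NO_* ones from tcg.h):

    /* pure helper: no globals read/written, no side effects, so TCG does
       not need to save the CPU state around the call */
    DEF_HELPER_FLAGS_1(frobnicate, TCG_CALL_NO_RWG_SE, i32, i32)

    /* no flags declared: assumed to clobber globals and possibly raise an
       exception, so the state has to be synced before calling it */
    DEF_HELPER_2(may_fault, void, env, i32)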

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH v2 0/3] Fix exceptions handling for MIPS and i386
  2015-06-18  9:42           ` Aurelien Jarno
@ 2015-06-18 10:02             ` Paolo Bonzini
  2015-06-18 17:42               ` Aurelien Jarno
  0 siblings, 1 reply; 29+ messages in thread
From: Paolo Bonzini @ 2015-06-18 10:02 UTC (permalink / raw)
  To: Pavel Dovgaluk, rth7680, leon.alrae, qemu-devel



On 18/06/2015 11:42, Aurelien Jarno wrote:
>> > QEMU could just always compute and store the restore_state information.
>> >  TCG needs to help filling it in (a new TCG opcode?), but it should be easy.
> Yes, that was another approach I have in mind (I called it exception
> table in my other mail),

Okay, understood.  My idea was more like always generating the gen_op_*
arrays.

> but it requires a tiny more work than just
> saving the CPU state all the time. The problem is that the state
> information we want to save are varying for target to target. Going
> through a TCG opcode means we can use the liveness analysis pass to save
> the minimum amount of data.

I mentioned a TCG opcode because the target PC is not available inside
the translator.  So the translator could pepper the TCG instruction
stream with things like

     checkpoint  $target_pc, $target_cc_op, $0

TCG can then use them to fill in an array stored inside the
TranslationBlock, together with the host PC.  Since the gen_opc_pc,
gen_opc_instr_start, gen_opc_icount arrays are inside tcg_ctx, it may be
a good idea to store the checkpoint information compressed in a byte
array (e.g. as a series of ULEB128 values---the host and target PCs can
even be stored as deltas from the last value).
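
Roughly like this (purely illustrative, not a proposed QEMU data layout):

    /* ULEB128-encode one value into the byte array */
    static uint8_t *encode_uleb128(uint8_t *p, uint64_t val)
    {
        do {
            uint8_t byte = val & 0x7f;
            val >>= 7;
            if (val) {
                byte |= 0x80;   /* more bytes follow */
            }
            *p++ = byte;
        } while (val);
        return p;
    }

    /* append one checkpoint record: host PC delta, target PC delta, icount */
    static uint8_t *encode_checkpoint(uint8_t *p, uint64_t host_pc_delta,
                                      uint64_t target_pc_delta, uint64_t icount)
    {
        p = encode_uleb128(p, host_pc_delta);
        p = encode_uleb128(p, target_pc_delta);
        p = encode_uleb128(p, icount);
        return p;
    }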

As a first step, gen_intermediate_code_pc and tcg_gen_code_search_pc can
then be merged into a single target-independent function that
uncompresses the byte array up to the required host PC into tcg_ctx.
Later you can optimize them to remove the tcg_ctx arrays altogether.

So the patches could be something like this:

1) SPARC: put the jump target information directly in gen_opc_* without
using gen_opc_jump_pc (not trivial)

2) a few targets: instead of gen_opc_* arrays, use a new generic member
of tcg_ctx (similar to how csbase is used generically), e.g.
tcg_ctx.gen_opc_target1[] and tcg_ctx.gen_opc_target2[].

3) all targets: always fill in tcg_ctx.gen_*, even if search_pc is false

4) TCG: add support for a checkpoint operation, make it fill in
tcg_ctx.gen_*

5) all targets: change explicit filling of tcg_ctx.gen_* to use the
checkpoint operation

6) TCG/translate-all: convert gen_intermediate_code_pc as outlined above

> That said I would like to push further the idea of always saving the CPU
> state a bit more to see if we can keep the same performances. There are
> still improvements to do, by removing more code on the core side (like
> finding the call to tb_finc_pc which is now useless), or on the target
> side by checking/improving helper flags. We might save the CPU state too
> often if a helper doesn't declare it doesn't touch globals.

True, on the other hand there are a lot of helpers to audit...

Paolo

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH v2 0/3] Fix exceptions handling for MIPS and i386
  2015-06-18 10:02             ` Paolo Bonzini
@ 2015-06-18 17:42               ` Aurelien Jarno
  2015-06-19  5:09                 ` Pavel Dovgaluk
  0 siblings, 1 reply; 29+ messages in thread
From: Aurelien Jarno @ 2015-06-18 17:42 UTC (permalink / raw)
  To: Paolo Bonzini; +Cc: rth7680, leon.alrae, qemu-devel, Pavel Dovgaluk

On 2015-06-18 12:02, Paolo Bonzini wrote:
> 
> 
> On 18/06/2015 11:42, Aurelien Jarno wrote:
> >> > QEMU could just always compute and store the restore_state information.
> >> >  TCG needs to help filling it in (a new TCG opcode?), but it should be easy.
> > Yes, that was another approach I have in mind (I called it exception
> > table in my other mail),
> 
> Okay, understood.  My idea was more like always generating the gen_op_*
> arrays.
> 
> > but it requires a tiny more work than just
> > saving the CPU state all the time. The problem is that the state
> > information we want to save are varying for target to target. Going
> > through a TCG opcode means we can use the liveness analysis pass to save
> > the minimum amount of data.
> 
> I mentioned a TCG opcode because the target PC is not available inside
> the translator.  So the translator could pepper the TCG instruction

Well it is available through s->gen_opc_pc, but that's not that clean.

> stream with things like
> 
>      checkpoint  $target_pc, $target_cc_op, $0

Yes, it's clearly better to add an explicit instruction for that. As
said we can pass it through the liveness analysis. But it means that
this instruction will have a variable number of arguments.

> TCG can then use them to fill in an array stored inside the
> TranslationBlock, together with the host PC.  Since the gen_opc_pc,
> gen_opc_instr_start, gen_opc_icount arrays are inside tcg_ctx, it may be
> a good idea to store the checkpoint information compressed in a byte
> array (e.g. as a series of ULEB128 values---the host and target PCs can
> even be stored as deltas from the last value).

Either as deltas from the last value or as deltas from the start of the
TB. What I am worried about is the size of the checkpoint information:
even if we do some compression, we might have one entry per guest
instruction. I have implemented a naive version of that without
compression, storing the checkpoint data at the end of the generated code,
and it's about 30% of the size of the TB for MIPS. It's probably smaller
on architectures storing only the PC. Also, its size is quite variable.
That's why it's probably not a good idea to store it directly in the
TranslationBlock. I don't like storing it directly in the generated code
either, especially given that this part is supposed to be executable.

> As a first step, gen_intermediate_code_pc and tcg_gen_code_search_pc can
> then be merged into a single target-independent function that
> uncompresses the byte array up to the required host PC into tcg_ctx.
> Later you can optimize them to remove the tcg_ctx arrays altogether.
> 
> So the patches could be something like this:
> 
> 1) SPARC: put the jump target information directly in gen_opc_* without
> using gen_opc_jump_pc (not trivial)
> 
> 2) a few targets: instead of gen_opc_* arrays, use a new generic member
> of tcg_ctx (similar to how csbase is used generically), e.g.
> tcg_ctx.gen_opc_target1[] and tcg_ctx.gen_opc_target2[].
> 
> 3) all targets: always fill in tcg_ctx.gen_*, even if search_pc is false
> 
> 4) TCG: add support for a checkpoint operation, make it fill in
> tcg_ctx.gen_*
> 
> 5) all targets: change explicit filling of tcg_ctx.gen_* to use the
> checkpoint operation
> 
> 6) TCG/translate-all: convert gen_intermediate_code_pc as outlined above

That sounds like a plan for when I have more time ;-)

Aurelien

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH v2 0/3] Fix exceptions handling for MIPS and i386
  2015-06-18 17:42               ` Aurelien Jarno
@ 2015-06-19  5:09                 ` Pavel Dovgaluk
  2015-06-19  8:22                   ` Aurelien Jarno
  0 siblings, 1 reply; 29+ messages in thread
From: Pavel Dovgaluk @ 2015-06-19  5:09 UTC (permalink / raw)
  To: 'Aurelien Jarno', 'Paolo Bonzini'
  Cc: rth7680, leon.alrae, qemu-devel

> From: Aurelien Jarno [mailto:aurelien@aurel32.net]
> On 2015-06-18 12:02, Paolo Bonzini wrote:
> >
> > TCG can then use them to fill in an array stored inside the
> > TranslationBlock, together with the host PC.  Since the gen_opc_pc,
> > gen_opc_instr_start, gen_opc_icount arrays are inside tcg_ctx, it may be
> > a good idea to store the checkpoint information compressed in a byte
> > array (e.g. as a series of ULEB128 values---the host and target PCs can
> > even be stored as deltas from the last value).
> 
> Either as deltas to the last value or as delta from the start of the
> TB. What I am worried about is the size of the checkpoint information,
> even if we do some compression, we might have one per guest instruction.
> I have implemented a naive version of that without compression, storing
> the checkpoint data at the end of the generated code, and it's about 30%
> of the size of the TB for MIPS. It's probably smaller on architectures
> storing only the PC. Also it's size is quite variable. That's why it's
> probably not a good idea to store it directly in the TranslationBlock.
> I don't like storing it directly in the generated code either,
> especially given this part is supposed to be executable.
> 
> > As a first step, gen_intermediate_code_pc and tcg_gen_code_search_pc can
> > then be merged into a single target-independent function that
> > uncompresses the byte array up to the required host PC into tcg_ctx.
> > Later you can optimize them to remove the tcg_ctx arrays altogether.
> >
> > So the patches could be something like this:
> >
> > 1) SPARC: put the jump target information directly in gen_opc_* without
> > using gen_opc_jump_pc (not trivial)
> >
> > 2) a few targets: instead of gen_opc_* arrays, use a new generic member
> > of tcg_ctx (similar to how csbase is used generically), e.g.
> > tcg_ctx.gen_opc_target1[] and tcg_ctx.gen_opc_target2[].
> >
> > 3) all targets: always fill in tcg_ctx.gen_*, even if search_pc is false
> >
> > 4) TCG: add support for a checkpoint operation, make it fill in
> > tcg_ctx.gen_*
> >
> > 5) all targets: change explicit filling of tcg_ctx.gen_* to use the
> > checkpoint operation
> >
> > 6) TCG/translate-all: convert gen_intermediate_code_pc as outlined above
> 
> That's sounds like a plan when I have more time ;-)

Doesn't this approach still require my fixes in order to work correctly?

Pavel Dovgalyuk

^ permalink raw reply	[flat|nested] 29+ messages in thread

* Re: [Qemu-devel] [PATCH v2 0/3] Fix exceptions handling for MIPS and i386
  2015-06-19  5:09                 ` Pavel Dovgaluk
@ 2015-06-19  8:22                   ` Aurelien Jarno
  0 siblings, 0 replies; 29+ messages in thread
From: Aurelien Jarno @ 2015-06-19  8:22 UTC (permalink / raw)
  To: Pavel Dovgaluk; +Cc: 'Paolo Bonzini', rth7680, leon.alrae, qemu-devel

On 2015-06-19 08:09, Pavel Dovgaluk wrote:
> > From: Aurelien Jarno [mailto:aurelien@aurel32.net]
> > On 2015-06-18 12:02, Paolo Bonzini wrote:
> > >
> > > TCG can then use them to fill in an array stored inside the
> > > TranslationBlock, together with the host PC.  Since the gen_opc_pc,
> > > gen_opc_instr_start, gen_opc_icount arrays are inside tcg_ctx, it may be
> > > a good idea to store the checkpoint information compressed in a byte
> > > array (e.g. as a series of ULEB128 values---the host and target PCs can
> > > even be stored as deltas from the last value).
> > 
> > Either as deltas to the last value or as delta from the start of the
> > TB. What I am worried about is the size of the checkpoint information,
> > even if we do some compression, we might have one per guest instruction.
> > I have implemented a naive version of that without compression, storing
> > the checkpoint data at the end of the generated code, and it's about 30%
> > of the size of the TB for MIPS. It's probably smaller on architectures
> > storing only the PC. Also it's size is quite variable. That's why it's
> > probably not a good idea to store it directly in the TranslationBlock.
> > I don't like storing it directly in the generated code either,
> > especially given this part is supposed to be executable.
> > 
> > > As a first step, gen_intermediate_code_pc and tcg_gen_code_search_pc can
> > > then be merged into a single target-independent function that
> > > uncompresses the byte array up to the required host PC into tcg_ctx.
> > > Later you can optimize them to remove the tcg_ctx arrays altogether.
> > >
> > > So the patches could be something like this:
> > >
> > > 1) SPARC: put the jump target information directly in gen_opc_* without
> > > using gen_opc_jump_pc (not trivial)
> > >
> > > 2) a few targets: instead of gen_opc_* arrays, use a new generic member
> > > of tcg_ctx (similar to how csbase is used generically), e.g.
> > > tcg_ctx.gen_opc_target1[] and tcg_ctx.gen_opc_target2[].
> > >
> > > 3) all targets: always fill in tcg_ctx.gen_*, even if search_pc is false
> > >
> > > 4) TCG: add support for a checkpoint operation, make it fill in
> > > tcg_ctx.gen_*
> > >
> > > 5) all targets: change explicit filling of tcg_ctx.gen_* to use the
> > > checkpoint operation
> > >
> > > 6) TCG/translate-all: convert gen_intermediate_code_pc as outlined above
> > 
> > That's sounds like a plan when I have more time ;-)
> 
> Doesn't this approach still require my fixes to work correctly?

Yes it does. 

Aurélien

-- 
Aurelien Jarno                          GPG: 4096R/1DDD8C9B
aurelien@aurel32.net                 http://www.aurel32.net

^ permalink raw reply	[flat|nested] 29+ messages in thread

end of thread, other threads:[~2015-06-19  8:22 UTC | newest]

Thread overview: 29+ messages (download: mbox.gz / follow: Atom feed)
-- links below jump to the message on this page --
2015-06-17 12:41 [Qemu-devel] [PATCH v2 0/3] Fix exceptions handling for MIPS and i386 Pavel Dovgalyuk
2015-06-17 12:42 ` [Qemu-devel] [PATCH v2 1/3] softmmu: add helper function to pass through retaddr Pavel Dovgalyuk
2015-06-17 12:53   ` Paolo Bonzini
2015-06-18  5:17     ` Pavel Dovgaluk
2015-06-18  8:16       ` Paolo Bonzini
2015-06-18  8:20         ` Aurelien Jarno
2015-06-18  9:24     ` Pavel Dovgaluk
2015-06-18  9:30       ` Paolo Bonzini
2015-06-18  9:33         ` Pavel Dovgaluk
2015-06-18  9:35           ` Paolo Bonzini
2015-06-17 12:42 ` [Qemu-devel] [PATCH v2 2/3] target-mips: exceptions handling in icount mode Pavel Dovgalyuk
2015-06-17 13:05   ` Aurelien Jarno
2015-06-17 12:42 ` [Qemu-devel] [PATCH v2 3/3] target-i386: fix memory operations in helpers Pavel Dovgalyuk
2015-06-17 13:27   ` Aurelien Jarno
2015-06-17 13:24 ` [Qemu-devel] [PATCH v2 0/3] Fix exceptions handling for MIPS and i386 Aurelien Jarno
2015-06-18  6:18   ` Pavel Dovgaluk
2015-06-17 14:19 ` Aurelien Jarno
2015-06-18  7:12   ` Pavel Dovgaluk
2015-06-18  8:16     ` Aurelien Jarno
2015-06-18  8:58       ` Pavel Dovgaluk
2015-06-18  9:08       ` Aurelien Jarno
2015-06-18  9:29         ` Paolo Bonzini
2015-06-18  9:42           ` Aurelien Jarno
2015-06-18 10:02             ` Paolo Bonzini
2015-06-18 17:42               ` Aurelien Jarno
2015-06-19  5:09                 ` Pavel Dovgaluk
2015-06-19  8:22                   ` Aurelien Jarno
     [not found]   ` <55826f70.2215370a.4634.ffff91b2SMTPIN_ADDED_BROKEN@mx.google.com>
2015-06-18  7:51     ` Peter Maydell
2015-06-18  7:56       ` Pavel Dovgaluk
