x86: use =A as output for RDTSC on x86-32

The timing_info benchmark on qemu_x86 shows this is a bit faster.

Before:
  START - Time Measurement
  Timing results: Clock frequency: 1000 MHz
  Context switch                               : 896 cycles ,   895 ns
  Interrupt latency                            : 768 cycles ,   767 ns
  Tick overhead                                :14912 cycles , 14911 ns
  Thread creation                              :18688 cycles , 18687 ns
  Thread abort (non-running)                   :49216 cycles , 49215 ns
  Thread abort (_current)                      :55616 cycles , 55615 ns
  Thread suspend                               :11072 cycles , 11071 ns
  Thread resume                                :10272 cycles , 10271 ns
  Thread yield                                 :12213 cycles , 12212 ns
  Thread sleep                                 :17984 cycles , 17983 ns
  Heap malloc                                  :21702 cycles , 21701 ns
  Heap free                                    :15176 cycles , 15175 ns
  Semaphore take with context switch           :19168 cycles , 19167 ns
  Semaphore give with context switch           :18400 cycles , 18399 ns
  Semaphore take without context switch        :2208 cycles ,  2207 ns
  Semaphore give without context switch        :4704 cycles ,  4703 ns
  Mutex lock                                   :1952 cycles ,  1951 ns
  Mutex unlock                                 :7936 cycles ,  7935 ns
  Message queue put with context switch        :20320 cycles , 20319 ns
  Message queue put without context switch     :5792 cycles ,  5791 ns
  Message queue get with context switch        :22112 cycles , 22111 ns
  Message queue get without context switch     :5312 cycles ,  5311 ns
  Mailbox synchronous put                      :27936 cycles , 27935 ns
  Mailbox synchronous get                      :23392 cycles , 23391 ns
  Mailbox asynchronous put                     :11808 cycles , 11807 ns
  Mailbox get without context switch           :20416 cycles , 20415 ns
  Drop to user mode                            :643712 cycles , 643711 ns
  User thread creation                         :652096 cycles , 652095 ns
  Syscall overhead                             :2720 cycles ,  2719 ns
  Validation overhead k_object init            :4256 cycles ,  4255 ns
  Validation overhead k_object permission      :4224 cycles ,  4223 ns
  Time Measurement finished

After:
  START - Time Measurement
  Timing results: Clock frequency: 1000 MHz
  Context switch                               : 896 cycles ,   895 ns
  Interrupt latency                            : 768 cycles ,   767 ns
  Tick overhead                                :14752 cycles , 14751 ns
  Thread creation                              :18464 cycles , 18463 ns
  Thread abort (non-running)                   :48992 cycles , 48991 ns
  Thread abort (_current)                      :55552 cycles , 55551 ns
  Thread suspend                               :10848 cycles , 10847 ns
  Thread resume                                :10048 cycles , 10047 ns
  Thread yield                                 :12213 cycles , 12212 ns
  Thread sleep                                 :17984 cycles , 17983 ns
  Heap malloc                                  :21702 cycles , 21701 ns
  Heap free                                    :15176 cycles , 15175 ns
  Semaphore take with context switch           :19104 cycles , 19103 ns
  Semaphore give with context switch           :18368 cycles , 18367 ns
  Semaphore take without context switch        :1984 cycles ,  1983 ns
  Semaphore give without context switch        :4480 cycles ,  4479 ns
  Mutex lock                                   :1728 cycles ,  1727 ns
  Mutex unlock                                 :7712 cycles ,  7711 ns
  Message queue put with context switch        :20224 cycles , 20223 ns
  Message queue put without context switch     :5568 cycles ,  5567 ns
  Message queue get with context switch        :22016 cycles , 22015 ns
  Message queue get without context switch     :5088 cycles ,  5087 ns
  Mailbox synchronous put                      :27840 cycles , 27839 ns
  Mailbox synchronous get                      :23296 cycles , 23295 ns
  Mailbox asynchronous put                     :11584 cycles , 11583 ns
  Mailbox get without context switch           :20192 cycles , 20191 ns
  Drop to user mode                            :643616 cycles , 643615 ns
  User thread creation                         :651872 cycles , 651871 ns
  Syscall overhead                             :2464 cycles ,  2463 ns
  Validation overhead k_object init            :4032 cycles ,  4031 ns
  Validation overhead k_object permission      :4000 cycles ,  3999 ns
  Time Measurement finished

Signed-off-by: Daniel Leung <daniel.leung@intel.com>
This commit is contained in:
Daniel Leung 2020-08-04 13:41:06 -07:00 committed by Maureen Helm
commit 80fb6538b3

View file

@ -290,11 +290,16 @@ static inline uint64_t z_tsc_read(void)
);
#endif
#ifdef CONFIG_X86_64
/*
* We cannot use "=A", since this would use %rax on x86_64 and
* return only the lower 32bits of the TSC
*/
__asm__ volatile ("rdtsc" : "=a" (rv.lo), "=d" (rv.hi));
#else
/* "=A" means that value is in eax:edx pair. */
__asm__ volatile ("rdtsc" : "=A" (rv.value));
#endif
return rv.value;
}