This is a title for this blog: September 2014

Sunday, 28 September 2014

Give that man some soup

For many memory prefetching algorithms, it is necessary to predict the re-reference interval of a memory access, so that the data can be moved into the cache before it is requested, greatly reducing memory access time.

Consider the following address stream:

ACDACBBD

In this example, the re-reference interval of D would typically be 4, because there are 4 memory accesses between consecutive accesses of D (I'll get to why I say "typically" in a second). This means that, given the following access stream

ACDACBBDACAC

it would be natural to prefetch D next: D recurs after every 4 accesses, and exactly 4 accesses (ACAC) have passed since the last D. In practice, however, the next accesses are more likely to be something like BBD. Why is that? Well, we're not quite sure, but studies have shown that prefetching based on the unique re-reference interval, i.e. the number of distinct addresses accessed between two consecutive accesses to the same address, produces significantly better results. Under this model, the unique re-reference interval of D in the first example is 3, since only 3 distinct addresses (A, B, and C) appear between its two accesses. In the second stream, only 2 distinct addresses (A and C) have appeared since the last D, so D is not due yet; one more new address, such as B, has to show up first. The problem is that tracking the data needed to find the unique re-reference interval is hard to do efficiently, because counting distinct addresses means remembering which addresses have already been seen, not just incrementing a counter. So, for the past 2 weeks, I have been researching methods to do this efficiently.
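To make the two definitions concrete, here is a small Python sketch; all the function names are my own, hypothetical ones. `reref_intervals` computes both intervals the slow, obvious way by slicing out the accesses in between, and `since_last` reproduces the second example (4 accesses, but only 2 distinct addresses, since the last D). `unique_rri_fast` shows one well-known way to get the unique interval efficiently for a recorded trace: treat it as the LRU stack distance (also called reuse distance) and keep a Fenwick tree holding a 1 at the most recent access time of every address, so each access costs O(log n). This is offered as an illustration of a standard trace-analysis technique, not as the method from my research, and it assumes the whole trace fits in memory.

```python
class Fenwick:
    """Binary indexed tree over positions 1..n: point update and prefix sum in O(log n)."""

    def __init__(self, n):
        self.tree = [0] * (n + 1)

    def add(self, i, delta):
        while i < len(self.tree):
            self.tree[i] += delta
            i += i & -i

    def prefix_sum(self, i):
        total = 0
        while i > 0:
            total += self.tree[i]
            i -= i & -i
        return total


def reref_intervals(stream):
    """Naive version: for each access that repeats an earlier address, return
    (address, accesses in between, distinct addresses in between)."""
    last_seen = {}  # address -> index of its most recent access
    results = []
    for i, addr in enumerate(stream):
        if addr in last_seen:
            between = stream[last_seen[addr] + 1:i]
            results.append((addr, len(between), len(set(between))))
        last_seen[addr] = i
    return results


def unique_rri_fast(stream):
    """Unique re-reference interval (reuse distance) of each repeated access,
    using a Fenwick tree that marks the latest access time of every address."""
    marks = Fenwick(len(stream))
    last_seen = {}  # address -> 1-based time of its most recent access
    results = []
    for t, addr in enumerate(stream, start=1):
        if addr in last_seen:
            p = last_seen[addr]
            # Every mark strictly between p and t is the latest access of a
            # different address, so counting marks counts distinct addresses.
            results.append((addr, marks.prefix_sum(t - 1) - marks.prefix_sum(p)))
            marks.add(p, -1)  # p is no longer addr's most recent access
        marks.add(t, 1)
        last_seen[addr] = t
    return results


def since_last(stream, addr):
    """(accesses, distinct addresses) seen since the most recent access to addr."""
    last = max(i for i, a in enumerate(stream) if a == addr)
    tail = stream[last + 1:]
    return len(tail), len(set(tail))


print(reref_intervals("ACDACBBD"))      # [('A', 2, 2), ('C', 2, 2), ('B', 0, 0), ('D', 4, 3)]
print(unique_rri_fast("ACDACBBD"))      # [('A', 2), ('C', 2), ('B', 0), ('D', 3)]
print(since_last("ACDACBBDACAC", "D"))  # (4, 2)
```

The last line shows the prediction gap from the example: by the plain count, D (interval 4) is due now, but only 2 of the 3 distinct addresses it usually waits for have appeared. Because the Fenwick tree needs one slot per access in the trace, this approach suits offline analysis of simulator traces rather than an on-the-fly mechanism.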

Sunday, 14 September 2014

Some background information

My PI asked me to graph the number of physical memory addresses that pass through our simulator and compare that to the number of structural addresses they are mapped to. Many of the simulations didn't complete because the simpoints for those files needed to be recompiled, so we recompiled them. For all of this we are using Intel's Pin tool, a dynamic binary instrumentation framework: you write tools in C++ that run alongside an unmodified program and observe or control its execution in great detail. Simpoints are short, representative slices of a program's execution, chosen so that simulating only those slices approximates the behavior of the whole program. So, for example, if we compile 3 simpoints per benchmark, we get 3 sets of output per benchmark, each covering ~25,000,000 instructions. For benchmarks we are using SPECfp, a standard suite of floating-point-intensive programs taken from real scientific applications, such as a solver for Einstein's evolution equations and quantum mechanical simulations.
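A minimal sketch of how one data point for that graph could be computed, assuming the simulator's output for a single simpoint can be read as a sequence of integer physical addresses. The real physical-to-structural mapping isn't something I can describe here, so it is passed in as a function; `count_distinct` and `block_of` (which simply groups addresses into 64-byte blocks) are hypothetical stand-ins, not the project's code.

```python
from typing import Callable, Hashable, Iterable, Tuple


def count_distinct(trace: Iterable[int],
                   to_structural: Callable[[int], Hashable]) -> Tuple[int, int]:
    """Return (distinct physical addresses, distinct structural addresses) in a trace."""
    physical = set()
    structural = set()
    for addr in trace:
        physical.add(addr)
        structural.add(to_structural(addr))
    return len(physical), len(structural)


# Hypothetical placeholder mapping, for illustration only: 64-byte blocks.
def block_of(addr: int) -> int:
    return addr >> 6


trace = [0x1000, 0x1008, 0x1040, 0x1000, 0x2000]
print(count_distinct(trace, block_of))  # (4, 3)
```

Running this once per simpoint output gives one pair of bars per simpoint; the gap between the two counts shows how much the mapping collapses the address space.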

I apologize that I can't give more specific details about the actual code I'm working with, but I'm not really allowed to talk about it because the research is going into an unpublished paper.

Tuesday, 2 September 2014

Good morning world

I'm splitting my time between the lasauil project that Neil, Ryan, and Evan are working on and an internship with UT. For my internship I need to SSH into the UTCS computers, as well as into a TACC (Texas Advanced Computing Center) allocation. Unfortunately, AISD blocks all SSH connections, so I've been trying to find ways around the block with the help of Sam Grayson; I'm also contacting AISD directly about an exception. The work I am doing is related to cache architecture, specifically caching algorithms.