Sunday, 14 December 2014

Neat Code

Code written in JavaScript, CSS, HTML, etc. is all very free-form, which means that taking care to organize code neatly is particularly difficult. As I work on a project, I often find that there are many ways to write a certain part of my code, but I'm not sure which one is the nicest, most readable way to write it. Given that the different options are nearly identical in performance, I don't have an easy tie-breaker between the styles. Shorter code is not always easier to read, but longer code is not necessarily clearer either. If a single CSS file is growing to thousands of lines, it seems like I should refactor some of it, but isn't it uglier to have the code split across several CSS files than to have it all in one easily searchable document? My solution has typically been to read others' code and model mine on theirs. For that reason, I spend a fair amount of time browsing websites like codepen.io and various web design blogs to catch a glimpse of other people's programming paradigms. But who is to say that their code is better? So I've also read various books on general code organization, e.g. Code Complete and The Pragmatic Programmer, two books that I plan on revisiting over the break.

Sunday, 7 December 2014

Material Design and color selection

I decided to take a break from my internship and instead read up on web design. Google has a pretty exhaustive article on their own design philosophy, dubbed material design. The guidelines are extremely specific (they even specify that there should be no spaces around the en dash when writing a range of times). The entire design philosophy centers on imitating tangible objects (mostly paper) in your websites, to achieve an intuitive and elegant aesthetic. Beyond that, the design rules are pretty typical: attractive colors, strategic placement of elements, and intuitive animations.

Thursday, 27 November 2014

Temporal grouping in Streams

I have been working on a modified version of the Irregular Stream Buffer that considers the instruction time between consecutive memory accesses in its stream definition: if, for example, more than twice the average gap passes between two accesses from the same program counter, a new stream is created and the old one is discarded. Theoretically, this will help because the buffer will have a steady stream of new information. However, if the algorithm clears out the buffer too often, then the buffer will constantly be re-calibrating and will become inaccurate.
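
Here's a minimal sketch of the idea. The names and the exact bookkeeping are my own, not the real ISB code, and the 2x cutoff is just the rule of thumb described above:

#include <cstdint>
#include <unordered_map>

// Sketch only: tracks, per PC, the gap (e.g. in instruction count) between
// consecutive memory accesses and decides when to cut the stream.
struct StreamTracker {
    struct PCState {
        uint64_t lastAccessTime = 0;  // time of this PC's previous access
        double   avgGap = 0.0;        // running average gap for this PC
        uint64_t accesses = 0;
    };

    std::unordered_map<uint64_t, PCState> perPC;

    // Returns true if the gap since this PC's last access exceeds twice its
    // running average, i.e. the old stream should be discarded and a new one
    // started.
    bool onAccess(uint64_t pc, uint64_t time) {
        PCState &s = perPC[pc];
        bool newStream = false;
        if (s.accesses > 0) {
            uint64_t gap = time - s.lastAccessTime;
            if (s.accesses > 1 && gap > 2.0 * s.avgGap)
                newStream = true;                      // temporal break: re-group
            s.avgGap += (gap - s.avgGap) / s.accesses; // update running average
        }
        s.lastAccessTime = time;
        s.accesses++;
        return newStream;
    }
};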

Sunday, 19 October 2014

Frailty, thy name is visually intuitive representations of large quantities of arbitrary data

For my internship I have graphed the number of memory addresses that have been accessed by each individual PC (Program Counter), in order to analyze how accesses are distributed across PCs. Simply recording the data yields a fairly opaque bank of data: literally 2 long strings of unreadable numbers. To represent this data in an intuitive manner, I am using 3 different approaches (a rough sketch of the data crunching for all three follows the list):
  1. Sort the PCs by their access frequency (i.e. the total number of accesses) and graph the expansion of the PCs so that the expansion is on the Y axis and the frequency is on the X axis.
  2. Create a histogram expressing the expansion v. the number of PCs.
  3. Create a cumulative distribution function (CDF) graphing the PCs sorted by expansion v. the expansion. (This graph only ever increases, so what matters is the rate of increase.)
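
As promised, here is a rough sketch of that data crunching. The PCStat struct and the function names are my own illustrative stand-ins, not the actual analysis code:

#include <algorithm>
#include <cstdint>
#include <map>
#include <utility>
#include <vector>

// One entry per PC: its access frequency and its expansion.
struct PCStat {
    uint64_t pc;
    uint64_t frequency;  // total number of accesses by this PC
    double   expansion;
};

// (1) Sort PCs by frequency so expansion (y) can be plotted against frequency (x).
void sortByFrequency(std::vector<PCStat> &stats) {
    std::sort(stats.begin(), stats.end(),
              [](const PCStat &a, const PCStat &b) { return a.frequency < b.frequency; });
}

// (2) Histogram: bucket the expansion values and count how many PCs land in each bucket.
std::map<int, int> expansionHistogram(const std::vector<PCStat> &stats, double bucketWidth) {
    std::map<int, int> histogram;
    for (const PCStat &s : stats)
        histogram[static_cast<int>(s.expansion / bucketWidth)]++;
    return histogram;
}

// (3) CDF-style plot: x = rank of the PC when sorted by expansion, y = expansion.
//     The curve only ever increases, so the interesting part is how fast it rises.
std::vector<std::pair<size_t, double>> expansionCDF(std::vector<PCStat> stats) {
    std::sort(stats.begin(), stats.end(),
              [](const PCStat &a, const PCStat &b) { return a.expansion < b.expansion; });
    std::vector<std::pair<size_t, double>> points;
    for (size_t i = 0; i < stats.size(); i++)
        points.push_back({i, stats[i].expansion});
    return points;
}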

Monday, 13 October 2014

Fuzzy fuzzy access streams

In memory access streams there is a concept called "contraction", which refers to the ratio of the number of physical addresses to the number of structural addresses in an Irregular Stream Buffer. (You can read more about the Irregular Stream Buffer here.) A contraction of 1, i.e. having the same number of physical addresses and structural addresses, is best for caching because it means that accuracy will improve and the same thing will never be cached twice.

Contraction has a close cousin, expansion. Expansion refers to the situation in which there are more structural addresses than physical addresses, so the Stream Buffer ends up having a larger than necessary memory footprint. Contraction refers to the opposite situation: there are more physical addresses than structural addresses, which leads to lost information.

Say we have the access stream:
ABCDEF

Now, say we have the following access stream after it:
XBDCEF
This will cause expansion, because the second stream will be recognized as a separate stream.

Alternatively, consider this access stream instead:
AVWXYZ
This will cause contraction, because the second stream will be recognized as a modification of the first access stream (assuming that the stream recognition system is lenient enough).
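
As a toy illustration of how you might measure this, here is a sketch that takes a simplified physical-to-structural mapping and computes the ratio from the definition above. The mapping type and function name are my own; the real ISB bookkeeping is considerably more involved:

#include <cstdint>
#include <set>
#include <unordered_map>
#include <vector>

// A simplified view of the buffer's state: each physical address may have been
// assigned one or more structural addresses over time.
using Mapping = std::unordered_map<uint64_t, std::vector<uint64_t>>;  // physical -> structural

double contractionRatio(const Mapping &m) {
    std::set<uint64_t> structural;
    size_t physical = m.size();
    for (const auto &entry : m)
        for (uint64_t s : entry.second)
            structural.insert(s);
    // Ratio of physical addresses to structural addresses:
    //   > 1 -> contraction (structural addresses are shared; information is lost)
    //   < 1 -> expansion   (extra structural addresses; larger memory footprint)
    //   = 1 -> the ideal one-to-one mapping
    return double(physical) / double(structural.size());
}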

Sunday, 28 September 2014

Give that man some soup

For many memory prefetching algorithms, it is necessary to predict the re-reference interval of a memory access, so that the data can be moved into the cache before it is requested, vastly decreasing memory access time.

Consider the following address stream:
ACDACBBD
In this example, the re-reference interval of D would typically be 4, because there are 4 memory accesses between consecutive accesses of D (I'll get to the reason I say "typically" in a second). This means that, given the following access stream
ACDACBBDACAC
it would be natural to prefetch D, because, based on the idea that D recurs after every 4 other accesses, it would likely be the next address. However, in practice, it has been shown that the next few accesses are more likely to be something like BBD. Why is that? Well, we're not quite sure, but studies have shown that prefetching based on the unique re-reference interval, i.e. the number of unique memory addresses between consecutive accesses of the same address, produces significantly better results. Under this model, the unique re-reference interval in the first example would be 3, since there are 3 unique memory addresses in between: A, B, and C. The problem is that keeping track of the data required to find the unique re-reference interval is extremely expensive to do naively, so, for the past 2 weeks, I have been researching methods to do it efficiently.
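
To make the two definitions concrete, here is a naive sketch that computes both intervals for a small stream like the examples above. The helper names are mine, and this brute-force scan is exactly the kind of thing that is too expensive to do online:

#include <string>
#include <unordered_set>

// Plain re-reference interval: number of accesses between two consecutive
// accesses of `target` (for "ACDACBBD" and 'D': A C B B -> 4).
int rereferenceInterval(const std::string &stream, char target) {
    int last = -1;
    for (int i = 0; i < (int)stream.size(); i++) {
        if (stream[i] != target) continue;
        if (last != -1) return i - last - 1;
        last = i;
    }
    return -1;  // target seen fewer than twice
}

// Unique re-reference interval: number of *distinct* addresses between two
// consecutive accesses of `target` (for "ACDACBBD" and 'D': {A, C, B} -> 3).
int uniqueRereferenceInterval(const std::string &stream, char target) {
    int last = -1;
    for (int i = 0; i < (int)stream.size(); i++) {
        if (stream[i] != target) continue;
        if (last != -1) {
            std::unordered_set<char> unique(stream.begin() + last + 1, stream.begin() + i);
            return (int)unique.size();
        }
        last = i;
    }
    return -1;
}

// Example: rereferenceInterval("ACDACBBD", 'D') == 4
//          uniqueRereferenceInterval("ACDACBBD", 'D') == 3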

Sunday, 14 September 2014

Some background information

I was told by my PI to graph the number of physical memory addresses that pass through our simulator, and compare these to the structural addresses that they are mapped to. Many of the simulations didn't complete because the files' simpoints needed to be recompiled, so we recompiled them. To do all of this, we are using a piece of software called Intel Pin, a dynamic binary instrumentation tool that lets you run C++ analysis code against a program with a great deal of control over how the program executes. Simpoints are representative slices of a benchmark's execution; simulating just those slices approximates running the whole program, so, for example, if we compile 3 simpoints per benchmark, we get 3 sets of output per benchmark, each set being the result of ~25,000,000 instructions. We are using SPECfp as a benchmark suite, a standard collection of floating-point workloads built from real scientific codes such as Einstein-equation solvers and quantum mechanical simulations.
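
For context, a pintool is just a C++ file compiled against the Pin API. A stripped-down memory-address tracer, modeled on the pinatrace example that ships with Pin (and definitely not our actual research code), looks roughly like this:

#include <cstdio>
#include "pin.H"

// Records the instruction pointer (the PC) and the effective address of every
// memory operand executed by the instrumented program.
static FILE *trace;

static VOID RecordAccess(VOID *ip, VOID *addr) {
    fprintf(trace, "%p %p\n", ip, addr);
}

// Called the first time each instruction is encountered; inserts an analysis
// call before every memory operand.
static VOID Instruction(INS ins, VOID *v) {
    UINT32 memOperands = INS_MemoryOperandCount(ins);
    for (UINT32 memOp = 0; memOp < memOperands; memOp++) {
        INS_InsertPredicatedCall(ins, IPOINT_BEFORE, (AFUNPTR)RecordAccess,
                                 IARG_INST_PTR, IARG_MEMORYOP_EA, memOp, IARG_END);
    }
}

static VOID Fini(INT32 code, VOID *v) { fclose(trace); }

int main(int argc, char *argv[]) {
    if (PIN_Init(argc, argv)) return 1;
    trace = fopen("addresses.out", "w");
    INS_AddInstrumentFunction(Instruction, 0);
    PIN_AddFiniFunction(Fini, 0);
    PIN_StartProgram();  // never returns
    return 0;
}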

I apologize that I can't give more accurate details as to the actual code that I'm working with, but I'm not really allowed to talk about it because the research is going towards an unreleased paper.

Tuesday, 2 September 2014

Good morning world

I'm splitting my time between the lasauil project that Neil, Ryan, and Evan are working on, and an internship with UT. For my internship I need to ssh into the UTCS computers, as well as a TACC (Texas Advanced Computing Center) allocation. Unfortunately, AISD blocks all SSH requests, so I've been trying to find ways around the block with the help of Sam Grayson; I'm also contacting AISD directly about an exception. The work I am doing is related to cache architecture, specifically caching algorithms.

Friday, 16 May 2014

Graphing

Most libraries built for graphing data on web pages are large and clunky, so I opted to write my own client code to generate graphs of user data for the UIL website. The data comes in as two groups of time stamps and question ids: one for questions answered correctly and one for questions answered incorrectly. First, the time stamps are normalized to lie between 0 and 1, inclusive, so that 0 is your first question answered and 1 is your last. The data is then sorted, iterated through, and graphed, keeping a running ratio of questions answered correctly to questions answered incorrectly, which yields a graph of your general performance over time.

This style of data crunching is briefly quite computationally intensive, so if users are opening a lot of pages it could chew up some bandwidth and processing time. Additionally, I'm not sure how well this will work if users have a very long history, but there won't be more data points than there are questions in the whole database, so that should be fairly easy for us to control.
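
The crunching itself is simple. Here's a sketch of the logic, written in C++ for concreteness with my own names (the real version is client-side JavaScript), and using the running fraction of correct answers rather than the correct-to-incorrect ratio so it never divides by zero:

#include <algorithm>
#include <utility>
#include <vector>

// One event per answered question: when it was answered and whether it was correct.
struct Answer {
    double timestamp;
    bool   correct;
};

// Produces (normalized time, fraction answered correctly so far) points, with
// time scaled so 0 is the first answer and 1 is the last.
std::vector<std::pair<double, double>> performanceOverTime(std::vector<Answer> answers) {
    std::vector<std::pair<double, double>> points;
    if (answers.empty()) return points;

    std::sort(answers.begin(), answers.end(),
              [](const Answer &a, const Answer &b) { return a.timestamp < b.timestamp; });

    double first = answers.front().timestamp;
    double last  = answers.back().timestamp;
    double span  = (last > first) ? (last - first) : 1.0;  // avoid divide-by-zero

    int correct = 0;
    for (size_t i = 0; i < answers.size(); i++) {
        if (answers[i].correct) correct++;
        double normalizedTime = (answers[i].timestamp - first) / span;
        points.push_back({normalizedTime, double(correct) / double(i + 1)});
    }
    return points;
}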

Tuesday, 29 April 2014

Let's try this again

Code is very dry and, for the most part, uninteresting to read in large quantities on a blog. Most blog posts consisting of a lot of code just end up being blindly copied and pasted by someone scouring the web for the specific CSS they need. How do I know this? I do it all the time. Yet, when I was looking for a guide to making columns in CSS, I found the amazing blog A List Apart, whose quirky sketches and almost philosophical writing immediately caught my eye.

A List Apart presents practical, thematic CSS guides, giving the user a general set of rules and inspiration to use in their web pages, rather than a simple index of terms. Their guide to columns and page layouts describes how the proportions of the divs on a page define the page's tone and give it character.

It's also clear that they understand what is key in a website, because their website is easily maneuverable and manages to avoid being overwhelming, despite consisting mostly of giant masses of text.

This fluid and satisfying design philosophy is something I hope to bring into not only my projects, but also my code. As our UIL database project becomes larger and larger, it is crucial to comment and manage the code effectively, avoiding things such as 1000-line route files and enormous blocks of CSS. But this is not as easy as splitting files into more JavaScript files with mildly descriptive filenames, for that only pushes the problem up to the app.js file, forcing us to juggle many more route files. So what is the solution? I have no idea, really. For now I'll consult The Pragmatic Programmer and maybe Code Complete.

Monday, 28 April 2014

manifesto.json

For years we have been limited to two dimensions. We have been squashed flat by our languages and have been unable to flourish in the third dimension as we have so long desired- but no more! We shall thrive and reproduce, gaily prancing around in multiple planes. The Z-Axis gives us the ability, nay, the god-given right to express our web pages in a multitude of additional ways. For how long have we seen boxy, single-dimension pages? No longer, for we have been given a gift of vertices and CSS3. Z indices and transforms will revolutionize the way we experience the internet, and will gift unto us a new era of HTML.

Web workers of the world, unite. You have nothing to lose but your chains!