Jeff Milling's Personal Blog: February 2013

Wednesday, February 27, 2013

More Shell Scripting!

A quick program I wrote for testing runtimes.
Actually, I was testing the testing of runtimes.
Gotta love testing!

So I wrote version one of my testing script on Monday and ran it on ASU's server. It did all right, but there were some errors, and I was not certain that the runtime data I was getting could be trusted. This is because I was using the GNU date command to fetch the current Unix time in milliseconds, counted from the Unix epoch (midnight UTC on January 1, 1970). As you can probably tell, that is a very large number. So large, in fact, that it would sometimes overflow the variable's range, and I was getting ridiculous negative numbers. So, back to the drawing board. Today I built a small program in C++, as shown to the right, that calculates the first million Fibonacci numbers, which takes only about 0.117 milliseconds. I can use this program as a target for my runtime testing script while I'm debugging it.
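Here is a minimal Java sketch of that failure mode (the post's actual script was bash and its target program C++, neither shown here). It assumes the timestamp ended up in a 32-bit signed integer at some point; the post doesn't say exactly where. A 64-bit long holds the current time in milliseconds with room to spare, while a 32-bit int silently wraps around:

```java
public class EpochOverflow {
    // Narrowing a long to an int keeps only the low 32 bits, so a large
    // millisecond timestamp wraps around (sometimes to a negative number).
    static int truncateToInt(long millis) {
        return (int) millis;
    }

    // True if the value fits in a 32-bit signed int (max 2,147,483,647).
    static boolean fitsInInt(long value) {
        return value >= Integer.MIN_VALUE && value <= Integer.MAX_VALUE;
    }

    public static void main(String[] args) {
        long feb27 = 1_361_923_200_000L; // 2013-02-27T00:00Z, in ms since the epoch
        System.out.println(fitsInInt(feb27));     // false
        System.out.println(truncateToInt(feb27)); // 418567168 (a meaningless value)

        long now = System.currentTimeMillis();    // a 64-bit long holds this easily
        System.out.println(now + " -> " + truncateToInt(now));
    }
}
```

The fix in any language is the same: keep timestamps in a 64-bit type and subtract two readings to get an elapsed time.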

So that's what I've been doing: testing and writing and testing and writing. I'm learning a lot about shell scripting, about C++, and about Computer Science in general. I've been using an amazing tool I found, called TextMate. It is an open source text editor specifically designed for programmers. It can compile and run just about any programming language you throw at it, all inside the app. It's very impressive. I actually just talked to the main developer, Allan Odgaard, yesterday on IRC and told him so. I would highly recommend it to any of my fellow programmers running OS X.

Thanks for reading!

- Jeff

Monday, February 25, 2013

I am compiling compiled compilers.

On Friday, Dr. Bazzi offered me the opportunity to run some tests on his students' programs from the compiler project, so I have begun work on that. Basically, I need to test the programs to make sure they are functioning correctly, then time their execution and compile all of that data into a spreadsheet for further use. It's quite interesting how the students' compilers are tested. I am given a list of programs written in a simplified language that the students' compilers should be able to handle, along with the expected output for each one. So I need to compile all the programs, run them on every test file, and make sure that their output matches the expected output. Sounds kind of tedious, right? Wrong!

Instead of doing all that work manually, which would take hours and probably days, I'm doing it with a bash script. A bash script is a file of commands for the Bash shell, the same commands you could type one at a time at a computer's command prompt, saved so that they run automatically. Bash scripts are incredibly useful for automating tasks.

My friend and fellow researcher, Mohsen, showed me a script that he wrote in order to grade his students on their project. He ran the script and a few minutes later it had generated a huge matrix of 0s and 1s, which he could directly import into a spreadsheet for grading. I found it incredibly fitting that Computer Science students were graded in 0s and 1s.
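Mohsen's script isn't shown here, and it was written in bash, but its core idea (compare each program's output with the expected output, and record 1 for a match and 0 for a mismatch) can be sketched in Java. The class and method names, the sample outputs, and the choice to ignore trailing whitespace are all assumptions for illustration:

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Map;

public class GradeMatrix {
    // 1 if the actual output matches the expected output, else 0.
    // Ignoring trailing whitespace is an assumption, not a known grading rule.
    static int score(String actual, String expected) {
        return actual.stripTrailing().equals(expected.stripTrailing()) ? 1 : 0;
    }

    // One row per student, one column per test case.
    static Map<String, List<Integer>> build(Map<String, List<String>> actualByStudent,
                                            List<String> expected) {
        Map<String, List<Integer>> matrix = new LinkedHashMap<>();
        for (Map.Entry<String, List<String>> e : actualByStudent.entrySet()) {
            List<String> outputs = e.getValue();
            List<Integer> row = new ArrayList<>();
            for (int t = 0; t < expected.size(); t++) {
                // A missing output (say, the program crashed) counts as a 0.
                row.add(t < outputs.size() ? score(outputs.get(t), expected.get(t)) : 0);
            }
            matrix.put(e.getKey(), row);
        }
        return matrix;
    }

    // Comma-separated rows, ready to import into a spreadsheet.
    static String toCsv(Map<String, List<Integer>> matrix) {
        StringBuilder sb = new StringBuilder();
        for (Map.Entry<String, List<Integer>> e : matrix.entrySet()) {
            sb.append(e.getKey());
            for (int v : e.getValue()) sb.append(',').append(v);
            sb.append('\n');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        List<String> expected = List.of("3\n", "10\n", "42\n");
        Map<String, List<String>> actual = new LinkedHashMap<>();
        actual.put("student_a", List.of("3\n", "10\n", "42\n"));
        actual.put("student_b", List.of("3", "11\n"));
        System.out.print(toCsv(build(actual, expected)));
        // student_a,1,1,1
        // student_b,1,0,0
    }
}
```

In a real run, the outputs would come from actually executing each compiled program on each test file; here they are hard-coded so the sketch runs on its own.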

So I'm writing a bash script to test the programs, and I'm also writing one to time the programs, so that I can collect data on their typical run times. Although these programs are not designed to be fast, I feel that I have a large enough sample to record a typical run time for both Java and C++.
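The timing half can be sketched in Java as well, assuming the work to be measured can be wrapped in a Runnable (timing a compiled student program would instead mean launching it as a separate process, for example with ProcessBuilder). The names are hypothetical. The sketch repeats each measurement and reports the median, so that one unusually slow run does not skew the "typical" time:

```java
import java.util.Arrays;

public class RunTimer {
    // Written by the sample workload so its loop is not optimized away.
    static volatile long sink;

    // Runs the task `runs` times; returns each run's duration in milliseconds.
    static double[] timeRuns(Runnable task, int runs) {
        double[] millis = new double[runs];
        for (int i = 0; i < runs; i++) {
            long start = System.nanoTime(); // nanoTime is intended for measuring intervals
            task.run();
            millis[i] = (System.nanoTime() - start) / 1_000_000.0;
        }
        return millis;
    }

    // Median of the values; requires at least one value.
    static double median(double[] values) {
        if (values.length == 0) throw new IllegalArgumentException("no values");
        double[] sorted = values.clone();
        Arrays.sort(sorted);
        int mid = sorted.length / 2;
        return sorted.length % 2 == 1
                ? sorted[mid]
                : (sorted[mid - 1] + sorted[mid]) / 2.0;
    }

    public static void main(String[] args) {
        Runnable workload = () -> {
            long sum = 0;
            for (int i = 0; i < 1_000_000; i++) sum += i;
            sink = sum;
        };
        double[] times = timeRuns(workload, 11);
        System.out.printf("median of %d runs: %.3f ms%n", times.length, median(times));
    }
}
```

The first few runs of a Java program are often slower while the JIT compiler warms up, which is another reason to look at the median of many runs rather than a single measurement.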

My goal for this project is to generate plentiful data that I can use for graphs in my presentation. To do so, I must gather the data from all of the students' programs into one spreadsheet, which leads me to the title: I am compiling compiled compilers. Thanks for reading!

-Jeff

Friday, February 22, 2013

I am compiling a compiler.

I started my first large-scale programming project today. The problem I am solving was created by Dr. Bazzi for his Computer Science class 340. It is essentially a simplified version of a compiler. If you don't know, a compiler is a program that turns source code, written by a programmer, into something that the computer can execute. This program has to take a file of source code as input and output a file of compiled code. An additional task that Dr. Bazzi has given his students is that the program must also be able to execute that compiled code.

The only difference between the assigned compiler and a real compiler is that the programming languages are simplified. The input file is written in a very bare-bones language, and the compiled code does not have to be machine code; instead, it is an intermediate code that the program can execute later.
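To make "compile, then execute" concrete, here is a minimal Java sketch. The tiny language (assignments like x = 5, additions like y = x + 3, and print statements) and the instruction names SET, ADD, and PRINT are hypothetical illustrations, not the language or intermediate code from Dr. Bazzi's assignment. The compile step turns source lines into a list of instructions, and a separate execute step runs them:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class TinyCompiler {
    // Intermediate code: a flat list of simple instructions.
    // SET: dest = a;  ADD: dest = a + b;  PRINT: output a.
    record Instr(String op, String dest, String a, String b) {}

    // Compile source lines of the forms "x = 5", "y = x + 3", "print y".
    static List<Instr> compile(List<String> source) {
        List<Instr> code = new ArrayList<>();
        for (String line : source) {
            String[] t = line.trim().split("\\s+");
            if (t.length == 2 && t[0].equals("print")) {
                code.add(new Instr("PRINT", null, t[1], null));
            } else if (t.length == 3 && t[1].equals("=")) {
                code.add(new Instr("SET", t[0], t[2], null));
            } else if (t.length == 5 && t[1].equals("=") && t[3].equals("+")) {
                code.add(new Instr("ADD", t[0], t[2], t[4]));
            } else {
                throw new IllegalArgumentException("cannot compile: " + line);
            }
        }
        return code;
    }

    // An operand is either an integer literal or a variable name.
    static int value(String operand, Map<String, Integer> vars) {
        if (operand.matches("-?\\d+")) return Integer.parseInt(operand);
        Integer v = vars.get(operand);
        if (v == null) throw new IllegalStateException("undefined variable: " + operand);
        return v;
    }

    // Execute the compiled instructions, returning everything printed.
    static List<Integer> execute(List<Instr> code) {
        Map<String, Integer> vars = new HashMap<>();
        List<Integer> output = new ArrayList<>();
        for (Instr in : code) {
            switch (in.op()) {
                case "SET" -> vars.put(in.dest(), value(in.a(), vars));
                case "ADD" -> vars.put(in.dest(), value(in.a(), vars) + value(in.b(), vars));
                case "PRINT" -> output.add(value(in.a(), vars));
                default -> throw new IllegalStateException("unknown op: " + in.op());
            }
        }
        return output;
    }

    public static void main(String[] args) {
        List<Instr> code = compile(List.of("x = 5", "y = x + 3", "print y"));
        System.out.println(code);          // the intermediate code
        System.out.println(execute(code)); // [8]
    }
}
```

Keeping the two steps separate is what allows the compiled instructions to be stored and run at a later time. The sketch uses records and arrow-style switch cases, so it needs Java 16 or newer.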

So in conclusion I am writing a compiler that compiles a simplified programming language into another language, and then executes it. The compiler that I am writing, as with all programs, must be compiled, which leads me to the statement: I am compiling a compiler.

Thanks for reading,
- Jeff

Friday, February 15, 2013

The Search for Open Source Code

So I have begun a hunt for code that I can use for testing and analytics, and the code I use must be available to read, not only to run. While I will be doing a number of run-time analytics, like performance and memory consumption, I will also be doing source code analytics: measurements like total lines of code or number of classes, which would not be possible without access to the source code, the readable, human-written form of a program. But source code is not always easy to find.
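As a taste of what source code analytics involves, here is a rough Java sketch that counts non-blank lines and class declarations. It is a deliberately naive heuristic with hypothetical names: comment lines count as lines, and the word "class" inside a comment or string would be miscounted. A serious measurement tool would parse the code instead:

```java
import java.io.IOException;
import java.nio.file.Files;
import java.nio.file.Path;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class SourceStats {
    // Rough heuristic: the word "class" followed by whitespace and a name.
    private static final Pattern CLASS_DECL = Pattern.compile("\\bclass\\s+[A-Za-z_]\\w*");

    // Counts lines that contain something other than whitespace.
    static long countNonBlankLines(String source) {
        return source.lines().filter(line -> !line.isBlank()).count();
    }

    // Counts matches of the class-declaration pattern.
    static int countClasses(String source) {
        Matcher m = CLASS_DECL.matcher(source);
        int count = 0;
        while (m.find()) count++;
        return count;
    }

    public static void main(String[] args) throws IOException {
        // Analyze the file named on the command line, or a small built-in sample.
        String source = args.length > 0
                ? Files.readString(Path.of(args[0]))
                : "class Dog {\n\n    int age;\n}\nclass Collar {}\n";
        System.out.println("non-blank lines: " + countNonBlankLines(source)); // 4 for the sample
        System.out.println("classes: " + countClasses(source));              // 2 for the sample
    }
}
```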

Since source code can be compiled and run on any compatible computer, publishing source code is pretty much giving your program away for free. Nowadays, almost all of the most popular software companies charge money for their software, so those companies do not publish their source code. This is a shame, because I like open source software, and free software means a lot more than just a free price. Free software gives users complete control over their computers by letting them modify the source code. For example, say a text editor on your computer cannot open a specific file type that you use without reformatting the pages and ruining all your beautiful bullet points. Well, if that program is open source, then you can take control and fix it yourself, or, more likely, search the internet for a fix someone has already written. If the fix has not been written, and you want to spare future users who run into the same problem, then you can write the code and publish it yourself. It's a beautiful system full of progress and devoid of capitalism, but let's not get into that.

The main reason I like open source is that it allows me to read through the code and actually understand how a program is written. It's like taking apart a new remote control car to see how it works. It's incredibly fun and interesting, and it promotes learning. So over the next few days I will be doing just that: reading through other people's code and searching for the perfect example of Java or C++ to use for testing.

Thanks for reading!

-Jeff

Thursday, February 14, 2013

Today I Learned: Generic Programming

I read a very interesting article today regarding the differences between Java and C++ from the viewpoint of something called "Generic Programming," and I would like to share the basics of that with you. I am aware that I may lose some of you along the way; however, I will do all I can to keep it clear and concise.

Generic programming is a powerful tool for applications that store a lot of data. For reference, the paper I read is called A Comparative Evaluation of Generic Programming in Java and C++, by Hossein Saiedian and Steve Hill. It goes into much further detail on the specifics and does a better job, I'm sure, of describing the differences, but it is very technical. I am going to attempt to explain, in a simple and easy-to-follow way, how this paper shows that C++'s implementation of generic programming is more efficient than Java's.

To begin, what you need to understand is this: when a computer program works with data, it uses either primitive data types or objects. These tell the computer what kind of data it is operating on. Primitive data types are rather simple, like an integer (int) or a character (char). Objects, on the other hand, are built from classes, which are data types defined by the programmer, opening up many more possibilities. For example, if I wanted to give my dog a shiny new blue collar, I could write something like...

Dog Chance = new Dog();
Collar blueCollar = new Collar(Color.BLUE);
Chance.setCollar(blueCollar);

Here we see the classes 'Dog', 'Collar', and 'Color' used. Now, don't worry if you don't know how to program, because that's all you need to understand about that. But if you do know Java programming, and are preparing to complain in the comments that I failed to import java.awt.Color, all I have to say is that I was being "clear and concise!"
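For readers who want to actually run the snippet, here is a self-contained Java version. The Dog and Collar classes are hypothetical minimal stand-ins, and to avoid pulling in the GUI toolkit, a small Color enum takes the place of java.awt.Color (with the real class, you would add import java.awt.Color; instead). The variable is written chance, following the Java convention of lowercase variable names:

```java
public class DogDemo {
    // Stand-in for java.awt.Color so the sketch has no GUI dependency.
    enum Color { RED, BLUE }

    // Hypothetical minimal Collar: it just remembers its color.
    static class Collar {
        private final Color color;
        Collar(Color color) { this.color = color; }
        Color getColor() { return color; }
    }

    // Hypothetical minimal Dog: it can wear one collar.
    static class Dog {
        private Collar collar;
        void setCollar(Collar collar) { this.collar = collar; }
        Collar getCollar() { return collar; }
    }

    public static void main(String[] args) {
        Dog chance = new Dog();
        Collar blueCollar = new Collar(Color.BLUE);
        chance.setCollar(blueCollar);
        System.out.println(chance.getCollar().getColor()); // BLUE
    }
}
```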

These classes, like "Dog," can be part of a larger family of classes, like "Pet" or maybe "Animal," by inheriting from them. In Java, every class is ultimately derived from one class, "Object." In contrast, C++ does not have one giant class to which all other classes belong, so two classes in C++ are not necessarily derived from the same class. This is an important distinction.
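A minimal Java sketch of that hierarchy, using hypothetical Animal, Pet, and Dog classes. Even though Animal doesn't name a superclass, walking up the chain with getSuperclass() ends at Object, because Java makes Object the implicit root of every class. C++ has no such implicit root:

```java
public class Hierarchy {
    static class Animal {}             // no "extends", yet its superclass is Object
    static class Pet extends Animal {}
    static class Dog extends Pet {}

    // Walks up the superclass chain from a class to the root.
    static String chain(Class<?> c) {
        StringBuilder sb = new StringBuilder(c.getSimpleName());
        for (Class<?> s = c.getSuperclass(); s != null; s = s.getSuperclass()) {
            sb.append(" -> ").append(s.getSimpleName());
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        System.out.println(chain(Dog.class)); // Dog -> Pet -> Animal -> Object
        Object anything = new Dog();          // any Java object can be stored as an Object
        System.out.println(anything instanceof Pet); // true
    }
}
```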

Generic programming is a technique that lets code operate on data without knowing in advance what type of data it is operating on. The benefits of this include modularity and reusability of code. Let's take the example of storing data in a numbered list. In C++ this can be done rather simply using vectors, provided to programmers in the C++ Standard Library (namespace std). The programmer writes the element type in angle brackets, such as std::vector<int> or std::vector<Dog>, and the same vector code works for any of them. The std::vector template is a simple way of storing data using generic programming in C++.
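Since the code in this post is Java, here is the same idea sketched with Java's generics rather than C++'s std::vector. NumberedList is a hypothetical class written for illustration: it is written once with a type placeholder T and then reused for Strings and Integers alike. In C++ the equivalent would be a class template:

```java
import java.util.ArrayList;
import java.util.List;

public class GenericList {
    // A tiny generic numbered list: T stands for whatever element type the
    // programmer chooses, much like the element type in std::vector<T>.
    static class NumberedList<T> {
        private final List<T> items = new ArrayList<>();

        // Adds an item and returns its 1-based position in the list.
        int add(T item) {
            items.add(item);
            return items.size();
        }

        // Looks up an item by its 1-based position.
        T get(int number) {
            return items.get(number - 1);
        }
    }

    public static void main(String[] args) {
        NumberedList<String> words = new NumberedList<>();
        words.add("alpha");
        words.add("beta");
        NumberedList<Integer> numbers = new NumberedList<>();
        numbers.add(10);

        String second = words.get(2); // no cast needed: the compiler knows it's a String
        int first = numbers.get(1);
        System.out.println(second + " " + first); // beta 10
        // words.add(42);             // would not compile: 42 is not a String
    }
}
```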

In Java it is also possible to store data in numbered lists; arrays and ArrayList are two common ways. An ArrayList can be created without a type parameter (what Java calls a "raw type"), meaning you can create and add to it without specifying the data type. The difference from C++ is that when a programmer does not provide a data type to an ArrayList, Java uses the Object type. In fact, even when a type parameter is provided, Java erases it during compilation (a process called type erasure), so at runtime the list still just holds Objects. So while generic code in Java is written almost exactly the same way as in C++, Java's version is not truly generic.

Why would this matter, you ask? It matters because, according to the paper, C++'s version of generic programming is much more efficient. C++ templates were designed with generic programming in mind: the compiler generates specialized code for each type a template is used with. Java's version, by contrast, is more of a workaround. Because the elements are stored as Objects, values have to be cast back to their real type, and primitive values like int must be wrapped ("boxed") into objects such as Integer, all of which slows down generic programming in Java.
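The mechanics behind that overhead can be seen in a small Java sketch (the class and method names are made up for illustration). It shows a raw ArrayList that needs an explicit cast, int values being boxed into Integer objects, and type erasure: at runtime, an ArrayList<String> and an ArrayList<Integer> are the same class. It does not measure speed; the performance comparison itself comes from the paper:

```java
import java.util.ArrayList;
import java.util.List;

public class ErasureDemo {
    // A raw ArrayList (no type parameter) stores plain Object references.
    @SuppressWarnings({"rawtypes", "unchecked"})
    static int sumRaw() {
        List raw = new ArrayList();
        raw.add(1); // the int is boxed into an Integer object
        raw.add(2);
        int sum = 0;
        for (Object o : raw) {
            sum += (Integer) o; // cast back to Integer, then unbox to int
        }
        return sum;
    }

    // With a type parameter, the compiler checks types and inserts the
    // cast and unboxing itself, but the boxed Integers are still there.
    static int sumTyped() {
        List<Integer> typed = new ArrayList<>();
        typed.add(1);
        typed.add(2);
        int sum = 0;
        for (int v : typed) sum += v;
        return sum;
    }

    // Type erasure: both parameterized lists share one runtime class.
    static boolean sameRuntimeClass() {
        return new ArrayList<String>().getClass() == new ArrayList<Integer>().getClass();
    }

    public static void main(String[] args) {
        System.out.println(sumRaw());           // 3
        System.out.println(sumTyped());         // 3
        System.out.println(sameRuntimeClass()); // true
    }
}
```

A C++ std::vector<int>, by contrast, stores the int values directly, with no boxing and no casts.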

I found this article very interesting. Deep language analysis like the one seen here is something that I would love to include in my presentation. I hope you followed along for the most part, and I hope you enjoyed this article. Thanks for reading!

-Jeff

Saturday, February 9, 2013

Constructing a table.

No, I'm not sawing wood into four legs and a large plane to be used as a vehicle for eating or reading or writing. I have plenty of those, and don't need to make another one. I'm talking about a data table. A beautifully geometric compilation of information in a neat and organized form. Why? Well, let me explain.

So, as previously written, I have been reading. Reading and reading and reading. I have been reading a wide range of articles and postings about many different aspects of Java and C++. Most are scientific articles, usually containing a (preferably large) sample of tested scenarios and programs, covering different compilers and operating systems. In contrast, some are commercial articles suffering from a severe case of layman's terms, and others are forum posts containing a large number of differing (though unsupported) opinions. Clearly, some of what I read can be trusted and some cannot. However, all of it is valuable in some small way: whether a source offers a unique way of approaching the problem or a unique piece of evidence, something can be learned from almost all of my readings.

In order to make use of all the information I have encountered, I find it necessary to create a table of data from all those articles. This data is not the type of data you may think it is. I'm not just talking about numbers from tests and programs. I will also be noting other important factors: for example, what parameters a specific paper uses to judge the “performance” of a programming language, or which results the paper values most and why. Comparing the approaches these articles take to prove their points will help me decide what approach I can take to make the most comprehensive and factually sound comparison, because that's what this is, right? "A Comprehensive Comparison."

I have a lot of work ahead of me, but I'm dedicated and willing to put as much time into this project as necessary. I may be crazy, but I’ve been enjoying all of this technical reading.

As always, I will keep you updated on all the exciting things I am learning from this project, and the progress I am making. Thanks for stopping by, and stay tuned!

-Jeff

Wednesday, February 6, 2013

Read Read Read...

Hello!
It's reading week for me in the Computer Science Department. How fun! *rolls eyes*
Despite this, I must say that reading is something that everyone must do to prepare for a project, no matter how boring. I mean, what good is research if not built off of other research? And what is the purpose of that research if it is not shared publicly? I believe it is the duty of scientists and researchers to spread new knowledge so that it can be utilized everywhere in the world.

I know, I know. I may be a little idealistic because, obviously, research institutions need to make their money somehow, and my head is just stuck in the clouds; however, something at ASU suggests that I may not have my head so far up in the clouds after all. Let me elaborate: on Monday, when Dr. Bazzi was showing me around the Department, he told me something wonderful. When using ASU's wireless internet, everyone can access articles on Google Scholar that would otherwise be locked behind a price tag.

Wonderful, right?

I know it may sound simple, but as a researcher and a lover of knowledge, I feel that I am finally free. And with Google Scholar, reading is actually a little bit fun. A little bit.
I have access to a wealth of articles from very reputable sources, using brilliant methods to test and prove their hypotheses and analyze their data. I no longer have to sift through commercialized articles on online news sites; now I have great sources.
So there's an update: articles are being read and notes are being taken. I'm excited for the future of my research, and I hope you are too. Stay tuned!

~ Jeff