seth's soapbox

2009 06 06 - Version Control

Up until a few months ago I wasn't using a version control program. I was doing version control, but I was doing it in the grossest possible sense. I was copying my program directories and including a date and mini-changelog in the directory name.

I decided many months ago that distributed version control is superior to centralized. You get your own local full copy of the repository so you can commit early and often. Branching and merging is trivial so you can make a branch to fix a bug, or to experiment with something. Props to Tridge for triggering the DVCS revolution by getting the Linux project's license to Bitkeeper revoked.

For many months I'd been trying to get comfortable with Git and was having great difficulty. I couldn't find any documentation that was easily digestible by me (granted I'm new to version control).

I was in version control despair until last month when I discovered Mercurial. I was able to be fully confident with it in less than a week! There is a free online book for Mercurial that was a nice introduction for me. It explains concepts you need to know and basic Mercurial usage in a extremely clear and concise way. Mercurial also has a very nice built in help feature.

Here is a lecture given at google about Mercurial.

2009 01 03 - Build Systems

The first build system I'd used was GNU Make. Make is just fine for small projects but it doesn't have automatic dependency generation which is nice for larger projects. Dependency generation can be done with GCC (g++ -MM *.cpp), but it's slow enough to make it unsuitable for large projects. Fastdep is better but shouldn't dependency generation be built in to the build system?

Another problem with Make is with portability. If you need to compile software on different unix-like systems the library and include directories can be different. It is extremely difficult to write portable code to locate libraries in make so generally a Makefile generator like Automake or CMake is used. I don't care for Makefile generators much myself, they usually produce gross code. Just look at a generated makefile for a large project that uses automake and you'll see what I'm talking about.

One interesting thing about most build systems is that they define their own languages. Application specific languages can be elegant, but they can also be inflexible and often there is non-standard stuff people want to do when building a program.

For a few months I've been using SCons. It has automatic dependency generation. SCons is written in Python, and also the build files themselves are written in Python which gives great flexibility. SCons build files are platform independent, not just between different unix-like platforms but it also works on Windows with Visual C++.

SCons is a nice upgrade from Make. It's also easier to learn than Make. I wouldn't hesitate to recommend it to a beginner looking for a build system. Take a look at the code you'd need to build a simple program consisting of the files (main.cpp, A.cpp, A.hpp, B.cpp, B.hpp):

Program('out', ['main.cpp','A.cpp','B.cpp'])

That's it! It builds and links the program with output executible 'out' or 'out.exe' on Windows. Of course it gets more complicated on non-trivial projects that use third party libraries.

2009 01 02 - Nice Way to do Logging

C++ provides a few macros conventient for logging.

__FILE__
Replaced with the name of the file.

__FUNCTION__
Replaced with the name of the function.

__LINE__
Line number.

It is very convenient to know this information for debug messages but it's cumbersome to specify them in every debug message. I made a simple logging mechanism that automatically takes care of that. This is what it looks like:

LOGGER << "test" << 123;

And the output looks like this(<file>:<function>:<line> <message>):

logger.cpp:main:61 test123

This is accomplished with a macro, a function that creates a temporary object, and "chaining".

I've seen what ostreams do referred to as "chaining". The idea that each member function call returns a reference to the object on which the function is being called. Apart from ostreams I've seen this used for setting options. For example:

person Person;
Person.set_name("Joe").set_hair_color("brown").set_occupation("fire man");

It'd probably be best to set this information via ctor parameters but I suppose if you need to change this info some time after construction then "chaining" would be syntatically nice.

Anyways! Here's an example. The logger object is constructed at the start of the expression, and the logger message is printed when the object is destructed at the end of the expression. I was thinking about putting the logger information in a SQLite3 database so it could be queried by file or function.

Download: Logger

2008 12 30 - SQLite3 Wrapper

There is one piece of software that I think every programmer should know about. That is SQLite.

SQLite is a SQL database written in C (it has bindings for any programming language of consequence) that can be compiled directly in to a C/C++ program. It is fully featured, it supports journaling and does atomic operations using native file locking operations available from the operating system. It runs without spawning an additional process to arbitrate access to the database file.

SQLite is very fast for certain types of workloads but doesn't scale to very high concurrency. However I've used it with some concurrency and it seems to work well with the database timeout set > 0. The SQLite website states that an example of a concurent load beyond SQLite's capability would be a very high traffic website.

Anyways, I'm a C++ programmer and the SQLite API is in C which requires quite a bit of boiler plate to get things working. Results to queries are obtained by using a call back function. Needless to say you can't register a C++ member function as a call back directly. SQLite does give you a (void *) you can use to pass anything you want back to the call back but obviously that isn't type safe and it requires boiler plate within the call back to cast back to the desired type.

I wrote a wrapper which encapsulates all the boiler plate and presents a more C++ like interface to sqlite. There are many existing C++ wrappers out there but obviously I didn't like any of them since I wrote my own. :)

My wrapper lets you use a call back function or member function. It also adds functions to pass a parameter to the call back in a type safe way.

Required Call Back:

//regular call back
int call_back(int columns, char ** response, char ** column_name);

//call back with object of any type (templated)
int call_back(std::string str, int columns, char ** response, char ** column_name);

Wrapper Examples:

//no call back
DB.query("DROP TABLE test");

//with function call back
DB.query("SELECT * FROM table", call_back);
DB.query("SELECT * FROM table", &Test, &test::call_back);

//with function call back with object
test Test;
std::string str = "test!";
DB.query("SELECT * FROM table", call_back_with_object, &str);
DB.query("SELECT * FROM table", &Test, &test::call_back_with_object, &str);

The easiest way to test this out is: download the wrapper + test program, download the sqlite3 amalgamation, extract the amalgamation in to the wrapper directory, and run the included Makefile.

Download: SQLite3 Wrapper + Test Program, Wrapper Only, SQLite

2008 11 05 - Browsers Suck

This is evidenced by asking any web developer if he has to check his webpages in more than one browser. Implementing a browser is so complex that no one can do it properly so what happens is that whatever browser becomes popular becomes the de-facto standard.

I became convinced of this after watching Alan Kay's 1997 OOPSLA presentation located here (FFWD to 0:20).

The fundamental problem is that the complexity is in the browser and not in the web page. The browser should basically be a simple VM that offers a very minimal number of instructions to run "web page programs". Such a browser would be so simple to implement that anybody could write one.

A simple VM for web page programs wouldn't be limiting any more than your CPU would be limiting. Your CPU has very few and very simple instructions but we've built levels and levels of abstraction on top of it to simplify programming for it. Exactly the same thing could be done with the web page VM.

I don't think bandwidth would be a problem even though more data would have to be sent. It's likely that there'd be a few popular abstractions that most people would use. These abstractions could all be identified with a hash the web site could send so that you'd only have to download the abstraction once (or whenever it got updated). For example if I went to google.com and they sent their web page building abstraction, and then I went to amazon.com and they were using the same abstraction I'd know it and I'd be able use the abstraction google sent me instead of having to download it again.

*sigh*

It's not always about what is best and it's difficult to replace entrenched technologies even if they are known to be fundamentally flawed. I just want to make sure people know how bad web browsers are.

2008 10 27 - CGSI (Conway's Game Seth's Implementation)

So I hate virtually all programming assignments in school. I hate creating programs that thousands of CS students have programmed before. I have plenty of projects on my plate that are original that I learn a great deal from. I figure anything worth programming hasn't been programmed before. I'm not going to reinvent the wheel if someone has already made a perfectly fine libwheel.so. I guess there could be exceptions. For example I would want to make a wheel if I thought I could make a superior one.

I don't mind assignments where I am given some freedom. My first one of these assignments was in programming 1 at Modesto Junior College with Mr. Ray. He wanted us to program a poker game in Java that would detect a pair. I ended up going crazy with it and made it detect all hands, and I made it two player. It was my first large project ever and it was a total mess. It had a 1000 line long main function and all my arrays started at 1. At the time I wasn't aware arrays started at 0.. but this was my first programming class mind you.

Another teacher who has given me interesting programming assignments is Dr. Carter at CSU Stanislaus. I've had him for 3 classes so far and he always asks us to come up with final projects related to the class. In my previous class with him (Chaos and Information Theory) I made a Conway's Game implementation. To show the squares I strugged with OpenGL for days until I finally gave up and used some code to write pixels directly to a frame buffer. This didn't look very good because the boxes where only a single pixel wide so you couldn't see stuff very well.

This semester I'm taking a computer graphics with Dr. Carter and have gotten a lot of insight in to OpenGL. Like my previous two classes he asked us to come up with a project related to graphics. This was perfect for me because I was very unsatisfied with the graphics of my Conway's Game program so I decided to make graphics for it!

CGSI is the result. It has a dynamically scaling grid (controlled by '[' and ']' that can go from 1 to 128 pixels wide. Memory gets dynamically allocated for each square(at 1600x1200 at 1px wide the program takes 130MB of RAM). It starts paused so you can set up stuff by either clicking a pixel at a time or by dragging a line. Once ready you can hit 'p' to unpause it. Run speed can be adjusted with the '-' and '=' keys. There are interesting preset patterns that can be inserted automatically by hitting number keys 1 to 5.

All that is required to build is freeglut or glut.

Download: CGSI

Images: interface, pattern #1, pattern #5

2008 09 30 - Locking Pointer

Now that I discovered that I've been using the volatile keyword improperly I have old code that needs to be properly locked. I took the time to figure out "locking pointers" and they are nice! The locking pointer is not really a pointer but a object wrapper that syntactically acts like a pointer.

The idea is that you overload the inline dereference operator "->" in the locking pointer to return a proxy object which also has "->" overloaded. This proxy object locks a mutex in it's constructor and unlocks the mutex in it's destructor. The inline dereference operator will chain-dereference objects until it runs in to a inline dereference operator that returns a pointer. Take a look:

locking_pointer<std::vector<int> > l_ptr(new std::vector<int>);
l_ptr->push_back(1);

All accesses to the member functions of any objects put in the locking pointer are locked at the expression level. This is equivalent to the following code:

std::vector<int> vec;
mutex.lock();
vec.push_back(1);
mutex.unlock();

The only inconsistent part of using the locking pointer is when dereferencing using the "*" dereference opterator. Since this operator doesn't chain like the inline dereference you have to use another "*" on the proxy object that gets returned. For example:

locking_pointer<int> l_ptr(new int(0));
std::cout << **l_ptr << "\n";

I'm not sure if there is a syntactically nicer way to do this. Perhaps just make a named function to do it and not use an operator?

Download: Locking Pointer Demo

2008 09 26 - Volatile Keyword

What is volatile for? I've been operating under the assumption that instantiating a built in type with volatile made access to the instantiation atomic. This is plain wrong! I'm thinking my confusion must have come from reading some Java documentation because Java uses volatile differently.

Volatile prevents caching in a register between instructions. If you had something like y=x; ++x; your compiler might cache the value of x from the first expression in a register for use in the second expression. If a device in your computer changed the value of x between the first expression and the second expression the ++x would increment the old value of x that was cached in a register and write it back to memory (oops!).

Volatile is useful for dealing with memory that might be modified at any time by a device other than a CPU. Using volatile tells the compiler to reread the memory every time it is needed. If you're using volatile unnecessarily you're disabling optimizations!

To make sure an operation is done atomically you have two options. The first is to look at your CPU documentation to see if the operation will be performed atomically. This is a viable option if the code only ever needs to run on one CPU. The other option is to use some sort of locking primitive offered by your operating system like a mutex.

Nice document included with the Linux source: Volatile Considered Harmful

2008 09 12 - SHA1 Implementation

I made a SHA1 implementation (C++) that is reasonably fast, and easy to use. It should be compiled with -O3 because it makes a large speed difference. I've only tested it on little-endian but it 'should' work on big-endian too. It hashes data as you load it so you can hash arbitrarily large files. I've used it on 10GB files with no problem.

On my system (2ghz opteron) it runs at 38.5mB/s when hashing 1mB blocks in memory with 1mB reserved buffer for loading data. See sha.hpp documentation or look at test program to see how to reserve buffer space.

Download: SHA1

2008 08 25 - Recursive Makefile

I made a Makefile that copies itself to sub-directories and recursively calls itself. It deletes the sub-Makefiles depth first when it cleans up. It uses a $(shell ) command to get directories which may not be portable. This may not be the best way to build a project, it's just something I made for the heck of it.

Download: Makefile, test src directory

2008 08 22 - Interesting Links (First Post)

I thought I'd make a post about some of the websites that have shaped my worldview and informed me of interesting things.

blogging heads

The format of blogging heads is two bloggers having a conversation. I follow John Horgan/George Johnson (Science) and Will Wilkenson (Philosophy and Libertarianism). There's quite a range of topics.

TED

TED (Technology, Entertainment, Design) is a yearly conference that costs a huge amount of money to attend live but they have all of their lectures available online. There is a huge variety of stuff. Among the most interesting I've watched (optimism/crows/LHC/spore/freakonomics story/wikipedia/80-20).

Valid HTML! Valid CSS!