GCN Blogs


Tech Blog

By GCN Staff

Readers add their tips to better parallel processing

Joab Jackson’s recent blog item on whether parallel processing requires new languages pointed out that developers are divided on the best way to split programs across a multiple-core architecture. The comments his blog drew reflect some of those divisions.

One reader argued that existing programs and operating systems are already up to the task:

“If your processor runs lots of programs with a multitasking operating system, you will tend to make pretty good use of the processor cores as the OS schedules the various jobs to run. For instance, a legacy FORTRAN program (definitely not "parallelized") ran nicely in four processors by simply partitioning the data into four blocks and running four instances of the program. … Occasionally, you will have that one, monolithic program that does not easily distribute across cores, but that's the only one that will need much attention. … So most can make good use of multicore processing even as we move to 6, 12, 16 or even 100 cores in an individual server as long as the number of processes you have is large (process-rich environment).”
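The reader's approach is to split the data into blocks and run one copy of an unmodified program per block. It can be sketched in Python as follows. This is a minimal sketch under stated assumptions: `LEGACY_PROGRAM` is a hypothetical stand-in for the legacy FORTRAN code (here it sums the squares of numbers read from standard input), and the operating system, not the program, decides which core runs each copy.

```python
import subprocess
import sys

# Hypothetical stand-in for an unmodified legacy program: it reads one number
# per line on stdin and prints the sum of their squares.
LEGACY_PROGRAM = """
import sys
print(sum(float(line) ** 2 for line in sys.stdin if line.strip()))
"""

def split_into_blocks(data, n_blocks):
    """Partition data into n_blocks contiguous, nearly equal blocks."""
    size, extra = divmod(len(data), n_blocks)
    blocks, start = [], 0
    for b in range(n_blocks):
        end = start + size + (1 if b < extra else 0)
        blocks.append(data[start:end])
        start = end
    return blocks

def run_instances(data, n_instances=4):
    """Start one OS process per block, then combine their partial results.

    Every instance is launched before any result is collected, so the
    operating system is free to schedule them on different cores at once.
    """
    procs = []
    for block in split_into_blocks(data, n_instances):
        proc = subprocess.Popen([sys.executable, "-c", LEGACY_PROGRAM],
                                stdin=subprocess.PIPE, stdout=subprocess.PIPE,
                                text=True)
        proc.stdin.write("\n".join(str(x) for x in block))
        proc.stdin.close()  # end of input: this instance can finish its block
        procs.append(proc)
    total = 0.0
    for proc in procs:
        total += float(proc.stdout.read())
        proc.stdout.close()
        proc.wait()
    return total

if __name__ == "__main__":
    print(run_instances(list(range(10))))  # sum of the squares of 0..9
```

Nothing inside the "legacy" program changes; all of the parallelism comes from partitioning the input and letting the OS schedule separate processes, which is why this only works when the blocks can be processed independently.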


More on this topic from GCN:

How to write apps for multiple cores: Divide and conquer
Does parallel processing require new languages?
Double-duty for video cards 


Another reader said you don’t have to create new languages from scratch: “The quotes by [James] Reinders and [Brian] Goetz in this article brush over the fact that one can create new languages without discarding legacy code if that language supports interoperability. Java was a new language yet its support for calling existing routines through JNI helped people make the jump to it without discarding their existing code.”

Yet another commenter said trying to find new ways to program parallel processors “places the cart before the horse, so to speak. The idea that the industry must design a new programming model for multicore processors is pure folly. It should be the other way around. We must first come up with a correct parallel programming model and then design the parallel processors to support the model. … The reality is that everybody in the computer universe who has an understanding of the issues know that the multithreaded approach to parallelism is complete nonsense. Threads are unreliable and hard to program. There is an infinitely better way to design and program parallel computers that is 100 percent threadless and deterministic.”

One writer suggested a different approach. “MasPar (massive parallelism) is an unavoidable milestone to be passed on the ‘AGI-Roadmap’ for my chosen field of artificial general intelligence. If old languages like Erlang and Haskell can do MasPar, fine, then we don't need to reinvent the wheel. But what we probably do need is a co-evolution simultaneously of MasPar hardware and MasPar languages to program the hardware.”

To read more comments from readers, go to the story.

Meanwhile, the issue of writing programs for multiple cores and multiple processors is explored further in “How to write apps for multiple cores: Divide and conquer,” which also appears in the June 15 print edition of GCN. 

Posted on Jun 11, 2009 at 2:15 PM | 0 comments


Double duty for video cards

When the next version of the Mac operating system, code-named Snow Leopard, is released later this year, users might see some surprising speed boosts, at least for some applications. The time it takes, for instance, to re-encode a high-definition video for an iPod could drop dramatically, from hours to a few minutes.

The secret sauce for this boost? Snow Leopard will be able to hand off some of the number crunching in that conversion to the graphics processing unit (GPU). The new OS is scheduled to include support for Open Computing Language (OpenCL), which lets programmers tap into the GPU from their programs.


A GPU, usually embedded in a graphics card, normally renders a computer's screen display. But ambitious programmers are finding that GPUs can also speed up certain types of applications, particularly those involving floating-point calculations.

For instance, researchers at Belgium’s University of Antwerp outfitted a commodity server with four dual-GPU Nvidia GeForce 9800 GX2 graphics cards. The server would be used to look for ways to improve tomography techniques. They found that this configuration could reconstruct a large tomography image in 59.9 seconds, which is faster than the 67.4 seconds it took an entire server cluster of 256 dual-core Opterons from Advanced Micro Devices.

The cluster cost the university $10 million to procure, whereas the researchers' server only ran $10,000.

For a certain group of problems, GPUs can provide a lot more computational power than an equivalent number of central processing units (CPUs), argued Sumit Gupta, senior product manager for Nvidia’s Tesla line of GPU-based accelerator cards.

To render visual displays, GPUs have been tuned to do lots of floating-point computation. This sort of computation differs from integer arithmetic, which discards everything to the right of the decimal point and so can introduce errors whenever a result has a fractional part. Floating-point numbers keep that fractional part, rounded to the precision of their format: 32 bits for single precision and 64 bits for double precision. The hard number crunching of scientific research, in particular, requires the accuracy of floating-point operations.
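A few lines of Python illustrate the distinction. This is a sketch, not GPU code. Python's built-in `float` is a 64-bit double, so the example uses the standard `struct` module to round a value to 32-bit single precision for comparison. The helper name `to_float32` is ours.

```python
import struct

def to_float32(x: float) -> float:
    """Round a Python float (a 64-bit double) to the nearest 32-bit single."""
    return struct.unpack("f", struct.pack("f", x))[0]

# Integer arithmetic drops everything right of the decimal point.
print(7 // 2)   # 3
# Floating point keeps the fractional part.
print(7 / 2)    # 3.5

# 0.1 has no exact binary representation; single precision's rounding
# error is far larger than double precision's.
single = to_float32(0.1)
print(single)              # 0.10000000149011612
print(abs(single - 0.1))   # an error on the order of 1e-9
```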

Graphics cards have always excelled at this kind of floating-point computation, Gupta said. To portray tree leaves fluttering in the wind or water trickling over a streambed in the latest computer game, the GPU has to calculate the color, depth and other attributes of each screen pixel, which requires heavy matrix multiplication at floating-point precision. These sorts of calculations are not unlike those scientists need to do to solve mathematical conundrums in molecular dynamics, computational chemistry, signal processing and the like.
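Matrix multiplication suits GPUs because every element of the result can be computed on its own. The Python sketch below uses plain lists and no GPU, and its function names are illustrative. It computes the cells of a product in a shuffled order to show that the answer does not depend on which cell is finished first, which is exactly what lets hundreds of cores each take a share.

```python
import random

def matmul_cell(a, b, i, j):
    """Element (i, j) of A x B: row i of A dotted with column j of B.
    It reads only the inputs, never another output cell."""
    return sum(a[i][k] * b[k][j] for k in range(len(b)))

def matmul_any_order(a, b, seed=0):
    """Compute every cell of A x B, visiting the cells in a shuffled order."""
    rows, cols = len(a), len(b[0])
    cells = [(i, j) for i in range(rows) for j in range(cols)]
    random.Random(seed).shuffle(cells)  # order is irrelevant: cells are independent
    c = [[0] * cols for _ in range(rows)]
    for i, j in cells:
        c[i][j] = matmul_cell(a, b, i, j)
    return c

print(matmul_any_order([[1, 2], [3, 4]], [[5, 6], [7, 8]]))  # [[19, 22], [43, 50]]
```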

Nvidia, for one, has seen the interest in having the GPU do double duty and has modified some of its cards to make them fully programmable. The Nvidia Tesla C1060 computing board is being offered to the scientific crowd. It has one GPU with 240 processor cores and can deliver a peak of 933 billion floating-point operations per second (933 gigaflops) in single precision.

To help programmers tap into this computational power, Nvidia has created a package of tools named Cuda. Part of this package is a set of extensions to the C programming language, called C for Cuda. It offers a number of parallel keywords that developers can use to break off portions of their code to run on the GPU. Developers use these keywords to mark the chunks of code that can run in parallel, then compile the program with Nvidia's compiler.
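The programming model can be sketched without a GPU. In C for Cuda, a function marked with the `__global__` qualifier (a kernel) is written from the point of view of a single thread and launched across a grid of thread blocks with the `<<<blocks, threads>>>` syntax. Each thread computes its own index from `blockIdx`, `blockDim` and `threadIdx`. The Python below is only a sequential emulation of that indexing scheme for the common SAXPY operation (`out = a*x + y`). `saxpy_kernel` and `launch` are hypothetical names, and on real hardware the calls would run concurrently.

```python
def saxpy_kernel(block_idx, block_dim, thread_idx, n, a, x, y, out):
    """What one GPU thread would do: handle exactly one element."""
    i = block_idx * block_dim + thread_idx  # CUDA idiom: blockIdx.x * blockDim.x + threadIdx.x
    if i < n:  # the last block may have more threads than remaining elements
        out[i] = a * x[i] + y[i]

def launch(kernel, grid_dim, block_dim, *args):
    """Sequential stand-in for kernel<<<grid_dim, block_dim>>>(args...).
    On a GPU, all of these calls would run concurrently."""
    for block_idx in range(grid_dim):
        for thread_idx in range(block_dim):
            kernel(block_idx, block_dim, thread_idx, *args)

n = 10
x = [float(v) for v in range(n)]
y = [1.0] * n
out = [0.0] * n
block_dim = 4
grid_dim = (n + block_dim - 1) // block_dim  # round up so every element is covered
launch(saxpy_kernel, grid_dim, block_dim, n, 2.0, x, y, out)
print(out)
```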

Cuda has proven popular with developers. More than 75 research papers have been written on Cuda, and more than 50 universities teach how to use the platform, Gupta said. Certainly, the Cuda sessions were among the best-attended at the SC08 conference in Austin, Texas, last fall.

Even with tools such as Cuda, however, writing for GPUs certainly makes the job of programming a little bit more complicated. For its own developers, government integrator Lockheed Martin, via its Advanced Technology Laboratories, is looking at ways to ease programming in heterogeneous processor environments.

"If you use a GPU, you need to learn the Nvidia compiler and learn how to put the appropriate extensions into your code in the GPU," noted Daniel Waddington, principal research scientist at Lockheed Martin’s labs. He is leading an effort to build what he calls a refactoring engine. Called Chimera, this software will be able to recompile code written in well-known languages so it can be reused across a wider variety of processors without the programmer needing to know the low-level implementation details of the GPUs or other new types of processors.

"The problem is not only are designers moving to multicore processors, but designers are coming out with new designs a few times a year," said Lockheed Martin research scientist Shahrukh Tarapore, who also is working on Chimera. “They have different programming models and different capabilities.”

Right now, Chimera works with the C and C++ languages, which are widely used within Lockheed Martin. If successful, Chimera could be used by the company’s programmers to quickly build programs that can take advantage of the latest processors, be they CPUs, GPUs or some other design.

"Your source code is first transformed into an abstract syntax tree [so] it can be translated into other forms," explained Tarapore. This approach will also identify which sections can be broken into chunks that could be run in parallel. Those pieces are then pulled from the main body of the program and replaced with pointers to components that can execute the tasks on specific pieces of hardware.

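As a rough illustration of the idea, and not Chimera's actual method (which is not detailed here), Python's standard `ast` module can parse source code into an abstract syntax tree and flag loops that look safe to split into chunks. The heuristic below is deliberately crude and hypothetical. A `for` loop qualifies only if every assignment in its body writes to an element indexed by the loop variable, such as `out[i] = ...`; an accumulator such as `total += x` disqualifies it. It assumes Python 3.9 or later, where a subscript's index is stored directly in `slice`.

```python
import ast

def candidate_parallel_loops(source: str) -> list[int]:
    """Line numbers of `for` loops whose iterations look independent.

    Hypothetical heuristic: every assignment inside the loop body must store
    into a subscript indexed directly by the loop variable (out[i] = ...).
    Any other assignment, such as an accumulator (total += x), makes the
    loop look sequential.
    """
    tree = ast.parse(source)
    candidates = []
    for node in ast.walk(tree):
        if not isinstance(node, ast.For) or not isinstance(node.target, ast.Name):
            continue
        index = node.target.id
        independent = True
        for stmt in node.body:
            for sub in ast.walk(stmt):
                if isinstance(sub, ast.Assign):
                    targets = sub.targets
                elif isinstance(sub, (ast.AugAssign, ast.AnnAssign)):
                    targets = [sub.target]
                else:
                    continue
                for target in targets:
                    if not (isinstance(target, ast.Subscript)
                            and isinstance(target.slice, ast.Name)
                            and target.slice.id == index):
                        independent = False
        if independent:
            candidates.append(node.lineno)
    return candidates

SAMPLE = """
out = [0] * 4
for i in range(4):
    out[i] = i * i
total = 0
for i in range(4):
    total += out[i]
"""
print(candidate_parallel_loops(SAMPLE))  # [3]: only the first loop qualifies
```

A real refactoring engine would also have to prove that the loop body has no hidden side effects (function calls, aliasing between arrays), which this sketch does not attempt.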
Posted on Jun 10, 2009 at 11:27 AM | 0 comments


Does parallel processing require new languages?

Now that almost all new servers and computers are running processors with multiple cores, the software-design community is trying to figure out the best way to make use of this new architecture. Unfortunately, the community is divided over how best to split programs across these multiple cores.

Getting full use out of multicore processors can be tricky. For a program to make use of more than one core, it must divvy up its workload in such a way that the effort of dividing it doesn't outweigh the gains achieved by adding more cores. Most programming languages were designed on the assumption that just one processor would be working through the code sequentially, line by line.
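The trade-off can be made concrete with a toy cost model. The model is an assumption for illustration only. The work divides evenly across cores, but every extra core adds a fixed coordination cost (starting work, merging results). Under it, adding cores helps only up to a point, after which the overhead outweighs the gain.

```python
def parallel_time(serial_seconds, cores, overhead_per_extra_core):
    """Hypothetical model: evenly divided work plus a fixed cost per extra core."""
    return serial_seconds / cores + overhead_per_extra_core * (cores - 1)

def best_core_count(serial_seconds, overhead_per_extra_core, max_cores=64):
    """Core count (1..max_cores) with the lowest modeled run time."""
    return min(range(1, max_cores + 1),
               key=lambda n: parallel_time(serial_seconds, n, overhead_per_extra_core))

# A 10-second job where each extra core costs 0.4 seconds of coordination.
for n in (1, 2, 5, 16):
    print(n, round(parallel_time(10.0, n, 0.4), 3))
print("best:", best_core_count(10.0, 0.4))
```

With these made-up numbers, five cores is the sweet spot; at 16 cores the coordination cost has eaten most of the gain.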


What readers are saying:

It is often more cost-efficient to redevelop a software from the start as a parallel program than trying to reverse engineer the parallelism out of a dusty-deck sequential program.


"The challenge is that we have not, in general, designed our applications to express parallelism. We haven't designed programming languages to make that easy," said James Reinders, who works in Intel's software-products division and is the author of a book on parallel programming titled "Intel Threading Building Blocks."

Parallel programming requires attention in two areas, Reinders explained. One is "decomposing" the problem in such a way that it can be run in multiple chunks, side by side. "The art of figuring out how to divide things up so that they are independent is a big challenge," he said. One operation can't depend on another operation that hasn't been completed yet.
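A running total is a classic case of each step depending on the one before it. The sketch below shows one well-known way to decompose it anyway, a two-pass chunked prefix sum. It is our illustration, not an example from Reinders. Each chunk's total can be computed independently, a short sequential step turns those totals into starting offsets, and then each chunk can again be finished independently. The chunk loops run sequentially here; in a real program each iteration would go to a different core.

```python
from itertools import accumulate

def chunked_prefix_sum(values, n_chunks):
    """Running totals of values, computed chunk by chunk in two passes."""
    size = max(1, -(-len(values) // n_chunks))  # ceiling division
    chunks = [values[i:i + size] for i in range(0, len(values), size)]

    # Pass 1 (independent per chunk): each chunk's own total.
    chunk_totals = [sum(chunk) for chunk in chunks]

    # Short sequential step: where each chunk's running total must start.
    offsets = [0] + list(accumulate(chunk_totals))[:-1]

    # Pass 2 (independent per chunk): local running totals shifted by the offset.
    result = []
    for offset, chunk in zip(offsets, chunks):
        result.extend(offset + x for x in accumulate(chunk))
    return result

print(chunked_prefix_sum(list(range(1, 11)), 3))  # [1, 3, 6, 10, 15, 21, 28, 36, 45, 55]
```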



The second area requiring attention is that of scalability. The programmer does not know how many processors his or her creation will run on, just that it should run on as many processors as are available for the task. If the code specifies how many processors are being used, then it is badly written code, Reinders said.
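One way to honor that rule is to size the work at run time. In this minimal Python sketch, `plan_chunks` and its `chunks_per_worker` parameter are hypothetical. The number of chunks is derived from `os.cpu_count()` on whatever machine runs the code, and the work is cut into several chunks per core so that it spreads reasonably evenly even when cores finish at different rates. The resulting ranges could then be handed to an executor such as `concurrent.futures.ProcessPoolExecutor`, whose default worker count also comes from the machine.

```python
import os

def plan_chunks(n_items, workers=None, chunks_per_worker=4):
    """Split range(n_items) into contiguous (start, stop) chunks.

    The chunk count follows the machine the code runs on instead of a
    hard-coded core count; several chunks per worker leave room to even
    out the load if some cores are busier than others.
    """
    if workers is None:
        workers = os.cpu_count() or 1  # cpu_count() can return None
    n_chunks = max(1, min(n_items, workers * chunks_per_worker))
    size, extra = divmod(n_items, n_chunks)
    chunks, start = [], 0
    for c in range(n_chunks):
        stop = start + size + (1 if c < extra else 0)
        chunks.append((start, stop))
        start = stop
    return chunks

print(len(plan_chunks(1_000_000)), "chunks on this machine")
```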

The Defense Advanced Research Projects Agency (DARPA) has been working on the issue through its High Productivity Computing Systems (HPCS) program, at least for what is called coarse-grained parallelism, or programs that run across many processors. It has funded the development of a number of new languages that developers could use to write such programs.

DARPA's new languages use a programming model called Partitioned Global Address Space. PGAS does two things: It allows multiple processors to share a global pool of memory, but at the same time it allows the programmer to keep individual threads in specified logical partitions so they will be as close to the data as possible, thereby taking advantage of the speed boost brought about by "locality," as this is called.

"This is an attempt to get the best out of both worlds," explained Tarek El-Ghazawi at a PGAS Birds-of-a-Feather session held at the SC08 conference in Austin, Texas, last November. El-Ghazawi is a George Washington University computer science professor who has helped guide the development of PGAS.

"The idea is to have multiple threads, concurrent threads…all seeing one big flat space. But in addition, the threads would [be] locality-aware, and you as a programmer would know what parts are local and what parts are not," he said.

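A toy model helps show what "locality-aware" means. The Python below is not Chapel, X10 or UPC syntax. It assumes a simple block distribution in which element `i` of a global array of `n` elements has affinity to partition `i // block_size`. Every thread can name any element (the global address space), but the code can also count how many accesses land outside the accessing thread's own partition. The function names are hypothetical.

```python
def owner(i, n, n_partitions):
    """Block distribution: element i has affinity to partition i // block_size."""
    block_size = -(-n // n_partitions)  # ceiling division
    return i // block_size

def remote_accesses(n, n_partitions, worker_for):
    """Count elements touched by a thread other than the one they are local to."""
    return sum(1 for i in range(n) if worker_for(i) != owner(i, n, n_partitions))

N, P = 12, 3
# "Owner computes": each thread works on the elements local to it.
print(remote_accesses(N, P, lambda i: owner(i, N, P)))  # 0
# Round-robin assignment ignores locality.
print(remote_accesses(N, P, lambda i: i % P))           # 6
```

Both assignments compute the same answer over the shared index space; the difference is that the locality-aware one never reaches into another partition's memory, which is where the PGAS speed boost comes from.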
One DARPA language created under this model is Chapel, which is being developed by Cray. Chapel was designed to "reduce the gap between parallel languages and the mainstream" languages, said Cray engineer Brad Chamberlain.

IBM is creating another DARPA-funded language, called X10, which can run on a Java Virtual Machine, making it usable across multiple platforms. Again the focus is on familiarity. The plan was to "start with a sequential programming model that works" and add more elements of concurrency and distribution, explained IBM researcher Vijay Saraswat.

But is it really necessary to develop entirely new languages? Reinders argues that extending commonly used languages, rather than building parallel-specific languages anew, would better suit programmers' needs.

"It is an interesting thought exercise to ask if we were [to] start from scratch to build the perfect parallel programming language, what would we do? X10 and Chapel are very interesting projects and are very exciting but I don't see them catching on in any big way," he said. Why? In his view, they are too radically different from the programming languages most coders are used to, and so would be too difficult to learn.

Look back over the last decade, Reinders urges. The languages that caught on, such as Java and C#, were not that different from languages that were widely used at the time, such as C++ or Visual Basic. "They felt familiar" and so it was easy for programmers to adopt them. Hence their success.

Likewise, he argues, any move into the exciting world of parallel programming will follow the easiest path forward.

"People with legacy code need tools that have strong attention to the languages they've written and give them an incremental approach to add parallelism," Reinders said. If languages like X10 and Chapel do turn out to be popular, their advancements will be integrated into more popular languages.

Not surprisingly, Intel itself has taken this approach. It has developed an extension to C++ called Threading Building Blocks (TBB).

To build TBB, Intel developers rewrote those aspects of C++ that might lead to unpredictable results when used in a multiprocessor or multicore environment, such as memory management.

To use TBB, developers include the TBB header files in their code and link their programs against the TBB library. Intel also offers an extension to Visual Studio, called Intel Parallel Studio, that supports TBB. Using TBB, programmers don't even have to worry about whether their code will run on multiple processors or on multicore processors.

Reinders offered an example of how a TBB-enhanced C++ app could work. Say a program is running across all four cores of a quad-core processor. Then another program, such as a virus checker, starts running on one of those cores. The piece of the first program running on that core slows down, which in turn slows the entire program. TBB would automatically detect that slowdown and move that portion of the work off that core and onto the other three.
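A small simulation shows why moving work away from a slowed core pays off. This is a toy model, not TBB's implementation. TBB's scheduler actually gets this effect through work stealing, with idle threads taking queued tasks from busy ones, rather than by measuring core speeds. The sketch assumes 40 equal tasks and four cores, one of which runs at half speed while the virus checker is active. It compares a fixed up-front split with one in which whichever core is free takes the next task. The function names are hypothetical.

```python
import heapq

def static_finish_time(n_tasks, speeds):
    """Static split: each core gets an equal share of tasks up front."""
    n = len(speeds)
    shares = [n_tasks // n + (1 if w < n_tasks % n else 0) for w in range(n)]
    return max(share / speed for share, speed in zip(shares, speeds))

def dynamic_finish_time(n_tasks, speeds):
    """Dynamic split: whichever core is free first takes the next task.

    Each task costs 1 unit of work; a core with speed s finishes it in 1/s
    time units. Returns (finish time, tasks completed per core).
    """
    free_at = [(0.0, w) for w in range(len(speeds))]  # (time core is free, core id)
    heapq.heapify(free_at)
    counts = [0] * len(speeds)
    finish = 0.0
    for _ in range(n_tasks):
        t, w = heapq.heappop(free_at)
        t += 1.0 / speeds[w]
        counts[w] += 1
        finish = max(finish, t)
        heapq.heappush(free_at, (t, w))
    return finish, counts

# Four cores; a virus checker halves the speed of the last one.
speeds = [1.0, 1.0, 1.0, 0.5]
print(static_finish_time(40, speeds))   # the slowed core holds everyone up
print(dynamic_finish_time(40, speeds))  # the slowed core simply takes fewer tasks
```

In this model the static split finishes at time 20, while the dynamic one finishes at 12 because the slowed core ends up with only 6 of the 40 tasks.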

TBB is not the only parallel-focused extension to popular languages. At that same SC08 PGAS session, other researchers showed how they were extending popular languages for parallel duties. For instance, GWU researcher Bill Carlson is developing Unified Parallel C (UPC), an extension of the C programming language for parallel environments. Over at the University of California, Berkeley, work is being done on a dialect of Java called Titanium. Elsewhere, John Mellor-Crummey presented his work on Co-Array Fortran, an extension of Fortran 95 that is also being prepared for the next version of that language.

Whatever the approach, the goal of writing programs that can run concurrently on several processors or processor cores remains elusive. "Concurrency is complicated; as an industry we are still coming to grips with the programming model," said Sun Microsystems engineer Brian Goetz in a talk about processors at the JavaOne conference, being held this week in San Francisco. Are small changes or big changes needed?

"I'm skeptical of people who say we have to throw everything out about computing and start from scratch. We clearly don't have to do that. It's very expensive to do," Goetz said. "I think there is an incremental path to get there, but I do think we need to change the way we think."

Posted on Jun 05, 2009 at 12:59 PM | 18 comments


Data location not the overriding factor in cloud security

One of the criticisms usually leveled against cloud computing is that, with many cloud services, the actual location of your stored data is unknown. Google, for instance, does not divulge the location of the servers that handle Google Docs. For government agencies that need to keep track of the location of their data for policy and regulatory reasons, this is a major deal-breaker.

But should it be? Knowing where the data is located, and that proper protective measures are in place there, is certainly instrumental in safeguarding the data. But location may not be the correct way to think about these concerns, said Lew Tucker, who is the chief technology officer for cloud computing initiatives at Sun Microsystems. He brought up this point June 1 in a cloud computing panel at the CommunityOne conference.

The question of "where the bits reside, of what geography or national boundary these bits exist within," is somewhat moot, given that "we are totally connected by networks," he said.

In fact, access, rather than location, may be the better way of thinking about things.

"It really is who has access to these bits that is the really critical question, not the locale where they reside in," Tucker said. "But right now we are governed by rules about the locale of the disk drive."

It's a good distinction. When you think about the location of a particular document, or anything else, what you are really thinking about is a series of bits residing on some physical medium, such as a hard drive or tape drive, which itself is probably located in a network-connected data center.

But someone who is actually inside the data center can't view the data any more easily than anyone else on the network. In fact, if the data resides on a server without a monitor, everyone accesses the data in exactly the same way: through a terminal at some other location. Sure, a wrongdoer could sneak inside the data center and steal the server holding sensitive data. But again, any data center breach can be described just as well in terms of who had access to the data center as in terms of the location of the data itself.

Posted on Jun 02, 2009 at 2:08 PM | 0 comments


Cyber Coordinator vs Cyber Czar

Many pundits and federal government observers are already raising the question, "Will a government cyber czar improve national security?"

The question, which arose from a series of new and significant cybersecurity policy moves announced by President Barack Obama in a White House speech May 29, is fair, but ill-informed. 

First, let's drop the czar part. This position has in fact been defined as a coordinator, with no operational responsibility or authority to make policy unilaterally. So let's keep the role and expectations in check.

Second, what's different and more important about this announcement, compared to past cyberspace initiatives, is that the president himself put a big pile of political chips on the table in support of making America's digital infrastructure more secure. That makes a huge difference, regardless of what title you give his lead coordinator.

Third, the recommendations in his May 29 cyberspace policy announcement are well grounded in work prepared last year by the Center for Strategic and International Studies' Commission on Cybersecurity, which had gained broad public- and private-sector support. So there is strong consensus and significant momentum behind most of the policies the president outlined last Friday. It is true, as Obama made clear, that much work remains to be done to pull together a coherent national strategy, and that it will take time. But with Obama saying he's now watching, many disparate efforts are likely to get fresh, rigorous and more coordinated attention.

Of course, the questions of who will get the job, and whether he or she will have the political skills to reconcile such a broad portfolio of competing cybersecurity challenges, remain vital concerns.

But President Obama's public declarations that he would select this individual personally, that this person would sit on both the National Security Council and the National Economic Council, and that he or she would have regular access to the president all point toward the notion that the cybersecurity coordinator can be effective, if not the final authority. And in all likelihood, the national cybersecurity strategy due to be delivered to the president will prove more balanced and pragmatic than what would emerge under the traditional notion of a czar. For as we all know, and as most czars discover, the pretense of power typically comes with too little authority to get substantive things done. In this case, the person to watch is Obama, not his czar.

--Wyatt Kash

Posted on May 31, 2009 at 1:24 PM | 0 comments


But is it really cloud computing?

Later this month, the General Services Administration (GSA) is expected to unveil how it moved one of its most popular public-facing services, the USA.gov government search site, to a cloud computing-based infrastructure.

The agency appears to be ahead of the curve. The White House touts the benefits of cloud computing in documents supporting the 2010 budget. The idea is that by outsourcing computing and software support to organizations that can provide it more cost-effectively (including in-house providers such as the Defense Information Systems Agency), the government could cut IT costs.

So GSA's announcement would be a sign of the way forward, yes? Except, the service GSA is using may not actually be, strictly speaking, cloud computing, at least by the definitions of cloud computing now being formulated by the National Institute of Standards and Technology.

We first heard in February that GSA planned to move USA.gov, and its affiliated Spanish-language site GobiernoUSA.gov, to the "cloud." Beyond mentioning that IT infrastructure provider Terremark would supply the cloud on which the sites would rest, GSA provided scant technical and pricing details.

In subsequent conversations with GCN, Martha Dorris, acting associate administrator for the Office of Citizen Services and Communications at GSA, said that the agency expected to cut Web management costs by up to 80 percent by moving away from its current provider. The Terremark contract will also cover Webcontent.gov and the data.gov initiative, she told us.

More recently, we heard that GSA switched the sites over to the Terremark facility earlier this month and is planning to unveil the site within a few weeks. "The move is progressing remarkably smooth, with no major surprises or problems, just the process of learning new systems," said Thomas Freebairn, GSA's acting director of USA.gov technologies, in a statement sent to us by GSA.

Admittedly, cloud computing is itself not a very well-defined term; it is more marketing-speak than anything else. It could apply to any sort of computational capability that is outsourced. So, naturally, vendors of application service provider (ASP), software-as-a-service (SaaS) and utility computing offerings have been quick to rebrand their wares as cloud offerings. And so they should, if it helps customers get a better handle on the benefits.

But the definition of cloud computing is slowly becoming more precise. Last week, NIST released a draft definition of cloud computing, authored by Peter Mell and Tim Grance of the agency's Information Technology Laboratory.

The NIST draft pointed to a few of the key attributes that separate cloud computing from other types of offerings. Two of the key ones are "rapid elasticity" and "pay per use."

"Rapid elasticity" means "Capabilities can be rapidly and elastically provisioned to quickly scale up and rapidly released to quickly scale down," the NIST draft states. Pay per use means "Capabilities are charged using a metered, fee-for-service, or advertising based billing model to promote optimization of resource use." (The advertising-based model refers to services such as Facebook or Flickr that generate revenue from advertising.)

Does Terremark's offering fit this model? The answer, as the Magic Eight-Ball states, is cloudy.

Not too long ago, we spoke with Robert Thompson, sales director within Terremark's federal group, and Steve Hill, engineering director for the federal group. While they declined to talk about the GSA work specifically, they did describe the company's pricing model.

The offering, called Enterprise Cloud (E-Cloud), does not exactly fit the profile of cloud computing, though it is definitely a step away from the hosting services that many of the company's competitors offer. With most hosting services, you rent space on a server and use the operating system the provider supplies.

With Terremark, you supply a VMware-based image of your complete operating environment, including an operating system (either your own or one provided by the company). The company then runs this virtual instance on its own servers, under VMware ESX hypervisors. So we presume that GSA is moving the entire array of USA.gov sites, along with the supporting content management system, into a VMware instance that will run on Terremark's servers.

Like traditional hosting services, Terremark bills on a monthly basis. On the GSA schedule, E-Cloud comes in a number of pricing tiers. One configuration offers the processing power equivalent of a single dual-core server (5 GHz), along with 10 gigabytes of random access memory (RAM) and 100 gigabytes of storage, for about $2,000 a month. Additionally, bandwidth to and from the Internet is offered at about $47.50 per megabyte of dedicated bandwidth.

"It is not based on a server mentality, but on a pool-of-resources mentality. The customers can subdivide that resource for any number of servers," Hill said. Usage is calculated by taking five-minute samples from the statistics provided by the VMware management software. (Terremark, in conjunction with Computer Sciences Corp., also offers a specific cloud service, called Trusted E-Cloud, which offers additional government-focused security and managed services.)

With Terremark's plan, users estimate how much processing power they need and use the estimates to pick the most appropriate plan. If their usage goes over these limits, customers pay an overage, but, like cell phone users, they can switch to a larger plan in the following months. Since most Terremark clusters are not operating at 100 percent capacity at any given time, chances are that the additional muscle can be put to use within a few minutes to handle any overages, Hill pointed out.

No discounts are offered for using less than the full capacity, though. The customer pays the same whether 4.99 GHz or 1.0 GHz is actually used. This is strikingly different from the cloud services of Google and Amazon, both of which charge only for the CPU, storage and bandwidth actually used.
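The difference between the two billing styles is easy to put in numbers. The sketch below uses the E-Cloud tier described above (5 GHz for about $2,000 a month) and Terremark's five-minute sampling interval. The metered rate of $0.50 per GHz-hour is purely hypothetical, chosen only for illustration. Overage charges are not modeled, because their rates are not given here.

```python
SAMPLE_MINUTES = 5  # usage is sampled every five minutes

def ghz_hours(samples_ghz):
    """Convert five-minute CPU samples (in GHz) into GHz-hours."""
    return sum(samples_ghz) * SAMPLE_MINUTES / 60

def tier_charge(samples_ghz, tier_ghz=5.0, monthly_price=2000.0):
    """Flat monthly tier: the same price at 1.0 GHz or 4.99 GHz.
    Returns (charge, whether any sample exceeded the tier)."""
    return monthly_price, max(samples_ghz) > tier_ghz

def metered_charge(samples_ghz, rate_per_ghz_hour):
    """Pay-per-use: charge only for what was consumed."""
    return ghz_hours(samples_ghz) * rate_per_ghz_hour

# A 30-day month of samples at a steady 1.0 GHz.
month = [1.0] * (30 * 24 * 60 // SAMPLE_MINUTES)
print(tier_charge(month))           # (2000.0, False)
print(metered_charge(month, 0.50))  # hypothetical $0.50 per GHz-hour -> 360.0
```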

So, is what GSA is using actually cloud computing? Or is it a slightly different sort of hosting model, an admittedly innovative one based on virtualization?

On the one hand, it does not have the truly elastic pricing that Amazon does, where you literally can buy 47 cents' worth of computing if you need to. On the other hand, it does allow the user to scale up and down as traffic waxes and wanes, albeit on a month-to-month basis.

As federal agencies move into this exciting new world of outsourcing, they may have to answer such thorny questions (GSA itself certainly seems to be grappling with the issue). Or, better yet, maybe they won't worry so much about complying with the latest buzzword.

GCN editor Wyatt Kash contributed to this article.

Posted on May 18, 2009 at 1:00 PM | 1 comment


Slimmer XML is faster XML

Extensible Markup Language (XML) tags do not need to be human-readable. In fact, giving your XML tags short names that would not make sense to strangers can save a significant amount of storage space and even cut transmission time, advised John King, head of the training firm King Training.

King spoke at the Independent Oracle Users Group's Collaborate 2009 conference in Orlando, Fla., this week. In his presentation, he offered a number of tips for improving the performance of systems that must transfer XML data, most of them centered on trimming the size of the XML documents themselves.

Cataloguing data with XML has gone a long way toward easing the movement of data among different systems, thanks to its use of plain character streams. But XML documents are quite verbose. A peculiar trait of XML is that it was designed to be human-readable, when, in fact, humans rarely read XML documents.

In XML, every bit of data gets its own tag, and every tag is buried in a hierarchy of additional tags to add context around that data. While all the chattiness may not seem harmful when the amount of data you're sharing and storing is comparatively light, "If you transmit 30,000 messages an hour, it adds up," King said.

One thing that can be done is to shorten element names. An element is marked by two tags: a start tag placed at the beginning of the data being encapsulated and an end tag placed at the end of that data. Both contain the element's name enclosed in angle brackets; the only difference between the two is that the end tag has a "/" character before the name.

Element names can be as long as the developer wants, which means they can be as descriptive as possible, such as "<TodaysSalesItems>." Anyone reading that would know that the data between the tags has something to do with today's sales items.

But such description is entirely unnecessary because XML markup is rarely read by humans while it is being processed anyway, King reminded the audience. So a tag such as "<TodaysSalesItems>" could be shortened to "<tsi>", saving 26 characters every time this element is used (13 characters in the start tag and 13 in the end tag).

Many organizations have long had abbreviation dictionaries, perhaps left over from the days of punch cards, when memory was a scarce resource. Such dictionaries offer previously agreed-upon short names for commonly used terms. They could be used as a basis for short element names, King advised.
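King's renaming advice can be sketched with Python's standard `xml.etree.ElementTree` module. The abbreviation dictionary here is hypothetical; in practice it would come from an organization's agreed list. Because the same dictionary, inverted, restores the long names, readability can be recovered on the receiving end if anyone ever needs it.

```python
import xml.etree.ElementTree as ET

# Hypothetical abbreviation dictionary of the kind King describes.
ABBREVIATIONS = {"TodaysSalesItems": "tsi", "ItemNumber": "inum", "Quantity": "qty"}

def rename_elements(xml_text, mapping):
    """Rewrite element names using mapping; names not in mapping are kept."""
    root = ET.fromstring(xml_text)
    for elem in root.iter():
        elem.tag = mapping.get(elem.tag, elem.tag)
    return ET.tostring(root, encoding="unicode")

def savings_per_use(long_name, short_name):
    """Characters saved per element: once in the start tag, once in the end tag."""
    return 2 * (len(long_name) - len(short_name))

long_doc = ("<TodaysSalesItems><ItemNumber>42</ItemNumber>"
            "<Quantity>3</Quantity></TodaysSalesItems>")
short_doc = rename_elements(long_doc, ABBREVIATIONS)
# The receiver can restore readable names with the inverted dictionary.
restored = rename_elements(short_doc, {v: k for k, v in ABBREVIATIONS.items()})
print(short_doc)                                  # <tsi><inum>42</inum><qty>3</qty></tsi>
print(savings_per_use("TodaysSalesItems", "tsi"))  # 26
```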

XML documents can be further shortened by using attributes instead of elements whenever possible. Attributes are attached to elements to further qualify a particular piece of data, which can eliminate the need for a separate element to hold some aspect of that data. The only drawbacks to attributes, King said, are that a given attribute can appear only once on an element and that its value cannot be further parsed.
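Folding simple child elements into attributes can be sketched the same way. The rule here is an assumption that mirrors the drawbacks King lists. Only a child with plain text (no children or attributes of its own) is converted. A name already used as an attribute on that element is left as an element, since an attribute can appear only once per element. The function name is hypothetical.

```python
import xml.etree.ElementTree as ET

def leaves_to_attributes(elem):
    """Fold simple child elements (text only, no children or attributes)
    into attributes of their parent, recursively."""
    for child in list(elem):
        simple = len(child) == 0 and not child.attrib
        if simple and child.tag not in elem.attrib:  # a name may appear once per element
            elem.set(child.tag, child.text or "")
            elem.remove(child)
        else:
            leaves_to_attributes(child)
    return elem

src = "<item><id>7</id><qty>3</qty></item>"
out = ET.tostring(leaves_to_attributes(ET.fromstring(src)), encoding="unicode")
print(out)                      # <item id="7" qty="3" />
print(len(src), "->", len(out), "characters")
```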

Cutting down on white space in a document also saves space. Leaving out the newlines, tabs and blank spaces used to indent the markup cuts bytes from the document. The data may not be as easy for humans to read, since it is all mushed together on a single line, but a machine reads it just as easily.
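
One way to remove indentation before transmission is sketched below with ElementTree. It assumes that whitespace-only text between tags carries no meaning in the documents being sent; if some element legitimately holds nothing but spaces, this approach would destroy that data.

```python
import xml.etree.ElementTree as ET

pretty = """<order>
    <item>
        <sku>A100</sku>
        <qty>3</qty>
    </item>
</order>"""

def strip_indentation(elem: ET.Element) -> None:
    """Drop text and tail strings that consist only of whitespace.

    Assumes whitespace-only text is never meaningful data in this document."""
    for node in elem.iter():
        if node.text is not None and node.text.strip() == "":
            node.text = None
        if node.tail is not None and node.tail.strip() == "":
            node.tail = None

root = ET.fromstring(pretty)
strip_indentation(root)
compact = ET.tostring(root, encoding="unicode")
print(compact)  # <order><item><sku>A100</sku><qty>3</qty></item></order>
print(len(pretty), "->", len(compact))
```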

Another place to shave bytes is to strip out unneeded Character Data (CDATA) sections, King advised. The contents of a CDATA section are not parsed as markup, and, according to King, such sections are often used to stash notes about the document itself. While these notes are helpful for someone trying to understand the information, it makes no sense to include all of them in every transmission.
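
Stripping disposable CDATA sections from a serialized document can be as simple as the regular-expression sketch below. It is deliberately naive: it assumes every CDATA section in the document is safe to drop, and it does not distinguish a real CDATA section from the same characters appearing inside a comment.

```python
import re

# A CDATA section runs from "<![CDATA[" to the first "]]>", possibly across lines.
CDATA_SECTION = re.compile(r"<!\[CDATA\[.*?\]\]>", re.DOTALL)

def strip_cdata(xml_text: str) -> str:
    """Remove every CDATA section, contents included (naive sketch)."""
    return CDATA_SECTION.sub("", xml_text)

doc = "<part><![CDATA[Internal note: revised format, see spec v2]]><id>42</id></part>"
print(strip_cdata(doc))  # <part><id>42</id></part>
```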

To show the storage and transmission savings these techniques could deliver, King took a sample of XML-encoded data from Oracle documentation and substituted shorter element names. He reduced a set of tags that was 313 characters long to a set 215 characters long, about two-thirds of the original. He also made a set in which attributes replaced some elements, reducing the size even further, to 179 characters. He then encoded 30,000 records with each set of tags so that the sets could be compared in terms of size and how quickly they could be committed to a database.

The original data set, in a flat-file format, was 39.4 megabytes (MB), while the one with the shorter tags was 28.9 MB and the one with attributes was 27.3 MB. On a standard laptop with 4 gigabytes of working memory, it took 29 seconds to store the original data set in a database, 23 seconds to store the shortened set and 21 seconds to store the set with attributes.
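
The reported figures translate into percentage savings with a few lines of Python. The numbers below are the ones King reported; the helper function is hypothetical. Note that a 31 percent cut in tag text yields a smaller cut in file size, since the data values themselves are unchanged.

```python
def percent_savings(values: dict, baseline: str) -> dict:
    """Percentage reduction of each value relative to the baseline entry."""
    base = values[baseline]
    return {name: 100 * (1 - value / base) for name, value in values.items()}

# Figures as reported from King's test of 30,000 records per set.
tag_chars = {"original": 313, "short tags": 215, "attributes": 179}
file_mb = {"original": 39.4, "short tags": 28.9, "attributes": 27.3}
load_seconds = {"original": 29, "short tags": 23, "attributes": 21}

for label, figures in [("tag text", tag_chars), ("file size", file_mb), ("load time", load_seconds)]:
    savings = percent_savings(figures, "original")
    print(label, {name: round(pct) for name, pct in savings.items()})
# tag text {'original': 0, 'short tags': 31, 'attributes': 43}
# file size {'original': 0, 'short tags': 27, 'attributes': 31}
# load time {'original': 0, 'short tags': 21, 'attributes': 28}
```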

The only quirky finding King came across was that, once stored, the data set with attributes took up more space inside the database (20 MB) than either the data set with the shortened tags (11 MB) or the one with the original tags (14 MB).

King also noted that the most space-efficient way to store XML data in a database is to shred it: pull each data item out of its XML tags and store it in a separate database field. Using this approach, King found that all three data sets took exactly the same amount of space in the database, 8 MB. Shredding each data set took 11 minutes, however. So shredding saves space but consumes time, King noted.
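
Shredding can be sketched with Python's built-in sqlite3 and ElementTree modules. SQLite and the sales table layout are assumptions for illustration; the article does not say which database King used. The point the sketch shows is that only the values reach the table, which is why tag length stops mattering once data is shredded.

```python
import sqlite3
import xml.etree.ElementTree as ET

# Hypothetical records; the tag names never reach the database.
records_xml = (
    "<items>"
    "<item><sku>A100</sku><qty>3</qty><price>9.99</price></item>"
    "<item><sku>B200</sku><qty>1</qty><price>24.50</price></item>"
    "</items>"
)

def shred(xml_text: str, conn: sqlite3.Connection) -> int:
    """Pull each value out of its tags and store it in its own column."""
    conn.execute("CREATE TABLE IF NOT EXISTS sales (sku TEXT, qty INTEGER, price REAL)")
    rows = [
        (item.findtext("sku"), int(item.findtext("qty")), float(item.findtext("price")))
        for item in ET.fromstring(xml_text).iter("item")
    ]
    conn.executemany("INSERT INTO sales VALUES (?, ?, ?)", rows)
    conn.commit()
    return len(rows)

conn = sqlite3.connect(":memory:")
print(shred(records_xml, conn))  # 2
print(conn.execute("SELECT sku, qty FROM sales ORDER BY sku").fetchall())
# [('A100', 3), ('B200', 1)]
```

The parsing and per-field inserts are where the extra time goes, which mirrors the space-for-time trade-off King described.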

While slimming down the XML data itself can speed up XML processing, King also offered a number of other ways to speed XML data transfers. Vendors offer a wide range of XML accelerators, or appliances that strip the XML tags as data leaves the source and add them back in at the destination. This approach reduces the flexibility that comes with using XML, however. Another approach is to use JavaScript Object Notation (JSON), a lightweight data-interchange format.
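
For comparison, the sketch below converts a hypothetical flat XML record to compact JSON with Python's standard json module. JSON names each field once rather than in both a start tag and an end tag. This naive conversion keeps every value as a string and handles only one level of nesting.

```python
import json
import xml.etree.ElementTree as ET

# Hypothetical record using long, descriptive element names.
xml_record = (
    "<TodaysSalesItems><Sku>A100</Sku><Quantity>3</Quantity>"
    "<Price>9.99</Price></TodaysSalesItems>"
)

root = ET.fromstring(xml_record)
as_dict = {child.tag: child.text for child in root}  # one level only
json_record = json.dumps({root.tag: as_dict}, separators=(",", ":"))  # no spaces

print(json_record)
# {"TodaysSalesItems":{"Sku":"A100","Quantity":"3","Price":"9.99"}}
print(len(xml_record), len(json_record))  # 93 65
```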

Another thing to examine is whether the data is being validated against an XML schema and, if so, whether that is necessary. Validating an XML document checks that it has the structure defined by its schema. That is a valuable function when receiving data from other parties, but checking data that is only being moved about internally consumes unnecessary processing power.

King also advised looking at what is being transmitted and questioning whether all the elements need to be sent. For instance, a parts requisition system may not need to include a description of the item being ordered; only the part number might be necessary. In many cases, the complete set of information is sent only so that the markup can be validated, because XML parsers cannot validate a document unless the entire structure defined by its schema is in place.
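
Trimming a message down to the fields the receiver actually needs might look like the following ElementTree sketch. The requisition format and element names are hypothetical. Note the catch King describes: if the receiver validates against a schema that requires the description, the trimmed document will fail validation.

```python
import xml.etree.ElementTree as ET

# Hypothetical parts requisition with a verbose description.
requisition = (
    "<req><partNo>PN-4471</partNo>"
    "<desc>Hex bolt, stainless steel, 10 mm, pack of 50</desc>"
    "<qty>2</qty></req>"
)

def keep_only(xml_text: str, wanted: set) -> str:
    """Return the document with every top-level child not in `wanted` removed."""
    root = ET.fromstring(xml_text)
    for child in list(root):  # iterate over a copy because we remove while looping
        if child.tag not in wanted:
            root.remove(child)
    return ET.tostring(root, encoding="unicode")

print(keep_only(requisition, {"partNo", "qty"}))
# <req><partNo>PN-4471</partNo><qty>2</qty></req>
```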

"Being well-formed eats our lunch, in terms of performance," King said.

Posted on May 05, 2009 at 12:06 PM


Tweeting 101

We've been covering Twitter quite a bit recently, though, like the Internet itself, it is something best experienced personally. The good news is that, in its most basic incarnation, Twitter is about the easiest Web application you could ever use, even if it is, at first glance, a bit opaque to outsiders.

Here is how to get started.

First, go to Twitter's home page, find the "Get Started — Join!" box, and pick a user name and password. After you fill out and submit everything Twitter asks for, you are presented with your landing page. Bookmark that page; it's the one you'll want to return to whenever you want to check out Twitter.

In the middle of the page is a small box, which you can click and then fill in with whatever it is you want to say. Try to say something interesting. Keep in mind you have only 140 characters to work with (the page helpfully offers a counter that will let you know how many characters you have left after you start typing).

When you've finished your missive, hit "update." There you go: you've just tweeted.

For anyone to see this message, of course, you'll need a few "Followers." These are the people who can read your messages from their own Twitter accounts. The easiest way to get followers is to let people know you are on Twitter. Send out your Twitter name to colleagues and friends who are already using the service, so they can add you to the list of people they follow.

You'll also want to keep tabs on what other Twitterers are doing. When you "Follow" someone, their messages will appear on your Twitter landing page. At the top right of the page, you'll find a "Find People" search, which you can use to find other Twitter users by name and then add them to your list of people to follow. Also, check out the GovTwit site, which keeps a list of government personnel who are already tweeting.

And this is where the fun starts: As you follow more people and get more updates from your group, your landing page will start to resemble an ever-flowing stream of updates, giving you a near-real-time, completely personalized ticker feed of what your peers are up to. This can be useful for your job, or just plain entertaining, depending on the kind of folks you follow. (You'll have to keep hitting the browser's refresh button to get that ticker-tape effect.)

If you want to reply to someone else's message, or just send a message to their attention, put the recipient's Twitter handle, prefixed with an "@" sign, into the message. For example, "Hello! @Govcomputernews" would send a message to Govcomputernews. The recipient would see the message on his or her replies page (a link off the main landing page bearing the account holder's handle). Note: These reply messages can be seen by anyone who is following your account. You can also send a direct message that only the recipient can see, but the recipient needs to be following you before you can send that person a private message.

Another way to get the word out about what you are Twittering about is to insert hash tags into messages. A hash tag is a word describing what you are writing about, prefixed with the "#" symbol. That way, your message will be picked up by services that periodically search for widely used hash tags and aggregate the results, as well as by Twitter's own search engine.
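
For readers curious how services pick mentions and hash tags out of a message, here is a simplified Python sketch. The patterns are assumptions, not Twitter's own parsing rules; for example, they would also match the domain part of an e-mail address.

```python
import re

MAX_LENGTH = 140  # the character limit described above

# Simplified patterns (an assumption, not Twitter's rules): the symbol
# followed by letters, digits or underscores.
MENTION = re.compile(r"@(\w+)")
HASHTAG = re.compile(r"#(\w+)")

def inspect_tweet(text: str) -> dict:
    """Report whether a message fits and which handles and tags it contains."""
    return {
        "fits": len(text) <= MAX_LENGTH,
        "mentions": MENTION.findall(text),
        "hashtags": HASHTAG.findall(text),
    }

print(inspect_tweet("Hello! @Govcomputernews Reading up on Twitter #GovLoop"))
# {'fits': True, 'mentions': ['Govcomputernews'], 'hashtags': ['GovLoop']}
```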

For instance, Steve Ressler declared that "#GovLoop" would be the hash tag his government social networking site, GovLoop, would use to identify messages of interest to the site. The site's home page now has a link to the results of a real-time search of all the messages tagged with #GovLoop.

And that's about it. There are other aspects of running a Twitter account that could be worth attending to over time, such as setting up a private account or customizing your profile page, so check the Twitter Getting Started page for more tips.

Note: For a rundown of more advanced desktop clients that can be used with Twitter, check out the GCN guide.

Posted on Apr 14, 2009 at 7:23 AM


Who tweets in government?

It's official: Twitter has made it to the top ranks of government. Yesterday, the nation's top military officer, Navy Adm. Mike Mullen, chairman of the Joint Chiefs of Staff, started tweeting.

According to a survey by research firm CommStat, almost 10 million people were using the microblogging service as of February. While 10 million is still small potatoes compared to, say, Facebook's small nation of 200 million users, Twitter's increase in users is quite impressive, up 700 percent from a year ago.

Moreover, "Tweeting" is not just something that the kids are doing: The majority of Twitter users are older than 35.

Who tweets in government? A surprising number of folks, and they are using it for all sorts of reasons. As part of its Twitter Grader service, marketing firm HubSpot compiles the numbers of Twitter users in major cities around the globe. If these stats are accurate, D.C. seems to be catching on to the phenomenon, right behind the digerati in other tech-savvy cities such as London, San Francisco, and Austin, Texas.

"We never thought we'd say this, but some of the best and most innovative new media experiments going on right now on the Internet are coming from the U.S. federal government," Business Insider's Silicon Valley Insider blog gushed not too long ago.

We suspect that Mullen himself will not be tweeting his deepest, darkest middle-of-the-night worries about, say, Iran or North Korea. Thus far, his tweets seem to be official announcements from the Joint Chiefs of Staff office. And, given Mullen's daily schedule (we understand he is touring India and Pakistan at the moment), we wouldn't be surprised if a ghostwriter were actually posting them, a favored tactic among busy celebrities.

But official use works, too. The format is flexible enough that it can be used for a wide variety of communiqués — from the official to the personal. As long as it can be stated in 140 characters or less, it can be added into the ever-flowing stream of tweets that define the day.

The Defense Department is not the first agency to use Twitter. NASA's Jet Propulsion Laboratory found the service a great way to provide updates on its Mars missions; more than 43,000 Twitter users have signed up for updates from the @marsphoenix account. Elsewhere, the National Institutes of Health posts health-related dispatches. The U.S. Geological Survey posts earthquake and tsunami warnings, while the Food and Drug Administration posts updates on food recalls.

At the state and local level, the Los Angeles Fire Department posts updates about its calls on Twitter. Both the New York State office of the Chief Information Officer and the Utah State Library Digital Library Services post about the latest computer services that their states offer, to offer two examples.

Congress has taken to Tweeting in a major way, with 19 senators and 50 members of the House of Representatives using the service (two volunteer-run sites, Congressional 140 and TweetCongress, capture the latest posts from all these elected officials).

Twitter is also used by government personnel as a way to explore and document their respective fields. Bev Godwin, the director of USA.gov and the White House's new-media guru, used the format to report on some of the sessions at the recent Government 2.0 Camp. NASA astronaut Mike Massimino is chronicling his training for the fifth and final space shuttle mission to service the Hubble Space Telescope, which will fly aboard Atlantis, according to NASA. And Dan Mintz, the former chief information officer of the Transportation Department, is a dedicated observer of all things tech through his feed.

And this is just the tip of the seemingly ever-growing iceberg. BearingPoint's Steve Lunceford compiles a list of Twitter users in government agencies and the government contracting community, called GovTwit. At last count, the list had 1,060 names.

Steve Ressler, who runs the GovLoop government-oriented social networking site, started a discussion thread asking everyone to name their favorite government bloggers. Increasingly, conversations that used to take place by e-mail or around the water cooler are ending up on Twitter.

"The main source of most of our new traffic is from Twitter," Ressler commented in an interview with GCN affiliate publication Federal Computer Week.

Posted on Apr 09, 2009 at 1:48 PM


IPv4 addresses dwindle by the day

The available supply of IPv4 Internet addresses for the United States, Canada and the North Atlantic region is expected to be exhausted within roughly the next two years, based on current projections.

That's according to data supplied by the American Registry for Internet Numbers (ARIN) and a widget now freely available from Intec NetCore.

The widget, which can be found in the right navigation panel of GCN.com's IPv6 portal page, displays the number of days remaining before ARIN is expected to run out of available IPv4 addresses. As of March 30, the number of days had dwindled to 786. ARIN is one of the five regional Internet registries in the world.
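
The widget's count translates into a calendar date with simple date arithmetic. This Python sketch assumes the count falls by exactly one per day from the March 30 reading; real projections shift as allocation rates change.

```python
from datetime import date, timedelta

as_of = date(2009, 3, 30)
days_remaining = 786  # widget reading as of March 30, 2009

# Assumption: the countdown drops by exactly one each day.
projected = as_of + timedelta(days=days_remaining)
print(projected)  # 2011-05-25
```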

Of course, there is a virtually limitless supply of new IPv6 addresses. However, because users in the United States have grown accustomed to an abundance of the old-format addresses, most organizations have taken only tentative steps in preparing for the longer and more versatile IPv6 addresses.
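
Python's standard ipaddress module illustrates the gap between the two address spaces. These totals include reserved and special-purpose ranges, so they overstate the addresses actually assignable; 2001:db8::1 comes from the range reserved for documentation.

```python
import ipaddress

# Every possible address, including reserved and special-purpose ranges.
ipv4_total = ipaddress.ip_network("0.0.0.0/0").num_addresses
ipv6_total = ipaddress.ip_network("::/0").num_addresses

print(f"IPv4: {ipv4_total:,} addresses (2**32)")
print(f"IPv6: {ipv6_total:.3e} addresses (2**128)")
print(f"IPv6 space is 2**{(ipv6_total // ipv4_total).bit_length() - 1} times larger")

# The longer IPv6 format, fully written out.
print(ipaddress.ip_address("2001:db8::1").exploded)
# 2001:0db8:0000:0000:0000:0000:0000:0001
```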

Those organizations may find themselves at a sudden global disadvantage when the market swings toward a more fully functional IPv6 era.

Developing and maintaining systems that can efficiently process both address formats will ultimately prove costly, says Brad Boston, Cisco Systems' senior vice president of global government solutions.

He notes that organizations and government agencies — especially those who depend on global supply chain logistics to run their operations — may one day find themselves becoming IPv4 islands in an IPv6-driven world.

In the meantime, GCN readers can keep an eye on the approaching day of reckoning.

Posted on Mar 30, 2009 at 2:25 PM


Recrunching the numbers on Mac procurement

Last month, GCN covered the issue of using Apple Macintosh computers in the workplace. We considered both the supposed pros (security, ease of use) and the cons (interoperability and the price premium) of using the Macs.

Thanks to the Web, interoperability has become all but a non-issue over the past few years, but pricing remains a concern, real or imagined. In one of its government marketing Twitter feeds, Microsoft made hay with a Computerworld article stating that Mac sales had dropped by 17 percent, a drop one analyst attributed to the computers' premium prices.

But enough evidence exists to suggest that the truism that Macs always cost more than PCs does not always hold. In some cases, the Mac may be the best deal.

In the GCN article, we compared the cost of a few Macs with that of a few similarly outfitted laptops on the General Services Administration's Schedule 70 for government IT purchases. It was a quick, informal comparison, one that pitted a $1,900 MacBook Pro against a $738 Acer Aspire 5315.

One government IT manager, who wishes to remain on background, e-mailed that our comparison was hardly apples-to-apples (or Apple, as the case may be). The high-end MacBook Pro, with its 15-inch screen, backlit keyboard, Intel Core 2 Duo processor, 2 gigabytes of random access memory and Nvidia GeForce 9600M graphics processing unit, is simply a much more powerful machine than the budget-minded Acer model, which is heavier, has a smaller screen and runs on a slower Intel Celeron processor. It is a "very big difference," he wrote, echoing more than a few reader comments.

If we wished to compare budget laptops, he suggested, a more apt comparison might be between the Acer and a basic MacBook, which is priced starting just below $1,000, not including any GSA discount. It's still pricier than the Acer, but not so much that the Apple polish couldn't justify the additional cost.

For additional consideration, our source also offered a very telling pair of comparisons between Apple laptops and Microsoft Windows-based devices. He compared list prices for two pairs of machines: mid-priced workhorse models and high-end models.

For this comparison, he chose the Dell Latitude line of laptops as the closest Windows equivalent to the Apple MacBook, on the basis that the Latitudes are generally pretty sturdy.

In the mid-priced market, he found that a white-clamshell MacBook with a 2 GHz Intel Core 2 Duo processor would run $1,248, while a Dell Latitude E4300 with a similar specification and a 2.26 GHz Intel processor would actually cost more, $1,690. In other words, in some instances, an Apple laptop may actually cost less than a similarly configured Windows-based one.

On the other hand, at the high end, our source found that Apple is still more than willing to help you spend your money. A MacBook Pro based on a 2.4 GHz Intel Core 2 Duo processor and configured for the power user could cost about $2,348, while a roughly equivalent (though arguably less elegant) Dell Latitude E6500 with a 2.26 GHz Intel processor would run only $1,356.

In this case, you can get more computer for fewer dollars with Dell.
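
The two pairings reduce to simple subtraction, sketched here in Python with the list prices our source reported. The helper function is hypothetical and ignores discounts, options and shipping.

```python
def cheaper_option(mac: str, mac_price: int, dell: str, dell_price: int) -> tuple:
    """Return (name of the cheaper laptop, price difference in dollars).

    Ties are reported as the Dell, an arbitrary choice."""
    gap = mac_price - dell_price
    return (mac if gap < 0 else dell), abs(gap)

# List prices as reported by our source.
comparisons = {
    "mid-priced": ("MacBook", 1248, "Dell Latitude E4300", 1690),
    "high-end": ("MacBook Pro", 2348, "Dell Latitude E6500", 1356),
}

for tier, models in comparisons.items():
    name, difference = cheaper_option(*models)
    print(f"{tier}: {name} is cheaper by ${difference:,}")
# mid-priced: MacBook is cheaper by $442
# high-end: Dell Latitude E6500 is cheaper by $992
```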

Both the MacBooks and the Latitudes in his comparison had similar hardware configurations in terms of memory, processor speed and the like, and both products come with a three-year warranty.

Of course, comparing prices between competing products can be a never-ending game, and can be rigged in no end of ways. If you're enduring a slow day, why not do some of your own comparisons across both the Dell and Apple sites? As always, common sense holds true here: To get the most from your (or the taxpayer's) money, you have to understand your requirements very well. How fast do you really need your processor to be? How much memory do you need? How much disk space? How rugged does it need to be? How light? These comparisons do provide enough evidence, though, to start rethinking the idea that Apple Macs are always the more expensive choice.

Posted on Mar 18, 2009 at 1:46 PM


Kundra created a buzz

Last Thursday, the World Wide Web Consortium kicked off a workshop for establishing e-gov standards for information sharing. There, attendee buzz was all around a fresh catchphrase: open-government data.

The phrase distilled an idea that the new federal chief information officer, Vivek Kundra, has had for government agencies: exposing more of their data for public use. And everyone was discussing it. One attendee asked how agencies would prepare the data in such a way that it could be useful to others. Maybe all government employees should have their own blogs so they could describe how they are fulfilling the agency mission, another suggested.

Kundra's vision was broad and would require a lot of work. But many federal managers — at least those charged with keeping and extending government data — were clearly passionate about making the whole idea of government data feeds work.

Now that vision may be evaporating as quickly as it arrived.

It was only two weeks ago that Kundra, formerly the chief technology officer for Washington, D.C., was named the first federal CIO for the U.S. government. Judging from his work in Washington, he had more than a few innovative ideas about how government could better use IT — from cloud computing and open data feeds to better reporting tools for displaying such data.

A week later and just a few hours after his keynote at the FOSE IT expo, the FBI arrested a member of Kundra's D.C. office on charges of bribery. Although Kundra was not charged, the Obama administration placed him on a leave of absence until the scope of the criminal case could be better understood.

Even if Kundra is cleared of malfeasance, it remains an open question whether he will be returned to the federal CIO post. "The fact that Kundra himself is not accused of any wrongdoing is beside the point," Federal Computer Week news editor Michael Hardy pointed out. "As the District of Columbia’s chief technology officer since 2007, he was responsible for the actions of his 300 employees."

For an administration harping on government transparency and oversight, the fact that these shenanigans took place in Kundra's office may be reason enough to look elsewhere to fill the post.

Whatever the outcome for Kundra, it would be a pity to lose his ideas, even if they require additional work on the part of IT managers. As Kundra pointed out, the federal government has fallen behind the commercial sector when it comes to deploying new technologies, and this disparity leads to both greater costs and poorer service to citizens. Opening government data and making greater use of commercial cloud technologies are both valid ideas that could close that gap.

Update (3/17/09): The White House has reappointed Kundra as CIO, according to sources.

Posted on Mar 16, 2009 at 1:46 PM

