<?xml version="1.0" encoding="UTF-8"?>
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	xmlns:atom="http://www.w3.org/2005/Atom"
	xmlns:sy="http://purl.org/rss/1.0/modules/syndication/"
	xmlns:slash="http://purl.org/rss/1.0/modules/slash/"
	>

<channel>
	<title>Kenneths Blob &#187; Computers</title>
	<atom:link href="http://blog.langly.org/category/computers/feed/" rel="self" type="application/rss+xml" />
	<link>http://blog.langly.org</link>
	<description>My rants about everything</description>
	<lastBuildDate>Wed, 28 Apr 2010 18:10:10 +0000</lastBuildDate>
	<generator>http://wordpress.org/?v=2.9.2</generator>
	<language>en</language>
	<sy:updatePeriod>hourly</sy:updatePeriod>
	<sy:updateFrequency>1</sy:updateFrequency>
			<item>
		<title>Rant on Programming.</title>
		<link>http://blog.langly.org/2010/04/28/rant-on-programming/</link>
		<comments>http://blog.langly.org/2010/04/28/rant-on-programming/#comments</comments>
		<pubDate>Wed, 28 Apr 2010 15:08:44 +0000</pubDate>
		<dc:creator>kenneo</dc:creator>
				<category><![CDATA[Computers]]></category>

		<guid isPermaLink="false">http://blog.langly.org/?p=499</guid>
		<description><![CDATA[As both an avid programmer and a computer engineer, I see more and more informatics students finishing their bachelor&#8217;s or master&#8217;s degree without having the faintest idea about how a very basic CPU works. And by this I don&#8217;t mean that they don&#8217;t know how to write HDL, or how caches work. It&#8217;s that the closest ]]></description>
			<content:encoded><![CDATA[<p>As both an avid programmer and a computer engineer, I see more and more informatics students finishing their bachelor&#8217;s or master&#8217;s degree without having the faintest idea about how a very basic CPU works. And by this I don&#8217;t mean that they don&#8217;t know how to write HDL, or how caches work. It&#8217;s that the closest thing to a CPU that the students encounter is the Java Virtual Machine, or the CLI from Microsoft. Now, I&#8217;m not going to start the entire &#8220;Which language should we teach the Students(tm)&#8221; discussion all over again, although I agree with Patterson on this matter (Down-Up).</p>
<p>However, whichever programming language you learn in your freshman Computer Science track, I see no reason not to combine it with a basic introduction to computer architecture. Whatever your future specialization, having a basic idea of how a computer works is really not a silly thing at all, at least at the level where you know that there are registers, memory and special flags. Furthermore, modelling a CPU lends itself very well to implementation in a high level language, at least if you are interested in a high level functional simulator. Writing such a piece of code can be as easy as you want, the code is easily made modular, and due to the nature of a processor or a virtual machine, it lets you cover all the introductory concepts of programming. Your different functional units would be classes, the logic needs your basic control structures, and parsing the input program needs basic string processing.</p>
<p>Now, as always, to check if my hypothesis was correct, I took 5 minutes to write a simple model of a virtual machine / processor. Currently it supports 4 instructions (add, put/load imm, prt, end), so not all that much, but I think it illustrates the point. More importantly, the code base itself is more or less 200 lines of Java (lending itself nicely to an exercise), and I have tried to use a few different concepts, such as basic control structures and OOP, all of which you would find in your introduction to programming course. All without becoming overly complex.</p>
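<p>To give an idea of how little is needed, here is a minimal sketch of such a machine. Note that this is not the code from the repository linked below: the class name TinyVm, the register names r0 to r3 and the textual instruction syntax are assumptions made up for illustration. It only mirrors the four instructions mentioned above (put, add, prt, end).</p>
<pre>
```java
// Hypothetical sketch of a four-instruction machine; not the author's java-cpu code.
final class TinyVm {
    private final int[] regs = new int[4];                  // register file r0..r3
    private final StringBuilder out = new StringBuilder();  // everything "prt" has printed

    // Runs a program, one instruction per array element, and returns the printed output.
    static String run(String[] program) {
        TinyVm vm = new TinyVm();
        for (String line : program) {
            if (!vm.step(line.trim())) {
                break;                                      // "end" halts the machine
            }
        }
        return vm.out.toString();
    }

    // Decodes and executes one instruction; returns false when the machine should halt.
    private boolean step(String line) {
        String[] f = line.split("\\s+");
        switch (f[0]) {
            case "put":                                     // put rD imm   : rD = imm
                regs[reg(f[1])] = Integer.parseInt(f[2]);
                return true;
            case "add":                                     // add rD rA rB : rD = rA + rB
                regs[reg(f[1])] = regs[reg(f[2])] + regs[reg(f[3])];
                return true;
            case "prt":                                     // prt rS       : print rS
                out.append(regs[reg(f[1])]).append('\n');
                return true;
            case "end":                                     // end          : stop
                return false;
            default:
                throw new IllegalArgumentException("unknown instruction: " + line);
        }
    }

    // Maps a register name such as "r2" to its index in the register file.
    private static int reg(String name) {
        return Integer.parseInt(name.substring(1));
    }
}
```
</pre>
<p>Feeding the program put r0 2, put r1 3, add r2 r0 r1, prt r2, end through TinyVm.run prints 5. The switch is the control structure, the register file is the state, and String.split is the string processing.</p>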
<p>Check the code out from git using: <a href="http://git.langly.org/java-cpu">http://git.langly.org/java-cpu</a>, or just point your browser to it.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.langly.org/2010/04/28/rant-on-programming/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>On GPU versus CPU cores.</title>
		<link>http://blog.langly.org/2009/11/17/gpu-vs-cpu-cores/</link>
		<comments>http://blog.langly.org/2009/11/17/gpu-vs-cpu-cores/#comments</comments>
		<pubDate>Tue, 17 Nov 2009 08:33:00 +0000</pubDate>
		<dc:creator>kenneo</dc:creator>
				<category><![CDATA[Computers]]></category>
		<category><![CDATA[GPU]]></category>
		<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Computer Architecture]]></category>

		<guid isPermaLink="false">http://blog.langly.org/?p=485</guid>
		<description><![CDATA[In the confusion surrounding the number of cores on a GPU versus the number of cores on a CPU, I want to contribute my part and do some clearing up.
Ever since modern GPUs gained support for general purpose calculations, it has been pushed that they contain several hundred &#8220;Cores&#8221;, orders of magnitude more than the number ]]></description>
			<content:encoded><![CDATA[<p>In the confusion surrounding the number of cores on a GPU versus the number of cores on a CPU, I want to contribute my part and do some clearing up.</p>
<p>Ever since modern GPUs gained support for general purpose calculations, it has been pushed that they contain several hundred &#8220;Cores&#8221;, orders of magnitude more than the number of &#8220;Cores&#8221; you can find in a regular CPU, which these days is about 2-4 depending on your CPU model. Now, this is due to marketing only, and it has bothered me for a while to see how academics and computer engineers have picked up the term core and use it relentlessly. </p>
<p>Now, the problem arises from two major facts. First, what the general-purpose CPU manufacturers (Intel, AMD etc.) call a &#8220;Core&#8221; and what the GPU manufacturers (Nvidia, AMD/ATI) call a core are in fact two similar, but different things. A modern GPU consists of several &#8220;Cores&#8221;, as seen in the following illustration of the new Fermi architecture:</p>
<p><img src="http://techreport.com/r.x/nvidia-fermi/cuda-core.gif" alt="Fermi Architecture" /></p>
<p>What you can see in this illustration is that the GPU core is a scaled down version of what the CPU manufacturers would call an ALU, or a functional unit. Now, compare this to the microarchitecture of a Core2 chip, and check out the yellow boxes. If you want to compare the number of real cores, these are the ones you have to look at. Furthermore, it&#8217;s important to remember that the Core2 even has a vector unit, which can multiply / add several operands at once.</p>
<p><img src="http://upload.wikimedia.org/wikipedia/commons/thumb/6/60/Intel_Core2_arch.svg/518px-Intel_Core2_arch.svg.png" width="400" alt="Core2" /></p>
<p>Thus, a fairer comparison might be the number of multiprocessors in the GPU versus the number of cores on a CPU. In the newest Fermi architecture this is 16, versus the 4 cores on a quad core processor. </p>
<p>However, this is still an unfair comparison. The reason is the type of applications the different architectures are optimized for. Needless to say, the GPU is optimized for graphics processing and stream processing, which in turn is just churning out data with fairly regular behaviour and memory accesses. Thus, the complexity of the GPU has been scaled down compared to that of a CPU, which has to perform well on a much wider range of applications. Hence, the CPU has to spend a lot more resources / gates on control structures, leaving its ratio of control gates to calculation gates much higher than that of a GPU. This, again, leads to the huge difference in the number of &#8220;cores&#8221; between the CPU and the GPU. </p>
<p>As a sidenote, there are still a lot of applications where the CPU outperforms the GPU <img src='http://blog.langly.org/wp-includes/images/smilies/icon_wink.gif' alt=';)' class='wp-smiley' /> </p>
]]></content:encoded>
			<wfw:commentRss>http://blog.langly.org/2009/11/17/gpu-vs-cpu-cores/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Zsh &#8211; Skipping words</title>
		<link>http://blog.langly.org/2009/10/22/zsh-skipping-words/</link>
		<comments>http://blog.langly.org/2009/10/22/zsh-skipping-words/#comments</comments>
		<pubDate>Thu, 22 Oct 2009 08:29:23 +0000</pubDate>
		<dc:creator>kenneo</dc:creator>
				<category><![CDATA[Tips&Tricks]]></category>
		<category><![CDATA[shell]]></category>
		<category><![CDATA[tricks]]></category>
		<category><![CDATA[zsh]]></category>

		<guid isPermaLink="false">http://blog.langly.org/?p=482</guid>
		<description><![CDATA[A couple of weeks ago I installed zsh on all of my shell accounts, and I&#8217;ve started to grow fond of it. However, one thing that annoyed me is that by default I couldn&#8217;t press ctrl+arrows to jump back and forth between words like I could in bash.
However, the solution was quite easy as soon ]]></description>
			<content:encoded><![CDATA[<p>A couple of weeks ago I installed zsh on all of my shell accounts, and I&#8217;ve started to grow fond of it. However, one thing that annoyed me is that by default I couldn&#8217;t press ctrl+arrows to jump back and forth between words like I could in bash.</p>
<p>However, the solution was quite easy as soon as I read the manual. First, in your terminal press ctrl+arrow, and copy the code that appears on your terminal. In my case it was &#8220;;5D&#8221; and &#8220;;5C&#8221;.</p>
<p>Then in your .zshrc file put:</p>
<div class="codesnip-container" >bindkey ";5D" backward-word<br />
bindkey ";5C" forward-word</div>
<p>That should do it.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.langly.org/2009/10/22/zsh-skipping-words/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Fermi Architecture</title>
		<link>http://blog.langly.org/2009/10/01/fermi-architecture/</link>
		<comments>http://blog.langly.org/2009/10/01/fermi-architecture/#comments</comments>
		<pubDate>Thu, 01 Oct 2009 08:02:14 +0000</pubDate>
		<dc:creator>kenneo</dc:creator>
				<category><![CDATA[GPU]]></category>
		<category><![CDATA[Architecture]]></category>
		<category><![CDATA[Computer Architecture]]></category>
		<category><![CDATA[Computing]]></category>

		<guid isPermaLink="false">http://blog.langly.org/?p=479</guid>
		<description><![CDATA[Some new and interesting articles about the new Fermi architecture from nvidia:
http://www.realworldtech.com/page.cfm?ArticleID=RWT093009110932&#38;p=1
and
http://techreport.com/articles.x/17670
Quite interesting to see how they are turning back to a more G80ish architecture again.
]]></description>
			<content:encoded><![CDATA[<p>Some new and interesting articles about the new Fermi architecture from nvidia:</p>
<p><a title="Inside Fermi: Nvidia's HPC Push" href="http://www.realworldtech.com/page.cfm?ArticleID=RWT093009110932&amp;p=1" target="_blank">http://www.realworldtech.com/page.cfm?ArticleID=RWT093009110932&amp;p=1</a></p>
<p>and</p>
<p><a href="http://techreport.com/articles.x/17670">http://techreport.com/articles.x/17670</a></p>
<p>Quite interesting to see how they are turning back to a more G80ish architecture again.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.langly.org/2009/10/01/fermi-architecture/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>CUDA: Hacking PTX code.</title>
		<link>http://blog.langly.org/2009/02/12/cuda-hacking-ptx-code/</link>
		<comments>http://blog.langly.org/2009/02/12/cuda-hacking-ptx-code/#comments</comments>
		<pubDate>Thu, 12 Feb 2009 09:24:00 +0000</pubDate>
		<dc:creator>kenneo</dc:creator>
				<category><![CDATA[GPU]]></category>
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">gpu/12022009-ptx</guid>
		<description><![CDATA[
In order to provide the CUDA developer with a low level programming
language without exposing any of the underlying instruction set, NVIDIA
have given us, the developers, the option to program in PTX ( Parallel
Thread eXecution ). PTX, being somewhat similar to &#8220;assembly code&#8221; in
structure, opens up a new set of features to the developer, which ]]></description>
			<content:encoded><![CDATA[<p>
In order to provide the CUDA developer with a low level programming<br />
language without exposing any of the underlying instruction set, NVIDIA<br />
have given us, the developers, the option to program in PTX ( Parallel<br />
Thread eXecution ). PTX, being somewhat similar to &#8220;assembly code&#8221; in<br />
structure, opens up a new set of features to the developer, which in<br />
certain cases might be useful to take advantage of. One case which I use<br />
a lot in my daily work is the ability to time blocks of code internally<br />
within a thread using the %clock register (somewhat like the Time Stamp<br />
Counter on x86), which is not exposed through the CUDA high level<br />
language.
</p>
<p>
Although useful, the documentation is rather poor. Let me rephrase that.<br />
The PTX code itself is pretty well documented in the Nvidia SDK<br />
documentation, in the CUDA/docs/ptx_1.x.pdf file, with everything you<br />
need to know about the instruction format. However, its application is<br />
poorly documented in the documentation of the NVIDIA CUDA compiler (<br />
nvcc.pdf ), and thus I thought I could be so kind as to provide you with<br />
a small hands-on tutorial.
</p>
<p>
First, what I&#8217;ve found works best is to do some cheating, and let the<br />
compiler itself create a skeleton framework for me. This allows me to<br />
rapidly start developing the PTX code, without the boring part where I<br />
have to create all the auxiliary files by hand. What I usually do is to<br />
write a small skeleton .cu file, where I just create an empty __global__<br />
function with the correct parameters. Hence my initial skeleton file<br />
would look something like:
</p>
<p><div class="codesnip-container" >
<div class="c codesnip" style="font-family:monospace;"><span class="coMULTI">/* Cu-code */</span><br />
<span class="co2">#include &lt;cuda.h&gt;</span></p>
<p>__global__ <span class="kw4">void</span> zeroKernel<span class="br0">&#40;</span><span class="kw4">int</span> <span class="sy0">*</span>in<span class="sy0">,</span> <span class="kw4">int</span> <span class="sy0">*</span>out<span class="br0">&#41;</span><span class="br0">&#123;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; out<span class="br0">&#91;</span>threadIdx.<span class="me1">x</span><span class="br0">&#93;</span> <span class="sy0">=</span> <span class="nu0">0</span><span class="sy0">;</span><br />
<span class="br0">&#125;</span></p>
<p><span class="kw4">int</span> main<span class="br0">&#40;</span><span class="br0">&#41;</span><span class="br0">&#123;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="coMULTI">/** Set up **/</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; zeroKernel<span class="sy0">&lt;&lt;&lt;</span>grid<span class="sy0">,</span> threads<span class="sy0">&gt;&gt;&gt;</span><span class="br0">&#40;</span>foo<span class="sy0">,</span> bar<span class="br0">&#41;</span><span class="sy0">;</span><br />
&nbsp; &nbsp; &nbsp; &nbsp; <span class="coMULTI">/** Tear down **/</span><br />
<span class="br0">&#125;</span></div>
</div>
<p>
 I would then run nvcc with the command <i>&#8220;nvcc main.cu --ext=all<br />
--dir=a.out.devcode&#8221;</i> in order to have it create the necessary files for me.<br />
Some explanation is needed though. One very useful feature of the CUDA runtime<br />
library is the support for what they call code repositories. During execution,<br />
the CUDA binary will check its current directory for a sub directory, and look<br />
in it for child directories containing a cubin file. If the executable finds a<br />
file matching its kernel, it will use the one from the code repository instead<br />
of the one embedded in its own binary file. The matching cubin file for the<br />
kernel can be seen here:
 </p>
<p><pre>
// cubin
architecture {sm_10}
abiversion   {1}
modname      {cubin}
code {
    name = _Z4testPiS_
    lmem = 0
    smem = 24
    reg  = 3
    bar  = 0
    const {
            segname = const
            segnum  = 1
            offset  = 0
            bytes   = 4
        mem {
            0x00000004
        }
    }
    bincode {
        0x00000005 0x60004780 0x30010209 0xc4100780
        0x1000ca05 0x0423c780 0x60040005 0x00000003
        0xd00e0209 0xa0c00781
    }
}
</pre>
</p>
<p>
The cubin file is the executable file, and keeps all the information<br />
needed by the binary application in order to execute. It also contains<br />
the kernel code in the <b>CODE</b> section of the cubin file itself. Quite<br />
nifty. For those of you especially interested in the binary format<br />
itself, Wladimir J. van der Laan has created an assembler / disassembler<br />
for the G80 architecture[1], which is worth reading if you want to learn<br />
more about the true instruction set of the NVIDIA G80.
</p>
<p>
Besides the .cubin file, there should be a couple of files named comp_10 or<br />
comp_12, depending on which architecture you compiled the<br />
original .cu file for. This file contains the PTX code for you to<br />
start coding in, although with some extras such as debug<br />
directives and various other lines of unneeded code. The following<br />
listing shows how the zeroKernel looks when compiled<br />
into PTX, minus the crud:
</p>
<p><pre>
/**
	PTX code
**/
.version 1.3

.entry _Z4testPiS_
{
    .reg .u16 %rh<3>;
    .reg .u32 %r<6>;
    .param .u32 __cudaparm__Z4testPiS__in;
    .param .u32 __cudaparm__Z4testPiS__out;
    .loc    14  5   0
	$LBB1__Z4testPiS_:
    .loc    14  6   0

    mov.u32     %r1, 0;

    ld.param.u32    %r2, [__cudaparm__Z4testPiS__out];
    mov.u16     %rh1, %tid.x;
    mul.wide.u16    %r3, %rh1, 4;
    add.u32     %r4, %r2, %r3;

    st.global.u32   [%r4+0], %r1;
    .loc    14  7   0
    exit;
$LDWend__Z4testPiS_:
}
</pre>
</p>
<p>
The given PTX code is the one that you can modify for your own purpose.<br />
Hence an easy check to make sure that the tool chain works is to change<br />
the <i>&#8220;mov.u32 %r1, 0;&#8221;</i> to <i>&#8220;mov.u32 %r1, 0xDEADBEEF;&#8221;</i>, which should give a<br />
different output from your kernel. When done modifying the kernel,<br />
you can run <i>&#8220;ptxas -o sm10&#8221;</i>, which will give you an updated version of the cubin<br />
file itself. Careful though, ptxas targets the sm_10 architecture<br />
by default, so if your GPU/Tesla supports a different architecture you<br />
have to set this with the -arch sm_XX option.
</p>
<h3>Links: </h3>
<p>
[1] <a href="http://www.cs.rug.nl/~wladimir/decuda/">Decuda</a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.langly.org/2009/02/12/cuda-hacking-ptx-code/feed/</wfw:commentRss>
		<slash:comments>2</slash:comments>
		</item>
		<item>
		<title>First OGP post</title>
		<link>http://blog.langly.org/2008/02/19/first-ogp-post/</link>
		<comments>http://blog.langly.org/2008/02/19/first-ogp-post/#comments</comments>
		<pubDate>Tue, 19 Feb 2008 21:00:00 +0000</pubDate>
		<dc:creator>kenneo</dc:creator>
				<category><![CDATA[GPU]]></category>
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">ogp/firstpost</guid>
		<description><![CDATA[
Seeing how I have lately grown more and more involved in the OpenGraphics project, I&#8217;ve started a
separate blog category for it.  For those unaware of the OpenGraphics
project, we&#8217;re trying to create an open graphics board. At the moment
we&#8217;re close to having the Verilog down for the two main chips, just some
infrastructure and testing left.


And ]]></description>
			<content:encoded><![CDATA[<p>
Seeing how I have lately grown more and more involved in the <a href="http://opengraphics.org">OpenGraphics</a> project, I&#8217;ve started a<br />
separate blog category for it.  For those unaware of the OpenGraphics<br />
project, we&#8217;re trying to create an open graphics board. At the moment<br />
we&#8217;re close to having the Verilog down for the two main chips, just some<br />
infrastructure and testing left.
</p>
<p>
And what&#8217;s a blog entry about hardware without a screenshot? I still<br />
enjoy watching the RTL Schematics output from ISE, even after a couple<br />
of projects <img src='http://blog.langly.org/wp-includes/images/smilies/icon_smile.gif' alt=':)' class='wp-smiley' /> </p>
<p>
<a href="http://imgs.langly.org/main.php?g2_view=core.DownloadItem&#038;g2_itemId=2115&#038;g2_serialNumber=2" rel="lightbox"><br />
<img src="http://imgs.langly.org/main.php?g2_view=core.DownloadItem&#038;g2_itemId=2115&#038;g2_serialNumber=2" height="200"><br />
</a><br />
<br />
An I<sup>2</sup>C master I&#8217;m working on between writing tests.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.langly.org/2008/02/19/first-ogp-post/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>The GPU Pipeline. ( GPU #1 )</title>
		<link>http://blog.langly.org/2008/02/03/the-gpu-pipeline-gpu-1/</link>
		<comments>http://blog.langly.org/2008/02/03/the-gpu-pipeline-gpu-1/#comments</comments>
		<pubDate>Sun, 03 Feb 2008 20:31:00 +0000</pubDate>
		<dc:creator>kenneo</dc:creator>
				<category><![CDATA[GPU]]></category>
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">gpu/gpu-intro1</guid>
		<description><![CDATA[
A couple of weeks ago, I posted an entry about me pondering writing
a series of articles on GPUs, and on topics related to GPUs such as
graphics. This first article will be a gentle introduction to the world
of graphics, trying to give you, the reader, a bird&#8217;s-eye view of the
entire pipeline. But first some general rules ]]></description>
			<content:encoded><![CDATA[<pre>
<< Work in progress >>
<< Images to come >>
</pre>
<p>
A couple of weeks ago, I posted an entry about me pondering writing<br />
a series of articles on GPUs, and on topics related to GPUs such as<br />
graphics. This first article will be a gentle introduction to the world<br />
of graphics, trying to give you, the reader, a bird&#8217;s-eye view of the<br />
entire pipeline. But first some general rules about this series of<br />
articles. Unlike others, I won&#8217;t fall into the trap of saying that<br />
not a lot of math is required to understand the GPU. The GPU and 3D<br />
world is based upon mathematical principles, and thus a lot of the<br />
behaviour is best described using mathematics. I believe I shall manage<br />
to steer clear of mathematics in this introduction, but if you seriously<br />
want to learn how the GPU works, I strongly suggest brushing up on your<br />
linear algebra.
</p>
<p>
Don&#8217;t panic!
</p>
<p>
Most modern computers nowadays have a special processor for dealing with<br />
graphics, be it 2D or 3D graphics. The reason why is two-fold:<br />
first, it allows the CPU to concentrate on the more important work<br />
at hand, and second, it allows a more specialized processor than the<br />
generalized CPU. While today&#8217;s CPUs operate at core frequencies of around<br />
4 GHz, GPUs can still beat them at certain tasks while operating at a mere<br />
1.4 GHz. This is mainly due to the GPU being a highly specialized device, and to it<br />
having a lot of simple cores. Well, simple compared to the CPU core.<br />
While this specialization is great for certain domains, e.g. graphics,<br />
it decreases the performance in other fields. To begin to understand why, a<br />
fundamental understanding of the graphics pipeline is important.
</p>
<p>
When a programmer writes a 3D application, he does so by creating a<br />
geometrical object, giving it a place in the world and then applying a<br />
texture, before it appears on the monitor. Simplified, this gives us 4<br />
steps: create geometrical object => place it => texture => monitor,<br />
which is reflected in both the OpenGL and DirectX pipelines. In more<br />
technical terminology this is called Vertexes => Vertex Shader =><br />
Fragment Shader => Monitor. This simplified model leaves out a<br />
fair share of important details between the different steps, but works<br />
for now.
<p>
In order to describe a shape to appear in the scene, the developer sends<br />
the GPU a series of 4-component vectors containing its position, X, Y, Z<br />
and a special H component (usually called W) describing whether it is a point. Being a<br />
4-component vector allows us to do more transformations, but this is a<br />
topic to be further discussed in the article on vertex shaders. This<br />
stream of vertexes is then sent to the vertex shader, where they are<br />
multiplied with a 4&#215;4 matrix to give them a position in the view.  When<br />
the vertexes are transformed, they are assembled into predetermined<br />
shapes as specified by the programmer. OpenGL and DirectX give the<br />
programmer 8 different shapes, of which the triangle is the most used<br />
one. By specifying a lot of triangles, it is possible to describe very<br />
complex figures such as planes or overly dimensioned heroines.
</p>
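<p>As a rough sketch of the multiplication step, here it is in plain Java on the CPU rather than as shader code. The class and method names are made up for illustration, and a row-major matrix layout with column vectors is assumed.</p>
<pre>
```java
// Hypothetical sketch: what "multiply the vertex with a 4x4 matrix" boils down to.
final class VertexTransform {
    // Multiplies a 4x4 row-major matrix with a 4-component column vector.
    static double[] transform(double[][] m, double[] v) {
        double[] r = new double[4];
        for (int row = 0; row < 4; row++) {
            for (int col = 0; col < 4; col++) {
                r[row] += m[row][col] * v[col];
            }
        }
        return r;
    }

    // Builds a matrix that moves points by (tx, ty, tz).
    // The translation sits in the fourth column, so it only takes effect
    // when the fourth component of the vector is non-zero.
    static double[][] translation(double tx, double ty, double tz) {
        return new double[][] {
            {1, 0, 0, tx},
            {0, 1, 0, ty},
            {0, 0, 1, tz},
            {0, 0, 0, 1},
        };
    }
}
```
</pre>
<p>With translation(10, 0, 0), the vertex (1, 2, 3, 1) ends up at (11, 2, 3, 1), while (1, 2, 3, 0) is left untouched: with the fourth component set to 0 the vector behaves as a direction rather than a point, which is what the extra component buys us.</p>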
<p>When the basic geometric shapes are assembled, they&#8217;re sent to a rasterizer.<br />
The rasterizer&#8217;s job is to create fragments. It does so by looking at the shape<br />
described by the vertexes, and creating a set of fragments/points that might<br />
appear on the screen. Now, I know that this might be confusing if it&#8217;s the<br />
first time you read about graphics, but I&#8217;ll try to explain the difference<br />
between a fragment and a vertex.  Think of the vertexes as a way of describing the<br />
corners of a triangle, while fragments are all the possible points inside the<br />
triangle: the points necessary to create the physical manifestation of the<br />
triangle, so to speak. In the figure below you see three vertexes describing the<br />
triangle, while the fragments actually make up the body of the triangle.
</p>
<p>
<a href="http://imgs.langly.org/main.php?g2_view=core.DownloadItem&#038;g2_itemId=2107" rel="lightbox"><br />
 <img height="300" src="http://imgs.langly.org/main.php?g2_view=core.DownloadItem&#038;g2_itemId=2107"><br />
</a>
</p>
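<p>To make the distinction concrete, here is a small sketch of a rasterizer in Java. It assumes one sample per pixel at integer coordinates and uses the common edge-function test to decide whether a pixel lies inside the triangle; real hardware is considerably more elaborate, and all names are hypothetical.</p>
<pre>
```java
// Hypothetical sketch: turning three vertexes into a set of fragments.
final class Rasterizer {
    // Cross product of (b - a) and (p - a); its sign tells on which side
    // of the edge a -> b the point p lies (0 means exactly on the edge).
    static int edge(int ax, int ay, int bx, int by, int px, int py) {
        return (bx - ax) * (py - ay) - (by - ay) * (px - ax);
    }

    // Returns a height x width coverage mask: true where a pixel becomes
    // a fragment of the triangle (x0,y0), (x1,y1), (x2,y2).
    static boolean[][] rasterize(int width, int height,
                                 int x0, int y0, int x1, int y1, int x2, int y2) {
        boolean[][] covered = new boolean[height][width];
        for (int y = 0; y < height; y++) {
            for (int x = 0; x < width; x++) {
                int e0 = edge(x0, y0, x1, y1, x, y);
                int e1 = edge(x1, y1, x2, y2, x, y);
                int e2 = edge(x2, y2, x0, y0, x, y);
                // Inside when all three tests agree in sign, so both
                // winding orders of the triangle are accepted.
                covered[y][x] = (e0 >= 0 && e1 >= 0 && e2 >= 0)
                             || (e0 <= 0 && e1 <= 0 && e2 <= 0);
            }
        }
        return covered;
    }

    // Counts how many fragments the mask contains.
    static int countFragments(boolean[][] covered) {
        int n = 0;
        for (boolean[] row : covered) {
            for (boolean c : row) {
                if (c) {
                    n++;
                }
            }
        }
        return n;
    }
}
```
</pre>
<p>For the triangle (0,0), (4,0), (0,4) on a 5x5 grid, the three vertexes turn into 15 fragments.</p>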
<p>
When the rasterizer is done, it sends its fragments to the fragment<br />
shader. The fragment shader, working on the physical manifestation (<br />
remember? ), applies texture to the fragments and other special effects<br />
such as light. When the fragment shader is done, each fragment is<br />
checked to see whether it is hidden behind a previously drawn fragment. There is no point in further<br />
processing a fragment if it is behind an already drawn fragment. (<br />
Usually this step has also been done earlier to reduce the work load. )<br />
If it passes the check, it will finally qualify for a position on the<br />
screen.
</p>
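<p>The overlap check can be sketched as a depth buffer. This assumes that a smaller z value means closer to the viewer and that colors are plain integers; both are simplifications for illustration, and the class name is made up.</p>
<pre>
```java
// Hypothetical sketch of a depth test; assumes a smaller z means "closer".
final class DepthBuffer {
    private final int width;
    private final float[] depth;   // nearest depth seen so far, per pixel
    private final int[] color;     // color of that nearest fragment

    DepthBuffer(int width, int height) {
        this.width = width;
        this.depth = new float[width * height];
        this.color = new int[width * height];
        // Start "infinitely far away" so the first fragment always wins.
        java.util.Arrays.fill(depth, Float.POSITIVE_INFINITY);
    }

    // Stores the fragment only if it is closer than what is already there.
    // Returns true when the fragment qualified for a position on the screen.
    boolean write(int x, int y, float z, int rgb) {
        int i = y * width + x;
        if (z >= depth[i]) {
            return false;          // hidden behind an already drawn fragment
        }
        depth[i] = z;
        color[i] = rgb;
        return true;
    }

    // Color currently visible at pixel (x, y).
    int colorAt(int x, int y) {
        return color[y * width + x];
    }
}
```
</pre>
<p>A fragment written at depth 0.5 survives a later fragment at depth 0.9 on the same pixel, but is replaced by one at depth 0.1.</p>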
<p>
Being a quick and dirty introduction to the pipeline, several steps have<br />
been omitted in order to, I believe, make it a better bird&#8217;s-eye view of the<br />
pipeline.</p>
]]></content:encoded>
			<wfw:commentRss>http://blog.langly.org/2008/02/03/the-gpu-pipeline-gpu-1/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
		<item>
		<title>Introduction to GPUs</title>
		<link>http://blog.langly.org/2008/01/11/introduction-to-gpus/</link>
		<comments>http://blog.langly.org/2008/01/11/introduction-to-gpus/#comments</comments>
		<pubDate>Fri, 11 Jan 2008 15:32:00 +0000</pubDate>
		<dc:creator>kenneo</dc:creator>
				<category><![CDATA[GPU]]></category>
		<category><![CDATA[Uncategorized]]></category>

		<guid isPermaLink="false">gpu/09012008-firstGPU</guid>
		<description><![CDATA[
I&#8217;ll start my first post of the year by announcing that I&#8217;ll try to
start blogging about GPUs and 3D in general. Being a Ph.D. student
researching GPU architecture, I&#8217;ll blog about the different trends I
see in the world of graphics, while also having some in-depth articles
about relevant topics in the three-dimensional world.


All this would ]]></description>
			<content:encoded><![CDATA[<p>
I&#8217;ll start my first post of the year by announcing that I&#8217;ll try to<br />
start blogging about GPUs and 3D in general. Being a Ph.D. student<br />
researching GPU architecture, I&#8217;ll blog about the different trends I<br />
see in the world of graphics, while also having some in-depth articles<br />
about relevant topics in the three-dimensional world.
</p>
<p>
All this is naturally an attempt to organise the<br />
knowledge I have gained from my research, while also contributing something<br />
back to the community in a less formal and more accessible form than<br />
scientific journals and papers.
</p>
<p>
The first series of articles I plan to write is an introduction to<br />
the world of 3D, concentrating on basic topics such as textures,<br />
vertexes and the general 3D pipeline. To help illustrate the points<br />
I&#8217;ve started writing a simple application using Gtk and GtkGLExt<br />
allowing the user to easily manipulate the ModelView Matrix.</p>
<p><a href="http://devel.langly.org/glMatrix/glMatrix1.png" rel="lightbox"><br />
 <img height="200" src="http://devel.langly.org/glMatrix/glMatrix1.png"><br />
</a> </p>
<p><a href="http://devel.langly.org/glMatrix/glMatrix2.png" rel="lightbox"><br />
 <img height="200" src="http://devel.langly.org/glMatrix/glMatrix2.png"><br />
</a> </p>
<p>
The source of this simple application can be found here:<br />
<a href="http://devel.langly.org/glMatrix/glMatrix-0.1.tar.bz2"><br />
	glMatrix<br />
</a></p>
]]></content:encoded>
			<wfw:commentRss>http://blog.langly.org/2008/01/11/introduction-to-gpus/feed/</wfw:commentRss>
		<slash:comments>0</slash:comments>
		</item>
	</channel>
</rss>
