This tutorial shows how the BLAS level 1 functionality available in ViennaCL can be used. Operator overloading in C++ is used extensively to provide an intuitive syntax.
We start by including the necessary headers:
 In this tutorial we do not need additional auxiliary functions, allowing us to start right with main(): 
 Scalar Operations 
Although usually not very efficient because of PCI-Express latency, ViennaCL enables you to directly manipulate individual scalar values. As such, a viennacl::scalar<double> behaves very similarly to a normal double.
Let us define a few CPU and ViennaCL scalars:
 CPU scalars can be transparently assigned to GPU scalars and vice versa: 
std::cout << "Copying a few scalars..." << std::endl;
s2 = vcl_s2;
vcl_s3 = s3;
Operations between GPU scalars work just as for CPU scalars. Note that launching a compute kernel for a single scalar operation on the GPU is considerably slower than performing the same operation on the CPU:
std::cout << "Manipulating a few scalars..." << std::endl;
std::cout << "operator +=" << std::endl;
vcl_s1 += vcl_s2;
std::cout << "operator *=" << std::endl;
vcl_s1 *= vcl_s2;
std::cout << "operator -=" << std::endl;
vcl_s1 -= vcl_s2;
std::cout << "operator /=" << std::endl;
vcl_s1 /= vcl_s2;
std::cout << "operator +" << std::endl;
s1 = s2 + s3;
vcl_s1 = vcl_s2 + vcl_s3;
std::cout << "multiple operators" << std::endl;
s1 = s2 + s3 * s2 - s3 / s1;
vcl_s1 = vcl_s2 + vcl_s3 * vcl_s2 - vcl_s3 / vcl_s1;
 Operations can also be mixed: 
std::cout << "mixed operations" << std::endl;
vcl_s1 = s1 * vcl_s2 + s3 - vcl_s3;
The stream output operator is overloaded as well, so scalars can be printed directly, e.g. to a terminal:
std::cout << "CPU scalar s3: " << s3 << std::endl;
std::cout << "GPU scalar vcl_s3: " << vcl_s3 << std::endl;
Vector Operations
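Define a few CPU vectors (from the STL and as a plain C array) and a few viennacl::vectors:
std::vector<ScalarType>      std_vec1(10);
std::vector<ScalarType>      std_vec2(10);
ScalarType                   plain_vec3[10];  // plain C array
viennacl::vector<ScalarType> vcl_vec1(10);
viennacl::vector<ScalarType> vcl_vec2(10);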
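Let us fill the vectors with random values. Element-wise access also works for GPU vectors, but is much slower than on the CPU:
for (unsigned int i = 0; i < 10; ++i)
{
  std_vec1[i] = randomNumber();
  vcl_vec2(i) = randomNumber();  // element-wise access to a GPU vector: works, but slow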
  plain_vec3[i] = randomNumber();
}
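Copy the CPU vectors to the GPU vectors and vice versa:
viennacl::copy(std_vec1.begin(), std_vec1.end(), vcl_vec1.begin());   // CPU to GPU
viennacl::copy(vcl_vec2.begin(), vcl_vec2.end(), std_vec2.begin());   // GPU to CPU
Partial copies are also possible by providing the corresponding iterators: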
viennacl::copy(std_vec1.begin() + 4, std_vec1.begin() + 8, vcl_vec1.begin() + 4);   // CPU to GPU
viennacl::copy(vcl_vec1.begin() + 4, vcl_vec1.begin() + 8, vcl_vec2.begin() + 1);   // GPU to GPU
viennacl::copy(vcl_vec1.begin() + 4, vcl_vec1.begin() + 8, std_vec1.begin() + 1);   // GPU to CPU
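To see which elements a partial copy touches, the following minimal sketch reproduces the same three-iterator call shape with std::copy on plain std::vector objects. It assumes that viennacl::copy follows the same half-open [first, last) convention as std::copy. The helper name copy_range is hypothetical and not part of ViennaCL.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical helper (not part of ViennaCL): copies src[first, last) into a
// copy of dst, starting at dst[dst_offset], and returns the modified copy.
std::vector<double> copy_range(const std::vector<double>& src,
                               std::size_t first, std::size_t last,
                               std::vector<double> dst, std::size_t dst_offset)
{
  assert(first <= last && last <= src.size());        // valid source range
  assert(dst_offset + (last - first) <= dst.size());  // destination has room

  std::copy(src.begin() + static_cast<std::ptrdiff_t>(first),
            src.begin() + static_cast<std::ptrdiff_t>(last),
            dst.begin() + static_cast<std::ptrdiff_t>(dst_offset));
  return dst;
}
```

With first = 4, last = 8 and dst_offset = 1, the four source elements at indices 4 to 7 end up at destination indices 1 to 4; all other destination entries are left unchanged.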
 
Compute the inner product of two GPU vectors and write the result to either a CPU or a GPU scalar:
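For reference, the inner product is the sum of the element-wise products of the two vectors. The following is a minimal CPU-side sketch in plain standard C++, not ViennaCL code; the name inner_prod_reference is hypothetical. In ViennaCL itself, to our knowledge, the corresponding call is viennacl::linalg::inner_prod(vcl_vec1, vcl_vec2), whose result can be assigned to either a ScalarType or a viennacl::scalar<ScalarType>.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference: returns sum_i x[i] * y[i].
template <typename T>
T inner_prod_reference(const std::vector<T>& x, const std::vector<T>& y)
{
  assert(x.size() == y.size());  // both vectors must have the same length

  T sum = T(0);
  for (std::size_t i = 0; i < x.size(); ++i)
    sum += x[i] * y[i];
  return sum;
}
```

Such a reference is handy for checking a GPU result against a CPU result on small vectors, keeping in mind that floating-point sums may differ slightly when the summation order differs.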
 Compute norms: 
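The three norms commonly offered at BLAS level 1 are the 1-norm (sum of absolute values), the 2-norm (Euclidean length) and the supremum norm (largest absolute value). The sketch below spells them out in plain standard C++ on the CPU; it is not ViennaCL code, and the *_reference names are hypothetical. As far as we are aware, the ViennaCL counterparts are viennacl::linalg::norm_1, viennacl::linalg::norm_2 and viennacl::linalg::norm_inf.

```cpp
#include <algorithm>
#include <cmath>
#include <vector>

// Hypothetical CPU reference: 1-norm, sum of absolute values.
double norm_1_reference(const std::vector<double>& x)
{
  double sum = 0.0;
  for (double v : x)
    sum += std::abs(v);
  return sum;
}

// Hypothetical CPU reference: 2-norm, square root of the sum of squares.
double norm_2_reference(const std::vector<double>& x)
{
  double sum_sq = 0.0;
  for (double v : x)
    sum_sq += v * v;
  return std::sqrt(sum_sq);
}

// Hypothetical CPU reference: supremum norm, largest absolute value.
double norm_inf_reference(const std::vector<double>& x)
{
  double largest = 0.0;
  for (double v : x)
    largest = std::max(largest, std::abs(v));
  return largest;
}
```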
 Plane rotation of two vectors: 
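A plane rotation updates both vectors at once, element by element. The sketch below assumes the usual BLAS level 1 rot convention, (x, y) <- (alpha * x + beta * y, alpha * y - beta * x), and is written in plain standard C++ on the CPU. The name plane_rotation_reference is hypothetical; to our knowledge, the ViennaCL counterpart is viennacl::linalg::plane_rotation, which takes the two vectors followed by the two coefficients.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference of a plane rotation, assuming the BLAS 'rot'
// convention: (x, y) <- (alpha * x + beta * y, alpha * y - beta * x).
void plane_rotation_reference(std::vector<double>& x, std::vector<double>& y,
                              double alpha, double beta)
{
  assert(x.size() == y.size());  // both vectors must have the same length

  for (std::size_t i = 0; i < x.size(); ++i)
  {
    // Read both old values first so the update of y uses the old x.
    const double xi = x[i];
    const double yi = y[i];
    x[i] = alpha * xi + beta * yi;
    y[i] = alpha * yi - beta * xi;
  }
}
```

Note that both old values are read before either entry is overwritten; updating x in place and then reusing it for y would give a wrong result.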
 Use viennacl::vector via the overloaded operators just as you would write it on paper: 
vcl_vec1 = vcl_s1 * vcl_vec2 / vcl_s3;
vcl_vec1 = vcl_vec2 / vcl_s3 + vcl_s2 * (vcl_vec1 - vcl_s2 * vcl_vec2);
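To make the meaning of the second expression explicit, here is a minimal element-wise CPU sketch in plain standard C++. It only illustrates the arithmetic; it makes no claim about how ViennaCL evaluates the expression internally. The function name and the parameter names v1, v2, s2, s3 are hypothetical stand-ins for vcl_vec1, vcl_vec2, vcl_s2 and vcl_s3.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical CPU reference for
//   v1 = v2 / s3 + s2 * (v1 - s2 * v2)
// evaluated entry by entry. The old value of v1[i] is read before it is
// overwritten, so v1 may appear on both sides of the assignment.
void vector_expression_reference(std::vector<double>& v1,
                                 const std::vector<double>& v2,
                                 double s2, double s3)
{
  assert(v1.size() == v2.size());  // both vectors must have the same length

  for (std::size_t i = 0; i < v1.size(); ++i)
    v1[i] = v2[i] / s3 + s2 * (v1[i] - s2 * v2[i]);
}
```

For example, with v1 = (1, 2), v2 = (2, 4) and s2 = s3 = 2, the first entry becomes 2/2 + 2 * (1 - 2 * 2) = -5 and the second becomes 4/2 + 2 * (2 - 2 * 4) = -10.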
  Swap the content of two vectors without a temporary vector: 
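Two strategies avoid a temporary vector: exchanging the entries pairwise, or exchanging only the underlying memory handles. The sketch below shows both on std::vector in plain standard C++; the helper names are hypothetical. To our knowledge, ViennaCL offers viennacl::swap for vectors, and we assume here, without having verified it, that its behavior corresponds to one of these two strategies.

```cpp
#include <algorithm>
#include <cassert>
#include <vector>

// Hypothetical helper: exchanges the entries pairwise. Only a single scalar
// is held temporarily at any time; each vector keeps its own storage.
void swap_elementwise(std::vector<double>& a, std::vector<double>& b)
{
  assert(a.size() == b.size());  // pairwise exchange needs equal lengths
  std::swap_ranges(a.begin(), a.end(), b.begin());
}

// Hypothetical helper: exchanges the storage of the two vectors in constant
// time. No entry is moved; only the internal handles change owner.
void swap_handles(std::vector<double>& a, std::vector<double>& b)
{
  a.swap(b);
}
```

The pairwise variant costs time proportional to the vector length, whereas the handle exchange is independent of the length but changes which buffer each object refers to.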
 The vectors can also be cleared directly: 
vcl_vec1.clear();
vcl_vec2.clear();
 That's it, the tutorial is completed. 
  std::cout << "!!!! TUTORIAL COMPLETED SUCCESSFULLY !!!!" << std::endl;
  return EXIT_SUCCESS;
}
Full Example Code
#include <cstdlib>
#include <iostream>
#include <vector>

int main()
{
  std::cout << "Copying a few scalars..." << std::endl;
  s2 = vcl_s2;
  vcl_s3 = s3;
  std::cout << "Manipulating a few scalars..." << std::endl;
  std::cout << "operator +=" << std::endl;
  vcl_s1 += vcl_s2;
  std::cout << "operator *=" << std::endl;
  vcl_s1 *= vcl_s2;
  std::cout << "operator -=" << std::endl;
  vcl_s1 -= vcl_s2;
  std::cout << "operator /=" << std::endl;
  vcl_s1 /= vcl_s2;
  std::cout << "operator +" << std::endl;
  s1 = s2 + s3;
  vcl_s1 = vcl_s2 + vcl_s3;
  std::cout << "multiple operators" << std::endl;
  s1 = s2 + s3 * s2 - s3 / s1;
  vcl_s1 = vcl_s2 + vcl_s3 * vcl_s2 - vcl_s3 / vcl_s1;
  std::cout << "mixed operations" << std::endl;
  vcl_s1 = s1 * vcl_s2 + s3 - vcl_s3;
  std::cout << "CPU scalar s3: " << s3 << std::endl;
  std::cout << "GPU scalar vcl_s3: " << vcl_s3 << std::endl;
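  std::vector<ScalarType>      std_vec1(10);
  std::vector<ScalarType>      std_vec2(10);
  ScalarType                   plain_vec3[10];  // plain C array
  viennacl::vector<ScalarType> vcl_vec1(10);
  viennacl::vector<ScalarType> vcl_vec2(10);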
  for (unsigned int i = 0; i < 10; ++i)
  {
    std_vec1[i] = randomNumber();
    vcl_vec2(i) = randomNumber();  // element-wise access to a GPU vector: works, but slow
    plain_vec3[i] = randomNumber();
  }
  
  vcl_vec1 = vcl_s1 * vcl_vec2 / vcl_s3;
  
  vcl_vec1 = vcl_vec2 / vcl_s3 + vcl_s2 * (vcl_vec1 - vcl_s2 * vcl_vec2);
  std::cout << "!!!! TUTORIAL COMPLETED SUCCESSFULLY !!!!" << std::endl;
  return EXIT_SUCCESS;
}