Geeks With Blogs


I am from Munshigonj, Bangladesh. For me, programming is a passion first, a hobby second, and a career third. My Primary Blog: http://weblogs.asp.net/razan/
Razan Paul Blog Explaining thoughts and findings is a great way to learn

Here you will find a sample Java program that generates a sparse ARFF file. The following description of the format is taken from: http://www.cs.waikato.ac.nz/ml/weka/arff.html

Sparse ARFF files are very similar to ARFF files, but data with value 0 are not explicitly represented. Sparse ARFF files have the same header (i.e. @relation and @attribute tags) but the data section is different. Instead of representing each value in order, like this:

@data

0, X, 0, Y, "class A"

0, 0, W, 0, "class B"

the non-zero attributes are explicitly identified by attribute number and their value stated, like this:

@data

{1 X, 3 Y, 4 "class A"}

{2 W, 4 "class B"}

Each instance is surrounded by curly braces, and the format for each entry is: <index> <space> <value> where index is the attribute index (starting from 0).
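The index/value rule above can be illustrated without Weka. The following is only a minimal sketch for numeric rows: the class `SparseArffLine` and its methods are hypothetical helpers, not part of the Weka API, and it does not handle nominal values, quoting, or missing values.

```java
import java.util.StringJoiner;

/** Hypothetical helper (not part of Weka): formats one dense numeric row as a sparse ARFF data line. */
class SparseArffLine {

    /** Keeps only the non-zero values and prefixes each one with its 0-based attribute index. */
    static String toSparse(double[] values) {
        StringJoiner joiner = new StringJoiner(", ", "{", "}");
        for (int i = 0; i < values.length; i++) {
            if (values[i] != 0.0) {
                joiner.add(i + " " + formatNumber(values[i]));
            }
        }
        return joiner.toString();
    }

    /** Prints whole numbers without a trailing ".0"; other values use Java's default formatting. */
    static String formatNumber(double v) {
        if (v == Math.rint(v) && !Double.isInfinite(v)) {
            return String.valueOf((long) v);
        }
        return String.valueOf(v);
    }

    public static void main(String[] args) {
        System.out.println(toSparse(new double[] {3, 7, 0, 1})); // {0 3, 1 7, 3 1}
        System.out.println(toSparse(new double[] {0, 0, 2, 8})); // {2 2, 3 8}
    }
}
```

A row that contains only zeros comes out as `{}`, since no attribute needs to be listed.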

The following Java code generates a sparse Instances object and writes it both to stdout and to an ARFF file on disk.

import java.io.File;

import weka.core.Attribute;
import weka.core.FastVector;
import weka.core.Instance;
import weka.core.Instances;
import weka.core.converters.ArffSaver;
import weka.filters.Filter;
import weka.filters.unsupervised.instance.NonSparseToSparse;

/**
 * Generates a sparse ARFF file.
 *
 * Written against the Weka 3.6 API. In Weka 3.7 and later, Instance is an
 * interface, so "new Instance(...)" has to become "new DenseInstance(...)".
 *
 * @author Razan
 */
public class AttTest {

    public static void main(String[] args) throws Exception {
        // Header: four numeric attributes.
        FastVector attributes = new FastVector();
        attributes.addElement(new Attribute("att1"));
        attributes.addElement(new Attribute("att2"));
        attributes.addElement(new Attribute("att3"));
        attributes.addElement(new Attribute("att4"));

        // Empty dataset with relation name "ESDN" and initial capacity 0.
        Instances dataSet = new Instances("ESDN", attributes, 0);

        // First instance: att3 is never assigned, so it stays 0.
        double[] values = new double[dataSet.numAttributes()];
        values[0] = 3;
        values[1] = 7;
        values[3] = 1;
        dataSet.add(new Instance(1.0, values)); // 1.0 is the instance weight

        // Second instance: att1 and att2 stay 0.
        values = new double[dataSet.numAttributes()];
        values[2] = 2;
        values[3] = 8;
        dataSet.add(new Instance(1.0, values));

        // Convert the dense instances to sparse instances.
        NonSparseToSparse nonSparseToSparseInstance = new NonSparseToSparse();
        nonSparseToSparseInstance.setInputFormat(dataSet);
        Instances sparseDataset = Filter.useFilter(dataSet, nonSparseToSparseInstance);

        // Print the sparse dataset in ARFF format to stdout.
        System.out.println(sparseDataset);

        // Save the same dataset to ESDN.arff in the working directory.
        ArffSaver arffSaverInstance = new ArffSaver();
        arffSaverInstance.setInstances(sparseDataset);
        arffSaverInstance.setFile(new File("ESDN.arff"));
        arffSaverInstance.writeBatch();
    }
}
The output of the sample program is shown below:

[Image: output]
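For comparison, here is a rough sketch of how the same two instances could be written as sparse ARFF text using only the Java standard library. It is not how Weka produces the file: `PlainSparseArffWriter`, its `build` method, and the file name `ESDN-plain.arff` are assumptions made for illustration. It also assumes every attribute is numeric, that names need no quoting, and it writes values in Java's default double format (for example `3.0`).

```java
import java.io.IOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Paths;

/** Hypothetical helper (not part of Weka): builds sparse ARFF text for numeric attributes only. */
class PlainSparseArffWriter {

    /** Returns the header (@relation, @attribute) followed by a sparse @data section. */
    static String build(String relation, String[] attributeNames, double[][] rows) {
        StringBuilder sb = new StringBuilder();
        sb.append("@relation ").append(relation).append("\n\n");
        for (String name : attributeNames) {
            sb.append("@attribute ").append(name).append(" numeric\n");
        }
        sb.append("\n@data\n");
        for (double[] row : rows) {
            sb.append('{');
            boolean first = true;
            for (int i = 0; i < row.length; i++) {
                if (row[i] == 0.0) {
                    continue; // zeros are implied in the sparse format
                }
                if (!first) {
                    sb.append(", ");
                }
                sb.append(i).append(' ').append(row[i]); // <index> <space> <value>
                first = false;
            }
            sb.append("}\n");
        }
        return sb.toString();
    }

    public static void main(String[] args) throws IOException {
        String arff = build(
                "ESDN",
                new String[] {"att1", "att2", "att3", "att4"},
                new double[][] {{3, 7, 0, 1}, {0, 0, 2, 8}});
        System.out.print(arff);
        // Hypothetical file name, chosen so it does not overwrite ESDN.arff.
        Files.write(Paths.get("ESDN-plain.arff"), arff.getBytes(StandardCharsets.UTF_8));
    }
}
```

For real datasets the Weka route is the safer choice, because the filter and saver take care of nominal attributes, quoting, and missing values.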

Hope this will save some of your time.

Posted on Tuesday, November 8, 2011 3:14 AM | WEKA, RapidMiner, JAVA


Comments on this post: Creating a simple sparse ARFF file
