Count Distinct Values via aggregation framework

Q: Is it possible to count distinct values of a field in mongodb?

A: Yes! This can be done via the aggregation framework in mongo. This takes two group commands; the first groups by all the distinct values, and the second does a count of them all.

pipeline = [ 
    { $group: { _id: "$myNonUniqueFieldId"}  },
    { $group: { _id: 1, count: { $sum: 1 } } }
];

db.runCommand( 
    {
    "aggregate": "collection" , 
    "pipeline": pipeline
    }
);
Advertisements

Get CPU/Core/HT Count

Several different methods available.

T-SQL:

SELECT 
  cpu_count AS NumberOfLogicalCPUs
, hyperthread_ratio
, ( cpu_count / hyperthread_ratio ) AS NumberOfPhysicalCPUs
, CASE
      WHEN hyperthread_ratio = cpu_count THEN cpu_count
      ELSE ( ( cpu_count - hyperthread_ratio ) / 
             ( cpu_count / hyperthread_ratio ) )
 END AS NumberOfCoresInEachCPU
, CASE
    WHEN hyperthread_ratio = cpu_count THEN cpu_count
    ELSE ( cpu_count / hyperthread_ratio ) 
    * ( ( cpu_count - hyperthread_ratio ) / 
            ( cpu_count / hyperthread_ratio ) )
  END AS TotalNumberOfCores
FROM sys.dm_os_sys_info

Powershell – Get Cores:

Get-WmiObject -namespace "root\CIMV2" -class Win32_Processor -Property NumberOfCores | select NumberOfCores

CMD Line:

wmic cpu get NumberofCores, NumberofLogicalProcessors

Useful Links:

http://weblogs.sqlteam.com/tarad/archive/2008/08/05/How-to-get-physical-CPU-count-on-a-server.aspx

http://www.pxserver.com/WinAudit.htm

Aggregation & mapReduce in Mongo

In Oracle, or SQL Server or any other RDBMS, SQL makes it very easy to do aggregation by using GROUP BY.

SELECT make, count(model) FROM Vehicle
GROUP by make 
order by make

In Mongo, Its slightly more complex to build group by aggregation from collections. There are many ways of getting different counts out, and some are explored below.

DISTINCT

//Distinct all Makes of cars in Collection
db.vehicle.distinct('model');

The above is the same as doing a Select Distinct in SQL. The data returned is all distinct types of vehicle models in the collection

COUNT

//Count all Ford Cards
db.vehicle.count({make:"FORD"});

The above will count all ford cars in the vehicle collection

GROUP

Generally only applicable for collections of up to 10,000 documents. Errors have been reported if attempted on larger sets. Use mapreduce for larger data collections.

// Group by make
db.vehicle.group(
           {key: { make:true},
            reduce: function(obj,prev) { prev.count++;},
            initial: { count: 0 }
            });

The result is something a bit like this:

{

        "make" : "ALFA ROMEO",

        "count" : 25

},

{

        "make" : "CORVETTE",

        "count" : 1

},

{

        "make" : "HONDA",

        "count" : 2

},

The collection is being aggregated at the make level. The reduce function is passing a aggregated count back to the object using the key criteria. This can be further enhanced by adding in the model document

// Group by make, model
db.vehicle.group(
           {key: { make:true, model:true},
            reduce: function(obj,prev) { prev.count++;},
            initial: { count: 0 }
            });

The result is something like this:

{

        "make" : "ALFA ROMEO",

        "model" : "MITO",

        "count" : 2

},

{

        "make" : "ASTON MARTIN",

        "model" : "VIRAGE",

        "count" : 1

},

{

        "make" : "CORVETTE",

        "model" : "C6",

        "count" : 1

},

The above 2 example use a cumulative count:

prev.count++;

but the same can also be achieved by doing:

prev.count += 1

MAP REDUCE

Map-reduce is made up of two operations. The map part formats the data into key-value pairs, and the reduce builds the data set.

Eg: we want to return the count of the number of values for each type of make of vehicle.

//Map Function
var map = function() {
    emit( this.make, {model: this.model} );
  }

Create the MAP function. We are essentially saying return me the make of vehicle, but for that vehicle, my aggregator is the make document.

//Reduce Function
//Add up all the votes for each key.
var reduce = function(key, values) {
  var sum = 0;
  values.forEach(function(doc) {
    sum += 1;
  });
  return {model: sum};
}

Create the REDUCE function. the MAP values are automatically passed into the reduce function. Note, that at this stage we havnt actually specified the collection we want to run this mapReduce on.

//var into output **CAREFUL** - saves it as a new collection 
//would overwrite existing collection
var op = db.vehicle.mapReduce(map,reduce, {out : "myoutput"} );

The above saves the results of the mapReduce into a myoutput collection.

//return results
db[op.result].find();

By doing the above, the results are returned to screen.

PRIMARY> db[op.result].find();

{ "_id" : "ALFA ROMEO", "value" : { "model" : 25 } }

{ "_id" : "ASTON MARTIN", "value" : { "model" : 48 } }

{ "_id" : "AUDI", "value" : { "model" : 33 } }

{ "_id" : "BENTLEY", "value" : { "model" : 8 } }

{ "_id" : "BMW", "value" : { "model" : 32 } }

{ "_id" : "CHRYSLER", "value" : { "model" : "YPSILON" } }

{ "_id" : "CITROEN", "value" : { "model" : "C4 PICASSO" } }

{ "_id" : "CORVETTE", "value" : { "model" : "C6" } }

{ "_id" : "FIAT", "value" : { "model" : 9 } }

{ "_id" : "FORD", "value" : { "model" : 14 } }

{ "_id" : "HONDA", "value" : { "model" : 2 } }

{ "_id" : "HYUNDAI", "value" : { "model" : 5 } }

{ "_id" : "JAGUAR", "value" : { "model" : 3 } }

{ "_id" : "KIA", "value" : { "model" : 2 } }

{ "_id" : "LAMBORGHINI", "value" : { "model" : 24 } }

{ "_id" : "LAND ROVER", "value" : { "model" : 8 } }

{ "_id" : "LOTUS", "value" : { "model" : 5 } }

{ "_id" : "MASERATI", "value" : { "model" : 6 } }

{ "_id" : "MAZDA", "value" : { "model" : 2 } }

{ "_id" : "MERCEDES-BENZ", "value" : { "model" : 7 } }

has more

Note that not all documents were shown – the shell limits the number to 20 when automatically iterating a cursor. If we want to return the next set of results, there’s the it shortcut

has more

PRIMARY>

PRIMARY> it

{ "_id" : "MINI", "value" : { "model" : 11 } }

{ "_id" : "MITSUBISHI", "value" : { "model" : "COLT" } }

{ "_id" : "NISSAN", "value" : { "model" : 5 } }

{ "_id" : "PERODUA", "value" : { "model" : 51 } }

{ "_id" : "PORSCHE", "value" : { "model" : 10 } }

{ "_id" : "PROTON", "value" : { "model" : 2 } }

{ "_id" : "RENAULT", "value" : { "model" : 4 } }

{ "_id" : "ROLLS-ROYCE", "value" : { "model" : "PHANTOM" } }

{ "_id" : "SSANGYONG", "value" : { "model" : "RODIUS" } }

{ "_id" : "SUZUKI", "value" : { "model" : 2 } }

{ "_id" : "TESLA", "value" : { "model" : "ROADSTER" } }

{ "_id" : "TOYOTA", "value" : { "model" : "PRIUS" } }

{ "_id" : "VOLKSWAGEN", "value" : { "model" : 5 } }

{ "_id" : "VOLVO", "value" : { "model" : 7 } }

To Clean up:

//Drop the output collection
db.myoutput.drop();

See references:

Mongo Docs: http://www.mongodb.org/display/DOCS/Aggregation

MongoDB Aggregation: http://kylebanker.com/blog/2009/11/mongodb-count-group/

Map Reduce: http://www.mongodb.org/display/DOCS/MapReduce

Mongo – Querying documents within documents

Some of our collections in Mongo are built with embedded documents, This means the actual raw data is held in the following heirarchy:

OurMongoDB (DB)
OurMongoCollection (Collection)
                                _id          (document)
                                Car         (document)
                                                _id                       (embedded document)
                                                Channel             (embedded document)
                                                Make                   (embedded document)
                                                Model                  (embedded document)
                                                …
                                userId   (document)

This means that normal mongo db find statements will not work if you are searching based on criteria…

To search for embedded document data – use dot notation (http://www.mongodb.org/display/DOCS/Dot+Notation+%28Reaching+into+Objects%29

--return only where channel is Test1
db.ourMongoCollection.find({"car.channel" : "Test1"});

--return only where channel is Test2
db.ourMongoCollection.find({"car.channel" : "Test2"});

--return count of all documents with channel as Test2
db.ourMongoCollection.find({"car.channel" : "Test2"}).count();

--return count of all channel Test1 documents with a make of Honda 
--(search criteria is case sensitive)
db.ourMongoCollection.find({"car.channel" : "Test1", "car.make" : "HONDA"}).count();