Count Distinct Values via aggregation framework

Q: Is it possible to count distinct values of a field in mongodb?

A: Yes! This can be done via the aggregation framework in mongo. This takes two group commands; the first groups by all the distinct values, and the second does a count of them all.

pipeline = [ 
    { $group: { _id: "$myNonUniqueFieldId"}  },
    { $group: { _id: 1, count: { $sum: 1 } } }
];

db.runCommand( 
    {
    "aggregate": "collection" , 
    "pipeline": pipeline
    }
);

2014 in review

The WordPress.com stats helper monkeys prepared a 2014 annual report for this blog.

Here’s an excerpt:

The Louvre Museum has 8.5 million visitors per year. This blog was viewed about 88,000 times in 2014. If it were an exhibit at the Louvre Museum, it would take about 4 days for that many people to see it.

Click here to see the complete report.

Delete a job from multiple servers using SSMS Server Groups

For more detail on Server groups see http://blog.hoegaerden.be/2012/11/24/ssms-connect-to-several-servers-in-one-click-okay-two/ as that explains it in more detail.

The issue is that as Job IDs are unique across servers, a standard sp_delete_job query wouldnt work as we first need to know the Job ID.

The below sample is to delete the same named job across multiple servers:

USE [msdb]
GO

declare @jobid varchar(38)

select @jobid = job_id from sysjobs where name = 'Job Name'

/****** Object:  Job [Monitoring - Backup Stats]    Script Date: 31/12/2014 09:28:37 ******/
EXEC msdb.dbo.sp_delete_job @job_id=@jobid, @delete_unused_schedule=1
GO

mongodb exception: can’t convert from BSON type String to Date

Problem: When attempting to do an aggregation on timestamps which are stored as strings, mongodb is unable to do the conversion

repmongo:SECONDARY> db.collection.aggregate(
…     { $match : { "type" : "Review"}},
…     { $group : {
…         _id: {
…             year : { $year : "$created" },
…             month : { $month : "$created" },
…             day : { $dayOfMonth : "$created" },
…         },
…         count: { $sum: 1 }
…     }},
…     { $sort : { _id : 1}}
… );
assert: command failed: {
        "errmsg" : "exception: can’t convert from BSON type String to Date",
        "code" : 16006,
        "ok" : 0
} : aggregate failed

To see which fields are of which type:

repmongo:SECONDARY> typeof db.collection.findOne().created;
string
repmongo:SECONDARY> typeof db.collection.findOne().updated;
string

Solution 1: The best thing to do is to resolve the data type inconsistency at the application layer so that data is entered into the database in ISO format which can then be easily worked with. This would require the existing data set to be changed to [date] data type.

Solution 2: For immediate analysis of the data use regex and substring to extract the date portions.

db.collection.aggregate(
    { $match : { "created" : {$in : [/2014-10/]}}},
    { $group : {
        _id: {
            year :  { $substr : ["$created", 0, 4 ] },   
            month : { $substr : ["$created", 5, 2 ] },                                      
            day :   { $substr : ["$created", 8, 2 ] },        
        },
        count: { $sum: 1 }
    }
              },
    { $sort : { _id : -1}}
);

The above returns data grouped by Year, Month & Day with a sum. The Regex conditions acts as a like clause on the date string.

{ "_id" : { "year" : "2014", "month" : "10", "day" : "21" }, "count" : 1 }

{ "_id" : { "year" : "2014", "month" : "10", "day" : "06" }, "count" : 1 }

If doing in SQL Language, the above would probably look something like this:

SELECT 
TO_DATE(created, 'YYYY'), 
TO_DATE(created, 'MM'), 
TO_DATE(created, 'DD'), 
count(_id)
FROM collection
WHERE created like '%2014-10%'
GROUP BY 
TO_DATE(created, 'YYYY'), 
TO_DATE(created, 'MM'), 
TO_DATE(created, 'DD') 
ORDER BY _id DESC

Free space in all databases..or just one

Useful query which returns a lot of information around physical and logical disk usage for all database. A where clause can be commented out to just return for one database or a selection

CREATE TABLE #tmp_fileinfo(
    [Server] [varchar](30) NOT NULL,
    [Instance] [varchar](30) NULL,
    [DBName] [sysname] NULL, 
    [File_Name] [sysname] NOT NULL,
    [Size] [numeric](10, 2) NULL,
    [Used] [numeric](10, 2) NULL,
    [Unused] [numeric](10, 2) NULL,
    [Drive] [nvarchar](3) NULL,
    [Physical_Location] [varchar](260) NULL,
    [State] [varchar](15) NULL,
    [AutoGrow_MB] [int] NULL,
    [Percent_growth_set] [Varchar](5) NULL,
    [Percent_growth] [int] NULL,
    [Max_Size_MB] [varchar](9) NULL,
    [Timestamp] [datetime] NULL
) ON [PRIMARY]
 
declare @SQL nvarchar(max)
set @SQL = ''
select @SQL = @SQL + 'USE'  + QUOTENAME(name) + '
insert into #tmp_fileinfo
select Convert(varchar(30),SERVERPROPERTY(''MachineName'')), Convert(varchar(30), 
SERVERPROPERTY(''InstanceName'')), ' + QUOTENAME(name,'''') + ' AS [DBName],
[name] As [File_Name]
,CONVERT(NUMERIC(10,2),ROUND([size]/128.,2)) AS [Size]
,CONVERT(NUMERIC(10,2),ROUND(FILEPROPERTY([name],''SpaceUsed'')/128.,2)) AS [Used]
,CONVERT(NUMERIC(10,2),ROUND(([size]-FILEPROPERTY([name],''SpaceUsed''))/128.,2)) AS [Unused]
,UPPER(SUBSTRING(physical_name, 1, 3)) AS [Drive]
,UPPER([physical_name]) AS [Physical Location]
,UPPER([state_desc]) AS [State]
,[AutoGrow_MB] = CASE WHEN [is_percent_growth] = 0 THEN [Growth]*8/1024 ELSE 0 END
,[Percent_growth_set] = CASE WHEN is_percent_growth = 1 THEN ''TRUE'' ELSE ''FALSE'' END
,[Percent_growth] = CASE WHEN [is_percent_growth] = 1 THEN [Growth] ELSE 0 END
,[MAX_SIZE_MB] = 
    CASE 
        WHEN CAST(max_size as VARCHAR(15)) = -1 THEN ''Unlimited''
        WHEN max_size = 268435456 THEN CAST(2097152 AS VARCHAR(15))
        WHEN max_size >=0 THEN CAST(([max_size]*8)/1024 AS VARCHAR(15))
    END
,getdate()
FROM [sys].[database_files] '
from sys.databases
  
execute (@SQL)
select * from #tmp_fileinfo 
where DBName = 'master'
order by DBName, File_Name
  
drop table #tmp_fileinfo

Collations available on a SQL Server

Nice little query to find all collations and definitions from SQL Server:

select name, COLLATIONPROPERTY(name, 'CodePage') as Code_Page, description
from sys.fn_HelpCollations()

Failed to Connect to server admin:servername\instance Error..but it does connect!

At the SQL Pass Summit last week, I attended a Pre con by Paul Randal and while using a Dedicated Admin Connection he seemed to get a strange error in Management Studio. It seemed he was familiar with the error so i thought I’d try to get to the bottom of it as it just happened to me…

The Issue

Whenever you try to open an admin (DAC) connection to an instance via SSMS, a random error pops up even though the connection is successfully made:

image

After you Ok the error…SSMS is still happy!..

image

Why does this happen?

This is all down to how the Management studio Query window is opened. IF an existing normal connection to the database already exists, then you will get the error. The key is to not have any connectivity to the server in management studio so a completely fresh connection can be opened and not a connection switch in the an existing query window!

Normally you will have object explorer open and the normal course of action is to press the ‘New Query’ button which will open a connected session using the connection context in your Object explorer. Then when you try to switch the connection, the error will happen.

Solution

To do cleanly (not that it matters!), close the object explorer or have a clean SSMS GUI and press the new query window. This will prompt for the connection where the DAC details can be entered and it will open the connection cleanly.

Follow

Get every new post delivered to your Inbox.

Join 160 other followers