.NET - How to hash only image data in a JPG file with dotnet?


I have ~20000 JPG images, some of which are duplicates. Unfortunately, some files have been tagged with EXIF metadata, so a simple file hash cannot identify the duplicates.

I am attempting to create a PowerShell script to process these, but can find no way to extract only the bitmap data.

System.Drawing.Bitmap can only return a bitmap object, not bytes. There's a GetHash() function, but it apparently acts on the whole file.

How can I hash these files in a way that the EXIF information is excluded? I'd prefer to avoid external dependencies if possible.

This is a PowerShell v2.0 advanced function implementation. It is a bit long, but I have verified that it gives the same hash code (generated from the bitmap pixels) for the same picture with different metadata and file sizes. It is a pipeline-capable version that also accepts wildcards and literal paths:

function Get-BitmapHashCode
{
    [CmdletBinding(DefaultParameterSetName="Path")]
    param(
        [Parameter(Mandatory=$true,
                   Position=0,
                   ParameterSetName="Path",
                   ValueFromPipeline=$true,
                   ValueFromPipelineByPropertyName=$true,
                   HelpMessage="Path to bitmap file")]
        [ValidateNotNullOrEmpty()]
        [string[]]
        $Path,

        [Alias("PSPath")]
        [Parameter(Mandatory=$true,
                   Position=0,
                   ParameterSetName="LiteralPath",
                   ValueFromPipelineByPropertyName=$true,
                   HelpMessage="Path to bitmap file")]
        [ValidateNotNullOrEmpty()]
        [string[]]
        $LiteralPath
    )

    begin {
        Add-Type -AssemblyName System.Drawing
        $sha = New-Object System.Security.Cryptography.SHA256Managed
    }

    process {
        if ($psCmdlet.ParameterSetName -eq "Path")
        {
            # In the -Path case we may need to resolve a wildcarded path
            $resolvedPaths = @($Path | Resolve-Path | Convert-Path)
        }
        else
        {
            # Must be -LiteralPath
            $resolvedPaths = @($LiteralPath | Convert-Path)
        }

        # Hash the decoded pixels of each specified path
        foreach ($rpath in $resolvedPaths)
        {
            Write-Verbose "Processing $rpath"
            try {
                $bmp    = New-Object System.Drawing.Bitmap $rpath
                $stream = New-Object System.IO.MemoryStream
                $writer = New-Object System.IO.BinaryWriter $stream
                # Write every pixel as a 32-bit ARGB value, column by column
                for ($w = 0; $w -lt $bmp.Width; $w++) {
                    for ($h = 0; $h -lt $bmp.Height; $h++) {
                        $pixel = $bmp.GetPixel($w,$h)
                        $writer.Write($pixel.ToArgb())
                    }
                }
                $writer.Flush()
                [void]$stream.Seek(0,'Begin')
                $hash = $sha.ComputeHash($stream)
                [BitConverter]::ToString($hash) -replace '-',''
            }
            finally {
                if ($bmp)    { $bmp.Dispose() }
                if ($writer) { $writer.Close() }
            }
        }
    }
}
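Calling GetPixel once per pixel can be slow across ~20000 images. As a rough sketch of one possible alternative (not part of the answer above), the decoded pixel buffer could be copied in a single call with LockBits and then hashed. The names Get-PixelHash and Find-DuplicateImage are hypothetical, the code assumes Windows with System.Drawing available, and the resulting hashes will not match those of Get-BitmapHashCode because the bytes are read in a different order. Like the original, it only matches files whose decoded pixels are identical; a re-encoded JPEG would most likely hash differently.

```powershell
Add-Type -AssemblyName System.Drawing

# Hypothetical helper: hashes the decoded pixels of one image file.
function Get-PixelHash
{
    param(
        [Parameter(Mandatory=$true, Position=0)]
        [string]
        $LiteralPath
    )

    # .NET needs a full file system path, not a PowerShell-relative one
    $fullPath = Convert-Path -LiteralPath $LiteralPath
    $bmp = New-Object System.Drawing.Bitmap $fullPath
    try {
        $rect = New-Object System.Drawing.Rectangle 0, 0, $bmp.Width, $bmp.Height
        # Request 32-bit ARGB so every row is exactly Width * 4 bytes (no padding)
        $data = $bmp.LockBits($rect,
                              [System.Drawing.Imaging.ImageLockMode]::ReadOnly,
                              [System.Drawing.Imaging.PixelFormat]::Format32bppArgb)
        try {
            $bytes = New-Object byte[] ($data.Stride * $data.Height)
            [System.Runtime.InteropServices.Marshal]::Copy($data.Scan0, $bytes, 0, $bytes.Length)
        }
        finally {
            $bmp.UnlockBits($data)
        }

        $sha  = [System.Security.Cryptography.SHA256]::Create()
        $hash = $sha.ComputeHash($bytes)
        [BitConverter]::ToString($hash) -replace '-',''
    }
    finally {
        $bmp.Dispose()
    }
}

# Hypothetical helper: groups the *.jpg files in a folder by pixel hash
# and returns only the groups that contain more than one file.
function Find-DuplicateImage
{
    param(
        [Parameter(Mandatory=$true, Position=0)]
        [string]
        $Folder
    )

    Get-ChildItem -Path $Folder -Filter *.jpg |
        Select-Object FullName,
                      @{ Name = 'PixelHash'; Expression = { Get-PixelHash -LiteralPath $_.FullName } } |
        Group-Object PixelHash |
        Where-Object { $_.Count -gt 1 }
}
```

Each group returned by Find-DuplicateImage carries the shared hash in its Name property and the matching files in its Group property, so something like `Find-DuplicateImage C:\photos | ForEach-Object { $_.Group }` would list the candidate duplicates (the folder path here is only a placeholder).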
