How to hash only the image data in a JPG file with .NET?
I have ~20,000 JPG images, some of which are duplicates. Unfortunately, some of the files have been tagged with EXIF metadata, so a simple file hash cannot identify the duplicates.
I am attempting to create a PowerShell script to process these, but I can find no way to extract only the bitmap data.
System.Drawing.Bitmap can only return a bitmap object, not bytes. There's a GetHash() function, but it apparently acts on the whole file.
How can I hash these files in a way that excludes the EXIF information? I'd prefer to avoid external dependencies if possible.
Here is a PowerShell v2.0 advanced function implementation. It is a bit long, but I have verified that it gives the same hash code (generated from the bitmap pixels) for the same picture with different metadata and file sizes. This is a pipeline-capable version that also accepts wildcards and literal paths:
    function Get-BitmapHashCode
    {
        [CmdletBinding(DefaultParameterSetName="Path")]
        param(
            [Parameter(Mandatory=$true,
                       Position=0,
                       ParameterSetName="Path",
                       ValueFromPipeline=$true,
                       ValueFromPipelineByPropertyName=$true,
                       HelpMessage="Path to bitmap file")]
            [ValidateNotNullOrEmpty()]
            [string[]]
            $Path,

            [Alias("PSPath")]
            [Parameter(Mandatory=$true,
                       Position=0,
                       ParameterSetName="LiteralPath",
                       ValueFromPipelineByPropertyName=$true,
                       HelpMessage="Path to bitmap file")]
            [ValidateNotNullOrEmpty()]
            [string[]]
            $LiteralPath
        )

        Begin {
            Add-Type -AssemblyName System.Drawing
            $sha = New-Object System.Security.Cryptography.SHA256Managed
        }

        Process {
            if ($PSCmdlet.ParameterSetName -eq "Path")
            {
                # In the -Path case we may need to resolve a wildcarded path
                $resolvedPaths = @($Path | Resolve-Path | Convert-Path)
            }
            else
            {
                # Must be -LiteralPath
                $resolvedPaths = @($LiteralPath | Convert-Path)
            }

            # Hash the pixel data of each resolved file
            foreach ($rpath in $resolvedPaths)
            {
                Write-Verbose "Processing $rpath"
                # Reset so the finally block never disposes objects from a previous file
                $bmp = $null
                $writer = $null
                try {
                    $bmp    = New-Object System.Drawing.Bitmap $rpath
                    $stream = New-Object System.IO.MemoryStream
                    $writer = New-Object System.IO.BinaryWriter $stream
                    # Write every pixel's ARGB value into the stream, column by column
                    for ($w = 0; $w -lt $bmp.Width; $w++) {
                        for ($h = 0; $h -lt $bmp.Height; $h++) {
                            $pixel = $bmp.GetPixel($w,$h)
                            $writer.Write($pixel.ToArgb())
                        }
                    }
                    $writer.Flush()
                    [void]$stream.Seek(0,'Begin')
                    $hash = $sha.ComputeHash($stream)
                    # Emit the hash as a hex string without dashes
                    [BitConverter]::ToString($hash) -replace '-',''
                }
                finally {
                    if ($bmp)    { $bmp.Dispose() }
                    if ($writer) { $writer.Close() }
                }
            }
        }
    }
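To go from per-file hashes to an actual list of duplicates, the hashes still have to be grouped. The following is only a rough sketch of one way to do that, assuming Get-BitmapHashCode from above is already loaded in the session. The helper name Find-DuplicateImage and its -Directory parameter are made up for illustration and are not part of the original answer.

```powershell
# Hypothetical helper (not from the original answer): pairs each .jpg file
# with its pixel hash and returns only the hashes that occur more than once.
# Assumes Get-BitmapHashCode is already defined in the current session.
function Find-DuplicateImage
{
    param(
        [string] $Directory = '.'
    )

    Get-ChildItem -Path $Directory -Filter *.jpg |
        ForEach-Object {
            # Keep the path and its hash together so groups can list file names
            New-Object PSObject -Property @{
                Path = $_.FullName
                Hash = Get-BitmapHashCode -LiteralPath $_.FullName
            }
        } |
        # Skip files that produced no hash (for example, unreadable images)
        Where-Object { $_.Hash } |
        Group-Object -Property Hash |
        # A group with more than one member is a set of pixel-identical images
        Where-Object { $_.Count -gt 1 }
}
```

Each returned group exposes the matching files through its Group property, so something like `Find-DuplicateImage 'C:\Photos' | ForEach-Object { $_.Group | Select-Object -ExpandProperty Path; '' }` would print the paths of each duplicate set separated by a blank line (the folder name is a placeholder). Since GetPixel is called once per pixel, expect a run over ~20,000 images to take a while.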