I spent some time working with Azure Cognitive Services. As an experiment, I created a demo application that analyzes uploaded images and generates a caption and tags. As a bonus, it generates a smart-cropped thumbnail. The project contains an Image API that stores images and smart AI-cropped thumbnails in Azure Storage, with metadata tags from the Azure Vision API. Some details follow below, and the project is open-sourced in my GitHub repository: https://github.com/Guzzter/auto-tagging-image-api
Smart AI-cropped thumbnails vs. basic cropped thumbnails
Although the basic cropped thumbnail could be improved with some extra effort, the example below shows the power of AI: it analyzes the image to decide what the focus point should be before creating the thumbnail.
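One way to request such a smart-cropped thumbnail is the thumbnail operation of the Vision API, which the ComputerVision SDK exposes with a smartCropping flag. The sketch below assumes an already configured ComputerVisionClient. The helper name ThumbnailSketch, the method name and the 200x200 default size are made up for illustration, and the project itself may wire this up differently.

```csharp
using System.IO;
using System.Threading.Tasks;
using Microsoft.Azure.CognitiveServices.Vision.ComputerVision;

// Hypothetical helper for illustration, not taken from the project.
public static class ThumbnailSketch
{
    // Asks the Vision API for a thumbnail of the requested size.
    public static async Task<Stream> CreateSmartThumbnailAsync(
        ComputerVisionClient client, Stream image, int width = 200, int height = 200)
    {
        // smartCropping: true lets the service pick the crop region based on
        // the area of interest it detects, instead of a fixed crop.
        return await client.GenerateThumbnailInStreamAsync(
            width, height, image, smartCropping: true);
    }
}
```

The returned stream contains the thumbnail image data, which can then be uploaded to the thumbnails container.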
To run this demo you need a (free) Azure account in which you create the following two Azure resources:
Create a Storage account
Needed for storing uploaded images and generated thumbnails. After creating it, add the connection string to the placeholder in appsettings.Development.json. BTW, the containers 'photos' and 'thumbnails' will be created automatically.
Create a Cognitive Services multi-service account
Used by the Vision API to analyze image content and retrieve the caption text and all related tags. After creating the account, copy the key and endpoint values to the placeholders in appsettings.Development.json.
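As a rough sketch of how these three values end up being used: the connection string is enough to create the two blob containers on demand, and the key plus endpoint are enough to build a Vision client. The class and method names below are hypothetical, and how the values are read from appsettings.Development.json is left out because the setting names are project-specific.

```csharp
using System.Threading.Tasks;
using Azure.Storage.Blobs;
using Microsoft.Azure.CognitiveServices.Vision.ComputerVision;

// Hypothetical helper for illustration, not taken from the project.
public static class AzureClientsSketch
{
    // Creates the 'photos' and 'thumbnails' containers when they do not exist yet.
    public static async Task<(BlobContainerClient Photos, BlobContainerClient Thumbnails)>
        CreateContainersAsync(string storageConnectionString)
    {
        var photos = new BlobContainerClient(storageConnectionString, "photos");
        var thumbnails = new BlobContainerClient(storageConnectionString, "thumbnails");

        // No-op when the container already exists.
        await photos.CreateIfNotExistsAsync();
        await thumbnails.CreateIfNotExistsAsync();

        return (photos, thumbnails);
    }

    // Builds a Vision API client from the key and endpoint of the
    // Cognitive Services account.
    public static ComputerVisionClient CreateVisionClient(string key, string endpoint) =>
        new ComputerVisionClient(new ApiKeyServiceClientCredentials(key))
        {
            Endpoint = endpoint
        };
}
```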
Running the application
When running the application locally, you can use the Swagger interface to upload an image (see the Samples folder). After the image is uploaded, you can use the List API method to retrieve the image data with the caption and tags recognized by the Vision API. Example output:
"caption": "a canal with buildings along it",
"tags": "building outdoor house narrow town residential",
Note that the caption and tags values are generated by the AI interpreting the image.
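For reference, a caption and a space-separated tag string like the ones above can be obtained from the ComputerVision SDK roughly as follows. This is a minimal sketch, assuming a configured ComputerVisionClient. The helper name AnalyzeSketch is made up, and whether the project takes its tags from the Tags feature or from the description is an assumption on my side.

```csharp
using System.Collections.Generic;
using System.IO;
using System.Linq;
using System.Threading.Tasks;
using Microsoft.Azure.CognitiveServices.Vision.ComputerVision;
using Microsoft.Azure.CognitiveServices.Vision.ComputerVision.Models;

// Hypothetical helper for illustration, not taken from the project.
public static class AnalyzeSketch
{
    public static async Task<(string Caption, string Tags)> DescribeAsync(
        ComputerVisionClient client, Stream image)
    {
        // Ask only for the features we need: a description (captions) and tags.
        var features = new List<VisualFeatureTypes?>
        {
            VisualFeatureTypes.Description,
            VisualFeatureTypes.Tags
        };

        ImageAnalysis result =
            await client.AnalyzeImageInStreamAsync(image, visualFeatures: features);

        // Use the first caption if the service returned one.
        string caption =
            result.Description?.Captions?.FirstOrDefault()?.Text ?? string.Empty;

        // Join the tag names into one space-separated string.
        string tags = string.Join(
            " ", result.Tags?.Select(t => t.Name) ?? Enumerable.Empty<string>());

        return (caption, tags);
    }
}
```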
NuGet packages used
I use the following packages:
- Azure.Storage.Blobs - provides BlobContainerClient for storing and retrieving images in the storage account
- Microsoft.Azure.CognitiveServices.Vision.ComputerVision - used for sending images to the Vision API to retrieve the caption and tags
- Imageflow.AllPlatforms - used for creating thumbnail images
- MimeTypeMapOfficial - used for mapping a file extension to the corresponding MIME type
- Swashbuckle.AspNetCore - used for Swagger API documentation page
This implementation is inspired by the older Microsoft Vision tutorial. That tutorial uses .NET Core 2(?) with Razor Pages. I chose to modernize it and use .NET 6.0 with a Swagger API.
For resizing with ImageFlow, I used this documentation page: Querystring documentation for BuildCommandString
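To give an idea of what such a querystring command looks like in code, here is a minimal sketch based on the Imageflow.Fluent API. The helper name, the default size and the chosen command (width, height, mode=crop) are illustrative assumptions, not the project's exact settings. BytesSource matches the Imageflow releases I am familiar with, and other versions of the package may offer a different source type.

```csharp
using System;
using System.Threading.Tasks;
using Imageflow.Fluent;

// Hypothetical helper for illustration, not taken from the project.
public static class ResizeSketch
{
    // Creates a basic cropped thumbnail from the raw bytes of an image.
    public static async Task<byte[]> CreateThumbnailAsync(
        byte[] imageBytes, int width = 200, int height = 200)
    {
        using var job = new ImageJob();

        // The command string uses the same key=value syntax as the
        // querystring documentation; mode=crop crops to fill the target size.
        var result = await job.BuildCommandString(
                new BytesSource(imageBytes),
                new BytesDestination(),
                $"width={width}&height={height}&mode=crop")
            .Finish()
            .InProcessAsync();

        ArraySegment<byte>? bytes = result.First.TryGetBytes();
        if (!bytes.HasValue)
        {
            throw new InvalidOperationException("Imageflow returned no output bytes.");
        }

        return bytes.Value.ToArray();
    }
}
```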