
Index your Alfresco content using the new Amazon Kendra Alfresco connector


Amazon Kendra is a highly accurate and simple-to-use intelligent search service powered by machine learning (ML). Amazon Kendra offers a suite of data source connectors to simplify the process of ingesting and indexing your content, wherever it resides.

Valuable data in organizations is stored in both structured and unstructured repositories. An enterprise search solution should be able to index and search across several structured and unstructured repositories.

Alfresco Content Services provides open, flexible, highly scalable enterprise content management (ECM) capabilities with the added benefits of a content services platform, making content accessible wherever and however you work through easy integrations with the business applications you use every day. Many organizations use the Alfresco content management platform to store their content. One of the key requirements for enterprise customers using Alfresco is the ability to easily and securely find accurate information across all the stored documents.

We’re excited to announce that you can now use the new Amazon Kendra Alfresco connector to search documents stored in your Alfresco repositories and sites. In this post, we show how to use the new connector to retrieve documents stored in Alfresco for indexing purposes and securely use the Amazon Kendra intelligent search function. In addition, the ML-powered intelligent search can accurately find information from unstructured documents with natural language narrative content, for which keyword search is not very effective.

What’s new in the Amazon Kendra Alfresco connector

The Amazon Kendra Alfresco connector offers support for the following:

  • Basic and OAuth2 authentication mechanisms for the Alfresco On-Premises (On-Prem) platform
  • Basic and OAuth2 authentication mechanisms for the Alfresco PaaS platform
  • Aspect-based crawling of Alfresco repository documents

Solution overview

With Amazon Kendra, you can configure multiple data sources to provide a central place to search across your document repositories and sites. The solution in this post demonstrates the following:

  • Retrieval of documents and comments from Alfresco private sites and public sites
  • Retrieval of documents and comments from Alfresco repositories using Amazon Kendra-specific aspects
  • Authentication against Alfresco On-Prem and PaaS platforms using Basic and OAuth2 mechanisms
  • The Amazon Kendra search capability with access control across sites and repositories

If you are going to use only one of the platforms, you can still follow this post to build the example solution; just ignore the steps corresponding to the platform that you are not using.

The following is a summary of the steps to build the example solution:

  1. Upload documents to the three Alfresco sites and the repository folder. Make sure that the uploaded documents are unique across sites and repository folders.
  2. For the two private sites and repository, use document-level Alfresco permission management to set access permissions. For the public site, you don’t need to set up permissions at the document level. Note that permissions information is retrieved by the Amazon Kendra Alfresco connector and used for access control by the Amazon Kendra search function.
  3. For the two private sites and repository, create a new Amazon Kendra index (you use the same index for both the private sites and the repository). For the public site, create a new Amazon Kendra index.
  4. For the On-Prem private site, create an Amazon Kendra Alfresco data source using Basic authentication, within the Amazon Kendra index for private sites.
  5. For the On-Prem repository documents with Amazon Kendra-specific aspects, create a data source using Basic authentication, within the Amazon Kendra index for private sites.
  6. For the PaaS private site, create a data source using Basic authentication, within the Amazon Kendra index for private sites.
  7. For the PaaS public site, create a data source using OAuth2 authentication, within the Amazon Kendra index for public sites.
  8. Perform a sync for each data source.
  9. Run a test query in the Amazon Kendra index meant for private sites and the repository using access control.
  10. Run a test query in the Amazon Kendra index meant for public sites without access control.

Prerequisites

You need an AWS account with privileges to create AWS Identity and Access Management (IAM) roles and policies. For more information, see Overview of access management: Permissions and policies. You need to have a basic knowledge of AWS and how to navigate the AWS Management Console.

For the Alfresco On-Prem platform, complete the following steps:

  1. Create a private site or use an existing site.
  2. Create a repository folder or use an existing repository folder.
  3. Get the repository URL.
  4. Get Basic authentication credentials (user ID and password).
  5. Make sure that the authentication users are part of the ALFRESCO_ADMINISTRATORS group.
  6. Get the public X509 certificate in .pem format and save it locally.

For the Alfresco PaaS platform, complete the following steps:

  1. Create a private site or use an existing site.
  2. Create a public site or use an existing site.
  3. Get the repository URL.
  4. Get Basic authentication credentials (user ID and password).
  5. Get OAuth2 credentials (client ID, client secret, and token URL).
  6. Confirm that the authentication users are part of the ALFRESCO_ADMINISTRATORS group.

Step 1: Upload example documents

Each uploaded document must contain 5 MB or less of text. For more information, see Amazon Kendra Service Quotas. You can upload example documents or use existing documents within each site.

As shown in the following screenshot, we have uploaded four documents to the Alfresco On-Prem private site.

We have uploaded three documents to the Alfresco PaaS private site.

We have uploaded five documents to the Alfresco PaaS public site.

We have uploaded two documents to the Alfresco On-Prem repository.

Assign the aspect awskendra:indexControl to one or more documents in the repository folder.
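
If you prefer to assign the aspect programmatically rather than through the Alfresco UI, the Alfresco REST API accepts an aspectNames list when a node is updated. The following sketch only builds the request and does not send it. The host, node ID, and helper names are hypothetical, authentication headers are left to the caller, and we assume the awskendra:indexControl aspect is already defined in your content model. Because the update replaces the whole aspect list, the existing aspects are merged in first.

```python
import json
import urllib.request

ASPECT = "awskendra:indexControl"


def merged_aspects(current_aspects, new_aspect=ASPECT):
    """Return the node's aspect list with new_aspect added once, order preserved."""
    aspects = list(current_aspects)
    if new_aspect not in aspects:
        aspects.append(new_aspect)
    return aspects


def build_aspect_update(base_url, node_id, current_aspects):
    """Build (but do not send) the PUT request that replaces the node's aspectNames."""
    url = (
        f"{base_url.rstrip('/')}"
        f"/alfresco/api/-default-/public/alfresco/versions/1/nodes/{node_id}"
    )
    body = json.dumps({"aspectNames": merged_aspects(current_aspects)})
    return urllib.request.Request(
        url,
        data=body.encode("utf-8"),
        method="PUT",
        headers={"Content-Type": "application/json"},
    )


# Hypothetical host, node ID, and existing aspects, for illustration only.
request = build_aspect_update(
    "https://alfresco.example.com",
    "0000-hypothetical-node-id",
    ["cm:auditable", "cm:titled"],
)
```

To send the request, you would add an Authorization header for an Alfresco user with write access to the node and pass the request to urllib.request.urlopen.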

Step 2: Configure Alfresco permissions

Use the Alfresco Permissions Management feature to give access rights to example users for viewing uploaded documents. We assume that you have some example Alfresco user names, with email addresses, that can be used for setting permissions at the document level in private sites. These users are not used for crawling the sites.

In the following example for the On-Prem private site, we have provided users My Dev User1 and My Dev User2 with site-consumer access to the example document. Repeat the same procedure for the other uploaded documents.

In the following example for the PaaS private site, we have provided user Kendra User3 with site-consumer access to the example document. Repeat the same procedure for the other uploaded documents.

For the Alfresco repository documents, we have provided user My Dev User1 with consumer access to the example document.

The following table lists the site or repository names, document names, and permissions.

Platform | Site or Repository Name | Document Name | User IDs
On-Prem | MyAlfrescoSite | ChannelMarketingBudget.xlsx | My Manager User3
On-Prem | MyAlfrescoSite | wellarchitected-sustainability-pillar.pdf | My Dev User1, My Dev User2
On-Prem | MyAlfrescoSite | WorkDocs.docx | My Dev User1, My Dev User2, My Manager User3
On-Prem | MyAlfrescoSite | WorldPopulation.csv | My Dev User1, My Dev User2, My Manager User3
PaaS | MyAlfrescoCloudSite2 | DDoS_White_Paper.pdf | Kendra User3
PaaS | MyAlfrescoCloudSite2 | wellarchitected-framework.pdf | Kendra User3
PaaS | MyAlfrescoCloudSite2 | ML_Training.pptx | Kendra User1
PaaS | MyAlfrescoCloudPublicSite | batch_user.pdf | Everyone
PaaS | MyAlfrescoCloudPublicSite | Amazon Simple Storage Service – User Guide.pdf | Everyone
PaaS | MyAlfrescoCloudPublicSite | AWS Batch – User Guide.pdf | Everyone
PaaS | MyAlfrescoCloudPublicSite | Amazon Detective.docx | Everyone
PaaS | MyAlfrescoCloudPublicSite | Pricing.xlsx | Everyone
On-Prem | Repo: MyAlfrescoRepoFolder1 | Polly-dg.pdf (aspect awskendra:indexControl) | My Dev User1
On-Prem | Repo: MyAlfrescoRepoFolder1 | Transcribe-api.pdf (aspect awskendra:indexControl) | My Dev User1

Step 3: Set up Amazon Kendra indexes

You can create a new Amazon Kendra index or use an existing index for indexing documents hosted in Alfresco private sites. To create a new index, complete the following steps:

  1. On the Amazon Kendra console, create an index called Alfresco-Private.
  2. Create a new IAM role, then choose Next.
  3. For Access Control, choose Yes.
  4. For Token Type, choose JSON.
  5. Keep the user name and group as default.
  6. Choose None for user group expansion because we are assuming no integration with AWS IAM Identity Center (successor to AWS Single Sign-On).
  7. Choose Next.
  8. Choose Developer Edition for this example solution.
  9. Choose Create to create a new index.

The following screenshot shows the Alfresco-Private index after it has been created.

  10. You can verify the access control configuration on the User access control tab.
  11. Repeat these steps to create a second index called Alfresco-Public.
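
The same indexes could also be created programmatically. The following is a minimal sketch using the boto3 Kendra client, not the exact requests the console issues. It assumes an IAM role for Amazon Kendra already exists (the console creates one for you, whereas the API expects its ARN) and that the default JSON token fields are named username and groups. The helper names are ours.

```python
def index_request(name, role_arn):
    """Build create_index arguments for a Developer Edition index with JSON token access control."""
    return {
        "Name": name,
        "Edition": "DEVELOPER_EDITION",
        "RoleArn": role_arn,
        # Filter search results based on the user token supplied at query time.
        "UserContextPolicy": "USER_TOKEN",
        "UserTokenConfigurations": [
            {
                "JsonTokenTypeConfiguration": {
                    # Assumed defaults for the user name and group fields.
                    "UserNameAttributeField": "username",
                    "GroupAttributeField": "groups",
                }
            }
        ],
    }


def create_indexes(role_arn, names=("Alfresco-Private", "Alfresco-Public")):
    """Create one index per name and return a mapping of name to index ID."""
    import boto3  # imported here so the request builder works without AWS access

    kendra = boto3.client("kendra")
    return {
        name: kendra.create_index(**index_request(name, role_arn))["Id"]
        for name in names
    }
```

Index creation is asynchronous, so a script would typically wait until describe_index reports an ACTIVE status before adding data sources.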

Step 4: Create a data source for the On-Prem private site

To create a data source for the On-Prem private site, complete the following steps:

  1. On the Amazon Kendra console, navigate to the Alfresco-Private index.
  2. Choose Data sources in the navigation pane.
  3. Choose Add data source.
  4. Choose Add connector for the Alfresco connector.
  5. For Data source name, enter Alfresco-OnPrem-Private.
  6. Optionally, add a description.
  7. Keep the remaining settings as default and choose Next.

To connect to the Alfresco On-Prem site, the connector needs access to the public certificate corresponding to the On-Prem server. This was one of the prerequisites.

  8. Use a different browser tab to upload the .pem file to an Amazon Simple Storage Service (Amazon S3) bucket in your account.

You use this S3 bucket name in the next steps.

  9. Return to the data source creation page.
  10. For Source, select Alfresco server.
  11. For Alfresco repository URL, enter the repository URL (created as a prerequisite).
  12. For Alfresco user application URL, enter the same value as the repository URL.
  13. For SSL certificate location, choose Browse S3 and choose the S3 bucket where you uploaded the .pem file.
  14. For Authentication, select Basic authentication.
  15. For AWS Secrets Manager secret, choose Create and add new secret.

A pop-up window opens to create an AWS Secrets Manager secret.

  16. Enter a name for your secret, user name, and password, then choose Save.
  17. For Virtual Private Cloud (VPC), choose No VPC.
  18. Turn the identity crawler on.
  19. For IAM role, choose Create a new IAM role.
  20. Choose Next.

You can configure the data source to synchronize contents from one or more Alfresco sites. For this post, we sync from the On-Prem private site.

  21. For Content to sync, select Single Alfresco site sync and choose MyAlfrescoSite.
  22. Select Include comments to retrieve comments in addition to documents.
  23. For Sync mode, select Full sync.
  24. For Frequency, choose Run on demand (or a different frequency option as needed).
  25. Choose Next.
  26. Map the Alfresco document fields to the Amazon Kendra index fields (you can keep the defaults), then choose Next.
  27. On the Review and Create page, verify all the information, then choose Add data source.

After the data source has been created, the data source page is displayed as shown in the following screenshot.
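
If you script your setup, the certificate upload and the secret from this step might be prepared with boto3 roughly as follows. This is a sketch under stated assumptions rather than a record of what the console does: the secret key names (username and password, plus clientId, clientSecret, and tokenUrl for the OAuth2 variant used later for the PaaS public site), the object key alfresco/certificate.pem, and the helper names are all assumptions to verify against the Amazon Kendra Developer Guide.

```python
import json


def basic_auth_secret(username, password):
    """Secret payload for Basic authentication (key names are assumed)."""
    return json.dumps({"username": username, "password": password})


def oauth2_secret(client_id, client_secret, token_url):
    """Secret payload for OAuth2 authentication (key names are assumed)."""
    return json.dumps(
        {"clientId": client_id, "clientSecret": client_secret, "tokenUrl": token_url}
    )


def prepare_onprem_inputs(pem_path, bucket, secret_name, username, password):
    """Upload the .pem certificate to S3 and store the crawler credentials.

    Returns the ARN of the new Secrets Manager secret.
    """
    import boto3  # imported here so the payload helpers work without AWS access

    # Hypothetical object key; any key in the bucket would do.
    boto3.client("s3").upload_file(pem_path, bucket, "alfresco/certificate.pem")
    response = boto3.client("secretsmanager").create_secret(
        Name=secret_name,
        SecretString=basic_auth_secret(username, password),
    )
    return response["ARN"]
```

Keeping the payload builders separate from the AWS calls makes it easy to inspect the secret contents before anything is written to your account.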

Step 5: Create a data source for the On-Prem repository documents with Amazon Kendra-specific aspects

Similar to the previous steps, create a data source for the On-Prem repository documents with Amazon Kendra-specific aspects:

  1. On the Amazon Kendra console, navigate to the Alfresco-Private index.
  2. Choose Data sources in the navigation pane.
  3. Choose Add data source.
  4. Choose Add connector for the Alfresco connector.
  5. For Data source name, enter Alfresco-OnPrem-Aspects.
  6. Optionally, add a description.
  7. Keep the remaining settings as default and choose Next.
  8. For Source, select Alfresco server.
  9. For Alfresco repository URL, enter the repository URL (created as a prerequisite).
  10. For Alfresco user application URL, enter the same value as the repository URL.
  11. For SSL certificate location, choose Browse S3 and choose the S3 bucket where you uploaded the .pem file.
  12. For Authentication, select Basic authentication.
  13. For AWS Secrets Manager secret, choose the secret you created earlier.
  14. For Virtual Private Cloud (VPC), choose No VPC.
  15. Turn the identity crawler off.
  16. For IAM role, choose Create a new IAM role.
  17. Choose Next.

For this scope, the connector retrieves only those On-Prem server repository documents that have been assigned an aspect called awskendra:indexControl.

  18. For Content to sync, select Alfresco aspects sync.
  19. For Sync mode, choose Full sync.
  20. For Frequency, choose Run on demand (or a different frequency option as needed).
  21. Choose Next.
  22. Map the Alfresco document fields to the Amazon Kendra index fields (you can keep the defaults), then choose Next.
  23. On the Review and Create page, verify all the information, then choose Add data source.

After the data source has been created, the data source page is displayed as shown in the following screenshot.

Step 6: Create a data source for the PaaS private site

Follow steps similar to those in the previous sections to create a data source for the PaaS private site:

  1. On the Amazon Kendra console, navigate to the Alfresco-Private index.
  2. Choose Data sources in the navigation pane.
  3. Choose Add data source.
  4. Choose Add connector for the Alfresco connector.
  5. For Data source name, enter Alfresco-Cloud-Private.
  6. Optionally, add a description.
  7. Keep the remaining settings as default and choose Next.
  8. For Source, select Alfresco cloud.
  9. For Alfresco repository URL, enter the repository URL (created as a prerequisite).
  10. For Alfresco user application URL, enter the same value as the repository URL.
  11. For Authentication, select Basic authentication.
  12. For AWS Secrets Manager secret, choose Create and add new secret.
  13. Enter a name for your secret, user name, and password, then choose Save.
  14. For Virtual Private Cloud (VPC), choose No VPC.
  15. Turn the identity crawler off.
  16. For IAM role, choose Create a new IAM role.
  17. Choose Next.

We can configure the data source to synchronize contents from one or more Alfresco sites. For this post, we configure the data source to sync from the PaaS private site MyAlfrescoCloudSite2.

  18. For Content to sync, select Single Alfresco site sync and choose MyAlfrescoCloudSite2.
  19. Select Include comments.
  20. For Sync mode, choose Full sync.
  21. For Frequency, choose Run on demand (or a different frequency option as needed).
  22. Choose Next.
  23. Map the Alfresco document fields to the Amazon Kendra index fields (you can keep the defaults) and choose Next.
  24. On the Review and Create page, verify all the information, then choose Add data source.

After the data source has been created, the data source page is displayed as shown in the following screenshot.

Step 7: Create a data source for the PaaS public site

We follow steps similar to the earlier ones to create a data source for the PaaS public site:

  1. On the Amazon Kendra console, navigate to the Alfresco-Public index.
  2. Choose Data sources in the navigation pane.
  3. Choose Add data source.
  4. Choose Add connector for the Alfresco connector.
  5. For Data source name, enter Alfresco-Cloud-Public.
  6. Optionally, add a description.
  7. Keep the remaining settings as default and choose Next.
  8. For Source, select Alfresco cloud.
  9. For Alfresco repository URL, enter the repository URL (created as a prerequisite).
  10. For Alfresco user application URL, enter the same value as the repository URL.
  11. For Authentication, select OAuth2.0 authentication.
  12. For AWS Secrets Manager secret, choose Create and add new secret.
  13. Enter a name for your secret, client ID, client secret, and token URL, then choose Save.
  14. For Virtual Private Cloud (VPC), choose No VPC.
  15. Turn the identity crawler off.
  16. For IAM role, choose Create a new IAM role.
  17. Choose Next.

We configure this data source to sync from the PaaS public site MyAlfrescoCloudPublicSite.

  18. For Content to sync, select Single Alfresco site sync and choose MyAlfrescoCloudPublicSite.
  19. Optionally, select Include comments.
  20. For Sync mode, choose Full sync.
  21. For Frequency, choose Run on demand (or a different frequency option as needed).
  22. Choose Next.
  23. Map the Alfresco document fields to the Amazon Kendra index fields (you can keep the defaults) and choose Next.
  24. On the Review and Create page, verify all the information, then choose Add data source.

After the data source has been created, the data source page is displayed as shown in the following screenshot.

Step 8: Perform a sync for each data source

Navigate to each of the data sources and choose Sync now. Complete only one synchronization at a time.

Wait for synchronization to complete for all data sources. When synchronization is complete for a data source, you see the status as shown in the following screenshot.

You can also view Amazon CloudWatch logs for a specific sync under Sync run history.
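
The one-at-a-time rule can also be followed from a script. The following sketch starts a sync job, polls its status until it finishes, and only then moves on to the next data source. It takes a boto3 Kendra client as a parameter. The helper names, the polling interval, and the set of statuses treated as finished are our assumptions, and pagination of the sync history is ignored for brevity.

```python
import time

# Statuses we treat as "the job is no longer running" (an assumption).
FINISHED = {"SUCCEEDED", "FAILED", "INCOMPLETE", "ABORTED"}


def sync_one(kendra, index_id, data_source_id, poll_seconds=60, sleep=time.sleep):
    """Start one sync job and block until it reaches a finished status."""
    execution_id = kendra.start_data_source_sync_job(
        Id=data_source_id, IndexId=index_id
    )["ExecutionId"]
    while True:
        history = kendra.list_data_source_sync_jobs(
            Id=data_source_id, IndexId=index_id
        )["History"]
        # Find the job we just started; treat it as still syncing if not listed yet.
        status = next(
            (job["Status"] for job in history if job["ExecutionId"] == execution_id),
            "SYNCING",
        )
        if status in FINISHED:
            return status
        sleep(poll_seconds)


def sync_all(kendra, targets, **kwargs):
    """Sync each (index_id, data_source_id) pair in order, one at a time."""
    return {
        data_source_id: sync_one(kendra, index_id, data_source_id, **kwargs)
        for index_id, data_source_id in targets
    }
```

In practice you would call sync_all(boto3.client("kendra"), targets) with the index and data source IDs from your own account, and check the returned statuses before running test queries.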

Step 9: Run a test query in the private index using access control

Now it’s time to test the solution. We first run a query in the private index using access control:

  1. On the Amazon Kendra console, navigate to the Alfresco-Private index and choose Search indexed content.
  2. Enter a query in the search field.

As shown in the following screenshot, Amazon Kendra didn’t return any results.

  3. Choose Apply token.
  4. Enter the email address corresponding to the My Dev User1 user and choose Apply.

Note that Amazon Kendra access control works based on the email address associated with an Alfresco user name.

  5. Run the search again.

The search returns a document list (containing wellarchitected-sustainability-pillar.pdf in the following example) based on the access control setup.

If you run the same query again and provide an email address that doesn’t have access to these documents, you should not see them in the results list.

  6. Enter another query to search in the documents based on the aspect awskendra:indexControl.
  7. Choose Apply token, enter the email address corresponding to the My Dev User1 user, and choose Apply.
  8. Rerun the query.
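
The console test above can be approximated through the Query API. The following sketch builds the query arguments with an optional user token and extracts document titles from a response. We assume the token is a JSON document with username and groups fields, matching the default JSON token configuration of the index; the helper names are ours. Leaving out the email address corresponds to querying without a token.

```python
import json


def query_request(index_id, query_text, user_email=None, groups=()):
    """Build query arguments, attaching a JSON user token when an email is given."""
    request = {"IndexId": index_id, "QueryText": query_text}
    if user_email is not None:
        # Assumed token shape: the index's default username and groups fields.
        token = json.dumps({"username": user_email, "groups": list(groups)})
        request["UserContext"] = {"Token": token}
    return request


def titles(response):
    """Return the document titles from a Kendra query response."""
    return [
        item.get("DocumentTitle", {}).get("Text", "")
        for item in response.get("ResultItems", [])
    ]


def search(index_id, query_text, user_email=None):
    """Run the query and return the titles of the matching documents."""
    import boto3  # imported here so the helpers above work without AWS access

    kendra = boto3.client("kendra")
    return titles(kendra.query(**query_request(index_id, query_text, user_email)))
```

Running search twice, once with the email address of a user who has access and once with one who does not, is a quick way to confirm that document-level permissions are being applied.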

Step 10: Run a test query in the public index without access control

Similarly, we can test our solution by running queries in the public index without access control:

  1. On the Amazon Kendra console, navigate to the Alfresco-Public index and choose Search indexed content.
  2. Run a search query.

Because this example Alfresco public site has not been set up with any access control, we don’t use an access token.

Clean up

To avoid incurring future costs, clean up the resources you created as part of this solution. Delete the newly added Alfresco data sources within the indexes. If you created new Amazon Kendra indexes while testing this solution, delete them as well.
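
For scripted environments, the cleanup might look like the following sketch. It takes a boto3 Kendra client, finds data sources whose names start with the Alfresco- prefix used in this post, and deletes either those data sources or the whole index. The helper name and the name-prefix convention are our assumptions, and pagination of the data source list is ignored for brevity.

```python
def clean_up(kendra, index_id, name_prefix="Alfresco-", delete_index=False):
    """Delete the post's data sources, or the whole index when delete_index is True.

    Returns the names of the matching data sources.
    """
    summaries = kendra.list_data_sources(IndexId=index_id)["SummaryItems"]
    matching = [item for item in summaries if item["Name"].startswith(name_prefix)]
    if delete_index:
        # Deleting the index also removes the data sources it contains.
        kendra.delete_index(Id=index_id)
    else:
        for item in matching:
            kendra.delete_data_source(Id=item["Id"], IndexId=index_id)
    return [item["Name"] for item in matching]
```

Use delete_index=True only for indexes you created for this walkthrough, because the deletion cannot be undone.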

Conclusion

With the new Alfresco connector for Amazon Kendra, organizations can tap into the repository of information stored in their account securely using intelligent search powered by Amazon Kendra.

To learn about these possibilities and more, refer to the Amazon Kendra Developer Guide. For more information on how you can create, modify, or delete metadata and content when ingesting your data from Alfresco, refer to Enriching your documents during ingestion and Enrich your content and metadata to enhance your search experience with custom document enrichment in Amazon Kendra.


About the Authors

Arun Anand is a Senior Solutions Architect at Amazon Web Services based in the Houston area. He has 25+ years of experience in designing and developing enterprise applications. He works with partners in the Energy & Utilities segment, providing architectural and best practice recommendations for new and existing solutions.

Rajnish Shaw is a Senior Solutions Architect at Amazon Web Services, with a background as a Product Developer and Architect. Rajnish is passionate about helping customers build applications on the cloud. Outside of work, Rajnish enjoys spending time with family and friends, and traveling.

Yuanhua Wang is a software engineer at AWS with more than 15 years of experience in the technology industry. His interests are software architecture and build tools on cloud computing.

