
Announcing the updated Microsoft SharePoint connector (V2.0) for Amazon Kendra

by Udaya Jaladi, AWS Machine Learning Blog

Amazon Kendra is a highly accurate and simple-to-use intelligent search service powered by machine learning (ML). Amazon Kendra offers a suite of data source connectors to simplify the process of ingesting and indexing your content, wherever it resides.

Valuable data in organizations is stored in both structured and unstructured repositories. Amazon Kendra can pull together data from several structured and unstructured knowledge base repositories for indexing and search.

One such knowledge base repository is Microsoft SharePoint, and we are excited to announce that we have updated the SharePoint connector for Amazon Kendra to add even more capabilities. In this new version (V2.0), we have added support for SharePoint Subscription Edition, multiple authentication modes, and multiple sync modes that index new, modified, or deleted content.

You can now also choose OAuth 2.0 to authenticate with SharePoint Online. Multiple synchronization options are available to update your index when your data source content changes. You can also filter search results based on user and group information so that users see only the content they have permission to access.

In this post, we demonstrate how to index content from SharePoint using the Amazon Kendra SharePoint connector V2.0.

Solution overview

You can use Amazon Kendra as a central location to index the content provided by various data sources for intelligent search. In the following sections, we go through the steps to create an index, add the SharePoint connector, and test the solution.

Prerequisites

To get started, you need the following:

A SharePoint (Server or Online) user with owner rights.
An AWS account with privileges to create AWS Identity and Access Management (IAM) roles and policies. For more information, see Overview of access management: Permissions and policies.
Basic knowledge of AWS.

Create an Amazon Kendra index

To create an Amazon Kendra index, complete the following steps:

On the Amazon Kendra console, choose Create an index.
For Index name, enter a name for the index (for example, my-sharepoint-index).
Enter an optional description.
Choose Create a new role.
For Role name, enter an IAM role name.
Configure optional encryption settings and tags.
Choose Next.
For Access control settings, choose Yes.
For Token configuration, set Token type to JSON and leave the default values for Username and Groups.
For User-group expansion, leave the defaults.
Choose Next.
For Specify provisioning, select Developer edition, which is suited for building a proof of concept and experimentation, and choose Create.
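If you prefer to script this step, the console choices above can be expressed as a request to the Kendra CreateIndex API through the AWS SDK for Python (boto3). The following is a minimal sketch, not a definitive implementation: it assumes the console's default JSON token fields are named username and groups, the role ARN is a hypothetical placeholder, and it only builds the request so you can inspect it before calling the service.

```python
import json


def build_create_index_request(name, role_arn, description=None):
    """Build keyword arguments for the Kendra CreateIndex API.

    Mirrors the console choices above: Developer edition and access control
    through a JSON user token. The "username"/"groups" field names are an
    assumption about the console defaults. User-group expansion is left at
    its default by not setting UserGroupResolutionConfiguration.
    """
    request = {
        "Name": name,
        "Edition": "DEVELOPER_EDITION",
        "RoleArn": role_arn,
        # Access control settings: Yes -> token-based user context.
        "UserContextPolicy": "USER_TOKEN",
        "UserTokenConfigurations": [
            {
                "JsonTokenTypeConfiguration": {
                    "UserNameAttributeField": "username",
                    "GroupAttributeField": "groups",
                }
            }
        ],
    }
    if description:
        request["Description"] = description
    return request


if __name__ == "__main__":
    req = build_create_index_request(
        "my-sharepoint-index",
        "arn:aws:iam::123456789012:role/my-kendra-index-role",  # hypothetical ARN
    )
    print(json.dumps(req, indent=2))
    # With AWS credentials configured:
    # import boto3
    # boto3.client("kendra").create_index(**req)
```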

Add a SharePoint data source to your Amazon Kendra index

One of the advantages of implementing Amazon Kendra is that you can use a set of pre-built connectors for data sources such as Amazon Simple Storage Service (Amazon S3), Amazon Relational Database Service (Amazon RDS), SharePoint Online, and Salesforce.

To add a SharePoint data source to your index, complete the following steps:

On the Amazon Kendra console, navigate to the index that you created.
Choose Data sources in the navigation pane.
Under SharePoint Connector V2.0, choose Add connector.
For Data source name, enter a name (for example, my-sharepoint-data-source).
Enter an optional description.
Choose English (en) for Default language.
Enter optional tags.
Choose Next.

Choose the hosting method that matches where your SharePoint application runs (SharePoint Online or SharePoint Server). The required connector configuration attributes appear based on the hosting method you choose.

If you select SharePoint Online, complete the following steps:

Enter the URL for your SharePoint Online repository.
Choose your authentication option (these authentication details will be used by the SharePoint connector to integrate with your SharePoint application).
Enter the tenant ID of your SharePoint Online application.
For AWS Secrets Manager secret, pick the secret that has SharePoint Online application credentials or create a new secret and add the connection details (for example, AmazonKendra-SharePoint-my-sharepoint-online-secret).

To learn more about AWS Secrets Manager, refer to Getting started with Secrets Manager.

The SharePoint connector uses the clientId, clientSecret, userName, and password values to authenticate with the SharePoint Online application. If the SharePoint Online application is already registered, you can find the application (client) details on the App registrations page in the Azure portal.
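If you create the secret yourself rather than through the console, it holds the four values the connector reads. The sketch below assumes the secret is stored as a JSON object keyed by clientId, clientSecret, userName, and password (check the connector documentation for the exact keys your configuration expects); the helper name and placeholder values are hypothetical.

```python
import json


def sharepoint_online_secret(client_id, client_secret, user_name, password):
    """Serialize SharePoint Online credentials as a Secrets Manager SecretString.

    Key names follow the values the connector is described as using; the
    exact schema is an assumption. Raises ValueError if any value is empty.
    """
    fields = {
        "clientId": client_id,
        "clientSecret": client_secret,
        "userName": user_name,
        "password": password,
    }
    missing = [key for key, value in fields.items() if not value]
    if missing:
        raise ValueError(f"missing credential fields: {missing}")
    return json.dumps(fields)


if __name__ == "__main__":
    secret_string = sharepoint_online_secret(
        "<azure-app-client-id>", "<azure-app-client-secret>",
        "user@example.onmicrosoft.com", "<password>",
    )
    # With AWS credentials configured (secret name from the example above):
    # import boto3
    # boto3.client("secretsmanager").create_secret(
    #     Name="AmazonKendra-SharePoint-my-sharepoint-online-secret",
    #     SecretString=secret_string,
    # )
    print(sorted(json.loads(secret_string)))
```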

If you select SharePoint Server, complete the following steps:

Choose your SharePoint version (for this post, we use SharePoint 2019).
Enter the site URL for your SharePoint Server repository.
For SSL certificate location, enter the Amazon S3 path to the file that contains the SharePoint Server SSL certificate.
Enter the web proxy host name and the port number details if the SharePoint server requires a proxy connection.

For this post, no web proxy is used because the SharePoint application used for this example is a public-facing application.

Select the authorization option for the Access Control List (ACL) configuration.

These authentication details will be used by the SharePoint connector to integrate with your SharePoint instance.

For AWS Secrets Manager secret, choose the secret that has SharePoint Server credentials or create a new secret and add the connection details (for example, AmazonKendra-my-sharepoint-server-secret).

The SharePoint connector uses the user name and password to authenticate with the SharePoint Server application. If you choose email ID with domain from IdP as the ACL setting, you must also provide the LDAP server endpoint, search base, LDAP user name, and LDAP password.
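The rule in the preceding paragraph (LDAP details are needed only for the email ID with domain from IdP ACL option) can be captured in a small pre-flight check before you store the secret. This is a hypothetical helper: the field names are illustrative labels, not the connector's documented secret keys.

```python
def required_server_secret_fields(acl_uses_email_from_idp):
    """Return the credential fields a SharePoint Server secret needs.

    Field names are hypothetical labels. User name and password are always
    required; the four LDAP values are required only for the
    "email ID with domain from IdP" ACL option.
    """
    fields = ["userName", "password"]
    if acl_uses_email_from_idp:
        fields += ["ldapServerEndpoint", "searchBase", "ldapUserName", "ldapPassword"]
    return fields


def missing_fields(secret, acl_uses_email_from_idp):
    """List required fields that are absent or empty in a candidate secret dict."""
    return [
        name
        for name in required_server_secret_fields(acl_uses_email_from_idp)
        if not secret.get(name)
    ]


if __name__ == "__main__":
    candidate = {"userName": "DOMAIN\\svc-kendra", "password": "<password>"}
    print(missing_fields(candidate, acl_uses_email_from_idp=False))  # []
    print(missing_fields(candidate, acl_uses_email_from_idp=True))
```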

To give you granular control over which content is searchable and displayable, the SharePoint connector V2.0 introduces identity crawler functionality.

Enable the identity crawler and select Crawl Local Group Mapping and Crawl AD Group Mapping.
For Virtual Private Cloud (VPC), choose the VPC through which the SharePoint application is reachable from your SharePoint connector.

For this post, we choose No VPC because the SharePoint application used for this example is a public-facing application deployed on Amazon Elastic Compute Cloud (Amazon EC2) instances.

Choose Create a new role (Recommended) and provide a role name, such as AmazonKendra-sharepoint-v2.
Choose Next.
Select entities that you would like to include for indexing. You can choose All or specific entities based on your use case. For this post, we choose All.

You can also include or exclude documents by using regular expressions. You can define patterns that Amazon Kendra either uses to exclude certain documents from indexing or include only documents with that pattern. For more information, refer to SharePoint Configuration.
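To get a feel for how inclusion and exclusion patterns interact, the following simplified model filters document paths with Python's re module. It assumes that an exclusion match always wins over an inclusion match and that, when inclusion patterns are set, a document must match at least one of them; the paths and patterns are hypothetical. Amazon Kendra applies its own matching rules, so treat this as an approximation for trying out patterns, not the service's logic.

```python
import re


def should_index(path, include_patterns=(), exclude_patterns=()):
    """Approximate include/exclude filtering on a document path.

    Assumed semantics: any exclusion match drops the document; otherwise,
    if inclusion patterns exist, at least one must match.
    """
    if any(re.search(pattern, path) for pattern in exclude_patterns):
        return False
    if include_patterns:
        return any(re.search(pattern, path) for pattern in include_patterns)
    return True


if __name__ == "__main__":
    paths = [
        "Shared Documents/Databases/aurora.pdf",
        "Shared Documents/Security/iam.pdf",
        "Shared Documents/Databases/draft.docx",
    ]
    kept = [
        p for p in paths
        if should_index(p, include_patterns=[r"\.pdf$"], exclude_patterns=[r"/Security/"])
    ]
    print(kept)  # ['Shared Documents/Databases/aurora.pdf']
```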

Select your sync mode to update the index when your data source content changes.

Choose Full sync to index all content in all entities regardless of previous syncs, or choose to sync only new, modified, or deleted content, or only new or modified content. For this post, we select Full sync.
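The three sync modes differ in which changes a sync picks up. The following sketch models that difference by comparing two snapshots of document versions; the mode labels and snapshot format are hypothetical and only illustrate the behavior described above, not how the connector detects changes.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class SyncPlan:
    to_index: frozenset   # documents to (re)index
    to_delete: frozenset  # documents to remove from the index


def plan_sync(previous, current, mode):
    """Decide which documents a sync touches.

    previous/current map document IDs to a version marker (for example, a
    last-modified timestamp). mode is one of the hypothetical labels
    "full", "new_modified_deleted", or "new_modified".
    """
    new = current.keys() - previous.keys()
    modified = {doc for doc in current.keys() & previous.keys() if current[doc] != previous[doc]}
    deleted = previous.keys() - current.keys()
    if mode == "full":
        # Re-index everything in the source, regardless of earlier syncs.
        return SyncPlan(frozenset(current), frozenset(deleted))
    if mode == "new_modified_deleted":
        return SyncPlan(frozenset(new | modified), frozenset(deleted))
    if mode == "new_modified":
        # Deletions in the source are not propagated in this mode.
        return SyncPlan(frozenset(new | modified), frozenset())
    raise ValueError(f"unknown sync mode: {mode!r}")


if __name__ == "__main__":
    previous = {"a.pdf": 1, "b.pdf": 1, "c.pdf": 1}
    current = {"a.pdf": 1, "b.pdf": 2, "d.pdf": 1}
    for mode in ("full", "new_modified_deleted", "new_modified"):
        plan = plan_sync(previous, current, mode)
        print(mode, sorted(plan.to_index), sorted(plan.to_delete))
```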

Choose a frequency to run the sync schedule, such as Run on demand.
Choose Next.

In this next step, you can create field mappings to add an extra layer of metadata to your documents. This enables you to improve accuracy through manual tuning, filtering, and faceting.
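Conceptually, a field mapping renames a source attribute to an index field. The sketch below maps hypothetical SharePoint attribute names to Amazon Kendra reserved fields such as _document_title and _source_uri; review the console's default mappings for the source fields the connector actually exposes.

```python
# Hypothetical SharePoint attribute names (left) mapped to Amazon Kendra
# reserved index fields (right).
FIELD_MAPPINGS = {
    "title": "_document_title",
    "createdAt": "_created_at",
    "lastModifiedAt": "_last_updated_at",
    "url": "_source_uri",
}


def apply_field_mappings(source_metadata, mappings=FIELD_MAPPINGS):
    """Rename mapped source attributes to index field names; unmapped ones are dropped."""
    return {mappings[key]: value for key, value in source_metadata.items() if key in mappings}


if __name__ == "__main__":
    doc = {"title": "Example whitepaper", "url": "https://example.com/doc.pdf", "author": "n/a"}
    print(apply_field_mappings(doc))
```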

Review the default field mappings information and choose Next.
As a last step, review the configuration details and choose Add data source to create the SharePoint connector data source for the Amazon Kendra index.
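The same data source can be created with the Kendra CreateDataSource API. The following minimal sketch assumes that connector V2.0 is configured as a template-based data source (Type TEMPLATE) and leaves the template document to the caller, because its schema is defined in the SharePoint connector documentation and is not reproduced here. Omitting Schedule leaves syncs to be started on demand, matching the Run on demand choice above. The helper name and ARN are hypothetical.

```python
def build_create_data_source_request(index_id, name, role_arn, template, description=None):
    """Build keyword arguments for the Kendra CreateDataSource API.

    template must follow the SharePoint connector's template schema; this
    sketch neither constructs nor validates it. No Schedule is set, so syncs
    run only when started (for example, with Sync now).
    """
    request = {
        "IndexId": index_id,
        "Name": name,
        "Type": "TEMPLATE",
        "RoleArn": role_arn,
        "LanguageCode": "en",  # Default language: English (en)
        "Configuration": {"TemplateConfiguration": {"Template": template}},
    }
    if description:
        request["Description"] = description
    return request


if __name__ == "__main__":
    template = {}  # placeholder: fill in per the SharePoint template schema
    req = build_create_data_source_request(
        "<index-id>",
        "my-sharepoint-data-source",
        "arn:aws:iam::123456789012:role/AmazonKendra-sharepoint-v2",  # hypothetical ARN
        template,
    )
    print(sorted(req))
    # With AWS credentials configured:
    # import boto3
    # boto3.client("kendra").create_data_source(**req)
```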

Test the solution

Now you’re ready to prepare and test the Amazon Kendra search features using the SharePoint connector.

For this post, AWS getting started documents are added to the SharePoint data source. The sample dataset used for this post can be downloaded from AWS_Whitepapers.zip. This dataset has PDF documents categorized into multiple directories based on the type of documents (for example, documents related to AWS database options, security, and ML).

The sample dataset directories in SharePoint are also configured with user email IDs and group details, so only users and groups with permissions can access specific directories or individual files.

When the identity crawler is enabled with the local and AD group mapping options selected, the SharePoint connector crawls the local or Active Directory (AD) group mappings in the SharePoint data source in addition to the content. This gives you granular control over search results: indexed content is searchable and displayable based on the access control permissions of users and groups.
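The effect of crawling ACLs and group mappings can be pictured with a simplified model: each document carries a set of allowed principals, and a user sees a document if the user or one of the user's groups is in that set. This sketch is illustrative only; it does not reproduce Amazon Kendra's filtering, group expansion, or handling of documents without ACLs, and the document paths and group names (apart from database-specialists, used later in this post) are hypothetical.

```python
def visible_documents(documents, user_id, user_groups):
    """Simplified model of ACL-based result filtering.

    documents maps a document path to its ACL: a set of principals (user
    email IDs or group names) allowed to read it. Returns the sorted paths
    whose ACL contains the user or any of the user's groups.
    """
    principals = set(user_groups)
    if user_id:
        principals.add(user_id)
    return sorted(path for path, acl in documents.items() if acl & principals)


if __name__ == "__main__":
    documents = {
        "Databases/aurora.pdf": {"database-specialists"},
        "Security/iam-best-practices.pdf": {"security-specialists"},
        "ML/sagemaker.pdf": {"ml-specialists", "alice@example.com"},
    }
    print(visible_documents(documents, None, ["database-specialists"]))  # ['Databases/aurora.pdf']
```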

To sync our index with SharePoint content, complete the following steps:

On the Amazon Kendra console, navigate to the index you created.
Choose Data sources in the navigation pane and select the SharePoint data source.
Choose Sync now to start the process to index the content from the SharePoint application and wait for the process to complete.
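Sync now can also be triggered programmatically. The sketch below calls StartDataSourceSyncJob and then polls ListDataSourceSyncJobs until the job reaches a terminal status. It accepts any boto3 Kendra client, assumes the job appears in the first page of results (pagination is not handled), and the polling interval and timeout are arbitrary choices.

```python
import time

# Statuses after which the job will not progress further.
TERMINAL_STATES = {"SUCCEEDED", "FAILED", "ABORTED", "INCOMPLETE"}


def sync_and_wait(kendra, index_id, data_source_id, poll_seconds=30, timeout_seconds=3600):
    """Start a data source sync job and poll until it finishes; return its status."""
    execution_id = kendra.start_data_source_sync_job(
        Id=data_source_id, IndexId=index_id
    )["ExecutionId"]
    deadline = time.monotonic() + timeout_seconds
    while time.monotonic() < deadline:
        history = kendra.list_data_source_sync_jobs(
            Id=data_source_id, IndexId=index_id
        ).get("History", [])
        # Find the job we started; it may not be listed immediately.
        job = next((j for j in history if j.get("ExecutionId") == execution_id), None)
        if job and job.get("Status") in TERMINAL_STATES:
            return job["Status"]
        time.sleep(poll_seconds)
    raise TimeoutError(f"sync job {execution_id} did not finish in {timeout_seconds}s")


# Usage with AWS credentials configured:
# import boto3
# status = sync_and_wait(boto3.client("kendra"), "<index-id>", "<data-source-id>")
```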

If you encounter any sync issues, refer to Troubleshooting data sources for more information.

When the sync process is successful, the value for Last sync status will be set to Successful – service is operating normally. The content from the SharePoint application is now indexed and ready for queries.

Choose Search indexed content (under Data management) in the navigation pane.
Enter a test query in the search field and press Enter.

A test query such as “What is the durability of S3?” returns Amazon Kendra suggested answers. Note that these results come from all of the indexed content, because the query carries no user name or group context.
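The console test query corresponds to a call to the Kendra Query API with only IndexId and QueryText. The minimal sketch below sends no user context, so, as in the console, results are not restricted by user or group; it keeps only each result's type and title. The helper names are hypothetical.

```python
def summarize_results(response, limit=5):
    """Return (type, title) pairs for the first `limit` items of a Query response."""
    return [
        (item.get("Type"), item.get("DocumentTitle", {}).get("Text", ""))
        for item in response.get("ResultItems", [])[:limit]
    ]


def query_index(kendra, index_id, text):
    """Query the index without any UserContext and summarize the results."""
    return summarize_results(kendra.query(IndexId=index_id, QueryText=text))


# Usage with AWS credentials configured:
# import boto3
# print(query_index(boto3.client("kendra"), "<index-id>", "What is the durability of S3?"))
```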

To test the access-controlled search, expand Test query with username or groups and choose Apply user name or groups to add a user name (email ID) or group information.

When an Experience Builder app is used, it includes the user context, and therefore you don’t need to add user or group IDs explicitly.

For this post, access to the Databases directory in the SharePoint site is provided to the database-specialists group only.
Enter a new test query and press Enter.

In this example, only the content in the Databases directory is searched and displayed, because the database-specialists group has access only to the Databases directory.
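Outside the console, user or group information goes in the UserContext parameter of the Query API. The following sketch builds such a request for the database-specialists example. Whether your index accepts UserId and Groups directly, or expects a Token instead, depends on its user context policy, so verify this against the Query API reference for your index configuration. The helper name is hypothetical.

```python
def build_query_request(index_id, text, user_id=None, groups=()):
    """Build Query API arguments, adding a UserContext only when a user or groups are given."""
    request = {"IndexId": index_id, "QueryText": text}
    context = {}
    if user_id:
        context["UserId"] = user_id
    if groups:
        context["Groups"] = list(groups)
    if context:
        request["UserContext"] = context
    return request


# Usage with AWS credentials configured:
# import boto3
# req = build_query_request("<index-id>", "<your query>", groups=["database-specialists"])
# boto3.client("kendra").query(**req)
```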

Congratulations! You have successfully used Amazon Kendra to surface answers and insights based on the content indexed from your SharePoint application.

Amazon Kendra Experience Builder

You can build and deploy an Amazon Kendra search application without the need for any front-end code. Amazon Kendra Experience Builder helps you build and deploy a fully functional search application in a few clicks so that you can start searching right away.

Refer to Building a search experience with no code for more information.

Clean up

To avoid incurring future costs, clean up the resources you created as part of this solution. If you created a new Amazon Kendra index while testing this solution, delete it if you no longer need it. If you only added a new data source using the Amazon Kendra connector for SharePoint, delete that data source after your solution review is completed.
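If you prefer to clean up from a script, the Kendra DeleteDataSource and DeleteIndex APIs remove the resources created in this walkthrough. The following hypothetical helper takes a boto3 Kendra client and deletes only what you ask it to, so you can keep the index and remove just the SharePoint data source.

```python
def clean_up(kendra, index_id, data_source_id=None, delete_index=False):
    """Delete the data source and/or the index; return the actions taken."""
    actions = []
    if data_source_id:
        kendra.delete_data_source(Id=data_source_id, IndexId=index_id)
        actions.append(("delete_data_source", data_source_id))
    if delete_index:
        kendra.delete_index(Id=index_id)
        actions.append(("delete_index", index_id))
    return actions


# Usage with AWS credentials configured:
# import boto3
# clean_up(boto3.client("kendra"), "<index-id>", data_source_id="<data-source-id>")
```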

Refer to Deleting an index and data source for more information.

Conclusion

In this post, we showed how to ingest documents from your SharePoint application into your Amazon Kendra index. We also reviewed some of the features introduced in version 2.0 of the SharePoint connector.

To learn more about the Amazon Kendra connector for SharePoint, refer to Microsoft SharePoint connector V2.0.

Finally, don’t forget to check out the other blog posts about Amazon Kendra!

About the Author

Udaya Jaladi is a Solutions Architect at Amazon Web Services (AWS), specializing in assisting Independent Software Vendor (ISV) customers. With expertise in cloud strategies, AI/ML technologies, and operations, Udaya serves as a trusted advisor to executives and engineers, offering personalized guidance on maximizing the cloud’s potential and driving innovative product development. Leveraging his background as an Enterprise Architect (EA) across diverse business domains, Udaya excels in architecting scalable cloud solutions tailored to meet the specific needs of ISV customers.

