Is there an established practice for syncing Canvas Data in a Databricks environment? Or on AWS EMR?
I am not aware of an off-the-shelf solution for Databricks, or really for any platform outside of the handful of traditional RDBMS platforms supported via DAP. We are using Databricks over here and opted to develop our workflows from scratch.
One of the most important steps is the schema parsing. Since Delta tables offer a lot of flexibility around schema evolution, we chose to build our process to "keep the most columns": if a column is removed in the API, we retain it and new data is simply no longer written there; if we are missing a column, usually because it is new, it gets added. This lets us reconcile data that changes on our own schedule, instead of having a tool purge it from our system as soon as it falls off the schema API.
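To make the "keep the most columns" rule concrete, here is a minimal sketch in plain Python. It is not our production code: the names `keep_most_columns`, `align_row`, and `SchemaMerge`, the column names, and the flat name-to-type dictionaries are all made up for illustration, and it assumes the existing table's type wins if a column appears on both sides.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class SchemaMerge:
    """Result of merging a table's columns with the latest API schema (hypothetical type)."""
    columns: dict[str, str]                            # merged column name -> type
    added: list[str] = field(default_factory=list)     # new in the API schema
    retired: list[str] = field(default_factory=list)   # gone from the API, kept in the table


def keep_most_columns(table_cols: dict[str, str], api_cols: dict[str, str]) -> SchemaMerge:
    """Union of both column sets: nothing is dropped, new columns are appended."""
    merged = dict(table_cols)  # start from what the table already has
    added = [name for name in api_cols if name not in table_cols]
    for name in added:
        merged[name] = api_cols[name]
    retired = [name for name in table_cols if name not in api_cols]
    return SchemaMerge(merged, added, retired)


def align_row(row: dict, merge: SchemaMerge) -> dict:
    """Shape an incoming row to the merged schema; retired columns become None."""
    return {name: row.get(name) for name in merge.columns}


# Illustrative column sets, not taken from a real Canvas table.
table_cols = {"id": "bigint", "name": "string", "legacy_flag": "boolean"}
api_cols = {"id": "bigint", "name": "string", "workflow_state": "string"}

merge = keep_most_columns(table_cols, api_cols)
row = align_row({"id": 1, "name": "Intro", "workflow_state": "active"}, merge)
```

In this sketch `legacy_flag` stays in the merged schema and is written as `None` for new rows, while `workflow_state` is appended. The `retired` list is what you would review later when reconciling on your own schedule.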
We could "almost" get schema interpretation to work in Databricks for CD2. The only snag is that a snapshot does not "fully" stick to the schema from the API: the meta.action column is omitted completely. That causes a problem when you try to run an incremental following a snapshot, since the interpreted schema will not evolve for nested columns. There is probably a way around this, but we decided to base the structure of all of our tables on the schema API. I suppose if you wanted the simplest option, you could create a workflow that only uses daily snapshots with schema interpretation, and not have to worry about all of the steps involved in maintaining incremental loads.
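As a rough illustration of why a fixed, schema-driven structure sidesteps the missing meta.action problem, here is a small plain-Python sketch that shapes every record to a declared nested layout, filling anything absent with `None`. The `conform` helper, the `SCHEMA` dictionary, and the sample field names are assumptions for the example, not the actual CD2 schema or our pipeline.

```python
def conform(record: dict, schema: dict) -> dict:
    """Return a copy of `record` shaped exactly like `schema`.

    `schema` maps field names to either a type label (leaf) or a nested
    dict (struct). Fields missing from the record come back as None.
    """
    out = {}
    for name, spec in schema.items():
        value = record.get(name)
        if isinstance(spec, dict):
            # Nested struct: recurse, treating a missing struct as empty.
            out[name] = conform(value if isinstance(value, dict) else {}, spec)
        else:
            out[name] = value
    return out


# Hypothetical nested layout; in practice this would be derived from the schema API.
SCHEMA = {
    "key": {"id": "bigint"},
    "value": {"name": "string"},
    "meta": {"ts": "timestamp", "action": "string"},
}

# A snapshot-style record without meta.action, and an incremental-style one with it.
snapshot_row = {"key": {"id": 1}, "value": {"name": "a"}, "meta": {"ts": "2024-01-01T00:00:00Z"}}
incremental_row = {"key": {"id": 1}, "value": {"name": "b"},
                   "meta": {"ts": "2024-01-02T00:00:00Z", "action": "U"}}

shaped_snapshot = conform(snapshot_row, SCHEMA)
shaped_incremental = conform(incremental_row, SCHEMA)
```

Both shaped records end up with the same nested fields, so a table created from the snapshot already has a place for `meta.action` when the first incremental arrives.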
It sounds a bit daunting, but we had a working prototype built pretty quickly with only a two-person team working on this project. We really like the speed and performance that Databricks offers.