Dark background Colaberry alumni images

Posted on May 3, 2023 by Kevin Guisarde .

Step into the world of boundless opportunities at our weekly and monthly Blog events, designed to empower and equip students and professionals with cutting-edge skills in Business Intelligence and Analytics. Brace yourself for an awe-inspiring lineup of events, ranging from Power BI and Data Warehouse events, SQL Wednesday events, Qlik and Tableau events, and IPBC Saturday events, to multiple sessions, focused on helping students ace their coursework and mortgage projects.

Power BI Event (Monday, 7:30 pm CST)

Data Warehouse (ETL) Event (Monday, 7:30 pm CST)

Our Power BI and Data Warehouse event is an excellent opportunity for beginners and professionals to learn and improve their skills in creating effective data visualizations and building data warehouses. Our experienced trainers will provide a comprehensive overview of the latest tools and techniques to help you unlock the full potential of Power BI and Data Warehouse. Join us on Monday at 7:30 pm CST to learn more.

SQL Wednesday Event (2nd and 3rd Wednesday 7:30pm CST)

Our SQL Wednesday event is designed to help participants gain in-depth knowledge and understanding of SQL programming language. The event is divided into two sessions on the 2nd and 3rd Wednesday of every month, where we cover different topics related to SQL programming. Our experts will guide you through the nuances of SQL programming, and teach you how to use the language to extract insights from large datasets.

Tableau Events (Thursday 7:30 pm CST)

Qlik Events (Thursday 7:30 pm CST)

Our Qlik and Tableau events are dedicated to helping participants master the art of data visualization using these powerful tools. Whether you are a beginner or an experienced professional, our trainers will provide you with valuable insights and best practices to create compelling data stories using Qlik and Tableau. Join us on Thursday to learn how to make sense of complex data and present it in an engaging and impactful way.

IPBC Saturday Event (Saturday at 10 am CST)

Our IPBC Saturday event is designed to provide participants with a broad understanding of the fundamentals of business analytics, including predictive analytics, descriptive analytics, and prescriptive analytics. Our trainers will provide hands-on experience with the latest tools and techniques, and demonstrate how to apply analytics to real-world business problems.

Mortgage Project Help (Monday, Wednesday, & Thursday at 7:30 pm CST)

For those students who need help with their mortgage projects, we have dedicated sessions on Monday, Wednesday, and Thursday at 7:30 pm CST. Our experts will guide you through the process of creating a successful mortgage project, and help you understand the key factors that contribute to a successful project.

Homework help (Wednesday, Thursday, Saturday)

We understand that students may face challenges in their coursework, and may need additional help to understand concepts or complete assignments. That’s why we offer dedicated homework help sessions on Wednesday at 8 pm CST, Thursday at 7:30 pm CST, and Saturday at 1:30 pm CST. Our tutors will provide personalized guidance and support to help you overcome any challenges you may face in your coursework.

CAP Competition Event (1st Wednesday of the Month at 7:30 pm CST)

We have our monthly CAP competition where students showcase their communication skills and compete in our Monthly Data Challenge. Open to all students, this event offers a chance to sharpen skills and showcase abilities in front of a live audience. The top three winners move on to the next level. The event is free, so come and support your fellow classmates on the 1st Wednesday of every month at 7:30 pm CST. We look forward to seeing you there!

The Good Life Event (1st Thursday of the Month at 10 am CST)

Good Life event on the 1st Thursday of every month at 10 am CST. Successful alumni come to share their inspiring success stories and offer valuable advice to current students. Don’t miss this opportunity to gain insights and learn from those who have already achieved success. It’s an event not to be missed, so mark your calendar and join us for the next Good Life event.

Data Talent Showcase Event (4th Thursday of Every Month at 4 pm CST)

Our Data Talent Showcase Event is the next level of the CAP Competition where the top three winners compete against each other. It’s an event where judges from the industry come to evaluate and select the winner based on the projects presented. This event is a great opportunity for students to showcase their skills and receive feedback from industry experts. Join us at the event and witness the competition among the best students, and see who comes out on top!

Discover the electrifying world of events that Colaberry organizes for students and alumni, aimed at fostering continuous growth and progress in the ever-evolving realm of Business Intelligence and Analytics. With an ever-changing landscape, our dynamic and captivating lineup of events ensures that you stay ahead of the curve and are continuously intrigued. Get ready to be swept off your feet by the exciting opportunities that await you!

To see our upcoming events, <click here>

Jupyter Hub Architecture Diagram

Posted on March 15, 2021 by Yash .

Serving Jupyter Notebooks to Thousands of Users

In our organization, Colaberry Inc, we provide professionals from various backgrounds and various levels of experience, with the platform and the opportunity to learn Data Analytics and Data Science. In order to teach Data Science, the Jupyter Notebook platform is one of the most important tools. A Jupyter Notebook is a document within an open-source web application that allows you to create and share documents that contain live code, equations, visualizations, and narrative text.

In this blog, we will learn the basic architecture of JupyterHub, the multi-user jupyter notebook platform, its working mechanism, and finally how to set up jupyter notebooks to serve a large user base.

Why Jupyter Notebooks?

In our platform, refactored.ai we provide users an opportunity to learn Data Science and AI by providing courses and lessons on Data Science and machine learning algorithms, the basics of the python programming language, and topics such as data handling and data manipulation.

Our approach to teaching these topics is to provide an option to “Learn by doing”. In order to provide practical hands-on learning, the content is delivered using the Jupyter Notebooks technology.

Jupyter notebooks allow users to combine code, text, images, and videos in a single document. This also makes it easy for students to share their work with peers and instructors. Jupyter notebook also gives users access to computational environments and resources without burdening the users with installation and maintenance tasks.

Limitations

One of the limitations of the Jupyter Notebook server is that it is a single-user environment. When you are teaching a group of students learning data science, the basic Jupyter Notebook server falls short of serving all the users.

JupyterHub comes to our rescue when it comes to serving multiple users, with their own separate Jupyter Notebook servers seamlessly. This makes JupyterHub equivalent to a web application that could be integrated into any web-based platform, unlike the regular jupyter notebooks.

JupyterHub Architecture

The below diagram is a visual explanation of the various components of the JupyterHub platform. In the subsequent sections, we shall see what each component is and how the various components work together to serve multiple users with jupyter notebooks.

Components of JupyterHub

Notebooks

At the core of this platform are the Jupyter Notebooks. These are live documents that contain user code, write-up or documentation, and results of code execution in a single document. The contents of the notebook are rendered in the browser directly. They come with a file extension .ipynb. The figure below depicts how a jupyter notebook looks:

 

Notebook Server

As mentioned above, the notebook servers serve jupyter notebooks as .ipynb files. The browser loads the notebooks and then interacts with the notebook server via sockets. The code in the notebook is executed in the notebook server. These are single-user servers by design.

Hub

Hub is the architecture that supports serving jupyter notebooks to multiple users. In order to support multiple users, the Hub uses several components such as Authenticator, User Database, and Spawner.

Authenticator

This component is responsible for authenticating the user via one of the several authentication mechanisms. It supports OAuth, GitHub, and Google to name a few of the several available options. This component is responsible for providing an Auth Token after the user is successfully authenticated. This token is used to provide access for the corresponding user.

Refer to JupyterHub documentation for an exhaustive list of options. One of the notable options is using an identity aggregator platform such as Auth0 that supports several other options.

User Database

Internally, Jupyter Hub uses a user database to store the user information to spawn separate user pods for the logged-in user and then serve notebooks contained within the user pods for individual users.

Spawner

A spawner is a worker component that creates individual servers or user pods for each user allowed to access JupyterHub. This mechanism ensures multiple users are served simultaneously. It is to be noted that there is a predefined limitation on the number of the simultaneous first-time spawn of user pods, which is roughly about 80 simultaneous users. However, this does not impact the regular usage of the individual servers after initial user pod creation.

How It All Works Together

The mechanism used by JupyterHub to authenticate multiple users and provide them with their own Jupyter Notebook servers is described below.

The user requests access to the Jupyter notebook via the JupyterHub (JH) server.
The JupyterHub then authenticates the user using one of the configured authentication mechanisms such as OAuth. This returns an auth token to the user to access the user pod.
A separate Jupyter Notebook server is created and the user is provided access to it.
The requested notebook in that server is returned to the user in the browser.
The user then writes code (or documentation text) in the notebook.
The code is then executed in the notebook server and the response is returned to the user’s browser.

Deployment and Scalability

The JupyterHub servers could be deployed in two different approaches:
Deployed on the cloud platforms such as AWS or Google Cloud platform. This uses Docker and Kubernetes clusters in order to scale the servers to support thousands of users.
A lightweight deployment on a single virtual instance to support a small set of users.

Scalability

In order to support a few thousand users and more, we use the Kubernetes cluster deployment on the Google Cloud platform. Alternatively, this could also have been done on the Amazon AWS platform to support a similar number of users.

This uses a Hub instance and multiple user instances each of which is known as a pod. (Refer to the architecture diagram above). This deployment architecture scales well to support a few thousand users seamlessly.

To learn more about how to set up your own JupyterHub instance, refer to the Zero to JupyterHub documentation.

Conclusion

JupyterHub is a scalable architecture of Jupyter Notebook servers that supports thousands of users in a maintainable cluster environment on popular cloud platforms.

This architecture suits several use cases with thousands of users and a large number of simultaneous users, for example, an online Data Science learning platform such as refactored.ai